Statistical Considerations for Cross-Sectional HIV Incidence Estimation Based on Recency Test

Fei Gao; Marlena Bannick

doi:10.1002/sim.9296

Author manuscript; available in PMC: 2023 Apr 15.

Published in final edited form as: Stat Med. 2022 Jan 4;41(8):1446–1461. doi: 10.1002/sim.9296

Fei Gao ^1,^2,^*, Marlena Bannick ³

PMCID: PMC8918003 NIHMSID: NIHMS1767850 PMID: 34984710

Summary

Longitudinal cohorts to determine the incidence of HIV infection are logistically challenging, so researchers have sought alternative strategies. Recency test methods use biomarker profiles of HIV-infected subjects in a cross-sectional sample to infer whether they are “recently” infected and to estimate incidence in the population. Two main estimators have been used in practice: one that assumes a recency test is perfectly specific, and another that allows for false-recent results. To date, these commonly used estimators have not been rigorously studied with respect to their assumptions and statistical properties. In this paper, we present a theoretical framework with which to understand these estimators and interrogate their assumptions, and perform a simulation study and data analysis to assess the performance of these estimators under realistic HIV epidemiological dynamics. We find that the snapshot estimator and the adjusted estimator perform well when their corresponding assumptions hold. When assumptions on constant incidence and recency test characteristics fail to hold, the adjusted estimator is more robust than the snapshot estimator. We conclude with recommendations for the use of these estimators in practice and a discussion of future methodological developments to improve HIV incidence estimation via recency test.

Keywords: Biomarker, HIV, Incidence, Prevalence, Recency Assay

1 |. INTRODUCTION

Determination of the incidence rate of HIV is critical for HIV surveillance and for evaluating the effectiveness of HIV prevention efforts. The current gold standard is longitudinal follow-up and repeated testing of a cohort of participants drawn from the population of interest, such that the incidence can be estimated by the ratio of the number of new cases to the total follow-up time. This approach is theoretically simple, but it may present issues for HIV surveillance¹. For example, high follow-up rates in large representative samples may be difficult to obtain, the cost of such studies is usually high, and there may be differences in HIV risk behaviors among persons who do and do not participate in cohort studies. In addition, there may be retention bias during follow-up or alteration of HIV risk by repeated HIV counseling and testing (Hawthorne effect)².

An important alternative approach that avoids longitudinal follow-up and repeated testing is cross-sectional incidence estimation. This approach utilizes a biomarker-based algorithm to determine which infections in a cross-sectional sample drawn from the population of interest were acquired “recently”. It was first proposed by Brookmeyer & Quinn (1995)³, where subjects with a negative HIV-antibody test and a positive HIV-1 p24 antigen test were classified as recently infected. This recency test is infeasible in practice because the p24 antigen-positive pre-seroconversion period is short, so a large number of individuals must be tested to estimate incidence with precision. Later, a number of serological assays that measure the antibody response to HIV infection were proposed, for example the detuned assay⁴, the BED capture EIA⁵, and the avidity assay^6,7. Genetic diversity of HIV has also been used as a biomarker to indicate HIV recency^8,9,10. To improve the performance of the recency algorithm, others have proposed multi-assay algorithms that make use of multiple assays and biomarkers to indicate recency^11,12,13,14.

A number of statistical approaches have been proposed to determine HIV incidence based on a cross-sectional sample. They make use of recency test results in a cross-sectional sample as well as the classification characteristics of the recency test. Based on the understanding that incidence can be viewed as the expected number of new infections per uninfected person per unit time, Kaplan & Brookmeyer (1999)¹⁵ proposed the “snapshot estimator”

\hat{λ} = \frac{N_{rec}}{\hat{μ} N_{neg}}, \qquad (1)

where N_rec and N_neg are the numbers of test-recent HIV-positive subjects and HIV-negative subjects, respectively, from the cross-sectional sample, and $\hat{μ}$ is an estimate of the “mean window period” of the recency test (the average duration of infection among the subjects classified as recently infected; we will discuss a formal definition of this parameter in Section 2.2). The snapshot estimator was suggested to be unbiased when the incidence is constant over time, and it has been adopted in a number of applications in HIV incidence estimation^16,17,18.

The snapshot estimator implicitly assumes that the mean window period is finite (and short), such that a long-infected subject would have a zero probability of being classified as recent. However, for many recency tests, a proportion of long-infected persons may be classified as “recent”, such that the performance of the snapshot estimator may not be satisfactory. A number of methods have been proposed to address such false-recency^19,20,21. One widely adopted approach is the “adjusted estimator” from Kassanjee et al. (2012)²¹, where an infection duration cutoff T* is defined to delineate between “recent” and “long” infected subjects. Based on this cutoff, the adjusted estimator uses two characteristics of a recency test that are closely related to the sensitivity and specificity of a classification procedure: the mean duration of recent infection (MDRI) $Ω_{T^{*}}$ and the false-recent rate (FRR) $β_{T^{*}}$. MDRI is similar to the mean window period in that it captures the average duration of infection among those who are “truly recent” and classified as recently infected, and FRR is the probability that a randomly selected long-infected subject is misclassified as recent (these parameters will be further discussed in Section 2.2). For a recency test, the adjusted estimator is given by

\tilde{λ} = \frac{N_{rec} - N_{pos} {\hat{β}}_{T^{*}}}{N_{neg} ({\hat{Ω}}_{T^{*}} - {\hat{β}}_{T^{*}} T^{*})}, \qquad (2)

where N_pos is the number of HIV-positive subjects in the cross-sectional sample, and ${\hat{Ω}}_{T^{*}}$ and ${\hat{β}}_{T^{*}}$ are estimates of MDRI and FRR for the recency test, respectively. Since it accounts for recency tests that produce false-recent results, the adjusted estimator is thought to be more flexible and theoretically more robust than the snapshot estimator. It has also been widely adopted in applications of HIV incidence estimation^22,23. Furthermore, it is important to note that while for the adjusted estimator, a “truly recent” infection is defined as one that happened at most T* in the past, the snapshot estimator does not explicitly define a true recency duration. Rather, subjects with a longer duration of infection may be identified as a recent infection with a decreased probability, reflecting that infections that happen farther from the cross-sectional time contribute less to the snapshot estimator.
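As a minimal sketch of how estimators (1) and (2) are evaluated from survey counts, the Python functions below take the counts and the externally estimated test parameters as inputs. The function names, the counts, and the parameter values are hypothetical and chosen only for illustration; time is measured in years.

```python
def snapshot_estimator(n_rec, n_neg, mu_hat):
    """Snapshot estimator (1): N_rec / (mu_hat * N_neg)."""
    return n_rec / (mu_hat * n_neg)


def adjusted_estimator(n_rec, n_neg, n_pos, omega_hat, beta_hat, t_star):
    """Adjusted estimator (2), with MDRI omega_hat, FRR beta_hat and cutoff t_star."""
    numerator = n_rec - n_pos * beta_hat
    denominator = n_neg * (omega_hat - beta_hat * t_star)
    return numerator / denominator


# Hypothetical counts from a cross-sectional sample of 5000 people.
n_neg, n_pos, n_rec = 3550, 1450, 45

lam_snapshot = snapshot_estimator(n_rec, n_neg, mu_hat=0.28)
lam_adjusted = adjusted_estimator(
    n_rec, n_neg, n_pos, omega_hat=0.27, beta_hat=0.014, t_star=2.0
)
# With beta_hat = 0 and omega_hat = mu_hat, estimator (2) reduces to estimator (1).
lam_reduced = adjusted_estimator(
    n_rec, n_neg, n_pos, omega_hat=0.28, beta_hat=0.0, t_star=2.0
)
print(lam_snapshot, lam_adjusted, lam_reduced)
```

In this illustration the adjusted estimate is lower than the snapshot estimate because the term N_pos β̂ removes the test-recent results expected from long-infected subjects.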

Even though the cross-sectional incidence estimators have been widely utilized in practice, the statistical properties, especially those of the adjusted estimator, have not been well studied or understood. Specifically, the key parameters (MDRI, FRR) may not be well characterized and the assumptions under which the estimators serve as unbiased estimators for the incidence in a target population have not been rigorously studied. In this paper, we formulate a theoretical framework for assessing HIV recency, formally establish the assumptions for cross-sectional incidence estimation based on the snapshot and adjusted estimators, and evaluate the bias of the estimators when the assumptions fail to hold. We evaluate the numerical performance of the estimators under various simulated settings with different HIV epidemic trajectories and recency tests with different properties, and provide some recommendations for using the estimators in practice.

2 |. THEORETICAL MODEL

2.1 |. Notation

Let T be the (calendar) HIV infection time of a subject and let A(t) be an indicator of eligibility at time t, i.e., whether this subject would be eligible to be included in a survey for HIV incidence (and prevalence) of the target population at time t. This eligibility indicator A(t) can be based on a collection of (possibly time-dependent) individual covariates and a population of interest. For example, in a cross-sectional population survey, a minimum requirement for eligibility is being alive at the time the survey is conducted. Another example is A(t) = I(MSM, Age 18–50 at time t), which uses two characteristics – an indicator of being a member of the men who have sex with men population (MSM), and an indicator of being aged 18–50 – to define the eligible population at time t.

At any calendar time t, the prevalence in this target population is given by p(t) = Pr(T ≤ t|A(t) = 1), and the incidence in this population is given by

λ(t) = \lim_{dt \to 0} \frac{1}{dt} Pr(t \leq T < t + dt ∣ T \geq t, A(t) = 1), \qquad (3)

i.e., it is the rate of instantaneous HIV infection for an eligible HIV-negative subject at time t. With slight abuse of notation, we write λ(t) = Pr(T = t|T ≥ t, A(t) = 1). Note that both p(t) and λ(t) concern the distribution of infection time T in a restricted population defined by A(t), such that they are conditional quantities given A(t) = 1. The main goal of cross-sectional incidence estimation is to estimate λ(t) based on a finite-sized cross-sectional sample collected at time t satisfying A(t) = 1.

2.2 |. Cross-Sectional Sample

Suppose that we collect a random sample from the eligible population at time t (all subjects in the sample satisfy A(t) = 1). For each subject in that sample, we first assess their HIV status, and if it is positive, we apply a subsequent HIV recency test. We assume that HIV status can be determined without any mis-classification (e.g., using an RNA-based diagnostic), while HIV recency may not be, with details described below. Recall that we use N_neg, N_pos, and N_rec to denote the numbers of subjects who are HIV-negative, HIV-positive, and HIV test-recent, respectively. The probabilities associated with those subjects in the cross-sectional sample are:

- The probability of being HIV-negative: Pr(T > t|A(t) = 1) = 1 − p(t).
- The probability of being HIV-positive: Pr(T ≤ t|A(t) = 1) = p(t).
- The probability of being HIV-positive and subsequently classified as recently infected based on the recency test: $P_{rec}(t) = Pr(M \in R, T \leq t ∣ A(t) = 1)$.

The variable M denotes the biomarker values of the HIV recency test and $R$ is a region for those values that classifies a subject as HIV recent. One example of a recently proposed HIV recency test is based on the combination of three biomarkers: LAg Avidity assay, BioRad Avidity assay, and viral load. For this test, the test-recent region $R$ is defined as LAg Avidity OD_n < 2.8, BioRad Avidity OD_n < 95%, and viral load > 400 copies/ml¹⁴.

At time t when the cross-sectional sample is taken, the probability of test-recent, i.e., $M \in R$ , shall depend on the true infection duration t − T. In particular, we define the duration-specific test-recent probability

ϕ (u, t) = Pr (M \in R ∣ T = t - u, A (t) = 1),

for infection duration u ≥ 0. Since the recency test is always applied to an HIV-positive subject that is eligible at time t, ϕ(u, t) is a probability conditional on A(t) = 1. We assume that ϕ(u, t) depends only on u, the infection duration, and does not depend on t. That is, the calendar time when the test is taken is irrelevant to test accuracy given a fixed infection duration, and we denote the quantity as ϕ(u), i.e., ϕ(u) ≔ ϕ(u, t) with u ∈ [0, ∞).

Summary measures of the duration-specific test-recent probability function ϕ(u) are suggested in literature to describe recency test properties and are used as parameters in cross-sectional incidence estimation. For example, the mean window period (μ) used in the snapshot estimator can be defined as an integration of ϕ(·) (as long as the integration is finite):

μ = \int_{0}^{\infty} ϕ (u) d u .

The mean duration of recent infection (MDRI, $Ω_{T^{*}}$) in the adjusted estimator is defined as a truncated integration of ϕ(·) from 0 to T*:

Ω_{T^{*}} = \int_{0}^{T^{*}} ϕ (u) d u .

The false-recent rate (FRR, $β_{T^{*}}$) is defined as the probability that a randomly chosen person from the population of long-infected subjects (i.e., those with an infection duration longer than T*) will be classified as “recently” infected by the recency test²¹. Let G(u) be the distribution of infection times among these long-infected subjects. Then $β_{T^{*}}$ can be written as

β_{T^{*}} = \frac{\int_{T^{*}}^{\infty} ϕ (u) d G (u)}{\int_{T^{*}}^{\infty} d G (u)} .
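To make the definitions of $Ω_{T^{*}}$ and $β_{T^{*}}$ concrete, the following sketch evaluates them by numerical integration. The curve `phi`, the two uniform duration distributions standing in for G, and the helper names are all hypothetical; they are chosen only to show that the FRR changes with G when ϕ(u) is not constant beyond T*.

```python
import math


def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def phi(u):
    # Hypothetical test-recent probability: fast early decay plus a slowly rising tail.
    return math.exp(-4.0 * u) + 0.002 * u


def mdri(phi_fn, t_star):
    """Omega_{T*}: integral of phi over [0, T*]."""
    return integrate(phi_fn, 0.0, t_star)


def frr(phi_fn, g, t_star, upper):
    """beta_{T*} for a (possibly unnormalized) density g of durations on (t_star, upper]."""
    numerator = integrate(lambda u: phi_fn(u) * g(u), t_star, upper)
    denominator = integrate(g, t_star, upper)
    return numerator / denominator


T_STAR = 2.0
omega = mdri(phi, T_STAR)
# Two hypothetical long-infected populations: durations uniform on (2, 6] or on (2, 20].
beta_short = frr(phi, lambda u: 1.0, T_STAR, 6.0)
beta_long = frr(phi, lambda u: 1.0, T_STAR, 20.0)
print(omega, beta_short, beta_long)
```

Because this hypothetical ϕ(u) keeps rising after T*, the population with longer infection durations yields the larger FRR, even though the test itself is unchanged.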

Remark 1. In Kassanjee et al.(2012)²¹, the MDRI is defined as

\int_{0}^{T^{*}} P_{R} (u) d u,

where P_R(u) is the probability of a person infected u time units ago still being alive and “recent” (in this setting being alive is the only eligibility criterion). That is,

P_{R} (u) = Pr (A (t) = 1, M \in R ∣ T = t - u, A (t - u) = 1),

with u ∈ [0, ∞). This definition is different from ours in that ϕ(u) conditions on A(t) = 1 but P_R(u) involves the probability of A(t) = 1 conditioning on A(t − u) = 1. Determining the MDRI of a recency test typically involves sampling eligible individuals with known infection duration (to some reasonable approximation). In order to be sampled at time t, they need to be eligible at the current time t (A(t) = 1), instead of eligible at the time of infection (A(t − u) = 1). Therefore, our definition of MDRI based on ϕ(u) is the one that is aligned with the sampling strategy of studies that are conducted in practice.

Based on our notation, the test-recent probability is given by

\begin{aligned}
P_{rec}(t) &= \int_{0}^{\infty} Pr(T = t - u, M \in R ∣ A(t) = 1) du \\
&= \int_{0}^{\infty} Pr(M \in R ∣ T = t - u, A(t) = 1) Pr(T = t - u ∣ T \leq t, A(t) = 1) Pr(T \leq t ∣ A(t) = 1) du \\
&= p(t) \int_{0}^{\infty} ϕ(u) Pr(T = t - u ∣ T \leq t, A(t) = 1) du .
\end{aligned}

That is, the test-recent probability is a weighted version of the duration-specific test-recent probabilities, where the weight is related to the distribution of the infection time for those infected and eligible at time t. The probability Pr(T = t − u|T ≤ t, A(t) = 1) is not directly linked to λ(t), the quantity of interest. Some further assumptions are needed to construct this linkage such that the estimation of λ(t) based on the cross-sectional sample is valid.

2.3 |. Assumptions for Cross-Sectional Estimators

Suppose that the prevalence function of HIV, p(t), is continuous over time. We introduce the following set of assumptions for cross-sectional incidence estimation.

Assumption A.1. ϕ(u) = 0 for u greater than some large value. Let τ be the largest value of u at which ϕ(u) is positive, i.e., τ = max{u : ϕ(u) > 0}.

Assumption A.2. Pr(T = t − u|T ≤ t, A(t) = 1) = Pr(T = t|T ≤ t, A(t) = 1) for all u ∈ [0, τ].

Assumption A.1 indicates that the tail of ϕ(u) goes to zero when u is large, indicating zero test-recent probability for a subject infected long enough. It would ensure that the mean window period μ is finite, which is a key requirement for the validity of the snapshot estimator. Assumption A.2 suggests that the infection time is uniformly distributed in [t − τ, t] for an infected eligible subject at time t. Note that this is not necessarily equivalent to a constant incidence in [t − τ, t], and we will discuss this in detail in Section 2.4. Given Assumptions A.1–A.2, the test-recent probability can be written as

\begin{aligned}
P_{rec}(t) &= p(t) \int_{0}^{\infty} ϕ(u) du \, Pr(T = t ∣ T \leq t, A(t) = 1) \\
&= μ Pr(T = t ∣ A(t) = 1) \\
&= μ Pr(T = t ∣ T \geq t, A(t) = 1) Pr(T \geq t ∣ A(t) = 1) \\
&= μ λ(t) \{1 - p(t-)\} \\
&= μ λ(t) \{1 - p(t)\},
\end{aligned}

such that

λ(t) = \frac{P_{rec}(t)}{μ \{1 - p(t)\}} .

By replacing the parameters with their estimators, an estimator for λ(t) can be formulated as

\hat{λ} = \frac{N_{rec}}{N_{neg} \hat{μ}} .

This estimator is indeed the snapshot estimator¹⁵.

Some alternative assumptions may be considered for the adjusted estimator²¹.

Assumption B.1. ϕ(u) is constant for u ≥ T*. The constant value is given by $β_{T^{*}}$ .

Assumption B.2. Pr(T = t − u|T ≤ t, A(t) = 1) = Pr(T = t|T ≤ t, A(t) = 1) for all u ∈ [0, T*].

Assumption B.1 allows a non-zero test-recent probability for a long-infected subject, but it restricts this probability to be constant. Otherwise, the false-recent rate would depend on G(·), the distribution of infection time with respect to which the false-recent rate is evaluated. Assumption B.1 may not necessarily be less restrictive than Assumption A.1. For example, Assay 2A in our simulation shown in Figure 1 satisfies Assumption A.1 but not Assumption B.1. Assumption B.2 suggests that the infection time is uniformly distributed in [t − T*, t] for an infected eligible subject at time t. Because T* is usually smaller than τ, Assumption B.2 is less restrictive than Assumption A.2: the uniform distribution requirement on infection times applies to a shorter time span in the past. Given Assumptions B.1 and B.2, the probability of test-recent can be written as

\begin{aligned}
P_{rec}(t) &= p(t) \left\{ \int_{0}^{T^{*}} ϕ(u) Pr(T = t - u ∣ T \leq t, A(t) = 1) du + \int_{T^{*}}^{\infty} β_{T^{*}} Pr(T = t - u ∣ T \leq t, A(t) = 1) du \right\} \\
&= p(t) \left[ \int_{0}^{T^{*}} ϕ(u) Pr(T = t - u ∣ T \leq t, A(t) = 1) du + β_{T^{*}} \left\{ 1 - \int_{0}^{T^{*}} Pr(T = t - u ∣ T \leq t, A(t) = 1) du \right\} \right] \\
&= p(t) \left[ Ω_{T^{*}} Pr(T = t ∣ T \leq t, A(t) = 1) + β_{T^{*}} \{1 - T^{*} Pr(T = t ∣ T \leq t, A(t) = 1)\} \right] \\
&= (Ω_{T^{*}} - β_{T^{*}} T^{*}) Pr(T = t ∣ A(t) = 1) + p(t) β_{T^{*}} \\
&= (Ω_{T^{*}} - β_{T^{*}} T^{*}) Pr(T = t ∣ T \geq t, A(t) = 1) Pr(T \geq t ∣ A(t) = 1) + p(t) β_{T^{*}} \\
&= (Ω_{T^{*}} - β_{T^{*}} T^{*}) λ(t) \{1 - p(t)\} + p(t) β_{T^{*}},
\end{aligned}

such that

λ(t) = \frac{P_{rec}(t) - β_{T^{*}} p(t)}{\{1 - p(t)\} (Ω_{T^{*}} - β_{T^{*}} T^{*})} .

Then, an estimator for λ(t) can be formulated as

\tilde{λ} = \frac{N_{rec} - N_{pos} {\hat{β}}_{T^{*}}}{N_{neg} ({\hat{Ω}}_{T^{*}} - {\hat{β}}_{T^{*}} T^{*})},

which is the adjusted estimator²¹.
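The derivation can be checked numerically with a round trip: compute the test-recent probability implied by Assumptions B.1 and B.2, convert it to expected counts, and confirm that the adjusted estimator returns the incidence that was put in. This is only an algebraic sanity check under stated assumptions. The sample size is hypothetical, the incidence and prevalence are those of the constant-incidence setting in Section 3.1, the MDRI and FRR are those quoted for Test 1B in Section 3.2, and the 365.25 days-per-year conversion is an assumption of this sketch.

```python
def p_rec_under_b(lam, p, omega, beta, t_star):
    """Test-recent probability implied by Assumptions B.1 and B.2."""
    return (omega - beta * t_star) * lam * (1.0 - p) + p * beta


def adjusted_estimator(n_rec, n_neg, n_pos, omega, beta, t_star):
    """Adjusted estimator with known (not estimated) MDRI and FRR."""
    return (n_rec - n_pos * beta) / (n_neg * (omega - beta * t_star))


lam, p = 0.032, 0.29                 # constant-incidence setting (Section 3.1)
omega = 98 / 365.25                  # MDRI of 98 days in years (conversion is an assumption)
beta, t_star = 0.014, 2.0            # FRR and cutoff for Test 1B (Section 3.2)
n = 5000                             # hypothetical sample size

# Expected (non-integer) counts implied by the model.
n_pos = n * p
n_neg = n * (1.0 - p)
n_rec = n * p_rec_under_b(lam, p, omega, beta, t_star)

lam_back = adjusted_estimator(n_rec, n_neg, n_pos, omega, beta, t_star)
print(lam_back)
```

With observed counts in place of expected counts, the estimate varies around this value because of sampling variability in N_rec, N_pos, and N_neg.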

Figure 1. Plot of duration-specific test-recent probability ϕ(t) for eight recency tests. Left: Tests 1A-D. Right: Tests 2A-D.

Assumption B.1 requires a constant ϕ(u) for u ≥ T*, such that the false-recent rate $β_{T^{*}}$ no longer depends on the distribution of the long-infected population G(·). Then, an unbiased estimate of $β_{T^{*}}$ can be obtained by taking the average test-recent rate among an arbitrary sample of long-infected subjects. In practice, ϕ(u) may be non-constant for u > T*. In that case, the summary FRR $β_{T^{*}}$ depends on the distribution G(·) and is context-specific. For example, it may depend on the demographic and epidemiological history of the population²⁴. An estimate ${\hat{β}}_{T^{*}}$ therefore depends on the distribution of the long-infected subjects from which it is estimated. In practice, researchers usually prefer a recency test with a small FRR (< 2%), so that ϕ(u) can be viewed as approximately constant for u > T*.

Notice that we have made minimal assumptions on the shape of ϕ(u) (Assumptions A.1 and B.1). In practice, it is desirable to have ϕ(u) monotone decreasing in u such that a subject with shorter duration of infection is likely to be identified as recent with higher probability. This is particularly relevant when incidence is non-constant, such that the snapshot and adjusted estimators estimate a weighted incidence with a larger weight on recent incidence (more details in Section 2.5).

Remark 2. In the case when Assumption B.1 is violated, use of the adjusted estimator may still be appropriate if FRR is evaluated among a similar population as the long-infected subjects in the cross-sectional sample. Specifically, if Pr(T = t − u|T ≤ t, A(t) = 1)du = dG(u) for u ≥ T*, then

\begin{aligned}
\int_{T^{*}}^{\infty} ϕ(u) Pr(T = t - u ∣ T \leq t, A(t) = 1) du &= \int_{T^{*}}^{\infty} ϕ(u) dG(u) \\
&= β_{T^{*}} \int_{T^{*}}^{\infty} dG(u) \\
&= β_{T^{*}} \int_{T^{*}}^{\infty} Pr(T = t - u ∣ T \leq t, A(t) = 1) du \\
&= β_{T^{*}} \left\{ 1 - \int_{0}^{T^{*}} Pr(T = t - u ∣ T \leq t, A(t) = 1) du \right\} \\
&= β_{T^{*}} \{1 - T^{*} Pr(T = t ∣ T \leq t, A(t) = 1)\},
\end{aligned}

where the last equality follows from Assumption B.2. Then, the derivations for P_rec(t) that lead to the adjusted estimator still hold. Therefore, we may still appropriately use the adjusted estimator if the distribution of the long-infected subjects in the cross-sectional sample is the same as that in the external evaluation study from which ${\hat{β}}_{T^{*}}$ is estimated.

2.4 |. Distribution of Infection Time in the Eligible Population

In the derivations for both estimators, one key assumption is that Pr(T = s|T ≤ t, A(t) = 1) = Pr(T = t|T ≤ t, A(t) = 1) for s ∈ [t − c, t], where c = τ for the snapshot estimator and c = T* for the adjusted estimator. A similar assumption was also suggested in Mahiane et al.²⁵ in describing the sensitivity and specificity of a recency biomarker.

Write $λ_{t}^{*} (s) = Pr (T = s ∣ T \geq s, A (t) = 1)$ and $p_{t}^{*} (s) = Pr (T \leq s ∣ A (t) = 1)$ as the incidence and prevalence at time s restricted to the eligible population at time t. Note that $λ_{t}^{*} (s)$ differs from the λ(s) defined in (3), since they are restricted to the populations that are eligible at different times. Obviously we have $λ_{t}^{*} (t) = λ (t)$ and $p_{t}^{*} (t) = p (t)$ . The key quantity Pr(T = s|T ≤ t, A(t) = 1) can be written as

Pr(T = s ∣ T \leq t, A(t) = 1) = \frac{Pr(T = s ∣ T \geq s, A(t) = 1) Pr(T \geq s ∣ A(t) = 1)}{Pr(T \leq t ∣ A(t) = 1)} = \frac{λ_{t}^{*}(s) \{1 - p_{t}^{*}(s-)\}}{p(t)} .

To connect $λ_{t}^{*} (s)$ and $p_{t}^{*} (s)$ with the observed incidence λ(t) and prevalence p(t), we make the following assumption.

Assumption C. For all t and all s ∈ [t − c, t], the restricted incidence is equal to the unrestricted (or observed) incidence, i.e., $λ_{t}^{*} (s) = λ (s)$, and the restricted prevalence is equal to the unrestricted prevalence, i.e., $p_{t}^{*} (s) = p (s)$.

This assumption would approximately hold when c is small. Specifically, if A(t) is defined by characteristics such that only a small proportion of the subjects move in and out of the eligible population in a time span of c, then the eligible population remains approximately the same, i.e., A(s) ≈ A(t) for s ∈ [t − c, t]. When A(t) is defined by covariate values such as membership of a particular population (e.g., MSM), this assumption requires that most of the subjects who were part of this population at time s are also part of this population at time t. Based on Assumption C,

Pr(T = s ∣ T \leq t, A(t) = 1) = \frac{λ(s) \{1 - p(s)\}}{p(t)}

for s ∈ [t − c, t]. Therefore, Assumptions A.2 and B.2 would hold if the incidence and prevalence are constant over [t − c, t] (c = τ for the snapshot estimator and c = T* for the adjusted estimator). Specifically, we consider the following assumption.

Assumption D. For all t, λ(s) = λ(t) and p(s) = p(t) for s ∈ [t − c, t].

Assumption D suggests a constant prevalence of HIV in the eligible population over a period of c time units. Even though p(s) and p(t) concern different populations (those with A(s) = 1 and A(t) = 1), there may be substantial overlap between the populations if the definition of eligibility is general (e.g., individuals aged 18 years or older). Furthermore, this assumption on constant prevalence may be verified based on HIV prevalence surveillance data for specific populations and time periods (e.g., Silom Community Clinic in Bangkok, Thailand (2005–2018; see Figure 2 in Pattanasin et al. 2020²⁶) and Ethiopia (1990–2016; see Figures 1 and 2 in Girum et al. 2018²⁷)).

To summarize these assumptions, the consistency of the snapshot estimator and adjusted estimator is given in the following theorems.

Theorem 1. Suppose that Assumptions C and D hold for c = τ. Then, Assumption A.2 holds. If we further assume Assumption A.1, then the snapshot estimator $\hat{λ}$ is unbiased for estimating λ(t).

Theorem 2. Suppose that Assumptions C and D hold for c = T*. Then, Assumption B.2 holds. If we further assume Assumption B.1, then the adjusted estimator $\tilde{λ}$ is unbiased for estimating λ(t).

Remark 3. Under a different set of assumptions, Gao et al. (2020)²⁸ have explored the relationship of the snapshot and adjusted estimators with the likelihood for the cross-sectional samples under different recency test scenarios. With a perfect recency test (i.e., within a “recent” time window of T* years, ϕ(u) = 1 for u ∈ [0, T*] and ϕ(u) = 0 for u > T*), or a recency test that has imperfect sensitivity (ϕ(u) < 1 for u ∈ [0, T*]) but perfect specificity (ϕ(u) = 0 for u > T*), the snapshot estimator is approximately the maximum likelihood estimator, and the adjusted estimator reduces to the snapshot estimator. For a test with imperfect sensitivity and imperfect (but constant) specificity ($ϕ (u) = β_{T^{*}}$ for u > T*), the adjusted estimator is approximately the maximum likelihood estimator.

2.5 |. Violation of Constant Incidence

We have given results on consistency of the snapshot and adjusted estimators with Theorems 1 and 2. The main epidemiological requirement is Assumption D, i.e., incidence and prevalence are constant over a period of time. In this section, we explore the expected bias when Assumption D fails to hold (but all other assumptions hold). Specifically, we assess the bias associated with non-constant incidence λ(t) but constant prevalence. By Assumption C,

P_{rec}(t) = (1 - p) \int_{0}^{\infty} ϕ(u) λ(t - u) du,

where p is the constant prevalence.

We first consider the bias of the snapshot estimator. Given Assumption A.1, the expected value of the snapshot estimator is given by

E (\hat{λ}) = \int_{0}^{τ} \frac{ϕ (u)}{μ} λ (t - u) d u,

which is a weighted version of the incidence over [t − τ, t].

If the incidence λ(s) changes linearly in s ∈ [t − τ, t], i.e., λ(s) = λ(t) + ρ(t − s), then

E(\hat{λ}) = \int_{0}^{τ} \frac{ϕ(u)}{μ} \{λ(t) + ρ u\} du = λ(t) + ρ \frac{\int_{0}^{τ} u ϕ(u) du}{μ} = λ(t - ω),

where $ω = \int_{0}^{τ} u ϕ (u) d u / μ$ is the mean shadow time defined by Kaplan and Brookmeyer¹⁵, indicating that the cross-sectional sample is “casting a shadow” back in time. That is, when the incidence is linearly changing in time and the prevalence is constant, the snapshot estimator estimates the incidence rate ω time units ago. The estimation bias is given by $E (\hat{λ}) - λ (t) = λ (t - ω) - λ (t) = ρ ω$ . For example, if incidence is decreasing, i.e., ρ > 0, the underlying incidence that produced an infection u > 0 time units ago was higher than the current incidence, so the estimate $\hat{λ}$ will have positive bias.

Similarly, for the adjusted estimator, we evaluate the expected value under Assumption B.1, which is given by

E (\tilde{λ}) = \int_{0}^{T^{*}} \frac{ϕ (u) - β_{T^{*}}}{Ω_{T^{*}} - T^{*} β_{T^{*}}} λ (t - u) d u .

When λ(s) = λ(t) + ρ(t − s) for s ∈ [t − T*, t],

E(\tilde{λ}) = λ(t) + ρ \frac{\int_{0}^{T^{*}} u \{ϕ(u) - β_{T^{*}}\} du}{Ω_{T^{*}} - T^{*} β_{T^{*}}} = λ(t - ω^{*}),

where $ω^{*} = \int_{0}^{T^{*}} u \{ϕ (u) - β_{T^{*}}\} d u / (Ω_{T^{*}} - T^{*} β_{T^{*}})$. It can be viewed as a “mean shadow time” for the adjusted estimator with a recency test that satisfies Assumption B.1. The estimation bias is given by $E (\tilde{λ}) - λ (t) = λ (t - ω^{*}) - λ (t) = ρ ω^{*}$.
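The shadow-time results can be illustrated numerically. The sketch below uses a hypothetical triangular ϕ(u) with τ = 1 year for the snapshot estimator, and the same curve lifted by a constant β = 0.02 for the adjusted estimator with T* = 2, so that Assumptions A.1 and B.1 hold respectively. The curves and helper names are assumptions made for illustration; the incidence slope ρ = 0.0028 and λ(t) = 0.032 are taken from the linear setting in Section 3.1.

```python
def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


LAM_T, RHO = 0.032, 0.0028


def incidence_before(u):
    """lambda(t - u): incidence u years before the survey, linear in u."""
    return LAM_T + RHO * u


# Snapshot estimator: hypothetical triangular phi with tau = 1 (Assumption A.1 holds).
TAU = 1.0
phi_snap = lambda u: max(0.0, 1.0 - u)
mu = integrate(phi_snap, 0.0, TAU)
omega_shadow = integrate(lambda u: u * phi_snap(u), 0.0, TAU) / mu
expected_snapshot = integrate(lambda u: phi_snap(u) / mu * incidence_before(u), 0.0, TAU)

# Adjusted estimator: same shape plus a constant FRR beyond it (Assumption B.1 holds).
BETA, T_STAR = 0.02, 2.0
phi_adj = lambda u: BETA + (1.0 - BETA) * max(0.0, 1.0 - u)
denom = integrate(phi_adj, 0.0, T_STAR) - T_STAR * BETA
omega_star = integrate(lambda u: u * (phi_adj(u) - BETA), 0.0, T_STAR) / denom
expected_adjusted = integrate(
    lambda u: (phi_adj(u) - BETA) / denom * incidence_before(u), 0.0, T_STAR
)

print(omega_shadow, expected_snapshot - LAM_T, RHO * omega_shadow)
print(omega_star, expected_adjusted - LAM_T, RHO * omega_star)
```

For these hypothetical curves both shadow times equal one third of a year, so each estimator targets the incidence about four months before the survey and the bias equals ρ/3 in both cases.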

Thus far, we have provided a framework with precisely defined assumptions through which to understand both the snapshot and adjusted estimators. To our knowledge, rigorous derivation of the estimators and their assumptions has not been done by others. In the following section, we evaluate how these estimators perform empirically under realistic epidemiological scenarios and with realistic recency test algorithms.

3 |. NUMERICAL STUDIES

To evaluate the numerical performance of the estimators under various settings, we conducted simulation studies and a simple data analysis. We first describe the simulation setup in Sections 3.1 through 3.6. Throughout the simulations, we assume that Assumption C on approximation of the eligible population always holds. We also assume that prevalence is constant over time. We consider different settings of HIV epidemics and recency tests, in which Assumptions A.1, B.1, and D may or may not hold. We calculate the snapshot and adjusted estimators using (1) and (2), with variance estimators calculated based on Appendix A of Gao et al.²⁸. Importantly, these variance estimators account for variability in estimating μ, $Ω_{T^{*}}$ and $β_{T^{*}}$ from an external study. The data analysis is presented in Section 3.7.

3.1 |. Epidemiological Parameters

We generate practical settings by mimicking the epidemiological dynamics of HIV in a population of men who have sex with men (MSM) attending Silom Community Clinic in Bangkok, Thailand²⁶. In particular, we set the prevalence to be constant at the mean prevalence in 2011–2018 in that population, and generate settings with different incidence trends by modeling the HIV incidence in 2011–2018 with either a linear or a log-linear model, reflecting a linearly or exponentially decreasing incidence. Based on the estimates from the Bangkok MSM data, we consider the following settings corresponding to different trends in HIV incidence.

Constant incidence: λ(s) = 0.032, p = 0.29.
Linearly decreasing incidence: λ(s) = 0.032 + 0.0028(t − s), p = 0.29.
Exponentially decreasing incidence: λ(s) = 0.032 exp{0.07(t − s)}, p = 0.29.

Assumption D is satisfied when the incidence is constant, and it is violated in the linear and exponential settings. We would like to estimate the incidence at time t such that the “true” incidence is 0.032 across all settings. In our simulations, we will assess how violating this assumption affects the performance of the snapshot and adjusted estimators.
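The three settings can be written as functions of the time before the survey, u = t − s in years, which makes it easy to see that they agree at the survey time and diverge further back. This is a minimal sketch; the function and constant names are hypothetical.

```python
import math

LAMBDA_T = 0.032     # incidence at the survey time t, common to all settings
PREVALENCE = 0.29    # constant prevalence in all settings


def incidence_constant(years_before):
    return LAMBDA_T


def incidence_linear(years_before):
    # lambda(s) = 0.032 + 0.0028 (t - s)
    return LAMBDA_T + 0.0028 * years_before


def incidence_exponential(years_before):
    # lambda(s) = 0.032 exp{0.07 (t - s)}
    return LAMBDA_T * math.exp(0.07 * years_before)


settings = {
    "constant": incidence_constant,
    "linear": incidence_linear,
    "exponential": incidence_exponential,
}
for name, fn in settings.items():
    # Incidence at the survey time and 2 and 5 years earlier.
    print(name, [round(fn(u), 4) for u in (0.0, 2.0, 5.0)])
```

All three functions return 0.032 at u = 0, the target of estimation; the linear and exponential settings were both higher in the past, which is the direction in which the estimators are expected to be biased according to Section 2.5.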

3.2 |. Recency Test Characteristics

We sought to assess the performance of the two estimators with a variety of recency tests with different characteristics. The properties of those simulated recency tests mimic two tests in Brookmeyer et al.¹¹ and Laeyendecker et al.¹⁴, with modifications that allow us to assess the performance of the estimators under diverse conditions. For the snapshot estimator, we set τ = 12. For the adjusted estimator, we always considered T* = 2, i.e., any person with an infection acquired longer than 2 years ago is a “long-infected” case.

We first consider a set of recency tests with a relatively short mean window period and a short shadow period, mimicking a recency test in Brookmeyer et al.¹¹ that classifies a subject as recent if their BED capture enzyme immunoassay (BED-CEIA) ≤ 1.5, their Bio-Rad Avidity (BRAI) (Bio-Rad Laboratories, Mississauga, ON) < 40, and their viral load > 400 copies/ml. This test has a mean window period of 101 days and a shadow period of 194 days. We generated four different recency tests that mimic this test:

(1A)
ϕ_1A(t) = 1 − F_Gamma(t; α = 0.352, β = 1.273), where F_Gamma(·; α, β) is the cumulative distribution function of a Gamma random variable with shape α and rate β. Assumption A.1 (approximately) holds for this test with mean window period 101 days and mean shadow 194 days. Assumption B.1 fails to hold since ϕ_1A(t) is non-constant for t ≥ 2. MDRI = 98 days and the test-recent rate probability at t = 2 is 1.4%.
(1B)
ϕ_1B(t) = ϕ_1A(t)I(t ≤ 2) + 0.014I(t > 2). This test modifies test 1A by carrying forward the 1.4% test-recent probability at t = 2, such that Assumption B.1 for the adjusted estimator holds with MDRI = 98 days and FRR = 1.4%. Assumption A.1 for the snapshot estimator no longer holds such that the mean window period is infinite.
(1C)
ϕ_1C(t) = ϕ_1B(t) + f_N(t; 7, 1)/8, where f_N(t; μ, σ) is the density function of a normal random variable with mean μ and standard deviation σ. This test further modifies test 1B by adding a normally distributed spike centered at 7 years, such that Assumption B.1 on a constant false-recent rate no longer holds. This test, similar to that depicted in the figure of epidemiological and test-recent dynamics in Kassanjee et al.²¹, represents a setting in which individuals who have been on antiretroviral therapy for years may have biomarker profiles similar to those who have been recently infected, and thus the false-recent rate among those individuals is relatively higher.
(1D)
ϕ_1D(t) = ϕ_1B(t) + F_N(t; 10, 2)/10, where F_N(t; μ, σ) is the cumulative distribution function of a normal random variable with mean μ and standard deviation σ. This test modifies test 1B by steadily increasing the false-recent rate starting around 6 years, reaching 9.8% at 12 years. The high false-recent rate is motivated by the BED assay⁵, which has been shown to have an FRR of up to 15% in some populations²⁹.

Additionally, we consider another set of recency tests with a longer mean window period and a longer shadow period. This set of tests was modeled after the one presented in Laeyendecker et al.¹⁴ for HIV subtype C, which classifies a subject as recent if their LAg Avidity (Sedia HIV-1 LAg Avidity EIA; Sedia Biosciences Corporation, Portland, OR, USA) ODn ≤ 2.8, their BRAI ≤ 95, and their viral load > 400 copies/ml. It has a mean window period of 248 days and a shadow period of 306 days. We generated four recency tests that mimic this test:

(2A)
ϕ_2A(t) = 1 − F_Gamma(t; α = 0.681, β = 1.003). Assumption A.1 holds for this test, with a mean window period of 248 days and a shadow period of 306 days. Assumption B.1 fails to hold since ϕ_2A(t) is non-constant for t ≥ 2. The MDRI is 224 days and the test-recent probability at t = 2 is 7.25%.
(2B)
ϕ_2B(t) = ϕ_2A(t)I(t ≤ 3.17) + 0.020I(t > 3.17). This recency test has a constant 2% test-recent probability when t ≥ 3.17. Unlike test 1B, Assumption B.1 for the adjusted estimator is violated since the test-recent rate is non-constant after year 2, as depicted in Figure 1 by the shaded grey region.
(2C)
ϕ_2C(t) = ϕ_2B(t) + f_N(t; 7, 1)/8. A similar normally distributed spike centered at 7 years was added to test 2B.
(2D)
ϕ_2D(t) = ϕ_2B(t) + F_N(t; 10, 2)/10. Similar to ϕ_1D, FRR increases up to 10.4% at time 12.
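
The eight duration-specific test-recent probability functions defined above can be written down directly. The following is a minimal Python sketch and not the authors' code: the helper names (`gamma_cdf`, `make_tests`) are hypothetical, durations are in years, and the Gamma distribution function is evaluated with a hand-rolled power series so that only the standard library is needed.

```python
import math

def gamma_cdf(t, shape, rate):
    """Gamma CDF with the given shape and rate, via the power series
    of the regularized lower incomplete gamma function."""
    x = rate * t
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    for n in range(1, 1000):
        term *= x / (shape + n)
        total += term
        if term < 1e-17 * total:
            break
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))

def norm_pdf(t, mu, sigma):
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def norm_cdf(t, mu, sigma):
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def make_tests(shape, rate, cut, tail):
    """Build the A-D variants for one Gamma baseline.
    cut: duration after which the constant tail applies (B-D variants)
    tail: constant test-recent probability carried past `cut`"""
    def phi_a(t):
        # clamp guards against tiny negative values from rounding
        return max(0.0, 1.0 - gamma_cdf(t, shape, rate))
    def phi_b(t):
        return phi_a(t) if t <= cut else tail
    def phi_c(t):
        # spike centered at 7 years
        return phi_b(t) + norm_pdf(t, 7.0, 1.0) / 8.0
    def phi_d(t):
        # steadily increasing false-recent rate
        return phi_b(t) + norm_cdf(t, 10.0, 2.0) / 10.0
    return {"A": phi_a, "B": phi_b, "C": phi_c, "D": phi_d}

tests_1 = make_tests(shape=0.352, rate=1.273, cut=2.0, tail=0.014)
tests_2 = make_tests(shape=0.681, rate=1.003, cut=3.17, tail=0.020)

for label, tests in (("1", tests_1), ("2", tests_2)):
    print(label, round(tests["A"](2.0), 4), round(tests["D"](12.0), 4))
```

Evaluated at t = 2, the A variants should land close to the 1.4% and 7.25% quoted for tests 1A and 2A, and the D variants at t = 12 should land close to the quoted 9.8% and 10.4%.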

The duration-specific test-recent probabilities of all eight recency tests are depicted in Figure 1. The shaded grey area highlights that Assumption B.1 is violated for test 2B, even though it holds for test 1B: there is a non-constant fraction of the long-infected subjects (infection duration > 2) who test recent.

3.3 |. Data Simulation Procedure

The data simulation procedure consists of two parts. The first part requires simulating data to mimic an external study based on which we estimate the properties of a particular recency test, including mean window period, MDRI and FRR. The second part includes simulation of cross-sectional samples from a population with given HIV epidemiological dynamics. These separate data simulation procedures are outlined in sections 3.3.1 and 3.3.2, respectively.

3.3.1 |. External Study and Estimation of Recency Assay Parameters

Here we outline the process of simulating recency test results for samples (with known infection durations) in an external study and estimating recency test parameters based on such simulated data. We simulate infection durations in the external study similar to those in Duong et al.⁷, with the detailed procedure described in Section S1.1 in the supplementary material. Then, given a duration-specific test-recent probability ϕ(·), we generate a test-recent indicator by Δ_ij ~ Bernoulli(ϕ(u_ij)), where u_ij is the simulated infection duration for sample j = 1, …, n_i of subject i = 1, …, m in the external study. Based on the observed data {(u_ij, Δ_ij) : i = 1, …, m; j = 1, …, n_i} in the external study, we estimate the function ϕ(t) using generalized estimating equations³⁰, with an exchangeable correlation structure accounting for within-subject correlation. The marginal model uses a logit link and a cubic polynomial for the linear predictor by assuming

logit (E [Δ_{i j}]) = γ_{0} + γ_{1} u_{i j} + γ_{2} u_{i j}^{2} + γ_{3} u_{i j}^{3} .

Then, an estimate for ϕ(u) can be constructed by $\hat{ϕ}(u) = \operatorname{expit}({\hat{γ}}_{0} + {\hat{γ}}_{1} u + {\hat{γ}}_{2} u^{2} + {\hat{γ}}_{3} u^{3})$, where $\operatorname{expit}(x) = 1/(1 + e^{-x})$ is the inverse of the logit link and $\hat{γ} = ({\hat{γ}}_{0}, {\hat{γ}}_{1}, {\hat{γ}}_{2}, {\hat{γ}}_{3})$ is the parameter estimate. We use robust standard errors for variance estimation. We then calculate the mean window period and MDRI by numerically integrating the estimated test-recent probability function $\hat{ϕ}$. In principle, the mean window period requires integrating to infinity; in the simulation, however, we set the upper bound of the integration to the maximum duration observed in the simulated sample (approximately 8 years). We estimate the variance of ${\hat{Ω}}_{T^{*}}$ and $\hat{μ}$ using the delta method and the robust variance-covariance matrix of $\hat{γ}$. We estimate FRR by evaluating the average test-recent probability among a number of long-infected subjects. In particular, we consider 1500 long-infected subjects with duration of infection uniformly distributed between T* = 2 and τ = 12 years, similar to other studies²⁴.
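
The numerical integration step can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it takes a coefficient vector as given (the GEE fit itself is not shown), inverts the logit link to obtain the fitted test-recent probability, and uses a plain composite trapezoid rule. The function names and the example coefficients are hypothetical.

```python
import math

def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def phi_hat(u, gamma):
    """Fitted test-recent probability at duration u (years)
    under the cubic-polynomial logit model."""
    g0, g1, g2, g3 = gamma
    return expit(g0 + g1 * u + g2 * u ** 2 + g3 * u ** 3)

def integrate(f, lower, upper, n=20000):
    """Composite trapezoid rule on [lower, upper] with n subintervals."""
    h = (upper - lower) / n
    total = 0.5 * (f(lower) + f(upper))
    for k in range(1, n):
        total += f(lower + k * h)
    return total * h

def recency_parameters(gamma, t_star=2.0, max_duration=8.0):
    """Return (MDRI, truncated mean window period) in years.
    max_duration stands in for the largest observed duration."""
    phi = lambda u: phi_hat(u, gamma)
    mdri = integrate(phi, 0.0, t_star)
    window = integrate(phi, 0.0, max_duration)
    return mdri, window

# Hypothetical coefficients, chosen only so that phi_hat decays with duration.
gamma_example = (2.0, -6.0, 1.0, -0.05)
mdri, window = recency_parameters(gamma_example)
print(round(mdri * 365.25, 1), round(window * 365.25, 1))
```

Because the fitted function is positive everywhere, the truncated mean window period is never smaller than the MDRI, and it can keep growing with the upper bound if the fitted tail does not decay; this is one reason the choice of upper bound matters.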

3.3.2 |. Simulation of Cross-Sectional Samples

To generate a cross-sectional sample with fixed size N, we first simulate the number of HIV positive subjects N_pos ~ Binomial(N, p), where p is the (constant) prevalence. The number of HIV negative subjects is given by N_neg = N − N_pos. For each HIV-positive subject i = 1, …, N_pos, we simulate their infection time T_i, and hence their infection duration t − T_i at the cross-sectional time t, based on the epidemic parameters (p, λ(t)), with details given in Section S1.2 in the supplementary material. Given T_i, we generate a recency test indicator Δ_i ~ Bernoulli(ϕ(t − T_i)). Finally, we calculate $N_{r e c} = \sum_{i = 1}^{N_{p o s}} Δ_{i}$ .
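
The sampling steps above can be sketched in a few lines of Python. The duration distribution used in the paper is described in Section S1.2 of the supplementary material and is not reproduced here, so the sketch takes it as a user-supplied function; the uniform draw, the exponential ϕ, and the prevalence value below are placeholders chosen only to make the example run.

```python
import math
import random

def simulate_cross_section(n, prevalence, phi, draw_duration, rng):
    """Simulate (N_rec, N_pos, N_neg) for one cross-sectional sample.
    phi: duration-specific test-recent probability (duration in years)
    draw_duration: function rng -> infection duration t - T_i"""
    # N_pos ~ Binomial(n, prevalence), drawn as a sum of Bernoulli trials
    n_pos = sum(rng.random() < prevalence for _ in range(n))
    n_neg = n - n_pos
    n_rec = 0
    for _ in range(n_pos):
        duration = draw_duration(rng)
        # Delta_i ~ Bernoulli(phi(duration))
        if rng.random() < phi(duration):
            n_rec += 1
    return n_rec, n_pos, n_neg

def phi_toy(duration):
    # Placeholder test-recent probability, not one of tests 1A-2D
    return math.exp(-3.0 * duration)

def uniform_duration(rng):
    # Placeholder duration distribution, not the one in Section S1.2
    return rng.uniform(0.0, 12.0)

rng = random.Random(2022)
n_rec, n_pos, n_neg = simulate_cross_section(
    n=5000, prevalence=0.25, phi=phi_toy,
    draw_duration=uniform_duration, rng=rng)
print(n_rec, n_pos, n_neg)
```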

3.3.3 |. Incidence Estimation

For each simulation replicate, we first generate observations of an external study to obtain the estimates $\hat{μ}$ , ${\hat{Ω}}_{T^{*}}$ and ${\hat{β}}_{T^{*}}$ . Then, we generate an independent cross-sectional sample {N_rec, N_pos, N_neg}. We calculate $\hat{λ}$ by Equation (1) and calculate $\tilde{λ} (t)$ by Equation (2). We estimate their variances based on the formulas in Appendix A of Gao et al.²⁸. Confidence intervals are constructed on the incidence scale to accommodate potential non-positive estimates.
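
Equations (1) and (2) are defined earlier in the paper and are not reproduced in this section. As a hedged sketch, the functions below use the usual snapshot form and the Kassanjee-type adjusted form, which are assumed here to match those equations; the function and argument names are hypothetical, and all durations are in years.

```python
def snapshot_estimate(n_rec, n_neg, mu):
    """Snapshot-type estimate: recent cases per unit of
    (mean window period x number of HIV-negative subjects)."""
    return n_rec / (mu * n_neg)

def adjusted_estimate(n_rec, n_pos, n_neg, omega, beta, t_star):
    """Adjusted-type estimate: subtracts the expected number of
    false-recent results (beta * n_pos) and shortens the effective
    recency duration by beta * t_star."""
    return (n_rec - beta * n_pos) / (n_neg * (omega - beta * t_star))

# Made-up counts, only to show the call pattern.
print(snapshot_estimate(n_rec=20, n_neg=1000, mu=0.5))
print(adjusted_estimate(n_rec=20, n_pos=100, n_neg=1000,
                        omega=0.5, beta=0.01, t_star=2.0))
```

With β = 0 and Ω = μ the two forms coincide, which is a quick sanity check on any implementation. The adjusted numerator can be negative when few subjects test recent, which is consistent with the remark above about non-positive estimates.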

3.4 |. Software

The code to reproduce these simulations and instructions for using the functions that estimate incidence from cross-sectional data are available at https://github.com/mbannick/XSRecency/, version 0.1.0.

3.5 |. Results

Table 1 shows the simulation results in settings with constant, linear, and exponential incidence trends and recency tests 1A-D and 2A-D, with fixed cross-sectional trial sample size N = 5000. Across all settings, the “true” value for incidence is λ = 0.032. Each entry is based on 5,000 simulations.

TABLE 1.

Summary statistics (×10⁻²) for the simulation studies with different settings over 5000 simulations each. For each epidemiological setting and recency test, we show the empirical median bias (Bias), the empirical standard error (SE), the average standard error estimate (SEE), and the empirical coverage probability of the 95% confidence intervals (Cov). We note whether the assumptions are satisfied for the snapshot and adjusted estimator in the Asm. column.

Setting		Snapshot Estimator (1)					Adjusted Estimator (2)
Incidence	Assay	Asm.	Bias	SE	SEE	Cov	Asm.	Bias	SE	SEE	Cov
Recency Assay 1A-D
Constant	1A	✔	0.04	0.63	0.63	94.54	×	0.05	0.67	0.66	94.52
Linear	1A	×	0.21	0.64	0.64	95.24	×	0.22	0.68	0.68	95.00
Exponential	1A	×	0.18	0.64	0.64	95.32	×	0.18	0.68	0.67	95.00
Constant	1B	×	0.79	0.68	0.68	84.24	✔	0.11	0.99	0.99	95.40
Linear	1B	×	0.87	0.69	0.69	81.42	×	0.23	1.00	1.00	95.02
Exponential	1B	×	0.85	0.69	0.69	82.06	×	0.20	1.00	1.00	95.08
Constant	1C	×	0.73	0.71	0.69	84.54	×	−0.01	1.32	1.32	95.58
Linear	1C	×	1.32	0.80	0.76	60.66	×	1.25	1.37	1.38	86.28
Exponential	1C	×	1.33	0.80	0.77	60.24	×	1.28	1.38	1.38	85.70
Constant	1D	×	3.28	0.98	0.97	03.86	×	1.01	1.62	1.63	90.66
Linear	1D	×	1.41	0.77	0.77	54.56	×	−2.45	1.55	1.54	64.23
Exponential	1D	×	1.41	0.77	0.77	54.32	×	−2.44	1.55	1.54	64.59
Recency Assay 2A-D
Constant	2A	✔	0.06	0.40	0.40	95.10	×	0.05	0.46	0.46	95.24
Linear	2A	×	0.29	0.41	0.42	92.44	×	0.33	0.48	0.48	91.72
Exponential	2A	×	0.26	0.41	0.42	93.40	×	0.28	0.47	0.48	92.70
Constant	2B	×	0.48	0.43	0.43	83.74	×	0.08	0.57	0.57	94.96
Linear	2B	×	0.62	0.44	0.44	74.06	×	0.28	0.57	0.58	93.18
Exponential	2B	×	0.59	0.44	0.44	75.74	×	0.25	0.57	0.58	93.78
Constant	2C	×	0.43	0.46	0.44	84.76	×	0.03	0.65	0.65	95.46
Linear	2C	×	0.84	0.49	0.48	58.22	×	0.69	0.67	0.68	84.98
Exponential	2C	×	0.83	0.49	0.48	59.34	×	0.67	0.67	0.68	85.54
Constant	2D	×	1.62	0.52	0.52	09.06	×	0.40	0.72	0.73	92.08
Linear	2D	×	0.88	0.46	0.47	53.68	×	−0.68	0.69	0.69	82.04
Exponential	2D	×	0.87	0.46	0.47	54.52	×	−0.71	0.69	0.69	81.18
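
The four summary columns of Table 1 can be computed from the simulation replicates roughly as follows. This is a sketch under stated assumptions rather than the authors' code: the function name is hypothetical, and coverage is computed from Wald-type intervals on the incidence scale with a 1.96 multiplier.

```python
import statistics

def summarize(estimates, std_errors, truth, z=1.96):
    """Summaries over simulation replicates (on the raw scale;
    multiply by 100 to match the x10^-2 scale of the table)."""
    bias = statistics.median(estimates) - truth   # empirical median bias
    se = statistics.stdev(estimates)              # empirical standard error
    see = statistics.mean(std_errors)             # average SE estimate
    covered = sum(
        est - z * s <= truth <= est + z * s
        for est, s in zip(estimates, std_errors)
    )
    cov = 100.0 * covered / len(estimates)        # coverage in percent
    return {"Bias": bias, "SE": se, "SEE": see, "Cov": cov}

# Three made-up replicates, only to show the call pattern.
result = summarize(
    estimates=[0.030, 0.032, 0.040],
    std_errors=[0.001, 0.001, 0.001],
    truth=0.032,
)
print(result)
```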


3.5.1 |. Snapshot Estimator

Recency assays 1A and 2A satisfy Assumption A.1 for the snapshot estimator. In the constant incidence setting where Assumption D further holds, the empirical bias is small. In the settings where Assumption D fails to hold (linear or exponential incidence), there is empirical bias associated with the snapshot estimator, and the empirical bias is close to the expected bias calculated based on the formula in Section 2.5 (0.15×10⁻² and 0.12×10⁻² for assay 1A in the linear and exponential settings, respectively; 0.20×10⁻² and 0.23×10⁻² for assay 2A in the linear and exponential settings, respectively). For the recency tests that violate Assumption A.1 (assays 1B-D, 2B-D), the empirical bias is larger, and it increases if the constant incidence assumption is also violated.

Across all settings, the standard error estimate is close to the empirical standard error, indicating reasonable estimation of variability. Snapshot estimates based on recency tests 2A-D have smaller variability than those based on 1A-D, corresponding to a larger area under the ϕ(t) curve. The coverage probabilities are close to the nominal level with recency tests 1A and 2A when incidence is constant. There is under-coverage when the incidence is non-constant or when Assumption A.1 fails to hold (recency tests 1B-D, 2B-D).

3.5.2 |. Adjusted Estimator

When incidence is constant, the empirical bias for recency tests 1A-C and 2A-C is small, while the empirical bias for recency tests 1D and 2D is relatively large. When the constant incidence assumption is violated, the empirical bias gets larger, and the empirical bias for assay 1B matches the theoretically calculated expected bias when Assumption B.1 holds (0.10×10⁻² and 0.09×10⁻² for assay 1B for the linear and exponential settings, respectively).

Across all settings, the standard error estimate is close to the empirical standard error, indicating reasonable estimation of variability. Notably, the empirical standard errors for the adjusted estimator are always larger than those for the snapshot estimator. The coverage probabilities are close to the nominal level with recency tests 1A and 1B, even in non-constant incidence settings. The coverage probabilities in the constant incidence setting are always close to the nominal level. Similar to the snapshot estimator, there is under-coverage when the incidence is non-constant for some recency tests. The coverage probabilities are closer to the nominal level compared to the snapshot estimators in most non-constant incidence settings, mainly due to a larger associated variability.

3.6 |. Sensitivity to Distribution of Long-infected Subjects in External Study

Table 1 shows that recency tests 1C and 2C give minimal bias and nominal coverage in the constant incidence setting. It seems to suggest that the adjusted estimator may sometimes be robust to violation of the recency test requirement (Assumption B.1) when the incidence in the population is constant. However, it is the result of a specific choice of distribution of infection duration in the external study, based on which we estimate $β_{T^{*}}$ . The infection times of the long-infected subjects were generated uniformly from 2 to 12 years, such that the estimated ${\hat{β}}_{T^{*}}$ reflects the average false-recent rate in [2,12], similar to the average false-recent rate in a constant incidence setting.

To evaluate sensitivity to this distributional assumption, we consider two different distributions of infection durations for long-infected subjects in the external study in estimating $β_{T^{*}}$ . The first distribution is similar to those who were infected more than two years ago in Duong et al.⁷, where the range of infection duration is [2,8.25] years and the distribution is certainly not uniform (see Figure S1 in the supplementary material). The second distribution is to truncate Duong’s infection durations at 5 years, such that the range of infection duration is [2,5] years.

Simulation results are shown in Table 2. Different sampling schemes for the long-infected subjects provide different estimates for the incidence with recency tests 1C and 2C. In particular, if the distribution of long-infected subjects fails to recover the major trend of the ϕ function (e.g., the spike at 7 years for assays 1C and 2C), the estimator is biased and the coverage is poor. This suggests caution in using the adjusted estimator when Assumption B.1 fails to hold, and that one should be sure that the distribution of long-infected subjects matches with the cross-sectional sample.
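
The mechanism can be illustrated with a short calculation for test 1C: the FRR estimate targets the average of ϕ_1C over the long-infected durations that are sampled, and that average changes when the range misses the spike at 7 years. The sketch below is a simplification and not the authors' procedure: it uses uniform durations on each range (the paper uses the empirical Duong et al. durations for the shorter ranges), and the function names are hypothetical.

```python
import math

def std_norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mean_frr_1c(lower, upper, tail=0.014):
    """Average of phi_1C over durations uniform on [lower, upper],
    for 2 <= lower < upper. Past t = 2, phi_1C(t) is the constant
    tail plus a N(7, 1) density divided by 8, so the average has
    a closed form."""
    spike_mass = std_norm_cdf(upper - 7.0) - std_norm_cdf(lower - 7.0)
    return tail + spike_mass / (8.0 * (upper - lower))

frr_wide = mean_frr_1c(2.0, 12.0)   # range covers the whole spike
frr_short = mean_frr_1c(2.0, 5.0)   # range stops before the spike
print(round(frr_wide, 4), round(frr_short, 4))
```

Under these simplifying assumptions the short range gives a noticeably smaller average false-recent probability than the wide range, so an external study restricted to short durations would understate the FRR, which is in line with the upward bias reported for the [2,5] row of Table 2.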

TABLE 2.

Sensitivity analysis for the adjusted estimator over 5000 simulations for different sampling strategies to estimate $β_{T^{*}}$ . All samples include 1500 subjects (we re-sampled the infection durations in the Duong et al. dataset to get 1500 subjects). Results are shown for recency tests 1C and 2C.

Setting		Adjusted Estimator (2)
Assay	Long-Infected Distribution	Bias	SE	SEE	Cov
1C	Uniform [2,12]	−0.01	1.32	1.32	95.58
1C	Duong et al.⁷ [2,8.25]	0.56	1.49	1.26	88.26
1C	Duong et al.⁷ [2,5]	1.78	1.32	1.13	62.98
2C	Uniform [2,12]	0.03	0.65	0.65	95.46
2C	Duong et al.⁷ [2,8.25]	0.01	0.76	0.66	90.92
2C	Duong et al.⁷ [2,5]	0.36	0.77	0.64	86.82


3.7 |. Data Analysis

To demonstrate the use and performance of the cross-sectional incidence estimators, we perform an analysis on the data from the Akwa Ibom AIDS Indicator Survey, which is a population-based cross-sectional household study reported in Negedu-Momoh et al. 2021³¹. The study was conducted in 31 Local Government Areas of Akwa Ibom state from April to June 2017. Of the 8306 individuals who completed specimen collection and screening for HIV, 394 were found to be HIV sero-positive. Of these 394, 370 with eligible plasma samples were tested for recent HIV infection using the LAg Avidity assay, of which 19 were identified as recent. Even though incidence estimation in Negedu-Momoh et al. 2021³¹ was based on a more complicated algorithm that includes viral load, here we focus on the algorithm with the LAg Avidity assay only for the purpose of illustration.

To assess the property of the LAg Avidity assay under the assumptions of the snapshot and adjusted incidence estimators, we analyzed data from Duong et al. 2015⁷ as in Section 3.3.1 to calculate the mean window period, MDRI, and FRR for the LAg Avidity assay. Data from Duong et al. 2015⁷ suggest that long-infected subjects may be indicated as test-recent by LAg Avidity assay, such that Assumption A.1 is not likely to hold. We estimated a mean window period of 140 days (95% CI: 127–154), an MDRI (based on T* = 1) of 139 days (95% CI: 126–151), and an FRR of 1.6% (95% CI: 1.2%–2.0%).

Using the HIV screening data and properties of the LAg Avidity assay, we estimate the cross-sectional incidence using a slightly modified version of the snapshot and adjusted estimators to account for the fact that not all HIV positive individuals received recency tests (details on the modification are in Section S2 of the supplementary material). The estimated incidence is 6.7 per 1000 person-years (95% CI: 3.6–9.7) based on the snapshot estimator and 4.8 per 1000 person-years (95% CI: 1.6–8.0) based on the adjusted estimator. The difference in the estimated incidences may be due to the fact that Assumption A.1 for the snapshot estimator fails to hold. Similar to the simulation studies, the estimated incidence based on the adjusted estimator is associated with larger variability.
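
As a rough check on the point estimates, the arithmetic can be worked through by hand. The modification in Section S2 is not reproduced here; the calculation below assumes it amounts to scaling the number of recent results by 394/370 (the inverse of the fraction of HIV-positive individuals who were tested), and it assumes the usual snapshot and Kassanjee-type forms for the two estimators. Durations are converted to years using 365.25 days per year.

```latex
% Assumed scaling for untested HIV-positive individuals
N_{rec}^{*} = 19 \times \frac{394}{370} \approx 20.23, \qquad
N_{neg} = 8306 - 394 = 7912

% Snapshot form with mean window period 140 days
\hat{\lambda} \approx \frac{20.23}{7912 \times 140/365.25}
             \approx \frac{20.23}{3032.7} \approx 0.0067

% Adjusted form with MDRI 139 days, FRR 1.6\%, T^{*} = 1
\tilde{\lambda} \approx \frac{20.23 - 0.016 \times 394}
                             {7912 \times (139/365.25 - 0.016 \times 1)}
               \approx \frac{13.93}{2884.4} \approx 0.0048
```

Under these assumptions the results correspond to about 6.7 and 4.8 per 1000 person-years, in agreement with the reported point estimates.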

4 |. DISCUSSION

In this manuscript, we considered a unified statistical framework to assess the assumptions for cross-sectional incidence estimation based on the snapshot and Kassanjee’s adjusted estimators. We established two key assumptions: the incidence and prevalence in the population of interest are constant over the period of time preceding a cross-sectional sample; and the duration-specific test-recent probability function ϕ(t) goes to zero for the snapshot estimator or is constant in the tail for the adjusted estimator. We derived the theoretical biases of the estimators when constant incidence assumption fails to hold. To empirically assess the biases, we conducted simulation studies under various scenarios with different epidemiological settings and different recency test properties.

Indeed, the estimators perform well when their corresponding assumptions hold. When the constant incidence assumption is violated, the numerical bias is commensurate with the theoretically calculated bias. The adjusted estimator is more robust when the assumptions about the recency test properties (Assumption A.1 or B.1) are violated; though to compensate for this, the variability of the adjusted estimator is always larger than that of the snapshot estimator. This robustness to mis-specification makes the adjusted estimator more flexible in the setting where the property of a specific recency test is not precisely known.

There are important differences between the snapshot and adjusted estimators with respect to their requirements. The snapshot estimator requires a finite positive range for ϕ(t) (Assumption A.1). In other words, if someone was infected a sufficiently long time ago, the recency test is perfectly specific. In contrast, the adjusted estimator requires a constant ϕ(t) in the tail (Assumption B.1). In other words, past a certain point, the false-recent test probability is unrelated to infection duration. As illustrated by assay 2A in the simulation studies, Assumption B.1 is not necessarily less restrictive than Assumption A.1, and indeed the snapshot estimator performs better for this specific assay. In practice, to obtain the best performance, the researcher should be cautious and understand the properties of the recency test before choosing whether to apply either of the estimators.

An additional consideration when using the adjusted estimator is that the performance of the adjusted estimator may be affected by the distribution of the long-infected subjects that are used to estimate FRR. As suggested in Remark 2, the bias from the adjusted estimator may be minimal if the distributions of the long-infected subjects in the cross-sectional sample and in the evaluating external study are the same. However, if the range of the infection duration of the long-infected subjects fails to recover a non-constant region of the ϕ function of the recency test, or if the distribution of infection duration differs much from that of long-infected subjects in the cross-sectional sample, biases from estimating FRR lead to bias and under-coverage in the adjusted estimator.

In order to use the adjusted estimator, researchers need to specify a fixed T* beyond which point subjects are regarded as long-infected. In practice, T* is set at 1 or 2 years, and a proper test-recent region $R$ is then chosen to yield a recency test with the desired properties (e.g., large MDRI, FRR<2.0%) defined upon this choice of T*. Choice of T* would affect the performance of the adjusted estimator through its impact on MDRI/FRR and the fact that the adjusted estimator estimates a weighted average incidence in a range of T* prior to the cross-sectional time. In this manuscript, we did not assess the impact of T*, since it would involve extensive modeling for the biomarker values at different infection duration. We wish to explore the effect of different choices of T* in future research.

The performance of the adjusted estimator is sensitive to Assumption B.1, which requires a constant tail of the duration-specific test-recent probability ϕ(t) and is affected by the test-recent region $R$ of the recency test. In particular, the test-recent region $R$ is usually chosen to guarantee this assumption, leading to a potentially small MDRI and suboptimal power for the adjusted estimator. An alternative strategy is to directly model the infection duration and construct an incidence estimator based on a predicted infection duration given recency assay readings. Since it uses the recency assay readings, such a strategy is similar to making use of “recency” results from multiple test-regions, so that more power may be gained. This alternative strategy is currently under investigation.

Even though motivated by efforts in HIV surveillance, cross-sectional incidence estimation has the potential for broad applicability. For example, it can be applied to the screening population of an active-controlled HIV prevention trial to serve as a counterfactual placebo incidence estimate²⁸. It provides a comparator for the observed incidence among the trial participants that receive the active intervention, so that prevention efficacy can be evaluated, under further assumptions on the similarity of the trial and screening populations. In addition, cross-sectional incidence estimation may not be limited to HIV applications and can be applied to other pathogens whose duration of infection detection is finite. For example, incidences estimated from cross-sectional samples of SARS-CoV-2 PCR results from different vaccine/placebo groups may be used to infer vaccine efficacy in a randomized trial³².

Supplementary Material

NIHMS1767850-supplement-supinfo.pdf (293.7 KB, pdf)

ACKNOWLEDGMENTS

The authors thank the Associate Editor and the two anonymous reviewers for helpful comments and suggestions. The work was supported by the U.S. National Institutes of Health grant R56 AI143418. This work was also supported by Scientific Computing Infrastructure at Fred Hutch funded by ORIP grant S10OD028685.

Abbreviations:

FRR: false-recency rate
MDRI: mean duration of recent infection

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of this article.

DATA AVAILABILITY STATEMENT

The LAg Avidity assay data that support the findings of this study are openly available via the link https://doi.org/10.1371/journal.pone.0114947.

References

1. Brookmeyer R. Measuring the HIV/AIDS epidemic: approaches and challenges. Epidemiol Rev. 2010; 32(1): 26–37.
2. Sherr L, Lopman B, Kakowa M, et al. Voluntary counselling and testing: uptake, impact on sexual behaviour, and HIV incidence in a rural Zimbabwean cohort. AIDS 2007; 21(7): 851–860.
3. Brookmeyer R, Quinn TC. Estimation of current human immunodeficiency virus incidence rates from a cross-sectional survey using early diagnostic tests. Am J Epidemiol. 1995; 141(2): 166–172.
4. Janssen RS, Satten GA, Stramer SL, et al. New testing strategy to detect early HIV-1 infection for use in incidence estimates and for clinical and prevention purposes. JAMA 1998; 280(1): 42–48.
5. Parekh BS, Kennedy MS, Dobbs T, et al. Quantitative detection of increasing HIV type 1 antibodies after seroconversion: a simple assay for detecting recent HIV infection and estimating incidence. AIDS Res Hum Retrov. 2002; 18(4): 295–307.
6. Suligoi B, Galli C, Massi M, et al. Precision and accuracy of a procedure for detecting recent human immunodeficiency virus infections by calculating the antibody avidity index by an automated immunoassay-based method. J Clin Microbiol. 2002; 40(11): 4015–4020.
7. Duong YT, Kassanjee R, Welte A, et al. Recalibration of the limiting antigen avidity EIA to determine mean duration of recent infection in divergent HIV-1 subtypes. PLoS One 2015; 10(2): e0114947.
8. Kouyos RD, Wyl vV, Yerly S, et al. Ambiguous nucleotide calls from population-based sequencing of HIV-1 are a marker for viral diversity and the age of infection. Clin Infect Dis. 2011; 52(4): 532–539.
9. Yang J, Xia X, He X, et al. A new pattern-based method for identifying recent HIV-1 infections from the viral env sequence. Science China Life Sciences 2012; 55(4): 328–335.
10. Cousins MM, Laeyendecker O, Beauchamp G, et al. Use of a high resolution melting (HRM) assay to compare gag, pol, and env diversity in adults with different stages of HIV infection. PLoS One 2011; 6(11): e27211.
11. Brookmeyer R, Konikoff J, Laeyendecker O, Eshleman SH. Estimation of HIV incidence using multiple biomarkers. Am J Epidemiol. 2013; 177(3): 264–272.
12. Laeyendecker O, Brookmeyer R, Cousins MM, et al. HIV incidence determination in the United States: a multiassay approach. J Infect Dis. 2013; 207(2): 232–239.
13. Konikoff J, Brookmeyer R, Longosz AF, et al. Performance of a limiting-antigen avidity enzyme immunoassay for cross-sectional estimation of HIV incidence in the United States. PLoS One 2013; 8(12): e82772.
14. Laeyendecker O, Konikoff J, Morrison DE, et al. Identification and validation of a multi-assay algorithm for cross-sectional HIV incidence estimation in populations with subtype C infection. J Int AIDS Soc. 2018; 21(2): e25082.
15. Kaplan EH, Brookmeyer R. Snapshot estimators of recent HIV incidence rates. Oper Res. 1999; 47(1): 29–37.
16. Eshleman SH, Hughes JP, Laeyendecker O, et al. Use of a multifaceted approach to analyze HIV incidence in a cohort study of women in the United States: HIV Prevention Trials Network 064 Study. J Infect Dis. 2013; 207(2): 223–231.
17. Rehle T, Johnson L, Hallett T, et al. A comparison of South African national HIV incidence estimates: a critical appraisal of different methods. PLoS One 2015; 10(7): e0133255.
18. Solomon SS, Mehta SH, McFall AM, et al. Community viral load, antiretroviral therapy coverage, and HIV incidence in India: a cross-sectional, comparative study. Lancet HIV 2016; 3(4): e183–e190.
19. McDougal JS, Parekh BS, Peterson ML, et al. Comparison of HIV type 1 incidence observed during longitudinal follow-up with incidence estimated by cross-sectional analysis using the BED capture enzyme immunoassay. AIDS Res Hum Retrov. 2006; 22(10): 945–952.
20. Hargrove JW, Humphrey JH, Mutasa K, et al. Improved HIV-1 incidence estimates using the BED capture enzyme immunoassay. AIDS 2008; 22(4): 511–518.
21. Kassanjee R, McWalter TA, Bärnighausen T, Welte A. A new general biomarker-based incidence estimator. Epidemiol. 2012; 23(5): 721.
22. Maman D, Chilima B, Masiku C, et al. Closer to 90–90–90. The cascade of care after 10 years of ART scale-up in rural Malawi: a population study. J Int AIDS Soc. 2016; 19(1): 20673.
23. Moyo S, Gaseitsiwe S, Mohammed T, et al. Cross-sectional estimates revealed high HIV incidence in Botswana rural communities in the era of successful ART scale-up in 2013–2015. PLoS One 2018; 13(10): e0204840.
24. Kassanjee R, Pilcher CD, Busch MP, et al. Viral load criteria and threshold optimization to improve HIV incidence assay characteristics-a CEPHIA analysis. AIDS 2016; 30(15): 2361.
25. Mahiane SG, Fiamma A, Auvert B. Mixture models for calibrating the BED for HIV incidence testing. Stat Med. 2014; 33(10): 1767–1783.
26. Pattanasin S, Griensven vF, Mock PA, et al. Recent declines in HIV infections at Silom Community Clinic Bangkok, Thailand corresponding to HIV prevention scale up: An open cohort assessment 2005–2018. Int J Infect Dis. 2020; 99: 131–137.
27. Girum T, Wasie A, Worku A. Trend of HIV/AIDS for the last 26 years and predicting achievement of the 90–90–90 HIV prevention targets by 2020 in Ethiopia: a time series analysis. BMC Infect Dis. 2018; 18: 320.
28. Gao F, Glidden DV, Hughes JP, Donnell DJ. Sample size calculation for active-arm trial with counterfactual incidence based on recency assay. Stat Commun Infect Dis. 2021; 13(1).
29. Mastro TD, Kim AA, Hallett T, et al. Estimating HIV Incidence in Populations Using Tests for Recent Infection: Issues, Challenges and the Way Forward. J HIV AIDS Surveill Epidemiol. 2010; 2(1): 1–14.
30. Liang KY, Zeger SL. Longitudinal Data Analysis Using Generalized Linear Models. Biometrika 1986; 73(1): 13–22.
31. Negedu-Momoh OR, Balogun O, Dafa I, et al. Estimating HIV incidence in the Akwa Ibom AIDS indicator survey (AKAIS), Nigeria using the limiting antigen avidity recency assay. J Int AIDS Soc. 2021; 24(2): e25669.
32. Follmann D, Fay MP. Vaccine Efficacy at a Point in Time. medRxiv 2021. doi: 10.1101/2021.02.04.21251133

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supinfo

NIHMS1767850-supplement-supinfo.pdf^{(293.7KB, pdf)}

Data Availability Statement

The LAg Avidity assay data that support the findings of this study are openly available via the link https://doi.org/10.1371/journal.pone.0114947.

