Methods for Generalized Change-point Models: with Applications to HIV Surveillance and Diabetes Data

Jean de Dieu Tapsoba; Ching-Yun Wang; Sahar Zangeneh; Ying Qing Chen

doi:10.1002/sim.8469

Author manuscript; available in PMC: 2021 Apr 15.

Published in final edited form as: Stat Med. 2020 Jan 29;39(8):1167–1182. doi: 10.1002/sim.8469


PMCID: PMC7260994 NIHMSID: NIHMS1590210 PMID: 31997385

Abstract

In many epidemiological and biomedical studies, the association between a response variable and some covariates of interest may change at one or several thresholds of the covariates. Change-point models are suitable for investigating the relationship between the response and covariates in such situations. We present change-point models, with at least one unknown change-point occurring with respect to some covariates of a generalized linear model for independent or correlated data. We develop methods for the estimation of the model parameters and investigate their finite sample performances in simulations. We apply the proposed methods to examine the trends in the reported estimates of the annual percentage of new human immunodeficiency virus (HIV) diagnoses linked to HIV-related medical care within three months after diagnosis using HIV surveillance data from the HIV Prevention Trial Network (HPTN) 065 study. We also apply our methods to a data set from the Pima Indian diabetes study to examine the effects of age and body mass index on the risk of being diagnosed with type 2 diabetes.

Keywords: Change-point, correlated data, generalized linear model, HIV surveillance, type 2 diabetes

1. Introduction

In many epidemiological and biomedical studies, it is common that the association between a response variable and some covariates of interest is not linear, and the association may change at one or more unknown thresholds of the covariates. Change-point models (also known as segmented models) are well-suited for data analysis in these situations and allow the regression function to take different forms before and after the thresholds.¹

An example is the trend in the reported estimates of the annual percentage of individuals (age 13 years or older) who were newly diagnosed with HIV infection in 2011 and linked to HIV-related medical care within three months of diagnosis (L2C) from the HIV surveillance data for six jurisdictions (New York, NY; Washington, D.C.; Miami, FL; Chicago, IL; Philadelphia, PA; and Houston, TX) participating in the HPTN 065 study, which was a feasibility study of an enhanced test, link to care, plus treatment strategy for HIV prevention in the United States.² The data included jurisdiction-level aggregate estimates of the number of new HIV diagnoses, L2C and viral load suppression. These measures were routinely updated and submitted to the surveillance systems’ web sites by the local Departments of Health (DOH) on a quarterly basis from March 2012 through December 2014. Figure 1 shows the updated L2C estimate over the course of the data submission time for the six cities. A noticeable pattern emerging from this figure is that the updated L2C estimates generally increased from the first data-upload month up to a certain time point, after which they roughly stabilized. The number of new HIV diagnoses, shown in Figure S1 of the Supplementary Materials, also exhibited similar trends.

Figure 1: Reported percentage of new HIV diagnoses in 2011 linked to medical care within 3 months following diagnosis with HIV infection, as reported by local Departments of Health for the jurisdictions participating in the HPTN 065 study; τ is the estimated jurisdiction-specific change-point.

Another example arises from a data set from the Pima Indian diabetes study. This data set was obtained from the University of California, Irvine (UCI) Machine Learning Repository and was part of a comprehensive database maintained by the US National Institute of Diabetes and Digestive and Kidney Diseases. It included 764 Pima Indian women (age 21 years or older) from greater Phoenix, Arizona, and was previously analyzed by Wang et al³ using generalized additive partial linear models. The variables included age, body mass index (BMI), plasma glucose concentration levels and a binary indicator variable for a positive type 2 diabetes (T2D) test according to the World Health Organization criteria. Figure 2 shows the smooth functions of age, BMI and glucose in their associations with T2D diagnosis based on an additive logistic regression model⁴ that was fitted using the ‘mgcv’ R package.⁵ The effect of glucose on the log odds for a T2D diagnosis appeared to be linear. However, the effects of age and BMI appeared nonlinear, with possibly one change-point for age and two change-points for BMI. The estimation of the change-points themselves is often of primary interest in many applications.

Figure 2: Smooth components of age, BMI and glucose (solid lines) with the associated two-standard-error bounds (dashed lines) in an additive logistic regression model for their associations with the logit transformation of a positive test for diabetes in the Pima Indian diabetes study. df is the effective degrees of freedom from the estimation of the smooth components based on spline transformations.

The change-point estimation problem has been extensively studied in the literature and most works have focused on piecewise linear (broken-stick) regression models with one or multiple change-points.^6,7 Also, the single change-point estimation in the context of logistic regression models for independent observations has received substantial attention, and relevant references include Pastor and Guallar,⁸ Pastor et al,⁹ and Fong et al.¹⁰ However, the change-point problem has not been well studied in the general situation of generalized linear models for independent or correlated data. To our knowledge, only limited research pertaining to this issue exists in the literature. Among the very few references, Zhou and Liang¹¹ proposed an estimation method for a single change-point in one covariate of a generalized linear model for independent observations. Their approach relies on an approximation by smoothing techniques, which ties the performance of the method to an appropriate choice of the smoothing parameter. Muggeo^12,13 developed an algorithm for the estimation of one or multiple change-points in regression models, including generalized linear models for independent data. Muggeo’s algorithm uses a linear approximation to the regression function. In a longitudinal data setting, Das et al⁷ proposed a method using approximation by smoothing techniques for the estimation of multiple change-points in linear models. The approximate methods may not work well in many situations, especially when the regression model is not linear and the sample size is not large. Methods that are not based on approximation by smoothing or linearization for the estimation of the change-points in the general framework of generalized linear models for independent or correlated data have yet to be developed.

We address the general problem of estimating one or multiple unknown change-points in the covariates of generalized linear models for independent or correlated response data. We introduce change-point models for the situations where at least one change-point is present in some covariates of a generalized linear model for independent, correlated or longitudinal data. Furthermore, we develop methods using estimating equations for the estimation of the change-points along with the other model parameters. Our methods involve no approximation by smoothing or linearization techniques and yield $\sqrt{n}$-consistent estimators.

We present the change-point models in Section 2. In Section 3, we develop our methods for the estimation of the model parameters. The finite-sample performance of the proposed methods is numerically investigated in Section 4. In Section 5, we apply our methods to examine the trends of the reported L2C estimates over the data upload time and provide improvements of these estimates. We also illustrate our methods with the data set from the Pima Indian diabetes study. We conclude with a discussion in Section 6.

2. Generalized change-point models

We distinguish two settings corresponding to whether the response variable is measured once or repeatedly for each sampled unit.

2.1. Setting with one measurement per unit

Let (Y_i,X_i,W_i), i = 1,...,n, denote the observed data, where Y_i is the response, X_i is a scalar and continuous covariate, and W_i is a vector of other covariates involving no change-points. In the HPTN 065 study, Y represents the L2C measure and X is the time since the first data-upload month for a given jurisdiction. In the Pima Indian diabetes study, Y is the indicator of a T2D-positive test and X is age or BMI. We model the relationship between the response and covariates through the following model, termed here the generalized change-point model with $R + 1$ segments:

$$E(Y_{i} \mid X_{i}, W_{i}) = h\{η(θ, X_{i}, W_{i})\} = h\Big\{β_{0} + β_{1} X_{i} + \sum_{r=1}^{R} β_{2r} (X_{i} - τ_{r})_{+} + γ^{'} W_{i}\Big\}, \tag{1}$$

where u₊ = max(u,0) for any u and h(.) is a known function. For example, h(u) takes the form u, {1 + exp(−u)}⁻¹ and exp(u) for linear, logistic and Poisson regression models, respectively. In addition, θ = (ϑ′, τ′)′ is a vector of unknown parameters with ϑ = (β′, γ′)′, $β = (β_{0}, β_{1}, β_{2}^{'})^{'}$, $β_{2} = (β_{21}, \dots, β_{2R})^{'}$ and $τ = (τ_{1}, \dots, τ_{R})^{'}$. Here the τ_r’s are ordered $(τ_{1} < \dots < τ_{R})$ change-points occurring in the conditional mean of the response given the covariates. Furthermore, β₀ is the intercept, β₁ is the coefficient for X in the first segment and $β_{2r}$ is the difference in the coefficients for X between the two consecutive segments joining at the change-point τ_r. Also, the coefficient for X in the segment that immediately follows τ_r is $β_{1} + \sum_{l=1}^{r} β_{2l}$, $r = 1, \dots, R$. Model (1) encompasses as special cases the change-point model of Zhou and Liang¹¹ for a generalized linear model with a single change-point in one covariate and the broken-stick model, investigated by Das et al⁷ when Y₁,...,Y_n are independent. Under the current setting, the Y_i’s are allowed to be independent or correlated. For example, the data on diabetes could be assumed independent among the participants of the Pima Indian diabetes study. Meanwhile, the updates of the L2C estimate for each jurisdiction in the HPTN 065 study were reported over time and could be correlated.
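As an illustration of Model (1), the following minimal Python sketch evaluates the linear predictor with its hinge terms and the resulting conditional mean. The function names (`hinge`, `linear_predictor`, `mean_response`, `segment_slopes`) are hypothetical and chosen for this example only; they do not come from the paper or from any package.

```python
import math

def hinge(u):
    """Positive part u_+ = max(u, 0)."""
    return max(u, 0.0)

# Inverse link functions h(.) for the three models named in the text.
INVERSE_LINKS = {
    "linear": lambda u: u,
    "logistic": lambda u: 1.0 / (1.0 + math.exp(-u)),
    "poisson": lambda u: math.exp(u),
}

def linear_predictor(x, w, beta0, beta1, beta2, tau, gamma):
    """eta(theta, x, w) in Model (1) with R = len(tau) change-points."""
    eta = beta0 + beta1 * x
    eta += sum(b * hinge(x - t) for b, t in zip(beta2, tau))
    eta += sum(g * wk for g, wk in zip(gamma, w))
    return eta

def mean_response(x, w, beta0, beta1, beta2, tau, gamma, model="linear"):
    """E(Y | X = x, W = w) = h{eta(theta, x, w)}."""
    eta = linear_predictor(x, w, beta0, beta1, beta2, tau, gamma)
    return INVERSE_LINKS[model](eta)

def segment_slopes(beta1, beta2):
    """Coefficient of X in each of the R + 1 segments."""
    slopes = [beta1]
    for b in beta2:
        slopes.append(slopes[-1] + b)
    return slopes

# Illustrative values: beta1 = -1 with slope changes of 2 and -3.
print(segment_slopes(-1.0, [2.0, -3.0]))
```

The cumulative sum in `segment_slopes` mirrors the statement that the coefficient for X after τ_r is β₁ plus the sum of β₂₁ through β₂r.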

2.2. Setting with repeated/clustered measurements

This setting includes situations where there are repeated, longitudinal or clustered data for the response variable. Let Y_ij denote the jth measurement of the response variable for the ith study subject, X_ij be a continuous covariate experiencing at least one change-point and W_ij denote a vector of other covariates without change-points, j = 1,...,m_i, i = 1,...,n. Here m_i is the number of available repeated data for the ith subject i = 1,...,n. An extension of Model (1) under this situation is the following population average model:

$$E(Y_{ij} \mid X_{ij}, W_{ij}) = h\{η(θ, X_{ij}, W_{ij})\} = h\Big\{β_{0} + β_{1} X_{ij} + \sum_{r=1}^{R} β_{2r} (X_{ij} - τ_{r})_{+} + γ^{'} W_{ij}\Big\}, \tag{2}$$

where h(.), (.)₊ and θ are defined as in (1). The observations $Y_{i1}, \dots, Y_{im_{i}}$ are possibly correlated. However, (Y_i,X_i,W_i), i = 1,...,n, are assumed to be independent. Here $Y_{i} = (Y_{i1}, \dots, Y_{im_{i}})^{'}$, $X_{i} = (X_{i1}, \dots, X_{im_{i}})^{'}$ and $W_{i} = (W_{i1}^{'}, \dots, W_{im_{i}}^{'})^{'}$, i = 1,...,n. For example, in the HPTN 065 study, Y_ij can be thought of as the jth updated L2C estimate and X_ij as the jth quarter since the first data-upload month for the ith jurisdiction.

For ease of discussion, $R$, which represents the number of change-points in (1) or (2), is assumed to be known. Its estimation lies beyond the scope of the current work. If the vector of change-points τ were known, ϑ in (1) could be estimated using standard estimation methods for generalized linear models.¹⁴ Similarly, usual generalized estimating equation techniques^15,16 could be applied to estimate ϑ in (2) under the same situation. However, this is not the case in our context, and we are concerned with the estimation of θ, which involves both τ and ϑ in (1) and (2).

3. Estimation

For identifiability of the generalized change-point model specified in (1) or (2), we assume that the change-points exist (implying that $β_{2r} \neq 0$, $r = 1, \dots, R$) and the τ_r’s are well separated when $R \geq 1$. We first present our methods for the settings with a single measurement in Subsection 3.1 and then extend these methods to settings with repeated/clustered measurements for the response variable in Subsection 3.2.

3.1. Setting with one measurement per unit

We first consider the case where (Y_i,X_i,W_i), i = 1,...,n, are independent. A direct application of the traditional maximum likelihood or least squares method for estimating θ would encounter the problem that (.)₊ in (1) is not a smooth function. This complicates the use of a Newton-Raphson type algorithm for the optimization of the log-likelihood function or the least squares objective function. One way to overcome this challenge is to use an adaptation of Muggeo’s algorithm,^12,13 which is based on a linear approximation of the nonlinear part of the regression function through a first-order Taylor series expansion. This method may not work well in many situations, including when h(u) ≠ u and n is not large.

An estimator for θ using an approximation by smoothing techniques can be obtained by replacing (.)₊ in (1) with a smooth function and solving the following estimating equations:

$$\sum_{i=1}^{n} \frac{\partial}{\partial θ} \tilde{η}(θ, X_{i}, W_{i}, λ_{n}) [Y_{i} - h\{\tilde{η}(θ, X_{i}, W_{i}, λ_{n})\}] = 0, \tag{3}$$

where {λ_n} is a sequence of positive real numbers converging to zero as n goes to infinity, $\tilde{η}(θ, X_{i}, W_{i}, λ_{n}) = β_{0} + β_{1} X_{i} + \sum_{r=1}^{R} β_{2r} f_{s}(X_{i} - τ_{r}, λ_{n}) + γ^{'} W_{i}$, and f_s(u,λ) is a smooth function that is differentiable with respect to u and gets closer to u₊ as λ approaches 0. We refer to λ as the tuning parameter for f_s(u,λ). Potential choices for this function include f_s(u,λ) = (u + λ)²I(|u| ≤ λ)/(4λ) + uI(u > λ),^6,7 and f_s(u,λ) = uK(u/λ), where K(u) is the cumulative distribution function of the standard normal distribution¹¹ or of the form {1 + tanh(u)}/2.^17,18 This can be viewed as an extension of the methods in Das et al⁷ and Zhou and Liang.¹¹ Its performance is generally not affected by the choice of f_s(u,λ_n) but could be sensitive to the choice of λ_n. Improperly selecting λ_n could lead to biased inference on θ when n is not large. This has motivated us to seek an alternative approach that is free of approximations through smoothing or linearization. When the Y_i’s are independent, the proposed estimator, denoted by ${\hat{θ}}_{I}$, for θ in (1) solves
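To make the role of the tuning parameter concrete, the sketch below implements two of the smooth surrogates mentioned above (the piecewise quadratic form and uK(u/λ) with K the standard normal distribution function) and measures how far each is from u₊ on a grid. The helper names and the grid are assumptions made for this illustration.

```python
import math

def hinge(u):
    """Positive part u_+ = max(u, 0)."""
    return max(u, 0.0)

def smooth_quadratic(u, lam):
    """(u + lam)^2 I(|u| <= lam) / (4 lam) + u I(u > lam)."""
    if abs(u) <= lam:
        return (u + lam) ** 2 / (4.0 * lam)
    return u if u > lam else 0.0

def smooth_normal_cdf(u, lam):
    """u * K(u / lam), with K the standard normal CDF."""
    return u * 0.5 * (1.0 + math.erf(u / (lam * math.sqrt(2.0))))

def max_gap(f, lam, grid):
    """Largest absolute difference between f(., lam) and u_+ on the grid."""
    return max(abs(f(u, lam) - hinge(u)) for u in grid)

# Hypothetical evaluation grid on [-3, 3].
grid = [k / 100.0 for k in range(-300, 301)]
for lam in (0.5, 0.1, 0.01):
    print(lam, max_gap(smooth_quadratic, lam, grid),
          max_gap(smooth_normal_cdf, lam, grid))
```

For the quadratic surrogate the largest gap occurs at u = 0 and equals λ/4, so the approximation error shrinks linearly as λ decreases.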

$$\sum_{i=1}^{n} \begin{bmatrix} 1 \\ X_{i} \\ W_{i} \\ X_{i}(θ) \end{bmatrix} [Y_{i} - h\{η(θ, X_{i}, W_{i})\}] = 0, \tag{4}$$

where $X_{i}(θ) = \{(X_{i} - τ_{1})_{+}, \dots, (X_{i} - τ_{R})_{+}, -β_{21} I(X_{i} > τ_{1}), \dots, -β_{2R} I(X_{i} > τ_{R})\}^{'}$. The computation of ${\hat{θ}}_{I}$ can be performed via iterative methods for solving a system of nonlinear equations without derivatives, such as the derivative-free spectral algorithm for nonlinear equations,¹⁹ which combines the Barzilai-Borwein spectral gradient method with non-monotone line search techniques as a strategy to achieve global convergence. It can be used for solving non-smooth estimating equations²⁰ and is implemented in the ‘BBsolve’ function of the ‘BB’ R package. We rely on this algorithm for the computation of the proposed estimators in the simulations in Section 4 and the applications to real examples in Section 5.

Alternatively, solving (4) can also be formulated as a problem of minimizing $Q(ϑ, τ) = -\sum_{i=1}^{n} f(θ, Y_{i}, X_{i}, W_{i}) / ϕ$, where f(θ,y,x,w) is a log quasi-likelihood function under Model (1). For example, f(θ,y,x,w) takes the form −[y − h{η(θ,x,w)}]²/2 for a linear model, y log[h{η(θ,x,w)}] + (1 − y)log[1 − h{η(θ,x,w)}] for a logistic model and y log[h{η(θ,x,w)}] − h{η(θ,x,w)} for a Poisson model. Furthermore, ϕ is the error variance for a linear model and takes the value 1 for logistic and Poisson models. Q(ϑ,τ) may also be taken as the squared L₂ norm of the estimating function in (4). The minimization of Q(ϑ,τ) can be carried out through derivative-free optimization techniques that can handle non-smooth objective functions. The simulated annealing algorithm²¹ and the genetic algorithm²² are among such optimization techniques.

It is worth mentioning that there may exist multiple roots for (4) or local minima for Q(ϑ,τ).^9,12 Therefore, the performance of the derivative-free root-finding or optimization algorithms may depend on the choice of initial values for τ and ϑ. This issue is shared by Muggeo’s method as well as the change-point estimation methods using smoothing techniques.^9,11 Reasonable starting values for the change-points may be obtained based on the following two-step approach. In the first step, an estimator $\tilde{ϑ}(τ)$ for ϑ is obtained as the minimizer of Q(ϑ,τ) for a fixed τ. Subsequently, in the second step, the initial value ${\tilde{τ}}^{(0)}$ for τ is found by grid search near the potential change-point locations as the minimizer of $Q(\tilde{ϑ}(τ), τ)$. Potential change-point locations may be obtained by means of graphical examination of the data. The starting value for θ is then taken as ${\tilde{θ}}^{(0)} = ({\tilde{ϑ}}^{(0)}, {\tilde{τ}}^{(0)})$, where ${\tilde{ϑ}}^{(0)} = \tilde{ϑ}({\tilde{τ}}^{(0)})$.
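The two-step search for starting values can be sketched as follows for the simplest case: a linear model with one change-point and no additional covariates W, where minimizing Q(ϑ,τ) for fixed τ reduces to ordinary least squares. This is a simplified illustration in plain Python, not the authors' implementation; the function names, the grid and the noise-free test data are assumptions.

```python
def hinge(u):
    """Positive part u_+ = max(u, 0)."""
    return max(u, 0.0)

def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][c] * z[c] for c in range(i + 1, n))
        z[i] = s / m[i][i]
    return z

def fit_given_tau(x, y, tau):
    """Step 1: least-squares (beta0, beta1, beta2) for a fixed tau."""
    rows = [[1.0, xi, hinge(xi - tau)] for xi in x]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(3)]
    coef = solve_linear_system(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(rows, y))
    return coef, rss

def starting_values(x, y, tau_grid):
    """Step 2: grid search over tau for the smallest profiled criterion."""
    best = None
    for tau in tau_grid:
        coef, rss = fit_given_tau(x, y, tau)
        if best is None or rss < best[2]:
            best = (coef, tau, rss)
    return best[0], best[1]

# Noise-free illustration with (beta0, beta1, beta2, tau) = (1, 2, -3, 1).
x = [-3.0 + 0.05 * k for k in range(121)]
y = [1.0 + 2.0 * xi - 3.0 * hinge(xi - 1.0) for xi in x]
coef0, tau0 = starting_values(x, y, [k / 10.0 for k in range(5, 16)])
print(coef0, tau0)
```

For logistic or Poisson models the first step would need an iterative fit in place of the closed-form least-squares solve. The grid must stay inside the range of X so that the hinge column is not identically zero.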

Let θ₀ be the true value of θ. The large sample properties of ${\hat{θ}}_{I}$ are summarized as follows. Under the regularity conditions (I1)–(I5) in Appendix A of the Supplementary Materials, ${\hat{θ}}_{I}$ is a consistent estimator for θ and $\sqrt{n}({\hat{θ}}_{I} - θ_{0})$ is asymptotically normally distributed with mean 0. The covariance matrix of ${\hat{θ}}_{I}$ can be consistently estimated by $ϕ \{D({\hat{θ}}_{I})^{'} A({\hat{θ}}_{I}) D({\hat{θ}}_{I})\}^{-1}$, where $D(θ)$ is the matrix whose ith row is $\{1, X_{i}, W_{i}^{'}, X_{i}(θ)\}^{'}$, i = 1,...,n, and $A(θ) = \mathrm{Diag}[ν_{1}(θ), \dots, ν_{n}(θ)]$. Here ν_i(θ) is 1 for a linear model and takes the form h{η(θ,X_i,W_i)} and h{η(θ,X_i,W_i)}[1 − h{η(θ,X_i,W_i)}] for Poisson and logistic models, respectively. More details regarding the asymptotic properties of ${\hat{θ}}_{I}$ are provided in Appendix B of the Supplementary Materials.
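The vector (1, X_i, W_i′, X_i(θ)′)′ that forms the ith row of D(θ) is also the weight applied to each residual in (4). The sketch below evaluates the left-hand side of (4) for a single covariate with R change-points and no W. It illustrates the estimating function only, not the root-finding step; the function name and the noise-free example data are hypothetical.

```python
def hinge(u):
    """Positive part u_+ = max(u, 0)."""
    return max(u, 0.0)

def estimating_function(theta, data, inverse_link):
    """Left-hand side of (4) for one covariate X and no covariates W.

    theta = (beta0, beta1, [beta21, ..., beta2R], [tau1, ..., tauR]);
    data is a list of (x, y) pairs.
    """
    beta0, beta1, beta2, tau = theta
    total = [0.0] * (2 + 2 * len(tau))
    for x, y in data:
        eta = beta0 + beta1 * x + sum(b * hinge(x - t)
                                      for b, t in zip(beta2, tau))
        resid = y - inverse_link(eta)
        # Weight vector: (1, X_i, hinge terms, -beta_2r * I(X_i > tau_r)).
        row = [1.0, x]
        row += [hinge(x - t) for t in tau]
        row += [-b * (1.0 if x > t else 0.0) for b, t in zip(beta2, tau)]
        for k, v in enumerate(row):
            total[k] += v * resid
    return total

def identity(u):
    return u

# Noise-free linear example with (beta0, beta1, beta2, tau) = (1, 2, -3, 1).
xs = [-3.0 + 0.05 * k for k in range(121)]
data = [(x, 1.0 + 2.0 * x - 3.0 * hinge(x - 1.0)) for x in xs]
at_truth = estimating_function((1.0, 2.0, [-3.0], [1.0]), data, identity)
off_truth = estimating_function((1.0, 2.0, [-3.0], [0.5]), data, identity)
print(at_truth, off_truth)
```

Because the indicator terms make this function discontinuous in τ, a derivative-free solver of the kind described above is needed to find its root.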

We now turn our attention to the situation where the Y_i’s are correlated. This includes the case of a broken-stick model with autocorrelated errors. Methods for the estimation of θ under this setting are scarce in the literature. Let Y = (Y₁,...,Y_n)′, X = (X₁,...,X_n)′ and W be the matrix whose ith row is W_i′. Also, let Σ denote the correlation matrix of Y and $Ω(θ) = ϕ A(θ)^{1/2} Σ A(θ)^{1/2}$, where ϕ and $A(θ)$ are defined as above. The proposed estimator, denoted by ${\hat{θ}}_{C}$, for θ in (1) under this situation solves

$$\mathcal{D}(θ) Ω^{-1}(θ) [Y - h\{η(θ, X, W)\}] = 0, \tag{5}$$

where $\mathcal{D}(θ) = D(θ)^{'} A(θ)$ and h{η(θ,X,W)} is the vector whose ith component is given by h{η(θ,X_i,W_i)}, i = 1,...,n. In practice, Σ is generally unknown and is replaced in (5) by an estimator $\hat{Σ}$ of a working correlation matrix based on an assumed correlation structure. Under suitable regularity conditions, ${\hat{θ}}_{C}$ is a consistent estimator for θ and $\sqrt{n}({\hat{θ}}_{C} - θ_{0})$ is asymptotically normally distributed with mean 0. Furthermore, the covariance matrix of ${\hat{θ}}_{C}$ can be consistently estimated by $\{\mathcal{D}({\hat{θ}}_{C}) \hat{Ω}({\hat{θ}}_{C})^{-1} \mathcal{D}({\hat{θ}}_{C})^{'}\}^{-1}$, where $\hat{Ω}(θ)$ is defined similarly to Ω(θ) but with Σ replaced by $\hat{Σ}$. It is worth noting that ${\hat{θ}}_{C}$ coincides with ${\hat{θ}}_{I}$ when $\hat{Σ}$ is the identity matrix, because $\mathcal{D}(θ) \hat{Ω}^{-1}(θ)$ then reduces to $D(θ)^{'}/ϕ$.

3.2. Setting with repeated/clustered response data

For the particular case of repeated numeric response data corresponding to h(u) = u (piecewise linear model), Das et al⁷ proposed an estimator using smoothing techniques, similar to the estimator that solves (3). Here we are mainly interested in an estimation method that is free of smoothing approximation for the general problem of the estimation of change-points in at least one of the covariates of a generalized estimating equation model for longitudinal continuous or binary data. Let Σ_i denote the correlation matrix of $Y_{i} = (Y_{i1}, \dots, Y_{im_{i}})^{'}$, i = 1,...,n. Also, let $X_{ij}(θ) = \{(X_{ij} - τ_{1})_{+}, \dots, (X_{ij} - τ_{R})_{+}, -β_{21} I(X_{ij} > τ_{1}), \dots, -β_{2R} I(X_{ij} > τ_{R})\}^{'}$, and $D_{i}(θ)$ be the matrix whose jth row is $\{1, X_{ij}, W_{ij}^{'}, X_{ij}(θ)\}^{'}$, j = 1,...,m_i, i = 1,...,n. Furthermore, denote by $A_{i}(θ)$ the diagonal matrix $\mathrm{Diag}[ν_{i1}(θ), \dots, ν_{im_{i}}(θ)]$, where ν_ij(θ) is the error variance for a linear model, h{η(θ,X_ij,W_ij)}[1 − h{η(θ,X_ij,W_ij)}] for a logistic model and h{η(θ,X_ij,W_ij)} for a Poisson model with change-points. Our estimator ${\hat{θ}}_{L}$ for θ in (2) is obtained by solving

$$\sum_{i=1}^{n} \mathcal{D}_{i}(θ) {\hat{Ω}}_{i}^{-1}(θ) [Y_{i} - h\{η(θ, X_{i}, W_{i})\}] = 0, \tag{6}$$

where $\mathcal{D}_{i}(θ) = D_{i}(θ)^{'} A_{i}(θ)$, ${\hat{Ω}}_{i}(θ) = A_{i}(θ)^{1/2} {\hat{Σ}}_{i} A_{i}(θ)^{1/2}$ and ${\hat{Σ}}_{i}$ is a working correlation matrix that can be obtained based on an assumed correlation structure for Y_i. Under suitable regularity conditions, ${\hat{θ}}_{L}$ is consistent and $\sqrt{n}({\hat{θ}}_{L} - θ_{0})$ asymptotically follows a zero-mean normal distribution. The covariance matrix of ${\hat{θ}}_{L}$ can be consistently estimated by $B({\hat{θ}}_{L})^{-1} M({\hat{θ}}_{L}) B({\hat{θ}}_{L})^{-1}$, where $M(θ) = \sum_{i=1}^{n} \mathcal{D}_{i}(θ) {\hat{Ω}}_{i}^{-1}(θ) Ψ_{i}(θ) Ψ_{i}(θ)^{'} {\hat{Ω}}_{i}^{-1}(θ) \mathcal{D}_{i}(θ)^{'}$, $B(θ) = \sum_{i=1}^{n} \mathcal{D}_{i}(θ) {\hat{Ω}}_{i}^{-1}(θ) \mathcal{D}_{i}(θ)^{'}$ and $Ψ_{i}(θ) = Y_{i} - h\{η(θ, X_{i}, W_{i})\}$, i = 1,...,n.

It is important to note that Models (1) and (2) can be extended to incorporate more than one variable experiencing the change-points. Our estimation methods can be easily applied to such a situation, which is considered in the simulations and applications. Moreover, key differences with the existing methods in Das et al⁷ and Zhou and Liang¹¹ are highlighted as follows. First, the methods introduced in these references use approximations by smoothing techniques requiring the selection of a tuning parameter while our estimators, ${\hat{θ}}_{I}$ , ${\hat{θ}}_{C}$ and ${\hat{θ}}_{L}$ do not involve such smoothing techniques. Second, the methods in Das et al⁷ were built for the estimation of change-points in the covariate of a linear regression model. Meanwhile, our methods can accommodate linear and nonlinear models under the generalized linear modeling framework. Third, the method by Zhou and Liang¹¹ was developed for the estimation of a single change-point in one covariate of a generalized linear model for independent observations. Our methods on the other hand can handle the estimation of multiple change-points in one or several covariates of a generalized linear model for independent or correlated data.

4. Simulation study

We performed extensive simulations to examine the finite-sample performances of the proposed estimators ${\hat{θ}}_{I}$, ${\hat{θ}}_{C}$ and ${\hat{θ}}_{L}$ presented in Section 3. We also compared ${\hat{θ}}_{I}$ with Muggeo’s estimator, denoted by ${\hat{θ}}_{M}$, when the data are independent. ${\hat{θ}}_{M}$ can be easily implemented via the ‘segmented’ R package. This represents a computational advantage for Muggeo’s algorithm over the proposed methods when the independence assumption for the data holds. The implementation of the proposed methods, however, requires solving the estimating equations (4), (5) and (6).

In Table 1, we considered the situations where there is a single change-point in a covariate of a linear, logistic or Poisson model and the observations are independent. The covariate X was drawn from U(−3,3) and Y was generated according to $E(Y \mid X) = h\{β_{0} + β_{1} X + β_{2} (X - τ)_{+}\}$ with (β₀,β₁,β₂,τ) = (1,2,−3,1), n = 250 or 1000. For the linear regression, Y given X was normal with mean β₀ + β₁X + β₂(X − τ)₊ and variance 0.25. The results were from 1000 Monte Carlo samples and the performances of ${\hat{θ}}_{M}$ and ${\hat{θ}}_{I}$ were assessed with regard to bias (Bias), sample standard deviation of the estimates (SD), average of the estimated standard errors (ASE), mean square error (MSE) of the estimates and coverage probability (CP) of the 95% Wald-type confidence interval. Here bias is defined as the difference between the sample mean of the estimates from the 1000 replicates and the true parameter value, and MSE = Bias² + SD². Also, coverage probability denotes the proportion of the 1000 replicates in which the 95% Wald-type confidence interval includes the true parameter value.

It can be noted from this table that ${\hat{θ}}_{I}$ and ${\hat{θ}}_{M}$ perform well and yield very close results for linear and Poisson models. For logistic regression, however, ${\hat{θ}}_{M}$ exhibits larger bias and MSE compared with ${\hat{θ}}_{I}$ when n = 250. Also, it shows smaller ASE and coverage probability (under 90%) than ${\hat{θ}}_{I}$ for τ when n = 250. Meanwhile, ${\hat{θ}}_{I}$ demonstrates good coverage probabilities and its SDs are approximately halved when n increases from 250 to 1000, suggesting that ${\hat{θ}}_{I}$ is $\sqrt{n}$-consistent.
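The summary measures defined above can be computed from a set of Monte Carlo replicates as in the following sketch. The function name `summarize` and the toy inputs are hypothetical. The last lines re-derive one MSE entry of Table 1 (logistic model, n = 250, β₀, Muggeo’s estimator) from its reported Bias and SD.

```python
import statistics

def summarize(estimates, std_errors, truth, z=1.959963984540054):
    """Bias, SD, ASE, MSE and coverage of 95% Wald intervals."""
    bias = statistics.mean(estimates) - truth
    sd = statistics.stdev(estimates)      # sample standard deviation
    ase = statistics.mean(std_errors)     # average estimated standard error
    mse = bias ** 2 + sd ** 2
    covered = [abs(est - truth) <= z * se
               for est, se in zip(estimates, std_errors)]
    cp = sum(covered) / len(covered)
    return {"Bias": bias, "SD": sd, "ASE": ase, "MSE": mse, "CP": cp}

# Toy replicates (hypothetical numbers, for illustration only).
toy = summarize([0.9, 1.0, 1.1], [0.1, 0.1, 0.1], truth=1.0)
print(toy)

# Consistency check of a Table 1 entry: Bias = 0.095, SD = 0.414.
table_mse = round(0.095 ** 2 + 0.414 ** 2, 3)
print(table_mse)
```

The value of `z` is the 0.975 quantile of the standard normal distribution, which gives the Wald-type interval estimate ± z × standard error.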

Table 1:

Simulation results for the situation where there is a single change-point for linear, logistic and Poisson regression models for independent data; ${\hat{θ}}_{M}$, estimator based on Muggeo’s method; ${\hat{θ}}_{I}$, proposed estimator solving (4).

			${\hat{θ}}_{M}$					${\hat{θ}}_{I}$
Model	n		Bias	ASE	SD	MSE	CP	Bias	ASE	SD	MSE	CP
Linear	250	β₀	0.004	0.052	0.053	0.003	0.939	0.004	0.052	0.052	0.003	0.944
		β₁	0.000	0.034	0.035	0.001	0.938	0.000	0.034	0.034	0.001	0.940
		β₂	−0.001	0.102	0.110	0.012	0.929	−0.001	0.102	0.109	0.012	0.930
		τ	−0.002	0.045	0.047	0.002	0.940	−0.002	0.045	0.046	0.002	0.946
	1000	β₀	0.000	0.026	0.026	0.001	0.942	0.000	0.026	0.026	0.001	0.946
		β₁	0.000	0.017	0.017	0.000	0.948	0.000	0.017	0.017	0.000	0.947
		β₂	0.001	0.050	0.052	0.003	0.942	0.001	0.050	0.052	0.003	0.943
		τ	0.000	0.022	0.023	0.001	0.940	0.000	0.022	0.023	0.001	0.943

Logistic	250	β₀	0.095	0.316	0.414	0.180	0.942	0.042	0.296	0.294	0.088	0.960
		β₁	0.122	0.333	0.407	0.180	0.950	0.073	0.314	0.305	0.098	0.968
		β₂	−0.236	0.783	0.798	0.692	0.965	−0.196	0.765	0.769	0.629	0.964
		τ	−0.040	0.324	0.415	0.174	0.870	0.003	0.330	0.300	0.090	0.952
	1000	β₀	0.015	0.144	0.153	0.024	0.939	0.013	0.143	0.148	0.022	0.944
		β₁	0.021	0.151	0.160	0.026	0.945	0.020	0.150	0.156	0.025	0.948
		β₂	−0.044	0.351	0.350	0.125	0.956	−0.038	0.349	0.344	0.120	0.959
		τ	−0.003	0.163	0.185	0.034	0.910	−0.003	0.163	0.169	0.029	0.932

Poisson	250	β₀	−0.004	0.070	0.069	0.005	0.944	−0.003	0.070	0.068	0.005	0.946
		β₁	0.008	0.101	0.106	0.011	0.937	0.007	0.100	0.103	0.011	0.943
		β₂	−0.013	0.124	0.124	0.015	0.945	−0.011	0.124	0.120	0.014	0.940
		τ	0.000	0.031	0.034	0.001	0.914	0.000	0.031	0.034	0.001	0.922
	1000	β₀	−0.001	0.035	0.035	0.001	0.936	−0.001	0.035	0.035	0.001	0.936
		β₁	0.002	0.050	0.050	0.003	0.948	0.002	0.050	0.049	0.002	0.951
		β₂	−0.002	0.061	0.060	0.004	0.956	−0.002	0.061	0.059	0.004	0.956
		τ	0.000	0.016	0.016	0.000	0.946	0.000	0.016	0.015	0.000	0.947


Note: SD denotes the sample standard deviation of the estimates; ASE is the average of the estimated standard errors; MSE is mean square error; CP represents the coverage probabilities of the 95% confidence intervals.

Additionally, we conducted simulations to examine the performances of ${\hat{θ}}_{M}$ and the proposed estimators ${\hat{θ}}_{I}$ and ${\hat{θ}}_{C}$ for the case of a single change-point in a covariate of a linear model with small to moderate sample sizes. The covariate X was simulated from U(−3,3) and the response Y given X followed a multivariate normal distribution with mean β₀ + β₁X + β₂(X − τ)₊ and variance σ²Σ with $Σ_{i,j} = ρ^{|i - j|}$, assuming an AR(1) correlation structure. We set ρ = 0 or ρ = 0.3 (depicting independent or correlated observations), θ = (β₀,β₁,β₂,τ) = (1,2,−3,1), σ = 0.25 and n = 15, 30, 60 and 120. The parameter θ was estimated by ${\hat{θ}}_{I}$ and ${\hat{θ}}_{M}$ when ρ = 0 and by ${\hat{θ}}_{C}$ when ρ ≠ 0. Moreover, the performance of ${\hat{θ}}_{C}$ was further investigated when n = 300 and ρ = 0.3, 0.7. The results of these simulations are presented in Table S1 of the Supplementary Materials. For the situation where ρ = 0, it can be observed that ${\hat{θ}}_{I}$ and ${\hat{θ}}_{M}$ show similarly small biases and adequate coverage probabilities when n ≥ 30. Furthermore, ${\hat{θ}}_{I}$ works satisfactorily while ${\hat{θ}}_{M}$ may suffer from low coverage probabilities when n = 15. For the situation where ρ ≠ 0, ${\hat{θ}}_{C}$ generally displays small biases and coverage probabilities that are close to the nominal 95% level except when the sample size is very small (n = 15). The performances of all the methods improve as n gets larger. The ASEs for the change-point and slope parameters decrease as ρ increases.
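The AR(1) structure used in this simulation design can be written down directly. The sketch below builds the correlation matrix with entries ρ^|i−j| and draws errors with that correlation through the standard stationary AR(1) recursion. This is one possible way to generate such data and is not necessarily how the authors did it; the function names and the seed are assumptions.

```python
import math
import random

def ar1_correlation(n, rho):
    """n x n matrix with (i, j) entry rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def simulate_ar1_errors(n, rho, sigma, rng):
    """Zero-mean normal errors with variance sigma^2 and AR(1) correlation."""
    errors = [sigma * rng.gauss(0.0, 1.0)]
    innovation_sd = sigma * math.sqrt(1.0 - rho ** 2)
    for _ in range(1, n):
        errors.append(rho * errors[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return errors

def simulate_responses(x, beta0, beta1, beta2, tau, rho, sigma, rng):
    """Y = beta0 + beta1 X + beta2 (X - tau)_+ + AR(1) error."""
    errors = simulate_ar1_errors(len(x), rho, sigma, rng)
    return [beta0 + beta1 * xi + beta2 * max(xi - tau, 0.0) + e
            for xi, e in zip(x, errors)]

rng = random.Random(2020)  # arbitrary seed for reproducibility
x = [rng.uniform(-3.0, 3.0) for _ in range(30)]
y = simulate_responses(x, 1.0, 2.0, -3.0, 1.0, rho=0.3, sigma=0.25, rng=rng)
sigma_mat = ar1_correlation(3, 0.3)
print(sigma_mat)
```

With ρ = 0 the matrix reduces to the identity, which corresponds to the independent-data case in which ${\hat{θ}}_{C}$ and ${\hat{θ}}_{I}$ coincide.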

Similarly, simulations were also performed to investigate the performances of ${\hat{θ}}_{M}$ and ${\hat{θ}}_{I}$ for the estimation of a single change-point in a covariate of a logistic or Poisson regression model for independent data under small to moderate sample sizes. The data generation was as in Table 1 for logistic and Poisson models, except that n = 50 or 100. The results are presented in Table S2 of the Supplementary Materials. For the logistic model, it can be noted that both ${\hat{θ}}_{M}$ and ${\hat{θ}}_{I}$ show large biases and inadequate coverage probabilities. Furthermore, ${\hat{θ}}_{M}$ shows smaller ASEs and lower coverage probabilities for the change-point in comparison with ${\hat{θ}}_{I}$. For the Poisson model, both estimators have reasonable biases and satisfactory coverage probabilities for the intercept and slope parameters. However, ${\hat{θ}}_{M}$ exhibits low coverage probabilities for the change-point. Additional simulations examining the performance of the smoothing-based estimator that solves (3) are provided in Appendix C of the Supplementary Materials.

In Table 2, there were two change-points in one covariate of a linear, logistic or Poisson regression model for independent observations. The covariate X followed U(−3,3) and Y was generated via the model $E(Y \mid X) = h\{β_{0} + β_{1} X + β_{21} (X - τ_{1})_{+} + β_{22} (X - τ_{2})_{+}\}$ with θ = (β₀,β₁,β₂₁,β₂₂,τ₁,τ₂) set as (1,−1,2,−3,−0.5,1), n = 500 or 2000. For the linear model, Y given X was normal with variance 0.25. The parameter θ was estimated using ${\hat{θ}}_{I}$ and ${\hat{θ}}_{M}$. These estimators show performances similar to those in Table 1 for the linear, logistic and Poisson regression models. ${\hat{θ}}_{M}$ has inadequate coverage probabilities for the change-points τ₁ and τ₂ for the logistic regression model when n = 500. It also shows larger biases and MSEs in comparison with the ${\hat{θ}}_{I}$ counterparts for the logistic regression model. Simulations assessing the performances of the methods when there are two covariates, each experiencing one change-point in a linear, logistic or Poisson regression model, are presented in Appendix D of the Supplementary Materials.

Table 2:

Simulation results for the situation where there are two change-points in a covariate of a linear, logistic or Poisson regression model; ${\hat{θ}}_{M}$ , estimator based on Muggeo’s method; ${\hat{θ}}_{I}$ , proposed estimator solving (4).

			${\hat{θ}}_{M}$					${\hat{θ}}_{I}$
Model	n		Bias	ASE	SD	MSE	CP	Bias	ASE	SD	MSE	CP
Linear	500	β₀	−0.007	0.092	0.095	0.009	0.943	−0.006	0.092	0.093	0.009	0.948
		β₁	−0.002	0.049	0.049	0.002	0.944	−0.002	0.049	0.048	0.002	0.947
		β₂₁	−0.001	0.115	0.117	0.014	0.944	−0.003	0.115	0.107	0.012	0.960
		β₂₂	0.002	0.124	0.125	0.016	0.953	0.004	0.124	0.116	0.013	0.963
		τ₁	−0.006	0.057	0.061	0.004	0.924	−0.006	0.057	0.056	0.003	0.938
		τ₂	0.002	0.040	0.043	0.002	0.926	0.002	0.040	0.041	0.002	0.941
	2000	β₀	−0.001	0.046	0.047	0.002	0.941	0.001	0.046	0.047	0.002	0.944
		β₁	0.000	0.024	0.025	0.001	0.944	0.000	0.024	0.024	0.001	0.942
		β₂₁	0.003	0.057	0.057	0.003	0.944	0.002	0.057	0.055	0.003	0.948
		β₂₂	−0.004	0.062	0.064	0.004	0.941	−0.003	0.062	0.062	0.004	0.944
		τ₁	0.000	0.028	0.029	0.001	0.943	0.000	0.028	0.028	0.001	0.953
		τ₂	0.000	0.020	0.020	0.000	0.946	0.000	0.020	0.020	0.000	0.954

Logistic	500	β₀	−0.162	0.462	0.625	0.417	0.910	−0.069	0.431	0.403	0.167	0.961
		β₁	−0.092	0.275	0.344	0.127	0.925	−0.046	0.262	0.250	0.064	0.967
		β₂₁	0.425	0.738	0.901	0.992	0.956	0.123	0.569	0.466	0.232	0.979
		β₂₂	−0.458	0.795	0.965	1.141	0.961	−0.163	0.633	0.526	0.303	0.985
		τ₁	0.002	0.230	0.351	0.123	0.791	−0.002	0.241	0.199	0.040	0.958
		τ₂	−0.006	0.179	0.244	0.060	0.840	0.008	0.184	0.168	0.028	0.953
	2000	β₀	−0.032	0.210	0.234	0.056	0.934	−0.023	0.209	0.204	0.042	0.963
		β₁	−0.018	0.127	0.135	0.019	0.938	−0.014	0.127	0.123	0.015	0.956
		β₂₁	0.081	0.279	0.337	0.120	0.944	0.039	0.269	0.243	0.060	0.967
		β₂₂	−0.077	0.307	0.371	0.144	0.930	−0.041	0.298	0.277	0.079	0.969
		τ₁	0.001	0.119	0.151	0.023	0.896	0.000	0.120	0.109	0.012	0.961
		τ₂	−0.005	0.092	0.110	0.012	0.904	0.001	0.093	0.092	0.008	0.947

Poisson	500	β₀	−0.005	0.057	0.059	0.003	0.938	−0.005	0.057	0.058	0.003	0.946
		β₁	−0.002	0.025	0.026	0.001	0.940	−0.002	0.025	0.025	0.001	0.944
		β₂₁	0.008	0.073	0.076	0.006	0.937	0.007	0.073	0.073	0.005	0.942
		β₂₂	−0.010	0.109	0.112	0.013	0.942	−0.008	0.109	0.107	0.011	0.950
		τ₁	0.001	0.041	0.044	0.002	0.928	0.000	0.041	0.042	0.002	0.937
		τ₂	0.001	0.024	0.026	0.001	0.924	0.001	0.024	0.025	0.001	0.933
	2000	β₀	−0.001	0.029	0.030	0.001	0.942	−0.001	0.029	0.029	0.001	0.943
		β₁	0.000	0.012	0.013	0.000	0.945	0.000	0.012	0.013	0.000	0.947
		β₂₁	0.001	0.036	0.036	0.001	0.950	0.001	0.036	0.035	0.001	0.952
		β₂₂	0.001	0.054	0.053	0.003	0.962	0.001	0.054	0.051	0.003	0.966
		τ₁	0.000	0.021	0.021	0.000	0.945	0.000	0.021	0.020	0.000	0.948
		τ₂	0.000	0.012	0.012	0.000	0.950	0.000	0.012	0.012	0.000	0.949

In Table 3, we examined the situation with a single change-point in a covariate of a linear, logistic or Poisson model for longitudinal data. The sample size was n = 200 and the number of observations m_i for subject i was generated as max(2,κ_i), where κ_i ∼ Poisson(10), i = 1,...,n. The covariate X_ij for subject i was simulated from U(−3,3), j = 1,...,m_i. For the within-subject correlation, an AR(1) structure was considered for the linear model, while exchangeable structures were assumed for the logistic and Poisson models, with correlation parameter ρ = 0.3 or 0.7. Moreover, Y_ij was generated according to the population mean model $E (Y_{i j} | X_{i j}) = h {β_{0} + β_{1} X_{i j} + β_{2} {(X_{i j} - τ)}_{+}}$ . For the linear model, the error followed a zero-mean normal distribution with variance 0.25. The vector of parameters θ = (β₀,β₁,β₂,τ) was set to (1,−1,2,2) and estimated using ${\hat{θ}}_{L}$ . This estimator shows small biases and adequate coverage probabilities for the change-point τ and the other components of θ. Also, the ASE for τ decreases as ρ increases.
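
The longitudinal design for the linear model can be sketched as follows. This is a minimal illustration that assumes NumPy and interprets the error specification as a multivariate normal with marginal variance 0.25 and AR(1) correlation ρ^|j−k| within subject. The function names are hypothetical, and the logistic and Poisson cases are omitted because the text does not state how the correlated binary and count responses were generated.

```python
import numpy as np


def ar1_covariance(m, rho, sigma2):
    """m x m covariance matrix with entries sigma2 * rho**|j - k|."""
    idx = np.arange(m)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma2 * rho ** lags


def simulate_longitudinal_linear(n, rho, rng,
                                 theta=(1.0, -1.0, 2.0, 2.0), sigma2=0.25):
    """Return a list of (x_i, y_i) arrays, one pair per subject."""
    b0, b1, b2, tau = theta
    data = []
    for _ in range(n):
        m = max(2, int(rng.poisson(10)))           # m_i = max(2, kappa_i)
        x = rng.uniform(-3.0, 3.0, size=m)         # X_ij ~ U(-3, 3)
        mean = b0 + b1 * x + b2 * np.maximum(x - tau, 0.0)
        errors = rng.multivariate_normal(np.zeros(m),
                                         ar1_covariance(m, rho, sigma2))
        data.append((x, mean + errors))
    return data


subjects = simulate_longitudinal_linear(200, rho=0.3,
                                        rng=np.random.default_rng(65))
```

Each subject contributes on average about ten observations, so a sample of n = 200 subjects yields roughly 2000 correlated measurements.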

Table 3:

Simulation results for the situation where there is a single change-point in a covariate of a linear, logistic or Poisson regression model for longitudinal data; ${\hat{θ}}_{L}$ , proposed estimator solving (6).

		${\hat{θ}}_{L}$ (ρ = 0.3)					${\hat{θ}}_{L}$ (ρ = 0.7)
Model		Bias	ASE	SD	MSE	CP	Bias	ASE	SD	MSE	CP
Linear	β₀	0.001	0.016	0.016	0.000	0.951	0.001	0.022	0.022	0.000	0.951
	β₁	0.000	0.008	0.008	0.000	0.952	0.000	0.005	0.005	0.000	0.955
	β₂	0.006	0.088	0.088	0.008	0.942	0.004	0.058	0.058	0.003	0.939
	τ	0.001	0.027	0.028	0.001	0.941	0.000	0.018	0.018	0.000	0.944

Logistic	β₀	−0.003	0.096	0.095	0.009	0.955	0.002	0.128	0.128	0.016	0.950
	β₁	−0.005	0.059	0.060	0.004	0.945	−0.010	0.069	0.073	0.005	0.935
	β₂	0.079	0.400	0.383	0.153	0.961	0.070	0.361	0.335	0.117	0.964
	τ	0.004	0.125	0.125	0.016	0.933	0.001	0.113	0.116	0.013	0.925

Poisson	β₀	−0.002	0.027	0.027	0.001	0.944	−0.002	0.032	0.032	0.001	0.944
	β₁	−0.001	0.009	0.009	0.000	0.943	−0.001	0.010	0.010	0.000	0.957
	β₂	0.023	0.224	0.220	0.049	0.939	0.019	0.173	0.168	0.029	0.954
	τ	0.002	0.074	0.074	0.005	0.938	0.003	0.058	0.058	0.003	0.946

5. Applications

We applied the proposed methods to i) examine the trends of the reported L2C estimates and provide improved versions of these estimates in the HPTN 065 study and ii) investigate the associations of age, BMI and glucose with T2D in the Pima Indian diabetes study.

5.1. HIV surveillance data from HPTN 065 study

The HIV surveillance data were briefly described in the Introduction section. In our analysis, L2C is the measure of interest. It is also one of the key measures that the National HIV/AIDS Strategy (NHAS) aims to enhance.²³ However, its monitoring is difficult due to reporting delays and data incompleteness, among other potential issues that may compromise the quality of surveillance data in general.

In our analysis, we were primarily interested in investigating the L2C trends over the data upload time and providing improved-quality L2C estimates based on the trend examination. A graphical examination of the L2C trends in Figure 1 led us to assume the existence of a single change-point in the L2C trend for each jurisdiction. We first considered the jurisdiction-specific trends and used the following piecewise linear model for the L2C data separately for each city jurisdiction, corresponding to the setting with one measurement for the response variable:

E (Y_{i} | X_{i}) = β_{0} + β_{1} X_{i} + β_{2} {(X_{i} - τ)}_{+},

where Y_i is L2C and X_i denotes the ith quarter since the first data-upload month, and θ = (β₀,β₁,β₂,τ)’ represents the jurisdiction-specific vector of parameters for the change-point model. We estimated θ by our estimator ${\hat{θ}}_{C}$ solving (5), assuming an AR(1) correlation structure for the Y_i’s. The estimates of the change-point model parameters, together with their standard errors and 95% Wald-type confidence intervals, are reported in Table 4. The fitted piecewise linear regression lines are displayed as solid grey lines in Figure 1. The results suggest that the change-point for the updated estimate of the L2C measure varied across the jurisdictions, signaling that the time at which the L2C estimates became approximately stable differed among the six cities. The L2C estimates for Houston, TX and Philadelphia, PA appeared to stabilize earlier than those for the other cities, while Chicago, IL was the slowest to reach approximate stability. Also, the reported L2C estimates for Chicago seemed to be relatively stable before the fifth update quarter, after which they showed a sudden increase, probably due to reporting delays. This pattern was not evident in the trends of the L2C estimates for the other jurisdictions.
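
The structure of the piecewise linear mean model can be illustrated with a small fitting routine. The sketch below is not the estimator solving (5): as a stand-in, it uses a profile least-squares grid search over τ under working independence, which is enough to show how the hinge term (X − τ)₊ enters the design matrix. It assumes NumPy, and the function name, grid and noise-free toy data are hypothetical.

```python
import numpy as np


def fit_one_changepoint(x, y, grid):
    """Grid search over tau; ordinary least squares for the betas given tau."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for tau in grid:
        # Columns: intercept, x, hinge (x - tau)_+
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - tau, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((y - design @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(tau), coef)
    sse, tau, coef = best
    return {"beta0": coef[0], "beta1": coef[1], "beta2": coef[2],
            "tau": tau, "sse": sse}


# Hypothetical noise-free series: rises by 3 per quarter, then flat after quarter 3.
quarters = np.arange(1, 11, dtype=float)
l2c = 70.0 + 3.0 * quarters - 3.0 * np.maximum(quarters - 3.0, 0.0)
fit = fit_one_changepoint(quarters, l2c, grid=np.arange(1.5, 9.01, 0.25))
```

The candidate values of τ should lie strictly inside the observed range of X; otherwise the hinge column is either all zeros or collinear with X and the slopes are not identifiable.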

Table 4:

Results of the analysis of the HIV-surveillance data: fitting a two-phase piecewise linear model (with an unknown change-point) to the jurisdiction-specific L2C measures in the HPTN 065 study based on the proposed estimator solving (5).

		City jurisdiction
		DC	Miami	Chicago	New York	Philadelphia	Houston
		Estimation of the change-point model parameters
β₀	Estimate	71.25	42.21	74.13	78.60	74.19	49.43
	SE	0.76	0.70	1.15	1.22	0.06	0.86
	95% CI	(69.76, 72.74)	(40.84, 43.59)	(71.88, 76.38)	(76.21, 81.00)	(74.07, 74.31)	(47.74, 51.12)
β₁	Estimate	3.29	3.13	0.96	3.58	2.92	16.37
	SE	0.61	0.39	0.27	0.67	0.05	1.26
	95% CI	(2.09, 4.48)	(2.37, 3.90)	(0.44, 1.49)	(2.26, 4.90)	(2.82, 3.02)	(13.90, 18.83)
β₂	Estimate	−3.01	−2.93	−0.96	−3.87	−3.00	−16.04
	SE	0.61	0.39	0.77	0.69	0.05	1.26
	95% CI	(−4.21, −1.80)	(−3.70, −2.16)	(−2.47, 0.55)	(−5.22, −2.52)	(−3.10, −2.90)	(−18.50, −13.57)
τ	Estimate	2.96	3.35	7.52	3.38	2.34	1.55
	SE	0.46	0.39	1.96	0.47	0.03	0.10
	95% CI	(2.05, 3.87)	(2.66, 4.04)	(3.67, 11.36)	(2.46, 4.31)	(2.28, 2.40)	(1.36, 1.74)

		Annual percentage of linkage to care
%	Estimate	82.13	53.54	81.44	89.68	80.62	76.42

Note: β₀ is the intercept; β₁ is the slope of the first segment; β₂ is the difference in slopes between the two segments; τ is the jurisdiction change-point.

Knowing when the L2C estimates became approximately stable is also important, as it may give insight into how fast the surveillance system reaches maturity with regard to delays and completeness of reporting of the HIV surveillance data for the jurisdictions. Without such information, critical decisions on allocation of resources and planning for HIV treatment and prevention may be made based on unreliable estimates of the prevention measures. This could lead to a misunderstanding of the trend of the true linkage to care measure and potentially hamper the efforts to reduce HIV infections in the general population. The change-point can also be used to obtain improved estimates of the reported surveillance measures in the absence of accurate estimates. It is very likely that the data quality is higher for the post change-point estimates than for the earlier ones. We therefore used the L2C data that were reported after the change-point to provide more reliable L2C estimates by taking the average of the post change-point estimates for each jurisdiction. The improved estimates of the annual percentage of linkage to care for the six cities are shown at the bottom of Table 4. They ranged from 53.54% (Miami, FL) to 89.68% (New York, NY) and were higher than 80%, except in Miami, FL and Houston, TX (76.41%). To further investigate the bias in the reported estimates over the data upload time, we defined the bias of the reported estimate as the reported estimate from the surveillance data minus the improved estimate that we calculated. It can be seen from Figure S2 of the Supplementary Materials that the reporting bias appeared to be generally large and negative before the change-point, probably due to data incompleteness or reporting delays, and that it shrank toward zero after the change-point for each jurisdiction. We did not pursue a formal test for post change-point data stabilization; this could be done by testing whether the slope after the change-point is equal to 0.
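
The improved estimate and the reporting bias described above amount to simple arithmetic, sketched below in plain Python. The series of reported values is hypothetical, and treating "after the change-point" as X > τ is an assumption. The last part uses the β₁ and β₂ estimates from Table 4 to compute the post change-point slope β₁ + β₂ for each jurisdiction; a formal test of whether this slope is zero would also need the covariance between the two estimates, which is not reported.

```python
def improved_estimate(quarters, reported, tau):
    """Average of the reported estimates observed after the change-point."""
    post = [r for q, r in zip(quarters, reported) if q > tau]
    if not post:
        raise ValueError("no reported estimates after the change-point")
    return sum(post) / len(post)


def reporting_bias(reported, improved):
    """Reported estimate minus improved estimate, one value per update."""
    return [r - improved for r in reported]


# Hypothetical jurisdiction: six quarterly updates and a change-point at 3.4.
quarters = [1, 2, 3, 4, 5, 6]
reported = [60.0, 70.0, 78.0, 80.0, 82.0, 81.0]
improved = improved_estimate(quarters, reported, tau=3.4)
bias = reporting_bias(reported, improved)

# (beta1, beta2) estimates from Table 4; the post change-point slope is their sum.
table4 = {
    "DC": (3.29, -3.01), "Miami": (3.13, -2.93), "Chicago": (0.96, -0.96),
    "New York": (3.58, -3.87), "Philadelphia": (2.92, -3.00),
    "Houston": (16.37, -16.04),
}
post_change_slopes = {city: b1 + b2 for city, (b1, b2) in table4.items()}
```

In every jurisdiction the post change-point slope is small relative to the slope of the first segment, which is consistent with the description of the estimates as approximately stable after τ.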

Next, we considered the HIV surveillance data as clustered within the jurisdictions and, for illustrative purposes only, modeled the L2C data as follows:

E (Y_{i j} | X_{i j}) = β_{0} + β_{1} X_{i j} + β_{2} {(X_{i j} - τ)}_{+},

where Y_ij represents the jth update of the L2C estimate and X_ij denotes the jth quarter since the first data-upload month for the ith jurisdiction, and τ is the average time when the measure somewhat stabilizes across the jurisdictions. We estimated θ = (β₀,β₁,β₂,τ) in this model by our estimator ${\hat{θ}}_{L}$ , solving (6) and assuming an AR(1) correlation structure for ${(Y_{i 1}, \dots, Y_{i m_{i}})}^{'}$ . We obtained ${\hat{θ}}_{L} = {(64.68, 4.66, - 4.43, 2.39)}^{'}$ with the corresponding vector of estimated standard errors (4.88,2.03,2.33,0.75). This indicates that on average, the reported L2C estimate became somewhat stable within a year following the first data upload month across the jurisdictions. We would like to note in passing that the sample size in this analysis was small and this could cause challenges for statistical inference based on methods requiring large samples in general.

5.2. Pima Indians diabetes study

The data set for the Pima Indian diabetes study was described in the Introduction section. In this analysis, 16 participants with biologically abnormal zero values for BMI and glucose concentration levels were excluded. We were interested in exploring the effect of age, BMI and glucose on the risk of being diagnosed with T2D. We used the following logistic regression model to capture the changing effects of age and BMI on the log-odds of T2D diagnosis (observed in Figure 2), assuming one change-point for age and two change-points for BMI and adjusting for glucose concentration level.

P (Y_{i} = 1 | X_{i}, Z_{i}, W_{i}) = h {β_{0} + β_{1 x} X_{i} + β_{1 z} Z_{i} + β_{w} W_{i} + β_{2 x} {(X_{i} - τ_{x})}_{+} + β_{21 z} {(Z_{i} - τ_{1 z})}_{+} + β_{22 z} {(Z_{i} - τ_{2 z})}_{+}},

where Y_i is the indicator variable for being diagnosed with T2D, X_i is age experiencing a change-point at τ_x, Z_i denotes BMI with possibly two ordered change-points, τ_1z and τ_2z, W_i represents glucose level for the ith participant and h(u) = (1 + e^{−u})⁻¹. This corresponds to the setting with one measurement for the response. We estimated θ = (β₀,β_1x,β_1z,β_w,β_2x,β_21z,β_22z,τ_x,τ_1z,τ_2z)’ by our estimator ${\hat{θ}}_{I}$ , solving (4), and by Muggeo’s estimator, ${\hat{θ}}_{M}$ , under the independence assumption for the observations. The parameter estimates together with their estimated standard errors and 95% Wald-based confidence intervals are shown in Table 5. The estimation results for the two change-points in BMI obtained by both methods were very close. For age, the estimate (SE) of the change-point based on our method was 49.85 (13.86) while Muggeo’s counterpart was 52.06 (2.91). A plausible explanation for the differences in these results could be that Muggeo’s method uses linear approximation to estimate the change-points and the other model parameters. Also, the variance of the change-point by Muggeo’s method is obtained as an approximation to the variance of the ratio of two random variables based on the delta method and is sensitive to the difference in slopes and the sample size.¹² This method may not adequately account for the uncertainty involved in the estimate of the change-point.²⁴ Our method, on the other hand, uses (4) and the asymptotic variance formula for ${\hat{θ}}_{I}$ in Section 3 for the estimation of the model parameters and associated variances. Another reason could be that the estimated location of the change-point for age by the proposed and Muggeo’s methods is around 50, which lies far from the median (29) of the distribution of the age variable, as seen in Figure S3 of the Supplementary Materials. Change-points located near the edge of the range of the threshold variable are known to affect the estimation of the change-points and may lead to wide confidence intervals in finite samples.¹² The estimated change-points for BMI were around 31 and 40, not far from the median (32) of the distribution of BMI. Nonetheless, these results indicated that Pima Indian women were most likely to test positive for T2D at about their menopausal age (50 years) after adjusting for BMI and glucose. Also, the risk of developing T2D appeared to be similar for women with BMI values between 31 and 40 kg/m², but increased more rapidly with each unit increase in BMI for women with BMI greater than 40 kg/m², after adjusting for age and glucose.

Table 5:

Application to the Pima Indians diabetes data: fitting a logistic regression change-point model for T2D diagnosis including age, BMI and glucose as covariates with age experiencing one change-point, BMI having two change-points and glucose having a linear effect (no change-point). ${\hat{θ}}_{I}$ represents the proposed estimator; ${\hat{θ}}_{M}$ denotes Muggeo’s estimator.

	${\hat{θ}}_{M}$			${\hat{θ}}_{I}$
Parameter	Estimate	SE	95% CI	Estimate	SE	95% CI
β₀	−15.656	1.918	(−19.53, −11.98)	−15.628	1.925	(−19.40, −11.86)
β_1x	0.066	0.012	(0.04, 0.09)	0.069	0.013	(0.04, 0.10)
β_1z	0.281	0.064	(0.16, 0.41)	0.276	0.065	(0.15, 0.40)
β_w	0.037	0.004	(0.03, 0.04)	0.037	0.004	(0.03, 0.04)
β_2x	−0.237	0.072	(−0.36, −0.11)	−0.204	0.050	(−0.30, −0.11)
β_21z	−0.355	0.086	(−0.52, −0.18)	−0.347	0.086	(−0.52, −0.18)
β_22z	0.221	0.089	(0.04, 0.39)	0.218	0.089	(0.04, 0.39)
τ_x	52.060	2.879	(46.91, 56.86)	49.811	13.88	(22.61, 77.02)
τ_1z	31.602	1.053	(29.42, 33.54)	31.60	1.830	(28.01, 35.19)
τ_2z	40.066	2.245	(35.67, 44.63)	40.105	1.433	(37.33, 42.91)

Note: β₀ is the intercept; τ_x is the change-point for age; τ_1z and τ_2z are the two change-points for BMI; β_1x is the slope for age before τ_x; β_2x is the difference in slopes for age before and after τ_x; β_1z is the slope for BMI before τ_1z; β_21z is the difference in slopes between the two segments of BMI joining at τ_1z; β_22z is the difference in slopes between the two segments of BMI joining at τ_2z; β_w is the slope for glucose concentration levels.
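
Because each β₂-type coefficient is a difference in slopes, the slope of the log-odds on each segment is obtained by cumulative summation. The sketch below does this for age and BMI using the ${\hat{θ}}_{I}$ point estimates in Table 5, and evaluates the fitted log-odds for a hypothetical participant. The function names and the example covariate values are illustrative assumptions, and no uncertainty is attached to the derived quantities.

```python
import math

# Point estimates of the proposed estimator, copied from Table 5.
EST = {
    "b0": -15.628, "b1x": 0.069, "b1z": 0.276, "bw": 0.037,
    "b2x": -0.204, "b21z": -0.347, "b22z": 0.218,
    "tau_x": 49.811, "tau_1z": 31.60, "tau_2z": 40.105,
}


def segment_slopes(first_slope, slope_changes):
    """Slopes on consecutive segments: first slope, then cumulative changes."""
    slopes = [first_slope]
    for change in slope_changes:
        slopes.append(slopes[-1] + change)
    return slopes


def hinge(value, tau):
    """(value - tau)_+"""
    return max(value - tau, 0.0)


def log_odds(age, bmi, glucose, est=EST):
    """Fitted linear predictor of the logistic change-point model."""
    return (est["b0"] + est["b1x"] * age + est["b1z"] * bmi + est["bw"] * glucose
            + est["b2x"] * hinge(age, est["tau_x"])
            + est["b21z"] * hinge(bmi, est["tau_1z"])
            + est["b22z"] * hinge(bmi, est["tau_2z"]))


def probability(age, bmi, glucose, est=EST):
    """h(u) = 1 / (1 + exp(-u)) applied to the fitted log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(age, bmi, glucose, est)))


age_slopes = segment_slopes(EST["b1x"], [EST["b2x"]])                # before / after tau_x
bmi_slopes = segment_slopes(EST["b1z"], [EST["b21z"], EST["b22z"]])  # three BMI segments
p_example = probability(age=30, bmi=35, glucose=120)                 # hypothetical participant
```

For BMI, the point estimates of the segment slopes are about 0.28 below τ_1z, about −0.07 between the two change-points and about 0.15 above τ_2z; for age they are about 0.07 before τ_x and about −0.14 after it.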

We note that polynomial regression models or regression spline models,^4,25,26 which are other popular methods for modeling nonlinearity, could also be used to capture the nonlinear associations of age and BMI with the logit transformation of the probability of T2D diagnosis in the Pima Indian diabetes study. The results of an additional analysis of the diabetes data using a logistic regression model incorporating quadratic polynomial terms in age and natural cubic spline terms in BMI with 3 degrees of freedom are presented in Table S5 of the Supplementary Materials. The coefficients of such a model may not be directly interpretable. Our change-point models mainly differ from the polynomial or spline models in their simplicity and ease of interpretation. Additionally, polynomial models may involve no change-points. Furthermore, regression spline (e.g. cubic spline) models may require a certain degree of smoothness for the regression function at the knots (change-points), whose locations are often assumed to be known a priori or obtained as grid points from a quantile-based or uniform grid-spacing partition of the support of the change-point variable. In contrast, the regression functions in the change-point models (1) and (2) are not smooth at the change-points. Moreover, these models can be used to make inference about the change-points together with the other model parameters. Additional differences between spline and change-point models in general are discussed in Feder.²⁷

It is worth recalling that the confidence intervals in Tables 4 and 5 are Wald-type, using asymptotic normal approximations for the proposed and Muggeo’s estimators. Although valid in large samples, Wald-type confidence intervals may perform poorly in finite samples due to the parameter-effects curvature for some nonlinear models including those with change-points.^9,28 Alternative confidence intervals for the change-points could be based on likelihood ratio or smoothed score approaches.^29,30
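
As a concrete illustration of the Wald-type construction, the sketch below computes estimate ± z·SE with z the 0.975 quantile of the standard normal distribution, using only the Python standard library. Applied to the ${\hat{θ}}_{I}$ estimate and standard error of τ_x in Table 5, it gives the reported interval after rounding to two decimals. The function name is hypothetical, and intervals computed from the rounded table entries need not match every reported interval to the last digit.

```python
from statistics import NormalDist


def wald_ci(estimate, se, level=0.95):
    """Two-sided Wald interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for level = 0.95
    return estimate - z * se, estimate + z * se


# Change-point for age (tau_x), proposed estimator, values from Table 5.
lower, upper = wald_ci(49.811, 13.88)
interval = (round(lower, 2), round(upper, 2))     # (22.61, 77.02)
```

The interval is symmetric around the point estimate by construction, which is one reason it can behave poorly when the sampling distribution of a change-point estimate is skewed in finite samples.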

6. Discussion

We have introduced generalized change-point models with at least one change-point occurring in some covariates of a generalized linear model for independent or correlated data. Moreover, we have developed methods to estimate the change-points along with the other model parameters under these change-point modeling frameworks. The proposed methods are based on estimating equations and involve no smoothing or linear approximation. They lead to $\sqrt{n}$ -consistent estimators with asymptotic normal distributions and can handle situations where the existing methods may not be straightforwardly applicable. The proposed methods have been applied to investigate the trends of the reported estimates of the annual percentage of new HIV diagnoses linked to care within three months of HIV diagnosis in aggregated HIV-surveillance data from the HPTN 065 study. They have also been used to examine the associations of age, BMI and glucose with diabetes in the Pima Indian diabetes study.

To ensure identifiability of the model parameters, we assumed the existence of the change-points (β₂ ≠ 0). In practice, a graphical examination of the data could be useful for determining the presence of change-points in the covariates.⁴ Formally testing for the existence of a change-point is a nonstandard problem in which the usual asymptotic distributions of the Wald and likelihood ratio type tests do not hold.^27,31 Although undertaking such a task under Models (1) and (2) was beyond the scope of the present work, it is worth mentioning that methods for testing the existence of a single change-point in one covariate of a regression model were discussed by Pastor-Barriuso et al.,⁹ Davies,³² and Muggeo.³³ Also, permutation tests for the existence of multiple change-points in a covariate of a linear model were presented in Kim et al.³⁴

The proposed estimation methods are presented for linear, logistic and Poisson models, which are the most commonly used generalized linear models. They can also be applied to any other generalized linear models when some covariates have piecewise relationships with the response and there is an interest in estimating the change-points. The methods can also be extended to estimate change-points in the covariates of a survival model (e.g. Cox proportional hazards model) for a time-to-event outcome. However, their finite sample performance under such models may differ from the case with linear, logistic or Poisson model.

Our approach for repeated/clustered data was marginal in the sense that the focus was on modeling the nonlinear relationships between the population mean response and covariates with change-points, which were assumed to be the same for all individuals. A conditional approach accounting for possible variability in the change-points between individuals represents an interesting extension that would be based on random-effect models incorporating random change-points. For the estimation of a linear mixed effects model with random change-points, a Bayesian approach was discussed by Dominicus et al.³⁵ and a linear approximation method was presented by Muggeo et al.³⁶ under the likelihood framework. Also, a polynomial model with random change-points was discussed in van den Hout et al.³⁷ An extension of Model (2) to include a random intercept as well as random change-points and random slopes is given by $E (Y_{i j} | X_{i j}, θ_{i}) = h {β_{i 0} + β_{i 1} X_{i j} + \sum_{r = 1}^{R} β_{i 2 r} {(X_{i j} - τ_{i r})}_{+}}$ , where $θ_{i} = {(β_{i 0}, β_{i 1}, β_{i 2}^{'}, τ_{i}^{'})}^{'}$ , i = 1,...,n, are independent and identically distributed random variables with mean $θ = {(β_{0}, β_{1}, β_{2}^{'}, τ^{'})}^{'}$ and covariance matrix Ξ. Here $β_{i 2} = {(β_{i 21}, \dots, β_{i 2 R})}^{'}$ and $τ_{i} = {(τ_{i 1}, \dots, τ_{i R})}^{'}$ . The extension of our estimation methods to such conditional models may not be straightforward and is worth further investigation.

Supplementary Material

Supplemental Material

NIHMS1590210-supplement-Supplemental_Material.pdf (164.9 KB, PDF)

Acknowledgments

This work was partially supported by the National Institutes of Health grants MH105857 (Chen, Tapsoba, Wang and Zangeneh), HL121347 (Tapsoba and Wang), CA235122 (Wang) and UM1AI068617 (Zangeneh).

References

1. Hudson DJ. Fitting segmented curves whose join points have to be estimated. Journal of the American Statistical Association. 1966; 61: 1097–1129.
2. Donnell DJ, Hall HI, Gamble T, Beauchamp G, Griffin AB, Torian LV, Branson B, El-Sadr WM. Use of HIV case surveillance system to design and evaluate site-randomized interventions in an HIV prevention study: HPTN 065. The Open AIDS Journal. 2012; 6: 122–130.
3. Wang L, Liu X, Liang H, Carroll RJ. Estimation and variable selection for generalized additive partial linear models. The Annals of Statistics. 2011; 39: 1827–1851.
4. Hastie T, Tibshirani R. Generalized additive models: some applications. Journal of the American Statistical Association. 1987; 82: 371–386.
5. Wood SN. mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness Estimation. 2017; R package version 1.8-22; https://cran.r-project.org/web/packages/mgcv/mgcv.pdf.
6. Chiu G, Lockhart R, Routledge R. Bent-cable regression theory and applications. Journal of the American Statistical Association. 2006; 101: 542–553.
7. Das R, Banerjee M, Nan B, Zheng H. Fast estimation of regression parameters in a broken-stick model for longitudinal data. Journal of the American Statistical Association. 2016; 111: 1132–1143.
8. Pastor R, Guallar E. Use of two-segmented logistic regression to estimate change-points in epidemiologic studies. American Journal of Epidemiology. 1998; 148: 631–642.
9. Pastor-Barriuso R, Guallar E, Coresh J. Transition models for change-point estimation in logistic regression. Statistics in Medicine. 2003; 22: 1141–1162.
10. Fong Y, Di C, Huang Y, Gilbert PB. Model-robust inference for continuous threshold regression models. Biometrics. 2017; 73: 452–462.
11. Zhou H, Liang KY. On estimating the change point in generalized linear models. Institute of Mathematical Statistics. 2008; 1: 305–320.
12. Muggeo VMR. Estimating regression models with unknown break-points. Statistics in Medicine. 2003; 22: 3055–3071.
13. Muggeo VMR. Segmented: an R package to fit regression models with broken-line relationships. R News. 2008; 8: 20–25.
14. McCullagh P, Nelder JA. Generalized linear models. First edition, Chapman & Hall, London; 1983.
15. Wu L. Mixed effects models for complex data. Boca Raton, FL: Chapman & Hall/CRC; 2009.
16. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986; 73: 13–22.
17. Bacon DW, Watts DG. Estimating the transition between two intersecting straight lines. Biometrika. 1971; 58: 525–534.
18. Tapsoba JD, Lee SM, Wang CY. Joint modeling of survival time and longitudinal data with subject-specific changepoints in the covariates. Statistics in Medicine. 2011; 30: 232–249.
19. La Cruz W, Martinez JM, Raydan M. Spectral residual method without gradient information for solving large-scale non-linear systems of equations. Mathematics of Computation. 2006; 75: 1429–1448.
20. Varadhan R, Gilbert PD. BB: An R package for solving a large system of nonlinear equations and for optimizing a high-dimensional nonlinear objective function. Journal of Statistical Software. 2009; 32: 1–26.
21. Lin DY, Geyer CJ. Computational methods for semiparametric linear regression with censored data. Journal of Computational and Graphical Statistics. 1992; 1: 77–90.
22. Dorsey RE, Mayer WJ. Genetic algorithms for estimation problems with multiple optima, non-differentiability and other irregular features. Journal of Business & Economic Statistics. 1995; 13: 53–66.
23. Laffoon BT, Hall HI, Babu S, Benbow N, Hsu LC, Hu YW. HIV infection and linkage to HIV-related medical care in large urban areas in the United States, 2009. Journal of Acquired Immune Deficiency Syndromes. 2015; 69(4): 487–492.
24. Fong Y, Huang Y, Gilbert PB, Permar SR. chngpt: threshold regression model estimation and inference. BMC Bioinformatics. 2017; 18: 454.
25. Stasinopoulos DM, Rigby RA. Detecting break points in generalized linear models. Computational Statistics and Data Analysis. 1992; 13: 461–471.
26. Molinari N, Daures JP, Durand JF. Regression splines for threshold selection in survival data analysis. Statistics in Medicine. 2001; 20: 237–247.
27. Feder P. On asymptotic distribution theory in segmented regression problems-identified cases. The Annals of Statistics. 1975; 3: 49–83.
28. Seber GAF, Wild CJ. Non-linear Regression. Wiley: New York; 1989.
29. Muggeo VMR. Interval estimation for the breakpoint in segmented regression: a smoothed score-based approach. Australian & New Zealand Journal of Statistics. 2017; 59: 311–322.
30. Lerman PM. Segmented regression models by grid search. Journal of the Royal Statistical Society. Series C (Applied Statistics). 1980; 29: 77–84.
31. Ulm K. A statistical method for assessing a threshold in epidemiological studies. Statistics in Medicine. 1991; 10: 341–349.
32. Davies R. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika. 1987; 74: 33–43.
33. Muggeo VMR. Testing with a nuisance parameter present only under the alternative: a score-based approach with application to segmented modelling. Journal of Statistical Computation and Simulation. 2016; 86: 3059–3067.
34. Kim HJ, Fay MP, Feuer EJ, Midthune DN. Permutation tests for joinpoint regression with applications to cancer rates. Statistics in Medicine. 2000; 19(3): 335–351.
35. Dominicus A, Ripatti S, Pedersen NL, Palmgren J. A random change point model for assessing variability in repeated measures of cognitive function. Statistics in Medicine. 2008; 27: 5786–5798.
36. Muggeo VMR, Atkins DC, Gallop RJ. Segmented mixed models with random changepoints: a maximum likelihood approach with application to treatment for depression study. Statistical Modelling. 2014; 14: 293–313.
37. van den Hout A, Muniz-Terra G, Matthews FE. Smooth random change points models. Statistics in Medicine. 2011; 30: 599–610.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Material

NIHMS1590210-supplement-Supplemental_Material.pdf^{(164.9KB, pdf)}
