Published in final edited form as: J Stat Plan Inference. 2009 Sep 1;139(9):3132–3141. doi: 10.1016/j.jspi.2009.02.017

Confidence Estimation via the Parametric Bootstrap in Logistic Joinpoint Regression

Ryan Gill, Grzegorz A. Rempala, Michal Czajkowski
PMCID: PMC2761632  NIHMSID: NIHMS128074  PMID: 20160983

Abstract

We consider asymptotic properties of the maximum likelihood and related estimators in a clustered logistic joinpoint model with an unknown joinpoint. Sufficient conditions are given for the consistency of confidence bounds produced by the parametric bootstrap; one of the conditions required is that the true location of the joinpoint is not at one of the observation times. A simulation study is presented to illustrate the lack of consistency of the bootstrap confidence bounds when the joinpoint is an observation time. A removal algorithm is presented which corrects this problem, but at the price of an increased mean square error. Finally, the methods are applied to data on yearly cancer mortality in the United States for individuals age 65 and over.

Keywords: logistic joinpoint regression, confidence estimation, parametric bootstrap, maximum likelihood, mortality trends

1 Introduction

There are a wide variety of statistical methods for analyzing nonlinear models. If one is interested only in summarizing the trends in the data and in obtaining a good flexible non-linear fit, then one may take advantage of many types of spline models existing in the literature (see [10, 11, 13, 14, 20, 21, 23, 36, 42] and the references therein). On the other hand, if one’s interest lies mainly in estimating and making inferences on the location of structural changes in the underlying model, the model frequently considered is that of a segmented regression in which the knots (also referred to as joinpoints) are unknown (see, for instance, [15, 16, 25, 26, 27, 29, 31, 37]).

Segmented regression models are popular, for instance, as tools in modeling general disease trends and originally have been introduced in the context of epidemiological studies of occupational exposures for modeling threshold limit values in logistic regression models with a single joinpoint (see [18, 41]). Subsequently, various multiple joinpoint algorithms have also been applied to disease trend models. For instance, Kim et al. [28] suggested a sequential backward selection algorithm for testing the number of joinpoints in a model that uses a least-squares criterion under squared-error loss and a goodness-of-fit measure based on the F-statistic, and applied the algorithm to model U.S. yearly cancer rates. The algorithm has been implemented in the free software Joinpoint, version 3.3, (see http://srab.cancer.gov/joinpoint/) which facilitates fitting and testing the model for Gaussian and Poisson regressions. Czajkowski et al. [7] compared the above algorithm with a forward selection algorithm in the logistic joinpoint regression setting with multiple joinpoints where model parameters were estimated by maximum likelihood and applied the methods to model longitudinal data on cancer mortality in a cohort of chemical workers. An R package ljr [6] available at http://www.R-project.org [34] has been developed implementing both the backward and forward algorithms for the logistic joinpoint model. Many alternative approaches for selecting the number of change points based on information theory have also been considered in a variety of contexts (see [8, 30, 32, 40, 43] and the references therein).

When working with statistics having complicated distributions such as those encountered in segmented regression models, the bootstrap and parametric bootstrap are common tools used for confidence estimation (see, for instance, [4, 9, 12, 22]). In many cases, the bootstrap is effective, but there are also cases where the bootstrap is not consistent when parameters are on the boundary of the parameter space. Several examples are discussed in [1]. In this paper, we are interested in consistency of parametric bootstrap confidence bounds in the context of the logistic joinpoint regression. In particular, we describe herein a situation in which the consistency of the bootstrap confidence bounds fails, in the sense that they are not asymptotically correct. We refer to Section 7.4 of [39] for a general discussion of the bootstrap confidence bounds. The material in this paper discusses the behavior of the bootstrap for a segmented regression model; however, see [24] for a detailed discussion of the asymptotic behavior of the bootstrap for detecting changes in a multiphase linear regression model which differs from ours in that it does not impose continuity constraints.

The paper is organized as follows. In Section 2, we introduce a clustered logistic joinpoint model and discuss maximum likelihood estimation of its parameters. The consistency and asymptotic normality of the maximum likelihood and related estimators are discussed in Section 3. Section 4 gives sufficient conditions for the consistency of the parametric bootstrap, illustrates via simulation a situation when the parametric bootstrap is not consistent, and suggests a removal algorithm to restore consistency for the simulated example. Finally, in Section 5, the methods proposed are applied to data on yearly cancer mortality in the United States for individuals age 65 and over.

2 Clustered Logistic Joinpoint Model

Suppose that Y1,…, YN are independent Binomial random variables such that Yi is the sum of mi independent Bernoulli random variables each with probability of success p0i. Denote the realizations of these random variables as y1,…, yN, respectively, let $\mathbf{y} = [y_1, \ldots, y_N]^\top$, and let $n = \sum_{i=1}^{N} m_i$. Furthermore, let p0i have the functional form

$p_{0i} = \frac{e^{\eta_{0i}}}{1 + e^{\eta_{0i}}}$  (1)

where

$\eta_{0i} = \alpha_0 + \beta_0 t_i + \delta_0 (t_i - \tau_0)_+ + \gamma_0^\top x_i$  (2)

and t+ = max{t, 0}. Here x1,…, xN are fixed q-dimensional covariates, and the ‘times’ t1 ≤ … ≤ tN are ordered covariates. Also, α0 is the unknown intercept, τ0 is the unknown joinpoint, β0 is the unknown slope coefficient for ti before τ0, δ0 ≠ 0 is the change in the slope coefficient after τ0, and γ0 is the unknown q-dimensional vector of coefficients for the fixed covariates.

2.1 Maximum Likelihood Estimation

In this section, the maximum likelihood estimate of $\theta_0 = [\alpha_0, \beta_0, \tau_0, \delta_0, \gamma_0^\top]^\top$ is derived. The log-likelihood function for the sample y1,…, yN is given by

$\ell(\theta) = \sum_{i=1}^{N} \left\{ \eta_i(\theta)\, y_i - m_i \ln\!\left(1 + e^{\eta_i(\theta)}\right) \right\}$

where $\theta = [\alpha, \beta, \tau, \delta, \gamma^\top]^\top$ and

$\eta_i(\theta) = \begin{cases} \alpha + \beta t_i + \gamma^\top x_i & \text{if } t_i \le \tau \\ \alpha + \beta t_i + \delta(t_i - \tau) + \gamma^\top x_i & \text{if } t_i > \tau. \end{cases}$
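For reference, a minimal R sketch of this log-likelihood for the case q = 0 (no additional covariates); loglik_joinpoint is a name introduced here only for illustration and is not part of the ljr package:

```r
# Sketch: evaluate l(theta) for the single-joinpoint model with q = 0.
# theta is a named vector c(alpha = , beta = , tau = , delta = ); y, m, t are the data.
loglik_joinpoint <- function(theta, y, m, t) {
  eta <- theta[["alpha"]] + theta[["beta"]] * t +
         theta[["delta"]] * pmax(t - theta[["tau"]], 0)   # eta_i(theta) with (t_i - tau)_+
  sum(eta * y - m * log(1 + exp(eta)))                    # sum of the binomial log-likelihood terms
}
```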

It is useful to consider the jth super log-likelihood function

$\ell_j(\theta) = \sum_{i=1}^{j} \left\{ (\alpha + \beta t_i + \gamma^\top x_i)\, y_i - m_i \ln\!\left(1 + e^{\alpha + \beta t_i + \gamma^\top x_i}\right) \right\} + \sum_{i=j+1}^{N} \left\{ (\alpha + \beta t_i + \delta(t_i - \tau) + \gamma^\top x_i)\, y_i - m_i \ln\!\left(1 + e^{\alpha + \beta t_i + \delta(t_i - \tau) + \gamma^\top x_i}\right) \right\}$  (3)

for j = 1,…,N − 1. Note that

$\ell_j(\theta) = \ell(\theta) \quad \text{if } \tau \in [t_j, t_{j+1}]$

and $\ell_j$ is infinitely differentiable with respect to θ, but $\ell$ is not differentiable at τ = ti for i = 1,…,N if δ ≠ 0. Letting $\phi(\theta) = [\alpha, \beta, \alpha - \delta\tau, \beta + \delta, \gamma^\top]^\top$, we can express (3) as

$\ell_j(\theta) = \sum_{i=1}^{N} \left\{ \phi^\top(\theta)\, z_i^{(j)}\, y_i - m_i \ln\!\left(1 + e^{\phi^\top(\theta) z_i^{(j)}}\right) \right\}$

where $z_i^{(j)} = [1 - \upsilon_i^{(j)},\ (1 - \upsilon_i^{(j)}) t_i,\ \upsilon_i^{(j)},\ \upsilon_i^{(j)} t_i,\ x_i^\top]^\top$ and

$\upsilon_i^{(j)} = \begin{cases} 0 & \text{if } i \le j \\ 1 & \text{if } i > j. \end{cases}$

This is a logistic regression model with the design matrix $Z^{(j)} = [z_1^{(j)}, \ldots, z_N^{(j)}]^\top$ and the canonical parameter vector $\phi(\theta)$.

The function $\ell_j$ is infinitely differentiable with respect to $\phi$. Differentiating $\ell_j$ with respect to $\phi$, we have

$\frac{\partial \ell_j}{\partial \phi} = \sum_{i=1}^{N} z_i^{(j)} \left\{ y_i - m_i \rho_i^{(j)}(\theta) \right\}$  (4)

where $\rho_i^{(j)}(\theta) = e^{\eta_i^{(j)}(\theta)} / \bigl(1 + e^{\eta_i^{(j)}(\theta)}\bigr)$ with $\eta_i^{(j)}(\theta) = \phi^\top(\theta)\, z_i^{(j)}$. Setting (4) to zero, dividing by n, and simplifying, we obtain

$\sum_{i=1}^{N} u_i\, z_i^{(j)} \left( \frac{y_i}{m_i} - \rho_i^{(j)}(\theta) \right) = 0$  (5)

where $u_i = m_i / n$. If the matrix $Z^{(j)} = [z_1^{(j)}, \ldots, z_N^{(j)}]^\top$ is of full rank, then the solution to (5) is the unique maximizer of $\ell_j$ since

$-\frac{\partial^2 \ell_j}{\partial \phi\, \partial \phi^\top} = n\, (Z^{(j)})^\top U\, W^{(j)}(\theta)\, Z^{(j)}$

is positive definite, where $W^{(j)}(\theta)$ is a diagonal matrix with diagonal elements $\rho_i^{(j)}(\theta)\,\bigl(1 - \rho_i^{(j)}(\theta)\bigr)$, i = 1,…,N, and

$U$ is a diagonal matrix with diagonal elements $u_1, \ldots, u_N$.  (6)

For each j = 1,…,N − 1, denote the maximizer of lj as

$\hat\theta_n^{(j)} = [\hat\alpha_n^{(j)}, \hat\beta_n^{(j)}, \hat\tau_n^{(j)}, \hat\delta_n^{(j)}, (\hat\gamma_n^{(j)})^\top]^\top.$  (7)

Using the invariance property of the method of maximum likelihood (see, for example, [33]), $\hat\theta_n^{(j)} = \phi^{-1}\bigl(\widehat{\phi(\theta)}_n^{(j)}\bigr)$, where $\widehat{\phi(\theta)}_n^{(j)}$ is the solution to (5).

The following algorithm can be used to compute the maximum likelihood estimate (MLE) θ̂n of θ0.

  1. For j = 1,…,N − 1, compute $\hat\theta_n^{(j)}$. If $\hat\tau_n^{(j)} \in (t_j, t_{j+1})$, then compute $\ell(\hat\theta_n^{(j)})$; otherwise, the MLE of τ0 is not in (tj, tj+1) and there is no need to evaluate $\ell$ at any θ such that τ ∈ (tj, tj+1).

  2. For j = 2,…,N − 1, fix τ = tj so that the model is equivalent to logistic regression with covariates 1, ti, (ti − tj)+, and xi, i = 1,…,N. Then fit a logistic regression model to obtain the respective possible estimates $\tilde\alpha_n^{(j)}, \tilde\beta_n^{(j)}, \tilde\delta_n^{(j)}$, and $\tilde\gamma_n^{(j)}$ of α0, β0, δ0, and γ0. Denote the possible estimate of θ0 as
    $\tilde\theta_n^{(j)} = [\tilde\alpha_n^{(j)}, \tilde\beta_n^{(j)}, t_j, \tilde\delta_n^{(j)}, (\tilde\gamma_n^{(j)})^\top]^\top,$
    and compute $\ell(\tilde\theta_n^{(j)})$.
  3. The MLE of θ0 is the value of θ which maximizes l(θ) among the values at which we evaluate l in steps 1 and 2.

See [7] for a detailed description of an algorithm for computing the MLE in the logistic joinpoint regression model with multiple joinpoints.
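As a rough illustration of steps 1–3, the following R sketch fits a single joinpoint with no additional covariates (q = 0) by maximizing each super log-likelihood via glm() in the φ-parametrization and then mapping back with $\phi^{-1}$. It is a simplified sketch, not the ljr implementation; fit_joinpoint and its return values are hypothetical names introduced here:

```r
# Illustrative sketch only (q = 0, distinct observation times); not the ljr implementation.
fit_joinpoint <- function(y, m, t) {
  N <- length(y)
  best <- NULL
  keep_if_better <- function(cand, best) {
    if (is.null(best) || cand$loglik > best$loglik) cand else best
  }
  # Step 1: maximize each super log-likelihood l_j via glm() in the phi-parametrization.
  for (j in 1:(N - 1)) {
    v <- as.numeric(seq_len(N) > j)                        # indicator v_i^{(j)}
    fit <- glm(cbind(y, m - y) ~ 0 + I(1 - v) + I((1 - v) * t) + v + I(v * t),
               family = binomial)
    phi <- coef(fit)                                       # (alpha, beta, alpha - delta*tau, beta + delta)
    if (anyNA(phi)) next                                   # boundary j may be rank-deficient
    delta <- phi[[4]] - phi[[2]]
    tau <- (phi[[1]] - phi[[3]]) / delta                   # phi^{-1}: recover tau
    if (is.finite(tau) && tau > t[j] && tau < t[j + 1]) {  # keep only interior maximizers
      best <- keep_if_better(list(alpha = phi[[1]], beta = phi[[2]], tau = tau,
                                  delta = delta, loglik = as.numeric(logLik(fit))), best)
    }
  }
  # Step 2: refit with tau fixed at each interior observation time t_j.
  for (j in 2:(N - 1)) {
    fit <- glm(cbind(y, m - y) ~ t + I(pmax(t - t[j], 0)), family = binomial)
    b <- coef(fit)
    best <- keep_if_better(list(alpha = b[[1]], beta = b[[2]], tau = t[j],
                                delta = b[[3]], loglik = as.numeric(logLik(fit))), best)
  }
  best                                                     # Step 3: overall maximizer
}
```

For example, `fit_joinpoint(y, m, t)$tau` would return the estimated joinpoint; all fits use the same data, so the glm log-likelihoods are directly comparable across steps 1 and 2.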

3 Asymptotic properties

In this section, the consistency and the asymptotic distribution of the maximum likelihood estimator are established. Consider the function $F^{(j)} : \mathbb{R}^N \times \mathbb{R}^{4+q} \to \mathbb{R}^{4+q}$ such that

$F^{(j)}(p, \theta) = \sum_{i=1}^{N} u_i\, z_i^{(j)} \left( p_i - \rho_i^{(j)}(\theta) \right), \quad j = 1, \ldots, N-1$

where $p = [p_1, \ldots, p_N]^\top$ and $p_i \in (0, 1)$ for i = 1,…,N. In particular, we will be interested in values of k such that

$\tau_0 \in [t_k, t_{k+1}].$  (8)

Throughout this paper, k shall be reserved for values such that (8) holds. If τ0 ∈ (tk, tk+1), then k is unique. If $\tau_0 = t_\ell$ for some ℓ, then (8) holds for both $k = \ell - 1$ and $k = \ell$.

Letting $p_0 = [p_{01}, \ldots, p_{0N}]^\top$, the following lemma uses the Implicit Function Theorem (see, for example, [35]) to express ϕ(θ) as a function of p. Proofs of lemmas and theorems are deferred to Appendix A.

Lemma 3.1

Suppose that F(k)(p, θ) = 0 and (8) holds. If Z(k) is full rank, then in an open neighborhood of (p0, θ0), say Bp0, there exists a function g(k) such that g(k)(p) = ϕ(θ) for (p, θ) ∈ Bp0 and the second partial derivatives of g(k) exist and are continuous.

Implicit differentiation can be used to obtain the explicit form for the partial derivatives of g(k). Differentiating both sides of F(k)(p, θ) = 0 with respect to pΤ, we obtain

$(Z^{(k)})^\top U = (Z^{(k)})^\top U\, W^{(k)}(\theta)\, Z^{(k)}\, G^{(k)}(p)$

where $G^{(k)}(p) = \frac{\partial g^{(k)}(p)}{\partial p^\top}$ is the matrix of partial derivatives of each component of $g^{(k)}$ with respect to $p^\top$. If $Z^{(k)}$ is full rank, then we have

$G^{(k)}(p) = \left( (Z^{(k)})^\top U\, W^{(k)}(\theta)\, Z^{(k)} \right)^{-1} (Z^{(k)})^\top U.$

Let $\hat p_n = [y_1/m_1, \ldots, y_N/m_N]^\top$. Lemma 3.1 can be used to show that the maximizer of $\ell_k$ is a strongly consistent estimator of θ0 for any k such that the model is correctly specified. The following lemma states this result for the case when each of the mi's increases linearly, although the linearity can be relaxed to deal with differing rates of divergence. Let $\hat\theta_n^{(k)}$ be as defined by (7).

Lemma 3.2

For sufficiently large $n_0$, suppose there exist constants $0 < c_1 < c_2$ such that $n c_1 \le m_i \le n c_2$ for i = 1,…,N when $n \ge n_0$. If $Z^{(k)}$ is full rank and (8) holds, then $\hat\theta_n^{(k)} \to \theta_0$ a.s. as $n \to \infty$.

In this lemma and the following theorems, N is taken to be fixed. This setting is reasonable for analyzing data retrospectively when there is a large amount of data at each observation time or in each time-specific cluster of observations (see, for example, [7]). Since N is fixed, few assumptions are required to achieve consistent estimators. In practice, the assumption that Z(k) is full rank for values of k such that (8) holds is needed so that the parameters are identifiable. In other settings in which N is allowed to increase to infinity, additional assumptions are required on the spacing of the covariates and/or the location of the change-point parameter to ensure that there is enough data in each part of the input space to estimate all of the parameters consistently. However, the specific assumptions required depend on the particular way that N increases. For discussion about typical assumptions for some other change-point problems, see [2] and [32].

Under the conditions of Lemma 3.2, it is seen that the estimator of θ0 based on k such that the model is correctly specified is consistent. Next, it must be shown that, for each j such that τ0 ∉ [tj, tj+1], the maximum value of lj is less than the maximum value of lk. To prove this, we introduce the saturated likelihood function s : ℝN → ℝ defined by

$s(p) = \sum_{i=1}^{N} \left\{ \ln\!\left( \frac{p_i}{1 - p_i} \right) y_i + m_i \ln(1 - p_i) \right\}$  (9)

which provides a parameter for each distinct set of observed covariates. Using this concept, it can be shown that the maximum likelihood estimator (maximizer of l) is strongly consistent. As before, let θ̂n denote the maximum likelihood estimator of θ0.

Theorem 3.1

Under the conditions of Lemma 3.2 for each k such that (8) holds, it follows that θ̂nθ0 a.s. as n→∞.

Next, we examine the asymptotic distribution of θ̂n. Since consistency has been established, it suffices to consider the distribution(s) of $\hat\theta_n^{(k)}$ with k given by (8).

There are two cases which must be examined. First consider the more complicated one when $\tau_0 = t_\ell$ for some ℓ. In this case, we must consider the joint behavior of $\hat\theta_n^{(\ell-1)}$ and $\hat\theta_n^{(\ell)}$. Denote by $\Phi(\omega) = \frac{\partial \phi^{-1}(\omega)}{\partial \omega^\top}$ the $(q+4) \times (q+4)$ matrix of partial derivatives of the components of $\phi^{-1}$. Let $\mathcal{N}(0, S)$ denote the (multivariate) normal distribution with zero mean vector and covariance matrix S. Also, let ⇒ denote convergence in distribution. The following theorem proves the asymptotic normality of the joint distribution of these quantities. The proof is given in Appendix A.
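To make Φ concrete, note that the map φ defined in Section 2.1 can be inverted in closed form; the following routine calculation (included here only for illustration, for the case q = 0) gives

$\phi^{-1}(\omega) = \begin{bmatrix} \omega_1 \\ \omega_2 \\ (\omega_1 - \omega_3)/(\omega_4 - \omega_2) \\ \omega_4 - \omega_2 \end{bmatrix}, \qquad \Phi(\omega) = \frac{\partial \phi^{-1}(\omega)}{\partial \omega^\top} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1/\delta & \tau/\delta & -1/\delta & -\tau/\delta \\ 0 & -1 & 0 & 1 \end{bmatrix},$

where $\delta = \omega_4 - \omega_2$ and $\tau = (\omega_1 - \omega_3)/(\omega_4 - \omega_2)$; for q > 0, the component γ = ω5 simply contributes an identity block. This is the matrix that enters the delta-method covariances below.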

Theorem 3.2

For sufficiently large n, suppose there exist constants $0 < c_1 < c_2$ such that $n c_1 \le m_i \le n c_2$. If $\tau_0 = t_\ell$ for some ℓ and $Z^{(k)}$ is full rank for $k = \ell - 1$ and $k = \ell$, then

$\sqrt{n}\left( \begin{bmatrix} \hat\theta_n^{(\ell-1)} \\ \hat\theta_n^{(\ell)} \end{bmatrix} - \begin{bmatrix} \theta_0 \\ \theta_0 \end{bmatrix} \right) \Rightarrow \mathcal{N}\!\left(0,\ \psi\, G\, U^{-1} W^{(\ell)} G^\top \psi^\top \right)$  (10)

as n → ∞, where $G = G(p_0) = \begin{bmatrix} G^{(\ell-1)}(p_0) \\ G^{(\ell)}(p_0) \end{bmatrix}$, $\psi = \begin{bmatrix} \Phi^{(\ell-1)} & O \\ O & \Phi^{(\ell)} \end{bmatrix}$, $\Phi^{(k)} = \Phi(g^{(k)}(p_0))$ and $W^{(k)} = W^{(k)}(\theta_0)$ for $k = \ell - 1$ and $k = \ell$, and U is defined by (6).

The result for the second case where τ0 ∈ (tk, tk+1) for some k is given in Theorem 3.3. The proof is omitted since it can be proven in a manner similar to Theorem 3.2. Note that $\hat\theta_n = \hat\theta_n^{(k)}$ a.s. when n is sufficiently large, as shown in the proof of Theorem 3.1.

Theorem 3.3

For sufficiently large n, suppose there exist constants $0 < c_1 < c_2$ such that $n c_1 \le m_i \le n c_2$. If τ0 ∈ (tk, tk+1) for some k and $Z^{(k)}$ is full rank, then

$\sqrt{n}\,(\hat\theta_n - \theta_0) \Rightarrow \mathcal{N}\!\left(0,\ \Phi^{(k)} G^{(k)} U^{-1} W^{(k)} (G^{(k)})^\top (\Phi^{(k)})^\top \right)$

as n → ∞, where $G^{(k)} = G^{(k)}(p_0)$, $\Phi^{(k)} = \Phi(g^{(k)}(p_0))$, $W^{(k)} = W^{(k)}(\theta_0)$, and U is defined by (6).
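As an illustration of how this covariance can be evaluated in practice, the following R sketch (q = 0; asym_cov is a hypothetical helper name, not part of ljr) plugs fitted parameters into $\Phi^{(k)} G^{(k)} U^{-1} W^{(k)} (G^{(k)})^\top (\Phi^{(k)})^\top$ using the explicit form of Φ given after its definition above:

```r
# Sketch: plug-in estimate of the asymptotic covariance in Theorem 3.3 (q = 0).
# 'k' is the index with the fitted joinpoint tau in (t_k, t_{k+1}).
asym_cov <- function(alpha, beta, delta, tau, m, t, k) {
  N <- length(t); n <- sum(m)
  v <- as.numeric(seq_len(N) > k)
  Z <- cbind(1 - v, (1 - v) * t, v, v * t)               # design matrix Z^{(k)}
  rho <- plogis(alpha + beta * t + delta * pmax(t - tau, 0))
  U <- diag(m / n)                                       # u_i = m_i / n
  W <- diag(rho * (1 - rho))
  G <- solve(t(Z) %*% U %*% W %*% Z, t(Z) %*% U)         # G^{(k)}(p)
  Phi <- rbind(c(1, 0, 0, 0),                            # Jacobian of phi^{-1}: rows are
               c(0, 1, 0, 0),                            # d(alpha, beta, tau, delta)/d omega
               c(1, tau, -1, -tau) / delta,
               c(0, -1, 0, 1))
  Phi %*% G %*% solve(U) %*% W %*% t(G) %*% t(Phi)       # covariance of sqrt(n)(theta_hat - theta_0)
}
```

Dividing the result by n and taking square roots of the diagonal would give approximate standard errors for $\hat\alpha_n$, $\hat\beta_n$, $\hat\tau_n$, and $\hat\delta_n$.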

4 Consistency of Bootstrap Confidence Bounds

Suppose we use the parametric bootstrap to generate Binomial random variables $Y_{ib}^*$ with sizes $m_i$ and probabilities of success $\hat p_{0i}$ for i = 1,…,N and b = 1,…,B, where B is the number of bootstrap samples. Set $Y_b^* = [Y_{1b}^*, \ldots, Y_{Nb}^*]^\top$ and let $\hat\theta_{nb}^*$ denote the bth bootstrap replication of θ̂n, with P and P* being the probability measures under $\theta_0$ and $\hat\theta_n$, respectively.

4.1 Consistency when $\tau_0 \ne t_\ell$ for any ℓ

We shall consider the consistency of the bootstrap percentile method. For simplicity, we describe the one-sided interval for a parameter ζ0, but the results extend to two-sided intervals with proper modifications. Set $K_B(x) = P^*(\hat\zeta_n^* \le x)$. Then the bootstrap estimator of the upper bound of the α-level one-sided percentile confidence interval for ζ0 is $\zeta_{BP} = K_B^{-1}(\alpha)$. The following lemma gives sufficient conditions for the consistency of bootstrap confidence intervals of this form. For a proof, see Theorem 7.9 of [39].

Lemma 4.1

Let $H_n(x) = P\bigl(\sqrt{n}(\hat\zeta_n - \zeta_0) \le x\bigr)$ and $\hat H_B(x) = P^*\bigl(\sqrt{n}(\hat\zeta_n^* - \hat\zeta_n) \le x\bigr)$. Suppose that

  1. $\sup_x |H_n(x) - \hat H_B(x)| = o_p(1)$

  2. $\sup_x |H_n(x) - \Psi(x)| = o(1)$ for some distribution function Ψ(x) that is continuous, strictly increasing, and symmetric about zero.

Then PBP ≤ ζ0) → 1 − α as n → ∞.

Thus, in order to prove consistency of bootstrap confidence intervals for $a^\top \theta_0$, where a is a fixed non-zero (q + 4)-dimensional vector, we need to check conditions (1) and (2). If τ0 ∈ (tk, tk+1), Theorem 3.3 verifies condition (2) of Lemma 4.1. So, for the case when τ0 ∈ (tk, tk+1), we next verify condition (1) in the following result.

Theorem 4.1

Under the conditions of Theorem 3.3, it follows that

$\sup_x \left| P\!\left( \sqrt{n}\,(a^\top \hat\theta_n - a^\top \theta_0) \le x \right) - P^*\!\left( \sqrt{n}\,(a^\top \hat\theta_n^* - a^\top \hat\theta_n) \le x \right) \right| \to 0 \quad \text{a.s.}$  (11)

as n → ∞.

Hence, from the last theorem, we see that the conditions for bootstrap confidence interval consistency given in Lemma 4.1 hold when τ0 ∈ (tk, tk+1) since (11) implies convergence in probability. Continuity of Ψ will not hold if $\tau_0 = t_\ell$ for some ℓ, as shown by Theorem 3.2.

4.2 Simulation when $\tau_0 = t_\ell$ for some ℓ

To illustrate the behavior of the bootstrap estimates and confidence bounds when $\tau_0 = t_\ell$ for some ℓ, we perform the following simulation study. Suppose that we have equally-spaced observation times ti = i for i = 1,…,N = 7, no additional covariates (q = 0), true coefficient values α0 = β0 = 0 and δ0 = 0.2, and a joinpoint at τ0 = 4 for the model specified by (1) and (2). For various choices of m1 = … = m7, we simulate R = 1,000,000 data sets and compute the estimate of τ0 and the bootstrap estimate of τ0 for each data set.
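For instance, one data set from this design can be generated in R as follows (a sketch only; the study itself used the ljr package):

```r
# One simulated data set from Section 4.2: alpha0 = beta0 = 0, delta0 = 0.2, tau0 = 4.
t <- 1:7
m <- rep(10^3, 7)                               # cluster sizes m_1 = ... = m_7
p <- plogis(0 + 0 * t + 0.2 * pmax(t - 4, 0))   # true success probabilities from (1)-(2)
y <- rbinom(length(t), size = m, prob = p)      # Binomial counts
```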

One simple way to see that Theorem 4.1 does not hold for this case is to use our simulation to estimate the proportion of times that τ̂n equals τ0 = 4 and compare this with the proportion of times that the bootstrap estimate $\hat\tau_n^*$ equals τ0. These quantities, denoted by $\hat P(\hat\tau_n = 4)$ and $\hat P^*(\hat\tau_n^* = 4)$, as well as empirical estimates of the mean square error $\widehat{\mathrm{MSE}}(\hat\tau_n) = E[(\hat\tau_n - \tau_0)^2]$ and the bootstrap estimate $\widehat{\mathrm{MSE}}^*(\hat\tau_n^*) = E^*[(\hat\tau_n^* - \hat\tau_n)^2]$ of this quantity, are reported in Table 1 for $m_i = 10^3$, $10^5$, $10^7$, and $10^9$ using the contributed R package ljr [6]. Clearly, the bootstrap underestimates the true probability that τ̂n = 4, as seen in columns 2 and 4 of Table 1, and overestimates the true MSE of τ̂n, as seen in columns 3 and 5.

Table 1.

Simulation results for the bootstrap method applied to the MLE.

$m_i$, $i = 1,\ldots,7$ | $\hat P(\hat\tau_n = \tau_0)$ | $\widehat{\mathrm{MSE}}(\hat\tau_n)$ | $\hat P^*(\hat\tau_n^* = \tau_0)$ | $\widehat{\mathrm{MSE}}^*(\hat\tau_n^*)$

$10^3$ | .2007 | $3.967 \times 10^{-1}$ | .1097 | $4.779 \times 10^{-1}$
$10^5$ | .2023 | $2.792 \times 10^{-3}$ | .1157 | $3.545 \times 10^{-3}$
$10^7$ | .2020 | $2.899 \times 10^{-5}$ | .1155 | $3.698 \times 10^{-5}$
$10^9$ | .2020 | $3.122 \times 10^{-7}$ | .1160 | $3.972 \times 10^{-7}$

The estimate $\hat P(\hat\tau_n = 4) = .2020$ when $m_i = 10^9$ agrees with the theoretical value suggested by Theorem 3.2. As n → ∞, note that

$\sqrt{n}\left( \begin{bmatrix} \hat\tau_n^{(3)} \\ \hat\tau_n^{(4)} \end{bmatrix} - \begin{bmatrix} 4 \\ 4 \end{bmatrix} \right) \Rightarrow \mathcal{N}(0, S)$

where

$A = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$

and

$S = A\, \psi\, G\, U^{-1} W^{(4)} G^\top \psi^\top A^\top = \begin{bmatrix} 2127.2073 & 638.1622 \\ 638.1622 & 2167.1859 \end{bmatrix}.$

Thus, for large n, we have

$P(\hat\tau_n = 4) = P\!\left( \hat\tau_n^{(3)} > 4 \ \text{and}\ \hat\tau_n^{(4)} < 4 \right) \approx .2019703;$

this bivariate normal probability can be computed using the contributed R package mvtnorm [17].
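For example, using the covariance matrix S above, this orthant probability can be evaluated with mvtnorm::pmvnorm roughly as follows (a sketch):

```r
library(mvtnorm)
S <- matrix(c(2127.2073, 638.1622, 638.1622, 2167.1859), nrow = 2)
# P(Z1 > 0, Z2 < 0) for (Z1, Z2) ~ N(0, S); the probability is scale-invariant,
# so S does not need to be normalized.
pmvnorm(lower = c(0, -Inf), upper = c(Inf, 0), mean = c(0, 0), sigma = S)
```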

4.3 Removal Algorithm

As a remedy for the lack of consistency of the parametric bootstrap when τ0 = ti for some i, we now discuss a modified estimator of τ0 for which bootstrap consistency holds. First, compute θ̂n. Then remove the observation(s) at the observation time $t_\ell$ which is closest to τ̂n and re-fit the MLE without the observation(s), denoting the result (marked throughout with a superscript −) as

$\hat\theta_n^- = [\hat\alpha_n^-, \hat\beta_n^-, \hat\tau_n^-, \hat\delta_n^-, (\hat\gamma_n^-)^\top]^\top.$

If the matrix $Z^{(k)}$, with k defined by (8), is still full rank without the observation(s) and mi satisfies the condition in Lemma 3.2, then bootstrap consistency holds for this algorithm even if $\tau_0 = t_\ell$ for some ℓ (since these observations will be removed with probability 1 as n → ∞).
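A sketch of the removal step in R, reusing the hypothetical fit_joinpoint() helper sketched in Section 2.1:

```r
# Sketch of the removal algorithm: fit, drop the observation(s) at the time
# closest to the estimated joinpoint, and refit without them.
fit_removal <- function(y, m, t) {
  fit <- fit_joinpoint(y, m, t)
  drop <- which(abs(t - fit$tau) == min(abs(t - fit$tau)))  # closest observation time(s)
  fit_joinpoint(y[-drop], m[-drop], t[-drop])
}
```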

As an illustration of the effect of using this algorithm, we use the same simulated data sets considered in Section 4.2 and compare the results for the bootstrap based on the method of maximum likelihood with the bootstrap based on the removal algorithm. Letting $\hat\tau(.95)$ and $\hat\tau^-(.95)$ represent the 95th percentiles of the empirical distributions of $\hat\tau_n - \tau_0$ and $\hat\tau_n^- - \tau_0$, Table 2 reports the estimated tail probabilities $\hat P^*(\hat\tau_n^* - \hat\tau_n \ge \hat\tau(.95))$ and $\hat P^*((\hat\tau_n^-)^* - \hat\tau_n^- \ge \hat\tau^-(.95))$ for the bootstrap method. Clearly, the bootstrap overestimates the 95th percentile of the distribution of $\hat\tau_n - \tau_0$, but provides a good estimate of that of $\hat\tau_n^- - \tau_0$ even for relatively small sample sizes. However, the tradeoff for obtaining accurate confidence bounds is the loss of information caused by discarding part of the data. This can also be seen in Table 2 by observing that the estimates of the mean square errors for $\hat\tau_n^-$ are higher than those for $\hat\tau_n$.

Table 2.

Comparison of results for the MLE and the estimates based on the removal algorithm.

$m_i$, $i = 1,\ldots,7$ | $\hat P^*(\hat\tau_n^* - \hat\tau_n \ge \hat\tau(.95))$ | $\widehat{\mathrm{MSE}}(\hat\tau_n)$ | $\hat P^*((\hat\tau_n^-)^* - \hat\tau_n^- \ge \hat\tau^-(.95))$ | $\widehat{\mathrm{MSE}}(\hat\tau_n^-)$

$10^3$ | .0625 | $3.967 \times 10^{-1}$ | .0496 | $7.910 \times 10^{-1}$
$10^5$ | .0670 | $2.792 \times 10^{-3}$ | .0500 | $4.738 \times 10^{-3}$
$10^7$ | .0676 | $2.899 \times 10^{-5}$ | .0499 | $4.730 \times 10^{-5}$
$10^9$ | .0676 | $3.122 \times 10^{-7}$ | .0499 | $5.061 \times 10^{-7}$

5 Example

We now apply the method of clustered logistic joinpoint regression to model yearly cancer mortality in the United States for individuals age 65 and over during the period 1979–1998. The data set was obtained from the CDC WONDER database [5], and it includes yi – the number of deaths in the ith observed year due to neoplasms (ICD-9 codes 140–239), mi – the population during the ith observed year, and ti = 1978.5 + i – the midpoint of the ith observed year. So, we use the model given by (1) and (2) with q = 0. The observed cancer mortality rates yi/mi are plotted versus time in Figure 1.

Figure 1. Observed US yearly cancer mortality rates for individuals age 65 and over. The solid line gives the fitted model based on all of the data. The dashed line gives the fitted model with the 15th observation (where t15 = 1993.5) removed.

Table 3 gives the parameter estimates for each of the unknown parameters and the fitted probabilities based on maximum likelihood are illustrated by the solid curve in Figure 1. The estimated joinpoint τ̂n = 1993.686 supports previous findings which attribute the decrease in cancer mortality to improvements in prevention, detection, and treatments [19].

Table 3.

Parameter estimates in the logistic joinpoint regression model for the US yearly 65+ cancer mortality data.

Parameter | Estimate (full data) | Value | Estimate (observation 15 removed) | Value

$\alpha_0$ | $\hat\alpha_n$ | −23.17397 | $\hat\alpha_n^-$ | −23.44145
$\beta_0$ | $\hat\beta_n$ | .0093903 | $\hat\beta_n^-$ | .0095251
$\delta_0$ | $\hat\delta_n$ | −.0140205 | $\hat\delta_n^-$ | −.0141553
$\tau_0$ | $\hat\tau_n$ | 1993.686 | $\hat\tau_n^-$ | 1993.595

Next, we obtain estimated 95% confidence bounds for τ based on the parametric bootstrap. Using the estimates listed in Table 3 as our model parameters, we generate R = 100000 bootstrap samples and compute the estimate of τ for each sample. Then our estimated confidence bounds are the 2.5 and 97.5 percentiles of the bootstrap distribution of τ. In this manner, we obtain the confidence bounds (1993.371, 1993.930).
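A sketch of this parametric bootstrap computation in R (illustration only; the analysis in the paper used the ljr package, and fit_joinpoint() is the hypothetical helper sketched in Section 2.1):

```r
# Parametric bootstrap percentile interval for tau.
boot_ci_tau <- function(y, m, t, B = 1000, level = 0.95) {
  fit <- fit_joinpoint(y, m, t)
  p_hat <- plogis(fit$alpha + fit$beta * t + fit$delta * pmax(t - fit$tau, 0))
  tau_star <- replicate(B, {
    y_star <- rbinom(length(m), size = m, prob = p_hat)   # resample from the fitted model
    fit_joinpoint(y_star, m, t)$tau                       # re-estimate the joinpoint
  })
  quantile(tau_star, probs = c((1 - level) / 2, (1 + level) / 2))
}
```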

Note that the 15th observed year t15 = 1993.5 falls within this interval. Thus, there might be a problem with consistency of the bootstrap as discussed in Section 4. We can attempt to remedy this possible bias by repeating our analysis with i = 15 removed from the data set. When the model is fit without the 15th observation, the estimate of the joinpoint changes slightly to $\hat\tau_n^- = 1993.595$. The estimated coefficients change very little (see Table 3), and the fitted probabilities based on the removal algorithm are illustrated by the dashed curve in Figure 1.

The 95% estimated confidence bounds with observation 15 omitted are (1993.318, 1993.851). While the interval still contains 1993.5, this value is no longer an observation time in our data set since it has been removed; thus, it seems reasonable to claim that we are unlikely to be dealing with a situation where $\tau_0 = t_\ell$ for some ℓ, and the consistency of the bootstrap confidence bounds is more plausible.

6 Summary and Conclusion

After presenting details necessary for the computation of the maximizer of the super log-likelihood functions and the MLE of the parameters in the clustered logistic joinpoint model, we considered the asymptotic properties of these estimators. Sufficient conditions for the consistency of the MLE were given. Asymptotic normality of the MLE was also shown under the same conditions, provided that the true location of the joinpoint is not at one of the observation times. Under this proviso we also showed the consistency of the bootstrap confidence bounds.

However, if the true location of the joinpoint was one of the observation times, then it was shown that the joint distribution of the super log-likelihood maximizers corresponding to the neighboring intervals was asymptotically normal. A simulation study was performed to illustrate the lack of consistency of the bootstrap method in generating confidence bounds in this case due to asymptotic bias. It was also shown that we could remove this bias and obtain consistent estimates via a removal algorithm at the cost of a higher MSE.

Finally, the model and the methods were used to analyze yearly cancer mortality in the United States for individuals age 65 and over. The bootstrap confidence interval included one observation time, so the model was refit without that observation. There is only a slight change in the resulting fitted model, but the consistency of the bootstrap is more plausible with the second fit. An R package ljr [6] capable of fitting these models is available in the contributed packages at http://www.R-project.org [34].

A Proofs

Proof of Lemma 3.1

Since $F^{(k)}$ is infinitely differentiable and $F^{(k)}(p_0, \theta_0) = 0$, it remains to verify that the determinant of $\left.\frac{\partial F^{(k)}}{\partial \phi^\top}\right|_{(p_0, \theta_0)} = -(Z^{(k)})^\top D\, Z^{(k)}$ is not zero in order to apply the Implicit Function Theorem. Here $D = U W^{(k)}(\theta_0)$ is an N × N diagonal matrix with diagonal elements $d_i = m_i p_{0i}(1 - p_{0i})/n$ for i = 1,…,N. Since $d_i \in (0, 1)$, $\left.\frac{\partial F^{(k)}}{\partial \phi^\top}\right|_{(p_0, \theta_0)}$ has a nonzero determinant if and only if $Z^{(k)}$ is full rank.

Proof of Lemma 3.2

The strong law of large numbers implies that $\hat p_n \to p_0$ a.s. as n → ∞. For all θ, the left side of (5) converges to $F^{(k)}(p_0, \theta)$ a.s. The continuity of $F^{(k)}$ implies that

$0 = \lim_{n\to\infty} F^{(k)}\bigl(\hat p_n, \hat\theta_n^{(k)}\bigr) = F^{(k)}\bigl(p_0,\ \lim_{n\to\infty} \hat\theta_n^{(k)}\bigr).$

It is clear that θ0 is the unique value of θ such that $F^{(k)}(p_0, \theta) = 0$ by the same argument as the one used to show that the solution to (5) is the unique maximizer of $\ell_j$ when $y_i/m_i$ is replaced by $p_i$. Thus, it follows that $\hat\theta_n^{(k)} \to \theta_0$ a.s. as n → ∞.

Proof of Theorem 3.1

In view of Lemma 3.2, it suffices to show that, with probability 1, $\ell(\hat\theta_n^{(k)}) > \ell(\hat\theta_n^{(j)})$ if τ0 ∈ [tk, tk+1] but τ0 ∉ [tj, tj+1]. From Lemma 3.2, we have $\hat\theta_n^{(k)} \to \theta_0$ a.s. Using the continuity of $\rho_i^{(k)}$, it follows that $\rho_i^{(k)}(\hat\theta_n^{(k)}) \to \rho_i^{(k)}(\theta_0) = p_{0i}$ a.s. for i = 1,…,N. Hence, we have $\ell(\hat\theta_n^{(k)}) \to s(p_0)$ a.s. as n → ∞.

Let $\mathcal{S} = \{(p_1, \ldots, p_N) \mid p_i \in [0, 1],\ i = 1, \ldots, N\}$ be the set of points considered by the saturated model and let

$\mathcal{A}_k = \{(p_1, \ldots, p_N) \in \mathcal{S} \mid \tau \in [t_k, t_{k+1}],\ p_i = \rho_i^{(k)}(\theta),\ i = 1, \ldots, N\}$

be the set of fitted probabilities corresponding to restricting τ ∈ [tk, tk+1]. Clearly $\mathcal{A}_k \subseteq \mathcal{S}$ for all k. Furthermore, $p_0 \in \mathcal{A}_k$ if τ0 ∈ [tk, tk+1] and $p_0 \notin \mathcal{A}_j$ if τ0 ∉ [tj, tj+1]. Thus, $\max_{\theta \in \mathcal{A}_j} \ell(\theta) < s(p_0)$ and $\ell(\hat\theta_n^{(k)}) \to s(p_0)$ a.s., so that $\max_{\theta \in \mathcal{A}_j} \ell(\theta) < \max_{\theta \in \mathcal{A}_k} \ell(\theta)$ with probability 1.

Consequently, if k is unique, then τ̂n ∈ (tk, tk+1) a.s. as n → ∞. If k is not unique, then τ̂n ∈ (tℓ−1, tℓ+1) a.s. as n → ∞, and we have $\hat\theta_n^{(\ell-1)} \to \theta_0$ and $\hat\theta_n^{(\ell)} \to \theta_0$ a.s. as n → ∞ by Lemma 3.2. In either case, it follows that θ̂n → θ0 a.s. as n → ∞.

Proof of Theorem 3.2

Take any $a \in \mathbb{R}^{2(q+4)}$ and let $g(p) = \begin{bmatrix} g^{(\ell-1)}(p) \\ g^{(\ell)}(p) \end{bmatrix}$. Using a multivariate Taylor series expansion, we obtain

$a^\top\left( \begin{bmatrix} \phi(\hat\theta_n^{(\ell-1)}) \\ \phi(\hat\theta_n^{(\ell)}) \end{bmatrix} - \begin{bmatrix} \phi(\theta_0) \\ \phi(\theta_0) \end{bmatrix} \right) = a^\top\bigl( g(\hat p_n) - g(p_0) \bigr) = a^\top G(p_0)(\hat p_n - p_0) + \tfrac{1}{2}(\hat p_n - p_0)^\top H(\tilde p_n)(\hat p_n - p_0)$  (12)

where $H(\tilde p_n) = \frac{\partial\, a^\top G(\tilde p_n)}{\partial p}$ is the Hessian matrix of $a^\top g$ evaluated at a point $\tilde p_n$ on the segment connecting $\hat p_n$ and $p_0$. Note that

$\sqrt{n}\,(\hat p_n - p_0) \Rightarrow \mathcal{N}\bigl(0,\ U^{-1} W^{(\ell)}\bigr)$  (13)

as n → ∞. Since H(n) is symmetric, we can apply the singular value decomposition to obtain H(n) = C(n)Λ(n)C(n)Τ where C(·) is orthogonal and Λ(·) is diagonal with entries λ1 ≥ … ≥ λ2(q+4). Note that λ1 is bounded as n → ∞. Thus, (13) implies that

$\frac{\sqrt{n}}{2}(\hat p_n - p_0)^\top H(\tilde p_n)(\hat p_n - p_0) \le \frac{\lambda_1}{\sqrt{n}} \left\| \sqrt{n}\,(\hat p_n - p_0) \right\|^2 = o_P(1).$  (14)

Hence, (12), (14), and Slutsky’s Theorem (see [3]) imply that

$\sqrt{n}\, a^\top\left( \begin{bmatrix} \phi(\hat\theta_n^{(\ell-1)}) \\ \phi(\hat\theta_n^{(\ell)}) \end{bmatrix} - \begin{bmatrix} \phi(\theta_0) \\ \phi(\theta_0) \end{bmatrix} \right) \Rightarrow \mathcal{N}\bigl(0,\ a^\top G\, U^{-1} W^{(\ell)} G^\top a\bigr)$

as n → ∞. So, the Cramér-Wold Criterion (see, for example, [38]) implies that

$\sqrt{n}\left( \begin{bmatrix} \phi(\hat\theta_n^{(\ell-1)}) \\ \phi(\hat\theta_n^{(\ell)}) \end{bmatrix} - \begin{bmatrix} \phi(\theta_0) \\ \phi(\theta_0) \end{bmatrix} \right) \Rightarrow \mathcal{N}\bigl(0,\ G\, U^{-1} W^{(\ell)} G^\top\bigr)$  (15)

as n → ∞. Sinceθ^n(k)=ϕ1(ϕ(θ^n(k)))for k =−1, andΦ(ω)=ϕ1(ω)ωΤ,, a similar argument can be used to show that (15) implies (10).

Proof of Theorem 4.1

Simple modifications can be made to the arguments given in Theorem 3.2 to show that

$\sqrt{n}\, S_n^{-1/2}(\hat\theta_n^* - \hat\theta_n) \Rightarrow \mathcal{N}(0, I) \quad \text{a.s.}$

as n → ∞, where

$S_n = \Phi\bigl(g^{(k)}(\hat p_n)\bigr)\, G^{(k)}(\hat p_n)\, U^{-1} W^{(k)}(\hat\theta_n)\, \bigl(G^{(k)}(\hat p_n)\bigr)^\top \bigl(\Phi(g^{(k)}(\hat p_n))\bigr)^\top.$

As n → ∞, we have $\hat p_n \to p_0$ a.s. and θ̂n → θ0 a.s. so that $S_n \to S_0$ a.s., where

$S_0 = \Phi^{(k)} G^{(k)} U^{-1} W^{(k)} (G^{(k)})^\top (\Phi^{(k)})^\top.$

Thus, by Slutsky’s Theorem, we have

$\sqrt{n}\,(\hat\theta_n^* - \hat\theta_n) = S_n^{1/2}\left( \sqrt{n}\, S_n^{-1/2}(\hat\theta_n^* - \hat\theta_n) \right) \Rightarrow S_0^{1/2}\, \mathcal{N}(0, I) \quad \text{a.s.}$

as n → ∞. Thus for any non zero vector aRq+4 with probability one naΤ(θ^nθ0)andnaΤ(θ^n*θ^n)have the same weak limit. The triangle inequality and the fact that in this case the weak convergence is uniform, yield (11).

Acknowledgement

The research was partially sponsored by the National Cancer Institute under grant R15 CA106248-02. The authors thank an anonymous reviewer and the associate editor for their helpful comments which helped to improve the paper.


References

1. Andrews DWK. Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica. 2000;68(2):399–405.
2. Bai J. Estimation of a change point in multiple regression models. The Review of Economics and Statistics. 1997;79(4):551–563.
3. Bilodeau M, Brenner D. Theory of Multivariate Statistics. New York: Springer-Verlag; 1999.
4. Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statistics in Medicine. 2000;19:1141–1164. doi: 10.1002/(sici)1097-0258(20000515)19:9<1141::aid-sim479>3.0.co;2-f.
5. Centers for Disease Control and Prevention, National Center for Health Statistics. Compressed Mortality File 1979–1998. CDC WONDER On-line Database, compiled from Compressed Mortality File CMF 1968–1988, Series 20, No. 2A, 2000 and CMF 1989–1998, Series 20, No. 2E. 2003. Accessed at http://wonder.cdc.gov/cmf-icd9.html on March 23, 2008.
6. Czajkowski M, Gill R, Rempala G. ljr: Logistic Joinpoint Regression. 2007. R package version 1.0-1.
7. Czajkowski M, Gill R, Rempala G. Model selection in logistic joinpoint regression with applications to analyzing cohort mortality patterns. Statistics in Medicine. 2008;27:1508–1526. doi: 10.1002/sim.3017.
8. Chen J, Gupta AK. Parametric Statistical Change Point Analysis. Boston: Birkhauser; 2000.
9. Davison AC, Hinkley DV. Bootstrap Methods and their Applications. Cambridge: Cambridge University Press; 1997.
10. de Boor C. A Practical Guide to Splines. Berlin: Springer; 1978.
11. Dierckx P. Curve and Surface Fitting with Splines. Oxford: Clarendon; 1993.
12. Efron B, Tibshirani R. An Introduction to the Bootstrap. New York: Chapman and Hall; 1993.
13. Eilers PHC, Marx BD. Flexible smoothing with B-splines and penalties. Statistical Science. 1996;11(2):89–102.
14. Eubank RL. Spline Smoothing and Nonparametric Regression. New York: Marcel Dekker; 1988.
15. Feder PI. On asymptotic distribution theory in segmented regression problems – identified case. Annals of Statistics. 1975;3:49–83.
16. Gallant AR, Fuller WA. Fitting segmented polynomial regression models whose joinpoints have to be estimated. Journal of the American Statistical Association. 1973;68:144–147.
17. Genz A, Bretz F, Hothorn T. mvtnorm: Multivariate Normal and T Distribution. 2008. R package version 0.8-3.
18. Gössl C, Küchenhoff H. Bayesian analysis of logistic regression with an unknown change point and covariate measurement error. Statistics in Medicine. 2001;20:3109–3121. doi: 10.1002/sim.928.
19. Grady D. U.S. Cancer Death Rates Are Found to Be Falling. New York Times. 2007 October 15. Accessed at http://www.nytimes.com/2007/10/15/us/15cancer.html on March 23, 2008.
20. Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models. London: Chapman & Hall; 1994.
21. Gu C. Smoothing Spline ANOVA Models. New York: Springer; 2002.
22. Hall P. The Bootstrap and Edgeworth Expansion. New York: Springer-Verlag; 1992.
23. Hastie T, Tibshirani R, Friedman J. Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer; 2001.
24. Hušková M, Picek J. Bootstrap in detection of changes in linear regression. Sankhyā. 2005;67:200–226.
25. Jandhyala VK, MacNeill IB. Tests for parameter changes at unknown times in linear regression models. Journal of Statistical Planning and Inference. 1991;27:291–316.
26. Jandhyala VK, MacNeill IB. Iterated partial sum sequences of regression residuals and tests for changepoints with continuity constraints. Journal of the Royal Statistical Society B. 1997;59:147–156.
27. Jarušková D. Testing appearance of linear trend. Journal of Statistical Planning and Inference. 1998;70:263–276.
28. Kim H-J, Fay MP, Feuer EJ, Midthune DN. Permutation tests for joinpoint regression with applications to cancer rates. Statistics in Medicine. 2000;19:335–351. doi: 10.1002/(sici)1097-0258(20000215)19:3<335::aid-sim336>3.0.co;2-z.
29. Kim H-J, Fay MP, Yu B, Barrett MJ, Feuer EJ. Comparability of segmented regression models. Biometrics. 2004;60:1005–1014. doi: 10.1111/j.0006-341X.2004.00256.x.
30. Kim H-J, Yu B, Feuer EJ. Selecting the number of change-points in segmented linear regression. Statistica Sinica. Preprint.
31. Kim J, Kim H-J. Asymptotic results in segmented multiple regression. Journal of Multivariate Analysis. Preprint.
32. Liu J, Wu S, Zidek JV. On segmented multivariate regression. Statistica Sinica. 1997;7:497–525.
33. Pawitan Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford: Oxford University Press; 2001.
34. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2008. ISBN 3-900051-07-0, URL http://www.R-project.org.
35. Rempala GA, Szatzschneider K. Bootstrapping parametric models of mortality. Scandinavian Actuarial Journal. 2004;1:53–78.
36. Ruppert D. Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics. 2002;11(4):735–757.
37. Seber GAF, Wild CJ. Nonlinear Regression. New York: Wiley; 1989.
38. Serfling RJ. Approximation Theorems of Mathematical Statistics. New York: Wiley; 1980.
39. Shao J. Mathematical Statistics. New York: Springer; 1999.
40. Tiwari RC, Cronin KA, Davis W, Feuer EJ. Bayesian model selection for join point regression with application to age-adjusted cancer rates. Applied Statistics. 2005;54:919–939.
41. Ulm K. A statistical method for assessing a threshold in epidemiological studies. Statistics in Medicine. 1991;10:341–349. doi: 10.1002/sim.4780100306.
42. Wahba G. Spline Models for Observational Data. Philadelphia: Society for Industrial and Applied Mathematics; 1990.
43. Yao Y-C. Estimating the number of change-points via Schwarz’ criterion. Statistics and Probability Letters. 1988;6:181–189.
