Abstract
This paper considers nonparametric estimation of the mean function of a counting process based on periodic observations, i.e., panel counts. We present estimators derived through minimizing a class of generalized sums of squares subject to a monotonicity constraint. We establish consistency of the estimators and provide procedures to implement them with various weight functions. For specific weight functions, they reduce to the estimator given in Sun and Kalbfleisch (1995), and are closely related to the nonparametric maximum likelihood estimator studied in Wellner and Zhang (2000). With other weight functions, the proposed estimators provide alternatives that can have better efficiency in non-Poisson situations than previous approaches. Simulations are used to examine the finite-sample performance of the proposed estimators.
Keywords: Isotonic regression, monotonicity constraint, periodic observations
1. Introduction
Suppose that the number of events that occur until time t > 0 is N(t), and we wish to estimate the marginal expectation E{N(t)} without specifying a probability model for the counting process {N(t), t > 0}. See Lawless and Nadeau (1995), Lin, Wei, Yang and Ying (2000), and Wellner and Zhang (2000) for examples of this setup.
When {N(t), t > 0} is a continuously-observed Poisson process subject to right-censoring, the Nelson-Aalen estimator (cf: Andersen, Borgan, Gill, and Keiding (1992), Chap.IV) is commonly used for estimating the cumulative intensity function of the counting process, which is the same as its mean function Λ0(t) = E{N (t)} (e.g., Lawless (1995)). Lawless and Nadeau (1995) and Lin, Wei, Yang, and Ying (2000) indicate that the Nelson-Aalen estimator is a consistent estimator for the marginal mean function of a counting process regardless of the underlying probability model.
When the process for an individual is only observed at a finite set of time points (that is, the observations are panel counts), Sun and Kalbfleisch (1995) present an estimator of the mean function based on isotonic regression, and Wellner and Zhang (2000) derive a maximum pseudo-likelihood estimator (NPMPLE) and the nonparametric maximum likelihood estimator (NPMLE) under the assumption that the counting process is Poisson. Wellner and Zhang note that their NPMPLE is the same as the Sun-Kalbfleisch estimator, demonstrate that the NPMLE is more efficient than the NPMPLE via simulation, and show that both their estimators are consistent without the Poisson assumption. The efficiency of the NPMLE for non-Poisson counts has not been explored. Hu and Lagakos (2007) consider two weighted least squares estimators for the mean of an arbitrary response process in a general incomplete data setting. Their estimators reduce to, or are closely related to, some commonly-used estimators with certain data structures, but are not necessarily monotone with panel counts.
This paper proposes a general class of nonparametric estimators for the mean function Λ0(·) based on periodic observations, without specifying an underlying probability model. The estimators arise by minimizing a generalized weighted sum of squares under a monotonicity constraint. We establish consistency of the general estimator and provide procedures to implement it with various weights. One choice of weight function leads to the Sun-Kalbfleisch estimator and thus also the Wellner-Zhang NPMPLE. Another leads to an estimator that is closely related to the Wellner-Zhang NPMLE. Via simulation, we study the finite sample performance of the proposed estimators in various situations, and compare them to the Sun-Kalbfleisch estimator and the Wellner-Zhang NPMLE. With some other specific weight functions, the proposed approach provides alternatives to the NPMLE and NPMPLE that can have higher efficiency in situations with non-Poisson counts.
Without loss of generality, we take N(0) ≡ 0 and thus Λ0(0) = 0. We assume that the counting process and the observation mechanism are independent. Section 2 presents a generalized least squares estimator for the mean function Λ0(·) subject to the constraint that Λ0(·) is monotone, and then establishes consistency of the estimator. Section 3 provides procedures to implement the estimator, and Section 4 examines the specific estimators that result from several choices of weight function. Section 5 presents the results of a simulation study comparing the proposed estimators with those of Sun and Kalbfleisch (1995) and Wellner and Zhang (2000). Section 6 discusses the results and extensions.
2. A Generalized Least Squares Monotonic Estimator
Hu and Lagakos (2007) show that two commonly-used nonparametric estimators, the sample mean of available data and the Nelson-Aalen estimator, can be derived as a solution to a weighted sum of squares criterion. This, along with the monotonicity of the mean function Λ0(·), leads us to explore a criterion for defining nonparametric estimators through minimizing a generalized sum of squares subject to the monotonicity constraint. In the following, we introduce a generalized sum of squares criterion, and establish consistency of the estimator obtained through its constrained maximization.
Let T = (TK,1, …, TK,K)′, with TK,j denoting the jth of the K observation times during the period [0, τ], with 0 < τ < ∞, where τ is fixed and K can be random. Let N = (N(TK,1), …, N(TK,K))′ and Λ = (Λ(TK,1), …, Λ(TK,K))′. Assume {Xi: i = 1, …, n} is a set of i.i.d. samples of X = (K, T, N). We consider the generalized sum of squares with a symmetric weight function w(u, v) for u, v ∈ {TK,1, …, TK,K},
Ln(Λ; w) = Σ_{i=1}^{n} m(Λ; w | Xi),   (1)

where

m(Λ; w | X) = (N − Λ)′ W (N − Λ),   (2)
and where W is the K × K symmetric matrix with (j, l) entry w(TK,j, TK,l). The weight function is deterministic given (K, T). Let Λ̂n(·; w) be the minimum point of Ln(Λ; w) over all nondecreasing functions defined in [0, τ], that is,
Λ̂n(·; w) = argmin_{Λ ∈ ℱ} Ln(Λ; w),   (3)
with ℱ = {Λ(·): nondecreasing in [0, τ]}. Following Wellner and Zhang (2000), we define a measure μ on [0, τ] based on the distribution of (K, T):

μ(B) = E{ Σ_{j=1}^{K} 1(TK,j ∈ B) },

where B is a Borel set in [0, τ]. For Λ, Λ′ ∈ ℱ, define

d(Λ, Λ′) = { ∫_{[0,τ]} |Λ(t) − Λ′(t)|² dμ(t) }^{1/2}.   (4)
The following theorem establishes the consistency of Λ̂n(·; w) with respect to the metric d.
Condition A: Λ0(t) = E{N (t)} and c(t, s) = Cov{N (t), N (s)} exist for t, s ∈ [0, τ], Λ0(·) is strictly increasing with Λ0(0) = 0, and E{N′N} < ∞.
Condition B: The K × K weight matrix W = (w(TK,j, TK,l)) can be expressed as W = A′A with A nonsingular, and E{‖A‖²} < ∞ and E{‖A⁻¹‖²} < ∞, where ‖·‖ denotes a matrix norm.
Condition C: μ({τ}) > 0 and E{K²} < ∞.
Theorem 1
Assume Conditions A–C. For a given weight function w, d(Λ̂n(·; w), Λ0(·)) → 0 almost surely as n → ∞.
A proof of the theorem is outlined in Appendix A. Following the arguments in Wellner and Zhang (2000), we can extend the result to the situations with μ({τ}) = 0. With some additional conditions on the observation mechanism, we may also show that the convergence rate of Λ̂n(·; w) is at least n1/3.
3. Implementation of Estimator Λ̂n(·; w)
We begin by introducing an estimator closely related to Λ̂n(·; w) with an arbitrary weight function, then show how this can be used to find Λ̂n(·; w).
3.1. Preparation
Denote the set of distinct values of the observation times {TKi,j: j = 1, …, Ki, i = 1, …, n} by 𝒮n = {sl: l = 1, …, Jn} with 0 = s0 < s1 < … < sJn ≤ τ. Let N̰ = (N(s1), …, N(sJn))′, with N̰i denoting the realization from subject i, Λ̰ = (Λ(s1), …, Λ(sJn))′, and Δi = diag{δi(sl): sl ∈ 𝒮n}, where δi(sl) = 1 or 0 according to whether or not Ni(sl) is observed. Further, let W̰i be the Jn × Jn matrix with (l, l′) entry wi(sl, sl′) = w(sl, sl′) if both l and l′ belong to {k: there exists a j such that TKi,j = sk}, and zero otherwise. The objective function in (1) is then expressible as
Ln(Λ; w) = Σ_{i=1}^{n} (N̰i − ΔiΛ̰)′ W̰i (N̰i − ΔiΛ̰).   (5)
The estimator Λ̂n(·; w) defined in (3) is thus determined by Λ̰̂n(w) = argmin_{Λ̰ ∈ 𝒞} Ln(Λ; w) for a given weight function w(·), where 𝒞 is the cone in ℛ^{Jn} = {y = (y1, …, yJn)′: yj ∈ ℛ} defined by 𝒞 = {y: 0 ≤ y1 ≤ … ≤ yJn}. A natural choice for Λ̂n(·; w) is the step function starting at 0 at t = 0 and having a jump at each sl equal to the lth increment of Λ̰̂n(w), l = 1, …, Jn. For a given weight function w(·), we can therefore construct Λ̂n(·; w) from Λ̰̂n(w), and in the following we focus on how to obtain Λ̰̂n(w).
The objective function Ln(Λ; w) in (5) can be decomposed into the sum of two terms,

Ln(Λ; w) = Σ_{i=1}^{n} (N̰i − ΔiΛ̰̄n(w))′ W̰i (N̰i − ΔiΛ̰̄n(w)) + (Λ̰ − Λ̰̄n(w))′ Bn(w) (Λ̰ − Λ̰̄n(w)),   (6)

with

Bn(w) = Σ_{i=1}^{n} Δi W̰i Δi,  Λ̰̄n(w) = Bn(w)⁻¹ Σ_{i=1}^{n} Δi W̰i N̰i.   (7)
Note that Λ̰̄n(w) minimizes Ln(Λ; w) in (5) without the monotonicity constraint. It is easy to show that, conditional on the observation times and for a given weight function w(·), Λ̰̄n(w) is an unbiased estimator of Λ̰0 = (Λ0(s1), …, Λ0(sJn))′, provided the inverse in (7) exists. The conditional covariance matrix of Λ̰̄n(w) is Bn(w)⁻¹ {Σ_{i=1}^{n} Δi W̰i Var(N̰i) W̰i Δi} Bn(w)⁻¹. In fact, Λ̰̄n(w) gives a consistent estimator of Λ0(t) for t ∈ 𝒮 = lim_{n→∞} 𝒮n, provided that the probability of an observation at t ∈ 𝒮 is positive (Hu and Lagakos (2007)).
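As a concrete illustration of (7), the bar estimator and the Bn(w) matrix can be assembled directly from per-subject data. The following Python sketch uses toy data and illustrative names (none of it is from the paper's software); with the OSM (identity) weight it reproduces the pointwise sample mean of the available counts.

```python
import numpy as np

# Sketch of the unconstrained "bar" estimator of (7):
#   Bn(w) = sum_i Delta_i W_i Delta_i,  Lambda_bar = Bn(w)^{-1} sum_i Delta_i W_i N_i.
# All names here are illustrative.

def bar_estimator(deltas, weights, counts):
    """deltas: list of (Jn,) 0/1 arrays; weights: list of (Jn, Jn) W_i; counts: list of (Jn,) N_i."""
    Jn = deltas[0].size
    Bn = np.zeros((Jn, Jn))
    rhs = np.zeros(Jn)
    for d, W, N in zip(deltas, weights, counts):
        D = np.diag(d)                 # Delta_i zeroes rows/columns of missed visits
        Bn += D @ W @ D
        rhs += D @ W @ N
    return np.linalg.solve(Bn, rhs), Bn

# toy data: 3 subjects, 2 distinct times; identity (OSM) weight
deltas  = [np.array([1., 1.]), np.array([1., 0.]), np.array([0., 1.])]
weights = [np.eye(2)] * 3
counts  = [np.array([1., 3.]), np.array([2., 0.]), np.array([0., 4.])]
lam_bar, Bn = bar_estimator(deltas, weights, counts)   # sample means at each time
```

With the identity weight, Bn is the diagonal matrix of the numbers of observations at each time and the bar estimator is the pointwise sample mean, as in Section 4.1 below.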
The bar estimator Λ̰̄n(w) does not always belong to 𝒞, and thus is not in general the same as Λ̰̂n(w). We present procedures to compute Λ̰̂n(w). From (6), the estimator Λ̰̂n(w) can be viewed as the projection of Λ̰̄n(w) into 𝒞 with respect to the metric ρw: ρw(a, b)² = (a − b)′ Bn(w) (a − b) for a, b ∈ ℛ^{Jn}.
3.2. Procedures to implement Λ̰̂n(w)
We consider an application of the result given in Wellner and Zhan (1997) and Wellner and Zhang (2000), restated as Lemma 2 in Appendix B. Let φ(Λ̰) = Ln(Λ; w)/2. The estimator Λ̰̂n(w) can be obtained by jointly solving the following equation and inequalities in Λ̰ ∈ 𝒞:

⟨∇φ(Λ̰), Λ̰⟩ = 0,   (8)

⟨∇φ(Λ̰), y⟩ ≥ 0 for all y ∈ 𝒞.   (9)

Here we have

∇φ(Λ̰) = Bn(w)(Λ̰ − Λ̰̄n(w)),   (10)

with Bn(w) and Λ̰̄n(w) defined in (7). For the special case where Bn(w) is diagonal, (8) and (9) can be solved directly via isotonic regression. In general, it is not easy to jointly solve (8) and (9) for Λ̰̂n(w).
An alternative approach for evaluating Λ̰̂n(w) utilizes the ICM algorithm (e.g., Jongbloed (1998)), restated for our application as Lemma 3 in Appendix B. Specifically, the lth component of the sequence {Λ̰^{(k)}(w): k = 1, 2, …} in the algorithm is the left derivative at r = l of the greatest convex minorant of the cumulative sum diagram

{ ( Σ_{j=1}^{r} φjj(Λ̰^{(k−1)}), Σ_{j=1}^{r} [ φjj(Λ̰^{(k−1)}) Λ̰j^{(k−1)} − φj(Λ̰^{(k−1)}) ] ): r = 0, 1, …, Jn },   (11)

where Λ̰^{(0)} is an initial estimate, φj(Λ̰) is the jth component of ∇φ(Λ̰) in (10), and φjj(Λ̰) is the (j, j) element of the matrix ∇²φ(Λ̰) = Bn(w).
From (6), φ(Λ̰) = Ln(Λ; w)/2 differs from φ*(Λ̰) = (Λ̰ − Λ̰̄n(w))′ Bn(w)(Λ̰ − Λ̰̄n(w))/2 only by a term not involving Λ̰. It can be computationally easier in some situations to use φ*(Λ̰) instead of φ(Λ̰) to obtain the estimator Λ̰̂n(w). The kth evaluation Λ̰^{(k)}(w) in the iterative procedure is the left derivative of the greatest convex minorant of the cumulative sum diagram

{ ( Σ_{j=1}^{r} ajj, Σ_{j=1}^{r} bj(Λ̰^{(k−1)}) ): r = 0, 1, …, Jn },

with arr the (r, r) element of the matrix Bn(w) = ∇²φ(Λ̰), and br(Λ̰) the rth component of the vector diag(Bn(w))Λ̰ − ∇φ*(Λ̰). When Bn(w) is diagonal, the estimator Λ̰̂n(w) is the isotonic regression of Λ̰̄n(w) with respect to the weights given by the diagonal elements of Bn(w) (Barlow et al. (1972), Chap. 1). Thus, we view Λ̰̂n(w) as a generalized isotonic regression of the bar estimator Λ̰̄n(w) with weight matrix Bn(w).
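When Bn(w) is diagonal, the constrained estimator is exactly a weighted isotonic regression, which the pool-adjacent-violators algorithm (PAVA) computes in one pass. A minimal Python sketch, with illustrative names:

```python
import numpy as np

# Weighted isotonic regression by pool-adjacent-violators (PAVA):
# repeatedly merge adjacent blocks that violate monotonicity into their
# weighted mean. A sketch for the diagonal-weight case of Section 3.2.

def pava(y, w):
    """Isotonic regression of y with positive weights w."""
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(float(yi)); wts.append(float(wi)); sizes.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            # pool the two rightmost blocks into their weighted mean
            v = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / (wts[-2] + wts[-1])
            wts[-2] += wts[-1]
            sizes[-2] += sizes[-1]
            vals[-2] = v
            vals.pop(); wts.pop(); sizes.pop()
    out = []
    for v, s in zip(vals, sizes):
        out.extend([v] * s)
    return np.array(out)

fit = pava([1.0, 3.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0])   # -> [1, 2.5, 2.5, 4]
```

Applied to a non-monotone bar estimate, the pooled blocks are the flat stretches of the step-function estimate.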
4. Choices of Weight Function
We now consider several weight functions for the estimator . Each determines the random matrix WK×K in the expression m(Λ; w|X) in (2).
4.1. Observed sample mean weight (OSM)
In practice, the sample mean of the available data, termed the observed sample mean by Hu and Lagakos (2007), is routinely used to summarize repeated measures with randomly missing data, to indicate trends over time and differences between treatment groups. This suggests taking W in (2) to be WOSM = IK×K, the K × K identity matrix. The matrices W̰i in (5) are then Δi = diag(δi(sj): j = 1, …, Jn). It is easy to verify that this weight satisfies Condition B in Section 2, since WOSM = A′A with A = IK×K. The bar estimator and the Bn matrix in (7) are then
Λ̄n(sj; wOSM) = (1/nj) Σ_{i=1}^{n} δi(sj) Ni(sj), j = 1, …, Jn,   (12)

and Bn(wOSM) = diag(nj: j = 1, …, Jn), with nj = Σ_{i=1}^{n} δi(sj), the number of observations at time sj. See Hu and Lagakos (2007) for more discussion of Λ̄OSM(t) for an arbitrary response process with a more general observation mechanism. For right-censored survival data, it becomes the reduced-sample estimator introduced by Kaplan and Meier (1958) as an alternative to the product-limit estimator.
Following the definition given in Barlow et al. (1972, page 9), the estimator Λ̰̂n(wOSM) is the isotonic regression of Λ̰̄n(wOSM) with weights {nj: j = 1, …, Jn}. Thus Λ̰̂n(wOSM) can be obtained directly from Λ̰̄n(wOSM) and the given weights by

Λ̂n(sl; wOSM) = max_{p ≤ l} min_{q ≥ l} { Σ_{j=p}^{q} nj Λ̄n(sj; wOSM) } / { Σ_{j=p}^{q} nj }, l = 1, …, Jn.
The estimator Λ̂OSM(·) is the same as the estimator of Sun and Kalbfleisch (1995). Wellner and Zhang (2000) note that their NPMPLE is also the same as the Sun-Kalbfleisch estimator.
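The max-min characterization above can be checked numerically; a small Python sketch (illustrative names, equal weights nj = 1):

```python
import numpy as np

# Max-min formula for weighted isotonic regression:
#   fit[l] = max over p <= l of min over q >= l of the weighted average of bar[p..q].
# A direct (O(J^3)) sketch for checking small examples.

def maxmin_isotonic(bar, n):
    J = len(bar)
    out = np.empty(J)
    for l in range(J):
        out[l] = max(
            min(np.dot(n[p:q + 1], bar[p:q + 1]) / np.sum(n[p:q + 1])
                for q in range(l, J))
            for p in range(l + 1)
        )
    return out

fit = maxmin_isotonic(np.array([1.0, 3.0, 2.0, 4.0]),
                      np.array([1.0, 1.0, 1.0, 1.0]))   # -> [1, 2.5, 2.5, 4]
```

The result agrees with the pool-adjacent-violators solution for the same data; the max-min form is convenient for proofs, PAVA for computation.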
4.2. Cumulative observed increment weight (COI)
Motivated by the construction of the Kaplan-Meier estimator, Hu and Lagakos (2007) propose an estimator based on the cumulative observed increments (COI), and show that it can be more efficient than the observed sample mean when observations at different times from the same individual are correlated. Following their approach, we take the weight matrix W in (2) to be WCOI = Ω′Ω with

Ω the K × K matrix with 1's on the diagonal, −1's on the subdiagonal, and 0's elsewhere,   (13)

so that ΩN = (ΔN(TK,1), …, ΔN(TK,K))′. It is easy to verify that this weight function satisfies Condition B in Section 2. The associated objective function Ln(Λ; wCOI) depends only on the observed increments ΔN(TK,j) = N(TK,j) − N(TK,j−1), j = 1, …, K, with TK,0 = 0. Here the matrix W̰i in (5) becomes Ω̰i′Ω̰i, where Ω̰i is the Jn × Jn matrix whose (j, l) element is zero or equals the corresponding element of Ω in (13), according to whether subject i lacks or has observations at sj and sl.
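Under the first-difference reading of (13), Ω maps the stacked counts to their increments; a small Python sketch (names are ours):

```python
import numpy as np

# The COI weight is W_COI = Omega' Omega, where Omega is the first-difference
# matrix: row j of Omega @ N gives N(t_j) - N(t_{j-1}) (with N(t_0) = 0).
# An illustrative construction.

def diff_matrix(K):
    Omega = np.eye(K) - np.eye(K, k=-1)   # 1's on diagonal, -1's on subdiagonal
    return Omega

Omega = diff_matrix(4)
N = np.array([2.0, 3.0, 3.0, 7.0])        # nondecreasing panel counts
increments = Omega @ N                     # -> [2, 1, 0, 4]
```

Since Ω is unit lower bidiagonal it is nonsingular (determinant 1), so WCOI = Ω′Ω satisfies the decomposition required by Condition B.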
Hu and Lagakos (2007) note that, with the weight wCOI, the bar estimator Λ̄n(·; wCOI) is the nonparametric version of the estimator studied in Lawless and Nadeau (1995) and Lin, Wei, Yang, and Ying (2000) for semiparametric estimation of the mean function Λ0(·) from right-censored counting process data, which extends the Nelson-Aalen estimator of the cumulative intensity function under the Poisson assumption.
The components of Λ̰̄n(wCOI) are not in general in nondecreasing order with periodic observations. The estimator Λ̰̂n(wCOI), which minimizes the second term of (6) with the weight wCOI, is thus more desirable in the current setting. The matrix Bn(wCOI) = Σ_{i=1}^{n} Ω̰i′Ω̰i is not in general diagonal, and thus Λ̂n(·; wCOI) is a generalized isotonic regression of Λ̄n(·; wCOI). We can use the procedures in Section 3.2 to compute Λ̰̂n(wCOI) and then obtain Λ̂n(·; wCOI).
4.3. Generalized estimating equation weights (GEE)
Motivated by the construction of generalized estimating equations (see Diggle, Liang and Zeger (1994), Chap.8) for parametric estimation, consider the weight matrix W in (2) to be the inverse of the covariance matrix of N = (N(TK,1), …, N (TK,K))′, conditional on the number of observations K and the observation times T = (TK,1, …, TK,K)′; that is,
WGSM = {Var(N | K, T)}⁻¹.   (14)
The weight matrices W̰i in (5) for evaluating the estimator are now W̰GSM,i = Δi(Δi Var(N̰) Δi)⁻ Δi, where D⁻ denotes the Moore-Penrose generalized inverse of D, and Var(N̰) = (c(sj, sl)). Another option uses the weight
WGCI = Ω′{Ω Var(N | K, T) Ω′}⁻¹ Ω,   (15)
where Ω is defined in (13) and ΔN(TK,j) = N(TK,j) − N(TK,j−1). The weight matrices W̰i in (5) are then W̰GCI,i = Ω̰i′(Ω̰i Var(N̰) Ω̰i′)⁻ Ω̰i.
Using either WGSM in (14) or WGCI in (15) requires the covariance matrix Var(N̰). When this is unknown, as would often occur in practice, it can be replaced by an estimate. Note that, with Var(N̰) substituted by the Jn × Jn identity matrix, WGSM and WGCI reduce to WOSM (§4.1) and WCOI (§4.2), respectively. Below we consider two approximations to the covariance matrix Var(N̰) and the associated estimation procedures, and we compare the resulting estimators with the Wellner-Zhang NPMPLE and NPMLE.
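A sketch of forming the subject-level weight W̰GSM,i = Δi(Δi Var(N̰) Δi)⁻ Δi with a Moore-Penrose pseudo-inverse, using a toy matrix in place of Var(N̰) (illustrative names; in practice Var(N̰) would be an estimate):

```python
import numpy as np

# Subject-level GEE-type weight: the Moore-Penrose pseudo-inverse handles the
# rows/columns zeroed out by Delta_i at missed visits. Illustrative sketch.

def gsm_weight(delta, var_N):
    D = np.diag(delta)                       # Delta_i for subject i
    return D @ np.linalg.pinv(D @ var_N @ D) @ D

var_N = np.array([[1.0, 0.5],
                  [0.5, 2.0]])               # toy stand-in for Var(N)
W_full = gsm_weight(np.array([1.0, 1.0]), var_N)   # fully observed: plain inverse
W_miss = gsm_weight(np.array([1.0, 0.0]), var_N)   # second visit missed
```

With no missed visits the weight reduces to the ordinary inverse covariance; with a missed visit, the corresponding row and column are zero and the remaining block is inverted.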
Approximation A: using Var(N̰) ≈ diag(Λ0(sj): j = 1, …, Jn)
If the counting process {N(t), t > 0} is Poisson, this approximation equals the diagonal of the true covariance matrix; we plug it into either WGSM (14) or WGCI (15) to obtain a GEE-type weight.
Use of this approximation in WGSM (14) corresponds to the weight WGSM* = diag(1/Λ0(TK,j): j = 1, …, K). Provided E{N(t)} = Λ0(t) > 0 for t > 0, the weight matrix based on this approximation satisfies Condition B in Section 2. The matrix Bn(w) = ∇²φ(Λ̰) is the diagonal matrix diag(nj/Λ0(sj): j = 1, …, Jn). Thus, if Λ0(·) were known, the resulting estimator Λ̰̂n(wGSM*) would be the isotonic regression of Λ̰̄n(wGSM*) with weights {nj/Λ0(sj): j = 1, …, Jn}. Note that Λ̰̄n(wGSM*) is the same as Λ̰̄n(wOSM) given by (12), whereas Λ̰̂n(wOSM) in Section 4.1 is the isotonic regression of Λ̰̄n(wOSM) with weights {nj: j = 1, …, Jn}. The hat estimator with the current weight is therefore not, in general, the same as the estimator Λ̰̂n(wOSM), which corresponds to the estimator proposed by Sun and Kalbfleisch (1995).
Because the weight wGSM* involves Λ0(·), we consider the following iterative algorithm to implement the estimator Λ̂n(·; wGSM*): given the (k − 1)th iterate Λ̰^{(k−1)}, obtain the kth iterate by using the weight wGSM* with Λ0 substituted by Λ̰^{(k−1)}, for k = 1, 2, …, until the sequence converges.
Appendix C shows that applying the ICM algorithm (Lemma 3 in Appendix B) to maximize the pseudo-likelihood given in Wellner and Zhang (2000) amounts to computing the isotonic regression of Λ̰̄n(wOSM) with weights {nj/Λ̰(sj): j = 1, …, Jn}. As pointed out in Wellner and Zhang (2000), the procedure should converge to the Wellner-Zhang NPMPLE, which is the same as the Sun-Kalbfleisch (1995) estimator, that is, Λ̂n(·; wOSM). Since Λ̄n(sj; wOSM) is a consistent estimator of Λ0(sj), the estimator resulting from the iterative procedure is then close to Λ̂n(·; wOSM) for large n. This is confirmed for Poisson counts via simulation in Section 5.1.
Approximation B: using Var(N̰) ≈ (σjl) with σjl = σlj = Λ0(sj) for j ≤ l, j, l = 1, …, Jn
Applying the approximation in WGCI (15), the weight in the estimation is WGCI* = Ω′ diag(1/ΔΛ0(TK,j): j = 1, …, K) Ω, where ΔΛ0(TK,j) = Λ0(TK,j) − Λ0(TK,j−1). If {N(t), t > 0} is a Poisson process, this approximation uses the true covariance matrix of the increments ΔN(TK,j), j = 1, …, K, for which Cov{ΔN(TK,j), ΔN(TK,l) | T, K} equals ΔΛ0(TK,j) if l = j and 0 otherwise. The following additional condition, which holds in many practical situations, ensures that the weight satisfies Condition B in Theorem 1.
Condition D: The observation times are ε-separated; that is, there exists a constant ε > 0 such that P(TK,j − TK,j−1 ≥ ε: j = 1, …, K) = 1.
Wellner and Zhang (2000) appear to need this condition in deriving the asymptotics of their nonparametric maximum likelihood estimator (NPMLE).
Applying Lemma 3 (the ICM algorithm) in Appendix B, we can evaluate Λ̰̂n(wGCI*) as described in Section 3.2; it is a generalized isotonic regression of the corresponding bar estimator Λ̰̄n(wGCI*). Since the weight involves Λ0(·), an iterative algorithm similar to the one for wGSM* is needed to implement the estimator.
Appendix D shows the connection of Λ̂n(·; wGCI*) to the Wellner-Zhang (2000) NPMLE. The difference arises because Λ̰ in the weight matrix is fixed at the previous iterate when evaluating Λ̂n(·; wGCI*), whereas it is treated as unknown when evaluating the Wellner-Zhang NPMLE. Section 5 compares the two estimators in various situations via simulation.
5. Simulation Study
To study the finite-sample properties of the estimators with the weights discussed in Section 4, we conducted a simulation with N(·) = {N(t), 0 < t ≤ 1} a Poisson or mixed-Poisson process. We used n = 100 i.i.d. realizations of N(·), combined with one of the following two observation schemes, each chosen to yield an average of four observations per subject.
Observation Scheme A: Observation times for each individual were generated from a time-homogeneous Poisson process with rate 4. This scheme simulates a study in which observation times vary among individuals.
Observation Scheme B: The potential observation times are tj = 0.05, 0.10, …, 0.95, 1.00, with the probability of an observation at time tj decreasing in j, and with the presence or absence of observations at the different times independent. This observation scheme is intended to simulate a study with pre-scheduled observation times, but where different subjects can have missed visits and the risk of a missed visit increases as the study proceeds.
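Observation Scheme A, for instance, can be simulated as follows (a Python sketch with our own names; the rate 4 matches the targeted average of four observations per subject):

```python
import numpy as np

# Observation Scheme A: per-subject observation times from a homogeneous
# Poisson process with rate 4 on (0, 1]. Illustrative sketch.

def scheme_A(rng, rate=4.0, tau=1.0):
    K = rng.poisson(rate * tau)                     # random number of visits
    return np.sort(rng.uniform(0.0, tau, size=K))   # ordered visit times

rng = np.random.default_rng(0)
times = [scheme_A(rng) for _ in range(2000)]
avg_K = np.mean([t.size for t in times])            # should be near 4
```

The number of visits K is Poisson(4) and, given K, the times are order statistics of uniforms on (0, τ], the standard representation of a homogeneous Poisson process.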
The following estimators were considered: Λ̂n(·; w) and Λ̄n(·; w), with the OSM, COI, and several GEE-type weights. For the GEE-type estimators in Section 4.3, we used the following weights: (i) WGSM in (14) and WGCI in (15) with the true covariance matrix, to show the best the GEE weights can achieve; (ii) WGSM in (14) and WGCI in (15) with the covariance matrix replaced by the sample covariance matrix obtained from a random sample of size 30, denoted respectively by WGSM2 and WGCI2; and (iii) WGSM* and WGCI* in Section 4.3, with Λ0(·) replaced by its current estimate at each stage of the algorithms.
With the OSM weight, Λ̂n(·; w) corresponds to the Sun-Kalbfleisch estimator (i.e., the Wellner-Zhang NPMPLE) and is denoted by SK. We also computed the Wellner-Zhang NPMLE, denoted by WZMLE.
Our primary program was written in C, and we used the Splus functions runif, rpois, and rgamma (the Splus generators of uniform, Poisson, and Gamma random variables) to generate the random variables needed in the simulation. Iterations were terminated when the largest change in any component of the current estimate of Λ0(·) from the previous estimate was below 10⁻⁵. All simulations converged. The results reported below are based on 200 repetitions of each simulation setting.
5.1. Time-nonhomogeneous Poisson panel counts
The response process {N(t), t ∈ (0, 1]} was generated as a time-nonhomogeneous Poisson process with Λ0(·) = 6tγ, where γ was 1/2, 1, or 2. Coupled with the two observation schemes described above, this gave six simulation settings. In each setting, the sample means of the estimators were very close to the true mean functions over time, except occasionally in the right tail where there were fewer observations, confirming that the estimators studied are consistent.
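One standard way to generate such panel counts is the time transformation of a homogeneous Poisson process: with Λ0(t) = 6t^γ on (0, 1], draw a Poisson(6) number of events uniform on the Λ0 scale and map them back by t = (s/6)^{1/γ}. A Python sketch (names are ours):

```python
import numpy as np

# Nonhomogeneous Poisson counts with mean Lambda0(t) = 6 t^gamma on (0, 1],
# via the inverse-mean-function time transformation. Illustrative sketch.

def nonhomog_poisson_counts(rng, t_obs, gamma):
    n_events = rng.poisson(6.0)                    # Lambda0(1) = 6
    s = rng.uniform(0.0, 6.0, size=n_events)       # homogeneous on the Lambda0 scale
    event_times = (s / 6.0) ** (1.0 / gamma)       # back to the t scale
    return np.array([(event_times <= t).sum() for t in t_obs])

rng = np.random.default_rng(1)
t_obs = np.array([0.25, 0.5, 1.0])
sims = np.array([nonhomog_poisson_counts(rng, t_obs, gamma=0.5)
                 for _ in range(4000)])
emp_mean = sims.mean(axis=0)                       # approaches 6 * sqrt(t_obs)
```

For γ = 1/2, the empirical means approach 6√t, matching the target mean function.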
By examining the simulation results, we have the following observations.
Finding 1: The bar estimators Λ̄n(·; w), obtained without a monotonicity constraint, are usually much easier to compute than the hat estimators Λ̂n(·; w). However, in addition to ensuring monotonicity, the hat estimators Λ̂n(·; w) generally had smaller sample mean squared errors.
Finding 2: The COI-type weights (e.g., WCOI and WGCI*), based on observed increments of the response process, in general led to estimators with smaller sample mean squared errors than those obtained using the OSM-type weights (e.g., WOSM and WGSM*). This parallels the greater efficiency of the NPMLE compared to the NPMPLE (i.e., SK) estimator, since these can be viewed as based on observed increments and on observed responses, respectively. When the true or a sample covariance matrix is used in the GEE-type weights, the resulting OSM and COI estimators (GSM and GCI, or GSM2 and GCI2) are very similar.
Finding 3: By taking into account the covariance structure of the response process, the estimators using GEE weights generally have higher efficiency. However, when the covariance or its approximation involves unknown parameters, more computation time is needed to compute the estimates GSM* and GCI* than the corresponding ones with the OSM or COI weight.
The Wellner-Zhang NPMLE performed best in all the simulated settings. All the evaluations of the SK estimator (i.e., Λ̂n(·; w) with weight WOSM) were very close to those of Λ̂n(·; w) with weight WGSM*. This numerically confirms that in Poisson situations the NPMPLE is the same as the Sun-Kalbfleisch estimator, that the ICM algorithm gives the NPMPLE, and that GSM* is close to the NPMPLE. The estimates with weights WGSM or WGCI, using the true covariance matrix Var(N̰) or the sample covariance matrix, were similar, and had better efficiency than the corresponding estimates with WGSM* or WGCI*. To illustrate the findings, Figure 1 presents the pointwise sample means and sample mean squared errors of the estimates under nonhomogeneous Poisson responses with γ = 1/2 and Observation Scheme A.
Figure 1.
Sample Means and Sample Mean Squared Errors of Estimators with Poisson Panel Counts Under Observation Scheme A
5.2. Mixed-Poisson panel counts
We next took N(·) to be a mixed-Poisson process with conditional mean function Λ(t | α) = 6αt, where the random effect α was a Gamma random variable with mean 1 and variance θ = 1, 2, or 3. This corresponds to the unconditional mean function Λ0(t) = 6t and variance function Var{N(t)} = Λ0(t){1 + Var(α)Λ0(t)}, and to a process with dependent increments (Lawless (1987)). The overdispersion of the simulated counting processes depends on the variance of the random effect, θ = Var(α). Coupled with the two observation schemes, the three choices of θ gave a total of six settings. Figure 2 presents the pointwise sample means and sample mean squared errors of the estimates from data generated under the mixed-Poisson process with θ = 2 and Observation Scheme B.
Figure 2.
Sample Means and Sample Mean Squared Errors of Estimators with Mixed-Poisson Panel Counts (θ = 2) Under Observation Scheme B
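The mixed-Poisson mechanism and its variance function can be checked by direct simulation: draw α from a Gamma with mean 1 and variance θ, then N(t) | α ~ Poisson(6αt). A Python sketch (names are ours):

```python
import numpy as np

# Mixed-Poisson counts: alpha ~ Gamma(mean 1, variance theta), and
# N(t) | alpha ~ Poisson(6 * alpha * t), so that
#   E{N(t)} = 6t  and  Var{N(t)} = Lambda0(t) * (1 + theta * Lambda0(t)).
# Illustrative sketch.

rng = np.random.default_rng(2)
theta, t = 2.0, 0.5
alpha = rng.gamma(shape=1.0 / theta, scale=theta, size=200_000)  # mean 1, var theta
N_t = rng.poisson(6.0 * alpha * t)

lam0 = 6.0 * t                                # = 3
theory_var = lam0 * (1.0 + theta * lam0)      # = 3 * (1 + 6) = 21
emp_mean, emp_var = N_t.mean(), N_t.var()
```

The empirical variance is roughly seven times the mean here, illustrating the overdispersion, relative to a Poisson process, that grows with θ.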
Based on the simulation outcome, the general observations for the mixed-Poisson processes were the same as Findings 1–3 in Section 5.1 for time-nonhomogeneous Poisson processes. However, in this non-Poisson setting, the Wellner-Zhang NPMLE was no longer the most efficient estimator. As θ, a measure of the dependence of increments, increased, the proposed estimator Λ̂(·; w) with weights WGSM and WGCI, or with weights WGSM2 and WGCI2, showed higher efficiency. Moreover, the estimates with a GEE weight under Approximation A or B, which use a Poisson covariance matrix to approximate the true covariance, did not improve much in efficiency over the OSM or COI estimates. This indicates the need to explore alternative approximations.
Unlike the Poisson settings of Section 5.1, the evaluations of SK (that is, the NPMPLE, Λ̂(·; w) with weight WOSM) and GSM* (i.e., Λ̂n(·; w) with weight WGSM*) were not in close agreement. Although GSM* is consistent in theory, the sample means of the GSM* estimates indicate some bias, especially in the settings with Observation Scheme B. This may be due to fewer observations late in the study period.
6. Final Remarks
This paper presents a general class of estimators for the mean function of a counting process based on panel counts. Special cases include the estimator proposed by Sun and Kalbfleisch (1995), and estimators similar to the NPMLE estimator considered by Wellner and Zhang (2000). Simulations suggest that the use of GEE weights can lead to efficiency close to that of the Wellner-Zhang NPMLE for Poisson processes and better efficiency than the Wellner-Zhang NPMLE for non-Poisson processes. With modification, the proposed estimator can be generalized to accommodate nondecreasing processes with jumps that are not necessarily of size one.
Several further investigations would be worthwhile. One of theoretical and practical interest is the determination of asymptotic variance and variance estimation for the estimator Λ̂(·; w). In principle, a resampling method could be used. Another is to find the optimal weight for the estimator to achieve the best efficiency in a given situation, and a third is to extend the estimator to the situations where the observation mechanism can depend on the response.
The estimator Λ̂n(·; w) is an M-estimator subject to the monotonicity constraint. As done by Wellner and Zhang (2000), we apply the iterative convex minorant algorithm (ICM) to implement the estimator. This may lead to extensive computation time, and thus faster algorithms would be useful. Finally, the methods can be readily extended to incorporate covariates which, among other things, can be used to assess the dependence of the observation and the response processes.
Acknowledgments
The research was partially supported by grants from the US National Institute of Allergy and Infectious Diseases and the Natural Sciences and Engineering Research Council of Canada. The authors thank Professor Y. Zhang for many helpful discussions, and two Referees for their constructive comments and suggestions.
Appendix: Some Technical Details
A. Proof of Theorem 1
We first state a lemma to be used in the proof, then outline the proof of consistency. The approach is similar to the one in Wellner and Zhang (2000). The following lemma is a version of the one-sided Glivenko-Cantelli theorem given in Wellner and Zhang (2000) or Ferguson (1996, Section 17).
Lemma 1
Suppose that 𝒰 = {U(·; θ): θ ∈ Θ} is a class of measurable functions defined on a probability space (𝒳, 𝒜, P), where Θ is compact with respect to a metric d and U(x; θ) is lower semicontinuous in θ for all x. Suppose further that there exists a function V(x) such that EV(X) < ∞ and U(x; θ) ≤ V(x) for all x ∈ 𝒳 and θ ∈ Θ, and that for all θ and all sufficiently small ρ > 0, inf{φ: d(φ,θ)<ρ} U(x; φ) is measurable in x. Then if X1, …, Xn are i.i.d. P with values in 𝒳, and ℙn is the empirical measure of the Xi's, almost surely

lim inf_{n→∞} inf_{θ∈Θ} { ℙn U(·; θ) − PU(X; θ) } ≥ 0.

Moreover, PU(X; θ) is lower semicontinuous in θ ∈ Θ: lim inf_{φ→θ} PU(X; φ) ≥ PU(X; θ).
Proof of Theorem 1
Denote by M(Λ; w) = E{m(Λ; w|X)} the limit of the objective function Mn(Λ; w) = Ln(Λ; w)/n in (1). We note first that M(Λ; w) ≥ M(Λ0; w) for all Λ ∈ ℱ, with equality if and only if Λ = Λ0 a.e. μ, since

M(Λ; w) = M(Λ0; w) + E{(Λ − Λ0)′ W (Λ − Λ0)},

and the second term is zero if and only if Λ(t) = Λ0(t) for all t = TK,j, W being positive definite (Condition B).
We can then show that the sequence of nondecreasing functions {Λ̂n(·; w)} is bounded on [0, τ] almost surely and hence, by Helly's Selection Theorem, for each ω there is a convergent subsequence {Λ̂n*(ω)(·; w)} of {Λ̂n(·; w)(ω)}. The boundedness follows by noting that

Mn(Λ̂n; w) ≤ Mn(Λ0; w) → M(Λ0; w) < ∞ almost surely,

because Λ̂n = argminΛ∈ℱ Mn(Λ; w), and then applying Conditions A and B: for every t ∈ [0, τ], Λ̂n(t; w) is bounded, with A the matrix in the decomposition W = A′A of Condition B. Denote the bound by C, and let Λ*(·; w) be the limit of the subsequence {Λ̂n*(ω)(·; w)}. We know that M(Λ*; w) ≥ M(Λ0; w). We show that
M(Λ*; w) ≤ M(Λ0; w),   (16)
and thus M(Λ*; w) = M(Λ0; w), which implies Λ* = Λ0 a.e. μ; that is, all limits of subsequences of {Λ̂n(·; w)} are Λ0 a.e. μ, and this proves the theorem.
Take ℱτ = {Λ ∈ ℱ: Λ(τ) ≤ C}, compact in the metric d of (4), and ℳτ(w) = {m(Λ; w|X): Λ ∈ ℱτ}, a class of measurable functions indexed by ℱτ for a given weight function. We note that, for given w and X, m(Λ; w|X) ∈ ℳτ(w) is lower semicontinuous in Λ ∈ ℱτ, being a continuous (quadratic) function of the finitely many values Λ(TK,j), and has an integrable envelope, since

m(Λ; w|X) ≤ 2N′WN + 2Λ′WΛ ≤ 2N′WN + 2C²K‖A‖²,

using Conditions A and B. Thus Lemma 1 yields
lim inf_{n→∞} inf_{Λ ∈ ℱτ} { ℙn m(Λ; w|·) − E m(Λ; w|X) } ≥ 0   (17)
almost surely. Therefore,

lim inf_{n*→∞} Mn*(Λ̂n*; w) ≥ lim inf_{n*→∞} M(Λ̂n*; w) = M(Λ*; w)

almost surely, using (17) and the Dominated Convergence Theorem. Then (16) follows by noting that

Mn*(Λ̂n*; w) ≤ Mn*(Λ0; w) → M(Λ0; w)

by the Strong Law of Large Numbers and the definition of Λ̂n(·; w).
B. Some known results
For the purpose of our application, we restate Theorem 2.1 of Wellner and Zhang (2000) or Lemma 3.1 of Wellner and Zhan (1997). Write ∇φ for the gradient of φ, ∇2φ for ∇( ∇φ)′, and <·, ·> for the usual inner product in ℛJ.
Lemma 2
Let φ: ℛ^J → ℛ ∪ {∞} be a continuous convex function, 𝒞 ⊂ ℛ^J a convex cone, and 𝒞0 = 𝒞 ∩ φ⁻¹(ℛ). Suppose 𝒞0 is nonempty and φ is differentiable on 𝒞0. Then ẑ ∈ 𝒞0 satisfies φ(ẑ) = min_{z ∈ 𝒞} φ(z) if and only if

⟨∇φ(ẑ), ẑ⟩ = 0 and ⟨∇φ(ẑ), z⟩ ≥ 0 for all z ∈ 𝒞.
It is easy to see that ẑ is determined by ∇φ(z). The following lemma presents the iterative convex minorant (ICM) algorithm as stated in Jongbloed (1998).
Lemma 3
If the convex cone 𝒞 is 𝒞J = {y ∈ ℛ^J: y1 ≤ … ≤ yJ}, the sequence {z^{(k)}: k = 1, 2, …} converges to ẑ = argmin_{z ∈ 𝒞J} φ(z), where, for a fixed z^{(0)} ∈ 𝒞J, the lth component of z^{(k)} is the left derivative at r = l of the greatest convex minorant of the cumulative sum diagram

{ ( Σ_{j=1}^{r} φjj(z^{(k−1)}), Σ_{j=1}^{r} [ φjj(z^{(k−1)}) zj^{(k−1)} − φj(z^{(k−1)}) ] ): r = 0, 1, …, J },

with φj(z) = ∂φ(z)/∂zj and φjj(z) = ∂²φ(z)/∂zj². For 𝒞 = {y ∈ ℛ^J: 0 ≤ y1 ≤ … ≤ yJ}, the negative components of ẑ should be set to zero.
See Jongbloed (1998), for example, for a geometric interpretation of the algorithm. Jongbloed (1998) presents a modified iterative convex minorant algorithm to avoid the problem that the ICM algorithm does not necessarily converge globally.
C. Connection of Λ̂n(·; wGSM*) with the Wellner-Zhang (2000) NPMPLE
Recall that the nonparametric maximum pseudo-likelihood estimator (NPMPLE) considered in Wellner and Zhang (2000) is derived by maximizing

log PL(Λ) = Σ_{i=1}^{n} { Ni′ log Λi − 1Ki′ Λi },

where 1K is the K-dimensional vector with all components 1, and log z denotes (log z1, …, log zK)′ for z = (z1, …, zK)′. As indicated in Wellner and Zhang (2000), PL(Λ) is proportional to the pseudo-likelihood obtained by assuming {N(t), t > 0} is Poisson and ignoring the dependence of events within a subject. Note that

∇ log PL(Λ̰) = Σ_{i=1}^{n} Δi W̰i(Λ̰)(N̰i − ΔiΛ̰),

with the matrix W̰i(Λ̰) = diag(δi(sj)/Λ(sj): j = 1, …, Jn); here Σ_{i=1}^{n} Δi W̰i(Λ̰) Δi is the Bn(w) matrix with weight wGSM* as given in Section 4.3. The solution of ∇ log PL(Λ) = 0 is the bar estimator with weight wOSM, Λ̰̄n(wOSM). Thus the procedure of applying Lemma 3 in Appendix B with φ(Λ̰) = −log PL(Λ) to obtain the Wellner-Zhang NPMPLE is the procedure of computing the isotonic regression of Λ̰̄n(wOSM) with weights {nj/Λ(sj): j = 1, …, Jn}.
D. Connection of Λ̂n(·; wGCI*) with the Wellner-Zhang (2000) NPMLE
The nonparametric maximum likelihood estimator (NPMLE) of Wellner and Zhang (2000) is based on the likelihood of the observed data under the Poisson assumption, whose logarithm is

log FL(Λ) = Σ_{i=1}^{n} Σ_{j=1}^{Ki} { ΔNi(TKi,j) log ΔΛ(TKi,j) − ΔΛ(TKi,j) },

with ΔNi(TKi,j) = Ni(TKi,j) − Ni(TKi,j−1) for j = 1, …, Ki. In our notation,

∇ log FL(Λ̰) = Σ_{i=1}^{n} Δi W̰i(Λ̰)(N̰i − ΔiΛ̰),

where W̰i(Λ̰) = Ω̰i′ diag(δi(sj)/eij: j = 1, …, Jn) Ω̰i, with 0/0 = 0 and eij the jth component of Ω̰iΔiΛ̰; here Σ_{i=1}^{n} Δi W̰i(Λ̰) Δi is the same as the Bn(w) matrix with weight wGCI*. We see that the solution of ∇ log FL(Λ) = 0 with W̰i(Λ̰) held fixed is the bar estimator with the corresponding weight, and equals Λ̰̄n(wGCI*) when W̰i(Λ̰) is evaluated at Λ̰ = Λ̰0.
Moreover, using Lemma 3 in Appendix B with φ(Λ̰) = −log FL(Λ), the kth iteration for the NPMLE is the left derivative of the greatest convex minorant of a cumulative sum diagram of the form given in Section 3.2, now with arr the (r, r) element of the matrix −∇² log FL(Λ̰^{(k−1)}) and br the rth component of the corresponding vector. The difference between this matrix and Bn(wGCI*), in which Λ̰ is fixed at the previous iterate, leads to the difference between Λ̂n(·; wGCI*) and the Wellner-Zhang (2000) NPMLE.
References
- Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions: The Theory and Applications of Isotonic Regression. Wiley; New York: 1972. [Google Scholar]
- Ferguson TS. A Course in Large Sample Theory. Chapman & Hall; London: 1996. [Google Scholar]
- Hu XJ, Lagakos SW. Nonparametric estimation of the mean function of a stochastic process with missing observations. Lifetime Data Anal. 2007;13:51–73. doi: 10.1007/s10985-006-9030-0. [DOI] [PubMed] [Google Scholar]
- Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Amer Statist Assoc. 1958;53:457–481. [Google Scholar]
- Jongbloed G. The iterative convex minorant algorithm for nonparametric estimation. J of Computational and Graphical Statistics. 1998;7:310–321. [Google Scholar]
- Lawless JF. Negative binomial regression models. Can J Statist. 1987;15:209–226. [Google Scholar]
- Lawless JF. The analysis of recurrent events for multiple subjects. Applied Statistics. 1995;44:487–498. [Google Scholar]
- Lawless JF, Nadeau C. Some simple robust methods for the analysis of recurrent events. Technometrics. 1995;37:158–168. [Google Scholar]
- Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. J R Statist Soc B. 2000;62:711–730. [Google Scholar]
- Sun J, Kalbfleisch JD. Estimation of the mean function of point processes based on panel count data. Statistica Sinica. 1995;5:279–290. [Google Scholar]
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes, with Applications to Statistics. Springer; New York: 1996. [Google Scholar]
- Wellner JA, Zhan Y. A hybrid algorithm for computation of the nonparametric maximum likelihood estimator from censored data. J Amer Statist Assoc. 1997;92:945–959. [Google Scholar]
- Wellner JA, Zhang Y. Two estimators of the mean of a counting process with panel count data. Ann Statist. 2000;28:779–814. [Google Scholar]