Abstract
This paper considers nonparametric estimation of the mean function of a counting process based on periodic observations, i.e., panel counts. We present estimators derived through minimizing a class of generalized sums of squares subject to a monotonicity constraint. We establish consistency of the estimators and provide procedures to implement them with various weight functions. For specific weight functions, they reduce to the estimator given in Sun and Kalbfleisch (1995), and are closely related to the nonparametric maximum likelihood estimator studied in Wellner and Zhang (2000). With other weight functions, the proposed estimators provide alternatives that can have better efficiency in non-Poisson situations than previous approaches. Simulations are used to examine the finite-sample performance of the proposed estimators.
Keywords: Isotonic regression, monotonicity constraint, periodic observations
1. Introduction
Suppose that the number of events that occur until time t > 0 is N(t), and we wish to estimate the marginal expectation E{N(t)} without specifying a probability model for the counting process {N(t), t > 0}. See Lawless and Nadeau (1995), Lin, Wei, Yang and Ying (2000), and Wellner and Zhang (2000) for examples of this setup.
When {N(t), t > 0} is a continuously-observed Poisson process subject to right-censoring, the Nelson-Aalen estimator (cf: Andersen, Borgan, Gill, and Keiding (1992), Chap.IV) is commonly used for estimating the cumulative intensity function of the counting process, which is the same as its mean function Λ0(t) = E{N (t)} (e.g., Lawless (1995)). Lawless and Nadeau (1995) and Lin, Wei, Yang, and Ying (2000) indicate that the Nelson-Aalen estimator is a consistent estimator for the marginal mean function of a counting process regardless of the underlying probability model.
When the process for an individual is only observed at a finite set of time points (that is, the observations are panel counts), Sun and Kalbfleisch (1995) present an estimator of the mean function based on isotonic regression, and Wellner and Zhang (2000) derive a maximum pseudo-likelihood estimator (NPMPLE) and the nonparametric maximum likelihood estimator (NPMLE) under the assumption that the counting process is Poisson. Wellner and Zhang note that their NPMPLE is the same as the Sun-Kalbfleisch estimator, demonstrate that the NPMLE is more efficient than the NPMPLE via simulation, and show that both their estimators are consistent without the Poisson assumption. The efficiency of the NPMLE for non-Poisson counts has not been explored. Hu and Lagakos (2007) consider two weighted least squares estimators for the mean of an arbitrary response process in a general incomplete data setting. Their estimators reduce to, or are closely related to, some commonly-used estimators with certain data structures, but are not necessarily monotone with panel counts.
This paper proposes a general class of nonparametric estimators for the mean function Λ0(·) based on periodic observations, without specifying an underlying probability model. The estimators arise by minimizing a generalized weighted sum of squares under a monotonicity constraint. We establish consistency of the general estimator and provide procedures to implement it with various weights. One choice of weight function leads to the Sun-Kalbfleisch estimator and thus also the Wellner-Zhang NPMPLE. Another leads to an estimator that is closely related to the Wellner-Zhang NPMLE. Via simulation, we study the finite sample performance of the proposed estimators in various situations, and compare them to the Sun-Kalbfleisch estimator and the Wellner-Zhang NPMLE. With some other specific weight functions, the proposed approach provides alternatives to the NPMLE and NPMPLE that can have higher efficiency in situations with non-Poisson counts.
Without loss of generality, we take N(0) ≡ 0 and thus Λ0(0) = 0. We assume that the counting process and the observation mechanism are independent. Section 2 presents a generalized least squares estimator for the mean function Λ0(·) subject to the constraint that Λ0(·) is monotone, and then establishes consistency of the estimator. Section 3 provides procedures to implement the estimator, and Section 4 examines the specific estimators that result from several choices of weight function. Section 5 presents the results of a simulation study comparing the proposed estimators with those of Sun and Kalbfleisch (1995) and Wellner and Zhang (2000). Section 6 discusses the results and extensions.
2. A Generalized Least Squares Monotonic Estimator
Hu and Lagakos (2007) show that two commonly-used nonparametric estimators, the sample mean of available data and the Nelson-Aalen estimator, can be derived as a solution to a weighted sum of squares criterion. This, along with the monotonicity of the mean function Λ0(·), leads us to explore a criterion for defining nonparametric estimators through minimizing a generalized sum of squares subject to the monotonicity constraint. In the following, we introduce a generalized sum of squares criterion, and establish consistency of the estimator obtained through its constrained maximization.
Let T = (TK,1, …, TK,K)′, with TK,j denoting the jth of the K observation times during the period [0, τ], with 0 < τ < ∞, where τ is fixed and K can be random. Let N = (N(TK,1), …, N(TK,K))′ and Λ = (Λ(TK,1), …, Λ(TK,K))′. Assume {Xi: i = 1, …, n} is a set of i.i.d. samples of X = (K, T, N). We consider the generalized sum of squares with a symmetric weight function w(u, v) for u, v ∈ {TK,1, …, TK,K},
Ln(Λ; w) = Σ_{i=1}^{n} m(Λ; w | Xi),   (1)

where

m(Λ; w | X) = (N − Λ)′ W (N − Λ),   (2)
and where W is the K × K symmetric matrix with (j, l) entry w(TK,j, TK,l). The weight function is deterministic given (K, T). Let Λ̂n(·; w) be the minimum point of Ln(Λ; w) over all nondecreasing functions defined in [0, τ], that is,
Λ̂n(·; w) = argmin_{Λ ∈ ℱ} Ln(Λ; w),   (3)
with ℱ = {Λ(·): nondecreasing in [0, τ]}. Following Wellner and Zhang (2000), we define a measure μ on [0, τ] based on the distribution of (K, T):

μ(B) = E{ Σ_{j=1}^{K} 1(TK,j ∈ B) },

where B is a Borel set in [0, τ]. For Λ, Λ′ ∈ ℱ, define

d(Λ, Λ′) = { ∫_{[0,τ]} |Λ(t) − Λ′(t)|² dμ(t) }^{1/2}.   (4)
The following theorem establishes the consistency of Λ̂n(·; w) with respect to the metric d.
Condition A: Λ0(t) = E{N (t)} and c(t, s) = Cov{N (t), N (s)} exist for t, s ∈ [0, τ], Λ0(·) is strictly increasing with Λ0(0) = 0, and E{N′N} < ∞.
Condition B: The K × K weight matrix W = (w(TK,j, TK,l)) can be expressed as W = A′A with A nonsingular, and E{‖A‖²} < ∞ and E{‖A⁻¹‖²} < ∞, where ‖·‖ denotes a matrix norm.
Condition C: μ({τ}) > 0 and E{K²} < ∞.
Theorem 1
Assume Conditions A–C. For a given weight function w, d(Λ̂n(·; w), Λ0(·)) → 0 almost surely as n → ∞.
A proof of the theorem is outlined in Appendix A. Following the arguments in Wellner and Zhang (2000), we can extend the result to the situations with μ({τ}) = 0. With some additional conditions on the observation mechanism, we may also show that the convergence rate of Λ̂n(·; w) is at least n1/3.
3. Implementation of Estimator Λ̂n(·; w)
We begin by introducing an estimator closely related to Λ̂n(·; w) with an arbitrary weight function, then show how this can be used to find Λ̂n(·; w).
3.1. Preparation
Denote the set of distinct values of the observation times {TKi,j: j = 1, …, Ki, i = 1, …, n} by 𝒮n = {sl: l = 1, …, Jn} with 0 = s0 < s1 < … < sJn ≤ τ. Let N̰ = (N(s1), …, N(sJn))′, with N̰i denoting the realization from subject i, Λ̰ = (Λ(s1), …, Λ(sJn))′, and Δi = diag{δi(sl): sl ∈ 𝒮n}, where δi(sl) = 1 or 0 according to whether or not Ni(sl) is observed. Further, let W̰i be the Jn × Jn matrix with (l, l′) entry wi(sl, sl′) = w(sl, sl′) if both l and l′ belong to {k: there exists a j such that TKi,j = sk}, and zero otherwise. The objective function in (1) is then expressible as
Ln(Λ; w) = Σ_{i=1}^{n} (N̰i − ΔiΛ̰)′ W̰i (N̰i − ΔiΛ̰).   (5)
The estimator Λ̂n(·; w) defined in (3) is thus determined by Λ̰̂n(w) = argmin_{Λ̰ ∈ 𝒞} Ln(Λ; w) for a given weight function w(·), where 𝒞 is the cone in ℛ^{Jn} = {y = (y1, …, yJn)′: yj ∈ ℛ} defined by 𝒞 = {y: 0 ≤ y1 ≤ … ≤ yJn}. A natural choice for Λ̂n(·; w) is the step function starting at 0 at t = 0 and having a jump at each sl equal to the lth increment of Λ̰̂n(w), l = 1, …, Jn. For a given weight function w(·), we can therefore construct Λ̂n(·; w) from Λ̰̂n(w), and in the following we focus on how to obtain Λ̰̂n(w).
The objective function Ln(Λ; w) in (5) can be decomposed into the sum of two terms,

Ln(Λ; w) = Σ_{i=1}^{n} (N̰i − ΔiΛ̰̄n(w))′ W̰i (N̰i − ΔiΛ̰̄n(w)) + (Λ̰ − Λ̰̄n(w))′ Bn(w) (Λ̰ − Λ̰̄n(w)),   (6)

with

Bn(w) = Σ_{i=1}^{n} Δi W̰i Δi,  Λ̰̄n(w) = Bn(w)⁻¹ Σ_{i=1}^{n} Δi W̰i N̰i.   (7)
Note that Λ̰̄n(w) minimizes Ln(Λ; w) in (5) without the monotonicity constraint. It is easy to show that, conditional on the observation times and for a given weight function w(·), Λ̰̄n(w) is an unbiased estimator of Λ̰0 = (Λ0(s1), …, Λ0(sJn))′, provided the inverse in (7) exists. The conditional covariance matrix of Λ̰̄n(w) is Bn(w)⁻¹ {Σ_{i=1}^{n} Δi W̰i Var(N̰i) W̰i Δi} Bn(w)⁻¹. In fact, Λ̰̄n(w) gives a consistent estimator of Λ0(t) for t ∈ 𝒮 = lim_{n→∞} 𝒮n, provided that the probability of an observation at t ∈ 𝒮 is positive (Hu and Lagakos (2007)).
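As a concrete illustration of (7), the bar estimator and the Bn(w) matrix can be assembled directly from per-subject data. The following Python sketch uses toy data and illustrative names (none of it is from the paper's software); with the OSM (identity) weight it reproduces the pointwise sample mean of the available counts.

```python
import numpy as np

# Sketch of the unconstrained "bar" estimator of (7):
#   Bn(w) = sum_i Delta_i W_i Delta_i,  Lambda_bar = Bn(w)^{-1} sum_i Delta_i W_i N_i.
# All names here are illustrative.

def bar_estimator(deltas, weights, counts):
    """deltas: list of (Jn,) 0/1 arrays; weights: list of (Jn, Jn) W_i; counts: list of (Jn,) N_i."""
    Jn = deltas[0].size
    Bn = np.zeros((Jn, Jn))
    rhs = np.zeros(Jn)
    for d, W, N in zip(deltas, weights, counts):
        D = np.diag(d)                 # Delta_i zeroes rows/columns of missed visits
        Bn += D @ W @ D
        rhs += D @ W @ N
    return np.linalg.solve(Bn, rhs), Bn

# toy data: 3 subjects, 2 distinct times; identity (OSM) weight
deltas  = [np.array([1., 1.]), np.array([1., 0.]), np.array([0., 1.])]
weights = [np.eye(2)] * 3
counts  = [np.array([1., 3.]), np.array([2., 0.]), np.array([0., 4.])]
lam_bar, Bn = bar_estimator(deltas, weights, counts)   # sample means at each time
```

With the identity weight, Bn is the diagonal matrix of the numbers of observations at each time and the bar estimator is the pointwise sample mean, as in Section 4.1 below.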
The bar estimator Λ̰̄n(w) does not always belong to 𝒞, and thus is not in general the same as Λ̰̂n(w). We present procedures to compute Λ̰̂n(w). From (6), the estimator Λ̰̂n(w) can be viewed as the projection of Λ̰̄n(w) into 𝒞 with respect to the metric ρw: ρw(a, b)² = (a − b)′ Bn(w) (a − b) for a, b ∈ ℛ^{Jn}.
3.2. Procedures to implement Λ̰̂n(w)
We consider an application of the result given in Wellner and Zhan (1997) and Wellner and Zhang (2000), restated as Lemma 2 in Appendix B. Let φ(Λ̰) = Ln(Λ; w)/2. The estimator Λ̰̂n(w) can be obtained by jointly solving the following equation and inequalities in Λ̰ ∈ 𝒞:

⟨∇φ(Λ̰), Λ̰⟩ = 0,   (8)

⟨∇φ(Λ̰), y⟩ ≥ 0 for all y ∈ 𝒞.   (9)

Here we have

∇φ(Λ̰) = Bn(w)(Λ̰ − Λ̰̄n(w)),   (10)

with Bn(w) and Λ̰̄n(w) defined in (7). For the special case where Bn(w) is diagonal, (8) and (9) can be solved directly via isotonic regression. In general, it is not easy to jointly solve (8) and (9) for Λ̰̂n(w).
An alternative approach for evaluating Λ̰̂n(w) utilizes the ICM algorithm (e.g., Jongbloed (1998)), restated for our application as Lemma 3 in Appendix B. Specifically, the lth component of the sequence {Λ̰^{(k)}(w): k = 1, 2, …} in the algorithm is the left derivative at r = l of the greatest convex minorant of the cumulative sum diagram

{ ( Σ_{j=1}^{r} φjj(Λ̰^{(k−1)}), Σ_{j=1}^{r} [ φjj(Λ̰^{(k−1)}) Λ̰j^{(k−1)} − φj(Λ̰^{(k−1)}) ] ): r = 0, 1, …, Jn },   (11)

where Λ̰^{(0)} is an initial estimate, φj(Λ̰) is the jth component of ∇φ(Λ̰) in (10), and φjj(Λ̰) is the (j, j) element of the matrix ∇²φ(Λ̰) = Bn(w).
From (6), φ(Λ̰) = Ln(Λ; w)/2 differs from φ*(Λ̰) = (Λ̰ − Λ̰̄n(w))′ Bn(w)(Λ̰ − Λ̰̄n(w))/2 only by a term not involving Λ̰. It can be computationally easier in some situations to use φ*(Λ̰) instead of φ(Λ̰) to obtain the estimator Λ̰̂n(w). The kth evaluation Λ̰^{(k)}(w) in the iterative procedure is the left derivative of the greatest convex minorant of the cumulative sum diagram

{ ( Σ_{j=1}^{r} ajj, Σ_{j=1}^{r} bj(Λ̰^{(k−1)}) ): r = 0, 1, …, Jn },

with arr the (r, r) element of the matrix Bn(w) = ∇²φ(Λ̰), and br(Λ̰) the rth component of the vector diag(Bn(w))Λ̰ − ∇φ*(Λ̰). When Bn(w) is diagonal, the estimator Λ̰̂n(w) is the isotonic regression of Λ̰̄n(w) with respect to the weights given by the diagonal elements of Bn(w) (Barlow et al. (1972), Chap. 1). Thus, we view Λ̰̂n(w) as a generalized isotonic regression of the bar estimator Λ̰̄n(w) with weight matrix Bn(w).
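When Bn(w) is diagonal, the constrained estimator is exactly a weighted isotonic regression, which the pool-adjacent-violators algorithm (PAVA) computes in one pass. A minimal Python sketch, with illustrative names:

```python
import numpy as np

# Weighted isotonic regression by pool-adjacent-violators (PAVA):
# repeatedly merge adjacent blocks that violate monotonicity into their
# weighted mean. A sketch for the diagonal-weight case of Section 3.2.

def pava(y, w):
    """Isotonic regression of y with positive weights w."""
    vals, wts, sizes = [], [], []
    for yi, wi in zip(y, w):
        vals.append(float(yi)); wts.append(float(wi)); sizes.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            # pool the two rightmost blocks into their weighted mean
            v = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / (wts[-2] + wts[-1])
            wts[-2] += wts[-1]
            sizes[-2] += sizes[-1]
            vals[-2] = v
            vals.pop(); wts.pop(); sizes.pop()
    out = []
    for v, s in zip(vals, sizes):
        out.extend([v] * s)
    return np.array(out)

fit = pava([1.0, 3.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0])   # -> [1, 2.5, 2.5, 4]
```

Applied to a non-monotone bar estimate, the pooled blocks are the flat stretches of the step-function estimate.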
4. Choices of Weight Function
We now consider several weight functions for the estimator . Each determines the random matrix WK×K in the expression m(Λ; w|X) in (2).
4.1. Observed sample mean weight (OSM)
In practice, the sample mean of the available data, termed the observed sample mean by Hu and Lagakos (2007), is routinely used to summarize repeated measures with randomly missing data, to indicate trends over time and differences between treatment groups. This suggests taking W in (2) to be WOSM = IK×K, the K × K identity matrix. The matrices W̰i in (5) are then Δi = diag(δi(sj): j = 1, …, Jn). It is easy to verify that this weight satisfies Condition B in Section 2, since WOSM = A′A with A = IK×K. The bar estimator and the Bn matrix in (7) are then
Λ̄n(sj; wOSM) = (1/nj) Σ_{i=1}^{n} δi(sj) Ni(sj), j = 1, …, Jn,   (12)

and Bn(wOSM) = diag(nj: j = 1, …, Jn), with nj = Σ_{i=1}^{n} δi(sj), the number of observations at time sj. See Hu and Lagakos (2007) for more discussion of Λ̄OSM(t) for an arbitrary response process with a more general observation mechanism. For right-censored survival data, it becomes the reduced-sample estimator introduced by Kaplan and Meier (1958) as an alternative to the product-limit estimator.
Following the definition given in Barlow et al. (1972, page 9), the estimator Λ̰̂n(wOSM) is the isotonic regression of Λ̰̄n(wOSM) with weights {nj: j = 1, …, Jn}. Thus Λ̰̂n(wOSM) can be obtained directly from Λ̰̄n(wOSM) and the given weights by

Λ̂n(sl; wOSM) = max_{p ≤ l} min_{q ≥ l} { Σ_{j=p}^{q} nj Λ̄n(sj; wOSM) } / { Σ_{j=p}^{q} nj }, l = 1, …, Jn.
The estimator Λ̂OSM(·) is the same as the estimator of Sun and Kalbfleisch (1995). Wellner and Zhang (2000) note that their NPMPLE is also the same as the Sun-Kalbfleisch estimator.
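The max-min characterization above can be checked numerically; a small Python sketch (illustrative names, equal weights nj = 1):

```python
import numpy as np

# Max-min formula for weighted isotonic regression:
#   fit[l] = max over p <= l of min over q >= l of the weighted average of bar[p..q].
# A direct (O(J^3)) sketch for checking small examples.

def maxmin_isotonic(bar, n):
    J = len(bar)
    out = np.empty(J)
    for l in range(J):
        out[l] = max(
            min(np.dot(n[p:q + 1], bar[p:q + 1]) / np.sum(n[p:q + 1])
                for q in range(l, J))
            for p in range(l + 1)
        )
    return out

fit = maxmin_isotonic(np.array([1.0, 3.0, 2.0, 4.0]),
                      np.array([1.0, 1.0, 1.0, 1.0]))   # -> [1, 2.5, 2.5, 4]
```

The result agrees with the pool-adjacent-violators solution for the same data; the max-min form is convenient for proofs, PAVA for computation.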
4.2. Cumulative observed increment weight (COI)
Motivated by the construction of the Kaplan-Meier estimator, Hu and Lagakos (2007) propose an estimator based on the cumulative observed increments (COI), and show that it can be more efficient than the observed sample mean when observations at different times from the same individual are correlated. Following their approach, we take the weight matrix W in (2) to be WCOI = Ω′Ω with

Ω the K × K matrix with 1's on the diagonal, −1's on the subdiagonal, and 0's elsewhere,   (13)

so that ΩN = (ΔN(TK,1), …, ΔN(TK,K))′. It is easy to verify that this weight function satisfies Condition B in Section 2. The associated objective function Ln(Λ; wCOI) depends only on the observed increments ΔN(TK,j) = N(TK,j) − N(TK,j−1), j = 1, …, K, with TK,0 = 0. Here the matrix W̰i in (5) becomes Ω̰i′Ω̰i, where Ω̰i is the Jn × Jn matrix whose (j, l) element is zero or equals the corresponding element of Ω in (13), according to whether subject i lacks or has observations at sj and sl.
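Under the first-difference reading of (13), Ω maps the stacked counts to their increments; a small Python sketch (names are ours):

```python
import numpy as np

# The COI weight is W_COI = Omega' Omega, where Omega is the first-difference
# matrix: row j of Omega @ N gives N(t_j) - N(t_{j-1}) (with N(t_0) = 0).
# An illustrative construction.

def diff_matrix(K):
    Omega = np.eye(K) - np.eye(K, k=-1)   # 1's on diagonal, -1's on subdiagonal
    return Omega

Omega = diff_matrix(4)
N = np.array([2.0, 3.0, 3.0, 7.0])        # nondecreasing panel counts
increments = Omega @ N                     # -> [2, 1, 0, 4]
```

Since Ω is unit lower bidiagonal it is nonsingular (determinant 1), so WCOI = Ω′Ω satisfies the decomposition required by Condition B.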
Hu and Lagakos (2007) note that, with the weight wCOI, the bar estimator Λ̄n(·; wCOI) is the nonparametric version of the estimator studied in Lawless and Nadeau (1995) and Lin, Wei, Yang, and Ying (2000) for semiparametric estimation of the mean function Λ0(·) from right-censored counting process data, which extends the Nelson-Aalen estimator of the cumulative intensity function under the Poisson assumption.
The components of Λ̰̄n(wCOI) are not in general in nondecreasing order with periodic observations. The estimator Λ̰̂n(wCOI), which minimizes the second term of (6) with the weight wCOI, is thus more desirable in the current setting. The matrix Bn(wCOI) = Σ_{i=1}^{n} Ω̰i′Ω̰i is not in general diagonal, and thus Λ̂n(·; wCOI) is a generalized isotonic regression of Λ̄n(·; wCOI). We can use the procedures in Section 3.2 to compute Λ̰̂n(wCOI) and then obtain Λ̂n(·; wCOI).
4.3. Generalized estimating equation weights (GEE)
Motivated by the construction of generalized estimating equations (see Diggle, Liang and Zeger (1994), Chap.8) for parametric estimation, consider the weight matrix W in (2) to be the inverse of the covariance matrix of N = (N(TK,1), …, N (TK,K))′, conditional on the number of observations K and the observation times T = (TK,1, …, TK,K)′; that is,
WGSM = {Var(N | K, T)}⁻¹.   (14)
The weight matrices W̰i in (5) for evaluating the estimator are now W̰GSM,i = Δi(Δi Var(N̰) Δi)⁻ Δi, where D⁻ denotes the Moore-Penrose generalized inverse of D, and Var(N̰) = (c(sj, sl)). Another option uses the weight
WGCI = Ω′{Ω Var(N | K, T) Ω′}⁻¹ Ω,   (15)
where Ω is defined in (13) and ΔN(TK,j) = N(TK,j) − N(TK,j−1). The weight matrices W̰i in (5) are then W̰GCI,i = Ω̰i′(Ω̰i Var(N̰) Ω̰i′)⁻ Ω̰i.
Using either WGSM in (14) or WGCI in (15) requires the covariance matrix Var(N̰). When this is unknown, as would often occur in practice, it can be replaced by an estimate. Note that, with Var(N̰) substituted by the Jn × Jn identity matrix, WGSM and WGCI reduce to WOSM (§4.1) and WCOI (§4.2), respectively. Below we consider two approximations to the covariance matrix Var(N̰) and the associated estimation procedures, and we compare the resulting estimators with the Wellner-Zhang NPMPLE and NPMLE.
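A sketch of forming the subject-level weight W̰GSM,i = Δi(Δi Var(N̰) Δi)⁻ Δi with a Moore-Penrose pseudo-inverse, using a toy matrix in place of Var(N̰) (illustrative names; in practice Var(N̰) would be an estimate):

```python
import numpy as np

# Subject-level GEE-type weight: the Moore-Penrose pseudo-inverse handles the
# rows/columns zeroed out by Delta_i at missed visits. Illustrative sketch.

def gsm_weight(delta, var_N):
    D = np.diag(delta)                       # Delta_i for subject i
    return D @ np.linalg.pinv(D @ var_N @ D) @ D

var_N = np.array([[1.0, 0.5],
                  [0.5, 2.0]])               # toy stand-in for Var(N)
W_full = gsm_weight(np.array([1.0, 1.0]), var_N)   # fully observed: plain inverse
W_miss = gsm_weight(np.array([1.0, 0.0]), var_N)   # second visit missed
```

With no missed visits the weight reduces to the ordinary inverse covariance; with a missed visit, the corresponding row and column are zero and the remaining block is inverted.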
Approximation A: using Var(N̰) ≈ diag(Λ0(sj): j = 1, …, Jn)
If the counting process {N(t), t > 0} is Poisson, this approximation equals the diagonal of the true covariance matrix; we plug it into either WGSM (14) or WGCI (15) to obtain a GEE-type weight.
Use of this approximation in WGSM (14) corresponds to the weight WGSM* = diag(1/Λ0(TK,j): j = 1, …, K). Provided E{N(t)} = Λ0(t) > 0 for t > 0, the weight matrix based on this approximation satisfies Condition B in Section 2. The matrix Bn(w) = ∇²φ(Λ̰) is the diagonal matrix diag(nj/Λ0(sj): j = 1, …, Jn). Thus, if Λ0(·) were known, the resulting estimator Λ̰̂n(wGSM*) would be the isotonic regression of Λ̰̄n(wGSM*) with weights {nj/Λ0(sj): j = 1, …, Jn}. Note that Λ̰̄n(wGSM*) is the same as Λ̰̄n(wOSM) given by (12), whereas Λ̰̂n(wOSM) in Section 4.1 is the isotonic regression of Λ̰̄n(wOSM) with weights {nj: j = 1, …, Jn}. The hat estimator with the current weight is therefore not, in general, the same as the estimator Λ̰̂n(wOSM), which corresponds to the estimator proposed by Sun and Kalbfleisch (1995).
Because the weight wGSM* involves Λ0(·), we consider the following iterative algorithm to implement the estimator Λ̂n(·; wGSM*): given the (k − 1)th iterate Λ̰^{(k−1)}, obtain the kth iterate by using the weight wGSM* with Λ0 substituted by Λ̰^{(k−1)}, for k = 1, 2, …, until the sequence converges.
Appendix C shows that applying the ICM algorithm (Lemma 3 in Appendix B) to maximize the pseudo-likelihood given in Wellner and Zhang (2000) amounts to computing the isotonic regression of Λ̰̄n(wOSM) with weights {nj/Λ̰(sj): j = 1, …, Jn}. As pointed out in Wellner and Zhang (2000), the procedure should converge to the Wellner-Zhang NPMPLE, which is the same as the Sun-Kalbfleisch (1995) estimator, that is, Λ̂n(·; wOSM). Since Λ̄n(sj; wOSM) is a consistent estimator of Λ0(sj), the estimator resulting from the iterative procedure is then close to Λ̂n(·; wOSM) for large n. This is confirmed for Poisson counts via simulation in Section 5.1.
Approximation B: using Var(N̰) ≈ (σjl) with σjl = σlj = Λ0(sj) for j ≤ l, j, l = 1, …, Jn
Applying the approximation in WGCI (15), the weight in the estimation is WGCI* = Ω′ diag(1/ΔΛ0(TK,j): j = 1, …, K) Ω, where ΔΛ0(TK,j) = Λ0(TK,j) − Λ0(TK,j−1). If {N(t), t > 0} is a Poisson process, this approximation uses the true covariance matrix of the increments ΔN(TK,j), j = 1, …, K, for which Cov{ΔN(TK,j), ΔN(TK,l) | T, K} equals ΔΛ0(TK,j) if l = j and 0 otherwise. The following additional condition, which holds in many practical situations, ensures that the weight satisfies Condition B in Theorem 1.
Condition D: The observation times are ε-separated; that is, there exists a constant ε > 0 such that P(TK,j − TK,j−1 ≥ ε: j = 1, …, K) = 1.
Wellner and Zhang (2000) appear to need this condition in deriving the asymptotics of their nonparametric maximum likelihood estimator (NPMLE).
Applying Lemma 3 (the ICM algorithm) in Appendix B, we can evaluate Λ̰̂n(wGCI*) as described in Section 3.2; it is a generalized isotonic regression of the corresponding bar estimator Λ̰̄n(wGCI*). Since the weight involves Λ0(·), an iterative algorithm similar to the one for wGSM* is needed to implement the estimator.
Appendix D shows the connection of Λ̂n(·; wGCI*) to the Wellner-Zhang (2000) NPMLE. The difference arises because Λ̰ in the weight matrix is fixed at the previous iterate when evaluating Λ̂n(·; wGCI*), whereas it is treated as unknown when evaluating the Wellner-Zhang NPMLE. Section 5 compares the two estimators in various situations via simulation.
5. Simulation Study
To study the finite-sample properties of the estimators with the weights discussed in Section 4, we conducted a simulation with N(·) = {N(t), 0 < t ≤ 1} a Poisson or mixed-Poisson process. We used n = 100 i.i.d. realizations of N(·), combined with one of the following two observation schemes, each chosen to yield an average of four observations per subject.
Observation Scheme A: Observation times for each individual were generated from a time-homogeneous Poisson process with rate 4. This scheme simulates a study in which observation times vary among individuals.
Observation Scheme B: The potential observation times are tj = 0.05, 0.10, …, 0.95, 1.00, with the probability of an observation at time tj decreasing in j, and with the presence or absence of observations at the different times independent. This observation scheme is intended to simulate a study with pre-scheduled observation times, but where different subjects can have missed visits and the risk of a missed visit increases as the study proceeds.
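Observation Scheme A, for instance, can be simulated as follows (a Python sketch with our own names; the rate 4 matches the targeted average of four observations per subject):

```python
import numpy as np

# Observation Scheme A: per-subject observation times from a homogeneous
# Poisson process with rate 4 on (0, 1]. Illustrative sketch.

def scheme_A(rng, rate=4.0, tau=1.0):
    K = rng.poisson(rate * tau)                     # random number of visits
    return np.sort(rng.uniform(0.0, tau, size=K))   # ordered visit times

rng = np.random.default_rng(0)
times = [scheme_A(rng) for _ in range(2000)]
avg_K = np.mean([t.size for t in times])            # should be near 4
```

The number of visits K is Poisson(4) and, given K, the times are order statistics of uniforms on (0, τ], the standard representation of a homogeneous Poisson process.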
The following estimators were considered: Λ̂n(·; w) and Λ̄n(·; w), with the OSM, COI, and several GEE-type weights. For the GEE-type estimators in Section 4.3, we used the following weights: (i) WGSM in (14) and WGCI in (15) with the true covariance matrix, to show the best the GEE weights can achieve; (ii) WGSM in (14) and WGCI in (15) with the covariance matrix replaced by the sample covariance matrix obtained from a random sample of size 30, denoted respectively by WGSM2 and WGCI2; and (iii) WGSM* and WGCI* in Section 4.3, with Λ0(·) replaced by its current estimate at each stage of the algorithms.
With the OSM weight, Λ̂n(·; w) corresponds to the Sun-Kalbfleisch estimator (i.e., the Wellner-Zhang NPMPLE) and is denoted by SK. We also computed the Wellner-Zhang NPMLE, denoted by WZMLE.
Our primary program was written in C, and we used the Splus functions runif, rpois, and rgamma (the Splus generators of uniform, Poisson, and Gamma random variables) to generate the random variables needed in the simulation. Iterations were terminated when the largest change in any component of the current estimate of Λ0(·) from the previous estimate was below 10⁻⁵. All simulations converged. The results reported below are based on 200 repetitions of each simulation setting.
5.1. Time-nonhomogeneous Poisson panel counts
The response process {N(t), t ∈ (0, 1]} was generated as a time-nonhomogeneous Poisson process with Λ0(·) = 6tγ, where γ was 1/2, 1, or 2. Coupled with the two observation schemes described above, this gave six simulation settings. In each setting, the sample means of the estimators were very close to the true mean functions over time, except occasionally in the right tail where there were fewer observations, confirming that the estimators studied are consistent.
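One standard way to generate such panel counts is the time transformation of a homogeneous Poisson process: with Λ0(t) = 6t^γ on (0, 1], draw a Poisson(6) number of events uniform on the Λ0 scale and map them back by t = (s/6)^{1/γ}. A Python sketch (names are ours):

```python
import numpy as np

# Nonhomogeneous Poisson counts with mean Lambda0(t) = 6 t^gamma on (0, 1],
# via the inverse-mean-function time transformation. Illustrative sketch.

def nonhomog_poisson_counts(rng, t_obs, gamma):
    n_events = rng.poisson(6.0)                    # Lambda0(1) = 6
    s = rng.uniform(0.0, 6.0, size=n_events)       # homogeneous on the Lambda0 scale
    event_times = (s / 6.0) ** (1.0 / gamma)       # back to the t scale
    return np.array([(event_times <= t).sum() for t in t_obs])

rng = np.random.default_rng(1)
t_obs = np.array([0.25, 0.5, 1.0])
sims = np.array([nonhomog_poisson_counts(rng, t_obs, gamma=0.5)
                 for _ in range(4000)])
emp_mean = sims.mean(axis=0)                       # approaches 6 * sqrt(t_obs)
```

For γ = 1/2, the empirical means approach 6√t, matching the target mean function.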
By examining the simulation results, we have the following observations.
Finding 1: The bar estimators Λ̄n(·; w), obtained without a monotonicity constraint, are usually much easier to compute than the hat estimators Λ̂n(·; w). However, in addition to ensuring monotonicity, the hat estimators Λ̂n(·; w) generally had smaller sample mean squared errors.
Finding 2: The COI-type weights (e.g., WCOI and WGCI*), based on observed increments of the response process, in general led to estimators with smaller sample mean squared errors than those obtained using the OSM-type weights (e.g., WOSM and WGSM*). This parallels the greater efficiency of the NPMLE compared to the NPMPLE (i.e., SK) estimator, since these can be viewed as based on observed increments and on observed responses, respectively. When the true or a sample covariance matrix is used in the GEE-type weights, the resulting OSM and COI estimators (GSM and GCI, or GSM2 and GCI2) are very similar.
Finding 3: By taking into account the covariance structure of the response process, the estimators using GEE weights generally have higher efficiency. However, when the covariance or its approximation involves unknown parameters, more computation time is needed to compute the estimates GSM* and GCI* than the corresponding ones with the OSM or COI weight.
The Wellner-Zhang NPMLE performed best in all the simulated settings. All the evaluations of the SK estimator (i.e., Λ̂n(·; w) with weight WOSM) were very close to those of Λ̂n(·; w) with weight WGSM*. This numerically confirms that in Poisson situations the NPMPLE is the same as the Sun-Kalbfleisch estimator, that the ICM algorithm gives the NPMPLE, and that GSM* is close to the NPMPLE. The estimates with weights WGSM or WGCI, using the true covariance matrix Var(N̰) or the sample covariance matrix, were similar, and had better efficiency than the corresponding estimates with WGSM* or WGCI*. To illustrate the findings, Figure 1 presents the pointwise sample means and sample mean squared errors of the estimates under nonhomogeneous Poisson responses with γ = 1/2 and Observation Scheme A.
Figure 1.
Sample Means and Sample Mean Squared Errors of Estimators with Poisson Panel Counts Under Observation Scheme A
5.2. Mixed-Poisson panel counts
We next took N(·) to be a mixed-Poisson process with conditional mean function Λ(t | α) = 6αt, where the random effect α was a Gamma random variable with mean 1 and variance θ = 1, 2, or 3. This corresponds to the unconditional mean function Λ0(t) = 6t and variance function Var{N(t)} = Λ0(t){1 + Var(α)Λ0(t)}, and to a process with dependent increments (Lawless (1987)). The overdispersion of the simulated counting processes depends on the variance of the random effect, θ = Var(α). Coupled with the two observation schemes, the three choices of θ gave a total of six settings. Figure 2 presents the pointwise sample means and sample mean squared errors of the estimates from data generated under the mixed-Poisson process with θ = 2 and Observation Scheme B.
Figure 2.
Sample Means and Sample Mean Squared Errors of Estimators with Mixed-Poisson Panel Counts (θ = 2) Under Observation Scheme B
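The mixed-Poisson mechanism and its variance function can be checked by direct simulation: draw α from a Gamma with mean 1 and variance θ, then N(t) | α ~ Poisson(6αt). A Python sketch (names are ours):

```python
import numpy as np

# Mixed-Poisson counts: alpha ~ Gamma(mean 1, variance theta), and
# N(t) | alpha ~ Poisson(6 * alpha * t), so that
#   E{N(t)} = 6t  and  Var{N(t)} = Lambda0(t) * (1 + theta * Lambda0(t)).
# Illustrative sketch.

rng = np.random.default_rng(2)
theta, t = 2.0, 0.5
alpha = rng.gamma(shape=1.0 / theta, scale=theta, size=200_000)  # mean 1, var theta
N_t = rng.poisson(6.0 * alpha * t)

lam0 = 6.0 * t                                # = 3
theory_var = lam0 * (1.0 + theta * lam0)      # = 3 * (1 + 6) = 21
emp_mean, emp_var = N_t.mean(), N_t.var()
```

The empirical variance is roughly seven times the mean here, illustrating the overdispersion, relative to a Poisson process, that grows with θ.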
Based on the simulation outcome, the general observations for the mixed-Poisson processes were the same as Findings 1–3 in Section 5.1 for time-nonhomogeneous Poisson processes. However, in this non-Poisson setting, the Wellner-Zhang NPMLE was no longer the most efficient estimator. As θ, a measure of the dependence of increments, increased, the proposed estimator Λ̂(·; w) with weights WGSM and WGCI, or with weights WGSM2 and WGCI2, showed higher efficiency. Moreover, the estimates with a GEE weight under Approximation A or B, which use a Poisson covariance matrix to approximate the true covariance, did not improve much in efficiency over the OSM or COI estimates. This indicates the need to explore alternative approximations.
Unlike the Poisson settings of Section 5.1, the evaluations of SK (that is, the NPMPLE, Λ̂(·; w) with weight WOSM) and GSM* (i.e., Λ̂n(·; w) with weight WGSM*) were not in close agreement. Although GSM* is consistent in theory, the sample means of the GSM* estimates indicate some bias, especially in the settings with Observation Scheme B. This may be due to fewer observations late in the study period.
6. Final Remarks
This paper presents a general class of estimators for the mean function of a counting process based on panel counts. Special cases include the estimator proposed by Sun and Kalbfleisch (1995), and estimators similar to the NPMLE estimator considered by Wellner and Zhang (2000). Simulations suggest that the use of GEE weights can lead to efficiency close to that of the Wellner-Zhang NPMLE for Poisson processes and better efficiency than the Wellner-Zhang NPMLE for non-Poisson processes. With modification, the proposed estimator can be generalized to accommodate nondecreasing processes with jumps that are not necessarily of size one.
Several further investigations would be worthwhile. One of theoretical and practical interest is the determination of asymptotic variance and variance estimation for the estimator Λ̂(·; w). In principle, a resampling method could be used. Another is to find the optimal weight for the estimator to achieve the best efficiency in a given situation, and a third is to extend the estimator to the situations where the observation mechanism can depend on the response.
The estimator Λ̂n(·; w) is an M-estimator subject to the monotonicity constraint. As done by Wellner and Zhang (2000), we apply the iterative convex minorant algorithm (ICM) to implement the estimator. This may lead to extensive computation time, and thus faster algorithms would be useful. Finally, the methods can be readily extended to incorporate covariates which, among other things, can be used to assess the dependence of the observation and the response processes.
Acknowledgments
The research was partially supported by grants from the US National Institute of Allergy and Infectious Diseases and the Natural Sciences and Engineering Research Council of Canada. The authors thank Professor Y. Zhang for many helpful discussions, and two Referees for their constructive comments and suggestions.
Appendix: Some Technical Details
A. Proof of Theorem 1
We first state a lemma to be used in the proof, then outline the proof of consistency. The approach is similar to the one in Wellner and Zhang (2000). The following lemma is a version of the one-sided Glivenko-Cantelli theorem given in Wellner and Zhang (2000) or Ferguson (1996, Section 17).
Lemma 1
Suppose that 𝒰 = {U(·; θ): θ ∈ Θ} is a class of measurable functions defined on a probability space (𝒳, 𝒜, P), where Θ is compact with respect to a metric d and U(x; θ) is lower semicontinuous in θ for all x. Suppose further that there exists a function V(x) such that EV(X) < ∞ and U(x; θ) ≤ V(x) for all x ∈ 𝒳 and θ ∈ Θ, and that for all θ and all sufficiently small ρ > 0, inf{φ: d(φ,θ)<ρ} U(x; φ) is measurable in x. Then if X1, …, Xn are i.i.d. P with values in 𝒳, and ℙn is the empirical measure of the Xi's, almost surely

lim inf_{n→∞} inf_{θ∈Θ} { ℙn U(·; θ) − PU(X; θ) } ≥ 0.

Moreover, PU(X; θ) is lower semicontinuous in θ ∈ Θ: lim inf_{φ→θ} PU(X; φ) ≥ PU(X; θ).
Proof of Theorem 1
Denote by M(Λ; w) = E{m(Λ; w|X)} the limit of the objective function Mn(Λ; w) = Ln(Λ; w)/n in (1). We note first that M(Λ; w) ≥ M(Λ0; w) for all Λ ∈ ℱ, with equality if and only if Λ = Λ0 a.e. μ, since

M(Λ; w) = M(Λ0; w) + E{(Λ − Λ0)′ W (Λ − Λ0)},

and the second term is zero if and only if Λ(t) = Λ0(t) for all t = TK,j, W being positive definite (Condition B).
We can then show that the sequence of nondecreasing functions {Λ̂n(·; w)} is bounded on [0, τ] almost surely and hence, by Helly's Selection Theorem, for each ω there is a convergent subsequence {Λ̂n*(ω)(·; w)} of {Λ̂n(·; w)(ω)}. The boundedness follows by noting that

Mn(Λ̂n; w) ≤ Mn(Λ0; w) → M(Λ0; w) < ∞ almost surely,

because Λ̂n = argminΛ∈ℱ Mn(Λ; w), and then applying Conditions A and B: for every t ∈ [0, τ], Λ̂n(t; w) is bounded, with A the matrix in the decomposition W = A′A of Condition B. Denote the bound by C, and let Λ*(·; w) be the limit of the subsequence {Λ̂n*(ω)(·; w)}. We know that M(Λ*; w) ≥ M(Λ0; w). We show that
M(Λ*; w) ≤ M(Λ0; w),   (16)
and thus M(Λ*; w) = M(Λ0; w), which implies Λ* = Λ0 a.e. μ; that is, all limits of subsequences of {Λ̂n(·; w)} are Λ0 a.e. μ, and this proves the theorem.
Take ℱτ = {Λ ∈ ℱ: Λ(τ) ≤ C}, compact in the metric d of (4), and ℳτ(w) = {m(Λ; w|X): Λ ∈ ℱτ}, a class of measurable functions indexed by ℱτ for a given weight function. We note that, for given w and X, m(Λ; w|X) ∈ ℳτ(w) is lower semicontinuous in Λ ∈ ℱτ, being a continuous (quadratic) function of the finitely many values Λ(TK,j), and has an integrable envelope, since

m(Λ; w|X) ≤ 2N′WN + 2Λ′WΛ ≤ 2N′WN + 2C²K‖A‖²,

using Conditions A and B. Thus Lemma 1 yields
lim inf_{n→∞} inf_{Λ ∈ ℱτ} { ℙn m(Λ; w|·) − E m(Λ; w|X) } ≥ 0   (17)
almost surely. Therefore,

lim inf_{n*→∞} Mn*(Λ̂n*; w) ≥ lim inf_{n*→∞} M(Λ̂n*; w) = M(Λ*; w)

almost surely, using (17) and the Dominated Convergence Theorem. Then (16) follows by noting that

Mn*(Λ̂n*; w) ≤ Mn*(Λ0; w) → M(Λ0; w)

by the Strong Law of Large Numbers and the definition of Λ̂n(·; w).
B. Some known results
For the purpose of our application, we restate Theorem 2.1 of Wellner and Zhang (2000) or Lemma 3.1 of Wellner and Zhan (1997). Write ∇φ for the gradient of φ, ∇2φ for ∇( ∇φ)′, and <·, ·> for the usual inner product in ℛJ.
Lemma 2
Let φ: ℛ^J → ℛ ∪ {∞} be a continuous convex function, 𝒞 ⊂ ℛ^J a convex cone, and 𝒞0 = 𝒞 ∩ φ⁻¹(ℛ). Suppose 𝒞0 is nonempty and φ is differentiable on 𝒞0. Then ẑ ∈ 𝒞0 satisfies φ(ẑ) = min_{z ∈ 𝒞} φ(z) if and only if

⟨∇φ(ẑ), ẑ⟩ = 0 and ⟨∇φ(ẑ), z⟩ ≥ 0 for all z ∈ 𝒞.
It is easy to see that ẑ is determined by ∇φ(z). The following lemma presents the iterative convex minorant (ICM) algorithm as stated in Jongbloed (1998).
Lemma 3
If the convex cone 𝒞 is 𝒞J = {y ∈ ℛ^J: y1 ≤ … ≤ yJ}, the sequence {z^{(k)}: k = 1, 2, …} converges to ẑ = argmin_{z ∈ 𝒞J} φ(z), where, for a fixed z^{(0)} ∈ 𝒞J, the lth component of z^{(k)} is the left derivative at r = l of the greatest convex minorant of the cumulative sum diagram

{ ( Σ_{j=1}^{r} φjj(z^{(k−1)}), Σ_{j=1}^{r} [ φjj(z^{(k−1)}) zj^{(k−1)} − φj(z^{(k−1)}) ] ): r = 0, 1, …, J },

with φj(z) = ∂φ(z)/∂zj and φjj(z) = ∂²φ(z)/∂zj². For 𝒞 = {y ∈ ℛ^J: 0 ≤ y1 ≤ … ≤ yJ}, the negative components of ẑ should be set to zero.
See Jongbloed (1998), for example, for a geometric interpretation of the algorithm. Jongbloed (1998) presents a modified iterative convex minorant algorithm to avoid the problem that the ICM algorithm does not necessarily converge globally.
C. Connection of Λ̂n(·; wGSM*) with the Wellner-Zhang (2000) NPMPLE
Recall that the nonparametric maximum pseudo-likelihood estimator (NPMPLE) considered in Wellner and Zhang (2000) is derived by maximizing

log PL(Λ) = Σ_{i=1}^{n} { Ni′ log Λi − 1Ki′ Λi },

where 1K is the K-dimensional vector with all components 1, and log z denotes (log z1, …, log zK)′ for z = (z1, …, zK)′. As indicated in Wellner and Zhang (2000), PL(Λ) is proportional to the pseudo-likelihood obtained by assuming {N(t), t > 0} is Poisson and ignoring the dependence of events within a subject. Note that

∇ log PL(Λ̰) = Σ_{i=1}^{n} Δi W̰i(Λ̰)(N̰i − ΔiΛ̰),

with the matrix W̰i(Λ̰) = diag(δi(sj)/Λ(sj): j = 1, …, Jn); here Σ_{i=1}^{n} Δi W̰i(Λ̰) Δi is the Bn(w) matrix with weight wGSM* as given in Section 4.3. The solution of ∇ log PL(Λ) = 0 is the bar estimator with weight wOSM, Λ̰̄n(wOSM). Thus the procedure of applying Lemma 3 in Appendix B with φ(Λ̰) = −log PL(Λ) to obtain the Wellner-Zhang NPMPLE is the procedure of computing the isotonic regression of Λ̰̄n(wOSM) with weights {nj/Λ(sj): j = 1, …, Jn}.
D. Connection of Λ̂n(·; wGCI*) with the Wellner-Zhang (2000) NPMLE
The nonparametric maximum likelihood estimator (NPMLE) of Wellner and Zhang (2000) is based on the likelihood of the observed data under the Poisson assumption, whose logarithm is

log FL(Λ) = Σ_{i=1}^{n} Σ_{j=1}^{Ki} { ΔNi(TKi,j) log ΔΛ(TKi,j) − ΔΛ(TKi,j) },

with ΔNi(TKi,j) = Ni(TKi,j) − Ni(TKi,j−1) for j = 1, …, Ki. In our notation,

∇ log FL(Λ̰) = Σ_{i=1}^{n} Δi W̰i(Λ̰)(N̰i − ΔiΛ̰),

where W̰i(Λ̰) = Ω̰i′ diag(δi(sj)/eij: j = 1, …, Jn) Ω̰i, with 0/0 = 0 and eij the jth component of Ω̰iΔiΛ̰; here Σ_{i=1}^{n} Δi W̰i(Λ̰) Δi is the same as the Bn(w) matrix with weight wGCI*. We see that the solution of ∇ log FL(Λ) = 0 with W̰i(Λ̰) held fixed is the bar estimator with the corresponding weight, and equals Λ̰̄n(wGCI*) when W̰i(Λ̰) is evaluated at Λ̰ = Λ̰0.
Moreover, using Lemma 3 in Appendix B with φ(Λ̰) = −log FL(Λ), the kth iteration for the NPMLE is the left derivative of the greatest convex minorant of a cumulative sum diagram of the form given in Section 3.2, now with arr the (r, r) element of the matrix −∇² log FL(Λ̰^{(k−1)}) and br the rth component of the corresponding vector. The difference between this matrix and Bn(wGCI*), in which Λ̰ is fixed at the previous iterate, leads to the difference between Λ̂n(·; wGCI*) and the Wellner-Zhang (2000) NPMLE.
References
- Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical Inference Under Order Restrictions: The Theory and Applications of Isotonic Regression. Wiley; New York: 1972. [Google Scholar]
- Ferguson TS. A Course in Large Sample Theory. Chapman & Hall; London: 1996. [Google Scholar]
- Hu XJ, Lagakos SW. Nonparametric estimation of the mean function of a stochastic process with missing observations. Lifetime Data Anal. 2007;13:51–73. doi: 10.1007/s10985-006-9030-0. [DOI] [PubMed] [Google Scholar]
- Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Amer Statist Assoc. 1958;53:457–481. [Google Scholar]
- Jongbloed G. The iterative convex minorant algorithm for nonparametric estimation. J of Computational and Graphical Statistics. 1998;7:310–321. [Google Scholar]
- Lawless JF. Negative binomial regression models. Can J Statist. 1987;15:209–226. [Google Scholar]
- Lawless JF. The analysis of recurrent events for multiple subjects. Applied Statistics. 1995;44:487–498. [Google Scholar]
- Lawless JF, Nadeau C. Some simple robust methods for the analysis of recurrent events. Technometrics. 1995;37:158–168. [Google Scholar]
- Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. J R Statist Soc B. 2000;62:711–730. [Google Scholar]
- Sun J, Kalbfleisch JD. Estimation of the mean function of point processes based on panel count data. Statistica Sinica. 1995;5:279–290. [Google Scholar]
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes, with Applications to Statistics. Springer; New York: 1996. [Google Scholar]
- Wellner JA, Zhan Y. A hybrid algorithm for computation of the nonparametric maximum likelihood estimator from censored data. J Amer Statist Assoc. 1997;92:945–959. [Google Scholar]
- Wellner JA, Zhang Y. Two estimators of the mean of a counting process with panel count data. Ann Statist. 2000;28:779–814. [Google Scholar]