A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

Nirian Martín; Yi Li

doi:10.1016/j.jmva.2011.03.011

Author manuscript; available in PMC: 2012 Feb 23.

Published in final edited form as: J Multivar Anal. 2011 Sep 1;102(8):1175–1193. doi: 10.1016/j.jmva.2011.03.011

PMCID: PMC3285401 NIHMSID: NIHMS353973 PMID: 22368308

Abstract

The Annual Percent Change (APC) has been adopted by the NCI SEER program as a useful measure for analyzing changing trends in cancer mortality and incidence rates. Difficulties arise, however, when comparing sample APCs between two overlapping regions, because of the induced dependence (e.g., comparing the change rate of cancer mortality in California with that at the national level). This paper offers a new perspective on the sampling distribution of the test statistics for comparing APCs between overlapping regions. The proposed approach is computationally simple and easy to interpret. We further propose a more general family of estimators, the so-called minimum power divergence estimators, which includes the maximum likelihood estimators as a special case. Our simulation experiments support the superiority of the proposed estimator over the conventional maximum likelihood estimator. The proposed method is illustrated by an analysis of SEER cancer mortality rates observed from 1991 to 2006.

Keywords: Minimum power divergence estimators, Age-adjusted cancer rates, Annual percent change (APC), Trends, Poisson sampling

1 Introduction

According to World Health Statistics 2009, published by the World Health Organization, the age-standardized mortality rate attributable to cancer in high-income countries in 2004 was 164 per 100,000. Cancer was the second leading cause of death after cardiovascular disease, whose age-standardized mortality rate was 408 per 100,000. For cancer prevention and control programs, such as the Surveillance, Epidemiology and End Results (SEER) program in the United States (US), it is very important to rely on statistical tools that capture downward or upward trends in the rates associated with each type of cancer and measure their intensity accurately. These trends are defined within a specific spatio-temporal framework; that is, different geographic regions and time periods are considered.

Let $r_{ki}$ be the expected cancer rate associated with region k at the i-th time point of an ordered sequence of $I_k$ time points $\{t_{ki}\}_{i=1}^{I_k}$. We assume that Region 1 starts at the earliest time. Consecutive points are equally spaced, for instance one year apart, so without loss of generality $t_{1i} = i$, $i = 1, \dots, I_1$ (a change of origin or scale in time should not affect a measure of trend). Cancer rates are used to evaluate either the risk of developing cancer (cancer incidence rates) or of dying from cancer (cancer mortality rates) at a given time. Statistically, the trend in cancer rates is the average rate of change per year over a relatively short period of time, under the assumption that the rate changes by a constant proportion from year to year. The annual percent change (APC) is a suitable measure for comparing recent trends in the age-adjusted expected cancer rates

r_{ki} = \sum_{j=1}^{J} ω_{j} r_{kji}, \qquad (1)

where J is the number of age groups, $\{ω_j\}_{j=1}^{J}$ is the age distribution of the standard population ($\sum_{j=1}^{J} ω_j = 1$, $ω_j > 0$, $j = 1, \dots, J$), and $\{r_{kji}\}_{j=1}^{J}$ is the set of expected rates of the k-th region (k = 1, 2) at time point $t_{ki}$ ($i = 1, \dots, I_k$), i.e., the i-th year, in each of the age groups ($j = 1, \dots, J$). For example, the SEER Program uses the 2000 US population as the standard, with J = 19 age groups [0, 1), [1, 5), [5, 10), [10, 15), …, [80, 85), [85, ∞). The APC removes differences in scale by considering the relative change $(r_{k,i+1} - r_{ki})/r_{ki} = r_{k,i+1}/r_{ki} - 1$ under the assumption that the expected rates change by a constant proportion. The proportionality constant $θ_k = r_{k2}/r_{k1} = \dots = r_{kI_k}/r_{k,I_k-1}$ is the basis for defining $\mathrm{APC}_k = 100(θ_k - 1)$, a percentage associated with the expected rates of the k-th region. Since the models for APCs work with the logarithm of age-adjusted cancer rates, this formula is usually rewritten as

\mathrm{APC}_{k} = 100 (\exp(β_{1k}) - 1), \qquad (2)

and our aim is to make statistical inferences on the parameter $β_{1k}$.
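As a minimal illustration of (1) and (2), the Python sketch below (standard library only) computes age-adjusted rates from hypothetical age-specific rates that all change by the same factor each year. It then checks that the ratio-based APC, 100(θ_k − 1), agrees with 100(exp(β_1k) − 1). The function names, weights and parameter values are illustrative assumptions, not SEER quantities.

```python
import math


def age_adjusted_rate(weights, rates):
    """Eq. (1): r_ki = sum_j w_j * r_kji for one region and one time point."""
    if abs(sum(weights) - 1.0) > 1e-9 or any(w <= 0 for w in weights):
        raise ValueError("standard-population weights must be positive and sum to 1")
    return sum(w * r for w, r in zip(weights, rates))


def apc_from_rates(adjusted_rates, tol=1e-9):
    """APC_k = 100 * (theta_k - 1), where theta_k = r_{k,i+1} / r_{ki} for every i."""
    ratios = [b / a for a, b in zip(adjusted_rates, adjusted_rates[1:])]
    if max(ratios) - min(ratios) > tol:
        raise ValueError("the rates do not change by a constant proportion")
    return 100.0 * (ratios[0] - 1.0)


def apc_from_beta(beta1):
    """Eq. (2): APC_k = 100 * (exp(beta_1k) - 1)."""
    return 100.0 * (math.exp(beta1) - 1.0)


# Hypothetical example: three age groups whose rates all decline by exp(-0.005) per year.
weights = [0.2, 0.3, 0.5]
beta0 = [-9.0, -8.0, -7.0]
beta1 = -0.005
rates = [age_adjusted_rate(weights, [math.exp(b0 + beta1 * t) for b0 in beta0])
         for t in range(1, 6)]
print(apc_from_rates(rates), apc_from_beta(beta1))  # both close to -0.499
```

Because every age-specific rate shares the slope β_1k, the weighted sum inherits the same constant ratio θ_k = exp(β_1k).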

The data collected for modeling the APC associated with region k are:

d_kji, the number of deaths (or incidences) in the k-th region, j-th age group, at time point t_ki;
n_kji, the population at risk in the k-th region, j-th age group, at time point t_ki.

The random variables (r.v.s) D_kji that generate the counts d_kji are assumed to be mutually independent. In a sampling framework we can define the empirical age-adjusted cancer rates as $R_{ki} = \sum_{j=1}^{J} ω_j R_{kji} = \sum_{j=1}^{J} ω_j \frac{D_{kji}}{n_{kji}}$, whose expected value is (1). Although the independence assumption on the D_kji simplifies statistical inference, in practice the two APCs to be compared, APC₁ and APC₂, often share some data because the two regions overlap. For example, Riddell and Pliska (2008) analyze county-level data on 22 selected cancer sites during 1996–2005 and compare the APC of each county with the APC of Oregon state. Independence between the data of the counties (local level) and of their state (global level) cannot be assumed. Moreover, comparing APCs between overlapping regions is more complicated when the APCs do not cover the same period of time. For instance, in the same study the Oregon APC was obtained for a period ending in 2005, whereas the US APC was calculated for a period ending in 2004 because US data for 2005 were not available. Figure 1 represents the most complicated overlapping case for two regions (time on the horizontal axis, space on the vertical axis): {1, …, 6} × {5, …, 8} is the set of points of the first region, {5, …, 9} × {2, …, 6} is the set of points of the second region, and {5, 6} × {5, 6} is the set of points of the overlapping region (boxed points). Each of the two regions has a portion of space and a period of time not contained in the other (circular points for region 1 and diamond points for region 2).

Figure 1. Two overlapping regions not sharing the same period of time.

This paper is structured as follows. In Section 2, different models that establish the relationship between r_ki and β_1k are reviewed, and the two basic tools for statistical inference are presented: the estimators and the test statistics for equality of APCs. In particular, the Age-stratified Poisson Regression model, introduced in Li et al. (2008), is highlighted as an improvement over the previous models. In Section 3, a family of estimators based on power-divergence measures that generalizes the maximum likelihood estimators (MLEs) is considered for the Age-stratified Poisson Regression model. Within this family, we introduce a new way of computing the covariance between the MLEs of β_1k. This is the key to substantially improving the Z-test statistic for testing the equality of APCs under the Age-stratified Poisson Regression model, and it yields explicit and interpretable expressions for that covariance. In Section 4, we evaluate the performance of the proposed methodology through a simulation study. We also apply it to breast and thyroid cancer data from California (CA) and the US population, extracted with the SEER*STAT software of the SEER Program. Finally, Section 5 gives some concluding remarks.

2 Models associated with the Annual Percent Change (APC)

When non-overlapping regions are considered, there are basically two models for estimating the APC, which start from slightly different assumptions: the Age-adjusted Cancer Rate Regression model and the Age-stratified Poisson Regression model. The main difference between them lies in the probability distribution of D_kji, the number of deaths in the k-th region, j-th age group, at time point t_ki. The Age-adjusted Cancer Rate Regression model assumes normality for log R_ki, with D_kji having equal mean and variance, whereas the Age-stratified Poisson Regression model directly assumes that D_kji is a Poisson random variable (r.v.). The Age-adjusted Cancer Rate Regression model sets log R_ki = β_0k + β_1k t_ki + ε_ki, where $ε_{ki} \overset{ind}{\sim} \mathcal{N}(0, σ_{ki}^{2})$ with $σ_{ki}^{2} = \sum_{j=1}^{J} ω_{j}^{2} r_{kji}/n_{kji} = \sum_{j=1}^{J} ω_{j}^{2} m_{kji}/n_{kji}^{2}$ under

E[D_{kji}] = \mathrm{Var}[D_{kji}] = n_{kji} r_{kji} \equiv m_{kji}, \qquad (3)

i.e., $\log R_{ki} \overset{ind}{\sim} \mathcal{N}(\log r_{ki}, σ_{ki}^{2})$ with

r_{ki} = \exp(β_{0k}) \exp(β_{1k} t_{ki}). \qquad (4)

According to the Age-stratified Poisson Regression model (Li et al. 2008), $D_{kji} \overset{ind}{\sim} \mathcal{P}(n_{kji} r_{kji})$ and r_kji satisfies

\log r_{kji} = β_{0kj} + β_{1k} t_{ki} \quad \text{or} \quad \log \frac{m_{kji}}{n_{kji}} = β_{0kj} + β_{1k} t_{ki}. \qquad (5)

Observe that the parametrization of both models is essentially the same because the expected age-adjusted rate r_ki in terms of (5) is equal to (4), where

\exp(β_{0k}) = \sum_{j=1}^{J} ω_{j} \exp(β_{0kj}), \qquad (6)

and thus for both models it holds that

θ_{k} = \left(\frac{r_{kI_{k}}}{r_{k1}}\right)^{\frac{1}{t_{kI_{k}} - t_{k1}}} = \exp(β_{1k}). \qquad (7)
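The equivalence between (4) and (5) can be checked numerically. In the sketch below (hypothetical names and parameter values), the age-adjusted rates implied by the age-stratified model are compared with exp(β_0k) exp(β_1k t_ki), using the pooled intercept (6). θ_k is then recovered from the endpoints as in (7).

```python
import math


def stratified_adjusted_rates(weights, beta0_j, beta1, times):
    """r_ki = sum_j w_j exp(beta_0kj + beta_1k t_ki), i.e. Eq. (1) under model (5)."""
    return [sum(w * math.exp(b0 + beta1 * t) for w, b0 in zip(weights, beta0_j))
            for t in times]


def pooled_intercept(weights, beta0_j):
    """Eq. (6): beta_0k = log(sum_j w_j exp(beta_0kj))."""
    return math.log(sum(w * math.exp(b0) for w, b0 in zip(weights, beta0_j)))


def theta_from_endpoints(rates, times):
    """Eq. (7): theta_k = (r_kI / r_k1) ** (1 / (t_kI - t_k1))."""
    return (rates[-1] / rates[0]) ** (1.0 / (times[-1] - times[0]))


# Hypothetical parameters for one region.
w, b0, b1 = [0.25, 0.25, 0.5], [-8.0, -7.5, -7.0], 0.02
years = list(range(1, 11))
r = stratified_adjusted_rates(w, b0, b1, years)
b0k = pooled_intercept(w, b0)
# Model (4) with the pooled intercept reproduces the same rates (relative error ~ 0).
print(max(abs(ri / math.exp(b0k + b1 * t) - 1.0) for ri, t in zip(r, years)))
print(theta_from_endpoints(r, years), math.exp(b1))
```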

The original estimators associated with the Age-adjusted Cancer Rate Regression model and the Age-stratified Poisson Regression model are the Weighted Least Squares estimators (WLSEs) and the Maximum Likelihood estimators (MLEs), respectively.

By (2), the hypothesis of equal trends in two regions, ℋ₀ : APC₁ = APC₂, is equivalent to ℋ₀ : β₁₁ − β₁₂ = 0. Hence, the Z-test statistic for both models can be defined as

Z = \frac{\hat{β}_{11} - \hat{β}_{12}}{\sqrt{\widehat{\mathrm{Var}}(\hat{β}_{11} - \hat{β}_{12})}}, \qquad (8)

where β̂_1k, k = 1, 2, are the estimators of β_1k for each region and $\widehat{\mathrm{Var}}(\hat{β}_{11} - \hat{β}_{12})$ is an estimator of Var(β̂₁₁ − β̂₁₂). For non-overlapping regions, $\mathrm{Var}(\hat{β}_{11} - \hat{β}_{12}) = σ_{11}^{2} + σ_{12}^{2}$, with $σ_{1k}^{2} \equiv \mathrm{Var}(\hat{β}_{1k})$, k = 1, 2. When overlapping regions are considered, the methodology for obtaining the estimators, as well as the Z-test statistic (8), remains valid, but this expression for Var(β̂₁₁ − β̂₁₂) no longer holds. When the overlapping regions do not share the same period of time (t₁₁ ≠ t₂₁ or I₁ ≠ I₂), we introduce a reference index Ī. It is defined so that t_1Ī is the time point of $\{t_{1i}\}_{i=1}^{I_1}$ immediately preceding the start of the time series of the second region, i.e., $\{t_{2i}\}_{i=1}^{I_2}$ satisfies t₂₁ = t_1Ī + 1. In particular, if t_1i = i, i = 1, …, I₁, then t_2i = Ī + i, i = 1, …, I₂. Observe that $\{t_{1i}\}_{i=\bar{I}+1}^{I_1}$, or equivalently $\{t_{2i}\}_{i=1}^{I_1-\bar{I}}$, is the time series of the overlapping region (t_1i = t_2,i−Ī, i = Ī + 1, …, I₁). In Figure 1, I₁ = 6, I₂ = 5 and Ī = 4. Along the spatial overlap {5, 6} we can therefore distinguish three subregions (time × space): {1, …, 4} × {5, 6}, {5, 6} × {5, 6} and {7, 8, 9} × {5, 6}. Without loss of generality, each random variable D_kji can be decomposed into two summands

D_{kji} = D_{kji}^{(1)} + D_{kji}^{(2)}, \qquad (9)

where $D_{kji}^{(1)}$, i ∈ {1, …, I_k}, is the number of deaths (or incidences) in the k-th region, j-th age group, at time point t_ki in the subregion with no spatial overlap. Likewise, $D_{kji}^{(2)}$, i ∈ {1, …, I_k}, is the number of deaths (or incidences) in the k-th region, j-th age group, at time point t_ki in the subregion with spatial overlap. Similarly, $n_{kji} = n_{kji}^{(1)} + n_{kji}^{(2)}$ and $m_{kji}(β_k) = m_{kji}^{(1)}(β_k) + m_{kji}^{(2)}(β_k)$. Observe that when i ∈ {Ī + 1, …, I₁}, the r.v.s $D_{1ji}^{(2)}$ and $D_{2j,i-\bar{I}}^{(2)}$ refer to the same overlapping subregion. Returning to Figure 1, note that along the y-axis (space) there are more points than r.v.s $D_{kji}^{(b)}$ at each time point. Each group of points on the same vertical line inside the dashed portion represents one realization of a single r.v. For instance, at t₁₁ = 1 there are two groups of points, associated with $D_{1j1}^{(1)}$ and $D_{1j1}^{(2)}$ respectively. At t₁₅ = t₂₁ = 5 there are three groups, associated with $D_{1j5}^{(1)}$, $D_{1j5}^{(2)} = D_{2j1}^{(2)}$ and $D_{2j1}^{(1)}$. The groups of points thus represent subregions of different spatial extent. In Figure 1 there are 20 realizations of the r.v.s $D_{kji}^{(b)}$ in total: 12 for region 1 and 10 for region 2, of which 2 are shared by both regions.
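The time bookkeeping for Figure 1 can be sketched as follows, assuming t_1i = i and t_2i = Ī + i as above. The helper names are hypothetical. The realization count assumes, as in Figure 1, that every time point of each region has both a non-overlapping and an overlapping part.

```python
def time_blocks(I1, I2, I_bar):
    """Split the time axis using t_1i = i (i = 1..I1) and t_2i = I_bar + i (i = 1..I2).

    Returns (only_region_1, shared, only_region_2) as lists of time points.
    """
    t1 = list(range(1, I1 + 1))
    t2 = [I_bar + i for i in range(1, I2 + 1)]
    s1, s2 = set(t1), set(t2)
    only1 = [t for t in t1 if t not in s2]
    shared = [t for t in t1 if t in s2]  # t_1i = t_{2,i - I_bar}, i = I_bar + 1, ..., I1
    only2 = [t for t in t2 if t not in s1]
    return only1, shared, only2


def count_realizations(I1, I2, I_bar):
    """Distinct r.v. groups per age group, assuming every time point of each region
    has a non-overlapping part and an overlapping part (as in Figure 1)."""
    _, shared, _ = time_blocks(I1, I2, I_bar)
    return 2 * I1 + 2 * I2 - len(shared)


print(time_blocks(6, 5, 4))         # ([1, 2, 3, 4], [5, 6], [7, 8, 9])
print(count_realizations(6, 5, 4))  # 20
```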

It is important to regard the r.v.s $D_{kji}^{(b)}$, b ∈ {1, 2}, as “homogeneous contributors” to D_kji, i.e., $D_{kji}^{(b)} \sim \mathcal{P}(m_{kji}^{(b)})$ such that (10) holds. Hence $\{m_{1ji}^{(2)}(β_1)\}_{i=\bar{I}+1}^{I_1}$ and $\{m_{2ji}^{(2)}(β_2)\}_{i=1}^{I_1-\bar{I}}$ coincide only when β₁₁ = β₁₂ (or, equivalently, when β₁ = β₂). We can now state precisely why, under β₁₁ = β₁₂, Cov(β̂₁₁, β̂₁₂) = 0 fails in Var(β̂₁₁ − β̂₁₂) = Var(β̂₁₁) + Var(β̂₁₂) − 2Cov(β̂₁₁, β̂₁₂) for overlapping regions. The reason is that {D_1ji}_{i=1,…,I₁; j=1,…,J} and {D_2ji}_{i=1,…,I₂; j=1,…,J} are not independent, because both regions share the set of r.v.s $\{D_{1ji}^{(2)}\}_{i=\bar{I}+1,\dots,I_1;\, j=1,\dots,J}$, with $D_{1ji}^{(2)} = D_{2j,i-\bar{I}}^{(2)}$.

Assumption 1 $D_{kji}^{(b)} \overset{ind}{\sim} \mathcal{P}(m_{kji}^{(b)})$, b ∈ {1, 2}, where for $n_{kji}^{(b)} > 0$ the following holds:

m_{kji}^{(b)} = \frac{n_{kji}^{(b)}}{n_{kji}} m_{kji}, \quad b \in \{1, 2\}. \qquad (10)

We allow the case $n_{kji}^{(b)} = 0$ for some b ∈ {1, 2}, in which $m_{kji}^{(b)} = 0$ and hence $D_{kji}^{(b)} = 0$ a.s. (a degenerate r.v.).
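To see how Assumption 1 induces dependence, the sketch below simulates one age group at one shared time point. The overlap count D^(2) is drawn once and added to both regional totals, following (9) and (10). Python's standard `random` module has no Poisson sampler, so this illustration uses a Knuth-style multiplication sampler as a stand-in. All names and numbers are illustrative assumptions. Under this setup, the covariance between the two regional counts should be close to the overlap mean m^(2).

```python
import math
import random


def split_mean(m, n_parts):
    """Eq. (10): m^(b) = (n^(b) / n) * m, with n = sum_b n^(b)."""
    n = sum(n_parts)
    return [nb / n * m for nb in n_parts]


def poisson_draw(mean, rng):
    """Poisson variate via Knuth's multiplication method, applied to chunks of
    mean <= 30 and summed (a sum of independent Poissons is Poisson)."""
    total, remaining = 0, float(mean)
    while remaining > 0:
        lam = min(remaining, 30.0)
        remaining -= lam
        threshold, k, p = math.exp(-lam), 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total  # mean 0 gives the degenerate value 0


def simulate_shared_cell(m1, n1, m2, n2, n_shared, rng):
    """Counts (D_1, D_2) for one age group at one shared time point.

    The overlapping subregion (population n_shared) lies in both regions, so its
    count D^(2) enters both totals, as in Eq. (9).
    """
    _, m1_shared = split_mean(m1, [n1 - n_shared, n_shared])
    _, m2_shared = split_mean(m2, [n2 - n_shared, n_shared])
    if not math.isclose(m1_shared, m2_shared):
        raise ValueError("overlap means differ: the regions do not share a common rate")
    d_shared = poisson_draw(m1_shared, rng)
    return (poisson_draw(m1 - m1_shared, rng) + d_shared,
            poisson_draw(m2 - m2_shared, rng) + d_shared)


# Hypothetical cell: common rate 0.01, overlap population 1000 (overlap mean 10).
rng = random.Random(2011)
draws = [simulate_shared_cell(40.0, 4000, 20.0, 2000, 1000, rng) for _ in range(10000)]
mean1 = sum(d1 for d1, _ in draws) / len(draws)
mean2 = sum(d2 for _, d2 in draws) / len(draws)
cov = sum((d1 - mean1) * (d2 - mean2) for d1, d2 in draws) / (len(draws) - 1)
print(mean1, mean2, cov)  # roughly 40, 20 and Var(D^(2)) = 10
```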

Among the basic models considered in the papers dealing with overlapping regions, the Age-stratified Poisson regression model can be regarded as the most realistic. These models were built through successive improvements, with the earlier ones using normality assumptions as approximations of the underlying Poisson r.v.s. In the first paper concerned with trend comparisons across overlapping regions (Li and Tiwari (2008)), it is remarked that “… the derivation of Cov(β̂₁₁, β̂₁₂), …., is nontrivial as it requires a careful consideration of the overlapping of two regions”. Their assumption for the overlapping subregion (based on Pickle and White (1995)) is similar to the one adopted here, in the sense that the overlapping subregion follows the same distribution as the whole region. A similar criterion was followed in Li et al. (2007) and Li et al. (2008).

3 Minimum Power Divergence Estimators for an Age-stratified Poisson Regression Model with Overlapping

Let m_s be the expected value of the r.v. D_s of deaths (or incidences) associated with the s-th cell of a contingency table with M_k ≡ JI_k cells (s = 1, …, M_k). In this section we write model (5) in matrix notation, so that the triple indices are unified into a single one following lexicographic order. Hence, the vector of cell means $m_k(β_k) = (m_1(β_k), \dots, m_{M_k}(β_k))^T = (m_{k11}(β_k), \dots, m_{kJI_k}(β_k))^T$ of the multidimensional r.v. of deaths (or incidences) $D_k = (D_1, \dots, D_{M_k})^T = (D_{k11}, \dots, D_{kJI_k})^T$ is related to the parameter vector $β_k = (β_{0k1}, \dots, β_{0kJ}, β_{1k})^T \in Θ_k = \mathbb{R}^{J+1}$ through

\log(\mathrm{Diag}^{-1}(n_{k}) m_{k}(β_{k})) = X_{k} β_{k} \quad \text{or} \quad m_{k}(β_{k}) = \mathrm{Diag}(n_{k}) \exp(X_{k} β_{k}), \qquad (11)

where Diag(n_k) is the diagonal matrix of the populations at risk $n_k = (n_1, \dots, n_{M_k})^T = (n_{k11}, \dots, n_{kJI_k})^T$ ($n_s > 0$, $s = 1, \dots, M_k$), and

X_{k} = \begin{pmatrix} 1_{I_{k}} & & & t_{k} \\ & \ddots & & \vdots \\ & & 1_{I_{k}} & t_{k} \end{pmatrix}_{JI_{k} \times (J + 1)} = (I_{J} \otimes 1_{I_{k}}, 1_{J} \otimes t_{k}),

with $t_k \equiv (t_{k1}, \dots, t_{kI_k})^T$, a full-rank $M_k \times (J + 1)$ design matrix. From the likelihood function of a Poisson sample D_k, the kernel of the log-likelihood function is

ℓ_{β_{k}} (D_{k}) = \sum_{s = 1}^{M_{k}} D_{s} log m_{s} (β_{k}) - \sum_{s = 1}^{M_{k}} m_{s} (β_{k}),

and thus the MLE of β_k is

\hat{β}_{k} = \arg\max_{β_{k} \in Θ_{k}} ℓ_{β_{k}}(D_{k}).
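Below is a minimal sketch of the matrix form (11) and of the log-likelihood kernel, using plain Python lists (no linear-algebra library is assumed). Rows of X_k follow the lexicographic (j, i) order. The names and toy parameter values are hypothetical.

```python
import math


def design_matrix(J, t):
    """X_k = (I_J kron 1_{I_k}, 1_J kron t_k), rows in lexicographic (j, i) order."""
    rows = []
    for j in range(J):
        for ti in t:
            rows.append([1.0 if h == j else 0.0 for h in range(J)] + [float(ti)])
    return rows


def cell_means(X, beta, n):
    """Eq. (11): m_k(beta_k) = Diag(n_k) exp(X_k beta_k)."""
    return [ns * math.exp(sum(x * b for x, b in zip(row, beta))) for row, ns in zip(X, n)]


def loglik_kernel(D, m):
    """ell(D_k) = sum_s D_s log m_s - sum_s m_s (Poisson kernel, constants dropped)."""
    return sum(d * math.log(ms) - ms for d, ms in zip(D, m))


t = [1, 2, 3]
X = design_matrix(2, t)        # J = 2 age groups, I_k = 3 years -> 6 x 3 matrix
beta = [-7.0, -6.0, 0.02]      # (beta_0k1, beta_0k2, beta_1k), hypothetical
n = [1e5] * 6
m = cell_means(X, beta, n)
print(X[4])                    # row for j = 2, i = 2: [0.0, 1.0, 2.0]
```

Per cell, D_s log m_s − m_s is maximized at m_s = D_s, so the kernel is largest when the fitted means match the data.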

It is well known that there is a very close relationship between likelihood theory and the Kullback-Leibler divergence measure (Kullback and Leibler (1951)). For a multinomial contingency table, it is intuitively clear that a good estimator of the cell probabilities should have a small discrepancy with respect to the empirical distribution (the relative frequencies). The oldest discrepancy or distance measure we know of is the Kullback divergence measure, and the estimator built from it is precisely the MLE. For a Poisson contingency table we work with the expected values instead of cell probabilities, and with the observed frequencies instead of relative frequencies. On this basis, we show how statistical inference for Poisson models can be carried out through power divergence measures. According to the Kullback divergence measure, the discrepancy between the Poisson sample D_k and its vector of means m_k(β_k) is

d_{Kull}(D_{k}, m_{k}(β_{k})) = \sum_{s=1}^{M_{k}} \left(D_{s} \log \frac{D_{s}}{m_{s}(β_{k})} - D_{s} + m_{s}(β_{k})\right). \qquad (12)

Observe that d_Kull(D_k, m_k(β_k)) = −ℓ_{β_k}(D_k) + C_k, where C_k does not depend on β_k. This relationship allows us to define the MLE of β_k as the minimum Kullback divergence estimator

\hat{β}_{k} = \arg\min_{β_{k} \in Θ_{k}} d_{Kull}(D_{k}, m_{k}(β_{k})),

and the MLE of m_k(β_k) functionally as m_k(β̂_k) due to the invariance property of the MLEs. The power divergence measures are a family of measures defined as

d_{λ}(D_{k}, m_{k}(β_{k})) = \frac{1}{λ(1 + λ)} \sum_{s=1}^{M_{k}} \left(\frac{D_{s}^{λ+1}}{m_{s}^{λ}(β_{k})} - D_{s}(1 + λ) + λ m_{s}(β_{k})\right), \quad λ \notin \{0, -1\}, \qquad (13)

so that each value of λ ∈ ℝ − {0, −1} gives a different way to quantify the discrepancy between D_k and m_k(β_k). For λ ∈ {0, −1}, we define $d_λ(D_k, m_k(β_k)) = \lim_{ℓ \to λ} d_ℓ(D_k, m_k(β_k))$. In this way, the Kullback divergence appears as the special case λ = 0, d₀(D_k, m_k(β_k)) = d_Kull(D_k, m_k(β_k)), whereas λ = −1 is obtained by reversing the order of the arguments of the Kullback divergence, d₋₁(D_k, m_k(β_k)) = d_Kull(m_k(β_k), D_k).
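The divergences (12) and (13) are straightforward to evaluate. The sketch below (hypothetical names, toy data) checks three properties numerically. First, d_Kull + ℓ does not depend on the means. Second, λ = 1 gives half of Pearson's chi-square statistic. Third, values of λ close to 0 or −1 approach the two Kullback limits.

```python
import math


def kullback(D, m):
    """Eq. (12): sum_s [D_s log(D_s / m_s) - D_s + m_s], with 0 log 0 = 0."""
    return sum((d * math.log(d / ms) if d > 0 else 0.0) - d + ms for d, ms in zip(D, m))


def power_divergence(D, m, lam):
    """Eq. (13), with the limiting cases lam = 0 and lam = -1.

    For lam <= -1 every D_s must be positive (negative powers or log of D_s appear).
    """
    if lam == 0:
        return kullback(D, m)
    if lam == -1:
        return kullback(m, D)  # arguments reversed
    total = sum(d ** (lam + 1) / ms ** lam - d * (1 + lam) + lam * ms
                for d, ms in zip(D, m))
    return total / (lam * (1 + lam))


def loglik_kernel(D, m):
    """Poisson log-likelihood kernel sum_s D_s log m_s - m_s."""
    return sum(d * math.log(ms) - ms for d, ms in zip(D, m))


D = [12, 0, 7, 30]
m1 = [10.0, 1.5, 8.0, 28.0]
m2 = [11.0, 0.5, 6.0, 33.0]
# d_Kull = -ell + C_k, where C_k depends on D only, so these two numbers agree.
print(kullback(D, m1) + loglik_kernel(D, m1), kullback(D, m2) + loglik_kernel(D, m2))
# lam = 1 gives half of Pearson's chi-square statistic.
print(power_divergence(D, m1, 1.0), 0.5 * sum((d - ms) ** 2 / ms for d, ms in zip(D, m1)))
```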

The estimator of β_k obtained on the basis of (13) is the so-called minimum power divergence estimator (MPDE) and it is defined for each value of λ ∈ ℝ as

\hat{β}_{k,λ} = \arg\min_{β_{k} \in Θ_{k}} d_{λ}(D_{k}, m_{k}(β_{k})), \qquad (14)

and the MPDE of m_k(β_k) is m_k(β̂_k,λ), by the invariance property of the MPDEs. Apart from the MLE (β̂_k or β̂_k,0), this family contains other well-known estimators: the minimum modified chi-square estimator, β̂_k,−2; the minimum modified likelihood estimator, β̂_k,−1; the Cressie-Read estimator, β̂_k,2/3; and the minimum chi-square estimator, β̂_k,1. These estimators were introduced and analyzed for multinomial sampling by Cressie and Read (1988), and were first applied to Poisson sampling in Pardo and Martín (2009). The so-called minimum ϕ-divergence estimators are a wider class of estimators that contains the MPDEs as a special case (see Pardo (2005) for more details), and the present methodology could easily be extended to them.

Since all MPDEs, including the MLE, theoretically share the same asymptotic distribution, we propose an alternative method for estimating Var(β̂₁₁ − β̂₁₂) = Var(β̂_11,0 − β̂_12,0). It accounts for the term that is new for overlapping regions, Cov(β̂₁₁, β̂₁₂) = Cov(β̂_11,0, β̂_12,0). We postulate that, for data sets that are not very large, the MLE-based difference β̂_11,0 − β̂_12,0 may be improved upon by its λ = 1 counterpart, β̂_11,1 − β̂_12,1, when overlapping regions are considered.

To obtain the MPDE of (2), $\widehat{\mathrm{APC}}_{k,λ} = 100(\exp(\hat{β}_{1k,λ}) - 1)$, we need to compute the estimator of the parameter of interest using the following result.

Proposition 2 The MPDE of β_1k, β̂_1k,λ, is the solution of the nonlinear equation

f ({\hat{β}}_{1 k, λ}) = \sum_{i = 1}^{I_{k}} t_{ki} ϒ_{ki} = 0,

with

\begin{aligned} ϒ_{ki} &= \sum_{j=1}^{J} m_{kji}(\hat{β}_{k,λ})(φ_{kji} - 1), \\ m_{kji}(\hat{β}_{k,λ}) &= n_{kji} \exp(\hat{β}_{0kj,λ}) \exp(\hat{β}_{1k,λ} t_{ki}), \quad φ_{kji} = \left(\frac{D_{kji}}{m_{kji}(\hat{β}_{k,λ})}\right)^{λ+1}, \\ \exp(\hat{β}_{0kj,λ}) &= \left(\sum_{s=1}^{I_{k}} p_{kjs} ψ_{kjs}^{λ+1}\right)^{\frac{1}{λ+1}}, \quad j = 1, \dots, J, \\ p_{kjs} &= \frac{n_{kjs} \exp(\hat{β}_{1k,λ} t_{ks})}{\sum_{h=1}^{I_{k}} n_{kjh} \exp(\hat{β}_{1k,λ} t_{kh})}, \quad ψ_{kjs} = \frac{D_{kjs}}{n_{kjs} \exp(\hat{β}_{1k,λ} t_{ks})}. \end{aligned}
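Proposition 2 reduces the MPDE to a one-dimensional root-finding problem once the intercepts are profiled out. The sketch below solves f(β̂_1k,λ) = 0 by bracketing and bisection. That solver is an assumption of this illustration; the text does not prescribe one. The sketch requires λ ≠ −1, since that case needs the limiting form of the equations. For λ < −1 it also requires strictly positive counts. Time points should be small integers such as t_ki = i to avoid overflow in exp. Names and test data are hypothetical.

```python
import math


def profile_intercepts(D, n, t, beta1, lam):
    """exp(beta_0kj,lam), j = 1..J, for a given beta_1k (Proposition 2).

    D, n: J lists (one per age group) of length I_k; t: the I_k time points.
    """
    out = []
    for Dj, nj in zip(D, n):
        a = [ns * math.exp(beta1 * ts) for ns, ts in zip(nj, t)]  # n_kjs exp(beta1 t_ks)
        total = sum(a)
        # sum_s p_kjs psi_kjs^(lam+1), with p_kjs = a_s / total and psi_kjs = D_kjs / a_s
        s = sum((a_s / total) * (d_s / a_s) ** (lam + 1) for a_s, d_s in zip(a, Dj))
        out.append(s ** (1.0 / (lam + 1)))
    return out


def estimating_function(D, n, t, beta1, lam):
    """f(beta_1k) = sum_i t_ki Upsilon_ki with the intercepts profiled out."""
    f = 0.0
    for Dj, nj, e0 in zip(D, n, profile_intercepts(D, n, t, beta1, lam)):
        for d_s, ns, ts in zip(Dj, nj, t):
            m = ns * e0 * math.exp(beta1 * ts)            # m_kji(beta_hat_k,lam)
            f += ts * m * ((d_s / m) ** (lam + 1) - 1.0)  # t_ki m_kji (phi_kji - 1)
    return f


def mpde_slope(D, n, t, lam, lo=-0.5, hi=0.5, tol=1e-12, max_doublings=5):
    """Solve f(beta_1k) = 0 by bracketing plus bisection (lam != -1)."""
    if lam == -1:
        raise ValueError("lam = -1 requires the limiting form of the equations")
    f_lo = estimating_function(D, n, t, lo, lam)
    f_hi = estimating_function(D, n, t, hi, lam)
    for _ in range(max_doublings):
        if f_lo * f_hi <= 0:
            break
        half = 0.5 * (hi - lo)  # double the bracket width
        lo, hi = lo - half, hi + half
        f_lo = estimating_function(D, n, t, lo, lam)
        f_hi = estimating_function(D, n, t, hi, lam)
    if f_lo * f_hi > 0:
        raise RuntimeError("no sign change found; try another bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = estimating_function(D, n, t, mid, lam)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Noise-free hypothetical data D = m(beta) with beta_1k = 0.02 and J = 2 age groups.
t = list(range(1, 11))
n = [[1e5] * 10, [5e4] * 10]
beta0 = [-7.0, -6.0]
D = [[n[j][i] * math.exp(beta0[j] + 0.02 * t[i]) for i in range(10)] for j in range(2)]
for lam in (-0.5, 0.0, 2 / 3, 1.0, 1.5):
    b1 = mpde_slope(D, n, t, lam)
    print(lam, b1, 100 * (math.exp(b1) - 1))  # slope and APC_k,lam
```

With noise-free data every λ should return β_1k ≈ 0.02, because all estimating equations vanish at the true parameter. With real counts the estimates differ across λ.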

Our aim is to show that β̂_11,λ − β̂_12,λ is asymptotically normal and to obtain an explicit expression for the denominator of the Z-test statistic (8) based on MPDEs,

Z_{λ} = \frac{\hat{β}_{11,λ} - \hat{β}_{12,λ}}{\sqrt{\widehat{\mathrm{Var}}(\hat{β}_{11,λ} - \hat{β}_{12,λ})}}, \qquad (15)

when the random vectors of observed frequencies of the two regions, D₁ and D₂, share some components (those belonging to the overlapping subregion). Since (15) is approximately standard normal when min{N₁, N₂} is large enough, we can test ℋ₀ : APC₁ = APC₂ (β₁₁ = β₁₂) against ℋ₁ : APC₁ ≠ APC₂ (β₁₁ ≠ β₁₂). If |Z_λ| exceeds the quantile $z_{1-\frac{α}{2}}$ (i.e., $\Pr(Z_λ < z_{1-\frac{α}{2}}) = 1 - \frac{α}{2}$), ℋ₀ is rejected at significance level α.
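The decision rule can be sketched with the standard library's `statistics.NormalDist`. The helper name and inputs are hypothetical. `var_diff_hat` stands for whatever estimate of Var(β̂_11,λ − β̂_12,λ) is plugged in. The two-sided p-value is included only for convenience.

```python
import math
from statistics import NormalDist


def z_test_equal_apc(beta11_hat, beta12_hat, var_diff_hat, alpha=0.05):
    """Z_lambda of Eq. (15) and the two-sided decision for H0: APC_1 = APC_2."""
    z = (beta11_hat - beta12_hat) / math.sqrt(var_diff_hat)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1 - alpha/2}
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value, abs(z) > z_crit


print(z_test_equal_apc(0.03, 0.0, 1e-4))  # z is about 3, so H0 is rejected at alpha = 0.05
```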

The following theorem is the key result for estimating the variances and the covariance of the estimators of interest, β̂_1k,λ, k = 1, 2. It establishes a linear relationship between the parameter of interest and the observed frequencies under Poisson sampling. This holds when the expected total N_k of each region (k = 1, 2) is large enough, with N_k increasing as specified in Assumption 3.

Assumption 3 $m_{kji}^{*} (β_{k}^{0}) = m_{kji} (β_{k}^{0}) / N_{k}$ remains constant as N_k increases, that is, $m_{kji} (β_{k}^{0})$ increases at the same rate as N_k.

Theorem 4 The MPDE of β_1k, β̂_1k,λ, k = 1, 2, can be expressed as

{\hat{β}}_{1 k, λ} - β_{1 k}^{0} = σ_{1 k}^{2} {\tilde{t}}_{k}^{T} (β_{k}^{0}) X_{k}^{T} (D_{k} - m_{k} (β_{k}^{0})) + o (‖ \frac{D_{k} - m_{k} (β_{k}^{0})}{N_{k}} ‖),

where the superscript 0 denotes the true, unknown value of a parameter, o(·) denotes the little-o of a stochastic sequence (see the Appendix in Bishop et al. (1975)), and

σ_{1k}^{2} = \left(\tilde{t}_{k}^{T}(β_{k}^{0}) X_{k}^{T} \mathrm{Diag}(m_{k}(β_{k}^{0})) X_{k} \tilde{t}_{k}(β_{k}^{0})\right)^{-1} = \left(\sum_{j=1}^{J} \sum_{i=1}^{I_{k}} m_{kji}(β_{k}^{0}) (t_{ki} - \tilde{t}_{kj}(β_{k}^{0}))^{2}\right)^{-1}, \qquad (16)

\tilde{t}_{k}^{T}(β_{k}^{0}) = (-\tilde{t}_{k1}(β_{k}^{0}), \dots, -\tilde{t}_{kJ}(β_{k}^{0}), 1), \qquad \tilde{t}_{kj}(β_{k}^{0}) = \frac{\sum_{i=1}^{I_{k}} m_{kji}(β_{k}^{0}) t_{ki}}{\sum_{i=1}^{I_{k}} m_{kji}(β_{k}^{0})}. \qquad (17)
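Below is a sketch of (16) and (17) in plain Python. It takes the expected counts m_kji(β_k⁰) as input; in practice they would be evaluated at an estimate. Names and toy values are hypothetical. The last line illustrates that doubling all expected counts halves σ²_1k, in line with Assumption 3.

```python
def weighted_time(m_j, t):
    """Eq. (17): t~_kj = sum_i m_kji t_ki / sum_i m_kji."""
    return sum(mi * ti for mi, ti in zip(m_j, t)) / sum(m_j)


def slope_variance(m, t):
    """Eq. (16): sigma^2_1k = [sum_j sum_i m_kji (t_ki - t~_kj)^2]^(-1).

    m: J lists of expected counts m_kji(beta_k^0), one per age group; t: time points.
    """
    info = 0.0
    for m_j in m:
        tbar = weighted_time(m_j, t)
        info += sum(mi * (ti - tbar) ** 2 for mi, ti in zip(m_j, t))
    return 1.0 / info


# One age group with constant m = 5 over t = 1..10: sum_i 5 (t_i - 5.5)^2 = 412.5.
print(slope_variance([[5.0] * 10], list(range(1, 11))))
m = [[90.0, 95.0, 101.0, 110.0], [40.0, 41.0, 43.0, 44.0]]
t = [1, 2, 3, 4]
print(slope_variance(m, t), slope_variance([[2 * x for x in row] for row in m], t))
```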

Theorem 5 The MPDE of β_1k, β̂_1k,λ, k = 1, 2, is asymptotically Normal, unbiased and with variance equal to (16).

Theorem 5 could be stated more formally in terms of $\sqrt{N_k}(\hat{β}_{1k,λ} - β_{1k}^{0})$, because $σ_{1k}^{2}$ is not constant as N_k increases; we avoid this in order to focus directly on the estimator of interest. By Assumption 3, and since $\tilde{t}_{kj}(β_k^0) = \sum_{i=1}^{I_k} m_{kji}^{*}(β_k^0) t_{ki} / \sum_{i=1}^{I_k} m_{kji}^{*}(β_k^0)$, the quantity that remains constant is

Var (\sqrt{N_{k}} {\hat{β}}_{1 k, λ}) = N_{k} σ_{1 k}^{2} = {(\sum_{j = 1}^{J} \sum_{i = 1}^{I_{k}} m_{kji}^{*} (β_{k}^{0}) {(t_{ki} - {\tilde{t}}_{kj} (β_{k}^{0}))}^{2})}^{- 1} .

Let N be the total expected value of the region obtained by joining regions 1 and 2. Note that N ≤ N₁ + N₂, with equality only for non-overlapping regions. To specify how N increases with respect to N_k, we adopt the following assumption throughout.

Assumption 6 $N_{k}^{*} = \frac{N_{k}}{N} (k = 1, 2)$ is constant as N increases, that is N increases at the same rate as N_k.

Note that for overlapping regions $N_1^{*} + N_2^{*} > 1$. Under the hypothesis $β_{11}^{0} = β_{12}^{0}$ we have a common true parameter vector $β^{0} \equiv β_{k}^{0}$ (k = 1, 2). Hence, under this hypothesis, $N_1^{*} + N_2^{*} = 1 + \sum_{j=1}^{J} \sum_{i=1}^{I_1-\bar{I}} m_{2ji}^{(2)}(β^{0})/N$ is constant, and therefore the overlapping death fraction $\sum_{j=1}^{J} \sum_{i=1}^{I_1-\bar{I}} m_{2ji}^{(2)}(β^{0})/N$ is also constant as N increases.

Theorem 7 Under the hypothesis that $β_{11}^{0} = β_{12}^{0}$ , the MPDE of β₁₁ − β₁₂, β̂_11,λ − β̂_12,λ, is decomposed as

\hat{β}_{11,λ} - \hat{β}_{12,λ} = X_{1} + X_{2} + X_{3} + Y, \qquad (18)

\begin{aligned} X_{1} &= σ_{11}^{2} \tilde{t}_{1}^{T}(β^{0}) X_{1}^{T} (D_{1}^{(1)} - m_{1}^{(1)}(β^{0})), \\ X_{2} &= -σ_{12}^{2} \tilde{t}_{2}^{T}(β^{0}) X_{2}^{T} (D_{2}^{(1)} - m_{2}^{(1)}(β^{0})), \\ X_{3} &= \left(σ_{11}^{2} \tilde{t}_{1}^{T}(β^{0}) \bar{X}_{1}^{T} - σ_{12}^{2} \tilde{t}_{2}^{T}(β^{0}) \bar{X}_{2}^{T}\right)(\bar{D}^{(2)} - \bar{m}^{(2)}(β^{0})), \\ Y &= o\left(\left\| \frac{D_{1} - m_{1}(β^{0})}{N_{1}} \right\|\right) + o\left(\left\| \frac{D_{2} - m_{2}(β^{0})}{N_{2}} \right\|\right), \end{aligned}

where X̄_k is the J(Ī + I₂) × (J + 1) augmented version of X_k,

\begin{aligned} \bar{X}_{k} &= \begin{pmatrix} \bar{1}_{k} & & & \bar{t}_{k} \\ & \ddots & & \vdots \\ & & \bar{1}_{k} & \bar{t}_{k} \end{pmatrix}_{J(\bar{I} + I_{2}) \times (J + 1)} = (I_{J} \otimes \bar{1}_{k}, 1_{J} \otimes \bar{t}_{k}), \\ \bar{1}_{1}^{T} &= (1_{I_{1}}^{T}, 0_{\bar{I} + I_{2} - I_{1}}^{T}), \quad \bar{1}_{2}^{T} = (0_{\bar{I}}^{T}, 1_{I_{2}}^{T}), \\ \bar{t}_{1}^{T} &= (t_{1}^{T}, 0_{\bar{I} + I_{2} - I_{1}}^{T}), \quad \bar{t}_{2}^{T} = (0_{\bar{I}}^{T}, t_{2}^{T}), \end{aligned}

and $\bar{D}^{(2)} = (\bar{D}_{111}, \dots, \bar{D}_{1J,\bar{I}+I_2})^T$ and $\bar{m}^{(2)}(β^{0}) = (\bar{m}_{111}^{(2)}(β^{0}), \dots, \bar{m}_{1J,\bar{I}+I_2}^{(2)}(β^{0}))^T$ are the vectors obtained by joining $D_{k}^{(2)}$, k = 1, 2, and $m_{k}^{(2)}(β^{0})$, k = 1, 2, respectively, i.e.,

\begin{aligned} \bar{D}^{(2)} &= \left((D_{111}^{(2)}, \dots, D_{1J\bar{I}}^{(2)}), (D_{2}^{(2)})^{T}\right)^{T}, \quad D_{2}^{(2)} = (D_{211}^{(2)}, \dots, D_{2JI_{2}}^{(2)})^{T}, \\ \bar{m}^{(2)}(β^{0}) &= \left((m_{111}^{(2)}(β^{0}), \dots, m_{1J\bar{I}}^{(2)}(β^{0})), (m_{2}^{(2)}(β^{0}))^{T}\right)^{T}, \quad m_{2}^{(2)}(β^{0}) = (m_{211}^{(2)}(β^{0}), \dots, m_{2JI_{2}}^{(2)}(β^{0}))^{T}. \end{aligned}

Theorem 8 Under the hypothesis that $β_{11}^{0} = β_{12}^{0}$ , the asymptotic distribution of β̂_11,λ − β̂_12,λ is central Normal with

Var ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ}) = σ_{11}^{2} + σ_{12}^{2} - 2 σ_{11}^{2} σ_{12}^{2} ξ_{12}

where $σ_{1 k}^{2}$ is equal to

σ_{1k}^{2} = \left(\sum_{j=1}^{J} \sum_{i=1}^{I_{k}} m_{kji}(β^{0}) (t_{ki} - \tilde{t}_{kj}(β^{0}))^{2}\right)^{-1} = \left(\sum_{j=1}^{J} \sum_{i=1}^{I_{k}} m_{kji}(β^{0}) t_{ki}^{2} - \sum_{j=1}^{J} m_{kj\bullet} \tilde{t}_{kj}^{2}(β^{0})\right)^{-1}, \qquad (19)

with $m_{kj\bullet} = \sum_{i=1}^{I_k} m_{kji}(β^{0})$, $\tilde{t}_{kj}(β^{0})$ given by (17), and

ξ_{12} = \sum_{j=1}^{J} \sum_{i=1}^{I_{1}-\bar{I}} \frac{n_{2ji}^{(2)}}{n_{2ji}} m_{2ji}(β^{0}) (t_{2i} - \tilde{t}_{1j}(β^{0}))(t_{2i} - \tilde{t}_{2j}(β^{0})) = \sum_{j=1}^{J} \sum_{i=1}^{I_{1}-\bar{I}} \frac{n_{2ji}^{(2)}}{n_{2ji}} m_{2ji}(β^{0}) \left(t_{2i}^{2} + \tilde{t}_{1j}(β^{0}) \tilde{t}_{2j}(β^{0})\right) - \sum_{j=1}^{J} \sum_{i=1}^{I_{1}-\bar{I}} \frac{n_{2ji}^{(2)}}{n_{2ji}} m_{2ji}(β^{0}) t_{2i} \left(\tilde{t}_{1j}(β^{0}) + \tilde{t}_{2j}(β^{0})\right). \qquad (20)

That is, the covariance between β̂_11,λ and β̂_12,λ is given by

σ_{1,12} = \mathrm{Cov}(\hat{β}_{11,λ}, \hat{β}_{12,λ}) = σ_{11}^{2} σ_{12}^{2} ξ_{12}, \qquad (21)

and the correlation by ρ_1,12 = Cor(β̂_11,λ, β̂_12,λ) = σ_1,12/(σ₁₁σ₁₂) = σ₁₁σ₁₂ξ₁₂.
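The variance, covariance and correlation of Theorem 8 and (21) can be assembled as below. This is a sketch under stated assumptions. Expected counts are passed as J lists per region, and `share2[j][i]` holds n^(2)_2ji / n_2ji. Python indices are 0-based, so region-2 indices i < I₁ − Ī mark the shared time points. All names are hypothetical. The example takes region 2 inside region 1 over the same years, with region 1's counts proportional to region 2's. As in the nested case discussed below, the variance should then reduce to σ²₁₂ − σ²₁₁.

```python
import math


def weighted_time(m_j, t):
    """Eq. (17): m-weighted mean of the time points for one age group."""
    return sum(mi * ti for mi, ti in zip(m_j, t)) / sum(m_j)


def slope_variance(m, t):
    """Eq. (19): sigma^2_1k from expected counts m (J lists) and time points t."""
    info = 0.0
    for m_j in m:
        tbar = weighted_time(m_j, t)
        info += sum(mi * (ti - tbar) ** 2 for mi, ti in zip(m_j, t))
    return 1.0 / info


def xi12(m1, t1, m2, t2, share2, I_bar):
    """Eq. (20); region-2 indices i < I1 - I_bar (0-based) are the shared times."""
    n_shared = max(0, min(len(t1) - I_bar, len(t2)))
    total = 0.0
    for m1j, m2j, sj in zip(m1, m2, share2):
        a, b = weighted_time(m1j, t1), weighted_time(m2j, t2)
        total += sum(sj[i] * m2j[i] * (t2[i] - a) * (t2[i] - b) for i in range(n_shared))
    return total


def var_of_difference(m1, t1, m2, t2, share2, I_bar):
    """Theorem 8 and Eq. (21): (variance, covariance, correlation)."""
    s11, s12 = slope_variance(m1, t1), slope_variance(m2, t2)
    cov = s11 * s12 * xi12(m1, t1, m2, t2, share2, I_bar)
    return s11 + s12 - 2.0 * cov, cov, cov / math.sqrt(s11 * s12)


# Hypothetical nested example: same years (I_bar = 0), region 2 inside region 1.
t = [1, 2, 3, 4, 5]
m2 = [[30.0, 31.0, 33.0, 34.0, 36.0], [12.0, 12.0, 13.0, 13.0, 14.0]]
m1 = [[2.5 * x for x in row] for row in m2]  # region 1 = overlap + 1.5 x overlap
ones = [[1.0] * 5 for _ in m2]               # n^(2)_2ji / n_2ji = 1
var, cov, cor = var_of_difference(m1, t, m2, t, ones, I_bar=0)
print(var, slope_variance(m2, t) - slope_variance(m1, t))  # should agree
```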

For the denominator of (15), we need the MPDEs of $σ_{1k}^{2}$, k = 1, 2, and of ξ₁₂, denoted $\hat{σ}_{1k,λ}^{2}$, k = 1, 2, and ξ̂_12,λ, respectively. One way to proceed is to replace β⁰ by the more efficient of the two MPDEs,

\hat{β}_{λ}^{0} \equiv \begin{cases} \hat{β}_{1,λ}^{0}, & \text{if } N_{1} \geq N_{2}, \\ \hat{β}_{2,λ}^{0}, & \text{if } N_{1} < N_{2}. \end{cases}

An important advantage of this new methodology is that the denominator of (15) is explicit, easy to compute and easy to interpret. The term (20) determines the sign of (21). The structure of (20) is similar to that of the covariance proposed by Li et al. (2007) for the WLSEs, and to that of the estimators in the model of Li and Tiwari (2008). If the two regions share no time point, i.e., Ī ≥ I₁, then σ̂_1,12,λ = 0 and $\widehat{\mathrm{Var}}(\hat{β}_{11,λ} - \hat{β}_{12,λ}) = \hat{σ}_{11,λ}^{2} + \hat{σ}_{12,λ}^{2}$. If there is no spatial overlap, then $m_{2ji}^{(2)}(\hat{β}_{λ}^{0}) = 0$ for all i and j in the overlapping subregion, hence σ̂_1,12,λ = 0 and again $\widehat{\mathrm{Var}}(\hat{β}_{11,λ} - \hat{β}_{12,λ}) = \hat{σ}_{11,λ}^{2} + \hat{σ}_{12,λ}^{2}$. On the other hand, when the two regions share at least one time point and overlap in space, $\widehat{\mathrm{Var}}(\hat{β}_{11,λ} - \hat{β}_{12,λ}) = \hat{σ}_{11,λ}^{2} + \hat{σ}_{12,λ}^{2} - 2\hat{σ}_{1,12,λ}$, where in general σ̂_1,12,λ ≠ 0. Moreover, when the period of time not shared by the two regions is large (small), the covariance tends to be negative (positive). This is because the averages $\tilde{t}_{1j}(\hat{β}_{1,λ}^{0})$ and $\tilde{t}_{2j}(\hat{β}_{2,λ}^{0})$ are then farther from (closer to) the time points of the overlapping subregion. We analyze this behaviour later through a simulation study; we now investigate the structure of ξ₁₂ when the two regions share the whole period of time.

Corollary 9 When Ī = 0 and I₁ = I₂, under the hypothesis that $β_{11}^{0} = β_{12}^{0}$

ξ_{12} = \frac{1}{σ_{1(2)}^{2}} + \sum_{j=1}^{J} \frac{m_{1j\bullet}^{(1)}}{m_{1j\bullet}} \frac{m_{2j\bullet}^{(1)}}{m_{2j\bullet}} m_{j\bullet}^{(2)} \left(\tilde{t}_{1j}^{(1)}(β^{0}) - \tilde{t}_{1j}^{(2)}(β^{0})\right)\left(\tilde{t}_{2j}^{(1)}(β^{0}) - \tilde{t}_{2j}^{(2)}(β^{0})\right), \qquad (22)

with

\begin{aligned} \frac{1}{σ_{1(2)}^{2}} &= \sum_{j=1}^{J} \sum_{i=1}^{I_{2}} m_{2ji}^{(2)}(β^{0}) (t_{2i} - \tilde{t}_{2j}^{(2)}(β^{0}))^{2}, \\ \tilde{t}_{kj}^{(b)}(β^{0}) &= \frac{\sum_{i=1}^{I_{k}} m_{kji}^{(b)}(β^{0}) t_{ki}}{\sum_{i=1}^{I_{k}} m_{kji}^{(b)}(β^{0})}, \\ m_{kj\bullet}^{(b)} &= \sum_{i=1}^{I_{k}} m_{kji}^{(b)}(β^{0}), \quad m_{kj\bullet} = m_{kj\bullet}^{(1)} + m_{kj\bullet}^{(2)}, \\ m_{j\bullet}^{(2)} &= \sum_{i=1}^{I_{2}} m_{2ji}^{(2)}(β^{0}) = \sum_{i=1}^{I_{1}} m_{1ji}^{(2)}(β^{0}). \end{aligned}

Here $σ_{1 (2)}^{2}$ represents the variance of β̂_12,λ when only the overlapping subregion is considered. In particular, if region 2 is completely contained in region 1, then $m_{2 j •}^{(1)} = 0$ for all j = 1, …, J. It follows that $ξ_{12} = 1 / σ_{1 (2)}^{2} = 1 / σ_{12}^{2}$, and hence

Var ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ}) = σ_{12}^{2} - σ_{11}^{2} .

(23)
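The Python sketch below evaluates ξ₁₂ in (22) from arrays of expected deaths, to make its structure concrete. It is a minimal illustration under stated assumptions, not the authors' code:

- Both regions use the same I time points (Ī = 0, I₁ = I₂).
- The arrays hold m_{kji}^{(b)}(β⁰), split into the non-shared parts (b = 1) and the overlapping subregion (b = 2).
- The function and argument names (`xi12`, `m1_own`, `m2_own`, `m_shared`) are hypothetical.

```python
import numpy as np

def weighted_time(m, t):
    """Row-wise weighted mean of the time points t with weights m[j, i].
    Rows with zero total weight return 0.0 (their terms in (22) carry zero weight anyway)."""
    total = m.sum(axis=1)
    numerator = m @ t
    return np.divide(numerator, total, out=np.zeros_like(numerator), where=total > 0)

def xi12(t, m1_own, m2_own, m_shared):
    """Evaluate (22) when both regions use the same time points t (I-bar = 0, I_1 = I_2).

    m1_own[j, i]   -- m_{1ji}^{(1)}: expected deaths in the part of region 1 outside the overlap
    m2_own[j, i]   -- m_{2ji}^{(1)}: same for region 2
    m_shared[j, i] -- m_{1ji}^{(2)} = m_{2ji}^{(2)}: expected deaths in the overlapping subregion
    Returns (xi_12, 1 / sigma^2_{1(2)}).
    """
    t = np.asarray(t, dtype=float)
    m1_own, m2_own, m_shared = (np.asarray(a, dtype=float) for a in (m1_own, m2_own, m_shared))

    t_shared = weighted_time(m_shared, t)                     # tilde t_{kj}^{(2)}, equal for k = 1, 2
    inv_sigma2 = np.sum(m_shared * (t - t_shared[:, None]) ** 2)  # 1 / sigma^2_{1(2)}

    m_sh_dot = m_shared.sum(axis=1)                           # m_{j.}^{(2)}
    m1_dot, m2_dot = m1_own.sum(axis=1), m2_own.sum(axis=1)   # m_{kj.}^{(1)}
    w1 = m1_dot / (m1_dot + m_sh_dot)                         # m_{1j.}^{(1)} / m_{1j.}
    w2 = m2_dot / (m2_dot + m_sh_dot)                         # m_{2j.}^{(1)} / m_{2j.}
    d1 = weighted_time(m1_own, t) - t_shared                  # tilde t_{1j}^{(1)} - tilde t_{1j}^{(2)}
    d2 = weighted_time(m2_own, t) - t_shared                  # tilde t_{2j}^{(1)} - tilde t_{2j}^{(2)}
    return inv_sigma2 + np.sum(w1 * w2 * m_sh_dot * d1 * d2), inv_sigma2

# Toy example (made-up means, one age group, three years): region 1's own part is
# concentrated early and region 2's own part late, so the cross term is negative.
xi, inv = xi12([0, 1, 2], [[2, 0, 0]], [[0, 0, 2]], [[1, 1, 1]])
print(xi, inv)
```

In the toy call the cross term equals 0.4 · 0.4 · 3 · (−1) · 1 = −0.48, so ξ₁₂ = 1.52 against 1/σ²_{1(2)} = 2. Setting `m2_own` to zero reproduces the containment case, where ξ₁₂ = 1/σ²_{1(2)}. The corresponding identity (23) can be checked against Table 8 (κ = 1, β_1k = 0.020): 4753.38 − 505.19 = 4248.19, which agrees with the reported 4248.20 up to rounding.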

4 Simulation Studies and Analysis of SEER Mortality Data

When dealing with asymptotic results, it is important to check how the theoretical results perform in an empirical framework. Specifically, for Poisson sampling the key question is how the total expected number of deaths (or incident cases) N_k affects the precision of the results. Other characteristics, such as the percentage of overlap "in space" or "in time" and the choice of λ, also merit analysis.

As a preliminary study, before focusing on N_k, we considered thyroid cancer mortality (a rare cancer) in three regions:

- the Western (W) US population (Arizona, New Mexico and Texas);
- the South Western (SW) US population (Arizona, California and Nevada);
- the West Coast (WC) US population (California, Oregon and Washington).

We compare the APCs of W vs. SW (Arizona is shared) on the one hand, and of SW vs. WC (California is shared) on the other. We considered scenarios with different time periods. SW is observed over 1998–2007 in all scenarios. The other region (W or WC) is observed over 1986–1995, 1989–1998, 1992–2001, 1995–2004 and 1998–2007 in scenarios A′, B′, C′, D′ and E′, respectively.

Table 1 shows, for each of the two regions compared, the percentage of expected deaths that corresponds to the shared part (the overlapping percentages). These are computed with β₁₁ = β₁₂ = −0.005 (APC₁ = APC₂ ≃ −0.5) for W vs. SW, and β₁₃ = β₁₄ = 0.02 (APC₃ = APC₄ ≃ 2.02) for SW vs. WC. Within a scenario, the difference in overlapping percentage between the two comparisons is due to the spatial overlap. The percentages are larger for SW vs. WC because the shared part, California, is a large state.

In addition, we chose several values of λ, $λ \in \{-0.5, 0, \frac{2}{3}, 1, 1.5\}$, in order to compare the performance of the minimum power divergence estimators. Table 3 shows these results for W vs. SW. From scenario B′ to E′ (i.e. as the overlapping percentage increases), the covariance tends to increase:

- it is negative at B′ (1 time point shared);
- it decreases further at C′ (4 time points shared);
- it reaches small positive values at D′ (7 time points shared);
- it ends with large positive values at E′ (10 time points shared).

Roughly, the covariance changes sign when about half of the time points considered for each region are shared. In scenario A′ the theoretical covariance is zero, since the two regions share no observations. Asterisks mark the simulated variances and significance levels that exceed their theoretical counterparts, so that the worst cases can be identified.

The results suggest that the minimum power divergence estimators with λ = 1, that is, the minimum chi-square estimators, are empirically efficient. Their Z-test statistics also perform well with respect to the nominal significance level: the simulated significance levels for λ = 1 do not exceed 0.05 in any scenario.

We have omitted the results for SW vs. WC because the spatial overlap by itself does not affect the covariances of β̂_1k,λ much. There was no remarkable difference between the covariances for SW vs. WC and for W vs. SW: the covariance changes sign at the same scenario in both cases, and only its magnitude differs. The behaviour of the minimum power divergence estimators is also very similar. Hence, the simulation study that follows focuses on the MLEs and the MCSEs, with fixed overlapping percentages, one of them being 100%.
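The number of shared time points quoted for each scenario follows directly from the study periods. The short Python sketch below counts the calendar years each period has in common with the SW period 1998–2007; the helper name `shared_years` is hypothetical. Scenario A′, with no shared year, is the one whose theoretical covariance is exactly zero in Table 3.

```python
def shared_years(period_a, period_b):
    """Number of calendar years common to two inclusive (start, end) year ranges."""
    first = max(period_a[0], period_b[0])
    last = min(period_a[1], period_b[1])
    return max(0, last - first + 1)

SW_PERIOD = (1998, 2007)            # SW is observed over 1998-2007 in every scenario
OTHER_PERIODS = {                   # period of the other region (W or WC)
    "A′": (1986, 1995),
    "B′": (1989, 1998),
    "C′": (1992, 2001),
    "D′": (1995, 2004),
    "E′": (1998, 2007),
}

counts = {sc: shared_years(p, SW_PERIOD) for sc, p in OTHER_PERIODS.items()}
for sc, n_shared in counts.items():
    print(sc, n_shared)             # A′ 0, B′ 1, C′ 4, D′ 7, E′ 10
```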

Table 1.

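Overlapping percentages for W vs SW and SW vs WC in the five scenarios (each cell: first-listed region; second-listed region).

| space \ time | sc A′ | sc B′ | sc C′ | sc D′ | sc E′ |
|---|---|---|---|---|---|
| W vs SW | 0%; 0% | 1.66%; 1.32% | 6.94%; 5.24% | 12.66%; 9.12% | 18.96%; 13.03% |
| SW vs WC | 0%; 0% | 8.93%; 7.40% | 34.75%; 30.30% | 59.09%; 54.06% | 81.80%; 78.39% |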


Table 3.

Minimum power divergence estimators with $λ \in \{-0.5, 0, \frac{2}{3}, 1, 1.5\}$ for scenarios A′, B′, C′, D′ and E′ (W vs. SW).

| Scenario | λ | $σ_{11}^{2}$ | $\tilde{σ}_{11, λ}^{2}$ | $σ_{12}^{2}$ | $\tilde{σ}_{12, λ}^{2}$ | $σ_{1,12}$ | $\tilde{σ}_{1,12,λ}$ | $Var(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{Var}(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{α}_{λ}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| A′ | −0.5 | 106106.94 | 117206.91 | 88722.57 | 96004.29 | 0.00 | −190.20 | 194829.51 | *213591.59 | *0.056 |
| A′ | 0 | 106106.94 | 106482.52 | 88722.57 | 88399.60 | 0.00 | −131.23 | 194829.51 | *195144.57 | 0.050 |
| A′ | 2/3 | 106106.94 | 100968.49 | 88722.57 | 84561.80 | 0.00 | −64.51 | 194829.51 | 185659.31 | 0.047 |
| A′ | 1 | 106106.94 | 99842.89 | 88722.57 | 83793.94 | 0.00 | −34.77 | 194829.51 | 183706.37 | 0.047 |
| A′ | 1.5 | 106106.94 | 99346.15 | 88722.57 | 83510.99 | 0.00 | 2.27 | 194829.51 | 182852.60 | 0.049 |
| B′ | −0.5 | 106106.94 | 117293.27 | 83850.45 | 92311.97 | −4020.16 | −3833.28 | 197997.72 | *217271.80 | *0.058 |
| B′ | 0 | 106106.94 | 106707.01 | 83850.45 | 85398.66 | −4020.16 | −3490.04 | 197997.72 | *199085.75 | 0.051 |
| B′ | 2/3 | 106106.94 | 101342.71 | 83850.45 | 81753.09 | −4020.16 | −3229.76 | 197997.72 | 189555.32 | 0.049 |
| B′ | 1 | 106106.94 | 100261.70 | 83850.45 | 80985.96 | −4020.16 | −3142.66 | 197997.72 | 187532.98 | 0.049 |
| B′ | 1.5 | 106106.94 | 99807.30 | 83850.45 | 80649.82 | −4020.16 | −3047.64 | 197997.72 | 186552.39 | *0.052 |
| C′ | −0.5 | 106106.94 | 116056.24 | 79295.40 | 84620.24 | −6035.64 | −5099.81 | 197473.63 | *210876.09 | *0.055 |
| C′ | 0 | 106106.94 | 105572.08 | 79295.40 | 78400.39 | −6035.64 | −4630.90 | 197473.63 | 193234.25 | 0.048 |
| C′ | 2/3 | 106106.94 | 100178.76 | 79295.40 | 75138.00 | −6035.64 | −4302.56 | 197473.63 | 183921.87 | 0.046 |
| C′ | 1 | 106106.94 | 99090.96 | 79295.40 | 74470.54 | −6035.64 | −4199.62 | 197473.63 | 181960.74 | 0.046 |
| C′ | 1.5 | 106106.94 | 98646.02 | 79295.40 | 74214.18 | −6035.64 | −4094.68 | 197473.63 | 181049.56 | 0.049 |
| D′ | −0.5 | 106106.94 | 115548.66 | 74971.59 | 81107.54 | 2294.32 | 2271.85 | 176489.89 | *192112.50 | *0.057 |
| D′ | 0 | 106106.94 | 104872.37 | 74971.59 | 75820.32 | 2294.32 | 2148.49 | 176489.89 | 176395.71 | 0.050 |
| D′ | 2/3 | 106106.94 | 99400.99 | 74971.59 | 72923.02 | 2294.32 | 2060.12 | 176489.89 | 168203.77 | 0.048 |
| D′ | 1 | 106106.94 | 98300.76 | 74971.59 | 72306.86 | 2294.32 | 2034.16 | 176489.89 | 166539.31 | 0.050 |
| D′ | 1.5 | 106106.94 | 97854.16 | 74971.59 | 72044.77 | 2294.32 | 2011.33 | 176489.89 | 165876.27 | *0.052 |
| E′ | −0.5 | 106106.94 | 115740.28 | 70747.13 | 75885.14 | 15621.44 | 17152.32 | 145611.20 | *157320.78 | *0.055 |
| E′ | 0 | 106106.94 | 105114.37 | 70747.13 | 71094.62 | 15621.44 | 16123.83 | 145611.20 | 143961.33 | 0.048 |
| E′ | 2/3 | 106106.94 | 99710.57 | 70747.13 | 68383.44 | 15621.44 | 15273.05 | 145611.20 | 137547.92 | 0.047 |
| E′ | 1 | 106106.94 | 98636.63 | 70747.13 | 67789.30 | 15621.44 | 14953.01 | 145611.20 | 136519.90 | 0.047 |
| E′ | 1.5 | 106106.94 | 98219.30 | 70747.13 | 67513.89 | 15621.44 | 14557.46 | 145611.20 | 136618.26 | 0.049 |


To study the precision of the results as N_k changes, we considered three proportionality constants $κ \in \{1, \frac{1}{100}, \frac{1}{300}\}$ applied to N_k in each of the following scenarios for Regions 1 and 2. The slope β_1k ∈ {0.02, 0.005, 0, −0.005} is equal for both regions (k = 1, 2), as required by the null hypothesis; that is, APC₁ = APC₂ ≃ 2.02, 0.50, 0 and −0.50, respectively:

Scenario A: low overlap, I₁ = 6, I₂ = 11, I₁ − Ī = 3 shared time points.
Scenario B: medium overlap, I₁ = 10, I₂ = 11, I₁ − Ī = 7 shared time points.
Scenario C: high overlap, I₁ = 8, I₂ = 8, I₁ − Ī = 8 shared time points.
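The APC values quoted with each slope are consistent with the log-linear relation APC = 100(exp(β_1k) − 1). The sketch below assumes that relation; the function names `apc` and `apc_ci` are hypothetical. It also shows one way to turn a slope estimate and its variance into a 95% interval for the APC, by transforming the Wald interval for β_1k.

```python
import math

def apc(beta1):
    """Annual percent change implied by a log-linear slope: 100 * (exp(beta1) - 1)."""
    return 100.0 * math.expm1(beta1)

def apc_ci(beta1, var_beta1, z=1.959964):
    """95% interval for the APC, obtained by transforming the Wald interval for beta1."""
    half_width = z * math.sqrt(var_beta1)
    return apc(beta1 - half_width), apc(beta1 + half_width)

for b in (0.02, 0.005, 0.0, -0.005):
    print(f"beta_1k = {b:+.3f}  ->  APC = {apc(b):+.2f}")   # +2.02, +0.50, +0.00, -0.50
```

For the WC row of Table 10 with λ = 0, `apc_ci(-0.0267, 2.923e-5)` gives roughly (−3.66, −1.60), close to the reported (−3.665, −1.601). The small gap is consistent with the slope being rounded to four decimals in the table.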

The values of n_kji were taken from real data sets for females:

Scenario A: Region 1 = United States (US) during 1993–1998, Region 2 = California (CA) during 1996–2006.
Scenario B: Region 1 = US during 1993–2002, Region 2 = CA during 1996–2006.
Scenario C: Region 1 = US during 1999–2006, Region 2 = CA during 1999–2006.

From the same data sets we took β_0kj = log(κD_kj1/n_kj1) − β_1k t_k1, based on breast cancer deaths in the first year of the time interval (i = 1). All these data were obtained from the SEER database, and consequently J = 19 age groups are considered. Once these parameters are fixed, we can compute theoretically:

- the individual variances $σ_{1 k}^{2}$ of the estimators β̂_1k,λ;
- the covariance σ_1,12;
- $Var ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ}) = σ_{11}^{2} + σ_{12}^{2} - 2 σ_{1, 12}$.

We can also compute the theoretical value of η_k ≡ N_k/(JI_k), the average expected number of deaths per cell, which indicates whether N_k is large enough. These values are given in Table 2.
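The intercept choice above fixes the expected count of the first year at κ times the observed count. The Python sketch below assumes the log-linear mean m_kji = n_kji exp(β_0kj + β_1k t_ki), which is what the intercept formula implies. It uses made-up toy inputs rather than SEER data, and `design_means` and `eta` are hypothetical names. The sketch also shows why η_k in Table 2 scales linearly with κ.

```python
import numpy as np

def design_means(D_first, n, t, beta1, kappa):
    """Expected deaths m[j, i] under the assumed model m_ji = n_ji * exp(beta_0j + beta_1 * t_i),
    with beta_0j = log(kappa * D_j1 / n_j1) - beta_1 * t_1 as in the text."""
    n = np.asarray(n, dtype=float)                 # shape (J, I)
    t = np.asarray(t, dtype=float)                 # shape (I,)
    beta0 = np.log(kappa * np.asarray(D_first, dtype=float) / n[:, 0]) - beta1 * t[0]
    return n * np.exp(beta0[:, None] + beta1 * t[None, :])

def eta(m):
    """Average expected deaths per cell, eta = N / (J * I), with N the total expected deaths."""
    J, I = m.shape
    return m.sum() / (J * I)

# Toy inputs (made up, not SEER data): J = 2 age groups, I = 3 years
n = [[1.0e6, 1.0e6, 1.0e6],
     [5.0e5, 5.0e5, 5.0e5]]
D_first = [300, 120]                               # observed deaths in the first year
t = [0, 1, 2]

m = design_means(D_first, n, t, beta1=0.02, kappa=1 / 100)
print(m[:, 0])                                     # kappa * D_first, i.e. about [3.0, 1.2]
print(eta(m))
```

Multiplying κ by a constant shifts every β_0kj by the same log amount, so all expected counts, and hence η_k, are multiplied by that constant. This matches the ratios between the κ = 1 and κ = 1/100 rows of Table 2 (for example 2538.24 and 25.38).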

Table 2.

Average expected number of deaths per cell, η_k = N_k/(JI_k).

| κ | β_1k | Sc. A, η₁ | Sc. A, η₂ | Sc. B, η₁ | Sc. B, η₂ | Sc. C, η₁ | Sc. C, η₂ |
|---|---|---|---|---|---|---|---|
| 1 | 0.020 | 2538.24 | 331.42 | 2741.10 | 331.42 | 2493.85 | 265.98 |
| 1 | 0.005 | 2441.69 | 292.43 | 2552.96 | 292.43 | 2360.81 | 251.71 |
| 1 | 0.000 | 2410.67 | 280.62 | 2494.19 | 280.62 | 2318.67 | 247.19 |
| 1 | −0.005 | 2380.23 | 269.35 | 2437.28 | 269.35 | 2277.59 | 242.79 |
| 1/100 | 0.020 | 25.38 | 3.31 | 27.41 | 3.31 | 24.94 | 2.66 |
| 1/100 | 0.005 | 24.42 | 2.92 | 25.53 | 2.92 | 23.61 | 2.52 |
| 1/100 | 0.000 | 24.11 | 2.81 | 24.94 | 2.81 | 23.19 | 2.47 |
| 1/100 | −0.005 | 23.80 | 2.69 | 24.37 | 2.69 | 22.77 | 2.43 |
| 1/300 | 0.020 | 8.46 | 1.10 | 9.14 | 1.10 | 8.31 | 0.89 |
| 1/300 | 0.005 | 8.14 | 0.97 | 8.51 | 0.97 | 7.87 | 0.84 |
| 1/300 | 0.000 | 8.03 | 0.93 | 8.31 | 0.93 | 7.73 | 0.82 |
| 1/300 | −0.005 | 7.93 | 0.90 | 8.12 | 0.90 | 7.59 | 0.81 |


Because the two regions share a common subregion, we first simulated the death counts of the shared part. Then, using the reproductive property of the Poisson distribution under summation, we obtained the death counts of each region by adding independent Poisson counts for its complementary (non-shared) part. Tables 4, 6 and 8 summarize the theoretical results together with those obtained by simulation for the MLEs, and Tables 5, 7 and 9 do the same for the MCSEs. In all these tables the variances and covariances are multiplied by 10⁹. Quantities marked with a tilde were computed by simulation with R = 22,000 replications:

\begin{aligned} {\tilde{σ}}_{1 k, λ}^{2} &= \frac{1}{R} \sum_{r = 1}^{R} {({\hat{β}}_{1 k, λ} (r) - \tilde{E} [{\hat{β}}_{1 k, λ}])}^{2}, \quad \tilde{E} [{\hat{β}}_{1 k, λ}] = \frac{1}{R} \sum_{r = 1}^{R} {\hat{β}}_{1 k, λ} (r), \\ {\tilde{σ}}_{1, 12, λ} &= \frac{1}{R} \sum_{r = 1}^{R} ({\hat{β}}_{11, λ} (r) - \tilde{E} [{\hat{β}}_{11, λ}]) ({\hat{β}}_{12, λ} (r) - \tilde{E} [{\hat{β}}_{12, λ}]) . \end{aligned}
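The simulation design and the tilde estimators can be sketched in a few lines of Python. For brevity, this toy version works with the counts of a single cell rather than with the estimates β̂_1k,λ(r):

- the shared part is drawn first;
- each region adds an independent Poisson complement;
- the (1/R) formulas above are applied to the replicates.

The Poisson means and the seed are arbitrary assumptions. Because the two regional counts share S, their covariance equals Var(S) = `mu_shared`. This is the mechanism that makes σ_1,12 nonzero for overlapping regions.

```python
import numpy as np

rng = np.random.default_rng(2011)                  # arbitrary seed
R = 22_000                                         # number of replications, as in the text
mu_shared, mu_only1, mu_only2 = 40.0, 25.0, 15.0   # hypothetical Poisson means for one cell

# Draw the shared subregion first, then add an independent complement for each region
# (a sum of independent Poisson variables is again Poisson).
S = rng.poisson(mu_shared, size=R)
D1 = S + rng.poisson(mu_only1, size=R)
D2 = S + rng.poisson(mu_only2, size=R)

def tilde_var(x):
    """(1/R) * sum_r (x_r - mean)^2, the divisor-R variance used for sigma-tilde^2."""
    return np.mean((x - x.mean()) ** 2)

def tilde_cov(x, y):
    """(1/R) * sum_r (x_r - mean_x) * (y_r - mean_y), as used for sigma-tilde_{1,12}."""
    return np.mean((x - x.mean()) * (y - y.mean()))

print(tilde_var(D1), tilde_var(D2), tilde_cov(D1, D2))   # near 65, 55 and 40
```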

Table 4.

Scenario A: Maximum Likelihood Estimators (λ = 0).

| κ | β_1k | $σ_{11}^{2}$ | $\tilde{σ}_{11, λ}^{2}$ | $σ_{12}^{2}$ | $\tilde{σ}_{12, λ}^{2}$ | $σ_{1,12}$ | $\tilde{σ}_{1,12,λ}$ | $Var(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{Var}(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{α}_{λ}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.020 | 1188.50 | 1196.88 | 1468.14 | 1475.38 | −152.70 | −153.71 | 2962.03 | *2979.67 | 0.049 |
| 1 | 0.005 | 1233.49 | 1245.97 | 1653.48 | 1644.16 | −166.41 | −159.88 | 3219.78 | 3209.89 | 0.050 |
| 1 | 0.000 | 1248.91 | 1237.81 | 1720.50 | 1707.01 | −171.20 | −156.44 | 3311.81 | 3257.70 | 0.049 |
| 1 | −0.005 | 1264.55 | 1276.82 | 1790.33 | 1801.21 | −176.11 | −185.35 | 3407.10 | *3448.73 | *0.052 |
| 1/100 | 0.020 | 118849.86 | 120545.66 | 146813.66 | 146902.40 | −15269.95 | −16186.46 | 296203.41 | *299820.97 | *0.052 |
| 1/100 | 0.005 | 123349.03 | 123414.45 | 165348.33 | 167479.53 | −16640.50 | −16374.79 | 321978.37 | *323643.55 | *0.052 |
| 1/100 | 0.000 | 124891.15 | 124618.15 | 172050.41 | 173875.81 | −17119.82 | −17946.96 | 331181.19 | *334387.88 | 0.050 |
| 1/100 | −0.005 | 126455.01 | 125135.69 | 179033.49 | 181356.96 | −17610.72 | −15914.04 | 340709.94 | 338320.73 | 0.047 |
| 1/300 | 0.020 | 356549.59 | 359581.84 | 440440.97 | 451288.03 | −45809.84 | −53204.15 | 888610.23 | *917278.18 | *0.052 |
| 1/300 | 0.005 | 370047.09 | 373291.90 | 496045.00 | 503332.51 | −49921.51 | −51558.77 | 965935.10 | *979741.96 | 0.050 |
| 1/300 | 0.000 | 374673.44 | 375119.77 | 516151.22 | 532280.30 | −51359.46 | −50448.54 | 993543.56 | *1008297.13 | *0.051 |
| 1/300 | −0.005 | 379365.02 | 380562.71 | 537100.47 | 562780.79 | −52832.16 | −58143.46 | 1022129.82 | *1059630.42 | *0.054 |


Table 6.

Scenario B: Maximum Likelihood Estimators (λ = 0).

| κ | β_1k | $σ_{11}^{2}$ | $\tilde{σ}_{11, λ}^{2}$ | $σ_{12}^{2}$ | $\tilde{σ}_{12, λ}^{2}$ | $σ_{1,12}$ | $\tilde{σ}_{1,12,λ}$ | $Var(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{Var}(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{α}_{λ}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.020 | 234.90 | 234.40 | 1468.14 | 1461.71 | 12.72 | 6.74 | 1677.59 | *1682.63 | 0.050 |
| 1 | 0.005 | 251.10 | 252.79 | 1653.48 | 1648.47 | 13.91 | 10.97 | 1876.77 | *1879.32 | *0.052 |
| 1 | 0.000 | 256.77 | 255.02 | 1720.50 | 1713.16 | 14.35 | 7.81 | 1948.56 | *1952.57 | 0.050 |
| 1 | −0.005 | 262.57 | 261.96 | 1790.33 | 1792.15 | 14.83 | 17.78 | 2023.24 | 2018.56 | 0.049 |
| 1/100 | 0.020 | 23489.78 | 23328.11 | 146813.66 | 147774.27 | 1272.17 | 181.93 | 167759.10 | *170738.52 | *0.053 |
| 1/100 | 0.005 | 25109.90 | 24424.06 | 165348.33 | 147273.99 | 1390.58 | 1546.90 | 187677.08 | 168604.26 | 0.049 |
| 1/100 | 0.000 | 25676.71 | 25666.21 | 172050.41 | 171995.21 | 1435.45 | 822.83 | 194856.21 | *196015.76 | *0.052 |
| 1/100 | −0.005 | 26257.50 | 26172.50 | 179033.49 | 179024.69 | 1483.35 | 708.28 | 202324.30 | *203780.65 | *0.051 |
| 1/300 | 0.020 | 70469.35 | 71112.09 | 440440.97 | 442433.57 | 3816.51 | 2392.77 | 503277.31 | *508760.12 | *0.052 |
| 1/300 | 0.005 | 75329.71 | 74737.59 | 496045.00 | 510147.59 | 4171.74 | 3181.74 | 563031.24 | *578521.71 | *0.053 |
| 1/300 | 0.000 | 77030.13 | 76849.11 | 516151.22 | 521168.35 | 4306.36 | 2781.06 | 584568.62 | *592455.34 | 0.050 |
| 1/300 | −0.005 | 78772.49 | 79582.80 | 537100.47 | 545463.20 | 4450.04 | 5288.71 | 606972.89 | *614468.57 | 0.050 |


Table 8.

Scenario C: Maximum Likelihood Estimators (λ = 0).

| κ | β_1k | $σ_{11}^{2}$ | $\tilde{σ}_{11, λ}^{2}$ | $σ_{12}^{2}$ | $\tilde{σ}_{12, λ}^{2}$ | $σ_{1,12}$ | $\tilde{σ}_{1,12,λ}$ | $Var(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{Var}(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{α}_{λ}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.020 | 505.19 | 502.38 | 4753.38 | 4766.57 | 505.19 | 515.35 | 4248.20 | 4238.26 | 0.050 |
| 1 | 0.005 | 532.12 | 529.53 | 5006.55 | 4962.78 | 532.12 | 527.77 | 4474.43 | 4436.77 | 0.049 |
| 1 | 0.000 | 541.45 | 543.59 | 5094.21 | 5129.48 | 541.45 | 549.19 | 4552.76 | *4574.69 | 0.050 |
| 1 | −0.005 | 550.96 | 550.15 | 5183.56 | 5202.96 | 550.96 | 563.62 | 4632.60 | 4625.87 | 0.051 |
| 1/100 | 0.020 | 50518.62 | 50823.72 | 475338.31 | 480772.05 | 50518.62 | 52893.29 | 424819.68 | *425809.19 | 0.050 |
| 1/100 | 0.005 | 53212.46 | 53963.47 | 500654.98 | 500398.48 | 53212.46 | 53825.39 | 447442.52 | 446711.16 | 0.049 |
| 1/100 | 0.000 | 54145.29 | 53655.97 | 509420.86 | 511073.61 | 54145.29 | 55232.68 | 455275.57 | 454264.23 | 0.050 |
| 1/100 | −0.005 | 55096.19 | 55610.24 | 518356.01 | 521012.61 | 55096.19 | 56166.72 | 463259.82 | *464289.42 | 0.050 |
| 1/300 | 0.020 | 151555.86 | 149118.50 | 1426014.92 | 1461021.71 | 151555.86 | 152950.11 | 1274459.05 | *1304240.00 | 0.051 |
| 1/300 | 0.005 | 159637.38 | 161042.69 | 1501964.94 | 1529631.00 | 159637.38 | 160328.08 | 1342327.55 | *1370017.53 | 0.049 |
| 1/300 | 0.000 | 162435.88 | 162828.12 | 1528262.58 | 1534795.18 | 162435.88 | 165625.27 | 1365826.70 | *1366372.76 | 0.047 |
| 1/300 | −0.005 | 165288.56 | 165289.23 | 1555068.02 | 1599312.35 | 165288.56 | 168363.58 | 1389779.46 | *1427874.42 | 0.050 |


Table 5.

Scenario A: Minimum Chi-Square Estimators (λ = 1).

| κ | β_1k | $σ_{11}^{2}$ | $\tilde{σ}_{11, λ}^{2}$ | $σ_{12}^{2}$ | $\tilde{σ}_{12, λ}^{2}$ | $σ_{1,12}$ | $\tilde{σ}_{1,12,λ}$ | $Var(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{Var}(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{α}_{λ}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.020 | 1188.50 | 1196.61 | 1468.14 | 1474.56 | −152.70 | −153.88 | 2962.03 | *2978.92 | 0.049 |
| 1 | 0.005 | 1233.49 | 1245.63 | 1653.48 | 1642.03 | −166.41 | −158.86 | 3219.78 | 3205.38 | 0.049 |
| 1 | 0.000 | 1248.91 | 1237.43 | 1720.50 | 1704.17 | −171.20 | −156.10 | 3311.81 | 3253.80 | 0.049 |
| 1 | −0.005 | 1264.55 | 1276.42 | 1790.33 | 1797.77 | −176.11 | −185.22 | 3407.10 | *3444.64 | *0.051 |
| 1/100 | 0.020 | 118849.86 | 118678.59 | 146813.66 | 131711.15 | −15269.95 | −14717.95 | 296203.41 | 279825.64 | *0.051 |
| 1/100 | 0.005 | 123349.03 | 121351.67 | 165348.33 | 148155.30 | −16640.50 | −14873.11 | 321978.37 | 299253.18 | 0.049 |
| 1/100 | 0.000 | 124891.15 | 122229.69 | 172050.41 | 152728.68 | −17119.82 | −16259.50 | 331181.19 | 307477.36 | 0.048 |
| 1/100 | −0.005 | 126455.01 | 122628.20 | 179033.49 | 158913.98 | −17610.72 | −14215.39 | 340709.94 | 309972.96 | 0.045 |
| 1/300 | 0.020 | 356549.59 | 342888.09 | 440440.97 | 340354.35 | −45809.84 | −42220.99 | 888610.23 | 767684.41 | 0.050 |
| 1/300 | 0.005 | 370047.09 | 354377.57 | 496045.00 | 369058.29 | −49921.51 | −41173.07 | 965935.10 | 805782.01 | 0.045 |
| 1/300 | 0.000 | 374673.44 | 356408.32 | 516151.22 | 388239.25 | −51359.46 | −38265.08 | 993543.56 | 821177.73 | 0.045 |
| 1/300 | −0.005 | 379365.02 | 360799.63 | 537100.47 | 403732.21 | −52832.16 | −47176.61 | 1022129.82 | 858885.06 | 0.045 |


Table 7.

Scenario B: Minimum Chi-Square Estimators (λ = 1).

| κ | β_1k | $σ_{11}^{2}$ | $\tilde{σ}_{11, λ}^{2}$ | $σ_{12}^{2}$ | $\tilde{σ}_{12, λ}^{2}$ | $σ_{1,12}$ | $\tilde{σ}_{1,12,λ}$ | $Var(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{Var}(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{α}_{λ}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.020 | 234.90 | 234.36 | 1468.14 | 1459.11 | 12.72 | 6.73 | 1677.59 | *1680.02 | 0.050 |
| 1 | 0.005 | 251.10 | 252.87 | 1653.48 | 1646.88 | 13.91 | 10.79 | 1876.77 | *1878.16 | *0.052 |
| 1 | 0.000 | 256.77 | 255.00 | 1720.50 | 1710.41 | 14.35 | 7.53 | 1948.56 | *1950.35 | 0.050 |
| 1 | −0.005 | 262.57 | 261.91 | 1790.33 | 1790.65 | 14.83 | 17.89 | 2023.24 | 2016.78 | 0.049 |
| 1/100 | 0.020 | 23489.78 | 23039.44 | 146813.66 | 132848.97 | 1272.17 | 204.75 | 167759.10 | 155478.91 | *0.056 |
| 1/100 | 0.005 | 25109.90 | 24424.06 | 165348.33 | 147273.99 | 1390.58 | 1546.90 | 187677.08 | 168604.26 | 0.049 |
| 1/100 | 0.000 | 25676.71 | 25227.72 | 172050.41 | 152123.09 | 1435.45 | 950.07 | 194856.21 | 175450.67 | 0.049 |
| 1/100 | −0.005 | 26257.50 | 25713.71 | 179033.49 | 157016.58 | 1483.35 | 608.74 | 202324.30 | 181512.81 | 0.050 |
| 1/300 | 0.020 | 70469.35 | 68417.63 | 440440.97 | 333558.97 | 3816.51 | 2545.19 | 503277.31 | 396886.23 | *0.055 |
| 1/300 | 0.005 | 75329.71 | 71630.87 | 496045.00 | 375196.57 | 4171.74 | 2568.93 | 563031.24 | 441689.58 | 0.049 |
| 1/300 | 0.000 | 77030.13 | 73435.63 | 516151.22 | 380384.11 | 4306.36 | 1980.50 | 584568.62 | 449858.76 | 0.046 |
| 1/300 | −0.005 | 78772.49 | 75952.38 | 537100.47 | 394349.52 | 4450.04 | 3665.47 | 606972.89 | 462970.97 | 0.046 |


Table 9.

Scenario C: Minimum Chi-Square Estimators (λ = 1).

| κ | β_1k | $σ_{11}^{2}$ | $\tilde{σ}_{11, λ}^{2}$ | $σ_{12}^{2}$ | $\tilde{σ}_{12, λ}^{2}$ | $σ_{1,12}$ | $\tilde{σ}_{1,12,λ}$ | $Var(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{Var}(\hat{β}_{11, λ} - \hat{β}_{12, λ})$ | $\tilde{α}_{λ}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.020 | 505.19 | 502.28 | 4753.38 | 4756.74 | 505.19 | 514.11 | 4248.20 | 4230.79 | *0.051 |
| 1 | 0.005 | 532.12 | 529.39 | 5006.55 | 4956.27 | 532.12 | 527.17 | 4474.43 | 4431.32 | 0.049 |
| 1 | 0.000 | 541.45 | 543.50 | 5094.21 | 5120.89 | 541.45 | 549.07 | 4552.76 | *4566.24 | 0.050 |
| 1 | −0.005 | 550.96 | 550.04 | 5183.56 | 5194.95 | 550.96 | 563.79 | 4632.60 | 4617.41 | *0.051 |
| 1/100 | 0.020 | 50518.62 | 49937.40 | 475338.31 | 417941.93 | 50518.62 | 47230.21 | 424819.68 | 373418.90 | 0.050 |
| 1/100 | 0.005 | 53212.46 | 53092.20 | 500654.98 | 434030.70 | 53212.46 | 48517.38 | 447442.52 | 390088.14 | 0.048 |
| 1/100 | 0.000 | 54145.29 | 52785.97 | 509420.86 | 441697.10 | 54145.29 | 49680.01 | 455275.57 | 395123.06 | 0.046 |
| 1/100 | −0.005 | 55096.19 | 54621.08 | 518356.01 | 449926.19 | 55096.19 | 50651.05 | 463259.82 | 403245.16 | 0.047 |
| 1/300 | 0.020 | 151555.86 | 141857.76 | 1426014.92 | 1037025.40 | 151555.86 | 119830.05 | 1274459.05 | 939223.06 | 0.048 |
| 1/300 | 0.005 | 159637.38 | 153101.35 | 1501964.94 | 1075673.38 | 159637.38 | 123845.16 | 1342327.55 | 981084.41 | 0.046 |
| 1/300 | 0.000 | 162435.88 | 154380.49 | 1528262.58 | 1074400.76 | 162435.88 | 128138.62 | 1365826.70 | 972504.01 | 0.043 |
| 1/300 | −0.005 | 165288.56 | 57146.31 | 1555068.02 | 1110194.55 | 165288.56 | 131114.59 | 1389779.46 | 1005111.67 | 0.044 |


Such a large number of replications was chosen in order to reach reliable precision in the simulation study (for example, R = 10,000 was found not to be large enough). The last column gives the exact significance level of the Z-test, obtained by simulation, when the nominal significance level is α = 0.05:

{\tilde{α}}_{λ} = \frac{1}{R} \sum_{r = 1}^{R} I (| Z_{λ} (r) | > z_{0.975}),

where I(·) is the indicator function and z_0.975 ≃ 1.96 is the 0.975 quantile of the standard normal distribution.
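The simulated level α̃_λ is itself a binomial proportion, so its Monte Carlo error can be gauged with the usual binomial standard error. The sketch below is our own illustration, not part of the original analysis, and its function names are hypothetical. It computes α̃_λ from a list of Z values and the standard error for R = 10,000 and R = 22,000.

```python
import math
from statistics import NormalDist

def empirical_alpha(z_values, alpha=0.05):
    """Share of replicates whose |Z| exceeds the (1 - alpha/2) standard normal quantile."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    return sum(abs(z) > crit for z in z_values) / len(z_values)

def mc_standard_error(alpha, R):
    """Binomial Monte Carlo standard error of a simulated rejection rate."""
    return math.sqrt(alpha * (1 - alpha) / R)

for R in (10_000, 22_000):
    se = mc_standard_error(0.05, R)
    print(R, round(se, 5), (round(0.05 - 1.96 * se, 4), round(0.05 + 1.96 * se, 4)))
```

For α = 0.05 the standard error is about 0.0022 with R = 10,000 and about 0.0015 with R = 22,000. With R = 22,000, a test whose true level is exactly 0.05 would therefore give α̃_λ roughly between 0.047 and 0.053 in about 95% of simulation runs.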

As expected, in Scenario C the covariance is positive in all cases, while in Scenario A it is negative. The precision of $\tilde{Var} ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ})$ and of α̃_λ clearly improves as κ increases.

For large data sets (κ = 1) there is no best choice of λ. For small data sets (κ = 1/300), however, λ = 1 is clearly preferable, because the estimators β̂_11,λ − β̂_12,λ are more efficient. In fact $\tilde{Var} ({\hat{β}}_{11, 1} - {\hat{β}}_{12, 1}) < Var ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ}) < \tilde{Var} ({\hat{β}}_{11, 0} - {\hat{β}}_{12, 0})$. In addition, the exact significance level (the estimated type I error) for λ = 1 does not exceed that for λ = 0 in all cases but one (α̃₁ ≤ α̃₀). The exception is Scenario B with β_1k = 0.020, where α̃₁ = 0.055 and α̃₀ = 0.052 (Tables 6 and 7).

Because the MLEs might still have a smaller type II error, we also studied the power functions of both estimators. For κ = 1/300 we observed the same behaviour as in Figure 2. Consider differences β = β₁₁ − β₁₂ of equal magnitude, with $β_{11}^{0}$ fixed. Whenever the MLEs have the smaller type II error for β > 0 (β < 0), the MCSEs have the smaller type II error for β < 0 (β > 0). Hence, overall, we recommend using MCSEs rather than MLEs for small data sets.

Small data sets arise, for instance, in the study of Riddell and Pliska (2008), where in many cases the value of ${\hat{η}}_{k} = \sum_{j = 1}^{J} \sum_{i = 1}^{I_{k}} d_{kji} / (J I_{k})$ is quite low. Moreover, several cases with η̂_k < 12/19 are reported there without any estimate "due to instability of small numbers".
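The estimators compared above minimize d_λ(D_k, m_k(β_k)) = Σ_s m_s(β_k) ϕ_λ(D_s/m_s(β_k)), with ϕ_λ as defined in the Technical Appendix. The sketch below is a simplified illustration, not the authors' implementation. It assumes:

- a single age group with m_i = n_i exp(β₀ + β₁ t_i);
- hypothetical small counts;
- a general-purpose optimizer (`scipy.optimize.minimize` with BFGS).

For λ = 0 the minimizer satisfies the Poisson likelihood equations. For λ = 1 it minimizes half the Pearson chi-square statistic, Σ (D_i − m_i)²/(2m_i).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import xlogy

def phi(x, lam):
    """Power divergence function phi_lambda of the Technical Appendix (limits at lambda = 0, -1)."""
    if abs(lam) < 1e-12:
        return xlogy(x, x) - x + 1                 # lambda -> 0: x log x - x + 1
    if abs(lam + 1) < 1e-12:
        return -np.log(x) + x - 1                  # lambda -> -1 (requires x > 0)
    return (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))

def mpde(D, n, t, lam):
    """Minimum power divergence estimate of (beta0, beta1) for ONE age group, assuming
    m_i = n_i * exp(beta0 + beta1 * t_i). Returns the estimate and the objective d_lambda."""
    D, n, t = (np.asarray(a, dtype=float) for a in (D, n, t))

    def d_lam(beta):
        m = n * np.exp(beta[0] + beta[1] * t)
        return np.sum(m * phi(D / m, lam))         # d_lambda(D, m(beta))

    start = np.array([np.log(D.sum() / n.sum()), 0.0])
    return minimize(d_lam, start, method="BFGS").x, d_lam

# Hypothetical small counts, about ten deaths per year
t = np.arange(6) - 2.5                             # centred years
n = np.full(6, 1.0e6)                              # population at risk (made up)
D = np.array([12, 9, 15, 11, 8, 6])

beta_mle, d0 = mpde(D, n, t, lam=0.0)              # lambda = 0: maximum likelihood
beta_mcs, d1 = mpde(D, n, t, lam=1.0)              # lambda = 1: minimum chi-square
print("MLE  (beta0, beta1):", beta_mle)
print("MCSE (beta0, beta1):", beta_mcs)
```

The paper's model has J age-specific intercepts rather than one. Extending the sketch would only require one intercept per age group in `d_lam`.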

Figure 2. Power function in terms of β = β₁₁ − β₁₂ when $β_{11}^{0} = 0$, for Scenario A and κ = 1/300.

We applied the proposed methodology to real data, using both estimators. We compared the APCs of the age-adjusted thyroid cancer (a rare cancer) mortality rates of WC, SW and W (described at the beginning of this section) over the periods 1969–1983, 1977–1991 and 1990–1999, respectively. The third period is shorter than the other two. The rates are expressed per 100,000 individuals at risk. Figure 3 plots the fitted models, which at first sight suggest a decreasing trend in thyroid cancer mortality in WC and SW, and a null or decreasing trend in W.

The estimates and the test statistics Z_λ, for λ = 0, 1, are summarized in Table 10. Besides the appropriate test statistics, we have included the naive test statistics Z̃_λ, for λ = 0, 1, obtained by applying the methodology for non-overlapping regions. For thyroid cancer there is no evidence against the hypothesis of equal APCs for SW and W, whereas the comparison of WC and SW is borderline. Judging from the confidence intervals for each region, the test has more power to detect differences for WC vs. SW than for SW vs. W, because the variability is smaller (the period of time considered for W is shorter).

At the 0.05 significance level, the naive test rejects the hypothesis of equal APCs for WC and SW. The proper test statistic for overlapping regions cannot reject it, although its p-value (about 0.064 for λ = 0 and 0.055 for λ = 1) is close to 0.05. For common cancer types, a sample APC difference of the same size would probably lead to rejecting the null hypothesis.

Figure 3. MCSE and MLE for thyroid cancer mortality trends in WC, SW and W.

Table 10.

Thyroid cancer mortality trends compared among WC, SW and W during 1969–1983, 1977–1991 and 1990–1999, respectively: maximum likelihood estimators (λ = 0) and minimum chi-square estimators (λ = 1).

| Region | λ | $\hat{β}_{1k,λ}$ | $\hat{β}_{0k,λ}$ | $\hat{σ}_{1k,λ}^{2}$ | $\hat{σ}_{1,k,k+1,λ}$ | $\hat{APC}_{k,λ}$ | 95% CI for $APC_{k,λ}$ |
|---|---|---|---|---|---|---|---|
| WC | 0 | −0.0267 | −0.3680 | 2.923 × 10⁻⁵ | −77.292 × 10⁻⁷ | −2.639 | (−3.665, −1.601) |
| WC | 1 | −0.0268 | −0.3241 | 2.785 × 10⁻⁵ | −73.429 × 10⁻⁷ | −2.646 | (−3.648, −1.635) |
| SW | 0 | −0.0107 | −0.5404 | 3.044 × 10⁻⁵ | −32.915 × 10⁻⁷ | −1.064 | (−2.128, 0.011) |
| SW | 1 | −0.0106 | −0.4943 | 2.888 × 10⁻⁵ | −31.074 × 10⁻⁷ | −1.053 | (−2.089, −0.005) |
| W | 0 | 0.0003 | −0.7939 | 13.064 × 10⁻⁵ | n/a | 0.031 | (−2.184, 2.297) |
| W | 1 | −0.0012 | −0.7084 | 12.421 × 10⁻⁵ | n/a | −0.117 | (−2.275, 2.088) |

$\hat{σ}_{1,k,k+1,λ}$ is the estimated covariance between the slope estimators of region k and the next region in the table (WC with SW, SW with W).

Z-test statistics for WC vs. SW: Z_12,0 = −1.85, Z̃_12,0 = −2.08; Z_12,1 = −1.92, Z̃_12,1 = −2.16

Z-test statistics for SW vs. W: Z_23,0 = −0.85, Z̃_23,0 = −0.87; Z_23,1 = −0.75, Z̃_23,1 = −0.76
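The Z statistics above can be approximately recomputed from the rounded table entries. The Python sketch below uses Var(β̂_11,λ − β̂_12,λ) = σ̂²_11,λ + σ̂²_12,λ − 2σ̂_1,12,λ for the proper test and drops the covariance for the naive test; its helper names are hypothetical. It uses the WC and SW rows for λ = 0, with the covariance taken as −77.292 × 10⁻⁷. That is the scale at which the reported statistics are reproduced; a covariance of order 10⁻⁵ would exceed the product of the two standard deviations.

```python
import math
from statistics import NormalDist

def z_stat(b1, b2, v1, v2, cov=0.0):
    """Z for H0: beta_11 = beta_12 with Var = v1 + v2 - 2 cov; cov = 0 gives the naive test."""
    return (b1 - b2) / math.sqrt(v1 + v2 - 2.0 * cov)

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# WC vs. SW, lambda = 0: rounded entries of Table 10
b_wc, v_wc = -0.0267, 2.923e-5
b_sw, v_sw = -0.0107, 3.044e-5
cov_wc_sw = -77.292e-7

z_naive = z_stat(b_wc, b_sw, v_wc, v_sw)
z_proper = z_stat(b_wc, b_sw, v_wc, v_sw, cov_wc_sw)
print(round(z_naive, 2), round(z_proper, 2), round(two_sided_p(z_proper), 3))
```

This gives Z ≈ −1.85 and Z̃ ≈ −2.07, against the reported −1.85 and −2.08; the difference is consistent with rounding of the slopes. The two-sided p-value of the proper test is about 0.065. A negative covariance inflates the variance of the difference, which is why accounting for the overlap moves the statistic below the 1.96 threshold here.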


5 Concluding Remarks

In this work we have dealt with the important problem of comparing the changing trends of cancer mortality/incidence rates between two overlapping regions. Our proposal correctly accounts for the correlation induced by the overlap when drawing statistical inference. The minimum chi-square estimators perform better in finite samples than the maximum likelihood estimators, which suggests the practical utility of the proposed methods, especially when comparing the APCs of rare cancers. Our results support the claim of Berkson (1980) that the efficiency of the maximum likelihood estimator is questionable in finite samples. They also extend to Poisson models, for which theoretical results based on power divergences (in particular for the minimum chi-square estimators) had remained elusive. In this paper we have focused on comparing two regions; extending the methods to accommodate more than two regions simultaneously is certainly worthy of future investigation.

Acknowledgements

The authors thank the associate editor and two referees for their valuable comments and suggestions. This work was carried out during the stay of the first author as Visiting Scientist at Harvard University and Dana Farber Cancer Institute, supported by the Real Colegio Complutense and grant MTM2009-10072.

Technical Appendix

Proof of Theorem 4

Let Δ_{M_k} be the set of all M_k-dimensional probability vectors and $C^{M_{k}} = (0, 1)^{M_{k}}$. N_k increases in such a way that Diag⁻¹(n_k) m_k(β_k) does not change; hence m_s(β_k) and n_s increase proportionally (s = 1, …, M_k). This means that, as N_k increases, the parameter β_k does not change, and neither does the normalized mean vector of deaths, $m_{k}^{*} (β_{k}) = \frac{1}{N_{k}} m_{k} (β_{k})$. Note that $m_{k}^{*} (β_{k}) \in Δ_{M_{k}} \subset C^{M_{k}}$. Let $V \subset ℝ^{J+1}$ be a neighbourhood of $β_{k}^{0}$ and define the function

F_{N_{k}}^{(λ)} = (F_{1}^{(λ)}, \dots, F_{J + 1}^{(λ)}) : C^{M_{k}} \times V \to ℝ^{J + 1},

so that

F_{i}^{(λ)} (m_{k}^{*}, β_{k}) = \frac{\partial d_{λ} (N_{k} m_{k}^{*}, m_{k} (β_{k}))}{\partial θ_{ki}}, i = 1, \dots, J + 1,

with β_k = (β_0k1, …, β_0kJ, β_1k)^T = (θ_k1, …, θ_kJ, θ_k,J+1)^T ∈ V and $m_{k}^{*} = {(m_{1}^{*}, \dots, m_{M_{k}}^{*})}^{T} \in Δ_{M_{k}} \subset C^{M_{k}}$ .

More explicitly, let X_k = (x_si)_{s=1,…,M_k; i=1,…,J+1} be the design matrix and $d_{λ} (D_{k}, m_{k} (β_{k})) = \sum_{s = 1}^{M_{k}} m_{s} (β_{k}) ϕ_{λ} (\frac{D_{s}}{m_{s} (β_{k})})$, where

ϕ_{λ} (x) = \begin{cases} \frac{x^{λ + 1} - x - λ (x - 1)}{λ (λ + 1)}, & λ (λ + 1) \neq 0, \\ \lim_{α \to λ} ϕ_{α} (x), & λ (λ + 1) = 0, \end{cases}

it holds

F_{i}^{(λ)} (m_{k}^{*}, β_{k}) = \sum_{s = 1}^{M_{k}} m_{s} (β_{k}) x_{si} \left(ϕ_{λ} \left(\frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})}\right) - \frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})} ϕ_{λ}^{'} \left(\frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})}\right)\right) .

Replacing $m_{k}^{*}$ by $m_{k}^{*} (β_{k}^{0})$ and β_k by $β_{k}^{0}$, every ratio $N_{k} m_{s}^{*} / m_{s} (β_{k})$ equals 1. Since ϕ_λ(1) = ϕ'_λ(1) = 0, it follows that $F_{i}^{(λ)} (m_{k}^{*} (β_{k}^{0}), β_{k}^{0}) = 0$ for all i = 1, …, J + 1. We shall now establish that the Jacobian matrix

\frac{\partial F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k})}{\partial β_{k}} = {\left(\frac{\partial F_{i}^{(λ)} (m_{k}^{*}, β_{k})}{\partial θ_{kj}}\right)}_{i, j = 1, \dots, J + 1}

is nonsingular when $(m_{k}^{*}, β_{k}) = (m_{k}^{*} (β_{k}^{0}), β_{k}^{0})$ . For i, j = 1, …, J + 1

\frac{\partial F_{i}^{(λ)} (m_{k}^{*}, β_{k})}{\partial θ_{kj}} = \frac{\partial}{\partial θ_{kj}} \frac{\partial d_{λ} (N_{k} m_{k}^{*}, m_{k} (β_{k}))}{\partial θ_{ki}} = \frac{\partial}{\partial θ_{kj}} \left(\sum_{s = 1}^{M_{k}} m_{s} (β_{k}) x_{si} \left(ϕ_{λ} \left(\frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})}\right) - \frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})} ϕ_{λ}^{'} \left(\frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})}\right)\right)\right) = \sum_{s = 1}^{M_{k}} m_{s} (β_{k}) x_{si} x_{sj} \left(ϕ_{λ} \left(\frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})}\right) - \frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})} ϕ_{λ}^{'} \left(\frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})}\right)\right) + \sum_{s = 1}^{M_{k}} N_{k} m_{s}^{*} x_{si} x_{sj} \frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})} ϕ_{λ}^{″} \left(\frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})}\right),

and because $ϕ_{λ} (1) = ϕ_{λ}^{'} (1) = 0$ and $ϕ_{λ}^{″} (1) = 1$ for all λ,

{\frac{\partial F_{i}^{(λ)} (m_{k}^{*}, β_{k})}{\partial θ_{kj}} |}_{(m_{k}^{*}, β_{k}) = (m_{k}^{*} (β_{k}^{0}), β_{k}^{0})} = N_{k} \sum_{s = 1}^{M_{k}} m_{s}^{*} (β_{k}^{0}) x_{si} x_{sj} .

Hence,

{\left.\frac{\partial F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k})}{\partial β_{k}}\right|}_{(m_{k}^{*}, β_{k}) = (m_{k}^{*} (β_{k}^{0}), β_{k}^{0})} = N_{k} X_{k}^{T} Diag (m_{k}^{*} (β_{k}^{0})) X_{k} .

Applying the Implicit Function Theorem, there exist:

a neighbourhood U_k of $(m_{k}^{*} (β_{k}^{0}), β_{k}^{0})$ in $C^{M_{k}} \times ℝ^{J+1}$ such that $\partial F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k}) / \partial β_{k}$ is nonsingular for every $(m_{k}^{*}, β_{k}) \in U_{k}$;
an open set $A_{k} \subset C^{M_{k}}$ that contains $m_{k}^{*} (β_{k}^{0})$;
and a unique, continuously differentiable function ${\tilde{β}}_{k}^{(λ)}$ such that ${\tilde{β}}_{k}^{(λ)} (m_{k}^{*} (β_{k}^{0})) = β_{k}^{0}$ and
$\{(m_{k}^{*}, β_{k}) \in U_{k} : F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k}) = 0\} = \{(m_{k}^{*}, {\tilde{β}}_{k}^{(λ)} (m_{k}^{*})) : m_{k}^{*} \in A_{k}\} .$

Since

\min_{m_{k}^{*} \in A_{k}} d_{λ} (m_{k} (β_{k}^{0}), m_{k} ({\tilde{β}}_{k}^{(λ)} (m_{k}^{*}))) = \min_{β_{k} \in Θ_{k}} d_{λ} (m_{k} (β_{k}^{0}), m_{k} (β_{k})),

it holds

{\tilde{β}}_{k}^{(λ)} \left(\arg\min_{m_{k}^{*} \in A_{k}} d_{λ} (m_{k} (β_{k}^{0}), m_{k} ({\tilde{β}}_{k}^{(λ)} (m_{k}^{*})))\right) = \arg\min_{β_{k} \in Θ_{k}} d_{λ} (m_{k} (β_{k}^{0}), m_{k} (β_{k})),

that is

{\tilde{β}}_{k}^{(λ)} (m_{k}^{*} (β_{k}^{0})) = \arg\min_{β_{k} \in Θ_{k}} d_{λ} (N_{k} m_{k}^{*} (β_{k}^{0}), m_{k} (β_{k})) .

(24)

Furthermore, from the properties of power divergence measures and because ${\tilde{β}}_{k}^{(λ)} (m_{k}^{*} (β_{k}^{0})) = β_{k}^{0}$ , we have

0 = d_{λ} (m_{k} (β_{k}^{0}), m_{k} ({\tilde{β}}_{k}^{(λ)} (m_{k}^{*} (β_{k}^{0})))) < d_{λ} (m_{k} (β_{k}^{0}), m_{k} (β_{k})), \quad \forall m_{k} (β_{k}) \neq m_{k} (β_{k}^{0}) .

Applying the chain rule to differentiate $F_{N_{k}}^{(λ)} (m_{k}^{*}, {\tilde{β}}_{k}^{(λ)} (m_{k}^{*})) = 0$ with respect to $m_{k}^{*} \in A_{k}$, we have

{\left.\frac{\partial F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k})}{\partial m_{k}^{*}}\right|}_{β_{k} = {\tilde{β}}_{k}^{(λ)} (m_{k}^{*})} + {\left.\frac{\partial F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k})}{\partial β_{k}}\right|}_{β_{k} = {\tilde{β}}_{k}^{(λ)} (m_{k}^{*})} \frac{\partial {\tilde{β}}_{k}^{(λ)} (m_{k}^{*})}{\partial m_{k}^{*}} = 0,

so that for $m_{k}^{*} = m_{k}^{*} (β_{k}^{0})$

{\left.\frac{\partial {\tilde{β}}_{k}^{(λ)} (m_{k}^{*})}{\partial m_{k}^{*}}\right|}_{m_{k}^{*} = m_{k}^{*} (β_{k}^{0})} = - {\left({\left.\frac{\partial F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k})}{\partial β_{k}}\right|}_{(m_{k}^{*}, β_{k}) = (m_{k}^{*} (β_{k}^{0}), β_{k}^{0})}\right)}^{- 1} {\left.\frac{\partial F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k})}{\partial m_{k}^{*}}\right|}_{(m_{k}^{*}, β_{k}) = (m_{k}^{*} (β_{k}^{0}), β_{k}^{0})} .

The last expression is part of the Taylor expansion of ${\tilde{β}}_{k}^{(λ)} (m_{k}^{*})$ around $m_{k}^{*} (β_{k}^{0})$,

{\tilde{β}}_{k}^{(λ)} (m_{k}^{*}) = {\tilde{β}}_{k}^{(λ)} (m_{k}^{*} (β_{k}^{0})) + {\frac{\partial {\tilde{β}}_{k}^{(λ)} (m_{k}^{*})}{\partial m_{k}^{*}} |}_{m_{k}^{*} = m_{k}^{*} (β_{k}^{0})} (m_{k}^{*} - m_{k}^{*} (β_{k}^{0})) + o (‖ (m_{k}^{*} - m_{k}^{*} (β_{k}^{0})) ‖) .

Taking the derivative of the $i$-th component $F_{i}^{(λ)} (m_{k}^{*}, β_{k})$ with respect to $m_{j}^{*}$, the $j$-th component of $m_{k}^{*}$, gives

\frac{\partial F_{i}^{(λ)} (m_{k}^{*}, β_{k})}{\partial m_{j}^{*}} = \frac{\partial}{\partial m_{j}^{*}} \frac{\partial d_{λ} (N_{k} m_{k}^{*}, m_{k} (β_{k}))}{\partial β_{ki}} = \frac{\partial}{\partial m_{j}^{*}} \sum_{s = 1}^{M_{k}} m_{s} (β_{k}) x_{si} (ϕ_{λ} (\frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})}) - \frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})} ϕ_{λ}^{'} (\frac{N_{k} m_{s}^{*}}{m_{s} (β_{k})})) = - N_{k} \frac{N_{k} m_{j}^{*}}{m_{j} (β_{k})} x_{ji} ϕ_{λ}^{''} (\frac{N_{k} m_{j}^{*}}{m_{j} (β_{k})}),

that is, since $N_{k} m_{j}^{*} / m_{j} (β_{k}) = 1$ at $(β_{k}^{0}, m_{k}^{*} (β_{k}^{0}))$ and $ϕ_{λ}^{''} (1) = 1$,

{\frac{\partial F_{i}^{(λ)} (m_{k}^{*}, β_{k})}{\partial m_{j}^{*}} |}_{(β_{k}, m_{k}^{*}) = (β_{k}^{0}, m_{k}^{*} (β_{k}^{0}))} = - N_{k} x_{ji},

and hence

{\frac{\partial F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k})}{\partial m_{k}^{*}} |}_{(m_{k}^{*}, β_{k}) = (m_{k}^{*} (β_{k}^{0}), β_{k}^{0})} = {({\frac{\partial F_{i}^{(λ)} (m_{k}^{*}, β_{k})}{\partial m_{j}^{*}} |}_{(m_{k}^{*}, β_{k}) = (m_{k}^{*} (β_{k}^{0}), β_{k}^{0})})}_{i = 1, \dots, J + 1; j = 1, \dots, M_{k}} = - N_{k} X_{k}^{T},

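while, since $ϕ_{λ} (r) - r ϕ_{λ}^{'} (r)$ vanishes at $r = 1$ and $ϕ_{λ}^{''} (1) = 1$, differentiating $F_{i}^{(λ)}$ with respect to $β_{k}$ gives ${\frac{\partial F_{N_{k}}^{(λ)} (m_{k}^{*}, β_{k})}{\partial β_{k}} |}_{(m_{k}^{*}, β_{k}) = (m_{k}^{*} (β_{k}^{0}), β_{k}^{0})} = N_{k} X_{k}^{T} Diag (m_{k}^{*} (β_{k}^{0})) X_{k}$; substituting both Jacobians into the Taylor expansion yields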

{\tilde{β}}_{k}^{(λ)} (m_{k}^{*}) = {\tilde{β}}_{k}^{(λ)} (m_{k}^{*} (β_{k}^{0})) + {(X_{k}^{T} Diag (m_{k}^{*} (β_{k}^{0})) X_{k})}^{- 1} X_{k}^{T} (m_{k}^{*} - m_{k}^{*} (β_{k}^{0})) + o (‖ (m_{k}^{*} - m_{k}^{*} (β_{k}^{0})) ‖) .

(25)
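Both Jacobians at the true point can be checked by finite differences. The sketch below assumes a log-linear mean $m_{s}(β_{k}) = \exp(x_{s}^{T} β_{k})$ without offsets (so that $\partial m_{s} / \partial β_{ki} = m_{s} x_{si}$, as in the derivative above) and the Cressie–Read $ϕ_{λ}$, for which $ϕ_{λ}(r) - r ϕ_{λ}^{'}(r) = (1 - r^{λ+1})/(λ + 1)$. The toy design, $β_{k}^{0}$, $N_{k}$ and the helper names (`g`, `means`, `F`, `jacobian`) are hypothetical.

```python
import math


def g(r: float, lam: float) -> float:
    """phi_lam(r) - r * phi_lam'(r) for the assumed Cressie-Read phi."""
    if abs(lam + 1.0) < 1e-12:
        return -math.log(r)
    return (1.0 - r ** (lam + 1.0)) / (lam + 1.0)


def means(X, beta):
    """Assumed log-linear means m_s(beta) = exp(x_s^T beta), no offsets."""
    return [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]


def F(m_star, beta, X, N, lam):
    """F_i = sum_s m_s(beta) x_si g(N m*_s / m_s(beta)): gradient of d_lam(N m*, m(beta))."""
    m = means(X, beta)
    return [sum(m[s] * X[s][i] * g(N * m_star[s] / m[s], lam) for s in range(len(X)))
            for i in range(len(beta))]


def jacobian(f, v, h=1e-6):
    """Central finite-difference Jacobian of f at v; entry [i][j] = d f_i / d v_j."""
    cols = []
    for j in range(len(v)):
        up, dn = list(v), list(v)
        up[j] += h
        dn[j] -= h
        cols.append([(a - b) / (2.0 * h) for a, b in zip(f(up), f(dn))])
    return [list(row) for row in zip(*cols)]
```

For a small design with two age groups and years (0, 1), the Jacobian with respect to $m_{k}^{*}$ should come out close to $-N_{k} X_{k}^{T}$ and the one with respect to $β_{k}$ close to $X_{k}^{T} Diag(m_{k}(β_{k}^{0})) X_{k}$ for every λ tried; that neither depends on λ is what makes the expansion (25) the same for the whole family.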

It is well known that, under Poisson sampling, $\frac{D_{k}}{N_{k}}$ converges almost surely (a.s.) to $m_{k}^{*} (β_{k}^{0})$ as $N_{k}$ increases. Hence $\frac{D_{k}}{N_{k}} \in A_{k}$ a.s. for $N_{k}$ large enough and, by the Implicit Function Theorem, $(\frac{D_{k}}{N_{k}}, {\tilde{β}}_{k}^{(λ)} (\frac{D_{k}}{N_{k}})) \in U$ a.s. for $N_{k}$ large enough. We can then conclude from (24) that

{\tilde{β}}_{k}^{(λ)} (\frac{D_{k}}{N_{k}}) = arg min_{β_{k} \in Θ_{k}} d_{λ} (N_{k} \frac{D_{k}}{N_{k}}, m_{k} (β_{k})) = arg min_{β_{k} \in Θ_{k}} d_{λ} (D_{k}, m_{k} (β_{k})),

which means that ${\hat{β}}_{k, λ} = {\tilde{β}}_{k}^{(λ)} (\frac{D_{k}}{N_{k}})$ , and hence from (25)

{\hat{β}}_{k, λ} - β_{k}^{0} = {(X_{k}^{T} Diag (m_{k} (β_{k}^{0})) X_{k})}^{- 1} X_{k}^{T} (D_{k} - m_{k} (β_{k}^{0})) + o (‖ \frac{D_{k} - m_{k} (β_{k}^{0})}{N_{k}} ‖) .
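A numerical sketch of this expansion: the code computes a minimum power divergence estimate with Newton's method (a solver choice made for the sketch, not necessarily the authors') and compares it with the leading term $β_{k}^{0} + (X_{k}^{T} Diag(m_{k}(β_{k}^{0})) X_{k})^{-1} X_{k}^{T} (D_{k} - m_{k}(β_{k}^{0}))$. It assumes a log-linear mean $m_{s}(β_{k}) = \exp(x_{s}^{T} β_{k})$ without offsets and the Cressie–Read $ϕ_{λ}$; all names and the numbers in the usage note are illustrative.

```python
import math


def g(r: float, lam: float) -> float:
    """phi_lam(r) - r * phi_lam'(r) for the assumed Cressie-Read phi."""
    if abs(lam + 1.0) < 1e-12:
        return -math.log(r)
    return (1.0 - r ** (lam + 1.0)) / (lam + 1.0)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def means(X, beta):
    """Assumed log-linear means m_s(beta) = exp(x_s^T beta), no offsets."""
    return [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]


def mpde(D, X, lam, beta, iters=50):
    """Newton's method for argmin_beta d_lam(D, m(beta)), started at beta."""
    p, S = len(beta), len(X)
    for _ in range(iters):
        m = means(X, beta)
        r = [d / ms for d, ms in zip(D, m)]
        grad = [sum(m[s] * X[s][i] * g(r[s], lam) for s in range(S)) for i in range(p)]
        # Hessian: sum_s m_s x_s x_s^T (g(r_s) + r_s^(lam+1)); equals X^T Diag(m) X when D = m.
        hess = [[sum(m[s] * X[s][i] * X[s][l] * (g(r[s], lam) + r[s] ** (lam + 1.0))
                     for s in range(S)) for l in range(p)] for i in range(p)]
        step = solve(hess, grad)
        beta = [b - st for b, st in zip(beta, step)]
        if max(abs(st) for st in step) < 1e-12:
            break
    return beta


def linear_approx(D, X, beta0):
    """Leading term beta0 + (X^T Diag(m) X)^{-1} X^T (D - m), with m = m(beta0)."""
    m, p, S = means(X, beta0), len(beta0), len(X)
    A = [[sum(m[s] * X[s][i] * X[s][l] for s in range(S)) for l in range(p)] for i in range(p)]
    rhs = [sum(X[s][i] * (D[s] - m[s]) for s in range(S)) for i in range(p)]
    return [b + d for b, d in zip(beta0, solve(A, rhs))]
```

For example, with design rows `[1,0,0], [1,0,1], [0,1,0], [0,1,1]`, `beta0 = [6.0, 5.5, 0.1]` and counts `D = [424.0, 435.0, 249.0, 251.0]`, the distance between `mpde(D, X, lam, beta0)` and `linear_approx(D, X, beta0)` should be a small fraction of the distance to `beta0` for each λ in (-0.5, 0, 2/3, 1), in line with the $o(\cdot)$ remainder.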

Taking into account that ${\hat{β}}_{1 k, λ} - β_{1 k}^{0} = e_{J + 1}^{T} ({\hat{β}}_{k, λ} - β_{k}^{0})$, where $e_{J + 1}^{T} = (0, \dots, 0, 1)$, we now show that $e_{J + 1}^{T} {(X_{k}^{T} Diag (m_{k} (β_{k}^{0})) X_{k})}^{- 1} = σ_{1 k}^{2} {\tilde{t}}_{k}^{T} (β_{k}^{0})$. For that purpose we partition the design matrix as $X_{k} = (U, υ)$, where $U = I_{J} ⊗ 1_{I_{k}}$ and $υ = 1_{J} ⊗ t_{k}$, so that for

{(X_{k}^{T} Diag (m_{k} (β_{k}^{0})) X_{k})}^{- 1} = {(\begin{matrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{matrix})}^{- 1} = (\begin{matrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{matrix}),

with

\begin{matrix} A_{11} = U^{T} Diag (m_{k} (β_{k}^{0})) U = Diag ({N_{kj}}_{j = 1}^{J}), \\ A_{12} = U^{T} Diag (m_{k} (β_{k}^{0})) υ = {(\sum_{i = 1}^{I_{k}} m_{k 1 i} (β_{k}^{0}) t_{ki}, \dots, \sum_{i = 1}^{I_{k}} m_{kJi} (β_{k}^{0}) t_{ki})}^{T} = A_{21}^{T}, \\ A_{22} = υ^{T} Diag (m_{k} (β_{k}^{0})) υ = \sum_{j = 1}^{J} \sum_{i = 1}^{I_{k}} m_{kji} (β_{k}^{0}) t_{ki}^{2}, \end{matrix}

we can use the partitioned-inverse formula

{\begin{matrix} B_{11} = A_{11}^{- 1} + A_{11}^{- 1} A_{12} B_{22} A_{21} A_{11}^{- 1} \\ B_{21} = B_{12}^{T} = - B_{22} A_{21} A_{11}^{- 1} \\ B_{22} = {(A_{22} - A_{21} A_{11}^{- 1} A_{12})}^{- 1} \end{matrix} .

(26)
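Formula (26) is the standard partitioned-inverse (Schur complement) identity. The hypothetical helper below applies it when $A_{11}$ is diagonal, as here, and returns $B_{11}$, $B_{21}$ and $B_{22}$; multiplying the reassembled inverse by the original matrix should reproduce the identity matrix up to rounding.

```python
def block_inverse(a11_diag, a12, a22):
    """Blocks of the inverse of [[Diag(a11_diag), a12], [a12^T, a22]] via formula (26).

    a11_diag: diagonal of A11; a12: entries of the column A12 (= A21^T); a22: scalar A22.
    """
    inv11 = [1.0 / d for d in a11_diag]
    w = [a * i for a, i in zip(a12, inv11)]                       # A21 A11^{-1}
    b22 = 1.0 / (a22 - sum(wi * ai for wi, ai in zip(w, a12)))    # (A22 - A21 A11^{-1} A12)^{-1}
    b21 = [-b22 * wi for wi in w]                                 # -B22 A21 A11^{-1}
    n = len(a11_diag)
    # B11 = A11^{-1} + A11^{-1} A12 B22 A21 A11^{-1}
    b11 = [[(inv11[r] if r == c else 0.0) + w[r] * b22 * w[c] for c in range(n)]
           for r in range(n)]
    return b11, b21, b22


def assemble(b11, b21, b22):
    """Full symmetric inverse [[B11, B21^T], [B21, B22]] as a nested list."""
    rows = [list(r) + [b] for r, b in zip(b11, b21)]
    rows.append(list(b21) + [b22])
    return rows
```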

It follows that

e_{J + 1}^{T} {(X_{k}^{T} Diag (m_{k} (β_{k}^{0})) X_{k})}^{- 1} = (\begin{matrix} B_{21} & B_{22} \end{matrix}) = (\begin{matrix} - B_{22} A_{21} A_{11}^{- 1} & B_{22} \end{matrix}) = B_{22} (\begin{matrix} - A_{21} A_{11}^{- 1} & 1 \end{matrix}),

where

A_{21} A_{11}^{- 1} = (\sum_{i = 1}^{I_{k}} m_{k 1 i} (β_{k}^{0}) t_{ki}, \dots, \sum_{i = 1}^{I_{k}} m_{kJi} (β_{k}^{0}) t_{ki}) Diag ({N_{kj}^{- 1}}_{j = 1}^{J}) = (N_{k 1}^{- 1} \sum_{i = 1}^{I_{k}} m_{k 1 i} (β_{k}^{0}) t_{ki}, \dots, N_{kJ}^{- 1} \sum_{i = 1}^{I_{k}} m_{kJi} (β_{k}^{0}) t_{ki}) = ({\tilde{t}}_{k 1} (β_{k}^{0}), \dots, {\tilde{t}}_{kJ} (β_{k}^{0}))

and

B_{22} = {(\sum_{j = 1}^{J} \sum_{i = 1}^{I_{k}} m_{kji} (β_{k}^{0}) t_{ki}^{2} - \sum_{j = 1}^{J} (\sum_{i = 1}^{I_{k}} m_{kji} (β_{k}^{0})) {\tilde{t}}_{kj}^{2} (β_{k}^{0}))}^{- 1} = {(\sum_{j = 1}^{J} \sum_{i = 1}^{I_{k}} m_{kji} (β_{k}^{0}) t_{ki}^{2} - \sum_{j = 1}^{J} (\sum_{i = 1}^{I_{k}} m_{kji} (β_{k}^{0})) {\tilde{t}}_{kj}^{2} (β_{k}^{0}) \pm \sum_{j = 1}^{J} (\sum_{i = 1}^{I_{k}} m_{kji} (β_{k}^{0}) t_{ki}) {\tilde{t}}_{kj} (β_{k}^{0}))}^{- 1} = {(\sum_{j = 1}^{J} \sum_{i = 1}^{I_{k}} m_{kji} (β_{k}^{0}) {(t_{ki} - {\tilde{t}}_{kj} (β_{k}^{0}))}^{2})}^{- 1},

because $\sum_{j = 1}^{J} (\sum_{i = 1}^{I_{k}} m_{kji} (β_{k}^{0})) {\tilde{t}}_{kj}^{2} (β_{k}^{0}) = \sum_{j = 1}^{J} (\sum_{i = 1}^{I_{k}} m_{kji} (β_{k}^{0}) t_{ki}) {\tilde{t}}_{kj} (β_{k}^{0})$ .
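The conclusion can be checked numerically: the last row of $(X_{k}^{T} Diag(m_{k}(β_{k}^{0})) X_{k})^{-1}$ should equal $B_{22}(-{\tilde{t}}_{k1}, \dots, -{\tilde{t}}_{kJ}, 1)$ with $B_{22}^{-1} = \sum_{j} \sum_{i} m_{kji}(t_{ki} - {\tilde{t}}_{kj})^{2}$. In this sketch the diagonal of $A_{11} = U^{T} Diag(m_{k}) U$ is computed directly as $\sum_{i} m_{kji}$, so ${\tilde{t}}_{kj}$ is taken to be the $m$-weighted mean of the $t_{ki}$ within age group $j$; the expected counts and years in the check are made up, and `invert` and `last_row_check` are hypothetical helpers.

```python
def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def last_row_check(m, t):
    """Last row of (X^T Diag(m) X)^{-1} for X = (I_J kron 1_I, 1_J kron t) and its closed form.

    m: J x I table of expected counts m_kji; t: the I years t_ki.
    """
    J, I = len(m), len(t)
    rows, w = [], []
    for j in range(J):  # rows ordered by age group j, then year i
        for i in range(I):
            rows.append([1.0 if jj == j else 0.0 for jj in range(J)] + [t[i]])
            w.append(m[j][i])
    p = J + 1
    M = [[sum(w[s] * rows[s][a] * rows[s][b] for s in range(len(rows))) for b in range(p)]
         for a in range(p)]
    last = invert(M)[-1]
    t_tilde = [sum(m[j][i] * t[i] for i in range(I)) / sum(m[j]) for j in range(J)]
    b22 = 1.0 / sum(m[j][i] * (t[i] - t_tilde[j]) ** 2 for j in range(J) for i in range(I))
    closed = [-b22 * tt for tt in t_tilde] + [b22]
    return last, closed
```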

Proof of Theorem 5

Multiplying both sides of the expansion in Theorem 4 by $\sqrt{N_{k}}$, we obtain

\sqrt{N_{k}} ({\hat{β}}_{1 k, λ} - β_{1 k}^{0}) = a_{k}^{T} \sqrt{N_{k}} (D_{k} - m_{k} (β_{k}^{0})) + o (‖ \sqrt{N_{k}} (\frac{D_{k}}{N_{k}} - m_{k}^{*} (β_{k}^{0})) ‖),

with $a_{k}^{T} \equiv σ_{1 k}^{2} {\tilde{t}}_{k}^{T} (β_{k}^{0}) X_{k}^{T}$. The leading term is a linear function of $\sqrt{N_{k}} (\frac{D_{k}}{N_{k}} - m_{k}^{*} (β_{k}^{0}))$, whose asymptotic distribution is

\sqrt{N_{k}} (\frac{D_{k}}{N_{k}} - m_{k}^{*} (β_{k}^{0})) \to_{N_{k} \to \infty}^{ℒ} 𝒩 (0, Diag (m_{k}^{*} (β_{k}^{0}))) .

Because

Var (a_{k}^{T} \sqrt{N_{k}} (D_{k} - m_{k} (β_{k}^{0}))) = a_{k}^{T} Var (N_{k} \sqrt{N_{k}} (\frac{D_{k}}{N_{k}} - m_{k}^{*} (β_{k}^{0}))) a_{k} = N_{k}^{2} a_{k}^{T} Diag (m_{k}^{*} (β_{k}^{0})) a_{k} = N_{k} σ_{1 k}^{2},

it follows that

a_{k}^{T} \sqrt{N_{k}} (D_{k} - m_{k} (β_{k}^{0})) \to_{N_{k} \to \infty}^{ℒ} 𝒩 (0, N_{k} σ_{1 k}^{2}) .

(27)

Since $o (‖ \sqrt{N_{k}} (\frac{D_{k}}{N_{k}} - m_{k}^{*} (β_{k}^{0})) ‖) = o (O_{P} (1)) = o_{P} (1)$, Slutsky's theorem implies that the asymptotic distribution of $\sqrt{N_{k}} ({\hat{β}}_{1 k, λ} - β_{1 k}^{0})$ coincides with that in (27).
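A small Monte Carlo sketch of the variance identity behind (27): with independent Poisson counts, $a_{k}^{T}(D_{k} - m_{k}(β_{k}^{0}))$ has variance $a_{k}^{T} Diag(m_{k}(β_{k}^{0})) a_{k} = σ_{1k}^{2}$. It assumes, consistently with the proof of Theorem 4, that $σ_{1k}^{2} = B_{22}$ and that the entries of $a_{k} = σ_{1k}^{2} X_{k} {\tilde{t}}_{k}(β_{k}^{0})$ are $σ_{1k}^{2}(t_{ki} - {\tilde{t}}_{kj})$, with ${\tilde{t}}_{kj}$ the $m$-weighted mean year. Python's standard library has no Poisson sampler, so a hypothetical Knuth-style `poisson` helper is used; the means are illustrative.

```python
import math
import random


def poisson(mu: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def slope_weights(m, t):
    """Entries sigma^2 (t_i - t_tilde_j) of a_k, with sigma^2 = 1 / sum m (t - t_tilde)^2."""
    t_tilde = [sum(mj[i] * t[i] for i in range(len(t))) / sum(mj) for mj in m]
    sigma2 = 1.0 / sum(mj[i] * (t[i] - tt) ** 2 for mj, tt in zip(m, t_tilde) for i in range(len(t)))
    a = [[sigma2 * (t[i] - tt) for i in range(len(t))] for tt in t_tilde]
    return a, sigma2


def simulate(m, t, reps=4000, seed=1):
    """Sample variance of a_k^T (D_k - m_k) over draws D_kji ~ Poisson(m_kji)."""
    rng = random.Random(seed)
    a, _ = slope_weights(m, t)
    vals = []
    for _ in range(reps):
        vals.append(sum(aj[i] * (poisson(mj[i], rng) - mj[i])
                        for aj, mj in zip(a, m) for i in range(len(t))))
    mean = sum(vals) / reps
    return sum((v - mean) ** 2 for v in vals) / (reps - 1)
```

The exact value $\sum_{s} a_{ks}^{2} m_{ks}$ equals `sigma2` by construction, and the simulated variance should agree with it up to Monte Carlo error (a few percent for 4000 replications).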

Proof of Theorem 7

Subtracting the expansion of ${\hat{β}}_{12, λ} - β_{12}^{0}$ from that of ${\hat{β}}_{11, λ} - β_{11}^{0}$ given in Theorem 4, we get

({\hat{β}}_{11, λ} - β_{11}^{0}) - ({\hat{β}}_{12, λ} - β_{12}^{0}) = σ_{11}^{2} {\tilde{t}}_{1}^{T} (β_{1}^{0}) X_{1}^{T} ((D_{1}^{(1)} - m_{1}^{(1)} (β_{1}^{0})) + (D_{1}^{(2)} - m_{1}^{(2)} (β_{1}^{0}))) - σ_{12}^{2} {\tilde{t}}_{2}^{T} (β_{2}^{0}) X_{2}^{T} ((D_{2}^{(1)} - m_{2}^{(1)} (β_{2}^{0})) + (D_{2}^{(2)} - m_{2}^{(2)} (β_{2}^{0}))) + o (‖ \frac{D_{1} - m_{1} (β_{1}^{0})}{N_{1}} ‖) - o (‖ \frac{D_{2} - m_{2} (β_{2}^{0})}{N_{2}} ‖) .

Observe that $X_{k}^{T} D_{k}^{(2)} = {\bar{X}}_{k}^{T} {\bar{D}}^{(2)}$, k = 1, 2, and under $β_{11}^{0} = β_{12}^{0}$ we have $X_{k}^{T} m_{k}^{(2)} (β_{k}^{0}) = {\bar{X}}_{k}^{T} {\bar{m}}^{(2)} (β^{0})$, k = 1, 2. In addition, the $o(\cdot)$ terms are unaffected by the negative sign, and under $β_{11}^{0} = β_{12}^{0}$ we have $β_{1}^{0} = β_{2}^{0}$; thus we obtain (18).

Proof of Theorem 8

We can consider the following decomposition

\sqrt{N} ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ}) = (N a_{1}^{T}) \sqrt{N} \frac{D_{1} - m_{1} (β^{0})}{N} - (N a_{2}^{T}) \sqrt{N} \frac{D_{2} - m_{2} (β^{0})}{N} + \sqrt{N} Y,

(28)

with

\sqrt{N} Y = o (\frac{1}{N_{1}^{*}} ‖ \frac{D_{1} - m_{1} (β^{0})}{\sqrt{N}} ‖) + o (\frac{1}{N_{2}^{*}} ‖ \frac{D_{2} - m_{2} (β^{0})}{\sqrt{N}} ‖),

rather than (18). Note that, by Assumptions 3 and 6, $m_{k} (β^{0}) / N = N_{k}^{*} m^{*} (β^{0})$ does not change as N increases, and hence $\sqrt{N} Y = o (‖ \frac{D_{1} - m_{1} (β^{0})}{\sqrt{N}} ‖) + o (‖ \frac{D_{2} - m_{2} (β^{0})}{\sqrt{N}} ‖) = o (O_{P} (1)) + o (O_{P} (1)) = o_{P} (1)$. It therefore remains to derive the asymptotic distribution of the linear terms, which depend on

\sqrt{N} \frac{D_{k} - m_{k} (β^{0})}{N} \to_{N \to \infty}^{ℒ} 𝒩 (0, Diag (N_{k}^{*} m^{*} (β^{0}))) .

From (28) and Slutsky's theorem, we conclude that the asymptotic distribution of $\sqrt{N} ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ})$ is a centered normal distribution. To calculate its variance we follow (18) and write

\sqrt{N} ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ}) = \sqrt{N} X_{1} + \sqrt{N} X_{2} + \sqrt{N} X_{3} + \sqrt{N} Y,

with

\begin{matrix} \sqrt{N} X_{1} = & a_{1}^{T} \sqrt{N} (D_{1}^{(1)} - m_{1}^{(1)} (β^{0})), \\ \sqrt{N} X_{2} = & - a_{2}^{T} \sqrt{N} (D_{2}^{(1)} - m_{2}^{(1)} (β^{0})), \\ \sqrt{N} X_{3} = & ({\bar{a}}_{1}^{T} - {\bar{a}}_{2}^{T}) \sqrt{N} ({\bar{D}}^{(2)} - {\bar{m}}^{(2)} (β^{0})), \\ \sqrt{N} Y = & o_{P} (1) \end{matrix}

where ${\bar{a}}_{k}^{T} \equiv σ_{1 k}^{2} {\tilde{t}}_{k}^{T} (β^{0}) {\bar{X}}_{k}^{T}$, and X₁, X₂ and X₃ are independent random variables, because they depend on disjoint sets of independent Poisson counts. Since

Var (\sqrt{N} X_{k}) = Var (a_{k}^{T} \sqrt{N} (D_{k}^{(1)} - m_{k}^{(1)} (β^{0}))) = N a_{k}^{T} Diag (m_{k}^{(1)} (β^{0})) a_{k}, k = 1, 2,

Var (\sqrt{N} X_{3}) = Var (({\bar{a}}_{1}^{T} - {\bar{a}}_{2}^{T}) \sqrt{N} ({\bar{D}}^{(2)} - {\bar{m}}^{(2)} (β^{0}))) = N ({\bar{a}}_{1}^{T} - {\bar{a}}_{2}^{T}) Diag ({\bar{m}}^{(2)} (β^{0})) ({\bar{a}}_{1} - {\bar{a}}_{2}) = N ({\bar{a}}_{1}^{T} Diag ({\bar{m}}^{(2)} (β^{0})) {\bar{a}}_{1} + {\bar{a}}_{2}^{T} Diag ({\bar{m}}^{(2)} (β^{0})) {\bar{a}}_{2} - 2 {\bar{a}}_{1}^{T} Diag ({\bar{m}}^{(2)} (β^{0})) {\bar{a}}_{2}) = N (a_{1}^{T} Diag (m_{1}^{(2)} (β^{0})) a_{1} + a_{2}^{T} Diag (m_{2}^{(2)} (β^{0})) a_{2} - 2 σ_{11}^{2} σ_{12}^{2} ξ_{12}),

with

ξ_{12} = {\tilde{t}}_{1}^{T} (β^{0}) {\bar{X}}_{1}^{T} Diag ({\bar{m}}^{(2)} (β^{0})) {\bar{X}}_{2} {\tilde{t}}_{2} (β^{0}) = \sum_{j = 1}^{J} \sum_{i = 1}^{I_{1} - \bar{I}} m_{2 ji}^{(2)} (β^{0}) (t_{2 i} - {\tilde{t}}_{1 j} (β^{0})) (t_{2 i} - {\tilde{t}}_{2 j} (β^{0})) = \sum_{j = 1}^{J} \sum_{i = 1}^{I_{1} - \bar{I}} \frac{n_{2 ji}^{(2)}}{n_{2 ji}} m_{2 ji} (β^{0}) (t_{2 i} - {\tilde{t}}_{1 j} (β^{0})) (t_{2 i} - {\tilde{t}}_{2 j} (β^{0})),

we obtain

Var (\sqrt{N} (X_{1} + X_{2} + X_{3})) = N (a_{1}^{T} Diag (m_{1}^{(1)} (β^{0}) + m_{1}^{(2)} (β^{0})) a_{1} + a_{2}^{T} Diag (m_{2}^{(1)} (β^{0}) + m_{2}^{(2)} (β^{0})) a_{2} - 2 σ_{11}^{2} σ_{12}^{2} ξ_{12}) = N (σ_{11}^{2} + σ_{12}^{2} - 2 σ_{11}^{2} σ_{12}^{2} ξ_{12}),

which coincides with the asymptotic variance of $\sqrt{N} ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ})$.
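The variance formula can be sketched numerically under simplifying assumptions made only for illustration: both regions share the same J age groups and years t, each region's expected counts split into a part of its own plus a common overlap, and ${\tilde{t}}_{kj}$ is the $m$-weighted mean year. The hypothetical function below evaluates $σ_{11}^{2} + σ_{12}^{2} - 2σ_{11}^{2}σ_{12}^{2}ξ_{12}$ and, as a cross-check, the variance of the linear terms summed over the three independent sets of counts.

```python
def region_quantities(m, t):
    """Weighted mean years t_tilde_j and sigma^2 = 1 / sum_j sum_i m_ji (t_i - t_tilde_j)^2."""
    t_tilde = [sum(mj[i] * t[i] for i in range(len(t))) / sum(mj) for mj in m]
    s = sum(mj[i] * (t[i] - tt) ** 2 for mj, tt in zip(m, t_tilde) for i in range(len(t)))
    return t_tilde, 1.0 / s


def apc_difference_variance(m_only1, m_only2, m_overlap, t):
    """Closed form sigma11^2 + sigma12^2 - 2 sigma11^2 sigma12^2 xi12, plus a direct check.

    Each argument is a J x I table of expected counts; region k = (its own part) + overlap.
    """
    J, I = len(m_overlap), len(t)
    m1 = [[m_only1[j][i] + m_overlap[j][i] for i in range(I)] for j in range(J)]
    m2 = [[m_only2[j][i] + m_overlap[j][i] for i in range(I)] for j in range(J)]
    tt1, s11 = region_quantities(m1, t)
    tt2, s12 = region_quantities(m2, t)
    xi = sum(m_overlap[j][i] * (t[i] - tt1[j]) * (t[i] - tt2[j])
             for j in range(J) for i in range(I))
    closed = s11 + s12 - 2.0 * s11 * s12 * xi
    # Direct variance of a_1^T D_1 - a_2^T D_2 over the three independent sets of counts.
    direct = 0.0
    for j in range(J):
        for i in range(I):
            w1 = s11 * (t[i] - tt1[j])
            w2 = s12 * (t[i] - tt2[j])
            direct += (w1 ** 2 * m_only1[j][i] + w2 ** 2 * m_only2[j][i]
                       + (w1 - w2) ** 2 * m_overlap[j][i])
    return closed, direct, xi, s11, s12
```

With an empty overlap, $ξ_{12} = 0$ and the variance reduces to $σ_{11}^{2} + σ_{12}^{2}$, as for two independent regions.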

Proof of Corollary 9

Since

{\tilde{t}}_{kj} (β^{0}) = {\tilde{t}}_{kj}^{(2)} (β^{0}) + \frac{m_{kj •}^{(1)}}{m_{kj •}} ({\tilde{t}}_{kj}^{(1)} (β^{0}) - {\tilde{t}}_{kj}^{(2)} (β^{0})), k = 1, 2,

formula (20) can be rewritten as

ξ_{12} = \sum_{j = 1}^{J} \sum_{i = 1}^{I_{1} - \bar{I}} m_{2 ji}^{(2)} (β^{0}) (t_{2 i} - {\tilde{t}}_{1 j}^{(2)} (β^{0}) - \frac{m_{1 j •}^{(1)}}{m_{1 j •}} ({\tilde{t}}_{1 j}^{(1)} (β^{0}) - {\tilde{t}}_{1 j}^{(2)} (β^{0}))) (t_{2 i} - {\tilde{t}}_{2 j}^{(2)} (β^{0}) - \frac{m_{2 j •}^{(1)}}{m_{2 j •}} ({\tilde{t}}_{2 j}^{(1)} (β^{0}) - {\tilde{t}}_{2 j}^{(2)} (β^{0}))) = \sum_{j = 1}^{J} \sum_{i = 1}^{I_{1} - \bar{I}} m_{2 ji}^{(2)} (β^{0}) {(t_{2 i} - {\tilde{t}}_{2 j}^{(2)} (β^{0}))}^{2} + \sum_{j = 1}^{J} \sum_{i = 1}^{I_{1} - \bar{I}} m_{2 ji}^{(2)} (β^{0}) \frac{m_{1 j •}^{(1)}}{m_{1 j •}} \frac{m_{2 j •}^{(1)}}{m_{2 j •}} ({\tilde{t}}_{1 j}^{(1)} (β^{0}) - {\tilde{t}}_{1 j}^{(2)} (β^{0})) ({\tilde{t}}_{2 j}^{(1)} (β^{0}) - {\tilde{t}}_{2 j}^{(2)} (β^{0})) - \sum_{k = 1}^{2} \sum_{j = 1}^{J} \sum_{i = 1}^{I_{1} - \bar{I}} m_{2 ji}^{(2)} (β^{0}) (t_{2 i} - {\tilde{t}}_{2 j}^{(2)} (β^{0})) \frac{m_{kj •}^{(1)}}{m_{kj •}} ({\tilde{t}}_{kj}^{(1)} (β^{0}) - {\tilde{t}}_{kj}^{(2)} (β^{0})) .

The last summand vanishes because

\sum_{j = 1}^{J} \sum_{i = 1}^{I_{1} - \bar{I}} m_{2 ji}^{(2)} (β^{0}) (t_{2 i} - {\tilde{t}}_{2 j}^{(2)} (β^{0})) \frac{m_{kj •}^{(1)}}{m_{kj •}} ({\tilde{t}}_{kj}^{(1)} (β^{0}) - {\tilde{t}}_{kj}^{(2)} (β^{0})) = \sum_{j = 1}^{J} \frac{m_{kj •}^{(1)}}{m_{kj •}} ({\tilde{t}}_{kj}^{(1)} (β^{0}) - {\tilde{t}}_{kj}^{(2)} (β^{0})) \sum_{i = 1}^{I_{1} - \bar{I}} m_{2 ji}^{(2)} (β^{0}) (t_{2 i} - {\tilde{t}}_{2 j}^{(2)} (β^{0}))

and $\sum_{i = 1}^{I_{1} - \bar{I}} m_{2 ji}^{(2)} (β^{0}) (t_{2 i} - {\tilde{t}}_{2 j}^{(2)} (β^{0})) = 0$, since ${\tilde{t}}_{2 j}^{(2)} (β^{0})$ is the $m_{2 ji}^{(2)} (β^{0})$-weighted mean of the $t_{2 i}$. Hence (22) holds.
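Both facts used in this proof are properties of weighted means: the weighted mean over the union of two parts equals the second part's mean shifted by the first part's weight share times the difference of the two means, and weighted deviations from a weighted mean sum to zero. A quick sketch for a single age group with made-up expected counts, assuming that ${\tilde{t}}_{kj}^{(1)}$ and ${\tilde{t}}_{kj}^{(2)}$ are the weighted mean years of the non-overlapping and overlapping parts (helper names are hypothetical):

```python
def weighted_mean(w, t):
    """Weighted mean year: sum_i w_i t_i / sum_i w_i."""
    return sum(wi * ti for wi, ti in zip(w, t)) / sum(w)


def decomposition_gap(m1_part, m2_part, t):
    """t_tilde minus [t_tilde^(2) + (m^(1)_. / m_.)(t_tilde^(1) - t_tilde^(2))]; should be 0."""
    total = [a + b for a, b in zip(m1_part, m2_part)]
    tt = weighted_mean(total, t)
    tt1, tt2 = weighted_mean(m1_part, t), weighted_mean(m2_part, t)
    share1 = sum(m1_part) / sum(total)
    return tt - (tt2 + share1 * (tt1 - tt2))


def centered_sum(w, t):
    """sum_i w_i (t_i - weighted_mean(w, t)), which vanishes for any positive weights."""
    tt = weighted_mean(w, t)
    return sum(wi * (ti - tt) for wi, ti in zip(w, t))
```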

If region 2 is completely contained in region 1, then $m_{2 j •}^{(1)} = 0$ and the second sum in (22) vanishes, so that $ξ_{12} = 1 / σ_{12}^{2}$; therefore

Var ({\hat{β}}_{11, λ} - {\hat{β}}_{12, λ}) = σ_{12}^{2} + σ_{11}^{2} - 2 σ_{12}^{2} σ_{11}^{2} ξ_{12} = σ_{12}^{2} + σ_{11}^{2} - 2 σ_{11}^{2} = σ_{12}^{2} - σ_{11}^{2},

and (23) follows.
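A short numerical sketch of the nested case with made-up expected counts: when region 2 lies entirely inside region 1, the overlap is all of region 2, $ξ_{12}$ reduces to $1/σ_{12}^{2}$ and the variance to $σ_{12}^{2} - σ_{11}^{2}$, which is positive because region 1 carries more information. The sketch assumes that ${\tilde{t}}_{kj}$ is the $m$-weighted mean year and that both regions share the same years; `nested_check` is a hypothetical helper.

```python
def nested_check(m1_extra, m2, t):
    """Region 2 (means m2) inside region 1 (means m1_extra + m2).

    Returns xi12, sigma11^2, sigma12^2 and sigma11^2 + sigma12^2 - 2 sigma11^2 sigma12^2 xi12.
    """
    def quantities(m):
        tt = [sum(mj[i] * t[i] for i in range(len(t))) / sum(mj) for mj in m]
        s = sum(mj[i] * (t[i] - tj) ** 2 for mj, tj in zip(m, tt) for i in range(len(t)))
        return tt, 1.0 / s

    m1 = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(m1_extra, m2)]
    tt1, s11 = quantities(m1)
    tt2, s12 = quantities(m2)
    # The whole of region 2 is the overlap, so the overlap means are m2 itself.
    xi = sum(m2[j][i] * (t[i] - tt1[j]) * (t[i] - tt2[j])
             for j in range(len(m2)) for i in range(len(t)))
    return xi, s11, s12, s11 + s12 - 2.0 * s11 * s12 * xi
```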

References

1. Berkson J. Minimum chi-square, not maximum likelihood! Annals of Statistics. 1980;8:457–487.
2. Bishop YMM, Fienberg SE, Holland PW. Discrete Multivariate Analysis: Theory and Practice. Cambridge: MIT Press; 1995.
3. Fay M, Tiwari R, Feuer E, Zou Z. Estimating average annual percent change for disease rates without assuming constant change. Biometrics. 2006;62:847–854. doi: 10.1111/j.1541-0420.2006.00528.x.
4. Horner MJ, Ries LAG, Krapcho M, Neyman N, Aminou R, Howlader N, Altekruse SF, Feuer EJ, Huang L, Mariotto A, Miller BA, Lewis DR, Eisner MP, Stinchcomb DG, Edwards BK, editors. SEER Cancer Statistics Review, 1975–2006. Bethesda, MD: National Cancer Institute. http://seer.cancer.gov/csr/1975_2006/
5. Imrey PB. Power Divergence Methods. In: Armitage P, Colton T, editors. Encyclopedia of Biostatistics. New York: John Wiley and Sons; 2005.
6. Kullback S, Leibler RA. On information and sufficiency. The Annals of Mathematical Statistics. 1951;22:79–86.
7. Li Y, Tiwari RC. Comparing Trends in Cancer Rates Across Overlapping Regions. Biometrics. 2008;64:1280–1286. doi: 10.1111/j.1541-0420.2008.01002.x.
8. Li Y, Tiwari RC, Zou Z. An age-stratified model for comparing trends in cancer rates across overlapping regions. Biometrical Journal. 2008;50:608–619. doi: 10.1002/bimj.200710430.
9. Pardo L. Statistical Inference Based on Divergence Measures. New York: Chapman & Hall / CRC (Statistics: Textbooks and Monographs); 2006.
10. Pardo L, Martín N. Homogeneity/heterogeneity hypotheses for standardized mortality ratios based on minimum power-divergence estimators. Biometrical Journal. 2009;51:819–836. doi: 10.1002/bimj.200800158.
11. Pickle LW, White AA. Effects of the choice of age-adjustment method on maps of death rates. Statistics in Medicine. 1995;14:615–627. doi: 10.1002/sim.4780140519.
12. Cressie N, Read TRC. Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society B. 1984;46:440–464.
13. Riddell C, Pliska JM. Cancer in Oregon, 2005: Annual Report on Cancer Incidence and Mortality among Oregonians. Portland, Oregon: Department of Human Services, Oregon Public Health Division, Oregon State Cancer Registry; 2008. http://egov.oregon.gov/DHS/ph/oscar/arpt2005/ar2005.pdf
14. Tiwari RC, Clegg L, Zou Z. Efficient interval estimation for age-adjusted cancer rates. Statistical Methods in Medical Research. 2006;15:547–569. doi: 10.1177/0962280206070621.
15. Walters KA, Li Y, Tiwari RC, Zou Z. A Weighted-Least-Squares Estimation Approach to Comparing Trends in Age-Adjusted Cancer Rates Across Overlapping Regions. Journal of Data Science. 2010;8:631–644.
