Author manuscript; available in PMC: 2012 Feb 23.
Published in final edited form as: J Multivar Anal. 2011 Sep 1;102(8):1175–1193. doi: 10.1016/j.jmva.2011.03.011

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

Nirian Martín 1,*, Yi Li 2,3
PMCID: PMC3285401  NIHMSID: NIHMS353973  PMID: 22368308

Abstract

The Annual Percent Change (APC) has been adopted as a useful measure for analyzing the changing trends of cancer mortality and incidence rates by the NCI SEER program. Difficulties, however, arise when comparing the sample APCs between two overlapping regions because of the induced dependence (e.g., comparing the cancer mortality change rate of California with the national level). This paper deals with a new perspective of understanding the sample distribution of the test statistics for comparing the APCs between overlapping regions. Our proposal allows for computational readiness and easy interpretability. We further propose a more general family of estimators, namely, the so-called minimum power divergence estimators, including the maximum likelihood estimators as a special case. Our simulation experiments support the superiority of the proposed estimator to the conventional maximum likelihood estimator. The proposed method is illustrated by the analysis of the SEER cancer mortality rates observed from 1991 to 2006.

Keywords: Minimum power divergence estimators, Age-adjusted cancer rates, Annual percent change (APC), Trends, Poisson sampling

1 Introduction

According to the World Health Statistics 2009, published by the World Health Organization, in 2004 the age-standardized mortality rate in high-income countries attributable to cancer deaths was 164 per 100,000. Cancer constituted the second highest cause of death after cardiovascular disease (its age-standardized mortality rate was equal to 408 per 100,000). For cancer prevention and control programs, such as the Surveillance, Epidemiology and End Results (SEER) in the United States (US), it is very important to rely on statistical tools to capture downward or upward trends of rates associated with each type of cancer and to measure their intensity accurately. These trends in cancer rates are defined within a specific spatial-temporal framework, that is, different geographic regions and time periods are considered.

Let $r_{ki}$ be the expected value of the cancer rate associated with region $k$ at the $i$-th time point in a sequence of ordered $I_k$ time points $\{t_{ki}\}_{i=1}^{I_k}$. We shall assume that Region 1 starts at the earliest time. Each point represents an equally spaced period of time, for instance a year, and thus without any loss of generality $t_{1i} = i$, $i = 1, \ldots, I_1$ (a change in origin or scale of the time axis should not affect a measure of trend). Cancer rates are useful for evaluating either the risk of developing cancer (cancer incidence rates) or of dying from cancer (cancer mortality rates) at a specific moment. Statistically, the trend in cancer rates is an average rate of change per year over a given, relatively short, period of time, under the assumption that the change is constant over time. The annual percent change (APC) is a suitable measure for comparing recent trends associated with age-adjusted expected cancer rates

$$r_{ki} = \sum_{j=1}^{J} \omega_j r_{kji}, \qquad (1)$$

where $J$ is the number of age-groups, $\{\omega_j\}_{j=1}^{J}$ is the age-distribution of the Standard Population ($\sum_{j=1}^{J}\omega_j = 1$, $\omega_j > 0$, $j = 1, \ldots, J$) and $\{r_{kji}\}_{j=1}^{J}$ is the set of expected rates associated with the $k$-th region ($k = 1, 2$) at the time-point $t_{ki}$ ($i = 1, \ldots, I_k$), or the $i$-th year, in each of the age-groups ($j = 1, \ldots, J$). For example, the SEER Program applies as standard the US population of year 2000 with $J = 19$ age-groups [0, 1), [1, 5), [5, 10), [10, 15), …, [80, 85), [85, *). The APC removes differences in scale by considering the proportion $(r_{k,i+1} - r_{k,i})/r_{k,i} = r_{k,i+1}/r_{k,i} - 1$ under the assumption of constant change of the expected rates. The proportionality constant $\theta_k = r_{k2}/r_{k1} = \cdots = r_{kI_k}/r_{k,I_k-1}$ constitutes the basis for defining $\mathrm{APC}_k = 100(\theta_k - 1)$ as a percentage associated with the expected rates $\{r_{kji}\}_{j=1}^{J}$ of the $k$-th region. Since the models that deal with the APCs consider the logarithm of age-adjusted cancer rates, the previous formula is usually replaced by

$$\mathrm{APC}_k = 100\,(\exp(\beta_{1k}) - 1), \qquad (2)$$

and we would like to make statistical inferences on parameter β1k.
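The relationship (2) between the slope parameter and the APC can be illustrated with a minimal Python sketch. The function names `apc_from_beta` and `beta_from_apc` are hypothetical helpers introduced here for illustration only; the slope values are the ones used later in the simulation study.

```python
import math


def apc_from_beta(beta_1k: float) -> float:
    """Annual percent change as in equation (2): 100 * (exp(beta_1k) - 1)."""
    return 100.0 * (math.exp(beta_1k) - 1.0)


def beta_from_apc(apc: float) -> float:
    """Inverse of equation (2): the slope beta_1k that yields a given APC."""
    return math.log(1.0 + apc / 100.0)


if __name__ == "__main__":
    # Slopes considered in the simulation study of this paper.
    for beta in (0.02, 0.005, 0.0, -0.005):
        print(f"beta_1k = {beta:+.3f}  ->  APC = {apc_from_beta(beta):+.2f}%")
```

For small slopes the APC is close to $100\,\beta_{1k}$, which is why testing equality of APCs reduces to testing equality of the slopes.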

The data that are collected for modeling the APC associated with region k, are:

  • dkji, the number of deaths (or incidences) in the k-th region, j-th age-group, at the time-point tki;

  • nkji, the population at risk in the k-th region, j-th age-group, at the time-point tki;

so that the r.v.s $D_{kji}$ that generate $d_{kji}$ are considered to be mutually independent. In a sampling framework we can define the empirical age-adjusted cancer rates as $R_{ki} = \sum_{j=1}^{J}\omega_j R_{kji} = \sum_{j=1}^{J}\omega_j D_{kji}/n_{kji}$, whose expected value is (1). Even though the assumption of “independence” associated with $D_{kji}$ simplifies the process of making statistical inference, in practice it is common to find situations in which the two APCs to be compared, APC1 and APC2, share some data because there is an overlap between the two regions. For example, in Riddell and Pliska (2008) county-level data on 22 selected cancer sites during 1996–2005 are analyzed, so that the APC of each county is compared with the APC of Oregon state. It is not possible to assume independence between the data of counties (local level) and their state (global level). Moreover, the APC comparison between overlapping regions is more complicated when the APCs do not refer to the same period of time. For instance, in the aforementioned study of Riddell and Pliska (2008), while the Oregon APC was obtained for a period of time ending in 2005, the US APC was calculated for a period of time ending in 2004 because the US data of year 2005 were not available. Figure 1 represents the most complicated overlapping case for two regions, where {1, …, 6} × {5, …, 8} is the set of points of the first region, {5, …, 9} × {2, …, 6} is the set of points of the second region, and {5, 6} × {5, 6} is the set of points of the overlapping region (boxed points). Each of the two regions has a portion of space and period of time not contained in the other one (circular points for region 1 and diamond points for region 2).
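The empirical age-adjusted rate $R_{ki}$ defined at the start of the preceding paragraph is a weighted sum of age-specific crude rates. A minimal sketch follows; the function name `age_adjusted_rate` and the three-group weights, counts and populations are made up for illustration (the SEER standard uses 19 age-groups).

```python
def age_adjusted_rate(weights, deaths, population):
    """R_ki = sum_j w_j * D_kji / n_kji for one region k and one time point i.

    weights    : standard-population shares w_j (must sum to 1)
    deaths     : observed counts D_kji, one per age group
    population : persons at risk n_kji, one per age group
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("standard-population weights must sum to 1")
    return sum(w * d / n for w, d, n in zip(weights, deaths, population))


if __name__ == "__main__":
    # Hypothetical three-age-group example, not taken from the paper.
    rate = age_adjusted_rate([0.5, 0.3, 0.2], [2, 10, 30], [10000, 10000, 10000])
    print(f"age-adjusted rate per 100,000: {1e5 * rate:.1f}")
```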

Figure 1.

Two overlapping regions not sharing the same period of time.

This paper is structured as follows. In Section 2 different models that establish the relationship between rki and β1k are reviewed and the two basic tools for making statistical inferences are presented, the estimators and test-statistics for equal APCs. Specifically, the Age-stratified Poisson Regression model, introduced for the first time in Li et al. (2008), is highlighted as the model that arises as an alternative to improve the previous ones. Based on Power-divergence measures, in Section 3 a family of estimators that generalize the maximum likelihood estimators (MLEs) are considered for the Age-stratified Poisson Regression model. In addition, a new point of view for computing the covariance between the MLEs of β1k is introduced inside the framework of this family of estimators and this is the key for substantially improving the Z-test statistic for testing the equality of APCs for the Age-stratified Poisson Regression model. In addition, such a methodology provides explicit and interpretable expressions of the covariance between the estimators of β1k. We evaluate the performance of the new proposed methodology in Section 4 through a simulation study and we also consider an application example to Breast and Thyroid cancer data from California (CA) and the US population, extracted from the SEER*STAT software of the SEER Program. Finally in Section 5 some concluding remarks are given.

2 Models associated with the Annual Percent Change (APC)

When non-overlapping regions are taken into account, there are basically two models that allow us to estimate the APC starting from slightly different assumptions: the Age-adjusted Cancer Rate Regression model and the Age-stratified Poisson Regression model. The main difference between them lies in the probability distribution of $D_{kji}$, the number of deaths in the $k$-th region, $j$-th age-group, at the time-point $t_{ki}$: while the Age-adjusted Cancer Rate Regression model assumes normality for $\log R_{ki}$ with $D_{kji}$ having the same mean and variance, the Age-stratified Poisson Regression model directly assumes a Poisson random variable (r.v.) for $D_{kji}$. The Age-adjusted Cancer Rate Regression model establishes $\log R_{ki} = \beta_{0k} + \beta_{1k}t_{ki} + \varepsilon_{ki}$, where $\varepsilon_{ki} \overset{ind}{\sim} \mathcal{N}(0, \sigma_{ki}^2)$ with $\sigma_{ki}^2 = \sum_{j=1}^{J}\omega_j^2 r_{kji}/n_{kji} = \sum_{j=1}^{J}\omega_j^2 m_{kji}/n_{kji}^2$ under

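$$E[D_{kji}] = \mathrm{Var}[D_{kji}] = n_{kji} r_{kji} \equiv m_{kji}, \qquad (3)$$

i.e. $\log R_{ki} \overset{ind}{\sim} \mathcal{N}(\log r_{ki}, \sigma_{ki}^2)$ with

$$r_{ki} = \exp(\beta_{0k}) \exp(\beta_{1k} t_{ki}). \qquad (4)$$

According to the Age-stratified Poisson Regression model (Li et al. 2008), $D_{kji} \overset{ind}{\sim} \mathcal{P}(n_{kji} r_{kji})$ and for $r_{kji}$ it holds

$$\log r_{kji} = \beta_{0kj} + \beta_{1k} t_{ki} \quad \text{or} \quad \log\frac{m_{kji}}{n_{kji}} = \beta_{0kj} + \beta_{1k} t_{ki}. \qquad (5)$$

Observe that the parametrization of both models is essentially the same because the expected age-adjusted rate $r_{ki}$ in terms of (5) is equal to (4), where

$$\exp(\beta_{0k}) = \sum_{j=1}^{J} \omega_j \exp(\beta_{0kj}), \qquad (6)$$

and thus for both models it holds that

$$\theta_k = \left(\frac{r_{kI_k}}{r_{k1}}\right)^{\frac{1}{t_{kI_k} - t_{k1}}} = \exp(\beta_{1k}). \qquad (7)$$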
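The equivalence of the two parametrizations can be checked numerically. The sketch below builds the age-adjusted expected rates from the age-stratified model (5), then confirms that they have the form (4) with intercept (6) and that the ratio in (7) recovers $\exp(\beta_{1k})$. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def adjusted_rates(weights, beta0_by_age, beta1, times):
    """r_ki = sum_j w_j * exp(beta_0kj + beta_1k * t_ki), i.e. (1) combined with (5)."""
    return [
        sum(w * math.exp(b0 + beta1 * t) for w, b0 in zip(weights, beta0_by_age))
        for t in times
    ]


def theta(rates, times):
    """Proportionality constant of equation (7)."""
    return (rates[-1] / rates[0]) ** (1.0 / (times[-1] - times[0]))


if __name__ == "__main__":
    # Hypothetical values: three age groups, six yearly time points.
    weights = [0.2, 0.3, 0.5]
    beta0_by_age = [-9.0, -8.0, -7.0]
    beta1 = 0.02
    times = [1, 2, 3, 4, 5, 6]
    rates = adjusted_rates(weights, beta0_by_age, beta1, times)
    print("theta        =", theta(rates, times))
    print("exp(beta_1k) =", math.exp(beta1))
```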

The original estimators associated with the Age-adjusted Cancer Rate Regression model and Age-stratified Poisson Regression model are the Weighted Least Square estimators (WLSE) and Maximum Likelihood estimators (MLE) respectively.

The hypothesis test for comparing the trends of two regions, ℋ0 : APC1 = APC2, is, according to (2), equivalent to ℋ0 : β11 − β12 = 0. Hence, the Z-test statistic for both models can be defined as

$$Z = \frac{\hat\beta_{11} - \hat\beta_{12}}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_{11} - \hat\beta_{12})}}, \qquad (8)$$

where $\hat\beta_{1k}$, $k = 1, 2$, are the estimators of $\beta_{1k}$ associated with each region and $\widehat{\mathrm{Var}}(\hat\beta_{11} - \hat\beta_{12})$ is the estimator of the variance of $\hat\beta_{11} - \hat\beta_{12}$, $\mathrm{Var}(\hat\beta_{11} - \hat\beta_{12})$. For non-overlapping regions the expression of the variance is $\mathrm{Var}(\hat\beta_{11} - \hat\beta_{12}) = \sigma_{11}^2 + \sigma_{12}^2$, with $\sigma_{1k}^2 \equiv \mathrm{Var}(\hat\beta_{1k})$, $k = 1, 2$. When overlapping regions are taken into account, the methodology for obtaining the estimators as well as the Z-test statistic (8) remains valid, but the given expression for $\mathrm{Var}(\hat\beta_{11} - \hat\beta_{12})$ is no longer valid. When the overlapping regions do not share the same period of time ($t_{11} \neq t_{21}$ or $I_1 \neq I_2$), we must consider a new reference point for index $i$, denoted by $\bar I$, such that $t_{1\bar I}$ represents the time point within $\{t_{1i}\}_{i=1}^{I_1}$ at which the time series associated with the second region is about to start, i.e. we have $\{t_{2i}\}_{i=1}^{I_2}$ such that $t_{21} = t_{1\bar I} + 1$. In particular, if $t_{1i} = i$, $i = 1, \ldots, I_1$, then $t_{2i} = \bar I + i$, $i = 1, \ldots, I_2$. Observe that $\{t_{1i}\}_{i=\bar I+1}^{I_1}$, or equivalently $\{t_{2i}\}_{i=1}^{I_1-\bar I}$, is the time series associated with the overlapping region ($t_{1i} = t_{2,i-\bar I}$, $i = \bar I + 1, \ldots, I_1$). In Figure 1, $I_1 = 6$, $I_2 = 5$, $\bar I = 4$ and thus we can distinguish three subregions, {5, 6} × {1, …, 4}, {5, 6} × {5, 6} and {5, 6} × {7, …, 9}. Without any loss of generality each random variable $D_{kji}$ can be decomposed into two summands

$$D_{kji} = D_{kji}^{(1)} + D_{kji}^{(2)}, \qquad (9)$$

where $D_{kji}^{(1)}$, $i \in \{1, \ldots, I_k\}$, is the number of deaths (or incidences) in the $k$-th region, $j$-th age-group, at the time-point $t_{ki}$ for the subregion where there is no overlap in space, and $D_{kji}^{(2)}$, $i \in \{1, \ldots, I_k\}$, is the number of deaths (or incidences) in the $k$-th region, $j$-th age-group, at the time-point $t_{ki}$ for the subregion where there is overlap in space. Similarly, $n_{kji} = n_{kji}^{(1)} + n_{kji}^{(2)}$ and $m_{kji}(\boldsymbol\beta_k) = m_{kji}^{(1)}(\boldsymbol\beta_k) + m_{kji}^{(2)}(\boldsymbol\beta_k)$. Observe that when $i \in \{\bar I + 1, \ldots, I_1\}$, the r.v.s $D_{1ji}^{(2)}$ and $D_{2j,i-\bar I}^{(2)}$ are associated with the same overlapping subregion. Revisiting the example illustrated in Figure 1, it should be remarked that on the y-axis (space) there are more points than those that represent one realization of all r.v.s $D_{kji}^{(b)}$ at each time point, but by grouping the points belonging to the same vertical line inside the portion marked with dashes we refer to one realization of them (for instance, for $t_{11} = 1$ we have two groups of points, associated with $D_{1j1}^{(1)}$ and $D_{1j1}^{(2)}$ respectively, while for $t_{15} = t_{21} = 5$ we have three groups of points, associated with $D_{1j5}^{(1)}$, $D_{1j5}^{(2)}$ or $D_{2j1}^{(2)}$, and $D_{2j1}^{(1)}$). The grouping of points symbolizes the different extension of the regions. In Figure 1 there are 20 realizations of all r.v.s $D_{kji}^{(b)}$ in total: 12 for region 1, 10 for region 2, and 2 r.v.s are shared by both regions.

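It is important to understand the r.v.s $D_{kji}^{(b)}$, $b \in \{1, 2\}$, as “homogeneous contributors” with respect to $D_{kji}$, i.e. $D_{kji}^{(b)} \sim \mathcal{P}(m_{kji}^{(b)})$ such that (10) holds, and hence $\{m_{1ji}^{(2)}(\boldsymbol\beta_1)\}_{i=\bar I+1}^{I_1}$ and $\{m_{2ji}^{(2)}(\boldsymbol\beta_2)\}_{i=1}^{I_1-\bar I}$ are only equal when $\beta_{11} = \beta_{12}$ (or equivalently, when $\boldsymbol\beta_1 = \boldsymbol\beta_2$). We can now state precisely that, under $\beta_{11} = \beta_{12}$, the reason why $\mathrm{Cov}(\hat\beta_{11}, \hat\beta_{12}) = 0$ does not hold in $\mathrm{Var}(\hat\beta_{11} - \hat\beta_{12}) = \mathrm{Var}(\hat\beta_{11}) + \mathrm{Var}(\hat\beta_{12}) - 2\,\mathrm{Cov}(\hat\beta_{11}, \hat\beta_{12})$ for overlapping regions is that $\{D_{1ji}\}_{i=1,\ldots,I_1;\,j=1,\ldots,J}$ and $\{D_{2ji}\}_{i=1,\ldots,I_2;\,j=1,\ldots,J}$ are not independent, because both regions share the same set of r.v.s $\{D_{1ji}^{(2)}\}_{i=\bar I+1,\ldots,I_1;\,j=1,\ldots,J}$ with $D_{1ji}^{(2)} = D_{2j,i-\bar I}^{(2)}$.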

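Assumption 1 $D_{kji}^{(b)} \overset{ind}{\sim} \mathcal{P}(m_{kji}^{(b)})$, $b \in \{1, 2\}$, where for $n_{kji}^{(b)} > 0$ the following holds

$$m_{kji}^{(b)} = \frac{n_{kji}^{(b)}}{n_{kji}}\, m_{kji}, \qquad b \in \{1, 2\}. \qquad (10)$$

We accept the case where $n_{kji}^{(b)} = 0$ for some $b \in \{1, 2\}$, so that $D_{kji}^{(b)} = 0$ a.s. (degenerate r.v.) because $m_{kji}^{(b)} = 0$.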
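Assumption 1 splits each cell mean in proportion to the population at risk in the non-overlapping and overlapping parts of the region. A minimal sketch of (10), with a hypothetical helper name `split_mean` and made-up numbers:

```python
def split_mean(m, n_nonoverlap, n_overlap):
    """Return (m^(1), m^(2)) from equation (10): m^(b) = n^(b) / n * m.

    m            : expected count m_kji of the whole cell
    n_nonoverlap : population at risk n_kji^(1) outside the shared subregion
    n_overlap    : population at risk n_kji^(2) inside the shared subregion
    """
    n = n_nonoverlap + n_overlap
    if n <= 0:
        raise ValueError("total population at risk must be positive")
    return m * n_nonoverlap / n, m * n_overlap / n


if __name__ == "__main__":
    # Hypothetical cell: 120 expected deaths, a quarter of the population shared.
    m1, m2 = split_mean(120.0, 300, 100)
    print(m1, m2)
```

When `n_overlap` is zero the second component is zero, which corresponds to the degenerate case accepted in Assumption 1.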

Regarding the basic models considered in the papers dealing with overlapping regions, the Age-stratified Poisson Regression model can be considered the most realistic one: the models have been constructed by successive improvements on the previous ones, in which normality assumptions were initially taken as approximations of the underlying Poisson r.v.s. In the first paper concerned with trend comparisons across overlapping regions (Li and Tiwari (2008)), it is remarked that “… the derivation of Cov(β̂11, β̂12), …., is nontrivial as it requires a careful consideration of the overlapping of two regions”. The assumption considered by them (which is based on Pickle and White (1995)) for the overlapping subregion is similar to the assumption considered herein in the sense that the overlapping subregion follows the same distribution considered for the whole region. A similar criterion was followed in Li et al. (2007) and Li et al. (2008).

3 Minimum Power Divergence Estimators for an Age-stratified Poisson Regression Model with Overlapping

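Let $m_s$ be the expected value of the r.v. of deaths (or incidences) $D_s$ associated with the $s$-th cell of a contingency table with $M_k \equiv J I_k$ cells ($s = 1, \ldots, M_k$). In this section, we consider model (5) in matrix notation so that the triple indices are unified in a single one by following a lexicographic order. Hence, the vector of cell means $\boldsymbol{m}_k(\boldsymbol\beta_k) = (m_1(\boldsymbol\beta_k), \ldots, m_{M_k}(\boldsymbol\beta_k))^T = (m_{k11}(\boldsymbol\beta_k), \ldots, m_{kJI_k}(\boldsymbol\beta_k))^T$ of the multidimensional r.v. of deaths (or incidences) $\boldsymbol{D}_k = (D_1, \ldots, D_{M_k})^T = (D_{k11}, \ldots, D_{kJI_k})^T$ is related to the vector of parameters $\boldsymbol\beta_k = (\beta_{0k1}, \ldots, \beta_{0kJ}, \beta_{1k})^T \in \Theta_k = \mathbb{R}^{J+1}$ according to

$$\log\left(\mathrm{Diag}^{-1}(\boldsymbol{n}_k)\,\boldsymbol{m}_k(\boldsymbol\beta_k)\right) = \boldsymbol{X}_k\boldsymbol\beta_k \quad \text{or} \quad \boldsymbol{m}_k(\boldsymbol\beta_k) = \mathrm{Diag}(\boldsymbol{n}_k)\exp(\boldsymbol{X}_k\boldsymbol\beta_k), \qquad (11)$$

where $\mathrm{Diag}(\boldsymbol{n}_k)$ is a diagonal matrix of individuals at risk $\boldsymbol{n}_k = (n_1, \ldots, n_{M_k})^T = (n_{k11}, \ldots, n_{kJI_k})^T$ ($n_s > 0$, $s = 1, \ldots, M_k$) and

$$\boldsymbol{X}_k = \begin{pmatrix} \boldsymbol{1}_{I_k} & & & \boldsymbol{t}_k \\ & \ddots & & \vdots \\ & & \boldsymbol{1}_{I_k} & \boldsymbol{t}_k \end{pmatrix}_{JI_k \times (J+1)} = \left(\boldsymbol{I}_J \otimes \boldsymbol{1}_{I_k},\; \boldsymbol{1}_J \otimes \boldsymbol{t}_k\right),$$

with $\boldsymbol{t}_k \equiv (t_{k1}, \ldots, t_{kI_k})^T$, a full rank $M_k \times (J + 1)$ design matrix. Based on the likelihood function of a Poisson sample $\boldsymbol{D}_k$, the kernel of the log-likelihood function is given by

$$\ell_{\boldsymbol\beta_k}(\boldsymbol{D}_k) = \sum_{s=1}^{M_k} D_s \log m_s(\boldsymbol\beta_k) - \sum_{s=1}^{M_k} m_s(\boldsymbol\beta_k),$$

and thus the MLE of $\boldsymbol\beta_k$ is

$$\hat{\boldsymbol\beta}_k = \arg\max_{\boldsymbol\beta_k \in \Theta_k} \ell_{\boldsymbol\beta_k}(\boldsymbol{D}_k).$$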
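The matrix form (11) and the log-likelihood kernel can be sketched with plain Python lists. The helper names `design_matrix`, `cell_means` and `loglik_kernel` are hypothetical; rows are ordered lexicographically by age group $j$ and then time point $i$, which is an assumption consistent with the index order $(k11, \ldots, kJI_k)$ used in the text.

```python
import math


def design_matrix(J, times):
    """Rows indexed by (j, i) in lexicographic order.

    The first J columns are age-group intercept indicators; the last column is t_ki.
    This corresponds to (I_J kron 1_I, 1_J kron t) in the notation of the text.
    """
    rows = []
    for j in range(J):
        for t in times:
            row = [0.0] * (J + 1)
            row[j] = 1.0
            row[J] = float(t)
            rows.append(row)
    return rows


def cell_means(n, X, beta):
    """m_s = n_s * exp(x_s^T beta), the second form of equation (11)."""
    return [
        n_s * math.exp(sum(x * b for x, b in zip(row, beta)))
        for n_s, row in zip(n, X)
    ]


def loglik_kernel(D, m):
    """Kernel of the Poisson log-likelihood: sum_s D_s log m_s - sum_s m_s."""
    return sum(d * math.log(ms) for d, ms in zip(D, m)) - sum(m)


if __name__ == "__main__":
    # Hypothetical toy example: J = 2 age groups, 3 time points.
    X = design_matrix(2, [1, 2, 3])
    n = [1000.0] * 6
    beta = [-3.0, -2.0, 0.02]  # (beta_0k1, beta_0k2, beta_1k)
    m = cell_means(n, X, beta)
    print(loglik_kernel(m, m))  # kernel evaluated at noise-free "data"
```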

It is well known that there is a very close relationship between likelihood theory and the Kullback-Leibler divergence measure (Kullback and Leibler (1951)). For a multinomial contingency table it is intuitively understandable that a good estimator of the cell probabilities should be such that the discrepancy with respect to the empirical distribution, or relative frequencies, is small enough. The oldest discrepancy or distance measure we know is the Kullback divergence measure; in fact, the estimator built from the Kullback divergence measure is the MLE. By considering as unknown parameters of a Poisson contingency table the expected values rather than probabilities, and the observed frequencies rather than relative frequencies, we are going to show how it is possible to carry out statistical inference for Poisson models through power divergence measures. According to the Kullback divergence measure, the discrepancy or distance between the Poisson sample $\boldsymbol{D}_k$ and its vector of means $\boldsymbol{m}_k(\boldsymbol\beta_k)$ is given by

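$$d_{Kull}(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k)) = \sum_{s=1}^{M_k}\left(D_s \log\frac{D_s}{m_s(\boldsymbol\beta_k)} - D_s + m_s(\boldsymbol\beta_k)\right). \qquad (12)$$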

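Observe that $d_{Kull}(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k)) = -\ell_{\boldsymbol\beta_k}(\boldsymbol{D}_k) + C_k$, where $C_k$ does not depend on the parameter $\boldsymbol\beta_k$. Such a relationship allows us to define the MLE of $\boldsymbol\beta_k$ as the minimum Kullback divergence estimator

$$\hat{\boldsymbol\beta}_k = \arg\min_{\boldsymbol\beta_k \in \Theta_k} d_{Kull}(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k)),$$

and the MLE of $\boldsymbol{m}_k(\boldsymbol\beta_k)$ functionally as $\boldsymbol{m}_k(\hat{\boldsymbol\beta}_k)$ due to the invariance property of the MLEs. The power divergence measures are a family of measures defined as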

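$$d_\lambda(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k)) = \frac{1}{\lambda(1+\lambda)}\sum_{s=1}^{M_k}\left(\frac{D_s^{\lambda+1}}{m_s^{\lambda}(\boldsymbol\beta_k)} - D_s(1+\lambda) + \lambda\, m_s(\boldsymbol\beta_k)\right), \qquad \lambda \notin \{0, -1\}, \qquad (13)$$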

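such that each possible value of the subscript $\lambda \in \mathbb{R} - \{0, -1\}$ gives rise to a different way of quantifying the discrepancy between $\boldsymbol{D}_k$ and $\boldsymbol{m}_k(\boldsymbol\beta_k)$. In the case $\lambda \in \{0, -1\}$, we define $d_\lambda(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k)) = \lim_{\ell \to \lambda} d_\ell(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k))$, and in this manner the Kullback divergence appears as a special case of the power divergence measures when $\lambda = 0$, $d_0(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k)) = d_{Kull}(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k))$, while the case $\lambda = -1$ is obtained by changing the order of the arguments of the Kullback divergence measure, $d_{-1}(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k)) = d_{Kull}(\boldsymbol{m}_k(\boldsymbol\beta_k), \boldsymbol{D}_k)$.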
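The family (13), together with its two limiting cases, can be written compactly. The following is a minimal sketch; the function name `power_divergence` is a hypothetical helper, and strictly positive counts and means are assumed so that every logarithm and power is defined.

```python
import math


def power_divergence(D, m, lam):
    """Power divergence d_lambda(D, m) of equation (13) for Poisson cell counts.

    lam == 0 gives the Kullback divergence (12); lam == -1 gives the Kullback
    divergence with the arguments swapped. Assumes all entries of D and m are > 0.
    """
    if lam == 0:
        return sum(d * math.log(d / ms) - d + ms for d, ms in zip(D, m))
    if lam == -1:
        return power_divergence(m, D, 0)
    c = 1.0 / (lam * (1.0 + lam))
    return c * sum(
        d ** (lam + 1) / ms ** lam - d * (1.0 + lam) + lam * ms
        for d, ms in zip(D, m)
    )


if __name__ == "__main__":
    # Hypothetical counts and means, for illustration only.
    D = [3.0, 5.0, 8.0]
    m = [4.0, 4.0, 8.0]
    for lam in (-0.5, 0, 2 / 3, 1, 1.5):
        print(f"lambda = {lam:6.3f}  d_lambda = {power_divergence(D, m, lam):.6f}")
```

For $\lambda = 1$ the expression reduces to half of the Pearson chi-square statistic, $\tfrac12\sum_s (D_s - m_s)^2/m_s$, which is why the corresponding estimator is called the minimum chi-square estimator.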

The estimator of βk obtained on the basis of (13) is the so-called minimum power divergence estimator (MPDE) and it is defined for each value of λ ∈ ℝ as

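$$\hat{\boldsymbol\beta}_{k,\lambda} = \arg\min_{\boldsymbol\beta_k \in \Theta_k} d_\lambda(\boldsymbol{D}_k, \boldsymbol{m}_k(\boldsymbol\beta_k)), \qquad (14)$$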

and the MPDE of $\boldsymbol{m}_k(\boldsymbol\beta_k)$ functionally as $\boldsymbol{m}_k(\hat{\boldsymbol\beta}_{k,\lambda})$ due to the invariance property of the MPDEs. Apart from the MLE ($\hat{\boldsymbol\beta}_k$ or $\hat{\boldsymbol\beta}_{k,0}$), other estimators are members of this family: the minimum modified chi-square estimator, $\hat{\boldsymbol\beta}_{k,-2}$; the minimum modified likelihood estimator, $\hat{\boldsymbol\beta}_{k,-1}$; the Cressie-Read estimator, $\hat{\boldsymbol\beta}_{k,2/3}$; and the minimum chi-square estimator, $\hat{\boldsymbol\beta}_{k,1}$. These estimators were introduced and analyzed for multinomial sampling by Cressie and Read (1988), but for Poisson sampling they were applied for the first time in Pardo and Martín (2009). The so-called minimum ϕ-divergence estimators are a wider class of estimators that contains the MPDEs as a special case (see Pardo (2005) for more details), and this statistical problem could easily be extended to these estimators.

Taking into account that the asymptotic distribution of all MPDEs, including the MLE, is “theoretically” the same, we are going to propose an alternative method for estimating Var(β̂11 − β̂12) = Var(β̂11,0 − β̂12,0) that covers a new element for overlapping regions, Cov(β̂11, β̂12) = Cov(β̂11,0, β̂12,0). We postulate that for data sets that are not very large, the estimation based on the MLEs, β̂11,0 − β̂12,0, is likely to be improved by the estimation associated with λ = 1, β̂11,1 − β̂12,1, when overlapping regions are considered.

In order to obtain the MPDE of (2), $\widehat{\mathrm{APC}}_{k,\lambda} = 100\,(\exp(\hat\beta_{1k,\lambda}) - 1)$, we need to compute the estimator of the parameter of interest by following the next result.

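Proposition 2 The MPDE of $\beta_{1k}$, $\hat\beta_{1k,\lambda}$, is the solution of the nonlinear equation

$$f(\hat\beta_{1k,\lambda}) = \sum_{i=1}^{I_k} t_{ki}\,\Upsilon_{ki} = 0,$$

with

$$\Upsilon_{ki} = \sum_{j=1}^{J} m_{kji}(\hat{\boldsymbol\beta}_\lambda)\,(\varphi_{kji} - 1), \qquad m_{kji}(\hat{\boldsymbol\beta}_\lambda) = n_{kji}\exp(\hat\beta_{0kj,\lambda})\exp(\hat\beta_{1k,\lambda}t_{ki}) \quad \text{and} \quad \varphi_{kji} = \left(\frac{D_{kji}}{m_{kji}(\hat{\boldsymbol\beta}_\lambda)}\right)^{\lambda+1},$$

$$\exp(\hat\beta_{0kj,\lambda}) = \left(\sum_{s=1}^{I_k} p_{kjs}\,\psi_{kjs}^{\lambda+1}\right)^{\frac{1}{\lambda+1}}, \qquad j = 1, \ldots, J,$$

$$p_{kjs} = \frac{n_{kjs}\exp(\hat\beta_{1k,\lambda}t_{ks})}{\sum_{h=1}^{I_k} n_{kjh}\exp(\hat\beta_{1k,\lambda}t_{kh})} \quad \text{and} \quad \psi_{kjs} = \frac{D_{kjs}}{n_{kjs}\exp(\hat\beta_{1k,\lambda}t_{ks})}.$$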
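Proposition 2 reduces the estimation of the slope to a one-dimensional root-finding problem, because the age-group intercepts are available in closed form once the slope is fixed. The sketch below evaluates $f$ as written in the proposition and locates its root by plain bisection. The bisection, the bracketing interval, and the names `profile_score` and `mpde_slope` are choices made here for illustration and are not taken from the paper; the sketch also assumes $\lambda \ge 0$ (so that $f$ changes sign only once on the bracket) and strictly positive counts.

```python
import math


def profile_score(beta1, D, n, times, lam):
    """f(beta1) of Proposition 2, with exp(beta_0kj) replaced by its closed form.

    D, n : J x I nested lists of counts and populations at risk (one region).
    """
    total = 0.0
    for D_j, n_j in zip(D, n):
        base = [n_js * math.exp(beta1 * t) for n_js, t in zip(n_j, times)]
        denom = sum(base)
        # exp(beta_0kj) = (sum_s p_kjs * psi_kjs^(lam+1))^(1/(lam+1))
        s = sum((b / denom) * (d / b) ** (lam + 1) for b, d in zip(base, D_j))
        exp_b0 = s ** (1.0 / (lam + 1))
        for b, d, t in zip(base, D_j, times):
            m = exp_b0 * b                    # m_kji at the current parameters
            phi = (d / m) ** (lam + 1)        # varphi_kji
            total += t * m * (phi - 1.0)      # contribution to sum_i t_ki * Upsilon_ki
    return total


def mpde_slope(D, n, times, lam, lo=-1.0, hi=1.0, iters=100):
    """Bisection for the root of profile_score in [lo, hi] (illustrative only)."""
    f_lo = profile_score(lo, D, n, times, lam)
    f_hi = profile_score(hi, D, n, times, lam)
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = profile_score(mid, D, n, times, lam)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical noise-free data generated from model (5) with beta_1k = 0.02.
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    n = [[1000.0] * 8, [2000.0] * 8]
    beta0 = [-3.0, -2.0]
    D = [[n[j][i] * math.exp(beta0[j] + 0.02 * times[i]) for i in range(8)]
         for j in range(2)]
    for lam in (0.0, 2 / 3, 1.0):
        print(lam, mpde_slope(D, n, times, lam))
```

With noise-free data every member of the family returns the generating slope; with real counts the estimates differ across λ, which is what the simulation study compares.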

Our aim is to show that β̂11,λ − β̂12,λ is asymptotically normal and to obtain an explicit expression of the denominator of the Z-test statistic (8) with MPDEs

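$$Z_\lambda = \frac{\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda})}}, \qquad (15)$$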

when the random vectors of observed frequencies of both regions, $\boldsymbol{D}_1$ and $\boldsymbol{D}_2$, share some components (those belonging to the overlapping subregion). Since (15) is approximately standard normal for $\min\{N_1, N_2\}$ large enough, we can test ℋ0 : APC1 = APC2 ($\beta_{11} = \beta_{12}$) vs ℋ1 : APC1 ≠ APC2 ($\beta_{11} \neq \beta_{12}$), so that if the value of $|Z_\lambda|$ is greater than the quantile $z_{1-\alpha/2}$ (i.e., $\Pr(Z_\lambda < z_{1-\alpha/2}) = 1 - \alpha/2$), ℋ0 is rejected with significance level α.
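The decision rule based on (15) can be sketched as follows, assuming that estimates of the two variances and of the covariance are already available. The function name `z_test` and all numerical inputs are hypothetical; `statistics.NormalDist` from the Python standard library supplies the normal quantile.

```python
import math
from statistics import NormalDist


def z_test(beta_11, beta_12, var_11, var_12, cov=0.0, alpha=0.05):
    """Two-sided Z-test of H0: beta_11 = beta_12.

    Uses Var(b11 - b12) = Var(b11) + Var(b12) - 2 Cov(b11, b12); cov = 0
    corresponds to non-overlapping regions.
    Returns (z statistic, critical value, reject H0?).
    """
    var_diff = var_11 + var_12 - 2.0 * cov
    if var_diff <= 0:
        raise ValueError("estimated variance of the difference must be positive")
    z = (beta_11 - beta_12) / math.sqrt(var_diff)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z, critical, abs(z) > critical


if __name__ == "__main__":
    # Hypothetical estimates, not taken from the paper.
    print(z_test(0.02, 0.01, 9e-6, 16e-6))                 # covariance ignored
    print(z_test(0.02, 0.01, 9e-6, 16e-6, cov=-5.5e-6))    # negative covariance
```

The second call illustrates why the covariance term matters: with the same point estimates, a negative covariance inflates the variance of the difference and can reverse the decision.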

The following result is the key result for estimating the variances and covariance of the estimators of interest, β̂1k, k = 1, 2. It allows us to establish a linear relationship between the parameter of interest and the observed frequencies under Poisson sampling when the expected total mean Nk in each region (k = 1, 2) is large enough and the way that Nk increases is given in Assumption 3.

Assumption 3 $m_{kji}^*(\boldsymbol\beta_k^0) = m_{kji}(\boldsymbol\beta_k^0)/N_k$ remains constant as $N_k$ increases, that is, $m_{kji}(\boldsymbol\beta_k^0)$ increases at the same rate as $N_k$.

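Theorem 4 The MPDE of $\beta_{1k}$, $\hat\beta_{1k,\lambda}$, $k = 1, 2$, can be expressed as

$$\hat\beta_{1k,\lambda} - \beta_{1k}^0 = \sigma_{1k}^2\, \tilde{\boldsymbol{t}}_k^T(\boldsymbol\beta_k^0)\, \boldsymbol{X}_k^T\left(\boldsymbol{D}_k - \boldsymbol{m}_k(\boldsymbol\beta_k^0)\right) + o\left(\left\|\frac{\boldsymbol{D}_k - \boldsymbol{m}_k(\boldsymbol\beta_k^0)}{N_k}\right\|\right),$$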

where superscript 0 is denoting the true and unknown value of a parameter, o is denoting a little o function for a stochastic sequence (see Appendix in Bishop et al. (1975)) and

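$$\sigma_{1k}^2 = \left(\tilde{\boldsymbol{t}}_k^T(\boldsymbol\beta_k^0)\, \boldsymbol{X}_k^T\, \mathrm{Diag}(\boldsymbol{m}_k(\boldsymbol\beta_k^0))\, \boldsymbol{X}_k\, \tilde{\boldsymbol{t}}_k(\boldsymbol\beta_k^0)\right)^{-1} = \left(\sum_{j=1}^{J}\sum_{i=1}^{I_k} m_{kji}(\boldsymbol\beta_k^0)\left(t_{ki} - \tilde t_{kj}(\boldsymbol\beta_k^0)\right)^2\right)^{-1}, \qquad (16)$$

$$\tilde{\boldsymbol{t}}_k^T(\boldsymbol\beta_k^0) = \left(-\tilde t_{k1}(\boldsymbol\beta_k^0), \; \ldots, \; -\tilde t_{kJ}(\boldsymbol\beta_k^0), \; 1\right), \qquad \tilde t_{kj}(\boldsymbol\beta_k^0) = \frac{\sum_{i=1}^{I_k} m_{kji}(\boldsymbol\beta_k^0)\, t_{ki}}{\sum_{i=1}^{I_k} m_{kji}(\boldsymbol\beta_k^0)}. \qquad (17)$$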

Theorem 5 The MPDE of β1k, β̂1k, k = 1, 2, is asymptotically Normal, unbiased and with variance equal to (16).

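Note that Theorem 5 would be more formally stated in terms of $\sqrt{N_k}\,(\hat\beta_{1k,\lambda} - \beta_{1k}^0)$, because $\sigma_{1k}^2$ is not constant as $N_k$ increases. We have avoided that in order to focus directly on the estimator of interest. Due to Assumption 3 and $\tilde t_{kj}(\boldsymbol\beta_k^0) = \sum_{i=1}^{I_k} m_{kji}^*(\boldsymbol\beta_k^0)\, t_{ki} \big/ \sum_{i=1}^{I_k} m_{kji}^*(\boldsymbol\beta_k^0)$, what is constant is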

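$$\mathrm{Var}\left(\sqrt{N_k}\,\hat\beta_{1k,\lambda}\right) = N_k\,\sigma_{1k}^2 = \left(\sum_{j=1}^{J}\sum_{i=1}^{I_k} m_{kji}^*(\boldsymbol\beta_k^0)\left(t_{ki} - \tilde t_{kj}(\boldsymbol\beta_k^0)\right)^2\right)^{-1}.$$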
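The asymptotic variance (16) depends only on the expected cell counts and on how the time points spread around the mean-weighted averages (17). A minimal sketch follows, with hypothetical helper names and a made-up table of expected counts; it also illustrates that $N_k\sigma_{1k}^2$ stays fixed when every expected count is scaled by the same factor, as in Assumption 3.

```python
def weighted_mean_times(m, times):
    """t-tilde_kj of (17): mean-weighted average time for each age group j.

    m is a J x I nested list of expected counts m_kji for one region.
    """
    return [sum(mji * t for mji, t in zip(m_j, times)) / sum(m_j) for m_j in m]


def slope_variance(m, times):
    """sigma_1k^2 of (16): inverse of sum_j sum_i m_kji * (t_ki - t-tilde_kj)^2."""
    t_tilde = weighted_mean_times(m, times)
    info = sum(
        mji * (t - tj) ** 2
        for m_j, tj in zip(m, t_tilde)
        for mji, t in zip(m_j, times)
    )
    return 1.0 / info


if __name__ == "__main__":
    # Hypothetical expected counts: J = 2 age groups, 3 time points.
    times = [1, 2, 3]
    m = [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]
    N = sum(sum(row) for row in m)
    print("sigma^2     =", slope_variance(m, times))
    print("N * sigma^2 =", N * slope_variance(m, times))
    m_big = [[100.0 * x for x in row] for row in m]
    N_big = sum(sum(row) for row in m_big)
    print("N * sigma^2 after scaling =", N_big * slope_variance(m_big, times))
```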

Let $N$ be the total expected value of the region constructed by joining regions 1 and 2. Note that $N \le N_1 + N_2$, with equality only for non-overlapping regions. In order to establish the way that $N$ increases with respect to $N_k$, we shall assume the following throughout.

Assumption 6 $N_k^* = N_k/N$ ($k = 1, 2$) is constant as $N$ increases, that is, $N$ increases at the same rate as $N_k$.

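Note that for overlapping regions $N_1^* + N_2^* > 1$ holds and, under the hypothesis that $\beta_{11}^0 = \beta_{12}^0$, we have a common true parameter vector $\boldsymbol\beta^0$ for both regions ($k = 1, 2$). Hence, under the hypothesis that $\beta_{11}^0 = \beta_{12}^0$, since $N_1^* + N_2^* = 1 + \sum_{j=1}^{J}\sum_{i=1}^{I_1-\bar I} m_{2ji}^{(2)}(\boldsymbol\beta^0)/N$ is constant, the overlapping death fraction, $\sum_{j=1}^{J}\sum_{i=1}^{I_1-\bar I} m_{2ji}^{(2)}(\boldsymbol\beta^0)/N$, is also constant as $N$ increases.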

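Theorem 7 Under the hypothesis that $\beta_{11}^0 = \beta_{12}^0$, the MPDE of $\beta_{11} - \beta_{12}$, $\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}$, is decomposed as

$$\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda} = X_1 + X_2 + X_3 + Y, \qquad (18)$$

$$X_1 = \sigma_{11}^2\, \tilde{\boldsymbol{t}}_1^T(\boldsymbol\beta^0)\, \boldsymbol{X}_1^T\left(\boldsymbol{D}_1^{(1)} - \boldsymbol{m}_1^{(1)}(\boldsymbol\beta^0)\right),$$

$$X_2 = -\sigma_{12}^2\, \tilde{\boldsymbol{t}}_2^T(\boldsymbol\beta^0)\, \boldsymbol{X}_2^T\left(\boldsymbol{D}_2^{(1)} - \boldsymbol{m}_2^{(1)}(\boldsymbol\beta^0)\right),$$

$$X_3 = \left(\sigma_{11}^2\, \tilde{\boldsymbol{t}}_1^T(\boldsymbol\beta^0)\, \bar{\boldsymbol{X}}_1^T - \sigma_{12}^2\, \tilde{\boldsymbol{t}}_2^T(\boldsymbol\beta^0)\, \bar{\boldsymbol{X}}_2^T\right)\left(\bar{\boldsymbol{D}}^{(2)} - \bar{\boldsymbol{m}}^{(2)}(\boldsymbol\beta^0)\right),$$

$$Y = o\left(\left\|\frac{\boldsymbol{D}_1 - \boldsymbol{m}_1(\boldsymbol\beta^0)}{N_1}\right\|\right) + o\left(\left\|\frac{\boldsymbol{D}_2 - \boldsymbol{m}_2(\boldsymbol\beta^0)}{N_2}\right\|\right),$$

where $\bar{\boldsymbol{X}}_k$ is an amplified $J(\bar I + I_2) \times (J + 1)$ matrix of $\boldsymbol{X}_k$,

$$\bar{\boldsymbol{X}}_k = \begin{pmatrix} \bar{\boldsymbol{1}}_k & & & \bar{\boldsymbol{t}}_k \\ & \ddots & & \vdots \\ & & \bar{\boldsymbol{1}}_k & \bar{\boldsymbol{t}}_k \end{pmatrix}_{J(\bar I + I_2) \times (J+1)} = \left(\boldsymbol{I}_J \otimes \bar{\boldsymbol{1}}_k,\; \boldsymbol{1}_J \otimes \bar{\boldsymbol{t}}_k\right),$$

$$\bar{\boldsymbol{1}}_1^T = \left(\boldsymbol{1}_{I_1}^T, \boldsymbol{0}_{\bar I + I_2 - I_1}^T\right) \quad \text{and} \quad \bar{\boldsymbol{1}}_2^T = \left(\boldsymbol{0}_{\bar I}^T, \boldsymbol{1}_{I_2}^T\right), \qquad \bar{\boldsymbol{t}}_1^T = \left(\boldsymbol{t}_1^T, \boldsymbol{0}_{\bar I + I_2 - I_1}^T\right) \quad \text{and} \quad \bar{\boldsymbol{t}}_2^T = \left(\boldsymbol{0}_{\bar I}^T, \boldsymbol{t}_2^T\right),$$

and $\bar{\boldsymbol{D}}^{(2)} = (\bar D_{111}^{(2)}, \ldots, \bar D_{1J,\bar I + I_2}^{(2)})^T$, $\bar{\boldsymbol{m}}^{(2)}(\boldsymbol\beta^0) = (\bar m_{111}^{(2)}(\boldsymbol\beta^0), \ldots, \bar m_{1J,\bar I + I_2}^{(2)}(\boldsymbol\beta^0))^T$ are the vectors obtained by joining $\boldsymbol{D}_k^{(2)}$ for $k = 1, 2$ and $\boldsymbol{m}_k^{(2)}(\boldsymbol\beta^0)$ for $k = 1, 2$ respectively, i.e.

$$\bar{\boldsymbol{D}}^{(2)} = \left((D_{111}^{(2)}, \ldots, D_{1J\bar I}^{(2)}), (\boldsymbol{D}_2^{(2)})^T\right)^T, \qquad \boldsymbol{D}_2^{(2)} = (D_{211}^{(2)}, \ldots, D_{2JI_2}^{(2)})^T,$$

$$\bar{\boldsymbol{m}}^{(2)}(\boldsymbol\beta^0) = \left((m_{111}^{(2)}(\boldsymbol\beta^0), \ldots, m_{1J\bar I}^{(2)}(\boldsymbol\beta^0)), (\boldsymbol{m}_2^{(2)}(\boldsymbol\beta^0))^T\right)^T, \qquad \boldsymbol{m}_2^{(2)}(\boldsymbol\beta^0) = (m_{211}^{(2)}(\boldsymbol\beta^0), \ldots, m_{2JI_2}^{(2)}(\boldsymbol\beta^0))^T.$$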

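Theorem 8 Under the hypothesis that $\beta_{11}^0 = \beta_{12}^0$, the asymptotic distribution of $\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}$ is central Normal with

$$\mathrm{Var}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}) = \sigma_{11}^2 + \sigma_{12}^2 - 2\,\sigma_{11}^2\sigma_{12}^2\,\xi_{12},$$

where $\sigma_{1k}^2$ is equal to

$$\sigma_{1k}^2 = \left(\sum_{j=1}^{J}\sum_{i=1}^{I_k} m_{kji}(\boldsymbol\beta^0)\left(t_{ki} - \tilde t_{kj}(\boldsymbol\beta^0)\right)^2\right)^{-1} = \left(\sum_{j=1}^{J}\sum_{i=1}^{I_k} m_{kji}(\boldsymbol\beta^0)\, t_{ki}^2 - \sum_{j=1}^{J} m_{kj}\, \tilde t_{kj}^{\,2}(\boldsymbol\beta^0)\right)^{-1}, \qquad (19)$$

with $m_{kj} = \sum_{i=1}^{I_k} m_{kji}(\boldsymbol\beta^0)$, $\tilde t_{kj}(\boldsymbol\beta_k^0)$ given by (17), and

$$\xi_{12} = \sum_{j=1}^{J}\sum_{i=1}^{I_1-\bar I} \frac{n_{2ji}^{(2)}}{n_{2ji}}\, m_{2ji}(\boldsymbol\beta^0)\left(t_{2i} - \tilde t_{1j}(\boldsymbol\beta^0)\right)\left(t_{2i} - \tilde t_{2j}(\boldsymbol\beta^0)\right)$$
$$\phantom{\xi_{12}} = \sum_{j=1}^{J}\sum_{i=1}^{I_1-\bar I} \frac{n_{2ji}^{(2)}}{n_{2ji}}\, m_{2ji}(\boldsymbol\beta^0)\left(t_{2i}^2 + \tilde t_{1j}(\boldsymbol\beta^0)\,\tilde t_{2j}(\boldsymbol\beta^0)\right) - \sum_{j=1}^{J}\sum_{i=1}^{I_1-\bar I} \frac{n_{2ji}^{(2)}}{n_{2ji}}\, m_{2ji}(\boldsymbol\beta^0)\, t_{2i}\left(\tilde t_{1j}(\boldsymbol\beta^0) + \tilde t_{2j}(\boldsymbol\beta^0)\right). \qquad (20)$$

That is, the covariance between $\hat\beta_{11,\lambda}$ and $\hat\beta_{12,\lambda}$ is given by

$$\sigma_{1,12} = \mathrm{Cov}(\hat\beta_{11,\lambda}, \hat\beta_{12,\lambda}) = \sigma_{11}^2\,\sigma_{12}^2\,\xi_{12}, \qquad (21)$$

and the correlation by $\rho_{1,12} = \mathrm{Cor}(\hat\beta_{11,\lambda}, \hat\beta_{12,\lambda}) = \sigma_{11}\,\sigma_{12}\,\xi_{12}$.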
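The quantities in Theorem 8 can be computed directly from tables of expected counts. The sketch below evaluates (19), (20) and (21) and returns the variance of the difference. All function names, the argument `n_shared` (standing for $I_1 - \bar I$, the number of leading time points of region 2 that are shared with region 1) and the toy numbers are assumptions made for illustration.

```python
def weighted_mean_times(m, times):
    """t-tilde_kj of (17) for each age group j; m is a J x I nested list."""
    return [sum(mji * t for mji, t in zip(m_j, times)) / sum(m_j) for m_j in m]


def slope_variance(m, times):
    """sigma_1k^2 of (19)."""
    tt = weighted_mean_times(m, times)
    return 1.0 / sum(
        mji * (t - tj) ** 2 for m_j, tj in zip(m, tt) for mji, t in zip(m_j, times)
    )


def xi12(m2, share2, t2, tt1, tt2, n_shared):
    """Equation (20). share2[j][i] = n_2ji^(2) / n_2ji is the overlapping share."""
    return sum(
        share2[j][i] * m2[j][i] * (t2[i] - tt1[j]) * (t2[i] - tt2[j])
        for j in range(len(m2))
        for i in range(n_shared)
    )


def var_of_difference(m1, t1, m2, t2, share2, n_shared):
    """Return (Var(b11 - b12), Cov(b11, b12)) following Theorem 8 and (21)."""
    s1, s2 = slope_variance(m1, t1), slope_variance(m2, t2)
    x = xi12(m2, share2, t2, weighted_mean_times(m1, t1),
             weighted_mean_times(m2, t2), n_shared)
    cov = s1 * s2 * x
    return s1 + s2 - 2.0 * cov, cov


if __name__ == "__main__":
    # Hypothetical case: region 2 entirely inside region 1, same four years.
    t = [1, 2, 3, 4]
    m1 = [[50.0, 55.0, 60.0, 70.0], [20.0, 22.0, 25.0, 30.0]]
    m2 = [[5.0, 6.0, 7.0, 8.0], [2.0, 2.0, 3.0, 3.0]]
    share2 = [[1.0] * 4, [1.0] * 4]
    var_diff, cov = var_of_difference(m1, t, m2, t, share2, n_shared=4)
    print(var_diff, cov)
```

In this fully nested configuration the result agrees with the special case (23) of Corollary 9, $\sigma_{12}^2 - \sigma_{11}^2$; with `n_shared=0` the covariance vanishes and the usual sum of variances is recovered.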

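For the expression in the denominator of (15), we need to obtain the MPDEs of $\sigma_{1k}^2$, $k = 1, 2$, and $\xi_{12}$, namely $\hat\sigma_{1k,\lambda}^2$, $k = 1, 2$, and $\hat\xi_{12,\lambda}$ respectively. One way to proceed is to replace $\boldsymbol\beta^0$ by the most efficient MPDE

$$\hat{\boldsymbol\beta}_\lambda^0 \equiv \begin{cases} \hat{\boldsymbol\beta}_{1,\lambda}^0, & \text{if } N_1 \ge N_2, \\ \hat{\boldsymbol\beta}_{2,\lambda}^0, & \text{if } N_1 < N_2. \end{cases}$$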

An important advantage of this new methodology is that the expression of the denominator of (15) is explicit, easy to compute and easy to interpret. The term (20) determines the sign of (21). The structure of (20) is similar to the covariance proposed in the model of Li et al. (2007) for WLSEs, as well as to that of the estimators in the model of Li and Tiwari (2008). We can see that if there is no time-point shared by the two regions, i.e. $\bar I \ge I_1$, then $\hat\sigma_{1,12,\lambda} = 0$ and $\widehat{\mathrm{Var}}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}) = \hat\sigma_{11,\lambda}^2 + \hat\sigma_{12,\lambda}^2$; if there is no space overlap, then $m_{2ji}^{(2)}(\hat{\boldsymbol\beta}_\lambda^0) = 0$ for all $i$ and $j$ belonging to the overlapping subregion and hence, again, $\hat\sigma_{1,12,\lambda} = 0$ and $\widehat{\mathrm{Var}}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}) = \hat\sigma_{11,\lambda}^2 + \hat\sigma_{12,\lambda}^2$. On the other hand, when the two regions to be compared share at least one time-point and there is space overlap, $\widehat{\mathrm{Var}}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}) = \hat\sigma_{11,\lambda}^2 + \hat\sigma_{12,\lambda}^2 - 2\hat\sigma_{1,12,\lambda}$ holds, with $\hat\sigma_{1,12,\lambda} \neq 0$. Moreover, when the period of time not shared by the two regions is large (small), the covariance tends to be negative (positive) because the average values, $\tilde t_{1j}(\hat{\boldsymbol\beta}_{1,\lambda}^0)$ and $\tilde t_{2j}(\hat{\boldsymbol\beta}_{2,\lambda}^0)$, are more separated from (closer to) the time-points associated with the overlapping subregion. We shall later analyze this behaviour through a simulation study; we now investigate the structure of $\xi_{12}$ when the two regions to be compared share the whole period of time.

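Corollary 9 When $\bar I = 0$ and $I_1 = I_2$, under the hypothesis that $\beta_{11}^0 = \beta_{12}^0$,

$$\xi_{12} = \frac{1}{\sigma_{1(2)}^2} + \sum_{j=1}^{J} \frac{m_{1j}^{(1)}}{m_{1j}}\, \frac{m_{2j}^{(1)}}{m_{2j}}\, m_j^{(2)}\left(\tilde t_{1j}^{(1)}(\boldsymbol\beta^0) - \tilde t_{1j}^{(2)}(\boldsymbol\beta^0)\right)\left(\tilde t_{2j}^{(1)}(\boldsymbol\beta^0) - \tilde t_{2j}^{(2)}(\boldsymbol\beta^0)\right), \qquad (22)$$

with

$$\frac{1}{\sigma_{1(2)}^2} = \sum_{j=1}^{J}\sum_{i=1}^{I_2} m_{2ji}^{(2)}(\boldsymbol\beta^0)\left(t_{2i} - \tilde t_{2j}^{(2)}(\boldsymbol\beta^0)\right)^2, \qquad \tilde t_{kj}^{(b)}(\boldsymbol\beta^0) = \frac{\sum_{i=1}^{I_k} m_{kji}^{(b)}(\boldsymbol\beta^0)\, t_{ki}}{\sum_{i=1}^{I_k} m_{kji}^{(b)}(\boldsymbol\beta^0)},$$

$$m_{kj}^{(b)} = \sum_{i=1}^{I_k} m_{kji}^{(b)}(\boldsymbol\beta^0), \qquad m_{kj} = m_{kj}^{(1)} + m_{kj}^{(2)}, \qquad m_j^{(2)} = \sum_{i=1}^{I_2} m_{2ji}^{(2)}(\boldsymbol\beta^0) = \sum_{i=1}^{I_1} m_{1ji}^{(2)}(\boldsymbol\beta^0).$$

$\sigma_{1(2)}^2$ represents the variance of $\hat\beta_{12,\lambda}$ focussed on the overlapping subregion. In particular, if region 2 is completely contained in region 1, then $m_{2j}^{(1)} = 0$ for all $j = 1, \ldots, J$, $\xi_{12} = 1/\sigma_{1(2)}^2 = 1/\sigma_{12}^2$, and hence

$$\mathrm{Var}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}) = \sigma_{12}^2 - \sigma_{11}^2. \qquad (23)$$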

4 Simulation Studies and Analysis of SEER Mortality Data

When dealing with asymptotic results, it is interesting to analyze the performance of the theoretical results in an empirical framework. Specifically, for Poisson sampling it is important to calibrate how the total expected value of deaths (or incidences) $N_k$ affects the precision of the results. Other characteristics, such as the percentage of overlap “in space” or “in time”, as well as a suitable choice of λ, are also worth analyzing. As a preliminary study, before focussing on $N_k$, we have considered thyroid cancer mortality (a rare cancer) in three regions: the Western (W) US population (composed of Arizona, New Mexico and Texas), the South Western (SW) US population (composed of Arizona, California and Nevada) and the West Coast (WC) US population (composed of California, Oregon and Washington). APC comparisons of W vs. SW (Arizona is shared) on one hand and SW vs. WC (California is shared) on the other are considered. We have taken scenarios with different time periods: 1998–2007 for SW in all scenarios, and 1986–1995, 1989–1998, 1992–2001, 1995–2004 and 1998–2007 for the other region (W or WC) in scenarios A′, B′, C′, D′ and E′ respectively. Table 1 shows the percentage of expected deaths in the regions to be compared with respect to the shared part (the overlapping percentages), when β11 = β12 = −0.005 (APC1 = APC2 ≃ −0.5) for W vs. SW, and β13 = β14 = 0.02 for SW vs. WC (APC3 = APC4 ≃ 2.02). Observe that, within the same scenario, the difference in overlapping percentage between the two comparisons is due to the space overlap (the overlapping percentages are greater for SW vs. WC because the shared part is a large state, California). In addition, we have chosen several values of λ, λ ∈ {−0.5, 0, 2/3, 1, 1.5}, in order to compare the performance of the minimum power divergence estimators. Table 3 shows these results for W vs. SW. From scenario B′ to E′ (i.e. as the overlapping percentage increases), the covariance tends to increase: it starts with a negative value at B′ (1 time point is shared), decreases at C′ (4 time points are shared), reaches a small positive value at D′ (7 time points are shared) and ends with a large positive value at E′ (10 time points are shared). It seems that the sign of the covariance changes roughly in the middle of the time points considered for each of the regions. In scenario A′ the theoretical covariance is zero because the two regions do not share observations. An asterisk marks the variances and significance levels obtained by simulation that are greater than their corresponding theoretical values, in order to highlight the worst cases. From the results we conclude that the minimum power divergence estimators with λ = 1, that is, the minimum chi-square estimators (MCSEs), are empirically efficient and that their Z-test statistics perform well with respect to the theoretical significance level, in the sense that the simulated level tends to be smaller. We have omitted the results for SW vs. WC because we have seen that the space overlap by itself does not much affect the covariances of β̂1k,λ. That is, there was no remarkable difference among the covariances when choosing SW vs. WC rather than W vs. SW: the sign of the covariance changes at the same scenario, and only the value of the covariance differs between them. The behaviour of the minimum power divergence estimators is very similar too. Hence, in the simulation study that follows we focus only on fixed overlapping percentages, one of which is 100%, and on the MLEs and the MCSEs.

Table 1.

Overlapping percentages for W vs SW and SW vs WC in five scenarios.

space \ time   sc A              sc B              sc C              sc D             sc E
W vs SW        18.96%; 13.03%    12.66%; 9.12%     6.94%; 5.24%      1.66%; 1.32%     0%; 0%
SW vs WC       81.80%; 78.39%    59.09%; 54.06%    34.75%; 30.30%    8.93%; 7.40%     0%; 0%

Table 3.

Minimum Power Divergence Estimators with λ ∈ {−0.5, 0, 2/3, 1, 1.5} for scenarios A′, B′, C′, D′ and E′. Columns: scenario (sc); λ; $\sigma_{11}^2$; $\tilde\sigma_{11,\lambda}^2$; $\sigma_{12}^2$; $\tilde\sigma_{12,\lambda}^2$; $\sigma_{1,12}$; $\tilde\sigma_{1,12,\lambda}$; $\mathrm{Var}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda})$; $\widetilde{\mathrm{Var}}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda})$; $\tilde\alpha_\lambda$.

sc  λ     σ²_11      σ̃²_11,λ    σ²_12     σ̃²_12,λ   σ_1,12    σ̃_1,12,λ  Var        Var~         α̃_λ
A   −0.5  106106.94  117206.91  88722.57  96004.29  0.00      −190.20   194829.51  *213591.59   *0.056
A   0     106106.94  106482.52  88722.57  88399.60  0.00      −131.23   194829.51  *195144.57   0.050
A   2/3   106106.94  100968.49  88722.57  84561.80  0.00      −64.51    194829.51  185659.31    0.047
A   1     106106.94  99842.89   88722.57  83793.94  0.00      −34.77    194829.51  183706.37    0.047
A   1.5   106106.94  99346.15   88722.57  83510.99  0.00      2.27      194829.51  182852.60    0.049
B   −0.5  106106.94  117293.27  83850.45  92311.97  −4020.16  −3833.28  197997.72  *217271.80   *0.058
B   0     106106.94  106707.01  83850.45  85398.66  −4020.16  −3490.04  197997.72  *199085.75   0.051
B   2/3   106106.94  101342.71  83850.45  81753.09  −4020.16  −3229.76  197997.72  189555.32    0.049
B   1     106106.94  100261.70  83850.45  80985.96  −4020.16  −3142.66  197997.72  187532.98    0.049
B   1.5   106106.94  99807.30   83850.45  80649.82  −4020.16  −3047.64  197997.72  186552.39    *0.052
C   −0.5  106106.94  116056.24  79295.40  84620.24  −6035.64  −5099.81  197473.63  *210876.09   *0.055
C   0     106106.94  105572.08  79295.40  78400.39  −6035.64  −4630.90  197473.63  193234.25    0.048
C   2/3   106106.94  100178.76  79295.40  75138.00  −6035.64  −4302.56  197473.63  183921.87    0.046
C   1     106106.94  99090.96   79295.40  74470.54  −6035.64  −4199.62  197473.63  181960.74    0.046
C   1.5   106106.94  98646.02   79295.40  74214.18  −6035.64  −4094.68  197473.63  181049.56    0.049
D   −0.5  106106.94  115548.66  74971.59  81107.54  2294.32   2271.85   176489.89  *192112.50   *0.057
D   0     106106.94  104872.37  74971.59  75820.32  2294.32   2148.49   176489.89  176395.71    0.050
D   2/3   106106.94  99400.99   74971.59  72923.02  2294.32   2060.12   176489.89  168203.77    0.048
D   1     106106.94  98300.76   74971.59  72306.86  2294.32   2034.16   176489.89  166539.31    0.050
D   1.5   106106.94  97854.16   74971.59  72044.77  2294.32   2011.33   176489.89  165876.27    *0.052
E   −0.5  106106.94  115740.28  70747.13  75885.14  15621.44  17152.32  145611.20  *157320.78   *0.055
E   0     106106.94  105114.37  70747.13  71094.62  15621.44  16123.83  145611.20  143961.33    0.048
E   2/3   106106.94  99710.57   70747.13  68383.44  15621.44  15273.05  145611.20  137547.92    0.047
E   1     106106.94  98636.63   70747.13  67789.30  15621.44  14953.01  145611.20  136519.90    0.047
E   1.5   106106.94  98219.30   70747.13  67513.89  15621.44  14557.46  145611.20  136618.26    0.049

To study how the precision of the results changes with Nk, we have considered three proportionality constants κ ∈ {1, 1/100, 1/300} associated with Nk in each of the following scenarios for Regions 1 and 2. The slope β1k ∈ {0.02, 0.005, 0, −0.005} is equal for both regions (k = 1, 2), as required by the null hypothesis, i.e. APC1 = APC2 ≃ 2.02, APC1 = APC2 ≃ 0.50, APC1 = APC2 = 0 and APC1 = APC2 ≃ −0.50, respectively:

  • Scenario A: Low level overlapping regions, I1 = 6, I2 = 11, I1 − Ī = 3.

  • Scenario B: Medium level overlapping regions, I1 = 10, I2 = 11, I1 − Ī = 7.

  • Scenario C: High level overlapping regions, I1 = 8, I2 = 8, I1 − Ī = 8.
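The APC values quoted above agree with the usual log-linear relation APC = 100(exp(β1k) − 1). That relation is assumed here, since the definition is not restated in this section. A minimal Python sketch of the conversion (the function name is illustrative, not taken from the paper):

```python
import math


def apc_from_slope(beta1):
    """Annual percent change implied by a log-linear slope (assumed definition)."""
    return 100.0 * (math.exp(beta1) - 1.0)


# Slopes used in the simulation scenarios.
slopes = [0.02, 0.005, 0.0, -0.005]
apcs = [round(apc_from_slope(b), 2) for b in slopes]
print(apcs)
```

Rounded to two decimals, the four slopes give the APC values 2.02, 0.50, 0 and −0.50 stated for the null hypothesis.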

The values of nkji have been obtained from real data sets for females:

  • Scenario A: Region 1 = United States (US) during 1993–1998, Region 2 = California (CA) 1996–2006.

  • Scenario B: Region 1 = US during 1993–2002, Region 2 = CA during 1996–2006.

  • Scenario C: Region 1 = US during 1999–2006, Region 2 = CA during 1999–2006.

From the same data sets we have taken $\beta_{0kj} = \log(\kappa D_{kj1}/n_{kj1}) - \beta_{1k} t_{k1}$, focusing on breast cancer in the first year of the time interval (i = 1). All these data were obtained from the SEER database, and hence we take into account J = 19 age groups. Once these parameters have been established, we can compute theoretically the individual variances $\sigma_{1k}^2$ of the estimators $\hat\beta_{1k,\lambda}$, the covariance $\sigma_{1,12}$, and $\mathrm{Var}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}) = \sigma_{11}^2 + \sigma_{12}^2 - 2\sigma_{1,12}$. We can also compute the theoretical value of $\eta_k \equiv N_k/(J I_k)$, the average expected number of deaths per cell, which is useful for judging whether $N_k$ is large enough; these values are shown in Table 2.
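As a quick arithmetic check of the identity for the variance of the difference, the following sketch plugs in the theoretical values reported in the first row of Table 4 (Scenario A, κ = 1, β1k = 0.020, all multiplied by 10^9). The function name is illustrative, and agreement is only expected up to the rounding of the tabulated values.

```python
def var_of_difference(var_1, var_2, cov_12):
    """Variance of the difference of two correlated slope estimators."""
    return var_1 + var_2 - 2.0 * cov_12


# Theoretical values from the first row of Table 4 (units of 1e-9).
sigma11_sq, sigma12_sq, sigma_1_12 = 1188.50, 1468.14, -152.70
var_diff = var_of_difference(sigma11_sq, sigma12_sq, sigma_1_12)
print(round(var_diff, 2))
```

The result differs from the tabulated 2962.03 only in the second decimal, which is consistent with rounding of the inputs.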

Table 2.

Average total expected means of deaths per cell.

κ β1k η1(A) η2(A) η1(B) η2(B) η1(C) η2(C)
1 0.020 2538.24 331.42 2741.10 331.42 2493.85 265.98
1 0.005 2441.69 292.43 2552.96 292.43 2360.81 251.71
1 0.000 2410.67 280.62 2494.19 280.619 2318.67 247.19
1 −0.005 2380.23 269.35 2437.28 269.35 2277.59 242.79
1/100 0.020 25.38 3.31 27.41 3.31 24.94 2.66
1/100 0.005 24.42 2.92 25.53 2.92 23.61 2.52
1/100 0.000 24.11 2.81 24.94 2.81 23.19 2.47
1/100 −0.005 23.80 2.69 24.37 2.69 22.77 2.43
1/300 0.020 8.46 1.10 9.14 1.10 8.31 0.89
1/300 0.005 8.14 0.97 8.51 0.97 7.87 0.84
1/300 0.000 8.03 0.93 8.31 0.93 7.73 0.82
1/300 −0.005 7.93 0.90 8.12 0.90 7.59 0.81

Since both regions share a common space, we first generated the death counts of the shared space by simulation and then, thanks to the reproductive property of the Poisson distribution under summation, generated the death counts for each region by adding the complementary Poisson observations. Tables 4, 6 and 8 summarize the theoretical results together with those obtained by simulation for the MLEs, and Tables 5, 7 and 9 do the same for the MCSEs. The variances and covariances appear multiplied by 10^9 in all the tables. Tilde notation marks the quantities that have been calculated by simulation with R = 22,000 replications:

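$$\tilde\sigma_{1k,\lambda}^2 = \frac{1}{R}\sum_{r=1}^{R}\left(\hat\beta_{1k,\lambda}(r) - \tilde{E}[\hat\beta_{1k,\lambda}]\right)^2, \qquad \tilde{E}[\hat\beta_{1k,\lambda}] = \frac{1}{R}\sum_{r=1}^{R}\hat\beta_{1k,\lambda}(r),$$

$$\tilde\sigma_{1,12,\lambda} = \frac{1}{R}\sum_{r=1}^{R}\left(\hat\beta_{11,\lambda}(r) - \tilde{E}[\hat\beta_{11,\lambda}]\right)\left(\hat\beta_{12,\lambda}(r) - \tilde{E}[\hat\beta_{12,\lambda}]\right).$$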
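The generation scheme and the Monte Carlo summaries above can be sketched as follows for a single cell. This is a simplified illustration, not the simulation code of the study: the three cell means are hypothetical, the Poisson sampler is Knuth's multiplication method (chosen only to keep the sketch dependency-free, and adequate for small means), and the summaries are applied to raw counts rather than to the estimators. Sharing the common count induces a positive covariance equal to the shared mean.

```python
import math
import random


def poisson_knuth(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_cell(mean_shared, mean_only_1, mean_only_2, rng):
    """Counts for two overlapping regions: shared part plus complementary parts."""
    shared = poisson_knuth(mean_shared, rng)
    d1 = shared + poisson_knuth(mean_only_1, rng)
    d2 = shared + poisson_knuth(mean_only_2, rng)
    return d1, d2


def mc_mean(values):
    return sum(values) / len(values)


def mc_covariance(xs, ys):
    """Monte Carlo covariance with divisor R, as in the tilde formulas."""
    mx, my = mc_mean(xs), mc_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(2011)
R = 22000  # number of replications used in the text
pairs = [simulate_cell(2.0, 6.0, 0.5, rng) for _ in range(R)]
d1 = [p[0] for p in pairs]
d2 = [p[1] for p in pairs]
var_d1 = mc_covariance(d1, d1)   # variance is the covariance of a sample with itself
cov_d12 = mc_covariance(d1, d2)
print(round(mc_mean(d1), 2), round(var_d1, 2), round(cov_d12, 2))
```

With these hypothetical means, the region 1 count is Poisson with mean 8 and the covariance between the two regional counts is close to the shared mean 2.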

Table 4.

Scenario A. Maximum Likelihood Estimators (λ = 0).

κ β1k $\sigma_{11}^2$ $\tilde\sigma_{11,\lambda}^2$ $\sigma_{12}^2$ $\tilde\sigma_{12,\lambda}^2$ $\sigma_{1,12}$ $\tilde\sigma_{1,12,\lambda}$ $\mathrm{Var}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\widetilde{\mathrm{Var}}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\tilde\alpha_\lambda$
1 0.020 1188.50 1196.88 1468.14 1475.38 −152.70 −153.71 2962.03 *2979.67 0.049
1 0.005 1233.49 1245.97 1653.48 1644.16 −166.41 −159.88 3219.78 3209.89 0.050
1 0.000 1248.91 1237.81 1720.50 1707.01 −171.20 −156.44 3311.81 3257.70 0.049
1 −0.005 1264.55 1276.82 1790.33 1801.21 −176.11 −185.35 3407.10 *3448.73 *0.052
1/100 0.020 118849.86 120545.66 146813.66 146902.40 −15269.95 −16186.46 296203.41 *299820.97 *0.052
1/100 0.005 123349.03 123414.45 165348.33 167479.53 −16640.50 −16374.79 321978.37 *323643.55 *0.052
1/100 0.000 124891.15 124618.15 172050.41 173875.81 −17119.82 −17946.96 331181.19 *334387.88 0.050
1/100 −0.005 126455.01 125135.69 179033.49 181356.96 −17610.72 −15914.04 340709.94 338320.73 0.047
1/300 0.020 356549.59 359581.84 440440.97 451288.03 −45809.84 −53204.15 888610.23 *917278.18 *0.052
1/300 0.005 370047.09 373291.90 496045.00 503332.51 −49921.51 −51558.77 965935.10 *979741.96 0.050
1/300 0.000 374673.44 375119.77 516151.22 532280.30 −51359.46 −50448.54 993543.56 *1008297.13 *0.051
1/300 −0.005 379365.02 380562.71 537100.47 562780.79 −52832.16 −58143.46 1022129.82 *1059630.42 *0.054

Table 6.

Scenario B: Maximum Likelihood Estimators (λ = 0).

κ β1k $\sigma_{11}^2$ $\tilde\sigma_{11,\lambda}^2$ $\sigma_{12}^2$ $\tilde\sigma_{12,\lambda}^2$ $\sigma_{1,12}$ $\tilde\sigma_{1,12,\lambda}$ $\mathrm{Var}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\widetilde{\mathrm{Var}}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\tilde\alpha_\lambda$
1 0.020 234.90 234.40 1468.14 1461.71 12.72 6.74 1677.59 *1682.63 0.050
1 0.005 251.10 252.79 1653.48 1648.47 13.91 10.97 1876.77 *1879.32 *0.052
1 0.000 256.77 255.02 1720.50 1713.16 14.35 7.81 1948.56 *1952.57 0.050
1 −0.005 262.57 261.96 1790.33 1792.15 14.83 17.78 2023.24 2018.56 0.049
1/100 0.020 23489.78 23328.11 146813.66 147774.27 1272.17 181.93 167759.10 *170738.52 *0.053
1/100 0.005 25109.90 24424.06 165348.33 147273.99 1390.58 1546.90 187677.08 168604.26 0.049
1/100 0.000 25676.71 25666.21 172050.41 171995.21 1435.45 822.83 194856.21 *196015.76 *0.052
1/100 −0.005 26257.50 26172.50 179033.49 179024.69 1483.35 708.28 202324.30 *203780.65 *0.051
1/300 0.020 70469.35 71112.09 440440.97 442433.57 3816.51 2392.77 503277.31 *508760.12 *0.052
1/300 0.005 75329.71 74737.59 496045.00 510147.59 4171.74 3181.74 563031.24 *578521.71 *0.053
1/300 0.000 77030.13 76849.11 516151.22 521168.35 4306.36 2781.06 584568.62 *592455.34 0.050
1/300 −0.005 78772.49 79582.80 537100.47 545463.20 4450.04 5288.71 606972.89 *614468.57 0.050

Table 8.

Scenario C: Maximum Likelihood Estimators (λ = 0).

κ β1k $\sigma_{11}^2$ $\tilde\sigma_{11,\lambda}^2$ $\sigma_{12}^2$ $\tilde\sigma_{12,\lambda}^2$ $\sigma_{1,12}$ $\tilde\sigma_{1,12,\lambda}$ $\mathrm{Var}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\widetilde{\mathrm{Var}}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\tilde\alpha_\lambda$
1 0.020 505.19 502.38 4753.38 4766.57 505.19 515.35 4248.20 4238.26 0.050
1 0.005 532.12 529.53 5006.55 4962.78 532.12 527.77 4474.43 4436.77 0.049
1 0.000 541.45 543.59 5094.21 5129.48 541.45 549.19 4552.76 *4574.69 0.050
1 −0.005 550.96 550.15 5183.56 5202.96 550.96 563.62 4632.60 4625.87 0.051
1/100 0.020 50518.62 50823.72 475338.31 480772.05 50518.62 52893.29 424819.68 *425809.19 0.050
1/100 0.005 53212.46 53963.47 500654.98 500398.48 53212.46 53825.39 447442.52 446711.16 0.049
1/100 0.000 54145.29 53655.97 509420.86 511073.61 54145.29 55232.68 455275.57 454264.23 0.050
1/100 −0.005 55096.19 55610.24 518356.01 521012.61 55096.19 56166.72 463259.82 *464289.42 0.050
1/300 0.020 151555.86 149118.50 1426014.92 1461021.71 151555.86 152950.11 1274459.05 *1304240.00 0.051
1/300 0.005 159637.38 161042.69 1501964.94 1529631.00 159637.38 160328.08 1342327.55 *1370017.53 0.049
1/300 0.000 162435.88 162828.12 1528262.58 1534795.18 162435.88 165625.27 1365826.70 *1366372.76 0.047
1/300 −0.005 165288.56 165289.23 1555068.02 1599312.35 165288.56 168363.58 1389779.46 *1427874.42 0.050

Table 5.

Scenario A. Minimum Chi-Square Estimators (λ = 1).

κ β1k $\sigma_{11}^2$ $\tilde\sigma_{11,\lambda}^2$ $\sigma_{12}^2$ $\tilde\sigma_{12,\lambda}^2$ $\sigma_{1,12}$ $\tilde\sigma_{1,12,\lambda}$ $\mathrm{Var}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\widetilde{\mathrm{Var}}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\tilde\alpha_\lambda$
1 0.020 1188.50 1196.61 1468.14 1474.56 −152.70 −153.88 2962.03 *2978.92 0.049
1 0.005 1233.49 1245.63 1653.48 1642.03 −166.41 −158.86 3219.78 3205.38 0.049
1 0.000 1248.91 1237.43 1720.50 1704.17 −171.20 −156.10 3311.81 3253.80 0.049
1 −0.005 1264.55 1276.42 1790.33 1797.77 −176.11 −185.22 3407.10 *3444.64 *0.051
1/100 0.020 118849.86 118678.59 146813.66 131711.15 −15269.95 −14717.95 296203.41 279825.64 *0.051
1/100 0.005 123349.03 121351.67 165348.33 148155.30 −16640.50 −14873.11 321978.37 299253.18 0.049
1/100 0.000 124891.15 122229.69 172050.41 152728.68 −17119.82 −16259.50 331181.19 307477.36 0.048
1/100 −0.005 126455.01 122628.20 179033.49 158913.98 −17610.72 −14215.39 340709.94 309972.96 0.045
1/300 0.020 356549.59 342888.09 440440.97 340354.35 −45809.84 −42220.99 888610.23 767684.41 0.050
1/300 0.005 370047.09 354377.57 496045.00 369058.29 −49921.51 −41173.07 965935.10 805782.01 0.045
1/300 0.000 374673.44 356408.32 516151.22 388239.25 −51359.46 −38265.08 993543.56 821177.73 0.045
1/300 −0.005 379365.02 360799.63 537100.47 403732.21 −52832.16 −47176.61 1022129.82 858885.06 0.045

Table 7.

Scenario B: Minimum Chi-Square Estimators (λ = 1).

κ β1k $\sigma_{11}^2$ $\tilde\sigma_{11,\lambda}^2$ $\sigma_{12}^2$ $\tilde\sigma_{12,\lambda}^2$ $\sigma_{1,12}$ $\tilde\sigma_{1,12,\lambda}$ $\mathrm{Var}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\widetilde{\mathrm{Var}}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\tilde\alpha_\lambda$
1 0.020 234.90 234.36 1468.14 1459.11 12.72 6.73 1677.59 *1680.02 0.050
1 0.005 251.10 252.87 1653.48 1646.88 13.91 10.79 1876.77 *1878.16 *0.052
1 0.000 256.77 255.00 1720.50 1710.41 14.35 7.53 1948.56 *1950.35 0.050
1 −0.005 262.57 261.91 1790.33 1790.65 14.83 17.89 2023.24 2016.78 0.049
1/100 0.020 23489.78 23039.44 146813.66 132848.97 1272.17 204.75 167759.10 155478.91 *0.056
1/100 0.005 25109.90 24424.06 165348.33 147273.99 1390.58 1546.90 187677.08 168604.26 0.049
1/100 0.000 25676.71 25227.72 172050.41 152123.09 1435.45 950.07 194856.21 175450.67 0.049
1/100 −0.005 26257.50 25713.71 179033.49 157016.58 1483.35 608.74 202324.30 181512.81 0.050
1/300 0.020 70469.35 68417.63 440440.97 333558.97 3816.51 2545.19 503277.31 396886.23 *0.055
1/300 0.005 75329.71 71630.87 496045.00 375196.57 4171.74 2568.93 563031.24 441689.58 0.049
1/300 0.000 77030.13 73435.63 516151.22 380384.11 4306.36 1980.50 584568.62 449858.76 0.046
1/300 −0.005 78772.49 75952.38 537100.47 394349.52 4450.04 3665.47 606972.89 462970.97 0.046

Table 9.

Scenario C: Minimum Chi-Square Estimators (λ = 1).

κ β1k $\sigma_{11}^2$ $\tilde\sigma_{11,\lambda}^2$ $\sigma_{12}^2$ $\tilde\sigma_{12,\lambda}^2$ $\sigma_{1,12}$ $\tilde\sigma_{1,12,\lambda}$ $\mathrm{Var}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\widetilde{\mathrm{Var}}(\hat\beta_{11,\lambda}-\hat\beta_{12,\lambda})$ $\tilde\alpha_\lambda$
1 0.020 505.19 502.28 4753.38 4756.74 505.19 514.11 4248.20 4230.79 *0.051
1 0.005 532.12 529.39 5006.55 4956.27 532.12 527.17 4474.43 4431.32 0.049
1 0.000 541.45 543.50 5094.21 5120.89 541.45 549.07 4552.76 *4566.24 0.050
1 −0.005 550.96 550.04 5183.56 5194.95 550.96 563.79 4632.60 4617.41 *0.051
1/100 0.020 50518.62 49937.40 475338.31 417941.93 50518.62 47230.21 424819.68 373418.90 0.050
1/100 0.005 53212.46 53092.20 500654.98 434030.70 53212.46 48517.38 447442.52 390088.14 0.048
1/100 0.000 54145.29 52785.97 509420.86 441697.10 54145.29 49680.01 455275.57 395123.06 0.046
1/100 −0.005 55096.19 54621.08 518356.01 449926.19 55096.19 50651.05 463259.82 403245.16 0.047
1/300 0.020 151555.86 141857.76 1426014.92 1037025.40 151555.86 119830.05 1274459.05 939223.06 0.048
1/300 0.005 159637.38 153101.35 1501964.94 1075673.38 159637.38 123845.16 1342327.55 981084.41 0.046
1/300 0.000 162435.88 154380.49 1528262.58 1074400.76 162435.88 128138.62 1365826.70 972504.01 0.043
1/300 −0.005 165288.56 57146.31 1555068.02 1110194.55 165288.56 131114.59 1389779.46 1005111.67 0.044

It is important to remark that such a large number of replications has been chosen in order to reach a reliable precision in the simulation study (e.g., R = 10,000 was found not to be large enough). The last column refers to the exact significance level associated with the Z-test, obtained by simulation when the nominal significance level is α = 0.05,

$$\tilde\alpha_\lambda = \frac{1}{R}\sum_{r=1}^{R} I\left(|Z_\lambda(r)| > z_{0.975}\right),$$

where I() is an indicator function and z0.975 ≃ 1.96 the quantile of order 0.975 for the standard normal distribution.
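The simulated significance level is simply the fraction of replications whose statistic falls in the two-sided rejection region. A minimal sketch, using the standard library's `statistics.NormalDist` for the quantile; the list of Z values is a toy input, not simulation output from the study:

```python
from statistics import NormalDist


def empirical_level(z_values, alpha=0.05):
    """Fraction of replications with |Z| above the two-sided normal critical value."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    rejections = sum(1 for z in z_values if abs(z) > z_crit)
    return rejections / len(z_values)


# Toy replicated statistics (hypothetical values).
toy_z = [-2.5, -1.0, 0.3, 1.97, 1.95, 2.1]
level = empirical_level(toy_z)
print(level)
```

In the toy input three of the six values exceed the critical value in absolute terms, so the estimated level is 0.5; with R = 22,000 replications under the null hypothesis the same computation yields the last column of the tables.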

As expected, the covariance is positive in all cases in Scenario C, while it is negative in Scenario A. It is clear that the precision of $\widetilde{\mathrm{Var}}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda})$, as well as that of $\tilde\alpha_\lambda$, improves as κ increases. While for large data sets (κ = 1) there is no best choice of λ, for small data sets (κ = 1/300) the choice in favour of λ = 1 is clear because the estimators $\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}$ are more efficient, in fact $\widetilde{\mathrm{Var}}(\hat\beta_{11,1} - \hat\beta_{12,1}) < \mathrm{Var}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}) < \widetilde{\mathrm{Var}}(\hat\beta_{11,0} - \hat\beta_{12,0})$, and the exact significance level (the estimated type I error) is no greater than for λ = 0 in all cases ($\tilde\alpha_1 \le \tilde\alpha_0$). Since the type II error could be smaller for the MLEs, the power functions of both estimators have been studied. In particular, for κ = 1/300 the same behaviour as in Figure 2 was observed: for differences β = β11 − β12 equidistant from zero, with $\beta_{11}^0$ fixed, if the type II error is smaller for the MLEs when β > 0 (β < 0), then it is smaller for the MCSEs when β < 0 (β > 0). Hence, in overall terms we recommend using MCSEs rather than MLEs for small data sets. This is the case, for instance, in the study of Riddell and Pliska (2008), where there are many cases in which the value of $\hat\eta_k = \sum_{j=1}^{J}\sum_{i=1}^{I_k} d_{kji}/(J I_k)$ is quite low (moreover, several cases with $\hat\eta_k < 12/19$ appear without any estimate being given “due to instability of small numbers”).

Figure 2.

Power function in terms of β = β11 − β12 when $\beta_{11}^0 = 0$, for Scenario A and κ = 1/300.

We have applied our proposed methodology to real data in order to compare the APCs in the age-adjusted mortality rates of WC, SW and W (described at the beginning of this section) for different periods of time, 1969–1983, 1977–1991 and 1990–1999 respectively, with both estimators and for Thyroid cancer (a rare cancer). The third region differs from the rest in that a shorter period of time is considered for its study. The rates are expressed per 100,000 individuals at risk. The fitted models are plotted in Figure 3, and at first sight there seems to be a decreasing trend for Thyroid cancer in WC and SW, and a null or decreasing trend in W. The specific values of the estimates and of the test-statistics $Z_\lambda$, for λ = 0, 1, are summarized in Table 10. Apart from the appropriate test-statistic, we have included the naive test-statistics, for λ = 0, 1, that are obtained by applying the methodology for non-overlapping regions. For Thyroid cancer there is no evidence for rejecting the hypothesis of equal APCs for SW and W, but the conclusion is not clear for WC and SW. Looking at the confidence intervals for each region, observe that for WC and SW the test-statistic has more power to discriminate differences than for SW and W, because the variability is smaller (the period of time considered for W is shorter). The hypothesis of equal APCs for WC and SW is rejected at the 0.05 significance level when using the naive test, and cannot be rejected when using the proper test-statistic for overlapping regions (although its p-value is close to 0.05). When dealing with common cancer types, the same sample difference in APCs would probably lead to rejecting the null hypothesis.

Figure 3.

MCSE and MLE for Thyroid cancer mortality trends in WC, SW and W.

Table 10.

Thyroid cancer mortality trends comparison among WC, SW and W during 1969–1983, 1977–1991 and 1990–1999 respectively: Maximum Likelihood Estimators and Minimum Chi-Square Estimators.

Region k λ $\hat\beta_{1k,\lambda}$ $\hat\beta_{0k,\lambda}$ $\hat\sigma_{1k,\lambda}^2$ $\hat\sigma_{1,k,k+1,\lambda}^2$ $\widehat{APC}_{k,\lambda}$ $CI_{APC_k}(95\%)$
WC 1 0 −0.0267 −0.3680 2.923×10^{−5} −77.292×10^{−5} −2.639 (−3.665, −1.601)
WC 1 1 −0.0268 −0.3241 2.785×10^{−5} −73.429×10^{−5} −2.646 (−3.648, −1.635)

SW 2 0 −0.0107 −0.5404 3.044×10^{−5} −32.915×10^{−5} −1.064 (−2.128, 0.011)
SW 2 1 −0.0106 −0.4943 2.888×10^{−5} −3.1074×10^{−5} −1.053 (−2.089, −0.005)

W 3 0 0.0003 −0.7939 13.064×10^{−5} n/a 0.031 (−2.184, 2.297)
W 3 1 −0.0012 −0.7084 12.421×10^{−5} n/a −0.117 (−2.275, 2.088)

Z-test statistics for WC vs. SW (naive statistic in parentheses): Z12,0 = −1.85 (−2.08); Z12,1 = −1.92 (−2.16)
Z-test statistics for SW vs. W (naive statistic in parentheses): Z23,0 = −0.85 (−0.87); Z23,1 = −0.75 (−0.76)
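The APC columns and the naive statistics of Table 10 can be approximately reproduced from the tabulated slopes and variances. The sketch below assumes APC = 100(exp(β1k) − 1), a confidence interval obtained by transforming β̂1k ± 1.96 σ̂1k, and a naive statistic that sets the covariance to zero; these formulas are assumptions consistent with the reported numbers rather than definitions restated in this section, and the function names are illustrative. Agreement is only up to the rounding of the tabulated estimates.

```python
import math

Z_975 = 1.96  # normal quantile quoted in the text


def apc(beta1):
    """Annual percent change implied by a log-linear slope (assumed definition)."""
    return 100.0 * (math.exp(beta1) - 1.0)


def apc_interval(beta1, var_beta1, z=Z_975):
    """Interval for the APC obtained by transforming the interval for the slope."""
    half_width = z * math.sqrt(var_beta1)
    return apc(beta1 - half_width), apc(beta1 + half_width)


def z_statistic(beta_a, beta_b, var_a, var_b, cov_ab=0.0):
    """Z statistic for equal slopes; cov_ab = 0 gives the naive version."""
    return (beta_a - beta_b) / math.sqrt(var_a + var_b - 2.0 * cov_ab)


# MLE rows (lambda = 0) of Table 10: (slope, variance of the slope).
wc = (-0.0267, 2.923e-5)
sw = (-0.0107, 3.044e-5)

low, high = apc_interval(*wc)
naive_z = z_statistic(wc[0], sw[0], wc[1], sw[1])
print(round(apc(wc[0]), 2), round(low, 2), round(high, 2), round(naive_z, 2))
```

Passing an estimated covariance through `cov_ab` gives the statistic for overlapping regions; a negative covariance enlarges the denominator and moves the statistic towards zero, which is the direction of the difference between the proper and naive values reported above.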

5 Concluding Remarks

In this work, we have dealt with an important problem of comparing the changing trends of cancer mortality/incidence rates between two overlapping regions. Our new proposal allows us to correctly account for the correlation induced by the overlapping regions when drawing statistical inference. The better finite sample performance of the minimum chi-square estimators, in comparison with the maximum likelihood estimators, suggests the practical utility of the proposed methods especially when comparing the APCs of rare cancers. Not only do our results verify the claim of Berkson (1980) that the efficiency of the maximum likelihood estimator is questionable for the finite sample size situations, they also encompass the Poisson models, for which the power-divergence based theoretical results (in particular for the minimum chi-square estimators) have remained elusive. In this paper, we have mainly focused on comparing two regions. Extending the methods to accommodate more than two regions simultaneously is certainly worthy of future investigations.

Acknowledgements

The authors thank the associate editor and two referees for their valuable comments and suggestions. This work was carried out during the stay of the first author as Visiting Scientist at Harvard University and Dana Farber Cancer Institute, supported by the Real Colegio Complutense and grant MTM2009-10072.

Technical Appendix

Proof of Theorem 4

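Let $\Delta_{M_k}$ be the set of all possible $M_k$-dimensional probability vectors and $C_{M_k} = (0,1) \times \overset{M_k}{\cdots} \times (0,1)$. The way in which $N_k$ increases is such that $\mathrm{Diag}^{-1}(n_k)\, m_k(\beta_k)$ does not change, hence $m_s(\beta_k)$ and $n_s$ increase at the same time ($s = 1, \dots, M_k$). This means that as $N_k$ increases, the parameter $\beta_k$ does not change, and neither does the normalized mean vector of deaths, $m_k^*(\beta_k) = \frac{1}{N_k} m_k(\beta_k)$. Note that $m_k^*(\beta_k) \in \Delta_{M_k} \subset C_{M_k}$. Let $V \subset \mathbb{R}^{J+1}$ be a neighbourhood of $\beta_k^0$ and consider the function

$$F_{N_k}^{(\lambda)} = \left(F_1^{(\lambda)}, \dots, F_{J+1}^{(\lambda)}\right) : C_{M_k} \times \mathbb{R}^{J+1} \longrightarrow \mathbb{R}^{J+1},$$

so that

$$F_i^{(\lambda)}(m_k^*, \beta_k) = \frac{\partial d_\lambda(N_k m_k^*, m_k(\beta_k))}{\partial \theta_{ki}}, \qquad i = 1, \dots, J+1,$$

with $\beta_k = (\beta_{0k1}, \dots, \beta_{0kJ}, \beta_{1k})^T = (\theta_{k1}, \dots, \theta_{kJ}, \theta_{k,J+1})^T \in V$ and $m_k^* = (m_1^*, \dots, m_{M_k}^*)^T \in \Delta_{M_k} \subset C_{M_k}$.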

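More thoroughly, considering $X_k = (x_{si})_{s=1,\dots,M_k;\; i=1,\dots,J+1}$ and $d_\lambda(D_k, m_k(\beta_k)) = \sum_{s=1}^{M_k} m_s(\beta_k)\, \phi_\lambda\!\left(\frac{D_s}{m_s(\beta_k)}\right)$, where

$$\phi_\lambda(x) = \begin{cases} \dfrac{x^{\lambda+1} - x - \lambda(x-1)}{\lambda(\lambda+1)}, & \lambda(\lambda+1) \neq 0, \\[2ex] \lim_{\alpha \to \lambda} \phi_\alpha(x), & \lambda(\lambda+1) = 0, \end{cases}$$

it holds

$$F_i^{(\lambda)}(m_k^*, \beta_k) = \sum_{s=1}^{M_k} m_s(\beta_k)\, x_{si} \left( \phi_\lambda\!\left(\frac{N_k m_s^*}{m_s(\beta_k)}\right) - \frac{N_k m_s^*}{m_s(\beta_k)}\, \phi'_\lambda\!\left(\frac{N_k m_s^*}{m_s(\beta_k)}\right) \right).$$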
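The function $\phi_\lambda$ and the divergence $d_\lambda$ defined above can be sketched numerically as follows. The closed forms used for the two limiting cases, $x\log x - x + 1$ for λ = 0 and $-\log x + x - 1$ for λ = −1, are the standard limits of the general expression and are stated here as an addition to the text; the function names are illustrative, and the sketch assumes strictly positive arguments.

```python
import math


def phi(x, lam):
    """Power-divergence generator phi_lambda(x) for x > 0."""
    if lam == 0:
        return x * math.log(x) - x + 1.0      # limit as lambda -> 0
    if lam == -1:
        return -math.log(x) + x - 1.0         # limit as lambda -> -1
    return (x ** (lam + 1) - x - lam * (x - 1.0)) / (lam * (lam + 1.0))


def power_divergence(d, m, lam):
    """d_lambda(D, m) = sum_s m_s * phi_lambda(D_s / m_s), with D_s, m_s > 0."""
    return sum(m_s * phi(d_s / m_s, lam) for d_s, m_s in zip(d, m))


# Toy observed counts and expected means (hypothetical values).
observed = [3.0, 5.0]
expected = [4.0, 4.0]
chi_square_type = power_divergence(observed, expected, 1)   # lambda = 1
likelihood_type = power_divergence(observed, expected, 0)   # lambda = 0
print(chi_square_type, round(likelihood_type, 4))
```

For λ = 1 the generator reduces to $(x-1)^2/2$, so the divergence is half the Pearson chi-square statistic; minimizing it over $\beta_k$ gives the minimum chi-square estimator, while λ = 0 corresponds to the maximum likelihood estimator.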

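It can be seen that, replacing $m_k^*$ by $m_k^*(\beta_k^0)$ and $\beta_k$ by $\beta_k^0$, it holds $F_i^{(\lambda)}(m_k^*(\beta_k^0), \beta_k^0) = 0$ for all $i = 1, \dots, J+1$. We shall now establish that the Jacobian matrix

$$\frac{\partial F_{N_k}^{(\lambda)}(m_k^*, \beta_k)}{\partial \beta_k} = \left( \frac{\partial F_i^{(\lambda)}(m_k^*, \beta_k)}{\partial \theta_{kj}} \right)_{i,j = 1, \dots, J+1}$$

is nonsingular when $(m_k^*, \beta_k) = (m_k^*(\beta_k^0), \beta_k^0)$. For $i, j = 1, \dots, J+1$,

$$\begin{aligned}
\frac{\partial F_i^{(\lambda)}(m_k^*, \beta_k)}{\partial \theta_{kj}}
&= \frac{\partial}{\partial \theta_{kj}} \frac{\partial d_\lambda(N_k m_k^*, m_k(\beta_k))}{\partial \theta_{ki}} \\
&= \frac{\partial}{\partial \theta_{kj}} \left( \sum_{s=1}^{M_k} m_s(\beta_k)\, x_{si} \left( \phi_\lambda\!\left(\frac{N_k m_s^*}{m_s(\beta_k)}\right) - \frac{N_k m_s^*}{m_s(\beta_k)}\, \phi'_\lambda\!\left(\frac{N_k m_s^*}{m_s(\beta_k)}\right) \right) \right) \\
&= \sum_{s=1}^{M_k} m_s(\beta_k)\, x_{si} x_{sj} \left( \phi_\lambda\!\left(\frac{N_k m_s^*}{m_s(\beta_k)}\right) - \frac{N_k m_s^*}{m_s(\beta_k)}\, \phi'_\lambda\!\left(\frac{N_k m_s^*}{m_s(\beta_k)}\right) \right) \\
&\quad + \sum_{s=1}^{M_k} N_k m_s^*\, x_{si} x_{sj}\, \frac{N_k m_s^*}{m_s(\beta_k)}\, \phi''_\lambda\!\left(\frac{N_k m_s^*}{m_s(\beta_k)}\right),
\end{aligned}$$

and because $\phi_\lambda(1) = \phi'_\lambda(1) = 0$ and $\phi''_\lambda(1) = 1$ for all $\lambda$,

$$\left. \frac{\partial F_i^{(\lambda)}(m_k^*, \beta_k)}{\partial \theta_{kj}} \right|_{(m_k^*, \beta_k) = (m_k^*(\beta_k^0), \beta_k^0)} = N_k \sum_{s=1}^{M_k} m_s^*(\beta_k^0)\, x_{si} x_{sj}.$$

Hence,

$$\left. \frac{\partial F_{N_k}^{(\lambda)}(m_k^*, \beta_k)}{\partial \beta_k} \right|_{(m_k^*, \beta_k) = (m_k^*(\beta_k^0), \beta_k^0)} = N_k X_k^T \mathrm{Diag}(m_k^*(\beta_k^0)) X_k.$$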

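Applying the Implicit Function Theorem, there exist:

  • a neighbourhood $U_k$ of $(m_k^*(\beta_k^0), \beta_k^0)$ in $C_{M_k} \times \mathbb{R}^{J+1}$ such that $\partial F_{N_k}^{(\lambda)}(m_k^*, \beta_k)/\partial \beta_k$ is nonsingular for every $(m_k^*, \beta_k) \in U_k$;

  • an open set $A_k \subset C_{M_k}$ that contains $m_k^*(\beta_k^0)$;

  • and a unique, continuously differentiable function $\tilde\beta_k^{(\lambda)}$ such that $\tilde\beta_k^{(\lambda)}(m_k^*(\beta_k^0)) = \beta_k^0$ and
    $$\left\{ (m_k^*, \beta_k) \in U_k : F_{N_k}^{(\lambda)}(m_k^*, \beta_k) = 0 \right\} = \left\{ (m_k^*, \tilde\beta_k^{(\lambda)}(m_k^*)) : m_k^* \in A_k \right\}.$$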

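Since

$$\min_{m_k^* \in A_k} d_\lambda\!\left(m_k(\beta_k^0), m_k(\tilde\beta_k^{(\lambda)}(m_k^*))\right) = \min_{\beta_k \in \Theta_k} d_\lambda\!\left(m_k(\beta_k^0), m_k(\beta_k)\right),$$

it holds

$$\tilde\beta_k^{(\lambda)}\!\left( \arg\min_{m_k^* \in A_k} d_\lambda\!\left(m_k(\beta_k^0), m_k(\tilde\beta_k^{(\lambda)}(m_k^*))\right) \right) = \arg\min_{\beta_k \in \Theta_k} d_\lambda\!\left(m_k(\beta_k^0), m_k(\beta_k)\right),$$

that is

$$\tilde\beta_k^{(\lambda)}(m_k^*(\beta_k^0)) = \arg\min_{\beta_k \in \Theta_k} d_\lambda\!\left(N_k m_k^*(\beta_k^0), m_k(\beta_k)\right). \tag{24}$$

Furthermore, from the properties of power divergence measures and because $\tilde\beta_k^{(\lambda)}(m_k^*(\beta_k^0)) = \beta_k^0$, we have

$$0 = d_\lambda\!\left(m_k(\beta_k^0), m_k(\tilde\beta_k^{(\lambda)}(m_k^*(\beta_k^0)))\right) < d_\lambda\!\left(m_k(\beta_k^0), m_k(\beta_k)\right) \qquad \text{for all } m_k(\beta_k) \neq m_k(\beta_k^0).$$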

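By applying the chain rule to differentiate $F_{N_k}^{(\lambda)}(m_k^*, \tilde\beta_k^{(\lambda)}(m_k^*)) = 0$ with respect to $m_k^* \in A_k$, we have

$$\left. \frac{\partial F_{N_k}^{(\lambda)}(m_k^*, \beta_k)}{\partial m_k^*} \right|_{\beta_k = \tilde\beta_k^{(\lambda)}(m_k^*)} + \left. \frac{\partial F_{N_k}^{(\lambda)}(m_k^*, \beta_k)}{\partial \beta_k} \right|_{\beta_k = \tilde\beta_k^{(\lambda)}(m_k^*)} \frac{\partial \tilde\beta_k^{(\lambda)}(m_k^*)}{\partial m_k^*} = 0,$$

so that for $m_k^* = m_k^*(\beta_k^0)$

$$\left. \frac{\partial \tilde\beta_k^{(\lambda)}(m_k^*)}{\partial m_k^*} \right|_{m_k^* = m_k^*(\beta_k^0)} = - \left. \left( \frac{\partial F_{N_k}^{(\lambda)}(m_k^*, \beta_k)}{\partial \beta_k} \right)^{-1} \frac{\partial F_{N_k}^{(\lambda)}(m_k^*, \beta_k)}{\partial m_k^*} \right|_{(m_k^*, \beta_k) = (m_k^*(\beta_k^0), \beta_k^0)}.$$

The last expression is part of the Taylor expansion of $\tilde\beta_k^{(\lambda)}(m_k^*)$ around $m_k^*(\beta_k^0)$,

$$\tilde\beta_k^{(\lambda)}(m_k^*) = \tilde\beta_k^{(\lambda)}(m_k^*(\beta_k^0)) + \left. \frac{\partial \tilde\beta_k^{(\lambda)}(m_k^*)}{\partial m_k^*} \right|_{m_k^* = m_k^*(\beta_k^0)} \left(m_k^* - m_k^*(\beta_k^0)\right) + o\!\left(\left\| m_k^* - m_k^*(\beta_k^0) \right\|\right).$$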

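Taking derivatives of $F_i^{(\lambda)}(m_k^*, \beta_k)$ with respect to $m_j^*$,

$$\begin{aligned}
\frac{\partial F_i^{(\lambda)}(m_k^*, \beta_k)}{\partial m_j^*}
&= \frac{\partial}{\partial m_j^*} \frac{\partial d_\lambda(N_k m_k^*, m_k(\beta_k))}{\partial \theta_{ki}} \\
&= \frac{\partial}{\partial m_j^*} \sum_{s=1}^{M_k} m_s(\beta_k)\, x_{si} \left( \phi_\lambda\!\left(\frac{N_k m_s^*}{m_s(\beta_k)}\right) - \frac{N_k m_s^*}{m_s(\beta_k)}\, \phi'_\lambda\!\left(\frac{N_k m_s^*}{m_s(\beta_k)}\right) \right) \\
&= - N_k\, \frac{N_k m_j^*}{m_j(\beta_k)}\, x_{ji}\, \phi''_\lambda\!\left(\frac{N_k m_j^*}{m_j(\beta_k)}\right),
\end{aligned}$$

that is

$$\left. \frac{\partial F_i^{(\lambda)}(m_k^*, \beta_k)}{\partial m_j^*} \right|_{(m_k^*, \beta_k) = (m_k^*(\beta_k^0), \beta_k^0)} = - N_k x_{ji},$$

and hence

$$\left. \frac{\partial F_{N_k}^{(\lambda)}(m_k^*, \beta_k)}{\partial m_k^*} \right|_{(m_k^*, \beta_k) = (m_k^*(\beta_k^0), \beta_k^0)} = \left( \left. \frac{\partial F_i^{(\lambda)}(m_k^*, \beta_k)}{\partial m_j^*} \right|_{(m_k^*, \beta_k) = (m_k^*(\beta_k^0), \beta_k^0)} \right)_{i = 1, \dots, J+1;\; j = 1, \dots, M_k} = - N_k X_k^T,$$

and

$$\tilde\beta_k^{(\lambda)}(m_k^*) = \tilde\beta_k^{(\lambda)}(m_k^*(\beta_k^0)) + \left( X_k^T \mathrm{Diag}(m_k^*(\beta_k^0)) X_k \right)^{-1} X_k^T \left(m_k^* - m_k^*(\beta_k^0)\right) + o\!\left(\left\| m_k^* - m_k^*(\beta_k^0) \right\|\right). \tag{25}$$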

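It is well known that for Poisson sampling $\frac{D_k}{N_k}$ converges almost surely (a.s.) to $m_k^*(\beta_k^0)$ as $N_k$ increases, which means that $\frac{D_k}{N_k} \in A_k$ a.s. for $N_k$ large enough, and thus, according to the Implicit Function Theorem, $\left(\frac{D_k}{N_k}, \tilde\beta_k^{(\lambda)}\!\left(\frac{D_k}{N_k}\right)\right) \in U_k$ a.s. for $N_k$ large enough. We can conclude from (24)

$$\tilde\beta_k^{(\lambda)}\!\left(\frac{D_k}{N_k}\right) = \arg\min_{\beta_k \in \Theta_k} d_\lambda\!\left(N_k \frac{D_k}{N_k}, m_k(\beta_k)\right) = \arg\min_{\beta_k \in \Theta_k} d_\lambda\!\left(D_k, m_k(\beta_k)\right),$$

which means that $\hat\beta_{k,\lambda} = \tilde\beta_k^{(\lambda)}\!\left(\frac{D_k}{N_k}\right)$, and hence from (25)

$$\hat\beta_{k,\lambda} - \beta_k^0 = \left( X_k^T \mathrm{Diag}(m_k(\beta_k^0)) X_k \right)^{-1} X_k^T \left(D_k - m_k(\beta_k^0)\right) + o\!\left(\left\| \frac{D_k - m_k(\beta_k^0)}{N_k} \right\|\right).$$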

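Taking into account that $\hat\beta_{1k,\lambda} - \beta_{1k}^0 = e_{J+1}^T(\hat\beta_{k,\lambda} - \beta_k^0)$, where $e_{J+1}^T = (0, \dots, 0, 1)$, we are going to show that $e_{J+1}^T \left( X_k^T \mathrm{Diag}(m_k(\beta_k^0)) X_k \right)^{-1} = \sigma_{1k}^2\, \tilde t_k^T(\beta_k^0)$. For that purpose we consider the design matrix partitioned as $X_k = (U, \upsilon)$, where $U = I_J \otimes 1_{I_k}$ and $\upsilon = 1_J \otimes t_k$, so that for

$$\left( X_k^T \mathrm{Diag}(m_k(\beta_k^0)) X_k \right)^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1} = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},$$

$$\begin{cases}
A_{11} = U^T \mathrm{Diag}(m_k(\beta_k^0))\, U = \mathrm{Diag}(\{N_{kj}\}_{j=1}^J), \\
A_{12} = U^T \mathrm{Diag}(m_k(\beta_k^0))\, \upsilon = \left( \sum_{i=1}^{I_k} m_{k1i}(\beta_k^0) t_{ki}, \dots, \sum_{i=1}^{I_k} m_{kJi}(\beta_k^0) t_{ki} \right)^T = A_{21}^T, \\
A_{22} = \upsilon^T \mathrm{Diag}(m_k(\beta_k^0))\, \upsilon = \sum_{j=1}^J \sum_{i=1}^{I_k} m_{kji}(\beta_k^0) t_{ki}^2,
\end{cases}$$

we can use the formula

$$\begin{cases}
B_{11} = A_{11}^{-1} + A_{11}^{-1} A_{12} B_{22} A_{21} A_{11}^{-1}, \\
B_{21} = B_{12}^T = - B_{22} A_{21} A_{11}^{-1}, \\
B_{22} = \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right)^{-1}.
\end{cases} \tag{26}$$

It follows that

$$e_{J+1}^T \left( X_k^T \mathrm{Diag}(m_k(\beta_k^0)) X_k \right)^{-1} = \begin{pmatrix} B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} - B_{22} A_{21} A_{11}^{-1} & B_{22} \end{pmatrix} = B_{22} \begin{pmatrix} - A_{21} A_{11}^{-1} & 1 \end{pmatrix},$$

where

$$\begin{aligned}
A_{21} A_{11}^{-1} &= \left( \sum_{i=1}^{I_k} m_{k1i}(\beta_k^0) t_{ki}, \dots, \sum_{i=1}^{I_k} m_{kJi}(\beta_k^0) t_{ki} \right) \mathrm{Diag}(\{N_{kj}^{-1}\}_{j=1}^J) \\
&= \left( N_{k1}^{-1} \sum_{i=1}^{I_k} m_{k1i}(\beta_k^0) t_{ki}, \dots, N_{kJ}^{-1} \sum_{i=1}^{I_k} m_{kJi}(\beta_k^0) t_{ki} \right) = \left( \tilde t_{k1}(\beta_k^0), \dots, \tilde t_{kJ}(\beta_k^0) \right)
\end{aligned}$$

and

$$\begin{aligned}
B_{22} &= \left( \sum_{j=1}^J \sum_{i=1}^{I_k} m_{kji}(\beta_k^0) t_{ki}^2 - \sum_{j=1}^J \left( \sum_{i=1}^{I_k} m_{kji}(\beta_k^0) \right) \tilde t_{kj}^2(\beta_k^0) \right)^{-1} \\
&= \left( \sum_{j=1}^J \sum_{i=1}^{I_k} m_{kji}(\beta_k^0) t_{ki}^2 - \sum_{j=1}^J \left( \sum_{i=1}^{I_k} m_{kji}(\beta_k^0) \right) \tilde t_{kj}^2(\beta_k^0) \pm \sum_{j=1}^J \left( \sum_{i=1}^{I_k} m_{kji}(\beta_k^0) t_{ki} \right) \tilde t_{kj}(\beta_k^0) \right)^{-1} \\
&= \left( \sum_{j=1}^J \sum_{i=1}^{I_k} m_{kji}(\beta_k^0) \left( t_{ki} - \tilde t_{kj}(\beta_k^0) \right)^2 \right)^{-1},
\end{aligned}$$

because $\sum_{j=1}^J \left( \sum_{i=1}^{I_k} m_{kji}(\beta_k^0) \right) \tilde t_{kj}^2(\beta_k^0) = \sum_{j=1}^J \left( \sum_{i=1}^{I_k} m_{kji}(\beta_k^0) t_{ki} \right) \tilde t_{kj}(\beta_k^0)$.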
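The last chain of equalities can be checked numerically. The sketch below computes $B_{22}$ once through the Schur complement in (26), using the fact that $A_{11}$ is diagonal, and once through the centred sum of squares; the table of expected deaths and the time points are hypothetical toy values, and the function names are illustrative.

```python
def slope_variance_schur(m, t):
    """B22 = (A22 - A21 A11^{-1} A12)^{-1}, with A11 = Diag(N_kj)."""
    a22 = sum(m_ji * t_i ** 2 for row in m for m_ji, t_i in zip(row, t))
    a21_a11inv_a12 = 0.0
    for row in m:
        n_j = sum(row)                                        # N_kj, diagonal entry of A11
        a12_j = sum(m_ji * t_i for m_ji, t_i in zip(row, t))  # j-th entry of A12
        a21_a11inv_a12 += a12_j ** 2 / n_j
    return 1.0 / (a22 - a21_a11inv_a12)


def slope_variance_centered(m, t):
    """B22 = (sum_j sum_i m_ji (t_i - t_tilde_j)^2)^{-1}."""
    total = 0.0
    for row in m:
        t_tilde_j = sum(m_ji * t_i for m_ji, t_i in zip(row, t)) / sum(row)
        total += sum(m_ji * (t_i - t_tilde_j) ** 2 for m_ji, t_i in zip(row, t))
    return 1.0 / total


# Hypothetical expected deaths m[j][i] for J = 2 age groups and I = 3 time points.
expected_deaths = [[10.0, 12.0, 15.0], [3.0, 4.0, 6.0]]
time_points = [0.0, 1.0, 2.0]

via_schur = slope_variance_schur(expected_deaths, time_points)
via_centering = slope_variance_centered(expected_deaths, time_points)
print(via_schur, via_centering)
```

Both routes return the same value, so the slope variance $\sigma_{1k}^2 = B_{22}$ can be evaluated from the weighted within-age-group spread of the time points without inverting the full $(J+1) \times (J+1)$ matrix.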

Proof of Theorem 5

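Reformulating Theorem 4 we obtain

$$\sqrt{N_k}\left(\hat\beta_{1k,\lambda} - \beta_{1k}^0\right) = a_k^T \sqrt{N_k}\left(D_k - m_k(\beta_k^0)\right) + o\!\left(\left\| \sqrt{N_k}\left(\frac{D_k}{N_k} - m_k^*(\beta_k^0)\right) \right\|\right),$$

with $a_k^T \equiv \sigma_{1k}^2\, \tilde t_k^T(\beta_k^0) X_k^T$. We would like to calculate the asymptotic distribution as a linear function of

$$\sqrt{N_k}\left(\frac{D_k}{N_k} - m_k^*(\beta_k^0)\right) \xrightarrow[N_k \to \infty]{\mathcal{L}} \mathcal{N}\!\left(0, \mathrm{Diag}(m_k^*(\beta_k^0))\right).$$

Since

$$\mathrm{Var}\!\left( a_k^T \sqrt{N_k}\left(D_k - m_k(\beta_k^0)\right) \right) = a_k^T\, \mathrm{Var}\!\left( N_k \sqrt{N_k}\left(\frac{D_k}{N_k} - m_k^*(\beta_k^0)\right) \right) a_k = N_k^2\, a_k^T \mathrm{Diag}(m_k^*(\beta_k^0))\, a_k = N_k \sigma_{1k}^2,$$

it holds

$$a_k^T \sqrt{N_k}\left(D_k - m_k(\beta_k^0)\right) \xrightarrow[N_k \to \infty]{\mathcal{L}} \mathcal{N}\!\left(0, N_k \sigma_{1k}^2\right). \tag{27}$$

Taking into account that $o\!\left(\left\| \sqrt{N_k}\left(\frac{D_k}{N_k} - m_k^*(\beta_k^0)\right) \right\|\right) = o(O_P(1)) = o_P(1)$, according to Slutsky’s Theorem, the asymptotic distribution of $\sqrt{N_k}\left(\hat\beta_{1k,\lambda} - \beta_{1k}^0\right)$ must coincide with the asymptotic distribution in (27).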

Proof of Theorem 7

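From Theorem 4, subtracting $\hat\beta_{12,\lambda} - \beta_{12}^0$ from $\hat\beta_{11,\lambda} - \beta_{11}^0$ we get

$$\begin{aligned}
\left(\hat\beta_{11,\lambda} - \beta_{11}^0\right) - \left(\hat\beta_{12,\lambda} - \beta_{12}^0\right)
&= \sigma_{11}^2\, \tilde t_1^T(\beta_1^0) X_1^T \left( \left(D_1^{(1)} - m_1^{(1)}(\beta_1^0)\right) + \left(D_1^{(2)} - m_1^{(2)}(\beta_1^0)\right) \right) \\
&\quad - \sigma_{12}^2\, \tilde t_2^T(\beta_2^0) X_2^T \left( \left(D_2^{(1)} - m_2^{(1)}(\beta_2^0)\right) + \left(D_2^{(2)} - m_2^{(2)}(\beta_2^0)\right) \right) \\
&\quad + o\!\left(\left\| \frac{D_1 - m_1(\beta_1^0)}{N_1} \right\|\right) - o\!\left(\left\| \frac{D_2 - m_2(\beta_2^0)}{N_2} \right\|\right).
\end{aligned}$$

Observe that $X_k^T D_k^{(2)} = \bar X_k^T \bar D^{(2)}$, $k = 1, 2$, and under $\beta_{11}^0 = \beta_{12}^0$ it holds $X_k^T m_k^{(2)}(\beta_k^0) = \bar X_k^T \bar m^{(2)}(\beta^0)$, $k = 1, 2$. In addition, the $o(\cdot)$ function is not affected by the negative sign, and under $\beta_{11}^0 = \beta_{12}^0$ it holds $\beta_1^0 = \beta_2^0$; thus we obtain (18).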

Proof of Theorem 8

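We can consider the following decomposition

$$\sqrt{N}\left(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}\right) = \left(N a_1^T\right) \sqrt{N}\, \frac{D_1 - m_1(\beta^0)}{N} - \left(N a_2^T\right) \sqrt{N}\, \frac{D_2 - m_2(\beta^0)}{N} + \sqrt{N}\, Y, \tag{28}$$

with

$$\sqrt{N}\, Y = o\!\left(\left\| \frac{1}{N_1^*} \frac{D_1 - m_1(\beta^0)}{\sqrt{N}} \right\|\right) + o\!\left(\left\| \frac{1}{N_2^*} \frac{D_2 - m_2(\beta^0)}{\sqrt{N}} \right\|\right),$$

rather than (18). Note that from Assumptions 3 and 6, $m_k(\beta^0)/N = N_k^* m_k^*(\beta^0)$ is constant as $N$ increases, and hence $\sqrt{N}\, Y = o\!\left(\left\| \frac{D_1 - m_1(\beta^0)}{\sqrt{N}} \right\|\right) + o\!\left(\left\| \frac{D_2 - m_2(\beta^0)}{\sqrt{N}} \right\|\right) = o(O_P(1)) + o(O_P(1)) = o_P(1)$. We would like to calculate the asymptotic distribution as a linear function of

$$\sqrt{N}\, \frac{D_k - m_k(\beta^0)}{N} \xrightarrow[N \to \infty]{\mathcal{L}} \mathcal{N}\!\left(0, \mathrm{Diag}(N_k^* m_k^*(\beta^0))\right).$$

From (28) and by applying Slutsky’s theorem we can conclude that the asymptotic distribution of $\sqrt{N}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda})$ is central Normal. In order to calculate the variance we shall follow (18), so that

$$\sqrt{N}\left(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}\right) = \sqrt{N} X_1 + \sqrt{N} X_2 + \sqrt{N} X_3 + \sqrt{N}\, Y,$$

with

$$\begin{aligned}
\sqrt{N} X_1 &= a_1^T \sqrt{N}\left(D_1^{(1)} - m_1^{(1)}(\beta^0)\right), \\
\sqrt{N} X_2 &= - a_2^T \sqrt{N}\left(D_2^{(1)} - m_2^{(1)}(\beta^0)\right), \\
\sqrt{N} X_3 &= \left(\bar a_1^T - \bar a_2^T\right) \sqrt{N}\left(\bar D^{(2)} - \bar m^{(2)}(\beta^0)\right), \\
\sqrt{N}\, Y &= o_P(1),
\end{aligned}$$

where $\bar a_k^T \equiv \sigma_{1k}^2\, \tilde t_k^T(\beta^0) \bar X_k^T$, and $X_1$, $X_2$ and $X_3$ are independent random variables. Since

$$\mathrm{Var}\!\left(\sqrt{N} X_k\right) = \mathrm{Var}\!\left( a_k^T \sqrt{N}\left(D_k^{(1)} - m_k^{(1)}(\beta^0)\right) \right) = N a_k^T \mathrm{Diag}(m_k^{(1)}(\beta^0))\, a_k, \qquad k = 1, 2,$$

$$\begin{aligned}
\mathrm{Var}\!\left(\sqrt{N} X_3\right) &= \mathrm{Var}\!\left( \left(\bar a_1^T - \bar a_2^T\right) \sqrt{N}\left(\bar D^{(2)} - \bar m^{(2)}(\beta^0)\right) \right) = N \left(\bar a_1^T - \bar a_2^T\right) \mathrm{Diag}(\bar m^{(2)}(\beta^0)) \left(\bar a_1 - \bar a_2\right) \\
&= N \left( \bar a_1^T \mathrm{Diag}(\bar m^{(2)}(\beta^0))\, \bar a_1 + \bar a_2^T \mathrm{Diag}(\bar m^{(2)}(\beta^0))\, \bar a_2 - 2\, \bar a_1^T \mathrm{Diag}(\bar m^{(2)}(\beta^0))\, \bar a_2 \right) \\
&= N \left( a_1^T \mathrm{Diag}(m_1^{(2)}(\beta^0))\, a_1 + a_2^T \mathrm{Diag}(m_2^{(2)}(\beta^0))\, a_2 - 2 \sigma_{11}^2 \sigma_{12}^2 \xi_{12} \right),
\end{aligned}$$

with

$$\begin{aligned}
\xi_{12} &= \tilde t_1^T(\beta^0) \bar X_1^T \mathrm{Diag}(\bar m^{(2)}(\beta^0)) \bar X_2\, \tilde t_2(\beta^0) = \sum_{j=1}^J \sum_{i=1}^{I_1 - \bar I} m_{2ji}^{(2)}(\beta^0) \left(t_{2i} - \tilde t_{1j}(\beta^0)\right)\left(t_{2i} - \tilde t_{2j}(\beta^0)\right) \\
&= \sum_{j=1}^J \sum_{i=1}^{I_1 - \bar I} \frac{n_{2ji}^{(2)}}{n_{2ji}}\, m_{2ji}(\beta^0) \left(t_{2i} - \tilde t_{1j}(\beta^0)\right)\left(t_{2i} - \tilde t_{2j}(\beta^0)\right),
\end{aligned}$$

it holds

$$\begin{aligned}
\mathrm{Var}\!\left(\sqrt{N}(X_1 + X_2 + X_3)\right) &= N \left( a_1^T \mathrm{Diag}\!\left(m_1^{(1)}(\beta^0) + m_1^{(2)}(\beta^0)\right) a_1 + a_2^T \mathrm{Diag}\!\left(m_2^{(1)}(\beta^0) + m_2^{(2)}(\beta^0)\right) a_2 - 2 \sigma_{11}^2 \sigma_{12}^2 \xi_{12} \right) \\
&= N \left( \sigma_{11}^2 + \sigma_{12}^2 - 2 \sigma_{11}^2 \sigma_{12}^2 \xi_{12} \right),
\end{aligned}$$

which coincides with the asymptotic variance of $\sqrt{N}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda})$.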

Proof of Corollary 9

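Since

$$\tilde t_{kj}(\beta^0) = \tilde t_{kj}^{(2)}(\beta^0) + \frac{m_{kj}^{(1)}}{m_{kj}} \left( \tilde t_{kj}^{(1)}(\beta^0) - \tilde t_{kj}^{(2)}(\beta^0) \right), \qquad k = 1, 2,$$

formula (20) can be rewritten as

$$\begin{aligned}
\xi_{12} &= \sum_{j=1}^J \sum_{i=1}^{I_1 - \bar I} m_{2ji}^{(2)}(\beta^0) \left( t_{2i} - \tilde t_{1j}^{(2)}(\beta^0) - \frac{m_{1j}^{(1)}}{m_{1j}} \left( \tilde t_{1j}^{(1)}(\beta^0) - \tilde t_{1j}^{(2)}(\beta^0) \right) \right) \left( t_{2i} - \tilde t_{2j}^{(2)}(\beta^0) - \frac{m_{2j}^{(1)}}{m_{2j}} \left( \tilde t_{2j}^{(1)}(\beta^0) - \tilde t_{2j}^{(2)}(\beta^0) \right) \right) \\
&= \sum_{j=1}^J \sum_{i=1}^{I_1 - \bar I} m_{2ji}^{(2)}(\beta^0) \left( t_{2i} - \tilde t_{2j}^{(2)}(\beta^0) \right)^2 \\
&\quad + \sum_{j=1}^J \sum_{i=1}^{I_1 - \bar I} m_{2ji}^{(2)}(\beta^0)\, \frac{m_{1j}^{(1)}}{m_{1j}} \frac{m_{2j}^{(1)}}{m_{2j}} \left( \tilde t_{1j}^{(1)}(\beta^0) - \tilde t_{1j}^{(2)}(\beta^0) \right) \left( \tilde t_{2j}^{(1)}(\beta^0) - \tilde t_{2j}^{(2)}(\beta^0) \right) \\
&\quad - \sum_{k=1}^{2} \sum_{j=1}^J \sum_{i=1}^{I_1 - \bar I} m_{2ji}^{(2)}(\beta^0) \left( t_{2i} - \tilde t_{2j}^{(2)}(\beta^0) \right) \frac{m_{kj}^{(1)}}{m_{kj}} \left( \tilde t_{kj}^{(1)}(\beta^0) - \tilde t_{kj}^{(2)}(\beta^0) \right).
\end{aligned}$$

The last summand cancels because

$$\sum_{j=1}^J \sum_{i=1}^{I_1 - \bar I} m_{2ji}^{(2)}(\beta^0) \left( t_{2i} - \tilde t_{2j}^{(2)}(\beta^0) \right) \frac{m_{kj}^{(1)}}{m_{kj}} \left( \tilde t_{kj}^{(1)}(\beta^0) - \tilde t_{kj}^{(2)}(\beta^0) \right) = \sum_{j=1}^J \frac{m_{kj}^{(1)}}{m_{kj}} \left( \tilde t_{kj}^{(1)}(\beta^0) - \tilde t_{kj}^{(2)}(\beta^0) \right) \sum_{i=1}^{I_1 - \bar I} m_{2ji}^{(2)}(\beta^0) \left( t_{2i} - \tilde t_{2j}^{(2)}(\beta^0) \right)$$

and $\sum_{i=1}^{I_1 - \bar I} m_{2ji}^{(2)}(\beta^0) \left( t_{2i} - \tilde t_{2j}^{(2)}(\beta^0) \right) = 0$. Hence, (22) holds.

If region 2 is completely contained in region 1, $\xi_{12} = 1/\sigma_{12}^2$, and therefore

$$\mathrm{Var}\!\left(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda}\right) = \sigma_{12}^2 + \sigma_{11}^2 - 2 \sigma_{12}^2 \sigma_{11}^2 \xi_{12} = \sigma_{12}^2 + \sigma_{11}^2 - 2 \sigma_{11}^2,$$

and (23) follows.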
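As a numerical illustration of the contained case, take the theoretical values in the first row of Table 8 (Scenario C, κ = 1, β1k = 0.020), where California is contained in the US and both time periods coincide. Reading the covariance as $\sigma_{1,12} = \sigma_{11}^2 \sigma_{12}^2 \xi_{12}$, which is how it enters the variance formula above, the contained case gives $\sigma_{1,12} = \sigma_{11}^2$, and the tabulated values agree with this up to rounding:

```latex
\sigma_{1,12} = \sigma_{11}^2 = 505.19 \times 10^{-9},
\qquad
\mathrm{Var}(\hat\beta_{11,\lambda} - \hat\beta_{12,\lambda})
  = \sigma_{12}^2 + \sigma_{11}^2 - 2\sigma_{11}^2
  = \sigma_{12}^2 - \sigma_{11}^2
  = (4753.38 - 505.19) \times 10^{-9}
  = 4248.19 \times 10^{-9}
  \approx 4248.20 \times 10^{-9}.
```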

References

  • 1. Berkson J. Minimum chi-square, not maximum likelihood! Annals of Statistics. 1980;8:457–487.
  • 2. Bishop YMM, Fienberg SE, Holland PW. Discrete Multivariate Analysis: Theory and Practice. Cambridge: MIT Press; 1995.
  • 3. Fay M, Tiwari R, Feuer E, Zou Z. Estimating average annual percent change for disease rates without assuming constant change. Biometrics. 2006;62:847–854. doi: 10.1111/j.1541-0420.2006.00528.x.
  • 4. Horner MJ, Ries LAG, Krapcho M, Neyman N, Aminou R, Howlader N, Altekruse SF, Feuer EJ, Huang L, Mariotto A, Miller BA, Lewis DR, Eisner MP, Stinchcomb DG, Edwards BK, editors. SEER Cancer Statistics Review, 1975–2006. Bethesda, MD: National Cancer Institute; http://seer.cancer.gov/csr/1975_2006/
  • 5. Imrey PB. Power Divergence Methods. In: Armitage P, Colton T, editors. Encyclopedia of Biostatistics. New York: John Wiley and Sons; 2005.
  • 6. Kullback S, Leibler RA. On information and sufficiency. The Annals of Mathematical Statistics. 1951;22:79–86.
  • 7. Li Y, Tiwari RC. Comparing Trends in Cancer Rates Across Overlapping Regions. Biometrics. 2008;64:1280–1286. doi: 10.1111/j.1541-0420.2008.01002.x.
  • 8. Li Y, Tiwari RC, Zou Z. An age-stratified model for comparing trends in cancer rates across overlapping regions. Biometrical Journal. 2008;50:608–619. doi: 10.1002/bimj.200710430.
  • 9. Pardo L. Statistical Inference Based on Divergence Measures. New York: Chapman & Hall / CRC (Statistics: Textbooks and Monographs); 2006.
  • 10. Pardo L, Martín N. Homogeneity/heterogeneity hypotheses for standardized mortality ratios based on minimum power-divergence estimators. Biometrical Journal. 2009;51:819–836. doi: 10.1002/bimj.200800158.
  • 11. Pickle LW, White AA. Effects of the choice of age-adjustment method on maps of death rates. Statistics in Medicine. 1995;14:615–627. doi: 10.1002/sim.4780140519.
  • 12. Cressie N, Read TRC. Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society B. 1988;46:440–464.
  • 13. Riddell C, Pliska JM. Cancer in Oregon, 2005: Annual Report on Cancer Incidence and Mortality among Oregonians. Portland, Oregon: Department of Human Services, Oregon Public Health Division, Oregon State Cancer Registry; 2008. http://egov.oregon.gov/DHS/ph/oscar/arpt2005/ar2005.pdf.
  • 14. Tiwari RC, Clegg L, Zou Z. Efficient interval estimation for age-adjusted cancer rates. Statistical Methods in Medical Research. 2006;15:547–569. doi: 10.1177/0962280206070621.
  • 15. Walters KA, Li Y, Tiwari RC, Zou Z. A Weighted-Least-Squares Estimation Approach to Comparing Trends in Age-Adjusted Cancer Rates Across Overlapping Regions. Journal of Data Science. 2010;8:631–644.
