Conditional and Marginal Estimates in Case-Control Family Data - Extensions and Sensitivity Analyses

Malka Gorfine; Rottem De-Picciotto; Li Hsu

doi:10.1080/00949655.2011.581669

Author manuscript; available in PMC: 2013 Oct 1.

Published in final edited form as: J Stat Comput Simul. 2012 Jul 5;82(10):1449–1470. doi: 10.1080/00949655.2011.581669

PMCID: PMC3446754 NIHMSID: NIHMS290649 PMID: 23002315

Abstract

This work considers two specific estimation techniques, one for the family-specific proportional hazards model and one for the population-averaged proportional hazards model. So far, these two estimation procedures have been presented and studied under the gamma frailty distribution, mainly because of its simple interpretation and mathematical tractability. Modifications of both procedures for other frailty distributions, such as inverse Gaussian, positive stable and a specific case of a discrete distribution, are presented. Extensive simulations show that, under the family-specific proportional hazards model, the gamma frailty model appears to be robust to frailty distribution misspecification in terms of both bias and efficiency loss in the marginal parameters. The population-averaged proportional hazards model is found to be robust to misspecification of the gamma frailty model only under moderate or weak dependency among cluster members.

Keywords: case-control family study, clustered survival data, frailty model, marginalized hazard function

1. Introduction

In family studies, correlated failure times arise frequently in the form of ages at onset or ages at diagnosis of a disease. Many diseases, such as coronary heart disease or cancer, are known to be correlated within families because of common genetic and environmental factors that contribute to the occurrence of the disease. For the same reason, family studies have frequently been used to discover novel genes or to characterize candidate genes for their involvement in diseases.

In this work we focus on population-based case-control studies, in which a number of cases and controls are sampled randomly from a well-defined population and an array of risk factors is collected on the cases and controls and on their relatives [1]. We refer to these cases and controls as probands, to indicate that they are the index subjects through whom the families are ascertained.

Our work is motivated by a recent breast cancer study conducted at the Fred Hutchinson Cancer Research Center [2]. In this study, the cases were incident breast cancer cases ascertained from a set of geographically defined, population-based cancer registries in the United States. The controls were selected by random digit dialing and matched to cases on age at diagnosis and country of residence. Each subject (case or control proband) was asked to enumerate all of her first-degree (mother, sisters, daughters) and second-degree (aunts and grandmothers) female blood relatives. For each relative enumerated, the interviewer asked for the birth year, vital status, death year, history and type of cancer, and laterality (if breast cancer). Blood samples were collected from the probands to determine the presence or absence of BRCA1/2 mutations. One of the study objectives was to estimate the effects of BRCA1/2 mutations and other risk factors on the age at breast cancer diagnosis. It was also of interest to estimate the baseline hazard function, in order to obtain the absolute risk for a woman given her risk profile.

Case-control family studies involve a cluster structure with potential correlations between the outcomes within a cluster. There are two main approaches for dealing with the dependence induced by the cluster effects: the conditional (or family-specific) model [3, 4] and the marginal (or population-averaged) model [5, 6]. In the conditional model, the hazard function takes the cluster effects into account and is used to compare the risk of failure among members of the same cluster (family). Extensive reviews and discussions of shared frailty models can be found in [7] and [8], and references therein. In the marginal approach, the risk of failure does not take the cluster effects into account. It represents the averaged hazard in the population and is used to compare the risk of failure among members of the population.

Estimation in the frailty model has received much attention under various frailty distributions, including gamma [9–11], positive stable [12], inverse Gaussian [11, 13], compound Poisson [13] and log-normal [14, 15]. Among the many frailty distributions considered, the gamma, or equivalently the Clayton-Oakes model [16, 17], is the most commonly used because of its appealing interpretation and mathematical convenience. Despite its popularity, there is a concern that misspecification of the gamma frailty distribution may invalidate the inference. Model diagnostic procedures for cohort data have been developed for that purpose [18–22]. In practice, however, it is not always easy to check the goodness-of-fit of the model, because the data are often insufficient to distinguish between candidate models. Hence, it is of practical importance to first examine to what extent misspecification of the frailty distribution affects the estimation of the regression coefficients and the baseline hazard function, in terms of bias and efficiency.

Some work has been done on misspecification of the frailty distribution in cohort family studies. A simulation study [23] found that the regression coefficient estimates under the assumed gamma frailty model appeared to be minimally affected when the true frailty distribution is inverse Gaussian or positive stable. However, that study did not examine the effect of a misspecified frailty distribution on the hazard functions. Hsu et al. [4] studied, under cohort and case-control settings, how misspecification of the frailty distribution affects the estimation of the fitted marginalized hazard function for individuals with a particular risk profile. They assumed a gamma distribution when the true distributions were inverse Gaussian, positive stable and a specific case of the discrete distribution. They showed that the gamma distribution appears to be robust to frailty distribution misspecification and that the biases are generally 10% or lower, even when the true frailty distribution deviates substantially from the assumed gamma distribution. Note that both works concentrated on the case in which a gamma frailty model is wrongly assumed.

While the family-specific hazard function is useful in genetic counseling, population-averaged marginal hazard functions are also of interest from the public health perspective, for devising effective strategies for preventing disease and treating the general population. Under the frailty model, the population-averaged hazard functions can be obtained by integrating out the frailty. However, the resulting function is likely to be affected by the frailty distribution assumption, since the integrated function involves not only the regression coefficients but also the dependence parameter. To overcome this undesirable property, Hsu et al. [5] proposed a population-averaged, frailty-based marginal hazard model for the case-control study design, in which the marginal hazard functions are free of the frailty distribution. They showed by simulations that the efficiency gain of the proposed method over the generalized estimating equation approach is most pronounced with a high degree of correlation and with large family size.

Both works [3, 5] considered only the gamma frailty distribution with scale and shape parameters θ⁻¹. Hence, the main goals of this work are: (1) to extend and apply the estimation procedures of [3, 5] to different frailty distributions for case-control family data; and (2) to study the bias, and particularly the efficiency loss, in the regression estimates and marginal hazard functions when the gamma frailty distribution is misspecified. We investigated the following frailty distributions: inverse Gaussian, positive stable and a specific case of a discrete distribution. The discrete distribution is such that the frailty variate takes one of only two possible values, 1 − θ or 1 + θ, where the constraint (1 + θ + 1 − θ)/2 = 1 is set to allow for a unique identification of the baseline hazard function. Each distribution is a function of a parameter that quantifies, in a different way, the heterogeneity in risk among families and allows for a unique identification of the baseline hazard function. Each frailty distribution yields a different association between the survival times of cluster (family) members. Table 1 gives the density functions (f), the first and second moments (μ₁, μ₂), the Laplace transforms (ϕ) and the cross-ratio functions (C) for the above distributions. We use the popular cross-ratio function [16] as a measure of dependency for bivariate survival times when comparing the distributions. Hougaard [7] provides a comprehensive review of the properties of the various frailty distributions.

Table 1.

Density functions (f), first and second moments (μ₁, μ₂), Laplace transforms (ϕ) and cross-ratio (C) for the distributions: gamma, inverse Gaussian, positive stable and discrete.

Gamma

$f(t) = \frac{θ^{-1/θ}\, t^{(1-θ)/θ} \exp(-t/θ)}{Γ(1/θ)}$, θ > 0

μ₁ = 1, μ₂ = θ + 1

$ϕ(s) = (1 + θs)^{-1/θ}$

$C(t_{1}, t_{2}) = θ + 1$

Inverse Gaussian

$f(t) = (πθ)^{-1/2} \exp(2/θ)\, t^{-3/2} \exp\{-t/θ - 1/(tθ)\}$, θ ≥ 0

μ₁ = 1, μ₂ = θ/2

$ϕ(s) = \exp\left[2\left\{\frac{1}{θ} - \left(\frac{1}{θ^{2}} + \frac{s}{θ}\right)^{1/2}\right\}\right]$

$C(t_{1}, t_{2}) = 1 + \frac{θ}{2 - θ \log S^{m}(t_{1}, t_{2})}$

where $S^{m}(t_{1}, t_{2}) = P(T_{1} ≥ t_{1}, T_{2} ≥ t_{2}) = ϕ(H_{1}(t_{1}) + H_{2}(t_{2}))$ and $H_{i}(t_{i}) = Λ_{0}(t_{i}) \exp(β^{T} Z_{i})$

Positive Stable

$f(t) = -(πt)^{-1} \sum_{k=1}^{\infty} Γ(kθ + 1) (k!)^{-1} (-t^{-θ})^{k} \sin(θπk)$, 0 < θ < 1

μ₁, μ₂ do not exist for θ < 1

$ϕ(s) = \exp(-s^{θ})$

$C(t_{1}, t_{2}) = 1 + \frac{1 - θ}{-θ \log S^{m}(t_{1}, t_{2})}$

Discrete

$\Pr(ω = 1 + θ) = 0.5$ and $\Pr(ω = 1 - θ) = 0.5$, −1 ≤ θ < 1

μ₁ = 1, μ₂ = 1 + θ²

$ϕ(s) = 0.5 \exp\{-s(1 - θ)\} + 0.5 \exp\{-s(1 + θ)\}$

$C(t_{1}, t_{2}) = 1 + 4θ^{2} \left[(1 + θ)\{G(t_{1}, t_{2})\}^{-θ} + (1 - θ)\{G(t_{1}, t_{2})\}^{θ}\right]^{-2}$

where $G(t_{1}, t_{2}) = \exp\{H_{1}(t_{1}) + H_{2}(t_{2})\}$
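The cross-ratio entries of Table 1 can be checked numerically against the Laplace transforms in the same table. The sketch below is illustrative only: it assumes the standard shared-frailty identity C = ϕ''(s)ϕ(s)/ϕ'(s)² evaluated at s = H₁(t₁) + H₂(t₂), approximates the derivatives by central differences, and uses function names that are hypothetical (none of them come from the paper).

```python
import math

# Laplace transforms phi(s) as listed in Table 1.
LAPLACE = {
    "gamma": lambda s, th: (1.0 + th * s) ** (-1.0 / th),
    "inverse_gaussian": lambda s, th: math.exp(
        2.0 * (1.0 / th - math.sqrt(1.0 / th**2 + s / th))),
    "positive_stable": lambda s, th: math.exp(-(s**th)),
    "discrete": lambda s, th: 0.5 * math.exp(-s * (1.0 - th))
    + 0.5 * math.exp(-s * (1.0 + th)),
}


def cross_ratio_closed_form(name, s, th):
    """Cross-ratio from Table 1, written in terms of s = H1(t1) + H2(t2)."""
    log_surv = math.log(LAPLACE[name](s, th))  # log S^m(t1, t2)
    if name == "gamma":
        return th + 1.0
    if name == "inverse_gaussian":
        return 1.0 + th / (2.0 - th * log_surv)
    if name == "positive_stable":
        return 1.0 + (1.0 - th) / (-th * log_surv)
    if name == "discrete":
        g = math.exp(s)  # G(t1, t2)
        return 1.0 + 4.0 * th**2 * ((1.0 + th) * g**(-th) + (1.0 - th) * g**th) ** -2
    raise ValueError(name)


def cross_ratio_numeric(name, s, th, h=1e-4):
    """phi''(s) * phi(s) / phi'(s)^2 using central finite differences."""
    phi = LAPLACE[name]
    f0, fp, fm = phi(s, th), phi(s + h, th), phi(s - h, th)
    d1 = (fp - fm) / (2.0 * h)
    d2 = (fp - 2.0 * f0 + fm) / h**2
    return d2 * f0 / d1**2


if __name__ == "__main__":
    # Illustrative parameter values only; s = 0.7 is an arbitrary choice.
    for name, th in [("gamma", 1.64), ("inverse_gaussian", 30.55),
                     ("positive_stable", 0.55), ("discrete", 0.95)]:
        print(name, cross_ratio_closed_form(name, 0.7, th),
              cross_ratio_numeric(name, 0.7, th))
```

Only the gamma cross-ratio is constant in s; for the other three distributions it varies with the joint survival probability, which is why the distributions are compared at a common value of Kendall's τ rather than a common θ.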

Gamma	$U_{j0} = \{1 + θ H_{j0}(t)\}^{-1}$
Inverse Gaussian	$U_{j0} = \frac{1}{θ} \left\{\frac{1}{θ^{2}} + \frac{H_{j0}(t)}{θ}\right\}^{-1/2}$
Positive stable	$U_{j0} = θ \{H_{j0}(t)\}^{θ - 1}$
Discrete	$U_{j0} = \frac{(1 + θ) \exp\{-H_{j0}(t)(1 + θ)\} + (1 - θ) \exp\{-H_{j0}(t)(1 - θ)\}}{\exp\{-H_{j0}(t)(1 + θ)\} + \exp\{-H_{j0}(t)(1 - θ)\}}$
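Each of the four expressions for U_j0 above coincides algebraically with −ϕ'(s)/ϕ(s), evaluated at s = H_j0(t), for the corresponding Laplace transform in Table 1. The sketch below checks this numerically. It is an observation about the formulas only, not a statement of how the paper defines or uses U_j0; the function names and the test point s are hypothetical.

```python
import math


def log_phi(name, s, th):
    """Log of the Laplace transform from Table 1."""
    if name == "gamma":
        return -math.log(1.0 + th * s) / th
    if name == "inverse_gaussian":
        return 2.0 * (1.0 / th - math.sqrt(1.0 / th**2 + s / th))
    if name == "positive_stable":
        return -(s**th)
    if name == "discrete":
        return math.log(0.5 * math.exp(-s * (1.0 - th))
                        + 0.5 * math.exp(-s * (1.0 + th)))
    raise ValueError(name)


def u_closed_form(name, s, th):
    """U_j0 as tabulated, with s standing in for H_j0(t)."""
    if name == "gamma":
        return 1.0 / (1.0 + th * s)
    if name == "inverse_gaussian":
        return (1.0 / th) * (1.0 / th**2 + s / th) ** -0.5
    if name == "positive_stable":
        return th * s ** (th - 1.0)
    if name == "discrete":
        hi, lo = math.exp(-s * (1.0 + th)), math.exp(-s * (1.0 - th))
        return ((1.0 + th) * hi + (1.0 - th) * lo) / (hi + lo)
    raise ValueError(name)


def u_numeric(name, s, th, h=1e-6):
    """-d/ds log phi(s), which equals -phi'(s)/phi(s), by central difference."""
    return -(log_phi(name, s + h, th) - log_phi(name, s - h, th)) / (2.0 * h)


if __name__ == "__main__":
    for name, th in [("gamma", 1.64), ("inverse_gaussian", 30.55),
                     ("positive_stable", 0.55), ("discrete", 0.95)]:
        print(name, u_closed_form(name, 0.4, th), u_numeric(name, 0.4, th))
```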

Gamma	$τ = \frac{θ}{θ + 2}$	θ = 1.64
Inverse Gaussian	$τ = 0.5 - 2/θ + (8/θ^{2}) \exp(4/θ) \int_{4/θ}^{\infty} u^{-1} \exp(-u)\, du$	θ = 30.55
Positive Stable	$τ = 1 - θ$	θ = 0.55
Discrete	$τ = θ^{2}/2$	θ = 0.95
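The four Kendall's τ formulas above can be evaluated directly at the tabulated θ values. The sketch below does so with the standard library only; the integral in the inverse Gaussian row is the exponential integral E₁(4/θ), which is computed here from its convergent power series (an implementation choice made for this illustration, with hypothetical function names). The four results should all come out close to a common τ of about 0.45.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exp1(x, terms=60):
    """E1(x) = int_x^inf exp(-u)/u du via its power series (x > 0, moderate x)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0  # holds (-x)^k / k! after k updates
    for k in range(1, terms + 1):
        term *= -x / k
        total -= term / k
    return total


def kendall_tau(name, th):
    """Kendall's tau as a function of theta, following the table above."""
    if name == "gamma":
        return th / (th + 2.0)
    if name == "inverse_gaussian":
        x = 4.0 / th
        return 0.5 - 2.0 / th + (8.0 / th**2) * math.exp(x) * exp1(x)
    if name == "positive_stable":
        return 1.0 - th
    if name == "discrete":
        return th**2 / 2.0
    raise ValueError(name)


if __name__ == "__main__":
    for name, th in [("gamma", 1.64), ("inverse_gaussian", 30.55),
                     ("positive_stable", 0.55), ("discrete", 0.95)]:
        print(f"{name:17s} theta={th:6.2f}  tau={kendall_tau(name, th):.4f}")
```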

	$Λ_{0}^{c} (0.05) = 0.05$	$Λ_{0}^{c} (0.1) = 0.1$	$Λ_{0}^{c} (0.15) = 0.15$	$Λ_{0}^{c} (0.2) = 0.2$
60% – 80% censoring rate
True frailty distribution: inverse Gaussian (IG)
Used: IG	0.055(0.012)	0.110(0.022)	0.164(0.033)	0.218(0.044)
Used: gamma	0.045(0.009)	0.082(0.015)	0.116(0.022)	0.149(0.028)
True frailty distribution: positive stable (PS)
Used: PS	0.057(0.013)	0.113(0.023)	0.168(0.033)	0.221(0.041)
Used: gamma	0.278(0.047)	0.505(0.090)	0.743(0.141)	0.998(0.203)
True frailty distribution: discrete (Disc)
Used: Disc	0.052(0.007)	0.106(0.012)	0.159(0.017)	0.212(0.023)
Used: gamma	0.051(0.007)	0.105(0.014)	0.161(0.020)	0.219(0.027)
30% – 40% censoring rate
True frailty distribution: inverse Gaussian (IG)
Used: IG	0.067(0.041)	0.137(0.085)	0.208(0.130)	0.279(0.176)
Used: gamma	0.035(0.006)	0.062(0.010)	0.087(0.013)	0.110(0.016)
True frailty distribution: positive stable (PS)
Used: PS	0.050(0.010)	0.101(0.018)	0.151(0.025)	0.201(0.032)
Used: gamma	0.192(0.031)	0.326(0.054)	0.452(0.078)	0.578(0.105)
True frailty distribution: discrete (Disc)
Used: Disc	0.050(0.006)	0.100(0.011)	0.151(0.015)	0.201(0.019)
Used: gamma	0.055(0.008)	0.114(0.015)	0.176(0.022)	0.241(0.023)

		Λ^m(0.05)		Λ^m(0.1)		Λ^m(0.15)		Λ^m(0.2)
	β^C	Z=0	Z=1	Z=0	Z=1	Z=0	Z=1	Z=0	Z=1
True frailty distribution: inverse Gaussian (IG)
True value	0.693	0.039	0.066	0.066	0.109	0.089	0.143	0.109	0.172
mean(SE): IG	0.703(0.128)	0.041(0.011)	0.069(0.015)	0.069(0.015)	0.113(0.020)	0.092(0.017)	0.147(0.023)	0.113(0.019)	0.177(0.026)
mean(SE): gamma	0.690(0.129)	0.034(0.006)	0.065(0.011)	0.058(0.008)	0.110(0.016)	0.079(0.011)	0.146(0.019)	0.097(0.013)	0.177(0.022)
MSE: IG	1.648	0.012	0.023	0.023	0.041	0.029	0.054	0.037	0.070
MSE: gamma	1.665	0.006	0.012	0.013	0.026	0.022	0.037	0.031	0.050
RE	0.984	3.361	1.859	3.515	1.560	3.388	1.465	2.136	1.396
True frailty distribution: positive stable (PS)
True value	0.693	0.192	0.282	0.282	0.413	0.352	0.516	0.413	0.604
mean(SE): PS	0.685(0.108)	0.187(0.028)	0.274(0.038)	0.276(0.034)	0.404(0.046)	0.345(0.039)	0.506(0.052)	0.405(0.043)	0.594(0.056)
mean(SE): gamma	0.695(0.124)	0.165(0.022)	0.293(0.035)	0.257(0.031)	0.434(0.045)	0.332(0.037)	0.542(0.052)	0.398(0.042)	0.632(0.058)
MSE: PS	1.172	0.080	0.151	0.119	0.219	0.157	0.280	0.191	0.323
MSE: gamma	1.538	0.121	0.134	0.158	0.246	0.177	0.338	0.199	0.415
RE	0.758	1.612	1.178	1.202	1.045	1.111	1.000	1.048	0.932
True frailty distribution: discrete (Disc)
True value	0.693	0.049	0.095	0.095	0.182	0.140	0.260	0.182	0.329
mean(SE): Disc	0.702(0.105)	0.049(0.006)	0.096(0.011)	0.096(0.010)	0.185(0.017)	0.141(0.013)	0.266(0.021)	0.184(0.016)	0.339(0.025)
mean(SE): gamma	0.678(0.114)	0.052(0.007)	0.099(0.014)	0.103(0.012)	0.186(0.021)	0.152(0.017)	0.265(0.027)	0.197(0.021)	0.335(0.032)
MSE: Disc	1.110	0.003	0.012	0.010	0.029	0.017	0.047	0.026	0.072
MSE: gamma	1.322	0.005	0.021	0.021	0.045	0.043	0.075	0.066	0.106
RE	0.848	0.734	0.617	0.694	0.655	0.584	0.604	0.580	0.610
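The MSE and RE rows of the table above can be reproduced, up to rounding, from the reported mean(SE) entries. The sketch below rests on a reading inferred from the numbers themselves, not on a definition given in this excerpt: MSE is taken as 100 × (bias² + SE²), and RE as the ratio of the empirical variance under the true frailty model to that under the gamma model. The function names are hypothetical.

```python
def mse_times_100(mean, se, truth):
    """100 * (squared bias + squared empirical SE); assumed definition."""
    return 100.0 * ((mean - truth) ** 2 + se**2)


def relative_efficiency(se_true_model, se_gamma):
    """Variance under the true frailty model over variance under gamma; assumed."""
    return (se_true_model / se_gamma) ** 2


if __name__ == "__main__":
    # beta^C column, true frailty distribution inverse Gaussian (values from the table).
    truth = 0.693
    ig_mean, ig_se = 0.703, 0.128
    gamma_mean, gamma_se = 0.690, 0.129
    print("MSE (IG)   :", round(mse_times_100(ig_mean, ig_se, truth), 3))
    print("MSE (gamma):", round(mse_times_100(gamma_mean, gamma_se, truth), 3))
    print("RE         :", round(relative_efficiency(ig_se, gamma_se), 3))
```

Under this reading, RE below 1 means the correctly specified model has the smaller variance, and RE above 1 means the misspecified gamma fit is less variable. Small discrepancies in the last digit are expected because the tabulated means and SEs are already rounded.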

Gamma	$α(t) = λ_{0}^{m}(t) \exp\{(β^{M})^{T} Z + θ \exp((β^{M})^{T} Z) Λ_{0}^{m}(t-)\}$
Inverse Gaussian	$α(t) = λ_{0}^{m}(t) \exp((β^{M})^{T} Z) \left\{\frac{θ}{2} Λ_{0}^{m}(t-) \exp((β^{M})^{T} Z) + 1\right\}$
Positive Stable	$α(t) = λ_{0}^{m}(t) \frac{1}{θ} \{Λ_{0}^{m}(t-)\}^{1/θ - 1} \exp\left(\frac{1}{θ} (β^{M})^{T} Z\right)$
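The three α(t) expressions above share one property that can be checked numerically: if the frailty acts multiplicatively on α(t), then integrating the frailty out with the Laplace transform of Table 1 returns the proportional hazards survival function exp{−Λ₀ᵐ(t) exp(βZ)}, whatever the frailty distribution. The sketch below assumes that multiplicative structure, a continuous baseline Λ₀ᵐ(t) = t (so λ₀ᵐ = 1), a single binary covariate and β = 0.693; the function names and the midpoint integration rule are choices made for this illustration.

```python
import math

BETA = 0.693  # regression coefficient; matches the "True value" rows of the tables


def alpha(name, t, z, th):
    """Conditional hazard from the table, with Lambda_0^m(t) = t (assumed)."""
    risk = math.exp(BETA * z)
    if name == "gamma":
        return risk * math.exp(th * risk * t)
    if name == "inverse_gaussian":
        return risk * (th / 2.0 * t * risk + 1.0)
    if name == "positive_stable":
        return (1.0 / th) * t ** (1.0 / th - 1.0) * math.exp(BETA * z / th)
    raise ValueError(name)


def phi(name, s, th):
    """Laplace transform of the frailty distribution (Table 1)."""
    if name == "gamma":
        return (1.0 + th * s) ** (-1.0 / th)
    if name == "inverse_gaussian":
        return math.exp(2.0 * (1.0 / th - math.sqrt(1.0 / th**2 + s / th)))
    if name == "positive_stable":
        return math.exp(-(s**th))
    raise ValueError(name)


def cumulative_alpha(name, t, z, th, n=20000):
    """Midpoint-rule approximation of the integral of alpha over [0, t]."""
    h = t / n
    return h * sum(alpha(name, (i + 0.5) * h, z, th) for i in range(n))


def marginal_survival(name, t, z, th):
    """Survival after integrating out the frailty: phi(integral of alpha)."""
    return phi(name, cumulative_alpha(name, t, z, th), th)


if __name__ == "__main__":
    for name, th in [("gamma", 1.25), ("inverse_gaussian", 8.5),
                     ("positive_stable", 0.63)]:
        for z in (0, 1):
            target = math.exp(-0.2 * math.exp(BETA * z))
            print(name, z, marginal_survival(name, 0.2, z, th), target)
```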

Kendall's τ	0.33	0.45
Gamma	1.25	2.20
Inverse Gaussian	8.50	380
Positive Stable	0.63	0.50

Gamma	$A^{-1}(x) = \frac{\log(1 - θx)}{θ \exp((β^{M})^{T} Z)}$
Inverse Gaussian	$A^{-1}(x) = \frac{-1 + \sqrt{1 - θx}}{\frac{θ}{2} \exp((β^{M})^{T} Z)}$
Positive Stable	$A^{-1}(x) = (-x)^{θ} \exp(-(β^{M})^{T} Z)$
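One reading consistent with all three rows above is that the argument x is non-positive, equal to minus the cumulative conditional hazard, and that A⁻¹ returns the marginal baseline cumulative hazard Λ₀ᵐ. The sketch below checks this round trip. The sign convention, the closed forms in `cumulative_alpha` (derived here by integrating the α(t) expressions, not quoted from the paper), the value β = 0.693 and all function names are assumptions of this illustration.

```python
import math

BETA = 0.693  # regression coefficient; matches the "True value" rows of the tables


def cumulative_alpha(name, cumhaz, z, th):
    """Integral of alpha(t), in closed form, as a function of Lambda_0^m (derived)."""
    h = cumhaz * math.exp(BETA * z)
    if name == "gamma":
        return math.expm1(th * h) / th
    if name == "inverse_gaussian":
        return h + th * h * h / 4.0
    if name == "positive_stable":
        return h ** (1.0 / th)
    raise ValueError(name)


def a_inverse(name, x, z, th):
    """A^{-1}(x) from the table; x <= 0 under the assumed sign convention."""
    risk = math.exp(BETA * z)
    if name == "gamma":
        return math.log(1.0 - th * x) / (th * risk)
    if name == "inverse_gaussian":
        return (-1.0 + math.sqrt(1.0 - th * x)) / (th / 2.0 * risk)
    if name == "positive_stable":
        return (-x) ** th * math.exp(-BETA * z)
    raise ValueError(name)


if __name__ == "__main__":
    for name, th in [("gamma", 1.25), ("inverse_gaussian", 8.5),
                     ("positive_stable", 0.63)]:
        for z in (0, 1):
            x = -cumulative_alpha(name, 0.15, z, th)
            print(name, z, a_inverse(name, x, z, th))  # should recover 0.15
```

A closed-form inverse of this kind is what makes inverse-transform simulation of failure times straightforward, since a value of Λ₀ᵐ(T) can be obtained without numerical root finding.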

			Λ^m(0.05)		Λ^m(0.1)		Λ^m(0.15)		Λ^m(0.2)
	θ	β^m	Z=0	Z=1	Z=0	Z=1	Z=0	Z=1	Z=0	Z=1
True frailty distribution: gamma
True value	2.2	0.693	0.05	0.1	0.1	0.2	0.15	0.3	0.2	0.4
Used: gamma	2.26(0.486)	0.696(0.096)	0.049(0.008)	0.099(0.015)	0.099(0.014)	0.198(0.027)	0.148(0.021)	0.295(0.04)	0.186(0.029)	0.372(0.055)
True frailty distribution: inverse Gaussian (IG)
True value	380	0.693	0.05	0.1	0.1	0.2	0.15	0.3	0.2	0.4
Used: IG	407(222)	0.691(0.094)	0.053(0.018)	0.104(0.031)	0.104(0.031)	0.205(0.051)	0.153(0.041)	0.302(0.068)	0.190(0.049)	0.376(0.083)
Used: gamma	-	0.695(0.084)	0.022(0.012)	0.043(0.024)	0.038(0.021)	0.076(0.042)	0.051(0.028)	0.103(0.057)	0.061(0.034)	0.121(0.068)
True frailty distribution: positive stable (PS)
True value	0.5	0.693	0.05	0.1	0.1	0.2	0.15	0.3	0.2	0.4
Used: PS	0.526(0.057)	0.696(0.080)	0.051(0.012)	0.099(0.021)	0.105(0.019)	0.207(0.031)	0.159(0.028)	0.314(0.046)	0.209(0.033)	0.413(0.055)
Used: gamma	-	0.661(0.077)	0.032(0.010)	0.065(0.021)	0.057(0.018)	0.115(0.037)	0.078(0.026)	0.157(0.052)	0.096(0.032)	0.192(0.064)

	Gamma		Inverse Gaussian		Positive Stable		Discrete
	estimate	Bootstrap SE	estimate	Bootstrap SE	estimate	Bootstrap SE	estimate	Bootstrap SE
${\hat{β}}^{C}$	-0.484	0.216	-0.485	0.226	-0.595	0.247	-0.477	0.203
$\hat{θ}$	0.889	0.443	1.835	0.924	0.984	0.006	0.947	0.216
${\hat{Λ}}_{0}^{C} (40)$	0.005	0.002	0.005	0.002	0.002	0.001	0.005	0.002
${\hat{Λ}}_{0}^{C} (50)$	0.023	0.006	0.023	0.005	0.016	0.008	0.023	0.005
${\hat{Λ}}_{0}^{C} (60)$	0.051	0.010	0.050	0.009	0.044	0.016	0.050	0.009
${\hat{Λ}}_{0}^{C} (70)$	0.095	0.016	0.095	0.016	0.097	0.026	0.092	0.016
${\hat{Λ}}_{0}^{m} (40)$	0.005	0.002	0.005	0.002	0.002	0.002	0.005	0.002
${\hat{Λ}}_{0}^{m} (50)$	0.023	0.005	0.022	0.006	0.017	0.008	0.022	0.005
${\hat{Λ}}_{0}^{m} (60)$	0.049	0.009	0.049	0.009	0.046	0.016	0.048	0.009
${\hat{Λ}}_{0}^{m} (70)$	0.090	0.016	0.091	0.016	0.100	0.027	0.087	0.016

			Λ^m(0.05)		Λ^m(0.1)		Λ^m(0.15)		Λ^m(0.2)
	θ	β^m	Z=0	Z=1	Z=0	Z=1	Z=0	Z=1	Z=0	Z=1
True frailty distribution: gamma
True value	1.25	0.693	0.05	0.1	0.1	0.2	0.15	0.3	0.2	0.4
mean(SE): gamma	1.250(0.148)	0.690(0.059)	0.050(0.007)	0.099(0.013)	0.099(0.011)	0.199(0.019)	0.150(0.015)	0.299(0.025)	0.201(0.019)	0.400(0.032)
True frailty distribution: inverse Gaussian (IG)
True value	8.5	0.693	0.05	0.1	0.1	0.2	0.15	0.3	0.2	0.4
mean(SE): IG	8.860(2.080)	0.684(0.064)	0.051(0.008)	0.101(0.016)	0.102(0.014)	0.201(0.023)	0.153(0.018)	0.302(0.030)	0.203(0.022)	0.402(0.036)
mean(SE): gamma	-	0.667(0.061)	0.048(0.007)	0.094(0.013)	0.099(0.012)	0.193(0.020)	0.151(0.017)	0.293(0.027)	0.202(0.021)	0.394(0.033)
MSE: IG	-	0.417	0.006	0.025	0.020	0.053	0.033	0.090	0.049	0.130
MSE: gamma	-	0.439	0.005	0.019	0.014	0.045	0.029	0.077	0.044	0.112
RE	-	1.100	1.306	1.514	1.361	1.322	1.121	1.234	1.097	1.190
True frailty distribution: positive stable (PS)
True value	0.63	0.693	0.05	0.1	0.1	0.2	0.15	0.3	0.2	0.4
mean(SE): PS	0.650(0.037)	0.704(0.062)	0.046(0.012)	0.093(0.021)	0.095(0.018)	0.193(0.031)	0.145(0.022)	0.292(0.037)	0.195(0.027)	0.393(0.042)
mean(SE): gamma	-	0.665(0.060)	0.046(0.007)	0.090(0.012)	0.096(0.012)	0.188(0.019)	0.148(0.016)	0.287(0.026)	0.200(0.020)	0.389(0.032)
MSE: PS	-	0.396	0.015	0.048	0.035	0.101	0.054	0.14	0.075	0.187
MSE: gamma	-	0.438	0.006	0.020	0.016	0.050	0.026	0.084	0.042	0.114
RE	-	1.067	2.890	3.121	2.351	2.662	2.030	2.036	1.701	1.780
