Biostatistics (Oxford, England). 2014 May 7;15(4):636–650. doi: 10.1093/biostatistics/kxu016

Estimating effect of environmental contaminants on women's subfecundity for the MoBa study data with an outcome-dependent sampling scheme

Jieli Ding 1, Haibo Zhou 2,*, Yanyan Liu 3, Jianwen Cai 4, Matthew P Longnecker 5
PMCID: PMC4168316  PMID: 24812419

Abstract

Motivated by the needs of our ongoing environmental study in the Norwegian Mother and Child Cohort (MoBa) study, we consider an outcome-dependent sampling (ODS) scheme for failure-time data with censoring. Like the case-cohort design, the ODS design enriches the observed sample by selectively including certain failure subjects. We present an estimated maximum semiparametric empirical likelihood estimator (EMSELE) under the proportional hazards model framework. The asymptotic properties of the proposed estimator are derived. Simulation studies were conducted to evaluate the small-sample performance of the proposed method. Our analyses show that the proposed estimator and design are more efficient than the current default approach and other competing approaches. Applying the proposed approach to the data set from the MoBa study, we found a significant effect of an environmental contaminant on fecundability.

Keywords: Biased-sampling, Empirical likelihood, Proportional hazards model, Survival analysis

1. Introduction

In many epidemiologic studies and disease prevention trials, much of the cost is spent on measuring the main exposure variable. Large cohort studies with simple random sampling are too expensive for investigators with a limited budget. Alternative cost-efficient designs and procedures are therefore desirable and may play a critical role in reaching a prespecified power level when budgets are limited. Outcome-dependent sampling (ODS) (e.g. the case–control study) is a retrospective sampling scheme that enhances efficiency and reduces cost by allowing investigators to observe the exposure with a probability that depends on the value of the outcome (e.g. Cornfield, 1951; Weinberg and Wacholder, 1993; Whittemore, 1997). Recent work has focused on a more general ODS design for continuous outcomes (Zhou and others, 2002; Chatterjee and others, 2003; Weaver and Zhou, 2005). The principal idea of such a design is to concentrate resources on the segment of the population that conveys the most information about the exposure–response relationship (Song and others, 2009; Zhou, Song and others, 2011; Zhou, Wu and others, 2011).

For censored failure-time data, the case-cohort design (Prentice, 1986) is a well-known biased-sampling scheme. It measures the covariates on a simple random sample (SRS) of the cohort, the subcohort, as well as on all failures observed by the end of the study (e.g. Sun and others, 2004; Lu and Tsiatis, 2006; Breslow and Wellner, 2007; Tsai, 2009). When the number of failures is large, a generalized case-cohort design has been proposed in which, in addition to the subcohort, covariate information is assembled for only a subset of the failures rather than all of them, to reduce cost (e.g. Chen, 2001; Cai and Zeng, 2007; Kang and Cai, 2009). Case-cohort and generalized case-cohort designs are especially advocated when censoring is frequent.

Our research is motivated by a recent substudy of the Norwegian Mother and Child Cohort (MoBa) on the potential health effects of perfluoroalkyl substances (PFASs) (Whitworth and others, 2012). PFASs are man-made chemicals that are widely used as industrial surfactants and emulsifiers and in a variety of consumer products. Two of the most widely detected and studied PFASs are perfluorooctane sulfonate (PFOS) and perfluorooctanoic acid (PFOA). Both PFOS and PFOA have shown potential toxicity in animal studies (e.g. Johansson and others, 2008). In humans, several studies have linked PFOS and PFOA levels to lower birth weight, increased cholesterol, increased rates of cancer (e.g. Alexander and others, 2003), and reduced fertility (e.g. Fei and others, 2009).

Our interest focuses on the relationship between exposure to PFASs and women's subfecundity. Fecundity was assessed through time to pregnancy (TTP), reported by women around gestational week 17. Because measuring PFAS levels is expensive, Whitworth and others (2012) measured them in two groups of women: an overall SRS of Inline graphic women from the cohort, and a supplemental sample of Inline graphic women drawn from those who delivered a child and had a TTP Inline graphic months. The MoBa substudy was thus designed to exploit an ODS scheme to obtain more powerful and efficient inferences. In this paper, we consider a general failure-time ODS scheme for the MoBa data.

A common approach in epidemiology for data such as the MoBa data above is to dichotomize TTP, fit a logistic regression to the binary response, and report the resulting odds ratios (e.g. per ng/ml of PFOA). Dichotomizing discards information and may introduce bias. It also risks misclassification, and estimates may not be comparable when different cutpoints are used to dichotomize the outcome. Instead, we assess the relationship between the exposure of interest and the time-to-event response by analyzing the right-censored data obtained under the above ODS scheme within the proportional hazards framework (Cox, 1972). We develop an estimated maximum semiparametric empirical likelihood approach. We replace the baseline cumulative hazard function and the survival function of the censoring time in the joint likelihood with consistent estimators. We then maximize the resulting likelihood by an empirical likelihood approach, without specifying the marginal distribution of the covariates. We illustrate the proposed method through simulations and compare it with several competing methods. Software and practical recommendations are provided for researchers analyzing biased-sampling failure-time data in practice.

The remainder of this article is organized as follows. In Section 2, we describe the proposed failure-time ODS design, present an estimated semiparametric empirical likelihood estimator, and develop its asymptotic properties. In Section 3, we conduct simulation studies to compare its efficiency with alternative methods. In Section 4, we apply the proposed method to a data set from the MoBa study. Section 5 gives some final remarks.

2. Design and estimation

2.1. ODS design and notation

Suppose that there exists a large, but finite, study population of $N$ independent individuals. Let $T_i$ denote the failure time and $C_i$ the censoring time for subject $i$ ($i = 1, \ldots, N$). The observed time is $X_i = \min(T_i, C_i)$. Let $\delta_i = I(T_i \le C_i)$ denote the right-censoring indicator for subject $i$, $Y_i(t) = I(X_i \ge t)$ the at-risk process, and $N_i(t) = I(X_i \le t, \delta_i = 1)$ the counting process, where $I(\cdot)$ is the indicator function. We confine our attention to time-independent covariates. Let $Z_i$ be a $p$-dimensional covariate vector for subject $i$. Assume that $T_i$ and $C_i$ are conditionally independent given $Z_i$. Let $\tau$ denote the end time of the study.

Suppose that the failure time Inline graphic follows the proportional hazards model (Cox, 1972):

$$\lambda(t \mid Z_i) = \lambda_0(t) \exp(\beta^{\top} Z_i), \qquad (2.1)$$

where Inline graphic is the unspecified baseline hazard function and Inline graphic is a Inline graphic-dimensional regression coefficient of primary interest. We assume that the range of the observed failure times of all cases is partitioned into Inline graphic mutually exclusive and exhaustive strata, Inline graphic, by known constants Inline graphic satisfying Inline graphic. We consider the following ODS design, under which Inline graphic is observed. First, a random sample of size Inline graphic, referred to as the SRS sample, is selected from the full cohort. In addition, a supplemental sample of size Inline graphic is selected from the cases in each Inline graphicth stratum. Together, these two components constitute the ODS sample. We suppose that Inline graphic is fixed by design for Inline graphic, and let Inline graphic denote the total size of the ODS sample. Let Inline graphic, Inline graphic and Inline graphic be the index sets of the total ODS sample, the SRS sample, and the supplemental sample from the Inline graphicth stratum, respectively. The observed data for our ODS design can then be summarized as

[Equation (2.2)]
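To make the sampling scheme concrete, here is a minimal Python sketch of how an ODS sample of this form could be drawn from a cohort whose outcomes are known. The function name `ods_sample`, the cohort-generating values, and the choice to draw supplemental cases only from cohort members outside the SRS are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def ods_sample(time, delta, n0, cut_quantiles, n_supp, rng):
    """Hypothetical helper: draw an ODS sample from a cohort with known outcomes.

    time, delta   : observed times X_i and event indicators delta_i for the full cohort
    n0            : size of the SRS component
    cut_quantiles : interior cutpoints, as quantiles of the failure times of the cases
    n_supp        : {k: n_k}, supplemental size for stratum k (strata indexed 0..K-1)
    """
    N = len(time)
    srs = rng.choice(N, size=n0, replace=False)
    # Assumption: supplemental cases come from cohort members not already in the SRS.
    pool = np.setdiff1d(np.arange(N), srs)
    cuts = np.quantile(time[delta == 1], cut_quantiles)
    edges = np.concatenate(([-np.inf], cuts, [np.inf]))
    supp = {}
    for k, n_k in n_supp.items():
        # Stratum k holds cases with failure time in (edges[k], edges[k + 1]].
        in_k = (delta[pool] == 1) & (time[pool] > edges[k]) & (time[pool] <= edges[k + 1])
        supp[k] = rng.choice(pool[in_k], size=n_k, replace=False)
    return srs, supp, cuts


# Usage with a simulated cohort (illustrative values): three strata cut at the 30th and
# 70th percentiles of the case failure times; only the low (k = 0) and high (k = 2)
# strata are supplemented.
rng = np.random.default_rng(1)
N = 5000
z = rng.standard_normal(N)
t = rng.exponential(1.0 / np.exp(0.5 * z))   # exponential failure times, hazard exp(0.5 z)
c = rng.uniform(0.0, 3.0, size=N)            # uniform censoring times
time, delta = np.minimum(t, c), (t <= c).astype(int)
srs, supp, cuts = ods_sample(time, delta, 400, [0.3, 0.7], {0: 100, 2: 100}, rng)
```

Only failures can enter the supplemental strata, which is what makes the combined sample outcome dependent.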

The likelihood function corresponding to the observed data described in (2.2) is

[Equation (2.3)]

where Inline graphic and Inline graphic denote the cumulative distribution and density function of Inline graphic, respectively, Inline graphic denotes the conditional distribution function of Inline graphic given Inline graphic, and Inline graphic denotes the joint density function of Inline graphic, conditional on the censoring indicator Inline graphic being 1, and the failure-time Inline graphic being in interval Inline graphic. By applying Bayes’ Law to the supplemental samples in the second bracket of (2.3), we can rewrite (2.3) as

[Equation (2.4)]

Under random censorship, Inline graphic when Inline graphic and Inline graphic when Inline graphic, where Inline graphic and Inline graphic are, respectively, the conditional density function and survival function of Inline graphic given Inline graphic with the baseline cumulative hazard function Inline graphic, and Inline graphic and Inline graphic are, respectively, the density function and survival function of the censoring time Inline graphic. We assume that Inline graphic is independent of the covariate Inline graphic. Thus, we have Inline graphic. The likelihood function in (2.4) is proportional to

[Equation (2.5)]

Note that the nonparametric component Inline graphic cannot be separated from the above likelihood, which combines a conditional parametric likelihood and a marginal semiparametric likelihood. Inference for the underlying parameters therefore requires methods to deal with Inline graphic, which are effectively infinite-dimensional nuisance functions. To address these challenges, we next develop an estimated maximum semiparametric empirical likelihood approach. In it, we replace Inline graphic in the joint likelihood Inline graphic with their estimators to obtain an estimated likelihood function Inline graphic. We then maximize Inline graphic with respect to Inline graphic by a semiparametric empirical likelihood approach, without specifying Inline graphic.

2.2. An estimated maximum semiparametric empirical likelihood approach

First, we estimate the baseline cumulative hazard function Inline graphic by the Breslow–Aalen estimator

[Equation]

and the survival function Inline graphic of censoring time Inline graphic by the Nelson–Aalen estimator

[Equation]

where Inline graphic denotes the number of subjects at risk at a time prior to Inline graphic (for Inline graphic) and Inline graphic is the estimate of Inline graphic based only on the SRS portion. Replacing Inline graphic in the likelihood function (2.5) with Inline graphic, we obtain the estimated log-likelihood function:

[Equation (2.6)]

where

[Equation]

which is obtained by an extension of the result of Johansen (1983) for the Cox model, and

[Equation]

which are the stratum-specific estimated probabilities of the failure time across all cases.
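The two plug-in estimators described above can be sketched in Python using their standard textbook forms. The first is the Breslow estimator of the baseline cumulative hazard at a fixed coefficient vector. The second is the censoring survival function, obtained by exponentiating the Nelson–Aalen cumulative hazard of the censoring times. This is a minimal sketch under the following assumptions:

- The inputs are the SRS records.
- `beta` is an estimate supplied by the user, for instance a Cox partial-likelihood estimate from the SRS.
- Ties are handled in the simplest Breslow way.

The function names are hypothetical.

```python
import numpy as np


def _step_sum(jump_times, increments, t_grid):
    """Evaluate t -> sum of increments at jump times <= t (right-continuous step)."""
    cum = np.concatenate(([0.0], np.cumsum(increments)))
    return cum[np.searchsorted(jump_times, t_grid, side="right")]


def breslow_cumhaz(time, delta, Z, beta, t_grid):
    """Breslow estimate of the baseline cumulative hazard Lambda_0(t) at fixed beta."""
    risk = np.exp(Z @ beta)                              # exp(beta' Z_j)
    jumps = np.unique(time[delta == 1])                  # distinct failure times
    incr = [np.sum((time == u) & (delta == 1)) / risk[time >= u].sum() for u in jumps]
    return _step_sum(jumps, incr, t_grid)


def censoring_survival(time, delta, t_grid):
    """G(t) = exp(-H_C(t)), with H_C the Nelson-Aalen cumulative hazard of censoring."""
    jumps = np.unique(time[delta == 0])                  # distinct censoring times
    incr = [np.sum((time == u) & (delta == 0)) / np.sum(time >= u) for u in jumps]
    return np.exp(-_step_sum(jumps, incr, t_grid))
```

With `beta = 0`, the Breslow estimate reduces to the Nelson–Aalen estimate of the failure-time cumulative hazard, which gives a quick sanity check.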

Maximizing Inline graphic with respect to Inline graphic without specifying Inline graphic is not straightforward. We first profile the likelihood function in (2.6) by fixing Inline graphic and replacing Inline graphic with the empirical likelihood function (Vardi, 1982, 1985). To maximize Inline graphic over all distributions whose support contains the observed Inline graphic values, we only need to consider the discrete conditional distribution of Inline graphic with jumps at each of the observed points (Owen, 1990). Denote

[Equation]

For a fixed Inline graphic, we have

[Equation (2.7)]

We use a Lagrange multiplier argument to find the Inline graphic that maximize (2.7) under the constraints Inline graphic. The Lagrangian can be written as

[Equation]

where Inline graphic denotes the Lagrange multipliers. It can be shown that the solutions to the score equation of Inline graphic with respect to Inline graphic have the form:

[Equation]

By plugging Inline graphic back into Inline graphic in (2.7), we have the resulting profile likelihood function

[Equation (2.8)]

where Inline graphic. The proposed estimated maximum semiparametric empirical likelihood estimator (EMSELE) is the Inline graphic that maximizes (2.8). Define Inline graphic, and denote the EMSELE for Inline graphic to be Inline graphic and the EMSELE for parameter Inline graphic to be Inline graphic, the corresponding portion of Inline graphic. A Newton–Raphson algorithm can be used to obtain Inline graphic.
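The profiling step above follows the empirical likelihood recipe of Owen (1990): restrict attention to point masses at the observed values, then solve a constrained maximization with Lagrange multipliers. The sketch below shows those mechanics on the simplest textbook case, Owen's empirical likelihood for a mean. It is an analogy only, not the EMSELE constraint set, and the function name is hypothetical. The multiplier is found by bisection, which works because the estimating function is strictly decreasing on its feasible interval.

```python
import numpy as np


def el_weights_mean(x, mu, tol=1e-12, max_iter=200):
    """Owen-style empirical likelihood weights under sum_i p_i (x_i - mu) = 0.

    Maximizing sum_i log p_i subject to sum_i p_i = 1 and the mean constraint gives
    p_i = 1 / (n * (1 + lam * (x_i - mu))), where lam solves
    g(lam) = sum_i (x_i - mu) / (1 + lam * (x_i - mu)) = 0.
    """
    d = np.asarray(x, dtype=float) - mu
    n = d.size
    if d.max() <= 0 or d.min() >= 0:
        raise ValueError("mu must lie strictly inside the range of x")
    # Feasible region: 1 + lam * d_i > 0 for all i  <=>  -1/max(d) < lam < -1/min(d).
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def g(lam):
        return np.sum(d / (1.0 + lam * d))        # strictly decreasing in lam

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:                            # root lies to the right of mid
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 1.0 / (n * (1.0 + lam * d)), lam
```

At the root, the weights automatically sum to one and satisfy the mean constraint. Substituting them back into the log-likelihood yields a profile function of the remaining parameters, which is the same structure that (2.8) exploits.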

2.3. Asymptotic properties of EMSELE

To present the large-sample results, we introduce the following notation:

[Equation]

Here, for a vector Inline graphic, Inline graphic. We indicate the true values of a parameter by superscript “0”. Let Inline graphic denote expectation conditional on Inline graphic, so that, for any function Inline graphic,

[Equation (2.9)]

Under some general regularity conditions (see Appendix of supplementary material available at Biostatistics online) and assuming that Inline graphic and Inline graphic for Inline graphic, the following theorem establishes the asymptotic properties of the EMSELE Inline graphic as well as a consistent estimator for the asymptotic variance matrix.

Theorem 2.1 —

Under general regularity conditions, Inline graphic converges in probability to Inline graphic, while Inline graphic has an asymptotic normal distribution with mean zero and with a variance matrix in the form Inline graphic where Inline graphic is the limiting Hessian matrix of the profile likelihood Inline graphic, and

[Equation]

where Inline graphic, and

[Equation]

A consistent estimator of the asymptotic covariance matrix Inline graphic is Inline graphic, where Inline graphic, Inline graphic, Inline graphic, and Inline graphic are obtained by replacing the population quantities in Inline graphic, Inline graphic, Inline graphic, and Inline graphic with their sample counterparts.

The proof for Theorem 2.1 is given in Appendix (see supplementary material available at Biostatistics online).

3. Simulation studies

We conducted simulation studies to assess the finite-sample properties of the proposed method, using the following Cox proportional hazards model:

$$\lambda(t \mid Z_1, Z_2) = \lambda_0(t) \exp(\beta_1 Z_1 + \beta_2 Z_2). \qquad (3.1)$$

We took the marginal distribution of the failure time Inline graphic to be exponential with failure rate Inline graphic. The baseline hazard function Inline graphic was set to Inline graphic. The covariate Inline graphic was generated from a standard normal distribution and Inline graphic from a Bernoulli distribution with Inline graphic. We set Inline graphic and Inline graphic. The censoring time Inline graphic was generated from a uniform distribution Inline graphic, with Inline graphic chosen to achieve the desired percentage of censoring. We considered censoring rates of approximately Inline graphic and Inline graphic, with corresponding Inline graphic values Inline graphic and Inline graphic.

For the ODS design, we first drew an SRS of size Inline graphic. We then partitioned all cases into three strata, separated by the Inline graphic and Inline graphic quantiles of the failure times of the cases. From the low and high strata we drew supplemental samples of Inline graphic and Inline graphic subjects, respectively. In addition to various parameter configurations, we considered two pairs of cutpoints, the 0.30 and 0.70 quantiles and the 0.15 and 0.85 quantiles, to investigate how the choice of cutpoints for the supplemental samples affects the ODS design.
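A data-generating process of this kind can be sketched as follows. Failure times are exponential given the covariates, and censoring is uniform on (0, c). The upper limit c is calibrated by bisection so that a target censoring proportion is reached. The parameter values, the Bernoulli probability and the function names below are hypothetical placeholders, not the settings used for Tables 1 and 2.

```python
import numpy as np


def draw_covariates(n, p_bern, rng):
    """Z1 ~ N(0, 1) and Z2 ~ Bernoulli(p_bern)."""
    return np.column_stack([rng.standard_normal(n), rng.binomial(1, p_bern, n)])


def simulate_cohort(N, beta, lam0, p_bern, c, rng):
    """Failure times with hazard lam0 * exp(beta1*Z1 + beta2*Z2) (exponential given Z),
    censored by C ~ Uniform(0, c). Returns (X, delta, Z)."""
    Z = draw_covariates(N, p_bern, rng)
    T = rng.exponential(1.0 / (lam0 * np.exp(Z @ beta)))
    C = rng.uniform(0.0, c, N)
    return np.minimum(T, C), (T <= C).astype(int), Z


def calibrate_c(target, beta, lam0, p_bern, rng, n_mc=200_000, n_iter=100):
    """Choose c so that the censoring proportion P(C < T) is about `target`.
    With C = c * U for fixed Monte Carlo draws U, the proportion is non-increasing
    in c, so bisection on the log scale suffices."""
    Z = draw_covariates(n_mc, p_bern, rng)
    T = rng.exponential(1.0 / (lam0 * np.exp(Z @ beta)))
    U = rng.uniform(0.0, 1.0, n_mc)
    lo, hi = 1e-8, 1e3 * T.max()
    for _ in range(n_iter):
        mid = np.sqrt(lo * hi)
        if np.mean(mid * U < T) > target:   # too much censoring: c is too small
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)


# Hypothetical parameter values, for illustration only.
rng = np.random.default_rng(2024)
beta, lam0, p_bern = np.array([0.5, 0.5]), 1.0, 0.5
c60 = calibrate_c(0.60, beta, lam0, p_bern, rng)
c90 = calibrate_c(0.90, beta, lam0, p_bern, rng)
X, delta, Z = simulate_cohort(100_000, beta, lam0, p_bern, c60, rng)
```

Calibrating on one large fixed Monte Carlo sample, instead of re-simulating inside the loop, keeps the bisection target monotone and deterministic.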

Under each configuration, we compared the proposed estimator, Inline graphic, with three competing estimators:

- the maximum likelihood estimator based on an SRS of the same size as the ODS sample (Inline graphic);
- the weighted estimator for the generalized case-cohort design developed by Kang and Cai (2009) (Inline graphic);
- the case-cohort estimator developed by Prentice (1986) (Inline graphic).

To calculate Inline graphic, we first selected a subcohort of Inline graphic by simple random sampling. We then selected an SRS of Inline graphic cases from the remaining cases, set to the same size as the supplemental samples in the ODS design, i.e. Inline graphic. We used the weighted estimating equation of Kang and Cai (2009) with a time-invariant weight function to obtain Inline graphic. To calculate Inline graphic, we randomly sampled a subcohort of Inline graphic and included all remaining cases. To make the case-cohort sample size approximately equal to the ODS sample size, we adjusted the size of the full cohort according to the subcohort size and the censoring rate. For example, we set the full cohort size to Inline graphic when the subcohort size Inline graphic was Inline graphic and the censoring rate was Inline graphic; the mean case-cohort sample size over Inline graphic simulations was then Inline graphic.
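The sampling steps of the two comparison designs can be sketched in Python. Only the sampling is shown. The Kang and Cai (2009) weighting and the Prentice (1986) pseudo-likelihood are not reproduced, and the function names and cohort values are hypothetical.

```python
import numpy as np


def case_cohort(delta, n_sub, rng):
    """Case-cohort design: a random subcohort plus all cases outside it."""
    N = len(delta)
    sub = rng.choice(N, size=n_sub, replace=False)
    outside = np.setdiff1d(np.arange(N), sub)
    return sub, outside[delta[outside] == 1]


def generalized_case_cohort(delta, n_sub, n_cases, rng):
    """Generalized case-cohort design: a random subcohort plus a simple random sample
    of n_cases among the cases outside the subcohort."""
    N = len(delta)
    sub = rng.choice(N, size=n_sub, replace=False)
    outside = np.setdiff1d(np.arange(N), sub)
    extra = rng.choice(outside[delta[outside] == 1], size=n_cases, replace=False)
    return sub, extra


# Illustrative cohort with roughly 10% cases.
rng = np.random.default_rng(3)
delta = rng.binomial(1, 0.1, 4000)
sub_cc, cases_cc = case_cohort(delta, 600, rng)
sub_gcc, cases_gcc = generalized_case_cohort(delta, 600, 200, rng)
```

In the case-cohort design the total size depends on how many cases the cohort produces. This is why the full cohort size has to be tuned to match the ODS sample size, as described above.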

For each estimator, we obtained four summaries from 2000 independently generated data sets:

- the mean of the estimates (Mean);
- their standard deviation (SD);
- the mean of the estimated standard errors (SE);
- the coverage probability of Inline graphic nominal confidence intervals (CP).

The results are summarized in Table 1. In all cases considered, the four estimators of Inline graphic and Inline graphic are approximately unbiased. The proposed variance estimator closely approximates the sample standard deviations, and the confidence intervals attain coverage close to the nominal Inline graphic level. Estimation of the sample SDs becomes less stable at the very high censoring rate (e.g. Inline graphic), suggesting that a larger sample size may be needed. The efficiency gains are larger when the cutpoints are further out ((0.15, 0.85) vs. (0.30, 0.70)). The proposed estimator Inline graphic is the most efficient of all the estimators compared, at every censoring rate considered. Inline graphic is more efficient than Inline graphic when the censoring rate is Inline graphic; Inline graphic is more efficient than Inline graphic when the censoring rate is Inline graphic; and Inline graphic and Inline graphic are comparable at a censoring rate of Inline graphic. That Inline graphic is more efficient than both Inline graphic and Inline graphic indicates that our ODS design for survival analysis can be a more efficient alternative to the case-cohort and generalized case-cohort designs. Finally, Table 1 shows that, for a given total ODS sample size (Inline graphic), efficiency improves as more individuals are allocated to the supplemental samples (e.g. Inline graphic vs. Inline graphic).
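The four summary columns can be computed from simulation replicates as sketched below. The sketch assumes Wald-type intervals of the form estimate ± z × SE, which is a common convention but is not stated explicitly in the text. The helper name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def summarize_replicates(est, se, true_value, level=0.95):
    """Mean of estimates, empirical SD of estimates, mean estimated SE, and coverage
    probability of Wald intervals est +/- z * se around the true parameter value."""
    est, se = np.asarray(est, dtype=float), np.asarray(se, dtype=float)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    covered = np.abs(est - true_value) <= z * se
    return {"Mean": est.mean(), "SD": est.std(ddof=1), "SE": se.mean(), "CP": covered.mean()}
```

Agreement between the SD and SE columns is what supports the variance estimator, and the ratio of squared SDs across estimators gives their relative efficiency.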

Table 1.

Results are based on the model Inline graphic, where Inline graphic and Inline graphic

[Table 1 body not recoverable from the source. Columns: Mean, SD, SE and CP for each of the two regression coefficients. Rows: the four estimators for cutpoints (0.30, 0.70) and (0.15, 0.85), each at 0.60 and 0.90, in two blocks.]

The cutpoints for the ODS design were the 0.30 and 0.70 sample quantiles and the 0.15 and 0.85 sample quantiles, respectively. The estimators are:

- Inline graphic: the estimator from a simple random sample of the same size as the ODS sample;
- Inline graphic: the generalized case-cohort estimator developed by Kang and Cai (2009);
- Inline graphic: the case-cohort estimator developed by Prentice (1986), for which the average sample size is 1000, obtained by adjusting the full cohort size to each censoring rate Inline graphic;
- Inline graphic: the proposed EMSELE.

Simulation results are based on 2000 simulations with total ODS sample size Inline graphic.

Table 2 provides additional simulation results from a sensitivity analysis and for unbalanced allocations of the ODS supplemental samples. We investigated the performance of the proposed estimator under unbalanced values of Inline graphic and Inline graphic by choosing Inline graphic, Inline graphic, Inline graphic and Inline graphic, respectively. The results in Table 2 indicate that, overall, the properties of Inline graphic observed in Table 1 hold for both balanced and unbalanced values of Inline graphic and Inline graphic.

Table 2.

Results are based on the model Inline graphic, where Inline graphic and Inline graphic

[Table 2 body not recoverable from the source. Columns: Mean, SD, SE and CP for each of the two regression coefficients. Rows: results under unbalanced supplemental allocations, followed by Censoring Scenario I and Censoring Scenario II, each at 0.60 and 0.90.]

The cutpoints for the ODS design were Inline graphic and Inline graphic sample quantiles. Censoring Scenario I: Inline graphic was generated from the distribution Inline graphic; Censoring Scenario II: Inline graphic was generated from the distribution Inline graphic. The results are based on Inline graphic replicates for each setting. Simulation results are based on the total ODS sample size Inline graphic.

The second part of Table 2 evaluates the performance of Inline graphic when the censoring time depends on the covariates. We considered two scenarios. Scenario I: Inline graphic was generated from the distribution Inline graphic, with Inline graphic chosen to be Inline graphic and Inline graphic. Scenario II: Inline graphic was generated from the distribution Inline graphic, with Inline graphic chosen to be Inline graphic and Inline graphic. The results in Table 2 indicate that dependence of the censoring time on the covariates leads to biased estimates of Inline graphic and Inline graphic. It is therefore advisable to check whether censoring depends on the covariates before applying the estimator in practice.
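One way to carry out such a check is to fit a Cox model to the censoring process. Censored records are treated as the events and failures as censored, and the covariate effects are then examined. The paper does not say which diagnostic it has in mind, so this is one reasonable option offered as an assumption. The minimal Newton–Raphson Cox fit and the helper names below are hypothetical.

```python
import numpy as np


def _score_info(beta, time, event, Z):
    """Score vector and observed information of the Cox log partial likelihood."""
    w = np.exp(Z @ beta)
    p = Z.shape[1]
    score, info = np.zeros(p), np.zeros((p, p))
    for i in np.flatnonzero(event):
        r = time >= time[i]                      # risk set at X_i
        wr, Zr = w[r], Z[r]
        s0 = wr.sum()
        zbar = wr @ Zr / s0
        s2 = (Zr * wr[:, None]).T @ Zr / s0
        score += Z[i] - zbar
        info += s2 - np.outer(zbar, zbar)
    return score, info


def cox_newton(time, event, Z, max_iter=50, tol=1e-10):
    """Cox model fit by Newton-Raphson (Breslow ties); returns (beta_hat, std_errors)."""
    Z = np.asarray(Z, dtype=float)
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        score, info = _score_info(beta, time, event, Z)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, info = _score_info(beta, time, event, Z)
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def censoring_dependence_check(time, delta, Z):
    """Wald z-statistics for covariate effects on the censoring hazard.
    Large |z| suggests that censoring depends on the covariates."""
    b, se = cox_newton(time, 1 - np.asarray(delta), Z)
    return b / se
```

Under conditional independence of failure and censoring given the covariates, this fit targets the cause-specific hazard of censoring. Covariate effects near zero are consistent with the assumption made in Section 2.1.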

4. Analysis of the MoBa study data

The MoBa study is an ongoing pregnancy cohort study conducted by the Norwegian Institute of Public Health. Pregnant women in Norway were enrolled from 1999 to 2008 and completed questionnaires on demographic and lifestyle factors and on medical and reproductive history. Women were asked whether their pregnancy was planned and reported their TTP. Subfecundity was defined as a TTP Inline graphic months. Our data set was based on women enrolled in 2003–2004 who delivered a live-born child. Five hundred and fifty subjects were randomly sampled from the women in the cohort who reported a TTP, with Inline graphic subjects excluded. Four hundred subjects were supplementally sampled from women whose TTP was Inline graphic months (Whitworth and others, 2012). There was no censoring in this data set, so there is no need to check here whether censoring depends on the covariates.

Among these Inline graphic eligible women, blood samples were collected around gestational week Inline graphic. Concentrations of PFOS and PFOA in the maternal blood samples were measured by high-performance liquid chromatography/tandem mass spectrometry, based on Inline graphic of plasma. In the analysis, we included the following variables as potential confounders:

- pre-pregnancy body mass index (BMI);
- maternal plasma albumin concentration (Alb);
- maternal consumption of lean fish and oily fish (Leanfish, Oilyfish);
- maternal age (MotherAge) and paternal age (FatherAge);
- maternal education (MotherEdu) and paternal education (FatherEdu);
- maternal smoking (Smoke3 for smoking 3 months before pregnancy, Smoke17 for smoking at gestational week 17);
- maternal self-reported alcohol intake Inline graphic months before pregnancy (MotherDrink);
- frequency of sexual intercourse Inline graphic month before pregnancy (SexFreq);
- maternal diseases: endometriosis (Endo), ovary/fallopian tube infection (Ovary), sexually transmitted disease (Std), and diabetes;
- calendar year of blood draw (YearDraw).

Table 3 provides the demographic characteristics of all Inline graphic women.

Table 3.

Demographics and characteristics of the MoBa study

All individuals TTP ≤12 months TTP >12 months
PFOS, Inline graphic Inline graphic Inline graphic Inline graphic
PFOA, Inline graphic Inline graphic Inline graphic Inline graphic
BMI, Inline graphic Inline graphic Inline graphic Inline graphic
FatherAge Inline graphic Inline graphic Inline graphic Inline graphic
Alb, Inline graphic Inline graphic Inline graphic Inline graphic
Oilyfish, Inline graphic Inline graphic Inline graphic Inline graphic
Leanfish, Inline graphic Inline graphic Inline graphic Inline graphic
MotherAge (%)
 <25 17.36 (163/939) 13.95 (71/509) 21.40 (92/430)
 25–29 41.21 (387/939) 41.45 (211/509) 40.93 (176/430)
 30–34 31.63 (297/939) 34.18 (174/509) 28.60 (123/430)
 ≥35 9.80 (92/939) 10.41 (53/509) 9.07 (39/430)
MotherEdu (%)
 <High school 8.86 (83/937) 6.69 (34/508) 11.42 (49/429)
 High school and other 31.70 (297/937) 30.51 (155/508) 33.10 (142/429)
 Some college 41.73 (391/937) 43.90 (223/508) 39.16 (168/429)
Inline graphicCollege 17.72 (166/937) 18.90 (96/508) 16.32 (70/429)
FatherEdu (%)
 <High school 12.29 (112/911) 10.44 (52/498) 14.53 (60/413)
 High school and other 44.13 (402/911) 44.18 (220/498) 44.07 (182/413)
 Some college 26.89 (245/911) 27.71 (138/498) 25.91 (107/413)
Inline graphicCollege 16.68 (152/911) 17.67 (88/498) 15.50 (64/413)
Smoke3 (%)
 None 69.86 (656/939) 73.48 (374/509) 65.58 (282/430)
 Sometimes 10.12 (95/939) 9.63 (49/509) 10.70 (46/430)
 Daily 20.02 (188/939) 16.90 (86/509) 23.72 (102/430)
Smoke17 (%)
 None 76.25 (716/939) 79.57 (405/509) 72.33 (311/430)
 Stopped 15.76 (148/939) 14.73 (75/509) 16.98 (73/430)
 Sometimes 1.28 (12/939) 1.38 (7/509) 1.16 (5/430)
 Daily 6.71 (63/939) 4.32 (22/509) 9.53 (41/430)
MotherDrink (%)
 >1 per week 7.22 (67/928) 7.17 (36/502) 7.28 (31/426)
 1 per week 15.41 (143/928) 18.73 (94/502) 11.50 (49/426)
 1–3 per month 32.87 (305/928) 34.86 (175/502) 30.52 (130/426)
 <1 per month or never 44.50 (413/928) 39.24 (197/502) 50.70 (216/426)
SexFreq (%)
 <1 per week 18.45 (171/927) 17.33 (87/502) 19.76 (84/425)
 1–2 per week 37.97 (352/927) 33.67 (169/502) 43.06 (183/425)
 >2 per week 43.58 (404/927) 49.00 (246/502) 37.18 (158/425)
Endo (%)
 Yes 3.09 (29/939) 0.59 (3/509) 6.05 (26/430)
 No 96.91 (910/939) 99.41 (506/509) 93.95 (404/430)
Ovary (%)
 Yes 2.66 (25/939) 2.36 (12/509) 3.02 (13/430)
 No 97.34 (914/939) 97.64 (497/509) 96.98 (417/430)
Std (%)
 Yes 12.57 (118/939) 12.38 (63/509) 12.79 (55/430)
 No 87.43 (821/939) 87.62 (446/509) 87.21 (375/430)
Diabetes (%)
 Yes 1.60 (15/939) 0.39 (2/509) 3.02 (13/430)
 No 98.40 (924/939) 99.61 (507/509) 96.98 (417/430)
YearDraw (%)
 2003 49.52 (465/939) 50.10 (255/509) 48.84 (210/430)
 2004 50.48 (474/939) 49.90 (254/509) 51.16 (220/430)

Odds ratio approach. We first implemented a standard epidemiologic approach: we dichotomized TTP into a binary subfecundity indicator, i.e. Inline graphic for TTP Inline graphic months and Inline graphic for TTP Inline graphic months, and used logistic regression to estimate the odds ratio relating PFOA to subfecundity, adjusted for the confounders listed above. The result for this approach is summarized in the first column of Table 4. Because of missing covariate values, the final analysis sample included Inline graphic women, Inline graphic with TTP Inline graphic months and Inline graphic with TTP Inline graphic months. The odds of subfecundity are multiplied by Inline graphic for each one-unit increase in PFOA. The second and third columns of Table 4 show the results for cutpoints of Inline graphic and Inline graphic months. Comparing across the columns, different choices of cutpoint lead to different inferences: the odds ratios for TTP Inline graphic, TTP Inline graphic and TTP Inline graphic are Inline graphic, Inline graphic and Inline graphic, respectively. Moreover, the effect of PFOA is no longer significant when the cutpoint changes from Inline graphic to Inline graphic months (Inline graphic).

Table 4.

Logistic regression analyses for covariate effects on subfecundity in the MoBa study with different cutpoints

Inline graphic
Inline graphic
Inline graphic
Est. SE p-value Est. SE p-value Est. SE p-value
Intercept −7.2631 1.0236 <0.0001* −5.1018 1.0265 <0.0001* −9.0628 1.2646 <0.0001*
PFOA 0.2303 0.0652 0.0004* 0.2646 0.0694 0.0001* 0.1114 0.0779 0.1527
BMI 0.0563 0.0153 0.0002* 0.0418 0.0158 0.0081* 0.0402 0.0182 0.0271*
MotherAge 0.7001 0.1078 <0.0001* 0.5182 0.1068 <0.0001* 0.9220 0.1404 <0.0001*
MotherEdu −0.1428 0.0853 0.0939 −0.2258 0.0864 0.0090* −0.3258 0.1124 0.0037*
FatherAge 0.1293 0.0183 <0.0001* 0.1123 0.0188 <0.0001* 0.1421 0.0209 <0.0001*
SexFreq −0.3372 0.0983 0.0006* −0.3080 0.0996 0.0020* −0.1915 0.1274 0.1328
Endo 2.4183 0.6341 0.0001* 2.8997 1.0290 0.0048* 2.2673 0.4307 <0.0001*

*Parameter estimate is significant at the 5% level.

Inline graphic (Inline graphic).
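The cutpoint sensitivity described above can be checked directly from the PFOA rows of Table 4. The sketch below is a minimal illustration that assumes the usual large-sample normal (Wald) approximation: it converts each log-odds estimate into an odds ratio, a 95% confidence interval and a two-sided p-value. `PFOA_LOGISTIC` and `wald_summary` are illustrative names, not part of the original analysis.

```python
import math

# (estimate, standard error) for PFOA in the three logistic models of Table 4,
# listed in the table's column order.
PFOA_LOGISTIC = [(0.2303, 0.0652), (0.2646, 0.0694), (0.1114, 0.0779)]


def wald_summary(est, se, z_crit=1.96):
    """Return (odds ratio, lower 95% limit, upper 95% limit, two-sided p-value)."""
    odds_ratio = math.exp(est)
    lower = math.exp(est - z_crit * se)
    upper = math.exp(est + z_crit * se)
    z = est / se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return odds_ratio, lower, upper, p_value


for column, (est, se) in enumerate(PFOA_LOGISTIC, start=1):
    or_, lo, hi, p = wald_summary(est, se)
    print(f"column {column}: OR={or_:.3f}  95% CI=({lo:.3f}, {hi:.3f})  p={p:.4f}")
```

Under this approximation the first two intervals exclude 1 while the third includes it, and the recomputed p-values (about 0.0004 and 0.1527 for the first and third columns) agree with those reported in the table.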

Proposed ODS design and analysis. Using the Inline graphic randomly sampled women as the SRS portion and the Inline graphic women additionally sampled from those with TTP Inline graphic months as a supplemental sample, we implemented our EMSELE method, adjusting for all potential confounders. Because of missing covariate values, the SRS and supplemental samples in the final fitted model contained 520 and 390 women, respectively. The results of the final fitted model are listed in the first column of Table 5.
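The two-part sample described above can be sketched as follows. This is a hypothetical illustration, not the MoBa sampling code: `draw_ods_sample` draws an SRS portion from the cohort and then a supplemental sample from the remaining members whose TTP exceeds a cutoff. The cutoff, the sample sizes and the rule that supplemental subjects are drawn only from women not already in the SRS portion are all assumptions.

```python
import random


def draw_ods_sample(ttp, n_srs, n_supp, cutoff, seed=None):
    """Hypothetical helper returning (srs_ids, supplemental_ids) for a two-part ODS sample.

    ttp    : observed times to pregnancy for every cohort member
    n_srs  : size of the simple random sample (SRS) portion
    n_supp : number of extra subjects drawn from the tail TTP > cutoff
    cutoff : assumed TTP threshold defining the informative tail
    """
    rng = random.Random(seed)
    ids = list(range(len(ttp)))
    srs_ids = rng.sample(ids, n_srs)
    chosen = set(srs_ids)
    # Assumption: supplemental subjects come only from the tail and are
    # drawn from cohort members not already in the SRS portion.
    tail = [i for i in ids if ttp[i] > cutoff and i not in chosen]
    supplemental_ids = rng.sample(tail, n_supp)
    return srs_ids, supplemental_ids
```

`random.Random.sample` raises `ValueError` if fewer than `n_supp` eligible subjects remain in the tail, which acts as a guard when planning sample sizes.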

Table 5.

Cox regression analyses for covariate effects on TTP in the MoBa study

ODS design (n_0=520, n_3=390)
Naive design (n_v=n_0+n_3=910)
SRS design (n_0=520)
Est. SE p-value Est. SE p-value Est. SE p-value
PFOA −0.0567 0.0253 0.0251* −0.0479 0.0331 0.1473 −0.0633 0.0419 0.1304
BMI −0.0148 0.0056 0.0090* −0.0222 0.0063 0.0004* −0.0110 0.0091 0.2282
MotherAge −0.2155 0.0342 <0.0001* −0.4066 0.0467 <0.0001* −0.1662 0.0674 0.0137*
MotherEdu 0.0968 0.0333 0.0037* 0.1179 0.0396 0.0029* 0.0730 0.0523 0.1630
FatherAge −0.0470 0.0059 <0.0001* −0.0801 0.0087 <0.0001* −0.0415 0.0117 0.0004*
SexFreq 0.1089 0.0396 0.0060* 0.1133 0.0488 0.0203* 0.0968 0.0649 0.1359
Endo −1.2153 0.2906 <0.0001* −0.7830 0.1879 <0.0001* −0.8424 0.5027 0.0938

*Parameter estimate is significant at the 5% level.

The estimate for PFOA is negative, suggesting that a higher PFOA level increases the risk of subfecundity (i.e. TTP Inline graphic). Women with higher PFOA levels tend to have a longer TTP, and for each one-unit increment in PFOA, the risk of subfecundity increases with hazard ratio Inline graphic. Unsurprisingly, older mothers and fathers are more likely to have a longer TTP. Women who had endometriosis before pregnancy have a higher risk of subfecundity (hazard ratio Inline graphic). One advantage of our proposed method is that, given the covariates, we can predict the risk probability of TTP Inline graphic for any time Inline graphic, whereas the logistic method yields only one risk probability (i.e. only for TTP Inline graphic).
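Under the proportional hazards model, a coefficient β translates into a hazard ratio exp(β), and the probability of not yet being pregnant at time t for covariates Z is S0(t)^exp(β'Z), where S0 is the baseline survival function. The sketch below applies these standard identities to the ODS estimates in Table 5. Because the model is for the time to pregnancy, exp(β) here is a ratio of hazards of conceiving, so a value below 1 corresponds to a longer TTP. The baseline survival value is a hypothetical input; the paper's estimate of S0 is not reproduced here.

```python
import math

# Log hazard ratio estimates taken from the ODS column of Table 5.
BETA_ODS = {"PFOA": -0.0567, "Endo": -1.2153}


def hazard_ratio(beta):
    """Hazard ratio (of conceiving) for a one-unit increase in the covariate."""
    return math.exp(beta)


def prob_ttp_exceeds(baseline_survival_t, linear_predictor):
    """P(TTP > t | Z) under proportional hazards: S0(t) ** exp(beta'Z).

    baseline_survival_t : S0(t), supplied by the analyst (hypothetical here)
    linear_predictor    : beta'Z for the covariate profile of interest
    """
    return baseline_survival_t ** math.exp(linear_predictor)


print(f"HR for PFOA: {hazard_ratio(BETA_ODS['PFOA']):.4f}")
print(f"HR for Endo: {hazard_ratio(BETA_ODS['Endo']):.4f}")
```

For example, for a reference profile with β'Z = 0 and a hypothetical S0(12) = 0.2, a one-unit increase in PFOA with other covariates fixed raises P(TTP > 12) from 0.2 to about 0.219.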

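The results from using only the SRS portion of the ODS sample (SRS design) and from treating the whole ODS sample as an SRS (naive design) are given in columns 3 and 2 of Table 5, respectively. The three methods give estimates of the same sign for every covariate, and the proposed method has the smallest standard error for every covariate except Endo, for which the naive analysis reports a smaller one. However, the PFOA effect is significant under neither the naive design (p = 0.1473) nor the SRS design (p = 0.1304). The naive analysis ignores the outcome-dependent sampling and is therefore generally biased, and the SRS analysis discards the supplemental sample; the proposed estimator accounts for the sampling scheme and is more efficient.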
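One simple way to quantify the efficiency comparison in Table 5 is the ratio of estimated variances. The sketch below computes (SE_SRS / SE_ODS)^2 for each covariate from the reported standard errors; a value above 1 means the ODS analysis is more precise. The naive column is left out of the ratio because that estimator ignores the sampling scheme. Note also that the SRS analysis uses only the 520 SRS subjects, so the ratio reflects both the added supplemental sample and the estimator. The names are illustrative.

```python
# Standard errors from Table 5 in column order: (ODS, Naive, SRS).
SE_TABLE5 = {
    "PFOA": (0.0253, 0.0331, 0.0419),
    "BMI": (0.0056, 0.0063, 0.0091),
    "MotherAge": (0.0342, 0.0467, 0.0674),
    "MotherEdu": (0.0333, 0.0396, 0.0523),
    "FatherAge": (0.0059, 0.0087, 0.0117),
    "SexFreq": (0.0396, 0.0488, 0.0649),
    "Endo": (0.2906, 0.1879, 0.5027),
}


def relative_efficiency(se_proposed, se_other):
    """Variance ratio Var(other) / Var(proposed); values > 1 favour the proposed estimator."""
    return (se_other / se_proposed) ** 2


for name, (se_ods, _se_naive, se_srs) in SE_TABLE5.items():
    print(f"{name}: RE(SRS vs ODS) = {relative_efficiency(se_ods, se_srs):.2f}")
```

For PFOA the ratio is about 2.74.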

Following another standard epidemiologic approach, we used the discrete-time analog of the Cox model to estimate the fecundability odds ratio (FOR), adjusting for the same confounders as in Tables 4 and 5. The resulting FOR is Inline graphic with SE Inline graphic, Inline graphic CI Inline graphic and Inline graphic. Note that the FOR is an odds ratio for conceiving within a single month, which is different from the hazard ratio from the Cox model.
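The discrete-time analog is typically fitted as a logistic regression on person-month records: each woman contributes one row per month at risk, with outcome 1 only in the month she conceives, and logit P(conception in month j) = α_j + γ'Z, so that FOR = exp(γ). The sketch below shows only the data expansion step. `expand_person_months` is a hypothetical helper, and it assumes integer TTP in months and that a censored woman was at risk for all of her recorded months without conceiving.

```python
def expand_person_months(ttp, pregnant):
    """Hypothetical helper: expand (TTP, event) pairs into person-month rows.

    ttp      : completed months of trying, integers >= 1
    pregnant : 1 if pregnancy occurred in month ttp, 0 if censored
    Returns (subject, month, y) rows, with y = 1 only in the month of conception.
    """
    rows = []
    for subject, (t, event) in enumerate(zip(ttp, pregnant)):
        for month in range(1, t + 1):
            y = 1 if (event == 1 and month == t) else 0
            rows.append((subject, month, y))
    return rows


print(expand_person_months([3, 2], [1, 0]))
```

Any standard logistic regression routine could then be fitted to these rows, with month indicators for α_j and the covariates Z attached to each row.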

In summary, the proposed method provides an efficient and consistent estimate, makes full use of the available survival data and exploits the structure of the ODS design. Unlike the odds ratio approach commonly used by epidemiologists, it does not yield conclusions that change with the choice of TTP cutpoint.

5. Discussion

In this paper, motivated by the need to assess the relationship between PFASs and women's subfecundity in the MoBa study, we designed a general ODS scheme for survival studies with a failure-time outcome. To reap the benefits of such a survival ODS design, we developed a new inferential method that provides an EMSELE for the parameters of primary interest. Our proposed ODS method improves on the odds ratio approach currently used in epidemiology, as well as on the case-cohort and generalized case-cohort designs. Because the selection of cases may depend on the timing of disease endpoints, so that subjects from the most informative regions are oversampled, the proposed ODS design for survival data can enhance study efficiency and reduce study cost.

Simulation studies suggest that the asymptotic results approximate the small-sample performance of the proposed method well. Among the four estimators compared, our proposed estimator is the most efficient; the other three are the maximum partial likelihood estimator from Cox's model based on a simple random sample of the same size as the ODS sample, the estimator under the case-cohort design developed by Prentice (1986), and the weighted estimator under the generalized case-cohort design developed by Kang and Cai (2009). This efficiency gain indicates that the proposed method is a cost-efficient alternative to the case-cohort and generalized case-cohort designs.

We close with a few comments on the behavior of the proposed design and estimator, and some cautionary points for practical applications. First, the proposed ODS design and estimator are more efficient than the SRS design when the censoring rate is high, because under heavy censoring an SRS contains substantially fewer failures than the ODS sample. This suggests that the ODS design is particularly useful when censoring is high. Second, when censoring is extremely high (e.g. at Inline graphic and Inline graphic), our variance estimator based on the asymptotic theory could overestimate the true variance. In such small-sample situations, one would need either to increase the sample size or to use an alternative variance estimator, such as the bootstrap. Third, our estimator from (2.6) assumes that censoring is independent of the covariates; biases in effect estimation could result if this assumption is violated (Table 2). A good practice is to check this assumption in real data analysis using the SRS sample. Finally, we estimated Inline graphic with a simple consistent estimator from the SRS sample; alternative estimators that use more of the available data could also be used.
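The bootstrap alternative mentioned above could be organised as in the sketch below. Because the ODS sample has two components drawn in different ways, this version resamples the SRS portion and the supplemental portion separately, keeping each component's size fixed. That stratified scheme is an assumption rather than the paper's procedure, and `estimator` is a hypothetical stand-in for the EMSELE fit, which is not reproduced here.

```python
import random
import statistics


def stratified_bootstrap_se(srs, supplemental, estimator, n_boot=200, seed=0):
    """Bootstrap SE that resamples each ODS component separately (assumed scheme).

    srs, supplemental : lists of subject records for the two sample components
    estimator         : hypothetical function(srs_sample, supplemental_sample) -> float
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Resample with replacement within each component so every replicate
        # keeps the original SRS and supplemental sample sizes.
        srs_b = rng.choices(srs, k=len(srs))
        supp_b = rng.choices(supplemental, k=len(supplemental))
        replicates.append(estimator(srs_b, supp_b))
    return statistics.stdev(replicates)
```

Keeping the component sizes fixed mirrors the way the design fixes the SRS and supplemental sample sizes in advance.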

Application of the proposed ODS method to the MoBa data suggested that women with higher PFOA levels tend to have a longer TTP. In contrast, the default epidemiologic approach, which estimates odds ratios at different TTP cutpoints, yields inconsistent results. Compared with competing designs, our proposed ODS method provides a feasible design and efficient estimates. Future work includes developing models and estimation procedures for studies with multiple disease outcomes. In some studies, researchers may need to consider several diseases or several subtypes of a disease (Lu and Shih, 2006; Kang and Cai, 2009). For example, in the Busselton Health Study (Cullen, 1972), it was of interest to study the relationship of serum ferritin with both coronary heart disease and stroke events.

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Funding

This research was supported in part by U.S. National Institutes of Health grants (R01 ES021900, UL1 RR025747, P01 CA142538) and the Intramural Research Program of the National Institutes of Health, National Institute of Environmental Health Sciences. The Norwegian Mother and Child Cohort Study is supported by the Norwegian Ministry of Health and the Ministry of Education and Research, NIH/NIEHS (N01-ES-75558), NIH/NINDS (UO1 NS 047537-01, UO1 NS 047537-06A1), and the Norwegian Research Council/FUGE (151918/S10). This research was also funded in part by the National Science Foundation of China (11101314 to J.D. and 11171263, 11371299 to Y.L.).

Acknowledgements

We are grateful to all the participating families in Norway who take part in this ongoing cohort study. Conflict of Interest: None declared.

References

  1. Alexander B. H., Olsen G. W., Burris J. M., Mandel J. H., Mandel J. S. Mortality of employees of a perfluorooctanesulphonyl fluoride manufacturing facility. Occupational and Environmental Medicine. 2003;60:722–729. doi: 10.1136/oem.60.10.722.
  2. Breslow N. E., Wellner J. A. Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. The Scandinavian Journal of Statistics. 2007;34:86–102. doi: 10.1111/j.1467-9469.2007.00574.x.
  3. Cai J., Zeng D. Power calculation for case-cohort studies with nonrare events. Biometrics. 2007;63:1288–1295. doi: 10.1111/j.1541-0420.2007.00838.x.
  4. Chatterjee N., Chen Y. H., Breslow N. E. A pseudoscore estimator for regression problems with two-phase sampling. Journal of the American Statistical Association. 2003;98:158–168.
  5. Chen K. Generalized case-cohort sampling. Journal of the Royal Statistical Society, Series B. 2001;63:791–809.
  6. Cornfield J. A method of estimating comparative rates from clinical data. Applications to cancer of lung, breast, and cervix. Journal of the National Cancer Institute. 1951;11:1269–1275.
  7. Cox D. R. Regression models and life-tables. Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
  8. Cullen K. J. Mass health examinations in the Busselton population, 1966 to 1970. The Medical Journal of Australia. 1972;2:714–718. doi: 10.5694/j.1326-5377.1972.tb103506.x.
  9. Fei C. Y., McLaughlin J. K., Lipworth L., Olsen J. Maternal levels of perfluorinated chemicals and subfecundity. Human Reproduction. 2009;24:1200–1205. doi: 10.1093/humrep/den490.
  10. Johansen S. An extension of Cox's regression model. International Statistical Review. 1983;51:165–174.
  11. Johansson N., Fredriksson A., Eriksson P. Neonatal exposure to perfluorooctane sulfonate (PFOS) and perfluorooctanoic acid (PFOA) causes neurobehavioural defects in adult mice. Neurotoxicology. 2008;29:160–169. doi: 10.1016/j.neuro.2007.10.008.
  12. Kang S., Cai J. Marginal hazards model for case-cohort studies with multiple disease outcomes. Biometrika. 2009;96:887–901. doi: 10.1093/biomet/asp059.
  13. Lu S., Shih J. H. Case-cohort designs and analysis for clustered failure time data. Biometrics. 2006;62:1138–1148. doi: 10.1111/j.1541-0420.2006.00584.x.
  14. Lu W., Tsiatis A. A. Semiparametric transformation models for the case-cohort study. Biometrika. 2006;93:207–214.
  15. Owen A. B. Empirical likelihood for confidence regions. The Annals of Statistics. 1990;18:90–120.
  16. Prentice R. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73:1–11.
  17. Song R., Zhou H., Kosorok M. R. On semiparametric efficient inference for two-stage outcome dependent sampling with a continuous outcome. Biometrika. 2009;96:221–228. doi: 10.1093/biomet/asn073.
  18. Sun J., Sun L., Flournoy N. Additive hazards model for competing risks analysis of the case-cohort design. Communications in Statistics: Theory and Methods. 2004;33:351–366.
  19. Tsai W. Y. Pseudo-partial likelihood for proportional hazards models with biased-sampling data. Biometrika. 2009;96(3):601–615. doi: 10.1093/biomet/asp026.
  20. Vardi Y. Nonparametric estimation in the presence of length bias. The Annals of Statistics. 1982;10:616–620.
  21. Vardi Y. Empirical distributions in selection bias models. The Annals of Statistics. 1985;13:178–203.
  22. Weaver M. A., Zhou H. An estimated likelihood method for continuous outcome regression models with outcome-dependent sampling. Journal of the American Statistical Association. 2005;100:459–469.
  23. Weinberg C. R., Wacholder S. Prospective analysis of case-control data under general multiplicative-intercept risk models. Biometrika. 1993;80:461–465.
  24. Whittemore A. S. Multistage sampling designs and estimating equations. Journal of the Royal Statistical Society, Series B. 1997;59:589–602.
  25. Whitworth K. W., Haug L. S., Baird D. D., Becher G., Hoppin J. A., Skjaerven R., Thomsen C., Eggesbo M., Travlos G., Wilson R., Longnecker M. P. Perfluorinated compounds and subfecundity in pregnant women. Epidemiology. 2012;23:257–263. doi: 10.1097/EDE.0b013e31823b5031.
  26. Zhou H., Song R., Qin J. Statistical inference for a two-stage outcome dependent sampling design with a continuous outcome. Biometrics. 2011;67:194–202. doi: 10.1111/j.1541-0420.2010.01446.x.
  27. Zhou H., Weaver M., Qin J., Longnecker M., Wang M. C. A semiparametric empirical likelihood method for data from an outcome dependent sampling scheme with a continuous outcome. Biometrics. 2002;58:413–421. doi: 10.1111/j.0006-341x.2002.00413.x.
  28. Zhou H., Wu Y., Liu Y., Cai J. Semiparametric inference for a 2-stage outcome-auxiliary-dependent sampling design with continuous outcome. Biostatistics. 2011;12:521–534. doi: 10.1093/biostatistics/kxq080.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
