Biostatistics (Oxford, England). 2021 May 5;24(1):32–51. doi: 10.1093/biostatistics/kxab014

Semiparametric regression analysis of bivariate censored events in a family study of Alzheimer’s disease

Fei Gao 1, Donglin Zeng 2, Yuanjia Wang 3
PMCID: PMC9748583  PMID: 33948627

Summary

Assessing disease comorbidity patterns in families represents the first step in gene mapping for diseases and is central to the practice of precision medicine. One way to evaluate the relative contributions of genetic risk factors and environmental determinants of a complex trait (e.g., Alzheimer’s disease [AD]) and its comorbidities (e.g., cardiovascular diseases [CVD]) is through familial studies, where an initial cohort of subjects is recruited, genotyped for specific loci, and interviewed to provide extensive disease history in family members. Because of the retrospective nature of obtaining disease phenotypes in family members, the exact time of disease onset may not be available, such that current status data or interval-censored data are observed. All existing methods for analyzing these family study data assume a single event subject to right-censoring and so are not applicable. In this article, we propose a semiparametric regression model for the family history data that assumes a family-specific random effect and individual random effects to account for the dependence due to shared environmental exposures and unobserved genetic relatedness, respectively. To incorporate multiple events, we jointly model the onset of the primary disease of interest and a secondary disease outcome that is subject to interval-censoring. We propose nonparametric maximum likelihood estimation and develop a stable Expectation-Maximization (EM) algorithm for computation. We establish the asymptotic properties of the resulting estimators and examine the performance of the proposed methods through simulation studies. Our application to a real-world study reveals that the comorbidity between AD and CVD is due mainly to genetic factors rather than environmental factors.

Keywords: Alzheimer’s disease, Cardiovascular diseases, Comorbidity, Event history analysis, Multivariate survival analysis, Precision medicine, Random effects, Risk prediction

1. Introduction

Assessing disease comorbidity patterns in families represents the first step in gene mapping for a disease, and evaluating genetic risk in families is crucial for implementing precision medicine in routine clinical care (Aronson and Rehm, 2015). Case–control family studies and kin-cohort studies, because of their cost-effectiveness and ability to include disease family history information from all family members including deceased relatives, have been adopted to study genetic risk for many diseases including breast cancer, Alzheimer’s disease (AD), and Parkinson’s disease (Wacholder and others, 1998; Chen and others, 2009; Wang and others, 2015; Hsu and others, 2018). In these studies, independent subjects called probands are recruited and undergo genotyping, and the history of a primary disease of interest and secondary diseases is obtained from the probands and their family members (relatives). The relatives are usually not genotyped but their genetic mutation status may be partially inferred using the family pedigree structure under the Mendelian assumption.

Our motivating study is the Washington Heights, Hamilton Heights, Inwood Community Aging Project (WHICAP), a community-based, prospective longitudinal study of aging and dementia among elderly, urban-dwelling residents (Tang and others, 2001). The main objective of the project is to identify risk factors and biomarkers for aging and late onset AD (age-at-onset greater than 65; Reitz and others, 2011) in a multiethnic cohort. The project began enrolling patients in 1989 and recruited participants by a probability sample from Medicare beneficiaries in the Northern Manhattan communities. The study followed more than 5900 residents over 65 years of age and has yielded comprehensive data on the rates and risk factors for AD and other dementias among African-Americans, Caribbean Hispanics, and Caucasians living in these Northern Manhattan communities (Tang and others, 2001; Stern and others, 2017). Probands from WHICAP were genotyped at recruitment for the frequency of the Apolipoprotein E (APOE) ε4 allele. In addition, they received a structured family history interview that includes family history of AD and cardiovascular diseases (CVD) among first-degree relatives (Maestre and others, 1995). The probands were then followed for AD and CVD occurrences, so that the exact event time or right-censoring time on their AD onset and interval-censoring time (when exact event time was not available) on their CVD onset were obtained. For relatives, the history of both AD and CVD was only assessed once at the study baseline so that the relatives contribute current status data (i.e., whether the disease occurred before the time at which a subject was contacted by the study).

Major challenges in analyzing data from family history studies such as WHICAP are accounting for potential dependence among relatives and distinguishing dependence attributed to disease heredity from that attributed to shared environmental exposures. Simply assuming conditional independence for the disease events from all family members given their genetic variants is not sufficient to account for such dependence and so may lead to suboptimal findings. To analyze family history data, Hsu and others (2007), Chen and others (2009), Graber-Naidich and others (2011), and Gorfine and others (2013) introduced a class of shared frailty models, where a single shared random effect was introduced to characterize dependency among family members. Gorfine and Hsu (2011) considered a general structure for random effects among family members for multivariate event time subject to competing risks. Later on, Hsu and others (2018) modeled the dependency among family members by more flexible copula models. Even though these existing methods accounted for dependence among relatives, they cannot disentangle correlation due to disease heredity from that due to shared family-specific environmental exposures.

Another complication in incorporating family history data is that the exact time of disease onset is not always available because of the retrospective nature of obtaining disease history information from the family members. In our motivating study of WHICAP, only 675 out of a total of 8465 subjects had the exact time of AD onset, and current status (including right-censored) data for the rest were observed. Mixed types of censoring for the disease onset time pose a major challenge for analysis and inference. All existing methods for analyzing family history data assume right-censored onset times and so are not applicable.

Lastly, many family studies also collect (incomplete) onset time for secondary diseases, for instance, CVD in the WHICAP study. Information on secondary events from family members allows characterizing dependence between comorbid disorders due to genetic factors and environmental exposures, which is an important objective of comorbidity research in precision medicine. In addition, incorporating this secondary disease outcome into the analysis can potentially increase the power of findings for the primary disease onset. However, the challenge for analysis is that the onset time of the secondary disease is often incompletely observed: it is known only to lie within an interval and is therefore interval-censored.

In this article, we propose a semiparametric regression model framework to address all aforementioned challenges in analyzing family studies data. The research goals are to estimate genetic risk at a causal gene or major locus, assess the effects of unobserved environmental risk factors and of genetic risk at unobserved loci (polygenic effects), and dissect the relative importance of genetic risks versus environmental contribution to multiple traits (e.g., AD and CVD). We make several contributions that are not available in the existing literature: (i) distinguish the dependence due to shared environmental risk factors (e.g., shared lifestyle or diet) from genetic hereditary risk factors; (ii) handle multilevel correlation where subjects are nested in families and events are nested in subjects; (iii) handle complex censoring patterns; (iv) jointly model multiple disease events to predict primary disease risk given history of other events; and (v) establish asymptotic properties of the estimators.

We use a family-specific random effect to account for the dependence due to shared environmental risk factors. In addition, we use subject-specific random effects to represent unobserved genetic relatedness and heredity such that their correlation is consistent with the kinship coefficients among family members. To incorporate information from a secondary disease outcome, we jointly model the onset event times of the primary and secondary diseases. Such joint modeling allows predicting risk of the primary disease given the medical history of the other event collected on the subject and family event history. We propose nonparametric maximum likelihood estimation and develop a stable Expectation-Maximization (EM) algorithm for computation. The method and algorithm can efficiently handle complex censoring schemes including left- and right-censoring, current status, and interval-censoring, with several techniques in place to improve computational efficiency. We establish the asymptotic properties of the resulting estimators using empirical processes and semiparametric efficiency theories. Finally, we examine the performance of the proposed methods through simulation studies and application to the WHICAP study.

2. Methods

2.1. Model and data

Consider a random sample of Inline graphic independent families. For subject Inline graphic in the Inline graphicth family of size Inline graphic (Inline graphic), let Inline graphic denote the primary event time, Inline graphic denote the secondary event time, Inline graphic denote the genotype, and Inline graphic denote a Inline graphic-vector of all other baseline covariates. Write Inline graphic. We assume that the cumulative hazard function of Inline graphic given random effects is given by

graphic file with name Equation1.gif (2.1)

Here, Inline graphic is a Inline graphic-vector of regression coefficients and Inline graphic is an unknown baseline cumulative hazard function. Two-level random effects are introduced: Inline graphic is a family-specific random effect representing the shared environmental risks among family members; and Inline graphic is a subject-specific random effect reflecting the genetic relatedness (polygenic effect) among family members. Two-level random effects representing environmental risks and genetic relatedness have also been considered by Pfeiffer and others (2001), where the disease outcome is binary. The shared environmental risks among family members may come from shared social culture, living conditions, or lifestyle, which lead to common nutrition intake, exposure to air pollution, level of physical activity, and so on. We assume that Inline graphic and Inline graphic are independent with

graphic file with name Equation2.gif

where Inline graphic is the (known) kinship matrix within the family, and Inline graphic and Inline graphic are unknown variance component parameters. The kinship matrix is used to account for correlation among relatives due to shared latent genetic risk factors (not captured by Inline graphic) at multiple loci for complex disorders, as opposed to Mendelian disorders in which a mutation in a single gene determines the development of the disease (Khoury and others, 1993). For example, for a family of mother, father, and two children, Inline graphic is given by

graphic file with name Equation3.gif

reflecting the fact that each offspring inherits one gene from each parent independently. To further include the onset time of secondary disease in the analysis, we assume that the cumulative hazard function of Inline graphic given random effects is

graphic file with name Equation4.gif

where Inline graphic is a Inline graphic-vector of regression coefficients, Inline graphic and Inline graphic are unknown coefficients, and Inline graphic is an unknown baseline cumulative hazard function. Here, Inline graphic represents the magnitude of the dependence between the two diseases due to the shared environment, and Inline graphic represents the impact of the same genetic component on the secondary disease. In particular, a nonzero Inline graphic indicates that the two disease outcomes are correlated due to shared latent genetic factors, thus implying common genetic causes other than the observed genotype Inline graphic.
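The kinship structure described above for a nuclear family can be illustrated with a short sketch. It assumes the matrix is coded as twice the kinship coefficient (1 on the diagonal, 0.5 for parent–offspring and full-sibling pairs, 0 between unrelated parents), and that the family-specific and subject-specific effects enter the primary-event model additively. The function names and the variance values are hypothetical and are not taken from the article.

```python
# Illustrative sketch only. Assumptions (not from the article): the matrix is
# coded as twice the kinship coefficient, founders are unrelated and non-inbred,
# and the two random effects enter the model additively.

def relationship_matrix(parents):
    """Build a relationship matrix with the standard tabular recursion.

    `parents` maps each member to a (mother, father) tuple, or to None for a
    founder. Members must be listed with parents before their children.
    """
    members = list(parents)
    idx = {m: i for i, m in enumerate(members)}
    n = len(members)
    K = [[0.0] * n for _ in range(n)]
    for m in members:
        i = idx[m]
        if parents[m] is None:
            K[i][i] = 1.0  # founder: unrelated to everyone listed before
            continue
        mo, fa = (idx[p] for p in parents[m])
        # Diagonal: 1 plus half the relationship between the two parents.
        K[i][i] = 1.0 + 0.5 * K[mo][fa]
        # Off-diagonal: average of the parents' relationships with member j.
        for j in range(i):
            K[i][j] = K[j][i] = 0.5 * (K[mo][j] + K[fa][j])
    return K


def total_random_effect_covariance(K, var_family, var_genetic):
    """Covariance of (family effect + subject effect) across family members."""
    n = len(K)
    return [[var_family + var_genetic * K[j][k] for k in range(n)] for j in range(n)]


pedigree = {
    "mother": None,
    "father": None,
    "child1": ("mother", "father"),
    "child2": ("mother", "father"),
}
K = relationship_matrix(pedigree)
cov = total_random_effect_covariance(K, var_family=1.0, var_genetic=0.5)
```

Under these assumptions the two parents share only the environmental variance, whereas parent–child and sibling pairs also share half of the polygenic variance. This contrast is what allows the two variance components to be separated.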

We assume that the primary disease is subject to a mixed pattern of censoring so that Inline graphic can be left-censored, right-censored, or exactly observed. We let Inline graphic denote a censoring time, Inline graphic indicate, by the values of 0, 1, or 2, whether the observation is right-censored, exactly observed, or left-censored, respectively, and Inline graphic denote the observation time. In addition, the secondary disease is subject to interval-censoring such that Inline graphic is only known to lie within an interval Inline graphic. We assume that all censoring times are independent of Inline graphic and Inline graphic given covariates. The determination of Inline graphic and Inline graphic is based on the design of the study. In WHICAP, Inline graphic is the right-censoring time for probands and it is the age at baseline for relatives who contributed current status data. For the secondary diseases, Inline graphic and Inline graphic are the adjacent examination times between which the disease occurred.

Since only probands are genotyped, we introduce Inline graphic to denote, by the values of 1 versus 0, whether subject Inline graphic in family Inline graphic is a proband such that genotype Inline graphic is available. The observed data consist of Inline graphic, where

graphic file with name Equation5.gif

Given Inline graphic, and Inline graphic, the probability of observing Inline graphic for subject Inline graphic in family Inline graphic is given by

graphic file with name Equation6.gif

where Inline graphic denotes the derivative of Inline graphic. Since not all Inline graphic are observed, we define Inline graphic as the set of all possible genotypes compatible with the observed Inline graphic for family Inline graphic and Inline graphic as the corresponding probability for Inline graphic that is determined by the Mendelian law and the (known) population frequency of genetic mutation denoted by Inline graphic. For example, for family Inline graphic of a proband with genotype Inline graphic and his/her parent with unmeasured genotype, Inline graphic, Inline graphic, and Inline graphic. This approach of using Mendelian law to determine relatives’ genotypes given their proband’s genotype was also used in Graber-Naidich and others (2011). The observed-data likelihood function is

graphic file with name Equation7.gif

where, by slight abuse of notation, Inline graphic is the univariate normal density with mean zero and variance Inline graphic, and Inline graphic is the multivariate normal density with mean zero and variance Inline graphic.
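The Mendelian step used to weight the unobserved genotypes of relatives can be sketched as follows. This is an illustration under stated assumptions rather than the article's exact specification: the genotype is coded as the number of risk alleles (0, 1, or 2), population genotype frequencies follow Hardy–Weinberg proportions with allele frequency `p`, and the other (unobserved) parent is drawn at random from the population. The function name is hypothetical.

```python
# Illustrative sketch only. Assumptions (not from the article): genotype is the
# count of risk alleles, Hardy-Weinberg proportions hold, and mating is random.

def parent_genotype_posterior(child_genotype, p):
    """Return [P(parent genotype = a | child genotype)] for a = 0, 1, 2."""
    prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

    def child_given_parent(a, c):
        # The child receives a risk allele from this parent with probability
        # a / 2 and from the other parent with probability p.
        q = a / 2
        probs = [
            (1 - q) * (1 - p),
            q * (1 - p) + (1 - q) * p,
            q * p,
        ]
        return probs[c]

    joint = [prior[a] * child_given_parent(a, child_genotype) for a in range(3)]
    total = sum(joint)
    return [x / total for x in joint]


# A proband carrying two risk alleles cannot have a parent with none.
posterior = parent_genotype_posterior(child_genotype=2, p=0.5)
```

In the observed-data likelihood, probabilities of this kind act as the weights in the sum over all genotype configurations compatible with the proband's observed genotype.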

Remark. —

Semiparametric regression analysis for multiple event times subject to right- or interval-censoring has been considered in the literature, e.g., Gao and others (2019), where random effects were introduced to account for dependence among event times. However, the methods are not directly applicable. In this article, the number of random effects increases with the number of members in each family (Inline graphic random effects for family with size Inline graphic) and the random effects have a special correlation structure among family members, such that distinct challenges in computation are posed. In addition, the structure of correlation requires special care in examining model identifiability and invertibility of the information operator in showing the asymptotic properties of the estimators. More details are given in Section S.3 of the Supplementary material available at Biostatistics online.

2.2. Nonparametric maximum likelihood estimation

We wish to maximize the foregoing likelihood function to obtain the estimators. We apply a nonparametric maximum likelihood estimation approach, where Inline graphic and Inline graphic are allowed to be step functions. In particular, we let Inline graphic be the ordered sequence of all Inline graphic with Inline graphic and Inline graphic be the ordered sequence of all Inline graphic and Inline graphic with Inline graphic. The estimators for Inline graphic and Inline graphic are step functions that take jumps at Inline graphic and Inline graphic with respective jump sizes Inline graphic and Inline graphic. We maximize the following objective function

graphic file with name Equation8.gif

where

graphic file with name Equation9.gif

and Inline graphic is the jump size of Inline graphic at Inline graphic.

Direct maximization of the objective function is difficult due to the lack of the analytical expressions of the maximizers. We introduce latent random variables to form a likelihood equivalent to the objective function such that the maximum likelihood estimators can be easily obtained via a simple EM algorithm. In particular, we introduce a sequence of independent Poisson random variables Inline graphicInline graphic with rate Inline graphic. Let

graphic file with name Equation10.gif

Then, the observed-data likelihood of Inline graphic given Inline graphic, Inline graphic, Inline graphic, and Inline graphic is

graphic file with name Equation11.gif

In addition, we introduce another sequence of independent Poisson random variables Inline graphicInline graphic with rate Inline graphic, where Inline graphic. Let

graphic file with name Equation12.gif

The observed-data likelihood of Inline graphic given Inline graphic, Inline graphic, Inline graphic, and Inline graphic is

graphic file with name Equation13.gif

Therefore, the objective function Inline graphic can be viewed as the observed-data likelihood for Inline graphicInline graphic, treating Inline graphic, Inline graphic, Inline graphic, Inline graphic, and Inline graphic as missing data. We then propose an EM algorithm with details given in Section S.1 of the Supplementary material available at Biostatistics online. We denote the final estimators as Inline graphic and Inline graphic.

The use of Poisson random variables in an EM algorithm was originally proposed in Wang and others (2016) and Zeng and others (2016), but our case is substantially different and more complicated because of the mixed-type-censoring structure, nested random effects, and clustering nature of the observed data. By a similar argument as in Section S.1 in Zeng and others (2017), the likelihood increases at each iteration of the EM algorithm. The algorithm has several desirable features. First, large-scale optimization for the jump sizes is avoided since they are updated explicitly in the M-step. Second, the regression parameters are updated by solving estimating equations similar to the Poisson regression score equations via one-step Newton–Raphson. Finally, for a family with size Inline graphic, the E-step involves Inline graphic-dimensional numerical integration, which can be further simplified based on the structure of kinship matrix Inline graphic. In Section S.2 of the Supplementary material available at Biostatistics online, we provide an accelerated algorithm with simplified computation for nuclear families.
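The idea behind the Poisson data augmentation can be illustrated numerically. The sketch assumes, as in Poisson augmentations of this type, that each latent variable has rate equal to a jump size of the baseline cumulative hazard multiplied by the exponentiated linear predictor and random effects; the exact rates used in the article are not reproduced here. The numerical values and function names are hypothetical.

```python
import math

# Illustrative sketch only. Assumption (not from the article): each latent
# Poisson variable W_k has rate lambda_k * exp(eta), where lambda_k is a jump
# size of the baseline cumulative hazard and eta collects the covariate and
# random-effect terms.

def prob_all_zero(rates):
    """P(all W_k = 0) for independent Poisson W_k with the given rates."""
    return math.prod(math.exp(-r) for r in rates)


def conditional_means_given_positive_sum(rates):
    """E[W_k | sum_k W_k > 0] for independent Poisson W_k."""
    total = sum(rates)
    return [r / (1.0 - math.exp(-total)) for r in rates]


jump_sizes = [0.1, 0.2, 0.3]   # hypothetical jumps at times up to the censoring time
eta = 0.5                      # hypothetical linear predictor plus random effects
rates = [lam * math.exp(eta) for lam in jump_sizes]

# "No event by the censoring time" has the same probability as "all W_k are zero".
survival = prob_all_zero(rates)
# "Event before the censoring time" corresponds to "the W_k sum to a positive value".
e_step_means = conditional_means_given_positive_sum(rates)
```

Because the conditional means have a closed form, E-step quantities of this kind lead to explicit updates for the jump sizes, consistent with the first feature of the algorithm noted above.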

2.3. Prediction of primary disease onset

Based on the proposed model, we are able to predict the event time of the primary disease given the disease history from individuals and their relatives. Here, we evaluate the conditional density of the random effects given observed disease history and predict the future occurrence of the primary disease by replacing the density of the random effects by such conditional density. This approach has been commonly applied for risk prediction with frailty models (Gorfine and others, 2013; Gorfine and others, 2014).

Suppose that at time Inline graphic, a proband with genotype Inline graphic and covariates Inline graphic has not developed the primary disease. We wish to predict the future occurrence of primary disease given his/her history of the secondary disease, denoted by the interval-censored observation Inline graphic. Note that the conditional density of the random effects Inline graphic given the event history is proportional to

graphic file with name Equation14.gif

where Inline graphic. We predict the event time of the primary disease by the conditional cumulative distribution function Inline graphic that is given by

graphic file with name Equation15.gif (2.2)

If disease history of family members is also available, it can be used to further update the conditional density of the random effects Inline graphic. In particular, let Inline graphic, Inline graphic, and Inline graphic denote the respective mixed-type-censored observation of the primary disease, interval-censored observation of the secondary disease, and covariates for family member Inline graphic. The conditional density of Inline graphic is then proportional to

graphic file with name Equation16.gif

where Inline graphic is the kinship matrix. The conditional cumulative distribution function of the primary disease is then given by

graphic file with name Equation17.gif (2.3)

The quantities can be estimated through replacing the parameters by the estimators and evaluating the integrals by numerical integration with Gauss–Hermite quadrature.
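The role of Gauss–Hermite quadrature in evaluating such integrals can be sketched in one dimension. The example is deliberately simplified: it integrates over a single normal random effect with its unconditional density (the article's predictions use the conditional density given the disease history), it assumes the random effect enters the cumulative hazard through an exponential term, and it uses only a three-point rule to stay short. All names and numerical values are hypothetical.

```python
import math

# Illustrative sketch only. Assumptions (not from the article): one random
# effect b ~ N(0, sigma^2), and a conditional cumulative hazard of the form
# cum_hazard * exp(linear_predictor + b).

# Three-point Gauss-Hermite rule for the weight function exp(-x^2).
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (
    math.sqrt(math.pi) / 6,
    2 * math.sqrt(math.pi) / 3,
    math.sqrt(math.pi) / 6,
)


def normal_expectation(f, sigma):
    """Approximate E[f(b)] for b ~ N(0, sigma^2) via the substitution b = sqrt(2)*sigma*x."""
    total = sum(w * f(math.sqrt(2) * sigma * x) for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


def marginal_cdf(cum_hazard, linear_predictor, sigma):
    """Approximate P(T <= t) after integrating out the random effect."""
    def conditional_cdf(b):
        return 1.0 - math.exp(-cum_hazard * math.exp(linear_predictor + b))
    return normal_expectation(conditional_cdf, sigma)


risk = marginal_cdf(cum_hazard=0.3, linear_predictor=0.5, sigma=1.0)
```

A three-point rule is exact only for polynomials up to degree five, so a practical implementation would use more nodes and, for a family, a multidimensional rule over all of its random effects.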

3. Asymptotic properties

Suppose that the status of the secondary disease for subject Inline graphic in family Inline graphic is determined at a sequence of examination times Inline graphic, which have finite support Inline graphic with the least upper bound Inline graphic. We assume the following regularity conditions.

(A1) The true value Inline graphic lies in a known compact set Inline graphic in the interior of the domain for Inline graphic. The true value Inline graphic is strictly increasing and continuously differentiable on Inline graphic with Inline graphic. The true value Inline graphic is strictly increasing and continuously differentiable on Inline graphic with Inline graphic.

(A2) There exists some positive constant Inline graphic such that Inline graphic almost surely.

(A3) The number of potential examination times Inline graphic is positive with Inline graphic. There exists a positive constant Inline graphic such that Inline graphic. In addition, there exists a probability measure Inline graphic in Inline graphic such that the bivariate distribution function of Inline graphic conditional on Inline graphic is dominated by Inline graphic and its Radon–Nikodym derivative, denoted by Inline graphic, can be expanded to a positive and twice-continuously differentiable function in the set Inline graphic.

(A4) There exists a constant Inline graphic such that the family size Inline graphic satisfies Inline graphic and Inline graphic. The family size Inline graphic is independent of the random effects Inline graphic and Inline graphic’s.

(A5) Conditional on Inline graphic, let Inline graphic be a Inline graphic-vector with only the first and the Inline graphicth elements equal to 1 for Inline graphic. If there exist a constant Inline graphic, a constant vector Inline graphic, and constants Inline graphic and Inline graphic such that for any Inline graphic, and Inline graphic,

graphic file with name Equation18.gif

and

graphic file with name Equation19.gif

for Inline graphic with probability 1, then Inline graphic, Inline graphic, and Inline graphic.

(A6) Conditional on Inline graphic, if Inline graphic with probability 1 for some function Inline graphic, then Inline graphic.

Conditions (A1)–(A3) are the standard assumptions for clustered mixed- and interval-censored survival data. Condition (A4) assumes that the family size is bounded, there exist at least some families with at least two members, and the family size is not informative. Conditions (A5) and (A6) are crucial conditions to ensure parameter identifiability in the presence of multiple random effects (Lemma 1 in Section S.3 of the Supplementary material available at Biostatistics online). In particular, (A5) holds if Inline graphic is not concentrated on a hyperplane of lower dimensions with probability one and Inline graphic has nonzero off-diagonal elements with a positive probability. In (A6), we require that for a fixed family size, the matrix formed by the vectors Inline graphicInline graphic across all possible pedigree structures is a full rank matrix.

We state the consistency of Inline graphic and the weak convergence of Inline graphic in two theorems.

Theorem 1. —

Under conditions (A1)–(A6), Inline graphic, and Inline graphic almost surely, where Inline graphic denotes the supremum norm on Inline graphic and Inline graphic is the end of study time.

Theorem 2. —

Under conditions (A1)–(A6), Inline graphic converges weakly to a Inline graphic-variate zero-mean normal random vector with a covariance matrix that attains the semiparametric efficiency bound.

The proofs of the theorems are provided in Section S.3 of the Supplementary material available at Biostatistics online. To estimate the covariance matrix of Inline graphic, we define the profile likelihood function

graphic file with name Equation20.gif

where Inline graphic is the set of step functions with nonnegative jumps at Inline graphic, and Inline graphic is the set of step functions with nonnegative jumps at Inline graphic. Let Inline graphic denote the Inline graphicth family’s contribution to Inline graphic, i.e., Inline graphic, where Inline graphic is the log-likelihood function for the Inline graphicth family, and Inline graphic. We then estimate the covariance matrix of Inline graphic by the inverse of the matrix

graphic file with name Equation21.gif

where Inline graphic is the Inline graphicth canonical vector in Inline graphic, Inline graphic, and Inline graphic is a constant of order Inline graphic that is used for numerical differentiation of the log profile likelihood function. To evaluate the profile likelihood, we use the EM algorithm in Section S.1 of the Supplementary material available at Biostatistics online but fix the value of Inline graphic and only update Inline graphic and Inline graphic in the M-step.
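The numerical differentiation described above can be sketched as follows. The sketch assumes that the matrix is the sum over families of outer products of forward-difference gradients of the family-wise log profile likelihood contributions; this form is an assumption made for illustration, since the displayed formula is not reproduced here. The callables `pl_i` are hypothetical stand-ins: in the article, each profile likelihood evaluation requires running the EM algorithm with the Euclidean parameters held fixed.

```python
# Illustrative sketch only. Assumptions (not from the article): forward
# differences with step h, and an outer-product form summed over families.

def profile_information(family_pl, theta_hat, h):
    """Sum over families of outer products of forward-difference gradients.

    family_pl: list of callables, each mapping a parameter list to that
               family's log profile likelihood contribution (hypothetical).
    theta_hat: list of estimated Euclidean parameters.
    h:         perturbation constant for numerical differentiation.
    """
    d = len(theta_hat)
    info = [[0.0] * d for _ in range(d)]
    for pl_i in family_pl:
        base = pl_i(theta_hat)
        grad = []
        for j in range(d):
            shifted = list(theta_hat)
            shifted[j] += h  # perturb along the j-th canonical vector
            grad.append((pl_i(shifted) - base) / h)
        for a in range(d):
            for b in range(d):
                info[a][b] += grad[a] * grad[b]
    return info


# Toy check with a one-parameter normal-mean log-likelihood per "family".
ys = [-1.0, 0.0, 1.0]
toy_pl = [lambda th, y=y: -0.5 * (y - th[0]) ** 2 for y in ys]
info = profile_information(toy_pl, theta_hat=[0.0], h=1e-4)
variance_estimate = 1.0 / info[0][0]
```

In the toy example each gradient is approximately `y - theta`, so the matrix reduces to the sum of squared residuals and its inverse gives the variance estimate.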

4. Numerical studies

4.1. Simulations

To examine the performance of the proposed methods, we conducted simulation studies that mimicked the data collection procedure in WHICAP. We considered nuclear families only, where the proband genotype was simulated from Inline graphic with equal probabilities and the genotypes for parents and other children were generated following the Mendelian law with the population frequency of genetic mutation Inline graphic. We simulated independent baseline covariates Inline graphic from Bernoulli(0.5) and Inline graphic from Inline graphic. We set Inline graphic, Inline graphic, Inline graphic, Inline graphic, Inline graphic, and Inline graphic. The simulated data set includes nuclear families with different structures. In particular, we generated families with (i) one proband, (ii) one proband and two parents, and (iii) one proband, two parents, and two siblings, with probability 0.5, 0.2, and 0.3, respectively. We considered numbers of families Inline graphic and Inline graphic with respective average sample sizes 260 and 520.

We generated the censoring mechanism of event times similar to that in WHICAP. In particular, the age at the recruitment was generated from Uniform(3, 6) and Uniform(0, 3) for parents and children (proband and proband’s siblings), respectively. The proband is followed for the occurrence of the primary disease till a censoring time generated from Uniform(7, 10). The status of the secondary disease for the probands was determined at a sequence of examinations, with gap times generated from Inline graphic, till censored. Relatives’ primary and secondary disease statuses were only measured once at the age of the recruitment as in WHICAP.
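The family structures and recruitment ages of this design can be sketched as below. The structure probabilities and uniform ranges are those stated above; the data layout, function names, and seed are hypothetical, and the sketch does not generate event times, genotypes, or examination schedules.

```python
import random

# Illustrative sketch only: the dictionary layout and function names are
# hypothetical; probabilities and uniform ranges follow the simulation design.

FAMILY_STRUCTURES = [
    (["proband"], 0.5),
    (["proband", "parent", "parent"], 0.2),
    (["proband", "parent", "parent", "sibling", "sibling"], 0.3),
]


def expected_family_size():
    """Average number of members per family implied by the structure probabilities."""
    return sum(len(roles) * prob for roles, prob in FAMILY_STRUCTURES)


def simulate_family(rng):
    """Draw one family structure with recruitment ages and proband censoring time."""
    structures = [roles for roles, _ in FAMILY_STRUCTURES]
    weights = [prob for _, prob in FAMILY_STRUCTURES]
    roles = rng.choices(structures, weights=weights)[0]
    family = []
    for role in roles:
        # Parents are recruited at Uniform(3, 6); probands and siblings at Uniform(0, 3).
        low, high = (3, 6) if role == "parent" else (0, 3)
        member = {"role": role, "age_at_recruitment": rng.uniform(low, high)}
        if role == "proband":
            # Only probands are followed, until a Uniform(7, 10) censoring time.
            member["censoring_time"] = rng.uniform(7, 10)
        family.append(member)
    return family


rng = random.Random(2021)
families = [simulate_family(rng) for _ in range(100)]
average_sample_size = 100 * expected_family_size()
```

The expected family size is 0.5 × 1 + 0.2 × 3 + 0.3 × 5 = 2.6, which reproduces the stated average sample sizes of 260 and 520 for 100 and 200 families.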

We considered 2000 replicates for each sample size. All replicates converged with Inline graphic convergence criterion and the average computation times for one replicate (including variance estimation) were 11 and 100 hours for numbers of families Inline graphic and 200, respectively. Table 1 shows the simulation results. The standard errors were estimated with Inline graphic, while the results are not sensitive to different choices of Inline graphic (e.g., Inline graphic). The biases for all parameter estimators are small and decrease as Inline graphic increases. The variance estimators for Inline graphic and Inline graphic are accurate, especially for large Inline graphic. The variance estimators for Inline graphic, Inline graphic, Inline graphic, and Inline graphic tend to overestimate the actual variabilities and this phenomenon has been reported in the literature (Zeng and others, 2017; Gao and others, 2019) in inference for variance components, since variance estimates based on the curvature of the profile likelihood function may not be accurate when the estimate is close to zero. The 95% confidence intervals for all parameters have reasonable coverage probabilities. Figure 1 shows the results on estimation of the baseline cumulative distribution functions. Since the last jump times for most of the replicates are greater than 9 (76% and 59% of the replicates for the primary and secondary events with Inline graphic), the estimators were plotted till time Inline graphic. The proposed estimators are virtually unbiased.

Table 1.

Summary statistics for the simulation studies. Bias, SE, SEE, and CP stand for median empirical bias, empirical standard error, median standard error estimator, and empirical coverage percentage of the 95% confidence interval, respectively. For Inline graphic and Inline graphic, the confidence interval is based on the log transformation. Each entry is based on 2000 replicates.

    Inline graphic Inline graphic
  True value Bias SE SEE CP Bias SE SEE CP
Inline graphic 0.5 0.012 0.210 0.213 0.96 0.014 0.149 0.141 0.94
Inline graphic -0.5 -0.023 0.236 0.241 0.96 -0.016 0.169 0.159 0.94
Inline graphic 0.5 0.021 0.139 0.149 0.97 0.019 0.100 0.098 0.96
Inline graphic 0.2 0.012 0.196 0.187 0.94 0.021 0.129 0.121 0.94
Inline graphic 0.4 0.039 0.245 0.237 0.95 0.018 0.164 0.153 0.94
Inline graphic -0.8 -0.068 0.171 0.180 0.96 -0.033 0.117 0.116 0.94
Inline graphic 1 -0.031 0.849 1.099 0.98 -0.025 0.628 0.667 0.96
Inline graphic 1 0.034 0.758 1.098 0.96 0.000 0.490 0.630 0.94
Inline graphic 0.25 -0.006 0.287 0.383 0.95 0.001 0.196 0.248 0.94
Inline graphic 0.5 -0.020 0.491 0.660 0.96 0.023 0.362 0.431 0.95

Fig. 1.


Simulation results on estimation of the baseline cumulative distribution functions. The solid black curve, dashed red curve, and dotted blue curve pertain, respectively, to the true value and mean estimates from the proposed method with 100 and 200 families. Each estimate is based on 2000 replicates.

To assess predictive performance of the proposed methods, we consider three performance measures. Particularly, we evaluate discrimination by the Concordance (C) statistic (Harrell Jr, 2001), evaluate calibration by the ratio of the observed and expected number of events (O/E), and evaluate the prediction accuracy by the mean squared error of prediction (MSEP). For individual Inline graphic in a prediction data set, let Inline graphic be the predicted conditional cumulative distribution function of the primary disease at time Inline graphic. Let Inline graphic be the indicator of primary disease occurrence at time Inline graphic. The C statistic is defined by

graphic file with name Equation22.gif

where Inline graphic. The C statistic typically lies between 0.5 (no discrimination) and 1 (perfect discrimination), with a higher value corresponding to a better discriminating model. The O/E is given by Inline graphic, which should be close to 1 for a well-calibrated model. The MSEP is defined as Inline graphic, where Inline graphic is the (true) conditional cumulative distribution function at time Inline graphic for individual Inline graphic. It should be close to zero for an accurate model.
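The three measures can be sketched in Python as follows. Because the displayed formulas are not recoverable from this text, the sketch relies on common definitions and these are assumptions: the C statistic is taken over all (case, non-case) pairs with ties counted as one half, the O/E is the number of observed events divided by the sum of predicted probabilities, and the MSEP is the average squared difference between predicted and true probabilities. All function names and the toy inputs are hypothetical.

```python
def c_statistic(pred, event):
    """Share of (case, non-case) pairs in which the case has the higher predicted risk."""
    concordant = 0.0
    pairs = 0
    for p_i, d_i in zip(pred, event):
        if d_i != 1:
            continue
        for p_j, d_j in zip(pred, event):
            if d_j != 0:
                continue
            pairs += 1
            if p_i > p_j:
                concordant += 1.0
            elif p_i == p_j:
                concordant += 0.5  # assumption: tied predictions count one half
    return concordant / pairs


def observed_expected(pred, event):
    """Observed number of events divided by the expected number under the model."""
    return sum(event) / sum(pred)


def msep(pred, true_prob):
    """Mean squared difference between predicted and true event probabilities."""
    return sum((p - t) ** 2 for p, t in zip(pred, true_prob)) / len(pred)


# Toy inputs (synthetic): predicted risks, event indicators, true risks.
pred = [0.10, 0.40, 0.35, 0.80]
event = [0, 0, 1, 1]
true_prob = [0.15, 0.30, 0.40, 0.75]

print(c_statistic(pred, event), observed_expected(pred, event), msep(pred, true_prob))
```

The MSEP requires the true conditional distribution function, so it can only be evaluated in simulations, whereas the C statistic and O/E can also be computed on real data.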

We assess the predictive performance of the proposed methods and compare it with that of the univariate approach, in which family data on the primary event are modeled using model (2.1). For each replicate, an independent data set with Inline graphic was generated in the same way for evaluating prediction performance. Specifically, in the evaluation data set, proband disease history up to time Inline graphic was observed along with all family disease history and was used to predict the primary disease occurrence at time Inline graphic. Table 2 shows the performance in discrimination, calibration, and accuracy based on simulations with Inline graphic. The proposed approach, which uses family data on both events, has better discrimination and accuracy (mean C statistic increased by 0.019 and mean MSEP reduced by 10.2%), and its calibration is slightly better than that of the univariate approach (mean O/E 0.004 closer to 1).

Table 2.

Prediction performance in the simulation studies with Inline graphic. Mean and SE stand for empirical mean and empirical standard error, respectively. Each entry is based on 2000 replicates.

Prediction measure   Proposed approach    Univariate approach
                     Mean     SE          Mean     SE
C statistic          0.647    0.113       0.628    0.115
O/E                  1.101    0.537       1.105    0.544
MSEP                 0.024    0.010       0.027    0.011

To assess the robustness of the proposed methods, we applied the proposed approach to data sets that were simulated from a misspecified model. Specifically, the random effects Inline graphic and Inline graphic’s were correlated with a correlation Inline graphic. Table 3 shows the simulation results based on 2000 replicates. Even though the variance components, especially Inline graphic, are estimated with slight bias, the estimators for the regression coefficients Inline graphic and Inline graphic are close to the true values. Another set of simulations from a misspecified model (Gamma distributed Inline graphic) was also conducted, and the results are given in Table S.1 of the Supplementary material available at Biostatistics online. The performance of the proposed approach is robust in the examined model-misspecification settings.

Table 3.

Summary statistics for the simulation studies with misspecified model. Bias, SE, SEE, and CP stand for median empirical bias, empirical standard error, median standard error estimator, and empirical coverage percentage of the 95% confidence interval, respectively. For Inline graphic and Inline graphic, the confidence interval is based on the log transformation. Each entry is based on 2000 replicates.

Parameter       True value   Inline graphic                     Inline graphic
                             Bias     SE      SEE     CP        Bias     SE      SEE     CP
Inline graphic   0.5          0.014   0.222   0.216   0.95       0.002   0.148   0.143   0.95
Inline graphic  -0.5         -0.019   0.241   0.245   0.96      -0.015   0.172   0.162   0.94
Inline graphic   0.5          0.020   0.138   0.151   0.97       0.010   0.100   0.099   0.95
Inline graphic   0.2          0.023   0.206   0.190   0.93       0.001   0.138   0.123   0.92
Inline graphic   0.4          0.025   0.257   0.240   0.94       0.007   0.167   0.155   0.93
Inline graphic  -0.8         -0.061   0.178   0.181   0.95      -0.022   0.110   0.114   0.95
Inline graphic   1            0.019   0.714   0.811   0.98       0.045   0.452   0.504   0.97
Inline graphic   1            0.035   0.662   1.066   0.96      -0.038   0.421   0.642   0.92
Inline graphic   0.25         0.096   0.244   0.338   0.90       0.095   0.177   0.224   0.89
Inline graphic   0.5          0.011   0.284   0.474   0.95       0.049   0.217   0.307   0.95

4.2. Analysis of WHICAP study

We consider the joint modeling of AD and CVD with family data in the WHICAP study to infer risk for AD. Specifically, CVD is defined as a composite endpoint of heart failure, myocardial infarction, stroke, and other heart diseases, as suggested in He and others (2017). The exact event time for incident cases and left- or right-censoring time on AD onset and interval-censoring time on CVD onset were obtained for probands. For relatives, the history of both AD and CVD was only assessed once at the study baseline so that most relatives contribute current status data. None of the relatives was genotyped. Other baseline covariates, such as gender, race, and years of education, were available for all subjects. The main research interests are to estimate genetic risk at the APOE-ε4 allele, to assess familial aggregation patterns of multiple diseases (AD and CVD) in the presence of unobserved environmental risk factors and genetic risk at unobserved loci, and to predict AD risk in the presence and absence of family history of CVD.

We use the accelerated algorithm in Section S.2 of the Supplementary material available at Biostatistics online to analyze nuclear families from WHICAP. We included parents and siblings and excluded children of the probands for ease of computation; the information loss is minimal because the event rate of late-onset AD is low among children of probands. We excluded subjects with missing covariates to obtain a data set with 5259 probands and 3206 family members. There were 4443 families with proband only, 615 families with 1–5 relatives, and 201 families with 6–15 relatives. The probands have baseline ages ranging from 60 to 103, with a median baseline age of 74. The family members have baseline ages ranging from 16 to 119, with a median baseline age of 76. The onset time of AD was observed exactly, left-censored, and right-censored for 675, 586, and 7204 subjects, respectively. The onset time of CVD was left-censored, interval-censored, and right-censored for 1989, 690, and 5786 subjects, respectively. Probands or relatives who died before developing AD or CVD were treated as right-censored.

We jointly modeled AD and CVD using the proposed approaches, where the population APOE ε4 frequency Inline graphic was set to 0.15, which is the average APOE ε4 allele frequency in healthy individuals (Tang and others, 1998). The left panel of Table 4 shows the estimation results, where the reference levels for gender and race are female and Hispanic, respectively. The tests for the variance components Inline graphic and Inline graphic were based on a mixture of chi-square distributions. The variance components Inline graphic and Inline graphic are significantly greater than zero, indicating strong effects of shared environmental exposure and genetic factors for AD among family members. The parameter Inline graphic is not significant, indicating that the underlying environmental risk factors for AD may not be significantly associated with CVD. The parameter Inline graphic is estimated as Inline graphic and is highly significant, reflecting the presence of common latent genetic risk factors that affect both AD and CVD.
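To make the boundary test concrete, the following Python sketch computes a p-value for a single variance component under a mixture of chi-square distributions. This section does not state the mixing weights or the test statistic, so the sketch assumes the common 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom, applied to the squared ratio of the estimate to its standard error. The function names are hypothetical; the inputs are the estimates and standard errors of the two AD variance-component rows in the joint-model panel of Table 4.

```python
import math


def chi2_1_sf(x):
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


def mixture_pvalue(statistic):
    """p-value under an assumed 50:50 mixture of chi-square(0) and chi-square(1)."""
    if statistic <= 0:
        return 1.0  # the point mass at zero covers the whole null mass
    return 0.5 * chi2_1_sf(statistic)


def boundary_wald_pvalue(estimate, std_err):
    """Mixture p-value for H0: variance component = 0, from a Wald-type statistic."""
    wald = (max(estimate, 0.0) / std_err) ** 2
    return mixture_pvalue(wald)


# Estimates and standard errors of the two AD variance-component rows in Table 4.
p_first = boundary_wald_pvalue(0.415, 0.171)
p_second = boundary_wald_pvalue(0.458, 0.215)
print(round(p_first, 4), round(p_second, 4))
```

Under these assumptions, the two p-values are close to those reported in Table 4 for the same rows, although the rounding of the published estimates prevents an exact match.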

Table 4.

Results on regression analysis in WHICAP.

Outcome  Covariate        Joint model                       Univariate model
                          Est.     Std. Err.  p-value       Est.     Std. Err.  p-value
AD       APOE ε4           0.576   0.072      <0.0001        0.571   0.075      <0.0001
         Gender_Male      -0.203   0.073      0.005         -0.230   0.074      0.002
         Race_White       -0.538   0.119      <0.0001       -0.563   0.122      <0.0001
         Race_Black       -0.023   0.087      0.794         -0.048   0.088      0.589
         Education        -0.096   0.009      <0.0001       -0.096   0.009      <0.0001
Inline graphic             0.415   0.171      0.008          0.500   0.174      0.002
Inline graphic             0.458   0.215      0.016          0.388   0.294      0.093
CVD      APOE ε4           0.098   0.055      0.078
         Gender_Male       0.282   0.050      <0.0001
         Race_White        0.237   0.072      0.001
         Race_Black        0.151   0.065      0.020
         Education         0.000   0.006      0.997
Inline graphic            -0.046   0.272      0.864
Inline graphic             1.385   0.371      0.0002

APOE ε4 was found to be significantly associated with an increased risk of AD, which confirms the findings in the major AD literature (Tang and others, 1998; Lindsay and others, 2002), although the estimated APOE ε4 odds ratios are slightly smaller than those in the literature (Farrer and others, 1997). Some of the reported studies did not use a probability sampling design as in WHICAP, which may have contributed to an overestimation of the APOE ε4 risk. Male gender and higher level of education are associated with decreased risk of AD, which has also been reported in the literature (Cobb and others, 1995; Launer and others, 1999). As expected, APOE ε4 is not significantly associated with the risk of CVD, while male gender is associated with a higher risk of CVD, which is consistent with the findings in the literature (Winkleby and others, 1992).
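The coefficient estimates in Table 4 can be converted to effect-size ratios with a short calculation, sketched below in Python. The sketch assumes a Wald-type 95% confidence interval and a two-sided Wald test; under the proportional hazards structure noted in Section 5, the exponentiated coefficient is read here as a ratio conditional on the random effects. The function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(estimate, std_err, z=1.959963984540054):
    """Exponentiated coefficient, its 95% Wald interval, and a two-sided Wald p-value."""
    ratio = math.exp(estimate)
    lower = math.exp(estimate - z * std_err)
    upper = math.exp(estimate + z * std_err)
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(estimate / std_err) / math.sqrt(2.0))
    return ratio, lower, upper, p_two_sided


# Joint-model estimates for AD from Table 4.
apoe_ad = wald_summary(0.576, 0.072)    # APOE ε4
male_ad = wald_summary(-0.203, 0.073)   # Gender_Male
print(apoe_ad)
print(male_ad)
```

For APOE ε4, this gives a ratio of about 1.78 with an interval of roughly 1.54 to 2.05, and the two-sided p-value for male gender is close to the 0.005 reported in Table 4.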

For comparison, we conducted a univariate analysis with family history data on AD where the cumulative hazard function of AD follows model (2.1). The right panel of Table 4 shows the results. The univariate model for AD gives the same set of significant risk factors with similar effect sizes and slightly larger standard errors. Figure 2 shows the estimated cumulative incidence functions of AD and CVD for a Hispanic female APOE ε4 carrier with 10 years of education and a similar subject who is a noncarrier. The APOE ε4 carrier has a higher estimated risk of AD than the noncarrier, while the risks of CVD are similar. The estimated risks of AD from the univariate model and the proposed model are similar.

Fig. 2.

Estimation of the cumulative incidence functions of AD and CVD in WHICAP. The solid and dashed curves pertain, respectively, to the estimates for the Hispanic female carrier and noncarrier with 10 years of education. The black and red curves pertain to the proposed joint model and the univariate model, respectively.

To demonstrate the advantage of joint modeling, Figure 3 shows the predicted occurrence of AD given different event histories of AD and CVD collected on a proband and his or her family members. In particular, we considered a proband who is a Hispanic female APOE ε4 carrier with 10 years of education and has not yet developed AD by age 65. The presence of CVD by 65 in the proband increases her risk of AD (black solid curve versus red solid curve). Additionally, we considered the event history of CVD and AD by age 65 in the proband’s mother. With the same CVD history in the proband, a positive family history of both AD and CVD in the mother substantially increases the proband’s risk of AD (dashed curves versus solid and dotted curves). In contrast, a negative family history of both AD and CVD in the mother does not substantially affect the proband’s risk of AD compared with the case with no family history (dotted curves versus solid curves). In addition, we estimated a subject’s conditional baseline cumulative hazard functions of CVD given different histories of AD, as shown in Figure S.1 of the Supplementary material available at Biostatistics online. The presence of AD by 65 increases the subject’s conditional baseline cumulative hazard function of CVD.

Fig. 3.

Estimation of the conditional cumulative distribution function of AD for a proband who is a Hispanic female APOE ε4 carrier with 10 years of education and has not yet developed AD by age 65, given different event histories of herself or her mother with the same covariate values. The black and red curves pertain, respectively, to the scenarios where the proband has and has not developed CVD at 65. The solid, dashed, and dotted curves pertain, respectively, to the scenarios where there is no family history information on the mother, the mother has developed both AD and CVD before 65, and the mother has developed neither AD nor CVD before 65.

To evaluate the performance of the prediction, we randomly divided the study cohort into training and testing sets with equal numbers of families. We analyzed the training set to obtain parameter estimates, based on which we predicted the cumulative incidence of AD at age 80 for probands in the testing set, given their CVD history and the disease history of family members. We calculated the C statistic and O/E among probands with known AD status at age 80 and compared them with those from the univariate model. Based on 100 randomly divided training/testing sets, the 95% range of the C statistic based on the proposed approach is Inline graphic, while that based on the univariate approach is Inline graphic. The value of the C statistic based on the proposed approach is higher than that from the univariate approach in all examined random divisions. The 95% range of the O/E based on the proposed approach is Inline graphic, while that based on the univariate approach is Inline graphic. The value of the O/E based on the proposed approach is closer to one than that from the univariate approach in 97% of all examined random divisions.
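The splitting and summarizing steps of this evaluation can be sketched in Python as follows. The sketch assumes that the split is made at the family level, so that all members of a family fall in the same set, and that the "95% range" is the interval between the empirical 2.5% and 97.5% quantiles across the random divisions (computed here with a nearest-rank rule). Both helper functions and the toy identifiers are hypothetical, and the model fitting and prediction steps are omitted.

```python
import math
import random


def split_families(family_ids, rng):
    """Randomly divide the unique families into training and testing halves."""
    families = sorted(set(family_ids))
    rng.shuffle(families)
    half = len(families) // 2
    return set(families[:half]), set(families[half:])


def percentile_range(values, lower=0.025, upper=0.975):
    """Empirical range between two quantiles (nearest-rank rule, an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[max(0, math.ceil(lower * n) - 1)]
    hi = ordered[min(n - 1, math.ceil(upper * n) - 1)]
    return lo, hi


# Toy subject-level family labels (synthetic): 10 families of varying size.
family_ids = [fam for fam in range(10) for _ in range(1 + fam % 3)]
train, test = split_families(family_ids, random.Random(0))
print(sorted(train), sorted(test))

# Toy performance values standing in for 100 per-split C statistics.
print(percentile_range(list(range(1, 101))))
```

Splitting by family rather than by subject keeps relatives who share random effects out of both sets at once, which avoids leaking family-level information from training to testing.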

To conclude, the first insight we draw from our joint multilevel model is that comorbidity of AD and CVD is mostly attributed to shared genetic risk factors instead of shared environmental factors. Our second insight is that developing CVD by 65 and a family history of AD and CVD both increase the risk of AD. Since quantitative estimates of AD risk can be useful for guiding preventive behavioral strategies, we also developed an R Shiny interactive web application for the prediction of AD risk given patient characteristics based on our joint modeling results. With information on patient CVD history, APOE genotype, demographics, and disease history in their relatives, the application predicts the cumulative incidence of AD. When some information is not available (e.g., unknown APOE genotype in family members), the application provides an average risk (e.g., average AD risk over APOE genotype groups). Figure 4 shows an example of the predicted cumulative AD incidence risk for a subject, where the demographics, CVD history, and information on relatives can be filled in interactively in the web application.

Fig. 4.

An illustration of the interactive web application for AD risk prediction.

5. Concluding remarks

A research goal of precision medicine is to disentangle shared genetic risk among multiple events from shared environmental risks. This information is crucial for gene mapping of comorbid diseases and for constructing shared genetic mechanisms for multiple diseases. In this article, we provide a multilevel semiparametric regression model to quantify genetic and environmental risks of disease comorbidities using family data. We distinguish unobserved shared environmental risk from unobserved genetic hereditary risks, handle multiple events and mixed types of censoring schemes, and predict genetic risk of AD given history of CVD in the subject and family history of events in relatives. Our results provide more precise AD risk estimates given not only a subject’s own risk factors and health history but also their family members’ health history. Furthermore, we provide an interactive risk calculator to visualize the risk estimates.

Our method can be scaled up to accommodate future larger samples. The computation for joint modeling of mixed-type and interval-censored data is very challenging since it involves large-scale optimization for jump sizes of the baseline hazard functions and numerical integration with dimension increasing with family sizes. By artificially introducing Poisson and Bernoulli random variables for nuclear families (details given in Section S.2 of the Supplementary material available at Biostatistics online), we proposed an EM algorithm with at most 3D integration in the E-step and jump sizes updated explicitly in the M-step. Our algorithm substantially reduces the dimension of numeric integration for families with a large size (e.g., up to 15 relatives in real data; reduces 16-dimensional integration to 3D). Stochastic algorithms can be considered when the sample size is extremely large.
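A rough count shows why the reduction in dimension matters. The Python sketch below assumes a tensor-product quadrature rule with a fixed number of nodes per dimension; this rule and the choice of 10 nodes are illustrative assumptions, since the integration rule is not stated in this section.

```python
def tensor_grid_points(nodes_per_dim, dim):
    """Number of integrand evaluations for a tensor-product quadrature grid."""
    return nodes_per_dim ** dim


nodes = 10  # hypothetical number of quadrature nodes per dimension
full = tensor_grid_points(nodes, 16)    # proband plus 15 relatives
reduced = tensor_grid_points(nodes, 3)  # after the data augmentation
print(full, reduced, full // reduced)
```

With these illustrative settings, each E-step evaluation for the largest families drops from 10^16 grid points to 1000, a factor of 10^13.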

Our analysis of WHICAP data reveals that comorbidity of AD and CVD is mainly attributable to latent genetic factors. These results suggest that future studies of the mechanisms of comorbidity of AD and CVD should focus on exploring genetic risks. Identifying common risk factors for CVD and AD may offer opportunities to develop interventions to reduce AD risk through a shared mechanism. In addition, the predicted genetic risks given event history and family history of secondary outcomes are useful for genetic counseling, patient care tailoring, and family planning in the era of precision medicine. For example, Figures 2 and 3 can be presented in a genetic counseling session to help subjects understand their age-specific risk of AD depending on race, presence or absence of the APOE mutation, and history of CVD in family members.

Our methods were developed based on the assumption that probands are sampled randomly and prospectively from the population, as in our motivating study WHICAP. In some other studies, probands may be sampled retrospectively based on disease status (e.g., case–cohort and nested case–control designs). The proposed methods can be extended to account for different sampling schemes by incorporating the ascertainment probabilities in the likelihood function. The estimation procedure needs to be modified accordingly.

In a follow-up study focused on the primary disease, the secondary disease may be censored by the occurrence of the primary disease such that no examination for the secondary disease is taken after the primary disease’s occurrence. Our proposed methods, which use the joint likelihood of the two event times, automatically handle such semicompeting risks data and provide estimates of the genetic and environmental contributions to disease dependence. In some cases, both the primary and secondary events may be censored by the occurrence of a third event. For example, in the WHICAP study, both AD and CVD are censored by the occurrence of death. The current strategy of treating the death time as a censoring time assumes conditional independence of the two event times and the death time given covariates. The proposed methods need to be extended if the conditional independence assumption fails.

Since family history data are usually self-reported and collected retrospectively, they may be measured with error. The accuracy of self-reported family history may vary by degree of relatives and type of disease (Braun and others, 2018). Incorrectly determined age at onset may induce bias, so extending the proposed methods to handle mismeasured age of disease onset is of great interest. Here, we focus on two events, but the method can be easily extended to handle more events.

A number of assumptions were employed in our joint modeling. One assumption is the proportional hazards assumption for the conditional distributions of the events given the random effects. Even though a number of testing procedures for assessing the proportional hazards assumption (Harrell and Lee, 1986; Grambsch and Therneau, 1994) have been proposed for right-censored data, those for interval-censored data, let alone those for the current setting, have never been formally established. Statistical testing methods for assessing the proportional hazards assumption with interval-censored data require further investigation.

6. Software

The code for this article and an interactive web-based application for predicting AD risk given a patient’s characteristics, including genetic risk factors, medical history of CVD, and family members’ disease history, are publicly available at https://github.com/feigao1/BiCens_Fam_Genorisk.

ACKNOWLEDGMENTS

Conflict of Interest: None declared.

Contributor Information

Fei Gao, Division of Vaccine and Infectious Disease, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.

Donglin Zeng, Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA.

Yuanjia Wang, Department of Biostatistics, Columbia University, New York, NY 10032, USA.

SUPPLEMENTARY MATERIAL

Supplementary material is available at http://biostatistics.oxfordjournals.org.

FUNDING

This work was supported by US National Institutes of Health grants (NIH NS073671 and GM124104). The Washington Heights, Hamilton Heights, Inwood Community Aging Project study was supported by AG037212. Scientific computing at the Fred Hutch is supported by ORIP (S10OD028685).

REFERENCES

  1. Aronson, S. J. and Rehm, H. L. (2015). Building the foundation for genomics in precision medicine. Nature 526, 336–342.
  2. Braun, D., Gorfine, M., Katki, H. A., Ziogas, A. and Parmigiani, G. (2018). Nonparametric adjustment for measurement error in time-to-event data: application to risk prediction models. Journal of the American Statistical Association 113, 14–25.
  3. Chen, L., Hsu, L. and Malone, K. (2009). A frailty-model-based approach to estimating the age-dependent penetrance function of candidate genes using population-based case-control study designs: an application to data on the BRCA1 gene. Biometrics 65, 1105–1114.
  4. Cobb, J. L., Wolf, P. A., Au, R., White, R. and D’agostino, R. B. (1995). The effect of education on the incidence of dementia and Alzheimer’s disease in the Framingham Study. Neurology 45, 1707–1712.
  5. Farrer, L. A., Cupples, L. A., Haines, J. L., Hyman, B., Kukull, W. A., Mayeux, R., Myers, R. H., Pericak-Vance, M. A., Risch, N. and Van Duijn, C. M. (1997). Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease: a meta-analysis. Journal of the American Medical Association 278, 1349–1356.
  6. Gao, F., Zeng, D., Couper, D. and Lin, D. Y. (2019). Semiparametric regression analysis of multiple right-and interval-censored events. Journal of the American Statistical Association 114, 1232–1240.
  7. Gorfine, M. and Hsu, L. (2011). Frailty-based competing risks model for multivariate survival data. Biometrics 67, 415–426.
  8. Gorfine, M., Hsu, L. and Parmigiani, G. (2013). Frailty models for familial risk with application to breast cancer. Journal of the American Statistical Association 108, 1205–1215.
  9. Gorfine, M., Hsu, L., Zucker, D. M. and Parmigiani, G. (2014). Calibrated predictions for multivariate competing risks models. Lifetime Data Analysis 20, 234–251.
  10. Graber-Naidich, A., Gorfine, M., Malone, K. E. and Hsu, L. (2011). Missing genetic information in case-control family data with general semi-parametric shared frailty model. Lifetime Data Analysis 17, 175–194.
  11. Grambsch, P. M. and Therneau, T. M. (1994). Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81, 515–526.
  12. Harrell, F. E. and Lee, K. L. (1986). Verifying assumptions of the Cox proportional hazards model. In: Proceedings of the Eleventh Annual SAS Users Group International Conference. Cary, NC: SAS Institute Inc. pp. 823–828.
  13. Harrell Jr, F. E. (2001). Regression Modeling Strategies. New York: Springer.
  14. He, L., Culminskaya, I., Loika, Y., Arbeev, K. G., Bagley, O., Duan, M., Yashin, A. I. and Kulminski, A. M. (2017). Causal effects of cardiovascular risk factors on onset of major age-related diseases: a time-to-event mendelian randomization study. Experimental Gerontology 107, 74–86.
  15. Hsu, L., Gorfine, M. and Malone, K. (2007). On robustness of marginal regression coefficient estimates and hazard functions in multivariate survival analysis of family data when the frailty distribution is mis-specified. Statistics in Medicine 26, 4657–4678.
  16. Hsu, L., Gorfine, M. and Zucker, D. (2018). On estimation of the hazard function from population-based case–control studies. Journal of the American Statistical Association 113, 560–570.
  17. Khoury, M. J., Beaty, T. H. and Cohen, B. H. (1993). Fundamentals of Genetic Epidemiology. New York: Oxford University Press.
  18. Launer, L. J., Andersen, K., Dewey, M. E., Letenneur, L., Ott, A., Amaducci, L. A., Brayne, C., Copeland, J. R. M., Dartigues, J.-F., Kragh-Sorensen, P., Lobo, A., Martinez-Lage, J. M., Stijnen, T., Hofman, A. and others. (1999). Rates and risk factors for dementia and Alzheimer’s disease: results from EURODEM pooled analyses. Neurology 52, 78–84.
  19. Lindsay, J., Laurin, D., Verreault, R., Hébert, R., Helliwell, B., Hill, G. B. and McDowell, I. (2002). Risk factors for Alzheimer’s disease: a prospective analysis from the Canadian Study of Health and Aging. American Journal of Epidemiology 156, 445–453.
  20. Maestre, G., Ottman, R., Stern, Y., Gurland, B., Chun, M., Tang, M.-X., Shelanski, M., Tycko, B. and Mayeux, R. (1995). Apolipoprotein E and Alzheimer’s disease: ethnic variation in genotypic risks. Annals of Neurology 37, 254–259.
  21. Pfeiffer, R. M., Gail, M. H. and Pee, D. (2001). Inference for covariates that accounts for ascertainment and random genetic effects in family studies. Biometrika 88, 933–948.
  22. Reitz, C., Brayne, C. and Mayeux, R. (2011). Epidemiology of Alzheimer disease. Nature Reviews Neurology 7, 137–152.
  23. Stern, Y., Gu, Y., Cosentino, S., Azar, M., Lawless, S. and Tatarina, O. (2017). The Predictors study: development and baseline characteristics of the Predictors 3 cohort. Alzheimer’s & Dementia 13, 20–27.
  24. Tang, M.-X., Cross, P., Andrews, H., Jacobs, D. M., Small, S., Bell, K., Merchant, C., Lantigua, R., Costa, R., Stern, Y. and others. (2001). Incidence of AD in African-Americans, Caribbean Hispanics, and Caucasians in northern Manhattan. Neurology 56, 49–56.
  25. Tang, M.-X., Stern, Y., Marder, K., Bell, K., Gurland, B., Lantigua, R., Andrews, H., Feng, L., Tycko, B. and Mayeux, R. (1998). The APOE-ε4 allele and the risk of Alzheimer disease among African Americans, whites, and Hispanics. Journal of the American Medical Association 279, 751–755.
  26. Wacholder, S., Hartge, P., Struewing, J. P., Pee, D., McAdams, M., Brody, L. and Tucker, M. (1998). The kin-cohort study for estimating penetrance. American Journal of Epidemiology 148, 623–630.
  27. Wang, L., McMahan, C. S., Hudgens, M. G. and Qureshi, Z. P. (2016). A flexible, computationally efficient method for fitting the proportional hazards model to interval-censored data. Biometrics 72, 222–231.
  28. Wang, Y., Liang, B., Tong, X., Marder, K., Bressman, S., Orr-Urtreger, A., Giladi, N. and Zeng, D. (2015). Efficient estimation of nonparametric genetic risk function with censored data. Biometrika 102, 515–532.
  29. Winkleby, M. A., Jatulis, D. E., Frank, E. and Fortmann, S. P. (1992). Socioeconomic status and health: how education, income, and occupation contribute to risk factors for cardiovascular disease. American Journal of Public Health 82, 816–820.
  30. Zeng, D., Gao, F. and Lin, D.-Y. (2017). Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika 104, 505–525.
  31. Zeng, D., Mao, L. and Lin, D. Y. (2016). Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103, 253–271.
