Author manuscript; available in PMC: 2025 Jun 15.
Published in final edited form as: Stat Med. 2024 Apr 18;43(13):2560–2574. doi: 10.1002/sim.10081

Variance-components Tests for Genetic Association with Multiple Interval-censored Outcomes

Jaihee Choi 1,*, Zhichao Xu 2, Ryan Sun 2
PMCID: PMC11116038  NIHMSID: NIHMS1984631  PMID: 38636557

Abstract

Massive genetic compendiums such as the UK Biobank have become an invaluable resource for identifying genetic variants that are associated with complex diseases. Because of the difficulties of collecting data at this scale, these compendiums commonly record interval-censored outcomes. One challenge in analyzing such data is the lack of methodology available for genetic association studies with interval-censored data. Genetic effects are difficult to detect because they are often rare and weak, so time-to-event outcomes are frequently transformed into binary phenotypes to gain access to more powerful signal detection approaches. However, transforming the data to binary outcomes can result in loss of valuable information. To alleviate such challenges, this work develops methodology to associate genetic variant sets with multiple interval-censored outcomes. Testing sets of variants such as genes or pathways is a common approach in genetic association settings to lower the multiple testing burden, aggregate small effects, and improve the interpretability of results. Instead of performing inference with only a single outcome, utilizing multiple outcomes can increase statistical power by aggregating information across multiple correlated phenotypes. Simulations show that the proposed strategy can offer substantial power gains over a single outcome approach. We apply the proposed test to the investigation that motivated this study, a search for genes that perturb the risks of bone fractures and falls in the UK Biobank.

Keywords: Time-to-event, Genome-wide Association Studies, Set-based Inference, Interval-censored, Multiple Outcomes

1 |. INTRODUCTION

In recent years, many organizations have initiated enormous efforts to create massive genetic datasets. These huge compendiums, which include projects such as the UK Biobank (UKB),1 Million Veteran Program,2 and All of Us,3 contain expansive biomedical data collected over long periods of time. For example, the UKB is a large prospective cohort study that has been collecting various health-related outcomes and genetic data on approximately 500 000 people for over 15 years.1,4 It is of great interest to use these resources to further explore the genetic etiologies of complex diseases.

The challenges of collecting large-scale biomedical data have caused much of the currently available time-to-event data to be released in interval-censored form. Specifically, in the UK Biobank, participants complete extensive questionnaires at enrollment and again when they return to assessment centers for follow-up visits.5 Thus, their failure times for a variety of outcomes are interval-censored; that is, an event of interest is known to occur only between two endpoints, but the exact failure time is unknown.6 For example, a typical questionnaire item might ask whether the subject has ever sustained a fracture. If the subject answers affirmatively, then the fracture is only known to have occurred between the last negative answer and the date of the affirmative answer. If the first answer is affirmative, then the fracture is only known to have occurred between birth and the time of the first questionnaire.

Though interval-censored data arise commonly in biomedical research, they are seen less often in genetic studies, which are usually conducted on binary and continuous outcomes. Accordingly, many methods developed for the analysis of genome-wide association study (GWAS) data are tailored to the generalized linear model (GLM) setting.7,8 In certain cases, investigators have transformed naturally interval-censored data into simpler forms so that more powerful analysis tools are available for the detection of rare and weak genetic effects.9 For example, a recent GWAS of fractures in the UKB dichotomized the fracture outcome.10 However, it is well known that transformations of interval-censored data can result in loss of efficiency and robustness.11,12

While a limited number of more sophisticated tools are slowly becoming available for GWAS data with interval-censored outcomes, most focus on only one outcome at a time. For example, the Interval-Censored Sequence Kernel Association Test13 allows testing sets of genetic variants against single interval-censored outcomes. The weighted V-statistic approach provides similar functionality and also incorporates interaction effects.14 Testing sets of variants against a single outcome is a popular approach in genetics settings, as it alleviates many of the difficulties encountered in single variant analysis.15 The multiple testing burden is vastly reduced, interpretations at the gene set or pathway level are often more useful than at the single variant level, and weak signals can be aggregated into a more detectable effect.

The main goal of this paper is to improve upon the existing interval-censored set-based approaches by developing more powerful tests for sets of genetic variants against multiple interval-censored phenotypes. Testing multiple correlated outcomes jointly can greatly increase the amount of information, and thus power, available to a set-based inference procedure. It is well known that certain genetic variants are involved in mechanisms that perturb the risks of multiple related diseases.16 Therefore, combining the information from these diseases can greatly increase detection power.

Conducting set-based inference across multiple outcomes is already a well-established procedure within the GLM framework and has yielded promising results. For instance, the Multi-SKAT method extends the single-outcome sequence kernel association test (SKAT) to jointly model multiple outcomes, and this tool has been shown to outperform single outcome tests.7 Many other similar approaches have been developed8,17 for the GLM setting and are known to perform better as the correlation between jointly modeled outcomes increases. However, it is not clear how to perform set-based testing of multiple interval-censored outcomes in their original state.

There are two main components to the development of our testing procedure. First, to model the effects of a SNP set, we use the common random effects specification for the genetic variants.18 This approach is routinely pursued in sequence kernel association type tests and allows the null hypothesis to be specified in terms of the random effect variance parameter. Second, to introduce correlation between the multiple outcomes, we specify a subject-specific random intercept for a second source of randomness in the linear predictor. The model formulation is completed by allowing fixed effects for non-genetic covariates and assuming the conventional proportional hazards specification for interval-censored event times.

A set of extensive simulation studies demonstrates that our proposed approach is able to control type I error rates at the extremely low levels necessary to demonstrate genome-wide significance. Further empirical studies show that modeling three or five outcomes at once offers much higher power than testing one outcome at a time. The procedure can be oriented to detect either homogeneous or heterogeneous effects by assuming different correlation matrices for the random genetic effects.

We apply the proposed method to the study that motivated this work: an investigation of the genes associated with risk of fractures. Fractures are known to be a highly heritable and devastating outcome, leading to tens of billions of dollars in costs each year. The genetic etiology of fractures has been analyzed extensively, including the aforementioned UKB study that dichotomized the available time-to-event data and studied only one outcome at a time. To increase the power available to detect risk genes, we combine the fractures outcome with a falls outcome that is also interval-censored. The risks of falling and of fractures are known to be correlated, and we expect that combining these outcomes will allow for more discoveries than studying fractures alone.

The remainder of the paper is organized as follows. Section 2 introduces the interval-censored set-based association framework, defines necessary notation, and develops the multiple outcome testing method, including the test statistic and inference procedure. Section 3 describes simulation studies that compare the proposed tests with other multiple outcome approaches. Section 4 applies the proposed method to fractures and falls data from the UK Biobank, and we conclude with a discussion in Section 5.

2 |. METHODS

2.1 |. Model and Notation

Consider a genetic dataset of $i = 1, 2, \ldots, n$ subjects, and suppose that for each of these individuals we are interested in $k = 1, \ldots, K$ outcomes with complete data. Suppose also that for each subject we observe $G_i = (G_{i1}, \ldots, G_{iq})^T$, a q-dimensional vector of SNPs that belong to a biologically relevant SNP set, and $X_i = (X_{i1}, \ldots, X_{ip})^T$, a p-dimensional vector of additional non-genetic covariates such as sex or other demographic factors. Let W be a pre-determined $q \times q$ diagonal weight matrix that can be used to upweight or downweight certain variants. For instance, the diagonal entries of W are sometimes taken as the density of a Beta random variable evaluated at each variant's minor allele frequency.

Denote by $T_{ik}$ the actual event time of interest for event type k in individual i. Further assume that, for each event k, individual i is monitored only at the set of $M_{ik}$ observation times $s_{ik1}, \ldots, s_{ikM_{ik}}$. Due to this monitoring scheme, event k is known only to fall within the interval $(L_{ik}, R_{ik}]$, where $L_{ik}$ is the left endpoint and $R_{ik}$ is the right endpoint of the interval. If $L_{ik} = 0$, the event is left-censored and occurred before the first observation time. If $R_{ik} = \infty$, the event has not occurred by the last observation time, so it is right-censored. We assume throughout this work that the censoring mechanism is independent of the event time of interest; that is, we have independent or non-informative interval censoring.6 We also assume there are no competing risks, i.e., that there are no other correlated events whose occurrence would preclude the observation of our primary outcome.

A general semi-parametric model to link the SNPs and non-genetic covariates with the failure time is

$$g\{S_{ik}(t)\} = h_k(t) + X_i^T\beta_k + G_i^T W\gamma_k + b_i. \tag{1}$$

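Here $S_{ik}(t)$ is the survival function of $T_{ik}$ conditional on $G_i$, $X_i$, and $b_i$; $h_k(\cdot)$ is an unspecified strictly increasing function; and $g(\cdot)$ is a known continuous and decreasing function. The non-genetic effects $\beta_k = (\beta_{1k}, \beta_{2k}, \ldots, \beta_{pk})^T$ are specified as fixed effects, and the SNP effects $\gamma_k = (\gamma_{1k}, \gamma_{2k}, \ldots, \gamma_{qk})^T$ are assumed to be random with mean 0. Stacking them as $\gamma = (\gamma_1^T, \ldots, \gamma_K^T)^T$, we assume the covariance structure $\mathrm{Var}(\gamma) = \tau\Sigma$ but do not specify the full distribution of γ.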

The subject-specific random effect is assumed to follow a mean-zero Gaussian distribution, $b_i \sim N(0, \sigma^2)$. This frailty term accounts for unobserved heterogeneity and introduces correlation between related phenotypes, such as falls and fractures. Note that the phenotype correlation is a separate concept from Σ, which describes the heterogeneity of the SNP effects. When Σ is close to the identity matrix, the SNP effects are nearly independent and therefore heterogeneous; that is, they are likely to point in different directions and to have different magnitudes. The genotype correlation matrix, which is the $q \times q$ correlation matrix of $G_i$, is a third separate concept and is not considered in our framework.

The link function can take various forms, but the most common is $g(x) = \log\{-\log(x)\}$, which corresponds to the standard Cox proportional hazards model.19,20 For simplicity, we focus on this proportional hazards model, although much of the development applies to other link functions as well. In addition to the proportional hazards assumption, we also assume that covariates in the model have time-invariant effects. Note that in the proportional hazards model, $h_k(\cdot)$ is the log of the baseline cumulative hazard. For the sake of speed and model parsimony, we suggest approximating this function using a cubic spline with one interior knot,

$$h_k(t) = a_{0k} + a_{1k}\log(t) + a_{2k}\, v\{\log(t)\}, \tag{2}$$

where

$$v\{\log(t)\} = \{\log(t) - c\}_+^3 - \{\log(t) - c_{\min}\}_+^3\,\frac{c_{\max} - c}{c_{\max} - c_{\min}} - \{\log(t) - c_{\max}\}_+^3\,\frac{c - c_{\min}}{c_{\max} - c_{\min}}$$

and $(x)_+ = \max(x, 0)$. Here $c_{\min}$ and $c_{\max}$ are the minimum and maximum values across all observed intervals, and c is the median observed value. Another option that may enhance the accuracy of the model is to select the optimal number of interior knots using the Akaike information criterion (AIC) or other criteria. Model selection, as well as the challenges that may arise when modeling interval-censored data, is discussed further in Appendix A.
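As an illustration only, the spline approximation of the log baseline cumulative hazard can be written in a few lines of Python. This is a minimal sketch, not the authors' software: the function names are hypothetical, the knots are assumed to be supplied on the log-time scale, and the helper `knots_from_endpoints` assumes that left endpoints equal to 0 and infinite right endpoints are dropped before taking logarithms.

```python
import math
import statistics


def pos_cube(u):
    """(u)_+^3, the truncated cubic used by the spline basis."""
    return max(u, 0.0) ** 3


def v_basis(x, c_min, c, c_max):
    """Basis term v(x) with interior knot c and boundary knots c_min, c_max.

    Assumption of this sketch: x and all knots are on the log-time scale.
    """
    lam = (c_max - c) / (c_max - c_min)
    return pos_cube(x - c) - lam * pos_cube(x - c_min) - (1.0 - lam) * pos_cube(x - c_max)


def log_cum_baseline_hazard(t, a, knots):
    """h_k(t) = a0 + a1 * log(t) + a2 * v{log(t)} for a = (a0, a1, a2)."""
    x = math.log(t)
    return a[0] + a[1] * x + a[2] * v_basis(x, *knots)


def knots_from_endpoints(endpoints):
    """Hypothetical helper: min, median, and max of the finite, positive log endpoints."""
    logs = [math.log(e) for e in endpoints if 0.0 < e < math.inf]
    return min(logs), statistics.median(logs), max(logs)
```

Because the cubic and quadratic terms cancel beyond $c_{\max}$, v is linear there (and zero below $c_{\min}$), the usual restricted cubic spline behavior. For example, with knots (0, 1, 2), `v_basis` returns -6.0, -9.0, and -12.0 at x = 3, 4, and 5.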

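Denote the spline parameters by $\alpha_k = (a_{0k}, a_{1k}, a_{2k})^T$ and further define $\mathbf{L}_{ik} = (1, \log L_{ik}, v\{\log L_{ik}\})^T$, $\mathbf{R}_{ik} = (1, \log R_{ik}, v\{\log R_{ik}\})^T$, $U_{ik} = (X_i^T, \mathbf{L}_{ik}^T)^T$, and $V_{ik} = (X_i^T, \mathbf{R}_{ik}^T)^T$.21 Let $b = (b_1, \ldots, b_n)^T$. Then the full set of regression parameters in the model is $\theta = (\theta_1^T, \ldots, \theta_K^T)^T$, with $\theta_k = (\beta_k^T, \alpha_k^T)^T$ as the collection of all fixed effects for outcome k. The full likelihood conditional on the random effects can be written as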

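$$\mathcal{L}(\theta \mid \gamma, b) = \prod_{i=1}^n L_i(\gamma, b_i), \qquad L_i(\gamma, b_i) = \prod_{k=1}^K \Big[\exp\{-\exp(U_{ik}^T\theta_k + G_i^T W\gamma_k + b_i)\} - \exp\{-\exp(V_{ik}^T\theta_k + G_i^T W\gamma_k + b_i)\}\Big]. \tag{3}$$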
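To make Equation (3) concrete, the following minimal Python sketch evaluates $L_i(\gamma, b_i)$ for one subject from precomputed linear predictors. The function names and the convention of passing `None` for a boundary endpoint are assumptions of this sketch: `None` in the left list means $L_{ik} = 0$ (so the survival term is 1), and `None` in the right list means $R_{ik} = \infty$ (so the survival term is 0).

```python
import math


def surv(eta):
    """Survival probability exp{-exp(eta)} implied by g(x) = log{-log(x)}."""
    return math.exp(-math.exp(eta))


def subject_likelihood(left_lp, right_lp, genetic_lp, b_i):
    """Conditional likelihood L_i(gamma, b_i) for one subject across K outcomes.

    left_lp[k]: U_ik^T theta_k, or None when L_ik = 0 (left-censored)
    right_lp[k]: V_ik^T theta_k, or None when R_ik = infinity (right-censored)
    genetic_lp[k]: G_i^T W gamma_k
    b_i: the subject's shared random intercept
    """
    lik = 1.0
    for lo, hi, g in zip(left_lp, right_lp, genetic_lp):
        s_left = 1.0 if lo is None else surv(lo + g + b_i)
        s_right = 0.0 if hi is None else surv(hi + g + b_i)
        # probability that the k-th event time falls in (L_ik, R_ik]
        lik *= s_left - s_right
    return lik
```

A production implementation would typically accumulate the logarithm of each factor instead, to avoid underflow when K is large or the intervals are narrow.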

Note that the null hypothesis of no SNP effects on any outcome is $H_0: \gamma = 0$ or, equivalently, $H_0: \tau = 0$.

2.2 |. Variance Component Test Statistic

To perform a variance component test on τ, we must first find the marginal likelihood, integrating over both γ and b.18 In general, there is no simple form for this likelihood in the interval-censored setting. One way to marginalize over the subject-specific random effects is to use Gauss-Hermite quadrature to evaluate the integral

$$\mathcal{L}(\theta, \sigma^2 \mid \gamma) = \prod_{i=1}^n \int L_i(\gamma, b_i)\, f(b_i)\, db_i. \tag{4}$$

Let $x_d$, $d = 1, \ldots, D$, be a set of D quadrature nodes and $w_d$ the corresponding weights.22,23 Then the logarithm of the integrated function, $\ell(\theta, \sigma^2 \mid \gamma) \equiv \log\mathcal{L}(\theta, \sigma^2 \mid \gamma)$, can be approximated as

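$$\ell(\theta, \sigma^2 \mid \gamma) \approx \sum_{i=1}^n l_i(\gamma), \qquad l_i(\gamma) \equiv \log\Big\{\frac{1}{\sqrt{\pi}}\sum_{d=1}^D w_d\, L_i(\gamma, \sqrt{2}\,\sigma x_d)\Big\}. \tag{5}$$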

The quadrature nodes and weights can be computed with any of several standard algorithms. The numerical study below suggests that D = 100 provides good precision. Note also that the quadrature is only one-dimensional, so the computational burden is modest.
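The Gauss-Hermite step in Equation (5) can be sketched with the Python standard library. This is an illustrative sketch, not the authors' code: the node and weight routine is the classical Newton iteration on orthonormal Hermite polynomials (as in the `gauher` routine of Numerical Recipes), and the names `gauss_hermite` and `normal_expectation` are hypothetical. `normal_expectation(f, sigma, D)` approximates $E\{f(b)\}$ for $b \sim N(0, \sigma^2)$ as $\pi^{-1/2}\sum_d w_d f(\sqrt{2}\sigma x_d)$, so $l_i(\gamma)$ would be the logarithm of `normal_expectation` applied to $b \mapsto L_i(\gamma, b)$.

```python
import math


def gauss_hermite(n, tol=3e-14, max_iter=100):
    """Nodes and weights for the integral of exp(-x^2) f(x) (physicists' convention)."""
    nodes, weights = [0.0] * n, [0.0] * n
    pim4 = math.pi ** -0.25
    z = pp = 0.0
    for i in range((n + 1) // 2):
        # Initial guess for the i-th largest root (Numerical Recipes heuristics).
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** -0.16667
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * nodes[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * nodes[1]
        else:
            z = 2.0 * z - nodes[i - 2]
        for _ in range(max_iter):
            # Recurrence for the orthonormal Hermite polynomials evaluated at z.
            p1, p2 = pim4, 0.0
            for j in range(1, n + 1):
                p3, p2 = p2, p1
                p1 = z * math.sqrt(2.0 / j) * p2 - math.sqrt((j - 1) / j) * p3
            pp = math.sqrt(2.0 * n) * p2  # derivative of p_n at z
            z_old, z = z, z - p1 / pp  # Newton step
            if abs(z - z_old) <= tol:
                break
        nodes[i], nodes[n - 1 - i] = z, -z  # roots are symmetric about 0
        weights[i] = weights[n - 1 - i] = 2.0 / (pp * pp)
    return nodes, weights


def normal_expectation(f, sigma, n=100):
    """Approximate E[f(b)] for b ~ N(0, sigma^2) with n-point Gauss-Hermite quadrature."""
    nodes, weights = gauss_hermite(n)
    total = sum(w * f(math.sqrt(2.0) * sigma * x) for x, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)
```

As a sanity check, `normal_expectation(lambda b: b * b, 0.7, n=10)` recovers $\sigma^2 = 0.49$ up to rounding, since Gauss-Hermite quadrature is exact for low-degree polynomials.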

The full specification of Equation (5) can be found in Appendix B. After integrating out the subject-specific random effect, only the genetic random effects remain. In general, the marginal likelihood

$$\mathcal{L}(\theta, \sigma^2, \tau) = \int \mathcal{L}(\theta, \sigma^2 \mid \gamma)\, f(\gamma)\, d\gamma$$

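does not have a closed form either, in particular because we leave the full distribution of γ unspecified. Instead, we can expand the integrand in a Taylor series around the point $\gamma = 0$, keeping terms up to second order. With this approach, the marginal log-likelihood $\ell(\theta, \sigma^2, \tau) \equiv \log\mathcal{L}(\theta, \sigma^2, \tau)$ can be approximated by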

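$$\ell(\theta, \sigma^2, \tau) \approx \log\int e^{\sum_{i=1}^n l_i(0)}\Big[1 + \gamma^T\frac{d}{d\gamma}\sum_{i=1}^n l_i(\gamma)\Big|_{\gamma=0} + \frac{1}{2}\gamma^T\frac{d^2}{d\gamma\,d\gamma^T}\sum_{i=1}^n l_i(\gamma)\Big|_{\gamma=0}\gamma + \frac{1}{2}\gamma^T\Big\{\frac{d}{d\gamma}\sum_{i=1}^n l_i(\gamma)\Big|_{\gamma=0}\Big\}\Big\{\frac{d}{d\gamma}\sum_{i=1}^n l_i(\gamma)\Big|_{\gamma=0}\Big\}^T\gamma\Big]f(\gamma)\,d\gamma.$$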

Taking the derivative of $\ell(\theta, \sigma^2, \tau)$ with respect to τ and evaluating at $\tau = 0$ gives

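$$\frac{d}{d\tau}\ell(\theta, \sigma^2, \tau)\Big|_{\tau=0} = \frac{1}{2}\mathrm{tr}\Big[\Big\{\frac{d}{d\gamma}\sum_{i=1}^n l_i(\gamma)\Big|_{\gamma=0}\Big(\frac{d}{d\gamma}\sum_{i=1}^n l_i(\gamma)\Big|_{\gamma=0}\Big)^T + \frac{d^2}{d\gamma\,d\gamma^T}\sum_{i=1}^n l_i(\gamma)\Big|_{\gamma=0}\Big\}\Sigma\Big].$$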
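The derivative above uses only the first two moments of γ, which is why the full distribution of γ never needs to be specified. For brevity in this sketch, write $U_\gamma$ for the gradient $\frac{d}{d\gamma}\sum_{i=1}^n l_i(\gamma)\big|_{\gamma=0}$ and $H_\gamma$ for the Hessian $\frac{d^2}{d\gamma\,d\gamma^T}\sum_{i=1}^n l_i(\gamma)\big|_{\gamma=0}$ (the symbol $H_\gamma$ is introduced here only for this derivation). Since $E(\gamma) = 0$ and $\mathrm{Var}(\gamma) = \tau\Sigma$, the linear term $\gamma^T U_\gamma$ has expectation zero and any quadratic form satisfies $E(\gamma^T A\gamma) = \mathrm{tr}\{A\,E(\gamma\gamma^T)\} = \tau\,\mathrm{tr}(A\Sigma)$, so

$$
\begin{aligned}
\ell(\theta, \sigma^2, \tau) &\approx \sum_{i=1}^n l_i(0) + \log\Big[1 + \frac{\tau}{2}\,\mathrm{tr}\{(U_\gamma U_\gamma^T + H_\gamma)\Sigma\}\Big],\\
\frac{d}{d\tau}\ell(\theta, \sigma^2, \tau)\Big|_{\tau=0} &= \frac{1}{2}\,U_\gamma^T\Sigma U_\gamma + \frac{1}{2}\,\mathrm{tr}(H_\gamma\Sigma),
\end{aligned}
$$

where the last step uses $\mathrm{tr}(U_\gamma U_\gamma^T\Sigma) = U_\gamma^T\Sigma U_\gamma$.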

Taking twice the first term as the test statistic Q,13 we have

$$Q = U_\gamma^T\Sigma U_\gamma, \tag{6}$$

where $U_\gamma = \frac{d}{d\gamma}\sum_{i=1}^n l_i(\gamma)\big|_{\gamma=0}$. The full forms of the terms in Equation (6) are lengthy and complex; for ease of presentation, we provide them in Appendix C.
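Once Σ is chosen, Equation (6) is a single quadratic form in the score. The hypothetical helpers below use plain Python lists and assume that $U_\gamma$ is stacked by outcome in the same order as $\gamma = (\gamma_1^T, \ldots, \gamma_K^T)^T$, so it has length qK. With Σ equal to the identity, the statistic is the sum of squared scores; with Σ equal to the all-ones matrix, it is the squared sum of the scores.

```python
def quad_form(u, sigma):
    """Return Q = u^T Sigma u for a score vector u and a square matrix sigma."""
    n = len(u)
    return sum(u[a] * sigma[a][b] * u[b] for a in range(n) for b in range(n))


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]


def ones_outer(n):
    """n x n all-ones matrix, i.e. 1 1^T."""
    return [[1.0] * n for _ in range(n)]
```

For example, with `u = [1.0, 2.0, -1.0]`, `quad_form(u, identity(3))` returns 6.0 and `quad_form(u, ones_outer(3))` returns 4.0. Explicit loops are used for transparency; for realistic qK, a linear algebra library would be the natural choice.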

2.3 |. Variations and Inference

Note that the test statistic Q depends on the specification of the covariance matrix of γ. An obvious first choice is to specify independent effects with $\Sigma = I_{qK}$, the $qK \times qK$ identity matrix, much like other variance components tests in genetic association settings.24 Then we have

$$Q_{ind} = U_\gamma^T U_\gamma. \tag{7}$$

We can see that the $U_\gamma$ terms in Equation (7) are equivalent to the score functions for the elements of γ if we assume model (1) and treat γ as fixed effects. Because of this equivalence, $U_\gamma$ asymptotically follows a multivariate Gaussian distribution with mean 0 and covariance matrix $V = I_{\gamma\gamma} - I_{\gamma\eta}I_{\eta\eta}^{-1}I_{\gamma\eta}^T$. Here $\eta = (\theta^T, \sigma^2)^T$, and $I_{\gamma\gamma}$, $I_{\gamma\eta}$, and $I_{\eta\eta}$ are expected information matrices under the fixed effect perspective. The full derivation of the covariance matrix can be found in Appendix D, and the full forms of all the terms are shown in Appendices E and F. Thus $Q_{ind}$ is a quadratic form in a Gaussian vector, and its asymptotic null distribution is a mixture of chi-square distributions:

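$$Q_{ind} \sim \sum_{l=1}^{qK}\lambda_l\,\chi^2_{1,l},$$

where the $\lambda_l$ are the eigenvalues of V and the $\chi^2_{1,l}$ are independent chi-square random variables with one degree of freedom.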
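The mixture distribution can be illustrated by simulation. The following sketch (hypothetical function name, Python standard library only) estimates $P(\sum_l \lambda_l\chi^2_{1,l} \ge q)$ by Monte Carlo for given eigenvalues $\lambda_l$ of V; computing those eigenvalues requires a linear algebra routine and is not shown. Monte Carlo cannot resolve p-values near genome-wide significance thresholds, so in practice exact or approximate methods such as Davies' algorithm or saddlepoint approximations are typically used instead.

```python
import random


def mixture_chisq_pvalue_mc(q_obs, eigenvalues, n_draws=100_000, seed=1):
    """Monte Carlo estimate of P(sum_l lambda_l * chi2_1 >= q_obs)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        # each lambda_l multiplies an independent squared standard normal, i.e. a chi2_1 draw
        stat = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)
        if stat >= q_obs:
            exceed += 1
    # the add-one correction keeps the estimate strictly positive
    return (exceed + 1) / (n_draws + 1)
```

With a single eigenvalue equal to 1, the mixture reduces to $\chi^2_1$, so an observed value of 3.84 should give a p-value close to 0.05.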

Another possible specification of the covariance matrix of γ is $\Sigma = \mathbf{1}\mathbf{1}^T$, where $\mathbf{1} = (1, \ldots, 1)^T$ is a vector of length qK. In this case, Equation (6) becomes

$$Q_{cor} = U_\gamma^T\mathbf{1}\mathbf{1}^T U_\gamma. \tag{8}$$

The statistic $Q_{cor}$ is the opposite of $Q_{ind}$ in that it assumes the genetic effects are perfectly correlated, that is, totally homogeneous. By again appealing to the score-like nature of $U_\gamma$ when γ is treated as a fixed effect, the scalar score statistic $\mathbf{1}^T U_\gamma$ has mean 0 and variance $\mathbf{1}^T V\mathbf{1}$. Because $Q_{cor} = (\mathbf{1}^T U_\gamma)^2$, it follows that $Q_{cor}/(\mathbf{1}^T V\mathbf{1}) \sim \chi^2_1$ under the null hypothesis.
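Because $Q_{cor}$ has a one-degree-of-freedom chi-square null distribution after scaling, its p-value needs no eigen-decomposition. A minimal sketch with a hypothetical function name, using the identity $P(Z^2 > x) = \mathrm{erfc}(\sqrt{x/2})$ for a standard normal Z:

```python
import math


def qcor_pvalue(u, v):
    """P-value of Q_cor = (1^T u)^2 using Q_cor / (1^T V 1) ~ chi2_1 under H0.

    u: score vector U_gamma (length qK); v: its null covariance V (nested lists).
    """
    score_sum = sum(u)  # 1^T U_gamma
    var_sum = sum(sum(row) for row in v)  # 1^T V 1
    stat = score_sum ** 2 / var_sum
    # For Z ~ N(0, 1), P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))
```

Note that positive and negative scores cancel in `score_sum`, which is exactly why this statistic loses power when effects have mixed signs.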

2.4 |. Omnibus test

It is generally difficult to estimate Σ in practical testing situations. Thus, it would be useful to develop a test that combines the advantages of Qind and Qcor and can perform effectively across diverse settings. This reasoning motivates our development of an omnibus test statistic.

To introduce the omnibus statistic, consider first a set of intermediate test statistics that blend $Q_{ind}$ and $Q_{cor}$,

$$Q_{int,m} = U_\gamma^T\{(1 - \rho_m)I_{qK} + \rho_m\mathbf{1}\mathbf{1}^T\}U_\gamma = (1 - \rho_m)Q_{ind} + \rho_m Q_{cor}. \tag{9}$$

Here $m = 1, \ldots, M$ indexes the intermediate test statistics, and we define $0 = \rho_1 < \rho_2 < \cdots < \rho_M = 1$. We can see that $Q_{int,m}$ is a linear combination of $Q_{ind}$ and $Q_{cor}$ and is equivalent to assuming that the correlation structure of γ is exchangeable with off-diagonal value $\rho_m$. We have $Q_{int,1} = Q_{ind}$ and $Q_{int,M} = Q_{cor}$. The omnibus test statistic then aggregates the p-values of all the intermediate test statistics with the aggregated Cauchy association test (ACAT)25 method:

$$Q_{omni} = \frac{1}{M}\sum_{m=1}^M \tan\{(0.5 - p_{int,m})\pi\}.$$

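Here $p_{int,m}$ is the p-value for $Q_{int,m}$. Like $Q_{ind}$, each $Q_{int,m}$ is a quadratic form in $U_\gamma$, so its null distribution is also a mixture of chi-square distributions, with weights given by the eigenvalues of $\Sigma_m^{1/2}V\Sigma_m^{1/2}$, where $\Sigma_m = (1 - \rho_m)I_{qK} + \rho_m\mathbf{1}\mathbf{1}^T$.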

The p-value of the omnibus test, $p_{omni}$, is then

$$p_{omni} = 0.5 - \arctan(Q_{omni})/\pi.$$

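A notable feature of ACAT is that the $p_{omni}$ calculation can be applied to p-values with arbitrary dependence,25 which matters here because every $Q_{int,m}$ is computed from the same score vector $U_\gamma$. We recommend using M = 6 intermediate test statistics equally spaced between 0 and 1, that is, $\rho_m \in \{0, 0.2, 0.4, 0.6, 0.8, 1\}$.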
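The omnibus procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `rho_grid`, `intermediate_stats`, and `acat` are hypothetical names, and the p-values $p_{int,m}$ are taken as inputs because they require the chi-square mixture computation. For numerical stability, the sketch uses two algebraically equivalent forms: $\tan\{(0.5 - p)\pi\} = 1/\tan(p\pi)$ for $0 < p < 1$, and $0.5 - \arctan(T)/\pi = \arctan(1/T)/\pi$ for $T > 0$.

```python
import math


def rho_grid(M=6):
    """M equally spaced values 0 = rho_1 < ... < rho_M = 1."""
    return [m / (M - 1) for m in range(M)]


def intermediate_stats(u, rhos):
    """Q_int,m = (1 - rho_m) * Q_ind + rho_m * Q_cor, as in Equation (9)."""
    q_ind = sum(x * x for x in u)  # U^T U
    q_cor = sum(u) ** 2  # U^T 1 1^T U
    return [(1.0 - r) * q_ind + r * q_cor for r in rhos]


def acat(pvalues, weights=None):
    """Cauchy combination of p-values in (0, 1); equal weights 1/M by default."""
    M = len(pvalues)
    if weights is None:
        weights = [1.0 / M] * M
    # tan{(0.5 - p) * pi} written as 1 / tan(p * pi), which stays accurate for tiny p
    q_omni = sum(w / math.tan(p * math.pi) for w, p in zip(weights, pvalues))
    if q_omni > 0:
        # 0.5 - arctan(T) / pi equals arctan(1 / T) / pi for T > 0, avoiding cancellation
        return math.atan(1.0 / q_omni) / math.pi
    return 0.5 - math.atan(q_omni) / math.pi
```

If all intermediate p-values are equal, the combined p-value equals that common value; a single very small $p_{int,m}$ dominates the sum, which is how the omnibus test inherits the power of whichever $\rho_m$ best matches the data.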

3 |. SIMULATION STUDIES

3.1 |. Type I Error Simulations

We performed an extensive set of simulation studies to investigate the finite-sample operating characteristics of the proposed methods. Data were simulated using two sample sizes, n=2400 and n=24000, and we investigated application to both K=3 and K=5 outcomes. We start by reporting the type I error simulations, which were conducted at levels down to $\alpha = 1 \times 10^{-4}$ to account for the small levels needed to declare significance in genetic association settings. The exact event times were generated for each subject i under $S_{ik}(t) = \exp\{-t\exp(X_i^T\beta + b_i)\}$ for all outcomes k. Two non-genetic covariates were included in $X_i$: the first was generated from a standard normal distribution and the second from a binomial distribution with success probability 0.5. We set the fixed effects to $\beta = (1, 1)^T$ for each outcome. The subject-specific random effects, $b_i$, were generated from a N(0, 1) distribution.
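Because $S_{ik}(t) = \exp\{-t\exp(\eta_i)\}$ is the survival function of an exponential distribution with rate $\exp(\eta_i)$, exact event times can be drawn by inverting it: $T_{ik} = -\log(U)/\exp(\eta_i)$ with U uniform on (0, 1]. The sketch below follows the description above, but the function name and return format are assumptions, and the binomial covariate is read as a single Bernoulli(0.5) draw. Sharing $b_i$ across the K outcomes is what makes them correlated.

```python
import math
import random


def simulate_event_times(n, K, beta=(1.0, 1.0), sigma_b=1.0, seed=0):
    """Exact event times under S_ik(t) = exp{-t * exp(X_i^T beta + b_i)} (null model)."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        # X_i: one N(0, 1) covariate and one Bernoulli(0.5) covariate
        x = (rng.gauss(0.0, 1.0), 1.0 if rng.random() < 0.5 else 0.0)
        b = rng.gauss(0.0, sigma_b)  # shared frailty induces correlation across outcomes
        rate = math.exp(beta[0] * x[0] + beta[1] * x[1] + b)
        # inverse-survival sampling; 1 - random() lies in (0, 1], so the log is finite
        times = [-math.log(1.0 - rng.random()) / rate for _ in range(K)]
        subjects.append((x, b, times))
    return subjects
```

Setting `beta=(0.0, 0.0)` and `sigma_b=0.0` reduces every event time to a standard exponential draw with mean 1, which is a convenient check of the sampler.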

The observation times for each outcome were generated from a uniform distribution of length 0.2 centered around the times (1, 2, 3, 4), and the probability of missing a visit was set at 10%. Genotypes were generated from a multivariate binary variable with size 2, mean parameters distributed uniformly between (0.01, 0.05), and an exchangeable correlation matrix where each off-diagonal entry was equal to 0.1. In all settings, we set q=50, we used one internal knot in the cubic spline estimate of the log baseline cumulative hazard, and the omnibus test statistic used M=6 intermediate test statistics equally spaced between 0 and 1. Additional simulation details are available in Appendix G.
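The visit scheme just described can be sketched as follows: for one outcome, draw one visit in each window of width 0.2 centered at times 1, 2, 3, and 4, drop each visit independently with probability 0.1, and report the attended visits surrounding the event. The function name is hypothetical, and returning 0 for a left-censored event and `math.inf` for a right-censored one mirrors the definitions of $L_{ik}$ and $R_{ik}$ in Section 2.1.

```python
import math
import random


def censoring_interval(t_event, rng, centers=(1.0, 2.0, 3.0, 4.0), width=0.2, p_miss=0.1):
    """Return (L, R] containing t_event under the simulated visit scheme."""
    visits = [rng.uniform(c - width / 2, c + width / 2) for c in centers]
    attended = sorted(s for s in visits if rng.random() >= p_miss)  # each visit missed w.p. p_miss
    left, right = 0.0, math.inf  # L = 0: left-censored; R = inf: right-censored
    for s in attended:
        if s < t_event:
            left = s  # last attended visit before the event
        else:
            right = s  # first attended visit at or after the event
            break
    return left, right
```

Applying this function once per outcome to the exact times from the previous step yields the observed intervals $(L_{ik}, R_{ik}]$.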

Other tests that could be used in an interval-censored multiple phenotype testing strategy were also considered. Specifically, we considered combining the test statistics for single-outcome tests. The interval-censored sequence kernel association test (ICSKAT) and interval-censored burden test13 perform set-based inference for one outcome at a time. We applied them in this way and then combined the p-values using either a Bonferroni correction26 or the ACAT approach.

The empirical type I error rates are presented in Table 1 at various nominal significance levels. We can see that all proposed tests demonstrate good control of the type I error rate for both n=2400 and n=24000 subjects. All proposed tests also appear to perform well for both K=3 and K=5 outcomes. We see that the ACAT combinations of single outcome tests also appear to produce valid inference. Using a Bonferroni correction on the minimum ICSKAT p-value produced conservative results, which is a well-known property of Bonferroni-type approaches. To better illustrate the performance of the different methods under the various simulation settings, we have provided quantile-quantile plots of the most extreme p-values in Appendix H.

TABLE 1.

Type I error rates for three and five correlated outcomes under sample sizes of n=2400 and n=24000. A million simulations were run for each setting. $Q_{ind}$ refers to the joint set-based test assuming an independent covariance structure; $Q_{cor}$ refers to the joint set-based test assuming a perfectly correlated covariance structure; $Q_{omni}$ refers to the joint set-based omnibus test; ICSKAT, Bonferroni refers to interval-censored SKAT with Bonferroni-corrected p-values; ICSKAT, ACAT refers to interval-censored SKAT followed by the aggregated Cauchy association test; Burden, ACAT refers to the interval-censored burden test followed by the aggregated Cauchy association test.

| K | n | α | $Q_{ind}$ | $Q_{cor}$ | $Q_{omni}$ | ICSKAT, Bonferroni | ICSKAT, ACAT | Burden, ACAT |
|---|---|---|---|---|---|---|---|---|
| 3 | 2400 | 5 × 10−2 | 4.97 × 10−2 | 5.08 × 10−2 | 5.27 × 10−2 | 4.63 × 10−2 | 4.77 × 10−2 | 5.09 × 10−2 |
| 3 | 2400 | 1 × 10−2 | 0.98 × 10−2 | 1.04 × 10−2 | 1.06 × 10−2 | 0.89 × 10−2 | 0.90 × 10−2 | 1.03 × 10−2 |
| 3 | 2400 | 1 × 10−3 | 0.98 × 10−3 | 1.11 × 10−3 | 1.04 × 10−3 | 0.80 × 10−3 | 0.80 × 10−3 | 1.00 × 10−3 |
| 3 | 2400 | 1 × 10−4 | 0.90 × 10−4 | 1.2 × 10−4 | 1.08 × 10−4 | 0.79 × 10−4 | 0.79 × 10−4 | 1.06 × 10−4 |
| 3 | 24000 | 5 × 10−2 | 4.98 × 10−2 | 5.01 × 10−2 | 5.25 × 10−2 | 4.82 × 10−2 | 4.89 × 10−2 | 4.99 × 10−2 |
| 3 | 24000 | 1 × 10−2 | 1.00 × 10−2 | 1.01 × 10−2 | 1.04 × 10−2 | 0.98 × 10−2 | 0.98 × 10−2 | 1.00 × 10−2 |
| 3 | 24000 | 1 × 10−3 | 1.00 × 10−3 | 1.01 × 10−3 | 1.04 × 10−3 | 0.99 × 10−3 | 0.99 × 10−3 | 1.00 × 10−3 |
| 3 | 24000 | 1 × 10−4 | 1.11 × 10−4 | 0.93 × 10−4 | 1.02 × 10−4 | 1.05 × 10−4 | 1.05 × 10−4 | 1.04 × 10−5 |
| 5 | 2400 | 5 × 10−2 | 4.94 × 10−2 | 5.09 × 10−2 | 5.24 × 10−2 | 4.54 × 10−2 | 4.74 × 10−2 | 5.08 × 10−2 |
| 5 | 2400 | 1 × 10−2 | 9.72 × 10−3 | 1.04 × 10−3 | 1.03 × 10−3 | 0.87 × 10−3 | 0.88 × 10−3 | 1.02 × 10−3 |
| 5 | 2400 | 1 × 10−3 | 8.65 × 10−3 | 1.09 × 10−3 | 1.02 × 10−3 | 0.77 × 10−3 | 0.77 × 10−3 | 9.85 × 10−3 |
| 5 | 2400 | 1 × 10−4 | 9.40 × 10−4 | 1.36 × 10−4 | 1.18 × 10−4 | 0.77 × 10−4 | 0.77 × 10−4 | 1.13 × 10−4 |
| 5 | 24000 | 5 × 10−2 | 5.03 × 10−2 | 5.05 × 10−2 | 5.30 × 10−2 | 4.76 × 10−2 | 4.84 × 10−2 | 4.98 × 10−2 |
| 5 | 24000 | 1 × 10−2 | 1.01 × 10−2 | 1.00 × 10−2 | 1.03 × 10−2 | 0.97 × 10−2 | 0.98 × 10−2 | 1.01 × 10−2 |
| 5 | 24000 | 1 × 10−3 | 1.02 × 10−3 | 0.97 × 10−3 | 1.03 × 10−3 | 0.94 × 10−3 | 0.95 × 10−3 | 1.00 × 10−3 |
| 5 | 24000 | 1 × 10−4 | 1.09 × 10−4 | 1.00 × 10−4 | 0.98 × 10−4 | 0.84 × 10−4 | 0.84 × 10−4 | 1.01 × 10−4 |

3.2 |. Power Simulations

We next conducted power simulations under a variety of true genetic effect models. The non-genetic covariates, non-genetic effect sizes, genetic data, and observation times were generated in the same manner as in the type I error simulations, and the true model was $S_{ik}(t) = \exp\{-t\exp(X_i^T\beta + G_i^T\gamma_k + b_i)\}$. Each set contained 50 SNPs, and in all scenarios four SNPs had an effect unless specified otherwise. We again considered the cases of n=2400 and n=24000 as well as K=3 and K=5 outcomes. Additional simulation details are discussed in Appendix G.

The first simulation, shown in Figure 1, examines test power as effect sizes increase in a heterogeneous signals setting. The effect sizes have the same magnitude for each causal SNP, with half positive and half negative for each outcome. We would expect this setting to favor the proposed $Q_{ind}$ as well as the ICSKAT tests, which assume that the genetic effects are heterogeneous. Indeed, $Q_{ind}$ offers the most power as the effect sizes increase, and its advantage over the alternative tests grows as the genetic effects become larger. The omnibus test $Q_{omni}$ performs almost as well as $Q_{ind}$, demonstrating that it can borrow information from $Q_{ind}$ when the genetic effects are heterogeneous. The same trends are broadly observed in both the n=2400 (Figures 1A and 1C) and n=24000 (Figures 1B and 1D) cases, as well as when testing K=3 (Figures 1A and 1B) and K=5 outcomes (Figures 1C and 1D). The high power of $Q_{ind}$ across these settings supports the intuition that efficiency gains should be large when a joint test uses three or five times as much information.

FIGURE 1.


Power simulations conducted at varying effect sizes, with half of the effect sizes in the positive direction and half in the negative direction for all outcomes. Panels (A) and (B) were simulated using three correlated outcomes, while (C) and (D) were simulated using five correlated outcomes. The sample sizes were also varied, with (A) and (C) using a sample size of n=2400 and (B) and (D) using n=24000. Four causal variants were used for all the simulations, and each plotted point is the result of 200 iterations. $Q_{ind}$ refers to the joint set-based test assuming an independent covariance structure; $Q_{cor}$ refers to the joint set-based test assuming a perfectly correlated covariance structure; $Q_{omni}$ refers to the joint set-based omnibus test; ICSKAT, Bonferroni refers to interval-censored SKAT with Bonferroni-corrected p-values; ICSKAT, ACAT refers to interval-censored SKAT followed by the aggregated Cauchy association test; Burden, ACAT refers to the interval-censored burden test followed by the aggregated Cauchy association test.

Among the other tests, combining the ICSKAT individual outcome test statistics also performs well when the effects are heterogeneous. Combining the individual outcome burden tests or using $Q_{cor}$ yields very little power regardless of the sample size or number of outcomes. Thus, even combining the data from several outcomes does not help when a major assumption about the effect sizes is violated: since half of the effects on each outcome are positive and half are negative, they largely cancel each other out in the burden test framework.
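The cancellation argument can be checked with simple arithmetic. In the hypothetical helper below, the first four of 50 SNPs are causal (which SNPs are causal is an assumption of this sketch) with a common magnitude δ. Roughly speaking, burden-type statistics respond to the signed sum of the effects, which vanishes under alternating signs, whereas variance-component statistics respond to the sum of squared effects, which is identical in the heterogeneous and homogeneous designs.

```python
def effect_vector(q=50, n_causal=4, delta=0.3, heterogeneous=True):
    """Hypothetical effect vector gamma_k with n_causal nonzero effects of size delta."""
    gamma = [0.0] * q
    for j in range(n_causal):
        # alternate signs in the heterogeneous design; all positive otherwise
        sign = -1.0 if (heterogeneous and j % 2 == 1) else 1.0
        gamma[j] = sign * delta
    return gamma
```

For the heterogeneous design, `sum(effect_vector())` is 0.0, while `sum(effect_vector(heterogeneous=False))` is about 1.2; the sums of squares of the two vectors are equal.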

However, effect sizes are not always heterogeneous. To create a scenario that would be more favorable for the burden test, we next generated outcomes from genetic data with more homogeneous effect sizes. We again varied the effect sizes as in the first set of simulations, but this time all effects shared the same sign and magnitude. The results are shown in Figure 2. When there is low variability among the effect sizes, the $Q_{cor}$ test performs best regardless of whether there are n=2400 (Figures 2A and 2C) or n=24000 (Figures 2B and 2D) subjects, for both three and five outcomes. Again, $Q_{omni}$ performs almost as well as the best test, $Q_{cor}$, further showing the robustness of the omnibus approach. Interestingly, the $Q_{ind}$ test demonstrates roughly the same power as the combination strategies when there are K=3 outcomes (Figures 2A and 2B) but is the worst test when there are K=5 outcomes (Figures 2C and 2D). This suggests that with five outcomes, one of the single outcome tests can be so significant that it outweighs the advantage of a joint test that uses all the data but makes an incorrect assumption about the genetic effect distribution.

FIGURE 2.


Power simulations conducted at varying effect sizes, which were all constant and positive for each outcome in a given iteration. Panels (A) and (B) were simulated using three correlated outcomes, while (C) and (D) were simulated using five correlated outcomes. The sample sizes were also varied, with (A) and (C) using a sample size of n=2400 and (B) and (D) using n=24000. Four causal variants were used for all the simulations, and each plotted point is the result of 200 iterations. $Q_{ind}$ refers to the joint set-based test assuming an independent covariance structure; $Q_{cor}$ refers to the joint set-based test assuming a perfectly correlated covariance structure; $Q_{omni}$ refers to the joint set-based omnibus test; ICSKAT, Bonferroni refers to interval-censored SKAT with Bonferroni-corrected p-values; ICSKAT, ACAT refers to interval-censored SKAT followed by the aggregated Cauchy association test; Burden, ACAT refers to the interval-censored burden test followed by the aggregated Cauchy association test.

We next examined how the different tests perform when only one of the outcomes has genetic effects. We simulated event times using the same scheme as in Figure 1, except that exactly one outcome in both the K=3 and K=5 scenarios was given genetic effects while the remaining outcomes were given zero genetic effects. The results can be seen in Figure 3. Even in the smaller sample size settings (Figures 3A and 3C), $Q_{ind}$ maintains the highest power while the other tests perform less favorably. In the larger sample size settings (Figures 3B and 3D), $Q_{ind}$ outperforms the other tests by a wider margin, while the combination of the ICSKAT individual outcome test statistics performs well when the genetic effects are stronger and the sample size is larger. The burden test performs poorly in this scenario, which is again expected since the outcomes possess heterogeneous effects.

FIGURE 3.


Power simulations conducted at varying effect sizes, with one outcome having half positive and half negative effects and the remaining outcomes having zero genetic effects. Panels (A) and (B) were simulated using three correlated outcomes, while (C) and (D) were simulated using five correlated outcomes. The sample sizes were also varied, with (A) and (C) using a sample size of n=2400 and (B) and (D) using n=24000. Four causal variants were used for all the simulations, and each plotted point is the result of 200 iterations. $Q_{ind}$ refers to the joint set-based test assuming an independent covariance structure; $Q_{cor}$ refers to the joint set-based test assuming a perfectly correlated covariance structure; $Q_{omni}$ refers to the joint set-based omnibus test; ICSKAT, Bonferroni refers to interval-censored SKAT with Bonferroni-corrected p-values; ICSKAT, ACAT refers to interval-censored SKAT followed by the aggregated Cauchy association test; Burden, ACAT refers to the interval-censored burden test followed by the aggregated Cauchy association test.

3.2.1 |. Power Simulations from UK Biobank Genetic Data

For another set of experiments, we used real genetic data from the UKB. In each replicate, the SNP set consisted of 50 consecutive variants, including four causal SNPs, from a randomly chosen gene on chromosome 2. Otherwise, the model is the same as that used in Figure 1. We again varied the genetic effect sizes and simulated the outcome times under the setting that all outcomes have half positive and half negative genetic effects. Our results are displayed in Figure 4. For the larger sample size cases (Figures 4B and 4D), $Q_{cor}$ and $Q_{omni}$ maintain the highest power across all effect sizes.

FIGURE 4.


Power simulations generated from real genetic data from the UKB. The effect sizes were varied in a heterogeneous setting. Panels (A) and (B) were simulated using three correlated outcomes, while (C) and (D) were simulated using five correlated outcomes. Panels (A) and (C) have a sample size of n=1500, and (B) and (D) have a sample size of n=24000. Ten causal variants were used for all the simulations, and each plotted point is the result of 200 iterations. $Q_{ind}$ refers to the joint set-based test assuming an independent covariance structure; $Q_{cor}$ refers to the joint set-based test assuming a perfectly correlated covariance structure; $Q_{omni}$ refers to the joint set-based omnibus test; ICSKAT, Bonferroni refers to interval-censored SKAT with Bonferroni-corrected p-values; ICSKAT, ACAT refers to interval-censored SKAT followed by the aggregated Cauchy association test; Burden, ACAT refers to the interval-censored burden test followed by the aggregated Cauchy association test.

These power simulations show that combining multiple interval-censored outcomes can lead to more powerful tests. Specifically, jointly testing the associations is more powerful than testing each outcome individually and then correcting the p-values. When the effect sizes are homogeneous, burden-type tests that assume similar genetic effects are favored. However, when the effect sizes vary more, assuming independent random effects is a powerful option. Ultimately, no test is uniformly most powerful across all situations. The omnibus test rarely delivers the best performance, but it also never loses much power relative to the best test. Thus, since the true distribution of γ is rarely known in practice, we recommend using Qomni. Additional simulations investigating other factors, such as model weights and censoring rates, can be found in Appendices I–M.
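Because each plotted power estimate is based on 200 simulation iterations, it carries Monte Carlo error. A minimal sketch of the usual binomial estimate follows; the significance level `alpha` is a placeholder, since the level used in the simulations is not restated in this section, and the function name is hypothetical.

```python
import math


def empirical_power(p_values, alpha=0.05):
    """Rejection rate at level alpha and its binomial Monte Carlo standard error."""
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one simulation replicate")
    power = sum(1 for p in p_values if p < alpha) / n
    se = math.sqrt(power * (1.0 - power) / n)
    return power, se


# The standard error is largest at power = 0.5; with 200 iterations it is
# sqrt(0.25 / 200), i.e. about 0.035.
worst_case_se = math.sqrt(0.25 / 200)
```

With a standard error of up to about 0.035 per point, a gap of a few percentage points between two methods at a single effect size is within simulation noise; differences that persist across effect sizes are more informative.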

4 |. APPLICATION IN UK BIOBANK FRACTURE-RELATED TRAITS

This work is motivated by the high prevalence of fractures in the health care system. An estimated 1.5 million fractures occur in the United States every year, with devastating impacts on quality of life.27 Fractures also create a substantial economic burden, at about $30 000 per fracture.28 Moreover, the number of fractures and the costs of their care have been increasing every year.29 Identifying genetic risk factors for fractures is therefore of great interest, as it could support a variety of translational interventions. For example, a better understanding of genetic etiology could improve the identification of high-risk individuals or suggest therapeutic strategies.

Many previous GWAS have successfully identified genes associated with various fracture-related traits. For example, a previous UKB study identified 13 novel loci associated with bone mineral density (BMD),10 a heritable trait that is a leading determinant of osteoporosis. An earlier study identified 153 novel loci associated with heel BMD.30 For both of these studies, primary interest lay in whether subjects had experienced their first fracture, as opposed to the number of fractures. One major limitation of these studies is that they generally analyzed a single outcome at a time. As previously mentioned, investigators also sometimes transform the data and discard the time-to-event information.

The major unique features of this analysis are the use of time-to-event data and the joint testing of two outcomes simultaneously. Specifically, we consider the event times for both falls and fractures in the UKB. Following previous practice,10,30 we focus on the time to first incidence, which is interval-censored: for subject $i$, $T_{i1}$ is the time to first fracture and $T_{i2}$ is the time to first fall. Note that falls and fractures need not occur sequentially or together; a subject may experience neither event, only one of the two, a fracture before a fall, or a fall before a fracture. Additional information about these outcomes in the UK Biobank can be found in Appendix N. These two outcomes were chosen because previous studies have shown a significant positive genetic correlation between falls and fractures.31 Joint set-based tests using the original time-to-event data may therefore identify more genes with pleiotropic effects on falls and fractures than analyzing each outcome by itself.
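Concretely, each subject contributes one censoring interval per outcome: the event is known only to have occurred between two assessment times. The sketch below is an illustrative data layout, not the input format of the SIMICO package; the class name `IntervalObs` and the example times are hypothetical. An interval starting at zero corresponds to left censoring and an infinite right endpoint to right censoring.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalObs:
    """Event known only to lie in (left, right]; right = inf means right-censored."""
    left: float
    right: float

    def __post_init__(self):
        if not 0.0 <= self.left < self.right:
            raise ValueError("require 0 <= left < right")

    def kind(self):
        if math.isinf(self.right):
            return "right-censored"
        if self.left == 0.0:
            return "left-censored"
        return "interval-censored"


# Hypothetical subject (times in years since baseline): the first fracture
# occurred between assessments at 3.0 and 5.5 years; no fall had been
# recorded by the last assessment at 8.0 years.
subject = {
    "fracture": IntervalObs(3.0, 5.5),   # T_i1
    "fall": IntervalObs(8.0, math.inf),  # T_i2
}
```

Keeping both endpoints, rather than collapsing each outcome to "event by end of follow-up", is what preserves the timing information that a binary phenotype would discard.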

We applied the proposed joint tests to a heavy-smoker population expected to be most at risk for fractures. The adverse association between smoking and bone health is well established: smoking increases bone loss and decreases intestinal calcium absorption.32 Heavy smokers were defined as those with more than 40 pack-years of smoking.33 Using this subset of the data provides two main advantages. First, it is useful to better understand the determinants of disease among those who are already most at risk. Second, testing multiple outcomes on the full cohort creates computational bottlenecks when loading and reading data; restricting the analysis to high-risk subjects enriches the sample for cases while limiting the memory required.
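Cumulative smoking exposure is conventionally measured in pack-years: the number of packs (20 cigarettes each) smoked per day multiplied by the number of years smoked. A minimal sketch of applying a 40 pack-year cutoff follows; the function names and inputs are hypothetical and do not correspond to specific UK Biobank fields.

```python
CIGARETTES_PER_PACK = 20


def pack_years(cigarettes_per_day, years_smoked):
    """Packs smoked per day multiplied by years smoked."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


def is_heavy_smoker(cigarettes_per_day, years_smoked, threshold=40.0):
    """True when cumulative exposure strictly exceeds the pack-year threshold."""
    return pack_years(cigarettes_per_day, years_smoked) > threshold
```

For example, one pack a day for 30 years gives 30 pack-years and falls below the cutoff, while two packs a day for 25 years gives 50 pack-years and exceeds it.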

The cleaned sample consists of n = 15 021 British individuals. We fit model (1) with covariates for sex and the first ten genetic principal components. The list of genes was compiled through Ensembl.34 In total, we tested 26 927 genes. More details about the UK Biobank outcomes and the preprocessing of the genetic data are given in Appendix N. Table 2 displays the top ten most significant genes across all methods, and Supplementary Table 4 in Appendix O shows the top 10 genes selected by any of the three multiple-outcome tests. Figure 5 displays a quantile-quantile plot of all the calculated p-values.
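For scale, when 26 927 genes are tested, a Bonferroni correction at a family-wise error rate of 0.05 corresponds to a per-gene threshold of roughly 1.86 × 10⁻⁶. The computation is shown only as a reference point for reading Table 2; this section does not state which significance threshold the analysis adopts, and the helper name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


gene_level_threshold = bonferroni_threshold(0.05, 26_927)
```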

TABLE 2.

The top ten most significant genes associated with time to first fall and time to first fracture. Genes are ordered by their smallest p-value across all tests. Qind refers to the joint set-based test assuming an independent covariance structure; Qcor refers to the joint set-based test assuming a perfectly correlated covariance structure; Qomni refers to the joint set-based omnibus test; ICSKAT (min) refers to the minimum p-value between the single outcome interval-censored SKAT with fractures as the outcome and with falls as the outcome; ICSKAT, ACAT refers to interval-censored SKAT of both outcomes followed by aggregated Cauchy association test; Burden (min) refers to the minimum p-value between the single outcome interval-censored burden with fractures as the outcome and with falls as the outcome; Burden, ACAT refers to the interval-censored burden test of both outcomes followed by aggregated Cauchy association test.

| Gene | Chr | Qind | Qcor | Qomni | ICSKAT (min) | ICSKAT, ACAT | Burden (min) | Burden, ACAT |
|---|---|---|---|---|---|---|---|---|
| MRPS22 | 3 | 1.05 × 10⁻² | 3.40 × 10⁻² | 1.95 × 10⁻² | 9.95 × 10⁻⁵ | 1.99 × 10⁻⁴ | 5.85 × 10⁻⁷ | 1.17 × 10⁻⁶ |
| SPATA31A3 | 9 | 4.27 × 10⁻² | 1.35 × 10⁻² | 1.76 × 10⁻² | 2.69 × 10⁻⁶ | 5.39 × 10⁻⁶ | 3.06 × 10⁻⁶ | 6.13 × 10⁻⁶ |
| PISRT1 | 3 | 2.00 × 10⁻³ | 8.31 × 10⁻³ | 4.03 × 10⁻³ | 3.47 × 10⁻⁴ | 6.94 × 10⁻⁴ | 1.81 × 10⁻⁵ | 3.62 × 10⁻⁵ |
| SLC5A2 | 16 | 3.11 × 10⁻² | 2.94 × 10⁻⁵ | 4.35 × 10⁻⁵ | 1.98 × 10⁻¹ | 3.52 × 10⁻¹ | 1.42 × 10⁻² | 2.25 × 10⁻² |
| CHID1 | 11 | 7.35 × 10⁻¹ | 3.50 × 10⁻² | 5.51 × 10⁻² | 1.31 × 10⁻¹ | 1.87 × 10⁻¹ | 3.64 × 10⁻⁵ | 7.28 × 10⁻⁵ |
| MSX1 | 4 | 4.55 × 10⁻⁵ | 2.97 × 10⁻³ | 1.32 × 10⁻⁴ | 1.07 × 10⁻¹ | 8.55 × 10⁻¹ | 4.53 × 10⁻³ | 8.96 × 10⁻³ |
| ARHGAP20 | 11 | 5.04 × 10⁻⁵ | 3.03 × 10⁻² | 1.51 × 10⁻⁴ | 6.07 × 10⁻² | 9.02 × 10⁻² | 8.35 × 10⁻² | 1.02 × 10⁻¹ |
| C4orf46 | 4 | 7.05 × 10⁻³ | 1.08 × 10⁻¹ | 1.88 × 10⁻² | 1.12 × 10⁻⁴ | 2.23 × 10⁻⁴ | 5.34 × 10⁻⁵ | 1.07 × 10⁻⁴ |
| PRAMEF5 | 1 | 5.51 × 10⁻³ | 8.97 × 10⁻⁵ | 1.31 × 10⁻⁴ | 3.79 × 10⁻³ | 7.19 × 10⁻³ | 6.82 × 10⁻⁵ | 1.36 × 10⁻⁴ |
| TYW1B | 7 | 1.18 × 10⁻¹ | 7.89 × 10⁻² | 8.88 × 10⁻² | 7.16 × 10⁻⁴ | 1.44 × 10⁻³ | 6.94 × 10⁻⁵ | 1.39 × 10⁻⁴ |

FIGURE 5.

Quantile-quantile plot of the resulting p-values for all six methods associating time to falls and fractures in the UKB. Qind refers to the joint set-based test assuming an independent covariance structure; Qcor refers to the joint set-based test assuming a perfectly correlated covariance structure; Qomni refers to the joint set-based omnibus test; ICSKAT, Bonferroni refers to interval-censored SKAT, with Bonferroni-corrected p-values; ICSKAT, ACAT refers to interval-censored SKAT followed by aggregated Cauchy association test; Burden, ACAT refers to the interval-censored burden test followed by aggregated Cauchy association test.
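A uniform quantile-quantile plot such as Figure 5 compares the sorted observed −log10 p-values with those expected under the null hypothesis, where p-values are uniformly distributed. A minimal sketch of computing the plotting coordinates for one method follows; it uses the common (i − 0.5)/n plotting positions, which may differ from the convention used to draw Figure 5, and the function name is hypothetical.

```python
import math


def qq_coordinates(p_values):
    """Return (expected, observed) -log10 p-values, both sorted ascending.

    Expected values use the uniform plotting positions (i - 0.5) / n for
    i = 1..n, so the smallest observed p-value is paired with 0.5 / n.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return expected, observed


expected, observed = qq_coordinates([0.5, 0.05, 0.005])
```

Points along the diagonal indicate agreement with the null; a departure confined to the upper tail is the pattern expected when a small number of genes carry true signal, whereas a departure across the whole range suggests inflation.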

Reassuringly, our analysis identified several genes that multiple previous studies have associated with bone health. One of the most significant genes, MSX1, has previously been studied as a key factor in limb-pattern formation.35,36 Another top gene, ARHGAP20, has been shown to be downregulated in mesenchymal stem cells from patients with fractures.37 TRAM1 and MESDC2, which were among the top 10 genes identified by the multiple-outcome tests (Supplementary Table 4), have also previously been linked to related processes.38,39

A small joint-test p-value, such as that for MSX1, can be interpreted as evidence against the global null hypothesis that the gene is associated with neither time to first fall nor time to first fracture. We note that these top genes showed little evidence of association under the single-outcome ICSKAT. For fracture risk, the ICSKAT p-values for MSX1, TRAM1, and MESDC2 were p = 0.95, p = 0.70, and p = 0.56, respectively; for falls, they were p = 0.11, p = 0.59, and p = 0.12, respectively. The ability of the joint tests to recover genes already known to be associated with bone health and bone disease suggests the potential for further discoveries with methods suited to time-to-event outcomes. Furthermore, the novel genes identified in our analysis suggest that combining multiple correlated outcomes can reveal genes whose effects may be too weak to detect in single-outcome studies.
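The "ICSKAT, ACAT" and "Burden, ACAT" comparators combine the two single-outcome p-values with the aggregated Cauchy association test (ACAT). A minimal sketch of an equal-weight Cauchy combination, alongside the Bonferroni alternative, is given below. The equal weighting is an assumption, though it appears consistent with Table 2, where the ACAT column is close to twice the single-outcome minimum whenever that minimum is small. Applied to the rounded MSX1 p-values quoted above, it gives a combined p-value of about 0.84 (Table 2 reports 8.55 × 10⁻¹; the gap presumably reflects rounding of the inputs), in contrast with Qind = 4.55 × 10⁻⁵.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Cauchy combination (ACAT) of p-values.

    T = sum_k w_k * tan((0.5 - p_k) * pi) with weights normalised to sum to
    one; the combined p-value is 0.5 - arctan(T) / pi.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * k
    total = float(sum(weights))
    t = 0.0
    for w, p in zip(weights, p_values):
        # For extremely small p, tan((0.5 - p) * pi) ~ 1 / (p * pi); use the
        # approximation to avoid losing precision near the pole at pi / 2.
        term = 1.0 / (p * math.pi) if p < 1e-15 else math.tan((0.5 - p) * math.pi)
        t += (w / total) * term
    if t > 0:
        # Equivalent to 0.5 - atan(t) / pi, but accurate for large t.
        return math.atan(1.0 / t) / math.pi
    return 0.5 - math.atan(t) / math.pi


def bonferroni_min(p_values):
    """Bonferroni-corrected minimum p-value, capped at one."""
    return min(1.0, len(p_values) * min(p_values))


# Rounded single-outcome ICSKAT p-values for MSX1 quoted above:
# fractures p = 0.95, falls p = 0.11.
msx1_acat = cauchy_combination([0.95, 0.11])  # about 0.84
msx1_bonf = bonferroni_min([0.95, 0.11])      # 0.22
```

Both post hoc combinations can only be as small as the strongest single-outcome signal allows, whereas a joint test pools the score information from both outcomes before forming a statistic.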

5 |. CONCLUSION

With the recent surge of interest in generating massive genetic compendiums, the amount of time-to-event data available for investigating the etiology of complex diseases has never been greater. However, much of the available data is interval-censored, and statistical methodology for genetic association studies with such data remains limited. While outcomes can be transformed to binary or right-censored forms, such transformations can introduce a loss of efficiency or robustness.

Here, we developed a set-based variance components test to associate SNP sets with multiple interval-censored outcomes. This test can greatly increase power to detect rare and weak effects by integrating data from multiple correlated phenotypes, as opposed to performing inference on a single outcome at a time. No transformations to the original interval-censored data are required. Broadly, the approach relies on introducing a subject-specific correlation term in the survival model and also specifying the genetic effects to be random. Different test statistics can be constructed by specifying different correlation matrices for these random effects. We further introduced an omnibus test that aggregates these different test statistics.
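To make the two covariance choices concrete, one common parametrization, used for example by Multi-SKAT, writes the covariance of the stacked genetic effects for K outcomes and q SNPs as a Kronecker product of a K × K outcome matrix and a q × q SNP weight matrix. Assuming that structure (an illustration only; the paper's model may parametrize it differently), the independent structure behind Qind uses the identity matrix across outcomes, and the perfectly correlated structure behind Qcor uses the all-ones matrix. The helper names and SNP weights below are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two dense matrices stored as lists of rows."""
    rb, cb = len(b), len(b[0])
    return [
        [a[i // rb][j // cb] * b[i % rb][j % cb] for j in range(len(a[0]) * cb)]
        for i in range(len(a) * rb)
    ]


def outcome_matrix(k, structure):
    """K x K between-outcome correlation for the genetic random effects."""
    if structure == "independent":  # structure assumed by Qind
        return [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]
    if structure == "perfectly_correlated":  # structure assumed by Qcor
        return [[1.0] * k for _ in range(k)]
    raise ValueError(f"unknown structure: {structure}")


def diag(values):
    """Diagonal matrix, e.g. of per-SNP weights."""
    n = len(values)
    return [[values[r] if r == c else 0.0 for c in range(n)] for r in range(n)]


# Two outcomes (falls, fractures) and three SNPs with hypothetical weights.
snp_weights = diag([1.0, 0.5, 0.25])
sigma_ind = kron(outcome_matrix(2, "independent"), snp_weights)
sigma_cor = kron(outcome_matrix(2, "perfectly_correlated"), snp_weights)
```

Under the independent structure the 6 × 6 matrix is block diagonal, so a SNP's effect on falls carries no information about its effect on fractures. Under the perfectly correlated structure every block equals the SNP weight matrix, so each SNP's random effect is identical across the two outcomes.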

Simulations show that the proposed approaches control type I error rates well. Additionally, jointly testing multiple outcomes generally outperforms the ad hoc strategy of combining multiple single-outcome tests. Power depends on the true genetic signal strengths and directions, which are unknown in a practical analysis. We therefore suggest using Qomni in most practical testing situations.

In application to the UKB dataset, the joint tests highlight genes that have previously been identified as important in bone health. As noted previously, some of these genes, such as MSX1, TRAM1, and MESDC2, are not ranked highly when only a single outcome is analyzed at a time. Our methodology is publicly available in the SIMICO package, which stands for Set-based Inference with Multiple Interval-Censored Outcomes. The package offers many customization options, including support for different numbers of outcomes.

Some limitations of this study should be mentioned. While our extensive simulation studies encompass a wide range of scenarios, we recognize that the full complexity of real genetic data is challenging to capture. The true effect sizes, correlation structure, and number of causal variants are often unknown and difficult to replicate precisely. Because genetic simulations have many adjustable parameters, empirical results should be used only as a guide to, not a guarantee of, performance in real data analyses. Similarly, data application results from a single study should always be interpreted with caution. Challenges such as model misspecification or unmeasured confounding can never be totally eliminated and may produce misleading findings. Further, all genetic associations should be validated in functional follow-up experiments.

There are many interesting possible extensions of this work. First, it would be interesting to incorporate functional annotation information as weights in the model to increase power for detecting risk genes. Additionally, we are interested in exploring more accurate approximations of the log of the baseline cumulative hazard function. Finally, it is of interest to develop methods for selecting individual causal SNPs from within significant SNP sets.

Supplementary Material

ACKNOWLEDGEMENTS

This research has been conducted using the UK Biobank Resource under Application Numbers 52008 and 73569. This project is supported by NIH T32 training grant award 5T32CA096520-16, NIH grant R03DE029238, and NIH/NCI Cancer Center Support Grant (CCSG) #P30CA016672. We would also like to thank the Associate Editor and two reviewers for their detailed review and helpful comments which have strengthened this work.

References

1. Allen N, Sudlow C, Downey P, et al. UK Biobank: Current status and what it means for epidemiology. Health Policy Technol. 2012; 1(3): 123–126.
2. Gaziano JM, Concato J, Brophy M, et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J Clin Epidemiol. 2016; 70: 214–223.
3. All of Us Research Program Investigators. The “All of Us” research program. N Engl J Med. 2019; 381(7): 668–676.
4. Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018; 562(7726): 203–209.
5. Littlejohns TJ, Holliday J, Gibson LM, et al. The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nat Commun. 2020; 11(1): 1–12.
6. Sun J. The Statistical Analysis of Interval-censored Failure Time Data. New York: Springer; 2006.
7. Dutta D, Scott L, Boehnke M, Lee S. Multi-SKAT: General framework to test for rare-variant association with multiple phenotypes. Genet Epidemiol. 2019; 43(1): 4–23.
8. Wu B, Pankow JS. Sequence kernel association test of multiple continuous phenotypes. Genet Epidemiol. 2016; 40(2): 91–100.
9. Gómez G, Calle ML, Oller R, Langohr K. Tutorial on methods for interval-censored data and their implementation in R. Stat Modelling. 2009; 9(4): 259–297.
10. Morris JA, Kemp JP, Youlten SE, et al. An atlas of genetic influences on osteoporosis in humans and mice. Nat Genet. 2019; 51(2): 258–266.
11. Radke BR. A demonstration of interval-censored survival analysis. Prev Vet Med. 2003; 59(4): 241–256.
12. Rodrigues AS, Calsavara VF, Silva FI, Alves FA, Vivas AP. Use of interval-censored survival data as an alternative to Kaplan-Meier survival curves: studies of oral lesion occurrence in liver transplants and cancer recurrence. Applied Cancer Research. 2018; 38(1): 1–10.
13. Sun R, Zhu L, Li Y, Yasui Y, Robison L. Inference for set-based effects in genetic association studies with interval-censored outcomes. Biometrics. 2023; 79(2): 1573–1585.
14. Wu D, Li C, Lu Q. Multi-marker genetic association and interaction tests with interval-censored survival outcomes. Genet Epidemiol. 2021; 45(8): 860–873.
15. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011; 89(1): 82–93.
16. Li X, Yung G, Zhou H, et al. A multi-dimensional integrative scoring framework for predicting functional variants in the human genome. Am J Hum Genet. 2022; 109(3): 446–456.
17. Turley P, Walters RK, Maghzian O, et al. MTAG: multi-trait analysis of GWAS. BioRxiv. 2017: 118810.
18. Lin X. Variance component testing in generalised linear models with random effects. Biometrika. 1997; 84(2): 309–326.
19. Zhang Z, Sun L, Zhao X, Sun J. Regression analysis of interval-censored failure time data with linear transformation models. Can J Stat. 2005; 33(1): 61–70.
20. Cheng S, Wei L, Ying Z. Analysis of transformation models with censored data. Biometrika. 1995; 82(4): 835–845.
21. Royston P, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002; 21(15): 2175–2197.
22. Pan J, Thompson R. Gauss-Hermite quadrature approximation for estimation in generalised linear mixed models. Comput Stat. 2003; 18(1): 57–78.
23. Pinheiro JC, Chao EC. Efficient Laplacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. J Comput Graph Stat. 2006; 15(1): 58–81.
24. Lin X, Breslow NE. Bias correction in generalized linear mixed models with multiple components of dispersion. J Am Stat Assoc. 1996; 91(435): 1007–1016.
25. Liu Y, Chen S, Li Z, Morrison AC, Boerwinkle E, Lin X. ACAT: A fast and powerful p Value combination method for rare-variant analysis in sequencing studies. Am J Hum Genet. 2019; 104(3): 410–421.
26. Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. BMJ. 1995; 310(6973): 170.
27. Office of the Surgeon General (US). Bone Health and Osteoporosis: A Report of the Surgeon General. 2004.
28. Williams SA, Chastek B, Sundquist K, et al. Economic burden of osteoporotic fractures in US managed care enrollees. Am J Manag Care. 2020; 26(5): e142–e149.
29. Burge R, Dawson-Hughes B, Solomon DH, Wong JB, King A, Tosteson A. Incidence and economic burden of osteoporosis-related fractures in the United States, 2005–2025. J Bone Miner Res. 2007; 22(3): 465–475.
30. Kemp JP, Morris JA, Medina-Gomez C, et al. Identification of 153 new loci associated with heel bone mineral density and functional involvement of GPC6 in osteoporosis. Nat Genet. 2017; 49(10): 1468.
31. Trajanoska K, Seppala LJ, Medina-Gomez C, et al. Genetic basis of falling risk susceptibility in the UK Biobank Study. Commun Biol. 2020; 3(1): 1–10.
32. Krall EA, Dawson-Hughes B. Smoking increases bone loss and decreases intestinal calcium absorption. J Bone Miner Res. 1999; 14(2): 215–220.
33. Kim MJ, Shin R, Oh HK, Park JW, Jeong SY, Park JG. The impact of heavy smoking on anastomotic leakage and stricture after low anterior resection in rectal cancer patients. World J Surg. 2011; 35(12): 2806–2810.
34. Zerbino DR, Achuthan P, Akanni W, et al. Ensembl 2018. Nucleic Acids Res. 2018; 46(D1): D754–D761.
35. Goto N, Fujimoto K, Fujii S, et al. Role of MSX1 in osteogenic differentiation of human dental pulp stem cells. Stem Cells Int. 2016; 2016.
36. Orestes-Cardoso S, Nefussi JR, Lezot F, et al. Msx1 is a regulator of bone formation during development and postnatal growth: in vivo investigations in a transgenic mouse model. Connect Tissue Res. 2002; 43(2–3): 153–160.
37. Del Real A, Pérez-Campo FM, Fernández AF, et al. Differential analysis of genome-wide methylation and gene expression in mesenchymal stem cells of patients with fractures and osteoarthritis. Epigenetics. 2017; 12(2): 113–122.
38. Mo XB, Lu X, Zhang YH, Zhang ZL, Deng FY, Lei SF. Gene-based association analysis identified novel genes associated with bone mineral density. PLoS One. 2015; 10(3): e0121811.
39. Moosa S, Yamamoto GL, Garbes L, et al. Autosomal-recessive mutations in MESD cause osteogenesis imperfecta. Am J Hum Genet. 2019; 105(4): 836–843.
