Published in final edited form as: Biometrics. 2017 Mar 10;73(2):551–561. doi: 10.1111/biom.12624

Hypothesis testing in functional linear models

Yu-Ru Su 1, Chong-Zhi Di 1,*, Li Hsu 1
PMCID: PMC5518697  NIHMSID: NIHMS874194  PMID: 28295175

Summary

Functional data arise frequently in biomedical studies, where it is often of interest to investigate the association between functional predictors and a scalar response variable. While functional linear models (FLM) are widely used to address these questions, hypothesis testing for the functional association in the FLM framework remains challenging. A popular approach to testing the functional effects is through dimension reduction by functional principal component (PC) analysis. However, its power performance depends on the choice of the number of PCs, and is not systematically studied. In this paper, we first investigate the power performance of the Wald-type test with varying thresholds in selecting the number of PCs for the functional covariates, and show that the power is sensitive to the choice of thresholds. To circumvent the issue, we propose a new method of ordering and selecting principal components to construct test statistics. The proposed method takes into account both the association with the response and the variation along each eigenfunction. We establish its theoretical properties and assess the finite sample properties through simulations. Our simulation results show that the proposed test is more robust against the choice of threshold while being as powerful as, and often more powerful than, the existing method. We then apply the proposed method to the cerebral white matter tracts data obtained from a diffusion tensor imaging tractography study.

1. Introduction

With functional data becoming increasingly common in many scientific studies, functional data analysis (FDA; Ramsay and Silverman, 2005) has been an important area of research. In these studies, a common question is how to quantify the relationship between functional/longitudinal covariates and scalar responses. Functional linear models (FLM; Ramsay and Dalzell, 1991) have been widely used to address such questions, as they allow for a dynamic association between the response and the functional covariate at different points of the support under consideration.

Testing the association of functional covariates with the response is of great interest as it provides an overall assessment of the association; however, it remains challenging due to the infinite dimensionality of functional covariates. To overcome this issue, a natural strategy is to reduce the dimension. For FLM it is popular to represent functional covariates and the coefficient function by linear combinations of a set of basis functions, such as a pre-specified basis system like B-splines, Fourier or wavelet bases (James, 2002; Goldsmith et al., 2011, 2013), or data-adaptive basis functions from functional principal component analysis (FPCA; Cardot et al., 1999; Yao et al., 2005b; Goldsmith et al., 2011, 2013). Under such representations, the testing problem in FLM reduces to hypothesis testing under a classical linear model or linear mixed effects model.

We focus on testing procedures based on FPCA, since it provides a parsimonious representation of functional data and is widely used. Several approaches have been proposed to test the leading principal components in FPCA for the associations using the covariance-based test (Cardot et al., 2003; Horváth and Kokoszka, 2012), or classical tests such as Wald, score or likelihood ratio (Kong et al., 2013; Swihart et al., 2014). All of these works require pre-specifying a threshold to choose the leading principal components (PCs) for inclusion in the test, where PCs are ranked based on eigenvalues, or equivalently the percentage of variance explained (PVE) for the functional covariates. However, as we will demonstrate, the PVE alone is not an optimal criterion for the purpose of testing, because the power is sensitive to the choice of the threshold. Different thresholds often lead to very different p-values and therefore inconsistent conclusions (e.g., the diffusion tensor imaging example in Section 5), making the statistical inference confusing and difficult to interpret for practitioners. In the literature, tests based on pre-specified spline basis representations have also been proposed, for example a permutation F-test (Ramsay et al., 2009) and restricted likelihood ratio tests (Swihart et al., 2014). The former requires a relatively high computational cost, and the latter have been shown to be outperformed by FPCA-based tests in terms of power (Swihart et al., 2014). Thus, we focus on FPCA-based approaches in this paper.

In this paper, we propose a novel testing procedure that orders and selects the PCs based on an association-variation index (AVI) that combines both the variation and the association along each direction. The AVI is directly related to the noncentrality parameter in power functions, so using the AVI to select the PCs is more desirable for the testing purpose and can potentially improve power. Compared to existing tests, the proposed procedure is more robust to the choice of tuning parameters (threshold values used to choose the number of PCs), while enjoying power gains with a relatively small number of selected PCs in many scenarios. In addition, we provide a comprehensive power study of classical procedures, as, to the best of our knowledge, there is no systematic study that evaluates how the power of classical tests depends on the tuning parameters in FPCA, such as the number of PCs or the PVE threshold.

The rest of the paper is organized as follows. In Section 2, we review existing FLM works, and discuss potential issues on classical testing procedures. We then introduce our proposed testing procedure and establish the asymptotic properties of the proposed test in Section 3. Simulation studies and results are presented in Section 4, followed by a data example of diffusion tensor imaging study in Section 5. Conclusions and future directions are discussed in Section 6.

2. Classical testing procedures and power considerations

2.1 Notation, models and classical testing procedures

We consider the setting with a scalar response Y and a functional covariate X(t) ∈ L2(𝒯) defined on a compact support 𝒯 ⊆ ℝ. Assume that the observed sample consists of n independent subjects, and denote by {Yi, Xi(·)}, i = 1, …, n, the i.i.d. realizations of {Y, X(·)}. We consider the case where Xi(·) is measured on a set of dense and regular grid points {tj ∈ 𝒯 : j = 1, …, J}. The scenario in which Xi(·) is measured intermittently with error will be discussed in Section 3.

The scalar-on-function FLM model being considered has the following form,

Y = \alpha_0 + \int_{\mathcal{T}} \beta(t) X(t) \, dt + \varepsilon,    (1)

where β(t) ∈ L2(𝒯) is a smooth square integrable coefficient function, α0 ∈ ℝ is the intercept and ε is an error term with mean 0 and variance σ_ε². We further assume that ε is independent of X(·). Under model (1), β(·) describes the effect of the functional covariate X(·) on Y along t in the support 𝒯. The statistical problem of interest in this paper focuses on hypothesis testing for

H_0: \beta(t) = 0 \ \text{ for all } t \in \mathcal{T} \quad \text{vs.} \quad H_a: \beta(t) \neq 0 \ \text{ for } t \text{ in some open subset of } \mathcal{T}.

This corresponds to testing the global null effect of the coefficient function β(·).

To model the coefficient function β(·) nonparametrically, a popular approach is to employ FPCA-based expansions. Specifically, the covariance function of the random function X(·) has a spectral decomposition, G(s, t) = cov{X(s), X(t)} = Σ_{k=1}^∞ λ_k ϕ_k(s)ϕ_k(t), where the λ_k's are non-negative eigenvalues ranked in nonincreasing order with Σ_{k=1}^∞ λ_k < ∞, and the ϕ_k(·)'s are the corresponding orthonormal eigenfunctions. Using the Karhunen-Loève expansion, the random function X(t) can be expressed as X(t) = μ(t) + Σ_{k=1}^∞ ξ_k ϕ_k(t), where μ(·) denotes the mean function of X(·) and the ξ_k's are principal component scores with E(ξ_k) = 0, var(ξ_k) = λ_k, and E(ξ_k ξ_k′) = 0 for k ≠ k′. The scores can be calculated as ξ_k = ∫𝒯 {X(t) − μ(t)}ϕ_k(t) dt. Since the eigenfunctions provide a set of basis functions for the functional space, the coefficient function can be represented as β(t) = Σ_{k=1}^∞ β_k ϕ_k(t), where β_k = ∫𝒯 β(t)ϕ_k(t) dt. Applying these expansions to both Xi(·) and β(·), the FLM (1) can be re-written as

Y = \alpha_0^{*} + \sum_{k=1}^{\infty} \beta_k \xi_k + \varepsilon,    (2)

where α_0^* = α_0 + ∫𝒯 μ(t)β(t) dt. In practice, a few leading principal components (PCs) often suffice to approximate the process X(·) well. Thus, a truncated FLM using the first K PCs is

Y = \alpha_0^{*} + \sum_{k=1}^{K} \beta_k \xi_k + \zeta,    (3)

where ζ = Σ_{k=K+1}^∞ β_k ξ_k + ε. Under this model, the null hypothesis becomes β1 = ··· = βK = 0. The error term ζ has mean 0 and a larger variance σ_ζ² = Σ_{k=K+1}^∞ λ_k β_k² + σ_ε². A common approach to determine the number of PCs, K, is based on the PVE, which depends directly on the eigenvalues. For example, given a threshold γ (e.g., 95%), K is defined as the smallest integer satisfying Σ_{k=1}^K λ_k / Σ_{k=1}^∞ λ_k ≥ γ, and is finite because Σ_{k=1}^∞ λ_k = ∫𝒯 var{X(t)} dt < ∞.
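To make the FPCA step and the PVE-based truncation concrete, the following is a minimal sketch (not code from the paper; the function name and the equally spaced grid are our assumptions) that estimates eigenvalues, eigenfunctions, and PC scores from densely observed curves and picks the smallest K whose cumulative PVE reaches a threshold.

```python
import numpy as np

def fpca_pve(X, t, pve=0.95):
    """Estimate eigenvalues, eigenfunctions, and PC scores from densely
    observed curves X (n x J) on an equally spaced grid t, and choose the
    smallest K whose cumulative PVE reaches the threshold `pve`."""
    n, J = X.shape
    dt = t[1] - t[0]                         # assumes an equally spaced grid
    mu = X.mean(axis=0)
    Xc = X - mu
    G = (Xc.T @ Xc) / n                      # discretized covariance G(s, t)
    evals, evecs = np.linalg.eigh(G * dt)    # eigenproblem of the integral operator
    evals, evecs = evals[::-1], evecs[:, ::-1]
    evals = np.clip(evals, 0.0, None)
    phi = evecs / np.sqrt(dt)                # eigenfunctions: int phi_k(t)^2 dt = 1
    K = int(np.searchsorted(np.cumsum(evals) / evals.sum(), pve)) + 1
    xi = Xc @ phi[:, :K] * dt                # scores: int {X_i(t) - mu(t)} phi_k(t) dt
    return evals[:K], phi[:, :K], xi, mu
```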

Classical testing procedures based on the truncated model (3) are widely used in the literature, with asymptotic properties formally studied by Kong et al. (2013). It has been shown that eigenvalues, eigenfunctions, and PC scores can be consistently estimated via FPCA from the observed data (Hall and Hosseini-Nasab, 2006; Zhu et al., 2014). The least squares estimate β̂_k, obtained from fitting model (3) with ξ_k substituted by ξ̂_k, is β̂_k = (ξ̂_kᵀ ξ̂_k)⁻¹ ξ̂_kᵀ Y. It can be shown that the asymptotic variance of β̂_k is σ_ζ²/(nλ_k), and the β̂_k's are asymptotically independent of each other. The Wald-type test statistic is thus defined as

T_c = \sum_{k=1}^{K_n} \frac{\hat{\beta}_k^2}{\widehat{\operatorname{var}}(\hat{\beta}_k)} = \frac{1}{\hat{\sigma}_\zeta^2} \sum_{k=1}^{K_n} \frac{\mathbf{Y}^{\mathrm{T}} \hat{\boldsymbol{\xi}}_k \hat{\boldsymbol{\xi}}_k^{\mathrm{T}} \mathbf{Y}}{\hat{\boldsymbol{\xi}}_k^{\mathrm{T}} \hat{\boldsymbol{\xi}}_k},    (4)

where Kn is the number of selected PCs based on estimated eigenvalues, and σ^ζ2 is the estimated error variance. As shown in Kong et al. (2013), the null distribution of Tc and a χ2-distribution with Kn degrees of freedom are asymptotically equivalent under some regularity conditions.
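A minimal sketch of statistic (4), assuming the scores of the K_n selected PCs have already been estimated (the particular degrees-of-freedom correction in the variance estimate is our choice, not specified in the text):

```python
import numpy as np
from scipy import stats

def classical_wald_test(Y, xi_hat):
    """Classical Wald-type statistic T_c in (4), given Y (length n) and the
    estimated scores xi_hat (n x K_n) of the PCs selected by a PVE threshold."""
    n, K = xi_hat.shape
    Yc = Y - Y.mean()                              # center Y to absorb the intercept
    ss = (xi_hat ** 2).sum(axis=0)                 # xi_k' xi_k for each component
    beta_hat = (xi_hat.T @ Yc) / ss                # componentwise least squares
    resid = Yc - xi_hat @ beta_hat
    sigma2_zeta = resid @ resid / (n - K - 1)      # estimated error variance
    Tc = ((xi_hat.T @ Yc) ** 2 / ss).sum() / sigma2_zeta
    pval = stats.chi2.sf(Tc, df=K)                 # asymptotic chi-square_{K_n} reference
    return Tc, pval
```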

2.2 Power considerations

The truncated model (3) effectively reduces dimensionality and allows for statistical inference using a classical linear model framework. However, the impact of tuning parameter selection (the number of PCs, or equivalently the PVE threshold) on hypothesis testing, especially statistical power, is not well understood. For example, commonly used thresholds to determine the truncated model by the PVE are 80%, 90%, 95%, and 99%. The leading PCs are selected corresponding to directions that explain the largest variation. From an estimation perspective, a higher PVE threshold generally results in a better approximation of X(t) and β(t). However, as will be demonstrated, a higher PVE threshold does not necessarily lead to a higher power in hypothesis testing.

We now study how the power of classical tests depends on the choice of the PVE threshold or the number of PCs selected. Consider a specific alternative hypothesis Ha : β(t) = βa(t), where βa(t) ≠ 0 for t in some open subset of 𝒯. It can be shown that the distribution of Tc and a noncentral χ²-distribution with K_n degrees of freedom and noncentrality parameter η, denoted χ²_{K_n}(η), are asymptotically equivalent (Müller and Stadtmüller, 2005, Theorem 4.1), where

\eta = \frac{n}{\sigma_\zeta^2} \sum_{k=1}^{K_n} \lambda_k \beta_{ka}^2, \quad \text{where } \beta_{ka} = \int_{\mathcal{T}} \beta_a(t) \phi_k(t) \, dt.    (5)

Consequently, the power function of Tc is approximated by Pr{χ²_{K_n}(η) ≥ q_{K_n, 1−α}}, where q_{K_n, 1−α} is the (1 − α) quantile of the central χ²_{K_n} distribution.

The formula above clearly demonstrates that the power contribution from the kth component involves both the eigenvalue λ_k and the magnitude of association β_k. Among all PCs, the one that contributes the most power is not the first PC with the largest λ_k, but rather the component with the largest value of λ_k β_k². Similarly, to maximize power among all truncated models with K components (i.e., fixing the degrees of freedom to be K), the optimal procedure chooses the components with the largest values of λ_k β_k². In contrast, classical procedures choose the first K components with the largest eigenvalues λ_k, and thus might not perform well in terms of power, especially when PCs with small variation are strongly associated with the response or when leading PCs are not associated with the outcome. The potential issue of using the leading PCs for functional regression was also briefly mentioned by Zhu et al. (2014), but they focused on estimation rather than hypothesis testing.

To illustrate this phenomenon, we show a numerical example below. The eigenfunctions are 6 Fourier basis functions ϕ_k(t), with corresponding eigenvalues 1, 0.2, 0.15, 0.1, 0.075, and 0.06, respectively. The true coefficient function is β(t) = 0.03ϕ1(t) + 0.03ϕ2(t) + 0.06ϕ3(t) + 0.24ϕ4(t) + 0.03ϕ5(t) + 0.03ϕ6(t). Figure (1a) visualizes λ_k, β_k, and λ_k β_k² for each PC. In this example, the fourth PC is most strongly associated with the response. To explore how the PVE threshold affects the power performance, we considered a set of thresholds 50%, 70%, 80%, 90%, 95%, and 99%, corresponding to 1 to 6 selected PCs in the truncated models, respectively. Figure (1b) shows the PVE and power as functions of the number of included PCs. For the first three thresholds, the corresponding powers are 0.167, 0.162, and 0.186, respectively. The power increases substantially to 0.609 when the threshold increases to 90%, under which the fourth PC is additionally selected in the model. If the threshold further increases to 95% and 99%, however, the power decreases to 0.582 and 0.540, respectively. This demonstrates that the choice of the PVE threshold is critical to power performance and that a higher PVE threshold does not imply higher power. On the other hand, it is possible for an alternative procedure to achieve higher power than any of the six truncated models under consideration by the classical procedure. In fact, if one includes only the first and the fourth PCs, the corresponding power is 0.666, superior to the highest power 0.609 attained under any PVE threshold. This example shows that eigenvalues alone do not provide the optimal criterion for ranking and selecting the PCs for the testing purpose.
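Power curves such as the one in Figure 1b are easy to compute from the noncentral chi-square approximation. The sketch below is our own illustration (n and σ_ζ² must be supplied and are not stated for this display, so exact numbers will differ); looping it over K traces the power as a function of the number of included PCs.

```python
import numpy as np
from scipy import stats

def classical_power(lams, betas, n, sigma2_zeta, K, alpha=0.05):
    """Approximate power of the classical test with the first K PCs, using the
    noncentrality parameter eta in (5).  sigma2_zeta is treated as given,
    although in the model it also depends on the omitted components."""
    lams, betas = np.asarray(lams, float), np.asarray(betas, float)
    eta = n / sigma2_zeta * np.sum(lams[:K] * betas[:K] ** 2)
    q = stats.chi2.ppf(1 - alpha, df=K)            # critical value of chi^2_K
    return stats.ncx2.sf(q, df=K, nc=eta)          # Pr{ chi^2_K(eta) >= q }
```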

Figure 1. An illustrative example: functional regression model for a scalar response with a functional covariate consisting of 6 Fourier basis functions. Left: eigenvalues (λ), magnitudes of association (β), and contributions of each PC to the power function. Right: the cumulative percentage of variation explained and the power of the corresponding tests versus the number of PCs selected.

3. A new selection criterion and testing procedure

3.1 The proposed procedure

To incorporate the association between the covariate and the response into the selection criterion, we hereby propose an association-variation index (AVI)

V_k = \lambda_k \beta_k^2, \quad k = 1, 2, \ldots,

for each principal component. The AVI is motivated by the noncentrality parameter shown in (5) and is closely related to the coefficient of determination in the linear model. First, the power of the classical method demonstrated in the previous section depends on the noncentrality parameter η. Since λ_k and β_k affect the value of η only through λ_k β_k², it is natural to consider the AVI defined above. Second, it can be shown that the AVI V_k = cov²(ξ_k, Y)/λ_k is proportional to the coefficient of determination R_k² = corr²(ξ_k, Y), the proportion of the variation of Y explained by ξ_k. By taking the association β_k into account in the selection criterion, the AVI captures the most important directions for hypothesis testing in FLM, while still keeping the amount of variation along each direction in consideration.

However, unlike the selection criterion based on the variation λ_k, which can be estimated from the covariate process X(·) alone, V_k involves the unknown parameter β_k, which also depends on the outcome. To obtain β̂_k, we propose to pre-fit the truncated model (3) with a very high PVE threshold, say 99%. This step is used to control the total number of PCs to be considered. The PVE threshold for the pre-fit model might affect the detection power. Based on our numerical experience (Web Figure W3), a high PVE threshold is recommended, as it provides more comprehensive information for the subsequent selection procedure. The AVIs can then be estimated by V̂_k = λ̂_k β̂_k². Suppose that there are Cn principal components obtained from the pre-fitted truncated model.
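A small sketch of this estimation step, assuming the scores and eigenvalues of the Cn components from the pre-fitted model are available (the function name and interface are hypothetical):

```python
import numpy as np

def estimate_avi(Y, xi_hat, lam_hat):
    """Estimated AVIs  V_hat_k = lambda_hat_k * beta_hat_k^2  for the C_n PCs
    kept by the pre-fitted truncated model (e.g., with a 99% PVE threshold)."""
    Yc = Y - Y.mean()
    beta_hat = (xi_hat.T @ Yc) / (xi_hat ** 2).sum(axis=0)   # componentwise LS
    return lam_hat * beta_hat ** 2
```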

We propose a Wald-type test statistic similar to (4), but construct it based on the (nonincreasing) order statistics V̂_(k) of {V̂_k, k = 1, …, Cn}. Before presenting the test statistic, we introduce a new measure, the percentage of association-variation explained (PAVE), to describe how well the truncated model approximates the FLM. The idea of PAVE is comparable to that of PVE, except that it depends on the AVIs rather than on λ_k only. For a given positive integer K ≤ Cn, the corresponding PAVE is

\sum_{k=1}^{K} \hat{V}_{(k)} \Big/ \sum_{k=1}^{C_n} \hat{V}_{(k)}.

This measure determines how many eigenfunctions should be included. Given a threshold of PAVE, say γ, the test statistic based on the estimated AVI is defined as

T = \sum_{k=1}^{K_n} \frac{\hat{\beta}_{(k)}^2}{\widehat{\operatorname{var}}(\hat{\beta}_{(k)})} = \frac{1}{\hat{\sigma}_\zeta^2} \sum_{k=1}^{K_n} \frac{\mathbf{Y}^{\mathrm{T}} \hat{\boldsymbol{\xi}}_{(k)} \hat{\boldsymbol{\xi}}_{(k)}^{\mathrm{T}} \mathbf{Y}}{\hat{\boldsymbol{\xi}}_{(k)}^{\mathrm{T}} \hat{\boldsymbol{\xi}}_{(k)}},

where

K_n = \min\Big\{ K \leq C_n : \sum_{k=1}^{K} \hat{V}_{(k)} \Big/ \sum_{k=1}^{C_n} \hat{V}_{(k)} \geq \gamma \Big\}.    (6)

T is proportional to the sum of the Kn largest values among the Cn estimated AVIs. Intuitively, the proposed testing procedure orders the principal components by the AVI and utilizes information along the directions associated with large AVI.
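Putting the pieces together, a sketch of the proposed procedure (reusing the hypothetical estimate_avi above; the variance estimator's degrees-of-freedom correction is our choice) orders the components by estimated AVI, selects K_n by the PAVE rule (6), and forms T:

```python
import numpy as np

def proposed_test_stat(Y, xi_hat, lam_hat, gamma=0.85):
    """Proposed statistic T: order PCs by estimated AVI and keep the smallest
    K_n whose PAVE reaches gamma; xi_hat is n x C_n from the pre-fitted model."""
    n, Cn = xi_hat.shape
    Yc = Y - Y.mean()
    V_hat = estimate_avi(Y, xi_hat, lam_hat)
    order = np.argsort(V_hat)[::-1]                       # nonincreasing AVIs
    pave = np.cumsum(V_hat[order]) / V_hat.sum()
    Kn = int(np.searchsorted(pave, gamma)) + 1            # definition (6)
    keep = order[:Kn]
    beta_hat = (xi_hat.T @ Yc) / (xi_hat ** 2).sum(axis=0)
    resid = Yc - xi_hat @ beta_hat                        # residuals from pre-fit model
    sigma2_zeta = resid @ resid / (n - Cn - 1)
    ss = (xi_hat[:, keep] ** 2).sum(axis=0)
    T = ((xi_hat[:, keep].T @ Yc) ** 2 / ss).sum() / sigma2_zeta
    return T, Kn, Cn
```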

3.2 Asymptotic distributions under the null and alternatives

The randomness of β̂k involved in the proposed selection criterion induces complications in deriving the null distribution of T. As a result, the classical χ2 distribution does not hold. We investigate the asymptotic null distribution of T herein. Required regularity conditions are listed in Assumptions C1–C5 in Web Appendix A. There are two approaches to determining the number of eigenfunctions Kn in the testing procedure. One is to pre-specify a positive integer (≤ Cn), and the other is to set a tuning parameter for the PAVE as shown in (6). We present the asymptotic null distributions corresponding to both approaches.

Theorem 1

Denote by Cn the number of selected PCs from a pre-fitted model, and by Z_(1), …, Z_(Cn) the order statistics (in decreasing order) of Cn i.i.d. χ²₁ random variables Z1, …, Z_Cn. Assume that regularity conditions C1–C4 and the null hypothesis hold.

  1. Given Kn = K, the distributions of T and Σ_{k=1}^K Z_(k) are asymptotically equivalent.

  2. Let γ ∈ (0, 1) be a pre-specified value of the PAVE. The distribution of T and that of Σ_{k=1}^{K*} Z_(k) are asymptotically equivalent, where K*, a random quantity, satisfies Σ_{k=1}^{K*−1} Z_(k) / Σ_{k=1}^{Cn} Z_(k) < γ and Σ_{k=1}^{K*} Z_(k) / Σ_{k=1}^{Cn} Z_(k) ≥ γ.

The proof of Theorem 1 is included in Web Appendix B. From Theorem 1, the asymptotic null distribution of T is affected not only by the values but also by the order of the estimated AVIs. Using the usual χ²_K reference distribution for the Wald-type statistic T would inflate the type I error, since T has a heavier right tail. Although Theorem 1 explicitly gives the asymptotic null distribution of T, there is unfortunately no closed form, since it involves order statistics. However, quantiles and tail probabilities of the null distribution of T can be approximated empirically by very fast Monte Carlo simulation. The remark below gives a special case in which an analytical distribution exists, so that the quantiles and tail probabilities can be easily obtained with existing software.
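Such a Monte Carlo approximation of the Theorem 1 null distribution might look like the following sketch (the number of replicates and the interface are our choices); it covers both the fixed-K_n case and the PAVE-threshold case.

```python
import numpy as np

def mc_null_pvalue(T_obs, Kn, Cn, gamma=None, n_mc=100_000, seed=1):
    """Approximate p-value of T under the Theorem 1 null distribution:
    sums of the largest order statistics of C_n i.i.d. chi^2_1 variables.
    If gamma is given, K* is chosen per replicate by the PAVE rule; otherwise
    Kn is treated as fixed."""
    rng = np.random.default_rng(seed)
    Z = np.sort(rng.chisquare(df=1, size=(n_mc, Cn)), axis=1)[:, ::-1]
    csum = np.cumsum(Z, axis=1)
    if gamma is None:
        T_null = csum[:, Kn - 1]                       # sum of the Kn largest values
    else:
        frac = csum / csum[:, -1:]                     # cumulative fractions
        K_star = (frac < gamma).sum(axis=1) + 1        # first K with fraction >= gamma
        T_null = csum[np.arange(n_mc), K_star - 1]
    return float((T_null >= T_obs).mean())
```

For Kn pre-specified as 1, this Monte Carlo approximation agrees with the closed form given in Remark 1 below.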

Remark 1

When Kn is pre-specified as 1, the CDF of T under the null hypothesis can be approximated by [F_{χ²₁}(t)]^{Cn}, where F_{χ²₁} denotes the CDF of a χ²₁ random variable.

To understand the performance of the proposed method more comprehensively, we also study the asymptotic distribution under an alternative hypothesis and investigate the power of the testing procedure. For a given PAVE threshold γ ∈ (0, 1), we denote by K_n⁰ a positive integer that satisfies

\sum_{k=1}^{K_n^0 - 1} V_{(k)} \Big/ \sum_{k=1}^{C_n} V_k < \gamma \quad \text{and} \quad \sum_{k=1}^{K_n^0} V_{(k)} \Big/ \sum_{k=1}^{C_n} V_k \geq \gamma.

Below we consider an alternative hypothesis Ha : β(·) ≠ 0 for a certain known β(·) = Σ_{k=1}^∞ β_k ϕ_k(·). The asymptotic distribution of T under the alternative hypothesis, as n → ∞, is presented in the following theorem; a sketch of the proof is provided in Web Appendix C.

Theorem 2

Assume that regularity conditions C1–C5 hold. Denote by {V_(k)} the order statistics of V1, …, V_Cn and let η_n = (n/σ_ζ²) Σ_{k=1}^{K_n⁰} V_(k). Under Ha, the proposed test statistic T has the following asymptotic distribution:

\frac{T - (K_n + \eta_n)}{\sqrt{2(K_n + 2\eta_n)}} \xrightarrow{\ D\ } N(0, 1), \quad \text{as } n \to \infty.    (7)

Theorem 2 can be used for sample size calculation when a desired level of power p* is specified. For a given sample size n, significance level α, PVE threshold for the pre-fit model, and specified alternative β(·) with associated V_k, the power can be approximated by 1 − Φ({q(α, K_n, C_n) − (K_n + η_n)} / √{2(K_n + 2η_n)}), where q(α, K_n, C_n) is the 100(1 − α)% quantile of the null distribution in Theorem 1. By incorporating an estimate of the covariance function from a pilot study and solving 1 − Φ({q(α, K_n, C_n) − (K_n + η_n)} / √{2(K_n + 2η_n)}) ≥ p*, researchers can determine an appropriate sample size for a desired level of power p*.
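A sketch of how this power and sample-size calculation could be carried out (the null quantile q_null would come from the Monte Carlo approximation above; the function names and search grid are ours):

```python
import numpy as np
from scipy import stats

def approx_power(n, sigma2_zeta, V, Kn0, q_null):
    """Normal approximation to the power from Theorem 2, given the AVIs V under
    a specified alternative, the corresponding K_n^0, and the null quantile
    q_null = q(alpha, K_n, C_n)."""
    V_top = np.sort(np.asarray(V, float))[::-1][:Kn0]    # Kn0 largest AVIs
    eta_n = n / sigma2_zeta * V_top.sum()
    z = (q_null - (Kn0 + eta_n)) / np.sqrt(2.0 * (Kn0 + 2.0 * eta_n))
    return 1.0 - stats.norm.cdf(z)

def min_sample_size(sigma2_zeta, V, Kn0, q_null, p_star=0.8):
    """Smallest n on a coarse grid achieving the target power p_star."""
    for n in range(50, 5001, 50):
        if approx_power(n, sigma2_zeta, V, Kn0, q_null) >= p_star:
            return n
    return None
```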

3.3 Other extensions

In practice, the functional covariate for the ith subject is often recorded intermittently at grid points t⃗i = (t_{i1}, …, t_{in_i}), and may also be subject to measurement error. Specifically, the observed functional covariate W⃗i = (W_{i1}, …, W_{in_i}) is expressed as W_{ij} = X_i(t_{ij}) + e_{ij}, where e_{ij} ~ (0, σ_e²) is an error term. The measurement error introduces additional variation in the observations and hence results in a covariance matrix cov(W⃗i) with (j1, j2)-element G(t_{ij1}, t_{ij2}) + σ_e² I(j1 = j2). Yao et al. (2005a) extended FPCA to accommodate measurement error. They proposed an approach called PACE (principal component analysis based on conditional expectations) to consistently estimate the eigenvalues, eigenfunctions and the error variance. If one uses the standard approach to estimate the PC scores, ∫{X(t) − μ(t)}ϕ_k(t) dt, the estimates will be contaminated by noise, and measurement error correction is therefore needed in the FLM regression. Regression calibration is a popular and effective approach for measurement error models (Carroll et al., 2006), which provides consistent estimators for regression coefficients in linear models. It substitutes ξ_k with its conditional expectation ξ̂_{ik} = Ê(ξ_{ik}|W⃗i), which is exactly the PACE estimator of the PC scores. Thus, plugging ξ̂_k into the regression model is in essence equivalent to implementing a regression calibration correction and yields consistent estimators for β_k. Consequently, the proposed testing procedure remains valid in the presence of measurement error, as long as the PACE estimators of the PC scores are used.
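For completeness, the conditional-expectation score for one subject takes the form ξ̂_i = Λ Φ_iᵀ Σ_{W_i}⁻¹ (W⃗i − μ_i) under Gaussian assumptions; the sketch below is our own paraphrase following the description of PACE in Yao et al. (2005a), with a hypothetical interface.

```python
import numpy as np

def pace_scores(W, mu, phi, lam, sigma2_e):
    """PC scores for one subject by conditional expectation (PACE-style):
    W, mu: observed values and mean at the subject's n_i grid points;
    phi:   n_i x K matrix of eigenfunctions at those points;
    lam:   length-K vector of eigenvalues; sigma2_e: error variance."""
    Lam = np.diag(lam)
    Sigma_W = phi @ Lam @ phi.T + sigma2_e * np.eye(len(W))   # cov(W_i)
    return Lam @ phi.T @ np.linalg.solve(Sigma_W, W - mu)     # E(xi_i | W_i)
```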

Another extension is to accommodate cases with additional baseline covariates. In the presence of d-dimensional baseline covariates Z, model (1) can be extended to

Y = \beta_0 + \mathbf{Z}^{\mathrm{T}} \boldsymbol{\alpha}_1 + \int_{\mathcal{T}} \beta(t) X(t) \, dt + \varepsilon,

where α1 ∈ ℝ^d contains the regression coefficients for the baseline covariates, and the error term ε is independent of Z and X(·). A truncated version analogous to (3) is

Y = \beta_0 + \mathbf{Z}^{\mathrm{T}} \boldsymbol{\alpha}_1 + \sum_{k=1}^{K} \beta_k \xi_k + \zeta.

The proposed testing procedure can be extended to this setting in a straightforward way.

4. Simulation studies

In this section, we study the finite sample performance of both the classical and proposed testing procedures under the FLM. The cases without and with measurement error are investigated in the following two subsections, respectively.

4.1 Functional covariates without measurement errors

For simplicity of presentation, we first consider a scenario where the functional covariates Xi(·), i = 1, …, n, are generated from 3 Fourier basis functions as

X_i(t) = \sum_{k=1}^{3} \xi_{ik} \phi_k(t),

where ξ_{ik} ~ i.i.d. N(0, λ_k) with λ1 = 1, λ2 = 0.5, and λ3 = 0.25. The three orthonormal Fourier basis functions are ϕ1(t) = sin(2πt)/√0.5, ϕ2(t) = cos(4πt)/√0.5, and ϕ3(t) = sin(4πt)/√0.5. Besides Xi(t), a binary baseline covariate Zi ~ B(1, 0.05) was included in the model as a potential confounder. The outcome Yi was generated from the model

Y_i = -3 + 0.5 Z_i + \int_0^1 \beta(t) X_i(t) \, dt + \varepsilon_i,    (8)

with εi ~ N(0, 1). The association function β(·) is formed from the same three Fourier basis functions as Xi(t), β(t) = a Σ_{k=1}^{3} β_k ϕ_k(t), where a = 0, 0.04, 0.08, and 0.12 controls the magnitude of β(·) across the support. To explore the performance of the testing methods under different scenarios, we consider the true association parameters (β1, β2, β3) to be (1, 1, 1), (1, 2, 0), (1, 2, 0), (0.5, 1, 1) and (0, 1, 2) for settings 1–5, respectively. Setting 1 corresponds to the scenario in which the functional effect is the same across the 3 directions, while settings 2 and 3 represent cases in which the direction with the smallest variation has null association. Settings 4 and 5 consider situations in which the directions with smaller variation are strongly associated with the response. These five settings cover a wide spectrum of combinations of variation and association, so that we can compare the performance of the classical and proposed approaches comprehensively. In addition, we consider an extreme Setting 6 with eigenvalues (λ1, λ2, λ3) = (1, 0.5, 0.01), where the third PC explains less than 1% of the variation and is the only direction associated with the response. The magnitudes of association along the three directions were set to (β1, β2, β3) = (0, 0, 5).
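As a concrete illustration, the data-generating mechanism of this subsection can be sketched as follows (the grid of 100 equally spaced points is our assumption for the error-free case; such a grid is only stated explicitly in Section 4.2):

```python
import numpy as np

def simulate_setting(n=1000, a=0.08, betas=(1.0, 1.0, 1.0),
                     lams=(1.0, 0.5, 0.25), J=100, seed=0):
    """Generate (t, X, Z, Y) from model (8) with 3 Fourier components."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, J)
    phi = np.vstack([np.sin(2 * np.pi * t),
                     np.cos(4 * np.pi * t),
                     np.sin(4 * np.pi * t)]) / np.sqrt(0.5)    # orthonormal basis
    xi = rng.normal(0.0, np.sqrt(lams), size=(n, 3))           # PC scores
    X = xi @ phi                                               # curves on the grid
    Z = rng.binomial(1, 0.05, size=n)                          # baseline covariate
    # by orthonormality, int beta(t) X_i(t) dt = a * sum_k beta_k * xi_ik
    Y = -3.0 + 0.5 * Z + a * (xi @ np.asarray(betas)) + rng.normal(0.0, 1.0, n)
    return t, X, Z, Y
```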

A total of 2000 datasets, each with n = 1000, were generated under each setting. We study the power performance under various choices of the number of PCs (NPCs) and the threshold γ of PVE or PAVE. We consider NPCs = 1, 2, and 3 for fixed NPCs, and set γ to range from 0.5 to 0.99 with an increment of 0.05 for fixed threshold levels. To our knowledge, the performance of the classical method has not been studied over a wide range of PVE thresholds in the literature. Our simulation provides not only a comparison between the classical and proposed approaches, but also a thorough investigation of the power of the classical method with respect to the choice of threshold. The simulation results are shown in Table 1 for fixed NPCs and Table 2 for fixed threshold levels of PVE/PAVE. For concise presentation, we show results for four selected threshold levels (0.5, 0.7, 0.85, and 0.99) in Table 2, while power curves over a refined grid of threshold levels are shown in Web Figure W1.

Table 1.

Simulation results of the classical (Tc) and the proposed (T) methods with fixed number of principal components based on 2000 Monte Carlo simulations. The rows with a = 0 stand for type I error rates of both methods under different scenarios.

Setting 1 Setting 2 Setting 3 Setting 4 Setting 5 Setting 6

a NPC Tc T Tc T Tc T Tc T Tc T Tc T
0 1 0.050 0.044 0.050 0.044 0.050 0.044 0.050 0.044 0.050 0.044 0.050 0.044
0 2 0.047 0.048 0.047 0.048 0.047 0.048 0.047 0.048 0.047 0.048 0.047 0.048
0 3 0.045 0.049 0.045 0.049 0.045 0.049 0.045 0.049 0.045 0.049 0.045 0.049

0.04 1 0.252 0.252 0.252 0.288 0.254 0.413 0.106 0.164 0.053 0.220 0.050 0.069
0.04 2 0.286 0.276 0.370 0.311 0.522 0.464 0.166 0.180 0.120 0.245 0.048 0.078
0.04 3 0.284 0.291 0.321 0.322 0.454 0.466 0.175 0.180 0.240 0.244 0.076 0.081

0.08 1 0.736 0.736 0.734 0.821 0.730 0.956 0.250 0.491 0.053 0.700 0.052 0.164
0.08 2 0.815 0.810 0.914 0.873 0.990 0.975 0.522 0.568 0.353 0.756 0.049 0.166
0.08 3 0.822 0.829 0.876 0.876 0.974 0.974 0.583 0.589 0.752 0.756 0.160 0.166

0.12 1 0.970 0.978 0.970 0.996 0.966 1.000 0.480 0.826 0.051 0.970 0.053 0.344
0.12 2 0.991 0.990 1.000 0.999 1.000 1.000 0.857 0.896 0.690 0.987 0.048 0.344
0.12 3 0.994 0.994 0.998 0.999 1.000 1.000 0.916 0.915 0.986 0.986 0.336 0.336

Table 2.

Empirical Type I error and power of the classical (Tc) and proposed (T) methods, based on 2000 Monte Carlo simulations. The number of PCs was selected based on various threshold choices, γ, for PVE in the former and PAVE in the latter. The rows with a = 0 correspond to Type I error rates, while those with a > 0 correspond to power under alternative hypotheses.

Setting 1 Setting 2 Setting 3 Setting 4 Setting 5 Setting 6

a γ Tc T Tc T Tc T Tc T Tc T Tc T
0 0.5 0.050 0.043 0.050 0.043 0.050 0.043 0.050 0.043 0.050 0.043 0.045 0.050
0 0.7 0.047 0.048 0.047 0.048 0.047 0.048 0.047 0.048 0.047 0.048 0.046 0.049
0 0.85 0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.047 0.046 0.053
0 0.99 0.045 0.049 0.045 0.049 0.045 0.049 0.045 0.049 0.045 0.049 0.046 0.055

0.04 0.5 0.252 0.266 0.252 0.296 0.254 0.432 0.106 0.172 0.053 0.227 0.046 0.102
0.04 0.7 0.286 0.276 0.370 0.306 0.522 0.456 0.166 0.175 0.120 0.244 0.046 0.106
0.04 0.85 0.286 0.284 0.370 0.318 0.522 0.460 0.166 0.180 0.120 0.244 0.046 0.100
0.04 0.99 0.284 0.291 0.321 0.321 0.454 0.465 0.175 0.179 0.240 0.244 0.046 0.100

0.08 0.5 0.736 0.771 0.734 0.835 0.730 0.962 0.250 0.532 0.053 0.712 0.045 0.310
0.08 0.7 0.815 0.810 0.914 0.869 0.990 0.971 0.522 0.566 0.353 0.752 0.045 0.297
0.08 0.85 0.815 0.824 0.914 0.878 0.990 0.973 0.522 0.582 0.353 0.758 0.045 0.298
0.08 0.99 0.822 0.829 0.876 0.876 0.974 0.974 0.583 0.589 0.752 0.755 0.045 0.298

0.12 0.5 0.970 0.987 0.970 0.996 0.966 1.000 0.480 0.868 0.051 0.969 0.044 0.634
0.12 0.7 0.991 0.992 1.000 0.998 1.000 1.000 0.857 0.906 0.690 0.984 0.044 0.605
0.12 0.85 0.991 0.993 1.000 0.999 1.000 1.000 0.857 0.912 0.690 0.986 0.044 0.610
0.12 0.99 0.994 0.994 0.998 0.999 1.000 1.000 0.916 0.915 0.986 0.986 0.044 0.620

The type I error rates under all settings are close to the nominal significance level 0.05. The power performance of the classical method depends strongly on the NPCs and the choice of γ. For example, the power of the classical method under Setting 2 with a = 0.08 increases from 0.734 to 0.914 and then drops to 0.876 as γ increases. A higher PVE threshold does not guarantee better power. This results from a trade-off between the information contained in the selected PCs and the degrees of freedom. In contrast, the power of the proposed method is more stable with respect to the NPCs and the threshold choice across different scenarios. In addition, we make the following observations: 1) The proposed method has power comparable to the classical method when the PCs with large variation have stronger association, although some power loss is observed in a few cases. For example, when two PCs are specified to be selected under Setting 2 or 3, the power of the proposed method is slightly lower than that of the classical method (0.311 versus 0.370, and 0.464 versus 0.522). This phenomenon is not surprising, since these situations favor the classical method (the first two PCs are the optimal PCs to include in the testing procedure), and the power loss of the proposed method is caused by accounting for the extra randomness in the selection procedure. 2) When the PCs with smaller variation dominate the association function, as in Settings 5 and 6, the proposed method demonstrates a substantial power advantage: it reaches high power even with a low NPC or a low PAVE threshold (0.7), while the classical method has much lower power at similar threshold values, even at a threshold as high as 0.85. In an extreme case like Setting 6, the classical method cannot detect the association effectively even with a PVE threshold of 0.99, while the proposed method performs well in terms of power.

We also conducted simulation studies under more complex scenarios, with more PCs and more complex shapes of β(t). As they demonstrate patterns similar to those discussed above, we report those results in Web Tables W1 and W2 in the supplementary materials.

4.2 Functional covariates with measurement error

The observed functional covariate from the ith subject at time tij was generated by

W_i(t_{ij}) = \sum_{k=1}^{3} \xi_{ik} \phi_k(t_{ij}) + e_{ij},

where e_{ij} ~ N(0, σ_e²), with σ_e² = 0.1 and 1 corresponding to low and high noise levels, respectively. A dense grid of 100 equally spaced points on [0, 1] is considered. The response was generated by (8). The simulation results are presented in Table 3 for the two measurement error settings, σ_e² = 0.1 and σ_e² = 1. The PVE threshold for the pre-fitted model in the proposed approach is chosen to be 99%.

Table 3.

Empirical Type I error and power of the classical (Tc) and proposed (T) methods when the functional covariate is recorded with small (σ_e² = 0.1) and moderate (σ_e² = 1) measurement error, based on 2000 Monte Carlo simulations. The number of PCs was selected based on various threshold choices, γ, for PVE in the former and PAVE in the latter. The rows with a = 0 correspond to Type I error rates, while those with a > 0 correspond to power under alternative hypotheses.

σ_e² = 0.1    σ_e² = 1

Setting 1 Setting 3 Setting 5 Setting 1 Setting 3 Setting 5

a γ Tc T Tc T Tc T Tc T Tc T Tc T
0 0.5 0.049 0.043 0.049 0.043 0.049 0.043 0.047 0.045 0.047 0.045 0.047 0.045
0 0.7 0.047 0.044 0.047 0.044 0.047 0.044 0.046 0.045 0.046 0.046 0.046 0.046
0 0.85 0.046 0.046 0.046 0.046 0.046 0.046 0.045 0.048 0.045 0.048 0.045 0.048
0 0.99 0.046 0.046 0.046 0.046 0.046 0.046 0.046 0.047 0.046 0.047 0.046 0.047

0.04 0.5 0.252 0.268 0.252 0.429 0.053 0.224 0.251 0.261 0.254 0.420 0.052 0.222
0.04 0.7 0.285 0.278 0.522 0.452 0.124 0.234 0.282 0.276 0.512 0.449 0.118 0.227
0.04 0.85 0.285 0.280 0.514 0.458 0.139 0.246 0.281 0.279 0.504 0.450 0.137 0.240
0.04 0.99 0.282 0.285 0.457 0.458 0.236 0.238 0.283 0.284 0.452 0.454 0.236 0.237

0.08 0.5 0.732 0.769 0.722 0.964 0.055 0.710 0.724 0.767 0.711 0.958 0.053 0.686
0.08 0.7 0.816 0.806 0.988 0.972 0.354 0.750 0.812 0.804 0.986 0.970 0.345 0.739
0.08 0.85 0.818 0.822 0.987 0.972 0.404 0.751 0.814 0.818 0.984 0.971 0.400 0.742
0.08 0.99 0.820 0.820 0.974 0.975 0.752 0.754 0.815 0.815 0.972 0.972 0.745 0.746

The type I error rates of both methods are controlled at the 0.05 level. The choice of PVE threshold has a large impact on the power performance of the classical method, especially under Settings 3 and 5. In contrast, the proposed method is relatively robust against the choice of threshold values and tends to include fewer PCs while achieving power comparable to the classical method (the average numbers of PCs included are shown in Table W2). Under these scenarios, all threshold values above 0.7 perform well in terms of statistical power. This is appealing in practice in two ways. First, it reduces the dimension of the predictors, because a lower threshold is sufficient to achieve good power. Second, the choice of threshold does not greatly affect the analysis result, whereas using different PVE thresholds for the classical method could lead to different conclusions.

5. Data example: a diffusion tensor imaging study

We consider a diffusion tensor imaging (DTI) study conducted on 100 multiple sclerosis (MS) patients at Johns Hopkins Hospital with multiple clinical visits. This cerebral dataset was considered in Goldsmith et al. (2011) and is available in the R package "refund". The scientific question of interest is the association between diffusivity along white matter tracts and cognitive disability among MS patients. The diffusion of water molecules at each voxel is measured by fractional anisotropy (FA), obtained by magnetic resonance imaging. FA profiles along two well-defined white matter tracts, the corpus callosum (CC) and the right corticospinal tract (RCST), are considered in the analysis. There are 93 and 55 locations along the CC and RCST, respectively. Since the value of FA varies across locations along a white matter tract, it can be treated as a function of location on the tract. To quantify cognitive impairment, each patient received a paced auditory serial addition test (PASAT) at every clinical visit, yielding a score ranging from 0 to 60. The goal of our analysis is to quantify the impact of the FA profiles along the CC and RCST on the PASAT score and to test for the significance of the associations.

To explore the relationship between FA profiles and PASAT scores, we group the MS patients into 4 groups according to their PASAT scores, separated by the three quartiles (25%, 50%, and 75%). The estimated mean FA profiles of the 4 groups along the two white matter tracts are presented in Figures (2a) and (2b). Figure (2a) shows that the mean FA profile along the CC in the group with PASAT scores below the 25% quartile is generally lower than the mean FA profiles of the other three groups, while the other three mean curves intertwine with each other except at the right tail. On the other hand, the four mean FA profiles along the RCST are close to each other except at the peaks and valleys. It is difficult to determine visually whether the FA profiles have a significant impact on the PASAT scores; hence a functional linear regression model is employed with the PASAT score as a scalar response and the FA profiles as functional covariates.

Figure 2. Estimated mean FA profiles in the 4 groups formed by the PASAT scores and the corresponding quartiles, estimated β(t), and simulated power curves. Left panel: corpus callosum tracts. Right panel: right corticospinal tracts.

We applied both the classical and the proposed testing methods, and the results are shown in Table 4. We considered three PVE/PAVE thresholds: 70%, 85%, and 99%. The PVE threshold for the pre-fitted model in the proposed method is set at 95%. The results of the classical method lead to inconsistent conclusions under different choices of PVE threshold. If we use α = 0.05 as the significance level, we would conclude that there is a significant association between the FA profiles along the CC and the PASAT score if the threshold is 70% or 85%, but that the association is not significant if the threshold is 99%. Similarly, for the FA profiles along the RCST, there is no significant association if the threshold is 70% or 85%, but there is a significant association if the threshold is 99%. For both associations, different thresholds can lead to opposite conclusions. The proposed method overcomes this issue. For the CC, the p-values with thresholds 70%, 85%, and 99% are all smaller than 0.05, giving the consistent conclusion that the FA profile along the CC is significantly associated with the PASAT score. Similarly, in the analysis of the RCST, the p-values are all below 0.05 for the different threshold choices. This suggests a significant impact of the FA profile along the RCST on the PASAT score.

Table 4.

Testing results of the classical (Tc) and proposed (T) methods on the DTI data with FA profiles in the corpus callosum and right corticospinal tracts as functional covariates. The number of PCs was selected based on various threshold choices, γ, for PVE in the former and PAVE in the latter.

Corpus callosum tracts Right corticospinal tracts

P-value Number of PCs P-value Number of PCs

γ Tc T Tc T Tc T Tc T
0.70 0.007 0.039 2 2 0.111 0.021 3 2
0.85 0.028 0.044 4 3 0.064 0.015 5 3
0.99 0.131 0.042 9 6 0.004 0.028 9 6

To better explain the inconsistency of the results from the PVE-based tests, we conducted a simulation study mimicking the DTI data. Specifically, we first estimated the eigenvalues and eigenfunctions of the functional covariates as well as the functional regression along both the CC and RCST, and then simulated data with the true parameters set to the estimates from the DTI application. The estimated power curves over a wide range of threshold levels are shown in Figures 2e and 2f, respectively. For the CC, the power of the classical method decreases with the PVE threshold, especially in the range from 0.7 to 0.99. This loss of power with increasing γ explains why the p-value changes from 0.028 (γ = 0.85) to 0.131 (γ = 0.99). In contrast, the power of the proposed method is relatively stable, which explains the consistency of its p-values across different threshold choices. For the RCST, the power of the classical method increases substantially with increasing threshold levels, which is why statistical significance is achieved only at γ = 0.99. Again, the power of the proposed method is relatively robust against the threshold values, leading to the same conclusion in terms of statistical significance across different threshold choices.

6. Conclusions

The FLM has been a popular tool to describe the dynamic impact of a functional covariate on a scalar response. In the presence of infinite-dimensional covariates and regression parameters, applying FPCA can reduce the dimensionality and facilitate statistical inference. In this work, we investigate the performance of a classical Wald-type test based on the PCs chosen by eigenvalues, and propose a novel association-variation-based selection procedure. The number of PCs can be either pre-specified or determined by the proposed PAVE threshold. We establish the null distributions under both selection approaches and study their asymptotic power. Our numerical studies show that the power performance of the classical approach is sensitive to the choice of the number of PCs, and that including more PCs does not guarantee higher power. This is unappealing in practice due to the lack of guidance on choosing the threshold and the inconsistency of conclusions from analyses with different threshold levels. On the other hand, the proposed PC selection procedure is robust against threshold choices. Moreover, it is more powerful than the classical method when the leading PCs are weakly associated or not associated with the response.

Practitioners might be interested in how to choose an optimal threshold for the proposed PAVE approach. As we illustrated, the choice of threshold is less crucial for our method, owing to the robustness of its power across a wide range of threshold levels. This is demonstrated by simulation studies across a wide range of scenarios, including the simple settings in Section 4.1, more complex settings in Web Appendix E, and settings that mimic the DTI data application (power curves in Figures 2e and 2f). In these scenarios, the choice of threshold generally does not lead to substantial differences in statistical power or changes of conclusions, which makes our method more desirable than the classical approach. Nevertheless, it is always desirable to provide some guidance for practitioners. Based on our empirical experience, we recommend a threshold level of around 0.8–0.9, which works well across the scenarios that we explored. In particular, simulations mimicking the DTI data suggest that our proposed PAVE-based test with a threshold level of 0.8–0.9 achieves high power (Figures 2e and 2f) with a parsimonious fit. If practitioners wish to identify the optimal threshold for the highest power, we suggest conducting a small simulation study mimicking their data structure, especially if they have prior knowledge of plausible shapes of the association function β(t).

There are several directions for future research. First, we consider functional linear models for continuous outcomes in this paper, and it will be of interest to extend the method to generalized functional linear models for non-Gaussian outcomes. Second, extensions to multilevel functional data will also be useful. Many studies, including the DTI application here, record functional data at multiple visits for the same subject, resulting in multilevel functional data. Di et al. (2009) and Crainiceanu et al. (2009) proposed multilevel functional principal component analysis to extract modes of variation at both the between- and within-subject levels for such data. Third, the proposed methods can also be generalized to time-to-event outcomes or more general nonlinear functional regression models. Finally, even though we briefly discussed extending our method to functional linear regression with sparse longitudinal data, the asymptotic theory remains challenging, especially regarding how the smoothing parameter in the FPCA step affects inference. This will be investigated in a separate paper.


7. Supplementary materials

Web Appendices, Web Figures, and Web Tables referenced in Sections 3 and 4.1 are available with this paper at the Biometrics website on Wiley Online Library.

References

  1. Cardot H, Ferraty F, Mas A, Sarda P. Testing hypotheses in the functional linear model. Scandinavian Journal of Statistics. 2003;30:241–255.
  2. Cardot H, Ferraty F, Sarda P. Functional linear model. Statistics and Probability Letters. 1999;45:11–22.
  3. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. Chapman and Hall/CRC; 2006.
  4. Crainiceanu CM, Staicu AM, Di CZ. Generalized multilevel functional regression. Journal of the American Statistical Association. 2009;104:1550–1561.
  5. Di CZ, Crainiceanu CM, Caffo B, Punjabi NM. Multilevel functional principal component analysis. Annals of Applied Statistics. 2009;3:458–488.
  6. Goldsmith J, Bobb J, Crainiceanu CM, Caffo B, Reich D. Penalized functional regression. Journal of Computational and Graphical Statistics. 2011;20:830–851.
  7. Goldsmith J, Greven S, Crainiceanu CM. Corrected confidence bands for functional data using principal components. Biometrics. 2013;69:41–51.
  8. Hall P, Hosseini-Nasab M. On properties of functional principal components analysis. Journal of the Royal Statistical Society, Series B. 2006;68:109–126.
  9. Horváth L, Kokoszka P. Inference for Functional Data with Applications. Springer; 2012.
  10. James G. Generalized linear models with functional predictors. Journal of the Royal Statistical Society, Series B. 2002;64:411–432.
  11. Kong D, Staicu A-M, Maity A. Classical testing in functional linear models. North Carolina State University Department of Statistics Technical Reports. 2013;2647:1–23.
  12. Müller HG, Stadtmüller U. Generalized functional linear models. The Annals of Statistics. 2005;33:774–805.
  13. Ramsay JO, Silverman BW. Functional Data Analysis. Springer; 2005.
  14. Ramsay JO, Dalzell CJ. Some tools for functional data analysis. Journal of the Royal Statistical Society, Series B. 1991;53:539–572.
  15. Ramsay JO, Hooker G, Graves S. Functional Data Analysis with R and MATLAB. Springer; 2009.
  16. Swihart BJ, Goldsmith J, Crainiceanu CM. Restricted likelihood ratio tests for functional effects in the functional linear model. Technometrics. 2014;56:483–493.
  17. Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005a;100:577–590.
  18. Yao F, Müller HG, Wang JL. Functional linear regression analysis for longitudinal data. The Annals of Statistics. 2005b;33:2873–2903.
  19. Zhu H, Yao F, Zhang HH. Structured functional additive regression in reproducing kernel Hilbert spaces. Journal of the Royal Statistical Society, Series B. 2014;76:581–603.
