Abstract
Functional data analysis has attracted substantial research interest, and the goal of functional sparsity is to produce a sparse estimate that assigns zero values over regions where the true underlying function is zero, i.e., where there is no relationship between the response variable and the predictor variable. In this paper, we consider a functional linear regression model that explicitly incorporates the interconnections among the responses. We propose a locally sparse (i.e., zero on some subregions) estimator of the coefficient functions, the multiple smooth and locally sparse (m-SLoS) estimator, based on the interconnections among the responses. The method combines the smooth and locally sparse (SLoS) estimator with a Laplacian quadratic penalty: SLoS encourages local sparsity, while the Laplacian quadratic penalty promotes similar local sparsity among coefficient functions associated with interconnected responses. Simulations show excellent numerical performance of the proposed method in terms of the estimation of the coefficient functions, especially when the coefficient functions are the same for all responses. The practical merit of this modeling is demonstrated by a real application, where prediction shows significant improvement.
Keywords: Functional data analysis, locally sparse, functional linear multivariate regression
1. Introduction
Functional data analysis has attracted substantial research interest. Within functional data analysis, functional linear regression (FLR) is a popular technique when the predictors themselves are functions. Historically, FLR originated from ordinary linear regression with a large number of predictors. It has consequently been thoroughly studied and extensively applied; a non-exhaustive list of recent works includes [1–7].
In this article, we consider functional linear regression with multivariate responses. Let Y = (Y1, ··· , Yq)⊤ be the response vector and X(t) be the functional predictor observed at a dense grid of points. Consider the following functional linear model
$$Y_j = \mu_j + \int_0^T X(t)\,\beta_j(t)\,dt + \epsilon_j, \qquad j = 1, \cdots, q, \tag{1}$$
where μj is the intercept term, βj(t) is an unknown smooth coefficient function, and εj is the random error for the jth response. If βj(t) = 0 for every t in a subregion I ⊂ [0, T], then X(t) has no contribution to Yj on the interval I. In light of this observation, an estimate of βj(t) improves the interpretability of the model and is practically appealing if it not only yields the weights of the contribution of X(t) over the entire domain, but also locates subregions where X(t) has no statistically significant contribution to Yj.
The estimate of βj described above is called a locally sparse estimate [8,9]. Although the literature on FLR is abundant, little has been done on interpretability and locally sparse modeling, especially for multivariate responses. James et al. [10] proposed "FLiRTI", which achieves local sparsity by placing L1 penalties on the coefficient function and its first several derivatives at discrete grid points. Zhou et al. [11] pointed out a drawback of the FLiRTI method: the produced estimate possesses large variation. When the grid size is small, the numerical solution is unstable, while when the grid size is large, FLiRTI tends to overparameterize the model. To overcome this drawback, Zhou et al. [11] proposed an alternative locally sparse estimator obtained in two stages. Lin et al. [12] proposed a simple one-stage procedure that yields a smooth and locally sparse estimator of the coefficient function, which they call the "smooth and locally sparse (SLoS) estimator". All of the methods above deal with a univariate response, whereas we consider multiple-output FLR in this paper.
In multivariate regression, the idea of using information from different responses to improve estimation is not new; previous work has been done for scalar multivariate regression. Breiman and Friedman [13] proposed the Curds and Whey method, which uses optimal linear combinations of the least squares predictions as predictors. Rothman et al. [14] proposed multivariate regression with covariance estimation, which leverages the correlation in the unexplained variation to improve estimation. Peng et al. [15] proposed regularized multivariate regression for identifying master predictors, motivated by investigating the regulatory relationships among different biological molecules based on multiple types of high-dimensional genomic data. Rai et al. [16] proposed a multiple-output regression model that leverages both the output structure and the task structure, with both structures learned from the data; their method relies on a priori information about valuable predictors, imposing a group L1 and L2 norm, across responses, on all covariates not prespecified as useful predictors. Price and Sherwood [17] proposed a method for simultaneously estimating regression coefficients and clustering response variables in a multivariate regression model, to increase prediction accuracy and give insight into the relationships between response variables. Shi et al. [18] proposed Variational Inference for Multiple Correlated Outcomes for the joint analysis of multiple traits in genome-wide association studies, using a variational Bayesian expectation-maximization algorithm to ensure computational efficiency. In this paper, we assume that if Yk and Yj are tightly connected, then their regression coefficient functions βk(t) and βj(t) should exhibit similar local sparsity. Although the literature on scalar multivariate regression is abundant, little has been done for functional multivariate regression.
Based on the above discussion, we aim to develop a locally sparse estimator for the coefficient functions βj(t), j = 1, ··· , q, while effectively accommodating the correlations among the multivariate responses. In this article, we consider a combination of the SLoS and Laplacian quadratic penalties as the penalty function. We call the proposed method "multiple-SLoS" (m-SLoS). The SLoS penalty encourages local sparsity, while the Laplacian quadratic penalty promotes similar local sparsity among coefficient functions associated with interconnected responses. Note that the Laplacian quadratic penalty here is imposed on functions, and is different from that in Huang et al. [19], which promotes similarities among scalar regression coefficients.
The remaining sections are organized as follows. The model setting and methodology are described in Section 2, together with the computational algorithm. In Section 3, we present simulation studies under four different scenarios to assess the finite-sample performance of the proposed method. In Section 4, we apply the proposed method to the Tecator data. The article concludes with a discussion in Section 5.
2. Methods
2.1. The model setting and methodology
Under the smoothness condition, we approximate βj(·) using a B-spline basis expansion. Given Mn evenly spaced knots $0 = t_0 < t_1 < \cdots < t_{M_n} = T$, let Ik = [tk−1, tk] for k = 1, ··· , Mn. Associated with this set of knots there are (Mn + d) B-spline basis functions $B_1(t), \ldots, B_{M_n+d}(t)$, each of which is a piecewise polynomial of degree d with support on at most d + 1 of the subintervals Ik. Then
$$\beta_j(t) = \sum_{k=1}^{M_n+d} b_{j,k} B_k(t) + r_j(t),$$
where rj(t) is an approximation error that is uniformly bounded on [0, T], with the bound going to 0 as Mn goes to infinity. Let U be an n × (Mn + d) matrix with entries $u_{ik} = \int_0^T X_i(t) B_k(t)\,dt$ and write U = (u1, ··· , un)⊤. Moreover, set B(t) = (B1(t), ··· , BMn+d(t))⊤ and bj = (bj,1, ··· , bj,Mn+d)⊤, so that βj(t) ≈ B(t)⊤bj. Then model (1) can be written as
$$y_j = \mu_j \mathbf{1}_n + U b_j + \epsilon_j,$$
where yj = (y1j, ··· , ynj)⊤ and the error εj = (ε1j, ··· , εnj)⊤ satisfies E(εj) = 0. In this section, we adopt the least squares objective function
$$\frac{1}{2}\sum_{j=1}^{q} \left\| y_j - \mu_j \mathbf{1}_n - U b_j \right\|^2,$$
where μ = (μ1,··· ,μq)⊤, and ∥·∥ is the l2 norm.
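To make this setup concrete, the following is a minimal R sketch of how the design matrix U and an unpenalized least squares fit could be computed, assuming the functional predictor is observed on a dense common grid; the names (`tgrid`, `Xmat`, `Y`) and the Riemann-sum quadrature are illustrative choices, not part of the original formulation.

```r
library(fda)

# Assumed (hypothetical) inputs:
#   tgrid : dense grid of observation points in [0, T]
#   Xmat  : n x length(tgrid) matrix, row i holds X_i(t) on tgrid
#   Y     : n x q matrix of responses
Tend  <- max(tgrid)
Mn    <- 20                                  # number of subintervals (a tuning choice)
d     <- 3                                   # cubic B-splines, order d + 1 = 4
basis <- create.bspline.basis(c(0, Tend), nbasis = Mn + d, norder = d + 1)

# Riemann-sum approximation of u_ik = \int_0^T X_i(t) B_k(t) dt
Bmat <- eval.basis(tgrid, basis)             # length(tgrid) x (Mn + d) basis values
dt   <- tgrid[2] - tgrid[1]
U    <- Xmat %*% Bmat * dt                   # n x (Mn + d) design matrix

# Unpenalized least squares for the first response;
# centering y and the columns of U absorbs the intercept mu_1
Uc <- scale(U, scale = FALSE)
yc <- Y[, 1] - mean(Y[, 1])
b1_ols <- solve(crossprod(Uc), crossprod(Uc, yc))
```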
Moreover, suppose A = (akl, 1 ⩽ k, l ⩽ q) is the adjacency matrix for the responses, and we want to use this correlation structure to promote similar local sparsity among coefficient functions associated with interconnected responses. The idea of using a Laplacian quadratic penalty to promote similarities among coefficients is not new. Huang et al. [19] proposed a Laplacian quadratic penalty for variable selection and estimation that explicitly incorporates the correlation patterns among predictors. Shi et al. [20] proposed a sparse double Laplacian shrinkage method which jointly models the effects of multiple CNAs on multiple GEs. Wu et al. [21] borrowed the idea of the Laplacian penalty and proposed a method to comprehensively accommodate multiple challenging characteristics of GE–CNV modeling. In this paper, we extend the use of the Laplacian quadratic penalty to multiple-output FLR; our goal is to promote similar local sparsity among coefficient functions associated with interconnected responses. To accommodate the correlation structure, we propose the penalty
$$\sum_{1 \le k < l \le q} a_{kl} \int_0^T \left[ \beta_k(t) - \beta_l(t) \right]^2 dt.$$
Note that
$$\sum_{1 \le k < l \le q} a_{kl} \int_0^T \left[ \beta_k(t) - \beta_l(t) \right]^2 dt = \sum_{1 \le k < l \le q} a_{kl}\, (b_k - b_l)^\top \Phi\, (b_k - b_l),$$
where $\Phi$ is the $(M_n+d)\times(M_n+d)$ matrix with entries $\Phi_{uv} = \int_0^T B_u(t) B_v(t)\,dt$. Let D = diag(d1, ··· , dq), where $d_j = \sum_{k=1}^{q} a_{jk}$. Define L = D − A, which can easily be shown to be a positive semi-definite matrix. Then
$$\sum_{1 \le k < l \le q} a_{kl}\, (b_k - b_l)^\top \Phi\, (b_k - b_l) = b^\top (L \otimes \Phi)\, b, \qquad b = (b_1^\top, \cdots, b_q^\top)^\top,$$
where ⊗ is the Kronecker product. For more related discussion, see Huang et al. [19].
Lin et al. [12] developed the SLoS estimator for the coefficient function based on the "fS-CAD" penalty
$$\frac{M_n}{T} \sum_{k=1}^{M_n} p_\lambda\!\left( \sqrt{ \frac{M_n}{T} \int_{I_k} \beta^2(t)\,dt } \right).$$
Here pλ(t) is the SCAD penalty, where λ is a data-dependent tuning parameter. From Theorem 1 in their paper, writing β(t) = B(t)⊤b, the penalty can be expressed in terms of the B-spline coefficients as
$$\frac{M_n}{T} \sum_{k=1}^{M_n} p_\lambda\!\left( \sqrt{ \frac{M_n}{T}\, b^\top W_k\, b } \right),$$
where Wk is an (Mn + d) × (Mn + d) matrix with entries $\int_{I_k} B_u(t) B_v(t)\,dt$ if k ⩽ u, v ⩽ k + d and zeros otherwise. When Mn is relatively large, the estimator usually exhibits excessive variability. A popular approach to rectify the variability is to add a roughness penalty on βj(t) = B(t)⊤bj. For example,
$$\int_0^T \left[ \beta_j''(t) \right]^2 dt = b_j^\top V\, b_j,$$
where V is an (Mn + d) × (Mn + d) matrix with entries $\int_0^T B_u''(t) B_v''(t)\,dt$.
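As an illustration, the matrices Wk and V can be assembled as follows in R, continuing the B-spline setup from the earlier sketch (`basis`, `Bmat`, `dt`, `Tend`, `Mn`). The Riemann-sum quadrature for Wk is an implementation choice; `bsplinepen` from the fda package returns exactly the second-derivative penalty matrix V.

```r
library(fda)

# V_uv = \int_0^T B_u''(t) B_v''(t) dt  (exact, via the fda package)
V <- bsplinepen(basis, Lfdobj = 2)

# W_k with entries \int_{I_k} B_u(t) B_v(t) dt, k = 1, ..., Mn
# (Riemann-sum sketch on the dense grid; entries outside k <= u, v <= k + d are ~0)
knots <- seq(0, Tend, length.out = Mn + 1)
Wlist <- lapply(seq_len(Mn), function(k) {
  idx <- tgrid >= knots[k] & tgrid <= knots[k + 1]
  crossprod(Bmat[idx, , drop = FALSE]) * dt
})
```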
Based on the above discussion, we propose the following objective function for functional linear multivariate regression
$$\sum_{j=1}^{q} \frac{1}{2} \left\| y_j - \mu_j \mathbf{1}_n - U b_j \right\|^2 + \sum_{j=1}^{q} \frac{M_n}{T} \sum_{k=1}^{M_n} p_{\lambda_1}\!\left( \sqrt{ \frac{M_n}{T}\, b_j^\top W_k\, b_j } \right) + \frac{\lambda_2}{2} \sum_{j=1}^{q} b_j^\top V\, b_j + \frac{\lambda_3}{2}\, b^\top (L \otimes \Phi)\, b. \tag{2}$$
In (2), the first part is the usual least squares objective function. The second part encourages local sparsity of the coefficient functions. The third part is a roughness penalty, a popular approach for controlling the variability of the coefficient functions. The last part is the Laplacian quadratic penalty, which promotes similar local sparsity among coefficient functions associated with interconnected responses. We call this method "multiple-SLoS" (m-SLoS). The estimator obtained by minimizing (2) enjoys both smoothness and local sparsity. In the next section, we present the algorithm for solving this problem.
2.2. Computational algorithm
In this section, we discuss the algorithm for minimizing (2). Before solving this optimization problem, we have to obtain the adjacency matrix A. If prior information on A is available, we can use it directly; otherwise, we can use the data to calculate an adjacency matrix among the responses (for more related discussion, see Huang et al. [19]). In this paper, we use the correlation matrix of the responses as A.
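A small R sketch of this step is given below, continuing the earlier sketches (`Y`, `Bmat`, `dt`). The paper uses the response correlation matrix as A; here absolute correlations are taken (an assumption made only in this sketch) so that L = D − A is guaranteed to be positive semi-definite.

```r
# Adjacency among the q responses; absolute correlations are an illustrative choice
A <- abs(cor(Y))
diag(A) <- 0                      # no self-loops
D <- diag(rowSums(A))             # degree matrix, d_j = sum_k a_jk
L <- D - A                        # graph Laplacian (positive semi-definite)

# Phi_uv = \int_0^T B_u(t) B_v(t) dt, approximated on the dense grid
Phi <- crossprod(Bmat) * dt

# Laplacian quadratic penalty on the stacked coefficients b = (b_1', ..., b_q')'
lap_penalty <- function(Bcoef) {          # Bcoef: (Mn + d) x q matrix of coefficients
  b <- as.vector(Bcoef)                   # stacking columns gives (b_1', ..., b_q')'
  as.numeric(t(b) %*% (L %x% Phi) %*% b)  # %x% is the Kronecker product
}
```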
When u ≈ u(0), the local quadratic approximation (LQA) of the SCAD function pλ(|u|) is
$$p_\lambda(|u|) \approx p_\lambda(|u^{(0)}|) + \frac{1}{2}\, \frac{p_\lambda'(|u^{(0)}|)}{|u^{(0)}|} \left( u^2 - {u^{(0)}}^2 \right).$$
Then, given some initial estimate $b_j^{(0)}$, applying the LQA to each term of the fS-CAD penalty, we have
$$\frac{M_n}{T} \sum_{k=1}^{M_n} p_{\lambda_1}\!\left( \sqrt{ \frac{M_n}{T}\, b_j^\top W_k\, b_j } \right) \approx \frac{1}{2}\, b_j^\top W_j^{(0)} b_j + C_j^{(0)},$$
where
$$W_j^{(0)} = \left( \frac{M_n}{T} \right)^{2} \sum_{k=1}^{M_n} \frac{ p_{\lambda_1}'\!\left( \sqrt{ \tfrac{M_n}{T}\, b_j^{(0)\top} W_k\, b_j^{(0)} } \right) }{ \sqrt{ \tfrac{M_n}{T}\, b_j^{(0)\top} W_k\, b_j^{(0)} } }\; W_k$$
and the constant $C_j^{(0)}$ depend only on the initial estimate $b_j^{(0)}$. Let $b^{(0)} = (b_1^{(0)\top}, \cdots, b_q^{(0)\top})^\top$; replacing each fS-CAD term in (2) by its quadratic approximation then yields a quadratic objective in b.
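For reference, a short R sketch of the SCAD derivative and the resulting LQA curvature is given below. The form of p′λ follows Fan and Li's SCAD with the conventional a = 3.7; the small `eps` guard is an implementation detail, not part of the original derivation.

```r
# First derivative of the SCAD penalty (a = 3.7 is the conventional choice)
scad_deriv <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  lambda * ifelse(t <= lambda, 1, pmax(a * lambda - t, 0) / ((a - 1) * lambda))
}

# LQA curvature p'_lambda(|u0|) / |u0| used to build the weight matrix W_j^(0)
lqa_weight <- function(u0, lambda, a = 3.7, eps = 1e-8) {
  scad_deriv(abs(u0), lambda, a) / (abs(u0) + eps)
}
```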
Recall the objective function (2); applying the LQA above, it becomes
$$\sum_{j=1}^{q} \left\{ \frac{1}{2} \left\| y_j - \mu_j \mathbf{1}_n - U b_j \right\|^2 + \frac{1}{2}\, b_j^\top W_j^{(0)} b_j + \frac{\lambda_2}{2}\, b_j^\top V\, b_j \right\} + \frac{\lambda_3}{2}\, b^\top (L \otimes \Phi)\, b + C,$$
where C does not depend on b, Wk is an (Mn + d) × (Mn + d) matrix with entries $\int_{I_k} B_u(t) B_v(t)\,dt$ if k ⩽ u, v ⩽ k + d and zeros otherwise, and V is an (Mn + d) × (Mn + d) matrix with entries $\int_0^T B_u''(t) B_v''(t)\,dt$. Let R(bj) denote the terms that contain bj; we have
$$R(b_j) = \frac{1}{2} \left\| y_j - \mu_j \mathbf{1}_n - U b_j \right\|^2 + \frac{1}{2}\, b_j^\top W_j^{(0)} b_j + \frac{\lambda_2}{2}\, b_j^\top V\, b_j + \frac{\lambda_3}{2}\, L_{jj}\, b_j^\top \Phi\, b_j + \lambda_3 \sum_{k \ne j} L_{jk}\, b_j^\top \Phi\, b_k.$$
Differentiating R(bj) with respect to bj and setting the derivative to zero, we have the following equation:
$$\left( U^\top U + W_j^{(0)} + \lambda_2 V + \lambda_3 L_{jj} \Phi \right) b_j = U^\top \left( y_j - \mu_j \mathbf{1}_n \right) - \lambda_3 \sum_{k \ne j} L_{jk}\, \Phi\, b_k,$$
with the solution
$$\hat b_j = \left( U^\top U + W_j^{(0)} + \lambda_2 V + \lambda_3 L_{jj} \Phi \right)^{-1} \left( U^\top ( y_j - \mu_j \mathbf{1}_n ) - \lambda_3 \sum_{k \ne j} L_{jk}\, \Phi\, b_k \right).$$
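The closed-form update above can be coded directly. The sketch below assumes the LQA weight matrix `Wj` for response j has already been formed from the current iterate (e.g., using the Wk matrices and `lqa_weight` sketched earlier); all other quantities appear exactly as in the displayed solution.

```r
# One update of b_j given the current coefficients of all responses (a sketch)
update_bj <- function(j, U, Y, mu, Bcoef, Wj, V, Phi, L, lam2, lam3) {
  q   <- ncol(Y)
  rhs <- crossprod(U, Y[, j] - mu[j])
  for (k in setdiff(seq_len(q), j)) {
    # coupling with the other responses through the Laplacian penalty
    rhs <- rhs - lam3 * L[j, k] * (Phi %*% Bcoef[, k])
  }
  lhs <- crossprod(U) + Wj + lam2 * V + lam3 * L[j, j] * Phi
  solve(lhs, rhs)
}
```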
In summary, we have the following algorithm to compute $\hat b_j$ and obtain the estimator $\hat\beta_j(t) = B(t)^\top \hat b_j$ for j = 1, 2, …, q.

Step 1: for j = 1, 2, …, q,

(a) Compute the initial estimate $b_j^{(0)}$.

(b) Given $b_j^{(i)}$, compute $W_j^{(i)}$ and the update $b_j^{(i+1)}$ (using the closed-form solution above with λ3 = 0).

(c) Repeat (b) until the convergence of $b_j^{(i)}$ is reached.

Step 2: Let the initial values $b_j^{(0)}$, j = 1, 2, …, q, be the values obtained in Step 1.

Step 3: Given $b_k^{(i)}$ for k = 1, 2, …, q, compute $W_j^{(i)}$ and update $b_j^{(i+1)}$ using the closed-form solution above.

Step 4: Repeat Step 3 until the convergence of $b_j^{(i)}$ is reached for j = 1, 2, …, q.
Remark 1. We first calculate the SLoS estimator for each response separately in Step 1. We then use these estimates as the initial values for the proposed method; the m-SLoS estimator is computed in Step 3.
3. Simulation studies
In this section, we conduct simulation studies to evaluate the performance of the proposed method. We consider the following functional linear model
$$Y_{ij} = \mu_j + \int_0^1 X_i(t)\, \beta_j(t)\, dt + \epsilon_{ij}, \qquad i = 1, \cdots, n,\ j = 1, \cdots, q,$$
where the true parameter μ = (μ1, ··· , μq)⊤ = (1, ··· , 1)⊤. The covariate functions are Xi(t) = Σk cikBk(t), where the coefficients cik are generated from the uniform distribution on [−5, 5] and each Bk(t) is a B-spline basis function of order 5 defined on 50 equally spaced knots. We independently generate n observations as the training set. The tuning parameters λ1, λ2 and λ3 are selected using 5-fold cross-validation. We have developed R code and made it publicly available at https://github.com/ruiqwy/m-slos.
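For concreteness, the functional predictors in the simulations could be generated as in the following R sketch. The grid, seed, and sample size are illustrative, and the true coefficient functions (`beta_true`, evaluated on the grid) are assumed given, since their exact forms are shown in Figure 1.

```r
library(fda)

set.seed(1)
n      <- 300
tgrid  <- seq(0, 1, length.out = 401)
# order-5 B-spline basis with 50 equally spaced knots, as in the simulation design
xbasis <- create.bspline.basis(c(0, 1), norder = 5, breaks = seq(0, 1, length.out = 50))
Cmat   <- matrix(runif(n * xbasis$nbasis, -5, 5), n, xbasis$nbasis)   # c_ik ~ U(-5, 5)
Xmat   <- Cmat %*% t(eval.basis(tgrid, xbasis))                       # X_i(t) on tgrid

# Responses from the model: Y_ij = 1 + \int_0^1 X_i(t) beta_j(t) dt + eps_ij
# beta_true: length(tgrid) x q matrix of true coefficient functions (assumed given)
dtt <- tgrid[2] - tgrid[1]
# Y <- 1 + Xmat %*% beta_true * dtt +
#      matrix(rnorm(n * ncol(beta_true), sd = sqrt(0.05)), n, ncol(beta_true))
```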
To examine the behavior of the new method when the βj are identical or similar, we present simulation studies under four scenarios and compare the proposed method with SLoS. The coefficient functions for the four settings are plotted in Figure 1. In the first example, the βj are identical. In the second example, the βj share the same zero subregions, but their values differ on the nonzero subregions. In the last two examples, the zero subregions of the βj are similar but not identical. Before describing the four scenarios in detail, we introduce several measures of the quality of the estimated coefficient functions.
Figure 1.
Coefficient functions in four settings. (a) Example 1; (b) Example 2; (c) Example 3; (d) Example 4.
The quality of the estimates is measured by the integrated squared error (ISE) and the integrated absolute error (IAE). ISE0 and ISE1 measure the integrated squared error between an estimated coefficient function and the true function on the null and non-null subregions, respectively; IAE0 and IAE1 measure the corresponding integrated absolute errors. These measures are defined as
$$\mathrm{ISE0}_j = \frac{1}{l_{0j}} \int_{N(\beta_j)} \left[ \hat\beta_j(t) - \beta_j(t) \right]^2 dt, \qquad \mathrm{ISE1}_j = \frac{1}{l_{1j}} \int_{S(\beta_j)} \left[ \hat\beta_j(t) - \beta_j(t) \right]^2 dt,$$
$$\mathrm{IAE0}_j = \frac{1}{l_{0j}} \int_{N(\beta_j)} \left| \hat\beta_j(t) - \beta_j(t) \right| dt, \qquad \mathrm{IAE1}_j = \frac{1}{l_{1j}} \int_{S(\beta_j)} \left| \hat\beta_j(t) - \beta_j(t) \right| dt,$$
where βj(t) is the true coefficient function, $\hat\beta_j(t)$ is the estimated coefficient function obtained by SLoS or m-SLoS, N(βj) is the null subregion of βj(t), S(βj) is the non-null subregion of βj(t), l0j is the length of the null subregion N(βj), and l1j is the length of the non-null subregion S(βj).
In addition, to assess the performance of local sparsity detection, we use three numerical measures: the average percentage of the domain whose zero/nonzero status is correctly identified (CI), the average percentage of true zero subregions correctly identified as zero (CZ), and the average percentage of true nonzero subregions correctly identified as nonzero (CN). The larger CI, CZ and CN are, the better the estimator is; for an ideal estimator, all three measures are close to one (i.e., 100%). The details of the four scenarios are given below.
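These measures can be computed on a fine grid as in the R sketch below; the zero threshold `tol` and the grid-based integration are implementation choices, and `beta_true` and `beta_hat` denote the true and estimated coefficient functions evaluated on `tgrid`.

```r
sparsity_metrics <- function(beta_true, beta_hat, tgrid, tol = 1e-8) {
  dt   <- tgrid[2] - tgrid[1]
  null <- abs(beta_true) < tol          # true null subregion N(beta)
  err  <- beta_hat - beta_true
  ISE0 <- sum(err[null]^2)  * dt / (sum(null)  * dt)   # normalized by l0
  ISE1 <- sum(err[!null]^2) * dt / (sum(!null) * dt)   # normalized by l1
  IAE0 <- sum(abs(err[null]))  * dt / (sum(null)  * dt)
  IAE1 <- sum(abs(err[!null])) * dt / (sum(!null) * dt)
  zhat <- abs(beta_hat) < tol           # estimated zero region
  c(CI = mean(zhat == null),            # points whose status is correctly identified
    CZ = mean(zhat[null]),              # true zeros identified as zero
    CN = mean(!zhat[!null]),            # true nonzeros kept nonzero
    ISE0 = ISE0, ISE1 = ISE1, IAE0 = IAE0, IAE1 = IAE1)
}
```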
Example 1: In this example, we generate trivariate functional data with a common coefficient function, shown in Figure 1(a). The coefficient functions satisfy βj(t) = 0 on [0, 0.2] ∪ [0.486, 0.771] for j = 1, 2, 3. The errors εij follow the N(0, σ2) distribution, and we consider different values of σ2 to examine the behavior under different noise levels. The number of observations in the training set is n. The results of 100 Monte Carlo repetitions are shown in Table 1. From this table, we find that the estimation performance of both methods improves as the sample size n increases, and is better when the variance of εj is smaller.
Table 1.
Results of Example 1 obtained from 100 Monte Carlo repetitions (with standard errors in parentheses).
| σ² | n | Method | CI (%) | CZ (%) | CN (%) | ISE0 (1e-3) | ISE1 (1e-3) | IAE0 (1e-3) | IAE1 (1e-3) |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 300 | m-SLoS | 89.45 | 78.35 | 99.91 | 1.41 | 9.22 | 8.56 | 57.76 |
| | | | (5.11) | (10.66) | (0.26) | (0.90) | (4.34) | (5.47) | (16.39) |
| | | SLoS | 68.01 | 34.03 | 100.00 | 4.87 | 9.88 | 54.07 | 94.66 |
| | | | (4.82) | (9.95) | (0.00) | (0.98) | (1.40) | (10.91) | (7.10) |
| | 500 | m-SLoS | 90.99 | 81.50 | 99.92 | 1.35 | 9.58 | 7.25 | 57.19 |
| | | | (3.24) | (6.79) | (0.24) | (0.73) | (7.90) | (3.39) | (20.52) |
| | | SLoS | 72.38 | 43.05 | 100.00 | 3.94 | 8.86 | 43.60 | 86.22 |
| | | | (4.39) | (9.05) | (0.00) | (0.61) | (1.05) | (6.89) | (6.18) |
| | 1000 | m-SLoS | 91.22 | 81.93 | 99.96 | 1.22 | 7.26 | 6.50 | 48.91 |
| | | | (2.84) | (5.92) | (0.17) | (0.68) | (2.39) | (3.06) | (10.55) |
| | | SLoS | 76.85 | 52.26 | 100.00 | 3.32 | 7.91 | 35.94 | 77.48 |
| | | | (4.19) | (8.65) | (0.00) | (0.52) | (0.72) | (5.29) | (4.63) |
| 0.1 | 300 | m-SLoS | 71.65 | 44.75 | 96.97 | 17.72 | 62.26 | 44.85 | 157.10 |
| | | | (2.71) | (7.14) | (2.05) | (10.12) | (35.99) | (15.32) | (58.02) |
| | | SLoS | 64.91 | 27.70 | 99.96 | 6.39 | 19.35 | 48.46 | 139.70 |
| | | | (4.47) | (9.25) | (0.11) | (1.404) | (4.30) | (7.55) | (14.90) |
| | 500 | m-SLoS | 82.09 | 63.13 | 99.96 | 2.46 | 11.62 | 16.70 | 68.45 |
| | | | (5.43) | (11.31) | (0.22) | (1.30) | (11.08) | (8.96) | (24.50) |
| | | SLoS | 64.59 | 26.99 | 100.00 | 7.18 | 13.41 | 72.46 | 116.22 |
| | | | (4.58) | (9.45) | (0.00) | (1.64) | (2.33) | (13.57) | (11.05) |
| | 1000 | m-SLoS | 85.38 | 69.93 | 99.94 | 2.00 | 10.95 | 12.36 | 63.92 |
| | | | (4.65) | (9.70) | (0.250) | (0.95) | (7.90) | (6.69) | (22.27) |
| | | SLoS | 67.38 | 32.74 | 100.00 | 5.95 | 12.39 | 59.09 | 104.92 |
| | | | (4.52) | (9.33) | (0.00) | (1.01) | (1.64) | (9.47) | (8.03) |
Example 2: In this example, we generate trivariate functional data whose coefficient functions share the same zero subregions. We want to examine the performance of the proposed method when the coefficient functions are not identical but share the same zero subregions; the difference from Example 1 is that the values of the coefficient functions on the nonzero subregions differ. The coefficient functions are shown in Figure 1(b), with βj(t) = 0 on [0, 0.2] ∪ [0.486, 0.771] for j = 1, 2, 3: the βj share the same zero subregions but differ on the nonzero subregions. The errors εij follow a normal distribution. The number of observations in the training set is n. The results of 100 Monte Carlo repetitions are shown in Table 2. From this table, we find that the estimation performance of both methods improves as the sample size n increases. These results show that the proposed method also works when the coefficient functions share the same zero subregions even though they differ elsewhere.
Table 2.
Results of Example 2 and 3 obtained from 100 Monte Carlo repetitions (with standard errors in parentheses).
| n | Method | CI (%) | CZ (%) | CN (%) | ISE0 (1e-3) | ISE1 (1e-3) | IAE0 (1e-3) | IAE1 (1e-3) |
|---|---|---|---|---|---|---|---|---|
| Example 2 | | | | | | | | |
| 300 | m-SLoS | 76.55 | 51.91 | 99.75 | 4.67 | 8.60 | 45.19 | 64.52 |
| | | (6.31) | (13.30) | (0.78) | (2.52) | (7.72) | (19.95) | (22.26) |
| | SLoS | 61.84 | 21.35 | 99.96 | 8.04 | 12.35 | 81.32 | 117.52 |
| | | (4.31) | (8.91) | (0.10) | (2.60) | (2.54) | (19.45) | (12.63) |
| 500 | m-SLoS | 87.21 | 73.65 | 99.99 | 2.11 | 8.55 | 10.83 | 54.72 |
| | | (4.05) | (8.36) | (0.14) | (0.75) | (3.83) | (4.60) | (14.31) |
| | SLoS | 72.55 | 43.40 | 100.00 | 3.91 | 8.84 | 43.33 | 86.28 |
| | | (8.50) | (17.52) | (0.00) | (1.03) | (1.76) | (12.31) | (10.15) |
| 1000 | m-SLoS | 88.57 | 76.43 | 100.00 | 2.01 | 7.82 | 9.95 | 49.65 |
| | | (2.94) | (6.07) | (0.00) | (0.51) | (1.46) | (3.74) | (7.31) |
| | SLoS | 76.86 | 52.29 | 100.00 | 3.41 | 8.06 | 36.73 | 77.95 |
| | | (7.29) | (15.03) | (0.00) | (0.84) | (1.19) | (9.45) | (7.33) |
| Example 3 | | | | | | | | |
| 300 | m-SLoS | 90.66 | 84.67 | 99.97 | 0.12 | 1.92 | 2.06 | 25.92 |
| | | (5.86) | (10.09) | (0.23) | (0.16) | (1.00) | (2.71) | (8.78) |
| | SLoS | 66.43 | 77.11 | 99.88 | 0.48 | 3.17 | 9.05 | 50.45 |
| | | (5.26) | (12.93) | (0.18) | (0.47) | (1.31) | (6.95) | (9.55) |
| 500 | m-SLoS | 93.29 | 88.69 | 99.97 | 0.12 | 1.57 | 2.05 | 21.36 |
| | | (3.65) | (6.30) | (0.24) | (0.14) | (0.91) | (2.67) | (7.39) |
| | SLoS | 66.51 | 78.11 | 99.93 | 0.32 | 2.22 | 7.77 | 41.07 |
| | | (4.65) | (10.86) | (0.15) | (0.28) | (0.65) | (5.25) | (6.20) |
| 1000 | m-SLoS | 96.03 | 93.14 | 99.98 | 0.05 | 1.11 | 0.69 | 16.25 |
| | | (1.73) | (2.76) | (0.17) | (0.06) | (0.76) | (0.85) | (6.62) |
| | SLoS | 71.42 | 90.12 | 99.96 | 0.09 | 1.59 | 2.75 | 33.22 |
| | | (1.07) | (2.93) | (0.12) | (0.08) | (0.43) | (1.17) | (4.26) |
Example 3: In this example, we generate bivariate functional data whose coefficient functions have similar zero subregions; one coefficient function is sparser than the other. We want to see whether m-SLoS causes the other coefficient function to be misestimated. The coefficient functions are shown in Figure 1(c), with β1(t) = 0 on [0, 0.2] ∪ [0.486, 0.771] and β2(t) = 0 on [0, 0.771]. The errors εij follow a normal distribution. The number of observations in the training set is n. The results of 100 Monte Carlo repetitions are shown in Table 2. They show that the estimation performance of both methods improves as the sample size n increases; the proposed method still works, but the improvement is slight compared with Example 2.
Example 4: In this example, we again generate bivariate functional data whose coefficient functions have similar zero subregions. The difference from Example 3 is that the coefficient functions are less sparse here. The motivation is to see whether differences on the nonzero subregions (especially in the sign of the coefficient functions) influence the estimates of the proposed method. The coefficient functions are shown in Figure 1(d), with β1(t) = 0 on [0, 0.2] ∪ [0.486, 0.771] and β2(t) = 0 on [0.486, 0.771]; the zero subregions of the βj are similar but not identical. The errors εij follow the N(0, σ2) distribution, and we consider different values of σ2 to compare SLoS and m-SLoS under different noise levels. The number of observations in the training set is n. The results of 100 Monte Carlo repetitions are shown in Table 3. The estimation performance of both methods improves as the sample size n increases and is better when the variance of εj is smaller. The improvement in CI is slight, which is not surprising since the zero subregions of the two coefficient functions are not the same.
Table 3.
Results of Example 4 obtained from 100 Monte Carlo repetitions (with standard errors in parentheses).
| σ² | n | Method | CI (%) | CZ (%) | CN (%) | ISE0 (1e-3) | ISE1 (1e-3) | IAE0 (1e-3) | IAE1 (1e-3) |
|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 300 | m-SLoS | 91.25 | 81.80 | 97.72 | 0.26 | 6.10 | 6.59 | 77.42 |
| | | | (4.89) | (11.28) | (2.55) | (0.30) | (3.02) | (5.04) | (17.10) |
| | | SLoS | 88.36 | 67.98 | 99.60 | 0.65 | 3.97 | 12.37 | 65.56 |
| | | | (5.79) | (15.87) | (0.71) | (0.52) | (1.35) | (7.80) | (10.03) |
| | 500 | m-SLoS | 93.16 | 86.97 | 97.60 | 0.15 | 1.73 | 2.64 | 25.10 |
| | | | (2.85) | (6.32) | (2.84) | (0.16) | (0.90) | (2.93) | (7.82) |
| | | SLoS | 88.48 | 68.74 | 99.72 | 0.49 | 2.79 | 10.94 | 53.63 |
| | | | (5.35) | (14.12) | (0.51) | (0.34) | (0.68) | (6.07) | (6.77) |
| | 1000 | m-SLoS | 94.78 | 88.18 | 98.74 | 0.10 | 1.22 | 1.61 | 18.95 |
| | | | (2.02) | (4.80) | (2.07) | (0.11) | (0.76) | (1.48) | (6.84) |
| | | SLoS | 94.08 | 83.04 | 99.81 | 0.18 | 2.00 | 5.03 | 43.48 |
| | | | (1.46) | (4.48) | (0.44) | (0.14) | (0.48) | (1.78) | (4.63) |
| 0.1 | 300 | m-SLoS | 87.68 | 72.56 | 97.15 | 0.98 | 13.70 | 13.80 | 125.40 |
| | | | (6.90) | (18.76) | (2.87) | (1.15) | (5.73) | (12.07) | (25.02) |
| | | SLoS | 82.91 | 53.63 | 98.99 | 2.53 | 13.15 | 28.04 | 121.76 |
| | | | (7.64) | (20.02) | (1.56) | (2.19) | (4.32) | (20.01) | (19.28) |
| | 500 | m-SLoS | 88.86 | 76.67 | 97.37 | 0.57 | 4.04 | 7.61 | 44.86 |
| | | | (5.74) | (14.27) | (2.99) | (0.74) | (1.59) | (8.87) | (11.37) |
| | | SLoS | 84.14 | 57.18 | 99.34 | 1.64 | 8.71 | 22.44 | 98.07 |
| | | | (6.31) | (16.97) | (1.07) | (1.18) | (3.25) | (12.81) | (16.3) |
| | 1000 | m-SLoS | 91.43 | 80.73 | 98.34 | 0.22 | 2.55 | 3.53 | 32.76 |
| | | | (4.19) | (10.52) | (2.50) | (0.25) | (1.29) | (3.51) | (10.47) |
| | | SLoS | 89.35 | 70.94 | 99.53 | 0.64 | 5.12 | 11.38 | 73.65 |
| | | | (4.05) | (11.22) | (0.87) | (0.45) | (2.02) | (5.60) | (12.47) |
From Tables 1–3 we can see that m-SLoS performs better than SLoS. Although CN is sometimes slightly smaller for m-SLoS, m-SLoS performs clearly better on the zero subregions: it gains accuracy on the zero subregions at the cost of a slight loss on the nonzero subregions, and this cost is small enough to be ignored.
4. Application
We applied the proposed method to Tecator data. The Tecator data are recorded by a Tecator near-infrared spectrometer (the Tecator Infratec Food and Feed Analyzer) which measures the spectrum of light transmitted through a sample of minced pork meat in the region 850 – 1050 nanometers (nm). Each sample contains finely chopped pure meat with different moisture, fat and protein contents. For each meat sample the data consist of a 100 channel spectrum of absorbances and the contents of moisture (water), fat and protein. The three contents, measured in percent, are determined by analytic chemistry. The total number of samples is 215. Figure 2 displays the 215 curves. This data can be found at http://lib.stat.cmu.edu/datasets/tecator. In this section, our aim is to predict the percentage of fat and protein content given the corresponding spectrometric curve.
Figure 2.
100 channel spectrum of absorbances for 215 curves
Fat and protein are both energy-yielding nutrients that the body can convert into energy, so we expect interconnections between these two responses. Hence, when predicting fat and protein, we assume that they are tightly connected, and under this assumption we can use multivariate regression to predict the fat and protein contents.

Based on the above discussion, there may exist spectral ranges that have no predictive power for either the fat or the protein content. Thus we can use m-SLoS to predict the fat and protein contents and to investigate which range of the spectrum has no predictive power for them. This could save energy, time and money, as there would be no need to record spectra over ranges without predictive power.
To compare the estimates and predictions of the proposed method with those of SLoS, we randomly choose 170 samples as the training set and use the remaining 45 samples as the testing set Q; we repeat this random split 100 times to assess the differences between the methods. The regularization parameters are selected by 5-fold cross-validation based on the training set. Given the regularization parameters chosen in this manner, we obtain the final estimates of the regression coefficients from the training set and calculate the average relative squared error (ARSE) on the testing set Q. Here
$$\mathrm{ARSE} = \frac{1}{|Q|} \sum_{i \in Q} \frac{ \left( y_i - \hat y_i \right)^2 }{ y_i^2 },$$
where yi is the true content and $\hat y_i$ is the prediction. From Figure 3, we can see that using m-SLoS greatly improves the prediction of both fat and protein.
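For completeness, a one-line R sketch of this measure, assuming ARSE is the relative squared error averaged over the test samples (an illustrative definition):

```r
# Average relative squared error over the test set
arse <- function(y_true, y_pred) mean((y_true - y_pred)^2 / y_true^2)
```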
Figure 3. Comparison of two methods in terms of ARSE.
(a) Comparison of SLoS and m-SLoS in terms of ARSE for the prediction of fat; (b) comparison of SLoS and m-SLoS in terms of ARSE for the prediction of protein. The results are averaged over 100 random partitions of the data. The box in each box plot shows the lower quartile, median, and upper quartile, and the whiskers show the range of ARSE over the 100 random partitions.
5. Discussion
Although the literature on FLR is abundant, little has been done on interpretability and locally sparse modeling. In this article, we consider a combination of the SLoS and Laplacian quadratic penalties as the penalty function. We call the proposed method "m-SLoS"; it uses the SLoS penalty to encourage local sparsity and the Laplacian quadratic penalty to promote similar local sparsity among coefficient functions associated with interconnected multivariate responses. Simulations and the data analysis show excellent numerical performance of the proposed method. We have focused on the least squares loss and the functional linear regression model; extensions to other models and to robust techniques are of interest for future study.
Acknowledgments
The authors gratefully acknowledge National Natural Science Foundation of China (11971404, 71471152), Humanity and Social Science Youth Foundation of Ministry of Education of China (19YJC910010), Fundamental Research Funds for the Central Universities (20720171095, 20720181003) and National Institutes of Health (CA216017).
References
- [1] Cuevas A, Febrero M, Fraiman R. Linear functional regression: the case of fixed design and functional response. Canadian Journal of Statistics. 2002;30(2):285–300.
- [2] Cardot H, Ferraty F, Mas A. Testing hypotheses in the functional linear model. Scandinavian Journal of Statistics. 2003;30(1):241–255.
- [3] Yao F, Müller HG, Wang JL. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association. 2005;100(470):577–590.
- [4] Yao F, Müller HG, Wang JL. Functional linear regression analysis for longitudinal data. The Annals of Statistics. 2005;33(6):2873–2903.
- [5] Müller HG, Stadtmüller U. Generalized functional linear models. The Annals of Statistics. 2005;33(2):774–805.
- [6] Ramsay J, Silverman B. Functional data analysis. New York: Springer-Verlag; 2005.
- [7] Li Y, Hsing T. On rates of convergence in functional linear regression. Journal of Multivariate Analysis. 2007;98(9):1782–1804.
- [8] Tu CY, Song D, Breidt FJ, et al. Functional model selection for sparse binary time series with multiple inputs. Economic Time Series: Modeling and Seasonality. 2012;477–497.
- [9] Wang H, Kai B. Functional sparsity: global versus local. Statistica Sinica. 2015;25:1337–1354.
- [10] James GM, Wang J, Zhu J, et al. Functional linear regression that's interpretable. The Annals of Statistics. 2009;37(5A):2083–2108.
- [11] Zhou J, Wang NY, Wang N. Functional linear model with zero-value coefficient function at sub-regions. Statistica Sinica. 2013;23(1):25–50.
- [12] Lin Z, Cao J, Wang L, et al. Locally sparse estimator for functional linear regression models. Journal of Computational and Graphical Statistics. 2016;26(2):306–318.
- [13] Breiman L, Friedman JH. Predicting multivariate responses in multiple linear regression. Journal of the Royal Statistical Society: Series B. 1997;59(1):3–54.
- [14] Rothman AJ, Levina E, Zhu J. Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics. 2010;19(4):947–962.
- [15] Peng J, Zhu J, Bergamaschi A, et al. Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. The Annals of Applied Statistics. 2010;4(1):53–77.
- [16] Rai P, Kumar A, Daume H. Simultaneously leveraging output and task structures for multiple-output regression. In: Advances in Neural Information Processing Systems. 2012;3194–3202.
- [17] Price BS, Sherwood B. A cluster elastic net for multivariate regression. Journal of Machine Learning Research. 2018;18:1–39.
- [18] Shi X, Jiao Y, Yang Y, et al. VIMCO: variational inference for multiple correlated outcomes in genome-wide association studies. Bioinformatics. 2019; doi:10.1093/bioinformatics/btz167.
- [19] Huang J, Ma S, Li H, et al. The sparse Laplacian shrinkage estimator for high-dimensional regression. The Annals of Statistics. 2011;39(4):2021–2046.
- [20] Shi X, Zhao Q, Huang J, et al. Deciphering the associations between gene expression and copy number alteration using a sparse double Laplacian shrinkage approach. Bioinformatics. 2015;31(24):3977–3983.
- [21] Wu C, Zhang Q, Jiang Y, et al. Robust network-based analysis of the associations between (epi)genetic measurements. Journal of Multivariate Analysis. 2018;168:119–130.



