Abstract
In this paper, we study high-dimensional multivariate logistic regression models in which a common set of covariates is used to predict multiple binary outcomes simultaneously. Our work is primarily motivated by biomedical studies with multiple correlated responses, such as the Cancer Cell Line Encyclopedia project. We assume that the underlying regression coefficient matrix is simultaneously low-rank and row-wise sparse. We propose an intuitively appealing selection and estimation framework based on the marginal model likelihood, and we develop an efficient computational algorithm for inference. We establish a novel high-dimensional theory for this nonlinear multivariate regression. Our theory is general, allowing for potential correlations between the binary responses. We propose a new type of nuclear norm penalty using the smoothly clipped absolute deviation, filling a gap in the related non-convex penalization literature. We show theoretically that the proposed estimator improves estimation accuracy by considering multiple responses jointly when the underlying coefficient matrix is low-rank and row-wise sparse. In particular, we establish non-asymptotic error bounds as well as rank and row-support consistency of the proposed method. Moreover, we develop a consistent rule to simultaneously select the rank and row dimension of the coefficient matrix. Furthermore, we extend the proposed methods and theory to a joint Ising model, which accounts for the dependence relationships among the responses. In our analyses of both simulated data and the Cancer Cell Line Encyclopedia data, the proposed methods predict responses better than existing methods.
Keywords: Logistic regression, Multiple binary responses, Low-rank models, Smoothly clipped absolute deviation, Generalized information criterion
1. Introduction
Logistic regression is the standard approach to analyzing the relationship between a binary response variable and a set of explanatory variables. For high-dimensional data, where the number of covariates often exceeds the sample size, a number of penalized methods (Lee and Silvapulle, 1988; Cessie and Houwelingen, 1992; Meier et al., 2008; Li et al., 2015) have been developed to estimate the model parameters. These methods have been applied to analyze biomedical data, such as cancer classification and gene interaction detection; see Zhu and Hastie (2004), Park and Hastie (2008), Hastie et al. (2009), and Hung and Wang (2012) for extensions.
Although most existing methods consider a single binary response, an increasing number of applications require predicting multiple binary outcomes from a common set of predictors (Chen and Huang, 2012; Martin et al., 2021). For multiple binary responses arising in applications such as the cancer cell line data analysis, where the treatment responses to multiple drugs are measured for each cell line, standard logistic regression can be applied by combining the multiple binary outcomes into a single categorical response (Greenlund et al., 2005; Hayes et al., 2006). The main limitation of this approach is that it may hide potential outcome-specific covariate effects, leading to inefficient estimates. Another approach is to fit a separate logistic model for each response (Koziol-McLai et al., 2001; Cumyn et al., 2009). Although the effects of the covariates on each response can then be distinguished, this approach ignores potential relationships among the multiple responses. In the Cancer Cell Line Encyclopedia (CCLE) data (Garnett et al., 2012), which are presented in detail in Section 4.1, the binary outcomes of multiple drugs measured on the same cell line are likely to be correlated (Fitzmaurice et al., 1995; France and Velanovich, 2009; Carpenter, 2010). Because multiple drug responses are available for each cancer cell line, there could be an information gain from jointly considering all the drug responses for the same cell line. Thus, one could consider multiple response logistic regression models that exploit the common gene expression information and the multiple drug response data derived from each cell line. Despite this potential advantage, multiple binary response regression based on a reduced set of predictors has not been rigorously investigated in the literature.
To address the limitations of existing methods for multiple binary responses, we develop a high-dimensional multivariate logistic regression framework that estimates the coefficient matrix under a structural assumption on this matrix. For the analysis of the CCLE data, it is reasonable to assume that the coefficient matrix has a certain structure; e.g., specific genes could have similar effects on multiple responses. Because the CCLE data are high-dimensional, owing to both the large number of genes and the number of drugs tested, standard statistical inference may be unstable. It is therefore important to develop methods that handle matrix parameter estimation more efficiently. In this study, we assume that the underlying coefficient matrix is both row-wise sparse and low-rank, to capture potential similarities among the effects of different drugs and thereby better predict treatment responses.
Such low-rank and row-wise sparse assumptions are commonly used in a wide variety of matrix estimation problems to reduce the effective number of model parameters and to achieve better statistical performance. Among the numerous approaches to low-rank matrix estimation, nuclear norm regularization is a popular one, used in multitask regression (Yuan et al., 2007; Bach, 2008; Rohde and Tsybakov, 2011; Negahban and Wainwright, 2011; Chen et al., 2013; She and Chen, 2017; Cho and Park, 2022), matrix completion (Candés and Recht, 2009; Keshavan et al., 2010; Koltchinskii et al., 2011; Negahban and Wainwright, 2012), and collaborative filtering (Srebro et al., 2005). The nuclear norm is the sum of the singular values of a matrix, so nuclear norm regularization encourages many of the singular values of the estimated matrix to be exactly zero; i.e., it leads to a low-rank estimate. Chen et al. (2013) proposed an adaptive nuclear norm penalization approach for high-dimensional multivariate linear regression.
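To make the low-rank mechanism concrete, the following Python sketch (using NumPy; the function name is our own) computes the proximal operator of the nuclear norm, which is the well-known singular value soft-thresholding map: every singular value is shrunk by the penalty level and negative values are set to zero, so small singular values vanish and the rank drops. It illustrates plain nuclear norm shrinkage only, not the SCAD-based penalty proposed in this paper.

```python
import numpy as np


def singular_value_soft_threshold(A, lam):
    """Proximal operator of lam * ||A||_* (the nuclear norm).

    Shrinks every singular value of A by lam and truncates at zero,
    so singular values smaller than lam become exactly zero.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    # Scale the columns of U by the shrunk singular values, then recombine.
    return (U * s_shrunk) @ Vt


rng = np.random.default_rng(0)
# A rank-2 signal plus small noise: the noisy matrix is (almost surely) full rank.
signal = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 10))
noisy = signal + 0.1 * rng.normal(size=(30, 10))

shrunk = singular_value_soft_threshold(noisy, lam=2.0)
print(np.linalg.matrix_rank(noisy), np.linalg.matrix_rank(shrunk))
```

The threshold lam plays the role of the regularization parameter: singular values attributable to noise fall below it and are removed, while large singular values are kept but shrunk by the same amount, which is the bias that non-convex penalties such as SCAD aim to reduce.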
Another line of work, called reduced-rank regression (RRR) (Izenman, 1975; Reinsel and Velu, 1998), restricts the rank of the regression coefficient matrix; it has proven effective for predicting multiple responses from common predictors. Minimax optimal estimation of this rank-deficient coefficient matrix in RRR has been studied in the literature (Bunea et al., 2011; Giraud, 2011; Bunea et al., 2012; She, 2017; Bing and Wegkamp, 2019; She and Tran, 2019). Bunea et al. (2011) proposed a rank selection criterion for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. She (2017) proposed selective RRR for constructing optimal explanatory factors. She and Tran (2019) developed a predictive information criterion that achieves minimax optimality when the coefficient matrix is both reduced-rank and row-wise sparse. For variable selection methods in RRR, see Turlach et al. (2005), Chun and Keles (2010), and Peng et al. (2010). Most related studies, including the papers cited above, have focused on linear regression problems with continuous multivariate responses. For multivariate categorical responses, the class of vector generalized linear models has been widely used, but it has been examined only in computational and applied studies, e.g., Yee (2015).
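For reference, here is a minimal sketch of classical least-squares reduced-rank regression with identity weighting, for a continuous response matrix: fit ordinary least squares, then project the coefficients onto the leading r right singular vectors of the fitted values. This assumes n > p and a known rank r. It is the textbook linear RRR estimator, not the logistic method developed here, and the function name and simulated data are illustrative only.

```python
import numpy as np


def reduced_rank_regression(X, Y, r):
    """Classical least-squares reduced-rank regression (identity weighting).

    Fits ordinary least squares, then projects the coefficient matrix onto
    the top-r right singular vectors of the fitted values X @ B_ols.
    Assumes n > p so that X has full column rank.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted = X @ B_ols
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:r].T                  # q x r leading right singular vectors
    return B_ols @ V_r @ V_r.T      # p x q coefficient matrix of rank at most r


rng = np.random.default_rng(1)
n, p, q, r = 200, 8, 6, 2
B_true = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))
X = rng.normal(size=(n, p))
Y = X @ B_true + 0.5 * rng.normal(size=(n, q))

B_hat = reduced_rank_regression(X, Y, r)
```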
In this paper, we establish high-dimensional multivariate logistic regression and general asymptotic theory under low-rank and row-wise sparse assumptions. This facilitates suitable statistical estimation and inference for multiple binary responses, as in the CCLE data application. To the best of our knowledge, this study is one of the first to develop rigorous multivariate regression theory for discrete responses such as binary variables. We propose new non-convex penalized estimation methods under both marginal and joint model set-ups, and we conduct novel theoretical analyses under weak dependence among the binary responses. The proposed penalization uses sums of Smoothly Clipped Absolute Deviation (SCAD) (Fan and Li, 2001) transformations of the singular values and of the Euclidean norms of the rows, respectively. The first part, i.e., the sum of the SCADs of the singular values, can be interpreted as a non-convex variant of the nuclear norm penalization. Despite the popularity of SCAD for parsimonious model selection, such non-convex nuclear norm penalizations via SCAD have not been studied in the high-dimensional matrix literature. For an overview of non-convex penalization research, see Fan and Lv (2010) and references therein. It is worth noting that most of the existing literature on non-convex penalties analyzes coordinate-separable penalty functions, i.e., penalties that decompose into a sum of univariate functions of the individual coefficients; e.g., see Loh and Wainwright (2014, 2017). In contrast, our penalty functions for rank selection are not coordinate-separable, which poses challenges in developing theoretical properties. By addressing the theoretical and computational issues arising from the non-convexity of the nuclear norm penalization, we prove that the proposed optimization has a unique stationary point, which avoids the potential problem of multiple local minima in non-convex optimization. That is, the global minimizer is indeed the unique stationary point of the proposed non-convex problem in multivariate logistic regression models.
We establish high-dimensional statistical theory for the unique stationary point $\hat{B}$. Specifically, we show that $\hat{B}$ enjoys a tight estimation error bound in terms of an operator norm with a rate of , where $S$ represents the index set of non-zero rows of the matrix $B^*$, and q is the number of responses. Additionally, the developed method recovers the true rank of the underlying coefficient matrix and the true row support set S simultaneously. Moreover, we propose a new data-adaptive selection criterion to choose the rank and row dimension of a coefficient matrix simultaneously, and we prove that it yields an estimate with both rank consistency and row support consistency. Compared with the theoretical analyses in the existing high-dimensional multivariate regression literature (Chen and Huang, 2012; Bunea et al., 2012; She and Chen, 2017; Bing and Wegkamp, 2019; She and Tran, 2019; Zou et al., 2021), our results are new in that we consider discrete multiple responses, i.e., multivariate logistic regression, and a non-convex version of the nuclear norm penalization, and we develop a new data-adaptive rule for the consistent simultaneous selection of the rank and the non-zero rows of a matrix parameter. For detailed comparisons with the existing literature, we refer to Sections 2.2.1 and 3.4. Further, to solve the proposed constrained non-convex optimization, we develop an alternating direction method of multipliers (ADMM) algorithm (Boyd et al., 2011).
The remainder of this paper is organized as follows. In Section 2, we introduce the proposed marginal and joint models, respectively, to predict multiple binary responses simultaneously, and develop statistical procedures and theory. In Section 3, we perform simulation studies to evaluate the finite sample performance of the developed methods. In Section 4, we apply the developed methods to the CCLE data to illustrate the advantage of our method over the existing methods. All the technical proofs are deferred to the Supplementary materials.
2. Model and Method
2.1. Marginal Logistic Regression Models
We consider a high-dimensional multivariate marginal logistic regression in which, for units i = 1, …, n with a covariate vector $x_i \in \mathbb{R}^p$, there exist q binary responses $y_{ik} \in \{0, 1\}$ for $k = 1, \dots, q$ satisfying

$$\log\frac{\pi_k(x_i)}{1 - \pi_k(x_i)} = x_i^\top \beta_k^*, \qquad k = 1, \dots, q, \qquad (1)$$

where $\pi_k(x_i) = P(y_{ik} = 1 \mid x_i)$ is the conditional probability of $y_{ik} = 1$ given $x_i$, and both dimensions p and q are allowed to increase with n. Such datasets are encountered in an increasing number of biomedical studies. For example, in the CCLE data, the gene expression levels $x_i$ and the treatment responses $y_{ik}$ for drugs k = 1, …, 24 were measured for each of the 482 cell lines i = 1, …, 482. As the 24 treatment outcomes are likely to be correlated within a cell line, we need to model the multiple responses jointly to exploit the information in the common gene expression. We also allow dependence between responses; i.e., the responses $y_{ik}$, k = 1, …, q, may satisfy $\operatorname{Cov}(y_{ik}, y_{ik'} \mid x_i) \neq 0$ for some $k \neq k'$.
With multiple responses, the model parameters can be written in matrix form, namely $B^* = (\beta_1^*, \dots, \beta_q^*) \in \mathbb{R}^{p \times q}$, and we impose both low-rank and row-wise sparse constraints on this high-dimensional matrix $B^*$. That is, we assume that the row support set of $B^*$ is $S = \{1, \dots, s\}$ with $|S| = s$, and that $\operatorname{rank}(B^*) = r$. These assumptions are plausible in the CCLE data application: some responses (drugs) may be similar, i.e., their associations with the covariates (gene expression levels) are similar, and only a few variables (genes) are associated with the responses (drugs), i.e., only the first s rows of $B^*$ can have non-zero values. The low-rank assumption can also be interpreted as saying that the coefficient vectors $\beta_k^*$ lie in a lower-dimensional subspace (Rohde and Tsybakov, 2011; Bunea et al., 2012; Chen and Ye, 2014). In our theoretical analyses, the number of covariates p, the true row model size s, and the number of responses q are allowed to increase with the sample size n.
For the matrix notation, define the binary response matrix $Y = (y_{ik}) \in \{0, 1\}^{n \times q}$ and the covariate matrix $X = (x_1, \dots, x_n)^\top \in \mathbb{R}^{n \times p}$. Let $\ell_k(\beta_k)$ be the negative log-likelihood function for the conditional distribution of each drug response $y_k$ given the covariates $x_i$. Following Meier et al. (2008), we assume that the probability in Model (1) satisfies
| (2) |
for some so that we can consider settings where the probability approaches zero or one by allowing . Note that (2) is equivalent to or by (1) where for a matrix A. Because , where is the jth row of B*, the inequality is a sufficient condition for (2). Motivated by this, we consider the following feasible set: for some b > 0.
One could estimate the coefficient matrix B* by using the likelihood principle, i.e., minimizing the negative marginal likelihood , subject to low-rank and row-sparse space constraints on B, i.e., , where and are pre-specified. Denote this estimator, that is, the constrained Maximum Likelihood Estimator (MLE) by . The negative logistic likelihood is obtained under the assumption that the multiple binary variables are independent. It is also used as the marginal negative likelihood for the models allowing for correlated multiple binary responses such as in the CCLE data application. Usually, it is quite complicated to model the dependence relationships between variables, in particular in high dimension as in our CCLE application. If the (conditional) marginal distribution for k = 1, …, q is of major interest, the marginal likelihood may be conveniently used, which has been employed in various problems with dependent variables (Wilson, 1989; Welsh et al., 2002; Jentsch et al., 2021; Lee and Park, 2021). Refer to Section 2.4 for the joint model setting of the multiple responses involving the dependence structures.
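The marginal negative log-likelihood can be sketched as below, treating the q responses as if they were independent. The normalization (averaging each $\ell_k$ over the n units) and the function names are our assumptions, since the exact scaling does not affect the argument. The gradient $X^\top(\Pi - Y)/n$, with $\Pi$ the matrix of fitted probabilities, follows from the logistic form of Model (1).

```python
import numpy as np


def marginal_neg_loglik(B, X, Y):
    """Sum over responses of the logistic negative log-likelihoods.

    Assumed (hypothetical) normalization: each l_k is averaged over the n units,
    l_k(b) = mean_i[ log(1 + exp(x_i' b)) - y_ik * x_i' b ].
    The responses are treated as if independent (marginal / working likelihood).
    """
    eta = X @ B                                   # n x q linear predictors
    # np.logaddexp(0, eta) = log(1 + exp(eta)), computed without overflow
    per_entry = np.logaddexp(0.0, eta) - Y * eta
    return per_entry.mean(axis=0).sum()


def marginal_gradient(B, X, Y):
    """Gradient of marginal_neg_loglik with respect to B (a p x q matrix)."""
    n = X.shape[0]
    # Numerically stable logistic function: 1 / (1 + exp(-eta)).
    prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ B)))
    return X.T @ (prob - Y) / n


rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
Y = (rng.random((50, 3)) < 0.5).astype(float)
B0 = np.zeros((4, 3))
```

At B = 0 every fitted probability equals 1/2, so each of the q averaged terms equals log 2; the gradient can be checked against a central finite difference.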
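Because it is computationally infeasible to search over ranks and row spaces jointly, especially in high dimensions, in this work we develop a novel penalized approach that simultaneously chooses the matrix rank and row space and estimates the coefficient matrix. More specifically, we minimize the negative marginal log-likelihood plus a low-rank penalty on the singular values of B and a row-sparsity penalty on the Euclidean norms of the rows of B:

$$\hat{B} = \operatorname*{arg\,min}_{B \in \Omega} \Big\{ \sum_{k=1}^{q} \ell_k(B_{\cdot k}) + \sum_{j=1}^{\min(p,q)} p_{\lambda_1}\big(\sigma_j(B)\big) + \sum_{j=1}^{p} p_{\lambda_2}\big(\|B_{j\cdot}\|_2\big) \Big\}, \qquad (3)$$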
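where $\lambda_1, \lambda_2 > 0$ are regularization parameters, $\sigma_j(B)$ is the jth largest singular value of B, $B_{j\cdot}$ is the jth row of B, and the $p_\lambda$'s are SCAD functions (Fan and Li, 2001) with parameters $a > 2$ and $\lambda > 0$:

$$p_{\lambda}(t) = \begin{cases} \lambda t, & 0 \le t \le \lambda, \\[4pt] \dfrac{2a\lambda t - t^2 - \lambda^2}{2(a - 1)}, & \lambda < t \le a\lambda, \\[4pt] \dfrac{(a + 1)\lambda^2}{2}, & t > a\lambda. \end{cases}$$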
We set a = 3.7, as suggested by Fan and Li (2001) and Tang et al. (2022), which is known to be optimal based on cross-validated empirical studies (Loh and Wainwright, 2014). Two different penalty terms are included in (3). The first, $\sum_{j} p_{\lambda_1}(\sigma_j(B))$, aims to produce a low-rank estimate. The second, $\sum_{j=1}^{p} p_{\lambda_2}(\|B_{j\cdot}\|_2)$, aims to produce a row-wise sparse estimate. Then, S and r are estimated by the row support set and the rank of the estimate $\hat{B}$, i.e.,

$$\hat{S} = \{ j : \hat{B}_{j\cdot} \neq 0 \}, \qquad \hat{r} = \operatorname{rank}(\hat{B}).$$
Note that the proposed optimization (3) is non-convex because of the SCAD penalty terms, so it may have multiple local minima. In Subsection 2.2, we prove that (3) in fact has a unique stationary point under some conditions, i.e., the global minimum is the unique stationary point. We also derive an estimation error bound for the proposed estimator and establish its rank consistency and row-selection consistency.
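The SCAD function with a = 3.7 and the two penalty terms in (3) can be evaluated numerically as in the sketch below. It also includes the univariate SCAD thresholding rule of Fan and Li (2001), i.e., the minimizer of $\tfrac12(z-\theta)^2 + p_\lambda(|\theta|)$ for a > 2, applied to singular values and to row norms. These proximal maps are the kind of building block an ADMM or proximal-gradient solver for (3) could use; for the singular-value version we rely on the standard argument for penalties that depend on B only through its singular values, which applies because the scalar rule is non-decreasing. This is a sketch under those assumptions, not the authors' implementation, and all function names are hypothetical.

```python
import numpy as np


def scad(t, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated at |t| (vectorized)."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


def low_rank_penalty(B, lam1, a=3.7):
    """First penalty in (3): sum of SCAD values of the singular values of B."""
    return scad(np.linalg.svd(B, compute_uv=False), lam1, a).sum()


def row_sparse_penalty(B, lam2, a=3.7):
    """Second penalty in (3): sum of SCAD values of the row Euclidean norms."""
    return scad(np.linalg.norm(B, axis=1), lam2, a).sum()


def scad_threshold(z, lam, a=3.7):
    """Minimizer of 0.5 * (z - theta)**2 + SCAD(theta; lam, a), for a > 2."""
    z = np.asarray(z, dtype=float)
    absz, sgn = np.abs(z), np.sign(z)
    soft = sgn * np.maximum(absz - lam, 0.0)          # |z| <= 2 * lam
    middle = ((a - 1) * z - sgn * a * lam) / (a - 2)  # 2 * lam < |z| <= a * lam
    return np.where(absz <= 2 * lam, soft, np.where(absz <= a * lam, middle, z))


def scad_singular_value_threshold(A, lam, a=3.7):
    """Apply the SCAD threshold to every singular value of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * scad_threshold(s, lam, a)) @ Vt


def scad_row_threshold(A, lam, a=3.7):
    """Shrink each row of A along its own direction via the SCAD threshold of its norm."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    new_norms = scad_threshold(norms, lam, a)
    # Rows with zero norm stay zero (avoid dividing by zero).
    scale = np.divide(new_norms, norms, out=np.zeros_like(norms), where=norms > 0)
    return A * scale
```

Unlike the soft-thresholding of the nuclear norm, the SCAD rule leaves singular values (or row norms) larger than $a\lambda$ untouched, which is the near-unbiasedness property that motivates SCAD.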
2.2. Assumptions and Results
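Throughout the theoretical considerations, let $\operatorname{vec}(\cdot)$ denote the transformation of a matrix into a vector obtained by stacking its columns, and let $\operatorname{vec}^{-1}(\cdot)$ be its inverse operation. For a function f, let ∇f denote a gradient or subgradient, if it exists. Let for . We can view as . Further, we let , which is a p × q matrix. Define a function . Thus, and . Throughout the paper, $\|A\|_{\mathrm{op}}$ represents the operator norm of a matrix A, i.e., the maximum singular value of A, and $\lambda_{\max}(B)$ is the maximum eigenvalue of a symmetric matrix B. We write $a_n \lesssim b_n$ if $a_n \le C_1 b_n$ for some positive constant $C_1$, write $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$, and use and to denote and , respectively. For a matrix A, let be the index set of non-zero rows of A, and be the index set of non-zero entries of A. For a vector x, let $\|x\|_0$ be the number of non-zero elements of x. For a matrix , let , and . For a matrix A and index sets of rows and columns, say $S_1$ and $S_2$, let $A_{S_1,\cdot}$ and $A_{\cdot,S_2}$ be the sub-matrices of A with rows in $S_1$ and columns in $S_2$, respectively. Let and be the ranks of X and $X_{\cdot,S}$, respectively. A stationary point of the optimization (3) is any point $\tilde{B}$ in the feasible set Ω satisfying

$$\Big\langle \nabla\Big\{\sum_{k=1}^{q}\ell_k(\tilde{B}_{\cdot k}) + \sum_{j}p_{\lambda_1}\big(\sigma_j(\tilde{B})\big) + \sum_{j=1}^{p}p_{\lambda_2}\big(\|\tilde{B}_{j\cdot}\|_2\big)\Big\},\; B - \tilde{B}\Big\rangle \ge 0$$

for all B ∈ Ω, where $\langle A, C \rangle = \operatorname{tr}(A^\top C)$. Let $B^* = UDV^\top$ be the compact singular value decomposition of $B^*$, i.e., $U \in \mathbb{R}^{p \times r}$ and $V \in \mathbb{R}^{q \times r}$ have orthonormal columns and $D$ is the $r \times r$ diagonal matrix of the non-zero singular values of $B^*$. Let $U_\perp$ and $V_\perp$ denote any orthogonal complements of U and V, respectively. Let , where represents the Fisher information matrix related to the kth response.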
We make the following assumptions to facilitate our technical derivations.
Assumption 1. [On the likelihood] It holds that for some positive constants αk and γk,
where .
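Assumption 2. [On the correlations between the drug responses] Denote the conditional covariance matrix of the multiple responses $(y_{i1}, \dots, y_{iq})^\top$ given $x_i$ by $\Sigma_x$. Then, the largest eigenvalue of $\Sigma_x$ is uniformly bounded, i.e., $\lambda_{\max}(\Sigma_x) \le q_y$ for some constant $q_y > 0$.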
Assumption 3. [On the coefficient matrix] The underlying coefficient matrix B* satisfies , and and, where .
Assumption 4. [On the eigenvalues] The eigenvalues of are bounded from above by some constant , i.e., .
Assumption 5. [On the covariates] For , it holds that .
Assumption 1 is referred to as a restricted strong convexity condition, which is also imposed in Raskutti et al. (2010) and Loh and Wainwright (2014, 2017), where αk represents a curvature parameter and γk represents a tolerance parameter for k = 1, 2. It is related to the lower and upper bounds of the conditional probabilities $\pi_k(x_i)$, and it holds when the $\pi_k(x_i)$ are uniformly bounded away from zero and one. Assumption 2 is a dependence condition imposed on the covariance matrix $\Sigma_x$. Note that each diagonal entry of $\Sigma_x$ satisfies $\operatorname{Var}(y_{ik} \mid x_i) = \pi_k(x_i)\{1 - \pi_k(x_i)\} \le 1/4$ by simple calculations. If the responses $y_{ik}$ for k = 1, …, q are uncorrelated given $x_i$, then $\Sigma_x$ is diagonal and Assumption 2 is satisfied, e.g., by taking $q_y = 1/4$. More generally, it holds if the covariance matrix Σx is sparse in the sense that there exists a function satisfying and for any , and x. Thus, Assumption 2 is fulfilled for weakly dependent binary variables such as m-dependent and autoregressive logistic processes (e.g., Sim, 1993). Assumption 3 guarantees the feasibility of the underlying B* and imposes beta-min type conditions on the non-zero singular values and the non-zero rows of B*, respectively. Similar conditions are also imposed in Bunea et al. (2012), Chen et al. (2013), and She and Tran (2019). Assumption 4 is an eigenvalue condition commonly assumed in the literature on high-dimensional models (Zheng et al., 2015). Assumption 5, which restricts the correlations between relevant and irrelevant covariates, is also imposed, in slightly modified form, in Zheng et al. (2015) to prove the oracle property of their estimator.
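The variance bound behind the discussion of Assumption 2 can be checked numerically. The sketch below (hypothetical helper names) builds the conditional covariance of binary responses that are independent given x, whose largest eigenvalue is $\max_k \pi_k(1-\pi_k) \le 1/4$. It also builds a two-response example in which a joint probability $P(y_1 = y_2 = 1)$ above $p_1 p_2$ adds a covariance c; for two responses with equal variance v, the largest eigenvalue becomes v + |c|, so bounded dependence inflates it only by a bounded amount.

```python
import numpy as np


def conditional_cov_independent(pi):
    """Covariance of q binary responses that are independent given x."""
    pi = np.asarray(pi, dtype=float)
    return np.diag(pi * (1.0 - pi))


def two_binary_cov(p1, p2, p11):
    """Covariance of (y1, y2) with P(y1=1)=p1, P(y2=1)=p2, P(y1=y2=1)=p11."""
    c = p11 - p1 * p2
    return np.array([[p1 * (1 - p1), c], [c, p2 * (1 - p2)]])


pi = np.array([0.05, 0.3, 0.5, 0.9])
Sigma_x = conditional_cov_independent(pi)
largest = np.linalg.eigvalsh(Sigma_x).max()   # attained at pi = 0.5

Sigma_dep = two_binary_cov(0.5, 0.5, 0.4)      # covariance 0.15 between y1 and y2
largest_dep = np.linalg.eigvalsh(Sigma_dep).max()
```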
Note that the proposed method enjoys rank consistency without the following irrepresentable condition: , where . Rank consistency based on the conventional nuclear norm penalization is known to require an additional irrepresentable condition, as shown in Bach (2008) and Kong et al. (2020). Our results show that the proposed SCAD-type penalty on the singular values guarantees rank consistency even in the absence of the irrepresentable condition, illustrating a theoretical advantage of SCAD in the rank selection context. A similar advantage of SCAD over ℓ1 norm penalization (Lasso) has also been investigated in high-dimensional univariate regression (e.g., Loh and Wainwright, 2014, 2017). The following theorem presents the theoretical properties of the proposed method.
Theorem 1. Suppose that Assumptions 1-5 hold, , and . If $\lambda_1$ and $\lambda_2$ satisfy and , then the program (3) has a unique stationary point $\hat{B}$, and it satisfies
Theorem 1 shows that, if the penalty parameters are appropriately chosen, the proposed method achieves the same convergence rate in terms of the Frobenius norm as the oracle estimator , i.e., the constrained MLE using the true rank r and row support set S. See Theorem S1 in the Supplementary materials for theoretical properties of the constrained MLE. Moreover, the theorem implies that the method leads to consistent simultaneous selection of the rank and of the row space of relevant covariates. Although the proposed optimization (3) is non-convex, there exists a unique solution; more specifically, the proposed estimate is the unique stationary point of the non-convex problem (3). It is worth noting that the growth rate conditions and are required only to prove the uniqueness of the stationary point, as in Kim and Kwon (2012) and Loh and Wainwright (2014, 2017), and rank consistency, as in Bach (2008) and Kong et al. (2020). A similar condition can also be found in Loh and Wainwright (2017) in high-dimensional univariate regression. Without this growth rate condition, only local optimality is guaranteed for . Such growth rate conditions have been assumed in many published studies on the global optimality of non-convex penalized methods in other settings (Fan and Li, 2001; Wang et al., 2012; Loh and Wainwright, 2014).
2.2.1. Comparisons with related results
Here, we provide detailed comparisons of theoretical error rates with the related literature, including existing multivariate reduced-rank regression papers and extensions of some penalizations to our multiple logistic setting. We note that a main goal of our study is to achieve selection consistency (of the rank and the row support set), which generally requires stronger assumptions than attaining optimal estimation rates.
Even though our main interest, penalization techniques, and model set-ups are different, we consider estimation of XB, i.e., , which is the main focus of some literature (Giraud, 2011; Bunea et al., 2012) on multivariate response linear regression. Bunea et al. (2012) achieved an error rate of , and She (2017) and She and Tran (2019) achieved an error rate of , under the setting that the underlying coefficient B* is low-rank and row-wise sparse. Theorem 1 implies that our estimator satisfies , which is tighter than the rates obtained in the aforementioned literature because Theorem 1 assumes stronger conditions. If we are interested in estimation consistency instead of selection consistency, a slight modification of the proposed method can lead to , as in She (2017) and She and Tran (2019), under milder conditions. We refer to Theorem S2 in the Supplementary materials for more details. Under the setting that the underlying coefficient B* is only low-rank, Bunea et al. (2011) and Bing and Wegkamp (2019) achieved a minimax error rate of . It is worth noting that the obtained error bound corresponds to this minimax error rate when only the relevant covariates in X, i.e., $X_{\cdot,S}$, are included in the low-rank matrix estimation problem.
Theorem 1 also implies that the proposed method has theoretical advantages over extensions of some existing penalizations to our multivariate response logistic regression setting: regular logistic regression with the ℓ1 penalty (RL) applied to individual responses, and multiple response logistic regression with group Lasso and nuclear norm penalties (MLL), both of which are also considered in the simulation analysis in Section 3. Using existing theory for the Lasso, we can prove that RL achieves . This implies that the proposed method provides a tighter estimation error bound because it always holds that . In particular, if the rank r is small, i.e., , the proposed method can provide a substantial rate improvement over RL by considering multiple responses jointly through the low-rank and row-wise sparse structural assumption. Additionally, by modifying the arguments used in Theorem 1, it can be shown that MLL attains the same rate , but only under the additional irrepresentable condition mentioned in Subsection 2.2. Refer to Section 3 for numerical comparisons with RL and MLL.
2.3. Simultaneous selection of rank and row space
We now propose a data-dependent way to select the regularization parameters λ1 and λ2 for the proposed method (3), and we prove that the proposed selection criterion jointly chooses the underlying rank r and row support set S with probability tending to one. Recently, the generalized information criterion (GIC), which includes the Bayesian information criterion as a special case, has been used to select the tuning parameters of penalized methods for variable selection in high-dimensional models, e.g., Chen and Chen (2008), Fan and Tang (2013), Lee et al. (2014), and Zheng et al. (2015). Accordingly, we define an information criterion suited to our simultaneous choice problem by modifying the GIC; see also Nishii (1984). Let $\hat{B}(\lambda_1, \lambda_2)$ be the proposed estimate using penalty parameters $\lambda_1$ and $\lambda_2$. We propose to select $\lambda_1$ and $\lambda_2$ as the minimizers of the following information criterion:
| (4) |
where and . Here, r(s + q) and sq are the degrees of freedom corresponding to the choices of the rank r and the row sparsity s, respectively. Similar degrees-of-freedom computations appear in the literature, e.g., Bai and Ng (2002) and Huang et al. (2012).
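The two complexity terms r(s + q) and sq can be read off an estimate, as in the sketch below. Because the exact form of (4) and its weights are not reproduced here, `gic_sketch` simply adds hypothetical weights a_n and b_n times these terms to a loss value, and `select_tuning` picks the (λ1, λ2) pair with the smallest score from a dictionary of already fitted estimates. All names, the numerical tolerance, and the weights are assumptions for illustration.

```python
import numpy as np


def estimated_rank_and_support(B_hat, tol=1e-8):
    """Estimated rank and row support of a coefficient estimate."""
    row_norms = np.linalg.norm(B_hat, axis=1)
    support = np.flatnonzero(row_norms > tol)
    singular_values = np.linalg.svd(B_hat, compute_uv=False)
    rank = int(np.sum(singular_values > tol))
    return rank, support


def degrees_of_freedom(B_hat, tol=1e-8):
    """The two complexity terms r(s + q) and s * q from the text."""
    q = B_hat.shape[1]
    r, support = estimated_rank_and_support(B_hat, tol)
    s = support.size
    return r * (s + q), s * q


def gic_sketch(loss, df_rank, df_row, a_n, b_n):
    """Hypothetical GIC-type score: loss plus weighted complexity terms."""
    return loss + a_n * df_rank + b_n * df_row


def select_tuning(candidates, loss_fn, a_n, b_n):
    """candidates: dict mapping (lam1, lam2) -> fitted B_hat (from some fitter)."""
    def score(item):
        B_hat = item[1]
        df_rank, df_row = degrees_of_freedom(B_hat)
        return gic_sketch(loss_fn(B_hat), df_rank, df_row, a_n, b_n)
    return min(candidates.items(), key=score)[0]


B_example = np.zeros((5, 3))
B_example[0] = [1.0, 2.0, 3.0]
B_example[1] = [2.0, 4.0, 6.0]   # proportional to row 0, so the rank is 1
```

For `B_example`, the estimated rank is 1 and two rows are non-zero, so the two terms are 1 × (2 + 3) = 5 and 2 × 3 = 6.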
For selection of the row support set S, we set an upper bound U(n) on the size of the row support set, with s < U(n) < p; i.e., we only consider B satisfying . Similar restrictions on the model dimension have been widely used in related studies, e.g., Fan and Tang (2013) and Zheng et al. (2015). For d ≤ p, define . It holds that . Following Zheng et al. (2015), let Γmax be the uniform upper bound of the eigenvalues of with , i.e., . To discuss the selection consistency of the proposed information criterion (4), we introduce some notation for misspecified models for (r, S), that is, overfitted and underfitted models. We define an overfitted model when , where
Similarly, we define an underfitted model when , where
Theorem 2 implies that λ1 and λ2 chosen by the proposed information criterion provide an estimate with the correct rank r and the correct row support set S with high probability.
Theorem 2. Suppose that the conditions in Theorem 1,
Let be the proposed estimate using the optimal penalty parameters and satisfying the conditions in Theorem 1. If , and , we have, as $n \to \infty$,
Theorem 2 suggests that wide ranges of and work in theory. This includes the following specifications as special cases: . In this case, the GIC in (4) can be written as
| (5) |
In our simulation and real data examples, we set and we observed that these choices perform well.
The beta-min (minimum signal strength) conditions and the restrictions on the rank r and the sparsity parameter s in Theorem 2 are essential for rank and row selection consistency of the GIC-type penalty. Note that She (2017) and She and Tran (2019) proposed a new class of information criteria called PIC, which achieved the minimax optimal error rate without the beta-min conditions. However, they also require the beta-min conditions to guarantee the row support recovery via PIC as in Theorem 5 of She and Tran (2019). Among different types of PIC, we choose the following criterion to compare it with our GIC:
for some absolute constants A1, A2 > 0, which can be determined by Monte Carlo experiments. Following She and Tran (2019), we set A1 = 2 and A2 = 1.8, and we verified that this specification works well in our numerical experiments. For comparisons of the proposed GIC and PIC via simulation analyses, see Subsection 3.4.
2.4. Joint Ising models for multiple binary responses
In this subsection, we consider joint Ising models (Ising, 1925) for multivariate binary data. In addition to the marginal effects modeled in Section 2.1, these models account for conditional dependence relationships among the outcome variables. Following Cheng et al. (2014), we assume a sparse covariate-dependent Ising model to study both the conditional dependence within the binary data and its relationship with the additional covariates. Let , where . We assume that the log-odds depend on the covariates through
| (6) |
Compared with model (1), model (6) includes the additional term , which captures dependencies between the responses. Let
Specifically, the vector being zero implies that and are conditionally independent given x, and for , and implies that the conditional association between and does not depend on xij (Cheng et al., 2014). We assume that with , and with . Note that a sufficient condition for (2) is and for all . Because and , satisfying and for all k is a sufficient condition for (2). Thus, we consider the following feasible set: for some b > 0, where . Let be the negative log-likelihood function for conditional distribution of each drug response given xi and . We can consider the following optimization:
| (7) |
where are regularization parameters, and Θjk is the (j, k)th component of a matrix Θ. For a penalty function , we consider the SCAD functions. For choices of , we consider the following GIC, which generalizes the GIC in (5):
where , and are the estimates of s, r, and f obtained from and .
The following theorem presents the theoretical properties of the proposed penalization incorporating the Ising models. We make a new assumption for the joint Ising models that replaces Assumption 1 in the marginal models. Assumption S1, presented in the Supplementary materials, is the restricted strong convexity condition for the Ising models, which corresponds to Assumption 1 for the proposed marginal models. Let , where represents the Fisher information matrix related to the kth response.
Theorem 3. Suppose Assumptions 2-5 and Assumption S1 hold, and . If , and satisfy , , and , then the program (7) has a unique stationary point , and it satisfies
Theorem 3 shows that if the penalty parameters are appropriately chosen, then the proposed method incorporating the Ising models leads to consistent simultaneous selections of the rank , row space , and support set . Although the optimization (7) is non-convex, the proposed estimate is the unique stationary point of the non-convex problem (7).
3. Simulation Study
3.1. Methods for Comparison
We conduct simulation studies to investigate the finite-sample performance of the proposed method. Specifically, to illustrate the advantages of the proposed estimator in the multivariate logistic regression framework, we compare it with two other logistic regression methods; the three methods considered are:
Multiple response Logistic regression with SCAD penalties (MLS)
Multiple response Logistic regression with group Lasso and nuclear norm penalties (MLL)
Regular Logistic regression with ℓ1 penalty (RL).
Here, MLS is the proposed estimate in (3), MLL is the estimate of (3) with nuclear norm and group Lasso penalties, i.e., with a penalty , and RL is the Lasso-penalized logistic regression fitted to each response separately. We consider a setting where the underlying B* is both low-rank and row-wise sparse. For MLL and MLS, we choose the regularization parameters λ1 and λ2 using the proposed generalized information criterion as in Subsection 2.3. We set and , which satisfies (2) with . Note that RL involves only one regularization parameter λ, which we choose using the Bayesian information criterion. This comparison allows us to assess the performance of each method with appropriately chosen regularization parameters.
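MLS and MLL differ mainly in how their penalties shrink row norms and singular values. The sketch below compares two rules on a list of nonnegative values such as singular values. The first is soft-thresholding, which is the shrinkage step associated with group Lasso and nuclear norm penalties. The second is the univariate SCAD thresholding rule of Fan and Li (2001) with a = 3.7. This is only an assumed building block for intuition. The function names are hypothetical, and it is not the authors' fitting algorithm.

```python
def scad_penalty(t, lam, a=3.7):
    # SCAD penalty of Fan and Li (2001) evaluated at |t|; requires a > 2.
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def soft_threshold(z, lam):
    # Soft-thresholding of a nonnegative value (row norm or singular value):
    # the shrinkage rule behind group Lasso and nuclear norm penalties.
    return max(z - lam, 0.0)


def scad_threshold(z, lam, a=3.7):
    # Univariate SCAD thresholding rule (Fan and Li, 2001) for z >= 0:
    # soft-thresholding near zero, linear interpolation, then no shrinkage.
    if z <= 2 * lam:
        return max(z - lam, 0.0)
    if z <= a * lam:
        return ((a - 1) * z - a * lam) / (a - 2)
    return z


def shrink_spectrum(values, lam, rule):
    # Apply a thresholding rule to each value; the count of nonzero outputs
    # plays the role of the selected rank (or number of selected rows).
    shrunk = [rule(v, lam) for v in values]
    return shrunk, sum(1 for v in shrunk if v > 0)
```

Take singular values (5.0, 3.0, 1.5, 0.4) with λ = 1. Both rules keep three values. Soft-thresholding shrinks the largest value to 4.0, whereas the SCAD rule leaves it at 5.0. This reduced bias for large signals is the usual motivation for SCAD-type penalties.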
3.2. Simulation Model
In our simulation study, we consider the multiple response logistic model in which the underlying coefficient matrix is both low-rank and row-wise sparse. Here, is the intercept coefficient, the first s + 1 rows of B* are non-zero, and the remaining p − s rows of B* are set to zero. We set , and the non-zero entries in are generated by the singular value thresholding of , such that the rank of B*, except , is r, where each entry in follows the mixture distribution . Specifically, for the singular value decomposition of and satisfy , and is a diagonal matrix whose diagonal components are singular values of sorted in non-increasing order. We set , where Ur and Vr are the first r columns of U and V, respectively, and Dr is the upper-left r by r sub-matrix of D. The dimensions are chosen from , and q = 25. Note that we consider the datasets and , where , i.e., in a matrix form . When generating X, each row of X follows a multivariate normal distribution, , where . Thus, a total of 16 different simulation cases are considered, with varying data distributions and model parameters.
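The sketch below draws binary responses from the multiple response logistic model, y_ik ~ Bernoulli(σ(b0k + x_iᵀB_k)), given an intercept vector and a coefficient matrix. The covariance of X is not reproduced in this text, so the sketch assumes an AR(1)-type covariance with entries ρ^|j−l|, generated by a recursion. The value of ρ, the seed, and the function names are all hypothetical. The responses are also drawn independently given x, which is a simplifying assumption.

```python
import math
import random


def ar1_covariates(n, p, rho, rng):
    # Each row ~ N(0, Sigma) with Sigma[j][l] = rho ** |j - l| (assumed), via an AR(1) recursion.
    X = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(1, p):
            row.append(rho * row[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        X.append(row)
    return X


def sigmoid(t):
    # Overflow-safe logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate_responses(X, intercept, B, rng):
    # y_ik ~ Bernoulli(sigmoid(intercept[k] + sum_j x_ij * B[j][k])), independently given x_i.
    Y = []
    for x in X:
        row = []
        for k, b0 in enumerate(intercept):
            eta = b0 + sum(x_j * B[j][k] for j, x_j in enumerate(x))
            row.append(1 if rng.random() < sigmoid(eta) else 0)
        Y.append(row)
    return Y
```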
3.3. Simulation Results
To compare the performance of different methods, we consider the identification accuracy of the non-zero rows and rank, and estimation accuracy for the estimates in terms of the following criteria:
Relative estimation error between the underlying B* and the estimate
PUS: Percentage of Under-fitted models in terms of the row Support set, i.e.,
PCS: Percentage of Correctly fitted models in terms of the row Support set, i.e.,
POS: Percentage of Over-fitted models in terms of the row Support set, i.e.,
PCR: Percentage of Correctly fitted models in terms of the Rank, i.e.,
Average of the estimated rank
The performance measures are averaged over 100 replications for each specification. Note that we use 100 replications in which the true support set and true rank are included in the sequence of candidate models. The proposed information criterion is used to select the tuning parameters for MLS and MLL.
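The criteria above can be computed as in the sketch below. The exact formulas are not reproduced in this text, so it assumes the usual definitions. A fit is under-fitted if the estimated row support misses at least one true non-zero row, correctly fitted if it equals the true support, and over-fitted if it strictly contains it. The relative error is taken as ‖B̂ − B*‖_F / ‖B*‖_F. All function names are hypothetical.

```python
import math


def relative_error(B_hat, B_true):
    # Frobenius-norm relative error ||B_hat - B_true||_F / ||B_true||_F.
    num = sum((a - b) ** 2 for ra, rb in zip(B_hat, B_true) for a, b in zip(ra, rb))
    den = sum(b * b for rb in B_true for b in rb)
    return math.sqrt(num / den)


def row_support(B, tol=0.0):
    # Indices of rows with at least one entry larger than tol in absolute value.
    return {j for j, row in enumerate(B) if any(abs(v) > tol for v in row)}


def support_fit(S_hat, S_true):
    if not S_true <= S_hat:
        return "under"  # misses at least one true non-zero row
    return "correct" if S_hat == S_true else "over"


def summarize(S_hats, S_true, r_hats, r_true):
    # Percentages over replications, as in PUS / PCS / POS / PCR.
    n = len(S_hats)
    fits = [support_fit(S, S_true) for S in S_hats]
    return {
        "PUS": 100.0 * fits.count("under") / n,
        "PCS": 100.0 * fits.count("correct") / n,
        "POS": 100.0 * fits.count("over") / n,
        "PCR": 100.0 * sum(r == r_true for r in r_hats) / n,
        "mean_rank": sum(r_hats) / n,
    }
```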
Table S1 in the Supplementary materials displays the average relative estimation errors, with standard deviations in parentheses, for the three methods over the 16 cases. MLS achieves smaller relative estimation errors than MLL and RL in most cases. This suggests that, with appropriately chosen regularization parameters, MLS can improve estimation accuracy over the other two methods when the underlying coefficient matrix is both row-wise sparse and low-rank. Table 1 and Table S2 in the Supplementary materials report the row support identification performance of MLS, MLL, and RL when n = 400 and n = 800, respectively. RL tends to produce over-fitted models because it does not incorporate information from the multiple responses. Overall, MLS has the best row support estimation accuracy, which suggests that the rank-related SCAD penalty does not adversely affect the row-sparsity SCAD penalty. It also indicates that, by pooling information across responses and using SCAD penalties for row support selection, the proposed method may increase the probability of correct model selection.
Table 1.
Support set accuracy for the three methods when n = 400
| n | p | s | r | MLS PUS(%) | MLS PCS(%) | MLS POS(%) | MLL PUS(%) | MLL PCS(%) | MLL POS(%) | RL PUS(%) | RL PCS(%) | RL POS(%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 400 | 400 | 10 | 5 | 19 | 71 | 10 | 31 | 50 | 19 | 29 | 40 | 31 |
| 400 | 400 | 10 | 10 | 17 | 70 | 13 | 28 | 51 | 21 | 27 | 38 | 35 |
| 400 | 400 | 20 | 5 | 19 | 68 | 13 | 27 | 48 | 25 | 26 | 32 | 42 |
| 400 | 400 | 20 | 10 | 17 | 70 | 13 | 25 | 48 | 27 | 22 | 34 | 44 |
| 400 | 800 | 10 | 5 | 27 | 62 | 12 | 40 | 43 | 27 | 38 | 20 | 42 |
| 400 | 800 | 10 | 10 | 28 | 65 | 7 | 37 | 43 | 20 | 36 | 24 | 40 |
| 400 | 800 | 20 | 5 | 33 | 57 | 10 | 26 | 35 | 19 | 50 | 13 | 37 |
| 400 | 800 | 20 | 10 | 33 | 60 | 7 | 49 | 38 | 13 | 59 | 15 | 26 |
Table 2 and Table S3 in the Supplementary materials report the rank estimation performance of MLS and MLL when n = 400 and n = 800, respectively, with standard deviations of the estimated ranks in parentheses. Because RL does not involve rank-based regularization, it is not included in this comparison. On average, the rank estimates of MLS are closer to the true rank than those of MLL in all cases. MLS also has a higher percentage of correctly estimated ranks (PCR) in all cases. This demonstrates that MLS with appropriately chosen regularization parameters improves rank selection accuracy.
Table 2.
Rank estimate accuracy for the two methods when n = 400
| n | p | s | r | Mean rank, MLS | Mean rank, MLL | PCR(%), MLS | PCR(%), MLL |
|---|---|---|---|---|---|---|---|
| 400 | 400 | 10 | 5 | 4.25 (1.32) | 9.23 (3.32) | 84 | 40 |
| 400 | 400 | 10 | 10 | 8.42 (3.20) | 18.32 (6.31) | 79 | 31 |
| 400 | 400 | 20 | 5 | 6.02 (1.40) | 13.23 (6.33) | 80 | 33 |
| 400 | 400 | 20 | 10 | 11.44 (2.25) | 21.23 (7.83) | 78 | 31 |
| 400 | 800 | 10 | 5 | 5.92 (1.88) | 12.22 (5.92) | 68 | 17 |
| 400 | 800 | 10 | 10 | 11.71 (2.53) | 23.82 (8.31) | 61 | 14 |
| 400 | 800 | 20 | 5 | 7.72 (3.15) | 20.75 (7.78) | 64 | 17 |
| 400 | 800 | 20 | 10 | 12.78 (4.01) | 29.79 (9.41) | 65 | 12 |
3.4. Comparisons of GIC and PIC
In this subsection, we compare the proposed GIC and the existing PIC in terms of rank and row support selection. For simulation settings, we generate data as in Subsection 3.2, and follow She and Tran (2019) when generating an underlying coefficient matrix B*. Specifically, we set the coefficient matrix , where , and entries in A0 and A1 are i.i.d. standard Gaussian. We consider the following six different simulation settings, with varying model parameters: (a) ; (b) n = 400, p = 400, q = 10, s = 15, r ∈ {3, 5}; (c) n = 400, p = 600, q = 15, s = 20, r ∈ {3, 7}.
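The low-rank, row-sparse construction can be sketched as below. The exact form is not reproduced here, so the sketch assumes that the first s rows of B* equal A0 A1, with A0 of size s × r and A1 of size r × q, and that the remaining p − s rows are zero. The rank check uses Gaussian elimination with partial pivoting and an absolute tolerance. The names and the tolerance are hypothetical.

```python
import random


def factor_coefficients(p, q, s, r, seed=0):
    # Assumed construction: first s rows = A0 @ A1 (A0: s x r, A1: r x q, i.i.d. N(0, 1)),
    # remaining p - s rows = 0, so B* is row-sparse with rank at most r.
    rng = random.Random(seed)
    A0 = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(s)]
    A1 = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(r)]
    top = [[sum(A0[i][m] * A1[m][j] for m in range(r)) for j in range(q)] for i in range(s)]
    return top + [[0.0] * q for _ in range(p - s)]


def matrix_rank(M, tol=1e-9):
    # Numerical rank by Gaussian elimination with partial pivoting.
    A = [list(row) for row in M]
    n_rows, n_cols = len(A), len(A[0]) if A else 0
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for i in range(rank + 1, n_rows):
            f = A[i][c] / A[rank][c]
            for j in range(c, n_cols):
                A[i][j] -= f * A[rank][j]
        rank += 1
    return rank
```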
Tables S4 and S5 in the Supplementary materials report the comparisons of GIC and PIC in terms of row support accuracy and rank estimation accuracy, respectively. Overall, both GIC and PIC achieve high row support accuracy, although PIC tends to overfit compared with GIC. Regarding rank estimation, GIC provides accurate results except when (n, p, q, s, r) = (400, 600, 15, 20, 7). In this case, GIC tends to underestimate the underlying rank r, possibly because the log p term in GIC over-penalizes the rank.
4. Applications to cancer cell line data
4.1. The dataset
With the development of high-throughput technologies in recent decades, various studies have been conducted to build genomic predictors of drug response from large panels of cancer cell lines (Staunton et al., 2001; Barretina et al., 2012; Garnett et al., 2012). The Cancer Cell Line Encyclopedia (CCLE) dataset offers a comprehensive collection of gene expression and anti-cancer drug responses across cell lines (Barretina et al., 2012; Garnett et al., 2012). This dataset contains treatment responses for 24 drugs on 482 cancer cell lines, where the transcription profile of each cell line is characterized by the measured expression levels of 18,988 probes. These data play an important role in drug discovery, for example in the screening of drug candidates (Blanco et al., 2018). Cancer cell lines are widely used to understand cancer biology and to test the efficacy of novel therapies (Sharma et al., 2010) because of their cost-effectiveness and unlimited replicative capacity (Ferreira et al., 2013).
For the cancer cell line data analysis, standard logistic regression can be applied by combining multiple binary outcomes into a single categorical response (Greenlund et al., 2005; Hayes et al., 2006). The main limitation of this approach is that it may mask outcome-specific covariate effects, leading to inefficient estimates. Another approach is to fit separate logistic models for each response (Koziol-McLain et al., 2001; Cumyn et al., 2009). Although the effects of the covariates on each response can then be distinguished, this approach does not consider the potential relationships among multiple responses. When the binary outcomes of multiple drugs are analyzed in the CCLE data, they are likely to be correlated within the same cell line (Fitzmaurice et al., 1995; France and Velanovich, 2009; Carpenter, 2010). Because multiple drug responses are available for each cancer cell line, there could be an information gain from jointly considering all the drug responses for the same cell line. Thus, one could consider multiple response logistic regression models that exploit the common gene expression and the multiple drug response data derived from each cell line. Despite this potential advantage, multiple response regression based on a reduced set of predictors has not been investigated for logistic modeling in the literature, either theoretically or practically.
In this analysis, we define a binary response for each pair of drug and treated cell line, indicating whether the cell line is sensitive or resistant to the drug. We first select the 3,000 probes with the largest variances across the 482 cell lines and then choose the 450 probes with the largest correlation coefficients with the IC50 values. Therefore, we consider n = 482 cell lines, p = 450 features, and q = 24 drugs. We divide the cell lines into two groups based on their continuous growth inhibition values (IC50). Cell lines with an IC50 value less than 0.5 are assigned to the “sensitive” group, and the other cell lines are assigned to the “resistant” group.
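The preprocessing steps can be sketched as follows. The text does not state how the correlation with IC50 is aggregated over the 24 drugs. The sketch therefore assumes that each probe is scored by its largest absolute Pearson correlation across drugs, and it codes "sensitive" as 1. The data layout and function names are hypothetical.

```python
import math
import statistics


def pearson(u, v):
    # Pearson correlation; returns 0.0 if either vector is constant.
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    su = sum((a - mu) ** 2 for a in u)
    sv = sum((b - mv) ** 2 for b in v)
    if su == 0.0 or sv == 0.0:
        return 0.0
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / math.sqrt(su * sv)


def screen_probes(expr, ic50, n_var=3000, n_cor=450):
    # expr: {probe: expression values over cell lines}; ic50: list of per-drug IC50 vectors.
    by_var = sorted(expr, key=lambda g: statistics.pvariance(expr[g]), reverse=True)[:n_var]
    # Assumed score: largest absolute correlation with IC50 over the drugs.
    score = {g: max(abs(pearson(expr[g], drug)) for drug in ic50) for g in by_var}
    return sorted(by_var, key=lambda g: score[g], reverse=True)[:n_cor]


def binarize(ic50, threshold=0.5):
    # 1 = "sensitive" (IC50 < threshold), 0 = "resistant".
    return [[1 if v < threshold else 0 for v in drug] for drug in ic50]
```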
4.2. Results
For prediction accuracy, we compare five methods. These are the three logistic regression methods considered above (MLS, MLL, and RL), plus a linear support vector machine (SVM) and a support vector machine with a Gaussian kernel (SVMG), each SVM using its optimal parameter. To evaluate the prediction accuracy of each method, we perform 10-fold cross-validation (CV). The cancer cell lines are randomly divided into 10 roughly equal-sized subsets. Nine subsets are used as the training set to estimate the model, and the remaining subset is used as the test set to assess prediction accuracy. Performance is measured by the proportion of accurately predicted cell lines (PA) and by the area under the ROC curve (AUC), which summarizes the trade-off between the true positive rate and the false positive rate. Figure 1 displays the overall ROC curves and the average AUC and PA values, with one standard deviation, for the five methods. Here, all cancer cell lines and drugs are included when computing the performance measures. The proposed multiple response logistic regression, MLS, has higher AUC and PA values than the other methods, suggesting better performance on average.
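A minimal sketch of the 10-fold CV loop is shown below. It assumes that PA is pooled over all (cell line, drug) pairs in a test fold. The `fit` and `predict` arguments are hypothetical user-supplied callables that stand in for any of the five methods.

```python
import random


def kfold_indices(n, k=10, seed=0):
    # Shuffle once, then deal indices round-robin so fold sizes differ by at most one.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def proportion_accurate(y_true, y_pred):
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


def cross_validate(X, Y, fit, predict, k=10, seed=0):
    # Returns one PA value per fold; Y[i] is the 0/1 response vector of cell line i.
    scores = []
    for test in kfold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [Y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        truth = [v for i in test for v in Y[i]]
        pooled = [v for row in preds for v in row]
        scores.append(proportion_accurate(truth, pooled))
    return scores
```

With n = 482 and k = 10, this split gives eight folds of 48 cell lines and two folds of 49.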
Fig. 1.

Performances of the five methods. All cancer cell lines and drugs are included when computing the performance measures.
Figure 2 illustrates the average prediction accuracy for each drug. When computing the AUC for each drug, test sets that contain only a single response value for that drug are excluded, because the ROC curve and AUC cannot be computed in that case. In terms of AUC, MLS and SVMG give better predictions than the other methods for the majority of drugs. Specifically, MLS, SVMG, and SVM perform best for 13, 7, and 4 of the 24 drugs, respectively. In terms of PA, MLS generally outperforms the other methods, suggesting that it may also predict well for the extremely imbalanced test sets that are excluded from the AUC calculation.
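The exclusion rule for single-class test sets can be made explicit with the rank-based (Mann-Whitney) form of the AUC. This form equals the area under the empirical ROC curve, with ties counted as one half. In this sketch, a fold whose test labels for a drug are all 0 or all 1 returns None and is skipped when averaging. The function names are hypothetical.

```python
def auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC; None if only one class is present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # ROC curve undefined for a single-class test set
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    # folds: list of (labels, scores) for one drug; single-class folds are skipped.
    values = [a for a in (auc(lab, sc) for lab, sc in folds) if a is not None]
    return sum(values) / len(values) if values else None
```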
Fig. 2.

Performances of the five classification methods. Responses of each drug across all cancer cell lines are included when computing performance measures.
Next, we investigate the performance for each pair of drug and cancer type separately. In particular, we consider the following 16 cancer types that include at least 10 cell lines in the CCLE data: Kidney, Liver, Pancreas, Ovary, Lung, Skin, Lymphoid, Intestine, Stomach, Nervous, Oesophagus, Bone, Breast, Endometrium, Ganglia, and Soft. For this study, we consider only PA. For a specific combination of cancer type and drug, the test data frequently include only a single response value for that drug, so AUC cannot be computed. Figures 3 and S1–S3 in the Supplementary materials display heatmaps of the average PA values for each pair of cancer type and drug. MLS achieves higher or comparable PA values relative to the other methods in many cases, which suggests a benefit of jointly modeling different drugs for prediction.
Fig. 3.

Performances of the five methods for the four selected cancer types, ‘Kidney’, ‘Liver’, ‘Pancreas’, and ‘Ovary’. For each pair of drug and cancer type, the corresponding cancer cell lines and drugs are included when computing performance measures.
We observe that for certain compounds (e.g., 17-AAG, Topotecan, PD-0325901), all the classification methods consistently have smaller PA values across the 16 cancer types, suggesting that the prediction accuracy may depend on the specific compound. Figure S4 displays the standard deviations of the PA values over 16 cancer types. We observe that the standard deviation for ‘17-AAG’, ‘Irinotecan’, and ‘AZD6244’ is near 0.1, whereas the standard deviation of ‘PHA-665752’, ‘Nutlin-3’, and ‘PD-0332991’ is near 0.01 for all the methods. These results imply that the variations of the prediction accuracy across different cancer types depend on the drugs considered.
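The per-pair PA heatmaps and the across-type standard deviations can be computed as in the following sketch. It assumes that individual test predictions are pooled within each (cancer type, drug) pair and that the sample standard deviation is used. The record layout and function names are hypothetical.

```python
import statistics
from collections import defaultdict


def pa_by_type_and_drug(records):
    # records: iterable of (cancer_type, drug, y_true, y_pred) for single test predictions.
    hits = defaultdict(list)
    for cancer_type, drug, y_true, y_pred in records:
        hits[(cancer_type, drug)].append(int(y_true == y_pred))
    return {key: sum(v) / len(v) for key, v in hits.items()}


def sd_across_types(pa):
    # Sample standard deviation of PA over cancer types, per drug (needs >= 2 types).
    by_drug = defaultdict(list)
    for (_, drug), value in pa.items():
        by_drug[drug].append(value)
    return {drug: statistics.stdev(v) for drug, v in by_drug.items() if len(v) >= 2}
```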
Among the drugs considered, ‘Lapatinib’ is an orally active drug for breast cancer, and resistance to this treatment has been found to be prevalent (Liu et al., 2009; Wang et al., 2011). The proposed MLS provides more accurate predictions for the breast cancer cell lines treated with ‘Lapatinib’. ‘17-AAG’ is another effective compound for breast cancer (Modi et al., 2011), and MLS also provides a higher PA for the breast cancer cell lines for this compound. In addition, ‘PD-0332991’ can be used for the treatment of ER-positive, HER2-negative breast cancer. For this compound, all the methods predict the response well for the breast cancer cell lines.
Regarding drugs for lung cancer treatment, ‘PF2341066’ is an anti-cancer drug acting as an ALK (anaplastic lymphoma kinase) and ROS1 (c-ros oncogene 1) inhibitor (Forde and Rudin, 2012; Roberts, 2013), and it is approved for lung cancer in the US and other countries. ‘TAE684’ is also effective in inhibiting cell proliferation and has been used for the treatment of lung cancer. For the lung cancer cell lines, MLS and SVMG predict the responses to these two compounds better than the other methods. Regarding drugs for other cancer types, ‘Erlotinib’ is used for both lung cancer and pancreatic cancer, and for the lung and pancreatic cancer cell lines, all the methods predict the response to ‘Erlotinib’ well. As studied in Scotlandi et al. (2005), ‘AEW541’ is an insulin-like growth factor-I receptor kinase inhibitor applied to musculoskeletal tumors, i.e., cancers that develop in bone or soft tissue. For the bone and soft tissue cancer cell lines, all the methods have similar prediction accuracy for the response to ‘AEW541’. As demonstrated in Petrucci et al. (2012), ‘LBW242’ can be used for the treatment of ovarian cancer, and for the ovarian cancer cell lines, all the methods predict the response to ‘LBW242’ similarly. For pancreatic cancer, administration of ‘AZD6244’ can significantly inhibit tumor growth in the pancreatic tumor xenograft model (Yeh et al., 2007). For the pancreatic cancer cell lines, MLS and SVMG predict the response to ‘AZD6244’ better than the other methods.
Note that some of the 24 drugs in the CCLE data are used for multiple cancer types. For example, ‘Topotecan’ is used to treat ovarian cancer when other treatments have failed and is also used to treat lung cancer. For the lung cancer cell lines, MLS predicts the response to ‘Topotecan’ better than the other methods. For the ovarian cancer cell lines, SVMG predicts it best, and MLS is second best. ‘Paclitaxel’ is used for the treatment of breast, ovarian, lung, and other types of solid tumor cancers. For the breast cancer cell lines, MLS predicts the response to ‘Paclitaxel’ well, and for the lung cancer cell lines, all the methods perform similarly. For the ovarian cancer cell lines, however, SVM predicts better than the other methods. Overall, for drugs known to be effective in treating certain cancer types, the proposed MLS predicts the drug response better than or comparably to the other methods in many cases.
Further, we apply the Ising model proposed in Subsection 2.4 to investigate the dependency structure among drugs. Because the prediction accuracy of the proposed Ising model is similar to that of MLS, we focus mainly on the conditional associations between drugs. Specifically, for each pair of drugs k and l, we record , which represents the degree of dependence between drugs k and l across all the genes. These values are shown as a heatmap in Figure 4, where white cells indicate weak associations and black cells indicate strong associations.
Fig. 4.

Heatmap of drug dependencies
Specifically, TAE684 and PF2341066 have the highest estimated dependency, 0.97. Both TAE684 and PF2341066 are kinase inhibitors that act as ALK inhibitors, and both were synthesized following published procedures (Galkin et al., 2007; Li et al., 2011). RAF265 and L-685458, both kinase inhibitors, have the second highest dependency value, 0.88. TAE684 and AEW541, also both kinase inhibitors, have the third highest dependency, 0.85. For ZD-6474, only Erlotinib has a substantial non-zero value. Both ZD-6474 and Erlotinib are EGFR inhibitors, and ZD-6474 is grouped only with Erlotinib in the results of Wei et al. (2019). For PD-0332991, only Nilotinib has a non-zero value. This suggests that PD-0332991 may be associated only with Nilotinib, which is consistent with the clustering results in Wei et al. (2019). For PHA-665752, only PF2341066 and Topotecan have non-zero values. Note that both PHA-665752 and PF2341066 are small molecule inhibitors related to c-Met (Zillhardt et al., 2010), and PHA-665752 and Topotecan are grouped together in Wei et al. (2019). For Lapatinib, AZD0530 has the highest value, 0.68; note that these drugs are EGFR inhibitors. For AZD0530, Nilotinib has the highest value, 0.82. Both Nilotinib and AZD0530 are related to BCR-ABL (Sabitha, 2012). Nilotinib inhibits several tyrosine kinases including BCR-ABL, and AZD0530 targets BCR-ABL kinase activity and reduces leukaemic maintenance by BCR-ABL.
5. Discussion
Although SCAD has in recent years become a popular penalization approach for more parsimonious and consistent model selection in a wide range of high-dimensional settings, there has been little work on rank selection using SCAD in the high-dimensional matrix literature. In this paper, we considered the use of SCAD-type penalization coupled with an information criterion to achieve both rank and row selection consistency. More specifically, we introduced a novel sparse reduced-rank regression for correlated multiple binary responses. We proposed a new penalized likelihood method using a SCAD version of the nuclear norm, developed the associated novel high-dimensional theory, and conducted comprehensive data analyses. Our results suggest that the SCAD penalty function may also benefit low-rank matrix estimation in other high-dimensional settings. Our theory requires milder conditions than the strong irrepresentable condition assumed for ordinary nuclear norm penalization (Bach, 2008; Kong et al., 2020). To account for the dependencies among binary responses, we also developed a novel estimation procedure and theory for an Ising model in the multiple response logistic regression setting.
As matrix responses are commonly encountered in a wide range of applications as also seen in Kong et al. (2020), a natural extension of our work is to develop both low-rank and row-sparse procedures for high-dimensional matrix responses. Moreover, it is also of interest to extend the sure independence screening procedure of Kong et al. (2020) to binary matrix response settings, which may help ultra-high dimensional data analysis.
Acknowledgements
The authors thank the Editor, Associate Editor, and referees for suggestions that significantly improved the paper. Seyoung Park is supported by the National Research Foundation of Korea grant funded by the Korean government (No. NRF-2022R1A2C4002150). Eun Ryung Lee is supported by a National Research Foundation of Korea grant funded by the Korean government (No. NRF-2022R1A2C1012798). Hongyu Zhao is supported by the NIH grants R01 GM134005 and P50 CA196530, and NSF grant DMS 1902903. The authors report there are no competing interests to declare.
Footnotes
Supplementary Materials
Supplementary Materials include discussions about our assumptions in the theory, our theoretical results for the estimation error, the numerical implementations, some simulations, additional application results, and technical proofs of theorems.
References
- Bach FR (2008). Consistency of trace norm minimization. Journal of Machine Learning Research, 9, 1019–1048.
- Bai J and Ng S (2002). Determining the number of factors in approximate factor models. Econometrica, 70, 191–221.
- Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. (2012). The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.
- Bing X and Wegkamp MH (2019). Adaptive estimation of the rank of the coefficient matrix in high-dimensional multivariate response regression models. Annals of Statistics, 47(6), 3157–3184.
- Blanco TJ, Frigola MD, and Aloy P (2018). Rationalizing drug response in cancer cell lines. Journal of Molecular Biology, 430, 3016–3027.
- Boyd S, Parikh N, Chu E, Peleato B, and Eckstein J (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1), 1–122.
- Bunea F, She Y, and Wegkamp MH (2011). Optimal selection of reduced rank estimators of high-dimensional matrices. Annals of Statistics, 39(2), 1282–1309.
- Bunea F, She Y, and Wegkamp MH (2012). Joint variable and rank selection for parsimonious estimation of high-dimensional matrices. Annals of Statistics, 40(5), 2359–2388.
- Candès EJ and Recht B (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9, 717–772.
- Carpenter, A. V. S. D. O. (2010). Residential proximity to environmental sources of persistent organic pollutants and first-time hospitalizations for myocardial infarction with comorbid diabetes mellitus: a 12-year population-based study. International Journal of Occupational and Environmental Health, 23, 5–13.
- Cessie SL and Houwelingen JCV (1992). Ridge estimators in logistic regression. Journal of the Royal Statistical Society: Series C, 41, 191–201.
- Chen J and Chen Z (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.
- Chen J and Ye J (2014). Sparse trace norm regularization. Computational Statistics, 29, 623–639.
- Chen K, Dong H, and Chan K-S (2013). Reduced rank regression via adaptive nuclear norm penalization. Biometrika, 100(4), 901–920.
- Chen L and Huang JZ (2012). Sparse reduced-rank regression for simultaneous dimension reduction and variable selection in multivariate regression. Journal of the American Statistical Association, 107, 1533–1545.
- Cheng J, Levina E, Wang P, and Zhu J (2014). A sparse Ising model with covariates. Biometrics, 70(4), 943–953.
- Cho Y and Park S (2022). Multivariate response regression with low-rank and generalized sparsity. Journal of the Korean Statistical Society.
- Chun H and Keles S (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society: Series B, 72, 3–25.
- Cumyn L et al. (2009). Comorbidity in adults with attention-deficit hyperactivity disorder. Canadian Journal of Psychiatry, 54, 673–683.
- Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
- Fan J and Lv J (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20, 101–148.
- Fan Y and Tang CY (2013). Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society: Series B, 75, 531–552.
- Ferreira D, Adega F, and Chaves R (2013). The importance of cancer cell lines as in vitro models in cancer methylome analysis and anticancer drugs testing. In: Lopez-Camarillo Cesar (Eds.).
- Fitzmaurice GM et al. (1995). Bivariate logistic regression analysis of childhood psychopathology ratings using multiple informants. American Journal of Epidemiology, 142, 1194–1203.
- Forde PM and Rudin CM (2012). Crizotinib in the treatment of non-small-cell lung cancer. Expert Opinion on Pharmacotherapy, 13(8), 1195–1201.
- France E and Velanovich V (2009). The relative influence of surgical disease and co-morbidities on patient responses to a generic health-related quality-of-life instrument. American Surgeon, 75, 1084–1090.
- Galkin AV, Melnick JS, Kim S, Hood TL, Li N, Li L, Xia G, R RS, Chopiuk G, and Jiang J (2007). Identification of NVP-TAE684, a potent, selective, and efficacious inhibitor of NPM-ALK. Proceedings of the National Academy of Sciences, 104, 270–275.
- Garnett MJ et al. (2012). Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature, 483, 570–575.
- Giraud C (2011). Low rank multivariate regression. Electronic Journal of Statistics, 5, 775–799.
- Greenlund KJ et al. (2005). Using behavioral risk factor surveillance data for heart disease and stroke prevention programs. American Journal of Preventive Medicine, 29, 81–87.
- Hastie T, Tibshirani R, and Friedman J (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition. Berlin: Springer.
- Hayes DK et al. (2006). Racial/ethnic and socioeconomic differences in multiple risk factors for heart disease and stroke in women: behavioral risk factor surveillance system. Journal of Women’s Health, 15, 1000–1008.
- Huang J, Breheny P, and Ma S (2012). A selective review of group selection in high-dimensional models. Statistical Science, 27(4), 481–499. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hung H and Wang C-C (2012). Matrix variate logistic regression model with application to eeg data. Biostatistics, 14(1), 189–202. [DOI] [PubMed] [Google Scholar]
- Ising E (1925). Beitrag zur theorie der ferromagnetismus. Zeitschrift für Physik, 31,253–258. [Google Scholar]
- Izenman AJ (1975). Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5(2), 248–264. [Google Scholar]
- Jentsch C, Lee ER, and Mammen E (2021). Poisson reduced-rank models with an application to political text data. Biometrika, 108, 455–468. [Google Scholar]
- Keshavan RH, Montanari A, and Oh S (2010). Matrix completion from noisy entries. Journal of Machine Learning Research, 11, 2057–2078. [Google Scholar]
- Kim Y and Kwon S (2012). Global optimality of nonconvex penalized estimators. Biometrika, 99, 315–325. [Google Scholar]
- Koltchinskii V, Lounici K, and Tsybakov AB (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist, 39, 2302–2329. [Google Scholar]
- Kong D, An B, Zhang J, and Zhu H (2020). L2rm: Low-rank linear regression models for high-dimensional matrix responses. Journal of American Statistical Association, 115(529), 402–424. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Koziol-McLai J et al. (2001). Predictive validity of a screen for partner violence against women. American Journal of Preventive Medicine, 21, 93–100. [DOI] [PubMed] [Google Scholar]
- Lee A and Silvapulle M (1988). Ridge estimation in logistic regression. Communications in Statistics, Simulation and Computation, 17, 1231–1257. [Google Scholar]
- Lee E and Park S (2021). Poisson reduced-rank models with sparse loadings. Journal of the Korean Statistical Society, 50, 1079–1097. [Google Scholar]
- Lee ER, Noh H, and Park BU (2014). Model selection via bayesian information criterion for quantile regression models. Journal of the American Statistical Association, 109, 216–229. [Google Scholar]
- Li Y, Ye X, Liu J, Zha J, and Pei L (2011). Evaluation of eml4-alk fusion proteins in non-small cell lung cancer using small molecule inhibitor. Neoplasia, 13, 1–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li Y-H, Scarlett J, Ravikumar P, and Cevher V (2015). Sparsistency of l1-regularized m-estimators. In International Conference on Artificial Intelligence and Statistics (AISTATS). [Google Scholar]
- Liu L et al. (2009). Novel mechanism of lapatinib resistance in her2-positive breast tumor cells: activation of axl. Cancer Res., 69(17), 6871–6878. [DOI] [PubMed] [Google Scholar]
- Loh P and Wainwright MJ (2014). Regularized m-estimators with nonconvexity: Statistical and algorithmic theory for local optima. Journal of Machine Learning Research, 1, 1–56. [Google Scholar]
- Loh P and Wainwright MJ (2017). Support recovery without incoherence: A case for nonconvex regularization. Annals of Statistics, 45(6), 2455–2482. [Google Scholar]
- Martin GP, Sperrin M, Snell KIE, Buchan I, and Riley RD (2021). Clinical prediction models to predict the risk of multiple binary outcomes: a comparison of approaches. Statistics in Medicine, 40(2), 498–517. [DOI] [PubMed] [Google Scholar]
- Meier L, van de Geer S, and Bühlmann P (2008). The group Lasso for logistic regression. Journal of the Royal Statistical Society Series B, 70, 53–71. [Google Scholar]
- Modi S et al. (2011). Hsp90 inhibition is effective in breast cancer: A phase II trial of tanespimycin (17-AAG) plus trastuzumab in patients with HER2-positive metastatic breast cancer progressing on trastuzumab. Clinical Cancer Research, 17, 5132–5139.
- Negahban S and Wainwright MJ (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Annals of Statistics, 39, 1069–1097.
- Negahban S and Wainwright MJ (2012). Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 13, 1665–1697.
- Nishii R (1984). Asymptotic properties of criteria for selection of variables in multiple regression. Annals of Statistics, 12, 758–765.
- Park M and Hastie T (2008). Penalized logistic regression for detecting gene interactions. Biostatistics, 9, 30–50.
- Peng J, Zhu J, Bergamaschi A, Han W, Noh D, Pollack JR, and Wang P (2010). Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Annals of Applied Statistics, 4, 53–77.
- Petrucci E et al. (2012). A small molecule Smac mimic LBW242 potentiates TRAIL- and anticancer drug-mediated cell death of ovarian cancer cells. PLoS One, 7(4).
- Raskutti G, Wainwright M, and Yu B (2010). Restricted nullspace and eigenvalue properties for correlated Gaussian designs. Journal of Machine Learning Research, 11, 2241–2259.
- Reinsel GC and Velu RP (1998). Multivariate Reduced-Rank Regression: Theory and Applications. Springer.
- Roberts PJ (2013). Clinical use of crizotinib for the treatment of non-small cell lung cancer. Biologics, 7, 91–101.
- Rohde A and Tsybakov AB (2011). Estimation of high-dimensional low-rank matrices. Annals of Statistics, 39, 887–930.
- Sabitha K (2012). Nilotinib based pharmacophore models for BCR-ABL. Bioinformation, 8, 658–663.
- Scotlandi K et al. (2005). Antitumor activity of the insulin-like growth factor-I receptor kinase inhibitor NVP-AEW541 in musculoskeletal tumors. Cancer Research, 65, 3868–3876.
- Sharma S, Haber D, and Settleman J (2010). Cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents. Nature Reviews Cancer, 10, 241–253.
- She Y (2017). Selective factor extraction in high dimensions. Biometrika, 104(1), 97–110.
- She Y and Chen K (2017). Robust reduced-rank regression. Biometrika, 104(3), 633–647.
- She Y and Tran H (2019). On cross-validation for sparse reduced rank regression. Journal of the Royal Statistical Society, Series B, 81(1), 145–161.
- Sim CH (1993). First-order autoregressive logistic processes. Journal of Applied Probability, 30(2), 467–470.
- Srebro N, Rennie J, and Jaakkola T (2005). Maximum margin matrix factorization. Advances in Neural Information Processing Systems, 17.
- Staunton JE et al. (2001). Chemosensitivity prediction by transcriptional profiling. Proceedings of the National Academy of Sciences of the United States of America, 98(19), 10787–10792.
- Tang D, Park S, and Zhao H (2022). SCADIE: simultaneous estimation of cell type proportions and cell type-specific gene expressions using SCAD-based iterative estimating procedure. Genome Biology, 23, 129.
- Turlach B, Venables W, and Wright S (2005). Simultaneous variable selection. Technometrics, 47.
- Wang L, Wu Y, and Li R (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107(497), 214–222.
- Wang YC et al. (2011). Different mechanisms for resistance to trastuzumab versus lapatinib in HER2-positive breast cancers: role of estrogen receptor and HER2 reactivation. Breast Cancer Research, 13(6), R121.
- Wei D, Liu C, Zheng X, and Li Y (2019). Comprehensive anticancer drug response prediction based on a simple cell line-drug complex network model. BMC Bioinformatics, 20, 1–15.
- Welsh AH, Lin X, and Carroll RJ (2002). Marginal longitudinal nonparametric regression. Journal of the American Statistical Association, 97, 482–493.
- Wilson GT (1989). On the use of marginal likelihood in time series model estimation. Journal of the Royal Statistical Society, Series B, 51, 15–27.
- Yee TW (2015). Vector Generalized Linear and Additive Models With an Implementation in R. Springer.
- Yeh TC et al. (2007). Biological characterization of ARRY-142886 (AZD6244), a potent, highly selective mitogen-activated protein kinase kinase 1/2 inhibitor. Clinical Cancer Research, 13(5), 1576–1583.
- Yuan M et al. (2007). Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society Series B, 69, 329–346.
- Zheng Q, Peng L, and He X (2015). Globally adaptive quantile regression with ultra-high dimensional data. Annals of Statistics, 43(5), 2225–2258.
- Zhu J and Hastie T (2004). Classification of gene microarrays by penalized logistic regression. Biostatistics, 5, 427–443.
- Zillhardt M, Christensen JG, and Lengyel E (2010). An orally available small-molecule inhibitor of c-Met, PF-2341066, reduces tumor burden and metastasis in a preclinical model of ovarian cancer metastasis. Neoplasia, 12, 1–10.
- Zou C, Ke Y, and Zhang W (2021). Estimation of low rank high-dimensional multivariate linear models for multi-response data. Journal of the American Statistical Association, In Press.
