Computational and Mathematical Methods in Medicine. 2019 Jul 1;2019:9580126. doi: 10.1155/2019/9580126

Feature Selection Method Based on Partial Least Squares and Analysis of Traditional Chinese Medicine Data

Canyi Huang 1, Jianqiang Du 1, Bin Nie 1, Riyue Yu 2, Wangping Xiong 1, Qingxia Zeng 1
PMCID: PMC6636449  PMID: 31354860

Abstract

The partial least squares method has many advantages in multivariable linear regression, but it does not perform feature selection: it cannot screen for the best feature subset (referred to in this study as the "Gold Standard") or optimize the model. The L1 norm, by contrast, yields a sparse representation of the parameters and therefore enables feature selection. In this study, a feature selection method based on partial least squares is proposed. The new method uses partial least squares to extract the latent variables required for multivariable linear regression and applies an L1 regular-term constraint to the sum of the absolute values of the regression coefficients. This constrained problem is then solved with the coordinate descent method over multiple iterations to select a better feature subset. Experiments on traditional Chinese medicine data and University of California, Irvine (UCI), datasets show that the feature selection method based on partial least squares adapts well to both kinds of data.

1. Introduction

In the era of rapid information technology development, data have become increasingly important. As one of the key techniques of data mining, statistical analysis methods have received extensive attention in the fields of biomedicine, physical chemistry, and traditional Chinese medicine [1–3]. A single target variable, however, is often affected by many other features, which exert differing degrees of influence. Traditional Chinese medicine data with multicollinearity, meanwhile, contain irrelevant as well as redundant information that not only increases the time and space complexity of the model but also seriously affects its accuracy and operational efficiency. When dealing with such data, conventional statistical analysis methods can partially reflect the relationships between features by calculating the regression coefficients [4, 5], but for data containing irrelevant and redundant features they achieve little in the way of feature selection; in that case, the opportunity for model optimization and improved regression accuracy is lost. Therefore, given the multicollinearity of traditional Chinese medicine data as well as the presence of irrelevant and redundant content, there is an urgent need for a data analysis method that can remove irrelevant and redundant features from the original dataset, thereby overcoming multicollinearity and screening out the "Gold Standard" feature subset in order to construct a robust model.

The remainder of this manuscript is organized as follows. Related research is introduced in Section 2. In Section 3, the new model is described in detail. In Section 4, 3 traditional Chinese medicine datasets and 3 public UCI datasets are used in the new model and subjected to experimental analysis. The new model is compared with several existing algorithms in order to further verify its feasibility and effectiveness. The final section concludes this study with a brief summary and discussion.

2. Related Work

Feature selection, as an effective dimension reduction method, involves choosing from the original set a subset of features that has acceptable distinguishing capability according to a particular criterion [6, 7], thereby retaining the features that are most effective and favorable for regression (or classification) while decreasing the complexity of the algorithm. This approach has attracted the attention of numerous researchers. In the medical field, for example, Peng et al. [8] developed a hypergraph-based multimodal feature selection method for multitask feature selection in order to choose effective brain region data; Ye et al. [9] proposed an informative gene selection method based on symmetric uncertainty and support vector machine (SVM) recursive feature elimination that can effectively remove irrelevant genes; and Zhang et al. [10] proposed a hybrid feature selection algorithm that can select genetic subsets with strong classification ability. Feature selection methods have also been applied successfully in other fields. Hu et al. [11] proposed a feature selection algorithm combining spectral clustering and neighborhood mutual information that can remove features unrelated to the labels; Huang et al. [12] proposed a feature selection algorithm based on multilabel ReliefF that can remove irrelevant features while retaining features strongly correlated with the categories. With respect to our research questions, most experimental data from traditional Chinese medicine are multielement and multitarget and exhibit strong multicollinearity [13]. Although feature selection can eliminate irrelevant features and thus achieve an improved dimensionality reduction effect, it alone does not solve the multicollinearity problem; this motivates the further improvements explored in this study.

As a nonparametric multivariate statistical analysis method, partial least squares (PLS) provides a way to regress multiple dependent variables on multiple independent variables, effectively solving the multicollinearity problem [14, 15] that arises when the independent variables are highly correlated. Building on these strengths, several researchers have proposed improved models. You et al. [16] proposed a feature selection method (PLSRFE) that combines PLS with the recursive feature elimination method (RFE); PLSRFE can remove irrelevant features and select a small number of features, although it lacks reliability because default parameters are applied. Shang et al. [17] presented a robust feature selection and classification algorithm based on partial least squares regression to address multicollinearity and redundancy among features, yet whether the parameters in the model are selected adequately remains an open question. Nagaraja et al. [18] connected partial least squares regression with optimal experimental design in order to select features by analyzing the model parameters, but ignored the interference of noise samples, which weakened the robustness of the model. Beyond the shortcomings of these methods, using partial least squares on all of the data in the regression model without exploiting feature selection yields a model with poor interpretability and problems such as overfitting, which makes "Gold Standard" screening and model optimization impossible. Therefore, the goal of this study is to determine how to effectively perform feature selection on multicollinear traditional Chinese medicine (TCM) data and establish a regression model that is as simple and accurate as possible.

In feature selection research, higher-quality feature selection methods should exhibit the following characteristics [19]: (1) interpretability, meaning that the features selected by the model have scientific significance; (2) acceptable model stability; (3) avoidance of deviations in hypothesis testing; and (4) model computational complexity within a manageable range. Traditional feature selection methods such as stepwise regression, ridge regression, and principal component regression [20, 21] satisfy only some of these characteristics. Effectively overcoming these problems and achieving better feature selection results has therefore become a research focus for regression and classification. To provide a feasible solution, Tibshirani presented a feature selection method called the "lasso" [22] and applied it successfully, drawing inspiration from ridge regression [23] and from the nonnegative garrote algorithm proposed by Breiman [24]. The lasso method compresses the regression coefficients by using the absolute value function of the model coefficients as a penalty, thereby achieving the selection of significant features together with the estimation of the corresponding parameters. In this way, the lasso method overcomes the insufficiencies of the traditional feature selection methods.

Given the above, this study proposes a feature selection method (LAPLS) that unites the concepts of the lasso method and partial least squares (PLS). The method uses PLS to perform the regression and applies an L1 regular-term constraint to the sum of the absolute values of the regression coefficients, so that the regression coefficients of insignificant features are compressed to 0, thus achieving feature selection. This is then combined with the coordinate descent method, which performs multiple iterations to select a higher-quality feature subset and thereby screen out the "Gold Standard." The new algorithm not only effectively overcomes multicollinearity but also eliminates insignificant features, making it suitable for the data analysis of traditional Chinese medicine.

3. Feature Selection Method Based on Partial Least Squares (LAPLS)

The lasso method uses L1-norm-penalized regression to find the optimal solution [25, 26], generating a sparse weight matrix that can be applied to feature selection. The basic idea is that, under the constraint that the sum of the absolute values of the regression coefficients is less than or equal to a threshold s (i.e., ∑|w| ≤ s), the residual sum of squares is minimized, so that regression coefficients with small absolute values are compressed to 0. In this way, feature selection and the corresponding parameter estimation are realized simultaneously, achieving an improved data dimensionality reduction effect.
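As an illustration of this idea (a sketch added by the editor, not part of the original study), the short example below uses scikit-learn's Lasso on synthetic data to show how the L1 penalty drives the coefficients of uninformative features to exactly zero; the data shapes and the penalty value alpha are illustrative assumptions.

```python
# Illustrative sketch: the L1 penalty shrinks coefficients of irrelevant features to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                    # 10 candidate features
w_true = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])
y = X @ w_true + 0.1 * rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)                # alpha plays the role of the lasso penalty
print(model.coef_)                                # most irrelevant coefficients are exactly 0
print("selected features:", np.flatnonzero(model.coef_))
```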

The partial least squares method is a regression technique for problems with multiple independent variables and multiple dependent variables. Compared with traditional regression analysis, PLS overcomes the problems of multicollinearity, small sample size, and limits on the number of variables [27, 28]. To model effectively with partial least squares, the latent variables t1 and u1 are first extracted from the original independent variables X and the dependent variables Y, respectively, subject to the criteria that t1 and u1 should represent their datasets as well as possible and that t1 should have strong explanatory power for u1 [29, 30]. If the required accuracy is not met, the residual information is used to extract a second pair of latent variables, and so on, until the accuracy condition is satisfied [31]. In the regression process, however, PLS has no feature selection function and cannot achieve an improved dimensionality reduction effect. The lasso method, on the other hand, can effectively remove irrelevant and partially redundant features and thereby select better feature subsets. This study therefore uses the lasso algorithm to optimize the partial least squares method, both to give partial least squares a feature selection capability and to achieve the screening of the "Gold Standard" via iteration, while at the same time overcoming the multicollinearity problem that troubles the traditional lasso method.
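For readers less familiar with PLS, the brief sketch below (an editor's illustration, not the authors' implementation) shows latent variable extraction and regression with scikit-learn's PLSRegression on strongly collinear synthetic data; the number of components is an assumed value.

```python
# Illustrative use of partial least squares: extract latent variables and regress on them.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(54, 20))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=54)    # two strongly collinear columns
y = X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=54)

pls = PLSRegression(n_components=3).fit(X, y)
T = pls.transform(X)                              # latent variables t1..t3 extracted from X
print(T.shape, pls.score(X, y))                   # (54, 3) and the R-squared of the PLS fit
```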

The LAPLS method first uses the latent variables extracted via principal component analysis and canonical correlation analysis ideas as the input for the multiple linear regression step in partial least squares and then requires the sum of the absolute values of the coefficients to be less than or equal to a constant when performing the regression. That is, an L1 regular-term constraint on the regression coefficients is added to the objective function, and the residual sum of squares is simultaneously minimized, so that some regression coefficients become strictly equal to 0, thereby eliminating the irrelevant and partially redundant features. Finally, combination with the coordinate descent method (CDM) [32] yields the "Gold Standard" feature subset. The construction process is shown in Figure 1.

Figure 1: LAPLS structure.

The specific construction process is as follows:

  • Step 1. Standardize the data (z-score): X → E0, Y → F0.

  • Step 2. Extract the latent variables: the first latent variables t1 and u1 are extracted from E0 and F0, respectively, with the goals of carrying the largest amount of variation information, var(t1) → max and var(u1) → max; having the largest degree of correlation, r(t1, u1) → max; and, combining both requirements, having the largest covariance, Cov(t1, u1) → max. Here o1 and c1 are the first unit weight vectors of E0 and F0. They are obtained by maximizing o1^T E0^T F0 c1 with the Lagrangian method; o1 and c1 are then the eigenvectors corresponding to the largest eigenvalues of E0^T F0 F0^T E0 and F0^T E0 E0^T F0, respectively. From these, t1, u1 and the residual matrices E1, F1 are computed as t1 = E0 o1, u1 = F0 c1, F1 = F0 − t1 r1^T, and E1 = E0 − t1 p1^T, with r1 = F0^T t1/‖t1‖^2 and p1 = E0^T t1/‖t1‖^2. The residual matrices E1, F1 then take the place of E0, F0, and o2, c2 and the second latent variables t2, u2 are calculated in the same way.

  • Step 3. Judge whether satisfactory accuracy has been reached: according to the definition of cross-validity in equation (1), if the currently extracted latent variable t2 yields Q_h^2 < 0.0975 [30], then adding t2 has no significant effect on reducing the prediction error of the equation; the previously extracted latent variable t1 is therefore sufficient to achieve satisfactory accuracy, and the extraction can be terminated. If, however, the currently extracted latent variable t2 yields Q_h^2 ≥ 0.0975, it can be assumed that adding t2 will improve the prediction accuracy; the next latent variable is then extracted and judged in the same way as to whether it helps reduce the prediction error of the equation. This loop continues until the accuracy no longer improves, and the extraction is then terminated:

$$Q_h^2 = 1 - \frac{\sum_{i=1}^{q}\left(F_{0i}-F_{0h(-i)}\right)^2}{\sum_{i=1}^{q}\left(F_{0i}-F_{0(h-1)i}\right)^2}, \tag{1}$$
  •   where q is the number of samples, h is the number of latent variables (h = 2,…, r), and r is the rank of the matrix E0. F0h(−i) is the fitted value of F0 at sample point i, obtained as follows: the sample points are first divided into two parts (n − 1 points and the remaining single point), a regression equation with h latent variables is fitted on the n − 1 points, and the left-out sample point is then substituted into this equation to obtain the fitted value F0h(−i). In addition, all sample points are used to fit the regression equation containing h − 1 latent variables, from which the predicted value F0(h−1)i of the ith sample point is obtained.

  • Step 4. Acquire the regression coefficients: assuming that m (m < r) latent variables have been extracted when the satisfactory accuracy is achieved, the regression equation of F0 on the latent variables and the corresponding inverse-normalized equation are as shown in equation (2), which yields the regression coefficients W = (w1, w2,…, wm) to be processed:

$$F_0 = t_1 r_1^{\mathsf T}+t_2 r_2^{\mathsf T}+\cdots+t_m r_m^{\mathsf T}+F_m, \qquad Y=\sum_{k=1}^{m} w_k x_k + F_{ml}, \tag{2}$$
  •   where Fml is the lth column of the residual matrix Fm, l = 1, 2,…, L, and L is the number of dependent variables.

  • Step 5. Construct the objective function: after the above PLS regression has generated the coefficients, the function J(w) is constructed by combining the regression coefficients W with the L1 regularization term of the lasso algorithm, subject to the constraint that the sum of the absolute values of the regression coefficients wj is less than or equal to a threshold. Under this condition, the residual sum of squares is minimized:

$$J(w)=\sum_{i=1}^{q}\left(y_i-\sum_{j=1}^{m}w_j x_{ij}\right)^2+\lambda\sum_{j=1}^{m}\left|w_j\right|. \tag{3}$$
  •   Equivalently, in constrained form, the residual sum of squares is minimized subject to the L1 constraint:

$$\arg\min_{w}\ \sum_{i=1}^{q}\left(y_i-\sum_{j=1}^{m}w_j x_{ij}\right)^2 \quad \text{s.t.}\quad \sum_{j=1}^{m}\left|w_j\right|\le s, \tag{4}$$
  •   where q is the number of samples, s is the threshold, and the regularization parameter is λ = e^(iter − k) with k ∈ [8, 17], where iter denotes the iteration number.

  • Step 6. Solve the function: since the new model imposes the L1 regular-term constraint on the regression coefficients generated by PLS, the constructed function contains absolute values and is therefore not differentiable at zero. This study therefore uses the coordinate descent method [33] to solve the problem. It is worth noting that the new algorithm (LAPLS) is a reimplementation of the standard coordinate descent algorithm for lasso regression with an initial solution generated using PLS (Algorithm 1; a minimal code sketch of these steps is given after Algorithm 1).

  •   First, the function is divided into 2 parts, RSS = ∑_{i=1}^{q}(y_i − ∑_{j=1}^{m} w_j x_{ij})^2 and L1 = λ∑_{j=1}^{m}|w_j|. Next, the partial derivatives with respect to w_j are calculated separately so that the overall (sub)derivative can be obtained:

$$\frac{\partial J(w)}{\partial w_j}=\begin{cases}2z_j w_j-2\rho_j-\lambda, & w_j<0,\\ \left[-2\rho_j-\lambda,\ -2\rho_j+\lambda\right], & w_j=0,\\ 2z_j w_j-2\rho_j+\lambda, & w_j>0,\end{cases} \tag{5}$$
  •   where ρ_j = ∑_{i=1}^{q} x_{ij}(y_i − ∑_{k≠j} w_k x_{ik}) and z_j = ∑_{i=1}^{q} x_{ij}^2.

  •   Setting ∂J(w)/∂w_j = 0 (using the subgradient at zero) then determines the regression coefficient w_j under the regular-term constraint:

$$w_j=\begin{cases}\dfrac{\rho_j+\lambda/2}{z_j}, & \rho_j<-\dfrac{\lambda}{2},\\[1mm] 0, & \left|\rho_j\right|\le\dfrac{\lambda}{2},\\[1mm] \dfrac{\rho_j-\lambda/2}{z_j}, & \rho_j>\dfrac{\lambda}{2}.\end{cases} \tag{6}$$
  • Step 7. Perform multiple iterations of Step 6; at certain iterations, some regression coefficients are compressed strictly to 0, so that the best feature subset (i.e., the "Gold Standard") can be selected. The selected subset of features is then used in a regression to construct a simpler, optimized model than the full PLS regression:

$$Y=\sum_{h} w_h x_h + F_h, \quad h<m. \tag{7}$$

Algorithm 1: LAPLS.
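To make the steps above concrete, the following Python sketch combines a PLS fit (Steps 1 through 4) with lasso-style coordinate descent (Steps 5 through 7). It is an editor's illustration under stated assumptions, not the authors' Algorithm 1: the use of scikit-learn's PLSRegression, the number of components, and the default k = 10 in the λ = e^(iter − k) schedule are assumed choices.

```python
# Editor's illustrative sketch of the LAPLS procedure (not the authors' exact Algorithm 1).
import numpy as np
from sklearn.cross_decomposition import PLSRegression


def coordinate_descent_pass(X, y, w, lam):
    """One sweep of lasso coordinate descent using the soft-thresholding rule of equation (6)."""
    for j in range(X.shape[1]):
        residual_j = y - X @ w + X[:, j] * w[j]      # residual with feature j excluded
        rho_j = X[:, j] @ residual_j
        z_j = X[:, j] @ X[:, j]
        if rho_j < -lam / 2:
            w[j] = (rho_j + lam / 2) / z_j
        elif rho_j > lam / 2:
            w[j] = (rho_j - lam / 2) / z_j
        else:
            w[j] = 0.0                               # insignificant coefficient compressed to zero
    return w


def lapls_sketch(X, y, n_components=3, n_iter=15, k=10):
    # Step 1: z-score standardization (X -> E0, y -> F0).
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = (y - y.mean()) / y.std()
    # Steps 2-4: PLS extracts latent variables and supplies the initial coefficient vector.
    pls = PLSRegression(n_components=n_components).fit(X, y)
    w = np.asarray(pls.coef_).ravel().copy()
    # Steps 5-7: iterate coordinate descent with the schedule lambda = e^(iter - k).
    history = []
    for it in range(1, n_iter + 1):
        lam = np.exp(it - k)
        w = coordinate_descent_pass(X, y, w, lam)
        history.append((it, np.flatnonzero(w)))      # features surviving after this iteration
    return w, history
```

In the paper, the subset obtained at the iteration with the best R-squared is kept as the "Gold Standard"; here `history` simply records which features survive each sweep so that a caller can make that choice.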

4. Experimental Design

4.1. Experimental Data Description

In this study, the 6 experimental datasets included the traditional Chinese medicine data (WYHXB, NYWZ, and DCQT) from the Key Laboratory of Modern Chinese Medicine Preparations, Ministry of Education, as well as Communities and Crime (CCrime), Breast Cancer Wisconsin (Prognostic) (BreastData), and the Residential Building Dataset (RBuild) from the UCI Machine Learning Repository. The basic information for each dataset is listed in Table 1. WYHXB has 798 features, 1 dependent variable, and 54 samples; NYWZ has 10283 features, 1 dependent variable, and 54 samples; DCQT has 9 features, 1 dependent variable, and 10 samples; CCrime describes community crime and includes 127 features, 1 dependent variable, and 1994 samples; BreastData describes breast cancer cases and includes 34 features, 1 dependent variable, and 198 samples; and RBuild describes residential buildings and includes 103 features, 1 dependent variable, and 372 samples. Since the UCI datasets generally had numerous missing values, the mean filling method was used for data preprocessing during the experiments (a brief sketch of one possible implementation is given below). CCrime, BreastData, and RBuild were adopted from the UCI repository so that the regression performance of the new model could also be compared on public datasets with diverse characteristics, validating its reliability and robustness.
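The exact preprocessing code is not given in the paper; the hedged sketch below shows one common way to apply mean filling with scikit-learn's SimpleImputer. The file name is a placeholder, and the assumption that missing values are marked with "?" reflects the usual format of the Communities and Crime data.

```python
# Possible mean-filling preprocessing for a UCI dataset with missing values (sketch only).
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("communities.data", header=None, na_values="?")   # placeholder file path
X = df.iloc[:, :-1].select_dtypes(include=[np.number])             # numeric feature columns
y = df.iloc[:, -1]                                                  # dependent variable

X_filled = SimpleImputer(strategy="mean").fit_transform(X)          # replace NaNs with column means
print(np.isnan(X_filled).any())                                     # -> False
```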

Table 1.

Basic dataset information (default task: regression).

Dataset Number of samples Number of attributes
WYHXB 54 799 (798 + 1)
NYWZ 54 10284 (10283 + 1)
DCQT 10 10 (9 + 1)
CCrime 1994 128 (127 + 1)
BreastData 198 35 (34 + 1)
RBuild 372 104 (103 + 1)

WYHXB and NYWZ are the basic experimental data of Shenfu injection used to treat cardiogenic shock, in which the left anterior descending coronary artery near the edge of the heart was used to replicate the middle-end cardiogenic shock rat model. Seven groups of 6 model rats were injected with 0.1, 0.33, 1.0, 3.3, 10, 15, or 20 mL·kg−1 of Shenfu injection, respectively; a model control group and a blank group were also set up. Sixty minutes after administration of the Shenfu injection, the red blood cell flow rate (μm/s) was collected as the pharmacodynamic indicator. The material information contained in the Shenfu injection is called the exogenous substance (i.e., the WYHXB data, shown in part in Table 2), and the material information of the experimental individual is called the endogenous substance (i.e., the NYWZ data, shown in part in Table 3). In the 2 datasets, the material information constitutes the independent variables (i.e., the features), and the red blood cell flow rate is the dependent variable.

Table 2.

Partial data of basic experiments with traditional Chinese medicine substances (WYHXB).

0.34_237.0119 (m/z) 0.35_735.1196 (m/z) 0.36_588.0942 (m/z) 0.36_590.0903 (m/z) Red blood cell flow rate (μ m/s)
0.48808 302.16 0 27.8589 750
100.078 62.016 0 3.80712 1400
11.6992 52.5058 7.61005 4.85059 785
143.643 284.113 0 456.607 790
7.75089 54.4535 0 0 670
18.2499 0 0 14.6621 680
28.5783 0 0 2.3551 850
2.91064 0 16.1624 3.41406 620

Table 3.

Partial data of basic experiments with traditional Chinese medicine substances (NYWZ).

11.10_787.5077 (m/z) 12.29_526.1784 (m/z) 12.29_531.2005 (m/z) 12.47_631.3847 (m/z) Red blood cell flow rate (μ m/s)
53.3719 11557.6 764.329 1795.79 2200
43.4717 7971.33 875.465 1842.39 2750
76.507 3399.9 870.161 1562.81 1980
153.145 51027.4 916.064 1619.62 1860
16.3197 10694.4 942.699 1612.42 2100
42.2836 11048.1 714.536 1649.23 2000
55.5021 4702.83 748.844 1632.9 2481
153.21 78912.8 835.24 1647.55 2970

The traditional Chinese medicine dataset DCQT was mainly used to study the factors affecting a physiological index (D(−)-lactic acid content) under the influence of the active ingredients of Chinese medicinal rhubarb. The features are the contents of the active ingredients: aloe emodin, emodin, rhein, chrysophanol, emodin methyl ether, magnolol, honokiol, hesperidin, and synephrine. The dependent variable is the D(−)-lactic acid content. Partial experimental data are listed in Table 4.

Table 4.

Effects of active ingredients in traditional Chinese medicine on physiological indices (DCQT).

Aloe emodin Emodin Rhein Synephrine D(−)-lactic acid
0.0625 0.0468 0.0945 0.2198 0.0625
0.0450 0.0317 0.0558 0.4865 0.0525
0.0075 0.0085 0.0126 0.0176 0.0300
0.0350 0.0278 0.0434 0.0709 0.0400
0.1006 0.0875 0.1841 0.1239 0.0575
0.1060 0.0960 0.1982 0.0536 0.1325
0.0540 0.0441 0.0871 0.0471 0.1900

4.2. Results and Discussion

4.2.1. Experimental Parameters

Because the experimental datasets have different characteristics, their optimal model parameters differ, so a strategy of selecting and tuning the parameters for each dataset is needed to ensure reliable results. First, the model parameters were initialized to s = 0.1 and λ = e^(iter − 10), where s is the model threshold, λ is the regularization parameter, and iter is the number of iterations. Next, starting from this initialization, a comparison strategy was used: with s = 0.1 held fixed, the value of k in λ = e^(iter − k) was gradually increased or decreased until the best value was found according to the R-squared evaluation index (as shown in Table 5). Finally, keeping the selected k fixed for each dataset while increasing or decreasing the threshold s, a preferred set of model parameters was obtained (as shown in Table 6).
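This two-stage search can be summarized by the hedged sketch below, where laps_r2 stands for an assumed helper that fits LAPLS with threshold s and λ = e^(iter − k) and returns the resulting R-squared value (for instance, a wrapper around the sketch given after Algorithm 1); the grid of s values mirrors the settings listed in Table 6.

```python
# Two-stage parameter search (sketch; laps_r2 is an assumed helper, not the authors' code).
def select_parameters(X, y, laps_r2):
    # Stage 1: hold s = 0.1 fixed and scan k over the range [8, 17].
    best_k = max(range(8, 18), key=lambda k: laps_r2(X, y, s=0.1, k=k))
    # Stage 2: hold the selected k fixed and scan a small grid of thresholds s.
    s_grid = [0.001, 0.005, 0.01, 0.05, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.20]
    best_s = max(s_grid, key=lambda s: laps_r2(X, y, s=s, k=best_k))
    return best_s, best_k
```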

Table 5.

R-squared values for the 6 datasets with different λ = e^(iter − k) values (s = 0.1 held constant).

k = 8 k = 9 k = 10 k = 11 k = 12 k = 13 k = 14 k = 15 k = 16 k = 17
WYHXB 0.4756 0.5412 0.5881 0.6927 0.7321 0.7434 0.7408 0.7407 0.7411 0.7411
NYWZ 0.5827 0.6938 0.6521 0.7434 0.7689 0.7615 0.7452 0.7316 0.7294 0.7294
DCQT 0.1258 0.8760 0.9546 0.9417 0.9355 0.8686 0.9277 0.9277 0.9277 0.9277
CCrime 0.4517 0.4517 0.5320 0.6621 0.6708 0.6680 0.6684 0.6684 0.6684 0.6684
BreastData 0.6475 0.6475 0.5435 0.6829 0.7436 0.5697 0.5728 0.5732 0.5641 0.5641
RBuild 0.7567 0.7567 0.7567 0.8195 0.8801 0.9593 0.9404 0.9181 0.9045 0.8757
Table 6.

Comparative analysis of several parameter combinations for 6 datasets.

s, λ = e^(iter − k) R2
WYHXB
s = 0.001, k = 13 0.7366
s = 0.005, k = 13 0.7366
s = 0.01, k = 13 0.7427
s = 0.05, k = 13 0.7403
s = 0.1, k = 13 (selected) 0.7434
s = 0.11, k = 13 0.7434
s = 0.12, k = 13 0.7434
s = 0.13, k = 13 0.7418
s = 0.14, k = 13 0.7418
s = 0.15, k = 13 0.7418
s = 0.20, k = 13 0.7235

NYWZ
s = 0.001, k = 12 0.7361
s = 0.005, k = 12 0.7361
s = 0.01, k = 12 0.7435
s = 0.05, k = 12 0.7456
s = 0.1, k = 12 0.7689
s = 0.11, k = 12 (selected) 0.7692
s = 0.12, k = 12 0.7692
s = 0.13, k = 12 0.7468
s = 0.14, k = 12 0.7468
s = 0.15, k = 12 0.7468
s = 0.20, k = 12 0.7344

DCQT
s = 0.001, k = 10 0.8760
s = 0.005, k = 10 0.8760
s = 0.01, k = 10 0.8760
s = 0.05, k = 10 0.8760
s = 0.1, k = 10 (selected) 0.9546
s = 0.11, k = 10 0.9546
s = 0.12, k = 10 0.9546
s = 0.13, k = 10 0.9546
s = 0.14, k = 10 0.9546
s = 0.15, k = 10 0.9546
s = 0.20, k = 10 0.9546

CCrime
s = 0.001, k = 12 0.6524
s = 0.005, k = 12 0.6524
s = 0.01, k = 12 (selected) 0.6722
s = 0.05, k = 12 0.6625
s = 0.1, k = 12 0.6708
s = 0.11, k = 12 0.6620
s = 0.12, k = 12 0.6678
s = 0.13, k = 12 0.6678
s = 0.14, k = 12 0.6678
s = 0.15, k = 12 0.6678
s = 0.20, k = 12 0.6678

BreastData
s = 0.001, k = 12 0.6419
s = 0.005, k = 12 0.6419
s = 0.01, k = 12 0.6419
s = 0.05, k = 12 0.7401
s = 0.1, k = 12 0.7436
s = 0.11, k = 12 0.7436
s = 0.12, k = 12 0.7436
s = 0.13, k = 12 (selected) 0.7887
s = 0.14, k = 12 0.7887
s = 0.15, k = 12 0.7887
s = 0.20, k = 12 0.7213

RBuild
s = 0.001, k = 13 0.9407
s = 0.005, k = 13 0.9407
s = 0.01, k = 13 0.9589
s = 0.05, k = 13 (selected) 0.9845
s = 0.1, k = 13 0.9593
s = 0.11, k = 13 0.9593
s = 0.12, k = 13 0.8803
s = 0.13, k = 13 0.8803
s = 0.14, k = 13 0.8803
s = 0.15, k = 13 0.9285
s = 0.20, k = 13 0.9285

In Table 5, with s = 0.1 held fixed, the results are as follows: for the WYHXB data, the R-squared evaluation indicates that the best result occurs when k = 13 (i.e., λ = e^(iter − 13)); for the NYWZ data, when k = 12 (i.e., λ = e^(iter − 12)); for the DCQT data, when k = 10 (i.e., λ = e^(iter − 10)); for the CCrime data, when k = 12 (i.e., λ = e^(iter − 12)); for the BreastData data, when k = 12 (i.e., λ = e^(iter − 12)); and for the RBuild data, when k = 13 (i.e., λ = e^(iter − 13)). By comparing the results in Table 6, one group of optimal parameters can then be selected for each dataset: for the WYHXB data, λ = e^(iter − 13) and s = 0.1; for the NYWZ data, λ = e^(iter − 12) and s = 0.11; for the DCQT data, λ = e^(iter − 10) and s = 0.1; for the CCrime data, λ = e^(iter − 12) and s = 0.01 (the setting with the highest R-squared in Table 6); for the BreastData data, λ = e^(iter − 12) and s = 0.13; and for the RBuild data, λ = e^(iter − 13) and s = 0.05.

At the same time, in order to verify the feasibility and availability of the LAPLS method, feature analyses of each dataset were carried out during the experiments (with the model parameters set to the values selected above). Specifically, by examining each iteration (the number of iterations ranged from 1 to 25), the iteration at which the model achieves the best feature subset (the "Gold Standard") and the best corresponding R-squared value can be identified. The results of this analysis are shown in Figures 2–8. It can be seen in Figure 2 that the number of features in the 6 groups of experimental data decreased as the number of iterations increased, reflecting the progressive elimination of irrelevant and partially redundant features. Similarly, Figures 3–8 show the trend of the corresponding R-squared values during the iterative process for each dataset (i.e., across the stages in which the number of features changes). This does not mean, however, that fewer features always yield a better result. The specific results were as follows: for the WYHXB data, after 10 iterations, 425 significant features were selected (373 features had been eliminated) and the corresponding R-squared value was optimal; for the NYWZ data, after 12 iterations, 1247 significant features were selected (9036 features had been eliminated) and the corresponding R-squared value was optimal; for the DCQT data, after 11 iterations, 5 significant features were selected (4 features had been eliminated) and the corresponding R-squared value was optimal; for the CCrime data, after 11 iterations, 82 significant features were selected (45 features had been eliminated) and the corresponding R-squared value was optimal; for the BreastData data, after 13 iterations, 22 significant features were selected (12 features had been eliminated) and the corresponding R-squared value was optimal; and for the RBuild data, after 14 iterations, 39 significant features were selected (64 features had been eliminated) and the corresponding R-squared value was optimal.

Figure 2: Change in the number of features.

Figure 3: R-squared for WYHXB.

Figure 4: R-squared for NYWZ.

Figure 5: R-squared for DCQT.

Figure 6: R-squared for CCrime.

Figure 7: R-squared for BreastData.

Figure 8: R-squared for RBuild.

From the above experiments, we could determine the respective parameters and the corresponding iteration counts for each of the 6 datasets and also observe the dimensionality reduction achieved by the new model's feature selection (i.e., the extent to which irrelevant and redundant features were eliminated), as shown in Figure 9. It is worth noting, however, that the number of eliminated features cannot be 0, given the meaning of feature selection.

Figure 9: Feature selection results for the 6 datasets.

4.2.2. Comparison of the LAPLS with Other Methods

For further analysis of the new model, each dataset was randomly divided into a training set and a test set (ratio 7 : 3), and traditional partial least squares (PLS), lasso, PLSRFE, and the improved algorithm (LAPLS) were used for training and learning. The test set was then subjected to a regression experiment (with parameters and numbers of iterations consistent with the optimal values listed previously), in which the R-squared (R2) and root-mean-square error (RMSE) values were used as the model evaluation indicators. To ensure the reliability of the experimental results, 10 tests were performed for each set of experimental data, and the respective average values were taken as the final results, as shown in Table 7 (a sketch of this evaluation protocol is given below). With this experimental design, the new model could be verified from two perspectives: (1) the comparison between the LAPLS and the traditional methods (PLS, lasso) and (2) the comparison between the LAPLS and the same type of feature selection method (PLSRFE).
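The evaluation protocol can be summarized by the hedged sketch below, where fit_predict stands for any of the compared methods (PLS, lasso, PLSRFE, or LAPLS); it illustrates the repeated 7:3 hold-out procedure, not the authors' actual code.

```python
# Repeated 7:3 hold-out evaluation reporting mean R-squared and RMSE (sketch only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

def evaluate(X, y, fit_predict, n_runs=10):
    r2s, rmses = [], []
    for run in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=run)
        y_pred = fit_predict(X_tr, y_tr, X_te)            # train on 70%, predict the 30% test set
        r2s.append(r2_score(y_te, y_pred))
        rmses.append(np.sqrt(mean_squared_error(y_te, y_pred)))
    return np.mean(r2s), np.mean(rmses)                   # averages over the 10 repetitions
```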

Table 7.

Comparison of experimental results of the LAPLS with other methods (evaluation indicators: R2 and RMSE).

PLS Lasso PLSRFE LAPLS
R2 RMSE R2 RMSE R2 RMSE R2 RMSE
WYHXB 0.5660 422.1680 0.4538 427.7071 0.6498 418.3305 0.6558 412.7325
NYWZ 0.6072 154.8713 0.6254 152.8729 0.6791 158.7410 0.7326 140.5172
DCQT 0.8262 0.01620 0.7988 0.0129 0.8848 0.0181 0.9384 0.0117
CCrime 0.6419 0.1388 0.6609 0.1306 0.7355 0.1413 0.6703 0.1516
BreastData 0.6333 3.5338 0.6414 3.1766 0.5777 3.6686 0.7064 3.1468
RBuild 0.9616 221.2931 0.9815 226.7571 0.9746 190.4369 0.9831 202.5260
Average 0.7060 133.6702 0.6936 135.1095 0.7502 128.5561 0.7811 126.5143

Table 7 lists the experimental results of LAPLS regression on the test sets of the 6 original datasets (WYHXB, NYWZ, DCQT, CCrime, BreastData, and RBuild). The R2 values were 0.6558, 0.7326, 0.9384, 0.6703, 0.7064, and 0.9831, respectively, and the RMSE values were 412.7325, 140.5172, 0.0117, 0.1516, 3.1468, and 202.5260, respectively. Compared with PLS and lasso, the LAPLS had a slightly inferior RMSE for the CCrime data (the dataset with the larger sample size): its error was 0.0128 greater than that of PLS and 0.0212 greater than that of lasso. For the remaining experimental datasets, however, the new method performed better than the traditional methods. Compared with the PLSRFE, the LAPLS was slightly inferior for the CCrime data, and its RMSE was slightly higher than that of the PLSRFE for the RBuild data, although the LAPLS results were better than those of the PLSRFE for the remaining experimental data. Overall, the results of the improved algorithm were better than those of the other existing algorithms, indicating that the new model does eliminate irrelevant and redundant features. In addition, as the experimental results show, the new model proved to be relatively adaptable, performing effectively not only on multifeature data but also on data with fewer features.

In order to observe the experimental results more intuitively, trend graphs were plotted separately (Figures 10 and 11) to show the fluctuations of the R2 and RMSE values. It can be seen that the R2 and RMSE values of the new model on the 6 sets of experimental data are basically superior to those of the other algorithms, indicating that the regression results of the new model are better and that the approach effectively eliminates irrelevant and partially redundant features. In summary, the improved algorithm not only performs feature selection and screens out the "Gold Standard" feature subset for general high-dimensional data but is also well suited to traditional Chinese medicine data.

Figure 10: Experimental results of 6 datasets (R-squared).

Figure 11: Experimental results of 6 datasets (RMSE).

5. Conclusions

The traditional partial least squares method has no feature selection function, so the goal of obtaining a higher-quality feature subset cannot be achieved for experimental data from traditional Chinese medicine. Given this, we proposed a feature selection method based on partial least squares. The method makes full use of the advantages of the lasso algorithm, namely, imposing a constraint on the sum of the absolute values of the regression coefficients to carry out feature selection, while combining it with the partial least squares method, which overcomes the multicollinearity problem in the regression analysis. In this way, both data dimensionality reduction and the screening of the "Gold Standard" feature subset are realized. The experimental comparison on TCM data and UCI datasets clearly demonstrated that the improved algorithm strengthens the interpretability and prediction accuracy of the model and is a suitable analytical method for TCM data. The improved algorithm, however, has the disadvantage of eliminating only part of the redundant features from high-dimensional data. Going forward, we will continue to improve the algorithm in order to boost its efficiency. In addition, how to set reasonable parameters during model construction also requires further study.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (nos. 61762051 and 61562045) and the Jiangxi Province Major Projects Fund (no. 20171ACE50021).

Data Availability

The TCM data used in this study can be obtained by contacting the first author, and other data sets can be obtained through the UCI Machine Learning Repository.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

  • 1. Andreas M., Benjamin H., Elisabeth W., Tobias H., Sebastian M., Olaf G. An update on statistical boosting in biomedicine. Computational and Mathematical Methods in Medicine. 2017;2017:12. doi: 10.1155/2017/6083072.
  • 2. Toribio A. R., Prisle N. L., Wexler A. S. Statistical mechanics of multilayer sorption: surface concentration modeling and XPS measurement. Journal of Physical Chemistry Letters. 2018;9(6):1461–1464. doi: 10.1021/acs.jpclett.8b00332.
  • 3. Li W., Han H., Cheng S., Zhang Y., Liu S., Qu H. A feasibility research on the monitoring of traditional Chinese medicine production process using NIR-based multivariate process trajectories. Sensors and Actuators B: Chemical. 2016;231:313–323. doi: 10.1016/j.snb.2016.03.023.
  • 4. Liu J., Gao P., Yuan J., Du X. An effective method of monitoring the large-scale traffic pattern based on RMT and PCA. Journal of Probability and Statistics. 2010;2010:16. doi: 10.1155/2010/375942.
  • 5. Hellton K. H., Hjort N. L. Fridge: focused fine-tuning of ridge regression for personalized predictions. Statistics in Medicine. 2018;37(8):1290–1303. doi: 10.1002/sim.7576.
  • 6. Andrea B., Rahnenführer J., Lang M. A multicriteria approach to find predictive and sparse models with stable feature selection for high-dimensional data. Computational and Mathematical Methods in Medicine. 2017;2017:18. doi: 10.1155/2017/7907163.
  • 7. Xuan H. Research and development of feature dimensionality reduction. Computer Science. 2018;45(S1).
  • 8. Peng Y., Zu C., Zhang D. Hypergraph based multi-modal feature selection and its application. Journal of Frontiers of Computer Science and Technology. 2018;12(1):112–118. doi: 10.3778/j.issn.1673-9418.1611004.
  • 9. Ye Q., Gao Y., Wu R., et al. Informative gene selection method based on symmetric uncertainty and SVM recursive feature elimination. International Journal of Pattern Recognition and Artificial Intelligence. 2017;30(5):429–438.
  • 10. Zhang J., Hu X. G., Li P. P., et al. Informative gene selection for tumor classification based on iterative lasso. Pattern Recognition & Artificial Intelligence. 2014;27(1):49–59.
  • 11. Hu M., Zheng L., Tang L., et al. Feature selection algorithm based on joint spectral clustering and neighborhood mutual information. Pattern Recognition & Artificial Intelligence. 2017;30(12):1121–1129.
  • 12. Huang L., Tang J., Sun D., Luo B. Feature selection algorithm based on multi-label ReliefF. Journal of Computer Applications. 2012;32(10):2888–2890. doi: 10.3724/sp.j.1087.2012.02888.
  • 13. Feng Y., Lin X., Shen L., Hong Y. L. Pharmaceutical study on multi-component traditional Chinese medicines. China Journal of Chinese Materia Medica. 2013;38(5):629–632. doi: 10.4268/cjcmm20130502.
  • 14. Hao Z., Du J., Nie B., Yu F., Yu R., Xiong W. Random forest regression based on partial least squares connect partial least squares and random forest. Proceedings of the International Conference on Artificial Intelligence: Technologies and Applications; January 2016; Bangkok, Thailand.
  • 15. Yu F., Du J. Q., Nie B., Xiong J., Zhu Z.-P., Liu L. Optimization method of fusing model tree into partial least squares. ITM Web of Conferences. 2017;12:03032. doi: 10.1051/itmconf/20171203032.
  • 16. You W., Yang Z., Yuan M., Ji G. TotalPLS: local dimension reduction for multicategory microarray data. IEEE Transactions on Human-Machine Systems. 2014;44(1):125–138. doi: 10.1109/thms.2013.2288777.
  • 17. Shang Z., Dong Y., Li M., Li Z. Robust feature selection and classification algorithm based on partial least squares regression. Journal of Computer Applications. 2017;37(3):871–875. doi: 10.11772/j.issn.1001-9081.2017.03.871.
  • 18. Nagaraja V. K., Abdalmageed W. Feature selection using partial least squares regression and optimal experiment design. Proceedings of the International Joint Conference on Neural Networks (IJCNN); July 2015; Killarney, Ireland: IEEE.
  • 19. Wanfeng S., Xuegang H. U. K-part lasso based on feature selection algorithm for high-dimensional data. Computer Engineering and Applications. 2012;48(1):157–161.
  • 20. Friedman J. H. Multivariate adaptive regression splines. Annals of Statistics. 1991;19(1):1–67. doi: 10.1214/aos/1176347963.
  • 21. Hastie T., Taylor J., Tibshirani R., Walther G. Forward stagewise regression and the monotone lasso. Electronic Journal of Statistics. 2007;1:1–29. doi: 10.1214/07-ejs004.
  • 22. Tibshirani R. Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2011;73(3):273–282. doi: 10.1111/j.1467-9868.2011.00771.x.
  • 23. You F. M., Booker H. M., Duguid S. D., Jia G., Cloutier S. Accuracy of genomic selection in biparental populations of flax (Linum usitatissimum L.). Crop Journal. 2016;4(4):290–303. doi: 10.1016/j.cj.2016.03.001.
  • 24. Breiman L. Better subset regression using the nonnegative garrote. Technometrics. 1995;37(4):373–384. doi: 10.2307/1269730.
  • 25. Muthukrishnan R., Rohini R. LASSO: a feature selection technique in predictive modeling for machine learning. Proceedings of the IEEE International Conference on Advances in Computer Applications (ICACA); December 2016; London, UK: IEEE; pp. 18–20.
  • 26. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological). 1996;58(1):267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x.
  • 27. Tsuchida J., Yadohisa H. Partial least-squares method for three-mode three-way datasets based on Tucker model. Procedia Computer Science. 2017;114:234–241. doi: 10.1016/j.procs.2017.09.065.
  • 28. Huang H., Chen B., Liu C. Safety monitoring of a super-high dam using optimal kernel partial least squares. Mathematical Problems in Engineering. 2015;2015(12):13. doi: 10.1155/2015/571594.
  • 29. Wang H., Hu Z. Maximum margin criterion embedded partial least square regression for linear and nonlinear discrimination. Proceedings of the International Conference on Computational Intelligence & Security; November 2006; Guangzhou, China.
  • 30. Wang H. Linear and Nonlinear Methods for Partial Least Squares Regression. Beijing, China: National Defense Industry Press; 2006.
  • 31. Zhu Y., Wang H. A simplified algorithm of PLS regression. Journal of Systems Science and Systems Engineering. 2000;9(4):31–36.
  • 32. Grant E., Lange K., Wu T. T. Coordinate descent algorithms for L1 and L2 regression. Journal of the American College of Cardiology. 1995;25(2):491–499.
  • 33. Nesterov Y. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization. 2012;22(2):341–362. doi: 10.1137/100802001.
