Journal of Applied Statistics. 2022 Jan 31;50(6):1283–1309. doi: 10.1080/02664763.2022.2026895

Local Linear Regression and the problem of dimensionality: a remedial strategy via a new locally adaptive bandwidths selector

O. Eguasa, E. Edionwe, J. I. Mbegbu
PMCID: PMC10071963  PMID: 37025278

Abstract

Local Linear Regression (LLR) is a nonparametric regression model applied in the modeling phase of Response Surface Methodology (RSM). LLR does not make reference to any fixed parametric model. Hence, LLR is flexible and can capture local trends in the data that might be too complicated for the OLS. However, besides the small sample sizes and sparse data that characterize RSM, the performance of the LLR model deteriorates rapidly as the number of explanatory variables considered in the study increases. This phenomenon, popularly referred to as the curse of dimensionality, results in the scanty application of LLR in RSM. In this paper, we propose a novel locally adaptive bandwidths selector which, unlike the fixed bandwidths and existing locally adaptive bandwidths selectors, takes into account both the number of explanatory variables in the study and their individual values at each data point. Single and multiple response problems from the literature and simulated data were used to compare the performance of the LLRPAB with those of the OLS, LLRFB and LLRAB. Neural network activation functions such as ReLU, Leaky-ReLU, SELU and SPOCU were also considered and give a remarkable improvement in the loss function (Mean Squared Error) over the regression models utilized on the three data sets.

KEYWORDS: Desirability function, locally adaptive bandwidths selector, Local Linear Regression model, SPOCU activation function, Response Surface Methodology

1. Introduction

Response Surface Methodology (RSM) is a collection of mathematical and statistical techniques applied in the modeling and analysis of data in which a response is influenced by one or more explanatory variables, [5]. There are three distinct phases in RSM, namely, the Design of Experiment Phase, the Modeling Phase, and the Optimization Phase, see [6].

 In the Modeling Phase of RSM, a fundamental assumption is that the relationship between the response variable y and the k explanatory variables x1, x2, …, xk may be represented as:

$$y_i = f(x_{i1}, x_{i2}, \ldots, x_{ik}) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{1}$$

where the mean function f denotes the true but unknown relationship between the response variable and the k explanatory variables, εi, i = 1, 2, …, n, are random error terms assumed to have a normal distribution with mean zero and constant variance, and n is the sample size, see [27,33].

The OLS and the LLR are existing regression models applied in the estimation of the unknown function f in (1) [3,18]. The OLS model is applied in the estimation of the unknown parameters (coefficients) in the parametric (polynomial) model that the experimenter assumes adequate to approximate f in (1), see [33].

1.1. The parametric regression model

$$y = X\beta + \epsilon \tag{2}$$

The OLS estimate $\hat{y}_i^{(OLS)}$ of the response at the ith data point is given as:

$$\hat{y}_i^{(OLS)} = x_i(X^{T}X)^{-1}X^{T}y \tag{3}$$

where y is an n × 1 vector of responses, X is an n × p model matrix, p is the number of model parameters (coefficients), XT is the transpose of the matrix X, and xi is the ith row vector of the matrix X, see [29].

In matrix notation, the vector of OLS estimated response is expressed as:

$$\hat{y}^{(OLS)} = \begin{bmatrix} h_1^{(OLS)} \\ h_2^{(OLS)} \\ \vdots \\ h_n^{(OLS)} \end{bmatrix} y = H^{(OLS)}y, \tag{4}$$

where the vector $h_i^{(OLS)} = x_i(X^{T}X)^{-1}X^{T}$ is the ith row of the n × n OLS Hat matrix H(OLS).

The OLS model requires several assumptions to be met for valid interpretation of its parameter estimates. Furthermore, it performs poorly if the assumed polynomial model is inadequate for the data, also see [33].
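As a concrete illustration of Equations (2)–(4), the short NumPy sketch below builds the hat matrix H(OLS) and the fitted responses for a full second-order model in two coded variables; the design points and response values are illustrative placeholders rather than data taken from the paper.

```python
import numpy as np

def ols_hat_and_fit(X, y):
    """Return the OLS hat matrix H = X (X'X)^{-1} X' and the fitted values, Eqs. (3)-(4)."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # n x n hat matrix H^(OLS)
    return H, H @ y                        # fitted responses y_hat = H y

# Illustrative second-order model matrix in two coded variables x1, x2 (placeholder data)
x1 = np.array([0.15, 0.85, 0.15, 0.85, 0.00, 1.00, 0.50, 0.50, 0.50])
x2 = np.array([0.15, 0.15, 0.85, 0.85, 0.50, 0.50, 0.00, 1.00, 0.50])
y  = np.array([88.6, 85.8, 86.3, 80.4, 85.5, 85.4, 86.2, 85.7, 90.2])
X  = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

H, y_hat = ols_hat_and_fit(X, y)
print(np.round(y_hat, 2))
```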

1.2. The Local Linear Regression model

The LLR model is a nonparametric regression version of the weighted least squares model, also see [14,12,23]. The LLR estimate, y^i(LLR) of yi, is given as:

$$\hat{y}_i^{(LLR)} = \tilde{x}_i(\tilde{X}^{T}W_i\tilde{X})^{-1}\tilde{X}^{T}W_iy = h_i^{(LLR)}y, \tag{5}$$

where x~i is the ith row of the LLR model matrix X~ given as:

$$\tilde{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}_{n \times (k+1)},$$

where xij, i = 1, 2, …, n, j = 1, 2, …, k, denotes the value of the jth explanatory variable at the ith data point, and Wi is an n × n diagonal weights matrix given as:

$$W_i = \begin{bmatrix} w_{1i} & 0 & \cdots & 0 \\ 0 & w_{2i} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{ni} \end{bmatrix}_{n \times n}$$

1.3. Bandwidths for nonparametric regression model

The bandwidth is the most crucial parameter in nonparametric regression and estimation procedures because its selected value determines the shape of the fitted curve, see [14,22,30]. If bij = b for all i and all j, we say that smoothing in the LLR model is done using a fixed or global bandwidth b; otherwise the bij, i = 1, 2, …, n, j = 1, 2, …, k, are referred to as locally adaptive bandwidths. The kernel weights in Wi are computed from

$$K\left(\frac{x_{ij}-x_{1j}}{b_{ij}}\right) = e^{-\left(\frac{x_{ij}-x_{1j}}{b_{ij}}\right)^{2}},$$

the simplified Gaussian kernel function, where bij, 0 < bij ≤ 1, i = 1, 2, …, n, j = 1, 2, …, k, are the bandwidths (smoothing parameters), see [28,29].

In a situation where more than one explanatory variable is used in the model matrix, the kernel weight w1i is a product kernel given as:

$$w_{1i} = \prod_{j=1}^{k} K\left(\frac{x_{ij}-x_{1j}}{b_{ij}}\right) \Big/ \sum_{i=1}^{n}\prod_{j=1}^{k} K\left(\frac{x_{ij}-x_{1j}}{b_{ij}}\right), \quad i = 1, 2, \ldots, n. \tag{6}$$
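As an illustration of Equations (5) and (6), the following minimal NumPy sketch computes the LLR estimate at one design point from a given bandwidth matrix. The reading that each observation i uses its own bandwidths bij inside the kernel follows the indexing in (6), and the function name is our own.

```python
import numpy as np

def llr_fit_at_point(i0, X_raw, y, B):
    """LLR estimate at design point i0, Eq. (5), with product Gaussian weights, Eq. (6).

    X_raw : (n, k) explanatory variables coded in [0, 1]
    B     : (n, k) matrix of locally adaptive bandwidths (the matrix Phi)
    """
    n, k = X_raw.shape
    # Kernel weight of every observation relative to the fitting point i0
    w = np.prod(np.exp(-((X_raw - X_raw[i0]) / B) ** 2), axis=1)
    w = w / w.sum()                                   # normalization as in Eq. (6)
    X_tilde = np.column_stack([np.ones(n), X_raw])    # LLR model matrix
    W = np.diag(w)
    beta = np.linalg.solve(X_tilde.T @ W @ X_tilde, X_tilde.T @ W @ y)
    return X_tilde[i0] @ beta                         # y_hat_i0 = x_i0 (X'W X)^{-1} X'W y
```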

Each value of bij, i = 1, 2, …, n, and j = 1, 2, …, k, may be thought of as an entry in a matrix, say Φ, given as:

$$\Phi = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{bmatrix}_{n \times k}$$

The entries of the matrix Φ are referred to as locally adaptive bandwidths.

In RSM, the matrix comprising the optimal bandwidths b11, b12, …, bnk is obtained from the minimization of the Penalized Prediction Error Sum of Squares (PRESS**):

$$\underset{\{b_{11}, b_{12}, \ldots, b_{nk}\}}{\text{Minimize}}\ PRESS^{**} = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{i,-i}^{(LLR)}\right)^{2}}{n - \operatorname{trace}\left(H^{(LLR)}(\Phi)\right) + (n-k-1)\dfrac{SSE_{\max} - SSE_{\Phi}}{SSE_{\max}}} \tag{7}$$

where SSEmax is the maximum Sum of Squared Errors, obtained as the bij, i = 1, 2, …, n, j = 1, 2, …, k, all tend to infinity, SSEΦ is the Sum of Squared Errors for the bandwidth matrix Φ, trace(H(LLR)(Φ)) is the trace of the LLR Hat matrix, and $\hat{y}_{i,-i}^{(LLR)}$ is the delete-one cross-validation estimate of yi with the ith observation left out, see [33].

The LLR model is flexible and can capture local trends which may be overlooked by the OLS model. However, its performance is generally poor when applied in studies that involve k>1 explanatory variables. This poor performance is referred to as ‘curse of dimensionality’ in the nonparametric regression literature [13].

1.4. Genetic algorithm

Once the data has been modeled, the resulting fitted curve is used for determining the setting of the explanatory variables that optimizes the response based on the production requirement. This task summarizes the aim of the Optimization Phase of RSM, also see [20,24]. In this paper, we perform all the optimization tasks using the Genetic Algorithm (GA) Optimization toolbox available in Matlab software.

The GA was introduced by Holland, see [19]. The GA procedure is based on natural selection and other genetic concepts including population, chromosomes, selection, crossover, mutation, etc., see [7,16]. The GA is an evolutionary optimization tool that can be applied to a variety of problems that are not well suited for standard optimization algorithms, including problems in which the objective function lacks a closed-form expression (such as the LLR model) or is discontinuous, non-differentiable, stochastic or highly nonlinear, see [2,29,32,35] and Figure 1.
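The GA runs reported in this paper use the GA toolbox in Matlab. Purely as a hedged illustration of this Optimization Phase, the sketch below maximizes a fitted response surface over the coded region with SciPy's differential evolution, a related evolutionary optimizer rather than the authors' GA; the fitted_response function is a made-up placeholder standing in for an LLR or OLS fit.

```python
import numpy as np
from scipy.optimize import differential_evolution

def fitted_response(x):
    """Placeholder for a fitted response surface evaluated at x = (x1, x2)."""
    x1, x2 = x
    return 90.0 - 30.0 * (x1 - 0.5)**2 - 25.0 * (x2 - 0.5)**2   # illustrative surface only

# Maximize the response over the coded region [0, 1]^2 by minimizing its negative
result = differential_evolution(lambda x: -fitted_response(x),
                                bounds=[(0.0, 1.0), (0.0, 1.0)], seed=1)
print(result.x, -result.fun)   # optimal setting and the estimated optimal response
```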

Figure 1. A basic GA flowchart.

For multiple response studies that involve m responses, m > 1, it is essential that we get an optimal setting of the explanatory variables that simultaneously optimizes all the responses with respect to their individual production requirements, see [18,31,34]. The most popular criterion applied in the optimization of multiple responses is the Desirability function, see [1,8,15].

1.5. Desirability function

Based on the production requirement of a response, the Desirability function transforms the estimated response, y^p(x) into a scalar measure, dp(y^p(x)).

If the response is of the nominal-the-better (NTB) type, where the acceptable value of the pth response lies between a lower limit, L, and an upper limit, U, dp(ŷp(x)) is given as:

$$d_p(\hat{y}_p(x)) = \begin{cases} 0, & \hat{y}_p(x) < L \\[4pt] \left\{\dfrac{\hat{y}_p(x) - L}{\varnothing - L}\right\}^{t_1}, & L \le \hat{y}_p(x) < \varnothing \\[4pt] \left\{\dfrac{U - \hat{y}_p(x)}{U - \varnothing}\right\}^{t_2}, & \varnothing \le \hat{y}_p(x) \le U \\[4pt] 0, & \hat{y}_p(x) > U \end{cases} \tag{8}$$

where ∅ is the target value of the pth response.

If the objective is to maximize the pth response, dp(y^p(x)) is given by a one-sided transformation as:

$$d_p(\hat{y}_p(x)) = \begin{cases} 0, & \hat{y}_p(x) < L \\[4pt] \left\{\dfrac{\hat{y}_p(x) - L}{\varnothing - L}\right\}^{t_1}, & L \le \hat{y}_p(x) \le \varnothing \\[4pt] 1, & \hat{y}_p(x) > \varnothing \end{cases} \tag{9}$$

where ∅ is interpreted as a large enough value of the pth response.

If the objective is to minimize the pth response, dp(y^p(x)) is given by a one-sided transformation as:

$$d_p(\hat{y}_p(x)) = \begin{cases} 1, & \hat{y}_p(x) < \varnothing \\[4pt] \left\{\dfrac{U - \hat{y}_p(x)}{U - \varnothing}\right\}^{t_2}, & \varnothing \le \hat{y}_p(x) \le U \\[4pt] 0, & \hat{y}_p(x) > U \end{cases} \tag{10}$$

where ∅ is interpreted as a small enough value of the pth response.

In all cases, t1 and t2 are the parameters that control the shape of the desirability function, enabling the user to accommodate nonlinear desirability functions. However, for RSM data, the values of t1 and t2 are taken to be 1, see [6,18].

1.6. The overall desirability

The overall objective of the Desirability criterion is to obtain the setting of the explanatory variables that maximizes the geometric mean (D) of all the individual desirability measures given as:

$$D = \operatorname{maximize}\left(\left(\prod_{p=1}^{m} d_p(\hat{y}_p(x))\right)^{1/m}\right), \tag{11}$$
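A minimal sketch of Equations (8)–(11) with t1 = t2 = 1 is given below. The limits, targets and predicted responses passed in are illustrative (they mirror the process requirements used later in Section 3.3), and the function names are our own.

```python
import numpy as np

def d_target(yhat, L, T, U, t1=1.0, t2=1.0):
    """Two-sided (NTB) desirability of Eq. (8); T denotes the target value."""
    if yhat < L or yhat > U:
        return 0.0
    return ((yhat - L) / (T - L))**t1 if yhat < T else ((U - yhat) / (U - T))**t2

def d_maximize(yhat, L, T, t1=1.0):
    """One-sided desirability for a larger-the-better response, Eq. (9)."""
    if yhat < L:
        return 0.0
    return 1.0 if yhat > T else ((yhat - L) / (T - L))**t1

def d_minimize(yhat, T, U, t2=1.0):
    """One-sided desirability for a smaller-the-better response, Eq. (10)."""
    if yhat > U:
        return 0.0
    return 1.0 if yhat < T else ((U - yhat) / (U - T))**t2

def overall_D(d_values):
    """Geometric mean of the individual desirabilities, Eq. (11)."""
    return float(np.prod(d_values)) ** (1.0 / len(d_values))

# Illustrative use with three responses
d = [d_maximize(79.6, L=78.5, T=80.0),
     d_target(64.0, L=62.0, T=65.0, U=68.0),
     d_minimize(3212.0, T=3100.0, U=3300.0)]
print(round(overall_D(d), 4))
```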

The remainder of the paper is organized as follows: a review of existing locally adaptive bandwidths selectors in RSM concludes the current Section. In Section 2, the proposed locally adaptive bandwidths selector is presented with an algorithm. Using three examples and simulated data, comparisons of the results of the LLR utilizing the bandwidths from the proposed locally adaptive bandwidths selector with those from the OLS, the LLR utilizing a fixed bandwidth, and the LLR utilizing the bandwidths from the existing locally adaptive bandwidths selector are presented in Section 3, where neural network activation functions such as the SPOCU activation function are also applied. The paper concludes in Section 4.

1.7. A review of locally adaptive bandwidths for data from RSM

Locally adaptive bandwidths perform better than their fixed counterpart because of their comparatively better sensitivity to local trends and patterns in the data, also see [4,11,36]. Locally adaptive bandwidths selectors are modeled as functions of local information at each data point. Such local information includes the values of the explanatory variables, xi, or of the response, yi, or both, allowing for different degrees of smoothing at each data point, see [4].

The authors in [9] presented a data-driven locally adaptive bandwidths selector given as:

$$b_i = \frac{N\left(N\sum_{i=1}^{n}y_i - y_i\right)}{(Nn-1)\sum_{i=1}^{n}y_i}, \quad i = 1, 2, \ldots, n, \tag{12}$$

where n is the sample size, yi is the response at ith data point, and N>0 is a tuning parameter.

A drawback of the locally adaptive bandwidths selected by (12) is that they tend to cluster around a small range of values in the interval (0, 1], with a very small difference between the largest and the smallest bandwidths, see [10]. In order to address the clustering of the bandwidths selected by (12), the authors in [10] presented a locally adaptive bandwidths selector given as:

$$b_i = \frac{bN\left(C\sum_{i=1}^{n}\alpha_i - \alpha_i\right)}{(Cn-1)\sum_{i=1}^{n}\alpha_i}, \quad i = 1, 2, \ldots, n, \tag{13}$$

where b is a fixed optimal bandwidth, αi is yi, i = 1, 2, …, n, or any ordered statistic that reflects the inadequacies in the OLS estimates of the response, and N > 0 and C ≥ 0 are tuning parameters. The tuning parameters N in (12) and (13) and C in (13) are chosen based on the minimization of the PRESS** criterion in (7), see [10].

The selector in (13) performs very well, giving outstanding results when applied to problems from the literature. However, the curse of dimensionality in LLR originates from the number of explanatory variables, which is not considered in (12) or (13).

The idea that motivates this paper is that a bandwidths selector that assigns a unique bandwidth to each data point per the number of explanatory variables can proffer stronger remedial measures on the curse of dimensionality than the one which ignores such vital information about the data.

2. Methodology

We propose a locally adaptive bandwidths selector that incorporates important information of RSM data, namely, the value of the explanatory variables at each data point and the number of explanatory variables in the study.

The mathematical procedure for the modeling of the proposed locally adaptive bandwidths selector is as follows:

Denote the value of the bandwidth at the ith data point for the jth explanatory variable as bij and assume that bij is proportional to the value of a weight function Vij of xij:

$$b_{ij} \propto V_{ij}(x_{ij}), \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k. \tag{14}$$
$$b_{ij} = T_{1j}V_{ij}(x_{ij}), \quad T_{1j} > 0,\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k, \tag{15}$$

where T1j is the constant of proportionality, which may scale the value of the weight function Vij either upward or downward in order to achieve the optimum smoothing requirement at the (i, j)th data point.

An important attribute of a weight function is the ability to assign relatively smaller weight to relatively larger αi, and vice versa, according to the smoothing requirement of the data, see [10]. For instance, if x1j>x2j, we may either get Vij(x1j)<Vij(x2j) or Vij(x1j)>Vij(x2j).

Mathematically, one of the ways (15) can incorporate the attribute is to express it as:

$$b_{ij} = T_{1j}\left(Z - \frac{x_{ij}}{T_{2j}}\right)^{2}, \quad T_{1j} > 0,\ T_{2j} > 0,\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k, \tag{16}$$

where Z is a real number and the exponent '2' ensures nonnegative weights, guarding against negative values that could arise at some data points from the difference Z − xij/T2j. T2j plays two key roles: one, for a fixed Z, T2j ensures that no data point is assigned a zero weight; two, it ensures that the attribute of the weight function described above is embedded and accomplished in the proposed bandwidths selector.

In order to avoid the clustering of bandwidths, we proceed to obtain the optimal value of Z that would ensure that the difference between the largest bandwidth and the smallest one in the interval (0, 1) is as large as possible.

From Equation (16) we have:

$$b_{ij}(x_{ij}, Z, T_{1j}, T_{2j}) = T_{1j}Z^{2} - \frac{2ZT_{1j}x_{ij}}{T_{2j}} + \frac{T_{1j}x_{ij}^{2}}{T_{2j}^{2}} \tag{17}$$

Set xij=0 and xij=1 in (17) to get:

$$b_{ij}(0, Z, T_{1j}, T_{2j}) = Z^{2}T_{1j} \tag{18}$$
$$b_{ij}(1, Z, T_{1j}, T_{2j}) = Z^{2}T_{1j} - \frac{2ZT_{1j}}{T_{2j}} + \frac{T_{1j}}{T_{2j}^{2}} \tag{19}$$

Let g = 0 and h = 1 represent the range of xij. By the Mean Value Theorem, we have:

$$\frac{b_{ij}(h, Z, T_{1j}, T_{2j}) - b_{ij}(g, Z, T_{1j}, T_{2j})}{h - g} = \frac{db_{ij}(Z)}{dx_{ij}} \tag{20}$$
$$\frac{b_{ij}(1, Z, T_{1j}, T_{2j}) - b_{ij}(0, Z, T_{1j}, T_{2j})}{1 - 0} = \frac{db_{ij}(Z)}{dx_{ij}} \tag{21}$$

Subtracting Equation (18) from (19) and dividing the result by (1 − 0), we have:

$$\frac{b_{ij}(1, Z, T_{1j}, T_{2j}) - b_{ij}(0, Z, T_{1j}, T_{2j})}{1 - 0} = \frac{Z^{2}T_{1j} - \frac{2ZT_{1j}}{T_{2j}} + \frac{T_{1j}}{T_{2j}^{2}} - Z^{2}T_{1j}}{1 - 0} = \frac{T_{1j}}{T_{2j}^{2}} - \frac{2ZT_{1j}}{T_{2j}} \tag{22}$$

The left-hand sides of Equations (21) and (22) are equal. So, we can write:

$$\frac{db_{ij}(Z)}{dx_{ij}} = \frac{T_{1j}}{T_{2j}^{2}} - \frac{2ZT_{1j}}{T_{2j}} \tag{23}$$

Differentiating Equation (17) with respect to xij we have:

$$\frac{db_{ij}(x_{ij}, Z, T_{1j}, T_{2j})}{dx_{ij}} = -\frac{2ZT_{1j}}{T_{2j}} + \frac{2T_{1j}x_{ij}}{T_{2j}^{2}} \tag{24}$$
$$\frac{db_{ij}(Z)}{dx_{ij}} = -\frac{2ZT_{1j}}{T_{2j}} + \frac{2ZT_{1j}}{T_{2j}^{2}} \tag{25}$$

Equating Equations (23) and (25), we have:

$$\frac{T_{1j}}{T_{2j}^{2}} - \frac{2ZT_{1j}}{T_{2j}} = -\frac{2T_{1j}Z}{T_{2j}} + \frac{2T_{1j}Z}{T_{2j}^{2}} \tag{26}$$
$$\frac{T_{1j}}{T_{2j}^{2}} = \frac{2T_{1j}Z}{T_{2j}^{2}} \tag{27}$$
$$Z = \frac{T_{1j}/T_{2j}^{2}}{2T_{1j}/T_{2j}^{2}} = \frac{1}{2}. \tag{28}$$

Therefore, Z = 1/2 is the optimal value of Z in [0, 1] that guarantees minimum clustering of the locally adaptive bandwidths from (16). Substituting Z = 1/2 in (16) gives:

$$b_{ij}\left(x_{ij}, \tfrac{1}{2}, T_{1j}, T_{2j}\right) = T_{1j}\left(\frac{1}{2} - \frac{x_{ij}}{T_{2j}}\right)^{2}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k. \tag{29}$$

The matrix Φ of the locally adaptive optimal bandwidths from Equation (29) is obtained at optimally selected values of T1j and T2j, j = 1, 2, …, k, based on the minimization of the PRESS** criterion in (7).
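A minimal sketch of Equation (29) is given below; fed with the tuning parameters reported later in Table 5, it reproduces the bandwidths listed there for the single response example (the function name is our own).

```python
import numpy as np

def proposed_bandwidths(X_raw, T1, T2):
    """Locally adaptive bandwidths b_ij = T1j * (1/2 - x_ij / T2j)^2, Eq. (29).

    X_raw  : (n, k) explanatory variables coded in [0, 1]
    T1, T2 : length-k sequences of tuning parameters, one pair per variable
    """
    return np.asarray(T1) * (0.5 - X_raw / np.asarray(T2)) ** 2

# Tuning parameters reported in Table 5 for the single response chemical process data
T1 = [1.3151, 1.4134]
T2 = [2.9740, 1.0412]
X  = np.array([[0.1464, 0.1464], [0.8536, 0.1464], [0.5000, 0.5000]])
print(np.round(proposed_bandwidths(X, T1, T2), 4))   # e.g. 0.2672, 0.1826, ...
```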

The optimal values of the tuning parameters (T1j and T2j for the proposed bandwidths selector in (29); C and N for the selector of [10] in (13)) and the locally adaptive optimal bandwidths for k explanatory variables are presented in Tables 1 and 2.

Table 1.

Optimal values of the tuning parameters T1j and T2j and the bandwidths of the proposed bandwidth selector.

i x1, T11, T21 x2, T12, T22 … xk, T1k, T2k
1 b11(x11) b12(x12) … b1k(x1k)
2 b21(x21) b22(x22) … b2k(x2k)
⋮ ⋮ ⋮ ⋱ ⋮
n bn1(xn1) bn2(xn2) … bnk(xnk)

Table 2.

Optimal values of the tuning parameters for C and N and the bandwidths from [10].

i x1, C, N x2, C, N … xk, C, N
1 b1(y1) b1(y1) … b1(y1)
2 b2(y2) b2(y2) … b2(y2)
⋮ ⋮ ⋮ ⋱ ⋮
n bn(yn) bn(yn) … bn(yn)

Unlike the bandwidths from the proposed bandwidths selector, the bandwidths of [10] satisfy bi1(yi) = bi2(yi) = … = bik(yi), since C, N and yi, i = 1, 2, …, n, are the same for the k explanatory variables; see Figures 2 and 3.

Figure 2. Plot of bandwidths against explanatory variables for fixed values of T2 less than 1.

Figure 3. Plot of bandwidths against explanatory variables for fixed values of T2 greater than 1.

From the plots in Figures 2 and 3, we observe that as T2j increases from 0.05 through 0.25, the value of the bandwidth b(x) increases as the value of the explanatory variable x increases from 0 to 1. At T2j = 0.45, we notice the beginning of a new trend which culminates in a parabolic curve at T2j = 1.05. For values of T2j from 0.45 through 1.05, the vertices of the parabolas show a gradual shift from x = 0 towards x = 0.5. In these plots with pseudo-parabolas, the local bandwidth b(x) decreases as x increases from 0 to the data point where the vertex of the particular plot is located, and thereafter increases as x increases. Elsewhere, for T2j = 0.05 through T2j = 0.25, the bandwidth increases as x increases, and the reverse is displayed in the plots for T2j = 1.85 through T2j = 25.05. At T2j = 1.0 × 10¹⁵, we have a horizontal curve that indicates a fixed or global bandwidth for all the data points. In addition, no data point is assigned a negative bandwidth. These observations are graphical assertions of the set objectives regarding the modeling of the proposed locally adaptive bandwidths selector.

2.1. Algorithm: Leave-one-out cross-validation technique for selecting locally adaptive bandwidths for LLR model

Step 1: define the bandwidths bij for the ith location and the jth explanatory variable, j = 1, 2, …, k:

$$b_{ij}\left(x_{ij}, \tfrac{1}{2}, T_{1j}, T_{2j}\right) = T_{1j}\left(\frac{1}{2} - \frac{x_{ij}}{T_{2j}}\right)^{2}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k.$$

Step 2: obtain a set ψ of acceptable values of the bandwidths (for RSM data, ψ ⊆ (0, 1]) in which the locally adaptive bandwidths bij are located.

Step 3: define the leave-one-out cross-validation estimates for any likely set Ω = (b1j, b2j, …, bnj) over the complete range of the set ψ ⊆ (0, 1]:

$$\hat{y}_{i,-i}(b_{1j}, b_{2j}, \ldots, b_{nj}), \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k.$$

Step 4: define the PRESS** criterion

$$PRESS^{**}(b) = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{i,-i}^{(.)}(\Omega)\right)^{2}}{n - \operatorname{trace}\left(H^{(.)}(\Omega)\right) + (n-k-1)\dfrac{SSE_{\max} - SSE_{\Omega}}{SSE_{\max}}}$$

for selecting locally adaptive bandwidths on the interval (0, 1), and obtain $\hat{y}_{i,-i}(\Omega)$, the estimated response at location i with the ith observation left out, for the set of locally adaptive bandwidths

$$\Omega = [b_{11}, b_{21}, \ldots, b_{n1};\ b_{12}, b_{22}, \ldots, b_{n2};\ \ldots;\ b_{1k}, b_{2k}, \ldots, b_{nk}].$$

Step 5: obtain SSEmax as b tends to infinity (in practice, a very large value such as $b = 10^{20}$) in $\hat{y}_i^{(LLR)}(b)$:

$$SSE_{\max} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i^{(LLR)}(b)\right)^{2}$$

Step 6: obtain SSEΩ for a particular set of locally adaptive bandwidths:

$$SSE_{\Omega}(b_{1j}, b_{2j}, \ldots, b_{nj}) = \sum_{i=1}^{n}\left(y_i - \hat{y}_i^{(LLR)}(b_{1j}, b_{2j}, \ldots, b_{nj})\right)^{2}$$

Step 7: lastly, obtain the optimal locally adaptive bandwidths:

(b1j, b2j, …, bnj) is the converged result that minimizes the PRESS** criterion [13].

Step 8: Stop.
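The NumPy sketch below illustrates Steps 3–6 for one candidate bandwidth matrix Φ: it forms the leave-one-out LLR estimates, the trace of the LLR Hat matrix and SSEΩ, and returns the PRESS** value of Equation (7). The outer search over T1j and T2j (for example, via the GA) would call this function repeatedly; the helper names and the weighting convention are our own assumptions.

```python
import numpy as np

def llr_hat_row(i0, X_raw, B, exclude=None):
    """Row of the LLR smoother for fitting point i0 (Eq. (5)) built from the product
    Gaussian weights of Eq. (6); optionally drops one observation for delete-one fits."""
    n, k = X_raw.shape
    keep = np.arange(n) if exclude is None else np.delete(np.arange(n), exclude)
    w = np.prod(np.exp(-((X_raw[keep] - X_raw[i0]) / B[keep]) ** 2), axis=1)
    w = w / w.sum()
    Xt = np.column_stack([np.ones(len(keep)), X_raw[keep]])   # local linear basis
    x0 = np.concatenate([[1.0], X_raw[i0]])
    h = x0 @ np.linalg.solve(Xt.T @ (w[:, None] * Xt), (w[:, None] * Xt).T)
    return keep, h                                            # y_hat_i0 = h @ y[keep]

def press_star_star(X_raw, y, B, sse_max):
    """PRESS** of Eq. (7) for a candidate bandwidth matrix B (i.e. Phi);
    sse_max is SSE_max from Step 5 (LLR fit with a very large fixed bandwidth)."""
    n, k = X_raw.shape
    press, trace_H, sse = 0.0, 0.0, 0.0
    for i in range(n):
        keep, h_loo = llr_hat_row(i, X_raw, B, exclude=i)     # delete-one estimate
        press += (y[i] - h_loo @ y[keep]) ** 2
        _, h_full = llr_hat_row(i, X_raw, B)                  # full-data estimate
        trace_H += h_full[i]                                  # i-th diagonal of H^(LLR)
        sse += (y[i] - h_full @ y) ** 2
    penalty = n - trace_H + (n - k - 1) * (sse_max - sse) / sse_max
    return press / penalty
```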

3. Application

In the first part of this Section, a single response and two multiple response problems are used in order to compare the performance of the proposed locally adaptive bandwidths selector and the locally adaptive selector of [10]. The aim of the comparisons is to validate the bandwidths selector that has more capacity to reduce the curse of dimensionality in the LLR model applied to RSM data. The goodness-of-fit statistics used for comparison include the Sum of Squared Errors (SSE), the Mean Squared Error (MSE), the Coefficient of Determination (R2), the Adjusted Coefficient of Determination (RAdj2), the PRESS** criterion given in (7), the PRESS $= \sum_{i=1}^{n}(y_i - \hat{y}_{i,-i}^{(.)})^{2}$, and PRESS* = PRESS/(n − trace(H(.))), where $\hat{y}_{i,-i}^{(.)}$ is the leave-one-out estimate of yi, trace(H(.)) is the trace of the Hat matrix, and (.) refers to any of the regression models, OLS or LLR. In the second part, we use simulated data to further compare the respective performances.

The SSE and MSE indicate how close the estimated responses are to their observed values. A measure of the amount of variability present in the data that is explained by the regression model is given by R2 and RAdj2. PRESS, PRESS* and PRESS** give a measure of the model's predictive accuracy.

The results from the LLR that utilizes a fixed bandwidth, the LLR that utilizes the locally adaptive bandwidths of [10], and the LLR that utilizes the proposed locally adaptive bandwidths selector are designated LLRFB, LLRAB and LLRPAB, respectively, where the subscript PAB stands for Proposed Adaptive Bandwidths selector.

3.1. Single response chemical process data

The problem of the study, as given in [10,27,29], was to relate chemical yield (y) to temperature (x1) and time (x2), with the aim of maximizing the chemical yield. The data, obtained using the Central Composite Design (CCD), are given in Table 3.

Table 3.

Single response chemical process data generated from the Central Composite Design.

i x1 x2 y
1 −1 −1 88.55
2 1 −1 85.80
3 −1 1 86.29
4 1 1 80.44
5 −1.414 0 85.50
6 1.414 0 85.39
7 0 −1.414 86.22
8 0 1.414 85.70
9 0 0 90.21
10 0 0 90.85
11 0 0 91.31

Source: see [27].

3.2. Transformation of data from Central Composite Design

Following nonparametric regression procedures in RSM, the values of the explanatory variables are coded between 0 and 1. The data collected via a CCD is transformed by a mathematical relation:

$$x_{new} = \frac{Min(x_{old}) - x_{0}}{Min(x_{old}) - Max(x_{old})} \tag{30}$$

where xnew is the transformed value, x0 is the value to be transformed from the vector of old coded values, represented as xold, and Min(xold) and Max(xold) are the minimum and maximum values in the vector xold, respectively [27].

The natural or coded variables in Table 3 can be transformed to explanatory variables in Table 4 using Equation (30)

Table 4.

The transformed single response chemical process data.

i x1 x2 y
 1 0.1464 0.1464 88.55
 2 0.8536 0.1464 85.80
 3 0.1464 0.8536 86.29
 4 0.8536 0.8536 80.44
 5 0.0000 0.5000 85.50
 6 1.0000 0.5000 85.39
 7 0.5000 0.0000 86.22
 8 0.5000 1.0000 85.70
 9 0.5000 0.5000 90.21
 10 0.5000 0.5000 90.85
 11 0.5000 0.5000 91.31

Source: see [27].

Target points needed to be transformed for location 1 under the coded variables are given below:

Target points x0: −1, −1; Min(xold): −1.414, −1.414; Max(xold): 1.414, 1.414.

$$x_{new} = \frac{Min(x_{old}) - x_{0}}{Min(x_{old}) - Max(x_{old})}$$

Explanatory variable x1: $x_{11} = \dfrac{-1.414 - (-1)}{(-1.414) - (1.414)} = 0.1464$

Explanatory variable x2: $x_{12} = \dfrac{-1.414 - (-1)}{(-1.414) - (1.414)} = 0.1464$

Target points needed to be transformed for location 2 under the coded variables are given below:

Target points x0: 1, −1; Min(xold): −1.414, −1.414; Max(xold): 1.414, 1.414.

$$x_{new} = \frac{Min(x_{old}) - x_{0}}{Min(x_{old}) - Max(x_{old})}$$

Explanatory variable x1: $x_{21} = \dfrac{-1.414 - (1)}{(-1.414) - (1.414)} = 0.8536$

Explanatory variable x2: $x_{22} = \dfrac{-1.414 - (-1)}{(-1.414) - (1.414)} = 0.1464$

Target points needed to be transformed for location 6 under the coded variables are given below:

Target points x0: 1.414, 0; Min(xold): −1.414, −1.414; Max(xold): 1.414, 1.414.

$$x_{new} = \frac{Min(x_{old}) - x_{0}}{Min(x_{old}) - Max(x_{old})}$$

Explanatory variable x1: $x_{61} = \dfrac{-1.414 - (1.414)}{(-1.414) - (1.414)} = 1.0000$

Explanatory variable x2: $x_{62} = \dfrac{-1.414 - (0)}{(-1.414) - (1.414)} = 0.5000$

Repeating the process up to location 11, then we obtain the entries for explanatory variables x1 and x2, respectively, in Table 4.

The explanatory variables are coded between 0 and 1 by a mathematical relation as given in equation (30). Thus, the transformed data is given in Table 4.
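A minimal sketch of Equation (30), applied column-wise to the coded levels of Table 3, reproduces the transformed values of Table 4:

```python
import numpy as np

def transform_ccd(X_coded):
    """Map coded CCD levels to [0, 1] column-wise via Eq. (30):
    x_new = (Min(x_old) - x0) / (Min(x_old) - Max(x_old))."""
    lo = X_coded.min(axis=0)
    hi = X_coded.max(axis=0)
    return (lo - X_coded) / (lo - hi)

# Coded levels of x1 and x2 from Table 3
X_coded = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
                    [-1.414, 0], [1.414, 0], [0, -1.414], [0, 1.414],
                    [0, 0], [0, 0], [0, 0]])
print(np.round(transform_ccd(X_coded), 4))   # rows match Table 4: 0.1464, 0.8536, ...
```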

The proposed locally adaptive optimal bandwidths are presented in Table 5 and the goodness-of-fit statistics are presented in Table 6.

Table 5.

Proposed optimal tuning parameters and optimal locally adaptive bandwidths for the single response chemical process data.

i x1 x2
  T11=1.3151 T12=1.4134
  T21=2.9740 T22=1.0412
  bi1 bi2
1 0.2672 0.1826
2 0.0597 0.1826
3 0.2672 0.1446
4 0.0597 0.1446
5 0.3288 0.0006
6 0.0353 0.0006
7 0.1448 0.3534
8 0.1448 0.2996
9 0.1448 0.0006
10 0.1448 0.0006
11 0.1448 0.0006

Table 6.

Comparison of the goodness-of-fit statistics of each method for the single response chemical process data.

Method b DFerror MSE SSE R2 Radj2 PRESS PRESS* PRESS**
OLS 5.000 3.1600 15.8182 0.8388 0.6777 109.5179 21.9036 21.9036
LLRFB 0.5200 5.6509 5.7000 32.2355 0.6717 0.4190 93.2835 16.5076 8.9508
LLRAB * 2.9261 0.5974 1.7481 0.9822 0.9391 46.0765 15.7467 4.2858
LLRPAB * 2.0537 0.3947 0.8106 0.9917 0.9598 45.2734 22.0443 4.5398

Generally, the results in Table 6 show that the LLRPAB performs better than OLS, LLRFB and LLRAB in terms of the SSE, MSE, R2, Radj2 and PRESS statistics, whereas LLRAB performs better than OLS, LLRFB and LLRPAB in terms of PRESS* and PRESS**. In Table 6, '*' in the b column indicates that the AB of [10] and the PAB use locally adaptive bandwidths rather than a single fixed bandwidth.

Figure 4 shows the residual plots for the different models and, on average, the LLRPAB gives the best representation of how well the regression model estimates f as given in Equation (1).

Figure 4. Graph of Model Residuals for Single Response Chemical Process Data.

From Table 7, LLRPAB provides the best chemical yield over OLS, LLRFB and LLRAB, and its settings of the two explanatory variables give the best process satisfaction.

Table 7.

Comparison of optimization results for the single chemical process data.

Approach x1 x2 y^
OLS 0.43930 0.43610 90.9780
LLRFB 0.40140 0.39438 88.3509
LLRAB 0.40771 0.42312 91.1278
LLRPAB 0.7272 0.5000 92.6823

3.3. The multiple response chemical process data

 This problem is analyzed in [17,18]. The aim of the study is to get the setting of the explanatory variables x1 and x2 (representing reaction time and temperature, respectively) that would simultaneously optimize three quality measures of a chemical solution y1, y2, and y3 (representing yield, viscosity, and molecular weight, respectively).

Based on the process requirements a CCD was conducted to establish the design experiment and observed responses as presented in Table 8.

Table 8.

Designed experiment and response values for the multi-response chemical process data [17,18].

  Experimentalvariables Responses
i x1 x2 y1 y2 y3
1 −1 –1 76.5 62 2940
2 1 –1 78.0 66 3680
3 −1 1 77.0 60 3470
4 1 1 79.5 59 3890
5 −1.414 0 75.6 71 3020
6 1.414 0 78.4 68 3360
7 0 –1.414 77.0 57 3150
8 0 1.414 78.5 58 3630
9 0 0 79.9 72 3480
10 0 0 80.3 69 3200
11 0 0 80.0 68 3410
12 0 0 79.7 70 3290
13 0 0 79.8 71 3500

The values of the explanatory variables are transformed by the relation in Equation (30) coded between 0 and 1 as given in Table 9.

Table 9.

The transformed multiple response chemical process data.

i x1 x2 y1 y2 y3
1 0.1464 0.1464 76.5 62 2940
2 0.8536 0.1464 78.0 66 3680
3 0.1464 0.8536 77.0 60 3470
4 0.8536 0.8536 79.5 59 3890
5 0.0000 0.5000 75.6 71 3020
6 1.0000 0.5000 78.4 68 3360
7 0.5000 0.0000 77.0 57 3150
8 0.5000 1.0000 78.5 58 3630
9 0.5000 0.5000 79.9 72 3480
10 0.5000 0.5000 80.3 69 3200
11 0.5000 0.5000 80.0 68 3410
12 0.5000 0.5000 79.7 70 3290
13 0.5000 0.5000 79.8 71 3500

The process requirements for each response are as follows:

  • Maximize y1 with lower limit L = 78.5 and target value ∅ = 80;

  • y2 should take a value in the range L = 62 to U = 68 with target value ∅ = 65;

  • Minimize y3 with upper limit U = 3300 and target value ∅ = 3100.

The real values of the explanatory variables are transformed to values in the interval [0, 1] by the mathematical relation in Equation (30). This is a standard procedure for nonparametric regression models, see [29,33]. The data is presented in Table 9. The OLS is applied to get estimates of the parameters of a full second-order polynomial model specified for the three responses.

The optimal values of the tuning parameters of both the proposed bandwidths selector and the selector of [10] for each response are presented in Table 10. Table 11 presents the optimal bandwidths from each bandwidths selector. Table 12 presents the goodness of fits.

Table 10.

Optimal values of tuning parameters of the proposed locally adaptive bandwidths selector and [10] for the multiple response chemical process data.

  Proposed tuning parameters for locally adaptive bandwidth selector Edionwe et al. (2016)
  T11 T21 T12 T22 b C N
y1 1.0031 5.9998 1.2694 1.0541 0.5123 0.0797 6.0399
y2 1.5240 4.4808 0.4149 7.6721 0.4847 0.0959 2.4438
y3 0.9987 3.2763 0.6603 3.6044 1.0000 0.0896 4.8181

Table 11.

Optimal locally adaptive bandwidths from each selector for the multiple response chemical process data.

  Proposed locally adaptive bandwidths selector Edionwe et al. (2016) Bandwidths selector
  y1 y2 y3 y1 y2 y3
i bi1( xi1) bi2( xi2) bi1( xi1) bi2( xi2) bi1( xi1) bi2( xi2)      
1 0.2269 0.1655 0.3328 0.0960 0.2070 0.1393 0.4041 0.1106 0.6669
2 0.1284 0.1655 0.1460 0.0960 0.0573 0.1393 0.2781 0.0881 0.1755
3 0.2269 0.1218 0.3328 0.0627 0.2070 0.0457 0.3621 0.1219 0.3149
4 0.1284 0.1218 0.1460 0.0627 0.0573 0.0457 0.1521 0.1276 0.0360
5 0.2508 0.0008 0.3810 0.0784 0.2497 0.0862 0.4797 0.0599 0.6138
6 0.1115 0.0008 0.1168 0.0784 0.0379 0.0862 0.2445 0.0768 0.3880
7 0.1741 0.3174 0.2299 0.1037 0.1205 0.1651 0.3621 0.1389 0.5275
8 0.1741 0.2555 0.2299 0.0567 0.1205 0.0327 0.2361 0.1332 0.2087
9 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.1185 0.0542 0.3083
10 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.0849 0.0712 0.4943
11 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.1101 0.0768 0.3548
12 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.1353 0.0655 0.4345
13 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.1269 0.0599 0.2950

Table 12.

Model goodness of fits statistics for the multi-response chemical process data.

Response Model DF PRESS** PRESS SSE MSE R2(%) RAdj2(%)
y1 OLS 7.0000 0.3361 2.3525 0.4962 0.0709 98.27 97.04
  LLRFB 7.4717 0.5686 8.4888 4.7536 0.6362 83.46 73.44
  LLRAB 4.7777 0.2063 3.0144 0.3103 0.0649 98.92 97.29
  LLRPAB 4.0144 0.0481 0.6687 0.2165 0.0539 99.25 97.75
y2 OLS 7.0000 28.8726 202.1082 36.2242 5.1749 89.98 82.81
  LLRFB 7.2576 22.0691 330.8149 80.2383 11.0558 77.79 63.27
  LLRAB 4.0000 9.2024 126.2331 10.0000 2.5000 97.23 91.70
  LLRPAB 4.0009 8.8531 121.4495 10.0000 2.4994 97.23 91.70
y3 OLS 7.0000 159,080 1,113,600 207,870 29,696 75.90 58.68
  LLRFB 9.2798 56,513 588,010 243,460 26,235 71.77 63.50
  LLRAB 5.8380 40,779 508,170 92,621 15,865 89.26 77.93
  LLRPAB 4.0000 26504 307,560 65,720 16,430 92.38 77.14

The results presented in Table 12 show that LLRPAB, either exclusively or jointly, gives the best results in terms of all the statistics for y1 and y2. For y3, the LLRPAB gives the best results in four out of the seven statistics for comparison. Interestingly, LLRPAB gives the best PRESS** and PRESS for all the responses; see Figure 5.

Figure 5. Graphs of model residuals for the multiple response chemical process data.

Figure 5 shows that the y2 residuals of both the LLRAB and LLRPAB overlap, while those from LLRPAB, for the most part, lie closer to the zero residual line than those from the existing models for y1 and y3. Furthermore, quite unlike the curves of the existing models, we observe that approximately the same number of residuals lie above and below the zero residual line in all the LLRPAB curves. This is indicative of the fact that LLRPAB gives curves of best fit.

The optimization solutions in Table 13 show that LLRPAB provides the settings of the explanatory variables that give the highest desirability measure.

Table 13.

Model optimal solution based on the Desirability function for multi-response chemical process data.

Model x1 x2 y^1 y^2 y^3 d1 d2 d3 D(%)
OLS 0.4449 0.2226 78.7616 66.4827 3229.9 0.1744 0.5058 0.3504 31.3800
LLRFB 0.4481 0.3709 78.5537 66.7908 3290.8 0.0358 0.4031 0.0461 8.7200
LLRAB 0.5155 0.3467 78.6965 65.0328 3285.9 0.1310 0.9891 0.0703 20.8837
LLRPAB 1.0000 0.6472 79.6033 64.0137 3212.7 0.7355 0.6712 0.4367 59.9647

3.4. The Minced Fish Quality Data

The Minced Fish Quality Data is presented in [31,33]. The problem seeks the setting of three explanatory variables x1 (washing temperature), x2 (washing time) and x3 (washing ratio of water volume to sample weight) that would optimize four aspects of quality of minced fish, namely, springiness ( y1), thiobarbituric acid number ( y2), cooking loss ( y3), and whiteness index ( y4).

Based on the process requirements, a CCD was conducted to establish the design experiment and observed responses as presented in Table 14.

Table 14.

The Minced Fish Quality Data generated through CCD [33].

  Coded levels        
i x1 x2 x3 y1 y2 y3 y4
1 −1 −1 −1 1.83 29.31 29.50 50.36
2 1 −1 −1 1.73 39.32 19.40 48.16
3 −1 1 −1 1.85 25.16 25.70 50.72
4 1 1 −1 1.67 40.18 27.10 49.69
5 −1 −1 1 1.86 29.82 21.40 50.09
6 1 −1 1 1.77 32.20 24.00 50.61
7 −1 1 1 1.88 22.01 19.60 50.36
8 1 1 1 1.66 40.02 25.10 50.42
9 −1.682 0 0 1.81 33.00 24.20 29.31
10 1.682 0 0 1.37 51.59 30.60 50.67
11 0 −1.682 0 1.85 20.35 20.90 48.75
12 0 1.682 0 1.92 20.53 18.90 52.70
13 0 0 −1.682 1.88 23.85 23.00 50.19
14 0 0 1.682 1.90 20.16 21.20 50.86
15 0 0 0 1.89 21.72 18.50 50.84
16 0 0 0 1.88 21.21 18.60 50.93
17 0 0 0 1.87 21.55 16.80 50.98

The values of the explanatory variables are transformed by the relation in Equation (30) which is coded between 0 and 1 as given in Table 15.

Table 15.

The transformed Minced Fish Quality Data [33].

i x1 x2 x3 y1 y2 y3 y4
1 0.2030 0.2030 0.2030 1.83 29.31 29.50 50.36
2 0.7970 0.2030 0.2030 1.73 39.32 19.40 48.16
3 0.2030 0.7970 0.2030 1.85 25.16 25.70 50.72
4 0.7970 0.7970 0.2030 1.67 40.18 27.10 49.69
5 0.2030 0.2030 0.7970 1.86 29.82 21.40 50.09
6 0.7970 0.2030 0.7970 1.77 32.20 24.00 50.61
7 0.2030 0.7970 0.7970 1.88 22.01 19.60 50.36
8 0.7970 0.7970 0.7970 1.66 40.02 25.10 50.42
9 0.0000 0.5000 0.5000 1.81 33.00 24.20 29.31
10 1.0000 0.5000 0.5000 1.37 51.59 30.60 50.67
11 0.5000 0.0000 0.5000 1.85 20.35 20.90 48.75
12 0.5000 1.0000 0.5000 1.92 20.53 18.90 52.70
13 0.5000 0.5000 0.0000 1.88 23.85 23.00 50.19
14 0.5000 0.5000 1.0000 1.90 20.16 21.20 50.86
15 0.5000 0.5000 0.5000 1.89 21.72 18.50 50.84
16 0.5000 0.5000 0.5000 1.88 21.21 18.60 50.93
17 0.5000 0.5000 0.5000 1.87 21.55 16.80 50.98

The process requirements for each response given in [33] are as follows:

  • Maximize y1 with lower bound L = 1.70 and target value ∅ = 1.92;

  • Minimize y2 with target value ∅ = 20.16 and upper bound U = 21.00;

  • Minimize y3 with target value ∅ = 16.80 and upper bound U = 20.00;

  • Maximize y4 with lower bound L = 45.00 and target value ∅ = 50.98.

The polynomials specified for the response variables y1 and y4 include the intercept, x1 and x1². The one specified for y2 includes the intercept, x1, x2, x1², and x1x2, and for y3 we have the intercept, x1, x2, x3, x1², x1x2, x1x3, and x3². The OLS is used to get the estimates of the parameters of these polynomials.

The optimal values of the tuning parameters of both the proposed bandwidths selector and the selector of [10] for each response are presented in Table 16. Table 17 presents the optimal bandwidths from each of the bandwidths selectors. The models' goodness of fits is presented in Table 18.

Table 16.

Optimal values of the tuning parameters of the proposed bandwidths selector and [10] for the Minced Fish Quality Data.

  Proposed tuning parameters for locally adaptive bandwidth selector Edionwe et al. (2016)
  T11 T21 T12 T22 T13 T23 b C N
y1 0.6575 0.8448 0.1463 0.8441 9.3384
y2 0.8896 4.2042 4.0000 3.2728 0.4363 0.0000 7.8641
y3 1.2054 4.7643 1.5314 4.7377 1.3423 5.1462 0.5371 0.0841 14.4996
y4 0.7504 2.9767 0.1197 0.5210 10.6354

Table 17.

Optimal locally adaptive bandwidths from each selector in the Minced Fish Quality Data.

  Proposed locally adaptive bandwidths selector Edionwe et al. (2016) Bandwidths selector
  y1 y2 y3 y4        
i bi1(xi1) bi1(xi1) bi2(xi2) bi1(xi1) bi2(xi2) bi3(xi3) bi1(xi1) y1 y2 y3 y4
1 0.0443 0.1815 0.7673 0.2522 0.3200 0.2847 0.1399 0.0803 0.2044 0.1337 0.0747
2 0.1293 0.0857 0.7673 0.1334 0.3200 0.2847 0.0405 0.0806 0.2742 0.6098 0.0751
3 0.0443 0.1815 0.2631 0.2522 0.1686 0.2847 0.1399 0.0802 0.1755 0.3128 0.0746
4 0.1293 0.0857 0.2631 0.1334 0.1686 0.2847 0.0405 0.0808 0.2802 0.2468 0.0748
5 0.0443 0.1815 0.7673 0.2522 0.3200 0.1599 0.1399 0.0802 0.2080 0.5155 0.0747
6 0.1293 0.0857 0.7673 0.1334 0.3200 0.1599 0.0405 0.0805 0.2246 0.3929 0.0746
7 0.0443 0.1815 0.2631 0.2522 0.1686 0.1599 0.1399 0.0801 0.1535 0.6003 0.0747
8 0.1293 0.0857 0.2631 0.1334 0.1686 0.1599 0.0405 0.0808 0.2791 0.3411 0.0746
9 0.1644 0.2224 0.4823 0.3013 0.2383 0.2178 0.1876 0.0803 0.2301 0.3835 0.0787
10 0.3074 0.0611 0.4823 0.1014 0.2383 0.2178 0.0202 0.0818 0.3598 0.0818 0.0746
11 0.0055 0.1292 1.0000 0.1881 0.3829 0.2178 0.0827 0.0802 0.1419 0.5391 0.0750
12 0.0055 0.1292 0.1512 0.1881 0.1278 0.2178 0.0827 0.0800 0.1432 0.6333 0.0742
13 0.0055 0.1292 0.4823 0.1881 0.2383 0.3356 0.0827 0.0801 0.1663 0.4401 0.0747
14 0.0055 0.1292 0.4823 0.1881 0.2383 0.1254 0.0827 0.0800 0.1406 0.5249 0.0746
15 0.0055 0.1292 0.4823 0.1881 0.2383 0.2178 0.0827 0.0801 0.1515 0.6522 0.0746
16 0.0055 0.1292 0.4823 0.1881 0.2383 0.2178 0.0827 0.0801 0.1479 0.6475 0.0745
17 0.0055 0.1292 0.4823 0.1881 0.2383 0.2178 0.0827 0.0801 0.1503 0.7323 0.0745

Table 18.

Model goodness of fits statistics for the Minced Fish Quality Data.

Response Model DF PRESS** PRESS SSE MSE R2(%) RAdj2(%)
y1 OLS 14.0000 0.0042 0.0582 0.0231 0.0017 92.13 91.00
LLRFB 12.1398 0.0026 0.0681 0.0126 0.0010 95.70 94.33
LLRAB 12.0000 0.0008 0.0216 0.0123 0.0010 95.79 94.39
LLRPAB 12.0000 0.0019 0.0491 0.0123 0.0010 95.79 94.39
y2 OLS 12.0000 19.5097 234.1166 90.9033 7.5753 93.39 91.18
LLRFB 11.2152 36.4407 786.71166 245.3568 21.8771 82.15 74.53
LLRAB 8.1282 16.7007 359.9569 38.7168 4.7633 97.18 94.45
LLRPAB 8.2177 7.4867 162.1354 37.8103 4.6011 97.25 94.64
y3 OLS 9.0000 20.2719 182.4468 41.1338 4.5704 84.06 71.66
LLRFB 8.3794 17.0573 287.0907 82.1622 9.8053 68.16 39.21
LLRAB 5.8585 11.5001 203.8490 20.4613 3.4926 92.07 78.35
LLRPAB 2.0443 8.0901 120.7925 2.0489 1.0023 99.21 93.79
y4 OLS 14.0000 48.9101 684.7407 198.8048 14.2003 54.13 47.57
  LLRFB 12.0308 17.1477 454.5609 12.2623 1.0193 97.17 96.24
LLRAB 12.0000 14.0842 372.9912 12.1387 1.0116 97.20 96.27
LLRPAB 12.0001 8.8590 234.6134 12.1387 1.0116 97.20 96.27

From the results in Table 18, we observe that the LLRPAB performs quite as well as the LLRFB and the LLRAB in both y1 and y4. This is due to the fact that both y1 and y4 involve a single explanatory variable, x1. However, LLRPAB outperforms the OLS, LLRFB and LLRAB in y2 and y3, which depend on two and three explanatory variables, respectively. Again, LLRPAB gives the best PRESS** and PRESS in three out of the four responses, coming a close second in y1.

The results in Table 19 show that LLRPAB provides the setting of the explanatory variables that gives the highest desirability measure of 100%; see Figure 6.

Table 19.

Model optimal solution via the Desirability function in the Minced Fish Quality Data.

Model x1 x2 x3 y^1 y^2 y^3 y^4 d1 d2 d3 d4 D(%)
OLS 0.3764 1.0000 0.7155 1.9071 19.4993 17.2185 50.3018 0.9415 1.00 0.8692 0.8866 92.29
LLRFB 0.8078 0.2375 0.9573 1.6877 36.7371 24.7076 49.7628 0.000 0.00 0.0000 0.7965 0.00
LLRAB 0.4318 1.0000 0.5673 1.8775 18.9436 19.6005 50.6611 0.8068 1.00 0.1248 0.9467 55.57
LLRPAB 0.5711 0.4481 0.6094 2.0825 20.0918 16.7583 51.0266 1.0000 1.00 1.0000 1.0000 100.00

Figure 6. Graphs of model residuals for the multiple response Minced Fish Quality Data.

The plots in Figure 6 show that the LLRPAB and LLRAB residuals for both y1 and y4 overlap. However, for y2 and y3, the LLRPAB residuals lie closer to the zero residual line than those from the existing models. Again, for all the LLRPAB curves, we observe that approximately the same number of residuals lie below and above the zero residual line, indicative of the fact that LLRPAB gives curves of best fit.

3.5. Simulation study

In the examples given in Sections 3.1, 3.3 and 3.4, it was shown that the goodness of fits and the optimal solutions of the LLRPAB were either better than or highly competitive with those from the OLS, LLRFB and LLRAB. In this subsection, we compare the performances of the respective regression models via simulated data. Each Monte Carlo simulation comprises 1000 data sets based on the following underlying polynomial models:

Model 1: $y_i = 70 + 12x_{1i} - 24x_{1i}^{2} + \gamma\{3\sin(3\pi x_{1i})\} + \varepsilon_i$;

Model 2: $y_i = 33 - 31x_{1i} + 20x_{1i}^{2} - \gamma\{2\cos(4\pi x_{1i})\} + \varepsilon_i$;

Model 3: $y_i = 20 - 10x_{1i} - 25x_{2i} - 15x_{1i}x_{2i} + 20x_{1i}^{2} + 50x_{2i}^{2} + \gamma\{2\sin(4\pi x_{1i}) + 2\cos(4\pi x_{2i}) - 2\sin(4\pi x_{1i}x_{2i})\} + \varepsilon_i$;

Model 4: $y_i = 66 + 22x_{1i} + 10x_{2i} + 13x_{1i}x_{2i} - 23x_{1i}^{2} - 25x_{2i}^{2} + \gamma\{2\sin(3\pi x_{1i}) - 2\cos(3\pi x_{2i}) + 2\sin(2\pi x_{1i}x_{2i})\} + \varepsilon_i$;

Model 5: $y_i = 45 + 27x_{1i} + 9x_{2i} + 19x_{3i} - 22x_{1i}x_{2i} - 17x_{2i}x_{3i} - 8x_{1i}x_{3i} + 10x_{1i}^{2} + 13x_{2i}^{2} + 13x_{3i}^{2} + \gamma\big(2\sin(3\pi x_{1i}) - 2\cos(3\pi x_{2i}) - 3\cos(4\pi x_{3i}) + 2\sin(3\pi x_{1i}x_{2i}) + 2\cos(3\pi x_{2i}x_{3i}) + 2\sin(3\pi x_{1i}x_{3i})\big) + \varepsilon_i$;

Model 6: $y_i = 83 + 19x_{1i} - 41x_{2i} - 14x_{3i} - 36x_{1i}x_{2i} - 15x_{2i}x_{3i} + 28x_{1i}x_{3i} + 15x_{1i}^{2} + 25x_{2i}^{2} - 11x_{3i}^{2} - \gamma\big(2\sin(4\pi x_{1i}) - 2\cos(13\pi x_{2i}) + 2\sin(2\pi x_{3i}) + 2\sin(3\pi x_{1i}x_{2i}) + 3\cos(4\pi x_{2i}x_{3i}) - 5\cos(2\pi x_{1i}x_{3i})\big) + \varepsilon_i$,

where the x1i, x2i and x3i are the values of the explanatory variables, εi, i = 1, 2, …, n, are the error terms which are normally distributed with mean zero and variance 1, and γ represents a misspecification parameter. The values of the explanatory variables are presented in Tables 20 and 21.
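As an illustration of how a single Monte Carlo replicate could be generated, the sketch below simulates one data set from Model 4 at the CCD points of Table 20 with standard normal errors; the seed and function name are our own choices.

```python
import numpy as np

rng = np.random.default_rng(2022)

def simulate_model4(X, gamma):
    """One simulated data set from Model 4 with N(0, 1) errors."""
    x1, x2 = X[:, 0], X[:, 1]
    f = (66 + 22*x1 + 10*x2 + 13*x1*x2 - 23*x1**2 - 25*x2**2
         + gamma*(2*np.sin(3*np.pi*x1) - 2*np.cos(3*np.pi*x2)
                  + 2*np.sin(2*np.pi*x1*x2)))
    return f + rng.standard_normal(len(x1))

# CCD points of Table 20 (the last five runs are the centre points)
X_ccd = np.array([[0.8536, 0.8536], [0.1464, 0.8536], [0.8536, 0.1464], [0.1464, 0.1464],
                  [1.0, 0.5], [0.0, 0.5], [0.5, 1.0], [0.5, 0.0]] + [[0.5, 0.5]] * 5)
y_sim = simulate_model4(X_ccd, gamma=0.5)   # one of the 1000 Monte Carlo data sets
```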

Table 20.

The CCD for the Simulating Data for Models 1–4.

i x1 x2
1 0.8536 0.8536
2 0.1464 0.8536
3 0.8536 0.1464
4 0.1464 0.1464
5 1.0000 0.5000
6 0.0000 0.5000
7 0.5000 1.0000
8 0.5000 0.0000
9 0.5000 0.5000
10 0.5000 0.5000
11 0.5000 0.5000
12 0.5000 0.5000
13 0.5000 0.5000

Table 21.

The CCD for the Simulating Data for Models 5 and 6.

i x1 x2 x3
1 0.2030 0.2030 0.2030
2 0.7970 0.2030 0.2030
3 0.2030 0.7970 0.2030
4 0.7970 0.7970 0.2030
5 0.2030 0.2030 0.7970
6 0.7970 0.2030 0.7970
7 0.2030 0.7970 0.7970
8 0.7970 0.7970 0.7970
9 0.0000 0.5000 0.5000
10 1.0000 0.5000 0.5000
11 0.5000 0.0000 0.5000
12 0.5000 1.0000 0.5000
13 0.5000 0.5000 0.0000
14 0.5000 0.5000 1.0000
15 0.5000 0.5000 0.5000
16 0.5000 0.5000 0.5000
17 0.5000 0.5000 0.5000

The goal of the simulation study is to demonstrate the performance of each of the regression models when applied to studies that consist of one, two, or three explanatory variables, respectively. The Average Sum of Squared Errors (AVESSE) of each model for each degree of model misspecification is presented in Table 22.

Table 22.

Comparison of the AVESSE of each method for each model in the simulation studies.

Model γ OLS LLRFB LLRAB LLRPAB
(1) 0.00 9.8961 8.3371 8.3220 8.3133
  0.50 22.5001 8.4606 8.4105 8.4204
  1.00 48.7471 8.4817 8.4120 8.4310
(2) 0.00 9.8769 8.1392 8.2887 8.2679
  0.50 16.2334 8.4989 8.2899 8.2973
  1.00 30.5292 9.4051 9.1337 8.9398
(3) 0.00 6.9849 68.9816 6.3277 4.0700
  0.50 18.0887 61.6146 14.4455 4.7940
  1.00 51.0910 99.0211 15.1152 5.1912
(4) 0.00 7.0210 34.0919 13.6632 4.0198
  0.50 13.7667 41.8323 20.9044 7.0169
  1.00 39.1912 72.1624 38.9560 8.9640
(5) 0.00 7.0113 28.9237 6.2117 5.8945
  0.50 125.2006 254.4773 12.5466 6.3215
  1.00 479.6291 747.5212 71.8911 26.041
(6) 0.00 7.2458 37.5407 7.9100 4.7715
  0.50 44.1519 64.3340 12.1213 5.8219
  1.00 155.2220 173.5006 22.1993 8.8906

The values of the AVESSE of the LLRFB, LLRAB and LLRPAB for models 1 and 2 are approximately the same but better than the AVESSE of the OLS model across all the degrees of model misspecification. For models 3 through 6, where the curse of dimensionality is most intense, LLRPAB gives the best AVESSE. Furthermore, while the AVESSE of the LLRPAB is fairly stable as γ increases from 0 to 1 across models 3 through 6, the AVESSE of the OLS, the LLRFB and the LLRAB deteriorates quite rapidly.

3.6. Neural network computing and application

The application of neural networks cuts across multidisciplinary studies ranging from neuroscience to theoretical statistical physics. More importantly, one of the most significant theoretical and applied topics in neural networks and computing is the choice of adequate activation functions, because they capture nonlinearity in the data and hence form the core of both deep and shallow learning with different architectures [21].

We give the mathematical form of the SPOCU activation function as presented in the literature and apply it to the three RSM data sets (the single response chemical process data, the multiple response chemical process data, and the Minced Fish Quality Data):

3.6.1. Scaled polynomial constant unit (SPOCU) activation function

The SPOCU activation function is given by;

$$s(x) = \alpha h\left(\frac{x}{\gamma} + \beta\right) - \alpha h(\beta), \quad \text{where } \beta \in (0, 1),\ \alpha, \gamma > 0, \tag{31}$$

and the generator:

$$h(x) = \begin{cases} r(c), & x \ge c \\ r(x), & x \in [0, c) \\ 0, & x < 0 \end{cases} \tag{32}$$

with $r(x) = x^{3}(x^{5} - 2x^{4} + 2)$ and $1 \le c < \infty$; as c tends to infinity, so does r(c), see [21].
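A minimal NumPy sketch of Equations (31)–(32) follows; the parameter values α, β, γ and c are illustrative assumptions rather than the settings used in the computations reported below.

```python
import numpy as np

def spocu(x, alpha=1.0, beta=0.5, gamma=1.0, c=2.0):
    """SPOCU activation s(x) = alpha*h(x/gamma + beta) - alpha*h(beta), Eqs. (31)-(32)."""
    r = lambda u: u**3 * (u**5 - 2 * u**4 + 2)
    def h(u):
        u = np.asarray(u, dtype=float)
        return np.where(u >= c, r(c), np.where(u >= 0.0, r(u), 0.0))
    return alpha * h(x / gamma + beta) - alpha * h(beta)

x = np.linspace(-2.0, 2.0, 5)
print(np.round(spocu(x), 4))   # SPOCU evaluated on a small grid (illustrative parameters)
```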

The performance statistics obtained from the activation functions ReLU, Leaky-ReLU, SELU, and SPOCU were all adequate. However, the SPOCU activation function shows the most satisfactory results in terms of a smaller mean squared error (MSE) than ReLU, Leaky-ReLU, and SELU; see Tables 23, 24 and 25 and Figures 7, 8 and 9, respectively.

Figure 7. Graph of Model Loss function (MSE) via neural network computing for single response chemical process data (y1).

Figure 8. Graph of Model Loss function (MSE) via neural network computing for multi-response chemical process data.

Figure 9. Graph of Model Loss function (MSE) via neural network computing for multi-response Minced Fish Quality Data.

4. Conclusion

Quality is one of the most important factors that inform a consumer's preference for one product among several competing products. Consequently, improving the quality of a product is a key strategy that leads to business growth, enhanced competitiveness and huge returns on investment, see [6,27].

In the early stage of the design of a new product, research teams run experiments and build regression models in order to identify the setting of the explanatory variables that optimize responses related to the quality of the new product. This series of activities is referred to as product qualification in the manufacturing circles, see [25,26].

Once a product has been qualified, its recipe, which includes the identified optimal setting of the explanatory variables, is used to produce the product on a large scale for the intended consumers. The reliability of the optimal setting of the explanatory variables depends on how well the regression model fits the data, see [6,18]. A regression model that gives a relatively low Prediction Error Sum of Squares and a comparatively high R2 provides statistically more reliable optimal solutions, see [29,33].

In this paper, we proposed a new locally adaptive bandwidths selector for smoothing RSM data. The proposed bandwidth selector is applied in the LLR model for fitting simulated data and three problems in the literature. The results of the goodness of fits and optimal solutions obtained show that the LLR regression model utilizing the proposed bandwidths selector performs better than the OLS, the fixed bandwidth LLR, and the LLR that utilizes the locally adaptive bandwidths selected by the existing locally adaptive bandwidths selector proposed by [10].

Data consisting of two or three explanatory variables are commonplace in RSM. This creates a problem referred to as the curse of dimensionality for the LLR model which normally thrives in modeling data that involves only a single explanatory variable. However, the results from the three examples and the simulated data show that the LLR model benefits more from bandwidths selected by the proposed locally adaptive bandwidths selector that takes into account both the number and values of the explanatory variables at each data point than it does from bandwidths selected by fixed and the existing locally adaptive bandwidths selector.

 Neural network activation functions such as ReLU, Leaky-ReLU, SELU, and SPOCU were considered, with a remarkable improvement in the loss function (MSE) over the regression models utilized in the three RSM data sets. Among the four activation functions, SPOCU was shown to work satisfactorily on the variety of problems over ReLU, Leaky-ReLU, and SELU, see Tables 23, 24 and 25, respectively.

Table 23.

Loss function (MSE) via neural network computing for single response chemical process data.

Activation function Loss ( y1)
ReLU 0.0062
Leaky-ReLU 0.0062
SELU 0.0062
SPOCU 0.0062

Table 24.

Loss function (MSE) via neural network computing for multi-response chemical process data.

Activation function Loss ( y1) Loss ( y2) Loss ( y3)
ReLU 0.0329 0.0391 0.0762
Leaky-ReLU 0.0074 0.0277 0.0764
SELU 0.0086 0.0277 0.0762
SPOCU 0.0074 0.0277 0.0762

Table 25.

Loss function (MSE) via neural network computing for multi-response Minced Fish Quality Data.

Activation function Loss ( y1) Loss ( y2) Loss ( y3) Loss ( y4)
ReLU 0.000682 0.0186 0.0079 0.0000233
Leaky-ReLU 0.00073 0.0000981 0.0079 0.0000237
SELU 0.0102 0.000203 0.0083 0.0000259
SPOCU 0.000682 0.000148 0.0079 0.0000232

Acknowledgements

I am obliged to my PhD supervisor, Prof. J. I. Mbegbu for his tutelage. Thanks to Dr E. Edionwe for his relentless contributions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Declaration of Interest statement

‘None’

References

  • 1.Adalarasan R., and Santhanakumar M., Response surface methodology and desirability analysis for optimizing μ WEDM parameters for A16351/20% Al2O2 composite. Int. J. ChemTech Res. 7 (2015), pp. 2625–2631. [Google Scholar]
  • 2.Alvarez M.J., Izarbe L., Viles E., and Tanco M., The use of genetic algorithm in response surface methodology. J. Qual. Technol. Quant. Manag. 6 (2009), pp. 295–309. [Google Scholar]
  • 3.Anderson-Cook C.M., and Prewitt K., Some guidelines for using nonparametric models for modeling data from response surface designs. J. Mod. Appl. Stat. Models 4 (2005), pp. 106–119. [Google Scholar]
  • 4.Atkeson C.G., Moore A.W., and Schaal S., Locally weighted learning. Artif. Intell. Rev. 11 (1997), pp. 11–73. [Google Scholar]
  • 5.Box G.E.P., and Wilson K.B., On the experimental attainment of optimum conditions. J. R. Stat. Soc. B 13 (1951), pp. 1–45. [Google Scholar]
  • 6.Castillo D.E., Process Optimization: A Statistical Method, Springer International Series in Operations Research and Management Science, New York, 2007. [Google Scholar]
  • 7.Chen Y., and Ye K., Bayesian hierarchical modelling on dual response surfaces in partially replicated designs. J. Qual. Technol. Quant. Manag. 6 (2009), pp. 371–389. [Google Scholar]
  • 8.Derringer G., and Suich R., Simultaneous optimization of several response variables. J. Qual. Technol. 12 (1980), pp. 214–219. [Google Scholar]
  • 9.Edionwe E., and Mbegbu J.I., Local bandwidths for improving the performance statistics of model robust regression 2. J. Mod. Appl. Stat. Methods. 13 (2014), pp. 506–527. [Google Scholar]
  • 10.Edionwe E., Mbegbu J.I., and Chinwe R., A new function for generating local bandwidths for semi–parametric MRR2 model in response surface methodology. J. Qual. Technol. 48 (2016), pp. 388–404. [Google Scholar]
  • 11.Fan J., and Gijbels I., Data-driven bandwidth selection in local polynomial fitting: A variable bandwidth and spatial adaptation. J. R. Stat. Soc. Ser. B 57 (1995), pp. 371–394. [Google Scholar]
  • 12.Fan J., and Gijbels I., Local Polynomial Modeling and its Applications, Chapman and Hall, London, 1996. [Google Scholar]
  • 13.Geenens G., Curse of dimensionality and related issues in nonparametric functional regression. Stat. Surv. 5 (2011), pp. 30–43. [Google Scholar]
  • 14.Hardle W., Muller M., Sperlich S., and Werwatz A., Nonparametric and Semiparametric Models: An Introduction, Springer-Verlag, Berlin, 2005. [Google Scholar]
  • 15.Harrington E.C., The desirability function. Ind. Qual. Control 21 (1965), pp. 494–498. [Google Scholar]
  • 16.Heredia-Langner A., Montgomery D.C., Carlyle W.M., and Borer C.M., Model robust optimal designs: A genetic algorithm method. J. Qual. Technol. 36 (2004), pp. 263–279. [Google Scholar]
  • 17.He Z., Wang J., Oh J., and Park S.H., Robust optimization for multiple responses using response surface methodology. Appl. Stoch. Models. Bus. Ind. 26 (2009), pp. 157–171. [Google Scholar]
  • 18.He Z., Zhu P.E., and Park S.H., A robust desirability function for multi-response surface optimization. Eur. J. Oper. Res. 221 (2012), pp. 241–247. [Google Scholar]
  • 19.Holland J., Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975. [Google Scholar]
  • 20.Johnson R.T., and Montgomery D.C., Choice of second-order response surface designs for logistics and Poisson regression models. Int. J. Exp. Des. Process Optim. 1 (2009), pp. 2–23. [Google Scholar]
  • 21.Kiselak J., Lu Y., Svihra J., Szepe P., and Stehlik M., “SPOCU”: scaled polynomial constant unit activation function. Neural Comput. Appl. (2020), doi: 10.1007/s00521-020-05182-1. [DOI] [Google Scholar]
  • 22.Kohler M.A., Schindler A., and Sperlich S., A review and comparison of bandwidth selection methods for kernel regression. Int. Stat. Rev. 82 (2014), pp. 243–274. [Google Scholar]
  • 23.Mays J.E., Birch J.B., and Starnes B.A., Model robust regression: Combining parametric, nonparametric, and semi-parametric models. J. Nonparametr. Stat. 13 (2001), pp. 245–277. [Google Scholar]
  • 24.Mondal A., and Datta A.K., Investigation of the process parameters using response surface methodology on the quality of crustless bread baked in a water-spraying oven. J. Food Process Eng 34 (2011), pp. 1819–1837. [Google Scholar]
  • 25.Montgomery D.C., Introduction to Statistical Quality Control. 7th Ed., John Wiley & Sons, New York, 2009. [Google Scholar]
  • 26.Myers R.H., Response surface methodology – Current status and future directions. J. Qual. Technol. 31 (1999), pp. 30–44. [Google Scholar]
  • 27.Myers R., Montgomery D.C., and Anderson-Cook C.M., Response Surface Methodology: Process and Product Optimization Using Designed Experiments, Wiley, Toronto, ON, 2009. [Google Scholar]
  • 28.Nadaraya E.A., On estimating regression. J. Theory Probab. Appl. 9 (1964), pp. 141–142. [Google Scholar]
  • 29.Pickle S.M., Robinson T.J., Birch J.B., and Anderson-Cook C.M., A semi-parametric model to robust parameter design. J. Stat. Plan. Inference. 138 (2008), pp. 114–131. [Google Scholar]
  • 30.Sestelo M., Villanueva N.M., Meira-Machado L., and Roca-Pardinas J., An R package for nonparametric estimation and inference in life sciences. J. Stat. Softw. 82 (2017), pp. 1–27. [Google Scholar]
  • 31.Shah K.H., Montgomery D.C., and Carlyle W.M., Response surface modelling and optimization in multi-response experiments using seemingly unrelated regressions. Qual. Eng. 16 (2004), pp. 387–397. [Google Scholar]
  • 32.Thongsook S., Borkowski J.J., and Budsaba K., Using a genetic algorithm to generate Ds – optimal designs with bounded D-efficiencies for mixture experiments. J. Thail. Stat. 12 (2014), pp. 191–205. [Google Scholar]
  • 33.Wan W., and Birch J.B., A semi-parametric technique for multi-response optimization. J. Qual. Reliab. Eng. Int. 27 (2011), pp. 47–59. [Google Scholar]
  • 34.Wu C.F.J., and Hamada M.S., Experiments: Planning, Analysis and Parameter Design Optimization, John Wiley & Sons, Inc, New York, 2000. [Google Scholar]
  • 35.Yeniay O., Comparative study of algorithm for response surface optimization. J. Math. Comput. Appl. 19 (2014), pp. 93–104. [Google Scholar]
  • 36.Zheng Q., Gallagher C., and Kulasekera K.B., Adaptively weighted kernel regression. J. Nonparametr. Stat. 25 (2013), pp. 855–872. [Google Scholar]
