Journal of Applied Statistics. 2022 Jan 31;50(6):1283–1309. doi: 10.1080/02664763.2022.2026895

Local Linear Regression and the problem of dimensionality: a remedial strategy via a new locally adaptive bandwidths selector

O. Eguasa, E. Edionwe, J. I. Mbegbu
PMCID: PMC10071963  PMID: 37025278

Abstract

Local Linear Regression (LLR) is a nonparametric regression model applied in the modeling phase of Response Surface Methodology (RSM). LLR does not make reference to any fixed parametric model. Hence, LLR is flexible and can capture local trends in the data that might be too complicated for the OLS. However, besides the small sample sizes and sparse data that characterize RSM, the performance of the LLR model deteriorates rapidly as the number of explanatory variables considered in the study increases. This phenomenon, popularly referred to as the curse of dimensionality, results in the scanty application of LLR in RSM. In this paper, we propose a novel locally adaptive bandwidths selector which, unlike the fixed bandwidths and existing locally adaptive bandwidths selectors, takes into account both the number of explanatory variables in the study and their individual values at each data point. Single and multiple response problems from the literature and simulated data were used to compare the performance of the LLRPAB with those of the OLS, LLRFB and LLRAB. Neural network activation functions such as ReLU, Leaky-ReLU, SELU and SPOCU were also considered and give a remarkable improvement in the loss function (Mean Squared Error) over the regression models utilized on the three data sets.

KEYWORDS: Desirability function, locally adaptive bandwidths selector, Local Linear Regression model, SPOCU activation function, Response Surface Methodology

1. Introduction

Response Surface Methodology (RSM) is a collection of mathematical and statistical techniques applied in the modeling and analysis of data in which a response is influenced by one or more explanatory variables, [5]. There are three distinct phases in RSM, namely, the Design of Experiment Phase, the Modeling Phase, and the Optimization Phase, see [6].

 In the Modeling Phase of RSM, a fundamental assumption is that the relationship between the response variable y and the k explanatory variables x1, x2, …, xk may be represented as:

$$y_i = f(x_{i1}, x_{i2}, \ldots, x_{ik}) + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{1}$$

where the mean function f denotes the true but unknown relationship between the response variable and the k explanatory variables, εi, i = 1, 2, …, n, are random error terms assumed to have a normal distribution with mean zero and constant variance, and n is the sample size, see [27,33].

The OLS and the LLR are existing regression models applied in the estimation of the unknown function f in (1) [3,18]. The OLS model is applied in the estimation of the unknown parameters (coefficients) in the parametric (polynomial) model that the experimenter assumes adequate to approximate f in (1), see [33].

1.1. The parametric regression model

$$y = X\beta + \epsilon \tag{2}$$

The OLS estimate $\hat{y}_i^{(OLS)}$ of the response at the ith data point is given as:

$$\hat{y}_i^{(OLS)} = x_i(X^{T}X)^{-1}X^{T}y \tag{3}$$

where y is an n × 1 vector of responses, X is an n × p model matrix, p is the number of model parameters (coefficients), XT is the transpose of the matrix X, and xi is the ith row vector of the matrix X, see [29].

In matrix notation, the vector of OLS estimated response is expressed as:

$$\hat{y}^{(OLS)} = \begin{bmatrix} h_1^{(OLS)} \\ h_2^{(OLS)} \\ \vdots \\ h_n^{(OLS)} \end{bmatrix} y = H^{(OLS)}y, \tag{4}$$

where the vector $h_i^{(OLS)} = x_i(X^{T}X)^{-1}X^{T}$ is the ith row of the n × n OLS Hat matrix H(OLS).

The OLS model requires several assumptions to be met for valid interpretation of its parameter estimates. Furthermore, it performs poorly if the assumed polynomial model is inadequate for the data, also see [33].
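As a concrete illustration of Equations (2)–(4), the short NumPy sketch below builds the hat matrix H(OLS) and the fitted responses for a full second-order model in two coded variables; the design points and response values are illustrative placeholders rather than data taken from the paper.

```python
import numpy as np

def ols_hat_and_fit(X, y):
    """Return the OLS hat matrix H = X (X'X)^{-1} X' and the fitted values, Eqs. (3)-(4)."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # n x n hat matrix H^(OLS)
    return H, H @ y                        # fitted responses y_hat = H y

# Illustrative second-order model matrix in two coded variables x1, x2 (placeholder data)
x1 = np.array([0.15, 0.85, 0.15, 0.85, 0.00, 1.00, 0.50, 0.50, 0.50])
x2 = np.array([0.15, 0.15, 0.85, 0.85, 0.50, 0.50, 0.00, 1.00, 0.50])
y  = np.array([88.6, 85.8, 86.3, 80.4, 85.5, 85.4, 86.2, 85.7, 90.2])
X  = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

H, y_hat = ols_hat_and_fit(X, y)
print(np.round(y_hat, 2))
```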

1.2. The Local Linear Regression model

The LLR model is a nonparametric regression version of the weighted least squares model, also see [14,12,23]. The LLR estimate, y^i(LLR) of yi, is given as:

$$\hat{y}_i^{(LLR)} = \tilde{x}_i(\tilde{X}^{T}W_i\tilde{X})^{-1}\tilde{X}^{T}W_iy = h_i^{(LLR)}y, \tag{5}$$

where x~i is the ith row of the LLR model matrix X~ given as:

$$\tilde{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}_{n \times (k+1)},$$

where xij, i = 1, 2, …, n, j = 1, 2, …, k, denotes the value of the jth explanatory variable at the ith data point, and Wi is an n × n diagonal weights matrix given as:

$$W_i = \begin{bmatrix} w_{1i} & 0 & \cdots & 0 \\ 0 & w_{2i} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{ni} \end{bmatrix}_{n \times n}$$

1.3. Bandwidths for nonparametric regression model

The bandwidth is the most crucial parameter in nonparametric regression and estimation procedures because its selected value determines the shape of the fitted curve, see [14,22,30]. If bij = b for all i and all j, we say that smoothing in the LLR model is done using a fixed or global bandwidth b; otherwise the bij, i = 1, 2, …, n, j = 1, 2, …, k, are referred to as locally adaptive bandwidths. The kernel weights in Wi are computed from

$$K\left(\frac{x_{ij}-x_{1j}}{b_{ij}}\right) = e^{-\left(\frac{x_{ij}-x_{1j}}{b_{ij}}\right)^{2}},$$

the simplified Gaussian kernel function, where bij, 0 < bij ≤ 1, i = 1, 2, …, n, j = 1, 2, …, k, are the bandwidths (smoothing parameters), see [28,29].

In a situation where more than one explanatory variable is used in the model matrix, the kernel weight w1i is a product kernel given as:

$$w_{1i} = \prod_{j=1}^{k} K\left(\frac{x_{ij}-x_{1j}}{b_{ij}}\right) \Big/ \sum_{i=1}^{n}\prod_{j=1}^{k} K\left(\frac{x_{ij}-x_{1j}}{b_{ij}}\right), \quad i = 1, 2, \ldots, n. \tag{6}$$
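As an illustration of Equations (5) and (6), the following minimal NumPy sketch computes the LLR estimate at one design point from a given bandwidth matrix. The reading that each observation i uses its own bandwidths bij inside the kernel follows the indexing in (6), and the function name is our own.

```python
import numpy as np

def llr_fit_at_point(i0, X_raw, y, B):
    """LLR estimate at design point i0, Eq. (5), with product Gaussian weights, Eq. (6).

    X_raw : (n, k) explanatory variables coded in [0, 1]
    B     : (n, k) matrix of locally adaptive bandwidths (the matrix Phi)
    """
    n, k = X_raw.shape
    # Kernel weight of every observation relative to the fitting point i0
    w = np.prod(np.exp(-((X_raw - X_raw[i0]) / B) ** 2), axis=1)
    w = w / w.sum()                                   # normalization as in Eq. (6)
    X_tilde = np.column_stack([np.ones(n), X_raw])    # LLR model matrix
    W = np.diag(w)
    beta = np.linalg.solve(X_tilde.T @ W @ X_tilde, X_tilde.T @ W @ y)
    return X_tilde[i0] @ beta                         # y_hat_i0 = x_i0 (X'W X)^{-1} X'W y
```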

Each value of bij, i = 1, 2, …, n, and j = 1, 2, …, k, may be thought of as an entry in a matrix, say Φ, given as:

$$\Phi = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{bmatrix}_{n \times k}$$

The entries of the matrix Φ are referred to as locally adaptive bandwidths.

In RSM, the matrix comprising the optimal bandwidths b11, b12, …, bnk is obtained from the minimization of the Penalized Prediction Error Sum of Squares (PRESS**):

$$\underset{\{b_{11}, b_{12}, \ldots, b_{nk}\}}{\text{Minimize}}\ PRESS^{**} = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{i,-i}^{(LLR)}\right)^{2}}{n - \operatorname{trace}\left(H^{(LLR)}(\Phi)\right) + (n-k-1)\dfrac{SSE_{\max} - SSE_{\Phi}}{SSE_{\max}}} \tag{7}$$

where SSEmax is the maximum Sum of Squared Errors, obtained as the bij, i = 1, 2, …, n, j = 1, 2, …, k, all tend to infinity, SSEΦ is the Sum of Squared Errors for the bandwidth matrix Φ, trace(H(LLR)(Φ)) is the trace of the LLR Hat matrix, and $\hat{y}_{i,-i}^{(LLR)}$ is the delete-one cross-validation estimate of yi with the ith observation left out, see [33].

The LLR model is flexible and can capture local trends which may be overlooked by the OLS model. However, its performance is generally poor when applied in studies that involve k>1 explanatory variables. This poor performance is referred to as ‘curse of dimensionality’ in the nonparametric regression literature [13].

1.4. Genetic algorithm

Once the data has been modeled, the resulting fitted curve is used for determining the setting of the explanatory variables that optimizes the response based on the production requirement. This task summarizes the aim of the Optimization Phase of RSM, also see [20,24]. In this paper, we perform all the optimization tasks using the Genetic Algorithm (GA) Optimization toolbox available in Matlab software.

The GA was introduced by Holland, see [19]. The GA procedure is based on natural selection and other genetic concepts including population, chromosomes, selection, crossover, mutation, etc., see [7,16]. The GA is an evolutionary optimization tool that can be applied to a variety of problems that are not well suited for standard optimization algorithms, including problems in which the objective function lacks a closed-form expression (such as the LLR model) or is discontinuous, non-differentiable, stochastic or highly nonlinear, see [2,29,32,35] and Figure 1.
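The GA runs reported in this paper use the GA toolbox in Matlab. Purely as a hedged illustration of this Optimization Phase, the sketch below maximizes a fitted response surface over the coded region with SciPy's differential evolution, a related evolutionary optimizer rather than the authors' GA; the fitted_response function is a made-up placeholder standing in for an LLR or OLS fit.

```python
import numpy as np
from scipy.optimize import differential_evolution

def fitted_response(x):
    """Placeholder for a fitted response surface evaluated at x = (x1, x2)."""
    x1, x2 = x
    return 90.0 - 30.0 * (x1 - 0.5)**2 - 25.0 * (x2 - 0.5)**2   # illustrative surface only

# Maximize the response over the coded region [0, 1]^2 by minimizing its negative
result = differential_evolution(lambda x: -fitted_response(x),
                                bounds=[(0.0, 1.0), (0.0, 1.0)], seed=1)
print(result.x, -result.fun)   # optimal setting and the estimated optimal response
```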

Figure 1. A basic GA flowchart.

For multiple response studies that involve m responses, m > 1, it is essential that we get an optimal setting of the explanatory variables that simultaneously optimizes all the responses with respect to their individual production requirements, see [18,31,34]. The most popular criterion applied in the optimization of multiple responses is the Desirability function, see [1,8,15].

1.5. Desirability function

Based on the production requirement of a response, the Desirability function transforms the estimated response, y^p(x) into a scalar measure, dp(y^p(x)).

If the response is of the nominal-the-better (NTB) type, where the acceptable value of the pth response lies between a lower limit, L, and an upper limit, U, dp(ŷp(x)) is given as:

$$d_p(\hat{y}_p(x)) = \begin{cases} 0, & \hat{y}_p(x) < L \\[4pt] \left\{\dfrac{\hat{y}_p(x) - L}{\varnothing - L}\right\}^{t_1}, & L \le \hat{y}_p(x) < \varnothing \\[4pt] \left\{\dfrac{U - \hat{y}_p(x)}{U - \varnothing}\right\}^{t_2}, & \varnothing \le \hat{y}_p(x) \le U \\[4pt] 0, & \hat{y}_p(x) > U \end{cases} \tag{8}$$

where ∅ is the target value of the pth response.

If the objective is to maximize the pth response, dp(y^p(x)) is given by a one-sided transformation as:

$$d_p(\hat{y}_p(x)) = \begin{cases} 0, & \hat{y}_p(x) < L \\[4pt] \left\{\dfrac{\hat{y}_p(x) - L}{\varnothing - L}\right\}^{t_1}, & L \le \hat{y}_p(x) \le \varnothing \\[4pt] 1, & \hat{y}_p(x) > \varnothing \end{cases} \tag{9}$$

where ∅ is interpreted as a large enough value of the pth response.

If the objective is to minimize the pth response, dp(y^p(x)) is given by a one-sided transformation as:

$$d_p(\hat{y}_p(x)) = \begin{cases} 1, & \hat{y}_p(x) < \varnothing \\[4pt] \left\{\dfrac{U - \hat{y}_p(x)}{U - \varnothing}\right\}^{t_2}, & \varnothing \le \hat{y}_p(x) \le U \\[4pt] 0, & \hat{y}_p(x) > U \end{cases} \tag{10}$$

where ∅ is interpreted as a small enough value of the pth response.

In all cases, t1 and t2 are the parameters that control the shape of the desirability function, enabling the user to accommodate nonlinear desirability functions. However, for RSM data, the values of t1 and t2 are taken to be 1, see [6,18].

1.6. The overall desirability

The overall objective of the Desirability criterion is to obtain the setting of the explanatory variables that maximizes the geometric mean (D) of all the individual desirability measures given as:

$$D = \operatorname{maximize}\left(\left(\prod_{p=1}^{m} d_p(\hat{y}_p(x))\right)^{1/m}\right), \tag{11}$$
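A minimal sketch of Equations (8)–(11) with t1 = t2 = 1 is given below. The limits, targets and predicted responses passed in are illustrative (they mirror the process requirements used later in Section 3.3), and the function names are our own.

```python
import numpy as np

def d_target(yhat, L, T, U, t1=1.0, t2=1.0):
    """Two-sided (NTB) desirability of Eq. (8); T denotes the target value."""
    if yhat < L or yhat > U:
        return 0.0
    return ((yhat - L) / (T - L))**t1 if yhat < T else ((U - yhat) / (U - T))**t2

def d_maximize(yhat, L, T, t1=1.0):
    """One-sided desirability for a larger-the-better response, Eq. (9)."""
    if yhat < L:
        return 0.0
    return 1.0 if yhat > T else ((yhat - L) / (T - L))**t1

def d_minimize(yhat, T, U, t2=1.0):
    """One-sided desirability for a smaller-the-better response, Eq. (10)."""
    if yhat > U:
        return 0.0
    return 1.0 if yhat < T else ((U - yhat) / (U - T))**t2

def overall_D(d_values):
    """Geometric mean of the individual desirabilities, Eq. (11)."""
    return float(np.prod(d_values)) ** (1.0 / len(d_values))

# Illustrative use with three responses
d = [d_maximize(79.6, L=78.5, T=80.0),
     d_target(64.0, L=62.0, T=65.0, U=68.0),
     d_minimize(3212.0, T=3100.0, U=3300.0)]
print(round(overall_D(d), 4))
```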

The remainder of the paper is organized as follows: a review of existing locally adaptive bandwidths selectors in RSM concludes the current Section. In Section 2, the proposed locally adaptive bandwidths selector is presented with an algorithm. Using three examples and simulated data, comparisons of the results of the LLR utilizing the bandwidths from the proposed locally adaptive bandwidths selector with those from the OLS, the LLR utilizing a fixed bandwidth, and the LLR utilizing the bandwidths from the existing locally adaptive bandwidths selector are presented in Section 3, where neural network activation functions such as the SPOCU activation function are also applied. The paper concludes in Section 4.

1.7. A review of locally adaptive bandwidths for data from RSM

Locally adaptive bandwidths perform better than their fixed counterpart because of their comparatively better sensitivity to local trends and patterns in the data, also see [4,11,36]. Locally adaptive bandwidths selectors are modeled as functions of local information at each data point. Such local information includes the values of the explanatory variables, xi, or of the response, yi, or both, allowing for different degrees of smoothing at each data point, see [4].

The authors in [9] presented a data-driven locally adaptive bandwidths selector given as:

$$b_i = \frac{N\left(N\sum_{i=1}^{n}y_i - y_i\right)}{(Nn-1)\sum_{i=1}^{n}y_i}, \quad i = 1, 2, \ldots, n, \tag{12}$$

where n is the sample size, yi is the response at ith data point, and N>0 is a tuning parameter.

A drawback of the locally adaptive bandwidths selected by (12) is that they tend to cluster around a small range of values in the interval (0, 1], with a very small difference between the largest and the smallest bandwidths, see [10]. In order to address the clustering of the bandwidths selected by (12), the authors in [10] presented a locally adaptive bandwidths selector given as:

$$b_i = \frac{bN\left(C\sum_{i=1}^{n}\alpha_i - \alpha_i\right)}{(Cn-1)\sum_{i=1}^{n}\alpha_i}, \quad i = 1, 2, \ldots, n, \tag{13}$$

where b is a fixed optimal bandwidth, αi is yi, i = 1, 2, …, n, or any ordered statistic that reflects the inadequacies in the OLS estimates of the response, and N > 0 and C ≥ 0 are tuning parameters. The tuning parameters N in (12) and (13) and C in (13) are chosen based on the minimization of the PRESS** criterion in (7), see [10].

The selector in (13) performs very well, giving outstanding results when applied to problems from the literature. However, the curse of dimensionality in LLR originates from the number of explanatory variables, which is not considered in (12) or (13).

The idea that motivates this paper is that a bandwidths selector that assigns a unique bandwidth to each data point per the number of explanatory variables can proffer stronger remedial measures on the curse of dimensionality than the one which ignores such vital information about the data.

2. Methodology

We propose a locally adaptive bandwidths selector that incorporates important information of RSM data, namely, the value of the explanatory variables at each data point and the number of explanatory variables in the study.

The mathematical procedure for the modeling of the proposed locally adaptive bandwidths selector is as follows:

Denote the value of the bandwidth at the ith data point for the jth explanatory variable as bij and assume that bij is proportional to the value of a weight function Vij of xij:

$$b_{ij} \propto V_{ij}(x_{ij}), \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k. \tag{14}$$
$$b_{ij} = T_{1j}V_{ij}(x_{ij}), \quad T_{1j} > 0,\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k, \tag{15}$$

where T1j is the constant of proportionality, which may scale the value of the weight function Vij either upward or downward in order to achieve the optimum smoothing requirement at the (i, j)th data point.

An important attribute of a weight function is the ability to assign relatively smaller weight to relatively larger αi, and vice versa, according to the smoothing requirement of the data, see [10]. For instance, if x1j>x2j, we may either get Vij(x1j)<Vij(x2j) or Vij(x1j)>Vij(x2j).

Mathematically, one of the ways (15) can incorporate the attribute is to express it as:

$$b_{ij} = T_{1j}\left(Z - \frac{x_{ij}}{T_{2j}}\right)^{2}, \quad T_{1j} > 0,\ T_{2j} > 0,\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k, \tag{16}$$

where Z is a real number and the exponent '2' ensures nonnegative weights, guarding against negative values that could arise at some data points from the difference Z − xij/T2j. T2j plays two key roles: one, for a fixed Z, T2j ensures that no data point is assigned a zero weight; two, it ensures that the attribute of the weight function described above is embedded and accomplished in the proposed bandwidths selector.

In order to avoid the clustering of bandwidths, we proceed to obtain the optimal value of Z that would ensure that the difference between the largest bandwidth and the smallest one in the interval (0, 1) is as large as possible.

From Equation (16) we have:

$$b_{ij}(x_{ij}, Z, T_{1j}, T_{2j}) = T_{1j}Z^{2} - \frac{2ZT_{1j}x_{ij}}{T_{2j}} + \frac{T_{1j}x_{ij}^{2}}{T_{2j}^{2}} \tag{17}$$

Set xij=0 and xij=1 in (17) to get:

$$b_{ij}(0, Z, T_{1j}, T_{2j}) = Z^{2}T_{1j} \tag{18}$$
$$b_{ij}(1, Z, T_{1j}, T_{2j}) = Z^{2}T_{1j} - \frac{2ZT_{1j}}{T_{2j}} + \frac{T_{1j}}{T_{2j}^{2}} \tag{19}$$

Let g = 0 and h = 1 represent the range of xij. By the Mean Value Theorem, we have:

$$\frac{b_{ij}(h, Z, T_{1j}, T_{2j}) - b_{ij}(g, Z, T_{1j}, T_{2j})}{h - g} = \frac{db_{ij}(Z)}{dx_{ij}} \tag{20}$$
$$\frac{b_{ij}(1, Z, T_{1j}, T_{2j}) - b_{ij}(0, Z, T_{1j}, T_{2j})}{1 - 0} = \frac{db_{ij}(Z)}{dx_{ij}} \tag{21}$$

Subtracting Equation (18) from (19) and dividing the result by (1 − 0), we have:

$$\frac{b_{ij}(1, Z, T_{1j}, T_{2j}) - b_{ij}(0, Z, T_{1j}, T_{2j})}{1 - 0} = \frac{Z^{2}T_{1j} - \frac{2ZT_{1j}}{T_{2j}} + \frac{T_{1j}}{T_{2j}^{2}} - Z^{2}T_{1j}}{1 - 0} = \frac{T_{1j}}{T_{2j}^{2}} - \frac{2ZT_{1j}}{T_{2j}} \tag{22}$$

The left-hand sides of Equations (21) and (22) are equal. So, we can write:

$$\frac{db_{ij}(Z)}{dx_{ij}} = \frac{T_{1j}}{T_{2j}^{2}} - \frac{2ZT_{1j}}{T_{2j}} \tag{23}$$

Differentiating Equation (17) with respect to xij we have:

$$\frac{db_{ij}(x_{ij}, Z, T_{1j}, T_{2j})}{dx_{ij}} = -\frac{2ZT_{1j}}{T_{2j}} + \frac{2T_{1j}x_{ij}}{T_{2j}^{2}} \tag{24}$$
$$\frac{db_{ij}(Z)}{dx_{ij}} = -\frac{2ZT_{1j}}{T_{2j}} + \frac{2ZT_{1j}}{T_{2j}^{2}} \tag{25}$$

Equating Equations (23) and (25), we have:

$$\frac{T_{1j}}{T_{2j}^{2}} - \frac{2ZT_{1j}}{T_{2j}} = -\frac{2T_{1j}Z}{T_{2j}} + \frac{2T_{1j}Z}{T_{2j}^{2}} \tag{26}$$
$$\frac{T_{1j}}{T_{2j}^{2}} = \frac{2T_{1j}Z}{T_{2j}^{2}} \tag{27}$$
$$Z = \frac{T_{1j}/T_{2j}^{2}}{2T_{1j}/T_{2j}^{2}} = \frac{1}{2}. \tag{28}$$

Therefore, Z = 1/2 is the optimal value of Z in [0, 1] that guarantees minimum clustering of the locally adaptive bandwidths from (16). Substituting Z = 1/2 in (16) gives:

$$b_{ij}\left(x_{ij}, \tfrac{1}{2}, T_{1j}, T_{2j}\right) = T_{1j}\left(\frac{1}{2} - \frac{x_{ij}}{T_{2j}}\right)^{2}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k. \tag{29}$$

The matrix Φ of the locally adaptive optimal bandwidths from Equation (29) is obtained at optimally selected values of T1j and T2j, j = 1, 2, …, k, based on the minimization of the PRESS** criterion in (7).
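A minimal sketch of Equation (29) is given below; fed with the tuning parameters reported later in Table 5, it reproduces the bandwidths listed there for the single response example (the function name is our own).

```python
import numpy as np

def proposed_bandwidths(X_raw, T1, T2):
    """Locally adaptive bandwidths b_ij = T1j * (1/2 - x_ij / T2j)^2, Eq. (29).

    X_raw  : (n, k) explanatory variables coded in [0, 1]
    T1, T2 : length-k sequences of tuning parameters, one pair per variable
    """
    return np.asarray(T1) * (0.5 - X_raw / np.asarray(T2)) ** 2

# Tuning parameters reported in Table 5 for the single response chemical process data
T1 = [1.3151, 1.4134]
T2 = [2.9740, 1.0412]
X  = np.array([[0.1464, 0.1464], [0.8536, 0.1464], [0.5000, 0.5000]])
print(np.round(proposed_bandwidths(X, T1, T2), 4))   # e.g. 0.2672, 0.1826, ...
```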

The optimal values of the tuning parameters (T1j and T2j for the proposed bandwidths selector in (29); C and N for the selector of [10] in (13)) and the locally adaptive optimal bandwidths for k explanatory variables are presented in Tables 1 and 2.

Table 1.

Optimal values of the tuning parameters T1j and T2j and the bandwidths of the proposed bandwidth selector.

i x1, T11, T21 x2, T12, T22 … xk, T1k, T2k
1 b11(x11) b12(x12) … b1k(x1k)
2 b21(x21) b22(x22) … b2k(x2k)
⋮ ⋮ ⋮ ⋱ ⋮
n bn1(xn1) bn2(xn2) … bnk(xnk)

Table 2.

Optimal values of the tuning parameters for C and N and the bandwidths from [10].

i x1, C, N x2, C, N … xk, C, N
1 b1(y1) b1(y1) … b1(y1)
2 b2(y2) b2(y2) … b2(y2)
⋮ ⋮ ⋮ ⋱ ⋮
n bn(yn) bn(yn) … bn(yn)

Unlike the bandwidths from the proposed bandwidths selector, the bandwidths of [10] satisfy bi1(yi) = bi2(yi) = … = bik(yi), since C, N and yi, i = 1, 2, …, n, are the same for the k explanatory variables; see Figures 2 and 3.

Figure 2. Plot of bandwidths against explanatory variables for fixed values of T2 less than 1.

Figure 3. Plot of bandwidths against explanatory variables for fixed values of T2 greater than 1.

From the plots in Figures 2 and 3, we observe that as T2j increases from 0.05 through 0.25, the value of the bandwidth b(x) increases as the value of the explanatory variable x increases from 0 to 1. At T2j = 0.45, we notice the beginning of a new trend which culminates in a parabolic curve at T2j = 1.05. For values of T2j from 0.45 through 1.05, the vertices of the parabolas show a gradual shift from x = 0 towards x = 0.5. In these plots with pseudo-parabolas, the local bandwidth b(x) decreases as x increases from 0 to the data point where the vertex of the particular plot is located, and thereafter increases as x increases. Elsewhere, for T2j = 0.05 through T2j = 0.25, the bandwidth increases as x increases, and the reverse is displayed in the plots for T2j = 1.85 through T2j = 25.05. At T2j = 1.0 × 10¹⁵, we have a horizontal curve that indicates a fixed or global bandwidth for all the data points. In addition, no data point is assigned a negative bandwidth. These observations are graphical assertions of the set objectives regarding the modeling of the proposed locally adaptive bandwidths selector.

2.1. Algorithm: Leave-one-out cross-validation technique for selecting locally adaptive bandwidths for LLR model

Step 1: define the bandwidths bij for the ith location and the jth explanatory variable, j = 1, 2, …, k:

$$b_{ij}\left(x_{ij}, \tfrac{1}{2}, T_{1j}, T_{2j}\right) = T_{1j}\left(\frac{1}{2} - \frac{x_{ij}}{T_{2j}}\right)^{2}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k.$$

Step 2: obtain a set ψ of acceptable values of the bandwidths (for RSM data, ψ ⊆ (0, 1]) in which the locally adaptive bandwidths bij are located.

Step 3: define the leave-one-out cross-validation estimates for any likely set Ω = (b1j, b2j, …, bnj) over the complete range of the set ψ ⊆ (0, 1]:

$$\hat{y}_{i,-i}(b_{1j}, b_{2j}, \ldots, b_{nj}), \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k.$$

Step 4: define the PRESS** criterion

$$PRESS^{**}(b) = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{i,-i}^{(.)}(\Omega)\right)^{2}}{n - \operatorname{trace}\left(H^{(.)}(\Omega)\right) + (n-k-1)\dfrac{SSE_{\max} - SSE_{\Omega}}{SSE_{\max}}}$$

for selecting locally adaptive bandwidths on the interval (0, 1), and obtain $\hat{y}_{i,-i}(\Omega)$, the estimated response at location i with the ith observation left out, for the set of locally adaptive bandwidths

$$\Omega = [b_{11}, b_{21}, \ldots, b_{n1};\ b_{12}, b_{22}, \ldots, b_{n2};\ \ldots;\ b_{1k}, b_{2k}, \ldots, b_{nk}].$$

Step 5: obtain SSEmax as b tends to infinity (in practice, a very large value such as $b = 10^{20}$) in $\hat{y}_i^{(LLR)}(b)$:

$$SSE_{\max} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i^{(LLR)}(b)\right)^{2}$$

Step 6: obtain SSEΩ for a particular set of locally adaptive bandwidths:

$$SSE_{\Omega}(b_{1j}, b_{2j}, \ldots, b_{nj}) = \sum_{i=1}^{n}\left(y_i - \hat{y}_i^{(LLR)}(b_{1j}, b_{2j}, \ldots, b_{nj})\right)^{2}$$

Step 7: lastly, obtain the optimal locally adaptive bandwidths:

(b1j, b2j, …, bnj) is the converged result that minimizes the PRESS** criterion [13].

Step 8: Stop.
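The NumPy sketch below illustrates Steps 3–6 for one candidate bandwidth matrix Φ: it forms the leave-one-out LLR estimates, the trace of the LLR Hat matrix and SSEΩ, and returns the PRESS** value of Equation (7). The outer search over T1j and T2j (for example, via the GA) would call this function repeatedly; the helper names and the weighting convention are our own assumptions.

```python
import numpy as np

def llr_hat_row(i0, X_raw, B, exclude=None):
    """Row of the LLR smoother for fitting point i0 (Eq. (5)) built from the product
    Gaussian weights of Eq. (6); optionally drops one observation for delete-one fits."""
    n, k = X_raw.shape
    keep = np.arange(n) if exclude is None else np.delete(np.arange(n), exclude)
    w = np.prod(np.exp(-((X_raw[keep] - X_raw[i0]) / B[keep]) ** 2), axis=1)
    w = w / w.sum()
    Xt = np.column_stack([np.ones(len(keep)), X_raw[keep]])   # local linear basis
    x0 = np.concatenate([[1.0], X_raw[i0]])
    h = x0 @ np.linalg.solve(Xt.T @ (w[:, None] * Xt), (w[:, None] * Xt).T)
    return keep, h                                            # y_hat_i0 = h @ y[keep]

def press_star_star(X_raw, y, B, sse_max):
    """PRESS** of Eq. (7) for a candidate bandwidth matrix B (i.e. Phi);
    sse_max is SSE_max from Step 5 (LLR fit with a very large fixed bandwidth)."""
    n, k = X_raw.shape
    press, trace_H, sse = 0.0, 0.0, 0.0
    for i in range(n):
        keep, h_loo = llr_hat_row(i, X_raw, B, exclude=i)     # delete-one estimate
        press += (y[i] - h_loo @ y[keep]) ** 2
        _, h_full = llr_hat_row(i, X_raw, B)                  # full-data estimate
        trace_H += h_full[i]                                  # i-th diagonal of H^(LLR)
        sse += (y[i] - h_full @ y) ** 2
    penalty = n - trace_H + (n - k - 1) * (sse_max - sse) / sse_max
    return press / penalty
```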

3. Application

In the first part of this Section, a single response and two multiple response problems are used in order to compare the performance of the proposed locally adaptive bandwidths selector and the locally adaptive selector of [10]. The aim of the comparisons is to validate the bandwidths selector that has more capacity to reduce the curse of dimensionality in the LLR model applied to RSM data. The goodness-of-fit statistics used for comparison include the Sum of Squared Errors (SSE), the Mean Squared Error (MSE), the Coefficient of Determination (R2), the Adjusted Coefficient of Determination (RAdj2), the PRESS** criterion given in (7), the PRESS $= \sum_{i=1}^{n}(y_i - \hat{y}_{i,-i}^{(.)})^{2}$, and PRESS* = PRESS/(n − trace(H(.))), where $\hat{y}_{i,-i}^{(.)}$ is the leave-one-out estimate of yi, trace(H(.)) is the trace of the Hat matrix, and (.) refers to any of the regression models, OLS or LLR. In the second part, we use simulated data to further compare the respective performances.

The SSE and MSE indicate how close the estimated responses are to their observed values. A measure of the amount of variability present in the data that is explained by the regression model is given by R2 and RAdj2. PRESS, PRESS* and PRESS** give a measure of the model's predictive accuracy.

The results from the LLR that utilizes a fixed bandwidth, the LLR that utilizes the locally adaptive bandwidths of [10], and the LLR that utilizes the proposed locally adaptive bandwidths selector are designated LLRFB, LLRAB and LLRPAB, respectively, where the subscript PAB stands for Proposed Adaptive Bandwidths selector.

3.1. Single response chemical process data

The problem of the study, as given in [10,27,29], was to relate chemical yield (y) to temperature (x1) and time (x2), with the aim of maximizing the chemical yield. The data, obtained using the Central Composite Design (CCD), are given in Table 3.

Table 3.

Single response chemical process data generated from the Central Composite Design.

i x1 x2 y
1 −1 −1 88.55
2 1 −1 85.80
3 −1 1 86.29
4 1 1 80.44
5 −1.414 0 85.50
6 1.414 0 85.39
7 0 −1.414 86.22
8 0 1.414 85.70
9 0 0 90.21
10 0 0 90.85
11 0 0 91.31

Source: see [27].

3.2. Transformation of data from Central Composite Design

Following nonparametric regression procedures in RSM, the values of the explanatory variables are coded between 0 and 1. The data collected via a CCD is transformed by a mathematical relation:

$$x_{new} = \frac{Min(x_{old}) - x_{0}}{Min(x_{old}) - Max(x_{old})} \tag{30}$$

where xnew is the transformed value, x0 is the value to be transformed from the vector of old coded values, represented as xold, and Min(xold) and Max(xold) are the minimum and maximum values in the vector xold, respectively [27].

The natural or coded variables in Table 3 can be transformed to explanatory variables in Table 4 using Equation (30)

Table 4.

The transformed single response chemical process data.

i x1 x2 y
 1 0.1464 0.1464 88.55
 2 0.8536 0.1464 85.80
 3 0.1464 0.8536 86.29
 4 0.8536 0.8536 80.44
 5 0.0000 0.5000 85.50
 6 1.0000 0.5000 85.39
 7 0.5000 0.0000 86.22
 8 0.5000 1.0000 85.70
 9 0.5000 0.5000 90.21
 10 0.5000 0.5000 90.85
 11 0.5000 0.5000 91.31

Source: see [27].

Target points needed to be transformed for location 1 under the coded variables are given below:

Target points x0: −1, −1; Min(xold): −1.414, −1.414; Max(xold): 1.414, 1.414.

$$x_{new} = \frac{Min(x_{old}) - x_{0}}{Min(x_{old}) - Max(x_{old})}$$

Explanatory variable x1: $x_{11} = \dfrac{-1.414 - (-1)}{(-1.414) - (1.414)} = 0.1464$

Explanatory variable x2: $x_{12} = \dfrac{-1.414 - (-1)}{(-1.414) - (1.414)} = 0.1464$

Target points needed to be transformed for location 2 under the coded variables are given below:

Target points x0: 1, −1; Min(xold): −1.414, −1.414; Max(xold): 1.414, 1.414.

$$x_{new} = \frac{Min(x_{old}) - x_{0}}{Min(x_{old}) - Max(x_{old})}$$

Explanatory variable x1: $x_{21} = \dfrac{-1.414 - (1)}{(-1.414) - (1.414)} = 0.8536$

Explanatory variable x2: $x_{22} = \dfrac{-1.414 - (-1)}{(-1.414) - (1.414)} = 0.1464$

Target points needed to be transformed for location 6 under the coded variables are given below:

Target points x0: 1.414, 0; Min(xold): −1.414, −1.414; Max(xold): 1.414, 1.414.

$$x_{new} = \frac{Min(x_{old}) - x_{0}}{Min(x_{old}) - Max(x_{old})}$$

Explanatory variable x1: $x_{61} = \dfrac{-1.414 - (1.414)}{(-1.414) - (1.414)} = 1.0000$

Explanatory variable x2: $x_{62} = \dfrac{-1.414 - (0)}{(-1.414) - (1.414)} = 0.5000$

Repeating the process up to location 11, then we obtain the entries for explanatory variables x1 and x2, respectively, in Table 4.

The explanatory variables are coded between 0 and 1 by a mathematical relation as given in equation (30). Thus, the transformed data is given in Table 4.
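A minimal sketch of Equation (30), applied column-wise to the coded levels of Table 3, reproduces the transformed values of Table 4:

```python
import numpy as np

def transform_ccd(X_coded):
    """Map coded CCD levels to [0, 1] column-wise via Eq. (30):
    x_new = (Min(x_old) - x0) / (Min(x_old) - Max(x_old))."""
    lo = X_coded.min(axis=0)
    hi = X_coded.max(axis=0)
    return (lo - X_coded) / (lo - hi)

# Coded levels of x1 and x2 from Table 3
X_coded = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
                    [-1.414, 0], [1.414, 0], [0, -1.414], [0, 1.414],
                    [0, 0], [0, 0], [0, 0]])
print(np.round(transform_ccd(X_coded), 4))   # rows match Table 4: 0.1464, 0.8536, ...
```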

The proposed locally adaptive optimal bandwidths are presented in Table 5 and the goodness-of-fit statistics are presented in Table 6.

Table 5.

Proposed optimal tuning parameters and optimal locally adaptive bandwidths for the single response chemical process data.

i x1 x2
  T11=1.3151 T12=1.4134
  T21=2.9740 T22=1.0412
  bi1 bi2
1 0.2672 0.1826
2 0.0597 0.1826
3 0.2672 0.1446
4 0.0597 0.1446
5 0.3288 0.0006
6 0.0353 0.0006
7 0.1448 0.3534
8 0.1448 0.2996
9 0.1448 0.0006
10 0.1448 0.0006
11 0.1448 0.0006

Table 6.

Comparison of the goodness-of-fit statistics of each method for the single response chemical process data.

Method b DFerror MSE SSE R2 Radj2 PRESS PRESS* PRESS**
OLS 5.000 3.1600 15.8182 0.8388 0.6777 109.5179 21.9036 21.9036
LLRFB 0.5200 5.6509 5.7000 32.2355 0.6717 0.4190 93.2835 16.5076 8.9508
LLRAB * 2.9261 0.5974 1.7481 0.9822 0.9391 46.0765 15.7467 4.2858
LLRPAB * 2.0537 0.3947 0.8106 0.9917 0.9598 45.2734 22.0443 4.5398

Generally, the results in Table 6 show that the LLRPAB performs better than OLS, LLRFB and LLRAB in terms of the SSE, MSE, R2, Radj2 and PRESS statistics, whereas LLRAB performs better than OLS, LLRFB and LLRPAB in terms of PRESS* and PRESS**. In Table 6, '*' in the b column indicates that the AB of [10] and the PAB use locally adaptive bandwidths rather than a single fixed bandwidth.

Figure 4 shows the residual plots for the different models and, on average, the LLRPAB gives the best representation of how well the regression model estimates f as given in Equation (1).

Figure 4. Graph of Model Residuals for Single Response Chemical Process Data.

From Table 7, LLRPAB provides the best chemical yield over OLS, LLRFB and LLRAB, and its settings of the two explanatory variables give the best process satisfaction.

Table 7.

Comparison of optimization results for the single chemical process data.

Approach x1 x2 y^
OLS 0.43930 0.43610 90.9780
LLRFB 0.40140 0.39438 88.3509
LLRAB 0.40771 0.42312 91.1278
LLRPAB 0.7272 0.5000 92.6823

3.3. The multiple response chemical process data

 This problem is analyzed in [17,18]. The aim of the study is to get the setting of the explanatory variables x1 and x2 (representing reaction time and temperature, respectively) that would simultaneously optimize three quality measures of a chemical solution y1, y2, and y3 (representing yield, viscosity, and molecular weight, respectively).

Based on the process requirements a CCD was conducted to establish the design experiment and observed responses as presented in Table 8.

Table 8.

Designed experiment and response values for the multi-response chemical process data [17,18].

  Experimentalvariables Responses
i x1 x2 y1 y2 y3
1 −1 –1 76.5 62 2940
2 1 –1 78.0 66 3680
3 −1 1 77.0 60 3470
4 1 1 79.5 59 3890
5 −1.414 0 75.6 71 3020
6 1.414 0 78.4 68 3360
7 0 –1.414 77.0 57 3150
8 0 1.414 78.5 58 3630
9 0 0 79.9 72 3480
10 0 0 80.3 69 3200
11 0 0 80.0 68 3410
12 0 0 79.7 70 3290
13 0 0 79.8 71 3500

The values of the explanatory variables are transformed by the relation in Equation (30) coded between 0 and 1 as given in Table 9.

Table 9.

The transformed multiple response chemical process data.

i x1 x2 y1 y2 y3
1 0.1464 0.1464 76.5 62 2940
2 0.8536 0.1464 78.0 66 3680
3 0.1464 0.8536 77.0 60 3470
4 0.8536 0.8536 79.5 59 3890
5 0.0000 0.5000 75.6 71 3020
6 1.0000 0.5000 78.4 68 3360
7 0.5000 0.0000 77.0 57 3150
8 0.5000 1.0000 78.5 58 3630
9 0.5000 0.5000 79.9 72 3480
10 0.5000 0.5000 80.3 69 3200
11 0.5000 0.5000 80.0 68 3410
12 0.5000 0.5000 79.7 70 3290
13 0.5000 0.5000 79.8 71 3500

The process requirements for each response are as follows:

  • Maximize y1 with lower limit L = 78.5 and target value ∅ = 80;

  • y2 should take a value in the range L = 62 to U = 68 with target value ∅ = 65;

  • Minimize y3 with upper limit U = 3300 and target value ∅ = 3100.

The real values of the explanatory variables are transformed to values in the interval [0, 1] by the mathematical relation in Equation (30). This is a standard procedure for nonparametric regression models, see [29,33]. The data is presented in Table 9. The OLS is applied to get estimates of the parameters of a full second-order polynomial model specified for the three responses.

The optimal values of the tuning parameters of both the proposed bandwidths selector and the selector of [10] for each response are presented in Table 10. Table 11 presents the optimal bandwidths from each bandwidths selector. Table 12 presents the goodness of fits.

Table 10.

Optimal values of tuning parameters of the proposed locally adaptive bandwidths selector and [10] for the multiple response chemical process data.

  Proposed tuning parameters for locally adaptive bandwidth selector Edionwe et al. (2016)
  T11 T21 T12 T22 b C N
y1 1.0031 5.9998 1.2694 1.0541 0.5123 0.0797 6.0399
y2 1.5240 4.4808 0.4149 7.6721 0.4847 0.0959 2.4438
y3 0.9987 3.2763 0.6603 3.6044 1.0000 0.0896 4.8181

Table 11.

Optimal locally adaptive bandwidths from each selector for the multiple response chemical process data.

  Proposed locally adaptive bandwidths selector Edionwe et al. (2016) Bandwidths selector
  y1 y2 y3 y1 y2 y3
i bi1( xi1) bi2( xi2) bi1( xi1) bi2( xi2) bi1( xi1) bi2( xi2)      
1 0.2269 0.1655 0.3328 0.0960 0.2070 0.1393 0.4041 0.1106 0.6669
2 0.1284 0.1655 0.1460 0.0960 0.0573 0.1393 0.2781 0.0881 0.1755
3 0.2269 0.1218 0.3328 0.0627 0.2070 0.0457 0.3621 0.1219 0.3149
4 0.1284 0.1218 0.1460 0.0627 0.0573 0.0457 0.1521 0.1276 0.0360
5 0.2508 0.0008 0.3810 0.0784 0.2497 0.0862 0.4797 0.0599 0.6138
6 0.1115 0.0008 0.1168 0.0784 0.0379 0.0862 0.2445 0.0768 0.3880
7 0.1741 0.3174 0.2299 0.1037 0.1205 0.1651 0.3621 0.1389 0.5275
8 0.1741 0.2555 0.2299 0.0567 0.1205 0.0327 0.2361 0.1332 0.2087
9 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.1185 0.0542 0.3083
10 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.0849 0.0712 0.4943
11 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.1101 0.0768 0.3548
12 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.1353 0.0655 0.4345
13 0.1741 0.0008 0.2299 0.0784 0.1205 0.0862 0.1269 0.0599 0.2950

Table 12.

Model goodness of fits statistics for the multi-response chemical process data.

Response Model DF PRESS** PRESS SSE MSE R2(%) RAdj2(%)
y1 OLS 7.0000 0.3361 2.3525 0.4962 0.0709 98.27 97.04
  LLRFB 7.4717 0.5686 8.4888 4.7536 0.6362 83.46 73.44
  LLRAB 4.7777 0.2063 3.0144 0.3103 0.0649 98.92 97.29
  LLRPAB 4.0144 0.0481 0.6687 0.2165 0.0539 99.25 97.75
y2 OLS 7.0000 28.8726 202.1082 36.2242 5.1749 89.98 82.81
  LLRFB 7.2576 22.0691 330.8149 80.2383 11.0558 77.79 63.27
  LLRAB 4.0000 9.2024 126.2331 10.0000 2.5000 97.23 91.70
  LLRPAB 4.0009 8.8531 121.4495 10.0000 2.4994 97.23 91.70
y3 OLS 7.0000 159,080 1,113,600 207,870 29,696 75.90 58.68
  LLRFB 9.2798 56,513 588,010 243,460 26,235 71.77 63.50
  LLRAB 5.8380 40,779 508,170 92,621 15,865 89.26 77.93
  LLRPAB 4.0000 26504 307,560 65,720 16,430 92.38 77.14

The results presented in Table 12 show that LLRPAB, either exclusively or jointly, gives the best results in terms of all the statistics for y1 and y2. For y3, the LLRPAB gives the best results in four out of the seven statistics for comparison. Interestingly, LLRPAB gives the best PRESS** and PRESS for all the responses; see Figure 5.

Figure 5. Graphs of model residuals for the multiple response chemical process data.

Figure 5 shows that the y2 residuals of both the LLRAB and LLRPAB overlap, while those from LLRPAB, for the most part, lie closer to the zero residual line than those from the existing models for y1 and y3. Furthermore, quite unlike the curves of the existing models, we observe that approximately the same number of residuals lie above and below the zero residual line in all the LLRPAB curves. This is indicative of the fact that LLRPAB gives curves of best fit.

The optimization solutions in Table 13 show that LLRPAB provides the settings of the explanatory variables that give the highest desirability measure.

Table 13.

Model optimal solution based on the Desirability function for multi-response chemical process data.

Model x1 x2 y^1 y^2 y^3 d1 d2 d3 D(%)
OLS 0.4449 0.2226 78.7616 66.4827 3229.9 0.1744 0.5058 0.3504 31.3800
LLRFB 0.4481 0.3709 78.5537 66.7908 3290.8 0.0358 0.4031 0.0461 8.7200
LLRAB 0.5155 0.3467 78.6965 65.0328 3285.9 0.1310 0.9891 0.0703 20.8837
LLRPAB 1.0000 0.6472 79.6033 64.0137 3212.7 0.7355 0.6712 0.4367 59.9647

3.4. The Minced Fish Quality Data

The Minced Fish Quality Data is presented in [31,33]. The problem seeks the setting of three explanatory variables x1 (washing temperature), x2 (washing time) and x3 (washing ratio of water volume to sample weight) that would optimize four aspects of quality of minced fish, namely, springiness ( y1), thiobarbituric acid number ( y2), cooking loss ( y3), and whiteness index ( y4).

Based on the process requirements, a CCD was conducted to establish the design experiment and observed responses as presented in Table 14.

Table 14.

The Minced Fish Quality Data generated through CCD [33].

  Coded levels        
i x1 x2 x3 y1 y2 y3 y4
1 −1 −1 −1 1.83 29.31 29.50 50.36
2 1 −1 −1 1.73 39.32 19.40 48.16
3 −1 1 −1 1.85 25.16 25.70 50.72
4 1 1 −1 1.67 40.18 27.10 49.69
5 −1 −1 1 1.86 29.82 21.40 50.09
6 1 −1 1 1.77 32.20 24.00 50.61
7 −1 1 1 1.88 22.01 19.60 50.36
8 1 1 1 1.66 40.02 25.10 50.42
9 −1.682 0 0 1.81 33.00 24.20 29.31
10 1.682 0 0 1.37 51.59 30.60 50.67
11 0 −1.682 0 1.85 20.35 20.90 48.75
12 0 1.682 0 1.92 20.53 18.90 52.70
13 0 0 −1.682 1.88 23.85 23.00 50.19
14 0 0 1.682 1.90 20.16 21.20 50.86
15 0 0 0 1.89 21.72 18.50 50.84
16 0 0 0 1.88 21.21 18.60 50.93
17 0 0 0 1.87 21.55 16.80 50.98

The values of the explanatory variables are transformed by the relation in Equation (30) which is coded between 0 and 1 as given in Table 15.

Table 15.

The transformed Minced Fish Quality Data [33].

i x1 x2 x3 y1 y2 y3 y4
1 0.2030 0.2030 0.2030 1.83 29.31 29.50 50.36
2 0.7970 0.2030 0.2030 1.73 39.32 19.40 48.16
3 0.2030 0.7970 0.2030 1.85 25.16 25.70 50.72
4 0.7970 0.7970 0.2030 1.67 40.18 27.10 49.69
5 0.2030 0.2030 0.7970 1.86 29.82 21.40 50.09
6 0.7970 0.2030 0.7970 1.77 32.20 24.00 50.61
7 0.2030 0.7970 0.7970 1.88 22.01 19.60 50.36
8 0.7970 0.7970 0.7970 1.66 40.02 25.10 50.42
9 0.0000 0.5000 0.5000 1.81 33.00 24.20 29.31
10 1.0000 0.5000 0.5000 1.37 51.59 30.60 50.67
11 0.5000 0.0000 0.5000 1.85 20.35 20.90 48.75
12 0.5000 1.0000 0.5000 1.92 20.53 18.90 52.70
13 0.5000 0.5000 0.0000 1.88 23.85 23.00 50.19
14 0.5000 0.5000 1.0000 1.90 20.16 21.20 50.86
15 0.5000 0.5000 0.5000 1.89 21.72 18.50 50.84
16 0.5000 0.5000 0.5000 1.88 21.21 18.60 50.93
17 0.5000 0.5000 0.5000 1.87 21.55 16.80 50.98

The process requirements for each response given in [33] are as follows:

  • Maximize y1 with lower bound L = 1.70 and target value ∅ = 1.92;

  • Minimize y2 with target value ∅ = 20.16 and upper bound U = 21.00;

  • Minimize y3 with target value ∅ = 16.80 and upper bound U = 20.00;

  • Maximize y4 with lower bound L = 45.00 and target value ∅ = 50.98.

The polynomials specified for the response variables y1 and y4 include the intercept, x1 and x1². The one specified for y2 includes the intercept, x1, x2, x1², and x1x2, and for y3 we have the intercept, x1, x2, x3, x1², x1x2, x1x3, and x3². The OLS is used to get the estimates of the parameters of these polynomials.

The optimal values of the tuning parameters of both the proposed bandwidths selector and the selector of [10] for each response are presented in Table 16. Table 17 presents the optimal bandwidths from each of the bandwidths selectors. The models' goodness of fits is presented in Table 18.

Table 16.

Optimal values of the tuning parameters of the proposed bandwidths selector and [10] for the Minced Fish Quality Data.

  Proposed tuning parameters for locally adaptive bandwidth selector Edionwe et al. (2016)
  T11 T21 T12 T22 T13 T23 b C N
y1 0.6575 0.8448 0.1463 0.8441 9.3384
y2 0.8896 4.2042 4.0000 3.2728 0.4363 0.0000 7.8641
y3 1.2054 4.7643 1.5314 4.7377 1.3423 5.1462 0.5371 0.0841 14.4996
y4 0.7504 2.9767 0.1197 0.5210 10.6354

Table 17.

Optimal locally adaptive bandwidths from each selector in the Minced Fish Quality Data.

  Proposed locally adaptive bandwidths selector Edionwe et al. (2016) Bandwidths selector
  y1 y2 y3 y4        
i bi1(xi1) bi1(xi1) bi2(xi2) bi1(xi1) bi2(xi2) bi3(xi3) bi1(xi1) y1 y2 y3 y4
1 0.0443 0.1815 0.7673 0.2522 0.3200 0.2847 0.1399 0.0803 0.2044 0.1337 0.0747
2 0.1293 0.0857 0.7673 0.1334 0.3200 0.2847 0.0405 0.0806 0.2742 0.6098 0.0751
3 0.0443 0.1815 0.2631 0.2522 0.1686 0.2847 0.1399 0.0802 0.1755 0.3128 0.0746
4 0.1293 0.0857 0.2631 0.1334 0.1686 0.2847 0.0405 0.0808 0.2802 0.2468 0.0748
5 0.0443 0.1815 0.7673 0.2522 0.3200 0.1599 0.1399 0.0802 0.2080 0.5155 0.0747
6 0.1293 0.0857 0.7673 0.1334 0.3200 0.1599 0.0405 0.0805 0.2246 0.3929 0.0746
7 0.0443 0.1815 0.2631 0.2522 0.1686 0.1599 0.1399 0.0801 0.1535 0.6003 0.0747
8 0.1293 0.0857 0.2631 0.1334 0.1686 0.1599 0.0405 0.0808 0.2791 0.3411 0.0746
9 0.1644 0.2224 0.4823 0.3013 0.2383 0.2178 0.1876 0.0803 0.2301 0.3835 0.0787
10 0.3074 0.0611 0.4823 0.1014 0.2383 0.2178 0.0202 0.0818 0.3598 0.0818 0.0746
11 0.0055 0.1292 1.0000 0.1881 0.3829 0.2178 0.0827 0.0802 0.1419 0.5391 0.0750
12 0.0055 0.1292 0.1512 0.1881 0.1278 0.2178 0.0827 0.0800 0.1432 0.6333 0.0742
13 0.0055 0.1292 0.4823 0.1881 0.2383 0.3356 0.0827 0.0801 0.1663 0.4401 0.0747
14 0.0055 0.1292 0.4823 0.1881 0.2383 0.1254 0.0827 0.0800 0.1406 0.5249 0.0746
15 0.0055 0.1292 0.4823 0.1881 0.2383 0.2178 0.0827 0.0801 0.1515 0.6522 0.0746
16 0.0055 0.1292 0.4823 0.1881 0.2383 0.2178 0.0827 0.0801 0.1479 0.6475 0.0745
17 0.0055 0.1292 0.4823 0.1881 0.2383 0.2178 0.0827 0.0801 0.1503 0.7323 0.0745

Table 18.

Model goodness of fits statistics for the Minced Fish Quality Data.

Response Model DF PRESS** PRESS SSE MSE R2(%) RAdj2(%)
y1 OLS 14.0000 0.0042 0.0582 0.0231 0.0017 92.13 91.00
LLRFB 12.1398 0.0026 0.0681 0.0126 0.0010 95.70 94.33
LLRAB 12.0000 0.0008 0.0216 0.0123 0.0010 95.79 94.39
LLRPAB 12.0000 0.0019 0.0491 0.0123 0.0010 95.79 94.39
y2 OLS 12.0000 19.5097 234.1166 90.9033 7.5753 93.39 91.18
LLRFB 11.2152 36.4407 786.71166 245.3568 21.8771 82.15 74.53
LLRAB 8.1282 16.7007 359.9569 38.7168 4.7633 97.18 94.45
LLRPAB 8.2177 7.4867 162.1354 37.8103 4.6011 97.25 94.64
y3 OLS 9.0000 20.2719 182.4468 41.1338 4.5704 84.06 71.66
LLRFB 8.3794 17.0573 287.0907 82.1622 9.8053 68.16 39.21
LLRAB 5.8585 11.5001 203.8490 20.4613 3.4926 92.07 78.35
LLRPAB 2.0443 8.0901 120.7925 2.0489 1.0023 99.21 93.79
y4 OLS 14.0000 48.9101 684.7407 198.8048 14.2003 54.13 47.57
  LLRFB 12.0308 17.1477 454.5609 12.2623 1.0193 97.17 96.24
LLRAB 12.0000 14.0842 372.9912 12.1387 1.0116 97.20 96.27
LLRPAB 12.0001 8.8590 234.6134 12.1387 1.0116 97.20 96.27

From the results in Table 18, we observe that the LLRPAB performs quite as well as the LLRFB and the LLRAB in both y1 and y4. This is due to the fact that both y1 and y4 involve a single explanatory variable, x1. However, LLRPAB outperforms the OLS, LLRFB and LLRAB in y2 and y3, which depend on two and three explanatory variables, respectively. Again, LLRPAB gives the best PRESS** and PRESS in three out of the four responses, coming a close second in y1.

The results in Table 19 show that LLRPAB provides the setting of the explanatory variables that gives the highest desirability measure of 100%; see Figure 6.

Table 19.

Model optimal solution via the Desirability function in the Minced Fish Quality Data.

Model x1 x2 x3 y^1 y^2 y^3 y^4 d1 d2 d3 d4 D(%)
OLS 0.3764 1.0000 0.7155 1.9071 19.4993 17.2185 50.3018 0.9415 1.00 0.8692 0.8866 92.29
LLRFB 0.8078 0.2375 0.9573 1.6877 36.7371 24.7076 49.7628 0.000 0.00 0.0000 0.7965 0.00
LLRAB 0.4318 1.0000 0.5673 1.8775 18.9436 19.6005 50.6611 0.8068 1.00 0.1248 0.9467 55.57
LLRPAB 0.5711 0.4481 0.6094 2.0825 20.0918 16.7583 51.0266 1.0000 1.00 1.0000 1.0000 100.00

Figure 6. Graphs of model residuals for the multiple response Minced Fish Quality Data.

The plots in Figure 6 show that the LLRPAB and LLRAB residuals for both y1 and y4 overlap. However, for y2 and y3, the LLRPAB residuals lie closer to the zero residual line than those from the existing models. Again, for all the LLRPAB curves, we observe that approximately the same number of residuals lie below and above the zero residual line, indicative of the fact that LLRPAB gives curves of best fit.

3.5. Simulation study

In the examples given in Sections 3.1, 3.3 and 3.4, it was shown that the goodness of fits and the optimal solutions of the LLRPAB were either better than or highly competitive with those from the OLS, LLRFB and LLRAB. In this subsection, we compare the performances of the respective regression models via simulated data. Each Monte Carlo simulation comprises 1000 data sets based on the following underlying polynomial models:

Model 1: $y_i = 70 + 12x_{1i} - 24x_{1i}^{2} + \gamma\{3\sin(3\pi x_{1i})\} + \varepsilon_i$;

Model 2: $y_i = 33 - 31x_{1i} + 20x_{1i}^{2} - \gamma\{2\cos(4\pi x_{1i})\} + \varepsilon_i$;

Model 3: $y_i = 20 - 10x_{1i} - 25x_{2i} - 15x_{1i}x_{2i} + 20x_{1i}^{2} + 50x_{2i}^{2} + \gamma\{2\sin(4\pi x_{1i}) + 2\cos(4\pi x_{2i}) - 2\sin(4\pi x_{1i}x_{2i})\} + \varepsilon_i$;

Model 4: $y_i = 66 + 22x_{1i} + 10x_{2i} + 13x_{1i}x_{2i} - 23x_{1i}^{2} - 25x_{2i}^{2} + \gamma\{2\sin(3\pi x_{1i}) - 2\cos(3\pi x_{2i}) + 2\sin(2\pi x_{1i}x_{2i})\} + \varepsilon_i$;

Model 5: $y_i = 45 + 27x_{1i} + 9x_{2i} + 19x_{3i} - 22x_{1i}x_{2i} - 17x_{2i}x_{3i} - 8x_{1i}x_{3i} + 10x_{1i}^{2} + 13x_{2i}^{2} + 13x_{3i}^{2} + \gamma\big(2\sin(3\pi x_{1i}) - 2\cos(3\pi x_{2i}) - 3\cos(4\pi x_{3i}) + 2\sin(3\pi x_{1i}x_{2i}) + 2\cos(3\pi x_{2i}x_{3i}) + 2\sin(3\pi x_{1i}x_{3i})\big) + \varepsilon_i$;

Model 6: $y_i = 83 + 19x_{1i} - 41x_{2i} - 14x_{3i} - 36x_{1i}x_{2i} - 15x_{2i}x_{3i} + 28x_{1i}x_{3i} + 15x_{1i}^{2} + 25x_{2i}^{2} - 11x_{3i}^{2} - \gamma\big(2\sin(4\pi x_{1i}) - 2\cos(13\pi x_{2i}) + 2\sin(2\pi x_{3i}) + 2\sin(3\pi x_{1i}x_{2i}) + 3\cos(4\pi x_{2i}x_{3i}) - 5\cos(2\pi x_{1i}x_{3i})\big) + \varepsilon_i$,

where the x1i, x2i and x3i are the values of the explanatory variables, εi, i = 1, 2, …, n, are the error terms which are normally distributed with mean zero and variance 1, and γ represents a misspecification parameter. The values of the explanatory variables are presented in Tables 20 and 21.
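As an illustration of how a single Monte Carlo replicate could be generated, the sketch below simulates one data set from Model 4 at the CCD points of Table 20 with standard normal errors; the seed and function name are our own choices.

```python
import numpy as np

rng = np.random.default_rng(2022)

def simulate_model4(X, gamma):
    """One simulated data set from Model 4 with N(0, 1) errors."""
    x1, x2 = X[:, 0], X[:, 1]
    f = (66 + 22*x1 + 10*x2 + 13*x1*x2 - 23*x1**2 - 25*x2**2
         + gamma*(2*np.sin(3*np.pi*x1) - 2*np.cos(3*np.pi*x2)
                  + 2*np.sin(2*np.pi*x1*x2)))
    return f + rng.standard_normal(len(x1))

# CCD points of Table 20 (the last five runs are the centre points)
X_ccd = np.array([[0.8536, 0.8536], [0.1464, 0.8536], [0.8536, 0.1464], [0.1464, 0.1464],
                  [1.0, 0.5], [0.0, 0.5], [0.5, 1.0], [0.5, 0.0]] + [[0.5, 0.5]] * 5)
y_sim = simulate_model4(X_ccd, gamma=0.5)   # one of the 1000 Monte Carlo data sets
```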

Table 20.

The CCD for the Simulating Data for Models 1–4.

i x1 x2
1 0.8536 0.8536
2 0.1464 0.8536
3 0.8536 0.1464
4 0.1464 0.1464
5 1.0000 0.5000
6 0.0000 0.5000
7 0.5000 1.0000
8 0.5000 0.0000
9 0.5000 0.5000
10 0.5000 0.5000
11 0.5000 0.5000
12 0.5000 0.5000
13 0.5000 0.5000

Table 21.

The CCD for the Simulating Data for Models 5 and 6.

i x1 x2 x3
1 0.2030 0.2030 0.2030
2 0.7970 0.2030 0.2030
3 0.2030 0.7970 0.2030
4 0.7970 0.7970 0.2030
5 0.2030 0.2030 0.7970
6 0.7970 0.2030 0.7970
7 0.2030 0.7970 0.7970
8 0.7970 0.7970 0.7970
9 0.0000 0.5000 0.5000
10 1.0000 0.5000 0.5000
11 0.5000 0.0000 0.5000
12 0.5000 1.0000 0.5000
13 0.5000 0.5000 0.0000
14 0.5000 0.5000 1.0000
15 0.5000 0.5000 0.5000
16 0.5000 0.5000 0.5000
17 0.5000 0.5000 0.5000

The goal of the simulation study is to demonstrate the performance of each of the regression models when applied to studies that consist of one, two, or three explanatory variables, respectively. The Average Sum of Squared Errors (AVESSE) of each model for each degree of model misspecification is presented in Table 22.

Table 22.

Comparison of the AVESSE of each method for each model in the simulation studies.

Model γ OLS LLRFB LLRAB LLRPAB
(1) 0.00 9.8961 8.3371 8.3220 8.3133
  0.50 22.5001 8.4606 8.4105 8.4204
  1.00 48.7471 8.4817 8.4120 8.4310
(2) 0.00 9.8769 8.1392 8.2887 8.2679
  0.50 16.2334 8.4989 8.2899 8.2973
  1.00 30.5292 9.4051 9.1337 8.9398
(3) 0.00 6.9849 68.9816 6.3277 4.0700
  0.50 18.0887 61.6146 14.4455 4.7940
  1.00 51.0910 99.0211 15.1152 5.1912
(4) 0.00 7.0210 34.0919 13.6632 4.0198
  0.50 13.7667 41.8323 20.9044 7.0169
  1.00 39.1912 72.1624 38.9560 8.9640
(5) 0.00 7.0113 28.9237 6.2117 5.8945
  0.50 125.2006 254.4773 12.5466 6.3215
  1.00 479.6291 747.5212 71.8911 26.041
(6) 0.00 7.2458 37.5407 7.9100 4.7715
  0.50 44.1519 64.3340 12.1213 5.8219
  1.00 155.2220 173.5006 22.1993 8.8906

The values of the AVESSE of the LLRFB, LLRAB and LLRPAB for models 1 and 2 are approximately the same but better than the AVESSE of the OLS model across all the degrees of model misspecification. For models 3 through 6, where the curse of dimensionality is most intense, LLRPAB gives the best AVESSE. Furthermore, while the AVESSE of the LLRPAB is fairly stable as γ increases from 0 to 1 across models 3 through 6, the AVESSE of the OLS, the LLRFB and the LLRAB deteriorates quite rapidly.

3.6. Neural network computing and application

The application of neural networks cuts across multidisciplinary studies ranging from neuroscience to theoretical statistical physics. More importantly, one of the most significant theoretical and applied topics in neural networks and computing is the choice of adequate activation functions, because they capture nonlinearity in the data and hence form the core of both deep and shallow learning with different architectures [21].

We give the mathematical form of the SPOCU activation function as presented in the literature and apply it to the three RSM data sets (the single response chemical process data, the multiple response chemical process data, and the Minced Fish Quality Data):

3.6.1. Scaled polynomial constant unit (SPOCU) activation function

The SPOCU activation function is given by;

$$s(x) = \alpha h\left(\frac{x}{\gamma} + \beta\right) - \alpha h(\beta), \quad \text{where } \beta \in (0, 1),\ \alpha, \gamma > 0, \tag{31}$$

and the generator:

$$h(x) = \begin{cases} r(c), & x \ge c \\ r(x), & x \in [0, c) \\ 0, & x < 0 \end{cases} \tag{32}$$

with $r(x) = x^{3}(x^{5} - 2x^{4} + 2)$ and $1 \le c < \infty$; as c tends to infinity, so does r(c), see [21].
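A minimal NumPy sketch of Equations (31)–(32) follows; the parameter values α, β, γ and c are illustrative assumptions rather than the settings used in the computations reported below.

```python
import numpy as np

def spocu(x, alpha=1.0, beta=0.5, gamma=1.0, c=2.0):
    """SPOCU activation s(x) = alpha*h(x/gamma + beta) - alpha*h(beta), Eqs. (31)-(32)."""
    r = lambda u: u**3 * (u**5 - 2 * u**4 + 2)
    def h(u):
        u = np.asarray(u, dtype=float)
        return np.where(u >= c, r(c), np.where(u >= 0.0, r(u), 0.0))
    return alpha * h(x / gamma + beta) - alpha * h(beta)

x = np.linspace(-2.0, 2.0, 5)
print(np.round(spocu(x), 4))   # SPOCU evaluated on a small grid (illustrative parameters)
```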

The performance statistics obtained from the activation functions ReLU, Leaky-ReLU, SELU, and SPOCU were all adequate. However, the SPOCU activation function shows the most satisfactory results in terms of a smaller mean squared error (MSE) than ReLU, Leaky-ReLU, and SELU; see Tables 23, 24 and 25 and Figures 7, 8 and 9, respectively.

Figure 7. Graph of Model Loss function (MSE) via neural network computing for single response chemical process data (y1).

Figure 8. Graph of Model Loss function (MSE) via neural network computing for multi-response chemical process data.

Figure 9. Graph of Model Loss function (MSE) via neural network computing for multi-response Minced Fish Quality Data.

4. Conclusion

Quality is one of the most important factors that inform a consumer's preference for one product among several competing products. Consequently, improving the quality of a product is a key strategy that leads to business growth, enhanced competitiveness and huge returns on investment, see [6,27].

In the early stage of the design of a new product, research teams run experiments and build regression models in order to identify the setting of the explanatory variables that optimize responses related to the quality of the new product. This series of activities is referred to as product qualification in the manufacturing circles, see [25,26].

Once a product has been qualified, its recipe, which includes the identified optimal setting of the explanatory variables, is used to produce the product on a large scale for the intended consumers. The reliability of the optimal setting of the explanatory variables depends on how well the regression model fits the data, see [6,18]. A regression model that gives a relatively low Prediction Error Sum of Squares and a comparatively high R2 provides statistically more reliable optimal solutions, see [29,33].

In this paper, we proposed a new locally adaptive bandwidths selector for smoothing RSM data. The proposed bandwidth selector is applied in the LLR model for fitting simulated data and three problems in the literature. The results of the goodness of fits and optimal solutions obtained show that the LLR regression model utilizing the proposed bandwidths selector performs better than the OLS, the fixed bandwidth LLR, and the LLR that utilizes the locally adaptive bandwidths selected by the existing locally adaptive bandwidths selector proposed by [10].

Data consisting of two or three explanatory variables are commonplace in RSM. This creates a problem referred to as the curse of dimensionality for the LLR model which normally thrives in modeling data that involves only a single explanatory variable. However, the results from the three examples and the simulated data show that the LLR model benefits more from bandwidths selected by the proposed locally adaptive bandwidths selector that takes into account both the number and values of the explanatory variables at each data point than it does from bandwidths selected by fixed and the existing locally adaptive bandwidths selector.

 Neural network activation functions such as ReLU, Leaky-ReLU, SELU, and SPOCU were considered, with a remarkable improvement in the loss function (MSE) over the regression models utilized in the three RSM data sets. Among the four activation functions, SPOCU was shown to work satisfactorily on the variety of problems over ReLU, Leaky-ReLU, and SELU, see Tables 23, 24 and 25, respectively.

Table 23.

Loss function (MSE) via neural network computing for single response chemical process data.

Activation function Loss ( y1)
ReLU 0.0062
Leaky-ReLU 0.0062
SELU 0.0062
SPOCU 0.0062

Table 24.

Loss function (MSE) via neural network computing for multi-response chemical process data.

Activation function Loss ( y1) Loss ( y2) Loss ( y3)
ReLU 0.0329 0.0391 0.0762
Leaky-ReLU 0.0074 0.0277 0.0764
SELU 0.0086 0.0277 0.0762
SPOCU 0.0074 0.0277 0.0762

Table 25.

Loss function (MSE) via neural network computing for multi-response Minced Fish Quality Data.

Activation function Loss ( y1) Loss ( y2) Loss ( y3) Loss ( y4)
ReLU 0.000682 0.0186 0.0079 0.0000233
Leaky-ReLU 0.00073 0.0000981 0.0079 0.0000237
SELU 0.0102 0.000203 0.0083 0.0000259
SPOCU 0.000682 0.000148 0.0079 0.0000232

Acknowledgements

I am obliged to my PhD supervisor, Prof. J. I. Mbegbu for his tutelage. Thanks to Dr E. Edionwe for his relentless contributions.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Declaration of Interest statement

‘None’

References

  • 1.Adalarasan R., and Santhanakumar M., Response surface methodology and desirability analysis for optimizing μ WEDM parameters for A16351/20% Al2O2 composite. Int. J. ChemTech Res. 7 (2015), pp. 2625–2631. [Google Scholar]
  • 2.Alvarez M.J., Izarbe L., Viles E., and Tanco M., The use of genetic algorithm in response surface methodology. J. Qual. Technol. Quant. Manag. 6 (2009), pp. 295–309. [Google Scholar]
  • 3.Anderson-Cook C.M., and Prewitt K., Some guidelines for using nonparametric models for modeling data from response surface designs. J. Mod. Appl. Stat. Models 4 (2005), pp. 106–119. [Google Scholar]
  • 4.Atkeson C.G., Moore A.W., and Schaal S., Locally weighted learning. Artif. Intell. Rev. 11 (1997), pp. 11–73. [Google Scholar]
  • 5.Box G.E.P., and Wilson K.B., On the experimental attainment of optimum conditions. J. R. Stat. Soc. B 13 (1951), pp. 1–45. [Google Scholar]
  • 6.Castillo D.E., Process Optimization: A Statistical Method, Springer International Series in Operations Research and Management Science, New York, 2007. [Google Scholar]
  • 7.Chen Y., and Ye K., Bayesian hierarchical modelling on dual response surfaces in partially replicated designs. J. Qual. Technol. Quant. Manag. 6 (2009), pp. 371–389. [Google Scholar]
  • 8.Derringer G., and Suich R., Simultaneous optimization of several response variables. J. Qual. Technol. 12 (1980), pp. 214–219. [Google Scholar]
  • 9.Edionwe E., and Mbegbu J.I., Local bandwidths for improving the performance statistics of model robust regression 2. J. Mod. Appl. Stat. Methods. 13 (2014), pp. 506–527. [Google Scholar]
  • 10.Edionwe E., Mbegbu J.I., and Chinwe R., A new function for generating local bandwidths for semi–parametric MRR2 model in response surface methodology. J. Qual. Technol. 48 (2016), pp. 388–404. [Google Scholar]
  • 11.Fan J., and Gijbels I., Data-driven bandwidth selection in local polynomial fitting: A variable bandwidth and spatial adaptation. J. R. Stat. Soc. Ser. B 57 (1995), pp. 371–394. [Google Scholar]
  • 12.Fan J., and Gijbels I., Local Polynomial Modeling and its Applications, Chapman and Hall, London, 1996. [Google Scholar]
  • 13.Geenens G., Curse of dimensionality and related issues in nonparametric functional regression. Stat. Surv. 5 (2011), pp. 30–43. [Google Scholar]
  • 14.Hardle W., Muller M., Sperlich S., and Werwatz A., Nonparametric and Semiparametric Models: An Introduction, Springer-Verlag, Berlin, 2005. [Google Scholar]
  • 15.Harrington E.C., The desirability function. Ind. Qual. Control 21 (1965), pp. 494–498. [Google Scholar]
  • 16.Heredia-Langner A., Montgomery D.C., Carlyle W.M., and Borer C.M., Model robust optimal designs: A genetic algorithm method. J. Qual. Technol. 36 (2004), pp. 263–279. [Google Scholar]
  • 17.He Z., Wang J., Oh J., and Park S.H., Robust optimization for multiple responses using response surface methodology. Appl. Stoch. Models. Bus. Ind. 26 (2009), pp. 157–171. [Google Scholar]
  • 18.He Z., Zhu P.E., and Park S.H., A robust desirability function for multi-response surface optimization. Eur. J. Oper. Res. 221 (2012), pp. 241–247. [Google Scholar]
  • 19.Holland J., Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975. [Google Scholar]
  • 20.Johnson R.T., and Montgomery D.C., Choice of second-order response surface designs for logistics and Poisson regression models. Int. J. Exp. Des. Process Optim. 1 (2009), pp. 2–23. [Google Scholar]
  • 21.Kiselak J., Lu Y., Svihra J., Szepe P., and Stehlik M., “SPOCU”: scaled polynomial constant unit activation function. Neural Comput. Appl. (2020), doi: 10.1007/s00521-020-05182-1. [DOI] [Google Scholar]
  • 22.Kohler M.A., Schindler A., and Sperlich S., A review and comparison of bandwidth selection methods for kernel regression. Int. Stat. Rev. 82 (2014), pp. 243–274. [Google Scholar]
  • 23.Mays J.E., Birch J.B., and Starnes B.A., Model robust regression: Combining parametric, nonparametric, and semi-parametric models. J. Nonparametr. Stat. 13 (2001), pp. 245–277. [Google Scholar]
  • 24.Mondal A., and Datta A.K., Investigation of the process parameters using response surface methodology on the quality of crustless bread baked in a water-spraying oven. J. Food Process Eng 34 (2011), pp. 1819–1837. [Google Scholar]
  • 25.Montgomery D.C., Introduction to Statistical Quality Control. 7th Ed., John Wiley & Sons, New York, 2009. [Google Scholar]
  • 26.Myers R.H., Response surface methodology – Current status and future directions. J. Qual. Technol. 31 (1999), pp. 30–44. [Google Scholar]
  • 27.Myers R., Montgomery D.C., and Anderson-Cook C.M., Response Surface Methodology: Process and Product Optimization Using Designed Experiments, Wiley, Toronto, ON, 2009. [Google Scholar]
  • 28.Nadaraya E.A., On estimating regression. J. Theory Probab. Appl. 9 (1964), pp. 141–142. [Google Scholar]
  • 29.Pickle S.M., Robinson T.J., Birch J.B., and Anderson-Cook C.M., A semi-parametric model to robust parameter design. J. Stat. Plan. Inference. 138 (2008), pp. 114–131. [Google Scholar]
  • 30.Sestelo M., Villanueva N.M., Meira-Machado L., and Roca-Pardinas J., An R package for nonparametric estimation and inference in life sciences. J. Stat. Softw. 82 (2017), pp. 1–27. [Google Scholar]
  • 31.Shah K.H., Montgomery D.C., and Carlyle W.M., Response surface modelling and optimization in multi-response experiments using seemingly unrelated regressions. Qual. Eng. 16 (2004), pp. 387–397. [Google Scholar]
  • 32.Thongsook S., Borkowski J.J., and Budsaba K., Using a genetic algorithm to generate Ds – optimal designs with bounded D-efficiencies for mixture experiments. J. Thail. Stat. 12 (2014), pp. 191–205. [Google Scholar]
  • 33.Wan W., and Birch J.B., A semi-parametric technique for multi-response optimization. J. Qual. Reliab. Eng. Int. 27 (2011), pp. 47–59. [Google Scholar]
  • 34.Wu C.F.J., and Hamada M.S., Experiments: Planning, Analysis and Parameter Design Optimization, John Wiley & Sons, Inc, New York, 2000. [Google Scholar]
  • 35.Yeniay O., Comparative study of algorithm for response surface optimization. J. Math. Comput. Appl. 19 (2014), pp. 93–104. [Google Scholar]
  • 36.Zheng Q., Gallagher C., and Kulasekera K.B., Adaptively weighted kernel regression. J. Nonparametr. Stat. 25 (2013), pp. 855–872. [Google Scholar]
