Kernel principal component analysis (PCA) control chart for monitoring mixed non-linear variable and attribute quality characteristics

Muhammad Ahsan; Muhammad Mashuri; Hidayatul Khusna; Wibawati

Heliyon. 2022 Jun 6;8(6):e09590. doi: 10.1016/j.heliyon.2022.e09590


PMCID: PMC9189028 PMID: 35706944

Abstract

Products are commonly assessed by two types of quality characteristics. Variable characteristics are measured on a numerical scale, whereas attribute characteristics are measured on a categorical scale. In monitoring processes, multivariate variable quality characteristics may also have a nonlinear relationship. In this paper, the Kernel PCA control chart is applied to monitor mixed (attribute and variable) characteristics with a nonlinear relationship. First, the Average Run Length (ARL) is used to evaluate the performance of the proposed chart. The simulation studies show that the proposed chart can detect shifts in the process, and the Radial Basis Function (RBF) kernel performs consistently across the cases studied. Second, the proposed chart is compared with the conventional PCA Mix chart; the results show that the proposed chart performs better in detecting small shifts in the process. Finally, the proposed chart is applied to the well-known NSL-KDD dataset. It detects network intrusions with good accuracy, although it still produces more False Negatives (FN) than False Positives (FP).

Keywords: Kernel PCA, $T^{2}$ Hotelling's chart, Mixed quality characteristics, Kernel Density Estimation, Nonlinearity


1. Introduction

Two types of control charts have been developed according to the type of quality characteristic being monitored: attribute charts and variable charts. Variable control charts monitor variable quality characteristics measured on an interval or ratio scale, such as length, temperature, or height (Montgomery, 2009). Attribute charts, in contrast, monitor attribute quality characteristics measured on a categorical scale (Ahsan et al., 2018). When the quality characteristics are correlated or cannot be monitored separately, multivariate control charts are used. The three main types of multivariate variable control charts are Shewhart-type charts (such as Hotelling's $T^{2}$), multivariate exponentially weighted moving average (MEWMA) charts, and multivariate cumulative sum (MCUSUM) charts.

Product quality characteristics need not be monitored separately as attribute or variable characteristics; they can also be monitored jointly using a mixed scheme. Several works have developed control charts for such mixed characteristics. Aslam et al. (2015) proposed a mixed scheme that combines the $\overline{X}$ and np charts and performs well in monitoring mixed characteristics. This mixed chart was later compared with the Hybrid Exponential Weighted Moving Average (HEWMA) chart (Aslam et al., 2016). Wang et al. (2018) proposed a control chart based on the spatial-sign covariance matrix that integrates standardized ranks and spatial signs in calculating the mixed statistics. Furthermore, principal component analysis for mixed data (PCA Mix) has been applied to process monitoring (Ahsan et al., 2018) and to outlier detection (Ahsan et al., 2019). To overcome the drawbacks of the PCA Mix chart, Ahsan et al. (2020) proposed the Kernel PCA (KPCA) Mix chart for monitoring mixed variable and attribute quality characteristics.
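
To make the mixed-data step concrete, the sketch below builds a single numeric matrix from variable and attribute characteristics and extracts principal components from it. It follows the general PCAmix idea: standardized numeric columns plus centred category indicators weighted by $1/\sqrt{p_s}$, where $p_s$ is the proportion of category $s$. This scaling is an assumption borrowed from that general approach, and the exact preprocessing of the PCA Mix chart (Ahsan et al., 2018) may differ. The function names `pcamix_matrix` and `pcamix_scores` are hypothetical.

```python
import numpy as np


def pcamix_matrix(X_num, X_cat):
    """Combine variable and attribute characteristics into one numeric matrix.

    X_num: (n, p1) array of variable (numeric) characteristics.
    X_cat: (n, p2) array of attribute (categorical) characteristics.
    """
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat)
    # Variable block: centre and scale every column to unit variance.
    blocks = [(X_num - X_num.mean(axis=0)) / X_num.std(axis=0)]
    for j in range(X_cat.shape[1]):
        levels = np.unique(X_cat[:, j])
        # Indicator (dummy) column for each category of attribute j.
        G = (X_cat[:, j][:, None] == levels[None, :]).astype(float)
        prop = G.mean(axis=0)  # category proportions n_s / n
        # Centre the indicators and weight each by 1 / sqrt(proportion).
        blocks.append((G - prop) / np.sqrt(prop))
    return np.hstack(blocks)


def pcamix_scores(X_num, X_cat, n_components):
    """Principal component scores and eigenvalues of the combined matrix."""
    Z = pcamix_matrix(X_num, X_cat)
    # SVD of Z / sqrt(n): the squared singular values are the eigenvalues.
    _, s, Vt = np.linalg.svd(Z / np.sqrt(Z.shape[0]), full_matrices=False)
    scores = Z @ Vt[:n_components].T
    return scores, s[:n_components] ** 2
```

With this weighting, each numeric column contributes a variance of 1 and an attribute with $m$ categories contributes $m-1$ to the total inertia. As a result, the eigenvalues sum to $p_1 + \sum_j (m_j - 1)$ when all components are kept.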

A problem arises when the PCA Mix chart (Ahsan et al., 2018) is applied to nonlinear multivariate processes. Because PCA is a linear method, it cannot fully capture nonlinear relationships among the quality characteristics. Several studies have used control charts to detect shifts in nonlinear data. Yoo and Lee (2006) proposed a multivariate chart based on KPCA and the Exponentially Weighted Moving Average (EWMA) to monitor nonlinear biological processes. Khediri et al. (2010) suggested Support Vector Regression (SVR) control charts for multivariate nonlinear processes with dependent samples. Fan et al. (2014) proposed a control chart based on filtering kernel independent component analysis–principal component analysis (FKICA–PCA) to monitor multivariate industrial processes. Pan et al. (2019) developed the nonparametric Revised Spatial Rank Exponential Weighted Moving Average (RSREWMA) control chart to monitor multivariate nonlinear profile data. Combined with a control chart, Kernel PCA can likewise be applied to monitor such nonlinear cases.
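
The toy example below uses hypothetical data, not taken from the paper, to illustrate the problem. One characteristic is an exact function of another, yet their linear correlation is zero, so a covariance-based method such as linear PCA sees no relationship. A nonlinear feature map exposes the dependence. Kernel methods such as KPCA apply such a map implicitly through the kernel function.

```python
import numpy as np

# Hypothetical characteristics: y depends on x exactly, but nonlinearly.
x = np.linspace(-1.0, 1.0, 2001)  # values symmetric around zero
y = x ** 2

# Pearson correlation: the only kind of dependence linear PCA can use.
r_linear = np.corrcoef(x, y)[0, 1]

# After the nonlinear feature map x -> x^2 the dependence becomes linear.
r_mapped = np.corrcoef(x ** 2, y)[0, 1]

print(f"corr(x, y)   = {r_linear:+.3f}")  # essentially zero
print(f"corr(x^2, y) = {r_mapped:+.3f}")  # +1.000
```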

Building on this previous study, the KPCA Mix chart (Ahsan et al., 2020) can be extended to monitor multivariate nonlinear data. This research therefore proposes a mixed multivariate control chart based on the KPCA algorithm that accommodates mixed quality characteristics with nonlinear relationships. The mixed principal components (PCs Mix) estimated by KPCA are transformed into Hotelling's $T^{2}$ statistics. The control limit of the $T^{2}$ statistics is calculated using kernel density estimation (KDE), the same method used in Ahsan et al. (2020). To show the benefits and drawbacks of the proposed chart, its performance is compared with that of the conventional PCA Mix chart. The rest of this article is organized as follows. Section 2 reviews related studies. Section 3 describes the Kernel PCA method. Section 4 presents the charting procedure of the proposed KPCA Mix control chart. Section 5 assesses the performance of the proposed chart in detecting process shifts and compares it with the PCA Mix chart. Section 6 applies the proposed chart to real data. Section 7 presents conclusions and possible future research.
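
The sketch below is a compact illustration of the charting idea described above. It assumes that the mixed characteristics have already been encoded as a numeric matrix, for example with PCA Mix-style scaling. The steps are:

1. Use an RBF kernel $k(a,b)=\exp(-\gamma\lVert a-b\rVert^{2})$ and centre the kernel matrix.
2. Keep the first $l$ kernel principal components.
3. Compute $T^{2}=\sum_{k=1}^{l} t_{k}^{2}/\lambda_{k}$.
4. Take the control limit as the $(1-\alpha)$ quantile of a Gaussian KDE fitted to the Phase I $T^{2}$ values.

The following choices are assumptions for illustration, not the authors' settings or code: the class name `KPCAT2Chart`, the kernel width `gamma`, Silverman's bandwidth rule, and $\alpha = 0.0027$.

```python
import math
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2) (assumed parametrization)."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kde_upper_limit(values, alpha, bandwidth=None):
    """(1 - alpha) quantile of a Gaussian KDE fitted to Phase I statistics."""
    x = np.asarray(values, dtype=float)
    if bandwidth is None:
        # Silverman's rule of thumb; other bandwidth selectors are possible.
        iqr = np.subtract(*np.percentile(x, [75, 25]))
        bandwidth = 0.9 * min(x.std(ddof=1), iqr / 1.34) * x.size ** -0.2
    erf = np.vectorize(math.erf)

    def kde_cdf(c):
        # CDF of the KDE = average of Gaussian CDFs centred at each value.
        return np.mean(0.5 * (1.0 + erf((c - x) / (bandwidth * math.sqrt(2.0)))))

    lo, hi = x.min() - 10 * bandwidth, x.max() + 10 * bandwidth
    for _ in range(100):  # bisection: find c with kde_cdf(c) = 1 - alpha
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


class KPCAT2Chart:
    """Sketch of a T^2 chart on kernel principal components with a KDE limit."""

    def __init__(self, n_components=2, gamma=1.0, alpha=0.0027):
        self.l, self.gamma, self.alpha = n_components, gamma, alpha

    def _center(self, K_new):
        # Centre kernel rows with respect to the Phase I feature-space mean.
        return (K_new - self.K_col_mean[None, :]
                - K_new.mean(axis=1)[:, None] + self.K_mean)

    def fit(self, Z):
        """Z: Phase I data with the mixed characteristics already encoded numerically."""
        self.Z = np.asarray(Z, dtype=float)
        K = rbf_kernel(self.Z, self.Z, self.gamma)
        self.K_col_mean, self.K_mean = K.mean(axis=0), K.mean()
        mu, V = np.linalg.eigh(self._center(K))  # eigenvalues in ascending order
        mu, V = mu[::-1][: self.l], V[:, ::-1][:, : self.l]
        self.alphas = V / np.sqrt(mu)            # normalised expansion coefficients
        self.lam = mu / self.Z.shape[0]          # variances of the l score columns
        self.limit = kde_upper_limit(self.t2(self.Z), self.alpha)
        return self

    def t2(self, Z_new):
        """Hotelling-type T^2 computed from the first l kernel principal components."""
        K_new = rbf_kernel(np.asarray(Z_new, dtype=float), self.Z, self.gamma)
        scores = self._center(K_new) @ self.alphas
        return np.sum(scores ** 2 / self.lam, axis=1)

    def signals(self, Z_new):
        """True where an observation falls above the KDE control limit."""
        return self.t2(Z_new) > self.limit


# Usage with hypothetical, already-encoded data (4 columns).
rng = np.random.default_rng(1)
phase1 = rng.normal(size=(300, 4))
chart = KPCAT2Chart(n_components=3, gamma=0.1).fit(phase1)
phase2 = rng.normal(size=(50, 4)) + 1.0
print(chart.limit, chart.signals(phase2).mean())
```

The KDE limit is an empirical quantile of the Phase I statistics. It therefore does not require $T^{2}$ to follow an $F$ or $\chi^{2}$ distribution, which cannot be assumed for scores computed in a kernel feature space.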

2. Related research

This section reviews recent studies on control charts in three main categories: multivariate variable charts, multivariate attribute charts, and mixed charts. Recent developments in multivariate variable charts are listed in Table 1, those in multivariate attribute charts in Table 2, and those in mixed-characteristics charts in Table 3.

Table 1.

The recent development of multivariate variable control charts.

Sources	Proposed scheme	Findings
Chiang et al. (2021)	New scheme of multivariate auxiliary-information-based (AIB) chart	The performance of the proposed chart is evaluated using Monte-Carlo simulation and applied to cement data
Ahmad and Ahmed (2021)	T² control chart to inspect high-dimensional data	The proposed method can be used without preprocessing or dimension reduction and achieves high detection accuracy
Haddad (2021)	T² control charts using modified Mahalanobis distance	The proposed method detects more outliers than the traditional chart
Cabana and Lillo (2021)	Robust multivariate chart for individual observations using reweighted shrinkage estimators	The proposed chart performs better for high-dimensional and highly contaminated data
Maleki et al. (2020)	Median estimators of the T² control chart	The proposed method outperforms the conventional chart
Haddad et al. (2019)	Bivariate Hotelling's T² charts with bootstrap data	The proposed method performs better than the conventional method
Tiengket et al. (2020)	Bivariate Copulas on the Hotelling's T² Control Chart	The bivariate copulas method can be used in the Hotelling's T² chart
Mashuri et al. (2019)	Tr (R²) control charts with Kernel Density Estimation (KDE) control limit	The proposed control chart performs better in detecting shifts for a large number of characteristics and large sample sizes
Mehmood et al. (2019)	Hotelling T² control chart based on bivariate ranked set schemes	Proposed control chart schemes demonstrate an outstanding performance compared to the classical Hotelling T²
Haq and Khoo (2019)	Adaptive MEWMA chart	The proposed chart surpasses the performances of the existing adaptive multivariate charts
Flury and Quaglino (2018)	MEWMA chart for asymmetric gamma distributions	The proposed MEWMA chart outperforms the performance of the conventional T² chart in all the cases
Haq et al. (2020)	Dual MCUSUM charts with auxiliary information for the process mean	The proposed chart has a better performance compared to the DMCUSUM and MDMCUSUM charts when detecting different sizes of a shift in the process mean vector

Table 2.

The recent development of multivariate attribute control charts.

Sources	Proposed scheme	Findings
Yeganeh et al. (2021)	Combined novel run rules and MEWMA control chart	The proposed method has better performance for small and moderate shifts in monitoring linear profiles
Xie et al. (2021)	MCUSUM control chart for monitoring Gumbel's bivariate exponential data	The proposed chart outperforms the other charts for most shift domains
Mashuri et al. (2020)	Fuzzy bivariate chart	The proposed chart is more sensitive than the conventional bivariate Poisson chart
Zhou et al. (2020)	Synthetic control chart for attribute inspection	The proposed chart demonstrates a higher detection performance for small and large mean shifts
Quinino et al. (2020)	Attribute chart for the joint monitoring of mean and variance	The proposed method is easier to implement than the conventional approach
Aldosari et al. (2019)	Attribute control chart for multivariate Poisson distribution using multiple dependent state repetitive sampling (MDSRS)	The proposed method has a better performance than the conventional one based on repetitive sampling
Aslam et al. (2019)	Shewhart attribute control chart with a neutrosophic statistical interval	The proposed attribute control chart detects shifts in the process well
Chong et al. (2019)	Multi-attribute CUSUM-np chart	The proposed procedure has a better or equal performance compared to the conventional chart
Aslam (2019)	Attribute control chart using the repetitive sampling under the fuzzy neutrosophic system	The proposed chart with repetitive sampling under the fuzzy neutrosophic system is more sensitive in detecting a shift in the process as compared with the existing chart
Lee et al. (2017)	Multinomial generalized likelihood ratio (MGLR) chart	The proposed chart has better performance than the set of 2-sided Bernoulli CUSUM charts

Table 3.

The recent development of mixed characteristics control charts.

Sources	Proposed scheme	Findings
Ahsan et al. (2020)	Kernel PCA Mix Chart	The proposed chart has a better performance compared to the PCA Mix chart
Ahsan et al. (2019)	PCA Mix chart for detecting outliers in a mixed characteristics scheme	The proposed chart detects more outliers than the conventional and other robust charts, including at higher percentages of added outliers
Ahsan et al. (2018)	PCA Mix control chart	The proposed chart presents good performance for an appropriate number of principal components used
Wang et al. (2018)	Multivariate sign chart	Simulations show the superiority of the proposed control chart in monitoring mixed-type data
Aslam et al. (2015)	The mixed chart to monitor the process	The mixed chart shows excellent performance in the monitoring process

Shift		Kernel functions
δ_S	δ_μ	RBF	Polynomial	Linear
0	0	376.820	374.850	379.000
0.1	0.0025	367.375	377.570	375.855
0.2	0.0050	357.063	354.560	368.283
0.3	0.0075	313.003	345.998	365.330
0.4	0.0100	284.322	330.686	346.508
0.5	0.0125	264.272	317.742	327.998
0.6	0.0150	250.244	302.643	310.600
0.7	0.0175	236.421	286.088	293.735
0.8	0.0200	226.051	268.144	274.916
0.9	0.0225	220.402	252.661	261.438
1.0	0.0250	219.707	238.942	246.952
1.1	0.0275	224.183	225.516	233.486
1.2	0.0300	239.949	213.429	221.341
1.3	0.0325	272.919	202.299	209.421
1.4	0.0350	310.705	191.916	199.267
1.5	0.0375	352.232	182.158	189.546
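
Each ARL entry in these tables is an average of simulated run lengths, that is, the number of samples until the chart first signals. The hedged sketch below shows that procedure for a deliberately simple case: a bivariate $T^{2}$ chart with known in-control parameters, where the exact control limit is $-2\ln\alpha$. It is not the authors' simulation design, which uses mixed data and the KPCA Mix statistic. Two choices are assumptions:

- The false-alarm rate $\alpha = 0.0027$ is chosen because the in-control ARL values in the tables are close to $1/0.0027 \approx 370$.
- The shift (a mean shift in one characteristic) is hypothetical and differs from $\delta_S$ and $\delta_\mu$.

```python
import math
import numpy as np


def simulate_run_lengths(shift, alpha=0.0027, n_runs=2000, block=2000, seed=0):
    """Monte Carlo run lengths of a bivariate T^2 chart with known parameters.

    In-control data are N(0, I_2); a mean shift of size `shift` is added to
    the first characteristic, so `shift` is also the Mahalanobis distance.
    """
    rng = np.random.default_rng(seed)
    limit = -2.0 * math.log(alpha)  # exact upper-alpha quantile of chi-square(2)
    mean = np.array([shift, 0.0])
    lengths = np.empty(n_runs, dtype=int)
    for r in range(n_runs):
        offset = 0
        while True:
            x = rng.normal(size=(block, 2)) + mean
            t2 = np.sum(x ** 2, axis=1)  # T^2 with known mean 0 and covariance I
            hits = np.flatnonzero(t2 > limit)
            if hits.size:  # first out-of-control signal in this block
                lengths[r] = offset + hits[0] + 1
                break
            offset += block
    return lengths


for d in (0.0, 1.0, 2.0):
    print(f"shift = {d:.1f}: ARL = {simulate_run_lengths(d).mean():.1f}")
```

With independent samples, the in-control estimate should be close to $1/\alpha$, and the ARL should fall as the shift grows. For the KPCA Mix statistic, the limit comes from KDE instead of a closed-form quantile, so its ARL behaviour has to be checked by simulation, as in these tables.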

Shift		$p = 5$, $l = 2$		$p = 5$, $l = 3$		$p = 5$, $l = 4$
δ_S	δ_μ	KPCA Mix	PCA Mix	KPCA Mix	PCA Mix	KPCA Mix	PCA Mix
0	0	376.820	383.490	370.920	376.110	362.860	385.690
0.1	0.0025	367.375	358.360	361.410	465.410	365.750	438.810
0.2	0.0050	357.063	340.610	356.220	408.150	359.500	430.130
0.3	0.0075	313.003	361.040	323.920	493.960	350.493	469.200
0.4	0.0100	284.322	397.270	303.690	424.150	338.492	436.240
0.5	0.0125	264.272	352.370	281.498	430.750	321.630	499.830
0.6	0.0150	250.244	335.160	267.280	413.010	311.320	461.580
0.7	0.0175	236.421	276.230	252.489	364.630	297.746	411.360
0.8	0.0200	226.051	253.160	235.111	303.430	285.544	332.780
0.9	0.0225	220.402	217.230	220.115	315.980	274.721	328.360
1.0	0.0250	219.707	154.640	207.927	213.670	260.177	263.660
1.1	0.0275	224.183	134.610	196.856	169.880	248.052	212.700
1.2	0.0300	239.949	120.240	186.622	166.900	236.200	177.520
1.3	0.0325	272.919	89.690	177.566	136.860	224.626	166.600
1.4	0.0350	210.705	70.400	169.523	107.190	214.449	140.340
1.5	0.0375	152.232	67.120	162.292	87.070	205.602	95.630

Parameters of non-metric data	l	Kernel PCA Mix
θ₁,θ₂ = 0.3 and θ₃ = 0.4	2	•
	3	•
	4	•

θ₁,θ₂ = 0.1 and θ₃ = 0.8	2	•
	3	•
	4	•

θ₁,θ₂ = 0.05 and θ₃ = 0.9	2	•
	3	•
	4	•

Attack types	Number of observations	Percentage (%)
Normal	13,449	53.39
DOS	9,234	36.65
Probe	2,289	9.09
U2R	11	0.04
R2L	209	0.83

Total	25,192	100.00

l	Accuracy	FP rate	FN rate
2	0.82744	0.06751	0.29285
3	0.84741	0.06714	0.25044
4	0.85769	0.08305	0.21016
5	0.84653	0.07361	0.24491
7	0.82347	0.13183	0.22771
10	0.84741	0.06714	0.25044
20	0.68986	0.42724	0.17601

σ	Accuracy	FP rate	FN rate
0.10000	0.58772	0.02632	0.85429
0.01000	0.84522	0.06825	0.25385
0.00100	0.85769	0.08305	0.21016
0.00500	0.84590	0.06022	0.26160
0.00010	0.63492	0.52643	0.18027
0.00001	0.53385	0.00000	1.00000

Method	Accuracy	FP rate
Hybrid Decision Tree (Farid et al., 2014)	0.8192	0.1740
Hybrid Naïve Bayes (Farid et al., 2014)	0.8239	0.1640
Logistic Regression (Belavagi and Muniyal, 2016)	0.8400	0.1700
Support Vector Machine (Belavagi and Muniyal, 2016)	0.7500	0.2400
Hotelling's T² chart	0.7023	0.1433
PCA Mix	0.8041	0.3171
Proposed method	0.8577	0.0831

Shift		Kernel
δ_S	δ_μ	RBF	Polynomial	Linear
0	0	386.060	367.300	380.950
0.1	0.0025	346.665	349.770	384.000
0.2	0.0050	306.600	328.840	379.383
0.3	0.0075	268.633	327.043	366.278
0.4	0.0100	242.388	317.712	348.862
0.5	0.0125	222.198	302.458	333.512
0.6	0.0150	208.613	284.601	314.729
0.7	0.0175	193.365	266.913	295.940
0.8	0.0200	182.924	250.563	277.669
0.9	0.0225	175.184	235.500	262.847
1.0	0.0250	172.916	222.804	246.770
1.1	0.0275	172.819	209.485	233.871
1.2	0.0300	176.240	198.273	220.032
1.3	0.0325	175.111	187.549	207.769
1.4	0.0350	167.725	178.290	197.263
1.5	0.0375	159.685	169.472	187.162

Shift		Kernel
δ_S	δ_μ	RBF	Polynomial	Linear
0	0	371.020	359.550	396.730
0.1	0.0025	369.610	376.185	425.675
0.2	0.0050	355.200	374.097	423.697
0.3	0.0075	353.843	369.205	422.503
0.4	0.0100	331.568	358.198	400.838
0.5	0.0125	306.777	351.158	377.570
0.6	0.0150	284.471	335.774	355.724
0.7	0.0175	264.586	319.088	336.219
0.8	0.0200	248.086	301.681	317.538
0.9	0.0225	233.595	284.584	299.487
1.0	0.0250	220.216	269.831	281.449
1.1	0.0275	207.939	256.110	265.438
1.2	0.0300	197.140	242.057	250.743
1.3	0.0325	187.698	228.902	238.136
1.4	0.0350	178.887	217.107	226.352
1.5	0.0375	161.626	206.726	214.664

Shift		Kernel
δ_S	δ_μ	RBF	Polynomial	Linear
0	0	371.100	394.810	377.530
0.1	0.0025	351.615	382.655	396.125
0.2	0.0050	337.440	360.083	401.523
0.3	0.0075	335.985	345.143	395.915
0.4	0.0100	322.286	329.336	381.536
0.5	0.0125	308.940	309.580	363.160
0.6	0.0150	296.383	295.949	344.946
0.7	0.0175	279.708	278.995	325.604
0.8	0.0200	264.274	265.733	306.423
0.9	0.0225	251.411	252.864	287.762
1.0	0.0250	238.127	239.604	273.223
1.1	0.0275	226.427	228.050	260.837
1.2	0.0300	217.344	218.267	248.189
1.3	0.0325	207.195	207.876	236.569
1.4	0.0350	197.691	198.643	225.320
1.5	0.0375	188.935	189.732	215.198

Shift		Kernel
δ_S	δ_μ	RBF	Polynomial	Linear
0	0	380.770	398.270	351.040
0.1	0.0025	364.740	426.900	363.020
0.2	0.0050	317.727	404.150	370.863
0.3	0.0075	281.193	388.378	358.250
0.4	0.0100	257.002	375.390	346.804
0.5	0.0125	239.968	353.718	335.335
0.6	0.0150	224.706	333.024	312.767
0.7	0.0175	210.456	310.153	293.535
0.8	0.0200	204.304	290.936	276.356
0.9	0.0225	197.367	272.970	259.842
1.0	0.0250	198.296	256.436	245.284
1.1	0.0275	187.847	242.783	231.725
1.2	0.0300	184.638	229.729	218.880
1.3	0.0325	173.244	217.334	206.827
1.4	0.0350	171.971	205.771	196.301
1.5	0.0375	160.653	195.590	186.618


