Abstract
Rapid detection and identification of process faults in industrial applications is crucial for sustaining safe and profitable operation. Today, advances in sensor technologies have facilitated the collection of large amounts of chemical process data in real time, which has subsequently broadened the use of data-driven process monitoring techniques based on machine learning and multivariate statistical analysis. One well-known machine learning technique is Support Vector Machines (SVM), which allows the use of high-dimensional feature sets for learning problems such as classification and regression. In this paper, we present the application of a novel nonlinear (kernel-dependent) SVM-based feature selection algorithm to process monitoring and fault detection of continuous processes. The developed methodology is derived from a sensitivity analysis of the dual SVM objective and utilizes existing and novel greedy algorithms to rank features, which in turn guides fault diagnosis. Specifically, we train fault-specific two-class SVM models to detect faulty operations, while using the feature selection algorithm to improve the accuracy of the fault detection models and to perform fault diagnosis. We present results for the Tennessee Eastman process as a case study and compare our approach to existing approaches for fault detection, diagnosis and identification.
Keywords: Fault Detection and Identification, Process Monitoring, Data-driven Modeling, Feature Selection, Support Vector Machines
1. Introduction
Maintaining safe operation while minimizing losses in productivity is one of the major goals in chemical processing. Timely detection, diagnosis and identification of faults occurring during operation is critical and necessary, and this continuously encourages researchers to develop novel process monitoring techniques. Today, with advances in automation and sensor technologies as well as novel smart manufacturing frameworks, real-time process data acquisition has become effortless, creating an immense opportunity for the manufacturing and process industries by enabling data-driven real-time decision making.
A fault is defined as an abnormal process behaviour that the controllers lack the capability to reverse. Faults can occur due to mechanical causes, such as equipment failure and/or wear resulting from equipment aging, or process-based causes stemming from severe process disturbances (Chiang et al., 2001). Once a fault is detected during operation, revealing the root cause by identifying the key process variables is of utmost importance for rapid counteraction to avoid abnormal event progression (Venkatasubramanian et al., 2003). The most prevalent data-driven process monitoring techniques include Principal Component Analysis (PCA), Dynamic PCA (DPCA), Partial Least Squares (PLS), Independent Component Analysis (ICA), and Fisher Discriminant Analysis (FDA) as a dimensionality reduction step, followed by Q and T² statistics, contribution plots, and discriminant analysis (Qin, 2012). The aim of dimensionality reduction is to ensure robustness of the analysis. Dimensionality reduction methods are classified into (a) feature extraction and (b) feature selection. The dimensionality reduction methods adopted within the listed techniques are based on feature extraction, which projects the input process data onto another space and alters its original representation, and thus may cause loss of information. Feature selection, on the other hand, is the process of selecting the most informative and relevant original features (e.g., process variables) characterizing the system. Therefore, there is a prominent need for novel data-driven process monitoring methods that incorporate feature selection techniques for dimensionality reduction.
In this work, we modify a well-known and powerful machine learning formulation, Support Vector Machines (SVM) (Cortes and Vapnik, 1995), for simultaneous modelling and feature selection. We then present the application of the nonlinear (kernel-dependent) SVM-based feature selection algorithm to process monitoring of continuous processes. Previous studies have used SVM for fault detection in chemical processes (Mahadevan and Shah, 2009; Yin et al., 2014; Xiao et al., 2016). Here, fault detection is achieved through two-class SVM models, where the feature selection algorithm further improves model accuracy and also provides a diagnosis of the detected fault. We test the performance of the proposed data-driven framework on the 21 faults introduced in the Tennessee Eastman process data and provide comparisons to existing approaches. The presented methodology can be implemented as an online decision support tool for continuous process monitoring.
2. Nonlinear Support Vector Machine-based Feature Selection Algorithm
Here, we define a supervised learning problem with l training instances, where xi ∈ Rn. Indices i, j = 1, 2, …, l correspond to instances, whereas indices k, k′ = 1, 2, …, n correspond to input data features. Accordingly, each instance is represented as xi = (xi1, xi2, …, xik, …, xin)T. From a process monitoring perspective, instances represent distinct continuous operations, while features are the measurements collected from distinct process variables. We formulate the fault detection problem in a classification setting with the C-parameterized SVM (C-SVM) with hinge loss, ℓ2-norm penalty, and a linear kernel as follows:
$$\begin{aligned} \min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \quad & \frac{1}{2}\lVert \mathbf{w} \rVert_2^2 + C \sum_{i=1}^{l} \xi_i \\ \text{s.t.} \quad & y_i \left( \mathbf{w} \cdot \mathbf{x}_i + b \right) \geq 1 - \xi_i, \quad i = 1, \dots, l \\ & \xi_i \geq 0, \quad i = 1, \dots, l \end{aligned} \qquad (1)$$
where yi ∈ {−1, 1} denotes the class label of instance i. Eq.(1) is a convex nonlinear problem satisfying a first-order constraint qualification, hence strong duality holds. When Eq.(1) is solved to global optimality, the resulting optimal solution (w*, b*, ξ*) is used to determine the linear decision function f(x) = w*·x + b*, whose sign yields the class of a new instance x.
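As a concrete illustration, the following minimal sketch trains a linear C-SVM on synthetic two-class data and classifies a new instance by the sign of the decision function. The use of scikit-learn's SVC (which wraps LIBSVM), the synthetic data, and the choice C = 1.0 are assumptions for illustration only, not the study's exact implementation.

```python
# Minimal sketch of Eq.(1): train a linear C-SVM on synthetic two-class data and
# classify a new instance by the sign of f(x) = w*.x + b*.
# scikit-learn's SVC (which wraps LIBSVM), the synthetic data, and C = 1.0 are
# illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(100, 52))   # 52 process measurements per operation
X_faulty = rng.normal(0.5, 1.0, size=(100, 52))   # shifted mean mimics a faulty operation
X = np.vstack([X_normal, X_faulty])
y = np.hstack([-np.ones(100), np.ones(100)])      # y_i = -1: normal, +1: faulty

clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_new = rng.normal(0.5, 1.0, size=(1, 52))        # incoming operation to classify
f_value = clf.decision_function(x_new)            # w*.x + b*
print(np.sign(f_value))                           # +1 flags a faulty operation
```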
One of the major advantages of exploiting Support Vector Machines in modelling is the ability to use kernel functions, K(xi, xj). These functions implicitly map the input feature space, where the data are nonlinearly separable, onto a different, possibly higher-dimensional feature space, where the data become linearly separable. Since most chemical processes are nonlinear in nature, we adopt the nonlinear (kernel-dependent) C-SVM formulation, where kernel functions are introduced in the Lagrange dual of Eq.(1). Next, we introduce binary variables z ∈ {0,1}n in Eq.(2), which control the selection or elimination of each feature k, and aim to minimize the number of selected features while maximizing model accuracy via the traditional C-SVM formulation:
$$\begin{aligned} \min_{\mathbf{z}} \ \max_{\boldsymbol{\alpha}} \quad & \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K\!\left( \mathbf{x}_i \circ \mathbf{z},\ \mathbf{x}_j \circ \mathbf{z} \right) \\ \text{s.t.} \quad & \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad 0 \leq \alpha_i \leq C, \quad i = 1, \dots, l \\ & \sum_{k=1}^{n} z_k = m, \qquad z_k \in \{0,1\}, \quad k = 1, \dots, n \end{aligned} \qquad (2)$$
where αi are the dual variables. In Eq.(2), ∘ is the Hadamard (componentwise) product operator, and m is the size of the selected feature subset; the aim is to attain the highest C-SVM model accuracy with the minimum number of input features. Since Eq.(2) is a challenging and impractical problem to solve to global optimality, we propose an algorithmic solution procedure that utilizes the Lagrangian sensitivity of the objective function value of Eq.(2) with respect to zk at (α*, z), where zk is treated as a fixed parameter. The procedure is iterative, and in each step a single feature is eliminated according to the following criterion:
$$k^{*} = \arg\min_{k\,:\,z_k = 1} \left| \frac{\partial W(\boldsymbol{\alpha}^{*}, \mathbf{z})}{\partial z_k} \right|, \qquad \frac{\partial W}{\partial z_k} = -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i^{*} \alpha_j^{*} y_i y_j \frac{\partial K\!\left( \mathbf{x}_i \circ \mathbf{z},\ \mathbf{x}_j \circ \mathbf{z} \right)}{\partial z_k} \qquad (3)$$

where W denotes the dual objective of Eq.(2).
Eq.(3) delineates the nonlinear SVM-based feature selection algorithm adopted in this study. In particular, the algorithm reduces to the well-known recursive feature elimination (RFE)-SVM classification algorithm when a linear kernel is used, i.e., for linear classification. The presented algorithm (Kieslich et al., 2016) has been implemented in a C++/Python environment using the LibSVM library (Chang and Lin, 2011) and has been successfully utilized in our previous studies in a bioinformatics setting.
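To make the sensitivity computation concrete, the sketch below evaluates ∂W/∂zk for a Gaussian RBF kernel at zk = 1 using the dual coefficients of a trained C-SVM and ranks features by the magnitude of that sensitivity, so the least sensitive feature would be eliminated next. The use of scikit-learn (whose SVC wraps LIBSVM), the synthetic data, and the helper name sensitivity_ranking are illustrative assumptions rather than the study's exact code.

```python
# Sketch of an Eq.(3)-type sensitivity for a Gaussian RBF kernel, evaluated at z_k = 1:
#   dW/dz_k = gamma * sum_ij alpha_i alpha_j y_i y_j (x_ik - x_jk)^2 K(x_i, x_j),
# obtained by differentiating K = exp(-gamma * sum_k z_k^2 (x_ik - x_jk)^2).
# Only support vectors contribute (all other alpha_i are zero).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def sensitivity_ranking(X, y, C=1.0, gamma=0.01):
    """Return feature indices sorted by |dW/dz_k|, smallest (most redundant) first."""
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    sv = X[clf.support_]                       # support vectors
    beta = clf.dual_coef_[0]                   # y_i * alpha_i for each support vector
    B = np.outer(beta, beta) * rbf_kernel(sv, sv, gamma=gamma)
    sens = np.array([
        gamma * np.sum(B * (sv[:, [k]] - sv[:, [k]].T) ** 2)
        for k in range(X.shape[1])
    ])
    return np.argsort(np.abs(sens))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 10)), rng.normal(0.8, 1, (60, 10))])
y = np.hstack([-np.ones(60), np.ones(60)])
print("feature to eliminate first:", sensitivity_ranking(X, y)[0])
```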
3. Tennessee Eastman Process
The Tennessee Eastman process simulation, an extensively used benchmark for evaluating and developing process monitoring algorithms, was designed by the Eastman Chemical Company (Downs and Vogel, 1993). It is based on a real industrial process in which the components, kinetics, and operating conditions have been modified for proprietary reasons (Chiang et al., 2001). The process includes five primary units: a reactor, a condenser, a compressor, a separator, and a stripper. It describes the production of chemicals G and H from feedstocks A, C, D and E, with byproduct F and inert compound B. The simulated process data used in this study are adopted from Chiang et al. (2001), where the process contains 41 measured and 11 manipulated variables. The variables are sampled every 3 minutes, and the dataset includes measurement samples from normal operation and 21 distinct faulty operations. For further information on the process and simulation, the interested reader can refer to Downs and Vogel (1993) and Chiang et al. (2001).
4. Proposed Framework
In this study, we build 21 fault-specific C-SVM binary classifiers, one for each of the 21 faults introduced in the process data. Thus, for each model, we combine data from normal operation with data from the corresponding faulty operation. The initial step in data-driven modelling is normalization of the input data. Normalization is performed on each of the 21 combined faulty-normal datasets by subtracting from each measurement its mean across the operations and then dividing by the corresponding standard deviation.
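A minimal sketch of this per-dataset z-score normalization is given below; the use of NumPy and the guard for constant measurements are illustrative assumptions. The means and standard deviations learned offline would also be reused to scale incoming online samples.

```python
# Minimal sketch of the per-dataset z-score normalization described above.
import numpy as np

def zscore_normalize(X):
    """Standardize each measurement (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0.0, 1.0, sigma)   # avoid division by zero
    return (X - mu) / sigma, mu, sigma

# The (mu, sigma) learned offline would be reused to scale incoming online samples:
#   x_online_scaled = (x_online - mu) / sigma
```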
4.1. Offline Phase Model Building
The initial step in the offline phase is to create balanced train and test sets, since imbalanced datasets may significantly increase the risk of overfitting. Therefore, we create train and test sets from each of the 21 datasets using 100 runs of 5-fold cross-validation. Next, we build binary classifier models iteratively for each feature subset. In each iteration we (i) tune the C-SVM hyperparameters using the train and test sets with the active set of features (the whole feature set in the first iteration), (ii) train C-SVM classifiers with the Gaussian radial basis function kernel, where the class probabilities are smoothed by taking the median of the probabilities over a window of size 3, (iii) calculate the Lagrangian sensitivity of the dual C-SVM objective with respect to the parameter zk to obtain a ranked list of features, and (iv) eliminate from the data the most redundant ("worst") feature according to Eq.(3). This procedure, sketched below, is repeated until a single feature remains in the train and test sets. The framework yields 1092 C-SVM classifiers, i.e., 21 fault-specific C-SVM classifiers for each of the 52 feature subsets.
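The backward-elimination loop referenced above can be outlined as follows. To stay self-contained, this sketch uses a linear kernel, training accuracy, and the RFE-style ranking by wk² as a stand-in for the nonlinear Eq.(3) sensitivity; the data and the value of C are illustrative assumptions.

```python
# Skeleton of the iterative elimination over feature subsets. Linear kernel,
# training accuracy, and the w_k^2 (RFE) criterion are simplifying stand-ins
# for the paper's RBF kernel, cross-validated evaluation, and Eq.(3).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (60, 8)), rng.normal(0.6, 1, (60, 8))])
y = np.hstack([-np.ones(60), np.ones(60)])

active = list(range(X.shape[1]))            # start from the full feature set
history = []                                # one candidate model per subset size
while active:
    clf = SVC(kernel="linear", C=1.0).fit(X[:, active], y)
    acc = clf.score(X[:, active], y)        # test-set accuracy in the real workflow
    history.append((list(active), acc))
    if len(active) == 1:
        break
    w = clf.coef_[0]
    worst = int(np.argmin(w ** 2))          # least informative feature (RFE criterion)
    active.pop(worst)
```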
In particular, for step (i) above, we tune the C-SVM hyperparameters C and γ. Tuning is done via grid search over all combinations of values between 2⁻¹⁰ and 2¹⁰. The hyperparameter combination yielding the highest average testing accuracy and recall is chosen in each iteration of the framework described above. The final stage of the offline phase is the selection of the fault-specific end-models for fault detection: among all feature subsets, the C-SVM classifier yielding the highest fault detection rate and accuracy is chosen as the fault-specific end-model.
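A minimal sketch of this exponential grid search is shown below, using scikit-learn's GridSearchCV with 5-fold cross-validation. Scoring by accuracy alone, running a single repetition instead of 100, and the synthetic data are simplifying assumptions.

```python
# Sketch of the grid search over C and gamma in {2^-10, ..., 2^10} with 5-fold CV.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

param_grid = {
    "C": 2.0 ** np.arange(-10, 11),
    "gamma": 2.0 ** np.arange(-10, 11),
}

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (80, 12)), rng.normal(0.7, 1, (80, 12))])
y = np.hstack([-np.ones(80), np.ones(80)])

search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="accuracy", cv=5)
search.fit(X, y)
print("best (C, gamma):", search.best_params_, "CV accuracy:", search.best_score_)
```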
4.2. Online Phase
The 21 fault-specific end-models built in the offline phase are deployed in the industrial setting to monitor incoming online process data. As online process data are fed in, the models generate a binary answer for the detection of each fault. If a fault is detected, an alarm is raised and the optimal feature set of the corresponding end-model instantaneously yields the root-cause analysis of the detected fault.
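A minimal sketch of this online decision logic is given below. The end_models dictionary layout, the per-fault alarm counters, and the five-consecutive-alarm policy (adopted in the results section) are assumptions for illustration.

```python
# Sketch of the online decision logic: scale each incoming sample, query every
# fault-specific end-model, and raise an alarm after `policy` consecutive faulty
# predictions. The end_models layout and counter handling are illustrative.
import numpy as np

def monitor_sample(x_raw, end_models, counters, policy=5):
    """end_models: {fault_id: (classifier, selected_feature_indices, mu, sigma)};
    counters: {fault_id: consecutive faulty predictions seen so far}."""
    alarms = {}
    for fault_id, (clf, feats, mu, sigma) in end_models.items():
        x = (x_raw - mu) / sigma                         # reuse the offline scaling
        pred = clf.predict(x[feats].reshape(1, -1))[0]   # +1 = faulty, -1 = normal
        counters[fault_id] = counters[fault_id] + 1 if pred == 1 else 0
        if counters[fault_id] >= policy:
            alarms[fault_id] = list(feats)               # selected features = diagnosis
    return alarms
```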
5. Results
We present the results of the proposed data-driven process monitoring framework on the Tennessee Eastman process in Table 1 below, where a fault is reported only after five consecutive fault alarms. The performance of the end-models is evaluated via (i) fault detection rate, (ii) accuracy, and (iii) latency; a sketch of how these metrics can be computed follows the table.
Table 1. Performance of the 21 fault-specific end-models on the Tennessee Eastman process.
Fault | Optimal Feature Subset Size | Fault Detection Rate (%) | Accuracy (%) | Latency (min) |
---|---|---|---|---|
1 | 2 | 99.88 | 99.90 | 6 |
2 | 8 | 98.13 | 98.44 | 48 |
3 | 4 | 100.00 | 83.33 | 3 |
4 | 1 | 100.00 | 100.00 | 3 |
5 | 25 | 100.00 | 100.00 | 3 |
6 | 1 | 100.00 | 100.00 | 3 |
7 | 1 | 100.00 | 100.00 | 3 |
8 | 6 | 99.38 | 96.88 | 3 |
9 | 6 | 99.50 | 83.33 | 3 |
10 | 9 | 98.25 | 83.85 | 3 |
11 | 29 | 96.63 | 86.46 | 3 |
12 | 9 | 100.00 | 96.15 | 3 |
13 | 2 | 95.00 | 89.48 | 3 |
14 | 3 | 100.00 | 100.00 | 3 |
15 | 7 | 98.25 | 82.81 | 3 |
16 | 26 | 100.00 | 87.92 | 3 |
17 | 7 | 97.63 | 84.69 | 3 |
18 | 24 | 90.38 | 91.56 | 183 |
19 | 3 | 100.00 | 83.33 | 3 |
20 | 4 | 100.00 | 83.75 | 3 |
21 | 1 | 100.00 | 99.90 | 3 |
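As referenced above, the following sketch shows one way the three metrics could be computed from a stream of binary alarms; these definitions (detection rate over faulty samples, overall accuracy, latency from fault introduction to the first alarm at a 3-minute sampling period) are assumed standard rather than quoted from the study.

```python
# Sketch of the Table 1 metrics computed from a stream of binary alarms.
import numpy as np

def evaluate(alarm, truth, t_fault, dt=3.0):
    """alarm, truth: boolean arrays over time (True = fault flagged / fault present);
    t_fault: sample index at which the fault is introduced; dt: sampling period [min]."""
    detection_rate = 100.0 * np.mean(alarm[truth])         # % of faulty samples flagged
    accuracy = 100.0 * np.mean(alarm == truth)             # % of all samples correct
    hits = np.flatnonzero(alarm[t_fault:])
    latency = (hits[0] + 1) * dt if hits.size else np.inf  # time to first alarm
    return detection_rate, accuracy, latency
```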
Table 2 compares the performance of this study to other existing data-driven process monitoring techniques (Mahadevan and Shah, 2009; Yin et al., 2014; Xiao et al., 2016) in terms of fault detection rate, and demonstrates the strength of the proposed framework. Additionally, the average latency over the reported faults is 306.19, 145.58, 263.12, 151.00, and 98.50 min for PCA-T², PCA-Q, DPCA-T², DPCA-Q, and 1-class SVM, respectively, in Mahadevan and Shah (2009), whereas it is significantly lower for the proposed framework (15.67 min for the faults reported in Table 2, and 13.86 min over all 21 faults).
Table 2. Fault detection rates (%) compared with existing data-driven techniques; the PCA, DPCA, and first 1-class SVM columns are from Mahadevan and Shah (2009).

Fault | PCA-T² | PCA-Q | DPCA-T² | DPCA-Q | 1-class SVM | 1-class SVM (Xiao et al., 2016) | 2-class SVM (Yin et al., 2014) | 2-class SVM (This Study)
---|---|---|---|---|---|---|---|---
1 | 99.20 | 99.80 | 99.40 | 99.50 | 99.80 | 99.50 | 99.50 | 99.90 |
2 | 98.00 | 98.60 | 98.10 | 98.50 | 98.60 | 98.30 | 98.12 | 98.10 |
4 | 4.40 | 96.20 | 6.10 | 100.00 | 99.60 | 47.40 | 99.88 | 100.00 |
5 | 22.50 | 25.40 | 24.20 | 25.20 | 100.00 | 45.20 | 90.75 | 100.00 |
6 | 98.90 | 100.00 | 98.70 | 100.00 | 100.00 | 99.20 | 60.13 | 100.00 |
7 | 91.50 | 100.00 | 84.10 | 100.00 | 100.00 | 70.10 | 98.91 | 100.00 |
8 | 96.60 | 97.60 | 97.20 | 97.50 | 97.90 | 97.40 | 96.00 | 99.38 |
10 | 33.40 | 34.10 | 42.00 | 33.50 | 87.60 | 68.00 | 81.00 | 98.25 |
11 | 20.60 | 64.40 | 19.90 | 80.70 | 69.80 | 65.80 | 80.25 | 96.62 |
12 | 97.10 | 97.50 | 99.00 | 97.60 | 99.90 | 98.80 | 97.75 | 100.00 |
13 | 94.00 | 95.50 | 95.10 | 95.10 | 95.50 | 95.00 | 92.50 | 95.00 |
14 | 84.20 | 100.00 | 93.90 | 100.00 | 100.00 | 93.90 | 91.00 | 100.00 |
16 | 16.60 | 24.50 | 21.70 | 29.20 | 89.80 | 73.10 | 89.38 | 100.00 |
17 | 74.10 | 89.20 | 76.00 | 94.70 | 95.30 | 75.20 | 81.63 | 97.62 |
18 | 88.70 | 89.90 | 88.90 | 90.00 | 90.00 | 89.30 | 89.50 | 90.38 |
19 | 0.40 | 12.70 | 0.70 | 24.70 | 83.90 | 43.60 | 85.88 | 100.00 |
20 | 29.90 | 45.00 | 35.60 | 51.00 | 90.00 | 69.00 | 80.50 | 100.00 |
21 | 26.40 | 43.00 | 35.60 | 44.20 | 52.80 | 59.40 | - | 100.00 |
6. Conclusions
In this study, we have presented a nonlinear SVM-based feature selection algorithm and implemented it for data-driven process monitoring of continuous processes. The proposed framework establishes a promising decision support tool for online fault detection and identification. This research was funded by the U.S. National Institutes of Health (NIH) grant P42 ES027704.
References
- Chiang LH, Russell EL, Braatz RD, 2001, Fault Detection and Diagnosis in Industrial Systems, Springer.
- Venkatasubramanian V, Rengaswamy R, Yin K, Kavuri SN, 2003, A Review of Process Fault Detection and Diagnosis Part I: Quantitative Model-based Methods, Computers and Chemical Engineering, 27, 293–311.
- Qin SJ, 2012, Survey on Data-driven Industrial Process Monitoring and Diagnosis, Annual Reviews in Control, 36, 220–234.
- Cortes C and Vapnik V, 1995, Support Vector Networks, Machine Learning, 20, 273–297.
- Mahadevan S and Shah SL, 2009, Fault Detection and Diagnosis in Process Data Using One-class Support Vector Machines, Journal of Process Control, 19, 1627–1639.
- Yin S, Gao X, Karimi HR, Zhu X, 2014, Study on Support Vector Machine-Based Fault Detection in Tennessee Eastman Process, Abstract and Applied Analysis, Hindawi.
- Xiao Y, Wang H, Xu W, Zhou J, 2016, Robust One-class SVM for Fault Detection, Chemometrics and Intelligent Laboratory Systems, 151, 15–25.
- Kieslich CA, Tamamis P, Guzman YA, Onel M, Floudas CA, 2016, Highly Accurate Structure-based Prediction of HIV-1 Coreceptor Usage Suggests Intermolecular Interactions Driving Tropism, PLoS One, 11.
- Chang CC and Lin CJ, 2011, LIBSVM: A Library for Support Vector Machines, ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27.
- Downs JJ and Vogel EF, 1993, A Plant-wide Industrial Process Control Problem, Computers and Chemical Engineering, 17, 245–255.