ACS Omega. 2022 Mar 9;7(11):9496–9512. doi: 10.1021/acsomega.1c06839

Multiscale Principal Component Analysis-Signed Directed Graph Based Process Monitoring and Fault Diagnosis

Husnain Ali, Abdulhalim Shah Maulud, Haslinda Zabiri, Muhammad Nawaz, Humbul Suleman, Syed Ali Ammar Taqvi
PMCID: PMC8945140  PMID: 35350317

Abstract


The chemical process industry has become the backbone of the global economy. The complexity of chemical process systems has increased over the last two decades owing to online sensor technology, plant-wide automation, and computerized measurement devices. Principal component analysis (PCA) and the signed directed graph (SDG) are, respectively, quantitative and qualitative monitoring techniques that have been widely applied to chemical fault detection and diagnosis (FDD). The conventional PCA-SDG algorithm represents faults at a single scale and therefore cannot effectively handle fault features that arise at multiple scales. A wavelet-based multiscale PCA-SDG monitoring technique is promising because it readily separates the deterministic and stochastic characteristics of process data. This study uses multiscale PCA-SDG to detect faults, diagnose their root causes, and identify their propagation paths. The proposed method is applied to a continuous stirred tank reactor (CSTR) system to validate its effectiveness. For most of the simulated process failures, the detected, identified, and diagnosed propagation route agrees well with the fault description, demonstrating the satisfactory performance of the proposed technique for monitoring process failures.

1. Introduction

With the advent of the fourth industrial revolution (IR 4.0), industrial processes have become increasingly autonomous. Many modern industrial processes have well-developed sensors that collect process data for failure detection and process monitoring. Careful monitoring, combined with process control and appropriate corrective measures, is required to ensure process efficiency, and it supports the move toward industrial environments with complete equipment and process automation.1–3 Demand for fault detection and diagnosis (FDD) systems has grown as process engineering systems must meet higher standards of safety, dependability, and product quality. FDD methods have been proposed to characterize the typical variations in a process plant and to detect abnormal deviations. Early detection and diagnosis are challenging but necessary to prevent process disruptions, shutdowns, or even process failures, and they have become crucial because modern processes are complex and involve many variables.4 For example, abnormal events such as catalyst deactivation, valve blockage, or compressor failure inevitably occur from time to time.5 Early detection and diagnosis therefore increase process safety and productivity and minimize process defects.6,7 The process industry could save millions of dollars if precise FDD systems were implemented correctly.8

Generally, FDD approaches are classified into model-based, knowledge-based, and process-history-based methods. Model-based techniques rely on exact and highly dependable models of the process system. However, developing accurate first-principles models is challenging, and for some modern processes it is almost impossible. Knowledge-based models are developed from knowledge of process behavior and plant operators' experience, which requires a process knowledge base and expert experience accumulated over a long period. In contrast, approaches that rely on process history need neither process models nor expert knowledge.9 Furthermore, most process systems employ a distributed control system that can store enormous amounts of data, to which data mining techniques can be applied. Advances in data extraction technology have therefore driven the development of process-history-based methods, also known as data-driven methods, and most FDD applications in the process industry now focus on data-driven methodologies.10–13

The methodology of signed directed graph (SDG)-based fault diagnosis has improved significantly over the last three decades. An SDG model captures the flow of information between variables and the direction of each effect (increase or decrease). The SDG was first introduced for modeling chemical processes by Iri et al.14 More recently, Maurya et al.15 presented SDG development algorithms and diagrams for various types of systems, used them to predict the initial and steady-state responses of system variables to deviations of external variables from their nominal values, and provided techniques for SDG analysis. In the 1980s, SDG-based fault diagnosis used only qualitative system information, and some researchers extracted the cause-effect relationships as expert rules.16,17 In 1989, the concept of defect revealing time was introduced, allowing faults to be diagnosed online using time information.18 In the 1990s, researchers modified the SDG model to address the multifault diagnosis problem.19 Diagnostic methods combining qualitative and quantitative approaches were also coupled with the SDG.20,21 Since 1999, there has been strong interest in hybrid techniques, including fuzzy-SDG,22 PCA-SDG,23 PLS-SDG,24 and QTA-SDG.25

PCA-based contribution plots are a well-known approach for diagnosing faults.26 In general, contribution plots show the contribution of each variable to the T² statistic or the squared prediction error (SPE) of the PCA model, and the variables with the largest contributions are taken as the leading causes of the failure. However, because process variables interact quickly, the contribution of the root-cause variable spreads to other variables that may not be the root cause of the failure. Vedam and Venkatasubramanian proposed a PCA-SDG method in which the SPE statistic from PCA is used to determine the node thresholds and to find a consistent path in the SDG model;23 this reduces the number of thresholds required to select the measured variables for SDG-based diagnosis to one. However, contributions can still propagate among variables as the fault spreads, making the root cause hard to locate, and such algorithms give inaccurate diagnoses, or none at all, when multiple failures are present. Because the root causes of multiple faults are difficult to identify, operators have fewer clear corrective actions available. SDG-based fault diagnosis also suffers from insufficient diagnostic resolution. Finally, the standard SDG diagnostic methodology represents faults at a single scale and cannot effectively handle faults whose features originate at different scales, which, together with the qualitative nature of the SDG, leads to erroneous fault representations.

To address these challenges, Bakshi introduced the multiscale PCA (MSPCA) framework, which combines wavelet transforms (WT) with PCA.27 In this framework, each variable is decomposed separately using a chosen wavelet family, and a PCA model is applied to the wavelet coefficients at each scale to highlight relevant events. Various multiscale FDD frameworks extending this idea have been developed in recent years.28–31 Wavelet-based multiscale approaches offer several advantages over standard single-scale approaches because the deterministic and stochastic features of process measurements are concentrated at different scales.

This study proposes a multiscale PCA-SDG approach to diagnose faults and identify their propagation paths in a continuous stirred tank reactor (CSTR). Its main contribution is a process monitoring method based on multiscale PCA-SDG. Fault diagnosis based on multiscale PCA-SDG offers good accuracy and easy diagnosis, allowing operators to respond to abnormal events at an early stage.32 The framework leads to early identification and diagnosis of abnormal situations because changes in the contributions of the measured variables reflect changes in their correlation structure.

The rest of the paper is structured as follows. Section 2 explains the theory and algorithms of PCA, wavelet transforms, and the SDG. Section 3 presents the proposed MSPCA-SDG-based process monitoring and fault diagnosis framework, and Section 3.4 describes the CSTR system used as a case study. The results of the framework are presented in Section 4, followed by the conclusions in Section 5.

2. Methodology

2.1. Principal Component Analysis

PCA is a multivariate statistical modeling method, proposed by Pearson33 and Hotelling,34 that identifies the directions of greatest variance in the data by forming linear combinations of the variables. PCA performs well for feature extraction and dimension reduction.

Consider a data matrix $X \in \mathbb{R}^{m \times n}$ with m observations (rows) of n variables (columns).

The data matrix is standardized to zero mean and unit variance and then factorized by singular value decomposition (SVD):35,36

$$X = UDV^{T} \tag{1}$$

where U and V are orthogonal matrices and D is a diagonal matrix of non-negative real singular values. Equivalently,

$$X = TP^{T} \tag{2}$$

where T = UD is the score matrix, whose columns are the principal components (PCs), and P = V is the loading matrix, whose columns (the loading vectors) are the eigenvectors of the covariance matrix of X.

The covariance matrix can be computed as

$$S = \frac{1}{m-1}X^{T}X = P\Lambda P^{T}, \qquad P^{T}P = I_{n} \tag{3}$$

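where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, is the diagonal matrix of eigenvalues, one for each of the n PCs, and $I_n$ is the $n \times n$ identity matrix.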

The reliability of a PCA model depends on the number of PCs it retains, so an appropriate number must be chosen. Several techniques exist for this purpose, including cumulative percent variance (CPV) and cross-validation,37,38 scree plots,39,40 and profile likelihood.41 This work uses CPV to select the dominant PCs; the CPV of the first l PCs is computed as follows:38

$$\mathrm{CPV}(l) = \frac{\sum_{i=1}^{l}\lambda_i}{\sum_{i=1}^{n}\lambda_i} \times 100\% \tag{4}$$

The number of retained PCs, l, is the smallest value for which CPV(l) reaches a specified percentage of the total variance. Once l is determined, the data matrix can be decomposed as37

$$X = \hat{T}\hat{P}^{T} + \tilde{T}\tilde{P}^{T} \tag{5}$$

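where $\hat{T} \in \mathbb{R}^{m \times l}$ and $\tilde{T} \in \mathbb{R}^{m \times (n-l)}$ contain the retained and discarded principal components (scores), respectively.

Likewise, $\hat{P} \in \mathbb{R}^{n \times l}$ and $\tilde{P} \in \mathbb{R}^{n \times (n-l)}$ contain the retained and discarded eigenvectors, respectively. X can therefore be written as37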

$$X = \hat{X} + E = X\hat{P}\hat{P}^{T} + X\left(I_{n} - \hat{P}\hat{P}^{T}\right) \tag{6}$$

In eq 6, $\hat{X} = X\hat{P}\hat{P}^{T}$ represents the modeled variation of X, and $E = X(I_{n} - \hat{P}\hat{P}^{T})$ represents the residual variation, which corresponds to process noise.
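To make eqs 1–6 concrete, here is a minimal NumPy sketch, assuming the training data are stored as an m × n array with observations in rows. The function names `fit_pca` and `split_model_residual` and the 90% default CPV target are illustrative assumptions, not values from the paper.

```python
import numpy as np


def fit_pca(X, cpv_target=90.0):
    """Fit a PCA model to an (m x n) data matrix X with observations in rows.

    Returns the standardization parameters, all covariance eigenvalues,
    the retained loadings P_hat (n x l) and the number l of retained PCs.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                     # zero mean, unit variance

    # Eq 1: Xs = U D V^T; the loading vectors are the columns of V (eq 2).
    _, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    eigvals = d**2 / (m - 1)                  # eigenvalues of the covariance (eq 3)

    # Eq 4: smallest l whose cumulative percent variance reaches the target.
    cpv = 100.0 * np.cumsum(eigvals) / eigvals.sum()
    l = min(int(np.searchsorted(cpv, cpv_target)) + 1, len(eigvals))
    P_hat = Vt.T[:, :l]
    return mean, std, eigvals, P_hat, l


def split_model_residual(Xs, P_hat):
    """Eq 6: split standardized data into the modeled part and the residual."""
    C = P_hat @ P_hat.T                       # projector onto the retained PCs
    X_hat = Xs @ C
    E = Xs @ (np.eye(P_hat.shape[0]) - C)     # Xs (I_n - P_hat P_hat^T)
    return X_hat, E
```

Using the SVD of the standardized data, rather than an explicit eigendecomposition of S, avoids forming $X^{T}X$ and is numerically better conditioned; both routes give the same loadings up to sign.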

Once the PCA model is established, two monitoring charts are used: the T² chart monitors variation in the model (principal component) space, and the SPE chart monitors variation in the residual space.

For a standardized sample vector $x \in \mathbb{R}^{n}$, the T² statistic is calculated as42

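$$T^{2} = x^{T}\hat{P}\hat{\Lambda}^{-1}\hat{P}^{T}x, \qquad \hat{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_l) \tag{7}$$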

The T² threshold can be determined as follows:

$$T^{2}_{\alpha} = \frac{l(m-1)}{m-l}F_{l,\,m-l,\,\alpha} \tag{8}$$

where $F_{l,\,m-l,\,\alpha}$ is the critical value of the F (Fisher) distribution with l and (m − l) degrees of freedom at significance level α.
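Eqs 7 and 8 can be sketched as below, assuming NumPy. To avoid depending on SciPy, the F critical value is passed in by the caller; `scipy.stats.f.ppf(1 - alpha, l, m - l)` is one way to obtain it. The helper names are hypothetical.

```python
import numpy as np


def t2_statistic(x, P_hat, lam_hat):
    """Eq 7: T^2 = x^T P_hat diag(lam_hat)^-1 P_hat^T x for one standardized sample."""
    t = P_hat.T @ np.asarray(x, dtype=float)   # scores on the l retained PCs
    return float(np.sum(t**2 / np.asarray(lam_hat, dtype=float)))


def t2_limit(l, m, f_crit):
    """Eq 8: T^2_alpha = l (m - 1) / (m - l) * F_{l, m-l, alpha}.

    f_crit is the F critical value supplied by the caller, for example
    scipy.stats.f.ppf(1 - alpha, l, m - l) if SciPy is available.
    """
    return l * (m - 1) / (m - l) * f_crit
```

Some texts use the factor $l(m^{2}-1)/[m(m-l)]$ instead of $l(m-1)/(m-l)$ when the limit is applied to new observations; the two differ by the ratio (m + 1)/m, which is negligible for m = 1000.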

The SPE is determined as follows:42

$$\mathrm{SPE} = r^{T}r = \left\| \left(I_{n} - \hat{P}\hat{P}^{T}\right)x \right\|^{2} \tag{9}$$

where $r = (I_{n} - \hat{P}\hat{P}^{T})x$ is the residual vector. The SPE threshold is calculated as

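$$\mathrm{SPE}_{\alpha} = \theta_{1}\left[\frac{c_{\alpha}h_{0}\sqrt{2\theta_{2}}}{\theta_{1}} + 1 + \frac{\theta_{2}h_{0}(h_{0}-1)}{\theta_{1}^{2}}\right]^{1/h_{0}} \tag{10}$$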

where $\theta_{i} = \sum_{j=l+1}^{n}\lambda_{j}^{i}$ for i = 1, 2, 3, $h_{0} = 1 - \frac{2\theta_{1}\theta_{3}}{3\theta_{2}^{2}}$, and $c_{\alpha}$ is the (1 − α) quantile of the standard normal distribution. A T² or SPE value above its threshold indicates an abnormality. The PCA-based monitoring framework is illustrated in Figure 1. First, a PCA model is built from process data collected under normal operating conditions. Then, the T² and SPE statistics are calculated for the (possibly faulty) test data. When either statistic exceeds the threshold determined during the model-building phase, a fault is declared in the process system.
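Eqs 9 and 10 need only the residual vector and the discarded eigenvalues $\lambda_{l+1}, \ldots, \lambda_n$, so they can be sketched in pure Python; `statistics.NormalDist().inv_cdf` from the standard library supplies $c_{\alpha}$. The function names are illustrative.

```python
import math
from statistics import NormalDist


def spe(residual):
    """Eq 9: SPE = r^T r for a residual vector r."""
    return sum(r * r for r in residual)


def spe_limit(discarded_eigvals, alpha=0.01):
    """Eq 10: SPE threshold from the discarded eigenvalues lambda_{l+1..n}."""
    lams = list(discarded_eigvals)
    theta1 = sum(lams)
    theta2 = sum(lam ** 2 for lam in lams)
    theta3 = sum(lam ** 3 for lam in lams)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)    # standard normal deviate
    bracket = (c_alpha * h0 * math.sqrt(2.0 * theta2) / theta1
               + 1.0
               + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * bracket ** (1.0 / h0)
```

For example, with discarded eigenvalues 0.5, 0.3, and 0.2 the 99% limit is roughly 4.2, and lowering the confidence level to 95% gives a smaller limit.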

Figure 1. PCA-based monitoring framework.

2.2. Wavelet Transforms

Multiscale process monitoring uses the wavelet transform (WT) to decompose the original process data into components at multiple scales. Because wavelets are localized in both time and frequency, the WT accurately separates the approximation function, which reflects deterministic characteristics, from the detail functions, which reflect stochastic characteristics.43,44 This allows process phenomena to be interpreted more appropriately within their own time-frequency bands.45 All basis functions $\psi_{ab}(\lambda)$ of the WT are obtained by dilating and translating a single mother wavelet $\psi(\lambda)$:

$$\psi_{ab}(\lambda) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{\lambda - b}{a}\right) \tag{11}$$

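where a and b are the scale (dilation) and translation parameters, respectively; in the discrete WT they take dyadic values, $a = 2^{j}$ and $b = k2^{j}$, which gives $\psi_{jk}(t) = 2^{-j/2}\psi(2^{-j}t - k)$. The multiscale decomposition is implemented with a pair of low-pass and high-pass filters that split the signal into successive scales. The original signal x(t) is projected onto orthonormal, dilated and translated versions of the scaling function φ,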

$$\phi_{jk}(t) = 2^{-j/2}\,\phi\!\left(2^{-j}t - k\right) \tag{12}$$

Projecting the signal onto these functions generates the approximation and detail coefficients at each level:

$$a_{jk} = \int x(t)\,\phi_{jk}(t)\,dt, \qquad d_{jk} = \int x(t)\,\psi_{jk}(t)\,dt \tag{13}$$

The original signal is recovered by combining the approximation signal at the coarsest scale J with the detail signals at all scales:

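$$x(t) = \sum_{k=1}^{n2^{-J}} a_{Jk}\,\phi_{Jk}(t) + \sum_{j=1}^{J}\sum_{k=1}^{n2^{-j}} d_{jk}\,\psi_{jk}(t) \tag{14}$$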

where J and n are, respectively, the decomposition level and length of the original signal.

Figure 2 shows a wavelet decomposition of a signal over three levels (L = 3); the PCA-SDG approach is then applied to the decomposed data to develop a qualitative multiscale model.
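The wavelet family is not stated in this section, so the sketch below assumes the orthonormal Haar wavelet, the simplest low-pass/high-pass filter pair, to illustrate the three-level decomposition of Figure 2 and the reconstruction of eq 14 in pure Python. Each level splits the current approximation into a coarser approximation and a detail signal of half the length; the function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar DWT: low-pass and high-pass outputs."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def haar_decompose(x, levels=3):
    """Return (A_J, [D1, ..., DJ]); len(x) must be divisible by 2**levels."""
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def haar_reconstruct(approx, details):
    """Eq 14: rebuild the signal from the coarsest approximation and all details."""
    x = list(approx)
    for d in reversed(details):                    # coarsest detail first
        x = [v for a, dd in zip(x, d)
             for v in ((a + dd) / SQRT2, (a - dd) / SQRT2)]
    return x
```

Because the transform is orthonormal, the total energy of A3, D1, D2, and D3 equals that of the original signal, and the reconstruction is exact up to rounding.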

Figure 2. Three-level wavelet decomposition.

2.3. Signed Directed Graph

An SDG represents the causal relationships of a process: process variables are graph nodes, and causal relationships are directed arcs. Each node takes one of three states: "0" when the variable is at its normal steady-state value, and "+" or "–" when it is above or below that value. An arc points from a cause node to an effect node. A solid arc (sign "+") indicates that the effect changes in the same direction as the cause, and a dotted arc (sign "–") indicates that it changes in the opposite direction. In Figure 3, for example, cause node "A" and effect node "B" are connected by a solid arc and both have state "+", showing that A and B deviate in the same direction.14

Figure 3. Typical SDG model.

In an SDG model for fault diagnosis, the circular nodes are measurable process variables. An abnormal node causes a change in the node(s) it points to. Different cause nodes may correspond to different fault sources, and all cause nodes are root nodes; each root node is connected to at least one effect node by an arc. An arc is consistent when sign(arc) × sign(cause node) × sign(effect node) = "+". A valid path consists of a cause node, effect nodes, and consistent arcs connecting them.
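The consistency rule can be expressed directly in code. The pure-Python sketch below encodes node states and arc signs as the strings "+", "-", and "0" (ASCII "-" standing in for "–") and checks whether every arc along a candidate path is consistent; the graph in the usage note is a toy example, not the CSTR SDG, and the function names are illustrative.

```python
SIGN = {"+": 1, "-": -1, "0": 0}


def arc_consistent(cause_state, arc_sign, effect_state):
    """An arc is consistent when sign(arc) * sign(cause) * sign(effect) == +1."""
    return SIGN[arc_sign] * SIGN[cause_state] * SIGN[effect_state] == 1


def path_is_valid(path, arcs, states):
    """Check that every arc along a node path exists and is consistent.

    path   -- node names from the root cause to the last effect, e.g. ["A", "B"]
    arcs   -- dict mapping (cause, effect) to "+" or "-"
    states -- dict mapping each node to "+", "0" or "-"
    """
    return all(
        (u, v) in arcs and arc_consistent(states[u], arcs[(u, v)], states[v])
        for u, v in zip(path, path[1:])
    )
```

For example, with arcs A → B (+) and B → C (–) and states A = "+", B = "+", C = "–", the path A, B, C is valid; changing C to "+" breaks consistency on the second arc. A node in state "0" makes the product zero, so normal nodes never lie on a valid fault path.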

For chemical processes, the SDG used for fault diagnosis was first presented and defined as follows:46

$$G = (V, E, \varphi, \psi) \tag{15}$$

where $V = \{\nu_1, \nu_2, \ldots, \nu_n\}$ is the set of nodes, including the root-cause fault nodes; $E = \{e_1, e_2, \ldots, e_m\}$ is the set of branches, which describe the causal relationships between nodes; $\varphi: E \to \{+, -\}$ gives the sign $\varphi(e_k)$ of each branch $e_k \in E$, where "+" denotes a positive influence and "–" a negative one; and $\psi: V \to \{+, 0, -\}$ gives the sign $\psi(\nu_j)$ of each node $\nu_j \in V$, which expresses the node state. The state of node $\nu_j$ is determined by the deviation of its measured value $x_{\nu_j}$ from its nominal value, as given in eq 16:47

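$$\psi(\nu_j) = \begin{cases} +, & x_{\nu_j} - \bar{x}_{\nu_j} > \varepsilon_j \\ 0, & \left|x_{\nu_j} - \bar{x}_{\nu_j}\right| \leq \varepsilon_j \\ -, & x_{\nu_j} - \bar{x}_{\nu_j} < -\varepsilon_j \end{cases} \tag{16}$$

where $\bar{x}_{\nu_j}$ is the nominal value of node $\nu_j$ and $\varepsilon_j$ is its threshold.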
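Eq 16 amounts to a three-way comparison per node. A minimal pure-Python sketch follows; the nominal values and thresholds are assumed to be supplied by the user (for example, estimated from fault-free data), and the names are illustrative. Applied variable by variable, this is the univariate thresholding that Section 3.3 contrasts with a single SPE-based threshold.

```python
def node_state(value, nominal, threshold):
    """Eq 16: qualitative SDG state of a node from its deviation."""
    deviation = value - nominal
    if deviation > threshold:
        return "+"
    if deviation < -threshold:
        return "-"
    return "0"


def node_states(values, nominals, thresholds):
    """Apply eq 16 to every measured node (all dicts keyed by node name)."""
    return {v: node_state(values[v], nominals[v], thresholds[v]) for v in values}
```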

3. Multiscale PCA-SDG-Based Process Monitoring and Fault Diagnosis

3.1. Multiscale PCA-SDG Methodology Framework

The proposed framework targets multiscale monitoring of chemical processes. This work develops a multiscale process monitoring and fault diagnosis technique that combines the WT with the PCA-SDG model, as shown in Figure 4. When an abnormal situation is detected, the contributions of the measured variables to the monitoring statistics are computed and passed to the SDG model for fault diagnosis.23 After a fault is detected, the contribution plot approach is used to identify the variables correlated with the fault, and the contribution plots are compared to select the fault-correlated variables effectively. These variables are then entered into the SDG model, which is developed from process knowledge, to determine the root cause and the fault propagation path.

Figure 4. Multiscale PCA-SDG methodology.

The proposed multiscale PCA-SDG-based process monitoring and fault diagnosis technique is intended to improve diagnostic accuracy and the efficiency of the fault search, giving operators more time to respond to abnormal occurrences. The number of thresholds needed to detect deviations of the measured variables for SDG-based diagnosis is reduced to one, and the contributions of the measured variables reflect changes in their correlation.

The proposed multiscale PCA-SDG framework is more appropriate and accurate than the conventional PCA-SDG framework in several respects. In fault detection, the MSPCA model detects faults sooner and more effectively than conventional PCA. In fault identification, the single-scale nature of conventional PCA causes some variables to appear misleadingly as fault-related, whereas the WT-based contribution plots for the detail functions (D1, D2, and D3) and the approximation (A3) correctly identify the fault-correlated variables. In fault diagnosis, the conventional PCA-SDG algorithm represents faults at a single scale, cannot reliably locate the actual origin of a fault, and produces inaccurate and misleading interpretations of fault nodes, whereas the multiscale PCA-SDG results represent the actual propagation of the fault. Overall, the proposed technique shows better fault detection, identification, and diagnosis performance than the conventional PCA-SDG technique. The approach is adaptable and may be applied to other chemical process systems.

3.2. Process Monitoring

Process monitoring consists of a model development phase and a fault detection and identification phase.

3.2.1. Model Development Phase

The MSPCA model is developed in two steps using fault-free training data collected under normal operating conditions.

Step 1: A fault-free data set is obtained, and the training data are normalized to zero mean and unit variance. Each variable is then decomposed individually into wavelet coefficients by the WT. After decomposition, the approximation and detail matrices are standardized with their own means and standard deviations.

Step 2: A PCA model is developed for each of the normalized approximation (A3) and detail (D1, D2, and D3) matrices, and the control limits of the T² and SPE monitoring charts are determined.
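A compact NumPy sketch of Steps 1 and 2 follows, assuming a Haar wavelet, three decomposition levels, and CPV-based selection of PCs at each scale. The per-scale T² and SPE limits (eqs 8 and 10) are left out to keep it short, and all names and the 90% CPV default are illustrative assumptions.

```python
import numpy as np


def haar_columns(X, levels=3):
    """Decompose each column of X separately with an orthonormal Haar DWT.

    Returns a dict of coefficient matrices D1, ..., D{levels} and A{levels}.
    The number of rows must be divisible by 2**levels.
    """
    A = np.asarray(X, dtype=float)
    if A.shape[0] % (2 ** levels):
        raise ValueError("number of samples must be divisible by 2**levels")
    coeffs = {}
    for j in range(1, levels + 1):
        even, odd = A[0::2], A[1::2]
        coeffs[f"D{j}"] = (even - odd) / np.sqrt(2.0)   # high-pass (detail)
        A = (even + odd) / np.sqrt(2.0)                 # low-pass (approximation)
    coeffs[f"A{levels}"] = A
    return coeffs


def fit_mspca(X_train, levels=3, cpv_target=90.0):
    """Model development Steps 1-2: normalize, decompose, fit a PCA per scale."""
    X = np.asarray(X_train, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    model = {"mean": mu, "std": sd, "scales": {}}
    for name, C in haar_columns((X - mu) / sd, levels).items():
        c_mu, c_sd = C.mean(axis=0), C.std(axis=0, ddof=1)
        Cs = (C - c_mu) / c_sd                          # per-scale standardization
        _, d, Vt = np.linalg.svd(Cs, full_matrices=False)
        eigvals = d**2 / (Cs.shape[0] - 1)
        cpv = 100.0 * np.cumsum(eigvals) / eigvals.sum()
        l = min(int(np.searchsorted(cpv, cpv_target)) + 1, len(eigvals))
        model["scales"][name] = {"mean": c_mu, "std": c_sd, "eigvals": eigvals,
                                 "P_hat": Vt.T[:, :l], "l": l}
    return model
```

Storing the per-scale means and standard deviations is what allows the test data to be normalized with the training statistics in the detection phase. With 1000 samples, the D1, D2, D3, and A3 matrices have 500, 250, 125, and 125 rows, respectively.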

3.2.2. Fault Detection and Identification Phase

Fault detection and identification follow model development. The faulty data set may contain irregularities that cause unexpected behavior of the monitored system.

Step 1: The faulty data set is collected and normalized to zero mean and unit variance. Each variable is decomposed individually into wavelet coefficients by the WT, and the resulting approximation and detail matrices are normalized with the means and standard deviations determined in Step 1 of the model development phase.

Step 2: The T² and SPE monitoring statistics are calculated for the test data. When a statistic exceeds the threshold determined in the model development phase, the process system has a fault. Finally, contribution plots are used to find the faulty variables in both control charts.
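The paper does not specify here how its contributions are computed. As an assumption, the sketch below uses two common definitions: the SPE contribution of variable j is the squared residual $r_j^2$, and the T² contribution is the complete decomposition $(\xi_j^{T}\hat{D}^{1/2}x)^2$ with $\hat{D} = \hat{P}\hat{\Lambda}^{-1}\hat{P}^{T}$ and $\xi_j$ the j-th column of the identity matrix. Both sets of contributions sum exactly to their statistic; the function names are illustrative.

```python
import numpy as np


def spe_contributions(x, P_hat):
    """SPE contribution of each variable: the squared residual r_j**2 (sums to SPE)."""
    x = np.asarray(x, dtype=float)
    r = x - P_hat @ (P_hat.T @ x)                 # r = (I - P_hat P_hat^T) x
    return r**2


def t2_contributions(x, P_hat, lam_hat):
    """Complete-decomposition T^2 contributions (sum to T^2).

    With D = P_hat diag(lam_hat)^-1 P_hat^T, the contribution of variable j is
    the square of the j-th element of D^(1/2) x, using the symmetric square root
    D^(1/2) = P_hat diag(lam_hat)^(-1/2) P_hat^T.
    """
    x = np.asarray(x, dtype=float)
    lam_hat = np.asarray(lam_hat, dtype=float)
    D_half = P_hat @ np.diag(1.0 / np.sqrt(lam_hat)) @ P_hat.T
    return (D_half @ x) ** 2
```

Plotting these vectors as bar charts at a given sample gives contribution plots of the kind compared in Section 4.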

3.3. Fault Diagnosis

Many researchers have discussed SDG-based diagnostic algorithms, all of which use a backtracking search over the potential routes that explain the observed effects.19,48–50 As described by Maurya et al.,15 fault candidates can be deduced by two alternative techniques:

1. Backward search: starting from a measured node with a non-zero qualitative value, the search propagates backward along the arcs to infer the possible qualitative values of the preceding (cause) nodes. When a node reached in this way is also measured, the value inferred by backpropagation is compared with the measured value, and the search path is discontinued if the two disagree. The backward search is complete when all measured nodes have been validated.

2. Combined backward and forward search: backward propagation is first carried out as described above. Forward propagation is then started from each potential fault candidate until all non-zero qualitative measurements have been accounted for. If any discrepancy is detected, the corresponding fault candidate is dismissed.
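The backward search of technique 1 can be sketched as a depth-first backtracking search. The pure-Python sketch below assumes arc signs and node states are encoded as +1, 0, and −1 and that nodes without incoming arcs are the root candidates; the forward verification step of technique 2 is not implemented, and the node names in the example are hypothetical.

```python
def backward_search(arcs, measured, start):
    """Backtrack from an abnormal measured node to consistent root candidates.

    arcs     -- dict mapping (cause, effect) to the arc sign, +1 or -1
    measured -- dict mapping each measured node to its state, +1, 0 or -1
    start    -- a measured node whose state is non-zero
    Returns the set of (root node, inferred root state) pairs.
    """
    parents = {}
    for (cause, effect), sign in arcs.items():
        parents.setdefault(effect, []).append((cause, sign))

    roots = set()

    def visit(node, state, on_path):
        if node not in parents:               # no causes: a root candidate
            roots.add((node, state))
            return
        for cause, sign in parents[node]:
            if cause in on_path:              # skip cycles
                continue
            # Consistency sign(cause) * sign(arc) * sign(node) = +1
            # gives the cause state as state * sign.
            inferred = state * sign
            if cause in measured and measured[cause] != inferred:
                continue                      # contradicts a measurement: prune
            visit(cause, inferred, on_path | {cause})

    visit(start, measured[start], {start})
    return roots
```

For a toy graph with arcs A → B (+1), B → D (+1), and C → D (−1), and measurements D = +1 and B = +1, the search returns the candidates A (+1) and C (−1); if B were measured as −1, the branch through B would be pruned and only C (−1) would remain.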

Conventional SDG uses univariate statistics to set the node thresholds, so correlations between variables are not considered, and 2m thresholds must be established, where m is the number of process variables. Because PCA performs well as a multivariate statistical method, the threshold can instead be computed from each variable's contribution to the SPE, the key residual-space statistic. Only one threshold then needs to be set, and the computation is significantly reduced.23 The variables with significant contribution rates change over time, and this single threshold determines which variable nodes enter the SDG model to reveal the fault propagation channels. However, because the contribution of one variable is transferred to the other variables, the contribution plots differ increasingly from sample to sample as the fault spreads.51,52 Consequently, determining the threshold from the PCA-based SPE statistic yields results that vary across samples and may fail to eliminate nodes that are not the root cause of the fault.
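The single-threshold idea can be sketched as follows: a variable becomes an abnormal SDG node when its SPE contribution exceeds one common threshold, and the sign of its deviation gives the node state. This is a minimal pure-Python illustration; how the threshold itself is chosen is not specified here, and the values in the usage example are made up.

```python
def select_nodes(contributions, deviations, threshold):
    """Pick SDG nodes whose SPE contribution exceeds one common threshold.

    contributions -- dict variable -> SPE contribution at the current sample
    deviations    -- dict variable -> measured minus nominal value (gives the sign)
    threshold     -- the single threshold shared by all variables
    Returns a dict variable -> +1 or -1 for the selected (abnormal) nodes.
    """
    return {
        var: (1 if deviations[var] > 0 else -1)
        for var, c in contributions.items()
        if c > threshold
    }
```

For instance, with contributions {h: 0.1, C: 2.5, T: 0.9}, deviations {h: 0.01, C: −0.4, T: 0.2}, and a threshold of 0.8, the selected nodes are C (−1) and T (+1). Repeating the selection at successive samples illustrates the drawback noted above: as the fault spreads, the selected set can change from one sample to the next.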

3.4. Case Study: Continuous Stirred Tank Reactor (CSTR) System

This section illustrates the multiscale PCA-SDG process monitoring and fault diagnosis approach with a CSTR case study. A first-order, irreversible, exothermic reaction takes place in the CSTR with cascade control shown in Figure 5; reactant A enters the reactor and product B leaves it:

$$A \rightarrow B \tag{17}$$

Figure 5. Jacketed CSTR system with cascade control.

Coolant flowing through the jacket removes the heat released by the exothermic reaction. The reactor temperature and liquid level are regulated by manipulating the coolant flow and the outlet flow, respectively. The CSTR model is described by the following equations.

3.4. 18
3.4. 19
3.4. 20
3.4. 21
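The exact form of eqs 18–21 and the model parameters are given in Kaisare's work.53 As an assumption, the sketch below uses generic textbook balances for a jacketed CSTR with a first-order A → B reaction (liquid level, reactant A, reactor energy, and jacket energy) and hypothetical parameter values in arbitrary consistent units. It is not a reconstruction of the paper's equations.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CSTRParams:
    """Illustrative parameter values in arbitrary consistent units (hypothetical)."""
    Ac: float = 1.0           # reactor cross-sectional area
    Vc: float = 0.2           # jacket volume
    k0: float = 1.0e10        # pre-exponential factor
    E_over_R: float = 8750.0  # activation energy divided by the gas constant
    neg_dH: float = 5.0e4     # heat released per unit of A reacted (exothermic)
    rho_cp: float = 239.0     # density * heat capacity of the reacting liquid
    rhoc_cpc: float = 4184.0  # density * heat capacity of the coolant
    UA: float = 5.0e4         # heat transfer coefficient * area


def cstr_rhs(state, inputs, p=CSTRParams()):
    """Time derivatives (dh, dC, dT, dTc) of a generic jacketed CSTR model."""
    h, C, T, Tc = state
    Fi, F, Ci, Ti, Tci, Fc = inputs
    V = p.Ac * h                                    # liquid volume
    r = p.k0 * math.exp(-p.E_over_R / T) * C        # first-order rate of A -> B
    q = p.UA * (T - Tc)                             # heat flow to the jacket
    dh = (Fi - F) / p.Ac                            # total volume (level) balance
    dC = Fi * (Ci - C) / V - r                      # balance on reactant A
    dT = Fi * (Ti - T) / V + p.neg_dH * r / p.rho_cp - q / (p.rho_cp * V)
    dTc = Fc * (Tci - Tc) / p.Vc + q / (p.rhoc_cpc * p.Vc)
    return dh, dC, dT, dTc
```

The state and input ordering follows Table 1 (h, C, T, Tc and Fi, F, Ci, Ti, Tci, Fc). Integrating these derivatives with any ODE solver would produce data for the variables in Table 1, but the dynamics will differ from the paper's simulation because the parameters are hypothetical.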

The variables of the CSTR system are listed in Table 1. Details of the CSTR simulation are available in Kaisare's work.53 All simulations and programming were performed with the appropriate toolboxes in the MATLAB/Simulink environment.

Table 1. CSTR System Variables Used for Process Monitoring.

| no. | process variable | description |
| --- | --- | --- |
| 1 | h | height (liquid level) in the CSTR |
| 2 | C | concentration of reactant A in the reactor |
| 3 | T | temperature in the reactor |
| 4 | Tc | coolant temperature in the jacket |
| 5 | Fi | feed stream flow rate |
| 6 | F | outlet stream flow rate |
| 7 | Ci | concentration of reactant A in the feed |
| 8 | Ti | feed stream temperature |
| 9 | Tci | coolant feed temperature to the cooling jacket |
| 10 | Fc | coolant flow rate in the cooling jacket |

Process data were generated with the CSTR Simulink model under fault-free and faulty operating conditions. One thousand samples of fault-free data with normal variation were recorded. In addition, three fault scenarios were simulated, and 1000 samples were recorded for each. The simulated faults, comprising sensor biases and a process disturbance, are described in Table 2.

Table 2. Simulated Faults in the CSTR System.

| fault no. | description | variable | type of fault |
| --- | --- | --- | --- |
| 1 | ramp change | reactant A concentration in the reactor, C | sensor bias |
| 2 | step change | feed stream flow rate, Fi | process disturbance |
| 3 | ramp change | temperature in the reactor, T | sensor bias |
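The two fault shapes in Table 2 can be reproduced with a small helper. As a sketch with hypothetical magnitudes (the ramp slope and step size are not stated here), a sensor bias such as faults 1 and 3 would be added to the recorded measurement only, whereas a process disturbance such as fault 2 would be applied to the simulator input (Fi) so that it propagates through the process dynamics.

```python
def inject_fault(signal, kind, start=500, magnitude=1.0):
    """Return a copy of `signal` with a step or ramp fault from sample `start` on.

    kind -- "step" adds a constant `magnitude`;
            "ramp" adds magnitude * (k - start) at each sample k >= start.
    """
    if kind not in ("step", "ramp"):
        raise ValueError(f"unknown fault kind: {kind!r}")
    out = list(signal)
    for k in range(start, len(out)):
        out[k] += magnitude if kind == "step" else magnitude * (k - start)
    return out
```

For example, `inject_fault([0.0] * 6, "ramp", start=3, magnitude=0.5)` returns `[0.0, 0.0, 0.0, 0.0, 0.5, 1.0]`.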

4. Results and Discussion

This section compares the results of the conventional single-scale PCA-SDG framework with those of the multiscale PCA-SDG framework based on the approximation (A3) and detail functions (D1, D2, and D3). Results for the detail function D1 are not included because it contains only noise; neither the proposed methodology nor the comparison techniques detected the faults in D1. T² and SPE monitoring charts with a 99% confidence level are generated for all simulations, with the confidence limits drawn as solid red lines and the statistics as blue lines.

4.1. Fault 1 - Ramp Change in the Reactant A Concentration in the Reactor

In this case, a sensor bias fault is introduced into the measurement of the reactant A concentration in the reactor at sample 500. Figures 6–9 illustrate the monitoring results of PCA and multiscale PCA. As shown in Figure 6a,b, the PCA-based T² and SPE monitoring charts detect the fault only to a limited extent. The MSPCA-based charts for the detail functions D2 and D3 also show limited detection: the monitoring statistics remain below the control limits while the fault persists, as shown in Figures 7 and 8. In contrast, the MSPCA model based on the approximation A3 detects the fault immediately after it occurs at sample 500, as shown in Figure 9a,b; its monitoring statistics lie above the confidence limits, so the A3 charts detect the fault efficiently.
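The qualitative statements about limited and immediate detection can be quantified with two common metrics: the fault detection rate after fault onset and the first alarm sample. The sketch below is an illustration only; these numbers are not reported in the paper. For charts built on wavelet coefficients with dyadic downsampling, the onset index must be rescaled (sample 500 corresponds to coefficient index 62 at level 3, since 500 // 8 = 62).

```python
def detection_summary(stat, limit, fault_start=500):
    """Fault detection rate and first alarm sample for one monitoring chart.

    stat        -- sequence of T^2 or SPE values, one per sample
    limit       -- control limit (e.g. the 99% threshold)
    fault_start -- index of the first faulty sample
    """
    faulty = stat[fault_start:]
    alarms = [k for k, v in enumerate(faulty, start=fault_start) if v > limit]
    rate = len(alarms) / len(faulty) if faulty else 0.0
    first = alarms[0] if alarms else None
    return rate, first
```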

Figure 6. Fault detection monitoring charts for the fault in reactant A concentration in the reactor based on PCA. (a) T² monitoring chart and (b) SPE monitoring chart.

Figure 7. Fault detection monitoring charts for the fault in reactant A concentration in the reactor based on MSPCA. (a) T² monitoring chart based on D2 and (b) SPE monitoring chart based on D2.

Figure 8. Fault detection monitoring charts for the fault in reactant A concentration in the reactor based on MSPCA. (a) T² monitoring chart based on D3 and (b) SPE monitoring chart based on D3.

Figure 9. Fault detection monitoring charts for the fault in reactant A concentration in the reactor based on MSPCA. (a) T² monitoring chart based on A3 and (b) SPE monitoring chart based on A3.

Figures 10–13 show the contribution plots of PCA and MSPCA. The contribution plot approach is first used to identify the variables correlated with the fault, and the contribution plots at sample 500 are then compared to select the fault-correlated variables effectively. Figure 10a,b shows that the PCA-based contribution plots identify five variables as taking part in the fault. Conventional PCA represents faults at a single scale; because of its low FDD resolution, it cannot separate fault features originating at different scales that affect the variables in the same direction. As a result, variable three (temperature in the reactor, T) is misleadingly represented as fault-related. The WT-based contribution plots for the detail functions (D2 and D3) and the approximation (A3) at sample 500 are shown in Figures 11–13 and are compared in Table 3 to identify the fault-correlated variables. The MSPCA-based contribution plots identify four variables as taking part in the fault; these are selected and entered into the SDG model to determine the fault propagation path. Notably, the results are consistent across D2, D3, and A3, which suggests that the variables selected from any one of these plots can reliably be used to determine the fault propagation.
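The consistency check across D2, D3, and A3 reduces to a set intersection. In this minimal pure-Python sketch the per-scale selections are made-up inputs, not the selections reported in Table 3; a non-empty difference between the union and the intersection would flag disagreement between scales.

```python
def consistent_variables(selected_by_scale):
    """Variables flagged at every scale (e.g. D2, D3 and A3).

    selected_by_scale -- dict scale name -> iterable of selected variable names
    """
    sets = [set(v) for v in selected_by_scale.values()]
    return set.intersection(*sets) if sets else set()
```

For example, if D2 and D3 both flag {C, T, F} and A3 flags {C, T, F, h}, the consistent set is {C, T, F}.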

Figure 10. PCA-based contribution plots for the fault in reactant A concentration in the reactor. (a) T² contribution and (b) SPE contribution.

Figure 11. MSPCA-based contribution plots for the fault in reactant A concentration in the reactor. (a) T² contribution based on D2 and (b) SPE contribution based on D2.

Figure 12. MSPCA-based contribution plots for the fault in reactant A concentration in the reactor. (a) T² contribution based on D3 and (b) SPE contribution based on D3.

Figure 13. MSPCA-based contribution plots for the fault in reactant A concentration in the reactor. (a) T² contribution based on A3 and (b) SPE contribution based on A3.

Table 3. Results of Variable Selection by the Contribution Plot Approach.

Once the variables correlated with the fault have been identified, they are entered into the SDG model, which is developed from process knowledge, to determine the root cause. Figure 14a,b shows the fault diagnosis results. The root cause of this failure is the reactant concentration in the reactor (C), indicated by the red node. The blue nodes show the variables that fluctuate as the fault spreads, and the yellow node represents the outcome of the fault. Figure 14a shows that the conventional PCA-SDG algorithm, which represents faults at a single scale, cannot correctly capture the actual origin of the fault: it misleadingly marks the temperature in the reactor (T) as a fault node, although the system undergoes changes at more than one scale. Figure 14b shows the multiscale PCA-SDG diagnosis results, which represent the actual propagation of the fault. The diagnosis is confirmed by the description of fault 1, which causes a problem with the feed stream temperature (Ti). These results indicate that the proposed monitoring and diagnosis approach can isolate the root cause and identify the fault propagation path in the CSTR system.

Figure 14. Fault propagation route for the fault in reactant A concentration in the reactor. (a) Conventional PCA-SDG and (b) multiscale PCA-SDG based on the detail functions (D2 and D3) and the approximation function (A3).

4.2. Fault 2 - Step Change in the Feed Stream Flowrate

In this case, a process disturbance is introduced into the feed stream flow rate (Fi) at sample 500. Figures 15–18 show the monitoring results of PCA and multiscale PCA. As shown in Figure 15a,b, the PCA-based T² and SPE monitoring charts detect the fault only to a limited extent. The MSPCA-based charts for the detail functions D2 and D3 also show limited detection: the monitoring statistics remain below the control limits while the fault persists, as shown in Figures 16 and 17. In contrast, the MSPCA model based on the approximation A3 detects the fault immediately after it occurs at sample 500, as shown in Figure 18a,b; its monitoring statistics lie above the confidence limits, so the A3 charts detect the fault efficiently.

Figure 15.

Figure 15

Fault detection monitoring charts of the fault in feed stream flowrate based on PCA. (a) Monitoring chart of T2 and (b) monitoring chart of the SPE.

Figure 16.

Fault detection monitoring charts of the fault in the feed stream flow rate based on MSPCA. (a) Monitoring chart of T² based on D2 and (b) monitoring chart of the SPE based on D2.

Figure 17.

Fault detection monitoring charts of the fault in the feed stream flow rate based on MSPCA. (a) Monitoring chart of T² based on D3 and (b) monitoring chart of the SPE based on D3.

Figure 18.

Fault detection monitoring charts of the fault in the feed stream flow rate based on MSPCA. (a) Monitoring chart of T² based on A3 and (b) monitoring chart of the SPE based on A3.

Figures 19–22 show the contribution plot results of PCA and MSPCA. The contribution plot approach is first used to identify the variables correlated with the fault; the contribution plots at the 500th sample are then compared to select the fault-correlated variables. Figure 19a,b shows that the PCA-based contribution plots identify four variables as taking part in the fault. Because conventional PCA represents the fault at a single scale, its low resolution cannot separate fault signatures that originate at different scales and affect the variables in the same direction. As a result, variable nine (the coolant feed temperature to the jacket, Tci) is misleadingly and inaccurately flagged. The wavelet transform (WT)-based contribution plots for the detail functions (D2 and D3) and the approximation (A3) at the 500th sample are shown in Figures 20–22 and compared to identify the fault-correlated variables listed in Table 4. The MSPCA-based contribution plots identify three variables that take part in the fault, which are then entered into the SDG model to determine the fault propagation path. Notably, the results are consistent across D2, D3, and A3, which suggests that the variables selected at any one of these scales can reliably be used to determine the fault propagation.
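The contribution plots can be sketched for a single sample as shown below, assuming orthonormal PCA loadings P and retained eigenvalues lam. SPE contributions are the squared residuals of each variable; for T², the sketch uses the "complete decomposition" contribution z_j (P Λ⁻¹ Pᵀ z)_j, which sums to T² but can be negative. That is one of several definitions in use and not necessarily the one behind Figures 19–22. The variable-selection rule (keep the largest contributors until 80% of the total is covered), the variable labels, and all numeric values are hypothetical; the section does not state the threshold used for Table 4.

```python
import numpy as np

def contributions(z, P, lam):
    """Per-variable contributions to T^2 and SPE for one autoscaled sample z.

    P: (m x k) matrix of orthonormal PCA loadings; lam: the k retained eigenvalues.
    """
    scores = P.T @ z
    residual = z - P @ scores
    spe_contrib = residual**2                  # sums exactly to SPE
    D = P @ np.diag(1.0 / lam) @ P.T           # T^2 = z' D z
    t2_contrib = z * (D @ z)                   # complete decomposition; sums to T^2
    return t2_contrib, spe_contrib

def select_variables(contrib, names, share=0.8):
    """Hypothetical rule: take the largest contributors until `share` of the total."""
    positive = np.clip(contrib, 0.0, None)     # T^2 terms can be negative; ignore those
    order = np.argsort(positive)[::-1]
    total, running, selected = positive.sum(), 0.0, []
    for idx in order:
        selected.append(names[idx])
        running += positive[idx]
        if running >= share * total:
            break
    return selected

names = ["CA", "T", "Tc", "F"]                 # hypothetical variable labels
P = np.array([[0.6, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.0]])
lam = np.array([2.0, 0.5])
z = np.array([0.5, -0.2, 0.1, 2.5])            # hypothetical autoscaled sample at the fault

t2_c, spe_c = contributions(z, P, lam)
print(select_variables(spe_c, names))          # ['F']
```

Comparing the selected sets obtained at D2, D3, and A3 (for example, by set intersection) is one simple way to check the cross-scale consistency described above.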

Figure 19.

PCA-based contribution plots of the fault in the feed stream flow rate. (a) T² contribution and (b) SPE contribution.

Figure 20.

MSPCA-based contribution plots of the fault in the feed stream flow rate. (a) T² contribution based on D2 and (b) SPE contribution based on D2.

Figure 21.

MSPCA-based contribution plots of the fault in the feed stream flow rate. (a) T² contribution based on D3 and (b) SPE contribution based on D3.

Figure 22.

MSPCA-based contribution plots of the fault in the feed stream flow rate. (a) T² contribution based on A3 and (b) SPE contribution based on A3.

Table 4. Results of Variable Selection by the Contribution Plot Approach.

Once the fault-correlated variables have been identified, they are entered into the SDG model built from process knowledge to determine the root cause. Figure 23a,b shows the fault diagnosis results. The root cause of this failure is the feed flow rate to the reactor (Fi), a manipulated variable, indicated by the red node. The blue nodes mark the variables that deviate as the fault propagates, and the yellow node represents the outcome of the fault. Figure 23a shows that the conventional single-scale PCA-SDG diagnostic algorithm cannot efficiently identify the actual roots of the fault: it gives an inaccurate and misleading interpretation of the coolant feed temperature (Tci) node when the system components change at more than one scale. In contrast, the multiscale PCA-SDG results in Figure 23b represent the actual propagation of the fault. The description of Fault 2, which involves a problem with the outlet stream flow rate (F), can be used to verify this diagnostic conclusion. These results indicate that the proposed approach can isolate the root cause and identify the fault propagation path in the CSTR system.

Figure 23.

Fault propagation route of the fault in the feed stream flow rate. (a) Conventional PCA-SDG and (b) multiscale PCA-SDG based on the detail functions (D2 and D3) and approximation function (A3).

4.3. Fault 3 - Ramp Change in Temperature in the Reactor

In this case, a sensor-bias fault was introduced into the reactor temperature (T) measurement at sample 500. The PCA and MSPCA monitoring charts detect this fault in a manner similar to Faults 1 and 2, so they are not shown here; only the PCA- and MSPCA-based contribution plots are discussed.
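The difference between the step and ramp scenarios can be sketched as an additive fault applied to one recorded variable from sample 500 onward. This is an illustration under assumptions: the magnitude, the slope, and the constant 350 K baseline are hypothetical, since the section does not report them. Adding the offset to the recorded signal mimics a sensor bias; a process disturbance such as the Fi step in Fault 2 would instead be applied to the simulation input so that it propagates through the process dynamics.

```python
def inject_fault(signal, start=500, kind="step", magnitude=1.0, slope=0.01):
    """Return a copy of `signal` with an additive fault from index `start` onward.

    kind="step": constant offset `magnitude` from `start` (abrupt change).
    kind="ramp": offset growing by `slope` per sample (drifting bias).
    """
    if kind not in ("step", "ramp"):
        raise ValueError(f"unknown fault kind: {kind!r}")
    faulty = list(signal)
    for k in range(start, len(faulty)):
        faulty[k] += magnitude if kind == "step" else slope * (k - start + 1)
    return faulty

# Hypothetical reactor-temperature record held at 350 K (values are illustrative only)
clean = [350.0] * 1000
ramp = inject_fault(clean, kind="ramp", slope=0.05)   # drifts by 25 K over 500 samples
step = inject_fault(clean, kind="step", magnitude=5.0)
print(ramp[499], step[500])  # 350.0 355.0
```

A slowly growing ramp is typically harder to catch early than a step of the final size, because the deviation is small in the first samples after onset.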

Figures 24–27 show the contribution plot results of PCA and MSPCA. As in Fault 2, the contribution plot approach is first used to identify the variables correlated with the fault, and the contribution plots at the 500th sample are then compared to select the fault-correlated variables. Figure 24a,b shows that the PCA-based contribution plots identify five variables as taking part in the fault. Because conventional PCA represents the fault at a single scale, its low resolution cannot separate fault signatures that originate at different scales and affect the variables in the same direction; consequently, variable nine (the coolant feed temperature to the jacket, Tci) is again misleadingly and inaccurately flagged. The WT-based contribution plots for the detail functions (D2 and D3) and the approximation (A3) at the 500th sample are shown in Figures 25–27 and compared to identify the fault-correlated variables listed in Table 5. The MSPCA-based contribution plots identify four variables that take part in the fault, which are then entered into the SDG model to determine the fault propagation path. The results are again consistent across D2, D3, and A3, which suggests that the variables selected at any one of these scales can reliably be used to determine the fault propagation.
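The multiscale decomposition behind the D2, D3, and A3 plots can be sketched with a three-level discrete wavelet transform. The Haar wavelet is a swapped-in choice for simplicity, since this section does not state which wavelet the authors used, and the sketch returns wavelet coefficients rather than the reconstructed detail and approximation signals. In MSPCA as proposed by Bakshi (ref 27), each variable is decomposed in this way and a separate PCA model is built from the coefficients at each scale.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(len(signal) // 2)
    approx = [(signal[2 * k] + signal[2 * k + 1]) * s for k in pairs]
    detail = [(signal[2 * k] - signal[2 * k + 1]) * s for k in pairs]
    return approx, detail

def haar_decompose(signal, levels=3):
    """Return A_levels and a dict of detail coefficients D1..D_levels."""
    if len(signal) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), {}
    for j in range(1, levels + 1):
        approx, details[f"D{j}"] = haar_step(approx)   # recurse on the approximation
    return approx, details

# Hypothetical measurement: a unit step at sample 500 in a 1024-sample record
x = [0.0] * 500 + [1.0] * 524
a3, d = haar_decompose(x, levels=3)
print([k for k, c in enumerate(d["D3"]) if abs(c) > 1e-9])  # [62]
```

In this example the step appears only in D3 and A3 because sample 500 is a multiple of 4 and therefore falls on a block boundary at the two finest Haar levels; this is a property of the toy signal, not a result from the paper. The transform is orthonormal, so the signal energy is preserved across A3 and D1–D3.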

Figure 24.

PCA-based contribution plots of the fault in the reactor temperature. (a) T² contribution and (b) SPE contribution.

Figure 25.

MSPCA-based contribution plots of the fault in the reactor temperature. (a) T² contribution based on D2 and (b) SPE contribution based on D2.

Figure 26.

MSPCA-based contribution plots of the fault in the reactor temperature. (a) T² contribution based on D3 and (b) SPE contribution based on D3.

Figure 27.

MSPCA-based contribution plots of the fault in the reactor temperature. (a) T² contribution based on A3 and (b) SPE contribution based on A3.

Table 5. Results of Variable Selection by the Contribution Plot Approach.

Once the fault-correlated variables have been identified, they are entered into the SDG model built from process knowledge to determine the root cause. Figure 28a,b shows the fault diagnosis results. The root cause of this failure is the reactor temperature (T), a manipulated variable, indicated by the red node. The blue nodes mark the variables that deviate as the fault propagates, and the yellow node represents the outcome of the fault. Figure 28a shows that the conventional single-scale PCA-SDG diagnostic algorithm cannot efficiently identify the actual roots of the fault: it gives an inaccurate and misleading interpretation of the coolant feed temperature (Tci) node when the system components change at more than one scale. In contrast, the multiscale PCA-SDG results in Figure 28b represent the actual propagation of the fault. The description of Fault 3, which involves a problem with the coolant temperature in the jacket (Tc), can be used to verify this diagnostic conclusion. These results indicate that the proposed approach can isolate the root cause and identify the fault propagation path in the CSTR system.

Figure 28.

Fault propagation route of the fault in the reactor temperature. (a) Conventional PCA-SDG and (b) multiscale PCA-SDG based on the detail functions (D2 and D3) and approximation function (A3).

5. Conclusions

This study proposes a multiscale PCA-SDG-based process monitoring and fault diagnosis framework that improves fault search efficiency and diagnostic accuracy. Process monitoring and fault diagnosis algorithms based on conventional single-scale PCA-SDG and on multiscale PCA-SDG were implemented, tested, and compared on the CSTR system, including how each represents the causal relationships between process variables. The results show that the proposed multiscale PCA-SDG framework is more appropriate and accurate than the conventional single-scale framework. In fault detection, the MSPCA model detects faults earlier and more effectively than conventional PCA: the monitoring statistics exceed the confidence threshold, and both monitoring charts detect the faults efficiently while suppressing noise and disturbances. In fault identification, conventional PCA flags some variables misleadingly and inaccurately because of its single-scale nature, whereas the contribution plots based on the WT detail (D2 and D3) and approximation (A3) functions of multiscale PCA-SDG consistently identify the correct fault-correlated variables. In fault diagnosis, the conventional single-scale PCA-SDG framework cannot efficiently identify the actual roots of the fault, so it gives false and imprecise interpretations of the fault propagation path and misleading descriptions of fault nodes. The multiscale PCA-SDG framework, by contrast, represents the actual propagation of the fault. It correctly identifies the active process correction and efficiently separates deterministic and stochastic features.

Hence, the proposed multiscale PCA-SDG technique shows better fault detection, identification, and diagnosis performance than the conventional PCA-SDG technique. The approach is practical and adaptable, and it can be applied to other chemical process systems. It offers efficient fault search, high detection and diagnostic resolution, and accurate fault propagation paths, providing a new approach to ensuring process safety.

Acknowledgments

The authors thank Universiti Teknologi PETRONAS for administrative assistance, the Chemical Engineering Department for technical support, and Yayasan UTP (cost center: 015LC0-132) for funding support.

The authors declare no competing financial interest.

References

  1. Du X. Fault detection using bispectral features and one-class classifiers. J. Process Control 2019, 83, 1–10. 10.1016/j.jprocont.2019.08.007.
  2. Isermann R. Fault-diagnosis applications: model-based condition monitoring: actuators, drives, machinery, plants, sensors, and fault-tolerant systems; Springer Science & Business Media, 2011.
  3. Ming L.; Zhao J. Review on chemical process fault detection and diagnosis. In 2017 6th International Symposium on Advanced Control of Industrial Processes (AdCONIP), 28–31 May 2017; IEEE, 2017; pp. 457–462. 10.1109/ADCONIP.2017.7983824.
  4. Ayoubi M.; Isermann R. Neuro-fuzzy systems for diagnosis. Fuzzy Sets Syst. 1997, 89, 289–307. 10.1016/S0165-0114(97)00011-0.
  5. Severson K.; Chaiwatanodom P.; Braatz R. D. Perspectives on process monitoring of industrial systems. Annu. Rev. Control 2016, 42, 190–200. 10.1016/j.arcontrol.2016.09.001.
  6. Zhou Y.; Gao K.; Li D.; Xu Z.; Gao F. Data-Efficient Constrained Learning for Optimal Tracking of Batch Processes. Ind. Eng. Chem. Res. 2021, 60, 15658–15668. 10.1021/acs.iecr.1c02706.
  7. Park D.; Na J.; Lee J. M. Clustered Manifold Approximation and Projection for Semisupervised Fault Diagnosis and Process Monitoring. Ind. Eng. Chem. Res. 2021, 60, 9521–9531. 10.1021/acs.iecr.1c01271.
  8. Wang H.; Chai T.-Y.; Ding J.-L.; Brown M. Data Driven Fault Diagnosis and Fault Tolerant Control: Some Advances and Possible New Directions. Acta Autom. Sin. 2009, 35, 739–747. 10.1016/S1874-1029(08)60093-2.
  9. Frank P. M. Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy: A survey and some new results. Automatica 1990, 26, 459–474. 10.1016/0005-1098(90)90018-D.
  10. Gao K.; Lu J.; Xu Z.; Gao F. Control-Oriented Two-Dimensional Online System Identification for Batch Processes. Ind. Eng. Chem. Res. 2021, 60, 7656–7666. 10.1021/acs.iecr.1c00006.
  11. He Y.-L.; Ma Y.; Xu Y.; Zhu Q.-X. Fault Diagnosis Using Novel Class-Specific Distributed Monitoring Weighted Naïve Bayes: Applications to Process Industry. Ind. Eng. Chem. Res. 2020, 59, 9593–9603. 10.1021/acs.iecr.0c01071.
  12. Wang J.; Zhao Z.; Liu F. Robust Slow Feature Analysis for Statistical Process Monitoring. Ind. Eng. Chem. Res. 2020, 59, 12504–12513. 10.1021/acs.iecr.0c01512.
  13. Chen H.; Jiang B.; Ding S. X.; Huang B. Data-Driven Fault Diagnosis for Traction Systems in High-Speed Trains: A Survey, Challenges, and Perspectives. IEEE Trans. Intell. Transp. Syst. 2020, 1–17. 10.1109/TITS.2020.3029946.
  14. Iri M.; Aoki K.; O’Shima E.; Matsuyama H. An algorithm for diagnosis of system failures in the chemical process. Comput. Chem. Eng. 1979, 3, 489–493. 10.1016/0098-1354(79)80079-4.
  15. Maurya M. R.; Rengaswamy R.; Venkatasubramanian V. A signed directed graph-based systematic framework for steady-state malfunction diagnosis inside control loops. Chem. Eng. Sci. 2006, 61, 1790–1810. 10.1016/j.ces.2005.10.023.
  16. Kramer M. A.; Palowitch B. L. Jr. A rule-based approach to fault diagnosis using the signed directed graph. AIChE J. 1987, 33, 1067–1078. 10.1002/aic.690330703.
  17. Shiozaki J.; Matsuyama H.; O’Shima E.; Iri M. An improved algorithm for diagnosis of system failures in the chemical process. Comput. Chem. Eng. 1985, 9, 285–293. 10.1016/0098-1354(85)80006-5.
  18. Shiozaki J.; Shibata B.; Matsuyama H.; Shima E. O. Fault diagnosis of chemical processes utilizing signed directed graphs-improvement by using temporal information. IEEE Trans. Ind. Electron. 1989, 36, 469–474. 10.1109/41.43004.
  19. Vedam H.; Venkatasubramanian V. Signed digraph based multiple fault diagnosis. Comput. Chem. Eng. 1997, 21, S655–S660. 10.1016/S0098-1354(97)87577-1.
  20. Tarifa E. E.; Scenna N. J. Fault diagnosis, direct graphs, and fuzzy logic. Comput. Chem. Eng. 1997, 21, S649–S654. 10.1016/S0098-1354(97)87576-X.
  21. Vianna R. F.; McGreavy C. Qualitative modelling of chemical processes - a weighted digraph (WDG) approach. Comput. Chem. Eng. 1995, 19, 375–380. 10.1016/0098-1354(95)87065-2.
  22. Tarifa E. E.; Scenna N. J. Fault diagnosis for MSF dynamic states using a SDG and fuzzy logic. Desalination 2004, 166, 93–101. 10.1016/j.desal.2004.06.063.
  23. Vedam H.; Venkatasubramanian V. PCA-SDG based process monitoring and fault diagnosis. Control Eng. Prac. 1999, 7, 903–917. 10.1016/S0967-0661(99)00040-4.
  24. Lee G.; Tosukhowong T.; Lee J. H.; Han C. Fault Diagnosis Using the Hybrid Method of Signed Digraph and Partial Least Squares with Time Delay: The Pulp Mill Process. Ind. Eng. Chem. Res. 2006, 45, 9061–9074. 10.1021/ie060793j.
  25. Maurya M. R.; Rengaswamy R.; Venkatasubramanian V. A Signed Directed Graph and Qualitative Trend Analysis-Based Framework for Incipient Fault Diagnosis. Chem. Eng. Res. Des. 2007, 85, 1407–1422. 10.1016/S0263-8762(07)73181-7.
  26. Mnassri B.; Adel E. M. E.; Ananou B.; Ouladsine M. Fault Detection and Diagnosis Based on PCA and a New Contribution Plot. IFAC Proc. Vol. 2009, 42, 834–839. 10.3182/20090630-4-ES-2003.00137.
  27. Bakshi B. R. Multiscale PCA with application to multivariate statistical process monitoring. AIChE J. 1998, 44, 1596–1610. 10.1002/aic.690440712.
  28. Maulud A. H. S.; Wang D.; Romagnoli J. A. Wavelet-based nonlinear multivariate statistical process control. In Computer Aided Chemical Engineering; Puigjaner L.; Espuña A., Eds.; Vol. 20; Elsevier, 2005; pp. 1321–1326.
  29. Maulud A.; Wang D.; Romagnoli J. A. A multi-scale orthogonal nonlinear strategy for multi-variate statistical process monitoring. J. Process Control 2006, 16, 671–683. 10.1016/j.jprocont.2006.01.006.
  30. Nawaz M.; Maulud A. S.; Zabiri H.; Suleman H.; Tufa L. D. Multiscale Framework for Real-Time Process Monitoring of Nonlinear Chemical Process Systems. Ind. Eng. Chem. Res. 2020, 59, 18595–18606. 10.1021/acs.iecr.0c02288.
  31. Nawaz M.; Maulud A. S.; Zabiri H.; Taqvi S. A. A.; Idris A. Improved process monitoring using the CUSUM and EWMA-based multiscale PCA fault detection framework. Chin. J. Chem. Eng. 2021, 29, 253–265. 10.1016/j.cjche.2020.08.035.
  32. Taqvi S. A. A.; Zabiri H.; Tufa L. D.; Uddin F.; Fatima S. A.; Maulud A. S. A Review on Data-Driven Learning Approaches for Fault Detection and Diagnosis in Chemical Processes. ChemBioEng Rev. 2021, 8, 239–259. 10.1002/cben.202000027.
  33. Pearson K. Principal components analysis. London Edinburgh Dublin Philos. Mag. J. Sci. 1901, 6, 559.
  34. Hotelling H. Analysis of a complex of statistical variables with principal components. J. Educ. Psychol. 1933, 24, 498–520. 10.1037/h0070888.
  35. Singh R. P.; Likins P. W. Singular Value Decomposition for Constrained Dynamical Systems. J. Appl. Mech. 1985, 52, 943–948. 10.1115/1.3169173.
  36. Strang G. Linear algebra and its applications; Harcourt Brace Jovanovich, 1988.
  37. Botre C.; Mansouri M.; Nounou M.; Nounou H.; Karim M. N. Kernel PLS-based GLRT method for fault detection of chemical processes. J. Loss Prev. Process Ind. 2016, 43, 212–224. 10.1016/j.jlp.2016.05.023.
  38. Zhu J.; Ge Z.; Song Z. Distributed Parallel PCA for Modeling and Monitoring of Large-Scale Plant-Wide Processes With Big Data. IEEE Trans. Ind. Inform. 2017, 13, 1877–1885. 10.1109/TII.2017.2658732.
  39. Josse J.; Husson F. Selecting the number of components in principal component analysis using cross-validation approximations. Comput. Stat. Data Anal. 2012, 56, 1869–1879. 10.1016/j.csda.2011.11.012.
  40. Saccenti E.; Camacho J. Determining the number of components in principal components analysis: A comparison of statistical, crossvalidation and approximated methods. Chemom. Intell. Lab. Syst. 2015, 149, 99–116. 10.1016/j.chemolab.2015.10.006.
  41. Zhu M.; Ghodsi A. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Comput. Stat. Data Anal. 2006, 51, 918–930. 10.1016/j.csda.2005.09.010.
  42. Jackson J. E.; Mudholkar G. S. Control Procedures for Residuals Associated With Principal Component Analysis. Technometrics 1979, 21, 341–349. 10.1080/00401706.1979.10489779.
  43. Mallat S. G. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693. 10.1109/34.192463.
  44. Nounou H. N.; Nounou M. N. Multiscale fuzzy Kalman filtering. Eng. Appl. Artif. Intell. 2006, 19, 439–450. 10.1016/j.engappai.2005.11.001.
  45. Yoon S.; MacGregor J. F. Principal-component analysis of multiscale data for process monitoring and fault diagnosis. AIChE J. 2004, 50, 2891–2903. 10.1002/aic.10260.
  46. Umeda T.; Kuriyama T.; O’Shima E.; Matsuyama H. A graphical approach to cause and effect analysis of chemical processing systems. Chem. Eng. Sci. 1980, 35, 2379–2388. 10.1016/0009-2509(80)85051-2.
  47. Yong-kuo L.; Abiodun A.; Zhi-bin W.; Mao-pu W.; Min-jun P.; Wei-feng Y. A cascade intelligent fault diagnostic technique for nuclear power plants. J. Nucl. Sci. Technol. 2018, 55, 254–266. 10.1080/00223131.2017.1394228.
  48. Chang C. C.; Yu C. C. On-line fault diagnosis using the signed directed graph. Ind. Eng. Chem. Res. 1990, 29, 1290–1299. 10.1021/ie00103a031.
  49. Nam D. S.; Han C.; Jeong C. W.; Yoon E. S. Automatic construction of extended symptom-fault associations from the signed digraph. Comput. Chem. Eng. 1996, 20, S605–S610. 10.1016/0098-1354(96)00110-X.
  50. Ould Bouamama B.; Biswas G.; Loureiro R.; Merzouki R. Graphical methods for diagnosis of dynamic systems: Review. Annu. Rev. Control 2014, 38, 199–219. 10.1016/j.arcontrol.2014.09.004.
  51. Ji C.; Zhu X.; Ma F.; Wang J.; Sun W. Fault Diagnosis Algorithm of Chemical Process Based on Information Entropy. Chem. Eng. Trans. 2020, 81, 541–546.
  52. Qin S. J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234. 10.1016/j.arcontrol.2012.09.004.
  53. Kaisare N. S. Computational techniques for process simulation and analysis using Matlab®; CRC Press, 2017. 10.1201/9781315119519.

Articles from ACS Omega are provided here courtesy of American Chemical Society