Journal of Applied Statistics. 2023 Jan 30;51(5):935–957. doi: 10.1080/02664763.2023.2170991

Aggregated parameter update schemes for monitoring binary profiles

Yifan Li a, Chunjie Wu a, Zhijun Wang a, Zhiming Hu b
PMCID: PMC10956921  PMID: 38524793

Abstract

Profile monitoring is one of the most important topics for statistical process control. Traditional self-starting profile monitoring schemes generally use all historical observations to estimate parameters. Because of the rapid increase in the complexity of modern statistical processes, the practitioners often need to deal with massive datasets in process monitoring. However, when observations of each period are of large sample size and the computation is of high complexity, the traditional method is not economical and urgently needs a parameter update strategy. Under the framework of binary profile monitoring, this paper proposes a novel recursive update strategy based on the aggregated estimation equation (AEE) for massive datasets and designs a self-starting control chart accordingly. Numerical simulation verifies that the proposed method performs better in parameter estimation and process monitoring. In addition, we give the asymptotic property of the proposed monitoring statistic and illustrate our method's superiority by a real-data example.

Keywords: Aggregated estimation equation, binary profile monitoring, massive dataset, recursive update, self-starting scheme

1. Introduction

Control charts are usually designed to monitor processes with time-series observations. In some industrial applications, single variables, like pressure and weight, are used to describe the process state, while multiple variables can also be used. According to the number of dimensions, modern statistical process control (SPC) research can be roughly divided into univariate control charts and multivariate control charts. In some special systems, the quality of a process may be more adequately captured by a relationship between the quality characteristic (the response variable) and one or more predictor variables (the independent variables) [25,41], which derives another important problem called profile monitoring. Our paper concentrates on developing an integrated parameter update scheme to improve the performance of monitoring binary profile processes.

The profile monitoring problem is of great importance in the field of SPC [30], has broad applications and has received extensive attention in recent years [1,27,32]. Kang and Albin [13] first studied monitoring simple linear profile (SLP) data and proposed two monitoring schemes. Kim et al. [18] comprehensively studied monitoring linear profile data and presented three Phase II univariate control charts that monitor the intercept, slope, and variance of the regression error, respectively. Zou et al. [43] proposed a multivariate exponential weighted moving average (EWMA) control chart for general linear profiles, by introducing the variable sampling interval (VSI) and the parametric diagnostic approach. Khedmati and Niaki [16] presented a MaxEWMA control chart to monitor a multi-stage process with a single statistic. Amiri et al. [2] investigated the effect of parameter dimensionality reduction when monitoring Phase II multivariate linear profile data. Based on SLP, researchers have further studied more complex profiles, like polynomial profiles, multivariate linear profiles with correlation patterns. To deal with data imbalance and data missing, Jensen et al. [12] used a linear mixed model to describe linear profile data, explaining the potential correlation structure. Kazemzadeh et al. [14] carried out a study on monitoring polynomial regression profile data, using a likelihood ratio test to locate parameter shifts. In addition, [15] made further efforts on monitoring polynomial regression profile data with first-order autocorrelation. Amiri et al. [4] proposed a MEWMA control chart to simultaneously monitor correlated profile data and multivariate normal quality characteristics. Zhang et al. [42] used the Gaussian process model to describe the profile relationship and then constructed two Phase II control charts that monitor the linear trend and correlation item, respectively, for monitoring linear profile data within correlation. 
The research mentioned above generally assumes that the response variable is continuous and follows a normal distribution. In industrial applications, there is an extensive need for monitoring profiles with categorical response variables such as binary profiles [10,36,41]. Kinat et al. [19] proposed a profile monitoring scheme based on generalized linear modeling when the response variable follows the inverse Gaussian distribution. Soleimani and Asadzadeh [37] developed a monitoring approach based on the generalized likelihood ratio for gamma-distributed responses as a remedial measure to reduce the detrimental effects of non-normality under generalized linear models. Previous works on binary profile monitoring show that the logistic regression model is consistent in describing the relationship between a binary response variable and multivariate predictor variables. Yeh et al. [41] used the logistic regression model to describe the functional relationship and gave several Hotelling's $T^2$ control charts to monitor binary profiles. Combining the ideas of EWMA and the likelihood ratio test based on logistic regression, Shang et al. [35] proposed a scheme that can monitor shifts of binary profiles and of the predictor variables concurrently. It has the advantages of implementation flexibility, calculation convenience, and monitoring sensitivity. For further results, see [34,38].

However, there is a lack of literature on self-starting binary profile monitoring schemes, which are frequently used when prior information about process parameters is insufficient [20,21,31]. Zou et al. [44] introduced the idea of ‘self-starting’ into linear profile monitoring to effectively identify shifts in the intercept, slope, or standard deviation of the profile when there is a lack of historical observations in Phase I. Ghashghaei et al. [9] proposed a self-starting Max-CUSUM control chart based on recursive residuals to monitor the mean vector and variability of a simple linear profile simultaneously. Amiri et al. [3] proposed a self-starting control chart to simultaneously monitor the mean and variance of a simple linear profile in the presence of an AR(1) autocorrelation structure within the profile. Xia and Tsung [40] constructed a sequential Wald-type self-starting charting statistic for the simultaneous detection of changes in the variance and coefficients of linear profiles with unknown error distributions. Generally, self-starting control schemes use all historical samples for parameter estimation. However, with the rapidly increasing complexity of modern statistical processes, massive datasets have proliferated in monitoring problems in recent years, and this practice may not be appropriate when the observations of each period have a large sample size and the computation is complex. Therefore, an online update strategy for parameter estimation with high accuracy and calculation convenience is urgently needed. Khosravi and Amiri [17] first introduced the idea of self-starting binary profile monitoring based on the logistic regression model, and accordingly designed three self-starting control charts. Their work inspires us to develop an innovative parameter update strategy that reduces the computational load and improves the monitoring performance of self-starting binary profile monitoring.

Although the parameter update strategy of the self-starting scheme in Khosravi and Amiri [17] does reduce the computational burden, it still has some weaknesses when dealing with massive datasets. Firstly, Khosravi and Amiri [17] simply took the average of all historical estimates of the logistic regression coefficients as the in-control (IC) coefficients, which leads to a severe decline in estimation accuracy in binary profile monitoring. Secondly, although it avoids repeated calculation of the IC parameters, it still needs to store all collected data, which may limit its practicality because of the memory constraints posed by massive datasets.

Based on the shortcomings mentioned above, this paper concentrates on the parameter update strategy for online self-starting monitoring of binary profiles. Online monitoring schemes are supposed to deal with characteristics that change in real time and achieve better detection performance [7,24]. Lin and Xi [23] borrowed the idea of ‘divide-and-conquer’ and proposed the aggregated estimation equation (AEE) estimator for massive datasets. Inspired by Lin and Xi [23], this paper presents a more effective parameter update approach for better monitoring of binary profiles. The parameter integration recursive update strategy is used to continuously estimate the logistic regression coefficients and the corresponding covariance matrix, which improves the estimation accuracy and the sensitivity of self-starting monitoring for binary profiles. Furthermore, the given monitoring scheme for binary profiles is merely a representative example of our key innovation. We expect the paper to provide a valuable reference for online monitoring research on other types of processes.

The remainder of this paper is organized as follows. In the next section, we first review the method of the AEE and construct an online update estimation strategy based on it, which is one of the main contributions of this paper, and a new self-starting control chart is constructed accordingly. Moreover, the asymptotic property of the proposed monitoring statistic is provided and proved. In Section 3, the performance of the update strategy designed is compared with the existing self-starting estimation method, and numerical simulations are conducted to evaluate the out-of-control (OC) performance of the novel self-starting control chart. In Section 4, we illustrate the superiorities of our design using a real-data example. Several remarks conclude the paper and further studies are mentioned in Section 5.

2. Methodology

In Section 2.1, we first provide a recursive update method based on the AEE estimator, and a self-starting monitoring scheme is constructed based on the new parameter estimation strategy in Section 2.2. After that, some theoretical properties of the self-starting monitoring statistic are discussed in Section 2.3.

2.1. An online update estimation based on the AEE estimator

In practice, self-starting methods, which handle sequential monitoring by simultaneously updating parameter estimates and checking for OC conditions, are generally used when a sufficiently large reference dataset is unavailable. Zou et al. [44] proposed a self-starting monitoring scheme to monitor linear profile data. Generally, in the self-starting monitoring of general profiles, to determine whether the process is out of control, we first use historical data to obtain the estimator of the in-control regression parameter $\beta$ (denoted as $\hat{\beta}^{(k)}$ in this subsection); then the parameters estimated from the current sample (denoted as $\hat{\beta}_k$ in this subsection) are compared with $\hat{\beta}^{(k)}$, and the self-starting control chart should tend to trigger a signal if the difference is large [17,41,43]. In the self-starting monitoring of binary profiles, the procedure is similar. However, the estimation of regression parameters for binary data usually has no explicit formula (e.g. the maximum likelihood estimation, MLE) and involves substantial numerical computation. When the sample size of each period is large, the computing time and memory required to estimate $\hat{\beta}^{(k)}$ may be prohibitive.

To reduce the computational burden and avoid repeated calculations in the on-line estimation of IC parameters, a plausible approach is to replace the common estimation procedure, which pools all previous samples, by averaging the estimates obtained in previous periods and re-estimating the IC parameters with a recursive update [17]. Specifically, take $\bar{\beta}^{(k)}$ as the estimator of the IC process, defined as the average of the $k$ per-sample estimates $\{\hat{\beta}_1,\hat{\beta}_2,\ldots,\hat{\beta}_k\}$, that is,

$$\bar{\beta}^{(k)}=\frac{1}{k}\sum_{i=1}^{k}\hat{\beta}_i, \tag{1}$$

where $\hat{\beta}_i$ is the MLE of the logistic regression coefficient vector based on the $i$th sample. Thus the recursive update form of the IC parameters can be deduced as:

$$\bar{\beta}^{(k)}=\left(1-\frac{1}{k}\right)\bar{\beta}^{(k-1)}+\frac{1}{k}\hat{\beta}_k. \tag{2}$$

Equation (2) expresses the cumulative estimate up to period $k$, denoted by $\bar{\beta}^{(k)}$, as a linear combination of $\bar{\beta}^{(k-1)}$ and $\hat{\beta}_k$. Here $\bar{\beta}^{(k-1)}$ denotes the cumulative estimate up to period $k-1$, while $\hat{\beta}_k$ follows the definition in Equation (1).
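The recursion in Equation (2) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `mveu_update` and the random stand-in estimates are illustrative and not part of the original study.

```python
import numpy as np

def mveu_update(beta_bar_prev, beta_hat_k, k):
    """Equation (2): running mean of the per-period estimates."""
    return (1.0 - 1.0 / k) * beta_bar_prev + beta_hat_k / k

rng = np.random.default_rng(0)
# Hypothetical stand-ins for 20 per-period MLEs of a 3-dimensional coefficient vector.
estimates = rng.normal(size=(20, 3))

beta_bar = np.zeros(3)  # the starting value is irrelevant: its weight is zero at k = 1
for k, beta_hat_k in enumerate(estimates, start=1):
    beta_bar = mveu_update(beta_bar, beta_hat_k, k)

# The recursion reproduces the batch average in Equation (1).
print(np.allclose(beta_bar, estimates.mean(axis=0)))
```

Only the previous mean and the period counter need to be kept, which is what makes the update cheap in memory.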

By using the parameter update methodology in Equation (1), some monitoring statistics based on measuring the differences between $\hat{\beta}_k$ and $\bar{\beta}^{(k-1)}$ can be constructed to detect anomalies in processes with low computing costs; refer to Khosravi and Amiri [17] for examples. However, using this rough update strategy for binary profiles may severely degrade the estimation accuracy and monitoring efficiency. Specifically, for a general profile monitoring problem, assume for simplicity that every period has the same sample size $n$, and denote the $k$th sample's observations as $\{z_{k,1},\ldots,z_{k,n}\}$, with $z_{k,i}=(y_{k,i},x_{k,i}^T)$ for $i=1,\ldots,n$, $k\geq 1$. Then $\hat{\beta}_k$ is the solution of

$$M_k(\beta)=\sum_{i=1}^{n}\psi(z_{k,i},\beta)=0, \tag{3}$$

where $\psi$ is a score function satisfying $\sum_{j=1}^{k}\sum_{i=1}^{n}E[\psi(z_{j,i},\beta)]=0$ for some $\beta\in\mathbb{R}^p$. Generally, for a given sample size $n$, the MLE $\hat{\beta}_k$ is not an unbiased estimator of $\beta_0$, namely, $E(\hat{\beta}_k)=\beta_0'\neq\beta_0$, except for the linear regression model under the Gaussian assumption. According to the Law of Large Numbers, $\bar{\beta}^{(k)}=\frac{1}{k}\sum_{i=1}^{k}\hat{\beta}_i\xrightarrow{p}\beta_0'\neq\beta_0$. This means that even if $k$ is arbitrarily large, the estimator $\bar{\beta}^{(k)}$ is not consistent. In Section 3.1, numerical results show that for a fixed $n$, $\|\bar{\beta}^{(k)}-\beta_0\|\nrightarrow 0$ even if $k\to\infty$ ($\beta_0$ represents the real $\beta$, and $\|\cdot\|$ is the $L_2$ norm). A more suitable linear combination of the $\hat{\beta}_k$ is required to improve the estimation performance for better self-starting monitoring efficiency. For the traditional linear profile model under the Gaussian assumption (i.e. $\psi(z,\beta)=x(y-x^T\beta)$), a natural way is to take $\bar{\beta}^{(k)}=\left(\sum_{i=1}^{k}X_i^TX_i\right)^{-1}\sum_{i=1}^{k}X_i^TX_i\hat{\beta}_i=(X^TX)^{-1}X^Ty$, which is equivalent to directly using the maximum likelihood estimate from all collected data and is unbiased and consistent, where $X_i=(x_{i,1},\ldots,x_{i,n})^T$ and $X=(X_1^T,\ldots,X_k^T)^T$ are the design matrices. However, for generalized linear profiles, the construction of a linear combination for better estimation is not trivial because of the nonlinearity of $\psi(\cdot)$.
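The linear-model identity stated above, that weighting each per-period least-squares estimate by its Gram matrix recovers the pooled estimate, can be illustrated numerically. This is a small sketch under assumed settings (five periods, 40 observations each, illustrative coefficients), not a reproduction of any experiment in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n, p = 5, 40, 3                        # illustrative sizes (assumptions)
beta_true = np.array([1.0, -2.0, 0.5])    # illustrative coefficients (assumption)

blocks = []
for _ in range(K):
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(size=n)
    blocks.append((X, y))

gram_sum = np.zeros((p, p))
weighted_sum = np.zeros(p)
for X, y in blocks:
    G = X.T @ X                                  # X_i^T X_i
    beta_hat_i = np.linalg.solve(G, X.T @ y)     # per-period least squares
    gram_sum += G
    weighted_sum += G @ beta_hat_i
beta_aggregated = np.linalg.solve(gram_sum, weighted_sum)

# Pooled least squares on all data at once, i.e. (X^T X)^{-1} X^T y.
X_all = np.vstack([X for X, _ in blocks])
y_all = np.concatenate([y for _, y in blocks])
beta_pooled = np.linalg.solve(X_all.T @ X_all, X_all.T @ y_all)

print(np.allclose(beta_aggregated, beta_pooled))
```

The equality is exact in the linear case because $X_i^TX_i\hat{\beta}_i=X_i^Ty_i$; no such identity holds for a nonlinear score.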

Inspired by the AEE method of Lin and Xi [23], which was proposed for estimating-equation estimation on massive datasets, this paper presents a new recursive update strategy to solve the problem above and improve the self-starting monitoring performance for binary profiles. The computational burden is greatly reduced because repeated time-consuming computation is avoided in parameter estimation, and memory is saved because far fewer collected observations need to be stored.

Lin and Xi [23] used the first-order Taylor expansion to approximate the estimating equation (3):

$$M_k(\beta)=-A_k(\beta-\hat{\beta}_k)+R_2=D_k(\beta)+R_2, \tag{4}$$

where

$$A_k=-\sum_{i=1}^{n}\frac{\partial\psi(z_{k,i},\hat{\beta}_k)}{\partial\beta}, \tag{5}$$

and $R_2$ is the residual term in the Taylor expansion. Based on the estimating equation estimates for $K$ subsets, Lin and Xi [23] proposed a new estimator $\hat{\beta}^{(K)}$ by solving $D(\beta)=\sum_{k=1}^{K}D_k(\beta)=0$ as follows:

$$\hat{\beta}^{(K)}=\left(\sum_{k=1}^{K}A_k\right)^{-1}\sum_{k=1}^{K}A_k\hat{\beta}_k. \tag{6}$$

It is proved in Lin and Xi [23] that the proposed AEE estimator is strongly consistent under specific conditions (see the Supplementary Materials for details).
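For readers who want the intermediate step, the closed form in Equation (6) follows directly from the linearised equation. A short derivation, assuming $\sum_{k}A_k$ is invertible (the overall sign of $D_k$ does not affect the solution):

```latex
\begin{aligned}
D(\beta) &= \sum_{k=1}^{K} D_k(\beta) = -\sum_{k=1}^{K} A_k\,(\beta-\hat{\beta}_k) = 0 \\
\Longrightarrow\quad \Big(\sum_{k=1}^{K} A_k\Big)\beta &= \sum_{k=1}^{K} A_k\hat{\beta}_k \\
\Longrightarrow\quad \hat{\beta}^{(K)} &= \Big(\sum_{k=1}^{K} A_k\Big)^{-1}\sum_{k=1}^{K} A_k\hat{\beta}_k .
\end{aligned}
```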

Motivated by Equation (6), this paper obtains a new recursive update form for sequentially monitoring binary profiles:

$$\hat{\beta}^{(k)}\triangleq\left[\left(\sum_{i=1}^{k-1}A_i\right)+A_k\right]^{-1}\left[\left(\sum_{i=1}^{k-1}A_i\hat{\beta}_i\right)+A_k\hat{\beta}_k\right]=\left(A^{(k-1)}+A_k\right)^{-1}\left[A^{(k-1)}\hat{\beta}^{(k-1)}+A_k\hat{\beta}_k\right], \tag{7}$$

where $\hat{\beta}^{(k)}$ is a cumulative estimate of the previous $k$ subsets obtained by recursive update, and $A^{(k-1)}\triangleq\sum_{i=1}^{k-1}A_i$ denotes the sum of the first $k-1$ matrices $A_i$ defined in Equation (5). Equation (7) gives a novel recursive update form of $\beta$, which only needs to store the process estimates $\hat{\beta}^{(k-1)}$ and $A^{(k-1)}$ when estimating $\hat{\beta}^{(k)}$. When the observations of the $k$th period are collected, $\hat{\beta}_k$ and $A_k$ are calculated first. Together with the recursive update estimates $\hat{\beta}^{(k-1)}$ and $A^{(k-1)}$, $\hat{\beta}^{(k)}$ can then be obtained from Equation (7). Accordingly, the sum of the $A_i$ is updated by $A^{(k)}=A^{(k-1)}+A_k$, and so on.

Furthermore, this paper proposes a novel recursive update estimation of the covariance matrix for the construction of monitoring statistics. From Equation (7) and the assumption that samples from different periods are independent, the covariance matrix of $\hat{\beta}^{(k)}$ can be recursively updated by:

$$\begin{aligned}\hat{V}^{(k)}\triangleq\widehat{\mathrm{cov}}(\hat{\beta}^{(k)})&=\widehat{\mathrm{cov}}\left[\left(A^{(k-1)}+A_k\right)^{-1}\left(A^{(k-1)}\hat{\beta}^{(k-1)}+A_k\hat{\beta}_k\right)\right]\\&=\left(A^{(k-1)}+A_k\right)^{-1}\mathrm{Var}\left(A^{(k-1)}\hat{\beta}^{(k-1)}+A_k\hat{\beta}_k\right)\left[\left(A^{(k-1)}+A_k\right)^{-1}\right]^T\\&=\left(A^{(k-1)}+A_k\right)^{-1}\left(A^{(k-1)}\hat{V}^{(k-1)}\{A^{(k-1)}\}^T+A_k\hat{V}_kA_k^T\right)\left[\left(A^{(k-1)}+A_k\right)^{-1}\right]^T,\end{aligned} \tag{8}$$

where $\hat{V}_k$ is the covariance matrix estimate corresponding to the single-period estimate $\hat{\beta}_k$. As with Equation (7), the above formula does not need to store all the historical samples and only carries a few summary estimates forward during process monitoring. We regard this as a genuine recursive update, and it is one of the main contributions of this paper. It uses information on both the mean level and the volatility (the covariance matrix level), so this strategy has better estimation accuracy than simply averaging the historical estimates.

In this paper, we define the new recursive parameter update based on the AEE method as the aggregated equation estimation update (AEEU). The previous recursive update in Khosravi and Amiri [17] is denoted as the mean value estimation update (MVEU). In Section 3.1, a detailed numerical simulation is conducted to illustrate the superior estimation performance of the AEEU compared with the MVEU.
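The updates in Equations (7) and (8) can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: the class name `AEEU`, the helper `logistic_mle` (a plain Newton-Raphson solver with a fixed number of iterations), the choice $\hat{V}_k=(A_k)^{-1}$ as default, and the simulated coefficients are all hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logistic_mle(X, y, n_iter=25):
    """Newton-Raphson MLE for one period; returns (beta_hat_k, A_k)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        A = (X * (mu * (1.0 - mu))[:, None]).T @ X
        beta = beta + np.linalg.solve(A, X.T @ (y - mu))
    mu = sigmoid(X @ beta)
    A_k = (X * (mu * (1.0 - mu))[:, None]).T @ X   # Equation (5) evaluated at beta_hat_k
    return beta, A_k

class AEEU:
    """Keeps only A^(k), beta^(k) and V^(k); raw observations are not stored."""
    def __init__(self, p):
        self.A = np.zeros((p, p))
        self.beta = np.zeros(p)
        self.V = np.zeros((p, p))

    def update(self, beta_k, A_k, V_k=None):
        if V_k is None:
            V_k = np.linalg.inv(A_k)               # assumed single-period covariance
        A_new = self.A + A_k
        W = np.linalg.inv(A_new)
        self.beta = W @ (self.A @ self.beta + A_k @ beta_k)                  # Equation (7)
        self.V = W @ (self.A @ self.V @ self.A.T + A_k @ V_k @ A_k.T) @ W.T  # Equation (8)
        self.A = A_new
        return self.beta, self.V

rng = np.random.default_rng(0)
beta_true = np.array([0.5, -1.0, 1.0])   # illustrative values, not the paper's setting
p, n, K = 3, 500, 20
state = AEEU(p)
A_total = np.zeros((p, p))
for k in range(K):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
    beta_k, A_k = logistic_mle(X, y)
    state.update(beta_k, A_k)
    A_total += A_k

print(state.beta)
```

After each call to `update`, the period's data can be discarded; the stored state has size of order $p^2$ regardless of how many periods have been seen.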

2.2. A self-starting control chart based on the AEEU

Before introducing our binary profile monitoring statistic, consider multivariate statistical process control. Assume that $(x_1,x_2,\ldots,x_M)$ is the observation sequence of a $p$-dimensional process and that the IC distribution of the target process is multivariate normal, that is, $x_i\sim N_p(\mu_0,\Sigma_0)$, $i=1,2,\ldots,M$. The goal of the multivariate control chart is to detect possible mean shifts of the process. Hotelling [11] first introduced the $T^2$ statistic to monitor the multivariate process:

$$T_{0,i}^2=(x_i-\mu_0)^T\Sigma_0^{-1}(x_i-\mu_0). \tag{9}$$

Here $T_{0,i}^2$ measures the distance between the target observation $x_i$ and the IC mean value $\mu_0$ at the $i$th time point. If the process is still IC at time $T=i$, the following holds:

$$T_{0,i}^2\sim\chi_p^2. \tag{10}$$

When the IC distribution is unavailable, $\mu_0$ and $\Sigma_0$ in Equation (9) can generally be replaced by their estimates. Denote the $1-\alpha$ quantile of $\chi_p^2$ as $\chi_{1-\alpha,p}^2$, where $\alpha\in[0,1]$ is a given significance level in process monitoring; the control chart is supposed to alert when $T_{0,i}^2>\chi_{1-\alpha,p}^2$. Hotelling's $T^2$ statistic has had far-reaching influence in multivariate SPC and many other fields; refer to [8,26,28] for more discussion of the properties and improvement of related control charts.
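As a small worked example of Equations (9) and (10), the sketch below computes $T^2$ for two hypothetical observations of a bivariate process. The IC parameters and the observations are made up for illustration; for $p=2$ the chi-square upper-tail probability is $e^{-x/2}$, so the $1-\alpha$ quantile is $-2\log\alpha$ and no statistics library is needed.

```python
import numpy as np

def hotelling_t2(x, mu0, sigma0):
    """Equation (9): squared Mahalanobis distance from the IC mean."""
    d = np.asarray(x, dtype=float) - np.asarray(mu0, dtype=float)
    return float(d @ np.linalg.solve(sigma0, d))

mu0 = np.zeros(2)                              # assumed IC mean
sigma0 = np.array([[1.0, 0.5], [0.5, 2.0]])    # assumed IC covariance
alpha = 0.005
limit = -2.0 * np.log(alpha)   # chi-square quantile, valid for p = 2 only

t2_ic = hotelling_t2([0.3, -0.4], mu0, sigma0)   # close to the IC mean
t2_oc = hotelling_t2([4.0, -3.0], mu0, sigma0)   # far from the IC mean
print(t2_ic < limit, t2_oc > limit)
```

Using `np.linalg.solve` instead of forming $\Sigma_0^{-1}$ explicitly is a common numerical choice and does not change the statistic.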

Based on the AEEU constructed in Section 2.1, we propose a novel self-starting control chart for monitoring binary profiles. The new self-starting Hotelling's $T^2$-type statistic is defined as:

$$SST_k^2=(\hat{\beta}_k-\hat{\beta}^{(k-1)})^T(\hat{\Sigma})^{-1}(\hat{\beta}_k-\hat{\beta}^{(k-1)}),\quad k=1,2,\ldots, \tag{11}$$

where $\hat{\beta}_k$ is the MLE based on the $k$th period sample and $\hat{\beta}^{(k-1)}$ is the AEEU estimate obtained from the previous $k-1$ period samples. The difference $(\hat{\beta}_k-\hat{\beta}^{(k-1)})$ then measures the distance between the estimate from the current sample and the IC value estimated from historical data.

The value of $\hat{\Sigma}$ in Equation (11) is of great importance. In this paper, we set $\hat{\Sigma}=\widehat{\mathrm{cov}}(\hat{\beta}_k-\hat{\beta}^{(k-1)})=\hat{V}^{(k-1)}+\hat{V}_k$ and thus obtain the self-starting control chart based on the AEEU:

$$SST_k^2=(\hat{\beta}_k-\hat{\beta}^{(k-1)})^T(\hat{V}^{(k-1)}+\hat{V}_k)^{-1}(\hat{\beta}_k-\hat{\beta}^{(k-1)}),\quad k=1,2,\ldots, \tag{12}$$

where $\hat{V}_k$ is the covariance matrix estimate based on the $k$th sample and $\hat{V}^{(k-1)}$ is the AEEU estimate based on the previous $k-1$ samples. In this paper, we choose $(A_k)^{-1}$ as the covariance estimator $\hat{V}_k$, i.e. the estimator of the asymptotic covariance of the MLE. The control chart triggers a signal when $SST_k^2>h$, where $h$ is the control limit obtained from a pre-specified IC average run length ($ARL_0$). When the current MLE $\hat{\beta}_k$ deviates far from the historical IC estimate $\hat{\beta}^{(k-1)}$, the monitoring statistic $SST_k^2$ becomes large and is more likely to trigger an alarm.
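The charting statistic in Equation (12) is a quadratic form and takes only a few lines to compute. The sketch below uses hypothetical inputs (the function name `sst2` and all numbers are illustrative) chosen so that the value can be checked by hand.

```python
import numpy as np

def sst2(beta_k, V_k, beta_hist, V_hist):
    """Equation (12): distance between the current MLE and the cumulative estimate."""
    d = np.asarray(beta_k, dtype=float) - np.asarray(beta_hist, dtype=float)
    return float(d @ np.linalg.solve(V_hist + V_k, d))

beta_hist = np.array([0.0, 1.0])     # hypothetical cumulative estimate from past periods
V_hist = 0.01 * np.eye(2)            # hypothetical cumulative covariance estimate
V_k = 0.04 * np.eye(2)               # hypothetical single-period covariance estimate

stat_small = sst2(np.array([0.1, 1.2]), V_k, beta_hist, V_hist)   # (0.01 + 0.04) / 0.05
stat_large = sst2(np.array([1.0, 2.0]), V_k, beta_hist, V_hist)   # (1 + 1) / 0.05
print(stat_small, stat_large)
```

A signal would be raised whenever the returned value exceeds the chosen control limit $h$.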

Remark 2.1

The constructed self-starting control chart (12) for monitoring binary profiles is a Hotelling $T^2$-type statistic. The formula shows that it differs from CUSUM-type or EWMA-type control charts: it only measures the deviation of the current sample from the IC distribution and does not accumulate historical deviations. However, a memory-based self-starting control chart can be designed similarly from the AEEU. For instance, let

$$z_k=(\hat{V}^{(k-1)}+\hat{V}_k)^{-\frac{1}{2}}(\hat{\beta}_k-\hat{\beta}^{(k-1)}),$$

thus an EWMA-type self-starting control chart can be constructed accordingly:

$$u_k=\lambda z_k+(1-\lambda)u_{k-1},\quad k=1,2,\ldots,$$

where $0<\lambda<1$ is a smoothing parameter, and the control chart triggers an alarm when $u_k^Tu_k$ exceeds the prespecified control limit. Because the improvement in self-starting monitoring proposed in this paper lies primarily in the parameter update rather than in the form of the control chart, we focus on the theoretical properties and monitoring performance of the $T^2$-type control chart (12). The properties and improvement of other types of AEEU-based control charts can be studied along the same lines as those of control chart (12).
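A minimal sketch of the EWMA-type recursion above is given below. The symmetric inverse square root is computed through an eigendecomposition, and the starting value $u_0=0$ is an assumption, since the text does not specify it; the function names and numbers are illustrative.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(w ** -0.5) @ Q.T

def ewma_step(u_prev, beta_k, V_k, beta_hist, V_hist, lam=0.2):
    """One EWMA update; returns the new u_k and the charting value u_k^T u_k."""
    z = inv_sqrt(V_hist + V_k) @ (beta_k - beta_hist)
    u = lam * z + (1.0 - lam) * u_prev
    return u, float(u @ u)

u0 = np.zeros(2)                     # assumed starting value
beta_hist = np.array([0.0, 1.0])     # hypothetical cumulative estimate
V_hist = 0.01 * np.eye(2)
beta_k = np.array([0.1, 1.2])        # hypothetical current MLE
V_k = 0.04 * np.eye(2)

u1, stat1 = ewma_step(u0, beta_k, V_k, beta_hist, V_hist, lam=0.2)
print(stat1)   # z^T z equals 1 here, so u^T u = 0.2 ** 2
```

With $\lambda=1$ the recursion has no memory and $u_k^Tu_k$ reduces to the $T^2$-type statistic in Equation (12).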

Remark 2.2

In Section 2.1, we have shown that the MVEU is biased, which can degrade the monitoring performance. Although the AEEU improves the estimation accuracy, it still fails to eliminate the bias. Since increasing the sample size $n$ of each period reduces the error and improves the detection accuracy, in practice we can improve the monitoring performance by merging samples. Specifically, the approach is to combine adjacent samples and apply the AEEU to the combined samples to obtain the IC estimate. As the size of each combined sample increases, the estimation bias decreases. This operation reduces the computational efficiency, so the estimation bias and the computational burden need to be balanced by controlling the maximum combined sample size $n_{\max}$. Without other requirements, $n_{\max}$ shall be the maximum sample size within the maximum acceptable computational burden (considering both computing time and memory space) for solving the MLE based on one sample. For instance, the practical monitoring procedure can be as follows:

  • Step 1. Select the acceptable maximum sample size $n_{\max}$ and the control limit $h$ according to $ARL_0$ before monitoring a process;

  • Step 2. For $k=1$, suppose that the collected sample $Z_1$ is of size $n_1$ (note that in practice the sample size in each period is not always the same), and calculate $SST_1^2=(\hat{\beta}_1-\hat{\beta}^{(0)})^T(\hat{V}^{(0)}+\hat{V}_1)^{-1}(\hat{\beta}_1-\hat{\beta}^{(0)})$, where $\hat{\beta}^{(0)}$ and $\hat{V}^{(0)}$ are the MLE based on the historical samples for initial estimation (cf. Khosravi and Amiri [17]). If $SST_1^2>h$, the control chart triggers a signal and the monitoring stops; otherwise, initialize $N=0$, $j=1$, and continue to Step 3;

  • Step 3. For $k=2,3,\ldots$, suppose that the collected sample $Z_k$ is of size $n_k$. If $N+n_{k-1}\leq n_{\max}$, combine the last $N+n_{k-1}$ observations into one sample $Z_j$; otherwise, let $j=j+1$ and reset $N=0$, so that $Z_j=Z_k$. Calculate $SST_k^2=(\hat{\beta}_k-\hat{\beta}^{(j)})^T(\hat{V}^{(j)}+\hat{V}_k)^{-1}(\hat{\beta}_k-\hat{\beta}^{(j)})$, where $\hat{\beta}^{(j)}$ and $\hat{V}^{(j)}$ are estimators based on the AEEU obtained from $Z_1,\ldots,Z_j$. Set $N=N+n_{k-1}$. Repeat Step 3 until $SST_k^2>h$, at which point the control chart triggers a signal and the monitoring stops.

The above monitoring procedure dynamically combines the collected samples and maximizes the size of a single sample within the maximum computational burden for solving the MLE, thus reducing information loss. Moreover, the above strategy can be adapted to the application scenario, which needs further research in the future.
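The merging idea can be illustrated with a simplified sketch. The function below is not a line-by-line transcription of Steps 1 to 3: it only shows one plausible greedy rule (an assumption) for grouping consecutive periods so that no combined sample exceeds $n_{\max}$, and it omits the charting step. The name `merge_samples` and the sample sizes are hypothetical.

```python
def merge_samples(sizes, n_max):
    """Greedily group consecutive period indices so that each combined
    sample holds at most n_max observations (a single oversized period
    still forms its own group)."""
    groups, current, total = [], [], 0
    for k, n_k in enumerate(sizes, start=1):
        if current and total + n_k > n_max:
            groups.append(current)        # close the current combined sample
            current, total = [], 0
        current.append(k)
        total += n_k
    if current:
        groups.append(current)
    return groups

# Hypothetical per-period sample sizes with n_max = 600.
groups = merge_samples([300, 200, 400, 500, 100, 100], n_max=600)
print(groups)   # [[1, 2], [3], [4, 5], [6]]
```

Each group would then be treated as one combined sample $Z_j$ when computing the per-sample MLE that feeds the AEEU.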

Table 1. Computational cost of the AEEU-SC and $SST^2$ control chart.

                              AEEU-SC               SST^2                 EEU-SC
Total computational cost      50.90 s               49.54 s               5.02 × 10^5 s
Maximum computational cost    0.08 s                0.07 s                110.34 s
Memory space required         1.05 × 10^6 bytes     1.12 × 10^8 bytes     1.11 × 10^8 bytes

2.3. Theorem

To better describe the proposed monitoring statistic, in this subsection we concentrate on the theoretical properties of $SST_k^2$ proposed in Equation (12). We first introduce some basic notation and definitions. Throughout this subsection, a symbol with a subscript indicates an estimate based on the current sample, e.g. $\hat{\beta}_k$ is the MLE obtained from the $n$ observations of the $k$th sample; a symbol with a parenthesized superscript is a cumulative estimate based on historical samples, e.g. $\hat{\beta}^{(k)}$ is the cumulative MLE up to period $k$. Some necessary definitions are as follows:

  • Suppose that the independent observations $(y_{k,i},x_{k,i})$ ($i=1,\ldots,n$; $k\geq 1$) follow the logistic regression model
    $$\Pr(y_{k,i}=1)=\mu(x_{k,i}^T\beta)=\frac{e^{x_{k,i}^T\beta}}{1+e^{x_{k,i}^T\beta}},$$
    the log-likelihood function of period $k$ is defined as $l_k(\beta)=\log\prod_{i=1}^{n}[\mu(x_{k,i}^T\beta)]^{y_{k,i}}[1-\mu(x_{k,i}^T\beta)]^{1-y_{k,i}}$, so the cumulative log-likelihood function up to period $k$ is $l^{(k)}(\beta)=\sum_{j=1}^{k}l_j(\beta)$.
  • Define the first derivative and negative second derivative of the log-likelihood function of period $k$ as $S_k(\beta)\triangleq\frac{\partial l_k(\beta)}{\partial\beta}$ and $F_k(\beta)\triangleq-\frac{\partial^2 l_k(\beta)}{\partial\beta^2}$, and let $S_k=S_k(\beta_0)$, $F_k=F_k(\beta_0)$; thus the first derivative and negative second derivative of the cumulative log-likelihood function up to period $k$ are $S^{(k)}(\beta)\triangleq\frac{\partial l^{(k)}(\beta)}{\partial\beta}$ and $F^{(k)}(\beta)\triangleq-\frac{\partial^2 l^{(k)}(\beta)}{\partial\beta^2}$, and let $S^{(k)}=S^{(k)}(\beta_0)$, $F^{(k)}=F^{(k)}(\beta_0)$, where $\beta_0$ is the real coefficient vector of the logistic regression model.

Before proving the theorem, we first provide a lemma on the asymptotic normality of $\hat{\beta}$ in the general case, for convenience of exposition.

Lemma 2.1

Suppose that the vectors $x_{k,i}$ are uniformly bounded and the minimum eigenvalue $\lambda_k$ of $\sum_{i=1}^{n}x_{k,i}x_{k,i}^T$ satisfies $\lambda_k/n>C>0$ for all $k$ and $n$. If $n\to\infty$ and $k=O(n^{\gamma})$ for some $0<\gamma<\frac{1}{3}$, we have $F_k^{\frac{1}{2}}(\hat{\beta}_k-\beta_0)\xrightarrow{d}N(0,I_p)$ and $(F^{(k)})^{\frac{1}{2}}(\hat{\beta}^{(k)}-\beta_0)\xrightarrow{d}N(0,I_p)$.

Lemma 2.1 gives the asymptotic normality of $\hat{\beta}_k$ and $\hat{\beta}^{(k)}$ in general cases. Based on the above lemma, we have the following conclusion.

Theorem 2.2

Under the conditions in Lemma 2.1, if $n\to\infty$, we have

$$(\hat{\beta}_k-\hat{\beta}^{(k-1)})^T\left[\left(\sum_{i=1}^{k-1}A_i\right)^{-1}+(A_k)^{-1}\right]^{-1}(\hat{\beta}_k-\hat{\beta}^{(k-1)})\xrightarrow{d}\chi_p^2. \tag{13}$$

Theorem 2.2 shows that in the framework of the logistic regression model, the AEEU-based monitoring statistic $SST_k^2$ proposed in this section asymptotically follows a Chi-square distribution. This ensures the theoretical validity of the proposed monitoring scheme. All proofs of the lemma and theorem are given in the Supplementary File.
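It may help to see why the weight matrix in Equation (13) coincides with the one in Equation (12). The following short induction is a sketch, assuming each $A_k$ is symmetric (as in the logistic model) and $\hat{V}_k=(A_k)^{-1}$ as chosen in Section 2.2; the base case is $\hat{V}^{(1)}=\hat{V}_1=(A_1)^{-1}$.

```latex
\begin{aligned}
&\text{Suppose } \hat{V}^{(k-1)} = \big(A^{(k-1)}\big)^{-1}. \text{ Then, by Equation (8),}\\
\hat{V}^{(k)} &= \big(A^{(k-1)}+A_k\big)^{-1}\Big(A^{(k-1)}\big(A^{(k-1)}\big)^{-1}A^{(k-1)} + A_k A_k^{-1} A_k\Big)\big(A^{(k-1)}+A_k\big)^{-1}\\
&= \big(A^{(k-1)}+A_k\big)^{-1}\big(A^{(k-1)}+A_k\big)\big(A^{(k-1)}+A_k\big)^{-1} = \big(A^{(k)}\big)^{-1},\\
&\text{so } \hat{V}^{(k-1)}+\hat{V}_k = \Big(\sum_{i=1}^{k-1}A_i\Big)^{-1} + (A_k)^{-1}.
\end{aligned}
```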

Remark 2.3

Because the control chart is usually used when there are few historical data, it may not be possible to use data-driven methods (e.g. a Bootstrap procedure, see Chatterjee and Qiu [6] for details) to find the control limit in practical monitoring. It is therefore important to construct a procedure that estimates the control limit of self-starting schemes without historical data. For a Shewhart-type control chart whose asymptotic distribution is the Chi-square distribution $\chi_p^2$, the control limit for a prespecified $ARL_0$ can be approximated by $\chi_{p,1-\alpha}^2$, where $\chi_{p,1-\alpha}^2$ is the $1-\alpha$ quantile of $\chi_p^2$ and $\alpha=1/ARL_0$. This conclusion is based on the fact that the Shewhart charting statistics are independent and identically distributed at different times. Unlike Shewhart charts, the proposed self-starting charting statistics (12) are dependent at different times, which may affect the effectiveness of using the asymptotic property to determine the control limit. However, we found that this dependence does not significantly affect the estimation of the control limit, so the control limit of (12) can still be estimated by $\chi_{p,1-\alpha}^2$. This is because, when the process is in control, the cumulative estimate $\hat{\beta}^{(k)}$ quickly becomes stable and nearly fixed as $k$ increases, and the charting statistics then become nearly independent of each other. This property has been broadly used in many self-starting control charts. The accuracy of using this property to determine the control limit is investigated in Section 3.2.
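The approximation $h\approx\chi_{p,1-\alpha}^2$ with $\alpha=1/ARL_0$ can be evaluated without a statistics library when $p$ is even, because the chi-square upper-tail probability then has the closed form $e^{-x/2}\sum_{j=0}^{p/2-1}(x/2)^j/j!$. The sketch below relies on that restriction (an assumption made here for self-containedness); the function names and the choice $ARL_0=200$ are illustrative.

```python
import math

def chi2_sf_even(x, p):
    """Upper-tail probability of a chi-square variable with even df p."""
    if p <= 0 or p % 2 != 0:
        raise ValueError("this closed form requires a positive even p")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, p // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def control_limit(arl0, p, tol=1e-10):
    """Bisection for the 1 - alpha quantile with alpha = 1 / ARL0."""
    alpha = 1.0 / arl0
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, p) > alpha:   # expand until the quantile is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, p) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h = control_limit(arl0=200, p=6)   # p = 6 matches an intercept plus five predictors
print(round(h, 3))
```

For odd $p$, a general chi-square quantile routine from a statistics package would be needed instead.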

3. Numerical study

In this section, in the setting of monitoring binary response profile data, we first compare in Section 3.1 the estimation performance for the logistic regression coefficients of the AEEU and of the MVEU used in Khosravi and Amiri [17]. The monitoring performance of the two self-starting control charts based on the MVEU and the novel AEEU is then evaluated in Section 3.2.

3.1. Comparison of on-line estimation performance

This subsection mainly evaluates the parameter estimation performance of both the AEEU and the MVEU, based on a specific logistic regression model. Denote the logistic regression model as:

$$\Pr(y=1)=\mu(x^T\beta)=\frac{\exp(x^T\beta)}{1+\exp(x^T\beta)}, \tag{14}$$

where $x=(1,x_1,\ldots,x_5)^T$ and the entry 1 corresponds to the intercept term of the logistic regression model. The real coefficient vector is set to be $\beta_{\mathrm{true}}=(\beta_0,\beta_1,\ldots,\beta_5)^T=(0,1,2,3,4,5)^T$ and the sample size of each period is fixed as $n=500$. It is assumed that the elements of the explanatory vector $x_{k,i}$ are independent and identically distributed variables that follow the standard normal distribution. Also, the number of periods is set to be $K=100$ and the observations of each period are randomly generated. Then the logistic regression coefficients are estimated by the MLE, denoted by $\{\hat{\beta}_k,k=1,2,\ldots,100\}$. The AEEU and the MVEU are applied to simulate the recursive update procedure by Equation (7) and Equation (2), respectively. According to the binary regression profile model defined in Equation (14), the recursive update procedure of the AEEU is as follows:

  • Step 1: deduce the log-likelihood function of the logistic regression coefficients $\beta$:
    $$l_k(\beta)=\log\prod_{i=1}^{n}[\mu(x_{k,i}^T\beta)]^{y_{k,i}}[1-\mu(x_{k,i}^T\beta)]^{1-y_{k,i}}=\sum_{i=1}^{n}\left[y_{k,i}\log(\mu(x_{k,i}^T\beta))+(1-y_{k,i})\log(1-\mu(x_{k,i}^T\beta))\right]. \tag{15}$$
  • Step 2: calculate the first-order partial derivative of the log-likelihood function:
    $$\begin{aligned}\frac{\partial l_k(\beta)}{\partial\beta}&=\sum_{i=1}^{n}y_{k,i}\frac{\partial\log(\mu(x_{k,i}^T\beta))}{\partial\beta}+(1-y_{k,i})\frac{\partial\log(1-\mu(x_{k,i}^T\beta))}{\partial\beta}\\&=\sum_{i=1}^{n}y_{k,i}\frac{1}{\mu(x_{k,i}^T\beta)}\frac{\partial\mu(x_{k,i}^T\beta)}{\partial\beta}+(1-y_{k,i})\frac{1}{1-\mu(x_{k,i}^T\beta)}\frac{\partial(1-\mu(x_{k,i}^T\beta))}{\partial\beta}\\&=\sum_{i=1}^{n}y_{k,i}(1-\mu(x_{k,i}^T\beta))x_{k,i}-(1-y_{k,i})\mu(x_{k,i}^T\beta)x_{k,i}\\&=\sum_{i=1}^{n}(y_{k,i}-\mu(x_{k,i}^T\beta))x_{k,i}.\end{aligned} \tag{16}$$
  • Step 3: calculate $A_k$ of the logistic regression model:
    $$A_k=-\sum_{i=1}^{n}\frac{\partial(y_{k,i}-\mu(x_{k,i}^T\beta))x_{k,i}}{\partial\beta}=\sum_{i=1}^{n}\frac{\partial\left(1-\frac{1}{1+\exp(x_{k,i}^T\beta)}\right)x_{k,i}}{\partial\beta}=\sum_{i=1}^{n}\frac{\exp(x_{k,i}^T\beta)}{(1+\exp(x_{k,i}^T\beta))^2}x_{k,i}x_{k,i}^T. \tag{17}$$

The relative bias is used to evaluate the performance of parameter estimation in this paper. It is defined as follows:

$$\mathrm{rbias}_{\beta}=\frac{\|\hat{\beta}-\beta_{\mathrm{true}}\|}{\|\beta_{\mathrm{true}}\|}, \tag{18}$$

where $\beta_{\mathrm{true}}$ is the real regression coefficient vector and $\|\cdot\|$ represents the $L_2$ norm.
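The quantities in Equations (16) to (18) translate directly into code. The sketch below also checks numerically that $A_k$ equals minus the Jacobian of the score, using central finite differences. The simulated data, the coefficient values and the function names are assumptions for illustration only.

```python
import numpy as np

def mu(t):
    return 1.0 / (1.0 + np.exp(-t))

def score(beta, X, y):
    """Equation (16): gradient of the log-likelihood."""
    return X.T @ (y - mu(X @ beta))

def info_matrix(beta, X):
    """Equation (17): sum of mu * (1 - mu) * x x^T."""
    m = mu(X @ beta)
    return (X * (m * (1.0 - m))[:, None]).T @ X

def rbias(beta_hat, beta_true):
    """Equation (18): relative bias in the L2 norm."""
    return np.linalg.norm(beta_hat - beta_true) / np.linalg.norm(beta_true)

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([0.2, -0.5, 1.0])          # illustrative coefficients (assumption)
y = (rng.random(n) < mu(X @ beta)).astype(float)

# Central finite differences of the score; column j is the derivative w.r.t. beta_j.
eps = 1e-6
jac = np.column_stack([
    (score(beta + eps * e, X, y) - score(beta - eps * e, X, y)) / (2.0 * eps)
    for e in np.eye(3)
])
print(np.allclose(info_matrix(beta, X), -jac, atol=1e-4))
```

Note that the response $y$ drops out of $A_k$, so the matrix depends only on the predictors and the coefficient value at which it is evaluated.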

To reduce the randomness in the comparison, we repeated the simulation 1000 times and compared the mean and standard deviation of $\mathrm{rbias}_{\beta}$ of the two methods at different times, where the mean indicates the estimation accuracy (i.e. the average relative bias) and the standard deviation shows the stability of $\mathrm{rbias}_{\beta}$. Figure 1 shows how the relative bias $\mathrm{rbias}_{\beta}$ varies as the number of periods increases; the horizontal axis represents the number of observed periods and the vertical axis the relative bias of the estimated parameters. The solid line indicates the simulation results of the AEEU, while the dashed line indicates those of the MVEU. The figure shows that the relative bias of the AEEU is always smaller than that of the MVEU, and that as the number of periods increases, both curves gradually flatten and converge to a constant. This verifies that the AEEU proposed in this paper estimates the parameters more accurately than the MVEU. At all times, the standard deviation of $\mathrm{rbias}_{\beta}$ of the AEEU is smaller, indicating that it is more stable.

Figure 1. Comparison of the average relative bias of $\beta$ (a) and the corresponding standard deviation (b) between the AEEU and the MVEU.

In Section 2.1, we have shown that the MVEU is biased. From Figure 1(a) we can see that, as $k$ increases, the $\mathrm{rbias}_\beta$ of both the AEEU and the MVEU stabilizes but remains greater than 0. This means that both the AEEU and the MVEU are biased, while the expected bias (i.e. $E(\hat\beta^{(k)}-\beta_{\mathrm{true}})$) of the AEEU is smaller. To study the robustness of the AEEU to $n$, we repeat the above simulation for the AEEU with n = 250, 500, 1000, 2000. The $\mathrm{rbias}_\beta$ curves are shown in Figure 2. The results show that $\mathrm{rbias}_\beta$ decreases as $n$ increases. Because the AEEU is proposed for processes with massive data, and advances in data acquisition and storage usually produce large-scale data, the bias should not be a serious problem in many real applications.

Figure 2. Calculated $\mathrm{rbias}_\beta$ curves of the AEEU with different $n$.

3.2. Comparison of monitoring schemes

Continuing with the settings in Section 3.1, this subsection evaluates and compares the performance of the two control schemes based on the AEEU and the MVEU in monitoring binary profile data. In this subsection, denote the observations of one period as $\{z_{k,i}=(x_{k,i},y_{k,i}),\ i=1,\ldots,n,\ k=1,2,\ldots\}$ for simplicity, where $n$ is the size of the current sample and $y_{k,i}$ is the binary response variable. As for $x_{k,i}=(1,x_{k,i1},\ldots,x_{k,i5})^{T}$, $\{x_{k,i1},\ldots,x_{k,i5}\}$ are the observed values of the 5 explanatory variables, while 1 corresponds to the intercept term of the logistic regression model.

Integrating Equations (7), (8) and (12) into Equation (17), we monitor binary response regression profiles; the resulting scheme is called the AEEU self-starting control chart (AEEU-SC).
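Equations (7), (8) and (12) are not reproduced in this section, so the following is only a rough sketch of how a recursive aggregated update could look. It assumes the update takes the aggregated estimating equation form of Lin and Xi [23], $\hat\beta^{(k)}=(\sum_{i=1}^{k}A_i)^{-1}\sum_{i=1}^{k}A_i\hat\beta_i$; the helper `solve` and the class `AggregatedEstimate` are hypothetical names, not the authors' implementation.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


class AggregatedEstimate:
    """Stores only sum(A_i) and sum(A_i beta_i); raw samples can be discarded."""

    def __init__(self, p):
        self.sum_A = [[0.0] * p for _ in range(p)]
        self.sum_Ab = [0.0] * p

    def update(self, A_k, beta_k):
        """Add period k and return the aggregated coefficient estimate."""
        p = len(beta_k)
        for r in range(p):
            for c in range(p):
                self.sum_A[r][c] += A_k[r][c]
                self.sum_Ab[r] += A_k[r][c] * beta_k[c]
        return solve(self.sum_A, self.sum_Ab)
```

Under this assumed form, each period costs one $p\times p$ accumulation and one linear solve, and the memory footprint does not grow with the number of periods, which is the property the comparison with the EEU-SC and the $SST^2$ chart relies on.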

Note that most conventional control charts, including CUSUM and EWMA charts, assume that the IC parameters can be well estimated and are therefore not proper for comparison. Self-starting control charts are applicable to monitoring processes without sufficient Phase I samples, so the application scenarios of self-starting schemes and EWMA/CUSUM charts are different. To show the improvement of the proposed scheme, another self-starting chart should be selected for comparison. In Section 2.2, we pointed out that various types of self-starting control charts can be obtained based on the proposed AEEU. Thus, the compared charts should be of the same form. For simplicity, a similar $T^2$-type self-starting chart is considered for comparison.

Before selecting the control chart to be compared, we use some numerical results to show the improvement in computational efficiency gained by estimating parameters with the AEE method. Two other Hotelling $T^2$-type control charts are selected as benchmarks: the $SST^2$ chart in Khoravi and Amiri [17] (denoted as $SST^2$) and a $T^2$-type statistic based on the estimating equation estimation (denoted as EEU-SC) (cf. Lin and Xi [23]). Specifically, the EEU-SC is

$$\mathrm{EEU\text{-}SC}_k=(\hat\beta_k-\tilde\beta^{(k-1)})^{T}(\tilde\Sigma)^{-1}(\hat\beta_k-\tilde\beta^{(k-1)}),\quad k=1,2,\ldots,$$

where $\tilde\beta^{(k)}$ is the MLE obtained directly from the last $k$ samples, and $\tilde\Sigma=\widehat{\mathrm{cov}}(\hat\beta_k-\tilde\beta^{(k-1)})=\widehat{\mathrm{cov}}(\hat\beta_k)+\widehat{\mathrm{cov}}(\tilde\beta^{(k-1)})$ can be calculated similarly to $A_k$ in Section 3.1. The formula of $SST^2$ follows Khoravi and Amiri [17], that is,

$$SST_k^2=(\hat\beta_k-\bar\beta^{(k-1)})^{T}(S^{(j-1)})^{-1}(\hat\beta_k-\bar\beta^{(k-1)}),$$

where $S^{(j-1)}$ is the estimated covariance matrix (refer to Khoravi and Amiri [17] for its specific form). Both of the above control charts need to store all historical observations. The former (the EEU-SC) involves more computation because it estimates the IC parameters using all collected samples at each period; the computational efficiency for large datasets is not considered in its design. Following Lin and Xi [23], we randomly generated a sequential process with a length of 1000 and n = 1000; the data generation and $\boldsymbol\beta_0$ are the same as above. The total computational cost and the maximum computational cost for one sample of the control charts are shown in Table 1.

Table 1 shows that the computing time of the EEU-SC chart is much longer than that of the AEEU-SC, and that the memory required by the $SST^2$ and the EEU-SC is much larger than that required by the AEEU-SC. These differences may arise because updating the EEU-SC and the $SST^2$ requires all historical data, and the estimation in the EEU-SC involves a considerable amount of iterative calculation. In modern industrial processes, due to the rapid development of sensor technology and data collection systems, practitioners often need to deal with massive datasets. The sample size may be much larger than the setting in our simulation, and the sample collection intervals may be milliseconds. In these monitoring scenarios, the selected control charts should be computationally efficient, and the proposed AEEU-SC could be an optimal choice.

For comparison, a monitoring statistic similar to Equation (11) is set to be

$$\overline{SST}_k^2=(\hat\beta_k-\bar\beta^{(k-1)})^{T}(\bar\Sigma)^{-1}(\hat\beta_k-\bar\beta^{(k-1)}),\quad k=1,2,\ldots, \tag{19}$$

where $\hat\beta_k$ and $\bar\beta^{(k-1)}$ follow the definitions in Section 2.1, and

$$\bar\Sigma=\widehat{\mathrm{cov}}(\hat\beta_k-\bar\beta^{(k-1)})=A_k^{-1}+\sum_{i=1}^{k-1}A_i^{-1}/(k-1)^2.$$

Similar to the AEEU-SC, $\overline{SST}_k^2$ is denoted as MVEU-SC hereafter. Section 3.1 evaluated the accuracy of parameter estimation via numerical simulation; this subsection further tests the monitoring performance of the AEEU-SC and the MVEU-SC.
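A minimal sketch of the statistic in Equation (19) is given below. It assumes that $\bar\beta^{(k-1)}$ is the plain average of the first $k-1$ per-period estimates (the definition in Section 2.1 is not repeated here, although the first rows of Table 7 are consistent with it); the names `inverse` and `mveu_sc` are hypothetical, and plain Python lists stand in for matrices.

```python
def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(M[r]) + [1.0 if r == c else 0.0 for c in range(n)]
           for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mveu_sc(beta_k, A_k, past_betas, past_As):
    """Statistic (19) for period k, given the k-1 earlier per-period fits."""
    m = len(past_betas)  # m = k - 1
    p = len(beta_k)
    # Assumed mean-value estimate from the first k-1 periods
    beta_bar = [sum(b[j] for b in past_betas) / m for j in range(p)]
    # Sigma_bar = A_k^{-1} + sum_i A_i^{-1} / (k-1)^2
    sigma = inverse(A_k)
    for A_i in past_As:
        inv_i = inverse(A_i)
        for r in range(p):
            for c in range(p):
                sigma[r][c] += inv_i[r][c] / m ** 2
    d = [beta_k[j] - beta_bar[j] for j in range(p)]
    sigma_inv = inverse(sigma)
    return sum(d[r] * sigma_inv[r][c] * d[c]
               for r in range(p) for c in range(p))
```

Note that this version needs every past $A_i$ (or at least the running sum of their inverses), so in practice one would accumulate $\sum_i A_i^{-1}$ recursively rather than re-invert each matrix at every period.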

Following other literature on self-starting control charts, assume that the logistic regression coefficient vector $\beta$ experiences a mean shift at time $T=\tau$, from $\boldsymbol\beta_0$ to $\boldsymbol\beta_1$; that is, $\beta=\boldsymbol\beta_0$ for $t=1,2,\ldots,\tau-1$, and $\beta=\boldsymbol\beta_1$ for $t=\tau,\tau+1,\ldots$, where $\boldsymbol\beta_0=(\beta_0,\beta_1,\ldots,\beta_5)^{T}=(0,1,2,3,4,5)^{T}$ is the logistic regression coefficient vector of the IC process. For self-starting monitoring, several historical samples are required for the initial estimation of the model parameters. In this study, the number of historical samples generated before monitoring is set to be 5. The sample size of each period $n$ is set to be 500, and the $ARL_0$ of each control chart is set to be 200. Monte Carlo simulation is used to calculate the control limits of the AEEU-SC and the MVEU-SC, and random samples (i.e. $z_{k,i}=(x_{k,i},y_{k,i})$) are generated from the IC distribution to obtain the run length for each process. The control limit $h$ should be determined carefully so that a prespecified IC ARL is reached. For a given nominal $ARL_0$, the bisection search algorithm or its modified versions can be applied effectively to search for $h$ (cf., e.g. Capizzi and Masarotto [5], Li et al. [22]). All results are obtained from 10,000 repetitions.
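The bisection search for $h$ can be sketched as follows. To keep the example fast and self-contained, it makes two simplifying assumptions that are not part of the paper's simulation: the IC charting statistics are replaced by independent chi-square draws with 6 degrees of freedom, and the target IC ARL is 20 rather than 200. All function names are hypothetical. The same pre-generated statistics are reused for every candidate limit, so the estimated ARL is non-decreasing in $h$ and bisection is well defined.

```python
import random


def simulate_ic_statistics(n_runs, max_len, seed=0):
    """Pre-generate IC charting statistics (common random numbers).

    Stand-in assumption: each IC statistic is an independent chi-square
    draw with 6 degrees of freedom, i.e. Gamma(shape=3, scale=2).
    """
    rng = random.Random(seed)
    return [[rng.gammavariate(3.0, 2.0) for _ in range(max_len)]
            for _ in range(n_runs)]


def estimated_arl(h, runs):
    """Average run length: index of the first statistic above h, per run."""
    total = 0
    for stats in runs:
        # Runs that never signal are truncated at their full length
        total += next((t for t, s in enumerate(stats, 1) if s > h), len(stats))
    return total / len(runs)


def bisection_control_limit(target_arl, runs, lo=0.0, hi=40.0, tol=1e-3):
    """Find h whose estimated IC ARL is close to target_arl."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if estimated_arl(mid, runs) < target_arl:
            lo = mid  # chart signals too often: raise the limit
        else:
            hi = mid  # chart signals too rarely: lower the limit
    return 0.5 * (lo + hi)


runs = simulate_ic_statistics(n_runs=500, max_len=200, seed=0)
h = bisection_control_limit(target_arl=20.0, runs=runs)
```

For a real self-starting chart the run-length simulator would replace `simulate_ic_statistics`, since the charting statistics are then neither independent nor exactly chi-square; the bisection loop itself stays the same.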

Taking the expected delay as the evaluation of control charts' OC monitoring performance, several OC cases of different types and magnitudes are designed as follows:

  • Case I: only one element of the logistic regression coefficient vector shifts, $\boldsymbol\beta_1=(0,1,2,3,4,5+\delta)^{T}$, where $\delta\in\{-0.2,-0.4,\ldots,-1,0.2,0.4,\ldots,1\}$.

  • Case II: two elements of the logistic regression coefficient vector shift in the same direction, $\boldsymbol\beta_1=(0,1,2,3,4+\delta,5+\delta)^{T}$, where $\delta\in\{-0.2,-0.4,\ldots,-1,0.2,0.4,\ldots,1\}$.

  • Case III: two elements of the logistic regression coefficient vector shift in different directions, $\boldsymbol\beta_1=(0,1,2,3,4-\delta,5+\delta)^{T}$, where $\delta\in\{-0.2,-0.4,\ldots,-1,0.2,0.4,\ldots,1\}$.

  • Case IV: the intercept term of the logistic regression coefficient vector shifts, $\boldsymbol\beta_1=(\delta,1,2,3,4,5)^{T}$, where $\delta\in\{-0.2,-0.4,\ldots,-1,0.2,0.4,\ldots,1\}$.

Note that for Cases I–IV, both positive and negative shifts are considered; moreover, for Case II and Case III, the positions of the shifts are the same, but the shifts in the former are in the same direction, while those in the latter are in different directions. Because the MLE of binary profiles is biased for a fixed sample size, the OC performance of the schemes differs for shifts in different directions. Therefore, to investigate the detection performance of the schemes more comprehensively, shifts in both directions (positive and negative) need to be considered in each case.
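The four OC scenarios can be written down compactly as sign patterns applied to $\delta$. The sketch below is only a convenience for setting up such a simulation; the names `BETA_IC`, `SHIFT_PATTERNS`, `DELTAS` and `oc_beta` are hypothetical.

```python
# IC coefficient vector (intercept first), as stated in the text
BETA_IC = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Multiplier applied to delta for each coefficient, one pattern per case
SHIFT_PATTERNS = {
    "I": [0, 0, 0, 0, 0, 1],      # only the last coefficient shifts
    "II": [0, 0, 0, 0, 1, 1],     # two coefficients, same direction
    "III": [0, 0, 0, 0, -1, 1],   # two coefficients, opposite directions
    "IV": [1, 0, 0, 0, 0, 0],     # only the intercept shifts
}

# Shift sizes in the order used by the result tables
DELTAS = [-0.2, -0.4, -0.6, -0.8, -1.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def oc_beta(case, delta):
    """Return the OC coefficient vector for a given case and shift size."""
    return [b + s * delta for b, s in zip(BETA_IC, SHIFT_PATTERNS[case])]
```

For example, `oc_beta("III", 0.4)` moves the fifth coefficient down to 3.6 and the sixth up to 5.4, while `oc_beta("IV", -0.2)` changes only the intercept.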

The OC performance comparison of the AEEU-SC and the MVEU-SC under Case I is shown in Table 2, where $\delta\in\{-0.2,-0.4,\ldots,-1,0.2,0.4,\ldots,1\}$ represents the different magnitudes of the shift in the element of the logistic regression coefficient vector whose IC value is 5. The conclusions are as follows:

  • As mentioned in Section 2.1, the MLE of the binary profiles is biased: the IC estimate of $\beta_5$ of the AEEU is greater than 5, while that of the MVEU is less than 5. As a result, the detection performance of the two control charts differs significantly between shift directions, and when the shift magnitude is small (e.g. $\delta=\pm0.2$), the $ARL_1$ may be greater than $ARL_0$, that is, the control charts are biased. Moreover, comparing the MVEU-SC with $\delta=-0.2$ and the AEEU-SC with $\delta=0.2$, we can see that the MVEU-SC may be more biased and its monitoring performance is worse than that of the AEEU-SC.

  • For the negative shift magnitudes ($\delta\in\{-0.2,-0.4,\ldots,-1\}$) and all change points, the $ARL_1$s of the AEEU-SC are significantly less than those of the MVEU-SC, which means the AEEU-SC performs much better in detecting negative shifts. For the positive shift magnitudes ($\delta\in\{0.2,0.4,\ldots,1\}$), the AEEU-SC performs worse than the MVEU-SC. On the whole, however, the AEEU-SC is still better than the MVEU-SC in this case.

  • When the shift magnitude $|\delta|$ increases, the $ARL_1$s of both the AEEU-SC and the MVEU-SC decrease monotonically, indicating that the greater the shift magnitude is, the more sensitive the control chart is. In addition, in the majority of cases, as $\tau$ increases, in other words, when the change point occurs later, the $ARL_1$s of both control charts decrease monotonically. Later change points mean that the self-starting control chart can accumulate more IC samples, so that the IC coefficients can be estimated more accurately and the monitoring performance is improved.

Table 2.

Case I, calculated ARL1 values and their standard deviations (in parentheses) of the AEEU-SC and the MVEU-SC with various combinations of δ and τ.

| $\delta$ | AEEU-SC, $\tau=20$ | AEEU-SC, $\tau=50$ | AEEU-SC, $\tau=100$ | MVEU-SC, $\tau=20$ | MVEU-SC, $\tau=50$ | MVEU-SC, $\tau=100$ |
|---|---|---|---|---|---|---|
| −0.2 | 170.91 (1.96) | 167.46 (1.84) | 153.51 (1.77) | 204.87 (2.03) | 216.02 (2.08) | 219.73 (2.07) |
| −0.4 | 126.77 (1.73) | 85.01 (1.41) | 53.015 (0.86) | 198.85 (2.00) | 183.42 (2.05) | 154.51 (1.85) |
| −0.6 | 60.78 (1.38) | 15.59 (0.50) | 8.69 (0.14) | 162.66 (1.94) | 116.91 (1.95) | 53.64 (1.06) |
| −0.8 | 3.06 (0.03) | 2.82 (0.03) | 2.62 (0.02) | 88.10 (1.62) | 26.33 (0.84) | 9.903 (0.29) |
| −1.0 | 1.57 (0.01) | 1.46 (0.01) | 1.44 (0.01) | 15.58 (0.70) | 3.16 (0.04) | 2.64 (0.02) |
| 0.2 | 205.89 (2.07) | 203.57 (2.21) | 188.70 (1.93) | 173.03 (1.82) | 145.11 (1.63) | 141.29 (1.71) |
| 0.4 | 180.46 (2.13) | 156.84 (1.98) | 149.52 (2.04) | 135.33 (1.76) | 81.04 (1.30) | 50.32 (0.85) |
| 0.6 | 121.99 (2.05) | 65.86 (1.38) | 31.50 (0.80) | 65.30 (1.14) | 22.05 (0.52) | 11.34 (0.15) |
| 0.8 | 59.45 (1.53) | 14.93 (0.62) | 6.43 (0.10) | 18.39 (0.61) | 4.77 (0.08) | 3.77 (0.04) |
| 1.0 | 11.04 (0.70) | 3.48 (0.13) | 2.81 (0.02) | 3.20 (0.11) | 2.19 (0.02) | 1.90 (0.01) |
| EWRL | 125.14 | 100.65 | 85.88 | 138.34 | 110.90 | 91.38 |

In summary, the AEEU-SC is better than the MVEU-SC when monitoring OC processes under Case I.

Tables 3 and 4 show the OC performance of the AEEU-SC and the MVEU-SC for Cases II and III, respectively. Both cases describe scenarios where two elements of the logistic regression coefficient vector experience mean shifts. However, the conclusions drawn from them differ. In Case II, the results are similar to those of Case I, but more pronounced. Because the AEEU and the MVEU are biased, the AEEU-SC performs worse in detecting positive shifts, while the MVEU-SC is less effective in detecting negative shifts. Because the IC parameters $\beta_4$ and $\beta_5$ estimated by the AEEU are both greater than the true values, the detection performance of the AEEU-SC is worse for small positive shifts ($\delta=0.2$ or 0.4) in this case. The MVEU-SC has a similar but more severe problem in detecting small negative shifts. According to the above description and the analysis in Case I, we can conclude that the AEEU-SC performs better than the MVEU-SC in Case II. In Case III, the shift directions of the two changed elements are opposite, and the results are quite different from those in Case II. In this case, the AEEU-SC is better than the MVEU-SC in almost all shift cases, and all $ARL_1$s in Table 4 are less than $ARL_0$. This means that, although the control charts are biased, their performance declines significantly only when the shift directions of most elements coincide with the bias directions of the IC estimates.

Table 3.

Case II, calculated ARL1 values and their standard deviations (in parentheses) of the AEEU-SC and the MVEU-SC with various combinations of δ and τ.

| $\delta$ | AEEU-SC, $\tau=20$ | AEEU-SC, $\tau=50$ | AEEU-SC, $\tau=100$ | MVEU-SC, $\tau=20$ | MVEU-SC, $\tau=50$ | MVEU-SC, $\tau=100$ |
|---|---|---|---|---|---|---|
| −0.2 | 180.22 (2.08) | 145.33 (1.61) | 129.75 (1.61) | 223.41 (2.08) | 231.66 (2.11) | 251.24 (2.40) |
| −0.4 | 112.52 (1.69) | 63.05 (1.21) | 35.39 (0.69) | 228.12 (2.14) | 229.62 (2.28) | 219.67 (2.40) |
| −0.6 | 33.13 (0.93) | 8.34 (0.18) | 5.73 (0.09) | 195.88 (2.10) | 162.56 (2.38) | 117.39 (2.00) |
| −0.8 | 3.17 (0.09) | 1.99 (0.02) | 1.98 (0.01) | 117.41 (1.88) | 56.09 (1.44) | 13.98 (0.39) |
| −1.0 | 1.26 (0.01) | 1.21 (0.01) | 1.17 (0.00) | 28.94 (1.09) | 3.47 (0.09) | 2.73 (0.03) |
| 0.2 | 223.51 (2.37) | 212.99 (2.21) | 203.69 (2.15) | 165.62 (2.10) | 133.50 (1.75) | 107.94 (1.38) |
| 0.4 | 220.26 (2.08) | 180.15 (2.25) | 148.83 (2.06) | 104.91 (1.50) | 56.01 (1.09) | 32.23 (0.51) |
| 0.6 | 164.45 (2.31) | 86.22 (1.74) | 48.75 (1.05) | 46.81 (1.12) | 11.37 (0.22) | 7.83 (0.10) |
| 0.8 | 86.90 (1.80) | 26.23 (0.91) | 9.54 (0.18) | 10.11 (0.46) | 3.35 (0.05) | 2.95 (0.02) |
| 1.0 | 31.11 (1.49) | 5.25 (0.16) | 3.45 (0.03) | 2.73 (0.10) | 1.86 (0.01) | 1.73 (0.01) |
| EWRL | 138.59 | 101.43 | 87.52 | 148.51 | 120.53 | 105.01 |

Table 4.

Case III, calculated ARL1 values and their standard deviations (in parentheses) of the AEEU-SC and the MVEU-SC with various combinations of δ and τ.

| $\delta$ | AEEU-SC, $\tau=20$ | AEEU-SC, $\tau=50$ | AEEU-SC, $\tau=100$ | MVEU-SC, $\tau=20$ | MVEU-SC, $\tau=50$ | MVEU-SC, $\tau=100$ |
|---|---|---|---|---|---|---|
| −0.2 | 171.20 (2.15) | 118.12 (1.69) | 90.89 (1.39) | 163.10 (1.83) | 138.15 (1.76) | 113.09 (1.52) |
| −0.4 | 35.55 (1.15) | 6.24 (0.16) | 4.69 (0.05) | 57.43 (1.23) | 15.45 (0.42) | 8.01 (0.10) |
| −0.6 | 1.46 (0.01) | 1.41 (0.01) | 1.37 (0.01) | 2.58 (0.10) | 1.77 (0.01) | 1.69 (0.01) |
| −0.8 | 1.05 (0.00) | 1.03 (0.00) | 1.03 (0.00) | 1.10 (0.00) | 1.07 (0.00) | 1.07 (0.00) |
| −1.0 | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.01 (0.00) | 1.00 (0.00) |
| 0.2 | 174.44 (2.06) | 123.65 (1.79) | 96.64 (1.55) | 172.51 (2.01) | 121.76 (1.62) | 97.31 (1.34) |
| 0.4 | 38.95 (1.16) | 8.17 (0.27) | 5.40 (0.09) | 54.59 (1.27) | 13.97 (0.43) | 6.54 (0.08) |
| 0.6 | 1.80 (0.02) | 1.46 (0.01) | 1.49 (0.01) | 2.10 (0.03) | 1.67 (0.01) | 1.65 (0.01) |
| 0.8 | 1.05 (0.00) | 1.05 (0.00) | 1.05 (0.00) | 1.09 (0.00) | 1.08 (0.00) | 1.07 (0.00) |
| 1.0 | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.00 (0.00) | 1.01 (0.00) | 1.00 (0.00) |
| EWRL | 77.68 | 48.74 | 38.04 | 83.32 | 55.85 | 43.34 |

Table 5 compares the performance of the AEEU-SC and the MVEU-SC for Case IV. In this case, the results are similar to those in Case III rather than Case I: the $ARL_1$s are consistently less than $ARL_0$, which may indicate that the IC estimates of $\beta_0$ from the two methods are more accurate than those of $\beta_5$. Except for a few settings of $\delta$ and $\tau$, the AEEU-SC performs significantly better than the MVEU-SC, which shows that the AEEU-SC has clear advantages over the MVEU-SC in this case. To sum up, for the above scenarios, the AEEU-SC performs better than the MVEU-SC, which indicates that it is better suited to self-starting monitoring of binary profiles than the traditional methods.

Table 5.

Case IV, calculated ARL1 values and their standard deviations (in parentheses) of the AEEU-SC and the MVEU-SC with various combinations of δ and τ.

| $\delta$ | AEEU-SC, $\tau=20$ | AEEU-SC, $\tau=50$ | AEEU-SC, $\tau=100$ | MVEU-SC, $\tau=20$ | MVEU-SC, $\tau=50$ | MVEU-SC, $\tau=100$ |
|---|---|---|---|---|---|---|
| −0.2 | 195.84 (2.09) | 152.74 (1.74) | 153.15 (1.79) | 183.49 (1.90) | 167.54 (1.90) | 142.28 (1.70) |
| −0.4 | 111.23 (1.68) | 63.49 (1.34) | 27.35 (0.64) | 119.48 (1.66) | 71.43 (1.26) | 40.77 (0.75) |
| −0.6 | 23.39 (0.78) | 4.79 (0.06) | 3.99 (0.04) | 36.75 (0.95) | 9.51 (0.26) | 5.36 (0.06) |
| −0.8 | 1.80 (1.77) | 1.64 (0.01) | 1.59 (0.01) | 6.19 (0.48) | 2.02 (0.02) | 1.85 (0.01) |
| −1.0 | 1.12 (0.00) | 1.13 (0.00) | 1.09 (0.00) | 1.28 (0.01) | 1.19 (0.00) | 1.18 (0.00) |
| 0.2 | 167.52 (1.85) | 156.92 (1.92) | 143.25 (1.71) | 189.65 (2.04) | 171.43 (1.91) | 149.51 (1.76) |
| 0.4 | 116.74 (1.86) | 62.50 (1.30) | 28.02 (0.66) | 134.34 (1.81) | 77.38 (1.41) | 44.57 (0.82) |
| 0.6 | 18.61 (0.69) | 5.07 (0.12) | 3.75 (0.04) | 43.72 (1.15) | 9.07 (0.27) | 5.88 (0.08) |
| 0.8 | 1.88 (0.02) | 1.55 (0.01) | 1.57 (0.01) | 3.26 (0.13) | 2.04 (0.02) | 1.88 (0.01) |
| 1.0 | 1.12 (0.00) | 1.13 (0.00) | 1.08 (0.00) | 1.26 (0.01) | 1.21 (0.01) | 1.17 (0.00) |
| EWRL | 110.03 | 81.53 | 67.70 | 120.15 | 90.46 | 70.50 |

To compare the overall performance of the control charts, we introduce the expected weighted run length (EWRL) proposed by Ryu et al. [33]:

$$\mathrm{EWRL}=\int_a^b\omega(\delta)\,ARL(\delta)\,f(\delta)\,d\delta,$$

where $ARL(\delta)$ is the ARL of the considered control chart when the actual mean shift size is $\delta$, $\delta$ follows a distribution with density function $f(\cdot)$ on the range $[a,b]$, and $\omega(\delta)$ denotes the weight associated with $\delta$. The EWRL reduces to the expected ARL (EARL) when $\omega(\delta)=1$. Following Ryu et al. [33], we set $\omega(\delta)=1+\delta^2$, so that larger shifts receive larger weights; the corresponding EWRL is $\mathrm{EWRL}=\int_a^b(1+\delta^2)\,ARL(\delta)\,f(\delta)\,d\delta$, and the considered density function $f(\delta)$ is that of the uniform distribution $U(-1,1)$. The calculated EWRLs of all scenarios are also shown in Tables 2–5. The EWRLs show that the AEEU-SC outperforms the MVEU-SC with the same $\tau$ in all scenarios.
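The EWRL integral can be approximated numerically, for example with the trapezoidal rule as in the sketch below. The function name `ewrl` and the grid size are assumptions; the text does not state which quadrature rule or interpolation of the tabulated $ARL(\delta)$ values was used, so this sketch is not expected to reproduce the EWRL rows of Tables 2–5.

```python
def ewrl(arl, a=-1.0, b=1.0, n_grid=2001):
    """Trapezoidal approximation of int_a^b w(d) * ARL(d) * f(d) dd.

    arl is a callable returning the ARL for a shift size delta;
    w(delta) = 1 + delta^2 and f is the U(a, b) density.
    """
    step = (b - a) / (n_grid - 1)
    density = 1.0 / (b - a)
    total = 0.0
    for j in range(n_grid):
        delta = a + j * step
        weight = 1.0 + delta ** 2
        # End points get half weight in the trapezoidal rule
        end = 0.5 if j in (0, n_grid - 1) else 1.0
        total += end * weight * arl(delta) * density
    return total * step
```

As a sanity check, a chart with a constant ARL of $c$ has EWRL equal to $c\int_{-1}^{1}(1+\delta^2)/2\,d\delta=4c/3$, so the weights do not average to one and the EWRL is not on the same scale as a plain mean of the ARLs.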

Moreover, although the AEEU-SC outperforms the MVEU-SC overall, the results show that it is still biased. In some monitoring scenarios, the bias can cause serious problems: the control chart can fail to detect small shifts in specific directions. In Section, we provide some discussion on alleviating the bias problem and improving detection performance by combining samples, which enables the control chart to balance computational efficiency and monitoring effectiveness. Here we provide some further suggestions based on the simulation results: (i) in many practical monitoring scenarios, the control chart should not alert process operators to small random location shifts, because they are generally due to common cause variation [39]. Moreover, in some processes, only some shift directions are of concern, and these directions may not coincide with the bias direction. In these scenarios, the bias problem is less serious. Before monitoring, the practitioner should carefully consider the shift sizes and directions of concern, evaluate whether the sample size $n$ is sufficient for the monitoring, and determine whether the AEEU-SC is applicable to the process; (ii) it is well known that a memory-based self-starting chart based on the AEEU (e.g. the EWMA-type chart mentioned in Section 2.2) is more suitable for detecting small shifts.

In Section 2.3, we pointed out that it is important to estimate the control limit for a self-starting control chart without historical data. Thus, we provide some numerical results to verify the effectiveness of using $\chi^2_{p,1-\alpha}$ to estimate the control limit, where $\alpha=1/ARL_0$. The nominal $ARL_0$ is set to be 200, and the sample sizes at each period $n$ are selected to be 250, 500, 1000, 2000; other parameters are the same as above. The numbers of historical samples for each case are set to be 24, 12, 6, 3, respectively, so that the total number of historical observations is the same. The results are based on 10,000 repetitions. The actual $ARL_0$s with various sample sizes are shown in Table 6. From the table, we can see that as $n$ increases, the actual $ARL_0$ approaches the nominal value. Because our method is proposed for massive data process monitoring problems, it is feasible to determine the control limit based on the asymptotic distribution.
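For reference, the asymptotic limit $\chi^2_{p,1-\alpha}$ can be computed without any external library when $p$ is even, because the chi-square upper-tail probability then has a closed form. The sketch below assumes $p=6$ (the intercept plus five slopes of the simulation setting); the function names are hypothetical, and the quantile is found by bisection on the tail probability.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses P(X > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def chi2_limit(arl0, df, lo=0.0, hi=200.0, tol=1e-10):
    """Return the (1 - 1/arl0) quantile of the chi-square(df) distribution."""
    alpha = 1.0 / arl0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > alpha:
            lo = mid  # tail still too heavy: move the limit up
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With `arl0 = 200` and `df = 6` this gives a limit of about 18.55, which would be used directly as $h$ in place of a simulated control limit.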

Table 6.

Actual $ARL_0$s and their standard deviations (in parentheses) when the control limit is set to be $\chi^2_{p,1-\alpha}$.

| Sample size | 250 | 500 | 1000 | 2000 |
|---|---|---|---|---|
| Actual $ARL_0$ | 266.89 (3.63) | 228.56 (2.58) | 213.95 (2.21) | 205.20 (2.09) |

The nominal ARL0 is set to be 200.

4. Real-data application

This section demonstrates our proposed method using an experimental dataset from a soft drink manufacturer, which also appears in Montgomery et al. [29] and Khoravi and Amiri [17]. The manufacturer's marketing department needs to design experiments to test the effect of issuing different discount coupons on consumers' purchasing behavior. The explanatory variable is the discount value of the issued coupon, ranging from 5 cents to 25 cents in steps of 2 cents, that is, $\{x_1=5,x_2=7,\ldots,x_{11}=25\}$. The response variable is the corresponding number of coupons that consumers used in each period. The more coupons consumers use, the greater the promotion of consumption. Following Montgomery et al. [29] and Khoravi and Amiri [17], we take the logarithm of the explanatory variable and replace it with $\log(x)$, obtaining the design matrix:

$$X=\begin{pmatrix}1&1&1&\cdots&1\\ \log(5)&\log(7)&\log(9)&\cdots&\log(25)\end{pmatrix}^{T}. \tag{20}$$

The logistic regression model is used to evaluate the impact of discounts on consumers' willingness to buy. Following Montgomery et al. [29] and Khoravi and Amiri [17], the IC coefficient vector of the logistic regression model is set to be $\beta_{IC}=(\beta_0,\beta_1)^{T}=(-4.5986,1.7397)^{T}$. Considering the data source, the existing example and the applicable situation of the proposed chart, we assume that the intercept term of the logistic regression coefficient vector $\beta$ shifts from the 16th time point, changing from $\beta_0=-4.5986$ to $\beta_0=-4.1785$, which means the OC coefficient vector is $\beta_{OC}=(-4.1785,1.7397)^{T}$. The sample size of each period is n = 100. Both the AEEU-SC and the MVEU-SC are constructed to monitor the OC process.
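To make the setting concrete, the following sketch builds the design matrix of Equation (20) and evaluates the coupon-use probabilities under the IC and OC coefficient vectors. The intercepts are taken as negative, as implied by the estimates reported in Table 7; the variable and function names are hypothetical.

```python
import math

# Coupon discount values: 5, 7, ..., 25 cents (11 design points)
COUPON_VALUES = list(range(5, 26, 2))

# Design matrix with an intercept column and log(discount), Equation (20)
X = [[1.0, math.log(v)] for v in COUPON_VALUES]

BETA_IC = [-4.5986, 1.7397]  # in-control coefficients
BETA_OC = [-4.1785, 1.7397]  # intercept shifted from period 16 onward


def success_probability(x_row, beta):
    """Logistic probability that a coupon with covariates x_row is used."""
    eta = sum(b * v for b, v in zip(beta, x_row))
    return 1.0 / (1.0 + math.exp(-eta))


p_ic = [success_probability(row, BETA_IC) for row in X]
p_oc = [success_probability(row, BETA_OC) for row in X]
```

Under these values the IC probability of use is roughly 0.14 for a 5-cent coupon and rises with the discount, and the intercept shift raises the probability at every design point, which is the kind of overall change in willingness to buy that the charts are asked to detect.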

For each period of the process, the logistic regression coefficient vector $\beta$ is estimated by the MLE, and the results are shown in the second and third columns of Table 7, where $\hat\beta_{0k}$ and $\hat\beta_{1k}$ represent the estimates of the intercept term and the slope term, respectively, for every single period. Both the AEEU and the MVEU are used to recursively update the estimates of the logistic regression coefficients, and the corresponding monitoring statistics $SST^2_{\mathrm{AEEU}}$ and $SST^2_{\mathrm{MVEU}}$ are calculated simultaneously. We set $ARL_0=200$, and the control limits of the two charts considered are obtained using the bootstrap procedure based on the IC dataset. The control limit of the AEEU-SC is $h_{\mathrm{AEEU}}=10.469$, while that of the MVEU-SC is $h_{\mathrm{MVEU}}=10.689$.
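As a small check on how the MVEU column of Table 7 can be updated recursively, the sketch below recomputes the running mean of the per-period intercept estimates for the first six periods. It assumes the MVEU estimate is the plain running mean of the per-period MLEs, which the tabulated values are consistent with up to rounding; the names are hypothetical.

```python
# Per-period intercept estimates for k = 1..6, copied from Table 7
beta0_hat = [-4.885, -4.263, -4.534, -4.970, -4.763, -4.631]
# Reported MVEU intercept estimates for the same periods
reported = [-4.885, -4.574, -4.561, -4.663, -4.683, -4.674]


def running_means(values):
    """Recursive mean update: mean_k = mean_{k-1} + (v_k - mean_{k-1}) / k."""
    mean, out = 0.0, []
    for k, v in enumerate(values, 1):
        mean += (v - mean) / k
        out.append(mean)
    return out


recomputed = running_means(beta0_hat)
```

Only the previous mean and the period index are needed at each step, so no earlier estimate has to be kept in memory.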

Table 7.

Monitoring performance of the AEEU-SC and the MVEU-SC (ARL0 = 200).

| $k$ | $\hat\beta_{0k}$ | $\hat\beta_{1k}$ | AEEU-SC $\hat\beta_0^{(k)}$ | AEEU-SC $\hat\beta_1^{(k)}$ | $SST^2_{\mathrm{AEEU}}$ | MVEU-SC $\bar\beta_0^{(k)}$ | MVEU-SC $\bar\beta_1^{(k)}$ | $SST^2_{\mathrm{MVEU}}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | −4.885 | 1.873 | −4.885 | 1.873 | 0.000 | −4.885 | 1.873 | 0.000 |
| 2 | −4.263 | 1.595 | −4.558 | 1.727 | 3.317 | −4.574 | 1.734 | 3.317 |
| 3 | −4.534 | 1.732 | −4.550 | 1.729 | 0.214 | −4.561 | 1.733 | 0.195 |
| 4 | −4.970 | 1.913 | −4.650 | 1.773 | 1.920 | −4.663 | 1.778 | 1.838 |
| 5 | −4.763 | 1.817 | −4.672 | 1.781 | 0.074 | −4.683 | 1.786 | 0.055 |
| 6 | −4.631 | 1.757 | −4.665 | 1.777 | 0.121 | −4.674 | 1.781 | 0.140 |
| 7 | −4.625 | 1.763 | −4.659 | 1.775 | 0.009 | −4.667 | 1.779 | 0.013 |
| 8 | −4.725 | 1.833 | −4.666 | 1.782 | 1.699 | −4.675 | 1.785 | 1.657 |
| 9 | −4.478 | 1.692 | −4.644 | 1.772 | 0.887 | −4.653 | 1.775 | 0.933 |
| 10 | −5.154 | 1.946 | −4.691 | 1.788 | 1.588 | −4.703 | 1.792 | 1.559 |
| 11 | −4.313 | 1.644 | −4.655 | 1.774 | 0.897 | −4.667 | 1.779 | 0.955 |
| 12 | −4.438 | 1.636 | −4.634 | 1.761 | 5.811 | −4.648 | 1.767 | 5.899 |
| 13 | −4.638 | 1.786 | −4.634 | 1.763 | 0.841 | −4.647 | 1.768 | 0.811 |
| 14 | −4.749 | 1.825 | −4.642 | 1.767 | 0.664 | −4.655 | 1.772 | 0.622 |
| 15 | −4.531 | 1.713 | −4.634 | 1.763 | 0.348 | −4.647 | 1.768 | 0.384 |
| 16 | −4.322 | 1.728 | −4.608 | 1.759 | 10.486 | −4.626 | 1.766 | 10.448 |
| 17 | −3.708 | 1.572 | −4.530 | 1.739 | 39.919 | −4.572 | 1.754 | 40.003 |
| 18 | −4.111 | 1.730 | −4.490 | 1.731 | 33.834 | −4.547 | 1.753 | 34.011 |
| 19 | −4.219 | 1.729 | −4.469 | 1.728 | 15.451 | −4.529 | 1.752 | 15.504 |
| 20 | −4.574 | 1.870 | −4.468 | 1.733 | 15.465 | −4.532 | 1.758 | 15.218 |
| 21 | −4.077 | 1.729 | −4.436 | 1.728 | 31.082 | −4.510 | 1.756 | 31.125 |
| 22 | −4.596 | 1.888 | −4.438 | 1.733 | 15.252 | −4.514 | 1.762 | 14.806 |
| 23 | −4.892 | 2.003 | −4.453 | 1.742 | 15.833 | −4.530 | 1.773 | 15.055 |
| 24 | −4.789 | 1.927 | −4.465 | 1.749 | 6.003 | −4.541 | 1.779 | 5.391 |
| 25 | −4.655 | 1.875 | −4.471 | 1.753 | 4.919 | −4.546 | 1.783 | 4.453 |
| 26 | −3.979 | 1.668 | −4.446 | 1.748 | 16.865 | −4.524 | 1.779 | 16.878 |
| 27 | −3.988 | 1.664 | −4.424 | 1.743 | 13.213 | −4.504 | 1.774 | 13.289 |
| 28 | −3.618 | 1.545 | −4.386 | 1.732 | 21.804 | −4.472 | 1.766 | 22.281 |
| 29 | −4.779 | 1.987 | −4.395 | 1.739 | 17.552 | −4.483 | 1.774 | 16.747 |
| 30 | −3.650 | 1.541 | −4.364 | 1.730 | 14.369 | −4.455 | 1.766 | 14.924 |
| Control limit $h$ | | | | | 10.469 | | | 10.689 |

Table 7 presents the detailed monitoring results of the AEEU-SC and the MVEU-SC. The AEEU-SC issues an alarm at T = 16, while the MVEU-SC signals at T = 17, indicating that the AEEU-SC detects the shift earlier than the MVEU-SC. In practice, when the overall willingness to consume changes, which leads to a shift in the intercept of the logistic regression coefficients, it is urgent for the enterprise to follow the market and monitor the profile. According to Table 7, both recursive parameter update strategies show the same trend when the intercept term $\beta_0$ shifts. By T = 30, the estimates of $\beta_0$ have been updated to $\bar\beta_{0,\mathrm{AEEU}}=-4.364$ and $\bar\beta_{0,\mathrm{MVEU}}=-4.455$ using the AEEU and the MVEU, respectively. This shows that the AEEU tracks the shift of parameters more quickly than the MVEU, which verifies the superiority of the AEEU-SC proposed in this paper.
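The alarm times quoted above can be read off Table 7 mechanically by comparing each monitoring statistic with its control limit. The sketch below does this with the tabulated values; the function name `first_alarm` is hypothetical.

```python
# Monitoring statistics for periods 1..30, copied from Table 7
SST_AEEU = [0.000, 3.317, 0.214, 1.920, 0.074, 0.121, 0.009, 1.699, 0.887,
            1.588, 0.897, 5.811, 0.841, 0.664, 0.348, 10.486, 39.919, 33.834,
            15.451, 15.465, 31.082, 15.252, 15.833, 6.003, 4.919, 16.865,
            13.213, 21.804, 17.552, 14.369]
SST_MVEU = [0.000, 3.317, 0.195, 1.838, 0.055, 0.140, 0.013, 1.657, 0.933,
            1.559, 0.955, 5.899, 0.811, 0.622, 0.384, 10.448, 40.003, 34.011,
            15.504, 15.218, 31.125, 14.806, 15.055, 5.391, 4.453, 16.878,
            13.289, 22.281, 16.747, 14.924]
H_AEEU, H_MVEU = 10.469, 10.689


def first_alarm(stats, limit):
    """Return the first period (1-based) whose statistic exceeds the limit."""
    for k, value in enumerate(stats, 1):
        if value > limit:
            return k
    return None  # no alarm within the observed periods


alarm_aeeu = first_alarm(SST_AEEU, H_AEEU)
alarm_mveu = first_alarm(SST_MVEU, H_MVEU)
```

At period 16 the AEEU-SC statistic (10.486) is just above its limit of 10.469, whereas the MVEU-SC statistic (10.448) is still below its limit of 10.689, which is why the two charts signal one period apart.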

5. Conclusion

Online monitoring should ensure the sensitivity and robustness of control charts while also considering computational and space complexity. Traditional self-starting monitoring schemes generally use all historical samples to conduct parameter estimation. However, this is not economical for processes in which the sample of each period is large, including some complex profile data. Therefore, an online parameter update strategy with sustainable computation is urgently needed. Under the framework of binary profile self-starting monitoring, this paper concentrates on such a parameter update strategy for online self-starting monitoring of binary profiles.

Referring to the AEE estimation for massive datasets in Lin and Xi [23], this paper constructs a recursive update strategy for estimating the parameters of binary profile data. It can recursively update the estimates of both the coefficient vector and the covariance matrix of the logistic regression model, with a lower burden of storing historical observations. Via numerical simulation, the proposed strategy is verified to estimate parameters better than the previous scheme. On this basis, this paper further designs a self-starting monitoring scheme for binary profile data based on the AEE update strategy. The asymptotic property of the presented Hotelling's $T^2$ monitoring statistic is also demonstrated. The simulation results show that the self-starting control chart based on the AEEU performs better under most shift magnitudes and types, illustrating the superiority of the proposed scheme. Furthermore, a real-data application to a soft drink company's discount coupon strategy shows that the recursive parameter update strategy is also of practical significance.

Since the presented control chart is a basic design of self-starting monitoring, more types of monitoring schemes, like the multivariate EWMA chart of Zou et al. [43], can be considered in future study. Also, the parameter update strategy in this paper concentrates on monitoring binary profile data, while it can be extended to many other generalized linear models mentioned above.

Funding Statement

Prof. Wu is supported by the National Natural Science Foundation of China (11871324); Prof. Hu is supported by the Scientific Research Project of Education Department of Zhejiang Province (Y202147034) and Zhejiang College of Shanghai University of Finance and Economics for Scientific Research Projects at the Provincial and Above Levels (2022).

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Alqahtani M.A., Jeong M.K., and Elsayed E.A., Multilevel spatial randomness approach for monitoring changes in 3D topographic surfaces, Int. J. Prod. Res. 58 (2020), pp. 5545–5558.
  • 2. Amiri A., Eyvazian M., and Noorossana Z.R., A parameters reduction method for monitoring multiple linear regression profiles, Int. J. Adv. Manuf. Technol. 58 (2012), pp. 621–629.
  • 3. Amiri A., Ghashghaei R., and Khosravi P., A self-starting control chart for simultaneous monitoring of mean and variance of autocorrelated simple linear profile, 2016 IEEE International Conference on Industrial Engineering and Engineering Management, IEEE, Tehran, 2016.
  • 4. Amiri A., Zou C., and Doroudyan M.H., Monitoring correlated profile and multivariate quality characteristics, Qual. Reliab. Eng. Int. 30 (2014), pp. 133–142.
  • 5. Capizzi G. and Masarotto G., Efficient control chart calibration by simulated stochastic approximation, IIE Trans. 48 (2016), pp. 57–65.
  • 6. Chatterjee S. and Qiu P., Distribution-free cumulative sum control charts using bootstrap-based control limits, Ann. Appl. Stat. 3 (2009), pp. 349–369.
  • 7. Franceschini F. and Settineri L., Control charts for the on-line diagnostics of CMM performances, Int. J. Comput. Integr. Manuf. 13 (2000), pp. 148–156.
  • 8. Fuchs C. and Kenett R.S., Multivariate Quality Control Theory and Applications, New York, Chapman and Hall/CRC, 1998.
  • 9. Ghashghaei R., Khosravi P., and Amiri A., A self-starting control chart for simultaneous monitoring of mean and variance of simple linear profiles, Int. J. Eng. 29 (2016), pp. 1263–1272.
  • 10. He S., Song L., Shang Y., and Wang Z., Change-point detection in Phase I for autocorrelated Poisson profiles with random or unbalanced designs, Int. J. Prod. Res. 59 (2021), pp. 4306–4323.
  • 11. Hotelling H., Multivariate quality control, in Techniques of Statistical Analysis, C. Eisenhart, M. Hastay, and W.A. Wallis, eds., McGraw-Hill, New York, 1947, pp. 111–184.
  • 12. Jensen W.A., Birch J.B., and Woodall W.H., Monitoring correlation within linear profiles using mixed models, J. Qual. Technol. 40 (2008), pp. 167–183.
  • 13. Kang L. and Albin S.L., On-line monitoring when the process yields a linear profile, J. Qual. Technol. 32 (2000), pp. 418–426.
  • 14. Kazemzadeh R.B., Noorossana R., and Amiri A., Phase I monitoring of polynomial profiles, Commun. Stat. Theory Methods 37 (2008), pp. 1671–1686.
  • 15. Kazemzadeh R.B., Noorossana R., and Amiri A., Phase II monitoring of autocorrelated polynomial profiles in AR(1) processes, Sci. Iran 17 (2010), pp. 12–24.
  • 16. Khedmati M. and Niaki S.T.A., Monitoring simple linear profiles in multistage processes by a MaxEWMA control chart, Comput. Ind. Eng. 98 (2016), pp. 125–143.
  • 17. Khoravi P. and Amiri A., Self-starting control charts for monitoring logistic regression profiles, Commun. Stat. Theory Methods 48 (2019), pp. 1860–1871.
  • 18. Kim K., Mahmoud M.A., and Woodall W.H., On the monitoring of linear profiles, J. Qual. Technol. 35 (2003), pp. 317–328.
  • 19. Kinat S., Amin M., and Mahmood T., GLM-based control charts for the inverse Gaussian distributed response variable, Qual. Reliab. Eng. Int. 36 (2020), pp. 765–783.
  • 20. Li W., Pu X., Tsung F., and Xiang D., A robust self-starting spatial rank multivariate EWMA chart based on forward variable selection, Comput. Ind. Eng. 103 (2017), pp. 116–130.
  • 21. Li W. and Qiu P., A general charting scheme for monitoring serially correlated data with short-memory dependence and nonparametric distributions, IISE Trans. 52 (2020), pp. 61–74.
  • 22. Li Y., Wu C., Li W., and Tsung F., Nonparametric passenger flow monitoring using a minimum distance criterion, IISE Trans. (2022). doi: 10.1080/24725854.2022.2092241.
  • 23. Lin N. and Xi R., Aggregated estimating equation estimation, Stat. Interface 4 (2011), pp. 73–83.
  • 24. Liu L., Lai X., Zhang J., and Tsung F., Online profile monitoring for surgical outcomes using a weighted score test, J. Qual. Technol. 50 (2018), pp. 88–97.
  • 25. Liu Y., Zhu J., and Lin D.K., A generalized likelihood ratio test for monitoring profile data, J. Appl. Stat. 48 (2021), pp. 1402–1415.
  • 26. Lowry C.A. and Montgomery D.C., A review of multivariate control charts, IIE Trans. 27 (1995), pp. 800–810.
  • 27. Maleki M.R., Amiri A., and Castagliola P., An overview on recent profile monitoring papers (2008-2018) based on conceptual classification scheme, Comput. Ind. Eng. 126 (2018), pp. 705–728.
  • 28. Mason R.L. and Young J.C., Multivariate Statistical Process Control with Industrial Applications, Society for Industrial and Applied Mathematics, Philadelphia, 2002.
  • 29. Montgomery D.C., Peck E.A., and Vining G.G., Introduction to Linear Regression Analysis, John Wiley & Sons, New York, 2001.
  • 30. Qiu P., Introduction to Statistical Process Control, Chapman & Hall/CRC, Boca Raton, 2014.
  • 31. Qiu P. and Xie X., Transparent sequential learning for statistical process control of serially correlated data, Technometrics 64 (2022), pp. 487–501.
  • 32. Qiu P., Zou C., and Wang Z., Nonparametric profile monitoring by mixed effects modeling, Technometrics 52 (2010), pp. 265–277.
  • 33. Ryu J.H., Wan G., and Kim S., Optimal design of a CUSUM chart for a mean shift of unknown size, J. Qual. Technol. 42 (2010), pp. 311–326.
  • 34. Saghaei A., Rezazadeh S.M., Noorossana R., and Dorri M., Phase II logistic profile monitoring, Cell Transplant 4 (2012), pp. 745–747.
  • 35. Shang Y., Tsung F., and Zou C., Profile monitoring with binary data and random predictors, J. Qual. Technol. 43 (2011), pp. 196–208.
  • 36. Sharafi A., Aminnayeri M., and Amiri A., An MLE approach for estimating the time of step changes in Poisson regression profiles, Sci. Iran 20 (2013), pp. 855–860.
  • 37. Soleimani P. and Asadzadeh S., Effect of non-normality on the monitoring of simple linear profiles in two-stage processes: a remedial measure for gamma-distributed responses, J. Appl. Stat. 49 (2022), pp. 2870–2890.
  • 38. Soleymanian M.E., Khedmati M., and Mahlooji H., Phase II monitoring of binary response profiles, Sci. Iran 20 (2013), pp. 2238–2246.
  • 39.Sparks R.S., CUSUM charts for signalling varying location shifts, J. Qual. Technol. 32 (2000), pp. 157–171. [Google Scholar]
  • 40.Xia Z. and Tsung F., A computationally efficient self-starting scheme to monitor general linear profiles with abrupt changes, Qual. Technol. Quant Manag. 16 (2019), pp. 278–296. [Google Scholar]
  • 41.Yeh A.B., Huwang L., and Li Y., Profile monitoring for a binary response, IIE Trans. 41 (2009), pp. 931–941. [Google Scholar]
  • 42.Zhang Y., He Z., Zhang C., and Woodall W.H., Control charts for monitoring linear profiles with within-profile correlation using Gaussian process models, Qual. Reliab. Eng. Int. 30 (2014), pp. 487–501. [Google Scholar]
  • 43.Zou C., Tsung F., and Wang Z., Monitoring general linear profiles using multivariate EWMA schemes, Technometrics 49 (2007), pp. 395–408. [Google Scholar]
  • 44.Zou C., Zhou C., Wang X., and Tsung F., A self-starting control chart for linear profiles, J. Qual. Technol. 39 (2007), pp. 364–375. [Google Scholar]

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
