ABSTRACT
Increasingly complex models are being fitted to data, especially in Bayesian modelling based on Markov chain Monte Carlo methods. Tailored model diagnostics usually lag behind, and Bayesian mediation models are no exception. In this paper, we develop a method for the detection of influential observations for a popular mediation model and its extensions in a Bayesian context. Detection of influential observations is based on the case-deletion principle. Importance sampling with weights that take advantage of the dependence structure in hierarchical models is used to identify the part of the model that is influenced most. We use the variance of the log importance sampling weights as the measure of influence. We demonstrate that this approach is useful when interest lies in the impact of individual observations on a subset of model parameters. The method is illustrated on a three-level data set from the field of nursing research, which was previously used to fit a mediation model of patient satisfaction with care. We focus on influential cases on both the second and the third level of the data.
Keywords: Bayesian mediation models, influential observations, importance sampling
1. Introduction
Increasingly complex models, often estimated in a Bayesian way, are being proposed in the current statistical literature. Much effort has been spent on developing general models to describe complex phenomena. However, diagnostics and model checking are lagging behind, although they are important to prevent wrong or misleading results and conclusions from being used. One such diagnostic checks whether observations have a high impact on the estimated model parameters.
In this paper we focus on mediation models, which constitute a broad class of often complex models that have received considerable attention in the recent biostatistical literature. More specifically, we wish to develop a general method to detect influential observations in a popular mediation model for ordinal outcomes and its extensions to the multi-level case. In a mediation analysis one aims to explain the causal path from risk factor to outcome via a set of explanatory variables. The most common approaches to mediation analysis assume that the outcome variable Y as well as all explanatory variables are continuous, and the analysis proceeds by using suitable linear models (see, e.g. Preacher et al. [11]). To introduce the concept, we now describe a popular but simple mediation model. Let us assume pairs ( ) with a continuous outcome and a continuous mediator both observed on a set of N independent units. We wish to investigate their relationship with a covariate ( ). A simple mediation model is specified through two model equations based on a classical linear model:
| (1) |
| (2) |
where , are model error terms, and , , , , are regression coefficients.
In the context of mediation modelling, Equation (1) is referred to as a measurement equation, whereas Equation (2) is called a mediation equation. The coefficient is referred to as the direct effect, whereas the product is referred to as the indirect (mediated) effect of the variable W on the outcome. Equations (1)–(2) are fitted jointly in order to estimate the indirect effect. For the identification of the indirect effect, the following assumptions need to be satisfied: (1) no unmeasured treatment-outcome confounding, (2) no unmeasured mediator-outcome confounding, (3) no unmeasured treatment-mediator confounding, and (4) no mediator-outcome confounder affected by treatment (VanderWeele [16]). The direct and indirect effects are of primary interest, therefore it is essential that their estimates are stable. The above simple model will further be extended to capture more complex data structures.
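To make the roles of the direct and indirect effects concrete, the following minimal sketch simulates data from a simple mediation model of the kind described above and recovers both effects by ordinary least squares. It is only an illustration under stated assumptions: the coefficient values, the variable names `w`, `m`, `y` and the helper `fit_simple_mediation` are hypothetical, and least squares stands in for the Bayesian fit used in the paper.

```python
import random


def centered_cross(a, b):
    """Sum of centred cross-products of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))


def fit_simple_mediation(w, m, y):
    """Least-squares estimates of the direct and indirect effect of w on y.

    Mediation equation:   m regressed on w        (slope a)
    Measurement equation: y regressed on w and m  (slopes direct, b)
    The indirect effect is the product a * b.
    """
    s_ww = centered_cross(w, w)
    s_mm = centered_cross(m, m)
    s_wm = centered_cross(w, m)
    s_wy = centered_cross(w, y)
    s_my = centered_cross(m, y)
    a = s_wm / s_ww
    det = s_ww * s_mm - s_wm ** 2
    direct = (s_mm * s_wy - s_wm * s_my) / det
    b = (s_ww * s_my - s_wm * s_wy) / det
    return {"direct": direct, "indirect": a * b}


# Hypothetical true values: direct effect 0.3, indirect effect 0.8 * 0.5 = 0.4.
rng = random.Random(1)
n = 5000
w = [rng.gauss(0.0, 1.0) for _ in range(n)]
m = [0.5 + 0.8 * wi + rng.gauss(0.0, 1.0) for wi in w]
y = [1.0 + 0.3 * wi + 0.5 * mi + rng.gauss(0.0, 1.0) for wi, mi in zip(w, m)]

effects = fit_simple_mediation(w, m, y)
print(effects)
```

Because the indirect effect is a product of coefficients from two different equations, an observation can disturb it through either equation, which is why the diagnostics later in the paper treat the two parts of the model separately.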
The Bayesian approach is often used to fit complex models. In general, Markov chain Monte Carlo (MCMC) methods are employed to generate a sample from the posterior distribution of the parameters. This sample is then used to draw inference on the model parameters. However, no general diagnostic tool for the identification of influential observations is readily available for our model and its extensions. We therefore deem the development of such an instrument necessary; see Section 1.2 for our motivation in a practical setting. In this paper, we consider an observation (or a group of observations) influential if the case-deleted (or group-deleted) posterior distribution of the parameter of interest differs considerably from the posterior distribution based on all observations. The metric that quantifies a considerable difference is defined below.
Case-deletion as a tool for detecting influential observations has been widely used in linear regression models in both frequentist and Bayesian analysis (see, e.g. Cook and Weisberg [4], Weiss [17] and Kass et al. [7]). While in the frequentist approach, analytical results or approximate formulas are available to express the case-deleted parameter estimates as a function of the full sample (FS) estimates, this is not the case anymore in the Bayesian approach. Importance sampling is a common method to calculate the case-deleted posterior parameter estimates based on the FS posterior distribution determined by MCMC methods (e.g. Robert and Casella [12], Section 3.3). In addition, large variability of the importance weights indicates that the case-deleted posterior estimates may be quite different from the FS estimates. For this reason, Thomas et al. [15] proposed to use the variance of log importance sampling weights as a measure of influence.
In this paper, we combine the approach of Bradlow and Zaslavsky [2] with that of Thomas et al. [15] for the detection of influential observations in the mediation model defined in (1, 2) extended to the multi-level setting with ordinal responses and multiple covariates ( ) and mediators ( ), first proposed in our earlier work Rusá et al. [13].
The importance weights suggested here take advantage of the dependence structure in models analogous to (1, 2) in order to determine which part of the model is influenced. This gives analysts additional insight and helps them to identify which part of the model could be improved. With a hierarchical model structure, the choice of suitable importance weights makes leave-one-out diagnostics possible at any level of the data. We propose a diagnostic plot with two marginal influence measures (reflecting the measurement part and the mediation part of the model) plotted on the X-axis and the Y-axis, respectively. The overall influence is evaluated as the sum of the marginal influence measures, so it can easily be read from the off-diagonal lines of the plot.
We now review the existing literature in Section 1.1. In Section 1.2 we formulate the research questions that motivated our developments and introduce the motivating dataset.
1.1. Review and need for extensions
The assessment of influence in Bayesian normal linear models and hierarchical normal random effects models has been studied in Weiss and Cho [18]. Their approach allows one to focus on a subset of parameters, treating the remaining ones as nuisance. In Bayesian generalized linear models where the likelihood dominates the prior, the MCMC output can be used to approximate the frequentist case-deleted maximum likelihood estimate and employed as a measure of influence (Jackson et al. [6]). Hodges [5] expressed hierarchical models as ordinary linear models, which facilitates the development of diagnostics for model selection, case influence and residuals. This method is appropriate for the simple mediation model given in (1, 2). Bayarri and Morales [1] suggested carrying out preliminary checks and proposed several measures for outlier detection in Gaussian models. These proposals may be of interest because outliers are observations that are remote from the bulk of the data and are potentially influential (Zaslavsky and Bradlow [19]).
Model diagnostics have not been restricted to case deletion. Cook [3] proposed general perturbation schemes for assessing local influence of frequentist models. The Bayesian approach to local influence, based on perturbing the value of a suitably chosen hyperparameter away from an initial choice, is discussed in McCulloch [8]. Moreover, it was shown that the posterior variance of the log-likelihood of a given observation is a measure of that observation's local influence in models with conditionally independent observations (Millar and Stewart [9]).
Importance sampling is a popular technique used for the estimation of case-deleted posterior statistics (e.g. posterior means) in Bayesian models. For hierarchical models, the importance weights can be easily computed due to the (assumed) conditional hierarchical independence in the model (Bradlow and Zaslavsky [2]). However, the use of classical importance weights to estimate the case-deleted posterior means may fail because of an excessive (even infinite) variance of the importance weights when the case-deleted posterior differs too much from the original (see, e.g. Hodges [5] or Peruggia [10]). While Bradlow and Zaslavsky [2] tried to reduce the variability of the importance weights using a convenient factorization, they did not solve this issue. Recently, Thomas et al. [15] obtained low-dimensional case-influence summaries entirely via the variance-covariance matrix of the log importance sampling weights. They also considered the joint influence for sets of cases. The influence measures are computed from a principal component analysis (PCA), which reveals the most interesting directions of the local curvature of the Kullback-Leibler divergence of the full posterior from a geometrically perturbed quasi-posterior. In mediation models we are specifically interested in knowing what part of the model (measurement and/or mediation part) is grossly influenced by the observation. Combining the approaches of Bradlow and Zaslavsky [2] and Thomas et al. [15] might therefore be useful for mediation models.
Nevertheless, none of the previously mentioned methods can be directly applied to detect influential observations in the mediation models considered in Section 2. Diagnostic checks should be an integral part of any model evaluation, and it does not make sense to fit complex models without standard model checking, which includes the search for influential observations. Therefore, we deem it necessary to develop computationally tractable methods for the identification of influential observations for complex Bayesian mediation models. To this end, we adopt the approach of Bradlow and Zaslavsky [2] to case influence diagnostics, but our methods differ from theirs in three ways: (i) we use an extended factorization scheme, which is advantageous for mediation models, (ii) we propose simple importance weights if case-deletion for parameters from only a single model equation is of interest, and finally, (iii) we suggest how to combine the importance weights when the influence of a group of observations is of interest. We then use the same measure of influence as Thomas et al. [15]. However, for mediation models the natural decomposition of the model (the measurement equation and the mediation equations) is preferred over the PCA used there.
1.2. Motivating example: RN4CAST study and research questions
The Registered Nurse Forecasting (RN4CAST) study (Sermeus et al. [14]) is a cross-sectional survey of patients and nurses conducted in 12 European countries. This FP7-funded project ran from 2009 to 2010 and collected data on hospitals, nursing units, nurses and patients. In each of the 12 countries (except Sweden) at least 30 hospitals depending upon country size and number of hospitals were involved in the study. Hospitals were selected using different selection mechanisms depending on the country. Nurses were invited to participate as well as patients who were hospitalized in a priori chosen periods. Various hospital characteristics were recorded, such as nurse staffing, nurse education, number of beds, etc. The nurses were interviewed on various nursing care aspects, such as their well-being, satisfaction with their job, their willingness to recommend the hospital and overtime work. The patients provided information on their satisfaction with hospital care and hospital rating.
In Rusá et al. [13], a multi-level mediation model with ordinal outcome is presented to evaluate how the patients' willingness to recommend the hospital relates to the system-level features in the hospital organization or in the nursing care. It was also of interest to know whether the association is mediated by two measures of ‘nursing care left undone’. Here, we aim to discover whether there are hospitals or countries that are influential for estimating the model parameters. In addition, we wish to disentangle their effect into their effect on the measurement model and their effect on the mediation model.
The outline of the paper is as follows. In Section 2 we describe the multi-level moderated mediation model with ordinal outcome and multiple mediators suggested in Rusá et al. [13]. The Bayesian methods for the detection of influential observations are explained in Section 3. In Section 4 we apply our method to the mediation model outlined in Section 2. Concluding remarks are given in Section 5.
2. A general multi-level mediation model
In this section we describe the general multi-level mediation model with ordinal outcome proposed by Rusá et al. [13], for which the case influence diagnostics are developed. This model generalizes the joint model Equations (1, 2) to a hierarchical model with an ordinal outcome. The model was motivated by research questions raised in the RN4CAST study, which is briefly described in Section 1.2. We also show how the posterior distribution can be factorized for the computation of convenient case influence diagnostics using the importance sampling described in Section 3.
2.1. Model formulation
Rusá et al. [13] suggested a multivariate extension of the mediation model given by (1) and (2). We describe the model with the RN4CAST data in mind, but the proposed model is of course more general. Suppose that the data can be structured into three hierarchical levels. Let K be the number of units on the third level (countries in our application); the number of units (hospitals) within the kth unit of the third level (country); and , , , denotes the number of subjects (patients) within a second-level unit (hospital) j of the third-level unit (country) k. Finally, let be the total number of subjects in the data set.
Further, let be the ordinal outcome with L + 1 levels measured on the ith patient in hospital j and country k. A natural approach for an ordinal outcome is to assume that it is derived from a continuous latent variable as follows:
with unknown thresholds such that .
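As a small illustration of the latent-variable construction, the sketch below maps a continuous latent value to an ordinal category by counting how many thresholds lie strictly below it. The function name `to_ordinal`, the 0-based category labels and the convention that a value exactly equal to a threshold falls in the lower category are assumptions made for this example; the paper's own indexing and inequality convention may differ.

```python
from bisect import bisect_left


def to_ordinal(latent, thresholds):
    """Return the ordinal category (0, 1, ..., L) of a latent value.

    thresholds: strictly increasing list of L cut points.
    A latent value equal to a cut point is assigned to the lower category
    (an assumed convention).
    """
    return bisect_left(thresholds, latent)


cut_points = [-1.0, 0.0, 1.0]          # L = 3 thresholds, hence 4 categories
latent_values = [-2.0, -1.0, -0.5, 0.5, 2.0]
categories = [to_ordinal(v, cut_points) for v in latent_values]
print(categories)
```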
Motivated by our application, we assume that all main explanatory variables, confounders and mediators are available only at the hospital level. However, the methods presented in Rusá et al. [13] are directly applicable to variables measured on any higher level. Let , and , respectively, be the vector of the main explanatory variables, confounders, mediators, for hospital j of country k, , .
To deal with the multi-level structure of the RN4CAST data, we included the country-level effects ( ) and hospital-level effects ( ) in the model equation. Consequently, the mediation model for the latent outcome can now be summarized as ( , , )
| (3) |
where , are the unknown country-specific intercepts, and , , , , , are unknown regression parameters, which will be further referred to as and . We assume that , and , , , , are mutually independent.
2.2. The factorization of the posterior distribution
In the following, let be the vector of the ordinal outcome for all patients in hospital j of country k. We denote the total set of outcomes, mediator measurements and latent hospital-level means by , , , respectively.
The vector represents the unknown model parameters (without latent data ), which consists of (i) the thresholds , (ii) the measurement equation regression coefficients , (iii) the mediation equations regression coefficients and (iv) the variance parameters , , . That is, . Furthermore, let and , so .
Further, let and be generic symbols for a density and a conditional density, respectively. As usual, the lower case letters denote the observed values of the corresponding upper case letter random variables. The most important property of our model is that certain subsets of parameters are a posteriori independent (provided they are independent a priori). In particular, in our case, this property holds for parameters and – if we assume that the parameters , , , , and are a priori independent then the conditional independence of some parameters and data and the prior independence implies:
| (4) |
Hence, and are a posteriori independent. This property allows for the derivation of the importance sampling weights suitable for the efficient calculation of case-deleted posterior summary statistics based on a single MCMC run.
3. The detection of influential observations
The Bayesian analysis of model (3) was carried out in Rusá et al. [13]. Apart from the estimation of the parameters it may be of interest to assess the influence of some (sets of) observations on the parameter estimates. In our application, we aim to evaluate the influence on (i) the parameters in the measurement equation , (ii) the parameters in the mediating equations , (iii) the indirect effects . To this end, importance sampling has been utilized in the literature, see, e.g. Bradlow and Zaslavsky [2]. The importance weights suggested by them will be employed for the parameters of the measurement and the mediation equations separately. In this paper, we are interested in the posterior distribution of given the data while leaving out a particular second-level observation. Hence, the term ‘case-deleted’ data will implicitly mean the complete data without an observation in the second level. In the following, denotes the data excluding the second-level observation jk.
3.1. Importance sampling and its use in the Bayesian model diagnostics
The posterior distribution of any function can be obtained from the posterior of . The dissimilarity of the posterior distribution of computed from the full dataset and the posterior distribution based on all data but a second-level observation (here hospital jk) indicates how influential that observation is. However, most often the posterior distribution cannot be derived analytically and must be estimated using MCMC. This was also done in Rusá et al. [13]. Such a comparison requires multivariate numerical integration. Bradlow and Zaslavsky [2] defined the difference of the posterior distributions via their posterior means, i.e. , but other measures such as quantiles can also be used. The case-deleted posterior means, and hence the above difference, can be obtained from the Markov chain that was generated to determine the posterior distribution based on all observations. This is done by reweighting the generated values of the parameters in such a way that the case-deleted posterior estimates are consistently estimated. This technique is based on importance sampling. Importance sampling implies that the case-deleted posterior can be approximated from the full posterior distribution by resampling the elements of the original chain using the weight function
Then, the case-deleted posterior mean of an arbitrary function can be rewritten as
implying that it can be estimated by the weighted sample mean
with the elements from the original chain and normalized importance weights for estimating at the s-th draw of the chain. From the law of large numbers it follows that this estimate is consistent if the original sample converges in distribution to the full data posterior. A similar approach is possible in case we know and (or their ratio) up to a constant. Let us define such as
then the so-called self-normalized importance sampling estimate
| (5) |
is a consistent estimate of . In the remainder of this paper, we will sometimes use a more concise notation or respectively. Furthermore, we can observe that the estimate (5) corresponds to the estimate with normalized weights with .
Bradlow and Zaslavsky [2] showed that the ratio is proportional to so we can define the non-normalized importance weights to be equal to these weights and utilize the estimate (5). Nevertheless, following Bradlow and Zaslavsky [2], in some models (e.g. in hierarchical models) we can suggest more suitable importance weights than these elementary weights. Such weights are often based on some convenient factorization of the posterior distribution and are especially useful when some parameters are considered as nuisance parameters. To be more specific, this approach may be particularly suitable when data augmentation is used because it often leads to a simpler and computationally less intensive form of the importance weights. However, we rarely want to infer about the augmented data, so it makes sense to ‘integrate them out’ from the importance weights.
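The self-normalized estimate with the elementary weights can be sketched on a toy problem where the answer is known exactly. The example below assumes a normal model with known unit variance and a flat prior on the mean, so that both the full and the case-deleted posterior are normal in closed form; this toy model, the seed and the helper name `case_deleted_mean` are illustrative assumptions and not the mediation model of the paper. The weight at each draw is taken proportional to the reciprocal of the likelihood contribution of the deleted case.

```python
import math
import random


def case_deleted_mean(draws, log_lik_deleted):
    """Self-normalized importance sampling estimate of a case-deleted posterior mean.

    draws:           full-data posterior draws of a scalar parameter
    log_lik_deleted: log-likelihood of the deleted case evaluated at each draw
    """
    log_w = [-ll for ll in log_lik_deleted]      # weight proportional to 1 / likelihood
    shift = max(log_w)                           # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * d for wi, d in zip(w, draws)) / sum(w)


rng = random.Random(2024)
n = 20
y = [rng.gauss(0.0, 1.0) for _ in range(n - 1)] + [3.0]   # last case is unusual
ybar = sum(y) / n

# Full-data posterior of the mean under the assumed toy model: N(ybar, 1/n).
n_draws = 20000
draws = [rng.gauss(ybar, 1.0 / math.sqrt(n)) for _ in range(n_draws)]

# Log-likelihood of the case to be deleted at every draw.
log_lik = [-0.5 * math.log(2.0 * math.pi) - 0.5 * (y[-1] - t) ** 2 for t in draws]

is_estimate = case_deleted_mean(draws, log_lik)
exact = sum(y[:-1]) / (n - 1)          # exact case-deleted posterior mean
print(is_estimate, exact)
```

No second chain is needed: the draws from the full-data posterior are reused and only reweighted. The same reuse is what makes the approach attractive for the much more expensive hierarchical model.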
3.2. The importance weights for
In hierarchical models, we can often divide the parameters into a group of ‘interesting’ parameters and a group of ‘nuisance’ parameters (e.g. augmented data) and consider a factorization . In our application, we might utilize three different partitions of dictated by our intention to study the three effects mentioned above: (i) , in case we want to infer about the parameters in the measurement equation only, (ii) , in case we want to infer about the parameters in the mediating equations, (iii) , in the general case. In general, we are interested in the estimation of the case-deleted posterior distribution of a function of the ‘interesting’ parameters , .
In Equation (4) we showed that the parameters from the measurement equation and the parameters from the mediation equation are a posteriori independent. Consequently, the influence diagnostics of the two parts of the model can be carried out separately (ignoring the other part of the model) and then combined in order to assess which part of the model is influenced more. In the following, we present the importance weights for (i) the parameters in the measurement equation (Section 3.3), (ii) the parameters in the mediation equations (Section 3.4) and (iii) the indirect effect or any other function of the parameters in both the measurement and the mediation equations (Section 3.5). Furthermore, in Section 3.6 we show that the higher-level importance weights can be obtained as the product of the relevant lower-level importance weights.
3.3. The importance weights for the parameters in the measurement equation
Let and , where and are the parameters in the measurement equation and mediation equations, respectively. From the following factorization of the posterior distribution of , we can derive the suitable weights suggested by Bradlow and Zaslavsky [2]:
| (6) |
In their terminology, we set , . The purpose of this factorization is to rewrite the posterior distribution as the product of the posterior distribution of the parameters and the augmented data given the case-deleted data and some other factors.
The structure of the desired importance weights is such that it ensures the consistency of the importance weighted estimate. In the denominator we put the first two factors of the right side of Equation (6), whereas in the numerator we can utilize an arbitrary positive function of the augmented data and the data . On the basis of this motivation, we can define the importance weights as
| (7) |
for some . From Equation (7), we see that the estimate based on these importance weights, computed as in Equation (5), converges in probability to
as long as .
Generally, it is a good strategy to choose g such that the factors in g and the denominator of the importance weights (7) (almost) cancel out (Bradlow and Zaslavsky [2]). To be more specific, we replace the parameter in all the terms of the denominator of Equation (7) by a suitable constant (depending on the iteration in chain), such as the posterior mean . Also the replacement of the target parameters by its posterior mean was first suggested by Bradlow and Zaslavsky [2]. Hence, we define
| (8) |
Consequently, the importance weights for the parameters in the measurement equation of model (3) can be defined as
| (9) |
We can now see that the importance weights (9) have similar terms in the numerator and the denominator, which almost cancel out; in this way we eliminate the variability of the importance weights caused by .
3.4. The importance weights for the parameters in the mediation equations
Now, we will consider the importance weights for the case-deleted posterior distribution of the parameters in the mediation equations ( and ). In this case, there are no augmented data so we can use the simple importance weights. To be specific,
| (10) |
3.5. The importance weights for an arbitrary function of parameters in the mediation model
In the most general case we are interested in an arbitrary function of , , e.g. the indirect effect , so in this section, we consider a scenario with and . The importance weights can be computed as the product of the importance weights for the measurement equation and the importance weights for the mediation equations , that is
| (11) |
This follows from
Thus, the importance weights for are given by
| (12) |
In summary, we have proposed suitable importance weights to evaluate the influence of the hospitals on the posterior distribution of parameters of interest. In case (i) is of interest, we choose the importance weights defined in Equation (9), (ii) is of interest, we choose the importance weights defined in Equation (10), (iii) an arbitrary function is of interest, we choose the importance weights defined in Equation (12).
3.6. The importance weights for higher levels of the data
For the multi-level setting, as in our example, the choice of suitable importance weights makes case-deletion diagnostics possible at any level of the data. Having proposed importance weights for deletion of observations at level two, we can now determine the weighting scheme for level three deletion. The same reasoning applies to higher levels of the data, if any. It is straightforward to show that the importance weights for level three deletion can be computed from the importance weights for level two deletion. As in the previous section, we can condition on and set for case-deletion in the measurement equation and for case-deletion in the mediation equations.
Further, as in the factorization for a specific second-level observation (6), we have
so we can see that the only difference between the level two and level three deletion is that instead of one term and , respectively, we get the products across the second level units in the third level unit and , respectively. The same reasoning can be applied to the importance weights for the parameters of mediation equations. Therefore, the third-level deletion importance weights can be defined as
To summarize, we can compute the level three deleted importance weights as the product of the corresponding importance weights of level two units in the level three unit.
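On the log scale the product rule becomes a sum, which is also numerically safer. The following sketch assumes that the level-two log importance weights are stored as one list per hospital (one entry per MCMC draw) and that a lookup from hospital to country is available; the container layout and the names `country_log_weights`, `hospital_log_w` and `country_of` are hypothetical.

```python
def country_log_weights(hospital_log_w, country_of):
    """Sum hospital-level log importance weights within each country, draw by draw.

    hospital_log_w: dict mapping hospital id -> list of log weights (one per draw)
    country_of:     dict mapping hospital id -> country id
    Returns a dict mapping country id -> list of log weights (one per draw).
    """
    out = {}
    for hospital, log_w in hospital_log_w.items():
        country = country_of[hospital]
        if country not in out:
            out[country] = [0.0] * len(log_w)
        # Product of weights = sum of log weights at each draw.
        out[country] = [acc + lw for acc, lw in zip(out[country], log_w)]
    return out


hospital_log_w = {
    "h1": [0.5, -1.0, 0.25],
    "h2": [1.5, 0.0, -0.25],
    "h3": [2.0, 2.0, 2.0],
}
country_of = {"h1": "A", "h2": "A", "h3": "B"}

country_log_w = country_log_weights(hospital_log_w, country_of)
print(country_log_w)
```

The same function can be applied once to the measurement-equation weights and once to the mediation-equation weights.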
3.7. The diagnostic plot for the detection of influential observations in mediation models
Having derived suitable importance weights for different parts of the mediation model, we now aim to find the most influential ones. We will be interested in the observations with the highest variance of the log importance weights with respect to (i) the measurement part of the model; (ii) the mediation part of the model and (iii) the model generally. In the following, the importance weights , and , respectively, will be denoted more concisely as , and , respectively.
We will utilize Equation (11) that states that the joint importance weights can be computed as the product of the importance weights for the measurement equation and the importance weights for the mediation equations or equivalently, that . Moreover, the importance weights and are a posteriori independent so
Therefore it is possible to determine which part of the model is influenced by observation (hospital) jk, if any. As a diagnostic tool, we propose to draw a plot with the values of on the x-axis and on the y-axis for all hospitals (that is , ). From this plot it is straightforward to see the influence on the whole model as well as which part of the model is influenced most. From this graphical check one can identify the most influential cases and investigate them further, e.g. run a new MCMC chain based on the data set without that observation.
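The quantities behind the diagnostic plot can be sketched as follows: for each unit, compute the variance of the log weights over the MCMC draws for the measurement part and for the mediation part, and take their sum as the overall measure, which relies on the a posteriori independence stated above. The data layout and the names `influence_measures` and `rank_by_total` are assumptions made for this example, and the toy numbers are not from the RN4CAST analysis.

```python
from statistics import variance


def influence_measures(log_w_meas, log_w_med):
    """Variance of log importance weights per unit.

    log_w_meas, log_w_med: dicts mapping unit id -> list of log weights per draw
    Returns a dict mapping unit id -> (var_meas, var_med, var_meas + var_med).
    """
    out = {}
    for unit in log_w_meas:
        v_meas = variance(log_w_meas[unit])
        v_med = variance(log_w_med[unit])
        out[unit] = (v_meas, v_med, v_meas + v_med)
    return out


def rank_by_total(measures):
    """Unit ids ordered from most to least influential overall."""
    return sorted(measures, key=lambda unit: measures[unit][2], reverse=True)


# Toy log weights for two units and four draws.
log_w_meas = {"h1": [0.0, 2.0, 0.0, 2.0], "h2": [1.0, 1.0, 1.0, 1.0]}
log_w_med = {"h1": [0.0, 0.0, 0.0, 0.0], "h2": [0.0, 4.0, 0.0, 4.0]}

measures = influence_measures(log_w_meas, log_w_med)
ranking = rank_by_total(measures)
print(measures, ranking)
```

Plotting the first component against the second for all units gives the proposed diagnostic plot; lines on which the sum is constant mark equal overall influence. Since adding a constant to the log weights does not change their variance, the weights do not need to be normalized first.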
For the detection of case influence on the higher (country) level, one can use the same method with the corresponding importance weights , and defined in Section 3.6.
4. Application to RN4CAST data
4.1. Prior specification
The parameters , , , , and were assumed to be a priori independent. Normal priors were specified for and , proper uniform priors on were specified for and . The covariance matrix was parametrized using the variance of denoted by , the variance of denoted by and their correlation ρ. For the standard deviations , we also specified proper uniform priors on . A uniform prior was considered for ρ: . To ensure the identification of the thresholds α and the mean and the variance of the latent variable, we fix the extreme thresholds at preassigned values and . The following non-informative prior was used for the thresholds:
4.2. The influence diagnostics
The model suggested in Rusá et al. [13] aimed to find the effect of nurse staffing, the quality of nurse work environment and other hospital characteristics on patient outcome . In addition, it was of interest to see if this effect was mediated by two measures of care left undone . In this section, we investigate whether the model estimates are too dependent on some hospitals or countries, i.e. whether they are stable with respect to deleting hospitals or countries.
We ran two Markov chains, each with 20,000 samples after 10,000 burn-in samples, so in our case S = 40,000. Then we computed the importance weights , and for , , by plugging in the posterior means of in the function g defined in Equation (8). Next, we computed the variance of the log weights , and . The diagnostic plot of on the x-axis and on the y-axis suggested in Section 3.7 is shown in Figure 1.
Figure 1.
The hospital-level diagnostic plot with the variance of the log importance weights for the measurement equation on the x-axis and the variance of the log importance weights for the mediation equation on the y-axis for all hospitals ( , ). The overall influence on the model is determined by the sum of these two values and can be visually compared with the diagonal lines.
Several hospitals, namely hospitals 147, 151, 152, 156, 158 and 160, stand out in the plot. Of these, hospitals 152 and 158 mainly influence the measurement part of the model, while hospitals 147, 151, 156 and 160 influence the mediation part of the model.
With regard to the country level, we can see from Figure 2 that the most influential country is Greece, which seems to influence both parts of the model. This is not surprising since most of the outlying hospitals from Figure 1 are from Greece. Belgium is the second most influential with respect to the mediation part of the model.
Figure 2.
The country-level diagnostic plot with the variance of the log importance weights for the measurement equation on the x-axis and the variance of the log importance weights for the mediation equation on the y-axis for all countries ( ). The overall influence on the model is determined by the sum of these two values and can be visually compared with the diagonal lines.
However, it is desirable to check the usefulness of the diagnostic plots. To this end, we fitted 14 models with case-deleted data (which we would normally want to avoid). To be specific, we ran new MCMC chains while removing hospitals 147, 151, 152, 156, 158 and 160 and all 8 countries from the data one at a time. For each element of and we computed the difference of the estimates of the posterior mean of the parameter and standardized it by the estimated posterior standard deviation of the parameter based on complete data. The standardized differences are shown in Figure 3, where the parameters are aggregated in two groups given by the part of the model. Elements of from the measurement part of the model are part of a group abbreviated as Meas. in the plot, whereas the parameters from the mediation equations are grouped together and abbreviated as Med.
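For a single parameter, the standardized difference described above could be computed along the following lines. This is a sketch only: the function name, the toy draws and the sign convention (case-deleted minus complete-data mean) are assumptions, since the text does not fix the direction of the difference.

```python
from statistics import mean, stdev


def standardized_difference(full_draws, deleted_draws):
    """Difference of posterior means scaled by the complete-data posterior SD.

    full_draws:    MCMC draws of one parameter from the complete-data fit.
    deleted_draws: MCMC draws of the same parameter from the case-deleted fit.
    """
    return (mean(deleted_draws) - mean(full_draws)) / stdev(full_draws)


# Toy draws (hypothetical); real chains would contain S = 40,000 values.
full = [1.0, 2.0, 3.0]
deleted = [2.0, 3.0, 4.0]
d = standardized_difference(full, deleted)
```

Scaling by the complete-data posterior standard deviation puts parameters with different posterior spreads on a common scale, which is what allows them to be grouped into the Meas. and Med. panels.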
Figure 3.
The standardized difference of the posterior means of the elements of and based on complete data and based on a new MCMC chain without hospital for the hospitals marked as possibly influential (i.e. ). On the x-axis, we distinguish between elements of (measurement equation) and (mediation equations).
In accordance with the diagnostic plot (Figure 1) we can see that for hospitals 147, 151, 156 and 160 the standardized differences are generally higher for the elements of corresponding to mediation equations (Med.). For hospitals 152 and 158, the standardized differences for the elements of are higher. This also corresponds to the diagnostic plot.
The standardized differences for the countries are depicted in Figure 4. Greece seems to be the most influential country and the influence seems to stem from both parts of the model, which corresponds to the diagnostic plot (Figure 2). For Belgium, the standardized differences are relatively high for the parameters in the mediation equations (Med.). Thus, we conclude that our diagnostic plot gives the same information as actually removing the hospitals and countries from the data and refitting the model.
Figure 4.
The standardized difference of the posterior means of the elements of and based on complete data and based on a new MCMC chain without country for all countries (i.e. ). On the x-axis, we distinguish between elements of (measurement equation) and (mediation equations).
Next, we compared the conclusions of our method with the traditional approach, in which case-deleted estimates are computed using the importance sampling weights (5). In Section 1.1, we discussed a possible drawback of the traditional approach, namely that the variance of the importance weights can be too large. The importance weighted estimate may therefore be dominated by a few samples. In our case, the maximal importance weights were quite large ( for Ireland in the measurement part of the model, for Finland in the mediation part). The standardized differences of the country-deleted importance weighted estimates and the posterior means are shown in Figure 5. According to the traditional approach, Finland is marked as influential in the mediation part and Ireland in the measurement part. Looking at Figure 2, we see that this is not the case; these countries were marked as influential due to the high variability of the importance weights.
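The dominance of a few samples can be made concrete with a small sketch. It uses the standard self-normalized form of an importance weighted mean, which may differ in detail from Equation (5). The effective sample size shown here is a common additional diagnostic and is not part of the method of this paper. All function names and toy values are hypothetical.

```python
import math


def normalized_weights(log_weights):
    """Self-normalized importance weights, computed stably from log weights."""
    m = max(log_weights)  # subtract the maximum to avoid overflow in exp
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    return [x / total for x in w]


def weighted_mean(draws, log_weights):
    """Importance weighted estimate of a posterior mean."""
    w = normalized_weights(log_weights)
    return sum(wi * d for wi, d in zip(w, draws))


def effective_sample_size(log_weights):
    """Kish effective sample size, 1 / sum of squared normalized weights."""
    w = normalized_weights(log_weights)
    return 1.0 / sum(wi * wi for wi in w)


# Toy example (hypothetical): one sample carries a very large weight.
draws = [0.0, 1.0, 2.0, 10.0]
log_w = [0.0, 0.0, 0.0, 5.0]
largest = max(normalized_weights(log_w))
estimate = weighted_mean(draws, log_w)
ess = effective_sample_size(log_w)
```

In this toy example, almost all of the normalized weight sits on the last draw, so the weighted estimate is pulled close to that single value and the effective sample size is close to one.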
Figure 5.
The standardized difference of the posterior means of the elements of and based on complete data and country-deleted importance weighted estimates for all countries (i.e. ). On the x-axis, we distinguish between elements of (measurement equation) and (mediation equations).
5. Discussion
In this paper, we combined the approach of Bradlow and Zaslavsky [2] with that of Thomas et al. [15] for the detection of influential observations in a complex mediation model. We have proposed a general method for evaluating the influence of observations at any level of the hierarchy in multi-level data, which is applicable, e.g. to complex Bayesian mediation models. Because of our application, we assumed that the errors in our model were normally distributed, but the method would also be applicable if their distribution were non-normal.
The diagnostics are based on a suitable plot of variance of the log importance weights with the measurement part on the x-axis and the mediation part on the y-axis. The overall influence can also be determined from the plot. The software used for the computation of the importance weights, which also produces the diagnostic plots, is briefly introduced in the Appendix and is available as supplementary material.
The method was applied to a data set from nursing research. The methodology makes use of the latent data in our model, which can be conditioned on and used to propose easy-to-compute importance weights. Moreover, we are able to investigate the influence on subsets of parameters, thereby ignoring the rest of the parameters. This is done by making use of conditional independences in hierarchical models, which leads to simple but at the same time effective importance weights. Another advantage of our approach is that it can be directly generalized to assess the influence of a particular group of observations, because the joint importance weights can be obtained easily as the product of the individual importance weights.
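The group-deletion case can be sketched as follows. Since the joint weight is the product of the individual weights, the joint log weight of each MCMC sample is the sum of the individual log weights, and the influence measure is the variance of these sums. The function name, unit labels and toy values are hypothetical.

```python
from statistics import variance


def joint_log_weights(log_weights_by_unit, group):
    """Per-sample log weight for deleting all units in `group` together.

    A product of weights corresponds to a sum of log weights.
    """
    n_samples = len(log_weights_by_unit[group[0]])
    return [
        sum(log_weights_by_unit[u][s] for u in group)
        for s in range(n_samples)
    ]


# Toy log weights for two units over three MCMC samples (hypothetical).
log_w = {
    "h1": [0.1, -0.2, 0.3],
    "h2": [0.0, 0.4, -0.1],
}
joint = joint_log_weights(log_w, ["h1", "h2"])
joint_influence = variance(joint)
```

Note that the variance of the joint log weights is in general not the sum of the individual variances, because the log weights of different units can be correlated across the MCMC samples. The sums therefore have to be formed sample by sample before the variance is taken.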
Our method can be applied to various mediation models, which satisfy the condition that the parameters in the measurement equation and the parameters in the mediation equations are a posteriori independent. As long as this assumption holds, the influence can be assessed at all levels of the data.
We advocate that model diagnostics, and more specifically influence diagnostics, also be carried out routinely in other complex Bayesian models. The proposed general importance weights are computationally feasible, so applied researchers can compute the relevant importance weights in real time and plot the variance of the log weights of the measurement equation against that of the mediation part of the model.
Supplementary Material
Funding Statement
The work on this paper was supported by the Czech Science Foundation grant GAČR 19-00015S and GAUK 1583317.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Bayarri M.J. and Morales J., Bayesian measures of surprise for outlier detection, J. Statist. Plann. Inference. 111 (2003), pp. 3–22. doi: 10.1016/S0378-3758(02)00282-3
- 2. Bradlow E.T. and Zaslavsky A.M., Case influence analysis in Bayesian inference, J. Comput. Graph. Stat. 6 (1997), pp. 314–331.
- 3. Cook R.D., Assessment of local influence, J. R. Stat. Soc. Ser. B (Methodol.) 48 (1986), pp. 133–169.
- 4. Cook R.D. and Weisberg S., Residuals and Influence in Regression, New York, Chapman and Hall, 1982.
- 5. Hodges J.S., Some algebra and geometry for hierarchical models, applied to diagnostics, J. R. Stat. Soc. Ser. B (Methodol.) 60 (1998), pp. 497–536. doi: 10.1111/1467-9868.00137
- 6. Jackson D., White I.R., and Carpenter J., Identifying influential observations in Bayesian models by using Markov chain Monte Carlo, Stat. Med. 31 (2012), pp. 1238–1248. doi: 10.1002/sim.4356
- 7. Kass R.E., Tierney L., and Kadane J.B., Approximate methods for assessing influence and sensitivity in Bayesian analysis, Biometrika 76 (1989), pp. 663–674. doi: 10.1093/biomet/76.4.663
- 8. McCulloch R.E., Local model influence, J. Amer. Statist. Assoc. 84 (1989), pp. 473–478. doi: 10.1080/01621459.1989.10478793
- 9. Millar R.B. and Stewart W.S., Assessment of locally influential observations in Bayesian models, Bayesian Anal. 2 (2007), pp. 365–383. doi: 10.1214/07-BA216
- 10. Peruggia M., On the variability of case-deletion importance sampling weights in the Bayesian linear model, J. Amer. Statist. Assoc. 92 (1997), pp. 199–207. doi: 10.1080/01621459.1997.10473617
- 11. Preacher K.J., Rucker D.D., and Hayes A.F., Addressing moderated mediation hypotheses: Theory, methods, and prescriptions, Multivariate Behav. Res. 42 (2007), pp. 185–227. doi: 10.1080/00273170701341316
- 12. Robert C.P. and Casella G., Monte Carlo Statistical Methods, 2nd ed, Springer-Verlag, New York, 2004.
- 13. Rusá Š., Komárek A., Lesaffre E., and Bruyneel L., Multilevel moderated mediation model with ordinal outcome, Stat. Med. 37 (2018), pp. 1650–1670. doi: 10.1002/sim.7605
- 14. Sermeus W., Aiken L.H., Van den Heede K., Rafferty A.M., Griffiths P., Moreno-Casbas M.T., Busse R., Lindqvist R., Scott A.P., Bruyneel L., Brzostek T., Kinnunen J., Schubert M., Schoonhoven L., and Zikos D., Nurse forecasting in Europe (RN4CAST): Rationale, design and methodology, BMC. Nurs. 10 (2011), pp. 1–9. doi: 10.1186/1472-6955-10-6
- 15. Thomas Z.M., MacEachern S.N., and Peruggia M., Reconciling curvature and importance sampling based procedures for summarizing case influence in Bayesian models, J. Amer. Statist. Assoc. 113 (2018), pp. 1669–1683. doi: 10.1080/01621459.2017.1360777
- 16. VanderWeele T.J., A three-way decomposition of a total effect into direct, indirect, and interactive effects, Epidemiology 24 (2013), pp. 224–232. doi: 10.1097/EDE.0b013e318281a64e
- 17. Weiss R., An approach to Bayesian sensitivity analysis, J. R. Stat. Soc. Ser. B (Methodol.) 58 (1996), pp. 739–750.
- 18. Weiss R.E. and Cho M., Bayesian marginal influence assessment, J. Statist. Plann. Inference. 71 (1998), pp. 163–177. doi: 10.1016/S0378-3758(98)00015-9
- 19. Zaslavsky A.M. and Bradlow E.T., Posterior predictive outlier detection using sample reweighting, J. Comput. Graph. Stat. 19 (2010), pp. 790–807. doi: 10.1198/jcgs.2010.08141