Author manuscript; available in PMC: 2016 Apr 8.
Published in final edited form as: Psychometrika. 2013 Nov 22;79(4):543–568. doi: 10.1007/s11336-013-9355-z

A CLASS OF DISTRIBUTION-FREE MODELS FOR LONGITUDINAL MEDIATION ANALYSIS

D Gunzler 1, W Tang 2, N Lu 3, P Wu 4, XM Tu 5
PMCID: PMC4825877  NIHMSID: NIHMS770607  PMID: 24271505

Abstract

Mediation analysis constitutes an important part of treatment studies, identifying the mechanisms by which an intervention achieves its effect. The structural equation model (SEM) is a popular framework for modeling such causal relationships. However, current methods impose various restrictions on study designs and data distributions, limiting the utility of the information they provide in real study applications. In particular, missing data in longitudinal studies is commonly addressed under the assumption of missing at random (MAR), yet current methods are unable to handle such missing data if parametric assumptions are violated.

In this paper, we propose a new, robust approach to address the limitations of current SEM within the context of longitudinal mediation analysis by utilizing a class of functional response models (FRM). Being distribution-free, the FRM-based approach does not impose any parametric assumption on data distributions. In addition, by extending the inverse probability weighted (IPW) estimates to the current context, the FRM-based SEM provides valid inference for longitudinal mediation analysis under the two most popular missing data mechanisms: missing completely at random (MCAR) and missing at random (MAR). We illustrate the approach with both real and simulated data.

Keywords: mediation analysis, functional response models, structural equation model, weighted generalized estimating equations II, missing data, generalized least squares

1. Introduction

Mediational analysis is employed in a wide range of behavioral, biomedical, and psychosocial research studies to investigate the causal mechanism of intervention, i.e., mediation. The traditional regression paradigm is ill-suited for modeling such multidimensional causal relationships, because of the complex relationships among different variables and multiple roles a variable may play. Although the structural equation model (SEM) provides an ideal conceptual framework for the dynamic role a mediator plays in channeling the effect of a causal agent to the outcome of interest, the various restrictions imposed by the available inference methods have hindered its wide use and limited the utility of the information provided in real studies.

In real longitudinal studies, we often encounter missing data on the mediator and outcome of interest that is dependent on information from a previous time point, an assumption that has been termed missing at random (MAR). For instance, if an intervention is effective, a patient may drop out of a study if they believe there is no added benefit in continuing the treatment. The missing data is associated with the treatment. If parametric assumptions are not met, under this commonly used assumption of MAR, there currently is no method available for unbiased inference in the SEM framework in the context of mediation analysis.

In this paper, we discuss a new, robust approach to address the various limitations of existing methods by utilizing a class of functional response models (FRM). In Section 2, we briefly review the concept of mediation and the SEM modeling framework for such a causal process. In Section 3, we discuss how to frame the mediation model using FRM after giving a brief review of the latter. In Section 4, we discuss inference for the distribution-free FRM-based SEM for both complete and missing data in the longitudinal study design. In Section 5, we illustrate the proposed approach with both real and simulated data, and compare its performance with existing alternatives. In Section 6, we give our concluding remarks.

2. Structural Equations Model for Mediation Analysis

2.1. Mediation Analysis

In treatment studies, it is often of great interest to identify and study mechanisms by which an intervention achieves its effect. By investigating the mediational process through which the treatment affects study outcomes, we can not only further our understanding of the pathology of the disease and treatment, but may also identify alternative intervention strategies for the disease that use resources efficiently. For example, a tobacco prevention program may teach participants how to stop taking smoking breaks at work, thereby changing the social norms for tobacco use. This change in social norms in turn reduces cigarette smoking (MacKinnon & Fairchild, 2009). With mediation analysis, we gain insight into the mechanism of action of pharmacological and psychotherapeutic treatments. Such information provides an added dimension for understanding the etiology of disease and the pathways of therapeutic effects, so that more efficacious and cost-efficient alternative therapies may be developed.

Baron and Kenny (1986), in the first paper addressing mediation analysis, tested the mediation process through a series of regression equations. Since variables in a causal relationship can play both roles of the cause and effect, the standard regression paradigm is ill-suited for modeling such a relationship, because of the unique cause or effect qualifier of each variable it mandates (Kraemer, 2001; MacKinnon & Fairchild, 2009; Rothman & Greenland, 1998). The structural equation model (SEM) provides a formal modeling and inference framework for mediation and other related causal analysis.

There are many advantages for using the SEM framework in the context of mediation analysis. SEM simplifies testing of mediation hypotheses as SEM is designed, in part, to test these more complicated mediation models in a single analysis (MacKinnon, 2008). In standard regression, ad hoc methods such as the ones developed by Baron and Kenny (1986), Sobel (1982), and Clogg, Petkova, and Shihadeh (1992) must be used for inference about indirect and total effects. These ad hoc methods rely on combining the results of two or more equations to derive the asymptotic variance. This is especially problematic under missing data where a different number of observations may be missing in the different regression equations representing a mediation process. Also, in standard regression, by default, we handle missing data via listwise deletion since there is no built in missing data mechanism when using ordinary least squares (OLS).

Further, since we make causal assumptions in a mediation model, the SEM approach provides model fit information about the consistency of the hypothesized mediational model with the data, as well as evidence for the plausibility of these causality assumptions (Bollen & Pearl, 2012; Imai, Keele, & Tingley, 2010). The procedure by Baron and Kenny (1986) has also been shown to be low powered (MacKinnon, 2008). SEM allows for ease of extension to longitudinal data within a single framework, corresponding with a study’s conceptual framework for clear hypothesis articulation (Preacher, Wichman, MacCallum, & Briggs, 2008). Finally, Bollen and Pearl (2012) note that even the same equation in SEM and regression analysis is based on completely different assumptions and has a different meaning. Standard regression analysis implies a statistical relationship based on a conditional expected value, while SEM implies a functional relationship expressed via a conceptual model, path diagram, and mathematical equations. The causal relationships in a hypothesized mediation process, the simultaneous nature of the indirect and direct effects, and the dual role the mediator plays as both a cause for the outcome and an effect of the intervention can therefore be expressed better in terms of structural equations than regression analysis.

2.2. Structural Equations Model (SEM)

The structural equation model (SEM) addresses complex relationships among variables that are generally depicted in path diagrams. A path diagram consists of nodes representing the different variables and arrows showing relations among the variables. For example, shown in Figure 1 is the path diagram for the causal relationship between the three variables in the smoking prevention example discussed earlier. The three variables, prevention intervention (xi1), social norm (zi2), and amount of smoking (yi3), are measured at three assessment points in chronological order starting at baseline within a longitudinal setting. In the nomenclature of SEM, all effect-receiving variables, such as the social norm and amount of smoking in this example, are called endogenous variables, and effect-imparting variables, such as the prevention intervention, are known as exogenous variables. In this paper, all variables are observed unless otherwise noted. See Bollen (1989) and Kowalski and Tu (2007) for more details about modeling complex relationships involving latent constructs using SEM.

Figure 1. Diagram showing the pathway of a mediation process for a tobacco prevention program.

Consider a longitudinal study with three assessment times indexed by t (1 ≤ t ≤ 3). Let xit, zit, and yit denote the causal (or predictor), mediator and response variables at time t. In this paper, we focus on continuous mediators and responses. The SEM for a typical mediational process involving a single predictor, mediator and response as depicted in the path diagram in Figure 1 is given by

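$$
\begin{aligned}
z_{i2} &= \beta_0 + \beta_{xz} x_{i1} + \epsilon_{zi}, \\
y_{i3} &= \gamma_0 + \gamma_{xy} x_{i1} + \gamma_{zy} z_{i2} + \epsilon_{yi}, \\
\epsilon_i &= \begin{pmatrix} \epsilon_{yi} \\ \epsilon_{zi} \end{pmatrix} \sim (0, \Psi), \quad
\Psi = \begin{pmatrix} \sigma^2_{\epsilon_y} & \sigma_{\epsilon_y \epsilon_z} \\ \sigma_{\epsilon_y \epsilon_z} & \sigma^2_{\epsilon_z} \end{pmatrix}, \\
& x_{i1} \perp \epsilon_{zi}, \quad x_{i1}, z_{i2} \perp \epsilon_{yi}.
\end{aligned}
\tag{1}
$$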

The error terms ϵyi and ϵzi are typically assumed to be normal. However, since we focus on robust inference, we only assume that ϵyi and ϵzi are both continuous with mean 0 and variance Ψ. The above may be expressed in general matrix notation as follows:

$$
u_i = B u_i + \Gamma x_i + \epsilon_i, \quad \epsilon_i \sim (0, \Psi), \quad x_i \perp \epsilon_i, \tag{2}
$$

or alternatively,

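$$
u_i = (I_2 - B)^{-1} \Gamma x_i + \tilde{\epsilon}_i, \quad \tilde{\epsilon}_i \sim (0, \Phi), \quad x_i \perp \tilde{\epsilon}_i, \tag{3}
$$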

where

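$$
\begin{aligned}
u_i &= (y_{i3}, z_{i2})^{\top}, \quad x_i = (1, x_{i1})^{\top}, \quad
\theta = \left(\gamma_0, \gamma_{zy}, \gamma_{xy}, \beta_0, \beta_{xz}, \sigma^2_{\epsilon_z}, \sigma_{\epsilon_y \epsilon_z}, \sigma^2_{\epsilon_y}\right)^{\top}, \\
I_2 &= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & \gamma_{zy} \\ 0 & 0 \end{pmatrix}, \quad
\Gamma = \begin{pmatrix} \gamma_0 & \gamma_{xy} \\ \beta_0 & \beta_{xz} \end{pmatrix}, \\
\tilde{\epsilon}_i &= (I_2 - B)^{-1} \epsilon_i, \quad
\Phi = (I_2 - B)^{-1} \Psi (I_2 - B)^{-\top}.
\end{aligned}
\tag{4}
$$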
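As a small numerical illustration of the reduced form in (3), the sketch below builds B and Γ as defined in (4) and computes (I2 − B)−1Γ by hand for the 2 × 2 case. The function name `reduced_form` and the parameter values in the usage note are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def reduced_form(gamma0, gamma_xy, gamma_zy, beta0, beta_xz):
    """Return (I2 - B)^{-1} Gamma for the mediation SEM in (1).

    Rows are ordered as u_i = (y_i3, z_i2); columns as x_i = (1, x_i1).
    """
    B = [[0.0, gamma_zy], [0.0, 0.0]]
    Gamma = [[gamma0, gamma_xy], [beta0, beta_xz]]

    # Entries of the 2 x 2 matrix I2 - B.
    a, b = 1.0 - B[0][0], -B[0][1]
    c, d = -B[1][0], 1.0 - B[1][1]

    # Closed-form inverse of a 2 x 2 matrix.
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]

    # Matrix product inv @ Gamma.
    return [
        [sum(inv[r][k] * Gamma[k][col] for k in range(2)) for col in range(2)]
        for r in range(2)
    ]
```

With the illustrative values γ0 = 1, γxy = 0.5, γzy = 2, β0 = 3, and βxz = 0.25, the first row of the result is (γ0 + γzyβ0, γxy + γzyβxz) = (7, 1) and the second row is (β0, βxz) = (3, 0.25), so the coefficient of xi1 in the first row is the total effect of the predictor on the response.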

Note that stochastic independence is not taken for granted, as it is particularly important for causal inference (Bollen & Pearl, 2012; Imai et al., 2010). To facilitate validation, the usual independence is replaced by zero correlation, or pseudo-isolation, which can be empirically checked. For example, to assess the causal effect of xi1 and zi2 in (1), it is critical that xi1 be uncorrelated with ϵzi in the first and both xi1 and zi2 be uncorrelated with ϵyi in the second equation of the SEM. It is then readily checked that

$$
\mathrm{Cov}(\epsilon_{zi}, \epsilon_{yi}) = \mathrm{Cov}(\epsilon_{yi}, z_{i2}) = 0. \tag{5}
$$

In other words, ϵzi and ϵyi are uncorrelated as well, i.e., σϵyϵz = 0, for this particular SEM, and thus θ in (4) reduces to θ=(γ0,γzy,γxy,β0,βxz,σϵz2,σϵy2).

Note that an SEM such as the one in (1) only provides causal inference in the absence of selection bias, as with data from randomized controlled trials. Imai et al. (2010) have proposed approaches, based on the counterfactual outcome based causal framework, to extend SEM to settings where selection bias is present, as with data from most epidemiological studies. In this paper, we assume no selection bias unless otherwise stated.

Although (1) can also be applied to cross-sectional data, the longitudinal data setting is more popular for mediation analysis since temporal changes are important for modeling causal relationships. In this paper we consider longitudinal data from multiple waves, based on a hypothesized relationship between xi1, zi2, and yi3. These repeated measures from different times are linked not only with each other, but with other variables as well (see Section 5.1.1). For example, under MAR the missingness of zi2 may depend on zi1 and even yi1 in addition to xi1 (Kowalski & Tu, 2007; Robins, Rotnitzky, & Zhao, 1995).

For mediation analysis, the primary interest is the hypothesis of full mediation H0 : γxy = 0. Under this null, the direct path from x to y is broken, with the effect of x on y fully mediated through the change in z. In practice, however, it is more common that a researcher comes across partial mediation, where the direct path from x to y is partially broken through the change in z, indicating that z mediates some of the effect of x on y. Under partial mediation, one becomes interested in estimating the direct, γxy, indirect, βxzγzy, and total, γxy + βxzγzy, effects of the predictor xi1. The three types of effects are readily obtained from estimates of θ. Inference (standard errors and p-values) about such effects is also easily performed using the Delta or Bootstrap methods (e.g., Sobel, 1982; Clogg et al., 1992; Bollen & Stine, 1990).
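To make the three effects concrete, the following sketch computes the direct, indirect, and total effects from given point estimates, together with a first-order Delta-method standard error for the indirect effect. It assumes, purely for illustration, that the estimates of βxz and γzy are uncorrelated; the function name `mediation_effects` and its arguments are hypothetical.

```python
import math


def mediation_effects(beta_xz, gamma_zy, gamma_xy, se_beta_xz, se_gamma_zy):
    """Direct, indirect, and total effects of x_i1, with a Delta-method SE.

    Assumption of this sketch: the estimates of beta_xz and gamma_zy are
    uncorrelated, so the first-order variance of their product has no
    covariance term.
    """
    direct = gamma_xy
    indirect = beta_xz * gamma_zy   # x -> z path times z -> y path
    total = direct + indirect

    # Gradient of beta_xz * gamma_zy is (gamma_zy, beta_xz).
    var_indirect = (gamma_zy ** 2) * se_beta_xz ** 2 + (beta_xz ** 2) * se_gamma_zy ** 2
    se_indirect = math.sqrt(var_indirect)

    # Two-sided normal p-value for H0: indirect effect = 0.
    z_stat = indirect / se_indirect
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))

    return {
        "direct": direct,
        "indirect": indirect,
        "total": total,
        "se_indirect": se_indirect,
        "z": z_stat,
        "p": p_value,
    }
```

When the two estimates come from a joint fit and are correlated, the covariance term would need to be added to `var_indirect`, or a bootstrap could be used instead.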

Significant advances have been made over the past few decades in the theory and applications as well as software development for fitting SEM models. For example, in addition to specialized packages such as LISREL (Joreskog & Sorbom, 1997), MPlus (Muthén & Muthén, 1998–2010), EQS (Bentler, 2006), and Amos (Arbuckle, 1995–2010), procedures for fitting SEM are also available from general-purpose statistical packages such as R, SAS, STATA, and Statistica. These packages provide inference based on maximum likelihood (ML), generalized least squares (GLS), and weighted least squares (WLS). While ML assumes a multivariate normal distribution for the joint distribution of all variables (exogenous plus endogenous variables, such as the trivariate (xi1, zi2, yi3) in (1) within our context), GLS and WLS do not, thereby providing more robust estimates.

In recent years, many software packages have implemented robust variance estimates to improve the validity of inference in the presence of departures from assumed parametric models (Maas & Hox, 2004; Van der Leeden & Busing, 1994; Van der Leeden, Busing, & Meijer, 1997). In most real studies, missing data invariably occurs, especially for longitudinal studies. In the presence of missing data, such variance estimates not only do little to prevent inference bias (wrong standard errors and type I error levels), but also give rise to bias in parameter estimates (Lu, Tang, He, Yu, Crits-Christoph, Zhang, & Tu, 2009). Our simulations for the SEM-based mediation model in (1) have demonstrated similar poor performances by the GLS, ML, and WLS under departures from the assumed normality in the presence of missing data. We discuss these findings in detail in Section 5.

Other approaches have also been proposed to address missing data. For example, Papadopoulos and Amemiya (2005) proposed a multistage analysis approach to handle two nonnormal, correlated endogenous variables and unbalanced data (due to missing data) in the longitudinal setting (Allison, 1987; Baker & Fulker, 1983; Satorra, 2002; Werts, Rock, & Grandy, 1979). However, their approach to missing data is analogous to the generalized estimating equations (GEE), which provide valid inference only for a more limited class of missing data following the missing completely at random (MCAR) mechanism, rather than the more general response-dependent missing at random (MAR) mechanism that is the focus of this paper.

In a recent paper, Zhang and Wang (2013) introduced a method using normal data and the EM algorithm along with bootstrapping for mediation analysis. However, like other methods that rely on parametric assumptions, their approach will also yield invalid inference under MAR, if the parametric assumptions are violated.

Finally, outside of the SEM framework using the Baron and Kenny (1986) approach, we might observe similar point estimates, though slightly different standard errors and Type I error, via OLS and SEM ML under complete data or under the MCAR assumption when standard regression assumptions are met. However, OLS does not have a built in mechanism to handle missing data and does not provide robust inference under the MAR assumption.

Unlike traditional maximum likelihood inference for regression models, where only the regression coefficients such as γ0, γxy, γzy, β0, and βxz are estimated, the ML for SEM must also estimate the variance parameters such as σϵy2 and σϵz2 in (1) in order to make the γ’s identifiable (Bollen, 1989; Gunzler, 2011). The necessity to estimate these parameters not only complicates the estimation procedure, but also renders conventional robust methods inapplicable to SEM.

Before introducing a distribution-free approach to address the aforementioned flaws of available methods within a longitudinal data setting, we first give a brief overview of the functional response models upon which the new approach is premised.

3. Functional Response Models

As discussed earlier, the SEM methods we use for mediation analysis in this paper make simultaneous inference about both equations in the mediation model. In this section, we develop the modeling approach that will be used to carry out this inference within the context of distribution-free SEM. Traditional regression models for cross-sectional data do not provide such inference, as they are all defined based on a single-subject response. For example, consider a sample of size n, and let yi and xi denote some response and a vector of explanatory variables (independent variables, predictors, or covariates) of interest (1 ≤ i ≤ n). The popular linear model is defined by E(yi | xi) = xi⊤β, where E(yi | xi) denotes the conditional mean of yi given xi, and β a vector of parameters. In this model, the dependent variable is a single-subject response yi. Although the linear model has been extended for modeling more complex types of response variables such as binary and count data (McCullagh & Nelder, 1989), such models still involve the single-subject response yi. For example, in the generalized linear model defined by E(yi | xi) = g(xi⊤β), although the right side is generalized to be a function of the linear predictor, xi⊤β, to accommodate the range restriction of the non-linear response yi, the left side remains identical to that of the linear model. Further, the response only appears on the left, while the explanatory variables must stay on the right side of the model.

These constraints prevent regression models from being applied to many problems of interest in practice. For example, within our context of SEM, since an endogenous variable can be on either side of an equation, this feature immediately excludes regression as a framework for inference about SEM. In other applications such as correlation analysis, mixture models and social network connectivity, we are interested in relationships defined by outcomes from multiple subjects, which again violates the confines of regression. By addressing the fundamental limitations of the classic regression, the functional response model (FRM) can express a broader class of problems under a regression-like framework (Kowalski & Tu, 2007; Kowalski, Pagano, & DeBruttola, 2002; Kowalski & Powell, 2004; Ma, Tang, Feng, & Tu, 2008; Ma, Tang, Yu, & Tu, 2010; Tu, Feng, Kowalski, Tang, Wang, Wan, & Ma, 2007; Yu, Tang, Kowalski, & Tu, 2011; Yu, Chen, Tang, He, Gallop, Crits-Christoph, Hu, & Tu, 2013).

Consider a class of distribution-free regression models defined by

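$$
\begin{aligned}
E\left[ f(y_{i_1}, \ldots, y_{i_q}) \mid g(y_{i_1}, \ldots, y_{i_q}), x_{i_1}, \ldots, x_{i_q} \right]
&= h\left( g(y_{i_1}, \ldots, y_{i_q}), x_{i_1}, \ldots, x_{i_q}; \theta \right), \\
(i_1, \ldots, i_q) \in C_q^n, \quad 1 \le q, \quad 1 \le i \le n,
\end{aligned}
\tag{6}
$$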

where yi = (yi1, … , yiq) denotes the vector of responses from the ith subject, f is some vector-valued function and g is a subset of variables yi1, … , yiq with f and g containing nonoverlapping variables, h(θ) some vector-valued smooth function (e.g., with continuous derivatives up to the second order), θ a vector of parameters of interest, q some positive integer, and $C_q^n$ the set of $\binom{n}{q}$ combinations of q distinct elements (i1, … , iq) from the integer set {1, … , n}. The FRM in (6) extends the single-subject response in the classic GLM to a function of responses from multiple subjects. For example, by setting q = 1, we immediately obtain from (6) the class of distribution-free GLM.

To apply the FRM in our setting, let

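$$
\begin{aligned}
f_i &= \left(f_{1i}^{\top}, f_{2i}^{\top}\right)^{\top}, \quad
h_i(\theta) = h(x_i, \theta) = \left(h_{1i}^{\top}(\theta), h_{2i}^{\top}(\theta)\right)^{\top}, \quad i = 1, 2, \ldots, n, \\
f_{1i} &= (y_{i3}, z_{i2})^{\top}, \quad
f_{2i} = \left(y_{i3}^2, \; y_{i3} z_{i2}, \; z_{i2}^2\right)^{\top}, \quad x_i = x_{i1}, \\
h_{1i}(\theta) &= \left( (\gamma_0 + \gamma_{zy}\beta_0) + (\gamma_{xy} + \gamma_{zy}\beta_{xz}) x_{i1}, \; \beta_0 + \beta_{xz} x_{i1} \right)^{\top}, \\
h_{2i}(\theta) &= E(f_{2i} \mid x_i) = \left( E(y_{i3}^2 \mid x_i), \; E(y_{i3} z_{i2} \mid x_i), \; E(z_{i2}^2 \mid x_i) \right)^{\top},
\end{aligned}
\tag{7}
$$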

where θ=(γ0,γzy,γxy,β0,βxz,σϵz2,σϵy2) is the set of parameters, and

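$$
\begin{aligned}
E(z_{i2}^2 \mid x_i) &= \sigma^2_{\epsilon_z} + (\beta_0 + \beta_{xz} x_{i1})^2, \\
E(y_{i3} z_{i2} \mid x_i) &= \gamma_{zy} \sigma^2_{\epsilon_z} + (\beta_0 + \beta_{xz} x_{i1}) \left[ (\gamma_0 + \gamma_{zy}\beta_0) + (\gamma_{xy} + \gamma_{zy}\beta_{xz}) x_{i1} \right], \\
E(y_{i3}^2 \mid x_i) &= \gamma_{zy}^2 \sigma^2_{\epsilon_z} + \sigma^2_{\epsilon_y} + \left[ (\gamma_0 + \gamma_{zy}\beta_0) + (\gamma_{xy} + \gamma_{zy}\beta_{xz}) x_{i1} \right]^2.
\end{aligned}
\tag{8}
$$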

Then the FRM for the SEM in (7) is

$$
E(f_i \mid x_i) = h_i(\theta), \quad i = 1, 2, \ldots, n. \tag{9}
$$

Note that as σϵyϵz = 0 following from (1), this parameter is not included in θ. Note also that the xi in the FRM above is defined differently from the xi in (2) or (3).

As in the case of (1), hi (θ) above is not linear in the components of θ, and thus the FRM-based SEM in (9) is again not a linear model. The first component f1i of the response function fi is identical to ui in the parametric setup in (1), while the second f2i contains the necessary second-order moments of ui to provide the needed information to identify all the γ’s in (7). The three higher-order moment equations in (8) not only permit estimation of the two variance parameters σϵy2 and σϵz2, but also supplement the needed information to address the identifiability of the γ’s.
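The second-order moments implied by the SEM in (1) can be checked numerically. The sketch below computes E(yi3² | xi), E(yi3zi2 | xi), and E(zi2² | xi) exactly by enumerating independent two-point error distributions (mean 0, clearly nonnormal), and compares them with the closed forms that follow from (1) when the two errors are uncorrelated: a conditional covariance of γzyσϵz2 between yi3 and zi2, plus the product of their conditional means. All function names, error supports, and parameter values are hypothetical choices for illustration.

```python
from itertools import product


def enumerated_moments(theta, x, errors_z, errors_y):
    """Exact conditional moments given x, for equally likely error values."""
    g0, gxy, gzy, b0, bxz = theta
    n_cases = len(errors_z) * len(errors_y)
    ey2 = eyz = ez2 = 0.0
    for ez, ey in product(errors_z, errors_y):
        z = b0 + bxz * x + ez               # first equation in (1)
        y = g0 + gxy * x + gzy * z + ey     # second equation in (1)
        ey2 += y * y / n_cases
        eyz += y * z / n_cases
        ez2 += z * z / n_cases
    return ey2, eyz, ez2


def model_moments(theta, x, var_z, var_y):
    """Closed-form conditional second moments implied by (1)."""
    g0, gxy, gzy, b0, bxz = theta
    mu_z = b0 + bxz * x
    mu_y = (g0 + gzy * b0) + (gxy + gzy * bxz) * x
    return (
        gzy ** 2 * var_z + var_y + mu_y ** 2,   # E(y^2 | x)
        gzy * var_z + mu_z * mu_y,              # E(y z | x)
        var_z + mu_z ** 2,                      # E(z^2 | x)
    )
```

For example, with θ = (γ0, γxy, γzy, β0, βxz) = (1, 0.5, 2, 3, 0.25), x = 1, z-errors in {−1, 1} and y-errors in {−2, 2}, both functions return (72, 28, 11.5625), which illustrates that only the first two moments of the errors enter these equations.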

For the particular SEM in (1) for mediation analysis, we may also define an alternative FRM to estimate the parameters of primary interest, θ=(γ0,γxy,γzy,β0,βxz), without the help of higher-order moments as f2i in (7). The issue of identifiability is introduced when taking the conditional expectation of yi3 given xi1 in the second equation (Bollen, 1989; Gunzler, 2011). This “smoothing” process eliminates the independent information needed to estimate γzy, thereby rendering the resulting smoothed version incapable of identifying the parameters in the second regression model in (1). Thus, to identify all the γ’s without relying on higher-order moments, we may define an FRM by bypassing such a smoothing process.

To this end, consider an FRM defined by

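$$
\begin{aligned}
E(f_{i1} \mid g_{i1}) &= h_1(g_{i1}, \theta), \quad E(f_{i2} \mid g_{i2}) = h_2(g_{i2}, \theta), \\
f_{i1} &= y_{i3}, \quad f_{i2} = z_{i2}, \quad g_{i1} = \{x_{i1}, z_{i2}\}, \quad g_{i2} = x_{i1}, \\
h_1(g_{i1}, \theta) &= \gamma_0 + \gamma_{xy} x_{i1} + \gamma_{zy} z_{i2}, \quad
h_2(g_{i2}, \theta) = \beta_0 + \beta_{xz} x_{i1}.
\end{aligned}
\tag{10}
$$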

As with the FRM in (9), fi1 and gi1 contain no common variable. Since the mean response h1(gi1, θ) above retains the variable zi2, all model parameters θ are uniquely defined, and thus estimable. Further, the FRM is not a longitudinal regression, since git involves a varying set of variables for t = 1, 2.

For either FRM-based mediation model in (7) and (10), its cross-sectional data version is similarly defined by simply modifying the respective components of the FRMs. Since the alternative FRM in (10) does not involve second-order moments, as do ML, GLS, and WLS, it is much easier to implement, saving valuable computation time.

4. Distribution-Free Inference

We first discuss distribution-free inference with no missing data, which is the same for either cross-sectional or longitudinal data, and then extend the considerations to the longitudinal setting under the two most common missing data mechanisms, MCAR and MAR.

4.1. Complete Data

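Consider the FRM-based mediation model in (7) with complete data, with fi = (yi3, zi2, yi3², yi3zi2, zi2²)⊤. Let

$$
S_i = f_i - h_i, \quad D_i = \frac{\partial}{\partial \theta} h_i, \quad V_i = \mathrm{Var}(f_i \mid x_i), \quad w_{ni} = D_i V_i^{-1} S_i. \tag{11}
$$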

We estimate θ by the following set of estimating equations:

$$
w_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} w_{ni} = \frac{1}{n} \sum_{i=1}^{n} D_i V_i^{-1} S_i = 0. \tag{12}
$$

Given the model specifications, Si and Di are readily evaluated. Further, given parametric assumptions in (1), Vi is also easily computed. Under (9), the estimate θ^ obtained as the solution of θ to the equations in (12) is consistent and asymptotically normal, regardless of the data distributions, i.e.,

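$$
\sqrt{n}\left(\hat{\theta} - \theta\right) \to_d N(0, \Sigma_\theta), \quad
B = E\left(D_i V_i^{-1} D_i^{\top}\right), \quad
\Sigma_\theta = B^{-1} E\left(D_i V_i^{-1} S_i S_i^{\top} V_i^{-1} D_i^{\top}\right) B^{-\top}, \tag{13}
$$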

where →d denotes convergence in distribution.

Unlike the ML estimate, the asymptotic results above do not require the (normal) distribution assumptions. But if the conditional joint ui = (yi3, zi2) given xi1 does follow the multivariate normal, the asymptotic variance Σθ in (13) simplifies to the model-based asymptotic variance Σθ = B−1. A consistent estimate of Σθ is obtained by substituting moment estimates in place of the respective parameters:

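$$
\hat{\Sigma}_\theta = \hat{B}^{-1} \left( \frac{1}{n-1} \sum_{i=1}^{n} \hat{D}_i \hat{V}_i^{-1} \hat{S}_i \hat{S}_i^{\top} \hat{V}_i^{-1} \hat{D}_i^{\top} \right) \hat{B}^{-\top}, \quad
\hat{B} = \frac{1}{n-1} \sum_{i=1}^{n} \hat{D}_i \hat{V}_i^{-1} \hat{D}_i^{\top},
$$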

where B̂, D̂i, Ŝi, and V̂i denote the corresponding quantities with θ replaced by θ̂. Further, since we are conditioning on the exogenous variable xi1 in our estimating equations, the asymptotic results of the FRM-based approach hold regardless of the type (e.g., discrete or continuous) or the data distribution of xi1.

The set of equations in (12) bears a close resemblance to the generalized estimating equations II or GEE II (Kowalski & Tu, 2007; Prentice & Zhao, 1991; Reboussin & Liang, 1998). For convenience, we refer to (12) as GEE II throughout the rest of the discussion. By setting fi and hi equal to the ones in (10), we can also use (12) to provide inference about θ for this alternative FRM. To ensure consistent estimates, however, we must select Vi in such a way that E(Vi−1Si | xi1) = Vi−1E(Si | xi1) (Scharfstein, Rotnitzky, & Robins, 1999). For example, this condition is met if Vi = Var(fi | xi1). But if Vi = Var(fi | xi1, zi2), then E(Vi−1Si | xi1) ≠ Vi−1E(Si | xi1), and (12) is not guaranteed to provide consistent estimates. Another choice of Vi to ensure this condition is

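$$
V_i = \begin{pmatrix}
\mathrm{Var}(f_{i1} \mid x_{i1}) & \mathrm{Cov}(f_{i1}, f_{i2} \mid x_{i1}) \\
\mathrm{Cov}(f_{i1}, f_{i2} \mid x_{i1}) & \mathrm{Var}(f_{i2} \mid x_{i1}, z_{i2})
\end{pmatrix}.
$$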

A straightforward choice of Vi to ensure this condition is I2. The estimating equations approach discussed yields more robust inference than ML. Therefore, using nonnormally distributed data, we will obtain more robust estimates for the standard errors and Type I errors using this approach. Also of note is that while we are only estimating first moment parameters in the alternative FRM, the estimation procedure accounts for the correlated responses because we are including a working covariance matrix Vi.
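For the alternative FRM in (10) with complete data and the working choice Vi = I2, the estimating equations in (12) separate into two sets of least-squares normal equations: one for the regression of zi2 on xi1 and one for the regression of yi3 on (xi1, zi2). The sketch below solves them with the Python standard library only. The helper names, the binary predictor, and the centered exponential (skewed, nonnormal) errors used in `simulate` are assumptions made for illustration, not the simulation design of this paper.

```python
import random


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def least_squares(rows, target):
    """Solve sum_i g_i (f_i - g_i' coef) = 0, i.e. the normal equations."""
    k = len(rows[0])
    A = [[sum(r[a] * r[c] for r in rows) for c in range(k)] for a in range(k)]
    rhs = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(k)]
    return solve(A, rhs)


def fit_frm10(x, z, y):
    """Point estimates for the FRM in (10) with V_i = I_2 and complete data."""
    beta0, beta_xz = least_squares([[1.0, xi] for xi in x], z)
    gamma0, gamma_xy, gamma_zy = least_squares(
        [[1.0, xi, zi] for xi, zi in zip(x, z)], y
    )
    return {"gamma0": gamma0, "gamma_xy": gamma_xy, "gamma_zy": gamma_zy,
            "beta0": beta0, "beta_xz": beta_xz}


def simulate(n, theta, seed=0):
    """Draw (x, z, y) from (1) with skewed errors of mean 0 and variance 1."""
    rng = random.Random(seed)
    g0, gxy, gzy, b0, bxz = theta
    x = [float(rng.random() < 0.5) for _ in range(n)]
    z = [b0 + bxz * xi + rng.expovariate(1.0) - 1.0 for xi in x]
    y = [g0 + gxy * xi + gzy * zi + rng.expovariate(1.0) - 1.0
         for xi, zi in zip(x, z)]
    return x, z, y
```

A call such as `fit_frm10(*simulate(5000, (1.0, 0.5, 0.8, 2.0, 1.0)))` should return estimates close to the generating values even though the errors are far from normal. Standard errors would still need the sandwich form in (13); this sketch covers point estimation only.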

4.2. Missing Data

GEE II has also been applied to unbalanced data, and methods analogous to GEE II have been proposed for nonnormal, correlated, and unbalanced data in the longitudinal setting, applicable to mediation analysis (Allison, 1987; Baker & Fulker, 1983; Papadopoulos & Amemiya, 2005; Satorra, 2002; Werts et al., 1979). However, as missing data with an underlying mechanism for missingness arise in virtually all real studies, applications of GEE II restricted to the subsample of subjects with completely observed data over all assessments (t = 1, 2, 3) are not only inefficient, but more importantly are also vulnerable to selection bias (Kowalski & Tu, 2007; Lu et al., 2009; Robins et al., 1995; Scharfstein et al., 1999; Tsiatis, 2006).

For mean-based distribution-free models such as the generalized linear model, the weighted generalized estimating equations (WGEE) is the most popular for inference when the missing data follows the missing at random (MAR) assumption, a plausible and yet quite general statistical model for missing data that is applicable to many studies in practice (Kowalski & Tu, 2007; Little & Rubin, 1987; Lu et al., 2009; Robins et al., 1995; Scharfstein et al., 1999; Tsiatis, 2006). We discuss below how to extend this inverse probability weighting (IPW) based approach to the current FRM-based SEM models for mediation analysis.

We assume that there is no missing data at baseline t = 1, and that missing data is the result of a missed visit. Thus, the trio (xit, zit, yit) is observed for t = 1, but subject to missingness for t = 2, 3. Define a missing (or rather observed) data indicator for each ith subject as follows:

$$
r_{it} = \begin{cases}
1 & \text{if } (x_{it}, z_{it}, y_{it}) \text{ is observed}, \\
0 & \text{if } (x_{it}, z_{it}, y_{it}) \text{ is missing},
\end{cases}
\qquad r_i = (r_{i1}, r_{i2}, r_{i3})^{\top}.
$$

Under the assumptions, ri1 = 1 (1 ≤ i ≤ n). Let

$$
\pi_{it} = \Pr(r_{it} = 1 \mid x_i, z_i, y_i), \quad \Delta_{it} = \frac{r_{it}}{\pi_{it}}. \tag{14}
$$

In most applications, the missing data probability πit is unknown and must be estimated. Under the missing completely at random (MCAR) assumption (Little & Rubin, 1987), ri is independent of xi, zi and yi, yielding πit = Pr(rit = 1) = πt (2 ≤ t ≤ 3, 1 ≤ i ≤ n). In this case, πt, a constant independent of xi, zi and yi, is readily estimated by the sample moment: $\hat{\pi}_t = \frac{1}{n} \sum_{i=1}^{n} r_{it}$ (2 ≤ t ≤ 3).
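Under MCAR, the sample-moment estimate of πt is simply the fraction of subjects observed at time t. A minimal sketch follows; the function name `mcar_observation_rates` and the data layout (one tuple of indicators per subject) are hypothetical.

```python
def mcar_observation_rates(r):
    """Estimate pi_t for t = 2, 3 as the proportion of subjects observed.

    r is a list with one tuple (r_i1, r_i2, r_i3) of 0/1 indicators per subject.
    """
    n = len(r)
    return {t: sum(ri[t - 1] for ri in r) / n for t in (2, 3)}
```

For four subjects with indicators (1, 1, 1), (1, 1, 0), (1, 0, 0), and (1, 1, 1), this gives estimates of 0.75 for t = 2 and 0.5 for t = 3.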

Under MAR, πit becomes dependent on the observed xis, zis, yis for s up to and including time t (2 ≤ t ≤ 3), making it difficult to model and estimate πit without imposing some further assumption. One popular constraint is the Monotone Missing Data Pattern (MMDP). Under MMDP, xit, zit and yit are observed only if their predecessors xis, zis, and yis prior to time t are all observed. As patient dropout is the most common source of missing data under MAR, this assumption not only reduces the number of missing data patterns and the complexity in modeling πit, but also posits a plausible model for missingness in most real studies (Kowalski & Tu, 2007; Robins et al., 1995).

For 2 ≤ t ≤ 3, let

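$$
\tilde{x}_{it} = \left(x_{i1}, \ldots, x_{i(t-1)}\right)^{\top}, \quad
\tilde{z}_{it} = \left(z_{i1}, \ldots, z_{i(t-1)}\right)^{\top}, \quad
\tilde{y}_{it} = \left(y_{i1}, \ldots, y_{i(t-1)}\right)^{\top}, \quad 2 \le t \le 3, \; 1 \le i \le n,
$$

denoting the vectors containing the explanatory, mediator, and response variables prior to time t, respectively. Let

$$
H_{it} = \left\{ \tilde{x}_{it}, \tilde{z}_{it}, \tilde{y}_{it}; \; 2 \le t \le 3 \right\}.
$$

In the above, Hit contains all the observed information prior to time t. It follows from the posited assumptions of MAR that

$$
\pi_{it} = \Pr(r_{it} = 1 \mid x_i, z_i, y_i) = \Pr(r_{it} = 1 \mid H_{it}), \quad 2 \le t \le 3.
$$

Thus, under MAR, the independence of rit with xi, zi and yi under MCAR is replaced by a weaker version of conditional independence: rit is independent of (xit, zit, yit) given Hit.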

To estimate πit, let pit = Pr(rit = 1 | ri(t–1) = 1, Hit), denoting the one-step transition probability for observing data from time t – 1 to t. We can model pit using logistic regression:

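$$
\mathrm{logit}\left(p_{it}(\eta_t)\right) = \mathrm{logit}\left[\Pr\left(r_{it} = 1 \mid r_{i(t-1)} = 1, H_{it}\right)\right]
= \eta_{0t} + \eta_{xt}^{\top} \tilde{x}_{it} + \eta_{zt}^{\top} \tilde{z}_{it} + \eta_{yt}^{\top} \tilde{y}_{it}, \quad 2 \le t \le 3, \tag{15}
$$

where ηt = (η0t, ηxt⊤, ηzt⊤, ηyt⊤)⊤ are the parameters for the logistic model above. It is readily shown that under MMDP

$$
\pi_{it}(\eta) = p_{it} \Pr\left(r_{i(t-1)} = 1 \mid H_{i(t-1)}\right) = \prod_{s=2}^{t} p_{is}(\eta_s), \quad 2 \le t \le 3, \; 1 \le i \le n, \tag{16}
$$

where η = (η2⊤, η3⊤)⊤ contains the collection of all parameters for modeling πit(η) (t = 2, 3). Thus, we can estimate πit from the estimates of pit using the relationship in (16).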
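The relationship in (16) can be sketched as a running product of the one-step transition probabilities from the logistic model in (15). In the code below, the function names, the dictionary layout keyed by t, and the coefficient values in the usage note are all hypothetical; in practice the coefficients would be the estimates of η2 and η3.

```python
import math


def expit(v):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-v))


def observation_probabilities(eta, history):
    """Compute pi_i2 and pi_i3 for one subject under MMDP, as in (16).

    eta[t]     : coefficients for time t, intercept first, then one
                 coefficient per element of history[t].
    history[t] : observed values prior to time t (the elements of H_it).
    """
    pi = {}
    running = 1.0
    for t in (2, 3):
        linear = eta[t][0] + sum(c * h for c, h in zip(eta[t][1:], history[t]))
        running *= expit(linear)   # multiply by the transition probability p_it
        pi[t] = running
    return pi
```

With all linear predictors equal to zero, each transition probability is 0.5, so the function returns 0.5 for t = 2 and 0.25 for t = 3.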

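To estimate η, we can use either maximum likelihood or estimating equations. For example, when using maximum likelihood, we obtain η̂ as the solution to the score-based estimating equations:

$$
Q_n(\eta) = \sum_{i=1}^{n} \left(Q_{i2}^{\top}, Q_{i3}^{\top}\right)^{\top} = 0, \tag{17}
$$

where

$$
Q_{it} = \frac{\partial}{\partial \eta_t} \left\{ r_{i(t-1)} \left[ r_{it} \log(p_{it}) + (1 - r_{it}) \log(1 - p_{it}) \right] \right\}, \quad 2 \le t \le 3, \; i = 1, 2, \ldots, n. \tag{18}
$$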

With an estimated πit, we can develop a set of weighted GEE II (WGEE II) for inference about θ. Consider first the FRM in (7) and define a weighting matrix as follows:

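$$
\Delta_i = \begin{pmatrix}
\frac{r_{i3}}{\pi_{i3}} & 0 & 0 & 0 & 0 \\
0 & \frac{r_{i2}}{\pi_{i2}} & 0 & 0 & 0 \\
0 & 0 & \frac{r_{i3}}{\pi_{i3}} & 0 & 0 \\
0 & 0 & 0 & \frac{r_{i3}}{\pi_{i3}} & 0 \\
0 & 0 & 0 & 0 & \frac{r_{i2}}{\pi_{i2}}
\end{pmatrix}, \quad 1 \le i \le n. \tag{19}
$$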

Now consider the following WGEE II:

$$
w_n(\theta) = \sum_{i=1}^{n} w_{ni} = \sum_{i=1}^{n} D_i V_i^{-1} \Delta_i S_i = 0, \tag{20}
$$

where Δi is given in (19), and Di, Vi, and Si are defined the same as in the equations in (12) except for the redefined longitudinal response fi and mean function hi. As in the GEE II case, Vi may be computed under the normal distribution assumption. Finally, to compute an estimate of θ from (20), we must substitute an estimate of η such as the one defined by (17) and (18).
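The role of the weights Δi in (20) can be seen in a scalar toy analogue, where the parameter is a single mean μ, Di = Vi = 1, Si = yi − μ, and the weight is ri/πi. The sketch below solves the weighted equation in closed form and contrasts it with the complete-case mean. The function names and the small data set in the usage note are hypothetical, and the observation probabilities are treated as known rather than estimated.

```python
def ipw_mean(y, r, pi):
    """Solve sum_i (r_i / pi_i) * (y_i - mu) = 0 for mu.

    Entries of y with r_i = 0 are never read, so they may be None.
    """
    num = sum(yi / p for yi, ri, p in zip(y, r, pi) if ri == 1)
    den = sum(1.0 / p for ri, p in zip(r, pi) if ri == 1)
    return num / den


def complete_case_mean(y, r):
    """Unweighted mean over the observed subjects only."""
    observed = [yi for yi, ri in zip(y, r) if ri == 1]
    return sum(observed) / len(observed)
```

Take eight subjects, four with outcome 1 who are always observed and four with outcome 3 who are observed with probability 0.5, of whom two happen to be observed. The full-sample mean is 2. The complete-case mean is 10/6 ≈ 1.67, while the weighted solution is 16/8 = 2, because each observed subject in the second group stands in for two subjects.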

The estimate θ^ obtained from solving the WGEE II in (20) has nice asymptotic properties, akin to GEE II, as summarized in a theorem below, with a sketch of the proof given in Appendix A.

Theorem 1. Under mild regularity conditions,

  1. θ^ is consistent.

  2. θ^ is asymptotically normal with asymptotic variance given by
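    $$
    \begin{aligned}
    \Sigma_\theta &= B^{-1} (\Sigma_U + \Phi) B^{-\top}, \quad
    B = E\left(D_i V_i^{-1} \Delta_i D_i^{\top}\right), \quad
    \Sigma_U = E\left(D_i V_i^{-1} \Delta_i S_i S_i^{\top} \Delta_i V_i^{-1} D_i^{\top}\right), \\
    \Phi &= C H^{-\top} C^{\top} - F - F^{\top}, \quad
    C = E\left[ \frac{\partial}{\partial \eta} \left(D_i V_i^{-1} \Delta_i S_i\right) \right]^{\top}, \quad
    H = E\left( \frac{\partial}{\partial \eta} Q_{ni}^{\top} \right), \\
    F &= E\left( D_i V_i^{-1} \Delta_i S_i Q_{ni}^{\top} H^{-\top} C^{\top} \right).
    \end{aligned}
    \tag{21}
    $$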

A consistent estimate of Σθ is given by

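$$
\begin{aligned}
\hat{\Sigma}_\theta &= \hat{B}^{-1} \left( \frac{1}{n} \sum_{i=1}^{n} \hat{D}_i \hat{V}_i^{-1} \hat{\Delta}_i \hat{S}_i \hat{S}_i^{\top} \hat{\Delta}_i \hat{V}_i^{-1} \hat{D}_i^{\top} \right) \hat{B}^{-\top}, \quad
\hat{B} = \frac{1}{n} \sum_{i=1}^{n} \hat{D}_i \hat{V}_i^{-1} \hat{\Delta}_i \hat{D}_i^{\top}, \\
\hat{C} &= \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial \eta} \left( \hat{D}_i \hat{V}_i^{-1} \hat{\Delta}_i \hat{S}_i \right) \right]^{\top}, \quad
\hat{F} = \left( \frac{1}{n} \sum_{i=1}^{n} \hat{D}_i \hat{V}_i^{-1} \hat{\Delta}_i \hat{S}_i \hat{Q}_{ni}^{\top} \right) \hat{H}^{-\top} \hat{C}^{\top},
\end{aligned}
\tag{22}
$$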

where A^i denotes the corresponding quantity with θ replaced by θ^. Note that the asymptotic variance in (21) contains a modifying term B−1 Φ B−⊺ to account for the sampling variability in the estimated η^. By substituting the first 2 × 2 submatrix of Δi defined in (19) in place of Δi in (20), we can apply the resulting WGEE II to provide inference about the θ for the FRM in (10).

4.3. A Score Test

As Wald-type tests are typically anticonservative (Daniels & Kass, 2001; Pan, 2001; Yu et al., 2013), score statistics are often used as an alternative to reduce bias, especially in type I error rates for small to moderate samples. We develop a score statistic based on the WGEE II, which is asymptotically equivalent to the Wald statistic but is often more accurate for small and moderate samples.

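Let $\theta=(\theta_{(1)}^{\top},\theta_{(2)}^{\top})^{\top}$, with p and q denoting the dimensions of $\theta_{(1)}$ and $\theta_{(2)}$. Consider testing the null $H_0:\theta_{(2)}=\theta_{(20)}$, with $\theta_{(20)}$ denoting a vector of known constants. Let

$$D_i=\begin{pmatrix}\frac{\partial h_i(\theta)}{\partial\theta_{(1)}}\\ \frac{\partial h_i(\theta)}{\partial\theta_{(2)}}\end{pmatrix}=\begin{pmatrix}D_{i(1)}\\ D_{i(2)}\end{pmatrix},\quad D_iV_i^{-1}\Delta_iS_i=\begin{pmatrix}D_{i(1)}V_i^{-1}\Delta_iS_i\\ D_{i(2)}V_i^{-1}\Delta_iS_i\end{pmatrix},$$
$$w_{n(1)}(\theta)=\frac{1}{n}\sum_{i=1}^{n}D_{i(1)}V_i^{-1}\Delta_iS_i,\quad w_{n(2)}(\theta)=\frac{1}{n}\sum_{i=1}^{n}D_{i(2)}V_i^{-1}\Delta_iS_i. \tag{23}$$

Denote by $\tilde{\theta}_{(1)}$ the estimate of $\theta_{(1)}$ obtained by solving the following reduced WGEE II:

$$w_{n(1)}(\theta_{(1)},\theta_{(20)})=\frac{1}{n}\sum_{i=1}^{n}D_{i(1)}V_i^{-1}\Delta_iS_i=0. \tag{24}$$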

Note that only θ(1) is unknown in the equations above.

To define the score statistic, let

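$$\tilde{\theta}=\begin{pmatrix}\tilde{\theta}_{(1)}\\ \theta_{(20)}\end{pmatrix},\quad B=E\left(D_iV_i^{-1}\Delta_iD_i^{\top}\right)=\begin{pmatrix}B_{11}&B_{12}\\ B_{12}^{\top}&B_{22}\end{pmatrix},\quad G=\left(-B_{21}B_{11}^{-1},\; I_q\right),\quad \Sigma_{(2)}=G\Sigma_\theta^{(B)}G^{\top}, \tag{25}$$

where $B_{21}=B_{12}^{\top}$, $I_q$ designates the q × q identity matrix (so that G is q × (p + q)), B11 denotes the p × p submatrix, B12 the p × q submatrix, and B22 the q × q submatrix from the partitioned (p + q) × (p + q) matrix B defined in (21), and $\Sigma_\theta^{(B)}=B\Sigma_\theta B^{\top}$, with Σθ defined in (21). Let

$$\tilde{w}_{n(2)}=w_{n(2)}(\tilde{\theta}_{(1)},\theta_{(20)}),\quad \tilde{\Sigma}_{(2)}^{-1}=\Sigma_{(2)}^{-1}(\tilde{\theta}_{(1)},\theta_{(20)}),$$

i.e., the quantities $w_{n(2)}$ and $\Sigma_{(2)}^{-1}$ with θ replaced by $\tilde{\theta}$.

Consider the score test statistic $T_s(\tilde{\theta}_{(1)},\theta_{(20)})=n\tilde{w}_{n(2)}^{\top}\tilde{\Sigma}_{(2)}^{-1}\tilde{w}_{n(2)}$. As asserted by Theorem 2, this statistic has an asymptotic (central) $\chi_q^2$ distribution with q degrees of freedom, i.e.,

$$T_s(\tilde{\theta}_{(1)},\theta_{(20)})=n\tilde{w}_{n(2)}^{\top}\tilde{\Sigma}_{(2)}^{-1}\tilde{w}_{n(2)}\rightarrow_d\chi_q^2. \tag{26}$$

Theorem 2. Under mild regularity conditions, if $H_0:\theta_{(2)}=\theta_{(20)}$ holds true, the score statistic $T_s(\tilde{\theta}_{(1)},\theta_{(20)})$ has the asymptotic distribution in (26). A sketch of the proof is given in Appendix B.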
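To make the construction of the score statistic concrete, here is a minimal numerical sketch for the special case q = 1 (a single tested parameter, as in the test of full mediation). It is not the authors' code. The inputs `B` and `Sigma_w` are assumed to be already estimated (p + 1) × (p + 1) matrices, with `Sigma_w` standing for the variance of the estimating function (the quantity B Σθ B⊤ in the text); the helper names and the numbers in the example are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def score_statistic(n, w2, B, Sigma_w):
    """Score statistic and chi-square(1) p-value for a scalar tested parameter.

    The last row/column of B and Sigma_w corresponds to the tested parameter;
    w2 is the tested component of the estimating function at the restricted fit.
    """
    p = len(B) - 1
    B11 = [row[:p] for row in B[:p]]
    B21 = B[p][:p]
    # g = (B21 B11^{-1})^T solves B11^T g = B21^T.
    B11_T = [[B11[j][i] for j in range(p)] for i in range(p)]
    g = solve(B11_T, B21)
    G = [-v for v in g] + [1.0]  # G = (-B21 B11^{-1}, 1)
    sigma2 = sum(G[i] * Sigma_w[i][j] * G[j]
                 for i in range(p + 1) for j in range(p + 1))
    T = n * w2 * w2 / sigma2
    p_value = math.erfc(math.sqrt(T / 2.0))  # upper tail of chi-square(1)
    return T, p_value


# Hypothetical inputs with p = 1 nuisance parameter.
T_example, p_example = score_statistic(
    n=100, w2=0.2,
    B=[[2.0, 1.0], [1.0, 3.0]],
    Sigma_w=[[2.0, 1.0], [1.0, 3.0]],
)
```

For q > 1 the same steps apply with an identity block in G and a matrix inverse of Σ(2); the p-value then requires the chi-square distribution with q degrees of freedom, which is not in the Python standard library.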

5. Applications

In this section, we illustrate the methodology with both simulated and real study data. Since only θ = (γ0, γzy, γxy, β0, βxz) is of primary interest for mediation analysis, we focus on the simplified FRM-based SEM in (10) for all simulated and real study data, except for the complete data case of the simulation study in which we also apply the FRM in (7) to estimate the variance parameters in addition to θ. We start with the simulation study.

5.1. Simulation Study

We carried out a series of simulation studies to examine the performance of the proposed approach with different sample sizes. Since cross-sectional data are a special case of the longitudinal setting under complete data, we performed the simulations for the latter only, in the order of no missing data, missing data under MCAR, and missing data under MAR. We also include a power analysis.

We report results for sample sizes n = 50, 100, and 500 with complete data. Since, according to theoretical results, we anticipate potential bias in the ML approach with missing data under MAR when parametric assumptions are violated, we define our large sample size in the missing data simulation as n = 2000 in place of n = 500 to evaluate how this bias behaves as the sample size increases. All simulations were performed based on a Monte Carlo sample size of 1,000, with data generated and FRM analysis performed using the R software (R Development Core Team, 2008). We assumed a continuous xi1 for the simulation study in the paper. However, the proposed approach works for both continuous and discrete xi1.

We briefly outline the results of the simulation studies with complete data using the SEM for the mediation model in (1) in this section, before a more in depth treatment of the results with missing data. The FRM in (7) showed almost no bias with complete data and normal errors. The simplified FRM in (10) performed similarly well to the existing methods (ML, MLR, WLS, and GLS) with complete data and normal errors.

We also performed the same simulations for the simplified FRM in (10) under nonnormal error terms, namely a t distribution with 3 degrees of freedom and a chi-squared distribution with 1 degree of freedom (scaled to have mean 0 and variance 1), i.e., ϵzi, ϵyi ~ F(0, 1), where F(0, 1) is either a t distribution or a rescaled chi-square with 1 degree of freedom. The FRM-based approach once again showed almost no bias and performed similarly to another robust approach, ML estimation with robust asymptotic standard errors (Satorra & Bentler, 1994, 1998), performed using the MLR option in Mplus (Muthén & Muthén, 1998–2010), under all sample sizes. Although ML estimation in theory does not provide consistent estimates for nonnormal data, it performed well for these two particular nonnormal distributions, since we failed to observe large bias in the ML point estimates. We did observe slightly smaller estimates for the asymptotic standard errors under all sample sizes using our FRM-based SEM and slightly different type I errors compared to ML, although these differences were almost trivial at n = 500. Another distribution-free approach, WLS, showed some potential convergence issues with nonnormal data, given that some of the Monte Carlo runs under the t-distributed errors with 3 degrees of freedom led to inflated estimates for the asymptotic standard errors for the estimates of γ = (γ0, γzy, γxy) at n = 100 and at n = 500 when compared to FRM, MLR, ML, and GLS.
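For readers who want to reproduce error terms of this kind, the sketch below draws from the two nonnormal distributions using only the Python standard library. It is an illustration, not the simulation code used in the paper (which was written in R). It assumes that both distributions are standardized to mean 0 and variance 1; the function names and the seed are arbitrary choices.

```python
import math
import random


def t3_unit_variance(rng):
    """Draw from a t distribution with 3 df, rescaled to variance 1.

    t_3 = Z / sqrt(chi2_3 / 3) has variance 3 / (3 - 2) = 3, so we divide by sqrt(3).
    """
    z = rng.gauss(0.0, 1.0)
    chi2_3 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / math.sqrt(chi2_3 / 3.0) / math.sqrt(3.0)


def chi2_1_standardized(rng):
    """Draw from chi-square(1), centered and scaled: mean 1 and variance 2 before scaling."""
    return (rng.gauss(0.0, 1.0) ** 2 - 1.0) / math.sqrt(2.0)


rng = random.Random(2013)  # arbitrary seed
n_draws = 50000
chi_draws = [chi2_1_standardized(rng) for _ in range(n_draws)]
t_draws = [t3_unit_variance(rng) for _ in range(n_draws)]

chi_mean = sum(chi_draws) / n_draws
chi_var = sum((v - chi_mean) ** 2 for v in chi_draws) / (n_draws - 1)
t_mean = sum(t_draws) / n_draws
```

The rescaled chi-square is strongly right-skewed and the t distribution with 3 degrees of freedom has very heavy tails (its fourth moment does not exist), so the two cases probe different departures from normality.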

5.1.1. Missing Data

We considered the SEM for the longitudinal mediation model with central t-distributed random error terms with 3 degrees of freedom:

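$$\begin{aligned}
x_{i1}&=\mu_x+g_i+\epsilon_{xi1},\\
z_{i1}&=\beta_{01}+g_i+f_i+\epsilon_{zi1},\\
z_{i2}&=\beta_{02}+\beta_{12}x_{i1}+c_i+f_i+\epsilon_{zi2},\\
y_{i2}&=\gamma_{02}+c_i+d_i+\epsilon_{yi2},\\
y_{i3}&=\gamma_{03}+\gamma_{13}x_{i1}+\gamma_{23}z_{i2}+d_i+\epsilon_{yi3},\\
&\epsilon_{xi1},\ \epsilon_{zi1},\ \epsilon_{zi2},\ \epsilon_{yi2},\ \epsilon_{yi3}\sim t_3,\\
&g_i\sim N(0,\sigma_g^2),\quad f_i\sim N(0,\sigma_f^2),\quad c_i\sim N(0,\sigma_c^2),\quad d_i\sim N(0,\sigma_d^2).
\end{aligned} \tag{27}$$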

Assuming no missing data at baseline t = 1, we simulated missing responses under MCAR and MAR with about 15 % (30 %) missing data at t = 2 (3). We also included random effects gi, fi, ci, and di in our model for the between subject effects, linking observations of measures from previous time points to current time points, a common occurrence in real world data. Since the error terms are t-distributed, the joint normal distribution assumption is not met in the presence of missing data following MAR. Also, unlike the case in modeling a single response over time as in traditional longitudinal data analysis, the MAR mechanism within the current context of mediation analysis is more complex, since missingness can occur to either or both of endogenous variables zit and yit.

Note that the random effects in (27) were used to create correlated repeated measures of both within- and between-variables. We could have used other approaches such as the copula method (Nelson, 2006). However, the random-effect approach allowed us to have better control over the relationships both within the same and across different variables.

We discussed how to model the missing data mechanism under MMDP in Section 4.2. In the simulation model in (27), the repeated measures of the three variables were linked together using random effects and thus missingness of any of the variables could be made to depend on itself or any of the other variables at the previous time points. For convenience and without loss of generality, we considered a relatively simple Markov structure for the missing value indicator rit, with the missingness of zi2 (yi3) depending only on its predecessor zi1 (yi2), with the one-step transition probability pit in (15) taking the following form:

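$$\begin{aligned}
\operatorname{logit}(p_{i2})&=\operatorname{logit}\left(\Pr(r_{i2}=1\mid r_{i1}=1,H_{i2})\right)=\eta_{02}+\eta_{z2}z_{i1},\\
\operatorname{logit}(p_{i3})&=\operatorname{logit}\left(\Pr(r_{i3}=1\mid r_{i2}=1,H_{i3})\right)=\eta_{03}+\eta_{y3}y_{i2}.
\end{aligned} \tag{28}$$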

We set ηz2 = ηy3 = 0.4 (ηz2 = ηy3 = 0) for MAR (MCAR), and then selected values of η0t to yield about 15 % and 30 % missing data at t = 2 and 3 under MAR (MCAR). We considered the simplified FRM-based SEM in (10), but simulated (xit, zit, yit) over all three assessment times to enable us to model the missing data.
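One way to select an intercept that yields a target missingness rate is a simple one-dimensional root search, sketched below. This is an assumption about how such a calibration could be done, not a description of the authors' procedure. The covariate values are simulated standard normal draws for illustration only, and `calibrate_intercept` is a hypothetical helper. The example targets about 85 % observed (15 % missing) at t = 2 with slope 0.4 (MAR) and slope 0 (MCAR).

```python
import math
import random


def expit(u):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-u))


def mean_observed(eta0, slope, z):
    """Average probability of being observed under logit(p) = eta0 + slope * z."""
    return sum(expit(eta0 + slope * zi) for zi in z) / len(z)


def calibrate_intercept(z, slope, target_observed, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection on eta0; mean_observed is increasing in eta0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_observed(mid, slope, z) < target_observed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)  # arbitrary seed
z1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in for z_{i1}

eta02_mar = calibrate_intercept(z1, slope=0.4, target_observed=0.85)
eta02_mcar = calibrate_intercept(z1, slope=0.0, target_observed=0.85)
```

Under MCAR the slope is zero, so the calibrated intercept is simply logit(0.85). For t = 3 the same search would be applied to the conditional probability of remaining observed, with a target chosen so that the cumulative missingness is about 30 %.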

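We fixed $\sigma_g^2$, $\sigma_f^2$, $\sigma_c^2$, and $\sigma_d^2$ so that, approximately:

$$\operatorname{Corr}(x_{i1},z_{i1})=0.30,\quad \operatorname{Corr}(z_{i1},z_{i2})=0.25,\quad \operatorname{Corr}(z_{i2},y_{i2})=0.20,\quad \operatorname{Corr}(y_{i2},y_{i3})=0.25.$$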

Therefore, in this simulation we chose a scenario using random effects to link repeated measures and a special case of our missing data model to illustrate the type of bias we might observe in practice with a moderate amount of missing data.

For the simulation study, we set θ = (γ0, γzy, γxy, β0, βxz) = (1, 1, 0, 1, 1). Since γxy = 0, we were able to examine the performance of the FRM when used to test the null hypothesis of full mediation. Given the correlations and distribution assumptions in (27), the outcomes, (xi1, zi2, yi3), were readily simulated from (27). The data generated was fit by the FRM-based SEM in (7).

To examine type I error rates, we considered the null of full mediation, H0: γxy = 0, for the parameter of primary interest, and computed the Wald statistic, $Q_n=n\hat{\sigma}_{\gamma_{xy}}^{-2}\hat{\gamma}_{xy}^{2}$, where $\hat{\sigma}_{\gamma_{xy}}^{2}$ denotes the element of the estimated asymptotic variance $\widehat{\Sigma}_\theta$ corresponding to $\hat{\gamma}_{xy}$. Let $Q_n^{(k)}$ denote the value of this statistic from the kth MC simulation (1 ≤ k ≤ 1000). The type I error rate for testing H0 was estimated by $\hat{\alpha}=\frac{1}{1000}\sum_{k=1}^{1000}I\{Q_n^{(k)}\ge q_{1,0.95}\}$, with $q_{1,0.95}$ denoting the 95th percentile of the central $\chi_1^2$ distribution with one degree of freedom.

Since the Wald statistic is often anticonservative (Yu et al., 2013), we also applied the score test statistic in Section 4.3 to see how this alternative would improve on the Wald test within the current setting. For testing the null H0 : γxy = 0, we partitioned θ as $\theta=(\theta_{(1)}^{\top},\theta_{(2)}^{\top})^{\top}$, with θ(1) = (γzy, βxz) and θ(2) = γxy. Under H0 : θ(2) = 0, the score statistic $T_s(\tilde{\theta}_{(1)},0)$ in (26) has an asymptotic $\chi_1^2$ distribution. The type I error rate for the score test was calculated in the same way as for the Wald test, by $\hat{\alpha}=\frac{1}{1000}\sum_{k=1}^{1000}I\{T_s^{(k)}\ge q_{1,0.95}\}$, where $T_s^{(k)}$ denotes the value of the statistic $T_s(\tilde{\theta}_{(1)},0)$ from the kth MC simulation (1 ≤ k ≤ 1000).
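The empirical type I error rate described above is simply the fraction of Monte Carlo statistics that exceed the chi-square critical value. The following minimal sketch shows the computation for any list of test statistics (Wald or score). The reference statistics in the example are squared standard normal draws, used here only as a stand-in for statistics generated under the null; they are not results from the paper.

```python
import random

# 95th percentile of the central chi-square distribution with 1 degree of freedom.
Q_1_095 = 3.841458820694124


def rejection_rate(stats, crit=Q_1_095):
    """Fraction of statistics at or above the critical value."""
    return sum(1 for s in stats if s >= crit) / len(stats)


# Stand-in null statistics: a squared N(0, 1) draw is exactly chi-square(1).
rng = random.Random(2013)  # arbitrary seed
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]
alpha_hat = rejection_rate(null_stats)
```

With only 1,000 replications, as in the simulation study, the Monte Carlo standard error of an estimated rate near 0.05 is about sqrt(0.05 × 0.95 / 1000) ≈ 0.007, which is worth keeping in mind when comparing estimated rates.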

The estimate of θ was obtained from the WGEE II. Both GLS and WLS perform listwise deletion and as a result do not provide valid inference in the presence of missing data. Therefore, it was unnecessary to include these estimates in the simulation studies. Since results under MCAR are quite similar to those under complete data for FRM and ML, we only report results for the MAR case. The full information maximum likelihood (FIML) approach provided by Mplus handled missing data under ML (Muthén & Muthén, 1998–2010).

Shown in Table 1 are the percent bias of parameter estimates, estimates of the average asymptotic standard errors over 1,000 MC replications and type I error rates for the SEM model in (27) under FRM and ML, with missingness modeled as in (28) for FRM. For comparison purposes, we also included the relative difference (RD) between the average asymptotic and “empirical” standard error estimates defined as

$$\mathrm{RD}=\frac{\left|\text{asymptotic}-\text{empirical}\right|}{\left(\text{asymptotic}+\text{empirical}\right)/2},$$

where |·| is the absolute value function.
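A small sketch of the RD computation follows. It assumes, as is common in simulation studies but not stated explicitly here, that the "empirical" standard error is the sample standard deviation of the parameter estimates across Monte Carlo replications. The estimates and the average asymptotic standard error in the example are made-up numbers for illustration.

```python
import statistics


def relative_difference(asymptotic, empirical):
    """RD = |asymptotic - empirical| / ((asymptotic + empirical) / 2)."""
    return abs(asymptotic - empirical) / ((asymptotic + empirical) / 2.0)


# Hypothetical Monte Carlo estimates of one parameter and their average
# asymptotic standard error.
mc_estimates = [0.95, 1.10, 1.02, 0.88, 1.05, 0.97, 1.08, 0.93]
empirical_se = statistics.stdev(mc_estimates)
average_asymptotic_se = 0.08

rd_example = relative_difference(average_asymptotic_se, empirical_se)
```

An RD near zero indicates that the asymptotic standard error tracks the actual sampling variability of the estimator; the measure is symmetric in its two arguments.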

Table 1.

Comparison of percent bias of parameter estimates, average asymptotic standard error, relative difference (RD) between average asymptotic and empirical standard error, and type I error rates, with missing data under MAR from 1000 MC simulations. Missing data (15 %/30 % at time 2/3) with correlated repeated measures, t-distributed error terms.

| Sample size | θ | Bias (%), FRM | Bias (%), ML | Asymptotic SE, FRM | Asymptotic SE, ML | RD, FRM | RD, ML |
|---|---|---|---|---|---|---|---|
| 50 | γ0 | 9.9 | 60.2 | 0.679 | 0.691 | 0.248 | 0.107 |
| 50 | γzy | 1.7 | 4.9 | 0.181 | 0.187 | 0.264 | 0.082 |
| 50 | γxy | −3.7 | −6.7 | 0.347 | 0.357 | 0.204 | 0.062 |
| 50 | β0 | 0.8 | 21.4 | 0.549 | 0.555 | 0.070 | 0.028 |
| 50 | βxz | 1.4 | 3.3 | 0.234 | 0.246 | 0.175 | 0.086 |
| 100 | γ0 | 5.4 | 61.9 | 0.512 | 0.495 | 0.227 | 0.078 |
| 100 | γzy | 0.9 | 5.1 | 0.139 | 0.133 | 0.189 | 0.044 |
| 100 | γxy | −0.1 | −5.5 | 0.259 | 0.247 | 0.189 | 0.004 |
| 100 | β0 | 1.0 | 22.1 | 0.400 | 0.397 | 0.046 | 0.030 |
| 100 | βxz | 0.2 | 2.7 | 0.168 | 0.171 | 0.041 | 0.036 |
| 2000 | γ0 | 0.4 | 63.3 | 0.138 | 0.111 | 0.148 | 0.053 |
| 2000 | γzy | 0.1 | 4.8 | 0.040 | 0.029 | 0.072 | 0.034 |
| 2000 | γxy | −0.4 | −5.1 | 0.074 | 0.054 | 0.115 | 0.000 |
| 2000 | β0 | −0.3 | 22.2 | 0.093 | 0.090 | 0.052 | 0.033 |
| 2000 | βxz | 0.2 | 2.9 | 0.040 | 0.038 | 0.000 | 0.027 |

Type I error rate α for H0: γxy = 0

| Sample size | Wald, FRM | Wald, ML | Score, FRM |
|---|---|---|---|
| 50 | 0.134 | 0.079 | 0.057 |
| 100 | 0.107 | 0.057 | 0.064 |
| 2000 | 0.054 | 0.159 | 0.047 |

The percent bias of parameter estimates for FRM was relatively small, and approached zero as the sample size increased. However, the estimates for ML showed a heavy bias, which stayed at about the same magnitude across the different samples. The Score type I error for FRM was close to or below 0.05 at all times, while the Wald type I error was inflated for n = 50, 100, but decreased to 0.05 at the large sample size of n = 2000. In contrast, the type I error for ML remained high and even increased to 0.159 at n = 2000, confirming that ML does not provide consistent estimates.

Figure 2 visually depicts the bias in ML compared to FRM. As the sample size increases, the FRM curve comes even closer to a horizontal line in the plot of the bias of parameter estimates (the difference between the mean parameter estimates and the population parameter values) for all the estimates of θ = (γ0, γxy, γzy, β0, βxz), while ML continues to show bias with a curvilinear pattern, under- and overestimating the different parameters. We see that the widths of the confidence intervals for the estimates, whether by FRM or ML, all decrease with sample size, with the difference that the ML estimates are off target. Additionally, we performed the same missing data simulation using MLR in place of ML. Since the percent bias of the parameter estimates is the same for the two methods, both are problematic. Similar to the case of complete data, for n = 50, 100, the asymptotic standard errors were slightly smaller for MLR. However, the type I error was higher for MLR at n = 50 (type I error = 0.092) and at n = 100 (type I error = 0.69). Again, however, at the large sample size, in this case n = 2000, there was no distinct difference between MLR (type I error = 0.162) and ML.

Figure 2.

Simulation results: mean estimates minus population values (± standard errors) show the bias in ML, while FRM performs well with missing data.

5.1.2. Power Analysis

Shown in Table 2 are the power estimates under sample sizes n = 50, 100, and 500 for the Wald test for the different methods generated under full mediation, H0 : γxy = 0 vs. Ha : γxy = 0.5, considered for the SEM model in (1), with complete data and normal errors as well as missing data following MAR modeled by (27) under correlated repeated measures and t-distributed errors. Under complete data, the Score test showed that the proposed approach had slightly lower power, but still performed well. Note that both the Wald test for FRM and WLS showed higher power than ML for n = 50, 100, but these were likely the result of bias in the estimates due to small to moderate sample sizes, since ML was (asymptotically) the most powerful in the simulation setting with normal errors and complete data.

Table 2.

Comparison of type II error rates through power analysis from 1000 MC simulations. Power analysis under H0: γxy = 0, HA: γxy = 0.5.

Normal error under complete data

| Sample size | FRM (Wald) | FRM (Score) | ML | GLS | WLS |
|---|---|---|---|---|---|
| 50 | 0.735 | 0.647 | 0.706 | 0.697 | 0.735 |
| 100 | 0.924 | 0.899 | 0.920 | 0.920 | 0.924 |
| 500 | >0.999 | >0.999 | >0.999 | >0.999 | >0.999 |

Missing data (15 %/30 % at time 2/3) with correlated repeated measures, t-distributed error terms

| Sample size | FRM (Wald) | FRM (Score) | ML |
|---|---|---|---|
| 50 | 0.350 | 0.204 | 0.264 |
| 100 | 0.511 | 0.363 | 0.458 |
| 500 | 0.938 | 0.915 | 0.985 |

Since we already established a bias in ML with missing data under MAR with nonnormal error terms, we wanted to see how FRM would perform under missing data in terms of detecting mediation effects. Since only FRM yielded valid inference in this case, the results in the table for the large sample size n = 500 indicated that ML had an upward bias. We have noted during the simulation study with missing data for the SEM model in (27) that the bias in ML is increasing with sample size, so this increase in power may lead a researcher to believe even more strongly in biased results.

As in the complete data case, the Wald test for FRM again showed some upward bias, especially for the small and moderate sample sizes n = 50, 100. Note that we did not include power estimates for GLS and WLS since these methods perform listwise deletion, which not only yielded inefficient estimates under missing data, but also invalid inference under MAR.

5.2. Real Study Data

The real study data is shown to complement the simulation results by showing how the methods may lead us to different estimates and conclusions when presented with nonnormal, skewed complete data and missing data under MAR.

5.2.1. Child Resilience Example with Complete Data

To illustrate the approach with real study data, we applied FRM to a longitudinal study known as the Child Resilience Project (Wyman, Cross, Brown, Yu, & Tu, 2010). This project is ongoing, with 401 students from first up to third grade in five Rochester City School District elementary schools. Enrollment began in Fall 2006, with data collection for the final cohort to be completed by June 2011. The study examines whether children at higher risk of developing behavioral problems who are assigned a mentor improve socially, compared to control and lower-risk children, over periods of 6 and 18 months.

In this mediation analysis, both the mediator and response were focused on helping children to manage challenging emotions—emotion self-regulation, self-reported verbal, declarative knowledge of the skills the child is learning in the Resilience Project at 6 months. We examined what role a potential mediator plays in a cause and effect relationship between the treatment and the child’s self-initiated demonstration of skills he/she is learning at 6 months. Thus, we have longitudinal data with two assessment times, baseline and 6 months, with the mediator hypothesized to occur before the outcome. There were 253 subjects with these three measures forming a complete dataset. Shown in Table 3 is a sample of our longitudinal data set for the three variables of interest.

Table 3.

Child Resilience Study complete dataset sample.

| ID | Treatment at baseline | Knowledge of skills at 6 months | Demonstration of skills at 6 months |
|---|---|---|---|
| 1 | No | 9 | 0 |
| 2 | Yes | 8 | 4 |
| 3 | No | 0 | 0 |
| 4 | No | 14 | 3 |
| 5 | Yes | 14 | 1 |
| 6 | Yes | 0 | 0 |
| 7 | No | 9 | 1 |
| 8 | No | 4 | 1 |
| 9 | Yes | 10 | 2 |
| … | … | … | … |

The treatment is a binary indicator as children either had a mentor or no mentor. In the hypothesis of interest, the treatment would be expected to predict a higher demonstration of skills, which would indicate that the children receiving a mentor improved their social skills over time. The distributions of both the verbal, declarative knowledge of skills, and demonstration of skills were skewed as shown in Figure 3.

Figure 3.

Histogram of the mediator and outcome for the Child Resilience Study.

By identifying the treatment condition as xi1, the verbal, declarative knowledge of the skills at 6 months as zi2, and the demonstration of skills at 6 months as yi3, we modeled the mediation process of interest using the SEM in (1).

Shown in Figure 4 are the path diagrams for the direct linear model and full mediation model. Since the path for the linear regression was significant, we looked to the mediation model to see how the mediator would affect the relationship between the treatment and demonstration of skills.

Figure 4.

Path diagram for the mediation model for the Child Resilience Study.

Shown in Table 4 are the estimates of θ, standard errors, and type I errors for this mediation model obtained from FRM and ML. Since the results for ML and GLS were practically the same, we only included the ML estimates in the table. WLS was unable to yield estimates for these highly skewed data, showing the same potential convergence problems as for some of the Monte Carlo runs with t-distributed error terms with 3 degrees of freedom in our simulation studies. From the table, we see that the estimates for FRM and ML were practically the same. This is not surprising, since, as noted in our simulation study, ML would still produce consistent estimates in the presence of nonnormal error under complete data. However, the standard error was substantially lower for the FRM estimate of γ0, and the standard errors were also lower for the FRM estimates of β = (β0, βxz). Although the FRM and ML estimates of γxy, the parameter of primary interest to the mediation hypothesis, were identical to the three decimal places shown in the table, the slightly higher standard error and type I error of FRM may be the more accurate. Since we hypothesized (both conceptually as well as based on other related findings) that a demonstration of skills was positively associated with treatment in this model, the estimates of γxy confirmed this relationship.

Table 4.

Parameter estimates, standard errors, and type I error rates for the mediation model for the Child Resilience Study with complete data (sample size = 253).

| θ | Estimate, FRM | Estimate, ML | Asymptotic SE, FRM | Asymptotic SE, ML |
|---|---|---|---|---|
| γ0 | 0.392 | 0.392 | 0.127 | 0.202 |
| γzy | 0.117 | 0.117 | 0.032 | 0.029 |
| γxy | 0.873 | 0.873 | 0.306 | 0.279 |
| β0 | 3.429 | 3.429 | 0.369 | 0.374 |
| βxz | 4.390 | 4.390 | 0.527 | 0.529 |

Type I α for H0: γxy = 0

| Test | FRM | ML |
|---|---|---|
| Wald | 0.004 | 0.002 |
| Score | 0.005 | |

The relationship was significant in the linear model, and the magnitude of the relationship has decreased, while the p-value has increased in the mediation model. Therefore, verbal, declarative knowledge of skills at 6 months may be a partial mediator. It is important to note that all parameters θ = (γ0, γxy, γzy, β0, βxz) were highly significant for FRM by both Wald and Score type I errors (p ≤ 0.005). However, for ML the γ0 parameter was not significant (p = 0.053).
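The Wald p-values quoted for the complete-data analysis can be checked approximately from the reported estimates and standard errors. The sketch below uses the values of γxy and γ0 from Table 4 and the fact that a squared z-ratio has an asymptotic chi-square distribution with one degree of freedom under the null. Small discrepancies from the published p-values are expected because the table entries are rounded to three decimals; the function name is ours.

```python
import math


def wald_p_value(estimate, se):
    """Two-sided p-value for H0: parameter = 0, i.e. P(chi-square(1) >= (estimate/se)^2)."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# Estimates and asymptotic standard errors as reported in Table 4.
p_gamma_xy_frm = wald_p_value(0.873, 0.306)
p_gamma_xy_ml = wald_p_value(0.873, 0.279)
p_gamma_0_ml = wald_p_value(0.392, 0.202)
```

The first two values round to the Wald entries in the table (0.004 for FRM and 0.002 for ML), and the third is close to the p = 0.053 reported for γ0 under ML.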

5.2.2. Child Resilience Example with Missing Data

As mentioned in Section 5.2.1, there were 401 students in the child resilience project. We had full information on whether each child received the treatment at baseline. However, there were more missing observations for demonstration of skills at 18 months, as now only 164 students were observed. A longitudinal mediation analysis under missing data, using the same three variables as in Section 5.2.1, over 18 months will help us understand if the treatment is effective over a longer time period, accounting for all 401 children in the study.

When modeling for the missing data, we had 253 observations for the mediator, declarative knowledge of the skills at 6 months, as in the complete data example. Shown in Table 5 is a sample of our longitudinal data set for the three variables of interest.

Table 5.

Child Resilience Study dataset sample with missing data.

| ID | Treatment at baseline | Knowledge of skills at 6 months | Demonstration of skills at 18 months |
|---|---|---|---|
| 1 | No | 9 | 3 |
| 2 | Yes | 8 | 4 |
| 3 | No | 0 | NA |
| 4 | No | 14 | 0 |
| 5 | Yes | NA | NA |
| 6 | Yes | 14 | 0 |
| 7 | Yes | NA | NA |
| 8 | Yes | 0 | NA |
| 9 | No | 9 | 2 |
| … | … | … | … |

We expanded our mediation analysis to an 18-month time period with missing data by identifying the treatment condition as xi1, the verbal, declarative knowledge of the skills at 6 months as zi2, and the demonstration of skills at 18 months as yi3, and modeled the mediation process of interest using the SEM in (1), as shown in Figure 5.

Figure 5.

Path diagram for the mediation model for the Child Resilience Study with MAR Data.

We had a high percentage of (37 %/59 %) missing data at t = 2(3) and modeled our missing data using the following logistic model:

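$$\operatorname{logit}(p_{i2})=\eta_{02}+\eta_{x1}x_{i1},\quad \operatorname{logit}(p_{i3})=\eta_{03}+\eta_{z2}z_{i2}, \tag{29}$$
$$p_{it}=\Pr(r_{it}=1\mid r_{i(t-1)}=1,H_{it}). \tag{30}$$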

This is a special case of our missing data model from Section 4.2 in which we are only observing xi1, zi2, and yi3 and building our missing data models with only observed data at the previous time point from xi1 and zi2. We estimated the parameters in R with the glm function. Since we modeled our missing data at t = 2 based on the treatment information at baseline, we used all 401 observations.

Shown in Table 6 are the estimates of η = (η02, ηx1, η03, ηz2) for the missing data model in (29). The p-value for ηz2 was significant, indicating a MAR mechanism for the missing data at time 3. Since the p-value for ηx1 was not significant, the missing data at time 2 were consistent with MCAR, and we would expect no bias for the estimates of β = (β0, βxz) in ML according to our simulation results in Section 5.1.1. However, based on our simulation results in Table 1, we expect to see a bias for the estimates of the γ = (γ0, γxy, γzy) parameters in ML. Conversely, had we found a MAR mechanism for the missing data at time 2 and a MCAR mechanism for the missing data at time 3, we would expect bias for the estimates of β = (β0, βxz) at time 2 but not for γ = (γ0, γxy, γzy) at time 3 in ML. In the hypothesis of interest, the treatment would be expected to predict a higher demonstration of skills at 18 months, which would indicate that the children receiving a mentor improved their social skills over time.

Table 6.

Parameter estimates, standard errors, and p-values for the missing data model for the Child Resilience Study (sample size = 401).

| η | Estimate | Asymptotic standard error | p-value |
|---|---|---|---|
| η02 | 0.546 | 0.147 | <0.001 |
| ηx1 | −0.019 | 0.207 | 0.926 |
| η03 | 0.250 | 0.201 | 0.214 |
| ηz2 | 0.067 | 0.029 | 0.022 |
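To see what the fitted missingness model implies, the estimates in Table 6 can be plugged into the logistic model (29). The sketch below does this for a hypothetical child; the covariate values chosen (control condition, knowledge score of 9) are illustrative. It also assumes that the marginal probability of being observed at time 3 is the product of the two transition probabilities, which holds under monotone missingness but is our reading rather than something restated in this section.

```python
import math

# Estimates reported in Table 6.
ETA_02, ETA_X1, ETA_03, ETA_Z2 = 0.546, -0.019, 0.250, 0.067


def expit(u):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-u))


def observation_probabilities(x1, z2):
    """Return (p2, p3, pi2, pi3) for treatment indicator x1 and mediator value z2."""
    p2 = expit(ETA_02 + ETA_X1 * x1)
    p3 = expit(ETA_03 + ETA_Z2 * z2)
    pi2 = p2           # probability of being observed at time 2
    pi3 = p2 * p3      # assumed: observed at time 3 requires being observed at time 2
    return p2, p3, pi2, pi3


p2_control, p3_control, pi2_control, pi3_control = observation_probabilities(x1=0, z2=9)
weight_time3 = 1.0 / pi3_control  # inverse probability weight if observed at time 3
```

For a control child the fitted probability of being observed at time 2 is about 0.63, in line with the roughly 37 % missing data reported at that time point.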

Shown in Table 7 are the estimates of θ and associated standard errors and type I errors for this mediation model obtained from FRM and ML. From the table, we see that the estimates for FRM and ML were practically the same for the β = (β0, βxz) parameters, but different for γ = (γ0, γxy, γzy). For the three significant estimates with the same or similar results for FRM and ML, (γ0, β0, βxz), FRM had a smaller standard error. We saw from the simulation for longitudinal missing data in Table 1 that ML would produce an estimate of γxy biased toward a smaller magnitude than the true value. This appeared true again, as the FRM estimate was higher in magnitude, confirming that the treatment predicted a higher demonstration of skills at 18 months. The parameter γzy was not significant for either FRM or ML in this model (p > 0.421 for the Wald test in both FRM and ML), implying a nonsignificant indirect effect in this mediation analysis.

Table 7.

Parameter estimates, standard errors, and type I error rates for the mediation model for the Child Resilience Study with missing data (37 %/59 %; sample size = 401).

| θ | Estimate, FRM | Estimate, ML | Asymptotic SE, FRM | Asymptotic SE, ML |
|---|---|---|---|---|
| γ0 | 1.812 | 1.810 | 0.278 | 0.352 |
| γzy | −0.042 | −0.039 | 0.053 | 0.050 |
| γxy | 2.330 | 2.283 | 0.503 | 0.480 |
| β0 | 3.429 | 3.429 | 0.370 | 0.374 |
| βxz | 4.390 | 4.390 | 0.528 | 0.529 |

Type I α for H0: γxy = 0

| Test | FRM | ML |
|---|---|---|
| Wald | <0.001 | <0.001 |
| Score | <0.001 | |

6. Discussion

Mediation analysis is a critical component of many studies in biomedical, psychosocial, and related services research to investigate the causal mechanism of interventions. The traditional regression paradigm is ill-suited for modeling such multidimensional causal relationships, because of the complex relationships among different variables and many different roles a variable may play. Although the structural equation model (SEM) provides an ideal conceptual framework for the dynamic role a mediator plays in channeling the effect of a causal agent to the outcome of interest, the various restrictions imposed by the available inference methods have hindered its wide use, and limited the utility of the information provided in real studies. Namely, in real studies, commonly we come across missing data, where making the MAR assumption is reasonable and parametric assumptions may be violated.

Great progress has been made in the analysis of causal relationships, particularly in the area of distribution-free models for longitudinal data analysis, over the past few decades. However, little of this progress applies directly to SEM and its application to modeling mediation relationships within a longitudinal setting. One major block is the nonlinear nature of SEM in its parameters, which in general requires second- and even higher-order moments to identify model parameters. As only the first-order moment is modeled in the standard regression paradigm, addressing the limitations of SEM requires a paradigm shift by breaking the tradition in current regression analysis and creating a new framework for modeling nonlinear relationships and higher-order moments.

By taking advantage of the functional response models (FRM), we have developed a new, robust approach to systematically address the limitations of SEM as it applies to mediation analysis. This class of FRM-based SEM requires no parametric models for the data distribution and provides valid inference for longitudinal mediation hypotheses under the two most popular missing data mechanisms, missing completely at random (MCAR) and missing at random (MAR).

The results from the simulation studies indicate that the new approach performed well under both complete and missing data for small and moderate sample sizes. Under complete data, the FRM-based SEM is nearly on par with the parametric ML in efficiency. Thus, the loss of efficiency seems to be negligible for practical purposes. On the other hand, the FRM-based mediation model shines when the missing data follow MAR with nonnormal errors, since it is the only model to provide robust estimates and valid inference.

The validity of WGEE II relies on a correct model for the missingness. If this model is misspecified, WGEE II estimates may be biased. Work is currently underway to investigate the possibility of extending the concept of doubly robust estimation in traditional distribution-free regression models to the current context of the FRM-based mediation model (Browne, 1974; Lu et al., 2009; Satorra, 2002). Extensions of the approach to noncontinuous outcomes, such as binary outcomes, are also in progress.

Acknowledgements

This research was supported in part by NIH grant R33 DA027521, NIH/NCRR CTSA grant KL2TR000440 and NIH grant R01 GM108337-01. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. We also thank Dr. Peter Wyman for his generous help with the use of data for the Child Resilience Study and interpretation of model estimates, and Dr. Changyong Feng and Dr. Yinglin Xia for their help with the presentation of the materials.

Appendix A. Proof of Theorem 1

Let $G_i=D_iV_i^{-1}$ and πi = (πi1, … , πim 1). Then $w_n=\frac{1}{n}\sum_{i=1}^{n}G_i\Delta_iS_i$, with Gi Δi Si = Gi(xi, θ) Δi(ri, πi, γ)Si(yi, xi, θ). It follows from the iterated conditional expectation that E(Gi Δi Si) = E[Gi SiE(Δi | ri, yi, xi)]. Since $E\left(\frac{r_{it}}{\pi_{it}}\mid r_i,y_i,x_i\right)=1$, it follows that E(Gi Δi Si) = E(Gi Si) = 0. Thus, the WGEE II is unbiased and the estimate $\hat{\theta}$ obtained as the solution to the equations is consistent.
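The role of the inverse probability weights in this unbiasedness argument can be illustrated with a small simulation for the simplest possible estimating equation, that of a mean. The sketch is not taken from the paper: the data-generating model, the selection model, and the seed are all hypothetical choices, and the true observation probabilities are used in place of estimated ones. The outcome y is observed with a probability that depends on an always-observed covariate z, so the complete-case mean is biased, while the weighted equation Σ (r_i/π_i)(y_i − μ) = 0 recovers the true mean of 0.

```python
import math
import random


def expit(u):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-u))


def simulate_means(n, seed):
    """Return (IPW estimate, complete-case estimate) of E[y] = 0 under MAR."""
    rng = random.Random(seed)
    weighted_sum = weight_total = 0.0
    cc_sum = 0.0
    cc_n = 0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # always observed
        y = z + rng.gauss(0.0, 1.0)    # outcome, possibly missing
        pi = expit(0.5 + z)            # true probability that y is observed
        if rng.random() < pi:          # r_i = 1
            weighted_sum += y / pi
            weight_total += 1.0 / pi
            cc_sum += y
            cc_n += 1
    # Solution of sum (r_i / pi_i) (y_i - mu) = 0.
    ipw_mean = weighted_sum / weight_total
    complete_case_mean = cc_sum / cc_n
    return ipw_mean, complete_case_mean


ipw_mean, complete_case_mean = simulate_means(n=20000, seed=1)
```

Because subjects with larger z are more likely to be observed and also tend to have larger y, the complete-case mean is pulled upward, whereas weighting each observed subject by 1/π restores the balance.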

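Let $\hat{\gamma}$ be the solution to (17), i.e., the estimate of the missing-data parameter (written $\hat{\eta}$ in Section 4.2). By a Taylor expansion of the estimating equations in (17) and solving for $\hat{\gamma}-\gamma$, we obtain

$$\sqrt{n}(\hat{\gamma}-\gamma)=-H^{-1}\frac{\sqrt{n}}{n}\sum_{i=1}^{n}Q_{ni}+o_p(1), \tag{A.1}$$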

where op(1) denotes the stochastic o(1) (Kowalski & Tu, 2007). Also, by applying a Taylor series expansion to the WGEE II in (20), we have

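$$\sqrt{n}\,w_n=-\left(\frac{\partial}{\partial\theta}w_n\right)^{\top}\sqrt{n}(\hat{\theta}-\theta)-\left(\frac{\partial}{\partial\gamma}w_n\right)^{\top}\sqrt{n}(\hat{\gamma}-\gamma)+o_p(1),$$

so that

$$\sqrt{n}(\hat{\theta}-\theta)=-\left(\frac{\partial}{\partial\theta}w_n\right)^{-\top}\sqrt{n}\left[w_n+C(\hat{\gamma}-\gamma)\right]+o_p(1).$$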

It follows from (A.1) and (A.2) that

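$$\sqrt{n}(\hat{\theta}-\theta)=-\left(\frac{\partial}{\partial\theta}w_n\right)^{-\top}\frac{\sqrt{n}}{n}\sum_{i=1}^{n}\left(w_{ni}-CH^{-1}Q_{ni}\right)+o_p(1). \tag{A.2}$$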

Since

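$$\frac{\partial}{\partial\theta}w_n=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial}{\partial\theta}\Delta_iS_i\right)G_i^{\top}+o_p(1)=-\frac{1}{n}\sum_{i=1}^{n}D_i\Delta_iG_i^{\top}+o_p(1)\rightarrow_p -B, \tag{A.3}$$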

where →p denotes convergence in probability, it follows from (A.2) and (A.3) that

$$\sqrt{n}(\hat{\theta}-\theta)=B^{-\top}\frac{\sqrt{n}}{n}\sum_{i=1}^{n}\left(w_{ni}-CH^{-1}Q_{ni}\right)+o_p(1). \tag{A.4}$$

By applying the central limit theorem and Slutsky’s theorem (Kowalski & Tu, 2007) to (A.4), $\hat{\theta}$ is asymptotically normal with the asymptotic variance given by Σθ in (21).

Appendix B. Proof of Theorem 2

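First, assume no missing data. Then $B=E\left(D_iV_i^{-1}D_i^{\top}\right)$. By applying the law of large numbers,

$$\frac{\partial}{\partial\theta}w_n(\theta)=\begin{pmatrix}\frac{\partial}{\partial\theta_{(1)}}w_{n(1)}(\theta)&\frac{\partial}{\partial\theta_{(1)}}w_{n(2)}(\theta)\\ \frac{\partial}{\partial\theta_{(2)}}w_{n(1)}(\theta)&\frac{\partial}{\partial\theta_{(2)}}w_{n(2)}(\theta)\end{pmatrix}\rightarrow_p -B=-E\left(D_iV_i^{-1}D_i^{\top}\right)=-\begin{pmatrix}B_{11}&B_{12}\\ B_{12}^{\top}&B_{22}\end{pmatrix}. \tag{B.1}$$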

It follows from a Taylor’s series expansion and (B.1) that

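$$0=w_{n(1)}(\tilde{\theta}_{(1)},\theta_{(20)})=w_{n(1)}(\theta)-B_{11}(\tilde{\theta}_{(1)}-\theta_{(1)})+o_p(n^{-1/2}).$$

Thus,

$$\tilde{\theta}_{(1)}-\theta_{(1)}=B_{11}^{-1}w_{n(1)}(\theta)+o_p(n^{-1/2}). \tag{B.2}$$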

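Similarly, since $B_{12}^{\top} = B_{21}$, we have

$$w_n^{(2)}(\tilde\theta_{(1)}, \theta_{(20)}) = w_n^{(2)}(\theta) - \left(-\frac{\partial}{\partial\theta_{(1)}}w_n^{(2)}\right)^{\top}(\tilde\theta_{(1)} - \theta_{(1)}) + o_p(n^{-1/2}) = w_n^{(2)}(\theta) - B_{21}(\tilde\theta_{(1)} - \theta_{(1)}) + o_p(n^{-1/2}). \tag{B.3}$$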

It follows from (B.2) and (B.3) that

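$$w_n^{(2)}(\tilde\theta_{(1)}, \theta_{(20)}) = w_n^{(2)}(\theta) - B_{21}\left[B_{11}^{-1}w_n^{(1)}(\theta) + o_p(n^{-1/2})\right] + o_p(n^{-1/2}) = G\,w_n(\theta) + o_p(n^{-1/2}).$$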

By the central limit theorem,

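$$\sqrt{n}\,w_n^{(2)}(\tilde\theta_{(1)}, \theta_{(20)}) = \sqrt{n}\,G\,w_n(\theta) + o_p(1) \to_d N\left(0, \Sigma_{(2)} = G\Sigma_\theta G^{\top}\right), \tag{B.4}$$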

where $G$ is defined in (25) and $\Sigma_\theta$ in (21). In the presence of missing data, $B = E(D_i V_i^{-1} \Delta_i D_i^{\top})$ as defined in (21). By a similar argument, $w_n^{(2)}(\tilde\theta_{(1)}, \theta_{(20)})$ has an asymptotic normal distribution, which implies that the score statistic $T_s(\tilde\theta_{(1)}, \theta_{(20)})$ has the asymptotic $\chi^2_q$ distribution.
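To make the argument concrete, the following sketch computes a score statistic of this type for a linear mean model with complete data and an identity working variance. It is an illustration under stated assumptions, not the paper's procedure: the quadratic form $n\,w_n^{(2)\top}\Sigma_{(2)}^{-1}w_n^{(2)}$, the choice $G = (-B_{21}B_{11}^{-1}, I_q)$ read off from (B.2) and (B.3), and the use of the empirical second moment of the estimating function as the middle factor are all assumptions, and the names `simulate` and `score_statistic` are hypothetical.

```python
import numpy as np

def simulate(n, beta2, seed):
    """Hypothetical data: y = 1 + 0.5*x1 + beta2*x2 + heavy-tailed error."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1.0 + 0.5 * x1 + beta2 * x2 + rng.standard_t(df=5, size=n)
    X1 = np.column_stack([np.ones(n), x1])   # design for theta_(1)
    X2 = x2[:, None]                         # design for theta_(2), q = 1
    return X1, X2, y

def score_statistic(X1, X2, y):
    """Score-type statistic for H0: theta_(2) = 0 (a sketch, see assumptions above)."""
    n, p1 = X1.shape
    q = X2.shape[1]
    # Restricted estimate: solves w_n^(1)(theta_(1), 0) = 0, here least squares on X1.
    theta1_tilde, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ theta1_tilde
    X = np.hstack([X1, X2])
    w = X.T @ resid / n                      # w_n at the restricted estimate
    B = X.T @ X / n                          # empirical B with D_i = x_i, V_i = 1
    B11, B21 = B[:p1, :p1], B[p1:, :p1]
    G = np.hstack([-B21 @ np.linalg.inv(B11), np.eye(q)])
    U = X * resid[:, None]                   # per-subject estimating functions
    Sigma_U = U.T @ U / n                    # empirical second moment (assumed middle factor)
    Sigma2 = G @ Sigma_U @ G.T
    w2 = w[p1:]                              # w_n^(2) at the restricted estimate
    return float(n * w2 @ np.linalg.solve(Sigma2, w2))

X1, X2, y = simulate(500, 0.0, seed=1)
print("statistic under H0:", score_statistic(X1, X2, y))
X1, X2, y = simulate(500, 1.0, seed=1)
print("statistic under the alternative:", score_statistic(X1, X2, y))
```

With $q = 1$, the statistic would be compared with the 95th percentile of the $\chi^2_1$ distribution, approximately 3.84. No normality of the errors is used in forming the statistic, which is the point of the distribution-free construction.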

Contributor Information

D. Gunzler, CASE WESTERN RESERVE UNIVERSITY AT METROHEALTH MEDICAL CENTER

W. Tang, UNIVERSITY OF ROCHESTER

N. Lu, UNIVERSITY OF ROCHESTER AND CANANDAIGUA VA MEDICAL CENTER

P. Wu, CHRISTIANA CARE HEALTH SYSTEM

X.M. Tu, UNIVERSITY OF ROCHESTER AND CANANDAIGUA VA MEDICAL CENTER

References

  1. Allison PD. Sociological methodology. American Sociological Association; Washington: 1987. Estimation of linear models with incomplete data; pp. 71–103.
  2. Arbuckle JL. IBM SPSS Amos 19 user’s guide. Amos Development Corporation; Crawfordville: 1995–2010.
  3. Baker LA, Fulker DW. Incomplete covariance matrices and LISREL. Data Analyst. 1983;1:3–5.
  4. Baron RM, Kenny DA. The moderator–mediator variable distinction in social psychological research: concept, strategic and statistical considerations. Journal of Personality and Social Psychology. 1986;51:1173–1182. doi: 10.1037//0022-3514.51.6.1173.
  5. Bentler PM. EQS 6 structural equations program manual. Multivariate Software, Inc.; Encino: 2006.
  6. Bollen KA. Structural equations with latent variables. Wiley; New York: 1989.
  7. Bollen KA, Pearl J. Eight myths about causality and structural equation models (Technical Report R-393). UCLA Cognitive Systems Laboratory. Draft chapter for: Morgan S, editor. Handbook of causal analysis for social research. Springer; New York: 2012.
  8. Bollen KA, Stine R. Direct and indirect effects: classical and bootstrap estimates of variability. Sociological Methodology. 1990;20:115–140.
  9. Browne MW. Generalized least-squares estimators in the analysis of covariance structures. South African Statistical Journal. 1974;8:1–24.
  10. Clogg CC, Petkova E, Shihadeh ES. Statistical methods for analyzing collapsibility in regression models. Journal of Educational Statistics. 1992;17(1):51–74.
  11. Daniels MJ, Kass RE. Shrinkage estimators for covariance matrices. Biometrics. 2001;57:1173–1184. doi: 10.1111/j.0006-341x.2001.01173.x.
  12. Gunzler D. Doctoral dissertation. 2011. A class of distribution-free models for longitudinal mediation analysis. Retrieved from ProQuest Dissertations and Theses Database (AAT 3478294).
  13. Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychological Methods. 2010;15(4):309–334. doi: 10.1037/a0020761.
  14. Joreskog KG, Sorbom D. Lisrel 8 user’s guide. 2nd ed. Scientific Software; Lincolnwood: 1997.
  15. Kowalski J, Powell J. Nonparametric inference for stochastic linear hypotheses: application to high-dimensional data. Biometrika. 2004;91(2):393–408.
  16. Kowalski J, Tu XM. Modern applied U statistics. Wiley; New York: 2007.
  17. Kowalski J, Pagano M, DeGruttola V. A nonparametric test of gene region heterogeneity associated with phenotype. Journal of the American Statistical Association. 2002;97(458):398–408.
  18. Kraemer H. How do risk factors work together? Mediators, moderators, and independent, overlapping, and proxy risk factors. The American Journal of Psychiatry. 2001;158:848–856. doi: 10.1176/appi.ajp.158.6.848.
  19. Little RJA, Rubin DB. Statistical analysis with missing data. Wiley; New York: 1987.
  20. Lu N, Tang W, He H, Yu Q, Crits-Christoph P, Zhang H, Tu XM. On the impact of parametric assumptions and robust alternatives for longitudinal data analysis. Biometrical Journal. 2009;51:627–643. doi: 10.1002/bimj.200800186.
  21. Ma Y, Tang W, Feng C, Tu XM. Inference for kappas for longitudinal study data: applications to sexual health research. Biometrics. 2008;64:781–789. doi: 10.1111/j.1541-0420.2007.00934.x.
  22. Ma Y, Tang W, Yu Q, Tu XM. Modeling concordance correlation coefficient for longitudinal study data. Psychometrika. 2010;75(1):99–119.
  23. Maas CJ, Hox JJ. Robustness issues in multilevel regression. Statistica Neerlandica. 2004;58:127–137.
  24. MacKinnon D. Introduction to statistical mediation analysis. Lawrence Erlbaum Associates; New York: 2008.
  25. MacKinnon D, Fairchild A. Current directions in mediation analysis. Current Directions in Psychological Science. 2009;18:16–20. doi: 10.1111/j.1467-8721.2009.01598.x.
  26. McCullagh P, Nelder JA. Generalized linear models. Chapman and Hall; London: 1989.
  27. Muthén LK, Muthén BO. Mplus user’s guide. 6th ed. Muthén & Muthén; Los Angeles: 1998–2010.
  28. Nelsen RB. An introduction to Copulas. Springer; New York: 2006.
  29. Pan W. On the robust variance estimator in generalized estimating equations. Biometrika. 2001;88:901–906.
  30. Papadopoulos S, Amemiya Y. Correlated samples with fixed and nonnormal latent variables. The Annals of Statistics. 2005;33:2732–2757.
  31. Preacher KJ, Wichman AL, MacCallum RC, Briggs NE. Latent growth curve modeling. Sage; Los Angeles: 2008.
  32. Prentice RL, Zhao LP. Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics. 1991;47:825–839.
  33. R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing; Vienna: 2008.
  34. Reboussin BA, Liang KY. An estimating equations approach for the LISCOMP model. Psychometrika. 1998;63:165–182.
  35. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–121.
  36. Rothman KJ, Greenland S. Modern epidemiology. Lippincott Williams and Wilkins; Philadelphia: 1998.
  37. Satorra A. Asymptotic robustness in multiple group linear-latent variable models. Econometric Theory. 2002;18:297–312.
  38. Satorra A, Bentler PM. Corrections to test statistics and standard errors in covariance analysis. In: von Eye A, Clogg CC, editors. Latent variable analysis: applications to developmental research. Sage; Newbury Park: 1994. pp. 399–419.
  39. Satorra A, Bentler PM. Scaling corrections for chi-square statistics in covariance structure analysis. In: 1988 proceedings of the business and economics statistics section of the American Statistical Association; 1998. pp. 308–313.
  40. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association. 1999;94:1096–1120.
  41. Sobel ME. Asymptotic intervals for indirect effects in structural equations models. In: Leinhardt S, editor. Sociological methodology. Jossey-Bass; San Francisco: 1982. pp. 290–312.
  42. Tsiatis AA. Semiparametric theory and missing data. Springer; New York: 2006.
  43. Tu XM, Feng C, Kowalski J, Tang W, Wang H, Wan C, Ma Y. Correlation analysis for longitudinal data: applications to HIV and psychosocial research. Statistics in Medicine. 2007;26:4116–4138. doi: 10.1002/sim.2857.
  44. Van der Leeden R, Busing F. First iteration versus IGLS/RIGLS estimates in two-level models: a Monte Carlo study with ML3. Department of Psychometrics and Research Methodology, Leiden University; Leiden: 1994.
  45. Van der Leeden R, Busing F, Meijer E. Applications of bootstrap methods for two-level models. Paper presented at the Multilevel Conference; Amsterdam, The Netherlands. 1997 Apr 1–2.
  46. Werts CE, Rock DA, Grandy J. Confirmatory factor analysis applications: missing data problems and comparisons of path models between populations. Multivariate Behavioral Research. 1979;14:199–213. doi: 10.1207/s15327906mbr1402_5.
  47. Wyman PA, Cross W, Brown HC, Yu Q, Tu XM. Intervention to strengthen emotional self-regulation in children with emerging mental health problems: proximal impact on school behavior. Journal of Abnormal Child Psychology. 2010;30:707–720. doi: 10.1007/s10802-010-9398-x.
  48. Yu Q, Tang W, Kowalski J, Tu XM. Multivariate U-statistics: a tutorial with applications. Wiley Interdisciplinary Reviews: Computational Statistics. 2011;3:457–471.
  49. Yu Q, Chen R, Tang W, He H, Gallop R, Crits-Christoph P, Hu J, Tu XM. Distribution-free models for longitudinal count responses with overdispersion and structural zeros. Statistics in Medicine. 2013;32(14):2390–2405. doi: 10.1002/sim.5691.
  50. Zhang Z, Wang L. Methods for mediation analysis with missing data. Psychometrika. 2013;78:154–184. doi: 10.1007/s11336-012-9301-5.
