An Introduction to Model Implied Instrumental Variables using Two Stage Least Squares (MIIV-2SLS) in Structural Equation Models (SEMs)

Kenneth A Bollen; Zachary Fisher; Michael L Giordano; Adam Lilly; Lan Luo; Ai Ye

doi:10.1037/met0000297

Author manuscript; available in PMC: 2023 Oct 1.

Published in final edited form as: Psychol Methods. 2021 Jul 29;27(5):752–772. doi: 10.1037/met0000297

PMCID: PMC8799757 NIHMSID: NIHMS1674084 PMID: 34323584

Abstract

Structural equation models (SEMs) are widely used to handle multiequation systems that involve latent variables, multiple indicators, and measurement error. Maximum likelihood (ML) and Diagonally Weighted Least Squares (DWLS) dominate the estimation of SEMs with continuous or categorical endogenous variables, respectively. When a model is correctly specified, ML and DWLS function well. However, in the face of structural misspecifications or nonconvergence, their performance can deteriorate seriously. Model Implied Instrumental Variable, Two Stage Least Squares (MIIV-2SLS) estimates and tests individual equations, is more robust to misspecifications, and is noniterative, thus avoiding nonconvergence. This paper is an overview and tutorial on MIIV-2SLS. It reviews the six major steps in using MIIV-2SLS: 1) model specification, 2) model identification, 3) latent to observed (L2O) variable transformation, 4) finding MIIVs, 5) using 2SLS, and 6) tests of overidentified equations. Each step is illustrated using a running empirical example from Reisenzein’s (1986) randomized experiment on helping behavior. We also explain and illustrate the analytic conditions under which an equation estimated with MIIV-2SLS is robust to structural misspecifications. We include additional sections on MIIV approaches using a covariance matrix and mean vector as data input, conducting multilevel SEM, analyzing categorical endogenous variables, causal inference, and extensions and applications. Supplemental online material illustrates input code for all examples and simulations using the R package MIIVsem.

Translational Abstract

Theories in psychology hypothesize relationships between abstract variables that we can only imperfectly measure. Testing these ideas requires models with latent variables that represent these abstract concepts and multiple measures that anchor the latent variables to what we can observe. Researchers routinely use latent variable structural equation models (SEMs) to test psychological theories and explanations, as well as to refine our measures. Current methods to estimate and test such models focus on the whole system under the assumption that the model is a fully accurate portrayal of reality. In the common situation in which models are approximations to reality, these techniques are susceptible to spreading errors from one part of the system to another. This paper describes an alternative approach, MIIV-2SLS, that focuses on individual equations and better limits the spread of the model misspecification errors that often occur. This didactic paper describes the major steps in using MIIV-2SLS, illustrated with an empirical example. It also describes the conditions for its robustness as well as the wide variety of areas in which the MIIV-2SLS estimator applies, and it highlights a number of recent extensions to the method.

Introduction

Theories in psychology hypothesize relationships between abstract variables that we can only imperfectly measure. Testing these ideas requires models with latent variables that represent these abstract concepts and multiple indicators that anchor the latent variables to what we can observe. Researchers routinely use latent variable structural equation models (SEMs) to test psychological theories and explanations, as well as to refine our measures. Once the model is built and the measures are available, one class of estimators has come to dominate applications. These are called “system wide” or “full information” estimators because they estimate all parameters in the SEM simultaneously, so that information in one part of the model helps to shape the estimates in another part. When the endogenous variables are continuous, the maximum likelihood (ML) estimator dominates the field, whereas with categorical endogenous variables diagonally weighted least squares (DWLS) is a favorite.¹

These system wide estimators have much to commend them. Consider the ML estimator, the default in virtually all SEM software. This estimator is consistent, asymptotically unbiased, asymptotically efficient, and asymptotically normal, and it provides large sample standard errors for all parameter estimates. Furthermore, the ML estimator yields a likelihood ratio chi square test that compares our hypothesized model to a saturated model as a means of assessing overall model fit. Indeed, these properties may lead us to question why we would consider any other approach.

This rosy picture of system wide estimators such as ML depends on the fulfillment of its underlying assumptions. And this is where complications arise. For instance, the claim of asymptotic efficiency is often misunderstood. Many researchers interpret this to mean that ML is more efficient than any other estimator. However, it is subtler than that. It means that in “large” samples there is no other estimator more efficient than ML. It is possible that another estimator has the same asymptotic efficiency. It also says nothing about the efficiency of ML in finite samples.

But even this requires closer scrutiny. The asymptotic efficiency, asymptotic standard errors, and the likelihood ratio test of the ML estimator assume that the observed variables come from a multivariate normal distribution (Jöreskog, 1977; Bollen, 1989, pp. 131–35) or from a distribution without excessive multivariate kurtosis (Browne, 1984).² When these distributional assumptions are not met, we can use corrected or bootstrap standard errors (e.g., Satorra & Bentler, 1994; Bollen & Stine, 1990) and chi square test statistics (Satorra & Bentler, 1994; Bollen & Stine, 1993), though we sacrifice the claim of asymptotic efficiency when using them.

Less easy to address is the “no structural misspecifications” assumption upon which the system wide (full information) ML estimator depends. The desirable properties of ML that we listed depend on the assumption that the model is known and is the true data generation model. That is, we are assuming there are no omitted variables, correlated errors, cross loadings, or relations among the latent variables beyond those specified in the model. Furthermore, we are assuming we have the correct dimensions for each of our concepts as well as the right functional form relating our variables to each other. These assumptions run counter to the widespread acceptance that all SEMs are approximations (e.g., Browne & Cudeck, 1993; MacCallum, Browne, & Cai, 2007).³ Approximation implies structural misspecifications such as those just described, and structural misspecifications, in turn, mean the desirable properties of ML need not hold.

The advantage of a system wide estimator in using information from all equations becomes a disadvantage when bias from one part of the system spreads to other parts that have no such issues. The parsimonious single chi square test of global fit makes it difficult to locate the source of the problem when the test rejects the hypothesized model. A small omitted correlation between the errors of two indicators, for instance, might lead to rejection of the whole model, even if a key latent variable equation is largely unaffected by the omission. Furthermore, as an iterative estimator, ML might not converge on a solution or might converge on improper solutions, such as negative variances or correlations greater than one in absolute value.

The dominance of these system wide estimators is so great that many researchers do not realize there are alternative estimators for SEMs that overcome or diminish many of these limitations. The Model Implied Instrumental Variable, Two Stage Least Squares (MIIV-2SLS) estimator originating in Bollen (1996a) is one example and the focus of our paper. In contrast to the system wide estimators, researchers can apply the MIIV-2SLS estimator to one equation at a time. Indeed, if one equation includes the key hypothesis in a study, an analyst could estimate and test this equation with MIIV-2SLS without estimating other equations. This could be especially helpful if other equations in the model are underidentified, which hinders a system wide estimator like ML from being applied.

The MIIV-2SLS estimator does not require that the observed variables come from a multivariate normal distribution, or from a distribution without excess multivariate kurtosis, to justify its significance tests of coefficient estimates. It is asymptotically “distribution free” (Bollen, 1996a, pp. 115–116). In addition, Bollen’s (1996a) MIIV-2SLS is more robust to structural misspecifications than the system wide ML estimator (e.g., Bollen, Kirby, Curran, Paxton, & Chen, 2007). Compared to the system wide ML estimator, MIIV-2SLS better isolates structural misspecifications to the equations in which they occur. Indeed, considerable progress has been made in understanding the conditions under which an equation will be robust to structural misspecifications elsewhere in the system (Bollen, 2001, 2020b; Bollen, Gates, & Fisher, 2018). This is important when we are dealing with models that are only approximately valid, a situation that characterizes nearly all models. In addition, MIIV-2SLS has an overidentification test that applies to individual overidentified equations rather than to the whole model (Bollen, 1996a; Kirby & Bollen, 2009). This allows for local fit testing and can highlight problematic as well as good-fitting equations rather than only testing the model as a whole.

Another desirable feature of the MIIV-2SLS estimator of SEMs is computational. Full information ML estimators are iterative and sometimes fail to converge to a solution, even if analysts increase the number of iterations or change the starting values. By contrast, the MIIV-2SLS estimator is noniterative. Nonconvergence is not an issue because an explicit formula, analogous to that of OLS regression, gives the coefficient estimates.
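To make the “explicit formula” concrete, the following minimal Python sketch (assuming NumPy) computes the textbook 2SLS estimator $b = (X'P_VX)^{-1}X'P_Vy$, where $P_V$ projects onto the instrument matrix V. The data-generating values and variable names are hypothetical: an outcome depends on a latent predictor measured with error by two indicators, and the second indicator serves as the instrument for the first. It illustrates the closed-form computation only and is not the MIIVsem implementation.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares: b = (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def two_sls(y, X, V):
    """Two-stage least squares: b = (X' P_V X)^{-1} X' P_V y.

    X holds the explanatory variables and V the instruments; both include a
    column of ones, and V needs at least as many columns as X.
    """
    # Stage 1: OLS fitted values of every column of X given the instruments
    X_hat = V @ np.linalg.solve(V.T @ V, V.T @ X)
    # Stage 2: because X_hat = P_V X, X_hat'X = X'P_V X and X_hat'y = X'P_V y
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)


rng = np.random.default_rng(0)
n = 100_000
latent = rng.normal(size=n)                  # true predictor (never observed)
x = latent + rng.normal(size=n)              # indicator used as the regressor
w = 0.8 * latent + rng.normal(size=n)        # second indicator, independent error
y = 1.0 + 0.5 * latent + rng.normal(size=n)  # outcome; true slope is 0.5

ones = np.ones(n)
X = np.column_stack([ones, x])
V = np.column_stack([ones, w])
b_ols = ols(y, X)          # slope near 0.5 * var(latent) / var(x) = 0.25
b_2sls = two_sls(y, X, V)  # slope near the true 0.5
```

With these made-up values, the OLS slope is attenuated to roughly 0.25, whereas 2SLS gives a slope close to the true 0.5 in a single pass, with no starting values or iterations.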

Given this long list of advantages, it is surprising that MIIV-2SLS is not more widely used to estimate SEMs. We see several reasons for this. One reason is that MIIV-2SLS methods synthesize concepts from econometrics and psychometrics. It uses the ideas of latent variables and multiple indicators from psychometrics combined with instrumental variable estimators that are better known in econometrics. Psychologists and psychometricians are largely unfamiliar with instrumental variable methods. Econometricians and other social scientists using instrumental variable methods, on the other hand, rarely use latent variables and multiple indicators, as is routinely done in psychometrics. Finally, the key idea in the MIIV-2SLS estimator, that we can transform latent variables into observed variables and derive instrumental variables from among the observed variables within the model, is not something routinely done in any of these fields. In other words, the MIIV-2SLS borrows ideas from several different fields but falls between the cracks in that it requires knowledge from areas that are not often combined.

In addition to the unfamiliarity with MIIV-2SLS, another reason for its lower usage is that software built specifically for MIIV-2SLS estimation of SEMs was not previously available. Researchers aware of the estimator could adapt existing software for this purpose, but the burden of doing so was far greater than that of employing the system wide estimators (e.g., ML) routinely available in SEM software. Fortunately, MIIVsem, a free package for R (R Core Team, 2019) distributed through CRAN, includes many features of the MIIV-2SLS estimator for SEMs (Fisher, Bollen, Gates, & Rönkö, 2017).

Another factor that influences usage is that most of the MIIV-2SLS literature is fairly technical, which can restrict its readership. Didactic or overview papers are rare. A recent paper by Bollen (2019) gives an overview of the MIIV approach to SEM, but it was a Presidential Address to the Society for Multivariate Experimental Research with a more technical audience as its focus. In addition, it did not cover several topics that we address. Among the topics that we discuss and that Bollen (2019) did not treat are: 1) MIIV-2SLS approaches to multilevel SEM, 2) an up-to-date list of conditions under which MIIV-2SLS is robust to structural misspecifications, 3) new results on analyzing categorical endogenous variables, 4) the use of covariance matrices and means as input in place of raw data, 5) connections between the MIIV approach and causal inference, 6) an illustration of multiple testing corrections for our equation-based chi square tests, 7) an explanation and illustration of a chi square difference test for MIIV-2SLS, and 8) the use of data from a randomized experiment as the empirical example to show how MIIV-2SLS applies to such designs. Our main text is in scalar rather than matrix notation to reach a wider audience. Furthermore, we describe a variety of extensions and provide the most complete list of empirical applications of MIIV-2SLS ever published. Although papers using full information estimators are far more common in SEM, our list documents a number of applications of MIIV-2SLS.

Our purpose is to provide a didactic overview of the MIIV-2SLS estimator for latent variable SEM. The intended audience is graduate students, faculty, and other researchers who have experience with SEMs but are unfamiliar with the MIIV-2SLS estimator. In addition to explaining the ideas and assumptions that underlie the MIIV-2SLS estimator, we will illustrate its use with empirical data.

Steps in the MIIV Approach

In this section, we will explain the six main steps in applying the MIIV approach to a SEM. Specifically, these are:

1. Model Specification
2. Model Identification
3. Latent to Observed (L2O) Variable Transformation
4. Find the Model Implied Instrumental Variables (MIIVs)
5. Use an Instrumental Variable Estimator to Provide Estimates
6. Test the Overidentified Equations

The Model Specification and Identification steps are the same as in traditional approaches to SEM. The remaining steps represent a departure. An appendix provides a general matrix expression of several of these steps, but here we illustrate the MIIV approach using an empirical example published by Reisenzein (1986).⁴ Lastly, we note that the freely available MIIVsem software automates these steps for the user, and code for our examples can be found in the Online Supplementary Materials.

1. Model Specification

The first step in the process is for researchers to construct a model that corresponds to the theory they wish to test. This transition from theory to a model is the model specification. Reisenzein (1986) designed a randomized experiment to test Weiner’s (1980) attribution-affect model of helping behavior. According to this theory, whether people help others is determined by their anger or sympathy. Anger and sympathy are affected by perceived controllability. If individuals are in difficult situations because of their own controllable actions, this negatively affects sympathy and positively affects anger of the potential helpers. The opposite holds if the situation seems beyond the individuals’ control.

The first of Reisenzein’s (1986) experiments describes a person collapsing and lying on the floor of a subway. Subjects were told that the person was either drunk (“controllable situation”) or ill (“uncontrollable situation”). This randomized story was intended to affect perceptions of controllability, which in turn should affect feelings of sympathy and anger. Finally, sympathy should positively affect helping behavior, while anger should negatively affect it.

Figure 1 is a path diagram of Reisenzein’s (1986) hypothesized model that includes the latent variables (circles) as well as observed indicators and randomized stimulus (boxes). The latent variables are controllability (L₁), sympathy (L₂), anger (L₃), and help (L₄). There are three measures for each latent variable (Z₂ to Z₁₃) and Z₁ is the randomized story stimulus or eliciting situation. In Table 1, the variable symbols correspond to those in our paper and the item-specific questions are based on Reisenzein (1986, Table 1, p. 1126). All indicators except the last helping indicator are scored on a nine-point scale. The last helping indicator is scored on a five-point scale. The sample size was 138.

Figure 1. Path Diagram of Reisenzein’s (1986) Model of Helping Behavior

Table 1.

Reisenzein’s (1986) Variables and Their Descriptions¹

Variable Name	Construct	Question or Description
Z1	None	Eliciting Situation
Z2	Controllability	How controllable, do you think, is the cause of the person’s present condition? (1 = not at all under personal control, 9 = completely under personal control).
Z3		How responsible, do you think, is that person for his present condition? (1 = not at all responsible, 9 = very much responsible).
Z4		I would think that it was the person’s own fault that he is in the present situation. (1 = no, not at all, 9 = yes, absolutely so).
Z5	Sympathy	How much sympathy would you feel for that person? (1 = none at all, 9 = very much).
Z6		I would feel pity for this person. (1 = none at all, 9 = very much).
Z7		How much concern would you feel for this person? (1 = none at all, 9 = very much).
Z8	Anger	How angry would you feel at that person? (1 = not at all, 9 = very much).
Z9		How irritated would you feel by that person? (1 = not at all, 9 = very much).
Z10		I would feel aggravated by that person. (1 = not at all, 9 = very much so).
Z11	Help	How likely is it that you would help that person? (1 = definitely would not help, 9 = definitely would help).
Z12		How certain would you feel that you would help the person? (1 = not at all certain, 9 = absolutely certain).
Z13		Which of the following actions would you most likely engage in? 1 = not help at all, try to stay uninvolved; 2 = try to alert other bystanders, but stay uninvolved myself; 3 = try to inform the conductor or another official in charge; 4 = go over and help the person to a seat; 5 = help in any way that might be necessary, including if necessary first aid and/or accompanying the person to a hospital.


Variable symbols correspond to the path diagram and equations in our paper. Descriptions of variables are taken from Reisenzein (1986).

The latent variable model corresponding to the Figure 1 path diagram is,

$$
\begin{aligned}
L_{1} &= \alpha_{L_{1}} + B_{L_{1} Z_{1}} Z_{1} + \varepsilon_{L_{1}} \\
L_{2} &= \alpha_{L_{2}} + B_{L_{2} L_{1}} L_{1} + \varepsilon_{L_{2}} \\
L_{3} &= \alpha_{L_{3}} + B_{L_{3} L_{1}} L_{1} + \varepsilon_{L_{3}} \\
L_{4} &= \alpha_{L_{4}} + B_{L_{4} L_{2}} L_{2} + B_{L_{4} L_{3}} L_{3} + \varepsilon_{L_{4}}
\end{aligned}
\tag{1}
$$

where we defined the latent variables above, the intercepts of each equation are the subscripted αs, the regression coefficients are the subscripted Bs, and the εs are the error terms. As usual, we assume a mean of zero for all errors (ε). Furthermore, in this path diagram all errors are uncorrelated with the randomized stimulus (Z₁) and with each other.

The measurement model in Figure 1 is,

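$$
\begin{aligned}
Z_{2} &= L_{1} + \varepsilon_{Z_{2}} \\
Z_{3} &= \alpha_{Z_{3}} + \Lambda_{31} L_{1} + \varepsilon_{Z_{3}} \\
Z_{4} &= \alpha_{Z_{4}} + \Lambda_{41} L_{1} + \varepsilon_{Z_{4}} \\
Z_{5} &= L_{2} + \varepsilon_{Z_{5}} \\
Z_{6} &= \alpha_{Z_{6}} + \Lambda_{62} L_{2} + \varepsilon_{Z_{6}} \\
Z_{7} &= \alpha_{Z_{7}} + \Lambda_{72} L_{2} + \varepsilon_{Z_{7}} \\
Z_{8} &= L_{3} + \varepsilon_{Z_{8}} \\
Z_{9} &= \alpha_{Z_{9}} + \Lambda_{93} L_{3} + \varepsilon_{Z_{9}} \\
Z_{10} &= \alpha_{Z_{10}} + \Lambda_{10,3} L_{3} + \varepsilon_{Z_{10}} \\
Z_{11} &= L_{4} + \varepsilon_{Z_{11}} \\
Z_{12} &= \alpha_{Z_{12}} + \Lambda_{12,4} L_{4} + \varepsilon_{Z_{12}} \\
Z_{13} &= \alpha_{Z_{13}} + \Lambda_{13,4} L_{4} + \varepsilon_{Z_{13}}
\end{aligned}
\tag{2}
$$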

where Z₂ to Z₁₃ are the multiple indicators of the four latent variables, the αs are the intercepts of the measurement equations, the Λs are the factor loadings (the first subscript gives the indicator being influenced and the second gives the latent variable on which the indicator loads), and the εs are the measurement errors (“unique factors”) for each indicator. As usual, the measurement errors are assumed to have means of zero and to be uncorrelated with all latent variables (Ls), with the randomized stimulus (Z₁), and with each other. Each latent variable is scaled to its first indicator as shown in Figure 1. “Reference” or “anchor” indicators are two other terms researchers sometimes use for the scaling indicators. We set the origin and metric of each latent variable by setting the scaling indicator’s constant or intercept (α) to zero and its factor loading (Λ) to one (Bollen, 1989, pp. 152–54). This gives the latent variable a scale similar to that of its scaling indicator: they have the same mean, but the variance of the indicator is larger than the variance of the latent variable by the variance of the indicator’s error term.
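A small simulation can confirm this last point. The Python sketch below assumes NumPy and uses hypothetical values (latent mean 4, latent variance 2.25, unique variance 1). With α = 0 and Λ = 1, the scaling indicator shares the latent variable’s mean, and its variance exceeds the latent variance by roughly the unique variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
L1 = rng.normal(loc=4.0, scale=1.5, size=n)  # latent variable: mean 4, variance 2.25
eps_Z2 = rng.normal(scale=1.0, size=n)       # unique factor: variance 1
Z2 = 0.0 + 1.0 * L1 + eps_Z2                 # scaling indicator: alpha = 0, Lambda = 1

mean_gap = Z2.mean() - L1.mean()  # close to 0
var_gap = Z2.var() - L1.var()     # close to var(eps_Z2) = 1
```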

This illustrates the model specification, the step when the analyst represents a theory in terms of a path diagram and/or its corresponding equations. We are now ready to move to model identification.
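To make the specification concrete, the following Python sketch (assuming NumPy) draws data from equations (1) and (2). All parameter values, the 0/1 coding of the story stimulus, and the function name `simulate_helping` are hypothetical choices for illustration. The simulated indicators are continuous rather than on the nine- and five-point scales of Reisenzein’s (1986) items.

```python
import numpy as np


def simulate_helping(n, seed=0):
    """Draw n cases from equations (1) and (2) using made-up parameter values."""
    rng = np.random.default_rng(seed)
    z1 = rng.integers(0, 2, size=n).astype(float)  # story: 0 = ill, 1 = drunk
    # Latent variable model, equation (1): hypothetical alphas and Bs
    L1 = 3.0 + 2.0 * z1 + rng.normal(size=n)             # controllability
    L2 = 7.0 - 0.6 * L1 + rng.normal(size=n)             # sympathy
    L3 = 1.0 + 0.5 * L1 + rng.normal(size=n)             # anger
    L4 = 2.0 + 0.7 * L2 - 0.3 * L3 + rng.normal(size=n)  # help
    # Measurement model, equation (2): indicator -> (latent, alpha, Lambda);
    # the scaling indicators Z2, Z5, Z8, Z11 have alpha = 0 and Lambda = 1
    spec = {
        "Z2": (L1, 0.0, 1.0), "Z3": (L1, 0.5, 0.9), "Z4": (L1, 0.2, 1.1),
        "Z5": (L2, 0.0, 1.0), "Z6": (L2, 0.3, 0.8), "Z7": (L2, -0.2, 1.2),
        "Z8": (L3, 0.0, 1.0), "Z9": (L3, 0.4, 0.9), "Z10": (L3, 0.1, 1.0),
        "Z11": (L4, 0.0, 1.0), "Z12": (L4, 0.5, 0.7), "Z13": (L4, -0.5, 0.6),
    }
    data = {"Z1": z1}
    for name, (latent, alpha, lam) in spec.items():
        # unique factor with a hypothetical standard deviation of 0.7
        data[name] = alpha + lam * latent + rng.normal(scale=0.7, size=n)
    return data
```

Because controllability is scaled to Z₂, the difference in mean Z₂ between the two simulated story conditions should be close to the hypothetical effect of 2 assigned to B_{L₁Z₁}.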

2. Model Identification

Model identification concerns whether it is possible to find unique values for every parameter. If a parameter is not identified, then it is possible for more than one value of the parameter to satisfy the constraints in the model. In other words, we would not be able to tell the true parameter value from false ones even if we had the population covariance matrix and population means. As Fisher (1966, p.2) describes it: “One literally cannot hope to know the parameters of the equation in question on the basis of empirical observations alone, no matter how extensive and complete these observations may be.”⁵
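A minimal example, separate from the Reisenzein (1986) model, shows what non-identification looks like. Consider a hypothetical single factor L with two indicators, Z_a = L + e_a and Z_b = λL + e_b. The model has four parameters (var(L), λ, and two unique variances) but only three distinct variances and covariances. The Python sketch below (assuming NumPy) gives two different parameter sets that imply exactly the same covariance matrix, so even the population covariance matrix cannot distinguish them.

```python
import numpy as np


def implied_cov(phi, lam, theta_a, theta_b):
    """Covariance matrix implied by Z_a = L + e_a and Z_b = lam * L + e_b.

    phi = var(L); theta_a and theta_b are the unique (error) variances.
    """
    return np.array([
        [phi + theta_a, lam * phi],
        [lam * phi, lam**2 * phi + theta_b],
    ])


sigma_1 = implied_cov(phi=1.0, lam=1.0, theta_a=1.0, theta_b=1.0)
sigma_2 = implied_cov(phi=1.5, lam=2 / 3, theta_a=0.5, theta_b=4 / 3)
# Both equal [[2, 1], [1, 2]]: the parameters are not identified
```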

If we can establish that each and every parameter in the model is a unique function of the means, variances, or covariances of the observed variables, then this algebraic approach establishes model identification (Bollen, 1989, pp. 88–93, 238–42, 326–28). However, in complicated models this algebraic approach is often too difficult or tedious to complete. In practice, researchers depend on SEM software to determine model identification, but there are several reasons that this is less than optimal. First, the error messages provided in SEM software are sometimes ambiguous. A message might list identification as only one of several possible reasons for an error in a run, which creates uncertainty as to whether it is really an identification problem or something else. Second, the empirical checks of identification assess local rather than global identification. As a result, they might miss situations in which one or more parameters are underidentified. See Bollen (1989, pages 246–51), Bollen and Bauldry (2010), and Kenny and Milan (2012) for further discussion.

Fortunately, there are identification rules that cover some models and provide a check as to whether a model is identified. The Two-Step Rule of identification (Bollen, 1989, pages 328–31) is one that we can use on the Reisenzein (1986) model. The first step converts the original model to a confirmatory factor analysis (CFA) by focusing on the measurement model, ignoring any direct relations between the latent variables, and checking whether this CFA would be identified using the identification rules for CFA. It is well known that if each latent variable has at least three indicators that load exclusively on it and there are no correlated errors, then the model is identified.⁶ The second step ignores the measurement model and treats the latent variable model as if it were an observed variable model. In the Reisenzein (1986) example, this part of the model is a recursive model (no feedback loops and uncorrelated errors), and such models are identified. Because the model passes both steps, the whole model is identified.
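The bookkeeping behind the Two-Step Rule can be sketched in Python using only the standard library. The sketch encodes the Figure 1 loadings and latent variable paths as dictionaries (the names are hypothetical), checks the three-indicator condition for step 1, and uses `graphlib` to confirm that the latent variable model has no feedback loops for step 2. It assumes, as in Figure 1, that no errors are correlated. Both checks are sufficient conditions only, so a `False` result means the rule is silent, not that the model is underidentified.

```python
from graphlib import CycleError, TopologicalSorter

# Step 1 input: indicator -> the single latent variable it loads on (Figure 1).
# A dict maps each indicator to exactly one latent, so loadings are exclusive.
loadings = {
    "Z2": "L1", "Z3": "L1", "Z4": "L1",
    "Z5": "L2", "Z6": "L2", "Z7": "L2",
    "Z8": "L3", "Z9": "L3", "Z10": "L3",
    "Z11": "L4", "Z12": "L4", "Z13": "L4",
}
correlated_errors = []  # pairs of indicators with correlated errors (none here)

# Step 2 input: outcome -> its direct causes in the latent model, equation (1)
structural = {"L1": ["Z1"], "L2": ["L1"], "L3": ["L1"], "L4": ["L2", "L3"]}
correlated_disturbances = []  # pairs of correlated equation errors (none here)


def step1_three_indicator_rule(loadings, correlated_errors, min_indicators=3):
    """Sufficient check: >= 3 exclusive indicators per latent, no correlated errors."""
    counts = {}
    for latent in loadings.values():
        counts[latent] = counts.get(latent, 0) + 1
    return not correlated_errors and all(c >= min_indicators for c in counts.values())


def step2_recursive_rule(structural, correlated_disturbances):
    """Sufficient check: no feedback loops and uncorrelated equation errors."""
    try:
        tuple(TopologicalSorter(structural).static_order())  # raises on a cycle
    except CycleError:
        return False
    return not correlated_disturbances


identified = (step1_three_indicator_rule(loadings, correlated_errors)
              and step2_recursive_rule(structural, correlated_disturbances))
```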

Establishing model identification is valuable to the MIIV approach. If the whole model is identified, then each equation can typically be estimated using the MIIV approach. However, the MIIV approach only requires that each equation to be estimated be exactly identified or overidentified. This means there will be situations where the model as a whole is not identified, but key equations are. We will revisit the topic of assessing identification at the equation level, rather than the model level, in subsequent sections. But for now, we stick with the more common situation of having an identified model.

3. Latent to Observed (L2O) Variable Transformation

The usual application of instrumental variables operates within an observed variable only framework. Our latent variable and measurement models both include latent variables. To make our model ready for instrumental variables, we convert all latent variables to observed variables. We do so using Bollen’s (1996a) latent to observed (L2O) variable transformation.

The appendix gives a general matrix expression of the L2O transformation for SEMs. Here we illustrate this L2O transformation using the equation for sympathy (L₂). From equation (1), we repeat the sympathy equation:

$$
L_{2} = \alpha_{L_{2}} + B_{L_{2} L_{1}} L_{1} + \varepsilon_{L_{2}}
\tag{3}
$$

Next, we require the measurement equation for the scaling indicator for each of the latent variables in equation (3). Equation (2) shows these scaling indicator equations as:

$$
\begin{aligned}
Z_{2} &= L_{1} + \varepsilon_{Z_{2}} \\
Z_{5} &= L_{2} + \varepsilon_{Z_{5}}
\end{aligned}
\tag{4}
$$

Using equation (4), we write each latent variable as equal to its scaling indicator minus its error:

$$
\begin{aligned}
L_{1} &= Z_{2} - \varepsilon_{Z_{2}} \\
L_{2} &= Z_{5} - \varepsilon_{Z_{5}}
\end{aligned}
\tag{5}
$$

To complete the L2O transformation of equation (3) for sympathy (L₂), we replace L₁ and L₂ with their right hand side expressions from equation (5). After this substitution, we get,

$$
Z_{5} = \alpha_{L_{2}} + B_{L_{2} L_{1}} Z_{2} - B_{L_{2} L_{1}} \varepsilon_{Z_{2}} + \varepsilon_{L_{2}} + \varepsilon_{Z_{5}}
\tag{6}
$$

Equation (6) is the L2O version of latent variable equation (3), in which controllability (L₁) influences sympathy (L₂). The intercept and regression coefficient of this equation match those in the latent variable equation, but all latent variables have been replaced with observed variables. Equation (6) looks like a simple regression, and we might be tempted to estimate it with OLS. This would be a mistake because a key assumption of OLS is that the equation error is uncorrelated with the explanatory variables. The composite error, $-B_{L_{2} L_{1}} \varepsilon_{Z_{2}} + \varepsilon_{L_{2}} + \varepsilon_{Z_{5}}$, contains $\varepsilon_{Z_{2}}$, which is also part of the explanatory variable itself ($Z_{2} = L_{1} + \varepsilon_{Z_{2}}$). The composite error therefore correlates with Z₂, and OLS is no longer a consistent or asymptotically unbiased estimator. Instrumental variable estimators are well suited to this situation, but before discussing them we need to complete the L2O transformation for the Reisenzein (1986) example.
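A short simulation illustrates the problem. Under hypothetical values (B_{L₂L₁} = −0.6, var(L₁) = 1, var(ε_{Z₂}) = 0.5), the Python sketch below (assuming NumPy) computes the composite error of equation (6) directly, shows that it correlates with Z₂, and shows that the OLS slope of Z₅ on Z₂ settles near B_{L₂L₁}·var(L₁)/var(Z₂) ≈ −0.4 rather than −0.6.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
alpha_L2, B_L2L1 = 7.0, -0.6                        # hypothetical values
L1 = rng.normal(loc=4.0, scale=1.0, size=n)         # var(L1) = 1
L2 = alpha_L2 + B_L2L1 * L1 + rng.normal(size=n)
eps_Z2 = rng.normal(scale=np.sqrt(0.5), size=n)     # var(eps_Z2) = 0.5
Z2 = L1 + eps_Z2                                    # scaling indicator of L1
Z5 = L2 + rng.normal(scale=0.7, size=n)             # scaling indicator of L2

# Composite error of equation (6): -B * eps_Z2 + eps_L2 + eps_Z5
composite = Z5 - (alpha_L2 + B_L2L1 * Z2)
corr_with_Z2 = np.corrcoef(composite, Z2)[0, 1]     # positive, since -B > 0

# OLS slope of Z5 on Z2 versus its attenuated probability limit
ols_slope = np.cov(Z2, Z5)[0, 1] / np.var(Z2, ddof=1)
expected_ols = B_L2L1 * 1.0 / (1.0 + 0.5)           # B * var(L1) / var(Z2)
```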

Following the same procedure of replacing each latent variable with its scaling indicator minus its error leads to the full L2O transformation of the latent variable model. To illustrate this further, the original latent variable model from equation (1) is,

$$
\begin{aligned}
L_{1} &= \alpha_{L_{1}} + B_{L_{1} Z_{1}} Z_{1} + \varepsilon_{L_{1}} \\
L_{2} &= \alpha_{L_{2}} + B_{L_{2} L_{1}} L_{1} + \varepsilon_{L_{2}} \\
L_{3} &= \alpha_{L_{3}} + B_{L_{3} L_{1}} L_{1} + \varepsilon_{L_{3}} \\
L_{4} &= \alpha_{L_{4}} + B_{L_{4} L_{2}} L_{2} + B_{L_{4} L_{3}} L_{3} + \varepsilon_{L_{4}}
\end{aligned}
\tag{7}
$$

Each latent variable is equal to its scaling indicator minus its error, leading to,

$$
\begin{aligned}
L_{1} &= Z_{2} - \varepsilon_{Z_{2}} \\
L_{2} &= Z_{5} - \varepsilon_{Z_{5}} \\
L_{3} &= Z_{8} - \varepsilon_{Z_{8}} \\
L_{4} &= Z_{11} - \varepsilon_{Z_{11}}
\end{aligned}
\tag{8}
$$

which when we substitute into equation (7) gives the L2O form of the original latent variable model of,

$$
\begin{aligned}
Z_{2} &= \alpha_{L_{1}} + B_{L_{1} Z_{1}} Z_{1} + \varepsilon_{L_{1}} + \varepsilon_{Z_{2}} \\
Z_{5} &= \alpha_{L_{2}} + B_{L_{2} L_{1}} Z_{2} - B_{L_{2} L_{1}} \varepsilon_{Z_{2}} + \varepsilon_{L_{2}} + \varepsilon_{Z_{5}} \\
Z_{8} &= \alpha_{L_{3}} + B_{L_{3} L_{1}} Z_{2} - B_{L_{3} L_{1}} \varepsilon_{Z_{2}} + \varepsilon_{L_{3}} + \varepsilon_{Z_{8}} \\
Z_{11} &= \alpha_{L_{4}} + B_{L_{4} L_{2}} Z_{5} + B_{L_{4} L_{3}} Z_{8} - B_{L_{4} L_{2}} \varepsilon_{Z_{5}} - B_{L_{4} L_{3}} \varepsilon_{Z_{8}} + \varepsilon_{L_{4}} + \varepsilon_{Z_{11}}
\end{aligned}
\tag{9}
$$
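Because equation (9) is pure algebra, it can be checked numerically. The Python sketch below (assuming NumPy; all parameter values are hypothetical) generates the latent variables and scaling indicators from equations (1) and (2). It then confirms that the right-hand sides of equation (9), which contain only observed variables and error terms, reproduce Z₂, Z₅, Z₈, and Z₁₁ up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
# Hypothetical intercepts and coefficients for equation (1)
a = {"L1": 3.0, "L2": 7.0, "L3": 1.0, "L4": 2.0}
B = {"L1Z1": 2.0, "L2L1": -0.6, "L3L1": 0.5, "L4L2": 0.7, "L4L3": -0.3}
err = {k: rng.normal(size=n) for k in ["L1", "L2", "L3", "L4", "Z2", "Z5", "Z8", "Z11"]}

Z1 = rng.integers(0, 2, size=n).astype(float)
L1 = a["L1"] + B["L1Z1"] * Z1 + err["L1"]
L2 = a["L2"] + B["L2L1"] * L1 + err["L2"]
L3 = a["L3"] + B["L3L1"] * L1 + err["L3"]
L4 = a["L4"] + B["L4L2"] * L2 + B["L4L3"] * L3 + err["L4"]
# Scaling indicators from equation (2): intercept 0, loading 1
lhs = {"Z2": L1 + err["Z2"], "Z5": L2 + err["Z5"],
       "Z8": L3 + err["Z8"], "Z11": L4 + err["Z11"]}
Z2, Z5, Z8 = lhs["Z2"], lhs["Z5"], lhs["Z8"]

# Right-hand sides of equation (9): observed variables and error terms only
rhs = {
    "Z2": a["L1"] + B["L1Z1"] * Z1 + err["L1"] + err["Z2"],
    "Z5": a["L2"] + B["L2L1"] * Z2 - B["L2L1"] * err["Z2"] + err["L2"] + err["Z5"],
    "Z8": a["L3"] + B["L3L1"] * Z2 - B["L3L1"] * err["Z2"] + err["L3"] + err["Z8"],
    "Z11": (a["L4"] + B["L4L2"] * Z5 + B["L4L3"] * Z8
            - B["L4L2"] * err["Z5"] - B["L4L3"] * err["Z8"]
            + err["L4"] + err["Z11"]),
}
max_abs_diff = max(float(np.max(np.abs(lhs[k] - rhs[k]))) for k in lhs)
```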

Developing consistent and asymptotically unbiased estimators of these intercepts and regression coefficients provides a researcher the key coefficients from the latent variable model. The first equation in (9) differs from the others in that OLS applied to it is a consistent estimator of the coefficients because the composite error is uncorrelated with the Z₁ variable. (Recall that Z₁ is the randomized experimental stimulus.) For the other equations, an instrumental variable estimator would be beneficial because it operates even when the composite error correlates with the explanatory variables.
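The claim about the first equation can be checked with a small simulation. In this Python sketch (assuming NumPy, with hypothetical values α_{L₁} = 3 and B_{L₁Z₁} = 2), Z₁ is randomized, so the composite error ε_{L₁} + ε_{Z₂} is independent of it and OLS of Z₂ on Z₁ recovers both coefficients.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
alpha_L1, B_L1Z1 = 3.0, 2.0                        # hypothetical values
Z1 = rng.integers(0, 2, size=n).astype(float)     # randomized story stimulus
L1 = alpha_L1 + B_L1Z1 * Z1 + rng.normal(size=n)  # controllability
Z2 = L1 + rng.normal(scale=0.7, size=n)           # scaling indicator of L1

# OLS of Z2 on an intercept and Z1 (first equation of the L2O model)
X = np.column_stack([np.ones(n), Z1])
alpha_hat, B_hat = np.linalg.lstsq(X, Z2, rcond=None)[0]
```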

An L2O transformation of the measurement model is also possible. To illustrate this, consider the measurement model equations for the controllability (L₁) latent variable. In the original measurement model these are,

$$
\begin{aligned}
Z_{2} &= L_{1} + \varepsilon_{Z_{2}} \\
Z_{3} &= \alpha_{Z_{3}} + \Lambda_{31} L_{1} + \varepsilon_{Z_{3}} \\
Z_{4} &= \alpha_{Z_{4}} + \Lambda_{41} L_{1} + \varepsilon_{Z_{4}}
\end{aligned}
\tag{10}
$$

The L2O transformation of these equations results in,

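$$
\begin{aligned}
L_{1} &= Z_{2} - \varepsilon_{Z_{2}} \\
Z_{3} &= \alpha_{Z_{3}} + \Lambda_{31} Z_{2} - \Lambda_{31} \varepsilon_{Z_{2}} + \varepsilon_{Z_{3}} \\
Z_{4} &= \alpha_{Z_{4}} + \Lambda_{41} Z_{2} - \Lambda_{41} \varepsilon_{Z_{2}} + \varepsilon_{Z_{4}}
\end{aligned}
\tag{11}
$$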

In the same fashion, we could transform all the measurement equations to eliminate the latent variables and create equations with only observed variables on the left- and right-hand sides. Each L2O-transformed equation has a composite error that correlates with its right-hand side scaling indicator, which rules out OLS. As we mentioned above, however, an instrumental variable estimator works in this situation. The next section discusses instrumental variables and how to find them among the observed variables already in the model.⁷
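For the Z₃ equation in (11), Z₄ is one observed variable that can serve as an instrument: it depends on L₁, and hence correlates with Z₂, but it is uncorrelated with ε_{Z₂} and ε_{Z₃}. The Python sketch below (assuming NumPy; loadings and variances are hypothetical) compares the attenuated OLS slope of Z₃ on Z₂ with the simple instrumental variable ratio cov(Z₄, Z₃)/cov(Z₄, Z₂), which converges to Λ₃₁.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
lam31, lam41 = 0.9, 1.1                                  # hypothetical loadings
L1 = rng.normal(loc=4.0, size=n)                         # var(L1) = 1
Z2 = L1 + rng.normal(scale=0.7, size=n)                  # scaling indicator
Z3 = 0.5 + lam31 * L1 + rng.normal(scale=0.7, size=n)
Z4 = 0.2 + lam41 * L1 + rng.normal(scale=0.7, size=n)

# OLS slope of Z3 on Z2: attenuated to lam31 * var(L1) / var(Z2) = 0.9 / 1.49
ols = np.cov(Z2, Z3)[0, 1] / np.var(Z2, ddof=1)
# IV slope with Z4 as the instrument for Z2: converges to lam31
iv = np.cov(Z4, Z3)[0, 1] / np.cov(Z4, Z2)[0, 1]
```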

4. Model Implied Instrumental Variables (MIIVs)

Researchers in economics, sociology, and other social sciences use instrumental variable estimators more frequently than do psychologists. However, their potential applications in psychology are great (e.g., Maydeu-Olivares, Shi, & Fairchild, 2020). It is more appropriate to refer to instrumental variable methods as a family of estimators rather than a single one, and researchers also differ in how they employ instrumental variables. This can be a source of confusion. For instance, many researchers who have heard of or used instrumental variable estimators lament that a good instrument is difficult to find. This refers to a common situation in which a researcher has an outcome of interest and a number of covariates to explain it, but there is concern that one or more of the covariates might correlate with the equation error of the outcome and thereby rule out OLS as an estimator. An instrumental variable is one that correlates with the problematic covariates but is uncorrelated with the equation error. In this common approach to instrumental variables, a researcher searches for instruments among other variables not previously considered. Following Bollen (2012), we refer to these as “auxiliary” instrumental variables: the instruments were not part of the original model but are sought among variables external to the original equation.

The MIIV approach in Bollen (1996a) is different. For each equation, it determines which observed variables already in the model meet the conditions to serve as instrumental variables. For a set of variables to be proper instruments, they must be uncorrelated with the error in the equation to be estimated, the covariance matrix of the instruments must be nonsingular, and the rank of the covariance matrix of the instruments and the equation’s covariates must equal the number of covariates (e.g., Greene, 2012; Bollen, 2012). Bollen (1996a, p.114) describes two ways to find MIIVs for SEMs. Bollen and Bauer (2004), Bauldry (2014), and Fisher et al. (2017) provide SAS, Stata, and R software, respectively, to automate the process of identifying MIIVs.
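The two rank conditions can be checked directly on covariance matrices. The Python sketch below (assuming NumPy; the function name `instruments_ok` and all numbers are hypothetical) uses model-implied population covariances for two indicators of controllability, Z₃ and Z₄, as instruments. With the single covariate Z₂ the conditions hold. Adding a second covariate driven by a latent variable uncorrelated with L₁ produces a zero column, and the rank condition fails. The function does not check the first condition, that the instruments are uncorrelated with the equation error; the MIIV search addresses that condition.

```python
import numpy as np


def instruments_ok(cov_zz, cov_zx):
    """Check the rank conditions for instruments, given population covariances.

    cov_zz : (m, m) covariance matrix of the m instruments
    cov_zx : (m, k) covariances between the instruments and the k covariates
    """
    m, k = cov_zx.shape
    nonsingular = np.linalg.matrix_rank(cov_zz) == m
    full_rank = np.linalg.matrix_rank(cov_zx) == k  # also implies m >= k
    return bool(nonsingular and full_rank)


phi, lam3, lam4 = 1.0, 0.9, 1.1   # hypothetical var(L1) and loadings of Z3, Z4
theta3 = theta4 = 0.5             # hypothetical unique variances of Z3, Z4
cov_zz = np.array([[lam3**2 * phi + theta3, lam3 * lam4 * phi],
                   [lam3 * lam4 * phi, lam4**2 * phi + theta4]])
# Covariate Z2 = L1 + e2: cov(Z3, Z2) = lam3 * phi and cov(Z4, Z2) = lam4 * phi
cov_zx_one = np.array([[lam3 * phi], [lam4 * phi]])
# A second covariate unrelated to L1 adds a column of zeros
cov_zx_two = np.column_stack([cov_zx_one[:, 0], [0.0, 0.0]])
```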

To gain a better understanding of how to locate MIIVs, we return to the Reisenzein (1986) empirical example and the equation for sympathy (L₂). Equation (6) gives the L2O transformed equation for this outcome as,

Z_{5} = α_{L_{2}} + B_{L_{2} L_{1}} Z_{2} - B_{L_{2} L_{1}} ε_{Z_{2}} + ε_{L_{2}} + ε_{Z_{5}} .

(12)

We cannot estimate this equation with OLS because of the correlation of the composite error with Z₂. This leads us to seek MIIVs among the observed variables in the model. One way to find MIIVs is through a process of elimination where we rule out any variable that correlates with the composite error term. To illustrate the process, Table 1 contains a list of all observed variables in the model: Z₁ to Z₁₃. By examining each error term in the composite error, we can determine which of these observed variables is directly or indirectly influenced by one or more of these error terms. If an error term directly or indirectly affects an observed variable, then it is correlated with that observed variable and that observed variable in turn is ineligible to be an instrument.

Consider $ε_{L_{2}}$, the equation error for the sympathy (L₂) equation, and refer back to the path diagram in Figure 1. By tracing the effects of $ε_{L_{2}}$ in the path diagram, we can find the variables that correlate with it. For instance, there are paths from $ε_{L_{2}}$ to the other two indicators of sympathy (L₂), that is, $ε_{L_{2}} \to L_{2} \to Z_{6}$ and $ε_{L_{2}} \to L_{2} \to Z_{7}$. This rules out Z₆ and Z₇ as MIIVs. Note that $ε_{L_{2}}$ also has indirect effects on Z₁₁ to Z₁₃ (through Help, L₄), which removes these variables from our list of MIIVs. Turning to $ε_{Z_{2}}$, we find that it influences only Z₂, and $ε_{Z_{5}}$, the last component of the error, influences only Z₅. This leads us to strike Z₂ and Z₅ from our list of MIIVs. The row for the Z₅ L2O equation in Table 2 shows the remaining variables, which are uncorrelated with the equation’s composite error and can serve as MIIVs for the sympathy (L₂) equation: Z₁, Z₃, Z₄, Z₈, Z₉, and Z₁₀. We also need to confirm that the MIIVs correlate with Z₂; given the direct effects linking the latent variables, and given that at least two of the MIIVs are indicators of the same latent variable as Z₂, these MIIVs are likely to meet this condition.
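The elimination process above can be automated by treating the path diagram as a directed graph and ruling out every observed variable reachable from an error term in the composite error. The sketch below is a hypothetical Python encoding of the Figure 1 structure as described in this section (Z₁ → L₁; L₁ → Z₂–Z₄, L₂, L₃; L₂ → Z₅–Z₇, L₄; L₃ → Z₈–Z₁₀, L₄; L₄ → Z₁₁–Z₁₃). It assumes every error term enters exactly one variable and that the error terms are mutually uncorrelated, so only directed paths matter; a model with correlated errors would need an extra step. It is an illustration of the reasoning, not the algorithm used by MIIVsem.

```python
# Hypothetical encoding of the Figure 1 path diagram: each node maps to the
# nodes it directly affects. Error terms are named "e_" + the variable they enter.
PATHS = {
    "Z1": ["L1"],
    "L1": ["Z2", "Z3", "Z4", "L2", "L3"],
    "L2": ["Z5", "Z6", "Z7", "L4"],
    "L3": ["Z8", "Z9", "Z10", "L4"],
    "L4": ["Z11", "Z12", "Z13"],
}
OBSERVED = [f"Z{i}" for i in range(1, 14)]


def descendants(node, paths):
    """All nodes reachable from `node` along directed paths."""
    seen, stack = set(), [node]
    while stack:
        for child in paths.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def find_miivs(composite_errors, paths=PATHS, observed=OBSERVED):
    """Observed variables not directly or indirectly affected by any error in
    the composite error (assumes mutually uncorrelated error terms)."""
    excluded = set()
    for err in composite_errors:
        target = err[2:]  # "e_L2" -> "L2": the variable this error enters
        excluded |= {target} | descendants(target, paths)
    return [z for z in observed if z not in excluded]


# Sympathy (L2) equation, composite error -B*e_Z2 + e_L2 + e_Z5:
print(find_miivs(["e_Z2", "e_L2", "e_Z5"]))  # -> ['Z1', 'Z3', 'Z4', 'Z8', 'Z9', 'Z10']
# Z3 measurement equation, composite error -Lambda*e_Z2 + e_Z3:
print(find_miivs(["e_Z2", "e_Z3"]))
```

The same call with the composite errors of any other row of Table 2 should return that row's MIIVs under these assumptions.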

Table 2.

Latent to Observed (L2O) Equations with Model Implied Instrumental Variables (MIIVs) for Each Equation

L2O Equations	MIIVs
Latent Variable Model
$Z_{2} = α_{L_{1}} + B_{L_{1} Z_{1}} Z_{1} + ε_{L_{1}} + ε_{Z_{2}}$	No MIIVs required. Use OLS.
$Z_{5} = α_{L_{2}} + B_{L_{2} L_{1}} Z_{2} - B_{L_{2} L_{1}} ε_{Z_{2}} + ε_{L_{2}} + ε_{Z_{5}}$	$Z_{1}, Z_{3}, Z_{4}, Z_{8}, Z_{9}$ , and $Z_{10}$
$Z_{8} = α_{L_{3}} + B_{L_{3} L_{1}} Z_{2} - B_{L_{3} L_{1}} ε_{Z_{2}} + ε_{L_{3}} + ε_{Z_{8}}$	$Z_{1}, Z_{3}, Z_{4}, Z_{5}, Z_{6}$ , and $Z_{7}$
$Z_{11} = α_{L_{4}} + B_{L_{4} L_{2}} Z_{5} + B_{L_{4} L_{3}} Z_{8} - B_{L_{4} L_{2}} ε_{Z_{5}} - B_{L_{4} L_{3}} ε_{Z_{8}} + ε_{L_{4}} + ε_{Z_{11}}$	$Z_{1}, Z_{2}, Z_{3}, Z_{4}, Z_{6}, Z_{7}, Z_{9}$ , and $Z_{10}$

Measurement Model
$L_{1} = Z_{2} - ε_{Z_{2}}$
$Z_{3} = α_{Z_{3}} + Λ_{31} Z_{2} - Λ_{31} ε_{Z_{2}} + ε_{Z_{3}}$	$Z_{1}, Z_{4}, Z_{5}, Z_{6}, Z_{7}, Z_{8}, Z_{9}, Z_{10}, Z_{11}, Z_{12}$ and $Z_{13}$
$Z_{4} = α_{Z_{4}} + Λ_{41} Z_{2} - Λ_{41} ε_{Z_{2}} + ε_{Z_{4}}$	$Z_{1}, Z_{3}, Z_{5}, Z_{6}, Z_{7}, Z_{8}, Z_{9}, Z_{10}, Z_{11}, Z_{12}$ and $Z_{13}$
$L_{2} = Z_{5} - ε_{Z_{5}}$
$Z_{6} = α_{Z_{6}} + Λ_{62} Z_{5} - Λ_{62} ε_{Z_{5}} + ε_{Z_{6}}$	$Z_{1}, Z_{2}, Z_{3}, Z_{4}, Z_{7}, Z_{8}, Z_{9}, Z_{10}, Z_{11}, Z_{12}$ and $Z_{13}$
$Z_{7} = α_{Z_{7}} + Λ_{72} Z_{5} - Λ_{72} ε_{Z_{5}} + ε_{Z_{7}}$	$Z_{1}, Z_{2}, Z_{3}, Z_{4}, Z_{6}, Z_{8}, Z_{9}, Z_{10}, Z_{11}, Z_{12}$ and $Z_{13}$
$L_{3} = Z_{8} - ε_{Z_{8}}$
$Z_{9} = α_{Z_{9}} + Λ_{93} Z_{8} - Λ_{93} ε_{Z_{8}} + ε_{Z_{9}}$	$Z_{1}, Z_{2}, Z_{3}, Z_{4}, Z_{5}, Z_{6}, Z_{7}, Z_{10}, Z_{11}, Z_{12}$ and $Z_{13}$
$Z_{10} = α_{Z_{10}} + Λ_{10, 3} Z_{8} - Λ_{10, 3} ε_{Z_{8}} + ε_{Z_{10}}$	$Z_{1}, Z_{2}, Z_{3}, Z_{4}, Z_{5}, Z_{6}, Z_{7}, Z_{9}, Z_{11}, Z_{12}$ and $Z_{13}$
$L_{4} = Z_{11} - ε_{Z_{11}}$
$Z_{12} = α_{Z_{12}} + Λ_{12, 4} Z_{11} - Λ_{12, 4} ε_{Z_{11}} + ε_{Z_{12}}$	$Z_{1}, Z_{2}, Z_{3}, Z_{4}, Z_{5}, Z_{6}, Z_{7}, Z_{8}, Z_{9}, Z_{10}$ , and $Z_{13}$
$Z_{13} = α_{Z_{13}} + Λ_{13, 4} Z_{11} - Λ_{13, 4} ε_{Z_{11}} + ε_{Z_{13}}$	$Z_{1}, Z_{2}, Z_{3}, Z_{4}, Z_{5}, Z_{6}, Z_{7}, Z_{8}, Z_{9}, Z_{10}$ , and $Z_{12}$


The preceding example illustrates the selection of MIIVs for a latent variable equation in the Reisenzein (1986) model. A similar process applies to equations from the measurement model. To illustrate, we consider the equation for Z₃, whose L2O transformed equation is,

Z_{3} = α_{Z_{3}} + Λ_{31} Z_{2} - Λ_{31} ε_{Z_{2}} + ε_{Z_{3}} .

(13)

Any variable directly or indirectly influenced by $ε_{Z_{2}}$ or $ε_{Z_{3}}$ is ruled out as a MIIV. The path diagram in Figure 1 shows that these errors have direct effects only on Z₂ and Z₃, respectively. All other observed variables are eligible to be MIIVs.

As this example illustrates, it is not unusual to have more MIIVs for equations from the measurement model than for equations from the latent variable model. Although researchers could use the steps illustrated here to find the MIIVs for all equations, it is more common to automate the process with MIIV software (see Bollen & Bauer, 2004; Bauldry, 2014; or Fisher et al., 2017). Table 2 lists the L2O equations and the MIIVs for each, based on the R package MIIVsem. With the MIIVs established, researchers are ready to estimate the model.

5. Estimate Model with MIIV-2SLS Estimator

With the instrumental variables available for each equation in a model, researchers can choose among a variety of instrumental variable estimators. For instance, Bollen, Kolenikov, and Bauldry (2014) propose a Generalized Method of Moments (GMM) estimator, and the econometric literature suggests a variety of k-class estimators (e.g., Nagar, 1959; Theil, 1961). The most common and most studied instrumental variable estimator to date is Two Stage Least Squares (2SLS). It was originally developed by Theil (1953a, 1953b, 1954, 1961), Basmann (1957), and Sargan (1958), and researchers have mostly applied it to simultaneous equation models or to equations in which latent variables and measurement error were not the focus. Bollen (1996a) showed how the MIIV-2SLS estimator applies both to simultaneous equations without measurement error and to latent variable SEMs such as the Reisenzein (1986) example. Researchers typically apply MIIV-2SLS one equation at a time, though a matrix formulation is available (e.g., Bollen, 2001). The estimator is easier to understand in the context of a single equation, so we focus on that case.

We return to the latent variable equation for sympathy (L₂). Its L2O version is equation (6), $Z_{5} = α_{L_{2}} + B_{L_{2} L_{1}} Z_{2} - B_{L_{2} L_{1}} ε_{Z_{2}} + ε_{L_{2}} + ε_{Z_{5}}$. We determined that the MIIVs for this equation are Z₁, Z₃, Z₄, Z₈, Z₉, and Z₁₀ (see Table 2). With six MIIVs and only one covariate (Z₂), we have five more MIIVs than the minimum needed. Here we describe 2SLS in a way that gives an intuitive understanding of how it works. The problem with using OLS on the L2O equation is that Z₂ correlates with the composite error. We can view the two stages of 2SLS as follows:

First Stage: Regress Z₂ on Z₁, Z₃, Z₄, Z₈, Z₉, and Z₁₀ using OLS and form the predicted value of the dependent variable, ${\hat{Z}}_{2}$ .

Second Stage: Regress Z₅ on ${\hat{Z}}_{2}$ using OLS.

The intercept and coefficient from the second stage are consistent and asymptotically unbiased estimators of $α_{L_{2}}$ and $B_{L_{2} L_{1}}$ from the original latent variable sympathy equation. One way to understand how MIIV-2SLS works is to recognize that ${\hat{Z}}_{2}$ is a linear, weighted combination of Z₁, Z₃, Z₄, Z₈, Z₉, and Z₁₀ from the first stage. Because each of these MIIVs is uncorrelated with the composite error, ${\hat{Z}}_{2}$, a linear combination of them, is asymptotically (“in large samples”) uncorrelated with the same error. In effect, the MIIVs purge from Z₂ the part that correlates with the error: ${\hat{Z}}_{2}$ is the part of Z₂ that is uncorrelated with the error from equation (6), and this justifies the use of OLS in the second stage. However, the standard errors from the second-stage OLS regression are not suitable for significance tests, because they are computed from residuals that use ${\hat{Z}}_{2}$ rather than Z₂; they must be adjusted. 2SLS procedures such as the one in MIIVsem make these adjustments automatically.
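The two stages can be sketched in Python with NumPy. This is a minimal illustration on simulated data, not the MIIVsem implementation: the data-generating values (α = 9.5, B = −0.7, loadings of 1 and 1.1, and the noise scales) are hypothetical, chosen only to mimic a latent variable equation with one scaling indicator per factor and extra indicators of L₁ serving as MIIVs. The standard errors use residuals computed with the original covariate rather than its first-stage prediction, which is the adjustment described above, assuming homoskedastic errors.

```python
import numpy as np


def miiv_2sls(y, x, miivs):
    """2SLS for an L2O equation y = a + b*x + u, with MIIVs as instruments.

    y, x: arrays of shape (n,); miivs: array of shape (n, k), k >= 1.
    Returns (coefficients [a, b], standard errors [se_a, se_b]).
    """
    n = len(y)
    ones = np.ones((n, 1))
    V = np.column_stack([ones, miivs])  # first-stage regressors
    X = np.column_stack([ones, x])      # original right-hand side

    # First stage: OLS of x on the MIIVs, keeping the predicted values x_hat.
    gamma, *_ = np.linalg.lstsq(V, x, rcond=None)
    X_hat = np.column_stack([ones, V @ gamma])

    # Second stage: OLS of y on x_hat.
    coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

    # Adjusted standard errors: residuals use the ORIGINAL x, not x_hat.
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    return coef, np.sqrt(np.diag(cov))


# Simulated data under hypothetical parameter values.
rng = np.random.default_rng(2021)
n = 5000
L1 = rng.normal(size=n)
L2 = 9.5 - 0.7 * L1 + rng.normal(scale=0.8, size=n)
x = L1 + rng.normal(scale=0.6, size=n)          # scaling indicator of L1 (like Z2)
v1 = L1 + rng.normal(scale=0.6, size=n)         # other indicators of L1 act as MIIVs
v2 = 1.1 * L1 + rng.normal(scale=0.6, size=n)
y = L2 + rng.normal(scale=0.6, size=n)          # scaling indicator of L2 (like Z5)

coef, se = miiv_2sls(y, x, np.column_stack([v1, v2]))
ols_slope = np.polyfit(x, y, 1)[0]              # OLS is attenuated toward zero
print("2SLS:", coef.round(3), "SE:", se.round(3), "OLS slope:", round(ols_slope, 3))
```

With these settings the 2SLS slope should land near −0.7, while the OLS slope of Z₅-like `y` on Z₂-like `x` is pulled toward zero because `x` contains measurement error.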

The MIIV-2SLS estimates of the equation for sympathy (L₂) taken from the R package MIIVsem are (standard errors in parentheses beneath the estimates),

L_{2} = \underset{(0.47)}{9.56} - \underset{(0.09)}{0.72} L_{1} + {\hat{ε}}_{L_{2}}

(14)

The coefficient of controllability (L₁) is negative and significant: higher levels of perceived controllability of the individual result in lower levels of sympathy (L₂), as the theory predicts.

The same procedure is followed for all equations in the model. MIIVsem automates both the selection of the MIIVs and the computation of the MIIV-2SLS estimates for all equations. Table 3 lists the MIIV-2SLS results for this model from MIIVsem. As expected, all factor loading estimates are positive and statistically significant (p < 0.01) and, with the exception of the loading for Z₁₃, similar in magnitude. Similarly, the coefficients of the latent variable model are in the hypothesized direction and statistically significant (p < 0.01). For instance, the higher the perceived controllability (L₁), the lower the sympathy (L₂) and the higher the anger (L₃). Overall, the coefficient estimates are consistent with the theory.

Table 3.

Model Implied Instrumental Variable, Two Stage Least Squares (MIIV-2SLS) Estimates of Reisenzein’s (1986) Measurement Model and Latent Variable Model

Measurement Model

Latent Variable	Indicator	Estimate	SE
Controllability (L1)	Controllable (Z2)	1.00
	Responsible (Z3)	1.05	0.08
	Fault (Z4)	1.15	0.09
Sympathy (L2)	Sympathy (Z5)	1.00
	Pity (Z6)	0.72	0.07
	Concern (Z7)	0.72	0.06
Anger (L3)	Angry (Z8)	1.00
	Irritated (Z9)	0.90	0.07
	Aggravated (Z10)	0.89	0.07
Help (L4)	Likelihood (Z11)	1.00
	Certainty (Z12)	1.10	0.06
	Type (Z13)	0.43	0.03
Latent Variable Model

Dependent Variable	Explanatory Variable	Estimate	SE

Controllability (L1)	Eliciting Situation (Z1)	3.83	0.32
Sympathy (L2)	Controllability (L1)	−0.72	0.09
Anger (L3)	Controllability (L1)	0.64	0.08
Help (L4)	Sympathy (L2)	0.43	0.08
Help (L4)	Anger (L3)	−0.40	0.09
Intercepts

Dependent Variable		Estimate	SE

Controllability (L1)		−0.98	0.49
Sympathy (L2)		9.56	0.47
Anger (L3)		−0.19	0.40
Help (L4)		4.73	0.70
Controllable (Z2)		0.00	0.00
Responsible (Z3)		0.36	0.41
Fault (Z4)		−0.95	0.45
Sympathy (Z5)		0.00	0.00
Pity (Z6)		1.87	0.45
Concern (Z7)		1.95	0.40
Angry (Z8)		0.00	0.00
Irritated (Z9)		0.86	0.25
Aggravated (Z10)		0.68	0.24
Likelihood (Z11)		0.00	0.00
Certainty (Z12)		−0.97	0.38
Type (Z13)		0.70	0.22


With the estimates in hand, we are able to test the overidentified equations in the model.

6. Test Overidentified Equations

Analysts of SEMs are familiar with the likelihood ratio (LR) test of overall model fit (e.g., Jöreskog, 1973; 1977). The LR chi square compares the fit of the hypothesized model to that of a saturated model, which has as many parameters as there are means, variances, and covariances of the observed variables. We can treat the LR chi square test as a test of the overidentification constraints in the model. In an overidentified model, some parameters can be solved for in two or more ways from the observed variables’ means, variances, and covariances. If the model is valid, these different solutions for the same parameter should lead to the same value in the population. A significant LR chi square test implies that these overidentification constraints do not hold as they should if the model were correct.

Similar chi square tests for the whole model can be developed for the MIIV-2SLS estimator (see Bollen & Maydeu-Olivares, 2007), but a more interesting option is that the MIIV-2SLS estimator has a chi square diagnostic test for each overidentified equation. Just as the LR chi square tests the overidentification constraints of the whole model, the chi square test for an overidentified equation tests the overidentification constraints of that equation. The test only applies when we have more than the minimum number of MIIVs for the equation. For instance, if we have one covariate and it correlates with the equation error, we need at least two MIIVs to apply this diagnostic test; with only one MIIV, the test is unavailable.

Kirby and Bollen (2009) reviewed a number of overidentification tests for equations. They found that the Sargan (1958) test had the best overall performance in their Monte Carlo simulation study. The Sargan test (T_Sargan) is,

T_{\text{Sargan}} = N R_{\hat{u}}^{2}

(15)

where N is the sample size, $\hat{u}$ is the residual from the MIIV-2SLS L2O equation, and $R_{\hat{u}}^{2}$ is the R-squared value from regressing $\hat{u}$ on all of the MIIVs for that equation.

T_Sargan approximates a chi square variate in large samples, with degrees of freedom equal to the number of MIIVs in excess of the minimum needed for the equation. The null hypothesis is that all overidentification constraints of the equation hold; the alternative hypothesis is that at least one of them does not. Rejecting the null hypothesis is evidence that some aspect of the model is in error. This follows because the model structure determined the MIIVs: if there is evidence against the MIIVs, there is likely a structural misspecification in the model that led to one or more instruments failing to meet the conditions of an instrumental variable.

Returning to the equation for sympathy (L₂), the L2O equation is $Z_{5} = α_{L_{2}} + B_{L_{2} L_{1}} Z_{2} - B_{L_{2} L_{1}} ε_{Z_{2}} + ε_{L_{2}} + ε_{Z_{5}}$ and the MIIVs are Z₁, Z₃, Z₄, Z₈, Z₉, and Z₁₀. We have six MIIVs and only one right-hand-side covariate (Z₂), so our chi square test has 5 (= 6 − 1) degrees of freedom (df). The residuals ($\hat{u}$) are generated by,

{\hat{u}}_{i} = Z_{5 i} - {\hat{α}}_{L_{2}} - {\hat{B}}_{L_{2} L_{1}} Z_{2 i} .

(16)

We then regress ${\hat{u}}_{i}$ on $Z_{1}, Z_{3}, Z_{4}, Z_{8}, Z_{9}$, and $Z_{10}$. The $R_{\hat{u}}^{2}$ from this regression multiplied by N gives the Sargan test statistic of equation (15). Using the R package MIIVsem, we find that the Sargan chi square is 10.44 with 5 df. The unadjusted p-value is about 0.06, and the Holm-adjusted p-value reported in Table 4 is 0.51. This nonsignificant test statistic means we cannot reject the null hypothesis that all overidentification constraints for these MIIVs hold.
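Equations (15) and (16) can be sketched in Python as follows. This is a minimal illustration on simulated data, not the MIIVsem code; the data-generating values and the cross-loading used to create an invalid instrument are hypothetical. With three MIIVs and one covariate the test has 2 df, for which the chi-square upper-tail probability has the closed form exp(−T/2); for other df one would use a chi-square survival function such as `scipy.stats.chi2.sf`.

```python
import math

import numpy as np


def sargan_test(y, x, miivs):
    """Sargan statistic N * R^2 for the L2O equation y = a + b*x + u (one covariate)."""
    n = len(y)
    ones = np.ones((n, 1))
    V = np.column_stack([ones, miivs])
    X = np.column_stack([ones, x])
    X_hat = V @ np.linalg.lstsq(V, X, rcond=None)[0]   # first stage
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]    # second stage
    u = y - X @ coef                                   # residuals, as in equation (16)
    u_fit = V @ np.linalg.lstsq(V, u, rcond=None)[0]   # regress u on all MIIVs
    r2 = 1.0 - np.sum((u - u_fit) ** 2) / np.sum((u - u.mean()) ** 2)
    df = miivs.shape[1] - 1                            # excess MIIVs over one covariate
    return n * r2, df


rng = np.random.default_rng(1986)
n = 5000
L1 = rng.normal(size=n)
e_L2 = rng.normal(scale=0.8, size=n)
L2 = 9.5 - 0.7 * L1 + e_L2
x = L1 + rng.normal(scale=0.6, size=n)
y = L2 + rng.normal(scale=0.6, size=n)
v1 = L1 + rng.normal(scale=0.6, size=n)
v2 = 1.1 * L1 + rng.normal(scale=0.6, size=n)
v3_valid = 0.9 * L1 + rng.normal(scale=0.6, size=n)
# Hypothetical omitted cross-loading on L2 makes v3 correlate with e_L2.
v3_invalid = 0.9 * L1 + 0.5 * L2 + rng.normal(scale=0.6, size=n)

results = {}
for label, v3 in [("valid", v3_valid), ("invalid", v3_invalid)]:
    results[label] = sargan_test(y, x, np.column_stack([v1, v2, v3]))
    t, df = results[label]
    print(label, round(t, 2), df, "p =", round(math.exp(-t / 2), 4))  # df = 2 only
```

The statistic for the valid MIIV set should behave like a chi-square draw with 2 df, whereas the invalid instrument should inflate it far beyond conventional critical values.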

To give readers more intuition about the overidentification test, we estimated the sympathy (L₂) equation two additional times, first using Z₃ as the only MIIV and then using Z₄ as the only MIIV. The respective estimates of $B_{L_{2} L_{1}}$ are −0.705 (s.e. = 0.100) and −0.846 (s.e. = 0.107). If the null hypothesis of the overidentification test is true, the population value of $B_{L_{2} L_{1}}$ should be the same regardless of the MIIV(s) used. Although the estimates of $B_{L_{2} L_{1}}$ differ, the nonsignificant test statistic reported above means that we cannot reject the null hypothesis that the difference is due only to sampling error. Comparing these standard errors with the standard error obtained from all available MIIVs in Table 3 (0.09) shows that the latter is somewhat smaller, illustrating another advantage of using all the MIIVs.
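With a single MIIV, the 2SLS estimate reduces to a ratio of covariances, $\hat{B}_{L_{2} L_{1}} = \operatorname{cov}(V, Z_{5}) / \operatorname{cov}(V, Z_{2})$, where V is the lone MIIV. The sketch below applies this ratio to the rounded sample covariances reported in the section on analysis using the covariance matrix and means, for each of the six MIIVs in turn. Because those covariances are rounded to two decimals, the Z₃ and Z₄ results only approximately match the raw-data estimates above; the spread across MIIVs is what the overidentification test evaluates.

```python
# Rounded sample covariances of each MIIV with Z5 (dependent variable)
# and with Z2 (covariate), taken from the covariance matrix S.
COV_WITH_Z5 = {"Z1": -0.69, "Z3": -3.98, "Z4": -4.91, "Z8": -2.29, "Z9": -2.21, "Z10": -1.67}
COV_WITH_Z2 = {"Z1": 0.96, "Z3": 5.65, "Z4": 5.81, "Z8": 3.62, "Z9": 3.03, "Z10": 3.12}


def single_miiv_estimate(miiv):
    """Just-identified IV estimate of B_{L2 L1} using one MIIV."""
    return COV_WITH_Z5[miiv] / COV_WITH_Z2[miiv]


estimates = {v: round(single_miiv_estimate(v), 3) for v in COV_WITH_Z5}
print(estimates)
```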

All the equations except controllability (L₁) are overidentified: the number of MIIVs exceeds the minimum needed. The last three columns of Table 4 list the Sargan test, df, and p-value for all the overidentified equations. Because of the large number of tests, the p-values use the Holm (1979) adjustment for multiple testing, an option available in MIIVsem.
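The Holm (1979) adjustment is simple enough to sketch directly. The Python below, a minimal illustration rather than MIIVsem's code, takes the chi-square values and df from Table 4, computes unadjusted upper-tail probabilities with closed-form chi-square survival functions for integer df, and then applies Holm's step-down rule. Up to rounding, it should reproduce the adjusted p-values in Table 4; for instance, the sympathy (L₂) equation's unadjusted p-value of about 0.06 becomes 0.51 after adjustment.

```python
import math

# Sargan chi-square and df for each overidentified equation (Table 4).
SARGAN = {
    "Responsible (Z3)": (12.08, 10),
    "Fault (Z4)": (11.55, 10),
    "Pity (Z6)": (10.42, 10),
    "Concern (Z7)": (26.68, 10),
    "Irritated (Z9)": (5.28, 10),
    "Aggravated (Z10)": (9.27, 10),
    "Certainty (Z12)": (7.63, 10),
    "Type (Z13)": (11.85, 10),
    "Sympathy (L2)": (10.44, 5),
    "Anger (L3)": (14.64, 5),
    "Help (L4)": (16.02, 6),
}


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variate with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_j x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values for a dict of unadjusted p-values."""
    ordered = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ordered)
    adjusted, running_max = {}, 0.0
    for rank, (name, p) in enumerate(ordered):
        running_max = max(running_max, min(1.0, (m - rank) * p))
        adjusted[name] = running_max
    return adjusted


raw = {name: chi2_sf(x, df) for name, (x, df) in SARGAN.items()}
adjusted = holm_adjust(raw)
for name in SARGAN:
    print(f"{name:18s} raw p = {raw[name]:.3f}  Holm p = {adjusted[name]:.2f}")
```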

Table 4.

Sargan Tests for All Overidentified Equations in Reisenzein’s (1986) Model

Measurement Model

Latent Variable	Indicator	Chi-square	df	p-value
Controllability (L1)	Responsible (Z3)	12.08	10	1.00
	Fault (Z4)	11.55	10	1.00
Sympathy (L2)	Pity (Z6)	10.42	10	1.00
	Concern (Z7)	26.68	10	0.03
Anger (L3)	Irritated (Z9)	5.28	10	1.00
	Aggravated (Z10)	9.27	10	1.00
Help (L4)	Certainty (Z12)	7.63	10	1.00
	Type (Z13)	11.85	10	1.00
Latent Variable Model

Dependent Variable	Explanatory Variable	Chi-square	df	p-value

Sympathy (L2)	Controllability (L1)	10.44	5	0.51
Anger (L3)	Controllability (L1)	14.64	5	0.12
Help (L4)	Sympathy (L2) and Anger (L3)	16.02	6	0.12


After the multiple testing adjustment, the only equation with a statistically significant (p < 0.05) Sargan test statistic is the indicator equation for Concern ($Z_{7}$) in the measurement model. This is evidence that one or more of the MIIVs for this equation correlate with the error term of the indicator’s L2O equation. Because the MIIVs derive from the model specification, the test statistic suggests a misspecification in the model that affects the MIIV selection for this equation. One possible misspecification involving Z₇ stems from its question wording (“How much concern would you feel for this person?”): Z₇ may cross-load on Help (L₄), because concern could tap a respondent’s desire to help. If Z₇ does in fact cross-load on Help (L₄) but this is left unmodeled, Z₁₁ is incorrectly included as a MIIV for the Z₇ equation. Interestingly, when this cross-loading is included in the model specification, the Sargan test statistic for the Z₇ equation is no longer statistically significant.

Chi Square Difference Tests

There are times when we would like to compare two model structures that lead to different sets of MIIVs. For example, for the sympathy (L₂) equation, suppose we suspected that the original latent variable model should include correlated errors between $ε_{Z_{5}}$ and $ε_{Z_{4}}$. The L2O equation for sympathy (L₂) is $Z_{5} = α_{L_{2}} + B_{L_{2} L_{1}} Z_{2} - B_{L_{2} L_{1}} ε_{Z_{2}} + ε_{L_{2}} + ε_{Z_{5}}$, and the MIIVs for the equation without the correlated errors are Z₁, Z₃, Z₄, Z₈, Z₉, and Z₁₀. If the correlated errors were in the model, the L2O equation would be the same, but the MIIVs would exclude Z₄: Z₁, Z₃, Z₈, Z₉, and Z₁₀. We could test the hypothesis of correlated errors between $ε_{Z_{5}}$ and $ε_{Z_{4}}$ by using MIIV-2SLS to estimate the L2O equation twice, once with the full set of MIIVs (Z₁, Z₃, Z₄, Z₈, Z₉, and Z₁₀) and once with the MIIVs omitting Z₄, and computing the Sargan test statistic for each. Subtracting the T_Sargan for the reduced set of MIIVs from the T_Sargan for the full set forms a chi square difference test with one degree of freedom. The null hypothesis is that there are no correlated errors between $ε_{Z_{5}}$ and $ε_{Z_{4}}$, and the alternative hypothesis is that there are. A significant chi square difference test supports the correlated errors hypothesis.

More generally, if two different models lead to two different sets of MIIVs for the same equation and one set of MIIVs is a subset of the other, we can compare the models with this chi square difference test. Or if we are concerned that one or more MIIVs are not appropriate for an equation, we can test that idea with this chi square difference test.
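A minimal sketch of the difference test, assuming the two Sargan statistics have already been computed (for example, with MIIVsem) for the full and the reduced MIIV sets. It covers only the one-degree-of-freedom case discussed above, where the chi-square upper-tail probability is erfc(√(d/2)) for a difference d; the function name and arguments are illustrative, not part of any package, and the statistics in the usage line are hypothetical values chosen so that the difference sits near the 5% critical value for 1 df.

```python
import math


def sargan_difference_test(t_full, df_full, t_reduced, df_reduced):
    """Chi-square difference test comparing nested MIIV sets.

    t_full, df_full: Sargan statistic and df with the full set of MIIVs.
    t_reduced, df_reduced: the same with the suspect MIIV removed.
    Only the 1-df case (one MIIV dropped) is handled in this sketch.
    """
    diff = t_full - t_reduced
    ddf = df_full - df_reduced
    if ddf != 1:
        raise ValueError("this sketch handles only a one-MIIV (1 df) difference")
    # Guard against a small negative difference, which can occur in finite samples.
    p_value = math.erfc(math.sqrt(max(diff, 0.0) / 2.0))
    return diff, ddf, p_value


# Hypothetical statistics: a difference of 3.84 on 1 df gives p close to 0.05.
print(sargan_difference_test(t_full=13.84, df_full=5, t_reduced=10.0, df_reduced=4))
```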

This concludes the six main steps of the MIIV approach to SEM. These steps hold whether we are dealing with a confirmatory factor analysis, a simultaneous equation system, or a general latent variable SEM. We now turn to several extensions and uses of the MIIV-2SLS estimator: its robustness to structural misspecifications, analysis based on the means and covariance matrix of the observed variables in place of raw data, the incorporation of categorical endogenous variables, multilevel analysis, and the role of MIIVs in causal inference. We begin with an overview of the robustness of MIIV-2SLS to errors in specification.

Robustness to Structural Misspecification

No estimator completely escapes the biasing effects of structural misspecifications. However, the MIIV-2SLS estimator appears more robust than the ML estimator to violations of distributional assumptions (e.g., normality) and more robust than ML and DWLS to structural misspecifications; these are the two primary types of robustness that researchers discuss in the SEM literature.

Most estimators for SEMs have methods to account for distributional misspecifications. For example, ML remains a consistent estimator when distributional assumptions fail (Browne, 1984), and we can adjust its standard errors to take account of nonnormality. Distributional misspecifications are less of an issue for the MIIV-2SLS estimator because its asymptotic properties are “distribution-free” (Bollen, 1996a): they do not assume that the observed variables come from normal distributions.

The literature on robustness to structural misspecifications has focused mostly on the system-wide ML estimator, where misspecifications have been found to affect both consistency (e.g., Yuan et al., 2003) and significance tests (e.g., Kolenikov & Bollen, 2012). MIIV-2SLS is less affected by structural misspecifications because it is equation oriented. More specifically, Bollen (2001) proposed that the MIIV-2SLS estimator for a specific equation remains robust if two conditions are met: the equation being estimated is correctly specified, and the MIIVs for that equation are not changed by the structural misspecifications in other equations.

Table 5 summarizes more specific robustness conditions from Bollen, Gates, and Fisher (2018) and Bollen (2020b). These articles describe the conditions in more detail; here we use two simulation examples to illustrate some of them. Unlike real empirical data, simulations give us the data generation model (DGM), which allows us to compare an incorrectly specified model to the true model. Because the analytic conditions of robustness hold for MIIV-2SLS at any sample size and for any model, there is no need for an extensive Monte Carlo simulation with multiple replications; the two examples are intended only to illustrate these analytic conditions.

Table 5.

MIIV-2SLS Robustness Conditions for Structural Misspecifications (Bollen et al., 2018; Bollen, 2020b)¹

1.	The MIIV-2SLS estimator of the coefficients of the measurement model is robust-unchanged to structural misspecifications in the latent variable model.
2.	The MIIV-2SLS estimator of the coefficients of the latent variable model is robust-unchanged to having the incorrect covariance matrices among the unique factors (errors) of the measurement model for the nonscaling indicators.
3.	The MIIV-2SLS estimator of the coefficients of the latent variable model is robust-unchanged to omitted cross-loadings as long as the omitted nonzero cross-loadings when present do not create indirect paths from the latent variable equation error to one or more of the MIIVs from the misspecified model.
4.	The MIIV-2SLS estimator of the coefficients of the latent variable model is robust-unchanged to the omission of correlated unique factors (errors) for the scaling indicators of latent variables that are included in the true latent variable equation.
5.	The MIIV-2SLS estimator of the coefficients of the latent variable model is not robust-unchanged to omitted cross-loadings if the omitted cross-loadings when present create nonzero indirect paths from the latent variable equation error to one or more of the MIIVs from the misspecified model.
6.	The MIIV-2SLS estimator is robust-unchanged to the omission of any observed variable that is not part of the L2O transformed equation and is not among the MIIVs for that equation.
7.	The MIIV-2SLS estimator of factor loadings in an indicator equation is robust-unchanged when the indicator equation is correct and none of the errors in the L2O version of the indicator equation correlate with other indicators in the model.
8.	The MIIV-2SLS estimator of an indicator equation is robust-unchanged under the conditions given in 7, even when among the remaining indicators there are: a) omitted correlated errors, b) omitted cross-loadings, c) causal indicators treated as reflective, or d) direct effects between these indicators.
9.	The MIIV-2SLS estimator of the coefficients of an indicator equation is not robust-unchanged to: a) omitted correlated errors involving the indicator or scaling indicator with the errors of other indicators, b) omitted cross-loadings that directly affect the indicator or scaling indicator, c) mistakenly treating the indicator or scaling indicator as reflective when one or both are causal indicators, or d) omitted direct effects between the scaling indicator or indicator of interest and one or more remaining indicators.


¹ Robust-unchanged refers to the situation where the estimates of one or more parameters are identical for two different model structures. See Bollen et al. (2018) for further discussion of the meaning of robust to structural misspecifications.

Rule 1 states that structural misspecifications in the latent variable model do not affect the parameter estimates of the measurement model. To illustrate this, we examine a structural misspecification in the latent variable model. Figure 2 represents both the true DGM and the misspecified model that we estimate.⁸ Both models share the same measurement model. The solid and dashed lines in Figure 2 together form the DGM, in which the latent variable model has $L_{1} \to L_{2} \to L_{3}$ and $L_{1} \to L_{3}$. The misspecified model omits the dashed line, so its latent variable model has only $L_{1} \to L_{2} \to L_{3}$. According to Rule 1, all the measurement model parameter estimates from MIIV-2SLS should be the same in both models, even though the latent variable coefficients will differ. Table 6 includes the coefficients from these two models, estimated using both MIIV-2SLS and Maximum Likelihood (ML). It shows that the MIIV-2SLS estimates of the factor loadings are unchanged (highlighted in gray) between the DGM and the misspecified latent variable model. In contrast, the ML estimate of the factor loading for Z₂ declines by about 30%, the factor loading for Z₄ increases by about 7%, and the factor loading for Z₆ increases by about 28%, even though the measurement model is correctly specified.

Figure 2. True and misspecified model for simulation 1.

Table 6.

Comparison of coefficients under data generating and misspecified models in Figure 2 using MIIV-2SLS and ML estimators in simulation 1.

Coefficient estimates	Data generation model		Misspecified model

	MIIV-2SLS	ML	MIIV-2SLS	ML

Z₁ on L₁	1	1	1	1
Z₂ on L₁	0.796	0.807	0.796	0.565
Z₃ on L₂	1	1	1	1
Z₄ on L₂	0.794	0.799	0.794	0.853
Z₅ on L₃	1	1	1	1
Z₆ on L₃	0.815	0.833	0.815	1.070
L₂ on L₁	0.158	0.258	0.158	0.297
L₃ on L₁	0.876	0.772	NA	NA
L₃ on L₂	0.131	0.265	0.480	0.500


This shows that even if the researcher fits a latent variable model with an omitted path, the MIIV-2SLS measurement model estimates are unaffected, whereas we cannot guarantee the same for ML. Rule 1 applies beyond this example: structural misspecifications in the latent variable model do not affect the MIIV-2SLS measurement model estimates. A useful consequence is that if the Sargan (1958) chi square test detects problems with an equation from the measurement model, this robustness condition tells us that the structural misspecification is located in the measurement model rather than in the latent variable model.
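The mechanism behind Rule 1 can be seen by finding the MIIVs of each equation under both models. The sketch below applies the directed-graph elimination idea to a hypothetical encoding of Figure 2 as described in the text (Z₁ and Z₂ load on L₁, Z₃ and Z₄ on L₂, Z₅ and Z₆ on L₃, with Z₁, Z₃, and Z₅ as scaling indicators), assuming mutually uncorrelated errors. Under these assumptions the MIIVs of the Z₂ measurement equation, and of the L₂ equation, are the same in both models, so MIIV-2SLS must return identical estimates for them, consistent with Table 6; the MIIVs of the L₃ equation differ, so its estimates can change.

```python
def descendants(node, paths):
    """All nodes reachable from `node` along directed paths."""
    seen, stack = set(), [node]
    while stack:
        for child in paths.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def find_miivs(composite_errors, paths, observed):
    """Observed variables untouched by every error term in the composite error."""
    excluded = set()
    for err in composite_errors:
        target = err[2:]  # "e_L3" -> "L3"
        excluded |= {target} | descendants(target, paths)
    return [z for z in observed if z not in excluded]


OBSERVED = [f"Z{i}" for i in range(1, 7)]
MEASUREMENT = {"L1": ["Z1", "Z2"], "L2": ["Z3", "Z4"], "L3": ["Z5", "Z6"]}
DGM = {**MEASUREMENT, "L1": ["Z1", "Z2", "L2", "L3"], "L2": ["Z3", "Z4", "L3"]}
MISSPECIFIED = {**MEASUREMENT, "L1": ["Z1", "Z2", "L2"], "L2": ["Z3", "Z4", "L3"]}

# Z2 equation: composite error -lambda*e_Z1 + e_Z2 in both models.
z2_dgm = find_miivs(["e_Z1", "e_Z2"], DGM, OBSERVED)
z2_mis = find_miivs(["e_Z1", "e_Z2"], MISSPECIFIED, OBSERVED)
# L2 equation (scaled by Z3, covariate Z1): same composite error in both models.
l2_dgm = find_miivs(["e_Z1", "e_L2", "e_Z3"], DGM, OBSERVED)
l2_mis = find_miivs(["e_Z1", "e_L2", "e_Z3"], MISSPECIFIED, OBSERVED)
# L3 equation: Z1 is a covariate (and e_Z1 in the error) only in the DGM.
l3_dgm = find_miivs(["e_Z1", "e_Z3", "e_L3", "e_Z5"], DGM, OBSERVED)
l3_mis = find_miivs(["e_Z3", "e_L3", "e_Z5"], MISSPECIFIED, OBSERVED)
print(z2_dgm == z2_mis, l2_dgm == l2_mis, l3_dgm, l3_mis)
```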

Another very common model misspecification is the omission of correlated errors. We demonstrate Rules 2 and 4 from Table 5 with the models in Figure 3. The true DGM includes both the solid and dashed lines, and the incorrect model omits the dashed lines.⁹ The incorrectly specified model thus omits the correlated measurement errors, as might occur with longitudinal data in which the same indicators are used at three waves. As in simulation 1, we estimated both the DGM and the misspecified model with omitted correlated errors, using both MIIV-2SLS and ML. Table 7 summarizes the coefficient estimates from both models using both methods. Although the MIIV-2SLS estimates in the measurement model differ, the MIIV-2SLS coefficient estimates for the latent variable model remain unchanged. In contrast, the ML estimates of the $L_{1} \to L_{2}$ and $L_{2} \to L_{3}$ coefficients are inflated by about 17% and 11%, respectively, even though the latent variable model is correctly specified. The code for data generation and analyses for both simulations can be found in the Supplementary Material.

Figure 3. True and misspecified model for simulation 2.

Table 7.

Comparison of coefficients under data generating and misspecified models in Figure 3 using MIIV-2SLS and ML estimators in simulation 2.

Coefficient estimates	Data generation model		Misspecified model

	MIIV-2SLS	ML	MIIV-2SLS	ML

Z₁ on L₁	1	1	1	1
Z₂ on L₁	0.752	0.733	0.823	0.813
Z₃ on L₁	0.816	0.814	0.890	0.891
Z₄ on L₂	1	1	1	1
Z₅ on L₂	0.825	0.803	0.770	0.868
Z₆ on L₂	0.793	0.816	0.732	0.859
Z₇ on L₃	1	1	1	1
Z₈ on L₃	0.867	0.880	0.764	0.896
Z₉ on L₃	0.915	0.917	0.764	0.903
L₂ on L₁	0.570	0.523	0.570	0.610
L₃ on L₂	0.495	0.644	0.495	0.712


The two examples of structural misspecifications provided illustrate some of the MIIV-2SLS robustness conditions summarized in Table 5. Many other examples are possible, but these suffice to illustrate that when using MIIV-2SLS we have important guiding principles on what does and does not impact a given equation. Importantly, these examples demonstrate when MIIV-2SLS can isolate specification errors. Furthermore, when our diagnostics point to problems, these conditions tell us where in the model the problems lie. It also is valuable to note that these conditions hold for both continuous and categorical endogenous variable models.

In the next section we move from specification analysis to ways in which we can apply MIIV-2SLS to summary statistics rather than raw data.

Analysis Using Covariance Matrix and Means

Conventional SEM analysis sometimes analyzes the sample covariance matrix and sample mean vector of the observed variables in place of the raw data. The presentation so far has focused on raw data analysis, but MIIV-2SLS applies equally to covariance matrices and means. In fact, we can obtain all of the quantities discussed thus far from the first two moments of the sample data. The full covariance matrix (lower triangle, with variables ordered Z₁ to Z₁₃) and mean vector for the Reisenzein (1986) example are below,

S = \begin{bmatrix} 0.25 \\ 0.96 & 7.12 \\ 1.05 & 5.65 & 8.05 \\ 1.16 & 5.81 & 6.55 & 8.96 \\ -0.69 & -3.61 & -3.98 & -4.91 & 6.98 \\ -0.52 & -2.98 & -2.96 & -3.35 & 4.69 & 6.05 \\ -0.58 & -2.49 & -2.99 & -3.50 & 4.33 & 3.31 & 5.04 \\ 0.69 & 3.62 & 3.27 & 4.33 & -2.29 & -1.45 & -1.98 & 5.57 \\ 0.57 & 3.03 & 3.00 & 3.82 & -2.21 & -1.26 & -1.74 & 3.93 & 5.33 \\ 0.57 & 3.12 & 2.84 & 3.49 & -1.67 & -1.40 & -1.56 & 3.92 & 3.60 & 4.98 \\ -0.65 & -2.75 & -3.37 & -3.64 & 3.01 & 2.26 & 3.24 & -2.53 & -2.46 & -2.34 & 4.98 \\ -0.69 & -2.70 & -3.36 & -3.78 & 3.20 & 2.25 & 3.34 & -2.85 & -2.78 & -2.81 & 4.82 & 6.24 \\ -0.26 & -0.92 & -1.21 & -1.50 & 0.95 & 0.75 & 1.27 & -1.04 & -0.89 & -0.91 & 1.87 & 2.12 & 1.30 \end{bmatrix}

\bar{Z} = [\begin{matrix} 1.47 & 4.65 & 5.23 & 4.41 & 6.23 & 6.38 & 6.44 & 2.79 & 3.36 & 3.15 & 6.28 & 5.93 & 3.38 \end{matrix}]

To provide an illustration of how we calculate the MIIV-2SLS coefficients using these quantities, we return to the sympathy (L₂) equation from the previous section. To recap, the L2O equation for the sympathy equation is $Z_{5} = α_{L_{2}} + B_{L_{2} L_{1}} Z_{2} - B_{L_{2} L_{1}} ε_{Z_{2}} + ε_{L_{2}} + ε_{Z_{5}}$ and the MIIVs are Z₁, Z₃, Z₄, Z₈, Z₉, and Z₁₀. Now let $S_{V_{L_{2}} V_{L_{2}}}$ be the sample covariance matrix for the MIIVs of the L₂ equation,

S_{V_{L_{2}} V_{L_{2}}} = [\begin{matrix} 0.25 \\ 1.05 & 8.05 \\ 1.16 & 6.55 & 8.96 \\ 0.69 & 3.27 & 4.33 & 5.57 \\ 0.57 & 3.00 & 3.82 & 3.93 & 5.33 \\ 0.57 & 2.84 & 3.49 & 3.92 & 3.60 & 4.98 \end{matrix}]

(17)

And let $S_{Z_{L_{2}} V_{L_{2}}}$ be the elements of the sample covariance matrix S with rows corresponding to the explanatory variables in the L2O version of the L₂ equation and columns corresponding to the MIIVs for that same equation, such that $S_{Z_{L_{2}} V_{L_{2}}} = [\begin{matrix} 0.96 & 5.65 & 5.81 & 3.62 & 3.03 & 3.12 \end{matrix}]$. Similarly, let $S_{V_{L_{2}} Y_{L_{2}}} = [\begin{matrix} -0.69 & -3.98 & -4.91 & -2.29 & -2.21 & -1.67 \end{matrix}]^{'}$ be the (column) vector of sample covariances between the MIIVs and the dependent variable from the L2O version of the L₂ equation. With these quantities in hand, we can calculate the coefficient $B_{L_{2} L_{1}}$ as,

{\hat{B}}_{L_{2} L_{1}} = {(S_{Z_{L_{2}} V_{L_{2}}} S_{V_{L_{2}} V_{L_{2}}}^{- 1} S_{Z_{L_{2}} V_{L_{2}}}^{'})}^{- 1} S_{Z_{L_{2}} V_{L_{2}}} S_{V_{L_{2}} V_{L_{2}}}^{- 1} S_{V_{L_{2}} Y_{L_{2}}} = - 0.72

We also calculate the intercept for the L₂ equation, $α_{L_{2}}$, using the sample means and the estimated coefficient ${\hat{B}}_{L_{2} L_{1}}$ as follows,

{\hat{α}}_{L_{2}} = {\hat{μ}}_{Y_{L_{2}}} - {\hat{μ}}_{Z_{L_{2}}} {\hat{B}}_{L_{2} L_{1}} = 9.56

where ${\hat{μ}}_{Y_{L_{2}}} = 6.23$ is the sample mean of the dependent variable and ${\hat{μ}}_{Z_{L_{2}}} = 4.65$ is the mean of the explanatory variable from the L2O version of the L₂ equation. When computation using the sample moments is possible (e.g., no missing data, continuous outcomes), the resulting coefficients, standard errors, and test statistics will be identical to those obtained from the raw data. For this reason, the MIIVsem software was designed to accept either raw data or the sample covariance matrix and mean vector as input to a MIIV-2SLS analysis.
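The calculation above can be checked directly from the printed moments. The Python/NumPy sketch below is our own illustration, not the MIIVsem implementation, and the helper names `full_cov`, `miiv_2sls_moments`, and `idx` are hypothetical. It rebuilds the symmetric matrix S from the lower triangle shown earlier, selects the rows and columns for Z₅ (dependent variable), Z₂ (explanatory variable), and the MIIVs Z₁, Z₃, Z₄, Z₈, Z₉, and Z₁₀, and applies the two formulas above. Because the printed moments are rounded to two decimals, the results should agree with -0.72 and 9.56 only up to rounding.

```python
import numpy as np

# Lower triangle of the sample covariance matrix S for Z1..Z13 (Reisenzein, 1986),
# as printed in the text; row k lists Cov(Z_k, Z_1), ..., Var(Z_k).
LOWER = [
    [0.25],
    [0.96, 7.12],
    [1.05, 5.65, 8.05],
    [1.16, 5.81, 6.55, 8.96],
    [-0.69, -3.61, -3.98, -4.91, 6.98],
    [-0.52, -2.98, -2.96, -3.35, 4.69, 6.05],
    [-0.58, -2.49, -2.99, -3.50, 4.33, 3.31, 5.04],
    [0.69, 3.62, 3.27, 4.33, -2.29, -1.45, -1.98, 5.57],
    [0.57, 3.03, 3.00, 3.82, -2.21, -1.26, -1.74, 3.93, 5.33],
    [0.57, 3.12, 2.84, 3.49, -1.67, -1.40, -1.56, 3.92, 3.60, 4.98],
    [-0.65, -2.75, -3.37, -3.64, 3.01, 2.26, 3.24, -2.53, -2.46, -2.34, 4.98],
    [-0.69, -2.70, -3.36, -3.78, 3.20, 2.25, 3.34, -2.85, -2.78, -2.81, 4.82, 6.24],
    [-0.26, -0.92, -1.21, -1.50, 0.95, 0.75, 1.27, -1.04, -0.89, -0.91, 1.87, 2.12, 1.30],
]
MEANS = np.array([1.47, 4.65, 5.23, 4.41, 6.23, 6.38, 6.44,
                  2.79, 3.36, 3.15, 6.28, 5.93, 3.38])


def full_cov(lower):
    """Rebuild a symmetric matrix from a ragged lower triangle."""
    p = len(lower)
    S = np.zeros((p, p))
    for i, row in enumerate(lower):
        S[i, : len(row)] = row
    # Copy the strictly lower part into the upper triangle.
    return S + np.tril(S, -1).T


def miiv_2sls_moments(S, means, y, x, v):
    """MIIV-2SLS coefficients and intercept computed from moments only.

    y: index of the dependent variable; x: indices of the explanatory
    variables; v: indices of the MIIVs (all 0-based).
    """
    S_vv = S[np.ix_(v, v)]
    S_zv = S[np.ix_(x, v)]
    S_vy = S[v, y]
    # (S_zv S_vv^-1 S_zv')^-1 S_zv S_vv^-1 S_vy, with solves instead of inverses.
    A = S_zv @ np.linalg.solve(S_vv, S_zv.T)
    c = S_zv @ np.linalg.solve(S_vv, S_vy)
    b = np.linalg.solve(A, c)
    alpha = means[y] - means[x] @ b
    return b, alpha


def idx(*ks):
    """Map variable numbers Z_k to 0-based column indices."""
    return [k - 1 for k in ks]


S = full_cov(LOWER)
b, alpha = miiv_2sls_moments(S, MEANS, y=idx(5)[0], x=idx(2),
                             v=idx(1, 3, 4, 8, 9, 10))
print(f"B_L2L1 = {b[0]:.2f}, alpha_L2 = {alpha:.2f}")
```

Solving linear systems with `np.linalg.solve` rather than forming $S_{V_{L_{2}} V_{L_{2}}}^{-1}$ explicitly is a numerical-stability choice; the algebra is unchanged.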

This section demonstrates how a covariance matrix and vector of means enables us to generate the MIIV-2SLS estimates. The next section illustrates how we extend these ideas to covariance matrices and means in multilevel models.
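The claim that moment-based and raw-data computations coincide can be checked numerically. The sketch below uses hypothetical simulated data and hypothetical function names, with plain NumPy rather than MIIVsem. It computes 2SLS in two ways: as the textbook two-stage regression on raw data with an intercept in both stages, and from the sample covariance matrix and means with the formulas of the previous section. With complete continuous data the two agree to machine precision; the divisor used for the covariance matrix (N or N - 1) cancels in the formula.

```python
import numpy as np


def tsls_raw(y, X, V):
    """Two-stage least squares on raw data; an intercept is added to X and V."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    Vc = np.column_stack([np.ones(n), V])
    # Stage 1: regress each explanatory variable on the MIIVs, keep fitted values.
    X_hat = Vc @ np.linalg.lstsq(Vc, Xc, rcond=None)[0]
    # Stage 2: regress y on the fitted values; returns [intercept, slopes...].
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def tsls_moments(y, X, V):
    """The same estimator computed only from the covariance matrix and means."""
    data = np.column_stack([y, X, V])
    S = np.cov(data, rowvar=False)
    m = data.mean(axis=0)
    k = X.shape[1]
    ix, iv = slice(1, 1 + k), slice(1 + k, None)
    S_vv, S_zv, S_vy = S[iv, iv], S[ix, iv], S[iv, 0]
    A = S_zv @ np.linalg.solve(S_vv, S_zv.T)
    c = S_zv @ np.linalg.solve(S_vv, S_vy)
    b = np.linalg.solve(A, c)
    return np.concatenate([[m[0] - m[ix] @ b], b])


# Hypothetical data: factor L1 with scaling indicator Z2 and three other
# indicators used as instruments; y depends on L1.
rng = np.random.default_rng(0)
n = 500
L1 = rng.normal(size=n)
Z2 = L1 + rng.normal(scale=0.5, size=n)
V = np.column_stack([lam * L1 + rng.normal(scale=0.5, size=n)
                     for lam in (0.8, 1.2, 1.0)])
y = 2.0 - 0.7 * L1 + rng.normal(scale=0.5, size=n)

raw = tsls_raw(y, Z2[:, None], V)
mom = tsls_moments(y, Z2[:, None], V)
print(raw, mom, np.allclose(raw, mom))
```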

Multilevel Analysis

A recent development in the MIIV approach is a set of procedures to estimate and test multilevel SEMs (MSEMs). Much of the work on MSEMs has focused on models with random intercepts across groups, and the same is true for the MIIV approach. MSEMs are particularly useful when one has a reasonably large dataset with clustered observations and theoretical latent variables.

A key feature of the random intercepts MSEM is that we can decompose the total covariance matrix into two additive and orthogonal components: a within groups covariance matrix and a between groups covariance matrix. This is typically given in the following expression:

Σ_{T} = Σ_{W} + Σ_{B}

(18)

where $Σ_{T}$ is the population total covariance matrix, $Σ_{W}$ is the population within groups covariance matrix, and $Σ_{B}$ is the population between groups covariance matrix. Both early and modern ML estimation approaches for MSEMs rely on estimating the level-specific components for a given sample. We will refer to the sample counterparts as $S_{W}$ and $S_{B}$ . We apply MIIV-2SLS in a similar way; we provided a key ingredient in the preceding section by showing that MIIV-2SLS can be estimated with covariance matrices in place of raw data.

Obtaining level-specific covariance matrices for a given sample requires estimating them in an additional step. Several authors have suggested ways to do so (Goldstein & McDonald, 1988; Muthén, 1990, 1994; Yuan & Bentler, 2007). Giordano and Bollen (2020) provide greater detail on the application of MIIV-2SLS to MSEMs, using the work of Yuan and Bentler (2007) to obtain estimates of $S_{W}$ and $S_{B}$. The general procedure has four steps: (1) estimate the level-specific covariance matrices $S_{W}$ and $S_{B}$; (2) obtain point estimates by applying MIIV-2SLS to each level, using $S_{W}$ and $S_{B}$ in place of raw data; (3) adjust the standard errors with the delta method; and (4) adjust the Sargan test with the delta method. Giordano and Bollen (2020) showed that this procedure performed similarly to ML across a variety of sample sizes while retaining the robustness properties outlined in this paper. We direct readers to that article for details on its performance.
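Step (1) can be carried out in several ways. As an illustration only, the sketch below computes the classic pooled-within matrix $S_{PW}$ (playing the role of $S_{W}$) and the between matrix $S_{B}$ described by Muthén (1994); it is not the Yuan and Bentler (2007) approach that Giordano and Bollen (2020) use. One subtlety is that this $S_{B}$ estimates $Σ_{W} + cΣ_{B}$ rather than $Σ_{B}$, where c is a cluster-size constant (equal to the common cluster size when clusters are balanced), so the function also returns the ad hoc estimate $(S_{B} - S_{PW})/c$, which need not be positive definite in small samples. The function name `level_covariances` and the simulated data are hypothetical.

```python
import numpy as np


def level_covariances(Y, groups):
    """Pooled-within (S_PW) and between (S_B) covariance matrices.

    Y: (N, p) array of observations; groups: length-N array of cluster labels.
    Returns S_PW, S_B, and the ad hoc estimate of Sigma_B, (S_B - S_PW) / c.
    """
    Y = np.asarray(Y, dtype=float)
    labels, inv, sizes = np.unique(groups, return_inverse=True, return_counts=True)
    N, G = len(Y), len(labels)
    grand = Y.mean(axis=0)
    means = np.array([Y[inv == g].mean(axis=0) for g in range(G)])
    W = Y - means[inv]   # deviations from cluster means
    D = means - grand    # cluster-mean deviations from the grand mean
    S_pw = W.T @ W / (N - G)
    S_b = (D * sizes[:, None]).T @ D / (G - 1)
    # Cluster-size constant; equals the common cluster size for balanced data.
    c = (N**2 - np.sum(sizes**2)) / (N * (G - 1))
    Sigma_b_hat = (S_b - S_pw) / c  # may fail to be positive definite
    return S_pw, S_b, Sigma_b_hat


# Hypothetical balanced data: 50 clusters of 20 cases, 3 variables.
rng = np.random.default_rng(1)
G, n_g, p = 50, 20, 3
groups = np.repeat(np.arange(G), n_g)
Y = rng.normal(size=(G, p))[groups] + rng.normal(size=(G * n_g, p))
S_pw, S_b, Sigma_b_hat = level_covariances(Y, groups)
```

The total sum of squares splits exactly into within and between parts, $(N-1)S_{T} = (N-G)S_{PW} + (G-1)S_{B}$, a sample counterpart of the additive decomposition in equation (18) that also makes a convenient check.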

In addition to the advantages of MIIV-2SLS already outlined in this paper, there are some unique advantages in the MSEM context. First, ML-based estimation of MSEMs places heavy demands on sample size: 100 clusters is generally considered the minimum. Giordano and Bollen (2020) showed that MIIV-2SLS can produce reasonable estimates with fewer clusters, with the caveat that one may need to use a smaller subset of instruments to avoid slight negative bias. Second, traditional chi-square based fit indices can give a misleading assessment of model fit. Potential solutions for ML-based fit indices exist but still need to be implemented. MIIV-2SLS for MSEMs instead offers equation-by-equation tests of fit using the Sargan test outlined earlier in this paper. Giordano and Bollen (2020) show that the adjusted Sargan test for MSEMs detected misspecification with adequate power across a range of sample sizes and had the expected Type I error rates in correctly specified equations.

Another situation besides multilevel models where covariance matrices form the input to a SEM analysis is when some of the observed endogenous variables are noncontinuous variables. In this case, researchers first estimate the polychoric correlation (covariance) matrix and then apply DWLS to estimate the structural parameters of the model. Fortunately, there is a MIIV approach to analyze the polychoric correlation matrix. The next section sketches out this MIIV approach.

Categorical and Noncontinuous Endogenous Variables

Binary, ordinal, and other noncontinuous endogenous variables are common in social science research. Bollen and Maydeu-Olivares (2007) proposed the Polychoric Instrumental Variable (PIV) estimator, a MIIV estimator for binary and ordinal endogenous variables. Recent research has found a number of promising features of the PIV estimator. First, Nestler (2013) found the PIV estimator to be as accurate as the Unweighted Least Squares (ULS) and Diagonally Weighted Least Squares (DWLS) estimators when a model is correctly specified. But when structural misspecifications were present, Nestler (2013) found the PIV estimates to be more robust and less biased than those from ULS and DWLS. Similarly, Jin et al. (2016) found the PIV estimates to perform equivalently to those from ULS and DWLS when the equation and model were both correctly specified, and to be less biased for parameters from correctly specified equations when misspecification was present elsewhere in the model.
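Polychoric correlations, the input to the PIV estimator, assume that each ordinal indicator is a discretized version of an underlying normal variable. A common first step estimates each variable's thresholds from its marginal cumulative proportions, $τ_{k} = Φ^{-1}(\text{proportion of cases in categories} \le k)$; the polychoric correlation for each pair then comes from a bivariate-normal likelihood step that is not shown here. The Python sketch below (hypothetical function name `thresholds`, standard library only) illustrates just the threshold step and is not taken from any particular package.

```python
from collections import Counter
from statistics import NormalDist


def thresholds(x):
    """Estimate thresholds of an ordinal variable from cumulative proportions.

    Assumes an underlying standard normal variable; returns one threshold for
    each boundary between adjacent observed categories, in sorted order.
    """
    counts = Counter(x)
    cats = sorted(counts)
    n = len(x)
    cum = 0
    taus = []
    for c in cats[:-1]:  # the last category has no upper threshold
        cum += counts[c]
        taus.append(NormalDist().inv_cdf(cum / n))
    return taus


# Four equally frequent categories give thresholds near -0.674, 0, and 0.674.
print(thresholds([1, 2, 3, 4] * 25))
```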

These are important findings, as it is far from controversial to suggest that our models are approximations rather than near-perfect realizations of the phenomena under study. Moreover, for system wide estimators, the bias introduced by even minor misspecifications, such as an omitted factor loading, often increases linearly with each additional omission (Yang-Wallentin, Jöreskog, & Luo, 2010). In addition, a number of recent analytic developments have extended the diagnostic tests for the PIV estimator originally proposed by Bollen and Maydeu-Olivares (2007). For example, Jin and Cao (2018) proposed an equation-by-equation overidentification test compatible with the PIV framework.

Another recent extension comes from Fisher and Bollen (2020), who extend the PIV estimator to the case of mixed continuous, dichotomous, and ordinal indicators. Mixed data types are increasingly common, and it is important that applied researchers also have access to MIIV estimators capable of handling this situation. Fisher and Bollen (2020) developed novel alternative parameterizations for ordinal variable models (e.g., Jöreskog, 2002) that allow researchers increased flexibility in SEM specification. For example, researchers can estimate the error variances and intercepts of the underlying variables for ordinal indicators when more than two categories are present. Fisher and Bollen (2020) also provide analytic matrix derivatives for a general MIIV model in terms of both the mean and covariance structures. They show that this general estimator performs well across a number of simulation studies designed to mimic real-world modeling scenarios.

In this section, we briefly described a MIIV estimator that works with categorical endogenous observed variables. Like the other estimators, it is an instrumental variable approach. In the next section, we describe the role of instruments in causal inference in more detail.

MIIVs and Causal Inference

Researchers familiar with instrumental variables know that a common application is to recover causal estimates when one or more covariates in an equation are endogenous (Gennetian, Magnuson, & Morris, 2008). Following Bollen (2012), we make a three-way distinction between model implied instrumental variables (MIIVs), randomization instrumental variables (RIVs), and auxiliary instrumental variables (AIVs). In practice, AIVs are the most common. The typical situation is one in which an analyst suspects that a covariate in a single equation correlates with the equation’s error. The researcher then seeks one or more variables external to the original model that satisfy the conditions of instrumental variables (e.g., correlated with the covariate but uncorrelated with the equation’s error). RIVs differ in that they are typically part of the research design, where an intervention or quasi-experimental situation satisfies the instrumental variable conditions. To the degree that an RIV is part of the original research design and original model, it is a special case of a MIIV: rather than being an instrument sought after the fact, it is an essential variable in the original model.

The Reisenzein (1986) empirical example provides a good example of a RIV that is a MIIV. The randomized story stimulus, Z₁, is an essential component of the original model and research design. It is an example of a RIV because it is randomized and therefore plausibly uncorrelated with all equation error terms. Because of this, Z₁ can be used as a MIIV for each equation in the model. Other variables in the Reisenzein (1986) model are not randomized, but still are eligible as MIIVs based on the equations and assumptions embedded in the model. In earlier sections, we gave examples of these MIIVs.

Is the MIIV approach compatible with causal analysis? The answer goes back to the structural equation model and its purpose. The SEM embodies the causal assumptions of the researcher (Bollen & Pearl, 2013; Wright, 1921). These causal assumptions are justified via research design, theoretical hypotheses, and substantive expertise. Though we never have certainty about causal relations, we have diagnostic tests that can cast doubt on them (Popper, 2005). In the MIIV approach, overidentification tests such as the Sargan (1958) test discussed previously provide such evidence. The MIIVs are chosen based on the causal assumptions that are part of the model. If the Sargan test for an equation is statistically significant, this is evidence against the model structure and hence against the causal assumptions represented in the model. The reason is that the model structure led to the MIIVs, and if we reject one or more of the MIIVs, then we should question the causal assumptions that led to them. Passing the overidentification tests does not prove the model to be true, but it does provide evidence that the model and data are consistent with each other.
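For readers who want to see the mechanics, here is a minimal sketch of the Sargan statistic for a single equation estimated by 2SLS, assuming homoscedastic errors and complete continuous data. It regresses the 2SLS residuals on all the instruments and multiplies the resulting R² by N; for a correctly specified equation the statistic is asymptotically χ² with degrees of freedom equal to the number of MIIVs minus the number of explanatory variables (5.99 is the .05 critical value for 2 degrees of freedom). The simulated data, in which one instrument is deliberately made to correlate with the equation error, and the function names are hypothetical; this is not MIIVsem's implementation.

```python
import numpy as np


def tsls(y, X, V):
    """2SLS with an intercept added to both the regressors and the instruments."""
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    Vc = np.column_stack([np.ones(n), V])
    X_hat = Vc @ np.linalg.lstsq(Vc, Xc, rcond=None)[0]
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return beta, Xc, Vc


def sargan(y, X, V):
    """Sargan statistic N * R^2 and its degrees of freedom.

    Residuals use the original regressors (not the first-stage fitted values)
    and are regressed on all instruments. df = #MIIVs - #explanatory variables.
    """
    beta, Xc, Vc = tsls(y, X, V)
    u = y - Xc @ beta
    fitted = Vc @ np.linalg.lstsq(Vc, u, rcond=None)[0]
    r2 = 1.0 - np.sum((u - fitted) ** 2) / np.sum((u - u.mean()) ** 2)
    return len(y) * r2, V.shape[1] - X.shape[1]


# Hypothetical data: y depends on factor L1, measured by scaling indicator Z2.
rng = np.random.default_rng(2)
n = 2000
L1 = rng.normal(size=n)
e_y = rng.normal(scale=0.5, size=n)
Z2 = L1 + rng.normal(scale=0.5, size=n)
y = 2.0 - 0.7 * L1 + e_y


def make_instruments(delta):
    """Three other indicators of L1; delta != 0 makes the third one invalid."""
    noise = rng.normal(scale=0.5, size=(n, 3))
    return np.column_stack([0.8 * L1, 1.2 * L1, 1.0 * L1 + delta * e_y]) + noise


stat_ok, df = sargan(y, Z2[:, None], make_instruments(0.0))
stat_bad, _ = sargan(y, Z2[:, None], make_instruments(2.0))
print(f"valid MIIVs: {stat_ok:.1f} on {df} df; one invalid MIIV: {stat_bad:.1f}")
```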

The MIIV approach in this sense is similar to the traditional system wide estimators of SEMs. Regardless of the estimator, the SEM contains the causal assumptions of the researcher. One relevant difference is that the MIIV approach better isolates the biases of misspecified models than the ML/DWLS approach does. Another is that the MIIV method provides equation-by-equation diagnostic tests, whereas ML/DWLS focus on a global test. But both approaches lead to models that represent the causal assumptions of researchers. The DAG approach to SEM is analogous to the MIIV approach in that it provides additional local fit tests that researchers can use to interrogate the causal assumptions of their models before estimating them (Chen, Pearl, & Kline, 2018; Thoemmes, Rosseel, & Textor, 2018). There are also other tools for determining instruments in a prespecified model (Chen, Pearl, & Kline, 2018). These local fit tests and instrument-determination tools can be implemented using the DAGitty R package (Textor, Hardt, & Knüppel, 2011). Because it is impossible to test all the causal assumptions present in a model, it is imperative that researchers understand the causal assumptions they make when treating a variable as exogenous or omitting a correlation between two error terms.

Extensions and Applications

In the previous sections, we provided an overview of areas of SEM where MIIV-2SLS is applicable, ranging from CFA and SEM to multilevel and categorical endogenous variable models. However, this is far from a complete list. Though we do not have the space to describe the others fully, we summarize them here. With regard to extensions, MIIV-2SLS applies to models that are nonlinear in their latent variables, such as models with interactions or quadratics (Bollen, 1995; Bollen & Paxton, 1998; Brandt, Umbach, Kelava, & Bollen, 2020). To list a few others, Nestler (2014) shows how to extend the MIIV approach to latent growth curve models. Bollen and Biesanz (2002) explain how to estimate higher order factor analysis models with the MIIV method. Equality restrictions on parameters are possible, as Nestler (2015a) describes. Gates et al. (2020) propose a MIIV-2SLS approach to time series data that allows individual-level analysis without assuming measurement invariance across individuals. Fisher, Bollen, and Gates (2019) introduce a MIIV method to analyze dynamic factor analysis models that is more robust to structural misspecifications than alternative time series methods. Culpepper, Aguinis, Kern, and Millsap (2019) illustrate a MIIV-2SLS technique to handle measurement and prediction invariance.

The MIIV techniques also have led to diagnostic tests to improve model specification. Mooijaart and Satorra (2009) argued that the conventional likelihood ratio chi square test is insensitive to nonlinear misspecifications. Nestler (2015b) showed how to develop diagnostic tests using MIIV-2SLS to uncover unmodeled nonlinear effects. Oczkowski and Farrell (1998) proposed a MIIV method to discriminate between different measurement scales of the same construct with a nonnested test. Bollen (2011) took a MIIV approach to determining the number of dimensions underlying a set of measures. Jin and Cao (2018) propose a chi square overidentification test that applies to equations with categorical dependent variables when using MIIVs.

Furthermore, the MIIV approach is not limited to the 2SLS estimator with homoscedastic errors. For example, Bollen (1996b) explains the calculation of heteroscedasticity-consistent standard errors with MIIV-2SLS. Bollen, Kolenikov, and Bauldry (2014) were mentioned earlier as a source of the Generalized Method of Moments (GMM) MIIV estimator. A valuable feature of the MIIV-GMM estimator is that it is scalable, so that a researcher can estimate any subset of equations in the system and have diagnostic tests for just that part. There is a variety of other instrumental variable estimators that could apply to the MIIV approach. The key is using the L2O transformation to eliminate the latent variables from the model and finding the model implied instruments suitable for an equation. Hayakawa and Sun (2019) compare a variety of limited information MIIV estimators and how they perform under different conditions.

Empirical applications of the MIIV approach are available as well. The MIIV approach has been popular for analyzing interactions between latent variables because prior techniques for accomplishing this required difficult programming. The steps described above provide many of the necessary ingredients for MIIV-2SLS with latent interactions, but we briefly describe the additional complications here and direct readers to Bollen and Paxton (1998) for more details. For an equation with a product term between two latent variables, the L2O transformation would result in a product term between the two scaling indicators of the latent variables and a more complicated error term. Adding an interaction term between two latent variables already present in the equation will not change the variables eligible to be MIIVs, but we also would prefer additional MIIVs that reflect the interaction between the latent variables. We accomplish this by forming all possible product terms between the non-scaling indicators of the latent variables involved in the interaction that are eligible to be MIIVs, where each product term consists of one indicator from the first latent variable and one from the second.
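The product-term construction just described is mechanical once the eligible indicators are known. In the sketch below, the indicator names (a1 to a3 for the first latent variable, b1 to b3 for the second, with a1 and b1 as scaling indicators) are hypothetical, and we simply assume that the MIIV search has already found a2, a3, b2, and b3 eligible. The hypothetical function `interaction_terms` returns the L2O regressor (the product of the two scaling indicators) and every cross-product of one eligible non-scaling indicator from each latent variable.

```python
from itertools import product

import numpy as np


def interaction_terms(data, scaling, nonscaling):
    """Form the L2O product regressor and product-term MIIVs for L_A * L_B.

    data: dict mapping indicator name -> 1-D array of values.
    scaling: (name_A, name_B), the scaling indicators of the two latent variables.
    nonscaling: (names_A, names_B), non-scaling indicators already judged
        eligible as MIIVs for the equation.
    """
    sa, sb = scaling
    regressor = data[sa] * data[sb]
    names_a, names_b = nonscaling
    # One indicator from each latent variable in every product term.
    miivs = {f"{a}*{b}": data[a] * data[b] for a, b in product(names_a, names_b)}
    return regressor, miivs


rng = np.random.default_rng(4)
data = {name: rng.normal(size=100) for name in ["a1", "a2", "a3", "b1", "b2", "b3"]}
regressor, miivs = interaction_terms(data, ("a1", "b1"), (["a2", "a3"], ["b2", "b3"]))
print(sorted(miivs))  # four product-term MIIVs
```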

Miller, Niehuis, and Huston (2006) adopted the MIIV-2SLS approach to estimate the interaction effect between the latent variables of agreeable behavior and disagreeable behavior on people’s perceptions of their partner. The method was also used to study interaction effects between work stressors and active coping latent variables on psychological symptoms (Snow, Swan, Raghavan, Connell, & Klein, 2003). See Gray and Montgomery (2012) for another example of the application of MIIV-2SLS to interaction analysis.

Other studies provided empirical examples to show that MIIV-2SLS has greater ability to detect structural misspecifications than the system wide ML estimator. As an example, Quinn’s (2005) analysis looked at job performance of knowledge workers. He tested two theoretical models about the factors that relate to performance and applied both the full-information (ML) and limited-information (MIIV-2SLS) estimation approaches. The results from the two approaches were divergent: while the MIIV-2SLS estimator favored the hypothesized first-order factor model over the competing second-order factor model, the full-information estimator supported both. However, the equation-by-equation overidentification tests available with MIIV-2SLS further revealed that the second-order factor model was misspecified in that all of the first-order factors except one were rejected by the Sargan test. Quinn’s (2005) research demonstrated the value of local tests and compared ML and MIIV-2SLS estimates for the same model (see also Frazier, Newman, & Jaccard, 2007).

Brain functional connectivity using fMRI is another area of application for MIIV-2SLS (e.g., Sideridis, Simos, Papanicolaou, & Fletcher, 2014). More recently, MIIV-2SLS estimation implemented in the MIIVsem R package has been used in cognitive neuroscience to study functional connectivity between multivariate brain domains in relation to the aging process (Varangis, Razlighi, Habeck, Fisher, & Stern, 2019). Factor analytic models of functional connectivity estimated with MIIV-2SLS are well suited to modeling network-level interactions among multiple primary cognitive domains because MIIV-2SLS provides robust estimation for subject-specific, high-dimensional multivariate time series data. Fisher et al. (2019) establish how a MIIV-2SLS approach addresses problems in such a dynamic time series context.

In brief, there are a number of extensions to the MIIV-2SLS approach that we could not discuss in detail as well as a number of empirical applications.

Conclusions

Much of the history of empirical psychology has focused on statistical models for a single outcome. Whether it was an ANOVA of the impact of a randomized teaching innovation on mathematics skills or a regression examining the effects of multiple covariates on depression, the focus was one dependent variable and its determinants. With the turn to SEM, there was a transition towards multiple outcomes. A SEM, for example, might have social support as an outcome in one equation, depression in another, and separate equations for each indicator that measures these and other latent variables. Rather than a single outcome, such models might have dozens of outcomes and measures. SEMs have allowed psychologists to study theories of behavior with many causes and many effects; these models offer closer approximations to the complexities of psychology.

System wide (full information) estimators such as ML emerged to handle SEMs with all parameters estimated simultaneously. Information from many equations contributed to the estimates across different parts of the system. Diagnostic tests and fit indices applied to the whole system. These system wide estimators are built on the implicit assumption that our theory (i.e., statistical model) is exactly correct. We recognize that our theories are approximations to reality, undermining a central tenet of ML estimation. In this paper, we have explored the MIIV-2SLS approach which has advantages over ML in the face of imperfect theories/models.

The MIIV-2SLS approach has similarities and differences with both the single outcome and the latent variable SEM system approaches. It is similar to traditional SEMs in that the starting point is the specification of the latent variable and measurement models that summarize researchers’ knowledge of the relationships. In other words, an equation system with multiple dependent variables is formulated. The MIIV-2SLS approach also is like the system wide approach to SEM in that it allows latent variables and multiple indicators so that researchers can take account of measurement error and build measurement models.

However, like single outcome research, researchers using MIIV-2SLS need only estimate a single outcome equation or any subset of equations. With the latent to observed (L2O) variable transformation, L2O equations from MIIV-2SLS look very much like the single equation models with observed variables that are common in applications. But these equations contain parameters that match those from the latent variable and measurement model equations. As does the ML estimator, MIIV-2SLS estimates latent variable to latent variable relations as well as latent variable to indicator ones.

The ML estimator for SEMs has an overidentification test that gauges the correspondence between the full model and the data. The MIIV-2SLS estimator also has an overidentification test, but it is local in that it is available for each overidentified equation. As such, it better locates possible problems within the SEM. In addition, as a limited information estimator, MIIV-2SLS better isolates misspecifications in the system than does ML or other system wide estimators. Furthermore, many other limited information approaches do not include similar tests to assess model fit. MIIV-2SLS is also reminiscent of ANOVA or OLS regression in single outcome research in that it is a noniterative procedure that does not suffer from nonconvergence and is computationally quick.

Our paper has described numerous applications and desirable features of the MIIV-2SLS estimator. We have provided didactic examples of the method, which we hope have made MIIV-2SLS more accessible. We laid out the steps of MIIV-2SLS to aid readers’ understanding, though we should reiterate that currently available software automates these steps. If one can estimate SEMs with lavaan in R, then one can just as easily estimate SEMs using the MIIVsem package. In the online supplemental material, we list the MIIVsem R code that generated the results for the empirical examples. We also point out that this MIIV approach applies even in models without latent variables, that is, in simultaneous equations that assume no measurement error. Indeed, 2SLS is widely used in econometric simultaneous equation models. However, automated procedures for choosing MIIVs are not discussed in that literature.

Though we have described many applications of the MIIV approach, there remains much to explore about its strengths and weaknesses. A reviewer suggested that the single equation approach of MIIV-2SLS could permit researchers to switch estimators across equations rather than always using the same 2SLS estimator for all equations. For instance, if one equation is oriented toward a hazard rate for an event, then a variant of Cox regression could apply to it even if other equations were addressed with, say, MIIV-2SLS. This represents a high degree of flexibility in modeling, where the nature of the dependent variable determines the best estimator for that equation.

Mediation analysis using MIIV-2SLS is another area that could be developed. Researchers should be able to use MIIV-2SLS estimates to form estimates of the direct, indirect, and total effects of one variable on another. The researcher would estimate each equation that is part of these effects and form these effects in the usual fashion. For example, in the classic example of $X \to Z \to Y$ and $X \to Y$, the indirect effect would be formed by the product of the coefficients of $X \to Z$ and $Z \to Y$. The direct effect would be the coefficient of the path $X \to Y$. Following Bollen and Stine’s (1990) suggestion of bootstrapping the product of coefficients for the indirect effect, and the single coefficient estimate for the direct effect, researchers could form confidence intervals or standard errors. Alternatively, if all coefficients were estimated for all equations (e.g., Bollen, 2001), the asymptotic covariance matrix of all coefficient estimates would be available and the delta method (cf. Folmer, 1981; Sobel, 1982) would provide an asymptotic standard error for the indirect effect.
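A minimal sketch of both routes to uncertainty for an indirect effect follows. For brevity, each equation is estimated by OLS on hypothetical observed variables X, Z, and Y; in a latent variable model one would substitute the MIIV-2SLS estimates for each equation. The delta-method function uses the first-order approximation $Var(\hat{a}\hat{b}) \approx b^{2} Var(\hat{a}) + a^{2} Var(\hat{b}) + 2ab\, Cov(\hat{a}, \hat{b})$, which reduces to Sobel’s (1982) formula when the covariance is zero, and the bootstrap uses percentile intervals, one of several possible bootstrap intervals. All function names and data are hypothetical.

```python
import numpy as np


def indirect_delta_se(a, b, var_a, var_b, cov_ab=0.0):
    """First-order delta-method SE of a*b (Sobel's formula when cov_ab = 0)."""
    return np.sqrt(b**2 * var_a + a**2 * var_b + 2 * a * b * cov_ab)


def ols(y, X):
    """OLS coefficients [intercept, slopes...] (stand-in for MIIV-2SLS here)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xc, y, rcond=None)[0]


def effects(x, z, y):
    """Indirect effect a*b (X -> Z -> Y) and direct effect c' (X -> Y)."""
    a = ols(z, x)[1]
    _, b, c_direct = ols(y, np.column_stack([z, x]))
    return a * b, c_direct


def bootstrap_ci(x, z, y, reps=1000, seed=0):
    """Percentile bootstrap 95% CIs; columns are (indirect, direct)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(reps):
        i = rng.integers(0, n, size=n)  # resample cases with replacement
        draws.append(effects(x[i], z[i], y[i]))
    return np.percentile(np.array(draws), [2.5, 97.5], axis=0)


# Hypothetical data with a = 0.5, b = 0.4 (indirect 0.2) and direct effect 0.3.
rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
y = 0.4 * z + 0.3 * x + rng.normal(size=n)
ind, direct = effects(x, z, y)
ci = bootstrap_ci(x, z, y, reps=200)
print(ind, direct, ci)
```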

Like the ML approach, many properties of MIIV-2SLS are asymptotic. While there are still open questions about its finite sample performance, Rosseel (2020) recommends MIIV-2SLS as one of several estimation strategies appropriate for small samples because it is noniterative and provides local overidentification tests. Moreover, the ML estimator for SEM has been studied by hundreds of researchers over nearly half a century. The 2SLS estimator for simultaneous equations has been studied by econometricians, but the MIIV-2SLS estimator in the context of latent variable SEM and factor analysis has received only a fraction of that attention. Weak instrumental variables in latent variable systems, diagnostics to find specific bad MIIVs, modification indices, strategies to respecify models, and a better understanding of the conditions under which MIIV-2SLS performs better or worse than the ML estimator are all valuable areas of research. The benefit of using both ML and MIIV-2SLS to analyze the same model also remains unexplored and shifts attention from an either/or choice to the optimal combination of methods. Finally, this paper has focused on only one type of instrumental variable estimator (MIIV-2SLS); others could be developed (cf. Bollen et al., 2014). In brief, there is much we already can do with MIIV-2SLS and much more to explore.

Supplementary Material

Supplementary file: NIHMS1674084-supplement-Supplementary.pdf (231.1 KB, PDF)

Acknowledgments

We gratefully acknowledge partial support of this research from NIH 1R21MH119572-01 and the Carolina Population Center’s grants of NIH T32 HD091058, T32 HD007168, and P2C HD050924. No version of this paper was presented at a conference, on a listserv, or otherwise distributed.

APPENDIX

LATENT TO OBSERVED (L2O) VARIABLE TRANSFORMATION¹⁰

In the main text we illustrated the L2O transformation for an example and stated that the same L2O transformation can be done for other SEMs. In this appendix we give a general SEM matrix notation and then show how to perform the L2O transformation on this general SEM. By so doing, we illustrate that the L2O transformation works for SEM in general.

We can write the general SEM as,

L_{i} = α_{L} + B L_{i} + ε_{L i}

(19)

Z_{i} = α_{z} + Λ L_{i} + ε_{z i}

(20)

where $L_{i}$ is an $N_{L} \times 1$ vector of latent variables with $N_{L}$ the number of latent variables, $α_{L}$ is the $N_{L} \times 1$ vector of intercepts for each latent variable in $L_{i}$, $B$ is the $N_{L} \times N_{L}$ matrix of regression coefficients giving the expected effect of each L on the other Ls in the model, and $ε_{L i}$ is the $N_{L} \times 1$ vector of errors. $Z_{i}$ is the $N_{Z} \times 1$ vector of indicators of the latent variables, $α_{z}$ is the vector of measurement model intercepts (of the same dimension), $Λ$ is the $N_{Z} \times N_{L}$ matrix of factor loadings, and $ε_{z i}$ is the $N_{Z} \times 1$ vector of unique factors (measurement errors). The subscript i indexes the cases in the sample. The model assumes that all disturbances have means of zero, that is, $E(ε_{L i}) = 0$ and $E(ε_{z i}) = 0$. The unique factors or disturbances $ε_{z}$ are uncorrelated with the latent variables $L$, that is, $Cov(ε_{z i}, L_{j}) = 0$ for all i, j. Furthermore, if we partition $ε_{L}$ into the errors of the latent endogenous variables ($ε_{L_{Y} i}$) and the errors of the latent exogenous variables ($ε_{L_{X} i} = L_{X i}$), then the errors $ε_{L_{Y} i}$ are uncorrelated with the latent exogenous variables ($L_{X i}$), though the latent exogenous variables [or equivalently their errors ($ε_{L_{X} i}$)] typically correlate with each other. Unless otherwise stated, we assume that the disturbance vectors $ε_{L i}$ and $ε_{z j}$ are uncorrelated with each other [$E(ε_{L i} ε_{z j}^{'}) = 0$ for all i, j]. We also assume that the disturbances are not autocorrelated over cases [$E(ε_{L i} ε_{L j}^{'}) = 0$ and $E(ε_{z i} ε_{z j}^{'}) = 0$ for $i \neq j$] and that they are homoscedastic [$E(ε_{L i} ε_{L i}^{'}) = Σ_{ε_{L} ε_{L}}$ and $E(ε_{z i} ε_{z i}^{'}) = Σ_{ε_{z} ε_{z}}$ for all i].

To simplify the notation, we drop the case index i. We also reorder the observed variables so that the scaling indicators are grouped together first in $Z_{s}$ and the nonscaling indicators are grouped together in $Z_{n s}$. With this partitioning we rewrite the measurement model in (20) as,

Z = [\begin{array}{l} Z_{s} \\ Z_{n s} \end{array}] = [\begin{matrix} 0 \\ α_{Z n s} \end{matrix}] + [\begin{matrix} I \\ Λ_{n s} \end{matrix}] L + [\begin{matrix} ε_{Zs} \\ ε_{Z n s} \end{matrix}]

(21)

where $α_{Z n s}$ contains the intercepts, $Λ_{n s}$ contains the factor loadings, $ε_{Z n s}$ has the unique factors of $Z_{n s}$ , and $ε_{Z s}$ has the unique factors of $Z_{s}$ . For the scaling indicators, $Z_{s}$ , we write

Z_{s} = L + ε_{Z s}

(22)

Reexpress equation (22) as

L = Z_{s} - ε_{Z s}

(23)

The only intercepts and factor loadings that we need to estimate in the measurement model are those for the second set of indicators, $Z_{n s}$, which from equation (21) is $Z_{n s} = α_{Z n s} + Λ_{n s} L + ε_{Z n s}$.

Substituting $L = Z_{s} - ε_{Z s}$ for $L$ , the $Z_{n s}$ measurement model becomes

Z_{n s} = α_{Z n s} + Λ_{n s} Z_{s} - Λ_{n s} ε_{Zs} + ε_{Zns} = α_{Z n s} + Λ_{n s} Z_{s} + u_{n s}

(24)

where $u_{n s} = - Λ_{n s} ε_{Zs} + ε_{Zns}$ . This simple Latent to Observed (L2O) transformation turns a latent variable factor analysis model into a multiequation observed variable model with a composite disturbance. A similar L2O transformation of the latent variable equation (19) leads to,

\begin{aligned} L &= α_{L} + B L + ε_{L} \\ Z_{s} - ε_{Z s} &= α_{L} + B (Z_{s} - ε_{Z s}) + ε_{L} \\ Z_{s} &= α_{L} + B Z_{s} + (I - B) ε_{Z s} + ε_{L} . \end{aligned}

(25)

Combining equations (24) and (25) gives us the L2O transformation for the whole model,

\begin{bmatrix} Z_{s} \\ Z_{n s} \end{bmatrix} = \begin{bmatrix} α_{L} \\ α_{Z n s} \end{bmatrix} + \begin{bmatrix} B \\ Λ_{n s} \end{bmatrix} Z_{s} + \begin{bmatrix} I - B & I & 0 \\ - Λ_{n s} & 0 & I \end{bmatrix} \begin{bmatrix} ε_{Z s} \\ ε_{L} \\ ε_{Z n s} \end{bmatrix} .

(26)

The intercepts and coefficients in this equation match those of the original SEM in equations (19) and (20). Therefore, consistent estimators of the intercepts and coefficients in equation (26) are also consistent estimators of the intercepts and coefficients in the original equations (19) and (20).

The R package MIIVsem (Fisher et al., 2017) makes this L2O transformation automatically; this appendix shows how the transformation works.
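As a numerical check on equations (24) through (26), the following minimal NumPy sketch generates data from a hypothetical three-factor model and confirms that the L2O system (26) reproduces the observed indicators exactly, case by case. The path coefficients, loadings, and intercepts are made-up values, not estimates from the paper. Because the L2O transformation is an algebraic identity, no estimation is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, nl = 5, 3                       # a few cases, three latent variables
# Hypothetical coefficient matrix B: L1 -> L2, L2 -> L3, and L1 -> L3
B = np.array([[0.0, 0.0, 0.0],
              [0.2, 0.0, 0.0],
              [0.8, 0.2, 0.0]])
alpha_L = np.array([0.0, 1.0, -1.0])
# Hypothetical loadings: two nonscaling indicators per latent variable (6 x 3 matrix)
Lam_ns = np.kron(np.eye(nl), np.array([[0.8], [0.8]]))
alpha_ns = rng.normal(size=2 * nl)
I = np.eye(nl)

eps_L = rng.standard_normal((n, nl))       # latent errors
eps_s = rng.standard_normal((n, nl))       # unique factors of scaling indicators
eps_ns = rng.standard_normal((n, 2 * nl))  # unique factors of nonscaling indicators

# Generate data from the latent variable model (19) and measurement model (21)
L = (alpha_L + eps_L) @ np.linalg.inv(I - B).T  # row i: (I - B)^{-1}(alpha_L + eps_L_i)
Z_s = L + eps_s
Z_ns = alpha_ns + L @ Lam_ns.T + eps_ns

# Right-hand side of the L2O system (26), built from its block matrices
coef = np.vstack([B, Lam_ns])                                   # [B; Lam_ns]
err_load = np.block([[I - B, I, np.zeros((nl, 2 * nl))],
                     [-Lam_ns, np.zeros((2 * nl, nl)), np.eye(2 * nl)]])
errors = np.hstack([eps_s, eps_L, eps_ns])                      # stacked errors, one row per case
rhs = np.concatenate([alpha_L, alpha_ns]) + Z_s @ coef.T + errors @ err_load.T

lhs = np.hstack([Z_s, Z_ns])
print(np.allclose(lhs, rhs))  # True: equation (26) holds for every case
```

The check only works with the `+ (I - B) ε_{Z s}` term in equation (25); dropping that term makes `np.allclose` return `False`.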

MIIV-2SLS ESTIMATOR IN MATRIX NOTATION

The main text gave scalar expressions for the MIIV-2SLS estimator in a couple of examples. This appendix shows that the estimator generalizes to virtually any number of covariates and coefficients and any number of MIIVs.

The starting point is a single L2O equation. Create an N × 1 vector Y that contains all N values of the dependent variable for that equation. Form an N × 1 vector of the composite errors for the equation and call it e. The L2O equation has one or more covariates that influence the dependent variable. Sort these covariates into two categories: (1) those covariates that directly affect Y and that correlate with e, and (2) the intercept and those covariates that directly affect Y and that do not correlate with e. Collect all N values of the covariates in category (1) and place them in an $N \times N_{U_{a}}$ matrix U_a, with their coefficients in an $N_{U_{a}} \times 1$ vector called $β_{a}$. Similarly, put a column of ones for the intercept and all values of the covariates from (2) in an $N \times N_{U_{b}}$ matrix $U_{b}$, with their intercept and coefficients in an $N_{U_{b}} \times 1$ vector $β_{b}$. Then we write the equation,

Y = U_{a} β_{a} + U_{b} β_{b} + e,

(27)

Y = U β + e,

(28)

where $U = \begin{bmatrix} U_{a} & U_{b} \end{bmatrix}$ with $β^{'} = \begin{bmatrix} β_{a}^{'} & β_{b}^{'} \end{bmatrix}$ their corresponding coefficients. Because the covariates in U_a correlate with e, OLS is not a consistent estimator of the parameters in the equation. Instead we use the MIIV-2SLS estimator. Create a matrix $V = \begin{bmatrix} U_{b} & U_{c} \end{bmatrix}$ where we defined U_b above and U_c consists of all observed variables that are uncorrelated with e and do not have a direct effect on Y. The variables in V consist of a column of ones and all of the MIIVs for that equation. The MIIV-2SLS estimator for the equation is,

{\hat{β}}_{2 S L S} = {(U^{'} P_{V} U)}^{- 1} U^{'} P_{V} Y

(29)

where P_V is a projection matrix equal to $V {(V^{'} V)}^{- 1} V^{'}$ . This procedure is followed for each of the remaining equations in a given SEM. The dependent variable will differ, and the variables that enter $U = \begin{bmatrix} U_{a} & U_{b} \end{bmatrix}$ and $V = \begin{bmatrix} U_{b} & U_{c} \end{bmatrix}$ often differ across equations. But the same general procedure is followed to attain the MIIV-2SLS estimators.
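A minimal NumPy sketch of equation (29). It assumes a hypothetical one-factor model with four indicators, where Z1 is the scaling indicator and all parameter values are invented for illustration. The sketch estimates the L2O equation for Z2, in which Z1 correlates with the composite disturbance, and uses Z3 and Z4 as MIIVs. The helper name `miiv_2sls` is hypothetical and is not an MIIVsem function. OLS is included only for contrast.

```python
import numpy as np

def miiv_2sls(y, U, V):
    """MIIV-2SLS estimate of beta in y = U beta + e with instrument matrix V (eq. 29)."""
    # First stage: P_V U, computed by least squares rather than forming the N x N matrix P_V
    U_hat = V @ np.linalg.lstsq(V, U, rcond=None)[0]
    # Second stage: (U' P_V U)^{-1} U' P_V y, using U' P_V U = U_hat' U_hat and U' P_V y = U_hat' y
    return np.linalg.solve(U_hat.T @ U_hat, U_hat.T @ y)

# Hypothetical one-factor model: Z1 = L + e1, Z2 = 0.5 + 0.8 L + e2, Z3 = 0.8 L + e3, Z4 = 0.8 L + e4
rng = np.random.default_rng(1)
n = 100_000
L = rng.standard_normal(n)
Z = np.column_stack([L, 0.5 + 0.8 * L, 0.8 * L, 0.8 * L]) + rng.standard_normal((n, 4))

# L2O equation for Z2: Z2 = alpha2 + lambda2 * Z1 + u2, with u2 = -lambda2 * e1 + e2
ones = np.ones(n)
y = Z[:, 1]
U = np.column_stack([Z[:, 0], ones])           # [U_a U_b]: Z1 (correlated with u2), intercept
V = np.column_stack([ones, Z[:, 2], Z[:, 3]])  # [U_b U_c]: intercept plus MIIVs Z3 and Z4

beta_ols = np.linalg.lstsq(U, y, rcond=None)[0]
beta_2sls = miiv_2sls(y, U, V)
print("OLS: ", beta_ols)   # slope near 0.4 here, because Z1 correlates with u2
print("2SLS:", beta_2sls)  # close to the population values (0.8, 0.5)
```

Computing $P_{V} U$ through a least-squares solve gives the same result as equation (29) while avoiding the N × N projection matrix, which would be impractical to store for large N.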

The MIIV-2SLS estimator ${\hat{β}}_{2 S L S}$ is asymptotically unbiased, consistent, asymptotically efficient among single-equation limited information estimators, and asymptotically normally distributed with an asymptotic covariance matrix of

\hat{V} ({\hat{β}}_{2 S L S}) = {\hat{σ}}_{ε}^{2} {(U^{'} P_{V} U)}^{- 1}

(30)

where ${\hat{σ}}_{ε}^{2}$ is the residual variance from equation (28) when ${\hat{β}}_{2 S L S}$ is substituted for β, that is,

{\hat{σ}}_{ε}^{2} = \frac{(Y - U {\hat{β}}_{2 S L S})^{'} (Y - U {\hat{β}}_{2 S L S})}{N} .

(31)
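The following self-contained sketch uses the same hypothetical one-factor setup and adds the asymptotic covariance matrix of equations (30) and (31). The function name `miiv_2sls_se` is an illustrative assumption, not an MIIVsem function, and the data are simulated.

```python
import numpy as np

def miiv_2sls_se(y, U, V):
    """MIIV-2SLS estimates (eq. 29) with asymptotic standard errors (eqs. 30-31)."""
    U_hat = V @ np.linalg.lstsq(V, U, rcond=None)[0]  # P_V U
    UPU = U_hat.T @ U_hat                             # U' P_V U
    beta = np.linalg.solve(UPU, U_hat.T @ y)          # eq. (29)
    resid = y - U @ beta              # residuals use the observed U, not the first-stage U_hat
    sigma2 = resid @ resid / len(y)   # eq. (31): divide by N
    acov = sigma2 * np.linalg.inv(UPU)                # eq. (30)
    return beta, np.sqrt(np.diag(acov)), sigma2

# Hypothetical one-factor model with four indicators (values invented for illustration)
rng = np.random.default_rng(1)
n = 100_000
L = rng.standard_normal(n)
Z = np.column_stack([L, 0.5 + 0.8 * L, 0.8 * L, 0.8 * L]) + rng.standard_normal((n, 4))
ones = np.ones(n)
U = np.column_stack([Z[:, 0], ones])           # Z1 and intercept
V = np.column_stack([ones, Z[:, 2], Z[:, 3]])  # intercept plus MIIVs Z3 and Z4

beta, se, sigma2 = miiv_2sls_se(Z[:, 1], U, V)
print(beta, se, sigma2)  # sigma2 should be near Var(u2) = 0.8**2 + 1 = 1.64
```

Substituting the first-stage fitted values for U in the residual line would give the residuals of the second-stage regression rather than those of equation (28), and would generally misstate ${\hat{σ}}_{ε}^{2}$.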

SARGAN OVERIDENTIFICATION TEST

The main text gave a scalar expression for the Sargan (1958) overidentification test, which applies to any overidentified equation. Here we give the equivalent matrix expression, because this form is common in the econometric literature. It is,

T_{Sargan} = \frac{{\hat{u}}^{'} V {(V^{'} V)}^{- 1} V^{'} \hat{u}}{{\hat{u}}^{'} \hat{u} / N} .

(32)

where $\hat{u}$ is the vector of residuals from the MIIV-2SLS equation, V is the matrix of values of all MIIVs for the equation, and N is the sample size. In large samples, T_Sargan approximately follows a chi-square distribution under the null hypothesis that all MIIVs are uncorrelated with the equation's composite disturbance. Its degrees of freedom equal the number of MIIVs for the equation minus the number of right-hand-side covariates in the L2O equation.
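The sketch below computes equation (32) for the L2O equation of Z2 in a hypothetical one-factor model. It runs the test twice: once with a correctly specified model, and once with the unique factors of Z2 and Z3 correlated at 0.3, which makes Z3 an invalid MIIV. It assumes SciPy is available for the chi-square tail probability (`scipy.stats.chi2.sf`). The data, parameter values, and function names are illustrative only.

```python
import numpy as np
from scipy.stats import chi2

def sargan_test(y, U, V):
    """Sargan overidentification test (eq. 32) for one MIIV-2SLS equation."""
    U_hat = V @ np.linalg.lstsq(V, U, rcond=None)[0]      # P_V U
    beta = np.linalg.solve(U_hat.T @ U_hat, U_hat.T @ y)  # eq. (29)
    u = y - U @ beta                                      # MIIV-2SLS residuals
    Pu = V @ np.linalg.lstsq(V, u, rcond=None)[0]         # P_V u = V (V'V)^{-1} V' u
    stat = (u @ Pu) / (u @ u / len(y))
    # Columns of V minus columns of U; the intercept is in both, so this equals
    # (number of MIIVs) - (number of right-hand-side covariates)
    df = V.shape[1] - U.shape[1]
    return stat, df, chi2.sf(stat, df)

def simulate(n, rng, cov_e2_e3=0.0):
    """Hypothetical one-factor data; nonzero cov_e2_e3 makes Z3 an invalid MIIV for Z2."""
    L = rng.standard_normal(n)
    e = rng.standard_normal((n, 4))
    e[:, 2] = cov_e2_e3 * e[:, 1] + np.sqrt(1 - cov_e2_e3**2) * e[:, 2]
    return np.column_stack([L, 0.5 + 0.8 * L, 0.8 * L, 0.8 * L]) + e

rng = np.random.default_rng(3)
n = 100_000
ones = np.ones(n)
results = {}
for label, c in [("correct", 0.0), ("correlated errors", 0.3)]:
    Z = simulate(n, rng, c)
    U = np.column_stack([Z[:, 0], ones])           # Z1 and intercept
    V = np.column_stack([ones, Z[:, 2], Z[:, 3]])  # intercept plus MIIVs Z3 and Z4
    results[label] = sargan_test(Z[:, 1], U, V)
    print(label, results[label])
```

In the misspecified case, the statistic should be very large and the p-value essentially zero. In the correctly specified case, the statistic is a single draw that approximately follows a chi-square distribution with 1 degree of freedom.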

Footnotes

¹ DWLS first estimates the univariate thresholds, then the pairwise polychoric correlations, and then applies a system-wide estimator to the polychoric correlation matrix. WLSMV is a popular version of DWLS.

² There are some robustness conditions (e.g., Satorra, 1990) under which the asymptotic significance tests could remain accurate, but there are no established procedures to determine whether these robustness conditions hold.

³ Some scholars dislike the idea of treating models as approximations and tolerating any kind of specification error because such error might be theoretically consequential. However, the predominant view in SEM, as in most areas of statistics of which we are aware, is that all models are approximations.

⁴ We are very grateful to Professor Reisenzein for making his data available to us.

⁵ Fisher’s (1966) work and much of econometrics focus on equation identification, whereas the latent variable SEM literature has focused on model identification. However, the point that we cannot find unique values of underidentified parameters holds whether the parameter is in an equation or in a model.

⁶ See Bollen (1989, p. 244). The randomized variable Z₁ is perfectly measured, and its mean, variance, and covariances with the latent variables are identified.

⁷ The R package MIIVsem (Fisher et al., 2017) automatically makes the L2O transformation for all equations, so analysts do not need to make the transformation described in this section manually.

⁸ The simulation illustrations generate all exogenous variables and errors from N(0,1) distributions with N = 1000. The population parameters for the first simulation are 1 for scaling indicator factor loadings, 0.8 for nonscaling indicator factor loadings, 0.2 for $L_{1} \to L_{2} \to L_{3}$, and 0.8 for $L_{1} \to L_{3}$. The discrepancies from these population values in Tables 6 and 7 are due to sampling fluctuations.

⁹ The simulation illustrations generate all exogenous variables and errors from N(0,1) distributions with N = 1000. The population parameters for the second simulation are 1 for scaling indicator factor loadings, 0.8 for nonscaling indicator factor loadings, 0.6 for $L_{1} \to L_{2} \to L_{3}$, and 0.3 for all pairs of correlated errors.

¹⁰ Parts of this appendix are taken from Bollen (2020a).

REFERENCES

Basmann RL (1957). A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica, 25, 77–83.
Bauldry S (2014). miivfind: A command for identifying model-implied instrumental variables for structural equation models in Stata. Stata Journal, 14, 60–75.
Bollen KA (1989). Structural equations with latent variables. New York: John Wiley & Sons. doi:10.1002/9781118619179
Bollen KA (1995). Structural Equation Models That are Nonlinear in Latent Variables: A Least-Squares Estimator. Sociological Methodology, 25, 223–251.
Bollen KA (1996a). An alternative two stage least squares (2SLS) estimator for latent variable equations. Psychometrika, 61(1), 109–121.
Bollen KA (1996b). A Limited Information Estimator for LISREL Models with and Without Heteroscedasticity. In Marcoulides GA & Schumacker RE (Eds.), Advanced Structural Equation Modeling: Issues and Techniques (pp. 227–241). Mahwah, NJ: Lawrence Erlbaum Associates.
Bollen KA (2001). Two-stage Least Squares and Latent Variable Models: Simultaneous Estimation and Robustness to Misspecifications. In Cudeck R, du Toit S, & Sörbom D (Eds.), Structural Equation Modeling: Present and Future, A Festschrift in Honor of Karl Jöreskog (pp. 119–138). Lincolnwood, IL: Scientific Software International.
Bollen KA (2012). Instrumental variables in sociology and the social sciences. Annual Review of Sociology, 38, 37–72.
Bollen KA (2019). Model Implied Instrumental Variables (MIIVs): An Alternative Orientation to Structural Equation Modeling. Multivariate Behavioral Research, 54(1), 31–46. doi:10.1080/00273171.2018.1483224
Bollen KA (2020a). Foundations of structural equation models [Unpublished manuscript]. University of North Carolina, Chapel Hill, NC.
Bollen KA (2020b). When good loadings go bad: Robustness in factor analysis. Structural Equation Modeling, 27(4), 515–524. doi:10.1080/10705511.2019.1691005
Bollen KA, & Bauer DJ (2004). Automating the Selection of Model-Implied Instrumental Variables. Sociological Methods & Research, 32(4), 425–452.
Bollen KA, & Bauldry S (2010). Model identification and computer algebra. Sociological Methods & Research, 39(2), 127–156.
Bollen KA, & Biesanz JC (2002). A note on a two-stage least squares estimator for higher-order factor analyses. Sociological Methods & Research, 30(4), 568–579.
Bollen KA, Kirby JB, Curran PJ, Paxton PM, & Chen F (2007). Latent Variable Models under Misspecification: Two Stage Least Squares (2SLS) and Maximum Likelihood (ML) Estimators. Sociological Methods & Research, 36(1), 46–86.
Bollen KA, Kolenikov S, & Bauldry S (2014). Model-Implied Instrumental Variable-Generalized Method of Moments (MIIV-GMM) Estimators for Latent Variable Models. Psychometrika, 79(1), 20–50.
Bollen KA, & Maydeu-Olivares A (2007). A Polychoric Instrumental Variable (PIV) Estimator for Structural Equation Models with Categorical Variables. Psychometrika, 72(3), 309–326.
Bollen KA, Gates KM, & Fisher Z (2018). Robustness Conditions for MIIV-2SLS When the Latent Variable or Measurement Model is Structurally Misspecified. Structural Equation Modeling: A Multidisciplinary Journal, 25(6), 848–859.
Bollen KA, & Paxton P (1998). Interactions of latent variables in structural equation models. Structural Equation Modeling, 5(3), 267–293. doi:10.1080/10705519809540105
Bollen KA, & Pearl J (2013). Eight myths about causality and structural equation models. In Morgan SL (Ed.), Handbook of causal analysis for social research (pp. 301–328). Dordrecht: Springer.
Bollen KA, & Stine RA (1990). Direct and Indirect Effects: Classical and Bootstrap Estimates of Variability. Sociological Methodology, 20(1), 115–140.
Bollen KA, & Stine RA (1993). Bootstrapping goodness-of-fit measures in structural equation models. In Bollen KA & Long JS (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.
Brandt H, Umbach N, Kelava A, & Bollen KA (2020). Comparing estimators for latent interaction models under structural and distributional misspecifications. Psychological Methods, 25(3), 321–345. doi:10.1037/met0000231
Browne MW (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37(1), 62–83.
Browne MW, & Cudeck R (1993). Alternative ways of assessing model fit. In Bollen KA & Long JS (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Chen B, Pearl J, & Kline R (2018). Graphical tools for linear path models (Technical Report R-469). Retrieved from UCLA, Department of Computer Science website: https://ftp.cs.ucla.edu/pub/stat_ser/r469.pdf
Culpepper SA, Aguinis H, & Kern JL (2019). High-Stakes Testing Case Study: A Latent Variable Approach for Assessing Measurement and Prediction Invariance. Psychometrika, 84, 285–309. doi:10.1007/s11336-018-9649-2
Fisher FM (1966). The Identification Problem in Econometrics. New York: McGraw Hill.
Fisher ZF, & Bollen KA (2020). An Instrumental Variable Estimator for Mixed Indicators: Analytic Derivatives and Alternative Parameterizations. Psychometrika, 85(3), 660–683.
Fisher Z, Bollen K, Gates K, & Rönkkö M (2017). MIIVsem: Model Implied Instrumental Variable (MIIV) Estimation of Structural Equation Models. R package version 0.5.2.
Fisher Z, Bollen KA, & Gates KM (2019). A Limited Information Estimator for Dynamic Factor Models. Multivariate Behavioral Research, 54(2), 246–263. doi:10.1080/00273171.2018.1519406
Folmer H (1981). Measurement of the Effects of Regional Policy Instruments by Means of Linear Structural Equation Models and Panel Data. Environment and Planning, 13(14), 35–48.
Frazier LD, Newman FL, & Jaccard J (2007). Psychosocial outcomes in later life: A multivariate model. Psychology and Aging, 22(4), 676–689. doi:10.1037/0882-7974.22.4.676
Gates KM, Fisher ZF, & Bollen KA (2020). Latent variable GIMME using model implied instrumental variables (MIIVs). Psychological Methods, 25(2), 227–242. doi:10.1037/met0000229
Gennetian LA, Magnuson K, & Morris PA (2008). From statistical associations to causation: What developmentalists can learn from instrumental variables techniques coupled with experimental data. Developmental Psychology, 44(2), 381–394. doi:10.1037/0012-1649.44.2.381
Giordano M, & Bollen KA (2020). Estimating and testing random intercept multilevel structural equation models with model implied instrumental variables. Unpublished paper.
Goldstein H, & McDonald RP (1988). A general model for the analysis of multilevel data. Psychometrika, 53(4), 455–467.
Gray CM, & Montgomery MJ (2012). Links between alcohol and other drug problems and maltreatment among adolescent girls: Perceived discrimination, ethnic identity, and ethnic orientation as moderators. Child Abuse & Neglect, 36(5), 449–460.
Greene WH (2012). Econometric Analysis (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Hayakawa H, & Sun Q (2019). Instrumental variable estimation of factor models with possibly many variables. Communications in Statistics - Simulation and Computation, 48(6), 1729–1745. doi:10.1080/03610918.2018.1423690
Holm S (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.
Jin S, & Cao C (2018). Selecting polychoric instrumental variables in confirmatory factor analysis: An alternative specification test and effects of instrumental variables. British Journal of Mathematical and Statistical Psychology, 71(2).
Jin S, Luo H, & Yang-Wallentin F (2016). A Simulation Study of Polychoric Instrumental Variable Estimation in Structural Equation Models. Structural Equation Modeling: A Multidisciplinary Journal, 23(5), 680–694.
Jöreskog KG (1973). A general method for estimating a linear structural equation system. In Goldberger AS & Duncan OD (Eds.), Structural equation models in the social sciences (pp. 85–112). New York, NY: Seminar Press.
Jöreskog KG (1977). Structural equation models in the social sciences: Specification, estimation and testing. In Krishnaiah PR (Ed.), Applications of statistics (pp. 265–286). Amsterdam: North Holland Publishing Co.
Jöreskog KG (2002). Structural Equation Modeling with Ordinal Variables using LISREL. doi:10.1214/lnms/1215463803
Kirby JB, & Bollen KA (2009). Using Instrumental Variable (IV) Tests to Evaluate Model Specification in Latent Variable Structural Equation Models. Sociological Methodology, 39(1), 327–355.
Kenny DA, & Milan S (2012). Identification: A non-technical discussion of a technical issue. In Hoyle R, Kaplan D, Marcoulides G, & West S (Eds.), Handbook of Structural Equation Modeling (pp. 145–163). New York: Guilford.
Kolenikov S, & Bollen KA (2012). Testing negative error variances: Is a Heywood case a symptom of misspecification? Sociological Methods & Research, 41(1), 124–167.
MacCallum RC, Browne MW, & Cai L (2007). Factor analysis models as approximations. In Cudeck R & MacCallum RC (Eds.), Factor analysis at 100: Historical developments and future directions (pp. 153–175). Lawrence Erlbaum Associates Publishers.
Maydeu-Olivares A, Shi D, & Fairchild AJ (2020). Estimating causal effects in linear regression models with observational data: The instrumental variables regression model. Psychological Methods, 25(2), 243–258. doi:10.1037/met0000226
Miller PJE, Niehuis S, & Huston TL (2006). Positive Illusions in Marital Relationships: A 13-Year Longitudinal Study. Personality and Social Psychology Bulletin, 32(12), 1579–1594. doi:10.1177/0146167206292691
Mooijaart A, & Satorra A (2009). On insensitivity of the chi-square model test to nonlinear misspecification in structural equation models. Psychometrika, 74, 443–455.
Muthén BO (1990). Mean and covariance structure analysis of hierarchical data. Paper presented at the Psychometric Society meeting in Princeton, NJ, June 1990. UCLA Statistics Series 62. Retrieved from https://www.statmodel.com/bmuthen/full_paper_list.htm
Muthén BO (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22(3), 376–398.
Nagar AL (1959). The bias and the moment matrix of the general k-class estimators of parameters in simultaneous equations. Econometrica, 27, 575–595.
Nestler S (2013). A Monte Carlo study comparing PIV, ULS and DWLS in the estimation of dichotomous confirmatory factor analysis. British Journal of Mathematical and Statistical Psychology, 66(1), 127–143.
Nestler S (2014a). How the 2SLS/IV estimator can handle equality constraints in structural equation models: A system-of-equations approach. British Journal of Mathematical and Statistical Psychology, 67(2), 353–369. doi:10.1111/bmsp.12023
Nestler S (2015a). Using Instrumental Variables to Estimate the Parameters in Unconditional and Conditional Second-Order Latent Growth Models. Structural Equation Modeling: A Multidisciplinary Journal, 22(3), 461–473. doi:10.1080/10705511.2014.934948
Nestler S (2015b). A Specification Error Test That Uses Instrumental Variables to Detect Latent Quadratic and Latent Interaction Effects. Structural Equation Modeling: A Multidisciplinary Journal, 22(4), 542–551.
Oczkowski E, & Farrell MA (1998). Discriminating between measurement scales using non-nested tests and two-stage least squares: The case of market orientation. International Journal of Research in Marketing, 15(4), 349–366.
Popper K (2005). The logic of scientific discovery. Routledge.
Quinn RW (2005). Flow in Knowledge Work: High Performance Experience in the Design of National Security Technology. Administrative Science Quarterly, 50(4), 610–641. doi:10.2189/asqu.50.4.610
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2012. URL http://www.R-project.org/
Reisenzein R (1986). A Structural Equation Analysis of Weiner’s Attribution-Affect Model of Helping Behavior. Journal of Personality and Social Psychology, 50(6), 1123–1133.
Rosseel Y (2020). Small sample solutions for structural equation modeling. In van de Schoot R & Miočević M (Eds.), Small sample size solutions: A guide for applied researchers and practitioners (pp. 226–238). London: Routledge.
Sargan JD (1958). The estimation of economic relationships using instrumental variables. Econometrica: Journal of the Econometric Society, 26, 393–415.
Satorra A (1990). Robustness issues in structural equation modeling: A review of recent developments. Quality and Quantity, 24, 367–386. doi:10.1007/BF00152011
Satorra A, & Bentler PM (1994). Corrections to test statistics and standard errors in covariance structure analysis. In von Eye A & Clogg CC (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Sage Publications, Inc.
Satorra MW, & Cudeck R (1993). Alternative ways of assessing model fit. In Bollen KA & Long JS (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Sideridis G, Simos P, Papanicolaou A, & Fletcher J (2014). Using Structural Equation Modeling to Assess Functional Connectivity in the Brain: Power and Sample Size Considerations. Educational and Psychological Measurement, 74(5), 733–758. doi:10.1177/0013164414525397
Snow DL, Swan SC, Raghavan C, Connell CM, & Klein I (2003). The relationship of work stressors, coping and social support to psychological symptoms among female secretarial employees. Work & Stress, 17(3), 241–263. doi:10.1080/02678370310001625630
Sobel M (1982). Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models. Sociological Methodology, 13(1), 290–312.
Textor J, Hardt J, & Knüppel S (2011). DAGitty: A graphical tool for analyzing causal diagrams. Epidemiology, 22(5), 745.
Theil H (1953a). Repeated least-squares applied to complete equation systems. The Netherlands: Central Planning Bureau (mimeographed).
Theil H (1953b). Estimation and simultaneous correlation in complete equation systems. The Netherlands: Central Planning Bureau (mimeographed).
Theil H (1954). Estimation of parameters in econometric models. Bulletin of the International Statistical Institute, 34, 122–129.
Theil H (1961). Economic forecasts and policy. Amsterdam: North-Holland Pub. Co.
Thoemmes F, Rosseel Y, & Textor J (2018). Local fit evaluation of structural equation models using graphical criteria. Psychological Methods, 23(1), 27–41. doi:10.1037/met0000147
Varangis E, Razlighi Q, Habeck C, Fisher Z, & Stern Y (2019). Between-network functional connectivity is modified by age and cognitive task domain. Journal of Cognitive Neuroscience, 34, 607–622. doi:10.1162/jocn_a_01368
Yang-Wallentin F, Jöreskog KG, & Luo H (2010). Confirmatory Factor Analysis of Ordinal Variables with Misspecified Models. Structural Equation Modeling: A Multidisciplinary Journal, 17(3), 392–423. doi:10.1080/10705511.2010.489003
Yuan KH, & Bentler PM (2007). Multilevel Covariance Structure Analysis by Fitting Multiple Single-Level Models. Sociological Methodology, 37(1), 53–82.
Yuan KH, Marshall LL, & Bentler PM (2003). Assessing the Effect of Model Misspecifications on Parameter Estimates in Structural Equation Models. Sociological Methodology, 33(1), 241–265.
Weiner B (1980). A Cognitive (Attribution)-Emotion-Action Model of Motivated Behavior: An Analysis of Judgements of Help-Giving. Journal of Personality and Social Psychology, 39(2), 186–200.
Wright S (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–580.


Supplementary Materials


NIHMS1674084-supplement-Supplementary.pdf (231.1 KB, PDF)

[R1] Basmann RL (1957). A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica 25, 77–83. [Google Scholar]

[R2] Bauldry S. (2014). miivfind: A command for identifying model-implied instrumental variables for structural equation models in Stata. Stata Journal, 14, 60–75. [Google Scholar]

[R3] Bollen KA (1989). Structural equations with latent variables. New York: John Wiley Sons. 10.1002/9781118619179 [DOI] [Google Scholar]

[R4] Bollen KA (1995). Structural Equation Models That are Nonlinear in Latent Variables: A Least-Squares Estimator. Sociological Methodology, 25, 223–251. [Google Scholar]

[R5] Bollen KA (1996a). An alternative two stage least squares (2sls) estimator for latent variable equations. Psychometrika, 61(1), 109–121. [Google Scholar]

[R6] Bollen KA (1996b). A Limited Information Estimator for LISREL Models with and Without Heteroscedasticity. In Marcoulides GA & Schumacker RE (Eds.), Advanced Structural Equation Modeling: Issues and Techniques (pp. 227–241). Mahwah, N.J.: Lawrence Erlbaum Associates. [Google Scholar]

[R7] Bollen KA (2001). Two-stage Least Squares and Latent Variable Models: Simultaneous Estimation and Robustness to Misspecifications. Pages 119–138 in Cudeck Robert, Stephen Du Toit, & Dag rbom(eds), Structural Equation Modeling: Present and Future, A Festschrift in Honor of Karl Jöreskog. (pp. 119–38). Lincolnwood, Ill.: Scientific Software International. [Google Scholar]

[R8] Bollen KA (2012). Instrumental variables in sociology and the social sciences. Annual Review of Sociology, 38, 37–72. [Google Scholar]

[R9] Bollen KA (2019). Model Implied Instrumental Variables (MIIVs): An Alternative Orientation to Structural Equation Modeling. Multivariate Behavioral Research, 54(1), 31–46. doi: 10.1080/00273171.2018.1483224. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R82] Bollen KA (2020a). Foundations of structural equation models [Unpublished manuscript]. University of North Carolina, Chapel Hill, NC. [Google Scholar]

[R83] Bollen KA (2020b). When good loadings go bad: Robustness in factor analysis. Structural Equation Modeling, 27(4), 515–524. 10.1080/10705511.2019.1691005 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] Bollen KA, & Bauer DJ (2004). Automating the Selection of Model-Implied Instrumental Variables. Sociological Methods & Research, 32(4), 425–452. [Google Scholar]

[R11] Bollen KA, & Bauldry S. (2010). Model identification and computer algebra. Sociological Methods & Research, 39(2), 127–156. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] Bollen KA, & Biesanz JC (2002). A note on a two-stage least squares estimator for higher-order factor analyses. Sociological Methods and Research, 30(4), 568–579. [Google Scholar]

[R13] Bollen KA, Kirby JB, Curran PJ, Paxton PM, & Chen F. (2007). Latent Variable Models under Misspecification: Two Stage Least Squares (2SLS) and Maximum Likelihood (ML) Estimators. Sociological Methods and Research 36(1), 46–86. [Google Scholar]

[R14] Bollen KA, Kolenikov S, and Bauldry S. (2014). Model-Implied Instrumental Variable-Generalized Method of Moments (MIIV-GMM) Estimators for Latent Variable Models. Psychometrika, 79(1), 20–50. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R15] Bollen KA & Maydeu-Olivares A. (2007). A Polychoric Instrumental Variable (PIV) Estimator for Structural Equation Models with Categorical Variables. Psychometrika, 72(3), 309–326. [Google Scholar]

[R16] Bollen KA, Gates KM, & Fisher Z. (2018). Robustness Conditions for MIIV-2SLS When the Latent Variable or Measurement Model is Structurally Misspecified. Structural Equation Modeling: A Multidisciplinary Journal, 25(6), 848–859. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] Bollen KA, & Paxton P. (1998). Interactions of latent variables in structural equation models. Structural Equation Modeling, 5(3), 267–293. 10.1080/10705519809540105 [DOI] [Google Scholar]

[R18] Bollen KA, & Pearl J. (2013). Eight myths about causality and structural equation models. In Morgan SL (Ed.) Handbook of causal analysis for social research (pp. 301–328). Springer, Dordrecht. [Google Scholar]

[R19] Bollen KA, & Stine RA (1990). Direct and Indirect Effects: Classical and Bootstrap Estimates of Variability. Sociological Methodology 20(1), 115–140. [Google Scholar]

[R20] Bollen KA, & Stine RA (1993). Bootstrapping goodness-of-fit measures in structural equation models. In Bollen KA, Long JS, (Eds.). Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage. [Google Scholar]

[R21] Brandt H, Umbach N, Kelava A, & Bollen KA (2020). Comparing estimators for latent interaction models under structural and distributional misspecifications. Psychological Methods. 25(3), 321–345. 10.1037/met0000231 [DOI] [PubMed] [Google Scholar]

[R22] Browne MW (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37(1), 62–83. [DOI] [PubMed] [Google Scholar]

[R23] Browne MW & Cudeck R. (1993). Alternative ways of assessing model fit. In Bollen KA & Long JS (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage. [Google Scholar]

[R24] Chen B, Pearl J, & Kline R. (2018). Graphical tools for linear path models. (Technical Report R-469) Retrieved from UCLA, Department of Computer Science website: https://ftp.cs.ucla.edu/pub/stat_ser/r469.pdf [Google Scholar]

[R25] Culpepper SA, Aguinis H, Kern JL (2019). High-Stakes Testing Case Study: A Latent Variable Approach for Assessing Measurement and Prediction Invariance. Psychometrika (84). 285–309. 10.1007/s11336-018-9649-2 [DOI] [PubMed] [Google Scholar]

[R26] Fisher FM (1966). The Identification Problem in Econometrics. New York: McGraw Hill. [Google Scholar]

[R27] Fisher ZF, & Bollen KA (2020). An Instrumental Variable Estimator for Mixed Indicators: Analytic Derivatives and Alternative Parameterizations. Psychometrika, 85(3), 660–683. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] Fisher Z, Bollen K, Gates K, and Rönkkö M. (2017). Miivsem: Model Implied Instrumental Variable (MIIV) Estimation of Structural Equation Models. R package version 0.5.2. [Google Scholar]

[R29] Fisher Z, Bollen KA, & Gates KM (2019). A Limited Information Estimator for Dynamic Factor Models. Multivariate Behavioral Research, 54(2), 246–263. 10.1080/00273171.2018.1519406

[R30] Folmer H. (1981). Measurement of the Effects of Regional Policy Instruments by Means of Linear Structural Equation Models and Panel Data. Environment and Planning, 13(14), 35–48.

[R31] Frazier LD, Newman FL, & Jaccard J. (2007). Psychosocial outcomes in later life: A multivariate model. Psychology and Aging, 22(4), 676–689. 10.1037/0882-7974.22.4.676

[R84] Gates KM, Fisher ZF, & Bollen KA (2020). Latent variable GIMME using model implied instrumental variables (MIIVs). Psychological Methods, 25(2), 227–242. 10.1037/met0000229

[R32] Gennetian LA, Magnuson K, & Morris PA (2008). From statistical associations to causation: What developmentalists can learn from instrumental variables techniques coupled with experimental data. Developmental Psychology, 44(2), 381–394. 10.1037/0012-1649.44.2.381

[R33] Giordano M, & Bollen KA (2020). Estimating and testing random intercept multilevel structural equation models with model implied instrumental variables. Unpublished paper.

[R34] Goldstein H, & McDonald RP (1988). A general model for the analysis of multilevel data. Psychometrika, 53(4), 455–467.

[R35] Gray CM, & Montgomery MJ (2012). Links between alcohol and other drug problems and maltreatment among adolescent girls: Perceived discrimination, ethnic identity, and ethnic orientation as moderators. Child Abuse & Neglect, 36(5), 449–460.

[R36] Greene WH (2012). Econometric Analysis (7th ed.). Upper Saddle River, NJ: Prentice Hall.

[R37] Hayakawa H, & Sun Q. (2019). Instrumental variable estimation of factor models with possibly many variables. Communications in Statistics - Simulation and Computation, 48(6), 1729–1745. 10.1080/03610918.2018.1423690

[R38] Holm S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.

[R39] Jin S, & Cao C. (2018). Selecting polychoric instrumental variables in confirmatory factor analysis: An alternative specification test and effects of instrumental variables. British Journal of Mathematical and Statistical Psychology, 71(2).

[R40] Jin S, Luo H, & Yang-Wallentin F. (2016). A Simulation Study of Polychoric Instrumental Variable Estimation in Structural Equation Models. Structural Equation Modeling: A Multidisciplinary Journal, 23(5), 680–694.

[R41] Jöreskog KG (1973). A general method for estimating a linear structural equation system. In Goldberger AS & Duncan OD (Eds.), Structural equation models in the social sciences (pp. 85–112). New York, NY: Seminar Press.

[R42] Jöreskog KG (1977). Structural equation models in the social sciences: Specification, estimation and testing. In Krishnaiah PR (Ed.), Applications of statistics (pp. 265–286). Amsterdam: North Holland Publishing Co.

[R43] Jöreskog KG (2002). Structural Equation Modeling with Ordinal Variables using LISREL. 10.1214/lnms/1215463803

[R44] Kirby JB, & Bollen KA (2009). Using Instrumental Variable (IV) Tests to Evaluate Model Specification in Latent Variable Structural Equation Models. Sociological Methodology, 39(1), 327–355.

[R45] Kenny DA, & Milan S. (2012). Identification: A non-technical discussion of a technical issue. In Hoyle R, Kaplan D, Marcoulides G, & West S. (Eds.), Handbook of Structural Equation Modeling (pp. 145–163). New York: Guilford.

[R46] Kolenikov S, & Bollen KA (2012). Testing negative error variances: Is a Heywood case a symptom of misspecification? Sociological Methods & Research, 41(1), 124–167.

[R47] MacCallum RC, Browne MW, & Cai L. (2007). Factor analysis models as approximations. In Cudeck R. & MacCallum RC (Eds.), Factor analysis at 100: Historical developments and future directions (pp. 153–175). Lawrence Erlbaum Associates Publishers.

[R48] Maydeu-Olivares A, Shi D, & Fairchild AJ (2020). Estimating causal effects in linear regression models with observational data: The instrumental variables regression model. Psychological Methods, 25(2), 243–258. 10.1037/met0000226

[R49] Miller PJE, Niehuis S, & Huston TL (2006). Positive Illusions in Marital Relationships: A 13-Year Longitudinal Study. Personality and Social Psychology Bulletin, 32(12), 1579–1594. 10.1177/0146167206292691

[R50] Mooijaart A, & Satorra A. (2009). On insensitivity of the chi-square model test to nonlinear misspecification in structural equation models. Psychometrika, 74, 443–455.

[R51] Muthén BO (1990). Mean and covariance structure analysis of hierarchical data. Paper presented at the Psychometric Society meeting in Princeton, NJ, June 1990. UCLA Statistics Series 62. Retrieved from https://www.statmodel.com/bmuthen/full_paper_list.htm

[R52] Muthén BO (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22(3), 376–398.

[R53] Nagar AL (1959). The bias and the moment matrix of the general k-class estimators of parameters in simultaneous equations. Econometrica, 27, 575–595.

[R54] Nestler S. (2013). A Monte Carlo study comparing PIV, ULS and DWLS in the estimation of dichotomous confirmatory factor analysis. British Journal of Mathematical and Statistical Psychology, 66(1), 127–143.

[R55] Nestler S. (2014a). How the 2SLS/IV estimator can handle equality constraints in structural equation models: A system-of-equations approach. British Journal of Mathematical and Statistical Psychology, 67(2), 353–369. 10.1111/bmsp.12023

[R56] Nestler S. (2015a). Using Instrumental Variables to Estimate the Parameters in Unconditional and Conditional Second-Order Latent Growth Models. Structural Equation Modeling: A Multidisciplinary Journal, 22(3), 461–473. 10.1080/10705511.2014.934948

[R57] Nestler S. (2015b). A Specification Error Test That Uses Instrumental Variables to Detect Latent Quadratic and Latent Interaction Effects. Structural Equation Modeling: A Multidisciplinary Journal, 22(4), 542–551.

[R58] Oczkowski E, & Farrell MA (1998). Discriminating between measurement scales using non-nested tests and two-stage least squares: The case of market orientation. International Journal of Research in Marketing, 15(4), 349–366.

[R59] Popper K. (2005). The logic of scientific discovery. Routledge.

[R60] Quinn RW (2005). Flow in Knowledge Work: High Performance Experience in the Design of National Security Technology. Administrative Science Quarterly, 50(4), 610–641. 10.2189/asqu.50.4.610

[R61] R Core Team (2019). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/

[R62] Reisenzein R. (1986). A Structural Equation Analysis of Weiner’s Attribution-Affect Model of Helping Behavior. Journal of Personality and Social Psychology, 50(6), 1123–1133.

[R63] Rosseel Y. (2020). Small sample solutions for structural equation modeling. In van de Schoot R, & Miočević M. (Eds.), Small sample size solutions: A guide for applied researchers and practitioners (pp. 226–238). London: Routledge.

[R64] Sargan JD (1958). The estimation of economic relationships using instrumental variables. Econometrica: Journal of the Econometric Society, 26, 393–415.

[R85] Satorra A. (1990). Robustness issues in structural equation modeling: A review of recent developments. Quality and Quantity, 24, 367–386. 10.1007/BF00152011

[R65] Satorra A, & Bentler PM (1994). Corrections to test statistics and standard errors in covariance structure analysis. In von Eye A. & Clogg CC (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Sage Publications, Inc.

[R67] Sideridis G, Simos P, Papanicolaou A, & Fletcher J. (2014). Using Structural Equation Modeling to Assess Functional Connectivity in the Brain: Power and Sample Size Considerations. Educational and Psychological Measurement, 74(5), 733–758. 10.1177/0013164414525397

[R68] Snow DL, Swan SC, Raghavan C, Connell CM, & Klein I. (2003). The relationship of work stressors, coping and social support to psychological symptoms among female secretarial employees. Work & Stress, 17(3), 241–263. 10.1080/02678370310001625630

[R69] Sobel M. (1982). Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models. Sociological Methodology, 13(1), 290–312.

[R70] Textor J, Hardt J, & Knüppel S. (2011). DAGitty: a graphical tool for analyzing causal diagrams. Epidemiology, 22(5), 745.

[R71] Theil H. (1953a). Repeated least-squares applied to complete equation systems. The Netherlands: Central Planning Bureau (mimeographed).

[R72] Theil H. (1953b). Estimation and simultaneous correlation in complete equation systems. The Netherlands: Central Planning Bureau (mimeographed).

[R73] Theil H. (1954). Estimation of parameters in econometric models. Bulletin of the International Statistical Institute, 34, 122–129.

[R74] Theil H. (1961). Economic forecasts and policy. Amsterdam: North-Holland Pub. Co.

[R75] Thoemmes F, Rosseel Y, & Textor J. (2018). Local fit evaluation of structural equation models using graphical criteria. Psychological Methods, 23(1), 27–41. 10.1037/met0000147

[R76] Varangis E, Razlighi Q, Habeck C, Fisher Z, & Stern Y. (2019). Between-network functional connectivity is modified by age and cognitive task domain. Journal of Cognitive Neuroscience, 31, 607–622. 10.1162/jocn_a_01368

[R77] Yang-Wallentin F, Jöreskog KG, & Luo H. (2010). Confirmatory Factor Analysis of Ordinal Variables with Misspecified Models. Structural Equation Modeling: A Multidisciplinary Journal, 17(3), 392–423. 10.1080/10705511.2010.489003

[R78] Yuan KH, & Bentler PM (2007). Multilevel Covariance Structure Analysis by Fitting Multiple Single-Level Models. Sociological Methodology, 37(1), 53–82.

[R79] Yuan KH, Marshall LL, & Bentler PM (2003). Assessing the Effect of Model Misspecifications on Parameter Estimates in Structural Equation Models. Sociological Methodology, 33(1), 241–265.

[R80] Weiner B. (1980). A Cognitive (Attribution)-Emotion-Action Model of Motivated Behavior: An Analysis of Judgements of Help-Giving. Journal of Personality and Social Psychology, 39(2), 186–200.

[R81] Wright S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.


LATENT TO OBSERVED (L2O) VARIABLE TRANSFORMATION¹⁰