Author manuscript; available in PMC: 2020 Jan 1.
Published in final edited form as: Multivariate Behav Res. 2018 Sep 17;54(1):31–46. doi: 10.1080/00273171.2018.1483224

MODEL IMPLIED INSTRUMENTAL VARIABLES (MIIVs): AN ALTERNATIVE ORIENTATION TO STRUCTURAL EQUATION MODELING*

Kenneth A Bollen 1
PMCID: PMC6693517  NIHMSID: NIHMS1037926  PMID: 30222004

INTRODUCTION

Universal agreement is a rarity in life. And so it is among scholars using Structural Equation Models (SEMs). But there are two views of SEMs that enjoy a near consensus. The first is that SEMs approximate reality. The approximations manifest themselves in two primary forms: distributional misspecifications and structural misspecifications. Distributional misspecifications occur when the distributional assumptions that underlie an estimator fail to hold exactly. For instance, the Maximum Likelihood (ML) system wide estimator for factor analysis and SEMs was originally derived under the assumption that the observed variables came from a multivariate normal distribution (e.g., Lawley, 1940; Jöreskog, 1977), but this assumption rarely holds. Structural misspecifications are the second type of approximation. When paths, variables, correlated errors, or nonlinearities are incorrectly omitted from a model, we have structural misspecifications. Most would readily agree that distributional and structural misspecifications are the norm in SEMs (e.g., Bollen, 1989, pp. 67–72; Browne & Cudeck, 1993; McDonald, 2010).

A second point of near consensus is that the system wide ML estimator is the optimal estimator for SEMs with continuous endogenous variables. This consensus is based on the desirable properties that accompany ML estimation. Specifically, the ML estimator is consistent, asymptotically unbiased, asymptotically efficient, asymptotically normal, and it provides asymptotic standard errors (e.g., Silvey, 1975; Browne, 1984; Bollen, 1989). The value of these properties is great. Having an estimator that is unbiased and efficient in large samples enhances the accuracy of our estimates. And to perform significance tests of parameter estimates, we need standard errors. But there is some “fine print” that underlies these claims of ML’s optimal properties. Among other things, we must assume: 1) a correctly specified model, 2) multivariate normality (or no “excessive multivariate kurtosis”), and 3) a sufficiently large sample for the asymptotic properties to take hold.

In brief, it is widely accepted that SEMs approximate reality and that the ML estimator is the best estimator for continuous dependent variables. Trouble occurs when we combine these two areas of near consensus. If we accept that SEMs are approximations and look more closely at the ML estimator, most of the optimal properties of ML are undermined. An approximate model means that the model has some structural misspecifications, but the consistency, asymptotic unbiasedness, asymptotic efficiency, and asymptotic standard errors are derived assuming no structural misspecifications.

There are other problems with the ML estimator that deserve comment. One is the tendency for the ML estimator to spread the effects of structural misspecifications beyond the equations of the model in which they occur (e.g., Bollen, Kirby, Curran, Paxton, & Chen, 2007). We can have a multiequation model where all but a few equations are specified without structural errors, but the ML estimator can spread the errors of one or two equations to other equations that are correctly specified.

Another problem with the ML estimator occurs when the model as a whole is underidentified, yet the key equations in the system are identified. That is, some of the equations that are underidentified are not important to the research, but their presence can prevent developing estimates of the coefficients in the key equations that are identified.

Nonconvergence is a third problem that occurs with the ML estimator. The ML estimator typically requires an iterative solution. Sometimes a small sample, too few indicators per latent variable, or large amounts of missing data lead to convergence problems that prevent obtaining estimates of parameters (e.g., Anderson & Gerbing, 1984; Bollen, 1989, pp. 254–56; Boomsma, 1985). Increasing the number of iterations does not always help.

Distributional misspecifications create some complications, but these are less serious than the structural misspecifications in that when the distribution is the only problem, we have corrections for nonnormality that enable significance tests (e.g., Bollen & Stine, 1993; Satorra & Bentler, 1994). These test statistic corrections include the likelihood ratio chi square test of overall model fit. But even with corrections, there are other issues. For instance, given the structural misspecifications that are common in SEMs, the chi square test should reject the model once it has sufficient statistical power. Indeed, in large samples a statistically significant chi square is common as it should be when there are structural misspecifications. Locating the source of the problem is challenging. With many equations, variables, and parameters it is difficult to know whether the problems lie with the measurement model or the latent variable model. Simulation studies have suggested problems with using the Lagrangian Multiplier statistic (“Modification Index”) to lead the researcher to the correct structure (e.g., MacCallum, 1986; MacCallum, Roznowski, & Necowitz, 1992).

These issues imply that it would be desirable to have an estimator that is less likely to spread structural specification errors from a few equations to other well-specified equations. It also would be helpful to have an estimator that could estimate a key identified equation even if the model as a whole were not identified. Along with these local estimates of the coefficients of an identified equation, it would be advantageous to have diagnostic test statistics that applied to every overidentified equation in the model. If the estimator were noniterative, then we could avoid the issue of nonconvergence. Finally, it would be valuable if the distributional assumptions to justify it were less restrictive than those of the ML estimator.

In point of fact, there is such an estimator. The purpose of this paper is to describe the Model Implied Instrumental Variable, Two Stage Least Squares (MIIV-2SLS) estimator that meets these needs. Using the MIIV-2SLS estimator leads to an alternative approach to SEMs, while sharing some valuable characteristics such as the ability to handle multiple indicators and latent variables. The MIIV-2SLS was developed over a number of years starting with Bollen (1995; 1996) and this paper consolidates some of the key properties in a single place. My purpose is to provide an overview of these advances rather than to develop new properties or features. It is intended to be didactic. The paper also will illustrate a new R-package, MIIVsem (Fisher, Bollen, Gates & Rönkkö, 2017) that implements the MIIV-2SLS estimator.

In the next section, I will present the primary steps or “ingredients” required in the MIIV-2SLS approach to SEMs. Many of the steps are introduced via a running example. After these steps, I turn to the robustness of the MIIV-2SLS with respect to distributional and structural misspecifications with more emphasis on the latter than the former. The next two sections are on extensions of and open questions about the MIIV-2SLS estimator. This is followed by the conclusion.

PRIMARY INGREDIENTS

It is useful to highlight the different ingredients that go into the MIIV-2SLS approach to SEMs. Some steps are the same as the ML and more traditional approaches to SEMs while others differ. I highlight five steps: 1) Specify the model, 2) Transform from a Latent to an Observed (L2O) variables model, 3) Find the Model Implied Instrumental Variables (MIIVs), 4) Estimate with Two Stage Least Squares (2SLS), and, 5) Test each overidentified equation. The subsections that follow provide more details on each step.

1. SPECIFY MODEL

Researchers must develop a model that summarizes their best understanding of the relationships between the latent variables that represent their substantive concepts and the measures of these latent variables. In the first step, researchers lay out the latent variable and measurement models. This step is the same as the usual approach to SEMs. The latent variable model1 includes the relationship between all latent variables as well as which, if any, of the errors or disturbances in these equations correlate. The measurement model formulates the relationship between the latent variables and the indicators that measure them, as well as any covariances of the unique factors (errors).

General matrix representations of the latent variable and measurement model such as the LISREL model are widely known and I will not repeat them here. Rather I will introduce an empirical example that will serve to illustrate model specification.

Figure 1 is the path diagram of the model. The model represents the effects of industrialization in 1960 on political democracy in 1960 and 1965 and the effect of democracy in 1960 on democracy in 1965. The same four indicators measure political democracy at both times. There are three indicators of industrialization. More details on this model are in Bollen (1989). The sample consists of 75 industrializing countries. Boxes enclose observed variables (Zs), circles signify latent variables (Ls), and the unenclosed variables are errors or disturbances (εs). The single headed arrows signify a direct effect between two variables. The curved two headed arrows show covariances among the variables connected by them. The latent variables are: L1 = Industrialization at 1960, L2 = Political Democracy at 1960, and L3 = Political Democracy at 1965. The Z1 to Z11 are indicators of L1 to L3 as shown in the path diagram.2

Figure 1.

Industrialization and Political Democracy Panel Data Model

Corresponding to the path diagram is a system of equations for the latent variable and measurement model. The equation form of the model is important to the MIIV-2SLS approach, so I provide these equations below:

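Latent Variable Model

$$
\begin{aligned}
L_2 &= \alpha_{L_2} + B_{21}L_1 + \varepsilon_{L_2} \\
L_3 &= \alpha_{L_3} + B_{31}L_1 + B_{32}L_2 + \varepsilon_{L_3}
\end{aligned}
\tag{1}
$$

Measurement Model

$$
\begin{aligned}
Z_1 &= L_1 + \varepsilon_{Z_1} & Z_4 &= L_2 + \varepsilon_{Z_4} & Z_8 &= L_3 + \varepsilon_{Z_8} \\
Z_2 &= \alpha_{Z_2} + \Lambda_{21}L_1 + \varepsilon_{Z_2} & Z_5 &= \alpha_{Z_5} + \Lambda_{52}L_2 + \varepsilon_{Z_5} & Z_9 &= \alpha_{Z_9} + \Lambda_{93}L_3 + \varepsilon_{Z_9} \\
Z_3 &= \alpha_{Z_3} + \Lambda_{31}L_1 + \varepsilon_{Z_3} & Z_6 &= \alpha_{Z_6} + \Lambda_{62}L_2 + \varepsilon_{Z_6} & Z_{10} &= \alpha_{Z_{10}} + \Lambda_{10,3}L_3 + \varepsilon_{Z_{10}} \\
 & & Z_7 &= \alpha_{Z_7} + \Lambda_{72}L_2 + \varepsilon_{Z_7} & Z_{11} &= \alpha_{Z_{11}} + \Lambda_{11,3}L_3 + \varepsilon_{Z_{11}}
\end{aligned}
\tag{2}
$$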

Standard assumptions include that the mean of each error variable is zero; all errors (ε s) are uncorrelated with the latent exogenous variable, L1, and are uncorrelated with each other unless connected by a curved two headed arrow. To keep things simple, I also assume that the errors are homoscedastic and not autocorrelated across cases. In addition, I choose one indicator for each latent variable to set the scale of the latent variable. Following Bollen (1989, pp. 152–53, 306–11), I set the intercept of the scaling indicator to zero and its factor loading to one. These or some other restrictions are needed to identify the model. As I will show in the next subsection, this choice of scaling also facilitates the replacement of latent variables with observed variables. The α s in the equations represent the nonzero intercepts where the subscript identifies the equation to which the intercept belongs. The Bs and Λs are the regression coefficients and factor loadings, respectively, where their subscripts describe the variables to which they correspond.

Although the best method for choosing scaling indicators is an understudied problem in the SEM literature in general, common practice suggests that we choose the indicator that is most closely related to its latent variable and that loads exclusively on that latent variable. Sometimes previous research reveals which indicators meet these criteria. Other times, researchers need to rely on their best judgment in selecting scaling indicators.

2. TRANSFORM LATENT TO OBSERVED (L2O) VARIABLE MODEL

This second step of the MIIV-2SLS method differs from the usual approach to SEM. It involves replacing all latent variables in the model with their respective scaling indicators minus their errors. The end result is a new model that has no latent variables (L s), but does have a more complex error structure for each equation. This is the latent to observed variable (L2O) transformation. This step is needed because instrumental variable methods nearly always operate with observed variables rather than latent ones. Because latent variables are by definition not in data sets, I transform the equation so that they are replaced by observed variables. Even though this results in the replacement of latent variables with observed variables, the intercepts, regression coefficients, and factor loadings in the L2O form remain the same as they were in the original model with latent variables. So if I obtain estimates of these parameters from the L2O equations, this provides me estimates of the same parameters from the original latent variable and measurement models.

To clarify this step, I return to the latent variable model for the industrialization-democracy example in equation (1). Below I list the equation for the scaling indicator for each latent variable:

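$$
\begin{aligned}
Z_1 &= L_1 + \varepsilon_{Z_1} \\
Z_4 &= L_2 + \varepsilon_{Z_4} \\
Z_8 &= L_3 + \varepsilon_{Z_8}.
\end{aligned}
\tag{3}
$$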

I rewrite each of these equations to solve for each latent variable resulting in,

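$$
\begin{aligned}
L_1 &= Z_1 - \varepsilon_{Z_1} \\
L_2 &= Z_4 - \varepsilon_{Z_4} \\
L_3 &= Z_8 - \varepsilon_{Z_8}.
\end{aligned}
\tag{4}
$$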

Equation (4) shows that each latent variable is equivalent to its scaling indicator minus the error for the scaling indicator. I can therefore replace each latent variable by its scaling indicator minus its error.

Consider the L2 equation from the latent variable model [see (1)]:

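$$
\begin{aligned}
L_2 &= \alpha_{L_2} + B_{21}L_1 + \varepsilon_{L_2} \\
\Rightarrow\; Z_4 &= \alpha_{L_2} + B_{21}Z_1 + u_4
\end{aligned}
\tag{5}
$$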

where u4 = −B21εZ1 +εZ4 +εL2. Equation (5) shows that I can rewrite the L2 latent variable equation as an observed variable equation with a more complex error structure. In a similar fashion, I rewrite the latent variable equation for L3 as an observed variable equation,

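$$
\begin{aligned}
L_3 &= \alpha_{L_3} + B_{31}L_1 + B_{32}L_2 + \varepsilon_{L_3} \\
\Rightarrow\; Z_8 &= \alpha_{L_3} + B_{31}Z_1 + B_{32}Z_4 + u_8
\end{aligned}
\tag{6}
$$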

where u8 = −B31εZ1 − B32εZ4 + εZ8 + εL3.
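The substitution behind equation (6) can be written out step by step. The block below is only a worked restatement that uses the L3 equation in (1) and the solved scaling equations in (4); it introduces no new assumptions or symbols.

```latex
\begin{aligned}
L_3 &= \alpha_{L_3} + B_{31}L_1 + B_{32}L_2 + \varepsilon_{L_3} \\
Z_8 - \varepsilon_{Z_8} &= \alpha_{L_3} + B_{31}\,(Z_1 - \varepsilon_{Z_1}) + B_{32}\,(Z_4 - \varepsilon_{Z_4}) + \varepsilon_{L_3} \\
Z_8 &= \alpha_{L_3} + B_{31}Z_1 + B_{32}Z_4
      + \underbrace{\left(-B_{31}\varepsilon_{Z_1} - B_{32}\varepsilon_{Z_4} + \varepsilon_{Z_8} + \varepsilon_{L_3}\right)}_{u_8}
\end{aligned}
```

The same three steps applied to the L2 equation give u4 in equation (5).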

An appealing aspect of the L2O transformation is that it eliminates the latent variables from the latent variable equations. In the L2O form the equations have the appearance of regression equations with observed dependent variables and one or more observed explanatory variables. As such, it might be tempting to use Ordinary Least Squares (OLS) to estimate these equations. The problem with so doing is that these equations violate a key assumption of the OLS estimator. That is, the error of each equation correlates with the explanatory variable. For instance, consider equation (5). The error term is u4 = −B21εZ1 +εZ4 +εL2. The term εZ1 is the error for Z1 and clearly correlates with this variable in violation of a key OLS assumption. Similarly, εZ1 and εZ4 from the composite error in equation (6) correlate with Z1 and Z4 and hence discourage the use of OLS.
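A small simulation can make the problem with OLS concrete. The following base R sketch is illustrative only: the sample size, the coefficient B21 = 1, the zero intercept, and all error variances are assumed values chosen for convenience, not quantities from the industrialization and democracy data.

```r
# Hypothetical data generated from the L2 equation and its scaling indicators
set.seed(123)
N   <- 5000
B21 <- 1.0                              # assumed true coefficient
L1  <- rnorm(N)                         # latent exogenous variable, variance 1
L2  <- B21 * L1 + rnorm(N, sd = 0.5)    # latent equation (intercept set to 0)
Z1  <- L1 + rnorm(N)                    # scaling indicator of L1, error variance 1
Z4  <- L2 + rnorm(N, sd = 0.5)          # scaling indicator of L2

# OLS applied directly to the L2O equation Z4 = alpha + B21 * Z1 + u4
ols_slope <- unname(coef(lm(Z4 ~ Z1))["Z1"])

# u4 contains -B21 * eZ1 and Z1 contains eZ1, so the OLS slope is pulled toward
# B21 * var(L1) / (var(L1) + var(eZ1)) = 0.5 under the assumed variances
print(ols_slope)
```

Under these assumed variances the OLS slope settles near 0.5 rather than the true value of 1, which is the correlation between Z1 and the composite error at work.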

Instrumental variables provide a method by which researchers can estimate a regression equation when one or more explanatory variables correlates with the equation error. The Model Implied Instrumental Variable (MIIV) method from Bollen (1996) provides a means by which we can find instrumental variables from among the observed variables in a model. Finding the MIIVs is the next Primary Ingredient and I turn to this now.

3. FIND THE MODEL IMPLIED INSTRUMENTAL VARIABLES (MIIVs)

The L2O transformation has created a set of equations where the composite error correlates with the right hand side explanatory variables. As stated in the previous section, this violates a key assumption of OLS. But instrumental variables were designed to work in such situations. Sewall Wright (1925) and/or his father Philip Wright (1928) are credited with developing the idea of using instrumental variables to estimate statistical models (see e.g., Goldberger, 1972; Angrist & Krueger, 2001; Stock & Trebbi, 2003; Bollen, 2012). Key properties of instrumental variables are that they are uncorrelated with the error of the equation and are sufficiently correlated with the problematic explanatory variables that correlate with the error. More formally, define the explanatory variables in the jth equation to be in an N × K1 matrix Zj, the error to be in an N × 1 vector uj, and the instruments for the equation to be in an N × K2 matrix Vj where N is the number of cases, K1 is the number of explanatory variables plus the intercept in the equation, and K2 is the number of instrumental variables plus the intercept where K2 ≥ K1. Then the instruments should have the following properties: (1) Vj is uncorrelated with uj, (2) the covariance matrix of Vj is nonsingular, and (3) the rank of the covariance matrix of Zj and Vj is K1 + K2.

The overwhelming practice among analysts using instrumental variable estimators is to search for instruments among variables external to the model. Elsewhere I have referred to this as the auxiliary variables approach to instruments (Bollen, 2012). The MIIV approach proposed in Bollen (1996) differs in that it finds the instrumental variables among the observed variables that are already part of the model. In general, if a SEM is identified, then there are sufficient MIIVs to estimate each equation. Indeed, for some equations there are more MIIVs than the minimum required and this renders the equation overidentified. Because of this there is no need to introduce ad hoc or auxiliary instrumental variables into the model.3 The MIIVs follow because the structure of the model implies which observed variables are uncorrelated with the equation disturbance and which are not.

Bollen (1996, p. 114) describes two ways in which researchers can find the MIIVs for an equation. One method uses the model structure to form the correlation of each observed variable with each error term. If the correlations of an observed variable with all the errors of an equation are zero, then it is a possible MIIV. The second method looks at the observed variables that are directly or indirectly affected by each element of the composite error and any errors that correlate with these errors. I use the second approach here. In outline, the algorithm is to (Bollen, 1996):

  1. Focus on one equation at a time,

  2. Find the direct and indirect effects on the observed variables in the model of each error that is part of the composite error in the L2O form of the equation,

  3. Eliminate from consideration any observed variables found in step (2),

  4. Find the direct and indirect effects on the observed variables in the model of any errors that are correlated with the composite errors in the L2O form of the equation,

  5. Eliminate from consideration any observed variables found in step (4),

  6. The remaining observed variables are the MIIVs for the given equation.
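The six steps can be sketched in a few lines of base R for the running example. This is a minimal illustration, not the algorithm as implemented in any of the published programs: the list `effects`, the matrix `corr`, and the helper functions `reach()` and `find_miivs()` are hypothetical names introduced here, the error labels (`eZ1`, `eL2`, and so on) are shorthand for the ε terms, and the error covariances are read off the `~~` lines of the model syntax in Table 2.

```r
# Direct effects in the Figure 1 model: each name points to what it affects
effects <- list(
  L1  = c("Z1", "Z2", "Z3", "L2", "L3"),
  L2  = c("Z4", "Z5", "Z6", "Z7", "L3"),
  L3  = c("Z8", "Z9", "Z10", "Z11"),
  eL2 = "L2",
  eL3 = "L3"
)
observed <- paste0("Z", 1:11)
for (z in observed) effects[[paste0("e", z)]] <- z   # each eZk affects Zk

# Pairs of correlated errors
corr <- rbind(c("eZ4", "eZ8"), c("eZ5", "eZ7"), c("eZ5", "eZ9"),
              c("eZ6", "eZ10"), c("eZ7", "eZ11"), c("eZ9", "eZ11"))

# Every variable reached directly or indirectly from a starting node
reach <- function(node) {
  found <- character(0)
  queue <- node
  while (length(queue) > 0) {
    kids  <- setdiff(effects[[queue[1]]], found)
    found <- c(found, kids)
    queue <- c(queue[-1], kids)
  }
  found
}

find_miivs <- function(composite_error) {
  # Steps 4-5: collect errors correlated with any part of the composite error
  partners <- c(corr[corr[, 1] %in% composite_error, 2],
                corr[corr[, 2] %in% composite_error, 1])
  errors   <- union(composite_error, partners)
  # Steps 2-3: drop every observed variable influenced by any of these errors
  tainted  <- unique(unlist(lapply(errors, reach)))
  # Step 6: whatever is left is a MIIV
  setdiff(observed, tainted)
}

miiv_L2 <- find_miivs(c("eZ1", "eZ4", "eL2"))          # L2O equation for Z4
miiv_L3 <- find_miivs(c("eZ1", "eZ4", "eZ8", "eL3"))   # L2O equation for Z8
print(miiv_L2)   # "Z2" "Z3"
print(miiv_L3)   # "Z2" "Z3" "Z5" "Z6" "Z7"
```

Representing the model as a list of direct effects keeps step (2) a plain graph traversal, so the same helper works for measurement equations once their composite errors are written down.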

To make this algorithm more concrete I return to the second latent variable equation in (1) from the industrialization and democracy example given in Figure 1. The latent variable equation, its L2O transformation, and the composite error are:

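$$
\begin{aligned}
L_2 &= \alpha_{L_2} + B_{21}L_1 + \varepsilon_{L_2} \\
\Rightarrow\; Z_4 &= \alpha_{L_2} + B_{21}Z_1 + u_4 \quad \text{with } u_4 = -B_{21}\varepsilon_{Z_1} + \varepsilon_{Z_4} + \varepsilon_{L_2}.
\end{aligned}
\tag{7}
$$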

Now I need to find the direct and indirect effects of εZ1, εZ4, and εL2 on the observed variables in the model. I start with εL2 and return to the path diagram of the model shown as Figure 2.

Figure 2.

Tracing the Direct and Indirect Effects (in gray) of εL2 on the Observed Variables in the Industrialization and Democracy Example.

In the path diagram I indicate in gray all of the variables that are directly or indirectly influenced by εL2. The observed variables highlighted in gray are eliminated as MIIVs because the nonzero indirect effects of εL2 on them imply that these variables are correlated with εL2 and hence are not eligible as instruments.

The next step is to consider the effects of εZ1 and εZ4 on any observed variables in the model. These are the remaining two parts of the composite error (u4). Figure 3 shows the direct effects of these errors in gray. The εZ1 and εZ4 variables eliminate Z1 and Z4 as MIIVs because these are the random errors for these two variables. Because εZ4 correlates with εZ8 and εZ8 has a direct effect on Z8, the Z8 variable also is ineligible as a MIIV. Combining these two figures, the only observed variables not eliminated and eligible to be MIIVs for this latent variable equation are Z2 and Z3.

Figure 3.

Tracing the Direct and Indirect Effects (in gray) of εZ1 and εZ4 on the Observed Variables in the Industrialization and Democracy Example.

The latent variable equation for political democracy at 1965 (L3) along with its L2O transformed equation are,

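$$
\begin{aligned}
L_3 &= \alpha_{L_3} + B_{31}L_1 + B_{32}L_2 + \varepsilon_{L_3} \\
\Rightarrow\; Z_8 &= \alpha_{L_3} + B_{31}Z_1 + B_{32}Z_4 + u_8 \quad \text{with } u_8 = -B_{31}\varepsilon_{Z_1} - B_{32}\varepsilon_{Z_4} + \varepsilon_{Z_8} + \varepsilon_{L_3}
\end{aligned}
\tag{8}
$$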

Finding the MIIVs for this L2O equation requires finding the observed variables that are not directly or indirectly influenced by εZ1, εZ4, εZ8, or εL3 or by any errors correlated with these errors. Doing so shows that Z2, Z3, Z5, Z6, and Z7 are the MIIVs for this equation. Similarly, an analyst could find the MIIVs for each of the measurement equations.

Fortunately, several researchers have automated this search for MIIVs. Bollen and Bauer (2004) developed a SAS macro to implement the search algorithm for MIIVs described in Bollen (1996). Bauldry (2014) created the miivfind program to use with Stata. Fisher et al. (2017) implemented a procedure based on the correlation of the observed variables with the composite error in a much broader and more flexible R program that, in addition to finding the MIIVs, permits estimation and testing of such models.

This subsection explained the search for MIIVs for each equation in a SEM. It was illustrated for two latent variable equations in the running example to explain how it works. But in practice a researcher has several programs to choose from to automate the procedure. Once the MIIVs are in hand, the next step is to estimate the coefficients.

4. ESTIMATE EQUATIONS WITH TWO STAGE LEAST SQUARES (2SLS)

Econometricians have developed a number of instrumental variable estimators for simultaneous equations without latent variables or sometimes with an emphasis on a single observed variable equation where the error is suspected to correlate with the explanatory variables. There is some literature on latent variable estimation with instruments (e.g., Madansky, 1964; Hägglund, 1982), but the focus has been exploratory factor analysis with uncorrelated errors. In keeping with the other subsections, I center my presentation on the MIIV method developed in Bollen (1996) and the use of the Two Stage Least Squares (2SLS) estimator for these models. Theil (1953a, 1953b, 1961), Basmann (1957), and Sargan (1958) are credited with inventing 2SLS, but in contexts that differ from latent variable SEMs. However, the L2O transformation from Bollen (1996) permits researchers to use 2SLS for my purposes. Recall that if I can estimate the coefficients from the L2O transformed equation, then I have in hand estimates of the intercepts, factor loadings, and coefficients from the original latent variable and measurement models.

To present the 2SLS estimator, it is helpful to define the matrices for one of the L2O equations where Yj is an N ×1 vector containing the values of the dependent variable for the jth L2O equation, Zj is the matrix of explanatory variables on the right hand side of the same jth L2O equation, and Vj is the matrix of MIIVs for the same jth L2O equation. The MIIV-2SLS estimator of the coefficients is,

$$
(\hat{Z}_j'\hat{Z}_j)^{-1}\hat{Z}_j'Y_j \quad \text{where} \quad \hat{Z}_j = V_j(V_j'V_j)^{-1}V_j'Z_j \tag{9}
$$

If the L2O equation is from the latent variable model, then equation (9) provides coefficient and intercept estimates for a latent variable equation. If it is from the measurement model, then equation (9) contains the factor loading and measurement intercept estimates.

To illustrate this, I return to the latent variable equation with latent political democracy (L2) regressed on latent industrialization (L1). The original latent variable equation is L2 = αL2 + B21L1 + εL2. The L2O transformed form of this equation is Z4 = αL2 + B21Z1 + u4 with MIIVs of Z2 and Z3. For this equation, I have:

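$$
Y_j = \begin{bmatrix} Z_{41} \\ Z_{42} \\ \vdots \\ Z_{4N} \end{bmatrix}, \quad
Z_j = \begin{bmatrix} 1 & Z_{11} \\ 1 & Z_{12} \\ \vdots & \vdots \\ 1 & Z_{1N} \end{bmatrix}, \quad
V_j = \begin{bmatrix} 1 & Z_{21} & Z_{31} \\ 1 & Z_{22} & Z_{32} \\ \vdots & \vdots & \vdots \\ 1 & Z_{2N} & Z_{3N} \end{bmatrix}. \tag{10}
$$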

The 2SLS estimator of the coefficients is $(\hat{Z}_j'\hat{Z}_j)^{-1}\hat{Z}_j'Y_j$ where $\hat{Z}_j = V_j(V_j'V_j)^{-1}V_j'Z_j$.

The other equations from the L2O forms of the latent variable and measurement models would be set up in an analogous fashion.
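The matrix formula in equation (9) translates directly into base R. The sketch below applies it to simulated data laid out as in equation (10); the sample size, the true values (intercept 0.5, B21 = 1), the loadings of Z2 and Z3, and the error variances are all assumptions made for illustration and are unrelated to the empirical example.

```r
# Hypothetical data with the structure of the L2 equation
set.seed(2018)
N     <- 5000
alpha <- 0.5                              # assumed true intercept
B21   <- 1.0                              # assumed true coefficient
L1 <- rnorm(N)
L2 <- alpha + B21 * L1 + rnorm(N, sd = 0.5)
Z1 <- L1 + rnorm(N)                       # scaling indicator of L1
Z2 <- 0.8 * L1 + rnorm(N)                 # other indicators of L1 serve as MIIVs
Z3 <- 1.2 * L1 + rnorm(N)
Z4 <- L2 + rnorm(N, sd = 0.5)             # scaling indicator of L2

# Matrices as in equation (10)
Y <- matrix(Z4, ncol = 1)
Z <- cbind(1, Z1)
V <- cbind(1, Z2, Z3)

# Equation (9): first stage fitted values, then the second stage coefficients
Zhat <- V %*% solve(crossprod(V), crossprod(V, Z))    # V (V'V)^{-1} V'Z
b    <- solve(crossprod(Zhat), crossprod(Zhat, Y))    # (Zhat'Zhat)^{-1} Zhat'Y

ols <- coef(lm(Z4 ~ Z1))                  # for contrast: OLS on the same equation
print(drop(b))                            # 2SLS intercept and slope
print(ols)
```

With these assumed values the 2SLS estimates land close to 0.5 and 1, while the OLS slope is attenuated. Using `solve(A, B)` rather than forming an explicit inverse is a numerical convenience and does not change the estimator.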

Table 1 provides a summary of the properties of the MIIV-2SLS and the ML estimators.4 The estimators have many of the same asymptotic properties in that both estimators are consistent, asymptotically unbiased, asymptotically normal, and have asymptotic standard errors for significance testing. They also differ in several ways. Unlike ML, the MIIV-2SLS estimator of factor loadings and regression coefficients is noniterative so that there are no issues with nonconvergence. The explicit formula in equation (9) gives the coefficients. In addition, the derivation of MIIV-2SLS does not rely on normality assumptions (see Bollen, 1996) whereas that of ML does. Another difference is that the MIIV-2SLS estimator has a diagnostic test for each overidentified equation whereas ML has a single test for the overall model fit.

Table 1.

Comparison of properties of MIIV-2SLS and ML estimators of Structural Equation Models without structural misspecification.

Comparison                  MIIV-2SLS   ML
Consistency                 Yes         Yes
Asymp. unbiased             Yes         Yes
Asymp. normal               Yes         Yes
Asymp. efficient            Yes*        Yes
Asymp. std. errors          Yes         Yes
Noniterative                Yes         No
Nonnormal robust            Yes         No**
Single equation estimable   Yes         No
Overidentification test     Equation    Model

* Efficient among limited information estimators
** Corrections for nonnormality available

Many of the desirable properties in Table 1 assume that the model is without structural misspecifications. This contradicts the idea that all models are approximations and as I noted in the beginning of this article the estimators lose these optimal asymptotic properties when structural misspecifications are present. Structural misspecifications can affect both estimators, but Monte Carlo simulations suggest that MIIV-2SLS better isolates the structural misspecifications than does the ML estimator (e.g., Bollen et al., 2007). I will explore robustness to structural misspecifications later in this paper, but suffice it to say that the MIIV-2SLS appears to have greater robustness than does the ML estimator.

The asymptotic efficiency of the MIIV-2SLS and ML system wide estimators is another property that is sometimes discussed. The MIIV-2SLS estimator is asymptotically efficient among single equation estimators whereas the ML estimator is asymptotically efficient among system wide estimators. There are several points to bring out in this comparison. First, asymptotic efficiency refers to large samples and is silent on these properties in finite samples. Second, these properties assume that a researcher uses the correct model and that the distributional assumptions of the estimator are satisfied. But the very idea of approximate models means that both these assumptions are violated to some degree. Therefore the approximate nature of models undermines these asymptotic properties. Their actual properties depend on the model and the nature of the misspecifications.

Another point is that even under the ideal conditions that analysts can create in a simulation where both the model and distributional assumptions hold, the asymptotic properties do not often show much difference in biases or variances between limited information estimators like 2SLS and system wide ones like ML in finite samples. For example, in the context of equations without latent variables, the well-known econometrician William Greene (2012, p. 334) states: “The upshot would appear to be that the advantage of the systems estimators in finite samples may be more modest than the asymptotic results would suggest. Monte Carlo studies of the issue have tended to reach the same conclusion.”

Bollen et al. (2007) find the same thing when they compare the MIIV-2SLS and ML estimator of latent variable SEM in a simulation under the conditions of a perfectly valid model generated with variables from normal distributions. If anything, these conditions should favor the ML estimator, but their results find few differences between MIIV-2SLS and ML in valid models.

Another contrast between MIIV-2SLS and the ML estimator is that with the ML estimator all equations are estimated simultaneously whereas with MIIV-2SLS a researcher can estimate only one equation or all of the equations. If, for example, primary interest lies in the latent variable model, then a researcher can estimate only those equations without estimating the measurement model. Or if the model as a whole is underidentified, but the key equations are identified, a researcher can estimate the identified ones with MIIV-2SLS without having to be concerned with the underidentified parts of the model.

To illustrate the MIIV-2SLS estimates I return to the running example of industrialization and political democracy.5 These estimates are obtainable with any statistical software package that has a 2SLS estimator (e.g., SAS, Stata, SPSS). However, the R package MIIVsem (Fisher et al., 2017) is specifically designed for the MIIV-2SLS estimator. Table 2 contains the computer code to run the complete latent variable and measurement model for the industrialization and political democracy example in Figure 1. Users who are familiar with the R lavaan package (Rosseel, 2012) will see that the specification of the model in MIIVsem is similar to that of lavaan. I will not provide a line-by-line explanation of the input code, but refer readers to Fisher et al. (2017) for more explanation. Rather I will discuss some of the output of MIIVsem. With the industrialization and political democracy example I have concentrated on the latent variable model and I will do the same for the output, extracting that part which estimates the latent variable equations’ parameters.

Table 2.

MIIVsem Code in R to Read in Data and to Run the Industrialization and Political Democracy Empirical Example Shown in Figure 1.

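library("MIIVsem")
# Use the bollen1989a data and rename its columns to the Z labels of Figure 1
data <- bollen1989a
colnames(data) <- c(paste0("Z", 4:11), paste0("Z", 1:3))
# Measurement model (=~), latent variable model (~), and error covariances (~~)
model <- '
   L2 =~ Z4 + Z5 + Z6 + Z7
   L3 =~ Z8 + Z9 + Z10 + Z11
   L1 =~ Z1 + Z2 + Z3
   L2  ~ L1
   L3  ~ L1 + L2
   Z4 ~~ Z8
   Z5 ~~ Z7 + Z9
   Z6 ~~ Z10
   Z7 ~~ Z11
   Z9 ~~ Z11
'
# Estimate the model equations with MIIV-2SLS
miive(model, data)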

Table 3 extracts the results from the two latent variable equations of the model. The results give the regression coefficients and the intercepts from the two latent variable equations when estimated with MIIV-2SLS. The first column on the left gives the equations to which the estimates refer. For instance, the “L2 ~ L1” rows are the results from regressing the latent variable of political democracy at 1960 (L2) on industrialization at 1960 (L1). The regression coefficient estimate is 1.261, which gives the expected difference in political democracy (L2) for a one unit difference in industrialization (L1). The asymptotic standard error is 0.426 and the ratio of the coefficient to this standard error gives a “z-value” for significance testing. The p-value for a two tailed test of the null hypothesis that the coefficient is zero is 0.003, shown in the next column. Similarly, if I move down a row in the output I find the coefficients for “L3 ~ L1 L2”, which is 1965 political democracy (L3) regressed on industrialization (L1) and political democracy (L2), both at 1960. Both coefficients are positive and statistically significant. The last two rows of the output extract are the intercepts for each equation along with their asymptotic standard errors, z-values, and p-values.

Table 3.

MIIV-2SLS Estimates from MIIVsem Output for the Industrialization and Political Democracy Empirical Example (N=75)

STRUCTURAL COEFFICIENTS:
          Estimate  Std.Err  z-value  P(>|z|)  Sargan  df  P(Chi)
L2 ~
  L1         1.261    0.426    2.962    0.003   0.503   1   0.478
L3 ~
  L1         1.123    0.312    3.598    0.000   0.801   3   0.849
  L2         0.724    0.101    7.140    0.000
INTERCEPTS:
  L2        −0.909    2.170   −0.419    0.675
  L3        −4.499    1.424   −3.160    0.002

The last three columns of the output are less familiar. These are overidentification test statistics for the two latent variable equations, the topic of the next subsection.

5. TESTS OF EACH OVERIDENTIFIED EQUATION

Recall that MIIV-2SLS requires MIIVs for each equation in the L2O transformation of the SEM. At a minimum, there should be as many MIIVs as there are explanatory variables for that equation. It is not unusual to have more MIIVs than the minimum in latent variable SEM. In this situation, it is possible to test one of the assumptions of the MIIVs. That is, researchers can test the null hypothesis that all MIIVs are uncorrelated with the equation’s composite error versus the alternative hypothesis that at least one of the MIIVs correlates with the error. This hypothesis is valuable in that it provides a test of the model structure. The reason is that the model structure is what led me to choose particular observed variables as MIIVs and others as not. They are chosen based on the model structure and if this structure is true, then all MIIVs would be uncorrelated with the equation error. So empirical evidence that leads us to reject the null hypothesis is evidence against the model structure.

There are a variety of test statistics for instrumental variables being uncorrelated with the equation error. Kirby and Bollen (2009) looked at an assortment of these and found that many of them performed well in large samples, but that the Sargan (1958) test statistic had the best performance across different sample sizes. MIIVsem incorporates the Sargan test statistic as part of its standard output. The formula for Sargan’s test statistic (Ts) is,

$$T_s = \frac{\hat{u}' V (V'V)^{-1} V' \hat{u}}{\hat{u}'\hat{u}/N} \qquad (11)$$

where $\hat{u}$ = 2SLS residuals, $V$ = MIIVs, and $N$ = sample size. Ts asymptotically follows a chi square distribution with degrees of freedom equal to the number of MIIVs minus the number of explanatory variables in the equation.
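To make the formula concrete, here is a minimal base R sketch that computes 2SLS estimates and the Sargan statistic for a single equation. It is not the MIIVsem implementation. The function name `sargan_2sls`, the simulated data, and all population values are assumptions chosen for illustration. The simulated setup loosely mimics a latent variable equation in which a scaling indicator (Z4) is regressed on another scaling indicator (Z1), with two further indicators (Z2, Z3) serving as instruments. A column of ones is included in both the regressor and instrument matrices, so the degrees of freedom still equal the number of MIIVs minus the number of explanatory variables.

```r
set.seed(1)
N  <- 5000
L1 <- rnorm(N)                             # latent exogenous variable
L2 <- 2 + 1.2 * L1 + rnorm(N, sd = 0.7)    # latent equation; assumed slope 1.2
Z1 <- L1 + rnorm(N, sd = 0.5)              # scaling indicator of L1
Z2 <- 0.5 + 0.9 * L1 + rnorm(N, sd = 0.5)  # further indicators of L1
Z3 <- 1.1 * L1 + rnorm(N, sd = 0.5)
Z4 <- L2 + rnorm(N, sd = 0.5)              # scaling indicator of L2

sargan_2sls <- function(y, X, V) {
  # First stage: project the regressors onto the instruments.
  X_hat <- V %*% solve(crossprod(V), crossprod(V, X))
  # Second stage: 2SLS coefficients.
  b <- solve(crossprod(X_hat, X), crossprod(X_hat, y))
  # Residuals are formed with the observed regressors, not the fitted ones.
  u  <- as.vector(y - X %*% b)
  Vu <- crossprod(V, u)
  # Numerator u'V(V'V)^{-1}V'u divided by u'u/N.
  Ts <- as.numeric(t(Vu) %*% solve(crossprod(V), Vu)) / (sum(u^2) / length(u))
  df <- ncol(V) - ncol(X)
  list(coef = as.vector(b), Ts = Ts, df = df,
       p = pchisq(Ts, df, lower.tail = FALSE))
}

X <- cbind(1, Z1)        # intercept plus one explanatory variable
V <- cbind(1, Z2, Z3)    # intercept plus two instruments
res <- sargan_2sls(Z4, X, V)
print(res)
```

Because the simulated instruments are valid by construction, the statistic should behave like a draw from a chi square distribution with 1 degree of freedom.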

With this as background, I return to the last three columns in Table 3 to interpret the Sargan test for the two latent variable equations. The Sargan test statistic for the political democracy at 1960 (L2) equation is only 0.503 with 1 degree of freedom (df) and a p-value of 0.478. There is 1 df because there is only one explanatory variable and there are two MIIVs. This results in one more MIIV than the minimum needed. The lack of significance of the Sargan test means that I cannot reject the null hypothesis that both MIIVs are uncorrelated with the equation error. Similarly, if I move to the political democracy at 1965 (L3) latent variable equation, I also find that the Sargan test statistic is not significant and I cannot reject the null hypothesis that all of the MIIVs are uncorrelated with the equation error. Thus, both equations from the latent variable model are consistent with the model structure.
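The p-values in the last column can be recovered from the reported Sargan statistics and their degrees of freedom. The short R sketch below does this using the rounded values from Table 3; the object names are illustrative.

```r
# Sargan statistics and degrees of freedom reported for the two equations.
sargan_stat <- c(L2 = 0.503, L3 = 0.801)
sargan_df   <- c(L2 = 1,     L3 = 3)

# Upper-tail chi square probabilities.
p_values <- pchisq(sargan_stat, df = sargan_df, lower.tail = FALSE)
print(round(p_values, 3))
```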

Though I have only shown the two equations for the latent variable model, there is similar output and test statistics for all overidentified equations in the measurement model. It is quite possible for some equations to pass the Sargan test while others do not. This can help locate possible problems with the model specification. Of course, this test statistic, like others, is subject to cautionary notes having to do with statistical power and multiple testing. False discovery rates or Bonferroni-type corrections are possible ways to take account of multiple testing.
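As a sketch of how such corrections might be applied, the base R function `p.adjust` supports both Bonferroni and Benjamini-Hochberg (false discovery rate) adjustments. The example uses only the two Sargan p-values from Table 3; in practice the vector would presumably hold the p-values from every overidentified equation that is tested.

```r
# Sargan p-values for the two latent variable equations (from Table 3).
p_raw <- c(L2 = 0.478, L3 = 0.849)

p_bonferroni <- p.adjust(p_raw, method = "bonferroni")  # familywise error control
p_fdr        <- p.adjust(p_raw, method = "BH")          # false discovery rate control

print(rbind(raw = p_raw, bonferroni = p_bonferroni, fdr = p_fdr))
```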

SUMMARY OF PRIMARY INGREDIENTS

The preceding subsection completes the summary of the primary ingredients that go into the MIIV approach to SEMs. The five ingredients are:

  1. Specify Model

  2. Transform from Latent to Observed (L2O) variable model

  3. Find Model Implied Instrumental Variables (MIIVs, pronounced to rhyme with “gives”)

  4. Estimate with Two Stage Least Squares (2SLS)

  5. Test each overidentified equation
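As a hedged illustration of the second and third ingredients, consider the latent variable equation for L2 in the running example. The notation below ($\alpha_2$, $\beta_{21}$, $\zeta_2$, $\varepsilon_1$, $\varepsilon_4$) is assumed for this sketch and may not match the symbols used elsewhere in the paper. It also assumes that Z1 and Z4 are the scaling indicators of L1 and L2, with intercepts of zero and loadings of one.

```latex
\begin{align*}
L_2 &= \alpha_2 + \beta_{21} L_1 + \zeta_2
      && \text{latent variable equation} \\
Z_1 &= L_1 + \varepsilon_1, \quad Z_4 = L_2 + \varepsilon_4
      && \text{scaling indicators} \\
Z_4 &= \alpha_2 + \beta_{21} Z_1 + \left(\zeta_2 + \varepsilon_4 - \beta_{21}\varepsilon_1\right)
      && \text{substitute } L_1 = Z_1 - \varepsilon_1,\; L_2 = Z_4 - \varepsilon_4
\end{align*}
```

The explanatory variable Z1 correlates with the composite error through $\varepsilon_1$, which is why instruments are needed. Under the model, observed variables such as Z2 and Z3 depend on L1 but not on any component of the composite error, so they qualify as MIIVs for this equation. This is consistent with the two MIIVs and 1 degree of freedom reported for the L2 equation.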

In this paper, I have emphasized the use of the 2SLS estimator with MIIVs. There are other estimators that researchers can use with MIIVs. For instance, there are additional limited information estimators that could be developed. There also is a scalable estimator based on the Generalized Method of Moments (GMM) that applies to one, two, or any subset of equations from a full SEM. This MIIV-GMM estimator is described in Bollen, Kolenikov, and Bauldry (2014).

When using the MIIV-2SLS estimator the equation by equation method has some advantages. As mentioned previously, if the equation of most interest is identified, the MIIV-2SLS estimator permits estimates of its coefficient even if the whole model is not identified. Furthermore, if the equation is overidentified, the Sargan diagnostic test is available and researchers can assess whether the MIIVs are uncorrelated with the equation error and hence test whether an assumption implied by the model structure holds. In addition, if the key equation passes the Sargan test, this is evidence that the equation estimates are robust to structural misspecifications in other parts of the model. In other words, structural misspecifications that occur elsewhere in the model structure need not impact the MIIV-2SLS estimator for a given equation. The MIIV-2SLS estimator is robust to some of these structural misspecifications and passing the Sargan test is evidence that is consistent with the robustness for the tested equation.

Are there examples of this robustness for the MIIV-2SLS estimator? It is to this topic that I now turn.

ROBUSTNESS

Analysts use the term “robustness” of an estimator in at least two different senses. One is whether an estimator is robust to violations of the distributional assumptions that might have justified the estimator. For instance, if the ML estimator is based on assuming that the observed variables come from a multinormal distribution, then do the same properties hold if they come from nonnormal distributions? A second sense of robustness concerns whether the properties of the estimator hold when there are structural misspecifications such as omitting variables, omitting paths, or failing to include correlated errors.

The asymptotic properties of the MIIV-2SLS estimator that I described in an earlier section do not depend on the observed variables coming from a normal distribution (see Bollen, 1996). In this sense, the MIIV-2SLS estimator is a “distribution-free” estimator. In addition, it is easy to bootstrap the MIIV-2SLS estimator to develop an alternative estimate of its standard errors for coefficients using MIIVsem (Fisher et al., 2017). However, of more interest and more neglected is the robustness of an estimator to structural misspecifications. The literature on robustness to structural misspecification is sparse and those papers that broach the topic focus on the ML system wide estimator (e.g., Kaplan, 1989; Kaplan & Wenger, 1993; Yuan, Marshall, & Bentler, 2003; Yuan, Douros, & Kelley, 2008).
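The bootstrap idea can be sketched in base R without relying on any particular package interface. The example below is an illustration under stated assumptions, not the MIIVsem bootstrap. The data are simulated, one error term is drawn from a heavier-tailed t distribution to mimic nonnormality, and the helper `slope_2sls` is a hypothetical name. Cases are resampled with replacement, the 2SLS slope is re-estimated in each resample, and the standard deviation of the resampled slopes serves as a bootstrap standard error.

```r
set.seed(2)
N  <- 500
L1 <- rnorm(N)
L2 <- 1.2 * L1 + rnorm(N, sd = 0.7)       # assumed population slope of 1.2
dat <- data.frame(
  Z1 = L1 + rnorm(N, sd = 0.5),           # scaling indicator of L1
  Z2 = 0.9 * L1 + rnorm(N, sd = 0.5),     # instruments
  Z3 = 1.1 * L1 + rnorm(N, sd = 0.5),
  Z4 = L2 + 0.5 * rt(N, df = 5)           # scaling indicator of L2, nonnormal error
)

# 2SLS slope for Z4 regressed on Z1 with Z2 and Z3 as instruments.
slope_2sls <- function(d) {
  X <- cbind(1, d$Z1)
  V <- cbind(1, d$Z2, d$Z3)
  X_hat <- V %*% solve(crossprod(V), crossprod(V, X))
  b <- solve(crossprod(X_hat, X), crossprod(X_hat, d$Z4))
  b[2]
}

# Nonparametric bootstrap: resample rows with replacement and re-estimate.
B <- 500
boot_slopes <- replicate(B, slope_2sls(dat[sample.int(N, replace = TRUE), ]))

print(c(estimate = slope_2sls(dat), boot_se = sd(boot_slopes)))
```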

A general analytic result on when the MIIV-2SLS is robust to structural misspecifications is in Bollen (2001, p. 130):

“Suppose that for the j-th equation in the correctly specified model, the model-implied IVs are in a matrix Vj. The 2SLS estimator of the coefficients in Aj is robust for any misspecifications in other equations under two conditions: (1) the equation being estimated is correctly specified, and (2) the misspecifications in the other equations do not alter the variables in Vj.” In essence, this says that the MIIV-2SLS estimator for an equation is robust when the equation is correctly specified and the structural misspecifications in the other equations do not change the MIIVs for that equation.

A few examples can illustrate the power of these robustness conditions and the benefit of having them. To start with I return to the industrialization and political democracy example as shown in Figure 4. Assume that the correct structural equation model includes both the solid and the dashed lines shown in Figure 4. A researcher mistakenly omits the six pairs of correlated errors. This researcher’s primary interest is in the latent variable model. The question is what happens to the MIIV-2SLS estimates of the coefficients of the latent variable model in the correctly and incorrectly specified SEMs?

Figure 4.

Industrialization and Political Democracy Example with Correlated Errors Omitted (dashed lines); “True” Model includes Dashed and Solid Lines

Both models were estimated using the MIIVsem R package. Table 4 extracts the coefficients and intercepts for the “true” model6 and the one that omits all correlated errors and places them side-by-side in the labeled columns. The latent variable coefficients and intercepts are identical. In other words, the two equations for the latent variable model are structurally robust to the misspecification of omitting the six pairs of correlated errors. Not shown are that the standard errors and other equation output for each equation are identical. The reason is that the two latent variable equations satisfy the robustness conditions in Bollen (2001). The MIIVs are identical whether I assume the model with or without the correlated errors.

Table 4.

Comparison of Regression Coefficient and Intercept MIIV-2SLS Estimates for the Latent Variable Equations for True and Two Structurally Misspecified Equations

Latent Variable Equation    “True” Model    Omits 6 Correlated Errors    Omits 28 Correlated Errors & 3 Crossloadings
Regression Coefficients
  L2 regressed on L1            1.261              1.261                        1.261
  L3 regressed on L1            1.123              1.123                        1.123
  L3 regressed on L2            0.724              0.724                        0.724
Intercepts
  L2                           −0.909             −0.909                       −0.909
  L3                           −4.499             −4.499                       −4.499

An even more dramatic demonstration of robustness comes in Figure 5. The dashed line with the short arrows pointing to the error variables represents all possible covariances among the connected errors. Here I assume that the true model has 28 correlated errors that are omitted, as well as three omitted cross loadings from L2 to Z9, Z10, and Z11. By most everyone’s standards, omitting this many parameters is a major structural misspecification. Yet the last column of Table 4, which shows the MIIV-2SLS estimates from the MIIVsem R package, reveals that the latent variable model regression coefficients and intercepts are robust to these structural misspecifications. They remain the same as they were in the true model.

Figure 5.

Industrialization and Political Democracy Example with 28 Correlated Errors Omitted (dashed lines) and Three Crossloadings Omitted; “True” Model includes Dashed and Solid Lines

Another form of structural misspecification is having the incorrect number of dimensions underlying a set of indicators. Figure 6 gives a simple illustration where the left part gives the true model with two dimensions and three indicators per dimension. These are simulated data (N=1000) in which all population factor loadings are set to 1. The right part shows a SEM where a researcher mistakenly assumes that a single dimension rather than two underlies the indicators. It is interesting to see the consequences of applying the MIIV-2SLS estimator to the true and misspecified models.

Figure 6.

Two Factor True Model (left) and Misspecified One Dimension Model (right).

Table 5 contains the MIIV-2SLS estimates of factor loadings and the Sargan test statistics for both models. Comparing the second and fourth columns of MIIV-2SLS factor loading estimates, it is surprising to find that the factor loading estimates for those indicators that load on the first factor are robust to the wrong dimensionality. That is, I can correctly estimate the influence of the first factor on its indicators even if my model has the wrong number of factors. The explanation is that the Z2 and Z3 equations have the same MIIVs under either model structure and these equations meet the robustness conditions in Bollen (2001). In contrast, the equations for the last three indicators, Z4 to Z6, are incorrectly specified and do not meet these conditions. Hence, their estimates are not robust.

Table 5.

Comparison of MIIV-2SLS Factor Loading Estimates and Sargan Overidentification Tests for True and Misspecified Measurement Models

Measurement Equation        Correct Two Factor Model                     Incorrect One Factor Model
Indicator       Factor Loadings   Sargan Test Statistic (df=3)   Factor Loadings   Sargan Test Statistic (df=3)
Z1                   1.000                -----                       1.000                -----
Z2                   1.034*               0.555                       1.034*               0.555
Z3                   0.964*               1.048                       0.964*               1.048
Z4                   1.000                -----                       0.363*               246.7*
Z5                   1.043*               0.247                       0.359*               264.0*
Z6                   1.144*               0.858                       0.405*               273.8*

Note: * = p<0.001

Each estimated equation in both models is overidentified in that it has more MIIVs than the minimum required. This permits the Sargan overidentification test of whether all MIIVs of the equation are uncorrelated with the equation error. The third and fifth columns of Table 5 contain the results of applying this test to the estimated equations using the MIIVsem R package. Recall that these are chi square test statistics and in this model all tests have 3 degrees of freedom. The most striking result of these tests is that they are all nonsignificant for the two factor correct model while they are all highly significant for Z4 to Z6 in the incorrect one factor model.7 It suggests that there is a correlation between at least some of the MIIVs of each of these significant equations and their respective error. This result is interesting because Z4 to Z6 are the three indicators that should load on the second factor rather than the first. Hence, this test statistic is a useful diagnostic for a possible source of the problem.
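The pattern in Table 5 can be reproduced qualitatively with a small base R simulation. The sketch below is not the simulation behind Table 5. The factor correlation of 0.3, the unit error variances, the seed, and the helper name `miiv_2sls` are all assumptions, so the numbers will differ from the table. What it illustrates is that the Z2 equation uses the same explanatory variable and the same MIIVs under both models, whereas the Z5 equation changes its explanatory variable and MIIVs when the one factor model is wrongly assumed.

```r
set.seed(4)
N   <- 1000
rho <- 0.3                                    # assumed factor correlation
F1  <- rnorm(N)
F2  <- rho * F1 + sqrt(1 - rho^2) * rnorm(N)
# All loadings equal 1; unit error variances are an assumption.
Z <- cbind(F1, F1, F1, F2, F2, F2) + matrix(rnorm(6 * N), N, 6)
colnames(Z) <- paste0("Z", 1:6)

# 2SLS loading estimate and Sargan test for one measurement equation.
miiv_2sls <- function(y, x, miivs) {
  X <- cbind(1, Z[, x])
  V <- cbind(1, Z[, miivs])
  X_hat <- V %*% solve(crossprod(V), crossprod(V, X))
  b  <- solve(crossprod(X_hat, X), crossprod(X_hat, Z[, y]))
  u  <- as.vector(Z[, y] - X %*% b)
  Vu <- crossprod(V, u)
  Ts <- as.numeric(t(Vu) %*% solve(crossprod(V), Vu)) / (sum(u^2) / N)
  df <- ncol(V) - ncol(X)
  list(coef = b[2], Ts = Ts, df = df, p = pchisq(Ts, df, lower.tail = FALSE))
}

# Z2 equation: identical specification under the two and one factor models.
z2 <- miiv_2sls("Z2", "Z1", c("Z3", "Z4", "Z5", "Z6"))
# Z5 equation under the correct two factor model (scaling indicator Z4).
z5_true <- miiv_2sls("Z5", "Z4", c("Z1", "Z2", "Z3", "Z6"))
# Z5 equation under the incorrect one factor model (scaling indicator Z1).
z5_wrong <- miiv_2sls("Z5", "Z1", c("Z2", "Z3", "Z4", "Z6"))

print(rbind(z2 = unlist(z2), z5_true = unlist(z5_true), z5_wrong = unlist(z5_wrong)))
```

Under these assumptions, the Sargan statistic for the misspecified Z5 equation should be far larger than the one for the correctly specified equation, in line with the diagnostic use of the test described above.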

SUMMARY OF MIIV-2SLS ROBUSTNESS

In this section I have discussed the conditions of robustness when using the MIIV-2SLS estimator. Bollen (2001) and Bollen, Fisher, and Gates (2018) present general conditions for robustness that I illustrated. MIIV-2SLS is robust when the MIIVs and the equation from a structurally misspecified model are the same as the MIIVs for the structurally correct model even if other equations are misspecified. Another point I must emphasize is that MIIV-2SLS is not robust for all equations in a structurally misspecified model. The last example that had the correct two factor model versus incorrect one factor model illustrates this. In the incorrect one factor model the two indicator equations for Z2 and Z3 were robust and provided accurate estimates of their factor loadings on the first factor. However, the MIIV-2SLS estimates for the factor loadings of Z4 to Z6 were biased in the one factor model. Clearly, these were not robust as would be expected because I had them load on the first factor rather than a second factor. The general point is that structural misspecifications will typically have consequences for the estimated parameters from at least some equations. But with MIIV-2SLS an analyst can sometimes isolate equations from these biasing effects.

EXTENSIONS

This presentation provided an overview of the basic ingredients of the MIIV-2SLS approach to SEMs. I limited myself to the fundamentals and as a result gave only a partial view. In fact, the MIIV approach to SEMs has been expanded in a number of ways. Categorical endogenous variables are common in research. Bollen and Maydeu-Olivares (2007), Nestler (2013), and Jin, Luo, and Yang-Wallentin (2016) have examined the Polychoric Instrumental Variable (PIV) estimator for such cases. Interactions and nonlinear functions of latent variables are sometimes called for in models. Bollen (1995) and Bollen and Paxton (1998) examine how to apply MIIV-2SLS to these circumstances. The MIIV-SEM also is applicable to higher-order factor analysis (Bollen & Biesanz, 2002). Second order growth curves with repeated latent variables in longitudinal data are another MIIV application, discussed by Nestler (2014). Bollen (2011) discusses a MIIV approach to testing the dimensionality of measures.

A couple of papers have discussed the overidentification tests for MIIVs in SEMs. Kirby and Bollen (2009) explore the performance of several of these that were originally proposed for econometric models without latent variables. Nestler (2015) proposes a MIIV based specification test for nonlinearity and interactions. Bollen et al. (2014) propose a Generalized Method of Moments MIIV estimator.

With regard to software developments, it is possible to use the MIIV-2SLS estimator with any statistical package that has a 2SLS estimator such as R, SAS, Stata, or SPSS. But there are routines that are available to make parts of this easier. For instance, Bollen and Bauer (2004) give a SAS macro that automates the finding of MIIVs in a SEM. Bauldry’s (2014) miivfind is a more recent procedure to use with Stata. The most comprehensive MIIV software is the R package MIIVsem (Fisher et al., 2017). Among its features are that MIIVsem: finds MIIVs, gives the MIIV-2SLS estimates, calculates the Sargan test statistic, allows raw data or covariances as input, has several bootstrap options, permits equality restrictions, includes Wald tests, and can take account of categorical endogenous variables. Features under development include: options for missing data, Lagrangian multiplier tests, weak instrument diagnostics, and the Generalized Method of Moments estimator.

OPEN QUESTIONS

The MIIV approach to SEMs is far less studied than are the system wide estimators such as ML. As a result there are many open research questions. For instance, the field needs to learn more about the optimal selection of scaling indicators for each latent variable. I mentioned this issue earlier in the paper. A scaling indicator with little association with its latent variable seems certain to lead to poorer estimator properties than would a strong indicator. Weak instrument statistics and the R-squared for scaling indicators seem promising diagnostics that could help spot poor scaling indicators. Another scaling issue not explored is what happens when the scaling indicator actually loads on two but is assumed to load on one latent variable.8 What are the consequences? Would this be detectable with the equation overidentification test due to incorrect selection of MIIVs?

The weak instrument problem was mentioned several times in this paper and in the original presentation of MIIV-2SLS (Bollen, 1996). Although the econometric literature has examined this problem for some time now (e.g., Bowden & Turkington, 1984; Bound, Jaeger, & Baker, 1995), it has been in the context of single equation or simultaneous equation models without latent variables. Because the L2O transformation makes latent variable models into a type of simultaneous equation model with a more complicated error structure, it would seem that many of the findings from the econometric literature should carry over to the MIIV-2SLS estimator. On the other hand, when indicators of the same latent variable are the MIIVs that predict the scaling indicator, as is common in multiple indicator models, these relations should be moderate to strong. This might mean that weak instruments are less common than they are in the econometric literature, where auxiliary variables external to the model are common. This too would be a fruitful area of research.
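One simple way to look at instrument strength is the first-stage regression of the scaling indicator on its instruments. The R sketch below reports the first-stage R-squared and F statistic for simulated data. The loadings, the seed, the helper name `first_stage`, and the variable `W` (an unrelated variable standing in for a weak auxiliary instrument) are assumptions for illustration. A first-stage F below about 10 is a commonly cited rule of thumb for weak instruments in the econometric literature, though how well it transfers to MIIV-2SLS is an open question.

```r
set.seed(3)
N  <- 1000
L1 <- rnorm(N)
Z1 <- L1 + rnorm(N)          # scaling indicator to be instrumented
Z2 <- 0.8 * L1 + rnorm(N)    # indicators of the same latent variable
Z3 <- 0.8 * L1 + rnorm(N)
W  <- rnorm(N)               # unrelated variable: a deliberately weak instrument

# First-stage R-squared and F statistic for a given instrument set.
first_stage <- function(formula) {
  fit <- summary(lm(formula))
  c(r_squared = fit$r.squared, F = unname(fit$fstatistic["value"]))
}

strong <- first_stage(Z1 ~ Z2 + Z3)
weak   <- first_stage(Z1 ~ W)
print(rbind(strong, weak))
```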

The overidentification tests also demand more attention. Based on the results of Kirby and Bollen (2009), I have emphasized the Sargan test in conjunction with the MIIV-2SLS estimator. But other overidentification tests should be further considered. Furthermore, if the Sargan test statistic is significant, researchers need more guidance on the best way to locate the offending MIIVs. This would be helpful in determining how to respecify the model. Another question is whether it is possible for an equation that omits an explanatory variable that correlates with the included variables to escape detection by the Sargan test. More generally, under what conditions will overidentification tests fail to detect problems in the specification of the equation?

The preceding concerns local tests of overidentified equations, but neglects overidentification tests of the full model (e.g., a likelihood ratio test of the hypothesized vs. saturated model) as is typical in ML system wide estimators of SEMs. For those interested in a test of the full system of equations, there are several options. One is to use the MIIV-GMM estimator from Bollen et al. (2014). It has an overidentification test that applies to any subset of equations including all equations in the model. A second option is to adapt the overall fit statistics described in Bollen and Maydeu-Olivares (2007) for the PIV estimator to the MIIV-2SLS estimator described here. These would require estimates of the variance and covariance parameters in the model along the lines described in Bollen (1996) and Bollen and Maydeu-Olivares (2007). A third option is to use the chi square test statistics for vanishing tetrads proposed in Bollen (1990) and Bollen and Ting (1993). These are not yet available in the MIIVsem R package, but some are available in other software (e.g., Bauldry & Bollen, 2016). Which of these overall fit test approaches would be most useful is not known.

It also would be valuable to further clarify the conditions under which the MIIV-2SLS estimator is robust. The conditions in Bollen (2001) are fairly general and it would be useful to have more specific ones. Bollen et al. (2018) clarify when structural misspecifications in the latent variable model affect the measurement model and vice versa, but even more could be done to clarify the robustness conditions within the measurement model or within the latent variable model. There also is an issue of the optimal selection of MIIVs for an equation when there are a large number of MIIVs to choose from. Is there a method to find the optimal subset? Finally, it is important to understand when MIIV-2SLS performs best and worst. For example, Bollen et al. (2007) found that at small sample sizes (e.g., <100) it was best not to use a large number of MIIVs because it led to greater finite sample bias than using a smaller number of MIIVs. This matters less in larger samples. What other types of models, sample sizes, numbers of MIIVs, etc. have noteworthy effects on MIIV-2SLS? These and other questions remain to be investigated.

I remind the reader that much of this paper has focused on a single MIIV estimator, MIIV-2SLS. I already mentioned the MIIV-GMM estimator, but there are other single or multiple equation estimators that researchers could investigate using the L2O transformation and the MIIV approach to selecting instrumental variables. It is likely that no one of these alternative MIIV estimators will consistently outperform all others under all conditions. Research could help reveal which is best and under what circumstances.

In brief, there is much to learn about the MIIV estimators and many opportunities for new research.

CONCLUSIONS

The field of structural equation models is dominated by estimators like ML that assume perfection while we simultaneously preach that models are approximations. When we recognize the approximate nature of our models and the distributional failures of our variables, the optimal properties of ML are called into question. No longer can researchers using ML claim consistency, asymptotic unbiasedness, and asymptotic efficiency when structural and distributional misspecifications are acknowledged. Recognizing approximations is conceding misspecifications. Once we do so, it is desirable to distinguish the good from the bad parts of the model. This suggests that local tests rather than global tests could be helpful. It also points to the desirability of an estimator that is less likely to spread bias from one part of the system to another part. If the estimator were “distribution free,” it would be an added bonus.

The MIIV-2SLS estimator better satisfies the realities of approximate models than the dominant system wide estimators like ML. It is an asymptotic distribution free estimator, but perhaps more importantly, it is less likely to spread bias from structural misspecifications in one equation to other parts of the system. In addition, each overidentified equation has an overidentification test that might be helpful in better pinpointing problems. Furthermore, the MIIVsem package in R (Fisher et al., 2017) makes it easier to implement this MIIV approach to SEM. Despite these advances there remain a number of areas that require further research as detailed in the last section.

McDonald (2010) and others have critiqued the dependency of the field of SEMs on specifying, estimating, and testing a full model simultaneously. The MIIV approach suggests that we still specify globally, but that we give more attention to estimating and testing locally.

Footnotes

FORTHCOMING IN MULTIVARIATE BEHAVIORAL RESEARCH.

* Earlier versions of this paper were presented as the 2017 Presidential Address at the Society of Multivariate Experimental Psychology, as a keynote presentation at the 2017 Modern Modeling Methods, University of Connecticut, Storrs, Connecticut, and as a keynote talk at the 2017 Quantitative Methods Section Meeting of the German Psychological Society, Tübingen, Germany. I would like to thank David Braudt and Zack Fisher for research assistance and Michael Giordano, Adam Lilly, Steve West, Ai Ye, and the MBR reviewers for comments on this paper.

1. I refrain from the common practice of calling this the “structural model” because both the latent variable and the measurement model contain structural parameters. To highlight one part of the model as structural and not the other suggests otherwise and can lead to confusion (Bollen, 1989, p. 11).

2. To avoid cluttering the diagram, the intercepts are not explicitly shown. I will include them in the equations that represent the model.

3. In some models that are otherwise underidentified, a researcher could introduce auxiliary instruments to enable identification. In such cases the set of instruments would be a mixture of MIIVs and auxiliary instruments.

4. See Bollen (1996) for MIIV-2SLS properties. The ML properties follow from general principles of ML estimation.

5. The industrialization-democracy example originally appeared in Bollen (1989). The data for this are available in the R package lavaan.

6. “True” appears in quotes because these are real empirical data and the true model is unknown. But for the purposes of this demonstration, I treat it as if it were true.

7. Scaling indicators do not have estimated coefficients nor Sargan test statistics. The absence of the latter is represented by “-----” in Table 5.

8. This question is neglected not just for MIIV estimators, but also for ML and other system estimators.

REFERENCES

  1. Anderson JC & Gerbing D (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155–173. 10.1007/bf02294170
  2. Angrist JD & Krueger AB (2001). Instrumental variables and the search for identification: from supply and demand to natural experiments. Journal of Economic Perspectives, 15, 69–85. 10.2139/ssrn.281433
  3. Basmann R (1957). A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica, 25, 77–83. 10.2307/1907743
  4. Bauldry S (2014). miivfind: A command for identifying model-implied instrumental variables for structural equation models in Stata. Stata Journal, 14, 60–75.
  5. Bauldry S & Bollen KA (2016). tetrad: A set of Stata commands for confirmatory tetrad analysis. Structural Equation Modeling, 23, 921–30. 10.1080/10705511.2016.1202771
  6. Bollen KA (1989). Structural Equations with Latent Variables. NY: Wiley. 10.1002/9781118619179
  7. Bollen KA (1990). Outlier screening and a distribution-free test for vanishing tetrads. Sociological Methods and Research, 19, 80–92. 10.1177/0049124190019001003
  8. Bollen KA (1995). Structural equation models that are nonlinear in latent variables: A least-squares estimator. Sociological Methodology, 25, 223–51.
  9. Bollen KA (1996). An alternative 2sls estimator for latent variable models. Psychometrika, 61, 109–21. 10.1007/bf02296961
  10. Bollen KA (2001). Two-stage least squares and latent variable models: simultaneous estimation and robustness to misspecifications. In Cudeck R, Jöreskog KG, & Sörbom D (Eds.), Structural equation modeling: present and future: A festschrift in honor of Karl Jöreskog (pp. 119–138). Chicago, Illinois: Scientific Software International.
  11. Bollen KA (2011). Determining the number of factors using model implied instrumental variables (MIIVs). Presentation at Annual Meeting of the Association of Psychological Sciences, Washington, DC, May 2011.
  12. Bollen KA (2012). Instrumental variables in sociology and the social sciences. Annual Review of Sociology, 38, 37–72. 10.1146/annurev-soc-081309-150141
  13. Bollen KA & Bauer DJ (2004). Automating the selection of model-implied instrumental variables. Sociological Methods & Research, 32(4), 425–452. 10.1177/0049124103260341
  14. Bollen KA & Biesanz JC (2002). A note on a two-stage least squares estimator for higher-order factor analyses. Sociological Methods & Research, 30(4), 568–79. 10.1177/0049124102030004004
  15. Bollen KA, Fisher Z, & Gates KM (2018). Robustness conditions for MIIV-2SLS when the latent variable or measurement model is structurally misspecified. Structural Equation Modeling. 10.1080/10705511.2018.1456341
  16. Bollen KA, Kirby JB, Curran PJ, Paxton PM, & Chen F (2007). Latent variable models under misspecification: two-stage least squares (2SLS) and maximum likelihood (ML) estimators. Sociological Methods & Research, 36(1), 48–86. 10.1177/0049124107301947
  17. Bollen KA, Kolenikov S, & Bauldry S (2014). Model-implied instrumental variable generalized method of moments (MIIV-GMM) estimators for latent variable models. Psychometrika, 79(1), 20–50. 10.1007/s11336-013-9335-3
  18. Bollen KA & Maydeu-Olivares A (2007). A polychoric instrumental variable (piv) estimator for structural equation models with categorical variables. Psychometrika, 72(3), 309–26. 10.1007/s11336-007-9006-3
  19. Bollen KA & Paxton P (1998). Interactions of latent variables in structural equation models. Structural Equation Modeling, 5(3), 267–93. 10.1080/10705519809540105
  20. Bollen KA & Stine RA (1993). Bootstrapping goodness-of-fit measures in structural equation models. In Bollen KA & Long JS (Eds.), Testing structural equation models (pp. 111–135). Newbury Park, CA: Sage.
  21. Bollen KA & Ting K (1993). Confirmatory tetrad analysis. Sociological Methodology, 23, 47–75. 10.2307/271009
  22. Boomsma A (1985). Nonconvergence, improper solutions, and starting values in lisrel maximum likelihood estimation. Psychometrika, 50, 229–242. 10.1007/bf02294248
  23. Bound J, Jaeger DA, & Baker RM (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90, 443–50. 10.1080/01621459.1995.10476536
  24. Bowden RJ, & Turkington DA (1984). Instrumental variables. Cambridge, UK: Cambridge University Press. [Google Scholar]
  25. Browne MW (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83. 10.1111/j.2044-8317.1984.tb00789.x [DOI] [PubMed] [Google Scholar]
  26. Browne MW & Cudeck R (1993). Alternative ways of assessing model fit. In Bollen KA & Long JS (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.
  27. Fisher ZF, Bollen KA, Gates K, & Rönkkö M (2017). MIIVsem: Model implied instrumental variable (MIIV) estimation of structural equation models. R package version 0.5.2.
  28. Goldberger AS (1972). Structural equation methods in the social sciences. Econometrica, 40, 979–1001. 10.2307/1913851
  29. Greene W (2012). Econometric analysis (7th Ed.). Upper Saddle River, NJ: Prentice Hall.
  30. Hägglund G (1982). Factor analysis by instrumental variables. Psychometrika, 47, 209–22. 10.1007/bf02296276
  31. Jin S, Luo H, & Yang-Wallentin F (2016). A simulation study of polychoric instrumental variable estimation in structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 23(5), 680–94. 10.1080/10705511.2016.1189334
  32. Jöreskog K (1977). Structural equation models in the social sciences: Specification, estimation, and testing. In Krishnaiah PR (Ed.), Applications of statistics (pp. 265–87). Amsterdam: North Holland.
  33. Kaplan D (1989). A study of the sampling variability and z-values of parameter estimates from misspecified structural equation models. Multivariate Behavioral Research, 24, 41–57. 10.1207/s15327906mbr2401_3
  34. Kaplan D & Wenger RN (1993). Asymptotic independence and separability in covariance structure models: Implications for specification error, power, and model modification. Multivariate Behavioral Research, 28, 467–82. 10.1207/s15327906mbr2804_4
  35. Kirby JB & Bollen KA (2009). Using instrumental variable (IV) tests to evaluate model specification in latent variable structural equation models. Sociological Methodology, 39(1), 327–55. 10.1111/j.1467-9531.2009.01217.x
  36. Lawley DN (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, Section A, 60, 64–82. 10.1017/s037016460002006x
  37. Madansky A (1964). Instrumental variables in factor analysis. Psychometrika, 29, 105–13. 10.1007/bf02289693
  38. MacCallum R (1986). Specification searches in covariance structure modeling. Psychological Bulletin, 100, 107–120. 10.1037//0033-2909.100.1.107
  39. MacCallum R, Roznowski M, & Necowitz L (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111, 490–504. 10.1037//0033-2909.111.3.490
  40. McDonald RP (2010). Structural models and the art of approximation. Perspectives on Psychological Science, 5, 675–86. 10.1177/1745691610388766
  41. Nestler S (2013). A Monte Carlo study comparing PIV, ULS and DWLS in the estimation of dichotomous confirmatory factor analysis. British Journal of Mathematical and Statistical Psychology, 66(1), 127–43. 10.1111/j.2044-8317.2012.02044.x
  42. Nestler S (2014). How the 2SLS/IV estimator can handle equality constraints in structural equation models: A system-of-equations approach. British Journal of Mathematical and Statistical Psychology, 67(2), 353–69. 10.1111/bmsp.12023
  43. Nestler S (2015). Using instrumental variables to estimate the parameters in unconditional and conditional second-order latent growth models. Structural Equation Modeling, 22(3), 1–13. 10.1080/10705511.2014.93494825614730
  44. Rosseel Y (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. 10.18637/jss.v048.i02
  45. Sargan JD (1958). The estimation of economic relationships using instrumental variables. Econometrica, 26, 393–415. 10.2307/1907619
  46. Satorra A & Bentler PM (1994). Corrections to test statistics and standard errors in covariance structure analysis. In von Eye A & Clogg CC (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Thousand Oaks, CA: Sage.
  47. Silvey SD (1975). Statistical inference. Boca Raton, FL: Chapman & Hall/CRC.
  48. Stock JH & Trebbi F (2003). Who invented instrumental variable regression? Journal of Economic Perspectives, 17, 177–94. 10.1257/089533003769204416
  49. Theil H (1953a). Repeated least-squares applied to complete equation systems. Mimeo. The Hague, Netherlands: Central Planning Bureau.
  50. Theil H (1953b). Estimation and simultaneous correlation in complete equation systems. Mimeo. The Hague, Netherlands: Central Planning Bureau.
  51. Theil H (1961). Economic forecasts and policy (2nd Ed.). Amsterdam, Netherlands: North Holland.
  52. Wright PG (1928). The tariff on animal and vegetable oils. New York: Macmillan.
  53. Wright S (1925). Corn and hog correlations. United States Department of Agriculture Bulletin, 1300, 1–60.
  54. Yuan K, Douros CD, & Kelley K (2008). Diagnosis for covariance structure models by analyzing the path. Structural Equation Modeling, 15(4), 564–602. 10.1080/10705510802338991
  55. Yuan K, Marshall LL, & Bentler PM (2003). Assessing the effect of model misspecifications on parameter estimates in structural equation models. Sociological Methodology, 33, 241–65. 10.1111/j.0081-1750.2003.00132.x
