Abstract
Advances in research methods, data collection and record keeping, and statistical software have substantially increased our ability to conduct rigorous research across the lifespan. In this article, we review a set of cutting-edge statistical methods that life-course researchers can use to rigorously address their research questions. For each technique, we describe the method, highlight the benefits and unique attributes of the strategy, offer a step-by-step guide on how to conduct the analysis, and illustrate the technique using data from the National Institute of Child Health and Human Development Study of Early Child Care and Youth Development. In addition, we recommend a set of technical and empirical readings for each technique. Our goal was not to address a substantive question of interest but instead to provide life-course researchers with a useful reference guide to cutting-edge statistical methods.
Introduction
As the number of large-scale longitudinal studies grows and researchers consider an even broader range of structural, social, and cultural determinants of health-related outcomes, the need for statistical techniques that adequately address complex questions has also increased. In addition, the expanding interdisciplinary nature of our field demands greater statistical sophistication. However, even the most experienced researchers often lack the knowledge they need to use these new techniques, and, as a result, life-course research is sometimes based on less appropriate and less powerful techniques than are currently available. Indeed, a review of the literature indicates that the majority of existing life-course studies are cross-sectional, using cohorts of individuals of different ages to model an outcome across the lifespan. Those studies that are longitudinal frequently include only 2 time points for each individual or are analyzed as if they only include 2 time points (e.g., as a difference score). However, if we are to continue to move the field forward, methods that allow us to explore change over time in both continuous and dichotomous outcomes, combine a range of related indicators into a single construct, simultaneously estimate numerous regression effects, or deal with issues of selection bias must become the norm and not the exception. As such, the purpose of this study is to provide an overview of a set of analytic techniques that may be particularly relevant for life-course researchers. These techniques are drawn from a range of other disciplines, including education, psychology, and economics, and offer a critical perspective on life-course research. Our intent is not to teach readers how to conduct these analyses; instead, our hope is to illustrate the kinds of questions we can be asking of our data and to provide researchers with the information and resources they need to get started.
In the following, we review 6 modeling strategies that can be used to examine associations between a predictor or set of predictors and an outcome across the life course: 1) regression with covariates; 2) hazard modeling; 3) individual growth modeling; 4) structural equation modeling (SEM); 5) propensity score analysis (PSA); and 6) regression discontinuity design (RDD) analysis (for a summary of each technique, see Table 1). We realize that regression with covariates is a very familiar analytic strategy for life-course researchers and have included it here only as a comparison for the other techniques. We also recognize that many life-course researchers are familiar with hazard modeling, and so we provide only a brief description here. To illustrate 4 of the 6 techniques, we used data from the National Institute of Child Health and Human Development Study of Early Child Care and Youth Development (1). We are not able to illustrate regression discontinuity analysis because we do not have an appropriate treatment variable in the dataset. However, we do provide several excellent examples from the literature.
TABLE 1.
Modeling strategies, purposes, sample equations, and statistical software packages for a life-course approach
| Technique | Equations | Explanation | Program | Command |
| Regression with covariates | Yi = β0 + β1Xi + β2Zi + εi | Yi is the outcome for individual i | Standard statistical packages, such as SAS (SAS Institute Inc.), SPSS (IBM), Stata (StataCorp) | Proc reg in SAS, regression in SPSS, and regress in Stata |
| | | β0 is the average outcome value when all else is 0 | | |
| | | β1 is the association between the predictor and the outcome, controlling for other covariates | | |
| | | Xi is the predictor of interest | | |
| | | Zi is a vector of covariates | | |
| Hazard modeling | Logit h(tij) = [α1D1ij + α2D2ij + … + αJDJij] + [β1X1i + β2X2ij] | Logit h(tij) is the value of logit hazard for individual i at time j | Standard statistical packages, such as SAS, SPSS, Stata | Proc logistic in SAS, logistic regression in SPSS, and logistic in Stata |
| Individual growth modeling | Level 1: Yij = [π0i + π1i(Ageij)] + [εij] | Level 1: Yij is the outcome for individual i at time j | Standard statistical packages, such as SAS, SPSS, Stata | Proc mixed in SAS, mixed in SPSS, and xtmixed in Stata |
| | Level 2: π0i = γ00 + γ01HQi + ζ0i | π0i is the initial level, or intercept, for individual i when all else in the model is 0 | | |
| | π1i = γ10 + γ11HQi + ζ1i | Ageij is the age of individual i at time j, centered at the first time point, so that π1i is the linear rate of change | | |
| | | Level 2: π0i and π1i are the individual growth parameters | | |
| | | γ01 and γ11 are the relations between early home quality (HQi) and the individual growth parameters | | |
| Structural equation modeling | | | Specialized software packages, such as MPlus (Muthén & Muthén) and AMOS (IBM) | BY for the measurement model and ON for the structural model in MPlus |
| Propensity score analysis | Step 1: Ti = λ0 + λ1X1 + λ2X2 + … + λnXn + δi | Step 1: Ti is the outcome in the logistic regression analysis (treatment) | Preferably Stata | pscore in Stata |
| | Step 2: Yi = β0 + β1Ti + β2P1 + β3P2 + … + βnPn + εi | X1, X2, etc., are selection variables identified from the literature | | |
| | | Step 2: Yi is the outcome for individual i | | |
| | | β1 is the effect of the predictor of interest (treatment), controlling for the propensity blocks | | |
| | | Ti is the treatment indicator | | |
| | | P1, P2, etc., represent matched groups within a given propensity block | | |
| Regression discontinuity analysis | Yi = β0 + β1Ti + γ(Xi − Xc) + εi | Yi is the outcome for individual i | Standard statistical packages, such as SAS, SPSS, Stata | Proc reg in SAS, regression in SPSS, and regress in Stata |
| | | β1 is the causal effect of treatment | | |
| | | Ti is a dichotomous variable representing treatment assignment | | |
| | | Xi − Xc is the exogenous covariate used to determine treatment, centered at the cutoff point Xc | | |
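The 2-step propensity score procedure summarized in Table 1 can be sketched with simulated data. The example below (in Python, with invented selection variables and a known treatment effect; nothing here comes from the study data) fits the step 1 logistic selection model by Newton-Raphson, stratifies individuals into propensity score quintiles, and compares treated and untreated cases within blocks — a rough stand-in for the pscore routine named in the table.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Simulated selection variables and a treatment whose assignment depends on them.
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
p_true = 1 / (1 + np.exp(-(0.5 * x1 - 0.5 * x2)))
treat = rng.binomial(1, p_true)
# Outcome: true treatment effect of 2.0, confounded through x1.
y = 2.0 * treat + 1.0 * x1 + rng.normal(0, 1, n)

# Step 1: logistic regression of treatment on the selection variables
# (Newton-Raphson on the log-likelihood).
X = np.column_stack([np.ones(n), x1, x2])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (treat - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
pscore = 1 / (1 + np.exp(-X @ beta))

# Step 2: stratify on propensity quintiles and average the within-block
# treated-vs-untreated outcome differences.
blocks = np.digitize(pscore, np.quantile(pscore, [0.2, 0.4, 0.6, 0.8]))
diffs, weights = [], []
for b in range(5):
    m = blocks == b
    if treat[m].sum() > 0 and (1 - treat[m]).sum() > 0:
        diffs.append(y[m & (treat == 1)].mean() - y[m & (treat == 0)].mean())
        weights.append(m.sum())
effect = np.average(diffs, weights=weights)
print(f"stratified treatment effect estimate: {effect:.2f}")
print(f"naive estimate: {y[treat == 1].mean() - y[treat == 0].mean():.2f}")
```

The naive comparison is biased upward by the confounder; the stratified estimate lands much closer to the true effect of 2.0.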
We chose to test the hypothesis that BMI at 15 y is associated with child birth weight and early home quality. BMI at 15 y was defined as the adolescent’s weight in pounds divided by the square of his or her height in inches, multiplied by 703. Child birth weight was measured in grams. Early home quality was a composite of 8 subscales measured when study children were aged 36 and 54 mo: 1) learning materials; 2) language stimulation; 3) physical environment; 4) parental responsivity; 5) learning stimulation; 6) modeling of social maturity; 7) variety in experience; and 8) acceptance of the child. Higher scores on the composite indicate better home quality. For 3 of the modeling strategies (regression with covariates, SEM, and PSA), we used birth weight and early home quality to predict BMI at 15 y; for individual growth modeling, we used birth weight and early home quality to predict changes in BMI between 24 mo and 15 y. Our purpose here is not to answer a substantive question of interest. Instead, our goal is to illustrate each method and to demonstrate how the answer to our question may vary across analytic strategies. However, we recommend that, whenever possible, researchers consider using multiple modeling strategies to address their research questions because no single technique provides a solution to all problems (2).
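The BMI definition above (weight in pounds divided by squared height in inches, times 703) is easy to express directly; the function name and example values below are our own illustration, not values from the study.

```python
# Sketch of the BMI calculation described above: weight in pounds,
# height in inches, multiplied by the conventional 703 conversion factor.
def bmi_imperial(weight_lb, height_in):
    """BMI from imperial units: weight / height^2 * 703."""
    return weight_lb / (height_in ** 2) * 703

# Hypothetical example: a 15-y-old weighing 130 lb at 65 in tall.
print(round(bmi_imperial(130, 65), 2))  # → 21.63
```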
Regression with Covariates
Regression with covariates is one of the most commonly used strategies across a wide range of disciplines. In general, the goal of regression analysis is to estimate the magnitude and direction of the association between a predictor or set of predictors and an outcome, controlling for an extensive set of covariates. In non-experimental studies, the inclusion of covariates is meant to help isolate the association between the predictor of interest and the outcome variable by modeling important sources of influence. Including covariates also helps rule out other plausible explanations for differences in the outcome. Nevertheless, a model can be over-controlled (i.e., include too many controls) or under-controlled (i.e., fail to include key controls). Either can lead to biased estimates of the association between a predictor and an outcome and reduce the generalizability of the researcher’s findings (3, 4). Thus, both theory and the existing literature must play a central role in the selection of the covariates that will be used in any analysis.
To illustrate regression with covariates, we regressed the outcome, BMI at 15 y, on the predictors birth weight and early home quality. There was evidence that both variables predicted BMI at age 15 y (Table 2). Specifically, children who had lower birth weights (B = 0.001, P < 0.001) and higher-quality home environments during early childhood (B = −0.18, P < 0.001) had lower BMI scores at 15 y than children who had higher birth weights or children from lower-quality home environments. Next, to control for the possible effects of other family environment variables, we added to the model child sex (male), maternal education, and family poverty status at 24, 36, and 54 mo. Both birth weight and early home quality remained statistically significant predictors of BMI at 15 y. Examination of the B values across models showed that the effect sizes of the predictors of interest were reduced only slightly when the covariates were added to the model. Note, also, that some of the covariates were statistically significant as well. Although it has not always been common to present the uncontrolled association between the predictor and outcome variable, in a recent article published in Psychological Science, Simmons et al. (5) recommended that researchers present both the controlled and uncontrolled associations. The authors suggest that doing so makes clear the extent to which the association of interest is dependent on the presence of a covariate. Model fit is typically described using the R2 statistic, with higher values indicating better fit, and model comparisons are made by calculating a change in R2.
TABLE 2.
Results from regression with covariate analyses for BMI at 15 y (n = 833)1
| | Model 1 | Model 2 |
| Birth weight | 0.001* (0.000) | 0.001* (0.000) |
| Early home quality | −0.18* (0.029) | −0.10* (0.036) |
| Male | | −0.14 (0.331) |
| Maternal education | | −0.20** (0.082) |
| Poor (24 mo) | | 0.55 (0.609) |
| Poor (36 mo) | | 1.86*** (0.602) |
| Poor (54 mo) | | −0.74 (0.588) |
| R2 statistic | 0.055 | 0.082 |
Data are expressed as parameter estimate (SD). *P < 0.001; **P < 0.05; ***P < 0.01.
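The 2-model comparison above can be mimicked with simulated data. The sketch below (Python with NumPy; the variable names and generating coefficients are invented for illustration and are not the NICHD estimates) fits an uncontrolled and a controlled model by ordinary least squares and reports the change in R2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated stand-ins for the study variables.
birth_weight = rng.normal(3400, 500, n)
home_quality = rng.normal(0, 1, n)
mat_educ = rng.normal(14, 2, n)
bmi15 = (20 + 0.001 * birth_weight - 0.3 * home_quality
         - 0.2 * mat_educ + rng.normal(0, 2, n))

def fit_ols(predictors, y):
    """OLS via least squares; returns coefficients and R^2."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))  # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, 1 - resid.var() / y.var()

# Model 1: question predictors only; Model 2: add a covariate.
beta1, r2_1 = fit_ols([birth_weight, home_quality], bmi15)
beta2, r2_2 = fit_ols([birth_weight, home_quality, mat_educ], bmi15)
print(f"Model 1 R2 = {r2_1:.3f}; Model 2 R2 = {r2_2:.3f}; "
      f"change in R2 = {r2_2 - r2_1:.3f}")
```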
We recommend the study by Tabachnick and Fidell (4) for some additional reading on regression with covariates. For a more technical reading on multiple regression analysis, see the study by Rubinfeld (6). All statistical software packages include commands for regression with covariates, and all are comparable.
Hazard Modeling
Like regression with covariates, hazard modeling is a relatively common statistical technique for life-course researchers. As such, we devote only a small amount of space to the topic. Broadly speaking, hazard modeling, also referred to as survival analysis or event history analysis, is used to describe event occurrence and/or reoccurrence over time, as well as to identify determinants of these event histories (7). Researchers interested in whether an event occurs (e.g., relapse to drug or alcohol use, diagnosis of a disease, becoming overweight or obese) and, if so, when it occurs should consider using hazard modeling to address their research questions. Hazard modeling produces estimates of the unique probability that individuals will experience an event during a specified time frame (referred to as the hazard probability), as well as estimates of the cumulative probability that individuals will not experience the event over time (referred to as the survival probability). In addition, associations between event occurrence and time-invariant (i.e., takes on the same value at each assessment) or time-varying (i.e., can take on a different value at each assessment) predictors can be estimated.
One of the greatest challenges faced by researchers studying event occurrence is that participants may not experience the event under investigation during the study period. As a result, the researcher does not know whether the individual ever experienced the event; they only know that, if they did experience the event, it was outside of the data collection period (8). These participants are referred to as censored individuals. In the past, censored cases were often dropped from the analysis, and thus estimates of event occurrence and timing were based on only those individuals who actually experienced the event. The result was often an underestimate of average time to event occurrence. Others have assigned censored individuals an event occurrence “time”—usually equivalent to the last time period—thereby including these individuals in their analysis. However, the result of this strategy was often an overestimate of the average time to event occurrence (7). Hazard modeling addresses the problem by assigning censored cases a 0 for the event and then including them in all analyses. That is, individuals who do not experience the event contribute to the estimation of event occurrence during each assessment period. The result is a more precise estimate of the probability of event occurrence because estimates are based on all individuals, i.e., both those who experienced the event and those who did not.
Hazard modeling involves several steps. First, the researcher must create a person-period dataset with a set of time dummy variables (for more information on the person-period dataset, see the section on individual growth modeling below), as well as a dichotomous event variable. Each time dummy takes on a value of 1 in the row for its own assessment period and 0 in the rows for all other periods (e.g., in the row for assessment 1, time dummy 1 equals 1 and time dummies 2, 3, and 4 equal 0; in the row for assessment 2, time dummy 2 equals 1 and time dummies 1, 3, and 4 equal 0). The event variable indicates whether or not the individual experienced the event of interest during that time period, given that he or she had not already experienced the event. Next, using the person-period dataset, the researcher fits a basic hazard model using logistic regression. That is, the dichotomous event variable is regressed on the time dummy variables (e.g., 4 dummies if there are 4 time periods), and a set of logit values is obtained. These logits are then used to calculate the hazard probability (i.e., the unique probability of experiencing the event during that assessment period) or the survival probability (i.e., the cumulative probability that an individual will not experience the event). It is important to note that, to obtain estimates for all of the time dummy variables, the researcher must specify that the intercept should not be estimated. As a final step, determinants of risk (i.e., predictors) are added to the model. Predictors can be time invariant, i.e., they take on the same value at each assessment point, or time varying, i.e., they can take on a different value at each assessment point. Interactions between these predictors and the time dummies can also be included to determine whether the effect of a predictor is consistent across time or whether there are periods of greater and lesser risk. Model fit is compared using a deviance statistic, with smaller values indicating a better fitting model.
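The steps above can be sketched numerically. Rather than calling a logistic regression routine, the toy example below (Python; the event histories are invented) exploits the fact that, with time dummies only and no intercept, the fitted hazard in each period is simply the observed proportion of at-risk individuals who experience the event; survival probabilities then follow by cumulative multiplication.

```python
import numpy as np

# Toy event histories over 4 assessment periods: each tuple is
# (period of event, 1) for those who experienced the event, or
# (last period observed, 0) for censored individuals.
histories = [(1, 1), (2, 1), (2, 1), (3, 1), (4, 1),
             (4, 0), (4, 0), (4, 0), (3, 0), (4, 0)]

periods = 4
at_risk = np.zeros(periods)
events = np.zeros(periods)

# Build the person-period counts: an individual contributes a row for
# every period up to and including the event or censoring period.
for last, event in histories:
    for t in range(1, last + 1):
        at_risk[t - 1] += 1
        if event and t == last:
            events[t - 1] += 1

# With time dummies only (no intercept), the fitted logistic hazard in
# each period equals the observed proportion of at-risk cases with events.
hazard = events / at_risk
logit_hazard = np.log(hazard / (1 - hazard))   # the alpha_j estimates
survival = np.cumprod(1 - hazard)              # cumulative prob. of "no event yet"

for t in range(periods):
    print(f"period {t + 1}: hazard = {hazard[t]:.3f}, survival = {survival[t]:.3f}")
```

Note how the censored individuals keep contributing to the at-risk counts up to their last observed period, exactly as described above.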
For additional technical reading on hazard modeling, see the studies by Singer and Willett (7), Allison (9), and Willett and Singer (10). For applications of hazard modeling, see the studies by Willett and Singer (11), Rank and Hirschl (12), and Gupta et al. (13).
Individual Growth Modeling
Life-course researchers are frequently interested in understanding change over time in 1 or more continuous outcomes. Individual growth modeling refers to a family of methods used to describe development in an outcome over time, to compare patterns of change across groups, and to identify determinants of change (7, 14). This set of techniques provides basic information on population estimates of the average level of the outcome at a given time (intercept), the average linear rate of change or growth over time (slope), and the average nonlinear rate of change (quadratic, cubic, etc.). In addition, these techniques can tell us about the effects that substantive predictors have on the intercept and slope. Because repeated assessments of an individual are necessarily correlated, failure to adequately account for these correlations can result in biased estimates of the intercept and slope (7). However, individual growth modeling methods typically account for these correlations by nesting repeated assessments within individuals, although there is considerable variation in how effectively they do so (14).
There are a variety of approaches to individual growth modeling, including univariate and multivariate repeated measures, hierarchical linear modeling, latent growth-curve modeling, growth mixture modeling, and latent class analysis. Each of these techniques varies, to some extent, in whether they allow for repeated assessments of a predictor (i.e., time-varying predictors), whether they can handle missing data, whether they allow for individual differences in the intercept as well as the slope, whether they require equally spaced assessments for each individual, and whether they handle measurement error (14). For example, univariate repeated-measures modeling only estimates individual differences in the intercept but not in the rate of change (or slope). In contrast, multivariate repeated-measures modeling allows for individual differences in both the intercept and slope. That is, in addition to estimating fixed effects for the intercept and slope, multivariate repeated-measures modeling also estimates random effects (often referred to as variance components) for the intercept and slope. Latent growth-curve modeling separates an individual’s true score from measurement error, thereby producing more precise estimates of the population average intercept and rate of change (15). In general, however, researchers will likely reach the same conclusion regardless of which individual growth modeling method they use, and, thus, selection of a specific technique is driven primarily by the researcher’s goals.
Estimating individual growth curves involves numerous steps. For techniques such as univariate and multivariate repeated-measures or hierarchical linear modeling, the researcher must begin by creating a person-period dataset (note that individual growth modeling techniques such as latent growth-curve modeling, growth mixture modeling, and latent class analysis do not always require this structure) (7). Data are commonly stored in a way that every individual in the dataset has a single line of data (often referred to as a person-oriented dataset). Repeated assessments of the outcome (or predictors) are represented by multiple columns that are differentiated by a time or assessment index (e.g., BMI1, BMI2, BMI3, etc.). In contrast, in a person-period dataset, every individual has multiple lines of data corresponding to the number of assessment periods. For example, in a longitudinal study with 4 assessment points, every individual would have 4 lines of data. Every variable that can take on a different value at each assessment will be represented by a single column rather than by multiple columns.
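A minimal sketch of the person-oriented to person-period restructuring, using pandas (our choice of tool; the article’s examples use SAS, SPSS, or Stata) on an invented 2-person dataset:

```python
import pandas as pd

# A tiny person-oriented (wide) dataset: one row per individual, with
# repeated BMI assessments stored in separate columns (BMI1, BMI2, BMI3).
wide = pd.DataFrame({
    "id": [1, 2],
    "BMI1": [16.1, 15.4],
    "BMI2": [16.8, 15.9],
    "BMI3": [17.5, 16.6],
})

# Restructure to person-period (long) format: one row per individual
# per assessment period.
long = (pd.wide_to_long(wide, stubnames="BMI", i="id", j="period")
          .reset_index()
          .sort_values(["id", "period"]))
print(long)
```

Each individual now has 3 lines of data, and the repeated BMI assessments occupy a single column, as described in the text.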
Once the dataset has been transformed, the researcher then fits a set of unconditional models. The unconditional means model contains only an intercept and is used to partition the outcome variance into within-individual and between-individual variance (7). This information can be used to inform model building. For example, higher within-individual variation suggests a need for more time-varying predictors (and fewer time-invariant predictors), whereas higher between-individual variation suggests a need for both. The unconditional growth model contains both an intercept and a slope and tells us whether there is statistically significant linear change in our outcome variable over time. Decisions about where to center time (e.g., beginning, middle, or end of study) must be made before fitting the model (7). Assuming that there is significant linear growth and adequate data, nonlinear growth terms (e.g., quadratic, cubic, quartic, etc.) can then be added to the model. To estimate a linear growth model, you need at least 3 time points; to estimate a quadratic model, at least 4 time points are needed; to estimate a cubic model, at least 5 points are needed; and so on. As a final step, determinants or predictors of the intercept and slope are added to the model. Note that the inclusion of the main effect of a predictor in the model simply tests the effect of that predictor on the intercept; to test the effects of a predictor on the slope, an interaction between the predictor and time must also be added to the model. Predictors of the intercept and slope are commonly referred to as “level 2” predictors in hierarchical linear modeling techniques. Nested models are compared using the deviance statistic, with smaller values indicating a better fitting model. Models that are not nested can be compared using the Akaike information criterion and Bayesian information criterion statistics, and again, smaller values indicate a better fit.
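The variance partition produced by the unconditional means model can be approximated by hand. The sketch below (Python; toy data) uses a simple method-of-moments stand-in for the REML estimates a multilevel routine would report: within-individual variance as the average spread around each person’s own mean, and between-individual variance as the spread of the person means.

```python
import numpy as np

# Toy repeated measures: rows = individuals, columns = 4 assessments.
y = np.array([
    [16.0, 16.5, 17.1, 17.8],
    [15.2, 15.6, 16.0, 16.3],
    [18.1, 18.9, 19.4, 20.2],
    [16.8, 17.0, 17.5, 17.9],
])

person_means = y.mean(axis=1)
within_var = ((y - person_means[:, None]) ** 2).mean()  # within-individual spread
between_var = person_means.var()                        # between-individual spread

# Intraclass correlation: the share of total variation lying between people.
icc = between_var / (between_var + within_var)
print(f"within = {within_var:.3f}, between = {between_var:.3f}, ICC = {icc:.3f}")
```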
To illustrate multivariate repeated-measures analysis, one of the many individual growth modeling techniques, we return to our working example of the association between child birth weight, the early home environment, and BMI. We began by restructuring our dataset from a person-oriented dataset to a person-period dataset. Next, we centered our time variable at the last assessment point by subtracting 180 mo (~15 y) from each time period. This enables us to interpret our intercept as the average BMI at 15 y. Then, using a multilevel procedure, we fit an unconditional linear growth model to determine whether there was statistically significant linear growth in BMI from 24 mo to 15 y. It is here that the researcher would also test for nonlinear growth before adding predictors to the model. However, because of space limitations here, we focused only on linear growth. Finally, we predicted growth in BMI first from our question predictors (i.e., birth weight and early home environment) and then from our question predictors and control variables.
Not surprisingly, there was evidence of linear change in children’s BMI between 24 mo and 15 y. On average, adolescents’ BMI was 22.58 (P < 0.001) at 15 y and increased between 24 mo and 15 y at a rate of 0.05 (P < 0.001) units per month (Table 3). Both birth weight and the early home environment significantly predicted the average level of BMI at 15 y, as well as the rate of change in BMI across childhood and adolescence. Higher birth weight was associated with a higher BMI at 15 y, as well as more rapid increases in BMI over time. In contrast, the quality of the early home environment was negatively associated with level and rate of change in BMI such that a higher-quality early home environment predicted a lower BMI at 15 y, as well as a less rapid increase in BMI over time (Table 3, Model 1). To control for other possible explanatory factors, we added to the model child sex, maternal education, and family poverty status over time. Birth weight and quality of the early home environment remained statistically significant predictors of BMI at 15 y, as well as change in BMI across childhood and adolescence (Table 3, Model 2). Note that, with the exception of the effect of child sex on the intercept, our control variables were also statistically significant.
TABLE 3.
Individual growth model predicting changes in BMI from 24 mo to 15 y from birth weight and quality of the early home environment1
| | Model 1 | Model 2 |
| Intercept (π0i) | 25.72* (1.05) | 26.26* (1.09) |
| Birth weight | 0.001* (0.0002) | 0.001* (0.0002) |
| Early home quality | −0.185* (0.019) | −0.117* (0.022) |
| Male | | −0.0117 (0.212) |
| Maternal education | | −0.259* (0.051) |
| Poor (24 mo) | | 0.753* (0.203) |
| Rate of change (π1i) | 0.079* (0.006) | 0.078* (0.006) |
| Birth weight | 0.0001* (0.000) | 0.0001* (0.000) |
| Early home quality | −0.001* (0.0001) | −0.001* (0.0001) |
| Male | | −0.002** (0.001) |
| Maternal education | | −0.002* (0.0003) |
| Poor (24 mo) | | 0.009* (0.002) |
| Variance components | | |
| Within individual | 2.18 | 2.17 |
| Between individual | 2.54 | 2.52 |
| Fit statistics | | |
| R2 within | 0.563 | 0.570 |
| R2 between | 0.095 | 0.098 |
| R2 overall | 0.348 | 0.353 |
Data are expressed as parameter estimate (SD). *P < 0.001; **P < 0.05.
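The logic of predicting the intercept and slope from a level 2 variable can be illustrated with a simplified 2-stage approach: fit each individual’s straight line, then regress the resulting slopes on the predictor. This is a rough stand-in for the pooled multilevel estimation behind Table 3, not the same procedure, and the data below are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, waves = 200, 5
age = np.arange(waves, dtype=float)  # time, centered at the first wave

# Simulate growth: individuals with higher "home quality" grow more slowly
# (true effect of home quality on the slope is -0.1).
home_quality = rng.normal(0, 1, n)
true_int = 16 + rng.normal(0, 1, n)
true_slope = 0.5 - 0.1 * home_quality + rng.normal(0, 0.05, n)
y = true_int[:, None] + true_slope[:, None] * age + rng.normal(0, 0.3, (n, waves))

# Stage 1: fit each individual's straight line (intercept + slope).
X = np.column_stack([np.ones(waves), age])
coefs = np.linalg.lstsq(X, y.T, rcond=None)[0]  # row 0: intercepts; row 1: slopes
slopes = coefs[1]

# Stage 2: regress the individual slopes on the level 2 predictor.
Z = np.column_stack([np.ones(n), home_quality])
gamma = np.linalg.lstsq(Z, slopes, rcond=None)[0]
print(f"estimated effect of home quality on the slope: {gamma[1]:.3f}")
```

The stage 2 coefficient recovers the simulated negative effect of home quality on the rate of change, mirroring the interpretation of the γ11 term in Table 1.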
For more information on individual growth modeling techniques more generally, see the studies by Singer and Willett (7) and Burchinal et al. (14). For specific information on hierarchical linear modeling, see the study by Bryk and Raudenbush (16). For specific information on latent growth modeling, see the studies by Willett and Bub (15) or Willett and Sayer (17). Most statistical software packages include commands for individual growth modeling, and all are comparable.
SEM
SEM, also referred to as covariance structure analysis, is an extremely versatile tool that can be used to address very complex research questions. SEM is a very large topic, and much of the literature on SEM is technical in nature. As such, we provide only a very brief overview of the many ways in which SEM can be used. Common analytic strategies that can be performed within the SEM framework include confirmatory factor analysis, multivariate regression analysis, simultaneous equation estimation, path analysis, latent growth modeling, and multilevel modeling. One of the key benefits of SEM is that it explicitly models measurement error, which commonly occurs when social, psychologic, and health-related indicators are measured. This disattenuation of observed variables from measurement error results in unbiased (or less biased) estimates of the relation between 2 or more variables (18). Thus, SEM is sometimes referred to as a causal effects technique. Another benefit of SEM is that the researcher can examine multiple outcomes simultaneously rather than fitting separate models for each outcome of interest. Similarly, SEM can be used to model the effects of change in one domain on changes in another domain, something few other analytic tools can handle directly. Clearly, the benefits of SEM are great.
Not surprisingly, the kind of information obtained from SEM depends on the type of analysis conducted. For example, researchers interested in creating a latent construct (i.e., a theoretical model of something that has not been measured directly) will learn not only whether the indicators they have selected offer a reasonable representation of the construct they were attempting to measure but also the extent to which each indicator contributes to the construct. Similarly, for those who use SEM to estimate a regression or a set of simultaneous equations, parameter estimates describing the magnitude and direction of association between predictors or latent constructs and one or more outcomes are derived. When latent growth modeling is conducted, estimates of the true intercept and rate of change are obtained.
Assessing model fit is one of the most important steps in SEM, although there is little consensus on which indices to use and what values suggest a good fit. The most common fit indices include the χ2 statistic, the comparative fit index (CFI), and root mean square error of approximation (RMSEA). The χ2 statistic, sometimes referred to as the “badness of fit” statistic, provides an index of the discrepancy between the sample and the fitted covariance matrices (19). When the “match” is poor, the χ2 statistic will be larger, but when the “match” is good, the χ2 statistic will be smaller. In general, model fit is considered good if the χ2 statistic is nonsignificant. Note that the χ2 statistic is sensitive to sample size; when large samples are used, the χ2 statistic will frequently lead the researcher to reject the model. The CFI essentially compares the sample covariance matrix with a hypothetical null model (20). Because the CFI is minimally affected by sample size, it is a frequently reported fit statistic (21). CFI values ranging from 0.90 to 1 have been said to provide a fair fit to the data, although a more conservative value of 0.95 is now being recommended (18). Finally, the RMSEA, which is often considered one of the most informative fit indices, provides an estimate of how well the model fits the population covariance matrix (22). The RMSEA is sensitive to the number of parameters being estimated, and thus more parsimonious models will typically have lower RMSEA values. Values <0.10 are said to provide adequate fit to the data, although more recent cutoff values are 0.06 or 0.07 (18, 23). It is generally recommended that researchers report multiple fit indices because each index is sensitive to different elements of the model.
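The RMSEA and CFI described above are simple enough to compute by hand from reported χ2 values. In the sketch below (Python), we plug in the χ2 and df reported for our measurement model, assuming the analytic sample of n = 833; the baseline (null) model χ2 used for the CFI is purely hypothetical, since the article does not report one.

```python
import math

def rmsea(chi2, df, n):
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_null, df_null):
    """CFI = 1 - max(chi2 - df, 0) / max(chi2_null - df_null, chi2 - df, 0)."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    return 1.0 - d_model / d_null

# Chi-square reported for the measurement model in this article
# (chi2 = 529.16, df = 98), with an assumed analytic sample of n = 833:
print(round(rmsea(529.16, 98, 833), 3))  # → 0.073

# CFI demonstration with a hypothetical baseline model chi-square:
print(round(cfi(529.16, 98, 3700, 120), 3))
```

Note that the RMSEA falls near the 0.07 value reported later for the measurement model, consistent with the assumed sample size.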
It is not uncommon to obtain model fit statistics that indicate a moderate to poor fit to the data. Before concluding that the model is not viable, there are a number of strategies the researcher can use to try to improve model fit. General strategies include dropping indicators if the factor loadings are too small or too large (typically determined by the R2 statistic), setting error variances to 0 if they are small or negative and nonsignificant, and fixing the correlations among indicators and/or errors to 0 when there is no theoretical or statistical reason they should be correlated. When there are a large number of items (as we have here), it is worth considering whether multiple factors might better represent the constructs of interest. In instances in which multiple latent constructs are involved, it is a good idea to estimate each one separately to determine whether one construct is more problematic than the other. Software programs, such as MPlus, provide a set of modification indices that can be used to improve model fit. More specifically, modification indices offer guidance on parameters that can be fixed, freed, or removed to improve model fit, but they should not be used without careful consideration. For a useful review of fit indices and model respecification, see the study by Hooper et al. (24).
To estimate a latent construct using SEM, often referred to as a measurement model, the researcher must begin by identifying the observed variables he or she would like to use to represent this abstract or unmeasured construct. Although a latent construct can be represented by as few as 2 indicators, model fit tends to be better when more indicators are used. Thus, we recommend using at least 3 measured variables. Once the indicators have been identified, the researcher must select 1 to serve as the scaling factor for the latent construct. The factor loading for this variable will be fixed to 1 (this is done by placing it first in the list of variables), and the factor loadings (or relative contribution) for all other variables will be estimated.
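A crude numerical illustration of the scaling convention, fixing the first indicator’s loading to 1: the sketch below (Python, with simulated indicators, and a principal component approximation rather than the maximum likelihood estimation an SEM package would use) recovers loadings proportional to the generating values and rescales them against the first indicator.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Simulate 3 indicators driven by one latent factor,
# with generating loadings of 1.0, 0.8, and 0.6.
factor = rng.normal(0, 1, n)
X = np.column_stack([
    1.0 * factor + rng.normal(0, 0.4, n),
    0.8 * factor + rng.normal(0, 0.4, n),
    0.6 * factor + rng.normal(0, 0.4, n),
])

# Crude loading estimates: the first eigenvector of the covariance matrix,
# rescaled so the first (scaling) indicator's loading is fixed to 1.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
v = eigvecs[:, -1]            # eigenvector of the largest eigenvalue
loadings = v / v[0]           # fix the first loading to 1, as in the text
print(np.round(loadings, 2))
```

With equal unique variances, the leading eigenvector is proportional to the true loadings, so the rescaled values land near 1.0, 0.8, and 0.6; a full SEM fit would additionally estimate the unique variances and standard errors.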
Once the measurement model has been estimated, the researcher can use the latent construct to predict one or more observed variables (or use one or more observed variables to predict the latent construct). That is, the observed variable is simply regressed on the latent construct (or vice versa). One of the key benefits of SEM over other regression techniques is that, with SEM, we can simultaneously estimate the relations between a predictor (or latent construct) and multiple outcomes (or latent constructs). To do this, the researcher specifies the latent construct in the first line of the syntax and then specifies the relation of interest in the second line of syntax. This portion of the model is typically referred to as the structural model. Common extensions of SEM include investigating interactions between 2 observed variables, an observed variable and a latent construct, or 2 latent constructs and exploring path differences across groups using a multigroup analysis.
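The measurement and structural steps described above can be sketched in Mplus-style syntax. The variable names below are hypothetical and purely illustrative: BY defines the latent construct (with the loading of the first listed indicator fixed to 1 by default, providing the scaling unit), and ON specifies the structural regression.

```
MODEL:
  ! Measurement model: latent construct for early home quality.
  ! The loading for the first indicator (learnmat36) is fixed to 1 by default.
  homequal BY learnmat36 langstim36 physenv36 respons36;

  ! Structural model: regress the observed outcome on the latent construct
  ! and any observed covariates.
  bmi15 ON homequal bweight male momed;
```

Adding a second latent construct or a second ON statement would extend this to multiple simultaneous outcomes, one of the key benefits of SEM noted above.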
To illustrate the SEM technique, we return to our working example. In previous analyses, the early home quality variable was created by averaging total home quality scores at 36 and 54 mo. In this analysis, we began by creating a latent construct representing early home quality that comprised 16 observed variables. Indicators included the following: 1) learning materials; 2) language stimulation; 3) physical environment; 4) parental responsivity; 5) academic stimulation; 6) modeling of social maturity; 7) variety in experience; and 8) acceptance of the child, each measured when children were aged 36 and 54 mo. The factor loading for learning materials at 36 mo was fixed to 1 to provide the scaling unit. The 15 remaining observed variables loaded significantly onto the early home quality factor, although they did not contribute to the construct equally (for factor loadings, see Table 4). For example, standardized loadings ranged from 0.31 (acceptance at 54 mo) to 0.66 (language stimulation at 36 mo). Examination of the fit statistics indicated an adequate, although not ideal, fit to the data (χ2 = 529.16, df = 98, P < 0.001; CFI = 0.88; RMSEA = 0.07, P < 0.001), and thus a single latent construct representing early home quality was retained for subsequent analyses.
TABLE 4.
Estimated factor loadings for early home quality measurement model1
| | Unstandardized Factor Loading | Standardized Factor Loading |
| Learning materials, 36 mo | 1.00 (0.000) | 0.78* (0.018) |
| Language stimulation, 36 mo | 0.39* (0.021) | 0.66* (0.023) |
| Physical environment, 36 mo | 0.31* (0.024) | 0.47* (0.031) |
| Responsivity, 36 mo | 0.36* (0.026) | 0.52* (0.029) |
| Academic stimulation, 36 mo | 0.41* (0.023) | 0.64* (0.024) |
| Modeling, 36 mo | 0.29* (0.022) | 0.51* (0.029) |
| Variety, 36 mo | 0.52* (0.028) | 0.68* (0.022) |
| Acceptance, 36 mo | 0.17* (0.017) | 0.39* (0.033) |
| Learning materials, 54 mo | 0.47* (0.025) | 0.61* (0.027) |
| Language stimulation, 54 mo | 0.16* (0.013) | 0.45* (0.032) |
| Physical environment, 54 mo | 0.28* (0.020) | 0.52* (0.029) |
| Responsivity, 54 mo | 0.25* (0.026) | 0.37* (0.034) |
| Academic stimulation, 54 mo | 0.25* (0.021) | 0.47* (0.031) |
| Modeling, 54 mo | 0.25* (0.020) | 0.47* (0.031) |
| Variety, 54 mo | 0.41* (0.025) | 0.61* (0.026) |
| Acceptance, 54 mo | 0.11* (0.014) | 0.31* (0.036) |
| Fit statistics | ||
| χ2 (df) | 529.164* (98) | |
| CFI | 0.88 | |
| RMSEA | 0.07 (P < 0.001) | |
Data are expressed as factor loading (SE). CFI, comparative fit index; RMSEA, root mean square error of approximation. *P < 0.001.
Next, we regressed 15-y BMI on the latent construct representing early home quality, as well as the observed indicators for birth weight, child sex (male), maternal education, and poverty status at 24, 36, and 54 mo. Once again, there was evidence that birth weight and the latent construct representing early home quality predicted BMI at 15 y (Fig. 1). As was the case with the regression with covariates analysis, children who had lower birth weights (B = 0.13, P < 0.001) and higher-quality home environments (B = −0.14, P < 0.01) had lower BMI scores at 15 y than other children. Again, some of the covariates were statistically significant as well. Examination of the fit statistics indicated that the model provided an adequate (although not ideal) fit to the data (χ2 = 742.80, df = 203, P < 0.001; CFI = 0.87; RMSEA = 0.06, P < 0.001).
FIGURE 1.
Fitted path diagram for birth weight and early home quality on BMI at 15 y (standardized results with SEs in parentheses). Note that the circle represents the latent construct for early home quality and comprises 16 indicators. The squares represent observed variables. CFI, Comparative Fit Index; RMSEA, Root Mean Square Error of Approximation.
For additional technical readings on SEM, see the studies by Keiley et al. (25), Kline (26,27), and Muthén and Muthén (28). For applications of SEM, see the studies by Windle and Mason (29), (L.K. Ferretti, K.I. Bub, unpublished results), and Bub et al. (30). Specialized software packages, such as Mplus, AMOS, and Stata, are required for SEM.
PSA
Even in the most well-designed studies, issues of selection or omitted variables bias can affect our findings. Thus, researchers across disciplines, including education and the social sciences, have begun to seek methods that allow them to more rigorously test the associations between their predictors of interest and their outcomes. One such technique is PSA (31). Broadly speaking, PSA models the probability that an individual will have a particular experience (e.g., high-quality home environment) or be in a treatment program (e.g., obesity-prevention program) given a set of background characteristics for that individual. Assuming that the model is appropriately specified (i.e., the background characteristics are correct), PSA produces unbiased estimates of the effect of the treatment on the outcome variable, corrected for selection effects (32).
PSA is a 2-stage modeling process. In the first stage, logistic regression is used to predict “treatment” status from a set of background characteristics (note that “treatment” does not necessarily imply a formal treatment program; it can also reflect a specific experience like a high-quality home environment). Each participant receives a score indicating his or her unique probability of being in a given group (e.g., high-quality home environment). Individuals who are actually in the group should have high propensity scores given their background characteristics. Individuals with similar background characteristics who were not in the group should also have high propensity scores. In this way, PSA can be used to identify a reasonable comparison group. However, it should be noted that, in non-experimental studies, a researcher can never be sure that he or she has included all of the relevant background characteristics that influenced selection into a given group. Nevertheless, a careful consideration of the characteristics increases the chances that the majority of factors were included.
In the second stage of modeling, the propensity scores are used to model associations between the predictor of interest and the outcome. One of 2 approaches can be used here. First, the continuous propensity score can be entered into a regression model to control for background or selection factors. This approach is not all that different from adding a vector of covariates to your regression model and thus is not used all that frequently. The second approach is to use the propensity scores to create groups who are “matched” on background characteristics and thus should be equally likely to be in a given group (e.g., high-quality early environment) as not (e.g., low-quality early environment). Groups are matched in that the mean values for all background variables within a block are similar regardless of treatment status. Once matched groups are obtained, separate regression models within each block are fitted and the coefficients for the predictor of interest are averaged across models to obtain a mean treatment effect (e.g., a mean effect of high-quality home environment).
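A minimal, self-contained sketch of this 2-stage logic in Python is shown below, using simulated data (all variable names and data-generating values are illustrative assumptions, not results from our working example). Stage 1 fits a logistic regression by gradient ascent to obtain propensity scores; stage 2 stratifies on score quintiles and averages the within-stratum treatment-versus-control differences.

```python
import math
import random

random.seed(0)

# Simulated data: a single confounder x raises both the chance of "treatment"
# (e.g., a high-quality home environment) and the outcome (a BMI-like score).
n = 500
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [random.gauss(0, 1) for _ in range(n)]
t = [1 if random.random() < sigmoid(1.5 * xi) else 0 for xi in x]
y = [25.0 - 1.0 * ti + 2.0 * xi + random.gauss(0, 1) for xi, ti in zip(x, t)]

# Stage 1: logistic regression of treatment status on x (gradient ascent).
b0 = b1 = 0.0
for _ in range(2000):
    g0 = sum(ti - sigmoid(b0 + b1 * xi) for xi, ti in zip(x, t)) / n
    g1 = sum((ti - sigmoid(b0 + b1 * xi)) * xi for xi, ti in zip(x, t)) / n
    b0 += 0.5 * g0
    b1 += 0.5 * g1
ps = [sigmoid(b0 + b1 * xi) for xi in x]  # each person's propensity score

# Stage 2: divide the scores into quintile blocks and average the
# within-block treatment-control differences, weighted by block size.
edges = [sorted(ps)[int(n * q / 5)] for q in range(1, 5)]
block = [sum(p > e for e in edges) for p in ps]  # block index 0-4

effects, weights = [], []
for s in range(5):
    yt = [yi for yi, ti, bi in zip(y, t, block) if ti == 1 and bi == s]
    yc = [yi for yi, ti, bi in zip(y, t, block) if ti == 0 and bi == s]
    if yt and yc:
        effects.append(sum(yt) / len(yt) - sum(yc) / len(yc))
        weights.append(len(yt) + len(yc))
att = sum(e * w for e, w in zip(effects, weights)) / sum(weights)

# The unadjusted contrast is badly confounded by x; the stratified estimate
# should sit much closer to the simulated treatment effect of -1.
naive = (sum(yi for yi, ti in zip(y, t) if ti == 1) / sum(t)
         - sum(yi for yi, ti in zip(y, t) if ti == 0) / (n - sum(t)))
print(naive, att)
```

In practice, dedicated routines (e.g., Stata's treatment-effects commands) automate the logistic fit and the balance checks described below; the sketch is meant only to expose the mechanics.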
To illustrate PSA, we return to our working example of the association between early home quality and BMI at age 15 y. We began by dichotomizing the early home quality variable such that individuals at or above the 75th percentile were given a score of 1 to indicate a high-quality home environment, and those below the 75th percentile received a score of 0. Note that any value could be used to dichotomize your variable of interest, although the decision is typically driven by theory and previous research; we selected the 75th percentile because the range of values was relatively limited and we wanted to maximize the differences across groups. We then examined our descriptive statistics on other child and family background variables (mothers’ and fathers’ education, maternal age at birth of study child, poverty status at 24 mo, and child’s birth order) for the 2 quality groups. Next, we fit a logistic regression model in which we predicted the probability of being in a high-quality home environment from these background variables. In addition to calculating a propensity score for each individual (ranging from 0 to 1), PSA also creates a set of balanced blocks in which individuals in the block are equally likely to be in the treatment (in this case, a high-quality home environment) as not, given a similar set of background characteristics. More specifically, through an iterative process, the continuous propensity score is divided into blocks. The number of blocks can be specified by the researcher or driven by the data. When specified by the researcher, the most common number of blocks requested is 5. Blocks are retained and the model is considered adequate when a t test reveals that there are no differences in the average propensity score for the treatment and control groups and when the number of treatment and control individuals in each group is sufficient.
Within each block, the probability of being in the treatment (i.e., high-quality home environment) is approximately equal across individuals. For example, in our dataset, block 1 consists of all individuals with a propensity score <0.20, block 2 consists of individuals with propensity scores between 0.20 and 0.40, etc. The result is a set of blocks or strata in which some individuals are in high-quality home environments and some are not, but the means on each background characteristic are similar regardless of group membership. Finally, we fit 2 linear regression models: 1) using the propensity score as a control; and 2) using the matched blocks as controls.
An examination of the descriptive statistics for those with high-quality home environments versus those without it revealed that mothers in the high-quality sample were more educated (mean = 15.89 vs. 13.90), had partners who were more educated (mean = 15.95 vs. 14.16), and were older when the study child was born (mean = 31.28 vs. 27.67). Families with high-quality early home environments were less likely to be poor as well. No differences between groups were found for child’s birth order. Next, we fit a regression model in which we predicted adolescents’ BMI at 15 y from birth weight, our dichotomous home quality variable representing a high-quality early home environment, and the continuous propensity score. Home quality was not a statistically significant predictor of BMI, suggesting that there are no differences in BMI between those who experienced a high-quality home environment and those who did not once we account for the probability of being in a high-quality environment (Table 5).
TABLE 5.
Results from the propensity score analysis for adolescent BMI at 15 y1
| | Model 1 | Model 2 |
| Intercept | 20.83* (1.21) | 20.79* (1.18) |
| Birth weight | 0.001** (0.0003) | 0.001** (0.0003) |
| High-quality early home environment | −0.373 (0.412) | −0.332 (0.416) |
| Continuous propensity score | −5.51* (1.01) | |
| Block 2 | | −0.560 (0.489) |
| Block 3 | | −1.62** (0.544) |
| Block 4 | | −2.28* (0.575) |
| Block 5 | | −2.45* (0.539) |
| Block 6 | | −3.12* (0.842) |
| Block 7 | | −4.15+ (2.44) |
Data are expressed as parameter estimate (SE). *P < 0.001; **P < 0.01; ***P < 0.05; +P < 0.10.
Finally, we refit the regression model in which we included 6 dummy variables representing the matched propensity blocks rather than the continuous propensity score. Note that we requested 5 blocks, but these blocks did not meet the balancing properties described above (i.e., no difference in propensity scores between treatment and control groups within a block and/or too few individuals in the treatment and control groups within a block), and thus a sixth block was created. Again, high-quality home environment did not significantly predict BMI at 15 y, but 4 (blocks 3–6) of the 6 propensity blocks significantly predicted BMI, and 1 (block 7) marginally predicted BMI. The negative and statistically significant coefficients for each propensity block suggest that, relative to block 1 or the excluded group (i.e., the individuals with the lowest likelihood of being in a high-quality home environment), individuals with a higher probability of being in a high-quality home environment tend to have lower BMIs at 15 y.
For an overview of PSA, see the studies by Murnane and Willett (32) and McCartney et al. (33). For more technical readings on PSA, see the studies by D’Agostino (34), Rosenbaum and Rubin (35), and Rubin (36). For an application of PSA, see the study by Hill et al. (37). The most appropriate software program for estimating PSA is Stata.
Regression Discontinuity
The RDD is one of the most powerful quasi-experimental designs for identifying causal effects in non-experimental studies (38). Because assignment to a treatment or intervention is often endogenous (i.e., correlated with unmeasured variables and thus with the error term), estimating the causal effect of such a program can be difficult. To deal with this endogeneity, researchers using RDD assign individuals to either a treatment or control group based on a cutoff score for a given predictor variable. As such, treatment assignment is known (rather than unknown), and any differences between the treatment and control group on a subsequent outcome can be interpreted as the average treatment effect (38). For example, using our BMI data, children whose BMI is above the “obese” value might be assigned to participate in a nutrition intervention, whereas children whose BMI falls below the “obese” value would not receive the intervention. By comparing the outcomes (e.g., BMI in adolescence or early adulthood) of children just above the cutoff (i.e., the treatment group) with the outcomes of children just below the cutoff (i.e., the control group), we can determine the causal effect of our intervention, because any differences in the outcome are presumably attributable to the intervention alone. For an illustrated example of the RDD, see Figure 2. One additional benefit of RDD is that it reduces, if not eliminates, many of the ethical challenges associated with random assignment studies. Because it is often difficult to randomly assign individuals to a given condition (e.g., we cannot ethically assign someone to an obesity condition and make them gain weight), the RDD allows researchers in health-related fields to compute stronger causal estimates of treatment effects.
FIGURE 2.
Sample figure from a regression discontinuity analysis examining posttest scores on an outcome from pretest scores on the same construct. Solid black line represents the hypothetical control group, and the solid gray line represents the hypothetical treatment group.
RDDs can be either sharp or fuzzy. In a sharp design, treatment is determined only by a cutoff score on some observable variable. Causal inferences made from a sharp RDD closely resemble those obtained in a randomized experiment (39). In a fuzzy design, treatment may be determined by a set of unknown characteristics (or by a single characteristic), but the probability of receiving the treatment is discontinuous at the cutoff (40). In other words, exceptions to the cutoff rule (e.g., a child whose BMI falls a fraction of a point below the obese value might nevertheless be assigned to the treatment because he or she needs the nutrition intervention) can bias estimates, but as long as the likelihood of receiving the treatment remains discontinuous at the cutoff, the RDD holds and estimates of the average treatment effect can be computed.
Like PSA, RDD is a 2-stage process. First, the researcher must identify a cutoff score on a predictor variable and determine which participants will serve as the treatment group (e.g., those above the cutoff) and which participants will serve as the control group (e.g., those below the cutoff). Based on this determination, the researcher creates a dichotomous treatment variable (i.e., 1 for treatment and 0 for control). In the second stage, the researcher fits a basic regression model in which the posttest score is regressed on the treatment indicator as well as the pretest score centered around the cutoff score. Because the pretest score is centered around the cutoff, the intercept represents the expected posttest score for a control participant at the cutoff, and the coefficient for the treatment indicator estimates the discontinuity (i.e., the treatment effect) at that point. As a final step, additional covariates can be added to the regression model.
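The 2 stages can be sketched in a few lines of Python. The data below are simulated under an assumed (hypothetical) treatment effect of 5 points at the cutoff, and the OLS fit is computed directly from the normal equations; in practice, any regression routine would do.

```python
import random

random.seed(1)

# Simulated sharp design: assignment is determined entirely by the cutoff.
cutoff = 100.0
n = 400
pre = [random.gauss(100, 10) for _ in range(n)]
treat = [1 if p >= cutoff else 0 for p in pre]  # dichotomous treatment variable
post = [10 + 0.8 * (p - cutoff) + 5.0 * t + random.gauss(0, 2)
        for p, t in zip(pre, treat)]

# Design matrix: intercept, pretest centered around the cutoff, treatment dummy.
X = [[1.0, p - cutoff, float(t)] for p, t in zip(pre, treat)]

# OLS via the normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination.
k = 3
XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
Xty = [sum(r[i] * yi for r, yi in zip(X, post)) for i in range(k)]
A = [row[:] + [rhs] for row, rhs in zip(XtX, Xty)]  # augmented matrix
for i in range(k):
    piv = max(range(i, k), key=lambda r: abs(A[r][i]))  # partial pivoting
    A[i], A[piv] = A[piv], A[i]
    for r in range(k):
        if r != i:
            f = A[r][i] / A[i][i]
            A[r] = [a - f * b for a, b in zip(A[r], A[i])]
beta = [A[i][k] / A[i][i] for i in range(k)]

print(beta)  # beta[2] estimates the jump at the cutoff (the treatment effect)
```

With the model correctly specified, beta[2] recovers the simulated 5-point effect (up to sampling error), while the centered pretest term absorbs the underlying trend on either side of the cutoff.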
As we noted previously, we are not able to demonstrate the RDD with our working example because we do not have a treatment program per se. Nevertheless, the field of education offers numerous effective examples of RDD. For example, to investigate the effect of a new school reform effort in the Chicago Public Schools, Jacob and Lefgren (41) examined the causal impact of summer school on children’s reading and mathematics skills. They began by assigning children to summer school based on their pretest reading and math scores. Children above the cutoff did not receive remedial services (i.e., summer school), whereas children below the cutoff were required to attend summer school. They then compared the posttest reading and mathematics scores of children who received the treatment with the scores of children who did not receive the treatment and found that children who attended summer school had higher test scores. They concluded that summer school led to the differences in children’s test performance.
Gormley et al. (42) offer another excellent example of the application of RDD to education policy issues. Using variation in state-mandated age cutoffs for entering kindergarten, the authors investigated the impact of the Oklahoma universal pre-kindergarten (pre-k) program on children’s academic achievement in kindergarten. More specifically, they compared the outcomes of “young” kindergarten children who had just completed pre-k (i.e., treatment group) with the outcomes of “old” pre-k children who were just beginning pre-k (i.e., control group). They found that the universal pre-k program led to increases (as much as 3 points) in a range of language, literacy, and mathematics outcomes. Furthermore, the benefits were conferred to a diverse group of children, including ethnic minority and low-income children.
For an excellent overview of RDDs, see the study by Murnane and Willett (32). For those interested in more technical readings on RDD, see the studies by Hahn et al. (40) or Shadish et al. (43). Regression discontinuity analyses can be conducted in any software package, and all are relatively comparable.
Conclusion
In a field that is rapidly changing, the need to conduct rigorous research across the lifespan is expanding exponentially. In many instances, regression with covariates is insufficient for addressing the complex interdisciplinary questions that life-course researchers are investigating. Techniques that allow us to explore changes in an outcome over time or that carefully control for selection bias are necessary if we are going to produce results that accurately describe development or appropriately inform practice. In this study, we reviewed a set of statistical methods that life-course researchers can use to address their questions about health and development from infancy to older adulthood. Rather than offering a technical discussion of each analytic strategy, our goal was to provide a user-friendly guide to these techniques and offer additional resources for those who would like to learn more. We conclude with 2 recommendations. First, researchers should take the time to carefully consider the analytic strategy they plan to use to address their questions. Sometimes, the data lend themselves to 1 technique over another; often, however, 1 technique is no better than another, and thus the selection of a strategy that leads to warranted conclusions is a critical step in moving our fields forward. Second, although each technique can be used alone, we encourage researchers to consider using multiple strategies to address their research questions, as we have done here. This approach to data analysis is being used more frequently, and a comparison of findings across strategies is often quite informative.
Acknowledgments
All authors read and approved the final manuscript.
Footnotes
Abbreviations used: CFI, comparative fit index; pre-k, pre-kindergarten; PSA, propensity score analysis; RDD, regression discontinuity design; RMSEA, root mean square error of approximation; SEM, structural equation modeling.
Literature Cited
- 1.NICHD Early Child Care Research Network. Child care and child development: results from the NICHD Study of Early Child Care and Youth Development. New York: Guilford Press; 2005.
- 2.Winship C, Morgan SL. The estimation of causal effects from observational data. Annu Rev Sociol. 1999;25:659–706 [Google Scholar]
- 3.Newcombe NS. Some controls control too much. Child Dev. 2003;74:1050–2 [DOI] [PubMed] [Google Scholar]
- 4.Tabachnick BG, Fidell LS. Using multivariate statistics. 3rd ed. New York: Harper Collins Publishers; 1996.
- 5.Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22:1359–66 [DOI] [PubMed] [Google Scholar]
- 6.Rubinfeld DL. Reference guide on multiple regression. Federal Judicial Center, Reference Manual on Scientific Evidence. 2nd ed. Washington, DC: Federal Judicial Center; 2005:179–227.
- 7.Singer JD, Willett JB. Applied longitudinal data analysis: modeling change and event occurrence. New York: Oxford University Press; 2003.
- 8.Singer JD, Willett JB. It’s about time: using discrete-time survival analysis to study duration and the timing of events. J Educ Stat. 1993;18:155–95 [Google Scholar]
- 9.Allison PD. Survival analysis using the SAS system: a practical guide. Cary, NC: SAS Institute; 1995.
- 10.Willett JB, Singer JD. Investigating onset, cessation, relapse and recovery: why you should, and how you can, use discrete-time survival analysis. J Consult Clin Psychol. 1993;61:952–65 [DOI] [PubMed] [Google Scholar]
- 11.Willett JB, Singer JD. It’s deja vu all over again: using multiple-spell discrete-time survival analysis. J Educ Behav Stat. 1995;20:41–67 [Google Scholar]
- 12.Rank MR, Hirschl TA. The economic risk of childhood in America: estimating the probability of poverty across the formative years. J Marriage Fam. 1999;61:1058–67 [Google Scholar]
- 13.Gupta S, Smock PJ, Manning WD. Moving out: transition to nonresidence among resident fathers in the United States, 1968–1997. J Marriage Fam. 2004;66:627–38 [Google Scholar]
- 14.Burchinal M, Nelson L, Poe M. Growth curve analysis: an introduction to various methods for analyzing longitudinal data. In: McCartney K, Burchinal MR, Bub KL, editors. Best practices in quantitative methods for developmentalists, monographs of the society for research in child development. Boston, MA: Blackwell Publishing; 2006:65–87. [DOI] [PubMed]
- 15.Willett JB, Bub KL. Structural equation modeling: latent growth curve analysis. In: Everitt BS, Howell DC, editors. Encyclopedia of statistics in behavioral science. Chichester, UK: John Wiley & Sons; 2005:772–9.
- 16.Bryk AS, Raudenbush SW. Hierarchical linear models: applications and data analysis methods. 2nd ed. Newbury Park, CA: Sage Publications; 2002.
- 17.Willett JB, Sayer AG. Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychol Bull. 1994;116:363–81 [Google Scholar]
- 18.Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6:1–55 [Google Scholar]
- 19.Hu L, Bentler PM. Fit indices in covariance structure modeling: sensitivity to underparameterized model misspecification. Psychol Methods. 1998;3:424–53 [Google Scholar]
- 20.Bentler PM. Comparative fit indexes in structural equation modeling. Psychol Bull. 1990;107:238–46 [DOI] [PubMed] [Google Scholar]
- 21.Fan X, Thompson B, Wang L. Effects of sample size, estimation methods, and model specification on structural equation modeling fit indexes. Struct Equ Modeling. 1999;6:56–83 [Google Scholar]
- 22.Byrne BM. Structural equation modeling with LISREL, PRELIS and SIMPLIS: basic concepts, applications, and programming. Mahwah, NJ: Lawrence Erlbaum Associates; 1998.
- 23.Steiger JH. Understanding the limitations of global fit assessment in structural equation modeling. Pers Individ Dif. 2007;42:893–8 [Google Scholar]
- 24.Hooper D, Coughlan J, Mullen MR. Structural equation modelling: guidelines for determining model fit. Electr J Business Res Methods. 2008;6:53–60 [Google Scholar]
- 25.Keiley MK, Dankoski M, Dolbin-MacNab M, Liu T. Covariance structure analysis: from path analysis to structural equation modeling. In: Sprenkle DH, Piercy FP, editors. Research methods in family therapy. 2nd ed. New York: Guilford Press; 2005:432–60.
- 26.Kline RB. Principles and practice of structural equation modeling. 3rd ed. New York: Guilford Press; 2011.
- 27.Kline RB. Latent variable path analysis in clinical research: a beginner’s tour guide. J Clin Psychol. 1991;47:471–84 [DOI] [PubMed] [Google Scholar]
- 28.Muthén LK, Muthén BO. Mplus user’s guide. 5th ed. Los Angeles, CA: Muthén & Muthén; 2007.
- 29.Windle M, Mason WA. General and specific predictors of behavioral and emotional problems among adolescents. J Emot Behav Disord. 2004;12:49–61 [Google Scholar]
- 30.Bub KL, McCartney K, Willett JB. Behavior problem trajectories and first grade cognitive ability and achievement skills: a latent growth curve analysis. J Educ Psychol. 2007;99:653–70 [Google Scholar]
- 31.Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978;6:34–58 [Google Scholar]
- 32.Murnane RJ, Willett JB. Methods matter: improving causal inference in educational and social science research. New York: Oxford University Press; 2011.
- 33.McCartney K, Bub KL, Burchinal M. Selection, detection, and reflection. In: McCartney K, Burchinal MR, Bub KL, editors. Best practices in quantitative methods for developmentalists, monographs of the society for research in child development. Boston, MA: Blackwell Publishing; 2006:105–26 [DOI] [PubMed]
- 34.D’Agostino RB. Tutorial in biostatistics: propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17:2265–81 [DOI] [PubMed] [Google Scholar]
- 35.Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55 [Google Scholar]
- 36.Rubin DB. Estimating causal effects from large datasets using propensity scores. Ann Intern Med. 1997;127:757–63 [DOI] [PubMed] [Google Scholar]
- 37.Hill JL, Waldfogel J, Brooks-Gunn J, Han W. Maternal employment and child development: a fresh look using newer methods. Dev Psychol. 2005;41:833–50 [DOI] [PubMed] [Google Scholar]
- 38.Cook TD, Campbell DT. Quasi-experimentation: design and analysis issues for field settings. Boston: Houghton Mifflin; 1979.
- 39.Pitts SC, Prost JH, Winters JJ. Quasi-experimental designs in developmental research: design and analysis considerations. In: Teti DM, editor. Handbook of research methods in developmental science. Malden, MA: Blackwell Publishing; 2004.
- 40.Hahn J, Todd P, Van der Klaauw W. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica. 2001;69:201–9 [Google Scholar]
- 41.Jacob BA, Lefgren L. Remedial education and student achievement: a regression discontinuity analysis. NBER Working Paper 8918. Cambridge, MA: National Bureau of Economic Research.
- 42.Gormley WT, Gayer T, Phillips DA, Dawson B. The effects of universal pre-k on cognitive development. Dev Psychol. 2005;41:872–84 [DOI] [PubMed] [Google Scholar]
- 43.Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin; 2002.


