An Introduction to Inverse Probability Weighting and Marginal Structural Models: The Case of Environmental Tobacco Exposure and Attention Deficit/Hyperactivity Disorder Behaviors

Michael T Willoughby; Siri Warkentien; Erica N Browne; Lisa Gatzke-Kopp; Daniel Berry

doi:10.1037/dev0001803

Author manuscript; available in PMC: 2025 Sep 27.

Published in final edited form as: Dev Psychol. 2024 Aug 22;61(1):195–213. doi: 10.1037/dev0001803


PMCID: PMC11894555 NIHMSID: NIHMS2055842 PMID: 39172429

Abstract

Developmental scientists routinely examine how a focal predictor relates to some aspect of children’s development. Although covariate adjustment is typically used to test hypotheses, propensity score-based methods, including inverse probability of treatment weighting (IPTW) and marginal structural models (MSM), can strengthen inference and answer more nuanced, developmentally relevant questions. This paper provides a didactic introduction to IPTW and MSM methods and demonstrates their use for testing the impact of environmental smoke exposure (a continuous treatment) from 6 to 90 months on parent-reported ADHD behaviors in 1st grade for 1,053 children (51% male, 44% Black) in the Family Life Project. We highlight how IPTW and MSM differ from more traditional covariate adjustment methods, both in the conclusions they support and in how their assumptions are evaluated. Sample Stata syntax is provided.

Keywords: ADHD, environmental tobacco exposure, causal methods

Introduction

Overview

Developmental scientists routinely conduct studies that are designed to test questions about how a focal predictor (e.g., an exposure or experience) relates to some aspect of children’s development. Observational designs are frequently used because the focal predictors of interest often cannot be ethically subjected to experimentation. Psychologists have long been aware of the threats to inference that result when relying on observational designs, including the risk of confounds underlying the observed association between predictors and outcomes (Campbell, 1957; Shadish, Cook, & Campbell, 2002). To offset this risk, developmentalists routinely collect information on potential confounder variables and include these as covariates in their statistical models. Covariate adjustment has become the de facto standard for testing developmental questions. An unstated premise is that increasing the number of covariates that are considered serves to improve the strength of inference.

There is nothing inherently wrong with the conventional approach for testing developmental questions, and developmentalists routinely leverage experimental evidence to inform their research questions whenever it is available (e.g., randomized controlled trials; experimental animal research). However, it is instructive to consider how researchers from other disciplines grapple with similar challenges. In this study, we demonstrate two closely related propensity score-based methods, inverse probability of treatment weighting (IPTW) and marginal structural models (MSM), that have gained substantial support in epidemiology, economics, and medicine. We do not attempt to provide a comprehensive technical treatment of these methods, which is available elsewhere (e.g., Hernan & Robins, 2020; Thoemmes & Ong, 2016; Hong, 2015). Instead, we provide a non-technical introduction to these approaches through an applied example. Our objective is to demonstrate how IPTW and MSM can challenge us to think more clearly about how developmental questions are formulated and how covariates are selected, and to discern more explicitly whether covariate adjustment efforts were successful. Moreover, in situations in which focal predictors and confounders are time varying, these methods overcome limitations of traditional covariate adjustment approaches. We begin with a brief rationale for our motivating research question. Next, we provide an orientation to key ideas from Donald Rubin’s potential outcomes framework, on which IPTW and MSM methods depend. Finally, we describe IPTW and MSM methods and demonstrate their application.

Postnatal Tobacco Exposure and Children’s Attention Problems

The association between maternal smoking during pregnancy and externalizing behavior in children has been documented for hyperactivity, aggression, and antisocial behaviors (Gatzke-Kopp & Beauchaine, 2007; Gaysina et al., 2013; Keyes, Davey Smith, & Susser, 2014). Until recently, far less attention had been paid to the potential risks of postnatal smoke exposure. Recent research has expanded to consider the impacts of postnatal environmental exposure, including secondhand smoke from sources other than the child’s mother and environmental residues that are no longer airborne (Matt et al., 2011). We recently demonstrated a positive linear association between children’s levels of secondhand smoke exposure (using salivary indicators of cotinine, a metabolic byproduct of nicotine exposure) and teacher-reported externalizing behaviors, which persisted even after controlling for multiple family, caregiver, and child characteristics (Gatzke-Kopp et al., 2020). Yet there continues to be considerable debate in the field about whether this association is confounded by other processes (e.g., other characteristics of children’s homes that correlate with secondhand smoke exposure). The stakes of contributing rigorous evidence to this debate should not be overlooked. As was seen with prenatal smoking, once rigorous evidence demonstrated that smoking during pregnancy increases risks of physical health complications for both mother and child, robust public health efforts were mounted to encourage prenatal smoking cessation. Similar efforts may follow with respect to postnatal smoke exposure.

Building on our earlier work (Gatzke-Kopp et al., 2019, 2020), here we demonstrate the IPTW method to estimate the effect of a single treatment exposure period for the following research question: What is the association between average environmental smoke exposure across the first seven years of life and parent-reported attention deficit/hyperactivity disorder (ADHD) symptomatology in first grade? We then acknowledge that smoke exposure, like many critical developmental exposures, is dynamic and can vary throughout childhood. Although researchers can choose to simplify dynamic treatments by using data from a single timepoint or by averaging values from multiple timepoints (as we do in the IPTW demonstration for didactic purposes), these strategies result in information loss, may violate assumptions or yield biased estimates, and ultimately prevent the investigation of more nuanced and developmentally relevant questions. Such questions might ask whether earlier exposure to treatment is more consequential than later exposure (timing), whether there is a linear relation between the length of treatment exposure and the outcome (duration), or whether the order of treatment exposures matters (sequencing). Answers to these more specific questions on the timing, duration, and sequencing of exposures during childhood have implications for policy and/or prevention efforts. We subsequently demonstrate the use of MSM methods to answer the question: What is the unique and cumulative association of environmental smoke exposure at each of two early periods (infancy-toddlerhood and early childhood) with parent-reported ADHD symptomatology in 1st grade? Does the developmental timing of environmental smoke exposure matter for ADHD?

Potential Outcomes Framework

West and Thoemmes (2010) described two traditions related to causal inference. They observed that psychologists are primarily trained in the tradition of Donald Campbell, which prioritizes careful research design in estimating treatment effects by ruling out common threats to internal validity—history, maturation, and selection (Campbell, 1957; Shadish et al., 2002). Although Campbell’s approach is currently in widespread use, developmental psychologists are less likely to have been exposed to Donald Rubin’s potential outcomes framework. Here, we provide a brief synopsis of some of the key ideas in Rubin’s approach. The synopsis, which includes a high-level overview of propensity scores, is necessary to understand the IPTW and MSM methods that are introduced in the next section. A slightly more technical but still accessible introduction to these ideas is provided by West and Thoemmes (2010), and source material for Rubin’s potential outcomes model has also been assembled (Rubin, 2006).

Rubin’s potential outcomes model considers the causal effect of a ‘treatment’ (we refer to treatments, exposures, and focal predictors interchangeably). For ease of exposition, the treatment is often defined as a dichotomous comparison between a treatment and control condition. A causal effect is conceived of as an idealized (but unachievable) comparison between an individual’s outcomes if they could be exposed to both treatment and control conditions at the same time and in the same contexts. Because only one condition can ever be observed for any individual, the estimation of causal effects can be construed as a missing data problem. Causal inference methods within the potential outcomes framework seek to approximate this idealized comparison in the presence of missing data. Randomized controlled trials are considered the gold standard approach because the random assignment of individuals to treatment and control conditions creates balanced groups on both observed and unobserved characteristics that differ only in their exposure to the treatment. The balance allows unbiased estimates of the treatment effect to be directly calculated from the data. For questions that are not amenable to randomization, causal effects cannot be calculated directly from study data because individuals in the treatment condition often differ systematically from those in the control condition. In these cases, the ability to estimate causal effects depends on how closely the researcher can approximate a randomized experiment.
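A small simulation can illustrate the missing-data framing. The sketch below is ours, not the authors’ analysis. It generates both potential outcomes for every simulated child, which is never possible with real data, and then reveals only one of them. It compares the naive treated-versus-control difference under random assignment with the same difference under self-selection driven by a hypothetical confounder `x`. All names and coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000

# Hypothetical confounder (e.g., household adversity) that raises both the
# chance of treatment and the outcome. All values are simulated.
x = rng.normal(size=n)

# Every simulated child has BOTH potential outcomes; the true effect is 0.5.
y0 = x + rng.normal(size=n)  # outcome under the control condition
y1 = y0 + 0.5                # outcome under the treatment condition


def observed_difference(z):
    """Treated-minus-control mean using only the outcome each child reveals."""
    y_obs = np.where(z == 1, y1, y0)  # the other potential outcome is 'missing'
    return y_obs[z == 1].mean() - y_obs[z == 0].mean()


# Randomized assignment: treatment is independent of x.
z_random = rng.binomial(1, 0.5, size=n)

# Self-selection: children with higher x are more likely to be treated.
z_selected = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))

true_effect = np.mean(y1 - y0)
est_random = observed_difference(z_random)
est_selected = observed_difference(z_selected)
print(f"true={true_effect:.2f}  randomized={est_random:.2f}  self-selected={est_selected:.2f}")
```

Under randomization, the naive difference should land close to the true effect of 0.5. Under self-selection, it also absorbs the confounder’s influence on the outcome.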

In experimental studies, randomization can be considered a known selection mechanism that ensures that treatment groups are balanced (i.e., confounder variables are uncorrelated with treatment). In observational studies, the selection mechanism is unknown and must be modeled. To do so, we select covariates that are related to the presumed selection mechanism and use these covariates to predict treatment exposure. We then assess the extent to which our treatment and measured confounder variables are uncorrelated, thereby determining if the attempt to approximate a randomized experiment was successful.

In our experience, developmentalists often select confounder variables to guard against general threats to inference, typically focusing on demographic and socioeconomic variables. For instance, including family income in a model that examines the association between tobacco exposure and ADHD offers some protection against drawing inaccurate conclusions about the impact of tobacco exposure when that association may instead be attributable to other aspects of poverty. Rubin’s potential outcomes framework sharpens our thinking about the selection of confounder variables. Confounders are chosen based on having a presumed role as a selection mechanism for our focal predictor in addition to being associated with the outcome (Austin & Stuart, 2015; Pearl, 2011). In our example, the selection of confounder variables was informed by asking the question ‘what are the factors that determine whether, when, and to what extent children are exposed to environmental tobacco smoke?’.

In contrast to identifying general threats to inference, the potential outcomes framework clarifies two aspects of confounder selection. First, it requires the correct temporal order: included confounders must precede the treatment exposure. For instance, for family income level to predict the extent to which children experience environmental tobacco smoke, it must have occurred (been experienced by the child) prior to exposure. Including confounders that follow exposure may block some of the estimated effect and lead to biased estimates. Second, this framework forces the researcher to separate two key analytic steps: first constructing treatment groups that are balanced on observed covariates and then running the substantive outcome model (Rubin, 2007). Separating these two steps prevents the researcher from including or excluding various confounders based on the focal predictor’s coefficient or significance level (i.e., “p-hacking”).

Once a full complement of observed confounder variables has been selected, some variant of propensity score methods is used to estimate the probability that an individual was exposed to the treatment, irrespective of their actual exposure. Formally, propensity scores are defined as the conditional probability of receiving the treatment given an observed set of covariates (i.e., $P(Z = 1 \mid X)$, where Z is a binary treatment exposure and X is the set of all observed confounders) (Rosenbaum & Rubin, 1983). Propensity scores are the basis for multiple strategies to estimate treatment effects using observational data, including matching, subclassification, and weighting (see, e.g., Austin, 2011; Harder, Stuart and Anthony, 2010; notably, King & Nielsen, 2019, have advised against using propensity score matching). In all cases, the propensity score is used to compare children who differed with respect to their treatment but were similar with respect to their propensity score. For example, we can test whether children who had similar levels of risk for tobacco smoke exposure, but who differed in their actual exposure, differed in ADHD behaviors in first grade. A key property of propensity scores is that, among individuals with the same propensity score, the observed confounder variables used in the propensity score model have the same distribution regardless of treatment status (i.e., they are balanced; Rosenbaum & Rubin, 1983). As such, and assuming that other non-trivial assumptions are met (elaborated below), propensity score methods, including IPTW and MSM, approximate a randomized experiment and permit stronger inferences about focal predictors from observational designs than do conventional covariate adjustment approaches. Another key feature of propensity score methods is the ability to visualize and empirically test whether individuals who varied with respect to their actual treatment exposure were comparable with respect to their propensity scores.
In circumstances where individuals who differed in their treatment exposure are dissimilar with respect to their confounders (i.e., propensity scores are unbalanced across treatment conditions), strong inferences are not possible, and this problem typically goes undetected when using conventional covariate adjustment approaches.
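The following is a minimal sketch of estimating $P(Z = 1 \mid X)$ for a binary treatment. It fits a logistic regression by Newton-Raphson on simulated data, using plain NumPy rather than any particular statistics package. It then compares exposed and unexposed children within propensity score deciles. The confounders `income` and `prenatal` and their coefficients are hypothetical. Subclassification into deciles serves only to illustrate the balancing property; it is not the method used in this paper.

```python
import numpy as np


def fit_propensity(X, z, n_iter=25):
    """Logistic regression by Newton-Raphson; returns (coefficients, P(Z=1|X))."""
    X1 = np.column_stack([np.ones(len(z)), X])  # add intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X1 @ beta))
        gradient = X1.T @ (z - p)                             # score vector
        information = X1.T @ (X1 * (p * (1 - p))[:, None])    # X' W X
        beta = beta + np.linalg.solve(information, gradient)  # Newton step
    return beta, 1 / (1 + np.exp(-X1 @ beta))


rng = np.random.default_rng(11)
n = 50_000
income = rng.normal(size=n)                            # hypothetical standardized income
prenatal = rng.binomial(1, 0.3, size=n).astype(float)  # hypothetical prenatal smoking
true_logit = -0.5 - 0.8 * income + 1.2 * prenatal
z = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))     # binary exposure

# Propensity score for every child, whatever their actual exposure.
beta, ps = fit_propensity(np.column_stack([income, prenatal]), z)

# Crude comparison: exposed children have lower income than unexposed children.
crude_gap = income[z == 1].mean() - income[z == 0].mean()

# Compare exposed and unexposed children within deciles of the propensity score.
stratum = np.digitize(ps, np.quantile(ps, np.linspace(0.1, 0.9, 9)))
within_gaps = np.array([
    income[(stratum == s) & (z == 1)].mean() - income[(stratum == s) & (z == 0)].mean()
    for s in range(10)
])
print(np.round(beta, 2), round(crude_gap, 2), np.round(within_gaps, 2))
```

The crude income gap between exposed and unexposed children is large. On average, the gaps within propensity score deciles are much smaller, which is the balancing property described above.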

One final note is important. ‘Treatment’ exposure is not limited to a comparison of two conditions. Although initially developed for use with binary treatments, the potential outcomes framework has been extended to multi-category and continuous treatments (e.g., Hirano & Imbens, 2004; Imai & van Dyk, 2004). This is important given that many developmental questions focus on continuous exposures. In the present study, we consider children’s exposure to tobacco smoke as our focal continuous treatment.

Inverse Probability of Treatment Weighting

Although IPTW methods were developed by Robins (1986) independently of Rubin’s work, IPTW methods can be understood as a propensity score weighting approach (Thoemmes & Ong, 2016). IPTW methods follow the general sequence of steps described above for estimating propensity scores. First, potential confounder variables are selected and a propensity score model is developed. For binary treatments, this can be accomplished using logistic regression, with the predicted probability serving as the propensity score. The propensity scores are then used to calculate the inverse probability of treatment weight, which is the inverse of the propensity score for individuals who received the treatment ($1 / P(Z = 1 \mid X)$) and the inverse of 1 minus the propensity score for individuals who did not receive the treatment ($1 / (1 - P(Z = 1 \mid X))$). This gives a large weight to individuals who had a low predicted probability of receiving the treatment but did in fact receive it, whereas individuals who had a low predicted probability of receiving the treatment and did not receive it are assigned a small weight. For a continuous treatment, instead of using logistic regression to estimate a probability, a linear regression model is used in conjunction with a normal probability density function to generate a propensity score (Robins, Hernan, and Brumback, 2000; Thoemmes & Ong, 2016). The propensity score is denoted $\varphi(E(Z \mid X))$, where Z is the continuous treatment exposure level and $\varphi$ is the normal probability density function, centered on the value $E(Z \mid X)$ predicted by the regression model and evaluated at the individual’s observed exposure. For continuous treatments, instead of 1, the numerator is the probability density of the observed treatment level not conditional on any covariates (i.e., centered on the mean treatment level, $E(Z)$). These are referred to as stabilized weights and are denoted $\varphi(E(Z)) / \varphi(E(Z \mid X))$.
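The two weight definitions above can be written in a few lines. This sketch rests on our own assumptions. It uses plain NumPy with a hand-written normal density, ordinary least squares for $E(Z \mid X)$, and the residual standard deviation for the conditional density. The simulated confounders (`income`, `prenatal`) and the log-cotinine treatment are hypothetical and do not reproduce the Family Life Project data or the authors’ Stata syntax.

```python
import numpy as np


def normal_pdf(value, mean, sd):
    """Normal probability density evaluated at `value`."""
    return np.exp(-0.5 * ((value - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def binary_iptw(z, ps):
    """1 / P(Z=1|X) for treated children and 1 / (1 - P(Z=1|X)) for untreated children."""
    return np.where(z == 1, 1 / ps, 1 / (1 - ps))


def continuous_stabilized_weights(z, X):
    """phi(E(Z)) / phi(E(Z|X)): normal densities evaluated at each child's observed exposure."""
    X1 = np.column_stack([np.ones(len(z)), X])
    beta, *_ = np.linalg.lstsq(X1, z, rcond=None)
    fitted = X1 @ beta                                  # E(Z | X) from linear regression
    resid_sd = np.std(z - fitted, ddof=X1.shape[1])
    denominator = normal_pdf(z, fitted, resid_sd)       # conditional on covariates
    numerator = normal_pdf(z, z.mean(), z.std(ddof=1))  # marginal: no covariates
    return numerator / denominator


# Binary case: children whose actual treatment was unlikely get large weights.
z_bin = np.array([1, 1, 0, 0])
ps = np.array([0.2, 0.8, 0.2, 0.8])
w_bin = binary_iptw(z_bin, ps)  # approximately [5, 1.25, 1.25, 5]

# Continuous case with simulated (hypothetical) log-cotinine and two confounders.
rng = np.random.default_rng(5)
n = 50_000
income = rng.normal(size=n)
prenatal = rng.binomial(1, 0.3, size=n).astype(float)
log_cotinine = 0.2 - 0.25 * income + 0.4 * prenatal + rng.normal(scale=0.95, size=n)
sw = continuous_stabilized_weights(log_cotinine, np.column_stack([income, prenatal]))
print(w_bin, round(sw.mean(), 3))
```

The first and last binary cases received the treatment status that was unlikely given their propensity scores, so they receive the large weights. For the continuous treatment, stabilized weights from a correctly specified model should average close to 1.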
During the second step, checks are made regarding both the construction of weights and, most critically, the extent to which the weights create the intended balance in the measured confounders across treatment conditions in the weighted population. Here, balance is established if the weights have markedly reduced the associations between the measured confounders and the treatment (e.g., absolute standardized mean differences < .20; absolute Pearson correlations < .10). As in a randomized experiment, it is this balance that permits unbiased estimation of causal effects. If weighting is successful, the third step is to estimate a weighted regression model that relates the outcome to the focal predictor (treatment). IPT weights essentially create a “pseudo-population” that is not identical to the observed sample. Instead, it can be thought of as a sample from a population, created through the differential weighting, in which the treatment variable is uncorrelated with the confounder variables (Thoemmes & Ong, 2016).
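The second and third steps can be sketched on simulated data as follows. Balance is summarized with the weighted Pearson correlation between each confounder and the continuous treatment, which can be compared against the |r| < .10 benchmark mentioned above. The outcome model is a weighted regression containing only the treatment. All data, variable names, and the true effect of 0.2 are assumptions for illustration.

```python
import numpy as np


def normal_pdf(value, mean, sd):
    """Normal probability density evaluated at `value`."""
    return np.exp(-0.5 * ((value - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def weighted_corr(a, b, w):
    """Pearson correlation between a and b in the weighted pseudo-population."""
    ma, mb = np.average(a, weights=w), np.average(b, weights=w)
    cov = np.average((a - ma) * (b - mb), weights=w)
    var_a = np.average((a - ma) ** 2, weights=w)
    var_b = np.average((b - mb) ** 2, weights=w)
    return cov / np.sqrt(var_a * var_b)


def weighted_slope(y, x, w):
    """Slope from a weighted least-squares regression of y on x (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), x])
    root_w = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X1 * root_w[:, None], y * root_w, rcond=None)
    return beta[1]


# Simulated data; the true effect of exposure z on outcome y is 0.2.
rng = np.random.default_rng(7)
n = 50_000
income = rng.normal(size=n)
prenatal = rng.binomial(1, 0.3, size=n).astype(float)
z = 0.2 - 0.25 * income + 0.4 * prenatal + rng.normal(scale=0.95, size=n)
y = 0.2 * z - 0.3 * income + 0.3 * prenatal + rng.normal(size=n)

# Step 1: stabilized weights from a linear treatment model and normal densities.
X1 = np.column_stack([np.ones(n), income, prenatal])
fitted = X1 @ np.linalg.lstsq(X1, z, rcond=None)[0]
w = normal_pdf(z, z.mean(), z.std(ddof=1)) / normal_pdf(z, fitted, np.std(z - fitted, ddof=3))

# Step 2: balance check, correlations before (unit weights) and after weighting.
balance = {
    name: (weighted_corr(covariate, z, np.ones(n)), weighted_corr(covariate, z, w))
    for name, covariate in [("income", income), ("prenatal", prenatal)]
}

# Step 3: weighted outcome model containing only the treatment.
naive_effect = weighted_slope(y, z, np.ones(n))
iptw_effect = weighted_slope(y, z, w)
print(balance, round(naive_effect, 3), round(iptw_effect, 3))
```

Before weighting, both confounders are correlated with exposure and the unweighted slope overstates the effect. After weighting, the correlations are near zero and the weighted slope is close to 0.2. This sketch reports point estimates only. In practice, standard errors should account for the weights (e.g., robust variance estimators).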

Extending Inverse Probability of Treatment Weighting to Multiple Occasions: Marginal Structural Models

Many developmental research questions involve focal treatment and confounder variables that vary across time. For example, children’s exposure to environmental tobacco smoke, as well as the factors that contribute to that exposure, likely vary from infancy through early childhood. In these circumstances, traditional regression-based methods that rely on covariate adjustment can produce biased estimates of the treatment effect. In our example, we consider the following elements:
- tobacco exposure during two treatment periods: infancy-toddlerhood, defined as 6-24 months, and early childhood, defined as 48-90 months;
- two sets of time-varying confounders: factors that increase the likelihood of tobacco exposure during infancy-toddlerhood and during early childhood;
- parent-rated ADHD behaviors at 36 months, an intermediate point between the infancy-toddlerhood and early childhood periods;
- parent-rated ADHD behaviors in 1st grade, the distal outcome.

Because ADHD symptoms at 36 months could lead to increased tobacco exposure in early childhood and/or increased ADHD symptoms in 1st grade, our models need to account for 36-month ADHD symptoms. However, traditional regression-based approaches are problematic here. Including 36-month ADHD symptoms as a covariate would adjust the effect of early childhood tobacco exposure, as intended. Yet to the extent that the association of infancy-toddlerhood tobacco exposure with 1st grade ADHD symptoms is transmitted through 36-month ADHD symptoms, this part of the effect would be ‘blocked’ or ‘controlled away’ by including 36-month ADHD as a covariate. Thus, regression-based statistical control would address one bias while introducing another. A key advantage of using IPTWs to estimate MSMs is that they allow one to address both potential biases simultaneously.
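A simulated data-generating process that loosely mirrors Figure 1 reproduces this dilemma. In this hypothetical setup, all coefficients are ours:
- `c` is a baseline confounder.
- `z1` is infancy-toddlerhood exposure, which affects `adhd36`.
- `adhd36` is 36-month ADHD, which in turn affects both early childhood exposure `z2` and the outcome.

If both exposures were set by intervention, the effect of `z1` would be 0.4 (0.2 directly plus 0.4 × 0.5 through 36-month ADHD), and the effect of `z2` would be 0.3.

```python
import numpy as np


def ols(y, *predictors):
    """Ordinary least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


# Hypothetical data-generating process (all coefficients are illustrative):
# c = baseline confounder, z1 = infancy-toddlerhood exposure,
# adhd36 = 36-month ADHD symptoms, z2 = early childhood exposure,
# y = 1st grade ADHD symptoms.
rng = np.random.default_rng(42)
n = 100_000
c = rng.normal(size=n)
z1 = 0.3 * c + rng.normal(size=n)
adhd36 = 0.5 * z1 + 0.3 * c + rng.normal(size=n)             # mediator for z1
z2 = 0.6 * z1 + 0.3 * adhd36 + 0.2 * c + rng.normal(size=n)  # adhd36 confounds z2
y = 0.2 * z1 + 0.3 * z2 + 0.4 * adhd36 + 0.3 * c + rng.normal(size=n)

# Effects if both exposures were set by intervention: z1 -> 0.2 + 0.4 * 0.5 = 0.4; z2 -> 0.3.
without_adhd36 = ols(y, z1, z2, c)[1:3]       # z2 coefficient is confounded
with_adhd36 = ols(y, z1, z2, adhd36, c)[1:3]  # z1 coefficient loses the path via adhd36
print("omit adhd36:  ", np.round(without_adhd36, 3))
print("adjust adhd36:", np.round(with_adhd36, 3))
```

Omitting `adhd36` leaves the `z2` coefficient confounded; under this setup it is about 0.41 in expectation. Including `adhd36` recovers the `z2` effect but shrinks the `z1` coefficient to its direct path of about 0.2. Neither regression recovers both effects.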

Estimating MSMs through IPTWs addresses these concerns by allowing each exposure to maintain its own set of balancing covariates. Thus, the measured covariates (i.e., confounders) included in the balancing model for each wave can (and typically should) be distinct and reflect a presumed causal model. In many disciplines, this model is depicted visually using a directed acyclic graph (see Wouk, Bauer, & Gottfredson, 2019 for an accessible description). Continuing with our example, we want to create a set of weights that balances 36-month ADHD symptoms across subsequent levels of early childhood tobacco exposure. However, we would not want to balance 36-month ADHD symptoms across levels of infancy-toddlerhood tobacco exposure, for two reasons:
(1) causes should not go backward in time;
(2) doing so would block the indirect effect that we are trying to preserve (described above).
Rather, infancy-toddlerhood tobacco exposure would be allowed its own set of balancing covariates comprising plausible confounders for that period of exposure (e.g., maternal reports of prenatal smoking). Although the specific covariates included in the respective balancing models should be based on one’s specific causal model, it is common to carry time-invariant and time-varying balancing covariates forward from prior exposure waves to future waves. This includes prior levels of the exposure itself (i.e., balancing infancy-toddlerhood tobacco exposure across levels of early childhood tobacco exposure). After establishing the balance of each covariate at each time point, a single omnibus weight is created for each individual by taking the product of their respective wave-specific weights (Hernan & Robins, 2006). Using these weights in a regression model removes the associations between the measured confounders and the treatments, simulating a repeated random assignment design.
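The sketch below builds wave-specific stabilized weights and multiplies them into a single omnibus weight. It uses a simulated, hypothetical two-wave process with the following variables:
- `c`: a baseline confounder;
- `z1`: infancy-toddlerhood exposure;
- `adhd36`: 36-month ADHD, caused by `z1`;
- `z2`: early childhood exposure, influenced by `adhd36`;
- `y`: the outcome.

The wave 1 model balances `z1` on `c` only. The wave 2 model balances `z2` on prior exposure, 36-month ADHD, and `c`, with prior exposure also entering the numerator. Normal linear treatment models and all coefficients are assumptions of this sketch. Under this setup, the intervention effects are 0.4 for `z1` and 0.3 for `z2`.

```python
import numpy as np


def normal_density(z, *predictors):
    """Normal density of each observed z under a linear model z ~ predictors (intercept added)."""
    X1 = np.column_stack([np.ones(len(z)), *predictors])
    beta, *_ = np.linalg.lstsq(X1, z, rcond=None)
    resid = z - X1 @ beta
    sd = resid.std(ddof=X1.shape[1])
    return np.exp(-0.5 * (resid / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def weighted_ols(y, w, *predictors):
    """Weighted least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), *predictors])
    root_w = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X1 * root_w[:, None], y * root_w, rcond=None)
    return beta


# Hypothetical two-wave process (coefficients are illustrative).
rng = np.random.default_rng(42)
n = 100_000
c = rng.normal(size=n)                                       # baseline confounder
z1 = 0.3 * c + rng.normal(size=n)                            # infancy-toddlerhood exposure
adhd36 = 0.5 * z1 + 0.3 * c + rng.normal(size=n)             # 36-month ADHD
z2 = 0.6 * z1 + 0.3 * adhd36 + 0.2 * c + rng.normal(size=n)  # early childhood exposure
y = 0.2 * z1 + 0.3 * z2 + 0.4 * adhd36 + 0.3 * c + rng.normal(size=n)

# Wave 1: balance z1 on the baseline confounder only (not on later adhd36).
sw1 = normal_density(z1) / normal_density(z1, c)
# Wave 2: balance z2 on prior exposure, 36-month ADHD, and c; prior exposure also in numerator.
sw2 = normal_density(z2, z1) / normal_density(z2, z1, adhd36, c)
w = sw1 * sw2  # omnibus weight: product of the wave-specific weights

# Marginal structural model: weighted regression of y on the exposures only.
msm = weighted_ols(y, w, z1, z2)
print("MSM effects (z1, z2):", np.round(msm[1:], 3), " mean weight:", round(w.mean(), 3))
```

The weighted regression of the outcome on both exposures should approximately recover both intervention effects. The `z1` estimate retains the path through 36-month ADHD, and the `z2` estimate is no longer confounded by it.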

Notably, relying on a causal model to inform the selection of balancing covariates is important for avoiding what is known as collider bias. In their simplest form, colliders are variables that are caused by both the exposure and the outcome. Whereas a confounding variable represents a common cause of the exposure and the outcome, a collider variable is the opposite: a common effect of the exposure and the outcome. Balancing on (or controlling for) a collider variable induces a spurious association between the exposure and the outcome. For example, both exposure to tobacco smoke and ADHD symptomatology might be associated with heightened risk of respiratory problems (e.g., via inflammation effects; Chang et al., 2021). In that case, balancing on respiratory problems could induce a spurious association between tobacco-smoke exposure and ADHD, even if there were no true association. Paying close attention to the temporal precedence of balancing covariates relative to the exposure and outcome can help to minimize the accidental inclusion of a collider. However, it is no panacea, as collider biases can also emerge via more complex scenarios that do not lend themselves to simple rules of thumb (see Cinelli, Forney & Pearl, 2020). Thus, thinking carefully about the broader causal model behind one’s balancing covariates is a critical first step in the entire IPTW endeavor.
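Collider bias is easy to demonstrate by simulation. In the sketch below, exposure and outcome are generated independently, and a hypothetical respiratory-problems variable is caused by both. All names and coefficients are illustrative assumptions.

```python
import numpy as np


def ols(y, *predictors):
    """Ordinary least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


rng = np.random.default_rng(3)
n = 100_000
smoke = rng.normal(size=n)  # hypothetical exposure
adhd = rng.normal(size=n)   # generated independently: no true association
# Hypothetical collider: respiratory problems caused by both variables.
respiratory = 0.7 * smoke + 0.7 * adhd + rng.normal(size=n)

b_plain = ols(adhd, smoke)[1]                  # slope without the collider
b_collider = ols(adhd, smoke, respiratory)[1]  # slope after adjusting for the collider
print(round(b_plain, 3), round(b_collider, 3))
```

The unadjusted slope is near zero. Adjusting for the collider produces a clearly negative slope (about −0.33 in expectation under these coefficients), even though no true association exists.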

Assumptions

The success of IPTW and MSM methods is contingent on multiple assumptions. First, like all propensity score methods, these methods assume that all relevant confounder variables have been measured and that there are no unobserved confounders (i.e., the ignorability assumption). This assumption is untestable and underscores the importance of measuring all the relevant variables that represent the selection mechanism accounting for individual differences in treatment exposure. In our example, we want to include all confounders that predict smoke exposure. As noted above, thoughtful consideration of the selection mechanism is essential, as including colliders or other variables with ambiguous causal relations to the exposure and outcome can introduce new bias (Cole & Hernán, 2008). When faced with large numbers of potential variables, stepwise regression procedures or machine learning algorithms can be used to reduce the covariates to those that significantly predict treatment exposure (Rosenbaum & Rubin, 1984; Hong, 2015; Lee, Lessler, & Stuart, 2010; Zhu et al., 2015).

Second, these methods can only be used to estimate causal effects for individuals who could potentially be exposed to the full range of treatment levels (i.e., the positivity assumption). In our case, children who have no potential for smoke exposure (probability equal to zero) cannot contribute to estimates, because there must be both exposed and non-exposed children at each exposure level across all covariate histories (Cole et al., 2010). We can only draw inferences about the association between tobacco smoke exposure and children’s outcomes for children who have a positive probability of exposure at each timepoint for all levels of exposure. One can check whether individuals receive each level of treatment across all values of important confounders. In the case of continuous treatments, researchers can stratify the continuous treatment into multiple discrete levels and confirm the overlap of histograms or boxplots of each subject’s propensity score stratified by treatment group (Brown, 2019).
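One way to implement the stratified overlap check for a continuous treatment is sketched below. As an assumption on our part, the score summarized within each treatment quintile is the predicted exposure $E(Z \mid X)$ from a linear treatment model. The function reports five-number summaries, which are the values a boxplot displays, and the interval of scores shared by all strata. Data and names are hypothetical.

```python
import numpy as np


def overlap_by_stratum(z, score, n_strata=5):
    """Five-number summaries of `score` within quantile strata of a continuous treatment,
    plus the common-support interval shared by all strata."""
    edges = np.quantile(z, np.linspace(0, 1, n_strata + 1)[1:-1])
    stratum = np.digitize(z, edges)  # 0 .. n_strata - 1
    summaries = np.array([
        np.percentile(score[stratum == s], [0, 25, 50, 75, 100]) for s in range(n_strata)
    ])
    lower = summaries[:, 0].max()  # highest stratum minimum
    upper = summaries[:, 4].min()  # lowest stratum maximum
    share_in_support = np.mean((score >= lower) & (score <= upper))
    return summaries, (lower, upper), share_in_support


rng = np.random.default_rng(9)
n = 20_000
income = rng.normal(size=n)
prenatal = rng.binomial(1, 0.3, size=n).astype(float)
z = 0.2 - 0.25 * income + 0.4 * prenatal + rng.normal(scale=0.95, size=n)

# Score used here: predicted exposure E(Z | X) from a linear treatment model.
X1 = np.column_stack([np.ones(n), income, prenatal])
predicted = X1 @ np.linalg.lstsq(X1, z, rcond=None)[0]
summaries, support, share = overlap_by_stratum(z, predicted)
print(np.round(summaries, 2), np.round(support, 2), round(share, 3))
```

Two results would signal a potential positivity problem: a small share of children inside the common-support interval, or treatment strata whose score ranges barely overlap.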

Third, these methods assume that the model used to estimate IPT weights is correctly specified. That is, in addition to including all the appropriate confounders, one must correctly model the association between confounders and treatment. The most frequently cited example of misspecification is failure to consider nonlinear or multiplicative associations between confounders and treatment exposures (Lee, Lessler, and Stuart, 2010). For instance, if parental unemployment interacts with depression to affect children’s smoke exposure, omitting this interaction term misspecifies the model and can bias treatment effect estimates. This assumption cannot be tested directly. It is commonly judged adequate if the model results in “well-behaved” weights (with a mean close to 1.0 and moderate variance) and covariate balance across treatment levels (Austin and Stuart, 2015). A final assumption, the stable unit treatment value assumption (SUTVA), holds that a child’s outcome depends only on the exposure they experience and not on any other child’s exposure level. This assumption is not a primary concern in this study given that only one child was sampled from each family. It is of more concern in group settings (e.g., students nested within classrooms).
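A few summary statistics can check the “well-behaved weights” criterion. The function below is a sketch. Beyond the mean and variability mentioned above, it reports the minimum, the maximum, and the Kish effective sample size, $(\sum w)^2 / \sum w^2$. The effective sample size is a common summary, but it is our addition rather than one the text prescribes. The two example weight vectors are made up.

```python
import numpy as np


def weight_diagnostics(w):
    """Summaries commonly inspected to judge whether weights are well behaved."""
    w = np.asarray(w, dtype=float)
    return {
        "mean": w.mean(),                      # stabilized weights should average near 1.0
        "sd": w.std(ddof=1),                   # moderate variability is desired
        "min": w.min(),
        "max": w.max(),                        # a few extreme weights signal trouble
        "ess": w.sum() ** 2 / np.sum(w ** 2),  # Kish effective sample size
    }


well_behaved = weight_diagnostics([0.8, 0.9, 1.0, 1.1, 1.2])
extreme = weight_diagnostics([0.2, 0.2, 0.2, 0.2, 4.2])
print(well_behaved)
print(extreme)
```

Both example vectors average 1.0. The second, however, is dominated by one large weight. That weight shows up as a high maximum and as an effective sample size of about 1.4 out of 5.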

Current Study

Our substantive questions were twofold. First, as an extension of recent work that relied on covariate adjustment (Gatzke-Kopp et al., 2020), we test whether children’s average exposure to tobacco smoke across the first seven years of life (thus, a time-invariant mean exposure) is associated with parent-reported attention deficit/hyperactivity disorder (ADHD) behaviors in 1st grade. This question is amenable to a standard IPTW approach and provides the opportunity to demonstrate the steps and checks involved in applying this method. Second, we extend this question to ask whether the developmental timing of environmental smoke exposure is associated with parent-reported ADHD behaviors in 1st grade. Specifically, we test whether there is a unique and/or cumulative association between tobacco exposure during infancy-toddlerhood and early childhood with ADHD behaviors in 1st grade (see Figure 1). In this case, both our focal predictors (timing of tobacco smoke exposure) and confounder variables (factors that contribute to the level and timing of exposure, including parent-rated ADHD behaviors at 36 months) are time varying, which necessitates MSM methods. These methods facilitate tests of two questions. The first concerns timing: are there significant associations in both the infancy-toddlerhood and early childhood periods, and are they stronger in one period than the other? The second concerns duration: are there cumulative effects across early childhood?

Figure 1. Directed acyclic graph for the effect of environmental tobacco exposure on ADHD symptoms

The primary goal of this paper is to illustrate the utility of IPTW and MSM methods for answering developmental questions. As such, we make several simplifications to our definition of treatment exposure to keep concepts accessible. We organize our treatment definitions around developmental periods, first defining the treatment for IPTW as the average value of cotinine measured across the child’s first 7 years. Then, to answer more developmentally nuanced questions on effect timing and duration, we define treatment as average exposure in infancy-toddlerhood and in early childhood in the MSM. Focusing on two treatment exposures is the simplest demonstration of MSM and allows for the straightforward introduction of analytic steps and interpretation of results, but these methods can be easily adapted to as many treatment periods as necessary.

Methods

Participants & Procedures

The data were obtained from the Family Life Project (FLP), a prospective longitudinal study of families residing in six predominantly nonurban, low-income counties in eastern North Carolina and central Pennsylvania. The study employed complex sampling procedures to ensure a representative sample while oversampling low-income families and, in North Carolina, African American families (see Willoughby et al., 2013 for details). Caregivers provided written consent at the time of study participation. The University of North Carolina at Chapel Hill Institutional Review Board oversees the FLP study data (study 21-2544). A total of 1,290 families enrolled in the study. The analytic sample was restricted to children who had at least one valid cotinine value (a measure of tobacco exposure) and a valid parent-reported ADHD measure at 90 months. These restrictions reduced the sample from 1,290 to 1,053 children. Children included in the analytic sample did not differ from those excluded (n = 237) with respect to child gender and family poverty status (ps > .05). Children who were Black were more likely to be included in this analysis (44% of children in the sample were Black vs. 37% of children not in the sample, p = .04). So were those whose primary caregiver had completed higher education at the time of recruitment (15% of children in the sample vs. 10% not in the sample, p = .04).

Measures

Outcome: ADHD Symptoms.

Consistent with previous studies (Pelham, Evans, Gnagy, & Greenslade, 1992), primary caregivers rated the presence of 18 DSM-IV symptoms of ADHD on a 4-point Likert scale (0 = not at all, 1 = just a little, 2 = pretty much, 3 = very much). Ratings were collected at the 36- and 90-month home visits; most children were in 1st grade at the 90-month visit. Prior research has established that parent-reported ADHD behaviors in this sample at these ages were most parsimoniously summarized by a single factor (Willoughby, Pek, Greenberg & the Family Life Project Investigators, 2012). To preserve maximum variation, the mean of all 18 items was used as a total score (M = 1.04, range = 0-3, SD = .59, α = .93 at 36 months; M = .82, SD = .62, α = .95 at 90 months).
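The scoring described above and its internal-consistency coefficient can be computed as follows. Each child’s score is the mean of 18 items rated 0-3. The sketch uses the standard formula for Cronbach’s alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_j \sigma^2_j}{\sigma^2_{\text{total}}}\right)$. The rating data are simulated and hypothetical, and the sketch does not address missing items.

```python
import numpy as np


def mean_score_and_alpha(items):
    """Per-child mean item score and Cronbach's alpha for an (n_children, n_items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    mean_score = items.mean(axis=1)
    sum_item_var = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the summed score
    alpha = k / (k - 1) * (1 - sum_item_var / total_var)
    return mean_score, alpha


# Toy check: 18 perfectly consistent items rated 0-3 give alpha = 1.
consistent = np.repeat(np.array([[0], [1], [2], [3]]), 18, axis=1)
scores, alpha = mean_score_and_alpha(consistent)

# Simulated single-factor ratings (hypothetical): latent trait plus item noise, cut to 0-3.
rng = np.random.default_rng(0)
trait = rng.normal(size=(2_000, 1))
latent = trait + 0.6 * rng.normal(size=(2_000, 18))
ratings = np.clip(np.round(latent + 1.0), 0, 3)
scores_sim, alpha_sim = mean_score_and_alpha(ratings)
print(scores, round(alpha, 3), round(alpha_sim, 3))
```

Perfectly consistent items yield an alpha of 1. The simulated single-factor ratings yield a high alpha, as expected when items share one underlying dimension.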

Treatment: Environmental smoke exposure.

Children’s exposure to environmental tobacco smoke was quantified by assaying cotinine, the primary metabolite of nicotine, in children’s saliva using a commercially available diagnostic immunoassay (cleared under section 510(k) of the US Federal Food, Drug, and Cosmetic Act; CE marked as conforming with European health and safety requirements). Whole saliva was collected from infants and children at the 6-, 15-, 24-, 48-, and 90-month home visits. Because cotinine concentrations are approximately log-normally distributed, a log transform was applied at each time point to normalize the distributions. Logged values ranged from −2.59 to 6.75, indicating a wide range of cotinine levels. Additional details on the distribution and severity of exposure in the sample have been documented previously (Gatzke-Kopp et al., 2019).

We first consider a single summary measure of children’s early environmental smoke exposure, computed by averaging all available logged cotinine measures between 6 and 90 months; this measure serves as the treatment in our demonstration of IPTW. Although cotinine measures are relatively highly correlated across time (rs = 0.62 – 0.77), environmental smoke exposure can vary over time for multiple reasons, including a primary caregiver starting or quitting smoking, a smoker moving into or out of the household, or the family moving into a residence previously occupied by smokers. For MSM analyses, we define exposure for two key developmental periods: infancy-toddlerhood environmental smoke exposure (average exposure from 6 to 24 months) and early childhood environmental smoke exposure (average exposure from 48 to 90 months). As noted above, we define two time periods to simplify the illustration of the MSM analytic steps and the interpretation of results.
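A minimal Python sketch of how these exposure summaries could be built from visit-level data follows. It assumes raw cotinine values keyed by visit month, with None marking a missing visit, and a natural-log transform; the dictionary layout, function names, and log base are illustrative assumptions, not the FLP data structure or the authors’ code.

```python
import math

# Hypothetical grouping of visit months into the exposure windows described in the text.
WINDOWS = {
    "cumulative_6_90": (6, 15, 24, 48, 90),
    "infancy_toddlerhood_6_24": (6, 15, 24),
    "early_childhood_48_90": (48, 90),
}

def log_cotinine(raw):
    """Natural-log transform of each visit's raw (positive) cotinine value; None stays None."""
    return {month: (math.log(v) if v is not None else None) for month, v in raw.items()}

def window_means(logged):
    """Average all available (non-missing) logged values within each window."""
    means = {}
    for name, months in WINDOWS.items():
        vals = [logged[m] for m in months if logged.get(m) is not None]
        means[name] = sum(vals) / len(vals) if vals else None
    return means
```

Averaging after the log transform corresponds to taking a geometric mean on the raw scale, which keeps a single very high assay from dominating a child’s summary.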

Time-invariant covariates.

We include a wide range of time-constant and time-varying measures that predict children’s selection into the treatment. Descriptive statistics and correlations among all covariates, treatment, and outcome variables are shown in Tables 1 and 2. Time-invariant baseline measures were collected at the first home visit, when the child was 2 months old. These include child gender (dichotomous variable with 1 indicating male and 0 female), child race (dichotomous variable with 1 indicating Black and 0 White), and mother’s age at the child’s birth. Low birth weight status is included as a dichotomous variable, with 1 indicating that the infant weighed <2,500 g at birth. Pregnancy and delivery complications were assessed by having biological mothers complete the pregnancy and delivery module of the Missouri Assessment of Genetics Interview for Children (MAGIC) at the 2-month visit (Todd, Joyner, Heath, Neuman, & Reich, 2003); responses were recoded into a 3-category variable: no complications, 1-2 complications, and 3 or more complications. Maternal prenatal smoking is a dichotomous measure collected retrospectively by asking the mother during the 2-month home visit whether she smoked during pregnancy.

Table 1.

Sample characteristics

Variables	M	SD

Cotinine Exposures and Outcomes
Average cotinine (logged) 6-90 mos.	0.16	1.44
Average cotinine (logged): Infancy-toddlerhood (6-24m)	0.47	1.53
Average cotinine (logged): Early childhood (48-90m)	−0.32	1.53
Parent-reported ADHD (36 mos.)	1.04	0.59
Parent-reported ADHD (90 mos.)	0.82	0.62
Time-constant covariates
Child sex (male)	0.51	0.50
Child race (African American)	0.44	0.50
PC age	25.77	5.78
Child low birth weight	0.08	0.28
Pregnancy and delivery complications (None)	0.10	0.30
Pregnancy and delivery complications (1-2)	0.42	0.49
Pregnancy and delivery complications (3 or more)	0.48	0.50
Biological mother smoked during pregnancy	0.23	0.42
PC high school diploma	0.81	0.40
PC 4-year college degree	0.15	0.35
PC IQ estimate	92.09	14.98
Biological mother history ADHD	0.02	0.16
Biological father history ADHD	0.02	0.22
Time-varying covariates	M, pre-infancy-toddlerhood (2-6 months)	SD	M, pre-early childhood (24-48 months)	SD

Income-to-needs ratio	1.84	1.52	1.74	1.25
Parent consistently employed	0.36	0.48	0.44	0.50
Number of people in household	4.38	1.32	4.40	1.29
Change in household members	0.20	0.40	0.58	0.49
Biological dad always in household	0.60	0.49	0.49	0.50
Change in secondary caregiver	0.09	0.29	0.24	0.43
PC depression	50.06	7.75	50.70	9.24
PC hostility	50.93	8.95	50.40	10.69
Percent of time in center-based childcare	0.12	0.27	0.31	0.36


Note: N = 1053 from the first imputed dataset; PC=primary caregiver; ADHD = attention deficit/hyperactivity disorder

Table 2.

IPTW and MSM weight characteristics and unadjusted, adjusted, and IPT weighted estimates of cotinine exposure on ADHD symptoms

	Weight Characteristics

	Median	Mean	SD	Min	Max
IPTW	0.96	0.96	1.53	0.08	21.53
MSM
Infancy-toddlerhood	0.70	0.95	1.56	0.11	34.15
Early childhood	0.95	1.01	0.58	0.17	12.81
Final weight	0.65	0.96	1.67	0.06	25.83

	Estimates of overall cotinine exposure

	Coef.	SE	95% CI
Unadjusted regression	0.31	0.03	(0.25, 0.37)
Adjusted regression¹	0.21	0.04	(0.13, 0.29)
Weighted (IPTW)	0.25	0.04	(0.17, 0.33)

Estimates of cotinine exposure timing
	Infancy-toddlerhood cotinine exposure			Early childhood cotinine exposure
	Coef.	SE	95% CI	Coef.	SE	95% CI
Unadjusted regression	0.15	0.05	(0.05, 0.25)	0.18	0.05	(0.08, 0.28)
Adjusted regression¹	0.07	0.05	(−0.03, 0.17)	0.13	0.05	(0.03, 0.23)
Weighted (MSM)	0.14	0.07	(0.00, 0.28)	0.13	0.07	(0.00, 0.27)


Note: N = 1053 from the first imputed dataset; stabilized weights that exceeded the 99th percentile were truncated at the 99th percentile. Coefficients are standardized beta coefficients. Adjusted regression models include all covariates shown in Table 1 (time-constant and pre-infancy-toddlerhood covariates for IPTW; time-constant, pre-infancy-toddlerhood, and pre-early childhood covariates for MSM), as well as two-way interactions among child race, gender, prenatal smoking, and primary caregiver high school diploma.

Caregiver education was measured as the highest level of education obtained by the primary caregiver as of the 2-month home visit. Two dichotomous variables are included: whether the caregiver had a high school diploma, and whether the caregiver had a 4-year college degree. Caregiver IQ was assessed by administering the Vocabulary and Block Design subtests of the Wechsler Adult Intelligence Scale, 3^rd edition (Wechsler, 1997) at the 48-month home visit. Parental history of ADHD was assessed by asking whether the biological mother and father of the target child had a childhood history of ADHD (i.e., ‘Has a doctor or other medical professional ever told you [him/her] that you [s/he] have [has] attention-deficit disorder’). When the primary caregiver was not a biological parent of the target child, s/he answered the question with reference to the child’s biological parents.

Time-varying covariates.

Several variables were assessed at multiple time points and could change values over time. These time-varying covariates were included in two groups. The first group preceded the infancy-toddlerhood cotinine exposure period (6 to 24 months) and included data from the 2- and 6-month interviews (referred to as pre-infancy-toddlerhood). The second group preceded the early childhood cotinine exposure period (48 to 90 months) and included data from the 24-, 36-, and 48-month interviews (referred to as pre-early childhood). Variables were included as time-varying confounders if they could be affected by earlier exposure (e.g., ADHD at 36 months may be affected by infancy-toddlerhood exposure) and could confound later exposure (e.g., changes in household composition during early childhood may increase both the child’s early childhood exposure and the likelihood of ADHD behaviors in 1^st grade).

Household poverty was measured by the income-to-needs ratio (INR), calculated by summing the income of all household residents and dividing by the federal poverty threshold for the given year and family size. Information collected at the 6-month home visit was used for infancy-toddlerhood because income was not asked at the 2-month home visit. Children’s poverty levels varied throughout early childhood, with 50% of the sample living at or below 200% of the poverty threshold (INR ≤ 2.0) in both the infancy-toddlerhood and early childhood periods. The mean INR in infancy-toddlerhood was 1.89, and the mean INR in early childhood was 1.77.

Parental employment was assessed by asking whether the child’s primary caregiver was employed at the time of the interview. A dichotomous variable was constructed for each period, with 1 indicating that the caregiver was consistently employed during the period and 0 indicating that the caregiver was unemployed at any point during the period.

The total number of people in the household was measured at each home visit, and values were averaged across visits within each period. Change in household members was a dichotomous variable, with 1 indicating that the number of household members differed across visits and 0 indicating that it was constant across visits. Biological father in household was a dichotomous variable, with 1 indicating that the biological father consistently lived in the household during the period and 0 otherwise. Change in primary caregiver was a dichotomous variable, with 1 indicating that the child experienced a change in caregiver during the period and 0 otherwise.
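The period-level household and employment indicators described above reduce to simple rules over the visits within a period. The Python sketch below assumes each child’s visit-level values for one period have already been collected into lists with missing visits removed, and it operationalizes “consistently” as “at every visit in the period”; the function names are hypothetical.

```python
def consistently_employed(employed_at_visits):
    """1 if the caregiver reported being employed at every visit in the period, else 0."""
    return int(all(employed_at_visits))

def mean_household_size(sizes_at_visits):
    """Average number of people in the household across the period's visits."""
    return sum(sizes_at_visits) / len(sizes_at_visits)

def household_size_changed(sizes_at_visits):
    """1 if the number of household members differed across visits, else 0."""
    return int(len(set(sizes_at_visits)) > 1)

def bio_father_always_present(present_at_visits):
    """1 if the biological father lived in the household at every visit, else 0."""
    return int(all(present_at_visits))
```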

Caregiver hostility and caregiver depression were assessed using items from the Brief Symptom Inventory, including a 6-item subscale assessing depression and a 5-item subscale assessing hostility (Derogatis, 2000). Values were averaged from the 2- and 6-month interviews for the infancy-toddlerhood period and drawn from the 24-month home visit for the early childhood period.

At each visit, primary caregivers were asked about the location of out-of-home care the child attended. Consistent with previous work with this sample (Berry et al., 2016), center-based childcare was constructed as the proportion of time the child attended center-based care during each period.

Missing data.

Twelve variables had missing values, ranging from <1% missing (maternal prenatal smoking, child low birth weight status, pregnancy and delivery complications, biological mother history of ADHD, income-to-needs ratio in early childhood) to 11% missing (income-to-needs ratio in infancy-toddlerhood). Missing data were handled using multiple imputation with Stata’s mi command (White, Royston, and Wood, 2011). We imputed 10 complete datasets, with the outcome and all covariates included in the imputation model. IPTW and MSM analyses were conducted on each complete dataset using the methods described in detail below. For simplicity, results are presented from the first imputed dataset; pooled estimates were nearly identical.
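Pooling a coefficient across the 10 imputed datasets is conventionally done with Rubin’s rules, which Stata’s `mi estimate` applies. The Python sketch below implements those rules for a single coefficient as an illustration; it is not the authors’ code, and it omits the degrees-of-freedom adjustment used for t-based intervals.

```python
import math
import statistics

def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its total standard error."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = statistics.fmean(estimates)                         # average point estimate
    within = statistics.fmean([se ** 2 for se in std_errors])   # mean within-imputation variance
    between = statistics.variance(estimates)                    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between                      # inflate for finite m
    return q_bar, math.sqrt(total)
```

When the imputed datasets yield nearly identical estimates, as reported here, the between-imputation term is small and the pooled result is close to any single dataset’s result.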

Analytic Strategy

To maximize accessibility, we provide both a general description of the steps that are used to conduct IPTW and MSM models, as well as a specific instantiation of these steps in this study.

Inverse Probability Treatment Weights (IPTW).

Three steps are involved in IPTW analysis. In the first step, a propensity score is estimated for each observation, and its inverse is calculated to form a weight. The propensity score, in this instance, represents the predicted probability (for a continuous treatment, the conditional probability density) that an individual experienced their actual treatment given their observed confounders (covariates). For a continuous treatment, the propensity score can be represented as

φ (E (Z | X)) = \frac{1}{\sqrt{2 π {\hat{σ}}^{2}}} e^{- \frac{{(z - \hat{z})}^{2}}{2 {\hat{σ}}^{2}}}

where $φ$ is the normal probability density function, Z is the continuous treatment, and X are pre-treatment covariates (Robins, Hernan, and Brumback, 2000; Thoemmes & Ong, 2016). In practice, this is accomplished by fitting a linear regression model of Z on X, obtaining the predicted values $\hat{z}$ and the residual standard error $\hat{σ}$, and inserting these values into the normal probability density function (elaborated below). Inverting this density forms a weight w, where $w = 1 / φ (E (Z | X))$. The stabilized weight (sw) replaces the numerator of 1 with the marginal density of the treatment, evaluated at the observed treatment value using the mean and standard deviation of Z, such that $s w = φ (E (Z)) / φ (E (Z | X))$. This weight is then used to estimate the average treatment effect (ATE).
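Writing out both normal densities shows what the stabilized weight measures. Assuming the numerator comes from an intercept-only model with mean $\hat{μ}$ and residual standard error $\hat{σ}_{0}$, and the denominator from the covariate model with predicted value $\hat{z}_{i}$ and residual standard error $\hat{σ}$ (notation introduced here for illustration), the stabilized weight for child i simplifies to

s w_{i} = \frac{φ (z_{i}; \hat{μ}, \hat{σ}_{0})}{φ (z_{i}; \hat{z}_{i}, \hat{σ})} = \frac{\hat{σ}}{\hat{σ}_{0}} \exp \left( \frac{(z_{i} - \hat{z}_{i})^{2}}{2 \hat{σ}^{2}} - \frac{(z_{i} - \hat{μ})^{2}}{2 \hat{σ}_{0}^{2}} \right)

Children whose exposure is well predicted by their covariates but far from the sample mean (over-represented covariate and exposure combinations) are down-weighted, whereas children whose exposure is surprising given their covariates are up-weighted; this is how the weighting weakens the association between covariates and treatment.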

To be concrete, our first research question examined the extent to which a child’s cumulative environmental smoke exposure (average logged cotinine between 6 and 90 months) was associated with parent-reported ADHD in 1^st grade. In the first step, stabilized weights are constructed by estimating two linear regression models, both of which use cumulative environmental smoke exposure as the outcome. The first model, which supplies the numerator of the stabilized weight, includes no covariates (intercept only, so its predicted value is the mean of the continuous treatment). The second model, which supplies the denominator, includes all pre-treatment (baseline and time-invariant) covariates, along with several higher-order and interaction terms. For each child, we evaluate the normal probability density function at the child’s observed treatment value, using the child’s predicted value from the model as the mean and the model’s residual standard error as the standard deviation. These steps yield numerator and denominator values for each child, whose ratio is the child-specific stabilized weight.
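The first step can be sketched in plain Python. The sketch assumes ordinary least squares for both models (as in the text), a residual standard error computed as sqrt(RSS / (n - p)), and covariate rows that already contain any squared or interaction terms; the function names are hypothetical, and the small solver is a teaching aid rather than a replacement for Stata’s `regress`.

```python
import math

def ols(y, X):
    """Least-squares fit of y on the rows of X (each row must start with 1.0 for the intercept).

    Returns (coefficients, fitted values, residual standard error)."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, fitted, math.sqrt(rss / (n - p))

def normal_pdf(x, mean, sd):
    """Normal probability density evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / math.sqrt(2 * math.pi * sd ** 2)

def stabilized_weights(z, covariate_rows):
    """Stabilized IPT weights for a continuous treatment z.

    covariate_rows: one list of pre-treatment covariates per child (no intercept column)."""
    _, num_fit, num_sd = ols(z, [[1.0] for _ in z])                             # intercept-only model
    _, den_fit, den_sd = ols(z, [[1.0] + list(row) for row in covariate_rows])  # confounder model
    return [normal_pdf(zi, m0, num_sd) / normal_pdf(zi, m1, den_sd)
            for zi, m0, m1 in zip(z, num_fit, den_fit)]
```

Keeping the numerator and denominator on the same normal-density scale is what makes the ratio a stabilized weight; dropping the numerator (setting it to 1) would give the unstabilized weight w, which typically has far larger variance.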

The second step involves assessing the characteristics of the weights and evaluating whether treatment assignment (environmental smoke exposure) and pre-treatment covariates are independent (unconfounded) in the weighted sample. During this step, it is not uncommon to run multiple propensity score models with various specifications to achieve balance (Cole & Hernan, 2008). Recent advances such as boosting algorithms (Zhu, Coffman, and Ghosh, 2015) can help reduce the number of iterations required in this step. Iterating on the propensity score model during weight construction is acceptable, and expected, as long as the weight construction step is kept separate from the outcome model step. We employ correlation-based balance diagnostics to show that the weighted correlations between each covariate and the exposure variable are close to zero (Austin, 2019). Zhu et al. (2015) suggest that the average absolute weighted correlation should be less than 0.1. Box-and-whisker plots such as those shown in Figure 1 visually demonstrate the reduction in correlations between pre-treatment covariates and treatment exposure before and after weighting and provide summary statistics of the correlations.
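The correlation-based balance check can be sketched as follows. This Python sketch computes weighted Pearson correlations between each covariate and the treatment and summarizes them as absolute values, which is our reading of the Zhu et al. (2015) criterion; passing unit weights gives the unweighted (pre-weighting) correlations. The function names are hypothetical.

```python
import math

def weighted_corr(x, z, w):
    """Weighted Pearson correlation between a covariate x and the treatment z."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    mz = sum(wi * zi for wi, zi in zip(w, z)) / total
    cov = sum(wi * (xi - mx) * (zi - mz) for wi, xi, zi in zip(w, x, z))
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    var_z = sum(wi * (zi - mz) ** 2 for wi, zi in zip(w, z))
    return cov / math.sqrt(var_x * var_z)

def balance_summary(covariates, z, w, threshold=0.1):
    """covariates: dict mapping covariate name -> values (one per child).

    Returns absolute weighted correlations, their mean, and names above the threshold."""
    abs_r = {name: abs(weighted_corr(x, z, w)) for name, x in covariates.items()}
    mean_abs_r = sum(abs_r.values()) / len(abs_r)
    above = sorted(name for name, r in abs_r.items() if r > threshold)
    return abs_r, mean_abs_r, above
```

Running the summary once with unit weights and once with the stabilized weights gives the before and after distributions that a box-and-whisker plot displays.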

The third step involves estimating a weighted outcome model (using the stabilized weights) with robust standard errors to account for the uncertainty in the estimated weights (Robins et al., 2000). In our example, we obtain the effect of average environmental smoke exposure between 6 and 90 months on parent-reported ADHD in 1^st grade by fitting the following weighted regression model:

E [Y] = β_{0} + β_{1} z

where Y is the parent-reported ADHD outcome and z is environmental smoke exposure. To the extent that the assumptions have been met, the estimated regression coefficient $β_{1}$ represents the average treatment effect of experiencing a one unit increase in environmental tobacco exposure (average logged cotinine) on ADHD behaviors in 1^st grade. Said another way, if all children in the population were to be subjected to a one unit increase in average logged cotinine, relative to what they would have otherwise experienced, the regression coefficient reflects the estimated effect of that increase on ADHD behaviors. Additional effects can be estimated, depending on the causal effects of interest.
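The third step is a weighted least-squares regression with sandwich (robust) standard errors. The Python sketch below handles the single-treatment model E[Y] = β0 + β1 z; it assumes an HC1-style small-sample factor n/(n - k), which may differ slightly from what a given software package reports. The function names are hypothetical.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def weighted_ols_robust(y, z, w):
    """Fit E[Y] = b0 + b1*z by weighted least squares; return (coefficients, robust SEs)."""
    n, k = len(y), 2
    X = [[1.0, zi] for zi in z]
    xtwx = [[sum(w[r] * X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xtwy = [sum(w[r] * X[r][i] * y[r] for r in range(n)) for i in range(k)]
    bread = inv2(xtwx)
    b = [sum(bread[i][j] * xtwy[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - (b[0] + b[1] * z[r]) for r in range(n)]
    # "Meat" of the sandwich: sum of (w_i * e_i)^2 * x_i x_i'.
    meat = [[sum((w[r] * resid[r]) ** 2 * X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
            for i in range(k)]
    v = matmul(matmul(bread, meat), bread)
    scale = n / (n - k)  # HC1-style small-sample correction
    return b, [math.sqrt(scale * v[i][i]) for i in range(k)]
```

Because the sandwich estimator is invariant to multiplying all weights by a constant, the standard errors do not depend on whether the weights are normalized to sum to n.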

Marginal Structural Models.

Whereas the first research question considers the cumulative impact of environmental tobacco exposure on ADHD behaviors, the second research question considers whether the impact of environmental tobacco exposure on ADHD behaviors differs as a function of the developmental timing of exposure (i.e., infancy-toddlerhood, defined as average logged cotinine between 6 and 24 months; and early childhood, defined as average logged cotinine between 48 and 90 months). This question recasts environmental tobacco exposure as a time-varying treatment. Both time-invariant (e.g., prenatal exposure to smoking) and time-varying covariates (e.g., household composition, caregiver depression, attendance in center-based childcare, ADHD at 36 months) serve as potential confounders. Although the IPTW methods used for the first research question offer a simplified approach to estimating the impact of average childhood environmental tobacco exposure, MSMs provide a better method for estimating the overall effect (by addressing time-varying confounding) and permit more nuanced developmental questions related to timing. Standard covariate adjustment cannot properly handle time-varying confounders that are themselves affected by earlier exposure: adjusting for them removes part of the effect of earlier exposure, whereas omitting them leaves later exposure confounded. In MSMs, the use of multiple inverse probability of treatment weights removes this source of bias for observed confounders (Robins et al., 2000). Specifically, IPT weights are constructed separately for infancy-toddlerhood and early childhood exposure, and their product forms the final weight. As in the single-exposure setting, this final weight creates a pseudo-population in which treatment status at every time point (i.e., environmental tobacco exposure in the infancy-toddlerhood and early childhood periods) is unconfounded by observed variables (Hernan & Robins, 2006).

The construction of weights follows the same steps described above. The first weight is ${s w}_{1} = \frac{φ (E (Z_{1} = z_{1i}))}{φ (E (Z_{1} = z_{1i} | X_{0}))}$, where Z₁ indicates tobacco exposure in infancy-toddlerhood and X₀ indicates all time-invariant confounders and the time-varying confounders measured prior to infancy-toddlerhood (see Table 1). The second weight is ${s w}_{2} = \frac{φ (E (Z_{2} = z_{2i} | Z_{1} = z_{1i}))}{φ (E (Z_{2} = z_{2i} | Z_{1} = z_{1i}, X_{0}, X_{1}, Y_{1}))}$, where Z₂ indicates tobacco exposure in early childhood, X₁ indicates the time-varying confounders measured prior to early childhood, and Y₁ is ADHD at 36 months; that is, early childhood exposure is modeled conditional on infancy-toddlerhood exposure, baseline covariates, time-varying covariates from both pre-exposure windows, and ADHD at 36 months. All variables shown in Table 1, including additional squared and interaction terms, were used to construct IPT weights. The final stabilized weight (${s w}_{f}$) is the product of sw₁ and sw₂ (${s w}_{f} = {s w}_{1} * {s w}_{2}$). Although our study is simplified to two time periods, this framework generalizes to any number of treatment or exposure periods.
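Once the numerator and denominator models for each period have been fitted (by OLS, as in the single-treatment case), assembling the MSM weight is a per-child product. The Python sketch below takes each period’s fitted values as inputs; the `PeriodFit` container and its field names are hypothetical, and the same function covers any number of periods.

```python
import math
from dataclasses import dataclass

@dataclass
class PeriodFit:
    """Hypothetical per-child, per-period container for fitted treatment models."""
    z: float         # observed exposure in this period (e.g., average logged cotinine)
    num_mean: float  # numerator-model prediction (prior exposure only; intercept-only in period 1)
    num_sd: float    # residual standard error of the numerator model
    den_mean: float  # denominator-model prediction (prior exposure plus confounders)
    den_sd: float    # residual standard error of the denominator model

def normal_pdf(x, mean, sd):
    """Normal probability density evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / math.sqrt(2 * math.pi * sd ** 2)

def final_stabilized_weight(periods):
    """Per-period stabilized weights and their product (the final MSM weight)."""
    per_period = [normal_pdf(p.z, p.num_mean, p.num_sd) / normal_pdf(p.z, p.den_mean, p.den_sd)
                  for p in periods]
    return per_period, math.prod(per_period)
```

For this study the list would hold two entries per child, infancy-toddlerhood followed by early childhood; adding exposure periods only lengthens the list.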

Our final analytic model is again a weighted linear regression model of early and later cotinine exposure on parent-reported ADHD in 1^st grade. The following weighted regression model allows us to define specific causal effects for different treatment histories,

E [Y_{z_{1} z_{2}}] = β_{0} + β_{1} z_{1} + β_{2} z_{2}

where $z_{1}$ is tobacco exposure in infancy-toddlerhood, $z_{2}$ is exposure in early childhood, and $Y_{z_{1} z_{2}}$ is the potential ADHD outcome in 1^st grade under the treatment history ($z_{1}$, $z_{2}$). We interpret the early exposure coefficient ($β_{1}$) as the expected change in parent-reported ADHD in 1^st grade if early (logged) cotinine exposure were increased by one unit. Said another way, this is the direct causal effect of a one unit increase in early exposure holding later exposure constant. The later exposure coefficient ($β_{2}$) is the expected change in ADHD if later cotinine exposure were increased by one unit holding early exposure constant. The sum of both coefficients ($β_{1} + β_{2}$) is the total expected change if exposure in both periods were one unit higher than it would otherwise have been. Stata code that implements these steps is provided in the Appendix (see Thoemmes and Ong 2016 for code in R and SPSS).
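The total effect β1 + β2 needs a standard error that accounts for the covariance between the two coefficients, which is what Stata’s `lincom` computes after a regression. A minimal Python sketch follows, assuming a normal-approximation 95% interval; results may differ slightly from software that uses a t reference distribution. The function name is hypothetical.

```python
import math

def total_effect(b1, b2, var_b1, var_b2, cov_b1_b2, z_crit=1.96):
    """Sum of two coefficients, its standard error, and a normal-approximation CI.

    Uses Var(b1 + b2) = Var(b1) + Var(b2) + 2 * Cov(b1, b2)."""
    est = b1 + b2
    se = math.sqrt(var_b1 + var_b2 + 2 * cov_b1_b2)
    return est, se, (est - z_crit * se, est + z_crit * se)
```

With the point estimates reported in Table 2, the total effect is 0.14 + 0.13 = 0.27; its standard error additionally requires the estimated covariance of the two coefficients from the weighted regression.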

Results

Table 1 displays the characteristics of the children in the analytic sample. Building on our previous work, we identified confounders as variables that were associated both with the likelihood that a child would be exposed to environmental tobacco smoke and with ADHD behaviors. Correlations between average tobacco smoke exposure and covariates are provided in Supplementary Table 1. Mean salivary cotinine between 6 and 90 months was negatively correlated with primary caregiver age, income-to-needs ratio, biological father in the household, primary caregiver education, and IQ (rs = −0.32 to −0.46), and more weakly negatively correlated with time spent in center-based childcare and consistent primary caregiver employment (rs = −0.14 to −0.16). Average cotinine exposure was positively correlated with maternal prenatal smoking (r = 0.49) and changes in household members (r = 0.22). These factors were conceptualized as indexing the selection mechanism that ‘assigned’ children to varying levels of environmental tobacco exposure.

Inverse Probability Treatment Weights

Weight Construction and Balance Diagnostics.

The IPT weights were estimated from regression models that included all time-invariant covariates and the time-varying covariates measured before infancy-toddlerhood, as well as multiple interaction terms among gender, race, parental education, and prenatal smoke exposure (see Table 1). Weights were stable across multiple regression specifications. Table 2 (first row of top panel) presents descriptive statistics for the final stabilized weights. Ideally, weights should have a mean close to 1.0 and modest variance, both of which reduce concerns about model misspecification (Cole & Hernan, 2008; Hernan & Robins, 2006). A kernel density plot shows that the weights are right-skewed (a long upper tail) but centered near 1.0 (M = 0.96, SD = 1.53). The maximum weight is relatively large (21.53); although there is no formal guidance regarding excessively large weights, researchers caution that large weights can have disproportionate influence on final estimates and increase the variability of the estimated treatment effect (Austin & Stuart, 2015). To test the effect of extreme weights on our results, we ran the final models with weights bottom- and top-coded at the 1^st and 99^th percentiles, respectively.
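Bottom- and top-coding weights at the 1^st and 99^th percentiles can be sketched as below. Percentile definitions differ across software; this Python sketch assumes linear interpolation between order statistics, so its cut points may differ slightly from Stata’s. The function names are hypothetical.

```python
def percentile(sorted_values, q):
    """q-th percentile (0-100) of a sorted list, by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (pos - lower)

def truncate_weights(weights, lower_pct=1, upper_pct=99):
    """Bottom- and top-code weights at the given percentiles, preserving the original order."""
    ordered = sorted(weights)
    floor, ceiling = percentile(ordered, lower_pct), percentile(ordered, upper_pct)
    return [min(max(w, floor), ceiling) for w in weights]
```

Rerunning the weighted outcome model with the truncated weights and comparing estimates gives the sensitivity check described in the text.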

We assessed the balance of all covariates using correlation-based diagnostics (Austin, 2019). Figure 2 summarizes the unweighted and weighted Pearson correlations (r) between each included covariate and average cotinine exposure. Prior to weighting, the median correlation was r = 0.16 (min = 0, 25^th percentile = 0.08, 75^th percentile = 0.34, max = 0.49). After weighting, the median correlation was r = 0.04 (min = 0, 25^th percentile = 0.02, 75^th percentile = 0.07, max = 0.14). The horizontal line at 0.1 in Figure 2 indicates the threshold that denotes minimal confounding (Zhu et al., 2015). All but four weighted correlations (primary caregiver education ≥ college degree, IQ, and depression; biological father history of ADHD) fall below this threshold (with max r = 0.125), indicating relatively low levels of remaining confounding.

Model Results.

The middle panel of Table 2 provides the IPTW estimate of the association between environmental tobacco exposure and children’s parent-reported ADHD in 1^st grade, as well as the corresponding estimates from unadjusted and covariate-adjusted regression models. The unadjusted estimate reflects the association between cotinine and ADHD before accounting for non-random selection into different levels of environmental smoke exposure. This association is positive and significant, β = 0.31, p < 0.001. Controlling for all pretreatment covariates in a standard multiple regression reduces the estimated association by about one-third, β = 0.21, p < 0.001. The final row of the middle panel presents the IPTW estimate using the stabilized weights, which falls between the unadjusted and covariate-adjusted results, β = .25, p < 0.001.

In general, results from traditional regression adjustment are often similar to IPT weighting results. This is reassuring in the sense that model choice does not change the direction or magnitude of the conclusion. However, as noted by Robins, Hernan, and Brumback (2000), weighting approaches allow researchers to adjust appropriately for confounding when time-varying treatments and time-varying confounders are present, in ways that standard regression methods do not, as described next.

Marginal Structural Models

Weight Construction and Balance Diagnostics.

For the MSM, we now have two periods of environmental smoke exposure: infancy-toddlerhood (6-24 months) and early childhood (48-90 months). We model selection into these two exposure periods through two separate propensity score models, which yield two stabilized inverse probability of treatment weights. Unlike the single-exposure IPTW analysis, which adjusted only for covariates measured before infancy-toddlerhood, the MSM weights also incorporate time-varying confounders measured before early childhood. These include household size and structure changes, family income, caregiver changes, caregiver employment, caregiver depression and hostility, childcare attendance, prior cotinine exposure, and parent-reported ADHD at 36 months. These variables are important to include because they are likely both affected by prior cotinine exposure (e.g., early cotinine exposure may increase parent-reported ADHD at 36 months) and, in turn, affect later cotinine exposure and the ultimate outcome of parent-reported ADHD in 1^st grade. By including prior levels of treatment and outcome in the balancing model, we attempt to mimic random assignment within each exposure period, akin to repeated random assignment.

Table 2 (top panel, rows 2-4) shows descriptive statistics for the weights for each time period and for the final weight. As expected, the means of the period-specific weights and the final weight are close to 1.0. The weights are right-skewed (a long upper tail) but centered near 1.0 (final weight SD = 1.67).

Model Results.

When we account for observed selection into different cotinine exposures at both time points, the effect of experiencing one SD higher environmental tobacco exposure during infancy-toddlerhood on 1^st grade ADHD was significant (β = 0.14, p = 0.03), whereas the effect of experiencing one SD higher environmental tobacco exposure during early childhood, holding earlier exposure constant, was not (β = 0.13, p = 0.07) (Table 2, bottom panel, row 3). Although only the infancy-toddlerhood estimate reached conventional significance, the direct effect for early childhood was comparable in magnitude. Taken together, the total (cumulative) association of experiencing one SD higher tobacco exposure in both infancy-toddlerhood and early childhood (relative to what exposure would otherwise have been in both periods) with 1^st grade ADHD behaviors was significant, β = 0.27, p < 0.001.

By comparison, the results from the unadjusted regression model (Table 2, bottom panel, row 1) are positive and significant at both time points, with smaller p-values than the weighted MSM model. In contrast, the adjusted regression model (traditional covariate adjustment) indicated a nonsignificant effect during the infancy-toddlerhood period and a significant effect during the early childhood period (Table 2, bottom panel, row 2). In these data, the largest discrepancy between the traditional covariate-adjusted and weighted (MSM) results is the magnitude of the infancy-toddlerhood effect (βs = .07 vs. .14).

Discussion

The present study used the example of environmental tobacco exposure and children’s development of ADHD behaviors to demonstrate the application of inverse probability of treatment weighting and marginal structural models for testing developmental questions concerned with the timing and dosage of children’s experiences. Results indicate that environmental tobacco exposure across the first seven years of life is associated with parent-reported ADHD behaviors in 1^st grade. Inferences regarding the time-varying associations of environmental tobacco exposure with ADHD behaviors in 1^st grade were mixed. Although there was evidence that infancy-toddlerhood exposure was associated with increased ADHD behaviors, the association for early childhood exposure did not meet conventional thresholds of statistical significance (p = .07). However, the cumulative association of tobacco exposure across both periods with ADHD behaviors in 1^st grade was significant, and the estimated effects were comparable in magnitude across the infancy-toddlerhood and early childhood periods.

The results of IPTW models that focused on the aggregate effects of environmental tobacco exposure on ADHD behaviors in 1^st grade are consistent with and extend our recent findings (Gatzke-Kopp, 2020). Here, we focused on time-varying measures of environmental tobacco exposure and considered parent- (not teacher-) reported ADHD at two time points (the age 3 and 1^st grade home visits). When substantive questions involve testing the impact of a single focal experience (treatment) on some outcome, IPTW, propensity score models, and traditional regression-based approaches will frequently yield similar conclusions (Shah, Laupacis, Hux, & Austin, 2005). Although it is reasonable to wonder whether the extra work is necessary in these situations, there are two reasons to consider an IPTW approach. First, the potential outcomes framework provides a helpful lens for selecting covariates. Reconceptualizing focal predictors as treatments encourages greater consideration of the causal pathways through which the focal experience exerts its effects, as well as of which variables index the selection mechanism that ‘assigns’ individuals to varying levels of that treatment. The goal of covariate selection is to remove associations between covariates and treatment so as to approximate a randomized experiment. This practice differs from current conventions in developmental psychology, in which covariates are often selected to provide general “control” for potential confounding (e.g., adjusting for SES) with less consideration of a specific causal model. Second, IPTW and related methods provide a framework for determining how well covariate adjustment ‘worked’. The ability to visualize and test covariate balance as a function of treatment level is not easily accomplished, and rarely considered, in regression-based approaches. The use of IPTW forces the researcher to be specific about the counterfactual being tested (e.g., the treatment vs. control condition) by explicitly aligning what one is conceptually trying to test with the actual analysis. Furthermore, it requires the separation of two distinct phases of the research process: setting up the “experiment” (i.e., modeling selection into treatment and checking balance) before estimating the treatment effects (i.e., running the weighted outcome model).

The extension of inverse probability of treatment weighting for single treatment exposures to marginal structural models for multiple time-varying treatment exposures has the potential to sharpen developmental questions while improving the rigor with which those questions are tested. Time-varying treatments correspond to long-standing developmental questions, including how the timing, duration, and sequencing of a child’s experiences affect specific outcomes. The potential outcomes framework, including the methods that we describe here, has been applied more frequently in epidemiology, medicine, and sociology than in psychology. However, these methods have the potential to provide stronger evidence to inform developmental theory and policy-relevant questions.

Despite their potential value for developmental research, these methods are not a panacea. We highlight three limitations of the current study for researchers to consider in their own applications. First, for didactic reasons, the study simplified treatment exposures by averaging the exposures experienced within each developmental period. Although this allowed for a simple illustrative analysis and interpretation of results, it did not incorporate each exposure measure individually and thus may not fully account for time-varying confounding. Marginal structural models can accommodate as many treatment exposures as are available in the data. However, numerous exposure periods can become unwieldy to manage, and ensuring balance across treatment histories can become complicated. Whenever possible, researchers should rely on theory when deciding on the number and duration of the exposure periods that define treatment histories. A second limitation is that linear regression was used to estimate the propensity scores in the IPTW and MSM models. We used this approach because it is easy to implement and familiar to developmental researchers. Alternatives that may improve propensity score estimation for continuous treatment exposures include generalized boosted models (GBM; Zhu et al., 2015) and the covariate balancing generalized propensity score (CBGPS; Fong, Hazlett, & Imai, 2018); these authors have provided R code for each approach. A third limitation of this study (and indeed of all observational studies) is that the validity of causal estimates depends on how closely the assumptions are met. For example, to the extent that relevant confounders remain unmeasured (which is both unknowable and likely), the ability to draw causal inferences is undermined. Sensitivity analyses have been proposed to gauge the robustness of findings to unmeasured confounding, such as the E-value (VanderWeele & Ding, 2017), which estimates the minimum strength of association that an unmeasured confounder would need to have with both the treatment and the outcome to explain away the observed effect. Decisions about which confounders to select are predicated on the veracity of the theoretical models that inform treatment selection. In our view, even when some assumptions are not met (e.g., confounders are omitted), IPTW and MSM approaches remain useful because they force explicit thinking about the conceptual model being tested and provide a strategy for evaluating whether covariate balance was achieved.
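The E-value mentioned above has a closed form for a risk ratio RR ≥ 1: E = RR + sqrt(RR × (RR − 1)), with a protective RR inverted before applying the formula (VanderWeele & Ding, 2017). Because the outcomes here are continuous, one approximate route, also described by VanderWeele and Ding, converts a standardized mean difference d into RR ≈ exp(0.91 × d). The Python sketch below assumes that approximation; how d is defined for a continuous exposure (for example, the effect of a one-SD increase in exposure divided by the outcome SD) is an analyst’s choice, and the function names are hypothetical.

```python
import math


def e_value_rr(rr):
    """E-value for a point estimate on the risk-ratio scale (VanderWeele & Ding, 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_from_d(d):
    """Approximate E-value for a standardized mean difference d,
    using the conversion RR ~ exp(0.91 * d)."""
    return e_value_rr(math.exp(0.91 * abs(d)))


print(round(e_value_rr(2.0), 3))  # → 3.414
print(round(e_value_from_d(0.2), 3))
```

The same functions can be applied to the confidence limit closest to the null to obtain the E-value for the interval rather than the point estimate.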

Most developmental psychologists’ graduate training emphasizes research design as the primary strategy for making strong inferences; exposure to the potential outcomes framework has been less common. We hope this study raises awareness of, and interest in, using this framework to improve the rigor of developmental research.

Supplementary Material

supplemental material

NIHMS2055842-supplement-supplemental_material.docx (29.6 KB, docx)

Public Significance Statement.

Covariate adjustment approaches are widely used by developmental psychologists to infer associations between focal predictors and child outcomes. Inverse probability of treatment weighting and marginal structural models are two alternative approaches, more widely used in public health and medicine, that offer a more sophisticated way to address developmental questions. We provide a didactic orientation to these approaches through an applied example.

Acknowledgements

Support for data collection was provided by the Eunice Kennedy Shriver National Institute of Child Health and Human Development grants P01HD039667 and P01HD039667 (Vernon-Feagans, PI). Support for this data analysis and manuscript writing was provided by the National Institutes of Health, Office of the Director grants UG3OD023332 and UH3OD023332 (Blair, PI). Siri Warkentien is now with the Administration for Children & Families, Office of Planning, Research, and Evaluation.

Appendix A. Sample Stata Code for Continuous Treatment

/*******************************************************************************/
* Name: CotDBD_6_dta1_iptwmsm.do
* Purpose:  IPTW w/ continuous treatment variable (cotinine exposure);
*       Outcome: parent-reported ADHD (pADHDTotM);
*       Covariates: race; male; INR; PC age; PDICs; LBW; parent IQ;
*       prenatal smoking, parent education; Hx ADHD (parent); PC hostility,
*       depression; biodad in HH; PC emp; HH member, HH changes, PC change;
*       center-based care; # moves;
/*******************************************************************************/

use "$source/CotDBD_6_dta1.dta", clear
sum

*******************************************************************************
* IPTW
*******************************************************************************

* Tx: continuous cotinine exposure (measured as log average from 6 - 90 months)

* (0) Check pre-weighted differences
foreach y in tcmale tcblac2 pmage2 pmage2sq lbw mslbw pdics3c1 pdics3c2 pdics3c3 ///
          mspdics smokpreg mssmokpreg pmhs2 pm4col2 pciq mspciq momhxadhd ///
          msmomhxadhd dadhxadhd msdadhxadhd inr6tc msinr6tc pempall2_6 numhh2_6 ///
          hhchg2_6 biodad2_6 pmsc2dif6 bsideppc2_6 bsihospc2_6 cnt2_6 nummov2_6 {
   regress avgcotl `y'
}

* (1) Create weights

* NOTE: use stabilized weights, which divide the marginal density of the
*      observed treatment (i.e., model with no covariates) by the conditional
*      density of the observed treatment given the covariates

* A) Calculate numerator
regress avgcotl
predict num_predict
sum num_predict
predict num_resid, residuals
sum num_resid
gen num_sd=r(sd)
disp num_sd
gen num1=normalden(avgcotl, num_predict, num_sd)

* B) Calculate denominator
regress avgcotl tcmale tcblac2 pmage2 pmage2sq lbw mslbw pdics3c2 pdics3c3 ///
      mspdics smokpreg mssmokpreg pmhs2 pm4col2 pciq mspciq momhxadhd ///
      msmomhxadhd dadhxadhd msdadhxadhd inr6tc inr6tcsq msinr6tc pempall2_6 ///
      numhh2_6 hhchg2_6 biodad2_6 pmsc2dif6 bsideppc2_6 bsihospc2_6 cnt2_6 ///
      nummov2_6 smkpgtcmale smkpgtcblac2 smkpgpmhs2 maletcblac2 malepmhs2 blacpmhs2
predict den_predict
sum den_predict
predict den_resid, residuals
sum den_resid
gen den_sd=r(sd)
disp den_sd
gen den1=normalden(avgcotl, den_predict, den_sd)

* C) Calculate weights
gen w1=num1/den1
sum w1

* (2) Assess balance
* NOTE: Example below uses correlation-based diagnostics (Austin, 2019)
foreach y in tcmale tcblac2 pmage2 pmage2sq lbw mslbw pdics3c2 pdics3c3 mspdics ///
            smokpreg mssmokpreg pmhs2 pm4col2 pciq mspciq momhxadhd ///
            msmomhxadhd dadhxadhd msdadhxadhd ///
            inr6tc msinr6tc pempall2_6 numhh2_6 hhchg2_6 biodad2_6 ///
            pmsc2dif6 bsideppc2_6 bsihospc2_6 cnt2_6 ///
            nummov2_6 smkpgtcmale smkpgtcblac2 smkpgpmhs2 ///
            maletcblac2 malepmhs2 blacpmhs2 {
   pwcorr avgcotl `y', sig
   pwcorr avgcotl `y' [aweight=w1], sig
}

* (3) Run Outcome Models

* Unadjusted regression
regress p90adhdtotm avgcotl, beta

* Covariate-adjusted regression
regress p90adhdtotm avgcotl tcmale tcblac2 pmage2 pmage2sq lbw mslbw pdics3c2 ///
       pdics3c3 mspdics smokpreg mssmokpreg pmhs2 pm4col2 pciq mspciq ///
       momhxadhd msmomhxadhd dadhxadhd msdadhxadhd ///
       inr6tc inr6tcsq msinr6tc pempall2_6 numhh2_6 hhchg2_6 biodad2_6 ///
       pmsc2dif6 bsideppc2_6 bsihospc2_6 cnt2_6 nummov2_6 ///
       smkpgtcmale smkpgtcblac2 smkpgpmhs2 maletcblac2 malepmhs2 blacpmhs2, beta

* IPTW
regress p90adhdtotm avgcotl [aweight=w1], vce(robust) beta

* Checks on extreme weights: re-estimate excluding cases with w1 > 10, then w1 > 5
regress p90adhdtotm avgcotl [pweight=w1] if w1<=10, vce(robust) beta
regress p90adhdtotm avgcotl [pweight=w1] if w1<=5, vce(robust) beta

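*******************************************************************************
* MSM
*******************************************************************************

use "$source/CotDBD_6_dta1.dta", clear
sum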

* (1) Create weights

*** Time 0 ***

* (a) Calculate numerator (t0)
regress avgcotl6_24
predict num_predictt0
sum num_predictt0
predict num_residt0, residuals
sum num_residt0
gen num_sdt0=r(sd)
disp num_sdt0
gen numt0=normalden(avgcotl6_24, num_predictt0, num_sdt0)
sum numt0
hist numt0

* (b) Calculate denominator (t0)
regress avgcotl6_24 tcmale tcblac2 pmage2 pmage2sq lbw mslbw pdics3c2 ///
      pdics3c3 mspdics smokpreg mssmokpreg pmhs2 pm4col2 pciq mspciq ///
      momhxadhd msmomhxadhd dadhxadhd msdadhxadhd inr6tc inr6tcsq msinr6tc ///
      pempall2_6 numhh2_6 hhchg2_6 ///
      biodad2_6 pmsc2dif6 bsideppc2_6 bsihospc2_6 cnt2_6 nummov2_6 ///
      smkpgtcmale smkpgtcblac2 smkpgpmhs2 maletcblac2 malepmhs2 blacpmhs2
predict den_predictt0
sum den_predictt0
predict den_residt0, residuals
sum den_residt0
gen den_sdt0=r(sd)
disp den_sdt0
sum den_sdt0
gen dent0=normalden(avgcotl6_24, den_predictt0, den_sdt0)
hist dent0

*** Time 1 ***

* (a) Calculate numerator (t1)
regress avgcotl48_90 avgcotl6_24
predict num_predictt1
sum num_predictt1
predict num_residt1, residuals
sum num_residt1
gen num_sdt1=r(sd)
disp num_sdt1
gen numt1=normalden(avgcotl48_90, num_predictt1, num_sdt1)
sum numt1
hist numt1

* (b) Calculate denominator (t1)
regress avgcotl48_90 avgcotl6_24 p35adhdtotm ///
                tcmale tcblac2 pmage2 pmage2sq lbw mslbw pdics3c2 ///
                pdics3c3 mspdics smokpreg mssmokpreg pmhs2 pm4col2 ///
                pciq mspciq momhxadhd msmomhxadhd ///
                dadhxadhd msdadhxadhd inr6tc inr6tcsq msinr6tc ///
                pempall2_6 numhh2_6 hhchg2_6 biodad2_6 pmsc2dif6 ///
                bsideppc2_6 bsihospc2_6 cnt2_6 nummov2_6 ///
                smkpgtcmale smkpgtcblac2 smkpgpmhs2 maletcblac2 ///
                malepmhs2 blacpmhs2 inr24_48tc inr24_48tcsq ///
                msinr24_48tc pempall24_48 numhh24_48 ///
                hhchg24_48 mshhchg24_48 biodad24_48 pmsc24dif48 ///
                bsideppc24 bsihospc24 msbsideppc24 msbsihospc24 ///
                cnt24_48 nummov24_48
predict den_predictt1
sum den_predictt1
predict den_residt1, residuals
sum den_residt1
gen den_sdt1=r(sd)
disp den_sdt1
sum den_sdt1
gen dent1=normalden(avgcotl48_90, den_predictt1, den_sdt1)
hist dent1

*** Create final weights ***

* Calculate stabilized weight for t0 and t1
gen swt0=numt0/dent0
gen swt1=numt1/dent1
sum swt0 swt1 avgcotl6_24 avgcotl48_90

* Take product of stabilized weights to get final weight
gen fw=swt0*swt1
sum fw, d

* (2) Assess balance
* Time 0 + Time 1
foreach y in avgcotl6_24 p35adhdtotm tcmale tcblac2 pmage2 pmage2sq lbw ///
          mslbw pdics3c2 pdics3c3 mspdics smokpreg mssmokpreg pmhs2 ///
          pm4col2 pciq mspciq momhxadhd msmomhxadhd dadhxadhd msdadhxadhd ///
          inr6tc inr6tcsq msinr6tc pempall2_6 numhh2_6 hhchg2_6 ///
          biodad2_6 pmsc2dif6 bsideppc2_6 bsihospc2_6 cnt2_6 nummov2_6 ///
          smkpgtcmale smkpgtcblac2 smkpgpmhs2 maletcblac2 malepmhs2 blacpmhs2 ///
          inr24_48tc inr24_48tcsq msinr24_48tc pempall24_48 numhh24_48 ///
          hhchg24_48 mshhchg24_48 biodad24_48 pmsc24dif48 bsideppc24 ///
          bsihospc24 msbsideppc24 msbsihospc24 cnt24_48 nummov24_48 {
   pwcorr avgcotl6_24 `y'
   pwcorr avgcotl6_24 `y' [aweight=fw]
   pwcorr avgcotl48_90 `y'
   pwcorr avgcotl48_90 `y' [aweight=fw]
}

* (3) Run weighted model
global cov tcmale tcblac2 pmage2 pmage2sq lbw mslbw pdics3c2 pdics3c3 ///
         mspdics smokpreg mssmokpreg pmhs2 pm4col2 pciq mspciq ///
         momhxadhd msmomhxadhd dadhxadhd msdadhxadhd inr6tc inr6tcsq ///
         msinr6tc pempall2_6 numhh2_6 hhchg2_6 ///
         biodad2_6 pmsc2dif6 bsideppc2_6 bsihospc2_6 cnt2_6 nummov2_6 ///
         smkpgtcmale smkpgtcblac2 smkpgpmhs2 maletcblac2 malepmhs2 blacpmhs2 ///
         inr24_48tc inr24_48tcsq msinr24_48tc pempall24_48 numhh24_48 ///
         hhchg24_48 mshhchg24_48 biodad24_48 pmsc24dif48 ///
         bsideppc24 bsihospc24 msbsideppc24 msbsihospc24 cnt24_48 nummov24_48

* Unadjusted regression
regress p90adhdtotm avgcotl6_24 avgcotl48_90, beta

* Adjusted regression
regress p90adhdtotm avgcotl6_24 avgcotl48_90 $cov , beta

* Weighted MSM
regress p90adhdtotm avgcotl6_24 avgcotl48_90 [pweight=fw] , vce(robust) beta
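For readers who want to check the weight arithmetic in steps (A) to (C) of the IPTW code outside Stata, the Python sketch below mirrors it for a single covariate: fit the numerator (intercept-only) and denominator (covariate) regressions by least squares, take the SD of each model’s residuals (as `sum` followed by `r(sd)` does, with an n − 1 divisor), evaluate the normal density of the observed treatment under each model, and divide. The single-covariate simplification, the function names, and the toy data are assumptions for illustration; the Appendix models include many covariates.

```python
import statistics
from statistics import NormalDist


def ols_fitted(y, x=None):
    """Fitted values from y ~ 1 (x is None) or y ~ 1 + x by least squares."""
    my = statistics.fmean(y)
    if x is None:
        return [my] * len(y)
    mx = statistics.fmean(x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return [my + slope * (xi - mx) for xi in x]


def normal_densities(y, fitted):
    """Density of each observed y under N(fitted value, residual SD)."""
    sd = statistics.stdev([yi - fi for yi, fi in zip(y, fitted)])  # n - 1 divisor
    return [NormalDist(fi, sd).pdf(yi) for yi, fi in zip(y, fitted)]


def stabilized_iptw(treatment, covariate):
    """Stabilized weights: numerator density / denominator density."""
    num = normal_densities(treatment, ols_fitted(treatment))
    den = normal_densities(treatment, ols_fitted(treatment, covariate))
    return [n / d for n, d in zip(num, den)]


# Hypothetical log-cotinine values and one binary confounder.
log_cotinine = [0.5, 1.2, 2.0, 2.7, 3.1, 4.0]
confounder = [0, 0, 1, 0, 1, 1]
weights = stabilized_iptw(log_cotinine, confounder)
print([round(w, 2) for w in weights])
```

If the covariate is unrelated to treatment, the two models coincide and every weight equals 1, which is a quick sanity check on any implementation.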

References

Austin PC (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46, 399–424.
Austin PC, & Stuart EA (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661–3679.
Austin PC (2019). Assessing covariate balance when using the generalized propensity score with quantitative or continuous exposures. Statistical Methods in Medical Research, 28(5), 1365–1377.
Berry D, Blair C, Willoughby M, Garrett-Peters P, Vernon-Feagans L, Mills-Koonce WR, & Family Life Project Key Investigators (2016). Household chaos and children’s cognitive and socio-emotional development in early childhood: Does childcare play a buffering role? Early Childhood Research Quarterly, 34, 115–127.
Brown DW (2019). Novel propensity score methods for multiple and continuous treatments: Applications to EHR data. UT School of Public Health Dissertations (Open Access), 91.
Campbell DT (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54(4), 297.
Chang JPC, Su KP, Mondelli V, & Pariante CM (2021). Cortisol and inflammatory biomarker levels in youths with attention deficit hyperactivity disorder (ADHD): Evidence from a systematic review with meta-analysis. Translational Psychiatry, 11(1), 430.
Cinelli C, Forney A, & Pearl J (2022, May 20). A crash course in good and bad controls. Sociological Methods & Research, 10.1177/00491241221099
Cole SR, & Hernán MA (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168(6), 656–664.
Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, & Poole C (2010). Illustrating bias due to conditioning on a collider. International Journal of Epidemiology, 39(2), 417–420. doi: 10.1093/ije/dyp334
Derogatis L (2000). The Brief Symptom Inventory–18 (BSI-18): Administration, scoring and procedures manual. Minneapolis, MN: National Computer Systems.
DuPaul GJ, Anastopoulos AD, Power TJ, Reid R, Ikeda MJ, & McGoey KE (1998). Parent ratings of Attention-Deficit/Hyperactivity Disorder symptoms: Factor structure and normative data. Journal of Psychopathology and Behavioral Assessment, 20, 83–102.
Fong C, Hazlett C, & Imai K (2018). Covariate balancing propensity score for a continuous treatment: Application to the efficacy of political advertisements. The Annals of Applied Statistics, 12(1), 156–177.
Frank KA, Maroulis SJ, Duong MQ, & Kelcey BM (2013). What would it take to change an inference? Using Rubin’s causal model to interpret the robustness of causal inferences. Educational Evaluation and Policy Analysis, 35(4), 437–460.
Gangl M (2013). Partial identification and sensitivity analysis. In Handbook of Causal Analysis for Social Research (pp. 377–402). Dordrecht: Springer.
Gatzke-Kopp LM, & Beauchaine TP (2007). Direct and passive prenatal nicotine exposure and the development of externalizing psychopathology. Child Psychiatry and Human Development, 38, 255–269.
Gatzke-Kopp LM, Willoughby MT, Warkentien SM, O’Connor T, Granger DA, & Blair C (2019). Magnitude and chronicity of environmental smoke exposure across infancy and early childhood in a sample of low-income children. Nicotine & Tobacco Research, 21(12), 1665–1672.
Gatzke-Kopp L, Willoughby MT, Warkentien S, Petrie D, Mills-Koonce R, & Blair C (2020). Association between environmental tobacco smoke exposure across the first four years of life and manifestation of externalizing behavior problems in school-aged children. Journal of Child Psychology and Psychiatry, 61(11), 1243–1252.
Gaysina D, Fergusson D, Leve L, Horwood J, Reiss D, Shaw D, … & Harold GT (2013). Maternal smoking during pregnancy and offspring conduct problems: Evidence from 3 independent genetically sensitive research designs. JAMA Psychiatry, 70, 956–963.
Harder VS, Stuart EA, & Anthony JC (2010). Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychological Methods, 15(3), 234–249. doi: 10.1037/a0019623
Hernán MA, Hernández-Díaz S, Werler MM, & Mitchell AA (2002). Causal knowledge as a prerequisite for confounding evaluation: An application to birth defects epidemiology. American Journal of Epidemiology, 155(2), 176–184. doi: 10.1093/aje/155.2.176
Hernán MA, & Robins JM (2006). Estimating causal effects from epidemiological data. Journal of Epidemiology and Community Health, 60, 578–586.
Hernán MA, & Robins JM (2020). Causal inference: What if. Boca Raton: Chapman & Hall/CRC.
Hirano K, & Imbens GW (2004). The propensity score with continuous treatments. In Applied Bayesian modeling and causal inference from incomplete-data perspectives (pp. 73–84).
Hong G (2015). Causality in a social world: Moderation, mediation and spill-over. John Wiley & Sons.
Imai K, & Van Dyk DA (2004). Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association, 99(467), 854–866.
Keyes KM, Davey Smith G, & Susser E (2014). Associations of prenatal maternal smoking with offspring hyperactivity: Causal or confounded? Psychological Medicine, 44, 857–867.
King G, & Nielsen R (2019). Why propensity scores should not be used for matching. Political Analysis, 27(4), 435–454.
Lee BK, Lessler J, & Stuart EA (2011). Weight trimming and propensity score weighting. PLoS ONE, 6(3).
Leyrat C, Seaman SR, White IR, et al. (2019). Propensity score analysis with partially observed covariates: How should multiple imputation be used? Statistical Methods in Medical Research, 28(1), 3–19.
Matt G, Quintana P, Zakarian J, Formann A, Chatfield D, Hoh E, … & Hovell M (2011). When smokers move out and non-smokers move in: Residential thirdhand smoke pollution and exposure. Tobacco Control, 20, e1.
Morgan SL, & Winship C (2015). Counterfactuals and causal inference. Cambridge University Press.
Robins J (1986). A new approach to causal inference in mortality studies with a sustained exposure period—Application to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393–1512.
Robins JM, Hernán MA, & Brumback B (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560.
Rosenbaum PR, & Rubin DB (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
Rosenbaum PR (2009). Design of observational studies. New York, NY: Springer.
Rubin DB (1987). Multiple imputation for nonresponse in surveys. Hoboken, NJ: John Wiley and Sons.
Rubin DB (2006). Matched sampling for causal effects. Cambridge University Press.
Rubin DB (2007). The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized trials. Statistics in Medicine, 26, 20–36.
Rutter M (2007). Proceeding from observed correlation to causal inference: The use of natural experiments. Perspectives on Psychological Science, 2(4), 377–395.
Shadish WR, Cook TD, & Campbell DT (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
Shah BR, Laupacis A, Hux JE, & Austin PC (2005). Propensity score methods gave similar results to traditional regression modeling in observational studies: A systematic review. Journal of Clinical Epidemiology, 58(6), 550–559.
Thoemmes F, & Ong AD (2016). A primer on inverse probability of treatment weighting and marginal structural models. Emerging Adulthood, 4(1), 40–59.
Todd R, Joyner C, Heath A, Neuman R, & Reich W (2003). Reliability and stability of a semistructured DSM-IV interview designed for family studies. Journal of the American Academy of Child and Adolescent Psychiatry, 42, 1460–1468.
VanderWeele TJ, & Ding P (2017). Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4), 268–274. doi: 10.7326/M16-2607
Wechsler D (1997). Wechsler adult intelligence scale—Third edition. San Antonio, TX: The Psychological Corporation.
West SG, & Thoemmes F (2010). Campbell’s and Rubin’s perspectives on causal inference. Psychological Methods, 15(1), 18.
Willoughby M, Burchinal M, Garrett-Peters P, Mills-Koonce R, Vernon-Feagans L, & Cox M (2013). The Family Life Project: An epidemiological and developmental study of young children living in poor rural communities: II. Recruitment of the Family Life Project. Monographs of the Society for Research in Child Development, 78, 24–35.
Willoughby MT, Pek J, Greenberg MT, & the FLP Investigators (2012). Parent-reported Attention Deficit/Hyperactivity symptomatology in preschool-aged children: Factor structure, developmental change, and early risk factors. Journal of Abnormal Child Psychology, 40(8), 1301–1312.
Wouk K, Bauer AE, & Gottfredson NC (2019). How to implement directed acyclic graphs to reduce bias in addiction research. Addictive Behaviors, 94, 109–116. doi: 10.1016/j.addbeh.2018.09.032
Zhu Y, Coffman DL, & Ghosh D (2015). A boosting algorithm for estimating generalized propensity scores with continuous treatments. Journal of Causal Inference, 3(1), 25–40.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supplemental material

NIHMS2055842-supplement-supplemental_material.docx^{(29.6KB, docx)}

[R1] Austin PC (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research. 46, 399–424. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] Austin PC, & Stuart EA (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in medicine, 34(28), 3661–3679. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] Austin PC (2019). Assessing covariate balance when using the generalized propensity score with quantitative or continuous exposures. Statistical methods in medical research, 28(5), 1365–1377. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] Berry D, Blair C, Willoughby M, Garrett-Peters P, Vernon-Feagans L, Mills-Koonce WR, & Family Life Project Key Investigators. (2016). Household chaos and children’s cognitive and socio-emotional development in early childhood: Does childcare play a buffering role? Early childhood research quarterly, 34, 115–127. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] Brown DW (2019). Novel propensity score methods for multiple and continuous treatments: Applications to EHR data. UT School of Public Health Dissertations (Open Access). 91. [Google Scholar]

[R6] Campbell DT (1957). Factors relevant to the validity of experiments in social settings. Psychological bulletin, 54(4), 297. [DOI] [PubMed] [Google Scholar]

[R7] Chang JPC, Su KP, Mondelli V, & Pariante CM (2021). Cortisol and inflammatory biomarker levels in youths with attention deficit hyperactivity disorder (ADHD): evidence from a systematic review with meta-analysis. Translational psychiatry, 11(1), 430. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] Cinelli C, Forney A, & Pearl J (2022, May 20). A crash course in good and bad controls. Sociological Methods & Research, 10.1177/00491241221099 [DOI] [Google Scholar]

[R9] Cole SR, & Hernán MA (2008). Constructing inverse probability weights for marginal structural models. American journal of epidemiology, 168(6), 656–664. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, & Poole C (2010). Illustrating bias due to conditioning on a collider. International journal of epidemiology, 39(2), 417–420. 10.1093/ije/dyp334 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] Derogatis L (2000). The Brief Symptom Inventory–18 (BSI-18): Administration, scoring and procedures manual. Minneapolis, MN: National Computer Systems. [Google Scholar]

[R12] DuPaul GJ, Anastopoulos AD, Power TJ, Reid R, Ikeda MJ, McGoey KE (1998). Parent ratings of Attention-Deficit/Hyperactivity Disorder symptoms: factor structure and normative data. Journal of Psychopathology and Behavioral Assessment, 20:83–102. [Google Scholar]

[R13] Fong C, Hazlett C, & Imai K (2018). Covariate balancing propensity score for a continuous treatment: Application to the efficacy of political advertisements. The Annals of Applied Statistics, 12(1), 156–177. [Google Scholar]

[R14] Frank KA, Maroulis SJ, Duong MQ, & Kelcey BM (2013). What would it take to change an inference? Using Rubin’s causal model to interpret the robustness of causal inferences. Educational Evaluation and Policy Analysis, 35(4), 437–460. [Google Scholar]

[R15] Gangl M (2013). Partial identification and sensitivity analysis. In Handbook of Causal Analysis for Social Research (pp. 377–402). Springer, Dordrecht. [Google Scholar]

[R16] Gatzke-Kopp LM, & Beauchaine TP (2007). Direct and passive prenatal nicotine exposure and the development of externalizing psychopathology. Child Psychiatry and Human Development, 38, 255–269. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] Gatzke-Kopp LM, Willoughby MT, Warkentien SM, O’Connor T, Granger DA, & Blair C (2019). Magnitude and chronicity of environmental smoke exposure across infancy and early childhood in a sample of low-income children. Nicotine & Tobacco Research, 21(12), 1665–1672. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] Gatzke-Kopp L, Willoughby MT, Warkentien S, Petrie D, Mills-Koonce R, & Blair C (2020). Association between environmental tobacco smoke exposure across the first four years of life and manifestation of externalizing behavior problems in school-aged children. Journal of Child Psychology and Psychiatry, 61(11), 1243–1252. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] Gaysina D, Fergusson D, Leve L, Horwood J, Reiss D, Shaw D,…& Harold GT (2013). Maternal smoking during pregnancy and offspring conduct problems: Evidence from 3 independent genetically sensitive research designs. JAMA Psychiatry, 70, 956–963. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] Harder VS, Stuart EA, & Anthony JC (2010). Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychological methods, 15(3), 234–249. doi: 10.1037/a0019623 [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] Hernán MA, Hernández-Diaz S, Werler MM & Mithcel AA (2002). Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. American Journal of Epidemiology, 155(2), 176–184, doi: 10.1093/aje/155.2.176 [DOI] [PubMed] [Google Scholar]

[R22] Hernán MA, & Robins JM (2006). Estimating causal effects from epidemiological data. J Epidemiol Community Health, 60, 578–586. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. [Google Scholar]

[R24] Hirano K, Imbens GW (2004). The propensity score with continuous treatments. Applied Bayesian modeling and causal inference from incomplete-data perspectives, 226164, 73–84. [Google Scholar]

[R25] Hong G (2015). Causality in a social world: Moderation, mediation and spill-over. John Wiley & Sons. [Google Scholar]

[R26] Imai K, Van Dyk DA (2004). Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association, 99(467), 854–866. [Google Scholar]

[R27] Keyes KM, Davey Smith G, Susser E (2014). Associations of prenatal maternal smoking with offspring hyperactivity: Causal or confounded? Psych. Medicine, 44, 857–867. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R28] King G, & Nielsen R (2019). Why propensity scores should not be used for matching. Political Analysis, 27(4), 435–454 [Google Scholar]

[R29] Lee BK, Lessler J, & Stuart EA (2011). Weight trimming and propensity score weighting. PloS one, 6(3). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] Leyrat C, Seaman SR, White IR, et al. Propensity score analysis with partially observed covariates: how should multiple imputation be used? Stat Methods Med Res. 2019;28(1):3–19. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] Matt G, Quintana P, Zakarian J, Formann A, Chatfield D, Hoh E… & Hovell M (2011). When smokers move out and non-smokers move in: residential thirdhand smoke pollution and exposure. Tobacco Control, 20, e1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] Morgan SL, & Winship C (2015). Counterfactuals and causal inference. Cambridge University Press. [Google Scholar]

[R33] Robins J (1986). A new approach to causal inference in mortality studies with a sustained exposure period—Application to control of the healthy worker survivor effect. Mathematical Modelling, 7, 1393–1512. [Google Scholar]

[R34] Robins JM, Hernan MA, & Brumback B (2000). Marginal structural models and causal inference in epidemiology Epidemiology 11 (5): 550–560. [DOI] [PubMed] [Google Scholar]

[R35] Rosenbaum PR, Rubin DB (1983). The central role of the propensity score in observational studies for causal effects. Biometrika. 70, 41–55. [Google Scholar]

[R36] Rosenbaum PR (2009). Design of Observational Studies. Springer; New York, NY. [Google Scholar]

[R37] Rubin DB. Multiple Imputation for Nonresponse in Surveys. Hoboken, NJ: John Wiley and Sons; 1987. [Google Scholar]

[R38] Rubin DB (2006). Matched sampling for causal effects. Cambridge University Press. [Google Scholar]

[R39] Rubin DB (2007). The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Statistics in Medicine. 26, 20–36. [DOI] [PubMed] [Google Scholar]

[R40] Rutter M (2007). Proceeding from observed correlation to causal inference: The use of natural experiments. Perspectives on Psychological Science, 2(4), 377–395.

[R41] Shadish WR, Cook TD, & Campbell DT (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

[R42] Shah BR, Laupacis A, Hux JE, & Austin PC (2005). Propensity score methods gave similar results to traditional regression modeling in observational studies: A systematic review. Journal of Clinical Epidemiology, 58(6), 550–559.

[R43] Thoemmes F, & Ong AD (2016). A primer on inverse probability of treatment weighting and marginal structural models. Emerging Adulthood, 4(1), 40–59.

[R44] Todd R, Joyner C, Heath A, Neuman R, & Reich W (2003). Reliability and stability of a semistructured DSM-IV interview designed for family studies. Journal of the American Academy of Child and Adolescent Psychiatry, 42, 1460–1468.

[R45] VanderWeele TJ, & Ding P (2017). Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4), 268–274. doi: 10.7326/M16-2607

[R46] Wechsler D (1997). Wechsler adult intelligence scale: Third edition. San Antonio, TX: The Psychological Corporation.

[R47] West SG, & Thoemmes F (2010). Campbell’s and Rubin’s perspectives on causal inference. Psychological Methods, 15(1), 18.

[R48] Willoughby M, Burchinal M, Garrett-Peters P, Mills-Koonce R, Vernon-Feagans L, & Cox M (2013). The Family Life Project: An epidemiological and developmental study of young children living in poor rural communities: II. Recruitment of the Family Life Project. Monographs of the Society for Research in Child Development, 78, 24–35.

[R49] Willoughby MT, Pek J, Greenberg MT, & the FLP Investigators (2012). Parent-reported Attention Deficit/Hyperactivity symptomatology in preschool-aged children: Factor structure, developmental change, and early risk factors. Journal of Abnormal Child Psychology, 40(8), 1301–1312.

[R50] Wouk K, Bauer AE, & Gottfredson NC (2019). How to implement directed acyclic graphs to reduce bias in addiction research. Addictive Behaviors, 94, 109–116. doi: 10.1016/j.addbeh.2018.09.032

[R51] Zhu Y, Coffman DL, & Ghosh D (2015). A boosting algorithm for estimating generalized propensity scores with continuous treatments. Journal of Causal Inference, 3(1), 25–40.
