Author manuscript; available in PMC: 2014 Nov 21.
Published in final edited form as: Stat Politics Policy. 2012 Feb 4;3(1):2151–1050. doi: 10.1515/2151-7509.1050

Why and When “Flawed” Social Network Analyses Still Yield Valid Tests of no Contagion

Tyler J VanderWeele 1, Elizabeth L Ogburn 2, Eric J Tchetgen Tchetgen 3
PMCID: PMC4240520  NIHMSID: NIHMS596262  PMID: 25419506

Abstract

Lyons (2011) offered several critiques of the social network analyses of Christakis and Fowler, including issues of confounding, model inconsistency, and statistical dependence in networks. Here we show that in some settings, social network analyses of the type employed by Christakis and Fowler will still yield valid tests of the null of no social contagion, even though estimates and confidence intervals may not be valid. In particular, we show that if the alter’s state is lagged by an additional period, then, under the null of no contagion, the problems of model inconsistency and statistical dependence effectively disappear, so that testing for contagion remains possible. Our results clarify the settings in which even “flawed” social network analyses are still useful for assessing social contagion and social influence.

Keywords: confounding, contagion, dependence, social influence, social networks


In his paper, Lyons (2011) offers a number of criticisms of social network analyses that attempt to estimate contagion effects, such as those of Christakis and Fowler (2007, 2008). A number of his criticisms, and a number of other critiques (Cohen-Cole and Fletcher, 2008; Shalizi and Thomas, 2011; Noel and Nyhan, 2011), are important and need to be taken seriously in the conduct and interpretation of such studies. Some progress has been made in addressing or working around some of these critiques (Fowler and Christakis, 2008; Ver Steeg and Galstyan, 2010; Christakis and Fowler, 2011; VanderWeele, 2011). However, many of the issues raised have arguably not yet been dealt with adequately. In this paper, we offer further discussion on several points raised by Lyons (2011), focusing specifically on model consistency and inference. We argue that, although the issues raised by Lyons (2011) can lead to biased estimates and invalid inference, social network analyses like those of Christakis and Fowler (2007, 2008) will, in some circumstances, still suffice as a valid test of the null hypothesis of no contagion (no social influence) in the social network.

On p. 13 of his paper, Lyons (2011) considers a model like that used in Christakis and Fowler (2007, 2008), in which the log odds of the state at time $t$ of a “focal participant” or “ego”, $Y_{i,t}$, is modeled as a linear function of the state of the “linked participant” or “alter” at time $t$, $Y_{j,t}$, and at time $t-1$, $Y_{j,t-1}$; of the ego’s state at time $t-1$, $Y_{i,t-1}$; and of the ego’s covariates, $X_i$. In this model, $\beta_1$ is the coefficient for $Y_{j,t}$ (the alter’s state at time $t$) and $\beta_2$ is the coefficient for $Y_{j,t-1}$ (the alter’s state at time $t-1$):

$$\log\left\{\frac{P(Y_{i,t}=1\mid Y_{j,t},Y_{j,t-1},Y_{i,t-1},X_i)}{P(Y_{i,t}=0\mid Y_{j,t},Y_{j,t-1},Y_{i,t-1},X_i)}\right\}=\beta_0+\beta_1Y_{j,t}+\beta_2Y_{j,t-1}+\beta_3Y_{i,t-1}+\beta_4X_i$$

Christakis and Fowler (2007, 2008) interpret the estimate of $\beta_1$ as their “contagion effect,” i.e., their causal estimate of social influence. Lyons (2011) argues that if, in the network, there is a person $i$ with a tie to person $k$, and person $k$ in turn has a tie to a person $m \neq i$, then the models themselves imply that $\beta_1 = 0$. The models thus effectively contradict the existence of the very effect Christakis and Fowler want to assess. Lyons further argues that when the state is continuous and linear regression is used, as in the loneliness social network analyses of Cacioppo et al. (2009), then if persons $i$ and $j$ have ties to each other, and persons $j$ and $k$ likewise have ties to each other with $k \neq i$, it follows from the models that $\beta_1 = \beta_2 = 0$.

This issue raised by Lyons is essentially that there are more equations than unknowns. This arises because the state of the ego at time t is regressed on the current state of the alter at the same time t, rather than only on the lagged state of the alter. When there is reciprocation between persons with regard to their ties, this creates modeling problems. Intuitively, the problem develops because the same variable at the same time period, e.g. the ego’s state at time t, is the dependent variable in one regression and the independent variable in another regression.
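A toy simultaneous-equations model makes this intuition concrete. The sketch below is our own illustration under stated assumptions, not the model of Lyons (2011) or of Christakis and Fowler: two people with a reciprocal tie whose continuous states satisfy $Y_i = bY_j + e_i$ and $Y_j = bY_i + e_j$, with independent standard-normal errors and $|b| < 1$. Solving the pair shows that the least-squares slope of $Y_i$ on $Y_j$ is $2b/(1+b^2)$, which coincides with $b$ (for $|b|<1$) only when $b = 0$, i.e., only under the null of no contagion. The function name `simultaneous_pair_slope` is hypothetical.

```python
import numpy as np


def simultaneous_pair_slope(b, n_pairs=200_000, seed=0):
    """OLS slope of Y_i on Y_j when Y_i = b*Y_j + e_i and Y_j = b*Y_i + e_j.

    Toy linear-Gaussian illustration (an assumption, not the paper's model);
    requires |b| < 1 so that the two equations have a unique solution.
    """
    rng = np.random.default_rng(seed)
    e_i = rng.standard_normal(n_pairs)
    e_j = rng.standard_normal(n_pairs)
    # Reduced form: solve the two simultaneous equations for (Y_i, Y_j).
    y_i = (e_i + b * e_j) / (1 - b**2)
    y_j = (e_j + b * e_i) / (1 - b**2)
    # Least-squares slope (with intercept) of y_i on y_j.
    return np.cov(y_i, y_j)[0, 1] / np.var(y_j, ddof=1)


for b in (0.0, 0.3, 0.5):
    print(f"b = {b}: fitted slope = {simultaneous_pair_slope(b):.3f}, "
          f"theory 2b/(1+b^2) = {2 * b / (1 + b**2):.3f}")
```

The fitted slopes track $2b/(1+b^2)$ (about 0.8 for $b = 0.5$), so a contemporaneous regression coefficient cannot simply be read as the contagion parameter except at $b = 0$.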

As noted by Lyons, the models themselves then effectively contradict the conjecture of social influence that Christakis and Fowler want to assess. An important exception, however, arises when the null hypothesis of no contagion is true. In this case, provided that homophily and environmental confounding have been properly controlled for, $\beta_1$ does indeed equal 0 (cf. Shalizi and Thomas, 2011). And if $\beta_1 = 0$, the models may be correctly specified, provided, e.g., that the log odds of the ego’s state is indeed linear in the covariates. Under the null hypothesis of no contagion, the problem of model inconsistency effectively vanishes.¹ Thus, under the null hypothesis of no contagion, a statistical test of $\beta_1 = 0$ would provide a joint test of (i) no contagion, (ii) no homophily or environmental confounding conditional on the covariates, and (iii) correct model specification with regard to the covariates. The estimate and confidence interval for $\beta_1$ would not constitute a valid estimate of the contagion effect, even if there were no homophily or environmental confounding conditional on the covariates. However, whether the confidence interval for $\beta_1$ contains 0 would constitute a valid test of the null hypothesis of no contagion, again provided that the assumptions of no homophily and no environmental confounding conditional on the covariates, and of correct model specification with respect to the covariates, hold. Under these assumptions, we can in theory do testing, but not estimation.
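The “testing but not estimation” logic reduces to checking whether a Wald interval for $\beta_1$ excludes 0. A minimal sketch, assuming an estimate and a standard error are already in hand and that the standard error is itself valid under the null (which, for reciprocal ties, is not automatic); the function name `wald_test` and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist


def wald_test(beta_hat, se, level=0.95):
    """Return (lower, upper, reject) for H0: beta = 0 using a Wald interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    lower, upper = beta_hat - z * se, beta_hat + z * se
    # For se > 0 the interval excludes 0 exactly when |beta_hat / se| > z,
    # so "does the CI contain 0?" is a test of beta = 0 at level 1 - level.
    reject = not (lower <= 0.0 <= upper)
    return lower, upper, reject


# Hypothetical inputs: the first interval excludes 0, the second does not.
print(wald_test(0.30, 0.10))
print(wald_test(0.10, 0.10))
```

The interval limits themselves are not interpreted as bounds on a contagion effect here; only the accept/reject decision is used.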

This brings us to another critique offered by Lyons (2011): statistical modeling under the dependence structures generated by a social network. Christakis and Fowler (2007, 2008) use generalized estimating equations, clustering on the ego, to account for the use of multiple time points per ego. Unfortunately, as Lyons (2011) notes, this is not the only source of dependence in the data. If there is social influence (contagion), the clusters defined by the ego will not be independent of one another. Moreover, even under the null of no contagion, when contemporaneous ego–alter data are used, the generalized estimating equation standard error is not always valid. Christakis and Fowler (2007, 2008) consider social influence for different types of relationships, including ego-nominated friends, alter-nominated friends, mutual friends, spouses, neighbors, and siblings. We show in the Appendix that, because Christakis and Fowler (2007, 2008) use contemporaneous data for the ego and the alter, so that one person’s state at time $t$ is both an outcome in one regression and an independent variable in another, the standard errors for $\beta_1$ they obtain are anti-conservative whenever relationships are reciprocal, e.g., for mutual friends, spouses, siblings, and neighbors. In these cases, even under the null hypothesis of no contagion, the standard errors will be invalid and the confidence intervals will be too narrow. One could derive an estimator of the standard error that is valid under the null, but the generalized estimating equation standard error used by Christakis and Fowler (2007, 2008) is not such an estimator. However, we also show that for relationships that are not reciprocal, e.g., ego-nominated or alter-nominated friendships (that are not mutual friendships), the generalized estimating equation standard error used by Christakis and Fowler (2007, 2008) is valid under the null hypothesis of no contagion; whether their confidence interval includes 0 therefore does constitute a valid test of the null of no contagion, provided control has been made for homophily and environmental confounding.²
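A small Monte Carlo experiment can illustrate the Appendix result in a deliberately simplified setting. The assumptions are ours, not the paper’s: a single period, independent standard-normal states (so no contagion and no confounding hold by construction), a linear regression of the ego’s state on the alter’s contemporaneous state without intercept, and a robust sandwich variance that treats each ego as an independent cluster (with one row per ego this is the usual heteroskedasticity-robust estimator). With mutual ties, the two rows of a reciprocal pair have correlated scores that this sandwich ignores; a short calculation suggests its variance is then about half the true variance, so a nominal 5% test rejects roughly 17% of the time, whereas with non-reciprocal ego-nominated ties the rejection rate stays near 5%. The function and tie-type names are hypothetical.

```python
import numpy as np


def null_rejection_rate(tie_type, n_people=400, n_sims=2000, seed=1):
    """Monte Carlo size of a nominal 5% test of beta1 = 0 under no contagion.

    Simplified linear, single-period stand-in for the paper's logistic GEE
    setting (an assumption made for illustration).
    """
    if n_people % 2:
        raise ValueError("n_people must be even so everyone can be paired")
    rng = np.random.default_rng(seed)
    ego = np.arange(n_people)
    if tie_type == "mutual":
        alter = ego ^ 1                  # 0<->1, 2<->3, ...: reciprocal ties
    elif tie_type == "ego_nominated":
        alter = (ego + 1) % n_people     # i names i+1 only: no reciprocation
    else:
        raise ValueError(f"unknown tie_type: {tie_type}")

    rejections = 0
    for _ in range(n_sims):
        y = rng.standard_normal(n_people)    # null: states are independent
        x, out = y[alter], y[ego]
        beta1 = (x @ out) / (x @ x)          # no-intercept least squares
        resid = out - beta1 * x
        # Sandwich variance summing squared scores within each ego (one row each).
        se = np.sqrt(np.sum((x * resid) ** 2)) / (x @ x)
        rejections += abs(beta1 / se) > 1.96
    return rejections / n_sims


for tie in ("ego_nominated", "mutual"):
    print(tie, null_rejection_rate(tie))
```

The ego-nominated design here is a directed cycle, so each person’s state is an outcome in one row and a regressor in another, yet no pair is reciprocal; that is the distinction the Appendix draws.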

For the purposes of testing, both the problem of model inconsistency and the problems of statistical dependence and standard error estimation can be easily addressed if the alter’s state is lagged by an additional period in the regressions. The argument used by Lyons (2011) to show that the models are inconsistent in the presence of contagion then no longer applies. Moreover, under the null of no contagion, and provided adequate control has been made for homophily and environmental confounding, the clusters defined by the ego are independent of one another, so there is no statistical dependence across the network. Finally, once the alter’s state is lagged by an additional period, the same variable is no longer an outcome in one regression and an independent variable in another regression at the same time $t$; this circumvents the difficulty of obtaining valid standard errors under the null with generalized estimating equations, and the generalized estimating equation standard error will be valid under the null of no contagion. Thus, if a researcher lags the alter’s state by an extra time period, so that the log odds of the ego’s state at time $t$, $Y_{i,t}$, is modeled as a linear function of the alter’s state at times $t-1$ and $t-2$, $Y_{j,t-1}$ and $Y_{j,t-2}$, of the ego’s state at time $t-1$, $Y_{i,t-1}$, and of the ego’s covariates:

$$\log\left\{\frac{P(Y_{i,t}=1\mid Y_{j,t-1},Y_{j,t-2},Y_{i,t-1},X_i)}{P(Y_{i,t}=0\mid Y_{j,t-1},Y_{j,t-2},Y_{i,t-1},X_i)}\right\}=\beta_0+\beta_1Y_{j,t-1}+\beta_2Y_{j,t-2}+\beta_3Y_{i,t-1}+\beta_4X_i$$

then, whether the generalized estimating equation confidence interval for the coefficient of Yj,t−1 contains 0 will constitute a valid test of the null of no contagion. We can at least still do testing using the same approach of Christakis and Fowler (2007, 2008) but simply lagging the alter’s state by an additional period.
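In practice, the extra lag only changes which columns enter the regression. The sketch below builds the rows of the lagged model from a hypothetical panel layout (a people-by-waves array of binary states, a covariate matrix, and a list of ego–alter ties); the function name, argument names, and data layout are assumptions for illustration. The rows, outcomes, and ego identifiers could then be passed to any logistic GEE routine with an independence working correlation, clustering on the ego, and the test of no contagion reads off whether the interval for the coefficient of $Y_{j,t-1}$ (the second column) contains 0.

```python
import numpy as np


def lagged_design(Y, X, ties):
    """Build inputs for the model with the alter's state lagged one extra period.

    Y: (n_people, T) array of 0/1 states; X: (n_people, p) ego covariates;
    ties: iterable of (ego, alter) index pairs. Returns regressor rows
    [1, Y[j, t-1], Y[j, t-2], Y[i, t-1], X[i]], outcomes Y[i, t], and the
    ego index of each row (the clustering variable).
    """
    T = Y.shape[1]
    rows, outcomes, clusters = [], [], []
    for i, j in ties:
        for t in range(2, T):  # 0-based t; two lags of the alter are needed
            lagged = [1.0, Y[j, t - 1], Y[j, t - 2], Y[i, t - 1]]
            rows.append(np.concatenate((lagged, X[i])))
            outcomes.append(Y[i, t])
            clusters.append(i)
    return np.array(rows, dtype=float), np.array(outcomes), np.array(clusters)


# Tiny hypothetical panel: two people, four waves, one covariate; ego 0 names 1.
Y = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1]])
X = np.array([[0.5],
              [1.5]])
W, y, c = lagged_design(Y, X, ties=[(0, 1)])
print(W)
print(y, c)
```

No current-period alter state appears in any row, which is what removes the overlap between outcomes and regressors at the same time $t$.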

All of our discussion thus far has assumed that adequate control has been made for homophily and environmental confounding. As noted by Lyons (2011) and by Shalizi and Thomas (2011), this is, of course, a very strong assumption. VanderWeele (2011) proposed a sensitivity analysis technique to assess the extent to which an unmeasured factor responsible for homophily or environmental confounding would have to be related to both the ego’s and the alter’s state in order to substantially alter qualitative and quantitative conclusions. The technique itself made simplifying parametric assumptions but a more general approach could alternatively be used (VanderWeele, 2011; VanderWeele and Arah, 2011). Unfortunately, however, it is not clear that this technique would apply in the context of inconsistent models when contemporaneous data for the ego and the alter are used. This is because the sensitivity analysis parameters in VanderWeele (2011) related the observed expectation for the ego’s state, controlling for observed covariates, to the expectation that would have been obtained had control also been possible for an unobserved covariate; however, when the models are inconsistent then it is no longer clear that the estimates, e.g. in Christakis and Fowler (2007, 2008), using the observed data, provide a consistent estimate of the expectation conditional on the observed covariates, for the very reasons raised by Lyons. The sensitivity analysis technique could, however, be applied to estimates obtained by lagging the alter’s state by an additional period because, once again, the problem of model inconsistency then no longer arises.
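For intuition about what such a sensitivity analysis does, the sketch below uses one simple, widely used bias formula for a binary unmeasured confounder $U$; it is not claimed to be the exact parameterization of VanderWeele (2011). It assumes a risk-ratio scale (approximately an odds ratio when the outcome is rare), no interaction between $U$ and the alter’s state, a ratio `gamma` by which $U$ multiplies the ego’s risk, and prevalences `p1`, `p0` of $U$ among egos whose alters are and are not in the exposed state. The function name and all numbers are hypothetical.

```python
def bias_corrected_ratio(observed_ratio, gamma, p1, p0):
    """Divide an observed risk ratio by the bias factor for a binary confounder U.

    gamma: risk ratio relating U to the ego's state (assumed constant).
    p1, p0: P(U = 1) among egos whose alter is / is not in the exposed state.
    """
    if not (0.0 <= p0 <= 1.0 and 0.0 <= p1 <= 1.0) or gamma <= 0:
        raise ValueError("need 0 <= p0, p1 <= 1 and gamma > 0")
    bias_factor = (1 + (gamma - 1) * p1) / (1 + (gamma - 1) * p0)
    return observed_ratio / bias_factor


# Hypothetical: observed ratio 1.5; U doubles the ego's risk and is twice as
# common among egos with exposed alters (0.5 versus 0.25).
print(bias_corrected_ratio(1.5, gamma=2.0, p1=0.5, p0=0.25))
```

Scanning `gamma`, `p1`, and `p0` over a grid shows how strong latent homophily or environmental confounding would have to be to move the corrected ratio to 1.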

We have given three arguments for lagging the alter’s state by an additional period: (1) the problem of model inconsistency raised by Lyons (2011) does not arise, (2) the analyses using generalized estimating equations clustering by ego as in Christakis and Fowler (2007, 2008) will give valid tests of the null of no contagion, and (3) the sensitivity analysis technique of VanderWeele (2011) can be applied to the estimates obtained from such analyses.

In fact, Christakis and Fowler (2007, 2008) report, in the online supplements to their papers, that they ran such analyses with the alter’s state lagged by an additional period and that the results were similar to those of their main analyses using contemporaneous data for the ego and alter; i.e., they again found evidence of significant contagion effects for smoking and obesity. Moreover, for these lagged social network analyses, the sensitivity analysis techniques for assessing the extent to which latent homophily and unmeasured environmental confounding could explain away the estimates are again applicable, and they suggest that the contagion effects for smoking cessation between spouses and for obesity between mutual friends are quite robust to potential latent homophily and unmeasured environmental confounding (VanderWeele, 2011).

A few further caveats are, however, in order. First, the sensitivity analysis techniques, in their present form, are not applicable to dynamic forms of homophily, such as “unfriending,” as considered by Noel and Nyhan (2011). However, by Noel and Nyhan’s own simulations, unfriending does not seem sufficiently common in the Framingham Heart Study data used by Christakis and Fowler (2007, 2008) to result in substantial biases. Second, even with the alter’s state lagged an additional period, the sensitivity analysis technique of VanderWeele (2011) is applicable to the estimates but may not be applicable to the limits of the confidence interval obtained using generalized estimating equations, because, under the alternative hypothesis that contagion effects are present, the standard error for the contagion effect may still be invalid owing to statistical dependence in outcomes across the network. If valid confidence intervals were obtained, the sensitivity analysis technique of VanderWeele (2011) would be applicable to their limits as well. Finally, although some progress can be made with testing the null of no contagion, ultimately we would also like to be able to obtain valid inferences and confidence intervals, not just tests and estimates. Doing so will require the development of statistical theory to handle the sorts of dependence structures that arise in social networks. In our view, this should be one of the central priorities of subsequent work that aims to provide a more rigorous foundation for the types of social network analyses of contagion effects exemplified by the studies of Christakis and Fowler (2007, 2008).

Acknowledgments

This research was supported by NIH grants ES017876 and HD060696.

Appendix

Consider outcome data $Y_{i,t}$, $t = 1, \ldots, T$, $i = 1, \ldots, n$, and let $X_{i,t}$ denote the $p$ covariates for person $i$ observed up to time $t$. If $X_{i,t}$ is time-invariant, we may also write $X_{i,t} = X_i$. Let $R_{i,t}$ denote the set of all $Y_{j,t}$, $j \neq i$, for persons $j$ with a specific type of tie to person $i$ at time $t$; the tie can be of type A (an “alter-nominated tie”), of type E (an “ego-nominated tie”), or of type M (a “mutually nominated tie”, or similarly any other reciprocal tie, e.g., spouse, neighbor, sibling). To test for contagion, we may test the null hypothesis $H_0$ that $Y_{i,t}$ is jointly independent of $\{Y_{j,t} : Y_{j,t} \in R_{i,t}\}$ given $(X_{i,t}, Y_{i,t-1}, Y_{j,t-1})$. We make the following assumptions:

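  1. The cardinality of $R_{i,t}$ remains bounded as $n$ goes to infinity.

  2. The support of $X_{i,t}$ is bounded.

  3. $Y_{i,t}$ is independent of $(\{X_{j,t'} : t' \le t\}, \{Y_{j,t'} : t' < t-1\}, \{X_{j,t'} : t' < t\})$ given $(X_{i,t}, Y_{i,t-1}, Y_{j,t}, Y_{j,t-1})$.

  4. For all $Y_{j,t} \in R_{i,t}$,
$$\operatorname{logit} P_{i,j,t}(\beta) := \operatorname{logit}\Pr(Y_{i,t}=1\mid Y_{j,t},Y_{j,t-1},X_{i,t},Y_{i,t-1};\beta)=\beta_0+\beta_1Y_{j,t}+\beta_2Y_{j,t-1}+\beta_3Y_{i,t-1}+\beta_4^{T}X_{i,t}, \tag{1}$$
where $\beta_1 = 0$ encodes the null effect of $Y_{j,t}$, and $\beta=[\beta_0,\beta_1,\beta_2,\beta_3,\beta_4^{T}]^{T}=[\beta_0,0,\beta_2,\beta_3,\beta_4^{T}]^{T}$.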

Christakis and Fowler (2007, 2008) estimate β by maximizing the objective function

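$$\log\prod_i\prod_t\prod_{j:\,Y_{j,t}\in R_{i,t}}P_{i,j,t}(\beta)^{Y_{i,t}}\{1-P_{i,j,t}(\beta)\}^{1-Y_{i,t}}$$

with respect to $\beta$, which produces $\hat\beta$, the solution to the estimating equation

$$\sum_i\sum_t\sum_{j\neq i:\,Y_{j,t}\in R_{i,t}}U_{i,j,t}(\hat\beta)=0,\quad\text{where } U_{i,j,t}(\beta)=W_{i,j,t}\,\varepsilon_{i,t},\quad W_{i,j,t}=[1,\,Y_{j,t},\,Y_{j,t-1},\,Y_{i,t-1},\,X_{i,t}^{T}]^{T},\quad \varepsilon_{i,t}=Y_{i,t}-P_{i,j,t}(\beta). \tag{2}$$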

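This set of equations defines a standard GEE for correlated outcomes with a logit link and an independence working correlation matrix. In this setting, the large-sample variance of $\hat\beta$ is typically approximated by

$$\Sigma_{emp}=\left\{n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\frac{\partial U_{i,j,t}(\beta)}{\partial\beta^{T}}\right]\right\}^{-1}\times n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}U_{i,j,t}(\beta)\right]^{\otimes 2}\times\left\{n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\frac{\partial U_{i,j,t}(\beta)}{\partial\beta^{T}}\right]\right\}^{-1}$$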
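For concreteness, equation (2) and a sample analogue of $\Sigma_{emp}$ can be computed directly. The following is a generic NumPy sketch, not the software used by Christakis and Fowler: it solves the independence-working-correlation logistic estimating equation by Newton–Raphson and forms the sandwich with the “meat” summed within ego clusters. It returns an estimate of the variance of $\hat\beta$ itself (roughly $\Sigma_{emp}/n$ with sample sums in place of expectations) and applies no small-sample correction. The function name and the simulated data are hypothetical.

```python
import numpy as np


def logistic_gee_independence(W, y, clusters, n_iter=50, tol=1e-10):
    """Solve sum_rows W_row * (y_row - expit(W_row @ beta)) = 0 by Newton-Raphson
    and return (beta_hat, sandwich covariance clustered on `clusters`).

    W: (N, q) regressor rows; y: (N,) 0/1 outcomes; clusters: (N,) ego ids.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(W.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(W @ beta)))
        score = W.T @ (y - p)                          # sum of U_{i,j,t}(beta)
        info = W.T @ (W * (p * (1.0 - p))[:, None])    # -d score / d beta^T
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Bread and meat evaluated at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(W @ beta)))
    info = W.T @ (W * (p * (1.0 - p))[:, None])
    U = W * (y - p)[:, None]                           # row-level scores
    ids, inverse = np.unique(clusters, return_inverse=True)
    cluster_scores = np.zeros((ids.size, W.shape[1]))
    np.add.at(cluster_scores, inverse.ravel(), U)      # sum scores within each ego
    meat = cluster_scores.T @ cluster_scores
    bread_inv = np.linalg.inv(info)
    return beta, bread_inv @ meat @ bread_inv


# Hypothetical simulated data: 2000 egos, 3 rows each, true beta1 = 0.
rng = np.random.default_rng(0)
clusters = np.repeat(np.arange(2000), 3)
N = clusters.size
W = np.column_stack([np.ones(N), rng.integers(0, 2, N), rng.standard_normal(N)])
beta_true = np.array([-0.5, 0.0, 0.8])
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-(W @ beta_true)))).astype(float)
beta_hat, cov = logistic_gee_independence(W, y, clusters)
print(beta_hat, np.sqrt(np.diag(cov)))
```

Because the meat only sums scores within each ego, any covariance between different egos’ scores is ignored; that omission is exactly what the Result below examines.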

We now state the main result.

Result

Suppose Assumptions 1–4 hold. Then, under $H_0$:

  (i) if, for all $i$ and all $t$, $R_{i,t}$ consists only of ties of type E or only of ties of type A, then, when $n$ is large, $\Sigma_{emp}$ is approximately equal to the large-sample variance–covariance matrix of $\hat\beta$;

  (ii) if, for all $i$ and all $t$, $R_{i,t}$ consists only of ties of type M, then, when $n$ is large, $\Sigma_{emp}$ is guaranteed to be anti-conservative; that is, $\Sigma_{emp}$ is smaller (in the positive semidefinite ordering) than the variance–covariance matrix of $\hat\beta$.

Proof

Under $H_0$, it can be verified using Assumption 4 that $\beta=[\beta_0,0,\beta_2,\beta_3,\beta_4^{T}]^{T}$ solves the equation

$$E\{U_{i,j,t}(\beta)\}=0,$$

which in turn implies that under mild regularity conditions, β̂ is consistent for β. Furthermore, a Taylor series expansion of equation (2) can be used to establish that in large samples, under Assumptions 1-4:

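$$\sqrt{n}(\hat\beta-\beta)\approx-\left\{n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\frac{\partial U_{i,j,t}(\beta)}{\partial\beta^{T}}\right]\right\}^{-1}\frac{1}{\sqrt{n}}\sum_i\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}U_{i,j,t}(\beta)$$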

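This further implies that, in large samples, the variance of $\sqrt{n}(\hat\beta-\beta)$ is approximately equal to

$$\left\{n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\frac{\partial U_{i,j,t}(\beta)}{\partial\beta^{T}}\right]\right\}^{-1}\times n^{-1}E\left\{\sum_i\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}U_{i,j,t}(\beta)\right\}^{\otimes 2}\times\left\{n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\frac{\partial U_{i,j,t}(\beta)}{\partial\beta^{T}}\right]\right\}^{-1}$$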

where $A^{\otimes 2}=AA^{T}$. The middle factor reduces to

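$$n^{-1}E\left\{\sum_i\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}U_{i,j,t}(\beta)\right\}^{\otimes 2}=n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}U_{i,j,t}(\beta)\right]^{\otimes 2}+n^{-1}\sum_{i\neq s}E\left[\left\{\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}U_{i,j,t}(\beta)\right\}\left\{\sum_t\sum_{k:\,Y_{k,t}\in R_{s,t}}U_{s,k,t}^{T}(\beta)\right\}\right]$$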

and

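$$n^{-1}\sum_{i\neq s}E\left[\left\{\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}U_{i,j,t}(\beta)\right\}\left\{\sum_t\sum_{k:\,Y_{k,t}\in R_{s,t}}U_{s,k,t}^{T}(\beta)\right\}\right]=n^{-1}\sum_{i\neq s}E\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\sum_{k:\,Y_{k,t}\in R_{s,t}}U_{i,j,t}(\beta)U_{s,k,t}^{T}(\beta)\right]$$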

since, under Assumption 3, for $t<t'$, $Y_{j,t}\in R_{i,t}$, $Y_{k,t'}\in R_{s,t'}$, and $i\neq s$,

$$E\{U_{i,j,t}(\beta)U_{s,k,t'}^{T}(\beta)\}=0.$$

Furthermore, Assumption 3 implies that:

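$$\begin{aligned}
n^{-1}\sum_{i\neq s}E\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\sum_{k:\,Y_{k,t}\in R_{s,t}}U_{i,j,t}(\beta)U_{s,k,t}^{T}(\beta)\right]
&=n^{-1}\sum_{i\neq s}\sum_t1(Y_{s,t}\in R_{i,t})\,1(Y_{i,t}\in R_{s,t})\,E\{U_{i,s,t}(\beta)U_{s,i,t}^{T}(\beta)\}\\
&=n^{-1}\sum_{i<s}\sum_t1(Y_{s,t}\in R_{i,t})\,1(Y_{i,t}\in R_{s,t})\left[E\{U_{i,s,t}(\beta)U_{s,i,t}^{T}(\beta)\}+E\{U_{s,i,t}(\beta)U_{i,s,t}^{T}(\beta)\}\right]\\
&=n^{-1}\sum_{i<s}\sum_t1(Y_{s,t}\in R_{i,t})\,1(Y_{i,t}\in R_{s,t})\left[E\{W_{i,s,t}W_{s,i,t}^{T}\varepsilon_{i,t}\varepsilon_{s,t}\}+E\{W_{s,i,t}W_{i,s,t}^{T}\varepsilon_{i,t}\varepsilon_{s,t}\}\right]
\end{aligned}$$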

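where, because under $H_0$ the current states $Y_{i,t}$ and $Y_{s,t}$ are conditionally independent given the past, the only nonzero entry of each expectation is the one pairing the two current states (the entry for the coefficient $\beta_1$ of $Y_{j,t}$), so that

$$E\{W_{i,s,t}W_{s,i,t}^{T}\varepsilon_{i,t}\varepsilon_{s,t}\}+E\{W_{s,i,t}W_{i,s,t}^{T}\varepsilon_{i,t}\varepsilon_{s,t}\}=2E\left[P_{i,s,t}(\beta)\{1-P_{i,s,t}(\beta)\}P_{s,i,t}(\beta)\{1-P_{s,i,t}(\beta)\}\right]e_2e_2^{T}:=\Gamma_{i,s,t},$$

with $e_2$ the $(p+4)$-dimensional unit vector whose only nonzero entry corresponds to $\beta_1$.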

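Therefore the variance of $\sqrt{n}(\hat\beta-\beta)$ is approximately equal to

$$\left\{n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\frac{\partial U_{i,j,t}(\beta)}{\partial\beta^{T}}\right]\right\}^{-1}\times n^{-1}\left\{\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}U_{i,j,t}(\beta)\right]^{\otimes 2}+\sum_{i<s}\sum_t1(Y_{s,t}\in R_{i,t})\,1(Y_{i,t}\in R_{s,t})\,\Gamma_{i,s,t}\right\}\times\left\{n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\frac{\partial U_{i,j,t}(\beta)}{\partial\beta^{T}}\right]\right\}^{-1}.$$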

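Under $H_0$, the standard sandwich estimator implemented by Christakis and Fowler (2007, 2008) is approximately equal to $\Sigma_{emp}$, and therefore their estimator is biased downward by approximately

$$\left\{n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\frac{\partial U_{i,j,t}(\beta)}{\partial\beta^{T}}\right]\right\}^{-1}\times n^{-1}\sum_{i<s}\sum_t1(Y_{s,t}\in R_{i,t})\,1(Y_{i,t}\in R_{s,t})\,\Gamma_{i,s,t}\times\left\{n^{-1}\sum_iE\left[\sum_t\sum_{j:\,Y_{j,t}\in R_{i,t}}\frac{\partial U_{i,j,t}(\beta)}{\partial\beta^{T}}\right]\right\}^{-1}$$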

which can be verified to be positive semidefinite. Under the null hypothesis $H_0$, the variance estimator of $\hat\beta$ used by Christakis and Fowler (2007, 2008) is therefore valid in case (i) of the Result, since then

$$1(Y_{s,t}\in R_{i,t})\,1(Y_{i,t}\in R_{s,t})=0\quad\text{for all } t \text{ and all } (s,i),\ s\neq i,$$

and thus the bias term is equal to 0. In case (ii), with mutual/reciprocal ties, this is not so, and their variance estimator is anti-conservative in large samples.
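A quick numerical check can make the structure of the cross-cluster term $\Gamma_{i,s,t}$ for a mutual pair concrete. The sketch below assumes no covariates ($p = 0$, so $W = [1, Y_{j,t}, Y_{j,t-1}, Y_{i,t-1}]$), binary states, lagged states drawn as independent fair coins, and hypothetical coefficients with $\beta_1 = 0$. It averages $U_{i,s,t}U_{s,i,t}^{T} + U_{s,i,t}U_{i,s,t}^{T}$ over many simulated pairs: only the entry for the coefficient of the alter’s current state should be materially nonzero, and it should be close to twice the average of $P_{i,s,t}(1-P_{i,s,t})P_{s,i,t}(1-P_{s,i,t})$. The function names are hypothetical.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def mutual_pair_cross_term(n_pairs=400_000, b=(-0.3, 0.0, 0.7, 0.5), seed=2):
    """Monte Carlo average of U_is U_si^T + U_si U_is^T for mutual pairs under H0.

    b = (beta0, beta1, beta2, beta3) with beta1 = 0; regressor order is
    W = [1, Y_{j,t}, Y_{j,t-1}, Y_{i,t-1}] (no covariates, an assumption).
    """
    b0, b1, b2, b3 = b
    if b1 != 0.0:
        raise ValueError("this check is about the null hypothesis beta1 = 0")
    rng = np.random.default_rng(seed)
    yi_lag = rng.integers(0, 2, n_pairs).astype(float)
    ys_lag = rng.integers(0, 2, n_pairs).astype(float)
    # Under the null, each current state depends only on the past.
    p_i = expit(b0 + b2 * ys_lag + b3 * yi_lag)
    p_s = expit(b0 + b2 * yi_lag + b3 * ys_lag)
    yi = (rng.random(n_pairs) < p_i).astype(float)
    ys = (rng.random(n_pairs) < p_s).astype(float)
    ones = np.ones(n_pairs)
    W_is = np.column_stack([ones, ys, ys_lag, yi_lag])  # ego i, alter s
    W_si = np.column_stack([ones, yi, yi_lag, ys_lag])  # ego s, alter i
    U_is = W_is * (yi - p_i)[:, None]
    U_si = W_si * (ys - p_s)[:, None]
    cross = (U_is.T @ U_si + U_si.T @ U_is) / n_pairs
    theory_entry = 2 * np.mean(p_i * (1 - p_i) * p_s * (1 - p_s))
    return cross, theory_entry


cross, theory = mutual_pair_cross_term()
print(np.round(cross, 3))
print("theory for the beta1 entry:", round(theory, 3))
```

Since this matrix is a nonnegative multiple of $e_2e_2^{T}$, sandwiching it between the inverse bread matrices yields a positive semidefinite quantity, in line with the anti-conservativeness of case (ii).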

Footnotes

1

The problem of model inconsistency may still arise if multiple alters are used for a single ego, but such issues would not arise in the Framingham Heart Study analyses of mutual friends, ego-nominated friendships, or spouses since, in these cases, there will only be one alter per ego. In the Framingham Heart Study, each ego nominates only one friend.

2

Of course, such a test statistic will only be useful if it has non-trivial power; however, in the analyses of Christakis and Fowler (2007, 2008), they were able to reject the null at least for ego-nominated ties.

Contributor Information

Tyler J. VanderWeele, Harvard University

Elizabeth L. Ogburn, Harvard University

Eric J. Tchetgen Tchetgen, Harvard University

References

  1. Cacioppo JT, Fowler JH, Christakis NA. Alone in the crowd: the structure and spread of loneliness in a large social network. Journal of Personality and Social Psychology. 2009;97(6):977–991. doi: 10.1037/a0016076.
  2. Christakis NA, Fowler JH. The spread of obesity in a large social network over 32 years. New England Journal of Medicine. 2007;357:370–379. doi: 10.1056/NEJMsa066082.
  3. Christakis NA, Fowler JH. The collective dynamics of smoking in a large social network. New England Journal of Medicine. 2008;358:2249–2258. doi: 10.1056/NEJMsa0706154.
  4. Christakis NA, Fowler JH. Social contagion theory: examining dynamic social networks and human behavior. Statistics in Medicine. 2011, to appear. doi: 10.1002/sim.5408.
  5. Cohen-Cole E, Fletcher JM. Is obesity contagious? Social networks vs. environmental factors in the obesity epidemic. Journal of Health Economics. 2008;27(5):1382–1387. doi: 10.1016/j.jhealeco.2008.04.005.
  6. Fowler JH, Christakis NA. Estimating peer effects on health in social networks. Journal of Health Economics. 2008;27(5):1386–1391. doi: 10.1016/j.jhealeco.2008.07.001.
  7. Lyons R. The spread of evidence-poor medicine via flawed social-network analyses. Statistics, Politics and Policy. 2011;2(1):Article 2, 1–26.
  8. Noel H, Nyhan B. The “unfriending” problem: the consequences of homophily in friendship retention for causal estimates of social influence. Social Networks. 2011;33:211–218.
  9. Shalizi CR, Thomas AC. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods and Research. 2011;40:211–239. doi: 10.1177/0049124111404820.
  10. VanderWeele TJ. Sensitivity analysis for contagion effects in social networks. Sociological Methods and Research. 2011;40:240–255. doi: 10.1177/0049124111404821.
  11. VanderWeele TJ, Arah OA. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments and confounders. Epidemiology. 2011;22:42–52. doi: 10.1097/EDE.0b013e3181f74493.
  12. Ver Steeg G, Galstyan A. Ruling out latent homophily in social networks. NIPS Workshop on Social Computing; 2010. URL: http://mlg.cs.purdue.edu/lib/exe/fetch.php?id=schedule&cache=cache&media=machine_learning_group:projects:paper19.pdf.
