Author manuscript; available in PMC: 2019 Dec 28.
Published in final edited form as: Uncertain Artif Intell. 2019 Jul;2019:352.

Intervening on Network Ties

Eli Sherman 1, Ilya Shpitser 2
PMCID: PMC6935346  NIHMSID: NIHMS1063765  PMID: 31885519

Abstract

A foundational tool for making causal inferences is the emulation of randomized control trials via variable interventions. This approach has been applied to a wide variety of contexts, from health to economics [4, 7]. Variable interventions have long been studied in independent and identically distributed (iid) data contexts, but recently non-iid settings, such as networks with interacting agents [9, 20, 32], have attracted interest. In this paper, we propose a type of structural intervention [14] relevant in network contexts: the network intervention. Rather than estimating the effect of changing variables, we consider changes to social network structure resulting from creation or severance of ties between agents. We define the individual participant and average bystander effects for these interventions and describe identification criteria. We then prove a series of theoretical results that show existing identification theory obtains minimally KL-divergent distributions corresponding to network interventions. Finally, we demonstrate estimation of effects of network interventions via a simulation study.

1. INTRODUCTION

Practitioners in applied fields such as medicine, epidemiology, and economics often seek causal understanding of the processes they observe. In turn, an in-depth causal understanding can inform decisions and improve policy. The gold standard for obtaining this understanding is the randomized control trial (RCT). Unfortunately, RCTs are often infeasible due to expense or ethical concerns. The field of causal inference provides a framework for emulating RCTs in such situations, using observational data.

A fundamental element of the causal framework is the notion of interventions. Researchers select one or more ‘treatment’ variables and outcomes of interest. The value of the outcome of interest is estimated under the hypothetical scenario in which the value of the treatment variable is changed to a specific, researcher-chosen value.

A number of assumptions are required for such estimates to have a valid causal interpretation. A common assumption is that data samples are independent and identically distributed (iid), which permits the application of conventional statistical methods. Nevertheless, it is easy to think of cases where this assumption does not hold. Here we focus on domains pertaining to networks of interacting study subjects such as infectious disease spread and social networks. Recently, several papers have proposed methods for obtaining inferences from dependent data [32, 28, 20, 19]. As in the iid setting, these papers emulate RCTs by intervening on variables and estimating the effects on downstream outcomes.

Unfortunately, existing methods for network inference are ill-suited to consider more general changes to the network. For instance, in urban development economics authors have proposed housing vouchers as a ‘treatment’ to incent families to move to neighborhoods with greater opportunity for upward social mobility [4]. Evaluating the effect of extracting a family from one neighborhood and placing them in a new neighborhood with new social connections isn’t possible by considering changes to values of variables alone: the network itself changes.

In this paper we extend the classical causal inference framework to consider changes to social network structure. First, we review different network representations in the causal inference literature as well as notions of interventions that have departed from conventional variable interventions. Next, we give a motivating example based on the global political economy. Extending [14], we propose network interventions: interventions on the structure of a network where ties between units are formed or broken. We define the individual participant and average bystander effects of these interventions, analogous to the network effects described in [9], and discuss identification. We then demonstrate that post-severance distributions satisfy independence constraints for the severed units while remaining minimally KL-divergent from the pre-intervention distributions. Finally, we demonstrate estimation of network intervention effects from observational data via a simulation study.

2. REVIEW

Causal Networks

Since networks are of interest to a variety of fields, there are numerous representations, each with their own advantages and limitations. These representations were developed as a means of studying interference – the phenomenon that arises when neighbors’ treatments causally affect each other’s outcomes. While the present work doesn’t focus explicitly on interference, we discuss it here since our work is complementary to that literature.

A widely used approach, characterized in [20], represents networks with directed acyclic graphs (DAGs), where network connections appear as directed edges from one individual’s variables to another’s. This approach lends a natural causal interpretation that follows from a rich literature on causal DAGs. Importantly, the relationships between individuals are encoded in the functional relationships represented by edges connecting different units; when two individuals are not friends, edges will be absent.

Recent work [32, 28, 18] advocates representing networks with Lauritzen-Wermuth-Frydenberg (LWF) chain graphs (CGs), which were given a causal interpretation in [13]. CGs extend DAGs by permitting representation of symmetric relationships (i.e. stable-state equilibria) via undirected edges. [18] argued that CGs under the LWF interpretation can approximate feedback processes when those processes are slow. While CGs provide a more general representation, their interpretation in the context of the present work is somewhat complicated.

Beyond these notions, there is a substantial literature on probabilistic relational models [11, 6]. These models generalize conventional graphical models by employing first-order logic to describe the nature of relationships between entities. These models have been extended to causal inference in network settings [1]; however, similar to chain graphs, their use in public health contexts like those considered here is not yet well established. For these reasons, we will restrict attention to graphical models.

Aside from graphical representations, a large subset of the interference literature formalizes inter-unit relationships algebraically as in [9]. Many of these formulations could be reformulated using graphical models.

Structural Interventions

The majority of the causal inference literature has focused on hypothetical experiments wherein interventions are made upon variables (e.g. smoking status) and the effects of interventions are considered with respect to some outcome (e.g. lung cancer). The two dominating frameworks, the Neyman-Rubin potential outcomes framework [17, 27] and Pearl’s graph-based framework [21] differ primarily in their philosophical approach, and recently researchers have begun to use their terminology and mechanics interchangeably. See, for example [23]. The causal interpretation of variable interventions under these (and other) frameworks is the subject of literature at the intersection of applied fields and the philosophy of science. These discussions are broad and have a lengthy history. For an incomplete survey, we encourage the interested reader to consult the works of Halpern, Tian, and Pearl [34, 8], and Woodward [35].

In the past two decades, there has been a movement towards defining more general notions of intervention. Korb proposed several generalizations, including interventions that are stochastic with respect to the treatment variable [12]. Eberhardt and Scheines discussed similar ideas, contrasting ‘hard’ and ‘soft’ interventions, corresponding to changing causal structure (e.g. removing edges) and parametric form respectively [5]. They proposed using this continuum of interventions to aid in causal discovery efforts (see also [34]). Malinsky proposed a framework for considering the effects of changes to the structure of a causal model on a ‘macro level’ [14]. In this framework, one modifies structural equations or manipulates parameters in order to evaluate counterfactuals pertaining to the world in which macro level features are different. Finally, unrelated to philosophy, [19] proposes a type of edge intervention in social networks as a means of understanding changes in network ties. In addition, interventions on paths and edges were considered in the context of mediation analysis in [30]. In the current work, we build on these ideas to evaluate general interventions on network ties, enabling us to envision the counterfactual world in which two units are severed or connected.

3. MOTIVATING EXAMPLE: THE POLITICAL ECONOMY

In this section we give a motivating example: a model of trade relations between countries, in which network interventions provide a means of understanding counterfactual changes to network structure.

In global economics, policies made by one country, such as treaties, trade deals, and tariffs, have a direct impact on the nations geographically and diplomatically connected to the policymaker. In light of current events, we refer to the interventions represented in Fig. 1 as the ‘Brexit’ scenario (for severing two or more countries) and the ‘Turkey joins the EU’ scenario (for connecting two or more countries). These types of temporal DAG models, also known as dynamic Bayes nets [16], correspond to time series cross sectional data from the political economy literature [2]. Each country $i$ is represented by temporally sequential observations $Y_{i,1}, Y_{i,2}, \ldots, Y_{i,T}$, where each $Y$ is a vector of economic variables (GDP, unemployment rate, open-market funds rate, etc.).

Figure 1:

(a) A DAG representing time series cross sectional data on three countries where country 2 has a trade agreement with countries 1 and 3; (b) the DAG in (a) after an intervention is performed, severing the alliance between countries 2 and 3 at t = 3.

As a generalization of Fig. 1, we can imagine the network having several countries, each with multiple neighbors. We can then consider the hypothetical effect of a ‘clean break’ at time t between one country and some or all of its neighbors (introducing non-stationarity [26]). This is represented by moving from Fig. 1 (a) to Fig. 1 (b) by severing the connection between countries 2 and 3 at t = 3. We can also consider the reverse intervention, where two previously unconnected countries are connected, corresponding to the signing of a trade agreement. Graphically, this corresponds to moving from Fig. 1 (b) to (a). Using the framework we propose in this paper, a decision-maker could evaluate these hypothetical policies prior to implementation and ensure that they have the intended effect.

4. REPRESENTING NETWORKS WITH CAUSAL DAGs

Throughout this paper, we will consider performing causal inference in social networks represented by DAGs. In this section we formalize the notation needed to define network interventions and their associated effects.

Causal DAG Prerequisites

We follow standard notation from the probabilistic graphical models literature, with vertices and random variables used interchangeably. We will represent random variables and their realizations via capital letters ($V$) and lowercase letters ($v$) respectively. Sets of variables and their realizations will be denoted in boldface ($\mathbf{V}$ and $\mathbf{v}$). We also define shorthand notation for standard graphical notions: parents, $\mathrm{pa}_G(V) \equiv \{W \mid W \to V\}$; ancestors, $\mathrm{an}_G(V) \equiv \{W \mid W \to \cdots \to V\}$; children, $\mathrm{ch}_G(V) \equiv \{W \mid V \to W\}$; descendants, $\mathrm{de}_G(V) \equiv \{W \mid V \to \cdots \to W\}$. Each of these can be generalized disjunctively to sets: $\mathrm{pa}_G(\mathbf{S}) \equiv \bigcup_{V \in \mathbf{S}} \mathrm{pa}_G(V)$. We also define non-descendants $\mathrm{nd}_G(\mathbf{S}) = \mathbf{V} \setminus \mathrm{de}_G(\mathbf{S})$. Finally, $\mathfrak{X}_V$ denotes the state space of the variable $V$.

A statistical DAG model $G$ with vertices $\mathbf{V}$ is associated with a set of probability distributions on random variables in $\mathbf{V}$ that satisfy the factorization:

$$p(\mathbf{V}) = \prod_{V \in \mathbf{V}} p(V \mid \mathrm{pa}_G(V)).$$

In such models, the absences of edges between variables, relative to a complete DAG, encode independences. These correspond to the local Markov property of DAGs: $X \perp\!\!\!\perp \mathrm{nd}_G(X) \setminus \mathrm{pa}_G(X) \mid \mathrm{pa}_G(X)$.

Extending statistical DAGs, a causal DAG model $G$ with vertices $\mathbf{V}$ is associated with a set of distributions on counterfactual variables in $\mathbf{V}$. For $Y \in \mathbf{V}$ and $\mathbf{A} \subseteq \mathbf{V} \setminus Y$, a counterfactual $Y(\mathbf{a})$ describes the value of $Y$ under the hypothetical scenario in which $\mathbf{A}$ is set to $\mathbf{a}$ via an intervention [21]. We will describe generalizations to this convention in the next section.

In this paper, we assume Pearl’s functional model of a DAG $G(\mathbf{V})$. Under this model, if $\mathbf{a}$ are the values of the parents of $V \in \mathbf{V}$, then $V(\mathbf{a})$ is determined by a structural equation $f(\mathbf{a}, \epsilon_V)$, where $f$ is invariant to changes to the values of $\mathbf{a}$ and $\epsilon_V$ is an error term. We will further assume that there are no hidden variables in the models discussed in this paper. Relaxing this assumption for network interventions is the subject of future work.

The above counterfactuals, often referred to as one step ahead counterfactuals, permit us to describe all variables in the model via recursive substitution:

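$$V(\mathbf{a}) \equiv V(\mathbf{a}_{\mathrm{pa}_G(V)}, \{W(\mathbf{a}) : W \in \mathrm{pa}_G(V) \setminus \mathbf{A}\})$$

where $\mathbf{A} \subseteq \mathbf{V}$ and $\mathbf{a} \in \mathfrak{X}_{\mathbf{A}}$.

A parameter in a model is said to be identified if it is expressible as a function of the observed data. In causal DAGs with no unobserved variables, all counterfactual distributions $p(\mathbf{V}(\mathbf{a}))$ are identified by the g-formula [24]:

$$p(\{W(\mathbf{a}) : W \in \mathbf{V} \setminus \mathbf{A}\}) = \prod_{W \in \mathbf{V} \setminus \mathbf{A}} p(W \mid \mathrm{pa}_G(W)) \Big|_{\mathbf{A} = \mathbf{a}}$$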

As an example, consider a single-unit version of Fig. 2. If we are interested in the effect of setting A = a, the interventional distribution p(V(a)) is given by p(Y|A = a,C)p(C).
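
The g-formula for this single-unit example can be sketched numerically. The snippet below is only an illustration: it assumes binary $C$, $A$, $Y$ with edges $C \to A$, $C \to Y$ and $A \to Y$, and the probability tables (`P_C`, `P_A1_GIVEN_C`, `P_Y1_GIVEN_AC`) are hypothetical values that do not come from the paper.

```python
# Hypothetical probability tables for binary C, A, Y (illustration only).
P_C = {0: 0.6, 1: 0.4}                       # p(C = c)
P_A1_GIVEN_C = {0: 0.2, 1: 0.7}              # p(A = 1 | C = c)
P_Y1_GIVEN_AC = {(0, 0): 0.1, (0, 1): 0.3,   # p(Y = 1 | A = a, C = c), keyed (a, c)
                 (1, 0): 0.5, (1, 1): 0.8}


def p_a_given_c(a, c):
    """p(A = a | C = c) for binary A."""
    return P_A1_GIVEN_C[c] if a == 1 else 1.0 - P_A1_GIVEN_C[c]


def g_formula_y1(a):
    """p(Y(a) = 1) = sum_c p(Y = 1 | A = a, C = c) p(C = c)."""
    return sum(P_Y1_GIVEN_AC[(a, c)] * P_C[c] for c in P_C)


def conditional_y1(a):
    """p(Y = 1 | A = a): averages over p(C | A = a) rather than p(C)."""
    joint = {c: p_a_given_c(a, c) * P_C[c] for c in P_C}
    p_a = sum(joint.values())
    return sum(P_Y1_GIVEN_AC[(a, c)] * joint[c] / p_a for c in P_C)


interventional = g_formula_y1(1)   # the factor p(A | C) is dropped
conditional = conditional_y1(1)    # still confounded by C
print(interventional, conditional)
```

With these made-up tables the interventional quantity is about 0.62 while the observational conditional is about 0.71, which is the gap that confounding by $C$ introduces.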

Figure 2:

A simple social network represented by a DAG. The network exhibits unit homogeneity, symmetric connections, and homogeneous connections.

While classical interventions set a variable to a value, we are often interested in how an intervention affects an outcome along multiple pathways, such as the separate effects of smoking, smoke inhalation and nicotine, on a patient’s risk of lung cancer. In these cases, it is natural to think of interventions in which we intervene on the treatment node with different values for each edge out of the node. For instance, we might consider setting smoking status to 0 for the sake of the smoke inhalation edge and to a reference value for the nicotine exposure edge, corresponding to having the patient smoke e-cigarettes.

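Formally, for a set of treatment variables $\mathbf{A}$, the set of edges out of $\mathbf{A}$ is denoted by $\alpha$. Interventions are performed with a multiset $\mathfrak{a}_\alpha$, which maps edges to constant values for $A$ or to the natural value of $A$ for each $A \in \mathbf{A}$. As with node interventions, for $\mathbf{A}_\alpha \equiv \{A \mid (AB)_\to \in \alpha\}$, where $(AB)_\to \in \alpha$ signifies that an edge $A \to B$ is in $\alpha$, edge interventions given by $p(\{W(\mathfrak{a}_\alpha) : W \in \mathbf{V} \setminus \mathbf{A}_\alpha\})$ are identified by the edge g-formula [31] with $\mathrm{pa}^{\bar{\alpha}}_G(V) \equiv \{W \mid (WV)_\to \notin \alpha\}$:

$$\prod_{W \in \mathbf{V} \setminus \mathbf{A}_\alpha} p(W \mid \mathfrak{a}_{(ZW)_\to \in \alpha}, \mathrm{pa}^{\bar{\alpha}}_G(W)). \quad (1)$$

If we again consider a single-unit version of Fig. 2, when we intervene with $\mathfrak{a}_\alpha = \{(CA)_\to = c, (CY)_\to = c'\}$, the distribution $p(\{W(\mathfrak{a}_\alpha) : W \in \mathbf{V} \setminus \mathbf{A}_\alpha\})$ is given by $p(Y \mid A, C = c')\,p(A \mid C = c)$.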
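
As a rough numerical companion to this example, the sketch below evaluates the edge g-formula for the same single-unit graph, marginalizing over $A$ to obtain the outcome distribution. It assumes binary variables, and the tables `P_A1_GIVEN_C` and `P_Y1_GIVEN_AC` are hypothetical; the argument names `c_to_a` and `c_to_y` are introduced here for the values assigned along the edges $C \to A$ and $C \to Y$.

```python
# Hypothetical probability tables for binary C, A, Y (illustration only).
P_A1_GIVEN_C = {0: 0.2, 1: 0.7}              # p(A = 1 | C = c)
P_Y1_GIVEN_AC = {(0, 0): 0.1, (0, 1): 0.3,   # p(Y = 1 | A = a, C = c), keyed (a, c)
                 (1, 0): 0.5, (1, 1): 0.8}


def edge_g_formula_y1(c_to_a, c_to_y):
    """sum_a p(Y = 1 | A = a, C = c_to_y) p(A = a | C = c_to_a)."""
    total = 0.0
    for a in (0, 1):
        p_a = P_A1_GIVEN_C[c_to_a] if a == 1 else 1.0 - P_A1_GIVEN_C[c_to_a]
        total += P_Y1_GIVEN_AC[(a, c_to_y)] * p_a
    return total


# Different values along the two edges out of C.
mixed = edge_g_formula_y1(c_to_a=1, c_to_y=0)
# Same value along both edges: reduces to an ordinary node intervention on C.
node_level = edge_g_formula_y1(c_to_a=1, c_to_y=1)
print(mixed, node_level)
```

Passing the same value for both edges recovers the usual node intervention on $C$, which is one way to see the edge g-formula as a generalization of the g-formula.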

As an alternative generalization to classical interventions, we might be interested in customizing treatments according to unit-specific characteristics. For instance, we might want to choose a cancer patient’s chemotherapy regimen according to the specific characteristics of their tumor. Rather than setting treatments to fixed values, we set them to analyst-specified functions of pre-treatment covariates. This type of policy intervention is the subject of the dynamic treatment regime (DTR) literature [33, 29].

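Formally, for a set of treatment variables $\mathbf{A}$, the set of pre-treatment covariates we wish to use to set each $A \in \mathbf{A}$ is denoted $\mathbf{C}_A$. Policy interventions entail setting $\mathbf{A}$ to the set of functions $\mathbf{f}_{\mathbf{A}}$, where $f_A \in \mathbf{f}_{\mathbf{A}}$ maps $\mathfrak{X}_{\mathbf{C}_A} \to \mathfrak{X}_A$. Responses to policy interventions, $p(\{W(\mathbf{f}_{\mathbf{A}}) : W \in \mathbf{V} \setminus \mathbf{A}\})$, are identified by the policy g-formula [29]:

$$\prod_{W \in \mathbf{V} \setminus \mathbf{A}} p(W \mid \{f_A(\mathbf{C}_A) : A \in \mathbf{A} \cap \mathrm{pa}_G(W)\}, \mathrm{pa}_G(W) \setminus \mathbf{A}) \quad (2)$$

Continuing with our single-unit example for Fig. 2, suppose we are interested in setting $A$ to a policy that is a function of $C$: $f_A(C)$. Then the counterfactual distribution $p(\mathbf{V}(f_A(C)))$ is given by $p(Y \mid A = f_A(C), C)\,p(C)$.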
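
The policy g-formula for this example can be sketched in the same style. The snippet assumes binary $C$, $A$, $Y$ and deterministic policies; the tables `P_C` and `P_Y1_GIVEN_AC` and the two example policies are hypothetical choices made only for illustration.

```python
# Hypothetical probability tables for binary C, A, Y (illustration only).
P_C = {0: 0.6, 1: 0.4}                       # p(C = c)
P_Y1_GIVEN_AC = {(0, 0): 0.1, (0, 1): 0.3,   # p(Y = 1 | A = a, C = c), keyed (a, c)
                 (1, 0): 0.5, (1, 1): 0.8}


def policy_g_formula_y1(f_a):
    """p(Y(f_A) = 1) = sum_c p(Y = 1 | A = f_A(c), C = c) p(C = c)."""
    return sum(P_Y1_GIVEN_AC[(f_a(c), c)] * P_C[c] for c in P_C)


treat_if_c = policy_g_formula_y1(lambda c: c)   # treat only when C = 1
treat_all = policy_g_formula_y1(lambda c: 1)    # constant policy, i.e. set A = 1
print(treat_if_c, treat_all)
```

A constant policy reproduces the ordinary g-formula, so fixed-value interventions appear as a special case of policy interventions.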

DAG Representation of Network Data

In this paper we will represent networks of interacting agents with DAGs following [20]. We will assume each network $G$ is associated with a probability distribution $p(\mathbf{V})$ and that $G$ has a causal interpretation as described above. Denote the set of agents (‘units’ or ‘subjects’) in $G$ by $\mathcal{A}$. $G$ can be partitioned into sub-graphs $G_i$ with variables $\mathbf{V}_i \subseteq \mathbf{V}$ for each agent $i \in \mathcal{A}$. The marginal distribution for agent $i$ is therefore denoted $p(\mathbf{V}_i)$. The notation $-i$ will refer to $\mathcal{A} \setminus i$. Analogously, $G_{-i}$ denotes the subgraph of $G$ where $\mathbf{V}_i$ and its associated edges have been removed.

We define the notion of unit homogeneity. This assumption has two parts: a) if there exists a unit $i \in \mathcal{A}$ with variable $V_i \in \mathbf{V}_i$, then there is a corresponding $V_k \in \mathbf{V}_k$ for all $k \in \mathcal{A}$ with an analogous interpretation; and b) if there exists a unit $i \in \mathcal{A}$ with variables $V_i, U_i \in \mathbf{V}_i$ such that $V_i \in \mathrm{pa}_{G_i}(U_i)$, then $V_k \in \mathrm{pa}_{G_k}(U_k)$ for all $k \in \mathcal{A}$. The first part ensures that units are all of the same ‘type’ (e.g. all agents have the same demographic variables, and the same outcome variable). The second part ensures that the existence of a relationship between one unit’s variables implies the same relationship exists for all other units.

For an example of these definitions, consider Fig. 2. Each unit has a variable of each ‘type’ (e.g. $C$, $A$, $Y$) and the connections between variables are the same for each unit (e.g. $C_i \to A_i$ in all units).

On the network level, we define the notions of connectedness, symmetry of connections, and homogeneity of connections. Two units $i, j \in \mathcal{A}$, with $i \neq j$, are said to be connected if for some $V_i \in \mathbf{V}_i$ and some $U_j \in \mathbf{V}_j$ it is the case that $V_i \in \mathrm{pa}_G(U_j)$. The connection between $i$ and $j$ is said to be symmetric if the vice-versa relationship holds. That is, if $i$ and $j$ are connected and the connection is symmetric then for all $V_i \in \mathrm{pa}_G(U_j)$, we have $V_j \in \mathrm{pa}_G(U_i)$, where $V_i$ is analogous to $V_j$ and $U_i$ is analogous to $U_j$. The set of units connected to unit $i$ in $G$, also referred to as $i$’s neighbors, will be denoted $N_G(i)$. Finally, we define homogeneity of connections, which ensures that the relationships across the network are similar. If $i$ and $j$ are connected and there is an edge between some $V_i \in \mathbf{V}_i$ and some $V_j \in \mathbf{V}_j$, then network connections are homogeneous if for all connected units $k$, $l$ in the network, an edge is present between the analogous $V_k \in \mathbf{V}_k$ and $V_l \in \mathbf{V}_l$.

We further define homogeneity of functional form, which strengthens the notion of homogeneity for connections by imposing that, for any pair of connected nodes, the marginal distribution with respect to those two nodes is the same as the marginal distribution for any other pair of connected nodes (e.g. $p(V_i, V_j) = p(V_k, V_l)$ for all $i \neq j$ and $k \neq l$). Under this assumption, pairwise relationships between units are the same, regardless of the type of unit. This assumption is reasonable in certain applied contexts, such as infectious disease spread, which is governed by a process that operates in the same way for any unit in the population.

For an example of these definitions, once again consider Fig. 2. Connections are symmetric (e.g. $C_1 \to A_2$ and $C_2 \to A_1$) and homogeneous (e.g. $C_1 \to A_2$ and likewise $C_3 \to A_2$).
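
To make the network-level definitions concrete, here is a minimal sketch that encodes a three-unit network as a set of directed edges between `(type, unit)` nodes and checks connectedness, symmetry, and homogeneity of connections. The edge list is an assumption loosely modelled on the description of Fig. 2 (unit 2 neighbors units 1 and 3), and all function names are hypothetical.

```python
from itertools import product

UNITS = (1, 2, 3)
NEIGHBOR_PAIRS = {(1, 2), (2, 3)}               # assumed: unit 2 neighbors 1 and 3
WITHIN = [("C", "A"), ("C", "Y"), ("A", "Y")]   # edge types inside every unit
ACROSS = [("C", "A"), ("C", "Y"), ("A", "Y")]   # edge types between neighbors


def build_edges():
    """Directed edges ((type, unit), (type, unit)) of the assumed network DAG."""
    edges = set()
    for i in UNITS:
        edges |= {((s, i), (t, i)) for s, t in WITHIN}
    for i, j in NEIGHBOR_PAIRS:
        for s, t in ACROSS:
            edges.add(((s, i), (t, j)))
            edges.add(((s, j), (t, i)))
    return edges


def cross_edge_types(edges, i, j):
    """Types (source, target) of edges pointing from unit i into unit j."""
    return {(s[0], t[0]) for s, t in edges if s[1] == i and t[1] == j}


def connected(edges, i, j):
    """Units are connected if some variable of one is a parent of the other's."""
    return i != j and bool(cross_edge_types(edges, i, j) or cross_edge_types(edges, j, i))


def symmetric(edges, i, j):
    """Every edge type from i into j is mirrored by the analogous edge from j into i."""
    return cross_edge_types(edges, i, j) == cross_edge_types(edges, j, i)


def neighbors(edges, i):
    return {j for j in UNITS if connected(edges, i, j)}


def homogeneous_connections(edges):
    """All connected ordered pairs share the same set of cross-unit edge types."""
    pairs = [(i, j) for i, j in product(UNITS, UNITS) if connected(edges, i, j)]
    return len({frozenset(cross_edge_types(edges, i, j)) for i, j in pairs}) <= 1


G = build_edges()
print(neighbors(G, 2), symmetric(G, 1, 2), homogeneous_connections(G))
```

Keying nodes by `(type, unit)` keeps unit homogeneity explicit: two nodes are analogous exactly when their type labels match.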

5. NETWORK INTERVENTIONS

In this section we introduce the notion of network interventions where we intervene on the structure of a network by adding or removing edges, changing relationships between units. We define effects of these interventions and give identification criteria in §6, describe appealing properties of certain network interventions with respect to KL-divergence in §7, and discuss estimation in §8.

Severance Interventions

We will call interventions in which we sever two individuals in a network ‘severance interventions’. For a graph $G$ with pre-intervention distribution $p(\mathbf{V})$, where $\mathbf{V}$ is partitioned by $\{\mathbf{V}_i \mid i \in \mathcal{A}\}$, we denote the intervention severing units $i$ and $j$ by $i \not\sim j$. Graphically, this corresponds to removing all edges between $\mathbf{V}_i$ and $\mathbf{V}_j$, yielding the graph $G_{i \not\sim j}$. We will define responses to severances with respect to individual units (e.g. $p(\mathbf{V}_i(i \not\sim j))$). The joint response is simply the joint distribution over these counterfactuals.
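
The graphical half of a severance, removing every edge that runs between the two units, can be sketched as follows. Nodes are `(type, unit)` pairs, and the small edge set and helper names are hypothetical; the sketch covers only the graph surgery, not the post-intervention distribution.

```python
def sever(edges, i, j):
    """Edge set of the severed graph: drop every edge running between units i and j."""
    return {(src, dst) for src, dst in edges if {src[1], dst[1]} != {i, j}}


def parents(edges, node):
    """Parents of a node in the given edge set."""
    return {src for src, dst in edges if dst == node}


# Hypothetical fragment of a network DAG; nodes are (type, unit) pairs.
edges = {
    (("C", 2), ("A", 2)),
    (("C", 1), ("A", 2)),
    (("C", 3), ("A", 2)),
    (("C", 2), ("A", 3)),
    (("C", 3), ("A", 3)),
}

severed = sever(edges, 2, 3)
print(parents(edges, ("A", 2)))
print(parents(severed, ("A", 2)))
```

Edges within a unit, and edges to units other than the severed pair, are left untouched.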

We propose two different types of severance intervention. Each formulation has a corresponding causal interpretation and one could use either formulation depending on the application.

The first formulation, which we will call ‘value-based’ severance and which is closely tied to classical mediation analysis, generalizes edge interventions [31] to networks. We intervene on variables in an edge-specific manner, replacing cross-unit edges into a unit, say $i$, with synthetic edges into $i$ that represent fixed relationships no longer dependent on variables in the previously connected unit.

For $V_i \in \mathbf{V}$, let $\mathbf{A}_{V_i} = \mathrm{pa}_{G_j}(V_i)$, the parents of $V_i$ in $\mathbf{V}_j$. We consider setting $\mathbf{A}_{V_i}$ to $\mathbf{a}_{V_i}$ for the sake of edges from $\mathbf{A}_{V_i}$ to $V_i$. All other edges out of $\mathbf{A}_{V_i}$ maintain the observed values of their source node so that for all $V_j \in \mathbf{V} \setminus \mathbf{V}_i$, $V_j$’s pre- and post-intervention distributions are the same. Since the intervention values are constant, $i$ and $j$ are no longer connected. Returning to our diplomacy example, one might choose $\mathbf{a}_{V_i}$ to be a reference value in the network, such as network averages of economic variables. Formally,

$$p(V_i(i \not\sim j; \mathbf{a}_{V_i})) = p(V_i(\mathbf{A}_{V_i} = \mathbf{a}_{V_i}, \{W(\mathbf{a}_{V_i}) : W \in \mathrm{pa}_{G_{-j}}(V_i)\}))$$

The second formulation, which we call ‘stochastic’ severance, entails marginalizing out the parents from the severed unit. We phrase these as policy interventions.

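Consider $V_i \in \mathbf{V}_i$ and let $\mathbf{A} = \{A \in \mathbf{V}_i \mid \mathrm{pa}_{G_j}(A) \neq \emptyset\}$ (i.e. $\mathbf{A}$ is the set of unit-$i$ variables with parents in unit $j$). The counterfactual $p(V_i(i \overset{\mathbf{f}_{\mathbf{A}}}{\not\sim} j))$ corresponds to selecting a set of stochastic policies $\mathbf{f}_{\mathbf{A}}$ where each $f_A$ is unit-structure preserving (see below). The counterfactual is given by the recursive formula:

$$p(V_i(i \overset{\mathbf{f}_{\mathbf{A}}}{\not\sim} j)) = f_{V_i}(\{C(\mathbf{f}_{\mathbf{A}}) : C \in \mathrm{pa}_G(A) \setminus \mathrm{pa}_{G_j}(A)\})$$

For a policy $f_{V_i}$ to be unit-structure preserving, $p(V_i \mid \mathrm{pa}_{G_{-j}}(V_i))$ must be the same in the pre- and post-intervention distributions. This ensures that unit $i$’s causal structure is maintained. Formally,

$$f_{V_i}(\{W : W \in \mathrm{pa}_{G_{-j}}(V_i)\}) = \int_{\mathrm{pa}_j(V_i)} p(V_i \mid \mathrm{pa}_G(V_i))\, d(\mathrm{pa}_j(V_i)),$$

where $\mathrm{pa}_j(V_i) = \mathrm{pa}_G(V_i) \cap \mathbf{V}_j$.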

We will argue in §7 that post-severance distributions are minimally KL-divergent from $p(\mathbf{V})$ among the class of distributions corresponding to the DAG with reduced edge set. Specifically, this holds for value-based severances if, instead of fixing values, we allow the source nodes of edge interventions to vary and average over those nodes. Likewise, for stochastic severances, the KL result holds if we pick $f_{V_i}$ such that $V_i$ and its remaining parents $\mathrm{pa}_{G_{i \not\sim j}}(V_i)$ have a particular relationship.

Connection Interventions

We will call interventions in which we adjoin two previously unconnected individuals in a network connection interventions. We will denote the intervention where units $i$ and $j$ are joined by $i \sim j$. Graphically, this corresponds to inserting one or more edges from $G_i$ to $G_j$ or vice-versa, yielding $G_{i \sim j}$. As before, we will define responses to connection interventions with respect to individual units (e.g. $p(\mathbf{V}_i(i \sim j))$). The joint response $p(\mathbf{V}(i \sim j))$ is simply the joint distribution over these counterfactual variables. We describe three separate and increasingly general formulations of connection interventions.

Interventions Under Functional Form Homogeneity

If we assume that the functional forms of network ties are homogeneous, and further assume that each structural equation in the network aggregates arbitrarily many inputs, then the new structural equation for each variable is determined by the equations for the analogous variables in the network.

We might be interested in counterfactual situations that are not present in the observed data, such as the case when connecting two units results in one unit having more neighbors than any unit in the observed data. Because we assume homogeneity of functional form, we can only allow for classes of policies that can flexibly handle an arbitrary number of neighbor nodes.

For the intervention to be well-defined, we must have $f_V \in \mathcal{F}$, where $\mathcal{F}$ is a class of aggregator functions of the form $f(h_U(U_1, U_2, \ldots), h_W(W_1, W_2, \ldots), \ldots)$. Each $h_Z$ maps $\mathbf{Z} \to \mathbb{R}$, where $\mathbf{Z}$ is an arbitrary-sized multiset of $Z$-type variables. In turn, $f$ maps $\mathbf{H} \to \mathbb{R}$, where $\mathbf{H}$ is the arbitrary-sized multiset of outputs from the $h$ functions. For instance, if $V_i$ has parents of types $U, W \in \mathbf{V}$, we might select $h_U$ to output the mean of the $U$’s, $h_W$ to output the median of the $W$’s, and $f$ to output the sum of those two values.

Suppose $i$ and $m$, and $i$ and $k$, are connected in $G$. Then under functional form homogeneity, the relationships between $V_i \in \mathbf{V}_i$ and $U_m \in \mathrm{pa}_{G_m}(V_i)$, and between $V_i$ and the analogous $U_k \in \mathrm{pa}_{G_k}(V_i)$, are governed by a function $f_V$. In the post-intervention distribution, $p(V_i(i \overset{f_V}{\sim} j))$, where units $i$ and $j$ are connected, the relationship between $V_i$ and $U_j$ is also governed by $f_V$. The associated counterfactual is given by:

$$p(V_i(i \overset{f_V}{\sim} j)) = p(V_i = f_V(\{V(i \sim j) : V \in \mathrm{pa}_{G_{i \sim j}}(V_i)\}))$$
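
The aggregator class described above can be illustrated with the mean, median, and sum example from the text. This is a sketch under simplifying assumptions: parent values are plain real numbers, the error term of the structural equation is omitted, and `make_aggregator` is a hypothetical helper name.

```python
from statistics import mean, median


def make_aggregator(h_by_type, f):
    """Build f_V from per-type summaries h_Z and a combiner f.

    Each h_Z accepts a list of any length, so the same f_V applies to a unit
    with any number of neighbors.
    """
    def f_v(parents_by_type):
        summaries = [h_by_type[t](values)
                     for t, values in sorted(parents_by_type.items())]
        return f(summaries)
    return f_v


# h_U = mean of the U's, h_W = median of the W's, f = sum of the two summaries.
f_v = make_aggregator({"U": mean, "W": median}, sum)

before = f_v({"U": [1.0, 3.0], "W": [2.0, 10.0]})            # two U and two W parents
after = f_v({"U": [1.0, 3.0, 5.0], "W": [2.0, 10.0, 4.0]})   # one more of each after connecting
print(before, after)
```

Because the summaries do not depend on how many parents of a type are present, the same function remains defined after a connection gives a unit more neighbors than any unit had in the observed data.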

Intervening With Known Policies

We can relax the assumption of homogeneous network ties by intervening with a known functional form. As with the previous formulation, the analyst is interested in understanding the effect of inducing a specific relationship. Continuing our diplomacy example from §3, consider Turkey as a candidate for EU membership. Since Turkey has a large, robust economy, it may be able to negotiate a more favorable entrance with specific parameters, similar to Switzerland’s non-member bilateral treaties. This formulation represents the inverse operation of function form-based severance interventions.

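We wish to evaluate the effect of connecting units $i$ and $j$ with a known induced relationship. In the pre-intervention distribution, $V_i \in \mathbf{V}_i$ is determined by $f_{V_i}(\mathrm{pa}_G(V_i), \epsilon_{V_i}) \in \mathcal{F}_{V_i}$. For the intervention to be valid, the analyst must specify $f_{V_i} \in \mathcal{F}_{V_i}$ where $\mathcal{F}_{V_i}$ is a family of unit-structure preserving functions. The counterfactual is defined as:

$$p(V_i(i \overset{f_{V_i}}{\sim} j)) = p(V_i = f_{V_i}(\{V(i \sim j) : V \in \mathrm{pa}_{G_{i \sim j}}(V_i)\}))$$

In this context, the notion of a unit-structure preserving policy is the same as before; however, for notational clarity we define $\mathbf{S} = \mathrm{pa}_{G_{i \sim j}}(V_i) \setminus \mathrm{pa}_G(V_i)$, $V_i$’s new parents in the post-intervention graph, and rephrase the definition as:

$$p(V_i \mid \mathrm{pa}_G(V_i)) = \int_{\mathbf{S}} f_{V_i}(\mathrm{pa}_{G_{i \sim j}}(V_i))\, p(\mathbf{S})\, d\mathbf{S} \quad (3)$$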

Intervening with Unknown Policies

In the most general formulation, we do not assume the analyst knows the interventional policy in advance. Instead, we formalize a procedure for picking an optimal policy to govern the relationship between connected units subject to some known constraints. In the example where we consider Turkey joining the EU, this corresponds to the EU and Turkey negotiating a treaty that jointly optimizes their outcomes (e.g. mean per-capita GDP).

Building on the preceding subsection, we can simply express this type of intervention as an optimization on some jointly defined criterion, such as utility, within a class of policies. Let $\mathcal{F}_{V_i}$ and $\mathcal{F}_{V_j}$ be families of unit-structure preserving candidate policies for $V_i$ and $V_j$. Let $\mathcal{C}$ be a known set of constraints that the solution must satisfy (e.g. Turkey cannot trade away more natural resources than it has). Let $g((V_i, V_j)(f_{V_i}, f_{V_j}))$ be a known function that captures the joint outcome for units $i$ and $j$ under a given pair of $f$’s. Then the optimal $f$’s are given by:

$$\arg\max_{f_{V_i} \in \mathcal{F}_{V_i},\, f_{V_j} \in \mathcal{F}_{V_j}} E[g((V_i, V_j)(f_{V_i}, f_{V_j}))] \quad \text{subject to } \mathcal{C}$$

Solving this optimization corresponds to evaluating $p(\mathbf{V}(i \sim j)(f_{V_i}, f_{V_j}))$ for each pair of candidate $f$’s that satisfy $\mathcal{C}$ and picking the best pair.
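
For finite candidate families, the search described above can be sketched as a brute-force enumeration. Everything in the toy usage is hypothetical: candidate policies are reduced to scalar ‘trade share’ parameters, `toy_expected_g` stands in for the expectation that would in practice be computed from the post-intervention distribution, and the constraint is an invented resource cap.

```python
from itertools import product


def best_policy_pair(cands_i, cands_j, expected_g, satisfies_constraints):
    """Return the feasible pair (f_i, f_j) that maximizes expected_g."""
    feasible = [(fi, fj) for fi, fj in product(cands_i, cands_j)
                if satisfies_constraints(fi, fj)]
    if not feasible:
        raise ValueError("no candidate pair satisfies the constraints")
    return max(feasible, key=lambda pair: expected_g(*pair))


# Toy stand-ins (assumptions): each policy is summarized by one share parameter.
shares = (0.0, 0.25, 0.5, 0.75)


def toy_expected_g(s_i, s_j):
    """Hypothetical expected joint outcome with diminishing returns."""
    return 4 * s_i - 2 * s_i ** 2 + 2 * s_j - 2 * s_j ** 2


def toy_constraint(s_i, s_j):
    """Hypothetical resource cap on the combined shares."""
    return s_i + s_j <= 1.0


best = best_policy_pair(shares, shares, toy_expected_g, toy_constraint)
print(best)
```

Exhaustive enumeration is only practical for small candidate sets; richer policy classes would call for a proper constrained optimizer.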

6. EFFECTS AND IDENTIFICATION OF NETWORK INTERVENTIONS

Hudgens and Halloran [9] defined the direct, spillover, and network average effects for interference settings. Respectively, these correspond to the effect on unit $i$’s outcome when $i$’s treatment is modified, the effect on $i$’s outcome when $i$’s neighbor’s treatment is modified, and the average effect on all units’ outcomes when someone’s treatment is modified (i.e. the sum of the direct and spillover effects). Since these effects are defined for a particular type of node intervention, it is necessary to define analogous effects for network interventions.

We define two new effects: the individual participant effect (IPE), and the average bystander effect (ABE). The IPE is defined for units $i$ and $j$ when they are the subjects of a network intervention. $IPE_i$ is the contrast between $i$’s observed and interventional outcomes. For severances (with connections defined analogously), this contrast is given by $IPE_i(i \not\sim j) = Y_i - E[Y_i(i \not\sim j)]$. We can also define the average participant effect (APE) as the mean of $IPE_i$ and $IPE_j$.

The ABE captures the contrast for units not directly involved in a network intervention. By the Markov property of DAGs, for a network intervention on $i$ and $j$, the ABE is non-trivial for $i$ and $j$’s pre-intervention neighbors $(N_G(i) \cup N_G(j)) \setminus \{i, j\}$ (e.g. the other countries $i$ and $j$ have treaties with). For severances,

$$ABE(i \not\sim j) = \frac{1}{|(N_i \cup N_j) \setminus \{i, j\}|} \sum_{k \in (N_i \cup N_j) \setminus \{i, j\}} \left( Y_k - E[Y_k(i \not\sim j)] \right)$$

Connections are defined analogously. Following [9], the average effect on the network (e.g. the effect on the ‘global’ economy) is the sum of APE and ABE.
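
Given observed outcomes and estimated counterfactual means, the effect definitions above reduce to simple arithmetic. The sketch below assumes those two quantities are already available as dictionaries keyed by unit; the numbers and the four-unit chain are hypothetical.

```python
def network_effects(y_obs, y_cf_mean, i, j, neighbors):
    """IPE, APE, ABE and their sum for a network intervention on units i and j.

    y_obs[k] is unit k's observed outcome; y_cf_mean[k] is an estimate of
    E[Y_k] under the intervention.
    """
    ipe_i = y_obs[i] - y_cf_mean[i]
    ipe_j = y_obs[j] - y_cf_mean[j]
    ape = (ipe_i + ipe_j) / 2
    bystanders = (neighbors[i] | neighbors[j]) - {i, j}
    if bystanders:
        abe = sum(y_obs[k] - y_cf_mean[k] for k in bystanders) / len(bystanders)
    else:
        abe = 0.0  # convention chosen here when there are no bystanders
    return {"IPE_i": ipe_i, "IPE_j": ipe_j, "APE": ape, "ABE": abe,
            "network": ape + abe}


# Hypothetical four-unit chain 1 - 2 - 3 - 4 with units 2 and 3 severed.
neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
y_obs = {1: 5.0, 2: 7.0, 3: 4.0, 4: 6.0}
y_cf_mean = {1: 4.0, 2: 8.0, 3: 5.0, 4: 6.0}

effects = network_effects(y_obs, y_cf_mean, 2, 3, neighbors)
print(effects)
```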

Identification

For a given intervention type, if the IPE is identified then the ABE is also identified and vice versa. We therefore focus on the criteria for identification of each type of intervention we’ve discussed.

Under our setup, value-based severance interventions are the network analogue of edge interventions in mediation settings. For a severance of units $i$ and $j$, let $\alpha$ be the set of edges out of $\mathrm{pa}_G(\mathbf{V}_i) \cap \mathbf{V}_j$. If $\mathfrak{a}_\alpha$ specifies a constant value for each edge $V_j \to V_i$ and the source nodes for all other edges in $\alpha$ are left random, then $p(\mathbf{V}_i(i \not\sim j; \mathfrak{a}_\alpha))$ is identified by the edge g-formula (Eq. 1).

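For instance, in Fig. 2, if we are interested in the effect on $Y_2$ of severing units 2 and 3 by setting $\mathbf{A}_{V_2} = \mathbf{a}_{V_2} = \{C_3 = c_3, A_3 = a_3\}$ for the sake of the edges $(C_3 Y_2)_\to$ and $(A_3 Y_2)_\to$, then:

$$p(\mathbf{V}_2(2 \not\sim 3; \mathbf{a}_{V_2})) = p(Y_2 \mid A_1, A_2, C_1, C_2, A_3 = a_3, C_3 = c_3) \times p(A_1 \mid C_1, C_2)\, p(A_2 \mid C_1, C_2, C_3 = c_3)\, p(C_1)\, p(C_2)$$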

The other interventions we define entail a change in the functional form of the variables of interest. Suppose we wish to join units $i$ and $j$ with $\mathbf{A} = \{V \in \mathbf{V}_i \mid \mathrm{pa}_{G_{i \sim j}}(V) \cap \mathbf{V}_j \neq \emptyset\}$ and $\mathbf{C}_A = \{V \in \mathbf{V}_j \mid A \in \mathrm{ch}_{G_{i \sim j}}(V)\}$. Then, if the $f_A \in \mathbf{f}_{\mathbf{A}}$ are all functions that either satisfy the aggregator properties for the homogeneous case or are unit-structure preserving for the non-homogeneous case, the counterfactual $p(V_i(i \overset{\mathbf{f}_{\mathbf{A}}}{\sim} j))$ is identified by the policy g-formula (Eq. 2). Stochastic severances (e.g. of units $i$ and $j$) are also identified under our setup, with $\mathbf{A} = \{A \in \mathbf{V}_i \mid \mathrm{pa}_{G_j}(A) \neq \emptyset\}$, $\mathbf{C}_A = \mathrm{pa}_G(A) \setminus \mathrm{pa}_{G_j}(A)$ for each $A \in \mathbf{A}$, and each $f_A$ satisfying unit-structure preservation. For homogeneous connections, we must also estimate the parameters of each aggregator function ($h_V$, $h_W$, etc.) from observed data. These are identified by maximum likelihood from $G$.

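As an example, if we are interested in performing a stochastic severance on units 2 and 3 in Fig. 2, suppose we set $f_{V_i}(\mathrm{pa}_{G_{2 \not\sim 3}}(V_i)) = p(V_i \mid \mathrm{pa}_G(V_i) \setminus \mathrm{pa}_{G_3}(V_i))$ for each $V_i \in \mathbf{V}_i$. Then the identifying functional for the effect on $\mathbf{V}_2$ is given by:

$$p(\mathbf{V}_2(2 \overset{f_{V_i}}{\not\sim} 3)) = p(Y_2 \mid A_1, A_2, C_1, C_2) \times p(A_1 \mid C_1, C_2)\, p(A_2 \mid C_1, C_2)\, p(C_1)\, p(C_2)$$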
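
The policy used in this example, the conditional distribution given only the remaining parents, can be sketched for a single discrete variable. The snippet is a simplified illustration: it keeps only one own-unit parent $C_2$ and one severed parent $C_3$ of $A_2$, assumes $C_2$ and $C_3$ are independent root nodes so that $p(C_3 \mid C_2) = p(C_3)$, and uses hypothetical probability tables.

```python
# Hypothetical tables for binary variables (illustration only).
P_C3 = {0: 0.5, 1: 0.5}                          # p(C3 = c3), the severed parent
P_A2_1_GIVEN = {(0, 0): 0.1, (0, 1): 0.4,        # p(A2 = 1 | C2 = c2, C3 = c3),
                (1, 0): 0.6, (1, 1): 0.9}        # keyed (c2, c3)


def severed_policy_a2_1(c2):
    """p(A2 = 1 | C2 = c2), obtained by marginalizing out the severed parent C3.

    Relies on the stated assumption that C3 is independent of C2.
    """
    return sum(P_A2_1_GIVEN[(c2, c3)] * P_C3[c3] for c3 in P_C3)


post_severance = {c2: severed_policy_a2_1(c2) for c2 in (0, 1)}
print(post_severance)
```

After the severance, the term for $A_2$ depends on $C_2$ alone, mirroring how $C_3$ drops out of $p(A_2 \mid C_1, C_2)$ in the identifying functional.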

Latent-Variable Network Interventions

Throughout this work, we have assumed that data is representable by a DAG where all variables are observed. We can relax this assumption to allow for models in which some variables are latent. In these cases, the interpretation of the proposed interventions remains the same; however, identification conditions will be modified slightly.

Consider a latent-variable DAG $G(\mathbf{V} \cup \mathbf{H})$ with $\mathbf{V}$ observed and $\mathbf{H}$ hidden. From $G$, we can obtain an acyclic directed mixed graph (ADMG) $\mathcal{G}(\mathbf{V})$ via a latent projection operation [22]. $\mathcal{G}$ represents an equivalence class of graphs that share the same observed variables and set of independence constraints [22].

Identification of network interventions in an ADMG G relies on the assumptions described in the previous section, existing non-parametric identification theory for ADMGs, and the requirement that the network intervention operates only on edges that are present in both G and G. As pointed out previously, value-based severances in DAGs can be identified by the edge g-formula. Under the relaxation allowing for latent variables, value-based severances are instead identifiable according to a version of the ID algorithm adapted to edge interventions, proven sound and complete in [29]. Likewise, for stochastic severances and for connection interventions, if the identification conditions described in the previous sub-section hold, then the respective interventions are identifiable according to a version of the ID algorithm adapted to policy interventions, proven sound and complete in [29]).

7. OPTIMAL CHOICE OF POST-SEVERANCE DISTRIBUTION

In this section we prove a series of results regarding the KL-divergence from a distribution $p(\mathbf{V})$, corresponding to a known DAG $G$, to another distribution $\tilde{p}(\mathbf{V})$, corresponding to a DAG in which edges have been removed. The results demonstrate that the KL-divergence from $p$ to $\tilde{p}$ is minimized when $\tilde{p}$ takes on a form similar to the g-formula [24]. These probabilistic results help justify the g-formula and edge g-formula as intuitive tools for analyzing causal queries in DAGs. Moreover, these results motivate the manner in which we perform severances.

The first result demonstrates that when removing edges between a node $A$ and its parents, a simple modification to the factorization of $G$, namely removing $A$'s parents from the term for $A$, yields the KL-minimal distribution satisfying the independence constraints implied by the severance.

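Theorem 1 Let $\mathbf{V}$ be a set of random variables with $p(\mathbf{V})$ corresponding to a DAG $G$. Let $A \in \mathbf{V}$. Let $\mathcal{P}(\mathbf{V})$ be the set of probability distributions that factorize according to $G$. Then

$$p(A) \prod_{V \in \mathbf{V} \setminus A} p(V \mid \mathrm{pa}_G(V)) = \arg\min_{\tilde{p} \in \mathcal{P}(\mathbf{V})} D_{KL}(p \,\|\, \tilde{p}) \quad \text{s.t.} \quad A \perp\!\!\!\perp \mathrm{pa}_G(A)$$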
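The statement of Theorem 1 can be sanity-checked numerically. The sketch below is a random search, not a proof: it uses a hypothetical three-variable binary DAG (C → A, C → Y, A → Y) with randomly drawn parameters, builds the candidate distribution in which p(A | C) is replaced by the marginal p(A), and compares its KL-divergence from p against randomly drawn competitors that also make A independent of its parent C. All names and parameter values are illustrative assumptions.

```python
import itertools
import math
import random

def rand_prob(rng):
    """A probability bounded away from 0 and 1 so that every KL term is finite."""
    return 0.05 + 0.9 * rng.random()

def bern(p, value):
    return p if value == 1 else 1.0 - p

def build_joint(p_c, p_a_given_c, p_y_given_ac):
    """Joint over (C, A, Y) for the DAG C -> A, C -> Y, A -> Y."""
    return {
        (c, a, y): bern(p_c, c) * bern(p_a_given_c[c], a) * bern(p_y_given_ac[(a, c)], y)
        for c, a, y in itertools.product((0, 1), repeat=3)
    }

def kl(p, q):
    """KL-divergence from p to q over a shared finite state space."""
    return sum(pv * math.log(pv / q[s]) for s, pv in p.items() if pv > 0)

rng = random.Random(0)
p_c = rand_prob(rng)
p_a_c = {c: rand_prob(rng) for c in (0, 1)}
p_y_ac = {(a, c): rand_prob(rng) for a in (0, 1) for c in (0, 1)}
p = build_joint(p_c, p_a_c, p_y_ac)

# Candidate from Theorem 1: replace p(A | C) by the marginal p(A), keep the other terms.
p_a = sum(pv for (c, a, y), pv in p.items() if a == 1)
p_star = build_joint(p_c, {0: p_a, 1: p_a}, p_y_ac)
kl_star = kl(p, p_star)

# Random competitors that also satisfy A independent of C (same A term for both values of C).
competitor_kls = []
for _ in range(2000):
    q_a = rand_prob(rng)
    q = build_joint(rand_prob(rng), {0: q_a, 1: q_a},
                    {(a, c): rand_prob(rng) for a in (0, 1) for c in (0, 1)})
    competitor_kls.append(kl(p, q))

print(f"KL at the Theorem 1 candidate: {kl_star:.6f}")
print(f"smallest KL among random competitors: {min(competitor_kls):.6f}")
```

In this toy setting the KL-divergence splits into one term per factor, which is why no competitor can do better than matching p(C), p(A), and p(Y | A, C) term by term.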

The second result generalizes the first by allowing for edge removal between A and a subset of its parents.

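Theorem 2 Let $\mathbf{V}$ be a set of random variables with $p(\mathbf{V})$ corresponding to a DAG $G$. Let $A \in \mathbf{V}$ and $\mathbf{B} \subseteq \mathbf{V}$ such that $\mathbf{B} \subseteq \mathrm{pa}_G(A)$. Let $\mathcal{P}(\mathbf{V})$ be the set of probability distributions that factorize according to $G$. Then

$$p(A \mid \mathrm{pa}_G(A) \setminus \mathbf{B}) \prod_{V \in \mathbf{V} \setminus A} p(V \mid \mathrm{pa}_G(V)) = \arg\min_{\tilde{p} \in \mathcal{P}(\mathbf{V})} D_{KL}(p \,\|\, \tilde{p}) \quad \text{s.t.} \quad A \perp\!\!\!\perp \mathbf{B} \mid \mathrm{pa}_G(A) \setminus \mathbf{B}$$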

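The following result generalizes the previous theorem to allow for removal of any set of edges in $G$. This result corresponds directly to severance interventions. If we remove the dependence of each variable on the parents for which we remove edges, and otherwise keep the variable functionally consistent with its original structural equation, the result is the minimally KL-divergent distribution from the original distribution that reflects the severance.

Theorem 3 Let $\mathbf{V}$ be a set of random variables with $p(\mathbf{V})$ corresponding to a DAG $G$. Let $\mathbf{A} \subseteq \mathbf{V}$ and for each $A \in \mathbf{A}$ define $\mathrm{In}(A) \subseteq \mathrm{pa}_G(A)$, the set of parents of $A$ whose edges into $A$ we wish to remove. Let $\mathcal{P}(\mathbf{V})$ be the set of probability distributions that factorize according to $G$. Then

$$\prod_{A \in \mathbf{A}} p(A \mid \mathrm{pa}_G(A) \setminus \mathrm{In}(A)) \prod_{V \in \mathbf{V} \setminus \mathbf{A}} p(V \mid \mathrm{pa}_G(V)) = \arg\min_{\tilde{p} \in \mathcal{P}(\mathbf{V})} D_{KL}(p \,\|\, \tilde{p}) \quad \text{s.t.} \quad A \perp\!\!\!\perp \mathrm{In}(A) \mid \mathrm{pa}_G(A) \setminus \mathrm{In}(A) \;\; \forall A \in \mathbf{A}$$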

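The final two results are corollaries of Thm. 3 and are closely related to classical causal inference. The first corresponds to variable interventions where we fix some $\mathbf{A} \subseteq \mathbf{V}$ to a value $\mathbf{a}$. The KL-closest distribution to $p(\mathbf{V})$ is given by the g-formula, where terms for each $A \in \mathbf{A}$ are removed and variables with parents in $\mathbf{A}$ are evaluated with those parents set to $\mathbf{a}$.

Theorem 4 Let $\mathbf{V}$ be a set of random variables with $p(\mathbf{V})$ corresponding to a DAG $G$. Let $\mathbf{A} \subseteq \mathbf{V}$ and assume that for some $\mathbf{a}$ we have $p(\mathbf{A} = \mathbf{a}) > 0$. Let $\mathcal{P}(\mathbf{V})$ be the set of probability distributions that factorize according to $G$. Then

$$\prod_{V \in \mathbf{V} \setminus \mathbf{A}} p(V \mid \mathrm{pa}_G(V)) \Big|_{\mathbf{A} = \mathbf{a}} = \arg\min_{\tilde{p} \in \mathcal{P}(\mathbf{V})} D_{KL}(p \,\|\, \tilde{p}) \quad \text{s.t.} \quad \tilde{p}(A_i \mid \mathrm{nd}_G(A_i)) = \mathbb{I}(A_i = a_i) \;\; \text{for } i = 1, \dots, |\mathbf{A}|$$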

The final result, which can be found in the appendix, generalizes the above theorem to edge interventions [31]. This result corresponds to the value-based formulation of severances. When we fix a set of edges to constant values, the resulting distribution is given by the edge g-formula and is the KL-closest distribution to the pre-intervention distribution that reflects the fact that those edges have been fixed.

8. EXPERIMENTS

We now describe a set of simulation studies which demonstrate the feasibility of obtaining unbiased estimates of the effects of network interventions. In these experiments we assume partial interference: we observe M samples of a network, each with N units. While we do not consider full interference scenarios, in which the analyst has access to only a single sample of the network, similar results could be obtained in that setting using the auto-g-computation algorithm [32]. We also assume that all pre-intervention networks satisfy symmetry of connections, and homogeneity of units, connections, and functional form.

We consider a social network graph resembling Fig. 2 where all variables C, A, and Y are binary. In four separate experiments we demonstrate estimation across varying social network generators, varied attachment probabilities for the Erdős-Rényi generator, varied network sizes, and varied sample sizes. For the latter three experiments we restrict attention to the stochastic severance intervention. For each unit i we generate values for Vi according to log-linear models with parameters τC, τA, τY. For the detailed setup, please see the appendix.

For each experiment we estimate the average IPE by separately applying the intervention to each unit in the network. For severances we remove the connection between the unit of interest and its highest-degree neighbor, while for connections we connect the unit to its highest-degree non-neighbor.
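The choice of which tie to sever or create can be sketched on a plain adjacency-set representation of the network. The function names, the dictionary-of-sets representation, and the tie-breaking rule (smallest label wins among equal degrees) are assumptions made for this illustration; the text does not specify how ties in degree are resolved.

```python
def highest_degree(candidates, adj):
    """Candidate with the largest degree; ties go to the smallest label (assumption)."""
    return min(candidates, key=lambda u: (-len(adj[u]), u))

def sever_top_neighbor(adj, i):
    """Copy of `adj` with the tie between i and its highest-degree neighbor removed."""
    new = {u: set(nbrs) for u, nbrs in adj.items()}
    if not new[i]:
        return new, None  # isolated unit: nothing to sever
    j = highest_degree(adj[i], adj)
    new[i].discard(j)
    new[j].discard(i)
    return new, j

def connect_top_non_neighbor(adj, i):
    """Copy of `adj` with a new tie between i and its highest-degree non-neighbor."""
    new = {u: set(nbrs) for u, nbrs in adj.items()}
    candidates = [u for u in adj if u != i and u not in adj[i]]
    if not candidates:
        return new, None  # unit is already tied to everyone
    j = highest_degree(candidates, adj)
    new[i].add(j)
    new[j].add(i)
    return new, j

# Small undirected example network stored as symmetric adjacency sets.
network = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}, 4: set()}
severed, dropped = sever_top_neighbor(network, 0)
joined, added = connect_top_non_neighbor(network, 0)
print("severed tie 0 -", dropped, "| created tie 0 -", added)
```

Both helpers return a modified copy and leave the input untouched, so the pre- and post-intervention networks can be compared side by side.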

8.1. ESTIMATION AND EVALUATION

For each experiment we first fit models for each variable type given its parents via MLE, where features for neighbor variables are sums of those variables. We estimate values of endogenous nodes by Monte Carlo sampling from these fitted models, and values of exogenous nodes via the empirical distribution. We estimate values in the pre- and post-intervention worlds and report the mean difference between these estimates across all units and all samples of the network. For specific details on the mechanics of each intervention type, please see the appendix.
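The Monte Carlo step can be illustrated with a minimal sketch. It assumes the model parameters have already been fitted (the MLE step is omitted), uses logistic models whose neighbor features are sums over neighbors as described above, and resamples the exogenous C from observed values. The parameter vectors `tau_a` and `tau_y`, their ordering, and all function names are hypothetical; the authors' exact setup is given in their appendix.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def neighbor_sum(values, adj, i):
    """Aggregate feature: sum of a variable over the neighbors of unit i."""
    return sum(values[j] for j in adj[i])

def draw_network(adj, c_observed, tau_a, tau_y, rng):
    """One draw of Y for every unit, given (assumed already fitted) parameters."""
    units = sorted(adj)
    # Exogenous C: resample from the empirical distribution of observed values.
    c = {i: rng.choice(c_observed) for i in units}
    a = {}
    for i in units:
        lin = tau_a[0] + tau_a[1] * c[i] + tau_a[2] * neighbor_sum(c, adj, i)
        a[i] = int(rng.random() < expit(lin))
    y = {}
    for i in units:
        lin = (tau_y[0] + tau_y[1] * c[i] + tau_y[2] * a[i]
               + tau_y[3] * neighbor_sum(c, adj, i) + tau_y[4] * neighbor_sum(a, adj, i))
        y[i] = int(rng.random() < expit(lin))
    return y

def mc_mean_outcome(adj, unit, c_observed, tau_a, tau_y, draws=2000, seed=0):
    """Monte Carlo estimate of E[Y_unit] under the network `adj`."""
    rng = random.Random(seed)
    total = sum(draw_network(adj, c_observed, tau_a, tau_y, rng)[unit] for _ in range(draws))
    return total / draws

def mc_effect(adj_pre, adj_post, unit, c_observed, tau_a, tau_y, draws=2000, seed=0):
    """Post-intervention minus pre-intervention Monte Carlo mean of Y for `unit`."""
    post = mc_mean_outcome(adj_post, unit, c_observed, tau_a, tau_y, draws, seed)
    pre = mc_mean_outcome(adj_pre, unit, c_observed, tau_a, tau_y, draws, seed)
    return post - pre

pre = {0: {1}, 1: {0, 2}, 2: {1}}
post = {0: {1}, 1: {0}, 2: set()}  # tie between units 1 and 2 severed
c_obs = [0, 1, 1, 0, 1]
tau_a = (-0.5, 1.0, 0.5)
tau_y = (-1.0, 0.5, 1.0, 0.3, 0.6)
effect = mc_effect(pre, post, 1, c_obs, tau_a, tau_y)
print(f"estimated effect on unit 1: {effect:.4f}")
```

Reusing the same seed for the pre- and post-intervention runs is a design choice of this sketch (common random numbers); it reduces Monte Carlo noise in the difference but is not something the text prescribes.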

To evaluate the performance of this estimation technique, we generate 'ground truth' graphs corresponding to the result of each intervention and generate values for the $Y_i$'s of interest. For each simulated network we generated 1000 bootstrap samples. We compare the estimated intervention effects to the ground truth effects and obtain the bias of our approach. As presented in Tables 1 and 3–5, the 95% confidence interval for the bias in each experiment covers zero, and thus shows that the effects of network interventions can be consistently estimated.
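A confidence interval for the bias can be obtained in several ways; the sketch below uses a percentile bootstrap over per-replicate errors (estimate minus ground truth). The percentile method, the function name `bootstrap_bias_ci`, and the toy inputs are assumptions for illustration; only the use of 1000 bootstrap samples is taken from the text.

```python
import random
import statistics

def bootstrap_bias_ci(estimates, truths, n_boot=1000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of (estimate - truth)."""
    rng = random.Random(seed)
    errors = [e - t for e, t in zip(estimates, truths)]
    n = len(errors)
    # Resample the errors with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(errors, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 + level) / 2 * n_boot))
    return means[lo_idx], means[hi_idx]

# Toy inputs: estimates scattered symmetrically around a true effect of zero.
estimates = [0.01, -0.01] * 50
truths = [0.0] * 100
lo, hi = bootstrap_bias_ci(estimates, truths)
print(f"95% CI for bias: ({lo:.4f}, {hi:.4f})")
```

An interval that covers zero is compatible with an unbiased estimator, which is how the intervals in Table 1 are read.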

Table 1: 95% confidence intervals for the bias of estimates of each type of network intervention.

                         95% Confidence Intervals of Bias
Intervention             Erdős-Rényi        Barabási-Albert    Watts-Strogatz
Homogeneous Connection   (−.0049, .0020)    (−.0021, .0006)    (−.0024, .0010)
Known Connection         (−.0014, .0010)    (−.0004, .0016)    (−.0018, .0020)
Unknown Connection       (−.0035, .0025)    (−.0134, .0124)    (−.0280, .0093)
Stochastic Severance     (−.0015, .0043)    (−.0096, .0066)    (−.0032, .0020)
Value Severance          (−.0088, .0112)    (−.0010, .0020)    (−.0048, .0016)

9. DISCUSSION

In this paper we proposed a framework for intervening on the structure of a social network graph by severing or creating connections between subjects. We defined effects that extend the network effects defined in [9]. We then proved that for severances, and causal interventions generally, the g-formula and edge g-formula obtain distributions that are minimally KL-divergent from the pre-intervention distribution subject to the independence constraint imposed by the intervention. Finally, we demonstrated that these effects can be estimated from observational data via a simulation study.

In the future, this framework could be generalized to chain graph models to allow for more flexibility of network representation.

Supplementary Material

Appendix

ACKNOWLEDGEMENTS

The authors would like to express their gratitude to Jiji Zhang for proposing an earlier version of Theorem 1, inspiring the other theoretical results given in this paper.

This work is sponsored in part by the National Institutes of Health grant R01 AI127271-01 A1, the Office of Naval Research grant N00014-18-1-2760 and the Defense Advanced Research Projects Agency (DARPA) under contract HR0011-18-C-0049. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

Contributor Information

Eli Sherman, Department of Computer Science, Johns Hopkins University, Baltimore, MD.

Ilya Shpitser, Department of Computer Science, Johns Hopkins University, Baltimore, MD.

References

  • [1]. Arbour D, Garant D, and Jensen D. Inferring network effects from observational data. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 715–724. ACM, 2016.
  • [2]. Beck N and Katz JN. Modeling dynamics in time-series–cross-section political economy data. Annual Review of Political Science, 14:331–352, 2011.
  • [3]. Chen HY. A semiparametric odds ratio model for measuring association. Biometrics, 63:413–421, 2007.
  • [4]. Chetty R, Hendren N, and Katz LF. The effects of exposure to better neighborhoods on children: New evidence from the moving to opportunity experiment. American Economic Review, 106(4):855–902, 2016.
  • [5]. Eberhardt F and Scheines R. Interventions and causal inference. Philosophy of Science, 74(5):981–995, 2007.
  • [6]. Friedman N, Getoor L, Koller D, and Pfeffer A. Learning probabilistic relational models. In IJCAI, volume 99, pages 1300–1309, 1999.
  • [7]. Glass TA, Goodman SN, Hernán MA, and Samet JM. Causal inference in public health. Annual Review of Public Health, 34:61–75, 2013.
  • [8]. Halpern JY and Pearl J. Causes and explanations: A structural-model approach. Part I: Causes. The British Journal for the Philosophy of Science, 56(4):843–887, 2005.
  • [9]. Hudgens MG and Halloran ME. Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832–842, 2008.
  • [10]. Jones E, Oliphant T, and Peterson P. SciPy: Open source scientific tools for Python. 2014.
  • [11]. Koller D, Friedman N, Džeroski S, Sutton C, McCallum A, Pfeffer A, Abbeel P, Wong M-F, Heckerman D, Meek C, et al. Introduction to statistical relational learning. MIT Press, 2007.
  • [12]. Korb KB, Hope LR, Nicholson AE, and Axnick K. Varieties of causal intervention. In Pacific Rim International Conference on Artificial Intelligence, pages 322–331. Springer, 2004.
  • [13]. Lauritzen SL and Richardson TS. Chain graph models and their causal interpretations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):321–348, 2002.
  • [14]. Malinsky D. Intervening on structure. Synthese, 195(5):2295–2312, May 2018.
  • [15]. Malinsky D, Shpitser I, and Richardson T. A potential outcomes calculus for identifying conditional path-specific effects. In Chaudhuri K and Sugiyama M, editors, Proceedings of Machine Learning Research, volume 89 of Proceedings of Machine Learning Research, pages 3080–3088. PMLR, 16–18 Apr 2019.
  • [16]. Murphy KP and Russell S. Dynamic Bayesian networks: representation, inference and learning. 2002.
  • [17]. Neyman J. Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. Roczniki Nauk Rolniczych, 10:1–51, 1923.
  • [18]. Ogburn EL, Shpitser I, and Lee Y. Causal inference, social networks, and chain graphs. arXiv preprint arXiv:1812.04990, 2018.
  • [19]. Ogburn EL, Sofrygin O, Diaz I, and van der Laan MJ. Causal inference for social network data. arXiv preprint arXiv:1705.08527, 2017.
  • [20]. Ogburn EL, VanderWeele TJ, and others. Causal diagrams for interference. Statistical Science, 29(4):559–578, 2014.
  • [21]. Pearl J. Causality. Cambridge University Press, 2009.
  • [22]. Richardson TS, Evans RJ, Robins JM, and Shpitser I. Nested Markov properties for acyclic directed mixed graphs. arXiv preprint arXiv:1701.06686, 2017.
  • [23]. Richardson TS and Robins JM. Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. Center for the Statistics and the Social Sciences, University of Washington Series. Working Paper, 128(30):2013, 2013.
  • [24]. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9–12):1393–1512, 1986.
  • [25]. Robins J, Richardson T, and Spirtes P. On identification and inference for direct effects. Epidemiology, 2009.
  • [26]. Robinson JW and Hartemink AJ. Non-stationary dynamic Bayesian networks. In Advances in Neural Information Processing Systems, pages 1369–1376, 2009.
  • [27]. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688, 1974.
  • [28]. Sherman E and Shpitser I. Identification and estimation of causal effects from dependent data. In Advances in Neural Information Processing Systems, pages 9445–9456, 2018.
  • [29]. Shpitser I and Sherman E. Identification of personalized effects associated with causal pathways. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence, 2018.
  • [30]. Shpitser I and Tchetgen ET. Causal inference with a graphical hierarchy of interventions. Annals of Statistics, 44(6):2433, 2016.
  • [31]. Shpitser I and Tchetgen ET. Causal inference with a graphical hierarchy of interventions. Annals of Statistics, 44(6):2433, 2016.
  • [32]. Tchetgen EJT, Fulcher I, and Shpitser I. Auto-G-Computation of Causal Effects on a Network. arXiv preprint arXiv:1709.01577, 2017.
  • [33]. Tian J. Identifying dynamic sequential plans. arXiv preprint arXiv:1206.3292, 2012.
  • [34]. Tian J and Pearl J. Causal discovery from changes. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 512–521. Morgan Kaufmann Publishers Inc., 2001.
  • [35]. Woodward J. Causation and manipulability. 2001.
