Abstract
Exponential random graph models (ERGMs), also known as p* models, have been utilized extensively in the social science literature to study complex networks and how their global structure depends on underlying structural components. However, the literature on their use in biological networks (especially brain networks) has remained sparse. Descriptive models based on a specific feature of the graph (clustering coefficient, degree distribution, etc.) have dominated connectivity research in neuroscience. Corresponding generative models have been developed to reproduce one of these features. However, the complexity inherent in whole-brain network data necessitates the development and use of tools that allow the systematic exploration of several features simultaneously and how they interact to form the global network architecture. ERGMs provide a statistically principled approach to the assessment of how a set of interacting local brain network features gives rise to the global structure. We illustrate the utility of ERGMs for modeling, analyzing, and simulating complex whole-brain networks with network data from normal subjects. We also provide a foundation for the selection of important local features through the implementation and assessment of three selection approaches: a traditional p-value based backward selection approach, an information criterion approach (AIC), and a graphical goodness of fit (GOF) approach. The graphical GOF approach serves as the best method given the scientific interest in being able to capture and reproduce the structure of fitted brain networks.
Introduction
Brain networks
Whole-brain connectivity analyses are gaining prominence in the neuroscientific literature due to the need to understand how various regions of the brain interact with one another. The inherent complexity in the way these regions interact necessitates studying the brain as a whole rather than just its individual parts. The application of network and graph theory to the brain has facilitated these whole-brain analyses and helped to uncover new insights into the structure and function of the nervous system. Structural and functional connectivity studies have revealed that the brain exhibits small-world properties [1]–[4]. These properties are characterized by tight local clustering and efficient long-distance connections, as described in the seminal work of [5]. Network models based on a given small-world property or other local property (e.g., node degree (k)) have mostly been utilized as a means to describe various brain networks. However, to gain deeper insights into the complex neurobiological interactions and changes that occur in many neurological conditions and disorders, analysis methods that systematically assess several properties simultaneously are needed, given the statistical dependencies among these properties [6], [7]. The exponential random graph models discussed in this paper provide one such analysis approach.
Exponential random graph models
Exponential random graph models (ERGMs), also known as p* models [8]–[11], have been utilized extensively in the social science literature to analyze complex network data, as discussed in [12], [13] and others. However, the literature on their use in biological networks (especially brain networks) has remained sparse. Descriptive models based on a specific feature of the network, such as characteristic path length (L) and clustering coefficient (C), have dominated connectivity research in neuroscience [14]. The few inferential studies have employed relatively rudimentary testing techniques, such as the ANOVA used in [15], to examine group differences based on one of these features. ERGMs provide a statistically principled approach to the systematic exploration of several features simultaneously and how they interact to form the global network architecture. They allow one to model parsimoniously the probability mass function (pmf) for a given class of graphs based on a set of explanatory metrics (local features). The pmf can then be used to determine the probability that any given graph is drawn from the same distribution as the observed graph. These models provide an efficient representation of complex network data structures and allow examining the way in which a network's global structure and function depend on its local structure. That is, they provide a means of assessing how and to what extent combinations of local (brain) structures produce global network properties.
In ERGMs, networks are analogous to a multivariate response variable in regression analysis, with the explanatory metrics quantifying local features of the network such as how clustered connections are (short-distance communication) or how well the network transmits information globally (long-distance communication). Fitted parameter values from the model can then be utilized to understand particular emergent behaviors of the network (how local features give rise to the global structure). These values can also be used to simulate random realizations of networks that retain constitutive characteristics of the original network.
A more intuitive way to view ERGMs in the brain network context is as models that quantify the relative significance of various graph/network measures (C, L, k, etc.), or their analogues, in explaining the overall network structure, thus enabling generative conjectures about global architecture. These models provide several benefits for brain network researchers. They allow asking specific questions about processes that may give rise to the network architecture via the inclusion of explanatory metrics of choice. ERGMs inherently account for confounding bias, such as the (N, K)-dependence of network measures detailed in [6] (where N is the number of nodes and K the average degree), when the potential confounding variables are included in the model. The stochastic nature of the model allows understanding and quantifying the uncertainty (an intrinsic feature of complex biological processes) associated with our observed brain network(s) [12]. Simulations based on ERGM fits to brain networks (sets of selected network measures and their parameter estimates) can provide insight into biological variability via the distribution of possible brain networks produced. However, the computational intensiveness of fitting ERGMs may currently preclude their use with very large networks (e.g., voxel-based networks with tens of thousands of nodes) and certain combinations of network measures.
Here we illustrate the utility of ERGMs for modeling, analyzing, and simulating complex whole-brain networks. We also provide a foundation for the development of a “best assessment” ERGM for analyzing complex brain networks. Because local features depend on each other, appropriate statistical comparisons between networks (or groups of networks) via ERGMs necessitate establishing one model (set of explanatory metrics/local features) so that the extracted parameter estimates are comparable. Toward this end, we assess three potential methods of feature selection for ERGMs in the brain network context. These approaches include a traditional p-value based backward selection approach, an information criterion approach (AIC), and a graphical goodness of fit (GOF) approach. Although the latter two techniques have been discussed in the context of ERGMs [16], [17], no detailed comparisons have been performed to determine whether the approaches generally produce the same “best” model/set of features.
Materials and Methods
Ethics statement
This study included 10 volunteers representing a subset of a previous study [18]. The study protocol, including all analyses performed here, was approved by the Wake Forest University School of Medicine Institutional Review Board. All subjects gave written informed consent in accordance with the Declaration of Helsinki.
Data and network construction
Our data include whole-brain functional connectivity networks for 10 normal subjects aged 20–35 (5 female; average age 27.7 years [4.7 SD]). Each network consists of 90 nodes corresponding to the 90 brain regions of interest (ROIs) defined by the Automated Anatomical Labeling atlas (AAL; [19]). The whole-brain networks were constructed from fMRI images using graph theory methods. For each subject, 120 images were acquired during 5 minutes of rest using a gradient echo echoplanar imaging (EPI) protocol with TR/TE = 2500/40 ms on a 1.5 T GE twin-speed LX scanner with a birdcage head coil (GE Medical Systems, Milwaukee, WI). The acquired images were motion corrected, spatially normalized to the MNI (Montreal Neurological Institute) space, and re-sliced to 4 mm voxel size using an in-house processing script based on the SPM99 package (Wellcome Trust Centre for Neuroimaging, London, UK). The resulting images were not smoothed, to avoid artificially introducing local spatial correlation [20].
The first step in network construction was to generate a whole-brain connectivity matrix, or adjacency matrix A. This is an N × N binary matrix, where N = 90 is the number of nodes (one per ROI). The matrix records the presence or absence of a connection between any two nodes (i and j). Whether a connection exists between i and j was determined by calculating a partial correlation coefficient adjusted for motion and physiological noise (see [21] for further details).
An unweighted, undirected network was then generated for each subject by applying a threshold to the correlation matrix to yield the adjacency matrix A. To compare data across people, it is necessary to generate comparable networks. The network was therefore defined so that the relationship between the number of nodes N and the average node degree K is the same across subjects. In particular, the threshold was chosen so that S = log(N)/log(K) takes the same value for every subject. This relationship is based on the path length of a random network with N nodes and average degree K [4], [5], and can be rewritten as K = N^(1/S). Our analysis includes ERGM fits to these thresholded whole-brain functional connectivity networks for each subject (an example of which is shown in Figure 1).
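The thresholding rule above can be illustrated with a short sketch. It assumes one simple implementation that the text does not specify: rank the off-diagonal correlations and keep the strongest E = round(N·K/2) of them, because the mean degree of an undirected graph is 2E/N. The function names are hypothetical. The value S = 2.5 in the usage line is an arbitrary example, not the value used in the study.

```python
def target_mean_degree(n_nodes: int, s: float) -> float:
    """Invert S = log(N) / log(K) to obtain the target mean degree K = N**(1/S)."""
    return n_nodes ** (1.0 / s)


def threshold_to_mean_degree(corr: list[list[float]], mean_degree: float) -> list[list[int]]:
    """Binarize a symmetric correlation matrix by keeping its strongest
    off-diagonal entries so that the mean degree 2E/N is as close as possible
    to `mean_degree` (ties are broken by node index)."""
    n = len(corr)
    n_edges = round(n * mean_degree / 2)  # mean degree = 2E / N
    pairs = sorted(
        ((corr[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    adj = [[0] * n for _ in range(n)]
    for _, i, j in pairs[:n_edges]:
        adj[i][j] = adj[j][i] = 1  # undirected: keep the matrix symmetric
    return adj
```

Usage, for a correlation matrix `corr` of any size: `adj = threshold_to_mean_degree(corr, target_mean_degree(len(corr), 2.5))`. Fixing S rather than a correlation cutoff means that every subject's network has the same mean degree for a given N, which is the comparability property described above.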
Model definition
Exponential random graph models have the following form [22]:
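$$P(\mathbf{Y}=\mathbf{y}\mid\boldsymbol{\theta}) = \frac{\exp\{\boldsymbol{\theta}^{\top}\mathbf{g}(\mathbf{y})\}}{\kappa(\boldsymbol{\theta})} \qquad (1)$$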
Here Y is an n × n (n nodes) random symmetric adjacency matrix representing a brain network from a particular class of networks, with Y_ij = 1 if an edge exists between nodes i and j and Y_ij = 0 otherwise, and y is a particular realization of Y. Nodes represent locations in the brain (e.g., ROIs), and edges represent functional or structural connections between them. We statistically model the probability mass function (pmf) of this class of networks as a function of the prespecified network features defined by the p-dimensional vector g(y). This vector of explanatory metrics consists of covariates that are functions of the network and can contain any graph statistic (e.g., number of paths of length two) or node statistic (e.g., brain location of the node). The parameter vector θ, associated with g(y), quantifies the relative significance of the network features in explaining the structure of the network after accounting for the contribution of all other network features in the model, and must be estimated. More specifically, each component of θ indicates the change in the log odds of an edge existing, conditional on the rest of the network, for each unit increase in the corresponding explanatory metric. If the θ value corresponding to a given metric is large and positive, then that metric plays a considerable role in explaining the network architecture and is more prevalent than in the null model (a random network in which every edge exists independently with the same probability). Conversely, if the θ value is large and negative, then that metric still plays a considerable role in explaining the network architecture but is less prevalent than in the null model. Consequently, inferences can be made about whether certain local features/substructures are observed in the network more than would be expected by chance, enabling hypothesis development regarding the biological processes that produce these structural properties. The normalizing constant κ(θ) = Σ_y exp{θᵀg(y)}, a sum over all possible networks in the class, ensures that the probabilities sum to one.
This approach allows representing the global network structure by locally specified explanatory metrics, thus providing a means to examine the nature of networks that are likely to emerge from these effects.
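To make equation 1 concrete, the sketch below evaluates it exactly for a toy model with g(y) = (Edges, 3-Cycles) on 4-node graphs. It does this by enumerating all 2^6 = 64 possible graphs. Exact evaluation is only feasible for tiny networks. A 90-node network has 90·89/2 = 4005 dyads and therefore 2^4005 possible graphs, so κ(θ) cannot be computed directly, which is why approximate fitting methods are needed. The choice of metrics and the function names are illustrative assumptions.

```python
import itertools
import math


def triangle_count(edges: set, n: int) -> int:
    """Number of 3-cycles in an undirected graph given as a set of (i, j), i < j."""
    return sum(
        1
        for i, j, k in itertools.combinations(range(n), 3)
        if (i, j) in edges and (i, k) in edges and (j, k) in edges
    )


def ergm_pmf(theta: tuple, n: int = 4) -> dict:
    """Exact pmf of equation (1) with g(y) = (edges, triangles), by enumeration.

    Keys are 0/1 masks over the dyads (0,1), (0,2), ..., (n-2,n-1).
    Feasible only for tiny n: there are 2**(n*(n-1)/2) graphs to sum over.
    """
    dyads = list(itertools.combinations(range(n), 2))
    weights = {}
    for mask in itertools.product((0, 1), repeat=len(dyads)):
        edges = {d for d, on in zip(dyads, mask) if on}
        g = (len(edges), triangle_count(edges, n))
        weights[mask] = math.exp(theta[0] * g[0] + theta[1] * g[1])
    kappa = sum(weights.values())  # normalizing constant kappa(theta)
    return {mask: w / kappa for mask, w in weights.items()}
```

With θ = (0, 0), every graph receives probability 1/64. If only the Edges parameter is nonzero, the pmf factorizes into independent Bernoulli edges.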
The goal in defining g is to identify local metrics that concisely summarize the global (whole-brain) network structure. Table 1 defines a subset of mathematically compatible explanatory network metrics (for further details see [16], [23], [24]). Several analogs to these metrics for directed graphs have been detailed by [25]. The GWD, GWESP, and GWDSP statistics discussed in [17] help address degeneracy issues illuminated in [22] and [26]. These issues concern the shape of the estimated pmf (e.g., a pmf that places nearly all of its probability on a few graphs, such as the empty or complete graph) and can lead to lack of model convergence and unreliable results. As noted by [16], [27], the most appropriate explanatory metrics vary by network type. Thus, an exploration of which network metrics best characterize brain networks has great appeal. Once the most appropriate statistics have been established, parameter profiles can be utilized to classify and compare whole-brain networks. These parameter profile comparisons require the use of a uniform set of explanatory metrics for all networks (due to metric interdependencies) and balanced networks (same number of nodes for all networks) due to the dependence of the metrics on network size.
Table 1. Subset of explanatory network metrics.
Metric | Description |
Edges | Number of edges in the network |
Two-Path | Number of paths of length 2 in the network |
k-Cycle | Number of k-cycles in the network |
k-Degree | Number of nodes with degree k |
Geometrically weighted degree (GWD) | Weighted sum of the counts of nodes of each degree i (D_i), weighted by the geometric sequence 1 − (1 − e^(−τ))^i, where τ is a decay parameter |
Geometrically weighted edge-wise shared partner (GWESP) | Weighted sum of the number of connected node pairs having exactly i shared partners (EP_i), weighted by the geometric sequence 1 − (1 − e^(−τ))^i, where τ is a decay parameter |
Geometrically weighted non-edge-wise shared partner (GWNSP) | Weighted sum of the number of non-connected node pairs having exactly i shared partners (NP_i), weighted by the geometric sequence 1 − (1 − e^(−τ))^i, where τ is a decay parameter |
Geometrically weighted dyad-wise shared partner (GWDSP) | Weighted sum of the number of dyads^a having exactly i shared partners (DP_i), weighted by the geometric sequence 1 − (1 − e^(−τ))^i, where τ is a decay parameter |
Nodematch | Number of edges (i, j) for which nodal attribute x_i equals nodal attribute x_j (e.g., brain location of node i = brain location of node j) |
^a Dyad: node pair with or without an edge.
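The simpler metrics in Table 1 can be computed directly from an adjacency matrix, as in the following sketch. The geometric weighting in `gwd` uses the form e^τ Σ_i [1 − (1 − e^(−τ))^i] D_i. This is assumed to match the geometrically weighted degree term in statnet's ergm package, so treat the exact normalization as an assumption. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def from_edges(n, edge_list):
    """Symmetric 0/1 adjacency matrix (list of lists) from an edge list."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edge_list:
        adj[i][j] = adj[j][i] = 1
    return adj


def degrees(adj):
    return [sum(row) for row in adj]


def edges(adj):
    return sum(degrees(adj)) // 2


def two_paths(adj):
    # a node of degree d is the middle node of d*(d-1)/2 paths of length 2
    return sum(d * (d - 1) // 2 for d in degrees(adj))


def k_degree(adj, k):
    return sum(1 for d in degrees(adj) if d == k)


def three_cycles(adj):
    n = len(adj)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if adj[i][j] and adj[j][k] and adj[i][k])


def gwd(adj, tau):
    # assumed form: e^tau * sum_{i>=1} [1 - (1 - e^-tau)^i] * D_i
    r = 1.0 - math.exp(-tau)
    counts = Counter(degrees(adj))
    return math.exp(tau) * sum((1.0 - r ** i) * c for i, c in counts.items() if i >= 1)


def nodematch(adj, attr):
    # edges whose two endpoints share the same nodal attribute (e.g., brain location)
    n = len(adj)
    return sum(1 for i, j in combinations(range(n), 2)
               if adj[i][j] and attr[i] == attr[j])
```

For example, a triangle on nodes 0, 1, 2 with a pendant node 3 attached to node 2 has 4 edges, 5 two-paths, and one 3-cycle.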
It is important to note that ERGMs can be thought of as a way of parameterizing models for networks, and are not a “kind” of network model in the way “model” is traditionally used in the brain network literature. Most other network models, in theory, should have an equivalent ERGM expression (though that specific expression may not be convenient, parsimonious, etc.). For instance, an ERGM with just the Edges metric (Table 1) in the formulation is equivalent to the Erdős–Rényi model. Thus, ERGMs allow parameterizations that subsume most (if not all) other network models.
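The Erdős–Rényi equivalence can be checked directly. With g(y) equal to the edge count, equation 1 factorizes over dyads, so each edge is independent with probability p = e^θ/(1 + e^θ). The MLE of θ is then the log odds of the observed density. The sketch assumes a hypothetical 90-node network with 225 edges (mean degree 5). These numbers are illustrative, not study data.

```python
import math


def edges_only_mle(n_nodes: int, n_edges: int) -> float:
    """MLE of theta for an ERGM whose only term is Edges.

    The model is Erdos-Renyi with p = exp(theta) / (1 + exp(theta)),
    so the MLE is the logit of the observed edge density.
    """
    n_dyads = n_nodes * (n_nodes - 1) // 2
    density = n_edges / n_dyads
    return math.log(density / (1.0 - density))


def edge_probability(theta: float) -> float:
    """Inverse logit: probability of any single edge under the Edges-only model."""
    return 1.0 / (1.0 + math.exp(-theta))


theta_hat = edges_only_mle(90, 225)  # hypothetical network, mean degree 5
print(round(theta_hat, 4))
```

For the hypothetical network the estimate is about −2.82, and `edge_probability(theta_hat)` recovers the density 225/4005.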
Fitting of the ERGM in equation 1 is normally done with either Markov chain Monte Carlo maximum likelihood estimation (MCMC MLE) or maximum pseudo-likelihood estimation (MPLE) ([28] contains details). Model fits with MPLE are much simpler computationally than MCMC MLE fits and afford higher convergence rates with large networks. However, the properties of MPLE estimators are not well understood, and the estimates tend to be less accurate than those of MCMC MLE. Here we employ MCMC MLE to fit the model in equation 1, since it converged without issues for our networks. See [29] for further details about this estimation approach, which is implemented in the statnet package [23] for the R statistical computing environment.
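MPLE replaces the likelihood with the product of each dyad's conditional probability given the rest of the network. In practice this is a logistic regression of every Y_ij on its change statistics δ_ij(y) = g(y with edge ij) − g(y without edge ij). The sketch below implements that idea with Newton–Raphson in plain Python. It is an illustration under these assumptions, not the statnet implementation; in the ergm R package, MPLE can be requested with `estimate = "MPLE"`. Here `stats` is any user-supplied function that returns g(y) as a list.

```python
import math
from itertools import combinations


def change_stats(adj, stats, i, j):
    """delta_ij(y) = g(y with edge ij) - g(y without edge ij); adj is restored."""
    old = adj[i][j]
    adj[i][j] = adj[j][i] = 1
    with_edge = stats(adj)
    adj[i][j] = adj[j][i] = 0
    without_edge = stats(adj)
    adj[i][j] = adj[j][i] = old
    return [a - b for a, b in zip(with_edge, without_edge)]


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def mple(adj, stats, max_iter=50, tol=1e-10):
    """Maximum pseudo-likelihood estimate via Newton-Raphson logistic regression."""
    n = len(adj)
    rows = [(change_stats(adj, stats, i, j), adj[i][j])
            for i, j in combinations(range(n), 2)]
    p = len(rows[0][0])
    theta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # negative Hessian of the log pseudo-likelihood
        for x, y in rows:
            eta = sum(t * v for t, v in zip(theta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for a in range(p):
                grad[a] += (y - mu) * x[a]
                for b in range(p):
                    info[a][b] += mu * (1.0 - mu) * x[a] * x[b]
        step = solve(info, grad)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta
```

As with any logistic regression, the MPLE may not exist when some change statistic perfectly separates edges from non-edges. In that case the Newton steps diverge, which is one practical symptom of the accuracy concerns noted above.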
Model selection
To establish the most appropriate set of explanatory metrics for each subject's brain network and provide a foundation for the development of a “best assessment” ERGM for analyzing complex brain networks, we implemented and assessed three model/metric selection methods: a traditional p-value based backward selection approach [30], an information criterion approach (AIC, [31]), and a graphical goodness of fit (GOF) approach [17]. The latter two techniques are used most often for metric selection in ERGMs [16], [17]; to our knowledge, no detailed comparisons have been performed to determine whether the approaches generally produce the same “best” model. The p-value approach iteratively removes metrics that are not statistically significant. The AIC approach, in contrast, selects the set of metrics whose estimated distribution is most likely to have produced the observed data, with a penalty for additional metrics to ensure parsimony. Finally, the graphical GOF method allows subjectively selecting the set of explanatory metrics that produces the model most able to capture and reproduce certain topological properties of the observed network (see Appendix S1 for more details). For each approach, ERGMs were fitted to the 90-node unweighted, undirected brain networks of the 10 subjects discussed previously. The potential explanatory metrics for each of the 10 networks are listed by category in Table 2. The categories were chosen based on properties of brain networks that are regarded as important in the literature [14]. These metrics are analogous to typical brain network metrics (e.g., clustering coefficient (C)) but have been developed to be statistically compatible with ERGMs. Figure 2 illustrates the calculation of the less widely used of these statistics, namely GWESP, GWNSP, and GWDSP, on a six-node example network. For simplicity, the distributions of their unweighted analogues (ESP, NSP, and DSP) are given.
The weighted versions sum these distributions with geometric weights, so that each additional shared partner adds progressively less to a pair's contribution. For this example, the network has 1 pair of connected nodes with 0 shared partners (ESP_0), 5 pairs with 1 shared partner (ESP_1), 1 pair with 2 shared partners (ESP_2), and 0 pairs with 3 or 4 shared partners (ESP_3 and ESP_4). Further details on the metrics are provided in Table 1 and [27]. The decay parameters (τ) associated with GWESP, GWDSP, GWNSP, and GWD were all assumed to be fixed and known (for reasons outlined in [17]). They were set to a common value based on preliminary analyses, as this value generally led to better-fitting models according to all selection methods. The three aforementioned model selection approaches are outlined in Appendix S1.
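The shared-partner distributions behind Figure 2, and their geometrically weighted sums, can be computed as sketched below. Because the six-node network of Figure 2 is not reproduced in the text, the example uses a different four-node graph (a triangle with one pendant node). The weighting e^τ Σ_i [1 − (1 − e^(−τ))^i] · count_i is assumed to follow the standard geometrically weighted form used by statnet's ergm package. The function names are hypothetical.

```python
import math
from itertools import combinations


def shared_partner_distributions(adj):
    """Return (esp, nsp, dsp): dicts mapping i -> number of node pairs with
    exactly i shared partners, for connected pairs, non-connected pairs,
    and all pairs (dyads)."""
    n = len(adj)
    esp, nsp, dsp = {}, {}, {}
    for u, v in combinations(range(n), 2):
        shared = sum(1 for w in range(n) if adj[u][w] and adj[v][w])
        dsp[shared] = dsp.get(shared, 0) + 1
        target = esp if adj[u][v] else nsp
        target[shared] = target.get(shared, 0) + 1
    return esp, nsp, dsp


def geometrically_weighted(counts, tau):
    """e^tau * sum_{i>=1} [1 - (1 - e^-tau)^i] * counts[i] (assumed form).

    The i-th shared partner of a pair adds (1 - e^-tau)^(i-1) to the sum,
    so each additional partner contributes geometrically less.
    """
    r = 1.0 - math.exp(-tau)
    return math.exp(tau) * sum((1.0 - r ** i) * c for i, c in counts.items() if i >= 1)


# triangle 0-1-2 with pendant node 3 attached to node 2
adj = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
esp, nsp, dsp = shared_partner_distributions(adj)
print(esp, nsp, dsp)
```

Passing `esp`, `nsp`, or `dsp` to `geometrically_weighted` gives the GWESP, GWNSP, or GWDSP value for a chosen τ.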
Table 2. Explanatory network metrics by category.
Category | Metric(s) |
1) Connectedness | Edges, Two-Path |
2) Local Clustering/Efficiency | GWESP, GWDSP |
3) Global Efficiency | GWNSP^a |
4) Degree Distribution | GWD |
5) Location (in the brain) | Nodematch |
NOTE: See Table 1 for more details on the metrics.
^a Not inherently global, but helps produce models that accurately capture the global efficiency of our networks.
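The two automated selection strategies can be sketched as small driver functions. `fit` is a hypothetical stand-in for an ERGM fitting routine (for example, a wrapper around statnet) that returns a p-value per metric. Keeping Edges in every model and the 0.05 cutoff are assumptions of this sketch. AIC is computed as −2 log L + 2p, where p is the number of estimated parameters. The graphical GOF approach is not automated here because it relies on visual judgment.

```python
def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 p (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params


def best_by_aic(models):
    """models maps a model name to (log-likelihood, number of parameters)."""
    return min(models, key=lambda name: aic(*models[name]))


def backward_select(terms, fit, alpha=0.05, keep=("edges",)):
    """p-value based backward elimination.

    `fit(terms)` is a hypothetical user-supplied function that fits an ERGM
    with the given terms and returns {term: p_value}. Terms in `keep` are
    never removed (assumed here: Edges stays in every model).
    """
    terms = list(terms)
    while True:
        pvals = fit(terms)  # refit after every removal
        removable = [t for t in terms if t not in keep]
        if not removable:
            return terms
        worst = max(removable, key=lambda t: pvals[t])
        if pvals[worst] <= alpha:
            return terms
        terms.remove(worst)
```

Because each removal triggers a refit, the p-values of the remaining metrics can change from step to step. This is one reason the p-value and AIC routes need not end at the same model.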
Results
We implemented the model selection procedures delineated in the previous section and Appendix S1 for each of the 10 subjects using the statnet package [23] for the R statistical computing environment. The resulting models (for each approach) and their corresponding parameter estimates are displayed in Table 3. These estimates quantify the relative significance of the given metric in explaining the overall network structure; more specifically, they specify how much the log odds of an edge existing increases for each unit increase in the corresponding metric. For example, the final graphical GOF model for subject 10 shows that GWESP is the most important metric (other than the number of edges) in describing the structure of the subject's network, given the larger absolute value of its parameter estimate. Additionally, the positive sign of the estimate associated with GWESP indicates that an edge that closes a triangle is more likely to exist than it would be by chance (i.e., the network has more clustering than a random network) for the family of networks represented by subject 10's fitted model. As evidenced by the results in Table 3, the three model selection methods can lead to very different “best” models. Figures 3 and 4 show, for subjects 2 and 8, the disparate final-model GOF plots that can result from the three model selection approaches. Again, our aim here is not to judge the three selection methods, but to highlight the fact that they can lead to disparate final models/sets of features. These model selection approaches have been used seemingly arbitrarily in the literature; to our knowledge, no detailed comparisons have been performed to determine whether the approaches generally produce the same “best” model/set of features.
For our purposes we recommend the graphical GOF approach as the standard and will use it in future analyses given that our main scientific interest lies in being able to capture and reproduce the structure of the fitted brain networks. With the exception of subject 8, the graphical GOF approach produces reasonably good fits for all subjects. The remaining best graphical selection model GOF plots are shown in Figures 5 – 12 .
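The graphical GOF comparison can be summarized numerically. For each bin of an observed distribution (for example, geodesic distances or edge-wise shared partners), check whether it falls inside the central 95% range of the same distribution computed on networks simulated from the fitted model. This mirrors the idea behind statnet's GOF plots, but the helper below is a simplified, hypothetical sketch, not that package's function.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def gof_envelope(observed, simulated, level=0.95):
    """Per-bin check of an observed distribution against simulated ones.

    observed: counts per bin (e.g., number of node pairs at geodesic distance b).
    simulated: list of such count lists, one per simulated network.
    Returns a list of (lower, upper, observed_inside_band) per bin.
    """
    tail = (1.0 - level) / 2.0
    out = []
    for b, obs in enumerate(observed):
        vals = sorted(sim[b] for sim in simulated)
        lower, upper = quantile(vals, tail), quantile(vals, 1.0 - tail)
        out.append((lower, upper, lower <= obs <= upper))
    return out
```

A model whose simulated bands contain the observed value in nearly every bin corresponds to the "reasonably good fits" judged visually above. Bins that fall outside the band point to the topological property the model fails to reproduce.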
Table 3. Final model estimates by model selection approach for each subject.
Final Model^a | ||||||||
Subject | Approach | Edges | Two-Path | GWESP | GWDSP | GWNSP | GWD | Nodematch |
2 | p-value | 0.90 | 1.11 | |||||
AIC | 0.90 | 1.11 | ||||||
Graphical | 1.00 | 0.93 | ||||||
3 | p-value | 0.12 | 1.53 | |||||
AIC | 0.30 | 1.50 | ||||||
Graphical | 0.72 | 1.59 | ||||||
5 | p-value | 1.02 | 1.25 | |||||
AIC | 0.99 | 1.24 | ||||||
Graphical | 0.99 | 1.24 | ||||||
8 | p-value | 0.45 | 0.80 | |||||
AIC | 1.27 | 1.23 | ||||||
Graphical | 0.05 | |||||||
9 | p-value | 1.04 | 1.19 | |||||
AIC | 0.92 | 1.21 | ||||||
Graphical | 1.06 | |||||||
10 | p-value | 1.52 | 1.19 | 1.41 | ||||
AIC | 1.34 | 1.40 | ||||||
Graphical | 1.51 | 1.12 | ||||||
12 | p-value | 0.85 | 1.32 | |||||
AIC | 0.00 | 0.33 | 1.22 | |||||
Graphical | 0.87 | |||||||
13 | p-value | 1.08 | 1.12 | |||||
AIC | 1.05 | 1.11 | ||||||
Graphical | 1.04 | |||||||
16 | p-value | 0.41 | 0.82 | |||||
AIC | 0.87 | 1.32 | ||||||
Graphical | 0.81 | |||||||
21 | p-value | 0.93 | 1.93 | |||||
AIC | 0.66 | 2.05 | ||||||
Graphical | 0.90 | 1.93 |
^a Bolded metrics are those contained in at least half of the “best” subject network models based on the graphical GOF approach.
Despite the obvious importance of Edges in the models (as evidenced by the absolute values of its parameter estimates in Table 3), the overlap between the simulated and observed networks in the GOF plots is not merely an effect of pure connectivity, but also an effect of network organization. As mentioned in the Materials and Methods section, an ERGM with just an Edges metric is equivalent to the Erdős–Rényi random graph. Thus, due to the small-worldness of brain networks, models of this type will not capture the tight local clustering/regional specificity (among other properties) present in these networks [5]. Figure 13 illustrates this point by exhibiting the disparate GOF plots for an Edges-only model and the final graphical selection model for subject 10. Clearly, the Edges-only model is unable to capture the regional specificity (edge-wise shared partners distribution) and global processing (minimum geodesic distance distribution) properties of brain networks, which are well embodied by the final graphical selection model.
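The point about Edges-only models can be reproduced in a small simulation. The sketch uses a ring lattice as a stand-in for a clustered network (an assumption; it is not brain data). It compares the lattice's transitivity with that of Erdős–Rényi graphs of the same density, which is exactly the kind of network an Edges-only ERGM generates.

```python
import random
from itertools import combinations


def ring_lattice(n, k):
    """Each node linked to its k nearest neighbours on a ring (k even)."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for s in range(1, k // 2 + 1):
            j = (i + s) % n
            adj[i][j] = adj[j][i] = 1
    return adj


def erdos_renyi(n, p, rng):
    """G(n, p): each dyad independently present with probability p."""
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i][j] = adj[j][i] = 1
    return adj


def transitivity(adj):
    """Global clustering: 3 x triangles / connected triples."""
    n = len(adj)
    tri = sum(1 for i, j, k in combinations(range(n), 3)
              if adj[i][j] and adj[j][k] and adj[i][k])
    triples = sum(d * (d - 1) // 2 for d in (sum(row) for row in adj))
    return 3.0 * tri / triples if triples else 0.0


lattice = ring_lattice(30, 4)
density = sum(map(sum, lattice)) / (30 * 29)  # = 4/29, matched in the random graphs
rng = random.Random(1)
er_values = [transitivity(erdos_renyi(30, density, rng)) for _ in range(20)]
print(transitivity(lattice), sum(er_values) / len(er_values))
```

The lattice has transitivity 0.5, while the matched Erdős–Rényi graphs average roughly their edge density (about 0.14). This is the gap that the GWESP term closes in the fitted brain network models.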
The bolded explanatory metrics in Table 3 are those contained in at least half of the “best” subject network models based on the graphical GOF approach. Examining the uniformity of the selected explanatory metrics across subjects in this way is needed for the development of a “best assessment” ERGM for the reasons detailed in the Introduction and Materials and Methods sections. Examination of these metrics leads to an overall ERGM for whole-brain networks that requires a Connectedness metric (Edges), a Local Efficiency metric (GWESP), and a Global Efficiency metric (GWNSP). That is,
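$$P(\mathbf{Y}=\mathbf{y}\mid\boldsymbol{\theta}) = \frac{\exp\{\theta_1\,\mathrm{Edges}(\mathbf{y}) + \theta_2\,\mathrm{GWESP}(\mathbf{y};\tau) + \theta_3\,\mathrm{GWNSP}(\mathbf{y};\tau)\}}{\kappa(\boldsymbol{\theta})} \qquad (2)$$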
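The interpretation of each θ as a change in conditional log odds follows directly from equations 1 and 2. For any dyad (i, j), compare the network with that edge present (y⁺_ij) to the same network with it absent (y⁻_ij). The normalizing constant κ(θ) cancels in the ratio, giving the following short derivation.

```latex
\log\frac{P(Y_{ij}=1 \mid \mathbf{Y}^{c}_{ij}=\mathbf{y}^{c}_{ij})}
         {P(Y_{ij}=0 \mid \mathbf{Y}^{c}_{ij}=\mathbf{y}^{c}_{ij})}
  = \boldsymbol{\theta}^{\top}\left[\mathbf{g}(\mathbf{y}^{+}_{ij}) - \mathbf{g}(\mathbf{y}^{-}_{ij})\right]
  = \theta_1 \cdot 1
    + \theta_2\,\delta^{\mathrm{GWESP}}_{ij}(\mathbf{y})
    + \theta_3\,\delta^{\mathrm{GWNSP}}_{ij}(\mathbf{y})
```

Here δ_ij denotes the change in a metric when the edge (i, j) is added. Adding one edge always increases the Edges count by exactly 1, so θ1 acts as an intercept. Setting θ2 = θ3 = 0 recovers the Erdős–Rényi model with edge probability e^θ1/(1 + e^θ1).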
That these three metrics have the most influence on overall functional brain network organization in these subjects seems consistent with our biological understanding of the brain. The number of functional connections present (Edges) is clearly instrumental in information transfer while also playing a role in brain network organization [6]. Clustering (GWESP) is another critical feature of brain network architecture that allows efficient local processing of information. The consistently negative values associated with GWNSP indicate that if two brain areas are not functionally connected, they are less likely to have shared connections with other regions than they would by chance. That is, two unconnected regions are less likely than by chance to have a 2-path as the shortest path between them. Speculatively, this may result from the brain having direct connections when necessary, but allowing for slightly longer global connections (3-paths, etc.) to maintain efficiency otherwise. Additionally, in combination these metrics yield networks that capture well the geodesic (global efficiency), shared partner (local efficiency), degree, and triad census (motifs) distributions of brain networks, as evidenced by several of the GOF plots in Figures 3–13.
Group-based network comparisons can potentially be performed by comparing the means of the estimated θ1, θ2, and θ3 values among groups via hypothesis testing or classification techniques. It is important to note that if one were to compare only the mean of the estimated θ1 (Edges) values among groups, for instance, potential confounding from GWESP and GWNSP would be inherently accounted for, given that the estimates account for all other metrics in the model. In the hypothesis testing framework, one can exploit the fact that the θ̂'s are approximate MLEs and thus asymptotically have a Gaussian distribution. Approximate t-tests and/or F-tests can then be employed. Investigating the individual differences in final models among subjects is also important. Although parameter values cannot be directly compared when different models are fitted, the disparate fits themselves may elucidate biologically interesting differences among groups or individual subjects.
Here we implement our best assessment ERGM from equation 2 to illustrate its utility for comparing groups of networks. The subjects were split into a younger (aged 20–26) and a slightly older (aged 29–35) group (5 subjects each) to assess whether there were any discernible differences between their brain networks. Other studies have shown that older adults tend to have less clustering and slightly more connections than their younger counterparts [32], [33]. However, direct comparisons have not been done on groups of subjects this close in age to establish whether these changes tend to commence immediately or take effect at older ages. Moreover, these studies did not consider the potential confounding effects of other network metrics when assessing these differences. As evidenced by the results of our analysis in Table 4, the two groups differ significantly in θ3 (the GWNSP parameter), with the younger group having a more negative value. That is, if two nodes are not functionally connected, they are more likely to have shared connections with other nodes in the brain networks of the older group. Biologically, this could be the result of the older brain maintaining two-path connections between brain areas that have lost their direct connections; however, this interpretation is purely speculative at this point. Interestingly, there is not a statistically significant difference between the groups for the Edges or GWESP parameter, with the trend being for the older subjects' networks to have more connections and clustering. These findings run counter to those in the literature and may stem from the fact that our analysis accounts for some of the confounding that arises from network metric dependencies [6], [7]. These disparate findings could also simply be a result of the closeness in age of the two groups or of random variability given our small sample size.
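Below is a minimal sketch of one such approximate test. It assumes Welch's unequal-variance t-test; the text does not say which variant was applied. The function computes the t statistic and Welch–Satterthwaite degrees of freedom from per-subject estimates of one parameter. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: per-subject parameter estimates (e.g., theta_3) for two groups.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

In use, `welch_t(theta3_younger, theta3_older)` would take two hypothetical lists of per-subject GWNSP estimates. With five subjects per group, the Gaussian approximation behind such tests is rough.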
As noted by [7], larger and methodologically more comparable future investigations are needed to resolve many of the contradictory findings in functional connectivity studies.
Table 4. Results of ERGM parameter estimate comparisons between younger and older subjects.
Younger | Older | ||||
Mean | SE | Mean | SE | P-value | |
θ1 (Edges) | 3.95 | 3.47 | 0.2626 | |
θ2 (GWESP) | 0.89 | 1.81 | 1.14 | 1.53 | 0.3339 |
θ3 (GWNSP) | 6.62 | 4.79 |
In addition to model representation and comparison, ERGMs also provide a statistically sound method for simulating complex brain networks, as is done for the GOF plots. To illustrate their utility in this context, we simulated 100 networks based on the fitted ERGM of subject 10. We then calculated several descriptive metrics commonly used in the neuroimaging literature for the observed and simulated networks to assess the utility of the simulated networks within the neuroscientific context. Table 5 displays the results of these computations for clustering coefficient (C), characteristic path length (L), local efficiency (E_loc), global efficiency (E_glob), and mean nodal degree (K) (see [4], [34] for details on these metrics). As the table shows, the simulated networks closely resemble the observed network on all five metrics. Hence ERGMs provide an approach to simulating scientifically meaningful brain networks.
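Simulation from a fitted ERGM is typically done with an MCMC sampler. The sampler proposes toggling one dyad at a time and accepts the toggle with probability min(1, exp(±θᵀδ_ij)). The sketch below is a simplified Metropolis version for a toy model with Edges and 3-cycle (triangle) terms. It is an illustration under stated assumptions, not statnet's sampler. It uses triangles rather than GWESP only for brevity; plain triangle terms are prone to the degeneracy problems discussed in Materials and Methods.

```python
import math
import random
from itertools import combinations


def common_neighbours(adj, i, j):
    """Triangles created (or destroyed) by toggling the dyad (i, j)."""
    return sum(1 for w in range(len(adj)) if adj[i][w] and adj[j][w])


def simulate_ergm(n, theta_edges, theta_tri, steps, rng, burn_in=0, thin=1):
    """Metropolis toggle sampler for an ERGM with g(y) = (edges, triangles).

    Starts from the empty graph and returns copies of the adjacency matrix
    taken every `thin` steps after `burn_in` steps.
    """
    adj = [[0] * n for _ in range(n)]
    dyads = list(combinations(range(n), 2))
    samples = []
    for step in range(steps):
        i, j = rng.choice(dyads)  # symmetric proposal: uniform random dyad
        # theta^T delta for adding edge (i, j): +1 edge, +common-neighbour triangles
        delta = theta_edges + theta_tri * common_neighbours(adj, i, j)
        log_ratio = delta if adj[i][j] == 0 else -delta
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            adj[i][j] = adj[j][i] = 1 - adj[i][j]
        if step >= burn_in and (step - burn_in) % thin == 0:
            samples.append([row[:] for row in adj])
    return samples


rng = random.Random(42)
samples = simulate_ergm(10, math.log(0.2 / 0.8), 0.0, steps=20000, rng=rng,
                        burn_in=2000, thin=45)
densities = [sum(map(sum, s)) / 2 / 45 for s in samples]  # 45 dyads on 10 nodes
print(len(samples), sum(densities) / len(densities))
```

With θ_edges = log(0.2/0.8) and no triangle effect, the sampled networks have an average density near 0.2, as expected for the equivalent Erdős–Rényi model. Metrics such as those in Table 5 would then be computed on each retained sample.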
Table 5. Network metrics of observed and simulated networks from subject 10.
Simulated Networks | ||
Metric | Observed Value | Mean (SE) |
Clustering coefficient (C) | 0.447 | 0.468 (0.004) |
Characteristic path length (L) | 3.520 | 3.475 (0.033) |
Local Efficiency (E_loc) | 0.555 | 0.576 (0.004) |
Global Efficiency (E_glob) | 0.284 | 0.290 (0.003) |
Mean Nodal Degree (K) | 5.066 | 4.939 (0.042) |
Discussion
Our analyses in the previous section illustrate the utility of ERGMs for modeling, analyzing, and simulating complex whole-brain networks. We have also provided a foundation for the development of a best assessment ERGM for the classification and comparison of brain networks via the evaluation of three model/feature selection approaches. The graphical GOF approach serves as the best method given the scientific interest in being able to capture and reproduce the structure of the fitted networks. The greatest appeal of modeling brain networks with ERGMs lies in their ability to efficiently represent this complex network data and allow examining the way in which a network's global structure and function depend on local structural components.
There are a myriad of ways in which ERGMs can potentially be useful for brain network researchers. As previously discussed and demonstrated, groups of networks can be statistically compared and classified (by disease status, age, task, etc.) based on several network features simultaneously. The models also provide a way of exploring which local features of brain networks are most important in explaining their global architecture. As noted by many authors [35]–[38], an analysis approach that can capture the network characteristics from a group of subjects' brain networks is needed. ERGMs provide a potential solution since one could average the parameter profiles, , of a group and then simulate “representative” networks based on this averaged profile. Preliminary work has shown this approach to be quite effective. These representative networks can serve as null networks against which other networks and network models can be compared, as visualization tools, and as a means for characterizing properties of network metrics in a group (e.g., community structure). ERGMs, in general, will also serve to both accommodate the ever increasing complexity of whole-brain analyses and inform future statistical models for whole-brain research.
A computational limitation of note for brain network researchers is that MCMC MLE fits of ERGMs can be computationally intensive and may fail to converge for networks with finer spatial resolution than the 90-ROI networks used here. This fitting algorithm has been shown to handle networks of several thousand nodes [17]; however, its effectiveness depends more on the number and topological structure of the edges than on the number of nodes [23]. Future work will examine the scalability of MCMC MLE ERGM fits in the context of brain networks. If convergence issues arise with more finely parcellated networks, MPLE fits may serve as an appropriate alternative [16].
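A minimal sketch of what MPLE computes may help here: the pseudo-likelihood treats each dyad's tie indicator as a logistic regression on its change statistics (the change in each model statistic when that tie is toggled), ignoring the dependence among dyads. For simplicity, the Python code below assumes a model with only an edges term and a triangle term rather than the GWESP specification used elsewhere in this paper; the function names and the toy random graph are hypothetical illustrations, not the fitting code used in this study. MPLE and MCMC MLE estimates can differ noticeably, and the naive logistic-regression standard errors are generally considered unreliable; see [28] for a comparison.

```python
import numpy as np

def dyad_change_stats(adj):
    """Design matrix of change statistics for every dyad i < j.

    Columns: edges term (always 1) and triangle term (number of shared
    partners of i and j, i.e. the triangles created by adding the tie).
    """
    iu = np.triu_indices(adj.shape[0], k=1)
    shared = (adj @ adj)[iu]
    X = np.column_stack([np.ones_like(shared), shared])
    y = adj[iu]                                     # observed tie indicators
    return X, y

def mple(adj, max_iter=100, tol=1e-10):
    """Maximum pseudo-likelihood estimate via Newton-Raphson logistic regression."""
    X, y = dyad_change_stats(adj)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                        # score of the log pseudo-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Usage on a random 40-node graph (toy data, not brain data).
rng = np.random.default_rng(0)
upper = np.triu(rng.random((40, 40)) < 0.3, k=1)
adj = (upper | upper.T).astype(float)
print(mple(adj))                                    # (edges, triangle) coefficients
```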
Another potential issue is that variability in the original data may affect the resulting ERGM fits. A given subject's brain network connections may vary at different times of day for experimental or physiological reasons. Assessing the robustness of ERGM fits to this within-subject variability is important and will be the focus of future investigations.
Beyond the research context, ERGMs may have profound implications in the clinical context, since they can help elucidate system-level functional features and neurological processes (represented by the explanatory network metrics) that play a role in various cognitive disorders. For instance, several authors have shown that patients with schizophrenia have lower local efficiency in their brain networks than control subjects [7], [35], [36], which would correspond to a smaller parameter estimate for GWESP in the ERGM framework. ERGMs make it possible to examine empirically how this difference in efficiency affects global brain structure and to compare the resulting whole-brain networks of patients and controls. For example, one could simulate networks from model fits to patients with schizophrenia and to controls to see how this difference affects the variability of the resulting networks. This comparison may give insight into the neurological mechanisms that lead to schizophrenia (e.g., whether reduced local neuronal communication leads to less stable global structure in patients).
Aside from the clinical and biological work described above, there are many possible directions for future methodological research on the analysis of complex brain networks with ERGMs. Approximating the small-sample distribution of $\hat{\theta}$ may prove useful for hypothesis testing in settings where appealing to asymptotic normality may not be appropriate. Methods for quantifying GOF plots, which would remove subjectivity and allow analytical comparisons of the graphs, would be valuable. Such an approach should allow some flexibility in weighting the four comparison statistics according to their relative importance in the scientific context. Developing novel explanatory network metrics rooted in both the biology of the brain and the mathematics of ERGMs will yield better “best assessment” models for network comparison. A corresponding hybrid model selection approach, which penalizes models that use many covariates while also assessing their GOF plots, will help maintain parsimony as the number of relevant explanatory metrics increases. Extending ERGMs to directed and/or weighted brain networks will prove beneficial as constructing these network types becomes more feasible.
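One possible way to turn a GOF plot into a number, sketched here under our own assumptions, is to standardize, bin by bin, the distance between an observed statistic (e.g., the degree distribution) and the same statistic computed on networks simulated from the fitted model, and then combine several statistics with user-chosen weights. The squared-z discrepancy, the function names, and the weighting scheme below are hypothetical illustrations rather than an established method; choosing the discrepancy and the weights is precisely the open problem raised above.

```python
import numpy as np

def gof_discrepancy(observed, simulated):
    """Mean squared standardized distance between an observed GOF curve and
    the curves computed from networks simulated under the fitted model.

    observed:  shape (n_bins,), e.g. the observed degree distribution
    simulated: shape (n_sims, n_bins), the same statistic for each simulated network
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    mean = simulated.mean(axis=0)
    sd = simulated.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)        # guard bins with no simulated variation
    return float(np.mean(((observed - mean) / sd) ** 2))

def weighted_gof_score(gof_stats, weights):
    """Weighted average of per-statistic discrepancies (smaller means better fit).

    gof_stats: {name: (observed, simulated)}; weights: {name: relative importance}
    """
    total = sum(weights.values())
    return sum(w * gof_discrepancy(*gof_stats[name]) for name, w in weights.items()) / total

# Toy usage: two statistics, the second weighted three times as heavily.
sim = np.array([[1.0, 2.0], [3.0, 4.0]])
print(weighted_gof_score({"degree": (sim.mean(axis=0), sim),
                          "esp": (sim.mean(axis=0) + sim.std(axis=0, ddof=1), sim)},
                         {"degree": 1.0, "esp": 3.0}))
```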
Supporting Information
Acknowledgments
We thank the editor and referees for their comments that considerably improved the paper. An earlier version of this manuscript can be found at arxiv.org (Simpson et al., arXiv:1007.3230v1 [stat.AP]).
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was supported by the Translational Science Institute of Wake Forest University (Translational Scholar Award, URL: http://www.wfubmc.edu/TSI.htm) and by the National Institute of Neurological Disorders and Stroke (NS070917, URL: http://www.ninds.nih.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
1. Iturria-Medina Y, Sotero RC, Canales-Rodriguez EJ, Aleman-Gomez Y, Melie-Garcia L. Studying the human brain anatomical network via diffusion-weighted MRI and graph theory. NeuroImage. 2008;40:1064–1076. doi: 10.1016/j.neuroimage.2007.10.060.
2. Gong G, He Y, Concha L, Lebel C, Gross DW, et al. Mapping anatomical connectivity patterns of human cerebral cortex using in vivo diffusion tensor imaging tractography. Cerebral Cortex. 2009;19:524–536. doi: 10.1093/cercor/bhn102.
3. Bassett DS, Bullmore E. Small-world brain networks. Neuroscientist. 2006;12:512–523. doi: 10.1177/1073858406293182.
4. Stam CJ, Reijneveld JC. Graph theoretical analysis of complex networks in the brain. Nonlinear Biomedical Physics. 2007;1:3. doi: 10.1186/1753-4631-1-3.
5. Watts DJ, Strogatz SH. Collective dynamics of small-world networks. Nature. 1998;393:440–442. doi: 10.1038/30918.
6. van Wijk BCM, Stam CJ, Daffertshofer A. Comparing brain networks of different size and connectivity density using graph theory. PLoS ONE. 2010;5(10):e13701. doi: 10.1371/journal.pone.0013701.
7. Lynall ME, Bassett DS, Kerwin R, McKenna PJ, Kitzbichler M, et al. Functional connectivity and brain networks in schizophrenia. The Journal of Neuroscience. 2010;30(28):9477–9487. doi: 10.1523/JNEUROSCI.0333-10.2010.
8. Frank O, Strauss D. Markov graphs. Journal of the American Statistical Association. 1986;81:832–842.
9. Pattison PE, Wasserman S. Logit models and logistic regressions for social networks: II. Multivariate relations. British Journal of Mathematical and Statistical Psychology. 1999;52:169–194. doi: 10.1348/000711099159053.
10. Robins GL, Pattison PE, Wasserman S. Logit models and logistic regression for social networks: III. Valued relations. Psychometrika. 1999;64:371–394.
11. Wasserman S, Pattison PE. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika. 1996;61:401–425.
12. Robins GL, Pattison PE, Kalish Y, Lusher D. An introduction to exponential random graph (p*) models for social networks. Social Networks. 2007;29:173–191.
13. Robins GL, Snijders T, Wang P, Handcock M, Pattison PE. Recent developments in exponential random graph (p*) models for social networks. Social Networks. 2007;29:192–215.
14. Bullmore E, Sporns O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience. 2009;10:186–198. doi: 10.1038/nrn2575.
15. Meunier D, Achard S, Morcom A, Bullmore E. Age-related changes in modular organization of human brain functional networks. NeuroImage. 2009;44:715–723. doi: 10.1016/j.neuroimage.2008.09.062.
16. Saul ZM, Filkov V. Exploring biological network structure using exponential random graph models. Bioinformatics. 2007;23:2604–2611. doi: 10.1093/bioinformatics/btm370.
17. Hunter DR, Goodreau SM, Handcock MS. Goodness of fit of social network models. Journal of the American Statistical Association. 2008;103:248–258.
18. Peiffer AM, Hugenschmidt CE, Maldjian JA, Casanova R, Srikanth R, et al. Aging and the interaction of sensory cortical function and structure. Human Brain Mapping. 2009;30:228–240. doi: 10.1002/hbm.20497.
19. Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage. 2002;15:273–289. doi: 10.1006/nimg.2001.0978.
20. van den Heuvel MP, Stam CJ, Boersma M, Hulshoff Pol HE. Small-world and scale-free organization of voxel-based resting-state functional connectivity in the human brain. NeuroImage. 2008;43:528–539. doi: 10.1016/j.neuroimage.2008.08.010.
21. Hayasaka S, Laurienti PJ. Comparison of characteristics between region- and voxel-based network analysis in resting-state fMRI. NeuroImage. 2010;50:499–508. doi: 10.1016/j.neuroimage.2009.12.051.
22. Handcock MS. Statistical models for social networks: Inference and degeneracy. In: Breiger R, Carley K, Pattison PE, editors. Dynamic Social Network Modelling and Analysis: Workshop Summary and Papers. Washington, DC: National Academy Press; 2002. pp. 229–240.
23. Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M. statnet: Software tools for the statistical modeling of network data. 2008. URL http://statnetproject.org.
24. Snijders TAB, Pattison PE, Robins GL, Handcock MS. New specifications for exponential random graph models. Sociological Methodology. 2006;36:99–154.
25. Robins G, Pattison P, Wang P. Closure, connectivity and degree distributions: exponential random graph (p*) models for directed social networks. Social Networks. 2009;31:105–117.
26. Rinaldo A, Fienberg SE, Zhou Y. On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics. 2009;3:446–484.
27. Morris M, Handcock MS, Hunter DR. Specification of exponential-family random graph models: terms and computational aspects. Journal of Statistical Software. 2008;24. doi: 10.18637/jss.v024.i04.
28. van Duijn MAJ, Gile KJ, Handcock MS. A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models. Social Networks. 2009;31:52–62. doi: 10.1016/j.socnet.2008.10.003.
29. Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M. ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software. 2008;24. doi: 10.18637/jss.v024.i03.
30. Muller KE, Fetterman BA. Regression and ANOVA: an integrated approach using SAS software. Cary, NC: SAS Institute Inc; 2002.
31. Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;AC-19:716–723.
32. Gaal ZA, Boha R, Stam CJ, Molnar M. Age-dependent features of EEG-reactivity: spectral, complexity, and network characteristics. Neuroscience Letters. 2010;479:79–84. doi: 10.1016/j.neulet.2010.05.037.
33. Gong G, Rosa-Neto P, Carbonell F, Chen ZJ, He Y, Evans AC. Age- and gender-related differences in the cortical anatomical network. Journal of Neuroscience. 2009;29:15684–15693. doi: 10.1523/JNEUROSCI.2308-09.2009.
34. Rubinov M, Sporns O. Complex network measures of brain connectivity: uses and interpretations. NeuroImage. 2010;52(3):1059–1069. doi: 10.1016/j.neuroimage.2009.10.003.
35. Rubinov M, Knock SA, Stam CJ, Micheloyannis S, Harris AWF, et al. Small-world properties of nonlinear brain activity in schizophrenia. Human Brain Mapping. 2007;30:403–416. doi: 10.1002/hbm.20517.
36. Alexander-Bloch AF, Gogtay N, Meunier D, Birn R, Clasen L, et al. Disrupted modularity and local connectivity of brain functional networks in childhood-onset schizophrenia. Frontiers in Systems Neuroscience. 2010;4:147. doi: 10.3389/fnsys.2010.00147.
37. Meunier D, Lambiotte R, Fornito A, Ersche KD, Bullmore ET. Hierarchical modularity in human brain functional networks. Frontiers in Neuroinformatics. 2009;3:37. doi: 10.3389/neuro.11.037.2009.
38. Joyce KE, Laurienti PJ, Burdette JH, Hayasaka S. A new measure of centrality for brain networks. PLoS ONE. 2010;5(8):e12200. doi: 10.1371/journal.pone.0012200.