Abstract
Egocentric sampling of networks selects a subset of nodes (“egos”) and collects information from them on themselves and their immediate network neighbours (“alters”), leaving the rest of the nodes in the network unobserved. This design is popular because it is relatively inexpensive to implement and can be integrated into standard sample surveys. Recent methodological developments now make it possible to statistically analyse this type of network data with Exponential-family Random Graph Models (ERGMs). This provides a framework for principled statistical inference, and the fitted models can in turn be used to simulate complete networks of arbitrary size that are consistent with the observed sample data, allowing one to infer the distribution of whole-network properties generated by the observed egocentric network statistics. In this paper, we discuss how design choices for egocentric network studies impact statistical estimation and inference for ERGMs. The design choices include both measurement strategies (for ego and alter attributes, and for ego–alter and alter–alter ties) and sampling strategies (for egos and alters). We discuss the importance of harmonising measurement specifications across egos and alters, and conduct simulation studies to demonstrate the impact of sampling design on statistical inference, specifically stratified sampling and degree censoring.
Keywords: ERGM; egocentric data; egonet; alter–alter matrix
1. Introduction
In the social networks field, as in many other areas of empirical research, study designs for data collection are largely driven by the methodology that will be used to analyse the data. Unlike other fields, however, the traditional methods used for analysing networks have typically required conducting a “sociocentric” network census: data on all nodes and links in the population of interest. This design was necessary to provide accurate descriptive statistics on whole-network properties such as centrality/centralisation, geodesics, paths and reachability, cliques/clusters, various forms of positional equivalence, and accurate estimation of statistical models for networks. In the absence of methods for estimating such properties from sampled network data, sociocentric designs were mandatory.
The network census requirement imposed an onerous burden on empirical network research: costly and time-consuming, invasive and increasingly likely to run afoul of human subjects concerns (Borgatti and Molina 2005), threatened by missing data (Kossinets 2006), and haunted by the boundary specification problem (Laumann, Marsden, and Prensky 1989). Studies of passively collected electronically encoded social data (e.g., Facebook, Twitter, citations and the occasional email/text/phone network) could often overcome the first two problems, but at the cost of severely limited topical content and boundaries driven by selective engagement (González-Bailón et al. 2014).
Two forms of network sampling have received attention over the years: adaptive (link tracing) and egocentric sampling. Both designs require an initial sample of respondents, a “name generator” to enumerate their contacts, and a strategy for sampling these contacts. Adaptive designs start with a small sample of “seeds” from the population of interest and employ a range of different link tracing strategies to recruit the contacts into the study. Variants include snowball samples (Goodman 1961), random walks, respondent driven samples (RDS) (Heckathorn 1997, 2002) and a host of ad hoc designs. Egocentric designs employ standard survey sampling techniques to recruit respondents (egos), who then report on their contacts (alters); the alters are not recruited into the sample. These, too, have a range of design variations based on the enumeration and description of alters, and the possible collection of alter–alter tie information (Crossley et al. 2015; Perry, Pescosolido, and Borgatti 2018).
In both link tracing and egocentric designs the burden of data collection is much reduced, but this also severely constrains the network information available in the sample and thus the inferences that can be drawn from the data. Adaptive samples have primarily been used to estimate the size of hidden populations (Frank and Snijders 1994; Crawford, Wu, and Heimer 2018). Egocentric samples have primarily been used to investigate degree distributions (network size), patterns of homophily and, if the alter–alter matrix is collected, triads and modified network density and centrality measures (Marsden 1987, 2002; Kalish and Robins 2006).
Egocentric surveys have traditionally been used when the primary research goals involve studying the size and composition of social networks and their correlation with behaviours or opinions. The key advantage of egocentric designs is that they can be implemented within a standard sample survey. This makes them practical and allows them to leverage traditional sampling theory for statistical inference. The first large-scale egocentric surveys in the US were conducted over 40 years ago: the Northern California Community Study in 1977 by Claude Fischer (Fischer 1977, 1982) and the General Social Survey national survey module on core discussion networks in 1985, 1987, and 2004 by NORC (Burt 1984; Smith et al. 2019). The egocentric design has been used in many other studies since then, internationally, and across a range of disciplines. Some notable examples include:
social networks surveys in East Germany (Völker and Flap 2001), the Netherlands (Völker and Flap 2002; Völker, Flap, and Mollenhorst 2007; Völker, Schutjens, and Mollenhorst 2013) and Poland (Mach, Manterys, and Sadowski 2018);
sexual network surveys in over 60 countries since 2000, with repeated country surveys about every 5 years, collected as part of the Demographic and Health Survey Program (Demographic and Health Surveys Program 2020);
the POLYMOD study (Improving Public Health Policy in Europe through the Modelling and Economic Evaluation of Interventions for the Control of Infectious Diseases) conducted in eight European countries (Mossong et al. 2008);
a national Gallup panel on social networks and health (analysed by O’Malley et al. 2012);
the National Health and Social Life Survey (Laumann et al. 1992);
a multicountry comparison of scientific productivity (Hara, Chen, and Ynalvez 2017); and
longitudinal surveys like the Children of Immigrants Longitudinal Survey in Four European Countries (CILS4EU 2014; Kalter et al. 2016) and Understanding How Personal Networks Change (Fischer 2018).
While all of these studies fall under the umbrella of egocentric network designs, they employ a wide range of strategies for sampling and measurement. That, in turn, influences how the data can be analysed and the quality of inferences to the population. The importance of these design details is growing now that more statistical methods for analysing egocentric data are being developed.
Principled, general statistical methods for analysing egocentric network data began to emerge in the last two decades. Some excellent overviews of this field have recently been published (McCarty et al. 2019; Perry, Pescosolido, and Borgatti 2018). The newest addition to the field is the development of inferential methods for exponential-family random graph models (ERGMs) with egocentric data. These methods have not yet been widely disseminated, so our paper will focus on the data issues relevant to this framework. However, many of the topics we address are also relevant to other methods.
The most general framework for estimating ERGMs from sampled network data is the likelihood-based approach outlined by Handcock and Gile (2010). It applies to a wide range of “amenable” sampling designs that include both adaptive and egocentric forms, as do the Bayesian approaches of Koskinen, Robins, and Pattison (2010); a number of computational approaches have also been proposed (for example, Pattison et al. 2013). These provide a solid basis for model-based statistical inference by specifying the conditions needed for the sampling design to be ignorable, which in turn allows one to maximise the marginal likelihood over all possible population networks consistent with the sample.
While this is an elegant theoretical solution that draws on principled analysis of missing data, it requires alters to be uniquely identifiable. This is almost never the case for sample survey based egocentric designs. In the survey context, the anonymity of alters is not a bug but a valuable feature from the standpoint of both improving disclosure and reducing human subjects study concerns.
The alternative is a recently developed “hybrid” approach that provides principled statistical estimation from egocentrically sampled data (Krivitsky and Morris 2017) with anonymous alters. This framework generalises the well-established body of statistical theory for estimation and inference from survey samples, relying on simple and transparent scaling assumptions rather than marginalisation. The methods support estimation and inference from egocentric data for ERGMs and their temporal extensions, in particular Temporal ERGMs (TERGMs) (Hanneke, Fu, and Xing 2010) and Separable Temporal ERGMs (STERGMs) (Krivitsky 2012; Krivitsky and Handcock 2014). While ERGMs are known as one of the primary statistical tools for analysing complete network data, this new approach brings the ability to conduct traditional statistical inference (joint estimation, confidence intervals and hypothesis tests for model coefficients) about a broad range of structural models to egocentrically sampled network data. It also enables simulation and exploration of complete networks that are consistent with the egocentric observations. This is a game changer for social network analysis: one can now collect a single cross-sectional egocentric sample of a network (the opposite of “big data”), and use it to produce a complete dynamic simulated network, of arbitrary size, that is consistent with the scaled observed data.
We now have a practical and well-understood approach to network sampling, paired with a powerful framework for estimation, inference and simulation. Once a statistical framework like this has been established, it becomes possible to evaluate aspects of the study design that inform statistical modelling, and that provide leverage for improving the properties of the estimates: minimising bias and variance. This is the purpose of this paper. We start in Section 2 by reviewing the previous literature on methods of network inference from egocentrically sampled data and the statistical theory of ERGMs. The three subsequent sections are the main contributions of the paper, and focus on the implications of ERGM inference for egocentric network study design. In Section 3, we show how existing network questionnaire items and their response scales need to be adjusted to mitigate harmonisation problems and allow for ERG model fitting. In Section 4, we discuss how ego sampling strategies affect the precision of estimating population-level network statistics and the corresponding parameters of an ERG model, a distinction that turns out to be important. We also show how estimation of dyad-independent homophily effects and dyad-dependent degree effects performs under various stratified sampling schemes. In Section 5, we turn our attention to the issue of alter censoring, a phenomenon that arises when the number of possible alters is too large to enumerate in the time allotted, so a maximum is set a priori in a Fixed Choice Design (FCD). We explore the effects of censoring on ERG model estimation and the potential benefits of Augmented FCD (AFCD), which collects one additional piece of information: the total number of alters. Section 6 summarises the findings, notes limitations, and suggests directions for future research.
2. Background
The methods for analysing egocentrically sampled network data fall into one of two broad classes: descriptive and generative.
Descriptive methods summarise patterns in the observed data and are often, though not always, focused on ties (rather than non-ties). Examples include estimating the size, composition, and homophily in personal networks, and the effects of these network attributes on ego behaviour or outcomes (Fischer 1982; McPherson, Smith-Lovin, and Brashears 2006; Brashears 2008). When egocentric data are collected longitudinally, hierarchical models can be used to examine the factors associated with network stability and multiplex tie dependence (van Duijn, Busschbach, and Snijders 1999; Snijders, Spreen, and Zwaagstra 1995). When the alter–alter tie data are collected, analyses extend to topics such as network density, cohesion, centrality, and brokerage (Gould and Fernandez 1989; Marti, Bolibar, and Lozares 2017).
Generative models instead seek to represent the determinants of tie presence (static) or incidence and persistence (dynamic). We use the term “generative” here to distinguish models that, once estimated, can be used to generate (i.e., simulate) networks that are consistent with the observed egocentric data. (This is as opposed to its usage in the context of causal inference, which we do not discuss here.) The main methodological framework used for this is ERGMs (Krivitsky and Morris 2017; Smith 2012, 2015) and their temporal extensions (TERGMs and STERGMs). In both cases, summary statistics from the egocentric sample are used to estimate the model, and the fitted model can then be used to simulate complete networks consistent with the egocentric data. Many of the same network features found in descriptive models are also common in generative models: ego network size, composition, patterns of attribute mixing, and, with alter–alter ties, the additional structural features visible from this. In the generative context, however, these features serve to predict the probability that a tie exists, forms, or is dissolved.
ERGMs, TERGMs, and STERGMs provide a rich class of generative models for representing the structural and dynamic properties of networks. The limitation is that the models can only be used to conduct inference on terms for which the scaled network statistics are observable in the sample. For egocentric designs this includes the degree distribution sequence, heterogeneity in degree by nodal attributes, assortative mixing on multiple nodal attributes (homophily is one example), some types of triadic biases (if the alter–alter matrix is collected), and some types of reciprocity (if the ties are directed). Indeed, the list of observable network statistics in an egocentric sample includes all of the most common terms that are typically used for analysing complete networks. And, if one adopts the theoretical perspective that processes operating at the micro-level—such as sociality, homophily, and transitivity—cumulate up to produce the emergent macro-level properties—such as centrality and geodesics—then the informational value of the statistics observable in egocentric samples is foundational and substantial. This framework thus has both theoretical and methodological value.
While egocentric samples leave many higher-order configurations and whole-network properties unobserved, ERGMs can partially offset this limitation: as stochastic models they are generative, in the sense that one can generate (i.e., simulate) a complete network from the fitted model. Each simulated network is a draw from the probability distribution defined by the model, so one can draw a sample of networks from this distribution, and the properties of the estimators ensure that the sample will reproduce the observed (scaled) network statistics in expectation. In addition, one can examine the range of unobserved higher-order configurations and whole-network properties in this sample, and these unobserved properties will also be consistent with the model and the observed statistics. Any whole-network property, and its distribution, can be explored in this way. While the sample of simulated networks cannot be used to test whether there are additional biases in the unobserved higher-order configurations, it can be used to explore the range and distribution of these configurations that would be consistent with the observed data.
We have only just begun to take advantage of this “hybrid” inference; it seems safe to say that most people in the network analysis field are not yet aware that whole-network ERGMs can be estimated from egocentric data, let alone that egocentric data can be used to explore the implications of the model for whole-network properties.
There is one area, however, where this framework has been quickly adopted: network epidemiology. Epidemiologists are often tasked with making projections of potential epidemic spread, to examine the potential impact of different intervention strategies. They cannot experiment with real human populations, so they need a “virtual laboratory”, a simulation platform that can be used to represent the dynamic transmission network and how it interacts with the public health intervention system and other demographic processes. To be realistic, this simulation should be grounded in empirical data. To be practical, the data must be easy to collect. The egocentric ERGM/STERGM framework provides exactly this: an entire workflow from data collection, through statistical estimation, to simulation studies, which can be used to explore and optimise public health intervention strategies. This can now be done in a statistically principled way, in contrast to the ad hoc approaches available previously, and it does not require the kind of “big data” that is only available in limited settings.
Examples of recent papers that exploit this approach include studies on effects of network properties such as tie concurrency and mixing on network connectivity and component size distributions (Krivitsky and Morris 2017), and how these in turn influence the spread of infectious disease (Morris et al. 2009; Weiss et al. 2019; Jenness et al. 2018; Goodreau et al. 2017, 2010). As the promise of this approach becomes better known, the value of egocentrically sampled network data will grow accordingly. And this, in turn, will focus attention on study design.
The Krivitsky and Morris (2017) approach estimates the Pseudo-Maximum-Likelihood Estimator (PMLE) (Binder 1983) of the coefficients of an ERGM using design-based estimates of the sufficient statistics from egocentrically sampled data. Because the inference framework was published in a journal that may not be familiar to the audience of Social Networks, we briefly review the key conceptual elements in the next section: egocentric designs and notation, estimating ERGMs using the statistical principle of sufficiency, and the statistics identifiable from different types of egocentric designs. We then turn to the question of how inference is influenced by study design choices.
2.1. Egocentric Sampling Designs and Notation
Egocentric designs require an initial sample of respondents (egos), a “name generator” to enumerate their contacts (alters), and a strategy for collecting information from ego about these alters; the alters are not recruited into the sample. There is a range of design variations based on the sampling of the egos, the limits placed on enumeration and description of alters, and the optional collection of alter–alter tie information—the alter matrix.
The egocentric designs considered here meet two criteria: the egos are a probability sample, and the alters are not uniquely identifiable. In practice, this is the most common form of egocentrically sampled data available. While such data cannot be used to directly compute the descriptive statistics for complete networks used in traditional social network analysis (e.g., betweenness, geodesics, equivalence, etc.), they can be used by statistical models for networks, and specifically ERGMs. The specifics of the sampling design and measurement constrain the model terms that are identifiable, but the estimates that can be obtained inherit the inferential properties of probability survey samples.
Some notation is helpful for reviewing the statistical framework and necessary for deriving what is, and is not, identifiable from an egocentric sample. Here we mostly follow the notation of Krivitsky and Morris (2017) and Krivitsky, Bojanowski, and Morris (2019).
2.1.1. Population Network Features
The population network is the census of all nodes and links of interest. We refer to these elements as follows:
N is the set of actors of interest, the “population” in egocentric sampling.
$\mathbb{Y}$ is a subset of the set of distinct unordered pairs of actors in N, the set of potential relationships of interest. From the point of view of egocentric sampling, these are relationships that would, if present, be revealed by the name generator used in the egocentric study. If multiple name generators are used, it is the union of these relationships. It may also be limited by nodal sampling constraints.
xi, for i ∈ N is a vector of nodal attributes, such as age or socioeconomic status.
x represents the nodal attributes of all of the actors in the population.
$\mathcal{Y}$ is the set of potential networks that may be realised.
y ∈ $\mathcal{Y}$ is the population network: a (typically) large, finite unobserved network.
yi,j is an indicator of the presence of an observable relationship between i and j.
yi is the set of indices of neighbours of i and |yi| is their number.
g(y, x) is a population-level network statistic, a function of the network that computes a numeric vector operationalising the network features of interest. The resulting sum represents how many specific configurations of nodes and links of that type are found in a network.
2.1.2. Sample Network Features
When we egocentrically sample the network, we observe a number of features:
S ⊂ N is a sample of actors.
ei for i ∈ S is an ego report with some subset of the following information:
$e^{\text{e}}_i$ is ego’s attribute vector xi.
|yi| is the number of alters ego i has.
$e^{\text{a}}_i$ is a list of alter attributes (xj for j ∈ yi).
$e^{\text{aa}}_i$ is the alter–alter matrix, a list of relations reported by ego i among their immediate alters.
Furthermore, we define $e^{\text{e}}_{i,k}$ and $e^{\text{a}}_{i,j,k}$ to refer to the kth attribute of ego i and of the jth alter of ego i, respectively; and we use $\mathbb{I}(C)$ to refer to an indicator function, which has the value 1 if condition C holds and 0 otherwise.
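To make the structure of an ego report concrete, the following is a minimal Python sketch of one possible container for it. The class name `EgoReport`, its field names, and the `weight` field are hypothetical and introduced only for illustration; they are not part of the notation above or of any existing software.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class EgoReport:
    """One ego's report e_i (hypothetical container, for illustration only)."""

    ego_attrs: Dict[str, Any]   # ego's attribute vector x_i
    degree: int                 # |y_i|, the number of alters ego i has
    # Attributes x_j of the alters that ego described (may be fewer than degree).
    alter_attrs: List[Dict[str, Any]] = field(default_factory=list)
    # Alter-alter ties, stored as pairs of positions in alter_attrs.
    alter_ties: List[Tuple[int, int]] = field(default_factory=list)
    weight: float = 1.0         # assumed survey weight; 1.0 in a census

    def validate(self) -> None:
        n_listed = len(self.alter_attrs)
        if self.degree < n_listed:
            raise ValueError("degree cannot be smaller than the number of listed alters")
        for a, b in self.alter_ties:
            if not (0 <= a < b < n_listed):
                raise ValueError("alter-alter ties must index two distinct listed alters")


# A made-up report: an ego with three alters, two of whom know each other.
report = EgoReport(
    ego_attrs={"age": 34, "party": "Independent"},
    degree=3,
    alter_attrs=[{"age": 30}, {"age": 41}, {"age": 36}],
    alter_ties=[(0, 2)],
)
report.validate()
```

Storing the degree separately from the list of described alters allows for designs in which ego reports more alters than are described in detail.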
2.2. Review of the estimation approach
Krivitsky and Morris (2017) proposed to use egocentric data of this form to estimate ERGMs by taking advantage of their sufficiency properties. Recall that an ERGM represents the overall probability of the random network Y as a function of the deviation of specific network configurations from those that would have been observed in a homogeneous random graph:
$$\Pr\nolimits_{\theta}(Y = y \mid x) = \frac{\exp\{\theta^{\top} g(y, x)\}}{\kappa(\theta, x)} \tag{1}$$
with κ(θ, x) being the normalising constant.
For any term (element) gk(y, x) of the (vector) statistic g(y, x), if its observed value is what we would expect given the other terms in the model, then the estimated value of θk will not be significantly different from zero.
Given a network y, the MLE for θ can be found by differentiating the log-likelihood to obtain the score function and finding its zero. For minimal exponential-family models, including non-curved ERGMs, the score function is simply the difference between the observed statistic g(y, x) and its expected value under the model μ(θ, x):
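$$\frac{\partial}{\partial \theta} \log \Pr\nolimits_{\theta}(Y = y \mid x) = g(y, x) - \mu(\theta, x) \tag{2}$$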
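The form of this score function follows in a few steps from the exponential-family form of the model. The sketch below assumes a non-curved ERGM with natural parameter θ, writes ℓ(θ) for the log-likelihood and $\mathcal{Y}$ for the set of potential networks, and uses the fact that κ(θ, x) sums the unnormalised probabilities over all networks; the symbol ℓ and the summation index y′ are introduced only for this sketch.

```latex
\begin{align*}
\ell(\theta) &= \theta^{\top} g(y, x) - \log \kappa(\theta, x),
  \qquad \kappa(\theta, x) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y', x)\}, \\
\frac{\partial}{\partial \theta} \log \kappa(\theta, x)
  &= \frac{\sum_{y' \in \mathcal{Y}} g(y', x) \exp\{\theta^{\top} g(y', x)\}}{\kappa(\theta, x)}
   = \mathrm{E}_{\theta}[g(Y, x)] = \mu(\theta, x), \\
\frac{\partial \ell(\theta)}{\partial \theta} &= g(y, x) - \mu(\theta, x).
\end{align*}
```

Setting the last line to zero shows that the MLE matches the expected statistic under the model to the observed statistic, which is why only g(y, x), and not the full network, is needed for estimation.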
The approach taken by Krivitsky and Morris (2017) was to restrict attention to a class of egocentric ERGMs, defined by a statistic g(y, x) that is a simple sum of the node-level contributions, i.e.,
$$g(y, x) = \sum_{i \in N} h(e_i) \tag{3}$$
for some (vector) function h(·), which embodies an individual node’s contribution to the network statistic. For such ERGMs, g(y, x) is simply a population total, and estimation of θ becomes a function of estimating this population total—a well-understood problem in classical statistics and in network sampling (Krivitsky, Handcock, and Morris 2011; Smith 2012; Gjoka, Smith, and Butts 2014, 2015)—and of obtaining the information needed to compute the normalising constant κ(θ, x), and/or μ(θ, x).
Estimating g(y, x) from the egocentric sample, yielding $\tilde{g}(e_S, x)$, can be done using design-based or model-based methods alike. These rely on simple scaling principles to estimate the population-level statistics from the sample statistics (incorporating survey weights as appropriate). Substituting the design-based estimate, $\tilde{g}(e_S, x)$, into (2) and solving for θ produces the above-mentioned pseudo-maximum-likelihood estimator (PMLE) (Binder 1983). Krivitsky and Morris (2017) (eq. 4.4) also derived the expression for the asymptotic variance of the PMLE $\tilde{\theta}$, completing the framework for principled statistical inference:
$$\mathrm{Var}(\tilde{\theta}) \approx \{\mathrm{Var}_{\tilde{\theta}}[g(Y, x)]\}^{-1} \, |N|^{2} \Sigma \, \{\mathrm{Var}_{\tilde{\theta}}[g(Y, x)]\}^{-1} \tag{4}$$
Here, $\{\mathrm{Var}_{\tilde{\theta}}[g(Y, x)]\}^{-1}$ is the approximate change in the estimate for a unit change in the network statistic g(y, x), and $|N|^{2} \Sigma$ is the variance of the estimate for g(y, x).
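The scaling step can be illustrated with a short Python sketch. It assumes a weighted-mean (Hájek-type) estimator of the population total and, for the variance, a simple random sample of egos with no finite population correction; both choices, and the function names, are assumptions made for illustration rather than the estimator of Krivitsky and Morris (2017). The data are made up.

```python
import numpy as np


def estimate_population_total(h, weights, pop_size):
    """Scale weighted per-ego contributions h(e_i) up to a population total.

    Assumed estimator: pop_size * (weighted mean of h over the sampled egos).
    """
    h = np.asarray(h, dtype=float)
    if h.ndim == 1:
        h = h[:, None]                      # one column per model statistic
    w = np.asarray(weights, dtype=float)
    weighted_mean = (w[:, None] * h).sum(axis=0) / w.sum()
    return pop_size * weighted_mean


def srs_variance_of_total(h, pop_size):
    """Variance of the estimated total, assuming a simple random sample of egos.

    Sigma is taken as (sample covariance of h) / (number of egos); the
    finite population correction is ignored in this sketch.
    """
    h = np.asarray(h, dtype=float)
    if h.ndim == 1:
        h = h[:, None]
    n_egos = h.shape[0]
    sigma = np.atleast_2d(np.cov(h, rowvar=False)) / n_egos
    return pop_size ** 2 * sigma


# Made-up example: four egos reporting degrees 0, 1, 2, 3 in a population of 1000.
degrees = np.array([0, 1, 2, 3])
h_edges = degrees / 2.0                     # each tie is shared by two nodes
edges_hat = estimate_population_total(h_edges, np.ones(4), pop_size=1000)
edges_var = srs_variance_of_total(h_edges, pop_size=1000)
```

The edge count is used here because its per-ego contribution, half of ego's degree, requires nothing beyond the number of alters.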
2.3. Identifiable statistics with egocentric sampling
In discussing the type of data needed for the framework of Krivitsky and Morris (2017), it is helpful to distinguish between what is needed to estimate the network statistic (g(y, x)) and what is needed over and above that to fit the model.
The primary condition for identifiability of a model term is that it can be recovered from an egocentric census of the same design. For some model terms—especially dyad-independent ones—this condition is intuitive and quite easily proven. For other model terms, the intuition may not be as clear, and the proof may be more complicated, but the principle is the same.
In Krivitsky and Morris (2017), the “minimal” egocentric sampling design was shown to support inference for model terms representing nodal attribute based heterogeneity in degree sequences, degree mean and mixing, but not triadic effects. They also suggested, and a subsequent paper (Krivitsky, Bojanowski, and Morris 2019) demonstrated, that some triadic closure terms are estimable provided that data contains information about the relationships between the alters. The set of identifiable terms includes both dyad-independent and dyad-dependent terms.
While the importance of the alter matrix for identifiability of triadic effects is obvious, the fact that these ties do not automatically support estimation of all triadic terms is not as intuitively clear. As shown by Krivitsky, Bojanowski, and Morris (2019), some types of triadic terms, such as the dyadwise shared partners, cannot be recovered from an egocentric census with alter matrix data.
2.4. Estimable covariates from egocentric sampling
While g(y, x) for an egocentric statistic only requires covariate information about extant edges in y, κ(θ, x) in the likelihood (and μ(θ, x) in the score function) require information to predict the probability of any possible network under the model, which implies being able to predict the probability of any specific edge. That is, if g(y, x) depends on some dyadic property, then estimation requires being able to estimate or evaluate that property for every (i, j) in the set of potential relationships $\mathbb{Y}$, or, at the very least, to obtain a highly accurate estimate of its joint distribution with the other covariates. We discuss ways to avert this limitation during the questionnaire design stage in Section 3. Some potential predictors, such as kinship, are fundamentally dyadic (Krivitsky, Bojanowski, and Morris 2019). These can contribute to defining y, but they cannot be used as predictors in the framework.
Krivitsky, Handcock, and Morris (2011), Smith (2012), Gjoka, Smith, and Butts (2014), Gjoka, Smith, and Butts (2015), Krivitsky and Morris (2017), and Krivitsky, Bojanowski, and Morris (2019) discuss how the choices made in data collection affect the statistics and coefficients that can be estimated, and Schweinberger et al. (2020) place this in a more general framework considering targets of inference. We now expand on this discussion, considering the implications of two types of design decisions:
Questionnaire measures: What questions are asked and what scales are used to record the response?
Sampling: What are the sampling designs for egos and for alters? Note that alters are sampled by means of a “name generator” in the questionnaire.
We discuss these in turn.
3. Design considerations: measurement
A name generator is the most common tool for eliciting personal networks through survey interviews (Laumann 1966 describes the pioneering applications). While research has identified a number of reliability issues with this approach (see Marin and Hampton 2007 for an overview), it is the de facto mainstay for egocentric surveys. The explicit question used to generate alter names from ego is what operationalises the social relationship of interest. There are broad classes of possible relationships defined by behaviour (e.g., exchange), social position (e.g., supervisor), or affect (e.g., like) (Fischer 1982). Examples from the literature include alters with whom egos discuss important matters (Burt 1984), have regular chats or visits (Campbell and Lee 1991), stay in a specific role (Hampton and Wellman 2003), maintain daily contacts (Fu 2005), exchange social support (Wellman and Wortley 1990; Wellman 1979), or have sex (Laumann et al. 1994).
Name generator questions are typically followed by name interpreter questions designed to gather further information about alter attributes (such as age or gender) and attributes of ego–alter relations (such as how close the ego feels towards the alter). Egos can also be asked to report on the relations among the alters (the alter–alter matrix), although this is less common.
There is a substantial literature devoted to assessing the quality of measurement of personal networks in egocentric surveys, including reliability of reports (Poel 1993; Marin and Hampton 2007), biases in respondents’ recall (Brewer and Yang 1994; Brewer 1997), wording effects (Straits 2000), mode and interviewer effects (Harling et al. 2018; Fischer and Bayham 2019; Marsden 2003), and survey context effects (Bailey and Marsden 1999).
Our goal here is different. We seek to highlight how consistency of measurement, a topic that often escapes attention, influences estimation in the context of a generative statistical model for the network, specifically an ERGM. As noted in Section 2.3 and by Krivitsky, Bojanowski, and Morris (2019), an ERGM seeks to make inferences about a well-defined, partially observed, population network. This requires consistent measurements both for ego and alter attributes and for ego–alter and alter–alter ties.
3.1. Actor attributes
Terms that capture actor attributes play a large role in dyad-independent ERGMs. Measurement of these attributes, for ego and alter, influences what we can estimate.
If we measure an attribute only for egos we can estimate the “main effect” of that attribute on the existence of a tie. This type of term in a model allows for heterogeneity in tie prevalence by attribute: $g_k(y, x) = \sum_{\{i,j\}:\, y_{i,j} = 1} (x_{i,k} + x_{j,k})$, with xi,k either a quantitative covariate or a dummy variable indicating group membership. The corresponding $h_k(e_i) = |y_i| \, x_{i,k}$.
If we measure the same attribute for alters, we get two additional benefits. First, the main effect can be estimated with greater accuracy via $h_k(e_i) = \frac{1}{2} \sum_{j \in y_i} (x_{i,k} + x_{j,k})$, which is likely to be less variable than the variant based solely on the ego’s attribute. Second, we can now estimate a wide range of selective mixing terms, the most well known of which is homophily. For example, a count of homophilous ties on categorical attribute k, $g_k(y, x) = \sum_{\{i,j\}:\, y_{i,j} = 1} \mathbb{I}(x_{i,k} = x_{j,k})$ (with $\mathbb{I}(\cdot)$ the indicator function), is recovered by $h_k(e_i) = \frac{1}{2} \sum_{j \in y_i} \mathbb{I}(x_{i,k} = x_{j,k})$; and the effect of the difference in a quantitative attribute, $g_k(y, x) = \sum_{\{i,j\}:\, y_{i,j} = 1} |x_{i,k} - x_{j,k}|$, is recovered by $h_k(e_i) = \frac{1}{2} \sum_{j \in y_i} |x_{i,k} - x_{j,k}|$. A much broader class of mixing terms can be defined and estimated, however. Examples include differential homophily, a single dummy term to capture a specific attribute pairing, a saturated model that will estimate the relative likelihood of any categorical pairing, upper/lower triangle terms to capture asymmetric effects for ordinal attributes, and other functions such as geographic distance.
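The way per-ego contributions add up to whole-network statistics can be checked on a toy example. The Python sketch below builds a small made-up network, computes the main-effect, homophily and absolute-difference statistics directly from the edge list, and then recomputes them from an egocentric census in which each ego contributes half of each tie it reports. All data and names are hypothetical.

```python
# Toy undirected network: 4 actors, 4 ties (made-up data).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
group = {0: "a", 1: "a", 2: "b", 3: "b"}   # categorical attribute
age = {0: 20, 1: 25, 2: 40, 3: 42}          # quantitative attribute

# Neighbour sets y_i, as each ego would report them.
nodes = sorted(group)
nbrs = {i: set() for i in nodes}
for i, j in edges:
    nbrs[i].add(j)
    nbrs[j].add(i)

# Whole-network statistics, computed from the full edge list.
g_main = sum(age[i] + age[j] for i, j in edges)
g_homophily = sum(group[i] == group[j] for i, j in edges)
g_absdiff = sum(abs(age[i] - age[j]) for i, j in edges)


# Per-ego contributions, using only ego's own attributes, degree and alter attributes.
def h_main_ego_only(i):
    return len(nbrs[i]) * age[i]


def h_main_with_alters(i):
    return 0.5 * sum(age[i] + age[j] for j in nbrs[i])


def h_homophily(i):
    return 0.5 * sum(group[i] == group[j] for j in nbrs[i])


def h_absdiff(i):
    return 0.5 * sum(abs(age[i] - age[j]) for j in nbrs[i])


# Summing over an egocentric census recovers each whole-network statistic.
census_main_ego_only = sum(h_main_ego_only(i) for i in nodes)
census_main_with_alters = sum(h_main_with_alters(i) for i in nodes)
census_homophily = sum(h_homophily(i) for i in nodes)
census_absdiff = sum(h_absdiff(i) for i in nodes)
```

The factor of one half appears because every tie is reported twice in a census, once by each of its endpoints.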
To obtain these benefits it is essential that both ego and alter attribute question wordings and their response scales be comparable. This is where egocentric surveys often come up short, in a variety of ways. The least problematic case is when the scale used to measure an attribute for alters is less granular than the scale used for egos. An example from the GSS, shown in Table 1, is the measurement of political party affiliation (National Opinion Research Center 2018). For ego, this is a 7-point scale that includes the 3 main affiliations along with an intensity level for the ends and a leaning for the middle. It is straightforward in this case to collapse the ego categories to match the alter scale. The measurement of age in the Polish survey “People in Networks” (Mach, Manterys, and Sadowski 2018) is another example: egos have a single-year scale, whereas alters have a set of 10-year age intervals, so the ego scale can be harmonised to the alter.
Table 1:
Scale for political party affiliation used in the General Social Survey. Note the higher, but compatible, granularity of the ego scale.
| Ego scale | Alter scale |
|---|---|
| Strong democrat | Democrat |
| Not strong democrat | |
| Ind,near democrat | |
| Independent | Independent |
| Ind,near republican | |
| Not strong republican | |
| Strong republican | Republican |
| Other party | Other party |
| Don’t know | Don’t know |
| No answer | No answer |
A more difficult case is when the attributes of egos and alters are measured with scales that are not compatible. An example of this is the measurement of race/ethnicity in the 2004 GSS survey. Ego’s race (multiple choice) has 15 categories, but alter’s has 4 (aside from “Other”, “Don’t know” and “No answer”). Two of the categories match: White and Hispanic. For alters, the other two categories are “Black” and “Asian”. Egos include “Black or African American”—so the wording is different—and a range of additional categories that includes “Asian Indian”, along with specific Asian countries of origin. This raises the question of how egos would report the race of alters from India: as “Asians”, or as something else? The answer may depend on the ego’s origin. To avoid ambiguity and misclassification, harmonisation would require collapsing Asian, and Asian-related, into the “Other” category for both ego and alter, which is a significant loss of information. The measurement of race and ethnicity itself has changed over the years: in the 1985 GSS, ego’s race (Black/White/Other) is asked separately from ethnicity (which has 42 categories), but alter race is coded as in 2004. In addition, by the 2004 GSS, egos were allowed to report multiple races for themselves.
The difficulty in defining race contributes to this problem and is reflected in the way measurement strategies have changed over the years—not just in the GSS. But the topic of homophily by race in the US is of sufficient interest that race was one of only five alter attributes included, so this measurement incompatibility is unfortunate.
An alternative measurement strategy is to instead ask if alter has the same attribute as ego (Fischer 2018). (This is in addition to asking for the ego’s attribute, without which the statistic may be estimable but the ERGM not, per Section 2.4.) This measure may reduce the cognitive burden on the respondent, and it is a more direct measure of whether ego perceives this alter as similar to themselves. But this restricts analysis to homophily only (uniform or differential). It does not allow for more nuanced selective mixing patterns, where the off-diagonal ties are not distributed at random. In addition, it does not guarantee that what ego means when they say alter is “the same” as them corresponds to the measurement scale used for the ego.
Ideally, the attribute for both egos and alters should be measured using comparable questions and identical or at least compatible scales.
3.2. Alter–Alter Ties
It is possible to estimate models with terms for triadic effects if the data include information on ties between the alters. Information about ego–alter ties is collected via the original name generator, while information about alter–alter ties is collected with a separate matrix or list item (Smith 2015; Krivitsky, Bojanowski, and Morris 2019). For these to provide consistent information on the same relation in the population network, the tie descriptor questions in both sections need to be comparable. Sometimes this happens (Cornwell et al. 2009), but often it does not.
For example, the GSS in all years uses the specific wording “discussing important matters” for the ego–alter name generator, and then asks the tie descriptor question “Which of these people do you feel especially close to?” This is a binary response (close/not close). For the alter–alter ties ego is instead asked whether a given pair of alters are “especially close”, “know each other”, or are “total strangers” (Burt 1984). This makes the ego–alter ties inconsistent with the alter–alter ties, for two reasons. First, the name generator for ego–alter is not used for the alter–alter nominations. As a result, we do not know whether the alters discuss important matters with each other (unless they are total strangers), but that is the specific tie we observe for ego–alter. Second, and relatedly, there is no way to collapse the alter “close” categories to match the ego: the “total stranger” category is not possible for the ego measure, as all of their nominated alters (who are discussion partners) are known. The discrepancy makes it impossible to estimate triadic effects for discussion partners: there may be triads visible, but the ties that make up the triad are not all the same. It is also not possible to estimate triads based on closeness, as this was not the name generator for ego–alter ties. Instead, we can only construct and analyse a mixture of tie types, which makes it impossible to interpret the results. The same issue is found in the longitudinal egocentric survey, “Understanding How Personal Networks Change” (Fischer 2018). The wording for that survey is identical to the GSS, an example of how an unanticipated problem, once it has become a standard in the field, may diffuse to many studies.
3.3. When network measures do not support ERGMs: tie descriptors only
A common strategy for measuring social support networks is to ask who a respondent would turn to for help with specific needs. This strategy was used in the International Social Survey Programme in 1986, 2001 and 2017 (GESIS Data Archive 2018), and it does not measure alter attributes but only records a tie descriptor (e.g., parent, sibling, spouse, friend). These data do not support modelling with ERGMs.
The reason is that this type of network data does not ask about tie existence (or formation/dissolution); it asks about what the tie is used to exchange. The survey may provide information on the attributes of ego, but there is no corresponding information on the attributes of alters. In the social support example above, one could imagine an ERGM for a friendship or spousal network, but absent information on the alters it is not possible to model how these friends or spouses were selected. The other kin relations are generated by demographic processes, not by the type of process an ERGM represents: one does not choose one’s biological family.
4. Design considerations: sampling of egos
The egocentric inference methodology relies on classical survey sampling inference to estimate the network statistics of interest—as population totals—and quantify their uncertainty. This permits it to take advantage of a variety of common sampling designs used for egocentric data, including stratification and cluster sampling. Indeed, Laumann et al. (1994) write about the National Health and Social Life Survey (NHSLS) that “the sample design for the NHSLS is the most straightforward element of our methodology because nothing about probability sampling is specific to or changes in a survey of sexual behaviour” (p. 43), and their design used elements of cluster sampling (e.g., individuals within households), oversampling (of Blacks and Hispanics), and post-stratification to account for non-response (Laumann et al. 1992, 7–8). Similarly, the US General Social Survey (Burt 1984) uses a combination of multi-stage cluster sampling and stratified/quota sampling to select respondents (Smith et al. 2019, App. A). Thus, standard references (Lohr 2019, among many others) can be used to select, and adjust for the selection of, egos. However, the inference does not end at the network statistics. Indeed, as we show below, samples that are optimal for statistics are not necessarily optimal for estimating model parameters.
The expression for $\operatorname{var}(\hat\theta)$ (4) depends on two components: the effect of a unit change in the estimate for $g(y, x)$ on $\hat\theta$; and $|N|^2 \Sigma$, the variance-covariance matrix of the estimates for $g(y, x)$. The former depends on the model specification, the composition of the population network, and the specific value of $\theta$. None of these are under the control of the survey designer: model specification and population network size and composition are driven (in principle) by substantive considerations and the feasibility of collecting the needed data, and $\hat\theta$ is estimated from the data after the data collection. Thus, at the data collection stage, the only determinant of $\operatorname{var}(\hat\theta)$ under the designer’s control is $\Sigma$.
In our framework, $\Sigma$’s elementwise effect on $\operatorname{var}(\hat\theta)$ is fairly direct: it is generally the case that the variance of a given element of $\hat\theta$ is most strongly affected by the corresponding diagonal element of $\Sigma$. For example, other things being equal, reducing $\Sigma_{k,k}$ by a factor of 4 will reduce the standard error of $\hat\theta_k$ by a factor of around 2. Since $\Sigma$ is the variance of an estimate for a population mean (or scaled population total), standard sample selection techniques (stratification, cluster sampling, etc.) apply, and we refer the reader to the standard reference texts on social survey designs (such as Kish 1995; Sampath 2001; Thompson 2012) for general discussions of efficient sampling for estimation of population means and totals.
However, sample selection often presents trade-offs. Social surveys—including those containing egocentric data—often deliberately oversample some subpopulations to facilitate more precise estimation for those subpopulation quantities, at the cost of less precise estimation for the overall population (Laumann et al. 1992, for example). In the context of egocentric inference for ERGMs, this means that some statistics (i.e., those incident on oversampled groups) may be estimated more precisely, while others less so, which propagates to the parameter estimates.
Unfortunately, the form of this propagation can be nontrivial, and designing samples to maximise accuracy of ERGM estimates requires simplifying assumptions, simulations, or both. An example follows.
4.1. Example: Stratified sampling for estimating homophily
The detailed derivation is provided in the Supplementary Materials, and we summarise the results here.
Consider a population network with one binary actor attribute, $x_i \in \{1, 2\}$, and let $N_1$ and $N_2$ be the sets of actors in the two groups defined by $x$. We wish to model uniform homophily on this attribute, net of the two groups’ overall tendencies to have ties. This can be operationalised in an ERGM with a sociality effect $g_k(y, x) = \sum_{(i,j) \in y} \{1(x_i = k) + 1(x_j = k)\}$ for $k = 1, 2$ and a uniform homophily effect $g_3(y, x) = \sum_{(i,j) \in y} 1(x_i = x_j)$.
Following Krivitsky and Morris (2017), we assume an offset model of Krivitsky, Handcock, and Morris (2011) and use their asymptotic results, resulting in the following logit form:
$$\operatorname{logit} \Pr(Y_{i,j} = 1 \mid x) = -\log|N| + \sum_{k=1}^{2} \theta_k \{1(x_i = k) + 1(x_j = k)\} + \theta_3 1(x_i = x_j).$$
This is a simple dyad-independent ERGM.
To estimate this model, suppose that we draw a stratified sample of total size $|S|$, taking $|S_1|$ egos from $N_1$ and $|S_2|$ from $N_2$. For the model in question,
$$h_k(e_i) = 1(x_i = k)\,|y_i| \quad (k = 1, 2), \qquad h_3(e_i) = \tfrac{1}{2} \sum_{j \in y_i} 1(x_i = x_j).$$
An unbiased estimator for $g(y, x)$ from such a sample would be
$$\hat g(y, x) = \sum_{k=1}^{2} \frac{|N_k|}{|S_k|} \sum_{i \in S_k} h(e_i),$$
and ERGM estimation could proceed from there. For the purposes of illustration, we will assume that we know the population composition (i.e., $|N_1|$ and $|N_2|$) and we have a good idea of the value of $\theta$. In practice, these could be obtained from prior studies, population censuses, and other sources.
We show in the supplement that if our sole priority is to reduce the variance of our estimate for the homophily statistic (g3(y)), we should oversample the larger group: as a first approximation, set $|S_k| \propto |N_k|^{3/2}$, and for optimal allocation also take into account the strength of the homophily and the mean degree of the groups involved. Intuitively, this is because the number of homophilous dyads within a group is proportional to the square of its size, so it makes sense to obtain a disproportionately larger sample from a larger group.
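One way to see where a 3/2 power could come from is the classical Neyman allocation argument. The following is a sketch under stated assumptions, not the derivation given in the supplement: it assumes that an ego’s contribution to the homophily statistic is half its number of same-group alters, that this number is approximately Poisson with mean $d_k$ in group $k$, that ties follow a dyad-independent logit model with a $-\log|N|$ offset, and that finite-population corrections can be ignored. The symbols $\sigma_k$ and $d_k$ are introduced here for illustration.

```latex
\begin{align*}
\operatorname{var}(\hat g_3) &\approx \sum_{k} \frac{|N_k|^2 \sigma_k^2}{|S_k|}
  && \text{(variance of a stratified total)} \\
|S_k| &\propto |N_k|\,\sigma_k
  && \text{(minimiser for fixed total sample size } |S| \text{)} \\
\sigma_k^2 &\approx \tfrac{1}{4} d_k, \qquad
  d_k \approx \frac{|N_k|}{|N|}\, e^{2\theta_k + \theta_3}
  && \text{(Poisson count of same-group alters)} \\
|S_k| &\propto |N_k|^{3/2}\, e^{\theta_k + \theta_3/2}
  && \text{(substituting } \sigma_k \text{)}
\end{align*}
```

Dropping the exponential factor leaves the 3/2-power rule as a first approximation; keeping it brings in the strength of the homophily and the groups’ activity levels.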
However, our ultimate goal is to estimate model parameters in the presence of other parameters. The optimal sample allocation to minimise the variance of $\hat\theta_3$ does not have a simple form, but as the variance has a closed form, it can be optimised numerically.
To illustrate this, three synthetic population networks each of size 200,000 were simulated from the above-specified ERGM:
Population 1: 25% of actors in Group 1, 75% of actors in Group 2, positive group homophily
Population 2: 25% of actors in Group 1, 75% of actors in Group 2, negative group homophily (i.e., heterophily)
Population 3: 12.5% of actors in Group 1, 87.5% of actors in Group 2 (i.e., a more lopsided population than the others), positive group homophily
The parameters and the statistics are given in Table 2. Outside of rare special cases, it is not possible to construct a network for which the maximum likelihood estimator will exactly equal a specified value. Therefore, we refit the model to the simulated population networks to obtain the “actual” true population parameters.
Table 2:
Synthetic population networks; note that parameters are net of the −log|N| network size adjustment.
| Quantity | Term | Population 1 | Population 2 | Population 3 |
|---|---|---|---|---|
| Group size | Group 1 | 50,000 | 50,000 | 25,000 |
| | Group 2 | 150,000 | 150,000 | 175,000 |
| Parameter (simulation) | θ1 (Group 1) | 0.00 | 0.50 | 0.00 |
| | θ2 (Group 2) | −0.50 | 0.00 | −0.50 |
| | θ3 (homophily) | 1.00 | −1.00 | 1.00 |
| Statistic | g1 (Group 1) | 56,712 | 73,972 | 21,296 |
| | g2 (Group 2) | 135,768 | 103,118 | 166,810 |
| | g3 (homophily) | 73,412 | 26,957 | 81,103 |
| Parameter (actual) | θ1 (Group 1) | 0.00 | 0.49 | −0.02 |
| | θ2 (Group 2) | −0.50 | 0.00 | −0.51 |
| | θ3 (homophily) | 1.00 | −1.00 | 1.02 |
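Because the model is dyad-independent, the expected values of the statistics in Table 2 can be approximated in closed form. The sketch below assumes that the tie log-odds equal −log|N| plus θ1 times the number of endpoints in Group 1, plus θ2 times the number of endpoints in Group 2, plus θ3 if both endpoints are in the same group; the function names are hypothetical. The simulated statistics in Table 2 are single realisations, so only approximate agreement (within a few percent) should be expected.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def expected_stats(n1, n2, theta):
    """Expected (g1, g2, g3) under the assumed two-group logit model."""
    t1, t2, t3 = theta
    offset = -math.log(n1 + n2)                 # network size adjustment
    # Expected tie counts within Group 1, within Group 2, and between groups.
    e11 = n1 * (n1 - 1) / 2 * expit(offset + 2 * t1 + t3)
    e22 = n2 * (n2 - 1) / 2 * expit(offset + 2 * t2 + t3)
    e12 = n1 * n2 * expit(offset + t1 + t2)
    g1 = 2 * e11 + e12     # tie endpoints falling in Group 1
    g2 = 2 * e22 + e12     # tie endpoints falling in Group 2
    g3 = e11 + e22         # ties with both endpoints in the same group
    return g1, g2, g3

# Group sizes and simulation parameters from Table 2.
populations = {
    1: (50_000, 150_000, (0.0, -0.5, 1.0)),
    2: (50_000, 150_000, (0.5, 0.0, -1.0)),
    3: (25_000, 175_000, (0.0, -0.5, 1.0)),
}
expected = {p: expected_stats(*spec) for p, spec in populations.items()}
for p, stats in expected.items():
    print(p, [round(s) for s in stats])
```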
Each population network was then egocentrically sampled with stratification on group, with total sample size |S| = 200, using each of the following five schemes:
1. Sampling proportional to group size ($|S_k| \propto |N_k|$).
2. Equal sample size for both groups. This is effectively oversampling the smaller group.
3. $|S_k| \propto |N_k|^{3/2}$. This is effectively oversampling the larger group.
4. Optimal allocation for g3(y) (optimised numerically).
5. Optimal allocation for θ3(y) (optimised numerically).
The resulting allocations are shown in Table 3.
Table 3:
Stratum sample sizes.
| Scheme | Pop. 1, Group 1 | Pop. 1, Group 2 | Pop. 2, Group 1 | Pop. 2, Group 2 | Pop. 3, Group 1 | Pop. 3, Group 2 |
|---|---|---|---|---|---|---|
| 1. Proportional to size | 50 | 150 | 50 | 150 | 25 | 175 |
| 2. Equal (oversample smallest) | 100 | 100 | 100 | 100 | 100 | 100 |
| 3. 3/2-power (oversample largest) | 32 | 168 | 32 | 168 | 10 | 190 |
| 4. Optimal for g3(y) | 48 | 152 | 48 | 152 | 16 | 184 |
| 5. Optimal for θ3(y) | 81 | 119 | 94 | 106 | 73 | 127 |
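The first three allocations follow directly from the group sizes and can be recomputed as in the sketch below. The fourth is approximated here by Neyman allocation with a Poisson variance for the number of same-group alters; that is an assumption of this illustration rather than the numerical optimisation used for the table, although it yields the same integers for these three populations. Scheme 5 requires the full variance expression and is not reproduced. All function names are hypothetical.

```python
import math

def largest_remainder(weights, total):
    """Integer allocation proportional to `weights`, summing exactly to `total`."""
    s = sum(weights)
    raw = [total * w / s for w in weights]
    alloc = [math.floor(r) for r in raw]
    # Hand the leftover units to the strata with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda k: raw[k] - alloc[k], reverse=True)
    for k in order[: total - sum(alloc)]:
        alloc[k] += 1
    return alloc

def same_group_degree(sizes, theta):
    """Approximate mean number of same-group alters per ego in each group,
    assuming log-odds -log|N| + 2*theta_k + theta_3 for a within-group tie."""
    n = sum(sizes)
    t3 = theta[-1]
    return [(nk - 1) / (1.0 + math.exp(math.log(n) - 2 * tk - t3))
            for nk, tk in zip(sizes, theta)]

def allocations(sizes, theta, total):
    d = same_group_degree(sizes, theta)
    return {
        "proportional": largest_remainder(list(sizes), total),
        "equal": largest_remainder([1] * len(sizes), total),
        "3/2-power": largest_remainder([nk ** 1.5 for nk in sizes], total),
        # Neyman allocation: weight N_k * sd_k with sd_k proportional to sqrt(d_k).
        "neyman_g3": largest_remainder(
            [nk * math.sqrt(dk) for nk, dk in zip(sizes, d)], total),
    }

pop1 = allocations((50_000, 150_000), (0.0, -0.5, 1.0), 200)
pop2 = allocations((50_000, 150_000), (0.5, 0.0, -1.0), 200)
pop3 = allocations((25_000, 175_000), (0.0, -0.5, 1.0), 200)
for name, result in (("Population 1", pop1), ("Population 2", pop2), ("Population 3", pop3)):
    print(name, result)
```

The largest-remainder rounding is a design choice of the sketch; it guarantees that the stratum sizes sum to the intended total.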
We then generated 4,096 egocentric samples from each population using each of these schemes, computed the statistics, and fit the model. The biases for the estimators, shown in Table 4, are small, with the exception of $\hat\theta_1$ for those scenarios (e.g., Population 3, Scheme 3) for which the sample size in Group 1 is particularly small. This is likely to be because even in ordinary logistic regression, the mapping from g(Y, x) to θ is nonlinear, so even unbiased variation in g(Y, x) can bias $\hat\theta$ (Firth 1993; Krivitsky and Morris 2017), and a small sample size in Group 1 results in a particularly noisy estimate for $g_1$ and therefore a particularly severe bias for $\hat\theta_1$.
Table 4:
Biases of the estimators for the sampling schemes for the three stratified populations, normalised by the standard deviation of the corresponding estimator under Scheme 1.
| Scheme | g1 | g2 | g3 | θ1 | θ2 | θ3 |
|---|---|---|---|---|---|---|
| Population 1 | | | | | | |
| 1. Proportional to size | −0.01 | 0.00 | 0.00 | −0.10 | −0.05 | 0.03 |
| 2. Equal (oversample smallest) | −0.03 | −0.05 | −0.05 | −0.09 | −0.09 | 0.03 |
| 3. 3/2-power (oversample largest) | −0.01 | 0.00 | −0.01 | −0.13 | −0.01 | −0.01 |
| 4. Optimal for g3(y) | 0.01 | 0.00 | 0.00 | −0.08 | −0.03 | 0.01 |
| 5. Optimal for θ3(y) | 0.00 | 0.01 | 0.00 | −0.07 | −0.03 | 0.01 |
| Population 2 | | | | | | |
| 1. Proportional to size | 0.01 | 0.02 | 0.02 | −0.11 | 0.07 | −0.12 |
| 2. Equal (oversample smallest) | 0.00 | 0.02 | 0.02 | −0.03 | −0.01 | −0.07 |
| 3. 3/2-power (oversample largest) | 0.02 | 0.01 | 0.01 | −0.18 | 0.14 | −0.19 |
| 4. Optimal for g3(y) | 0.02 | −0.01 | 0.01 | −0.07 | 0.03 | −0.10 |
| 5. Optimal for θ3(y) | 0.02 | 0.02 | 0.03 | −0.02 | −0.02 | −0.05 |
| Population 3 | | | | | | |
| 1. Proportional to size | 0.03 | −0.03 | −0.03 | −0.15 | 0.04 | −0.07 |
| 2. Equal (oversample smallest) | 0.00 | 0.02 | 0.02 | −0.08 | −0.04 | 0.02 |
| 3. 3/2-power (oversample largest) | 0.03 | 0.00 | 0.00 | −1.02 | 0.78 | −0.79 |
| 4. Optimal for g3(y) | 0.00 | 0.01 | 0.01 | −0.38 | 0.19 | −0.20 |
| 5. Optimal for θ3(y) | −0.01 | −0.02 | −0.02 | −0.09 | −0.04 | 0.02 |
We now discuss the standard deviations of estimators for the network statistics and for the model parameters, reported in Table 5. The effect of sample allocation on the precision of estimates of statistics is straightforward: oversampling a specific group increases the precision of the estimate of its statistic. Naive (Scheme 3) oversampling of the larger group does not appear to increase the precision of the homophily statistic, but optimal oversampling does so slightly. It is less straightforward for the parameters, however: oversampling smaller strata uniformly (Scheme 2) improves the estimates for parameters, with the optimal allocation (Scheme 5) improving the homophily parameter’s standard deviation by between 5% and 24%, depending on the parameters and the population distribution.
Table 5:
Standard deviations of estimators for the sampling schemes for the three stratified populations, relative to sampling in proportion to stratum size. A value less than one represents a gain in efficiency from the design.
| Scheme | g1 | g2 | g3 | θ1 | θ2 | θ3 |
|---|---|---|---|---|---|---|
| Population 1 | | | | | | |
| 2. Equal (oversample smallest) | 0.79 | 1.15 | 1.08 | 0.96 | 0.99 | 0.99 |
| 3. 3/2-power (oversample largest) | 1.21 | 0.94 | 1.01 | 1.15 | 1.15 | 1.15 |
| 4. Optimal for g3(y) | 1.03 | 0.98 | 0.97 | 1.00 | 1.03 | 1.01 |
| 5. Optimal for θ3(y) | 0.85 | 1.09 | 1.03 | 0.94 | 0.96 | 0.95 |
| Population 2 | | | | | | |
| 2. Equal (oversample smallest) | 0.93 | 1.16 | 1.12 | 0.86 | 0.88 | 0.85 |
| 3. 3/2-power (oversample largest) | 1.16 | 1.04 | 1.05 | 1.28 | 1.29 | 1.29 |
| 4. Optimal for g3(y) | 1.02 | 1.02 | 1.01 | 1.00 | 1.02 | 1.02 |
| 5. Optimal for θ3(y) | 0.94 | 1.13 | 1.10 | 0.86 | 0.87 | 0.85 |
| Population 3 | | | | | | |
| 2. Equal (oversample smallest) | 0.75 | 1.30 | 1.28 | 0.78 | 0.77 | 0.76 |
| 3. 3/2-power (oversample largest) | 1.51 | 0.95 | 0.98 | 4.07 | 4.04 | 4.03 |
| 4. Optimal for g3(y) | 1.22 | 0.96 | 0.98 | 2.02 | 1.96 | 1.95 |
| 5. Optimal for θ3(y) | 0.75 | 1.14 | 1.12 | 0.78 | 0.77 | 0.76 |
That is, what is optimal for statistics is not necessarily optimal for parameters, and vice versa.
4.2. Example: Stratified Sampling to Estimate Differential Homophily
To further illustrate the difference between estimation of statistics and estimation of parameters, and to explore the effects of sampling on estimation, consider a toy population and sampling designed to mimic the race distribution in the US General Social Survey (Burt 1984), with three groups comprising 10%, 20%, and 70% of the population and positive within-group homophily, using an ERGM with $x_i \in \{1, 2, 3\}$ and six statistics: for each group $k = 1, 2, 3$, an activity statistic $g_k(y, x) = \sum_{(i,j) \in y} \{1(x_i = k) + 1(x_j = k)\}$ and a (differential) homophily statistic $g_{3+k}(y, x) = \sum_{(i,j) \in y} 1(x_i = x_j = k)$. Again, we use the offset of Krivitsky, Handcock, and Morris (2011), resulting in the following logit form:
$$\operatorname{logit} \Pr(Y_{i,j} = 1 \mid x) = -\log|N| + \sum_{k=1}^{3} \theta_k \{1(x_i = k) + 1(x_j = k)\} + \sum_{k=1}^{3} \theta_{3+k} 1(x_i = x_j = k).$$
The population composition, parameters, and statistics are summarised in Table 6, and the sample sizes under the three basic sampling schemes (proportional to size, oversample smallest, oversample largest) are given in Table 7.
Table 6:
Three-group synthetic population network; note that parameters are net of the −log|N| network size adjustment.
| Quantity | Term | Group 1 | Group 2 | Group 3 |
|---|---|---|---|---|
| Size | | 20,000 | 40,000 | 140,000 |
| Parameters (simulation) | activity (θ1,θ2,θ3) | 0.00 | 0.00 | 0.25 |
| | homophily (θ4,θ5,θ6) | 3.00 | 2.00 | 1.00 |
| Statistics | activity (g1,g2,g3) | 60,557 | 98,901 | 492,816 |
| | homophily (g4,g5,g6) | 19,390 | 29,516 | 219,560 |
| Parameters (actual) | activity (θ1,θ2,θ3) | −0.00 | 0.00 | 0.25 |
| | homophily (θ4,θ5,θ6) | 2.97 | 1.99 | 1.01 |
Table 7:
Stratum sample sizes for the three-group population.
| Scheme | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| 1. Proportional to size | 150 | 300 | 1050 |
| 2. Equal (oversample smallest) | 500 | 500 | 500 |
| 3. 3/2-power (oversample largest) | 67 | 190 | 1243 |
The simulation results, given in Table 8, are unsurprising for statistics: oversampling a particular group leads to more precise estimates of statistics for both its “main effect” and its within-group homophily. On the other hand, we see, as before, that oversampling the smaller groups improves the accuracy of all parameters, including the activity and the homophily of the larger group. The biases for estimating the parameters are also slightly reduced.
Table 8:
Biases and standard deviations of estimators for the sampling schemes for the three-group population. All values are normalised by the standard deviation of the corresponding estimator under Scheme 1. A relative standard deviation of less than one represents a gain in efficiency from the design.
| Scheme | g1 | g2 | g3 | g4 | g5 | g6 | θ1 | θ2 | θ3 | θ4 | θ5 | θ6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Biases | | | | | | | | | | | | |
| 1. Proportional to size | 0.02 | 0.01 | 0.00 | 0.01 | −0.01 | −0.01 | −0.09 | −0.07 | 0.07 | 0.09 | 0.07 | −0.07 |
| 2. Equal (oversample smallest) | 0.00 | 0.00 | 0.06 | 0.00 | 0.01 | 0.07 | −0.08 | −0.07 | 0.04 | 0.09 | 0.07 | −0.04 |
| 3. 3/2-power (oversample largest) | 0.00 | 0.00 | 0.01 | −0.01 | 0.00 | 0.01 | −0.15 | −0.12 | 0.11 | 0.13 | 0.11 | −0.10 |
| Standard deviations (relative to Scheme 1) | | | | | | | | | | | | |
| 2. Equal (oversample smallest) | 0.68 | 0.89 | 1.48 | 0.55 | 0.78 | 1.49 | 0.78 | 0.82 | 0.80 | 0.75 | 0.81 | 0.81 |
| 3. 3/2-power (oversample largest) | 1.48 | 1.21 | 0.94 | 1.51 | 1.26 | 0.92 | 1.31 | 1.35 | 1.36 | 1.33 | 1.34 | 1.34 |
4.3. Example: Degree effects estimation
Additionally, we illustrate the impact of weighted sampling on some dyad-dependent effects. In the interests of brevity, we reuse the composition and the sampling designs from Population 1, and we modify the true model by replacing the third statistic (homophily) with the count of actors with degree 1:
$$g_3(y) = \sum_{i \in N} 1(|y_i| = 1),$$
simulating with θ = [0.10, −0.40, 2.00] (net of network size adjustment) to obtain g = [48956, 128948, 157369] and actual θ = [−0.01, −0.50, 2.00].
As of this writing, there is no known closed-form expression for the expected values of the sufficient statistics, nor, conversely, for the parameters corresponding to them. The numerically optimised allocations are therefore not available, and we consider only the sampling schemes 1–3 (i.e., proportional to size, equal, 3/2-power).
The normalised biases are given in Table 9, and the standard deviations (relative to sampling proportional to size) in Table 10. We see a nontrivial bias for all designs, likely due to the relatively modest overall sample size, as discussed in the previous section. The effects are not very consistent, except in that deviating from proportional sampling increases the variance of the degree effect parameter.
Table 9:
Biases of the estimators for the sampling schemes for the population with a degree effect, normalised by the standard deviation of the corresponding estimator under Scheme 1.
| Scheme | g1 | g2 | g3 | θ1 | θ2 | θ3 |
|---|---|---|---|---|---|---|
| 1. Proportional to size | 0.01 | 0.01 | 0.00 | −0.10 | −0.19 | 0.25 |
| 2. Equal (oversample smallest) | 0.01 | −0.02 | 0.02 | −0.07 | −0.27 | 0.34 |
| 3. 3/2-power (oversample largest) | 0.10 | 0.12 | 0.17 | −0.03 | −0.30 | 0.34 |
Table 10:
Standard deviations of estimators for the sampling schemes for the population with a degree effect, relative to sampling in proportion to stratum size. A value less than one represents a gain in efficiency from the design.
| Scheme | g1 | g2 | g3 | θ1 | θ2 | θ3 |
|---|---|---|---|---|---|---|
| 2. Equal (oversample smallest) | 0.98 | 1.18 | 1.12 | 1.01 | 1.27 | 1.21 |
| 3. 3/2-power (oversample largest) | 1.13 | 0.98 | 1.02 | 1.12 | 1.18 | 1.19 |
5. Design considerations: censoring in alter sampling
A common feature of egocentric designs is a limit on the number of nominations an ego may make—a Fixed-Choice Design (FCD) (Ott et al. 2017). For example, the US General Social Survey (Burt 1984) limits the number of alters to 5. This is common in studies where all actors are surveyed and alters are identified, and in the pure likelihood framework, a number of approaches have been proposed. Hoff et al. (2013) viewed the problem as a coarsening and thresholding of an underlying ordinal process, and Krivitsky and Butts (2017) suggested that their framework for ordinal relational data can be used to model this ordinal process. Both Hoff et al. (2013) and Ott et al. (2017) demonstrate that not taking this limit into account can introduce biases and otherwise distort inferences. At the same time, as Krivitsky and Morris (2017) discuss, a pure likelihood approach may not be practical if the alters are not identified.
A possible approach, pointed out by Hoff et al. (2013) among others, is to impose a sample space constraint, forbidding an actor from having more than the specified number of ties. This would be an egocentric constraint by the definition of Krivitsky and Morris (2017), and therefore easily incorporated into their inferential framework. However, this approach is also suboptimal, since it attempts to approximate what is a result of the sampling process by a constraint on the underlying network process.
Ott et al. (2017) proposed an Augmented Fixed-Choice Design (AFCD) that can be outlined as follows:
1. Each respondent i is asked to list all |yi| of their alters by some identifier.
2. The respondent is asked to provide detailed information (i.e., xj) for m of the |yi| alters, ideally selected randomly.
3. |yi| is recorded as well.
Here, we second the recommendation of Ott et al. (2017) and show how the Krivitsky and Morris (2017) framework can make use of it to obtain accurate estimates, as well as discuss some limitations of the AFCD.
Recall that estimation using this framework requires us to produce an estimator of (3), a population total. In our notation, an ego data point collected using AFCD, $\tilde e_i$, can be represented as:
$|y_i|$, the number of alters;
the attribute information collected on the sampled alters, as reported by ego $i$; and
$m_i = \min(|y_i|, m)$, the number of sampled alters.
Then, under certain conditions, we can define a simple expansion estimator for ego $i$’s statistic $h_k(e_i)$:
$$\tilde h_k(\tilde e_i) = \frac{|y_i|}{m_i}\, h_k(\tilde e_i), \qquad (5)$$
where $h_k(\tilde e_i)$ is the statistic evaluated on the sampled alters only, and the estimator is taken to be 0 for an ego with no alters.
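For example, suppose that under m = 2 in a two-group population, ego i in Group 1 reported a total of 3 alters, with one of them in Group 1 (i.e., homophilous), another in Group 2 (i.e., heterophilous), and the third one censored per the design. Then, for the homophily statistic, the AFCD estimate for that ego’s contribution would be $\frac{3}{2} \times \frac{1}{2}(1 + 0) = \frac{3}{4}$: the two sampled alters contribute one homophilous tie, each tie is counted half at each of its two ends, and the factor 3/2 scales the two sampled alters up to the three reported.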
The conditions are that 1) the alters are randomly sampled without replacement and 2) the network statistic, if it depends on alter attributes, is dyad-independent. The latter requirement, if fulfilled, means that hk(ei) can be expressed as a (possibly weighted) count or sum over the alters, making this extrapolation from the attributes of the observed alters to the unobserved possible. Under these conditions, the expansion estimator (5) will be an unbiased estimator of hk(ei). (If the statistic does not depend on the alter attributes, it is a fortiori not affected by incompletely observed alter attributes, though it may still need to make use of the total number of alters.)
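The unbiasedness claim can be verified exactly for a single ego by enumerating every equally likely subsample of alters. The sketch below is illustrative only: the function names are hypothetical, and it assumes that an ego’s contribution to the homophily statistic is half its number of same-group alters and that the expansion factor is the number of reported alters divided by the number of sampled alters.

```python
from fractions import Fraction
from itertools import combinations

def homophily_contribution(ego_group, alter_groups):
    """Half the number of alters in the ego's own group (assumed convention)."""
    return Fraction(sum(g == ego_group for g in alter_groups), 2)

def afcd_estimate(ego_group, sampled_alter_groups, n_alters):
    """Expansion estimate: contribution of the sampled alters, scaled by
    (number of reported alters) / (number of sampled alters)."""
    n_sampled = len(sampled_alter_groups)
    if n_sampled == 0:
        return Fraction(0)   # an ego with no alters contributes nothing
    return Fraction(n_alters, n_sampled) * homophily_contribution(
        ego_group, sampled_alter_groups)

# Hypothetical ego in Group 1 with three alters, two of them in Group 1.
ego_group = 1
alter_groups = [1, 2, 1]
m = 2
n_sampled = min(len(alter_groups), m)

full = homophily_contribution(ego_group, alter_groups)
# Every subset of size n_sampled is equally likely under random sampling
# without replacement, so the plain average is the design expectation.
estimates = [afcd_estimate(ego_group, subset, len(alter_groups))
             for subset in combinations(alter_groups, n_sampled)]
mean_estimate = sum(estimates) / len(estimates)
print(full, estimates, mean_estimate)
```

If the sampled alters were instead chosen systematically (for example, always the most recent one), the average over the realised selections need not equal the full contribution.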
In practice, this means that even at the extreme of m = 1—in which the respondent is asked to elaborate about only one alter—parameters for main effects of categorical (e.g., gender) or quantitative (e.g., age) covariates, mixing between groups (e.g., ethnicity), and effects of interaction between quantitative covariates (e.g., age difference) remain estimable with only a modest loss of accuracy. In studies of romantic networks in particular, it is common to ask the respondent to estimate the number of partners within a certain time window (e.g., last 6 or 12 months) and then ask about the demographics of the last partner (e.g., Rao et al. 2019). However, effects such as diversity in partners—e.g., number of actors with both male and female partners—can no longer be estimated as they could be in the absence of alter censoring. Also, focusing specifically on the last partner may violate Condition 1 above.
Inferentially, the sample of egos and subsamples of their alters can be viewed as a two-stage cluster sample (or a multistage sample, if the ego-level sampling is itself a cluster sample). Whereas an uncensored estimator has one source of variation (who could have been in the egocentric sample), the censored estimator acquires an additional source of variation: which alters could have been selected for the report. Its variance is thus a function of both the variance between the ego contributions and the variance within the ego contributions. However, standard estimators exist for multistage cluster samples (Lohr 2019, Sec. 5.3, for example).
5.1. Example: Approaches to censoring
We illustrate the different approaches to censored alters with an example. Again starting with Population 1 in Section 4.1, we take a simple random sample of size |S| = 200 egos and consider each of the following four censoring and analysis schemes for m = 1 and m = 2:
1. No alter censoring.
2. FCD analysed with no adjustment.
3. FCD analysed via an ERGM conditional on degree at most m.
4. AFCD analysed by cluster sampling.
Censoring at m = 1 results in censoring the reports of about 25% of the egos and m = 2 of about 7%.
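These shares can be roughly cross-checked from Table 2. The sketch below assumes that degrees within each group are approximately Poisson (plausible for a sparse dyad-independent model), that the group activity statistics g1 and g2 equal the total degree of each group, and that egos are drawn by simple random sampling so that groups appear in proportion to their population sizes. The function names are hypothetical.

```python
import math

def poisson_tail(mean, m):
    """P(D > m) for D ~ Poisson(mean)."""
    cdf = sum(math.exp(-mean) * mean ** d / math.factorial(d)
              for d in range(m + 1))
    return 1.0 - cdf

group_sizes = [50_000, 150_000]        # Population 1, Table 2
activity_stats = [56_712, 135_768]     # g1 and g2 for Population 1, Table 2
mean_degrees = [g / n for g, n in zip(activity_stats, group_sizes)]

def censored_share(m):
    """Approximate share of egos with more than m alters."""
    n = sum(group_sizes)
    return sum(nk / n * poisson_tail(dk, m)
               for nk, dk in zip(group_sizes, mean_degrees))

share_m1 = censored_share(1)
share_m2 = censored_share(2)
print(round(share_m1, 3), round(share_m2, 3))
```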
The bias in the estimators, standardised by its standard deviation under Scheme 1, is summarised in Table 11. Absent censoring, both the statistics and the parameters are estimated with a negligible bias. Censoring alters without adjustment creates a negative bias in all statistics—to be expected, since it is effectively removing edges from the network. This results in a negative bias in parameter estimates for actor activity. As one might expect, these biases are stronger the stricter the censoring of the data (i.e., stronger for m = 1 than for m = 2). Interestingly, censoring has negligible effect on the estimates of homophily net of those.
Table 11:
Biases of the estimators for the sampling schemes for the censoring simulation, normalised by the standard deviation of the corresponding estimator under Scheme 1.
| Scheme | g1 | g2 | g3 | θ1 | θ2 | θ3 |
|---|---|---|---|---|---|---|
| 1. No censoring | −0.01 | 0.02 | 0.01 | −0.11 | −0.01 | −0.01 |
| m = 1 | | | | | | |
| 2. FCD, no adjustment | −2.46 | −3.86 | −3.84 | −2.10 | −1.68 | 0.01 |
| 3. FCD, conditional | −2.46 | −3.86 | −3.84 | 7.58 | 5.48 | 0.01 |
| 4. AFCD, adjusted | 0.01 | 0.00 | 0.01 | −0.12 | −0.04 | 0.01 |
| m = 2 | ||||||
| 2. FCD, no adjustment | −0.81 | −1.06 | −1.15 | −0.71 | −0.48 | 0.01 |
| 3. FCD, conditional | −0.81 | −1.06 | −1.15 | 2.54 | 1.88 | 0.01 |
| 4. AFCD, adjusted | −0.02 | −0.01 | −0.01 | −0.12 | −0.04 | 0.02 |
Attempting to adjust for alter censoring by conditioning on degree appears to overcompensate for the censoring, resulting in a positive bias for the main effect estimators, though, again, the effect on the homophily parameter’s bias is negligible. (The homophily estimate is unchanged from the unadjusted censored analysis, because a sample space constraint affects only the κ(θ, x) and μ(θ, x) components of ERGM estimation.) Lastly, using AFCD and adjusting eliminates the bias due to censoring.
We next turn to the variances of the estimates, summarised in Table 12 as standard deviations relative to the uncensored scheme (i.e., divided by the standard deviation of the corresponding uncensored estimate). Censoring increases the variance of the two estimators that seek to adjust for it, reflecting the greater uncertainty that it introduces. As one would expect, stricter censoring inflates the uncertainty more.
Table 12:
Standard deviations of the estimators for the schemes in the censoring simulation, relative to the standard deviation of the corresponding estimator under Scheme 1 (no censoring). A value less than one indicates a smaller standard deviation than without censoring.
| | g₁ | g₂ | g₃ | θ₁ | θ₂ | θ₃ |
|---|---|---|---|---|---|---|
| m = 1 | ||||||
| 2. FCD, no adjustment | 0.55 | 0.56 | 0.59 | 1.00 | 1.20 | 1.29 |
| 3. FCD, conditional | 0.55 | 0.56 | 0.59 | 3.41 | 1.85 | 1.29 |
| 4. AFCD, adjusted | 1.06 | 1.03 | 1.11 | 1.14 | 1.35 | 1.44 |
| m = 2 | ||||||
| 2. FCD, no adjustment | 0.82 | 0.83 | 0.83 | 0.97 | 1.03 | 1.06 |
| 3. FCD, conditional | 0.82 | 0.83 | 0.83 | 1.65 | 1.24 | 1.06 |
| 4. AFCD, adjusted | 1.01 | 1.00 | 1.02 | 1.03 | 1.07 | 1.09 |
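The two summaries reported in Tables 11 and 12 can be sketched as follows, given simulation replicates of an estimator under a scheme of interest and under the uncensored reference scheme. The replicate values are invented, and the use of the sample standard deviation (divisor n − 1) is an assumption rather than a detail stated in the text.

```python
from statistics import mean, stdev

def standardised_bias(estimates, truth, reference_estimates):
    """Bias of an estimator divided by the SD of the reference (uncensored) estimator."""
    return (mean(estimates) - truth) / stdev(reference_estimates)

def relative_sd(estimates, reference_estimates):
    """SD of an estimator divided by the SD of the reference (uncensored) estimator."""
    return stdev(estimates) / stdev(reference_estimates)

# Hypothetical replicates of one statistic under two schemes (true value 10).
uncensored = [9.0, 10.0, 11.0, 10.0]
censored = [8.0, 8.5, 9.0, 8.5]
print(standardised_bias(censored, 10.0, uncensored))
print(relative_sd(censored, uncensored))
```

In this toy case the censored replicates are both biased downward and less variable than the reference, the same qualitative pattern as the unadjusted FCD rows for the network statistics.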
This simulation study underscores the advantages of AFCD over FCD, and suggests that conditioning on maximum degree overcompensates for alter censoring.
6. Conclusions and discussion
Egocentric sampling provides a relatively inexpensive and generalisable approach to observing networks, while allowing for a great deal of insight into their structure, particularly when augmented by ERG modelling. We have provided an overview of how measurement and sampling affect the statistical utility of egocentric studies, as well as recommendations for improvement. While we focus on quantifying the benefits and drawbacks of study design decisions on ERGM estimates, the design issues we highlight are relevant for all forms of egocentric data analysis.
With respect to questionnaire design and measurement, we argued in Section 3 that the design of egocentric survey instruments often leads to unanticipated harmonisation problems. Attributes of ego and alters, such as gender, age or education, should ideally be measured using comparable survey questions and scales. Harmonisation of these measures often requires aggregation of response categories, and if the measures are designed well this can minimise the loss of information. Similarly, information about ego–alter and alter–alter ties needs to be gathered in a way that ensures comparability. Without this, triadic information becomes uninterpretable.
The precision of ERGM parameter estimation can be improved by choosing the right survey sampling design for the parameters of most interest. We demonstrated in Section 4 that a design that is optimal for estimating the network statistics is not necessarily optimal for estimating ERGM parameters. In particular, it appears that when modelling homophily in the presence of “main effects”, oversampling bigger groups leads to more precise estimates of the homophily statistics but less precise estimates of the homophily parameters, whereas an optimal design for estimating the homophily parameter oversamples the smaller group. This is encouraging from the point of view of integrating network questions into general social surveys: it is typical to oversample the smaller groups in those surveys, and doing so appears to sharpen ERGM inference.
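As a reminder of how oversampling enters estimation, the following minimal sketch computes a design-weighted mean from a stratified sample in which each respondent in stratum h carries the weight N_h / n_h. The stratum sizes, sample values, and function name are hypothetical, and the sketch shows only the weighting step, not the ERGM estimation itself.

```python
def stratified_mean(samples, stratum_sizes):
    """Design-weighted mean: each respondent in stratum h carries weight N_h / n_h."""
    weighted_sum = 0.0
    population = 0
    for h, values in samples.items():
        weight = stratum_sizes[h] / len(values)
        weighted_sum += weight * sum(values)
        population += stratum_sizes[h]
    return weighted_sum / population

# Hypothetical population: 900 actors in the larger group, 100 in the smaller one.
stratum_sizes = {"large": 900, "small": 100}
# Equal allocation oversamples the smaller group relative to its population share.
samples = {"large": [2, 3, 2, 3], "small": [1, 0, 1, 2]}
print(stratified_mean(samples, stratum_sizes))
```

The weights undo the oversampling, so the estimate reflects the population composition rather than the sample composition, whichever group the design favours.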
We have also demonstrated how parameter estimation is affected by survey design choices related to alter elicitation. In Section 5 we show that the Fixed Choice Design, in which an ego can nominate up to a fixed number of alters, has inferior inferential properties compared to the Augmented Fixed Choice Design, which obtains information about the total number of alters and randomises which alters are elaborated on. Using simulations, we show that estimates obtained from AFCD are associated with much lower bias.
A number of directions for future research remain.
In our discussion of estimable covariates, we noted that some, like kinship, are fundamentally dyadic. Although they cannot be used currently as covariates for other relationships (such as support), future advances may allow them to be modelled jointly in, say, a multiplex network framework (Pattison and Wasserman 1999; Krivitsky, Koehly, and Marcum 2020, for example). More generally, it may be possible to relax the limitation on the types of covariates.
We discussed the modelling consequences of limiting the number of alters about whom detailed information is available to m = 1, but the situation with m ≥ 2 is not as simple. For example, if we wish to evaluate the diversity of an actor’s ties according to some attribute, an m ≥ 2 sample contains meaningful information about alter diversity, so some inference may be possible, if not straightforward. Similarly, if alter–alter relations among the subsampled alters are observed, then the GWESP statistic—which does not depend on alter attributes but is not dyad-independent—may or may not be helped by observing the total number of alters. Development of estimators for these two scenarios—or proofs of their impossibility—is a topic for future work.
Inference based on AFCD depends on the alters whose attributes are observed in detail being randomly chosen from among the alters matching the name generator. This assumption may not be valid unless explicitly designed into the survey, and it may not hold, even approximately, for studies, such as Rao et al. (2019), that focus on the respondent’s last partner. Adjusting for this is another subject for future work.
With respect to alter sampling, some designs (such as Burt 1984) that impose limits on the number of alters reported do indicate whether the respondent wished to name more (but not how many more). This does not lend itself to AFCD inference directly, but an approximate solution may be to regard the reported number of alters as a censored observation from a parametric count family (e.g., Poisson, zero-inflated Poisson, or negative binomial). Estimating its parameters and using the family to “impute” the alter count is a promising approach as well, albeit one that makes strong assumptions.
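A minimal sketch of this idea follows, assuming a Poisson family and treating an ego who wished to name more alters as a right-censored observation with a true count above the cap m. The golden-section search is used only to keep the example free of dependencies, and all names and data are hypothetical rather than part of any existing package.

```python
import math

def poisson_logpmf(k, lam):
    """log P(D = k) for D ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def poisson_tail(m, lam, terms=200):
    """Pairs (k, P(D = k)) for k = m+1, ..., m+terms (upper tail, truncated)."""
    return [(k, math.exp(poisson_logpmf(k, lam))) for k in range(m + 1, m + 1 + terms)]

def loglik(lam, counts, wanted_more, m):
    """Log-likelihood with right-censoring: a flagged ego contributes P(D > m)."""
    log_sf = math.log(sum(p for _, p in poisson_tail(m, lam)))
    return sum(log_sf if more else poisson_logpmf(d, lam)
               for d, more in zip(counts, wanted_more))

def fit_censored_poisson(counts, wanted_more, m, tol=1e-8):
    """Golden-section search for the rate that maximises the log-likelihood."""
    a, b = 1e-3, 10.0 * (max(counts) + 1)
    ratio = (math.sqrt(5) - 1) / 2
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = loglik(c, counts, wanted_more, m), loglik(d, counts, wanted_more, m)
    while b - a > tol:
        if fc < fd:   # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = loglik(d, counts, wanted_more, m)
        else:         # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = loglik(c, counts, wanted_more, m)
    return (a + b) / 2

def imputed_count(m, lam):
    """E[D | D > m]: a model-based stand-in for a censored ego's alter count."""
    tail = poisson_tail(m, lam)
    return sum(k * p for k, p in tail) / sum(p for _, p in tail)

# Hypothetical reports with a cap of m = 2; the last two egos wished to name more.
counts = [0, 1, 2, 2, 2]
wanted_more = [False, False, False, True, True]
lam_hat = fit_censored_poisson(counts, wanted_more, m=2)
print(lam_hat, imputed_count(2, lam_hat))
```

The fitted rate exceeds the mean of the capped reports because the flagged egos are known to have more than m alters. The imputed count replaces the unknown total for those egos, and its validity rests entirely on the assumed count family.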
Recently, Aggregated Relational Data (ARD), originally used for estimating the size of hidden populations (Zheng, Salganik, and Gelman 2006), has been used to estimate latent space network models (McCormick and Zheng 2015; Breza et al. 2017). Extending the presented framework to such aggregated data is a subject for future work.
Supplementary Material
Highlights:
Rigorous inferential framework for egocentrically-sampled network data is discussed.
Validity and comparability requirements for survey questions about attributes of egos and alters, ego–alter ties, and alter–alter ties are formulated.
Strategies for optimal stratified sampling for estimating homophily parameters are described.
Analytical and simulation-based results are presented discussing advantages and disadvantages of Fixed-Choice Design (FCD) and Augmented Fixed-Choice Design (AFCD).
7. Acknowledgements
This work utilised the computational resources of the University of Wollongong’s National Institute for Applied Statistics Research Australia (NIASRA) HPC cluster and the Katana computational cluster supported by Research Technology Services at UNSW Sydney. Bojanowski thanks the Interdisciplinary Centre for Mathematical and Computational Modelling at the University of Warsaw for support through computational grant G74-3.
Footnotes
The key difference between the approaches taken in Krivitsky and Morris (2017) and Smith (2012) is their primary objective. Smith (2012) seeks to leverage just the generative property of an ERGM estimated on egocentric data to simulate complete networks, including terms for triadic effects. He proposes an ad hoc but effective iterative method to do this. Krivitsky and Morris (2017) seek to establish a general framework for valid statistical inference for ERGMs from egocentric data. This provides a more principled foundation for leveraging the generative properties in simulation. Our concern in this paper is with the inferential questions, not with the properties of the simulated complete networks, so we focus the discussion on the Krivitsky and Morris (2017) framework. The measurement issues we address here are relevant to both approaches.
This is not to be confused with the Maximum Pseudo-Likelihood Estimator (MPLE) of Strauss and Ikeda (1990), the computational approximation technique. We do not make direct use of MPLEs here.
References
- Bailey Stefanie, and Marsden Peter V. 1999. “Interpretation and Interview Context: Examining the General Social Survey Name Generator Using Cognitive Methods.” Social Networks 21 (3): 287–309. 10.1016/s0378-8733(99)00013-1.
- Binder David A. 1983. “On the Variances of Asymptotically Normal Estimators from Complex Surveys.” International Statistical Review 51: 279–92. 10.2307/1402588.
- Borgatti Stephen P., and Molina José-Luis. 2005. “Toward Ethical Guidelines for Network Research in Organizations.” Social Networks 27 (2): 107–17. 10.1016/j.socnet.2005.01.004.
- Brashears Matthew E. 2008. “Gender and Homophily: Differences in Male and Female Association in Blau Space.” Social Science Research 37 (2): 400–415. 10.1016/j.ssresearch.2007.08.004.
- Brewer Devon D. 1997. “No Associative Biases in the First Name Cued Recall Procedure for Eliciting Personal Networks.” Social Networks 19 (4): 345–53. 10.1016/S0378-8733(97)00002-6.
- Brewer Devon D., and Yang Bihchii Laura. 1994. “Patterns in the Recall of Persons in a Religious Community.” Social Networks 16 (4): 347–79. 10.1016/0378-8733(94)90016-7.
- Breza Emily, Chandrasekhar Arun G., McCormick Tyler H., and Pan Mengjie. 2017. “Using Aggregated Relational Data to Feasibly Identify Network Structure Without Network Data.” Working Paper 23491. Working Paper Series. National Bureau of Economic Research. 10.3386/w23491.
- Burt Ronald S. 1984. “Network Items and the General Social Survey.” Social Networks 6 (4): 293–339. 10.1016/0378-8733(84)90007-8.
- Campbell Karen E., and Lee Barrett A. 1991. “Name Generators in Surveys of Personal Networks.” Social Networks 13 (3): 203–21. 10.1016/0378-8733(91)90006-F.
- CILS4EU. 2014. Children of Immigrants Longitudinal Survey in Four European Countries. Mannheim University, Mannheim.
- Cornwell Benjamin, Schumm L. Philip, Laumann Edward O., and Graber Jessica. 2009. “Social Networks in the NSHAP Study: Rationale, Measurement, and Preliminary Findings.” The Journals of Gerontology. Series B, Psychological Sciences and Social Sciences 64 Suppl 1: i47–55. 10.1093/geronb/gbp042.
- Crawford Forrest W, Wu Jiacheng, and Heimer Robert. 2018. “Hidden Population Size Estimation from Respondent-Driven Sampling: A Network Approach.” Journal of the American Statistical Association 113 (522): 755–66. 10.1080/01621459.2017.1285775.
- Crossley Nick, Bellotti Elisa, Edwards Gemma, Everett Martin G., Koskinen Johan, and Tranmer Mark. 2015. Social Network Analysis for Ego-Nets: Social Network Analysis for Actor-Centred Networks. SAGE.
- Demographic and Health Surveys Program. 2020. “DHS Model Questionnaire.” The DHS Program website [Accessed June 22, 2020]: ICF. https://dhsprogram.com/publications/publication-dhsq8-dhs-questionnaires-and-manuals.cfm.
- Firth David. 1993. “Bias Reduction of Maximum Likelihood Estimates.” Biometrika 80 (1): 27–38. 10.2307/2336755.
- Fischer Claude S. 1977. “Northern California Community Study.” Inter-university Consortium for Political and Social Research [distributor]. 10.3886/ICPSR07744.v2.
- Fischer Claude S. 1982. To Dwell Among Friends: Personal Networks in Town and City. University of Chicago Press.
- Fischer Claude S. 2018. “Understanding How Personal Networks Change: Wave 1.” Ann Arbor, MI: Inter-university Consortium for Political and Social Research. 10.3886/ICPSR36975.v1.
- Fischer Claude S., and Bayham Lindsay. 2019. “Mode and Interviewer Effects in Egocentric Network Research.” Field Methods 31 (3): 195–213. 10.1177/1525822X19861321.
- Frank Ove, and Snijders Tom. 1994. “Estimating the Size of Hidden Populations Using Snowball Sampling.” Journal of Official Statistics 10 (1).
- Fu Yang-chih. 2005. “Measuring Personal Networks with Daily Contacts: A Single-Item Survey Question and the Contact Diary.” Social Networks 27 (3): 169–86. 10.1016/j.socnet.2005.01.008.
- GESIS Data Archive. 2018. International Social Survey Programme, 1984–2018 [Machine-Readable Data File]. Cologne: ISSP Research Group. http://w.issp.org/data-download/by-topic/.
- Gjoka Minas, Smith Emily, and Butts Carter. 2014. “Estimating Clique Composition and Size Distributions from Sampled Network Data.” In Sixth IEEE International Workshop on Network Science for Communication Networks.
- Gjoka Minas, Smith Emily, and Butts Carter T. 2015. “Estimating Subgraph Frequencies with or Without Attributes from Egocentrically Sampled Data.” https://arxiv.org/abs/1510.08119.
- González-Bailón Sandra, Wang Ning, Rivero Alejandro, Borge-Holthoefer Javier, and Moreno Yamir. 2014. “Assessing the Bias in Samples of Large Online Networks.” Social Networks 38 (July): 16–27. 10.1016/j.socnet.2014.01.004.
- Goodman Leo A. 1961. “Snowball Sampling.” Annals of Mathematical Statistics 32 (1): 148–70.
- Goodreau Steven M., Cassels Susan, Kasprzyk Danuta, Montaño Daniel E., Greek April, and Morris Martina. 2010. “Concurrent Partnerships, Acute Infection and HIV Epidemic Dynamics Among Young Adults in Zimbabwe.” AIDS and Behavior, 1–11. 10.1007/s10461-010-9858-x.
- Goodreau Steven M., Rosenberg Eli S., Jenness Samuel M., Luisi Nicole, Stansfield Sarah E., Millett Gregorio A., and Sullivan Patrick S. 2017. “Sources of Racial Disparities in HIV Prevalence in Men Who Have Sex with Men in Atlanta, GA, USA: A Modelling Study.” The Lancet. HIV 4 (7): e311–e320. 10.1016/s2352-3018(17)30067-x.
- Gould Roger V., and Fernandez Roberto M. 1989. “Structures of Mediation: A Formal Approach to Brokerage in Transaction Networks.” Sociological Methodology 19: 89–126. 10.2307/270949.
- Hampton Keith, and Wellman Barry. 2003. “Neighboring in Netville: How the Internet Supports Community and Social Capital in a Wired Suburb.” City & Community 2 (4): 277–311.
- Handcock Mark S., and Gile Krista J. 2010. “Modeling Social Networks from Sampled Data.” Annals of Applied Statistics 4 (1): 5–25. 10.1214/08-AOAS221.
- Hanneke Steve, Fu Wenjie, and Xing Eric P. 2010. “Discrete Temporal Models of Social Networks.” Electronic Journal of Statistics 4: 585–605. 10.1214/09-EJS548.
- Hara Noriko, Chen Hui, and Ynalvez Marcus A. 2017. “Using Egocentric Analysis to Investigate Professional Networks and Productivity of Graduate Students and Faculty in Life Sciences in Japan, Singapore, and Taiwan.” PLoS One 12: e0186608. 10.1371/journal.pone.0186608.
- Harling Guy, Perkins Jessica M., Gómez-Olivé Francesc Xavier, Morris Katherine, Wagner Ryan G., Montana Livia, Kabudula Chodziwadziwa W., Bärnighausen Till, Kahn Kathleen, and Berkman Lisa. 2018. “Interviewer-Driven Variability in Social Network Reporting: Results from Health and Aging in Africa: A Longitudinal Study of an INDEPTH Community (HAALSI) in South Africa.” Field Methods 30 (2): 140–54. 10.1177/1525822X18769498.
- Heckathorn Douglas D. 1997. “Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations.” Social Problems 44 (2): 174–99. 10.2307/3096941.
- Heckathorn Douglas D. 2002. “Respondent-Driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations.” Social Problems 49 (1): 11–34. 10.1525/sp.2002.49.1.11.
- Hoff Peter, Fosdick Bailey, Volfovsky Alex, and Stovel Katherine. 2013. “Likelihoods for Fixed Rank Nomination Networks.” Network Science 1 (3): 253–77. 10.1017/nws.2013.17.
- Jenness Samuel M., Maloney Kevin M., Smith Dawn K., Hoover Karen W., Goodreau Steven M., Rosenberg Eli S., Weiss Kevin M., Liu Albert Y., Rao Darcy W., and Sullivan Patrick S. 2018. “Addressing Gaps in HIV Preexposure Prophylaxis Care to Reduce Racial Disparities in HIV Incidence in the United States.” American Journal of Epidemiology 188 (4): 743–52. 10.1093/aje/kwy230.
- Kalish Yuval, and Robins Garry. 2006. “Psychological Predispositions and Network Structure: The Relationship Between Individual Predispositions, Structural Holes and Network Closure.” Social Networks 28 (1): 56–84. 10.1016/j.socnet.2005.04.004.
- Kalter Frank, Heath Anthony, Hewstone Miles, Jonsson Jan O., Kalmijn Matthijs, Kogan Irena, and Van Tubergen Frank. 2016. “Children of Immigrants Longitudinal Survey in Four European Countries (CILS4EU) Full Version. Data File Version 1.2.0.” Cologne, ZA5353: GESIS Data Archive. 10.4232/cils4eu.5353.1.2.0.
- Kish Leslie. 1995. Survey Sampling. New York: John Wiley & Sons, Inc.
- Koskinen Johan H., Robins Garry L., and Pattison Philippa E. 2010. “Analysing Exponential Random Graph (P-Star) Models with Missing Data Using Bayesian Data Augmentation.” Statistical Methodology 7 (3): 366–84. 10.1016/j.stamet.2009.09.007.
- Kossinets Gueorgi. 2006. “Effects of Missing Data in Social Networks.” Social Networks 28 (3): 247–68. 10.1016/j.socnet.2005.07.002.
- Krivitsky Pavel N. 2012. “Modeling of Dynamic Networks Based on Egocentric Data with Durational Information.” 2012–01. Pennsylvania State University Department of Statistics. http://stat.psu.edu/research/technical-reports/copy2_of_2012-technical-reports.
- Krivitsky Pavel N., Bojanowski Michał, and Morris Martina. 2019. “Inference for Exponential-Family Random Graph Models from Egocentrically-Sampled Data with Alter–Alter Relations.” Working Paper. National Institute for Applied Statistics Research Australia, University of Wollongong. https://niasra.uow.edu.au/workingpapers/UOW255140.html.
- Krivitsky Pavel N., and Butts Carter T. 2017. “Exponential-Family Random Graph Models for Rank-Order Relational Data.” Sociological Methodology 47 (1): 68–112. 10.1177/0081175017692623.
- Krivitsky Pavel N., and Handcock Mark S. 2014. “A Separable Model for Dynamic Networks.” Journal of the Royal Statistical Society, Series B 76 (1): 29–46.
- Krivitsky Pavel N., Handcock Mark S., and Morris Martina. 2011. “Adjusting for Network Size and Composition Effects in Exponential-Family Random Graph Models.” Statistical Methodology 8 (4): 319–39. 10.1016/j.stamet.2011.01.005.
- Krivitsky Pavel N., Koehly Laura M., and Marcum Christopher Steven. 2020. “Exponential-Family Random Graph Models for Multi-Layer Networks.” Psychometrika. 10.1007/s11336-020-09720-7.
- Krivitsky Pavel N., and Morris Martina. 2017. “Inference for Social Network Models from Egocentrically-Sampled Data, with Application to Understanding Persistent Racial Disparities in HIV Prevalence in the US.” Annals of Applied Statistics 11 (1): 427–55. 10.1214/16-AOAS1010.
- Laumann Edward O. 1966. Prestige and Association in an Urban Community: An Analysis of an Urban Stratification System. Bobbs-Merrill Company.
- Laumann Edward O., Gagnon John H., Michael Robert T., and Michaels Stuart. 1992. “National Health and Social Life Survey.” Chicago, IL, USA: University of Chicago and National Opinion Research Center [producer], 1995. Ann Arbor, MI, USA: Inter-university Consortium for Political and Social Research [distributor], 2008-04-17. 10.3886/ICPSR06647.
- Laumann Edward O., Gagnon John H., Michael Robert T., and Michaels Stuart. 1994. The Social Organization of Sexuality. Chicago: University of Chicago Press.
- Laumann Edward O., Marsden Peter V., and Prensky David. 1989. “The Boundary Specification Problem in Network Analysis.” In Research Methods in Social Network Analysis, edited by Freeman Linton C., White Douglas R., and Romney A. Kimball, 61–79. Fairfax: George Mason University Press.
- Lohr Sharon L. 2019. Sampling: Design and Analysis. Chapman and Hall/CRC.
- Mach Bogdan W., Manterys Aleksander, and Sadowski Ireneusz, eds. 2018. Individuals and Their Social Contexts. Institute of Political Studies, Polish Academy of Sciences. http://wydawnictwo.isppan.waw.pl/produkt/individuals-and-their-social-contexts/.
- Marin Alexandra, and Hampton Keith N. 2007. “Simplifying the Personal Network Name Generator: Alternatives to Traditional Multiple and Single Name Generators.” Field Methods 19 (2): 163–93. 10.1177/1525822X06298588.
- Marsden Peter V. 1987. “Core Discussion Networks of Americans.” American Sociological Review 52 (1): 122–31. 10.2307/2095397.
- Marsden Peter V. 2002. “Egocentric and Sociocentric Measures of Network Centrality.” Social Networks 24 (4): 407–22. 10.1016/S0378-8733(02)00016-3.
- Marsden Peter V. 2003. “Interviewer Effects in Measuring Network Size Using a Single Name Generator.” Social Networks 25: 1–16. 10.1016/S0378-8733(02)00009-6.
- Marti Joel, Bolibar Mireia, and Lozares Carlos. 2017. “Network Cohesion and Social Support.” Social Networks 48: 192–201. 10.1016/j.socnet.2016.08.006.
- McCarty Christopher, Lubbers Miranda J., Vacca Raffaele, and Molina José-Luis. 2019. Conducting Personal Network Research: A Practical Guide. Methodology in the Social Sciences. Guilford Publications.
- McCormick Tyler H., and Zheng Tian. 2015. “Latent Surface Models for Networks Using Aggregated Relational Data.” Journal of the American Statistical Association 110 (512): 1684–95. 10.1080/01621459.2014.991395.
- McPherson Miller, Smith-Lovin Lynn, and Brashears Matthew E. 2006. “Social Isolation in America: Changes in Core Discussion Networks over Two Decades.” American Sociological Review 71 (3): 353–75.
- Morris Martina, Kurth Ann E., Hamilton Deven T., Moody James, and Wakefield Steve. 2009. “Concurrent Partnerships and HIV Prevalence Disparities by Race: Linking Science and Public Health Practice.” American Journal of Public Health 99 (6): 1023–31. 10.2105/AJPH.2008.147835.
- Mossong J, Hens N, Jit M, Beutels P, Auranen K, Mikolajczyk R, Massari M, et al. 2008. “Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases.” PLoS Medicine 5 (3): e74. 10.1371/journal.pmed.0050074.
- National Opinion Research Center. 2018. General Social Surveys, 1972–2018 [Machine-Readable Data File]. Chicago: National Opinion Research Center (NORC). http://gssdataexplorer.norc.org.
- O’Malley A. James, Arbesman Samuel, Steiger Darby M., Fowler James H., and Christakis Nicholas A. 2012. “Egocentric Social Network Structure, Health, and Pro-Social Behaviors in a National Panel Study of Americans.” PLoS One 7 (5): e36250. 10.1371/journal.pone.0036250.
- Ott Miles Q., Harrison Matthew T., Gile Krista J., Barnett Nancy P., and Hogan Joseph W. 2017. “Fixed Choice Design and Augmented Fixed Choice Design for Network Data with Missing Observations.” Biostatistics 20 (1): 97–110. 10.1093/biostatistics/kxx066.
- Pattison Philippa E., Robins Garry L., Snijders Tom A. B., and Wang Peng. 2013. “Conditional Estimation of Exponential Random Graph Models from Snowball Sampling Designs.” Journal of Mathematical Psychology 57: 284–96. 10.1016/j.jmp.2013.05.004.
- Pattison Philippa, and Wasserman Stanley. 1999. “Logit Models and Logistic Regressions for Social Networks: II. Multivariate Relations.” British Journal of Mathematical and Statistical Psychology 52 (2): 169–93. 10.1348/000711099159053.
- Perry Brea L., Pescosolido Bernice A., and Borgatti Stephen P. 2018. Egocentric Network Analysis: Foundations, Methods, and Models. Cambridge University Press.
- van der Poel Mart G. M. 1993. “Delineating Personal Support Networks.” Social Networks 15 (1): 49–70. 10.1016/0378-8733(93)90021-C.
- Rao Darcy W., Carr Jason, Naismith Kelly, Hood Julia E., Hughes James P., Morris Martina, Goodreau Steven M., Rosenberg Eli S., and Golden Matthew R. 2019. “Monitoring HIV Preexposure Prophylaxis Use Among Men Who Have Sex with Men in Washington State.” Sexually Transmitted Diseases 46 (4): 221–28. 10.1097/olq.0000000000000965.
- Sampath S. 2001. Sampling Theory and Methods. New Delhi: Narosa Publishing House.
- Schweinberger Michael, Krivitsky Pavel N., Butts Carter T., and Stewart Jonathan. 2020. “Exponential-Family Models of Random Graphs: Inference in Finite-, Super-, and Infinite Population Scenarios.” Statistical Science, to appear. https://arxiv.org/abs/1707.04800.
- Smith Jeffrey A. 2012. “Macrostructure from Microstructure: Generating Whole Systems from Ego Networks.” Sociological Methodology 42 (1): 155–205. 10.1177/0081175012455628.
- Smith Jeffrey A. 2015. “Global Network Inference from Ego Network Samples: Testing a Simulation Approach.” The Journal of Mathematical Sociology 39 (2): 125–62. 10.1080/0022250X.2014.994621.
- Smith Tom W., Davern Michael, Freese Jeremy, and Morgan Stephen L. 2019. General Social Surveys, 1972–2018: Cumulative Codebook. Chicago: National Opinion Research Center (NORC).
- Snijders Tom, Spreen Marinus, and Zwaagstra Ronald. 1995. “The Use of Multilevel Modeling for Analysing Personal Networks: Networks of Cocaine Users in an Urban Area.” Journal of Quantitative Anthropology 5 (2): 85–105.
- Straits Bruce C. 2000. “Ego’s Important Discussants or Significant People: An Experiment in Varying the Wording of Personal Network Name Generators.” Social Networks 22 (2): 123–40. 10.1016/S0378-8733(00)00018-6.
- Strauss David, and Ikeda Michael. 1990. “Pseudolikelihood Estimation for Social Networks.” Journal of the American Statistical Association 85 (409): 204–12. 10.1080/01621459.1990.10475327.
- Thompson Steven K. 2012. Sampling. Third edition. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley.
- van Duijn Marijtje A. J., van Busschbach Jooske T., and Snijders Tom A. B. 1999. “Multilevel Analysis of Personal Networks as Dependent Variables.” Social Networks 21 (2): 187–210. 10.1016/S0378-8733(99)00009-X.
- Völker Beate, and Flap Henk. 2001. “Weak Ties as a Liability: The Case of East Germany.” Rationality and Society 13 (4): 397–428. 10.1177/104346301013004001.
- Völker Beate, and Flap Henk. 2002. “The Survey on the Social Networks of the Dutch (SSND1): Data and Codebook.” Utrecht: Utrecht University.
- Völker Beate, Flap H, and Mollenhorst Gerald. 2007. “The Survey on the Social Networks of the Dutch (Second Wave, SSND2). Data and Codebook.” Utrecht: Utrecht University.
- Völker Beate, Schutjens Veronique, and Mollenhorst Gerald. 2013. “The Survey on the Social Networks of the Dutch, Third Wave (SSND3): Data and Codebook.” Utrecht: Utrecht University.
- Weiss Kevin, Jones Jeb, Katz David, Gift Thomas, Bernstein Kyle, Workowski Kimberly, Rosenberg Eli, and Jenness Samuel. 2019. “Epidemiological Impact of Expedited Partner Therapy for Men Who Have Sex with Men: A Modeling Study.” Sexually Transmitted Diseases 46: 697–705. 10.1097/OLQ.0000000000001058.
- Wellman Barry. 1979. “The Community Question: The Intimate Networks of East Yorkers.” American Journal of Sociology 84 (5): 1201–31.
- Wellman Barry, and Wortley Scot. 1990. “Different Strokes from Different Folks: Community Ties and Social Support.” American Journal of Sociology 96 (3): 558–88. 10.1086/229572.
- Zheng Tian, Salganik Matthew J., and Gelman Andrew. 2006. “How Many People Do You Know in Prison?” Journal of the American Statistical Association 101 (474): 409–23. 10.1198/016214505000001168.
