Fixed choice design and augmented fixed choice design for network data with missing observations

Miles Q Ott; Matthew T Harrison; Krista J Gile; Nancy P Barnett; Joseph W Hogan

doi:10.1093/biostatistics/kxx066

Biostatistics. 2017 Dec 16;20(1):97–110. doi: 10.1093/biostatistics/kxx066

PMCID: PMC6296337 PMID: 29267874

Summary

The statistical analysis of social networks is increasingly used to understand social processes and patterns. The association between social relationships and individual behaviors is of particular interest to sociologists, psychologists, and public health researchers. Several recent network studies make use of the fixed choice design (FCD), which induces missing edges in the network data. Because of the complex dependence structure inherent in networks, missing data can pose very difficult problems for valid statistical inference. In this article, we introduce novel methods for accounting for the FCD censoring and introduce a new survey design, which we call the augmented fixed choice design (AFCD). The AFCD adds considerable information to analyses without unduly burdening the survey respondent, resulting in improvements over the FCD, and other existing estimators. We demonstrate this new method through simulation studies and an analysis of alcohol use in a network of undergraduate students living in a residence hall.

Keywords: Augmented fixed choice design, Fixed choice design, Missing data, Right censoring by degree, Social network

1. Introduction

The statistical analysis of social networks is increasingly used to understand social processes and patterns (Knoke and Yang, 2008). Of particular interest to sociologists, psychologists, and public health researchers is the association between social relationships and individual behaviors. Social relationships are often measured through nominations. For example, when collecting information on a friend network, the nomination from person $i$ to person $j$ indicates that person $i$ claims person $j$ as their friend. When analyzing a network, there is often the possibility that nominations are missing (Kossinets, 2006). Because of the complex dependence structure inherent in networks, missing data can pose very difficult problems (Holland and Leinhardt, 1973; Marsden, 1990; Kossinets, 2006). Even when missingness is at random, it can induce bias in structural measures of the network, such as homophily and centrality (Smith and Moody, 2013). Of particular interest is how to handle missingness when analyzing the network for peer effects on behavior.

A considerable amount of missingness is a result of study design. Several recent studies on networks in school settings make use of the fixed choice design (FCD), which typically induces missing edges. In FCD, the number of possible nominations that each person in the network can make is capped at a maximum, $m$, inducing missing nominations (Holland and Leinhardt, 1973; Kossinets, 2006; Yan and Gregory, 2011). For example, studies have restricted the number of friends in a classroom to 4 when studying depression (Witvliet and others, 2010) and the number of best friends to 5 when studying smoking (Mercken and others, 2010). Another study, on infectious disease transmission on social networks, allowed participants to name up to 6 within-class contacts and up to 4 outside-of-class contacts (Conlan and others, 2010). The National Longitudinal Study of Adolescent Health (Add Health) allowed participants to nominate and rank up to 5 boys and 5 girls as friends (Resnick and others, 1997; Goodreau, 2007) and is the basis for significant methodological advancements in the analysis of social networks with missing data. For example, Goodreau and others (2009) approach the Add Health censoring mechanism by assuming that the true network of interest is the network consisting of the top five male and top five female friends of each respondent. Hipp and others (2015) investigated the consequences of missing observations in longitudinal network data and found that different methods of accounting for missingness can lead to vastly different results. Wang and others (2016) present an exponential random graph model (ERGM) method for the imputation of missing network data using the Add Health study as an application. Handcock and Gile (2007) present network modeling approaches for networks with missing data with application to the Add Health study.

Holland and Leinhardt (1973) first introduced the problem of missing ties due to the FCD (also called limited choice design and right-censoring of degree). Kossinets (2006) subsequently showed how the FCD can lead to biases in estimates of structural measures of the network. Gommans and Cillessen (2015) compared analyses on the same populations of elementary school students, with FCD as well as a design without censoring, and found significant differences in the conclusions drawn from the two different data collection schemes.

Hoff and others (2013) have developed a likelihood based approach for fixed rank network data (where there is a maximum number of nominations that could be made, and those nominations are ranked) as well as for FCD, and then used these likelihoods in Bayesian estimation of latent variables which are assumed to govern the nominations and their ranks.

In a study on the transmission of influenza through household contacts, Mossong and others (2008) collected egocentric information on the number of household contacts an individual in the household has made in a given day, without collecting which specific household members the ego was in contact with. Potter and others (2011) used these data to model disease transmission between household members, showing that the number of contacts, or edges, can be useful information even when $m$ is capped at zero.

In this work, we introduce a novel approach that can improve inference for FCD data: that true total nominations are collected in addition to the standard FCD data. We develop a method that accounts for the missingness resulting from FCD, given the true total nominations, and show how different censoring cut-offs affect parameter estimation using a dyad-independent ERGM model. In Section 2, we introduce novel methods for accounting for the fixed choice censoring and introduce a new survey design. In Section 3, we present simulation studies wherein we demonstrate our methods for handling fixed choice data and compare the effects of different censoring cut-offs on parameter estimation, as well as against different estimation methods. In Section 4, we demonstrate our method in an analysis of relationships in the presence of alcohol use in a network of undergraduates living in a residence hall. A discussion is presented in Section 5.

2. Model formulation and inference

2.1. General framework

Given a family of probability models indexed by a parameter $\theta$, we use the notation $p_\theta(Z = z)$ to denote the probability mass function (pmf) of a discrete random variable $Z$ under the model with parameter $\theta$. Usually, the model will also involve fixed covariates, but this is suppressed for now in our notation. Our goal is to identify what $p_\theta$ is so that we can find the maximum likelihood estimate of $\theta$ given our data $z$. If $A$ is a matrix, then we use $A_i$ to denote the $i$th row of $A$. If $v$ is a vector, then we use $|v|$ to denote the sum of the entries of $v$.

Let $Y$ be an $n \times n$ sociomatrix, where $Y_{ij} = 1$ if $i$ nominates $j$ (and $Y_{ij} = 0$ otherwise) and $Y_{ii} = 0$ for all $i$. Note that $Y_{ij} = 1$ does not imply that $Y_{ji} = 1$, meaning that these relationships may be unreciprocated. We use $D$ to denote the row sums of $Y$, i.e., $D_i = |Y_i|$. We assume that $Y$ is the sociomatrix that would be observed without any reporting constraints, and, hence, $D_i$ is the true total of nominations of the $i$th subject. If $Y$ could be observed, then, given a computationally tractable probability model $p_\theta(Y = y)$, we could use standard likelihood-based methods to estimate $\theta$. A simple model, which we assume here, is that all ties are independent Bernoulli random variables, namely,

$$p_\theta(Y = y) = \prod_{i=1}^{n} \prod_{j \neq i} \pi_{ij}(\theta)^{y_{ij}} \, [1 - \pi_{ij}(\theta)]^{1 - y_{ij}}, \tag{2.1}$$

where $\pi_{ij}(\theta) = p_\theta(Y_{ij} = 1)$ is designed according to the problem at hand.

In a FCD that allows at most $m$ nominations per subject we do not observe $Y$, but instead observe a censored sociomatrix $X$. We assume the following censoring mechanism: if $D_i \le m$, then there is no censoring and $X_i = Y_i$; otherwise, the subject reports exactly $m$ of the original $D_i$ nominations, chosen uniformly at random. Consequently, the joint pmf of $(X, D)$ is
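The censoring mechanism just described can be illustrated with a minimal sketch for a single row of the sociomatrix. The function name `censor_row` and the example row are hypothetical and are not taken from the article; the sketch only assumes the uniform-at-random reporting rule stated above.

```python
import random

def censor_row(y_row, m, rng):
    """Apply fixed choice censoring to one row of the true sociomatrix.

    y_row : list of 0/1 entries, the subject's true nominations.
    m     : maximum number of nominations the survey allows.
    Returns (x_row, d): the reported row and the true total d = sum(y_row).
    """
    nominated = [j for j, y in enumerate(y_row) if y == 1]
    d = len(nominated)
    if d <= m:
        # No censoring: the reported row equals the true row.
        return list(y_row), d
    # Otherwise exactly m of the d true nominations are reported,
    # chosen uniformly at random.
    kept = set(rng.sample(nominated, m))
    x_row = [1 if j in kept else 0 for j in range(len(y_row))]
    return x_row, d

rng = random.Random(0)
x_row, d = censor_row([0, 1, 1, 0, 1, 1], m=2, rng=rng)
```

Under the AFCD both `x_row` and `d` would be recorded; under the FCD only `x_row` would be.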

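$$p_\theta(X = x, D = d) = \prod_{i=1}^{n} p_\theta(X_i = x_i, D_i = d_i), \tag{2.2}$$

with

$$p_\theta(X_i = x_i, D_i = d_i) = \begin{cases} p_\theta(Y_i = x_i) & \text{if } |x_i| = d_i \le m, \\ \binom{d_i}{m}^{-1} \Big[\prod_{j:\, x_{ij} = 1} \pi_{ij}(\theta)\Big] \, p_\theta\Big(\sum_{j \neq i:\, x_{ij} = 0} Y_{ij} = d_i - m\Big) & \text{if } |x_i| = m < d_i, \\ 0 & \text{otherwise.} \end{cases} \tag{2.3}$$

Equation (2.3) involves the term

$$p_\theta\Big(\sum_{j \neq i:\, x_{ij} = 0} Y_{ij} = d_i - m\Big), \tag{2.4}$$

which is simply the probability that a sum of independent Bernoulli random variables is equal to $d_i - m$ and is easy to compute using discrete convolution.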
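As an illustration of the discrete convolution mentioned for (2.4), the following sketch computes the full pmf of a sum of independent, non-identical Bernoulli variables by convolving in one variable at a time. The function name `sum_of_bernoullis_pmf` is hypothetical; the article does not specify an implementation.

```python
def sum_of_bernoullis_pmf(probs):
    """Return [P(S = 0), ..., P(S = len(probs))] for S a sum of
    independent Bernoulli variables with success probabilities probs."""
    pmf = [1.0]  # pmf of an empty sum: S = 0 with probability 1
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)   # this variable equals 0
            new[k + 1] += mass * p       # this variable equals 1
        pmf = new
    return pmf

pmf = sum_of_bernoullis_pmf([0.5, 0.5])
```

The term in (2.4) for a given row would then be the entry of this pmf at index equal to the number of unreported nominations, with `probs` taken over the entries of the row that were not reported as nominations.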

From (2.2) to (2.4), we see that $p_\theta(X = x, D = d)$ is easily computable and can be combined with standard likelihood methods to generate estimates of $\theta$ from joint observations of $X$ and $D$. Since $D$ is not usually observed in a FCD, we call this design an augmented FCD (AFCD). In a regular FCD, from (2.2) we also see that

$$p_\theta(X = x) = \prod_{i=1}^{n} p_\theta(X_i = x_i), \tag{2.5}$$

where we sum over all possible values of $D_i$ for rows with missing data:

$$p_\theta(X_i = x_i) = \begin{cases} p_\theta(Y_i = x_i) & \text{if } |x_i| < m, \\ \sum_{d = m}^{n - 1} p_\theta(X_i = x_i, D_i = d) & \text{if } |x_i| = m, \\ 0 & \text{otherwise.} \end{cases} \tag{2.6}$$

Note from (2.3) that when we are finding $p_\theta(X_i = x_i, D_i = d_i)$, we are in the fully observed data case whenever $|x_i| = d_i \le m$. This is in contrast to (2.6), where a row with $|x_i| = m$ must be treated as having missing observations since we do not observe $D$. When $D$ is known, we are able to discern whether we have complete data when $|x_i| = m$. However, when $D$ is not observed, as in the FCD setting, we cannot be certain if $D_i = m$ or if $D_i > m$.

As noted above, our goal for both the AFCD and FCD is to identify the function $p_\theta(X = x, D = d)$ in the AFCD setting, or the function $p_\theta(X = x)$ in the FCD setting, so that we may find the maximum likelihood estimate of $\theta$ given our data. Now that $p_\theta(X = x, D = d)$ and $p_\theta(X = x)$ have been specified for the AFCD and FCD data cases, respectively, we employ one of the many available optimization techniques to find the maximum likelihood estimate of $\theta$ given the available data.
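The row-wise likelihood contributions for the two designs can be sketched as follows, assuming independent Bernoulli ties and the uniform censoring mechanism of this section. The function names and the convention that each row is passed with its diagonal entry already removed are assumptions made for this illustration, not the authors' implementation.

```python
from math import comb

def sum_of_bernoullis_pmf(probs):
    """pmf of a sum of independent Bernoulli variables, by convolution."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)
            new[k + 1] += mass * p
        pmf = new
    return pmf

def afcd_row_lik(x, d, pi, m):
    """P(X_i = x, D_i = d) for one row (diagonal entry excluded).

    x  : reported 0/1 nominations; d : true total nominations;
    pi : tie probabilities for the same entries; m : nomination cap.
    """
    if sum(x) != min(d, m):
        return 0.0                       # impossible under the design
    lead = 1.0
    for xj, pj in zip(x, pi):
        if xj == 1:
            lead *= pj                   # reported ties are true ties
    rest = sum_of_bernoullis_pmf([pj for xj, pj in zip(x, pi) if xj == 0])
    if d <= m:
        return lead * rest[0]            # uncensored: all other ties absent
    # censored: d - m unreported ties, and x is one of C(d, m) equally
    # likely reports
    return lead * rest[d - m] / comb(d, m)

def fcd_row_lik(x, pi, m):
    """P(X_i = x) when the true total D_i is not recorded."""
    k = sum(x)
    if k < m:
        return afcd_row_lik(x, k, pi, m)           # row is fully observed
    return sum(afcd_row_lik(x, d, pi, m) for d in range(m, len(x) + 1))

pi = [0.5, 0.5]
total = sum(fcd_row_lik(x, pi, 1) for x in ([0, 0], [1, 0], [0, 1]))
```

Summing the logarithms of these row contributions over all rows gives a log-likelihood that could be passed to a general-purpose optimizer; in the small example, `total` sums the FCD probabilities of every row that can be reported when the cap is one.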

2.2. Variance estimation

The variance estimation for $\hat\theta$ is non-trivial. Simply using the observed information will not incorporate the uncertainty of the censored values. In order to quantify the variance of the maximum likelihood estimates of the parameters, we describe a parametric bootstrap by following these steps with $B$ bootstrap samples.

1. Maximize the appropriate likelihood as described above [(2.3) for AFCD or (2.6) for FCD] to produce $\hat\theta$.
2. Using $\hat\theta$ and the same covariates used in the model in Step 1, generate a sociomatrix $Y^*$ with row sums $D^*$.
3. With uniform probability, delete $D^*_i - m$ edges from every row $i$ of $Y^*$ for which $D^*_i > m$ to generate $X^*$.
4. Maximize the appropriate likelihood using $X^*$ (together with $D^*$ for the AFCD) to attain $\hat\theta^*$.
5. Repeat Steps 2–4 $B$ times to generate a distribution of $\hat\theta^*$, which can be used to get bootstrapped standard errors and confidence intervals.
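The bootstrap steps above can be sketched on a deliberately simple toy model. This sketch assumes an intercept-only model in which every tie has the same probability, so that the AFCD maximum likelihood estimate has the closed form (total true nominations) / (n(n - 1)). That simplification, and every function name below, is an assumption made for illustration and is not the model fitted in the article.

```python
import random
from statistics import stdev

def simulate_network(p, n, rng):
    """Draw an n x n sociomatrix with independent ties and zero diagonal."""
    return [[0 if i == j else int(rng.random() < p) for j in range(n)]
            for i in range(n)]

def censor(y, m, rng):
    """Keep at most m nominations per row, chosen uniformly at random."""
    x, d = [], []
    for row in y:
        ties = [j for j, v in enumerate(row) if v == 1]
        d.append(len(ties))
        kept = set(ties) if len(ties) <= m else set(rng.sample(ties, m))
        x.append([1 if j in kept else 0 for j in range(len(row))])
    return x, d

def fit_intercept_only_afcd(x, d):
    """Closed-form AFCD estimate of the common tie probability (toy model)."""
    n = len(d)
    return sum(d) / (n * (n - 1))

def parametric_bootstrap_se(x, d, m, n_boot, rng):
    n = len(d)
    p_hat = fit_intercept_only_afcd(x, d)                 # Step 1
    draws = []
    for _ in range(n_boot):                               # Step 5
        y_star = simulate_network(p_hat, n, rng)          # Step 2
        x_star, d_star = censor(y_star, m, rng)           # Step 3
        draws.append(fit_intercept_only_afcd(x_star, d_star))  # Step 4
    return p_hat, stdev(draws)

rng = random.Random(1)
y = simulate_network(0.1, 30, rng)
x, d = censor(y, 2, rng)
p_hat, se = parametric_bootstrap_se(x, d, m=2, n_boot=200, rng=rng)
```

For a model with covariates, `fit_intercept_only_afcd` would be replaced by a numerical maximization of the appropriate likelihood, with the remaining steps unchanged.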

3. Simulation studies

Using the general framework described above, we experiment with models of the form

$$g\big(\pi_{ij}(\beta)\big) = w_{ij}^\top \beta,$$

where $g$ is an appropriate link function for binary data, the parameter $\beta$ is a column vector, and $w_{ij}$ is a column vector of known covariates that may depend on both $i$ and $j$. If $Y$ were fully observed, so that we could use $p_\beta(Y = y)$ from (2.1) for inference, then this would be regression with edge-level covariates. Instead, for a FCD or an AFCD, we use $p_\beta(X = x)$ or $p_\beta(X = x, D = d)$, respectively, as derived in the previous section, to find maximum likelihood estimates. We proceed in the rest of this article to use the probit link function, though other link functions could also be implemented.

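We do not directly observe edge-level covariates in this simulation, but rather create them from vertex-level covariates. For each vertex $i$, let $w_i$ be a $q \times 1$ vector of known covariates. Define the edge-level covariates $w_{ij}$ as some subset of

$$\big(1,\; w_i^\top,\; w_j^\top,\; |w_i - w_j|^\top\big)^\top,$$

which has dimension $3q + 1$. For example, if we look at age as the single variable of interest ($q = 1$), then we define:

$$w_{ij} = \big(1,\; \text{age}_i,\; \text{age}_j,\; |\text{age}_i - \text{age}_j|\big)^\top$$

for all $i$ and $j \neq i$. If we were to use age and income as the two covariates of interest ($q = 2$), then we define:

$$w_{ij} = \big(1,\; \text{age}_i,\; \text{income}_i,\; \text{age}_j,\; \text{income}_j,\; |\text{age}_i - \text{age}_j|,\; |\text{income}_i - \text{income}_j|\big)^\top$$

for all $i$ and $j \neq i$.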

3.1. Simulation design

In order to contrast the AFCD, FCD, a naive analysis (where we assume that all unobserved values of $Y$ are non-edges), and Hoff and others (2013)’s censored binary (CB) estimator, we performed a simulation study.

For a simulated population of size $n = 100$, we first generated a continuous covariate $w_i$ for all members of the simulated population, which stays fixed for all simulations. In keeping with the previously defined notation, we next generated directed edges between members of the network such that, given $w$, the edges are independent and individual $i$ has a directed relationship with individual $j$ with probability $\pi_{ij}$:

$$Y_{ij} \mid w \sim \text{Bernoulli}(\pi_{ij}),$$

where

$$\Phi^{-1}(\pi_{ij}) = \beta_0 + \beta_1 w_i + \beta_2 |w_i - w_j|$$

and $\beta = (\beta_0, \beta_1, \beta_2)^\top$ is a $3 \times 1$ vector. Note that our formulation allows for row covariates ($w_i$), column covariates ($w_j$), and edge covariates ($|w_i - w_j|$), and that here we do not use the column covariates. We next simulate the censoring processes of the AFCD, the FCD, and the CB (which has the same censoring process as the FCD) for maximum number of nominations $m$. Under the AFCD, in row $i$, given $m$ and $D_i$, when $D_i > m$, $D_i - m$ edges are censored, where each of the $D_i$ edges has an equal probability of being censored. Moreover, all non-edges in rows where $D_i > m$ are censored. This gives us the observed AFCD data $(X, D)$. For the FCD, for a maximum $m$, in each row where $D_i \ge m$, $D_i - m$ edges are censored, and all non-edges in these rows are censored, to give us the observed FCD data $X$.
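The edge-generating step of the simulation can be sketched as below, using a probit link with a row covariate and an absolute-difference edge covariate. The slope values 0.02 and -0.025 are those reported in Section 3.2; the intercept value, the covariate distribution, and all function names are hypothetical choices made only so the sketch runs.

```python
import random
from math import erf, sqrt

def probit_inverse(eta):
    """Standard normal CDF, the inverse of the probit link."""
    return 0.5 * (1.0 + erf(eta / sqrt(2.0)))

def edge_probabilities(w, beta):
    """Tie probabilities from a vertex-level covariate w.

    The linear predictor uses an intercept, the nominator's covariate,
    and the absolute covariate difference; the diagonal is set to 0.
    """
    b0, b1, b2 = beta
    n = len(w)
    return [[0.0 if i == j
             else probit_inverse(b0 + b1 * w[i] + b2 * abs(w[i] - w[j]))
             for j in range(n)]
            for i in range(n)]

def draw_sociomatrix(pi, rng):
    """Draw independent Bernoulli ties with the given probabilities."""
    return [[int(rng.random() < p) for p in row] for row in pi]

rng = random.Random(42)
w = [rng.gauss(20.0, 5.0) for _ in range(100)]  # hypothetical covariate
beta = (-1.5, 0.02, -0.025)                     # intercept is an assumption
pi = edge_probabilities(w, beta)
y = draw_sociomatrix(pi, rng)
true_totals = [sum(row) for row in y]           # the row sums D
```

Applying the censoring rule row by row to `y` for a chosen cap would then produce the observed AFCD or FCD data.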

We then obtain maximum likelihood estimators under the AFCD and FCD with varying $m$ values. We also find estimates for the CB using the posterior means of $\beta$, computed with the amen package in R (Hoff and others, 2015). Finally, we carry out the naive analysis by applying probit regression to the data where we treat all censored values as zero, which is the default current practice for analyzing FCD.

3.2. Simulation results

The distribution of the covariate value $w_i$ is displayed in Figure 1(a). We generated the networks using a single fixed value of $\beta = (\beta_0, \beta_1, \beta_2)^\top$, which generates networks with a roughly normal distribution of edges with mean number of edges = 10; however, having a normal distribution of edges is not a requirement here. Here, $\beta_0$ is the probit of $i$ nominating $j$ for $i$ and $j$ where $w_i = w_j = 0$. Likewise, $\beta_1 = 0.02$ implies that, for a fixed value of the absolute difference between $w_i$ and $w_j$, as $w_i$ increases by one unit, the probit of $i$ nominating $j$ increases by 0.02. Last, $\beta_2 = -0.025$ implies that, for a fixed value of $w_i$, as the absolute difference between $w_i$ and $w_j$ increases by one unit, the probit of $i$ nominating $j$ decreases by 0.025. We simulated 100 different networks with 100 nodes using these covariate and $\beta$ values. The distribution of $D$ for all 100 simulations is plotted in Figure 1(b), where grey bars indicate values of $m$ used to censor the simulated data $Y$. For the CB, which uses Bayesian inference, we used MCMC and generated 55 000 posterior draws. The first 5000 draws were discarded, and of the remaining 50 000 posterior draws, we used every 25th draw, for a total of 2000 posterior draws. We present means of these 2000 posterior draws as the estimates.

Fig. 1. Simulation specifications: (a) distribution of the covariate used to generate the simulated networks, and (b) distribution of the true total nominations made from 100 simulations. The grey bars denote values of $m$ used to impose censoring in the 100 different simulations.

We present the mean squared error (MSE), empirical bias, and standard deviation (SD) of the estimated $\beta$’s from the 100 simulations in Table 1.
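For reference, the three summary measures can be computed from a set of simulated estimates as in the sketch below. The function name `summarize` and the use of the population standard deviation (dividing by the number of simulations) are assumptions; the article does not state which divisor or scaling was used for the values it reports.

```python
from statistics import mean, pstdev

def summarize(estimates, truth):
    """Return (MSE, empirical bias, SD) of estimates of a known value."""
    bias = mean(estimates) - truth
    sd = pstdev(estimates)  # population SD, so MSE = bias**2 + sd**2
    mse = mean((e - truth) ** 2 for e in estimates)
    return mse, bias, sd

mse, bias, sd = summarize([1.0, 2.0, 3.0], truth=1.0)
```

With this choice of divisor the MSE decomposes exactly into squared bias plus variance, which is why an estimator with low variance but high bias can still have a large MSE.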

Table 1.

Mean squared error (MSE), empirical bias, and standard deviation (SD) of estimated $\beta_0$, $\beta_1$, and $\beta_2$ from 100 simulations, varying the maximum number of nominations observed, $m$

Estimates of $\beta_0$
	MSE				Bias				SD
$m$	AFCD	FCD	Naive	CB	AFCD	FCD	Naive	CB	AFCD	FCD	Naive	CB
0	34.68	—	—	—	4.43	—	—	—	3.90	—	—	—
2	0.34	153.08	73.84	377.03	0.01	7.85	8.59	18.84	0.59	9.61	0.26	4.72
4	0.27	32.00	28.31	501.85	0.02	2.46	5.32	18.12	0.52	5.12	0.20	13.24
6	0.26	1.10	10.42	77.28	0.03	0.09	3.22	7.20	0.51	1.05	0.23	5.07
8	0.25	0.40	3.26	4.95	0.03	0.06	1.79	1.77	0.50	0.64	0.28	1.36

Estimates of $\beta_1$
	MSE				Bias				SD
$m$	AFCD	FCD	Naive	CB	AFCD	FCD	Naive	CB	AFCD	FCD	Naive	CB
0	5.13	—	—	—	48.38	—	—	—	0.53	—	—	—
2	0.17	27.22	0.04	16.88	1.30	18.21	5.32	16.51	0.13	1.65	0.03	1.30
4	0.16	10.85	0.02	108.48	1.32	13.08	3.70	27.18	0.13	1.04	0.03	3.30
6	0.16	0.60	0.03	15.53	1.20	1.11	3.27	10.70	0.13	0.25	0.04	1.25
8	0.16	0.26	0.05	0.98	1.06	2.53	2.52	0.09	0.13	0.16	0.06	0.31

Estimates of $\beta_2$
	MSE				Bias				SD
$m$	AFCD	FCD	Naive	CB	AFCD	FCD	Naive	CB	AFCD	FCD	Naive	CB
0	936.23	—	—	—	651.87	—	—	—	71.89	—	—	—
2	1.30	45.91	5.95	3.54	3.45	134.78	71.59	50.70	3.61	16.74	2.89	3.13
4	0.58	6.14	2.97	1.48	2.32	39.16	50.54	31.84	2.40	6.82	2.06	2.18
6	0.37	0.43	1.54	0.63	1.10	2.09	35.04	17.30	1.93	2.07	1.77	1.84
8	0.31	0.33	0.78	0.34	0.29	0.88	22.23	6.23	1.76	1.84	1.70	1.74


For a given maximum $m$, the AFCD had lower MSE than the FCD and CB for $\beta_0$ and $\beta_1$. The AFCD had lower MSE than the naive estimator (in which the data are falsely assumed to be uncensored) for $\beta_0$ and $\beta_2$, though the naive estimator had lower MSE than the AFCD for $\beta_1$, due to its lower variance. For the $\beta_0$ parameter, an AFCD with $m = 2$ had a lower MSE than FCD with $m = 8$, naive with $m = 8$, and CB with $m = 8$. For the $\beta_1$ parameter, an AFCD with $m = 2$ outperformed a FCD with $m = 8$ and CB with $m = 8$ in terms of MSE. For $\beta_2$, the AFCD had lower MSE for all levels of $m$ simulated as compared to the other estimators. For higher values of $m$, the relative improvement in MSE for AFCD over FCD tended to diminish. In other words, when the data collection design incurs a higher level of missingness (such as when the maximum number of nominations that can be made in the survey, $m$, is low relative to $D$, the true total nominations), the added benefit of knowing the true number of relationships can be quite large, which is why the AFCD will outperform the FCD and CB. When $m$ is high relative to the distribution of the true total nominations per individual in the network, there will be less missingness and a smaller benefit of the AFCD as compared to the FCD and CB. In general, the AFCD requires lower $m$ to achieve comparable MSE to the other estimators, indicating that it may be preferable to collect the total number of relationships rather than to marginally increase $m$.

When comparing the MSE of the FCD to the CB, neither estimator proved uniformly better. While the FCD had substantially lower MSE than the CB when estimating $\beta_0$ regardless of $m$, when estimating $\beta_1$ and $\beta_2$ the CB has lower MSE when $m$ is smaller, and the FCD has lower MSE when $m$ is larger.

The naive analyses, in which all censored edges are considered to be non-edges (as is a common current practice), estimated $\beta_0$ poorly. This result is not surprising, as the naive estimator will necessarily underestimate the probability of any edge due to the way that it treats all censored values as non-edges. Notably, the naive estimator tended to have high bias and a low standard deviation.

4. Analysis of UrWeb study

We next apply the method to the UrWeb data set, in which data were collected from residents of a primarily freshman dormitory (Barnett and others, 2014). Each participant was 18 years old or older when the survey was administered and was asked to report the number of days in a month that they consumed alcohol. Central to our interests here, each participant was asked to nominate which of the other participants were important to them. This network is pictured in Figure 2(a). Among the 129 participants included in the sample, 507 nominations were made; 4 participants did not nominate anyone nor were they nominated.

Fig. 2. (a) UrWeb network, and (b) distribution of nominations made by UrWeb study participants.

The UrWeb data were collected under a FCD with a fixed maximum number of nominations $m$. In this data set, only one person endorsed the maximum number of nominations, which suggests that there is little design-induced missingness of nominations. We will proceed to show the utility of the methods introduced here by artificially inducing a smaller $m$ in the UrWeb data set, estimating parameters, and comparing the parameters estimated under each induced $m$ to the parameters estimated with probit regression using the full data set. We will artificially induce $m$ by deleting edges of individuals with more than $m$ nominations in two different ways: first by randomly deleting edges with uniform probability (as above in the simulation study), and second by deleting edges with regard to the order in which each individual made their nominations. Figure 2(b) displays the distribution of the number of nominations that were made by each participant in the UrWeb study. We will proceed assuming that the UrWeb data are fully observed, and will demonstrate how this method will work in practice, compare the information loss for different $m$ values, and contrast AFCD, FCD, CB, and naive analyses.

We use the number of days in a month that the subjects consume alcohol as the $w_i$ covariate in this model:

$$\Phi^{-1}\big(p(Y_{ij} = 1)\big) = \beta_0 + \beta_1 w_i + \beta_2 |w_i - w_j|,$$

where $Y_{ij} = 1$ indicates that participant $i$ nominated $j$, and $i \neq j$.

Having artificially induced $m$ 100 times, we estimated $\beta$ using the AFCD, FCD, CB, and naive methods. We present boxplots of $\hat\beta$ in Figure 3. In each of these figures, we denote the estimates from the fully observed UrWeb data (with no induced censoring) with a solid horizontal line, and the estimate from the full data plus or minus the estimated standard error from the full data with dotted horizontal lines.

Fig. 3. Boxplots of $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$, varying the maximum number of nominations $m$, for the AFCD, FCD, naive analysis, and CB in the UrWeb data set. The black horizontal line is the estimated value of the parameter when using the full UrWeb data. The dotted lines (the full-data estimate plus or minus its estimated standard error) are computed from the full UrWeb data for: (a) $\beta_0$, (b) $\beta_1$, and (c) $\beta_2$, respectively.

When $m = 0$ we are only able to obtain maximum likelihood estimates of $\beta$ for the AFCD. Because there is only one way to impose censoring when no nomination data are observed, Figure 3 presents a point estimate rather than a distribution of estimates when $m = 0$.

The analyses in this section differ from those in Section 3 in a few important ways. First, rather than simulating several networks, we are analyzing a single real network $Y$ and repeatedly removing edges at random to form many different realizations of $X$. In these analyses, we do not know the true $\beta$ values, and so we cannot evaluate the bias of the estimates. However, we can use these analyses to investigate information loss due to the design-induced censoring by comparing AFCD, FCD, naive, and CB estimates under an induced $m$ to estimates when we observe $Y$. For example, in Figure 3, excluding $m = 0$, the AFCD estimates are within one standard error of the estimate obtained when the full UrWeb data are observed. This is in sharp contrast to the FCD, naive, and CB estimates when $m$ is small. In general, the AFCD seems to lose less information than the FCD, which in turn loses less information than the naive analysis and the CB.

In this analysis of the UrWeb network, the AFCD produces estimates of $\beta$ that are roughly centered on the full data estimate for $m > 0$. The FCD method produces estimates that diverge from the estimate when the data are fully observed, especially when $m$ is small. This result is in agreement with the simulation study, which also showed that the FCD on average produces somewhat biased estimates when $m$ is small.

Next, we deleted nominations in the reverse order in which they were made by the participants in the UrWeb study so that each participant retained at most $m$ nominations. We estimated $\beta$ using AFCD, FCD, CB, and naive methods. For the AFCD and FCD, we calculated standard error estimates from 500 bootstrap samples. For the naive analyses, we simply used the standard error from a regular probit regression model. For the CB, we used the standard deviation of the posterior draws of the $\beta$ terms to estimate the uncertainty of the estimates. These are presented in Figure 4. The UrWeb study was not explicitly designed to accommodate the AFCD, FCD, or CB design in that participants were not prompted to name a random sample of the people who were important to them. It is possible that study participants chose their nominations non-uniformly, for example nominating peers in the order of their importance. By deleting nominations in reverse order, we seek to investigate whether this could impact inference. We see very similar results when comparing Figure 3, in which nominations were deleted independently of order, to Figure 4, in which nominations were deleted in reverse order. These results suggest that the order in which nominations were made in this data set did not greatly impact the inferences made in these analyses.

Fig. 4. Plots of estimates plus or minus 1 bootstrap standard error for the AFCD and FCD, the probit regression standard error for the naive analysis, and the standard deviation of the posterior distribution estimates for the CB, deleting nominations in the reverse order in which they were made. The dotted lines are computed from the full UrWeb data for: (a) $\beta_0$, (b) $\beta_1$, and (c) $\beta_2$, respectively.

5. Discussion

Collecting complete social network information in a closed population may be difficult as the network survey will impose an unreasonable amount of respondent burden. The FCD seeks to ameliorate respondent burden by asking respondents to nominate up to $m$ individuals in the population with whom they have a particular relationship.

In our application, we demonstrate that estimating associations between behaviors and social relationships from social network data arising from a fixed choice survey design as though the social network was fully observed (as is the current standard practice) can result in severely biased estimates. We introduce observed data likelihoods for FCD data. We demonstrated that maximizing the observed data likelihood for the FCD may improve the MSE in comparison to estimates where the data are (falsely) assumed to be fully observed.

We also introduce the AFCD, a new network survey sampling design and method of analysis which collects information on the total number of relationships for each individual in the network, in addition to the data collected with the standard FCD. This novel study design can add considerable information to analyses without unduly burdening the survey respondent, resulting in improvements over the FCD and naive analyses. We demonstrate that the AFCD is superior to both the FCD and naive analyses, as well as Hoff and others (2013)’s CB in terms of MSE. The improvement of the AFCD’s MSE relative to the FCD, CB, and naive analyses is particularly pronounced when $m$ is small relative to the number of true total nominations. Unsurprisingly, our simulations show that for every estimator, when $m$ is larger, variation and bias are smaller. While collecting all nominations from each survey respondent would be optimal in terms of minimizing variance and bias, the AFCD can provide a way to improve estimation while keeping respondent burden low.

Since the AFCD utilizes information on the true total nominations, the estimates of the intercepts are much better with the AFCD than the FCD or the naive analyses. This suggests that the AFCD should be implemented when edge prediction is a goal of the analyses.

Limitations are acknowledged. In this work, we assume that nominations are randomly censored. Violation of this assumption may lead to incorrect inference. This assumption warrants additional investigation, and further research into survey methodology for AFCD and FCD data is necessary. In this work, however, we find that the order in which respondents nominated their peers did not heavily influence inference.

The analyses and simulations we present use a dyad independent ERGM model. This model does not incorporate important network characteristics including reciprocity, transitivity, and clustering. Modeling network structural characteristics and allowing for complex dependencies is particularly important when the goal of the model is to impute missing edges, or provide a realistic network model. Alternative models that incorporate network characteristics and dependencies include the social relations model (Warner and others, 1979), the ERGM family of models (Frank and Strauss, 1986; Robins and others, 2004; Goodreau, 2007), and the latent space and factor models (Hoff and others, 2002; Hoff, 2009).

Hoff and others (2013) presented likelihoods for fixed rank and FCD data. Hoff et al. assume that there is an underlying parametric model for the network that generates the ranked or binary social relations data. Using a social relations model, Hoff et al. perform estimation in the Bayesian framework. A benefit of that approach is the ability to accommodate both ranked and binary nominations. However, that method relies upon an underlying parametric model, requiring more stringent assumptions. As we are concerned with binary and not ranked data, we have compared the performance of Hoff’s CB estimator to the AFCD, FCD, and naive estimator and found that in simulations the AFCD had uniformly lower MSE than the CB, while the FCD often had lower MSE than the CB. It should be noted that Hoff et al. found that their CB estimator performed comparably to their estimator that accounted for social rankings (Hoff and others, 2013), hence we would anticipate that the AFCD would also outperform the fixed rank estimator. Therefore we suggest that when collecting sociometric data, whether or not relationship rankings are collected, that the total number of relationships should be collected, so that censoring can be more readily accounted for.

Funding

NSF (SES-1230081) including support from the National Agricultural Statistics Service; NSF (DMS-1309004); Research Excellence Award from the Center for Alcohol and Addiction Studies, Brown University; and NIH (R01AA023522, R01CA183854, R01AI108441, P01AA019072, P30AI42853).

References

Barnett N., Ott M. Q., Rogers M., Loxley M., Linkletter C. and Clark M. (2014). Peer associations for substance use and exercise in a college student social network. Health Psychology 33, 1134–1142.
Conlan A. J. K., Eames K. T. D., Gage J. A., von Kirchbach J. C., Ross J. V., Saenz R. A. and Gog J. R. (2010). Measuring social networks in British primary schools through scientific engagement. Proceedings of the Royal Society Series B 278, 1467–1475.
Frank O. and Strauss D. (1986). Markov graphs. Journal of the American Statistical Association 81, 832–842.
Gommans R. and Cillessen A. H. N. (2015). Nominating under constraints: a systematic comparison of unlimited and limited peer nomination methodologies in elementary school. International Journal of Behavioral Development 39, 77–86.
Goodreau S. M. (2007). Advances in exponential random graph (p*) models applied to a large social network. Social Networks 29, 231–248.
Goodreau S. M., Kitts J. A. and Morris M. (2009). Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks*. Demography 46, 103–125.
Handcock M. S. and Gile K. (2007). Modeling Social Networks with Sampled or Missing Data. Working Paper no. 75, Center for Statistics and the Social Sciences, University of Washington.
Hipp J. R., Wang C., Butts C. T., Jose R. and Lakon C. M. (2015). Research note: the consequences of different methods for handling missing network data in stochastic actor based models. Social Networks 41, 56–71.
Hoff P. D. (2009). Multiplicative latent factor models for description and prediction of social networks. Computational and Mathematical Organization Theory 15, 261–272.
Hoff P. D., Fosdick B., Volfovsky A. and Stovel K. (2013). Likelihoods for fixed rank nomination networks. Network Science 1, 253–277.
Hoff P., Fosdick B., Volfovsky A. and He Y. (2015). amen: Additive and Multiplicative Effects Models for Networks and Relational Data. R package version 1.1.
Hoff P. D., Raftery A. E. and Handcock M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association 97, 1090–1098.
Holland P. W. and Leinhardt S. (1973). The structural implications of measurement error in sociometry. Journal of Mathematical Sociology 3, 85–111.
Knoke D. and Yang S. (2008). Social Network Analysis. Quantitative Applications in the Social Sciences, 2nd edition, Volume 154. Thousand Oaks, CA: Sage Publications.
Kossinets G. (2006). Effects of missing data in social networks. Social Networks 28, 247–268.
Marsden P. V. (1990). Network data and measurement. Annual Review of Sociology 16, 435–463.
Mercken L., Snijders T. A. B., Steglich C., Vartianen E. and de Vries H. (2010). Dynamics of adolescent friendship networks and smoking behavior. Social Networks 32, 72–81.
Mossong J., Hens N., Jit M., Beutels M., Auranen P. and Mikolajczyk K. (2008). Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Medicine 5, e74.
Potter G. E., Handcock M. S., Longini I. M. Jr and Halloran M. E. (2011). Estimating within-household contact networks from egocentric data. The Annals of Applied Statistics 5, 1816–1838.
Resnick M. D., Bearman P. S., Blum R. W., Bauman K. E., Harris K. M., Jones J., Tabor J., Beuhring T., Sieving R. E., Shew M., Ireland M., Bearinger L. H. and others (1997). Protecting adolescents from harm. Findings from the National Longitudinal Study on Adolescent Health. Journal of the American Medical Association 278, 823–832.
Robins G., Pattison P. and Woolcock J. (2004). Missing data in networks: exponential random graph (p*) models for networks with non-respondents. Social Networks 26, 257–283.
Smith J. A. and Moody J. (2013). Structural effects of network sampling coverage I: nodes missing at random. Social Networks 35, 652–668.
Wang C., Butts C. T., Hipp J. R., Jose R. and Lakon C. M. (2016). Multiple imputation for missing edge data: a predictive evaluation method with application to Add Health. Social Networks 45, 89–98.
Warner R. M., Kenney D. A. and Stoto M. (1979). A new round robin analysis of variance for social interaction data. Journal of Personality and Social Psychology 37, 1742–1757.
Witvliet M., Brendgen M., van Lier P., Koot H. M. and Vitaro F. (2010). Early adolescent depressive symptoms: prediction from clique isolation, loneliness, and perceived social acceptance. Journal of Abnormal Child Psychology 38, 1045–1056.
Yan B. and Gregory S. (2011). Finding missing edges and communities in incomplete networks. Journal of Physics A: Mathematical and Theoretical 44, 495102.
