Identification of system-level features in HIV migration within a host

Ravi Goyal; Victor De Gruttola; Sara Gianella; Gemma Caballero; Magali Porrachia; Caroline Ignacio; Brendon Woodworth; Davey M Smith; Antoine Chaillon

doi:10.1371/journal.pone.0291367

PLoS One. 2023 Sep 26;18(9):e0291367. doi: 10.1371/journal.pone.0291367

Editor: Nafees Ahemad³

PMCID: PMC10521982 PMID: 37751407

Abstract

Objective

Identify system-level features in HIV migration within a host across body tissues. Evaluate heterogeneity in the presence and magnitude of these features across hosts.

Method

Using HIV DNA deep sequencing data generated across multiple tissues from 8 people with HIV, we represent the complex dependencies of HIV migration among tissues as a network and model these networks using the family of exponential random graph models (ERGMs). ERGMs allow for the statistical assessment of whether network features occur more (or less) frequently in viral migration than might be expected by chance. The analysis investigates five potential features of the viral migration network: (1) bi-directional flow between tissues; (2) preferential migration among tissues in the same biological system; (3) heterogeneity in the level of viral migration related to HIV reservoir size; (4) hierarchical structure of migration; and (5) cyclical migration among several tissues. We calculate Cochran’s Q statistic to assess heterogeneity in the presence and magnitude of these features across hosts. The analysis adjusts for missing data on body tissues.

Results

We observe strong evidence for bi-directional flow between tissues; migration among tissues in the same biological system; and hierarchical structure of the viral migration network. This analysis shows no evidence for a differential level of viral migration with respect to the HIV reservoir size of a tissue. There is evidence that cyclical migration among three tissues occurs less frequently than expected given the amount of viral migration. The analysis also provides evidence for heterogeneity across hosts in the magnitude of these features. Adjusting for missing tissue data identifies system-level features within a host, as well as heterogeneity in the presence of these features across hosts, that are not detected when the analysis considers only the observed data.

Discussion

Identification of common features in viral migration may increase the efficiency of HIV cure efforts as it enables targeting specific processes.

1 Introduction

Despite modern antiretroviral therapy (ART), HIV persists in deep tissue and cellular reservoirs. To cure HIV, we need to better understand reservoir persistence and its dynamics. Without ART, cell-free virions can be detected in the bloodstream and circulate freely across the body. In addition, when ART is interrupted, HIV can rebound from deep tissues and repopulate other cell and tissue reservoirs [1, 2]. The biological mechanisms governing such replenishment and reseeding remain unclear. Identification of mechanisms driving viral migration would increase the efficiency of HIV cure efforts as it enables targeting specific processes. We take a system-level perspective based on statistical network science techniques applied to HIV genomic data to gain insight into such mechanisms by identifying common features of HIV migration. Specifically, our analysis investigates whether system-level features associated with viral migration among the tissues occur more or less often than would be expected by chance; such features include bi-directional flow between tissues, cyclical migration among several tissues, and hierarchical structure of migration. A challenge that arises in the identification of system-level features is the limited access to hard-to-reach tissues, which results in incomplete observation of viral migration events within hosts. Our analysis compares estimates of the frequency of viral migration features with and without adjusting for missing data to investigate the potential impact of such adjustment. The analysis also uses an approach from meta-analysis to evaluate heterogeneity in the frequency of these features across hosts.

To investigate features associated with viral migration, we model the complex dependencies of HIV migration among tissues as a network. We refer to such networks as viral migration networks (VMNs). The use of network analysis has two important advantages over traditional statistical methods. First, network analysis enables statistical investigation of system-level features of replenishment and reseeding of HIV in tissues. Specifically, network analysis enables formal statistical testing of whether features observed in viral migration events occur at greater (or lesser) frequency than expected under a null hypothesis. For example, phylodynamic analysis of HIV sequencing data using Bayesian discrete diffusion models by Chaillon et al. [1] showed bi-directional migration within the central nervous system between the occipital lobe and frontal lobe. Network analysis allows an assessment of whether these observations would be expected by chance given the amount of migration among the tissues within a host [3, 4]. Second, network science methods are required to adjust for missing tissue data, as standard approaches are not able to handle the complex dependencies among the tissues [5–8]. Ignoring missing data when complex dependencies exist by conducting complete case analysis has been shown to result in significant biases in the estimation of network features [9–13]. In this manuscript, we demonstrate that adjusting for missing tissue data can aid in (and potentially be necessary for) the identification of system-level features within a host as well as in assessing heterogeneity in these features across hosts. In particular, the analysis presented below shows that observing evidence for the presence of several features depends on making the adjustment.

A VMN for a participant is a directed network, wherein the tissues and blood are represented as nodes in the network, and the migration events are represented as directed edges from the tissue of egress to the tissue being reseeded or to blood. Migration events can be inferred from viral genomic data using Bayesian discrete trait analyses (DTA), which estimate pairwise migration events between tissues [1]. We use this approach on HIV full-length envelope single genome sequencing data generated from blood and tissue samples from participants enrolled in the Last Gift (LG) cohort [14]. Our analysis of the features of the VMNs uses a common family of statistical network models called exponential random graph models (ERGMs) [3, 4, 15, 16]. Only recently have studies used models to investigate an ensemble of networks [17–19]; these studies have generally focused on binary and complete networks. To our knowledge, this is the first study that makes use of network models to investigate an ensemble of valued networks in the presence of missing data.
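The network representation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `build_vmn`, the tissue labels, and the event list are hypothetical, and the only assumption taken from the text is that a VMN is a directed network whose edge values count inferred migration events.

```python
from collections import defaultdict


def build_vmn(migration_events):
    """Build a directed, valued network from (source, target) migration events.

    Returns a dict mapping (source, target) -> number of inferred events.
    """
    counts = defaultdict(int)
    for source, target in migration_events:
        if source == target:
            continue  # a migration event links two distinct tissues
        counts[(source, target)] += 1
    return dict(counts)


# Hypothetical events; tissue names and counts are illustrative only.
events = [
    ("lymph_node", "blood"),
    ("lymph_node", "blood"),
    ("blood", "lymph_node"),
    ("frontal_lobe", "occipital_lobe"),
]
vmn = build_vmn(events)
```

Storing edges as ordered pairs keeps the direction of each event explicit, so the edge from `lymph_node` to `blood` and the edge from `blood` to `lymph_node` are counted separately.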

2 Results

2.1 Descriptive analyses of viral migration networks

The VMNs based on DTA applied to the LG cohort are shown in Fig 1. The nodes and edges represent tissues and migration events among the tissues, respectively. The node color indicates the biological system to which the tissue belongs (for example, central nervous system [CNS] or gut), the node size is proportional to the number of sequences within the sample, and the edge width denotes the number of migrations between the two connected tissues. As illustrated in the figure, the number of unsampled/missing tissues varies between 8 (LG04) and 26 (LG12).

The subsequent sections present results from the investigation of five features: (1) bi-directional flow between tissues; (2) preferential migration among tissues in the same biological system; (3) heterogeneity in level of viral migration related to HIV reservoir size; (4) hierarchical structure of migration; and (5) cyclical migration among several tissues. The results include an assessment of the heterogeneity in the presence and magnitude across individuals for each of these features as well as of the impact of missing data on the estimates associated with the features.

2.2 Viral migration is bi-directional (Reciprocity)

Our analysis provides strong evidence for bi-directional migration, also referred to as reciprocity. Fig 2 shows the estimates (blue points) and their 95% confidence intervals (CIs) (blue error bars) for reciprocity from ERGMs that adjust for missing data for each participant. These results imply that reciprocity is a common feature across all 8 LG participants, but its magnitude varies; i.e., there appears to be heterogeneity in the level of reciprocity across participants. For example, the ERGM estimate of reciprocal HIV migration for LG12 is 3.8 (95% CI 3.2–4.4), whereas for both LG04 and LG06 it is 1.2 (95% CI 1.0–1.4). These estimates may be interpreted as the increase in log-odds, attributable to reciprocity, of a network containing an edge that raises the number of reciprocal pairs by one, compared with the same network without that edge. We test for heterogeneity using Cochran’s Q statistic [20, 21], which yields a value of 46.9 with degrees of freedom of 8, and therefore a p-value of < 0.001, providing strong evidence of heterogeneity in the level of reciprocity. Note that the presence of reciprocity does not imply that the same viral sequence goes back and forth between the tissues, but rather that there is a bi-directional dynamic process of seeding and replenishment.

The smaller estimates for reciprocity from models not adjusting for missing data (shown in red in Fig 2) imply that missing data has a large impact on reciprocity estimates. Note that adjusting for missing data can lead to either a higher or lower propensity of any particular edge being part of a reciprocal pair. Furthermore, visual inspection of Fig 2 shows that adjusting for missing data increases heterogeneity across hosts. The Cochran’s Q statistic without missing data adjustment is 1.9 (p-value 0.98), which provides no evidence for heterogeneity. Hence, failure to adjust for missing data would lead to an underestimate not only of the level of reciprocity but also of its heterogeneity across individuals.
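To make the reciprocity statistic and the log-odds reading of its coefficient concrete, the following sketch counts reciprocal pairs in a small binary network and converts a log-odds estimate into a multiplicative change in odds. It is an illustration under simplifying assumptions: the network is binary rather than valued, the function names and tissue labels are hypothetical, and the conversion simply exponentiates the coefficient.

```python
from math import exp


def count_reciprocal_pairs(edges):
    """Count unordered tissue pairs {i, j} with migration in both directions."""
    edge_set = {(i, j) for i, j in edges if i != j}
    # Each reciprocal pair is seen twice, once from each direction.
    return sum(1 for (i, j) in edge_set if (j, i) in edge_set) // 2


def odds_multiplier(log_odds):
    """Convert a log-odds coefficient into a multiplicative change in odds."""
    return exp(log_odds)


# Hypothetical binary network: one reciprocal pair plus one one-way edge.
edges = [("blood", "spleen"), ("spleen", "blood"), ("blood", "colon")]
n_reciprocal = count_reciprocal_pairs(edges)

# Estimates reported in the text: 3.8 for LG12, 1.2 for LG04 and LG06.
lg12_multiplier = odds_multiplier(3.8)
lg04_multiplier = odds_multiplier(1.2)
```

Under this reading, an estimate of 3.8 corresponds to odds roughly 45 times higher, and an estimate of 1.2 to odds roughly 3.3 times higher, for an edge that completes a reciprocal pair.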

2.3 HIV migration events are more common among tissues in the same biological system (Homophily)

Another common process that can occur in a network is the tendency of entities (i.e., sampled tissues) to form connections based on similarity of one or more of their individual characteristics. Connection formation is often assortative but can also be disassortative; the former implies preferential formation of connections between tissues with similar characteristics, and the latter, between tissues with contrasting characteristics [22]. The term homophily is used to describe both the process and the outcome of preferential connection formation. To assess homophily in this setting, we categorize each tissue based on its biological system. We then assess whether HIV migration events are more common among tissues in the same category. Controlling for the total number of inferred migration events, the propensity of non-zero edges, and the number of HIV sequences obtained from each tissue, our analysis provides strong evidence for the presence of homophily in 6 of the 8 participants. Fig 3 shows homophily estimates (blue points) and their 95% confidence intervals (CIs) (blue error bars) from ERGMs after adjusting for missing data for each participant. For those individuals in whom homophily is detected, there is considerable heterogeneity in its magnitude. For example, LG04 shows little evidence of homophily (log-odds of 0.2, 95% CI 0.1–0.3), whereas LG12 and LG15 show significant evidence of homophily (log-odds of 1.4 for both; 95% CI 1.1–1.8 for LG12 and 1.2–1.6 for LG15). Cochran’s Q statistic provides statistical evidence of heterogeneity in the presence of homophily (p-value of 0.006).

Fig 3. The blue and red points present estimates from analyses that did and did not adjust for missing data, respectively.

As with reciprocity, adjusting for missing data can have a large impact on the estimates associated with homophily as well as on the assessment of heterogeneity; estimates with no missing data adjustment are shown in red. Fig 3 shows that the point estimates and 95% CIs for the level of homophily adjusting for missing data (shown in blue) are typically larger than estimates without adjustment (red). In contrast to results adjusted for missing data, the Cochran’s Q statistic without adjustment is 2.58 (p-value 0.96), implying no evidence for heterogeneity.
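The raw quantity behind a homophily term is the number of edges that stay within a category. The sketch below counts such edges overall and per biological system for a binary network. It is illustrative only: the function names, tissue labels, and system assignments are hypothetical, and the counts are descriptive statistics rather than the ERGM estimates reported here.

```python
def count_same_system_edges(edges, system_of):
    """Count directed edges whose two endpoints belong to the same system."""
    return sum(1 for i, j in edges if system_of[i] == system_of[j])


def count_same_system_edges_by_system(edges, system_of):
    """Count within-system edges separately for each biological system."""
    counts = {}
    for i, j in edges:
        if system_of[i] == system_of[j]:
            counts[system_of[i]] = counts.get(system_of[i], 0) + 1
    return counts


# Hypothetical tissue-to-system assignment and binary migration edges.
system_of = {
    "frontal_lobe": "CNS",
    "occipital_lobe": "CNS",
    "colon": "Gut",
    "ileum": "Gut",
    "blood": "Blood",
}
edges = [
    ("frontal_lobe", "occipital_lobe"),
    ("occipital_lobe", "frontal_lobe"),
    ("colon", "ileum"),
    ("blood", "colon"),
]
total_within = count_same_system_edges(edges, system_of)
within_by_system = count_same_system_edges_by_system(edges, system_of)
```

A model-based assessment then asks whether such counts exceed what would be expected given the overall amount of migration and the other controls, which a raw count alone cannot answer.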

We hypothesize that tissues that are closer in the body have a higher level of homophily. For example, homophily among central nervous system (CNS) tissues can arise from the blood-brain barrier, which has the potential to make the CNS differ from any other organ regarding HIV persistence. Other biological causes of homophily include similarity of cell composition. We estimate a model that can assess homophily for each tissue type. Controlling for the total number of inferred migration events, the propensity of a non-zero edge, and the number of HIV sequences, as well as adjusting for missing data, our analysis reveals a strong indication of homophily in both CNS tissues and gut in 5 of the 7 participants; the ERGM for participant LG12 did not converge, a known issue with ERGMs [23–25]. Table 1 shows the estimates from the ERGMs for each participant (‘*’ indicates statistical significance at the 0.05 level). Given the number of tissues sampled for each category, these results should be viewed as preliminary, and we do not present results without adjusting for missing data.

Table 1. Estimation of the level of homophily in VMNs by tissue type.

Cells with an ‘*’ indicate statistical significance at the 0.05 level.

System	LG01	LG03	LG04	LG05	LG06	LG08	LG15
Blood	-1.61*	-0.1	-0.64	-0.03	-0.28	-0.18	-0.19
CNS	-0.15	1.75*	0.67*	1.9*	0.83*	-0.22	2.31*
Genitourinary tract	0.15	-0.11	-0.03	-0.03	0.02	-0.44	-0.09
Gut	0.09	0.35*	0.22*	-0.86*	0.47*	0.12	1.98*
Lymphoid	-0.01	0.29*	0.17	0.52*	0.03	-0.21*	-0.06
Other	-0.81	-0.09	-0.07	-0.04	0.06	0.2	-0.07

2.4 Number of viral migration events is not associated with reservoir size (heterogeneity in outward events)

We hypothesize that having large HIV reservoirs (i.e., level of cell-associated HIV DNA) may directly impact the number of viral migration events from tissues. Testing this hypothesis is based on investigation of heterogeneity in outward events among tissues, i.e., assessment of whether tissues with larger HIV reservoirs have a greater or smaller number of outward edges compared to others. Our model controls for the total number of inferred migration events, the propensity of the tissue to have a non-zero number of edges, and the number of HIV sequences. Fitting these models yielded no indication of heterogeneity in outward events based on HIV reservoir size for any of the LG participants (results not shown), i.e., the size of the HIV DNA reservoirs in sampled tissue did not impact the number of outgoing edges.

2.5 Viral migration forms a hierarchical structure among the tissues (Triads: Transitivity and 3-cycles)

Transitivity and 3-cycles are network properties that are often considered in the investigation of networks; both involve triads, i.e., three-node subgraphs. Transitivity is the tendency for migration events to occur between tissues ‘A’ and ‘C’ if there are events between tissues ‘A’ and ‘B’ and between ‘B’ and ‘C’. A cycle consisting of 3 nodes is a feedback process, but is distinct from reciprocity, which is a 2-node cycle. The results below in Table 2 are based only on the observed data (‘*’ indicates statistical significance at the 0.05 level); ERGMs adjusting for missing data did not converge. Evidence in support of a greater-than-expected number of network motifs associated with transitivity (as shown by positive and significant estimates), but a less-than-expected number of 3-cycles (negative and significant estimates), implies a hierarchical structure in VMNs.

Table 2. Estimation of the level of triads and 3-cycles in VMNs.

Cells with an ‘*’ indicate statistical significance at the 0.05 level.

Feature	LG01	LG03	LG04	LG05	LG06	LG08	LG12	LG15
Transitivity	0.21*	0.38*	0.24*	0.27*	0.22*	0.44*	0.53*	0.65*
Cycles	-0.08	-0.16*	-0.09*	-0.05	-0.14*	-0.14*	-0.18	-0.05
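The two triad motifs in Table 2 can be counted directly on a small example. The sketch below is a simplified illustration for binary directed networks; the function names are hypothetical, and the valued-network statistics used in the fitted models may be defined differently.

```python
from itertools import permutations


def _nodes(edge_set):
    """Collect every node that appears in at least one edge."""
    return {node for edge in edge_set for node in edge}


def count_transitive_triples(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    return sum(
        1
        for a, b, c in permutations(_nodes(edge_set), 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


def count_three_cycles(edges):
    """Count directed 3-cycles a->b->c->a, each cycle counted once."""
    edge_set = set(edges)
    ordered = sum(
        1
        for a, b, c in permutations(_nodes(edge_set), 3)
        if (a, b) in edge_set and (b, c) in edge_set and (c, a) in edge_set
    )
    return ordered // 3  # every cycle appears under its three rotations


hierarchy = [("A", "B"), ("B", "C"), ("A", "C")]  # A feeds B and C, B feeds C
feedback = [("A", "B"), ("B", "C"), ("C", "A")]  # flow returns to its source
```

The `hierarchy` example contains one transitive triple and no 3-cycle, whereas `feedback` contains one 3-cycle and no transitive triple; a network rich in the former and poor in the latter has a clear ordering of tissues, which is the sense of hierarchy used here.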

3 Discussion

We investigate five network features: (1) bi-directional flow between tissues; (2) preferential migration among tissues in the same biological system; (3) heterogeneity in the level of viral migration with regard to HIV reservoir size; (4) hierarchical structure of migration; and (5) cyclical migration among several tissues. Due to biological and logistical challenges, not all tissues across a human body are sampled and sequenced; hence, the observations regarding the VMNs are incomplete. We adjust for missing data on host tissues. Our results provide strong evidence for bi-directional flow between tissues, migration among tissues in the same biological system, and hierarchical structure of the viral migration. This analysis provides no evidence that the level of viral migration depends on the HIV reservoir size of a tissue. There is evidence in support of the belief that cyclical migration among three tissues occurs less frequently than expected given the amount of viral migration. The analyses also provide evidence for heterogeneity in the presence and magnitude of these features across hosts. Adjustment for missing data has a large impact on our estimates. In particular, adjusting for missing tissue data identifies system-level features within a host, as well as heterogeneity in these features across hosts, that are not detected without the adjustment.

Evidence for transitivity and against 3-cycles indicates a hierarchical structure in VMNs. This is noteworthy as it tends to imply that few tissues are initial sources of reseeding, thereby further implying that migration propagates through intermediate tissues. This finding complements the absence of evidence for heterogeneity in outward events with respect to HIV reservoir size. Once again, the implication is that VMNs are not characterized by a small number of tissues with large HIV reservoirs serving as the source of direct viral migration to other tissues; that is, our VMNs do not appear to have a hub-and-spoke network structure with respect to HIV reservoir size. One possible explanation for the absence of heterogeneity in outward events may be that HIV DNA does not reflect replication-competent HIV; our findings might be best confirmed by basing analyses on measures of HIV RNA transcripts. Such data are currently being generated for the LG participants. To control for any sample bias (e.g., re-sampling of sequences), our analysis adjusted for the number of unique sequences identified in the tissues. This adjustment may also explain why tissues with high levels of HIV DNA were not found to be “hubs”. De Scheerder et al. [2] found complementary results, specifically that rebound virus can originate from several cellular and anatomical compartments after treatment interruption.

Our investigation has implications for studies aimed at developing HIV curative strategies by identifying features of VMNs, even though it does not reveal the biological mechanisms driving viral migration. The latter would require deeper study of additional biomarkers—currently underway using the LG cohort. For example, our investigation of homophily shows that tissues within the same biological system are more likely to have migration events among them than tissues that are not within a single system, but the analysis does not reveal the underlying mechanism. Possible explanations for our results are that tissues within a biological system have similar cell composition, a high level of blood exchange, or share other characteristics that promote viral migration. Further analyses using HIV sequences generated at the cellular level are in progress and will be integrated into future models.

Evolution of HIV-1 in a host is shaped by many evolutionary forces, including recombination. If compartmentalization reflects spatial segregation of the virus population, viral recombination is a result of population mixing. Hence, if different point mutations arise in different tissues, viral migration may bring these variants together and eventually lead to recombination and an intermixed viral population. We acknowledge that both migration and recombination should be investigated when studying HIV-1 dynamics within a host. While our study does not attempt to evaluate this combined effect, which would require further investigation [26], we investigate the potential impact of intra-host recombination. To do so, we first use GARD to identify potential recombination breakpoints [27]; see S1 Table in S1 File for an overview of the number of putative breakpoints identified. Next, we run our network models (i.e., ERGMs) using the dataset partitioned according to the inferred breakpoint(s). However, these new models exhibit convergence issues; therefore, we cannot provide a conclusive assessment of the impact of recombination on features of the viral migration network. Below we provide additional details regarding convergence issues with ERGMs and an alternative network model that does not exhibit such issues but requires additional methodological development. Furthermore, other factors, such as local immune pressure and antiretroviral therapies, would also need to be considered to comprehensively characterize factors influencing viral dynamics and evolution within a host.

Our analysis has several other scientific limitations: it was conducted on 8 participants, some of whom had only a limited number of tissues sampled. Also, there is heterogeneity in the participants, in particular in their terminal disease and ART usage; this heterogeneity may impact the generalizability of our findings. In addition, the construction of the VMNs is based on HIV full-length envelope sequences (gp160) obtained using single genome dilution techniques. We also include a filtering step to identify defective or hypermutant sequences [1]. While HIV envelope sequences contain less information than full-length genomes, they are less likely to be affected by amplification failure of long fragments. Furthermore, full-length genome sequencing may miss a large proportion of intact proviruses due to amplification failure [28]. Our findings can also be impacted by blood T cell contamination of tissue samples obtained during autopsy. Previous sequence analyses on the samples showed viral compartmentalization for all participants, which suggests that possible blood contamination would not negate our findings; see Chaillon et al. [1] for additional details regarding contamination. With regard to statistical issues, ERGMs can suffer from degeneracy, as seen in some of the analyses, including transitivity and 3-cycles [23–25]. Another limitation of ERGMs is that the theoretical foundation for estimation of standard errors has not been fully developed, even for completely observed networks [29]. Additional development is needed to incorporate the additional uncertainty that arises from missing data in estimates of confidence intervals. Such uncertainty can be captured using a Bayesian paradigm. Bayesian extensions of ERGMs, referred to as BERGMs, have been developed, which can capture such uncertainty in estimating credible intervals [8]. These approaches, however, do not currently allow for analyzing valued networks. In addition, there are computational limits to the size of networks that can be analyzed using BERGMs. A potential future direction is investigating complex features using the congruence class model (CCM) for networks [30–34]. CCMs form a broad class that includes as special cases such common network models as the Erdős-Rényi-Gilbert and stochastic block models as well as many ERGMs. CCMs require additional methodological development to address missing network data, which is necessary for our context of missing tissues; such methodological development is currently underway.

The primary goal of our investigation is to understand the potential (and necessity) of analyzing viral migration using network science techniques. While phylodynamic modeling has greatly enhanced these efforts, our analysis provides additional insights through analysis of migration as a network rather than as pairwise events between two tissues. Insights gained using network science techniques in analysis of VMNs have therapeutic implications, in that they may aid in the identification of common features in viral migration in people with HIV (or a subpopulation, such as those who interrupt ART). An understanding of these features may elucidate potential processes to target the source of viral reseeding. For example, our findings suggest a hierarchical structure for viral migration among the tissues. Treatments targeting tissues upstream may be more efficient in preventing viral rebound compared to treatments focused on tissues further down in the viral migration structure. Therefore, this research may serve as initial insight into developing more efficient treatments to prevent viral migration and reseeding of tissues.

4 Methods

4.1 Last Gift Cohort

The data derive from the LG Cohort, which is a cohort from an end-of-life HIV research study underway at the University of California San Diego. The goal of the study is to investigate HIV reservoirs using a rapid autopsy procedure in people with HIV (PWH) who voluntarily agree to have their organs harvested post-mortem [1, 14]. To accomplish this goal, the study collects detailed clinical data and biological samples from participants before death and then collects additional samples during a rapid autopsy procedure [35]. Table 3 presents the demographic characteristics of the participants and summary statistics of their VMN, including the number of observed and missing tissues and the number of directed edges (i.e., the presence of a migration event inferred from Bayesian models). S2 Table in S1 File provides information on the LG participant number, tissue name, system category, and number of HIV sequences from each tissue sample. All HIV sequences, except for those from blood plasma samples, are proviral DNA sequences. The genomic DNA is extracted, and precipitation is performed to concentrate DNA. Concentrations of DNA are determined using NanoDrop One (ThermoScientific). We perform single genome dilution and sequencing of full-length envelope (gp160). See the S1 File in [1] for additional information on collection and processing of data from the LG Cohort.

Table 3. Demographic characteristics of the participants and summary statistics of their VMN, including the number of observed and missing tissues and the number of directed edges (i.e., the presence of a migration event inferred from Bayesian models).

ID	Gender	On ART at death	Cancer	Observed tissues	Missing tissues	Observed migration
LG01	M	Yes	No	10	23	23
LG03	M	No	Yes	19	14	87
LG04	M	Yes	No	25	8	151
LG05	M	No	No	13	20	36
LG06	M	No	Yes	20	13	100
LG08	M	No	Yes	20	13	122
LG12	F	No	Yes	7	26	20
LG15	M	Yes	No	11	22	41

4.2 Bayesian discrete phylogeographic models

In this study, we use inferred migration events between tissues obtained from Bayesian discrete phylogeographic models using HIV full-length envelope sequences sampled in each tissue [36, 37]. Briefly, discrete trait analyses (DTA) consider spatial diffusion among discrete locations, from which viral sequences have been sampled, as a continuous-time Markov process [36]. From a statistical perspective, the Bayesian stochastic search variable selection (BSSVS) procedure used in DTA is particularly appropriate because statistical inference is efficient (achieves lowest variance). Importantly, it also uses a Bayes Factor (BF) test to infer the most parsimonious description of the diffusion process [36]. With the BSSVS procedure, BF support for all possible migration rates is obtained in a single DTA analysis [36]. As this procedure only accounts for the number of trait states (e.g., locations), it remains difficult to assess significant support for a particular migration link. To remedy this problem, the authors of [36] developed a new measure of significance (the adjusted BF test) that has a low false-positive rate by incorporating information on the relative abundances of samples from each location in the data set. The new measure of support for a particular migration link relies on the a priori expected and a posteriori observed inclusion frequencies under BSSVS. See Chaillon et al. [1] for additional details.
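As a rough illustration of how prior and posterior inclusion frequencies combine into a measure of support, the sketch below computes a Bayes factor as the ratio of posterior odds to prior odds of a link being included. This is a generic construction, not the implementation in [36]: the function name is hypothetical, and because the way the adjusted test derives its link-specific prior from sample abundances is not described here, the prior inclusion probability is simply taken as an input.

```python
def bayes_factor(posterior_inclusion, prior_inclusion):
    """Ratio of posterior odds to prior odds that a migration link is included."""
    for p in (posterior_inclusion, prior_inclusion):
        if not 0.0 < p < 1.0:
            raise ValueError("inclusion probabilities must lie strictly between 0 and 1")
    posterior_odds = posterior_inclusion / (1.0 - posterior_inclusion)
    prior_odds = prior_inclusion / (1.0 - prior_inclusion)
    return posterior_odds / prior_odds


# Hypothetical link: included in 90% of posterior samples, 50% expected a priori.
bf_example = bayes_factor(0.9, 0.5)
```

A link whose posterior inclusion frequency equals its prior expectation yields a Bayes factor of 1, i.e., the data provide no support either way.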

4.3 Exponential random graph models

We use ERGMs to make inferences regarding features in the presence of missing data. Let Y be the space of all potential directed valued networks with N tissues. In our setting N = 33, the number of unique tissues sampled across all LG participants. The values represent counts of migration events among the tissues. Let y ∈ Y and let y_i,j be the value for edge (i, j), where i and j denote tissues. The probability mass function (PMF) defined by an ERGM on Y is:

$$P_{h,g}(Y = y; θ) = \frac{h(y) \exp(θ^{T} g(y))}{κ_{h,g}(θ)}, \tag{1}$$

where θ is our vector of parameters, g(y) is a vector of summary statistics of y, h specifies a reference measure, and κ_h,g(θ) is a normalizing constant; see Krivitsky et al. [38] for additional details.
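For intuition, the PMF in Eq (1) can be evaluated exactly on a toy problem by enumerating every network. The sketch below makes several simplifying assumptions that differ from the analysis in this paper: the network is binary rather than valued, it has only three nodes, the reference measure is h(y) = 1, and g(y) contains just the number of edges and the number of reciprocal pairs. All function names are hypothetical.

```python
from itertools import combinations, permutations
from math import exp


def network_statistics(y):
    """Return g(y) = (number of edges, number of reciprocal pairs)."""
    n_edges = len(y)
    n_mutual = sum(1 for (i, j) in y if (j, i) in y) // 2
    return n_edges, n_mutual


def ergm_pmf(nodes, theta):
    """Enumerate every binary directed network on `nodes` with its probability."""
    dyads = list(permutations(nodes, 2))  # all ordered pairs of distinct nodes
    weights = {}
    for size in range(len(dyads) + 1):
        for subset in combinations(dyads, size):
            y = frozenset(subset)
            g = network_statistics(y)
            # With h(y) = 1 the unnormalized weight is exp(theta^T g(y)).
            weights[y] = exp(sum(t * s for t, s in zip(theta, g)))
    kappa = sum(weights.values())  # normalizing constant
    return {y: w / kappa for y, w in weights.items()}


# Hypothetical parameters: sparse network with a preference for reciprocity.
pmf = ergm_pmf(["A", "B", "C"], theta=(-1.0, 2.0))
```

Three nodes already give 2^6 = 64 binary networks; the number of possible networks grows so quickly with N that direct enumeration is not feasible for networks of the size analyzed here.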

Our analyses investigate network models associated with each of our features. Each of the models controls for the total amount of migration, the propensity of a non-zero edge, and the number of HIV sequences. We define a tissue as missing if the tissue sample is not available for an individual but is collected for at least one other participant in the LG cohort. To account for missing data, we assume that the pattern of missingness is ignorable; that is, the probability of a value being missing depends only on the observed data [7]. The likelihood for model parameters is then calculated by marginalizing the ERGM likelihood over all possible complete networks that are compatible with the observed data. The models are estimated using the STATNET package [39, 40] in R [41].
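The marginalization step can be illustrated on a toy problem in which some dyads are unobserved. The sketch below sums ERGM weights over every complete network that agrees with the observed dyads and divides by the normalizing constant. It is a brute-force illustration under assumptions that do not hold in the real analysis (binary edges, three nodes, h(y) = 1, statistics limited to edges and reciprocal pairs), and all names are hypothetical.

```python
from itertools import permutations, product
from math import exp


def unnormalized_weight(y, theta_edges, theta_mutual):
    """exp(theta^T g(y)) with g(y) = (edges, reciprocal pairs) and h(y) = 1."""
    n_edges = len(y)
    n_mutual = sum(1 for (i, j) in y if (j, i) in y) // 2
    return exp(theta_edges * n_edges + theta_mutual * n_mutual)


def observed_likelihood(nodes, observed, theta_edges, theta_mutual):
    """Likelihood of the observed dyads, marginalized over the missing ones.

    `observed` maps an ordered pair (i, j) to 0 or 1; pairs absent from the
    mapping are treated as missing.
    """
    dyads = list(permutations(nodes, 2))
    kappa = 0.0       # sum of weights over all complete networks
    compatible = 0.0  # sum of weights over networks matching the observed dyads
    for values in product((0, 1), repeat=len(dyads)):
        y = frozenset(d for d, v in zip(dyads, values) if v)
        weight = unnormalized_weight(y, theta_edges, theta_mutual)
        kappa += weight
        if all((d in y) == bool(v) for d, v in observed.items()):
            compatible += weight
    return compatible / kappa


# Hypothetical data: tissue C was not sampled, so only the A-B dyads are observed.
observed = {("A", "B"): 1, ("B", "A"): 1}
likelihood = observed_likelihood(["A", "B", "C"], observed, -0.5, 1.0)
```

When the reciprocity parameter is zero the edges are independent, and the marginal likelihood reduces to the product of the observed edge probabilities; with dependence terms, the unobserved dyads contribute to the likelihood, which is why ignoring them can bias estimates.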

4.4 Assessing heterogeneity

To investigate heterogeneity in the effects of mechanisms across participants, we use both graphical and statistical assessments. For our graphical assessment, we investigate whether confidence intervals for the parameter associated with a particular mechanism have little overlap across individuals. Such a pattern generally indicates the presence of heterogeneity. To formally test for heterogeneity, we use Cochran’s Q statistic [20], shown below:

$$Q = \sum_{i = 1}^{n} w_{i} {(\hat{θ_{i}} - \hat{θ})}^{2}, \tag{2}$$

where $\hat{θ}$ is the mean estimate across all LG participants for the network property of interest, $\hat{θ_{i}}$ is the estimate for LG participant i, and w_i is the inverse of the variance for participant i. The sum is across all n = 8 LG participants. Due to weights on the items in the sum, the value of Q depends not only on the deviation of $\hat{θ_{i}}$ from $\hat{θ}$ , but also on the precision of participant estimates. The Cochran’s Q statistic approximately follows a chi-squared distribution, with n − 1 degrees of freedom.
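A heterogeneity test of this kind can be sketched with the standard library alone. The code below uses the usual meta-analytic form of Cochran’s Q, in which the pooled estimate is the inverse-variance-weighted mean; that choice, the function names, and the example inputs are assumptions made for illustration rather than a description of the authors' implementation. The chi-squared tail probability is computed from a series expansion of the regularized incomplete gamma function.

```python
from math import exp, gamma


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with `df` degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the lower incomplete gamma: sum_k z^k / (a (a+1) ... (a+k)).
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    lower = (z ** a) * exp(-z) * total / gamma(a)
    return max(0.0, 1.0 - lower)


def cochran_q(estimates, std_errors):
    """Return (Q, degrees of freedom, p-value) for a set of estimates."""
    if len(estimates) < 2 or len(estimates) != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical estimates and standard errors for two participants.
q, df, p_value = cochran_q([1.0, 3.0], [1.0, 1.0])
```

Because each squared deviation is weighted by the inverse variance, a participant with a tight confidence interval contributes more to Q than one with a wide interval, even when both deviate equally from the pooled estimate.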

Supporting information

S1 File. Contains all the supplementary tables.

(PDF)

Data Availability

The sequence data have been uploaded on Dryad at DOI: 10.5061/dryad.dncjsxm44. The code has been uploaded to Github at: https://github.com/ravigoyalgit/VMN.

Funding Statement

This research is supported by grants from the National Institutes of Health (P01 AI131385, R01 AI147441, DP2 DA051915, R01 DK131532, P01 AI169609, P30 AI036214, UM1 AI164570, R01 AI147821, UM1 AI164559, R01 DA055491), the James B. Pendleton Charitable Trust, and the Department of Veterans Affairs.

References

1. Chaillon A, Gianella S, Dellicour S, Rawlings SA, Schlub TE, De Oliveira MF, et al. HIV persists throughout deep tissues with repopulation from multiple anatomical sources. The Journal of Clinical Investigation. 2020;130(4):1699–1712. doi: 10.1172/JCI134815
2. De Scheerder MA, Vrancken B, Dellicour S, Schlub T, Lee E, Shao W, et al. HIV rebound is predominantly fueled by genetically identical viral expansions from diverse reservoirs. Cell Host & Microbe. 2019;26(3):347–358. doi: 10.1016/j.chom.2019.08.003
3. Robins G, Pattison P, Kalish Y, Lusher D. An introduction to exponential random graph (p*) models for social networks. Social Networks. 2007;29(2):173–191. doi: 10.1016/j.socnet.2006.08.003
4. Lusher D, Koskinen J, Robins G. Exponential random graph models for social networks: Theory, methods, and applications. vol. 35. Cambridge University Press; 2013.
5. Wang C, Butts CT, Hipp JR, Jose R, Lakon CM. Multiple imputation for missing edge data: a predictive evaluation method with application to add health. Social Networks. 2016;45:89–98. doi: 10.1016/j.socnet.2015.12.003
6. Koskinen JH, Robins GL, Pattison PE. Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation. Statistical Methodology. 2010;7(3):366–384. doi: 10.1016/j.stamet.2009.09.007
7. Handcock MS, Gile KJ. Modeling social networks from sampled data. The Annals of Applied Statistics. 2010;4(1):5. doi: 10.1214/08-AOAS221
8. Caimo A, Friel N. Bayesian inference for exponential random graph models. Social Networks. 2011;33(1):41–55. doi: 10.1016/j.socnet.2010.09.004
9. Kossinets G. Effects of missing data in social networks. Social Networks. 2006;28(3):247–268. doi: 10.1016/j.socnet.2005.07.002
10. Smith JA, Moody J. Structural effects of network sampling coverage I: Nodes missing at random. Social Networks. 2013;35(4):652–668. doi: 10.1016/j.socnet.2013.09.003
11. Smith JA, Moody J, Morgan JH. Network sampling coverage II: The effect of non-random missing data on network measurement. Social Networks. 2017;48:78–99. doi: 10.1016/j.socnet.2016.04.005
12. Krause RW, Huisman M, Steglich C, Snijders T. Missing data in cross-sectional networks–An extensive comparison of missing data treatment methods. Social Networks. 2020;62:99–112. doi: 10.1016/j.socnet.2020.02.004
13. Smith JA, Morgan JH, Moody J. Network sampling coverage III: Imputation of missing network data under different network and missing data conditions. Social Networks. 2022;68:148–178. doi: 10.1016/j.socnet.2021.05.002
14. Gianella S, Taylor J, Brown TR, Kaytes A, Achim CL, Moore DJ, et al. Can Research at the End-of-life be a Useful Tool to Advance HIV Cure? AIDS (London, England). 2017;31(1):1. doi: 10.1097/QAD.0000000000001300
15. Frank O, Strauss D. Markov graphs. Journal of the American Statistical Association. 1986;81(395):832–842. doi: 10.1080/01621459.1986.10478342
16. Wasserman S, Pattison P. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika. 1996;61(3):401–425. doi: 10.1007/BF02294547
17. Yin F, Shen W, Butts CT. Finite Mixtures of ERGMs for Modeling Ensembles of Networks. Bayesian Analysis. 2022;1(1):1–39.
18. Lehmann B, White S. Bayesian exponential random graph models for populations of networks. arXiv preprint arXiv:2104.05110. 2021.
19. Lunagómez S, Olhede SC, Wolfe PJ. Modeling network populations via graph distances. Journal of the American Statistical Association. 2021;116(536):2023–2040. doi: 10.1080/01621459.2020.1763803
20. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10(1):101–129. doi: 10.2307/3001666
21. Harrer M, Cuijpers P, Furukawa TA, Ebert DD. Doing meta-analysis with R: A hands-on guide. Chapman and Hall/CRC; 2021.
22. Goodreau SM, Kitts JA, Morris M. Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography. 2009;46(1):103–125. doi: 10.1353/dem.0.0045
23. Handcock MS, Robins G, Snijders T, Moody J, Besag J. Assessing degeneracy in statistical models of social networks. Working paper; 2003.
24. Snijders TA, Pattison PE, Robins GL, Handcock MS. New specifications for exponential random graph models. Sociological Methodology. 2006;36(1):99–153. doi: 10.1111/j.1467-9531.2006.00176.x
25. Mukherjee S. Degeneracy in sparse ERGMs with functions of degrees as sufficient statistics. Bernoulli. 2020;26(2):1016–1043. doi: 10.3150/19-BEJ1135
26. Sarkar S, Romero-Severson E, Leitner T. Migration coupled with recombination explains disparate HIV-1 anatomical compartmentalization signals. bioRxiv. 2023; p. 2023–04.
27. Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SD. Automated phylogenetic detection of recombination using a genetic algorithm. Molecular Biology and Evolution. 2006;23(10):1891–1901. doi: 10.1093/molbev/msl051
28. White JA, Kufera JT, Bachmann N, Dai W, Simonetti FR, Armstrong C, et al. Measuring the latent reservoir for HIV-1: Quantification bias in near full-length genome sequencing methods. PLoS Pathogens. 2022;18(9):e1010845. doi: 10.1371/journal.ppat.1010845
29. Kolaczyk ED, Krivitsky PN. On the question of effective sample size in network modeling: An asymptotic inquiry. Statistical Science. 2015;30(2):184. doi: 10.1214/14-STS502
30. Goyal R, Blitzstein J, De Gruttola V. Sampling networks from their posterior predictive distribution. Network Science. 2014;2(01):107–131. doi: 10.1017/nws.2014.2
31. Goyal R, De Gruttola V. Inference on network statistics by restricting to the network space: applications to sexual history data. Statistics in Medicine. 2018;37(2):218–235. doi: 10.1002/sim.7393
32. Goyal R, De Gruttola V. Dynamic Network Prediction. Network Science. 2020;8(04):574–595. doi: 10.1017/nws.2020.24
33. Goyal R, De Gruttola V. Investigation of patient-sharing networks using a Bayesian network model selection approach for congruence class models. Statistics in Medicine. 2021;40(13):3167–3180. doi: 10.1002/sim.8969
34. Goyal R, Carnegie N, Slipher S, Turk P, Little SJ, De Gruttola V. Estimating contact network properties by integrating multiple data sources associated with infectious diseases. Statistics in Medicine. 2023. doi: 10.1002/sim.9816
35. Rawlings SA, Layman L, Smith D, Scott B, Ignacio C, Porrachia M, et al. Performing rapid autopsy for the interrogation of HIV reservoirs. AIDS (London, England). 2020;34(7):1089. doi: 10.1097/QAD.0000000000002546
36. Lemey P, Rambaut A, Drummond AJ, Suchard MA. Bayesian phylogeography finds its roots. PLoS Computational Biology. 2009;5(9):e1000520. doi: 10.1371/journal.pcbi.1000520
37. Edwards CJ, Suchard MA, Lemey P, Welch JJ, Barnes I, Fulton TL, et al. Ancient hybridization and an Irish origin for the modern polar bear matriline. Current Biology. 2011;21(15):1251–1258. doi: 10.1016/j.cub.2011.05.058
38. Krivitsky PN. Exponential-family random graph models for valued networks. Electronic Journal of Statistics. 2012;6:1100. doi: 10.1214/12-EJS696
39. Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M. ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks; 2018. Available from: https://CRAN.R-project.org/package=ergm.
40. Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M. ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. Journal of Statistical Software. 2008;24(3):1–29. doi: 10.18637/jss.v024.i03
41. R Core Team. R: A Language and Environment for Statistical Computing; 2021. Available from: https://www.R-project.org/.

PLoS One. doi: 10.1371/journal.pone.0291367.r001

Decision Letter 0

Nafees Ahemad

7 Feb 2023

PONE-D-22-24111

Identification of system-level features in HIV migration within a host

PLOS ONE

Dear Dr. Goyal,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Dear Authors

Based on the comments from the reviewers, the manuscript needs a minor revision before acceptance. There are a few concerns by reviewers. Please address those.

==============================

Please submit your revised manuscript by Mar 24 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Nafees Ahemad

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In the Methods section of your revised manuscript, please include the full name of the institutional review board or ethics committee that approved the protocol, the approval or permit number that was issued, and the date that approval was granted.

Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating in your Funding Statement:

“This research is supported by grants from the National Institutes of Health (P01 AI131385-05, R01 AI-147441, DA051915, R01 DK131532, P01 AI169609)

Sponsors did not play any role in the manuscript.”

Please provide an amended statement that declares *all* the funding or sources of support (whether external or internal to your organization) received during this study, as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now. Please also include the statement “There was no additional external funding received for this study.” in your updated Funding Statement.

Please include your amended Funding Statement within your cover letter. We will change the online submission form on your behalf.

4. Thank you for stating the following in your Competing Interests section:

“None”

Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now

This information should be included in your cover letter; we will change the online submission form on your behalf.

5. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

6. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

7. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This research is supported by grants from the National Institutes of Health (P01 AI131385-05, R01 AI-147441, DA051915, R01 DK131532, P01 AI169609). Conflict of Interest: None.”

We note that you have provided additional information within the Acknowledgements Section that is not currently declared in your Funding Statement. Please note that funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This research is supported by grants from the National Institutes of Health (P01 AI131385-05, R01 AI-147441, DA051915, R01 DK131532, P01 AI169609

Sponsors did not play any role in the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Goyal and colleagues utilize unique and diverse tissue samples from the Last Gift cohort to assess HIV DNA sequencing data to study the “migration” of HIV sequences across and between tissues. This is an interesting methodological and data analysis approach to questions that are difficult to ask, especially as access to tissues is limited. The approaches have several technical limitations, which may currently be unavoidable, that should be acknowledged and discussed.

As has been recently discussed (White JA et al. PLoS Pathog. 2022. PMID: 36074794), near-full-length PCR techniques may introduce bias and over-represent some sequences. How could such bias influence the representation of sequences and their apparent "flow" or change in proportion over time? Further, most sequences are defective but may proliferate as their host cells proliferate. Both myeloid and lymphoid cells may carry HIV DNA, and some cells may migrate within tissue to various body compartments, without true migration of viral particles due to spreading infection. These biological aspects should be discussed in the methodological description of the analysis and its interpretation.

Reviewer #2: 1) The paper states that data and software will be available from the authors upon request, but the proper way to make the data available is to create GenBank entries for the HIV sequences and list the accession numbers in the publication. It is also very nice if multiple sequence alignments, or other useful data formats, are stored at TreeBase, DRYAD, or similar online repositories.

2) Intra-patient viral recombination is not mentioned in the paper. It is difficult to analyze intra-subtype recombination in HIV-1, and even more difficult to analyze intra-patient recombination. However, tools such as the Highlighter tool at the LANL HIV Database ( https://www.hiv.lanl.gov/content/sequence/HIGHLIGHT/highlighter_top.html ) and GARD ( http://www.datamonkey.org/GARD/ ) can be useful in identifying whether or not recombination is likely to be influencing phylogenetic analyses.

3) The Figure 1 legend does not mention it, but I assume the overall size or diameter of each patient graph is proportional to virus diversity in that patient. So for example LG12 fig1G had less diverse virus than LG01 fig1A.

4) There are many sentences which don't make sense to me, and I wonder if it is because words are missing? For example on page 15 "While phylodynamic modeling has greatly enhanced these efforts, our analysis provides additional insights through analysis of migration as a network, rather than as pairwise events tissue." Maybe was supposed to end with "pairwise events between two tissues."?

5) I think the paper could benefit from a better description of how this type of study can help with a cure. The paper says "Insights gained using network science techniques in analysis of VMNs have therapeutic implications, in that they may aid in the identification of common features in viral migration, and, by facilitating the targeting of specific processes, potentially increase efficacy of HIV cure." Three of the 8 patients were not on ART at the time of death, and might therefore have had higher viral loads than the others.

6) The paper has quite a bit of discussion of the computational analyses, but no information is provided about the data acquisition. How were the tissues sampled to reduce or eliminate the potential for sampling blood cells rather than tissue cells in each tissue? Are the sequences likely to be from viral RNA or from proviral DNA integrated into the host genome? Was the complete envelope gp160 region sequenced? What were the 33 tissues for each patient? Table 3 shows 7 tissues sampled and 26 missing for patient LG12 and Fig 1G shows 7 nodes with 5 of them being from gut. Most patients seem to have just one or two blood tissues sampled. In many places, the paper says "number of HIV sequences" but nowhere is it mentioned whether there were hundreds of sequences from each tissue sample, or dozens, or thousands.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Brian T. Foley

**********

PLoS One. 2023 Sep 26;18(9):e0291367. doi: 10.1371/journal.pone.0291367.r002

Author response to Decision Letter 0

23 Aug 2023

Please see attachment

Attachment

Submitted filename: ViralMigrationNetwork_responses_07_05_2023.pdf

PLoS One. doi: 10.1371/journal.pone.0291367.r003

Decision Letter 1

Nafees Ahemad

29 Aug 2023

Identification of system-level features in HIV migration within a host

PONE-D-22-24111R1

Dear Dr. Goyal,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Nafees Ahemad

Academic Editor

PLOS ONE

PLoS One. doi: 10.1371/journal.pone.0291367.r004

Acceptance letter

Nafees Ahemad

11 Sep 2023

PONE-D-22-24111R1

Identification of system-level features in HIV migration within a host

Dear Dr. Goyal:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Nafees Ahemad

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 File. Contains all the supplementary tables.

(PDF)

Click here for additional data file.^{(130.1KB, pdf)}

Attachment

Submitted filename: ViralMigrationNetwork_responses_07_05_2023.pdf

Click here for additional data file.^{(175.2KB, pdf)}

Data Availability Statement

The sequence data have been uploaded on Dryad at DOI: 10.5061/dryad.dncjsxm44. The code has been uploaded to Github at: https://github.com/ravigoyalgit/VMN.

[pone.0291367.ref001] 1. Chaillon A, Gianella S, Dellicour S, Rawlings SA, Schlub TE, De Oliveira MF, et al. HIV persists throughout deep tissues with repopulation from multiple anatomical sources. The Journal of clinical investigation. 2020;130(4):1699–1712. doi: 10.1172/JCI134815 [DOI] [PMC free article] [PubMed] [Google Scholar]

2. De Scheerder MA, Vrancken B, Dellicour S, Schlub T, Lee E, Shao W, et al. HIV rebound is predominantly fueled by genetically identical viral expansions from diverse reservoirs. Cell Host & Microbe. 2019;26(3):347–358. doi: 10.1016/j.chom.2019.08.003

3. Robins G, Pattison P, Kalish Y, Lusher D. An introduction to exponential random graph (p*) models for social networks. Social Networks. 2007;29(2):173–191. doi: 10.1016/j.socnet.2006.08.003

4. Lusher D, Koskinen J, Robins G. Exponential random graph models for social networks: Theory, methods, and applications. vol. 35. Cambridge University Press; 2013.

5. Wang C, Butts CT, Hipp JR, Jose R, Lakon CM. Multiple imputation for missing edge data: a predictive evaluation method with application to Add Health. Social Networks. 2016;45:89–98. doi: 10.1016/j.socnet.2015.12.003

6. Koskinen JH, Robins GL, Pattison PE. Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation. Statistical Methodology. 2010;7(3):366–384. doi: 10.1016/j.stamet.2009.09.007

7. Handcock MS, Gile KJ. Modeling social networks from sampled data. The Annals of Applied Statistics. 2010;4(1):5. doi: 10.1214/08-AOAS221

8. Caimo A, Friel N. Bayesian inference for exponential random graph models. Social Networks. 2011;33(1):41–55. doi: 10.1016/j.socnet.2010.09.004

9. Kossinets G. Effects of missing data in social networks. Social Networks. 2006;28(3):247–268. doi: 10.1016/j.socnet.2005.07.002

10. Smith JA, Moody J. Structural effects of network sampling coverage I: Nodes missing at random. Social Networks. 2013;35(4):652–668. doi: 10.1016/j.socnet.2013.09.003

11. Smith JA, Moody J, Morgan JH. Network sampling coverage II: The effect of non-random missing data on network measurement. Social Networks. 2017;48:78–99. doi: 10.1016/j.socnet.2016.04.005

12. Krause RW, Huisman M, Steglich C, Snijders T. Missing data in cross-sectional networks–An extensive comparison of missing data treatment methods. Social Networks. 2020;62:99–112. doi: 10.1016/j.socnet.2020.02.004

13. Smith JA, Morgan JH, Moody J. Network sampling coverage III: Imputation of missing network data under different network and missing data conditions. Social Networks. 2022;68:148–178. doi: 10.1016/j.socnet.2021.05.002

14. Gianella S, Taylor J, Brown TR, Kaytes A, Achim CL, Moore DJ, et al. Can Research at the End-of-life be a Useful Tool to Advance HIV Cure? AIDS (London, England). 2017;31(1):1. doi: 10.1097/QAD.0000000000001300

15. Frank O, Strauss D. Markov graphs. Journal of the American Statistical Association. 1986;81(395):832–842. doi: 10.1080/01621459.1986.10478342

16. Wasserman S, Pattison P. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika. 1996;61(3):401–425. doi: 10.1007/BF02294547

17. Yin F, Shen W, Butts CT. Finite Mixtures of ERGMs for Modeling Ensembles of Networks. Bayesian Analysis. 2022;1(1):1–39.

18. Lehmann B, White S. Bayesian exponential random graph models for populations of networks. arXiv preprint arXiv:2104.05110. 2021.

19. Lunagómez S, Olhede SC, Wolfe PJ. Modeling network populations via graph distances. Journal of the American Statistical Association. 2021;116(536):2023–2040. doi: 10.1080/01621459.2020.1763803

20. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10(1):101–129. doi: 10.2307/3001666

21. Harrer M, Cuijpers P, Furukawa TA, Ebert DD. Doing meta-analysis with R: A hands-on guide. Chapman and Hall/CRC; 2021.

22. Goodreau SM, Kitts JA, Morris M. Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography. 2009;46(1):103–125. doi: 10.1353/dem.0.0045

23. Handcock MS, Robins G, Snijders T, Moody J, Besag J. Assessing degeneracy in statistical models of social networks. Working paper; 2003.

24. Snijders TA, Pattison PE, Robins GL, Handcock MS. New specifications for exponential random graph models. Sociological Methodology. 2006;36(1):99–153. doi: 10.1111/j.1467-9531.2006.00176.x

25. Mukherjee S. Degeneracy in sparse ERGMs with functions of degrees as sufficient statistics. Bernoulli. 2020;26(2):1016–1043. doi: 10.3150/19-BEJ1135

26. Sarkar S, Romero-Severson E, Leitner T. Migration coupled with recombination explains disparate HIV-1 anatomical compartmentalization signals. bioRxiv. 2023; p. 2023–04.

27. Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SD. Automated phylogenetic detection of recombination using a genetic algorithm. Molecular Biology and Evolution. 2006;23(10):1891–1901. doi: 10.1093/molbev/msl051

28. White JA, Kufera JT, Bachmann N, Dai W, Simonetti FR, Armstrong C, et al. Measuring the latent reservoir for HIV-1: Quantification bias in near full-length genome sequencing methods. PLoS Pathogens. 2022;18(9):e1010845. doi: 10.1371/journal.ppat.1010845

29. Kolaczyk ED, Krivitsky PN. On the question of effective sample size in network modeling: An asymptotic inquiry. Statistical Science: a review journal of the Institute of Mathematical Statistics. 2015;30(2):184. doi: 10.1214/14-STS502

30. Goyal R, Blitzstein J, De Gruttola V. Sampling networks from their posterior predictive distribution. Network Science. 2014;2(01):107–131. doi: 10.1017/nws.2014.2

31. Goyal R, De Gruttola V. Inference on network statistics by restricting to the network space: applications to sexual history data. Statistics in Medicine. 2018;37(2):218–235. doi: 10.1002/sim.7393

32. Goyal R, De Gruttola V. Dynamic Network Prediction. Network Science. 2020;8(04):574–595. doi: 10.1017/nws.2020.24

33. Goyal R, De Gruttola V. Investigation of patient-sharing networks using a Bayesian network model selection approach for congruence class models. Statistics in Medicine. 2021;40(13):3167–3180. doi: 10.1002/sim.8969

34. Goyal R, Carnegie N, Slipher S, Turk P, Little SJ, De Gruttola V. Estimating contact network properties by integrating multiple data sources associated with infectious diseases. Statistics in Medicine. 2023. doi: 10.1002/sim.9816

35. Rawlings SA, Layman L, Smith D, Scott B, Ignacio C, Porrachia M, et al. Performing rapid autopsy for the interrogation of HIV reservoirs. AIDS (London, England). 2020;34(7):1089. doi: 10.1097/QAD.0000000000002546

36. Lemey P, Rambaut A, Drummond AJ, Suchard MA. Bayesian phylogeography finds its roots. PLoS Computational Biology. 2009;5(9):e1000520. doi: 10.1371/journal.pcbi.1000520

37. Edwards CJ, Suchard MA, Lemey P, Welch JJ, Barnes I, Fulton TL, et al. Ancient hybridization and an Irish origin for the modern polar bear matriline. Current Biology. 2011;21(15):1251–1258. doi: 10.1016/j.cub.2011.05.058

38. Krivitsky PN. Exponential-family random graph models for valued networks. Electronic Journal of Statistics. 2012;6:1100. doi: 10.1214/12-EJS696

39. Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M. ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks; 2018. Available from: https://CRAN.R-project.org/package=ergm.

40. Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M. ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. Journal of Statistical Software. 2008;24(3):1–29. doi: 10.18637/jss.v024.i03

41. R Core Team. R: A Language and Environment for Statistical Computing; 2021. Available from: https://www.R-project.org/.
