PLOS Computational Biology. 2020 Oct 30;16(10):e1007951. doi: 10.1371/journal.pcbi.1007951

Model-based and phylogenetically adjusted quantification of metabolic interaction between microbial species

Tony J Lam 1, Moses Stamboulian 1, Wontack Han 1, Yuzhen Ye 1,*
Editor: Christos A Ouzounis 2
PMCID: PMC7657538  PMID: 33125363

Abstract

Microbial community members exhibit various forms of interactions. Taking advantage of the increasing availability of microbiome data, many computational approaches have been developed to infer bacterial interactions from the co-occurrence of microbes across diverse microbial communities. Additionally, the introduction of genome-scale metabolic models has enabled the inference of cooperative and competitive metabolic interactions between bacterial species. By nature, phylogenetically similar microbial species are more likely to share common functional profiles or biological pathways due to their genomic similarity. Without properly factoring out the phylogenetic relationship, any estimation of the competition and cooperation between species based on functional/pathway profiles may bias downstream applications. To address these challenges, we developed a novel approach for estimating the competition and complementarity indices for a pair of microbial species, adjusted by their phylogenetic distance. An automated pipeline, PhyloMint, was implemented to construct competition and complementarity indices from genome-scale metabolic models derived from microbial genomes. Application of our pipeline to 2,815 human-gut associated bacteria showed high correlation between phylogenetic distance and metabolic competition/cooperation indices among bacteria. Using a discretization approach, we were able to detect pairs of bacterial species with cooperation scores significantly higher than the average pairs of bacterial species with similar phylogenetic distances. A network community analysis of high metabolic cooperation but low competition reveals distinct modules of bacterial interactions. Our results suggest that niche differentiation plays a dominant role in microbial interactions, while habitat filtering also plays a role among certain clades of bacterial species.

Author summary

Microbial communities, also known as microbiomes, are formed through the interactions of various microbial species. Utilizing genomic sequencing, it is possible to infer the compositional make-up of communities as well as predict their metabolic interactions. However, because some species are closely related to each other while others are more distantly related, one cannot directly compare metabolic relationships without first accounting for their phylogenetic relatedness. Here we developed a computational pipeline which predicts complementary and competitive metabolic relationships between bacterial species, while normalizing for their phylogenetic relatedness. Our results show that phylogenetic distances are correlated with metabolic interactions, and factoring out such relationships can help better understand microbial interactions which drive community formation.


This is a PLOS Computational Biology Methods paper.

Introduction

Recent advances in microbiome research have accelerated the study of the composition and function of microbial communities associated with different environments and hosts. Studies have shown the association of microbial communities with human health and diseases including type 2 diabetes [1], and with the efficacy of treatments including cancer immunotherapy [2]. To reveal the mechanisms behind the microbiome-host interactions, it is important to understand how microbial species form communities and how the microbial communities interact with the host to mediate various biological processes [3].

Studying the principles underlying the structure and composition of microbial communities is of long-standing interest to microbial ecologists. The dynamics which govern microbial community assembly have been extensively debated, and the extent to which neutral or deterministic dynamics shape microbial interactions remains disputed [4, 5]. Some studies support the neutral hypothesis, which assumes that community structure is determined by random processes [6]. Other theories suggest that community assembly dynamics are governed by deterministic processes such as habitat filtering and niche differentiation [7, 8]. While many studies focus on species abundances to study community assembly, Burke et al. [9] showed that the key level at which to address community assembly may not be the species level, but rather the functional level of genes. While the aforementioned theories of community assembly may not be all-encompassing, they highlight varied dynamics which can contribute to community structure and affect the assembly of complex microbial communities.

Some studies have also shown that microbial communities tend to be more phylogenetically clustered than expected by chance, harboring groups of closely related taxa that exhibit microscale differences in genomic diversity [10–12]. In one such study, marine bacterial communities were observed at various locations, and it was reported that local communities were phylogenetically different from each other and tended to be phylogenetically clustered [12]. However, some microbial communities have shown the opposite pattern, in which taxa are less clustered and less related than expected by chance [13, 14]. Together, these studies have explored the relationship between functional distance/metabolic overlap and phylogenetic relatedness, and they have given rise to the competing theories of ‘habitat filtering’ and ‘niche differentiation’: habitat filtering suggests that dominant species exhibit similar functional traits, whereas niche differentiation says that phylogenetically similar species are unable to co-exist due to similar traits and resource overlap [3]. Nevertheless, several methods have been developed for the inference of bacterial interaction networks based on the assumption that phylogenetically related species tend to co-exist. For example, Lo et al. [15] developed a phylogenetic graphical lasso approach for bacterial community detection, based on the assumption that phylogenetically correlated microbial species are more likely to interact with each other. Additionally, systematists have long argued that the comparison between species is not an independent process [16]; this is largely driven by the fact that related organisms share many genes and traits. The confounding effect of shared phylogeny has since inspired the development of methods and techniques, such as phylogenetically independent contrasts and phylogenetic generalized least squares, to account for the dependent effect of phylogeny when comparing across species [17–19].

The study of microbial interactions and the dynamics which govern such interactions are important in providing insights into community assembly and ultimately the processes which influence host health and disease. Insights into community complementarity and competition may also uncover symbiotic and antagonistic relationships and can be used to provide prospective candidates for probiotics. Leveraging the increasing availability of microbiome datasets, novel statistical and computational methods have been developed to infer bacterial interaction networks from co-occurrence information. One example is SparCC [20], a tool that infers correlations while correcting for the compositional nature of the data. Connor et al. demonstrated the importance of using null models to infer microbial co-occurrence networks [21]. Mandakovic and colleagues compared microbial co-occurrence networks representing bacterial soil communities from different environments to determine the impact of a shift in environmental variables on the community’s taxonomic composition and their relationships [22]. MDiNE is another recently developed model for estimating differential co-occurrence networks in microbiome studies [23]. Notably, Faust et al. [24] applied generalized boosted linear models to infer thousands of significant co-occurrence and co-exclusion relationships between 197 clades occurring throughout the human microbiome; their study revealed an inverse correlation between functional similarity and phylogenetic distance among bacterial species, which is unsurprising. Despite these numerous advances, it remains difficult to infer microbial community structure based on co-occurrence network approaches [25].

Functional profiles or biological pathways inferred from genomic sequences of the microbial species can provide mechanistic information about the functional traits of the microbes and potential cross-feeding. Genome-scale metabolic models (GEMs) can potentially provide mechanistic explanations for the associations of bacterial species that are discovered by analyzing their co-occurrence in diverse microbial communities [26]. Many automated tools [27–30] have been made available for genome-scale metabolic reconstructions (GENREs); however, to obtain quality models, these automated methods often require additional manual refinement, including checks for stoichiometric consistency, defined media, and gap filling [31]. The challenges of manual curation often make it difficult to construct GEMs for a large consortium of microbes. Notably, Machado et al. [32] developed an automated tool called CarveMe, which uses a top-down approach to build species- and community-level metabolic models and which the authors claim produces results comparable to other tools while reducing manual intervention [32, 33]. The ability to predict the metabolic networks of microbial members through GENREs has led some studies to focus on inferring levels and types of interaction among microbial species via metabolic models. Levy and Borenstein [34] introduced pairwise indices of metabolic interaction: the metabolic competition index and complementarity index, which are computed based on the overlap and complementarity, respectively, of the compounds that are contained in the metabolic models. By analyzing the metabolic interactions among 154 human-associated bacterial species and comparing the computed indices with observed species co-occurrence in microbiomes, the authors concluded that species tend to co-occur across individuals more frequently with species with which they strongly compete, suggesting that microbial assembly is dominated by habitat filtering [34]. Similar metrics have been introduced to quantify the metabolic complementarity and competition between bacterial species, such as MIP (metabolic interaction potential) and MRO (metabolic resource overlap) [26].

By nature, two phylogenetically close microbial species share similar functional profiles or biological pathways due to their genomic similarity. Additionally, co-evolutionary studies have shown that comparative analyses between species cannot be assumed to be statistically independent, as comparative data of closely related species correlate with each other due to shared ancestry [16, 35–37]. Thus, without factoring out the phylogenetic relationship (the confounding factor), any estimation of the competition and complementarity based on function/pathway profiles may be biased and cause problems in downstream applications. In this study, we focused on the large collection of human gut-associated genomes (including reference genomes and metagenome-assembled genomes, MAGs). We implemented an automated pipeline, PhyloMInt, for genome-scale pathway reconstruction and for computing competition and complementarity scores based on the reconstructed pathways. Our results showed correlation between phylogenetic distance and metabolic competition/complementarity indices, indicating the importance of normalizing these indices by the phylogenetic distance between the underlying microbial species. Using a discretization approach, we were able to detect pairs of bacterial species with complementarity scores significantly higher than the average pairs of bacterial species with similar phylogenetic distances. We further built a network of human-gut microbes based on complementarity and competition indices, and we discuss some of the results we derived by analyzing the network.

Results

Evaluation of the performance of GENREs on incomplete genomes

To assess the stability of CarveMe genome-scale metabolic reconstructions (GENREs) on incomplete MAGs, we simulated incomplete genomes by randomly removing clusters of neighboring genes from complete genomes and evaluated their resulting GENREs (see Methods). By comparing GENREs constructed from incomplete genomes to those of complete genomes, we observed that the distribution of the number of source and sink nodes remains relatively stable with respect to the number of removed genes (Fig 1, top two rows of panels). We further compared the similarity of the networks as measured by the overlap of the nodes and edges (Jaccard similarity) (Fig 1, bottom two rows of panels). We found that in most cases, the reconstructed metabolic networks of simulated genomes remain largely similar to those of the complete genomes, with actual differences smaller than the expected differences, which are proportional to the loss of CDSs (the red dotted lines in Fig 1). For example, the metabolic networks of simulated incomplete genomes of E. coli str. K-12 MG1655 with only 80% of the total CDSs shared nodes and edges with the complete genome at a Jaccard similarity greater than 0.9. Considering both the stability of metabolic networks generated from incomplete genomes (as measured by the number of source and sink nodes, and the similarity of metabolic networks) and the fact that we only utilized near-complete MAGs in our analysis, we believe that the use of near-complete MAGs (>80% completeness) should have minimal impact on the calculation of the metabolic complementarity/competition indices. Results of simulations can be found in S1 Table.

Fig 1. Boxplots of the evaluation of CarveMe GENREs from simulated incomplete genomes.

Simulated incomplete genomes with 70%-100% of the total CDSs of the complete reference genome were used. Inferred metabolic networks of incomplete genomes were evaluated by the number of source nodes, the number of sink nodes, the overlap between the nodes of the networks (Jaccard similarity), and the overlap between the edges of the networks (Jaccard similarity). The dashed line in the source and sink node boxplots represents the baseline number of source and sink nodes in a complete genome. The dashed line in the metabolic network edge and node overlap boxplots represents a regression line with y-intercept of 1.0 and slope of -1; this line represents the expected value of the Jaccard Index, which is proportional to the total remaining CDSs. (A) GENRE of iML1515, Escherichia coli str. K-12 substr. MG1655. (B) GENRE of iEK1008, Mycobacterium tuberculosis H37Rv.

Impact of phylogenetic relationship on microbial complementarity and competition indices

We applied our pipeline to analyze 2,815 human gut-related MAGs and computed their pairwise competition and complementarity scores (about 8M directed pairs). As shown in Fig 2A, there is a positive relationship between the metabolic complementarity of bacterial species and their phylogenetic distances. In contrast, Fig 2B shows a negative relationship between the metabolic competition of bacterial species and phylogenetic distance. Our results are consistent with previous studies of functional and metabolic relationships with phylogenetic distances [24, 26, 38]. They also support the theory of niche differentiation, which states that phylogenetically close species are more likely to compete with each other due to their shared traits and resource overlap, reducing the probability of their co-existence.

Fig 2. Hexagonal binned plots of metabolic interaction indices versus phylogenetic distance.

Pairwise comparison between 2,815 human gut-related MAGs: (A) Metabolic Complementarity Index and (B) Metabolic Competition Index, versus their phylogenetic distance with density contours. The plots were fitted with a generalized additive model (red line).

Due to the non-zero correlation between metabolic interactions and phylogenetic distances, comparing complementarity and competition between species pairs without accounting for their phylogenetic relationships confounds such comparisons. As an example, in Fig 2A, a pair of closely related genomes (with phylogenetic distance close to 0) with a complementarity index of 0.18 would be a significant outlier compared to other pairs of genomes of similar phylogenetic distance. However, if this complementarity index were compared to those of genome pairs with greater phylogenetic distance (e.g., a phylogenetic distance of 2), it would no longer be considered a statistical outlier. To compare complementarity/competition indices across species, the confounding effects of phylogeny must first be decoupled.

Here we demonstrate a discretization approach for the identification of statistically significant complementary species pairs as a method for accounting/correcting for phylogenetic distances. To discretize comparisons across continuous phylogenetic distances, pairwise indices were binned by their phylogenetic distances. Outliers are then identified within each bin, which are likely pairs of bacteria with statistically significant complementary or competitive interactions.
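
The binning step can be sketched as follows. This is a minimal illustration rather than the PhyloMInt implementation: the bin width of 0.1, the use of the population standard deviation, and the function name `binned_z_scores` are all assumptions, since the text does not specify them. The toy data mirror the Fig 2A example, where an index of 0.18 is extreme among close relatives but ordinary among distant pairs.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def binned_z_scores(pairs, bin_width=0.1):
    """Z-score each index value against pairs in the same distance bin.

    pairs: iterable of (pair_id, phylogenetic_distance, index_value).
    bin_width is a hypothetical choice; the text does not state one.
    """
    bins = defaultdict(list)
    for pair_id, distance, value in pairs:
        bins[math.floor(distance / bin_width)].append((pair_id, value))

    z_scores = {}
    for members in bins.values():
        values = [value for _, value in members]
        mu = mean(values)
        sigma = pstdev(values)  # population SD; an assumption
        for pair_id, value in members:
            # A bin with no spread carries no outlier information.
            z_scores[pair_id] = (value - mu) / sigma if sigma > 0 else 0.0
    return z_scores


# Toy data: the same index value (0.18) in a close bin and a distant bin.
close = [(f"close_{i}", 0.05, 0.02) for i in range(10)]
close.append(("close_hit", 0.05, 0.18))
far = [(f"far_{i}", 1.95, v) for i, v in enumerate([0.15, 0.20, 0.16, 0.21])]
far.append(("far_same", 1.95, 0.18))
z = binned_z_scores(close + far)
```

In this toy example the closely related pair stands out within its own bin, while the distant pair with the identical index value sits at the bin mean.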

Identification of potentially collaborative or competing pairs of gut bacteria from metabolic outliers

To explore the relationship between complementary and competitive pairs, we compared their respective Z-scores (Fig 3). Significant outliers were selected using a Z-score threshold of ±2.698 as proposed by Tukey [39]. A total of 60,116 directed pairs were identified as positive complementary outliers. Additionally, 7,769 positive and 44,409 negative competitive directed pairs of outliers were identified. Unsurprisingly, most pairs were centered around a Z-score of zero, and no pairs were simultaneously significant for both complementarity and competition.
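
The value 2.698 can be recovered from Tukey's boxplot fences under a normality assumption. The short check below is our reading of where the threshold comes from, not a step described in the text: for a standard normal distribution, the upper fence Q3 + 1.5 × IQR falls at about 2.698 standard deviations.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

q1 = std_normal.inv_cdf(0.25)  # first quartile
q3 = std_normal.inv_cdf(0.75)  # third quartile
iqr = q3 - q1

# Tukey's fences: points beyond 1.5 * IQR from the quartiles are outliers.
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr

print(round(upper_fence, 3), round(lower_fence, 3))  # 2.698 -2.698
```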

Fig 3. Hexagonal binned plot of metabolic complementarity and competition Z-scores with density contours.

We analyzed bacteria pairs belonging to the same genus or family that have significantly high complementarity scores to better understand how taxonomic similarity correlates with metabolic cooperation. At the genus level, 140,152 directed pairs were identified; and at the family level, 233,555 directed pairs were identified. Of the pairs belonging to the same genus or family, 1,230 and 5,190 were identified as significant complementary outliers, respectively. These taxonomically similar bacteria pairs have the potential to cooperate in gut microbiomes. The rarity of significant outliers with the same taxonomic classifications suggests that for most taxonomically similar pairs at the genus and family level, niche differentiation plays an integral role in community assembly. Detailed lists of complementarity and competition outliers are provided in S2 and S3 Tables, respectively.

To explore community assembly dynamics, we constructed a directed graph of bacterial species in which bacteria are the nodes and a directed edge is added between two bacteria if they have high metabolic complementarity (Z-score > 2.698) and low metabolic competition (Z-score < −1.000); here we relaxed the Z-score threshold of the competition index to −1.000 in order to focus our analysis on species pairs with greater complementarity while still constraining the analysis to pairs with low competition (see S4 Table for a detailed list). Using Infomap [40] to analyze the network, we were able to identify two main community modules (Fig 4). The larger community module (shown on the right in Fig 4) was populated with many multi-layer sub-modules, which featured the majority of the significantly cooperating bacteria. Interestingly, the smaller community module (shown on the left in Fig 4) exclusively contained Bifidobacterium spp. (e.g. B. longum, B. bifidum, B. infantis), suggesting that various Bifidobacterium species are metabolically complementary to each other, more so than other phylogenetically similar taxa.
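
The edge-selection rule can be sketched as a simple filter over per-pair Z-scores. This is an illustrative sketch only: the input layout, the function name `cooperative_edges`, and the plain-dictionary adjacency structure are assumptions, and the subsequent Infomap clustering step is not shown.

```python
def cooperative_edges(z_scores, compl_min=2.698, compet_max=-1.0):
    """Build a directed adjacency map from per-pair Z-scores.

    z_scores: {(a, b): (complementarity_z, competition_z)} for the
    directed pair a -> b (hypothetical input layout).
    """
    graph = {}
    for (a, b), (z_compl, z_compet) in z_scores.items():
        # Keep the edge only if complementarity is a positive outlier
        # and competition is below the relaxed threshold.
        if z_compl > compl_min and z_compet < compet_max:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set())  # make sure the target is a node
    return graph


# Made-up Z-scores for three directed pairs.
example = {
    ("A", "B"): (3.1, -1.4),  # kept
    ("B", "A"): (3.0, 0.2),   # competition too high
    ("A", "C"): (1.0, -2.0),  # complementarity not significant
}
graph = cooperative_edges(example)
```

Because the indices are directional, an edge from A to B does not imply an edge from B to A, which is why a directed graph is used.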

Fig 4. Community modules of significant complementarity outliers that exhibit low metabolic competition identified from human-gut related MAGs.

Circular nodes represent predicted community modules and sub-modules of cooperative bacterial communities.

Bifidobacterium species are major colonizers of the infant gut microbiota, and play a prominent role in the degradation and metabolism of human milk oligosaccharides (HMOs) [41]. One example of the complementary interactions between Bifidobacterium species was captured by our pipeline and is illustrated by the predicted presence/absence of sialic acid metabolism pathways (Fig 5). The GEM for B. infantis subsp. infantis (Fig 5; left) captured the pathways involved in sialic acid metabolism, whereas the GEM for B. longum (Fig 5; right) did not capture any metabolic pathways that utilize sialic acid. While both species are present in high concentrations in the infant microbiota, various studies have shown that B. longum lacks the associated gene clusters for sialic acid catabolism [42, 43]. It should be noted that while the GEM was able to capture sialic acid metabolic pathways in B. infantis, CarveMe failed to predict exo-α-sialidase mediated degradation of sialylated carbohydrates. A protein BLAST of exo-α-sialidase (NanH1 and NanH2) from B. longum subsp. infantis ATCC 15697 [44] against protein coding genes of B. infantis (MAG: GCF_000020425.1) confirmed the presence of both protein coding genes with 100% identity (S5 Table); both exo-α-sialidase genes were absent in B. longum (MAG: 18391_1_6) (S6 Table). Nevertheless, this example shows a possible metabolic complementarity between related species reflected within the GENREs. In addition to Bifidobacterium spp., other bacterial genera were also shown to form sub-community modules highly uniform for their own genera (i.e. Helicobacter, Collinsella, Lachnospiraceae, and Ruminococcus). We note that if complementarity scores were analyzed without correcting for phylogenetic distances, these complementarity scores of taxonomically similar bacteria would not be considered significant, which emphasizes the importance of correcting for phylogenetic distances. The pattern of taxonomically related genomes forming community modules is suggestive of habitat filtering characteristics within certain distinctive bacterial taxa. Infomap community module membership is available in S1 File.

Fig 5. Sialic acid metabolism pathway reconstructed from Bifidobacterium species.

(Left) Metabolic pathway of sialic acid metabolism present in the GENRE constructed from B. infantis MAG (GCF_000020425.1). (Right) Metabolic pathway of sialic acid metabolism present in the GENRE constructed from B. longum MAG (18391_1_6); metabolites and metabolic reactions missing from the GENRE model are grayed out with dotted lines.

To further explore this, we analyzed the proportion of significantly cooperative bacteria with the same genus annotations. Our results show that more than half (42/76) of the taxa with 50 or more members within the same genus contained a significant number of metabolically complementary pairs; the within-genus proportion of taxa with significant pairs ranged from 0.02% to 15.9% (S7 Table). Together, these results show that while niche differentiation dominates a majority of metabolic interactions, we observe habitat filtering characteristics within certain bacterial taxa.

Discussion

Here we demonstrate a novel approach to identifying significant metabolic cooperators and competitors between bacterial species pairs. This approach builds upon previously developed metrics of metabolic complementation and competition [34, 45, 46] by identifying outlier pairs relative to their phylogenetic distances. As pairwise metabolic interactions are correlated with phylogenetic distance, it remains imperative to account for phylogenetic distance when comparing species pairs that differ in relatedness, as the comparison is otherwise confounded.

Our analysis shows that metabolic cooperation exhibits a positive relationship with phylogenetic distance, whereas metabolic competition exhibits a negative relationship. These findings support the results from previous work that studied the relationship between phylogenetic relatedness and gene content, functional distance, and metabolic interactions [24, 26, 38]. Together these observed relationships seem to support the theory of niche differentiation, where functional overlap discourages phylogenetically related species from co-existing. However, by taking into consideration the phylogenetic distance between pairs to identify metabolic outliers, we were able to identify significant intra-genus cooperation in several distinct taxa. The intra-genus modules may suggest that while most bacterial interactions display niche differentiation characteristics, some taxa exhibit habitat filtering. Notably, Bifidobacterium species were shown to form distinct community modules, which suggests significant intra-genus cooperation compared to other taxa. These results support recent findings that suggest strains of Bifidobacterium spp. in infants have different nutrient profiles to support colonization of other specific Bifidobacterium species [47]. The observation of both habitat filtering and niche differentiation characteristics suggests that in some cases both contribute to the dynamics of community assembly.

We note a few limitations of our approach. First, metabolic complementarity and competition indices are dependent on a given metabolic model. The completeness of GENREs is dependent on a variety of variables (e.g. the reconstruction tool and the genome completeness) that can have a significant impact on predicted metabolic interactions. Second, seed sets used to calculate the metabolic interaction indices do not represent required metabolites for growth, but rather represent a baseline of metabolites that in theory enable a given bacterium to produce any metabolite in its predicted metabolic network. As such, seed sets may lead to the overestimation or underestimation of metabolic interactions between bacterial species. However, by integrating phylogenetic distances to normalize metabolic interaction indices, we believe that our approach provides a more accurate prediction of metabolic interactions in comparison to other similar methods. Additionally, low-abundance microbial species within microbiomes are not always well represented within metagenomic samples but may play key roles within a metabolic network. While we acknowledge that validation of this method remains difficult due to the lack of a gold standard comparison, the non-independent nature of comparative metrics between organisms due to shared ancestry provides a logical explanation for the necessity of accounting for such confounding effects.

The ability to capture the presence/absence of sialic acid metabolism pathways within Bifidobacterium spp. MAGs provides an example of CarveMe’s ability to reconstruct meaningful biological pathways. However, because CarveMe relies on reference models, it fails to capture species- and/or strain-specific metabolic pathways that are absent from the reference databases it uses. This is ultimately a current limitation of automated GENREs, and a limitation of reference-based techniques. Gap filling and manual curation of metabolic models can also be used to supplement the reconstruction of highly accurate models. With our method, metabolic model reconstructions can be easily interchanged, and as new metabolic reconstruction tools are developed, the phylogenetic adjustment of Complementarity and Competition indices can be easily applied when comparing metabolic networks.

By decoupling phylogenetic distances from Complementarity and Competition indices, we provide a method to explore statistically significant cooperating/competing species pairs within microbiomes to better understand community assembly dynamics. Additionally, competition networks can be used to identify highly competitive species pairs, which may be useful for suggesting beneficial probiotic candidates. A future research direction is to integrate phylogenetically-corrected complementarity and competition scores with co-occurrence information to better address the challenges of identifying bacterial interactions through mechanistic insight.

Materials and methods

Genome sequences of human-gut bacteria

To assemble the human-gut associated reference genomes, we collected genomes from two recent studies [48, 49]. Bacterial genomes reported in [49] were compiled from two sources: a total of 617 genomes obtained from the human microbiome project (HMP) [50], and 737 whole genome-sequenced bacterial isolates, representing the Human Gastrointestinal Bacteria Culture Collection (HBC). These 737 bacterial genomes were assembled by culturing and purifying bacterial isolates of 20 fecal samples originating from different individuals [49]. The bacterial genomes reported in [48] were generated and classified from a total of 92,143 metagenome assembled genomes (MAGs), among which a total of 1,952 binned genomes were characterized as non-overlapping with bacterial genomes reported. We were able to retrieve 612 out of 617 RefSeq sequences using the reported RefSeq IDs. We only included genomes with > 80% completeness and < 5% contamination (via CheckM [51]). Our final dataset for this study contains a total of 2,815 genomes/MAGs. Taxonomic annotation of these genomes/MAGs was done using GTDB-toolkit’s least common ancestors approach [52].

Genome scale metabolic network reconstructions and analysis

Genome-scale metabolic network reconstructions (GENREs) for all genomes were constructed using CarveMe [32] with default parameters. Coding sequences (CDSs) of all input genomes were generated using FragGeneScan [53] to be used as input for CarveMe. Briefly, CarveMe is a genome-scale metabolic model reconstruction tool which utilizes a universal model for a top-down approach to build GENREs. In contrast to conventional bottom-up methods, which require well-defined growth media, manual curation and gap-filling, the top-down approach of CarveMe starts from a manually curated universal template and removes the reactions and metabolites inferred not to be present in the organism.

Evaluation of incomplete metagenome assembled genomes

As we utilized MAGs with greater than or equal to 80% CheckM [51] genome completeness, we evaluated the ability of CarveMe to construct GENREs on simulated incomplete genomes retaining between 70% and 100% of the total CDSs, at 5% intervals. To accomplish this, we utilized complete reference genomes obtained from the NCBI RefSeq database. We then predicted protein coding sequences using FragGeneScan [53]. Using a custom script, we randomly selected (with repeats) sets of 3 neighboring CDSs for removal until the specified percentage of CDSs remained. Neighboring CDSs were removed as a set, as genes are often missing together from assembled genomes due to uneven binning and sequencing, rather than through a completely random process. We then used the remaining CDSs as input for CarveMe. This was repeated 50 times at each 5% interval. The resulting GENREs were then used to construct metabolic networks with directed edges. The metabolic networks of simulated incomplete genomes were then compared with the corresponding metabolic networks constructed from complete genomes. To compare the differences between the constructed metabolic models, we assessed the Jaccard Index between the sets of edges and nodes in each metabolic model. Additionally, because the numbers of source and sink nodes are utilized for the computation of the Complementarity Index and Competition Index, we also assessed the effect of incomplete genomes on metabolic reconstruction using the number of source and sink nodes.
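
The simulation and comparison steps can be sketched as below. This is not the custom script used in the study: the function names, the `seed` parameter, and the choice to trim the final block so that the target count is hit exactly are assumptions made for illustration.

```python
import random


def simulate_incomplete(cds_ids, keep_fraction, block=3, seed=None):
    """Drop runs of `block` neighboring CDSs until keep_fraction remain.

    cds_ids must be in genome order and contain at least `block` entries.
    Start positions are drawn with replacement, so blocks may overlap.
    """
    rng = random.Random(seed)
    n = len(cds_ids)
    target = round(n * keep_fraction)
    removed = set()
    while n - len(removed) > target:
        start = rng.randrange(n - block + 1)
        for i in range(start, start + block):
            # Trim the last block so exactly `target` CDSs remain
            # (a simplification chosen for this sketch).
            if n - len(removed) > target:
                removed.add(i)
    return [cds for i, cds in enumerate(cds_ids) if i not in removed]


def jaccard(a, b):
    """Jaccard index of two collections (nodes or edges of a network)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


# Hypothetical genome with 1,000 CDSs, reduced to 80%.
genome = [f"cds_{i}" for i in range(1000)]
kept = simulate_incomplete(genome, 0.80, seed=0)
```

In the study each reduced CDS set would then be passed to CarveMe, and `jaccard` would be applied to the node sets and edge sets of the resulting networks.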

Phylogenetic distance

To compute pairwise evolutionary distances between gut bacteria, we first inferred a phylogeny covering all participating genomes using FastTree [54]. It has been shown that using multiple phylogenetic marker genes (e.g., a set of 16 ribosomal protein sequences from each organism) gives trees with higher resolution than the 16S rRNA gene alone [55]. A total of 120 bacterial marker genes were used to infer this phylogeny. These 120 marker genes are ubiquitous among bacterial species, have been shown to occur as single copies, and are less susceptible to horizontal gene transfer [56]. Amino acid sequences of protein coding genes were searched using HMMER3 [57] against a database of 120 marker-gene HMMs retrieved from the Pfam [58] and TIGRFAMs [59] databases. Similar to the approach in [56], the sequences matched by each HMM were individually aligned using hmmalign [57], and the alignments were then concatenated to form the final alignment. Poorly aligned regions were removed from the concatenated alignment, and a final phylogeny was inferred using FastTree under the WAG + GAMMA models. From the inferred phylogenetic tree, the phylogenetic (evolutionary) distance between two leaf nodes (i.e., species) is calculated as the sum of the branch lengths along the path between them.
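As an illustration of the last step, the sketch below sums branch lengths along the path between two leaves of a rooted tree. It is only a sketch under stated assumptions: the tree is given as a hypothetical `parent` dictionary mapping each node to its parent and branch length (in practice it would be parsed from the FastTree output), and the species names and branch lengths are invented for the example.

```python
def path_to_root(node, parent):
    """Return {ancestor: cumulative branch length from node}, including node itself."""
    dist = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node][0], parent[node][1]
        total += length
        dist[node] = total
    return dist


def branch_length_distance(a, b, parent):
    """Sum of branch lengths on the path between nodes a and b."""
    up_a = path_to_root(a, parent)
    node, total = b, 0.0
    # Climb from b until reaching a node that is also an ancestor of a.
    while node not in up_a:
        node, length = parent[node][0], parent[node][1]
        total += length
    return up_a[node] + total


# Hypothetical toy tree: child -> (parent, branch length).
toy_parent = {
    "sp1": ("X", 0.2),
    "sp2": ("X", 0.3),
    "X": ("root", 0.1),
    "sp3": ("root", 0.5),
}
print(branch_length_distance("sp1", "sp2", toy_parent))
print(branch_length_distance("sp1", "sp3", toy_parent))
```

In this toy tree, sp1 and sp2 are separated by their two terminal branches (0.2 + 0.3), whereas the path from sp1 to sp3 also crosses the internal branch leading to the root.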

Species interaction indexes

To estimate potential metabolic cooperation and competition between bacterial species, we need to know their nutritional profiles, which however are unavailable for most of the gut bacteria. Similar to the approach reported in [34, 60], we use the compound seed set of each species as a proxy for its nutritional profile: the seed set of a metabolic network is defined as the minimal subset of the compounds that cannot be synthesized from other compounds in the network (due to lack of the corresponding enzymes, and hence are exogenously acquired) but their existence permits the production of all other compounds in the network.

We implemented a pipeline for computing metabolic interaction indices from genome sequences. Our pipeline uses a) CarveMe for building genome-scale metabolic models from genome sequences, b) NetworkX [61] to identify seed compounds, and c) our own implementation (in Python) of the approaches for computing metabolic competition and complementarity indices given two genome-scale metabolic models. We call our pipeline PhyloMInt (Phylogenetically-adjusted Metabolic Interaction indices).

Seed set identification

Strongly connected components (SCCs) within the GENREs are identified using NetworkX v2.2 [61]. A confidence level is assigned to each compound according to the size of its SCC, where the confidence level (C) is defined as:

$$C = \frac{1}{\mathrm{ComponentSize}} \tag{1}$$

The confidence level represents the confidence that a given compound belongs to the seed set. Following [60], compounds with C ≥ 0.2 were regarded as part of the ‘seed set’ of a given organism.
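The seed set identification can be sketched as follows. The pipeline itself relies on NetworkX for SCC detection; to keep this example self-contained it uses a plain-Python Kosaraju implementation instead. Two points are assumptions rather than statements from the text: candidate seed components are taken to be SCCs with no incoming edge from outside the component (following the seed set definition given earlier), and the toy edges are hypothetical, chosen so that the resulting seed set matches SeedSetA of Fig 6.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Kosaraju's algorithm on a directed edge list; returns a list of node sets."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))
    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            nxt = next((n for n in successors if n not in seen), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(graph[nxt])))
    # Pass 2: traverse the reversed graph in decreasing completion order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        comp, todo = set(), [start]
        assigned.add(start)
        while todo:
            node = todo.pop()
            comp.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(comp)
    return components


def seed_set(edges, threshold=0.2):
    """Map each seed compound to its confidence C = 1 / ComponentSize (Eq 1)."""
    preds = defaultdict(set)
    for u, v in edges:
        preds[v].add(u)
    seeds = {}
    for comp in strongly_connected_components(edges):
        # Assumption: a seed component has no incoming edge from outside itself.
        has_external_input = any(p not in comp for n in comp for p in preds[n])
        confidence = 1.0 / len(comp)
        if not has_external_input and confidence >= threshold:
            seeds.update({n: confidence for n in comp})
    return seeds


# Hypothetical substrate -> product edges; F, G and H form a cycle.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"),
             ("F", "G"), ("G", "H"), ("H", "F"), ("H", "C")]
print(sorted(seed_set(toy_edges).items()))
```

Here compound A is a seed with confidence 1, and F, G and H share a confidence of 1/3 because they belong to one source component of size 3. With the C ≥ 0.2 threshold, source components with more than five compounds are excluded.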

Metabolic competition and complementarity indices

Given two genome-scale metabolic models (GEMs) A and B, their Metabolic Competition Index (MICompetition) is calculated as the fraction of A’s seed set that is also in B’s seed set, normalized by the weighted sum of the confidence score [34, 46]. MICompetition estimates the baseline metabolic overlap between two given metabolic networks.

$$MI_{Competition} = \frac{\sum C(SeedSet_A \cap SeedSet_B)}{\sum C(SeedSet_A)} \tag{2}$$

Metabolic Complementarity Index (MIComplementarity) is calculated as the fraction of A’s seed set that is found within B’s metabolic network but is not part of B’s seed set, normalized by the number of A’s seed compounds found anywhere in B’s metabolic network [34, 45]. MIComplementarity represents the potential for A to utilize the metabolic output of B.

$$MI_{Complementarity} = \frac{|SeedSet_A \cap \neg SeedSet_B|}{|SeedSet_A \cap (SeedSet_B \cup \neg SeedSet_B)|} \tag{3}$$

A toy example for the calculation of Metabolic Competition and Complementarity indices has been provided in Fig 6. We note that the competition and complementarity indices are asymmetric.

Fig 6. Schematic illustration of seed set identification, and complementarity and competition index calculation between two toy metabolic networks.

In metabolic pathway A, SeedSetA consists of metabolites A, F, G, and H; metabolites F, G, and H form a strongly connected component (SCC). The confidence levels of the seed set metabolites within metabolic network A are 1, 1/3, 1/3, and 1/3 for metabolites A, F, G, and H, respectively. In metabolic pathway B, SeedSetB consists of F, I, J, and K; metabolites I and J form an SCC. The confidence levels of the seed set metabolites within metabolic network B are 1, 1/3, 1/3, and 1 for metabolites F, I, J, and K, respectively. In a comparison of metabolic network A versus metabolic network B, metabolic network A shares only one seed metabolite with metabolic network B (metabolite F), which lies in the SCC in metabolic network A. Thus, the MICompetition between metabolic network A and B is (1/3 ÷ 2) = 1/6. Among SeedSetA, metabolites A and F are found within metabolic network B, but only metabolite A is within non-SeedSetB; thus the MIComplementarity index between metabolic network A and metabolic network B is 0.5. We used the same toy networks as those in [45].
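The toy calculation in the caption can be reproduced with a short Python sketch of the two indices. The function names are hypothetical and this is not the PhyloMInt implementation; seed sets are represented as dictionaries from compound to confidence level, `compounds_b` is assumed to hold every compound of network B (seeds included), and returning 0 when none of A's seeds occur in B is an assumption for the undefined case.

```python
def competition_index(seeds_a, seeds_b):
    """Confidence-weighted fraction of A's seed set that is also in B's seed set."""
    shared = sum(conf for compound, conf in seeds_a.items() if compound in seeds_b)
    return shared / sum(seeds_a.values())


def complementarity_index(seeds_a, seeds_b, compounds_b):
    """Fraction of A's seeds present in B's network that are non-seed compounds of B."""
    in_b = [c for c in seeds_a if c in compounds_b or c in seeds_b]
    if not in_b:
        return 0.0  # assumption: no overlap with B's network means no complementarity
    non_seed_in_b = [c for c in in_b if c not in seeds_b]
    return len(non_seed_in_b) / len(in_b)


# Values taken from the toy networks described in the caption.
seeds_a = {"A": 1.0, "F": 1 / 3, "G": 1 / 3, "H": 1 / 3}
seeds_b = {"F": 1.0, "I": 1 / 3, "J": 1 / 3, "K": 1.0}
# Only the compounds of network B that matter here; other non-seed compounds
# of B do not change the result and are omitted.
compounds_b = {"A", "F", "I", "J", "K"}

print(competition_index(seeds_a, seeds_b))                    # about 1/6
print(complementarity_index(seeds_a, seeds_b, compounds_b))   # 0.5
```

Swapping the arguments gives different values, which illustrates the asymmetry of the indices: the competition index of B versus A is weighted by B's own confidence levels.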

Phylogenetic normalization and outlier detection

Pairwise metabolic complementarity and competition indices between species pairs are plotted against their predicted phylogenetic distance. While methods of outlier detection for continuous data exist, local peaks and troughs of the indices relative to phylogenetic distance make it difficult to identify local outliers. Thus, we utilize a binning approach to limit outlier detection to localized values. For both the metabolic complementarity and competition indices, pairwise observations are binned in a two-step process: observations are first binned using a fixed phylogenetic distance interval of 0.01, and bins smaller than a prespecified minimum size are then merged. Here we used the size of the first bin as the reference. Undersized bins were merged with the closest preceding bin satisfying our minimum bin size threshold. To identify metabolic complementarity and competition outliers, we calculate Z-scores separately within each phylogenetic distance bin. Tukey’s method for outlier detection (equivalent to a Z-score threshold of ±2.698) [39] was utilized to identify significant outliers.
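A minimal sketch of the two-step binning and per-bin Z-score calculation is given below. The function names are hypothetical, and some details are assumptions not stated in the text: each undersized bin is folded into the bin that immediately precedes it after merging, the population standard deviation is used, and bins with zero variance are skipped. The input values are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def bin_pairs(pairs, width=0.01, min_size=None):
    """Bin (distance, index) pairs by distance, then merge undersized bins.

    If min_size is None, the size of the first bin is used as the reference.
    """
    fixed = defaultdict(list)
    for dist, value in pairs:
        fixed[int(dist // width)].append((dist, value))
    keys = sorted(fixed)
    if not keys:
        return []
    if min_size is None:
        min_size = len(fixed[keys[0]])
    merged = [list(fixed[keys[0]])]
    for key in keys[1:]:
        if len(fixed[key]) < min_size:
            merged[-1].extend(fixed[key])  # fold into the preceding bin
        else:
            merged.append(list(fixed[key]))
    return merged


def zscore_outliers(bins, threshold=2.698):
    """Return (distance, index, z) for observations with |z| above the threshold."""
    outliers = []
    for members in bins:
        values = [v for _, v in members]
        sd = pstdev(values)
        if sd == 0:
            continue  # assumption: no outliers can be called in a constant bin
        mu = mean(values)
        for dist, value in members:
            z = (value - mu) / sd
            if abs(z) > threshold:
                outliers.append((dist, value, z))
    return outliers


# Invented data: 19 typical pairs and one high-index pair in the first bin,
# plus two pairs in a second bin that is too small to stand alone.
pairs = [(0.001 + 0.0004 * i, 0.5) for i in range(19)]
pairs += [(0.009, 0.9), (0.015, 0.5), (0.016, 0.5)]
bins = bin_pairs(pairs)
print(len(bins), len(bins[0]))
print(zscore_outliers(bins))
```

Because Z-scores are computed within each bin, a pair is flagged only when its index is extreme relative to pairs at a similar phylogenetic distance, rather than relative to the whole dataset.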

Network construction and community detection

To build a metabolic complementarity/competition network, species are represented as nodes within the network. Identified significant outliers were used to construct a network of gut bacteria, in which, for any pair of species A and B, a directed edge is added from A to B if the pair has a significantly high complementarity score but a low competition score. Using the adjacency list of the directed graph, a local installation of Infomap [40] (with the parameters: --directed --zero-based-numbering --num-trials 10) was utilized to identify community interaction modules within our dataset. Infomap is a random walk based approach for community detection, and it provides a user-friendly interface for visualization and exploration of the network and community structure (https://www.mapequation.org/navigator).
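The edge selection can be sketched in Python as follows. The Z-score cutoffs (complementarity > 2.698, competition < −1.000) are those stated for S4 Table; everything else is an assumption made for illustration: the function names, the species labels, the treatment of pairs without a competition Z-score, and the plain "source target" link-list layout with zero-based node ids used to hand the network to Infomap.

```python
def build_edges(z_complementarity, z_competition, high=2.698, low=-1.0):
    """Select directed edges A -> B from per-pair Z-scores.

    Both arguments map an ordered pair (A, B) to its Z-score. Pairs without
    a competition Z-score are treated as not meeting the low-competition cutoff.
    """
    return sorted(
        pair for pair, z in z_complementarity.items()
        if z > high and z_competition.get(pair, 0.0) < low
    )


def to_link_list(edges):
    """Assign zero-based node ids and format one 'source target' line per edge."""
    species = sorted({name for edge in edges for name in edge})
    ids = {name: i for i, name in enumerate(species)}
    lines = [f"{ids[a]} {ids[b]}" for a, b in edges]
    return ids, lines


# Hypothetical Z-scores for three ordered species pairs.
z_comp = {("sp1", "sp2"): 3.1, ("sp2", "sp1"): 0.4, ("sp3", "sp1"): 2.9}
z_cmpt = {("sp1", "sp2"): -1.5, ("sp2", "sp1"): -2.0, ("sp3", "sp1"): 0.2}

edges = build_edges(z_comp, z_cmpt)
ids, lines = to_link_list(edges)
print(ids)
print("\n".join(lines))
```

Only the pair (sp1, sp2) satisfies both cutoffs in this example, so the network has a single directed edge. Because the indices are asymmetric, an edge from A to B does not imply an edge from B to A.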

Supporting information

S1 File. Infomap output of community modules identified from metabolic complementarity index outlier network.

(TXT)

S1 Table. Metabolic network statistics of simulated incomplete genomes.

(XLSX)

S2 Table. Metabolic complementarity index outliers.

(TSV)

S3 Table. Metabolic competition index outliers.

(TSV)

S4 Table. Pairwise list of identified high metabolic complementarity (Z-score > 2.698) and low metabolic competition (Z-score < −1.000) bacteria.

(TSV)

S5 Table. Protein BLAST results.

Protein BLAST of exo-α-sialidase (NanH1 and NanH2) from B. longum subsp. infantis ATCC 15697 [44] against protein coding genes of B. infantis (MAG: GCF_000020425.1).

(TSV)

S6 Table. Protein BLAST results.

Protein BLAST of exo-α-sialidase (NanH1 and NanH2) from B. longum subsp. infantis ATCC 15697 [44] against protein coding genes of B. longum (MAG: 18391_1_6).

(TSV)

S7 Table. Analysis of the proportion of significantly cooperative bacteria within the same Genus.

(TSV)

Data Availability

Implementation and data are available at https://github.com/mgtools/PhyloMint.

Funding Statement

This research was supported by the National Institutes of Health [1R01AI108888 to Y.Y., 1R01AI143254 to Y.Y.] and the National Science Foundation [2025451 to Y.Y.]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Zhao L, Zhang F, Ding X, et al. Gut bacteria selectively promoted by dietary fibers alleviate type 2 diabetes. Science. 2018;359(6380):1151–1156. doi: 10.1126/science.aao5774
  • 2. Routy B, Le Chatelier E, et al. Gut microbiome influences efficacy of PD-1–based immunotherapy against epithelial tumors. Science. 2018;359(6371):91–97. doi: 10.1126/science.aan3706
  • 3. Nemergut DR, Schmidt SK, Fukami T, O’Neill SP, Bilinski TM, Stanish LF, et al. Patterns and processes of microbial community assembly. Microbiol Mol Biol Rev. 2013;77(3):342–356. doi: 10.1128/MMBR.00051-12
  • 4. Zhou J, Ning D. Stochastic community assembly: does it matter in microbial ecology? Microbiol Mol Biol Rev. 2017;81(4):e00002–17. doi: 10.1128/MMBR.00002-17
  • 5. Powell JR, Karunaratne S, Campbell CD, Yao H, Robinson L, Singh BK. Deterministic processes vary during community assembly for ecologically dissimilar taxa. Nature communications. 2015;6(1):1–10. doi: 10.1038/ncomms9444
  • 6. Woodcock S, Van Der Gast CJ, Bell T, Lunn M, Curtis TP, Head IM, et al. Neutral assembly of bacterial communities. FEMS microbiology ecology. 2007;62(2):171–180. doi: 10.1111/j.1574-6941.2007.00379.x
  • 7. Dumbrell AJ, Nelson M, Helgason T, Dytham C, Fitter AH. Relative roles of niche and neutral processes in structuring a soil microbial community. The ISME journal. 2010;4(3):337–345. doi: 10.1038/ismej.2009.122
  • 8. Wong HL, Smith DL, Visscher PT, Burns BP. Niche differentiation of bacterial communities at a millimeter scale in Shark Bay microbial mats. Scientific reports. 2015;5:15607. doi: 10.1038/srep15607
  • 9. Burke C, Steinberg P, Rusch D, Kjelleberg S, Thomas T. Bacterial community assembly based on functional genes rather than species. Proceedings of the National Academy of Sciences. 2011;108(34):14288–14293. doi: 10.1073/pnas.1101591108
  • 10. Horner-Devine MC, Bohannan BJ. Phylogenetic clustering and overdispersion in bacterial communities. Ecology. 2006;87(sp7):S100–S108. doi: 10.1890/0012-9658(2006)87[100:PCAOIB]2.0.CO;2
  • 11. Bryant JA, Lamanna C, Morlon H, Kerkhoff AJ, Enquist BJ, Green JL. Microbes on mountainsides: contrasting elevational patterns of bacterial and plant diversity. Proceedings of the National Academy of Sciences. 2008;105(Supplement 1):11505–11511. doi: 10.1073/pnas.0801920105
  • 12. Pontarp M, Canbäck B, Tunlid A, Lundberg P. Phylogenetic analysis suggests that habitat filtering is structuring marine bacterial communities across the globe. Microbial ecology. 2012;64(1):8–17. doi: 10.1007/s00248-011-0005-7
  • 13. Thompson JR, Pacocha S, Pharino C, Klepac-Ceraj V, Hunt DE, Benoit J, et al. Genotypic diversity within a natural coastal bacterioplankton population. Science. 2005;307(5713):1311–1313. doi: 10.1126/science.1106028
  • 14. Chaffron S, Rehrauer H, Pernthaler J, Von Mering C. A global network of coexisting microbes from environmental and whole-genome sequence data. Genome research. 2010;20(7):947–959. doi: 10.1101/gr.104521.109
  • 15. Lo C, Marculescu R. PGLasso: Microbial Community Detection through Phylogenetic Graphical Lasso. arXiv preprint arXiv:1807.08039. 2018.
  • 16. Felsenstein J. Phylogenies and the comparative method. The American Naturalist. 1985;125(1):1–15. doi: 10.1086/284325
  • 17. Garland T Jr, Harvey PH, Ives AR. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Systematic biology. 1992;41(1):18–32. doi: 10.1093/sysbio/41.1.18
  • 18. Garamszegi LZ. Modern phylogenetic comparative methods and their application in evolutionary biology: concepts and practice. Springer; 2014.
  • 19. Mundry R. Statistical issues and assumptions of phylogenetic generalized least squares. In: Modern phylogenetic comparative methods and their application in evolutionary biology. Springer; 2014. p. 131–153.
  • 20. Friedman J, Alm EJ. Inferring correlation networks from genomic survey data. PLoS computational biology. 2012;8(9). doi: 10.1371/journal.pcbi.1002687
  • 21. Connor N, Barberán A, Clauset A. Using null models to infer microbial co-occurrence networks. PloS one. 2017;12(5). doi: 10.1371/journal.pone.0176751
  • 22. Mandakovic D, Rojas C, Maldonado J, Latorre M, Travisany D, Delage E, et al. Structure and co-occurrence patterns in microbial communities under acute environmental stress reveal ecological factors fostering resilience. Scientific reports. 2018;8(1):5875. doi: 10.1038/s41598-018-23931-0
  • 23. McGregor K, Labbe A, Greenwood CMT. MDiNE: a model to estimate differential co-occurrence networks in microbiome studies. Bioinformatics. 2019.
  • 24. Faust K, Sathirapongsasuti JF, Izard J, Segata N, Gevers D, Raes J, et al. Microbial co-occurrence relationships in the human microbiome. PLoS computational biology. 2012;8(7):e1002606. doi: 10.1371/journal.pcbi.1002606
  • 25. Hirano H, Takemoto K. Difficulty in inferring microbial community structure based on co-occurrence network approaches. BMC bioinformatics. 2019;20(1):329. doi: 10.1186/s12859-019-2915-1
  • 26. Zelezniak A, Andrejev S, Ponomarova O, Mende DR, Bork P, Patil KR. Metabolic dependencies drive species co-occurrence in diverse microbial communities. Proceedings of the National Academy of Sciences. 2015;112(20):6449–6454. doi: 10.1073/pnas.1421834112
  • 27. Devoid S, Overbeek R, DeJongh M, Vonstein V, Best AA, Henry C. Automated genome annotation and metabolic model reconstruction in the SEED and Model SEED. In: Systems Metabolic Engineering. Springer; 2013. p. 17–45.
  • 28. Agren R, Liu L, Shoaie S, Vongsangnak W, Nookaew I, Nielsen J. The RAVEN toolbox and its use for generating a genome-scale metabolic model for Penicillium chrysogenum. PLoS computational biology. 2013;9(3). doi: 10.1371/journal.pcbi.1002980
  • 29. Dias O, Rocha M, Ferreira EC, Rocha I. Reconstructing genome-scale metabolic models with merlin. Nucleic acids research. 2015;43(8):3899–3910. doi: 10.1093/nar/gkv294
  • 30. Karp PD, Latendresse M, et al. Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology. Briefings in bioinformatics. 2016;17(5):877–890. doi: 10.1093/bib/bbv079
  • 31. Thiele I, Palsson BØ. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature protocols. 2010;5(1):93. doi: 10.1038/nprot.2009.203
  • 32. Machado D, Andrejev S, Tramontano M, Patil KR. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Research. 2018;46(15):7542–7553. doi: 10.1093/nar/gky537
  • 33. Mendoza SN, Olivier BG, Molenaar D, Teusink B. A systematic assessment of current genome-scale metabolic reconstruction tools. Genome Biology. 2019;20(1). doi: 10.1186/s13059-019-1769-1
  • 34. Levy R, Borenstein E. Metabolic modeling of species interaction in the human microbiome elucidates community-level assembly rules. Proceedings of the National Academy of Sciences. 2013;110(31):12804–12809. doi: 10.1073/pnas.1300926110
  • 35. Dutheil JY. Detecting coevolving positions in a molecule: why and how to account for phylogeny. Briefings in Bioinformatics. 2011;13(2):228–243. doi: 10.1093/bib/bbr048
  • 36. Rezende EL, Diniz-Filho JAF. Phylogenetic analyses: comparing species to infer adaptations and physiological mechanisms. Comprehensive Physiology. 2011;2(1):639–674.
  • 37. Cope AL, O’Meara B, Gilchrist MA. Gene Expression of Functionally-Related Genes Coevolves Across Fungal Species: Detecting Coevolution of Gene Expression Using Phylogenetic Comparative Methods. 2019.
  • 38. Hester ER, Jetten MS, Welte CU, Lücker S. Metabolic overlap in environmentally diverse microbial communities. Frontiers in genetics. 2019;10:989. doi: 10.3389/fgene.2019.00989
  • 39. Tukey JW. Exploratory Data Analysis. Pearson; 1977.
  • 40. Rosvall M, Bergstrom CT. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences. 2008;105(4):1118–1123. doi: 10.1073/pnas.0706851105
  • 41. Lawson MAE, O’Neill IJ, Kujawska M, Javvadi SG, Wijeyesekera A, Flegg Z, et al. Breast milk-derived human milk oligosaccharides promote Bifidobacterium interactions within a single ecosystem. The ISME Journal. 2019;14(2):635–648. doi: 10.1038/s41396-019-0553-2
  • 42. Underwood MA, German JB, Lebrilla CB, Mills DA. Bifidobacterium longum subspecies infantis: champion colonizer of the infant gut. Pediatric Research. 2014;77(1-2):229–235. doi: 10.1038/pr.2014.156
  • 43. Garrido D, Ruiz-Moyano S, Kirmiz N, Davis JC, Totten SM, Lemay DG, et al. A novel gene cluster allows preferential utilization of fucosylated milk oligosaccharides in Bifidobacterium longum subsp. longum SC596. Scientific Reports. 2016;6(1). doi: 10.1038/srep35045
  • 44. Sela DA, Li Y, Lerno L, Wu S, Marcobal AM, German JB, et al. An Infant-associated Bacterial Commensal Utilizes Breast Milk Sialyloligosaccharides. Journal of Biological Chemistry. 2011;286(14):11909–11918. doi: 10.1074/jbc.M110.193359
  • 45. Levy R, Carr R, Kreimer A, Freilich S, Borenstein E. NetCooperate: a network-based tool for inferring host-microbe and microbe-microbe cooperation. BMC bioinformatics. 2015;16(1):164. doi: 10.1186/s12859-015-0588-y
  • 46. Kreimer A, Doron-Faigenboim A, Borenstein E, Freilich S. NetCmpt: a network-based tool for calculating the metabolic competition between bacterial species. Bioinformatics. 2012;28(16):2195–2197. doi: 10.1093/bioinformatics/bts323
  • 47. Vatanen T, Plichta DR, Somani J, Münch PC, Arthur TD, Hall AB, et al. Genomic variation and strain-specific functional adaptation in the human gut microbiome during early life. Nature microbiology. 2019;4(3):470–479. doi: 10.1038/s41564-018-0321-5
  • 48. Almeida A, Mitchell AL, Boland M, Forster SC, Gloor GB, Tarkowska A, et al. A new genomic blueprint of the human gut microbiota. Nature. 2019;568(7753):499. doi: 10.1038/s41586-019-0965-1
  • 49. Forster SC, Kumar N, Anonye BO, Almeida A, Viciani E, Stares MD, et al. A human gut bacterial genome and culture collection for improved metagenomic analyses. Nature biotechnology. 2019;37(2):186. doi: 10.1038/s41587-018-0009-7
  • 50. Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The human microbiome project. Nature. 2007;449(7164):804. doi: 10.1038/nature06244
  • 51. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome research. 2015;25(7):1043–1055. doi: 10.1101/gr.186072.114
  • 52. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA, et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature biotechnology. 2018. doi: 10.1038/nbt.4229
  • 53. Rho M, Tang H, Ye Y. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Research. 2010;38(20):e191–e191. doi: 10.1093/nar/gkq747
  • 54. Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PloS one. 2010;5(3):e9490. doi: 10.1371/journal.pone.0009490
  • 55. Hug LA, Baker BJ, Anantharaman K, Brown CT, Probst AJ, Castelle CJ, et al. A new view of the tree of life. Nat Microbiol. 2016;1:16048. doi: 10.1038/nmicrobiol.2016.48
  • 56. Parks DH, Rinke C, Chuvochina M, Chaumeil PA, Woodcroft BJ, Evans PN, et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nature microbiology. 2017;2(11):1533–1542. doi: 10.1038/s41564-017-0012-7
  • 57. Finn RD, Clements J, Arndt W, Miller BL, Wheeler TJ, Schreiber F, et al. HMMER web server: 2015 update. Nucleic acids research. 2015;43(W1):W30–W38. doi: 10.1093/nar/gkv397
  • 58. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: the protein families database. Nucleic acids research. 2014;42(D1):D222–D230. doi: 10.1093/nar/gkt1223
  • 59. Haft DH, Selengut JD, White O. The TIGRFAMs database of protein families. Nucleic acids research. 2003;31(1):371–373. doi: 10.1093/nar/gkg128
  • 60. Borenstein E, Kupiec M, Feldman MW, Ruppin E. Large-scale reconstruction and phylogenetic analysis of metabolic environments. Proceedings of the National Academy of Sciences. 2008;105(38):14482–14487. doi: 10.1073/pnas.0806162105
  • 61. Hagberg A, Swart P, S Chult D. Exploring network structure, dynamics, and function using NetworkX. Los Alamos National Lab.(LANL), Los Alamos, NM (United States); 2008.
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007951.r001

Decision Letter 0

Christos A Ouzounis, Jason A Papin

10 Jun 2020

Dear Dr. Ye,

Thank you very much for submitting your manuscript "Pathway-based and phylogenetically adjusted quantification of metabolic interaction between microbial species" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Christos A. Ouzounis

Associate Editor

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This paper is applying an original and advanced computational approach to address an extremely important, fundamentally and practically, problem of metabolic interactions, competitive and cooperative, in complex microbial communities. The developed software (computational pipeline PhyloMInt) combines existing tools for automated genome-scale metabolic reconstruction/modeling (CarveMe) and dissection of so called “seed” compounds (NetworkX) with the original implementation of an algorithm for computing of metabolic competition/cooperation indices. This comprises one of the most impactful aspects of the submitted MS. The availability of the developed software and all of the needed components at github is commendable, and the possibility for any reader to seamlessly install and even modify (if needed) the software add strongly to the anticipated impact of this publication! While the utility of the approach is demonstrated specifically for the human gut microbiome (HGM) (> 2,800 genomes including MAGs), it can be seamlessly expanded to any microbial community limited strictly by the availability of genomes and a quality of their functional annotations.

Additional strength of the paper is the demonstration of non-independence between computed metabolic indices and phylogeny, which is consistent with previously recognized trends of phylogenetically (and, thus, metabolically) close species to compete for the niche resources, and the opposite trend for phylogenetically distant organisms. This finding prompted authors to develop and implement an approach to minimization of this phylogenetic bias. Application of the developed pipeline to the analysis of HGM datasets leads to a reasonable (albeit not unexpected) conclusion that both types of metabolic interactions (and both drivers, “habitat filtration” and “niche differentiation”) play a role in HGM ecology.

Based on these considerations, this article is expected to attract a broad readership and have a notable impact in the field. However, prior to its publication, some work is required to mitigate several noted weaknesses briefly outlined below:

1. Title. The current title is somewhat misleading. Unlike MinPath, a previous very impactful accomplishment of the corresponding author, the PhyloMint pipeline, as well as the article per se, does not operate with a concept of metabolic pathways (a biologically solid and practically useful paradigm of predominant and semi-isolated segments of metabolic networks). Instead, it relies on a far more ambitious but less granular/accurate (albeit computationally more attractive) concept of automated genome-scale metabolic models. It is desirable to duly reflect it in the title to avoid confusion.

2. Supplementary data. They are mentioned several times in the MS (line 121, 162, 188, 194) and they are seemingly important but not properly organized and referred to. Instead of a generic umbrella reference in its current form (“details can be found in supplementary data”), which is not helpful at all, every reference should point to a specific supplementary table, graph or file.

3. A more important weakness of the paper is a lack of even one meaningful biological illustration of the main observations and conclusions. A very weak attempt to do so (p. 6, lines 177-183) is really not helpful. An interesting observation about possible metabolic complementarity of distinct species of Bifidobacteria actually has merit, which should be discussed explicitly (and not via a reference to a paper, which is also not very helpful in that sense). Examples of metabolic complementarity could have included: (i) B2 auxotrophy, which is characteristic for most Bifidobacteria except B. infantis, one of the earliest colonizers of breast milk-fed infants; (ii) ability/inability to utilize HMOs also distinguishing otherwise closely related B. longum and B. infantis; (iii) extracellular degradation of fucosylated and sialylated HMOs, which is uniquely characteristic of B. bifidum and may comprise a mechanism of metabolic cooperation with co-colonizing B. infantis. Considering this (or another) example implicated by PhyloMInt at some level of biological/biochemical details would strongly impact the interest of biologists to the methodology and conclusions. In fact, it would be important to openly state whether indeed at least some (or even one) of the known distinctive metabolic features were indeed picked up by CarveMe tool. [Of note, the fact that the quality of the PhyloMInt analysis may be only as good as underlying metabolic reconstructions that are full of errors, deficiencies and inconsistencies (which authors should have admitted while providing rationale for their choice of CarveMe over other overpromising automated model generators), does not diminish the importance of the developed downstream approaches and tools.]

4. In the same context: line 181- “…dominated by taxonomically similar genomes (Helicobacter, Collinsella, Lachnospiraceae, and Ruminococcus).” What was meant by this sentence? That Helicobacter is a close relative of Ruminococcus? Seriously? And if not, then what? And this is only one of the two sentences containing a reference to actual biology… The reader may lose all interest and confidence over vague statements like this one.

5. Providing one or two relevant metabolic examples would also allow authors to illustrate for the readers the entire conceptual apparatus of source and sink metabolites, seed compound sets and so forth that are very relevant for understanding of their analysis of metabolic interactions in microbial communities.

There is no doubt that authors will be able to quickly address these minor concerns (may be with a help of an expert in microbial metabolism), which would enhance the well-anticipated impact of this publication.

Reviewer #2: Predicting species interactions among a microbial community from genomic sequences is a challenging problem. Previous work has established algorithms to construct metabolic networks from genomic sequences and compute competition/complementarity indices. This work further explored the impact of incomplete genomes assembled from metagenomes and the presence of phylogenetically related clusters. Based on simulations, the authors conclude that genome completeness is unlikely to be a factor in inferring species interactions. Using a human microbiome dataset, the authors normalized phylogenetic relatedness via a bin-based approach. Outliers from each bin are regarded as true interactions. The manuscript is overall easy to follow despite some grammar errors.

I really like the directions the authors were taking, but feel the presented research is preliminary. More experiments are required to boost the rigor of this presentation. I also understand so far there are no control datasets with known species interactions for a robust estimation of the prediction accuracy, so one might have to be creative to come up with some metrics.

Some ideas to strengthen the paper:

1. While the source and sink nodes are necessary for metabolic network construction, the authors need to show that having incomplete genomes does not impact the resulting networks, not just the number of nodes. Moreover, metagenome-assembled genomes may not miss genes randomly; genes co-located in the same genomic region tend to be missed together (a binning problem).

2. The advantage of controlling for phylogenetic relatedness was not clearly shown. Is there other independent evidence that may corroborate the statement that normalized indices are better than those without normalization?

3. In the method section, "Metabolic competition and complementarity indices", is the method the same as that presented in Levy and Borenstein (2013, ref 30)? If not, please list and justify the innovations here.

4. In the method section, "Phylogenetic distance", it seems that the authors presented a novel method. Has it been compared to established methods, such as gANI (PNAS 2005;102:2567–2572) or GTDB-tk (https://doi.org/10.1093/bioinformatics/btz848)?

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: Supplementary data are not properly organized. See comments about additional illustrations above.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007951.r003

Decision Letter 1

Christos A Ouzounis, Jason A Papin

10 Sep 2020

Dear Dr. Ye,

We are pleased to inform you that your manuscript 'Model-based and phylogenetically adjusted quantification of metabolic interaction between microbial species' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Christos A. Ouzounis

Associate Editor

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I am very pleased with responses and revisions.

The paper was much improved and can be accepted as is.

Noted only one seeming typo:

Line 217, page 7:

"...example shows a possible metabolic complementary between related species reflected".

I assume it should be either "metabolic complementarity" or "complementary metabolic interactions"

Reviewer #2: The authors sufficiently addressed my previous concerns. I was glad to read the much improved manuscript.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Zhong Wang

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007951.r004

Acceptance letter

Christos A Ouzounis, Jason A Papin

21 Oct 2020

PCOMPBIOL-D-20-00796R1

Model-based and phylogenetically adjusted quantification of metabolic interaction between microbial species

Dear Dr Ye,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Matt Lyles

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Infomap output of community modules identified from metabolic complementarity index outlier network.

    (TXT)

    S1 Table. Metabolic network statistics of simulated incomplete genomes.

    (XLSX)

    S2 Table. Metabolic complementarity index outliers.

    (TSV)

    S3 Table. Metabolic competition index outliers.

    (TSV)

    S4 Table. Pairwise list of identified high metabolic complementarity (Z-score > 2.698) and low metabolic competition (Z-score < −1.000) bacteria.

    (TSV)

    S5 Table. Protein BLAST results.

    Protein BLAST of exo-α-sialidase (NanH1 and NanH2) from B. longum subsp. infantis ATCC 15697 [44] against protein coding genes of B. infantis (MAG: GCF_000020425.1).

    (TSV)

    S6 Table. Protein BLAST results.

    Protein BLAST of exo-α-sialidase (NanH1 and NanH2) from B. longum subsp. infantis ATCC 15697 [44] against protein coding genes of B. longum (MAG: 18391_1_6).

    (TSV)

    S7 Table. Analysis of the proportion of significantly cooperative bacteria within the same Genus.

    (TSV)

    Attachment

    Submitted filename: responses.pdf

    Data Availability Statement

    Implementation and data are available at https://github.com/mgtools/PhyloMint.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
