Abstract
A key discovery in comparative and developmental biology is the existence of assembly archetypes that synthesize the vast diversity of organisms’ body plans (from legs and wings to human arms) into simple, interpretable and general design principles. Here, we combine a novel mathematical formalism based on category theory with experimental data to show that similar ‘assembly archetypes’ exist at the larger organizational scale of ecological communities when a species pool is assembled across diverse environmental contexts, particularly when species interactions are highly structured. Applying our formalism to clinical data, we discovered two assembly archetypes that differentiate between healthy and unhealthy human gut microbiota. The concept of assembly archetypes and the methods to synthesize them can pave the way to discovering the general assembly principles of the ecological communities we observe in nature.
Keywords: community ecology, assembly rules, archetype, structural stability, gut microbiota
1. Introduction
The ultimate goal of biology is to synthesize the general design principles that underlie the diversity of life on Earth [1]. A particularly insightful example of such synthesis in comparative and developmental biology is the discovery of assembly archetypes that synthesize the diversity of organisms’ body plans into simple, interpretable and general design principles [2–5]. To illustrate such assembly archetypes, consider the wings that birds use to fly, the legs that mice use to run and the arms that we humans use to create works of art (figure 1a). These limb types differ in their number of bones, dimensions and functions. Yet, underlying these differences, their body plans share the same assembly archetype of tetrapods [4,5]: one (proximal stylopod) bone, followed by two (middle zeugopod) bones, ending with distal (autopod) bones. An assembly archetype can also serve as the substrate for building new, more complex assembly rules and assembly archetypes [6]. In our example, requiring that distal bones are digits generates a more complex assembly archetype that synthesizes the limbs of mice and humans. These assembly archetypes of different complexity organize diverse limb types into a hierarchy that explains how they are related (figure 1b). Namely, a collection of limb types shares an assembly archetype (i.e. a common design principle) if its members have a common ancestor in this hierarchy. For example, the green, yellow and purple types in figure 1b share an assembly archetype, but the red type does not.
Figure 1.
Assembly archetypes in ecological communities across environmental contexts. We introduce our approach using an analogy to how comparative and developmental biologists synthesize assembly archetypes in organisms’ body plans. (a) Cartoons of the skeletons of four limb types. Despite differences in their size, number of bones and use, direct inspection shows that the first three limb types share the assembly archetype of tetrapods [4]: one (stylopod) bone, followed by two (zeugopod) bones, ending in distal bones. This archetype provides a simple, general and interpretable assembly principle for different limb types. One can obtain the more complex assembly archetype ‘tetrapods with digits’ starting from the ‘tetrapods’ archetype and adding the constraint ‘distal bones are digits’. Importantly, note that limb types are endowed with a structure that allows us to perceive their assembly archetypes directly, by inspection with our human senses. (b) The two assembly archetypes of panel (a), of different complexity (grey), organize the limb types (colours) into a hierarchy. A collection of limb types has a common assembly principle if its members share an assembly archetype, synthesized by their closest common ancestor in this hierarchy. In particular, limb types are more similar if the assembly archetype they share is closer in this hierarchy. Therefore, human and mouse limbs are more similar to each other than to bird limbs. Limb types cannot be synthesized into an assembly archetype when they do not share a common design principle (i.e. they do not share an ancestor in the hierarchy). This situation happens when we try to synthesize an assembly archetype for humans, mice, birds and crabs in our example. (c) Cartoon of a species pool of S = 3 species. (d) The species pool can be assembled in four hypothetical environmental contexts θ ∈ {A, B, C, D}. (e) The coexistence of a species composition depends on the environmental context in which we assemble it.
This fact generates different community types (i.e. different coexisting species compositions) across different environmental conditions. The assembly rule of an environment characterizes which community types we can observe in that environment. That is, just as limbs adopt different types in different organisms, the same species pool adopts different community types when assembled in different environmental contexts. However, unlike limb types, the community types come with no structure organizing them, making it virtually impossible to synthesize their assembly archetypes. For example, by inspecting the four community types in this panel, it is difficult for human senses to grasp if they share an assembly archetype or not.
In ecology, however, it is still unclear whether general design principles underlie the assembly of ecological communities across the diverse environmental contexts found in nature [7–15]. Indeed, because species coexistence depends on many environmental factors such as temperature [16] or available nutrients [17], the same species pool can exhibit different species compositions when assembled under different environmental contexts. That is, just as limbs ‘adopt’ different types of form in different organisms, a species pool assembled under different environmental contexts can ‘adopt’ different community types. Here, the community type in the environmental context θ is the set of all species compositions, formed by combining species from the pool, that coexist when assembled in that environment. The community type therefore describes all the species compositions that we can observe in a given environmental context (see Methods for a precise definition). Figure 1c–e illustrates this concept using a hypothetical pool of S = 3 species that can be assembled under four different environmental contexts θ ∈ {A, B, C, D}. For instance, the community adopts the type HB = {{1}, {2}, {1, 3}, {2, 3}} under environment B and the type HD = {{1, 3}, {1, 2, 3}} under environment D. The composition {1, 3} belongs to both community types, so species 1 and 3 can coexist when assembled in each of these environmental contexts. That the composition {1} belongs only to HB means that species 1 can survive alone in the environmental context B but not in D.
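Because a community type is just a set of species compositions, the membership statements in the example reduce to set operations. The minimal Python sketch below stores HB and HD as sets of frozensets; the names `H_B`, `H_D` and `coexists` are illustrative choices, not part of the original formalism.

```python
# Community types from the hypothetical example of figure 1, stored as sets of
# species compositions; each composition is a frozenset of species indices.
H_B = {frozenset({1}), frozenset({2}), frozenset({1, 3}), frozenset({2, 3})}
H_D = {frozenset({1, 3}), frozenset({1, 2, 3})}


def coexists(composition, community_type):
    """Return True if the composition belongs to the given community type."""
    return frozenset(composition) in community_type


# {1, 3} belongs to both types: species 1 and 3 coexist in environments B and D.
print(coexists({1, 3}, H_B), coexists({1, 3}, H_D))  # True True
# {1} belongs only to H_B: species 1 survives alone in B but not in D.
print(coexists({1}, H_B), coexists({1}, H_D))  # True False
```

Using frozensets makes compositions hashable, so the compositions shared by two community types are simply their set intersection (here, `H_B & H_D` is `{frozenset({1, 3})}`).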
To get first-hand experience of the difficulty of identifying general design principles in the assembly of ecological communities, consider the following question: do the community types in figure 1e share an ‘assembly archetype’ analogous to the tetrapod assembly archetype that the limb types in figure 1a,b share? This question is not asking whether the community types have something in common regarding their species compositions, such as whether one species composition appears in all of them. That would be analogous to asking whether various limb types have a common bone. Instead, the question asks for an assembly archetype characterizing a fundamental similarity in the rules used to assemble all community types. An assembly rule for the environmental context θ is a statement determining whether a potential species composition will coexist in that environment, given which species we assemble (see Methods for a precise definition and the section ‘Synthesizing assembly archetypes’ for details). For example, the assembly rule MA for the environmental context A is ‘a species composition coexists if species 1 or species 2 is present’. If community types do exhibit assembly archetypes, their existence can pave the way to synthesizing the general assembly principles of ecological communities across diverse environmental conditions, which have remained elusive and controversial since Diamond’s seminal work more than 40 years ago [7,10,18,19].
Answering the above question using our human senses alone is difficult because, unlike limb types that have additional topological structure telling which bones are adjacent, community types are plain collections of numbers lacking any structure. Without this structure, it is virtually impossible to determine whether or not two different community types are the product of similar assembly rules. Consequently, it becomes impossible to determine if a given collection of community types shares an assembly archetype. This difficulty is not limited to figure 1. Instead, it is a fundamental challenge to discover general design principles in ecological communities assembled across diverse environmental contexts [9]. Namely, classic [20] and more recent [21] work has shown that communities can exhibit well-defined assembly rules and specific community types when assembled under similar environmental contexts. However, these rules do not translate well under more diverse environmental contexts [14,19,20,22], making it unclear if and in which sense such rules could be an assembly archetype.
Here, we address the above challenges by introducing a mathematical formalism based on category theory [23] that endows the collection of community types with the necessary structure to compare their assembly rules. This formalism provides an abstract ‘synthesis operation’ that we can apply to any given collection of community types to determine whether they share an assembly archetype. For example, this synthesis operation answers the question we posed above: the community types HA, HB and HC in figure 1e share the assembly archetype ‘a species composition coexists if either species 1 or 2 is present’, while HD does not.
The remainder of this paper is organized as follows. We first introduce our formalism to synthesize assembly archetypes, providing mathematical definitions for community types and assembly rules. In the Results section, we combine our formalism with experimental data of microbial communities and synthetic data generated from simple population dynamics models to provide evidence of conditions leading to the existence of assembly archetypes. We corroborate these results by studying assembly archetypes using clinical data on the human gut microbiota across healthy and unhealthy hosts. We end by discussing some limitations and possible extensions of our approach.
2. Synthesizing assembly archetypes
Consider a potential species pool {1, …, S} and a set of different environmental contexts θ (‘environments’ hereafter) under which we can assemble a community. The community type Hθ of the species pool under the environment θ is the collection of all species compositions that can coexist under that environment (Methods). We can conveniently visualize a community type as a hypergraph [24] that, in our analogy to limb types, should be regarded as its ‘skeleton’ (figure 2a). To describe the assembly rule Mθ of the community type Hθ, it is necessary to choose a language and a syntax. Community types are discrete objects; therefore, Boolean functions provide a compact and convenient language to describe them. An assembly rule takes as input a binary vector z ∈ {0, 1}^S representing a potential species composition, where its ith entry satisfies zi = 1 if the ith species is assembled, and zi = 0 otherwise. For example, z = (1, 0, 1) represents the species composition {1, 3}. The assembly rule outputs Mθ(z) = 1 if the species composition z can coexist in the environment θ, and Mθ(z) = 0 otherwise.
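The encoding of compositions as binary vectors, and the way a Boolean rule determines a community type, can be sketched in a few lines of Python. This is a minimal sketch assuming that only non-empty compositions are considered; the helper names `to_vector`, `all_compositions` and `community_type` are hypothetical, and the example rule is MA from the text (‘species 1 or species 2 is present’).

```python
from itertools import combinations


def to_vector(composition, S):
    """Binary vector z in {0,1}^S with z_i = 1 iff species i (1-indexed) is assembled."""
    return tuple(1 if i in composition else 0 for i in range(1, S + 1))


def all_compositions(S):
    """All non-empty species compositions of the pool {1, ..., S} (assumed scope)."""
    species = range(1, S + 1)
    return [frozenset(c) for k in range(1, S + 1) for c in combinations(species, k)]


def community_type(rule, S):
    """Collect every composition h whose vector z(h) the rule maps to 1."""
    return {h for h in all_compositions(S) if rule(to_vector(h, S)) == 1}


def M_A(z):
    """Rule of environment A: a composition coexists if species 1 or 2 is present."""
    return int(z[0] == 1 or z[1] == 1)


print(to_vector({1, 3}, 3))  # (1, 0, 1)
print(sorted(sorted(h) for h in community_type(M_A, 3)))
# [[1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3]]
```

Going from a rule to a community type is a direct enumeration; the harder direction, recovering a compact rule from a community type, is what the CNF syntax and the category construction address.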
Figure 2.
The category of homologous hypergraphs allows synthesizing assembly archetypes in ecological communities. The mathematics of category theory allows defining an abstract ‘synthesis operation’ to identify whether a collection of community types shares an assembly archetype. (a) The community type observed in the environmental context θ can be conveniently visualized as a coexistence hypergraph Hθ. In a coexistence hypergraph Hθ, vertices correspond to species (circles) and hyperedges to species compositions that can coexist in environment θ. Filled (resp. empty) circles represent species that survive (resp. cannot survive) alone in the environment. Filled areas represent hyperedges (i.e. species compositions that coexist). For example, the hypergraph HD = {{1, 3}, {1, 2, 3}} is represented by a line and the interior of a triangle. Coexistence hypergraphs with different colours represent different environmental contexts (colours as in figure 1e). In particular, H1 is the complete hypergraph (i.e. all species compositions can coexist), and H0 is the empty hypergraph (i.e. no species composition can coexist). (b) Each hypergraph Hθ has an associated assembly rule Mθ, describing the conditions that a species composition needs to satisfy to coexist in environmental context θ. In this work, we use logic circuits written in conjunctive normal form (CNF) as the language and syntax to describe assembly rules. An assembly rule takes as input a binary vector z = (z1, …, zS) ∈ {0, 1}^S representing a potential species composition, where zi = 1 only if species i is assembled. Then, the assembly rule outputs Mθ(z) = 1 only if the species composition z coexists in the environmental context θ. (c) The category of homologous hypergraphs for the community type HE contains as objects its assembly archetypes (squares) and as morphisms (arrows) their organization. These morphisms correspond to removing variables, logic gates or whole clauses from one archetype to obtain a simpler one.
In this example, its category has four objects {ME, MB, MA, M1} and three (prime) morphisms. (d) The category of homologous hypergraphs for all 128 community types of S = 3 species, with objects visualized as squares and morphisms as lines. The resulting categories form a hierarchy. This hierarchy allows defining two operations: synthesis (moving up in this hierarchy) and analysis (moving down in this hierarchy). An assembly archetype M* exists for a collection of community types if we can synthesize each community type into the same assembly rule by moving up in the hierarchy. In the panel, MA is the assembly archetype of {HC, HE}, and M1 is the assembly archetype of {HA, HB, HE}. Some collections of community types, such as {HA, HB, HC, HD}, do not have an assembly archetype.
To choose a syntax for writing assembly rules, we aim to mimic how more complex assembly archetypes of limb types are constructed by adding constraints to an existing archetype. For example, in figure 1a,b, we obtained the assembly archetype ‘tetrapod with digits’ starting from ‘tetrapod’ and adding the constraint ‘distal bones are digits’. We choose logic circuits written in the minimal conjunctive normal form (CNF) syntax to obtain a similar behaviour for assembly rules (Methods). An assembly rule thus reads as a set of clauses joined by AND (∧) gates (figure 2b). Each of those clauses combines the variables {z1, …, zS} with OR (∨) and NOT (¬) gates. To coexist, a species composition must satisfy each and every clause. Note that adding variables, logic gates or complete clauses to an existing assembly rule generates a more complex assembly rule, while removing them simplifies it. This fact suggests defining the complexity κ(M) of a rule M as the minimum number of variables and logic gates needed to write it. In our example of figure 1, the community type HB corresponding to the environment B has the assembly rule MB(z) = ¬z1 ∨ ¬z2 (yellow in figure 2b). This assembly rule has only one clause, and it shows that, in the environment B, a species composition coexists if and only if ‘either species 1 or species 2 is not assembled’. The complexity of this assembly rule is κ(MB) = 5 because it uses two variables and three logic gates. Finally, for each community type Hθ, we build a category encoding its assembly archetypes and how they are organized (Methods). Specifically, the objects of this category are assembly rules that are archetypical in the sense that they are the Pareto-optimal descriptions of the community type in terms of accuracy and complexity. Morphisms in this category consist of removing logic gates or entire clauses from an assembly archetype to obtain a simpler one, providing the structure that organizes them.
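A CNF assembly rule can be stored as a list of clauses, each clause being a list of literals. The sketch below evaluates such rules and computes a complexity score. It assumes that κ counts distinct variables plus every NOT, OR and AND gate; this convention reproduces the values quoted in this paper for MB (‘species 1 or species 2 is not assembled’, i.e. ¬z1 ∨ ¬z2, κ = 5), MA (z1 ∨ z2, κ = 3), M30 (¬z3, κ = 2) and the constant rule 1 (κ = 0), but the exact counting rule is defined in the Methods. The representation and function names are hypothetical.

```python
# A literal is (species index, negated); a clause is an OR of literals;
# a rule is an AND of clauses. [] is the constant rule 1 (no constraint),
# and [[]] (a single empty clause) is the constant rule 0.

def evaluate(rule, z):
    """Evaluate a CNF assembly rule on a binary vector z (species are 1-indexed)."""
    return int(all(any((z[i - 1] == 0) if neg else (z[i - 1] == 1) for i, neg in clause)
                   for clause in rule))


def complexity(rule):
    """Assumed kappa(M): distinct variables plus NOT, OR and AND gates."""
    variables = {i for clause in rule for i, _ in clause}
    nots = sum(neg for clause in rule for _, neg in clause)
    ors = sum(max(len(clause) - 1, 0) for clause in rule)
    ands = max(len(rule) - 1, 0)
    return len(variables) + nots + ors + ands


M_B = [[(1, True), (2, True)]]    # not z1 OR not z2
M_A = [[(1, False), (2, False)]]  # z1 OR z2
M_30 = [[(3, True)]]              # not z3
M_1 = []                          # always 1

print(complexity(M_B), complexity(M_A), complexity(M_30), complexity(M_1))  # 5 3 2 0
print(evaluate(M_B, (1, 1, 0)), evaluate(M_B, (1, 0, 1)))  # 0 1
```

Keeping clauses as explicit lists makes ‘adding or removing a clause or gate’ a plain list edit, which mirrors how the text builds more complex rules from simpler ones.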
In general, different community types give rise to categories having different objects and morphisms.
To illustrate how our approach works, consider again the community type HE for a pool of S = 3 species shown in figure 2a. Figure 2c depicts its corresponding category. This category has four objects {ME, MB, MA, M1}, representing the assembly archetypes of HE. The (prime) morphisms of this category are f1: ME → MB, f2: MB → MA and f3: MA → M1, indicating how its assembly archetypes are organized (orange arrows in figure 2c). The first assembly archetype, of complexity κ(ME) = 7, has two clauses and describes HE exactly (i.e. each species composition h ∈ HE satisfies ME(z(h)) = 1, and vice versa). The first prime morphism f1 removes one clause from ME to obtain the second assembly archetype, of complexity κ(MB) = 5. That MB is the second archetype means that no assembly rule describes HE more accurately, except the first archetype ME itself. At this complexity level, HE is best described by the rule ‘a species composition coexists if either species 1 or 2 is not present’. The second morphism f2 removes the NOT gates (¬) from MB to obtain the third assembly archetype, MA(z) = z1 ∨ z2, of complexity κ(MA) = 3. This third archetype shows that, at this complexity level, HE is best described by the rule ‘a species composition coexists if either species 1 or 2 is present’. The final morphism f3 removes the remaining clause from MA to obtain the simplest assembly archetype, M1(z) = 1, with complexity κ(M1) = 0. This final archetype shows that the best description of HE at this complexity is simply the rule ‘all species compositions can coexist’. Note that it is also possible to obtain the third archetype MA directly from the first one, ME, by composing the two prime morphisms (i.e. f2 ∘ f1: ME → MA). It is the structure of a category that ensures we can always compose morphisms in this way. The more morphisms we apply to ME, the simpler the assembly archetype we get.
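The morphisms in this example are syntactic deletions, so they can be sketched as functions on the clause-list representation of a CNF rule. ME itself is not written out in this section, so the sketch starts from MB = ¬z1 ∨ ¬z2 and applies the deletions corresponding to f2 (drop the NOT gates, giving MA = z1 ∨ z2) and f3 (drop the remaining clause, giving M1 = 1). The helper names are hypothetical, the complexity convention is the assumed one (distinct variables plus gates), and the sketch does not check that each result is Pareto-optimal, which the actual category requires.

```python
def complexity(rule):
    """Assumed kappa(M): distinct variables plus NOT, OR and AND gates."""
    variables = {i for clause in rule for i, _ in clause}
    nots = sum(neg for clause in rule for _, neg in clause)
    ors = sum(max(len(clause) - 1, 0) for clause in rule)
    return len(variables) + nots + ors + max(len(rule) - 1, 0)


def remove_negations(rule):
    """Deletion morphism: drop every NOT gate from the rule."""
    return [[(i, False) for i, _ in clause] for clause in rule]


def remove_clause(rule, k):
    """Deletion morphism: drop the k-th clause of the rule."""
    return [clause for j, clause in enumerate(rule) if j != k]


def compose(*morphisms):
    """Apply morphisms left to right: compose(f, g)(M) == g(f(M))."""
    def composite(rule):
        for f in morphisms:
            rule = f(rule)
        return rule
    return composite


def f2(rule):
    return remove_negations(rule)


def f3(rule):
    return remove_clause(rule, 0)


M_B = [[(1, True), (2, True)]]  # not z1 OR not z2
M_A = f2(M_B)                   # z1 OR z2
M_1 = f3(M_A)                   # empty CNF: always 1
print(M_A, complexity(M_A))     # [[(1, False), (2, False)]] 3
print(M_1, complexity(M_1))     # [] 0
print(compose(f2, f3)(M_B) == M_1)  # True
```

Composition here is ordinary function composition, which is the property the text attributes to the category structure: applying f2 then f3 equals applying their composite once.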
The above categories provide a ‘synthesis operation’ that identifies whether an assembly archetype exists for a collection of community types (calculated as the co-product of the union of their categories, Methods). To explain how this works, figure 2d shows the categories for all the possible community types of a pool of S = 3 species, with objects visualized as squares and morphisms visualized as lines. The resulting categories generate a hierarchy among community types, analogous to that of figure 1b for limb types. This hierarchy allows defining two operations: synthesis, corresponding to moving up in this hierarchy, and analysis, corresponding to moving down in it. For example, the synthesis of the community type HE consists of starting at the orange square ME and moving up in the hierarchy ME → MB → MA → M1 using the morphisms of its category (orange lines in figure 2d). We can apply the same process to obtain a synthesis of the other community types HA (green), HB (yellow), HC (purple) and HD (red) using the hierarchy of figure 2d, moving only along morphisms of their respective colour. Then, an assembly archetype M* exists for a collection of community types if we can synthesize each community type into the same assembly rule by moving up in the hierarchy. In other words, an assembly archetype exists if, at some complexity level, the best descriptions of the assembly rules of all community types are identical. For example, the community types {HC, HE} have MA as their assembly archetype. Namely, applying one orange morphism to ME and one purple morphism to MC gets us to the same assembly rule MA. The assembly archetype for {HB, HC, HE} is M1(z) = 1. Importantly, assembly archetypes do not always exist, as for the community types {HA, HB, HC, HE, HD}. This happens because the other community types share the assembly archetype M1(z) = 1, whereas HD has the assembly archetype M0(z) = 0 (red in figure 2d).
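The synthesis operation can be illustrated with a deliberately simplified model in which each community type’s category is reduced to a single chain of archetypes, ordered from the exact description up to the simplest rule. This is an assumption made for the sketch: the paper computes archetypes as a co-product of categories (Methods), and real categories need not be chains. The chains below are illustrative; the entries for HC and HD beyond what the text states (MC → MA, and HD ending at M0) are hypothetical.

```python
def synthesize(chains):
    """Return the most complex rule shared by every synthesis chain, or None.

    Each chain lists one community type's archetypes, ordered from the most
    complex (exact description) to the simplest (moving up the hierarchy).
    """
    if not chains:
        return None
    shared = set(chains[0]).intersection(*chains[1:])
    for rule in chains[0]:  # first shared rule met while moving up the hierarchy
        if rule in shared:
            return rule
    return None


# Illustrative chains, loosely following the example of figure 2d.
chains = {
    "H_E": ["M_E", "M_B", "M_A", "M_1"],
    "H_C": ["M_C", "M_A", "M_1"],  # hypothetical beyond M_C -> M_A
    "H_D": ["M_D", "M_0"],         # hypothetical; ends at the constant rule 0
}
print(synthesize([chains["H_C"], chains["H_E"]]))                 # M_A
print(synthesize([chains["H_C"], chains["H_E"], chains["H_D"]]))  # None
```

Returning `None` corresponds to the case in the text where a collection of community types has no assembly archetype, because one chain ends at M0 and the others at M1.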
That a collection of community types has no assembly archetype means that they do not share a common design principle.
Note that we can give a unique label to each of the possible assembly rules (or community types) for a pool of S species. Then, each environment θ can be associated with the label ℓ = ℓ(θ) of its assembly rule. Environments with the same assembly rule have the same label. With a slight abuse of notation, in what follows we write Mℓ and Hℓ for the ℓth assembly rule and community type, respectively.
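For S = 3 there are 2^3 − 1 = 7 non-empty compositions and hence 2^7 = 128 possible community types, matching the 128 community types of figure 2d. The sketch below enumerates them with a hypothetical labelling scheme: bit j of (label − 1) records whether the jth composition coexists. This scheme agrees with the convention visible in figure 3, where M1 is the constant rule 0 and M128 the constant rule 1, but it is not claimed to reproduce the paper’s intermediate labels (e.g. M30 or M31).

```python
from itertools import combinations


def all_compositions(S):
    """All non-empty species compositions of the pool {1, ..., S}."""
    species = range(1, S + 1)
    return [frozenset(c) for k in range(1, S + 1) for c in combinations(species, k)]


def all_community_types(S):
    """Map hypothetical labels 1..2^(2^S - 1) to community types.

    Bit j of (label - 1) says whether the j-th composition coexists, so label 1
    is the empty community type and the last label is the complete one.
    """
    comps = all_compositions(S)
    return {code + 1: frozenset(h for j, h in enumerate(comps) if (code >> j) & 1)
            for code in range(2 ** len(comps))}


types = all_community_types(3)
print(len(types))                      # 128
print(len(types[1]), len(types[128]))  # 0 7
```

Any bijection between labels and community types would serve the same purpose; the binary encoding is simply a compact choice.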
3. Results
3.1. Assembly archetypes exist in experimental communities under similar environments
We investigated the existence of assembly archetypes in an experimental microbial community formed by a pool of S = 3 bacterial species studied by Friedman et al. [21] (left in figure 3). In these experiments, all the possible species compositions were assembled under very similar in vitro conditions. To determine the observed community types, we used the experimental data of species assemblages to probabilistically parametrize a generalized Lotka–Volterra (gLV) model and predict species coexistence (Methods). This method provides an inferred species interaction matrix (left in figure 3a), as well as a probability distribution p(θ) of the species’ effective growth rates θ = (θ1, θ2, θ3), which phenomenologically represent the total effect of the environment (right in figure 3a). Consistent with the experimental design, the inference indicates that the experimental environments are very strongly concentrated around a single value of the effective growth rates (black dot in figure 3a and electronic supplementary material, figure S6). Consequently, the community exhibits a unique community type H31 = {{1}, {2}, {3}} and a unique assembly rule M31. This rule is itself the trivial assembly archetype of the community across all the (very similar) experimental environments, and it is consistent with the original findings.
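Once a gLV model is parametrized, predicting the community type for a given environment θ amounts to testing every composition for coexistence. The sketch below assumes the dynamics dNi/dt = Ni (θi + Σj Aij Nj) and a commonly used coexistence criterion: the subcommunity has a feasible (all abundances positive) and locally stable interior equilibrium. The actual criterion and the probabilistic inference are described in the Methods and may differ; the interaction matrix here is made up for illustration and is not the matrix inferred from the Friedman et al. data.

```python
from itertools import combinations

import numpy as np


def coexists(A, theta, composition, tol=1e-12):
    """Assumed criterion: feasible and locally stable interior equilibrium.

    gLV dynamics assumed: dN_i/dt = N_i * (theta_i + sum_j A[i, j] * N_j),
    with species indexed from 0 inside this function.
    """
    idx = sorted(composition)
    A_sub = A[np.ix_(idx, idx)]
    N_star = np.linalg.solve(A_sub, -theta[idx])  # interior equilibrium
    if np.any(N_star <= tol):
        return False                              # infeasible
    J = np.diag(N_star) @ A_sub                   # Jacobian at N_star
    return bool(np.all(np.linalg.eigvals(J).real < 0))


def community_type(A, theta):
    """All non-empty compositions (1-indexed) that coexist under growth rates theta."""
    S = len(theta)
    comps = [c for k in range(1, S + 1) for c in combinations(range(S), k)]
    return {frozenset(i + 1 for i in c) for c in comps if coexists(A, theta, c)}


# Hypothetical strong, symmetric competition with equal growth rates.
A = -np.array([[1.0, 2.0, 2.0],
               [2.0, 1.0, 2.0],
               [2.0, 2.0, 1.0]])
theta = np.array([1.0, 1.0, 1.0])
print(sorted(sorted(h) for h in community_type(A, theta)))  # [[1], [2], [3]]
```

With interspecific competition twice as strong as self-regulation, every multi-species equilibrium is feasible but unstable, so only single species persist. If a submatrix is singular, `np.linalg.solve` raises an error, which a full implementation would need to handle.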
Figure 3.
Assembly archetypes in two experimental ecological communities. On the left, a pool of S = 3 bacterial species from soil (Pseudomonas aurantiaca, Pseudomonas veronii and Serratia marcescens) assembled in tightly controlled environmental contexts [21] (i.e. test tubes with the same nutrient medium). On the right, a pool of S = 3 species from the Drosophila melanogaster gut (Lactobacillus plantarum, Lactobacillus brevis and Acetobacter orientalis) assembled across diverse environmental contexts [25] (different fly hosts). (a) Interaction matrix and experimental environmental contexts inferred from experimental data. This inference phenomenologically characterizes the environmental context θ = (θ1, θ2, θ3) as the species’ effective (intrinsic) growth rates. Due to the experimental set-up, all effective growth rates are positive. For easier visualization, we project them onto a ternary plot. The inference indicates that environments are strongly concentrated (black dot). Therefore, across the experimental environmental contexts, the community exhibits a single assembly rule M31. (b) Assembling the community across diverse environmental contexts generates not one but 24 different assembly rules, each one appearing in a portion of the environments. Here, different colours correspond to different assembly rules. Green represents rules having M128(z) = 1 as archetype, and purple represents rules having M1(z) = 0 as archetype. (c) The precision π(M) of rule M is the portion of environmental contexts in which it appears. The globality γ(M) of rule M is the portion of environmental contexts in which it is an assembly archetype. The rule with the highest precision is M31, with π(M31) = 0.31, which coincides with the rule observed in the experimental environment. This most precise rule occupies less than a third of all environments. Indeed, its globality is also γ(M31) = 0.31.
By contrast, there are two assembly archetypes with almost twice the globality: γ(M30) = 0.61 and γ(M128) = 0.59. However, and most importantly, note that there is no global assembly archetype across all environmental contexts. (d–f) Similar to (a–c), but for the community of Drosophila gut microbiota. The inferred environmental contexts (d) are more diverse than those in (a). These experimental environmental contexts generate six assembly rules (72, 91, 106, 124, 125, 128), and the inferred model predicts that six additional assembly rules are possible (e). (f) Rule M128 = 1 has both the highest precision and the highest globality (π = 0.56, γ = 0.99), indicating that this community has a single and very accurate assembly archetype. (g) Some of the rules observed in (a)–(f) and their morphisms.
However, our inferred model suggests that the existence of the above assembly archetype is rather accidental, occurring because of the low diversity of the experimental environments. Were the environments more diverse, the same community would exhibit many different assembly rules. To demonstrate this point, consider the set of environments in which species have positive effective growth rates whose sum equals that of the experimental environments (i.e. the ‘total’ intrinsic growth rate is kept constant, Methods). If we were to assemble this species pool across this set of environments, the inference indicates that we would observe not one but 24 different assembly rules (figure 3b). Each of these assembly rules occupies a fraction of the possible environments. To quantify this fraction, we define the precision π(M) of rule M as the fraction of all environments in which the rule occurs. More precise rules are those that exactly describe the community types across a wider range of environments. Note that an assembly rule might not exactly describe a community type under some environment, yet it might be its assembly archetype. Indeed, each observed assembly rule occupies a portion of the hierarchy of homologous rules, allowing us to determine their archetypes (electronic supplementary material, figure S12a). Thus, we define the globality γ(M) of rule M as the fraction of all environments under which the rule is an archetype. By definition, γ(M) ≥ π(M), because a rule is an archetype of itself. A rule has low precision but high globality when it does not exactly describe the assembly of the community across most environments but is nevertheless an assembly archetype in many of them.
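Both quantities are empirical fractions over sampled environments, so they are easy to estimate once each environment has been assigned its assembly rule and that rule’s archetypes. The sketch below samples growth-rate vectors uniformly from the set of positive vectors with a fixed total (a Dirichlet(1, …, 1) draw is uniform on the simplex) and computes π and γ from a list of observations. The observation labels (`"Mx"`, `"My"`, …) are hypothetical toy data, not results from the paper.

```python
import numpy as np


def sample_environments(total, n, S=3, seed=0):
    """Draw n growth-rate vectors uniformly from {theta > 0 : sum(theta) = total}."""
    rng = np.random.default_rng(seed)
    # Dirichlet(1, ..., 1) is uniform on the simplex; rescale to the fixed total.
    return total * rng.dirichlet(np.ones(S), size=n)


def precision_and_globality(observations, rule):
    """Precision and globality of `rule` over sampled environments.

    observations: one (observed rule, set of its archetypes) pair per environment;
    each observed rule is included in its own archetype set.
    """
    n = len(observations)
    precision = sum(observed == rule for observed, _ in observations) / n
    globality = sum(rule in archetypes for _, archetypes in observations) / n
    return precision, globality


# Hypothetical labels for four sampled environments.
observations = [("Mx", {"Mx", "Mw"}), ("Mx", {"Mx", "Mw"}),
                ("My", {"My", "Mw"}), ("Mz", {"Mz"})]
print(precision_and_globality(observations, "Mx"))  # (0.5, 0.5)
print(precision_and_globality(observations, "Mw"))  # (0.0, 0.75)
```

In the toy data, `"Mw"` never occurs exactly (π = 0) yet is an archetype in three of four environments (γ = 0.75): the low-precision, high-globality situation described above.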
When assembling the community across this set of environments, the most precise rule is still M31, but it has low precision (π(M31) = 0.31, figure 3c). Furthermore, γ(M31) = π(M31), meaning that the validity of M31 is limited to those environments where it describes the assembly exactly. This result demonstrates that communities can fail to have an accurate assembly rule across environments even in ‘simple’ scenarios (e.g. gLV population dynamics in which the environment only changes the species’ intrinsic growth rates).
Although no single rule can accurately describe the assembly of this community across environments, we still find that it has two accurate assembly archetypes: M30(z) = ¬z3 and M128(z) = 1 (figure 3c). The first assembly archetype, M30, is about twice as global as rule M31, it is simpler (κ(M30) = 2), and it is easier to interpret (i.e. a composition coexists as long as species 3 is not present). In fact, M30 is an archetype for rule M31 (purple in figure 3g and electronic supplementary material, figure S12a). The second archetype, M128, has a globality very similar to that of M30 (figure 3c). The assembly archetype M128 occurs only for environments where the effective growth rate of species 1 is the largest (green in figure 3c). Otherwise, the community has M30 as its assembly archetype (purple in figure 3c). This result is important because it shows that, even in simple scenarios, communities might not exhibit a unique design principle or assembly archetype when assembled across diverse environments. Therefore, given the expected high diversity of environments occurring in nature, this last result may suggest that finding assembly archetypes is unlikely under natural settings.
3.2. Assembly archetypes exist in experimental communities under diverse environments
To investigate whether assembly archetypes can also exist in experimental communities assembled across diverse environments, we revisited a community of commensal bacteria in the Drosophila melanogaster gut studied by Gould et al. [25]. This study experimentally assembled different species compositions fed into different fly hosts (each host representing one particular environment). Using these in vivo experimental data, we again inferred a species interaction matrix (left in figure 3d) and a probability density function p(θ) of effective growth rates (right in figure 3d). These inferred effective growth rates are more diverse than those for the in vitro community. Consequently, the in vivo community adopts six different assembly rules {M72, M91, M106, M124, M125, M128}, with rule M106 being the most frequent one. Then, as before, we used the inferred model to study the assembly rules that this community can adopt across the even more diverse set of environments with a constant total growth rate (figure 3e). We found that the community adopts six additional assembly rules. The observed assembly rules can be located in the hierarchy of homologous assembly rules, allowing us to synthesize their assembly archetype (electronic supplementary material, figure S12b). Across these diverse environments, the rule M128(z) = 1 is a single, remarkably global assembly archetype (γ(M128) = 0.99, figure 3f). That is, M128 is an almost perfect assembly archetype for this species pool, even when assembled across diverse environments. Indeed, note that M128 is an archetype for rule M106, which is the most frequent assembly rule observed in the experiments (green in figure 3g and electronic supplementary material, figure S12b). This result provides evidence of the existence of assembly archetypes in experimental communities assembled across diverse environments. The question is why the in vivo community can have an assembly archetype across diverse environments, whereas the in vitro community does not.
3.3. Communities with structured interactions render assembly archetypes
To address the above question, we studied a simple niche model of S = 3 competing species (Methods). The niche we consider is a circle (figure 4a). Each species occupies a location in this niche, with species located closer competing more strongly. Under these assumptions, the question becomes: are there configurations of species locations that render assembly archetypes more likely when assembling the community across diverse environments? Because the niche is invariant to rotations, we can fix the location of one species without loss of generality. We can then let the other two species’ locations vary across the niche. For each of those locations, we calculate the assembly rules generated when assembling the community across different environments represented by different intrinsic growth rates.
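The niche set-up can be sketched as follows: species sit on a circle, competition decays with circular distance, species 1 is pinned at the origin, and species 2 and 3 sweep a grid of positions. The circumference of 1, the Gaussian kernel and its `width` are hypothetical choices, since the functional form of the model is given in the Methods rather than here. In a full analysis, each competition matrix would enter the gLV model and be assembled across sampled growth rates to obtain its assembly rules.

```python
import numpy as np


def circular_distance(x, y):
    """Shortest distance between positions x and y on a circle of circumference 1."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)


def competition_matrix(locations, width=0.1):
    """Competition strength decaying with niche distance (hypothetical Gaussian kernel)."""
    S = len(locations)
    alpha = np.empty((S, S))
    for i in range(S):
        for j in range(S):
            d = circular_distance(locations[i], locations[j])
            alpha[i, j] = np.exp(-d ** 2 / (2 * width ** 2))
    return alpha


# Species 1 is fixed at the origin (rotational invariance); species 2 and 3 move
# over a grid of niche positions, giving one competition matrix per configuration.
grid = np.linspace(0.0, 1.0, 8, endpoint=False)
configurations = {(x2, x3): competition_matrix([0.0, x2, x3]) for x2 in grid for x3 in grid}
print(len(configurations))  # 64
```

Fixing species 1 at the origin is what reduces the configuration space to the two-dimensional maps shown in figure 4b,f.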
Figure 4.
Structured species interactions render assembly archetypes more likely. Results are for a niche model and a pool of S = 3 competing species assembled across different environments represented by different intrinsic growth rates chosen uniformly at random from . The grey panel at the bottom of the figure shows community types (bottom) with their corresponding assembly rules (top) and their associated labels. (a) Sketch of the niche model, with species located closer together competing more strongly. Without loss of generality, we fix the location of the first species at the origin. (b) Label of the most precise rule (top) and its precision (bottom) across all configurations of species locations. (c) Competition creates three zones around each species location. The intermediate zone is shown in grey. (d) Up to symmetry, four configurations, H1 to H3, of species locations have high precision (orange). Up to symmetry, configurations with low precision are arranged along three lines, L1 to L3 (blue). Dashed lines indicate the two symmetry axes. The four configurations with high precision arrange species into one (H1), two (H2 and H2’) and three (H3) groups. No species is in an intermediate (grey) zone. Here, species compete (strongly) only with others in the same group. Species can move in the niche in the direction of the arrows as long as they do not enter an intermediate zone, creating the orange blobs in (b). For example, in H2, species 3 can approach the group {1, 2}; in H2’, the group {2, 3} can move towards species 1. (e) Category of homologous hypergraphs observed across all configurations. (f) Label of the most global assembly archetype (top) and its globality (bottom) across all configurations of species locations. (g) Assembly archetypes are very accurate even if one species is in an intermediate zone (left), thus covering most of the possible configurations of species locations (red in the right panel). Archetypes with low globality occur when two species are in the same intermediate zone.
Up to two symmetries, there are two such configurations (L1 and L2 in the middle panel). Even in that case, the globality of archetypes is much higher than the precision of assembly rules.
Figure 4b shows the most precise rule found at each species configuration (top) and the maximum precision that such a rule attains (bottom). Configurations where rules attain high precision appear as (orange) blobs, and configurations with low precision appear as (blue) lines. Note also the two axes of symmetry in this panel. These blobs and lines can be explained by observing that competition between species creates three zones with different coexistence outcomes around each species’ location (figure 4c). First, close to the species’ location, there is a zone of no coexistence, because strong competition causes competitive exclusion. Second, far from the species’ location, weak competition produces a zone of coexistence. Finally, between these two zones lies an intermediate zone where either coexistence or no coexistence is possible, depending on the environment in which species are assembled. As shown below, the number of species located in intermediate zones controls the precision of assembly rules.
Assembly rules with high precision occur only when no species is located in an intermediate zone (i.e. when coexistence outcomes are largely environment independent). Up to symmetry, there are four different configurations producing assembly rules with high precision (orange in figure 4d). These configurations structure species into one (H1), two (H2 and H2’) or three (H3) groups, where species strongly interact only with other species in their own group. Up to symmetry, there are three lines characterizing configurations with low precision (blue in figure 4d). These configurations have in common that at least one species is in an intermediate zone. For example, line L1 characterizes the configurations where species 2 is located in the intermediate zone of species 1, while species 3 can be located anywhere in the niche. Configurations at the intersection of two lines occur when two species are located in an intermediate zone, producing assembly rules with very low precision. Note that the precision of an assembly rule rapidly deteriorates when species are not exactly structured into groups because one species enters an intermediate zone. For example, assembly rules attain a high precision in the configuration H3 of figure 4e. However, if species 2 moves away from species 1 into its intermediate zone, the precision drops by half to .
To calculate the assembly archetypes, we build the category of homologous hypergraphs observed across the different species configurations (figure 4e). Figure 4f shows the most global assembly archetype found at each species configuration (top) and the maximum globality that it attains (bottom). Because globality is at least as high as precision, rules with high globality (and thus accurate assembly archetypes) are also more likely when species are structured into groups. Importantly, however, accurate archetypes can exist even when this condition is not exactly satisfied. For example, accurate archetypes exist when one species is located in an intermediate zone, such as in the transition from configuration H3 to configuration H2 of figure 4g. The configurations with the lowest globality occur when two species are located in intermediate zones (green in figure 4g). However, even in such cases, an assembly rule’s globality remains much higher than its precision (i.e. for all species configurations, compared with some configurations having ).
Changing the niche’s ‘size’ using a niche spread parameter changes the species’ competition strength. For example, decreasing the niche spread makes the niche ‘bigger’. In turn, this decreases competition and makes the intermediate zones smaller, so the blue lines of species configurations with low precision become thinner (electronic supplementary material, figure S14). In this case, because a big circle looks like a line close to any of its points, the neighbourhood of the origin also behaves as if the niche were linear (electronic supplementary material, figure S14a). By contrast, increasing the niche spread enlarges the intermediate zones, making the blue lines thicker and overall decreasing the precision and globality of assembly rules (electronic supplementary material, figure S14b,c). Increasing the niche spread also tends to increase the number of different assembly rules observed for a given species configuration. In general, the largest difference between the globality and precision of an assembly rule occurs at the intersection of the lines L1 and L3 (bottom row of electronic supplementary material, figure S14). That is, the biggest advantage of assembly archetypes over assembly rules occurs when one species is located in the intermediate zone of each species. We find similar results in a niche model with predator–prey and mutualistic interactions (electronic supplementary material, figures S15–S17).
3.4. Assembly archetypes of the human gut microbiota under health and disease
The gut microbiota is the community of bacteria residing in our gastrointestinal tract. It plays a fundamental role in human health and disease [26]. Here, we leverage highly resolved clinical data analysed by Gibson et al. [27] to ask whether the human gut microbiota has assembly archetypes when assembled across healthy and unhealthy (specifically, with ulcerative colitis) hosts. We are interested in understanding similarities and differences between healthy and unhealthy conditions in terms of assembly rules, not in terms of the identities of the taxa assembled in each condition. Thus, we consider a pool of S = 16 taxa found in both conditions (Methods). For this taxa pool in healthy and unhealthy hosts, previous work [27] has shown that its dynamics can be explained by: (i) two different structured interaction matrices (figure 5a,b) and (ii) two different probability distributions for the taxa’s effective growth rates (taxa-average [0.15, 0.84] for health and [0.28, 0.96] for disease, figure 5c).
Figure 5.
Assembly archetypes in the human gut microbiota under health and disease. Results are for S = 16 species shared by the human gut microbiota under health and disease (ulcerative colitis). We use data and inference from Gibson et al. [27]. (a)–(c) The inference in [27] indicates that the gut microbiota dynamics in health and disease across subjects can be explained by two unique, highly structured interaction matrices, one for health (a) and one for disease (b). Here, blue indicates negative interactions and yellow positive ones. Explaining the microbiota dynamics requires two probability distributions for species’ effective growth rates, one for health (green in (c)) and the other for disease (pink in (c)). This panel shows the two principal components of these two probability distributions for easier visualization. Both distributions show a substantial variance in species’ effective growth rates representing differences in the host’s environmental contexts (average variance across species of 0.026 for health and 0.034 for disease). (d) Remarkably, the inference indicates that assembling this community across healthy hosts results in a unique assembly rule MH with complexity κ(MH) = 445. Similarly, assembling the community across unhealthy hosts results in a unique assembly rule MD with complexity κ(MD) = 325. We then applied our formalism to investigate whether both assembly rules share an assembly archetype, resulting in a category of homologous assembly rules (Methods). We find that the only assembly archetype they share is M0(z) = 0 of zero complexity, indicating that their assembly is similar only at this lowest complexity level. (e–h) Other assembly archetypes illuminate the similarities and differences in the assembly of the human gut microbiota under health and disease. The healthy gut has three low-complexity assembly archetypes (MH1, MH2 and MH3), each of complexity 14. The unhealthy gut has one assembly archetype (MD1) of complexity 20.
In both health and disease, the archetypes indicate that, to coexist, a species composition must not contain taxa 15 (Parabacteroides merdae) and 16 (Hungatella hathewayi/effluvii). In the healthy gut, coexistence of a species composition requires that either species {2, 8, 10} (MH1) or {5, 9, 14} (MH2) or {6, 7, 13} (MH3) are not present. Under disease, coexistence requires stronger conditions, in particular that taxon 12 (Porphyromonadaceae Parabacteroides, pink) is not present.
We found that assembling this community across healthy hosts generates a single community type HH, and thus a single assembly rule MH of complexity κ(MH) = 445. This result is consistent with previous findings indicating that the human gut has remarkably similar dynamics across healthy hosts [28]. Furthermore, the inference indicates that the community also adopts a single (but different) community type HD when assembled across unhealthy hosts, with an assembly rule MD of complexity κ(MD) = 325. That the community adopts a single community type when assembled across either healthy or unhealthy hosts is consistent with our previous finding that structured species interactions render assembly archetypes more likely.
To understand the similarities between the healthy and unhealthy assembly rules, we built the category of homologous assembly rules for them (Methods and figure 5d). This category shows that similarities are very scarce: they only share the archetype M0(z) = 0 of zero complexity. The only similarity between health and disease is that, when assembled, most taxa compositions tend not to coexist.
Above the zero-complexity level, the healthy and unhealthy gut assemblies differ. These differences are exactly encoded by their assembly rules MH and MD, but the high complexity of these rules makes their differences challenging to understand. We can circumvent this challenge by comparing their simplest assembly archetypes of non-zero complexity. Our framework synthesizes three healthy assembly archetypes {MH1, MH2, MH3} of complexity κ = 14. These three assembly archetypes coincide in that a necessary condition for the coexistence of a taxa composition is that neither Parabacteroides merdae (taxon 15) nor Hungatella hathewayi/effluvii (taxon 16) is present (green taxa in figure 5e–g). In addition, these healthy assembly archetypes indicate that coexistence in the healthy gut requires that at least one of three taxon trios is absent (black taxa in figure 5e–g).
Our framework also synthesizes a single assembly archetype MD1 for unhealthy hosts, of complexity κ = 20 (pink in figure 5d). This assembly archetype indicates that, in disease, coexistence requires that: (i) taxa 15 and 16 are not present (green taxa in figure 5h); (ii) a quartet of taxa is not present (black taxa in figure 5h); and (iii) Porphyromonadaceae Parabacteroides is not present (taxon 12, figure 5h). Coexistence in healthy hosts also requires Condition 1, but Condition 2 is stronger than its healthy counterpart because it involves more taxa. Importantly, Condition 3 is not required in healthy hosts. This last observation suggests that taxon 12 becomes highly competitive under disease, making coexistence more challenging. Indeed, clinical studies have found that bacteria of the genus Parabacteroides and family Porphyromonadaceae are implicated in the genesis of ulcerative colitis by depleting the intestine’s mucosal barrier [29,30].
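The complexities reported for these archetypes can be checked by hand. The Python sketch below counts variables and logic gates under an assumed convention (one unit per variable occurrence, per NOT gate, per OR gate inside a clause and per AND gate between clauses). It encodes our reading of the text, in which every taxon listed as not present becomes its own negated clause; the four taxa of the quartet in MD1 are not named in the text, so they appear as hypothetical placeholder labels.

```python
def clause_complexity(P, Q):
    """Variables + NOT gates + OR gates in one clause (P: plain literals, Q: negated)."""
    m = len(P) + len(Q)
    return 0 if m == 0 else m + len(Q) + (m - 1)

def complexity(clauses):
    """Sum of clause complexities plus the r - 1 AND gates joining r clauses."""
    if not clauses:
        return 0
    return sum(clause_complexity(P, Q) for P, Q in clauses) + len(clauses) - 1

def absent(*taxa):
    """One negated single-literal clause per taxon: 'taxon t is not present'."""
    return [(frozenset(), frozenset({t})) for t in taxa]

# Healthy archetypes (our reading): taxa 15 and 16 absent, plus one trio absent.
MH1 = absent(15, 16) + absent(2, 8, 10)
MH2 = absent(15, 16) + absent(5, 9, 14)
MH3 = absent(15, 16) + absent(6, 7, 13)

# Disease archetype: taxa 15, 16 and 12 absent, plus a quartet absent.
# "q1".."q4" are HYPOTHETICAL placeholders; the text does not name these taxa.
MD1 = absent(15, 16, 12) + absent("q1", "q2", "q3", "q4")

# Alternative reading of MH1: the trio must not be jointly present (one OR clause).
MH1_alt = absent(15, 16) + [(frozenset(), frozenset({2, 8, 10}))]

print([complexity(M) for M in (MH1, MH2, MH3)], complexity(MD1), complexity(MH1_alt))
# [14, 14, 14] 20 14
```

Under this convention both readings of MH1 give κ = 14, so the reported complexity alone does not distinguish between them.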
4. Discussion
Assembly archetypes synthesize the different assembly rules obtained when assembling the same species pool across different environmental contexts, revealing what they all fundamentally share. We have studied experimental and clinical microbial communities and provided direct evidence of assembly archetypes in ecological communities. However, unlike the assembly archetypes for limb types, assembly archetypes of ecological communities are challenging to discover by direct inspection using our human senses. Instead, assembly archetypes are revealed only through the ‘mathematical synthesis’ that the language of category theory provides. It is in this other sense that, paraphrasing David Hilbert’s perspective on the use of mathematics [31, Part I], assembly archetypes are idealized objects that ‘complete’ our partial view of reality. Assembly archetypes are also a concrete realization of René Thom’s programme to understand biological organization from a mathematical viewpoint (e.g. [32, p. 40] and [33]). We also note that previous works have extended the idea of finding ‘archetypes’ to other biological entities such as genes, nucleotides, physiological processes and behavioural patterns [34,35]. To the best of our knowledge, our work is the first application of the notion of archetype to community ecology.
Our study suggests that assembly archetypes are more likely under structured species interactions. More precisely, we found that rules with high precision tend to occur only when species interactions make coexistence outcomes environment independent. This conclusion makes sense because a community assembly can achieve high precision only if it remains similar across environments. Importantly, assembly archetypes can occur under more general conditions than when species are strictly structured into groups (where species interact strongly with species within their own group and weakly with species in other groups). However, additional work is necessary to test the validity of this result in more detailed mathematical models. For example, such work could consider more species, more detailed niche models of species interactions such as food webs [36] or mutualistic networks [37,38], different niche utilization functions, or species with different kinds of interactions. Indeed, using clinical data with S = 16 structured taxa, we have shown that our theoretical and experimental results can remain valid in such more realistic cases. We conjecture that a ‘concentration phenomenon’ similar to an asymptotic equipartition property may occur: increasing the number of species could cause the partition of environments by assembly rule to converge to a configuration where one or a few assembly rules occupy most of the space, while the remaining assembly rules occupy a vanishing or zero fraction of it. The clinical data are an extreme example of this conjecture because, according to our findings, a single assembly rule occupies the entire environment. If this conjecture is true, it implies the counterintuitive result that assembly archetypes can become more likely as the number of species increases. Notably, the absence of assembly archetypes with high globality can be as informative as their presence.
Namely, the lack of global assembly archetypes implies there is no common design principle in the assembly rules of the community across the considered set of environmental contexts.
Assembly rules with high precision are those with high structural stability [19,24,39]. That is, the assembly rule of the species pool remains unchanged despite significant environmental changes. Having assembly rules with high structural stability is sufficient for assembly archetypes because an assembly rule’s globality is at least as large as its precision. However, as figure 3b illustrates, this condition is not necessary. Assembly archetypes can exist despite changes in the assembly rules as long as these changes ‘align’ with a single archetype. In this sense, the concept of assembly archetype generalizes the notion of structural stability. For limb types, allowing their assembly rules to change is essential for innovation in body plans and for organisms’ evolvability [40,41]. Importantly, not all changes are possible because the biological substratum constrains the possible ‘directions’ in which assembly rules can change. These constraints are responsible for generating archetypes [2]. We expect an analogous process to occur in ecology at the level of community types. Namely, when subject to environmental changes, the population dynamics of ecological communities should constrain but not eliminate changes in their assembly rules. It remains an open question how ecological communities in nature achieve a trade-off between maintaining robustness against environmental changes and allowing changes in their assembly rules to generate innovations such as new species compositions.
Motivated by developmental biology, our work adopts the CNF as the syntax for assembly rules. With this syntax, an assembly rule consists of a collection of clauses, each characterizing one condition that a species composition needs to satisfy to coexist in environment θ. A species composition coexists in the environment if it satisfies every clause. The assembly rule thus acts like a ‘filter’ characterizing the conditions that a species composition needs to satisfy to coexist in that environment, consistent with previous approaches [14,42]. The complexity of an assembly rule in the CNF syntax increases if either (i) it has more clauses or (ii) its clauses contain more variables or logic gates. These two conditions are ecologically meaningful. Namely, the first means that a species composition needs to satisfy more requirements to coexist in environments with more complex assembly rules. The second means that each necessary condition for coexistence is harder to describe in environments with more complex assembly rules. We also note that our category-based framework can be used to synthesize assembly archetypes using any other syntax for assembly rules. Finally, ‘mining’ rules from data is a well-established paradigm in machine learning [43], used for instance to mine associations in ecological systems [44]. In future work, rule mining could be combined with our formalism to make the synthesis of assembly archetypes more efficient for larger communities.
From an applied viewpoint, the synthesis of archetypes could help understand the role of specific taxa in community assembly across different environmental contexts. For example, this could help identify taxa with a keystone role under diverse contexts [45] or how changes in the environmental context can drive ecological succession events [46]. Our analysis of the human gut microbiota under health or ulcerative colitis is just a preliminary step in this direction. Additional work is required to evaluate assembly archetypes in other diseases, like Crohn’s disease or recurrent Clostridium difficile infections, and with larger patient cohorts. When moving in this direction, a central challenge is obtaining informative enough data to infer the community types [24]. Machine learning methods could play a fundamental role in circumventing this challenge [47]. Another critical challenge is understanding how changes in environmental context and assembly rules are causally related.
More broadly, according to the Peircean viewpoint [48], the whole scientific endeavour can be conceptualized using only two processes: analysis and synthesis. Analysis starts from the general (e.g. the ‘species’ archetype) and dissects it to discover its particular properties (e.g. which species types are more important for a system). Synthesis is the inverse process: it starts from the particular (i.e. a collection of types) and aims to identify the general (i.e. their archetype). Types are the partial, fragmented observations of reality that we have access to. Archetypes ‘glue’ these fragmented observations into a coherent whole or general principle: the unity behind the multiplicity. With the unprecedented amount of available data and computational resources, the time is ripe for ecological research to revisit and develop tools of mathematical synthesis [49,50], as they could hold the key to discovering the general design principles that underlie the complex context-dependent systems we observe in nature.
5. Methods
Here, we give a summary of our methods, referring the reader to electronic supplementary material, Notes for details.
5.1. Community types
Consider a pool of S species that can be assembled in the set $\Theta = \{\theta_1, \ldots, \theta_p\}$, p ≥ 1, of environmental contexts. Let $2^{\mathcal{S}}$ denote the power set of $\mathcal{S} = \{1, \ldots, S\}$ (i.e. the set of all subsets of $\mathcal{S}$), representing all potential species compositions that can coexist. When a species composition $h \in 2^{\mathcal{S}}$ is assembled in an environment $\theta \in \Theta$, we assume that its coexistence is a binary outcome: the composition either coexists or not. For concreteness and following previous works [24,51,52], we use the classic notion of ‘permanence’ as the coexistence criterion. Thus, a coexisting composition may have abundances that converge to one or multiple interior equilibria or that exhibit limit cycles or strange attractors. We emphasize, however, that our framework can use any other coexistence criterion as long as it is binary (e.g. requiring that species’ abundances remain above a certain predefined level). We then define the community type as the set $H_\theta = \{h \in 2^{\mathcal{S}} : h \text{ coexists in environment } \theta\}$. A community type can be visualized as a coexistence hypergraph [24], with vertices corresponding to species and hyperedges corresponding to species compositions that can coexist under that environment (definition 1 in electronic supplementary material, note 1.1).
5.2. Assembly rules and their realization as logic circuits
A potential species composition h can be equivalently represented by a binary vector $z = z(h) \in \{0, 1\}^S$ using the convention $z_i = 1$ if and only if the ith species is included in h. Therefore, any community type $H_\theta$ can be equivalently represented by a Boolean function $M_\theta : \{0,1\}^S \to \{0,1\}$, where $M_\theta(z) = 1$ if and only if the species composition encoded by z coexists in environment θ. $M_\theta$ is the assembly rule for $H_\theta$ in the sense that it specifies the conditions that a species composition z needs to satisfy to coexist in the environment θ [14]. In this case, we also say that $M_\theta$ realizes $H_\theta$ (definition 2 in electronic supplementary material, note 1.2).
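The encoding of compositions as binary vectors, and the Boolean function associated with a community type, can be sketched in Python as follows. The community type below is a hypothetical S = 3 example chosen for illustration, not one inferred in this study, and the function names are our own.

```python
from itertools import product

def composition_to_vector(h, S):
    """Encode a species composition h (set of 1-based indices) as z in {0,1}^S."""
    return tuple(1 if i in h else 0 for i in range(1, S + 1))

def vector_to_composition(z):
    """Inverse map: the set of species i with z_i = 1."""
    return frozenset(i + 1 for i, zi in enumerate(z) if zi == 1)

def rule_from_community_type(H):
    """Boolean function M(z) = 1 iff the composition encoded by z belongs to H."""
    coexisting = {frozenset(h) for h in H}

    def M(z):
        return 1 if vector_to_composition(z) in coexisting else 0

    return M

# Hypothetical community type for S = 3: every single species coexists,
# and {1, 2} is the only larger composition that coexists.
S = 3
H = [{1}, {2}, {3}, {1, 2}]
M = rule_from_community_type(H)
table = {z: M(z) for z in product((0, 1), repeat=S)}
for z, value in sorted(table.items()):
    print(z, value)
```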
As the syntax for writing assembly rules, we choose logic circuits written in the minimal CNF (electronic supplementary material, note 1.3). An assembly rule is thus written as a logic circuit combining the variables z = {z1, z2, …, zS} with the logic gates AND ‘∧’, OR ‘∨’ and NOT ‘¬’. We are interested in the minimal CNF representation of an assembly rule, which uses the minimum number of logic gates and variables (see example 4 in electronic supplementary material, note 1.3). Efficient algorithms to compute such a minimal CNF representation are implemented in standard software packages, such as Mathematica or PyEDA. The minimal CNF form of an assembly rule M reads as

$$M(z) = C_1(z) \wedge C_2(z) \wedge \cdots \wedge C_r(z),$$

where r ≥ 1 is the number of clauses, and each clause Ck(z) takes the form

$$C_k(z) = \Big(\bigvee_{i \in P_k} z_i\Big) \vee \Big(\bigvee_{j \in Q_k} \neg z_j\Big).$$

The sets $P_k \subseteq \{1, 2, \ldots, S\}$ and $Q_k \subseteq \{1, 2, \ldots, S\}$ specify the input variables appearing without and with a NOT gate in the kth clause, respectively. Note that if clause k has a fixed 1 as input, then Ck(z) = 1 for all z and the clause can be removed from the assembly rule. Similarly, if clause k has a fixed 0 as input, then this input can be removed from Ck.
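A minimal Python sketch of how a rule in this form acts as a filter on compositions, assuming each clause is stored as the pair (Pk, Qk) of 1-based species indices. The representation and function names are illustrative choices, not taken from the study’s code, and the example rule (z1 ∨ z2) ∧ ¬z3 is hypothetical.

```python
def clause_value(P, Q, z):
    """C_k(z): OR of z_i for i in P and of NOT z_j for j in Q (1-based indices).
    A clause with no inputs evaluates to 0, the usual convention for an empty OR."""
    return int(any(z[i - 1] == 1 for i in P) or any(z[j - 1] == 0 for j in Q))

def cnf_value(clauses, z):
    """M(z) = C_1(z) AND ... AND C_r(z); a rule with no clauses evaluates to 1."""
    return int(all(clause_value(P, Q, z) for P, Q in clauses))

# Hypothetical rule for S = 3: (z1 OR z2) AND (NOT z3), i.e. "species 3 is absent
# and at least one of species 1 or 2 is present".
M = [({1, 2}, set()), (set(), {3})]
for z in [(1, 0, 0), (0, 1, 1), (0, 0, 0)]:
    print(z, cnf_value(M, z))   # 1, then 0, then 0
```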
Given another assembly rule M′ in the minimal CNF form, we write M′ ⊂ M if it is possible to obtain M′ from M by (i) removing some input variables from a clause, (ii) removing some NOT gates from a clause or (iii) removing a complete clause. The first operation corresponds to eliminating some elements from the set Pk. The second operation corresponds to eliminating some elements from the set Qk. The third operation corresponds to eliminating the clause Ck(z) itself (i.e. eliminating Pk and Qk). See electronic supplementary material, note 1.3 for additional discussion.
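Checking M′ ⊂ M amounts to assigning each clause of M′ to a distinct clause of M whose sets Pk and Qk contain it; clauses of M left unassigned are the ones removed by operation (iii). The sketch below performs that assignment as a bipartite matching with augmenting paths. It is our own illustration under the (Pk, Qk) representation, and the example rule is hypothetical.

```python
def clause_contained(small, big):
    """True if clause `small` is obtained from `big` by deleting literals."""
    (P1, Q1), (P2, Q2) = small, big
    return set(P1) <= set(P2) and set(Q1) <= set(Q2)

def is_contained(M_small, M_big):
    """Check M_small ⊂ M_big: each clause of M_small must come from a *distinct*
    clause of M_big; unmatched clauses of M_big count as removed clauses.
    Implemented as bipartite matching with augmenting paths (Kuhn's algorithm)."""
    owner = {}  # index of a clause in M_big -> index of the M_small clause using it

    def assign(i, visited):
        for j, big in enumerate(M_big):
            if j in visited or not clause_contained(M_small[i], big):
                continue
            visited.add(j)
            # Take clause j if free, or re-route its current owner elsewhere.
            if j not in owner or assign(owner[j], visited):
                owner[j] = i
                return True
        return False

    return all(assign(i, set()) for i in range(len(M_small)))

# Hypothetical rule M = (z1 OR NOT z3) AND (NOT z2 OR NOT z4) as (P_k, Q_k) pairs.
M = [({1}, {3}), (set(), {2, 4})]
print(is_contained([(set(), {4})], M))                  # True
print(is_contained([(set(), {2}), (set(), {4})], M))    # False: both need clause 2
```

The second call fails because ¬z2 ∧ ¬z4 would require splitting the single clause (¬z2 ∨ ¬z4) into two, which none of the three removal operations allows.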
The convention of writing assembly rules as minimal CNF circuits allows us to define the complexity κ(M) of the assembly rule M as the minimal number of variables and logic gates needed to write it (definition 3 in electronic supplementary material, note 1.3). The CNF imitates our use of natural language to describe organisms’ limb types at different complexity levels by adding or removing clauses (electronic supplementary material, figure S1). More complex assembly rules have more complex clauses and/or a larger number of them.
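A short sketch of this complexity count for a rule already in minimal CNF. The counting convention (one unit per variable occurrence, per NOT gate, per OR gate within a clause and per AND gate between clauses) is our assumption about definition 3, and assigning complexity 0 to the constant rules follows the zero-complexity archetype M0(z) = 0 discussed in the results.

```python
def clause_complexity(P, Q):
    """Variables + NOT gates + OR gates in one non-empty clause."""
    m = len(P) + len(Q)          # variable occurrences in the clause
    return m + len(Q) + (m - 1)  # one NOT per negated literal, m - 1 ORs

def complexity(clauses):
    """kappa(M) for a rule already in minimal CNF, under the assumed convention.
    A rule with an empty clause is the constant 0 and a rule with no clauses is
    the constant 1; both are assigned complexity 0."""
    if not clauses or any(len(P) + len(Q) == 0 for P, Q in clauses):
        return 0
    # r clauses are joined by r - 1 AND gates
    return sum(clause_complexity(P, Q) for P, Q in clauses) + len(clauses) - 1

# Hypothetical rule (z1 OR NOT z3) AND (NOT z2): 3 variables, 2 NOTs, 1 OR, 1 AND.
print(complexity([({1}, {3}), (set(), {2})]))   # 7
```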
5.3. The category of homologous community types
We quantify how well a given assembly rule M′ describes a community type H using the prediction error

$$e(M', H) = \frac{1}{2^S} \sum_{z \in \{0,1\}^S} \big| M'(z) - M(z) \big|,$$

where M realizes H. Note that e(M′, H) = 0 if and only if M′ realizes H.
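A sketch of the prediction error for small pools, obtained by enumerating all 2^S compositions. Normalizing by 2^S (so that the error is the fraction of misclassified compositions) is an assumption of this sketch, and the example rules are hypothetical.

```python
from itertools import product

def cnf_value(clauses, z):
    """Evaluate a CNF rule given as (P_k, Q_k) pairs of 1-based indices at z."""
    return int(all(
        any(z[i - 1] == 1 for i in P) or any(z[j - 1] == 0 for j in Q)
        for P, Q in clauses
    ))

def prediction_error(M_candidate, M_exact, S):
    """Fraction of the 2^S compositions on which M_candidate disagrees with the
    rule M_exact that realizes H (normalization by 2^S is assumed)."""
    zs = list(product((0, 1), repeat=S))
    mistakes = sum(cnf_value(M_candidate, z) != cnf_value(M_exact, z) for z in zs)
    return mistakes / len(zs)

M_exact = [({1, 2}, set()), (set(), {3})]   # (z1 OR z2) AND NOT z3
M_simpler = [(set(), {3})]                   # first clause removed
M_zero = [(set(), set())]                    # the constant rule M0(z) = 0
print(prediction_error(M_simpler, M_exact, 3), prediction_error(M_zero, M_exact, 3))
# 0.125 0.375
```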
Consider now the set of all the different assembly rules that exist for a pool of S species. For each such rule M′, we can calculate its complexity κ(M′) and its prediction error. From these data, we identify the assembly rules that are Pareto-optimal in the sense that they provide the best complexity–prediction error trade-off (definition 4 in electronic supplementary material, note 2.1). This approach has been used before in machine learning for discovering fundamental laws or rules from data, such as Newton’s Law of Motion [53] or functional responses in species interactions [54]. With this notion, we define the assembly archetypes of H as the Pareto-optimal assembly rules that satisfy the additional condition that simpler assembly archetypes must be contained in more complex ones, starting with the assembly rule M realizing H itself as the first archetype (definition 5 in electronic supplementary material, note 2.1). Each archetype represents the best description of the community assembly rule at its complexity level. We illustrate this construction in example 5 of electronic supplementary material, note 2.1 and figure S2a–d.
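Selecting the Pareto-optimal rules from a list of (complexity, prediction error) pairs can be sketched as below. The candidate values are illustrative: they are what one obtains by hand for the hypothetical rule M = (z1 ∨ z2) ∧ ¬z3 with S = 3 under the counting and error conventions assumed in these sketches. The additional nesting condition that defines archetypes is not enforced here.

```python
def pareto_front(candidates):
    """Keep (name, complexity, error) triples not dominated by any other triple.
    Lower complexity and lower prediction error are both preferred."""
    front = []
    for name, k, e in candidates:
        dominated = any(
            k2 <= k and e2 <= e and (k2 < k or e2 < e)
            for _, k2, e2 in candidates
        )
        if not dominated:
            front.append((name, k, e))
    return sorted(front, key=lambda item: item[1])

# Illustrative candidates derived from M = (z1 OR z2) AND NOT z3 (hypothetical).
candidates = [
    ("0", 0, 0.375),
    ("not z3", 2, 0.125),
    ("z1 or z2", 3, 0.375),
    ("z1 and not z3", 4, 0.125),
    ("M", 6, 0.0),
]
for name, k, e in pareto_front(candidates):
    print(f"{name}: complexity {k}, error {e}")
```

Here "z1 or z2" and "z1 and not z3" are dropped because a simpler rule achieves the same or a lower error.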
From the set of assembly archetypes for the community type H, we build the category $\mathcal{C}(H)$ that encodes how the different archetypes can be compared (see definition 6 in electronic supplementary material, note 2.2 for a formal statement). Namely, the objects of $\mathcal{C}(H)$ are the assembly archetypes of the community type H. Morphisms in $\mathcal{C}(H)$ represent removing variables, logic gates or clauses from the minimal CNF representation of an archetype to obtain simpler ones (see electronic supplementary material, figure S3 for an example of what a morphism represents). We illustrate this construction in example 6 of electronic supplementary material, note 2.2 and figure S2e.
5.4. A synthesis operation to calculate assembly archetypes
The constructed categories for the different community types allow us to define a synthesis operation that determines whether an assembly archetype exists for a given collection {H1, …, Hn} of community types and, if so, calculates it. To define this synthesis operation, let $\mathcal{C}(H_i)$ denote the category constructed above for the community type $H_i$, and denote the union category of these categories as $\mathcal{C}_\cup = \mathcal{C}(H_1) \cup \cdots \cup \mathcal{C}(H_n)$ (see definition 7 in electronic supplementary material, note 2.3). The objects of $\mathcal{C}_\cup$ are the union of the objects of each category $\mathcal{C}(H_i)$, and a morphism exists between two objects if it exists in at least one category $\mathcal{C}(H_i)$. Then, we define the assembly archetype M*(H1, …, Hn) of the community types {H1, …, Hn} as the co-product

$$M^*(H_1, \ldots, H_n) = M_{H_1} \sqcup M_{H_2} \sqcup \cdots \sqcup M_{H_n}, \qquad (5.1)$$

calculated in the category $\mathcal{C}_\cup$, where $M_{H_i}$ denotes the assembly rule realizing $H_i$, if such a co-product exists. If the co-product does not exist, then the community types do not have an assembly archetype. See definition 8 in electronic supplementary material, note 2.4 for details.
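In a category with at most one morphism between any two objects (a thin category, which we assume here for illustration), the co-product of M1, …, Mn is an object C that receives a morphism from every Mi and has a morphism to every other object with that property. Because morphisms here point from a rule to its simplifications, C is the most complex simplification shared by all the Mi. The sketch below computes it from an explicit list of morphisms, closing them under composition; the object names and arrows are hypothetical.

```python
from itertools import product

def reachability(objects, arrows):
    """Reflexive-transitive closure of the arrows: identities plus all composites."""
    reach = {(a, a) for a in objects} | set(arrows)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(reach), repeat=2):
            if b == c and (a, d) not in reach:
                reach.add((a, d))
                changed = True
    return reach

def coproduct(objects, arrows, sources):
    """Co-product of `sources` in a thin category, or None if it does not exist."""
    reach = reachability(objects, arrows)
    # Objects receiving a morphism from every source (the co-cones).
    cocones = [c for c in objects if all((s, c) in reach for s in sources)]
    # The co-product is a co-cone with a morphism to every other co-cone.
    for c in cocones:
        if all((c, d) in reach for d in cocones):
            return c
    return None

# Hypothetical objects: MA and MB realize two community types; arrows point
# from a rule to a simplification of it.
objects = ["MA", "MB", "X", "Y", "Z", "M0"]
arrows = [("MA", "X"), ("MB", "X"), ("X", "Y"), ("Y", "M0"), ("MA", "Z"), ("Z", "M0")]
print(coproduct(objects, arrows, ["MA", "MB"]))   # X
```

If two common simplifications exist and neither maps to the other, the function returns None, mirroring the case in which the community types have no assembly archetype.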
5.5. Experimental communities, inference and estimating community types
We studied three experimental species pools of bacteria based on previously reported experimental data of assemblies of species compositions [21,25,27] (see electronic supplementary material, note 3.1 for descriptions). For each species pool, we used the data of different species assemblies to probabilistically parametrize a gLV model allowing us to estimate the community types that are likely observed under the experimental environmental contexts. Specifically, for a pool of S species, the gLV model takes the form
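$$\frac{\mathrm{d}x_i(t)}{\mathrm{d}t} = x_i(t)\Big(\theta_i + \sum_{j=1}^{S} a_{ij}\, x_j(t)\Big), \qquad i = 1, \ldots, S, \qquad (5.2)$$

where xi(t) denotes the abundance of species i at time t ≥ 0. The gLV model has as parameters the species’ intrinsic growth rates $\theta = (\theta_1, \ldots, \theta_S)$ and their interaction matrix $A = [a_{ij}] \in \mathbb{R}^{S \times S}$. We note that, despite its simplicity, the gLV model was found to adequately model the dynamics of two of the three experimental communities we studied [21,27].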
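For readers who want to experiment with the gLV model, a dependency-free sketch integrating it with a fixed-step, classical fourth-order Runge–Kutta scheme follows. The step size, time horizon and two-species parameters are illustrative assumptions; for stiff or long simulations an adaptive solver would be preferable.

```python
def glv_rhs(x, theta, A):
    """Right-hand side of dx_i/dt = x_i (theta_i + sum_j a_ij x_j)."""
    S = len(x)
    return [x[i] * (theta[i] + sum(A[i][j] * x[j] for j in range(S))) for i in range(S)]

def simulate_glv(x0, theta, A, dt=0.01, steps=5000):
    """Integrate the gLV model with classical fourth-order Runge-Kutta steps."""
    x = list(x0)
    for _ in range(steps):
        k1 = glv_rhs(x, theta, A)
        k2 = glv_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], theta, A)
        k3 = glv_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], theta, A)
        k4 = glv_rhs([xi + dt * ki for xi, ki in zip(x, k3)], theta, A)
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

# Hypothetical two-species example with weak, symmetric competition.
theta = [1.0, 1.0]
A = [[-1.0, -0.5],
     [-0.5, -1.0]]
x_final = simulate_glv([0.1, 0.2], theta, A)   # integrates up to t = 50
print(x_final)
```

With these parameters both abundances approach the interior equilibrium x* = −A⁻¹θ = (2/3, 2/3).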
We combined the experimental data of species assemblages with a Bayesian inference method to estimate the gLV parameters under the experimental conditions of each study (electronic supplementary material, note 3.2). For the Friedman et al. [21] and Drosophila [25] communities, this resulted in one estimated interaction matrix A and one probability density function p(θ) for the intrinsic growth rates that phenomenologically represents the diversity of environments in which species are assembled. For the gut microbiota [27], we used the interaction matrices and probability density functions for intrinsic growth rates inferred in the original study. In this case, there is one pair (A, p(θ)) for health and another for disease.
Given a pair (A, p(θ)) parametrizing the gLV model, we obtained the observed community types by sampling θ ∼ p(θ) and determining the coexistence of each species composition using Jansen’s permanence criterion [51,55] (electronic supplementary material, note 3.3). This resulted in a collection of community types that are likely to be observed under the given experimental conditions of each study.
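Jansen’s criterion involves a linear feasibility problem over the saturated boundary equilibria and is not reproduced here. As a simpler, swapped-in check, the sketch below tests only feasibility: a classic result for Lotka–Volterra systems is that a permanent system has a unique interior equilibrium, so a positive solution of A_h x = −θ_h for the sub-community h is necessary, but not sufficient, for h to coexist in the permanence sense. The parameter values are hypothetical.

```python
from itertools import combinations

def solve_linear(M, b, tol=1e-12):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def feasible_compositions(theta, A):
    """Non-empty compositions h (1-based) whose sub-system has a positive equilibrium,
    i.e. A_h x = -theta_h has a solution with every x_i > 0 (necessary condition only)."""
    S = len(theta)
    result = []
    for size in range(1, S + 1):
        for h in combinations(range(S), size):
            A_h = [[A[i][j] for j in h] for i in h]
            x = solve_linear(A_h, [-theta[i] for i in h])
            if x is not None and all(xi > 0 for xi in x):
                result.append(frozenset(i + 1 for i in h))
    return result

# Hypothetical pool: species 3 competes strongly with species 1 and 2.
theta = [1.0, 1.0, 1.0]
A = [[-1.0, -0.5, -0.5],
     [-0.5, -1.0, -0.5],
     [-1.5, -1.5, -2.0]]
print(sorted(sorted(h) for h in feasible_compositions(theta, A)))
# [[1], [1, 2], [2], [3]]
```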
For each experimental system, we construct the set of environments as follows. Let $\bar{\theta} = \mathbb{E}_{p(\theta)}\big[\sum_{i=1}^{S} \theta_i\big]$ denote the expected (i.e. average) total of the inferred effective growth rates, where p(θ) is the distribution inferred from the experimental data. Then we define $\Theta = \{\bar{\theta}\, r : r \in \Delta^{S}\}$, where $\Delta^{S} = \{r \in \mathbb{R}^{S}_{\geq 0} : \sum_{i=1}^{S} r_i = 1\}$ is the S-dimensional unit simplex. Therefore, for any vector θ ∈ Θ, the sum of effective growth rates equals the average total effective growth rate observed in the experiment.
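Drawing environments from this set can be sketched as follows: a point r uniformly distributed on the unit simplex is obtained by normalizing S independent Exp(1) draws (equivalently, a flat Dirichlet sample) and is then scaled by the average total growth rate. Uniform sampling over the simplex is our assumption for illustration, and the function name and parameters are hypothetical.

```python
import random

def sample_environment(theta_total, S, rng):
    """Return theta = theta_total * r, with r uniform on the unit simplex in R^S.
    Normalized i.i.d. Exp(1) draws give a flat Dirichlet(1, ..., 1) sample."""
    draws = [rng.expovariate(1.0) for _ in range(S)]
    total = sum(draws)
    return [theta_total * d / total for d in draws]

rng = random.Random(0)
theta = sample_environment(2.0, 3, rng)
print(theta, sum(theta))   # the components sum to 2.0 up to rounding
```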
5.6. A niche model shows that structured species interactions render assembly archetypes more likely
Our model is an instance of the classic niche model of species competition [56] (see details in electronic supplementary material, note 4). We consider an (abstract) niche that is one-dimensional, finite and without boundaries, thus topologically equivalent to a circle $\mathbb{S}^1$. Each species i is assigned a niche location $\mu_i \in \mathbb{S}^1$. We chose a circular niche because it has no distinguished location, so that all species have an equal number of neighbours on each side. For example, if the niche represents the different resources that can be consumed, a circular niche means there is no special diet. A circular niche also allows us to avoid ‘edge effects’ during the analysis [57]. As noted before, we can choose the location of the first species as μ1 = 0 without loss of generality.
To describe the niche use of each species, we use a von Mises distribution centred at the species’ location μi and with niche spread 1/k > 0. The von Mises distribution is analogous to a normal distribution on the circle [58]. We use the same niche spread for all species. Under the above conditions, the interaction strength of species j on species i can be calculated as
5.3 |
where I0(k) is the modified Bessel function of the first kind of order zero. The value of αij depends on the species locations μi and μj. Decreasing the niche spread (k → ∞) renders the niche ‘bigger’, whereas increasing it (k → 0) renders the niche ‘smaller’. Note that a sufficiently big circle looks like a line in a small neighbourhood of any of its points (formally, they are locally homeomorphic). Therefore, for small values of the niche spread, the outcomes of our analysis for the circular niche model coincide with the outcomes for a linear niche model.
The dynamics of each species take the form of the gLV model (5.2), with the entry aij of the interaction matrix given by aij = −wij. Note that this yields aii = −1. To fix the species’ intrinsic growth rates θ = (θ1, θ2, θ3), we first choose a positive vector r in the unit simplex as before and then set θi = ri. In electronic supplementary material, note 4.3, we describe in detail how we extend our niche model to consider predator–prey or mutualistic interactions.
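The niche-overlap interaction can be sketched as below. We assume, for illustration, that the interaction strength is the overlap of the two von Mises niche-use curves normalized by a species’ overlap with itself. This gives the closed form I0(2k|cos((μi − μj)/2)|)/I0(2k) and a unit self-interaction (consistent with aii = −1), but it may differ in detail from equation (5.3). I0 is evaluated by its power series to keep the sketch dependency-free.

```python
import math

def bessel_i0(x, rel_tol=1e-16, max_terms=500):
    """Modified Bessel function I0(x) from its series sum_m ((x/2)^2)^m / (m!)^2."""
    q = (x / 2.0) ** 2
    term, total = 1.0, 1.0
    for m in range(1, max_terms):
        term *= q / (m * m)
        total += term
        if term < rel_tol * total:
            break
    return total

def niche_overlap(mu_i, mu_j, k):
    """Overlap of two von Mises niche-use curves (concentration k, spread 1/k),
    normalized so that a species' overlap with itself equals 1 (assumption)."""
    return bessel_i0(2.0 * k * abs(math.cos((mu_i - mu_j) / 2.0))) / bessel_i0(2.0 * k)

def interaction_matrix(mus, k):
    """a_ij = -w_ij with w_ij the normalized overlap; the diagonal is exactly -1."""
    return [[-niche_overlap(mi, mj, k) for mj in mus] for mi in mus]

# Hypothetical locations on the circle (radians), first species at the origin.
A = interaction_matrix([0.0, 0.5, 2.0], k=2.0)
for row in A:
    print([round(a, 3) for a in row])
```

Larger k (a smaller niche spread) makes off-diagonal overlaps decay faster with distance, which is the weaker competition described above.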
5.7. Synthesizing assembly archetypes in the human gut microbiota
For the pool of S = 16 taxa in the human gut microbiota shared under health and disease [27], there exist about $10^{19728}$ (on the order of $2^{2^{16}}$) different community types, which is much larger than the estimated number of atoms in the universe. This fact makes it infeasible to synthesize assembly archetypes by constructing the union category of all possible community types. To circumvent this challenge, we designed a random sampling algorithm that estimates the assembly archetypes for large species pools (electronic supplementary material, notes 3.4 and 3.5). In brief, given the calculated community types HH and HD that the pool exhibits under healthy and unhealthy hosts, we first calculate their (exact) assembly rules MH and MD written in the minimal CNF syntax. Then, our random sampling algorithm estimates their assembly archetypes by randomly removing variables, logic gates or whole clauses from the exact assembly rules MH and MD. We describe this algorithm in detail in electronic supplementary material, note 3.3. In electronic supplementary material, figure S11, we test the performance of this random sampling algorithm, showing that it exactly recovers the assembly archetypes in the case of S = 3 species.
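The random sampling idea can be illustrated with a minimal sketch that draws random simplifications of an exact CNF rule by deleting whole clauses and individual literals, so that every sample is contained in the original rule (a clause emptied of all its literals becomes the constant 0). The deletion probabilities, function name and example rule are hypothetical, and this is not the algorithm described in the electronic supplementary material. One natural way to use such samples is to score each by its complexity and prediction error and keep the Pareto-optimal ones; that step is omitted here.

```python
import random

def random_simplification(clauses, rng, p_clause=0.3, p_literal=0.3):
    """Return a random rule contained in `clauses` (a list of (P_k, Q_k) pairs).
    Each clause is dropped with probability p_clause; in each kept clause, every
    plain and every negated literal is dropped independently with probability p_literal."""
    simplified = []
    for P, Q in clauses:
        if rng.random() < p_clause:
            continue                                                   # remove whole clause
        P_kept = frozenset(i for i in P if rng.random() >= p_literal)  # remove variables
        Q_kept = frozenset(j for j in Q if rng.random() >= p_literal)  # remove negated ones
        simplified.append((P_kept, Q_kept))
    return tuple(simplified)

# Hypothetical exact rule (z1 OR NOT z3) AND (NOT z2 OR NOT z4).
M_exact = [(frozenset({1}), frozenset({3})), (frozenset(), frozenset({2, 4}))]
rng = random.Random(1)
samples = {random_simplification(M_exact, rng) for _ in range(200)}
print(len(samples), "distinct simplifications sampled")
```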
Contributor Information
Serguei Saavedra, Email: sersaa@mit.edu.
Marco Tulio Angulo, Email: mangulo@im.unam.mx.
Ethics
This work did not require ethical approval from a human subject or animal welfare committee.
Data accessibility
The code supporting the results is available from the GitHub repository: https://github.com/hugofloresar/Assembly_archetypes/tree/0.1.0 [59] and is archived from the Zenodo repository: https://zenodo.org/records/10154700 [60].
Supplementary material is available online [61].
Declaration of AI use
We have not used AI-assisted technologies in creating this article.
Authors' contributions
H.F.-A.: investigation, software; O.A.-C.: formal analysis, investigation, methodology; S.S.: conceptualization, formal analysis, funding acquisition, investigation, methodology, project administration, writing – original draft, writing – review and editing; M.T.A.: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, software, supervision, visualization, writing – original draft, writing – review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
Conflict of interest declaration
We declare we have no competing interests.
Funding
M.T.A. acknowledges the financial support provided by CONACyT grant no. A1-S-13909 and PAPIIT 104915. Funding to S.S. was provided by NSF grant no. DEB-2024349.
References
1. Kirschner MW, Gerhart JC. 2005. The plausibility of life: resolving Darwin’s dilemma. New Haven, CT: Yale University Press.
2. Alberch P. 1989. The logic of monsters: evidence for internal constraint in development and evolution. Geobios 22, 21-57. (doi:10.1016/S0016-6995(89)80006-3)
3. Hall BK. 2012. Homology: the hierarchical basis of comparative biology. San Diego, CA: Academic Press.
4. Shubin N. 2008. Your inner fish: a journey into the 3.5-billion-year history of the human body. New York, NY: Knopf Doubleday Publishing Group.
5. Young NM. 2017. Integrating ‘evo’ and ‘devo’: the limb as model structure. Integr. Comp. Biol. 57, 1293-1302. (doi:10.1093/icb/icx115)
6. Shubin N, Tabin C, Carroll S. 2009. Deep homology and the origins of evolutionary novelty. Nature 457, 818-823. (doi:10.1038/nature07891)
7. Diamond JM. 1975. Assembly of species communities. Ecol. Evol. Commun. 6, 342-444.
8. Clements FE. 1916. Plant succession: an analysis of the development of vegetation, vol. 242. Washington, DC: Carnegie Institution of Washington.
9. Park T. 1954. Experimental studies of interspecies competition II. Temperature, humidity, and competition in two species of Tribolium. Physiol. Zool. 27, 177-238. (doi:10.1086/physzool.27.3.30152164)
10. Weiher E, Keddy P. 2001. Ecological assembly rules: perspectives, advances, retreats. Cambridge, UK: Cambridge University Press.
11. Vellend M. 2016. The theory of ecological communities (MPB-57). Princeton, NJ: Princeton University Press.
12. Fukami T. 2015. Historical contingency in community assembly: integrating niches, species pools, and priority effects. Annu. Rev. Ecol. Evol. Syst. 46, 1-23. (doi:10.1146/annurev-ecolsys-110411-160340)
13. Drake JA. 1990. Communities as assembled structures: do rules govern pattern? Trends Ecol. Evol. 5, 159-164. (doi:10.1016/0169-5347(90)90223-Z)
14. Keddy PA. 1992. Assembly and response rules: two goals for predictive community ecology. J. Veg. Sci. 3, 157-164. (doi:10.2307/3235676)
15. Temperton VM, Hobbs RJ, Nuttle T, Halle S. 2004. Assembly rules and restoration ecology: bridging the gap between theory and practice. Washington, DC: Island Press.
16. Lessard JP, Dunn R, Sanders N. 2009. Temperature-mediated coexistence in temperate forest ant communities. Insectes Soc. 56, 149-156. (doi:10.1007/s00040-009-0006-4)
17. Aponte C, García LV, Marañón T. 2013. Tree species effects on nutrient cycling and soil biota: a feedback mechanism favouring species coexistence. Forest Ecol. Manag. 309, 36-46. (doi:10.1016/j.foreco.2013.05.035)
18. Gotelli NJ, McCabe DJ. 2002. Species co-occurrence: a meta-analysis of JM Diamond’s assembly rules model. Ecology 83, 2091-2096. (doi:10.1890/0012-9658(2002)083[2091:SCOAMA]2.0.CO;2)
19. Deng J, Taylor W, Saavedra S. 2022. Understanding the impact of third-party species on pairwise coexistence. PLoS Comput. Biol. 18, e1010630. (doi:10.1371/journal.pcbi.1010630)
20. Gilpin ME, Carpenter MP, Pomerantz MJ. 1986. The assembly of a laboratory community: multispecies competition in Drosophila. In Community ecology (eds Diamond J, Case TJ), pp. 23-40. New York, NY: Harper and Row.
21. Friedman J, Higgins LM, Gore J. 2017. Community structure follows simple assembly rules in microbial microcosms. Nat. Ecol. Evol. 1, 1-7. (doi:10.1038/s41559-017-0109)
22. Deng J, Angulo MT, Saavedra S. 2021. Generalizing game-changing species across microbial communities. ISME Commun. 1, 22. (doi:10.1038/s43705-021-00022-2)
23. Lawvere FW, Schanuel SH. 2009. Conceptual mathematics: a first introduction to categories. Cambridge, UK: Cambridge University Press.
24. Angulo MT, Kelley A, Montejano L, Song C, Saavedra S. 2021. Coexistence holes characterize the assembly and disassembly of multispecies systems. Nat. Ecol. Evol. 5, 1091-1101. (doi:10.1038/s41559-021-01462-8)
25. Gould AL, et al. 2018. Microbiome interactions shape host fitness. Proc. Natl Acad. Sci. USA 115, E11951-E11960. (doi:10.1073/pnas.1809349115)
26. Cho I, Blaser MJ. 2012. The human microbiome: at the interface of health and disease. Nat. Rev. Genet. 13, 260-270. (doi:10.1038/nrg3182)
27. Gibson TE, et al. 2021. Intrinsic instability of the dysbiotic microbiome revealed through dynamical systems inference at scale. bioRxiv. (doi:10.1101/2021.12.14.469105)
28. Bashan A, Gibson TE, Friedman J, Carey VJ, Weiss ST, Hohmann EL, Liu YY. 2016. Universality of human microbial dynamics. Nature 534, 259-262. (doi:10.1038/nature18301)
29. Imhann F, et al. 2018. Interplay of host genetics and gut microbiota underlying the onset and clinical presentation of inflammatory bowel disease. Gut 67, 108-119. (doi:10.1136/gutjnl-2016-312135)
30. Alipour M, et al. 2016. Mucosal barrier depletion and loss of bacterial diversity are primary abnormalities in paediatric ulcerative colitis. J. Crohn's Colitis 10, 462-471. (doi:10.1093/ecco-jcc/jjv223)
31. Benacerraf P, Putnam H. 1984. Philosophy of mathematics: selected readings. Cambridge, UK: Cambridge University Press.
32. Thom R. 1969. Topological models in biology. Topology 8, 313-335. (doi:10.1016/0040-9383(69)90018-4)
33. Thom R. 1988. Structure et fonction en biologie aristotélicienne. In Biologie théorique, pp. 247-266. Paris, France: Solignac.
34. Brigandt I. 2003. Homology in comparative, molecular, and evolutionary developmental biology: the radiation of a concept. J. Exp. Zool. B: Mol. Dev. Evol. 299, 9-17. (doi:10.1002/jez.b.36)
35. Wagner GP. 2007. The developmental genetics of homology. Nat. Rev. Genet. 8, 473-479. (doi:10.1038/nrg2099)
36. Williams RJ, Martinez ND. 2000. Simple rules yield complex food webs. Nature 404, 180-183. (doi:10.1038/35004572)
37. Saavedra S, Reed-Tsochas F, Uzzi B. 2009. A simple model of bipartite cooperation for ecological and organizational networks. Nature 457, 463-466. (doi:10.1038/nature07532)
38. Medeiros LP, Garcia G, Thompson JN, Guimarães PR Jr. 2018. The geographic mosaic of coevolution in mutualistic networks. Proc. Natl Acad. Sci. USA 115, 12017-12022. (doi:10.1073/pnas.1809088115)
39. Saavedra S, Rohr RP, Bascompte J, Godoy O, Kraft NJ, Levine JM. 2017. A structural approach for understanding multispecies coexistence. Ecol. Monogr. 87, 470-486. (doi:10.1002/ecm.1263)
40. Arnold S, et al. 1989. How do complex organisms evolve. In Complex organismal functions: integration and evolution in vertebrates (eds DB Wake, GJ Roth), pp. 403-433. Chichester, UK: Wiley & Sons.
41. Alberch P. 1991. From genes to phenotype: dynamical systems and evolvability. Genetica 84, 5-11. (doi:10.1007/BF00123979)
42. Hobbs RJ, Norton DA. 2004. Ecological filters, thresholds, and gradients in resistance to ecosystem reassembly. In Assembly rules and restoration ecology: bridging the gap between theory and practice (eds RJ Hobbs, S Halle, T Nuttle, VM Temperton), pp. 72-95. Washington, DC: Island Press.
43. Hipp J, Güntzer U, Nakhaeizadeh G. 2000. Algorithms for association rule mining: a general survey and comparison. ACM SIGKDD Explor. Newsl. 2, 58-64. (doi:10.1145/360402.360421)
44. Chng KR, et al. 2020. Metagenome-wide association analysis identifies microbial determinants of post-antibiotic ecological recovery in the gut. Nat. Ecol. Evol. 4, 1256-1267. (doi:10.1038/s41559-020-1236-0)
45. Wang XW, Sun Z, Jia H, Michel-Mata S, Angulo MT, Dai L, He X, Weiss ST, Liu YY. 2023. Identifying keystone species in microbial communities using deep learning. bioRxiv 2023-03.
46. Versluis DM, Schoemaker R, Looijesteijn E, Muysken D, Jeurink PV, Paques M, Geurts JM, Merks RM. 2022. A multiscale spatiotemporal model including a switch from aerobic to anaerobic metabolism reproduces succession in the early infant gut microbiota. mSystems 7, e00446-22. (doi:10.1128/msystems.00446-22)
47. Michel-Mata S, Wang XW, Liu YY, Angulo MT. 2022. Predicting microbiome compositions from species assemblages through deep learning. iMeta 1, e3. (doi:10.1002/imt2.3)
48. Zalamea F. 2012. Synthetic philosophy of contemporary mathematics. Cambridge, MA: MIT Press.
49. Rosen R. 1958. The representation of biological systems from the standpoint of the theory of categories. Bull. Math. Biophys. 20, 317-341. (doi:10.1007/BF02477890)
50. Varenne F. 2013. The mathematical theory of categories in biology and the concept of natural equivalence in Robert Rosen. Rev. Hist. Sci. 66, 167-197. (doi:10.3917/rhs.661.0167)
51. Jansen W. 1987. A permanence theorem for replicator and Lotka-Volterra systems. J. Math. Biol. 25, 411-422. (doi:10.1007/BF00277165)
52. Sigmund K. 1995. Darwin’s ‘circles of complexity’: assembling ecological communities. Complexity 1, 40-44. (doi:10.1002/cplx.6130010109)
53. Schmidt M, Lipson H. 2009. Distilling free-form natural laws from experimental data. Science 324, 81-85. (doi:10.1126/science.1165893)
54. Chen Y, Angulo MT, Liu YY. 2019. Revealing complex ecological dynamics via symbolic regression. BioEssays 41, 1900069. (doi:10.1002/bies.201900069)
55. Schreiber SJ. 2000. Criteria for C^r robust permanence. J. Differ. Equ. 162, 400-426. (doi:10.1006/jdeq.1999.3719)
56. MacArthur R, Levins R. 1967. The limiting similarity, convergence, and divergence of coexisting species. Am. Nat. 101, 377-385. (doi:10.1086/282505)
57. Scheffer M, van Nes EH. 2006. Self-organized similarity, the evolutionary emergence of groups of similar species. Proc. Natl Acad. Sci. USA 103, 6230-6235. (doi:10.1073/pnas.0508024103)
58. Best D, Fisher NI. 1979. Efficient simulation of the von Mises distribution. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28, 152-157. (doi:10.2307/2346732)
59. Flores-Arguedas H, Antolin-Camarena O, Saavedra S, Angulo MT. 2023. Assembly archetypes in ecological communities. GitHub repository. (https://github.com/hugofloresar/Assembly_archetypes/tree/0.1.0)
60. Flores-Arguedas H, Antolin-Camarena O, Saavedra S, Angulo MT. 2023. Assembly archetypes in ecological communities. Zenodo. (https://zenodo.org/records/10154700)
61. Flores-Arguedas H, Antolin-Camarena O, Saavedra S, Angulo MT. 2023. Assembly archetypes in ecological communities. Figshare. (doi:10.6084/m9.figshare.c.6927509)