Abstract
Phylogenetic networks describe the evolution of a set of taxa for which reticulate events have occurred at some point in their evolutionary history. Of particular interest is when the evolutionary history between a set of just three taxa has a reticulate event. In molecular phylogenetics, substitution models can model the process of evolution at the genetic level, and the case of three taxa with a reticulate event can be modeled using a substitution model on a semi-directed graph called a 3-sunlet. We investigate a class of substitution models called group-based phylogenetic models on 3-sunlet networks. In particular, we investigate the discrete geometry of the parameter space and how this relates to the dimension of the phylogenetic variety associated to the model. This enables us to give a dimension formula for this variety for general group-based models when the order of the group is odd.
Introduction
Phylogenetic networks are directed graphs that describe the evolution of a set of taxa for which reticulate events have occurred. Such events, which include horizontal gene transfer and hybridization, are increasingly being discovered to have occurred between taxa, and the development of methods to reconstruct phylogenetic networks from molecular sequence data is an active area of research. It is therefore important that phylogenetic networks and the models that are placed on them are well understood.
In this work, we focus on phylogenetic network-based substitution models. These are latent-variable Markov models where the state space is a set of biological molecules (usually the four nucleic acids ), and along each edge in the network, a transition matrix gives the probabilities of each possible substitution occurring along that edge (see Gross and Long (2018), Nakhleh (2011) for further details). In this work we focus on a family of Markov models called group-based models, so called because the state space of the Markov process is identified with a finite abelian group. In molecular phylogenetics, there are several nucleotide substitution models that are group-based models, such as the Jukes-Cantor (JC) model, the Kimura 2-parameter (K2P) model, and the Kimura 3-parameter (K3P) model. In all these cases, the state space of the four nucleic acids is identified with the Klein-four group .
For Markov models on phylogenetic networks, the joint probabilities of observing particular patterns at the leaves of the network have polynomial parameterizations in terms of the numerical parameters of the model, i.e., the substitution rates along each edge and reticulation edge parameters. This makes them amenable to algebraic study, and, in particular, the space of all possible joint probabilities at the leaves is the intersection of an algebraic variety with the probability simplex.
In this work we are concerned with the dimension of the variety associated to a phylogenetic network and group-based model. The dimension of such varieties, in this case, t-varieties, is an interesting geometric question in its own right, but also has applications to identifiability and phylogenetic network inference. While Gross et al. (2024) establishes the dimension for most group-based phylogenetic network models, the most elusive case has been when the network contains 3-cycles. In this work, we focus on the smallest phylogenetic network containing a 3-cycle, which is called a 3-sunlet. While these networks remain the most elusive to understand mathematically, they are perhaps the most important cycles to understand biologically. Indeed, it is assumed that 3-cycles are the most common cycle motif in true phylogenetic networks, since they indicate hybridization or lateral gene transfer between two very closely related taxa, whereas larger cycles would indicate such reticulation events between less closely related taxa, which, in many cases, are assumed to be rare. Understanding the dimensions of 3-sunlets can help us establish the statistical property of identifiability, as demonstrated in Proposition 3.9. It can also help us understand how 3-sunlet models are geometrically embedded within larger sunlet models, helping interpret residuals when using algebraic methods as in Barton et al. (2022), Martin et al. (2023) or determining the most appropriate penalty term when using Bayesian methods.
For group-based models on phylogenetic trees, after a transformation, the parameterization of the model is monomial (Evans and Speed 1993; Székely et al. 1993), and thus the corresponding variety is a toric variety. These models have been well studied (see e.g. Sturmfels and Sullivant (2005)). The parameterization of the model on a sunlet network has a combinatorial interpretation, and, after the same transformation, is described by binomials. Here we look at the 3-cycle case in depth, and establish a dimension result for group-based models for groups of odd order.
Theorem 1.1
Let G be a finite abelian group of odd order , and let be the 3-sunlet network under the general group-based model given by G, with corresponding phylogenetic network variety . Then the affine dimension of is given by
We call the quantity the expected dimension of the model Gross et al. (2024), and Theorem 1.1 agrees with the conjecture given in Gross et al. (2024). As we will see in Section 4, we believe that this conjecture holds for all finite abelian groups (of order at least 5), but as we discuss in detail in Section 5, the proof strategy that we use here is not easily modified for groups of even order, and thus those cases remain open.
Whilst in analysis of DNA sequences the state space of our models is the four nucleic acids and therefore has order four, odd-order state spaces are common in other analyses. For example, in codon models such as the Goldman-Yang model (Goldman and Yang 1994), triplets of nucleotides, called codons, code for amino acids, and are the states of the model. There are 64 possible nucleotide triplets, but often only 61 are used, since three are ‘stop’ codons, which are not modeled. In amino acid models, the 20 amino acids of the standard genetic code are often recoded to a smaller set. One example is Dayhoff six-state recoding (Dayhoff et al. 1978), where chemically related amino acids are grouped together to form six states. This grouping is based on the substitution rates in the PAM250 matrix, with amino acids with high substitution rates between them grouped together. However, substitution rates between amino acids can depend on factors such as secondary structure and functional domain (Goldman et al. 1998), and so methods have been developed to recode based on replacement patterns from, for example, domain-specific databases (Kosiol et al. 2004). This introduces the possibility of substitution models with odd-order state spaces.
The paper is organized as follows. In Section 2, we describe phylogenetic networks and the parameterization map for 3-sunlet networks for the general group-based model. We also outline the tropicalization method from Draisma (2008) that we use to determine a lower bound for the dimension. The method is rooted in tropical geometry and leads to hyperplane arrangements on spaces of weight vectors. We close Section 2 with observations about the chambers of these hyperplane arrangements for 3-sunlet networks. In Section 3, we prove the main theorem of the paper (Theorem 1.1), which gives a formula for the dimension for 3-sunlet networks under general group-based models of odd order greater than or equal to 5. We end the section with a partial identifiability result (Proposition 3.9) for general group-based models of odd order. In Section 4, we investigate the dimension for small finite abelian groups (both even and odd) through computational experiments. In particular, we explore chambers of hyperplane arrangements to highlight the difficulty in finding appropriate weight vectors that can be used to establish dimensions of 3-sunlets. Section 5 closes the paper with a discussion about the challenges involved in understanding 3-sunlets, and more generally, networks with 3-cycles.
Background
A (rooted binary) phylogenetic network is a rooted, acyclic, directed graph where each non-root internal vertex has in-degree one and out-degree two, or in-degree two and out-degree one. We refer to the internal vertices with in-degree one as tree vertices and the internal vertices with in-degree two as reticulation vertices. The leaves of the phylogenetic network (the vertices of in-degree 1 and out-degree 0) are labelled by a set of taxa, for which we will always use the set of the first n positive integers . The two edges directed into a reticulation vertex are called reticulation edges. A phylogenetic network is said to be level-1 if, in the undirected skeleton of , no two cycles share an edge. For an example of a level-1 phylogenetic network, see Figure 1. A semi-directed phylogenetic network is a mixed graph that is obtained from a phylogenetic network by suppressing the root vertex and un-directing all non-reticulation edges. Semi-directed networks generalize the notion of unrooted trees, and, for group-based models, if two phylogenetic networks have the same underlying semi-directed topology, then their corresponding varieties are also equal (Gross et al. 2024, Lemma 2.2). Since we are concerned with the dimensions of the corresponding varieties, we will only consider semi-directed phylogenetic networks.
Fig. 1.

A rooted, level-1 phylogenetic network. This network contains a single 3-cycle and a single 4-cycle. Reticulation edges are drawn with dashed lines
The fundamental building blocks of level-1 semi-directed phylogenetic networks are unrooted trees and k-sunlet networks, which are the minimal semi-directed phylogenetic networks containing a k-cycle. In this paper, we focus on 3-sunlet networks, which are the minimal semi-directed phylogenetic networks containing a triangle (i.e., a 3-cycle). A 3-sunlet can be obtained from the phylogenetic network in Figure 1 by restricting the network to the leaves labelled by taxa 1, 3, and 6 and suppressing vertices of degree 2 (restriction is discussed in more detail towards the end of Section 3). Phylogenetic networks with triangles are thought to be among the most common phylogenetic networks, because hybridization usually occurs between closely related species. Despite this, 3-sunlet networks are the least understood of the sunlet networks.
We place a group-based model of evolution on a level-1 semi-directed phylogenetic network by arbitrarily assigning direction to all undirected edges (i.e., non-reticulation edges), and choosing a finite abelian group G and a subgroup B of the automorphism group of G, denoted . We note that arbitrarily assigning directions to all undirected edges results in the same variety as rooting the semi-directed network even if the chosen edge directions are not consistent with any placement of a root vertex (see the proof of Lemma 2.2 in Gross et al. (2024)). The group G is identified with the state space of the model, and the group B encodes additional constraints that the transition matrices must adhere to. When we choose we call the model the general group-based model for G. For example, the Kimura 3-parameter (K3P) model is the general group-based model for . The Jukes-Cantor (JC) model is the group-based model with and , which we identify with , the symmetric group on three elements. In between these two we have the Kimura 2-parameter model (K2P), where B is a subgroup isomorphic to . Group-based models have the desirable property that for any phylogenetic tree there exists a Fourier transformation that transforms expressions for the marginal probabilities of observations at the leaves into monomial expressions (see e.g., (Sullivant 2018, Chapter 15) for an overview). For level-1 phylogenetic networks, that same transformation significantly simplifies the expressions for the marginal probabilities, although they are not monomial (Gross and Long 2018, Prop 4.2).
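To make the role of the Fourier transformation concrete, the following minimal sketch builds a transition matrix for the general group-based model on the Klein four-group (the K3P setting) and checks that the group characters diagonalize it. The probabilities in `f` are made-up illustrative values, and all function names are hypothetical; this is a sketch of the underlying mechanism, not the derivation used in the cited references.

```python
from fractions import Fraction
from itertools import product

# Klein four-group Z/2Z x Z/2Z: pairs of bits with componentwise addition mod 2.
G = list(product((0, 1), repeat=2))

def add(g, h):
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

def character(k, g):
    # Characters of Z/2Z x Z/2Z take the values +1 and -1.
    return (-1) ** (k[0] * g[0] + k[1] * g[1])

# Illustrative (made-up) substitution probabilities f(g); they sum to 1.
f = {(0, 0): Fraction(7, 10), (0, 1): Fraction(1, 10),
     (1, 0): Fraction(3, 20), (1, 1): Fraction(1, 20)}

# Group-based transition matrix: entry (g, h) depends only on h - g,
# which equals h + g in this group.
M = {(g, h): f[add(g, h)] for g in G for h in G}

# Fourier coefficients of f, one per character.
fourier = {k: sum(character(k, g) * f[g] for g in G) for k in G}

def is_eigenvector(k):
    # Check entrywise that M * chi_k == fourier[k] * chi_k.
    return all(
        sum(M[(g, h)] * character(k, h) for h in G) == fourier[k] * character(k, g)
        for g in G
    )

rows_sum_to_one = all(sum(M[(g, h)] for h in G) == 1 for g in G)
diagonalized = all(is_eigenvector(k) for k in G)
```

Because every character is an eigenvector, the four Fourier coefficients act as the parameters of the edge in transformed coordinates, which is why products of such parameters appear in the parameterization.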
We are interested in identifying the semi-directed phylogenetic network from observed data on the leaves. As noted above, for group-based models, the root of the network is not identifiable. For certain group-based models, identifiability results are known (see e.g. Gross and Long (2018), Gross et al. (2021), Hollering and Sullivant (2021), Cummings et al. (2023)), but a general result for all group-based models has yet to be determined. Understanding the dimension of the variety associated to a phylogenetic network and model can assist in determining identifiability. A step in this direction was taken in Gross et al. (2024), and some identifiability results were obtained for arbitrary group-based models. Here, one limiting factor was being unable to determine the dimension of the varieties associated to the 3-sunlet network.
The 3-sunlet networks
The 3-sunlet is the semi-directed network topology of a simple 3-leaf phylogenetic network with a single cycle. It poses a particular problem to phylogeneticists, because under the most commonly used 4-state group-based models (JC, K2P, and K3P), the reticulation vertex is not identifiable from data at the leaves of the network (see e.g. (Gross et al. 2021, Lemma 1)). Thus many of the identifiability results obtained for these models require the phylogenetic networks to be ‘triangle free’ (Gross et al. 2021).
Mathematically, the 3-sunlet is a semi-directed graph whose skeleton consists of a single 3-cycle and one leaf vertex adjacent to each vertex in the cycle. One of the vertices in the cycle is the reticulation vertex, and the two cycle edges adjacent to this vertex are reticulation edges. The reticulation edges are the only directed edges and they are directed towards the reticulation vertex (see Figure 2 for an example). By removing either of the edges or in Figure 2 and undirecting the remaining edge, we obtain an unrooted phylogenetic tree (with a vertex of degree 2), which we denote by and respectively.
Fig. 2.
(Left) The semi-directed network topology of the 3-sunlet network with taxa labels 1,2, and 3. (Right) A directed 3-sunlet network
In order to simplify our exposition, we begin with the phylogenetic network parameterization in the transformed coordinates. Readers interested in the derivation of this parameterization from the substitution model can consult (Gross et al. 2024, Section 2) for a full explanation. In order to specify the parameterization we must direct the undirected edges of the semi-directed network topology. By (Gross et al. 2024, Lemma 2.2) we may arbitrarily choose these directions, and so for the remainder of this work we denote by the directed 3-sunlet network in Figure 2 (right).
Let G be a finite abelian group and let B be a subgroup of . Let be the set of B-orbits of G, and define . Note that when we have . A consistent leaf-labelling is a triple satisfying . For a fixed G there are exactly consistent leaf-labellings. We give a basis indexed by B-orbits and edges of , and denote the basis element corresponding to the B-orbit [g] and edge as . We give a basis indexed by consistent leaf-labellings . Then the parameterization map (in transformed coordinates) is given by
where is the coefficient of in w. Here, the first term comes from the phylogenetic tree (obtained from by removing the edge ). Each superscript is given by the edge-labelling of the edge in the right hand diagram in Figure 2 (see Gross et al. (2024) for further details). Similarly, the second term comes from the phylogenetic tree (obtained from by removing the edge ). The phylogenetic variety of and (G, B) is defined as the Zariski closure of the image of , denoted
and this is the object that we study. Since the map is homogeneous, the variety is a projective variety. However, we will mostly remain in affine space and consider the affine cone.
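The indexing set of the target space can be made explicit with a short enumeration. The sketch below lists the consistent leaf-labellings for a cyclic group Z/nZ, reading the consistency condition as g1 + g2 + g3 = 0 (which matches the four labellings listed in Example 2.4); restricting to cyclic groups and the function name are assumptions made for illustration.

```python
from itertools import product

def consistent_leaf_labellings(n):
    """Triples (g1, g2, g3) in Z/nZ with g1 + g2 + g3 = 0 (mod n)."""
    return [
        (g1, g2, g3)
        for g1, g2, g3 in product(range(n), repeat=3)
        if (g1 + g2 + g3) % n == 0
    ]

labels_z2 = consistent_leaf_labellings(2)
labels_z3 = consistent_leaf_labellings(3)
```

Since g3 is determined by g1 and g2, the number of labellings returned is always n squared.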
Observe that the map is a morphism of affine varieties. It has comorphism given by
where we think of the coordinate ring of as being generated by variables for consistent leaf-labellings , and we think of the coordinate ring of as being generated by variables for and . Under this definition, the vanishing ideal of , which we denote , is given by . This ideal and, in particular, its generating sets are important objects of study in mathematical phylogenetics, as they can be used for model selection (Barton et al. 2022; Cummings et al. 2024; Martin et al. 2023) and to establish identifiability (Gross and Long 2018; Hollering and Sullivant 2021; Gross et al. 2021; Cummings et al. 2023). While some polynomials are known for some group-based models such as CFN (Cummings et al. 2024; Cummings et al. 2023), and for certain networks, such as the 4-sunlet, polynomials have been calculated for the JC and K2P (Martin et al. 2023) and K3P (Cummings and Hollering 2026) models, generating sets of are not known in general.
Determining Dimension
In Section 3, we give a dimension result for the 3-sunlet network and the general group-based model for groups of odd order by following the approach taken in Gross et al. (2024). Here, we introduce the concepts and objects we need. First, for a level-1 phylogenetic network and group-based model (G, B), the variety is defined as the Zariski closure of the image of a polynomial map (as defined above for the 3-sunlet). Considering the number of free parameters in the domain of the map gives us an upper bound on the dimension.
Lemma 2.1
(Gross et al. 2024, Proposition 4.2) Let be the 3-sunlet network, G a finite abelian group and B a subgroup of with . Then
As a result of Lemma 2.1, to prove Theorem 1.1, it is sufficient for us to give a lower bound on the dimension. We do this by exhibiting a Jacobian matrix of sufficient rank of the tropicalization of the parameterization.
Let be the parameterization map of the 3-sunlet network under the general group-based model for an abelian group G, and for a consistent leaf-labelling , let be the component of mapping onto the -coordinate of . Since is a polynomial map, each is a polynomial, and we can define
where denotes the set of exponent vectors corresponding to the monomials in the polynomial expression for . Then is the map with components . Now for at which is differentiable there exists a matrix such that for all in an open neighbourhood of (in fact is the Jacobian of at ). Then we have a lower bound on the dimension of the affine variety given by
This is a specific case of Corollary 2.3 in Draisma (2008). For full details we recommend the reader consult Draisma (2008) and (Gross et al. 2024, Section 2.3).
In this paper, we study the matrices , and, in particular, the cone in the space that they induce. We can think about this space as being a kind of ‘tropical dual’ to the parameter space , and we adopt the same indexing. That is, the entries of are indexed by B-orbits and edges of . Then , where and . Observe that the vector defines a monomial order on the polynomial ring (provided we specify another order for resolving ties). For this reason, we call a weight vector.
For weight vectors where is differentiable, each column of is indexed by a consistent leaf-labelling . The -entry of the parameterization is given by , where is the -entry of the parameterization of . For each and we define to be the natural product of the row vector of exponents of , with the column vector . That is,
| 1 |
and
| 2 |
Thus for a given weight vector , we take the column of indexed by to be the exponent vector of the monomial where for ; in this case, we will say that assigns to tree . The procedure just described is equivalent to describing an initial ideal of the ideal generated by the image of , where the initial term for each generator, , is given by the monomial .
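The assignment procedure can be sketched in a few lines. The code below ASSUMES a parameterization of the form q = a1_{g1} a2_{g2} a3_{g3} (a4_{g1} a5_{g1+g2} + a5_{g2} a6_{g2+g3}) over a cyclic group Z/nZ; the edge numbering, the orientation of the edges (and hence the signs of the subscripts), and the convention that the monomial of larger weight is selected are all assumptions and may differ from Figure 2 and from equations (1) and (2). The weight values are arbitrary illustrative numbers.

```python
from itertools import product

def monomials(label, n):
    """Variables (edge, group element) in the two monomials of q_label.

    ASSUMPTION: q = a1_{g1} a2_{g2} a3_{g3} (a4_{g1} a5_{g1+g2} + a5_{g2} a6_{g2+g3}).
    """
    g1, g2, g3 = label
    leaves = [(1, g1), (2, g2), (3, g3)]
    tree1 = leaves + [(4, g1), (5, (g1 + g2) % n)]
    tree2 = leaves + [(5, g2), (6, (g2 + g3) % n)]
    return tree1, tree2

def assign(weights, n):
    """Send each consistent leaf-labelling to 1 or 2: the tree whose monomial has larger weight."""
    out = {}
    for g1, g2 in product(range(n), repeat=2):
        label = (g1, g2, (-g1 - g2) % n)
        t1, t2 = monomials(label, n)
        s1 = sum(weights[v] for v in t1)
        s2 = sum(weights[v] for v in t2)
        if s1 == s2:
            raise ValueError(f"weight vector lies on the hyperplane of {label}")
        out[label] = 1 if s1 > s2 else 2
    return out

def column(label, tree, n):
    """Exponent vector of the chosen monomial, ordered by edge and then group element."""
    chosen = monomials(label, n)[tree - 1]
    return [1 if (e, g) in chosen else 0 for e in range(1, 7) for g in range(n)]

n = 2
weights = {(e, g): 0 for e in range(1, 7) for g in range(n)}
weights[(4, 0)] = 5  # arbitrary illustrative weights
weights[(5, 0)] = 2
assignment = assign(weights, n)
jacobian_columns = {label: column(label, tree, n) for label, tree in assignment.items()}
```

Raising an error on ties reflects the fact that the tropical map is not differentiable on the hyperplanes, so the matrix is only defined for weight vectors in the interior of a chamber.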
In Section 3, we give a solution to the dimension problem for general group-based models on the 3-sunlet network for all finite abelian groups of odd order at least 7 (we handle by computation), by constructing such that has maximal rank.
Defining Hyperplanes
The matrix is determined by inequalities between linear combinations of the coordinates of the weight vector . In this subsection, we construct a hyperplane arrangement , which divides Euclidean space into the regions on which is differentiable and is constant. The hyperplanes themselves correspond to regions on which is not differentiable. By understanding the defining hyperplanes and resulting geometry, we are able to construct a weight vector so that has the maximum possible rank, and therefore gives the best lower bound on the dimension of the variety .
Definition 2.2
A hyperplane arrangement is a collection of hyperplanes , . A connected component of the complement is a chamber of . We denote by the set of chambers of .
For the 3-sunlet network, each column of corresponds to a consistent leaf-labelling . The column of corresponding to can be one of two vectors, and depends on which inequality the coordinates of satisfy. Thus, each consistent leaf-labelling determines a hyperplane of the arrangement we want to construct for the 3-sunlet network. As above, for we write for the monomial corresponding to the tree in the -entry of the parametrization map . Then, we can define the hyperplane arrangement corresponding to the 3-sunlet as the collection is a consistent leaf-labelling, where
In Lemma 2.3, we make several observations about the hyperplanes . Note that the hyperplane arrangement and the resulting chambers form the Gröbner fan of the ideal generated by the image of , denoted . In this interpretation, the hyperplanes themselves consist of those points for which the initial ideal is not a monomial ideal, and the chambers consist of those points for which the initial ideal is monomial.
We make the following observation about the matrix .
Lemma 2.3
Let for all . The inequalities determining the matrix are:
| 3 |
| 4 |
In particular, the determining inequalities depend only on and .
Proof
The matrix is constant on a region R if and only if each column is constant on R, so it follows that the hyperplane arrangement we seek is the union of hyperplane arrangements each of whose regions defines the constant regions of a single column.
The assignment of columns is equivalent to choosing the direction of the inequality in or . Without loss of generality, assume . Expanding and as in equations (1) and (2) we obtain
Cancelling terms we see that
Inequality (3) controls the assignment of the columns with label , for . The inequalities given by (4) each control a single column with label , with . Thus, by crossing the hyperplane exactly of the columns of change, whereas when crossing a hyperplane of the form , exactly one of the columns of changes. Observe that we cannot simply choose which inequalities are satisfied and expect to find a weight vector that achieves this. That is, some combinations of inequalities cannot be simultaneously satisfied. Thus, for a given G, it is unclear how many chambers lie in the hyperplane arrangement , but it is at most . Lemma 3.2 in the next section gives some restrictions on these assignments for groups of odd order, and we will see further examples in Section 4.
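The cancellation in the proof can be illustrated by grouping columns according to the linear form that decides their assignment. This sketch ASSUMES the parameterization q = a1_{g1} a2_{g2} a3_{g3} (a4_{g1} a5_{g1+g2} + a5_{g2} a6_{g2+g3}) over a cyclic group Z/nZ; the edge numbering and subscript signs are assumptions and may differ from Figure 2, although the grouping of columns does not depend on those sign choices.

```python
from collections import Counter
from itertools import product

def linear_form(g1, g2, n):
    """Coefficients of weight(tree 1) - weight(tree 2) for the label (g1, g2, -g1-g2).

    ASSUMPTION: the leaf variables a1, a2, a3 appear in both monomials and cancel,
    leaving only the cycle-edge variables a4, a5, a6.
    """
    g3 = (-g1 - g2) % n
    form = Counter()
    form[(4, g1)] += 1
    form[(5, (g1 + g2) % n)] += 1
    form[(5, g2)] -= 1
    form[(6, (g2 + g3) % n)] -= 1
    # Drop cancelled terms and freeze the form so it can be used as a dictionary key.
    return frozenset((var, c) for var, c in form.items() if c != 0)

def hyperplanes(n):
    """Group consistent leaf-labellings by the hyperplane that controls them."""
    groups = {}
    for g1, g2 in product(range(n), repeat=2):
        label = (g1, g2, (-g1 - g2) % n)
        groups.setdefault(linear_form(g1, g2, n), []).append(label)
    return groups

groups = hyperplanes(3)
sizes = sorted(len(labels) for labels in groups.values())
```

For Z/3Z this yields one hyperplane shared by the three columns whose first coordinate is 0 and six hyperplanes that each control a single column, in line with the description above.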
Example 2.4
Let . We have four consistent leaf-labellings given by (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0). The image of is given by
where in each case the first monomial corresponds to , and the second monomial corresponds to . Pick a weight vector with , and . We construct the matrix . The columns of are indexed by the 4 consistent leaf-labellings, and the rows are indexed by the 12 parameters for and . For columns (0, 0, 0) and (0, 1, 1) we have that , so in these cases the monomial from is chosen. For (1, 0, 1) we have that so the monomial from is chosen. For (1, 1, 0) we have that so the monomial from is chosen. This gives us the following matrix
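[Matrix not shown: rows indexed by the 12 parameters, columns indexed by the consistent leaf-labellings (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0).]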
where the entry corresponding to column and row is the exponent of in the monomial from the entry of the expression for corresponding to the tree chosen.
To end this section, we make some observations on the hyperplane arrangement coming from the 3-sunlet and a group-based model (G, B). We will define the rank of a chamber as the rank of the corresponding tropical Jacobian matrix for all .
First, observe that when we choose so that all columns correspond to or (this is possible by choosing either very large or very small for all g), then is equal to the corresponding tropical Jacobian for the group-based phylogenetic tree model for or (albeit with some extra rows of 0’s corresponding to the parameters or respectively). Since the matrix has rank equal to the dimension of the toric variety corresponding to (see e.g., (Sturmfels 1996, Lemma 4.2)), and these varieties are well studied for trees, in both cases, (see e.g. (Gross et al. 2024, Lemma 4.1)). This describes two chambers with rank equal to . In fact, there are two more, as the next proposition demonstrates.
Proposition 2.5
Let C and be two adjacent chambers in , separated by the hyperplane . Then .
Proof
Let and be weight vectors, and consider the matrices and . Suppose, without loss of generality, that in the columns indexed by with correspond to , and in they correspond to . Thus, in the columns of indexed by , entries in the row indexed by are 1, and entries in the row indexed by are 0. On the other hand, in the columns of indexed by , entries in the row indexed by are 0, and entries in the row indexed by are 1. All other entries of and are the same. Finally, observe that in the rows indexed by and , all entries are 0 outside the columns indexed by for . Thus, the difference between and is a swap of the rows indexed by and , and this does not affect the rank.
Applying Proposition 2.5 to the two chambers of rank found above, we have four chambers of this rank. These come from column assignments where all columns corresponding to a leaf-labelling are assigned to , and all other columns assigned to for . It is easy to see that all other chambers have rank strictly greater than this (changing any column corresponding to with will introduce a 1 into the row or which was previously all 0’s), thus we have exactly four chambers of minimal rank .
Lemma 2.6
Let be the hyperplane arrangement given by the 3-sunlet network and group-based model (G, B). Then has exactly four chambers of minimal rank .
Proposition 2.7
Let be the hyperplane arrangement given by the 3-sunlet network and group-based model (G, B). Given a chamber of rank r, the rank of each adjacent chamber is between and .
Proof
As described above, moving to an adjacent chamber is equivalent to either swapping the assignment of all columns indexed by for , or swapping the assignment of a single column indexed by with . In the first case, the rank does not change by Proposition 2.5. In the second case, the rank can change by at most 1.
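The rank statements of this section can be checked numerically for small groups. The sketch below ASSUMES the parameterization q = a1_{g1} a2_{g2} a3_{g3} (a4_{g1} a5_{g1+g2} + a5_{g2} a6_{g2+g3}) over Z/nZ (edge numbering and subscript signs are assumptions and may differ from Figure 2), works directly with column assignments rather than weight vectors, and computes ranks exactly with fractions. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def column(label, tree, n):
    """Exponent vector of the monomial of `tree` (1 or 2) for a leaf-labelling (ASSUMED form)."""
    g1, g2, g3 = label
    support = {(1, g1), (2, g2), (3, g3)}
    if tree == 1:
        support |= {(4, g1), (5, (g1 + g2) % n)}
    else:
        support |= {(5, g2), (6, (g2 + g3) % n)}
    return [1 if (e, g) in support else 0 for e in range(1, 7) for g in range(n)]

def rank(vectors):
    """Rank over the rationals by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def chamber_rank(assignment, n):
    # The columns of the tropical Jacobian are passed as rows; the rank is the same.
    return rank([column(label, tree, n) for label, tree in assignment.items()])

n = 3
labels = [(g1, g2, (-g1 - g2) % n) for g1, g2 in product(range(n), repeat=2)]
all_tree1 = {label: 1 for label in labels}
# Cross the shared hyperplane: every column with first coordinate 0 moves to tree 2.
flipped = {label: (2 if label[0] == 0 else 1) for label in labels}

rank_all_tree1 = chamber_rank(all_tree1, n)
rank_flipped = chamber_rank(flipped, n)
```

Under these assumptions the all-tree-1 assignment for Z/3Z has the rank of the 3-star tree model, and flipping the shared columns leaves the rank unchanged, as Proposition 2.5 predicts.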
Proposition 2.5 above shows that every chamber is adjacent to at least one other chamber of equal rank. We would also like to know whether every chamber is adjacent to another chamber of strictly greater rank, or equivalently, whether the only maximal chambers (with respect to rank) are the globally maximal chambers. As we will see in Section 4.3, the answer to this is no, and there do exist locally maximal chambers.
Dimension for Odd Order Groups
In this section we give the dimension of the 3-sunlet variety for general group-based models where G is a finite abelian group of odd order at least 5. Our results agree with (Gross et al. 2024, Conjecture 7.1). Our method is to find a weight vector so that the corresponding tropical Jacobian has rank greater than or equal to , where , as detailed in Section 2.
First we set out some notation. Assume |G| is odd and let . Choose a subset with and satisfying the property that if then . In particular, . For example, for G equal to the cyclic group of order we could have . We will need the following lemma.
Lemma 3.1
Let G be an abelian group with , where and , i.e. |G| is odd with . Let be a subset with such that if then . Then there exists an injective function such that is not equal to or 2g.
Proof
Let and let . We will give an iterative method of choosing the value of , where at each step we update the set of available choices. To begin with, let ; this will be our initial set of available choices. For each , choose the value of to be an element of K satisfying and (should be in K), and remove the value of from K. This is possible, since at each stage, K contains at least 3 elements.
Now it remains to choose and . In the worst case, we have For each we take to be the (possibly unique) such that and .
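The subset fixed in the notation before Lemma 3.1 can be illustrated for cyclic groups. This sketch ASSUMES the reading that the subset (called `S` here, a name chosen for illustration) has size t = (|G| - 1)/2 and contains exactly one element from each pair {g, -g} with g nonzero; the choice S = {1, ..., t} and both function names are hypothetical.

```python
def half_system(n):
    """For odd n, return S = {1, ..., t} in Z/nZ, where t = (n - 1) // 2."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    return set(range(1, (n - 1) // 2 + 1))

def is_valid(S, n):
    """True if S avoids 0 and holds exactly one element of each pair {g, -g}."""
    negatives = {(-g) % n for g in S}
    return (
        0 not in S
        and S.isdisjoint(negatives)
        and S | negatives == set(range(1, n))
    )

S7 = half_system(7)
```

For the cyclic group of order 7 this gives the three elements 1, 2, 3, whose negatives 6, 5, 4 cover the remaining nonzero elements.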
The next lemma demonstrates that there are relationships among the inequalities in Lemma 2.3.
Lemma 3.2
Fix such that , and suppose that we have a weight vector such that for some the consistent leaf-labelling is assigned to and is assigned to for all . If there exists such that is assigned to then is assigned to .
Proof
By assumption, since assigns to and to , we have and for all (see proof of Lemma 2.3). Thus for all . Now if then the result is tautological, so assume . Let so that . Now if is assigned to then . Since and , we have . Thus, , and consequently, is assigned to .
Next we describe a procedure for choosing so that the matrix has rank equal to the expected dimension. This procedure will be illustrated below in Example 3.3. For ease of notation, we will write for , so that for all . In this notation, for a leaf-labelling ,
This means is assigned to if and only if and is assigned to if and only if
Choose such that for all , with large enough and all small enough such that for all we have . Fix . By our choice of , we can find such that and for all with . Then for the consistent leaf-labelling we have , so is assigned to . For all we have so the consistent leaf-labelling is assigned to .
Next, consider . In this case, for all the consistent leaf-labelling is assigned to if and only if
and is assigned to if and only if
Choose . Then for we have
and thus is assigned to . Now, consider the inequality , which holds for all . Write with . Then we have and therefore . Thus is assigned to for all .
To summarize, at this point, we have found and to give us the following assignments of the consistent leaf-labellings and for all :
To we have assigned and for all .
To we have assigned and for all .
We repeat the above procedure for every . Then for each of the t pairs we have consistent leaf-labellings assigned to and consistent leaf-labellings assigned to . Now, for all assign to so that we have consistent leaf-labellings assigned to and consistent leaf-labellings assigned to . Note that for we have .
We give a small example to illustrate.
Example 3.3
Let , and let . Pick , , and . When , for the consistent leaf-labellings (1, 2, 0), (1, 1, 1), and (1, 0, 2) respectively we have
We choose so that (1, 2, 0) is assigned to ; and (1, 1, 1) and (1, 0, 2) are assigned to .
When , for the consistent leaf-labellings (2, 1, 0), (2, 2, 2) and (2, 0, 1) respectively we have
Setting gives us (2, 1, 0) and (2, 2, 2) assigned to and (2, 0, 1) assigned to .
Finally, we choose so that (0, 0, 0), (0, 1, 2), and (0, 2, 1) are assigned to .
Returning to an arbitrary finite abelian group of odd order, we choose and as above (with negative) so that we have the assignments of leaf-labellings as described above. We will show that for this choice of we have . Our strategy is to perform row and column operations on in order to turn it into a block upper triangular matrix without changing the rank.
We will denote the row of corresponding to the parameter as . Our first observation is that for any assignment of consistent leaf-labellings to and we have for all . The rows therefore do not contribute to the rank of , so we remove them. Note that this corresponds to the notion of a contracted semi-directed network, which we do not introduce here (see Gross et al. (2024) for further details). Next, perform column swaps so that all columns assigned to are to the left of the columns assigned to . Observe that for all columns assigned to , the entry corresponding to is equal to the entry corresponding to for all (where edge labels are as in Figure 2), and the entry corresponding to is 0 for all . In terms of the tree topology, this is because in the vertex between and has degree 2, and because does not contain the edge . Perform row swaps so that the corresponding edge order is from top to bottom. Next perform the row operations
for all . This gives a block upper-triangular matrix of the form
$$\begin{pmatrix} A & * \\ 0 & B \end{pmatrix} \qquad (5)$$
It follows that . The submatrix A consists of all columns assigned to and rows corresponding to and for all . These are all the variables associated to edges in that form a 3-star tree, and so we know . In Lemma 3.5, we show that . The submatrix B consists of all columns assigned to and rows and for all . The columns of B are given by the consistent leaf-labellings assigned to which are
so and . In the next example, we illustrate the block triangular matrix in (5) for .
Example 3.4
Here we continue Example 3.3. With and assignments of consistent leaf-labellings as in the example, after all row operations we have the following matrix.
![]() |
Note that in this case the dimension of the space containing the variety is , which is less than the expected dimension of . Therefore the expected dimension cannot be reached, and indeed, explicit computation shows that the variety has dimension 9. Here we can see explicitly that the submatrices A and B have rank strictly less than as described in Lemmas 3.5 and 3.7.
In the next two lemmas, we give the rank of the submatrix A and a lower bound on the rank of the submatrix B.
Lemma 3.5
Let G be an abelian group with odd and , and as above in (5). Then .
Proof
The submatrix A consists of columns corresponding to consistent leaf-labellings assigned to , and rows corresponding to the variables , and for all . This submatrix also appears in the corresponding matrix of exponents for the 3-star tree with edges , and as in Figure 3 (possibly after performing some column swaps).
Fig. 3.

A 3-star tree with taxa labels and edge labels
Denote by the matrix of exponents corresponding to the general group-based model of G on the 3-star tree. The dimension of the variety corresponding to this model is (Gross et al. 2024, Lemma 4.1). Since the parameterization of this model is monomial, has rank equal to the dimension of the model.
Let be the column of corresponding to the consistent leaf-labelling , so that we have
We will show that the columns of corresponding to consistent leaf-labellings that we have assigned to can be written as linear combinations of columns corresponding to consistent leaf-labellings we have assigned to , thereby showing that .
First, consider the consistent leaf-labelling . The reader can check that for any with we have
and all terms on the right hand side are from consistent leaf-labellings that are assigned to . Note that since , at least one such h exists.
Next consider the consistent leaf-labelling for and . We consider two cases separately:
Case 1: . For this case we may write
(6)
Case 2: . We break this down into two further cases. First, suppose . Then we have
(7)
where we are using that although is assigned to , the column is linearly dependent on columns corresponding to leaf-labellings assigned to by the first part of the proof. Therefore we can substitute the relation from (6) into (7) to obtain as a linear combination of columns assigned to . On the other hand, if then
Remark 3.6
Observe that the relations used in the proof above correspond to the cubic binomials in the ideal for the 3-star tree, as described in Sturmfels and Sullivant (2005).
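The rank of the matrix of exponents for the 3-star tree, used in the proof of Lemma 3.5, can be checked numerically for small cyclic groups. The sketch below is illustrative only: it assumes G is cyclic of order n, that columns are indexed by triples summing to the identity, and that the row for edge i and group element g has a 1 exactly when leaf i carries the label g. The helper names are hypothetical. Under these assumptions the rank is 3n - 2, which agrees with the lowest ranks reported in Tables 1, 2 and 4.

```python
from fractions import Fraction
from itertools import product

def star_exponent_matrix(n):
    """Exponent matrix of the 3-star tree for Z/nZ (hypothetical helper).

    Rows are indexed by (edge i, group element g); columns by triples
    (g1, g2, g3) with g1 + g2 + g3 = 0 mod n. An entry is 1 when edge i
    carries the label g in that column, and 0 otherwise.
    """
    cols = [g for g in product(range(n), repeat=3) if sum(g) % n == 0]
    rows = [(i, g) for i in range(3) for g in range(n)]
    return [[1 if col[i] == g else 0 for col in cols] for (i, g) in rows]

def rank(matrix):
    """Rank over the rationals, by Gaussian elimination in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

for n in (3, 4, 5):
    print(n, rank(star_exponent_matrix(n)))
```

Exact rational arithmetic is used rather than floating point so that the computed rank is not affected by rounding.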
Lemma 3.7
Let G be a finite abelian group with odd and , and as above in (5). Then .
Proof
The columns of B correspond to the consistent leaf-labellings we assign to , that is, those in the set S, where
The rows of B correspond to the variables and , where we have performed the row operation , for all . For the leaf-labelling , we have . Thus, the column of B corresponding to that leaf labelling is the vector in given by
To prove the result, it is sufficient to find a subset of linearly independent columns.
First consider the columns corresponding to consistent leaf-labellings in . Here each column is given by
for . Since , the column is the only column of B with a non-zero component in the basis vector , and is therefore linearly independent of all other columns.
Next we consider columns coming from consistent leaf-labellings in . For each consider the leaf-labellings and , where for each the element is chosen from with the conditions that , and if then (this is possible by Lemma 3.1). We claim that the set
is linearly independent. Let
By the choice of , for all . Thus, to prove the claim, it is sufficient to show that for all with .
First, by considering the basis vectors and , we must have that where , so that . Now is spanned by the vectors and , where
An arbitrary element of is then given by
for . Observe that from the choice of , the basis vectors appearing in this expression are distinct. Now suppose that for with , we have another such element with coefficients and . We will show that if then so that as claimed. We have
(8)
(9)
Now suppose . By examining the coefficients of we must have
(10)
Next consider the coefficient of . Either appears in the expression for v or . For the former, since and so , we must have either or . We consider these three cases separately.
Case 1: . By equating coefficients of we have , and substituting into Equation (10) gives . Next consider the coefficient of . Either this is zero (i.e. ) or appears in the expression for v.
If then since or , we must have and therefore , from which it follows by Equation (10) that . Now since the coefficient of is not zero we must have either or . If then substituting in gives and therefore , a contradiction. We therefore must have so then , i.e., . But now we have simultaneous equations and , so .
Now, if then we have and . Substituting these values into equation (8) and setting gives
so we conclude that , and therefore .
Case 2: . By equating coefficients of we have . Next consider the coefficient of . As before, either this is zero or appears in the expression for v.
If then we must have , so that and therefore by Equation (10), and then . Now consider the coefficient of . Either or . If then so = , but this contradicts how and were chosen; both and , but then by definition of X we must have and . Thus we must have , and so . Now we have so by Equation (10) we have . Solving this simultaneously with gives .
If we have and so by Equation (10) . Substituting into equation (8) and setting gives us as in case 1, and therefore .
Case 3: . In this case we have so . This gives us
In particular, if then so that has 4 linearly independent non-zero terms. We have
so if we must have exactly four non-zero terms in the expression for v. This is only possible if or . If then we have
If then by equating coefficients we must have and thus . On the other hand, if we get
Examining the coefficient of we see that since and we must have , so and .
Putting together the results of this section and (Gross et al. 2024, Proposition 4.2), we get Theorem 1.1, which we restate here for convenience. By direct calculation (see Table 4), we have that for , the dimension of is 21, the expected dimension, so we include this result in the statement.
Table 4.
The number of samples for each rank when
| rank | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
|---|---|---|---|---|---|---|---|---|---|
| # chambers | 4 | 80 | 560 | 2,160 | 5,228 | 11,520 | 27,960 | 41,360 | 24,480 |
| % chambers | .004% | .071% | .494% | 1.906% | 4.612% | 10.163% | 24.667% | 36.488% | 21.596% |
Theorem 1.1
Let G be a finite abelian group of odd order , and let be the 3-sunlet network under the general group-based model given by G, with corresponding phylogenetic network variety . Then the affine dimension of is given by
Proof
For , the result is achieved computationally by using random sampling to find a weight vector for which has rank (Table 4). For all other G, choose as described in this section. Through row operations we can transform the matrix into a block upper triangular matrix of the form
so that . By Lemma 3.5, , and, by Lemma 3.7, . Thus, we obtain , and so the affine dimension of is at least . On the other hand, Lemma 2.1 says that is at most .
As discussed at the beginning of this work, the 3-sunlet was the only sunlet where a dimension formula was not given in Gross et al. (2024). Since level-1 phylogenetic networks can be broken down into trees and sunlet networks, for the case when G is an abelian group of odd order at least 5 and is a level-1 phylogenetic network, we can now give a full dimension result.
Theorem 3.8
Let be a level-1 phylogenetic network with n leaves, m edges, and c cycles. Let G be a finite abelian group of odd order . Then the variety corresponding to under the general group-based model for G, denoted , has dimension .
Proof
Following (Gross et al. 2024, Theorem 1.1), we prove the result by induction on the number of cut edges of . If has no cut edges, then it is either the 3-star tree or a sunlet network. If is the 3-star tree, then it has no cycles and (Gross et al. 2024, Lemma 4.1). If is a 3-sunlet network then it has a single cycle and 6 edges, and by Theorem 1.1. If is an n-sunlet network with then it has a single cycle and 2n edges, and (Gross et al. 2024, Theorem 4.7). Thus in all cases the result holds.
For the induction step, suppose that is a level-1 phylogenetic network with a cut edge e, m edges, and c cycles. Let and be the networks obtained by cutting at e and observe that both and have strictly fewer cut edges than . For let and be the number of edges and cycles respectively of , then by induction we have . Next we have that the ideal defining is given by the toric fiber product of and (Cummings et al. 2024, Remark 3.3). Using (Gross et al. 2024, Corollary 3.4) we obtain
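The arithmetic behind the induction step can be sanity-checked with a short Python sketch. Everything in it is an assumption on our part rather than a statement from the text: we take the dimension in Theorem 3.8 to have the form (|G| - 1)(m - c) + 1, which reproduces the values 13 (3-star tree) and 21 (3-sunlet) quoted in this paper for a group of order 5. We also assume that cutting at e gives m = m1 + m2 - 1 and c = c1 + c2, and that the toric fiber product subtracts the dimension |G| of the single shared edge. The function names are hypothetical.

```python
def expected_dimension(order, num_edges, num_cycles):
    """Assumed form of the Theorem 3.8 dimension: (|G| - 1)(m - c) + 1."""
    return (order - 1) * (num_edges - num_cycles) + 1

def glued_dimension(order, dim1, dim2):
    """Assumed toric fiber product count: the shared cut edge is counted once."""
    return dim1 + dim2 - order

order = 5
star = expected_dimension(order, 3, 0)    # 3-star tree: 3 edges, no cycles
sunlet = expected_dimension(order, 6, 1)  # 3-sunlet: 6 edges, one cycle

# Glue a 3-sunlet and a 3-star tree along a leaf edge: m = 6 + 3 - 1 = 8, c = 1.
glued = glued_dimension(order, sunlet, star)
print(star, sunlet, glued, expected_dimension(order, 8, 1))
```

Under these assumptions the glued count and the closed formula agree, which is the content of the induction step.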
The dimension results in Gross et al. (2024) were used to prove identifiability statements for level-1 phylogenetic networks under group-based models. Now that we have the dimension result for 3-sunlet networks, we can remove the ‘triangle-free’ restriction placed on those statements for the general group-based model for G, when G is a finite abelian group of odd order at least 5. The proof of the following result is identical to (Gross et al. 2024, Proposition 6.9). First, recall the following two definitions. We say that two phylogenetic networks and are distinguishable over G if and . Given a level-1 phylogenetic network on n leaves and a subset A of the leaf-set [n], the network restricted to A, denoted , is the level-1 phylogenetic network obtained from by removing all edges and vertices that do not lie on any path between two leaves in A, and suppressing any resulting vertices of degree 2 (except the root vertex). See (Gross and Long 2018, Definition 4.1) for a full definition.
Proposition 3.9
Let and be two level-1 phylogenetic networks on n leaves and both with exactly c cycles. Let G be a finite abelian group of odd order . If there exists a subset such that either
and are level-1 phylogenetic networks with distinct numbers of cycles; or
is a tree and is a level-1 phylogenetic network (i.e. with at least 1 cycle); or
and are distinct trees;
then and are distinguishable over G.
Experimental Results
Theorem 1.1 confirms that for odd order groups, the general group-based model on a 3-sunlet has the expected dimension (except ). In this section, we investigate the dimension for small finite abelian groups, and whilst an analogous construction of for even order groups does not give us a maximal rank , through experiments we find that the expected dimension is obtained in all cases once G is sufficiently large. The code we used to perform these calculations was written in Julia (Bezanson et al. 2017) and is available to download at https://github.com/shelbycox/3-Sunlet.
Sampling Methods
We use the hyperplane description from Section 2.3 to compute the possible matrices (and their ranks) that can appear for the groups , , , and . For each of the possible regions, we use OSCAR (Oscar 2024) to test whether the region is full dimensional. We then use random sampling of points in to obtain exactly one point in each region. For other small groups mentioned in this section, we used only random sampling with samples for each group to obtain the results. In each case, we retain at most one point for each region.
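The idea of sampling random points and retaining at most one point per region can be sketched as follows. This is not the authors' Julia implementation. It assumes that each region is determined by the signs of finitely many linear forms, and it uses a toy arrangement of three lines in the plane rather than the arrangement from Section 2.3. All names are hypothetical.

```python
import random

def sign_vector(point, normals):
    """Sign (+1, 0 or -1) of the point with respect to each hyperplane normal . x = 0."""
    signs = []
    for normal in normals:
        value = sum(a * x for a, x in zip(normal, point))
        signs.append((value > 0) - (value < 0))
    return tuple(signs)

def sample_chambers(normals, dim, num_samples, seed=0):
    """Keep the first sampled point found in each region of the arrangement."""
    rng = random.Random(seed)
    representatives = {}
    for _ in range(num_samples):
        point = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        signs = sign_vector(point, normals)
        if 0 not in signs:  # discard points lying on a hyperplane
            representatives.setdefault(signs, point)
    return representatives

# Toy arrangement (hypothetical): the lines x = 0, y = 0 and x = y in the plane.
normals = [(1, 0), (0, 1), (1, -1)]
chambers = sample_chambers(normals, dim=2, num_samples=2000)
print(len(chambers))
```

Sampling alone cannot certify that every region has been found, which is why a separate check of the regions is needed to confirm a complete count.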
Chamber counts for small groups
Tables 1, 2, 3, 4 and 5 are the result of the computations described above for the groups , , , , and respectively. For the first four groups listed, we confirmed that these are exactly the regions of the hyperplane arrangement. For , the data in the table is the result of sampling points and then retaining no more than one point per region. In addition, we include an illustration of relationships between chambers for in Figure 4. We make some observations.
Table 1.
The number of samples for each rank when
| rank | 7 | 8 | 9 |
|---|---|---|---|
| # chambers | 4 | 24 | 64 |
| % chambers | 4.3% | 26.1% | 69.6% |
Table 2.
The number of samples for each rank when
| rank | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|
| # chambers | 4 | 48 | 180 | 496 | 864 | 624 | 112 |
| % chambers | .2% | 2.06% | 7.7% | 21.3% | 37.1% | 26.8% | 4.8% |
Table 3.
The number of samples for each rank when
| rank | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|
| # chambers | 4 | 48 | 156 | 584 | 1056 | 480 | 0 |
| % chambers | .2% | 2.06% | 6.7% | 25.1% | 45.4% | 20.6% | 0% |
Table 5.
The number of samples for each rank when
| rank | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| # chambers | 4 | 120 | 1,296 | 7,180 | 26,576 | 79,156 | 229,069 | 742,458 | 2,148,510 | 3,606,334 | 2,475,117 |
| % chambers | .00004% | .00129% | .01391% | .07707% | .28528% | .84969% | 2.45892% | 7.96986% | 23.06303% | 38.71193% | 26.56897% |
Fig. 4.

Poset of , graded by distance (right to left) from the starting chamber, (at the bottom of the poset or far right of the diagram). The number in each node is the rank of the corresponding chamber. Key: blue/square - , red/triangle - , yellow/circle -
Remark 4.1
As observed in Lemma 2.6, in all cases we have exactly 4 chambers of lowest rank, which is equal to the dimension of the 3-star tree, . Two of the chambers correspond to when all leaf-labellings are assigned to either or . Weight vectors with sufficiently large and for all , or vice-versa, lie in these chambers. The remaining two chambers correspond to when all leaf-labellings are assigned to and all others assigned to ; and when all leaf-labellings are assigned to and all others assigned to . These chambers can be reached by taking the previous weight vector and swapping the values of and .
For , these chambers can be seen in Figure 4, drawn in squares and highlighted in blue at the very top and very bottom of the diagram. The shortest path in the poset between and has length 7, meaning that 7 hyperplanes need to be crossed to reach one from the other.
Remark 4.2
When , most chambers have rank 9, which is the maximum possible, because the space containing the corresponding variety is . However, for the maximum rank, 16, is achieved by only of chambers. For the maximum rank is 15, which is achieved by only of chambers. For both and rank 14 chambers are observed the most ( and ).
Remark 4.3
From Table 2 and Table 3, we observe that the number of chambers in the hyperplane arrangement is the same for and . We speculate that this is true in general for groups of the same order. However, we also observe that the distribution of ranks for these chambers differs. Thus, the distribution of ranks can depend on the structure of G, and not just |G|.
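The observation in this remark can be reproduced from the tabulated counts. The short Python sketch below copies the chamber counts from Tables 2 and 3, then recomputes the totals and percentages. The variable and function names are our own.

```python
# Chamber counts per rank, copied from Tables 2 and 3 (the two groups of order 4).
table2 = {10: 4, 11: 48, 12: 180, 13: 496, 14: 864, 15: 624, 16: 112}
table3 = {10: 4, 11: 48, 12: 156, 13: 584, 14: 1056, 15: 480, 16: 0}

def summarize(counts):
    """Return the total number of chambers and the percentage at each rank."""
    total = sum(counts.values())
    percentages = {rank: round(100 * c / total, 1) for rank, c in counts.items()}
    return total, percentages

total2, pct2 = summarize(table2)
total3, pct3 = summarize(table3)
print(total2, total3)      # the two totals agree
print(pct2[16], pct3[15])  # share of chambers at each table's maximum rank
```

The totals coincide while the per-rank counts differ, which is the distinction the remark draws between the number of chambers and the distribution of ranks.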
Locally maximal chambers
In our investigations, we were curious whether a greedy algorithm could be used to find an appropriate by moving from chamber to chamber. For such a method to work, there should be no locally maximal chambers that do not achieve the maximum rank. A locally maximal chamber in terms of rank means that all adjacent chambers have rank less than or equal to the rank of the chamber. Here, we describe our search for locally maximal chambers that do not achieve maximum rank. To do this computationally, we cycle through every chamber in the hyperplane arrangement found by random sampling (as in Section 4.2) and check whether the rank of this chamber is less than the global maximum, but greater than the rank of all adjacent chambers. The code is available in the file locallyMaximalChambers.jl.
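The search described above can be sketched in a few lines of Python. This is a hedged illustration rather than the contents of locallyMaximalChambers.jl. It assumes the chambers, their ranks and their adjacency relation are already available as dictionaries, it uses the non-strict comparison from the definition of a locally maximal chamber, and the toy data are invented for the example.

```python
def locally_maximal_deficient(ranks, neighbours):
    """Chambers of non-maximal rank whose neighbours all have rank <= their own.

    ranks: dict mapping a chamber id to the rank of its matrix.
    neighbours: dict mapping a chamber id to the ids of adjacent chambers.
    """
    global_max = max(ranks.values())
    return sorted(
        c for c, r in ranks.items()
        if r < global_max and all(ranks[d] <= r for d in neighbours[c])
    )

# Toy data (hypothetical, not taken from the paper): a path of five chambers.
ranks = {"a": 7, "b": 8, "c": 7, "d": 8, "e": 9}
neighbours = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(locally_maximal_deficient(ranks, neighbours))
```

In the toy data only chamber "b" qualifies: it has rank below the global maximum of 9, and neither of its neighbours has larger rank.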
The number of such chambers for small groups is given in Table 6. For groups with , the rank of the locally maximal chambers of deficient rank is always one less than the dimension of the model. For , this appears to not be the case. For this group, our code did not complete, but had still found 625 locally maximal chambers of ranks 23, 24, and 25 before it was terminated. However, since we were not able to verify that we had found all chambers in the hyperplane arrangement through random sampling, it may be the case that some of the locally maximal chambers found are not maximal at all.
Table 6.
The number of locally maximal chambers of deficient rank for each group. Locally maximal chambers of deficient rank are chambers which have non-maximal rank, and for which all adjacent chambers have rank less than or equal to the rank of the chamber.
| Group | |||||
|---|---|---|---|---|---|
| Locally maximal chambers | 0 | 128 | 0 | 1840 | |
| % | 0% | 5.50% | 0% | 1.62% |
Weight vector for groups of even order
In Section 3, we construct a weight vector where is the maximum rank possible for groups of odd order. Here, for even order groups, we construct an analogous vector by following the construction in Section 3, but including elements of order 2 in the set X. In this case, we find that the rank of does not always achieve the empirically maximum rank. That is, random sampling sometimes finds with having greater rank than .
Observe that in all cases, through random sampling we are able to find a weight vector where has the maximum possible rank according to Lemma 2.1. This means that in all cases in Table 7, the dimension of the variety is equal to the expected dimension of .
Table 7.
The third column of the table (Empirical maximum) lists the maximum rank of found by randomly sampling . The fourth column (Modified Section 3 construction) lists the rank of , when is the vector described at the beginning of this section. The last column records the difference between the third and fourth columns.
| Group | \|G\| | Empirical maximum | Modified Section 3 construction | Gap |
|---|---|---|---|---|
| | 4 | 16 | 15 | 1 |
| | 4 | 15 | 13 | 2 |
| | 6 | 26 | 26 | – |
| | 8 | 36 | 36 | – |
| | 8 | 36 | 36 | – |
| | 8 | 36 | 29 | 7 |
| | 10 | 46 | 46 | – |
| | 12 | 56 | 56 | – |
| | 12 | 56 | 56 | – |
| | 14 | 66 | 66 | – |
| | 16 | 76 | 76 | – |
| | 16 | 76 | 76 | – |
| | 16 | 76 | 76 | – |
| | 16 | 76 | 76 | – |
| | 16 | 76 | 61 | 15 |
| | 18 | 86 | 86 | – |
| | 18 | 86 | 86 | – |
| | 32 | 156 | 125 | 31 |
We see that for constructed according to the methods in Section 3, often achieves the maximum empirical rank, and it is only for powers of that this construction does not work. Furthermore, in these cases, the difference between the rank of and the maximum rank is equal to . In Appendix A, we provide an example comparing the constructed and the empirically maximal for . Specifically, we find a weight vector for which the corresponding matrix, , achieves the maximum rank of 16, and compare it to the matrix obtained from a construction analogous to that in Section 3. This gives good evidence that it is always possible to achieve the maximum possible rank (i.e., ) as the rank of , when .
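The numbers in Table 7 can be cross-checked with a few lines of Python. The sketch assumes that the expected dimension of the 3-sunlet variety is 5(|G| - 1) + 1, which matches the value 21 quoted for the group of order 5. It also relies on our reading of which table rows have a non-zero gap for orders 8, 16 and 32. The function name is hypothetical.

```python
def expected_dimension(order):
    """Assumed expected dimension of the 3-sunlet variety: 5(|G| - 1) + 1."""
    return 5 * (order - 1) + 1

# Largest empirical maximum listed in Table 7 for each group order.
empirical_max = {4: 16, 6: 26, 8: 36, 10: 46, 12: 56, 14: 66, 16: 76, 18: 86, 32: 156}

# (|G|, empirical maximum, rank from the modified construction) for the rows
# of orders 8, 16 and 32 in which the construction falls short.
deficient_rows = [(8, 36, 29), (16, 76, 61), (32, 156, 125)]

for order, value in empirical_max.items():
    print(order, value, value == expected_dimension(order))
for order, empirical, constructed in deficient_rows:
    print(order, empirical - constructed)
```

For these three rows the gap equals |G| - 1. The second row of order 4 is not included because its empirical maximum, 15, is below 16.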
Discussion
In this paper, we give a dimension formula for varieties associated to a 3-sunlet phylogenetic network and general group-based model, where the group G is a finite abelian group of odd order. To do this, we use ideas from tropical geometry and linear algebra. Our proof relies on the fact that for odd order groups, there are no elements of order 2, and thus the non-identity elements can be partitioned into two sets of equal size (one of which we refer to as X in Section 3), each containing mutually inverse elements. Thus, our proof does not obviously generalise to even-order groups, and a full understanding of the dimension of these models remains open. In Section 4.4 we construct weight vectors for even-order groups by assigning self-inverse elements to X and following the construction in Section 3. However, as shown in Table 7, for those groups that are products of , the rank of the corresponding is less than the dimension of the model. Interestingly, the difference is .
We have not yet explored models for which the subgroup is non-trivial. In Gross et al. (2024), a dimension formula is given for triangle-free phylogenetic networks for all group-based models (i.e., for all such subgroups B). This is achieved by choosing a weight vector that, for each edge in the network, is constant on the coordinates associated to the B-orbits on that edge. Since the structure of varies with G, this is only possible if we consider the two orbits and . However, in our experiments we found that for the 3-sunlet, the weight vectors with of maximal rank were not constant on these orbits, suggesting that a case-by-case analysis may be required. We identified equivalent weight vectors (i.e. those for which is the same) with chambers in a hyperplane arrangement. Each chamber can be interpreted as a toric degeneration of the variety. Two of the chambers correspond to the two distinct phylogenetic trees displayed by a 3-sunlet. These have the lowest rank among all chambers, and the rank is equal to the dimension of the variety associated to the group-based model on the corresponding tree. The remaining chambers correspond to a mixture of the Fourier coordinates for these two trees, and so there is no clear phylogenetic interpretation of these chambers.
The investigations in this paper highlight the intricate challenges involved in understanding 3-cycles. In the sampling experiments in Section 4, we were able to examine the ranks of all chambers in the hyperplane arrangement. However, it is only when that the expected dimension of the variety () is less than the dimension of the ambient space (), so the cases with are exceptional. We observe that as the group gets larger, there are proportionally more chambers of maximal rank. However, due to the large growth in the number of chambers as the size of the group grows, we were not able to determine if this is a general pattern.
A first approach to obtaining the result for all finite abelian groups may be to try to adapt the weight vector . Guidance on appropriate adaptations could be found by understanding how changes in the vector correspond to moving between chambers. Indeed, a good understanding of the hyperplane arrangement may make it possible to devise an algorithm to search for a weight vector for which has maximal rank. However, as we have shown in Section 4.3, this is also not straightforward. In that section, we found that for at least some groups there are locally maximal chambers, and therefore a greedy algorithm starting at a lowest rank chamber and moving to chambers of strictly larger rank may not terminate on a globally maximal chamber.
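The failure mode of such a greedy search can be illustrated on a toy example. The Python sketch below is hypothetical: the chamber graph and ranks are invented, and the rule of moving to the adjacent chamber of largest rank is one possible greedy strategy, not one specified in the text.

```python
def greedy_ascent(start, ranks, neighbours):
    """Move to a strictly higher-rank neighbour until none exists; return the path."""
    current = start
    path = [current]
    while True:
        better = [d for d in neighbours[current] if ranks[d] > ranks[current]]
        if not better:
            return path
        current = max(better, key=lambda d: ranks[d])
        path.append(current)

# Toy data (hypothetical): chamber "b" is locally maximal with deficient rank.
ranks = {"a": 7, "b": 8, "c": 7, "d": 8, "e": 9}
neighbours = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(greedy_ascent("a", ranks, neighbours))  # stops at "b", below the maximum 9
print(greedy_ascent("d", ranks, neighbours))  # reaches "e", the global maximum
```

Whether the search succeeds depends on the starting chamber, so a search of this kind would need restarts or a way to cross rank plateaus.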
To further understand the varieties associated to 3-sunlet networks and group-based models, we would like to be able to describe generating sets of the corresponding ideals. Polynomials in these ideals are called phylogenetic invariants, and are useful for determining model identifiability, model selection, and even topology inference from sequence data. However, calculating generating sets is challenging. Using Macaulay2 (Grayson and Stillman 2022) and elimination theory, we attempted to find a Gröbner basis for the ideal corresponding to a 3-sunlet under the general group-based model with – the smallest odd-order group for which we have a dimension result and for which the variety does not fill the whole space. After 50 days on an HPC these computations were still running. We also used the MultigradedImplicitization package (Cummings and Hollering 2026) to find generators of fixed total degree. The computations completed for degrees up to and including 7, and in each case there were no generators. After 45 days the computations for degree 8 were still running. In this case we have a polynomial ring with generators, so the dimension of the space spanned by monomials of degree is large: . This highlights the difficulty in calculating generators even for small models. Further work is necessary for this to be achievable.
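The growth in the number of monomials can be made concrete with a short Python sketch. It assumes the polynomial ring has 25 generators, one for each of the |G|^2 consistent leaf-labellings when G has order 5. That count is our assumption, not a value taken from the text. It then uses the standard formula for the number of monomials of total degree d in n variables.

```python
from math import comb

def num_monomials(num_variables, degree):
    """Number of monomials of a given total degree in num_variables variables."""
    return comb(num_variables + degree - 1, degree)

# Assumption: 25 coordinates, i.e. |G|^2 for a group of order 5.
for degree in range(1, 9):
    print(degree, num_monomials(25, degree))
```

Under this assumption the space of degree 8 monomials already has dimension above ten million, so each further degree is much more expensive than the last.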
Acknowledgements
SC was supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1841052, and by the National Science Foundation under Grant No. 1855135 during the writing of this paper. SM was supported by the Biotechnology and Biological Sciences Research Council (BBSRC), part of UK Research and Innovation, through the Core Capability Grant BB/CCG1720/1 at the Earlham Institute and is grateful for HPC support from NBI’s Research Computing group. SM is grateful for further funding from BBSRC (grant number BB/X005186/1) which also supported this work. EG was supported by the National Science Foundation grant DMS-1945584. This project was initiated at the “Algebra of Phylogenetic Networks Workshop” held at the University of Hawai‘i at Mānoa and supported by National Science Foundation grant DMS-1945584. Additional parts of this research were performed while EG and SC were visiting the Institute for Mathematical and Statistical Innovation (IMSI) for the semester-long program on “Algebraic Statistics and Our Changing World.” IMSI is supported by the National Science Foundation (Grant No. DMS-1929348).
Appendix A. Example Weights and Matrices
A.1 Example when .
Below, we study the -construction from Section 3, adapted for even order groups (so all order two elements are in X), for . We denote this weight vector by and denote the corresponding matrix by . As noted in Table 7, , which is not the empirical maximum. Through random sampling, we find in Table 2 that there are 112 chambers of the -sunlet arrangement whose corresponding matrices achieve the maximal rank. We pick a weight vector, , in one of these chambers so that the corresponding matrix, , has maximal rank and differs from in exactly one column. For the chosen here, the two matrices differ only in the column [[2], [1], [1]].
![]() |
Funding
Open Access funding enabled and organized by Projekt DEAL.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Barton T, Gross E, Long C, Rusinko J (2022) Statistical learning with phylogenetic network invariants. arXiv preprint arXiv:2211.11919
- Bezanson J, Edelman A, Karpinski S, Shah VB (2017) Julia: A fresh approach to numerical computing. SIAM Rev 59(1):65–98
- Cummings J, Gross E, Hollering B, Martin S, Nometa I (2023) The Pfaffian structure of CFN phylogenetic networks. arXiv preprint arXiv:2312.07450
- Cummings J, Hollering B (2026) Computing implicitizations of multi-graded polynomial maps. J Symb Comput 132:102459
- Cummings J, Hollering B, Manon C (2024) Invariants for level-1 phylogenetic networks under the Cavendar-Farris-Neyman model. Adv Appl Math 153:102633
- Dayhoff MO, Schwartz RM, Orcutt BC (1978) Atlas of protein sequences and structure, Vol 5, chapter A model of evolutionary change in proteins, pages 345–352. National Biomedical Research Foundation
- Draisma J (2008) A tropical approach to secant dimensions. J Pure Appl Algebra 212(2):349–363
- Evans SN, Speed TP (1993) Invariants of some probability models used in phylogenetic inference. Ann Stat 21(1):355–377
- Goldman N, Thorne JL, Jones DT (1998) Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics 149(1):445–458
- Goldman N, Yang Z (1994) A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 11:725–736
- Grayson DR, Stillman ME (2022) Macaulay2, Version 1.20, http://www.math.uiuc.edu/Macaulay2/
- Gross E, Krone R, Martin S (2024) Dimensions of level-1 group-based phylogenetic networks. Bull Math Biol 86(8):90
- Gross E, Long C (2018) Distinguishing phylogenetic networks. SIAM J Appl Algebra Geom 2(1):72–93
- Gross E, Lv I, Janssen R, Jones M, Long C, Murakami Y (2021) Distinguishing level-1 phylogenetic networks on the basis of data generated by Markov processes. J Math Biol 83(3):32
- Hollering B, Sullivant S (2021) Identifiability in phylogenetics using algebraic matroids. J Symb Comput 104:142–158
- Kosiol C, Goldman N, Buttimore NH (2004) A new criterion and method for amino acid classification. J Theor Biol 228:97–106
- Martin S, Moulton V, Leggett RM (2023) Algebraic invariants for inferring 4-leaf semi-directed phylogenetic networks. bioRxiv:2023.09.11.557152
- Nakhleh L (2011) Problem Solving Handbook in Computational Biology and Bioinformatics, chapter Evolutionary Phylogenetic Networks: Models and Issues, pages 125–158. Springer Science+Business Media, LLC
- Oscar – open source computer algebra research system, version 1.0.0, 2024
- Sturmfels B (1996) Gröbner Bases and Convex Polytopes, vol 8. University Lecture Series. American Mathematical Society, Providence, RI
- Sturmfels B, Sullivant S (2005) Toric ideals of phylogenetic invariants. J Comput Biol 12(4):457–481
- Sullivant S (2018) Algebraic Statistics, vol 194. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI
- Székely LA, Steel MA, Erdős PL (1993) Fourier calculus on evolutionary trees. Adv Appl Math 14:200–216





