An interchange property for the rooted phylogenetic subnet diversity on phylogenetic networks

Tomás M Coronado; Gabriel Riera; Francesc Rosselló

doi:10.1007/s00285-024-02142-4

2024 Oct 4;89(5):48.


PMCID: PMC11452452 PMID: 39365458

Abstract

Faith’s Phylogenetic Diversity (PD) on rooted phylogenetic trees satisfies the so-called strong exchange property that guarantees that, for every two sets of leaves of different cardinalities, a leaf can always be moved from the larger set to the smaller set in such a way that the sum of the PD values does not decrease. This strong exchange property entails a simple polynomial-time greedy solution to the PD optimization problem on rooted phylogenetic trees. In this paper we obtain an exchange property for the rooted Phylogenetic Subnet Diversity (rPSD) on rooted phylogenetic networks, which involves a more complicated exchange of leaves. We derive from it a polynomial-time greedy solution to the rPSD optimization problem on rooted semibinary level-2 phylogenetic networks.

Supplementary Information

The online version contains supplementary material available at 10.1007/s00285-024-02142-4.

Keywords: Phylogenetic network, Level-k network, Phylogenetic subnet diversity, Phylogenetic subnet diversity optimization problem

Introduction

Over the last few centuries, human activity has caused the destruction of natural habitats at an unprecedented pace, resulting in a major episode of biodiversity extinction (Kolbert 2014). Urgent action is required to combat extinction and preserve biodiversity, but there are challenges, including a lack of funding and uncertainties about conservation strategies. Consequently, there is a growing need for criteria to set conservation priorities and for variables that quantify biodiversity.

The traditional approach to assessing biodiversity based on species counts, species richness, and number of endemic species has limitations. For instance, this type of data is so heterogeneous that it can be difficult to compare across different sites and times (Gaston 1996). The approach based on lists of threatened species also has its drawbacks: for example, changes in the composition of these lists may represent changes in knowledge of species status rather than changes in the status itself (Possingham et al. 2002). Finally, measures of biodiversity based solely on species have been criticized for treating all species as equal, without regard to their functional roles in the ecosystem or their evolutionary history (Faith 1992).

A feature of species that may influence their biodiversity value is their evolutionary distinctness. A species with few close living evolutionary relatives is considered more worthy of protection than a species with many close relatives that are genetically and phenotypically similar to it (McNeely et al. 1990). At the beginning of the 1990s, the qualitative value afforded to evolutionarily distinct species was replaced by quantitative measures of phylogenetic distinctness. One of the first published measures of biodiversity based on phylogenetic information was Faith’s phylogenetic diversity, PD (Faith 1992). The PD value of a set of species placed at the leaves of a phylogenetic tree is defined as the total weight (i.e., the sum of the branch lengths) of the spanning tree connecting the root and these leaves. In its original formulation, the branch lengths represented the number of changes in phenotypic characters, and PD measured the diversity of phenotypic characters in a set of species. In the current usual interpretation of phylogenetic trees, branch lengths represent evolutionary time, which is assumed to be positively correlated with character variation.

Since its introduction, PD has been widely studied and applied (Pellens and Grandcolas 2016). One of its most useful properties, from both a formal and a practical point of view, is that all subsets of species of a given size with maximal PD value in a phylogenetic tree can be efficiently found and characterized by means of a very simple greedy algorithm (Pardi and Goldman 2005; Steel 2005); for a recent application to the analysis of SARS-CoV-2 phylogeny, see, for instance, Zhukova et al. (2021). The basis of this result is the so-called strong exchange property, stating that for every pair of sets of leaves $X, X^{'}$ with $| X | > | X^{'} |$ , we can always move a leaf from X to $X^{'}$ without decreasing the sum of the PD values.

Faith’s PD is defined on evolutionary histories modelled by means of phylogenetic trees. But phylogenetic trees can only cope with speciation events due to mutations, where each species other than the universal common ancestor has only one parent in the evolutionary history (its parent in the tree). It is now clearly understood that other speciation events, which cannot be properly represented by means of single arcs in a tree, play an important role in evolution (Doolittle 1999). These are reticulate events, like genetic recombinations, hybridizations, or lateral gene transfers, where a species is the result of the interaction between several parent species. This has led to the introduction of phylogenetic networks as models of phylogenetic histories that can include these reticulate events (Huson et al. 2010). Faith’s PD has been extended to split networks (Spillner et al. 2008) and to rooted phylogenetic networks (Wicke and Fischer 2018; Bordewich et al. 2022); in fact, several generalizations to rooted phylogenetic networks have been proposed, the most natural of which is the rooted Phylogenetic Subnet Diversity, rPSD, introduced by Wicke and Fischer (2018) and renamed AllPaths-PD by Bordewich et al. (2022).

It has been proved that the PD optimization problem can be solved efficiently on circular split networks using integer programming (Chernomor et al. 2016; Spillner et al. 2008), as well as (for rPSD) on the simplest class of non-tree rooted phylogenetic networks, the so-called galled trees, by reducing it to a family of linearly many minimum-cost flow problems (Bordewich et al. 2009, 2022). It is also known that these optimization problems are NP-hard in general on rooted phylogenetic networks (Bordewich et al. 2022) and on split networks (Chernomor et al. 2016).

In this paper we focus on the extension of the greedy optimization algorithm for PD on phylogenetic trees to rPSD on rooted phylogenetic networks. As we have mentioned, the greedy algorithm on phylogenetic trees is a consequence of the strong exchange property for PD that guarantees that, given two sets of leaves of different cardinalities, we can always move some element from the larger set to the smaller one without lowering the sum of the PD values. It is easy to check that this strong exchange property for rPSD is no longer valid even on galled trees (Bordewich et al. 2022). So, our first main contribution is its generalization to rPSD through a more involved exchange of leaves than simply moving one leaf from one set to another.

Our exchange property then allows us to strengthen the result of Bordewich et al. on galled trees, by proving that every rPSD-optimal set of m leaves in a galled tree is always obtained from an rPSD-optimal set of $m - 1$ leaves by either optimally adding a leaf or optimally replacing a leaf by a pair of leaves. It also allows us to give polynomial-time greedy solutions for the rPSD problem on semibinary level-2 networks and semi-3-ary level-1 networks, the next complexity level of rooted phylogenetic networks (see §2.1 for the definitions). On the negative side, we have not been able to deduce from it a greedy algorithm for semibinary level-3 or semi-4-ary level-1 networks, and the problem for these more general classes remains open.

This paper is organized as follows. In Sect. 2.1 we define the concepts necessary to understand this work, including a generalization of the Phylogenetic Diversity due to Wicke and Fischer (2018), together with its properties and an example. Section 3 contains the main result of this manuscript, Theorem 1, and Sect. 4 presents some of its applications to galled trees and to semi-d-ary level-k networks, for particular values of d and k. We end in Sect. 5 with some concluding remarks. The proof of Theorem 1, together with two required lemmas, can be found in the Appendix, and proofs of additional results can be found in the Supplementary Material.

Preliminaries

Phylogenetic networks

Let $Σ$ be a finite set of labels. By a phylogenetic network on $Σ$ we understand a rooted directed acyclic simple graph where each node of in-degree $⩾ 2$ has out-degree exactly 1 and whose leaves (i.e., its nodes of out-degree 0) are bijectively labeled by $Σ$ (Huson et al. 2010). A phylogenetic tree is simply a phylogenetic network without nodes of in-degree $⩾ 2$ . Let us point out here that, although the usual definition of phylogenetic tree and network forbids, for reconstructibility reasons, the existence of elementary nodes, that is, of nodes of in-degree $⩽ 1$ and out-degree 1, we shall allow their existence in order to simplify some statements and proofs.

Let N be a phylogenetic network. We shall denote its root (i.e., its only node of in-degree 0) by r and its sets of nodes and arcs by V(N) and E(N), respectively, and we shall always identify its leaves with their corresponding labels. Given two nodes u, v in N, we say that v is a child of u, and also that u is a parent of v, when $(u, v) \in E (N)$ . A node in N is of tree type, or a tree node, when its in-degree is $⩽ 1$ , and a reticulation when its in-degree is $⩾ 2$ (and hence, its out-degree is 1). We shall say that N is semi-d-ary when all its reticulations have in-degree $⩽ d$ , and that N is binary when it is semibinary and all its internal tree nodes have out-degree 2.

We shall denote a (directed) path in N from a node u to a node v by $u ⇝ v$ . The intermediate nodes of a path $u ⇝ v$ are the nodes involved in it other than u and v. For every $u, v \in V (N)$ , we say that v is a descendant of u, and also that u is an ancestor of v, when there exists a path $u ⇝ v$ , and that v is a descendant of an arc $e = (u^{'}, u)$ when it is a descendant of its end u. In particular, every node is an ancestor, and a descendant, of itself. If v is a descendant of u and $u \neq v$ , we shall say that it is a proper descendant of u. A set of nodes $V_{0} \subseteq V (N)$ is independent when no node in it is a proper descendant of any other node in it.

For every $v \in V (N)$ , its cluster $C_{N} (v) \subseteq Σ$ (or simply C(v) when N is clear from the context), is the set of (labels of) the descendant leaves of v, and the subnetwork of N rooted at v is the subgraph $N_{v}$ of N induced by the set of all descendants of v. $N_{v}$ is a phylogenetic network on C(v) with root v.

For every $X \subseteq V (N)$ , we shall denote the set of all nodes in N that are ancestors of nodes in X by $↑ X$ . Given an arc $e = (u, u^{'}) \in E (N)$ , we shall make the abuse of notation of writing $e \in ↑ X$ to mean that e has some descendant in X, that is, that $u^{'} \in ↑ X$ .
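In computational terms, $↑ X$ is the set of nodes reached by walking arcs backwards from X, and the convention $e \in ↑ X$ selects the arcs whose head lies in that set. A minimal sketch, assuming the network is stored as a Python dict mapping each arc (u, v) to its weight; the function names and the small example network are hypothetical.

```python
from collections import defaultdict


def up(arcs, X):
    """Return ↑X, the set of all ancestors of nodes in X (every node is its own ancestor)."""
    parents = defaultdict(list)
    for (u, v) in arcs:
        parents[v].append(u)
    seen, stack = set(X), list(X)
    while stack:  # reverse depth-first search from X towards the root
        v = stack.pop()
        for u in parents[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def arcs_in_up(arcs, X):
    """Arcs e = (u, u') with 'e in ↑X', i.e. whose head u' belongs to ↑X."""
    ancestors = up(arcs, X)
    return {(u, v) for (u, v) in arcs if v in ancestors}


# Hypothetical network with root r and one reticulation h whose parents are p and q.
arcs = {("r", "p"): 1, ("r", "q"): 1, ("p", "a"): 1, ("p", "h"): 1,
        ("q", "h"): 1, ("q", "b"): 1, ("h", "c"): 1}
print(sorted(up(arcs, {"c"})))  # ['c', 'h', 'p', 'q', 'r']
```

Note that the arcs (p, a) and (q, b) are not in $↑ \{c\}$ even though their tails are ancestors of c: only the head matters.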

A subgraph of a phylogenetic network N is biconnected when it is connected (as an undirected graph) and it remains connected after removing any node from it together with all arcs incident to this node. Every single node and every single arc of N is a biconnected subgraph. A biconnected component of N is a maximal biconnected subgraph, and we shall call a biconnected component with more than 2 nodes a blob. Every blob $B$ has one, and only one, node that is an ancestor of all its nodes; we call it its split node. Every node in a blob $B$ with no child inside $B$ is a reticulation (should it be of tree type, removing its parent would disconnect $B$ ); we call such reticulations the exit reticulations of $B$ , and the rest of its reticulations, internal. Every node in $B$ has some descendant exit reticulation.

A phylogenetic network is level-k (Jansson and Sung 2006) when every biconnected component contains at most k reticulations. Thus, a level-0 network is a phylogenetic tree. A semibinary level-1 network is also called a galled tree (Gusfield et al. 2004); the phylogenetic network in Fig. 1 is a galled tree.
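The level of a network can be read off its biconnected components. A minimal sketch, assuming arcs are given as (parent, child) pairs: it computes the biconnected components of the underlying undirected graph with the classical Hopcroft-Tarjan edge-stack depth-first search (written out so that no third-party graph library is needed; the recursion is fine for small examples) and counts the reticulations in each. The function names and the three test networks are hypothetical.

```python
from collections import Counter, defaultdict


def biconnected_components(arcs):
    """Node sets of the biconnected components of the underlying undirected graph,
    computed with the Hopcroft-Tarjan depth-first search and an edge stack."""
    adj = defaultdict(list)
    for u, v in arcs:
        adj[u].append(v)
        adj[v].append(u)
    disc, low, edge_stack, comps = {}, {}, [], []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)  # discovery index of u
        for v in adj[u]:
            if v not in disc:  # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:  # edges down to (u, v) on the stack form one component
                    comp = set()
                    while True:
                        e = edge_stack.pop()
                        comp.update(e)
                        if e == (u, v):
                            break
                    comps.append(comp)
            elif v != parent and disc[v] < disc[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return comps


def level(arcs):
    """Largest number of reticulations (in-degree >= 2) inside one biconnected component.
    Single-arc components contain at most one reticulation (as the tail of the arc),
    so they never raise the maximum above that of a blob."""
    indeg = Counter(v for (_, v) in arcs)
    return max((sum(indeg[x] >= 2 for x in comp) for comp in biconnected_components(arcs)),
               default=0)


tree = [("r", "u"), ("r", "x3"), ("u", "x1"), ("u", "x2")]
galled = [("r", "p"), ("r", "q"), ("p", "a"), ("p", "h"), ("q", "h"), ("q", "b"), ("h", "c")]
level2 = [("r", "p"), ("r", "q"), ("p", "h1"), ("q", "h1"), ("p", "h2"), ("q", "h2"),
          ("h1", "x"), ("h2", "y")]
print(level(tree), level(galled), level(level2))
```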

Fig. 1 — A weighted phylogenetic network. The tree nodes are represented by circles, the reticulation by a square, and the arcs’ labels represent their weights

A phylogenetic network N is weighted when it is endowed with a weight mapping $w : E (N) \to R_{⩾ 0}$ . The total weight of a subgraph of a weighted phylogenetic network is the sum of the weights of all arcs in the subgraph. In particular, the weight of a path is the sum of the weights of its arcs. All phylogenetic networks (and trees) appearing from now on in this paper are assumed to be weighted, usually without any further notice.

The rooted phylogenetic diversity on phylogenetic trees

Given a finite set $Σ$ , we shall denote henceforth its set of subsets by $P (Σ)$ and, for every $k ⩾ 0$ , the set of all its subsets of cardinality k by $P_{k} (Σ)$ .

Given a weighted phylogenetic tree T on $Σ$ , Faith’s rooted Phylogenetic Diversity (Faith 1992) is the set function ${PD}_{T} : P (Σ) \to R_{⩾ 0}$ sending each $X \subseteq Σ$ to the total weight of the subtree induced by the ancestors of nodes in X:

\begin{matrix} {PD}_{T} (X) = \sum_{e \in ↑ X} w (e) . \end{matrix}

This function ${PD}_{T}$ on phylogenetic trees satisfies the following strong exchange property, introduced by Steel (2005) for unrooted phylogenetic trees: for every phylogenetic tree T on $Σ$ and for every $X, X^{'} \subseteq Σ$ such that $| X^{'} | < | X |$ , there exists some $x \in X \setminus X^{'}$ such that

\begin{matrix} {PD}_{T} (X) + {PD}_{T} (X^{'}) ⩽ {PD}_{T} (X^{'} \cup \{x\}) + {PD}_{T} (X \setminus \{x\}) . \end{matrix}

For a proof of this fact in the rooted case, see (Steel 2016, §6.4.1).

This strong exchange property for ${PD}_{T}$ is the key ingredient in the proof that the simple Algorithm 1 given below produces, for every $k ⩾ 1$ , the family $M_{k}$ of all ${PD}_{T}$ -optimal subsets of $Σ$ of cardinality k, that is, of all sets of k leaves with maximum ${PD}_{T}$ value. For this proof in the unrooted case, see Steel (2005); the proof in the rooted case is similar: cf. §6.4.1 in Steel (2016). In particular, given a phylogenetic tree T on $Σ$ , this algorithm provides a polynomial-time solution to the problem of finding the maximum ${PD}_{T}$ value among all members of $P_{k} (Σ)$ , and a member of $P_{k} (Σ)$ reaching this maximum.
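A greedy procedure of this kind can be sketched in a few lines. This is a simplified variant of our own, not a transcription of Algorithm 1: it keeps a single ${PD}_{T}$ -optimal set per cardinality instead of the whole family $M_{k}$ , it assumes the tree is stored as a dict mapping arcs (u, v) to weights, and the example tree and function names are hypothetical. The loop compares each greedy value with a brute-force maximum; the strong exchange property is what guarantees that the two coincide.

```python
from itertools import combinations


def pd(arcs, X):
    """PD_T(X): total weight of the arcs whose head is an ancestor of some leaf in X."""
    parents = {v: u for (u, v) in arcs}  # in a tree every non-root node has one parent
    covered = set()
    for x in X:
        while x in parents and x not in covered:  # climb until the root or a covered node
            covered.add(x)
            x = parents[x]
    return sum(w for (_, v), w in arcs.items() if v in covered)


def greedy_pd(arcs, leaves, k):
    """Add, k times, a leaf with the largest marginal PD gain (ties: smallest label)."""
    chosen = set()
    for _ in range(k):
        best = max(sorted(leaves - chosen), key=lambda x: pd(arcs, chosen | {x}))
        chosen.add(best)
    return chosen


# Hypothetical weighted tree on the leaves x1, ..., x4.
tree = {("r", "u"): 2, ("r", "v"): 1, ("u", "x1"): 1,
        ("u", "x2"): 4, ("v", "x3"): 3, ("v", "x4"): 2}
leaves = {"x1", "x2", "x3", "x4"}
for k in range(1, len(leaves) + 1):
    S = greedy_pd(tree, leaves, k)
    brute = max(pd(tree, set(c)) for c in combinations(sorted(leaves), k))
    print(k, sorted(S), pd(tree, S), brute)
```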

The rooted phylogenetic subnet diversity

Wicke and Fischer (2018) proposed several generalizations of Faith’s rooted Phylogenetic Diversity function to phylogenetic networks. One of them, and possibly the most straightforward, is the rooted Phylogenetic Subnet Diversity: the set function ${rPSD}_{N} : P (Σ) \to R_{⩾ 0}$ sending each $X \subseteq Σ$ to the total weight of the subgraph induced by the ancestors of nodes in X:

\begin{matrix} {rPSD}_{N} (X) = \sum_{e \in ↑ X} w (e) . \end{matrix}

It is clear that if N is a phylogenetic tree, then ${rPSD}_{N} = {PD}_{N}$ . When N is clear from the context, we shall omit the subscript N and simply write $rPSD$ .

Example 1

On the phylogenetic network N depicted in Fig. 1,

\begin{matrix} rPSD (\{x_{1}\}) = 5, rPSD (\{x_{2}\}) = 6, rPSD (\{x_{3}\}) = 5, rPSD (\{x_{4}\}) = 4, \\ rPSD (\{x_{1}, x_{2}\}) = 8, rPSD (\{x_{1}, x_{3}\}) = 9, rPSD (\{x_{1}, x_{4}\}) = 9, \\ rPSD (\{x_{2}, x_{3}\}) = 8, rPSD (\{x_{2}, x_{4}\}) = 10, rPSD (\{x_{3}, x_{4}\}) = 9, \\ rPSD (\{x_{1}, x_{2}, x_{3}\}) = 10, rPSD (\{x_{1}, x_{2}, x_{4}\}) = 12, rPSD (\{x_{1}, x_{3}, x_{4}\}) = 13, \\ rPSD (\{x_{2}, x_{3}, x_{4}\}) = 12, rPSD (\{x_{1}, x_{2}, x_{3}, x_{4}\}) = 14 . \end{matrix}
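Values like these can be checked mechanically. The sketch below uses a hypothetical binary galled tree whose arc weights we chose so that it reproduces exactly the values listed above; it is not claimed to be the network drawn in Fig. 1. The network is a dict from arcs (u, v) to weights, and the (hypothetically named) function rpsd follows the definition of ${rPSD}_{N}$ directly: collect $↑ X$ , then add the weights of the arcs whose head lies in it.

```python
from itertools import combinations


def rpsd(arcs, X):
    """rPSD_N(X): total weight of the arcs whose head is an ancestor of some node in X."""
    parents = {}
    for (u, v) in arcs:
        parents.setdefault(v, []).append(u)
    up, stack = set(X), list(X)
    while stack:  # collect ↑X by walking arcs backwards
        for u in parents.get(stack.pop(), []):
            if u not in up:
                up.add(u)
                stack.append(u)
    return sum(w for (_, v), w in arcs.items() if v in up)


# Hypothetical binary galled tree: the reticulation h has parents p1 and p3.
# Weights chosen to reproduce the rPSD values of Example 1.
N = {("r", "u"): 1, ("r", "x4"): 4, ("u", "p1"): 2, ("u", "p3"): 2,
     ("p1", "x1"): 2, ("p3", "x3"): 2, ("p1", "h"): 0, ("p3", "h"): 0,
     ("h", "x2"): 1}
leaves = ["x1", "x2", "x3", "x4"]
for size in range(1, len(leaves) + 1):
    for Z in combinations(leaves, size):
        print(Z, rpsd(N, Z))
```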

For every phylogenetic network N on $Σ$ , $rPSD$ is:

(i)
Monotone nondecreasing: For every $X \subseteq Y \subseteq Σ$ , $rPSD (X) ⩽ rPSD (Y)$ .
(ii)
Subadditive: For every $X, Y \subseteq Σ$ ,
$\begin{matrix} rPSD (X \cup Y) ⩽ rPSD (X) + rPSD (Y) . \end{matrix}$
(iii)
Submodular: For every $X \subseteq Y \subseteq Σ$ and for every $a \in Σ \setminus Y$ ,
$\begin{matrix} rPSD (Y \cup \{a\}) - rPSD (Y) ⩽ rPSD (X \cup \{a\}) - rPSD (X) . \end{matrix}$

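Properties (i) and (ii) follow directly from the definition, because enlarging X can only enlarge $↑ X$ and $↑ (X \cup Y) = ↑ X \cup ↑ Y$ ; property (iii) is proved by Bordewich et al. (2022).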
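For small networks the three properties can also be confirmed by exhaustive search over all pairs of subsets, which is a convenient sanity check when experimenting with weightings; such a check illustrates the properties but does not prove them. A minimal sketch, assuming the dict-of-weighted-arcs representation; the function names and the semibinary level-2 network below are hypothetical.

```python
from itertools import combinations


def rpsd(arcs, X):
    """rPSD_N(X): total weight of the arcs whose head is an ancestor of some node in X."""
    parents = {}
    for (u, v) in arcs:
        parents.setdefault(v, []).append(u)
    up, stack = set(X), list(X)
    while stack:
        for u in parents.get(stack.pop(), []):
            if u not in up:
                up.add(u)
                stack.append(u)
    return sum(w for (_, v), w in arcs.items() if v in up)


def check_properties(arcs, leaves):
    """Exhaustively test (i) monotonicity, (ii) subadditivity and (iii) submodularity."""
    P = [frozenset(c) for r in range(len(leaves) + 1) for c in combinations(sorted(leaves), r)]
    f = {Z: rpsd(arcs, Z) for Z in P}
    monotone = all(f[X] <= f[Y] for X in P for Y in P if X <= Y)
    subadditive = all(f[X | Y] <= f[X] + f[Y] for X in P for Y in P)
    submodular = all(f[Y | {a}] - f[Y] <= f[X | {a}] - f[X]
                     for Y in P for X in P if X <= Y
                     for a in leaves if a not in Y)
    return monotone, subadditive, submodular


# Hypothetical semibinary level-2 network: reticulations h1, h2 share the parents p, q.
N2 = {("r", "p"): 1, ("r", "q"): 2, ("p", "a"): 1, ("q", "b"): 2,
      ("p", "h1"): 1, ("q", "h1"): 3, ("p", "h2"): 2, ("q", "h2"): 1,
      ("h1", "x"): 1, ("h2", "y"): 2}
print(check_properties(N2, {"a", "b", "x", "y"}))  # (True, True, True)
```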

On the negative side, $rPSD$ need not satisfy the strong exchange property, even for the simplest non-tree networks N. Indeed, consider again the binary galled tree N depicted in Fig. 1. Take $X = \{x_{1}, x_{3}, x_{4}\}$ and $X^{'} = \{x_{2}, x_{4}\}$ . Then

\begin{matrix} rPSD (\{x_{1}, x_{3}, x_{4}\}) + rPSD (\{x_{2}, x_{4}\}) = 23, \\ rPSD (\{x_{3}, x_{4}\}) + rPSD (\{x_{1}, x_{2}, x_{4}\}) = rPSD (\{x_{1}, x_{4}\}) + rPSD (\{x_{2}, x_{3}, x_{4}\}) = 21 . \end{matrix}

Therefore, there is no $x \in X \setminus X^{'}$ such that

\begin{matrix} rPSD (X) + rPSD (X^{'}) ⩽ rPSD (X \setminus \{x\}) + rPSD (X^{'} \cup \{x\}) . \end{matrix}

As a consequence, an $rPSD$ -optimal set of cardinality k of a phylogenetic network N need not contain any $rPSD$ -optimal set of cardinality $k - 1$ . Consider again the galled tree depicted in Fig. 1. Its only set of two labels with largest $rPSD$ value is $\{x_{2}, x_{4}\}$ , whereas its only set of three labels with largest $rPSD$ value is $\{x_{1}, x_{3}, x_{4}\}$ , which does not contain $\{x_{2}, x_{4}\}$ .
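Both failures can be reproduced by brute force. A minimal sketch on a hypothetical binary galled tree whose weights we chose to reproduce the values of Example 1 (not necessarily the network of Fig. 1); the names rpsd and optimal are ours. It checks every candidate $x \in X \setminus X^{'}$ and then compares the optimal sets of sizes 2 and 3.

```python
from itertools import combinations


def rpsd(arcs, X):
    """rPSD_N(X): total weight of the arcs whose head is an ancestor of some node in X."""
    parents = {}
    for (u, v) in arcs:
        parents.setdefault(v, []).append(u)
    up, stack = set(X), list(X)
    while stack:
        for u in parents.get(stack.pop(), []):
            if u not in up:
                up.add(u)
                stack.append(u)
    return sum(w for (_, v), w in arcs.items() if v in up)


# Hypothetical galled tree reproducing the rPSD values of Example 1.
N = {("r", "u"): 1, ("r", "x4"): 4, ("u", "p1"): 2, ("u", "p3"): 2,
     ("p1", "x1"): 2, ("p3", "x3"): 2, ("p1", "h"): 0, ("p3", "h"): 0,
     ("h", "x2"): 1}
leaves = ["x1", "x2", "x3", "x4"]

X, Xp = {"x1", "x3", "x4"}, {"x2", "x4"}
lhs = rpsd(N, X) + rpsd(N, Xp)
# Strong exchange needs some x in X \ X' with lhs <= rPSD(X \ {x}) + rPSD(X' ∪ {x}).
witnesses = [x for x in sorted(X - Xp) if lhs <= rpsd(N, X - {x}) + rpsd(N, Xp | {x})]
print(lhs, witnesses)  # 23 []


def optimal(size):
    """All rPSD-optimal sets of the given size, by exhaustive search."""
    vals = {frozenset(Z): rpsd(N, Z) for Z in combinations(leaves, size)}
    best = max(vals.values())
    return {Z for Z, v in vals.items() if v == best}


opt2, opt3 = optimal(2), optimal(3)
print(any(Y2 <= Y3 for Y2 in opt2 for Y3 in opt3))  # False: the optimal sets are not nested
```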

So, Algorithm 1 cannot be used as it stands to produce $rPSD$ -optimal sets of a given cardinality. In fact, Bordewich et al. (2022) prove that, given a phylogenetic network N on $Σ$ and an integer k, the problem of finding the maximum ${rPSD}_{N}$ value on $P_{k} (Σ)$ is NP-hard. On the positive side, these authors also prove that this problem can be solved in polynomial time on binary galled trees.

A general exchange property

Let $Σ$ be a finite set and $W : P (Σ) \to R_{⩾ 0}$ a function. Given $X, X^{'} \subseteq Σ$ such that $| X^{'} | < | X |$ , a W-improving pair for $X, X^{'}$ is a pair of sets (A, B), with $A \subseteq X \setminus X^{'}$ , $B \subseteq X^{'} \setminus X$ , and $| B | < | A |$ , such that

\begin{matrix} W (X) + W (X^{'}) ⩽ W ((X \setminus A) \cup B) + W ((X^{'} \setminus B) \cup A) . \end{matrix}

To simplify the notation, given $X \subseteq Σ$ , $S \subseteq X$ and $T \subseteq Σ \setminus X$ , we shall denote henceforth $(X \setminus S) \cup T$ by $τ_{S, T} (X)$ .
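Since all sets involved are finite, W-improving pairs can be listed by exhaustive search, which is handy for exploring small examples. A minimal sketch, assuming W is any Python function on sets of labels; here W is rPSD on a hypothetical galled tree whose weights we chose to reproduce the values of Example 1 (not read off Fig. 1), and the names tau and improving_pairs are ours. For $X = \{x_{1}, x_{3}, x_{4}\}$ and $X^{'} = \{x_{2}, x_{4}\}$ , the search returns the single pair $(\{x_{1}, x_{3}\}, \{x_{2}\})$ .

```python
from itertools import combinations


def rpsd(arcs, X):
    """rPSD_N(X): total weight of the arcs whose head is an ancestor of some node in X."""
    parents = {}
    for (u, v) in arcs:
        parents.setdefault(v, []).append(u)
    up, stack = set(X), list(X)
    while stack:
        for u in parents.get(stack.pop(), []):
            if u not in up:
                up.add(u)
                stack.append(u)
    return sum(w for (_, v), w in arcs.items() if v in up)


def tau(X, S, T):
    """τ_{S,T}(X): remove S from X, then add T."""
    return (set(X) - set(S)) | set(T)


def improving_pairs(W, X, Xp):
    """All W-improving pairs (A, B) for X, X' with |X'| < |X|, by exhaustive search."""
    X, Xp = set(X), set(Xp)
    lhs = W(X) + W(Xp)
    only_X, only_Xp = sorted(X - Xp), sorted(Xp - X)
    pairs = []
    for a in range(1, len(only_X) + 1):
        for b in range(min(a, len(only_Xp) + 1)):  # enforces |B| < |A|
            for A in combinations(only_X, a):
                for B in combinations(only_Xp, b):
                    if lhs <= W(tau(X, A, B)) + W(tau(Xp, B, A)):
                        pairs.append((frozenset(A), frozenset(B)))
    return pairs


# Hypothetical galled tree reproducing the rPSD values of Example 1.
N = {("r", "u"): 1, ("r", "x4"): 4, ("u", "p1"): 2, ("u", "p3"): 2,
     ("p1", "x1"): 2, ("p3", "x3"): 2, ("p1", "h"): 0, ("p3", "h"): 0,
     ("h", "x2"): 1}


def W(Z):
    return rpsd(N, Z)


print(improving_pairs(W, {"x1", "x3", "x4"}, {"x2", "x4"}))
```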

Given a set

\begin{matrix} S \subseteq \{(A, B) \in P (Σ)^{2} : A \cap B = \emptyset, | B | < | A |\}, \end{matrix}

we shall say that $W : P (Σ) \to R_{⩾ 0}$ satisfies the exchange property with respect to $S$ when every pair of sets $X, X^{'} \subseteq Σ$ with $| X^{'} | < | X |$ has a W-improving pair in $S$ . So, Steel’s strong exchange property for phylogenetic trees mentioned in §2.2 says that, for every phylogenetic tree T on $Σ$ , ${PD}_{T} : P (Σ) \to R_{⩾ 0}$ satisfies the exchange property with respect to

\begin{matrix} S_{0} (Σ) = \{(\{x\}, \emptyset) : x \in Σ\} . \end{matrix}

As we have seen, this is no longer true for $rPSD$ on galled trees. The main result in this paper, Theorem 1, says that $rPSD$ satisfies, on every semi-d-ary level-k phylogenetic network on $Σ$ , the exchange property with respect to a larger family of pairs of subsets $S_{k, d} (Σ)$ whose description only depends on k and d. These families are, when $k = 1$ ,

\begin{matrix} S_{1, d} (Σ) = S_{0} (Σ) \cup \{(A, B) \in P (Σ)^{2} : A \cap B = \emptyset, 1 ⩽ | B | < | A | ⩽ d\} \end{matrix}

and, when $k ⩾ 2$ ,

\begin{matrix} S_{k, d} (Σ) = S_{0} (Σ) \cup \{(A, B) \in P (Σ)^{2} : A \cap B = \emptyset, 1 ⩽ | B | < | A | < d k, | A | - | B | ⩽ (d - 1) k\} . \end{matrix}

From now on, when there is no need to make the set of labels $Σ$ explicit, we shall omit it from the notation of these families.

Given k and d, the cardinalities of these families of sets are polynomial in $| Σ | = n$ : $| S_{0} | = n$ and

\begin{matrix} | S_{1, d} | = n + \sum_{j = 2}^{d} \sum_{i = 1}^{j - 1} \binom{n}{j} \binom{n - j}{i}, \\ | S_{k, d} | = n + \sum_{j = 2}^{d k - 1} \sum_{i = \max\{1, j - (d - 1) k\}}^{j - 1} \binom{n}{j} \binom{n - j}{i} \quad \text{when } k ⩾ 2 . \end{matrix}
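These counts can be cross-checked against a brute-force enumeration of $S_{k, d}$ for small n. A minimal sketch; the helper names in_S, size_formula and size_enumerated are ours, and the labels are taken to be 0, ..., n−1. The inner sum of the second formula starts at $i = \max\{1, j - (d - 1) k\}$ , because every pair in $S_{k, d} \setminus S_{0}$ has $| B | ⩾ 1$ and $| A | - | B | ⩽ (d - 1) k$ .

```python
from itertools import combinations
from math import comb


def in_S(A, B, k, d):
    """True when the pair (A, B) of label sets belongs to S_{k,d}."""
    A, B = set(A), set(B)
    if A & B:
        return False
    if len(A) == 1 and not B:  # the pairs ({x}, ∅) forming S_0
        return True
    if k == 1:
        return 1 <= len(B) < len(A) <= d
    return 1 <= len(B) < len(A) < d * k and len(A) - len(B) <= (d - 1) * k


def C(a, b):
    """Binomial coefficient, set to 0 outside the range 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0


def size_formula(n, k, d):
    """|S_{k,d}| from the closed formulas."""
    if k == 1:
        return n + sum(C(n, j) * C(n - j, i) for j in range(2, d + 1) for i in range(1, j))
    # j = |A| runs over 2 .. dk-1 and i = |B| over max(1, j - (d-1)k) .. j-1
    return n + sum(C(n, j) * C(n - j, i)
                   for j in range(2, d * k)
                   for i in range(max(1, j - (d - 1) * k), j))


def size_enumerated(n, k, d):
    """|S_{k,d}| by listing every disjoint pair (A, B) with |B| < |A|."""
    labels = range(n)
    count = 0
    for a in range(1, n + 1):
        for A in combinations(labels, a):
            rest = [x for x in labels if x not in A]
            for b in range(a):
                count += sum(in_S(A, B, k, d) for B in combinations(rest, b))
    return count


for k, d in [(1, 2), (1, 3), (2, 2), (2, 3), (3, 2)]:
    print(k, d, [(size_formula(n, k, d), size_enumerated(n, k, d)) for n in range(1, 7)])
```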

As we announced above, the main result in this section is the following theorem. Since its proof is quite long and technical, in order not to lose the thread of the manuscript we postpone it until Appendix A at the end of the paper.

Theorem 1

If N is a semi-d-ary level-k phylogenetic network, ${rPSD}_{N}$ satisfies the exchange property with respect to $S_{k, d}$ .

The family $S_{k, d}$ cannot be improved, because there are semi-d-ary level-k phylogenetic networks N and pairs of sets of leaves $X, X^{'}$ with $| X^{'} | < | X |$ having no ${rPSD}_{N}$ -improving pair (A, B) with $| A | - | B | < (d - 1) k$ . The next example describes one such network for $d = 2$ ; it is straightforward to generalize it to the semi-d-ary setting for any $d ⩾ 2$ .

Example 2

Consider the binary level-k phylogenetic network N on $Σ = \{y, x_{1}, \dots, x_{k}\}$ depicted in Fig. 2. Assume that all its arcs e have weight $w (e) > 0$ .

Let $X = \{x_{1}, \dots, x_{k}\}$ and $X^{'} = \{y\}$ . Let us check that, for every (A, B) such that $A \subseteq X$ , $B \subseteq X^{'}$ , and $| B | < | A |$ ,

\begin{matrix} rPSD (X) + rPSD (X^{'}) ⩾ rPSD (τ_{A, B} (X)) + rPSD (τ_{B, A} (X^{'})) \end{matrix}

and that the equality holds only when $(A, B) = (X, X^{'})$ . This will imply that the only $rPSD$ -improving pair for $X, X^{'}$ in $S_{k, 2}$ is $(X, X^{'})$ itself.

Let:

$E_{0}$ be the arcs in $↑ \{v_{1}, \dots, v_{k}\}$ ; that is, $(r, a_{1})$ and those beginning in $\{a_{1}, \dots, a_{k - 1}\}$ .
$E_{1} = E (N) \setminus (E_{0} \cup \{e_{i}\}_{i = 1, \dots, k})$ ; that is, the arcs ending in $\{H_{1}, H_{2}, \dots, H_{k}, y\}$ .

Then,

\begin{matrix} rPSD (X) = \sum_{i = 1}^{k} w (e_{i}) + \sum_{e \in E_{0}} w (e), rPSD (X^{'}) = \sum_{e \in E_{0} \cup E_{1}} w (e) . \end{matrix}

Now, on the one hand, if $B = \emptyset$ and $A \neq \emptyset$ ,

\begin{matrix} rPSD (τ_{A, \emptyset} (X)) = rPSD (X \setminus A) = \sum_{x_{i} \notin A} w (e_{i}) + \sum_{e \in E_{0} \cap ↑ (X \setminus A)} w (e), \\ rPSD (τ_{\emptyset, A} (X^{'})) = rPSD (X^{'} \cup A) = rPSD (X^{'}) + \sum_{x_{i} \in A} w (e_{i}) \end{matrix}

and then

\begin{matrix} rPSD (X) + rPSD (X^{'}) - (rPSD (τ_{A, \emptyset} (X)) + rPSD (τ_{\emptyset, A} (X^{'}))) \\ = \sum_{e \in E_{0}} w (e) - \sum_{e \in E_{0} \cap ↑ (X \setminus A)} w (e) > 0 \end{matrix}

because for every $x_{i} \in A$ the arc $(a_{i}, v_{i})$ (or $(a_{k - 1}, v_{k})$ if $i = k$ ) does not belong to $↑ (X \setminus A)$ and therefore $E_{0} \cap ↑ (X \setminus A) ⊊ E_{0}$ .

On the other hand, if $B = X^{'} = \{y\}$ ,

\begin{matrix} rPSD (τ_{A, \{y\}} (X)) = rPSD ((X \setminus A) \cup \{y\}) = \sum_{x_{i} \notin A} w (e_{i}) + rPSD (X^{'}), \\ rPSD (τ_{\{y\}, A} (X^{'})) = rPSD (A) = \sum_{x_{i} \in A} w (e_{i}) + \sum_{e \in E_{0} \cap ↑ A} w (e) \end{matrix}

and then

\begin{matrix} rPSD (X) + rPSD (X^{'}) - (rPSD (τ_{A, \{y\}} (X)) + rPSD (τ_{\{y\}, A} (X^{'}))) \\ = \sum_{e \in E_{0}} w (e) - \sum_{e \in E_{0} \cap ↑ A} w (e) ⩾ 0, \end{matrix}

where, arguing as above, the inequality is an equality exactly when $A = X$ .

We close this section with a refinement of Theorem 1 for level-1 networks. The proof is similar, and we provide it in Section 2 of the Supplementary file.

Corollary 1

If N is a semi-d-ary level-1 phylogenetic network on $Σ$ , ${rPSD}_{N}$ satisfies the exchange property with respect to

\begin{matrix} S_{d} = S_{0} \cup \{(A, \{b\}) \in P (Σ)^{2} : b \notin A, 1 < | A | ⩽ d\} \end{matrix}

Moreover, if $X, X^{'}$ have an improving pair $(A, \{b\}) \in S_{d}$ , then there exists a blob in N with exit reticulation H and split node v such that $X \cap C (H) = \emptyset$ , $b \in C (H)$ , and $A \subseteq C (v)$ .

Applications

In this section we apply Theorem 1 to the study of ${rPSD}_{N}$ -optimal subsets for low values of the level of N and the in-degree of its reticulations. Throughout this section, let N be a phylogenetic network on a set $Σ$ of cardinality n and $rPSD = {rPSD}_{N}$ . We shall use the following notation:

For every m, let ${Opt}_{m}$ be the family of $rPSD$ -optimal subsets of $Σ$ of cardinality m:
$\begin{matrix} {Opt}_{m} = \{Z \in P_{m} (Σ) : rPSD (Z) = \max (rPSD (P_{m} (Σ)))\} . \end{matrix}$
An optimal sequence of N is a sequence $Y = {(Y_{m})}_{0 ⩽ m ⩽ n}$ with each $Y_{m} \in {Opt}_{m}$ .
For every $k ⩾ 1$ and $d ⩾ 2$ , for every $1 ⩽ j ⩽ (d - 1) k$ , and for every $X \in P (Σ)$ ,
- $τ_{k, d, j} (X)$ is the family of subsets of $Σ$ of cardinality $| X | + j$ of the form $τ_{B, A} (X)$ (that is, $(X \setminus B) \cup A$ ) with $(A, B) \in S_{k, d}$ , $B \subseteq X$ , $A \subseteq Σ \setminus X$ , and $| A | - | B | = j$ .
- $Opt- τ_{k, d, j} (X)$ are the members of $τ_{k, d, j} (X)$ with largest $rPSD$ value.
and, analogously,
- $τ_{k, d, j}^{- 1} (X)$ is the family of subsets of $Σ$ of cardinality $| X | - j$ of the form $τ_{A, B} (X)$ (that is, $(X \setminus A) \cup B$ ) with $(A, B) \in S_{k, d}$ , $A \subseteq X$ , $B \subseteq Σ \setminus X$ , and $| A | - | B | = j$ .
- $Opt- τ_{k, d, j}^{- 1} (X)$ are the members of $τ_{k, d, j}^{- 1} (X)$ with largest $rPSD$ value.
Notice that $X^{'} \in τ_{k, d, j} (X)$ if, and only if, $X \in τ_{k, d, j}^{- 1} (X^{'})$ .
Finally, for every $k ⩾ 1$ and $d ⩾ 2$ , for every $1 ⩽ j ⩽ (d - 1) k$ , we describe the family of subsets of $Σ$ of cardinality $m + j$ (resp. $m - j$ ) of the form $τ_{B, A} (Y)$ (resp. $τ_{A, B} (Y)$ ) with $(A, B) \in S_{k, d}$ , $| A | - | B | = j$ , with largest $rPSD$ value obtained from each $Y \in {Opt}_{m}$ :
- $Opt- τ_{k, d, j} ({Opt}_{m}) = ⋃_{Y \in {Opt}_{m}} Opt- τ_{k, d, j} (Y)$ .
- $Opt- τ_{k, d, j}^{- 1} ({Opt}_{m}) = ⋃_{Y \in {Opt}_{m}} Opt- τ_{k, d, j}^{- 1} (Y)$ .
The aim of this section will be to relate each $Opt- τ_{k, d, j} ({Opt}_{m})$ with ${Opt}_{m + j}$ and $Opt- τ_{k, d, j}^{- 1} ({Opt}_{m})$ with ${Opt}_{m - j}$ , providing the key ingredient of the greedy algorithm.
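For small networks the families $τ_{k, d, j} (X)$ and $Opt- τ_{k, d, j} (X)$ can be generated directly from their definitions. A minimal sketch, assuming the dict-of-weighted-arcs representation and using, as W, rPSD on a hypothetical galled tree whose weights we chose to reproduce the values of Example 1; the function names are ours. For $X = \{x_{2}, x_{4}\}$ , $τ_{1, 2, 1} (X)$ has four members and its only best member is $\{x_{1}, x_{3}, x_{4}\}$ .

```python
from itertools import combinations


def rpsd(arcs, X):
    """rPSD_N(X): total weight of the arcs whose head is an ancestor of some node in X."""
    parents = {}
    for (u, v) in arcs:
        parents.setdefault(v, []).append(u)
    up, stack = set(X), list(X)
    while stack:
        for u in parents.get(stack.pop(), []):
            if u not in up:
                up.add(u)
                stack.append(u)
    return sum(w for (_, v), w in arcs.items() if v in up)


def in_S(A, B, k, d):
    """True when the pair (A, B) of label sets belongs to S_{k,d}."""
    A, B = set(A), set(B)
    if A & B:
        return False
    if len(A) == 1 and not B:  # S_0
        return True
    if k == 1:
        return 1 <= len(B) < len(A) <= d
    return 1 <= len(B) < len(A) < d * k and len(A) - len(B) <= (d - 1) * k


def tau_family(X, leaves, k, d, j):
    """τ_{k,d,j}(X): sets (X minus B) ∪ A with (A, B) in S_{k,d}, B ⊆ X,
    A disjoint from X and |A| - |B| = j."""
    X = frozenset(X)
    outside = sorted(set(leaves) - X)
    family = set()
    for b in range(len(X) + 1):
        for B in combinations(sorted(X), b):
            for A in combinations(outside, b + j):
                if in_S(A, B, k, d):
                    family.add((X - set(B)) | set(A))
    return family


def opt_tau(X, leaves, k, d, j, W):
    """Opt-τ_{k,d,j}(X): the members of τ_{k,d,j}(X) with largest W value."""
    family = tau_family(X, leaves, k, d, j)
    if not family:
        return set()
    best = max(W(Y) for Y in family)
    return {Y for Y in family if W(Y) == best}


# Hypothetical galled tree reproducing the rPSD values of Example 1.
N = {("r", "u"): 1, ("r", "x4"): 4, ("u", "p1"): 2, ("u", "p3"): 2,
     ("p1", "x1"): 2, ("p3", "x3"): 2, ("p1", "h"): 0, ("p3", "h"): 0,
     ("h", "x2"): 1}
leaves = {"x1", "x2", "x3", "x4"}


def W(Z):
    return rpsd(N, Z)


print(opt_tau({"x2", "x4"}, leaves, 1, 2, 1, W))
```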

We begin with galled trees. As we have already mentioned, it was proved in Bordewich et al. (2022, Cor 4.6) that the optimization problem for $rPSD$ can be solved in polynomial time on galled trees. The next proposition strengthens this result by providing a recursive construction of the $rPSD$ -optimal sets for these networks.

Proposition 1

Let N be a galled tree. Then, for every $m = 1, \dots, n$ ,

\begin{matrix} {Opt}_{m} = Opt- τ_{1, 2, 1} ({Opt}_{m - 1}) . \end{matrix}

Proof

Let $Y_{m} \in {Opt}_{m}$ and $Y_{m - 1} \in {Opt}_{m - 1}$ . By Theorem 1, there exists some $(A, B) \in S_{1, 2}$ , with $A \subseteq Y_{m} \setminus Y_{m - 1}$ and $B \subseteq Y_{m - 1} \setminus Y_{m}$ , such that

\begin{matrix} rPSD (Y_{m}) + rPSD (Y_{m - 1}) ⩽ rPSD (τ_{A, B} (Y_{m})) + rPSD (τ_{B, A} (Y_{m - 1})) . \end{matrix} \qquad (1)

Since every pair in $S_{1, 2}$ satisfies $| A | - | B | = 1$ , we have that $τ_{A, B} (Y_{m}) \in P_{m - 1} (Σ)$ and $τ_{B, A} (Y_{m - 1}) \in P_{m} (Σ)$ , and then, since $Y_{m - 1}$ and $Y_{m}$ are optimal in $P_{m - 1} (Σ)$ and $P_{m} (Σ)$ , respectively,

\begin{matrix} rPSD (τ_{A, B} (Y_{m})) ⩽ rPSD (Y_{m - 1}), rPSD (τ_{B, A} (Y_{m - 1})) ⩽ rPSD (Y_{m}) . \end{matrix}

Combining these inequalities with (1) we obtain

\begin{matrix} \begin{matrix} rPSD (Y_{m}) + rPSD (Y_{m - 1}) & ⩽ rPSD (τ_{A, B} (Y_{m})) + rPSD (τ_{B, A} (Y_{m - 1})) \\ ⩽ rPSD (Y_{m - 1}) + rPSD (Y_{m}) . \end{matrix} \end{matrix}

Then, the inequalities (2) must be equalities, from which we deduce that:

$τ_{A, B} (Y_{m}) \in {Opt}_{m - 1}$ , and thus $Y_{m} = τ_{B, A} (τ_{A, B} (Y_{m})) \in Opt- τ_{1, 2, 1} ({Opt}_{m - 1})$ .
$τ_{B, A} (Y_{m - 1}) \in {Opt}_{m}$ , and thus $Opt- τ_{1, 2, 1} (Y_{m - 1}) \subseteq {Opt}_{m}$ .

Since the choice of the optimal sets $Y_{m}, Y_{m - 1}$ was arbitrary, we conclude that

\begin{matrix} {Opt}_{m} \subseteq Opt- τ_{1, 2, 1} ({Opt}_{m - 1}) \text{ and } Opt- τ_{1, 2, 1} ({Opt}_{m - 1}) \subseteq {Opt}_{m} \end{matrix}

as stated. $□$

Remark 1

Notice that along the proof of the previous proposition we have proved that, in a galled tree, for every $Y_{m} \in {Opt}_{m}$ and $Y_{m - 1} \in {Opt}_{m - 1}$ , there exists some pair $(A, B) \in S_{1, 2}$ , with $A \subseteq Y_{m} \setminus Y_{m - 1}$ and $B \subseteq Y_{m - 1} \setminus Y_{m}$ , such that $τ_{A, B} (Y_{m}) \in {Opt}_{m - 1}$ and $τ_{B, A} (Y_{m - 1}) \in {Opt}_{m}$ .

Proposition 1 implies that, on a galled tree, the members of ${Opt}_{m}$ are those obtained from members of ${Opt}_{m - 1}$ by either optimally adding a leaf or optimally replacing a leaf by two leaves. This result yields the simple polynomial-time greedy Algorithm 2, which computes the families of optimal sets ${Opt}_{m}$ in increasing order of m and extends the greedy Algorithm 1 for phylogenetic trees.
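The recursion of Proposition 1 translates into a short program. The sketch below is our own illustration rather than a transcription of Algorithm 2: it assumes the network is a galled tree stored as a dict of weighted arcs, recomputes rPSD from scratch for every candidate (simple but not efficient), and is run on a hypothetical galled tree whose weights we chose to reproduce the values of Example 1. The function names are ours, and each family ${Opt}_{m}$ may be large when there are many ties.

```python
from itertools import combinations


def rpsd(arcs, X):
    """rPSD_N(X): total weight of the arcs whose head is an ancestor of some node in X."""
    parents = {}
    for (u, v) in arcs:
        parents.setdefault(v, []).append(u)
    up, stack = set(X), list(X)
    while stack:
        for u in parents.get(stack.pop(), []):
            if u not in up:
                up.add(u)
                stack.append(u)
    return sum(w for (_, v), w in arcs.items() if v in up)


def optimal_families_galled(arcs, leaves):
    """Opt_0, ..., Opt_n computed as Opt_m = Opt-τ_{1,2,1}(Opt_{m-1}) (Proposition 1).
    Only valid when the network is a galled tree."""
    leaves = frozenset(leaves)
    opt = [{frozenset()}]
    for _ in range(len(leaves)):
        nxt = set()
        for Y in opt[-1]:
            outside = sorted(leaves - Y)
            # τ_{1,2,1}(Y): add one leaf, or replace one leaf of Y by two leaves not in Y.
            cands = [Y | {a} for a in outside]
            cands += [(Y - {b}) | {a1, a2}
                      for b in Y for a1, a2 in combinations(outside, 2)]
            best = max(rpsd(arcs, Z) for Z in cands)
            nxt |= {Z for Z in cands if rpsd(arcs, Z) == best}  # Opt-τ_{1,2,1}(Y)
        opt.append(nxt)
    return opt


def brute_force_opt(arcs, leaves, m):
    """Opt_m by exhaustive search, used only for checking."""
    vals = {frozenset(Z): rpsd(arcs, Z) for Z in combinations(sorted(leaves), m)}
    best = max(vals.values())
    return {Z for Z, v in vals.items() if v == best}


# Hypothetical galled tree reproducing the rPSD values of Example 1.
N = {("r", "u"): 1, ("r", "x4"): 4, ("u", "p1"): 2, ("u", "p3"): 2,
     ("p1", "x1"): 2, ("p3", "x3"): 2, ("p1", "h"): 0, ("p3", "h"): 0,
     ("h", "x2"): 1}
leaves = {"x1", "x2", "x3", "x4"}
opt = optimal_families_galled(N, leaves)
print(all(opt[m] == brute_force_opt(N, leaves, m) for m in range(len(leaves) + 1)))
```

On this network the step from ${Opt}_{2} = \{\{x_{2}, x_{4}\}\}$ to ${Opt}_{3} = \{\{x_{1}, x_{3}, x_{4}\}\}$ is exactly a replacement of one leaf by two.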

Remark 2

Proposition 1 also implies that, on a galled tree, the members of each ${Opt}_{m}$ are obtained from members of ${Opt}_{m + 1}$ by removing a leaf or replacing a pair of leaves by a leaf in such a way that the value of $rPSD$ decreases the least.

To move up in the complexity ladder of phylogenetic networks, it is convenient to introduce a notation that allows a more compact description of the arguments of the type used in the previous proposition. Given a semi-d-ary level-k phylogenetic network N and an optimal sequence $Y = {(Y_{p})}_{0 ⩽ p ⩽ n}$ of it, we shall write, for every $0 ⩽ q < p ⩽ n$ and for every $j ⩾ 1$ ,

\begin{matrix} (p, q) ≺ \cdot^{Y} (p - j, q + j) \end{matrix}

to mean that there exists an $rPSD$ -improving pair $(A, B) \in S_{k, d}$ for $Y_{p}$ and $Y_{q}$ such that $| A | - | B | = j$ . When we need to emphasize an improving pair (A, B), we shall write “ $(p, q) ≺ \cdot^{Y} (p - j, q + j)$ by an improving pair (A, B)”. In addition, we shall write $(p, q) ≺ \cdot_{j}^{Y} \{p^{'}, q^{'}\}$ to mean that $(p, q) ≺ \cdot^{Y} (p - j, q + j)$ and $\{p - j, q + j\} = \{p^{'}, q^{'}\}$ .

Remark 3

By Theorem 1, given any optimal sequence Y of a semi-d-ary level-k phylogenetic network and $0 ⩽ q < p$ , there always exists some $1 ⩽ j ⩽ (d - 1) k$ such that $(p, q) ≺ \cdot^{Y} (p - j, q + j)$ .

The proof of the next lemma, which we leave to the reader, is similar to that of Proposition 1; actually, that proposition is a direct consequence of this lemma for $j = 1$ .

Lemma 1

Let N be a phylogenetic network and Y an optimal sequence of N. If $(p, q) ≺ \cdot^{Y} (p - j, q + j)$ and $rPSD (Y_{p - j}) + rPSD (Y_{q + j}) ⩽ rPSD (Y_{p}) + rPSD (Y_{q})$ , then $Y_{p} \in Opt- τ_{k, d, j} ({Opt}_{p - j})$ and $Y_{q} \in Opt- τ_{k, d, j}^{- 1} ({Opt}_{q + j})$ .

In particular, if $p - q = j$ and $(p, q) ≺ \cdot^{Y} (q, p)$ , then $Y_{p} \in Opt- τ_{k, d, j} ({Opt}_{q})$ and $Y_{q} \in Opt- τ_{k, d, j}^{- 1} ({Opt}_{p})$ .

Corollary 2

Let N be a phylogenetic network and Y an optimal sequence of N. If there exists a closed $≺ \cdot^{Y}$ -chain of length $m ⩾ 1$

\begin{matrix} (p_{1}, q_{1}) ≺ \cdot_{j_{1}}^{Y} \{p_{2}, q_{2}\} \text{ by an improving pair } (A_{1}, B_{1}) \\ (p_{2}, q_{2}) ≺ \cdot_{j_{2}}^{Y} \{p_{3}, q_{3}\} \text{ by an improving pair } (A_{2}, B_{2}) \\ ⋮ \\ (p_{m}, q_{m}) ≺ \cdot_{j_{m}}^{Y} \{p_{1}, q_{1}\} \text{ by an improving pair } (A_{m}, B_{m}) \end{matrix}

then, for each $i = 1, \dots, m$ ,

\begin{matrix} Y_{p_{i}} \in Opt- τ_{k, d, j_{i}} ({Opt}_{p_{i} - j_{i}}) \text{ and } Y_{q_{i}} \in Opt- τ_{k, d, j_{i}}^{- 1} ({Opt}_{q_{i} + j_{i}}) . \end{matrix}

Proof

The closed chain ensures that all the inequalities in

\begin{matrix} rPSD (Y_{p_{1}}) + rPSD (Y_{q_{1}}) ⩽ rPSD (τ_{A_{1}, B_{1}} (Y_{p_{1}})) + rPSD (τ_{B_{1}, A_{1}} (Y_{q_{1}})) \\ ⩽ rPSD (Y_{p_{2}}) + rPSD (Y_{q_{2}}) ⩽ rPSD (τ_{A_{2}, B_{2}} (Y_{p_{2}})) + rPSD (τ_{B_{2}, A_{2}} (Y_{q_{2}})) \\ ⩽ rPSD (Y_{p_{3}}) + rPSD (Y_{q_{3}}) ⩽ rPSD (τ_{A_{3}, B_{3}} (Y_{p_{3}})) + rPSD (τ_{B_{3}, A_{3}} (Y_{q_{3}})) \\ ⋮ \\ ⩽ rPSD (Y_{p_{m}}) + rPSD (Y_{q_{m}}) ⩽ rPSD (τ_{A_{m}, B_{m}} (Y_{p_{m}})) + rPSD (τ_{B_{m}, A_{m}} (Y_{q_{m}})) \\ ⩽ rPSD (Y_{p_{1}}) + rPSD (Y_{q_{1}}), \end{matrix}

are equalities, and the result follows by applying Lemma 1 to each $(p_{i}, q_{i}) ≺ \cdot^{Y} (p_{i} - j_{i}, q_{i} + j_{i})$. $□$

It is time to move one step up in the complexity ladder of phylogenetic networks. Recall that

\begin{matrix} S_{2, 2} = S_{1, 3} = S_{0} \cup \{(A, B) \in P (Σ)^{2} : A \cap B = \emptyset,\ 1 ⩽ | B | < | A | ⩽ 3\} \end{matrix}

and in particular, for every $j = 1, 2$ , $Opt- τ_{1, 3, j} = Opt- τ_{2, 2, j}$ .
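To make the size constraints concrete, the short Python sketch below enumerates the pairs of cardinalities $(| A |, | B |)$ allowed in $S_{k, d}$ for $k ⩾ 1$, reading them off the statement of Lemma 3 in Appendix A ($B = \emptyset$ and $| A | = 1$; or $0 < | B | < | A | = d$; or $0 < | B | < | A | < d k$ and $| A | - | B | ⩽ (d - 1) k$). The function names are hypothetical, and the sketch only tracks cardinalities, not the leaf sets themselves. It reproduces the display above and the bound $j ⩽ (d - 1) k$ on the shift $j = | A | - | B |$.

```python
from __future__ import annotations


def size_signatures(k: int, d: int) -> set[tuple[int, int]]:
    """Pairs (|A|, |B|) allowed in S_{k,d}, read off the statement of Lemma 3 (k >= 1, d >= 2)."""
    sigs = {(1, 0)}                        # S_0: move a single leaf, B empty
    sigs |= {(d, b) for b in range(1, d)}  # 0 < |B| < |A| = d
    for a in range(2, d * k):              # 0 < |B| < |A| < dk ...
        for b in range(1, a):
            if a - b <= (d - 1) * k:       # ... and |A| - |B| <= (d - 1)k
                sigs.add((a, b))
    return sigs


def shifts(k: int, d: int) -> set[int]:
    """Possible values of the shift j = |A| - |B|."""
    return {a - b for a, b in size_signatures(k, d)}


if __name__ == "__main__":
    print(sorted(size_signatures(2, 2)))                    # [(1, 0), (2, 1), (3, 1), (3, 2)]
    print(size_signatures(2, 2) == size_signatures(1, 3))   # True
    print(size_signatures(1, 4) <= size_signatures(3, 2))   # True
```

The last line checks, at the level of cardinalities, the inclusion $S_{1, 4} \subseteq S_{3, 2}$ used in the proof of Proposition 3; since both families impose the same disjointness condition $A \cap B = \emptyset$, inclusion of the size signatures suffices.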

Proposition 2

If N is a semibinary level-2 or a semi-3-ary level-1 network, then:

(a) ${Opt}_{m} \subseteq Opt- τ_{2, 2, 1} ({Opt}_{m - 1}) \cup Opt- τ_{2, 2, 2} ({Opt}_{m - 2})$ for every $m = 1, \dots, n$.
(b) ${Opt}_{m} \subseteq Opt- τ_{2, 2, 1}^{- 1} ({Opt}_{m + 1}) \cup Opt- τ_{2, 2, 2}^{- 1} ({Opt}_{m + 2})$ for every $m = 1, \dots, n - 1$.

Proof

Let Y be an optimal sequence of N and fix $1 ⩽ m ⩽ n$ . Then, by Theorem 1,

\begin{matrix} (m, m - 1) ≺ \cdot^{Y} (m - j_{1}, m - 1 + j_{1}) \qquad (3) \end{matrix}

for some $j_{1} \in \{1, 2\}$.

(1) If $j_{1} = 1$, Eqn. (3) says that $(m, m - 1) ≺ \cdot^{Y} (m - 1, m)$, and hence, by Corollary 2,
$Y_{m} \in Opt- τ_{2, 2, 1} ({Opt}_{m - 1}) \quad \text{and} \quad Y_{m - 1} \in Opt- τ_{2, 2, 1}^{- 1} ({Opt}_{m}) .$
(2) If $j_{1} = 2$, Eqn. (3) says that $(m, m - 1) ≺ \cdot^{Y} (m - 2, m + 1)$. Applying Theorem 1 again,
$(m + 1, m - 2) ≺ \cdot^{Y} (m + 1 - j_{2}, m - 2 + j_{2}),$
for some $j_{2} \in \{1, 2\}$. In both cases, $\{m + 1 - j_{2}, m - 2 + j_{2}\} = \{m - 1, m\}$, thus closing the $≺ \cdot$-chain initiated with (3). Then, by Corollary 2,
$Y_{m} \in Opt- τ_{2, 2, 2} ({Opt}_{m - 2}) \quad \text{and} \quad Y_{m - 1} \in Opt- τ_{2, 2, 2}^{- 1} ({Opt}_{m + 1}) .$

Thus, in both cases we have that

\begin{matrix} \begin{matrix} Y_{m} \in Opt- τ_{2, 2, 1} ({Opt}_{m - 1}) \cup Opt- τ_{2, 2, 2} ({Opt}_{m - 2}), \\ Y_{m - 1} \in Opt- τ_{2, 2, 1}^{- 1} ({Opt}_{m}) \cup Opt- τ_{2, 2, 2}^{- 1} ({Opt}_{m + 1}), \end{matrix} \end{matrix}

which, by the arbitrary choice of Y and m, concludes the proof. $□$

Point (a) in the last proposition tells us that, if N is semibinary level-2 or semi-3-ary level-1, every member of each ${Opt}_{m}$ is obtained either from a member of ${Opt}_{m - 1}$ by optimally adding a leaf, optimally replacing a leaf by a pair of leaves, or optimally replacing a pair of leaves by a triple of leaves (by Corollary 1, this last possibility need not be considered in the semi-3-ary level-1 case), or from a member of ${Opt}_{m - 2}$ by optimally replacing a leaf by a triple of leaves. This proves the correctness of the polynomial-time greedy Algorithm 3, which computes the family of optimal sets ${Opt}_{m}$ for such a network N in increasing order of m (as mentioned above, if N is semi-3-ary level-1, the sets $M^{(4)}$ in the loop need not be computed).

Algorithm 3 — Greedy for semibinary level-2 or semi-3-ary level-1 networks
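As a rough illustration of the strategy justified by Proposition 2(a), and not a reproduction of the authors' Algorithm 3, the Python sketch below builds each ${Opt}_{m}$ by maximizing rPSD over all sets obtained from members of ${Opt}_{m - 1}$ through exchanges with $| A | - | B | = 1$ and from members of ${Opt}_{m - 2}$ through exchanges with $| A | - | B | = 2$, restricted to the cardinalities allowed in $S_{2, 2}$. It assumes that ${rPSD}_{N} (Z)$ is the total weight of the arcs lying on some path from the root to a leaf in Z, which is consistent with the decompositions used in the proof of Theorem 1. The arc-list encoding, the toy network (with a single reticulation h of in-degree 2), and all function names are hypothetical; an exhaustive search is included for comparison.

```python
from itertools import combinations

# Hypothetical toy network: arcs (tail, head, weight); "h" is a reticulation of in-degree 2.
ARCS = [("r", "u", 1), ("r", "v", 2), ("r", "x5", 4), ("u", "h", 1), ("v", "h", 3),
        ("u", "x1", 2), ("v", "x2", 1), ("h", "w", 1), ("w", "x3", 2), ("w", "x4", 1)]
LEAVES = frozenset({"x1", "x2", "x3", "x4", "x5"})

# Size pairs (|A|, |B|) allowed in S_{2,2}, grouped by the shift j = |A| - |B|.
SIGNATURES = {1: [(1, 0), (2, 1), (3, 2)], 2: [(3, 1)]}


def rpsd(arcs, Z):
    """Assumed rPSD: total weight of the arcs whose head is an ancestor of (or lies in) Z."""
    parents = {}
    for tail, head, _ in arcs:
        parents.setdefault(head, []).append(tail)
    seen, stack = set(Z), list(Z)
    while stack:  # reverse DFS from Z collects every ancestor
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return sum(wt for _, head, wt in arcs if head in seen)


def exchanges(X, leaves, j):
    """Sets (X minus B) plus A with A outside X, B inside X, and (|A|, |B|) of shift j."""
    outside, inside = sorted(leaves - X), sorted(X)
    for a, b in SIGNATURES[j]:
        for A in combinations(outside, a):
            for B in combinations(inside, b):
                yield (X - set(B)) | set(A)


def greedy_opt(arcs, leaves):
    """Greedy sketch: candidates for Opt_m come from Opt_{m-1} (j = 1) and Opt_{m-2} (j = 2)."""
    opt = {0: {frozenset()}}
    for m in range(1, len(leaves) + 1):
        candidates = {Z for j in (1, 2) if m - j >= 0
                      for X in opt[m - j] for Z in exchanges(X, leaves, j)}
        best = max(rpsd(arcs, Z) for Z in candidates)
        opt[m] = {Z for Z in candidates if rpsd(arcs, Z) == best}
    return opt


def brute_opt(arcs, leaves, m):
    """Exhaustive reference: all m-subsets of leaves with maximum rPSD."""
    subsets = [frozenset(Z) for Z in combinations(sorted(leaves), m)]
    best = max(rpsd(arcs, Z) for Z in subsets)
    return {Z for Z in subsets if rpsd(arcs, Z) == best}


if __name__ == "__main__":
    opt = greedy_opt(ARCS, LEAVES)
    for m in range(1, len(LEAVES) + 1):
        print(m, [sorted(Z) for Z in opt[m]], opt[m] == brute_opt(ARCS, LEAVES, m))
```

With only five leaves, every m-subset is reachable from any member of ${Opt}_{m - 1}$ by one of these exchanges, so this toy run only checks the code against brute force; the content of Proposition 2 lies in larger networks, where the candidate family is much smaller than the family of all m-subsets.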

Example 3

Consider the phylogenetic networks in Fig. 3: on the left, a semi-3-ary level-1 network; on the right, a semibinary level-2 network obtained by blowing up each reticulation of the left-hand network into a pair of connected reticulations of in-degree 2. In both networks, we have the following optimal sets of leaves:

\begin{matrix} {Opt}_{1} : \{z_{0}\} & {Opt}_{5} : \{x_{00}, x_{01}, x_{02}, x_{11}, x_{12}\} \\ {Opt}_{2} : \{z_{0}, z_{1}\} & {Opt}_{6} : \{x_{00}, x_{01}, x_{02}, x_{10}, x_{11}, x_{12}\} \\ {Opt}_{3} : \{x_{11}, x_{12}, z_{0}\} & {Opt}_{7} : \{x_{00}, x_{01}, x_{02}, x_{10}, x_{11}, x_{12}, z_{1}\} \\ {Opt}_{4} : \{x_{00}, x_{01}, x_{02}, z_{1}\} & \end{matrix}

Then, in both networks,

\begin{matrix} \{x_{00}, x_{01}, x_{02}, z_{1}\} \in {Opt}_{4} \setminus Opt- τ_{2, 2, 1} ({Opt}_{3}), \quad \{x_{11}, x_{12}, z_{0}\} \in {Opt}_{3} \setminus Opt- τ_{2, 2, 1}^{- 1} ({Opt}_{4}) . \end{matrix}
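Example 3 can be double-checked at the level of cardinalities. In the Python sketch below (function names hypothetical), the exchange turning a set X into a set Y adds $A = Y \setminus X$ and removes $B = X \setminus Y$, and we test whether $(| A |, | B |)$ is one of the pairs allowed in $S_{2, 2}$. The sketch does not recompute rPSD, since the networks of Fig. 3 are not encoded here, so it only checks the necessary size condition for membership in $Opt- τ_{2, 2, j}$.

```python
# Optimal sets listed in Example 3 (one member of each Opt_m).
OPT = {
    1: {"z0"},
    2: {"z0", "z1"},
    3: {"x11", "x12", "z0"},
    4: {"x00", "x01", "x02", "z1"},
    5: {"x00", "x01", "x02", "x11", "x12"},
    6: {"x00", "x01", "x02", "x10", "x11", "x12"},
    7: {"x00", "x01", "x02", "x10", "x11", "x12", "z1"},
}

# (|A|, |B|) pairs allowed in S_{2,2} = S_{1,3}.
S22 = {(1, 0), (2, 1), (3, 1), (3, 2)}


def exchange_signature(X, Y):
    """Sizes (|A|, |B|) of the unique exchange turning X into Y: add A = Y - X, remove B = X - Y."""
    return len(Y - X), len(X - Y)


def reachable(X, Y):
    """True if Y arises from X by one exchange whose size signature lies in S_{2,2}."""
    return exchange_signature(X, Y) in S22


if __name__ == "__main__":
    for m in range(2, 8):
        for j in (1, 2):
            if m - j >= 1:
                print(f"Opt_{m - j} -> Opt_{m}:",
                      exchange_signature(OPT[m - j], OPT[m]), reachable(OPT[m - j], OPT[m]))
```

Running it shows that the listed member of ${Opt}_{4}$ differs from that of ${Opt}_{3}$ by an exchange with sizes (4, 3), which is not allowed in $S_{2, 2}$, whereas it arises from the member of ${Opt}_{2}$ by replacing $z_{0}$ with $\{x_{00}, x_{01}, x_{02}\}$, an exchange with $j = 2$.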

If we move one step further up the complexity ladder, however, the structure of the optimal sets is no longer so simple.

Proposition 3

If N is a semibinary level-3 or a semi-4-ary level-1 network, then, for every $m = 1, \dots, n$ , at least one of the following assertions is true:

(a) ${Opt}_{m} \subseteq ⋃_{j = 1}^{3} Opt- τ_{k, d, j} ({Opt}_{m - j})$ and ${Opt}_{m - 1} \subseteq ⋃_{j = 1}^{3} Opt- τ_{k, d, j}^{- 1} ({Opt}_{m - 1 + j})$.
(b) ${Opt}_{m + 1} = Opt- τ_{k, d, 3} ({Opt}_{m - 2})$,

where (k, d) is (3, 2) or (1, 4), depending on the type of network.

Proof

To begin with, notice that

\begin{aligned} S_{3, 2} & = S_{0} \cup \{(A, B) \in P (Σ)^{2} : A \cap B = \emptyset,\ 1 ⩽ | B | < | A | < 6,\ | A | - | B | ⩽ 3\} \\ S_{1, 4} & = S_{0} \cup \{(A, B) \in P (Σ)^{2} : A \cap B = \emptyset,\ 1 ⩽ | B | < | A | ⩽ 4\} \end{aligned}

and therefore $S_{1, 4} \subseteq S_{3, 2}$. To simplify the notation, we shall abbreviate $Opt- τ_{k, d, j}$ to $Opt- τ_{j}$. Observe that j can only range from 1 to 3, because $| A | - | B | ⩽ 3$ for every pair $(A, B)$ in either family.

Let Y be an optimal sequence of N and fix $1 < m ⩽ n$. To help the reader, we sketch the flow of the proof in Fig. 4; all implications leading to (a) or (b) follow from Corollary 2.

By Theorem 1,

\begin{matrix} (m, m - 1) ≺ \cdot^{Y} (m - j_{1}, m - 1 + j_{1}) \qquad (4) \end{matrix}

for some $j_{1} \in \{1, 2, 3\}$.

(1) If $j_{1} = 1$, then $(m, m - 1) ≺ \cdot^{Y} (m - 1, m)$, and we conclude as in (1) in the proof of Proposition 2 that $Y_{m} \in Opt- τ_{1} ({Opt}_{m - 1})$ and $Y_{m - 1} \in Opt- τ_{1}^{- 1} ({Opt}_{m})$.
(2) If $j_{1} = 2$, then $(m, m - 1) ≺ \cdot^{Y} (m - 2, m + 1)$. Applying Theorem 1 again,
$(m + 1, m - 2) ≺ \cdot^{Y} (m + 1 - j_{2}, m - 2 + j_{2}),$
for some $j_{2} \in \{1, 2, 3\}$.
(2.a) If $j_{2} = 1$ or $j_{2} = 2$, then $(m + 1, m - 2) ≺ \cdot_{j_{2}}^{Y} \{m, m - 1\}$, and we conclude as in (2) in the proof of Proposition 2 that $Y_{m} \in Opt- τ_{2} ({Opt}_{m - 2})$ and $Y_{m - 1} \in Opt- τ_{2}^{- 1} ({Opt}_{m + 1})$.
(2.b) If $j_{2} = 3$, then $(m + 1, m - 2) ≺ \cdot^{Y} (m - 2, m + 1)$, and we can only deduce that $Y_{m + 1} \in Opt- τ_{3} ({Opt}_{m - 2})$ and $Y_{m - 2} \in Opt- τ_{3}^{- 1} ({Opt}_{m + 1})$.
(3) If $j_{1} = 3$, then $(m, m - 1) ≺ \cdot^{Y} (m - 3, m + 2)$. Applying Theorem 1 again,
$(m + 2, m - 3) ≺ \cdot^{Y} (m + 2 - j_{2}, m - 3 + j_{2}),$
for some $j_{2} \in \{1, 2, 3\}$.
(3.a) If $j_{2} = 1$, then $(m + 2, m - 3) ≺ \cdot^{Y} (m + 1, m - 2)$. Applying Theorem 1 once more,
$(m + 1, m - 2) ≺ \cdot^{Y} (m + 1 - j_{3}, m - 2 + j_{3})$
for some $j_{3} \in \{1, 2, 3\}$.
(3.a.i) If $j_{3} = 1$ or $j_{3} = 2$, then $(m + 1, m - 2) ≺ \cdot^{Y} \{m, m - 1\}$, closing the $≺ \cdot$-chain initiated with (4). Then, by Corollary 2, $Y_{m} \in Opt- τ_{3} ({Opt}_{m - 3})$ and $Y_{m - 1} \in Opt- τ_{3}^{- 1} ({Opt}_{m + 2})$.
(3.a.ii) If $j_{3} = 3$, then $(m + 1, m - 2) ≺ \cdot^{Y} (m - 2, m + 1)$ as in (2.b), and we only have that $Y_{m + 1} \in Opt- τ_{3} ({Opt}_{m - 2})$ and $Y_{m - 2} \in Opt- τ_{3}^{- 1} ({Opt}_{m + 1})$.
(3.b) If $j_{2} = 2$ or $j_{2} = 3$, then $(m + 2, m - 3) ≺ \cdot^{Y} \{m, m - 1\}$, closing the $≺ \cdot$-chain initiated with (4). Then, by Corollary 2, $Y_{m} \in Opt- τ_{3} ({Opt}_{m - 3})$ and $Y_{m - 1} \in Opt- τ_{3}^{- 1} ({Opt}_{m + 2})$.

Summarizing, there are only two possibilities:

On the one hand, in cases (1), (2.a), (3.a.i), and (3.b),
$Y_{m} \in ⋃_{j = 1}^{3} Opt- τ_{j} ({Opt}_{m - j}) \quad \text{and} \quad Y_{m - 1} \in ⋃_{j = 1}^{3} Opt- τ_{j}^{- 1} ({Opt}_{m - 1 + j}) .$
On the other hand, in cases (2.b) and (3.a.ii),
$Y_{m + 1} \in Opt- τ_{3} ({Opt}_{m - 2}) \quad \text{and} \quad Opt- τ_{3} (Y_{m - 2}) \subseteq {Opt}_{m + 1} .$

By the arbitrary choice of Y and m, this concludes the proof. $□$
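The case analysis above depends only on how the shifts j move along the chain, so it can be enumerated mechanically. In the Python sketch below (a hypothetical helper that ignores boundary effects, such as pairs leaving the range $0, \dots, n$), a pair $(p, q)$ with $p > q$ and $p + q = 2m - 1$ is encoded by its gap $g = p - q$; Theorem 1 supplies some $1 ⩽ j ⩽ J$, and the next pair $(p - j, q + j)$ has gap $| g - 2 j |$. A chain reaching gap 1 returns to $\{m, m - 1\}$, while a chain revisiting an earlier gap closes without passing through $\{m, m - 1\}$.

```python
def chain_outcomes(J):
    """Enumerate the chains explored in the proofs of Propositions 2 and 3.

    Returns (sequence of shifts j, outcome) pairs: "a" when the chain returns to
    gap 1, i.e. to the pair {m, m - 1}; "b" when it closes on an earlier pair
    that does not contain m. Boundary effects are ignored.
    """
    results = []

    def explore(gap, path, visited):
        for j in range(1, J + 1):
            new_gap, seq = abs(gap - 2 * j), path + [j]
            if new_gap == 1:
                results.append((seq, "a"))      # closed chain through {m, m - 1}
            elif new_gap in visited:
                results.append((seq, "b"))      # closed chain avoiding {m, m - 1}
            else:
                explore(new_gap, seq, visited | {new_gap})

    explore(1, [], {1})
    return results


if __name__ == "__main__":
    for seq, outcome in chain_outcomes(3):
        print(seq, outcome)
```

For $J = 3$ it lists nine branches, and exactly two of them, with shift sequences [2, 3] and [3, 1, 3], close away from $\{m, m - 1\}$; these are cases (2.b) and (3.a.ii) above. For $J = 2$ every branch returns to gap 1, which is what the proof of Proposition 2 exploits.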

A similar result holds for (k, d) such that $(d - 1) k = 4$ . We give its proof in Section 3 of the Supplementary file.

Proposition 4

If N is a semi-5-ary level-1 or a semi-3-ary level-2 network, then, for every $m = 1, \dots, n$ , at least one of the following assertions is true:

(a) ${Opt}_{m} \subseteq ⋃_{j = 1}^{4} Opt- τ_{k, d, j} ({Opt}_{m - j})$ and ${Opt}_{m - 1} \subseteq ⋃_{j = 1}^{4} Opt- τ_{k, d, j}^{- 1} ({Opt}_{m - 1 + j})$.
(b) ${Opt}_{m + 1} = Opt- τ_{k, d, 3} ({Opt}_{m - 2})$,

where $(k, d) = (2, 3)$ or (1, 5), depending on the type of network.

So, while we could give a greedy optimization algorithm for semibinary level-2 networks or semi-3-ary level-1 networks, an analogous argument fails for more complex networks. Propositions 3 and 4 are not sufficient to provide such a greedy algorithm because we would need their assertion (a), or a similar expression, to hold for all m. If, for some m, only assertion (b) holds, we do not have enough information about ${Opt}_{m}$ to ensure that it can be obtained from previous optimal sets.

Remark 4

A close analysis of the proof of Proposition 3, using Corollary 2 in its full strength, shows that we actually have a more general result: for every optimal sequence Y of N and for every $1 < m ⩽ n$ , at least one of the following conditions holds (the labels correspond to the cases in the proof):

(1) $Y_{m} \in Opt- τ_{1} ({Opt}_{m - 1})$ and $Y_{m - 1} \in Opt- τ_{1}^{- 1} ({Opt}_{m})$.
(2.a) $Y_{m} \in Opt- τ_{2} ({Opt}_{m - 2})$, $Y_{m - 1} \in Opt- τ_{2}^{- 1} ({Opt}_{m + 1})$, and
- $Y_{m + 1} \in Opt- τ_{1} ({Opt}_{m})$ and $Y_{m - 2} \in Opt- τ_{1}^{- 1} ({Opt}_{m - 1})$, or
- $Y_{m + 1} \in Opt- τ_{2} ({Opt}_{m - 1})$ and $Y_{m - 2} \in Opt- τ_{2}^{- 1} ({Opt}_{m})$.
(2.b) $Y_{m + 1} \in Opt- τ_{3} ({Opt}_{m - 2})$ and $Y_{m - 2} \in Opt- τ_{3}^{- 1} ({Opt}_{m + 1})$.
(3.a.i) $Y_{m} \in Opt- τ_{3} ({Opt}_{m - 3})$, $Y_{m - 1} \in Opt- τ_{3}^{- 1} ({Opt}_{m + 2})$, $Y_{m + 2} \in Opt- τ_{1} ({Opt}_{m + 1})$, $Y_{m - 3} \in Opt- τ_{1}^{- 1} ({Opt}_{m - 2})$, and
- $Y_{m + 1} \in Opt- τ_{1} ({Opt}_{m})$ and $Y_{m - 2} \in Opt- τ_{1}^{- 1} ({Opt}_{m - 1})$, or
- $Y_{m + 1} \in Opt- τ_{2} ({Opt}_{m - 1})$ and $Y_{m - 2} \in Opt- τ_{2}^{- 1} ({Opt}_{m})$.
(3.a.ii) $Y_{m + 1} \in Opt- τ_{3} ({Opt}_{m - 2})$ and $Y_{m - 2} \in Opt- τ_{3}^{- 1} ({Opt}_{m + 1})$.
(3.b) $Y_{m} \in Opt- τ_{3} ({Opt}_{m - 3})$, $Y_{m - 1} \in Opt- τ_{3}^{- 1} ({Opt}_{m + 2})$, and
- $Y_{m + 2} \in Opt- τ_{2} ({Opt}_{m})$ and $Y_{m - 3} \in Opt- τ_{2}^{- 1} ({Opt}_{m - 1})$, or
- $Y_{m + 2} \in Opt- τ_{3} ({Opt}_{m - 1})$ and $Y_{m - 3} \in Opt- τ_{3}^{- 1} ({Opt}_{m})$.

Unfortunately, the extra information obtained in this way is still not enough to prove the correctness of a greedy $rPSD$ -optimization algorithm for the networks considered in that proposition. A similar situation appears in the context of Proposition 4.

We must point out, however, that we have not been able to find any semibinary level-3 or any semi-4-ary level-1 network for which ${Opt}_{m} ⊈ ⋃_{j = 1}^{3} Opt- τ_{k, d, j} ({Opt}_{m - j})$ for some m. Similarly, we have not been able to find any semi-5-ary level-1 or any semi-3-ary level-2 network for which ${Opt}_{m} ⊈ ⋃_{j = 1}^{4} Opt- τ_{k, d, j} ({Opt}_{m - j})$ for some m. So the greedy algorithm may also work in these cases: we have not found any counterexample to its correctness for these types of networks. In Section 4 of the Supplementary file we provide several examples that illustrate our search for a counterexample. More examples can be found in the second author’s PhD Thesis (Riera 2023).

Conclusions

PD on phylogenetic trees satisfies the strong exchange property, which guarantees that, for every two sets of leaves of different cardinalities, a leaf can always be moved from the larger set to the smaller one without decreasing the sum of the PD values. However, rPSD no longer satisfies this exchange property, even for galled trees. In this paper we have generalized this exchange property to rPSD on phylogenetic networks of bounded level and bounded in-degree of reticulations, showing that a similar result holds if we allow more involved exchanges of subsets of leaves. Our final goal was to use this generalized exchange property to find a polynomial-time greedy algorithm for the optimization of rPSD on phylogenetic networks of bounded level and in-degree of reticulations. We have ultimately failed in this goal. We have indeed shown that the generalized exchange property entails such a greedy algorithm for semibinary level-2 networks and semi-3-ary level-1 networks (and sheds new light on the structure of the families of rPSD-optimal sets ${Opt}_{m}$ on galled trees), but it cannot be used, as it stands, to obtain such an algorithm on more complex networks. However, we have not been able to find examples of semibinary level-3 networks or semi-4-ary level-1 networks where the greedy algorithm fails: it is simply that the generalized exchange property alone does not seem to be enough to prove its correctness.

Finally, it is important to point out that, just like the $rPSD$ optimization problem itself, testing for counterexamples is computationally expensive. While the greedy algorithm runs in polynomial time, determining whether ${Opt}_{m}$ can be obtained from some ${Opt}_{m - j}$ still requires computing ${Opt}_{m}$ by brute force, and testing whether the exchange property holds for a certain subset of $S_{k, d}$ where $| A | - | B | < j$ also requires testing all subsets $X, X^{'} \subseteq Σ$. All these operations take exponential time, so even slightly larger examples can dramatically increase the runtime of the test.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 487 KB)

Acknowledgements

This research was partially supported by the grants PID2021-126114NB-C44 and PGC2018-096956-B-C43, funded by MCIU/AEI/10.13039/501100011033 and by “ERDF/EU.”

Appendix A: Proof of Theorem 1

We begin by stating two auxiliary lemmas. From now on, we shall call a semi-d-ary k-blob any blob with k reticulations, all of them of in-degree $⩽ d$ . Given such a semi-d-ary k-blob $B$ and a non-empty subset $E_{1}$ of its exit reticulations, the first lemma provides a sharp upper bound for the cardinality of any independent set of nodes V of $B$ whose members have no descendant exit reticulation outside $E_{1}$ . This bound will entail the bound for the cardinality of A in the definition of $S_{k, d}$ . We give the proof of this lemma in Section 1 of the Supplementary file.

Lemma 2

Let $B$ be a semi-d-ary k-blob with l exit reticulations and $E_{1}$ a non-empty subset of its exit reticulations of cardinality $l_{1} ⩾ 1$ . Then, for every independent set of nodes V of $B$ without descendant exit reticulations outside $E_{1}$ , $| V | ⩽ d l_{1} + (d - 1) (k - l) .$

The constructions explained in the proof of this lemma easily show that the bound it provides is sharp, in the sense that, for every $d, k, l, l_{1}$ with $d ⩾ 2$ and $k ⩾ l ⩾ l_{1} ⩾ 1$ , there are semi-d-ary k-blobs with l exit reticulations and subsets $E_{1}$ of $l_{1}$ exit reticulations containing an independent set of nodes V without descendant exit reticulations outside $E_{1}$ of cardinality $d l_{1} + (d - 1) (k - l)$ : cf. Fig. 5.

Fig. 5 — A semibinary k-blob with l exit reticulations, a subset $E_{1}$ of $l_{1} > 0$ exit reticulations, and an independent set of nodes (represented by filled circles) without descendant exit reticulations outside $E_{1}$ reaching the upper bound in Lemma 2 for $d = 2$

Remark 5

By Lemma 2, if $B$ is a semi-d-ary blob without internal reticulations, if $E_{1}$ is a subset of its exit reticulations of cardinality $l_{1}$ , and if V is an independent set of nodes in $B$ without descendant exit reticulations outside $E_{1}$ , then $| V | ⩽ l_{1} d$ . A close analysis of the proof of that lemma easily shows that the upper bound $| V | = d l_{1}$ is achieved when all the reticulations in $E_{1}$ have in-degree d and the set V contains, for every $H \in E_{1}$ , exactly d nodes whose only reticulate descendant is H. Of course, such sets do not always exist: for instance, when $B$ contains a node that is a parent of two different exit reticulations.

The second auxiliary lemma extracts a key technical step in the proof of Theorem 1. This lemma provides an analog of the exchange property for sets of ancestors of multisets of nodes of a blob. More precisely, we prove that if $X, X^{'}$ are multisets of nodes of a semi-d-ary k-blob with $| X | > | X^{'} |$ that satisfy some extra conditions (those under which we shall apply the lemma in the proof of the main theorem), then there exist a subset A of X disjoint from $X^{'}$ and a submultiset B of $X^{'}$ disjoint from X whose cardinalities satisfy the restrictions defining the family $S_{k, d}$, and such that, if we replace X and $X^{'}$ by $(X \setminus A) \cup B$ and $(X^{'} \setminus B) \cup A$, the set of nodes that are simultaneously ancestors of nodes in both sets does not decrease.

We use in this lemma some standard notation for multisets X: $m_{X} (v)$ denotes the multiplicity of an element v in X; $Supp X$ denotes the support of X, that is, the set of elements v such that $m_{X} (v) > 0$; we say that X is a set when all its multiplicities are $⩽ 1$, and then we identify it with its support; a submultiset Y of X is full when $m_{Y} (y) = m_{X} (y)$ for every $y \in Supp Y \subseteq Supp X$; and the cardinality of X is $| X | = \sum_{v \in Supp X} m_{X} (v)$. We shall also use the notation $τ_{S, T} (X) = (X \setminus S) \cup T$ when X, S, T are multisets with $S \subseteq X$ and $Supp T \subseteq Σ \setminus Supp X$.
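These multiset conventions map naturally onto Python's `collections.Counter`. The sketch below (helper names hypothetical) implements the support, the cardinality, the full-submultiset test, and $τ_{S, T} (X)$; it reads $X \setminus S$ as subtraction of multiplicities and, since $Supp T$ is disjoint from $Supp X$, reads the union with T as addition of multiplicities.

```python
from collections import Counter


def supp(X: Counter) -> set:
    """Support of X: elements with positive multiplicity."""
    return {v for v, mult in X.items() if mult > 0}


def card(X: Counter) -> int:
    """Cardinality |X|: sum of the multiplicities."""
    return sum(mult for mult in X.values() if mult > 0)


def is_full_submultiset(Y: Counter, X: Counter) -> bool:
    """Y is a full submultiset of X: on Supp Y, Y has the same multiplicities as X."""
    return all(Y[v] == X[v] for v in supp(Y))


def tau(X: Counter, S: Counter, T: Counter) -> Counter:
    """tau_{S,T}(X) = (X minus S) union T, for S a submultiset of X and Supp T disjoint from Supp X."""
    assert all(S[v] <= X[v] for v in supp(S)), "S must be a submultiset of X"
    assert not (supp(T) & supp(X)), "Supp T must avoid Supp X"
    result = X - S       # Counter subtraction drops non-positive counts
    result.update(T)     # disjoint supports, so the union adds multiplicities
    return result


if __name__ == "__main__":
    Xp = Counter({"H0": 2, "y": 1})          # a multiset X' = {H0, H0, y}
    B = Counter({"H0": 2})                   # full submultiset supported on {H0}
    A = Counter({"a1": 1, "a2": 1, "a3": 1})
    print(is_full_submultiset(B, Xp))        # True
    print(sorted(tau(Xp, B, A).elements()))  # ['a1', 'a2', 'a3', 'y']
```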

This lemma also uses some basic properties of $↑$-notation: for any two sets A, B, $↑ (A \cup B) = ↑ A \cup ↑ B$ and $↑ A \setminus ↑ B \subseteq ↑ (A \setminus B)$, and if $A \subseteq B$, then $↑ A \subseteq ↑ B$. Moreover, given a multiset A, we define $↑ A$ as $↑ Supp A$, ignoring the multiplicities of the elements of A.
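The three facts about $↑$ stated above can be checked exhaustively on a small example. The Python sketch below takes $↑ S$ to be the set of nodes having a descendant in S, counting each node as its own descendant (our reading of the convention), and tests all pairs of node subsets of a hypothetical seven-node DAG with one reticulation h.

```python
from itertools import chain, combinations

# Hypothetical DAG given as child -> parents; "h" has two parents (a reticulation).
PARENTS = {"r": [], "u": ["r"], "v": ["r"], "h": ["u", "v"],
           "x": ["h"], "y": ["u"], "z": ["v"]}


def up(S):
    """Nodes having a descendant in S (each node is taken as its own descendant)."""
    seen, stack = set(S), list(S)
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def all_subsets(nodes):
    nodes = sorted(nodes)
    return [set(c) for c in chain.from_iterable(
        combinations(nodes, k) for k in range(len(nodes) + 1))]


def check_up_properties():
    """Exhaustively test the three stated facts for every pair of node subsets."""
    subsets = all_subsets(PARENTS)
    for A in subsets:
        for B in subsets:
            assert up(A | B) == up(A) | up(B)
            assert up(A) - up(B) <= up(A - B)
            if A <= B:
                assert up(A) <= up(B)
    return True


if __name__ == "__main__":
    print(check_up_properties())  # True
```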

Lemma 3

Let $B$ be a semi-d-ary k-blob and $X, X^{'}$ two multisets of nodes of $B$ with $| X^{'} | < | X |$ and satisfying the following two further conditions:

(i)
For each $v \in V (B)$ , if $m_{X^{'}} (v) < m_{X} (v)$ , then $m_{X} (v) = 1$ and $m_{X^{'}} (v) = 0$ .
(ii)
Each exit reticulation of $B$ belongs to X or $X^{'}$ .

Then

\begin{matrix} ↑ X \cap ↑ X^{'} \subseteq ↑ τ_{A, B} (X) \cap ↑ τ_{B, A} (X^{'}) \qquad (5) \end{matrix}

for some set $A \subseteq Supp X \setminus Supp X^{'}$ and some full submultiset B of $X^{'}$ with $Supp B \subseteq Supp X^{'} \setminus Supp X$ such that $B = \emptyset$ and $| A | = 1$, or $0 < | B | < | A | = d$, or $0 < | B | < | A | < d k$ and $| A | - | B | ⩽ (d - 1) k$.

Proof

First, we introduce some notation.

Let $E$ be the set of exit reticulations of $B$, and let
$E_{X} = E \cap (Supp X \setminus Supp X^{'}), \quad E_{X^{'}} = E \cap (Supp X^{'} \setminus Supp X), \quad E_{X, X^{'}} = E \cap (Supp X \cap Supp X^{'}) .$
Let $l_{X} = | E_{X} |$, $l_{X^{'}} = | E_{X^{'}} |$ and $l_{X, X^{'}} = | E_{X, X^{'}} |$. By (ii), $l_{X} + l_{X^{'}} + l_{X, X^{'}} = | E |$.
For each $H \in E$, let $↑_{only} H$ be the set $↑ H \setminus ↑ (E \setminus \{H\})$ of nodes whose only descendant exit reticulation is H. Since every node in $V (B)$ has some descendant exit reticulation, $↑_{only} H = V (B) \setminus ↑ (E \setminus \{H\})$. Observe that $↑_{only} H \cap ↑_{only} H^{'} = \emptyset$ if $H \neq H^{'}$.
Let $\hat{X}$ be the set $Supp X \setminus Supp X^{'}$. By (i), $\hat{X} = \{v \in V (B) : m_{X^{'}} (v) < m_{X} (v)\}$.
Let ${\hat{X}}^{'}$ be the full submultiset of $X^{'}$ supported on $Supp X^{'} \setminus Supp X$.

The inequality $| X | > | X^{'} |$ implies that $| \hat{X} | > | {\hat{X}}^{'} |$ , too. Indeed:

\begin{aligned} 0 & < | X | - | X^{'} | = \sum_{v \in V (B)} (m_{X} (v) - m_{X^{'}} (v)) \\ & = \sum_{\substack{v \in V (B) \\ m_{X} (v) > m_{X^{'}} (v)}} (m_{X} (v) - m_{X^{'}} (v)) - \sum_{\substack{v \in V (B) \\ m_{X^{'}} (v) > m_{X} (v)}} (m_{X^{'}} (v) - m_{X} (v)) \\ & = | \hat{X} | - \sum_{\substack{v \in V (B) \\ m_{X^{'}} (v) > m_{X} (v)}} (m_{X^{'}} (v) - m_{X} (v)) ⩽ | \hat{X} | - \sum_{v \in {\hat{X}}^{'}} (m_{X^{'}} (v) - m_{X} (v)) \\ & = | \hat{X} | - \sum_{v \in {\hat{X}}^{'}} m_{X^{'}} (v) = | \hat{X} | - | {\hat{X}}^{'} | . \qquad (6) \end{aligned}
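A quick randomized sanity check of this computation: the Python sketch below draws random multisets $X, X^{'}$ over a small node set (the generator, its parameters, and the seed are arbitrary choices), keeps only the pairs satisfying condition (i), and verifies that $| X | - | X^{'} | ⩽ | \hat{X} | - | {\hat{X}}^{'} |$. It tests the inequality on samples; it is not a proof.

```python
import random
from collections import Counter


def hats(X: Counter, Xp: Counter):
    """X_hat = Supp X minus Supp X' (a set); X'_hat = X' restricted to Supp X' minus Supp X."""
    x_hat = {v for v in X if X[v] > 0 and Xp[v] == 0}
    xp_hat = Counter({v: Xp[v] for v in Xp if Xp[v] > 0 and X[v] == 0})
    return x_hat, xp_hat


def satisfies_condition_i(X: Counter, Xp: Counter) -> bool:
    """Condition (i): whenever m_X'(v) < m_X(v), we have m_X(v) = 1 and m_X'(v) = 0."""
    return all(X[v] == 1 and Xp[v] == 0 for v in set(X) | set(Xp) if Xp[v] < X[v])


def check(trials=5000, nodes="abcdefg", max_mult=3, seed=1):
    """Return how many sampled pairs satisfied (i); each of them passed the inequality."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        X = Counter({v: rng.randint(0, max_mult) for v in nodes})
        Xp = Counter({v: rng.randint(0, max_mult) for v in nodes})
        if not satisfies_condition_i(X, Xp):
            continue
        x_hat, xp_hat = hats(X, Xp)
        lhs = sum(X.values()) - sum(Xp.values())
        assert lhs <= len(x_hat) - sum(xp_hat.values())
        checked += 1
    return checked


if __name__ == "__main__":
    print("pairs satisfying (i) and the inequality:", check())
```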

We shall consider three cases; in all of them we shall choose a subset $A \subseteq \hat{X}$ and a full submultiset $B \subseteq {\hat{X}}^{'}$ satisfying the requirements in the statement and we shall prove that they satisfy Eqn. (5).

(a) If there exists some $x \in \hat{X}$ with a proper descendant in X, then $x \in ↑ (X \setminus \{x\})$ and hence $↑ X = ↑ (X \setminus \{x\})$. In this case, taking $A = \{x\}$ and $B = \emptyset$, we have that

\begin{matrix} ↑ X \cap ↑ X^{'} = ↑ (X \setminus \{x\}) \cap ↑ X^{'} \subseteq ↑ (X \setminus \{x\}) \cap ↑ (X^{'} \cup \{x\}) . \end{matrix}

(b) Assume that no $x \in \hat{X}$ has any proper descendant in X and that $E_{X^{'}} = \emptyset$. This implies that $E = E_{X} \cup E_{X, X^{'}} \subseteq X$ and that $\hat{X} = E_{X}$, as any $x \in \hat{X} \setminus E_{X}$ would have some proper descendant in $E \subseteq X$.

In this case, there exists an $H_{0} \in E_{X}$ such that ${\hat{X}}^{'} \subseteq ↑ (E \setminus \{H_{0}\})$. Indeed, assume that for every $H \in E_{X}$ there existed some node $x_{H}^{'} \in {\hat{X}}^{'}$ without any descendant in $E \setminus \{H\}$. Then each $x_{H}^{'}$ would belong to $↑_{only} H$. Since the sets $↑_{only} H$ are pairwise disjoint, the nodes $x_{H}^{'}$ would be pairwise different, forming a subset of $Supp {\hat{X}}^{'}$ of cardinality $| E_{X} | = | \hat{X} |$, which cannot exist because $| {\hat{X}}^{'} | < | \hat{X} |$.

Take then $A = \{H_{0}\}$ and $B = \emptyset$. If we can prove that $↑ X^{'} \subseteq ↑ (X \setminus \{H_{0}\})$, then we will have

\begin{matrix} ↑ X \cap ↑ X^{'} \subseteq ↑ X^{'} = ↑ (X \setminus \{H_{0}\}) \cap ↑ X^{'} \subseteq ↑ (X \setminus \{H_{0}\}) \cap ↑ (X^{'} \cup \{H_{0}\}) . \end{matrix}

So, let $v \in ↑ X^{'}$ . There are two possibilities:

If v has some descendant in ${\hat{X}}^{'}$, then the latter will have a descendant in $E \setminus \{H_{0}\} \subseteq X \setminus \{H_{0}\}$, which will also be a descendant of v.
If v has no descendant in ${\hat{X}}^{'}$, then
$v \in ↑ X^{'} \setminus ↑ {\hat{X}}^{'} \subseteq ↑ (X^{'} \setminus {\hat{X}}^{'}) = ↑ (X \cap X^{'}) = ↑ (X \setminus \hat{X}) = ↑ (X \setminus E_{X}) \subseteq ↑ (X \setminus \{H_{0}\}) .$

(c) Assume finally that $E_{X^{'}} \neq \emptyset$ and that no $x \in \hat{X}$ has any proper descendant in X. This last condition implies that the set of nodes $\hat{X} \setminus E_{X}$ is independent and that all its descendant exit reticulations belong to $E_{X^{'}}$. Then, by Lemma 2, we have that

\begin{aligned} | \hat{X} | = | \hat{X} \setminus E_{X} | + | E_{X} | & ⩽ (d - 1) (k - l_{X} - l_{X^{'}} - l_{X, X^{'}}) + d l_{X^{'}} + l_{X} \\ & = (d - 1) k - (d - 2) l_{X} + l_{X^{'}} - (d - 1) l_{X, X^{'}} \\ & ⩽ (d - 1) k + l_{X^{'}} \quad (\text{because } d ⩾ 2) \\ & ⩽ (d - 1) k + \min \{k, | {\hat{X}}^{'} |\} \quad (\text{because } l_{X^{'}} ⩽ k \text{ and } l_{X^{'}} ⩽ | Supp {\hat{X}}^{'} | ⩽ | {\hat{X}}^{'} |) . \qquad (7) \end{aligned}

In particular,

\begin{matrix} | \hat{X} | ⩽ d k \quad \text{and} \quad | \hat{X} | - | {\hat{X}}^{'} | ⩽ (d - 1) k . \qquad (8) \end{matrix}

Now, on the one hand, if $| \hat{X} | < d k$ , take $A = \hat{X}$ and $B = {\hat{X}}^{'}$ . By Eqns. (6) and (8), they satisfy the required conditions in the statement, and

\begin{aligned} Supp τ_{B, A} (X^{'}) & = Supp ((X^{'} \setminus {\hat{X}}^{'}) \cup \hat{X}) = Supp X, \\ Supp τ_{A, B} (X) & = Supp ((X \setminus \hat{X}) \cup {\hat{X}}^{'}) = Supp X^{'}, \end{aligned}

which implies $↑ X^{'} \cap ↑ X = ↑ τ_{A, B} (X) \cap ↑ τ_{B, A} (X^{'})$ .

On the other hand, if $| \hat{X} | = d k$ , then all inequalities in the sequence (7) as well as the inequality $l_{X^{'}} ⩽ k$ are equalities. The equality $l_{X^{'}} = k$ implies that the blob $B$ has no reticulation other than those in $E_{X^{'}}$ . Moreover, since the first inequality in (7) is an equality, $\hat{X}$ reaches the maximum number of possible independent nodes in $↑ E_{X^{'}} = ↑ E$ . Then, as noted in Remark 5, it must happen for each $H \in E$ that ${deg}_{in} (H) = d$ and $| \hat{X} \cap ↑ H | = | \hat{X} \cap ↑_{only} H | = d$ .

Now, since $k = | E_{X^{'}} | ⩽ | {\hat{X}}^{'} | < | \hat{X} | = d k$, there must exist some $H_{0} \in E_{X^{'}}$ with $m_{X^{'}} (H_{0}) < d$ (otherwise we would have $| {\hat{X}}^{'} | ⩾ d k$). Take $A = \hat{X} \cap ↑_{only} H_{0} = \hat{X} \cap ↑ H_{0}$ and B the multiset with $Supp B = \{H_{0}\}$ and $m_{B} (H_{0}) = m_{X^{'}} (H_{0})$. We have that $0 < | B | < | A | = d$, and hence the pair (A, B) satisfies the requirements in the statement. As to Eqn. (5), notice that

\begin{aligned} ↑ X^{'} \subseteq {} & ↑ (X^{'} \setminus \{H_{0}\}) \cup ↑ A \cup \{H_{0}\} \\ & \cup \{x^{'} \in X^{'} ∣ x^{'} \text{ intermediate in some path } A ⇝ H_{0}\} . \end{aligned}

Now, $H_{0} \notin ↑ X$ and, by assumption, the elements of A have no proper descendant in X, which implies

\begin{matrix} (\{H_{0}\} \cup \{x^{'} \in X^{'} ∣ x^{'} \text{ intermediate in some path } A ⇝ H_{0}\}) \cap ↑ X = \emptyset . \end{matrix}

Moreover, since $A \subseteq ↑ H_{0}$, we have that $↑ X \subseteq ↑ ((X \setminus A) \cup \{H_{0}\})$. Therefore

\begin{matrix} ↑ X^{'} \cap ↑ X \subseteq (↑ (X^{'} \setminus \{H_{0}\}) \cup ↑ A) \cap ↑ X \subseteq ↑ ((X^{'} \setminus \{H_{0}\}) \cup A) \cap ↑ ((X \setminus A) \cup \{H_{0}\}) \end{matrix}

as we wanted to prove. $□$
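The bookkeeping in (7) and (8) can be sanity-checked by brute force over small parameter values. The Python sketch below (the ranges are arbitrary choices) verifies the expansion in the second line of (7), the bound in its third line, the inequality behind its fourth line, and the two consequences collected in (8), under the constraints of case (c): $d ⩾ 2$, $l_{X^{'}} ⩾ 1$, $l_{X} + l_{X^{'}} + l_{X, X^{'}} ⩽ k$, and $l_{X^{'}} ⩽ | {\hat{X}}^{'} |$.

```python
def check_bounds(max_d=6, max_k=6, max_hat=12):
    """Brute-force check of the algebra in (7) and (8) for small d, k, l's and |X'_hat|."""
    for d in range(2, max_d + 1):
        for k in range(1, max_k + 1):
            for lX in range(k + 1):
                for lXp in range(1, k - lX + 1):              # case (c): E_{X'} nonempty
                    for lXXp in range(k - lX - lXp + 1):      # l_X + l_X' + l_XX' <= k
                        lemma2 = (d - 1) * (k - lX - lXp - lXXp) + d * lXp + lX
                        # second line of (7): the same quantity after expanding
                        assert lemma2 == (d - 1) * k - (d - 2) * lX + lXp - (d - 1) * lXXp
                        # third line of (7), using d >= 2
                        assert lemma2 <= (d - 1) * k + lXp
                        for hat_p in range(lXp, max_hat + 1):  # |X'_hat| >= l_{X'}
                            assert lXp <= min(k, hat_p)        # fourth line of (7)
                            bound = (d - 1) * k + min(k, hat_p)
                            assert bound <= d * k               # first part of (8)
                            assert bound - hat_p <= (d - 1) * k  # second part of (8)
    return True


if __name__ == "__main__":
    print(check_bounds())  # True
```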

Theorem 1

If N is a semi-d-ary level-k phylogenetic network, ${rPSD}_{N}$ satisfies the exchange property with respect to $S_{k, d}$ .

Proof

The case $k = 0$ is Steel’s strong exchange property for phylogenetic trees (Steel 2016, §6.4.1). So, we shall focus on the case $k ⩾ 1$ .

Without any loss of generality, we can assume that every tree node in N is at most bifurcating, in the sense that the out-degree of each tree node is at most 2 (recall that we do not forbid out-degree 1 tree nodes in our networks). Indeed, first let $N^{'}$ be the phylogenetic network obtained from N as follows: for every node v that is the split node of more than one blob and for each such blob rooted at v, add a new split node $v_{i}$ to the blob and a new arc $(v, v_{i})$ with weight 0. $N^{'}$ is still semi-d-ary and level-k, no node in it is the split node of more than one blob, and ${rPSD}_{N} (Z) = {rPSD}_{N^{'}} (Z)$ for every $Z \subseteq Σ$. Now, let $N^{''}$ be the phylogenetic network obtained from $N^{'}$ as follows: for every tree node v with $t ⩾ 3$ children $v_{1}, \dots, v_{t}$, replace in $N^{'}$ the subgraph supported on $\{v, v_{1}, \dots, v_{t}\}$ by a bifurcating tree with root v and leaves $v_{1}, \dots, v_{t}$, all of whose arcs except those ending in $v_{1}, \dots, v_{t}$ have weight 0: the arc ending in each $v_{i}$ inherits the original weight of $(v, v_{i})$; if any node $v_{i}$ had entering arcs other than $(v, v_{i})$, we keep them with their weights. Since v was the split node of at most one blob, no blob increases its level from $N^{'}$ to $N^{''}$; therefore $N^{''}$ is still semi-d-ary and level-k, and ${rPSD}_{N^{''}} (Z) = {rPSD}_{N^{'}} (Z) = {rPSD}_{N} (Z)$ for every $Z \subseteq Σ$.

So, in the rest of this proof we shall suppose that N is at-most-bifurcating and in particular that no node in N is the split node of more than one blob.

We shall proceed by induction on the number $α$ of arcs of the network. A phylogenetic network with $α = 0$ is a phylogenetic tree consisting of a single leaf, where the stated exchange property trivially holds. Now, let N be an at-most-bifurcating semi-d-ary level-k phylogenetic network with $α ⩾ 1$ arcs, and let us suppose that the thesis in the statement is true for all at-most-bifurcating semi-d-ary level-k phylogenetic networks with fewer than $α$ arcs.

Let $X, X^{'} \subseteq Σ$ with $| X^{'} | < | X |$. If $| X | = 1$, the exchange property is trivially satisfied by taking $A = X$ and $B = X^{'} = \emptyset$, so we assume from now on that $| X | ⩾ 2$. Now consider the tree of blobs T of N (Gusfield et al. 2007), obtained by collapsing each blob in N into its split node. Then T is a phylogenetic tree with the same root r as N, $V (T) \subseteq V (N)$, and, for every $v \in V (T)$, the clusters of v in T and in N coincide; let us denote this cluster by C(v). Since $| X^{'} | < | X |$ and $| X | ⩾ 2$, the set of nodes v in T such that $| X^{'} \cap C (v) | < | X \cap C (v) |$ and $1 < | X \cap C (v) |$ is nonempty: it contains the root r.

We shall consider four cases.

(a) Assume that T contains some node $v_{0} \neq r$ such that $| X \cap C (v_{0}) | > | X^{'} \cap C (v_{0}) |$ and $| X \cap C (v_{0}) | > 1$. Since $v_{0} \in V (T)$, $v_{0}$ is a tree node of N such that the arc $e_{0} = (v_{1}, v_{0})$ ending in it does not belong to any blob, which implies that it is a cut arc. Let $N_{0} = N_{v_{0}}$ and let $N_{1}$ be the network obtained from N by removing $N_{v_{0}}$ and the arc $e_{0}$ and, if $v_{1}$ is a reticulation node, appending to it a dummy leaf child (not labelled in $Σ$) through an arc of weight 0; cf. Fig. 6. By the induction hypothesis, $N_{0}$ satisfies the thesis in the statement.

Now, for every $Z \subseteq Σ$ , if $Z \cap C (v_{0}) = \emptyset$ , then ${rPSD}_{N} (Z) = {rPSD}_{N_{1}} (Z)$ , and if $Z \cap C (v_{0}) \neq \emptyset$ , then

\begin{matrix} {rPSD}_{N} (Z) = {rPSD}_{N_{0}} (Z) + {rPSD}_{N_{1}} (Z) + w (e_{0}) + \sum_{e \in ↑ v_{1} \setminus ↑ (Z \setminus C (v_{0}))} w (e) . \end{matrix}

(Throughout this proof, given a network $N^{'}$ with set of leaves $Σ^{'}$ and a set Z, we write ${rPSD}_{N^{'}} (Z)$ to denote ${rPSD}_{N^{'}} (Z \cap Σ^{'})$. So, for instance, ${rPSD}_{N_{0}} (Z)$ and ${rPSD}_{N_{1}} (Z)$ in the expressions above actually mean ${rPSD}_{N_{0}} (Z \cap C (v_{0}))$ and ${rPSD}_{N_{1}} (Z \setminus C (v_{0}))$, respectively.)
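The decomposition of ${rPSD}_{N} (Z)$ used in case (a) can be checked numerically. The Python sketch below assumes that ${rPSD}_{N} (Z)$ is the total weight of the arcs lying on some path from the root to a leaf in Z, and uses a made-up network in which the cut arc $e_{0} = (h, w)$ leaves a reticulation h, so that $N_{1}$ receives a dummy leaf below h. All names are hypothetical. It compares both sides of the displayed formula for every nonempty $Z \subseteq Σ$ (and checks ${rPSD}_{N} (Z) = {rPSD}_{N_{1}} (Z)$ when Z misses $C (v_{0})$).

```python
from itertools import combinations

# Hypothetical network N: arcs (tail, head, weight); "h" is a reticulation.
ARCS = [("r", "u", 1), ("r", "v", 2), ("r", "x5", 4), ("u", "h", 1), ("v", "h", 3),
        ("u", "x1", 2), ("v", "x2", 1), ("h", "w", 1), ("w", "x3", 2), ("w", "x4", 1)]
LEAVES = {"x1", "x2", "x3", "x4", "x5"}
V0, V1 = "w", "h"  # the cut arc e_0 = (v_1, v_0) = (h, w)


def ancestors(arcs, nodes):
    """Nodes with a descendant in `nodes` (each node counts as its own descendant)."""
    parents = {}
    for tail, head, _ in arcs:
        parents.setdefault(head, []).append(tail)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def up_arcs(arcs, nodes):
    """Arcs lying on some path from the root to a node in `nodes`."""
    anc = ancestors(arcs, nodes)
    return {(t, hd, wt) for t, hd, wt in arcs if hd in anc}


def rpsd(arcs, Z):
    """Assumed rPSD: total weight of the arcs on root-to-Z paths."""
    return sum(wt for _, _, wt in up_arcs(arcs, Z))


ALL_NODES = {t for t, _, _ in ARCS} | {hd for _, hd, _ in ARCS}
BELOW_V0 = {n for n in ALL_NODES if V0 in ancestors(ARCS, {n})}  # nodes of N_{v_0}
N0 = [a for a in ARCS if a[0] in BELOW_V0]
N1 = [a for a in ARCS if a[0] not in BELOW_V0 and a[1] != V0] + [(V1, "dummy", 0)]
C_V0 = LEAVES & BELOW_V0
W_E0 = next(wt for t, hd, wt in ARCS if (t, hd) == (V1, V0))


def check_case_a():
    """Compare rPSD_N(Z) with the displayed decomposition for every nonempty Z."""
    for m in range(1, len(LEAVES) + 1):
        for Z in map(set, combinations(sorted(LEAVES), m)):
            outside = Z - C_V0
            if not Z & C_V0:
                assert rpsd(ARCS, Z) == rpsd(N1, outside)
                continue
            extra = up_arcs(ARCS, {V1}) - up_arcs(ARCS, outside)
            rhs = (rpsd(N0, Z & C_V0) + rpsd(N1, outside) + W_E0
                   + sum(wt for _, _, wt in extra))
            assert rpsd(ARCS, Z) == rhs, (Z, rpsd(ARCS, Z), rhs)
    return True


if __name__ == "__main__":
    print(check_case_a())  # True
```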

Since $| X \cap C (v_{0}) | > | X^{'} \cap C (v_{0}) |$, by the induction hypothesis there exist $A \subseteq (X \setminus X^{'}) \cap C (v_{0})$ and $B \subseteq (X^{'} \setminus X) \cap C (v_{0})$ such that $(A, B) \in S_{k, d} (C (v_{0})) \subseteq S_{k, d}$ and

\begin{matrix} {rPSD}_{N_{0}} (X) - {rPSD}_{N_{0}} (τ_{A, B} (X)) ⩽ {rPSD}_{N_{0}} (τ_{B, A} (X^{'})) - {rPSD}_{N_{0}} (X^{'}) . \qquad (9) \end{matrix}

Since $A, B \subseteq C (v_{0})$, $τ_{A, B} (X) \setminus C (v_{0}) = X \setminus C (v_{0})$ and $τ_{B, A} (X^{'}) \setminus C (v_{0}) = X^{'} \setminus C (v_{0})$, and thus, in particular,

\begin{matrix} {rPSD}_{N_{1}} (X) = {rPSD}_{N_{1}} (τ_{A, B} (X)), {rPSD}_{N_{1}} (X^{'}) = {rPSD}_{N_{1}} (τ_{B, A} (X^{'})) . \end{matrix}

Notice also that $τ_{B, A} (X^{'}) \cap C (v_{0}) \neq \emptyset$ because $A \neq \emptyset$ .

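Assume first that $B \neq \emptyset$, so that $X^{'} \cap C (v_{0}) \neq \emptyset$ and $τ_{A, B} (X) \cap C (v_{0}) \neq \emptyset$. Since also $X \cap C (v_{0}) \neq \emptyset$ and $τ_{A, B} (X) \setminus C (v_{0}) = X \setminus C (v_{0})$, all the terms in the decomposition of ${rPSD}_{N}$ above that do not involve $N_{0}$ coincide for X and $τ_{A, B} (X)$. Then,

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) = {rPSD}_{N_{0}} (X) - {rPSD}_{N_{0}} (τ_{A, B} (X)) . \end{matrix}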

By the same argument, using that $X^{'} \cap C (v_{0}) \neq \emptyset$ and $τ_{B, A} (X^{'}) \cap C (v_{0}) \neq \emptyset$ , we also have that

\begin{matrix} {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) = {rPSD}_{N_{0}} (τ_{B, A} (X^{'})) - {rPSD}_{N_{0}} (X^{'}) . \end{matrix}

Therefore, by Eqn. (9),

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) = {rPSD}_{N_{0}} (X) - {rPSD}_{N_{0}} (τ_{A, B} (X)) \\ ⩽ {rPSD}_{N_{0}} (τ_{B, A} (X^{'})) - {rPSD}_{N_{0}} (X^{'}) = {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) . \end{matrix}

Assume now that $B = \emptyset$. Then, by the definition of $S_{k, d}$, the set A must be a singleton, and then $τ_{A, B} (X) \cap C (v_{0}) = (X \setminus A) \cap C (v_{0}) \neq \emptyset$ because, by assumption, $| X \cap C (v_{0}) | > 1$. Then, arguing as above, we have that

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) = {rPSD}_{N_{0}} (X) - {rPSD}_{N_{0}} (τ_{A, B} (X)) . \end{matrix}

Similarly, if $X^{'} \cap C (v_{0}) \neq \emptyset$ ,

\begin{matrix} {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) = {rPSD}_{N_{0}} (τ_{B, A} (X^{'})) - {rPSD}_{N_{0}} (X^{'}), \end{matrix}

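while if $X^{'} \cap C (v_{0}) = \emptyset$, then ${rPSD}_{N_{0}} (X^{'}) = 0$ and ${rPSD}_{N} (X^{'}) = {rPSD}_{N_{1}} (X^{'})$, so that (using that $τ_{B, A} (X^{'}) \setminus C (v_{0}) = X^{'} \setminus C (v_{0})$ and that arc weights are nonnegative)

\begin{matrix} {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) = {rPSD}_{N_{0}} (τ_{B, A} (X^{'})) + w (e_{0}) + \sum_{e \in ↑ v_{1} \setminus ↑ (X^{'} \setminus C (v_{0}))} w (e) ⩾ {rPSD}_{N_{0}} (τ_{B, A} (X^{'})) - {rPSD}_{N_{0}} (X^{'}) . \end{matrix}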

In either case, by Eqn. (9) we have again

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) = {rPSD}_{N_{0}} (X) - {rPSD}_{N_{0}} (τ_{A, B} (X)) \\ ⩽ {rPSD}_{N_{0}} (τ_{B, A} (X^{'})) - {rPSD}_{N_{0}} (X^{'}) ⩽ {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) . \end{matrix}

(b) Assume now that the only node v in T such that $| X \cap C (v) | > | X^{'} \cap C (v) |$ and $| X \cap C (v) | > 1$ is the root r, and that r is not the split node of any blob in N. Then, each child v of r in N is also its child in T and thus, if $| X \cap C (v) | > | X^{'} \cap C (v) |$ , then $| X \cap C (v) | = 1$ . But since $| X | > | X^{'} |$ , r must have some child $v_{1}$ such that $| X \cap C (v_{1}) | > | X^{'} \cap C (v_{1}) |$ and hence such that $| X \cap C (v_{1}) | = 1$ and $X^{'} \cap C (v_{1}) = \emptyset$ ; and then, since $| X | ⩾ 2$ , r must have a second child $v_{2}$ and $X \cap C (v_{2}) \neq \emptyset$ . For each $i = 1, 2$ , let $e_{i} = (r, v_{i})$ and let $N_{i}$ be the subnetwork of N rooted at $v_{i}$ . The sets of leaves $C (v_{1}), C (v_{2})$ of $N_{1}, N_{2}$ are disjoint and therefore, for each $Z \subseteq Σ$ ,

\begin{matrix} {rPSD}_{N} (Z) = {rPSD}_{N_{1}} (Z) + {rPSD}_{N_{2}} (Z) + χ_{N_{1}} (Z) w (e_{1}) + χ_{N_{2}} (Z) w (e_{2}) \end{matrix}

where, for each $i = 1, 2$ , $χ_{N_{i}} (Z) = 1$ if $Z \cap C (v_{i}) \neq \emptyset$ and $χ_{N_{i}} (Z) = 0$ otherwise.
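The decomposition above holds because the arc set of $N$ is the disjoint union of $\{e_{1}, e_{2}\}$ and the arc sets of $N_{1}$ and $N_{2}$. The Python sketch below checks it exhaustively on a small toy network, assuming that ${rPSD}_{N} (Z)$ is the total weight of the arcs of $N$ lying on some path from the root to a leaf in $Z$. The network, its weights and all function names are hypothetical.

```python
from itertools import chain, combinations

# Hypothetical toy network N: the root r has children v1 and v2, and below v1
# there is a reticulation h with parents p and q. Weights are arbitrary.
ARCS = {
    ("r", "v1"): 2, ("r", "v2"): 3,
    ("v1", "p"): 1, ("v1", "q"): 2,
    ("p", "h"): 1, ("q", "h"): 4,
    ("p", "a"): 5, ("q", "b"): 1, ("h", "c"): 2,
    ("v2", "d"): 1, ("v2", "e"): 2,
}
LEAVES = {"a", "b", "c", "d", "e"}


def subsets(s):
    """All subsets of the finite set s."""
    items = sorted(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def children(arcs):
    ch = {}
    for u, v in arcs:
        ch.setdefault(u, []).append(v)
    return ch


def rpsd(arcs, Z):
    """Assumed reading of rPSD: total weight of the arcs whose head has a
    descendant leaf in Z (childless nodes are the leaves)."""
    ch, memo = children(arcs), {}

    def desc(node):
        if node not in memo:
            kids = ch.get(node, [])
            memo[node] = (frozenset([node]) if not kids
                          else frozenset().union(*(desc(k) for k in kids)))
        return memo[node]

    return sum(w for (_, v), w in arcs.items() if desc(v) & Z)


def subnetwork(arcs, root):
    """Arcs of the subnetwork rooted at `root`: all arcs whose tail lies below root."""
    ch, seen, stack = children(arcs), {root}, [root]
    while stack:
        for k in ch.get(stack.pop(), []):
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return {(u, v): w for (u, v), w in arcs.items() if u in seen}


def root_decomposition_holds(arcs, leaves, root, v1, v2):
    """Check rPSD_N(Z) = rPSD_N1(Z) + rPSD_N2(Z) + chi1*w(e1) + chi2*w(e2) for every Z."""
    N1, N2 = subnetwork(arcs, v1), subnetwork(arcs, v2)
    C1 = {v for (_, v) in N1 if v in leaves}
    C2 = {v for (_, v) in N2 if v in leaves}
    for Z in subsets(leaves):
        chi1, chi2 = int(bool(Z & C1)), int(bool(Z & C2))
        rhs = (rpsd(N1, Z) + rpsd(N2, Z)
               + chi1 * arcs[(root, v1)] + chi2 * arcs[(root, v2)])
        if rpsd(arcs, Z) != rhs:
            return False
    return True


print(root_decomposition_holds(ARCS, LEAVES, "r", "v1", "v2"))  # True
print(rpsd(ARCS, {"c"}))  # 12
```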

Let $X \cap C (v_{1}) = \{x\}$ and take $A = \{x\}$ and $B = \emptyset$. Then, $(A, B) \in S_{0}$ and $τ_{A, B} (X) \cap C (v_{1}) = \emptyset$, $τ_{B, A} (X^{'}) \cap C (v_{1}) = \{x\}$, $τ_{A, B} (X) \cap C (v_{2}) = X \cap C (v_{2})$, and $τ_{B, A} (X^{'}) \cap C (v_{2}) = X^{'} \cap C (v_{2})$. Therefore,

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) \\ = {rPSD}_{N_{1}} (X) + {rPSD}_{N_{2}} (X) + χ_{N_{1}} (X) w (e_{1}) + χ_{N_{2}} (X) w (e_{2}) - {rPSD}_{N_{1}} (τ_{A, B} (X)) \\ - {rPSD}_{N_{2}} (τ_{A, B} (X)) - χ_{N_{1}} (τ_{A, B} (X)) w (e_{1}) - χ_{N_{2}} (τ_{A, B} (X)) w (e_{2}) \\ = {rPSD}_{N_{1}} (\{x\}) + {rPSD}_{N_{2}} (X) + w (e_{1}) + w (e_{2}) - 0 - {rPSD}_{N_{2}} (X) - 0 \cdot w (e_{1}) - w (e_{2}) \\ = {rPSD}_{N_{1}} (\{x\}) + w (e_{1}) \end{matrix}

and, similarly,

\begin{matrix} {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) \\ = {rPSD}_{N_{1}} (\{x\}) + {rPSD}_{N_{2}} (X^{'}) + w (e_{1}) + χ_{N_{2}} (X^{'}) w (e_{2}) \\ - 0 - {rPSD}_{N_{2}} (X^{'}) - 0 \cdot w (e_{1}) - χ_{N_{2}} (X^{'}) w (e_{2}) \\ = {rPSD}_{N_{1}} (\{x\}) + w (e_{1}) . \end{matrix}

Hence, in this case,

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) = {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) . \end{matrix}
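On a small hypothetical network, the equality of case (b) can be checked numerically. Below, $C (v_{1}) = \{a, b, c\}$ and $C (v_{2}) = \{d, e\}$; taking $X = \{c, d\}$, $X^{'} = \{e\}$ and $x = c$ meets the assumptions of case (b), since $r$ is the only node with $| X \cap C (v) | > | X^{'} \cap C (v) |$ and $| X \cap C (v) | > 1$. The Python sketch assumes that ${rPSD}_{N} (Z)$ is the total weight of the arcs with a descendant leaf in $Z$; the network and all names are illustrative.

```python
# Hypothetical toy network: the root r is not in a blob; v1 and v2 are its
# children, and below v1 there is a reticulation h with parents p and q.
ARCS = {
    ("r", "v1"): 2, ("r", "v2"): 3,
    ("v1", "p"): 1, ("v1", "q"): 2,
    ("p", "h"): 1, ("q", "h"): 4,
    ("p", "a"): 5, ("q", "b"): 1, ("h", "c"): 2,
    ("v2", "d"): 1, ("v2", "e"): 2,
}


def children(arcs):
    ch = {}
    for u, v in arcs:
        ch.setdefault(u, []).append(v)
    return ch


def rpsd(arcs, Z):
    """Assumed reading of rPSD: total weight of arcs whose head has a descendant leaf in Z."""
    ch, memo = children(arcs), {}

    def desc(node):
        if node not in memo:
            kids = ch.get(node, [])
            memo[node] = (frozenset([node]) if not kids
                          else frozenset().union(*(desc(k) for k in kids)))
        return memo[node]

    return sum(w for (_, v), w in arcs.items() if desc(v) & Z)


def subnetwork(arcs, root):
    """Arcs of the subnetwork rooted at `root`."""
    ch, seen, stack = children(arcs), {root}, [root]
    while stack:
        for k in ch.get(stack.pop(), []):
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return {(u, v): w for (u, v), w in arcs.items() if u in seen}


def tau(X, A, B):
    """tau_{A,B}(X) = (X minus A) union B."""
    return (X - A) | B


X, Xp = {"c", "d"}, {"e"}   # X meets C(v1) only in x = c; X' misses C(v1)
A, B = {"c"}, set()         # the pair (A, B) = ({x}, empty set)

loss = rpsd(ARCS, X) - rpsd(ARCS, tau(X, A, B))      # rPSD_N(X) - rPSD_N(tau_{A,B}(X))
gain = rpsd(ARCS, tau(Xp, B, A)) - rpsd(ARCS, Xp)    # rPSD_N(tau_{B,A}(X')) - rPSD_N(X')
predicted = rpsd(subnetwork(ARCS, "v1"), A) + ARCS[("r", "v1")]  # rPSD_N1({x}) + w(e1)
print(loss, gain, predicted)  # 12 12 12
```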

(c) Assume finally that the only node v in T such that $| X \cap C (v) | > | X^{'} \cap C (v) |$ and $| X \cap C (v) | > 1$ is the root r, and that r is the split node of a (single) blob $B$. We distinguish two subcases.

(c.1) If $B$ contains some exit reticulation H with no descendant in $X \cup X^{'}$, and if $v_{1}, \dots, v_{d^{'}}$ are the parents of H, let $\hat{N}$ be the phylogenetic network obtained from N by removing the subnetwork $N_{H}$, adding new leaves $h_{1}, \dots, h_{d^{'}}$ with dummy labels outside $Σ$, and replacing each arc $(v_{i}, H)$ by an arc $(v_{i}, h_{i})$ with weight 0; cf. Figure 7. The network $\hat{N}$ is still at-most-bifurcating, semi-d-ary and level-k, and it has fewer than $α$ arcs (we have removed the arcs in $N_{H}$). Therefore, by the induction hypothesis, it satisfies the conclusion of the statement. Let $\hat{Σ}$ be its set of labels. Then, since, by assumption, $X, X^{'} \subseteq \hat{Σ} \cap Σ$, there exist $A \subseteq X \setminus X^{'}$ and $B \subseteq X^{'} \setminus X$ such that $(A, B) \in S_{k, d} (\hat{Σ} \cap Σ) \subseteq S_{k, d}$ and

\begin{matrix} {rPSD}_{\hat{N}} (X) - {rPSD}_{\hat{N}} (τ_{A, B} (X)) ⩽ {rPSD}_{\hat{N}} (τ_{B, A} (X^{'})) - {rPSD}_{\hat{N}} (X^{'}) . \end{matrix}

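Since ${rPSD}_{\hat{N}} (Z) = {rPSD}_{N} (Z)$ for every $Z \subseteq \hat{Σ} \cap Σ$ (every removed arc has all its descendant leaves in $C (H)$, which is disjoint from $\hat{Σ} \cap Σ$, and every new arc has weight 0), we conclude that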

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) ⩽ {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) . \end{matrix}
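The construction of $\hat{N}$ in case (c.1) is local, so the equality ${rPSD}_{\hat{N}} (Z) = {rPSD}_{N} (Z)$ for $Z \subseteq \hat{Σ} \cap Σ$ and the decrease in the number of arcs can be checked mechanically. The following Python sketch does so on a hypothetical toy network whose root is the split node of the blob $\{r, p, q, h\}$, with exit reticulation $h$. It assumes that rPSD is the total weight of the arcs with a descendant leaf in $Z$; the dummy labels `h_dummy_i` and all function names are illustrative assumptions.

```python
from itertools import chain, combinations

# Hypothetical toy network: r is the split node of the blob {r, p, q, h};
# h is an exit reticulation and N_H (rooted at h) contains g, c and f.
ARCS = {
    ("r", "p"): 1, ("r", "q"): 2, ("p", "h"): 3, ("q", "h"): 1,
    ("p", "a"): 2, ("q", "b"): 4,
    ("h", "g"): 5, ("g", "c"): 1, ("g", "f"): 2,
}
LEAVES = {"a", "b", "c", "f"}


def subsets(s):
    items = sorted(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def children(arcs):
    ch = {}
    for u, v in arcs:
        ch.setdefault(u, []).append(v)
    return ch


def rpsd(arcs, Z):
    """Assumed reading of rPSD: total weight of arcs whose head has a descendant leaf in Z."""
    ch, memo = children(arcs), {}

    def desc(node):
        if node not in memo:
            kids = ch.get(node, [])
            memo[node] = (frozenset([node]) if not kids
                          else frozenset().union(*(desc(k) for k in kids)))
        return memo[node]

    return sum(w for (_, v), w in arcs.items() if desc(v) & Z)


def below(arcs, node):
    """All nodes reachable from `node`, including `node` itself."""
    ch, seen, stack = children(arcs), {node}, [node]
    while stack:
        for k in ch.get(stack.pop(), []):
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return seen


def prune_exit_reticulation(arcs, H):
    """Build N-hat: drop the arcs of N_H and replace every arc (v_i, H) by a
    weight-0 arc (v_i, h_i) to a fresh dummy leaf (hypothetical labels)."""
    removed = below(arcs, H)
    new_arcs, dummies = {}, set()
    for (u, v), w in arcs.items():
        if u in removed:            # arc of N_H: deleted
            continue
        if v == H:                  # arc (v_i, H) becomes (v_i, h_i) with weight 0
            leaf = f"{H}_dummy_{len(dummies) + 1}"
            dummies.add(leaf)
            new_arcs[(u, leaf)] = 0
        else:
            new_arcs[(u, v)] = w
    return new_arcs, dummies, removed


N_hat, dummies, removed = prune_exit_reticulation(ARCS, "h")
kept = LEAVES - removed             # plays the role of hat-Sigma intersected with Sigma
same = all(rpsd(ARCS, Z) == rpsd(N_hat, Z) for Z in subsets(kept))
print(len(ARCS), len(N_hat), same)  # 9 6 True
```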

(c.2) Finally, assume that all the exit reticulations of the blob $B$ rooted at r have descendants in X or $X^{'}$ . Let $B^{*}$ be the set of nodes of $B$ that have a child outside of $B$ ; if $v \in B^{*}$ , we shall denote its child outside of $B$ by $\bar{v}$ . Notice that:

$r \notin B^{*}$ (its two children must belong to the blob);
the exit reticulations of $B$ belong to $B^{*}$ ;
since reticulations have out-degree 1, the internal reticulations of $B$ do not belong to $B^{*}$ ;
$\bar{v} \in V (T) \setminus \{r\}$ for every $v \in B^{*}$, and thus, by the current assumption, if $| X \cap C (\bar{v}) | > | X^{'} \cap C (\bar{v}) |$ then $| X \cap C (\bar{v}) | = 1$.

For each $v \in B^{*}$ let ${\bar{N}}_{v}$ be the subnetwork of N rooted at v consisting of $N_{\bar{v}}$ , v and the arc $(v, \bar{v})$ .

Fig. 7 — The networks N and $\hat{N}$ in case (c.1)

For each $Z \subseteq Σ$ , we shall denote by $B_{Z}^{*}$ the multiset of nodes of $B^{*}$ supported on

\begin{matrix} Supp B_{Z}^{*} = \{v \in B^{*} : Z \cap C (\bar{v}) \neq \emptyset\} \end{matrix}

and with multiplicities $m_{B_{Z}^{*}} (v) = | Z \cap C (\bar{v}) |$ . Since the subnetworks ${\bar{N}}_{v}$ , with $v \in B^{*}$ , have pairwise disjoint sets of leaves and the union of their sets of leaves is $Σ$ , we have that $| B_{Z}^{*} | = | Z |$ and

\begin{matrix} {rPSD}_{N} (Z) = \sum_{v \in Supp B_{Z}^{*}} {rPSD}_{{\bar{N}}_{v}} (Z) + \sum_{e \in ↑ B_{Z}^{*}} w (e) . \end{matrix} \qquad (10)
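The multiset $B_{Z}^{*}$ and the decomposition above can be computed directly on a small example. The Python sketch below represents multisets as `collections.Counter` objects and checks $| B_{Z}^{*} | = | Z |$ and the rPSD decomposition for every $Z$. It makes two assumptions: ${rPSD}$ is taken to be the total weight of the arcs with a descendant leaf in $Z$, and $↑ B_{Z}^{*}$ is read as the set of arcs of $B$ lying above some node of $Supp B_{Z}^{*}$. Both readings, the toy network and all function names are for illustration only.

```python
from collections import Counter
from itertools import chain, combinations

# Hypothetical toy network: blob {r, p, q, h} rooted at r, with B* = {p, q, h}.
ARCS = {
    ("r", "p"): 1, ("r", "q"): 2, ("p", "h"): 3, ("q", "h"): 1,
    ("p", "a"): 2, ("q", "b"): 4,
    ("h", "g"): 5, ("g", "c"): 1, ("g", "f"): 2,
}
LEAVES = {"a", "b", "c", "f"}
BLOB_ARCS = {("r", "p"), ("r", "q"), ("p", "h"), ("q", "h")}
EXIT_CHILD = {"p": "a", "q": "b", "h": "g"}   # v -> v-bar for every v in B*


def subsets(s):
    items = sorted(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def children(arcs):
    ch = {}
    for u, v in arcs:
        ch.setdefault(u, []).append(v)
    return ch


def below(arcs, node):
    """All nodes reachable from `node`, including `node` itself."""
    ch, seen, stack = children(arcs), {node}, [node]
    while stack:
        for k in ch.get(stack.pop(), []):
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return seen


def rpsd(arcs, Z):
    """Assumed reading of rPSD: total weight of arcs whose head reaches a leaf in Z."""
    return sum(w for (_, v), w in arcs.items() if below(arcs, v) & Z)


def multiset_B_star(Z):
    """B*_Z: every v in B* with multiplicity |Z & C(v-bar)| (zeros omitted)."""
    counts = {v: len(below(ARCS, vb) & LEAVES & Z) for v, vb in EXIT_CHILD.items()}
    return Counter({v: m for v, m in counts.items() if m > 0})


def up(nodes):
    """Assumed reading of the up-arrow: blob arcs lying above some node in `nodes`."""
    return {(u, v) for (u, v) in BLOB_ARCS if below(ARCS, v) & nodes}


def eqn10_rhs(Z):
    """Sum of rPSD over the networks N-bar_v (v in Supp B*_Z) plus the blob arcs above them."""
    supp = set(multiset_B_star(Z))
    total = 0
    for v in supp:
        vb = EXIT_CHILD[v]
        # N-bar_v: the arc (v, v-bar) together with the subnetwork rooted at v-bar
        n_bar = {e: w for e, w in ARCS.items() if e == (v, vb) or e[0] in below(ARCS, vb)}
        total += rpsd(n_bar, Z)
    return total + sum(ARCS[e] for e in up(supp))


ok = all(sum(multiset_B_star(Z).values()) == len(Z) and rpsd(ARCS, Z) == eqn10_rhs(Z)
         for Z in subsets(LEAVES))
print(ok)  # True
```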

So, $| B_{X^{'}}^{*} | = | X^{'} | < | X | = | B_{X}^{*} |$ ; by the current assumption, every exit reticulation belongs to $B_{X}^{*} \cup B_{X^{'}}^{*}$ ; and if $m_{B_{X^{'}}^{*}} (v) = | X^{'} \cap C (\bar{v}) | < m_{B_{X}^{*}} (v) = | X \cap C (\bar{v}) |$ , then $m_{B_{X}^{*}} (v) = 1$ . Therefore, the multisets $B_{X}^{*}$ , $B_{X^{'}}^{*}$ satisfy the hypotheses of Lemma 3, which implies the existence of a set $B_{A}$ and a multiset $B_{B}$ of nodes of $B$ such that:

(1) $B_{A} \subseteq Supp B_{X}^{*} \setminus Supp B_{X^{'}}^{*}$; thus, if $v \in B_{A}$, then $| X \cap C (\bar{v}) | = 1$ and $| X^{'} \cap C (\bar{v}) | = 0$.
(2) $Supp B_{B} \subseteq Supp B_{X^{'}}^{*} \setminus Supp B_{X}^{*}$ and, for every $v \in Supp B_{B}$, $m_{B_{B}} (v) = m_{B_{X^{'}}^{*}} (v) = | X^{'} \cap C (\bar{v}) |$.
(3) $B_{B} = \emptyset$ and $| B_{A} | = 1$, or $0 < | B_{B} | < | B_{A} | = d$, or $0 < | B_{B} | < | B_{A} | < d k$ and $| B_{A} | - | B_{B} | ⩽ (d - 1) k$.
(4) $↑ B_{X}^{*} \cap ↑ B_{X^{'}}^{*} \subseteq ↑ τ_{B_{A}, B_{B}} (B_{X}^{*}) \cap ↑ τ_{B_{B}, B_{A}} (B_{X^{'}}^{*})$.

Let

\begin{matrix} A = ⋃_{v \in B_{A}} (X \cap C (\bar{v})), B = ⋃_{v \in Supp B_{B}} (X^{'} \cap C (\bar{v})) . \end{matrix}

Then, $A \subseteq X \setminus X^{'}$ and $B \subseteq X^{'} \setminus X$, with $| A | = | B_{A} |$ and $| B | = | B_{B} |$. In particular, by property (3), $(A, B) \in S_{k, d}$. We shall prove that

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) ⩽ {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) . \end{matrix}

Before doing so, let us point out some facts that we shall use. First, notice that $B_{A} = B_{A}^{*}$ and $B_{B} = B_{B}^{*}$ , because for every $v \in B^{*}$

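\begin{matrix} m_{B_{A}^{*}} (v) = | A \cap C (\bar{v}) | = \left\{\begin{matrix} 1 & \text{if } v \in B_{A} \\ 0 & \text{if } v \notin B_{A} \end{matrix}\right. = m_{B_{A}} (v), \\ m_{B_{B}^{*}} (v) = | B \cap C (\bar{v}) | = \left\{\begin{matrix} | X^{'} \cap C (\bar{v}) | = m_{B_{X^{'}}^{*}} (v) & \text{if } v \in Supp B_{B} \\ 0 & \text{if } v \notin Supp B_{B} \end{matrix}\right. = m_{B_{B}} (v), \end{matrix}

where we have used that the clusters $C (\bar{v})$ are pairwise disjoint, that $| X \cap C (\bar{v}) | = 1$ for every $v \in B_{A}$ by property (1), and property (2) of $B_{B}$.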

Moreover

\begin{matrix} B_{τ_{A, B} (X)}^{*} = τ_{B_{A}, B_{B}} (B_{X}^{*}) \quad \text{and} \quad Supp B_{τ_{A, B} (X)}^{*} = ((Supp B_{X}^{*}) \setminus B_{A}) \cup Supp B_{B}, \end{matrix} \qquad (11)

\begin{matrix} B_{τ_{B, A} (X^{'})}^{*} = τ_{B_{B}, B_{A}} (B_{X^{'}}^{*}) \quad \text{and} \quad Supp B_{τ_{B, A} (X^{'})}^{*} = ((Supp B_{X^{'}}^{*}) \setminus Supp B_{B}) \cup B_{A} . \end{matrix} \qquad (12)

Indeed, regarding Eqn. (11), for every $v \in B^{*}$

\begin{matrix} m_{B_{τ_{A, B} (X)}^{*}} (v) = | ((X \setminus A) \cup B) \cap C (\bar{v}) | = | X \cap C (\bar{v}) | - | A \cap C (\bar{v}) | + | B \cap C (\bar{v}) | \\ = m_{B_{X}^{*}} (v) - m_{B_{A}} (v) + m_{B_{B}} (v) = m_{B_{X}^{*} \setminus B_{A}} (v) + m_{B_{B}} (v) = m_{(B_{X}^{*} \setminus B_{A}) \cup B_{B}} (v) \end{matrix}

and in particular

\begin{matrix} Supp B_{τ_{A, B} (X)}^{*} = Supp ((B_{X}^{*} \setminus B_{A}) \cup B_{B}) = ((Supp B_{X}^{*}) \setminus B_{A}) \cup Supp B_{B} \end{matrix}

because $m_{B_{A}} (v) = m_{B_{X}^{*}} (v)$ for every $v \in B_{A}$ .

A similar argument, using that, for every $v \in Supp B_{B}$ , $m_{B_{B}} (v) = m_{B_{X^{'}}^{*}} (v) = | B \cap C (\bar{v}) | = | X^{'} \cap C (\bar{v}) |$ and that $B_{A} \cap Supp B_{X^{'}}^{*} = \emptyset$ , proves Eqn. (12).
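Eqns. (11) and (12) state that passing from sets of leaves to the multisets $B_{Z}^{*}$ commutes with the exchanges. Modelling multisets as Python `collections.Counter` objects, where `-` is multiset difference (dropping non-positive counts) and `+` is multiset sum, the following sketch checks both equations, their support versions, and $B_{A} = B_{A}^{*}$, $B_{B} = B_{B}^{*}$, on a hypothetical configuration satisfying properties (1) and (2). The node labels `u1` to `u4`, the leaf labels and the helper names are illustrative only.

```python
from collections import Counter

# Hypothetical clusters C(v-bar) of the children outside the blob, for v in B*.
CLUSTER = {"u1": {"x1"}, "u2": {"x2", "x3"}, "u3": {"y1", "y2"}, "u4": {"z"}}


def B_star(Z):
    """Multiset B*_Z: every v in B* with multiplicity |Z & C(v-bar)| (zeros omitted)."""
    return Counter({v: len(C & Z) for v, C in CLUSTER.items() if C & Z})


def tau_set(X, A, B):
    """tau_{A,B}(X) = (X minus A) union B, for sets of leaves."""
    return (X - A) | B


def tau_multiset(M, MA, MB):
    """tau_{M_A,M_B}(M) = (M minus M_A) plus M_B, for multisets."""
    return (M - MA) + MB


X, Xp = {"x1", "x2", "z"}, {"x3", "y1", "y2"}
B_A = Counter({"u1": 1, "u4": 1})   # inside Supp B*_X minus Supp B*_X' (property (1))
B_B = Counter({"u3": 2})            # supported outside Supp B*_X, multiplicities from X' (property (2))
A = set().union(*(X & CLUSTER[v] for v in B_A))
B = set().union(*(Xp & CLUSTER[v] for v in B_B))

eq11 = B_star(tau_set(X, A, B)) == tau_multiset(B_star(X), B_A, B_B)
eq12 = B_star(tau_set(Xp, B, A)) == tau_multiset(B_star(Xp), B_B, B_A)
print(A == {"x1", "z"}, B == {"y1", "y2"}, eq11, eq12)  # True True True True
```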

We can now proceed to prove the desired inequality

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) ⩽ {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) . \end{matrix}

By Eqn. (10),

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) \\ = \sum_{v \in Supp B_{X}^{*}} {rPSD}_{{\bar{N}}_{v}} (X) - \sum_{v \in Supp B_{τ_{A, B} (X)}^{*}} {rPSD}_{{\bar{N}}_{v}} (τ_{A, B} (X)) + \sum_{e \in ↑ B_{X}^{*}} w (e) - \sum_{e \in ↑ B_{τ_{A, B} (X)}^{*}} w (e) \end{matrix} \qquad (13)

where

\begin{matrix} \sum_{v \in Supp B_{X}^{*}} {rPSD}_{{\bar{N}}_{v}} (X) = \sum_{v \in (Supp B_{X}^{*}) \setminus B_{A}} {rPSD}_{{\bar{N}}_{v}} (X) + \sum_{v \in B_{A}} {rPSD}_{{\bar{N}}_{v}} (X) \\ = \sum_{v \in (Supp B_{X}^{*}) \setminus B_{A}} {rPSD}_{{\bar{N}}_{v}} (X \setminus A) + \sum_{v \in B_{A}} {rPSD}_{{\bar{N}}_{v}} (A) \end{matrix} \qquad (14)

because if $v \in (Supp B_{X}^{*}) \setminus B_{A}$, then $A \cap C (\bar{v}) = \emptyset$, and if $v \in B_{A}$, then $X \cap C (\bar{v}) = A \cap C (\bar{v})$; and

\begin{matrix} \sum_{v \in Supp B_{τ_{A, B} (X)}^{*}} {rPSD}_{{\bar{N}}_{v}} (τ_{A, B} (X)) \\ = \sum_{v \in (Supp B_{X}^{*}) \setminus B_{A}} {rPSD}_{{\bar{N}}_{v}} ((X \setminus A) \cup B) + \sum_{v \in Supp B_{B}} {rPSD}_{{\bar{N}}_{v}} ((X \setminus A) \cup B) \quad \text{(by (11))} \\ = \sum_{v \in (Supp B_{X}^{*}) \setminus B_{A}} {rPSD}_{{\bar{N}}_{v}} (X \setminus A) + \sum_{v \in Supp B_{B}} {rPSD}_{{\bar{N}}_{v}} (B) \end{matrix} \qquad (15)

because if $v \in Supp B_{X}^{*}$ , then $B \cap C (\bar{v}) = \emptyset$ , and if $v \in Supp B_{B}$ , then $X \cap C (\bar{v}) = \emptyset$ .

Therefore, combining Eqns. (13) to (15), we obtain

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) \\ = \sum_{v \in B_{A}} {rPSD}_{{\bar{N}}_{v}} (A) - \sum_{v \in Supp B_{B}} {rPSD}_{{\bar{N}}_{v}} (B) + \sum_{e \in ↑ B_{X}^{*}} w (e) - \sum_{e \in ↑ B_{τ_{A, B} (X)}^{*}} w (e) . \end{matrix}

A similar argument proves that

\begin{matrix} {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) \\ = \sum_{v \in B_{A}} {rPSD}_{{\bar{N}}_{v}} (A) - \sum_{v \in Supp B_{B}} {rPSD}_{{\bar{N}}_{v}} (B) + \sum_{e \in ↑ B_{τ_{B, A} (X^{'})}^{*}} w (e) - \sum_{e \in ↑ B_{X^{'}}^{*}} w (e) . \end{matrix}

Thus,

\begin{matrix} {rPSD}_{N} (X) - {rPSD}_{N} (τ_{A, B} (X)) ⩽ {rPSD}_{N} (τ_{B, A} (X^{'})) - {rPSD}_{N} (X^{'}) \end{matrix}

if, and only if,

\begin{matrix} \sum_{e \in ↑ B_{X}^{*}} w (e) + \sum_{e \in ↑ B_{X^{'}}^{*}} w (e) ⩽ \sum_{e \in ↑ B_{τ_{A, B} (X)}^{*}} w (e) + \sum_{e \in ↑ B_{τ_{B, A} (X^{'})}^{*}} w (e) . \end{matrix}

Finally, this last inequality holds because

\begin{matrix} \sum_{e \in ↑ B_{X}^{*}} w (e) + \sum_{e \in ↑ B_{X^{'}}^{*}} w (e) = \sum_{e \in ↑ B_{X}^{*} \cup ↑ B_{X^{'}}^{*}} w (e) + \sum_{e \in ↑ B_{X}^{*} \cap ↑ B_{X^{'}}^{*}} w (e) \\ ⩽ \sum_{e \in ↑ B_{τ_{A, B} (X)}^{*} \cup ↑ B_{τ_{B, A} (X^{'})}^{*}} w (e) + \sum_{e \in ↑ B_{τ_{A, B} (X)}^{*} \cap ↑ B_{τ_{B, A} (X^{'})}^{*}} w (e) (*) \\ = \sum_{e \in ↑ B_{τ_{A, B} (X)}^{*}} w (e) + \sum_{e \in ↑ B_{τ_{B, A} (X^{'})}^{*}} w (e) \end{matrix}

where step $(*)$ is due to

\begin{matrix} ↑ B_{τ_{A, B} (X)}^{*} \cup ↑ B_{τ_{B, A} (X^{'})}^{*} = ↑ (B_{τ_{A, B} (X)}^{*} \cup B_{τ_{B, A} (X^{'})}^{*}) \\ = ↑ (τ_{B_{A}, B_{B}} (B_{X}^{*}) \cup τ_{B_{B}, B_{A}} (B_{X^{'}}^{*})) \quad \text{(by (11) and (12))} \\ = ↑ (((B_{X}^{*} \setminus B_{A}) \cup B_{B}) \cup ((B_{X^{'}}^{*} \setminus B_{B}) \cup B_{A})) \\ = ↑ (B_{X}^{*} \cup B_{X^{'}}^{*}) = ↑ B_{X}^{*} \cup ↑ B_{X^{'}}^{*} \end{matrix}

and, by property (4) of $B_{A}$ and $B_{B}$ (and, again, (11) and (12)),

\begin{matrix} ↑ B_{X}^{*} \cap ↑ B_{X^{'}}^{*} \subseteq ↑ τ_{B_{A}, B_{B}} (B_{X}^{*}) \cap ↑ τ_{B_{B}, B_{A}} (B_{X^{'}}^{*}) = ↑ B_{τ_{A, B} (X)}^{*} \cap ↑ B_{τ_{B, A} (X^{'})}^{*} . \end{matrix}

This completes the proof of case (c.2). $□$
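The last step of case (c.2) is an instance of a general fact about nonnegative arc weights. Because $\sum_{e \in S} w (e) + \sum_{e \in T} w (e) = \sum_{e \in S \cup T} w (e) + \sum_{e \in S \cap T} w (e)$, keeping the union fixed while enlarging the intersection can only increase the total. The Python sketch below brute-forces this over all quadruples of subsets of a small, hypothetical weighted arc set. It also shows that the bound can fail for negative weights, which is why step $(*)$ relies on $w ⩾ 0$. Arc names, weights and function names are illustrative.

```python
from itertools import chain, combinations


def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def weight(arcs, w):
    """Total weight of a set of arcs."""
    return sum(w[e] for e in arcs)


def exchange_bound_holds(w):
    """For all arc sets S, T, S2, T2 with S | T == S2 | T2 and S & T a subset of
    S2 & T2, check w(S) + w(T) == w(S | T) + w(S & T) <= w(S2) + w(T2)."""
    all_sets = subsets(w)
    for S in all_sets:
        for T in all_sets:
            lhs = weight(S, w) + weight(T, w)
            if lhs != weight(S | T, w) + weight(S & T, w):   # inclusion-exclusion
                return False
            for S2 in all_sets:
                for T2 in all_sets:
                    if S | T == S2 | T2 and S & T <= S2 & T2:
                        if lhs > weight(S2, w) + weight(T2, w):
                            return False
    return True


print(exchange_bound_holds({"e1": 2, "e2": 3, "e3": 1, "e4": 5}))  # True
print(exchange_bound_holds({"e1": -1, "e2": 1}))                   # False
```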

Funding

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

Declarations

Conflict of interest

The authors of this article declare that they have no financial conflict of interest with the content of this article.

Footnotes

A class of undirected graphs that generalize unrooted trees and do not describe evolutionary histories but simply evolutionary relationships.

A subclass of split networks widely used because they are the output of popular programs like PhyloNet (Yu et al. 2014) or SplitsTree4 (Huson and Bryant 2006).

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Bordewich M, Semple C, Spillner A (2009) Optimizing phylogenetic diversity across two trees. Appl Math Lett 22:638–641
Bordewich M, Semple C, Wicke K (2022) On the complexity of optimising variants of phylogenetic diversity on phylogenetic networks. Theoret Comput Sci 917:66–80
Chernomor O, Klaere S et al (2016) Split diversity: measuring and optimizing biodiversity using phylogenetic split networks. In: Pellens and Grandcolas (2016), pp 173–195
Doolittle WF (1999) Phylogenetic classification and the universal tree. Science 284:2124–2128
Faith D (1992) Conservation evaluation and phylogenetic diversity. Biol Cons 61:1–10
Gaston KJ (1996) Species richness: measures and measurements. In: Gaston KJ (ed) Biodiversity: a biology of numbers and differences. Blackwell Science, pp 77–113
Gusfield D, Eddhu S, Langley C (2004) Optimal, efficient reconstruction of phylogenetic networks with constrained recombination. J Bioinform Comput Biol 2:173–213
Gusfield D, Bansal V et al (2007) A decomposition theory for phylogenetic networks and incompatible characters. J Comput Biol 14:1247–1272
Huson D, Bryant D (2006) Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23:254–267
Huson D, Rupp R, Scornavacca C (2010) Phylogenetic networks: concepts, algorithms and applications. Cambridge University Press
Jansson J, Sung W-K (2006) Inferring a level-1 phylogenetic network from a dense set of rooted triplets. Theoret Comput Sci 363:60–68
Kolbert E (2014) The Sixth Extinction. An Unnatural History. Henry Holt and Company
McNeely JA, Miller KR et al (1990) Conserving the world’s biological diversity. International Union for Conservation of Nature and Natural Resources
Pardi F, Goldman N (2005) Species choice for comparative genomics: being greedy works. PLoS Genet 1:e71
Pellens R, Grandcolas P (eds) (2016) Biodiversity conservation and phylogenetic systematics: preserving our evolutionary heritage in an extinction crisis. Springer Nature
Possingham HP, Andelman S et al (2002) Limits to the use of threatened species lists. Trends Ecol Evol 17:503–507
Riera G (2023) Theoretical Models and Computational Techniques for the Analysis of Microbial Communities. PhD Thesis, UIB
Spillner A, Nguyen BT, Moulton V (2008) Computing phylogenetic diversity for split systems. IEEE/ACM Trans Comput Biol Bioinf 5:235–244
Steel M (2005) Phylogenetic diversity and the greedy algorithm. Syst Biol 54:527–529
Steel M (2016) Phylogeny: discrete and random processes in evolution
Wicke K, Fischer M (2018) Phylogenetic diversity and biodiversity indices on phylogenetic networks. Math Biosci 298:80–90
Yu Y, Dong J, Liu KJ (2014) Bayesian estimation of species networks from multilocus data. Mol Biol Evol 31:1032–1043
Zhukova A, Blassel L et al (2021) Origin, evolution and global spread of SARS-CoV-2. CR Biol 344:57–75

Supplementary Materials

Supplementary file 1 (PDF, 487 KB)
