Abstract
The traditional way of tackling discrete optimization problems is by using local search on suitably defined cost or fitness landscapes. Such approaches are however limited by the slowing down that occurs when the local minima that are a feature of the typically rugged landscapes encountered arrest the progress of the search process. Another way of tackling optimization problems is by the use of heuristic approximations to estimate a global cost minimum. Here, we present a combination of these two approaches by using cover-encoding maps which map processes from a larger search space to subsets of the original search space. The key idea is to construct cover-encoding maps with the help of suitable heuristics that single out near-optimal solutions and result in landscapes on the larger search space that no longer exhibit trapping local minima. We present cover-encoding maps for the problems of the traveling salesman, number partitioning, maximum matching and maximum clique; the practical feasibility of our method is demonstrated by simulations of adaptive walks on the corresponding encoded landscapes which find the global minima for these problems.
Keywords: Adaptive walk, Coarse-graining, Oracle function, Genotype–phenotype map, Combinatorial optimization
Introduction
Fitness landscapes have proved to be a valuable concept in the understanding of adaptation in evolutionary biology and beyond, by visualizing the relationships between genotypes and effective reproductive success (Wright 1932, 1967). This concept has been taken forward in the field of evolutionary computation, where the performance of optimization algorithms utilizing local search has often been described as dynamics on a fitness landscape, see, e.g., the book by Engelbrecht and Richter (2014).
However, fitness functions alone do not determine the performances of local search algorithms, which depend also on the structure of the search spaces involved. These in turn are determined by two largely independent ingredients: (1) the concrete representations of the configurations that are to be optimized, referred to as encodings, and (2) locality in the search space, referred to as a move set.
For many well-studied combinatorial optimization problems and related models from statistical physics (such as spin glasses), there is a natural encoding. For instance, tours of a traveling salesperson problem (TSP) are naturally encoded as permutations of the cities concerned, while spin configurations are encoded as strings over the alphabet with each letter referring to a fixed spin variable. This natural encoding is usually free of redundancy; any residual redundancies that occur usually arise from simple symmetries of the problem which can easily be factored out. For instance, TSP tours can start at any city so that they are invariant under rotations, while many spin glass models are invariant under simultaneous flipping of all spins. This natural or “direct” encoding is often referred to as the phenotype space, see, e.g., (Rothlauf 2006; Neumann and Witt 2010; Rothlauf 2011; Borenstein and Moraglio 2014).
In biology, fitness is conceptually understood as a property (function) of the genotype. It depends, however, on properties of higher-level structures such as molecular structure, gene-regulatory networks, tissues, or organs, i.e., on a phenotype. The relationship of genotype and fitness, therefore, is a composition of a genotype–phenotype map and a phenotype-dependent fitness function. This decomposition has been studied extensively in several distinct model systems, including RNA secondary structures (Schuster et al. 1994), gene-regulatory networks (Ciliberti et al. 2007), and metabolic networks (Dykhuizen et al. 1987; Flamm et al. 2010). Here, we focus on the abstract structure rather than the specifics of such models.
For a given encoding, irrespective of whether it is genotypic or phenotypic, the performance of search crucially depends on the move set. Here, we will consider only reversible, mutation-like moves. The search space therefore is modeled as an undirected graph. More general settings are discussed, e.g., by Flamm et al. (2007). The cost function assigned to a specific search space defines a fitness landscape. Evolutionary algorithms can thus be viewed as dynamical systems operating on landscapes, whose structure has, as a consequence, been studied extensively in the field (Reidys and Stadler 2002; Østman et al. 2010; Engelbrecht and Richter 2014).
Continuing the analogy with biology in evolutionary computation, an additional encoding Y, the so-called genotype space, is often used (Rothlauf and Goldberg 2003; Rothlauf 2006). The genotype–phenotype relation is determined by a map $\varphi: Y \to X \cup \{\varnothing\}$, where $\varnothing$ represents phenotypic configurations that do not occur in the original problem, i.e., $y \in Y$ does not encode a feasible solution of the original problem whenever $\varphi(y) = \varnothing$. For example, a frequently used genotypic encoding for TSP tours comprises binary strings with one bit for each possible adjacency between two cities, indicating its presence (1) or absence (0) in the tour (Applegate et al. 2006). Most binary strings, however, do not correspond to TSP tours.
In practice, genotypic representations are usually chosen with a high degree of redundancy to tackle optimization problems which often also introduces neutrality, i.e., the appearance of adjacent configurations with the same value of the cost function. Detailed investigations of fitness landscapes from molecular biology have shown that degrees of neutrality can facilitate optimization (Schuster et al. 1994; Reidys and Stadler 2002) due to the inclusion of extensive neutral paths which prevent trapping in metastable states (Schuster et al. 1994; Fernández and Solé 2007; Yu and Miller 2002; Banzhaf and Leier 2006). On the other hand, “synonymous encodings” where genotypes mapping to the same phenotype form tight clusters in the genotype space have been advocated for the design of evolutionary algorithms (Rothlauf 2006; Choi and Moon 2008; Rothlauf 2011). Rather than having neutral paths connecting remote areas of the landscape, cost-equivalent configurations are locally clustered in synonymous encodings.
What is clear is that, empirically, the introduction of arbitrary redundancy (by means of random Boolean network mapping) does not increase the performance of mutation-based search (Knowles and Watson 2002), suggesting that redundancy should be suitably designed in order to facilitate optimization. One such approach was that of Klemm et al. (2012), which emphasized the utility of inhomogeneous genotype–phenotype maps: low-cost solutions can be enriched, and optimization made more efficient in genotype space, if the size of the preimage of each phenotype x is anti-correlated with the cost function f(x). Of course, for such anti-correlations to be imposed, the genotype–phenotype map needs to become explicitly dependent on the cost function.
Simplifying Landscape Structure by Encoding
Before delving into the technicalities, we present a conceptual outline of the key ideas of this contribution. Our starting point is the twenty-year-old observation by Ruml et al. (1996) that certain redundant encodings of the Number-Partitioning Problem (NPP) allow simple, generic optimization heuristics to find dramatically improved solutions. In previous work (Klemm et al. 2012) we found that this approach was not limited to the NPP, but that suitably chosen redundant encodings also improved the performance of heuristics on several other combinatorial optimization problems. In the present work, our objectives are to understand (a) why the particular method used by Ruml et al. (1996) works so well and (b) how it can be generalized to essentially arbitrary combinatorial optimization problems in a principled way.
We focus in this contribution on black-box-type optimization scenarios in which the information on the cost function f(x) is exclusively obtained by evaluating it for specific configurations in the search space X. The sequence of these function evaluations is determined by the optimization heuristic. Practical algorithms of this type propose candidates for evaluation based on past evaluation results. These candidates are chosen locally in the vicinity of past successful candidates with the help of rules that depend on the representation of X. This explicitly or implicitly defines a topological structure on X. For the purpose of the present contribution, we assume that the topology of the search space X is expressed by a notion of adjacency that is respected by the search process.
Intuitively, the most important obstruction for local optimization heuristics is the presence of a large number of local optima that trap the search process. The aim of a redundant encoding, therefore, is to provide an alternative representation Y of the optimization problem that reduces the number of local optima and makes it easier to find the globally optimal solution. Formulated over Y, we would wish that
- (i) neighborhoods in Y are small enough to be searched in practice;
- (ii) for every starting point there is a path to the global optimum along which the cost function is decreasing, or at least non-increasing.
Condition (i) ensures that we still deal with local search heuristics, while condition (ii) intuitively makes the landscape easy to search. Note that condition (ii) does not make the optimization problem trivial, since the heuristics still have to find an efficient path among possibly many very long ones. Its real significance is that it rules out traps and guarantees that simple downhill search will be successful eventually.
Is it possible at least in principle to construct such an encoding? The prepartition encoding, which performed best for the NPP (Ruml et al. 1996), provides an important hint. Each particular encoding corresponds to a restricted version of the original optimization problem, i.e., it can be seen as constraining the original search space X to a subset . A deterministic approximation is then used to solve the restricted problem on . For every , this provides an upper bound on the cost function . Since the encoding is chosen such that there is also a code for the global optimum , i.e., , the task now becomes to find , which minimizes by construction. The numerical results of Ruml et al. (1996) suggest that this auxiliary problem of minimizing the cost function of the encoding is much easier than the original, despite the fact that the search space is much larger. Below we show that this is the case because (1) does a good job at approximating the true solution of the restricted optimization problem on and (2) the perfect solutions give rise to landscapes with the desired properties mentioned above.
This observation suggests a general construction for “good” landscape encodings. The first step is the construction of a genotype space Y and an encoding scheme that maps genotypes to restrictions of the original problem rather than a particular phenotype y. This map has to satisfy certain conditions discussed in detail in Sect. 3.2 to be a good choice. The cost function then enters by guiding, for every genotype , a heuristic that solves the restricted problem .
Following the formal introduction of the general concepts, we construct landscape encodings explicitly for several well-known examples. In Sect. 4, we focus on a particularly useful construction that makes use of the fact that the restricted subproblems on can be seen as smaller instances of the same type of optimization problem, or alternatively, as coarse-grained problems. We show in particular that the NPP heuristic that motivated our approach is also of this type. In Sect. 5, finally, we use numerical experiments to show that the encoding scheme proposed here also works well in practice.
A Theory of Encoding Representations
Landscapes
Formally, an instance (X, f) of a combinatorial optimization problem consists of a finite set X and a cost function on X. The task of the combinatorial optimization problem (X, f) is to find a global minimum so that for all .
A landscape consists of a finite set X endowed with a symmetric and irreflexive (adjacency) relation and a cost function . A point is a strict local minimum in if (i) and (ii) there is no with and an f-non-increasing path , that is, and holds for . Note that a global minimum is not a strict local minimum as defined above.
For any , the restricted problem , where for all , consists in finding a so that for all . A restricted landscape can be defined analogously.
Oracle Function and Cover-Encoding Map
A key ingredient in our reasoning is to consider the global solutions of restricted optimization problems. This is formalized as follows:
Definition 1
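The oracle function $F: 2^X \to \mathbb{R} \cup \{+\infty\}$ of an optimization problem (X, f) is
$$F(A) = \min_{x \in A} f(x) \tag{1}$$
for all $A \subseteq X$. We use the convention $F(\varnothing) = +\infty$.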
We say that a subset $A \subseteq X$ is good if $F(A) = F(X)$, i.e., if A contains a global optimum, and bad if $F(A) > F(X)$. The oracle function is by definition monotonic in the following sense:
$$A \subseteq B \implies F(A) \ge F(B) \tag{2}$$
We call F an oracle function because in general there is no efficient algorithm for computing it. In fact, if we had an efficient way to compute F, we would already have solved the original optimization problem as well. Nevertheless, it is a useful theoretical construct, as we shall see below. First, it guides our construction of encodings of the original optimization problem that have the potential of being easily solved, or at least easier to solve. Second, it provides an inroad for constructing practical heuristics provided we can come up with a good approximation for F.
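For very small instances the oracle function can simply be evaluated by exhaustive enumeration, which makes the definitions of good and bad subsets and the monotonicity property easy to check. The following sketch reads the oracle function as the minimum of f over a subset, with value +∞ on the empty set; the cost table and the helper names (`oracle`, `is_good`, `is_monotone`) are illustrative assumptions, not part of the original text.

```python
import math
from itertools import combinations

def oracle(f, subset):
    """Brute-force oracle: minimum of f over `subset`, +inf for the empty set."""
    return min((f(x) for x in subset), default=math.inf)

# Toy instance (hypothetical): X = {0, ..., 4} with an arbitrary cost table.
cost = {0: 3.0, 1: 1.0, 2: 4.0, 3: 1.5, 4: 2.0}
X = frozenset(cost)
f = cost.__getitem__

def is_good(subset):
    """A subset is 'good' if it still contains a global optimum of (X, f)."""
    return oracle(f, subset) == oracle(f, X)

def all_subsets(ground):
    """Enumerate every subset of `ground` as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone():
    """Check that A being a subset of B implies F(A) >= F(B) for all pairs."""
    subsets = list(all_subsets(X))
    return all(oracle(f, a) >= oracle(f, b)
               for a in subsets for b in subsets if a <= b)
```

The enumeration is exponential in |X|, which is exactly why F is only an oracle in general; the sketch is meant for sanity checks on toy instances.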
We start by formalizing the idea of an encoding of a landscape.
Definition 2
A function $\varphi: Y \to 2^X$ is a cover-encoding map for X if it satisfies
- (Y1)
$\bigcup_{y \in Y} \varphi(y) = X$.
Property (Y1) states that the collection of sets $\{\varphi(y) : y \in Y\}$ is a set cover of X. The points $y \in Y$ can be thought of as coding for a particular element of this set cover. In the following, we will be interested in cover-encoding maps that satisfy some or all of the following additional properties:
- (Y0)
$\varphi(y) \neq \varnothing$ for all $y \in Y$.
- (Y2)
For every $x \in X$ there is a $y \in Y$ such that $\varphi(y) = \{x\}$.
- (Y3)
There is $y \in Y$ such that $\varphi(y) = X$.
Note that both (Y2) and (Y3) imply (Y1). Axiom (Y0) excludes infeasible points in Y.
It is not hard to see that cover-encoding maps always exist. In particular, consider any subset , the set of non-empty subsets of X, such that (i) the singletons for all and (ii) . Then the identity is obviously a cover-encoding map that satisfies (Y0), (Y1), (Y2), and (Y3).
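As a small illustration, the properties of a cover-encoding map can be checked mechanically when the map is given explicitly. The sketch below assumes the following readings: (Y0) no code maps to the empty set, (Y1) the images cover X, (Y2) every singleton is an image, and (Y3) X itself is an image. The function name `check_cover_encoding` and the two toy maps are hypothetical.

```python
def check_cover_encoding(phi, X):
    """Report which of (Y0)-(Y3) hold for phi: dict mapping code -> frozenset of X."""
    X = frozenset(X)
    images = list(phi.values())
    return {
        "Y0": all(len(s) > 0 for s in images),            # no empty images
        "Y1": frozenset().union(*images) == X,            # images cover X
        "Y2": all(frozenset([x]) in images for x in X),   # every singleton is encoded
        "Y3": X in images,                                # the full problem is encoded
    }

X = frozenset("abc")

# Identity cover-encoding on the singletons plus the full set.
codes = [frozenset("a"), frozenset("b"), frozenset("c"), X]
identity = {y: y for y in codes}

# A map that covers X but encodes neither all singletons nor X itself.
coarse = {0: frozenset("ab"), 1: frozenset("c")}

identity_report = check_cover_encoding(identity, X)
coarse_report = check_cover_encoding(coarse, X)
```

Under these assumptions the identity map passes all four checks, whereas the coarse map satisfies only (Y0) and (Y1).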
Now consider an optimization problem (X, f) and let be a cover-encoding map for X. We define as the composition of with the oracle function of (X, f), i.e., . In the following, we will be interested in the relationship between the “encoded” optimization problem and the original problem (X, f).
If condition (Y2) is satisfied, there is so that for every global optimum of the original problem. For most applications, it is sufficient to find one global optimum, hence we will consider the weaker condition:
- (F0)
There is so that (i) and .
Condition (F0) simply states that there exists a code that identifies a global optimum of the original problem (X, f). This is sufficient to consider (X, f) and as “equivalent optimization problems.”
The identity cover-encodings from and are the extreme cases. encodes all possible subproblems, while only encodes the singletons, i.e., the evaluation of the cost function f for every , as well as the full optimization problem.
In this contribution, we are interested in search-based algorithms. Hence we fix an adjacency relation on Y. For the landscape , we consider the following three properties:
- (R1)
For every with there is a sequence such that for and .
- (R2)
For every with there is a sequence such that for , and .
- (R3)
Every y with has a neighbor with .
In plain words, (R1) ensures that all minimum-cost encodings are connected by paths staying at minimum cost. Under (R2), each configuration is the beginning of a path to a minimum-cost configuration, with the value of the cost function not increasing along the path. Property (R3) uses the fact that all configurations in Y are subsets of X. It says that each configuration has a neighboring configuration properly containing y. It is worth noting that (R3) is independent of the oracle function F.
For identity cover-encodings introduced above, a natural definition of adjacency is to set and whenever (i) , (ii) , and (iii) if then or . That is, two sets are adjacent if they are adjacent in the Hasse diagram for set inclusion. By construction, every is connected by a sequence of adjacent sets to all singletons with and to the full set . Since is the identity, (R3) holds. Using that implies , properties (R1) and (R2) also follow immediately.
Taken together, the identity cover-encodings demonstrate that cover-encodings and associated adjacencies satisfying (Y0) through to (Y3) as well as (R1), (R2), and (R3) always exist.
Lemma 1
(R3) implies (R2) for any oracle function F.
Proof
If , then by construction. Now consider an arbitrary starting point y. By (R3), there is a neighbor such that , and by Eq. (2), we therefore have . Repeating the argument, we obtain a -non-increasing sequence along which is strictly increasing in each step. Since X is finite, there is a finite k so that and thus , i.e., (R2) is satisfied.
The importance of conditions (R1) and (R2) stems from the following observation:
Theorem 1
Suppose (X, f), , and the relation on Y are chosen such that (Y1), (F0), (R1), and (R2) are satisfied. Then the landscape has no strict local optimum.
Proof
Let be an arbitrary starting point. If then y, by (R1), is not a local optimum but part of a connected neutral network that contains the global optimum . If , then . By (R2), there is a path with non-increasing values of that connects y to a point with . We already know that there is a path with constant values of leading from to the global optimum . Thus y is connected by a -non-increasing path to . Hence y is, by definition, not a strict local optimum.
In particular, the identity cover-encodings satisfy the conditions of Theorem 1 and thus their landscapes have no strict local optima. There are, however, also very different general constructions with this property. In the remainder of this section, we consider one example.
Definition 3
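Let $(X, \sim, f)$ be an arbitrary landscape. Its square encoding is the map $\varphi: X \times X \to 2^X$, $\varphi(x_1, x_2) = \{x_1, x_2\}$, for $x_1, x_2 \in X$. The neighborhood relation $\sim_Y$ on $Y = X \times X$ is given by
$$(x_1, x_2) \sim_Y (x_1', x_2') \iff (x_1 = x_1' \text{ and } x_2 \sim x_2') \text{ or } (x_1 \sim x_1' \text{ and } x_2 = x_2').$$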
The graph is the Cartesian square of the graph (Hammack et al. 2016). The idea behind this construction is to allow a local search algorithm to keep track of the best solution so far in one variable and use the other variable for exploration. Figure 1 shows an example.
Fig. 1.
(Color figure online) Illustration of the square encoding. a Original landscape with configurations . The three configurations form a path under the adjacency relation . The cost function f renders t the unique global minimum, r a strict local minimum. Thus t is not reachable from r by a non-increasing path. b Landscape resulting from square encoding of the landscape in (a). Here, each configuration is a tuple of configurations of the original landscape, . The cost function is . On this landscape, a minimal cost configuration is reachable from all configurations by a non-increasing path
Lemma 2
The landscape satisfies (Y0), (Y2), (F0), (R1), and (R2). In particular it has no strict local optima.
Proof
Considering the properties of , (Y0) is obtained with for all ; (Y2) is fulfilled choosing for any . This implies (Y1), so is a cover-encoding map. We have (Y3) only in the trivial case . Property (F0) is fulfilled with .
For , we write for the standard graph distance, the length of a shortest path, between y and ; analogous notation for the distance on . For and , we have .
Now let . Then . We assume, without loss of generality, (otherwise swap and ). Because is connected, we find a neighbor with . With , we have and . For each element we thus find a that (i) is strictly closer to than y is; and (ii) does not evaluate at higher value than y under . Using the argument inductively at most times, the desired sequences in (R1) and (R2) are constructed. Therefore properties (R1) and (R2) are fulfilled by . Theorem 1 now implies that there are no strict local minima.
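The statement of Lemma 2 can be checked by brute force on the three-configuration example of Fig. 1. The sketch below assumes that a square-encoded configuration is a pair of original configurations, that its cost is the smaller of the two original costs, and that adjacency is the Cartesian square of the original graph. The concrete cost values and the helper `strict_local_minima` are hypothetical choices for illustration.

```python
from collections import deque
from itertools import product

def strict_local_minima(nodes, adjacent, cost):
    """Non-optimal nodes with no non-increasing path to a strictly cheaper node."""
    best = min(cost(v) for v in nodes)
    trapped = []
    for start in nodes:
        if cost(start) == best:
            continue  # global minima are not strict local minima
        seen, queue, escaped = {start}, deque([start]), False
        while queue and not escaped:
            u = queue.popleft()
            for v in nodes:
                if v in seen or not adjacent(u, v) or cost(v) > cost(u):
                    continue  # only follow unseen, non-increasing steps
                if cost(v) < cost(start):
                    escaped = True
                    break
                seen.add(v)
                queue.append(v)
        if not escaped:
            trapped.append(start)
    return trapped

# Original landscape: path r - s - t (cost values are hypothetical).
f = {"r": 1, "s": 2, "t": 0}
edges = {frozenset("rs"), frozenset("st")}
X = sorted(f)

def adj(u, v):
    return frozenset((u, v)) in edges

# Square encoding: codes are pairs, adjacency is the Cartesian square.
Y = list(product(X, repeat=2))

def adj_sq(y, z):
    return (y[0] == z[0] and adj(y[1], z[1])) or (y[1] == z[1] and adj(y[0], z[0]))

def cost_sq(y):
    return min(f[y[0]], f[y[1]])

path_minima = strict_local_minima(X, adj, f.__getitem__)
square_minima = strict_local_minima(Y, adj_sq, cost_sq)
```

On the original path, r is trapped; on the squared landscape the list of strict local minima is empty, because one coordinate can hold r while the other explores across s to t.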
Adaptive Walks
An adaptive walk on a fitness landscape is a Markov chain on the state space Y with transition probabilities for and . Otherwise , except for where is obtained by normalization of probability. The degree of state y is the number of neighbors . Formulated as a stochastic search algorithm, a neighbor z of the current (time t) configuration y is drawn uniformly at random. If , the walk proceeds to configuration z at time ; otherwise it remains at configuration y.
Call the set of global minima of the landscape . Assume that this landscape does not have a strict local minimum. Then each realization of an adaptive walk eventually hits a global minimum. Due to the absence of strict local minima, the adaptive walk is trapped only at global minima. Each invariant measure of the adaptive walk therefore evaluates to zero on all configurations with non-minimum cost. Property (R2) clearly is a necessary condition for an optimization problem to be solvable by adaptive walks alone. The conditions of Theorem 1 are already sufficient, as they exclude strict local optima.
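The adaptive walk described above can be sketched in a few lines. The version below assumes that a proposed neighbor is accepted whenever its cost is not higher than the current cost, so that neutral moves are allowed; this acceptance rule is our reading of the text, and the toy landscape (a three-node path and its square encoding) and all function names are hypothetical.

```python
import random

# Path r - s - t with hypothetical costs; r is a strict local minimum.
f = {"r": 1, "s": 2, "t": 0}
nbr = {"r": ["s"], "s": ["r", "t"], "t": ["s"]}

def square_neighbors(y):
    """Neighbors in the Cartesian square: change exactly one coordinate."""
    a, b = y
    return [(c, b) for c in nbr[a]] + [(a, d) for d in nbr[b]]

def square_cost(y):
    """Encoded cost: the better of the two original configurations."""
    return min(f[y[0]], f[y[1]])

def adaptive_walk(start, neighbors, cost, steps, rng):
    """Propose a uniformly random neighbor; accept it if the cost does not increase."""
    y = start
    for _ in range(steps):
        z = rng.choice(neighbors(y))
        if cost(z) <= cost(y):
            y = z
    return y

# Walk on the original landscape, started in the trap r.
stuck = adaptive_walk("r", nbr.__getitem__, f.__getitem__, 100, random.Random(0))

# Walk on the square-encoded landscape, started at (r, r).
end = adaptive_walk(("r", "r"), square_neighbors, square_cost, 500, random.Random(0))
```

On the original path the walk never leaves r, since its only neighbor is more expensive. On the encoded landscape the walk drifts along the neutral set of cost 1 until one coordinate reaches t.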
Examples of Cover-Encoding Maps
Let us now turn to constructing some problem-specific examples of cover-encoding maps. We will then use some of these examples to show that some cover-encoding maps are useful to construct good heuristic search algorithms for several well-studied combinatorial optimization problems.
Prepartition Encoding for the NPP
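An NPP instance is described by a list $a_1, \dots, a_n$ of n numbers. We write $[n] = \{1, \dots, n\}$ for the index set. We have to divide these n numbers into two subsets with as equal a sum as possible. In other words, we assign to each index i a variable $x_i \in \{-1, +1\}$ so that the cost function
$$f(x) = \left| \sum_{i=1}^{n} a_i x_i \right| \tag{3}$$
is minimized;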
see, e.g., (Mertens 2006) for a review. The set X consists of all strings of and of length n, the set Y consists of all functions . The so-called prepartitioning encoding (Ruml et al. 1996) of the NPP can be written in the following way: Each function defines the partition whose classes are the indices of the input numbers that are assigned the same value of y. As usual we write for the class that contains index i. For given we now insist that the signs whenever . This amounts to the restricted set of configurations
$$\varphi(y) = \{ x \in X : x_i = x_j \text{ whenever } y(i) = y(j) \} \tag{4}$$
One easily checks that whenever y is a bijection, i.e., (Y3) is satisfied. Furthermore, the subset corresponds exactly to the assignments of positive and negative signs: Writing simply set if and if . (More precisely, the choice of or is arbitrary; the symmetry can, however, easily be removed, e.g., fixing once and for all.) Conversely, every assignment of signs has a representation as a bipartition in . Thus (Y2) is satisfied.
The most natural choice of an adjacency on Y is to define if and only if for exactly one . Unless y is a bijection, there is at least one unused value and at least one pair with . The neighbor of y with for and corresponds to refinement of the partition because , , and all other classes of and are the same. Thus satisfies (R3).
An optimal solution of the NPP (X, f) is a partition of [n] into exactly two classes and so that for and for . A code is good if there is a configuration in in which the signs can be assigned in exactly this manner, i.e., if is a refinement of . Conversely, is good only if it is a refinement of a bipartition that represents a global minimum. Generically is unique. Now consider two classes and in that are contained in the small class of , i.e., . Reassigning one element at a time from to thus corresponds to a sequence of codes all of which encode refinements . Furthermore, has one class fewer than y. Repeating this step at most times eventually results in . Intermediate codes and are adjacent by construction and satisfy , i.e., condition (R1) is satisfied. Thus, we conclude that the “oracle landscape” has no strict local minima.
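The prepartition encoding is easy to experiment with on small instances. The sketch below assumes that a code y is a tuple of n labels, that all indices sharing a label must receive the same sign, and that the NPP cost is the absolute value of the signed sum; the encoded cost is obtained by exhaustive search over the restricted set, which stands in for the oracle. The instance and the function names are hypothetical.

```python
from itertools import product

def restricted_configs(y):
    """All sign vectors in which indices with equal label y[i] share a sign."""
    classes = sorted(set(y))
    for signs in product((+1, -1), repeat=len(classes)):
        sign_of = dict(zip(classes, signs))
        yield tuple(sign_of[label] for label in y)

def npp_cost(a, x):
    """Absolute difference between the two subset sums."""
    return abs(sum(ai * xi for ai, xi in zip(a, x)))

def encoded_cost(a, y):
    """Oracle value of code y, computed by brute force over the restricted set."""
    return min(npp_cost(a, x) for x in restricted_configs(y))

a = (4, 5, 6, 7, 8)                 # hypothetical instance; {4, 5, 6} vs {7, 8} balances
one_class = (0, 0, 0, 0, 0)         # every number forced to the same side
optimal_split = (0, 0, 0, 1, 1)     # a bipartition that encodes the optimum
bijection = (0, 1, 2, 3, 4)         # no restriction at all
coarse = (0, 0, 1, 1, 1)
refined = (0, 0, 1, 1, 2)           # differs from `coarse` in one label: a refinement
```

Changing one label of `coarse` to an unused value refines the partition and enlarges the restricted set, so the encoded cost cannot increase; this is the mechanism behind (R3) in this encoding.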
Prepartition Encoding for the TSP
The cost function of TSP (Gutin and Punnen 2007) is
$$f(\pi) = \sum_{i=1}^{n} d_{\pi(i)\,\pi(i+1)}, \qquad \pi(n+1) := \pi(1), \tag{5}$$
where is a bijection from the index set [n] to a set of cities C. The index i specifies the position along the tour. For a city c, therefore, is its position along the tour. The problem is parametrized by distances that satisfy for all but in general are neither symmetric nor do they satisfy the triangle inequality.
Klemm et al. (2012) introduced the following version of a prepartition encoding. Here, an arbitrary function is used to restrict the possible orderings of the cities along the tour as follows: For all cities , the condition implies . Again this defines a subset of the search space X of each y. We use the same definition of adjacency in Y. Here, constant functions y impose no restrictions on , i.e., whenever for all . On the other hand, if y is bijective then consists only of a single tour since in this case for all , i.e., . Thus, (Y2) and (Y3) are satisfied.
To address properties (R2) and (R1), we first observe that given an encoding y, we can always move one city c with to one of the classes defined by y with an adjacent value . More precisely, suppose is such that (a) there is a city d so that and (b) there are no cities e with , for any between k and . If , the city which we can move is the one with that appears last in the optimal tour ; similarly, if , we can move the city c with that appears first in the optimal tour . In the first case, we can set , while in the second case, we can choose . By construction , and therefore . It is also clear from the construction that the step from y to can always be chosen so that the number of classes remains constant, increases by one, or decreases by one, unless we already have , in which case only a decrease is possible, or we have , in which case only an increase is possible. Thus, we can always find a path along which does not increase and along which is non-increasing or non-decreasing, respectively. Note that the moves keeping constant might be necessary to move the values y(c) stepwise around in [n] to have enough “space” to break up individual classes of , so that its members in the end have consecutive values of y. It is not hard to convince oneself that this is always possible. As a consequence, we can always connect any y to a code with a single class (for which ). For two adjacent classes, we simply join, one-by-one, the cities of the smaller class to the larger one. Furthermore, the single-class code can be broken by pulling a city at a time so that (R1) also holds. Note that (R3) is not necessarily satisfied, however.
In contrast to the previous example of the NPP, here the paths are much more involved and often longer. We therefore conjecture that the prepartition encoding is less efficient for the TSP than for the NPP.
Spanning Forest Encoding for the NPP
A very different encoding for the NPP can be constructed as follows. Denote by Y the set of all spanning forests of the complete graph . For a detailed discussion of the combinatorics of spanning forests, we refer to (Teranishi 2005). For each forest denote by one of its connected components. Since is a tree and thus bipartite, there is a uniquely defined bipartition of its vertex set. We assign for and for to the other.
| 6 |
Suppose the spanning forest y has k components. Then, the sign pattern on each component is uniquely defined by fixing independently the sign of the lexicographically smallest . Thus, consists of exactly distinct configurations. It follows that if y contains no edges. Denoting the complement of x by , we have whenever y is a spanning tree. Since x and represent the same solution of the number partitioning problem, satisfies (Y2) and (Y3).
(R3) holds since removing an edge from the spanning forest y yields another spanning forest that imposes fewer restrictions and thus corresponds to a larger subset of X. In general, write if is a subforest of y. Then . The unconstrained search space corresponds to the spanning forest without edges. Conversely, every spanning tree that defines the bipartition of the globally minimal solution of the original NPP encodes exactly this solution. Every sequence of spanning forests obtained by successive edge deletions from connects and and each also contains the global minimum encoded by . Thus (R1) holds.
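The restricted set of a spanning forest can be generated directly from the bipartition of each tree. The sketch below assumes that the two endpoints of every forest edge must receive opposite signs and that the overall sign of each connected component is free; vertices are labeled 0, ..., n-1, and the function name `forest_configs` is hypothetical.

```python
from itertools import product

def forest_configs(n, edges):
    """Sign vectors compatible with a forest: adjacent vertices get opposite signs."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour, components = {}, []
    for root in range(n):
        if root in colour:
            continue
        colour[root] = +1            # reference sign for the smallest vertex
        stack, comp = [root], [root]
        while stack:                 # depth-first 2-colouring of the tree
            u = stack.pop()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = -colour[u]
                    comp.append(v)
                    stack.append(v)
        components.append(comp)
    configs = []
    for flips in product((+1, -1), repeat=len(components)):
        x = [0] * n
        for flip, comp in zip(flips, components):
            for v in comp:
                x[v] = flip * colour[v]   # flip the whole component at once
        configs.append(tuple(x))
    return configs

empty_forest = forest_configs(4, [])
tree = forest_configs(4, [(0, 1), (1, 2), (2, 3)])
two_trees = forest_configs(4, [(0, 1), (2, 3)])   # the tree with edge (1, 2) removed
```

A forest with k components yields 2^k sign vectors, so the edgeless forest gives the whole search space and a spanning tree gives one configuration together with its complement. Deleting an edge only enlarges the restricted set.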
Subdivision Encoding for the TSP
An alternative encoding for the TSP uses a permutation of the set of C cities and subdivision of [n] into consecutive intervals. We specify by the upper bound of the interval, i.e., . Since the tours are circular, we set and as usual consider the order < circular on [n]. Therefore, . An encoded configuration fixes the order of cities within each of the index intervals . The first city in interval is , the last city is . Thus, if is obtained by permuting the intervals and following the order given by within each interval, as shown in Fig. 2.
Fig. 2.
Example for a subdivision of the TSP. The cities are subdivided into classes of a partition within which their order is fixed among all restricted tours (full arrows). The order in which the classes are traversed remains free (dotted arrows)
If is the discrete partition, then we obviously have , while the indiscrete partition uniquely specifies the tour . The encoding therefore satisfies (F0), (Y0), (Y1), (Y2), and (Y3). Consider any adjacency relation on Y so that if is obtained by splitting a class (interval) into two or merging two intervals. Then (R3) is clearly satisfied.
In order to consider (R1), we specify the adjacency relation more stringently. If , then either (i) y is obtained from by splitting exactly one class of into two non-empty parts or vice versa, or (ii) y and exhibit the same partition of the cities, i.e., . In case (i), the ordering within each class is maintained. For the split interval , this means that an index is chosen and the resulting intervals become and . The ordering between intervals (classes of ) remains fixed. In case (ii), the partition and the ordering within the intervals both remain unchanged, but the ordering of the intervals (classes of ) changes. For our purposes, it is not important which types of permutations between intervals are allowed, as long as they form an ergodic set. Plausible choices are transpositions, canonical transpositions, reversals, or even all permutations.
Now consider an encoded configuration with . The intervals of specified are partial tours of the globally optimal solution. Moves on Y can now be performed so that a new encoding is obtained in a stepwise fashion, that uses the same intervals and brings two partial tours that are consecutive in into the desired order. During this stepwise change of , the encoded sets stay the same, and thus . Now the two appropriate consecutive intervals can be merged. This reduces m by 1 and makes smaller, but the globally optimal solution is still retained, i.e., . The procedure can be repeated at most times to reach the indiscrete partition, which fully specifies the globally optimal tour. Thus, (R1) holds for all choices of neighborhoods that allow merging/splitting of adjacent intervals and an ergodic permutation of the intervals.
Sparse Subgraph Encoding for the Maximum Matching Problem
For a graph , a matching is a subset of pairwise disjoint edges, i.e., (V, M) is a graph with a maximum degree of at most 1. Denoting by X the set of matchings on G, the maximum matching problem (MMP) (X, f) has the cost function f giving the number of unmatched nodes
$$f(M) = |V| - 2|M| \tag{7}$$
in a matching M. Thus, the MMP asks for a subset of edges that cover as many nodes as possible without having any node contained in more than one edge (Lovász and Plummer 1986).
Now consider an edge subset S ⊆ E. In the present context, we call S sparse if the graph (V, S) has maximum degree at most 2, so each connected component of (V, S) is a cycle or path (including isolated nodes as trivial paths). Denote by Y the set of all sparse subsets of E. Since a matching M is also a sparse subset of E, we have X ⊆ Y.
The cover-encoding map assigns each S ∈ Y the set of maximum matchings of the graph (V, S). Now with S sparse, the maximum matching problem on (V, S) is trivially solved separately on each connected component, which is a path or cycle. For a path of odd length k, the maximum matching is unique with (k + 1)/2 edges; a path or cycle of even length k has exactly two disjoint maximum matchings of cardinality k / 2. A cycle of odd length k has exactly k pairwise different maximum matchings of cardinality (k − 1)/2.
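Because every component of a sparse edge set is a path or a cycle, the cost of the restricted problem can be read off from component sizes alone: a path or cycle on c nodes admits a maximum matching with ⌊c/2⌋ edges. The following Python sketch illustrates this. The function name and the input format (a node list and a list of edge pairs of a simple graph) are assumptions made for illustration, not part of the original text.

```python
from collections import defaultdict

def sparse_matching_cost(nodes, edges):
    """Number of unmatched nodes in a maximum matching of (nodes, edges).

    Assumes a simple graph in which every node has degree at most 2,
    so each connected component is a path, a cycle, or an isolated node.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if any(len(adj[v]) > 2 for v in nodes):
        raise ValueError("edge set is not sparse (some degree exceeds 2)")

    seen = set()
    matched_edges = 0
    for start in nodes:
        if start in seen:
            continue
        # Collect the connected component of `start` by depth-first search.
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        # A path or cycle on `size` nodes has a maximum matching
        # with size // 2 edges.
        matched_edges += size // 2
    return len(nodes) - 2 * matched_edges
```

For a graph consisting of the path 0-1-2, the triangle 3-4-5 and the isolated node 6, each of the three components leaves one node unmatched.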
For each matching , we have so property (Y2) holds. Properties (Y0) and (Y1) are fulfilled. With the choice , (F0) is fulfilled. Property (Y3) holds if and only if (G, E) is sparse itself.
We consider sparse subsets D and as adjacent, , if they differ at exactly one edge, .
In order to demonstrate properties (R1) and (R2), let . We show that there is with and . Thus, neighbor is obtained from y either by adding an edge contained in or removing an edge not contained in . If , find an edge and set , and we are done. Otherwise, since , there is an edge . If is sparse, we are done using . Otherwise at least one of the nodes v and w has degree 3 in the graph (V, z); suppose node v has degree 3. Find a maximum matching . Since v has degree 2 in the graph (V, y), there is an edge incident to v. Set . We easily confirm in each of the cases above. Sequences for properties (R1) and (R2) are obtained by induction.
String Encoding for the Maximum Clique Problem
For a graph G = (V, E), a clique is a node subset M ⊆ V inducing a fully connected subgraph, i.e., {v, w} ∈ E for all v, w ∈ M with v ≠ w. Denoting by X the set of cliques of G, the maximum clique problem (MCP) (X, f) has the cost function f giving the number of nodes
$$f(M) = |V| - |M| \tag{8}$$
outside a clique M (Bomze et al. 1999).
For arbitrary and any string of not necessarily distinct nodes , we define the greedy clique recursively by
| 9 |
and for the empty string .
We construct a cover-encoding map based on strings of length , so . For a string , we denote the substring (suffix) from index k to the end (index n) by . Now maps a string to maximal greedy cliques over suffixes of y,
| 10 |
So a clique C is contained in if and only if C is a greedy clique from a suffix of y and none of the other greedy cliques from y properly contains C. This ensures that produces all the singletons, thus fulfilling property (Y2). We call y pure if . A string is pure if and only if is a clique of G. We define strings to be adjacent, in symbols , if and only if there is a unique index with (Hamming distance 1).
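The string encoding can be sketched in Python as follows. The recursive definition of the greedy clique is not legible in the text above, so the sketch assumes the natural reading: the string is scanned from left to right and a node is added whenever it is adjacent to all nodes collected so far. The names `greedy_clique` and `encoded_cliques` and the adjacency-dictionary format are hypothetical.

```python
def greedy_clique(adj, string, start=frozenset()):
    """Greedy clique built by scanning `string` from left to right.

    `adj` maps each node to the set of its neighbors. A node is added
    if it is adjacent to every node already in the clique (assumed rule).
    """
    clique = set(start)
    for v in string:
        if all(u == v or u in adj[v] for u in clique):
            clique.add(v)
    return frozenset(clique)

def encoded_cliques(adj, y):
    """Inclusion-maximal greedy cliques over all suffixes of the string y."""
    candidates = {greedy_clique(adj, y[k:]) for k in range(len(y))}
    return {c for c in candidates if not any(c < d for d in candidates)}

def is_pure(adj, y):
    """A string is pure if it encodes exactly one clique."""
    return len(encoded_cliques(adj, y)) == 1

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
mixed = encoded_cliques(adj, (3, 0, 1, 2))   # two cliques: {2, 3} and {0, 1, 2}
single = encoded_cliques(adj, (1, 1, 1, 1))  # constant string gives a singleton
```

Under this reading, a constant string encodes a singleton clique, and a string whose entries form a clique is pure, in agreement with the statements above.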
In order to prove properties (R1) and (R2), we first observe that there is a non-increasing sequence of strings from any to a pure with and . The sequence is obtained by finding a maximal . If y is not pure, there is with . The next string in the sequence can be obtained by replacing the entry with an arbitrary element from C.
If are pure with and , there is a non-increasing sequence from y to z. It may be constructed by stepwise swapping operations. Since , there is at least one element in C found at two distinct positions in y so one of these can be used as a temporary variable in the swap.
Now let with . Find a maximal clique and a maximal clique . We construct a non-increasing sequence from y to by concatenating the following sequences. First, a non-increasing sequence from y to a pure with . Second, a non-increasing sequence from to a pure with and , and arbitrary . Third, a sequence from z to a string is obtained by assigning, step by step, nodes in to entries from to . The sequence is non-increasing because each of its strings generates C under . On the other hand, so . Now again by swap steps, we transform into .
Coarse-Graining
Some of the restricted search spaces introduced above can also be thought of as coarse-grainings of the original problem. In the following subsections, we show this for the prepartition and spanning forest encodings of the NPP, as well as for the TSP.
Prepartition Encoding of the NPP
Consider the NPP instance with numbers and let be an arbitrary partition of [n] with classes (subsets) so that . Of course, we can think of as the classes defined by the prepartition encoding, i.e., . Set . Then the set of numbers defines an NPP on m numbers. In terms of a prepartition y this amounts to . Note that if , then is the discrete partition in which every class contains only a single element, and hence . In the general case, the solutions of the two NPPs are related to each other in the following way. Denote the variables for the smaller NPP by and write and for the cost functions. Then, obviously
| 11 |
An optimal solution of the larger problem corresponds to a partition of [n] into exactly two classes and so that for and for . The coarse-grained NPP has an optimal solution with the same cost if (and in the generic case also only if) or holds for all , i.e., if (and generically only if) the coarse-graining partition is a refinement of the partition that encodes the globally optimal solution of the original problem.
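A small numerical illustration of this coarse-graining, written as a Python sketch: the numbers in each class are summed, a sign assignment of the smaller problem is lifted to the original one by giving every member of a class the sign of its class, and both assignments have the same cost. The helper names and the brute-force minimizer are assumptions introduced for the example and are practical only for very small instances.

```python
from itertools import product

def npp_cost(numbers, signs):
    """Cost of a sign assignment: absolute value of the signed sum."""
    return abs(sum(a * s for a, s in zip(numbers, signs)))

def coarse_grain(numbers, classes):
    """One number per class: the sum of the original numbers in that class."""
    return [sum(numbers[i] for i in cls) for cls in classes]

def lift(coarse_signs, classes, n):
    """Sign assignment of the original problem induced by the coarse one."""
    x = [0] * n
    for s, cls in zip(coarse_signs, classes):
        for i in cls:
            x[i] = s
    return x

def brute_force_min(numbers):
    """Exact minimum cost by enumeration (exponential, tiny inputs only)."""
    return min(npp_cost(numbers, s)
               for s in product((1, -1), repeat=len(numbers)))

a = [8, 7, 6, 5, 4]                  # optimal split: {8, 7} versus {6, 5, 4}
good = [[0, 1], [2], [3, 4]]         # refines the optimal bipartition
bad = [[0, 2], [1], [3, 4]]          # forces 8 and 6 into the same subset
print(brute_force_min(a))                       # 0
print(brute_force_min(coarse_grain(a, good)))   # 0
print(brute_force_min(coarse_grain(a, bad)))    # 2
```

The first coarse-graining refines the optimal bipartition and therefore retains the optimal cost, whereas the second does not.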
Traveling Salesman Problems
Recall the subdivision encoding for the TSP and fix an encoding . The length of the partial tour inside the interval is
| 12 |
Furthermore, the road from interval to interval is the road from to , i.e.,
| 13 |
Since a tour is uniquely defined by a permutation of the intervals, we have
| 14 |
where is the tour length of the TSP restricted to the connections between the fixed intervals. With a slight change, one can also produce a TSP that retains the original values of the cost function. To this end, we set
| 15 |
and . A short computation verifies .
Note that we naturally obtain an asymmetric TSP even if the original problem was symmetric since now because in general we will have .
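The decomposition of the tour length and the resulting asymmetry can be checked on a toy instance. In the Python sketch below, each interval is an ordered list of cities, the road from interval i to interval j runs from the last city of i to the first city of j, and the full tour length is the sum of the partial tour lengths plus the roads between consecutive intervals. The function names and the four-city instance are assumptions for illustration only.

```python
def internal_length(dist, interval):
    """Length of the partial tour that visits `interval` in the given order."""
    return sum(dist[a][b] for a, b in zip(interval, interval[1:]))

def interval_distances(dist, intervals):
    """Road from interval i to interval j: last city of i to first city of j."""
    m = len(intervals)
    return [[0 if i == j else dist[intervals[i][-1]][intervals[j][0]]
             for j in range(m)]
            for i in range(m)]

def tour_length(dist, tour):
    """Length of the closed tour visiting the cities in list order."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

# Four cities on a line; the original distances are symmetric.
pos = [0, 1, 3, 7]
dist = [[abs(p - q) for q in pos] for p in pos]
intervals = [[0, 1], [2, 3]]

D = interval_distances(dist, intervals)
inner = sum(internal_length(dist, iv) for iv in intervals)
print(D)                                   # [[0, 2], [7, 0]]
print(inner + D[0][1] + D[1][0])           # 14
print(tour_length(dist, [0, 1, 2, 3]))     # 14
```

Although `dist` is symmetric, the matrix `D` between intervals is not, because the road leaving an interval starts at its last city while the road entering it ends at its first city.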
Spanning Forest Representation of the NPP
Let us now return to the NPP. Let y be a spanning forest of . For each connected component (tree) let and be the corresponding bipartition of the vertex set of t. Define
| 16 |
This defines an instance of the NPP with as many numbers as connected components in y. A choice of sign for t implies a particular choice of sign for each , i.e., each configuration z for the NPP with numbers corresponds to a configuration x of the original problem with numbers . Clearly, these coincide with the configurations described in Sect. 3.4.3.
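As a sketch of this construction in Python: each tree of the forest is two-colored by breadth-first search, and the number attached to the tree is taken to be the absolute difference between the sums over its two color classes. Since the defining equation is not legible above, this reading (forest edges join numbers with opposite signs) is an assumption, as are the function name and the input format.

```python
from collections import deque

def forest_coarse_grain(numbers, forest_edges):
    """Reduce an NPP instance along a spanning forest.

    `forest_edges` must be acyclic. Returns one non-negative number per
    tree and the relative signs (+1 or -1) of the original numbers
    within their trees.
    """
    n = len(numbers)
    adj = [[] for _ in range(n)]
    for u, v in forest_edges:
        adj[u].append(v)
        adj[v].append(u)

    color = [0] * n          # 0 marks nodes not yet visited
    reduced = []
    for root in range(n):
        if color[root] != 0:
            continue
        color[root] = 1
        total = 0
        queue = deque([root])
        while queue:
            v = queue.popleft()
            total += color[v] * numbers[v]
            for w in adj[v]:
                if color[w] == 0:
                    color[w] = -color[v]   # neighbors get opposite signs
                    queue.append(w)
        reduced.append(abs(total))
    return reduced, color

reduced, color = forest_coarse_grain([8, 7, 6, 5, 4], [(0, 2), (1, 3)])
print(reduced)   # [2, 2, 4]
```

Flipping the sign chosen for a tree flips the signs of all numbers in that tree at once, which is how a configuration of the reduced problem determines a configuration of the original one.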
Some Remarks on Coarse-Grainings: Analogies with the Renormalization Group?
It is tempting to speculate that the coarse-grainings we have observed in the above are analogous to those observed in renormalization group theory, well known for its use in analyzing spin glasses and related disordered systems (Rosten 2012). In our context, it can be described as follows. For a given type of problem, such as the NPP or the TSP, consider the space of all possible instances of all sizes. A particular instance (e.g., the NPP with n numbers ) is a point . Now we define a set of maps that map larger instances to strictly smaller ones. Of interest in this context are in particular those maps r that (approximately) preserve salient properties. Since is a smaller instance than , the map r is not invertible. The maps in can of course be composed, and thus form a semi-group which is known as the renormalization group (Wilson and Kogut 1974; Wilson 1971). Of course, while renormalization groups in statistical physics are used to analyze the typical behavior of large systems near criticality, our focus in the present optimization context is on particular instances of systems that are typically large. This does not yet rule out an analogy, assuming that something like an ergodic hypothesis applies, where the behavior of typical instances is indeed that of the average. Thus, starting from , or more precisely, an encoding y so that , we can think of adjacent encodings with as “renormalized” versions of . A path in leading from to the trivial instance thus can be seen as the iteration of progressively renormalized samples.
A positive example of this analogy could be that of the spanning forest encoding of the NPP with real-space renormalization schemes for Ising spins: an example of an could be a so-called block spin transformation (Kadanoff 1966), where suitable averages are taken over small local subsets of spins, which are then progressively scaled up to larger system sizes to explore their critical behavior. Only certain block variables will work for such schemes, depending on the underlying symmetries of the problem, just as, in the earlier subsection, only the sums of numbers preserve the optimal solutions. Such simple real-space scalings do not, however, always exist for our optimization schemes: the prepartition encoding of the TSP, for example, cannot be rephrased as a coarse-grained (i.e., reduced-size) TSP. To see this, simply observe that the evaluation of a tour in the restricted model still requires an optimization over multiple incoming and outgoing connections (roads) for every city, i.e., the information of inter-city distances cannot be collapsed in any way upon the transition from a larger (less restricted) to a smaller (more restricted) problem. This does not, however, rule out the possibility of, say, a renormalization-type scaling in some sort of generalized Fourier space. In the case of landscapes on permutation spaces, the characters of the symmetric group provide a suitable Fourier-like basis (Rockmore et al. 2002), which seems to be applicable to the TSP and certain assignment problems. These and other possibilities are currently being explored, since it seems that deep similarities may underlie relatively superficial differences in the nature of the transformations involved in renormalization groups and the optimization-facilitating encodings that are the subject of this paper.
Heuristic Optimization over Y
General Considerations
So far, we were only concerned with the abstract structure of cover-encoding maps and the adjacencies in their encodings Y. On this theoretical basis, we can now construct a search-based optimization heuristic that generalizes the approaches in (Ruml et al. 1996) and our earlier work (Klemm et al. 2012). The idea is very simple: If we have an accurate and efficiently computable heuristic, we can quickly obtain good upper bounds for each of the restricted problems . The properties (R1) and (R2) guarantee the existence of non-increasing paths from an arbitrary initial encoding down to a final encoding . Steps to adjacent encodings that decrease therefore will have a bias toward the optimal solution of the original problem.
The fact that we have to rely on the quality of the estimate also suggests that it should be more efficient to restart the search often rather than try to overcome barriers of local minima in the landscape . In the examples above, local minima in can, as we have proved, appear only due to insufficient accuracy of the heuristic solutions for some encodings.
The discussion above also implies guidelines for the construction of encodings:
The cover-encoding map should be of a form that guarantees that has no local optima, i.e., the properties (R1), (R2), (Y1), and (Y2) should hold.
The paths in connecting large sets to smaller ones should not contain many steps along which the sets do not shrink. For instance, while the prepartition encoding for the NPP always has a strictly coarse-grained neighbor, this is not the case for the prepartition encoding for the TSP. We therefore suspect that other encodings for the TSP will work better in general.
The heuristic producing needs to be efficient, ideally not much slower than the function evaluations for the initial cost function f.
In order to demonstrate that the theory developed above may also have practical implications, we probe instances of encoded landscapes by adaptive walks. To simulate a realization of an adaptive walk, we first generate an initial state y(0) by a procedure specific for the given landscape. At each time step t, we uniformly draw a neighbor z of state y(t) and set y(t + 1) = z if the encoded cost of z does not exceed that of y(t), and y(t + 1) = y(t) otherwise.
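A generic adaptive walk of this kind can be sketched in a few lines of Python. The sketch assumes that moves to neighbors of equal cost are accepted, which is what allows the walk to drift along neutral plateaus; the exact acceptance condition is not legible in the text above, so this is an assumption. The function signature, the `random_neighbor` callback and the toy landscape are hypothetical.

```python
import random

def adaptive_walk(initial, random_neighbor, cost, steps, seed=0):
    """Adaptive walk that never accepts a cost increase.

    `random_neighbor(state, rng)` must return a uniformly drawn neighbor.
    Returns the final state and the cost recorded at every time step.
    """
    rng = random.Random(seed)
    state = initial
    trajectory = [cost(state)]
    for _ in range(steps):
        candidate = random_neighbor(state, rng)
        # Accept equal-cost moves as well (assumed rule, see lead-in).
        if cost(candidate) <= cost(state):
            state = candidate
        trajectory.append(cost(state))
    return state, trajectory

# Toy landscape: integers with cost |x| and neighbors x - 1 and x + 1.
final, costs = adaptive_walk(
    10, lambda x, rng: x + rng.choice((-1, 1)), abs, steps=2000)
```

On an encoded landscape, `cost` would be the exact or heuristic cost of the restricted problem and `random_neighbor` would implement the adjacency relation on Y.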
We select the MMP and the MCP as examples because (1) oracle functions and encodings can be constructed that guarantee the absence of strict local minima; and (2) there is a simple and efficient algorithm for exact computation of for each . So we do not require heuristics. We leave the combination of cover-encoding maps with non-trivial heuristics for a future manuscript.
Maximum Matching Problems
Figure 3 shows the time evolution of cost in adaptive walks on the encoded landscapes of matchings encoded by sparse graphs, where the figure caption contains details on the instances and the definitions are to be found in Sect. 3.4.5. Note the logarithmic time axis in the plot.
Fig. 3.
(Color figure online) Time evolution of cost in adaptive walks on the landscape of matchings encoded by sparse subgraphs. Radius of symbols is proportional to the number of degrees of freedom (paths of even length and cycles of odd length) in the encoded state. Upper set of curves: 10 realizations, each on an independently generated ER random graph on 500 nodes with edge probability , i.e., average degree 2. Lower set of curves: 10 realizations on graphs (500 nodes) with perfect matching planted first, then adding each of the remaining possible edges with , resulting in average degree 2. Each adaptive walk is initialized by a random maximal matching L(0). Departing from the empty set, L(0) is generated by considering the edges of the graph G in the order of a random uniform permutation and adding an edge to L(0) if the result remains a matching
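The initialization described in the caption can be sketched as follows in Python: the edges are visited in a uniformly random order and an edge is added whenever both of its end nodes are still unmatched. The function name and the seed parameter are assumptions for illustration.

```python
import random

def random_maximal_matching(edges, seed=0):
    """Greedy maximal matching over a uniformly random edge order."""
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)

    matched = set()      # nodes already covered by the matching
    matching = []
    for u, v in order:
        # Adding (u, v) keeps a matching only if both end nodes are free.
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

The result is maximal in the sense that no further edge can be added, but it is not necessarily a maximum matching, which is why the adaptive walk still has work to do.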
Both on purely random graphs and on those with a planted perfect matching, a solution of globally minimal cost is found. In addition to reaching a minimum-cost solution, we observe another interesting feature of the dynamics. The sizes of symbols (and annotated values in the uppermost curve) indicate the number of degrees of freedom of the solution y(t) at time t. This is the number of the connected components in the sparse graph, with two distinct maximum matchings. Departing from a singleton state (), the number of degrees of freedom first increases and then decreases during the descent of cost. So the optimization happens as a walk through states with large cardinality of the encoded set. Furthermore as a particular feature of this encoded landscape, the optimization dynamics eventually returns to low , having with a single optimal solution selected at large time t.
Maximum Clique Problems
Figure 4 shows the time evolution of the cost of adaptive walks on the encoded landscapes of graph cliques encoded by node sequences. The figure caption contains details on the instances and relevant definitions can be found in Sect. 3.4.6. We plot the difference with the minimum cost , so that a plotted value of 0 means the global optimum has been found.
Fig. 4.
(Color figure online) Time evolution of cost in adaptive walks on the landscape of cliques encoded by node sequences. For each graph size |V|, 100 random graph instances with parameter are generated independently. For each instance, an adaptive walk on the encoded landscape is performed with starting state . Plotted values are differences between of the state y(t) held by the adaptive walk at time t and the optimal cost , averaged over the 100 instances. Length of error bars is the standard deviation over these instances. The exact is computed with a branch-and-bound algorithm (Östergård 2002)
Our tentative conclusion is that the time to reach the optimal solution scales moderately with problem size. The standard deviation over realizations (error bars in the plot) also indicates a moderate variation of optimization time across these randomly generated instances.
Discussion and Conclusions
In this contribution we have shown that, in principle, it is possible to construct a genotypic encoding for any given phenotypically encoded combinatorial optimization problem with the property that the encoded landscape has no strict local minima. The construction hinges on three ingredients: a cover-encoding map that satisfies a few additional conditions, a suitable adjacency relation on Y, and an oracle function that (miraculously) returns the optimal cost value on the restrictions of the original problem to the covering sets . Of course, if we had such an oracle function in practice, we would not need a search heuristic in the first place.
Nevertheless, the concepts of oracle functions and cover-encoding maps are not just an empty exercise. We have seen that cover-encoding maps give rise to practically useful encodings provided there is a good deterministic heuristic for the restriction of the optimization problem to . For the NPP, it turns out that the Karmarkar–Karp differencing algorithm (Karmarkar and Karp 1982; Boettcher and Mertens 2008) provides a very good approximation to the oracle function. The prepartition encoding proposed by Ruml et al. (1996), on the other hand, ensures that the landscape of the oracle function is of the desirable type that has no local minima. Together these two facts make the work of Ruml et al. (1996) a showcase application of the theory developed here.
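For readers unfamiliar with the differencing heuristic, a minimal Python sketch of its basic form is given here: the two largest numbers are repeatedly replaced by their difference, which commits them to opposite subsets, until a single number remains. This is the textbook version that returns only the cost, not the implementation used in the cited works, and the function name is an assumption.

```python
import heapq

def karmarkar_karp(numbers):
    """Cost found by the basic differencing heuristic for the NPP.

    Returns an upper bound on the optimal cost; the bound is not
    guaranteed to be tight.
    """
    heap = [-a for a in numbers]     # negate values to obtain a max-heap
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        # Placing the two largest numbers in different subsets leaves
        # only their difference to be balanced.
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0

print(karmarkar_karp([8, 7, 6, 5, 4]))   # 2, although the optimum is 0
```

The example shows why the heuristic only approximates an oracle: it returns 2 for an instance whose optimal cost is 0 (8 + 7 = 6 + 5 + 4).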
The numerical simulations of Sect. 5 strongly suggest that encodings with local-minima-free landscapes indeed admit efficient optimization by local search-based methods also for other optimization problems. Hence the theoretical results obtained here are of practical relevance provided a sufficiently accurate approximation to the oracle function can be computed. The precise meaning of the phrase “sufficiently accurate approximation” remains an open question for future research. We suspect, however, that the main problem arises when the approximation claims , suggesting that a step from y to be accepted, while holds, suggesting the step to should not be taken.
The construction of encodings for several well-known optimization problems also highlights the connections between encodings and a natural notion of coarse-graining for optimization problems. This also suggests a link to renormalization group methods commonly used in statistical physics. While it is clear that there is not a trivial correspondence, and that real-space coarse-grainings are just a particular subclass of encodings, this connection certainly deserves further study. The formalism laid out here at least provides a promising starting point.
An important issue in biology is the fact that encodings as symbolized by the genotype–phenotype map are themselves subject to evolutionary changes because the mechanisms of development evolve. It is well known that features of the genotype–phenotype map, such as robustness (Wagner 2005) and accessibility (Fontana and Schuster 1998; Ndifon et al. 2009), have a key influence on evolution in the long term. Mathematical approaches that focus on the properties of encodings thus may become a very useful component in formal theories of evolvability and developmental evolution.
Acknowledgements
Open access funding provided by Max Planck Society. KK acknowledges funding from MINECO through the Ramón y Cajal program and through project SPASIMM, FIS2016-80067-P (AEI/FEDER, EU). This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 694925).
Contributor Information
Konstantin Klemm, Email: klemm@ifisc.uib-csic.es.
Anita Mehta, Email: anita@bioinf.uni-leipzig.de.
Peter F. Stadler, Email: studla@bioinf.uni-leipzig.de
References
- Applegate DL, Bixby RM, Chvátal V, Cook WJ. The traveling salesman problem. Princeton: Princeton University Press; 2006.
- Banzhaf W, Leier A. Evolution on neutral networks in genetic programming. In: Yu T, Riolo R, Worzel B, editors. Genetic programming theory and practice III. New York: Springer; 2006. pp. 207–221.
- Boettcher S, Mertens S. Analysis of the Karmarkar-Karp differencing algorithm. Eur Phys J B. 2008;65:131–140. doi: 10.1140/epjb/e2008-00320-9.
- Bomze IM, Budinich M, Pardalos PM, Pelillo M. The maximum clique problem. In: Du DZ, Pardalos PM, editors. Handbook of combinatorial optimization-supplement volume A. Dordrecht: Kluwer Academic Publishers; 1999. pp. 1–74.
- Borenstein Y, Moraglio A, editors. Theory and principled methods for designing metaheuristics. Berlin: Springer; 2014.
- Choi SS, Moon BR. Normalization for genetic algorithms with nonsynonymously redundant encodings. IEEE Trans Evol Comp. 2008;12:604–616. doi: 10.1109/TEVC.2007.913699.
- Ciliberti S, Martin OC, Wagner A. Innovation and robustness in complex regulatory gene networks. Proc Natl Acad Sci USA. 2007;104:13591–13596. doi: 10.1073/pnas.0705396104.
- Dykhuizen DE, Dean AM, Hartl DL. Metabolic flux and fitness. Genetics. 1987;115:25–31. doi: 10.1093/genetics/115.1.25.
- Engelbrecht A, Richter H, editors. Recent advances in the theory and application of fitness landscapes. Berlin: Springer; 2014.
- Fernández P, Solé RV. Neutral fitness landscapes in signalling networks. J R Soc Interface. 2007;4:41–47. doi: 10.1098/rsif.2006.0152.
- Flamm C, Stadler BMR, Stadler PF. Saddles and barrier in landscapes of generalized search operators. In: Stephens CR, Toussaint M, Whitley D, Stadler PF, editors. 9th international workshop on foundations of genetic algorithms IX, FOGA 2007, Mexico City, Mexico, January 8–11, 2007. Lecture notes in computer science, vol 4436. Berlin: Springer; 2007. pp. 194–212.
- Flamm C, Ullrich A, Ekker H, Mann M, Högerl D, Rohrschneider M, Sauer S, Scheuermann G, Klemm K, Hofacker IL, Stadler PF. Evolution of metabolic networks: a computational framework. J Syst Chem. 2010;1:4. doi: 10.1186/1759-2208-1-4.
- Fontana W, Schuster P. Continuity in evolution: on the nature of transitions. Science. 1998;280:1451–1455. doi: 10.1126/science.280.5368.1451.
- Gutin G, Punnen AP, editors. The traveling salesman problem and its variations, combinatorial optimization. Berlin: Springer; 2007.
- Hammack R, Imrich W, Klavžar S. Handbook of product graphs. 2nd ed. Boca Raton: CRC Press; 2016.
- Kadanoff LP. Scaling laws for Ising models near $T_c$. Physics. 1966;2:263–272. doi: 10.1103/PhysicsPhysiqueFizika.2.263.
- Karmarkar N, Karp RM. The differencing method of set partitioning. Berkeley: Computer Science Division (EECS), University of California; 1982.
- Klemm K, Mehta A, Stadler PF. Landscape encodings enhance optimization. PLoS ONE. 2012;7:e34780. doi: 10.1371/journal.pone.0034780.
- Knowles JD, Watson RA. On the utility of redundant encodings in mutation-based evolutionary search. In: Guervós JJM, Adamidis P, Beyer HG, Schwefel HP, Fernández-Villacañas JL, editors. Parallel problem solving from nature—PPSN VII. Berlin: Springer; 2002. pp. 88–98.
- Lovász L, Plummer MD. Matching theory, annals of discrete mathematics. Amsterdam: North-Holland; 1986.
- Mertens S. The easiest hard problem: number partitioning. In: Percus A, Istrate G, Moore C, editors. Computational complexity and statistical physics. Oxford: Oxford University Press; 2006. pp. 125–140.
- Ndifon W, Plotkin JB, Dushoff J. On the accessibility of adaptive phenotypes of a bacterial metabolic network. PLoS Comput Biol. 2009;5:e1000472. doi: 10.1371/journal.pcbi.1000472.
- Neumann F, Witt C. Bioinspired computation in combinatorial optimization. Berlin: Springer; 2010.
- Östergård PRJ. A fast algorithm for the maximum clique problem. Discr Appl Math. 2002;120:197–207. doi: 10.1016/S0166-218X(01)00290-6.
- Østman B, Hintze A, Adami C. Critical properties of complex fitness landscapes. In: Fellermann H, Dörr M, Hanczyc MM, Laursen LL, Maurer SE, Merkle D, Monnard PA, Støy K, Rasmussen S, editors. Artificial life XII. Cambridge: MIT Press; 2010. pp. 126–132.
- Reidys CM, Stadler PF. Combinatorial landscapes. SIAM Rev. 2002;44:3–54. doi: 10.1137/S0036144501395952.
- Rockmore D, Kostelec P, Hordijk W, Stadler PF. Fast Fourier transform for fitness landscapes. Appl Comput Harm Anal. 2002;12:57–76. doi: 10.1006/acha.2001.0346.
- Rosten OJ. Fundamentals of the exact renormalization group. Phys Rep. 2012;511:177–272. doi: 10.1016/j.physrep.2011.12.003.
- Rothlauf F. Representations for genetic and evolutionary algorithms. 2nd ed. Heidelberg: Springer; 2006.
- Rothlauf F. Design of modern heuristics: principles and application. Heidelberg: Springer; 2011.
- Rothlauf F, Goldberg DE. Redundant representations in evolutionary computation. Evol Comput. 2003;11:381–415. doi: 10.1162/106365603322519288.
- Ruml W, Ngo J, Marks J, Shieber S. Easily searched encodings for number partitioning. J Optim Theory Appl. 1996;89:251–291. doi: 10.1007/BF02192530.
- Schuster P, Fontana W, Stadler PF, Hofacker IL. From sequences to shapes and back: a case study in RNA secondary structures. Proc R Soc Lond B. 1994;255:279–284. doi: 10.1098/rspb.1994.0040.
- Teranishi Y. The number of spanning forests of a graph. Discrete Math. 2005;290:259–267. doi: 10.1016/j.disc.2004.10.014.
- Wagner A. Robustness, evolvability, and neutrality. FEBS Lett. 2005;579:1772–1778. doi: 10.1016/j.febslet.2005.01.063.
- Wilson KG. Renormalization group and critical phenomena. I. Renormalization group and the Kadanoff scaling picture. Phys Rev B. 1971;4:3174–3183. doi: 10.1103/PhysRevB.4.3174.
- Wilson KG, Kogut J. The renormalization group and the ε expansion. Phys Rep. 1974;12:75–199. doi: 10.1016/0370-1573(74)90023-4.
- Wright S. The roles of mutation, inbreeding, crossbreeding and selection in evolution. In: Jones DF, editor. Proceedings of the sixth international congress on genetics, vol 1; 1932. pp. 356–366.
- Wright S. “Surfaces” of selective value. Proc Nat Acad Sci USA. 1967;58:165–172. doi: 10.1073/pnas.58.1.165.
- Yu T, Miller JF. Finding needles in haystacks is not hard with neutrality. Lect Notes Comp Sci. 2002;2278:13–25. doi: 10.1007/3-540-45984-7_2.




