Rotamer Optimization for Protein Design through MAP Estimation and Problem-Size Reduction

Eun-Jong Hong; Shaun M Lippow; Bruce Tidor; Tomás Lozano-Pérez

doi:10.1002/jcc.21188

Author manuscript; available in PMC: 2012 Nov 10.

Published in final edited form as: J Comput Chem. 2009 Sep;30(12):1923–1945. doi: 10.1002/jcc.21188


PMCID: PMC3495010 NIHMSID: NIHMS414666 PMID: 19123203

Abstract

The search for the global minimum energy conformation (GMEC) of protein side chains is an important computational challenge in protein structure prediction and design. Using rotamer models, the problem is formulated as an NP-hard optimization problem. Dead-end elimination (DEE) methods combined with systematic A* search (DEE/A*) have proven useful, but may not be strong enough as we attempt to solve protein design problems where a large number of similar rotamers is eligible and the network of interactions between residues is dense. In this work, we present an exact solution method, named BroMAP (branch-and-bound rotamer optimization using MAP estimation), for such protein design problems. The design goal of BroMAP is to be able to expand smaller search trees than conventional branch-and-bound methods while performing only a moderate amount of computation in each node, thereby reducing the total running time. To achieve this, BroMAP attempts reduction of the problem size within each node through DEE and elimination by lower bounds from approximate maximum-a-posteriori (MAP) estimation. The lower bounds are also exploited in branching and subproblem selection for fast discovery of strong upper bounds. Our computational results show that BroMAP tends to be faster than DEE/A* for large protein design cases. BroMAP also solved cases that were not solved by DEE/A* within the maximum allowed time, and did not incur significant disadvantage for cases where DEE/A* performed well. Therefore, BroMAP is particularly applicable to large protein design problems where DEE/A* struggles and can also substitute for DEE/A* in general GMEC search.

Introduction

Determining low-energy placements for side chains on a fixed backbone is an important problem in both protein structure prediction and protein design. A typical approach to protein structure prediction is homology modeling^1;2;3 followed by refinement of the model through determination of the side-chain conformations. Determining the side-chain conformation for a given backbone structure and an amino acid sequence is called “side-chain placement” and is solved by finding the minimum energy conformation. In addition, in protein design problems, also referred to as the “inverse folding problem”^4;5;6, an amino acid sequence that will stably fold to the target backbone structure is to be found. Given a backbone structure and energy functions, the protein design problem is also solved as a generalized side-chain placement problem, that is, by finding the minimum energy conformation of side chains, drawing from a range of amino acid types at each residue position^7;8. If the backbone structure is not assumed to be fixed, one can still design with a flexible backbone by using iterative steps, where a side-chain placement problem is solved for each perturbed fixed backbone structure⁹. The search for the minimum energy conformation is, therefore, one of the most important computational challenges in computational protein design.

In finding the minimum energy conformation, the search space can be simplified by allowing only some finite number of fixed side-chain conformations, called rotamers^10;11. With the rotamer model, the energy function of a protein sequence folded onto a specific backbone template can be described in terms of¹²:

the self-energy of the backbone template from the interactions within the backbone (denoted as E_template);
the singleton interaction energy between the backbone and rotamer conformation r at position i of the sequence (denoted as E(i_r));
the pairwise interaction energy between rotamer conformation r at position i and rotamer conformation s at position j, i ≠ j (denoted as E(i_r, j_s)).

Then, the energy of a protein sequence of length n in a specific backbone template structure and conformation C = {C₁, …, C_n | C_i is the conformation of position i} can be written in a functional form as

$$E(C) = E_{\text{template}} + \sum_{i = 1}^{n} E(C_{i}) + \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} E(C_{i}, C_{j}). \tag{1}$$

Energy terms E(i_r) and E(i_r, j_s) can be computed for a given backbone template and the set of allowed rotamers using coordinates of atoms and specified molecular force fields, such as AMBER^13;14;15, CHARMM^16;17, MMFF¹⁸, or OPLS¹⁹. The conformation C that minimizes the energy function E(C) is often called the global minimum energy conformation (GMEC). In this work, we consider the problem of finding the GMEC when given a backbone conformation, a set of rotamers, and energy terms, and call such a problem “the GMEC problem”. Note that E_template is constant by definition and can be ignored when we minimize E(C).
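As an illustration of Eq. (1) and of the GMEC problem itself, the following minimal Python sketch evaluates the energy of a conformation from singleton and pairwise tables and finds the GMEC by exhaustive enumeration. The data layout (`single`, `pair`), the function names, and the toy energy values are assumptions made for this example only; enumeration is practical only for very small instances.

```python
from itertools import product

def conformation_energy(conf, single, pair, e_template=0.0):
    """Evaluate Eq. (1) for one conformation.

    conf   : tuple, conf[i] is the rotamer index chosen at position i
    single : single[i][r] = E(i_r)
    pair   : pair[(i, j)][r][s] = E(i_r, j_s) for i < j
    """
    n = len(conf)
    energy = e_template
    for i in range(n):
        energy += single[i][conf[i]]
    for i in range(n - 1):
        for j in range(i + 1, n):
            energy += pair[(i, j)][conf[i]][conf[j]]
    return energy

def brute_force_gmec(single, pair):
    """Enumerate every conformation; only feasible for toy problems."""
    choices = [range(len(rotamers)) for rotamers in single]
    best = min(product(*choices),
               key=lambda c: conformation_energy(c, single, pair))
    return best, conformation_energy(best, single, pair)

# Toy instance with made-up energies: 3 positions, 2 rotamers each.
single = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair = {
    (0, 1): [[2.0, 0.0], [0.0, 2.0]],
    (0, 2): [[0.0, 1.0], [1.0, 0.0]],
    (1, 2): [[0.0, 3.0], [3.0, 0.0]],
}
gmec, energy = brute_force_gmec(single, pair)
```

The number of conformations grows as the product of the rotamer counts over all positions, which is why the exact methods discussed in the rest of the paper avoid full enumeration.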

The GMEC problem is a strongly NP-hard optimization problem as one can readily show by reduction from the satisfiability problem²⁰. Despite the theoretical hardness, one finds that many instances of the GMEC problem are easily solved by the exact method of dead-end elimination (DEE)¹². Elimination procedures such as Goldstein’s conditions and unification²¹, logical singles-pairs elimination²², the magic bullet pairs heuristic²³, splitting²⁴, generalized elimination conditions²⁵, hybrid optimization through scheduling of various elimination conditions²⁶, and more recently divide-and-conquer enhancement to DEE²⁷ are often able to reduce the problem size dramatically, while demanding only reasonable computational power.

Other than DEE, there exist various approaches to solve the GMEC problem exactly. Leach and Lemon²⁸, Gordon and Mayo²⁹, and Wernisch et al.³⁰ describe a branch-and-bound method. Eriksson et al.³¹, Althaus et al.³², and Kingsford et al.³³ present integer linear programming approaches. Leaver-Fay et al.³⁴ describe a dynamic programming approach based on tree-decomposition. Xu³⁵ describes another method based on tree-decomposition and presents a tree-decomposition algorithm for protein backbone structures. Xie and Sahinidis³⁶ describe a method that combines several residue-reduction and rotamer-reduction techniques. Yanover et al.³⁷ use a tree-reweighted belief propagation algorithm as a linear-program solver with better scalability, and Weiss et al.³⁸ extend this approach by suggesting a search scheme for an integral solution when the solution of the linear program is fractional. Each exact approach may have some advantages over others depending on the characteristics of the problem being considered. For example, for a simplified version of the problem where the number of rotamers per position is limited or interactions between residue positions are sparse, even deterministic algorithms with guaranteed time bounds exist. However, it is known that protein structures and stabilities can be predicted better with more side-chain flexibility, that is, by using a larger rotamer library^39;40. In addition, the network of interactions between residue positions can be dense as is often observed in protein cores. Therefore, we are interested in protein design problems where all possible pairs of positions are assumed to interact and a large number of similar rotamers is offered at each position. To our knowledge, only DEE-like methods or DEE followed by branch-and-bound methods have shown success in solving such hard protein design cases exactly.

There also exist approximate approaches for the GMEC problem. Koehl and Delarue⁴¹ present the self-consistent mean field theory. Desjarlais and Handel⁴² and Jones⁴³ use genetic algorithms. Jiang et al.⁴⁴ use simulated annealing and Monte Carlo sampling. Wernisch et al.³⁰ describe a heuristic for protein design. Yanover and Weiss⁴⁵ use belief-propagation methods. However, inaccuracy during GMEC search may introduce uncertainty in the analysis step where correction of energy functions or modification of the design protocol is to be made. Therefore, we are primarily interested in finding the exact GMEC and will not further consider approximate methods in this work.

Enhanced DEE²⁶ performs well for some of the hard protein design cases of interest to us. However, finding dead-ends using the known elimination conditions does not always eliminate as many rotamers or rotamer pairs as necessary. When the conformational space remaining after DEE is too large to enumerate explicitly, a systematic search method such as the A* algorithm^46;28 is often applied afterward to find the GMEC (we call the combined method DEE/A*). However, such a combined scheme will not be useful unless DEE reduces the size of the conformational space to the point where a systematic search is applicable.

Here we describe a new exact solution method for the GMEC problem that can substitute for DEE/A*, especially in solving hard design cases. Our method, named BroMAP (branch-and-bound rotamer optimization using MAP estimation), is based on the branch-and-bound (BnB) framework and a new subproblem-pruning method. We present lower-bounding methods and problem-size reduction techniques, organized into a BnB framework so that BroMAP is guaranteed to find an optimal solution.

Our numerical experiments confirm the utility of BroMAP in GMEC search for large protein design problems, including ones that are challenging for DEE/A*. In our experiments, all cases solved by DEE/A* were also solved by BroMAP, and using BroMAP did not incur significant disadvantage over DEE/A*. Moreover, BroMAP excelled on the cases where DEE/A* did not perform well; for each case that took longer than one hour but was eventually solved by DEE/A*, BroMAP took at most 33% of the DEE/A* running time. Among 68 test cases of various types and sizes, we found BroMAP failed to solve three cases within the 7-day allowed time whereas DEE/A* failed to solve 17 of them.

Compared to DEE, BroMAP has an advantage that it can attack smaller subproblems separately using various problem-size reduction or lower-bounding techniques instead of having to keep the problem as a whole. Meanwhile, the use of DEE as one of the problem-size reduction techniques in BroMAP allows the strengths of DEE for protein design problems to be transferred to BroMAP.

BroMAP has the advantage of reducing the search trees over conventional BnB approaches in two ways. First, it uses problem-size reduction techniques within each node so that the effect of problem-size reduction from branching is often larger than that of a conventional BnB method. Hence, the depth of the resulting search tree is also smaller. Second, it quickly finds a strong upper-bound (at the end of the first depth-first dive) with the help of informed branching and subproblem selection. This facilitates effective pruning of nodes that follow, and therefore often results in sparse search trees growing mostly in one direction. BroMAP achieves these advantages without excessive computation by using new inexpensive lower-bounding methods and limiting the effort spent by bounding or problem-size reduction.

The contributions of this work are as follows:

Development of lower-bounding methods for minimum conformation energy of individual rotamers and rotamer pairs using a maximum-a-posteriori estimation method called tree-reweighted max-product algorithm⁴⁷;
Adoption of problem-size reduction techniques (DEE and elimination by lower-bounds) within the BnB framework;
Use of rotamer lower-bounds in branching and subproblem selection for fast discovery of strong upper-bounds;
Extensive evaluation of BroMAP and DEE/A* on various types and sizes of protein design problems.

Overview of the method

In this section, we present an overview of BroMAP in a top-down manner. We start with a brief description of the branch-and-bound method as the framework of BroMAP. Then, the pruning scheme used by BroMAP is discussed in more detail.

Branch-and-bound framework

Figure 1 shows an overview of BroMAP. It is organized at the top level as a branch-and-bound method (BnB), a general problem-solving technique particularly effective for combinatorial problems⁴⁸. The basic idea of BnB is to partition the original problem recursively and solve these smaller subproblems. In the resulting search tree, each subproblem is another instance of the GMEC problem, with a different number of rotamers or residue positions from the original problem at the root node.

BnB solves the GMEC problem as a kind of tree search problem. It maintains a global upper-bound U, which is the energy of the best conformation found so far. The initial value of U is set to the energy of an arbitrary conformation. BroMAP can be recursively described as follows:

1. Select a subproblem from the queue.
2. Can the subproblem be fully solved within limited time and memory? If so, (a) compute the minimum energy; (b) set U to the minimum energy if it is less than U; (c) return to step 1.
3. Compute a lower bound and an upper bound on the minimum energy for this subproblem. If the upper bound is less than U, set U to the upper bound.
4. If the lower bound exceeds the current global upper-bound U, then discard (prune) this subproblem and return to step 1.
5. When possible, exclude ineligible conformations from the search space.
6. Pick one residue and split its rotamers into two groups; define two child subproblems based on this split (see Figure 2).
7. Add the child subproblems to the queue and return to step 1.

Figure 2. Splitting a subproblem. Rotamers at a position are divided into two groups and each child of the subproblem takes only one group of rotamers.

A node is said to be “expanded” (i.e. processed) by steps 2 to 7. This description leaves many details unspecified: how to attempt solutions, how to obtain bounds, how to identify ineligible conformations, how to choose the residue and rotamers for the node split, and in what order to solve the subproblems. We provide these details in the subsequent sections.
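The recursive description above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the BroMAP implementation: the lower bound is a weak sum of per-term minima (a stand-in for the TRMP bounds introduced later), the upper bound is the energy of a greedy conformation, small subproblems are solved by enumeration instead of DEE/A*, step 5 is omitted, and the split simply halves the rotamer list of the largest position. All names and the toy energies are hypothetical.

```python
from itertools import product

def energy(conf, single, pair):
    """Eq. (1) without the constant template term."""
    n = len(conf)
    total = sum(single[i][conf[i]] for i in range(n))
    total += sum(pair[(i, j)][conf[i]][conf[j]]
                 for i in range(n - 1) for j in range(i + 1, n))
    return total

def simple_lower_bound(allowed, single, pair):
    """Sum of per-term minima: valid but weak (a stand-in, not TRMP)."""
    n = len(allowed)
    bound = sum(min(single[i][r] for r in allowed[i]) for i in range(n))
    bound += sum(min(pair[(i, j)][r][s] for r in allowed[i] for s in allowed[j])
                 for i in range(n - 1) for j in range(i + 1, n))
    return bound

def branch_and_bound(single, pair, enum_limit=4):
    n = len(single)
    root = [list(range(len(single[i]))) for i in range(n)]
    best = tuple(a[0] for a in root)            # arbitrary initial conformation
    U = energy(best, single, pair)              # global upper bound
    stack = [root]                              # LIFO queue gives depth-first order
    while stack:
        allowed = stack.pop()                   # step 1
        size = 1
        for a in allowed:
            size *= len(a)
        if size <= enum_limit:                  # step 2: solve small subproblems
            for conf in product(*allowed):
                e = energy(conf, single, pair)
                if e < U:
                    U, best = e, conf
            continue
        # step 3: heuristic upper bound from a greedy conformation
        guess = tuple(min(a, key=lambda r, i=i: single[i][r])
                      for i, a in enumerate(allowed))
        e = energy(guess, single, pair)
        if e < U:
            U, best = e, guess
        if simple_lower_bound(allowed, single, pair) > U:   # step 4: prune
            continue
        # step 5 (elimination of ineligible conformations) is omitted here
        i = max(range(n), key=lambda k: len(allowed[k]))    # step 6: split
        half = len(allowed[i]) // 2
        for group in (allowed[i][half:], allowed[i][:half]):
            child = list(allowed)
            child[i] = group
            stack.append(child)                 # step 7
    return best, U

# Toy instance with made-up energies: 3 positions, 2 rotamers each.
single = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair = {
    (0, 1): [[2.0, 0.0], [0.0, 2.0]],
    (0, 2): [[0.0, 1.0], [1.0, 0.0]],
    (1, 2): [[0.0, 3.0], [3.0, 0.0]],
}
best_conf, best_energy = branch_and_bound(single, pair)
```

Because every subproblem is either enumerated, pruned by a valid lower bound, or split into two children that together cover it, the sketch returns an optimal conformation; only the amount of work depends on the quality of the bounds.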

The key advantage of BnB over naive enumeration-based methods comes from being able to approximately solve subproblems, that is, to obtain bounds on the answer that allow many subproblems to be pruned, thus avoiding exploration of the entire solution space. If the bounds are weak, BnB may end up generating too many subproblems to be effective. The purpose of branching in a BnB method is to reduce the size of the subproblems so that they can be either solved or pruned effectively with limited resources.

In our BnB formulation, the branching rule (splitting the rotamers of a residue) only brings about a modest reduction in the search space of each child subproblem compared to its parent subproblem. Furthermore, there is no net reduction in the total search space when one considers both children. A critical component of our approach is to reduce the size of the total search space, by eliminating ineligible conformations, before splitting. This is in the spirit of the dead-end elimination algorithm or “branch-and-terminate”²⁹ but employing additional elimination by our new lower bounds.

Solving subproblems

There are two well-known approaches to solving the GMEC problem exactly. One is DEE^12;21;24 and the other is integer linear programming (ILP)⁴⁸. Both of these methods are guaranteed to solve the GMEC problem given unbounded resources but have unpredictable running times as a function of the problem size.

DEE is an iterative method that eliminates a non-GMEC rotamer by comparing its energetics with those of other rotamers at the same position. The same rules are also applied to eliminate rotamer pairs. When a rotamer can be eliminated from consideration, this can be represented by reducing the set of rotamers at a residue position. Eliminated rotamer pairs, on the other hand, are tracked via “pair flags”, which indicate ineligible assignments for pairs of positions. When the numerical properties of the energy terms are favorable or when the problem size is relatively small, DEE successfully eliminates many non-GMEC rotamers or rotamer pairs so that the GMEC can be easily found from the remaining small conformational space. In general, we will need to perform a systematic search of the remaining conformational space; the A* heuristic search algorithm⁴⁶ is usually used for this purpose. However, DEE may fail to reduce the size of the conformational space to the point where it is practical to search for the GMEC using A*. This is what motivates our BnB approach.
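To make the idea of comparing a rotamer against others at the same position concrete, the sketch below applies Goldstein's singles criterion iteratively: rotamer r at position i is eliminated if some other rotamer t at i satisfies E(i_r) − E(i_t) + Σ_{j≠i} min_s [E(i_r, j_s) − E(i_t, j_s)] > 0. The data layout, function names, and toy energies are assumptions for illustration; pair elimination and the other DEE variants cited above are not shown.

```python
def pair_e(pair, i, r, j, s):
    """Look up E(i_r, j_s) from a table stored only for i < j."""
    return pair[(i, j)][r][s] if i < j else pair[(j, i)][s][r]

def goldstein_dee(single, pair, allowed):
    """Repeat Goldstein singles elimination until no rotamer can be removed."""
    allowed = [list(a) for a in allowed]        # work on a copy
    n = len(allowed)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(allowed[i]):
                for t in allowed[i]:
                    if t == r:
                        continue
                    # Best case for r relative to t over all surroundings.
                    gap = single[i][r] - single[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair_e(pair, i, r, j, s)
                                       - pair_e(pair, i, t, j, s)
                                       for s in allowed[j])
                    if gap > 0:                 # t always beats r: r is a dead end
                        allowed[i].remove(r)
                        changed = True
                        break
    return allowed

# Toy instance with made-up energies: 2 positions, 2 rotamers each.
single = [[0.0, 5.0], [0.0, 1.0]]
pair = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
remaining = goldstein_dee(single, pair, [[0, 1], [0, 1]])
```

In this toy instance, rotamer 1 at position 1 cannot be eliminated at first, but becomes eliminable once rotamer 1 at position 0 has been removed, which illustrates why the procedure is iterated.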

ILP is a popular approach to solving combinatorial optimization problems but we have found that direct application of general ILP solvers to protein design problems is generally impractical (see Appendix B). Furthermore, as we discuss below, DEE has the additional advantage of reducing the size of the conformational space at each subproblem, even when it fails to completely solve the subproblem. Therefore, we have used a DEE-based solver as our method for solving subproblems.

Bounding subproblems

In addition to completely solving subproblems, we also need a way of obtaining lower bounds to prune nodes more efficiently. The classical approach for obtaining bounds for a combinatorial optimization problem is via the relaxation to linear programming (LP) after formulating the problem as ILP. For example, we obtain LP by treating the integer-valued variables in the ILP formulation of the GMEC problem, i.e. (25) – (29) of Appendix B, as real. Although LP problems are solvable in polynomial time, it is still the case that the LP problems resulting from the relaxation of typical protein design problems are often too large and thus require impractical amounts of computing time and memory.

The less expensive lower-bounding method that we use in this work is the tree-reweighted max-product algorithm (TRMP)⁴⁷, which will be introduced later in this paper. TRMP lower bounds are known to be no better than the LP lower bounds, and there are no guarantees of how close to the LP bound a TRMP bound will be. However, its relatively low computational cost and good performance in practice make TRMP an excellent lower-bounding tool.

Another key advantage of TRMP is that, like DEE, it can be used to compute lower-bounds for parts of the conformational space efficiently and to eliminate them as discussed below.

On the other hand, the upper bounds are also obtained by TRMP for the subproblems that are not exactly solved. This is based on a heuristic use of TRMP, but often produces stronger upper bounds than random sampling of conformations. We present the details on upper-bounding by TRMP later in the paper.

Reducing subproblem size

As we mentioned above, a critical component of our BnB methodology is that we attempt to reduce the size of the search space for each subproblem by removing ineligible conformations. Smaller subproblems are easier to solve and to bound. We use two techniques to accomplish this: DEE discussed above and elimination by lower bounds. The latter is illustrated in Figure 3 and discussed below.

Figure 3. Elimination by rotamer lower bounds. The x-axis lists all rotamers of the subproblem in an arbitrary order. The vertical dotted lines indicate division of rotamers by the positions they belong to. Two types of y-values are plotted for each rotamer i_r: (1) the minimum energy that a conformation including i_r can have, (2) a lower bound of (1) obtained by a lower-bounding method. Three horizontal lines are also depicted, each representing (a) an upper bound U, (b) the optimal value of the subproblem, (c) a lower bound of (b) obtained from the same lower-bounding method. Rotamers that can be eliminated by comparison against U are indicated by filled triangles.

For each rotamer r at an arbitrary position i, we can think of an assignment of rotamers in other positions such that no other assignment can give a lower conformational energy when position i is fixed to r. We call the energy corresponding to such an assignment the minimum conformational energy of i_r. Similarly, we can define the minimum conformational energy for an arbitrary pair of rotamers (i_r, j_s) such that i ≠ j.

Suppose we know a lower-bound L(i_r) of the minimum conformational energy of i_r and a global upper-bound U such that L(i_r) > U. Then, rotamer i_r can be eliminated from the subproblem without affecting whether the subproblem is prunable or not. Similarly, if we have a lower bound of the minimum conformational energy of a rotamer pair greater than U, the rotamer pair can also be eliminated. Figure 4 illustrates the problem-size reduction by elimination of rotamers and rotamer pairs.
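The elimination rule can be illustrated with a small Python sketch. As an assumption made purely for illustration, the lower bound L(i_r) is taken to be the exact minimum conformational energy of i_r, computed by brute force; in BroMAP the bounds come from TRMP instead. The function names and toy energies are hypothetical.

```python
from itertools import product

def energy(conf, single, pair):
    """Eq. (1) without the constant template term."""
    n = len(conf)
    total = sum(single[i][conf[i]] for i in range(n))
    total += sum(pair[(i, j)][conf[i]][conf[j]]
                 for i in range(n - 1) for j in range(i + 1, n))
    return total

def min_conformational_energy(i, r, single, pair):
    """Lowest energy of any conformation with position i fixed to rotamer r."""
    choices = [range(len(rotamers)) for rotamers in single]
    choices[i] = [r]
    return min(energy(c, single, pair) for c in product(*choices))

def eliminate_by_lower_bounds(single, pair, U):
    """Keep only rotamers whose lower bound does not exceed U."""
    survivors = []
    for i, rotamers in enumerate(single):
        survivors.append([r for r in range(len(rotamers))
                          if min_conformational_energy(i, r, single, pair) <= U])
    return survivors

# Toy instance with made-up energies: 2 positions, 3 rotamers each.
single = [[0, 1, 4], [0, 2, 5]]
pair = {(0, 1): [[0, 1, 0], [1, 0, 0], [0, 0, 0]]}

U = energy((1, 0), single, pair)      # energy of a known, non-optimal conformation
survivors = eliminate_by_lower_bounds(single, pair, U)
```

Here U = 2, and every rotamer whose bound exceeds 2 is removed, leaving two rotamers at the first position and one at the second. The optimal conformation (0, 0) is retained because its energy does not exceed U.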

Figure 4. Reduction by elimination of rotamers and rotamer pairs. While elimination of rotamers brings explicit reduction of the problem size, elimination of rotamer pairs will be implicitly represented by pair flags. Rotamer eliminations in (c) were made consistent with bounds of Figure 3. (a) Original subproblem. (b) Rotamer-pair elimination. (c) Rotamer elimination.

The problem is obtaining useful lower bounds for each rotamer or rotamer pair. If we used LP relaxation, we would need to solve as many LP problems as there are rotamers or rotamer pairs, and each LP problem can still be very large. A more practical solution follows from the theoretical properties of TRMP, which allow us to obtain the lower bounds for all rotamers and rotamer pairs in one TRMP convergence plus post-processing time that is at most quadratic in the problem size. We will discuss how we can obtain these lower bounds using TRMP later in the paper.

When a rotamer pair is eliminated by a TRMP lower bound, we mark the rotamer pair with a pair flag, as done in DEE. However, such a pair flag is more general than the pair flags used in conventional DEE since the elimination is done relative to the current global upper-bound U. Thus, it is possible for TRMP to flag rotamer pairs belonging to the minimum energy conformation of the subproblem in case the optimal value of the subproblem is greater than U. When this happens, the optimal value of the subproblem after the elimination can be greater than before the elimination. However, if the optimal value is less than or equal to U, elimination by lower bounds is guaranteed to produce reduced subproblems with unchanged optimal value.

If enough pairs are eliminated by TRMP lower bounding, some positions may have no remaining valid assignments. In this situation, the whole subproblem is infeasible and can be pruned.

Conventional DEE never flags rotamer pairs that belong to the minimum energy conformation. Therefore, the interaction of DEE with these general pair flags should be carefully considered to avoid illegal elimination by DEE. In our work, this is done by numerically enforcing the pair flags, that is, by replacing the pair flags with very large (artificial) pairwise energies. This guarantees correct elimination by DEE conditions based on energy comparison (e.g. Goldstein’s conditions). Meanwhile, when logical elimination is attempted (e.g. logical singles-pairs elimination or unification), general pair flags are used as if they are conventional pair flags.
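A minimal sketch of numerically enforcing pair flags is shown below. The table layout, the function name, and the value of `big` are assumptions for illustration; the only requirement suggested by the text is that the artificial energy be large enough to dominate any energy comparison made by DEE.

```python
def enforce_pair_flags(pair, flags, big=1e6):
    """Return a copy of the pairwise table with flagged pairs set to `big`.

    pair  : pair[(i, j)][r][s] = E(i_r, j_s) for i < j
    flags : set of (i, r, j, s) tuples with i < j marking eliminated pairs
    """
    enforced = {key: [row[:] for row in table] for key, table in pair.items()}
    for i, r, j, s in flags:
        enforced[(i, j)][r][s] = big
    return enforced

# Toy table with made-up energies and one flagged rotamer pair.
pair = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
flags = {(0, 1, 1, 0)}   # rotamer 1 at position 0 paired with rotamer 0 at position 1
enforced = enforce_pair_flags(pair, flags)
```

Energy-based conditions then see the flagged pair as extremely unfavorable, while the original table is left untouched for later use.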

Note that we use elimination by lower bounds together with the modified DEE in each node of the search tree. In a previous work²⁹, lower bounds were used in the BnB framework to “terminate” singles, but DEE was used only as a preprocessing procedure before applying the BnB method. In another work²⁶, elimination by lower bounds was applied in conjunction with DEE to the whole problem, but no branching was used. The lower bounds used there were also computed differently, by fixing conformations for a subset of positions and finding minimum values over decomposed sets of positions.

Subproblem splitting and selection

Our strategy of subproblem selection is depth-first search (DFS), where one selects the deepest subproblem to expand, breaking ties by choosing the node with the smallest lower bound. The goal is to first find a good upper-bound by following DFS through the children with the lowest bounds, then to prune the remaining subproblems using that upper-bound. To implement this strategy, we need to split subproblems so that they have substantially different lower bounds.

As discussed above, we can compute inexpensive lower bounds for individual rotamers by TRMP. Therefore, we can split a subproblem by dividing rotamers of a selected position into two groups according to their rotamer lower bounds, so that the maximum rotamer lower bound of one group is less than or equal to the minimum rotamer lower bound of the other group. We call the child from the former group “the low child” and the other “the high child”. The low child is more likely to have an optimal value less than that of the high child. A splitting position is selected so that the difference between the maximum and minimum rotamer lower bounds is large. This splitting scheme will also tend to make the high child easier to prune than the low child.
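The splitting rule can be sketched as follows. Two details are assumptions of this example and are not specified in the text above: the splitting position is the one with the largest spread of rotamer lower bounds among positions with at least two rotamers, and the sorted rotamers are cut in half. The lower-bound values are made up.

```python
def split_by_lower_bounds(allowed, lower):
    """Split one position's rotamers into a low group and a high group.

    allowed : allowed[i] is the list of rotamers still eligible at position i
    lower   : lower[i][r] is a lower bound for rotamer r at position i
    Returns (position, low_child, high_child).
    """
    candidates = [i for i, a in enumerate(allowed) if len(a) > 1]

    def spread(i):
        bounds = [lower[i][r] for r in allowed[i]]
        return max(bounds) - min(bounds)

    i = max(candidates, key=spread)
    ordered = sorted(allowed[i], key=lambda r: lower[i][r])
    half = len(ordered) // 2
    low_child, high_child = list(allowed), list(allowed)
    low_child[i] = ordered[:half]      # every bound here <= every bound in the high group
    high_child[i] = ordered[half:]
    return i, low_child, high_child

# Made-up rotamer lower bounds for 2 positions.
allowed = [[0, 1, 2], [0, 1]]
lower = [[1.0, 1.2, 1.1], [0.5, 4.0]]
position, low_child, high_child = split_by_lower_bounds(allowed, lower)
```

In this example the second position is chosen because its bounds differ by 3.5, compared with 0.2 at the first position, so the two children have clearly different lower bounds.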

The leftmost diagram in Figure 1 illustrates our subproblem selection strategy. We can see that the tree first grows along the line of low-subproblems, and then the high-subproblems are traversed. We call the DFS along all low-branches until the first leaf node is reached “the first depth-first dive”. If the splitting is successful and non-optimal nodes are pruned effectively, the search tree should be highly skewed toward low-branches.

Bounding the GMEC energy through MAP estimation

In this section, we formulate the GMEC problem as a maximum-a-posteriori (MAP) estimation problem and introduce the MAP estimation method, particularly TRMP, as a lower-bounding tool for the GMEC energy.

Problem Formulation

Probabilistic inference problems⁴⁹, including the MAP estimation problem, involve a random vector x = (x₁, x₂, …, x_n) characterized by a probability distribution that maps a sample $x \in \mathcal{X}$ to a probability p(x), where $\mathcal{X}$ is the sample space for x. The MAP estimation problem asks for a MAP assignment x* such that $x^{*} \in \arg\max_{x \in \mathcal{X}} p(x)$. In the GMEC problem, we number the sequence positions by i = 1, …, n, and associate with each position i a discrete random variable x_i that ranges over R_i, a set of allowed rotamers at position i. Then, we can define a probability distribution p(x) over $\mathcal{X}$ = R₁ × … × R_n as

$$p(x) = \frac{1}{Z} \exp\{-e(x)\}, \tag{2}$$

for a normalization constant Z and $e(x) = \sum_{i = 1}^{n} e_{i}(x_{i}) + \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} e_{ij}(x_{i}, x_{j})$, where e_i(r) = E(i_r) for r ∈ R_i, and e_ij(r, s) = E(i_r, j_s) for (r, s) ∈ R_i × R_j. Therefore, the GMEC problem for minimizing e(x) is equivalent to the MAP estimation problem for p(x), that is, the assignment that maximizes the probability minimizes the energy. Note that the value of Z is conventionally determined so that $\sum_{x \in \mathcal{X}} p(x) = 1$. However, computing the exact value of Z that satisfies this condition is not necessary in finding the MAP assignment of p(x) because 1/Z simply scales the exponential function of (2). We will see later that our algorithm does not depend on the value of Z.
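The equivalence stated above can be spelled out in one line. Since Z > 0 is a constant and the exponential function is strictly increasing, maximizing p(x) is the same as maximizing −e(x):

```latex
\arg\max_{x} p(x)
  = \arg\max_{x} \frac{1}{Z} \exp\{-e(x)\}
  = \arg\max_{x} \{-e(x)\}
  = \arg\min_{x} e(x),
\qquad
-\log p(x) = e(x) + \log Z .
```

The second identity shows that energies and negative log-probabilities differ only by the constant log Z, which is why the value of Z never needs to be computed when searching for the MAP assignment.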

A probability distribution over a random vector can be related to a graphical model⁴⁹. An undirected graphical model $\mathcal{G} = (V, E)$ consists of a set of vertices V that represent random variables and a set of edges E connecting some pairs of vertices. The structure of a graphical model is determined by conditional independencies among the random variables. That is, a probability distribution p(x) can be represented by an undirected graphical model $\mathcal{G}$ if p(x) can be factorized into non-negative functions (called compatibility functions), each of which is defined over variables in a clique of $\mathcal{G}$. The typical motivation for using the graphical model is finding as simple a model as possible that captures conditional independencies among variables. However, we generally consider a complete graph with n vertices as the graphical model for the GMEC problem, that is, the protein design problems we are interested in have molecular interactions between every pair of positions.

In what follows, we will often describe distributions by their associated graphical model; for example, a “tree distribution” refers to a distribution represented by a tree graphical model.

Max-marginals and max-product algorithm

Wainwright et al.⁵⁰ define (singleton) max-marginals μ_i as the maximum of p(x) when one of the variables x_i is constrained to a specific value, i.e. $\mu_{i}(x_{i}) = \kappa_{i} \max_{\{x' \mid x'_{i} = x_{i}\}} p(x')$. Similarly, pairwise max-marginals μ_ij are defined as $\mu_{ij}(x_{i}, x_{j}) = \kappa_{ij} \max_{\{x' \mid x'_{i} = x_{i},\, x'_{j} = x_{j}\}} p(x')$, the maximum of p(x) when a pair of the variables are constrained to a specific pair of values. Note that κ_i and κ_ij are constants that can vary depending on i and j. In what follows, we will simply denote all the constants as κ. It is known that any tree distribution p(x) can be factorized in terms of its max-marginals as $p(x) \propto \prod_{i \in V} \mu_{i}(x_{i}) \prod_{(i, j) \in E} \frac{\mu_{ij}(x_{i}, x_{j})}{\mu_{i}(x_{i}) \mu_{j}(x_{j})}$ ⁴⁹. If we knew the max-marginals of a tree distribution p(x), we could easily compute the maximum value of p(x).

Example 1 (Max-marginals)⁵⁰

Let x ∈ {0, 1}³ be a random vector defined by a graphical model of Figure 5 and compatibility functions ψ such that

Figure 5. The diagram shows the graphical model and pairwise compatibility functions ψ₁₂(x₁, x₂) and ψ₂₃(x₂, x₃) of the distribution used in Example 3.

$$\psi_{i}(x_{i}) = 1 \quad \text{for all } x_{i} \in \{0, 1\} \text{ and } i \in \{1, 2, 3\}, \tag{3}$$

and

$$\psi_{ij}(x_{i}, x_{j}) = \begin{cases} 1 & \text{if } x_{i} = x_{j} \\ 4 & \text{otherwise} \end{cases} \quad \text{for all } (i, j) \in \{(1, 2), (2, 3)\}. \tag{4}$$

That is, $p(x) = \frac{1}{50} \psi_{1}(x_{1}) \psi_{2}(x_{2}) \psi_{3}(x_{3}) \psi_{12}(x_{1}, x_{2}) \psi_{23}(x_{2}, x_{3})$.

Then, it is easy to verify $\max_{\{x' \mid x'_{1} = x_{1}\}} p(x') = 4^{2}/50$ for all x₁ ∈ {0, 1}. Therefore, we can define max-marginals μ₁(x₁) = 1 for all x₁ ∈ {0, 1}, i.e. $\max_{\{x' \mid x'_{1} = x_{1}\}} p(x') = \frac{4^{2}}{50} \mu_{1}(x_{1})$ and $\kappa_{1} = \frac{50}{4^{2}}$. Since μ₂(x₂) and μ₃(x₃) can be defined similarly, we obtain μ_i(x_i) = 1 for all x_i ∈ {0, 1} and i ∈ {1, 2, 3}.

Likewise, we can verify that $\max_{\{x' \mid (x'_{1}, x'_{2}) = (x_{1}, x_{2})\}} p(x')$ is 4/50 if x₁ = x₂, and 4²/50 otherwise. Since we obtain the same result when maximizing under fixed (x₂, x₃) values, we can define μ_ij(x_i, x_j) as

$$\mu_{ij}(x_{i}, x_{j}) = \begin{cases} 1 & \text{if } x_{i} = x_{j} \\ 4 & \text{otherwise} \end{cases} \quad \text{for all } (i, j) \in \{(1, 2), (2, 3)\}, \tag{5}$$

i.e. $\max_{\{x' \mid (x'_{i}, x'_{j}) = (x_{i}, x_{j})\}} p(x') = \frac{4}{50} \mu_{ij}(x_{i}, x_{j})$ and $\kappa_{ij} = \frac{50}{4}$.

In this example, we observe that μ_i(x_i) = ψ_i(x_i) and μ_ij(x_i, x_j) = ψ_ij(x_i, x_j) for all i, j, and also μ_ij(x_i, x_j) = ψ_ij(x_i, x_j)ψ_i(x_i)ψ_j(x_j). This makes it easy to verify that p(x) is factorized by max-marginals:

$$p(x) = \frac{1}{50} \psi_{1}(x_{1}) \psi_{2}(x_{2}) \psi_{3}(x_{3}) \frac{\psi_{12}(x_{1}, x_{2}) \psi_{1}(x_{1}) \psi_{2}(x_{2})}{\psi_{1}(x_{1}) \psi_{2}(x_{2})} \frac{\psi_{23}(x_{2}, x_{3}) \psi_{2}(x_{2}) \psi_{3}(x_{3})}{\psi_{2}(x_{2}) \psi_{3}(x_{3})} \tag{6}$$

$$= \frac{1}{50} \mu_{1}(x_{1}) \mu_{2}(x_{2}) \mu_{3}(x_{3}) \frac{\mu_{12}(x_{1}, x_{2})}{\mu_{1}(x_{1}) \mu_{2}(x_{2})} \frac{\mu_{23}(x_{2}, x_{3})}{\mu_{2}(x_{2}) \mu_{3}(x_{3})}. \tag{7}$$

Now, assume that we are given p(x) and the max-marginals {μ_i, μ_ij}. We illustrate how max-marginals can be used to compute max_x p(x). We know $p (x) = \frac{1}{Y} μ_{1} (x_{1}) μ_{2} (x_{2}) μ_{3} (x_{3}) \frac{μ_{12} (x_{1}, x_{2})}{μ_{1} (x_{1}) μ_{2} (x_{2})} \frac{μ_{23} (x_{2}, x_{3})}{μ_{2} (x_{2}) μ_{3} (x_{3})}$ for some Y. The value of Y can be easily computed by comparing both sides of the equation for some specific assignment, e.g. (0, 0, 0). In this example, we obtain Y = 50 as shown in (7). Assuming x* is a MAP assignment, we have

max_{x} p (x) = p (x^{*}) = \frac{1}{50} μ_{1} (x_{1}^{*}) μ_{2} (x_{2}^{*}) μ_{3} (x_{3}^{*}) \frac{μ_{12} (x_{1}^{*}, x_{2}^{*})}{μ_{1} (x_{1}^{*}) μ_{2} (x_{2}^{*})} \frac{μ_{23} (x_{2}^{*}, x_{3}^{*})}{μ_{2} (x_{2}^{*}) μ_{3} (x_{3}^{*})} .

(8)

Since we know $x_{i}^{*}$ and ( $x_{i}^{*}, x_{j}^{*}$ ) should be maximizers of μ_i(x_i) and μ_ij(x_i, x_j), respectively, the maximum value of p(x) can be obtained simply by finding the maximum value of each μ_i(x_i) and μ_ij(x_i, x_j), without needing to find the actual assignment x*. Therefore, max_x p(x) = 4²/50.
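The arithmetic of Example 1 can be checked by brute force. The following minimal Python sketch assumes the potentials used in the example (ψ_i = 1, and ψ_ij = 1 if x_i = x_j and 4 otherwise, on the chain x₁, x₂, x₃). All function and variable names are illustrative and are not part of the original work.

```python
from fractions import Fraction
from itertools import product

EDGES = ((0, 1), (1, 2))                 # chain x1 - x2 - x3, 0-based positions
STATES = list(product((0, 1), repeat=3))

def psi_pair(a, b):
    # Pairwise potential assumed from Example 1.
    return 1 if a == b else 4

def p(x):
    # p(x) = (1/50) * psi_12 * psi_23; the singleton potentials are all 1.
    return Fraction(psi_pair(x[0], x[1]) * psi_pair(x[1], x[2]), 50)

def constrained_max(fixed):
    # Maximum of p(x) over assignments agreeing with `fixed` (position -> value).
    return max(p(x) for x in STATES if all(x[i] == v for i, v in fixed.items()))

# Max-marginals, scaled by kappa_1 = 50/16 and kappa_ij = 50/4 as in the text.
mu_single = {(i, a): constrained_max({i: a}) * Fraction(50, 16)
             for i in range(3) for a in (0, 1)}
mu_pair = {(i, j, a, b): constrained_max({i: a, j: b}) * Fraction(50, 4)
           for (i, j) in EDGES for a in (0, 1) for b in (0, 1)}

def factorized(x):
    # Right-hand side of (7).
    value = Fraction(1, 50)
    for i in range(3):
        value *= mu_single[(i, x[i])]
    for (i, j) in EDGES:
        value *= mu_pair[(i, j, x[i], x[j])] / (
            mu_single[(i, x[i])] * mu_single[(j, x[j])])
    return value

# Maximum of p(x) assembled from the maxima of the individual factors, as in (8).
best_single = [max(mu_single[(i, a)] for a in (0, 1)) for i in range(3)]
best_pair = {e: max(mu_pair[(e[0], e[1], a, b)] for a in (0, 1) for b in (0, 1))
             for e in EDGES}
map_value = Fraction(1, 50)
for s in best_single:
    map_value *= s
for (i, j) in EDGES:
    map_value *= best_pair[(i, j)] / (best_single[i] * best_single[j])
```

Under these assumptions, `factorized(x)` agrees with `p(x)` for every assignment, and `map_value` equals the brute-force maximum of p(x).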

Max-marginals are also useful in finding a MAP assignment for a tree distribution⁵⁰. We can easily determine a MAP assignment value for the root node of the tree by finding a value that maximizes the singleton max-marginals of the root. Then, the MAP assignment is determined for the rest of the nodes in the order of tree traversal from the root to leaves; for each pair of parent and child nodes and a given assignment for the parent node, the child node assignment is a value that maximizes the corresponding pairwise max-marginals.
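The root-to-leaves procedure described above can be sketched as follows. This is a minimal illustration, assuming the max-marginals are supplied as plain Python callables and the tree as a parent-to-children dictionary. The names `decode_map`, `mu_single`, and `mu_pair` are hypothetical, and the example data are the max-marginals of Example 1 with x₁ taken as the root.

```python
def decode_map(root, children, states, mu_single, mu_pair):
    # Root: pick any maximizer of its singleton max-marginal.
    assignment = {root: max(states, key=lambda a: mu_single(root, a))}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, ()):
            fixed = assignment[parent]
            # Child: maximize the pairwise max-marginal given the parent's value.
            assignment[child] = max(
                states, key=lambda b: mu_pair(parent, fixed, child, b))
            stack.append(child)
    return assignment

# Max-marginals of Example 1 (chain x1 - x2 - x3, rooted at x1).
children = {1: [2], 2: [3]}
mu_single = lambda i, a: 1
mu_pair = lambda i, a, j, b: 1 if a == b else 4

x_map = decode_map(1, children, (0, 1), mu_single, mu_pair)
```

Ties are broken here by taking the first maximizer in `states`; any other tie-breaking rule would also return a MAP assignment on a tree.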

For a distribution over a non-tree (cyclic) graphical model, knowing the exact max-marginals does not necessarily imply that a MAP assignment or the maximum value of p(x) can be easily found. There are special cases that allow efficient computation of MAP assignments for cyclic distributions using max-marginals. For example, when each singleton max-marginals factor has a unique maximizer, the assignment consisting of these maximizers is the unique MAP assignment. More generally, an assignment that maximizes every max-marginals factor of the distribution is a MAP assignment⁴⁷. Such an assignment can be found more efficiently by restricting the search to a subgraph derived from singleton factors that have multiple maximizers³⁸. However, this search is still very large when there are many maximizers of each singleton max-marginals factor and the subgraph is densely connected.

The ordinary max-product (also known as max-plus or min-sum) algorithm⁴⁹ is an iterative algorithm that estimates a MAP assignment by propagating a series of messages along the edges of the graphical model. The algorithm exactly computes a MAP assignment for tree distributions, but it does not guarantee finding one for cyclic distributions. It is known that the ordinary max-product algorithm applied to a tree distribution can be interpreted as computing max-marginals exactly and efficiently⁵⁰. For general cyclic distributions, there is no known method that efficiently computes max-marginals; it can be as expensive as the original MAP estimation problem.
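On a chain, which is the simplest tree, max-product message passing reduces to one forward and one backward sweep. The sketch below is a minimal illustration of that special case only, not the general algorithm on arbitrary trees. The function name and the list-based data layout are assumptions, and the example potentials are those of Example 1 without the 1/50 normalization.

```python
def chain_max_marginals(unary, pairwise):
    # unary[i][a]: singleton potential of node i in state a.
    # pairwise[i][a][b]: potential between node i (state a) and node i+1 (state b).
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]   # messages arriving from the left
    bwd = [[1.0] * k for _ in range(n)]   # messages arriving from the right
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = max(fwd[i - 1][a] * unary[i - 1][a] * pairwise[i - 1][a][b]
                            for a in range(k))
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = max(pairwise[i][a][b] * unary[i + 1][b] * bwd[i + 1][b]
                            for b in range(k))
    # Unnormalized singleton max-marginals: product of both messages and the unary.
    return [[fwd[i][a] * unary[i][a] * bwd[i][a] for a in range(k)]
            for i in range(n)]

unary = [[1.0, 1.0]] * 3
pairwise = [[[1.0, 4.0], [4.0, 1.0]]] * 2
marginals = chain_max_marginals(unary, pairwise)
```

For these potentials every entry of `marginals` is 16, which matches the unnormalized maximum 4² found in Example 1 for each fixed value of each variable.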

Pseudo-max-marginals

Instead of attempting to compute max-marginals, Wainwright et al.⁴⁷ use the notion of pseudo-max-marginals in their tree-reweighted max-product (message-passing) algorithm. Pseudo-max-marginals are defined so that they become max-marginals for each tree distribution used in the algorithm, and the original distribution is represented as a convex combination of these tree distributions.

The basic idea of the tree-reweighted max-product algorithm is to express a cyclic distribution as a convex combination of distributions over a set of spanning trees. This convex combination of tree distributions is used to upper bound the MAP probability, that is, to lower bound the energy. It can be shown that the upper bound is tight if and only if every tree distribution shares a common MAP configuration, i.e. tree agreement⁴⁷. The tree-reweighted max-product algorithm tries to induce this tree agreement by factorizing each tree distribution with factors called pseudo-max-marginals and having pseudo-max-marginals converge to the max-marginals of each tree distribution.

Let us assume we use the tree-reweighted max-product algorithm with 𝒯, a set of spanning trees of the graph, and some non-negative constant ρ(T) for each T ∈ 𝒯 such that $\sum_{T \in \mathcal{T}} ρ(T) = 1$. The tree-reweighted max-product algorithm requires that every vertex and edge of the graph be covered by 𝒯, i.e. each vertex and edge of the graph is in some tree T in 𝒯 such that ρ(T) > 0. Then, by construction, pseudo-max-marginals ν = {ν_i, ν_ij} from the tree-reweighted max-product algorithm satisfy “ρ-reparameterization”, which is described as:

p (x) \propto \prod_{T \in \mathcal{T}} {[\prod_{i \in V (T)} ν_{i} (x_{i}) \prod_{(i, j) \in E (T)} \frac{ν_{i j} (x_{i}, x_{j})}{ν_{i} (x_{i}) ν_{j} (x_{j})}]}^{ρ (T)} = \prod_{i \in V} ν_{i} {(x_{i})}^{ρ_{i}} \prod_{(i, j) \in E} {[\frac{ν_{i j} (x_{i}, x_{j})}{ν_{i} (x_{i}) ν_{j} (x_{j})}]}^{ρ_{i j}},

(9)

where ρ_ij is an edge coefficient such that $ρ_{i j} = \sum_{T \in \mathcal{T} : (i, j) \in E (T)} ρ (T)$, defined for all (i, j) ∈ E, and ρ_i is a vertex coefficient such that $ρ_{i} = \sum_{T \in \mathcal{T} : i \in V (T)} ρ (T)$, defined for all i ∈ V. Note that, if 𝒯 is a set of spanning trees, then ρ_i is 1 for all i ∈ V.
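The edge and vertex coefficients are simple sums over the trees that contain a given edge or vertex. The following sketch assumes each tree is given as a list of edges (so single-vertex trees are not handled) and uses exact fractions. The function name is hypothetical, and the example uses three two-edge spanning trees of a triangle, each with weight 1/3.

```python
from collections import defaultdict
from fractions import Fraction

def appearance_coefficients(trees, rho):
    # trees: list of edge lists; rho: weight of each tree (should sum to 1).
    rho_vertex = defaultdict(Fraction)
    rho_edge = defaultdict(Fraction)
    for edges, weight in zip(trees, rho):
        vertices = {v for edge in edges for v in edge}
        for v in vertices:
            rho_vertex[v] += weight              # rho_i: trees containing vertex i
        for (i, j) in edges:
            rho_edge[frozenset((i, j))] += weight  # rho_ij: trees containing edge
    return dict(rho_vertex), dict(rho_edge)

trees = [[(1, 2), (2, 3)], [(2, 3), (3, 1)], [(3, 1), (1, 2)]]
rho = [Fraction(1, 3)] * 3
rho_vertex, rho_edge = appearance_coefficients(trees, rho)
```

Because every tree in this example is spanning, each vertex coefficient is 1, while each edge appears in two of the three trees and therefore has coefficient 2/3.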

A tree distribution p^T(x; ν) for some T ∈ 𝒯 and given pseudo-max-marginals can be defined as

p^{T} (x; ν) = \prod_{i \in V (T)} ν_{i} (x_{i}) \prod_{(i, j) \in E (T)} \frac{ν_{i j} (x_{i}, x_{j})}{ν_{i} (x_{i}) ν_{j} (x_{j})} .

(10)

Then, we have $p (x) \propto \prod_{T \in \mathcal{T}} {\{p^{T} (x; ν)\}}^{ρ (T)}$ from (9). The pseudo-max-marginals ν* at convergence of the tree-reweighted max-product algorithm satisfy the “tree-consistency condition” with respect to every tree T ∈ 𝒯. That is, the pseudo-max-marginals converge to the max-marginals of each tree distribution.

Example 2 (Pseudo-max-marginals)⁴⁷

Let x ∈ {0, 1}³ be a random vector on the graphical model illustrated in Figure 6(a). Let $p (x) = \frac{1}{98} ψ_{1} (x_{1}) ψ_{2} (x_{2}) ψ_{3} (x_{3}) ψ_{12} (x_{1}, x_{2}) ψ_{23} (x_{2}, x_{3}) ψ_{31} (x_{3}, x_{1})$ , where ψ_i(x_i) and ψ_ij(x_i, x_j) are defined as in Example 1. We define pseudo-max-marginals ν̂ as follows:

{\hat{ν}}_{i} (x_{i}) = 1, for all x_{i} \in {0, 1} and i \in {1, 2, 3},

(11)

{\hat{ν}}_{i j} (x_{i}, x_{j}) = \begin{cases} 1 & \text{if } x_{i} = x_{j} \\ 8 & \text{otherwise} \end{cases} \quad \text{for all } (i, j) \in \{(1, 2), (2, 3), (3, 1)\} .

(12)

Figure 6. Illustration of pseudo-max-marginals and ρ-reparameterization. (a) Original distribution p(x). (b)–(d) Pseudo-max-marginals on each tree used by the convex combination: (b) p¹(x; ν̂) with ρ¹ = 1/3, (c) p²(x; ν̂) with ρ² = 1/3, (d) p³(x; ν̂) with ρ³ = 1/3.

Figure 6(b) – (d) illustrates the trees used for the convex combination and pseudo-max-marginals on each tree. It can be easily verified that pseudo-max-marginals on each tree are in fact max-marginals. Thus, the pseudo-max-marginals are tree-consistent. The distribution for each tree is given by (10). For example, the distribution for Figure 6(b) is

p^{1} (x; \hat{ν}) = {\hat{ν}}_{1} (x_{1}) {\hat{ν}}_{2} (x_{2}) {\hat{ν}}_{3} (x_{3}) \frac{{\hat{ν}}_{12} (x_{1}, x_{2})}{{\hat{ν}}_{1} (x_{1}) {\hat{ν}}_{2} (x_{2})} \frac{{\hat{ν}}_{23} (x_{2}, x_{3})}{{\hat{ν}}_{2} (x_{2}) {\hat{ν}}_{3} (x_{3})} .

(13)

Then, by letting ρ(T) = 1/3 for all three trees, we obtain

\frac{1}{98} p^{1} {(x; \hat{ν})}^{1 / 3} p^{2} {(x; \hat{ν})}^{1 / 3} p^{3} {(x; \hat{ν})}^{1 / 3}

(14)

= \frac{1}{98} ψ_{1} (x_{1}) ψ_{2} (x_{2}) ψ_{3} (x_{3}) ψ_{12} (x_{1}, x_{2}) ψ_{23} (x_{2}, x_{3}) ψ_{31} (x_{3}, x_{1}) = p (x),

(15)

from $ψ_{i} (x_{i}) = {\hat{ν}}_{i} {(x_{i})}^{- 1 / 3}$ and $ψ_{i j} (x_{i}, x_{j}) = {\hat{ν}}_{i j} {(x_{i}, x_{j})}^{2 / 3}$. This verifies that the pseudo-max-marginals satisfy ρ-reparameterization as well.
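Both claims of Example 2, ρ-reparameterization and tree-consistency, can be checked numerically. The sketch below assumes the potentials of Example 1 on the triangle and the pseudo-max-marginals of (11) and (12). Positions are 0-based, and each tree is assumed to be the two-edge chain obtained by dropping one edge of the triangle. All names are illustrative.

```python
import math
from itertools import product

EDGES = ((0, 1), (1, 2), (2, 0))
TREES = (((0, 1), (1, 2)), ((1, 2), (2, 0)), ((2, 0), (0, 1)))
RHO = (1 / 3, 1 / 3, 1 / 3)
STATES = list(product((0, 1), repeat=3))

def psi_pair(a, b):
    return 1.0 if a == b else 4.0

def nu_hat_single(a):
    return 1.0                      # equation (11)

def nu_hat_pair(a, b):
    return 1.0 if a == b else 8.0   # equation (12)

def p(x):
    # Original cyclic distribution; singleton potentials are all 1.
    return math.prod(psi_pair(x[i], x[j]) for i, j in EDGES) / 98.0

def tree_distribution(x, tree):
    # Equation (10) for one tree.
    value = math.prod(nu_hat_single(a) for a in x)
    for i, j in tree:
        value *= nu_hat_pair(x[i], x[j]) / (nu_hat_single(x[i]) * nu_hat_single(x[j]))
    return value

def reparameterized(x):
    # Left-hand side of (14): weighted product of the tree distributions.
    return math.prod(tree_distribution(x, t) ** r for t, r in zip(TREES, RHO)) / 98.0

def tree_max_marginal(tree, i, a):
    # Exact singleton max-marginal of a tree distribution, by enumeration.
    return max(tree_distribution(x, tree) for x in STATES if x[i] == a)
```

With these definitions, `reparameterized(x)` matches `p(x)` up to floating-point error for all eight assignments, and each tree's singleton max-marginals are constant in the state, in proportion to ν̂_i = 1.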

TRMP

Algorithm 1 in Appendix A describes “edge-based reparameterization updates”⁴⁷, defining 𝒯 as a set of (not necessarily spanning) trees in the graph, as used by Kolmogorov⁵¹. In what follows, we call this algorithm TRMP for short. Note that, although we define 𝒯 as a set of general trees covering all vertices and edges of the graph, it can be easily verified that all the analyses done by Wainwright et al.⁴⁷ apply to TRMP in exactly the same way, showing that TRMP has the same properties as the original edge-based reparameterization updates.

TRMP can sometimes guarantee the optimality of an assignment found at convergence for cyclic distributions. Even if TRMP does not find the exact MAP assignment, we can easily compute the exact maximum value for each tree distribution at TRMP convergence since pseudo-max-marginals converge to max-marginals for each tree distribution. Then, we can combine these to get an upper bound for the original, cyclic distribution (thereby obtaining a lower bound on the energy).

We are free to choose any set of trees 𝒯 and ρ(·) as long as each vertex and edge is covered by some T ∈ 𝒯 with ρ(T) > 0. In this work, we consistently use a set of maximal stars 𝒮 in place of 𝒯 for convenience of implementation and simplicity in computing rotamer-pair lower bounds. A star is a tree where at most one vertex is not a leaf. We denote the center of star S as γ(S). A maximal star is a star that is not a subset of another star. Figure 7 illustrates covering a graph by a set of maximal stars; all vertices and edges of graph (a) are covered by 𝒮 consisting of three maximal stars. In general, covering dense graphs such as complete graphs requires O(n) maximal stars. As explained in Lemma 3, computing a rotamer-pair lower bound involves solving a constrained maximization problem for each tree distribution. Therefore, using 𝒮 allows us to address only O(n) maximization problems in computing a rotamer-pair lower bound. In addition, due to the structure of a star, maximization of each tree (i.e. star) distribution can be simplified to one of the four cases of (22).

Figure 7. Example of covering a graph by maximal stars: the graph of (a) is completely covered by the stars S¹, S², and S³ shown in (b), (c), and (d), respectively.
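One simple way to build a star cover is a greedy pass over the vertices. The sketch below is only an illustration of the idea and is not necessarily the construction used in this work. It assumes a simple undirected graph without isolated vertices, and the function name is hypothetical. Each chosen star consists of a center together with all of its neighbors, which is a maximal star whenever the center has at least two neighbors.

```python
def greedy_star_cover(edges):
    # Build adjacency sets for a simple undirected graph.
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    uncovered = {frozenset(edge) for edge in edges}
    stars = []
    while uncovered:
        # Pick the vertex whose full-neighborhood star covers the most uncovered edges.
        center = max(
            sorted(neighbors),
            key=lambda v: sum(frozenset((v, u)) in uncovered for u in neighbors[v]))
        stars.append((center, sorted(neighbors[center])))
        uncovered -= {frozenset((center, u)) for u in neighbors[center]}
    return stars

# A path 1 - 2 - 3 - 4 is covered by two stars, centered at 2 and at 3.
path_stars = greedy_star_cover([(1, 2), (2, 3), (3, 4)])
```

Every iteration covers at least one new edge, so the loop uses at most one star per vertex, in line with the O(n) count mentioned above.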

Following the terminology of Kolmogorov⁵¹, we say ν is in a normal form if it satisfies max_{r∈R_i} ν_i(r) = 1 for all i ∈ V, and max_{(r,s)∈R_i×R_j} ν_ij(r, s) = 1 for all (i, j) ∈ E. In what follows, we assume ν of Algorithm 1 is always in a normal form. Then, from (2) and (9), and by introducing a positive constant ν_c, we obtain the following equation:

exp {- e (x)} = ν_{c} \prod_{S \in \mathcal{S}} {p^{S} (x; ν)}^{ρ (S)} .

(16)

Table I.

Test case facts. Each column represents (1) No.: case number, (2) Model: model system, (3) Region: protein regions being considered, (4) AA: types of amino acids offered for design positions, (5) Lib: type of rotamer library used, (6) n: number of positions, (7) n_D: number of design positions, (8) w: number of mobile water molecules considered, (9) Σ|R_i|: total number of rotamers, (10) Pairs: total number of rotamer pairs, (11) logconf: $\sum_{i = 1}^{n} \log |R_{i}|$ , (12) Solved by: methods that solved the case (“Limited DEE” implies the case was solved by both BroMAP and DEE/A*, but only DEE-gp was necessary for BroMAP. “Bro” and “DEE” abbreviate BroMAP and DEE/A*, respectively).

No.	Model	Region	AA	Lib	n	n_D	w	Σ\|R_i\|	Pairs	logconf	Solved by
1	fn3	core	HP	REG	14	14	0	743	2.5E5	50.2	Limited DEE
2	fn3	core++	HP	REG	20	20	0	1,778	1.5E6	83.7	Bro & DEE
3	fn3	core++	HP	REG	23	23	0	1,894	1.7E6	94.1	Bro & DEE
4	fn3	core++	HP	REG	25	25	0	2,048	2.0E6	102.9	Bro & DEE
5	fn3	core++	HP	REG	27	27	0	2,083	2.1E6	108.6	Bro & DEE
6	fn3	core	HP	EXP	14	14	0	8,774	3.5E7	82.4	Limited DEE

7	D44.1	int	A	REG	7	4	0	476	8.5E4	21.6	Limited DEE
8	D44.1	int	A	REG	7	7	0	822	2.8E5	28.7	Limited DEE
9	D44.1	int	A	REG	8	8	0	965	4.0E5	33.4	Bro & DEE
10	D44.1	int	A	REG	9	9	0	1,019	4.5E5	37.1	Bro & DEE
11	D44.1	int	A	REG	10	10	0	1,133	5.6E5	40.6	Bro & DEE
12	D44.1	int	A	REG	11	11	0	1,376	8.4E5	46.4	Bro
13	D44.1	int	A	REG	16	14	2	2,020	1.9E6	70.1	None
14	D44.1	int	A	EXP	7	4	0	5,026	9.5E6	36.4	Limited DEE
15	D44.1	int	A	EXP	7	5	0	7,019	1.9E7	39.9	Bro & DEE
16	D44.1	int	A	EXP	7	6	0	7,910	2.6E7	42.9	Bro

17	D44.1	int	A	EXP	7	7	0	8,771	3.2E7	42.9	Bro
18	D1.3	int	A	REG	6	4	2	450	8.3E4	21.7	Limited DEE
19	D1.3	int	A	REG	11	8	3	767	2.6E5	38.5	Limited DEE
20	D1.3	int	A	REG	23	7	9	1,618	1.2E6	78.8	Limited DEE
21	D1.3	int	A	EXP	6	4	2	3,599	4.8E6	28.7	Limited DEE
22	D1.3	int	A	EXP	7	5	2	3,616	4.8E6	28.7	Limited DEE
23	D1.3	int	A	EXP	8	6	2	4,070	6.3E6	34.4	Bro & DEE
24	D1.3	int	A	EXP	11	4	3	4,612	8.0E6	42.6	Bro & DEE
25	D1.3	int	A	EXP	11	6	3	4,987	9.7E6	45.1	Bro & DEE
26	D1.3	int	A	EXP	11	7	3	5,461	1.2E7	47.4	Bro & DEE
27	D1.3	int	A	EXP	11	7	3	5,891	1.4E7	50.5	Bro
28	D1.3	int	A	EXP	11	8	3	6,365	1.7E7	52.8	Bro
29	D1.3	core	H	REG	16	16	0	342	5.4E4	44.1	Limited DEE
30	D1.3	core	H	REG	20	20	0	430	8.6E4	54.6	Limited DEE
31	D1.3	core	H	REG	26	26	0	503	1.2E5	66.7	Limited DEE
32	D1.3	core	H	REG	34	34	0	567	1.5E5	81.4	Limited DEE
33	D1.3	core	HP	REG	16	16	0	980	4.4E5	59.5	Bro & DEE
34	D1.3	core	HP	REG	20	20	0	1,228	7.1E5	74.1	Bro & DEE
35	D1.3	core	HP	REG	26	26	0	1,431	9.7E5	92.3	Bro & DEE
36	D1.3	core	HP	REG	34	34	0	1,582	1.2E6	112.7	Bro & DEE
37	D1.3	core	H	EXP	13	13	0	1,844	1.5E6	56.3	Bro & DEE
38	D1.3	core	H	EXP	16	16	0	2,734	3.5E6	75.7	Bro
39	D1.3	core	H	EXP	20	20	0	3,370	5.3E6	91.8	Bro
40	D1.3	core	H	EXP	26	26	0	3,894	7.1E6	111.6	Bro
41	D1.3	core	H	EXP	34	34	0	4,444	9.4E6	142.0	Bro

42	epo	int	A	REG	5	5	0	466	7.1E4	16.6	Limited DEE
43	epo	int	A	REG	6	6	0	419	6.8E4	17.0	Limited DEE
44	epo	int	A	REG	11	11	0	1,005	4.4E5	39.4	Bro & DEE
45	epo	int	A	REG	21	11	3	1,503	1.0E6	67.5	Bro & DEE
46	epo	int	A	REG	21	15	3	1,999	1.9E6	79.6	Bro
47	epo	int	A	REG	21	18	3	2,138	2.1E6	87.5	None
48	epo	int	A	EXP	5	5	0	5,001	8.4E6	26.5	Limited DEE
49	epo	int	A	EXP	6	6	0	4,170	6.8E6	26.3	Bro & DEE
50	epo	int	A	EXP	8	8	0	7,544	2.3E7	46.4	Bro & DEE
51	epo	int	A	EXP	9	9	0	8,724	3.2E7	53.4	Bro & DEE
52	epo	core	H	REG	17	17	0	291	3.9E4	43.5	Limited DEE
53	epo	core	H	REG	22	22	0	395	7.4E4	58.1	Limited DEE
54	epo	core	H	REG	28	28	0	433	8.9E4	65.4	Limited DEE
55	epo	core	H	REG	33	33	0	573	1.6E5	82.7	Limited DEE
56	epo	core	H	REG	41	41	0	727	2.6E5	103.3	Limited DEE
57	epo	core	HP	REG	17	17	0	827	3.2E5	60.1	Bro & DEE
58	epo	core	HP	REG	22	22	0	1,103	5.8E5	79.9	Bro & DEE
59	epo	core	HP	REG	28	28	0	1,208	7.0E5	92.6	Bro & DEE
60	epo	core	HP	REG	33	33	0	1,615	1.3E6	115.9	Bro & DEE
61	epo	core	HP	REG	36	36	0	1,827	1.6E6	128.4	Bro
62	epo	core	HP	REG	38	38	0	1,956	1.9E6	136.6	Bro
63	epo	core	HP	REG	41	41	0	1,999	1.9E6	143.1	Bro
64	epo	core	H	EXP	17	17	0	2,307	2.4E6	73.5	Limited DEE
65	epo	core	H	EXP	22	22	0	3,006	4.2E6	99.0	Bro & DEE
66	epo	core	H	EXP	28	28	0	3,213	4.8E6	111.1	Bro & DEE
67	epo	core	H	EXP	33	33	0	4,322	8.9E6	140.0	Bro
68	epo	core	H	EXP	41	41	0	5,712	1.6E7	175.0	None


The value of ν_c can be computed by comparing both sides of (16) for any assignment x in the conformational space. Equivalently, p(x) can be expressed as follows:

p (x) = \frac{ν_{c}}{Z} \prod_{S \in \mathcal{S}} {p^{S} (x; ν)}^{ρ (S)} .

(17)

Bounding the GMEC energy with TRMP

We also make heuristic use of TRMP to obtain upper bounds for the GMEC energy. At convergence of TRMP, we occasionally find an exact MAP configuration. TRMP provides an easy evaluation condition called the optimum specification (OS) criterion, such that an assignment is guaranteed to be a MAP configuration if it satisfies the OS criterion. However, such an assignment may not exist for a given reparameterization, or it could be computationally expensive to find. Therefore, in our upper bounding, instead of trying to find an assignment that satisfies the OS criterion, we simply find an assignment that maximizes the tree distribution for some star S ∈ 𝒮 at TRMP convergence, using dynamic programming⁵⁰. Another possible upper-bounding method is to randomly pick a maximizer for each singleton max-marginal at TRMP convergence regardless of the trees. Although neither of these procedures guarantees the quality of the upper bounds, the resulting upper bounds are empirically close to the optimal values. The procedures can be repeated for different trees or different random selections of maximizers to improve the upper bounds.

A lower bound for the GMEC energy min_x e(x) can be easily obtained at the convergence of TRMP with the following lemma:

Lemma 1

When ν and ν_c of (17) in a normal form satisfy the tree-consistency condition, the MAP probability is upper bounded by

max_{x} p (x) \leq \frac{ν_{c}}{Z} .

(18)

Therefore, from (2), the GMEC energy is lower bounded as min_x e(x) ≥ − ln ν_c. This lower bound of the GMEC energy is independent of the normalization constant Z because, in (16), the product $\prod_{S \in \mathcal{S}} {p^{S} (x; ν)}^{ρ (S)}$ depends purely on the normalized pseudo-max-marginals, which are generated without any reference to Z. Note that Lemma 1 holds not only for star covers but also for general tree covers.

Example 3

To upper bound max_x p(x) using Lemma 1 and the pseudo-max-marginals given in Example 2, we first need to normalize the pairwise pseudo-max-marginals. Since the maximum value of ν̂_ij(x_i, x_j) is 8 for all (i, j), the normalized pairwise pseudo-max-marginals are as follows:

ν_{i j} (x_{i}, x_{j}) = \begin{cases} 1 / 8 & \text{if } x_{i} = x_{j} \\ 1 & \text{otherwise} \end{cases} \quad \text{for all } (i, j) \in \{(1, 2), (2, 3), (3, 1)\} .

(19)

Singleton pseudo-max-marginals are already in a normal form. Given the normalized pseudo-max-marginals and p(x), we can compute ν_c/Z = 64/98 from (17). Note that ν_c/Z is computed instead of ν_c in this example because we are given p(x), but not e(x). Then, by Lemma 1, the upper bound of the MAP probability is 64/98. It is easy to see that max_x p(x) is equal to 16/98, attained by any of (x₁, x₂, x₃) = (0, 0, 1), (0, 1, 0), etc. The upper bound of the MAP probability (and thereby the resulting lower bound of the GMEC energy) is not tight in this example, but the bounds from Lemma 1 can be stronger depending on the pseudo-max-marginals from TRMP. In this example, on the other hand, a tight lower bound of max_x p(x) (and therefore a tight upper bound of the GMEC energy) is easily obtained by finding a MAP assignment for any of the trees in 𝒯. For instance, (x₁, x₂, x₃) = (0, 1, 0) is a MAP assignment for tree distribution p¹(x; ν), and also for p(x).
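The numbers in Example 3 can be reproduced with a few lines of Python. The sketch assumes the triangle potentials of Example 2, the normalized pseudo-max-marginals of (19), and three two-edge trees with weight 1/3 each. Positions are 0-based and all names are illustrative.

```python
import math
from itertools import product

EDGES = ((0, 1), (1, 2), (2, 0))
TREES = (((0, 1), (1, 2)), ((1, 2), (2, 0)), ((2, 0), (0, 1)))
RHO = (1 / 3, 1 / 3, 1 / 3)
STATES = list(product((0, 1), repeat=3))

def psi_pair(a, b):
    return 1.0 if a == b else 4.0

def nu_pair(a, b):
    return 0.125 if a == b else 1.0   # normalized pairwise values from (19)

def p(x):
    return math.prod(psi_pair(x[i], x[j]) for i, j in EDGES) / 98.0

def tree_distribution(x, tree):
    # Singleton pseudo-max-marginals are all 1, so only the edges contribute.
    return math.prod(nu_pair(x[i], x[j]) for i, j in tree)

def weighted_tree_product(x):
    return math.prod(tree_distribution(x, t) ** r for t, r in zip(TREES, RHO))

# nu_c / Z from (17), evaluated at one arbitrary assignment.
x0 = (0, 0, 0)
bound = p(x0) / weighted_tree_product(x0)

true_max = max(p(x) for x in STATES)                            # exact MAP probability
tree_map = max(STATES, key=lambda x: tree_distribution(x, TREES[0]))
```

Here `bound` evaluates to 64/98, `true_max` to 16/98, and the maximizer of the first tree distribution already attains `true_max`, as stated in the example.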

Elimination by TRMP lower bounds

We can exploit the tree-consistency of ν at TRMP convergence in computing various lower bounds for a set of conformations. If a lower bound greater than a global upper bound U is obtained, we can eliminate the corresponding conformations from the subproblem while conserving the inequality relation between the minimum energy of the subproblem and U. We make a more precise argument for what we call rotamer-pair elimination and rotamer elimination as follows. Let P̃ be the set of flagged rotamer pairs in the subproblem of interest. Then, given the conformational space, we consider the set of all legal conformations under P̃, i.e. those containing no flagged rotamer pairs.

1. rotamer-pair elimination: suppose we have a lower bound LB(ζ_r, η_s) of the minimum conformational energy for {x | (x_ζ, x_η) = (r, s)}, the set of all conformations including (ζ_r, η_s), such that $\min_{\{x \mid (x_{ζ}, x_{η}) = (r, s)\}} e (x) \geq LB (ζ_{r}, η_{s}) > U$. Elimination of (ζ_r, η_s) can be represented by the set of pair flags P̃′ = P̃ ∪ {(ζ_r, η_s)}. We know that the minimum of e(x) over the legal conformations under P̃′ is prunable if and only if the minimum of e(x) over the legal conformations under P̃ is prunable. Therefore, we use P̃′ as the updated set of pair flags.

2. rotamer elimination: suppose we have a lower bound LB(ζ_r) of the minimum conformational energy for {x | x_ζ = r}, the set of all conformations including ζ_r, such that $\min_{\{x \mid x_{ζ} = r\}} e (x) \geq LB (ζ_{r}) > U$. Elimination of ζ_r can be represented by the set of pair flags P̃′ = P̃ ∪ {(ζ_r, j_s) | s ∈ R_j, j ∈ V, j ≠ ζ}, which includes all rotamer pairs stemming from ζ_r. Again, we know that the minimum of e(x) over the legal conformations under P̃′ is prunable if and only if the minimum of e(x) over the legal conformations under P̃ is prunable. Therefore, we use P̃′ as the updated set of pair flags. In both cases, the minimum of e(x) over the legal conformations is unchanged by the update if it is at most U.

The lower bounds LB(ζ_r) and LB(ζ_r, η_s) can be obtained, for example, by directly solving an LP relaxation of the ILP given in Appendix B. However, solving the LP may not be practical when the problem size is large. In addition, solving an LP for every rotamer or rotamer pair multiplies the lower-bounding time by the number of rotamers or rotamer pairs. Here, we use upper-bounding inequalities for the singleton and pairwise max-marginals to obtain lower bounds for the minimum conformational energies of rotamers and rotamer pairs. Such lower bounds are at best as tight as the bounds from solving the LP discussed in Appendix B⁴⁷, but require only the computation time for one TRMP run until convergence (no guaranteed time bound) plus post-processing time at most cubic in the problem size. The rest of this section explains how we can efficiently compute the rotamer and rotamer-pair lower bounds.

We have the following lemma on upper-bounding the singleton max-marginals:

Lemma 2

When ν and ν_c of (17) in a normal form satisfy the tree-consistency condition, it is true for all r ∈ R_ζ, ζ ∈ V that

max_{{x ∣ x_{ζ} = r}} p (x) \leq \frac{ν_{c}}{Z} ν_{ζ} {(r)}^{ρ_{ζ}} .

(20)

Example 4

From Lemma 2 and the normalized pseudo-max-marginals given in Example 3, we find an upper bound for the maximum probability of p(x) when x₁ = 0 as (ν_c/Z) ν₁(0)^{ρ₁} = 64/98 × 1 = 64/98, where ρ₁ = 1 because vertex 1 belongs to all three spanning trees. The bound is not tight because max_{x|x₁=0} p(x) = 16/98, but the tightness may change depending on the pseudo-max-marginals from TRMP. Even when the resulting bound is not tight, it could still be strong enough to eliminate the corresponding rotamer through comparison against a global upper bound U.

Lemma 2 combined with (2) provides a rotamer lower bound LB(ζ_r) for each r ∈ R_ζ and ζ ∈ V as min_{{x|x_ζ=r}} e(x) ≥ LB(ζ_r) = − ln ν_c − ρ_ζ ln ν_ζ(r).
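The rotamer lower bound is cheap to evaluate once ν_c, the vertex coefficients ρ_ζ, and the normalized singleton pseudo-max-marginals are available. The following sketch shows the formula and the comparison against U. The function names and the toy numbers are hypothetical, and a pseudo-max-marginal value of zero is treated as an infinite lower bound.

```python
import math

def rotamer_lower_bounds(nu_c, rho_vertex, nu_single):
    # nu_single: {position: {rotamer: normalized pseudo-max-marginal in (0, 1]}}.
    # Returns LB(zeta_r) = -ln(nu_c) - rho_zeta * ln(nu_zeta(r)) per rotamer.
    bounds = {}
    for pos, table in nu_single.items():
        for rot, value in table.items():
            if value == 0.0:
                bounds[(pos, rot)] = math.inf
            else:
                bounds[(pos, rot)] = -math.log(nu_c) - rho_vertex[pos] * math.log(value)
    return bounds

def eliminable(bounds, upper_bound):
    # Rotamers whose lower bound exceeds the global upper bound U.
    return sorted(key for key, lb in bounds.items() if lb > upper_bound)

# Toy data (made up): one position with three rotamers, rho = 1, -ln(nu_c) = -5.
nu_c = math.exp(5.0)
rho_vertex = {0: 1.0}
nu_single = {0: {"a": 1.0, "b": math.exp(-2.0), "c": math.exp(-0.5)}}

bounds = rotamer_lower_bounds(nu_c, rho_vertex, nu_single)
flagged = eliminable(bounds, upper_bound=-4.0)
```

In this toy case the bounds are about −5.0, −3.0, and −4.5 for rotamers a, b, and c, so only rotamer b exceeds U = −4.0 and would be eliminated.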

To upper bound the pairwise max-marginals, we use the general inequality

max_{{x ∣ x_{ζ} = r, x_{η} = s}} p (x) \leq \frac{ν_{c}}{Z} \prod_{S \in \mathcal{S}} {[max_{{x ∣ x_{ζ} = r, x_{η} = s}} p^{S} (x; ν)]}^{ρ (S)} .

(21)

The maximization problem max_{{x|x_ζ =r,x_η=s}} p^S(x; ν) can be easily solved using the following lemma:

Lemma 3

When ν and ν_c of (17) in a normal form satisfy the tree-consistency condition,

max_{{x ∣ x_{ζ} = r, x_{η} = s}} p^{S} (x; ν) = \begin{cases} 1 & \text{if } ζ, η \notin V (S) \\ ν_{ζ} (r) & \text{if } ζ \in V (S) \text{ and } η \notin V (S) \\ ν_{ζ η} (r, s) & \text{if } (ζ, η) \in E (S) \\ {max}_{x_{ξ} \in R_{ξ}} \frac{ν_{ξ ζ} (x_{ξ}, r) ν_{ξ η} (x_{ξ}, s)}{ν_{ξ} (x_{ξ})} & \text{else (let } ξ = γ (S) \text{)} \end{cases}

(22)

Example 5

Let us bound max_{{x|(x₁,x₂)=(0,0)}} p(x) using the normalized pseudo-max-marginals given in Example 3. As discussed above, we have to solve a maximization problem for each star:

p¹(x; ν) and p³(x; ν) (Figure 6(b) and 6(d)): this corresponds to the third case of (22). Therefore,
${max}_{{x ∣ (x_{1}, x_{2}) = (0, 0)}} p^{1} (x; ν) = {max}_{{x ∣ (x_{1}, x_{2}) = (0, 0)}} p^{3} (x; ν) = ν_{12} (0, 0) = 1 / 8.$
p²(x; ν) (Figure 6(c)): this corresponds to the fourth case of (22). Therefore,
$max_{{x ∣ (x_{1}, x_{2}) = (0, 0)}} p^{2} (x; ν) = max_{x_{3}} \frac{ν_{3, 1} (x_{3}, 0) ν_{3, 2} (x_{3}, 0)}{ν_{3} (x_{3})} = 1.$ (23)

By combining the above results in (21), we obtain

max_{{x ∣ (x_{1}, x_{2}) = (0, 0)}} p (x) \leq (64 / 98) \times {(1 / 8)}^{1 / 3} \times {(1 / 8)}^{1 / 3} \times 1^{1 / 3} = 16 / 98.

(24)

This bound is tight from max_{{x|(x₁,x₂)=(0,0)}} p(x) = 16/98 attained by x₃ = 1. Note that the same pseudo-max-marginals that yielded weak upper bounds in Examples 3 and 4, led to a tight upper bound for the rotamer pair, a more constrained bounding problem.

LB(ζ_r, η_s), a lower bound for the minimum conformational energy of rotamer pair (ζ_r, η_s), is given by $LB (ζ_{r}, η_{s}) = - \ln ν_{c} - \sum_{S \in \mathcal{S}} ρ (S) \ln \max_{\{x \mid x_{ζ} = r, x_{η} = s\}} p^{S} (x; ν)$.
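The case analysis of (22) translates directly into code. The sketch below implements the four cases for a single star, with the second case applied symmetrically when only η lies in the star, and then reproduces the bound of Example 5. It assumes strictly positive singleton pseudo-max-marginals, represents each star as a (center, leaves) pair with 0-based positions, and uses hypothetical function and variable names.

```python
import math

def star_pair_max(center, leaves, zeta, r, eta, s, states, nu_single, nu_pair):
    # Maximum of the star distribution subject to x_zeta = r and x_eta = s.
    vertices = {center, *leaves}
    has_zeta, has_eta = zeta in vertices, eta in vertices
    if not has_zeta and not has_eta:
        return 1.0                                   # first case
    if has_zeta and not has_eta:
        return nu_single(zeta, r)                    # second case
    if has_eta and not has_zeta:
        return nu_single(eta, s)                     # second case, roles swapped
    if center in (zeta, eta):
        return nu_pair(zeta, r, eta, s)              # third case: (zeta, eta) is an edge
    # Fourth case: both positions are leaves, so maximize over the center.
    return max(nu_pair(center, c, zeta, r) * nu_pair(center, c, eta, s)
               / nu_single(center, c) for c in states[center])

# Data of Example 5: normalized pseudo-max-marginals from (19).
STATES = {0: (0, 1), 1: (0, 1), 2: (0, 1)}
nu_single = lambda i, a: 1.0
nu_pair = lambda i, a, j, b: 0.125 if a == b else 1.0
STARS = ((1, (0, 2)), (2, (0, 1)), (0, (1, 2)))      # p1, p2, p3 as (center, leaves)
RHO = (1 / 3, 1 / 3, 1 / 3)
NU_C_OVER_Z = 64 / 98

star_maxima = [star_pair_max(c, lv, 0, 0, 1, 0, STATES, nu_single, nu_pair)
               for c, lv in STARS]
pair_bound = NU_C_OVER_Z * math.prod(m ** rho for m, rho in zip(star_maxima, RHO))
```

The three star maxima come out as 1/8, 1, and 1/8, and `pair_bound` evaluates to 16/98, in agreement with (24).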

Note that there can be at most O(n) stars that correspond to the fourth case of (22) for each position pair (ζ, η). If we let n_rot be the average number of rotamers per position, the maximization problem corresponding to the fourth case of (22) requires O(n_rot) operations. Therefore, it takes O(n_rot n) post-processing operations to compute an upper bound for each rotamer pair using Lemma 3.

In computing the rotamer lower bound for a rotamer ζ_r, we can also use pair-flag information to obtain a lower bound, LB′(ζ_r), for the constrained problem of minimizing e(x) over the legal conformations under P̃ with x_ζ = r. If we have LB′(ζ_r) > U, then the legal conformations under P̃ with x_ζ = r can be excluded from the search space. This is equivalent to eliminating rotamer ζ_r because all conformations containing x_ζ = r are in effect excluded. Computing LB′(ζ_r) takes additional polynomial time compared to LB(ζ_r), but it is particularly advantageous to leverage the pair flags when there exist a large number of flagged rotamer pairs. We used a simple search-based method to compute LB′(ζ_r) as follows. Let $\hat{p} = \prod_{S \in \mathcal{S}} {[\max_{x} p^{S} (x; ν)]}^{ρ (S)}$ for tree-consistent ν in a normal form, where each maximization is over the legal conformations under P̃ with x_ζ = r. Then, it is easy to see that (ν_c/Z)p̂ is an upper bound of the maximum of p(x) over the same set. If we use a naive search, it takes $O (n_{rot}^{2} n)$ post-processing comparison operations to compute each constrained maximum of p^S(x; ν). Therefore, it takes $O (n_{rot}^{2} n^{2})$ post-processing time to exactly compute p̂. Finally, the rotamer lower bound is computed as LB′(ζ_r) = − ln ν_c − ln p̂.

Results and Discussions

We performed computational experiments to evaluate the performance of BroMAP. We used a set of various protein design cases to measure and compare the running times of BroMAP and a fast implementation of DEE/A* that includes most of the state-of-the-art techniques²⁶. In the following, to distinguish the modified version of DEE used in BroMAP from the DEE used in DEE/A*, we call the former DEE-gp (DEE for general pair flags). The two main questions we investigate with the experiments are (1) whether BroMAP can solve design cases previously unsolved by DEE/A*, and (2) whether we can use BroMAP generally as an alternative to DEE/A* without being restricted to specific types of design cases. Although we are mainly interested in the overall performance of BroMAP here, Hong and Lozano-Pérez⁵² evaluate the effectiveness of our pruning method by comparing it against linear programming.

DEE/A* implementation

We used an in-house implementation of DEE/A* written in the C programming language⁵³. DEE/A* was performed with the following options and order:

1. Eliminate singles using Goldstein’s condition²¹. Repeat until elimination is unproductive.
2. Eliminate singles using split flags (s = 1)²⁴. Repeat until elimination is unproductive.
3. Do logical singles-pairs elimination²².
4. Eliminate pairs using Goldstein’s condition with one magic bullet²³.
5. Do logical singles-pairs elimination.
6. If unification is possible, do unification²¹, and go to (1).
7. Do A*⁴⁶.
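The first step of the procedure above relies on Goldstein's singles criterion, which in its standard form eliminates rotamer r at position i if some competitor t at the same position satisfies E(i_r) − E(i_t) + Σ_{j≠i} min_s [E(i_r, j_s) − E(i_t, j_s)] > 0. The sketch below illustrates only that one step, repeated until it becomes unproductive. It is not the in-house C implementation, the data layout and names are assumptions, and the energies in the toy example are made up.

```python
def goldstein_singles(e_single, e_pair):
    # e_single: {position: {rotamer: energy}}
    # e_pair(i, r, j, s): pairwise energy between rotamer r at i and rotamer s at j.
    alive = {i: set(rots) for i, rots in e_single.items()}
    changed = True
    while changed:                       # repeat until elimination is unproductive
        changed = False
        for i in alive:
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    gap = e_single[i][r] - e_single[i][t]
                    for j in alive:
                        if j != i:
                            gap += min(e_pair(i, r, j, s) - e_pair(i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:          # r is always worse than t: eliminate r
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

# Toy problem: two positions with two rotamers each.
e_single = {0: {0: 0.0, 1: 5.0}, 1: {0: 0.0, 1: 0.0}}
pair_table = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): -1.0}  # (rot at 0, rot at 1)

def e_pair(i, r, j, s):
    # Symmetric lookup into the table keyed by (rotamer at 0, rotamer at 1).
    return pair_table[(r, s)] if i < j else pair_table[(s, r)]

survivors = goldstein_singles(e_single, e_pair)
```

In the toy problem the criterion removes rotamer 1 at both positions, leaving the conformation (0, 0), which is also the minimum-energy conformation found by enumerating all four combinations.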

For unification, the pair of positions that has the largest fraction of flagged rotamer pairs is picked. However, because the energy terms and pair flags must be stored in machine memory, we capped the total number of rotamers that would result from unification at a limit C_uni. Therefore, any pair of positions whose unification would create more rotamers than C_uni was not considered, and the pair with the next-largest fraction of flagged rotamer pairs was considered instead. We experimented with different values of C_uni, i.e. 6,000, 8,000, 10,000, 12,000, and 14,000, to obtain the best running time for each test case. Note that this gives DEE/A*, the competing method, an advantage over BroMAP in comparing their running times, because it yields better DEE/A* times than consistently using a single C_uni value. Increasing C_uni, and thus the allowance for large unification, can facilitate solving otherwise difficult or unsolvable cases. However, for small to medium cases, larger values of C_uni often result in slower solution times.

Our DEE implementation uses a full table of energies. That is, if there are $q = \sum_{i = 1}^{n} ∣ R_{i} ∣$ rotamers in the problem, DEE allocates memory for q² floating point numbers.

When the DEE/A* procedure described above using various C_uni values failed to solve a test case, we also tried singles-elimination using split flags with s = 2 instead of s = 1, or allowed the number of magic bullets to increase up to the number of positions.

BroMAP implementation

BroMAP was implemented in C++. We used the PICO-library⁵⁴ for the BnB framework. The PICO-library provides the data structures and methods to create/delete nodes and to search the tree. It also provides procedure skeletons, for instance, for upper/lower-bounding methods.

In BroMAP, we restricted the amount of effort spent by DEE-gp instead of allowing it to keep iterating singles/pairs-flagging and unification until it finally solved the subproblem. This was done by limiting the maximum number of iterations of singles/pairs-flagging and also by using a smaller fixed C_uni value for unification than those used by DEE/A*.

Other than performing DEE-gp and TRMP bounding for each subproblem, we also allowed rotamer-contractions⁵². Rotamer-contraction reduces the size of a subproblem by grouping similar rotamers at a residue position as a cluster and replacing the cluster by a new single rotamer. It also defines the pairwise energies for the new rotamer so that the optimal value of the reduced subproblem is always a lower bound of the optimal value of the subproblem before the rotamer-contraction. Rotamer-contraction was iteratively performed until we obtained a lower bound greater than U or the number of executed rotamer-contractions reached a pre-determined limit. We used a heuristic boundability index (BI) multiplied by a positive integer P_rc as this limit. The BI for a specific node is equal to the number of ‘high’ branches on the path from the root to the node. For example, in the search tree of Figure 1, assuming the BI of the root node is equal to 0, the BIs are 0 for nodes 1, 3, 5, 7, 9, and 1 for nodes 2, 4, 6, 8, 11, and 2 for node 10. In these experiments, we let P_rc = 16 after exploring the overall effect of different values of P_rc on running times of BroMAP.

In case rotamer-contractions were performed multiple times in bounding a subproblem as described above, we also performed additional DEE-gp and TRMP periodically on the subproblem reduced by rotamer-contractions. After every P_DEE consecutive rotamer-contractions, we applied DEE-gp to see if we could solve the reduced problem or only to flag more rotamers or rotamer pairs. TRMP was also run until convergence after every P_TRMP consecutive rotamer-contractions to compute a lower bound for the subproblem or to flag rotamers or rotamer-pairs using the TRMP lower bounds. In this experiment, we let P_DEE = 8, and P_TRMP = 16.

Along the first depth-first dive, that is, until we exactly solve a subproblem for the first time, we performed only DEE-gp, TRMP bounding, and subproblem splitting, once each, but did not use any rotamer-contraction. As with DEE/A*, BroMAP also used the A* search algorithm when DEE-gp could not eliminate any more rotamers or rotamer pairs and the subproblem was considered small, i.e. contained fewer than 200,000 rotamer pairs.

The BroMAP implementation needs to hold TRMP data, whose size is of the order of the number of rotamer pairs. This corresponds to $\sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} ∣ R_{i} ∣ ∣ R_{j} ∣$ floating point numbers, and is roughly half the memory required by DEE/A*. Since BroMAP also performs DEE-gp, it requires additional memory of ${(\sum_{i = 1}^{n} ∣ R_{i} ∣)}^{2}$ floating point numbers for the full DEE energy table. Therefore, the maximum memory requirement of BroMAP is ${(\sum_{i = 1}^{n} ∣ R_{i} ∣)}^{2} + \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} ∣ R_{i} ∣ ∣ R_{j} ∣$ floating point numbers, which is roughly 1.5 times larger than that of DEE/A*.
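The memory estimate above follows from the identity Σ_{i<j} |R_i||R_j| = (q² − Σ_i |R_i|²)/2 with q = Σ_i |R_i|. The sketch below evaluates both table sizes in units of floating point numbers. The function name is hypothetical and the example rotamer counts are made up rather than taken from Table I.

```python
def memory_in_floats(rotamers_per_position):
    # Full DEE energy table: q^2 entries, with q the total number of rotamers.
    q = sum(rotamers_per_position)
    dee_table = q * q
    # TRMP data: one entry per rotamer pair across distinct positions.
    trmp_pairs = (q * q - sum(r * r for r in rotamers_per_position)) // 2
    return dee_table, trmp_pairs, dee_table + trmp_pairs

# Hypothetical problem: 10 positions with 100 rotamers each.
dee_table, trmp_pairs, bromap_total = memory_in_floats([100] * 10)
ratio = bromap_total / dee_table
```

For this hypothetical problem the DEE table holds 1,000,000 numbers and the TRMP data 450,000, so the total is 1.45 times the DEE/A* requirement, consistent with the factor of roughly 1.5 stated above.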

Platform

We used a Linux workstation with two dual-core 2 GHz AMD Opteron 246 processors and 2 Gbytes of memory for the experiment. The C/C++ codes for BroMAP and DEE/A* were compiled using Intel C/C++ Compiler Version 9.1 for Linux. During compilation, OpenMP directives were enabled to parallelize the execution of DEE, DEE-gp, and TRMP over two CPU cores. All other procedures, including A*, were executed on a single core. Note that BroMAP or DEE/A* was allowed to use the whole system memory but only one processor at a time.

Test cases

We used 68 test cases whose energy files are smaller than 300 Mbytes. An energy file contains floating point numbers representing singleton and pairwise energies. We found energy files larger than 300 Mbytes are not handled well with the current implementation of BroMAP on our workstation due to the memory requirement of BroMAP.

We used three different model systems in obtaining test cases:

FN3: derived from protein ¹⁰Fn3, the tenth human fibronectin type III domain⁵⁵. It is a 94-residue β-sheet protein with an immunoglobulin-like fold. Besides its natural in vivo role, the protein has been engineered as an antibody mimic to bind with high affinity and specificity to arbitrary protein targets.
D44.1⁵⁶ and D1.3⁵⁷: antibodies that bind to hen egg-white lysozyme (HEL), though they bind different HEL epitopes. Each has low nanomolar binding affinity, and was originally isolated after murine immunization. For the D1.3 core designs, we redesigned the core of the lysozyme protein in the absence of the antibody.
EPO: human erythropoietin (Epo) protein complexed to its receptor (EpoR)⁵⁸. One Epo binds to two EpoR with one high-affinity and one low-affinity binding site. Our EPO interface designs addressed the high-affinity binding site while our core designs addressed the core of the EpoR from the high-affinity site.

Each case corresponds to one of three types of protein regions:

INT: protein–protein binding interface.
CORE: protein core, i.e. side chains that are not solvent-exposed.
CORE++: protein core plus boundary positions that are partially exposed to solvent.

We varied the types of amino acids offered at design positions of each case as follows:

H: hydrophobic amino acids (A, F, G, I, L, M, W, V).
HP: hydrophobic plus polar amino acids (A, F, G, I, L, M, W, V, H, N, Q, S, T, Y).
A: all types of amino acids, excluding proline and cysteine.

For CORE, we used both H and HP, and for CORE++, we used HP (with both neutral tautomers of histidine allowed in each case). For INT, we used A, and allowed both neutral tautomers and the protonated form of histidine. For all designs, if the wild-type amino acid was not part of the library, it was added at that position. For some test cases, the total number of positions in the search was greater than the number of design positions. At these other positions, the native amino-acid type was retained and its conformation was varied.

Each case was made using one of two different rotamer libraries:

REG: standard rotamer library. This is based on the backbone-independent May 2002 library⁵⁹. This library was supplemented with three histidine rotamers for an unsampled ring flip, and two asparagine rotamers to increase sampling of the final dihedral angle rotation.
EXP: expanded rotamer library. This was created by expanding both χ₁ and χ₂ of rotamers in REG by ±10°. The hydroxyls of serine, threonine, and tyrosine were sampled every 30 degrees. For some INT cases of D1.3, D44.1, and EPO, crystallographic water molecules were allowed conformational freedom. The oxygen atom location was fixed to that of the crystal structure and the hydrogen atoms were placed to create 60 symmetric water molecule rotations.

For all libraries and cases, each crystallographic wild-type rotamer was added in a position-specific manner to the library, using the complete Cartesian representation of the side chain, rather than just the dihedral angles.

The singleton/pairwise energies of rotamers were computed using the energy function of the CHARMM PARAM22 all-atom parameter set with no cut-offs for non-bonded interactions and a 4r distance-dependent dielectric constant. All energy terms were used (bond, angle, Urey-Bradley, dihedral, improper, Lennard-Jones, and electrostatic). Rotamers that clashed with the fixed protein regions were eliminated during case generation if their singleton energies were greater than the smallest singleton energy at that position by at least 50 kcal/mol. Further details on design methods and test case construction can be found in Lippow et al.⁶⁰
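The clash filter can be illustrated with a minimal sketch. The function name and the example energies are hypothetical; only the 50 kcal/mol threshold and the comparison against the smallest singleton energy at the position come from the text.

```python
def filter_clashing(singleton_energies, threshold=50.0):
    """Return indices of rotamers kept at one position.

    A rotamer is dropped when its singleton energy exceeds the smallest
    singleton energy at the position by at least `threshold` kcal/mol.
    """
    e_min = min(singleton_energies)
    return [k for k, e in enumerate(singleton_energies) if e - e_min < threshold]

# Hypothetical singleton energies (kcal/mol) for four rotamers at one position.
print(filter_clashing([-3.0, 10.0, 47.0, 60.5]))
```

In this toy input the third rotamer lies exactly 50 kcal/mol above the minimum and is therefore dropped along with the fourth.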

Table I lists composition and problem-size information of each test case. Its last column also summarizes the experimental results presented in the following.

Running time comparison

Among the 68 cases, BroMAP solved 65 cases within the 7-day time limit, whereas DEE/A* solved 51 cases within the same limit. There were no cases that DEE/A* solved but BroMAP could not. The 14 cases solved by BroMAP but not by DEE/A* suggest that BroMAP can be an alternative to DEE/A* for hard design cases where DEE/A* performs poorly.

Among the 51 cases solved by both BroMAP and DEE/A*, solving 23 cases by BroMAP required only the DEE-gp part of BroMAP. Since BroMAP only acted as a light DEE for these cases, comparing the running times of BroMAP and DEE/A* on them is not meaningful. After eliminating these 23 cases, we are left with 28 cases for which we are interested in comparing the running times of BroMAP and DEE/A*. The running times for these 28 cases are shown in Table II. Additionally, the table lists results for 14 cases that only BroMAP solved.

Table II.

Results of solving the non-“Limited DEE” cases with BroMAP and DEE/A* (cases solved by limited DEE are not presented). Columns (1) No.: case number, (2) Bro: BroMAP solution time in seconds, (3) DEE: DEE/A* solution time in seconds, (4) T-Br: total number of branchings (i.e. splits), (5) F-Br: number of branchings during the first depth-first dive, (6) Skew: skewness of the search tree defined as $\frac{\text{number of low-subproblems split}}{(\text{total number of splits}) - 1}$, (7) F-Ub: U − OPT, i.e. the difference between the upper bound from the first depth-first dive and the GMEC energy, (8) Leaf: Σ_i log₁₀ |R_i| of the node at the end of the first depth-first dive, (9) Rdctn: average reduction of Σ_i log₁₀ |R_i| during the first depth-first dive, i.e. (logconf − Leaf)/(F-Br), where logconf is defined in Table I, (10) RC: number of rotamer-contractions performed, (11) %DE: BroMAP time percentage used for DEE-gp, (12) %A*: BroMAP time percentage used for A*, (13) %TR: BroMAP time percentage used for TRMP. Note that columns 11 to 13 may not sum to 100% because of time spent on rotamer-contraction and overhead of using the branch-and-bound framework.

No.	Bro	DEE	T-Br	F-Br	Skew	F-Ub	Leaf	Rdctn	RC	%DE	%A*	%TR
2	2.6E3	3.1E4	31	25	0.90	0.49	30.7	2.12	36	42.8	0.3	56.3
3	2.4E3	2.3E4	31	26	0.93	0.49	27.7	2.55	32	46.2	0.6	52.6
4	2.8E3	1.3E4	23	23	1	0	33.7	3.01	0	43.9	0.3	55.5
5	2.7E3	2.1E4	26	26	1	0.55	27.4	3.12	0	37.2	0.4	62.2
9	1.2E2	4.8E2	3	3	1	0	27.6	1.93	0	8.9	74.1	17.0
10	4.6E2	1.3E3	13	10	0.75	0.37	26.9	1.02	74	7.6	70.4	14.4
11	5.7E3	3.5E4	109	17	0.81	0.36	26.2	0.85	663	3.8	78.9	11.2
15	2.9E2	3.5E2	0	0	NA	0	NA	NA	0	94.6	0.4	4.7
23	1.5E2	2.6E2	0	0	NA	0	NA	NA	0	86.7	0	12.6
24	3.2E2	3.1E2	4	4	1	0	25.3	4.33	0	62.3	15.1	21.6
25	2.9E2	1.2E3	0	0	NA	0	NA	NA	0	89.6	0	10.4
26	1.4E3	1.7E3	11	11	1	0.89	29.2	1.65	0	46.1	0.4	53.2
33	4.1E2	2.1E3	13	13	1	0	27.9	2.43	0	34.7	4.5	59.8
34	1.1E3	3.7E3	19	19	1	0	30.0	2.32	0	32.2	2.7	64.8
35	2.8E3	4.1E4	21	21	1	0	28.7	3.03	0	50.7	0.6	48.6
36	4.6E3	2.3E4	25	25	1	0	27.9	3.39	0	53.2	0.7	45.9
37	2.5E2	2.5E2	0	0	NA	0	NA	NA	0	76.0	2.4	21.2
44	2.2E2	3.8E1	8	6	0.71	0.54	28.2	1.87	17	8.2	75.5	14.1
45	8.8E2	2.0E2	8	8	1	0	26.2	5.16	8	48.6	23.8	25.4
49	3.3E2	5.0E2	4	4	1	0	19.8	1.63	0	51.1	11.5	37.5
50	1.2E3	1.1E3	7	7	1	0	22.3	3.44	12	72.2	7.0	17.1
51	5.7E4	2.8E5	666	25	0.85	0.58	27.6	1.03	5,656	16.7	21.2	41.8
57	4.6E1	2.7E2	0	0	NA	0	NA	NA	0	84.8	0	15.2
58	1.5E3	1.0E3	19	19	1	0	28.8	2.69	0	42.5	0.2	57.1
59	4.4E2	4.0E3	0	0	NA	0	NA	NA	0	70.6	0	29.1
60	1.5E4	4.6E4	32	32	1	0	37.3	2.46	0	30.1	0.1	69.7
65	4.6E3	1.7E3	15	15	1	0	22.7	5.09	0	61.9	0	37.8
66	7.7E3	2.4E3	15	15	1	0	33.9	5.15	0	67.2	0	32.6

Cases below were solved by BroMAP only.

12	2.0E5	NA	2773	23	0.82	7.11	26.2	0.88	3.9E4	6.0	59.1	20.1
16	3.5E3	NA	12	11	0.91	0	23.6	1.75	30	41.7	6.0	49.3
17	1.1E5	NA	298	21	0.84	3.35	26.7	0.77	2,576	17.7	28.1	32.7
27	8.0E3	NA	23	23	1	0	27.8	0.99	13	32.2	1.1	66.4
28	2.1E4	NA	175	25	0.91	0	28.0	0.99	1,168	23.8	8.5	57.9
38	1.4E4	NA	155	31	0.87	0.50	30.2	1.47	571	30.9	0.4	62.8
39	1.2E5	NA	572	43	0.85	0	27.4	1.50	4,791	30.4	0.1	58.4
40	1.8E5	NA	293	43	0.81	0	29.6	1.91	2,440	35.9	0	56.1
41	2.1E5	NA	364	41	0.85	0	33	2.66	2,771	34.2	0	57.5
46	5.0E5	NA	2675	36	0.69	8.28	27.8	1.44	1.4E5	18.8	18.8	35.3
61	2.8E4	NA	55	49	0.96	0.36	28.2	2.04	15	49.0	0	50.8
62	3.6E5	NA	232	58	0.88	0.27	30.1	1.84	1,119	43.5	0	50.2
63	1.1E5	NA	143	53	0.85	0.29	32.8	2.08	506	41.4	0	55.4
67	1.3E5	NA	37	37	1	0	35.6	2.82	0	51.5	0	48.5


Figure 8 plots the ratio of BroMAP running time to DEE/A* running time vs. DEE/A* running time. Note that the plotted ratios for cases solved only by BroMAP are upper bounds on the actual ratios because the actual DEE/A* running times would exceed 7 days. Overall, the plot suggests that the advantage of BroMAP grows as DEE/A* takes longer. For all cases that DEE/A* took more than one hour to solve, the maximum ratio was 0.33. Together with the 14 cases solved by BroMAP only, the experiment supports the view that BroMAP can be an alternative to DEE/A* for hard design cases. There are 5 cases for which the BroMAP solution time is at least 10% longer than the DEE/A* solution time. Considering that four of them (cases 45, 58, 65, and 66) were almost ideally solved by BroMAP (the GMEC was found at the end of the first depth-first dive and there was no branching after the first depth-first dive), we find that more aggressive DEE conditions such as larger C_uni were critical in obtaining shorter running times on them. In terms of the total running time, however, none of these five cases needed more than 130 minutes to be solved by BroMAP. Therefore, using BroMAP did not impractically slow down the solution of the cases in Table I.

Ratio of BroMAP time to DEE/A* time vs. DEE/A* time for 42 cases in Table II. Labels next to data points are case numbers from Table I. The 14 cases solved by BroMAP only are shown in the narrow pane on the right side. The running time ratios for these cases were calculated by assuming the DEE/A* time for each of them is 7 days although they were not solved by DEE/A* within 7 days. The trend line represents a robust fit for the 28 cases that were solved by both BroMAP and DEE/A*. The horizontal dashed line represents the ratio equal to 1. Different symbols are used to represent each case depending on the type of protein region (CORE, CORE++, or INT) and the type of library used (REG or EXP): (1) ○ = CORE, □ = CORE++, △ = INT, (2) empty = REG, filled = EXP.
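The running-time ratios discussed above can be recomputed from the rounded entries of Table II. The sketch below is only a consistency check on the published two-digit values; the variable and function names are hypothetical.

```python
# (BroMAP seconds, DEE/A* seconds) for the 28 cases solved by both methods,
# copied from the "Bro" and "DEE" columns of Table II.
TIMES = {
    2: (2.6e3, 3.1e4), 3: (2.4e3, 2.3e4), 4: (2.8e3, 1.3e4), 5: (2.7e3, 2.1e4),
    9: (1.2e2, 4.8e2), 10: (4.6e2, 1.3e3), 11: (5.7e3, 3.5e4), 15: (2.9e2, 3.5e2),
    23: (1.5e2, 2.6e2), 24: (3.2e2, 3.1e2), 25: (2.9e2, 1.2e3), 26: (1.4e3, 1.7e3),
    33: (4.1e2, 2.1e3), 34: (1.1e3, 3.7e3), 35: (2.8e3, 4.1e4), 36: (4.6e3, 2.3e4),
    37: (2.5e2, 2.5e2), 44: (2.2e2, 3.8e1), 45: (8.8e2, 2.0e2), 49: (3.3e2, 5.0e2),
    50: (1.2e3, 1.1e3), 51: (5.7e4, 2.8e5), 57: (4.6e1, 2.7e2), 58: (1.5e3, 1.0e3),
    59: (4.4e2, 4.0e3), 60: (1.5e4, 4.6e4), 65: (4.6e3, 1.7e3), 66: (7.7e3, 2.4e3),
}

def time_ratios(times):
    """Map case number -> BroMAP time divided by DEE/A* time."""
    return {case: bro / dee for case, (bro, dee) in times.items()}

RATIOS = time_ratios(TIMES)

# Cases for which DEE/A* needed more than one hour.
slow_dee = {c: r for c, r in RATIOS.items() if TIMES[c][1] > 3600}
worst_case = max(slow_dee, key=slow_dee.get)

# Cases for which BroMAP was at least 10% slower than DEE/A*.
slower = sorted(c for c, r in RATIOS.items() if r >= 1.1)

print(worst_case, round(slow_dee[worst_case], 2))
print(slower)
```

On these rounded values the largest ratio among the cases with DEE/A* time above one hour is about 0.33 (case 60), and the cases at least 10% slower with BroMAP are 44, 45, 58, 65, and 66.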

For large hard cases, the system memory can be a limiting factor on the performance of DEE/A* because the performance of DEE/A* often depends greatly on the unification procedure, which requires a large amount of memory. While this implies that larger system memory could have given an advantage to DEE/A* over BroMAP in terms of running time, our results suggest that the memory constraints experienced by DEE/A* can be alleviated through the use of BroMAP.

Table II lists the percentage of time used for each component of BroMAP. In most cases, DEE-gp, A*, and TRMP turned out to be major contributors to the running time. If we sum running times of BroMAP for all cases, it is found that 42% of the total time was spent on DEE-gp and A*, and 45% on TRMP. On the other hand, considering the proportion between the total running time of BroMAP and A* time, a great amount of time was spent on A* for cases 11 and 12. This could be avoided by further restricting the size of the subproblem for which A* is allowed to run.

Among cases in Table I, BroMAP was able to solve six cases at the root node without splitting. All other cases required BroMAP to branch but many of them needed very little branching other than those performed during the first depth-first dive. This trend is observed through the skewness of the search tree, defined as $\frac{\text{number of low-subproblems split}}{(\text{total number of branchings}) - 1}$. The ratio varies between 0 and 1 and is larger than 0.5 if there are more low-subproblems split than high-subproblems. We computed skewness for 36 cases where BroMAP required more than one split. The minimum skewness from these cases is 0.69 and 17 cases had skewness equal to 1, that is, needed only low-subproblem splittings.
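The skewness definition can be written as a small helper. The function name is hypothetical, and the counts of low-subproblem splits used in the example are illustrative values chosen to reproduce skewness values of the kind reported, since Table II lists only the total number of splits.

```python
def skewness(low_splits, total_splits):
    """Skewness of a search tree: low-subproblem splits / (total splits - 1).

    Returns None when the ratio is undefined (fewer than two splits).
    """
    if total_splits <= 1:
        return None
    return low_splits / (total_splits - 1)

# Hypothetical trees: 31 splits of which 27 were on low-subproblems, and a
# tree in which every split after the root was on a low-subproblem.
print(skewness(27, 31))
print(skewness(10, 11))
print(skewness(0, 0))
```

The root split is neither a low- nor a high-subproblem split, which is why the denominator is the total number of splits minus one.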

Figure 9 shows actual search trees generated by BroMAP during solution of three cases. Box-shaped (shaded) nodes in each search tree represent the subproblems that were exactly solved and resulted in an upper bound less than or equal to the current best upper bound. Therefore, the box-shaped node that is expanded latest is a node where the GMEC is found in the search tree. Note that, for 27 cases out of 42 cases in Table II, an upper bound equal to the GMEC energy was found at the end of the first depth-first dive. However, early discovery of the GMEC did not necessarily lead to fast completion of BroMAP. For example, in Figure 9(c), we can see the lower bounding was not effective for large subproblems although they were expanded after the optimal upper-bound was found, resulting in a large number of branchings.

Search trees of BroMAP for three cases. For each branching, the low-subproblem is placed on the right-hand side, and the high-subproblem on the left-hand side. Shaded box-shaped nodes represent the subproblems that were exactly solved and resulted in an update of the global upper bound.

*Figure (a).* Case 65 (skew = 1.0)

*Figure (b).* Case 3 (skew = 0.93)

*Figure (c).* Case 17 (skew = 0.84)

Table II suggests that the search trees of BroMAP have smaller depths than those from conventional BnB methods would have. A simple branching without reduction within a node would only reduce the problem size by a factor of two. That is, a child subproblem will have Σ_i log₁₀ |R_i| value smaller by log₁₀ 2 ≈ 0.30 than its parent subproblem. However, column “Rdctn” shows the average reduction was far greater. Excluding the cases solved without branching, the average of the average reduction of Σ_i log₁₀|R_i| along the first depth-first dive was 2.32, a factor of 7 speed-up over reduction by conventional BnB methods. It should be noted that the reduction within a node can be even greater after a strong upper bound is found. This is evidenced by highly skewed shapes of search trees. Overall, the reduced depth and high skewness of BroMAP search trees suggest the number of nodes expanded by BroMAP is exponentially smaller than that of conventional BnB methods using simple branching. Meanwhile, the effect of smaller search trees will be transferred to shorter running times as well; the experimental results presented by Hong and Lozano-Pérez⁵² show that the node processing time by DEE-gp and TRMP is similar to the bounding time for solving a linear programming (LP) problem, a typical bounding method used in BnB methods, but the LP produces weaker bounds.
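The arithmetic behind this comparison can be checked with a short sketch. The function name and the example values of logconf and Leaf (60 and 30) are hypothetical; the per-branching reductions log₁₀ 2 and 2.32 are the ones quoted above.

```python
import math

SIMPLE_STEP = math.log10(2)  # reduction of sum_i log10|R_i| from one simple branching
OBSERVED_STEP = 2.32         # average reduction per branching reported in the text

def dive_depth(logconf, leaf, step):
    """Number of branchings needed to shrink sum_i log10|R_i| from logconf to leaf."""
    return math.ceil((logconf - leaf) / step)

# Ratio of the per-branching reductions (between 7 and 8).
print(OBSERVED_STEP / SIMPLE_STEP)
# Factor by which the number of conformations shrinks per branching.
print(10 ** OBSERVED_STEP)
# Hypothetical dive from logconf = 60 down to a leaf with value 30.
print(dive_depth(60.0, 30.0, SIMPLE_STEP), dive_depth(60.0, 30.0, OBSERVED_STEP))
```

Under these assumed values the dive would need 100 simple branchings but only 13 branchings at the observed average reduction, which illustrates why the search trees are shallower.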

The plots in Figure 10 provide interesting insights on the hardness of test cases. In Figure 10(a), categorizing all test cases by their solvability reveals cases with higher ratios of log conformation to the number of design positions tend to be harder to solve. Figure 10(b) uses gray scale to represent the running times of BroMAP. Although the performance of BroMAP is not particularly dependent on protein regions, it is noted that INT cases are smaller than CORE cases. This is because we excluded small CORE cases from the experiment because they are often too easy for either BroMAP or DEE/A*, and also excluded large INT cases for the opposite reason. There are two main reasons that INT cases are harder than CORE. First, INT cases are offered more rotamers per position because we allowed 8 to 14 amino acids for CORE cases whereas 18 amino acids including R, K, D and E were allowed for INT cases. These four additional amino acids offer even more rotamers per amino acid than average because of their long side chains. Second, whereas CORE cases are constrained by side-chain/side-chain interactions as well as side-chain/backbone interactions, INT cases are generally less constrained by side-chain/backbone interactions, and therefore there exist a larger number of compatible conformations.

Each case is plotted by the number of design positions and log number of conformations. In both (a) and (b), different symbols were used for different protein regions: (1) △ = INT, (2) ○ = CORE, (3) □= CORE++. In (a), cases were marked with different colors depending on their solvability: (1) yellow = solved by limited DEE, (2) green = solved by BroMAP and DEE/A*, (3) blue = solved by BroMAP only, (4) red = solved by none. In (b), the BroMAP running time on each case was used to color the corresponding symbol. The color bar on the right side shows mapping between a color and a running time in seconds.

*Figure (a).* Solvability

*Figure (b).* Protein region and BroMAP running time

TRMP lower bounds

We present a numerical example to illustrate the utility of TRMP lower bounds in rotamer/rotamer-pair elimination. For this purpose, we use subproblems of Case 17 at depth 2 to 11 (root node is at depth 1). These subproblems correspond to node numbers 2, 4, 6, …, 20, and are colored in light gray in Figure 9(c) (nodes in the search tree are numbered by the order of creation using depth-first search). Table III lists the lower-bounding result.

Table III.

TRMP lower-bounding results for subproblems of Case 17. The meaning of each column is, in order: (1) node number, (2) number of rotamers in the subproblem, (3) number of rotamer pairs in the subproblem, (4) time (sec) for TRMP convergence, (5) median rotamer lower bound when not using pair flags (rotlb₁), (6) percentage of rotamers such that rotlb₁ > U, (7) time (sec) for computing rotlb₁ for all rotamers, (8) median rotamer-pair lower bound (rplb), (9) percentage of rotamer pairs such that rplb > U, (10) time (sec) for computing rplb for all rotamer pairs, (11) median rotamer lower bound when using pair flags (rotlb₂) after rotamer pairs were flagged by rplb, (12) percentage of rotamers such that rotlb₂ > U, (13) time (sec) for computing rotlb₂ for all rotamers. In the table, time for TRMP convergence was excluded from time for computing rotlb₁, rotlb₂ or rplb. The value of U is −55.13, which is equal to the optimal value and was available as a global upper bound for each node in the table by the time they were expanded.

Node	#rots	#pairs	T_TR	rotlb₁ med	rotlb₁ %el	rotlb₁ time	rplb med	rplb %el	rplb time	rotlb₂ med	rotlb₂ %el	rotlb₂ time
2	4504	8.4E6	148	−71.69	11	0	−40.21	74	1162	−70.91	26	7
4	3837	6.3E6	203	−70.98	9	0	−46.18	68	774	−70.61	21	3
6	3570	5.4E6	238	−80.98	3	0	−53.45	54	607	−80.56	7	4
8	3269	4.5E6	190	−84.57	1	1	−58.82	44	463	−84.35	3	1
10	2969	3.7E6	99	−73.98	12	0	−47.80	62	353	−73.59	17	2
12	2704	3.1E6	105	−84.00	7	0	−59.17	45	261	−83.81	8	1
14	2504	2.6E6	77	−84.10	13	0	−51.16	50	202	−83.93	14	1
16	2173	2.0E6	65	−82.25	5	0	−61.36	42	138	−82.13	8	0
18	1878	1.5E6	71	−88.56	5	0	−66.31	40	92	−88.47	9	0
20	1725	1.3E6	16	−86.09	7	0	−65.68	38	74	−85.94	8	0


In each node, we obtain more elimination using rotlb₂ (rotamer lower bound from using pair flags) than using rotlb₁ (rotamer lower bound from not using pair flags). This is due to massive flagging of rotamer pairs by rplb (rotamer-pair lower bound). Figure 11 shows rotamers ordered by the value of rotlb₁ on x-axis and their rotlb₁, rotlb₂ values on y-axis for the subproblem of node 2. The difference between rotlb₁ and rotlb₂ for the same rotamer shows pair-flags information can strengthen the lower bounds, thereby doubling the number of eliminated rotamers in this example.

Plots of TRMP lower bound vs. rotamer, for all (4,504) rotamers in the subproblem of node 2 in solving Case 17 by BroMAP; *rotlb*₁ is represented by a dot and *rotlb*₂ by a ‘+’ symbol. Rotamers are sorted on the x-axis in increasing order of *rotlb*₁. All rotamers with lower bounds greater than or equal to 0 were clipped at y = 0. The horizontal line at y = −55.13 represents U. By comparing *rotlb*₁ against U, 497 rotamers (4,008th to 4,504th in the order) were eliminated. Using *rotlb*₂ instead increased the number of eliminated rotamers to 1,171.
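The elimination rule used in this comparison, flagging any rotamer whose lower bound exceeds the global upper bound U, can be sketched as follows. The function name and the four example bounds are hypothetical; only the value U = −55.13 is taken from Table III.

```python
U = -55.13  # global upper bound for Case 17, as reported in Table III

def flag_by_lower_bound(lower_bounds, upper_bound):
    """Return indices of items whose lower bound exceeds the upper bound."""
    return [k for k, lb in enumerate(lower_bounds) if lb > upper_bound]

# Hypothetical bounds for four rotamers. Each rotlb2 value is at least as
# large as the matching rotlb1 value, as expected when pair flags tighten
# the bound.
rotlb1 = [-71.7, -60.2, -54.9, -40.0]
rotlb2 = [-70.9, -55.0, -54.0, -38.5]

print(flag_by_lower_bound(rotlb1, U))
print(flag_by_lower_bound(rotlb2, U))
```

Because a tighter bound can only move a rotamer from below U to above U, the set flagged with rotlb₂ always contains the set flagged with rotlb₁.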

The large eliminations obtained for subproblems at small depths are suspected to come from our splitting scheme of dividing rotamers by their lower bounds. As we go deeper down the search tree, we expect such distinction between rotamer lower bounds to become less clear. This trend can be observed in the median value of rotlb₁ and the percentages of eliminated rotamers and rotamer pairs for nodes at different depths.

Computing rotlb₂ takes more time than rotlb₁, but Table III shows that the difference is relatively insignificant compared to the time for computing rplb. The time for computing rplb for every rotamer pair was typically at least double the time for TRMP convergence, suggesting that an efficiency improvement of rotamer-pair lower-bound computation would significantly contribute to reducing the running time of BroMAP.

Conclusions

In this work we presented an exact solution method (BroMAP) for the global minimum energy conformation search. In particular, BroMAP was designed to substitute for the DEE/A* approach for large protein design problems where a large number of rotamers is offered at each position and there exist side-chain interactions between all pairs of residue positions. BroMAP uses a branch-and-bound (BnB) framework and performs problem-size reduction within each subproblem using DEE and elimination by TRMP lower bounds. BroMAP also exploits TRMP lower bounds in branching and subproblem selection. We performed computational experiments to evaluate BroMAP on various types and sizes of protein design problems in comparison with DEE/A*. The experimental results show that BroMAP solved hard protein design cases faster than DEE/A*, and that BroMAP also solved many cases that DEE/A* failed to solve within the allowed time and memory. In addition, using BroMAP on cases where DEE/A* performed well did not incur a significant disadvantage in running time.

The performance advantage of BroMAP over DEE/A* or conventional BnB methods can be attributed to three factors. First, the search trees are radically smaller than those from conventional BnB methods. Problem-size reduction performed within each node results in reduced depths of search trees, and early discovery of suboptimal upper bounds allows effective pruning of nodes that follow. Second, on top of fast reduction by DEE within each node, BroMAP can perform additional elimination and informed branching using lower bounds from inexpensive computation. Third, the general BnB framework of BroMAP allows additional lower-bounding techniques such as rotamer-contraction to be easily incorporated.

It could be argued that the performance comparison between BroMAP and DEE/A* was not thorough or fair because DEE can be faster depending on what elimination conditions are used, how they are combined²⁶, or how much memory is available for unification. However, it should be noted that BroMAP also exploits DEE, and that BroMAP can be regarded as an added structure to DEE/A* to allow a more effective use of it in a general framework. As a result, if a better implementation of DEE/A* is given or a better system environment is allowed, the performance of BroMAP is also expected to benefit from it.

In our experiment, using rotamer-contraction did not always improve the total running time of BroMAP, although it tends to reduce the number of nodes expanded by BroMAP. However, among the 14 test cases that were solved by BroMAP with rotamer-contraction but not by DEE/A*, two could not be solved by BroMAP without rotamer-contraction within the 7-day time limit. In addition, for the 51 test cases used for comparison of BroMAP and DEE/A*, the total running time of BroMAP was reduced by 17% on average simply by using rotamer-contraction. Therefore, there is a question of how much effort should be spent on rotamer-contraction to maximize the performance of BroMAP. On the other hand, observing the behavior of BroMAP on many random instances to parameterize its solution time by problem characteristics will be interesting and may help improve the performance of BroMAP, because no direct correlation between the problem size and the BroMAP solution time has been found. Finally, a substantial fraction of BroMAP’s running time is spent on post-processing of TRMP to compute rotamer-pair lower bounds. Therefore, a speed-up of BroMAP could be made through efficiency improvement of this post-processing procedure. Our future investigation will address these problems to extend the applicability of BroMAP to larger protein design cases.

Acknowledgments

The authors would like to thank current and past members of the Tidor group, especially Michael Altman for his DEE/A* code and Alessandro Senes for giving freely of his time and advice.

Appendix A. TRMP

Algorithm 1 describes “edge-based reparameterization updates”⁴⁷ for a set of general trees $\mathcal{S}$. In lines 2, 3, 5, and 6, $\kappa_{i}^{n}$ and $\kappa_{ij}^{n}$ are constants that can be set arbitrarily as long as they are positive. Γ(i) is the set of vertices neighboring i in for i ∈ ( ).

Table IV.

Algorithm 1: TRMP (edge-based reparameterization updates⁴⁷,⁵¹)


Appendix B. ILP formulation

The ILP formulation for the GMEC problem referred to in this paper is as follows:

$\min_{\{x_{i_r}\}, \{x_{i_r j_s}\}} \left[ \sum_{i \in V} \sum_{r \in R_i} E(i_r)\, x_{i_r} + \sum_{(i,j) \in E} \sum_{(r,s) \in R_i \times R_j} E(i_r j_s)\, x_{i_r j_s} \right]$ (25)

$\sum_{r \in R_i} x_{i_r} = 1, \quad \forall i \in V,$ (26)

$\sum_{s \in R_j} x_{i_r j_s} = x_{i_r}, \quad \forall (i,j) \in E, \ \forall r \in R_i,$ (27)

$x_{i_r} \in \{0, 1\}, \quad \forall i \in V, \ r \in R_i,$ (28)

$x_{i_r j_s} \in \{0, 1\}, \quad \forall (i,j) \in E, \ (r,s) \in R_i \times R_j.$ (29)

An LP relaxation can be obtained by relaxing the integrality constraints (28)–(29), i.e. by allowing each variable to take any value in [0, 1]. The resulting LP is equivalent to the tree-relaxed LP⁴⁷.
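To make the objective (25) concrete, the sketch below evaluates it by exhaustive enumeration on a tiny instance. This is not the solution method of the paper, only a brute-force reference that is feasible for a handful of positions; choosing exactly one rotamer per position plays the role of constraint (26). The function name, the data layout, and the toy energies are all hypothetical.

```python
from itertools import product

def gmec_brute_force(singles, pairs):
    """Enumerate all conformations and return (minimum energy, conformation).

    singles[i][r]        : singleton energy E(i_r)
    pairs[(i, j)][r][s]  : pairwise energy E(i_r j_s) for an edge (i, j)
    """
    n = len(singles)
    best_energy, best_conf = float("inf"), None
    for conf in product(*(range(len(s)) for s in singles)):
        energy = sum(singles[i][conf[i]] for i in range(n))
        energy += sum(table[conf[i]][conf[j]] for (i, j), table in pairs.items())
        if energy < best_energy:
            best_energy, best_conf = energy, conf
    return best_energy, best_conf

# Toy instance: three positions with two rotamers each and two interacting pairs.
singles = [[0, 1], [1, 0], [0, 2]]
pairs = {
    (0, 1): [[2, 0], [0, 3]],
    (1, 2): [[0, 1], [1, 0]],
}
print(gmec_brute_force(singles, pairs))
```

The number of conformations enumerated is the product of the |R_i|, which is why exhaustive search is not an option for the test cases considered here.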

Appendix C. Proof of Lemma 1

From (17), we have

$\max_{x} p(x) = \max_{x} \frac{\nu_c}{Z} \prod_{S \in \mathcal{S}} p^{S}(x; \nu)^{\rho(S)} \leq \frac{\nu_c}{Z} \prod_{S \in \mathcal{S}} \left\{ \max_{x} p^{S}(x; \nu) \right\}^{\rho(S)}$ (30)

Since ν is tree-consistent with every $S \in \mathcal{S}$, we can easily find a MAP assignment $x^S$ such that $x^S \in \arg\max_x p^S(x; \nu)$ using dynamic programming⁵⁰. Then, due to the assumption that ν is tree-consistent with S and is in a normal form, we have the following properties:

$\nu_i(x_i^S) = 1, \quad \text{for all } i \in V(S),$ (31)

$\nu_{ij}(x_i^S, x_j^S) = 1, \quad \text{for all } (i,j) \in E(S).$ (32)

Therefore,

$\max_{x} p^{S}(x; \nu) = p^{S}(x^{S}; \nu) = \prod_{i \in V(S)} \nu_i(x_i^S) \prod_{(i,j) \in E(S)} \frac{\nu_{ij}(x_i^S, x_j^S)}{\nu_i(x_i^S)\, \nu_j(x_j^S)} = 1.$ (33)

Since (33) is true for every $S \in \mathcal{S}$ and $\sum_{S \in \mathcal{S}} \rho(S) = 1$, we obtain $\max_x p(x) \leq \nu_c / Z$ from (30).

Appendix D. Proof of Lemma 2

From (17), we have

$\begin{array}{l} \max_{\{x \mid x_\zeta = r\}} p(x) = \max_{\{x \mid x_\zeta = r\}} \frac{\nu_c}{Z} \prod_{S \in \mathcal{S}} p^{S}(x; \nu)^{\rho(S)} \leq \frac{\nu_c}{Z} \prod_{S \in \mathcal{S}} \left\{ \max_{\{x \mid x_\zeta = r\}} p^{S}(x; \nu) \right\}^{\rho(S)} \\ = \frac{\nu_c}{Z} \prod_{S \in \mathcal{S} : \zeta \in V(S)} \left\{ \max_{\{x \mid x_\zeta = r\}} p^{S}(x; \nu) \right\}^{\rho(S)} \prod_{S \in \mathcal{S} : \zeta \notin V(S)} \left\{ \max_{\{x \mid x_\zeta = r\}} p^{S}(x; \nu) \right\}^{\rho(S)} \end{array}$ (34)

By the definition of max-marginals and the assumption that ν is tree-consistent, for $S \in \mathcal{S}$ such that $\zeta \in V(S)$, we have

$\nu_\zeta(r) = \kappa_\zeta \max_{\{x \mid x_\zeta = r\}} p^{S}(x; \nu),$ (35)

for some constant $\kappa_\zeta$. We know there exists $r^* \in R_\zeta$ such that

$\nu_\zeta(r^*) = \max_{x_\zeta \in R_\zeta} \nu_\zeta(x_\zeta) = \kappa_\zeta \max_{\{x \mid x_\zeta = r^*\}} p^{S}(x; \nu).$ (36)

Then, since ν is in a normal form, $\nu_\zeta(r^*) = 1$. We know from (33) in the proof of Lemma 1 that $\max_{\{x \mid x_\zeta = r^*\}} p^S(x; \nu) = \max_x p^S(x; \nu) = 1$. Therefore, $\kappa_\zeta = 1$, and

$\max_{\{x \mid x_\zeta = r\}} p^{S}(x; \nu) = \nu_\zeta(r).$ (37)

On the other hand, for all $S \in \mathcal{S}$ such that $\zeta \notin V(S)$, we know $\max_{\{x \mid x_\zeta = r\}} p^S(x; \nu) = \max_x p^S(x; \nu) = 1$. Plugging the obtained values of $\max_{\{x \mid x_\zeta = r\}} p^S(x; \nu)$ and $\sum_{S \in \mathcal{S} : \zeta \in V(S)} \rho(S) = \rho_\zeta$ into (34), we obtain $\max_{\{x \mid x_\zeta = r\}} p(x) \leq (\nu_c / Z)\, \nu_\zeta(r)^{\rho_\zeta}$.

Appendix E. Proof of Lemma 3

If $\zeta, \eta \notin V(S)$, we know $\max_{\{x \mid x_\zeta = r,\, x_\eta = s\}} p^S(x; \nu) = \max_x p^S(x; \nu)$. Then, since ν is tree-consistent and in a normal form, $\max_{\{x \mid x_\zeta = r,\, x_\eta = s\}} p^S(x; \nu) = 1$ from (33).
If $\zeta \in V(S)$ and $\eta \notin V(S)$, we know $\max_{\{x \mid x_\zeta = r,\, x_\eta = s\}} p^S(x; \nu) = \max_{\{x \mid x_\zeta = r\}} p^S(x; \nu)$. Then, from (37), $\max_{\{x \mid x_\zeta = r,\, x_\eta = s\}} p^S(x; \nu) = \nu_\zeta(r)$.
If $(\zeta, \eta) \in E(S)$, by the definition of max-marginals and the assumption that ν is tree-consistent with every S, we have $\nu_{\zeta\eta}(r, s) = \kappa_{\zeta\eta} \max_{\{x \mid (x_\zeta, x_\eta) = (r, s)\}} p^S(x; \nu)$ for some constant $\kappa_{\zeta\eta}$. We also know there exists $(r^*, s^*) \in R_\zeta \times R_\eta$ such that $\nu_{\zeta\eta}(r^*, s^*) = \max_{(x_\zeta, x_\eta) \in R_\zeta \times R_\eta} \nu_{\zeta\eta}(x_\zeta, x_\eta) = \kappa_{\zeta\eta} \max_{\{x \mid (x_\zeta, x_\eta) = (r^*, s^*)\}} p^S(x; \nu)$. Then, since ν is in a normal form, i.e. $\nu_{\zeta\eta}(r^*, s^*) = 1$, and we have $\max_{\{x \mid (x_\zeta, x_\eta) = (r^*, s^*)\}} p^S(x; \nu) = \max_x p^S(x; \nu) = 1$ from (33), we obtain $\kappa_{\zeta\eta} = 1$. Therefore, $\max_{\{x \mid (x_\zeta, x_\eta) = (r, s)\}} p^S(x; \nu) = \nu_{\zeta\eta}(r, s)$.
If $\zeta, \eta \in V(S)$ and $(\zeta, \eta) \notin E(S)$, let ξ = γ(S). Then,
$\max_{\{x \mid (x_\zeta, x_\eta) = (r, s)\}} p^{S}(x; \nu) = \max_{\{x \mid (x_\zeta, x_\eta) = (r, s)\}} \nu_\xi(x_\xi)\, \nu_\zeta(r)\, \nu_\eta(s)\, \frac{\nu_{\xi\zeta}(x_\xi, r)}{\nu_\xi(x_\xi)\, \nu_\zeta(r)}$ (38)

$\times \frac{\nu_{\xi\eta}(x_\xi, s)}{\nu_\xi(x_\xi)\, \nu_\eta(s)} \prod_{j \in V(S) \setminus \{\zeta, \eta, \xi\}} \nu_j(x_j)\, \frac{\nu_{\xi j}(x_\xi, x_j)}{\nu_\xi(x_\xi)\, \nu_j(x_j)}$ (39)

$= \max_{\{x \mid (x_\zeta, x_\eta) = (r, s)\}} \frac{\nu_{\xi\zeta}(x_\xi, r)\, \nu_{\xi\eta}(x_\xi, s)}{\nu_\xi(x_\xi)} \prod_{j \in V(S) \setminus \{\zeta, \eta, \xi\}} \frac{\nu_{\xi j}(x_\xi, x_j)}{\nu_\xi(x_\xi)}$ (40)

$= \max_{x_\xi} \left\{ \frac{\nu_{\xi\zeta}(x_\xi, r)\, \nu_{\xi\eta}(x_\xi, s)}{\nu_\xi(x_\xi)} \prod_{j \in V(S) \setminus \{\zeta, \eta, \xi\}} \max_{x_j} \frac{\nu_{\xi j}(x_\xi, x_j)}{\nu_\xi(x_\xi)} \right\}.$ (41)

From the tree-consistency of ν, we have $\nu_\xi(x_\xi) = \beta_{\xi j} \max_{x_j} \nu_{\xi j}(x_\xi, x_j)$ for some $\beta_{\xi j} > 0$. We can also easily find $x^S$ such that $x^S \in \arg\max_x p^S(x; \nu)$ using the tree-consistency of ν. Then, since ν is in a normal form, we have $\nu_\xi(x_\xi^S) = 1$ and $\max_{x_j} \nu_{\xi j}(x_\xi^S, x_j) = \nu_{\xi j}(x_\xi^S, x_j^S) = 1$. Therefore, $\beta_{\xi j} = 1$, and $\nu_\xi(x_\xi) = \max_{x_j} \nu_{\xi j}(x_\xi, x_j)$. So the maximization over $x_j$ inside (41) is equal to 1 for all $j \in V(S) \setminus \{\zeta, \eta, \xi\}$. Therefore,

$\max_{\{x \mid (x_\zeta, x_\eta) = (r, s)\}} p^{S}(x; \nu) = \max_{x_\xi} \frac{\nu_{\xi\zeta}(x_\xi, r)\, \nu_{\xi\eta}(x_\xi, s)}{\nu_\xi(x_\xi)}.$
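The final expression is a single maximization over the states of the center vertex ξ, which a short sketch can make concrete. The function name, the list-based data layout, and the numbers are hypothetical; the sketch assumes that the pseudo-max-marginals are strictly positive and already in normal form.

```python
def nonadjacent_pair_max(nu_xi, nu_xi_zeta_r, nu_xi_eta_s):
    """Evaluate max over x_xi of nu_{xi,zeta}(x_xi, r) * nu_{xi,eta}(x_xi, s) / nu_xi(x_xi).

    nu_xi[k]        : nu_xi(k) for each state k of the center vertex
    nu_xi_zeta_r[k] : nu_{xi,zeta}(k, r) for the fixed rotamer r at zeta
    nu_xi_eta_s[k]  : nu_{xi,eta}(k, s) for the fixed rotamer s at eta
    """
    return max(a * b / c for a, b, c in zip(nu_xi_zeta_r, nu_xi_eta_s, nu_xi))

# Toy values with two states at the center vertex; each pairwise entry is no
# larger than the matching nu_xi entry, as tree-consistency requires.
nu_xi = [1.0, 0.5]
nu_xi_zeta_r = [0.8, 0.5]
nu_xi_eta_s = [0.4, 0.5]
print(nonadjacent_pair_max(nu_xi, nu_xi_zeta_r, nu_xi_eta_s))
```

Each such evaluation costs time proportional to the number of states at ξ, and it has to be repeated for every rotamer pair (r, s) of non-adjacent positions, which is consistent with the rotamer-pair bounds being the most expensive post-processing step in Table III.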


