2011 Aug 12;135(6):065102. doi: 10.1063/1.3615722

Algorithm for learning to dock

1: Input: Set of correctly docked conformations $X_{ij}$ ($i$ is the complex index and $j$ is the index of the protein chain; there are $n$ complexes and $2n$ chains in total), their sequences, and their transformations $\tau_i$: $((X_{11}, X_{12}), \tau_1), \ldots, ((X_{n1}, X_{n2}), \tau_n)$; $C$, the weight of the slack variable penalty; the tolerated approximation error υ; and the size of a region in output space ε.
2: Start the search by calculating an initial set of potential parameters. For all $n$ complexes with known empirical structures, generate a set of incorrect transformations $\Gamma_i^{(0)}$, $i = 1, \ldots, n$ ($i$ is the index of the complex). Any set of decoys can be used to bootstrap the algorithm. In the present study we used PatchDock (Ref. 41).
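3: Calculate the set of constraints $S_i\ \forall i$ and a set of clusters of transformations $G_k\ \forall k$ ($k$ is the index of the cluster):
$S_i = \{\forall G_k \in \Gamma_i^{(0)},\ \forall \tau_i^{(j)} \in G_k :\ \sum_\alpha p_\alpha (f_\alpha(\tau_i) - f_\alpha(\tau_i^{(j)})) \ge 1 - \eta_{ik} / \Delta(\tau_i, \tau_i^{(j)})\}$,
where $\Delta(\tau_i, \tau_i^{(j)}) = \mathrm{irmsd}((X_{i1}, X_{i2})(\tau_i), (X_{i1}, X_{i2})(\tau_i^{(j)}))$ and $\tau_i^{(j)}$ is an element of the cluster of transformations $G_k$.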
4: Solve the quadratic programming problem $(P, \eta) = \arg\min_{P,\eta} \tfrac{1}{2} P^t P + \frac{C}{n} \sum_{i,k} \eta_{ik}$ subject to the constraints $\bigcup_{i=1}^{n} S_i$ and $\forall i, k:\ \eta_{ik} \ge 0$.
5: Start the main iteration cycle and initialize the iteration counter: ς = 0
6: Repeat: ς = ς + 1
7: for i = 1, …, n do /* loop over all complexes */
8.1: Find the most violated transformations, $T_i^{(\varsigma)}$, the energies of the violating decoys, $E$, and their similarity to the native, Δ. The input is the coordinates of the two chains $X_{i1}$ and $X_{i2}$ of complex $i$, the set of transformations $\tau_i$ to model complex $i$, the set of parameters $P$, the tolerated energy error υ, the geometrical size of a ball ε that determines the boundary of a cluster, and the number of complex structures to retain, Λ. We also provide the set of clusters added in previous iteration cycles, $\bigcup_{\beta=0}^{\varsigma-1} \bigcup_k \Gamma_{ik}^{(\beta)}$, so that clusters are added only if not already present.
find_top_violations:
• Input: receptor $X_R$, ligand $X_L$, native transformation $\tau_{nat}$, scoring function parameters $w$, tolerated error υ, existing clusters $T$, cluster size ε, and minimum number of solutions to retain Λ
find the radius of each protein and determine the density of rotational sampling and the grid spacing to be used such that the error can be bounded by υ
compute the score of the native transformation, $E_{\tau_{nat}}$
compute grids $R_{vdw}$ and $R_j$, $\forall j \in \{1, \ldots, 22\}$, on the receptor protein and their inverse Fourier transforms
$(\Gamma, E, \Delta) \leftarrow$ Ø /* set of high scoring transformations, their energies and distances from native */
$V_{sorted} = [0, 0, \ldots, 0]$ (sorted array of the extents of the top Λ violations)
• for $u_\alpha \in U$ (the space of rotations) do
  compute scores $E_\alpha^{Grid}$ for all transformations ($\Gamma_\alpha$) involving the current rotation
  ○ for $\tau \in \Gamma_\alpha$ with $E_{\tau_{nat}} - E_\tau^{Grid} \le 1 + \upsilon$ do
    compute $\Delta(\tau, \tau_{nat}) = \mathrm{irmsd}((X_R, X_L)(\tau), (X_R, X_L)_{nat})$
    ▪ if $(1 - (E_{\tau_{nat}} - E_\tau^{Grid})) > V_{sorted}[\Lambda] / \Delta(\tau, \tau_{nat}) - \upsilon$, then
      compute the exact score $E_\tau$
      ▪ if $(1 - (E_{\tau_{nat}} - E_\tau)) > V_{sorted}[\Lambda] / \Delta(\tau, \tau_{nat})$ and $E_{\tau_{nat}} - E_\tau \le 1$, then
        compute $V_\tau = \Delta(\tau, \tau_{nat}) (1 - (E_{\tau_{nat}} - E_\tau))$, update $V_{sorted}$
      ▪ fi
    ▪ fi
  ○ end for
• end for
incrementally cluster the retained transformations and add/update clusters; let $T_{out}$ be the final set of clusters
• Output: $(T_{out}, E, \Delta)$
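More compactly: $(T_i^{(\varsigma)}, E, \Delta) = \mathrm{find\_top\_violations}(X_{i1}, X_{i2}, \tau_i, P, \upsilon, \varepsilon, \Lambda, \bigcup_{\beta=0}^{\varsigma-1} \bigcup_k \Gamma_{ik}^{(\beta)})$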
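The selection logic inside find_top_violations can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the grid scores and distances Δ have already been computed and are passed in as `candidates`, that `exact_score` is a hypothetical callback returning the exact score of a transformation, and that a min-heap stands in for the sorted array of the top Λ violation extents. Rotational sampling, the Fourier-transform grids, and the final clustering are omitted.

```python
import heapq

def find_top_violations(candidates, e_nat, exact_score, upsilon, lam):
    """Keep the lam largest violations V = delta * (1 - (e_nat - e_exact)).

    candidates  -- iterable of (tau, e_grid, delta): a transformation label,
                   its approximate grid score, and its distance from the native
    exact_score -- hypothetical callback: tau -> exact score (assumed costly)
    """
    if lam < 1:
        raise ValueError("lam must be at least 1")
    heap = []  # min-heap; heap[0] holds the weakest retained violation
    for idx, (tau, e_grid, delta) in enumerate(candidates):
        # Grid pre-filter: the approximate margin already exceeds 1 + upsilon.
        if e_nat - e_grid > 1.0 + upsilon:
            continue
        # Until lam violations are stored, the threshold stays at zero.
        threshold = heap[0][0] if len(heap) == lam else 0.0
        # Optimistic test: the grid score may be off by at most upsilon.
        if delta * (1.0 - (e_nat - e_grid) + upsilon) <= threshold:
            continue
        e_exact = exact_score(tau)
        margin = e_nat - e_exact
        violation = delta * (1.0 - margin)
        if margin <= 1.0 and violation > threshold:
            entry = (violation, idx, tau, e_exact, delta)  # idx breaks ties
            if len(heap) < lam:
                heapq.heappush(heap, entry)
            else:
                heapq.heapreplace(heap, entry)
    # Strongest violation first: (tau, exact score, delta, violation).
    return [(tau, e, d, v) for v, _, tau, e, d in sorted(heap, reverse=True)]

# Toy data (made up for illustration): the grid score equals the exact score.
exact = {"a": 4.5, "b": 3.0, "c": 4.9, "d": 4.2}
grid = [("a", 4.5, 2.0), ("b", 3.0, 5.0), ("c", 4.9, 1.0), ("d", 4.2, 10.0)]
top = find_top_violations(grid, 5.0, exact.__getitem__, upsilon=0.1, lam=2)
```

The two-stage test mirrors the pseudocode: the cheap grid score is used to discard transformations that cannot enter the top Λ even after allowing for an error of υ, so the exact score is computed only for the remaining candidates.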
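8.2: Create new clusters if regions of top-scoring transformations have not been seen so far according to the current set of parameters.
 /* add violated constraints to the working set */
for $G_k \in T_i^{(\varsigma)}$ do
  for $\tau_i^{(j)} \in G_k$ do
    if $\Delta(\tau_i^{(j)}, \tau_i) (1 - \sum_\alpha p_\alpha [f_\alpha(\tau_i) - f_\alpha(\tau_i^{(j)})]) > 0$ /* have a violation */
      if cluster $ik$ exists from a previous iteration or was added in this loop (say it was $\Gamma_{il}^{(\iota)}$) and $\Delta(\tau_i^{(j)}, \tau_i) (1 - \sum_\alpha p_\alpha [f_\alpha(\tau_i) - f_\alpha(\tau_i^{(j)})]) > \eta_{il} + \upsilon$, then $\Gamma_{il}^{(\iota)} = \Gamma_{il}^{(\iota)} \cup \{\tau_i^{(j)}\}$ fi
      if new cluster $ik$, then $\Gamma_{ik}^{(\varsigma)} = \{\tau_i^{(j)}\}$ fi
    fi
  end for
end for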
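9: Set constraints
$S_i = \bigcup_{\beta=0}^{\varsigma} \{\forall G_k \in \Gamma_i^{(\beta)},\ \forall \tau_i^{(j)} \in G_k :\ \sum_\alpha p_\alpha [f_\alpha(\tau_i) - f_\alpha(\tau_i^{(j)})] \ge 1 - \eta_{ik} / \Delta(\tau_i, \tau_i^{(j)})\}$ and solve the quadratic programming problem: $(P, \eta) = \arg\min_{P,\eta} \tfrac{1}{2} P^t P + \frac{C}{n} \sum_{i,k} \eta_{ik}$ subject to the constraints $\bigcup_{i=1}^{n} S_i$ and $\forall i, k:\ \eta_{ik} \ge 0$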
10: end for
11: until no new constraints are found during an iteration ($\bigcup_{i,k} \Gamma_{ik}^{(\varsigma)} =$ Ø)
12: Output: P
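The working-set update of step 8.2 and the stopping test of step 11 can be sketched in Python for a single complex. This is an illustration under stated assumptions, not the published code: clusters are stored in plain dictionaries keyed by a hypothetical cluster label, each decoy is a pair (feature vector, Δ), and a cluster created in the current cycle is given a slack of 0 until the next quadratic program is solved. The names `violation_extent` and `update_working_set` are introduced here only for illustration.

```python
def violation_extent(p, f_native, f_decoy, delta):
    """Delta * (1 - sum_a p_a * (f_a(native) - f_a(decoy)))."""
    margin = sum(pa * (fn - fd) for pa, fn, fd in zip(p, f_native, f_decoy))
    return delta * (1.0 - margin)

def update_working_set(working_set, slacks, found, p, f_native, upsilon):
    """Add violated decoys to the working set; return how many were added.

    working_set -- dict: cluster label -> list of (features, delta) kept so far
    slacks      -- dict: cluster label -> current slack eta of that cluster
    found       -- dict: cluster label -> list of (features, delta) from step 8.1
    """
    added = 0
    for label, decoys in found.items():
        for features, delta in decoys:
            v = violation_extent(p, f_native, features, delta)
            if v <= 0.0:
                continue  # margin satisfied, no violation
            if label in working_set:
                # Known cluster: add only if the violation exceeds the
                # cluster's slack by more than the tolerance upsilon.
                if v > slacks.get(label, 0.0) + upsilon:
                    working_set[label].append((features, delta))
                    added += 1
            else:
                # New cluster: start it with this decoy.
                working_set[label] = [(features, delta)]
                slacks[label] = 0.0  # assumption: slack unknown until next QP
                added += 1
    return added

# Toy data (made up for illustration).
p = [1.0, 0.5]
f_native = [2.0, 1.0]
working_set = {"k1": [([1.0, 1.0], 2.0)]}
slacks = {"k1": 2.5}
found = {
    "k1": [([1.5, 1.0], 4.0)],                    # violation 2.0 <= 2.5 + 0.1
    "k2": [([1.5, 1.0], 4.0), ([0.0, 0.0], 3.0)]  # first violates, second not
}
n_added = update_working_set(working_set, slacks, found, p, f_native, 0.1)
```

In the outer loop, the quadratic program would be solved again after each update, and the iteration would stop once a full pass over all complexes returns zero added constraints.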