Author manuscript; available in PMC: 2016 Jan 1.
Published in final edited form as: Methods. 2014 Dec 3;0:146–157. doi: 10.1016/j.ymeth.2014.11.015

Algorithm 1.

Our methodology for building an LBVS-specific benchmarking set. The All Ligand set is denoted SAL[], the ZINC set SZ[], the Final Ligand set SFL[], the Potential Decoy set SPD[], and the output All Final Decoy set SAFD[]. Index is the unique identifier of each compound, used for selection.

Step 1. Select Diverse Ligands (SAL[])
1:  TsimAL[] ⇐ Tanimoto coefficient {SAL, SAL} // get topological similarity between all ligand pairs, temporary ID for each ligand is i (target ligand set) and j (reference ligand set), i, j =1,2,… NAL
2:  TsimAL [] ⇐ set element TsimAL(i,j)=a (a<0) where i=j // set self-similarity to a negative value
3:  rval ⇐ 1 // set initial ID of current target ligand to 1
4:  while size rval ≠0 do
5:    Pair[]⇐ get (i,j) of TsimAL[] ≥ 0.75 //get IDs of two ligands whose similarity is over the threshold (0.75)
6:    rval ⇐ get i of the first element in Pair [] // get index of current target ligand based on Pair[]
7:    if Pair[] ≠ null do
8:      TsimAL[] ⇐ set TsimAL(i,j)=0 where i=rval or j=rval // set similarity of current target ligand to any reference ligand to 0
9:    end if
10: end while
11: Index_SL[] ⇐ get index of TsimAL(i,j) ≠ 0 where j=1 // get indexes of all ligands whose mutual similarity is within threshold
12: SFL[] ⇐ SAL[Index_SL[]] // the final ligand set with selected diverse ligands
13: return SFL[]
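Step 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: fingerprints are assumed to be plain sets of on-bit indices, and the names tanimoto and select_diverse_ligands are hypothetical. Instead of reading the surviving ligands off one column of the zeroed matrix, the sketch records removed ligands in an explicit set, which is a simplification of line 11.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def select_diverse_ligands(fingerprints, threshold=0.75):
    """Return indexes of ligands kept after removing near-duplicates."""
    n = len(fingerprints)
    # Pairwise similarity matrix; self-similarity is set to a negative value
    # so that a ligand is never paired with itself.
    sim = [[-1.0 if i == j else tanimoto(fingerprints[i], fingerprints[j])
            for j in range(n)] for i in range(n)]
    removed = set()  # assumption: explicit bookkeeping instead of column j=1
    while True:
        # First (i, j) pair, in row-major order, at or above the threshold.
        pair = next(((i, j) for i in range(n) for j in range(n)
                     if sim[i][j] >= threshold), None)
        if pair is None:
            break
        rval = pair[0]
        # Zero the row and column of the current target ligand.
        for k in range(n):
            sim[rval][k] = 0.0
            sim[k][rval] = 0.0
        removed.add(rval)
    return [i for i in range(n) if i not in removed]


# Toy fingerprints (made-up data): ligands 0/1 and 2/3 are near-duplicates.
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 12, 13}, {20, 21}]
kept = select_diverse_ligands(toy_fps)
```

With the toy data, one member of each near-duplicate pair is dropped and the unrelated fifth ligand is kept.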
Step 2. Select Potential Decoys (SFL[], Sz[])
1:  PFL[] ⇐ physicochemical properties {SFL} // calculate physicochemical properties of Final Ligands
2:  PZ[] ⇐ physicochemical properties {SZ} // calculate physicochemical properties of ZINC compounds
3:  PFL_R[]⇐ get ranges of each property based on PFL[] // get the maximum and minimum values for each property
4:  Index_Z[] ⇐ get index of PZ[] within PFL_R[] // get the indexes of compounds within all physicochemical properties threshold
5:  SZS[] ⇐ SZ[Index_Z[]] // ZINC Subset with selected compounds
6:  TsimFL[] ⇐ Tanimoto coefficient {SFL, SFL} // calculate mutual topological similarity within Final Ligands
7:  TsimFL_min⇐ get min of TsimFL[] // get the minimum of all values based on TsimFL[]
8:  TsimFL,ZS[] ⇐ Tanimoto coefficient {SFL, SZS} // calculate mutual topological similarity between Final Ligands and ZINC Subset compounds
9:  TsimFL,ZS_maxmin[] ⇐ get all mins and maxs of TsimFL,ZS[] // get min and max similarity values to all Final Ligands for each ZINC Subset compound
10: Index_ZS[] ⇐ get index of TsimFL_min ≤ TsimFL,ZS_maxmin[] < 0.75 // get indexes of compounds within topological similarity threshold (max < 0.75 & min ≥ TsimFL_min)
11: SPD[] ⇐ SZS[Index_ZS[]] // Potential Decoys with selected compounds from ZINC Subset
12: return SPD[]
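The two filters of Step 2 can be sketched as below. This is an illustrative reading of the pseudocode under stated assumptions: properties are given as equal-length tuples of numbers, fingerprints are sets of on-bit indices, and the function names are hypothetical. When fewer than two ligands are supplied, the sketch falls back to a minimum ligand-ligand similarity of 0.0, which the listing does not specify.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def select_potential_decoys(ligand_props, ligand_fps, zinc_props, zinc_fps,
                            threshold=0.75):
    """Return indexes of ZINC compounds that pass both Step 2 filters."""
    n_props = len(ligand_props[0])
    # Lines 3-5: keep compounds whose every property lies in the ligand range.
    lo = [min(p[d] for p in ligand_props) for d in range(n_props)]
    hi = [max(p[d] for p in ligand_props) for d in range(n_props)]
    in_range = [z for z, p in enumerate(zinc_props)
                if all(lo[d] <= p[d] <= hi[d] for d in range(n_props))]
    # Lines 6-7: minimum similarity among the final ligands themselves.
    # Assumption: default of 0.0 when there is only one ligand.
    tsim_fl_min = min((tanimoto(a, b) for a, b in combinations(ligand_fps, 2)),
                      default=0.0)
    # Lines 8-11: max similarity to any ligand < threshold and
    # min similarity to any ligand >= tsim_fl_min.
    selected = []
    for z in in_range:
        sims = [tanimoto(zinc_fps[z], fp) for fp in ligand_fps]
        if tsim_fl_min <= min(sims) and max(sims) < threshold:
            selected.append(z)
    return selected


# Made-up toy data: two properties per compound.
lig_props = [(300.0, 2.0), (400.0, 4.0)]
lig_fps = [{1, 2, 3, 4}, {3, 4, 5, 6}]
zinc_props = [(350.0, 3.0), (500.0, 3.0), (350.0, 3.0), (320.0, 2.5)]
zinc_fps = [{1, 2, 3, 4, 5}, {2, 3, 4, 5, 9}, {2, 3, 4, 5, 9}, {7, 8, 9}]
potential = select_potential_decoys(lig_props, lig_fps, zinc_props, zinc_fps)
```

In the toy data, compound 0 is too similar to a ligand, compound 1 falls outside the property range, and compound 3 is less similar to the ligands than the ligands are to each other, so only compound 2 remains.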
Step 3. Select Final Decoys (SFL[], SPD[])
1:  PPD[] ⇐ physicochemical properties {SPD} // calculate physicochemical properties of Potential Decoys
2:  PFL_norm[] ⇐ normalize PFL[] // normalize the values into [0,1] based on PFL_R[]
3:  PPD_norm[] ⇐ normalize PPD[] // normalize the values into [0,1] based on PFL_R[]
4:  NFL ⇐ size SFL[] // count the number of Final Ligands
5:  IndexFL[]⇐get index of SFL[] // get indexes of Final Ligands
6:  IndexPD[]⇐get index of SPD[] // get indexes of Potential Decoys
7:  SAFD[] ⇐ null // define a dataset of All Final Decoys
8:  for i ⇐ 1 to NFL do // i is temporary ID for each Final Ligand
9:    IndexAFD[] ⇐ get index of SAFD[] // get indexes of already-made decoys in All Final Decoy set
10:  SPDnew[] ⇐ SPD[IndexPD ≠ IndexAFD] // select potential decoys that are not in All Final Decoy set to form a new Potential Decoy set
11:  IndexCURR[]⇐get index of ligand i // get index of the current ligand
12:  SCURR[] ⇐ SFL[IndexFL[] = IndexCURR[]] // ligand set containing current selected ligand
13:  PCURR_norm[] ⇐ get normalized properties of SCURR // get the normalized properties of the current ligand from PFL_norm[]
14:  PsimPDnew,CURR[] ⇐ property similarity {SPDnew [], SCURR []} // calculate property similarity between potential decoys and current ligand based on PCURR_norm[] and PPDnew_norm[]
15:  for k ⇐ 1 to 10 do // k controls different thresholds
16:      Thr(k) = 1−0.05×(k−1) // set the Psim threshold based on k
17:      Num_thr(k) ⇐ count number of PsimPDnew,CURR[] ≥ Thr(k) // count the number of potential decoys whose Psim is at or above each threshold
18:    end for
19:    cutoff_k ⇐ get k of Num_thr(k) ≤ r & Num_thr(k+1) > r // get the k at which the number of potential decoys within threshold first exceeds r at the next threshold
20:    cutoff_max ⇐ Thr(cutoff_k) // get the upper threshold of Psim
21:    cutoff_min ⇐ Thr(cutoff_k+1) // get the lower threshold of Psim
22:    IndexPD1[] ⇐ get index of PsimPDnew,CURR[] ≥ cutoff_max // get index of the 1st pool of selected potential decoys within threshold
23:    SFD1[] ⇐ SPD[IndexPD1[]] // Final Decoy set 1
24:    IndexPD2[] ⇐ get index of cutoff_max > PsimPDnew,CURR[] ≥ cutoff_min // get index of the 2nd pool of potential decoys within threshold
25:    SFD2_mid ⇐ SPD[IndexPD2[]] // Final Decoy set 2 intermediate
26:    SFL_other ⇐ SFL[IndexFL[] ≠ IndexCURR[]] // the set of other ligands except for the current one
27:    TsimFL_other, CURR []⇐Tanimoto_coefficient {SFL_other, SCURR} // calculate mutual topological similarity between other final ligands and current final ligand
28:    TsimFL_other, FD2_mid [] ⇐ Tanimoto coefficient {SFL_other, SFD2_mid} // calculate mutual topological similarity between other final ligands and each compound in Final decoys 2 intermediate
29:    ΔTsimFD2_mid, CURR[] ⇐ get values of |TsimFL_other, FD2_mid[] − TsimFL_other, CURR[]| // Tsim difference
30:    SΔTsimFD2_mid, CURR[]⇐sort ΔTsimFD2_mid, CURR[] // sort the topological difference from low to high
31:    IndexFD2[]⇐ get indexes of the top (r-size SFD1) values of SΔTsimFD2_mid, CURR[]// get indexes of the top (r-size SFD1) decoys from Final Decoy set 2 intermediate
32:    SFD2[] ⇐ SPD[IndexFD2[]] // Final Decoy set 2
33:    SFD[] = SFD1 ∪ SFD2 // Final Decoy set including Final Decoy set 1 and Final Decoy set 2
34:    SAFD[] = SAFD[] ∪ SFD[] // All Final Decoy set containing separate final decoys for each ligand
35: end for
36: return SAFD[]
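The threshold ladder and two-pool selection inside the Step 3 loop (lines 15 to 33) can be sketched for a single ligand as follows. Several things here are assumptions rather than part of the listing: the property similarity Psim and a single scalar ΔTsim per candidate are taken as precomputed inputs, because the listing does not define how either is computed or aggregated; the function name and the used argument are hypothetical; and raising an error when no threshold pair brackets r is a choice made for this sketch.

```python
def pick_decoys_for_ligand(psim, delta_tsim, r, used=frozenset()):
    """Pick up to r decoys for one ligand from precomputed scores.

    psim[i]       -- property similarity of candidate i to the current ligand
    delta_tsim[i] -- assumed scalar topological-similarity difference
    used          -- candidates already assigned to earlier ligands
    """
    # Line 10: skip candidates that are already in the All Final Decoy set.
    avail = [i for i in range(len(psim)) if i not in used]
    # Lines 15-18: thresholds 1.00, 0.95, ..., 0.55 and counts at each one.
    thresholds = [1 - 0.05 * (k - 1) for k in range(1, 11)]
    counts = [sum(1 for i in avail if psim[i] >= t) for t in thresholds]
    # Line 19: first k with Num_thr(k) <= r < Num_thr(k+1).
    cutoff = next((k for k in range(len(thresholds) - 1)
                   if counts[k] <= r < counts[k + 1]), None)
    if cutoff is None:
        # Assumption: the listing does not say what happens in this case.
        raise ValueError("no threshold pair brackets r")
    cutoff_max, cutoff_min = thresholds[cutoff], thresholds[cutoff + 1]
    # Lines 22-23: first pool, everything at or above the upper threshold.
    pool1 = [i for i in avail if psim[i] >= cutoff_max]
    # Lines 24-31: second pool, ranked by smallest topological difference.
    pool2 = sorted((i for i in avail if cutoff_min <= psim[i] < cutoff_max),
                   key=lambda i: delta_tsim[i])
    # Lines 32-33: fill the remaining r - len(pool1) slots from pool 2.
    return pool1 + pool2[: r - len(pool1)]


# Made-up scores for six candidate decoys.
toy_psim = [0.99, 0.97, 0.93, 0.92, 0.91, 0.80]
toy_delta = [0.5, 0.5, 0.30, 0.10, 0.20, 0.0]
first = pick_decoys_for_ligand(toy_psim, toy_delta, r=3)
second = pick_decoys_for_ligand(toy_psim, toy_delta, r=3, used={0})
```

Calling the function once per ligand and passing the union of earlier results as used reproduces the outer loop's rule that a decoy is assigned to at most one ligand.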