Step 1. Select Diverse Ligands (SAL[]) |
1: TsimAL[] ⇐ Tanimoto coefficient {SAL, SAL} // get topological similarity between all ligand pairs; the temporary ID of each ligand is i (target ligand set) or j (reference ligand set), i, j = 1, 2, …, NAL |
2: TsimAL [] ⇐ set element TsimAL(i,j)=a (a<0) where i=j // set self-similarity to a negative value |
3: rval ⇐ 1 // set initial ID of current target ligand to 1 |
4: while size rval ≠ 0 do |
5: Pair[]⇐ get (i,j) of TsimAL[] ≥ 0.75 //get IDs of two ligands whose similarity is over the threshold (0.75) |
6: rval ⇐ get i of the first element in Pair [] // get index of current target ligand based on Pair[] |
7: if Pair[] ≠ null do |
8: TsimAL[]⇐set TsimAL(i,j)=0 where i=rval or j=rval // set similarity of current target ligand to any reference ligand to 0 |
9: end if |
10: end while |
11: Index_SL[] ⇐ get index of TsimAL(i,j) ≠ 0 where j=1 // get indexes of all retained ligands, whose mutual similarity is below the threshold |
12: SFL[]⇐ SAL[Index_SL[]] // the final ligand set with selected diverse ligands |
13: return SFL[] |
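Step 1 can be read as a greedy filter, and the following is a minimal Python sketch of that reading rather than the authors' implementation. It assumes fingerprints are given as sets of on-bit positions, scans pairs in row-major order (the listing does not state the order), and tracks dropped ligands in a set instead of zeroing matrix entries. The names `tanimoto` and `select_diverse` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_diverse(fps, threshold=0.75):
    """Return indexes of ligands kept after greedily dropping similar ones."""
    n = len(fps)
    removed = set()  # assumption: replaces zeroing rows/columns of TsimAL
    while True:
        # First remaining pair (i, j), i != j, at or above the threshold.
        pair = next(
            ((i, j)
             for i in range(n) if i not in removed
             for j in range(n) if j != i and j not in removed
             and tanimoto(fps[i], fps[j]) >= threshold),
            None,
        )
        if pair is None:  # no pair left above the threshold
            break
        removed.add(pair[0])  # drop the current target ligand (rval)
    return [i for i in range(n) if i not in removed]


# Toy fingerprints: ligands 0, 1 and 3 are near-duplicates of each other.
fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {10, 11}, {1, 2, 3, 4}]
kept = select_diverse(fps)
```

With these toy fingerprints, ligands 0 and 1 are dropped in turn and `kept` is `[2, 3]`. Which member of a similar cluster survives depends on the scan order, so a different pair ordering would give a different but equally diverse set.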
Step 2. Select Potential Decoys (SFL[], Sz[]) |
1: PFL[] ⇐ physicochemical properties {SFL} // calculate physicochemical properties of Final Ligands |
2: PZ[] ⇐ physicochemical properties {Sz} // calculate physicochemical properties of ZINC compounds |
3: PFL_R[]⇐ get ranges of each property based on PFL[] // get the maximum and minimum values for each property |
4: Index_Z[] ⇐ get index of PZ[] within PFL_R[] // get the indexes of compounds within all physicochemical properties threshold |
5: SZS[]⇐Sz[Index_Z[]] // ZINC Subset with selected compounds |
6: TsimFL[] ⇐ Tanimoto coefficient {SFL, SFL} // calculate mutual topological similarity within Final Ligands |
7: TsimFL_min⇐ get min of TsimFL[] // get the minimum of all values based on TsimFL[] |
8: TsimFL,ZS[] ⇐ Tanimoto coefficient {SFL, SZS} // calculate mutual topological similarity between Final Ligands and ZINC Subset compounds |
9: TsimFL,ZS_maxmin[]⇐get all mins and maxs of TsimFL,ZS[] // get min and max similarity values to all Final Ligands for each ZINC Subset compound |
10: Index_ZS[]⇐ get index of TsimFL_min ≤ TsimFL,ZS_maxmin[]< 0.75 // get indexes of compounds within topological similarity threshold (max < 0.75 & min ≥ TsimFL_min) |
11: SPD[]⇐SZS[Index_ZS[]] // Potential Decoys with selected compounds from ZINC Subset |
12: return SPD[] |
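The two filters in Step 2 (the property-range box and the topological similarity window) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: fingerprints are sets of on-bits, properties are plain numeric tuples, TsimFL_min is taken over distinct ligand pairs only, and at least two ligands are supplied. All function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def property_ranges(ligand_props):
    """Per-property (min, max) over the final ligands (PFL_R in the listing)."""
    return [(min(col), max(col)) for col in zip(*ligand_props)]


def within_ranges(props, ranges):
    """True when every property lies inside its ligand-derived range."""
    return all(lo <= p <= hi for p, (lo, hi) in zip(props, ranges))


def select_potential_decoys(ligand_fps, ligand_props,
                            zinc_fps, zinc_props, upper=0.75):
    """Return indexes of database compounds passing both Step 2 filters."""
    ranges = property_ranges(ligand_props)
    n = len(ligand_fps)
    # Lowest similarity among distinct ligand pairs (TsimFL_min).
    lower = min(tanimoto(ligand_fps[i], ligand_fps[j])
                for i in range(n) for j in range(i + 1, n))
    selected = []
    for idx, (fp, props) in enumerate(zip(zinc_fps, zinc_props)):
        if not within_ranges(props, ranges):
            continue  # outside the physicochemical property box
        sims = [tanimoto(fp, lig) for lig in ligand_fps]
        if lower <= min(sims) and max(sims) < upper:
            selected.append(idx)
    return selected


# Toy data: two ligands and four candidates with two properties each.
ligand_fps = [{1, 2, 3, 4}, {3, 4, 5, 6}]
ligand_props = [(300.0, 2.0), (400.0, 4.0)]
zinc_fps = [{1, 2, 3, 4, 5, 6}, {1, 2, 3, 4}, {3, 4, 5, 6, 7}, {20, 21}]
zinc_props = [(350.0, 3.0), (350.0, 3.0), (500.0, 3.0), (350.0, 3.0)]
potential = select_potential_decoys(ligand_fps, ligand_props,
                                    zinc_fps, zinc_props)
```

Here only candidate 0 survives: candidate 1 is identical to a ligand, candidate 2 falls outside the property range, and candidate 3 is less similar to the ligands than the ligands are to each other.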
Step 3. Select Final Decoys (SFL[], SPD[]) |
1: PPD[] ⇐ physicochemical properties {SPD} // calculate physicochemical properties of Potential Decoys |
2: PFL_norm[] ⇐ normalize PFL[] // normalize the values into [0,1] based on PFL_R[] |
3: PPD_norm[] ⇐ normalize PPD[] // normalize the values into [0,1] based on PFL_R[] |
4: NFL ⇐ size SFL[] // count the number of Final Ligands |
5: IndexFL[]⇐get index of SFL[] // get indexes of Final Ligands |
6: IndexPD[]⇐get index of SPD[] // get indexes of Potential Decoys |
7: SAFD[]⇐null // define a dataset of All Final Decoys |
8: for i ⇐ 1 to NFL do // i is the temporary ID of each Final Ligand |
9: IndexAFD[] ⇐ get index of SAFD[] // get indexes of already-made decoys in All Final Decoy set |
10: SPDnew[]⇐SPD [IndexPD ≠ IndexAFD] // select potential decoys that are not in All Final Decoy set to form a new Potential Decoy set |
11: IndexCURR[]⇐get index of ligand i // get index of the current ligand |
12: SCURR[] ⇐SFL [IndexFL[]= IndexCURR[]] // ligand set containing current selected ligand |
13: PCURR_norm[] ⇐ get normalized properties of SCURR // get the normalized properties of the current ligand from PFL_norm[] |
14: PsimPDnew,CURR[] ⇐ property similarity {SPDnew [], SCURR []} // calculate property similarity between potential decoys and current ligand based on PCURR_norm[] and PPDnew_norm[] |
15: for k ⇐ 1 to 10 do // k controls different thresholds |
16: Thr(k) = 1−0.05×(k−1) // set the Psim threshold based on k |
17: Num_thr(k) ⇐ count number of PsimPDnew,CURR[] ≥ Thr(k) // count the potential decoys whose Psim is at or above each threshold |
18: end for |
19: cutoff_k ⇐ get k of Num_thr(k) ≤ r & Num_thr(k+1) > r // get the k at which at most r potential decoys pass Thr(k) but more than r pass Thr(k+1) |
20: cutoff_max ⇐ Thr(cutoff_k) // get the upper threshold of Psim |
21: cutoff_min ⇐ Thr(cutoff_k+1) // get the lower threshold of Psim |
22: IndexPD1[] ⇐ get index of PsimPDnew,CURR[] ≥ cutoff_max // get index of the 1st pool of selected potential decoys within threshold |
23: SFD1[] ⇐SPD[IndexPD1[]] // Final Decoy set 1 |
24: IndexPD2[]⇐ get index of cutoff_max ≥ PsimPDnew,CURR[] ≥ cutoff_min // get index of the 2nd pool of potential decoys within threshold |
25: SFD2_mid ⇐SPD[IndexPD2[]] // Final Decoy set 2 intermediate |
26: SFL_other ⇐ SFL[IndexFL[]≠ IndexCURR[]] // the set of other ligands except for the current one |
27: TsimFL_other, CURR []⇐Tanimoto_coefficient {SFL_other, SCURR} // calculate mutual topological similarity between other final ligands and current final ligand |
28: TsimFL_other, FD2_mid [] ⇐ Tanimoto coefficient {SFL_other, SFD2_mid} // calculate mutual topological similarity between other final ligands and each compound in Final decoys 2 intermediate |
29: ΔTsimFD2_mid, CURR[]⇐get values of | TsimFL_other, FD2_mid []−TsimFL_other, CURR []| // Tsim difference |
30: SΔTsimFD2_mid, CURR[]⇐sort ΔTsimFD2_mid, CURR[] // sort the topological difference from low to high |
31: IndexFD2[]⇐ get indexes of the top (r-size SFD1) values of SΔTsimFD2_mid, CURR[]// get indexes of the top (r-size SFD1) decoys from Final Decoy set 2 intermediate |
32: SFD2[] ⇐SPD[IndexFD2[]] // Final Decoy set 2 |
33: SFD[]= SFD1 ∪ SFD2 // Final Decoy set including Final Decoy set 1 and Final Decoy set 2 |
34: SAFD[]=SAFD[] ∪ SFD[] // All Final Decoy set containing separate final decoys for each ligand |
35: end for |
36: return SAFD[] |
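The threshold ladder and two-pool selection inside the Step 3 loop (lines 15 to 33 of the listing) can be illustrated for a single ligand as below. This is a hedged Python sketch, not the authors' implementation. It assumes the Psim values are already computed, that the Tsim difference has been reduced to one number per candidate (the listing does not say how), that the second pool excludes values equal to cutoff_max so the pools stay disjoint, and that no decoys are returned when no k satisfies the cutoff condition. The names `psim_cutoffs` and `pick_decoys` are hypothetical.

```python
def psim_cutoffs(psim, r, steps=10):
    """Return (cutoff_max, cutoff_min), or None when no k qualifies."""
    # Thr(k) = 1 - 0.05 * (k - 1); rounded to avoid floating-point drift.
    thr = [round(1 - 0.05 * (k - 1), 2) for k in range(1, steps + 1)]
    # Num_thr(k): candidates whose Psim is at or above each threshold.
    counts = [sum(1 for p in psim if p >= t) for t in thr]
    for k in range(steps - 1):
        if counts[k] <= r < counts[k + 1]:
            return thr[k], thr[k + 1]
    return None


def pick_decoys(psim, delta_tsim, r):
    """Pick up to r candidate indexes for one ligand."""
    cutoffs = psim_cutoffs(psim, r)
    if cutoffs is None:
        return []  # assumption: the listing does not cover this case
    cutoff_max, cutoff_min = cutoffs
    # Pool 1: everything at or above the upper threshold is accepted.
    pool1 = [i for i, p in enumerate(psim) if p >= cutoff_max]
    # Pool 2: the band below it, ranked by Tsim difference (low to high).
    pool2 = [i for i, p in enumerate(psim) if cutoff_min <= p < cutoff_max]
    pool2.sort(key=lambda i: delta_tsim[i])
    return pool1 + pool2[: r - len(pool1)]


# Toy values for six candidates and r = 3 decoys for the current ligand.
psim = [0.97, 0.91, 0.92, 0.88, 0.60, 0.96]
delta_tsim = [0.1, 0.4, 0.2, 0.3, 0.5, 0.1]
chosen = pick_decoys(psim, delta_tsim, r=3)
```

For these values the cutoffs are 0.95 and 0.90: candidates 0 and 5 form the first pool, and candidate 2 fills the remaining slot because its Tsim difference is the smallest in the second pool, so `chosen` is `[0, 5, 2]`. In the full loop, the chosen candidates would then be excluded from the pool offered to the next ligand.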