Abstract
Pose prediction of ligands to proteins remains a central challenge of structure-based drug design. Although data leakage and generalizability concerns remain, data-driven methods for pose prediction (i.e., based on deep learning and diffusion) now routinely outperform traditional techniques such as molecular docking. In this work, we propose a simple data-driven ligand-based baseline for pose prediction, which is based on maximal common substructure to reference molecules, followed by constrained 3D embedding. As this TEMplate-based Protein–Ligand (TEMPL) baseline is strictly data-driven, it is a particularly meaningful baseline for interpolative tasks, where physics-based methods sometimes underperform as they exploit data less directly. However, it can also highlight the added advantage of other interpolative data-driven methods that should outperform this simple approach. We applied our baseline method in the ASAP-Polaris-OpenADMET antiviral competition, achieving a result that outperformed some classic docking algorithms for the pose prediction of a series of ligands at the Main Protease of SARS-CoV-2 and MERS-CoV. Furthermore, we show that the performance of our baseline is relatively good on a protein–ligand pose prediction benchmark used for deep learning based pose prediction, PDBBind, highlighting the risk of data leakage and the necessity of challenging splits for other data-driven methods as well. We also show our baseline method has limited performance on more challenging benchmarks, such as PoseBusters. We provide our baseline method as open source software. For convenience and for nontechnical users, we also provide a web application to run the pipeline. These findings will aid in the evaluation of future pose prediction methods, especially more complex data-driven approaches that are increasing in popularity.
Introduction
The prediction of the pose (conformation and absolute placement of a ligand in a protein binding site) is one of the grand challenges of structure-based drug design. The capacity to predict the correct pose of a ligand enables rational design of analogs, as well as providing the starting point for molecular modeling such as molecular dynamics or free energy of binding estimation methods. The traditional method used for pose prediction is molecular docking, which is widely used and has achieved many successes as an important component of drug design campaigns, but which also has several well-known serious limitations. Molecular docking is based on the combination of a scoring function, which ranks a given pose, and a search method, which is used to optimize starting poses toward the lowest possible score, usually using a heuristic approach.
Recently, several new data-driven methods for pose prediction that are distinct from classic molecular docking were developed, which can be broadly divided into two main categories: (1) deep learning based pose prediction and (2) cofolding. In deep learning based pose prediction, large protein–ligand complex structure data sets are leveraged to predict ligand poses using deep learning methods. Some notable methods are EquiBind, which is based on E(3)-equivariant geometric deep learning, and DiffDock, a diffusion based approach. Another early pose prediction method based on deep learning was TankBind, using trigonometry aware neural networks. Conversely, in cofolding, protein 3D structure prediction and ligand placement are done concurrently, meaning the input is the sequence of the target protein and not a pre-existing 3D structure. The best known example of the cofolding approach is AlphaFold3, alongside closely related approaches such as Chai, Boltz, Protenix, NeuralPlexer and RosettaFold AllAtom. These methods have all achieved superior performance in pose prediction success on common benchmarks.
A typical problem of data-driven machine learning approaches is limited extrapolative capacity. Analysis showed that the good reported performance of deep learning based pose prediction and cofolding can be partially attributed to similarities between training and test sets. For example, when using the commonly utilized time split approach on the PDBBind set, the test set contains identical and very similar proteins and ligands, which heavily favors such data-driven methods. These issues as well as ways to deal with them have been discussed at length. Additionally, diffusion based ligand placement methods suffer from some unusual issues, such as alteration of the ligand (e.g., inversion of stereocenters, change of bond orders), which is uncommon with physics inspired methods such as docking. Some of these issues were highlighted by Buttenschoen and co-workers.
Recently, cofolding methods achieved success in a prospective challenge: the pose prediction component of the ASAP-Polaris-OpenADMET antiviral competition, hereafter referred to as “the Polaris competition”. The competition involved the prediction of the structure of ca. 200 protein–ligand complexes of inhibitors bound to the SARS-CoV-2 and MERS-CoV Main Protease (MPro). As the true poses were only made available after the challenge concluded, this can be considered an honest benchmark challenge to compare pose prediction methods. In this challenge, cofolding methods outperformed all other methods, particularly traditional pose prediction methods such as FRED, GLIDE and Vina, confirming that the reported good performance of cofolding methods cannot solely be attributed to data leakage and similarity. Another success for cofolding methods was reported in the 16th Critical Assessment of Structure Prediction (CASP16) challenge, where the AlphaFold3 baseline outperformed all submissions in the protein–ligand pose prediction component of the challenge.
The reported success of the data-driven methods can be partially attributed to the fact that a wealth of data exists for the SARS-CoV-2 Main Protease, which is a widely targeted protein, and which is the target of Nirmatrelvir, Pfizer’s FDA-approved SARS-CoV-2 MPro inhibitor clinically used to treat COVID-19 infections. For MERS-CoV, far fewer protein–ligand structures are available, although the structure is relatively similar to that of SARS-CoV-2 MPro. The abundance of available protein–ligand crystal structures for SARS-CoV-2 MPro meant that pose information on known ligands could be used as a sort of template or reference to predict the pose of new ligands, with the assumption that common parts of different binders will more or less bind in the same location. This same principle is also applied in template-based docking, which is available in FRED, GLIDE and other docking approaches.
To further investigate and isolate the power of the template effect, we designed a baseline method which is based completely on ligand alignment. Unlike template-based docking methods or the aforementioned complex data-driven approaches, our procedure is based on first finding the best template in a reference ligand set using Maximal Common Substructure (MCS) algorithms, then generating conformations based on the template, using constrained embedding, and then finally ranking conformations by 3D alignment with the template. As such this approach is almost fully ligand-based and only requires the target protein sequence to search a suitable template ligand set for the 3D alignment. The input of the workflow (Figure 1B and C) is a set of ligands, for which the pose is to be generated. Given a set of reference protein–ligand complexes (e.g., PDBBind), which are retrieved by looking for similar proteins of an input apo protein (Figure 1A), aligned reference ligands are obtained by alignment and superposition of the reference complexes onto the input protein in 3D space. The aligned reference ligands are then used as input for the TEMPL core pipeline (Figure 1B and C).
Figure 1. Workflow of TEMPL. (A) Alignment of the input protein to a set of reference protein–ligand complexes can be used to generate the reference ligands to be used as templates. (B) Template-based, ligand-only strategy for placement prediction. (C) Detailed view of every TEMPL core submodule.
Methods
The core methods of TEMPL are all internal functions of the RDKit, a cheminformatics library. The procedure used in TEMPL is based on detection of the Maximal Common Substructure (MCS) between the input ligands (with aligned absolute coordinates) and a set of reference ligands. For each input ligand, the best MCS match is used for constrained 3D embedding, generating conformers where the matching atoms from the MCS are locked in the reference coordinates. Then, the generated conformers are ranked using shape or feature alignment to the reference molecule. These correspond to the three blocks in Figure 1B and are described below.
Maximal Common Substructure
MCS detection was performed using the RascalMCES algorithm. This method was preferred over the standard MCS method in RDKit (rdFMCS) because of its higher speed; MCS detection becomes an important bottleneck when there is a large number of probe and reference ligands. Unlike rdFMCS, RascalMCES matches bonds (edges) and as such is actually based on the maximum common edge substructure.
Constrained Embedding
Constrained embedding was performed using the ETKDGv3 method, which is a commonly used method for conformer generation, based on knowledge-enhanced distance geometry. These generated conformers can optionally be optimized using the MMFF94s or UFF (in case of atom types that do not have parameters in MMFF94s) force field, although this undoes the strict coordinate constraints from the preceding step.
Shape Alignment
Finally, ranking of the conformers resulting from constrained embedding was done using RDKit’s Align3D method, which was developed by the PubChem team. This recently added method is based on the Gaussian volume approximation of molecular shape, which allows fast calculation of overlap. This overlap is quantified using ShapeTanimoto and ColorTanimoto, which correspond to the intersection of the (uncolored or colored) volumes divided by the union of the volumes. Color here means the Gaussians are assigned feature labels from 6 feature types: hydrogen bond acceptor, hydrogen bond donor, anionic, cationic, hydrophobic and ring features. A third ranking score is ComboTanimoto, which is simply the average of ShapeTanimoto and ColorTanimoto.
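The scoring arithmetic described above can be written down in a few lines. The sketch below is plain Python and does not call RDKit's Align3D; the overlap volumes and per-conformer scores are made-up numbers, and the function names are hypothetical.

```python
def tanimoto(overlap_ab, self_a, self_b):
    """Overlap volume divided by the union volume of two molecules."""
    union = self_a + self_b - overlap_ab
    return overlap_ab / union if union > 0 else 0.0


def combo_tanimoto(shape_t, color_t):
    """Average of the shape and color Tanimoto scores."""
    return 0.5 * (shape_t + color_t)


def rank_conformers(scores):
    """Sort conformer ids by descending ComboTanimoto.

    scores maps conformer id -> (ShapeTanimoto, ColorTanimoto).
    """
    return sorted(
        scores, key=lambda cid: combo_tanimoto(*scores[cid]), reverse=True
    )


# Illustrative (invented) scores for three conformers.
scores = {0: (0.60, 0.40), 1: (0.75, 0.55), 2: (0.70, 0.20)}
ranking = rank_conformers(scores)
```

In this toy case conformer 2 has the second-best shape overlap but the worst feature overlap, so it drops to last place once both terms are averaged.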
Data Sources
Data related to the Polaris competition was obtained using the Polaris Python tool’s data loader. The competition organizers provided reference complexes for the SARS-CoV-2 and MERS-CoV MPro. These were used as references to align all provided experimental SARS-CoV-2 structures to, using the Biotite Python package. Only a single reference MERS-CoV complex was provided, but a selection of 17 were extracted from the literature. SARS-CoV-2 MPro ligands were then aligned to the reference MERS-CoV MPro complex to augment the data, as these ligands are chemically much closer to the test set molecules, which both for MERS-CoV and SARS-CoV-2 MPro mostly belong to the same series of isoquinoline-based inhibitors. Finally, the superposed ligands were inspected and anything far from the pocket (e.g., bound to the wrong protein chain) was manually removed. This then led to a set of absolute ligand coordinates which could be used as inputs in the ligand-based core TEMPL pipeline.
For the PDBBind experiments, significant data cleaning was performed on the PDBBind (v2020) data set. To ensure correct parsing of molecules, which are provided as SDF files, the corresponding ligand SMILES were retrieved from the work of Li and co-workers and used to reconstruct and correct the atom connectivity, which is essential for correct detection of MCS. We rejected all molecules that could not be parsed by RDKit, and additionally, we filtered out oligopeptides with more than 8 residues and oligosaccharides with more than 3 sugars, as our method is intended for small molecule pose prediction and very large molecules are not compatible with MCS algorithms. In the end, this resulted in a training set size of 15119 complexes, a test set size of 340 complexes and a validation set size of 883 complexes.
PoseBusters data was used as provided.
The methods used for reference alignment are shown in Figure 1A and are described below.
Finding Similar Proteins
The sequences of PDBBind protein chains were embedded using ESM2 (specifically the esm2_t33_650M_UR50D model), averaging the per-residue embeddings into a single 1280-dimensional embedding vector for each input protein sequence. The distance between different sequences was quantified using cosine distance, with the best 100 template proteins retained. To increase portability of our method, we have also made the embeddings available via a public repository to spare the user the effort of recalculating them for all proteins in the PDBBind set.
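The retrieval step amounts to mean pooling followed by a nearest-neighbor search under cosine distance. The following sketch uses plain Python lists and toy 3-dimensional vectors in place of the 1280-dimensional ESM2 embeddings; the function names and the template ids are hypothetical.

```python
import math


def mean_pool(per_residue):
    """Average per-residue embeddings (equal-length vectors) into one vector."""
    n = len(per_residue)
    return [sum(col) / n for col in zip(*per_residue)]


def cosine_distance(u, v):
    """One minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def nearest_templates(query, library, k=100):
    """Return the ids of the k library embeddings closest to the query."""
    ranked = sorted(library, key=lambda pid: cosine_distance(query, library[pid]))
    return ranked[:k]


# Toy data: two "residues" pooled into one query vector.
query = mean_pool([[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
library = {"a": [2.0, 2.0, 0.0], "b": [0.0, 0.0, 1.0], "c": [1.0, 0.0, 0.0]}
top2 = nearest_templates(query, library, k=2)
```

Because cosine distance ignores vector length, template "a" is at distance zero from the query even though its norm differs.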
Protein Superposition
For protein backbone superposition, we applied the “superimpose homologs” method from Biotite. When the final Cα RMSD was above a specified threshold (default of 10 Ångström), the protein was discarded due to low agreement between the structures (in case no protein at all is found, as a fallback, the distance threshold is relaxed with 5 Å steps in order to have at least one protein template).
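The threshold-with-fallback logic can be sketched as follows. This is an illustrative reading of the description above, not the TEMPL code: the function name, the dictionary input and the template ids are assumptions, and the Cα RMSD values are taken as already computed.

```python
def select_templates(ca_rmsd, threshold=10.0, step=5.0):
    """Keep templates whose Cα RMSD (in Å) is within the threshold.

    ca_rmsd maps template id -> Cα RMSD after superposition. If nothing
    passes, the threshold is raised in `step` Å increments until at least
    one template is retained. Returns (kept ids, threshold used).
    """
    if not ca_rmsd:
        return [], threshold
    while True:
        kept = [tid for tid, r in ca_rmsd.items() if r <= threshold]
        if kept:
            return kept, threshold
        threshold += step


# Hypothetical template ids and RMSD values.
strict = select_templates({"1abc": 3.2, "2xyz": 12.5})
relaxed = select_templates({"2xyz": 12.5, "3def": 21.0})
```

In the second call no template passes the default 10 Å cutoff, so the threshold is relaxed once to 15 Å and a single template is kept.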
Extract Aligned Ligands
The transform (i.e., rotation and translation) used during protein superposition is applied on the corresponding ligand, transforming the coordinates of the ligand in the data set. These absolute coordinates are stored in the SDF file used as input in the TEMPL core pipeline.
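Applying the superposition transform to a ligand is a rigid-body operation on its coordinates. A minimal sketch, assuming the convention x' = R x + t (the convention used by a given superposition library may differ, e.g., by applying the translation first):

```python
def apply_transform(coords, rotation, translation):
    """Apply x' = R x + t to each 3D point in coords."""
    out = []
    for x in coords:
        rotated = [sum(rotation[i][j] * x[j] for j in range(3)) for i in range(3)]
        out.append([rotated[i] + translation[i] for i in range(3)])
    return out


# 90 degree rotation about the z axis followed by a shift along x.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
moved = apply_transform([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]], R, t)
```

Since the same rotation and translation are used for the protein and its ligand, the relative placement of the ligand in the binding site is preserved.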
Pose Assessment
For calculation of ligand pose Root Mean Square Deviations (RMSDs), the sPyRMSD Python package was used. This method is symmetry-corrected and is also independent of hydrogens and bond orders, which enables some tautomer and protomer related nonidentity issues to be averted. This is the same RMSD implementation which was used for the Polaris competition leaderboard. The RMSD is given in eq 1, with A and B the N by 3 matrices of atomic coordinates of two conformers with matched heavy atoms, and $A_i$ and $B_i$ the coordinates of the i-th matched atom in each.

$$\mathrm{RMSD}(A, B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert A_i - B_i\rVert^{2}} \qquad (1)$$
Generally, an RMSD of 2 Ångström and below has been considered a docking success although this threshold is considered crude and a truly successful docking prediction will also involve other factors such as physical soundness, retrieval of molecular interactions and realistic torsion angles.
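For illustration, the RMSD between two matched coordinate sets and the resulting success rate can be computed as below. This plain-Python sketch omits the symmetry correction performed by sPyRMSD and assumes the atoms are already in matching order; the function names are hypothetical, and the cutoff follows the "2 Å and below" wording.

```python
import math


def rmsd(a, b):
    """Plain RMSD between two matched N x 3 coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    sq = sum((p - q) ** 2 for x, y in zip(a, b) for p, q in zip(x, y))
    return math.sqrt(sq / len(a))


def success_rate(rmsds, cutoff=2.0):
    """Percentage of poses with RMSD at or below the cutoff (in Å)."""
    return 100.0 * sum(r <= cutoff for r in rmsds) / len(rmsds)


# Two atoms, both displaced by 1 Å along z.
shifted = rmsd([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
               [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]])
rate = success_rate([0.5, 1.9, 2.5, 4.0])
```

Without symmetry correction, a pose that differs only by swapping equivalent atoms (e.g., a flipped phenyl ring) would be penalized, which is why a symmetry-aware implementation is used for the reported numbers.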
As a second metric for assessing ligand pose predictions, we apply Local Distance Difference Test Protein–Ligand Interactions (lDDT-PLI), a variant of lDDT which looks at the conservation of protein–ligand contacts, and which is commonly used to assess the quality of predicted protein–ligand structures. We used the docker image provided by OpenStructure, using standard settings and we considered any failure or error as a score of 0.000.
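The treatment of failed evaluations affects the reported mean and spread. A small sketch of such an aggregation, assuming failures are marked as `None` and that the sample standard deviation is reported (the text does not state which estimator was used):

```python
import statistics


def summarize_lddt_pli(scores):
    """Mean and standard deviation with failures (None) counted as 0.0."""
    filled = [0.0 if s is None else s for s in scores]
    return statistics.mean(filled), statistics.stdev(filled)


# Invented scores; the second evaluation failed.
mean, sd = summarize_lddt_pli([1.0, None, 0.5, 0.5])
```

Counting failures as zero lowers the mean and widens the spread compared with silently dropping them, which makes the aggregate a conservative estimate.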
Results
Polaris Competition Pose Prediction
A prototype version of our method was applied in the pose prediction component of the Polaris competition. This challenge involved predicting the pose of a series of ca. 200 closely related ligands to the SARS-CoV-2 and MERS-CoV MPro. A training set of about 800 complexes of the SARS-CoV-2 Main Protease was made available as a part of the challenge. The initial prototype of our method achieved a success rate of 51% (MERS-CoV + SARS-CoV-2 < 2 Å) on the intermediary leaderboard. On the final leaderboard, the performance of the baseline deteriorated to 34%, mainly due to using a smaller template ligand set for the MERS-CoV Main Protease and not using SARS-CoV-2 MPro data. Below, we report the results for the finalized version of TEMPL, which was developed after the Polaris competition ended (incorporating a web server, code cleanup, an ablation study, and data cleanup). We performed an ablation study, of which the results are summarized in Table 1. The final achieved performance using the optimized settings is 70.8% (MERS-CoV + SARS-CoV-2 < 2 Å) and lDDT-PLI of 0.838.
Table 1. Ablation Study (Metrics Reported for Test Data)
| experiment | settings | MERS-CoV < 2 Å (%) | SARS-CoV-2 < 2 Å (%) | MERS-CoV + SARS-CoV-2 < 2 Å (%) | lDDT-PLI (mean ± st. dev.) |
|---|---|---|---|---|---|
| (A) default | MCS + 200 conformations + ComboTanimoto | 67.0 | 74.5 | 70.8 | 0.838 ± 0.150 |
| (B) different ligand templates | MERS PDB reference only (instead of transposed SARS-CoV-2) | 16.5 | 74.5 | 45.6 | 0.720 ± 0.230 |
| (C) different 3D alignment score | ShapeTanimoto | 62.9 | 68.4 | 65.6 | 0.721 ± 0.267 |
| | ColorTanimoto | 59.8 | 69.4 | 64.6 | 0.785 ± 0.186 |
| (D) no constrained embedding | unconstrained embedding + ComboTanimoto | 66.0 | 72.4 | 69.2 | 0.782 ± 0.200 |
| | unconstrained embedding + ShapeTanimoto | 63.9 | 69.4 | 66.7 | 0.775 ± 0.206 |
| | unconstrained embedding + ColorTanimoto | 58.8 | 68.4 | 63.6 | 0.775 ± 0.208 |
| (E) no realignment | no realignment | 64.9 | 69.4 | 67.2 | 0.725 ± 0.190 |
| (F) final force field minimization | MMFF94 optimization | 60.8 | 68.4 | 64.6 | 0.750 ± 0.194 |
| (G) reduced conformations | 100 conformations | 62.9 | 74.5 | 68.7 | 0.777 ± 0.205 |
| | 50 conformations | 60.8 | 69.4 | 65.1 | 0.777 ± 0.199 |
| | 20 conformations | 53.6 | 63.3 | 58.5 | 0.759 ± 0.199 |
| | 10 conformations | 42.3 | 63.3 | 52.8 | 0.753 ± 0.187 |
| | 5 conformations | 28.9 | 52.0 | 40.5 | 0.700 ± 0.228 |
| | 1 conformation | 15.5 | 19.4 | 17.4 | 0.557 ± 0.266 |
(A) Default settings. (B) Different ligand templates (literature MERS-CoV MPro complexes only, instead of transposed SARS-CoV-2 MPro ones). (C) Different 3D alignment scoring function. (D) No constrained 3D embedding (i.e., no locking of coordinates based on MCS). (E) No realignment (keep coordinates fixed during 3D alignment, so direct output of 3D embedding reranked with alignment scores only). (F) To minimize the final structure, a force field (MMFF94s) was used. (G) Reduced amount of ligand conformations used in the pipeline.
PDBBind Pose Prediction
To compare our baseline head to head with recent deep learning pose prediction methods, we employed the PDBBind data set, using commonly used splits, in particular the time split used by Stärk and co-workers. The PDBBind data set is a curated subset of the Protein Data Bank (PDB) with protein–ligand complexes, which consists of 18,902 unique structures before postprocessing. This split was later criticized for test-train similarities, both on the ligand and protein level, making the pose prediction task easier than anticipated and possibly inflating the performance of data-driven pose prediction methods. Because our baseline is designed to capture such effects, we anticipated this is one place where our baseline method should show some degree of successful pose predictions due to near-neighbor behavior.
Using these provided splits, which are based on a cutoff at a given date (first of January 2019), and using the PDBBind train set complexes as a reference, TEMPL achieves an RMSD < 2 Å success rate on the test set of 22.1%. This can be compared with the reported success rates of GLIDE at 21.8%, EquiBind at 5.5% and DiffDock at 38.2%. Note that the reported GLIDE numbers are relatively low because the reported task is blind docking, i.e., without the binding site specified, whereas the binding site is typically specified for traditional docking algorithms such as GLIDE. In Figure 2, the success rate dependency on training set similarity is shown both for ligand and protein similarity. As expected, particularly in the situation of high ligand similarity combined with high protein similarity, high success rates (67.3%) are achieved.
Figure 2. RMSD < 2 Å success rate versus protein and molecular similarity for the test set and for the validation set. Protein similarity is estimated by RMSD of Cα between input protein and closest found template protein after homologous superimposition with Biotite, molecular similarity using the Tanimoto similarity of ECFP4 (2048 bits). Binning is done by quartile (Q1–Q4). Numbers between brackets are the size of the bin.
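The quartile analysis behind such a heatmap can be sketched in plain Python. The rank-based binning, the function names and the toy values below are assumptions for illustration; the exact binning procedure used for the figure may differ (e.g., in how ties are handled).

```python
from collections import defaultdict


def quartile_index(values):
    """Assign each value a quartile 0..3 by rank (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * 4 // len(values)
    return bins


def success_by_quartile(protein_rmsd, ligand_sim, pose_rmsd, cutoff=2.0):
    """Map (protein quartile, ligand quartile) -> (bin size, success rate in %)."""
    pq = quartile_index(protein_rmsd)
    lq = quartile_index(ligand_sim)
    hits = defaultdict(list)
    for p, l, r in zip(pq, lq, pose_rmsd):
        hits[(p, l)].append(r < cutoff)
    return {k: (len(v), 100.0 * sum(v) / len(v)) for k, v in hits.items()}


# Invented example: four complexes with decreasing similarity to the templates.
table = success_by_quartile(
    protein_rmsd=[0.5, 1.0, 4.0, 8.0],   # Cα RMSD to closest template (Å)
    ligand_sim=[0.9, 0.8, 0.3, 0.2],     # Tanimoto to closest template ligand
    pose_rmsd=[0.7, 1.5, 3.0, 6.0],      # predicted pose RMSD (Å)
)
```

Note that a low protein quartile index means a low Cα RMSD (high protein similarity), whereas a high ligand quartile index means high ligand similarity, so the most favorable bin in this toy example is `(0, 3)`.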
The validation set achieved a higher pose recovery rate of 41.3%. This is a consequence of the fact that the validation set is a random split of the pre-2019 PDBBind train set and as such has higher average similarity to the training set. We also performed a Leave-One-Out (LOO) experiment on the train set itself (where all train set templates except the one for the current pose prediction are used), achieving a similar pose recovery rate of 42.3%.
From the point of view of lDDT-PLI a similar tendency is present: the leave-one-out train task achieves an lDDT-PLI of 0.581 ± 0.330, the test set is worse at 0.346 ± 0.351 and the validation set 0.577 ± 0.329.
PoseBusters Pose Prediction and Pose Evaluation
One of the first serious challenges to the reported good performance of data-driven pose placement methods was made by the PoseBusters team, which showed these methods suffer from serious problems including physically improbable conformations and steric clashes. The PoseBusters authors also provided a more challenging protein–ligand complex data set which was later used as a benchmark set for cofolding methods such as AlphaFold3. Using the same workflow, TEMPL achieved a < 2 Å success rate of 8.9% (<5 Å success rate of 17.6%), which outperformed EquiBind (2.0%) but was outperformed by all other DL methods (reported success rates: TankBind 16%, DeepDock 20%, Uni-Mol 22%, DiffDock 38%) and by traditional methods such as AutoDock Vina (60%). The lDDT-PLI is similarly low, with a mean value of 0.189 ± 0.305, in line with the low success rate.
In Figure 3, the success rate dependency on training set similarity is shown both for ligand and protein similarity. Similarly to the PDBBind case, in the situation of high ligand similarity combined with high protein similarity, the highest success rates (53.3%) are achieved.
Figure 3. Performance on the PoseBusters task is strongly dependent on similarity of ligands and proteins to the reference set. Protein similarity is estimated by RMSD of Cα between input protein and closest found template protein after homologous superimposition with Biotite, molecular similarity using the Tanimoto similarity of ECFP4 (2048 bits). Binning is done by quartile. Numbers between brackets are the size of the bin.
Additionally, we used PoseBusters to estimate the PoseBusters validity of poses obtained using TEMPL. As TEMPL is based on MCS alignment and is strictly ligand-based, there is no awareness at all of possible steric clashes, and the aim is solely to obtain poses that perform well on the RMSD based metric. As expected, 66.7% of the obtained poses within the <2 Å threshold were PoseBusters-invalid. The invalidity rate is comparable with the invalidity rates reported for DL method DiffDock (68%), and better than the other DL methods (EquiBind 100%; TankBind 79%; DeepDock 74%; Uni-Mol 91%) but significantly higher than the low invalidity reported for AF3 and traditional docking methods such as AutoDock Vina (<5%). The main cause of invalidity was failure on the “Minimum protein–ligand distance” task, meaning there are unrealistically close atom distances between the ligand and the protein.
Software Package
Our method is offered as a Python-based command-line tool, and additionally, the tool is made available as a web application, with the intention to facilitate its usage as a baseline method to compare against other data-driven methods. The web application is based on Streamlit and makes it possible for users to try out the TEMPL pipeline without commitment, local installation or familiarity with command-line software. The web application can be run locally, but we have made an instance of it available via https://templ.dyn.cloud.e-infra.cz. Instructions for a basic usage task for both the command-line tool and the web app are given in the Supporting Information. A detailed flowchart of the workflow including all fallbacks is available in the code repository.
Discussion
The pose prediction experiments described in the results section of this paper confirm the initial observation from our submission to the Polaris competition. When similar protein–ligand complexes are available, pose prediction based on ligand similarity achieves high success rates. As described elsewhere, the same effect leads to overestimation of the strength of data-driven pose prediction methods.
The Polaris competition itself corresponded to a best-case scenario for similarity-based methods: a large number of protein–ligand complex structures were available for SARS-CoV-2 Main Protease and the ligands in the test set mostly belonged to one series of similar compounds that all contain a decorated isoquinoline. In this case, our method performed well and outperformed traditional and physics-based methods, including AutoDock Vina with template placement correction. On the other hand, fine-tuned deep learning methods performed significantly better, occupying the entire top 3 in the final leaderboard.
The increase of roughly 20 percentage points from the intermediary leaderboard performance of 51% to the final reported pose recall of 70.8% for TEMPL in the Polaris competition can be attributed to enhanced data cleanup, polishing of the prototype, and the capacity of running the ablation study on unblinded test data.
The ablation study (Table 1) showed that all 3 components (MCS, constrained embedding and alignment-based placement ranking) are necessary for good performance (Table 1A). It was also shown that force field optimization of poses decreased the success rate (Table 1F). It is known that ETKDGv3 generates very high quality conformer ensembles without further optimization, but based on previously reported results, force field optimization is not expected to worsen performance or to lower conformational diversity. However, there is an obvious reason for the degraded performance in our case: force field optimization loosens the coordinate constraints of formerly locked atoms.
Furthermore, the ablation study showed that 3D alignment on its own (i.e., without any MCS or template) is able to capture many correct poses too (Table 1D), although it underperforms relative to our optimized baseline (Table 1A). When 3D alignment was used for ranking only (and not for optimization of the coordinates), there was a slight degradation in performance (Table 1E). We found that 200 conformers per input molecule are necessary (Table 1G), in line with the observations by McNutt and co-workers. The choice of reference molecules is very important: realigned SARS-CoV-2 MPro data performed much better than MERS-CoV MPro data alone (Table 1B). This confirmed our intuition that similar protein templates with similar ligands give more meaningful ligand templates than identical protein templates with less similar ligands. For 3D alignment, ComboTanimoto outperforms both ShapeTanimoto and ColorTanimoto (Table 1C), suggesting that feature distribution and molecular shape contain complementary information that is useful for successful alignment.
Applying this method on widely used pose prediction benchmarks, PDBBind and PoseBusters, confirmed a strong dependency on similarity and, in the case of the PDBBind benchmark, achieved moderate performance, comparable with reported numbers for traditional docking methods. On novel proteins and/or using novel ligands, the performance completely collapses, confirming that this approach is, unlike traditional docking methods, not able to extrapolate meaningfully at all (Figure 2). This was particularly clear on the more novel PoseBusters benchmark set, where there is essentially no pose recovery outside of the known region (Figure 3).
Many of the output poses on the PoseBusters task did not pass the validity checks, although surprisingly, more than 33% were valid, which is a higher validity rate than all the other deep learning based methods in this benchmark. This is interesting, because all information about possible steric clashes or protein contacts is only indirectly provided by the shape of the reference ligands. Nonetheless, this is apparently enough to prevent steric clashes in many cases. Because ligand conformations in our method originate from ETKDGv3, only a limited amount of physically unrealistic conformers occur.
TEMPL is a baseline method and is expected to underperform in challenging tasks. As discussed above, this is seen in the PoseBusters protein–ligand benchmark, where we observed a low recovery rate of correct poses. Comparably with what is reported for deep learning based methods in the PoseBusters pose prediction task, we observe a distinct relation between pose recovery success and the similarity of protein sequences to the training set. Notably, modern cofolding methods such as AlphaFold3 perform well on this task. We propose our method as a challenging data-driven baseline that should be considered the minimum acceptable performance for any advanced data-driven method, including not only pose recall but also PoseBusters validity.
Conclusions
We have constructed a baseline method which leverages knowledge from known protein–ligand complexes to predict the poses of new ligands to new proteins. We show this baseline achieves relatively strong performance at pose prediction, sometimes outperforming widely used traditional pose prediction methods such as AutoDock Vina. This is surprising, because the method is ligand-based without any consideration of protein–ligand binding such as molecular interactions or steric factors. Although our baseline was not among the highest ranked in the final Polaris competition leaderboard, the final optimized baseline outperforms many advanced pose prediction techniques and matches the performance of the best physics-based methods. We propose our baseline to be used as a realistic minimal baseline that needs to be outperformed by newly proposed pose prediction methods, particularly under conditions where highly similar protein or ligand information can be leveraged. Our method is available both as open-source code as well as through a web app, accessible at https://templ.dyn.cloud.e-infra.cz/.
Supplementary Material
Acknowledgments
J.F., M.Š., and W.D. were supported by the Ministry of Education, Youth and Sports of the Czech Republic – National Infrastructure for Chemical Biology (CZ-OPENSCREEN, LM2023052). W.D. was supported by the Ministry of Education, Youth and Sports of the Czech Republic by the project “New Technologies for Translational Research in Pharmaceutical Sciences/NETPHARM”, project ID CZ.02.01.01/00/22_008/0004607, cofunded by the European Union. Computational resources were provided by the e-INFRA CZ project (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.
The source code of TEMPL is freely available via github.com/fulopjoz/templ-pipeline, and is released under the permissive MIT license. This also includes the code for a streamlit application, which can be run locally or accessed via https://templ.dyn.cloud.e-infra.cz/. Data is deposited to Zenodo, including a snapshot of the code at the time of submission and precalculated ESM embeddings, and including outputs of benchmarks. Benchmark output data are available via https://zenodo.org/records/16875932.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.5c01985.
Code examples for basic usage; basic usage of the web server (Figure S1); benchmark output data summarized in Tables S1 and S2; similarity quartile heatmaps using ESM2 cosine distance as the similarity metric (Figure S2); relation between RMSD and lDDT-PLI (Figure S3); lDDT-PLI per protein target class (Figure S4) (PDF)
J.F. was in charge of software implementation, software testing, conceptualization, and manuscript revision. M.Š. was in charge of conceptualization, software testing, and manuscript revision. W.D. was in charge of software testing, conceptualization, manuscript revision, and drafting.
The authors declare no competing financial interest.
Published as part of Journal of Chemical Information and Modeling special issue “Open Science and Blind Data: The Antiviral Discovery Challenge”.