Table 1.
Summary of popular protein–peptide complex datasets that are widely used for testing and benchmarking docking tools.
Dataset | Number of complexes | Peptide length | Special features | Specific application | Availability |
---|---|---|---|---|---|
LEADS-PEP | 53 | 3–12 residues | Diverse peptide sequences; complexes do not interact with nucleic acids | Because the peptides are small, suitable for testing tools adapted from small-molecule docking tools | www.leads-x.org |
PeptiDB | 105 | 5–15 residues | Diverse peptide secondary structures, including peptides that change conformation upon binding; complexes with diverse biological functions | Suitable for testing tools that tackle peptide flexibility | RCSB codes of the complexes: https://ars.els-cdn.com/content/image/1-s2.0-S096921260900478X-mmc1.pdf |
PPDbench | 133 | 9–15 residues | Diverse in terms of peptide sequences (<40% sequence similarity) and biological functions | Suitable for testing docking tools on complexes categorised by function | https://webs.iiitd.edu.in/raghava/ppdbench/ |
PepPro | 89 | 5–30 residues | Contains 58 unbound receptor structures | Useful for testing whether tools can predict apo-holo conformational change | http://zoulab.dalton.missouri.edu/PepPro_benchmark |
Propedia | ~20000 | 2–50 residues | Contains subsets of complexes clustered by features such as sequence, interface structure or binding site | The broad range of peptide lengths allows different types of docking tools to be tested. The different subsets also give users flexibility in testing their tools | https://bioinfo.dcc.ufmg.br/propedia |
PixelDB | 1966 | NA | Uses machine learning to identify which chain is the protein and which is the peptide, which helps avoid misidentifying them when the peptide is larger than the receptor | The broad range of peptide lengths allows any docking tool to be tested on it | https://github.com/KeatingLab/PixelDB |