Abstract
Sidechain rotamer libraries of the common amino acids of a protein are useful for folded protein structure determination and for generating ensembles of intrinsically disordered proteins (IDPs). However, much of protein function is modulated beyond the translated sequence through the introduction of post-translational modifications (PTMs). In this work we provide a curated set of side chain rotamers for the most common PTMs derived from the RCSB PDB database, including phosphorylated, methylated, and acetylated sidechains. Our rotamer libraries improve upon existing methods such as SIDEpro and Rosetta in predicting the experimental structures for PTMs in folded proteins. In addition, we showcase our PTM libraries in full use by generating ensembles with the Monte Carlo Side Chain Entropy (MCSCE) algorithm for folded proteins, and by combining MCSCE with the Local Disordered Region Sampling algorithms within IDPConformerGenerator for proteins with intrinsically disordered regions.
INTRODUCTION
Post-translational modifications (PTMs) are the chemical modifications made to the amino acids of a protein after translation that enable the cell to regulate its function. The importance of PTMs cannot be overstated given that at least 80% of mammalian proteins are modulated by PTMs, influencing essentially all biological processes, including cell signaling and metabolic pathways, transcriptional regulation, and DNA repair. In addition, dysregulation of PTMs is implicated in the development and progression of many diseases including cancer.1 Although hundreds of PTMs are known to occur in biology,2 the available experimental structural data, e.g. from X-ray crystallography, cryo-electron microscopy (Cryo-EM) or Nuclear Magnetic Resonance (NMR) measurements, remain sparse across the full space of chemical modifications compared to unmodified proteins.
In principle, computational models can fill the gap for modeling the protein structural changes introduced by PTMs, although most structure-based prediction algorithms are primarily designed for the canonical amino acids. For example, while the original AlphaFold23 and RoseTTAFold4 have revolutionized protein structure prediction for unmodified sequences, they did not handle PTMs. Moreover, these algorithms are largely limited to prediction of single stable protein conformations, and do not provide insights into protein flexibility5–7, including the most flexible class with intrinsic disorder.8–12 PTMs modulate protein energy landscapes, which can lead to changes in conformations and in their dynamic interconversions. Protein flexibility also leads to fluctuations of side chain packing arrangements that often have a direct functional role13–17. A large number of high-quality physical algorithms18–23 and machine learning approaches24–26 exist to perform side chain repacking for the canonical amino acids for folded proteins, for disordered proteins when they undergo folding-upon-binding, or when they form dynamical complexes.
However, the ability to model side chain ensembles with PTMs is currently quite limited. There have been a small number of studies on computational modeling of PTMs27; software packages such as Rosetta28,29, FoldX30, and SIDEpro25 have the ability to model PTMs using rotamer libraries31. A recent method named GlycoSHIELD focuses exclusively on the modeling of glycosylation by grafting glycan conformer candidates sampled from molecular dynamics trajectories33. Given that experimental PTM structural data has continued to accumulate since many of these methods were developed, it is worth creating new side chain rotamer libraries containing PTMs. Furthermore, since some rotamer libraries such as SIDEpro only characterized the effect of the PTM modifications on side chain torsion angles, it is also valuable to account for the rotamer distribution shifts with the backbone dihedral angles.
In this study, we create both backbone-dependent and backbone-independent rotamer libraries to comprehensively investigate the influence of PTMs on sidechain conformational ensembles. The decision to generate both types of libraries arises from the goal of capturing nuanced details of sidechain variability within the local structural context provided by the protein’s backbone, as well as understanding more general patterns of sidechain conformations across diverse protein structures where data sparsity is less of a concern. We compare our rotamer libraries against SIDEpro25 and Rosetta28,29, and show that the overall trend across various metrics is a systematic improvement for folded proteins and IDPs compared to these standard methods.
METHODS
Structural Data for post-translational modifications of amino acids.
To assemble structural data for PTM-modified amino acids, we first accessed the PTM Structural Database (PTM-SD).32 Each data entry in the PTM-SD corresponds to a distinct post-translational modification on a specific amino acid residue. Querying the PTM-SD for these amino acids and their associated modifications returned the RCSB Protein Data Bank (PDB) identification codes for each modified residue.
Utilizing the obtained PDB identification codes, we conducted searches within the PDB database33 to acquire structural data for the corresponding modified amino acids. A notable observation was that a subset of proteins frequently exhibited high sequence similarity and identity. Such a high degree of redundancy would introduce biases that could impact the interpretation of patterns and relationships in the rotamer data. To address this, the independent NCBI BLAST tool34 was employed to detect identical sequences. Through the alignment of FASTA sequences from all proteins within the dataset, a threshold for sequence similarity of 90% or above was established, resulting in the clustering of protein structures. Subsequently, within each group, the structure having the highest resolution was selected for further analysis. Furthermore, PDB files with incomplete structural data were excluded.
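The redundancy filter described above can be sketched in a few lines. This is a minimal illustration and not the pipeline used in this work: `identity` is a crude stand-in for a BLAST identity score built on Python's `difflib`, the greedy clustering scheme is an assumption, and the entry identifiers in the usage note are hypothetical.

```python
from difflib import SequenceMatcher


def identity(seq_a: str, seq_b: str) -> float:
    """Crude similarity score in [0, 1]; a stand-in for BLAST, not BLAST itself."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def pick_representatives(entries, threshold=0.90):
    """Greedy redundancy filter.

    entries: list of (entry_id, sequence, resolution_in_angstrom).
    Entries are visited from best (lowest value) to worst resolution, so the
    first member of every cluster is its best-resolved structure.
    """
    clusters = []
    for entry in sorted(entries, key=lambda e: e[2]):
        for cluster in clusters:
            # Compare against the cluster seed only (assumed greedy scheme).
            if identity(entry[1], cluster[0][1]) >= threshold:
                cluster.append(entry)
                break
        else:
            clusters.append([entry])
    return [cluster[0][0] for cluster in clusters]
```

For example, calling `pick_representatives` on three hypothetical entries, two of which differ by a single residue, keeps the better-resolved member of the near-identical pair plus the unrelated third entry.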
Supplementary Tables S1 and S2 provide the curated PDB dataset for PTM-containing amino acids, further delineated by the refined resolution ranges for structures including each modified residue type, and the number of PDB files available in each resolution category. We found PDB files for phosphorylated serine (SEP), phosphorylated threonine (TPO), phosphorylated tyrosine (PTR), methylated arginine (AGM), mono-methylated lysine (MLZ), di-methylated lysine (MLY), tri-methylated lysine (M3L), oxidized methionine (OMT), and acetylated lysine (ALY) (see Supplementary Table S3). However, we restricted our analysis and rotamer library generation to modified amino acids with sufficient data, requiring a resolution of 3.5 Å or better and a minimum of 40 total PDB files. Based on these criteria, we only developed rotamer libraries for SEP, TPO, PTR, M3L, and ALY.
Statistical Analysis for PTM rotamer libraries.
Given that the modified amino acids exhibit a non-uniform distribution of backbone phi (ϕ) and psi (ψ), and the relative sparsity of the observations of side chain rotamer chi (χi) for specific backbone dihedral angles, we use the adaptive kernel density estimation of Shapovalov and Dunbrack to estimate the rotamer probability density functions (PDFs)35. This method determines the width of the kernel based on the local density of data points, such that in denser regions narrower kernels are applied to capture more local variations while broader kernels create smoother distributions in regions where rotamer occupancy is sparse.
We employ the von Mises kernel36 which is well-suited to periodic data such as dihedral angles. The Nadaraya-Watson kernel regression model37,38 considers the influence of neighboring data points in a weighted manner, producing smoothed estimates of the mean and standard deviation values for χ angles within each ϕ/ψ bin. This smoothing helps mitigate the impact of noise and fluctuations in the data, providing a more robust and reliable characterization of the relationships between the backbone ϕ/ψ and sidechain χ dihedral angles. Bayes’ rule is applied to these rotamer probability density functions to acquire the rotamer probabilities.
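To make the smoothing step concrete, the sketch below computes a Nadaraya-Watson estimate of the mean χ angle at a given ϕ/ψ using a product of von Mises kernels. It is a simplified illustration under stated assumptions: the concentration `kappa` is fixed rather than adapted to the local data density, the kernel is left unnormalized because the normalization cancels in the weighted average, and the weighted circular mean is our choice for averaging a periodic response. Function names are hypothetical.

```python
import math


def von_mises_weight(phi, psi, phi_i, psi_i, kappa):
    """Unnormalized product of two von Mises kernels (angles in degrees).

    The weight is 1 when the query point coincides with the data point and
    decays with angular distance; it is periodic in both phi and psi.
    """
    d_phi = math.radians(phi - phi_i)
    d_psi = math.radians(psi - psi_i)
    return math.exp(kappa * (math.cos(d_phi) + math.cos(d_psi) - 2.0))


def smoothed_chi_mean(phi, psi, samples, kappa=10.0):
    """Kernel-weighted circular mean of chi (degrees) at the query (phi, psi).

    samples: iterable of (phi_i, psi_i, chi_i) observations in degrees.
    """
    sin_sum = 0.0
    cos_sum = 0.0
    for phi_i, psi_i, chi_i in samples:
        w = von_mises_weight(phi, psi, phi_i, psi_i, kappa)
        sin_sum += w * math.sin(math.radians(chi_i))
        cos_sum += w * math.cos(math.radians(chi_i))
    return math.degrees(math.atan2(sin_sum, cos_sum))
```

A larger `kappa` corresponds to a narrower kernel, so an adaptive scheme would raise `kappa` in densely populated ϕ/ψ regions and lower it where observations are sparse.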
Backbone-dependent (BD) PTM rotamer libraries.
In the construction of the BD-rotamer library, we discretized the backbone space into bins, each spanning a 30° interval. This level of granularity allows for a detailed representation of the local backbone geometry and facilitates the accurate prediction of sidechain conformations. The 30° bin size was chosen to balance computational efficiency with the need to capture subtle variations in sidechain orientations influenced by the local backbone structure. Regarding discretized bins which lack occupancy in the rotamer library, the probability and χ angle means and standard deviation are estimated using the backbone-independent (BI) probability distributions.
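The binning and fallback logic can be illustrated as follows. This is a sketch only: the dictionary layout of the libraries and the function names are assumptions made for the example, not the data structures of the released code.

```python
BIN_WIDTH = 30  # degrees, matching the bin size stated above


def backbone_bin(phi, psi, width=BIN_WIDTH):
    """Map (phi, psi) in degrees to integer bin indices covering [-180, 180)."""
    def index(angle):
        wrapped = (angle + 180.0) % 360.0  # shift into [0, 360)
        return int(wrapped // width)
    return index(phi), index(psi)


def lookup_rotamers(phi, psi, bd_library, bi_library):
    """Return the backbone-dependent entry for the bin, if the bin is occupied.

    bd_library: dict mapping (phi_bin, psi_bin) -> rotamer statistics
                (hypothetical layout).
    bi_library: backbone-independent statistics used when the bin is empty.
    """
    return bd_library.get(backbone_bin(phi, psi), bi_library)
```

With 30° bins each backbone angle maps to one of 12 indices, giving 144 ϕ/ψ bins in total; any bin absent from `bd_library` falls back to the backbone-independent statistics.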
When estimating rotamer means and standard deviations and discretizing angle bins for the sidechain rotamer library, we define angle ranges for the trans (T, (120°, 180°) or (−180°, −120°)), gauche+ (G+, (0°, 120°)), and gauche− (G−, (−120°, 0°)) conformers of sp3 carbons (Supplementary Figure S1). For an sp3-sp3 hybridized bond, the dihedral angle probability density distributions contain 3 distinct and symmetric peaks at 60°, −60°, and −180°/180° that align with the conformers mentioned above. For ALY χ5, the rotamer bins are instead defined as gauche+ (G+, (30°, 90°) or (−150°, −90°)), gauche− (G−, (90°, 150°) or (−90°, −30°)) and trans (T, (−30°, 30°) or (−180°, −135°) or (135°, 180°)) given the sp3-sp2 hybridized bond. For PTR χ2, the rotamer bins are defined as gauche (G, (45°, 135°) or (−135°, −45°)) and trans (T, (−45°, 45°) or (−180°, −135°) or (135°, 180°)) to account for the symmetry of a benzene ring. The angle bins are aligned with these conformer-specific ranges, so that each bin corresponds to the appropriate conformer type, and the rotamer means and standard deviations are then estimated within these ranges.
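As an illustration of how a χ angle is assigned to a conformer class, the sketch below encodes the sp3 ranges and the PTR χ2 ranges given above. The function names are hypothetical, and the treatment of values that fall exactly on a boundary is our assumption, since the text specifies open intervals.

```python
def wrap(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def sp3_rotamer(chi):
    """Classify a chi angle about an sp3-sp3 bond as G+, G- or T."""
    a = wrap(chi)
    if 0.0 <= a < 120.0:
        return "G+"
    if -120.0 <= a < 0.0:
        return "G-"
    return "T"  # (120, 180) or (-180, -120)


def ptr_chi2_rotamer(chi):
    """Classify PTR chi2 as G or T; the ranges are symmetric about 0 degrees
    because of the two-fold symmetry of the aromatic ring."""
    a = abs(wrap(chi))
    return "G" if 45.0 <= a < 135.0 else "T"
```

For instance, `sp3_rotamer(-65.0)` falls in the G− range, while `ptr_chi2_rotamer(90.0)` and `ptr_chi2_rotamer(-90.0)` are both assigned to G.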
While canonical amino acids fit well into this discretization strategy, we find that PTMs sometimes do not fit into standard categories. To accommodate non-standard rotamer angle distributions observed for the χ2 of SEP and TPO, and χ3 of PTR, we split the probability distributions by their population density with clusters defined using DBSCAN43. Furthermore, as the terminal χ angles of the phosphorylated SEP, TPO, and PTR and the methylated M3L have a 3-fold symmetry around the torus axis and show broad distributions regardless of the other sidechain angles, we estimated their means and standard deviations by aligning only with the conformer ranges of the angles themselves (Supplementary Figure S2).
Creating protein structures using PTM rotamer libraries for folded proteins and IDPs.
We apply our newly generated PTM libraries to generate side chain ensembles for folded proteins with the Monte Carlo Side Chain Entropy (MCSCE) algorithm.18 The MCSCE algorithm is also embedded within the Local Disordered Region Sampling (LDRS)39 algorithms within IDPConformerGenerator40 for proteins with intrinsically disordered regions (IDRs). The backbone conformers of disordered regions were aligned and attached to folded domains using the LDRS module in IDPConformerGenerator, discarding conformations with steric clashes. For each disordered region case, we sampled 20,000 backbone conformations from a combination of loop, helix, and β-strand regions; successful and complete sidechain packing statistics are given in Supplementary Table S4.
Candidate sidechain packings were generated with the MCSCE18 program for the disordered regions of the unmodified sequence using the Dunbrack library39,44, and with PTMs using the BD-rotamer and BI-rotamer libraries we developed in this work. We also used MCSCE to generate sidechain packings using the SIDEpro25 PTM rotamer libraries. As the SIDEpro rotamer library only defines the additional sidechain angles for PTMs conditioned on the rotamer ranges of the existing angles in the unmodified amino acid residues, these existing sidechain angles were sampled according to the probability distribution of the corresponding canonical amino acids. In both cases, the rotamers of the folded domains are unchanged. The Rosetta28,29 packed conformers were generated by invoking the fixbb application starting from the same backbone conformers attached to the folded domains, likewise with the folded domains held fixed. As Rosetta’s scoring function during conformer packing does not define an explicit clash cutoff as in MCSCE, the Rosetta-generated conformers were also filtered with the same clash criteria as in MCSCE for comparison.
RESULTS
We first consider a BD-rotamer library for SEP, TPO, PTR, M3L, and ALY, which categorizes sidechain conformations based on specific backbone ϕ and ψ dihedral angles. As shown by Dunbrack and co-workers41, a BD library for predicting side chain conformations produces much better results for protein structure refinement using NMR and X-ray data than a BI-rotamer library. The BI-rotamer library focuses solely on the distribution of sidechain dihedral angles, and for PTMs may be a necessity if the data is too sparse to differentiate it from the BD-rotamer case.
The influence of backbone conformations on PTM sidechain rotamer states is illustrated in Figures 1 and 2 (and Supplementary Figures S1–S3). In Figure 1, the χ1 of PTM-modified amino acids shows distinct rotamer populations in different ϕ/ψ regions (and in relation to secondary structure), and more importantly the χ1 distributions for the PTM-modified amino acids also shift considerably from those of the unmodified canonical residues in these regions. For example, while the χ1 of SER, THR and LYS show similar populations regardless of ϕ/ψ ranges, the χ1 of SEP, TPO and M3L have visibly different preferences for certain backbone regions (Figure 1). This illustrates that the SIDEpro rotamer library, which utilizes an unmodified χ1, will be unable to fully capture these structural changes exhibited by the PTMs.
Figure 1. Ramachandran plots color coded by χ1 angle ranges for PTM-modified amino acids using the backbone-dependent library.
Backbone bins in 60° separation are shown as gray dashed lines.
Figure 2. Ramachandran plots color coded by χ1 and χ2 angle ranges for SEP and TPO with hydrogen bond formation using the backbone-dependent library.
Backbone bins in 60° separation are shown as gray dashed lines. Hydrogen bonds are defined by a donor-acceptor distance cutoff of 3.5 Å. A1, A2 and B show representative configurations for SEP (A1, A2) and TPO (B) associated with the annotated backbone regions.
The χ2 angles of the phosphorylated amino acids SEP and TPO adopt rotamers that cannot be easily explained using bond hybridization models, but instead arise from favorable hydrogen bonding interactions45 (Figure 2). In particular, TPO has a dominant χ1/χ2 population centered at −60°/120°, a configuration that encourages formation of an internal P-O/N-H hydrogen bond (Figure 2B). The χ2 of SEP exhibits a mixed population that spans the angular range from 60° to −120°. In addition to the χ1/χ2 −60°/±120° configuration associated with an internal hydrogen bond (Figure 2–A1), we observe the formation of a hydrogen bond between P-O of the SEP residue i and N-H of its adjacent residue i+1 with a χ1/χ2 configuration around −180°/60° (Figure 2–A2). This variety of sidechain rotamer states is consistent with the cooperative transition between a state in which the phosphate group is well-solvated and a state that forms intra- and inter-residue hydrogen bonds, as noted in reference (45). As shown in Figure 2 (and Supplementary Figures S1–S3), the higher percentage of hydrogen bond formation observed in the polyproline helical backbone region, and the resulting selectivity in the χ1/χ2 configurations, help rationalize the distinct rotamer populations for phosphorylated amino acids such as SEP and TPO that are dependent on backbone configurations. All these differences highlight the need to characterize the rotamer states of PTM-modified amino acids separately from those of the canonical amino acids.
We verify the fidelity of our constructed BD-rotamer and BI-rotamer libraries for PTMs by sampling their sidechain torsion distributions to generate new side chain packings and comparing the resulting accuracies of the repacked structures to the experimental PDB data. In Supplementary Figure S4, we show that the sidechain rotamer distributions of the constructed libraries are in good agreement with the PDB data when sampled based on the same backbone conformation. When repacking PTM sidechains, we removed the original PTM sidechain structures and then resampled the rotamers for PTMs with all other residues intact. We compute the RMSD values of the full protein structures with PTMs, regenerated with the BD-rotamer and BI-rotamer libraries from this work and with the SIDEpro and Rosetta libraries, and compare the distributions as boxplots in Figure 3A.
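The RMSD comparison can be sketched as below. This is a minimal illustration that assumes the repacked and experimental structures share the same backbone frame, so no superposition is performed, and that atoms are supplied in matching order; the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input, e.g. Å) between
    two equally ordered lists of (x, y, z) atom coordinates, without fitting."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Because only the PTM sidechains are resampled while all other residues stay in place, any deviation in such a calculation originates from the repacked atoms.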
Figure 3. RMSD distributions of repacked PTM-modified residues using different rotamer libraries to experimental structures.
SIDEpro does not support ALY packing. A). Boxplots for the RMSD distributions. Medians are highlighted in white and each box extends from the first quartile (Q1) to the third quartile (Q3). Outliers in circle are defined as points outside of 1.5 times the interquartile range below Q1 or above Q3. B). Repacked PTM-containing structures using different rotamer libraries compared to the experimental PDB structures. Examples are taken from PDB ID 3CLY (PTR), 4EZH (M3L) and 4QUT (ALY).
The BD-rotamer library has a smaller interquartile range, between 0.25 Å and 1.0 Å, and is skewed toward lower RMSD values across all of the PTM types compared to the other rotamer libraries (Figure 3A and Supplementary Table S5). The BI-rotamer library for PTMs also trends toward improvement over the other standard methods, with the exception of TPO, although the statistical significance of the improvement is not as strong as for the BD-rotamer library. We also performed a Wilcoxon Signed Rank test to evaluate the RMSD differences between each distribution pair at a 95% confidence level; the p-values from this test, given in Supplementary Figure S5, show that the BD-rotamer library results in a statistically meaningful decrease in RMSDs compared to existing methods such as SIDEpro and Rosetta. For all PTM types considered in this work, the PTM BD-rotamer library provides excellent performance in recovering the experimental structures; Figure 3B provides examples of repacked PTM-modified structures using all four methods compared to experimental PDB structures.
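For readers unfamiliar with the test, the sketch below implements a basic two-sided Wilcoxon signed-rank test on paired values. It is a simplified illustration and not necessarily the implementation used for the reported p-values: it assumes the RMSDs are paired by structure, uses the normal approximation for the p-value, and applies no tie or continuity correction.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Returns (W, p) where W is the smaller of the positive and negative rank
    sums and p comes from the normal approximation (no corrections applied).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences and give them the mean rank.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return w, p
```

At a 95% confidence level, a p-value below 0.05 would indicate that one library gives systematically different RMSDs from the other on the same set of structures. For small samples an exact test is preferable to the normal approximation used here.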
To demonstrate how our PTM libraries can support ensemble generation for disordered proteins, we considered two cases for which the proteins contain IDRs. The Histone H3 N-terminal IDR within the nucleosome structure (chains C and G, PDB ID 8SIY) has 4 of the 5 types of PTMs investigated here, with methylation at K9, phosphorylation at S10 and T11, and acetylation at K14 (Figure 4A). The ubiquitin recognition factor in ER-associated degradation protein 1 (UFD1) with phosphorylation at Y219 (PDB ID 2YUJ) is shown in Figure 5A. For both cases we compare ensembles generated with and without PTMs using our BD-rotamer and BI-rotamer libraries as well as libraries from SIDEpro and Rosetta, although SIDEpro only contains M3L, SEP and TPO since ALY is not supported by that method25. We also compared against results using the original Rosetta scoring function that ignores steric clashes for the Histone H3 nucleosome structure. Supplementary Figure S6 shows that the Ramachandran plots of the sidechain rotamer states with PTMs do not change for IDRs, nor among libraries, and only small changes are observed in secondary structure between rotamer libraries as seen in Supplementary Figure S7. This indicates that the structural changes are concentrated in differences in side chain packing.
Figure 4. Histone H3 conformers generated with different PTM libraries.
A). Modifications on the histone H3 N-terminus on chains C/G of the nucleosome. B). Ensembles of 30 all-atom H3-modified nucleosome conformers (folded domains and DNA are taken from PDB ID 8SIY). The N-terminal IDRs on chains C and G are shown in salmon (loop) and cyan (helices), and the folded domains in yellow. PTM-modified residues are highlighted with stick representations. C). Comparison of torsion angle probability distributions for PTM-modified residues in the H3 conformers with different libraries. D). Fractional inter-residue contacts (Cα-Cα distances within 8 Å) of PTM-containing ensembles (δptm) minus those of the ensemble without modifications (δ0) for the IDR regions. The maps were calculated with 500 randomly sampled conformers (based on convergence of Rg) and averaged over 10 trials (blue: fewer contacts, red: more contacts). From left to right: BD-rotamer, BI-rotamer, SIDEpro, Rosetta and Rosetta without clash filtering, with contacts averaged from chains C and G.
Figure 5. UFD1 conformers generated with different PTM libraries.
A). Modifications on the C-terminal IDR of UFD1 at Y219. B). Cartoon representations of 30 all-atom UFD1 conformers (structure of the folded domain taken from PDB ID 2YUJ model). The C-terminal IDR is shown in salmon red (loop) and cyan (helices), and the folded domains in yellow. C). Comparison of torsion angle probability distributions for the PTM in the UFD1 conformers with different libraries. D). Fractional inter-residue contacts (Cα-Cα distances within 8 Å) of PTM-containing ensembles (δptm) minus those of the ensemble without modifications (δ0) for the IDR region around the PTM site. The maps were calculated with ensembles of 500 randomly sampled conformers and averaged over 10 trials. Top: BD-rotamer, BI-rotamer; Bottom (L to R): SIDEpro, Rosetta and Rosetta without clash filtering.
Figures 4C and 5C show that the torsional properties of the PTM IDR ensembles differ from those of IDR ensembles without sidechain modifications at the modification sites. Furthermore, the sidechain rotamer state changes are reflected differently for the BD-, BI-, SIDEpro, and two Rosetta-rotamer ensembles. To better analyze the structural consequences for the IDR ensembles generated with the different PTM rotamer libraries, we constructed 2D contact maps subtracting the values from the ensemble without PTMs (Figures 4D and 5D) to look for increases (red) or decreases (blue) in residue-residue contacts. For Histone H3, residues 5–14 show a higher population of contacts with the BD-rotamer library, which corresponds to a much denser hydrogen-bonded network on average in this area (Figure 4B). A similar observation is found for UFD1 using the BD-rotamer library, and although there is only a single modification site at residue 219, the PTM introduces a network of hydrogen bonds (example shown in Figure 5B). Especially for Histone H3, the other rotamer libraries show a smaller set of contacts or even a net loss of contacts in this same region. In turn the BD-rotamer generated structures show a reduction of contacts made by these same PTM residues with other regions of the protein, an effect which is muted in the other libraries. For the BI-rotamer library this is due to over-averaging, whereas for SIDEpro the small differences from the unmodified ensembles arise because it uses the same χ1 as the unmodified residues. It is interesting that the Rosetta rotamer library combined with no clash criteria gives a nearly opposite trend in sidechain packing for residues with PTMs. Given that repacking structures on the folded protein backbone showed greater reliability with the BD-rotamer set, we infer that there is better justification for the structural consequences observed with this library for the Histone H3 and UFD1 IDRs.
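The contact difference maps can be sketched as follows. This is an illustrative implementation under stated assumptions: each conformer is given as an ordered list of Cα coordinates, both ensembles cover the same residues, and the function names are hypothetical. It does not reproduce the random subsampling and averaging over trials described in the figure captions.

```python
import math


def contact_fraction(ensemble, cutoff=8.0):
    """Fraction of conformers in which each residue pair is in contact.

    ensemble: list of conformers, each a list of (x, y, z) Cα coordinates in Å.
    A pair is in contact when its Cα-Cα distance is within `cutoff`.
    """
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for conformer in ensemble:
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(conformer[i], conformer[j]) <= cutoff:
                    counts[i][j] += 1
                    counts[j][i] += 1
    m = len(ensemble)
    return [[c / m for c in row] for row in counts]


def contact_difference(ptm_ensemble, ref_ensemble, cutoff=8.0):
    """Difference map: contact fractions with PTMs minus those without."""
    with_ptm = contact_fraction(ptm_ensemble, cutoff)
    without = contact_fraction(ref_ensemble, cutoff)
    n = len(with_ptm)
    return [[with_ptm[i][j] - without[i][j] for j in range(n)] for i in range(n)]
```

Positive entries mark residue pairs that are in contact more often in the PTM-containing ensemble, and negative entries mark pairs that are in contact less often.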
DISCUSSION AND CONCLUSION
Modeling sidechain conformations for PTMs has important applications in understanding protein conformational switches, including changes in dynamics and disordered protein interactions, leading to functional consequences modulated by PTMs. Although publicly available high-resolution structures including residues modified by many of the known PTMs are still limited, there is enough structural data for residues modified by phosphorylation, methylation, and acetylation. Thus, we have developed BD-rotamer and BI-rotamer libraries for PTM-modified residues, using structural data curated and cleaned from the recent PTM-SD update.
We evaluate the constructed libraries in comparison to SIDEpro and Rosetta in the context of generating conformers for folded domains and for proteins with disordered regions. The BD-rotamer libraries outperform the BI-rotamer libraries as well as SIDEpro and Rosetta in retrieving the experimental structures for PTMs on the folded proteins, which is evidence of their ability to capture more accurately the correlation between backbone and sidechain, and within sidechain dihedral conformations. For phosphorylated residues specifically, the ability to predict the sidechain rotamers cooperatively is crucial to modeling local hydrogen bonding interactions, thus allowing one to better delineate the structural features of single conformers and ensembles upon chemical modification. We also show that the constructed PTM-modified residue rotamer libraries can be used along with MCSCE and IDPConformerGenerator to produce all-atom conformers for disordered proteins with PTMs. If experimental data such as nuclear magnetic resonance, small-angle X-ray scattering, and fluorescence resonance energy transfer are available, a reweighting protocol using X-EISD46, or evolving underlying ensembles to agree with experimental data using DynamICE42, can be applied to these all-atom sidechain-modified conformers to generate more realistic ensemble representations to investigate PTM-regulated activities and interactions.
Supplementary Material
ACKNOWLEDGEMENTS
J.D.F.-K. and T.H.-G. acknowledge funding from the National Institutes of Health (2R01GM127627-05). J.D.F.-K. also acknowledges support from the Natural Sciences and Engineering Research Council of Canada (NSERC, 2016-06718) and from the Canada Research Chairs Program.
DATA AND CODE AVAILABILITY
The code for dihedral angle computations and library creation is available at https://github.com/THGLab/ptm_sc.git. It supports backbone-dependent and backbone-independent rotamer analysis and library construction and can be extended to other PTMs. We also note that the library is part of the MCSCE program, which is integrated into the IDPConformerGenerator platform, where users can generate conformers with disordered regions with PTMs in one command line.
REFERENCES
- 1.Ramazi S. & Zahiri J. Post-translational modifications in proteins: resources, tools and prediction methods. Database 2021, baab012, doi: 10.1093/database/baab012 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Huang K.-Y. et al. dbPTM in 2019: exploring disease association and cross-talk of post-translational modifications. Nucleic Acids Research 47, D298–D308, doi: 10.1093/nar/gky1074 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Jumper J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589, doi: 10.1038/s41586-021-03819-2 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Baek M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876, doi: 10.1126/science.abj8754 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Lane T. J. Protein structure prediction has reached the single-structure frontier. Nature Methods 20, 170–173, doi: 10.1038/s41592-022-01760-4 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Wolff A. M. et al. Mapping protein dynamics at high spatial resolution with temperature-jump X-ray crystallography. Nature Chemistry 15, 1549–1558, doi: 10.1038/s41557-023-01329-4 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Wayment-Steele H. K. et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 625, 832–839, doi: 10.1038/s41586-023-06832-9 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Uversky V. N., Oldfield C. J. & Dunker A. K. Intrinsically Disordered Proteins in Human Diseases: Introducing the D2 Concept. Annual Review of Biophysics 37, 215–246 (2008). [DOI] [PubMed] [Google Scholar]
- 9.Wright P. E. & Dyson H. J. Intrinsically disordered proteins in cellular signalling and regulation. Nat Rev Mol Cell Biol 16, 18–29 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Bhowmick A. et al. Finding Our Way in the Dark Proteome. J Am Chem Soc 138, 9730–9742, doi: 10.1021/jacs.6b06543 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Lazar T. et al. PED in 2021: a major update of the protein ensemble database for intrinsically disordered proteins. Nucleic Acids Research 49, D404–D411, doi: 10.1093/nar/gkaa1021 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Ghafouri H. et al. PED in 2024: improving the community deposition of structural ensembles for intrinsically disordered proteins. Nucleic Acids Research 52, D536–D544, doi: 10.1093/nar/gkad947 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Fraser J. S. et al. Accessing protein conformational ensembles using room-temperature X-ray crystallography. Proceedings of the National Academy of Sciences of the United States of America 108, 16247–16252, doi: 10.1073/pnas.1111325108 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Moorman V. R., Valentine K. G. & Wand A. J. The dynamical response of hen egg white lysozyme to the binding of a carbohydrate ligand. Protein science : a publication of the Protein Society 21, 1066–1073, doi: 10.1002/pro.2092 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Fenwick R. B., van den Bedem H., Fraser J. S. & Wright P. E. Integrated description of protein dynamics from room-temperature X-ray crystallography and NMR. Proceedings of the National Academy of Sciences of the United States of America 111, E445–454, doi: 10.1073/pnas.1323440111 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Welborn V. V. & Head-Gordon T. Fluctuations of Electric Fields in the Active Site of the Enzyme Ketosteroid Isomerase. Journal of the American Chemical Society 141, 12487–12492, doi: 10.1021/jacs.9b05323 (2019). [DOI] [PubMed] [Google Scholar]
- 17.Richard J. P. Protein Flexibility and Stiffness Enable Efficient Enzymatic Catalysis. Journal of the American Chemical Society 141, 3320–3331, doi: 10.1021/jacs.8b10836 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Bhowmick A. & Head-Gordon T. A Monte Carlo method for generating side chain structural ensembles. Structure 23, 44–55, doi: 10.1016/j.str.2014.10.011 (2015). [DOI] [PubMed] [Google Scholar]
- 19.Dicks L. & Wales D. J. Exploiting Sequence-Dependent Rotamer Information in Global Optimization of Proteins. The Journal of Physical Chemistry B 126, 8381–8390, doi: 10.1021/acs.jpcb.2c04647 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Jumper J. M., Faruk N. F., Freed K. F. & Sosnick T. R. Accurate calculation of side chain packing and free energy with applications to protein molecular dynamics. PLOS Computational Biology 14, e1006342, doi: 10.1371/journal.pcbi.1006342 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Liang S., Zheng D., Zhang C. & Standley D. M. Fast and accurate prediction of protein side-chain conformations. Bioinformatics 27, 2913–2914 (2011).
- 22. Ollikainen N., de Jong R. M. & Kortemme T. Coupling Protein Side-Chain and Backbone Flexibility Improves the Re-design of Protein-Ligand Specificity. PLOS Computational Biology 11, e1004335, doi: 10.1371/journal.pcbi.1004335 (2015).
- 23. Huang X., Pearce R. & Zhang Y. FASPR: an open-source tool for fast and accurate protein side-chain packing. Bioinformatics 36, 3758–3765, doi: 10.1093/bioinformatics/btaa234 (2020).
- 24. McPartlon M. & Xu J. An end-to-end deep learning method for protein side-chain packing and inverse folding. Proceedings of the National Academy of Sciences 120, e2216438120, doi: 10.1073/pnas.2216438120 (2023).
- 25. Nagata K., Randall A. & Baldi P. SIDEpro: a novel machine learning approach for the fast and accurate prediction of side-chain conformations. Proteins 80, 142–153, doi: 10.1002/prot.23170 (2012).
- 26. Misiura M., Shroff R., Thyer R. & Kolomeisky A. B. DLPacker: Deep learning for prediction of amino acid side chain conformations in proteins. Proteins: Structure, Function, and Bioinformatics 90, 1278–1290, doi: 10.1002/prot.26311 (2022).
- 27. Petrovskiy D. V. et al. Modeling Side Chains in the Three-Dimensional Structure of Proteins for Post-Translational Modifications. International Journal of Molecular Sciences 24, 13431 (2023).
- 28. Leaver-Fay A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods in Enzymology 487, 545–574, doi: 10.1016/b978-0-12-381270-4.00019-6 (2011).
- 29. Alford R. F. et al. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. Journal of Chemical Theory and Computation 13, 3031–3048, doi: 10.1021/acs.jctc.7b00125 (2017).
- 30. Schymkowitz J. et al. The FoldX web server: an online force field. Nucleic Acids Research 33, W382–W388, doi: 10.1093/nar/gki387 (2005).
- 31. Renfrew P. D., Craven T. W., Butterfoss G. L., Kirshenbaum K. & Bonneau R. A rotamer library to enable modeling and design of peptoid foldamers. Journal of the American Chemical Society 136, 8772–8782, doi: 10.1021/ja503776z (2014).
- 32. Craveur P., Rebehmed J. & de Brevern A. G. PTM-SD: a database of structurally resolved and annotated posttranslational modifications in proteins. Database 2014, bau041, doi: 10.1093/database/bau041 (2014).
- 33. Berman H. M. et al. The Protein Data Bank. Nucleic Acids Research 28, 235–242, doi: 10.1093/nar/28.1.235 (2000).
- 34. Altschul S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402, doi: 10.1093/nar/25.17.3389 (1997).
- 35. Shapovalov M. V. & Dunbrack R. L. Jr. A Smoothed Backbone-Dependent Rotamer Library for Proteins Derived from Adaptive Kernel Density Estimates and Regressions. Structure 19, 844–858, doi: 10.1016/j.str.2011.03.019 (2011).
- 36. Mardia K. V. & Zemroch P. J. The Von Mises Distribution Function. Journal of the Royal Statistical Society Series C: Applied Statistics 24, 268–272, doi: 10.2307/2346578 (1975).
- 37. Nadaraya E. A. On Estimating Regression. Theory of Probability & Its Applications 9, 141–142, doi: 10.1137/1109020 (1964).
- 38. Watson G. S. Smooth Regression Analysis. Sankhyā: The Indian Journal of Statistics, Series A (1961–2002) 26, 359–372 (1964).
- 39. Liu Z. H. et al. Local Disordered Region Sampling (LDRS) for ensemble modeling of proteins with experimentally undetermined or low confidence prediction segments. Bioinformatics 39, btad739, doi: 10.1093/bioinformatics/btad739 (2023).
- 40. Teixeira J. M. C. et al. IDPConformerGenerator: A Flexible Software Suite for Sampling the Conformational Space of Disordered Protein States. The Journal of Physical Chemistry A 126, 5985–6003, doi: 10.1021/acs.jpca.2c03726 (2022).
- 41. Dunbrack R. L. Jr & Cohen F. E. Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Science 6, 1661–1681, doi: 10.1002/pro.5560060807 (1997).
- 42. Zhang O. et al. Learning to evolve structural ensembles of unfolded and disordered proteins using experimental solution data. The Journal of Chemical Physics 158, doi: 10.1063/5.0141474 (2023).
Data Availability Statement
The code for dihedral angle computation and library creation is available at https://github.com/THGLab/ptm_sc.git. It supports both backbone-dependent and backbone-independent rotamer analysis and library construction, and it can be extended to other PTMs. The library is also part of the MCSCE program, which is integrated into the IDPConformerGenerator platform, so users can generate conformers with disordered regions and PTMs with a single command.
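For readers unfamiliar with the dihedral angle computation mentioned above, the following is a minimal sketch of how a side chain torsion (for example χ1, defined by the N, CA, CB and CG atoms) can be computed from four Cartesian coordinates. It is not taken from the ptm_sc repository; the function name `dihedral_deg` and the example coordinates are hypothetical and chosen only for illustration.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, in [-180, 180]) for atoms p0-p1-p2-p3.

    Hypothetical helper for illustration; the rotation axis is the p1-p2 bond.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("central atoms p1 and p2 coincide")
    b1 = tuple(x / norm for x in b1)

    # Remove the components of b0 and b2 that lie along the central bond,
    # leaving their projections onto the plane perpendicular to it.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates (not from any PDB entry): the first and last atoms
# are offset by a quarter turn around the central bond.
chi = dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(f"dihedral = {chi:.1f} degrees")
```

Using `atan2` rather than `acos` keeps the sign of the angle, which matters when torsions are later binned into rotamer states.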