Protein Science: A Publication of the Protein Society. 2024 Jun 26;33(7):e5065. doi: 10.1002/pro.5065

Assessing the functional roles of coevolving PHD finger residues

Shraddha Basu 1, Ujwal Subedi 1, Marco Tonelli 2, Maral Afshinpour 1, Nitija Tiwari 3, Ernesto J Fuentes 3, Suvobrata Chakravarty 1
PMCID: PMC11201814  PMID: 38923615

Abstract

Although in silico folding based on coevolving residue constraints in the deep‐learning era has transformed protein structure prediction, the contributions of coevolving residues to protein folding, stability, and other functions in physical contexts remain to be clarified and experimentally validated. Herein, the PHD finger module, a well‐known histone reader with distinct subtypes containing subtype‐specific coevolving residues, was used as a model to experimentally assess the contributions of coevolving residues and to clarify their specific roles. The results of the assessment, including proteolysis and thermal unfolding of wildtype and mutant proteins, suggested that coevolving residues make varying contributions, despite their large in silico constraints. Residue positions with large constraints were found to contribute to stability in one subtype but not in the other. Computational sequence design and generative model‐based energy estimates of individual structures were also implemented to complement the experimental assessment. Sequence design and energy estimates distinguished coevolving residues that contribute to folding from those that do not. The results of proteolytic analysis of mutations at positions contributing to folding were consistent with those suggested by sequence design and energy estimation. Thus, we report a comprehensive assessment of the contributions of coevolving residues, as well as a strategy based on a combination of approaches that should enable detailed understanding of residue contributions in other large protein families.

Keywords: AlphaFold2, coevolutionary signal, coevolving residues, protein family, protein stability, proteolysis, sequence design

1. INTRODUCTION

Given that pairs of residues in spatial proximity in folded proteins tend to coevolve (Göbel et al., 1994), coevolutionary signals between residues in homologous protein sequences (Schaarschmidt et al., 2018) have provided a powerful means of accurately estimating inter‐residue contact distances from protein sequences (Balakrishnan et al., 2011; Jones et al., 2012; Kamisetty et al., 2013; Marks et al., 2011; Seemayer et al., 2014; Wang et al., 2017). Coevolutionary signals thus enable protein fold prediction (Bartlett & Taylor, 2008; Marks et al., 2011). These signals are typically inferred from site‐specific residue correlations observed in multiple sequence alignments (MSAs) (De Juan et al., 2013; Hopf et al., 2019; Ju et al., 2021) and are then transformed into inter‐residue distances (or distance/folding constraints) to predict the three‐dimensional (3D) structures of proteins (Ju et al., 2021; Marks et al., 2011; Senior et al., 2020). With an MSA of sufficient depth, sequence information alone can allow for accurate prediction of the 3D structures of proteins (Ju et al., 2021; Marks et al., 2011; Senior et al., 2020). MSAs and the corresponding coevolutionary signals have contributed to the paradigm shift in structural biology (Elofsson, 2023; Jumper & Hassabis, 2022) heralded by AlphaFold2 (AF2; Jumper et al., 2021), RoseTTAFold (Baek et al., 2021), ESM‐2 (Lin et al., 2023), and OpenFold (Ahdritz et al., 2022). The conformational heterogeneity (Sala et al., 2023) that can be obtained in AF2‐predicted structures by altering coevolutionary signals (e.g., by masking coevolving residues (Schafer & Porter, 2023; Stein & McHaourab, 2022) or by using shallow (Del Alamo et al., 2022; Schlessinger & Bonomi, 2022) or subclustered (Monteiro da Silva et al., 2024; Wayment‐Steele et al., 2024) MSAs) further highlights the contribution of coevolutionary signals to structure prediction.
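As a toy illustration of what a "site‐specific residue correlation" in an MSA means, the sketch below computes the mutual information (MI) between two alignment columns. MI is a simple local covariation measure, not the global (direct‐coupling) statistics used by GREMLIN and the other methods cited here; the toy alignment, the function name `column_mi`, and the absence of pseudocounts and sequence reweighting are simplifying assumptions for illustration only.

```python
import math
from collections import Counter


def column_mi(msa, i, j):
    """Mutual information (in nats) between columns i and j of an aligned MSA.

    msa: list of equal-length aligned sequences; gaps are treated as an
    ordinary symbol. No pseudocounts or sequence reweighting are applied,
    although real coevolution pipelines normally use both.
    """
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)             # single-column counts
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)  # pair counts
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 2 covary perfectly (L pairs
# with D, G pairs with W); column 1 is invariant.
toy_msa = ["LCD", "LCD", "GCW", "GCW"]
print(column_mi(toy_msa, 0, 2))  # ln 2, about 0.693
print(column_mi(toy_msa, 0, 1))  # 0.0: an invariant column carries no MI
```

A perfectly covarying pair scores highly even though MI alone cannot tell whether that covariation is direct or transmitted through a third column, which is the limitation the global methods address.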

However, the simple narrative of sequence to structure via MSAs raises questions about the contributions of coevolving residues (i.e., residues with strong coevolutionary signals) to the overall fold of proteins. For example, a single MSA, such as one representing a large protein family (e.g., Pfam (Mistry et al., 2021) domain proteins), can include family members whose structures deviate from one another. Although such a family is described by a single set of coevolving residue signals, the structures of its members could differ. Consequently, a single MSA might encode multiple, possibly conflicting, sets of folding constraints. In addition, coevolutionary signals might also represent functional constraints, which could be misinterpreted as folding constraints. To better understand the contributions of coevolving residue signals, in this study, we probed the coevolving residue positions of the plant homeodomain (PHD) finger module, a functionally diverse Pfam domain, as a model family whose subtypes have differing structures (Figure 1).

FIGURE 1.

Structural and sequence features of PHD subtypes: (a) Cartoon representing PHD finger scaffolds PHD_nW_DD (left) and PHD_W (right). The peptide binding strands are highlighted in black, the 3₁₀‐helix (PHD_nW_DD) and helical inserts (PHD_W) are in cyan, and Zn atoms are pink. (b) AlphaFold2 (AF2) DB structures of structurally uncharacterized PHD fingers. (c) PDB structures of PHD fingers. The coevolving residue positions and residues (green) are also highlighted in (b) and (c). PHD fingers in (b) and (c) share ~40% sequence identity. PHD, plant homeodomain.

We focused on two PHD subtypes, PHD_nW_DD and PHD_W (Boamah et al., 2018) (Figure 1), for convenience, because their structures have been well characterized (Black & Kutateladze, 2023; Chakravarty et al., 2009; Gaurav & Kutateladze, 2023; Nishiyama et al., 2020; Rajakumara et al., 2011; Sanchez & Zhou, 2011; Tsai et al., 2010; Wang et al., 2010; Zeng et al., 2010). Beyond their functional differences (Figure S1), the subtypes differ structurally (Figure 1; Figure S1). For example, a 3₁₀‐helix (Bortoluzzi et al., 2017) preceding the canonical peptide binding site (strands in black, Figure 1) is present in PHD_nW_DD but absent in PHD_W (Figure 1). Conversely, a helical insert is absent in PHD_nW_DD but present in PHD_W (Boamah et al., 2018) (Figure 1). For family members lacking experimental structures, AF2 DB (Varadi et al., 2024) models also capture these structural features (Figure 1b). The subtypes are further characterized by site‐specific residue positions, such as the AW1, AW2, L2, D2, W/G, V, and P positions (Boamah et al., 2018) (Figure 1b,c). For example, an aspartate at D2 or a leucine at L2 is often found in PHD_nW_DD but is typically absent in PHD_W (Figure 1c). Similarly, tryptophan and methionine residues at the W and M positions, respectively, along with other aromatic cage‐forming residues, are often found in PHD_W but are absent in PHD_nW_DD (Boamah et al., 2018) (Figure 1c). Therefore, the residues at the AW1, AW2, L2, D2, V, and P positions have coevolved in PHD_nW_DD (Boamah et al., 2018), whereas the aromatic cage residues have coevolved in the PHD_W subtype.
Because of the co‐occurrence of sequence patterns (e.g., L2, D2, and W) with distinct structural features (e.g., 3₁₀‐helix and helical insert) in a subtype‐specific manner (Figure 1), an outstanding question is whether coevolving residues might physically contribute to the overall fold of the respective subtypes.

Using proteolysis, NMR, circular dichroism (CD), molecular dynamics (MD) simulations, computational protein design, and generative model‐based energetics, we probed the structural consequences of alanine replacement of coevolving residues in PHD finger proteins. Our findings demonstrated that the contributions of coevolving residues are not uniform across subtypes or structures. The coevolving residues in the PHD_W subtype tended to contribute to stability, whereas those in the PHD_nW_DD subtype contributed little. Thus, a high coevolutionary signal for a pair of residues does not necessarily indicate that the "constrained" residues contribute substantially to the fold and stability. The PHD finger coevolving residues lie in and around the peptide binding site and physically interact with the respective peptide substrates (Figure S1). Thus, in the PHD finger family, the coevolving residues are likely to have evolved to function predominantly in peptide binding. Although coevolutionary signals are powerful for estimating inter‐residue distances and in silico folding constraints, they might not translate into physical folding energetics. The detailed comparative characterization of subtypes or family members presented herein, aimed at understanding the roles of coevolving residues, should aid in the characterization of other large protein families.

2. RESULTS

2.1. Manually identified coevolving residues are consistent with those identified through a global statistical method

Global statistical methods (e.g., GREMLIN (Balakrishnan et al., 2011; Kamisetty et al., 2013), CCMPred (Seemayer et al., 2014), EVfold (Marks et al., 2011), and DCA (Morcos et al., 2011)) have substantially improved the extraction of coevolutionary signals from MSAs because they can distinguish direct coupling between MSA columns from merely correlated pairs (Seemayer et al., 2014). GREMLIN (Kamisetty et al., 2013) provides coevolutionary signal scores for the Pfam alignments of large protein domain families (i.e., MSAs of sufficient depth), such as the PHD finger family. We therefore evaluated the GREMLIN (Kamisetty et al., 2013) coevolutionary signal scores, or s_sco values (Methods), for the PHD finger family to determine whether they were consistent with the coevolving residues in the PHD_nW_DD and PHD_W subtypes (Figure 1). Manually subclustered subtype alignments of PHD_nW_DD and PHD_W are shown in Figure 2a. The coevolving residues indeed had high GREMLIN s_sco values (i.e., s_sco ≥ 2, Figure 2c). For example, the inter‐residue s_sco value for the L2 and D2 positions (PHD_nW_DD subtype) was ~2 (Figure 2c). Similarly, the s_sco value for the W and M positions (PHD_W subtype) was ~2 (Figure 2c). Thus, the coevolving residue pairs showed higher signals than random residue pairs in the PHD finger family (Figure 2c). Moreover, the s_sco values were consistent with residue proximity (L2–D2, W–M, Figure 1c). The s_sco value for positions L2 (PHD_nW_DD) and W (PHD_W) was ~3 (Figure 2c). Although a leucine at L2 and a tryptophan at W rarely occurred in the same structure (Figure 1c), the L2 and W positions were indeed proximal to each other (Figure 2b). Importantly, GREMLIN scores pertain to the entire family, whereas the coevolving residues were observed only in the respective subtypes (PHD_nW_DD, PHD_W).
The PHD_nW_DD and PHD_W subtypes constitute only ~20% and ~30%, respectively, of the entire PHD family (Boamah et al., 2018); nevertheless, the coevolutionary signals from the subtype sequences were strong enough to be detected at the family level.
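Global methods first fit a model of all pairwise couplings and then summarize each column pair with a single score. A common post‐processing step in Potts‐model contact prediction (used, e.g., by CCMpred and GREMLIN) is the average product correction (APC), which subtracts the background expected from each column's overall coupling strength. The sketch below applies APC to a symmetric matrix of raw pair scores and ranks the pairs; the input matrix and the helper names `apc_correct` and `top_pairs` are hypothetical, and GREMLIN's exact s_sco scaling (described in Methods) is not reproduced.

```python
def apc_correct(raw):
    """APC-corrected copy of a symmetric L x L matrix of raw pair scores.

    APC(i, j) = S(i, j) - mean_i * mean_j / mean_all, where mean_i is the
    average of row i over off-diagonal entries and mean_all is the average
    over all off-diagonal entries. Diagonal entries are set to 0.
    """
    L = len(raw)
    row_mean = [sum(raw[i][k] for k in range(L) if k != i) / (L - 1)
                for i in range(L)]
    total_mean = sum(row_mean) / L  # equals the mean over all off-diagonal entries
    corrected = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            if i != j:
                corrected[i][j] = raw[i][j] - row_mean[i] * row_mean[j] / total_mean
    return corrected


def top_pairs(scores, n=3):
    """Return the n highest-scoring pairs (i < j) as (i, j, score)."""
    L = len(scores)
    ranked = sorted(((scores[i][j], i, j) for i in range(L) for j in range(i + 1, L)),
                    reverse=True)
    return [(i, j, s) for s, i, j in ranked[:n]]


# Hypothetical raw scores for a 4-column toy alignment.
raw = [
    [0.0, 0.9, 0.2, 0.2],
    [0.9, 0.0, 0.2, 0.2],
    [0.2, 0.2, 0.0, 0.3],
    [0.2, 0.2, 0.3, 0.0],
]
for i, j, s in top_pairs(apc_correct(raw), n=2):
    print(i, j, round(s, 3))
```

Columns that couple weakly to everything (here, columns 2 and 3) can still rank highly after APC if their mutual coupling stands out from their own background.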

FIGURE 2.

PHD subtype alignments and GREMLIN restraints: (a) Pfam alignments of PHD subtypes, PHD_nW_DD (top) and PHD_W (bottom). The position numbers in the subtype alignments are the same as those of the PHD family Pfam master alignment, and the position names (e.g., L2, D2, W, M) are from the earlier study (Boamah et al., 2018). (b) GREMLIN s_sco values (0–3.25) mapped onto the PHD scaffold structure. Coevolving positions (L2, D2, W) are indicated in the structure. (c) The distribution of PHD finger GREMLIN s_sco values. The s_sco values for the residue pairs L2–D2, W–M, and L2–W are indicated with arrows. W and G denote the same position, because the W position in PHD_W is predominantly G in other subtypes. The Pfam alignment of the eight proteins studied here (KDM5B, UHRF1, BAZ2A, KAT6A, BPTF, DIDO1, PHF13, and RAG2) is shown in Figure S1B. PHD, plant homeodomain.

A map of the s_sco values on the PHD scaffold suggested that the PHD coevolving residue positions (e.g., L2, D2, W, and M) lie within the binding site (Figure 2b), which was the region with the highest coevolutionary signals (Figure 2b). In contrast, in other peptide‐binding domains (e.g., PDZ and bromodomain), the binding site residues were not necessarily in the regions with the highest coevolutionary signals (Figure S2a). In this respect, the PHD finger binding site differs from the binding sites of other families. The GREMLIN s_sco values were also consistent with structural deviations (i.e., root‐mean‐square deviation, RMSD) among remote homologs of each peptide‐binding domain (Figure S2b): the PHD finger binding site (i.e., the pair of strands) showed smaller structural deviation among remote homologs than the rest of the structure, whereas other peptide‐binding domains showed larger structural deviations at the binding site (Figure S2b). The occurrence of binding site residues within the region with the highest coevolutionary signals raised the question of whether PHD coevolving residues contribute to (a) overall folding, (b) peptide binding, or (c) both folding and peptide binding. This study was aimed at answering this question.
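One simple way to "map" pairwise scores onto a structure is to give each residue the largest score of any pair it participates in and then compare the binding‐site average with the rest of the domain. The per‐residue summary (a maximum rather than, say, a sum), the toy scores, the residue indices, and the helper names below are illustrative assumptions, not the procedure used to make Figure 2b.

```python
from statistics import mean


def per_residue_max(pair_scores, length):
    """Give each residue the largest score of any pair it belongs to.

    pair_scores: iterable of (i, j, score) with 0-based residue indices.
    Residues in no listed pair get 0.0 (assumes non-negative scores).
    """
    per_res = [0.0] * length
    for i, j, s in pair_scores:
        per_res[i] = max(per_res[i], s)
        per_res[j] = max(per_res[j], s)
    return per_res


def site_vs_rest(per_res, site):
    """Mean per-residue score inside a residue set versus outside it."""
    site = set(site)
    inside = [s for k, s in enumerate(per_res) if k in site]
    outside = [s for k, s in enumerate(per_res) if k not in site]
    return mean(inside), mean(outside)


# Hypothetical 8-residue example; residues 1, 4 and 5 form the "binding site".
pairs = [(1, 4, 3.0), (1, 5, 2.0), (0, 7, 0.5), (2, 6, 0.8)]
print(site_vs_rest(per_residue_max(pairs, 8), site=[1, 4, 5]))
```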

2.2. Selection of PHD finger proteins and residue positions according to proteolytic susceptibility

To answer this question, we first assessed the proteolytic susceptibility of four representative proteins from each subtype (PHD_nW_DD: KDM5B, UHRF1, BAZ2A, and KAT6A; PHD_W: BPTF, DIDO1, RAG2, and PHF13) before further mutational analysis. The pairwise sequence identity among the PHD fingers of these eight proteins was ~40%. Proteolytic susceptibility was assessed through fast parallel proteolysis (FASTpp; Minde et al., 2012; Park & Marqusee, 2005) experiments with purified proteins (Figure 3; Figures S3 and S4, Table 1). FASTpp probes stability by monitoring the extent of proteolysis of a protein by thermolysin (TL) over a temperature range of 10–80°C (Figure 3; Figures S3 and S4). FASTpp indicated that proteolytic susceptibility varied greatly among PHD finger family members, even within a subtype (Table 1). For example, in the PHD_nW_DD subtype, KDM5B resisted proteolysis at temperatures as high as 80°C, whereas the others did not (Table 1, Figure 3; Figure S3). The temperature (Tmid) at which an ~50% decrease in the amount of protein due to proteolysis was observed is listed in Table 1 for each PHD finger. The Tmid for the UHRF1 and BAZ2A PHD fingers was 60 and 50°C, respectively, whereas KAT6A was proteolyzed at temperatures as low as 30°C. Similarly, Tmid values varied among PHD_W family members (Table 1; Figure S3): the Tmid for BPTF, DIDO1, RAG2, and PHF13 was 60, 40, 60, and 40°C, respectively. TL preferentially cleaves peptide bonds on the N‐terminal side of nonpolar amino acids (F, I, L, M, V, and A). The KDM5B and BPTF PHD fingers contained 16 and 13 of these nonpolar residues, respectively, whereas the remaining PHD fingers contained 14–21. Therefore, the observed differences were considered unlikely to have arisen from differences in the amino acid compositions of the scaffolds.
These findings indicated that the proteolytic susceptibility (and, probably, the thermal stability) of PHD finger proteins varied, even though proteins within a subtype share similar coevolving residue signals. The PHD finger proteolysis results were also compared with those for control proteins (Figure S4) to ensure that the proteolytic behavior of PHD fingers was consistent with that of other proteins.
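The composition check described above amounts to counting the residues that TL prefers at the cleavage site. A minimal sketch, assuming the sequence is a plain one‐letter string; the helper name `tl_site_positions` and the example string are hypothetical (the string is not one of the PHD fingers studied), and only residue identity is considered.

```python
# Residues named in the text as thermolysin (TL) preferences; TL cuts the
# peptide bond on the N-terminal side of such a residue.
TL_PREFERRED = frozenset("FILMVA")


def tl_site_positions(seq):
    """0-based indices of F/I/L/M/V/A residues that have a preceding peptide bond.

    Index 0 is skipped because the N-terminal residue has no bond before it.
    Only residue identity is considered (no further sequence context and no
    structure), so this is a crude count of potential cleavage sites.
    """
    s = seq.upper()
    return [k for k in range(1, len(s)) if s[k] in TL_PREFERRED]


example = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical sequence, not a PHD finger
print(tl_site_positions(example))       # [4, 7, 9, 10, 17]
print(len(tl_site_positions(example)))  # 5
```

Such a count says nothing about accessibility: in a folded domain, most of these sites are protected, which is why FASTpp reads out stability rather than composition.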

FIGURE 3.

FASTpp of KDM5B and BPTF PHD wild type proteins: SDS‐PAGE showing proteolysis of KDM5B (top) and BPTF (bottom) PHD wild type by thermolysin (TL). The presence and absence of TL are indicated by "+" (right) and "–" (left), respectively. The temperature range was 10–80°C at 10°C intervals. Tmid, the temperature at which the amount of protein is reduced to half due to proteolysis, is indicated. PHD, plant homeodomain.

TABLE 1.

Proteolytic susceptibility of PHD fingers.

PHD finger      Tmid (°C), FASTpp
PHD_nW_DD
  KDM5B         NDᵃ
  UHRF1         60
  BAZ2A         50
  KAT6A         <30
PHD_W
  BPTF          60
  DIDO1         40
  RAG2          60
  PHF13         40

Abbreviation: PHD, plant homeodomain.

ᵃ ND represents no degradation from proteolysis.
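Tmid values in Table 1 are reported at the 10°C resolution of the FASTpp temperature series. If band intensities were quantified, one hypothetical way to estimate Tmid more finely is to interpolate linearly where the fraction of intact protein (TL‐treated band relative to the matching no‐TL band) crosses 50%. The function name `estimate_tmid`, this normalization, and the intensities below are assumptions for illustration, not the analysis used here.

```python
def estimate_tmid(temps, fraction_intact, threshold=0.5):
    """Interpolated temperature at which fraction_intact first drops to `threshold`.

    temps: ascending temperatures (°C); fraction_intact: intact-band fraction
    (TL lane / no-TL lane) at each temperature (hypothetical normalization).
    Returns the lowest temperature if the protein is already at or below the
    threshold there, or None if it never drops that far (cf. "ND").
    """
    if fraction_intact[0] <= threshold:
        return temps[0]
    for t0, t1, f0, f1 in zip(temps, temps[1:], fraction_intact, fraction_intact[1:]):
        if f0 > threshold >= f1:
            # Straight-line interpolation between the two bracketing points.
            return t0 + (f0 - threshold) * (t1 - t0) / (f0 - f1)
    return None


temps = [10, 20, 30, 40, 50, 60, 70, 80]
fractions = [1.0, 1.0, 0.95, 0.9, 0.7, 0.3, 0.1, 0.0]  # hypothetical densitometry
print(estimate_tmid(temps, fractions))  # about 55
```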

To probe the energetic contributions of coevolving residue positions through mutagenesis, we selected KDM5B, UHRF1, BPTF, and DIDO1 (i.e., two proteins from each subtype) according to the proteolysis outcomes of the wildtype proteins. KDM5B appeared to be the least susceptible to proteolysis, whereas the other three proteins showed intermediate susceptibility. We selected five positions (AW1, AW2, L2, D2, and V; alignment in Figure 2a) for KDM5B and UHRF1 (Table 2) on the basis of an earlier report (Boamah et al., 2018) indicating that coevolving residues at these positions are characteristic features of the PHD_nW_DD subtype. Two or three aromatic cage positions in BPTF and DIDO1 (Table 3) were chosen because these aromatic cage residues co‐occur (i.e., coevolve) in the PHD_W subtype. Positions D1 and L1 were selected as controls (i.e., non‐coevolving residues) because an aspartate at D1 and a leucine at L1 are observed in more than 70% of PHD fingers, regardless of subtype. In KDM5B, the E321A mutant (i.e., AW1 on the 3₁₀‐helix) began to degrade above 50°C and D322A (i.e., AW2 on the 3₁₀‐helix) began to degrade at 70°C, whereas the alanine mutants at the remaining positions did not show proteolytic degradation, even at 80°C (Table 2, Figure S5). For UHRF1, the proteolytic behavior of the alanine mutants did not deviate substantially from that of the wildtype protein; if anything, the UHRF1 mutants appeared slightly less susceptible to proteolysis (Tables 1 and 2, Figure S6). These observations suggested that the coevolving residues of KDM5B and UHRF1 probably contribute little to stability. These mutants were further evaluated for perturbations in structure and dynamics. Because the L2, D2, and W positions had higher GREMLIN scores, we focused on the L2, D2, and W position mutants rather than the remaining mutants.
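The control positions were chosen because a single residue type (aspartate at D1, leucine at L1) occurs in more than 70% of PHD fingers. A minimal sketch of that kind of column‐conservation check on aligned sequences follows; counting gapped sequences in the denominator, the helper names, and the toy alignment are all assumptions.

```python
from collections import Counter


def dominant_residue(msa, col):
    """Most common non-gap residue in a column and its frequency over ALL sequences."""
    counts = Counter(seq[col] for seq in msa if seq[col] != "-")
    if not counts:
        return "-", 0.0
    aa, n = counts.most_common(1)[0]
    return aa, n / len(msa)


def conserved_columns(msa, cutoff=0.7):
    """Columns whose dominant residue occurs in more than `cutoff` of sequences."""
    return [c for c in range(len(msa[0])) if dominant_residue(msa, c)[1] > cutoff]


# Hypothetical toy alignment: column 0 is mostly D, column 1 mostly L,
# column 2 is variable.
toy_alignment = ["DLC", "DLW", "DIC", "ELG", "DL-"]
print(conserved_columns(toy_alignment))  # [0, 1]
```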

TABLE 2.

Proteolytic susceptibility of PHD_nW_DD finger mutants.

Positions Tmid (°C) (FASTpp)
KDM5B UHRF1
D1 D328A ND
L1 L344A 70
AW1 E321A 50
AW2 D322A 70
L2 L326A ND
D2 D331A ND D350A 70
V V346A ND V365A 70

Abbreviation: PHD, plant homeodomain.

TABLE 3.

Proteolytic susceptibility of PHD_W finger aromatic cage mutants.

Position                BPTF mutant   Tmid (°C), FASTpp   DIDO1 mutant   Tmid (°C), FASTpp
Wildtype                              60                                 40
W                       W32A          <30                 W29A           Low protein yield
Y                       Y10A          <30                 Y8A            Low protein yield
Another aromatic cage   Y17A          <30

2.3. Mutation of coevolving residues causes negligible structural perturbations in KDM5B and UHRF1

The structural stability of the L2, D2, and V position alanine mutants of KDM5B (Figure S7) and UHRF1 (Figure S8) at 25°C was probed with two‐dimensional ¹H‐¹⁵N HSQC spectroscopy. The well‐dispersed ¹H‐¹⁵N HSQC peaks of the mutants (Figures S7 and S8) overlapped with those of the wildtype protein, suggesting that the mutations caused negligible structural perturbation at 25°C. Structural perturbation was further probed by recording ¹H‐¹⁵N HSQC spectra while raising the temperature from 5 to 65°C (Figure 4). Even at 65°C, neither the wildtype nor the mutant proteins showed signs of structural loss or unfolding (Figure 4). CD spectroscopy also indicated little unfolding of the wildtype and mutant proteins, even at temperatures as high as ~100°C (Figure S9).
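HSQC overlays such as these are often quantified with a combined amide chemical shift perturbation (CSP). One common weighting, assumed here (conventions vary, including an extra factor of 1/2 inside the square root), is Δδ = sqrt(ΔδH² + (α·ΔδN)²) with α ≈ 0.14 to scale down the larger ¹⁵N shift range. The residue labels, shift values, and the helper name `combined_csp` are hypothetical.

```python
import math


def combined_csp(wt, mut, alpha=0.14):
    """Combined 1H/15N chemical shift perturbation per residue (ppm).

    wt, mut: dicts mapping a residue label to (delta_H, delta_N) in ppm.
    Only residues assigned in both spectra are compared. alpha scales the
    15N change; 0.14 is one commonly used value (an assumption here).
    """
    out = {}
    for res in sorted(wt.keys() & mut.keys()):
        d_h = mut[res][0] - wt[res][0]
        d_n = mut[res][1] - wt[res][1]
        out[res] = math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)
    return out


# Hypothetical peak positions for three residues.
wt = {"res1": (8.20, 110.0), "res2": (7.90, 121.0), "res3": (8.50, 118.0)}
mut = {"res1": (8.23, 110.4), "res2": (7.90, 121.0)}  # res3 unassigned in mutant
for res, value in combined_csp(wt, mut).items():
    print(res, round(value, 3))
```

Small, uniformly low CSP values across assigned residues would correspond to the overlapping spectra described above.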

FIGURE 4.

Temperature scan of KDM5B PHD wild type and mutant proteins: ¹H‐¹⁵N HSQC spectra recorded over the temperature range 5–65°C for the wild type (a) and the mutants L326A (b), D331A (c), and V346A (d). The temperature scans for the wild type and mutant proteins were recorded at 5 and 10°C intervals, respectively.

2.4. Coevolving residues are unlikely to substantially contribute to KDM5B and UHRF1 dynamics

We next assessed whether the coevolving residues might contribute to the dynamics necessary for KDM5B and UHRF1 function. Comparison of the relaxation parameters (T1, T2, and NOE) of the wildtype and mutant forms of KDM5B (Figure 5; Figure S10) and UHRF1 (Figure S10) indicated that the dynamics did not differ between the wildtype and mutant proteins. Therefore, the coevolving residue positions were unlikely to contribute to functionally relevant motions. In summary, on the basis of the above results, the contribution of KDM5B and UHRF1 coevolving residues to protein stability and motion appears to be small.
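A common way to condense backbone T1 and T2 data for a rigid, roughly isotropically tumbling domain is the rotational correlation time estimated from the T1/T2 ratio, τc ≈ sqrt(6·T1/T2 − 7)/(4πν_N), where ν_N is the ¹⁵N Larmor frequency in Hz. The sketch below implements that approximation; the field strength (ν_N = 60.8 MHz, i.e., a 600 MHz spectrometer), the T1/T2 values, and the helper name `tau_c_ns` are assumptions, not data from this study.

```python
import math


def tau_c_ns(t1_s, t2_s, nu_n_hz=60.8e6):
    """Rotational correlation time (ns) from the 15N T1/T2 ratio.

    Uses tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N), an approximation for rigid
    residues of a roughly isotropically tumbling protein. nu_N is the 15N
    Larmor frequency in Hz; 60.8 MHz assumes a 600 MHz spectrometer.
    """
    arg = 6.0 * t1_s / t2_s - 7.0
    if arg <= 0:
        raise ValueError("T1/T2 ratio too small for this approximation")
    return math.sqrt(arg) / (4.0 * math.pi * nu_n_hz) * 1e9


# Hypothetical average T1/T2 values (seconds) for two protein variants.
print(round(tau_c_ns(0.60, 0.10), 2))
print(round(tau_c_ns(0.61, 0.10), 2))
```

Closely matching τc estimates for wildtype and mutant would be consistent with unchanged overall tumbling, complementing the per‐residue comparison in Figure 5.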

FIGURE 5.

Comparison of relaxation of KDM5B PHD wild type and L2 mutant: T1 (top), T2 (middle), and NOE (bottom) of the wild type are compared with those of the L2 (L326A) mutant. The comparison for the remaining mutants is in Figure S10.

2.5. Zn coordination is central to the PHD finger fold and stability

Because Zn atoms are at the core of the PHD finger structure, we purified the KDM5B PHD finger from M9 minimal medium with and without ZnCl₂ to test whether the proteolysis results differed with or without the Zn atom. The protein purified from M9 medium without ZnCl₂ (i.e., lacking the Zn atom in the folded structure, Figure S11a, top) degraded even at 10°C (Figure S11b, top), whereas the protein purified from M9 medium with ZnCl₂ (i.e., retaining the Zn atom in the folded structure, Figure S11b) resisted proteolytic degradation (Figure S11c, bottom). This finding suggested that Zn nucleation, rather than the residues at the coevolving sites, had a dominant influence on structural stability. A time‐course experiment at 25°C also showed that KDM5B PHD wildtype and mutant proteins with reconstituted Zn resisted proteolytic degradation even after 3 h of TL exposure, whereas the control protein GST began to degrade after 1 h of TL exposure (Figure S12).

2.6. Coevolving residue contributions differ among family members

We next tested the proteolytic susceptibility of alanine substitutions of aromatic cage residues of the PHD_W subtype (Table 3). In the PHD_W Pfam alignments, aromatic cage positions co‐occur with the W and M positions. Therefore, along with the W and M positions, the aromatic cage residues that anchor histone H3K4me3 were considered coevolving residues in the PHD_W subtype. Interestingly, the coevolving residue alanine mutants of the PHD_W subtype were more susceptible to proteolysis, and thus less stable, than the respective wildtype proteins. For example, the BPTF PHD mutants W32A, and particularly Y10A and Y17A, were more easily proteolyzed than the wildtype protein (Table 3; Table S13). A 1 μs MD simulation suggested that the Y10A substitution disrupts BPTF PHD secondary structure elements, such as the PHD_W helical insert, relative to the wildtype protein (Figure S14), which probably contributes to the mutant's higher proteolytic susceptibility. In DIDO1, the purification yields of the aromatic cage residue mutants were severely compromised relative to that of the wildtype protein, which hindered proteolytic analysis. These findings suggested differences in physical properties between wildtype DIDO1 and its coevolving residue mutants. Thus, the coevolving residues contributed negligibly to folding and stability in the PHD_nW_DD subtype but did contribute in the PHD_W subtype; coevolving residue contributions differ across members of the same family.

The difference in behavior between the wildtype and aromatic cage mutants is probably due to the ability of bulky aromatic residues, with their large surface areas and π electrons, to engage in a variety of intra‐protein interactions. For example, the BPTF cage residues Y10 and Y17 engage in anion–quadrupole (AQ; Chakravarty et al., 2018) and hydrogen‐bonded AQ (HBAQ) interactions (Figure S15); substituting these residues likely disrupted these interactions and compromised stability. Not only the BPTF PHD finger but also nearly all histone Kme readers have aromatic cage residues that engage in AQ/HBAQ interactions (Figure S15b). Therefore, beyond their inter‐protein cation–π interactions, histone reader cage residues are likely to frequently engage in intra‐protein interactions such as AQ. Histone readers often have Asp/Glu and aromatic residues, which engage in electrostatic and π‐mediated interactions, respectively, with positively charged histone peptides. Coevolving residues that have evolved for function, such as the aromatic cage residues well known for their Kme3 binding function, can also engage in intra‐protein interactions and consequently contribute to folding and stability.

2.7. Computational sequence design further highlights the roles of coevolving residues

Deep‐learning‐based in silico protein design algorithms (e.g., ProteinMPNN; Dauparas et al., 2022), which return sequences predicted to fold into a given fixed backbone structure, can also help clarify the contributions of residue positions to folding and stability. Given that several recent successful redesign studies (de Haas et al., 2024; Goudy et al., 2023; Kao et al., 2023) have used ProteinMPNN (Dauparas et al., 2022), particularly for enhancing the physical properties of native proteins (Sumida et al., 2024), we applied this method to redesign the sequences of PHD finger subtype scaffolds. In such an in silico design experiment, residues necessary for the structural fold are expected to be retained in the designed sequences, whereas positions with small energetic contributions to folding are expected to be altered more frequently. We therefore assessed whether the amino acids at the respective coevolving residue positions were retained or altered in the designed sequences, as an indirect measure of the contributions of the PHD coevolving residues (Figure 6; Figure S16).

FIGURE 6.

Profile of ProteinMPNN designed sequences of PHD subtypes: (a) The sequence logo of the designed sequences of 9 structures of the PHD_nW_DD subtype (bottom) is compared with the background (i.e., the corresponding Pfam alignment) (top). For visual convenience, the sequence logo of three segments of the PHD finger structure is shown here. The sequence logo of the full‐length structure is in Figure S16. The black dotted lines indicate the retention of the specific amino acid whereas the red dotted line indicates a lack of retention at coevolving positions. (b) and (d) compare the amino acid (aa) occurrence frequency at coevolving positions between designed sequences and the background (Pfam) for PHD_nW_DD (left) and PHD_W subtype (right), respectively. The occurrence frequency in red indicates a large difference between the background (Pfam) and designed sequences. (c) The sequence logo of the designed sequences of 16 structures of PHD_W subtype (bottom) is compared with the corresponding background Pfam alignment (top). The dotted lines indicate the same as in (a). D1 and L1, present in both subtypes (i.e., the whole family), are reference positions.

Sequence design of several backbone structures of PHD_nW_DD (nine structures) and PHD_W (16 structures) subtype scaffolds with ProteinMPNN (Dauparas et al., 2022) returned several redesigned sequences per structure. All redesigned sequences for a subtype were aligned to create a sequence profile, which was then compared with the respective background Pfam sequence profile (Figure 6). The profile of the designed sequences largely matched the respective subtype Pfam alignments (gray dotted lines in sequence logo, Figure 6a,c; Figure S16). For example, the residues at key Zn‐anchoring positions and several other family‐defining positions were retained in the designed PHD scaffold sequences (Figure 6a,c). In a control ProteinMPNN run, we removed the Zn atoms to verify whether the key cysteine and histidine residues for Zn coordination would be retained. The Zn‐coordinating residues were rarely replaced, suggesting that the fold‐determining residues were likely to have been correctly identified.
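The profile comparison above reduces, at each position, to comparing how often a residue type appears in the designed sequences versus the background alignment. A minimal sketch, assuming both sets are already mapped onto a common column numbering (fixed‐backbone design preserves length, but mapping onto Pfam columns is a separate step not shown); the sequences and helper names are hypothetical.

```python
from collections import Counter


def column_frequencies(aligned, col):
    """Fraction of sequences carrying each residue type at column `col`."""
    counts = Counter(seq[col] for seq in aligned)
    return {aa: n / len(aligned) for aa, n in counts.items()}


def compare_to_background(designed, background, col, aa):
    """(designed frequency, background frequency, difference) of `aa` at `col`."""
    f_design = column_frequencies(designed, col).get(aa, 0.0)
    f_background = column_frequencies(background, col).get(aa, 0.0)
    return f_design, f_background, f_design - f_background


# Hypothetical sequences already mapped onto a common column numbering.
background = ["CLD", "CLD", "CLE", "CLD", "CVD"]  # Pfam-like background
designed = ["CAD", "CLD", "CEK", "CAE"]           # designs for the same backbone
print(compare_to_background(designed, background, 0, "C"))  # (1.0, 1.0, 0.0)
print(compare_to_background(designed, background, 1, "L"))
```

In this toy case, column 0 behaves like a retained, fold‐defining cysteine, whereas the leucine at column 1 is largely lost in the designs, the pattern described next for the L2, D2, and V positions.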

However, in the designed sequences of the PHD_nW_DD subtype backbone structures, the amino acids (leucine, aspartate, and valine) at the coevolving positions (L2, D2, and V) were not retained (red dotted lines, Figure 6a). The position‐specific occurrence frequencies of amino acids at the coevolving residue positions are tabulated in Figure 6b,d. In the designed sequences, the frequencies of leucine, aspartate, and valine at the L2, D2, and V positions of PHD_nW_DD (Figure 6b, red) differed markedly from the background Pfam frequencies. The loss of these residues at the respective coevolving positions in the designed sequences further suggested that they contribute little to structure, folding, and stability, in agreement with the FASTpp, NMR, and CD results above. Interestingly, the AW1 and AW2 positions (responsible for the 3₁₀‐helix in PHD_nW_DD) frequently retained negatively charged residues (Figure 6a). These findings suggested that positions influencing the structure were retained more frequently than those without such an influence (i.e., the very premise of the sequence design check). Consistent with the ProteinMPNN replacement frequencies, the proteolytic susceptibility of the KDM5B AW1 mutant E321A (Table 2, Figure S5) was higher than those of the L2, D2, and V position mutants.

In contrast, the designed sequences for PHD_W subtype backbones always retained the characteristic coevolving tryptophan at the W position (Figure 6c,d) and often retained other coevolving residues of the subtype (Figure 6d). For example, the cage‐forming residues at the Y and M positions were observed in 43% and 50% of the designed sequences, respectively, frequencies comparable to those in the background Pfam PHD_W alignment. Thus, in addition to peptide binding, the W position (and other cage‐forming positions) is expected to contribute to structure, folding, and stability, in agreement with the FASTpp results for the BPTF W32A mutant. The roles of coevolving residues therefore differ according to the subtype and their position in the structure (e.g., AW vs. L2/D2) (Figure 6). The median pairwise sequence identities of the 9 PHD_nW_DD and 16 PHD_W input sequences were 39% and 30%, respectively; this low sequence identity suggests that the results are unlikely to be biased toward a particular group of sequences.
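Median pairwise identity, as quoted above for the input structures, can be computed from aligned sequences as sketched below. Identity definitions differ mainly in the denominator; this sketch assumes columns where either sequence has a gap are ignored, and the toy sequences and helper names are hypothetical.

```python
from itertools import combinations
from statistics import median


def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def median_pairwise_identity(aligned):
    """Median percent identity over all unordered pairs of aligned sequences."""
    return median(percent_identity(a, b) for a, b in combinations(aligned, 2))


toy = ["ACDE", "ACDF", "GCDF"]  # hypothetical aligned sequences
print(median_pairwise_identity(toy))  # 75.0
```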

2.8. Energetics of PHD finger coevolving residues from generative models

Although the ProteinMPNN‐designed sequences provided amino acid replacement frequencies at specific positions of PHD finger family structures, ProteinMPNN (Dauparas et al., 2022) does not provide explicit values for the energetic contributions of these residues to the PHD fold. The generative model‐based inverse folding (IF) approach of the Evolutionary Scale Model (ESM‐IF) (Hsu et al., 2022) can be used to estimate the change in folding free energy (ΔΔG) upon amino acid replacement in a structure. ESM‐IF, which samples amino acid types from position‐specific likelihoods, can estimate ΔΔG for a substitution by comparing the likelihoods of the wildtype and variant sequences (Cagiada et al., 2024; Chen et al., 2022; Hsu et al., 2022; Notin et al., 2024; Reeves & Kalyaanamoorthy, 2023). To determine the energetic contributions (ΔΔG) of coevolving residues, we used ESM‐IF (stab_ESM_IF; Cagiada et al., 2024) to predict the absolute free energy change of a protein upon folding (Cagiada et al., 2024). stab_ESM_IF also provides the contribution of each residue to the overall protein fold (referred to herein as the stabESM score). The stabESM scores are normalized to the range 0–1 (Cagiada et al., 2024); residues with stabESM scores of 1 and 0 are considered to have the largest and smallest contributions, respectively, to the parent fold.
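Likelihood‐based stability proxies from inverse‐folding models compare how probable the wildtype and variant residues are given the backbone. The sketch below shows only that comparison for a single substitution, using a hypothetical table of per‐position log‐probabilities in place of real model output; it does not call ESM‐IF, and it does not reproduce stab_ESM_IF's calibration to ΔΔG or its 0–1 residue normalization. The helper name `log_likelihood_ratio` is an assumption.

```python
import math


def log_likelihood_ratio(log_probs, pos, wt_aa, mut_aa):
    """log p(mut_aa | structure) - log p(wt_aa | structure) at one position.

    log_probs: list of dicts, one per position, mapping residue -> log-probability
    (hypothetical stand-in for an inverse-folding model's output).
    Negative values mean the model finds the variant residue less compatible
    with the backbone than the wildtype residue.
    """
    row = log_probs[pos]
    return row[mut_aa] - row[wt_aa]


# Hypothetical two-position table (probabilities are made up).
log_probs = [
    {"L": math.log(0.6), "A": math.log(0.1), "V": math.log(0.3)},
    {"D": math.log(0.2), "A": math.log(0.2), "E": math.log(0.6)},
]
print(round(log_likelihood_ratio(log_probs, 0, "L", "A"), 3))  # log(0.1 / 0.6)
print(log_likelihood_ratio(log_probs, 1, "D", "A"))            # 0.0
```

A position where the wildtype residue and alanine are equally likely (position 1 here) is one the model treats as weakly constrained, analogous to the low scores reported below for L2 and D2.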

Nine and 16 structures of PHD_nW_DD and PHD_W subtype scaffolds, respectively, were used for stab_ESM_IF (Cagiada et al., 2024) calculations. The average stabESM scores for the selected coevolving residue positions in these structures are summarized in Table 4. As in the sequence design exercise above, PHD finger structures with the Zn atoms removed from the coordinate file were used for the stab_ESM_IF calculations. The Zn‐coordinating cysteine and histidine residues were correctly identified as residues with a high average stabESM score (~1; Table 4). Interestingly, the stabESM scores for the L2 and D2 positions were close to 0, whereas those of the aromatic cage positions were close to 0.5. Thus, the average stabESM scores were higher for the PHD_W coevolving residues than for the PHD_nW_DD coevolving residues (e.g., the L2 and D2 positions), in agreement with the proteolysis outcomes and ProteinMPNN substitution frequencies. Notably, the 3₁₀‐helical AW1 position had a higher score than the L2 and D2 positions, again in agreement with the proteolysis results and the ProteinMPNN substitution frequency. Like their ProteinMPNN retention frequencies, family‐specific positions such as L1 and D1 also showed high stabESM scores. In summary, the stabESM scores further support the conclusion that the contributions of coevolving residues can vary across members of a family. The standard deviations of the average stabESM scores (Table 4) are also large, suggesting that even within a subtype, the contribution of a coevolving residue can vary from one structure to another.

TABLE 4.

stab_ESM_IF residue contribution scores.

Residue position | stabESM score (mean ± SD)
Zn‐coordinating cysteine and histidine | 0.95 ± 0.05
PHD_nW_DD subtype
AW1 | 0.39 ± 0.39
AW2 | 0.21 ± 0.13
L2 | 0.08 ± 0.10
D2 | 0.10 ± 0.07
V | 0.30 ± 0.34
L1 | 0.71 ± 0.04
D1 | 0.46 ± 0.34
PHD_W subtype
W | 0.46 ± 0.25
M | 0.48 ± 0.31
Y | 0.53 ± 0.39

Abbreviation: PHD, plant homeodomain.

3. DISCUSSION

Although coevolving residues provide constraints for in silico folding (Hopf et al., 2019; Ju et al., 2021), their energetic contributions in the physical context require experimental assessment, because the physics of the true interactions necessary for folding remains unclear (Ooka & Arai, 2023). Using the PHD finger as a model, we assessed whether coevolving residues physically influence the structure and folding of PHD finger proteins. Although PHD finger coevolving residues had been identified through earlier structure and sequence analysis (Boamah et al., 2018), here we evaluated the statistical significance of the observed coevolving signals with the global statistical method GREMLIN (Balakrishnan et al., 2011; Kamisetty et al., 2013) (Figure 2), chosen for its superior performance (Balakrishnan et al., 2011; Kamisetty et al., 2013; Schafer & Porter, 2023). Mapping of GREMLIN coevolutionary signals (s_sco values) onto the structure indicated that PHD finger binding site residue positions had larger signals than the rest of the protein. This pattern was not observed for other peptide‐binding modules (e.g., the bromodomain) (Figure 2; Figure S2). Thus, because of this unique feature (i.e., binding site residues with higher signals), the PHD finger module provided an important model for experimental assessment of the roles of coevolving residues. The model also enabled us to test whether coevolutionary signals that probably originate from functional constraints (e.g., peptide binding) might be misinterpreted as constraints for in silico folding.

The PHD finger family was an advantageous choice for study because the distinct sequence and structure patterns of its subtypes represent naturally occurring subsamples of the family's sequences (Figure 2a). We were therefore able to use subtype alignments to obtain insights into coevolutionary signals without arbitrary subsampling (Monteiro da Silva et al., 2024) of aligned family sequences. For example, leucine (at the L2 position) and tryptophan (at the W position) were observed only in PHD_nW_DD and PHD_W subtype sequences, respectively (i.e., they rarely co‐occurred in the same sequence). Despite this lack of co‐occurrence, the L2–W position pair had among the highest GREMLIN coevolutionary signals in the PHD finger family. These findings suggest that there is no physical contact between the side chains at these two proximal positions, yet the pair showed a strong signal. The strong coevolving signal between the L2 and W positions was probably dominated by functional constraints, in agreement with our earlier study of the PHD_nW_DD subtype, in which alanine substitution at the L2 position severely compromised peptide binding (Boamah et al., 2018). Here, replacement of leucine at L2 had negligible effects on the KDM5B and UHRF1 PHD structures (Figure 4; Figures S7–S9). Just as a tryptophan residue at the W position is critical for methyl‐lysine binding in PHD_W, leucine at the L2 position is a likely adaptation for peptide binding in PHD_nW_DD. However, AF2 models of uncharacterized PHD_nW_DD subtype members often contained leucine at the L2 position (Figure 1). These observations suggest that the L2–W signal was probably misinterpreted as a folding constraint.
As with the L2–W position pair in the PHD finger subtypes, identifying ‘L2–W’‐type pairs of non‐interacting proximal positions with strong coevolutionary signals (e.g., using subsampling) could aid the fine functional characterization of coevolving residues, which remains of continued interest (e.g., in the OSCA2.3 ion channel (Jojoa‐Cruz et al., 2024) and DNA‐binding response regulators (Shibata et al., 2024)).
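The mutual exclusivity of leucine at L2 and tryptophan at W can be checked directly by counting co‐occurrences in an alignment that spans both subtypes. The sketch compares the observed joint frequency with the frequency expected if the two positions were independent. It is a simple count, not the Markov random field model that GREMLIN fits; the function name, column indices, and toy sequences are hypothetical.

```python
def pair_cooccurrence(aligned_seqs, col_i, col_j, aa_i, aa_j):
    """Observed vs. independence-expected frequency of aa_i at col_i with aa_j at col_j."""
    n = len(aligned_seqs)
    f_i = sum(s[col_i] == aa_i for s in aligned_seqs) / n
    f_j = sum(s[col_j] == aa_j for s in aligned_seqs) / n
    f_ij = sum(s[col_i] == aa_i and s[col_j] == aa_j for s in aligned_seqs) / n
    return {"f_i": f_i, "f_j": f_j, "observed": f_ij, "expected": f_i * f_j}


# Toy alignment: column 0 stands in for L2 and column 3 for W (hypothetical).
toy = ["LDDA", "LEDA", "LDEA", "ADDW", "AEDW", "SDDW"]
print(pair_cooccurrence(toy, 0, 3, "L", "W"))
```

An observed joint frequency far below the expected value flags an anti‐correlated, ‘L2–W’‐type pair; whether such a pair also carries a strong coupling score has to be checked against the coevolution model itself.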

As shown in our earlier work (Boamah et al., 2018), PHD_nW_DD coevolving residues contribute to tight interfacial packing for peptide binding: enrichment of Asp (and Glu) residues at several binding site positions (AW1, AW2, D1, and D2) leads to strong, electrostatically mediated tight interfacial packing against positively charged histone peptides. Thus, the likely role of PHD_nW_DD coevolving residues is to make physical contacts with peptide atoms, ensuring a tightly packed interface (Boamah et al., 2018; Bortoluzzi et al., 2017; Chakravarty et al., 2015). Other contributions of coevolving residues, such as effects on protein dynamics (Chatterjee et al., 2019) on the ps–ns time scale, were ruled out by monitoring NMR relaxation parameters (T1, T2, and heteronuclear NOE) for the PHD_nW_DD wildtype and mutant proteins (Figure 4, Figure S10). Burial of enriched nonpolar residues (e.g., at the V and L2 positions) in the tightly packed interface of highly charged interacting partners also makes a crucial energetic contribution to peptide binding (Boamah et al., 2018). For example, the hUHRF1 and mUHRF1 PHD fingers bind hPAF15 (Nishiyama et al., 2020) and mDPPA3 (Hata et al., 2022) (histone mimics; Figure S1c, left), respectively, and the nonpolar contact of the V position has an important role in accommodating these mimics (Gaurav & Kutateladze, 2023).

To better understand sequence–structure relationships, we used an alternative approach based on deep‐learning sequence design. Fixed backbone‐based sequence design (e.g., ProteinMPNN; Dauparas et al., 2022) uses a single backbone structure rather than a family of sequences, as in structure prediction. Sequences designed to be compatible with a structure highlight the amino acid replacements that the structure tolerates. Replacements designed for individual structures of family members can be compiled into profiles that can capture the energetic contribution of a specific position in the context of the family's fold. Sequence profiles constructed from designed sequences revealed differences among subtypes (Figure 6). For example, PHD_nW_DD structures rarely retained coevolving residues (Figure 6a), in contrast to the PHD_W subtype (Figure 6c). Sequence design was complemented by generative model‐based estimates of residue contributions toward the fold (the stabESM score of stab_ESM_IF; Cagiada et al., 2024). The replacements observed in sequence design were consistent with the stab_ESM_IF estimates of residue contributions (Table 4), indicating clear differences in the contributions of coevolving residues to the backbone structures of the respective PHD finger subtypes. Even within a subtype, the contributions of coevolving residue positions varied; for example, the AW1 replacement frequency differed from those at the D2, L2, and V positions. These differences among subtypes and among positions within a subtype were also consistent with the proteolysis outcomes (Tables 2 and 3; Figures S5, S6, and S13).
These findings also suggest that combining algorithms to (1) identify coevolving residues in a family, (2) design sequences for structures of family members, and (3) estimate residue contributions can provide fine details of the energetics of coevolving residues that are consistent with experimental findings. Because determining detailed differences in the behaviors of protein family members remains an important area of investigation (e.g., initiators and effectors of the caspase family (Nag & Clark, 2023), paralogs such as deubiquitinases (Maurer et al., 2023), or other close homologs (Harrison et al., 2023; Mittal et al., 2023)), the combination of algorithms used here for the PHD finger family may serve as a useful approach for detailing subfamily‐specific coevolving residue contributions in other protein families. In the PHD_W subtype, the structural perturbations caused by coevolving cage mutations were also verified with MD simulations, which showed a loss of secondary structure at distal sites in the mutants (Figure S14). Because aromatic cage‐mediated K/Rme recognition by readers in living cells can be probed using cage residue mutants, the conclusions of such live‐cell experiments (i.e., loss of binding by reader mutants) might partly reflect compromised stability. This argument is based on the observation that stability‐compromised mutants undergo enhanced degradation (i.e., have shorter half‐lives) in cells (Sharma et al., 2019).

The proteolysis‐based, experimentally determined “mega‐scale” ΔΔG dataset of amino acid replacements (Tsuboyama et al., 2023) could also be used to infer the contributions of coevolving residues to protein folds. However, this dataset, consisting of proteins of 40–70 residues, does not include metal‐binding proteins, including Zn fingers such as the ~60‐residue PHD finger module. Independently probing the PHD finger scaffold through proteolysis was therefore important. The eight PHD finger family proteins studied here showed a range of susceptibilities to proteolysis by thermolysin (TL) (Table 1). This finding was consistent with “mega‐scale” proteolysis‐based ΔG estimates for proteins belonging to a single family. For example, the estimated folding ΔG values for ~15 Pfam SH3_1 domain proteins in the “mega‐scale” dataset (Tsuboyama et al., 2023) ranged between 2 and 5 kcal/mol, indicating varying proteolytic susceptibility among proteins of the same family, in agreement with our observations for PHD fingers. Although proteins belonging to the same family are expected to share family‐specific coevolving residue signals, energetic contributions from non‐coevolving residues (e.g., Zn‐anchoring residues in PHD fingers) are highly important for the overall free energy of a fold, particularly given that coevolving residue contributions can vary among family members, as observed here. In short, although coevolving residue signals are critical for in silico folding, their contributions are unlikely to dominate the energetics of physical folding. This observation may motivate further systematic analysis of the contributions of coevolving and non‐coevolving residues to better understand folding energetics.
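To give a sense of what a 2–5 kcal/mol spread in folding ΔG means for proteolytic susceptibility, the sketch converts the free energy of unfolding into the equilibrium fraction of unfolded molecules under a simple two‐state model. It assumes, as pulse‐proteolysis stability measurements generally do, that cleavage acts mainly on the unfolded state; the 25°C temperature and the function name are assumptions for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def fraction_unfolded(dg_unfold_kcal, temp_k=298.15):
    """Two-state model: K_U = exp(-dG_U / RT) and f_U = K_U / (1 + K_U)."""
    k_u = math.exp(-dg_unfold_kcal / (R_KCAL * temp_k))
    return k_u / (1.0 + k_u)


for dg in (2.0, 5.0):
    print(f"dG_U = {dg} kcal/mol -> fraction unfolded {fraction_unfolded(dg):.2e}")
```

Because the unfolded fraction scales exponentially with ΔG, a 3 kcal/mol difference within one family corresponds to roughly a 150‐fold difference in the population accessible to the protease, which is why family members can differ markedly in proteolytic susceptibility.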

4. METHODS

4.1. DNA constructs and protein purification

The DNA constructs used in this study are listed in Table S1 (SI page S21). The KDM5B, UHRF1, KAT6A, and BAZ2A PHD finger constructs were taken from our earlier studies (Boamah et al., 2018; Chakravarty et al., 2015), whereas the remaining constructs (BPTF, RAG2A, DIDO1, and PHF13) were created here from synthetic DNA by inserting the desired DNA segments into the respective vectors (Table S1) using BamHI and XhoI restriction sites. The alanine‐substituted mutants at the L2, D1, D2, and V positions of the KDM5B PHD finger (L326A, D328A, D331A, and V346A) were taken from our earlier studies (Boamah et al., 2018; Chakravarty et al., 2015), whereas the AW1 and AW2 mutants of KDM5B (E321A and D322A) were generated here by site‐directed mutagenesis. Alanine‐substituted mutants at the L1, L2, and V positions of the UHRF1 PHD finger (L344A, M345A, D350A, and V365A) and at the aromatic cage positions of the BPTF (Y10A, Y17A, and W32A) and DIDO1 (Y8A and W29A) PHD fingers were generated by site‐directed mutagenesis. Recombinant wild‐type and mutant proteins were purified as described in our earlier studies (Boamah et al., 2018; Chakravarty et al., 2015) by GST or His‐tag affinity chromatography, followed by tag removal through overnight protease digestion and purification to homogeneity by size exclusion chromatography. 15N‐labeled proteins were purified from cells grown in M9 medium. Purified protein concentrations were determined spectroscopically using molar extinction coefficients calculated from the respective recombinant protein sequences.
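The text does not state how the extinction coefficients were calculated from sequence. A common sequence‐based approach, used for example by ExPASy ProtParam, sums the per‐residue values of Pace et al. (1995) for Trp, Tyr, and cystine; the sketch below assumes that approach and then applies the Beer–Lambert law. Because PHD finger cysteines coordinate Zn rather than forming disulfides, the cystine count is assumed to be zero by default. Function names are hypothetical.

```python
# Per-residue molar absorptivities at 280 nm (M^-1 cm^-1), Pace et al. (1995).
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide-bonded cystine, not per free cysteine


def extinction_coefficient_280(sequence, n_cystines=0):
    """Estimate the molar extinction coefficient at 280 nm from a protein sequence."""
    seq = sequence.upper()
    return EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y") + EPS_CYSTINE * n_cystines


def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    return a280 / (epsilon * path_cm)


eps = extinction_coefficient_280("MWYYC")  # toy sequence, not a real construct
print(eps, molar_concentration(0.848, eps))
```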

4.2. Proteolysis

FASTpp (Minde et al., 2012; Park & Marqusee, 2005) was carried out with 0.15–1 mg/mL of the protein of interest (wild‐type or mutant) mixed with 5 μL of a 5 mg/mL thermolysin (TL; Sigma) stock solution in a final reaction volume of 75 μL containing 1× buffer (20 mM phosphate buffer, pH 7.2, containing 10 mM CaCl2 and 150 mM NaCl) at 4°C in a PCR tube. Proteolysis at a desired temperature (TD) was carried out in a thermocycler (Bio‐Rad T100™) in five steps: (1) heating from 4°C to TD in 20 s, (2) a 120 s proteolysis pulse at TD, (3) cooling from TD to 4°C in 40 s, (4) quenching of proteolysis by addition of 50 mM EDTA, and (5) analysis of the samples by SDS‐PAGE. TD values typically ranged from 10 to 80°C at 10°C intervals (Figure 4; Figures S6, S7). TD values from 40 to 64°C at 3°C intervals (data not shown) were also used to better estimate the respective protein melting temperatures. FASTpp experiments were carried out 2–4 times for each purified protein. Tmid, the temperature at which a ~50% drop in the amount of intact protein was observed, was estimated by visual inspection in 2–3 repeats. For control proteins, FASTpp was also carried out in 4 M urea (Figure S4) by mixing the desired amount of protein in 2× buffer with 8 M urea solution in a 1:1 volume ratio to a final reaction volume of 75 μL. Proteins purified from M9 medium, with or without ZnCl2, were also used in FASTpp experiments (Figure S11). Because resistance to proteolysis has been used in several recent protein stability studies (Leuenberger et al., 2017; Rocklin et al., 2017; Tsuboyama et al., 2023; Yang et al., 2023), we used FASTpp to further examine the stability of the PHD scaffold, especially given the consistency between the proteolysis results and those of other methods (Figure 4; Figure S9).
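Tmid was estimated here by visual inspection of gels. Where band intensities are quantified (e.g., by densitometry, which is an assumption and not part of the protocol above), the same quantity can be estimated by linear interpolation of the 50% crossing. The sketch normalizes each lane to the lowest‐temperature lane; the function name and intensities are hypothetical.

```python
def estimate_tmid(temps, intensities):
    """Interpolate the temperature at which the intact band falls to 50%.

    Intensities are normalized to the first (lowest-temperature) lane.
    Returns None if no 50% crossing occurs within the sampled range.
    """
    ref = intensities[0]
    frac = [x / ref for x in intensities]
    for k in range(len(temps) - 1):
        f0, f1 = frac[k], frac[k + 1]
        if f0 >= 0.5 > f1:
            t0, t1 = temps[k], temps[k + 1]
            return t0 + (f0 - 0.5) / (f0 - f1) * (t1 - t0)
    return None


# Hypothetical band intensities for a 10-80 degC FASTpp series.
temps = [10, 20, 30, 40, 50, 60, 70, 80]
bands = [100, 100, 100, 90, 60, 30, 10, 0]
print(round(estimate_tmid(temps, bands), 2))  # 53.33
```

Linear interpolation between 10°C steps is coarse; the finer 3°C series described above would narrow the interval around the crossing.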

4.3. NMR experiments

15N‐labeled protein samples were purified from Escherichia coli cells grown in M9 medium containing 15NH4Cl, using the same protocol as for the unlabeled proteins described above. All NMR spectra were recorded on Varian VNMRS spectrometers operating at 600 and 800 MHz (1H) and equipped with cryogenically cooled probes. For backbone 1H,15N chemical shift assignments, 2D 1H,15N‐HSQC and 3D NOESY‐1H,15N‐HSQC spectra of protein samples (0.5–0.8 mM) were recorded at 25°C, as well as at temperatures ranging from 40 to 65°C at intervals of 5 or 10°C. The 3D NOESY spectra were recorded using a 200 ms mixing time. The spectra were processed with NMRPipe (Delaglio et al., 1995) and analyzed with NMRFAM‐SPARKY (Lee et al., 2015). The 1H,15N backbone peaks in the HSQC spectra were identified using assignments previously deposited in the BMRB (entry 19913 for KDM5B, associated with PDB structure 2mny, and entry 17813 for UHRF1, associated with PDB structure 2lgl). Differences between the recorded and deposited HSQC chemical shifts were resolved by analyzing HN–HN NOE cross‐peaks in the 3D NOESY‐1H,15N‐HSQC spectra. To extract 15N relaxation parameters, 15N T1, 15N T2, and steady‐state 1H–15N heteronuclear NOE experiments were recorded using standard pulse sequences. Relaxation data were collected at 25°C with protein sample concentrations ranging from 0.3 to 0.5 mM. The relaxation experiments were recorded as series of interleaved 2D 1H,15N spectra, using 8 relaxation delays for T1 (0.002, 0.005, 0.01, 0.03, 0.05, 0.1, 0.2, and 0.4 s) and 7 relaxation delays for T2 (0.002, 0.004, 0.008, 0.016, 0.024, 0.032, and 0.064 s). The 2D spectra were separated during processing with NMRPipe (Delaglio et al., 1995) and analyzed with NMRFAM‐SPARKY (Lee et al., 2015). After peak assignment, T1 and T2 relaxation times were extracted by nonlinear least‐squares fitting of the decay curves, using the peak heights at the assigned peak positions in each spectrum.
15N heteronuclear NOE values were taken as the ratio of the peak intensities observed in experiments with 5.0 s of 1H presaturation during the recycle delay to those in the corresponding reference experiments, which were recorded without presaturation but with a 5.0 s recovery delay. Saturated and reference 2D spectra were also recorded in an interleaved manner and separated during processing. All heteronuclear NOE spectra were recorded in duplicate to estimate errors in the NOE values. The 15N–1H bond length was set to 1.02 Å, and the 15N chemical shift anisotropy was set to an average value of 160 ppm for all iterations. An earlier report (Chatterjee et al., 2019) on differences in dynamics (1H–15N relaxation time constants T1 and T2 and heteronuclear NOE) between SUMO and ubiquitin motivated our probe of the dynamics of PHD_nW_DD.
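The text does not name the fitting routine used for T1 and T2. The sketch below shows one way to perform a two‐parameter nonlinear least‐squares fit of a mono‐exponential decay, I(t) = I0·exp(−t/T), assuming no baseline offset: for each trial T the best amplitude I0 is solved linearly, and T is found by golden‐section search on log T. General‐purpose fitters such as scipy.optimize.curve_fit would serve equally well. A helper computes the heteronuclear NOE ratio and its spread across duplicates. All function names and the synthetic data are hypothetical.

```python
import math
from statistics import mean, stdev

GOLDEN = (math.sqrt(5) - 1) / 2  # ~0.618


def _project(delays, heights, tau):
    """For a fixed tau the best amplitude I0 is linear; return (SSE, I0)."""
    basis = [math.exp(-t / tau) for t in delays]
    i0 = sum(b * y for b, y in zip(basis, heights)) / sum(b * b for b in basis)
    sse = sum((y - i0 * b) ** 2 for b, y in zip(basis, heights))
    return sse, i0


def fit_decay(delays, heights, tol=1e-9):
    """Least-squares fit of I(t) = I0 * exp(-t / tau); delays must be > 0.

    tau is bracketed between 1/10 of the shortest and 10x the longest delay.
    """
    def sse(log_tau):
        return _project(delays, heights, math.exp(log_tau))[0]

    a, b = math.log(min(delays) / 10), math.log(max(delays) * 10)
    c, d = b - GOLDEN * (b - a), a + GOLDEN * (b - a)
    fc, fd = sse(c), sse(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - GOLDEN * (b - a)
            fc = sse(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + GOLDEN * (b - a)
            fd = sse(d)
    tau = math.exp((a + b) / 2)
    return _project(delays, heights, tau)[1], tau


def het_noe(saturated, reference):
    """Per-replicate I_sat / I_ref ratios; return their mean and sample SD."""
    ratios = [s / r for s, r in zip(saturated, reference)]
    return mean(ratios), stdev(ratios)


# Synthetic peak heights at the T1 delays listed above (true T1 = 0.6 s).
t1_delays = [0.002, 0.005, 0.01, 0.03, 0.05, 0.1, 0.2, 0.4]
heights = [1000 * math.exp(-t / 0.6) for t in t1_delays]
print(fit_decay(t1_delays, heights))
print(het_noe([78.0, 82.0], [100.0, 100.0]))
```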

4.4. GREMLIN scores

GREMLIN (Kamisetty et al., 2013; Ovchinnikov et al., 2014) score analysis was performed as described in our earlier study (Chakravarty et al., 2018). Briefly, the precomputed GREMLIN inter‐residue contact scores and the GREMLIN consensus sequence for each Pfam domain (e.g., PHD, bromodomain, and PDZ) were downloaded in 2016 from the GREMLIN web server (http://gremlin.bakerlab.org). GREMLIN uses Markov random fields to identify coevolving amino acid pairs and predicts inter‐residue contact scores from them (Balakrishnan et al., 2011; Kamisetty et al., 2013; Schafer & Porter, 2023). The predicted contact scores for a Pfam domain are available for residues present in the consensus sequence of that domain. They are provided as a table with five columns: (1) the ith sequence position, (2) the jth sequence position, (3) the predicted score (r_sco) for the contact between the ith and jth positions, (4) the normalized (scaled) contact score (s_sco), and (5) the probability (prob) of the contact prediction. A higher prob value (e.g., prob = 1.0) indicates higher confidence in the predicted contact. We used the normalized contact score (s_sco) for the simple analysis carried out here; s_sco values range between 0 and 3, and a value >2 is considered large. GREMLIN contact scores (s_sco and prob) were mapped onto a structure by aligning the consensus sequence to the sequence extracted from the structure's coordinate file.
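A table with the five columns described above can be read and filtered for large s_sco values in a few lines. The sketch assumes one whitespace‐separated row per position pair in the stated column order, with any header line skipped; the exact layout of the files downloaded from the GREMLIN server is not reproduced here, and the class and function names and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    i: int
    j: int
    r_sco: float
    s_sco: float
    prob: float


def parse_contacts(text):
    """Parse rows of 'i j r_sco s_sco prob'; rows that do not parse are skipped."""
    contacts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            contacts.append(Contact(int(fields[0]), int(fields[1]), *map(float, fields[2:])))
        except ValueError:  # e.g., a header row
            continue
    return contacts


def strong_pairs(contacts, threshold=2.0):
    """Position pairs whose scaled score s_sco exceeds the threshold (>2 is 'large')."""
    return [(c.i, c.j) for c in contacts if c.s_sco > threshold]


sample = "i j r_sco s_sco prob\n12 40 0.31 2.45 1.00\n5 9 0.10 0.80 0.35\n"
print(strong_pairs(parse_contacts(sample)))  # [(12, 40)]
```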
Because the consensus sequence is a single sequence representing the entire family, the consensus sequence of each Pfam domain was aligned with MUSCLE (Edgar, 2004) to the sequence extracted from a coordinate file to obtain the correspondence between each residue in the structure and a position in the consensus sequence. Using this correspondence, we mapped the scores to residue positions in the structure (Figure S2a). We previously noted that average GREMLIN inter‐residue contact scores for salt bridges, side‐chain hydrogen bonds, and a weak interaction such as the anion–quadrupole (AQ) interaction followed the order salt bridge > hydrogen bond > AQ (Chakravarty et al., 2018). This suggested that stronger interactions generally tend to have higher contact scores than weaker interactions, likely because residues with larger energetic contributions to stability are more highly conserved. Therefore, PHD finger coevolving residues with high GREMLIN contact scores were probed for their energetic contributions.
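Given the two gapped strings of a pairwise alignment between the consensus sequence and a structure's sequence, the position correspondence and the score mapping described above can be sketched as follows. The sketch assumes the structure's residues are numbered contiguously from a known first residue number (real coordinate files can have numbering gaps or insertion codes, which would need a residue‐number list instead); the function names and strings are hypothetical.

```python
def map_consensus_to_structure(cons_aln, struct_aln, first_resnum, gap="-"):
    """Map 1-based consensus positions to structure residue numbers via an alignment."""
    mapping = {}
    cons_idx = 0
    resnum = first_resnum - 1
    for c, s in zip(cons_aln, struct_aln):
        if c != gap:
            cons_idx += 1
        if s != gap:
            resnum += 1
        if c != gap and s != gap:  # aligned column: both sequences have a residue
            mapping[cons_idx] = resnum
    return mapping


def scores_on_structure(scored_pairs, mapping):
    """Translate (i, j, s_sco) into structure numbering, dropping unmapped pairs."""
    return [
        (mapping[i], mapping[j], s)
        for i, j, s in scored_pairs
        if i in mapping and j in mapping
    ]


m = map_consensus_to_structure("AC-DE", "ACKD-", first_resnum=312)
print(m)  # {1: 312, 2: 313, 3: 315}
print(scores_on_structure([(1, 3, 2.4), (2, 4, 1.1)], m))  # [(312, 315, 2.4)]
```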

4.5. Sequence design

Sequence design for a structure with ProteinMPNN (Dauparas et al., 2022) was carried out by uploading the corresponding coordinate file to huggingface.co/spaces/simonduerr/ProteinMPNN and running the design with the default settings (e.g., a sampling temperature of 0.1). Each uploaded coordinate file was cleaned so that only one chain was retained, and the coordinates of the Zn atoms were manually removed. For the PHD_nW_DD subtype, the following coordinate files were uploaded: 3sla (Chain A, residues 312–371), 4qf2 (A, 1673–1731), 4tvr (A, 341–398), 5b78 (A, 259–316), 5szc (A, 313–371), 5u2j (A, 259–318), 5vab (A, 19–76), and 5vcd (A, 55–112). For the PHD_W subtype, the following coordinate files were uploaded: 2ri7 (A, 5–64), 2vpb (A, 337–402), 3kqi (A, 2–61), 3kv4 (A, 2–61), 3kv5 (A, 34–93), 3lqh (A, 1563–1632), 4l58 (A, −1 to 55), 4l7x (A, 3–63), 4up0 (A, 324–387), 5tdr (A, −1 to 56), 5wlf (A, 907–969), 5wxh (A, 864–922), 5y20 (A, 1–60), 5yc3 (A, 1–60), 5z8l (A, 143–199), and 6wxk (A, 336–390). The residue boundaries of the PHD finger domains in these coordinate files were taken from the file pdb_pfam_mapping.txt downloaded from ftp.ebi.ac.uk/pub/databases/Pfam/mappings. For each uploaded coordinate file, 15 sequences were designed. AlphaFold2 structures of the designed sequences were not checked. The designed sequences of each subtype were aligned with MUSCLE (Edgar, 2004) to generate a sequence profile for each subtype, and this “designed” sequence profile was compared with the background Pfam domain sequence profile. For convenience in profile comparison, sequence logos (Figure 6a,c) representing the “designed” or background Pfam sequence profiles were created using the Skylign (Wheeler et al., 2014) server with default parameters.
The frequency of occurrence p_ji (Figure 6b,d) of the jth amino acid (residue type) at the ith alignment position (column) is the number of times the jth residue type appears at the ith position divided by the total number of sequences in the alignment profile.
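The frequency p_ji defined above can be computed column by column from the aligned designed sequences. In this sketch the denominator is the total number of sequences, as stated; gap characters are counted as their own type so that each column sums to 1, which is an assumption about gap handling. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def column_frequencies(aligned_seqs):
    """p_ji for each column i: count of residue type j divided by the number of sequences."""
    n = len(aligned_seqs)
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [
        {aa: k / n for aa, k in Counter(s[i] for s in aligned_seqs).items()}
        for i in range(length)
    ]


def retention_frequency(profile, column, residue):
    """Frequency with which a residue type (e.g., W) appears at a given column."""
    return profile[column].get(residue, 0.0)


designed = ["WA", "WC", "YA", "W-"]  # toy designed sequences
profile = column_frequencies(designed)
print(retention_frequency(profile, 0, "W"))  # 0.75
```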

4.6. Residue contributions from generative model

stab_ESM_IF predictions were run through the Jupyter notebook in Google Colaboratory (https://colab.research.google.com/github/KULL-Centre/_2024_cagiada_stability/blob/main/stab_ESM_IF.ipynb), uploading one coordinate file at a time. For each coordinate file, the calculation returned a text file listing residue numbers and the corresponding stabESM scores. The text file was processed with in‐house scripts to map the scores to the corresponding residue positions in the coordinate file (Figure S17). To obtain the average score for a position (e.g., L2 or W) across the family of structures, a structural multiple alignment of the structures was first generated and used to create a table of equivalent residue positions (Figure S17). The scores mapped to equivalent residue positions were then averaged to obtain the mean (and SD).
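The averaging step above can be sketched as follows, assuming the per‐structure scores have already been parsed into residue‐number dictionaries and the structural alignment has been reduced to a table of equivalent residue numbers per position label. The authors used in‐house scripts; this is not their code, and the structure IDs, residue numbers, and scores below are hypothetical.

```python
from statistics import mean, stdev


def average_equivalent_scores(scores, equivalents):
    """Mean and sample SD of per-residue scores at structurally equivalent positions.

    scores:      {structure_id: {residue_number: score}}
    equivalents: {position_label: {structure_id: residue_number}}
    Labels with no mapped scores are skipped; SD is 0.0 for a single value.
    """
    summary = {}
    for label, members in equivalents.items():
        values = [
            scores[sid][resnum]
            for sid, resnum in members.items()
            if resnum in scores.get(sid, {})
        ]
        if not values:
            continue
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[label] = (mean(values), sd)
    return summary


# Hypothetical structures and scores, for illustration only.
scores = {
    "structA": {326: 0.05, 331: 0.12},
    "structB": {327: 0.11, 332: 0.08},
}
equivalents = {
    "L2": {"structA": 326, "structB": 327},
    "D2": {"structA": 331, "structB": 332},
}
print(average_equivalent_scores(scores, equivalents))
```

Using the sample SD (n − 1) matches the usual mean ± SD reporting, but with only 9 or 16 structures per subtype the spread estimates themselves are uncertain.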

4.7. Alignments

Three types of alignments were used in this study: (1) Pfam (Finn et al., 2016) profile alignments (e.g., Figure 2a; Figure S1B), used to describe the PHD subtypes in general; (2) MUSTANG (Konagurthu et al., 2006) and CE (Shindyalov & Bourne, 1998) structural alignments, used to superpose structures, especially AF2 DB coordinate files (Figure 1b) and MD trajectory coordinates (Figure S14); and (3) MUSCLE (Edgar, 2004) multiple sequence alignments, used to align the GREMLIN (Kamisetty et al., 2013; Ovchinnikov et al., 2014) consensus sequence to the sequences extracted from coordinate files for mapping s_sco values onto a structure. MUSCLE was also used to align ProteinMPNN‐designed sequences to create sequence profiles.

AUTHOR CONTRIBUTIONS

Suvobrata Chakravarty: Conceptualization; investigation; funding acquisition; writing – original draft; methodology; validation; writing – review and editing; data curation; software; formal analysis; project administration; supervision. Shraddha Basu: Methodology; data curation; writing – review and editing. Ujwal Subedi: Methodology; data curation; writing – review and editing. Marco Tonelli: Data curation. Maral Afshinpour: Data curation. Nitija Tiwari: Data curation. Ernesto J. Fuentes: Data curation; writing – review and editing.

FUNDING INFORMATION

This study was supported by the US National Institutes of Health through grants from the National Institute of General Medical Sciences (1R15GM116040‐01A1, 1R15‐GM134502‐01) to SC. This study also made use of the National Magnetic Resonance Facility at Madison (NMRFAM), which is supported by NIH grants R24GM141526 and P41GM103399.

CONFLICT OF INTEREST STATEMENT

The authors declare no competing interests.

Supporting information

DATA S1: Supplemental information.

PRO-33-e5065-s001.pdf (33.8MB, pdf)

ACKNOWLEDGMENTS

We acknowledge R. Auch, A. Semenchenko, and C. Julius of UNRC SDSU for computational support. We thank Kai Cai (NMRFAM) for initiating NMR data collection. We thank the developers of ProteinMPNN and stab_ESM_IF for making their websites freely available and accessible to all. We thank all members of the SC lab for helpful discussions and help with construct design. We thank Jaime Lopez for helpful discussions. We thank all reviewers for valuable suggestions.

Basu S, Subedi U, Tonelli M, Afshinpour M, Tiwari N, Fuentes EJ, et al. Assessing the functional roles of coevolving PHD finger residues. Protein Science. 2024;33(7):e5065. 10.1002/pro.5065

Shraddha Basu and Ujwal Subedi have contributed equally and are joint first authors. Marco Tonelli and Maral Afshinpour have contributed equally and are joint second authors.

Review Editor: Carol Beth Post

DATA AVAILABILITY STATEMENT

All data needed to evaluate the conclusions in the paper are present in the paper and the Supplementary Materials.

REFERENCES

  1. Ahdritz G, Bouatta N, Floristean C, Kadyan S, Xia Q, Gerecke W, et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nature Methods. 2024. 10.1038/s41592-024-02272-z
  2. Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, et al. Accurate prediction of protein structures and interactions using a three‐track neural network. Science. 2021;373:871–876.
  3. Balakrishnan S, Kamisetty H, Carbonell JG, Lee SI, Langmead CJ. Learning generative models for protein fold families. Proteins. 2011;79:1061–1078.
  4. Bartlett GJ, Taylor WR. Using scores derived from statistical coupling analysis to distinguish correct and incorrect folds in de‐novo protein structure prediction. Proteins: Structure, Function, and Bioinformatics. 2008;71:950–959.
  5. Black JC, Kutateladze TG. Atypical histone targets of PHD fingers. The Journal of Biological Chemistry. 2023;299:104601.
  6. Boamah D, Lin T, Poppinga FA, Basu S, Rahman S, Essel F, et al. Characteristics of a PHD finger subtype. Biochemistry. 2018;57:525–539.
  7. Bortoluzzi A, Amato A, Lucas X, Blank M, Ciulli A. Structural basis of molecular recognition of helical histone H3 tail by PHD finger domains. Biochemical Journal. 2017;474:1633–1651.
  8. Cagiada M, Ovchinnikov S, Lindorff‐Larsen K. Predicting absolute protein folding stability using generative models. bioRxiv. 2024. 10.1101/2024.03.14.584940
  9. Chakravarty S, Essel F, Lin T, Zeigler S. Histone peptide recognition by KDM5B‐PHD1: a case study. Biochemistry. 2015;54:5766–5780.
  10. Chakravarty S, Ung AR, Moore B, Shore J, Alshamrani M. A comprehensive analysis of anion–quadrupole interactions in protein structures. Biochemistry. 2018;57:1852–1867.
  11. Chakravarty S, Zeng L, Zhou MM. Structure and site‐specific recognition of histone H3 by the PHD finger of human autoimmune regulator. Structure. 2009;17:670–679.
  12. Chatterjee KS, Tripathi V, Das R. A conserved and buried edge‐to‐face aromatic interaction in small ubiquitin‐like modifier (SUMO) has a role in SUMO stability and function. The Journal of Biological Chemistry. 2019;294:6772–6784.
  13. Chen T, Gong C, Diaz DJ, Chen X, Wells JT, Wang Z, et al. HotProtein: a novel framework for protein thermostability prediction and editing. The eleventh international conference on learning representations; 2022.
  14. Dauparas J, Anishchenko I, Bennett N, Bai H, Ragotte RJ, Milles LF, et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science. 2022;378:49–56.
  15. de Haas RJ, Brunette N, Goodson A, Dauparas J, Yi SY, Yang EC, et al. Rapid and automated design of two‐component protein nanomaterials using ProteinMPNN. Proceedings of the National Academy of Sciences of the United States of America. 2024;121:e2314646121.
  16. De Juan D, Pazos F, Valencia A. Emerging methods in protein co‐evolution. Nature Reviews. Genetics. 2013;14:249–261.
  17. Del Alamo D, Sala D, McHaourab HS, Meiler J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. eLife. 2022;11.
  18. Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. Journal of Biomolecular NMR. 1995;6:277–293.
  19. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 2004;32:1792–1797.
  20. Elofsson A. Progress at protein structure prediction, as seen in CASP15. Current Opinion in Structural Biology. 2023;80:102594.
  21. Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Research. 2016;44:D279–D285.
  22. Gaurav N, Kutateladze TG. Non‐histone binding functions of PHD fingers. Trends in Biochemical Sciences. 2023;48:610–617.
  23. Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins: Structure, Function, and Bioinformatics. 1994;18:309–317.
  24. Goudy OJ, Nallathambi A, Kinjo T, Randolph NZ, Kuhlman B. In silico evolution of autoinhibitory domains for a PD‐L1 antagonist using deep learning models. Proceedings of the National Academy of Sciences of the United States of America. 2023;120:e2307371120.
  25. Harrison SA, Naretto A, Balakrishnan S, Perera YR, Chazin WJ. Comparative analysis of the physical properties of murine and human S100A7: insight into why zinc piracy is mediated by human but not murine S100A7. The Journal of Biological Chemistry. 2023;299:105292.
  26. Hata K, Kobayashi N, Sugimura K, Qin W, Haxholli D, Chiba Y, et al. Structural basis for the unique multifaceted interaction of DPPA3 with the UHRF1 PHD finger. Nucleic Acids Research. 2022;50:12527–12542.
  27. Hopf TA, Green AG, Schubert B, Mersmann S, Schärfe CPI, Ingraham JB, et al. The EVcouplings Python framework for coevolutionary sequence analysis. Bioinformatics. 2019;35:1582–1584.
  28. Hsu C, Verkuil R, Liu J, Lin Z, Hie B, Sercu T, et al. Learning inverse folding from millions of predicted structures. International Conference on Machine Learning. PMLR; 2022. p. 8946–8970.
  29. Jojoa‐Cruz S, Burendei B, Lee W‐H, Ward AB. Structure of mechanically activated ion channel OSCA2.3 reveals mobile elements in the transmembrane domain. Structure. 2024;32:157–167.e5.
  30. Jones DT, Buchan DW, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012;28:184–190.
  31. Ju F, Zhu J, Shao B, Kong L, Liu TY, Zheng WM, et al. CopulaNet: learning residue co‐evolution directly from multiple sequence alignment for protein structure prediction. Nature Communications. 2021;12:2535.
  32. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589.
  33. Jumper J, Hassabis D. Protein structure predictions to atomic accuracy with AlphaFold. Nature Methods. 2022;19:11–12.
  34. Kamisetty H, Ovchinnikov S, Baker D. Assessing the utility of coevolution‐based residue‐residue contact predictions in a sequence‐ and structure‐rich era. Proceedings of the National Academy of Sciences of the United States of America. 2013;110:15674–15679.
  35. Kao HW, Lu WL, Ho MR, Lin YF, Hsieh YJ, Ko TP, et al. Robust design of effective allosteric activators for Rsp5 E3 ligase using the machine learning tool ProteinMPNN. ACS Synthetic Biology. 2023;12:2310–2319.
  36. Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM. MUSTANG: a multiple structural alignment algorithm. Proteins. 2006;64:559–574.
  37. Lee W, Tonelli M, Markley JL. NMRFAM‐SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics. 2015;31:1325–1327.
  38. Leuenberger P, Ganscha S, Kahraman A, Cappelletti V, Boersema PJ, von Mering C, et al. Cell‐wide analysis of protein thermal unfolding reveals determinants of thermostability. Science. 2017;355. doi:10.1126/science.aai7825
  39. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Evolutionary‐scale prediction of atomic‐level protein structure with a language model. Science. 2023;379:1123–1130.
  40. Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, et al. Protein 3D structure computed from evolutionary sequence variation. PLoS One. 2011;6:e28766.
  41. Maurer SK, Mayer MP, Ward SJ, Boudjema S, Halawa M, Zhang J, et al. Ubiquitin specific protease 11 structure in complex with an engineered substrate mimetic reveals a molecular feature for deubiquitination selectivity. The Journal of Biological Chemistry. 2023;299:105300.
  42. Minde DP, Maurice MM, Rudiger SG. Determining biophysical protein stability in lysates by a fast proteolysis assay, FASTpp. PLoS One. 2012;7:e46147.
  43. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, et al. Pfam: the protein families database in 2021. Nucleic Acids Research. 2021;49:D412–D419.
  44. Mittal M, Kausar T, Rajan S, Rashmi D, Sau AK. Difference in catalytic loop repositioning leads to GMP variation between two human GBP homologues. Biochemistry. 2023;62:1509–1526.
  45. Monteiro da Silva G, Cui JY, Dalgarno DC, Lisi GP, Rubenstein BM. High‐throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nature Communications. 2024;15:2464.
  46. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, et al. Direct‐coupling analysis of residue coevolution captures native contacts across many protein families. Proceedings of the National Academy of Sciences of the United States of America. 2011;108:E1293–E1301.
  47. Nag M, Clark AC. Conserved folding landscape of monomeric initiator caspases. The Journal of Biological Chemistry. 2023;299:103075.
  48. Nishiyama A, Mulholland CB, Bultmann S, Kori S, Endo A, Saeki Y, et al. Two distinct modes of DNMT1 recruitment ensure stable maintenance DNA methylation. Nature Communications. 2020;11:1222.
  49. Notin P, Kollasch A, Ritter D, Van Niekerk L, Paul S, Spinner H, et al. ProteinGym: large‐scale benchmarks for protein fitness prediction and design. Advances in Neural Information Processing Systems. 2024;36. doi:10.1101/2023.12.07.570727
  50. Ooka K, Arai M. Accurate prediction of protein folding mechanisms by simple structure‐based statistical mechanical models. Nature Communications. 2023;14:6338.
  51. Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of residue‐residue interactions across protein interfaces using evolutionary information. eLife. 2014;3:e02030.
  52. Park C, Marqusee S. Pulse proteolysis: a simple method for quantitative determination of protein stability and ligand binding. Nature Methods. 2005;2:207–212.
  53. Rajakumara E, Wang Z, Ma H, Hu L, Chen H, Lin Y, et al. PHD finger recognition of unmodified histone H3R2 links UHRF1 to regulation of euchromatic gene expression. Molecular Cell. 2011;43:275–284.
  54. Reeves S, Kalyaanamoorthy S. Zero‐shot transfer of protein sequence likelihood models to thermostability prediction. bioRxiv. 2023. doi:10.1101/2023.07.17.549396
  55. Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science. 2017;357:168–175.
  56. Sala D, Engelberger F, McHaourab HS, Meiler J. Modeling conformational states of proteins with AlphaFold. Current Opinion in Structural Biology. 2023;81:102645.
  57. Sanchez R, Zhou MM. The PHD finger: a versatile epigenome reader. Trends in Biochemical Sciences. 2011;36:364–372.
  58. Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: co‐evolution and deep learning coming of age. Proteins: Structure, Function, and Bioinformatics. 2018;86:51–66.
  59. Schafer JW, Porter LL. Evolutionary selection of proteins with two folds. Nature Communications. 2023;14:5478.
  60. Schlessinger A, Bonomi M. Exploring the conformational diversity of proteins. eLife. 2022;11. doi:10.7554/elife.78549
  61. Seemayer S, Gruber M, Söding J. CCMpred—fast and precise prediction of protein residue–residue contacts from correlated mutations. Bioinformatics. 2014;30:3128–3130.
  62. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020;577:706–710.
  63. Sharma R, Demény M, Ambrus V, Király SB, Kurtán T, Gatti‐Lafranconi P, et al. Specific and fuzzy interactions cooperate in modulating protein half‐life. Journal of Molecular Biology. 2019;431:1700–1707.
  64. Shibata M, Lin X, Onuchic JN, Yura K, Cheng RR. Residue coevolution and mutational landscape for OmpR and NarL response regulator subfamilies. Biophysical Journal. 2024;123:681–692.
  65. Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering. 1998;11:739–747.
  66. Stein RA, McHaourab HS. SPEACH_AF: sampling protein ensembles and conformational heterogeneity with Alphafold2. PLoS Computational Biology. 2022;18:e1010483.
  67. Sumida KH, Núñez‐Franco R, Kalvet I, Pellock SJ, Wicky BIM, Milles LF, et al. Improving protein expression, stability, and function with ProteinMPNN. Journal of the American Chemical Society. 2024;146:2054–2061.
  68. Tsai WW, Wang Z, Yiu TT, Akdemir KC, Xia W, Winter S, et al. TRIM24 links a non‐canonical histone signature to breast cancer. Nature. 2010;468:927–932.
  69. Tsuboyama K, Dauparas J, Chen J, Laine E, Mohseni Behbahani Y, Weinstein JJ, et al. Mega‐scale experimental analysis of protein folding stability in biology and design. Nature. 2023;620:434–444.
  70. Varadi M, Bertoni D, Magana P, Paramval U, Pidruchna I, Radhakrishnan M, et al. AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Research. 2024;52:D368–D375.
  71. Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate de novo prediction of protein contact map by ultra‐deep learning model. PLoS Computational Biology. 2017;13:e1005324.
  72. Wang Z, Song J, Milne TA, Wang GG, Li H, Allis CD, et al. Pro isomerization in MLL1 PHD3‐bromo cassette connects H3K4me readout to CyP33 and HDAC‐mediated repression. Cell. 2010;141:1183–1194.
  73. Wayment‐Steele HK, Ojoawo A, Otten R, Apitz JM, Pitsawong W, Hömberger M, et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature. 2024;625:832–839.
  74. Wheeler TJ, Clements J, Finn RD. Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics. 2014;15:7.
  75. Yang A, Jude KM, Lai B, Minot M, Kocyla AM, Glassman CR, et al. Deploying synthetic coevolution and machine learning to engineer protein‐protein interactions. Science. 2023;381:eadh1720.
  76. Zeng L, Zhang Q, Li SD, Plotnikov AN, Walsh MJ, Zhou MM. Mechanism and regulation of acetylated histone binding by the tandem PHD finger of DPF3b. Nature. 2010;466:258–262.

Associated Data

Supplementary Materials

DATA S1: Supplemental information.

Data Availability Statement

All data needed to evaluate the conclusions in the paper are present in the paper and the Supplementary Materials.


Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society