Computational and Structural Biotechnology Journal. 2025 May 15;27:1998–2013. doi: 10.1016/j.csbj.2025.05.010

Bridging prediction and reality: Comprehensive analysis of experimental and AlphaFold 2 full-length nuclear receptor structures

Akerke Mazhibiyeva a, Tri T Pham b, Karina Pats a,c, Martin Lukac d, Ferdinand Molnár a
PMCID: PMC12149446  PMID: 40496892

Abstract

AlphaFold 2 has revolutionized protein structure prediction, yet systematic evaluations of its performance against experimental structures for specific protein families remain limited. Here we present the first comprehensive analysis comparing AlphaFold 2-predicted and experimental nuclear receptor structures, examining root-mean-square deviations, secondary structure elements, domain organization, and ligand-binding pocket geometry. While AlphaFold 2 achieves high accuracy in predicting stable conformations with proper stereochemistry, it shows limitations in capturing the full spectrum of biologically relevant states, particularly in flexible regions and ligand-binding pockets. Statistical analysis reveals significant domain-specific variations, with ligand-binding domains showing higher structural variability (CV = 29.3%) than DNA-binding domains (CV = 17.7%). Notably, AlphaFold 2 systematically underestimates ligand-binding pocket volumes and captures only single conformational states in homodimeric receptors where experimental structures show functionally important asymmetry. These findings provide critical insights for structure-based drug design targeting nuclear receptors and establish a framework for evaluating AlphaFold 2 predictions across other protein families.

Keywords: AlphaFold 2, Nuclear receptors, Structural comparison, Conformational diversity, Domain architecture, Prediction accuracy

Highlights

  • AlphaFold 2 shows high accuracy for stable conformations but misses the full spectrum of biologically relevant states.

  • Nuclear receptor LBDs show higher structural variability (CV = 29.3%) compared to DBDs (CV = 17.7%).

  • AF2 systematically underestimates ligand-binding pocket volumes by 8.4% on average.

  • AF2 models miss functional asymmetry in homodimeric receptors where experimental structures show conformational diversity.

  • AF2 models have higher stereochemical quality but lack functionally important Ramachandran outliers.

1. Introduction

As of January 2025, the Protein Data Bank, hosted at rcsb.org (RCSB PDB), contained 230,083 experimental biological macromolecular structures [4], [5], whereas the UniProtKB/TrEMBL protein database (release 2024_06) contained 254,254,987 sequence entries [14]. The “structural gap” in structural genomics refers to the situation in which data generated by next- and third-generation sequencing, together with the subsequent in silico prediction of protein-coding sequences, grow faster than the number of experimentally determined protein structures. Within the structural biology community, the proposed solution for bridging this gap is the AlphaFold 2 (AF2) system, which combines computational algorithms and artificial intelligence to efficiently predict and build structural models [27], [50], [51], [59]. All AF2-predicted structures can be obtained from the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/) and, more recently, also from the RCSB PDB [4]. AF2 was trained mostly on protein structures from the PDB released before April 30, 2018, with the addition of some structures released before February 15, 2021 [27], [51]. However, it has been reported that PDB templates are not required for accurate AF2 predictions, as AF2 can produce de novo models using multiple sequence alignments (MSAs) alone. Moreover, AF2 may also disregard low-quality PDB templates [50]. Although AF2 cannot accurately predict the positions of cofactors, metals, ions, and nucleic acids, which are found in many experimental structures, it is trained to predict protein structures as close to their native conformation as possible [27]. Therefore, the positions of the backbone and side chains are usually consistent with the expected structures that contain cofactors or ions.
Nevertheless, the absence of additional structural information may result in an inaccurate or improper prediction of the protein structure [51]. The accuracy of an AF2 model can vary both within and between structures, depending on the particular protein [27]. This accuracy can be assessed by the predicted local distance difference test (pLDDT) score, which is provided for each residue [50]. It is important to note that the pLDDT score primarily represents the model's internal confidence in its prediction rather than a direct measure of structural accuracy. While pLDDT correlates with actual lDDT-Cα measures (Pearson's r = 0.76) [27], this relationship is imperfect. For instance, a high pLDDT value indicates that AF2 is confident in its prediction but does not guarantee that the predicted structure matches the true biological conformation. Conversely, regions with low pLDDT scores may reflect cases where AF2 lacks confidence due to limited evolutionary information or inherent structural flexibility. This distinction helps explain why disordered protein regions pose a specific problem: they produce low-confidence pLDDT scores below 50. A score below 50 suggests that the region is unstructured in a physiological environment or requires additional interaction partners (such as cofactors, DNA, or dimerization partners) that would stabilize its conformation in native protein complexes. Indeed, many proteins show variable degrees of intrinsic disorder and thus require additional partners; it has been estimated that at least 30% of the human proteome consists of intrinsically disordered proteins [33]. In addition, a significant number of proteins contain multiple domains. These two features add considerable complexity to protein structures within complexes. Regions with a score between 50 and 70 are likewise poorly modeled and have low confidence, whereas a score between 70 and 90 indicates a better model with a good backbone prediction. Finally, regions with a score higher than 90 are expected to have the highest accuracy [50].
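The pLDDT ranges described above can be illustrated with a minimal sketch. The band labels and the handling of values that fall exactly on a boundary (50, 70, 90) are assumptions made for illustration; only the thresholds themselves come from the text.

```python
def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT value (0-100 scale) to a confidence band.

    Band names are illustrative labels, not terms defined in the text.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT is expected on a 0-100 scale")
    if score > 90:
        return "very high"   # expected to have the highest accuracy
    if score >= 70:
        return "confident"   # good backbone prediction
    if score >= 50:
        return "low"         # poorly modeled, low confidence
    return "very low"        # likely disordered or partner-dependent


def fraction_below(scores, threshold=50.0):
    """Fraction of residues whose pLDDT falls below a threshold."""
    scores = list(scores)
    return sum(1 for s in scores if s < threshold) / len(scores)
```

Applied to the per-residue scores of a model, `fraction_below` gives a rough estimate of how much of the chain falls in the range associated with disorder.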

Based on our expertise and the importance of the nuclear receptor (NR) superfamily as drug targets, we chose to evaluate their AF2-predicted and experimental structures. Full-length (FL) multi-domain NRs are of particular interest, since most of them are not present in the PDB. AF2 might be very useful for NR research, specifically for understanding the structure-function relationships that underlie their potential as therapeutic targets. For this purpose, it is important to assess the reliability of NR models predicted by AF2. NRs are ligand-activated transcription factors that are responsible for sensing metabolic and systemic hormonal signals [49]. Once activated, they regulate the transcription of genes that control a wide range of cellular processes, including cell proliferation, growth, differentiation, and metabolism. Human NRs can be divided into three subgroups based on the nature and affinity of their ligands: i) endocrine steroid hormone receptors, ii) nutritional sensors (adopted orphans), and iii) orphan receptors [6]. Endocrine receptors were identified when researchers were looking for the receptors of well-known steroids such as testosterone, estradiol, progesterone, cortisol, and aldosterone. Orphan receptors are NRs for which no ligand has yet been identified or for which no ligand exists. For some former “orphans”, however, natural or xenobiotic compounds were later identified as ligands, and these receptors thus became adopted orphan receptors. NRs are involved in nearly all human cellular processes and are therefore among the most established and studied drug targets, being responsible for the therapeutic effect of 16% of small-molecule drugs [3], [41].

Given the importance of NRs as drug targets and the scarcity of FL multi-domain NR structures in the PDB, we conducted the first comprehensive assessment comparing AF2-predicted conformations to all available experimental three-dimensional (3D) structures. For this analysis, we selected all human NRs with available FL multi-domain experimental structures in the PDB as of January 2025, resulting in the inclusion of seven NRs: (i) GR, (ii) HNF4α, (iii) LXRβ, (iv) NURR1, (v) PPARγ, (vi) RARβ, and (vii) RXRα. This comprehensive set represents diverse NR subfamilies, including both steroid hormone receptors and nutritional sensors, providing a representative cross-section of the NR superfamily for evaluating AF2 prediction accuracy across varying receptor architectures and functional states. This study provides a critical benchmark to guide future structural and drug discovery efforts in this important protein family.

2. Results

2.1. The availability of structural data for human NRs in PDB

To date (May 2025, see also Table 1), the 12 human steroid hormone NRs have altogether 817 solved structures, with the estrogen receptor (ER) α being the most studied with 454 structures; of these receptors, only retinoic acid receptor (RAR) β and glucocorticoid receptor (GR) have FL multi-domain structural data available. For the 27 human nutritional sensors, there are 1026 accessible structures in the PDB, with the peroxisome proliferator-activated receptor (PPAR) γ having the highest number, i.e. 316 structures, available both as single-domain and as FL multi-domain 3D structures. Besides PPARγ, there are FL multi-domain 3D structures for hepatocyte nuclear factor (HNF) 4α, liver X receptor (LXR) β, NR related 1 protein (NURR1), and retinoid X receptor (RXR) α. Two NRs from this family, the RAR-related orphan receptor (ROR) β and testicular NR (TR) 2, lack any kind of structural data. The 9 human orphan NRs have the lowest presence in the PDB, i.e. 20 structures, mostly in the form of peptides. The germ cell nuclear factor (GCNF) does not have any available 3D structure in the PDB. Out of 48 human NRs, altogether seven have FL multi-domain 3D structures available in the PDB (Table 2): i) two GR [35], ii) one HNF4α homodimer [9], iii) one RXRα-LXRβ heterodimer [31], iv) one NURR1 [58], v) three PPARγ-RXRα heterodimers [8], vi) one RARβ-RXRα heterodimer, and vii) the FL RXRα present in all three heterodimer types [10].

Table 1.

Distribution of NR structures in PDB (January 2025) organized by category.

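| Steroid hormone NRs | PDB | FL | Nutritional sensors | PDB | FL | Orphan NRs | PDB | FL |
|---|---|---|---|---|---|---|---|---|
| ERα | 456 | 0 | PPARγ | 317 | 3 | SHP | 13 | 0 |
| AR | 94 | 0 | RXRα | 105 | 5 (a) | Others (b) | 7 | 0 |
| GR | 58 | 2 | FXR | 86 | 0 | | | |
| VDR | 52 | 0 | PXR | 72 | 0 | | | |
| ERβ | 37 | 0 | PPARα | 70 | 0 | | | |
| TRβ | 32 | 0 | PPARδ | 55 | 0 | | | |
| MR | 29 | 0 | ERRγ | 43 | 0 | | | |
| PR | 20 | 0 | LRH-1 | 27 | 0 | | | |
| RARα | 14 | 0 | LXRβ | 25 | 1 | | | |
| RARγ | 11 | 0 | NUR77 | 20 | 0 | | | |
| TRα | 10 | 0 | RORγ | 158 | 0 | | | |
| RARβ | 9 | 1 | NURR1 | 8 | 1 | | | |
| Others (c) | 0 | 0 | Others (d) | 52 | 0 | | | |
| Total | 822 | 3 | Total | 1,021 | 5 | Total | 20 | 0 |

Combined total: 1,863 PDB structures, 7 FL multi-domain structures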
(a) RXRα appears in 5 heterodimeric FL structures (with PPARγ, LXRβ, and RARβ).

(b) Includes COUP-TFI, COUP-TFII, DAX-1, EAR-2, NOR1, PNR, TLX (1 structure each), and GCNF (0).

(c) No additional steroid hormone NRs with structures.

(d) Includes CAR (2), ERRα (5), ERRβ (3), HNF4α (7, 1 FL), HNF4γ (1), LXRα (7), REV-ERBα (5), REV-ERBβ (6), RORα (3), RORβ (0), RXRβ (7), RXRγ (2), SF-1 (6), TR2 (0), TR4 (5).

FL multi-domain structures include: two GR structures [35], one HNF4α homodimer [9], one RXRα-LXRβ heterodimer [31], one NURR1 [58], three PPARγ-RXRα heterodimers [8], one RARβ-RXRα heterodimer, and FL RXRα present in heterodimers [10].

Table 2.

Mapping of experimental full-length NR structures to AlphaFold Database entries used in this study.

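| NR | UniProt ID | PDB code | Chains | AF2 Database ID | Resolution (Å) |
|---|---|---|---|---|---|
| GR | P04150 | 7PRV | A,B | AF-P04150-F1 | 2.70 |
| GR | P04150 | 7PRW | A,B | AF-P04150-F1 | 2.50 |
| HNF4α | P41235 | 4IQR | A,B,E,F | AF-P41235-F1 | 2.90 |
| LXRβ | P55055 | 4NQA | B,I | AF-P55055-F1 | 3.10 |
| NURR1 | P43354 | 7WNH | A,B,C,D | AF-P43354-F1 | 3.10 |
| PPARγ | P37231 | 3DZU | D | AF-P37231-F1 | 3.20 |
| PPARγ | P37231 | 3DZY | D | AF-P37231-F1 | 3.10 |
| PPARγ | P37231 | 3E00 | D | AF-P37231-F1 | 3.10 |
| RARβ | P10826 | 5UAN | B | AF-P10826-F1 | 3.51 |
| RXRα | P19793 | 3DZU | A | AF-P19793-F1 | 3.20 |
| RXRα | P19793 | 3DZY | A | AF-P19793-F1 | 3.10 |
| RXRα | P19793 | 3E00 | A | AF-P19793-F1 | 3.10 |
| RXRα | P19793 | 4NQA | A,H | AF-P19793-F1 | 3.10 |
| RXRα | P19793 | 5UAN | A | AF-P19793-F1 | 3.51 |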

2.2. Root-mean-square deviation of the structures

To assess the fine differences between the protein structures, three root-mean-square deviation (RMSD) metrics were calculated: i) all atoms (Fig. 1A column a), ii) backbone atoms (N, Cα, C, and O) (Fig. 1A column b), and iii) Cα atoms only (Fig. 1A column c). This approach allowed us to overcome the positional domain bias, which arises from the differential positioning of predicted and experimental protein domains in 3D space and is addressed in the section “Geometrical analysis of domain architecture”. Statistical analysis revealed distinct patterns in structural deviation, with mean RMSD values for DNA-binding domains (DBDs) of 0.442 ± 0.078 Å (all-atom), 0.379 ± 0.077 Å (backbone), and 0.359 ± 0.080 Å (Cα), while ligand-binding domains (LBDs) showed slightly higher deviations of 0.512 ± 0.150 Å (all-atom), 0.452 ± 0.135 Å (backbone), and 0.442 ± 0.137 Å (Cα). The AF2-predicted models for the majority of NRs demonstrated high structural similarity to the experimental structures, showing all-atom RMSD below 0.55 Å for both DBDs and LBDs (Fig. 1A). Notable exceptions were observed in the GR LBDs (PDB ID: 7PRW, chains A and B), the NURR1 LBDs (PDB ID: 7WNH, chains A, B, C, and D), and the LXRβ DBD (PDB ID: 4NQA, chain I). Quantitative assessment revealed significant domain-specific differences (mean difference = -0.070 Å, t = -1.988, p < 0.05, Cohen's d = 0.574), with DBD RMSD values unexpectedly exceeding those of the LBD in 41.7% of cases (10 out of 24 structures). The conventional hierarchical pattern of RMSD values (all-atom > backbone > Cα) was not maintained in 12.5% of the analyzed structures, while the backbone and Cα RMSDs remained consistently below 0.50 Å, except for the aforementioned NURR1 and LXRβ structures. Importantly, individual NR analysis revealed receptor-specific variation patterns: NURR1 showed the most extreme domain-specific differences (DBD: CV = 0.7%; LBD: CV = 18.0%, range = 0.379 Å), GR demonstrated moderate variation (DBD: CV = 3.1%; LBD: CV = 9.5%), while HNF4α uniquely showed higher variability in DBD predictions (CV = 14.7%) than in LBD predictions (CV = 7.9%). Overall, LBDs showed higher structural variability (CV = 29.3%) than DBDs (CV = 17.7%), suggesting distinct challenges in structural prediction across different NR families and their domains.
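The three RMSD variants and the coefficient of variation (CV) reported above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes the two structures have already been superposed and that atoms are supplied in matching order, and the use of the sample standard deviation for the CV is likewise an assumption. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

BACKBONE_NAMES = {"N", "CA", "C", "O"}  # PDB atom names of backbone atoms


def rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equally ordered, already superposed point lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rmsd_by_subset(atoms_a, atoms_b):
    """Return all-atom, backbone, and Cα RMSD.

    atoms_a, atoms_b: lists of (atom_name, (x, y, z)) with identical atom order.
    """
    def subset(names):
        pairs = [(a[1], b[1]) for a, b in zip(atoms_a, atoms_b)
                 if names is None or a[0] in names]
        return rmsd([p[0] for p in pairs], [p[1] for p in pairs])

    return {
        "all_atom": subset(None),
        "backbone": subset(BACKBONE_NAMES),
        "ca": subset({"CA"}),
    }


def coefficient_of_variation(values):
    """CV in percent; uses the sample standard deviation (an assumption)."""
    return 100.0 * stdev(values) / mean(values)
```

Because side chains usually deviate more than the backbone, the all-atom value is normally the largest of the three, which is the hierarchical pattern referred to in the text.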

Fig. 1.

Analysis of RMSD and SSE content for experimental and AF2-predicted full-length NRs. A) RMSD values are shown for all atoms (a), backbone atoms only (b), and Cα atoms (c), measured in Å. The columns are overlaid, and different color hues represent the conditions used to calculate RMSD. Exceptions to the expected hierarchical pattern (all-atom > backbone only > Cα only) are marked with asterisks (*). Additionally, the 10 cases where the DNA-binding domain (DBD) shows higher RMSD than the ligand-binding domain (LBD) are highlighted with “DBD” in red text. B) Percentage distribution of secondary structure elements (SSEs) for α-helices (H), β-sheets (S), and loops (L) across all analyzed structures. In both panels, color coding distinguishes different chains: blue (first chain), orange (second chain), green (third chain), red (fourth chain), purple (fifth chain), brown (sixth chain), and gray (AF2 predictions, only in panel B).

2.3. Secondary structure element content analysis

Secondary structure element (SSE) analysis was performed to assess the similarity of experimental and predicted structures in terms of the α-helical, β-sheet and loop content (Fig. 1B). Statistical analysis revealed systematic differences in SSE distribution, with AF2 predictions showing a general tendency toward higher α-helical content (mean difference = +4.2% ± 2.1% for DBDs, +3.8% ± 1.9% for LBDs, p < 0.05) and compensatory decreases in loop regions (-5.0% ± 2.3%, p < 0.05). β-sheet content showed minimal variation across structures (mean difference = +0.8% ± 0.5%). This pattern was particularly evident in individual NR analysis: GR DBD showed 3-4% more α-helical and 1.5% more β-sheet content in the predicted structure, while its LBD exhibited 4-7% lower α-helical proportion compared to experimental structures. HNF4α predictions displayed 5-10% higher DBD α-helical content, with preserved β-sheet proportions and reduced loop content. LXRβ demonstrated similar trends in its DBD (6-7% higher α-helical content), while maintaining consistent LBD secondary structure patterns. NURR1 showed an inverse trend, with experimental structures (7WNH chain A, B, C) having approximately 4% higher DBD α-helical content compared to the AF2 structure. PPARγ and RARβ predictions showed 3-5% higher α-helical content in both domains, with corresponding reductions in loop regions. RXRα displayed the most notable differences, with AF2 predictions showing 4-8% higher α-helical content and the presence of β-sheet structure (approximately 10% in DBD, 4% in LBD) where several experimental structures showed none. Strong negative correlations were observed between α-helical and loop content (r = -0.527 for DBD, r = -0.906 for LBD), indicating compensatory relationships between these SSEs.
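A minimal sketch of how SSE percentages and the helix-loop correlation could be computed is given below. It assumes the secondary structure is available as a DSSP-style per-residue string; the reduction of DSSP codes to three classes (H, G, I as helix; E, B as sheet; everything else as loop) is an assumption, since the assignment tool and mapping are not specified in this section.

```python
import math
from collections import Counter

HELIX_CODES = set("HGI")   # assumed mapping of DSSP helix codes to "H"
STRAND_CODES = set("EB")   # assumed mapping of DSSP strand codes to "S"


def sse_percentages(dssp_string):
    """Return the percentage of helix (H), sheet (S), and loop (L) residues."""
    if not dssp_string:
        raise ValueError("empty secondary structure string")
    counts = Counter()
    for code in dssp_string:
        if code in HELIX_CODES:
            counts["H"] += 1
        elif code in STRAND_CODES:
            counts["S"] += 1
        else:
            counts["L"] += 1
    total = len(dssp_string)
    return {key: 100.0 * counts[key] / total for key in ("H", "S", "L")}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)
```

Passing the helix and loop percentages of all structures of one domain type to `pearson_r` yields the kind of correlation quoted above. With only three classes and little sheet content, a strongly negative value is expected almost by construction.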

2.4. Geometrical analysis of domain architecture

2.4.1. Distance ϵDBD,LBD between DBD and LBD

The distance between the two centers of mass (COMs), DBDCOM and LBDCOM, was measured to assess the similarity of the experimental and predicted structures in terms of their DBD and LBD positions; the results are displayed in Fig. 2A. The experimental GR PDB entries consist of a GR homodimer-DNA-peptide complex with furoate (7PRV) and velsecorat (7PRW). The AF2 GR model (42.664 Å) is more similar to 7PRV_B and 7PRW_B, whose DBD and LBD are 46.865 Å and 44.134 Å apart, respectively. In 7PRV_A and 7PRW_A the domains are closer together, both around 32 Å. The PDB entry 4IQR of experimental HNF4α consists of an asymmetric unit with two independent HNF4α homodimer-DNA-peptide complexes. The LBDs of NRs are symmetrical when superimposed separately, but the overall NR has an asymmetrical organization in a complex with DNA. The AF2 model (65.058 Å) is more similar to the experimental HNF4α chains 4IQR_B and _F, whose DBD and LBD COMs are around 66 Å apart. In the other experimental HNF4α chains (4IQR_A, _E) the domains are closer together, around 38-39 Å. For LXRβ, the relative positions of the DBDs and LBDs differ slightly within the asymmetric unit of the RXRα-LXRβ heterodimers. The LXRβ AF2 model (43.928 Å) is more similar to 4NQA_I, which has a distance of 43.537 Å between its DBD and LBD COMs, than to 4NQA_B with 45.331 Å. The NURR1 PDB entry (7WNH) consists of four monomer-DNA complexes. The AF2 model of NURR1 has its DBD and LBD at a distance of 44.956 Å, which is more than in the experimental chains 7WNH_A (37.72 Å) and _B (36.93 Å), but less than in 7WNH_C (63.191 Å) and _D (65.044 Å). PPARγ has closely positioned DBD and LBD in both the experimental structures (3DZU: 38.657 Å; 3DZY: 39.162 Å; 3E00: 39.368 Å) and the predicted model (39.343 Å). RARβ also has closely positioned domains, with the distance between DBDCOM and LBDCOM being slightly more than 35 Å for both the experimental (5UAN: 35.374 Å) and predicted (35.143 Å) structures. The DBD and LBD of RXRα partnered with LXRβ (4NQA_A, _H) and RARβ (5UAN) are spatially displaced from each other, with the distance between their COMs being around 54-58 Å. The distance is smaller for RXRα partnered with PPARγ (3DZY, 3DZU, and 3E00), around 46-47 Å, which is also closer to that of the predicted model (48.139 Å). A striking pattern emerged in the analysis of homodimeric structures (GR and HNF4α): both receptors display marked asymmetry in their domain organization when bound to DNA. In GR structures, one monomer exhibits a compact conformation (domains ∼32 Å apart in 7PRV_A/7PRW_A) while its partner adopts an extended state (∼44-47 Å in 7PRV_B/7PRW_B). Similarly, HNF4α shows asymmetric organization, with one set of monomers having closer domain proximity (∼38-39 Å in 4IQR_A/E) than their partners (∼66 Å in 4IQR_B/F). Notably, AF2 predictions capture only a single conformational state for each receptor (GR: 42.664 Å; HNF4α: 65.058 Å), corresponding more closely to the extended conformations observed in experimental structures.
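The inter-domain distance described above reduces to the Euclidean distance between two mass-weighted centroids. The following is a minimal sketch under stated assumptions: atoms are given as (element, coordinates) pairs, only the elements listed in the hypothetical `ATOMIC_MASS` table occur, and the domain boundaries have been chosen beforehand.

```python
import math

# Standard atomic masses (u) for elements commonly found in proteins.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """Mass-weighted centroid of an iterable of (element, (x, y, z)) tuples."""
    total = 0.0
    sx = sy = sz = 0.0
    for element, (x, y, z) in atoms:
        mass = ATOMIC_MASS[element]  # raises KeyError for unlisted elements
        total += mass
        sx += mass * x
        sy += mass * y
        sz += mass * z
    if total == 0.0:
        raise ValueError("no atoms supplied")
    return (sx / total, sy / total, sz / total)


def domain_distance(dbd_atoms, lbd_atoms):
    """Distance (Å) between the COMs of two domains."""
    return math.dist(center_of_mass(dbd_atoms), center_of_mass(lbd_atoms))
```

Computing this value per chain, rather than per PDB entry, is what exposes the compact and extended monomers within a homodimer.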

Fig. 2.

Domain organization of FL NRs analyzed through geometric measurements. A) Distances (ϵDBD,LBD) between the COM of the DBD and LBD, reported in Å. Color coding distinguishes different chains: blue (first chain), orange (second chain), green (third chain), red (fourth chain), purple (fifth chain), brown (sixth chain), and gray (AF2 predictions). Notable patterns include marked asymmetry in homodimers: GR shows one monomer with compact conformation (∼32 Å in 7PRV_A/7PRW_A) and its partner in extended state (∼44-47 Å in 7PRV_B/7PRW_B); similarly, HNF4α displays asymmetric organization (38-39 Å in 4IQR_A/E vs. 66 Å in 4IQR_B/F). AF2 models predict only single conformational states (GR: 42.664 Å; HNF4α: 65.058 Å) corresponding to extended conformations. Partner-specific variations are observed in RXRα (46-47 Å with PPARγ vs. 54-58 Å with LXRβ/RARβ). B) Angles (θDBD,LBD) formed by the COMs of the DBD, Hinge, and LBD, given in °. Significant variations include HNF4α showing two distinct angle conformations (108-120° in 4IQR_A/E vs. 166-169° in 4IQR_B/F); NURR1 exhibiting variable angles across structures (112-115° in 7WNH_A/B, 134-135° in 7WNH_C/D); and RXRα displaying partner-dependent angles (72-74° with PPARγ, 119-132° with LXRβ, 151° with RARβ). C) Dihedral angles (DHDBD,LBD) defined by the COM of the DBD, the last Cα atom of the DBD, the COM of the LBD, and the first Cα atom of the LBD, reported in °. GR and NURR1 show bimodal distributions: GR varies between -91° to -88° (7PRV_A/7PRW_A) and -13° to -7° (7PRV_B/7PRW_B); NURR1 shows positive angles (103-108° in 7WNH_A/B) and negative angles (-106° to -103° in 7WNH_C/D). AF2 predictions generally capture predominant conformational states observed in experimental structures.

2.4.2. Angle θDBD,LBD between DBD and LBD

The angles between DBDCOM, HingeCOM, and LBDCOM were measured to specifically evaluate the relative positions of the DBD and LBD in experimental and predicted structures of NRs (Fig. 2B). The GR θDBD,LBD is similar across the experimental structures 7PRV_B (104°) and 7PRW_A (103.528°) and the AF2 model (105.517°), but differs slightly for 7PRV_A (100.707°) and 7PRW_B (97.590°). For HNF4α, the θDBD,LBD of two chains, 4IQR_A (108°) and 4IQR_E (120°), is notably different from that of the AF2 model (171°), while the two other chains, 4IQR_B (169°) and 4IQR_F (166°), have angles similar to the predicted structure. The LXRβ θDBD,LBD is nearly identical across all analyzed structures at approximately 94-95°. For NURR1, the angles vary across the structures, with 7WNH_C (134.403°) and 7WNH_D (135.500°) showing large values of θDBD,LBD, while the predicted structure has an angle of 120.367°. 7WNH_A (115.893°) and 7WNH_B (112.096°) show smaller θDBD,LBD values. The PPARγ θDBD,LBD is similar for the experimental structures 3DZU (85°), 3DZY (86°), and 3E00 (85°) and the predicted structure (84°). For RARβ, the θDBD,LBD difference between the experimental structure (109°) and the predicted model (110°) is not significant. Experimental structures of RXRα partnered with PPARγ (3DZU 74°; 3DZY 74°; 3E00 72°) have similar angles to each other; however, they deviate considerably from the predicted structure, which is at 121°. The predicted RXRα θDBD,LBD is most similar to that of RXRα partnered with LXRβ (4NQA_A 132° and 4NQA_H 119°). RXRα from the RARβ-RXRα heterodimer 5UAN has a θDBD,LBD of 151°.
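The angle θDBD,LBD is the angle at the hinge COM between the vectors pointing to the DBD and LBD COMs. A minimal sketch, assuming the three COMs have already been computed and are passed as coordinate triples (the function name is hypothetical):

```python
import math


def angle_deg(point_a, vertex, point_c):
    """Angle in degrees at `vertex` between the rays to point_a and point_c."""
    v1 = [a - v for a, v in zip(point_a, vertex)]
    v2 = [c - v for c, v in zip(point_c, vertex)]
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("vertex coincides with one of the end points")
    cosine = sum(x * y for x, y in zip(v1, v2)) / (norm1 * norm2)
    # Clamp to [-1, 1] to guard against rounding slightly outside the domain.
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))
```

Values near 180° correspond to an extended arrangement in which the DBD, hinge, and LBD lie almost on a line, whereas values near 90° or below indicate a bent arrangement.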

2.4.3. Dihedral angles DHDBD,LBD between DBD and LBD

The dihedral angles between DBDCOM, DBDCαL, LBDCαF, and LBDCOM were measured to evaluate the relative rotational relationship of the domains (Fig. 2C). The DHDBD,LBD of experimental GR varies within each PDB entry. 7PRV_A (-88.633°) and 7PRW_A (-91.144°) have more negative values, while the other two chains, 7PRV_B (-12.502°) and 7PRW_B (-6.825°), have dihedral angles closer to 0°. The DHDBD,LBD of the predicted GR model is -73.475°. For all experimental HNF4α structures, the DHDBD,LBD is negative, indicating that it is rotated counterclockwise. However, there is a large difference in values between two of the 4IQR chains (4IQR_A -100° and 4IQR_E -113°) and the AF2 model (-166°). The DHDBD,LBD of the predicted structure is more similar to that of the other two 4IQR chains (4IQR_B -157° and 4IQR_F -144°). The LXRβ of 4NQA_I and the AF2 model have a DHDBD,LBD of -39°, while the 4NQA_B LXRβ has -26°. Of the four NURR1 experimental structures, two chains, 7WNH_A (108.236°) and 7WNH_B (103.090°), have positive dihedral angles, while the other two chains, 7WNH_C (-105.849°) and 7WNH_D (-103.796°), along with the predicted NURR1 structure (-103.432°), have negative angles of similar values. The experimental (3DZU 56°; 3DZY 57°; 3E00 55°) and predicted (58°) PPARγ structures are similar in terms of their DHDBD,LBD, all being in the range of 55-58°. Similarly, there is no significant difference between the experimental (117°) and predicted (115°) RARβ DHDBD,LBD. Experimental RXRα partnered with PPARγ (3DZU 80°, 3DZY 80°, and 3E00 84°) and the AF2 model (83°) have a positive DHDBD,LBD in the range of 80-84°. Although RXRα from 4NQA_B also has a positive angle, the value is substantially different, 148°. RXRα of 4NQA_I and 5UAN have a negative DHDBD,LBD of around -121° and -113°, respectively.
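A signed dihedral angle over four points can be computed with the usual atan2 formulation, sketched below. The function is generic and the order in which the four reference points are passed determines the result; here the order given in the text above (DBDCOM, DBDCαL, LBDCαF, LBDCOM) is assumed, as is the sign convention in which a clockwise rotation viewed along the central bond is positive.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in the range (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Because the result is signed, two chains whose domains are twisted by a similar magnitude in opposite directions yield values of opposite sign, as seen for the NURR1 chains.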

In summary, all the geometric analyses of domain architecture reveal distinct patterns of domain organization across the NR superfamily, with homodimeric receptors exhibiting marked asymmetry that is not captured by AF2 predictions, and heterodimeric receptors showing partner-specific conformational adaptations, particularly evident in RXRα partner interfaces.

2.5. Analysis of ligand-binding pocket geometry

The analysis of ligand-binding pockets (LBPs) reveals distinct patterns between experimental structures and AF2 predictions (Fig. 3). Volumetric calculations show varying degrees of difference between experimental and predicted structures (Fig. 3A); percentages are given relative to the LBP volume of the AF2 model, which was set to 100%. GR structures 7PRV_A/B closely match AF2 predictions (∼3% difference), while 7PRW_A/B show 16-18% larger volumes. HNF4α experimental structures consistently show larger volumes than predicted (111.6-120%). LXRβ displays the most significant volume differences, with experimental structures 4NQA_B and 4NQA_I exceeding AF2 predictions by 163.5% and 155.8%, respectively. NURR1, being a true orphan NR, exhibits an occluded LBP due to bulky aromatic residues in its apo form. PPARγ shows mixed results, with 3DZU_D having a larger volume than the AF2 model (113.3%), while 3DZY_D and 3E00_D show smaller volumes (90.4% and 89.7%). The RARβ experimental structure (5UAN_B) closely matches the AF2 prediction (104.2%). RXRα structures show variable agreement, with 3E00_A, 4NQA_H, and 5UAN_A closely matching AF2 (≤3% difference), while 3DZU_A, 3DZY_A, and 4NQA_A show larger volumes (107.6-117.2%).

Fig. 3.

LBP metrics analysis across analyzed NRs. A) Surface representation of the calculated LBPs from FL NRs. The LBP of the AF2-predicted structure was set as 100% and served as a reference for normalizing the LBPs of other structures. Color coding distinguishes different chains: blue (first chain), orange (second chain), green (third chain), red (fourth chain), purple (fifth chain), brown (sixth chain), and gray (AF2 predictions). Notable variations include LXRβ experimental structures exceeding AF2 predictions by >150%, while GR structures 7PRV_A/B closely match AF2 predictions (∼3% difference). NURR1 shows occluded LBPs in both experimental and predicted structures due to its orphan receptor status. B) Distribution of calculated LBP features: volume, area, effective radius, and sphericity across receptors. Statistical analysis reveals AF2 accurately predicts pocket sphericity (average deviation 3.2%, r=0.891) but systematically underpredicts effective pocket radius (average deviation 8.4%, r=0.643). A negative correlation (r=-0.65) between radius and sphericity indicates larger pockets tend to be less spherical. RXRα shows the most consistent predictions across parameters, while LXRβ exhibits the largest radius discrepancy (-15.8%) and GR shows the highest sphericity variation (-7.9%).

Statistical analysis using Kruskal-Wallis tests confirms significant differences between receptors in both effective radius (H = 32.145) and sphericity (H = 27.893). AF2 demonstrates high accuracy in predicting pocket sphericity (average deviation 3.2%, slope = 0.967, r = 0.891) but systematically underpredicts effective pocket radius (average deviation 8.4%, slope = 0.892, r = 0.643). A negative correlation between effective radius and sphericity (r = -0.65) suggests that larger pockets tend to be less spherical. RXRα shows the most consistent predictions across both parameters (effective radius deviation -0.5%, sphericity deviation 0.0%), while LXRβ exhibits the largest radius discrepancy (-15.8%) and GR shows the highest sphericity variation (-7.9%). Hierarchical clustering groups RXRα/RARβ as most similar and GR as most distinct, reflecting their structural characteristics.
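The pocket descriptors discussed above can be illustrated with a short sketch. The exact definitions used by the pocket-detection software are not given in this section, so the formulas below are assumptions: the effective radius is taken as the radius of a sphere of equal volume, and sphericity follows the common Wadell definition, i.e. the surface area of that sphere divided by the measured pocket area.

```python
import math


def effective_radius(volume):
    """Radius (Å) of a sphere with the same volume (Å^3) as the pocket."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def sphericity(volume, area):
    """Wadell sphericity: 1.0 for a perfect sphere, smaller otherwise."""
    return (math.pi ** (1.0 / 3.0)) * ((6.0 * volume) ** (2.0 / 3.0)) / area


def percent_deviation(predicted, experimental):
    """Signed deviation of a predicted value from the experimental one, in %."""
    return 100.0 * (predicted - experimental) / experimental
```

Under these definitions the effective radius scales with the cube root of the volume, so a given relative error in radius corresponds to a roughly threefold larger relative error in volume for small deviations. This should be kept in mind when comparing radius-based and volume-based deviations.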

In conclusion, the statistical analysis using Kruskal-Wallis tests confirms significant differences between receptors in both effective radius (H=32.145) and sphericity (H=27.893), with AF2 systematically under-predicting pocket volumes by 8.4% on average while accurately capturing pocket sphericity (average deviation 3.2%).

2.6. Analysis of structure validation parameters

Comparison of Real-Space Correlation Coefficients (RSCC) values from experimental structures and pLDDT scores from AF2 structures revealed consistent patterns with overlapping scores in most NRs, particularly in their DBDs and LBDs (Fig. 4). Overall, AF2 predictions align well with experimental structures in well-structured regions like the DBDs and LBDs. However, notable deviations are observed in specific areas, such as the GR region near residue 616 (Fig. 4, GR red arrow), where all experimental structures show high RSCC scores, but the AF2 pLDDT score is low. Conversely, in the HNF4α 4IQR_B structure, for residue 145 (Fig. 4, HNF4α 4IQR_B purple arrow), AF2 shows pLDDT around 0.6, while RSCC drops to a negative value. For LXRβ, regions with low RSCC values correspond to low-confidence pLDDT scores (pLDDT < 0.5). In NURR1 (7WNH_C, _D), RSCC values fluctuate significantly (0.6–1) around residue 300 (Fig. 4, NURR1 7WNH_C/D blue arc), but AF2 models maintain consistently high pLDDT scores (>0.9). PPARγ, RARβ, and RXRα show consistent patterns of overlapping RSCC and pLDDT values, with localized drops in scores of varying magnitudes.
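A per-residue comparison of this kind can be sketched as a simple screen for residues where the experimental and predicted confidence measures disagree. The thresholds below are hypothetical and chosen only for illustration, and pLDDT is assumed to be on a 0-100 scale; the toy values in the usage note are not data from this study.

```python
def flag_discrepancies(rscc, plddt,
                       rscc_high=0.9, rscc_low=0.7,
                       plddt_high=90.0, plddt_low=70.0):
    """List residues where RSCC and pLDDT point in opposite directions.

    rscc, plddt: dicts mapping residue number -> value.
    Only residues present in both dicts are compared.
    """
    flagged = []
    for residue in sorted(set(rscc) & set(plddt)):
        if rscc[residue] >= rscc_high and plddt[residue] < plddt_low:
            flagged.append((residue, "high RSCC, low pLDDT"))
        elif rscc[residue] < rscc_low and plddt[residue] >= plddt_high:
            flagged.append((residue, "low RSCC, high pLDDT"))
    return flagged
```

For example, `flag_discrepancies({10: 0.95, 300: 0.60, 616: 0.95}, {10: 95.0, 300: 92.0, 616: 68.0})` reports residues 300 and 616 but not residue 10, mirroring the two kinds of divergence described above.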

Fig. 4.

Real-Space Correlation Coefficients (RSCC) and pLDDT score comparison across analyzed NR. Black lines represent AF2 pLDDT scores, while colored lines show RSCC values from experimental structures: blue (first chain), orange (second chain), green (third chain), red (fourth chain), purple (fifth chain), and brown (sixth chain). Key regions of interest are highlighted: red arrow indicates GR residue 616, where experimental structures show high RSCC but AF2 predicts low confidence; purple arrow points to HNF4α 4IQR_B residue 150, where AF2 shows moderate confidence while experimental RSCC drops to negative values; blue arcs highlight NURR1 regions around residue 300 in chains C and D (7WNH), where experimental RSCC fluctuates significantly while AF2 maintains high confidence scores. These patterns reveal structure-prediction discrepancies that often correspond to functionally important flexible regions across the NR family.

Structural validation metrics (RSCC, B-factors, and pLDDT) were displayed in “putty” representation for the DBDs and LBDs of illustrative NRs (Fig. 5). The tube diameter and color range are linked to structural confidence and flexibility. Narrow, blue tubes represent high-confidence structural metrics (high RSCC, low B-factors, and high pLDDT values), while large, red tubes indicate low-confidence regions. GR and RXRα (from the RXRα-PPARγ heterodimer, PDB: 3DZY) demonstrate high correlation between RSCC, B-factors, and pLDDT, as the “putty” representation is consistent across all shown metrics. Conversely, NURR1 and RXRα (from the RXRα-LXRβ heterodimer, PDB: 4NQA) exhibit lower agreement across structural validation metrics. Within the GR DBD, many residues exhibit consistently high RSCC, low B-factors, and high pLDDT values, such as P418, K419, and L420, indicating strong structural agreement and stability in this domain. However, notable outliers include Q452 (Fig. 5 top panel) (RSCC = 0.74, B-factor = 101.99, pLDDT = 69.54), which deviates significantly from neighboring residues, suggesting a region of increased flexibility or a distinct structural feature. Similarly, N454 and H453 show elevated values, marking a transition to a more flexible region. In the GR LBD, residue S616 exhibits another interesting discrepancy, with experimental structures showing consistently high RSCC values (>0.90) while the AF2 model predicts relatively low confidence (pLDDT = 0.68). This residue forms part of a functionally important LBD-DBD interface region that is critical for receptor function when bound to ligands. In the RXRα/PPARγ complex, the RXRα LBD shows similar patterns that support structural stability and consistency, whereas the NURR1 DBD and the RXRα LBD in complex with LXRβ exhibit cases where RSCC and pLDDT diverge, highlighting structurally ambiguous regions.

Fig. 5.


Structural validation metrics of illustrative NRs. “Putty” representation of the GR DBD, the NURR1 DBD, and RXRα as a heterodimer with PPARγ and in complex with LXRβ highlights local structural confidence using RSCC, B-factors, and pLDDT data. Tube thickness and color intensity correspond to residue-level variability, with blue and narrow regions indicating high structural reliability (high RSCC, low B-factors, high pLDDT), whereas red and expanded regions mark flexible or uncertain areas. The GR DBD shows consistently high confidence metrics for most residues (e.g., P418, K419, L420), with notable exceptions such as Q452 (RSCC = 0.74, B-factor = 101.99, pLDDT = 69.54), which represents a region of increased flexibility. High correlation across metrics is observed for GR and RXRα (PDB: 3DZY), where putty representations remain consistent across all validation metrics, indicating robust structural prediction. In contrast, the NURR1 DBD and the RXRα LBD in complex with LXRβ (PDB: 4NQA) exhibit local discrepancies between experimental and computational validation metrics, particularly in regions where RSCC and pLDDT values diverge significantly, highlighting areas where AF2 predictions may not fully capture the conformational states observed in experimental structures.

Taken together, these comparative analyses reveal that while AF2 predictions align well with experimental structures in stable regions, they miss functionally important local conformational features, particularly at domain interfaces critical for allosteric regulation.
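The residue-level comparison described above can be approximated programmatically. The following is a minimal sketch rather than the pipeline used in this study: it assumes that per-residue RSCC and pLDDT values are already available as dictionaries keyed by residue number, with pLDDT rescaled to 0–1 as in Fig. 4. The function name, both cut-offs, and all example values except the pLDDT of 0.68 quoted for GR S616 are illustrative assumptions.

```python
from typing import Dict, List


def flag_discordant(
    rscc: Dict[int, float],
    plddt: Dict[int, float],
    rscc_cut: float = 0.9,   # assumed threshold for a well-fit residue
    plddt_cut: float = 0.7,  # assumed threshold for a confident prediction
) -> Dict[str, List[int]]:
    """Return residues where experimental fit and AF2 confidence disagree."""
    # Only residues present in both data sets can be compared.
    shared = sorted(set(rscc) & set(plddt))
    return {
        # Ordered in the experiment, but predicted with low confidence.
        "well_fit_low_confidence": [
            r for r in shared if rscc[r] >= rscc_cut and plddt[r] < plddt_cut
        ],
        # Poorly supported by density, but predicted with high confidence.
        "poor_fit_high_confidence": [
            r for r in shared if rscc[r] < rscc_cut and plddt[r] >= plddt_cut
        ],
    }


# Hypothetical values; only pLDDT = 0.68 for residue 616 is quoted in the text.
example_rscc = {615: 0.95, 616: 0.93, 617: 0.96}
example_plddt = {615: 0.88, 616: 0.68, 617: 0.91}
flags = flag_discordant(example_rscc, example_plddt)
```

With these example values, residue 616 is the only entry in the first category, mirroring the GR case discussed above. The cut-offs would need to be tuned to the resolution and quality of the experimental data.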

2.7. Backbone torsional analysis

We analyzed the backbone conformations between experimental crystal structures and AF2 predictions for seven NRs through Ramachandran plot distributions obtained from Molprobity validation server and quantitative metrics. The GR (7PRV_A) showed the highest structural agreement with ϕ/ψ correlations of 0.842/0.868 respectively, ranking first in overall correlation. The experimental structure contained 4 Ramachandran outliers while its AF2 counterpart showed only 1 outlier. In contrast, LXRβ (4NQA_I) demonstrated the poorest agreement, with the lowest ϕ/ψ correlations (0.535/0.533) and highest ψ angle Wasserstein distance (13.99). The Ramachandran plots for these two representative cases, illustrating the best and worst agreement between experimental and AF2 structures, are shown in Fig. 6. HNF4α exhibited an interesting disparity between ϕ correlation (0.915, ranked 1st) and ψ correlation (0.751, ranked 16th). Quantitative comparison revealed systematic differences in stereochemical quality. The experimental PPARγ structure (3E00_D) contained 18 Ramachandran outliers with 78.0% residues in favored regions, while its AF2 model showed only 1 outlier with 98.9% residues in favored regions. RARβ showed high stereochemical quality in both experimental (no outliers) and predicted structures. RXRα (3DZY_A) displayed high Wasserstein distances (10.62 for ϕ, 7.38 for ψ) despite moderate correlation values.

Fig. 6.


Ramachandran plot comparison between experimental structures and AF2 predictions for LXRβ and GR. Left panels: LXRβ 4NQA_I (experimental, orange circles) vs AF2 prediction (gray circles), showing the poorest agreement among analyzed NRs (ϕ/ψ correlations of 0.535/0.533 and ψ angle Wasserstein distance of 13.99). Right panels: GR 7PRV_A vs AF2 prediction, demonstrating the highest structural agreement (ϕ/ψ correlations of 0.842/0.868) among the seven analyzed NRs. Specific outliers are labeled, including F332 and A194 for LXRβ, and S708 and N472 for GR. Tables below each plot provide statistics highlighting the number of amino acids in favored (98%) and allowed (>99.8%) regions, as well as outliers. Note that the experimental GR structure contained 4 Ramachandran outliers while its AF2 counterpart showed only 1, reflecting AF2's tendency toward higher stereochemical quality, potentially at the expense of functional conformational diversity. Data obtained from Molprobity validation server (http://molprobity.biochem.duke.edu).

Collectively, the statistical analysis demonstrates that AF2 models systematically exhibit higher stereochemical quality compared to experimental structures, potentially missing functionally relevant Ramachandran outliers that reflect biologically important conformational flexibility.
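To make the two quantitative metrics of this section concrete, the sketch below computes a Pearson correlation and a one-dimensional Wasserstein distance between matched lists of backbone angles. It is an illustration under stated assumptions, not the implementation used here: angles are treated as ordinary numbers in degrees (their periodicity is ignored), both structures are assumed to contribute the same residues in the same order, and the function names and example angles are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long angle series (degrees)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either series is constant.
    return cov / math.sqrt(var_x * var_y)


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two samples of equal size.

    For equally weighted samples of the same size, the distance equals the
    mean absolute difference between the sorted values.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(p - q) for p, q in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical phi angles (degrees) for four matched residues.
exp_phi = [-60.0, -65.0, -120.0, -75.0]
af2_phi = [-62.0, -63.0, -118.0, -70.0]
phi_corr = pearson(exp_phi, af2_phi)
phi_wd = wasserstein_1d(exp_phi, af2_phi)
```

The correlation measures residue-by-residue agreement, whereas the Wasserstein distance compares the two angle distributions as a whole, so a structure can score moderately on one and poorly on the other, as noted for RXRα.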

3. Discussion

While several metrics exist for protein structure comparison, we focused on RMSD analysis with different atom selections to provide a comprehensive structural evaluation. Alternative metrics such as the Template Modeling score (TM-score) have been proposed to address certain limitations of RMSD [57]. Unlike RMSD, TM-score is length-independent and weights smaller distances more strongly than larger ones, making it more sensitive to global fold similarity [56]. However, for our analysis of homologous NR structures with the same fold, the domain-specific RMSD approach effectively mitigates the length-dependency issue by analyzing DBD and LBD regions separately. Additionally, our multi-metric RMSD analysis (all-atom, backbone, Cα) provides complementary information about both local and global structural similarities, offering insights similar to what might be gained from TM-score while maintaining computational efficiency [24]. The observed RMSD patterns in NR predictions highlight both the strengths and challenges of AF2 in modeling multi-domain proteins. The generally low RMSD values (< 0.55 Å) for most structures align with previous benchmarking studies, which demonstrated AF2's high accuracy in predicting monomeric protein structures [27]. This aligns with findings from cryoEM challenge datasets, where AF2 achieved consistent structural agreement between its models and cryoEM density maps. Analysis of 13 unique datasets showed that only one case required domain-level adjustments, with a clear correlation between pLDDT scores and model accuracy [23]. However, the domain-specific variations observed, particularly in NURR1 and LXRβ, suggest that certain NRs pose unique structural challenges. These discrepancies likely stem from factors such as domain flexibility, inter-domain interactions, or the absence of cofactors and ligands in the prediction pipeline, an issue that has been previously noted when modeling LBDs [2]. 
The unexpected finding that 41.7% of cases exhibited higher RMSD values in DBDs compared to LBDs contrasts with the conventional notion that DBDs are generally more structurally conserved across the NR subfamily [8], [9]. One possible explanation is that AF2 predictions may not fully capture the rigid-body constraints imposed by DNA interactions in experimental structures. Additionally, the violation of the expected RMSD hierarchy (all-atom > backbone > Cα) in 12.5% of cases reflects the inherent variability in AF2's accuracy across different structural elements, particularly in flexible or solvent-exposed regions, as previously observed in protein structure prediction studies [27], [2], [51], [26]. Furthermore, while AF2 can predict multiple conformations, its accuracy varies depending on the protein's flexibility and the presence of multiple domains. This aligns with our observation that AF2's predictions show variability in flexible regions, such as LBDs, leading to deviations in the expected RMSD hierarchy. Therefore, the challenges in predicting flexible or solvent-exposed regions may contribute to the observed discrepancies in our study [15]. Our observation of domain-specific variations, particularly pronounced in NURR1 (CV = 0.7% for DBD vs. 18.0% for LBD), supports previous findings that AF2's performance can be influenced by missing cofactors and allosteric effects [51]. Hekkelman et al. demonstrated that AF2 models systematically lack small molecules essential for molecular structure and function, with NRs specifically affected by the absence of ligands that regulate their conformational states [20]. This limitation is particularly significant for proteins that undergo substantial conformational changes in response to ligand binding, as many allosteric modulators induce structural rearrangements that current AF2 implementations cannot predict [20], as shown for the ABL1 protein kinase [37].
For NRs specifically, the absence of ligands in AF2 predictions is consequential, as these proteins utilize complex allosteric pathways initiated by ligands, DNA, and cofactors to regulate their transcriptional activity [36], [29], [13]. Structural variability in LBDs is expected given their functional plasticity, which is required for ligand-binding and conformational switching. The results suggest that while AF2 provides highly accurate backbone predictions, domain-specific challenges remain, particularly for proteins with dynamic regulatory elements.
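The two summary statistics used in this part of the discussion, the coefficient of variation (CV) of domain RMSD values and the check of the expected RMSD hierarchy, can be sketched as follows. This is a minimal illustration: the use of the sample standard deviation for the CV is an assumption, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV in percent: sample standard deviation divided by the mean."""
    # Assumption: sample (n - 1) standard deviation; a population
    # standard deviation would give slightly smaller values.
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def follows_rmsd_hierarchy(all_atom: float, backbone: float, c_alpha: float) -> bool:
    """True if RMSD values follow all-atom > backbone > C-alpha."""
    return all_atom > backbone > c_alpha


def fraction_violating(cases: Sequence[Sequence[float]]) -> float:
    """Percentage of (all_atom, backbone, c_alpha) triples breaking the hierarchy."""
    violations = sum(1 for case in cases if not follows_rmsd_hierarchy(*case))
    return 100.0 * violations / len(cases)
```

Applied to a list of per-structure RMSD triples, `fraction_violating` would yield the kind of percentage reported above for hierarchy violations, and `coefficient_of_variation` applied separately to DBD and LBD values would yield the domain-specific CVs.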

The systematic differences observed in SSE predictions, particularly the tendency toward higher α-helical content in AF2 models, align with previous observations by Jumper et al., who demonstrated AF2's overall high accuracy but with some biases in structural pattern recognition [27], and Varadi et al., whose analysis of the AF Database revealed consistent SSE distribution patterns [51]. Stevens and He directly quantified this tendency, showing that AF2 overestimates secondary structure elements by 2.28% in regions experimentally determined to be loops, with particular bias toward predicting α-helices [45]. Their comprehensive study of 31,650 loop regions revealed that AF2 has difficulty predicting longer loops (>20 amino acids) accurately and tends to impose more structured elements where experimental structures show greater flexibility. This statistical bias likely stems from the deep learning approach preferentially predicting more stable, well-defined structural elements over disordered or flexible regions that show greater variability in experimental structures. Idrees et al. further demonstrated this pattern, finding that AF2 assigns higher confidence scores (median pLDDT >90) to α-helices and β-sheets than to less structured regions [25]. The strong negative correlation between α-helical and loop content (r = −0.906 for LBD) suggests a compensatory mechanism in structural prediction, possibly reflecting the algorithm's training data composition. The complete absence of β-sheets in some experimental RXRα structures, while present in AF2 predictions, corresponds with findings regarding challenges in predicting less prevalent SSEs [2]. Interestingly, Herzberg and Moult demonstrated that AF2 generally excels at capturing uncommon structural motifs that occur in less than 1% of protein residues [21], making the specific case of β-sheet absence in RXRα particularly notable.
This discrepancy likely stems from the statistical nature of the deep learning approach, where AF2 may favor predicting more commonly observed structural elements in the training data to minimize overall prediction error. Less frequently occurring structural features may be suppressed in the model's predictions as they represent statistical outliers. Additionally, this disparity might arise from the inherent flexibility of these regions in solution, as demonstrated by NMR and hydrogen/deuterium exchange (HDX) studies [28], which may not be fully captured in either experimental 3D structures or AF2 predictions. The observed variations in domain-specific prediction accuracy suggest that local sequence context and domain interactions play crucial roles in determining secondary structure assignment, further supported by studies comparing crystal and cryo-EM structures of P-loop channels with AF2 models [48].

The geometric analysis of NR domain organization reveals complex patterns of structural variability and domain positioning that reflect both intrinsic flexibility and partner-specific adaptations. Our analyses focused on the DBD and LBD domains, as the intervening hinge region is typically absent from crystal structures due to its inherently flexible and disordered nature. The observed variations in domain distances, angles, and dihedrals demonstrate that NRs can adopt multiple stable conformations, consistent with their allosteric regulation mechanisms [38]. The most pronounced irregularities were observed in the inter-domain distances of HNF4α and NURR1, which showed the largest interquartile range (IQR) values among all analyzed receptors. This bimodal distribution of conformations aligns with recent findings by Chandra et al. [10], who demonstrated that NRs can exist in multiple distinct conformational states depending on their functional context. The substantial variation in HNF4α domain separation (IQR = 27.72 Å) particularly reflects its documented ability to form asymmetric dimers, where one monomer adopts an extended conformation while its partner maintains a more compact arrangement [16]. Domain angles showed receptor-specific patterns of variability, with RXRα exhibiting particularly interesting partner-dependent conformations. This observation supports the adaptability of RXRs, as RXRα's structural plasticity allows it to adjust to different heterodimeric partners [28], [52]. Notably, our analysis reveals a distinct pattern in conformational flexibility between homodimerizing and heterodimerizing NRs. Receptors capable of homodimerization demonstrate greater conformational diversity, particularly evident in DNA-bound structures where one monomer adopts a compact conformation while its partner maintains an extended state.
This asymmetric organization, observed in both GR and HNF4α structures, likely reflects their need to accommodate symmetric self-association and represents a fundamental mechanism for DNA binding. In contrast, NRs that exclusively heterodimerize with RXRα show more constrained conformational ranges, suggesting evolutionary optimization for specific partner interactions. RXRα itself exhibits an intermediate level of flexibility that allows it to serve as a promiscuous dimerization partner. Interestingly, standard AF2 predictions capture only one conformational state, typically matching the more extended conformation. This limitation of the standard AF2 pipeline highlights the importance of experimental structures in revealing the full spectrum of biologically relevant conformational states, particularly for multi-domain proteins like NRs that exhibit functional asymmetry. The systematic differences in RXRα angles when partnered with PPARγ versus LXRβ (∼50° difference) align with recent cryo-EM studies showing partner-specific conformational selection in NR heterodimers [8], [31]. The dihedral angle distributions revealed unexpected patterns, particularly in NURR1 and GR structures, where distinct conformational clusters were observed. Recent molecular dynamics (MD) studies suggest that such conformational diversity might be essential for allowing these receptors to interact with different regulatory elements and coregulators [54], [55]. The bimodal distribution of GR dihedral angles (IQR = 81.81°) particularly reflects its documented ability to adopt distinct conformations in response to different ligands [42], [30]. AF2 predictions generally captured the predominant conformational states observed in experimental structures, but showed interesting deviations in cases of extreme structural plasticity.
This aligns with the assessments of AF2's performance on flexible multi-domain proteins [2], suggesting that while the algorithm effectively predicts stable conformations, it may not capture the full range of physiologically relevant structural states. The boxplot analysis (Fig. 2A-C) highlights key patterns in NR structural variability. HNF4α and NURR1 exhibit greater conformational diversity, likely reflecting their functional plasticity. RXRα shows partner-dependent clustering, supporting the idea that its conformation is influenced by heterodimeric interactions [32]. Domain-specific differences reveal that DBD-LBD distances are more variable than angles, suggesting inter-domain separation plays a key role in NR allostery. More detailed structural comparisons revealed receptor-specific patterns: i) AF2's HNF4α structure shows greater similarity to specific chains (4IQR_B, _F) that facilitate high-affinity DNA-binding [9], ii) the LXRβ model closely matches specific conformational states observed in crystal structures (4NQA_I), particularly in terms of domain geometry parameters, and iii) RXRα's predicted structure shows variable similarity to different experimental structures depending on the geometric descriptor analyzed, consistent with its known adaptability to different dimerization partners [10].
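As an illustration of how an inter-domain distance and its spread across chains might be quantified, the sketch below computes the distance between two domain centroids and the interquartile range of a set of such distances. The exact geometric definitions used in this study are not restated in this section, so the centroid-based distance, the quartile method (the default "exclusive" method of Python's `statistics.quantiles`), and all names are assumptions made for illustration only.

```python
import math
import statistics
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(coords: Sequence[Point]) -> Point:
    """Unweighted geometric centre of a set of atom coordinates."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def domain_distance(dbd: Sequence[Point], lbd: Sequence[Point]) -> float:
    """Distance between DBD and LBD centroids, in the units of the input."""
    return math.dist(centroid(dbd), centroid(lbd))


def iqr(values: Sequence[float]) -> float:
    """Interquartile range (Q3 - Q1); other quartile methods give other values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1
```

A large IQR over the chains of one receptor, as reported above for HNF4α, indicates that the experimental structures sample clearly different domain separations, whereas a single AF2 model contributes only one value to such a distribution.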

In our geometric analysis of domain architecture, we deliberately chose to analyze the raw AF2 predictions rather than performing additional refinement using MD simulations. This methodological choice was driven by our primary objective to assess the intrinsic capabilities and limitations of AF2 in predicting NR structures, providing an unbiased benchmark for this important protein family. While MD simulations could potentially improve the predicted structures by optimizing them toward minimum energy states, such refinement would fundamentally alter what we are evaluating—shifting from an assessment of raw AF2 performance to an evaluation of a hybrid AF2-MD pipeline [27]. Recent studies reflect this duality, for instance Zhou et al. [60] demonstrated that combining AF2 with MD simulations (in their MoDAFold framework) can improve structural accuracy in the context of missense mutations, offering benefits for localized refinement. In contrast, Pei et al. [34] reported that even with MD simulations, AF2 failed to capture conformational shifts in antithrombin variants, instead favoring native-like folds despite the presence of functionally significant mutations. These findings suggest that while post-prediction refinement can enhance certain applications such as ligand docking, it may also introduce artifacts or obscure key features captured by the raw AF2 models. Furthermore, applying MD simulations to both experimental and predicted structures would introduce additional variables, potentially confounding our ability to isolate AF2-specific prediction limitations. For NRs specifically, conformational flexibility and inter-domain dynamics are essential to their function. Imposing energy minimization or structural restraints may bias the models toward artificially rigid conformations, undermining the dynamic nature of these proteins. 
Future studies extending our current work could explore the impact of various MD protocols on AF2 NR predictions, examining how different force fields and simulation parameters affect the domain organization metrics we have established here. Such hybrid approaches combining AF2's strengths in global fold prediction with MD's capabilities in local refinement represent a promising direction for structure-based drug discovery targeting NRs.

The high accuracy in sphericity predictions, with r=0.891, indicates that AF2 captures the overall pocket geometry well, consistent with its demonstrated ability to predict global protein structure [27], whereas there was a negative correlation between effective pocket radius and sphericity with r=-0.482. Notably, RXRα shows exceptional consistency between experimental and predicted structures, potentially due to its well-conserved pocket architecture [38]. The larger LBP volume range in LXRβ (-15.8%) aligns with studies highlighting its structural flexibility [47], suggesting that static AF2 predictions may not fully capture its dynamic LBP. Moreover, the systematic underestimation of pocket volumes by AF2 suggests that the algorithm may be biased toward capturing more compact, energetically favorable states rather than the full range of physiologically relevant conformations that NRs adopt during transcriptional regulation. An additional study comparing AF2-predicted and experimental structures of G-protein-coupled receptors (GPCRs) provides further evidence of these limitations [19]. While AF2 can predict protein structures with a Cα RMSD accuracy of 1 Å, it struggles with high-accuracy predictions of GPCR pocket side chains. It also poorly predicts extracellular and transmembrane domains, as well as the transducer-binding interface of GPCRs. These findings suggest that AF2 may be less reliable for modeling highly flexible or functionally dynamic regions, reinforcing the importance of experimental structure determination for capturing biologically relevant conformations [19]. All these findings expand our understanding of AF2's capabilities and limitations in predicting NR binding pockets, which is particularly important for structure-based drug design applications [11].
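The pocket descriptors discussed above can be related to one another with a few lines of code. The sketch below is hedged: it assumes that the volume difference is expressed relative to the experimental value, that the effective radius is the radius of a sphere with the same volume, and that sphericity follows the common Wadell definition; the definitions applied by the pocket-detection software used in this study may differ, and all function names are hypothetical.

```python
import math


def percent_volume_difference(experimental: float, predicted: float) -> float:
    """Signed difference of the predicted volume relative to experiment, in %."""
    return 100.0 * (predicted - experimental) / experimental


def effective_radius(volume: float) -> float:
    """Radius of a sphere with the same volume as the pocket."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def wadell_sphericity(volume: float, surface_area: float) -> float:
    """Wadell sphericity: 1.0 for a perfect sphere, smaller otherwise."""
    return (math.pi ** (1.0 / 3.0)) * (6.0 * volume) ** (2.0 / 3.0) / surface_area
```

Under these assumptions, a negative value from `percent_volume_difference` corresponds to an underestimated pocket, which is the direction of the systematic bias described above.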

The comparison of structure validation parameters between AF2 predictions and experimental structures showed distinct patterns in NR conformational stability. Differences between RSCC and pLDDT scores are more pronounced in loops or unstructured regions, such as the activation function-1 domain at the N-termini and certain C-terminal regions, where experimental structures are often unavailable. The activation function-1 domain is particularly notable for its uniformly low RSCC and pLDDT values, reflecting the lack of experimental structural data and its inherent flexibility. These findings highlight both the strengths of AF2 predictions in core domains and the limitations in capturing certain unstructured or flexible regions. This observation is supported by a comprehensive analysis of 904 human proteins comparing AF2 and NMR structures, which found that while AF2 models are generally more accurate than NMR structures, NMR structures prove more reliable for dynamic structures [17].

For GR residue S616, located in helix H5 of the LBD, experimental structures consistently show high RSCC values while AF2 predicts low confidence. This region forms part of the LBD-DBD interface that is critical for receptor function: specifically, the N-terminal end of H5 forms substantial interactions with the DBD and participates in key domain interfaces when bound to ligands like velsecorat or fluticasone furoate [35]. The high RSCC values in crystal structures suggest this region becomes well-ordered in the presence of binding partners, while AF2's low confidence may reflect inherent flexibility in the unbound state. For HNF4α residue E145 in the hinge region close to helix H1, AF2 predicts moderate confidence (pLDDT > 0.6) while experimental structures show very low or negative RSCC values specifically in chains B and F of the 4IQR structure. This region is part of a critical DBD-hinge-LBD construct (residues 141-368) that facilitates high-affinity DNA binding, as directly demonstrated by the 75-fold weaker binding of isolated DBD-hinge portions (Kd ∼6000 nM) compared to constructs including the LBD (Kd 80 nM) [9]. The negative RSCC values in specific chains suggest conformational heterogeneity at these interfaces rather than inherent disorder, consistent with their role in dynamic domain organization required for cooperative DNA binding. These discrepancies between computational predictions and experimental structures highlight regions involved in conformational changes and allosteric regulation, providing valuable insights for understanding NR function and potential drug targeting strategies. For all NRs, the low pLDDT scores in the AF2 models correlate with missing residues or low RSCC values in the experimental 3D structures. As expected, pLDDT scores also inversely correlate with B-factor values.
This correlation between pLDDT scores and RSCC values has been independently validated in a large-scale comparison of 150,000 human protein structures, though that study also emphasized that high-resolution crystallographic structures remain preferable when available [44]. Notably, several NRs (PPARγ, RXRα, HNF4α) exhibit missing residues in their LBD regions, and these areas consistently show lower pLDDT scores (50-70) in AF2 predictions, even when other experimental structures contain these regions. Conversely, NRs with complete experimental structures (RARβ, LXRβ) show consistently high pLDDT scores (>90) throughout their LBDs. Interestingly, for NRs lacking crystal structures (RORβ, GCNF, NOR1, TR2, EAR2), AF2 generally predicts high confidence scores (>70-90) across their LBDs, with only occasional lower-confidence regions.

Our backbone torsional analysis reveals a consistent pattern where AF2 models demonstrate higher stereochemical quality compared to experimental structures. This observation is consistent with AF2's energy function strongly favoring ideal backbone geometry and its architecture implementing strong geometric constraints during structure prediction, which explains the high proportion of residues in favored Ramachandran regions [27]. The “imperfections” observed in crystal structures likely reflect genuine conformational flexibility or structural adaptations, a view supported by the finding that crystal structure Ramachandran outliers often occur in functionally relevant regions and may represent legitimate conformational states rather than experimental errors [12]. The varying degrees of agreement between experimental and predicted structures across different NRs suggest that AF2's accuracy depends on multiple factors. Similar observations were reported by Akdel et al. [2] in their large-scale analysis of AF2 predictions, particularly noting that regions of proteins involved in interactions or conformational changes may show larger deviations from idealized geometry. The case of HNF4α, with its disparate ϕ/ψ correlations, illustrates how AF2 may capture overall fold topology while missing specific local conformational details. This aligns with findings showing that while AF2 excels at predicting global structure, it may miss subtle but functionally important local conformational features [39], [18]. The high-quality predictions for RARβ demonstrate that AF2 can achieve exceptional accuracy when the native structure already exhibits ideal geometry. However, the systematic differences observed in other cases raise important considerations for structure-based drug design and protein engineering applications, where natural conformational flexibility may be functionally relevant, as shown for understanding enzyme mechanisms and designing effective inhibitors [7].
While AF2 has shown remarkable potential in predicting individual protein structures, it's important to note that proteins rarely function in isolation. Recent developments in AF2 and in particular in AF3 have begun to address the challenges of modeling macromolecular complexes, though experimental methods remain essential for validation and for capturing the full range of biological conformations [23], [1].

To provide a comprehensive overview of our comparative analysis between experimental and AF2-predicted NR structures, we summarize the eight key metrics evaluated in this study in Table 3. This summary highlights both the strengths and limitations of AF2 in capturing the structural features critical for NR function.

Table 3.

Summary of metrics comparing experimental and AF2 NR structures.

| Metric | Experimental Structures | AF2 Predictions | Significant Differences | Biological/Functional Implications |
|---|---|---|---|---|
| RMSD (by domain) | Variable structural similarity between domains | Generally low RMSD values (<0.55 Å) | LBDs showed higher structural variability (CV = 29.3%) than DBDs (CV = 17.7%); DBD RMSD exceeded LBD in 41.7% of cases | Flexible regions essential for ligand binding have higher prediction variability |
| SSE Content | Higher loop content, variable α-helical distribution | Tendency toward higher α-helical content (+4.2% DBDs, +3.8% LBDs) with compensatory decreases in loop regions (−5.0%) | Strong negative correlations between α-helical and loop content (r = −0.527 for DBD, r = −0.906 for LBD) | AF2 may overestimate secondary structure stability in dynamically flexible regions |
| Domain Distances (ϵDBD,LBD) | Marked asymmetry in homodimeric structures; partner-specific distances in heterodimers | Captures only a single conformational state | Homodimers show bimodal distance distribution not captured by AF2 | AF2 misses functionally important asymmetry in domain organization |
| Domain Angles (θDBD,LBD) | Receptor-specific and partner-dependent variations | Generally predicts one angle conformation | RXRα showed most partner-dependent variations not fully captured by AF2 | Angle variations may reflect allosteric regulation mechanisms |
| Dihedral Angles (DHDBD,LBD) | Distinct conformational clusters, particularly in NURR1 and GR | Captures predominant conformational states | GR showed bimodal distribution (IQR = 81.81°) not fully captured by AF2 | Dihedral angle diversity may be essential for regulatory element interactions |
| LBP Geometry | Larger, more variable pocket volumes | Systematically underpredicts pocket volumes (by 8.4% on average) | LXRβ showed most significant volume difference (>150% larger in experimental structures) | May impact structure-based drug design, potentially missing cryptic pockets |
| Structure Validation Parameters | Variable RSCC scores, especially in flexible regions | Consistent pLDDT patterns with low scores in intrinsically disordered regions | Low pLDDT scores correlate with missing residues or low RSCC values | Highlights regions involved in conformational changes and allosteric regulation |
| Backbone Torsional Analysis | More Ramachandran outliers, often in functionally relevant regions | Higher stereochemical quality with fewer outliers | Experimental PPARγ structure had 18 Ramachandran outliers vs. only 1 in AF2 | “Imperfections” in crystal structures may represent legitimate functional states |

4. Limitations of AI structure prediction and future directions

4.1. Emerging AI models for protein structure prediction

Recent advances in protein structure prediction have led to the development of more sophisticated tools beyond AF2, notably AF3 [1], which might overcome the challenges we observed in predicting LBP geometries and homodimeric asymmetry in NRs. Of particular relevance to NR research is the enhanced capability of these newer models for predicting protein-ligand interactions. AF3 demonstrates at least a 50% improvement in accuracy for protein interactions with other molecules compared to previous methods and incorporates capabilities for predicting protein-ligand complexes, which could potentially overcome the systematic underestimation of pocket volumes (8.4% on average) we observed with AF2. Chai-1 [46], another state-of-the-art model, implements novel architectural improvements that could better capture the conformational flexibility of NR domains and offers enhanced performance for protein-ligand interaction prediction, with comparable or slightly better accuracy than AF3 on the PoseBusters benchmark (77% success rate for Chai-1 versus 76% for AF3). Additionally, Chai-1 can be optionally prompted with experimental constraints derived from laboratory data to boost performance [46]. These developments could be particularly valuable for structure-based drug design targeting NRs, where accurate representation of LBPs is essential. While beyond the scope of our current analysis, evaluating these newer models using our systematic framework would provide valuable insights into whether the latest generation of AI-based structure prediction tools can more accurately represent the full spectrum of biologically relevant states in NRs, including the larger LBP volumes we documented in structures like LXRβ. Such comparative analysis would be particularly relevant for structure-based drug design applications targeting the NR family.

4.2. Methodological considerations in AI structure evaluation

Our study evaluated single AF2 predicted models from the AlphaFold Protein Structure Database, which represents the most accessible format of these predictions for the broader research community. While this approach allowed us to systematically analyze the standard AF2 output that most researchers would encounter, we acknowledge that analyzing multiple alternative models and incorporating Predicted Aligned Error (PAE) matrices would provide additional insights, particularly regarding domain arrangement confidence and potential conformational variability. PAE metrics would have been especially valuable for quantifying prediction confidence in domain-domain interfaces, which proved critical in our geometric analysis of NR domain architecture. Importantly, our study establishes a comprehensive analytical framework for evaluating AI-predicted structures of multi-domain proteins, with metrics and methodological approaches that can be readily applied to future prediction models, alternative sampling techniques, or other protein families with complex domain organization. This framework provides a valuable roadmap for systematic structure evaluation regardless of the prediction method used. Future studies could benefit from generating multiple models using standalone AF2 implementations or services like ColabFold, enabling statistical assessment of prediction variability and more comprehensive evaluation of domain positioning confidence. Such multi-model analysis could reveal whether AF2's conformational sampling occasionally captures the alternative states we observed in experimental structures, particularly the asymmetric arrangements in homodimeric receptors. This approach would complement our current analysis by distinguishing between inherent algorithm limitations and sampling inadequacies in representing the full conformational landscape of NRs.

4.3. Key limitations of AF2 in NR prediction

The comprehensive analysis of AF2 predictions for NRs shows several key limitations in current structure prediction approaches. A fundamental challenge lies in AF2's tendency to model static, well-ordered conformations that mirror crystallographic states, potentially missing important physiologically relevant conformational heterogeneity. Current limitations can be categorized into three main areas:

  1. The first major limitation concerns dynamic regions and conformational flexibility. Our analysis showed significantly higher structural variability in LBDs (CV = 29.3%) compared to DBDs (CV = 17.7%), with systematic underestimation of LBP volumes. This is particularly evident in cases like LXRβ, where we observed a −15.8% deviation in pocket volume predictions. This limitation is particularly significant for structure-based drug design approaches targeting NRs, where accurate pocket volume and flexibility prediction is essential for identifying potential ligands.

  2. The second key limitation involves multi-domain organization and regulation. AF2 shows notable difficulties in capturing multiple biologically relevant conformational states, particularly the compact-extended asymmetry in homodimers and predicting partner-specific conformational adaptations in heterodimeric complexes. This limitation likely originates from the algorithm's optimization for single-state prediction rather than conformational ensembles, and its reduced capacity to model the cooperative domain movements and long-range allosteric effects that are crucial for NR function. Our analysis revealed that AF2 consistently predicts only one conformational state for asymmetric homodimers like GR and HNF4α, typically corresponding to the extended conformation, while missing the biologically significant compact states that enable differential DNA recognition and cofactor recruitment.

  3. The third limitation stems from training biases toward crystallizable protein states and static structural data, affecting the overall prediction quality particularly in flexible regions. This bias is compounded by the underrepresentation of multiple conformational states for the same protein in structural databases, where proteins are typically captured in single, stable conformations favored by crystallization. Additionally, the algorithm may have inherent difficulties in capturing the long-range interactions that govern domain-domain positioning across different functional states, particularly when these relationships involve subtle allosteric mechanisms or partner-induced conformational changes. This limitation particularly affects NRs, where conformational diversity is essential for their function as molecular switches in transcriptional regulation.

To address these challenges, the integration of molecular dynamics (MD) simulations and energy minimization protocols could significantly improve conformational landscape exploration, though this would increase computational costs. Expanding training datasets to include diverse experimental data from cryo-EM and NMR studies could enhance the prediction of dynamic states and transient conformations.

The future of NR structure prediction likely lies in hybrid approaches that integrate experimental and computational methods. While more sophisticated sampling methods and broader training datasets could enhance prediction accuracy, they must balance computational efficiency with biological relevance. Future methods should focus on capturing both structural accuracy and complex regulatory mechanisms, including domain flexibility, allosteric communication, and partner-specific adaptations. Given AF3's expanded capabilities in predicting protein complexes, systematic evaluation following our analysis framework is needed, particularly focusing on DNA binding, ligand interactions, and cofactor peptide associations in NRs.

Recent studies have recognized the limitation of standard AF2 in capturing only single conformational states and have developed specialized modifications to address this issue. Del Alamo et al. [15] proposed a method to sample alternative conformations by manipulating multiple sequence alignment inputs, while Wayment-Steele et al. [53] developed techniques to explore diverse protein conformations from AF2 predictions. While beyond the scope of our current analysis, applying these enhanced sampling approaches to nuclear receptors could potentially reveal the multiple conformational states we observed in experimental structures, particularly the functional asymmetry in homodimers. Future work combining these specialized AF2 modifications with our analytical framework could provide deeper insights into the conformational landscape of NRs.

5. Conclusion

This first comprehensive comparative analysis of AF2-predicted and experimental NR structures reveals both the strengths and limitations of computational structure prediction. While AF2 excels at predicting stable conformations with high stereochemical quality, it shows limitations in capturing the full spectrum of biologically relevant conformational states, particularly in flexible regions and LBPs. The study demonstrates that AF2 predictions should be used as complementary tools alongside experimental structures, especially for understanding protein dynamics and drug-target interactions. Critical differences between predicted and experimental structures often highlight functionally important regions involved in allosteric regulation and protein-protein interactions, providing valuable insights for structure-based drug design targeting NRs.

6. Materials and methods

All geometrical measurements and analyses were performed in PyMOL v2.4.2 [43] using built-in commands or Python scripts.

6.1. RMSD analysis

The RMSD is a global measure of the similarity between protein structures, calculated as the square root of the mean squared distance between aligned atoms of superimposed structures. The structure-based dynamic programming alignment of experimental and predicted structures was performed on three atom selections: i) all atoms, ii) Cα atoms only, and iii) backbone atoms (N, Cα, C and O). The RMSD in Å was calculated individually for the DBDs and LBDs using the super command in PyMOL (version 2.4.2, Schrödinger, LLC) according to the equation:

$$\mathrm{RMSD}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-y_i\right)^2} \tag{1}$$

where $N$ is the number of aligned atoms, and $(x_i-y_i)$ represents the distance between the $i$th pair of equivalent atoms after optimal superposition. Statistical analyses were performed to quantify structural variations and domain-specific differences. The coefficient of variation (CV) was calculated to assess relative variability:

$$\mathrm{CV}=\frac{\sigma}{\mu}\times 100 \tag{2}$$

where σ is the standard deviation and μ is the mean RMSD value for each measurement type. Domain-specific differences were evaluated using paired statistical analysis, with effect size quantified by Cohen's d:

$$d=\frac{\left|\mu_1-\mu_2\right|}{\sqrt{\dfrac{\sigma_1^2+\sigma_2^2}{2}}} \tag{3}$$

where $\mu_1$ and $\mu_2$ are the means, and $\sigma_1$ and $\sigma_2$ are the standard deviations of the DBD and LBD RMSD values, respectively. The significance of domain differences was assessed using a paired t-test with a significance level of $\alpha=0.05$. To assess structural prediction consistency across different measurement types (all-atom, backbone, Cα), hierarchical patterns were analyzed, and deviations from the expected pattern (all-atom > backbone > Cα) were quantified. Domain measurements were performed separately to account for the potential positional bias arising from differential domain arrangements in 3D space.
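As an illustration of Eqs. (1) to (3), the following is a minimal Python sketch rather than the script used in this study. It assumes that the two coordinate lists are already superimposed and paired atom by atom (for example after PyMOL's super command), and it uses the sample standard deviation, which the text does not specify. All function names are hypothetical.

```python
import math
import statistics


def rmsd(coords_a, coords_b):
    """Eq. (1): RMSD between two paired lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def coefficient_of_variation(values):
    """Eq. (2): CV in percent (sample standard deviation is an assumption)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def cohens_d(group_1, group_2):
    """Eq. (3): effect size from the means and standard deviations of two groups."""
    mu_1, mu_2 = statistics.mean(group_1), statistics.mean(group_2)
    sd_1, sd_2 = statistics.stdev(group_1), statistics.stdev(group_2)
    return abs(mu_1 - mu_2) / math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2.0)
```

Each function takes plain Python lists, so per-domain RMSD values exported from PyMOL could be passed in directly.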

6.2. SSE content analysis

The α-helical, β-sheet, and loop content, expressed in %, was calculated separately for the DBD and LBD using Python scripts in the PyMOL environment. Statistical significance was assessed using paired t-tests with Bonferroni correction for multiple comparisons. Structure element correlations were calculated using Pearson's correlation coefficient:

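$$r_{H,L}=\frac{\sum_{i=1}^{n}\left(H_i-\bar{H}\right)\left(L_i-\bar{L}\right)}{\sqrt{\sum_{i=1}^{n}\left(H_i-\bar{H}\right)^2\sum_{i=1}^{n}\left(L_i-\bar{L}\right)^2}} \tag{4}$$

where $H_i$ and $L_i$ represent α-helical and loop content percentages, respectively, and $\bar{H}$ and $\bar{L}$ are their means.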
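Eq. (4) can be written out term by term as in the short Python sketch below. It is illustrative only. The function name and the example values in the usage note are hypothetical, not data from this study.

```python
import math


def pearson_r(helix_pct, loop_pct):
    """Eq. (4): Pearson correlation between helix and loop content percentages."""
    n = len(helix_pct)
    if n != len(loop_pct) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    h_mean = sum(helix_pct) / n
    l_mean = sum(loop_pct) / n
    # Numerator: sum of products of deviations from the two means.
    cov = sum((h - h_mean) * (l - l_mean) for h, l in zip(helix_pct, loop_pct))
    # Denominator: square root of the product of the summed squared deviations.
    ss_h = sum((h - h_mean) ** 2 for h in helix_pct)
    ss_l = sum((l - l_mean) ** 2 for l in loop_pct)
    return cov / math.sqrt(ss_h * ss_l)
```

For instance, `pearson_r([10, 20, 30], [30, 20, 10])` describes a perfectly anticorrelated pair of series.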

6.3. Geometric analysis of domain architecture

Distance $\epsilon_{\mathrm{DBD,LBD}}$ between DBD and LBD

The COM distance metric $\epsilon_{\mathrm{DBD,LBD}}$ was selected to quantify global domain separation independent of local structural variations, providing a measure of overall domain arrangement. This approach aims to characterize multi-domain protein architectures and is valuable for NRs, where domain positioning significantly impacts function. COM distances effectively capture the large-scale conformational differences that distinguish functional states in allosteric proteins like NRs. The physical distance in Å between the two domains (DBD and LBD) was calculated with the PyMOL command get_distance between the COMs of the two domains, which were determined using the centerofmass.py script. Statistical variation in domain distances was quantified using the interquartile range (IQR):

$$\mathrm{IQR}=Q_3-Q_1 \tag{5}$$

where $Q_1$ and $Q_3$ represent the first and third quartiles of the experimental structure measurements, respectively.
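The distance metric and Eq. (5) can be sketched in plain Python as follows. This is not the centerofmass.py script, whose internals are not described here. The sketch assumes a mass-weighted centre and the "inclusive" quartile interpolation of the standard library, both of which are assumptions, and the function names are hypothetical.

```python
import math
import statistics


def center_of_mass(atoms):
    """Mass-weighted centre of a list of (mass, (x, y, z)) tuples (assumed weighting)."""
    total_mass = sum(mass for mass, _ in atoms)
    return tuple(
        sum(mass * xyz[axis] for mass, xyz in atoms) / total_mass
        for axis in range(3)
    )


def com_distance(domain_a, domain_b):
    """Distance in the input units (e.g. Å) between the centres of two domains."""
    return math.dist(center_of_mass(domain_a), center_of_mass(domain_b))


def iqr(values):
    """Eq. (5): Q3 - Q1; the quartile interpolation method is an assumption."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1
```

Different quartile conventions give slightly different IQR values for small samples, so the convention should be reported together with the result.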

6.3.1. Angle $\theta_{\mathrm{DBD,LBD}}$ between DBD and LBD

Angular measurements between domain COMs and hinge regions, $\theta_{\mathrm{DBD,LBD}}$, were selected to characterize the relative orientation of domains, which is critical for understanding how NRs position their DBDs and LBDs in 3D space. This geometric descriptor can identify distinct conformational states in multi-domain transcription factors and provides complementary information to distance measurements by capturing rotational relationships that affect functional interactions with DNA and coregulatory proteins. $\theta_{\mathrm{DBD,LBD}}$ was calculated using three points: 1) $\mathrm{DBD_{COM}}$, 2) $\mathrm{LBD_{COM}}$, and 3) the COM defined by the last Cα atom of the DBD ($\mathrm{DBD_{C\alpha L}}$) and the first Cα atom of the LBD ($\mathrm{LBD_{C\alpha F}}$). This approach was chosen to standardize the definition of the hinge COM across structures with varying numbers of missing atoms in the hinge region. Angles are expressed in degrees (°) and were calculated using the PyMOL command get_angle.

Angular variation was assessed using both IQR and circular statistics:

$$\sigma_{\theta}=\sqrt{-2\ln(R)} \tag{6}$$

where $R$ is the mean resultant length of the angular measurements.
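A minimal Python sketch of the angle measurement and of Eq. (6) is given below. It assumes that the hinge point is the vertex of the angle, which the text does not state explicitly. The midpoint of the two Cα atoms is used as the hinge COM, which is exact only because both atoms have the same mass. Function names are hypothetical.

```python
import math


def midpoint(p, q):
    """COM of two equal-mass atoms, e.g. the last DBD and first LBD C-alpha."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def angle_deg(p1, vertex, p3):
    """Angle p1-vertex-p3 in degrees (vertex = hinge point is an assumption)."""
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p3, vertex)]
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def circular_sd_deg(angles_deg):
    """Eq. (6): circular standard deviation, converted from radians to degrees.

    Undefined when the mean resultant length R is zero.
    """
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    r = min(1.0, math.hypot(c, s))  # mean resultant length
    return math.degrees(math.sqrt(-2.0 * math.log(r)))
```

Eq. (6) yields a value in radians, so the conversion to degrees in the last line is a presentation choice of this sketch.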

6.3.2. Dihedral angles $\mathrm{DH}_{\mathrm{DBD,LBD}}$ between DBD and LBD

Dihedral angle analysis $\mathrm{DH}_{\mathrm{DBD,LBD}}$ was employed to quantify the torsional relationship between domains, which is particularly informative for identifying conformational subtypes in proteins with modular architecture. This metric captures rotational differences that are not evident from distance or simple angle measurements alone, revealing subtle but functionally significant conformational variations. Dihedral angles are especially relevant for NRs, as they reflect the relative positioning of interaction surfaces involved in dimerization and cofactor recruitment. $\mathrm{DH}_{\mathrm{DBD,LBD}}$ values were calculated using four points: 1) $\mathrm{DBD_{COM}}$, 2) $\mathrm{DBD_{C\alpha L}}$, 3) $\mathrm{LBD_{C\alpha F}}$ and 4) $\mathrm{LBD_{COM}}$. Angles are expressed in degrees and were calculated using the PyMOL command get_dihedral.

Dihedral angle distributions were analyzed using circular statistics and conformational clustering:

$$\text{Conformational Similarity}=\cos\left(\Delta \mathrm{DH}_{\mathrm{DBD,LBD}}\right) \tag{7}$$

where $\Delta \mathrm{DH}_{\mathrm{DBD,LBD}}$ represents the difference in dihedral angles between structures.
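The four-point dihedral and the similarity measure of Eq. (7) can be sketched as follows. This is a generic vector implementation, not PyMOL's get_dihedral, and the sign convention of the returned angle is an assumption that may differ from PyMOL's. Function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central axis
    # Project the outer vectors onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def conformational_similarity(dh_a_deg, dh_b_deg):
    """Eq. (7): cosine of the dihedral difference; 1 = identical, -1 = opposite."""
    return math.cos(math.radians(dh_a_deg - dh_b_deg))
```

Because the cosine is periodic, dihedrals of 170° and −170° are scored as similar, which a plain arithmetic difference would miss.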

Comparative analysis between experimental structures and AF2 predictions was performed using multiple metrics. The spread of experimental measurements was quantified using IQR to assess structural variability. Partner-specific effects were evaluated using one-way ANOVA with post-hoc Tukey's test where appropriate. For circular measurements (angles and dihedrals), appropriate circular statistics were applied. Statistical significance was set at p < 0.05.

6.4. Analysis of LBP geometry

The LBPs were identified by hollow 1.3 [22]. The Connolly surface volume and area were calculated with a 1.4 Å probe radius using msms 2.6.1 [40] and the PyMOL command get_area, respectively. The LBPs were subsequently visualized in PyMOL at a cutoff value of 5 Å from the particular ligand located in the structure. For the apo AF2-predicted structure, the cutoff was calculated from the position of all merged ligands in the holo structures. The LBP sphericity Ψ is defined as:

$$\Psi=\frac{\pi^{1/3}(6V)^{2/3}}{A} \tag{8}$$

where $V$ is the volume and $A$ is the surface area. The effective radius of the LBP is defined as the radius of a sphere with the same surface-area-to-volume ratio as the volume of interest, determined by:

$$r_{\mathrm{eff}}=\frac{3V}{A} \tag{9}$$

Statistical analyses for the LBP geometry were performed using custom Python scripts. Kruskal-Wallis H-tests were conducted to assess significant differences between NRs for both effective radius and sphericity measurements. Pearson correlation coefficients were calculated to evaluate relationships between effective radius, sphericity, and experimental versus AF2 predicted values. For experimental structures with multiple measurements, means and standard deviations were calculated. Hierarchical clustering was performed using Euclidean distances between radius and sphericity measurements to group receptors with similar characteristics. Linear regression analysis was used to assess the systematic relationships between experimental and AF2-predicted values, with regression slopes indicating prediction biases. Percentage deviations between experimental and predicted values were calculated as $(v_{\mathrm{pred}}-v_{\mathrm{exp}})/v_{\mathrm{exp}}\times 100\%$, where $v$ represents either radius or sphericity measurements.
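Eqs. (8) and (9) and the percentage deviation can be checked with the short Python sketch below. It is illustrative only and is not the custom script used in this study. The function names are hypothetical. A perfect sphere serves as a sanity check, since its sphericity is 1 and its effective radius equals its radius.

```python
import math


def sphericity(volume, area):
    """Eq. (8): equals 1 for a perfect sphere and is smaller for other shapes."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / area


def effective_radius(volume, area):
    """Eq. (9): radius of a sphere with the same volume-to-area ratio."""
    return 3.0 * volume / area


def percent_deviation(predicted, experimental):
    """Signed deviation in percent; negative values indicate underestimation."""
    return (predicted - experimental) / experimental * 100.0


# Sanity check with a sphere of radius 2 (arbitrary units).
radius = 2.0
sphere_volume = 4.0 / 3.0 * math.pi * radius ** 3
sphere_area = 4.0 * math.pi * radius ** 2
```

With this sign convention, a predicted pocket that is smaller than the experimental one gives a negative deviation.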

6.5. Analysis of structure validation parameters

6.5.1. Comparative analysis of b-factors, RSCC and pLDDT parameters

Local structural quality metrics were visualized through spatial mapping using PyMOL's b-factor putty representation. The visualization parameters were configured to display variations in putty diameter and chromatic gradient corresponding to the respective quality metrics. For crystallographic structures, atomic b-factors were utilized to assess conformational mobility, while RSCC provided quantitative measures of model-to-density agreement. pLDDT scores derived from AF2 models served as confidence metrics for the computational predictions. To enable direct comparison, pLDDT values were scaled by 1/100 to match the same numerical range as RSCC. RSCC of PDB structures and pLDDT scores of AF2 structures were then compared and plotted using AF2 NR sequence numbering. Comparative analysis of these spatially mapped parameters facilitated the identification of regions exhibiting structural correspondence or deviation between experimental and computationally predicted models, with particular emphasis on the DBD and LBD regions.

6.5.2. Torsional analysis

The backbone torsion angles (ϕ and ψ) of the DBDs and LBDs were analyzed using a custom Python script run within PyMOL. Based on the ϕ and ψ torsion angles, the similarity between experimental structures and AF2 predictions was assessed using a Python script. Four metrics (RMSD, Wasserstein distance, mean absolute error, and Pearson correlation coefficient) were used to evaluate the resemblance for each NR pair. Conformational analysis of the protein backbone geometry was performed using Ramachandran plot calculations in MolProbity [12], hosted at http://molprobity.biochem.duke.edu. Prior to analysis, structural alignment between predicted AF2 models and experimentally determined structures was performed to ensure a proper comparative assessment. The Ramachandran plots were generated by submitting each NR structure to the MolProbity server, enabling a quantitative evaluation of backbone conformations through assessment of sterically allowed and disallowed regions in the ϕ - ψ torsional space.
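Two of the four torsion similarity metrics, RMSD and mean absolute error, can be sketched in Python as shown below. The sketch assumes that angle differences are wrapped across the ±180° boundary, which the text does not state, and that the two lists are paired residue by residue. It is not the script used in this study, and the function names are hypothetical.

```python
import math


def wrapped_diff(a_deg, b_deg):
    """Smallest signed difference a - b in degrees, in the range [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def torsion_rmsd(exp_deg, pred_deg):
    """RMSD over paired torsion angles using wrapped differences (assumption)."""
    diffs = [wrapped_diff(e, p) for e, p in zip(exp_deg, pred_deg)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def torsion_mae(exp_deg, pred_deg):
    """Mean absolute error over paired torsion angles using wrapped differences."""
    diffs = [abs(wrapped_diff(e, p)) for e, p in zip(exp_deg, pred_deg)]
    return sum(diffs) / len(diffs)
```

Without wrapping, a pair such as 179° and −179° would count as a 358° difference instead of 2°, which would inflate both metrics for residues near the boundary.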

CRediT authorship contribution statement

Akerke Mazhibiyeva: Writing – original draft, Visualization, Methodology, Investigation, Formal analysis. Tri T. Pham: Writing – review & editing, Formal analysis. Karina Pats: Writing – review & editing, Formal analysis. Martin Lukac: Writing – review & editing, Formal analysis. Ferdinand Molnár: Writing – review & editing, Visualization, Supervision, Funding acquisition, Conceptualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this manuscript.

Acknowledgements

This work was supported by Nazarbayev University Collaborative Research Proposal # 091019CRP2108 to F.M.

Footnotes

Appendix A

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.csbj.2025.05.010.

Appendix A. Supplementary material

The following is the Supplementary material related to this article.

MMC 1

Data for RMSD, SSE and Domain architecture analysis.

mmc1.docx (36.4KB, docx)
MMC 2

Data for LBP geometry analysis.

mmc2.xlsx (60.5KB, xlsx)
MMC 3

Data for RSCC, b-factors and pLDDT analysis.

mmc3.xlsx (33.1KB, xlsx)
MMC 4

Data for Torsion angles analysis and ranked similarity results.

mmc4.xlsx (15.2KB, xlsx)
MMC 5

Data for Molprobity and Ramachandran plot analysis.

mmc5.pdf (1.1MB, pdf)
MMC 6

Data for Ramachandran statistics.

mmc6.xlsx (13.5KB, xlsx)

References

  1. Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., et al. Accurate structure prediction of biomolecular interactions with alphafold 3. Nature. 2024;1–3. doi: 10.1038/s41586-024-07487-w.
  2. Akdel M., Pires D.E., Pardo E.P., Jänes J., Zalevsky A.O., Mészáros B., et al. A structural biology community assessment of alphafold2 applications. Nat Struct Mol Biol. 2022;29:1056–1067. doi: 10.1038/s41594-022-00849-w.
  3. Alexander S.P., Cidlowski J.A., Kelly E., Mathie A., Peters J.A., Veale E.L., et al. The concise guide to pharmacology 2021/22: nuclear hormone receptors. Br J Pharmacol. 2021;178:S246–S263. doi: 10.1038/s41586-021-03828-1.
  4. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., et al. The protein data bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
  5. Burley S.K., Bhikadiya C., Bi C., Bittrich S., Chen L., Crichlow G.V., et al. Rcsb protein data bank: powerful new tools for exploring 3d structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 2021;49:D437–D451. doi: 10.1093/nar/gkaa1038.
  6. Carlberg C., Molnár F. Springer Nature; 2020. Mechanisms of gene regulation: how science works.
  7. Casadevall G., Duran C., Osuna S. Alphafold2 and deep learning for elucidating enzyme conformational flexibility and its application for design. JACS Au. 2023;3:1554–1562. doi: 10.1021/jacsau.3c00188.
  8. Chandra V., Huang P., Hamuro Y., Raghuram S., Wang Y., Burris T.P., et al. Structure of the intact ppar-γ–rxr-α nuclear receptor complex on dna. Nature. 2008;456:350–356. doi: 10.1038/nature07413.
  9. Chandra V., Huang P., Potluri N., Wu D., Kim Y., Rastinejad F. Multidomain integration in the structure of the hnf-4α nuclear receptor complex. Nature. 2013;495:394–398. doi: 10.1038/nature11966.
  10. Chandra V., Wu D., Li S., Potluri N., Kim Y., Rastinejad F. The quaternary architecture of rarβ–rxrα heterodimer facilitates domain–domain signal transmission. Nat Commun. 2017;8:1–9. doi: 10.1038/s41467-017-00981-y.
  11. Chen T. Nuclear receptor drug discovery. Curr Opin Chem Biol. 2008;12:418–426. doi: 10.1016/j.cbpa.2008.07.001.
  12. Chen V.B., Arendall W.B., Headd J.J., Keedy D.A., Immormino R.M., Kapral G.J., et al. Molprobity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr, D Biol Crystallogr. 2010;66:12–21. doi: 10.1107/S0907444909042073.
  13. Clark A.K., Wilder J.H., Grayson A.W., Johnson Q.R., Lindsay R.J., Nellas R.B., et al. The promiscuity of allosteric regulation of nuclear receptors by retinoid X receptor. J Phys Chem B. 2016;120:8338–8345. doi: 10.1021/acs.jpcb.6b02057.
  14. Consortium T.U. Uniprot: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023;51:D523–D531. doi: 10.1093/nar/gkac1052.
  15. Del Alamo D., Sala D., Mchaourab H.S., Meiler J. Sampling alternative conformational states of transporters and receptors with alphafold2. eLife. 2022;11. doi: 10.7554/eLife.75751.
  16. Dhe-Paganon S., Duda K., Iwamoto M., Chi Y.I., Shoelson S.E. Crystal structure of the hnf4α ligand binding domain in complex with endogenous fatty acid ligand. J Biol Chem. 2002;277:37973–37976. doi: 10.1074/jbc.C200420200.
  17. Fowler N.J., Williamson M.P. The accuracy of protein structures in solution determined by alphafold and nmr. Structure. 2022. doi: 10.1016/j.str.2022.04.005.
  18. Gut J.A., Lemmin T. Dissecting alphafold2's capabilities with limited sequence information. Bioinform Adv. 2025;5. doi: 10.1093/bioadv/vbae187.
  19. He X.h., You C.z., Jiang H.l., Jiang Y., Xu H.E., Cheng X. Alphafold2 versus experimental structures: evaluation on g protein-coupled receptors. Acta Pharmacol Sin. 2022:1–7. doi: 10.1038/s41401-022-00938-y.
  20. Hekkelman M.L., de Vries I., Joosten R.P., Perrakis A. AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods. 2023;20:205–213. doi: 10.1038/s41592-022-01685-y.
  21. Herzberg O., Moult J. More than just pattern recognition: prediction of uncommon protein structure features by AI methods. Proc Natl Acad Sci. 2023;120. doi: 10.1073/pnas.2221745120.
  22. Ho B.K., Gruswitz F. Hollow: generating accurate representations of channel and interior surfaces in molecular structures. BMC Struct Biol. 2008;8:49–54. doi: 10.1186/1472-6807-8-49.
  23. Hryc C.F., Baker M.L. Alphafold2 and cryoem: revisiting cryoem modeling in near-atomic resolution density maps. iScience. 2022;00. doi: 10.1016/j.isci.2022.104496.
  24. Hung L.H., Guerquin M., Samudrala R. Accelerated protein structure comparison using TM-score-GPU. BMC Res Notes. 2012;5:1–11. doi: 10.1093/bioinformatics/bts345.
  25. Idrees S., Pérez-Conesa S., Yang J., Zhang X., Cao R. Assessing fairness of AlphaFold2 prediction of protein 3D structures. J Chem Inf Model. 2022;62:4604–4617. doi: 10.1021/acs.jcim.2c00467.
  26. Institute E.B. How accurate are alphafold structure predictions? 2023. https://www.ebi.ac.uk/training/online/courses/alphafold/validation-and-impact/how-accurate-are-alphafold-structure-predictions/
  27. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., et al. Highly accurate protein structure prediction with alphafold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
  28. Kojetin D.J., Burris T.P. Small molecule modulation of nuclear receptor conformational dynamics: implications for function and drug discovery. Mol Pharmacol. 2013;83:1–8. doi: 10.1124/mol.112.079285.
  29. Kojetin D.J., Matta-Camacho E., Hughes T.S., Srinivasan S., Nwachukwu J.C., Cavett V., et al. Structural mechanism for signal transduction in RXR nuclear receptor heterodimers. Nat Commun. 2015;6:8013. doi: 10.1038/ncomms9013.
  30. Liu X., Wang Y., Ortlund E.A. First high-resolution crystal structures of the glucocorticoid receptor ligand-binding domain–peroxisome proliferator-activated γ coactivator 1-α complex with endogenous and synthetic glucocorticoids. Mol Pharmacol. 2019;96:408–417. doi: 10.1124/mol.119.116806.
  31. Lou X., Toresson G., Benod C., Suh J.H., Philips K.J., Webb P., et al. Structure of the retinoid x receptor α–liver x receptor β (rxrα–lxrβ) heterodimer on dna. Nat Struct Mol Biol. 2014;21:277–281. doi: 10.1038/nsmb.2778.
  32. Martínez C., Souto J.A., de Lera A.R. Ligand design for modulation of rxr functions. Retin Rexin Signal: Methods Protoc. 2019:51–72. doi: 10.1007/978-1-4939-9585-1_4.
  33. Nguyen Ba A.N., Yeh B.J., Van Dyk D., Davidson A.R., Andrews B.J., Weiss E.L., et al. Proteome-wide discovery of evolutionary conserved sequences in disordered regions. Sci Signal. 2012;5:rs1. doi: 10.1126/scisignal.2002515.
  34. Pei Z., Li J., Li M., Wang X., Jin H. Evaluating the structure of antithrombin variants: insights from alphafold2 and molecular dynamics simulations. Sci Rep. 2024;14. doi: 10.1038/s41598-024-60220-7.
  35. Postel S., Wissler L., Johansson C.A., Gunnarsson A., Gordon E., Collins B., et al. Quaternary glucocorticoid receptor structure highlights allosteric interdomain communication. Nat Struct Mol Biol. 2023;30:286–295. doi: 10.1038/s41594-022-00914-4.
  36. Putcha B.D.K., Fernandez E.J. Direct interdomain interactions can mediate allosterism in the thyroid receptor. J Biol Chem. 2009;284:22517–22524. doi: 10.1074/jbc.M109.026682.
  37. Raisinghani N., Alshahrani M., Gupta G., Tian H., Xiao S., Tao P., et al. Integration of a randomized sequence scanning approach in AlphaFold2 and local frustration profiling of conformational states enable interpretable atomistic characterization of conformational ensembles and detection of hidden allosteric states in the ABL1 protein kinase. J Chem Theory Comput. 2024;20:5317–5336. doi: 10.1021/acs.jctc.4c00222.
  38. Rastinejad F., Huang P., Chandra V., Khorasanizadeh S. Understanding nuclear receptor form and function using structural biology. J Mol Endocrinol. 2013;51:T1–T21. doi: 10.1530/JME-13-0173.
  39. Saldaño T., Escobedo N., Marchetti J., Zea D.J., Mac Donagh J., Velez Rueda A.J., et al. Impact of protein conformational diversity on alphafold predictions. Bioinformatics. 2022;38:2742–2748. doi: 10.1093/bioinformatics/btac202.
  40. Sanner M.F., Olson A., Spehner J. Reduced surface: an efficient way to compute molecular surfaces. Biopolymers. 1996;38:305–320. doi: 10.1002/(SICI)1097-0282(199603)38:3<305::AID-BIP4>3.0.CO;2-Y.
  41. Santos R., Ursu O., Gaulton A., Bento A.P., Donadi R.S., Bologa C.G., et al. A comprehensive map of molecular drug targets. Nat Rev Drug Discov. 2017;16:19–34. doi: 10.1038/nrd.2016.230.
  42. Scheschowitsch K., Leite J.A., Assreuy J. New insights in glucocorticoid receptor signaling—more than just a ligand-binding receptor. Front Endocrinol. 2017;8:16. doi: 10.3389/fendo.2017.00016.
  43. Schrödinger L., DeLano W. Pymol. http://www.pymol.org/pymol
  44. Shao C., Bittrich S., Wang S., Burley S.K. Assessing pdb macromolecular crystal structure confidence at the individual amino acid residue level. Structure. 2022. doi: 10.1016/j.str.2022.08.004.
  45. Stevens A.O., He Y. Benchmarking the accuracy of AlphaFold 2 in loop structure prediction. Biomolecules. 2022;12:985. doi: 10.3390/biom12070985.
  46. Team C.D. Chai-1: decoding the molecular interactions of life. bioRxiv. 2024. https://doi.org/10.1101/2024.10.10.615955
  47. Tice C.M., Noto P.B., Fan K.Y., Zhuang L., Lala D.S., Singh S.B. The medicinal chemistry of liver x receptor (lxr) modulators. J Med Chem. 2014;57:7182–7205. doi: 10.1021/jm500442z.
  48. Tikhonov D.B., Zhorov B.S. P-loop channels: experimental structures, and physics-based and neural networks-based models. Membranes. 2022;12:229. doi: 10.3390/membranes12020229.
  49. Tuckermann J., Bourguet W., Mandrup S. Meeting report: nuclear receptors: transcription factors and drug targets connecting basic research with translational medicine. Mol Endocrinol. 2010;24:1311–1321. doi: 10.1210/me.2010-0083.
  50. Tunyasuvunakool K., Adler J., Wu Z., Green T., Zielinski M., Žídek A., et al. Highly accurate protein structure prediction for the human proteome. Nature. 2021;596:590–596. doi: 10.1038/s41586-021-03828-1.
  51. Varadi M., Anyango S., Deshpande M., Nair S., Natassia C., Yordanova G., et al. Alphafold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022;50:D439–D444. doi: 10.1093/nar/gkab1061.
  52. Venäläinen T., Molnár F., Oostenbrink C., Carlberg C., Peräkylä M. Molecular mechanism of allosteric communication in the human pparα-rxrα heterodimer. Proteins, Struct Funct Bioinform. 2010;78:873–887. doi: 10.1002/prot.22613.
  53. Wayment-Steele H.K., Ojoawo A., Otten R., Apitz J.M., Pitsawong W., Hömberger M., et al. Predicting multiple conformations via sequence clustering and alphafold2. Nature. 2024;625:832–839. doi: 10.1038/s41586-023-06832-9.
  54. Weikum E.R., Knuesel M.T., Ortlund E.A., Yamamoto K.R. Glucocorticoid receptor control of transcription: precision and plasticity via allostery. Nat Rev Mol Cell Biol. 2017;18:159–174. doi: 10.1038/nrm.2016.152.
  55. Windshügel B. Structural insights into ligand-binding pocket formation in nurr1 by molecular dynamics simulations. J Biomol Struct Dyn. 2019;37:4651–4657. doi: 10.1080/07391102.2018.1559099.
  56. Xu J., Zhang Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics. 2010;26:889–895. doi: 10.1093/bioinformatics/btq066.
  57. Zhang Y., Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins, Struct Funct Bioinform. 2004;57:702–710. doi: 10.1002/prot.20264.
  58. Zhao M., Wang N., Guo Y., Li J., Yin Y., Dong Y., et al. Integrative analysis reveals structural basis for transcription activation of nurr1 and nurr1-rxrα heterodimer. Proc Natl Acad Sci. 2022;119. doi: 10.1073/pnas.2206737119.
  59. Zhorov B.S., Dong K. Pyrethroids in an alphafold2 model of the insect sodium channel. Insects. 2022;13. doi: 10.3390/insects13080745. https://www.mdpi.com/2075-4450/13/8/745
  60. Zhou Y., Ye L., Xu T., Li H., Wang H. Modafold: a strategy for modeling missense mutant protein structures using alphafold2 and molecular dynamics simulations. Front Mol Biosci. 2023;10. doi: 10.3389/fmolb.2023.1083575.

Supplementary Materials

MMC 1

Data for RMSD, SSE, and domain architecture analysis.

mmc1.docx (36.4KB, docx)
MMC 2

Data for LBP geometry analysis.

mmc2.xlsx (60.5KB, xlsx)
MMC 3

Data for RSCC, B-factor, and pLDDT analysis.

mmc3.xlsx (33.1KB, xlsx)
MMC 4

Data for torsion angle analysis and ranked similarity results.

mmc4.xlsx (15.2KB, xlsx)
MMC 5

Data for MolProbity and Ramachandran plot analysis.

mmc5.pdf (1.1MB, pdf)
MMC 6

Data for Ramachandran statistics.

mmc6.xlsx (13.5KB, xlsx)

Articles from Computational and Structural Biotechnology Journal are provided here courtesy of Research Network of Computational and Structural Biotechnology
