Abstract
Cryogenic electron microscopy (cryo-EM) has now been widely used for determining multi-chain protein complexes. However, modeling a large complex structure, such as those with more than ten chains, is challenging particularly when the map resolution decreases. Here, we present DiffModeler, a fully automated method for modeling large protein complex structures. DiffModeler employs a diffusion model for backbone tracing and integrates AlphaFold2-predicted single-chain structures for structure fitting. DiffModeler showed an average TM-Score of 0.88 and 0.91 for two datasets of cryo-EM maps of 0–5 Å resolution, and 0.92 for intermediate resolution maps (5–10 Å), substantially outperforming existing methodologies. Further benchmarking at low resolutions (10–20 Å) confirms its versatility, demonstrating plausible performance.
Introduction
Proteins are fundamental molecules that carry out numerous functions in living organisms, including enzyme catalysis, cell signaling, and transport of molecules. Cryogenic electron microscopy (cryo-EM) has gained significant popularity among experimental protein structure determination techniques1–3. This technique is increasingly favored due to several advantages, notably its superior capacity to determine the three-dimensional (3D) structures of large macromolecular complexes.
While reported map resolutions in the literature have generally shown steady improvement over recent years, it remains common to encounter intermediate resolutions (~5–10 Å) in real-life lab scenarios, posing challenges for structure modeling. When the map resolution is better than 5 Å, direct tracing of the main chain of proteins4–8 and nucleic acids9 has now become feasible due to recent modeling methods that leverage deep learning to detect atom positions within the map. However, for maps within the intermediate resolution range (5–10 Å), de novo modeling is generally not viable because the identification of amino acid residues and atoms remains elusive even with deep learning techniques. Hence, a practical approach involves conducting structure fitting using methods such as Phenix10, Flex-EM11, Assembline12, MultiFit13, Chimera14, MarkovFit15, and VESPER16 or employing manual fitting with known structures from PDB17 or predicted structure models18. Secondary structure detection methods within EM maps19–21 can aid in protein structure fitting. Although many structures are determined through structure fitting, accurately orienting molecules within a map in this resolution range remains challenging, especially for complexes comprising multiple subunits. The successful development of an automatic and precise structure fitting method for EM maps at intermediate resolutions would significantly support structural biologists.
Here, we developed DiffModeler, a fully automated structure fitting method for modeling large protein complex structures in cryo-EM maps with resolutions ranging from 5 to 10 Å. DiffModeler uses a diffusion model22–24 to enhance the map, which aids in finding precise fitting poses for these structures. The diffusion model is a parameterized Markov chain trained using variational inference to generate samples that match the underlying data after a finite time frame. Notably, the diffusion model has demonstrated considerable success in various areas of image processing, such as image generation23–26, segmentation27,28, and translation29,30, and also in bioinformatics, including protein docking31 and protein design32,33. Building upon these successful applications, DiffModeler integrates the diffusion model to enhance the extraction of structural information, facilitating accurate structure modeling for cryo-EM maps at intermediate resolutions.
To the best of our knowledge, this is the first fully automated and accurate method for modeling protein complex structures in maps at intermediate resolutions. DiffModeler initiates the process by tracing protein backbones within a cryo-EM map, employing a diffusion model designed to capture the distinctive local density patterns representing protein backbones. Simultaneously, we use AlphaFold2 (AF2)34, the cutting-edge protein structure prediction method, to generate high-quality single-chain structures. Subsequently, the structure models from AF2 are fitted into the traced backbone map, producing many candidate poses through the VESPER16 structure fitting program. Ultimately, the complete protein complex structure is assembled by combining candidate poses of constituent subunits.
A benchmark conducted on EM maps ranging from 5.0 Å to 10.0 Å resolution demonstrated that modeling with DiffModeler substantially outperformed conventional methods10,16. Extending our evaluation, we further benchmarked DiffModeler on four experimental maps at a low resolution of 10 to 20 Å, where DiffModeler modelled the structure with a TM-Score of 0.51 to 0.97. Additionally, we integrated DiffModeler with CryoREAD, our DNA/RNA structure modeling method9, to build protein-nucleic acid complex structures in two datasets comprising 61 and 28 maps at a resolution of up to 5 Å. This combined protocol showcased a state-of-the-art performance, delivering an average TM-score35 of 0.88 and 0.91, respectively.
Results
Overall Framework of DiffModeler
We begin by explaining the DiffModeler algorithm depicted in Fig. 1. DiffModeler comprises four major steps. First, it detects the protein backbone positions in the input cryo-EM map by enhancing the map using a trained diffusion model. Second, it models individual protein structures using AF2. Third, the structure models are fitted to the enhanced map using VESPER. Lastly, it selects and combines fitted single-chain poses to build the complete protein complex structure within the map. Below, we provide more information on each step.
Fig. 1. Overall framework of DiffModeler.

a. Workflow of DiffModeler. DiffModeler consists of four main steps. 1) Backbone tracing from cryo-EM maps at intermediate resolution via a diffusion model. 2) Single-chain structure prediction by AF2. 3) Single-chain structure fitting using VESPER. 4) Protein complex modeling by assembling algorithms. b. Overview of Diffusion Process. Starting from an original cryo-EM map as a condition and random Gaussian noise, the protein backbone is traced by the iterative reverse diffusion process utilizing a pre-trained diffusion model. Shown on the right is the ground-truth protein backbone density, which is the target of the reverse diffusion process. The examples used here are EMD-0213 (Resolution: 6.35 Å) and EMD-1042 (Resolution: 10.3 Å).
Backbone Tracing via Diffusion Model
Achieving accurate structure fitting for maps of an intermediate resolution is nontrivial. To aim for higher accuracy, the main innovation of DiffModeler is to use a diffusion model to accentuate the density that belongs to the protein backbone. The input map is scanned with a 64³ Å³ box with a stride of 32 Å. Given a box of cryo-EM density, the encoder of the conditional diffusion model computes an embedding of the input density box. Subsequently, the decoder starts with random Gaussian noise as the initial density distribution and iteratively refines its estimate to bring it closer to the ground-truth traced backbone, conditioned on the embedding from the encoder and the initial density input. The diffusion process is illustrated in Fig. 1b. This traced backbone provides clearer information for structure fitting compared to the original map. The diffusion model is trained via the denoising diffusion implicit model (DDIM) framework, with the main objective of performing conditional denoising of a noisy density of the traced backbone to recover the ground-truth traced protein backbone density in the map. The overall framework is optimized via the Dice loss36, which considers the agreement of the identified and ground-truth backbone positions. The training and inference framework is presented in Extended Data 1-4, respectively, and further details can be found in Methods.
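As an illustration of the scanning step only, the sketch below enumerates the origins of overlapping cubic boxes that tile a 3D map. It is not the DiffModeler implementation: the function names `axis_starts` and `box_origins` are hypothetical, a grid spacing of 1 Å is assumed so that a 64 Å box spans 64 voxels, and clamping the last window to the map edge is an assumed way of handling axes whose length is not a multiple of the stride.

```python
from typing import List, Tuple


def axis_starts(length: int, box: int = 64, stride: int = 32) -> List[int]:
    """Start indices of windows of size `box` along one axis.

    Windows advance by `stride`. If the last regular window does not reach
    the end of the axis, one extra window is clamped to the edge
    (an assumption; zero-padding the map would be another option).
    """
    if length <= box:
        return [0]
    starts = list(range(0, length - box + 1, stride))
    if starts[-1] != length - box:
        starts.append(length - box)
    return starts


def box_origins(shape: Tuple[int, int, int],
                box: int = 64, stride: int = 32) -> List[Tuple[int, int, int]]:
    """All (x, y, z) origins of overlapping cubic boxes covering the map."""
    xs, ys, zs = (axis_starts(n, box, stride) for n in shape)
    return [(x, y, z) for x in xs for y in ys for z in zs]


if __name__ == "__main__":
    # A hypothetical 128 x 100 x 64 voxel map.
    for origin in box_origins((128, 100, 64)):
        print(origin)
```

With a stride of half the box size, every interior voxel is covered by more than one box, so predictions from overlapping boxes would need to be merged; how that merging is done is not specified in this section.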
Structure Prediction by AF2
In DiffModeler, we use single-chain protein structures predicted by AF234 to fit into the diffused map. While there are instances where AF2 models do not align with the proteins’ conformations in particular cryo-EM maps8, ample cases exist37–39 where AF2 models demonstrated sufficient accuracy to be effectively integrated into EM maps. Specifically, for maps at a resolution worse than 5 Å, where de novo main-chain tracing becomes highly challenging, it would be pragmatic to consider AF2 models for structure modeling. Instead of generating new AF2 models, users can also use precomputed models available in the AlphaFold database18, which we employed in this work.
Structure Model Fitting with VESPER
The predicted structure models are fit to the diffused map using VESPER16, a structure and map fitting method developed in our group. By taking into account the local density gradient within maps, VESPER has demonstrated superior performance, surpassing existing methods16. The predicted structure models are converted into simulated maps at a 1 Å resolution. Subsequently, both the simulated maps derived from the models and the diffused backbone map are transformed into local dense points (LDPs) using the mean-shift algorithm4,40. LDPs encapsulate the local salient features of density and are more precise for alignment than the unprocessed maps. Using VESPER, each subunit is aligned with the diffused map and the top 100 candidate poses are kept for the subsequent assembly phase.
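To make the idea of local dense points concrete, the following is a minimal mean-shift sketch, written under stated assumptions rather than taken from VESPER. Each grid point is moved repeatedly to the density-weighted Gaussian mean of its neighborhood, and converged points that land close together are merged into one LDP. The function names, the Gaussian kernel form, and the `bandwidth`, `tol`, and `merge_dist` parameters are all illustrative choices, and all density values are assumed to be positive.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def shift_once(p: Point, points: Sequence[Point],
               densities: Sequence[float], bandwidth: float) -> Point:
    """Move p to the density-weighted Gaussian mean of all grid points."""
    total = 0.0
    acc = [0.0, 0.0, 0.0]
    for q, rho in zip(points, densities):
        w = rho * math.exp(-math.dist(p, q) ** 2 / (2.0 * bandwidth ** 2))
        total += w
        for i in range(3):
            acc[i] += w * q[i]
    return (acc[0] / total, acc[1] / total, acc[2] / total)


def local_dense_points(points: Sequence[Point], densities: Sequence[float],
                       bandwidth: float = 2.0, iters: int = 50,
                       tol: float = 1e-3, merge_dist: float = 1.0) -> List[Point]:
    """Run mean shift from every grid point and merge nearby converged points."""
    ldps: List[Point] = []
    for p in points:
        cur = p
        for _ in range(iters):
            nxt = shift_once(cur, points, densities, bandwidth)
            moved = math.dist(cur, nxt)
            cur = nxt
            if moved < tol:
                break
        # Keep the converged point only if no existing LDP is within merge_dist.
        if all(math.dist(cur, q) > merge_dist for q in ldps):
            ldps.append(cur)
    return ldps


if __name__ == "__main__":
    grid = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (-0.5, 0.0, 0.0),
            (10.0, 0.0, 0.0), (10.5, 0.0, 0.0), (9.5, 0.0, 0.0)]
    print(local_dense_points(grid, [1.0] * len(grid), bandwidth=1.0))
```

This brute-force version is quadratic in the number of grid points; a practical implementation would restrict each update to a spatial neighborhood.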
Protein complex modeling by a greedy assembling algorithm
This phase is geared towards assembling the complete protein complex structure through an assembly of suitable poses from each subunit. To accomplish this, we have devised a greedy algorithm, which is explained in detail in the Methods section and visually outlined in Extended Data 5. In the preceding step, a collection of 100 poses has been constructed for each subunit, with each pose being evaluated based on a fitness score. From all combinations of subunit-pose pairs, we identify the subunit-pose with the highest score. Subsequently, we mask the local density of the map occupied by this selected subunit-pose pair and select the next best subunit-pose in the pool. This process iterates, systematically selecting the subsequent best subunit-pose pairs until all subunits seamlessly integrate into the diffused protein backbone map.
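The greedy selection described above can be sketched as follows. This is a simplified illustration and not the algorithm given in Methods: the `Pose` type, the representation of a pose as a set of occupied voxels, and the `max_overlap` threshold are hypothetical. In particular, the sketch keeps the original pose scores and simply rejects poses that overlap already-masked voxels, whereas the actual method may rescore poses after masking.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple

Voxel = Tuple[int, int, int]


class Pose(NamedTuple):
    subunit: str               # chain identifier
    score: float               # fitting score of this pose (higher is better)
    voxels: FrozenSet[Voxel]   # map voxels occupied by the subunit in this pose


def greedy_assemble(poses: List[Pose], max_overlap: float = 0.2) -> Dict[str, Pose]:
    """Pick one pose per subunit, best score first, avoiding masked regions."""
    placed: Dict[str, Pose] = {}
    masked: Set[Voxel] = set()
    for pose in sorted(poses, key=lambda p: p.score, reverse=True):
        if pose.subunit in placed:
            continue  # this subunit already has a pose
        overlap = len(pose.voxels & masked) / max(len(pose.voxels), 1)
        if overlap > max_overlap:
            continue  # region already claimed by a better-scoring pose
        placed[pose.subunit] = pose
        masked |= pose.voxels  # mask the density occupied by the chosen pose
    return placed


if __name__ == "__main__":
    candidates = [
        Pose("A", 0.9, frozenset({(0, 0, 0), (1, 0, 0)})),
        Pose("B", 0.8, frozenset({(0, 0, 0), (1, 0, 0)})),  # clashes with A
        Pose("B", 0.6, frozenset({(5, 0, 0), (6, 0, 0)})),
    ]
    for name, pose in greedy_assemble(candidates).items():
        print(name, pose.score)
```

In the usage example, subunit B's best-scoring pose occupies the same voxels as the pose chosen for A, so the sketch falls back to B's next candidate.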
Fitting Quality Estimation in DiffModeler
The quality of the structure fit is quantified by subunit_fitscore (Eq. 9 in Methods), used to rank fitting poses of chains. This score ranges from 0 to 1, with higher scores indicating better fitting quality. As illustrated in Extended Data 6 (EMD-21136, PDB: 6vac), well-fitted structures typically have a score higher than 0.5, whereas chains with scores below 0.5 (depicted in red) indicate potentially incorrect fit.
Structure Modeling Performance at Intermediate Resolution
We assessed DiffModeler’s performance on an independent dataset comprising 71 maps determined at resolutions between 5.0 Å and 10.0 Å. These maps included 19 cases where all proteins in the map were predicted by AF2 with a TM-score no worse than 0.5, whereas the remaining 52 maps include inaccurate AF2 models. These structures are nonredundant in comparison to the training and validation datasets we used (see Methods). Sup Table S1 provides a comprehensive list of the maps included in this dataset. The number of protein residues in the 19 maps varied from 1,202 to 13,462, while the number of protein chains ranged between 3 and 47. Notably, 12 out of 19 maps include protein complexes with more than 3,000 residues in total, which is larger than the size that the state-of-the-art protein docking method, AlphaFold-Multimer41, was trained on. The number of chains in the other 52 maps ranged from 1 to 62, with the number of residues ranging from 814 to 18,080.
Fig. 2 summarizes the modeling accuracy on the dataset from various perspectives. In Fig. 2a, we assessed the accuracy of the diffused backbone map generated by the diffusion model in the initial step of DiffModeler (as depicted in the traced backbone panel in Fig. 1). We computed recall and precision of the grid points within the diffused map with reference to the backbone heavy atoms (Cα, C, N) excluding oxygen of proteins in a map (details in Methods). As a diffused map outlines backbone atom positions within an input EM map, the volume of a map was, in principle, reduced, on average, by 53.7% for the 19 maps (which did not have inaccurate AF2 models). This modification of maps notably elevated the average precision to 85.1% from 68.8% without significantly compromising the recall, which remained stable at an average of 93.1% from 96.6% (the original maps). Detailed results of individual maps are available in Sup Table S2. Precision and recall for the 52 maps with inaccurate AF2 models (shown as orange crosses) were at similar levels, 80.6% and 93.5%, respectively.
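The recall and precision reported here can be illustrated with a small sketch. The exact definition is given in Methods and is not reproduced in this section, so the version below is an assumption: a grid point counts as correct if it lies within a distance `cutoff` of any backbone atom, and a backbone atom counts as recovered if any grid point lies within `cutoff` of it. The function name and the 2.0 Å default are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def backbone_precision_recall(grid_points: Sequence[Point],
                              backbone_atoms: Sequence[Point],
                              cutoff: float = 2.0) -> Tuple[float, float]:
    """Return (precision, recall) of map grid points against backbone atoms."""
    if not grid_points or not backbone_atoms:
        return 0.0, 0.0

    def near(p: Point, others: Sequence[Point]) -> bool:
        return any(math.dist(p, q) <= cutoff for q in others)

    # Precision: fraction of retained grid points that sit on the backbone.
    precision = sum(near(g, backbone_atoms) for g in grid_points) / len(grid_points)
    # Recall: fraction of backbone atoms covered by at least one grid point.
    recall = sum(near(a, grid_points) for a in backbone_atoms) / len(backbone_atoms)
    return precision, recall


if __name__ == "__main__":
    grid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    atoms = [(0.5, 0.0, 0.0), (30.0, 0.0, 0.0)]
    print(backbone_precision_recall(grid, atoms))
```

Under this definition, removing non-backbone density from a map lowers the number of false-positive grid points and so raises precision, while recall drops only if true backbone density is removed as well.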
Fig. 2. Performance of protein complex structure modeling by DiffModeler.
a. Backbone recall and precision of the diffusion model. Recall and precision were computed by considering grid points in the diffused maps relative to ground-truth positions of main-chain atoms in the maps. Details in Sup Table S2. 19 maps with no inaccurate AF2 models (TM-Score < 0.5) are shown with blue circles; while 52 maps that include inaccurate AF2 models are shown with orange crosses. b. TM-Score of modeled protein complex structure relative to backbone recall. The symbols are the same as panel a. c. TM-Score comparison between DiffModeler and three existing methods, the raw VESPER, Phenix, and EMBuild. d. TM-Score relative to the map resolution of the different methods. The lines in the panels represent the regression line and the shaded region represents the confidence interval for the regression estimate. e. TM-Score relative to the overall protein complex size represented by the number of residues for the different methods. f. Sequence identity relative to the structure size for the different methods. g. RMSD relative to the structure size for the different methods. h. FSC resolution estimation between diffused maps and original maps relative to the native structure in PDB. The resolution was estimated at FSC = 0.143. i. FSC resolution estimation comparison between diffused maps and original maps relative to the modeled structure by DiffModeler. Raw data of the plots are available in Sup Table S3. j. Modeling results by DiffModeler using the native single chain structures as compared with the results using AF2 models. Here 71 maps were used. For statistical information of regression lines in Fig. 2d-2g, please see Methods.
Fig. 2b illustrates our exploration into the impact of backbone recall within diffused maps on the subsequent accuracy of structure fitting. To assess the accuracy of modeled protein complexes, we employed MM-align42 to superimpose a modeled complex structure onto the native structure (the PDB entry associated with the map) and calculated the TM-Score35 (details in Methods). The TM-Score is a dimensionless metric used to gauge structural resemblance between two protein structures, with a value of 1 denoting identical protein pairs and values exceeding 0.5 indicative of meaningful similarity. When the 19 maps with no incorrect (TM-Score < 0.5) AF2 models were considered (blue circles in the plot), on average, DiffModeler achieved a high TM-Score of 0.922. There were two instances where the TM-Score fell below 0.8. In a particular case (EMD-1871), despite a high backbone recall of 0.98 (close to 1.0), the TM-Score remained 0.781 (close to 0.8), due to two wrong single-chain structure fittings caused by the low backbone precision of 0.64.
We further evaluated DiffModeler on all 71 test maps, which included 52 maps with incorrect AF2 models (TM-Score < 0.5) (crosses in the plot). The average TM-Score of modeled structures decreased from 0.922 to 0.737. However, 59 out of the 71 maps (83.1%) still maintained TM-Scores higher than 0.5, indicating that the modeled structures shared similar folds with the native structures. Among the 52 maps, there were 3 instances where the overall TM-Score fell below 0.2. In all these cases, multiple AF2 models in a map were incorrectly built. For example, EMD-9036, the map with the smallest TM-Score of 0.056, includes only 1 chain in the map that had a TM-Score of 0.089.
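For readers unfamiliar with the metric, the sketch below evaluates the standard TM-score formula for one fixed superposition, given the distances between aligned residue pairs. It is illustrative only: MM-align additionally searches over superpositions and chain assignments, which is not shown, and the function names and the lower bound of 0.5 Å on the distance scale for very short targets are assumptions of this sketch.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 used by the TM-score."""
    if target_length <= 15:
        return 0.5  # assumed floor for very short targets
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for a fixed superposition.

    `distances` holds the distance (in Angstrom) of each aligned residue pair;
    unaligned target residues contribute nothing, which is why the sum is
    normalized by the full target length.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Hypothetical 100-residue target with 80 aligned residues, all 2 A away.
    print(tm_score([2.0] * 80, 100))
```

Because each term is at most 1 and the sum is divided by the target length, a model that places only part of the complex correctly is bounded by the fraction of residues it covers.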
In Fig. 2c, we compared the TM-Score of models constructed by DiffModeler with three other existing methods, the dock_in_map program in Phenix10, EMBuild43, and raw VESPER16. For the latter, the original EM maps were used instead of the diffused maps for structure fitting. EMBuild is a recent method for fitting AF2 models within a cryo-EM map, which combines structure fitting, domain-based refinement, and graph-based iterative assembly. DiffModeler exhibited a high average TM-Score of 0.922. In contrast, VESPER (raw), Phenix, and EMBuild showcased a broad spectrum of model accuracy, averaging approximately half of DiffModeler’s performance, with TM-Scores of 0.407, 0.409, and 0.841, respectively. The notable contrast between DiffModeler and VESPER (raw) vividly highlights the substantial positive impact of utilizing diffused maps.
Figs. 2d to 2g explore the relationship between model accuracy and both map resolution (Fig. 2d) and the size of the complexes (Fig. 2e, 2f, and 2g). While the performance of the other methods noticeably declined as the resolution worsened and the structure size grew, DiffModeler consistently maintained stable performance and notably outperformed them in challenging scenarios involving lower resolutions or larger sizes. Fig. 2f compares the sequence identity of the different methods relative to the structure size, where the sequence identity is the fraction of residues in the reference structure that were successfully modeled with the correct residue type. DiffModeler demonstrated a stable sequence identity, while that of all other methods decreased dramatically when the structure size was large. On average, DiffModeler, VESPER (raw), Phenix, and EMBuild yielded sequence identities of 0.89, 0.31, 0.29, and 0.74, respectively. Fig. 2g investigates model accuracy with respect to complex size using a different metric, the root-mean-square deviation (RMSD) of the aligned residues in the model (see Methods for details). On average, DiffModeler, VESPER (raw), Phenix, and EMBuild yielded RMSD values of 3.89 Å, 10.09 Å, 10.48 Å, and 4.08 Å, respectively (details in Sup Table S3). EMBuild showed lower sequence identity (Fig. 2f) in general than DiffModeler but their RMSD values are comparable (Fig. 2g). This indicates that EMBuild places chains in structurally similar regions, yielding a small RMSD, but these are not the correct native positions, which results in lower sequence identity.
In Fig. 2h and 2i, we examined map-model agreement, calculating the Fourier Shell Correlation (FSC) with phenix.validation_cryoem44. Fig. 2h examines the original experimental maps and the diffused maps with the native structure, while Fig. 2i compares the original and diffused maps with the structures modelled by DiffModeler. For both cases, a clear improvement was observed when comparing the diffused map with the structures.
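The two metrics in this paragraph can be sketched as below, assuming the model has already been superimposed on the reference and residue correspondences are known. The function names and input layout are hypothetical, and the criterion for deciding which model residue corresponds to which reference residue (described in Methods) is not reproduced here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(model_coords: Sequence[Point], ref_coords: Sequence[Point]) -> float:
    """Root-mean-square deviation over paired (already superimposed) atoms."""
    if len(model_coords) != len(ref_coords) or not model_coords:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(model_coords, ref_coords))
    return math.sqrt(sq / len(model_coords))


def sequence_identity(aligned_pairs: Sequence[Tuple[str, str]],
                      ref_length: int) -> float:
    """Fraction of reference residues modeled with the correct residue type.

    `aligned_pairs` lists (model residue type, reference residue type) for
    each reference residue that has a matched model residue. Normalizing by
    the full reference length penalizes residues that were not modeled.
    """
    matches = sum(1 for model_aa, ref_aa in aligned_pairs if model_aa == ref_aa)
    return matches / ref_length


if __name__ == "__main__":
    print(rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
               [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]))
    print(sequence_identity([("A", "A"), ("G", "L"), ("K", "K")], 4))
```

The sketch also shows why the two metrics can disagree: a chain placed on a structurally similar but wrong subunit position can have small distances to nearby reference residues while matching few residue types.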
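For reference, the quantity behind these panels is conventionally defined as the normalized cross-correlation between the Fourier transforms of two maps within a shell of spatial frequency. The symbols below are introduced here for illustration and do not appear in the original text: F_1 and F_2 denote the Fourier transforms of the two maps being compared (for a map-model comparison, one of them is a map simulated from the atomic model), and the sums run over all Fourier voxels k_i in the shell at frequency k.

```latex
\mathrm{FSC}(k) =
\frac{\displaystyle\sum_{\mathbf{k}_i \in \mathrm{shell}(k)}
        F_1(\mathbf{k}_i)\, F_2^{*}(\mathbf{k}_i)}
     {\sqrt{\displaystyle\sum_{\mathbf{k}_i \in \mathrm{shell}(k)}
        \lvert F_1(\mathbf{k}_i)\rvert^{2}
        \;\sum_{\mathbf{k}_i \in \mathrm{shell}(k)}
        \lvert F_2(\mathbf{k}_i)\rvert^{2}}}
```

The FSC-based resolution estimate is the inverse of the spatial frequency at which this curve first drops below the chosen threshold.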
In the last panel, Fig. 2j, we show modeling results by DiffModeler using the native single chain structures in comparison with results using the AF2 models because the inaccuracy of AF2 models is a main reason for low accuracy of DiffModeler models. As shown in the plot, most of the 71 testing maps have a high TM-score by using the native chain structures. The average TM-score improved from 0.737 to 0.917.
Examples of Protein Complex Structure Models
In this section we discuss five examples of models constructed by DiffModeler. In Fig. 3, for each example map, five panels are shown: the original experimental map, the diffused backbone map, LDPs of the traced backbone, the structure model, and a structure comparison between the constructed model and the PDB entry. The first example (Fig. 3a) is the state 2 of Mus musculus TRPML1 (EMD-6824, resolution: 7.4 Å), which encompasses four protein chains totaling 1,696 residues45. The resolution of this map was reported as 7.4 Å in the paper45, but it may be even worse because Resmap46, a map resolution estimation program, reported 9.4 Å when we ran it. Modeling the interaction between the transmembrane domain and the peripheral domain was particularly difficult for this map, resulting in low TM-Scores of 0.30 and 0.47 using Phenix and VESPER (raw), respectively. In contrast, DiffModeler traced the backbone well with the diffusion model, achieving a TM-Score of 0.95 and an align ratio of 1.0 for this challenging map.
Fig. 3. Examples of structure models constructed by DiffModeler from the test dataset.

Detailed evaluation results are available in Sup Table S3. For each example, five columns are shown: the cryo-EM map with the structure of the protein complex with different colors indicating different chains; the diffused backbone map by the diffusion model; the local dense points of the backbone map; the structure model by DiffModeler; the superposition of the model by DiffModeler (blue) with the native structure (red). a. State 2 of Mus musculus TRPML1 (EMD-6824, PDB: 5YE1, Resolution: 7.4 Å (see the text); protein size: 4 chains and 1,696 amino acids (aa)). TM-Score: 0.95; RMSD: 3.26 Å. b. Closed conformation of Cx26 Gap junction channels at acidic pH (EMD-20916, PDB: 6UVT, Res.: 7.50 Å; protein size: 12 chains and 2,112 aa). TM-Score: 0.88; RMSD: 5.04 Å. c. The human PLC editing module (EMD-3906, PDB: 6ENY, Res.: 5.80 Å; protein size: 5 chains and 1,569 aa). TM-Score: 0.95; RMSD: 3.08 Å. d. State 2 of ATPase cycle in PAN-proteasomes (EMD-0213, PDB: 6HE9, Res.: 6.35 Å; protein size: 34 chains and 8,531 aa). TM-Score: 0.97; RMSD: 3.79 Å. e. Minor state of T. thermophilus enzyme in complex with NADH (EMD-11237, PDB: 6ZJN, Res.: 6.10 Å; protein size: 15 chains and 4,655 aa). TM-Score: 0.98; RMSD: 2.40 Å.
The next example (Fig. 3b) is the closed conformation of Cx26 Gap junction channels (GJCs) at acidic pH (EMD-20916)47. This complex is difficult to model because it has 12 chains in a map of a relatively low resolution, 7.5 Å. DiffModeler was able to precisely identify helices in the map and correctly fit the 12 chains with a TM-Score of 0.88. In contrast, VESPER (raw) struggled to find correct poses of the chains, resulting in a TM-Score of 0.24.
Fig. 3c is the model for the human peptide-loading complex (PLC) editing module (EMD-3906, resolution: 5.8 Å)48. Modeling the full protein complex is difficult due to the substantial flexibility exhibited by calreticulin (the chain in purple) and the sparseness of the chain assembly. Fitting the structures to the original experimental map was challenging as indicated by a low TM-Score of 0.5 by VESPER (raw). In contrast, DiffModeler achieved a high TM-Score of 0.95, demonstrating that the diffusion model was effective to capture structural features in the map.
The next map is a complex with 34 chains (Fig. 3d). It is the state 2 of a complex of the proteolytic core and the ATPase PAN (proteasome-activating nucleotidase) (EMD-0213, resolution: 6.35 Å)49. DiffModeler was able to fit most of the subunits correctly, except for long helical domains located at the top of the complex in the figure, yielding a TM-Score of 0.97. In comparison, with the original map, VESPER (raw) was only able to fill about 20% of the structure, with a TM-Score of 0.20.
The last example (Fig. 3e) is the minor state of T. thermophilus enzyme in complex with NADH (EMD-11237, resolution: 6.10 Å)50, which includes 15 chains. Fitting subunit structures to the original map was difficult as all the chains are α-helical and hard to distinguish as indicated by a low TM-Score of 0.64 by VESPER (raw). On the other hand, with the advantage of the map diffusion, DiffModeler showed accurate backbone structure tracing with backbone recall of 0.98 and a superior structure alignment with a TM-Score of 0.98 and an RMSD of 2.40 Å.
Fig. 4 illustrates the largest protein complex structure built by DiffModeler. This example is the proteasome in complex with ADP-AlFx (EMD-6693, resolution: 6.30 Å)51. The complex comprises 47 protein chains totaling 13,462 amino acids. The diffusion model in DiffModeler achieved a backbone tracing recall of 0.92, laying a robust foundation for further protein complex structure modeling. Overall, the modeled complex showed high consistency with the native structure, as evidenced by a TM-Score of 0.94 and a sequence identity of 0.89. When individual chains are considered, 45 out of 47 chains were successfully modeled with an average sequence matching of 92.6%. 17 out of the 45 individual chain structures, which appear in the front view of the complex, are shown in the figure. The high modeling accuracy is clearly due to the application of the diffusion model to the map, as VESPER (raw) alone only achieved a TM-Score of 0.25. In contrast, EMBuild yielded a TM-Score of 0.88 and a sequence identity of only 0.47, which indicates that many chains were placed on similar but incorrect map regions.
Fig. 4. Structure model of proteasome constructed by DiffModeler.

This is the largest complex in the test set. Proteasome in complex with ADP-AlFx (EMD-6693, PDB ID: 5WVI, Resolution: 6.30 Å; 47 chains and 13,462 aa). TM-Score: 0.94, Sequence Identity: 0.89, RMSD: 5.13 Å. The EM map superimposed with the corresponding complex structure and the model by DiffModeler are shown on the top left and top right, respectively, with different colors indicating different chains. On the bottom left, superimposition of the entire complex in PDB and the model is shown together with comparison of 17 individual chain models that appear in the front view of the complex. Blue, the model; red, the native structure in the PDB entry. The modeled structures by the other methods are shown in Extended Data 7.
Structure Modeling on cryo-EM Maps at Low Resolution
We conducted an additional benchmark of DiffModeler on cryo-EM maps determined at low resolutions (10 to 20 Å). There were four maps in EMDB that fell in this resolution range and satisfied the map selection criteria we used (see Methods). The modeling results are shown in Fig. 5 and detailed performance metrics are provided in Sup Table S4. For these four maps, the average TM-Score of models by DiffModeler was 0.74, while those of EMBuild, Phenix, and VESPER (raw) were 0.32, 0.36, and 0.27, respectively, which are below the cutoff of 0.5 that indicates meaningful structural similarity.
Fig. 5. Modeling results by DiffModeler for experimental maps at low resolution.

Detailed evaluation results are shown in Sup Table S4, which contains the TM-score of individual chains and of the modeled protein complex by different methods. For each map four panels are shown from left to right: the input cryo-EM map; the corresponding native structure; the model by DiffModeler; and the superposition of the DiffModeler model (blue) with the native structure (red). a. ATP-Bound States of GroEL (EMD-1042, PDB ID: 1GR5, Resolution: 10.3 Å; 14 chains and 7,238 aa): TM-Score: 0.97, Sequence Identity: 0.95, RMSD: 3.88 Å. b. Anaerobic fatty acid beta oxidation trifunctional enzyme (anEcTFE) octameric complex (EMD-16134, PDB ID: 8BNR, Resolution: 10.3 Å; 8 chains and 4,584 aa): TM-Score: 0.87, Sequence Identity: 0.88, RMSD: 5.80 Å. c. Cofilactin filament inside microtubule lumen (EMD-16877, PDB ID: 8OH4, Resolution: 16.5 Å; 14 chains and 3,776 aa): TM-Score: 0.60, Sequence Identity: 0.50, RMSD: 8.30 Å. d. MecA-ClpC complex with ATP with the Walker B mutations introduced in the D2 ring (EMD-5608, PDB ID: 3J3S, Resolution: 11.0 Å; 12 chains and 5,352 aa): TM-Score: 0.51, Sequence Identity: 0.45, RMSD: 8.42 Å. Modeling results by other methods are provided in Extended Data 8.
The first example (Fig. 5a) is ATP-Bound States of GroEL (EMD-1042, resolution: 10.3 Å, 14 chains, 7,238 residues)52. Due to the low resolution of the map, the authors manually determined this structure by fitting individual chain structures while considering symmetry information. In contrast, DiffModeler demonstrated the capability to model the complete atomic structure automatically and accurately, achieving a TM-Score of 0.97 and an RMSD of 3.88 Å. The structural superimposition of the model with the corresponding PDB entry visually confirms the accuracy of the model.
Fig. 5b is a 10.3 Å map from anaerobic fatty acid beta oxidation trifunctional enzyme (anEcTFE) octameric complex (EMD-16134)53. The complex has eight chains, which is a dimer of a tetramer, shown as left and right volumes in the map in the figure. The original investigators modeled the complex structure with multiple manual steps. The procedure included structure fitting from a related tetramer map with a resolution of 3.55 Å, which was modeled by incorporating the crystal structure (PDB: 6DV25467) with further fitting and refinement. Subsequently, they docked the solved structure into the low-resolution map and conducted further refinement to achieve the final structure. In contrast, DiffModeler automated the assembly of the entire protein complex based on the low-resolution map and achieved a high TM-Score of 0.87. Structures derived from EMBuild, Phenix, and VESPER reported TM-Scores of 0.30, 0.30, and 0.19, respectively, emphasizing the distinct advantage of DiffModeler.
The third map was the cofilactin filament inside the microtubule lumen (EMD-16877), determined at an even lower resolution of 16.5 Å with 14 chains (Fig. 5c). The authors determined the structure by manually fitting a cofilactin filament model (PDB: 5YU8) to the density, followed by a local refinement55. The model built by DiffModeler had a TM-Score of 0.60, which was substantially higher than the values of EMBuild, Phenix, and VESPER (raw), 0.17, 0.25, and 0.18, respectively, which failed to capture even the overall fold. The model by DiffModeler captured the overall shape of the complex. However, only 4 out of 14 chains were successfully aligned (sequence identity: 0.99). Some chains, e.g., chains E, H, and K, were placed in the correct regions but with incorrect alignments.
In the last panel (Fig. 5d), we illustrate a case where DiffModeler’s performance was relatively poor. The presented structure is derived from an 11.0 Å resolution map of a 12-chain complex of MecA-ClpC with ATP and Walker B mutations introduced in the D2 ring (EMD-5608). The authors employed a complex manual procedure for structure determination: they started from an initial model based on another crystal structure of ClpC (PDB: 3PXI) and employed MODELLER56 to fill in missing loops using other related structures as templates (PDB: 1JBK and 1R6B). Subsequently, the structure was manually docked into the cryo-EM maps, followed by flexible fitting using NAMD57. The model generated by DiffModeler had an overall TM-Score of 0.51, a barely significant score for structure modeling. Among the 12 chains, 4 chains, C, D, E, and F, were modelled successfully with an average TM-score of 0.73 and a sequence identity of 0.73. The rest of the chains were placed in incorrect regions of the map. The TM-Scores of EMBuild, Phenix, and VESPER (raw) were even worse, 0.26, 0.30, and 0.19, respectively.
To our knowledge, DiffModeler is the first method capable of automatically modeling protein complexes from maps in this low-resolution range. It distinctly demonstrates its advantage over existing methods.
Structure Modeling for Maps at High Resolution
Although the primary focus of DiffModeler is low resolution maps, where it can demonstrate its unique strengths, it also performs effectively with higher resolution maps. To illustrate this versatility, we employed DiffModeler on maps with better than 5 Å resolution. We conducted benchmarking on two distinct datasets: one that was used in the paper of CryoREAD 9 and the other employed in ModelAngelo58. These two datasets cover a broad spectrum of structures, encompassing protein-DNA/RNA complexes and protein-only configurations. The CryoREAD dataset comprised 61 maps (excluding those DNA/RNA-only maps), while the ModelAngelo dataset included 28 maps. The number of chains in the dataset ranged from 1 to 48 chains, totaling residues from 447 to 17,947. On this dataset we used the identical model and pipeline of DiffModeler without any alterations. For maps with protein-DNA/RNA complexes, we first used CryoREAD to construct DNA/RNA structures and then modeled protein structures in the remaining regions in the maps by DiffModeler. AF2 models used in the DiffModeler modeling was selected from the AF2 database by BLAST59 sequence search. Sup Table S5 shows the sequence identity and the TM-Score of AF2 models relative to the native structure of individual chains in the maps. The TM-Score of AF2 models ranged from 0.134 to 0.998, with an average TM-Score of 0.858 and 0.896 for CryoREAD and ModelAngelo datasets, respectively.
Fig. 6 summarizes the modeling results, with details in Sup Table S6. In Fig. 6a-d, models were evaluated by TM-Score and sequence identity and compared with other methods on the two datasets. We compared four other modeling methods: Phenix (phenix.dock_in_map), VESPER (raw), ModelAngelo58, and DeepMainMast8. DiffModeler clearly outperformed the other four methods on the two datasets. The average TM-Score and sequence identity of DiffModeler for the CryoREAD/ModelAngelo datasets were 0.879/0.907 (Fig. 6a, 6c) and 0.851/0.864 (Fig. 6b, 6d), respectively, which are comparable to the results on the original dataset of 5.0–10.0 Å resolution (Fig. 2). In contrast, the other four methods showed substantially lower performance: for Phenix/VESPER (raw)/ModelAngelo/DeepMainMast, the average TM-Score and sequence identity were 0.572/0.697/0.348/0.597 and 0.430/0.579/0.328/0.555 on the CryoREAD dataset, and 0.573/0.701/0.542/0.791 and 0.508/0.605/0.532/0.779 on the ModelAngelo dataset. In Fig. 6e, we investigated the impact of complex size on the TM-Score of DiffModeler models. As in Fig. 2, we consistently observed high TM-Scores, even for large complexes.
Fig. 6. Protein complex structure modeling by DiffModeler for experimental maps at near-atomic resolution (< 5 Å).
Detailed Evaluation Results are shown in Sup Table S5 and S6. The benchmark was performed on two datasets, the CryoREAD dataset and the ModelAngelo dataset. Models by DiffModeler were compared with those by Phenix, VESPER (raw), ModelAngelo, and DeepMainMast. a. TM-Score comparison on the CryoREAD dataset. b. The sequence identity comparison on the CryoREAD dataset. c. TM-Score comparison on the ModelAngelo dataset. d. The sequence identity comparison on the ModelAngelo dataset. e. TM-Score relative to the total number of residues in the map. For the CryoREAD dataset, the equation of the regression line is y = 8.70e-7x + 0.876 (Pearson correlation coefficient, 0.029; P value, 0.824; standard error, 3.90e-6); for the ModelAngelo dataset, the equation of the regression line is y = 3.71e-6x + 0.896 (Pearson correlation coefficient, 0.095; P value, 0.632; standard error, 7.67e-6). f. RqcH DR variant bound to 50S-peptidyl-tRNA-RqcP RQC complex (EMD-13017, PDB ID: 7OPE, Resolution: 3.2 Å; 3,818 residues and 2,996 nucleotides). DiffModeler and CryoREAD: TM-Score: 0.92, Sequence Identity (proteins): 0.92, RMSD: 1.74 Å.
While DiffModeler exhibited strong performance in most cases, there were instances where the performance was low, indicated by TM-Score or sequence identity values lower than 0.6. One contributing factor in these cases was the failure in predicting the AF2 chain structure (e.g., EMD-12935, EMD-27705; Sup Table S6). The TM-Scores of these two maps were 0.65 and 0.54, respectively. When the native chain structures were used as input instead, both improved to 0.99 (Extended Data 9). There are also two cases with a low sequence identity and a high TM-Score (e.g., EMD-13619, EMD-13620), which are cases of hetero-oligomers where subunits were placed in equivalent places of different chains.
In Fig. 6f, we present a model for the RqcH DR variant bound to the 50S-peptidyl-tRNA-RqcP ribosome-associated protein quality-control (RQC) complex (EMD-13017, resolution: 3.2 Å)60. This large complex includes 3,818 amino acid residues and 2,996 nucleotides. We modelled the entire complex with DiffModeler and CryoREAD9 for protein and RNA, respectively, which yielded a TM-Score of 0.92 for protein and a backbone recall of 0.94 for RNA. While this work primarily focuses on protein structure modeling with DiffModeler, we also extended our modeling efforts to include nucleic acid structures within these maps using CryoREAD9. The backbone and sequence recall (see Methods in CryoREAD9) were 0.855 and 0.523 on the CryoREAD dataset and 0.829 and 0.413 on the ModelAngelo dataset.
Discussion
DiffModeler is a structure modeling method that uniquely targets low resolution cryo-EM maps of 5–15 Å. Within this target resolution range, noisy density makes it exceedingly difficult to detect precise atom and amino acid positions as well as main-chain conformations in the map. DiffModeler overcomes these obstacles by sculpting out main-chain conformations from low resolution maps using a diffusion model, which enables substantially higher accuracy in structure fitting. The benchmark of DiffModeler on maps with resolution better than 5 Å further indicated that it generalizes to, and remains accurate on, higher resolution maps.
For the training of DiffModeler, we opted to train our models on experimental low-resolution EM maps and then benchmark them on high-resolution settings, instead of using low-resolution maps simulated with Gaussian noise. This is because we found that Gaussian noise does not properly simulate experimental low-resolution maps 19,20 (more details in Supplementary Note).
Although DiffModeler has demonstrated overall accuracy and effectiveness, it is crucial to address the limitations of the current version. First, in regions with low local resolution, the backbone tracing of the diffusion model may be inaccurate, leading to incorrect structure fitting. To address this issue, the fitting could be enhanced to prioritize regions with higher local resolution, mitigating the risk of such errors. Likewise, if local EM density is missing for all or part of a subunit, that subunit is difficult to model because there is no density for the diffusion model to modify. Second, as of now, DiffModeler exclusively supports modeling of protein complex structures. Future developments will aim to support protein/DNA/RNA complex structure modeling, covering a wider range of biological systems. Third, for high-resolution cryo-EM maps (better than 4 Å), it will be essential to develop local structure refinement approaches that leverage the density information to refine predicted structures, further enhancing accuracy and reliability. Addressing these limitations is left for future development. Lastly, the results obtained from DiffModeler fitting are inevitably influenced by the (in)accuracy of the AF2 models of the proteins included in the maps. This can be mitigated by fitting individual domains of proteins, because AF2 tends to build accurate domain structures (the implementation and an example are described in Methods). The overall accuracy of the models is expected to improve as protein structure prediction methods become more accurate in the near future.
We firmly believe that DiffModeler will prove to be an indispensable and user-friendly tool for protein complex structure modeling, bridging a crucial gap in the availability of tools suitable for maps at low resolutions. The approach will also be applicable for cryo-electron tomography within the same resolution range, better than 15 Å, which is now increasingly available61,62.
Methods
Constructing Benchmark Dataset
Following the protocols employed in our previous works 19,20,63,64, we compiled a dataset of experimental cryo-EM maps for training, validation, and testing DiffModeler. Initially, we sourced cryo-EM maps from EMDB (as of January 26th, 2023) that had resolutions between 5 Å and 10 Å and corresponding deposited structures in PDB with more than 20 residues. We kept only maps that contain only proteins. This initial screening yielded 840 maps.
Subsequently, we assessed the quality of structure-to-map fit by measuring cross-correlation and overlap between the EM maps and simulated maps generated from their respective structures in PDB 17. Maps were discarded if their corresponding structures displayed a cross correlation and overlap below 0.65. The remaining maps were manually inspected. These steps reduced the number of maps to 337.
To remove redundancy in the data, we applied single linkage clustering based on the sequence identity of proteins within each map. Two maps were placed in the same cluster if any pair of protein chains, one from each map, exhibited a global sequence identity of 25% or higher. This clustering procedure resulted in 103 clusters. Out of the 103 clusters, we randomly allocated 68 clusters (230 maps) to the training set, 18 clusters (36 maps) to validation, and 17 clusters (71 maps) to testing (Sup Table S1). It is important to note that the training, validation, and testing sets are fully independent of each other. Finally, we divided the maps in the testing set into those containing inaccurate predicted models, with a TM-score lower than 0.5, in the AlphaFold Database 18 and those without such inaccurate models. Among the 71 maps, 19 maps did not have inaccurate models with a TM-score lower than 0.5.
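The single linkage grouping described above can be sketched with a small union-find. This is only an illustration: the function name `cluster_maps`, the input layout, and the `identity` callable (standing in for whatever global sequence alignment is used) are hypothetical and not part of DiffModeler.

```python
from itertools import combinations


def cluster_maps(map_chains, identity, threshold=0.25):
    """Single linkage clustering of maps.

    map_chains: dict mapping a map id to a list of its chain ids.
    identity:   callable (chain_a, chain_b) -> global sequence identity in [0, 1].
    Two maps are linked if any chain pair across them reaches `threshold`.
    """
    parent = {m: m for m in map_chains}

    def find(m):
        # Walk up to the cluster root, shortening the path on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(map_chains, 2):
        linked = any(
            identity(ca, cb) >= threshold
            for ca in map_chains[a]
            for cb in map_chains[b]
        )
        if linked:
            parent[find(a)] = find(b)

    clusters = {}
    for m in map_chains:
        clusters.setdefault(find(m), []).append(m)
    return sorted(sorted(c) for c in clusters.values())
```

Because linkage is transitive, two maps with no similar chains can still end up in the same cluster through a third map, which is what keeps the training, validation, and testing clusters independent.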
Pre-processing of map data
If a map had a grid size that is different from 1.0 Å, we interpolated the grid size to 1.0 Å using trilinear interpolation. The density values within a map were normalized to [0.0, 1.0] with a minimum-maximum normalization. Any negative values in a map were set to 0, and 0 was used as the minimum value for normalization. We set the maximum value for normalization as the 98th percentile density value, and any density values above that were capped at 1.0.
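The density normalization can be illustrated as follows. This is a minimal sketch, assuming the 98th percentile is taken over all grid values after negatives have been set to 0 (the text does not say whether zeros are excluded); the function name is hypothetical and the trilinear resampling to a 1.0 Å grid is not shown.

```python
import numpy as np


def normalize_density(density, percentile=98.0):
    """Min-max normalize a density map to [0, 1].

    Negative values are set to 0 (the minimum), the given percentile is
    used as the maximum, and anything above it is capped at 1.0.
    """
    d = np.clip(np.asarray(density, dtype=np.float64), 0.0, None)
    top = np.percentile(d, percentile)  # assumption: computed over all voxels
    if top <= 0:
        return np.zeros_like(d)
    return np.clip(d / top, 0.0, 1.0)
```

Using a high percentile rather than the absolute maximum keeps a few extreme voxels from compressing the rest of the map into a narrow range.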
From each map, boxes of size 64³ Å³ were collected by scanning the box across the map along the three axes with a stride of 32 Å. Each grid point within the box was assigned a label indicating whether it belonged to the backbone. If a grid point was within 2.0 Å of any backbone atom, it was labeled as backbone. Otherwise, the point was considered background. A box was excluded from training if less than 0.1% of its grid points were labeled as backbone.
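The box collection and labeling can be sketched as below, assuming a 1.0 Å grid whose indices coincide with Å coordinates. Function names are hypothetical, and the handling of map edges (no padding; trailing partial regions are skipped) is an assumption, since the text does not specify it.

```python
import numpy as np


def label_backbone(shape, atom_coords, cutoff=2.0):
    """Binary label grid: 1 where a grid point lies within `cutoff` of any atom."""
    labels = np.zeros(shape, dtype=np.uint8)
    r = int(np.ceil(cutoff))
    for atom in np.asarray(atom_coords, dtype=float):
        base = np.floor(atom).astype(int)
        lo = np.maximum(base - r, 0)
        hi = np.minimum(base + r + 1, shape)
        if np.any(lo >= hi):
            continue  # atom lies outside the grid
        axes = [np.arange(l, h) for l, h in zip(lo, hi)]
        grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
        near = np.linalg.norm(grid - atom, axis=-1) <= cutoff
        labels[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] |= near.astype(np.uint8)
    return labels


def iter_boxes(density, labels, box=64, stride=32, min_fraction=0.001):
    """Yield (origin, density_box, label_box) for boxes with enough backbone."""
    nx, ny, nz = density.shape
    for x in range(0, max(nx - box, 0) + 1, stride):
        for y in range(0, max(ny - box, 0) + 1, stride):
            for z in range(0, max(nz - box, 0) + 1, stride):
                sl = (slice(x, x + box), slice(y, y + box), slice(z, z + box))
                lab = labels[sl]
                # Keep the box only if at least 0.1% of its points are backbone.
                if lab.mean() >= min_fraction:
                    yield (x, y, z), density[sl], lab
```

With a stride of half the box size, neighboring boxes overlap by 50%, so every interior region is seen in several contexts.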
Training the conditional diffusion model of DiffModeler
Given the density information from cryo-EM maps, the objective of the diffusion model of DiffModeler is to generate the backbone labels in the map. We employed a conditional diffusion model, specifically the denoising diffusion implicit model (DDIM) 24, for its superior generation quality and efficiency. Inspired by the Pix2Seq 28 framework, we designed an encoder-decoder network architecture (Extended Data 1). The encoder scans the input density map with a box of size 64³ and embeds (outputs) hidden features of the map. The decoder takes three components as input in the conditional diffusion framework: the condition (the starting cryo-EM density map and hidden features), the noised backbone at timestep t, and the time t of the current step. From these inputs, the decoder outputs the predicted traced backbone. The noised backbone is a mixture of the ground-truth traced backbone density and Gaussian noise determined by the timestep t, which will be explained later. The encoder and the decoder are optimized simultaneously by comparing the predicted traced backbone with the ground-truth traced backbone. The encoder and decoder neural network architectures are shown in Extended Data 2a and 2b, respectively. The detailed network architecture of each component of the encoder and the decoder is shown in Extended Data 3.
As mentioned in the previous dataset section, we allocated 230 maps for training and 36 maps for validation of the conditional diffusion model. For each batch of training, we randomly sampled 8 boxes from the 230 maps. In total, around 16,000 and 3,500 boxes were used in an epoch for training and validation, respectively. The framework was trained through 30 epochs, and the final model is selected based on the validation performances.
The main objective of the model is to perform conditional denoising of a noisy traced-backbone density to recover the ground-truth traced protein backbone density $x_0$ in the map. For training the model, a series of noisy traced backbone density maps was generated by mixing the ground-truth traced backbone density with Gaussian noise:
| $x_t = \sqrt{\gamma(t)}\,x_0 + \sqrt{1-\gamma(t)}\,\epsilon$ | (1) |
| (2) |
where $x_t$ is the noised traced backbone at timestep $t$, $\gamma(t)$ is a cosine scheduling function shown in Eq. (2), $x_0$ is the ground-truth traced backbone, and $\epsilon$ is a noise variable randomly sampled from the standard Gaussian distribution, $\epsilon \sim \mathcal{N}(0, I)$. The ground-truth density of the traced protein backbone was prepared by assigning the backbone label to each grid point based on the corresponding native backbone structure (N, Cα, C atoms). If a grid point in the map was within 2.0 Å of any backbone atom, it was labeled as backbone. Otherwise, the point was considered background.
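The noising step can be illustrated with a short sketch. Two things here are assumptions rather than facts from the text: the mixing is written in the common variance-preserving form with weights sqrt(gamma(t)) and sqrt(1 - gamma(t)), and a plain cos² schedule stands in for the cosine scheduling function of Eq. (2), whose exact form is not recoverable here. The function names are hypothetical.

```python
import numpy as np


def gamma_cosine(t):
    """ASSUMED schedule: gamma(t) = cos^2(pi * t / 2), so gamma(0) = 1, gamma(1) = 0."""
    return np.cos(0.5 * np.pi * t) ** 2


def add_noise(x0, t, rng):
    """Mix the ground-truth backbone x0 with standard Gaussian noise at time t."""
    g = gamma_cosine(t)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(g) * x0 + np.sqrt(1.0 - g) * eps
    return x_t, eps
```

At t = 0 the output equals the ground truth, and at t = 1 it is pure noise, which is the starting point used at inference.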
During the training process, the timestep $t$ was uniformly sampled from [0,1] for each map in the training set at each iteration, so that the framework captures the whole diffusion process. The noised backbone $x_t$ for time $t$ was obtained according to Eq. (1), from which the decoder computes the predicted backbone map $\hat{x}_0$. The loss of $\hat{x}_0$ was computed in comparison with the ground-truth backbone $x_0$. The Dice loss 36 used was defined as
| $L_{Dice}(\hat{x}_0, x_0) = 1 - \frac{2\sum_{i=1}^{N} p_i g_i + \delta}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + \delta}, \quad L = \frac{1}{B}\sum_{k=1}^{B} L_{Dice}^{(k)}$ | (3) |
where $L_{Dice}(\hat{x}_0, x_0)$ represents the Dice loss between a predicted box $\hat{x}_0$ at timestep $t$ and the corresponding ground-truth box $x_0$; $N$ is the total number of grid points inside the box; $p_i$ is the predicted probability of the $i$-th grid point in the predicted box; $g_i$ is the binary ground truth of the $i$-th grid point, where 1 denotes the existence of backbone structure at the grid point and 0 indicates background; $\delta$ is a smoothing factor with a value of 1e-6; $L$ is the overall loss of a batch of $B$ examples; and $L_{Dice}^{(k)}$ represents the Dice loss of the $k$-th example. Different examples in the same batch may have different timesteps $t$, since $t$ is uniformly and independently sampled for each example.
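For concreteness, a soft Dice loss with the smoothing factor described above can be sketched as follows. The standard soft Dice form is assumed here (smoothing added to both numerator and denominator), as is averaging over the batch; neither detail is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np


def dice_loss(pred, target, smooth=1e-6):
    """Soft Dice loss between predicted probabilities and a binary ground truth."""
    p = np.asarray(pred, dtype=np.float64).ravel()
    g = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(p * g)
    return 1.0 - (2.0 * intersection + smooth) / (p.sum() + g.sum() + smooth)


def batch_dice_loss(preds, targets, smooth=1e-6):
    """Batch loss, ASSUMED to be the mean of the per-example Dice losses."""
    losses = [dice_loss(p, g, smooth) for p, g in zip(preds, targets)]
    return float(np.mean(losses))
```

Dice loss is a natural choice here because backbone grid points are a small minority of each box, and an overlap-based loss is less dominated by the background class than a per-voxel loss.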
We tested hyperparameter combinations of learning rates of [1e-3, 1e-4, 1e-5] and weight decays of [0, 1e-6, 1e-5, 1e-4] using the Adam optimizer 65. Among the combinations, a learning rate of 1e-4 without weight decay showed the best grid-wise Intersection-over-Union (IoU), 0.562, on the validation set. Training and validation of the conditional diffusion model took around 5 days. The computations were performed on two NVIDIA RTX A6000 48 GB GPUs connected via NVLink.
Inference of the conditional diffusion model in DiffModeler
With the trained conditional diffusion model, we compute the traced backbone conditioned on the input cryo-EM density. The inference of the conditional diffusion model is presented in Extended Data 4. Given a box of cryo-EM density, the encoder of the conditional diffusion model first embeds the hidden features of the input density box. Subsequently, the decoder starts with random Gaussian noise as the initial distribution and iteratively refines the estimated density to bring it closer to the ground-truth traced backbone, conditioned on the hidden features from the encoder and the initial density input.
Because training used uniformly sampled timesteps, we have the flexibility to choose the total number of inference steps T. We chose T = 100, as we did not observe significant performance improvement with T larger than 100. The current timestep t is calculated by
| (4) |
where is the timestep at inference iteration , and is the overall inference steps. Though the actual timestep is a fraction number belongs [0,1] during inference, we simplify the term as integer in following description by starting to , which corresponds to the iteration to .
The first iteration of the inference starts at timestep . The decoder takes the random Gaussian noise , timestep embedding, and the condition (i.e., the hidden feature embedding and the original cryo-EM map) as input and then it outputs .
At the following timestep $t-1$, the condition inputs are the same and the timestep embedding is obtained with Eq. (4). However, the noisy backbone input for the decoder differs from training. During training, $x_t$ is computed following Eq. (1), which uses the ground-truth traced backbone $x_0$. As $x_0$ is not available in the inference stage, the input of the decoder, $x_{t-1}$, uses the decoder’s output $\hat{x}_0$ at timestep $t$:
| $x_{t-1} = \sqrt{\gamma(t-1)}\,\hat{x}_0 + \sqrt{1-\gamma(t-1)}\,\epsilon$ | (5) |
where $x_{t-1}$ is the estimated noised backbone at timestep $t-1$, $\hat{x}_0$ is the decoder output at timestep $t$, $\gamma$ is the scheduling function of Eq. (2), and $\epsilon$ is the random Gaussian noise. In this equation, $\epsilon$ is also estimated by comparing the decoder’s noisy backbone input $x_t$ and its corresponding backbone estimation output $\hat{x}_0$ from the decoder as follows:
| $\epsilon = \frac{x_t - \sqrt{\gamma(t)}\,\hat{x}_0}{\sqrt{1-\gamma(t)}}$ | (6) |
By combining Eq. (5) and Eq. (6), we can obtain the decoder input $x_{t-1}$ from the decoder output $\hat{x}_0$ at timestep $t$. The inference process of the decoder is repeated $T$ times, and the decoder output at the final timestep is our final estimated backbone.
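The iterative inference can be sketched as a deterministic DDIM-style loop. Several parts are assumptions made for illustration: the cos² schedule stands in for Eq. (2), the timesteps are spaced as t = 1 - i/T in place of Eq. (4), and `decoder` is a placeholder callable for the trained network. None of these names come from the DiffModeler code.

```python
import numpy as np


def gamma_cosine(t):
    """ASSUMED schedule: gamma(t) = cos^2(pi * t / 2)."""
    return np.cos(0.5 * np.pi * t) ** 2


def ddim_sample(decoder, condition, shape, steps=100, seed=0):
    """Refine Gaussian noise into a backbone estimate over `steps` iterations.

    decoder(x_t, t, condition) must return an estimate of the clean backbone.
    """
    rng = np.random.default_rng(seed)
    x_t = rng.standard_normal(shape)  # start from pure noise at t = 1
    for i in range(steps):
        t_now = 1.0 - i / steps          # assumed timestep spacing
        t_next = 1.0 - (i + 1) / steps
        x0_hat = decoder(x_t, t_now, condition)
        g_now, g_next = gamma_cosine(t_now), gamma_cosine(t_next)
        # Noise implied by the current input and the current estimate.
        eps_hat = (x_t - np.sqrt(g_now) * x0_hat) / np.sqrt(max(1.0 - g_now, 1e-12))
        # Re-noise the estimate to the next, lower noise level.
        x_t = np.sqrt(g_next) * x0_hat + np.sqrt(1.0 - g_next) * eps_hat
    return x_t
```

Because the last step lands on t = 0, where the noise weight vanishes, the returned array equals the decoder's final estimate of the clean backbone.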
Single-chain structure fitting using VESPER
We used VESPER 16 to fit AF2 models of individual proteins to the map modified by the diffusion model. AF2 models of the protein chains were taken from the AlphaFold Database 18. Supp Table S3 provides the TM-score of the chains; the average TM-score was 0.922. The fitting process involved three main steps. First, AF2 models were transformed into simulated maps at 1 Å resolution using TEMPy 66. Second, we simplified both the modified EM map and the simulated maps of the AF2 models by condensing them into maps of local representative density points. This was achieved with the mean-shift algorithm 40, an approach we used in our earlier work, MAINMAST 4. Finally, VESPER was used to globally align AF2 models into various poses within the representative map, generating different fit scores. The top 100 poses were retained as pose candidates for each subunit.
The mean shift algorithm is employed to compute maps featuring local representative density points by clustering density points within an EM map. First, grid points with a density exceeding 0 are identified. Then, the algorithm iteratively updates the coordinates of a grid point by considering the weights associated with neighboring grid points: , where
| (7) |
is the neighborhood of , which are a set of neighboring grid points that satisfy ; is a Gaussian kernel function with bandwidth , as shown in Eq.(8); is the density value of the grid point .
| (8) |
where the is the bandwidth set as 2. The mean-shift process is continued until convergence, i.e., with set to 0.001.
Following the completion of the mean-shifting process, we merged shifted points that were in close proximity. Points closer than a predefined threshold distance of 2.0 Å, were clustered together, and the grid point with the highest density within the cluster was designated as the representative node. This clustering and selection process was iterated until the convergence of the selected representative nodes. The resulting set of points, known as representative points, forms the basis for the representative map (Fig. 3).
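The mean-shift and merging steps can be sketched as below. The Gaussian kernel form exp(-1.5 d²/σ²) is an assumption, all points contribute (no neighborhood cut-off is applied), and merging is done greedily in order of decreasing density; these choices and the function names are illustrative only.

```python
import numpy as np


def mean_shift(points, density, bandwidth=2.0, tol=1e-3, max_iter=100):
    """Shift each point toward the local density-weighted mean until it moves < tol."""
    points = np.asarray(points, dtype=float)
    density = np.asarray(density, dtype=float)
    shifted = points.copy()
    for idx in range(len(points)):
        x = points[idx]
        for _ in range(max_iter):
            d2 = np.sum((points - x) ** 2, axis=1)
            # ASSUMED kernel; weights combine distance and density.
            w = np.exp(-1.5 * d2 / bandwidth ** 2) * density
            new_x = (w[:, None] * points).sum(axis=0) / w.sum()
            step = np.linalg.norm(new_x - x)
            x = new_x
            if step < tol:
                break
        shifted[idx] = x
    return shifted


def merge_points(points, density, threshold=2.0):
    """Keep the highest-density point of every group closer than `threshold`."""
    points = np.asarray(points, dtype=float)
    order = np.argsort(-np.asarray(density, dtype=float))
    kept = []
    for i in order:
        if all(np.linalg.norm(points[i] - points[j]) >= threshold for j in kept):
            kept.append(i)
    return points[kept]
```

The result is a sparse set of representative points that follows the density ridges, which is much cheaper to match against than the full grid.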
By completing this stage, we acquired two distinct representative maps using the mean-shift algorithm: the subunit representative map derived from the simulated map of the AF2 single-chain structure, and the backbone representative map obtained from the diffusion-traced backbone map.
The final step involves utilizing VESPER to globally align AF2 single-chain subunits into various poses within the backbone map. Specifically, VESPER aligns each subunit representative map $RM_{subunit}$ to the backbone representative map $RM_{backbone}$ obtained in the preceding step. For each $RM_{subunit}$, VESPER systematically explores all potential poses to align it with $RM_{backbone}$. In VESPER’s global search, we used a rotation scan interval of 10° and a translation scan interval of 2 Å. The fitness score of subunit $s$ at pose $p$ is defined as:
| $\mathrm{subunit\_fitscore}(s, p) = \frac{N_{match}(s, p)}{N_{total}(s)}$ | (9) |
where $N_{match}(s, p)$ is the number of Cα positions of subunit $s$ at pose $p$ that have representative points in $RM_{backbone}$ within 3 Å, and $N_{total}(s)$ is the total number of Cα positions of subunit $s$. The top 100 poses were kept for each subunit. This pool comprises 100 × M pose candidates for a protein complex with M chains.
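The fitness score is a simple coverage fraction and can be computed as in the following sketch. The function name and the array layout (one row of x, y, z per point) are assumptions for illustration.

```python
import numpy as np


def subunit_fitscore(ca_coords, rep_points, cutoff=3.0):
    """Fraction of Cα positions with a backbone representative point within `cutoff` Å."""
    ca = np.asarray(ca_coords, dtype=float)
    rp = np.asarray(rep_points, dtype=float)
    if len(ca) == 0 or len(rp) == 0:
        return 0.0
    # Pairwise distances: rows are Cα atoms, columns are representative points.
    dists = np.linalg.norm(ca[:, None, :] - rp[None, :, :], axis=-1)
    return float(np.mean(dists.min(axis=1) <= cutoff))
```

A score of 1.0 means every Cα of the posed subunit sits on traced backbone density; a pose that hangs partly outside the traced backbone is penalized in proportion to the uncovered residues.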
Assembling subunits to generate the entire protein complex structure
Subunits, fitted to the map with different pose candidates, are then assembled into a complete protein complex structure model. We developed a greedy algorithm that iteratively assembles superimposed subunits within the map. The entire pipeline is depicted in Extended Data 5. As outlined in the preceding section, we generated 100 poses for each subunit in the map using VESPER. Therefore, the subunit-pose pool for a protein complex with M subunits comprises 100 × M pose candidates, all of which were scored using subunit_fitscore (Eq. 9).
The initial step in the modeling process involves selecting the subunit-pose with the highest subunit_fitscore among all available poses. Subsequently, a local region within 20 Å of the fitted subunit-pose is masked out in the backbone map, and the subunit pose is further optimized in terms of subunit_fitscore in that local region, with an interval of 5° for the rotation scan and 1 Å for the translation scan. Then, subunit-poses are removed from the pool if they belong to the subunit that was just selected or if they overlap significantly with the selected subunit-pose. A subunit-pose is considered to overlap if more than 10% of its Cα positions are closer than 3 Å to any Cα position of the already selected subunit-pose(s).
Following this, the next best subunit-pose is selected iteratively until the subunit-pose pool is exhausted. In most cases, where each subunit assumes a correct pose, all M subunits are successfully fitted into the map. However, there are rare instances where not all subunits are selected, because all 100 poses of a subunit overlap significantly with already-selected subunit-poses. In such scenarios, a new pose set is generated for the remaining unfitted subunits by fitting them to the remaining density regions within the backbone map using VESPER. The same iterative process is then applied until all the subunits are successfully fitted.
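The greedy selection loop can be sketched as follows. It covers only the pick-and-prune logic: the local pose refinement and the re-fitting of subunits left without a pose are omitted, and the `Pose` container and function names are hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Pose:
    subunit: str   # subunit (chain) identifier
    score: float   # subunit_fitscore of this pose
    ca: list       # Cα coordinates as (x, y, z) tuples


def overlaps(pose, placed, cutoff=3.0, max_fraction=0.10):
    """True if more than 10% of the pose's Cα atoms lie within 3 Å of placed Cα atoms."""
    others = [c for p in placed for c in p.ca]
    if not others or not pose.ca:
        return False
    close = sum(1 for a in pose.ca if any(dist(a, b) < cutoff for b in others))
    return close / len(pose.ca) > max_fraction


def greedy_assemble(pool):
    """Repeatedly take the best-scoring pose, then prune the pool."""
    remaining = sorted(pool, key=lambda p: p.score, reverse=True)
    placed = []
    while remaining:
        best = remaining.pop(0)
        placed.append(best)
        # Drop other poses of the same subunit and poses clashing with placed ones.
        remaining = [
            p for p in remaining
            if p.subunit != best.subunit and not overlaps(p, placed)
        ]
    return placed
```

Subunits missing from the returned list are the ones that would need a fresh pose search in the remaining density, as described above.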
In the output cif file of the structure model, subunit_fitscore is shown in the occupancy field of each residue of chains.
Domain-based DiffModeler
The results obtained from DiffModeler fitting are inevitably influenced by (in)accuracy of AF2 models of proteins included in the maps. The impact of inaccuracies of AF2 models can often be mitigated by fitting individual domains of proteins because AF2 tends to build accurate domain structures even when relative orientations of domains are incorrect. To address this, we have implemented a procedure to cut multiple chain structure models into individual domains using SWORD267 and use the domains in the fitting process. Extended Data 10 shows an example where domain-based fitting improved the model accuracy. The domain-based DiffModeler is also available at our Github codebase and server. Alternatively, AF2 models could be also trimmed to remove low-confidence regions, which is an area left for future work.
Evaluation Metrics
Backbone Recall:
The backbone recall was computed for each residue by determining the fraction of backbone heavy atoms within a 3 Å proximity to any grid points in the diffused map. This was then averaged across all residues in the map.
Backbone Precision:
The backbone precision was computed as the fraction of grid points within a 3 Å proximity to any backbone atoms.
To evaluate the performance of modeled structure, we utilized MM-align42 to compare the modeled structure and the native structure. MM-align is a sequence-independent alignment of protein complex structures, which aims to find the best superposition between two protein complex structures via a heuristic iteration of a modified Needleman-Wunsch dynamic programming (DP) algorithm. The heuristic alignment procedure is repeated until the alignment between two protein complexes converges.
Given a modeled protein complex with M residues and a native protein complex with N residues, and the number of aligned residues is K identified by MM-align, then the evaluation metrics are calculated as follows:
TM-Score:
the structural similarity between modeled structure and native structure.
| $\text{TM-Score} = \frac{1}{N}\sum_{i=1}^{K}\frac{1}{1+(d_i/d_0)^2}$ | (10) |
where N is the total number of residues in the native structure and K is the number of aligned residues. $d_i$ is the distance between the Cα atom of the $i$-th aligned residue and that of its aligned pair from the modeled structure after superposition by MM-align, and $d_0$ is the distance scale specified by MM-align.
Align Ratio:
the fraction of residues in the native structure that have aligned residues from the modeled structure.
| $\text{Align Ratio} = \frac{K}{N}$ | (11) |
where N is the total number of residues in the native structure, K is the number of aligned residues identified by the alignment algorithm MM-align.
Sequence Identity:
the fraction of residues that are in the native structure that have aligned residues from the modeled structure, and the aligned residues have the same residue type.
| $\text{Sequence Identity} = \frac{L}{N}$ | (12) |
where L is the number of residues that have correct residue types among all K aligned residues identified by MM-align, and N is the total number of residues in the native structure.
RMSD:
the root-mean-square deviations between K aligned residues of the modeled structure and native structure.
| $\text{RMSD} = \sqrt{\frac{1}{K}\sum_{i=1}^{K} d_i^2}$ | (13) |
where K is the number of aligned residues identified by MM-align and $d_i$ is the Euclidean distance between the Cα atoms of the $i$-th pair of aligned residues.
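Given an alignment, the four metrics reduce to a few lines of arithmetic. The sketch below assumes the alignment has already been produced (for example by MM-align) and is passed in as per-pair Cα distances and residue-type matches; the function name, argument layout, and the explicit `d0` argument are assumptions for illustration.

```python
from math import sqrt


def evaluate_alignment(distances, same_type, n_native, d0):
    """Compute TM-Score, Align Ratio, Sequence Identity, and RMSD.

    distances: Cα-Cα distance (Å) for each of the K aligned residue pairs.
    same_type: for each aligned pair, True if the residue types match.
    n_native:  N, the number of residues in the native structure.
    d0:        distance scale of the TM-Score, as specified by the aligner.
    """
    k = len(distances)
    tm_score = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / n_native
    rmsd = sqrt(sum(d * d for d in distances) / k) if k else float("nan")
    return {
        "tm_score": tm_score,
        "align_ratio": k / n_native,
        "sequence_identity": sum(1 for s in same_type if s) / n_native,
        "rmsd": rmsd,
    }
```

Note that TM-Score, Align Ratio, and Sequence Identity are all normalized by the native length N, so unmodeled residues lower them, whereas RMSD is computed over the K aligned pairs only.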
Statistical Information of Benchmark
The following table presents the statistical information of the benchmark shown in Fig. 2.
Extended Data
Extended Data Fig. 1. The overall framework of the conditional diffusion model in DiffModeler.
The entire framework consists of one encoder and one decoder. The encoder takes the cryo-EM density as input and outputs the hidden features by scanning the map density with a box. The decoder utilizes three main components as the input of the conditional diffusion framework: the condition (the starting cryo-EM density map and hidden features), the noised backbone at timestep t, and the timestep t. Then the decoder outputs the predicted traced backbone. The encoder and the decoder are optimized simultaneously by comparing the predicted traced backbone and ground truth traced backbone, with details illustrated in Methods.
Extended Data Fig. 2. The network architecture of the conditional diffusion model in DiffModeler.
a. The encoder network architecture. It is a 3D U-shape-based convolutional neural network (UNet) with skip connections. The channel size of different layers is also illustrated in the figure. The input is first processed by a Conv3D layer with 32 filters of size 3³, and then iteratively processed and down-sampled by the encoding blocks Enc1-Enc5 (Extended Data Fig. 3a) and the downsample blocks (Extended Data Fig. 3c). The dense information is further processed by the bridge block (Extended Data Fig. 3b), and the encoding is subsequently upsampled by Dec1-Dec5 (Extended Data Fig. 3a) and the upsample blocks (Extended Data Fig. 3d), with skip connections to the encoding blocks. The final ConvBlock (Extended Data Fig. 3e) aggregates the information and yields the final output. b. The decoder network architecture. It shares a similar UNet architecture with the encoder. Additionally, it includes a TimeBlock (Extended Data Fig. 3f) that encodes the timestep input and passes it to every level of encoding block in the decoder network. Individual blocks are illustrated in Extended Data Fig. 3.
Extended Data Fig. 3. Individual network block architecture of the conditional diffusion model in DiffModeler.
a. The encoder/decoder block (Enc1-Enc5 and Dec1-Dec5 in panels a and b of Extended Data Fig. 2). Concat is an operation that concatenates inputs. b. The bridge block (located at the bottom of Extended Data Fig. 2a, b). c. The DownSample Block. Conv3D is a 3-dimensional (3D) convolutional layer with a filter size of 3*3*3, stride 1, and padding 1. d. The UpSample Block. e. The ConvBlock (located one step before the output box in Extended Data Fig. 2a, b). GroupNorm is a normalization layer that divides the channels into groups and normalizes the input data with statistics calculated within each group. Swish is a smooth, non-monotonic activation function that consistently matches or outperforms ReLU. f. The Time Block, specifically designed for timestep embedding. PositionalEncoding is an explicit layer with pairs of sine and cosine functions to add positional information to the input. FC is a fully connected layer in which each neuron applies a linear transformation to the input vector through a weight matrix. g. The Attention_ResBlock (Attention_ResBlock in panels a and b). h. The Attention Block (AttentionBlock in panel g). Attention is a layer that dynamically highlights the relevant features of the input data through the attention mechanism.
Extended Data Fig. 4. The inference pipeline of the conditional diffusion model.
During the inference stage, the encoder first takes the cryo-EM density as input and outputs the hidden features. Then, the decoder iteratively refines the density from timestep T, using three inputs to estimate the traced backbone: the condition (cryo-EM density and hidden features), the noised traced backbone at timestep t, and the embedding of timestep t. The noised traced backbone starts as random Gaussian noise and is iteratively updated at each timestep from the decoder’s output through a DDIM step (illustrated in Methods).
Extended Data Fig. 5. The overall pipeline of assembling algorithm.
Simulated maps of Alphafold2 models for each subunit are aligned with the RM_backbone using VESPER, and the top 100 poses for each subunit are cataloged in the structure pose pool. Initially, the subunit-pose exhibiting the highest subunit_fitscore is chosen from the structure pool. The local density region within the map occupied by this subunit is then masked out (the white local region in the figure). The pose of the selected subunit undergoes further refinement by VESPER, employing smaller angle and shifting intervals. Subsequent subunit-poses in the pool are eliminated under two conditions as shown by red crosses in the figure: if they belong to the same subunit as the one just selected or if they overlap with the selected subunit-pose. This iterative process continues until all the subunits are chosen to construct the full complex, at which point the pool becomes empty.
Extended Data Fig. 6. Example of estimated fitting quality for modeled structure.
a. native structure of Mouse retromer (VPS26/VPS35/VPS29) heterotrimer (EMD-21136, PDB ID: 6VAC, Resolution: 5.70 Å; protein lengths: 3 chains and 1,202 amino acids (aa)). Different colors represent different chains. b. modeled structures colored by subunit_fitscore, scaled from blue to red for high to low scores. c. the superposition of the model by DiffModeler (blue) with the native structure (red): TM-Score: 0.66, Sequence Identity: 0.54, RMSD: 4.96 Å. The chain colored in red on the right in panel b does not have a correct pose relative to the native structure.
Extended Data Fig. 7. Atomic structure modeling by different methods for experimental map EMD-6872.

The proteasome in complex with ADP-AlFx (EMD-6693, PDB ID: 5WVI, Resolution: 6.30 Å; protein lengths: 47 chains and 13,462 amino acids (aa)). The 5 columns from left to right are 1) EM map and its corresponding structure; 2) the atomic structure by DiffModeler: TM-Score: 0.94, Align Ratio: 0.96, Sequence Identity: 0.89, RMSD: 5.13 Å; 3) the atomic structure by EMBuild: TM-Score: 0.88, Align Ratio: 0.91, Sequence Identity: 0.47, RMSD: 4.8 Å. 4) the atomic structure by Phenix: TM-Score: 0.38, Align Ratio: 0.51, Sequence Identity: 0.04, RMSD: 17.4 Å; 5) the atomic structure by VESPER: TM-Score: 0.25, Align Ratio: 0.29, Sequence Identity: 0.16, RMSD: 12.5 Å.
Extended Data Fig. 8. Examples of structure models built with three other methods for experimental maps at low resolution (10–15 Å).

Detailed Evaluation Results are shown in Sup Table S4. In each row of the modeling example, the six columns shown from left to right are 1) input cryo-EM map; 2) the corresponding native structure; 3) the structure model by DiffModeler; 4) the structure model by EMBuild; 5) the structure model by Phenix; 6) the structure model by VESPER (raw). The DiffModeler model and its superposition are also shown in Fig. 5. a. ATP-Bound States of GroEL (EMD-1042, PDB ID: 1GR5, Resolution: 10.3 Å; protein lengths: 14 chains and 7,238 amino acids (aa)): DiffModeler (EMBuild, Phenix, VESPER): TM-Score: 0.97 (0.96, 0.78, 0.37), Sequence Identity: 0.95 (0.92, 0.71, 0.34), RMSD: 3.88 Å (4.90 Å, 6.50 Å, 7.76 Å). b. acid beta oxidation trifunctional enzyme (anEcTFE) octameric complex (EMD-16134, PDB ID: 8BNR, Resolution: 10.3 Å; protein lengths: 8 chains and 4,584 aa): DiffModeler (EMBuild, Phenix, VESPER): TM-Score: 0.87 (0.30, 0.30, 0.19), Sequence Identity: 0.88 (0.23, 0.17, 0.16), RMSD: 5.80 Å (9.65 Å, 11.28 Å, 8.03 Å). c. cofilactin filament inside microtubule lumen (EMD-16877, PDB ID: 8OH4, Resolution: 16.5 Å; protein lengths: 14 chains and 3,776 aa): DiffModeler (EMBuild, Phenix, VESPER): TM-Score: 0.60 (0.17, 0.25, 0.18), Sequence Identity: 0.50 (0.02, 0.05, 0.08), RMSD: 8.30 Å (12.1 Å, 13.19 Å, 12.67 Å). d. MecA-ClpC complex with ATP with the Walker B mutations introduced in the D2 ring (EMD-5608, PDB ID: 3J3S, Resolution: 11.0 Å; protein lengths: 12 chains and 5,352 aa): DiffModeler (EMBuild, Phenix, VESPER): TM-Score: 0.51 (0.26, 0.26, 0.48), Sequence Identity: 0.45 (0.16, 0.15, 0.41), RMSD: 8.42 Å (12.28 Å, 11.91 Å, 8.72 Å).
Extended Data Fig. 9. Examples of Modeled Structure by DiffModeler (AF2) and DiffModeler (native).
From left to right, we present the native structure, with different colors representing different chains; the superposition of the native structure (red) and the structure modeled by DiffModeler using AF2 single-chain structures (blue); and the superposition of the native structure (red) and the structure modeled by DiffModeler using native single-chain structures (blue). a. thermostable human MFSD2A in complex with thermostable human Sync2 (EMD-12935, PDB ID: 7OIX, Resolution: 3.60 Å; protein lengths: 2 chains and 716 aa): DiffModeler (AF2): TM-Score: 0.63, Sequence Identity: 0.57, RMSD: 5.01 Å; DiffModeler (native): TM-Score: 0.99, Sequence Identity: 1.00, RMSD: 0.74 Å. b. insulin receptor (IR) bound with S597 component 2 (EMD-27705, PDB ID: 8DTM, Resolution: 3.50 Å; protein lengths: 3 chains and 802 aa): DiffModeler (AF2): TM-Score: 0.54, Sequence Identity: 0.54, RMSD: 4.09 Å; DiffModeler (native): TM-Score: 0.99, Sequence Identity: 1.00, RMSD: 0.78 Å.
Extended Data Fig. 10. Example of Modeled Structure by fitting domains.
a. the native protein structure of the core MMTV intasome (EMD-6441, PDB ID: 3JCA, Resolution: 4.80 Å; protein lengths: 8 chains and 1,226 aa). Different colors represent different chains. This entry is included in the CryoREAD dataset. b. the superposition of the model by the original DiffModeler (blue) with the native structure (red): TM-Score: 0.65, Sequence Identity: 0.64, RMSD: 4.37 Å. c. the superposition of the model by the domain-based DiffModeler (blue) with the native structure (red): TM-Score: 0.87, Sequence Identity: 0.85, RMSD: 2.05 Å.
Supplementary Material
Table 1.
The statistical information of Fig. 2.
| Figure | Method | Regression line | Pearson correlation coefficient | p-value | standard error |
|---|---|---|---|---|---|
| Fig. 2d | DiffModeler | y = −0.012x + 0.999 | −0.110 | 0.655 | 0.027 |
| Fig. 2d | EMBuild | y = −0.095x + 1.449 | −0.360 | 0.131 | 0.060 |
| Fig. 2d | VESPER(raw) | y = −0.225x + 1.850 | −0.716 | 0.001 | 0.053 |
| Fig. 2d | Phenix | y = −0.097x + 1.031 | −0.379 | 0.110 | 0.058 |
| Fig. 2e | DiffModeler | y = 9.22e-6x + 0.872 | 0.434 | 0.063 | 4.64e-6 |
| Fig. 2e | EMBuild | y = 2.69e-6x + 0.827 | 0.053 | 0.829 | 1.23e-5 |
| Fig. 2e | VESPER(raw) | y = −3.68e-5x + 0.605 | −0.608 | 0.006 | 1.16e-5 |
| Fig. 2e | Phenix | y = −1.65e-5x + 0.498 | −0.336 | 0.160 | 1.13e-5 |
| Fig. 2f | DiffModeler | y = 1.48e-5x + 0.815 | 0.433 | 0.064 | 7.45e-6 |
| Fig. 2f | EMBuild | y = −1.63e-5x + 0.830 | −0.246 | 0.310 | 1.56e-5 |
| Fig. 2f | VESPER(raw) | y = −4.23e-5x + 0.537 | −0.603 | 0.006 | 1.36e-5 |
| Fig. 2f | Phenix | y = −2.71e-5x + 0.433 | −0.448 | 0.054 | 1.31e-5 |
| Fig. 2g | DiffModeler | y = 3.59e-5x + 3.700 | 0.135 | 0.581 | 6.38e-5 |
| Fig. 2g | EMBuild | y = 7.65e-5x + 3.672 | 0.146 | 0.551 | 1.26e-3 |
| Fig. 2g | VESPER(raw) | y = 9.93e-4x + 4.743 | 0.755 | 1.87e-4 | 2.09e-4 |
| Fig. 2g | Phenix | y = 9.54e-4x + 5.348 | 0.790 | 5.67e-5 | 1.79e-4 |
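The columns of Table 1 (regression line, Pearson correlation coefficient, standard error) follow from an ordinary least-squares fit of one variable against another. The snippet below is a minimal sketch of how such quantities can be computed with the Python standard library. It is not the authors' analysis script: the function name `regression_summary` and the sample values are hypothetical, and it assumes that the reported standard error is the standard error of the slope. A p-value can then be obtained from the t statistic with n − 2 degrees of freedom, for example as reported by `scipy.stats.linregress`.

```python
import math
import statistics


def regression_summary(x, y):
    """Least-squares fit of y on x (hypothetical helper, for illustration only).

    Returns the slope, intercept, Pearson r, standard error of the slope,
    and the t statistic of the slope.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, at least 3")

    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)

    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)

    # Residual sum of squares; n - 2 degrees of freedom for two fitted parameters.
    residual_ss = max(syy - slope * sxy, 0.0)
    stderr = math.sqrt(residual_ss / (n - 2) / sxx)
    t_stat = slope / stderr if stderr > 0 else math.copysign(math.inf, slope)

    return {
        "slope": slope,
        "intercept": intercept,
        "pearson_r": r,
        "stderr": stderr,
        "t_stat": t_stat,
    }


if __name__ == "__main__":
    # Made-up example: map resolution (Å) against TM-Score.
    resolution = [3.0, 3.5, 4.0, 4.5, 5.0]
    tm_score = [0.95, 0.93, 0.94, 0.90, 0.91]
    for name, value in regression_summary(resolution, tm_score).items():
        print(f"{name}: {value:.4g}")
```

A slope near zero together with a large p-value, as in the DiffModeler row for Fig. 2d, indicates no significant linear dependence of the score on the plotted variable.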
Acknowledgements
The authors thank Jacob C. Verburgt, Anika Jain, and Charles Christoffer for their help with literature search, discussion, and proofreading. The authors also thank Jessica A. Nash, Sam Ellis, and Jing Chen for their suggestions on optimizing the released software.
This work was partly supported by the National Institutes of Health (R01GM133840) and the National Science Foundation (DMS2151678, DBI2003635, CMMI1825941, MCB2146026, and MCB1925643). XW is a recipient of the MolSSI graduate fellowship.
Footnotes
Software used for benchmark
The software versions used in the benchmark are Phenix-v1.21.1-5286, VESPER-vpub1, EMBuild-v1.0, and DiffModeler-v1.0.
Code Availability
The source code of DiffModeler is available at https://github.com/kiharalab/DiffModeler (ref. 69). DiffModeler can also be run freely on our web server, https://em.kiharalab.org/algorithm/DiffModeler, without installing it on a local machine. We also provide a sequence-based version of DiffModeler on our server, https://em.kiharalab.org/algorithm/DiffModeler(seq), which automatically uses the sequence information to find the most similar single-chain structures in the RCSB and the AlphaFold Database and then models the full protein complex structure. The source code of ComplexModeler (including DiffModeler and CryoREAD) for protein-DNA/RNA complex structure modeling is available at https://github.com/kiharalab/ComplexModeler. It is also available on our web server, https://em.kiharalab.org/algorithm/ComplexModeler. All the code is also deposited at https://doi.org/10.5281/zenodo.13132116.
Competing Interests Statement
The authors declare that they have no competing interests.
Data Availability
The entries of the maps and corresponding structure models utilized in this study are provided in Sup Tables S1, S4 and S6. The experimental EM maps can be downloaded from the EMDB (https://www.emdataresource.org/). The corresponding experimentally determined structures can be downloaded from the RCSB (https://www.rcsb.org/). The structures modeled by DiffModeler, diffused maps, intermediate diffusion results, and the corresponding native structures from the RCSB are available at https://zenodo.org/records/12155184 (ref. 68). The single-chain AF2-predicted structures were collected from the AlphaFold Database (https://alphafold.ebi.ac.uk/).
References
- 1. Bai X-C, McMullan G & Scheres SH How cryo-EM is revolutionizing structural biology. Trends in Biochemical Sciences 40, 49–57 (2015).
- 2. Wüthrich K The way to NMR structures of proteins. Nature structural biology 8, 923–925 (2001).
- 3. Adams PD et al. PHENIX: building new software for automated crystallographic structure determination. Acta Crystallographica Section D: Biological Crystallography 58, 1948–1954 (2002).
- 4. Terashi G & Kihara D De novo main-chain modeling for EM maps using MAINMAST. Nature Communications 9, 1618 (2018).
- 5. Pfab J, Phan NM & Si D DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes. Proceedings of the National Academy of Sciences 118 (2021).
- 6. Terwilliger TC, Adams PD, Afonine PV & Sobolev OV A fully automatic method yielding initial models from high-resolution cryo-electron microscopy maps. Nature methods 15, 905–908 (2018).
- 7. Zhang X, Zhang B, Freddolino PL & Zhang Y CR-I-TASSER: assemble protein structures from cryo-EM density maps using deep convolutional neural networks. Nature methods 19, 195–204 (2022).
- 8. Terashi G, Wang X, Prasad D, Nakamura T & Kihara D DeepMainmast: integrated protocol of protein structure modeling for cryo-EM with deep learning and structure prediction. Nature methods 21, 122–131, doi: 10.1038/s41592-023-02099-0 (2024).
- 9. Wang X, Terashi G & Kihara D CryoREAD: de novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Nature Methods, 1–9 (2023).
- 10. Liebschner D et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallographica Section D: Structural Biology 75, 861–877 (2019).
- 11. Topf M et al. Protein structure fitting and refinement guided by cryo-EM density. Structure 16, 295–307 (2008).
- 12. Rantos V, Karius K & Kosinski J Integrative structural modeling of macromolecular complexes using Assembline. Nature Protocols 17, 152–176 (2022).
- 13. Lasker K, Topf M, Sali A & Wolfson HJ Inferential optimization for simultaneous fitting of multiple components into a CryoEM map of their assembly. Journal of molecular biology 388, 180–194 (2009).
- 14. Pettersen EF et al. UCSF Chimera—a visualization system for exploratory research and analysis. Journal of computational chemistry 25, 1605–1612 (2004).
- 15. Alnabati E, Esquivel-Rodriguez J, Terashi G & Kihara D MarkovFit: Structure Fitting for Protein Complexes in Electron Microscopy Maps Using Markov Random Field. Frontiers in Molecular Biosciences 9, 935411 (2022).
- 16. Han X, Terashi G, Christoffer C, Chen S & Kihara D VESPER: global and local cryo-EM map alignment using local density vectors. Nature Communications 12, 2090, doi: 10.1038/s41467-021-22401-y (2021).
- 17. Berman HM et al. The protein data bank. Nucleic acids research 28, 235–242 (2000).
- 18. Varadi M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic acids research 50, D439–D444 (2022).
- 19. Maddhuri Venkata Subramaniya SR, Terashi G & Kihara D Protein secondary structure detection in intermediate-resolution cryo-EM maps using deep learning. Nature Methods 16, 911–917, doi: 10.1038/s41592-019-0500-1 (2019).
- 20. Wang X et al. Detecting protein and DNA/RNA structures in cryo-EM maps of intermediate resolution using deep learning. Nature communications 12, 1–9 (2021).
- 21. Mostosi P, Schindelin H, Kollmannsberger P & Thorn A Haruspex: A Neural Network for the Automatic Identification of Oligonucleotides and Protein Secondary Structure in Cryo‐Electron Microscopy Maps. Angewandte Chemie International Edition (2020).
- 22. Dhariwal P & Nichol A Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems 34, 8780–8794 (2021).
- 23. Ho J, Jain A & Abbeel P Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, 6840–6851 (2020).
- 24. Song J, Meng C & Ermon S in International Conference on Learning Representations.
- 25. Ramesh A, Dhariwal P, Nichol A, Chu C & Chen M Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022).
- 26. Nichol AQ et al. in International Conference on Machine Learning. 16784–16804 (PMLR; ).
- 27. Wolleb J, Sandkühler R, Bieder F, Valmaggia P & Cattin PC in International Conference on Medical Imaging with Deep Learning. 1336–1348 (PMLR; ).
- 28. Chen T, Li L, Saxena S, Hinton G & Fleet DJ in Proceedings of the IEEE/CVF international conference on computer vision. 909–919.
- 29. Saharia C et al. in ACM SIGGRAPH 2022 Conference Proceedings. 1–10.
- 30. Ruiz N et al. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 22500–22510.
- 31. Corso G, Jing B, Barzilay R & Jaakkola T in International Conference on Learning Representations (ICLR; 2023).
- 32. Watson JL et al. De novo design of protein structure and function with RFdiffusion. Nature, 1–3 (2023).
- 33. Yim J et al. in International Conference on Machine Learning. 40001–40039 (PMLR; ).
- 34. Jumper J et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
- 35. Zhang Y & Skolnick J Scoring function for automated assessment of protein structure template quality. Proteins: Structure, Function, and Bioinformatics 57, 702–710 (2004).
- 36. Sudre CH, Li W, Vercauteren T, Ourselin S & Jorge Cardoso M in Deep learning in medical image analysis and multimodal learning for clinical decision support 240–248 (Springer, 2017).
- 37. Fontana P et al. Structure of cytoplasmic ring of nuclear pore complex by integrative cryo-EM and AlphaFold. Science 376, eabm9326 (2022).
- 38. Dutta D, Nguyen V, Campbell KS, Padrón R & Craig R Cryo-EM structure of the human cardiac myosin filament. Nature, 1–10 (2023).
- 39. Cramer P AlphaFold2 and the future of structural biology. Nature structural & molecular biology 28, 704–705 (2021).
- 40. Carreira-Perpinan MA in 2006. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). 1160–1167 (IEEE; ).
- 41. Evans R et al. Protein complex prediction with AlphaFold-Multimer. BioRxiv, 2021.2010. 2004.463034 (2022).
- 42. Mukherjee S & Zhang Y MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming. Nucleic acids research 37, e83–e83 (2009).
- 43. He J, Lin P, Chen J, Cao H & Huang S-Y Model building of protein complexes from intermediate-resolution cryo-EM maps with deep learning-guided automatic assembly. Nature Communications 13, 4066 (2022).
- 44. Afonine PV et al. New tools for the analysis and validation of cryo-EM maps and atomic models. Acta Crystallographica Section D: Structural Biology 74, 814–840 (2018).
- 45. Zhang S, Li N, Zeng W, Gao N & Yang M Cryo-EM structures of the mammalian endo-lysosomal TRPML1 channel elucidate the combined regulation mechanism. Protein & cell 8, 834–847 (2017).
- 46. Kucukelbir A, Sigworth FJ & Tagare HD Quantifying the local resolution of cryo-EM density maps. Nature methods 11, 63–65 (2014).
- 47. Khan AK et al. A steric “ball-and-chain” mechanism for pH-mediated regulation of gap junction channels. Cell reports 31 (2020).
- 48. Blees A et al. Structure of the human MHC-I peptide-loading complex. Nature 551, 525–528 (2017).
- 49. Majumder P et al. Cryo-EM structures of the archaeal PAN-proteasome reveal an around-the-ring ATPase cycle. Proceedings of the National Academy of Sciences 116, 534–539 (2019).
- 50. Gutiérrez-Fernández J et al. Key role of quinone in the mechanism of respiratory complex I. Nature communications 11, 4135 (2020).
- 51. Ding Z et al. High-resolution cryo-EM structure of the proteasome in complex with ADP-AlFx. Cell research 27, 373–385 (2017).
- 52. Ranson NA et al. ATP-bound states of GroEL captured by cryo-electron microscopy. Cell 107, 869–879 (2001).
- 53. Sah-Teli SK et al. Structural basis for different membrane-binding properties of E. coli anaerobic and human mitochondrial β-oxidation trifunctional enzymes. Structure (2023).
- 54. Sah-Teli SK et al. Complementary substrate specificity and distinct quaternary assembly of the Escherichia coli aerobic and anaerobic β-oxidation trifunctional enzyme complexes. Biochemical Journal 476, 1975–1994 (2019).
- 55. Paul DM et al. In situ cryo-electron tomography reveals filamentous actin within the microtubule lumen. Journal of Cell Biology 219, e201911154 (2020).
- 56. Webb B & Sali A Comparative protein structure modeling using MODELLER. Current protocols in bioinformatics 54, 5.6. 1–5.6. 37 (2016).
- 57. Phillips JC et al. Scalable molecular dynamics with NAMD. Journal of computational chemistry 26, 1781–1802 (2005).
- 58. Jamali K et al. Automated model building and protein identification in cryo-EM maps. Nature, doi: 10.1038/s41586-024-07215-4 (2024).
- 59. Altschul SF, Gish W, Miller W, Myers EW & Lipman DJ Basic local alignment search tool. Journal of molecular biology 215, 403–410 (1990).
- 60. Takada H et al. RqcH and RqcP catalyze processive poly-alanine synthesis in a reconstituted ribosome-associated quality control system. Nucleic Acids Research 49, 8355–8369 (2021).
- 61. Turk M & Baumeister W The promise and the challenges of cryo‐electron tomography. FEBS letters 594, 3243–3261 (2020).
- 62. Chen Z et al. De novo protein identification in mammalian sperm using in situ cryoelectron tomography and AlphaFold2 docking. Cell 186, 5041–5053. e5019 (2023).
Method-only References
- 63. Terashi G, Wang X, Prasad D, Nakamura T & Kihara D DeepMainmast: Integrated Protocol of Protein Structure Modeling for Cryo-EM with Deep Learning and Structure Prediction. Nature Methods (2023).
- 64. Wang X, Terashi G & Kihara D De novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Nat Methods (2023).
- 65. Kingma DP & Ba J in International Conference on Learning Representations (2015).
- 66. Farabella I et al. TEMPy: a Python library for assessment of three-dimensional electron microscopy density fits. Journal of applied crystallography 48, 1314–1323 (2015).
- 67. Cretin G et al. SWORD2: hierarchical analysis of protein 3D structures. Nucleic acids research 50, W732–W738 (2022).
- 68. Wang X, Zhu H, Terashi G, Taluja M & Kihara D Data of “DiffModeler: Large Macromolecular Structure Modeling for Cryo-EM Maps Using Diffusion Model”. 10.5281/zenodo.12155184 (2024).
- 69. Wang X, Zhu H, Terashi G, Taluja M & Kihara D Code of “DiffModeler: Large Macromolecular Structure Modeling for Cryo-EM Maps Using Diffusion Model”. 10.5281/zenodo.13132116 (2024).