Abstract
The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate atomic structures of protein complexes from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps. DEMO-EM2 uses an iterative assembly procedure that intertwines chain- and domain-level matching and fitting of predicted chain models. The method was evaluated on 27 cryo-electron tomography (cryo-ET) maps and 16 single-particle cryo-EM maps. On these maps, DEMO-EM2 models achieved an average TM-score of 0.92, outperforming state-of-the-art methods. These results show that DEMO-EM2 is an efficient method that enables rapid and reliable solution of challenging cryo-EM structure modeling problems.
Keywords: cryo-ET, cryo-EM, protein complex, protein structure prediction, protein domain assembly
INTRODUCTION
Cryo-EM plays a crucial role in determining macromolecular structures [1]. Following recent technological breakthroughs, an increasing number of cryo-EM density maps have been released. Statistics show that over 14 000 new protein density maps have been deposited in the Electron Microscopy Data Bank (EMDB) [2] in the past 4 years. However, density maps are merely three-dimensional images of proteins, whereas atomic structures are needed to gain detailed biological insights. Generally, high-resolution (<3 Å) maps allow high-quality structures to be built with traditional modeling methods developed for X-ray crystallography [3–5]. For maps with resolution >3 Å, which constitute over 80% of the maps in EMDB, a common modeling approach is to fit experimentally determined homologous structures into the density map. However, the accuracy of the final model depends heavily on the quality of the homologous templates. Unfortunately, experimentally determined structures are unavailable for many proteins, which poses a significant challenge for high-precision modeling.
Proteins within cells often consist of multiple domains, each of which functions as a compact and independent folding unit. These domains may exist individually or combine to form the tertiary structure of a protein chain. Interactions between multiple protein chains, in turn, lead to the formation of protein complexes. With the breakthrough of deep learning in protein structure prediction, the accuracy of computationally predicted models has reached unprecedented levels. Notably, AlphaFold2 [6] can construct highly accurate models for a majority of protein chains, with predicted domain models approaching the precision of experimental measurements. Consequently, cryo-EM density maps can be modeled by fitting AlphaFold2-predicted chain or domain models into them [7], and fitting the model to the density map is a crucial step in this process. Numerous fitting methods have been developed, including EMBuild [8], Phenix [9], Situs [10], gmfit [11], EMFIT [12], MOFIT [13] and DEMO-EM [14–16]. Moreover, some deep learning methods, such as CR-I-TASSER [17] and DeepTracer [18], have been developed to extract main-chain atoms from density maps, either to assist fitting or to construct the model directly.
Although current methods have been successfully applied in many cases, constructing high-quality models from medium- to low-resolution density maps remains a significant challenge. First, most methods are designed for rigid-body fitting. However, flexible fitting is required when starting models deviate from the target at different structural levels. For instance, accurate domain-level flexible fitting is essential for chain models with correct domain structures but incorrect domain orientations. Second, many methods are developed for single-chain fitting and require each chain to be fitted to the density map separately to construct a complex model. This approach can fail for homomeric complexes or complexes with structurally similar chains, where different chains may be fitted to the same region of the map. Third, the resolution of a density map may be non-uniform, or density may be missing in certain regions, which often degrades the quality of the final model. Finally, many methods require manual adjustments during fitting, which makes them difficult to use for non-computational biologists.
In this study, we present DEMO-EM2, a method designed for automated construction of protein complex models from cryo-EM density maps. DEMO-EM2 builds upon our previously developed DEMO-EM, which automatically assembles multi-domain protein structures from cryo-EM density maps through a progressive structure refinement process. Here, DEMO-EM2 employs an iterative assembly process for predicted chain models by intertwining chain-level matching, domain-level matching and domain-based fitting based on fast quasi-Newton optimization and differential evolution (DE) [19, 20] algorithms. To evaluate its performance, we built a test set comprising both cryo-EM and cryo-ET maps. The results on this test set demonstrate that the protein complex structures assembled by DEMO-EM2 are superior to those built by state-of-the-art methods in the field.
METHODS
Overview of DEMO-EM2
DEMO-EM2 is an improved version of DEMO-EM that automatically constructs protein complex structures from cryo-EM density maps through an iterative assembly procedure. Figure 1 provides an overview of the DEMO-EM2 workflow. The inputs are the sequence of the protein complex and its corresponding density map. The first step preprocesses the density map to eliminate voxels with density values below a specified cutoff. Meanwhile, the structure of each chain is modeled individually using a single-chain modeling method. All chains are then sorted in descending order of sequence length. Each chain model is matched into the preprocessed density map one by one using a quasi-Newton optimization approach. When the chain matching step does not yield satisfactory results, two domain-based fitting strategies, namely domain matching and local domain optimization, are employed to search for better locations of the chain model. Domain matching matches each domain of the chain to the density map individually. Local domain optimization uses a DE algorithm to adjust the position and orientation of each domain, starting from the best chain pose determined in the chain matching step. Next, the region matched by the current chain is removed from the density map to reduce its influence on the subsequent matching of other chains. After all chains have been matched into the density map, a DE algorithm guided by model-density correlation and a clash score finds the optimal combination of chain poses to build a complex model. Finally, the complex model is further refined through a global domain optimization, which simultaneously optimizes the positions and orientations of all domains.
Figure 1.
Flowchart of DEMO-EM2. This flowchart illustrates the construction process of a protein complex with three chains from Sar1-Sec23-Sec24 (PDBID: 6GNI). Initially, models for each chain are generated using a single-chain modeling program, with simultaneous preprocessing of the density map. Subsequently, each chain model is sequentially and independently matched with the density map using a quasi-Newton search. The chain is divided into multiple domains if the matching result is unsatisfactory, and either domain matching or local domain optimization is employed to explore better positions for the chain model. Next, the region matched by the current chain is removed from the density map. The complex model is constructed by searching the best combination of all chain poses. Finally, the complex model undergoes further refinement through a global domain optimization to generate the final model.
Individual chain modeling and density map processing
The input data processing consists of two parts: the modeling of individual chains and the preprocessing of the density map. The chains are sorted in descending order of sequence length. Each chain is then modeled independently using a protein single-chain modeling method, such as AlphaFold2, RoseTTAFold [21], I-TASSER [22], ESMFold [23] or OmegaFold [24]. In our method, AlphaFold2 was used because of its outstanding performance in protein structure prediction. For the density map, we first normalize the density values to the range [0, 1]. Then, a histogram of the normalized density values is calculated. A density threshold is determined from the second, broader peak of the histogram, which corresponds to protein/nucleic acid density [25]. Finally, voxels with density values below the threshold are removed to generate a new map.
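A minimal plain-Python sketch of the map preprocessing described above follows. It assumes the map is given as a flat list of voxel densities. The text states only that the threshold is based on the second, broader histogram peak. This sketch therefore assumes a specific rule: the threshold is placed at the emptiest bin between the first and second peaks. The function name, the bin count and this valley rule are illustrative assumptions, not DEMO-EM2's actual code.

```python
def preprocess_density(values, n_bins=50):
    """Normalize densities, choose a histogram-based threshold, zero out low voxels.

    Assumption (not from the paper): the threshold is the center of the
    emptiest bin between the first and second histogram peaks.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("map has constant density")
    norm = [(v - lo) / (hi - lo) for v in values]  # rescale to [0, 1]

    counts = [0] * n_bins
    for v in norm:
        counts[min(int(v * n_bins), n_bins - 1)] += 1  # v == 1.0 goes to the last bin

    # Local maxima: strictly above the left neighbor, not below the right one.
    padded = [-1] + counts + [-1]
    peaks = [i for i in range(n_bins)
             if padded[i + 1] > padded[i] and padded[i + 1] >= padded[i + 2]]
    if len(peaks) < 2:
        raise ValueError("histogram has fewer than two peaks")

    first, second = peaks[0], peaks[1]
    valley = min(range(first, second + 1), key=lambda i: counts[i])
    threshold = (valley + 0.5) / n_bins  # center of the valley bin

    # "Remove" low-density voxels by setting their density to zero.
    new_map = [v if v >= threshold else 0.0 for v in norm]
    return threshold, new_map
```

Real maps produce noisy histograms, so an implementation would typically smooth the histogram before peak detection. The sketch omits that step for brevity.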
Fast matching of individual chain model with density map
For each chain model, the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm, a quasi-Newton optimization approach, is used to search for the optimal position and orientation of the chain model. Here, the position is represented by a translation vector, and the orientation is represented by rotation angles. For a pose generated by a pair of translation vector and rotation angles, a density correlation score (DCS) is computed as
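$$\mathrm{DCS}=1-\frac{\sum_{j=1}^{N}\left(\rho_j^{\mathrm{exp}}-\bar{\rho}^{\mathrm{exp}}\right)\left(\rho_j^{\mathrm{decoy}}-\bar{\rho}^{\mathrm{decoy}}\right)}{\sqrt{\sum_{j=1}^{N}\left(\rho_j^{\mathrm{exp}}-\bar{\rho}^{\mathrm{exp}}\right)^{2}\sum_{j=1}^{N}\left(\rho_j^{\mathrm{decoy}}-\bar{\rho}^{\mathrm{decoy}}\right)^{2}}} \qquad (1)$$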
which is used to guide the L-BFGS search. Here, $N$ is the number of voxels remaining after density map preprocessing. $\rho_j^{\mathrm{exp}}$ is the experimental density of the $j$th voxel, and $\rho_j^{\mathrm{decoy}}$ is the density probed from the decoy model. $\bar{\rho}^{\mathrm{decoy}}$ and $\bar{\rho}^{\mathrm{exp}}$ represent the average density value of the map generated from the three-dimensional structure and that of the experimental map, respectively.
$\rho_j^{\mathrm{decoy}}$ can be calculated by
![]() |
(2) |
where $\mathbf{r}_{ik}$ represents the coordinates of the $k$th atom of the $i$th residue in the structure, and $m_k$ represents the mass of the $k$th atom. $R$ represents the resolution of the cryo-EM density map, and $\mathbf{x}_j$ represents the position coordinates of the $j$th voxel.
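The two scores above can be illustrated with a small Python sketch. Two points are assumptions here. First, DCS is taken to be one minus the Pearson correlation between the experimental and model densities over the $N$ voxels, which is consistent with lower DCS meaning a better fit. Second, the model density of Equation (2) is taken to be a mass-weighted sum of Gaussians over all atoms. The sketch uses the width factor $k = (\pi/(2.4+0.8R))^2$, a resolution-dependent convention used in some earlier density-fitting work. The exact kernel in DEMO-EM2 may differ, and both function names are hypothetical.

```python
import math


def simulated_density(atoms, voxels, resolution):
    """Mass-weighted Gaussian model density at each voxel position.

    atoms: list of ((x, y, z), mass); voxels: list of (x, y, z).
    Assumed kernel: exp(-k * d^2) with k = (pi / (2.4 + 0.8 * resolution))**2.
    """
    k = (math.pi / (2.4 + 0.8 * resolution)) ** 2
    rho = []
    for vx in voxels:
        total = 0.0
        for pos, mass in atoms:
            d2 = sum((a - b) ** 2 for a, b in zip(pos, vx))
            total += mass * math.exp(-k * d2)
        rho.append(total)
    return rho


def density_correlation_score(rho_exp, rho_model):
    """Assumed DCS = 1 - Pearson correlation of the two density lists (lower is better)."""
    n = len(rho_exp)
    mean_e = sum(rho_exp) / n
    mean_m = sum(rho_model) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(rho_exp, rho_model))
    var_e = sum((e - mean_e) ** 2 for e in rho_exp)
    var_m = sum((m - mean_m) ** 2 for m in rho_model)
    if var_e == 0 or var_m == 0:
        return 1.0  # correlation is undefined for a flat map; treat as uncorrelated
    return 1.0 - cov / math.sqrt(var_e * var_m)
```

A model scored against its own simulated density gives a DCS of about 0. A translated copy of the model scores higher, which is the signal the L-BFGS search descends on.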
Because L-BFGS converges only locally, its result depends heavily on the initial solution. We therefore run L-BFGS from multiple initial positions. To obtain these initial positions (translation vectors), the center point of all voxels in the preprocessed density map is used as the starting point for exploring other positions. Each initial position must satisfy two conditions. First, the distance between any two adjacent positions should be greater than $\max(0.85R_g, d_{\min})$. Here, $R_g$ represents the radius of gyration of the model, and the factor 0.85 was determined in our previous DEMO-EM study. $d_{\min}$ (in Å) is the minimum distance between two initial positions. Second, edge positions are eliminated using an edge distance $d_{\mathrm{edge}}$. This distance is determined from the maximum distance between each initial position and the center point of the density map:
![]() |
(3) |
where $R_g^{\mathrm{map}}$ is the radius of gyration of the density map and is calculated as
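$$R_g^{\mathrm{map}}=\sqrt{\frac{1}{N}\sum_{j=1}^{N}\left\lVert\mathbf{x}_j-\mathbf{x}_c\right\rVert^{2}} \qquad (4)$$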
where $\mathbf{x}_c$ is the center point of the $N$ voxels (Supplementary Figure S1). To obtain the initial orientations (rotation angles) for each initial position, we enumerate all combinations of the Euler angles $\varphi$, $\theta$ and $\psi$ with a step size of $\Delta$. $\varphi$ and $\psi$ range over [0°, 360°], and $\theta$ ranges over [0°, 180°]. $\Delta$ is set to 30° to balance accuracy and efficiency, following the recommendation in Situs. To reduce computational cost, the orientation with the smallest DCS is selected as the initial orientation for each initial position. An L-BFGS search is then conducted from each initial position to obtain the optimal position and orientation. After the L-BFGS searches, the generated structures are sorted by DCS in ascending order, and the five structures with the smallest scores are retained.
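The starting-point generation above can be sketched as follows. The sketch makes several assumptions. The spacing rule is taken as $\max(0.85R_g, d_{\min})$. Candidate voxels are visited greedily in order of distance from the map center. The edge distance $d_{\mathrm{edge}}$ is passed in as a parameter rather than computed from Equation (3). The Euler grid uses half-open ranges for $\varphi$ and $\psi$ (360° is the same rotation as 0°) and a closed range for $\theta$. All function names are hypothetical.

```python
import math
from itertools import product


def euler_grid(step_deg=30):
    """All (phi, theta, psi) triples: phi, psi in [0, 360), theta in [0, 180]."""
    phis = range(0, 360, step_deg)        # 360 deg duplicates 0 deg
    thetas = range(0, 180 + 1, step_deg)  # closed interval [0, 180]
    return list(product(phis, thetas, phis))


def center_of(points):
    """Unweighted centroid of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def select_initial_positions(voxels, model_rg, d_min, d_edge):
    """Greedy choice of starting translations (hypothetical helper).

    A voxel is kept if it lies within d_edge of the map center and is farther
    than max(0.85 * model_rg, d_min) from every position already kept.
    """
    center = center_of(voxels)
    spacing = max(0.85 * model_rg, d_min)
    chosen = [center]  # the map center is always the first starting point
    for v in sorted(voxels, key=lambda p: math.dist(p, center)):
        if math.dist(v, center) > d_edge:
            break  # all remaining voxels are even farther from the center
        if all(math.dist(v, c) > spacing for c in chosen):
            chosen.append(v)
    return chosen
```

With Δ = 30°, the grid contains 12 × 7 × 12 = 1008 angle triples. Some are redundant: for z-x-z style Euler angles at θ = 0°, only φ + ψ matters. An implementation could prune these triples before scoring each one by DCS.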
Domain matching and local domain optimization for individual chain
To further improve the accuracy of individual chain matching, a domain matching strategy is employed to match each domain of a chain to the density map individually. Specifically, we first segment the chain into domains using our previously developed FUpred [26]. FUpred maximizes the number of intra-domain contacts and minimizes the number of inter-domain contacts on the contact map predicted by ResPRE [27]. Each domain is then matched with the density map in descending order of sequence length, using the same matching method as in the fast chain matching.
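The contact-based idea behind domain segmentation, and the long-to-short matching order, can be illustrated with a toy example. The sketch below is not FUpred's actual scoring function. It considers a single boundary only and assumes a simple score: the number of inter-domain contacts divided by the smaller domain's intra-domain contact count, minimized subject to a minimum domain length. Contacts are (i, j) residue index pairs, and all names are hypothetical.

```python
def split_counts(contacts, s):
    """(intra_a, intra_b, inter) contact counts for a boundary before residue s."""
    intra_a = intra_b = inter = 0
    for i, j in contacts:
        if i < s and j < s:
            intra_a += 1
        elif i >= s and j >= s:
            intra_b += 1
        else:
            inter += 1
    return intra_a, intra_b, inter


def best_two_domain_split(contacts, length, min_len=30):
    """Toy split: minimize inter / min(intra_a, intra_b) over allowed boundaries."""
    best_s, best_ratio = None, float("inf")
    for s in range(min_len, length - min_len + 1):
        intra_a, intra_b, inter = split_counts(contacts, s)
        smaller = min(intra_a, intra_b)
        if smaller == 0:
            continue  # a domain without internal contacts is not compact
        ratio = inter / smaller
        if ratio < best_ratio:
            best_s, best_ratio = s, ratio
    return best_s


def matching_order(domains):
    """Order (name, n_residues) domains from longest to shortest for matching."""
    return [name for name, _ in sorted(domains, key=lambda d: d[1], reverse=True)]
```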
For the local domain optimization, we apply a DE algorithm to optimize the position and orientation of each domain in the chain. The five resulting models with the lowest DCS are retained for subsequent complex structure construction. The optimization is guided by a more detailed energy function comprising three terms:
$$E = w_1\,\mathrm{DCS} + w_2\,E_{\mathrm{con}} + w_3\,E_{\mathrm{clash}} \qquad (5)$$
where the first term is the density correlation score defined in Equation (1). The second term measures the connectivity at the boundaries between domains and is designed to prevent chain breaks between domain segments. The domain boundary connectivity score is calculated as
![]() |
(6) |
where $d_0 = 3.8$ Å is the standard distance between adjacent Cα atoms. The parameter $d_{mn}$ represents the distance between the Cα atoms of the C-terminal residue of the $m$th domain and the N-terminal residue of the $n$th domain. A discontinuous domain is split into two segments by the insertion of a continuous domain. In that case, $d_{mn}$ is the average length of the two linkers that connect the continuous domain to the segments of the discontinuous domain (Supplementary Figure S2). The third term is a constraint on inter-domain steric clashes, defined as
![]() |
(7) |
where $d_{ij}^{mn}$ is the distance between the $i$th Cα atom in the $m$th domain and the $j$th Cα atom in the $n$th domain of the structure. $L_m$ and $L_n$ represent the numbers of residues in the $m$th and $n$th domains, respectively. $d_{\mathrm{clash}} = 3.75$ Å is the minimum allowed distance between two Cα atoms.
The optimal weight factors in E were determined by maximizing the correlation between the energy and the root mean square deviation (RMSD) of decoy models to the native structure. This optimization used a training set of 254 proteins from He et al. [8], which is non-redundant to the benchmark set. The optimal values are
,
and
.
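A sketch of the three-term energy in Python follows, under stated assumptions. The text does not specify the exact functional forms of the two geometric terms, so the sketch assumes the following. The connectivity term sums $|d_{mn} - d_0|$ over sequence-adjacent domains. The clash term counts inter-domain Cα pairs closer than $d_{\mathrm{clash}}$, normalized by $L_m L_n$ for each domain pair. The weights (called w1, w2, w3 here) are placeholders, not the values fitted on the training set. The sketch also does not handle discontinuous domains (the two-linker average).

```python
import math

D0 = 3.8        # standard distance between adjacent C-alpha atoms (Å)
D_CLASH = 3.75  # minimum allowed distance between two C-alpha atoms (Å)


def connectivity_term(domains):
    """Assumed form: sum of |d_mn - d0| over sequence-adjacent domains.

    domains: list of C-alpha coordinate lists, ordered along the sequence.
    """
    total = 0.0
    for dom_m, dom_n in zip(domains, domains[1:]):
        d_mn = math.dist(dom_m[-1], dom_n[0])  # C-term of domain m to N-term of domain n
        total += abs(d_mn - D0)
    return total


def clash_term(domains):
    """Assumed form: clashing C-alpha pairs per domain pair, divided by L_m * L_n."""
    total = 0.0
    for m in range(len(domains)):
        for n in range(m + 1, len(domains)):
            clashes = sum(1 for a in domains[m] for b in domains[n]
                          if math.dist(a, b) < D_CLASH)
            total += clashes / (len(domains[m]) * len(domains[n]))
    return total


def energy(dcs, domains, w=(1.0, 1.0, 1.0)):
    """E = w1*DCS + w2*E_con + w3*E_clash with placeholder weights."""
    return w[0] * dcs + w[1] * connectivity_term(domains) + w[2] * clash_term(domains)
```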
Density map region removing for matched model
To prevent a chain from being erroneously aligned to regions of the density map already matched by other chains, we remove the map regions that have been aligned with chain models. We first calculate the distances between all Cα atoms in the matched models and all voxels in the density map. Then, the density value of a voxel is set to 0 if at least one Cα atom lies within 4 Å of it. The modified density map serves as the input for the subsequent steps of the chain matching process.
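The region-removal step amounts to masking voxels by distance. The following brute-force sketch is written in plain Python. The function name is hypothetical, and the 4 Å cutoff comes from the text. An implementation on real maps would likely use a spatial grid or k-d tree rather than checking every voxel against every atom.

```python
import math


def remove_matched_region(voxels, densities, ca_coords, cutoff=4.0):
    """Return a copy of densities with every voxel within `cutoff` Å of a matched C-alpha set to 0."""
    out = list(densities)
    for j, v in enumerate(voxels):
        if any(math.dist(v, ca) < cutoff for ca in ca_coords):
            out[j] = 0.0
    return out
```

For example, with voxels at x = 0, 1, ..., 9 Å on a line and a single matched Cα at the origin, the voxels at 0 to 3 Å are zeroed. The voxel at exactly 4 Å is kept because the test is strictly "less than 4 Å".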
Complex structure assembling and global domain optimization
During the chain matching and domain fitting processes, regions of the density map that have already been matched are excluded from subsequent matching iterations. Imprecise matching can cause inaccurate region removal. To limit this effect, only chain models with DCS below a specified cutoff are accepted in the first iteration. Models above this cutoff are iteratively re-matched to the density map while the DCS cutoff is gradually increased, up to a threshold of 0.5. Models whose DCS still exceeds 0.5 are fitted directly into the density map in ascending order of DCS, without applying a DCS cutoff. The initial DCS cutoff is 0.4 and the step size is 0.02; both values were determined on the training set to balance efficiency and accuracy. For each chain, the top five poses ranked by DCS are retained. The complex model is then generated by searching for the optimal combination of the selected poses of all chains. This search uses the DE algorithm, guided by an energy function composed of DCS and atomic clash terms.
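The rising-cutoff schedule can be sketched as follows. This is a simplified illustration with a hypothetical helper. Each chain's DCS is held fixed, whereas in DEMO-EM2 chains are re-matched against the updated map in every round. The sketch therefore shows only the order in which chains would be accepted: cutoffs of 0.40, 0.42, ..., 0.50, followed by a final pass in ascending DCS for chains that never pass.

```python
def schedule_chains(chain_dcs, start=0.4, step=0.02, final=0.5):
    """Group chains by the DCS cutoff at which they are first accepted.

    chain_dcs: dict mapping chain id -> best DCS (held fixed in this sketch).
    Returns a list of (cutoff, [chain ids]); cutoff None marks the final pass.
    """
    remaining = dict(chain_dcs)
    rounds = []
    n_steps = round((final - start) / step)
    for k in range(n_steps + 1):
        cutoff = start + k * step  # 0.40, 0.42, ..., 0.50
        accepted = sorted((c for c, s in remaining.items() if s < cutoff),
                          key=lambda c: remaining[c])
        for c in accepted:
            del remaining[c]
        if accepted:
            rounds.append((round(cutoff, 2), accepted))
    # Chains still at or above the last cutoff are fitted in ascending order of DCS.
    if remaining:
        rounds.append((None, sorted(remaining, key=remaining.get)))
    return rounds
```

For chains with DCS values of 0.35, 0.43, 0.47 and 0.62, the chains are accepted at cutoffs 0.40, 0.44 and 0.48. The last chain is handled in the final unrestricted pass.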
To further improve the accuracy of the complex model, we apply a global domain optimization that optimizes all domains of the model concurrently. Specifically, the DE algorithm explores the best positions and orientations of all domains simultaneously. It considers both the density map restraint and the inter-domain potentials of Equation (5), with all domains kept rigid. The model with the lowest DCS is selected as the final complex model.
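DE is used in several DEMO-EM2 steps: local domain optimization, pose combination and global domain optimization. The sketch below is the classic DE/rand/1/bin scheme, which minimizes an arbitrary function over a box of bounds. It is a generic illustration, not DEMO-EM2's implementation. The population size, mutation factor F, crossover rate CR and generation count are common textbook defaults chosen here as assumptions. For a rigid-body domain search, each candidate vector could hold three translations and three Euler angles per domain, with f returning an energy such as E.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9, generations=200, seed=0):
    """Minimize f over a box using DE/rand/1/bin; returns (best_vector, best_score).

    bounds: list of (low, high) pairs, one per dimension.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target vector i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            s = f(trial)
            if s <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

In DEMO-EM2, the objective would be the DCS-based energy rather than a test function. The greedy one-to-one replacement never lets a population member get worse, which suits expensive map-scoring objectives.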
RESULTS
Benchmark set
To evaluate DEMO-EM2, we collected 43 density maps (Supplementary Tables S1 and S2) from EMDB (http://www.ebi.ac.uk/emdb/) based on the criteria used in EMBuild (Supplementary Section 1). This dataset contains 27 maps obtained by subtomogram averaging and 16 maps obtained by single-particle reconstruction. These density maps are sharpened and non-redundant. The resolutions of all maps fall within the range of 3–10 Å, and the maps are approximately evenly distributed across resolution ranges (Supplementary Figure S3). The corresponding structures, which contain 1–24 chains, were retrieved from the Protein Data Bank (PDB) at https://www.rcsb.org. For all benchmark proteins, the density within 4.0 Å of the structure was segmented from the density map. The structural model of each chain was generated using AlphaFold2. Templates released on or after the release date of the query were excluded by setting the template cutoff date to before that release date. The resulting chain models have TM-scores ranging from 0.40 to 0.99 (average 0.86) and RMSDs ranging from 0.5 to 42.5 Å (average 4.1 Å) (Supplementary Figure S4).
Comparison with state-of-the-art methods on cryo-ET maps
We first assessed DEMO-EM2 on the 27 cryo-ET maps and compared it with four state-of-the-art methods, namely EMBuild, phenix.dock_in_map, Situs and gmfit (Supplementary Sections 2–5). All methods were run with the settings of their original implementations. Model accuracy was assessed with two metrics, TM-score and RMSD, calculated using USalign [28]. Figure 2A shows the TM-scores of complex models constructed by different methods relative to the deposited PDB structures. According to their original publications, most of the deposited PDB structures were built by fitting homologous models into the cryo-EM density maps with tools such as UCSF Chimera [29]. The structures were then manually adjusted with tools such as Coot [4] and refined with methods such as Phenix [30] or RosettaCM [31]. The detailed results for each protein are listed in Supplementary Tables S3–S7. As Figure 2B shows, DEMO-EM2 clearly outperforms the other methods. Specifically, the average TM-score of complex models constructed by DEMO-EM2 reaches 0.94. This is 10.6%, 80.8%, 176.5% and 248.1% higher than that of EMBuild (0.85), phenix.dock_in_map (0.52), Situs (0.34) and gmfit (0.27), respectively. The corresponding Student's t-test P-values are 2.32 × 10−2, 6.11 × 10−8, 1.31 × 10−13 and 8.26 × 10−16, respectively, indicating that the differences are statistically significant. Moreover, DEMO-EM2 models obtain a TM-score > 0.9 on 88.9% of the dataset, higher than EMBuild (77.8%), phenix.dock_in_map (20.8%), Situs (3.8%) and gmfit (0.0%). As shown in Figure 2C, the average RMSD of models constructed by DEMO-EM2 is as low as 5.8 Å. This is a good result for complex model construction, although RMSD as a metric has inherent limitations [32, 33].
Specifically, the average RMSD of DEMO-EM2 is significantly lower than those of EMBuild (9.6 Å), phenix.dock_in_map (29.7 Å), Situs (40.2 Å) and gmfit (47.6 Å), with Student's t-test P-values of 4.98 × 10−2, 2.76 × 10−5, 2.63 × 10−9 and 2.77 × 10−9, respectively. Figure 2D shows that most models generated by DEMO-EM2 have RMSD values below 5 Å, far lower than those of the other methods.
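The relative improvements quoted above follow from the ratio of mean TM-scores minus one. The short check below reproduces them from the rounded means; the helper name is illustrative.

```python
def percent_higher(ours, theirs):
    """Relative improvement of `ours` over `theirs`, in percent."""
    return (ours / theirs - 1.0) * 100.0


# Mean TM-scores on the 27 cryo-ET maps, as reported in the text.
baselines = {"EMBuild": 0.85, "phenix.dock_in_map": 0.52, "Situs": 0.34, "gmfit": 0.27}
for name, tm in baselines.items():
    print(f"{name}: {percent_higher(0.94, tm):.1f}%")
# EMBuild: 10.6%
# phenix.dock_in_map: 80.8%
# Situs: 176.5%
# gmfit: 248.1%
```

Because the inputs are rounded means, recomputing from the unrounded per-case scores could change the last digit.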
Figure 2.
Results for proteins using 27 cryo-ET maps. (A) The distribution of TM-scores for models constructed by different methods. The vertical lines represent the 10th to 90th percentiles, the shape of the half-violin plot illustrates the distribution and each diamond represents the TM-score of an individual test case. (B) Comparison of the TM-scores between DEMO-EM2 and other methods on each case. (C) Boxplot of the RMSDs of models obtained by different methods. The box represents the lower to upper quartiles, the horizontal line and circle in the box represent the median and mean, respectively, and the whiskers indicate the 5th and 95th percentiles. (D) Comparison of the RMSDs between DEMO-EM2 and other methods on each case. (E) Comparison between the TM-score of the individual chain models by AlphaFold2 and that of the final models constructed by DEMO-EM2. (F) iFSC of models constructed by different methods. Error bars indicate ±1 standard deviation. (G) and (H) Comparison of constructed models with deposited structures (half-transparent gray) for PDBID: 5L93 and PDBID: 6HWI, respectively.
DEMO-EM2 allows inter-domain orientations to change during assembly. Part of the increase in the TM-score of the complex model may therefore result from improvements in the individual chain models. To investigate this, we compared the TM-score of each initial chain model generated by AlphaFold2 with that of the corresponding chain in the final complex model constructed by DEMO-EM2. As shown in Figure 2E, 96.8% of individual chains have improved or unchanged TM-scores after DEMO-EM2 assembly, compared with 93.2% for EMBuild (Supplementary Figure S5). On average, DEMO-EM2 improved the TM-score of chain models from 0.86 to 0.91 (P-value = 1.64 × 10−11, Student's t-test), indicating that the chain-level improvement is statistically significant. In contrast, EMBuild increased the average TM-score of chain models only to 0.87, and the average of DEMO-EM2 is 4.6% higher. For phenix.dock_in_map and gmfit, the accuracy of chain models is unchanged because individual chain models are kept rigid in these methods. These results indicate that the domain-based fitting of DEMO-EM2 can substantially improve the inter-domain orientations of individual chains, leading to better individual chain models and better final complex models.
Since the deposited PDB structure is usually unknown in real applications, it is important to evaluate the constructed model against the density map itself. We therefore also compared the methods in terms of iFSC (integrated Fourier shell correlation) [34], a metric quantifying the correlation between the model and the density map. iFSC ranges from 0 to 1, with higher values indicating a better fit of the model to the density map. As illustrated in Figure 2F, models constructed by DEMO-EM2 achieved an average iFSC of 0.60, higher than those of EMBuild (0.55), phenix.dock_in_map (0.24), Situs (0.24) and gmfit (0.01).
Figure 2G and H show two representative cases in which DEMO-EM2 constructed high-quality complex models. Figure 2G (left) compares the deposited PDB structure (PDBID: 5L93, three homologous chains) with the DEMO-EM2 model for EMD-4015, a 3.9 Å cryo-ET map of human immunodeficiency virus type 1. DEMO-EM2 constructed a highly accurate model with a TM-score of 0.98 and an RMSD of 1.5 Å to the deposited structure. This model is substantially better than the models (Supplementary Figure S6A–D) constructed by EMBuild (TM-score = 0.30, RMSD = 21.7 Å), phenix.dock_in_map (TM-score = 0.24, RMSD = 36.7 Å), Situs (TM-score = 0.27, RMSD = 34.2 Å) and gmfit (TM-score = 0.21, RMSD = 24.2 Å). The main reason for the better result of DEMO-EM2 is likely its domain-based fitting, which improves the individual chain models. The middle panel of Figure 2G compares the initial AlphaFold2 chain model with the deposited chain structure, and the right panel compares the final DEMO-EM2 chain model with the deposited chain structure. Starting from a chain with incorrect domain orientations (Figure 2G, middle), DEMO-EM2 corrected the orientations and increased the TM-score of the chain from 0.56 to 0.94. This corresponds to a decrease in RMSD from 13.7 to 1.5 Å (Figure 2G, right).
Figure 2H (left) shows another example, EMD-0290, a cryo-ET map with a resolution of only 7.2 Å. The corresponding PDB structure (PDBID: 6HWI) contains four homologous chains and was derived from the immature M-PMV capsid hexamer structure in intact virus particles. Despite the lower resolution, the DEMO-EM2 model achieved a TM-score of 0.93 and an RMSD of 2.4 Å, surpassing the models built by other methods by a considerable margin (Supplementary Figure S6E–H). Specifically, the TM-score of the DEMO-EM2 model is 121.4%, 210.0%, 210.0% and 287.5% higher than those of EMBuild (0.42), phenix.dock_in_map (0.30), Situs (0.30) and gmfit (0.24), respectively. Its RMSD is also substantially lower than those of EMBuild (21.9 Å), phenix.dock_in_map (17.3 Å), Situs (23.3 Å) and gmfit (25.8 Å). In particular, the three initial chain models are identical and have a TM-score of only 0.57 because of incorrect domain orientations (Figure 2H, middle). After DEMO-EM2 assembly, their TM-scores increased to 0.87, 0.86 and 0.87, respectively. Figure 2H (right) shows the structural comparison between the final DEMO-EM2 chain model and the deposited chain structure for one of the three chains.
Evaluation on single-particle maps of cryo-EM data
DEMO-EM2 was further evaluated on the 16 cryo-EM maps obtained by single-particle reconstruction; the detailed results for each case are listed in Supplementary Tables S8–S12. Figure 3A and B show that DEMO-EM2 still outperforms the other methods. In detail, models built by DEMO-EM2 have an average TM-score of 0.88. This is 6.0%, 22.2%, 33.3% and 87.2% higher than those of EMBuild (0.83), phenix.dock_in_map (0.72), Situs (0.66) and gmfit (0.47), respectively. DEMO-EM2 achieves a TM-score > 0.9 in 75.0% of the cases, compared with EMBuild (53.3%), phenix.dock_in_map (31.3%), Situs (25.0%) and gmfit (6.3%). In terms of RMSD, DEMO-EM2 also outperforms the other methods in most cases (Figure 3C and D). The average RMSD of models built by DEMO-EM2 is 9.7 Å, lower than those of EMBuild (10.8 Å), phenix.dock_in_map (19.8 Å), Situs (24.7 Å) and gmfit (36.0 Å). Consistent with the cryo-ET results, Supplementary Figure S7A shows that DEMO-EM2 assembly improves the individual chain models. We also calculated the iFSC of models built by the different methods (Supplementary Figure S7B). On average, the iFSC of DEMO-EM2 is 0.47, higher than those of EMBuild (0.46), phenix.dock_in_map (0.25), Situs (0.31) and gmfit (0.01).
Figure 3.
Results for proteins using 16 cryo-EM maps. (A) Half-boxplot of the TM-scores for models built by different methods. The box represents the lower to upper quartiles, the horizontal line and circle in the box represent the median and mean, respectively, and the whiskers indicate the 5th and 95th percentiles. (B) Comparison of the TM-scores between DEMO-EM2 and other methods. (C) Distribution of RMSDs for models constructed by different methods. Vertical lines represent the range within 1.5× the interquartile range, white squares indicate the means, the length of black boxes represents the 25th to 75th percentiles and the shape of the violin plot illustrates the distribution. (D) Comparison of the RMSDs between DEMO-EM2 and other methods on each case. (E) Comparison of constructed models with deposited structures (half-transparent gray) for PDBID: 5FWP.
Figure 3E presents a representative example of a highly accurate complex model constructed by DEMO-EM2 from a cryo-EM map. The example is EMD-3340, a 7.2 Å cryo-EM map of the Hsp90-Cdc37-Cdk4 complex (PDBID: 5FWP). Figure 3E (left) compares the DEMO-EM2 model with the deposited PDB structure. The DEMO-EM2 model reproduces the conformation of the four-chain complex in the deposited structure and is better than the models of the other methods (Supplementary Figure S8A–D). Specifically, the DEMO-EM2 model achieves a TM-score of 0.97 and an RMSD of 3.1 Å. This is substantially better than the models constructed by EMBuild (TM-score = 0.89, RMSD = 16.9 Å), phenix.dock_in_map (TM-score = 0.82, RMSD = 24.9 Å), Situs (TM-score = 0.73, RMSD = 30.7 Å) and gmfit (TM-score = 0.40, RMSD = 54.0 Å). Consistent with the cryo-ET results, the superior performance of DEMO-EM2 can likely be attributed mainly to domain-based fitting. The starting chain models of 5FWP include a chain with incorrect domain orientations (Figure 3E, middle). DEMO-EM2 effectively corrected the domain orientations of this chain, improving its TM-score from 0.48 to 0.82 and reducing its RMSD from 17.0 to 4.0 Å (Figure 3E, right).
Supplementary Figure S9 presents two representative examples illustrating the side-chain quality of DEMO-EM2 models. Supplementary Figure S9A depicts a high-precision complex structure built by DEMO-EM2 for Mycobacterium smegmatis MmpL3 (PDBID: 8QKK). This model has a rotamer outlier rate [35] as low as 0.78%, a MolProbity score [36] of 0.98 and a Q-score [9] of 0.49. The deposited model is highly similar to the DEMO-EM2 model (TM-score = 0.97). Even so, the DEMO-EM2 model has better local geometry, with the MolProbity score improved from 1.63 to 0.98. The second example (Supplementary Figure S9B) is a druggable VP1–VP3 interprotomer pocket in the capsid of enteroviruses (PDBID: 6GZV). The DEMO-EM2 model has a rotamer outlier rate of 0.28%, a MolProbity score of 2.30 and a Q-score of 0.5. Although some of these metrics are slightly worse than those of the deposited structure, the DEMO-EM2 model has a lower rotamer outlier rate (0.28% versus 1.28%), indicating better side-chain modeling.
For all 43 density maps, we also examined the relationship between map resolution and the TM-scores of the complex models built by the different methods (Supplementary Figure S10). Three cases with more than one low-quality initial AlphaFold2 chain model were excluded from this analysis. DEMO-EM2 not only outperforms the other methods but also performs consistently across map resolution ranges, maintaining a TM-score above 0.9. In contrast, phenix.dock_in_map is most severely affected by resolution, with a sharp decline in TM-score as the resolution decreases within the range of 5–8 Å. Similarly, the performance of Situs drops within the range of 4–7 Å as the resolution decreases. The performance of EMBuild fluctuates considerably within the resolution ranges of 3–5 and 6–8 Å. gmfit shows varying TM-scores at different resolutions, all below 0.5. These results demonstrate the robustness of DEMO-EM2, whose performance is not strongly dependent on the resolution of the density map.
Effects of functional modules in DEMO-EM2
We designed ablation experiments on all 43 maps to analyze the contribution of each functional module of DEMO-EM2: fast chain matching (step1), domain-based fitting (step2) and global domain optimization (step3). As shown in Figure 4A, running DEMO-EM2 with fast chain matching alone yielded an average TM-score of 0.82. Adding domain-based fitting raised the average TM-score to 0.91, an increase of 11.0%, and further adding global domain optimization raised it to 0.92. The average RMSD decreased from 11.2 to 7.5 Å and finally to 7.3 Å as domain-based fitting and global domain optimization were added in turn (Figure 4B). Figure 4C shows the corresponding iFSC values, which increased progressively from 0.45 to 0.51 and finally to 0.55.
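TM-score and RMSD, the two structural metrics tracked in this ablation, can be illustrated for a pair of already-superposed models with one-to-one residue correspondence. The sketch below uses the TM-score distance scale d0 of Zhang and Skolnick [33]; unlike the actual TM-score program (or US-align [28] for complexes), it does not search for the optimal superposition or chain mapping, so it only scores the superposition it is given. Function names are illustrative.

```python
import math


def tm_d0(length):
    """Distance scale d0 (Å) from Zhang & Skolnick (2004), clamped to 0.5 Å."""
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, target_length):
    """TM-score for ONE fixed superposition (no optimization over rotations).

    distances: per-residue distances (Å) between aligned residue pairs.
    target_length: number of residues in the reference (deposited) structure.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 100-residue reference: 90 residues close, 10 badly misplaced.
dists = [1.0] * 90 + [12.0] * 10
print(round(tm_score_fixed(dists, 100), 3), round(rmsd(dists), 2))
```

The example shows why both metrics are reported: ten misplaced residues inflate the RMSD strongly, while the TM-score, which saturates for large deviations, stays dominated by the well-placed majority.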
Figure 4.
The results of ablation experiments on the 43 density maps. step1 denotes fast chain matching; step2 denotes domain-based fitting, comprising domain matching (step2–1) and local domain optimization (step2–2); and step3 denotes global domain optimization. (A) Distribution of TM-scores for models constructed by different schemes. Vertical lines represent the 10th to 90th percentiles, circles indicate the means, the shape of each half-violin plot illustrates the distribution and diamonds represent the TM-scores of individual test cases. (B) Comparison of the average RMSD of models constructed by different schemes. (C) Distribution of iFSC for models constructed by different schemes. Vertical lines represent the 10th to 90th percentiles, white squares indicate the means, black boxes span the 25th to 75th percentiles and the shape of each violin plot illustrates the distribution. (D–G) Comparisons of constructed models with deposited structures (semi-transparent gray) for PDBID: 8C9M, PDBID: 6V9I, PDBID: 7NO3 and PDBID: 7PTT, respectively.
These results demonstrate the effectiveness of domain-based fitting and global domain optimization, with domain-based fitting making the main contribution. Domain-based fitting comprises domain matching and local domain optimization: the former is performed only when all individual chains are structurally different, whereas the latter is applied only when a target contains structurally similar chains. In our experiments, 62.5% of the cases using domain matching were improved, and 25% were unchanged because their initial chain models had both incorrect domain models and incorrect domain orientations. Of the cases using local domain optimization, 86.7% were improved, and the rest were degraded because they contained chains with incorrect domain models or domain orientations. Global domain optimization improved 62.8% of the cases, left 32.6% unchanged and slightly degraded the remainder. Nevertheless, the agreement between the model and the density map clearly improved, with iFSC increasing by 7.8% after global domain optimization.
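iFSC measures model-to-map agreement in Fourier space. The exact definition used in this work may differ from the sketch below, which assumes that iFSC can be approximated as the mean Fourier shell correlation (FSC) over shells up to a chosen spatial frequency, and that the model has already been converted to a simulated density on the same cubic grid as the experimental map. The function names and the `max_shell` cutoff are hypothetical.

```python
import numpy as np


def fourier_shell_correlation(map_a, map_b):
    """FSC between two 3D maps on the same cubic grid, one value per shell."""
    if map_a.shape != map_b.shape or len(set(map_a.shape)) != 1:
        raise ValueError("maps must share the same cubic shape")
    n = map_a.shape[0]
    fa = np.fft.fftn(map_a)
    fb = np.fft.fftn(map_b)
    # Integer radius (in Fourier pixels) of every coefficient from the origin.
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    fsc = []
    for r in range(n // 2):
        mask = shell == r
        num = np.real(np.sum(fa[mask] * np.conj(fb[mask])))
        den = np.sqrt(np.sum(np.abs(fa[mask]) ** 2) * np.sum(np.abs(fb[mask]) ** 2))
        fsc.append(num / den if den > 0 else 0.0)
    return np.array(fsc)


def integrated_fsc(map_a, map_b, max_shell=None):
    """Assumed iFSC: mean FSC over shells 1..max_shell (the DC shell is skipped)."""
    fsc = fourier_shell_correlation(map_a, map_b)
    stop = len(fsc) if max_shell is None else max_shell + 1
    return float(np.mean(fsc[1:stop]))


# Hypothetical maps: a random "experimental" map and a noisier "model" map.
rng = np.random.default_rng(0)
experimental = rng.normal(size=(32, 32, 32))
model = experimental + 0.5 * rng.normal(size=(32, 32, 32))
print(integrated_fsc(experimental, model))
```

In practice `max_shell` would be set from the map resolution, so that shells beyond the resolution limit, which carry mostly noise, do not dilute the average.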
Figure 4D shows a representative case with notable improvement after adding local domain optimization: EMD-16511, a 3.2 Å cryo-ET map of the HERV-K Gag immature lattice with six chains (PDBID: 8C9M). The model built by DEMO-EM2 using only fast chain matching (Figure 4D, left) has a TM-score of only 0.32 and an RMSD as high as 29.2 Å. When local domain optimization was added, the complex model improved substantially, reaching a TM-score of 0.97 and an RMSD of 3.6 Å. Figure 4E shows a typical example, Cullin5 bound to RING-box protein 2 (PDBID: 6V9I, two chains), that improved markedly with domain matching on a 5.2 Å cryo-EM map (EMD-21121). The left and right panels of Figure 4E compare the deposited structure with the model built by fast chain matching alone and with the model built by combining fast chain matching with domain matching, respectively. Domain matching improved the TM-score from 0.83 to 0.94 and reduced the RMSD from 5.0 to 2.9 Å. Figure 4F illustrates the effect of global domain optimization on a pentamer derived from polyhedral VLPs (PDBID: 7NO3) with a 5.8 Å cryo-ET map (EMD-12488). The model built by DEMO-EM2 without global domain optimization (Figure 4F, left) has a TM-score of 0.96; after global domain optimization (Figure 4F, right), the TM-score increased to 0.98 and the RMSD decreased from 2.5 to 1.5 Å. These cases further demonstrate the effectiveness of domain-based fitting and global domain optimization.
Figure 4G illustrates the incremental improvement obtained by sequentially adding domain-based fitting and global domain optimization to fast chain matching. The target is the in situ structure of a hexameric S-layer protein (PDBID: 7PTT, six chains) with an 8.0 Å cryo-ET map (EMD-13637). The left panel of Figure 4G compares the deposited structure with the model built by DEMO-EM2 using only fast chain matching; the middle and right panels compare it with the models obtained by successively adding domain-based fitting and global domain optimization. The TM-score increases from 0.84 to 0.90 and finally to 0.96, while the RMSD decreases from 10.7 to 8.2 Å and finally to 5.4 Å.
Comparison with DEMO-EM
DEMO-EM is designed for building models of multi-domain proteins from cryo-EM density maps, whereas DEMO-EM2, developed here, builds models of protein complexes. Both methods rely on domain assembly, but DEMO-EM2 incorporates several algorithmic improvements. First, DEMO-EM2 preprocesses the density map to reduce the impact of noise on chain and domain fitting. Second, in addition to quasi-Newton optimization, DEMO-EM2 uses the differential evolution (DE) algorithm to avoid being trapped in local optima. Third, whereas DEMO-EM uses a fixed DCS cutoff, DEMO-EM2 assembles iteratively with a dynamic DCS cutoff. Fourth, DEMO-EM2 removes regions of the density map that have already been matched to chain models, preventing different chains from being fitted into the same region.
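Differential evolution [19] is a population-based global search that complements local quasi-Newton refinement. A minimal DE/rand/1/bin sketch follows; the objective is a hypothetical, mildly multimodal stand-in for a map-fitting score over six rigid-body-like parameters, and the population size, F, CR and bounds are illustrative choices, not necessarily the variant or settings used by DEMO-EM2.

```python
import math
import random


def differential_evolution(objective, bounds, pop_size=40, f=0.5, cr=0.9,
                           generations=300, seed=1):
    """Minimize `objective` over a box using the classic DE/rand/1/bin scheme."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least four individuals")
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals, none equal to target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # guarantees one gene comes from the mutant
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or k == forced:
                    v = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    v = min(max(v, lo), hi)  # clip back into the search box
                else:
                    v = pop[i][k]  # binomial crossover keeps the target's gene
                trial.append(v)
            # Greedy selection; replacement is immediate (a common asynchronous variant).
            s = objective(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_score(x):
    """Hypothetical multimodal stand-in for a (negated) map-fitting score."""
    return sum(xi * xi - 2.0 * math.cos(3.0 * xi) + 2.0 for xi in x)


# Six parameters, e.g. three rotation angles and three translations (illustrative).
best_x, best_s = differential_evolution(toy_score, [(-5.0, 5.0)] * 6)
print(best_s)
```

A typical hybrid would pass the best DE individual to a gradient-based optimizer such as L-BFGS for final local polishing.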
Supplementary Figure S11 presents two representative examples of the advantage of DEMO-EM2 over DEMO-EM. Supplementary Figure S11A shows EMD-3856, derived from an inside-out FMDV A10 capsid (PDBID: 5OWX, three chains). The DEMO-EM2 model (red) achieved a TM-score of 0.94 and an RMSD of 2.2 Å, compared with a TM-score of 0.65 and an RMSD of 22.0 Å for the DEMO-EM model (green). This improvement is mainly attributable to the more advanced domain matching strategy of DEMO-EM2. Supplementary Figure S11B shows EMD-17929, the immature HTLV-1 CA-NTD from in vitro assembled MA126-CANC tubes (PDBID: 8PU6, three homologous chains). The DEMO-EM2 model (blue) achieved a TM-score of 0.98 and an RMSD of 1.0 Å, whereas the DEMO-EM model (yellow) contained overlapping chains, resulting in a TM-score of 0.68 and an RMSD of 13.4 Å. This failure arises primarily because DEMO-EM does not remove density regions that have already been matched to chain models.
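The idea of removing density that is already explained by placed chains can be sketched as zeroing voxels near placed atoms before fitting the next chain. This is a minimal sketch under stated assumptions: the map is a NumPy array with a known voxel size and origin, and density within a fixed radius (3 Å here, an arbitrary choice) of any placed atom is removed; DEMO-EM2's actual masking rule may differ, and the function name is hypothetical.

```python
import numpy as np


def mask_fitted_density(density, atom_coords, voxel_size, origin=(0.0, 0.0, 0.0),
                        radius=3.0):
    """Return a copy of `density` with voxels near already-placed atoms zeroed.

    density: 3D array indexed [x, y, z]; atom_coords: (N, 3) coordinates in Å;
    voxel_size: grid spacing in Å; origin: Å coordinates of voxel [0, 0, 0].
    """
    grid = np.indices(density.shape).reshape(3, -1).T  # (M, 3) voxel indices
    centers = np.asarray(origin, dtype=float) + grid * voxel_size  # positions in Å
    near = np.zeros(len(centers), dtype=bool)
    for atom in np.asarray(atom_coords, dtype=float):
        # Mark every voxel whose center lies within `radius` of this atom.
        near |= np.sum((centers - atom) ** 2, axis=1) <= radius ** 2
    masked = density.copy()
    masked[near.reshape(density.shape)] = 0.0
    return masked


# Hypothetical usage: a uniform 20^3 map with one placed atom at (5, 5, 5) Å.
density = np.ones((20, 20, 20))
placed = np.array([[5.0, 5.0, 5.0]])
remaining = mask_fitted_density(density, placed, voxel_size=1.0)
print(int((remaining == 0).sum()))
```

The loop compares every atom with every voxel, which is fine for illustration; for full complexes, a spatial index (for example a k-d tree) or per-atom bounding boxes would keep the cost manageable.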
Comparison with ModelAngelo and AlphaFold-Multimer
DEMO-EM2 was also compared with the state-of-the-art method ModelAngelo [37] on the test set. As shown in Supplementary Figure S12A, the average TM-score of complex models built by DEMO-EM2 is 0.95, 150.0% higher than that of ModelAngelo (0.38), with a Student's t-test P-value of 3.36 × 10⁻¹¹, indicating a statistically significant difference. DEMO-EM2 also achieves a higher TM-score than ModelAngelo on 93.9% of the test cases. Supplementary Figure S12B shows that the average RMSD of DEMO-EM2 models is 4.2 Å, substantially lower than that of ModelAngelo (9.7 Å). DEMO-EM2 outperforms ModelAngelo mainly because many maps in our test set have resolutions >4 Å, outside the high-resolution range on which ModelAngelo was trained.
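The comparison statistics reported here (relative improvement of the mean TM-score, the fraction of targets on which one method wins and a t-test) can be computed from per-target scores. A minimal sketch, assuming paired per-target TM-scores; the text does not state whether the Student's t-test was paired or unpaired, so this computes only a paired t statistic, and the P-value would come from a t distribution (for example `scipy.stats.ttest_rel`). The scores below are hypothetical.

```python
import math
from statistics import mean, stdev


def compare_methods(scores_a, scores_b):
    """Compare paired per-target scores of method A against method B.

    Returns (relative improvement of A's mean over B's mean in %,
             percentage of targets where A scores higher,
             paired t statistic of the differences A - B).
    """
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two equal-length score lists with at least two targets")
    improvement = (mean(scores_a) - mean(scores_b)) / mean(scores_b) * 100.0
    win_rate = sum(a > b for a, b in zip(scores_a, scores_b)) / n * 100.0
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("t statistic is undefined when all differences are equal")
    t_stat = mean(diffs) / (sd / math.sqrt(n))
    return improvement, win_rate, t_stat


# Hypothetical per-target TM-scores for two methods (not the benchmark data).
method_a = [0.90, 0.95, 0.98, 0.97]
method_b = [0.40, 0.30, 0.50, 0.32]
print(compare_methods(method_a, method_b))

# The reported means, 0.95 versus 0.38, correspond to a 150.0% relative improvement.
print(round((0.95 - 0.38) / 0.38 * 100.0, 1))
```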
In addition, because of memory limitations, DEMO-EM2 was compared with AlphaFold2-Multimer only on the 25 of the 43 test cases with fewer than 2500 residues. Supplementary Figure S13A and B show that DEMO-EM2 outperforms AlphaFold2-Multimer: DEMO-EM2 models achieved an average TM-score of 0.97, 56.5% higher than that of AlphaFold2-Multimer (0.62), with a Student's t-test P-value of 3.50 × 10⁻⁶, indicating a statistically significant difference. The average RMSD of DEMO-EM2 models is 2.4 Å, significantly lower than that of AlphaFold2-Multimer (17.9 Å; Student's t-test P-value = 1.59 × 10⁻⁵). DEMO-EM2 can also refine complex models predicted by AlphaFold2-Multimer, as shown by two representative examples in Supplementary Figure S14. The first is an in situ structure of the Caulobacter crescentus S-layer (Supplementary Figure S14A). Compared with the deposited structure (gray), the AlphaFold2-Multimer model (red, left) has a TM-score of 0.25 and an RMSD of 30.3 Å. After optimization by DEMO-EM2 (blue, right), the model achieved a TM-score of 0.99 and an RMSD of 0.8 Å and aligns almost perfectly with the deposited structure. The second example, the VPS26 dimer region of the metazoan membrane-assembled retromer (Supplementary Figure S14B), likewise shows marked improvements in TM-score and RMSD after DEMO-EM2 optimization.
Testing DEMO-EM2 with deep learning–processed density maps
Owing to the widespread application of deep learning, many methods have been developed to process density maps [8, 38]. To assess whether deep learning–processed density maps benefit model construction in DEMO-EM2, we integrated the main-chain probability map generated by EMBuild into DEMO-EM2 and tested it on the 43 density maps. With the main-chain probability map, DEMO-EM2 obtained an average TM-score of 0.89, a 3.4% decrease from the 0.92 obtained with the original density maps (Supplementary Table S13). Only 9.5% of cases showed a TM-score improvement greater than 0.001 with the main-chain probability map, while the remaining cases were unchanged or worse (Supplementary Figure S15A). The average RMSD (Supplementary Figure S15B) increased from 7.3 to 9.6 Å, and the running time also increased substantially because EMBuild rescales the grid spacing to 1.0 Å. Nevertheless, when both methods used the same main-chain probability map, DEMO-EM2 models (average TM-score = 0.89, average RMSD = 7.3 Å) were better than those built by EMBuild (TM-score = 0.85, RMSD = 10.0 Å). These results suggest that deep learning–processed density maps may not substantially benefit model construction in DEMO-EM2.
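The runtime effect of resampling can be estimated from voxel counts: for a box of fixed physical size, the number of voxels scales with the inverse cube of the grid spacing. A minimal sketch; the box edge and original voxel size are hypothetical, and runtime is only assumed, not guaranteed, to grow with the voxel count.

```python
import math


def voxel_count(box_edge_angstrom, voxel_size_angstrom):
    """Number of voxels in a cubic box of the given edge length (Å)."""
    per_axis = math.ceil(box_edge_angstrom / voxel_size_angstrom)
    return per_axis ** 3


# Hypothetical 400 Å box originally sampled at 2.0 Å, resampled to 1.0 Å.
before = voxel_count(400.0, 2.0)
after = voxel_count(400.0, 1.0)
print(before, after, after / before)
```

Halving the spacing multiplies the voxel count by eight; conversely, a map originally sampled more finely than 1.0 Å would shrink after resampling, so the cost depends on each map's original pixel size.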
Computational efficiency
Starting from AlphaFold2-generated chain models, the full DEMO-EM2 pipeline required an average runtime of 3.4 h on the 43 test targets, with each case run on a single core of an Intel(R) Xeon(R) CPU E5-2680 v3@2.50GHz processor. Supplementary Figure S16A shows the relationship between the number of residues and the runtime of DEMO-EM2; 85.3% of proteins were completed within 1 h. Supplementary Figure S16B shows the relationship between the number of chains and the runtime; 76.5% of proteins with fewer than eight chains were completed within 1 h.
Application to maps lacking deposited structures
Supplementary Figure S17 presents two models built by DEMO-EM2 from cryo-EM maps that lack deposited structures. Supplementary Figure S17A shows EMD-14623, a 4.8 Å cryo-ET map of Rubisco from the Cyanobium carboxysome (innermost layer). Because no deposited structure is available, model quality is assessed only with reference-free metrics (Q-score and iFSC). The DEMO-EM2 model has a Q-score of 0.24; notably, its iFSC reaches 0.80, indicating an accurate fit between the model and the density map. Supplementary Figure S17B shows EMD-20028, a 3.1 Å cryo-EM map of human apoferritin, with the DEMO-EM2 model superimposed on the map (transparent gray). This model achieves a Q-score of 0.54 and an iFSC of 0.68, indicating a high-confidence model.
CONCLUSION
In this study, we propose DEMO-EM2, an extension of our previously developed DEMO-EM, for the automated assembly of protein complexes from cryo-EM density maps. Starting from predicted chain models, DEMO-EM2 builds complex models through an iterative assembly procedure that intertwines chain-level matching, domain-level matching and domain-based fitting based on the L-BFGS and DE algorithms. DEMO-EM2 was tested on a composite benchmark of 43 density maps spanning different resolutions and generated by single-particle analysis and subtomogram averaging. DEMO-EM2 produced highly accurate complex models (TM-score > 0.9) for 83.7% of cases using density maps with resolutions from 3 to 10 Å. DEMO-EM2 was compared with four widely used methods (EMBuild, phenix.dock_in_map, Situs and gmfit). Starting from the same density maps and individually modeled chains, the TM-score of the full-length models built by DEMO-EM2 is 8.2%, 53.3%, 95.7% and 162.9% higher than that of EMBuild, phenix.dock_in_map, Situs and gmfit, respectively, with Student's t-test P-values of 9.33 × 10⁻³, 4.93 × 10⁻⁹, 1.28 × 10⁻¹³ and 5.56 × 10⁻¹⁸. DEMO-EM2 could potentially assist drug discovery, the study of disease mechanisms and precision diagnostics. For example, high-precision structures built by DEMO-EM2 may help elucidate drug–target interactions and thereby aid the design and optimization of drug molecules. DEMO-EM2 could also be used to build structures of pathogens such as viruses and bacteria, which would help in understanding their infection mechanisms.
While DEMO-EM2 has shown promising results, several aspects could be improved further. First, deep learning techniques could extract features from the density map to predict distances and orientations between chains or domains, which could serve as additional restraints to guide the assembly of complex models. Second, combining model quality assessment methods [39, 40] with flexible assembly may further improve the accuracy of the final DEMO-EM2 complex model. Third, deep learning–based identification of domain regions in the density map may help improve model quality. Efforts along these lines will further enhance DEMO-EM2 for assembling complex structures.
Key Points
DEMO-EM2 extends DEMO-EM to construct protein complex structures from cryo-EM maps by intertwining chain-level matching, domain-level matching and domain-based fitting.
Experimental results on 43 density maps show that DEMO-EM2 is capable of constructing high-quality models that surpass state-of-the-art methods in this field.
DEMO-EM2 improves AlphaFold2-predicted chain models by exploiting the flexibility of inter-domain orientations during assembly, and the quality of the constructed complexes is largely unaffected by the resolution of the density map.
Supplementary Material
Author Biographies
Ziying Zhang is an MS candidate in the College of Information Engineering, Zhejiang University of Technology. His research interests include protein structure prediction and deep learning.
Yaxian Cai is an MS candidate in the College of Information Engineering, Zhejiang University of Technology. Her research interests include protein structure prediction and deep learning.
Biao Zhang is a lecturer in the College of Information Engineering, Zhejiang University of Technology. His research interests include protein folding and structure prediction.
Wei Zheng is a post-doctoral fellow in the Department of Computational Medicine and Bioinformatics, University of Michigan. His research interests include protein structure and function prediction.
Lydia Freddolino is an associate professor in the Department of Computational Medicine and Bioinformatics, University of Michigan. His research interests include protein folding and protein function prediction.
Guijun Zhang is a professor in the College of Information Engineering, Zhejiang University of Technology. His research interests include bioinformatics, intelligent information processing and optimization theory.
Xiaogen Zhou is a professor in the College of Information Engineering, Zhejiang University of Technology. His research interests include bioinformatics, intelligent information processing and optimization theory.
Contributor Information
Ziying Zhang, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.
Yaxian Cai, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.
Biao Zhang, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.
Wei Zheng, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
Lydia Freddolino, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
Guijun Zhang, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.
Xiaogen Zhou, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.
FUNDING
This work was supported by the National Science and Technology Major Project of China [2022ZD0115103] and the National Natural Science Foundation of China [62203389, 62201506]. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) [41], which is supported by the National Science Foundation (ACI1548562).
DATA AVAILABILITY
All data needed to evaluate the conclusions are present in the paper and the Supplementary Information. Furthermore, the DEMO-EM2 standalone package and benchmark dataset are readily available for academic and non-commercial users at https://zhanggroup.org/DEMO-EM/DEMO-EM2/.
References
- 1. Yip KM, Fischer N, Paknia E, et al. Atomic-resolution protein structure determination by cryo-EM. Nature 2020;587:157–61.
- 2. Lawson CL, Patwardhan A, Baker ML, et al. EMDataBank unified data resource for 3DEM. Nucleic Acids Res 2016;44:D396–403.
- 3. Cowtan K. The Buccaneer software for automated model building. 1. Tracing protein chains. Acta Crystallogr D Biol Crystallogr 2006;62:1002–11.
- 4. Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of Coot. Acta Crystallogr D Biol Crystallogr 2010;66:486–501.
- 5. Voss JE, Vaney M-C, Duquerroy S, et al. Glycoprotein organization of Chikungunya virus particles revealed by X-ray crystallography. Nature 2010;468:709–12.
- 6. Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9.
- 7. Kryshtafovych A, Moult J, Albrecht R, et al. Computational models in the service of X-ray and cryo-electron microscopy structure determination. Proteins 2021;89:1633–46.
- 8. He J, Lin P, Chen J, et al. Model building of protein complexes from intermediate-resolution cryo-EM maps with deep learning-guided automatic assembly. Nat Commun 2022;13:4066.
- 9. Liebschner D, Afonine PV, Baker ML, et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr D Struct Biol 2019;75:861–77.
- 10. Wriggers W. Using Situs for the integration of multi-resolution structures. Biophys Rev 2010;2:21–7.
- 11. Kawabata T. Gaussian-input Gaussian mixture model for representing density maps and atomic models. J Struct Biol 2018;203:1–16.
- 12. Rossmann MG, Bernal R, Pletnev SV. Combining electron microscopic with X-ray crystallographic structures. J Struct Biol 2001;136:190–200.
- 13. Zhang B, Zhang W, Pearce R, et al. Fitting low-resolution protein structures into cryo-EM density maps by multiobjective optimization of global and local correlations. J Phys Chem B 2021;125:528–38.
- 14. Zhou X, Li Y, Zhang C, et al. Progressive assembly of multi-domain protein structures from cryo-EM density maps. Nat Comput Sci 2022;2:265–75.
- 15. Zhou X, Hu J, Zhang C, et al. Assembling multidomain protein structures through analogous global structural alignments. Proc Natl Acad Sci 2019;116:15930–8.
- 16. Zhou X, Peng C, Zheng W, et al. DEMO2: assemble multi-domain protein structures by coupling analogous template alignments with deep-learning inter-domain restraint prediction. Nucleic Acids Res 2022;50:W235–45.
- 17. Zhang X, Zhang B, Freddolino PL, Zhang Y. CR-I-TASSER: assemble protein structures from cryo-EM density maps using deep convolutional neural networks. Nat Methods 2022;19:195–204.
- 18. Pfab J, Phan NM, Si D. DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes. Proc Natl Acad Sci 2021;118:e2017525118.
- 19. Storn R, Price K. Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 1997;11:341–59.
- 20. Zhou X-G, Peng C-X, Liu J, et al. Underestimation-assisted global-local cooperative differential evolution and the application to protein structure prediction. IEEE Trans Evol Comput 2019;24:1–550.
- 21. Baek M, DiMaio F, Anishchenko I, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021;373:871–6.
- 22. Zhou X, Zheng W, Li Y, et al. I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022;17:2326–53.
- 23. Lin Z, Akin H, Rao R, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv 2022;2022:500902.
- 24. Wu R, Ding F, Wang R, et al. High-resolution de novo structure prediction from primary sequence. bioRxiv 2022;2022.07.21.500999.
- 25. Trabuco LG, Villa E, Mitra K, et al. Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics. Structure 2008;16:673–83.
- 26. Zheng W, Zhou X, Wuyun Q, et al. FUpred: detecting protein domains through deep-learning-based contact map prediction. Bioinformatics 2020;36:3749–57.
- 27. Li Y, Hu J, Zhang C, et al. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics 2019;35:4647–55.
- 28. Zhang C, Shine M, Pyle AM, Zhang Y. US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat Methods 2022;19:1109–15.
- 29. Pettersen EF, Goddard TD, Huang CC, et al. UCSF Chimera–a visualization system for exploratory research and analysis. J Comput Chem 2004;25:1605–12.
- 30. Adams P. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr 2009;65:1074.
- 31. Song Y, DiMaio F, Wang RYR, et al. High-resolution comparative modeling with RosettaCM. Structure 2013;21:1735–42.
- 32. Xu J, Zhang Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 2010;26:889–95.
- 33. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins 2004;57:702–10.
- 34. DiMaio F, Song Y, Li X, et al. Atomic-accuracy models from 4.5-Å cryo-electron microscopy data with density-guided iterative local refinement. Nat Methods 2015;12:361–5.
- 35. Hintze BJ, Lewis SM, Richardson JS, Richardson DC. MolProbity's ultimate rotamer-library distributions for model validation. Proteins 2016;84:1177–89.
- 36. Chen VB, Arendall WB, Headd JJ, et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 2010;66:12–21.
- 37. Jamali K, Käll L, Zhang R, et al. Automated model building and protein identification in cryo-EM maps. Nature 2024:1–2.
- 38. Maddhuri Venkata Subramaniya SR, Terashi G, Kihara D. Protein secondary structure detection in intermediate-resolution cryo-EM maps using deep learning. Nat Methods 2019;16:911–7.
- 39. Guo S-S, Liu J, Zhou X-G, Zhang GJ. DeepUMQA: ultrafast shape recognition-based protein model quality assessment using deep learning. Bioinformatics 2022;38:1895–903.
- 40. Liu J, Liu D, Zhang G-J. DeepUMQA3: a web server for accurate assessment of interface residue accuracy in protein complexes. Bioinformatics 2023;39:btad591.
- 41. Towns J, Cockerill T, Dahan M, et al. XSEDE: accelerating scientific discovery. Comput Sci Eng 2014;16:62–74.