Abstract
The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein/nucleic-acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9–2.5 Å) resolution. Three published maps were selected as targets: E. coli beta-galactosidase with inhibitor, SARS-CoV-2 RNA-dependent RNA polymerase with covalently bound nucleotide analog, and SARS-CoV-2 ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. We found that (1) the quality of submitted ligand models and surrounding atoms varied, as judged by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics, and contact scores, and (2) a composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.
Cryogenic electron microscopy (cryo-EM) has rapidly emerged as a powerful method for determining structures of macromolecular complexes. It is complementary to macromolecular crystallography in its ability to visualize macromolecules, and complexes thereof, of varying sizes and extents of structural heterogeneity in 3D at near to full atomic resolution. The number of new structures determined by cryo-EM has been steadily increasing, and their resolution has been improving (Figure 1a). Macromolecular complexes may contain, in addition to larger components (i.e., proteins or nucleic acids), smaller components such as enzyme cofactors, substrates, analogs or inhibitors, medically relevant drug discovery candidates or approved drugs, glycans, lipids, ions, or water molecules. Accurate modeling of ligands within their macromolecular environment is important, as they can substantially influence larger-scale structure and function. As the number of novel ligands in cryo-EM-derived structures continues to increase rapidly (Figure 1b), it becomes important to investigate how best to validate them, using measures such as model-to-map fit, ligand geometry scores, and local interactions with ions, waters, and protein or nucleic acid components.
An international workshop on validation of ligands in crystallographic PDB depositions1 held in 2015 identified several common problems, including weak experimental density, ligand atoms poorly placed, incorrectly defined or misinterpreted chemical species, and inclusion of atoms not directly supported by experimental evidence. The main outcome was a set of best practice recommendations for PDB depositors and for the PDB archive. For PDB depositors, recommendations included providing unambiguous chemical definitions for all ligands present in a structure, including hydrogen atoms, providing ligand geometry and refinement restraints, clearly identifying atoms not supported by experimental evidence, providing the experimental map used for modeling, and including comments explaining outliers. Recommendations for PDB validation included providing informative images of ligands in their density; providing stick figure diagrams indicating geometry outliers; identifying atoms not supported by experimental evidence; providing quality assessment metrics for each identified ligand; and identifying possible protonation states. Most of the workshop validation recommendations have been implemented in PDB validation reports, with ligand geometric assessments implemented for all experimental methods2–4.
Since 2010, EMDataResource (EMDR) has organized multiple Challenge activities (https://challenges.emdataresource.org) with the aim of bringing the cryo-EM community together to address important questions regarding the reconstruction and interpretation of maps and map-derived atomic coordinate models5. For each Challenge, a committee consisting of prominent experts is invited to recommend targets and set goals. Each event has been conducted with the operational principles of fairness, transparency, and openness, using modeler-blind assessments and open results, with a major goal of promoting innovation.
In 2016, paired Map and Model Challenges invited participants to apply their novel algorithms/software to reconstruct maps and to evaluate models at resolutions of 2.9–4.5 Å. The results were published in a 19-article special journal issue6. By 2018, most participating groups had improved their pipelines, eliminating many identified mistakes. The unique EMRinger map metric for sidechain-mainchain consistency7 was first tested systematically in the 2016 Challenge and is now standard.
The 2019 Model Metrics Challenge evaluated models as well as the effectiveness of many different coordinates-only and map-model fit metrics for four targets at 1.7–3.3 Å resolution. The results were published in a single joint paper8. To streamline the challenge process, input of data from participants and initial assessment pipelines were automated, and comprehensive statistics, visualizations of scores and comparisons were made available. The CaBLAM multi-residue mainchain metric9, introduced in 2016, was shown in the 2019 Challenge to be the score most highly correlated with measures of match-to-target. The Q-score10, inspired and introduced by the 2019 Challenge, has now been adopted by the wwPDB Validation System used at deposition as well as in the detailed validation report11.
The 2021 Ligand Model Challenge brought together research and industry groups to evaluate and discuss available measures and tools for ligand quality assessment. Many of the issues identified for crystallographic structures in the 2015 ligand workshop were also expected to occur in cryo-EM structures with modeled ligands, but with additional considerations unique to cryo-EM. Targets were chosen from publicly available maps that had sufficient resolution to allow, in principle, de novo ligand modeling, that included diverse components such as protein and RNA, and that were of current interest and relevance. The objectives set out were to identify 1) methods for modeling such ligands and 2) metrics to evaluate map-model fit, stereochemical geometry, and chemically sensible interactions between the ligand and protein or RNA component. We describe here the overall design and outcomes of the EMDR Ligand Challenge, recommendations for the cryo-EM community based on currently available assessment methods, and what is needed for the future.
Results
Challenge Design
Three cryo-EM map targets were chosen based on the following criteria: recently published with resolution better than 3 Å, maps released in the Electron Microscopy Data Bank (EMDB), associated coordinates in the Protein Data Bank (PDB), small molecules present (ligands, water, metal ions, detergent, and/or lipid), and having current topical relevance (Figure 2 panels A-C):
Target 1: 1.9 Å E. coli β-Galactosidase (β-Gal) in complex with inhibitor 2-phenylethyl 1-thio-beta-D-galactopyranoside (PETG) with PDB Chemical Component Dictionary (CCD) id PTQ, EMDB map entry EMD-7770, PDB reference model 6CVM (ref. 12)
Target 2: 2.5 Å SARS-CoV-2 RNA-dependent RNA polymerase (RNAP) with the pharmacologically active, nucleotide form of the prodrug remdesivir (CCD id F86) covalently bound to RNA, EMD-30210, PDB reference model 7BV2 (refs. 13, 14)
Target 3: 2.1 Å SARS-CoV-2 Open Reading Frame 3a (ORF3a) putative ion channel in complex with 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine phospholipid (CCD id PEE), EMD-22898, PDB reference model 7KJR (ref. 15)
Next, modeling teams were solicited via emails to multiple bulletin board lists and were asked to register, generate and upload optimized models for each Target, following provided guidelines (see Online Methods). A total of 61 independently determined models were contributed by seventeen teams from different institutions (ids EM001-EM017), with workflow details collected for each (see summary in Table 1 and Supplementary Data S1, S2 for details).
Table 1.
ID | Modeling Team | T1 | T2 | T3 | Polymer Modeling | Ligand Modeling | Ligand Restraints Software | Automation level | Modeling Software |
---|---|---|---|---|---|---|---|---|---|
EM001 | D. Kihara, G. Terashi, D. Sarkar, J. Verburgt | 3 | 2 | 3 | ab initio or optimized | refit or optimized | MD Force Field | partial | Mainmast, Mainmastseg, Rosetta, PyMOL, Schrodinger, VMD, Chimera, MDFF |
EM002 | D. Si, S. Lin, M. Zhao, R. Cao, J. Hou | 3 | 2 | 3 | ab initio or none | refit | Phenix eLBOW | full | DeepTracer, Phenix |
EM003 | A. Muenks, F. DiMaio | 3 | 2 | 2 | optimized | refit | Phenix eLBOW, Open Babel | partial | Rosetta, Chimera |
EM004 | J. Cheng, N. Giri | 2 | 2 | 2 | ab initio | refit | PyRosetta | partial | Rosetta, Chimera, DeepTracer |
EM005 | G. Pintilie, M. Schmid, W. Chiu | 2 | 1 | 1 | none | refit | Phenix eLBOW | partial | Chimera |
EM006 | M. Baker, C. Hryc | 1 | 1 | 1 | ab initio | refit | Phenix eLBOW | partial | Pathwalker, Phenix |
EM007 | A. Perez, A. Mondal, R. Esmaeeli, L. Lang | 1 | 1 | 1 | optimized | optimized | PyRosetta, Antechamber, MD Force Field | partial | MELD, Amber, VMD |
EM008 | P. Emsley | 1 | 1 | 1 | optimized | refit | CCP4 AceDRG | partial | Coot, REFMAC |
EM009 | N.W. Moriarty, P. V. Afonine, C.J. Schlicksup, O.V. Sobolev | 1 | 1 | 1 | optimized | refit | Phenix eLBOW | partial | Coot, Chimera, ChimeraX, Phenix |
EM010 | G. Chojnowski | 1 | 1 | 1 | ab initio | refit | CCP4 mon lib | partial | ARP/wARP, ChimeraX, Coot, Isolde, Phenix, doubleHelix |
EM011 | M. Igaev, H. Grubmuller,. Pohjolainen, A. Vaiana | 1 | 1 | 1 | ab initio | optimized | MD Force Field | partial | Chimera, Modeller, VMD, CDMD |
EM012 | C. Palmer, R. Nicholls, R. Warshamanage, K. Yamashita, G. Murshudov, P. Bond, S. Hoh, M. Olek, K. Cowtan, A. Joseph, T. Burnley, M. Winn | 1 | 1 | 1 | optimized | refit or optimized | CCP4 AceDRG | partial | CCP-EM, Coot, EMDA, LAFTER, ProSMART, REFMAC, Servalcat |
EM013 | A. Singharoy, S. Mittal, A. Perez, D. Kihara, M. Shekhar, D. Sarkar, G. Terashi, C. Rowley, R. Esmaeeli, L. Lang, A. Mondal, A. Campbell | 1 | 1 | optimized | refit or optimized | CGENFF | partial | MDFF, CryoFold, MELD | |
EM014 | W.-C. Kao, C. Hunte | 1 | 1 | optimized | refit | Grade (BUSTER), Phenix eLBOW | manual | ChimeraX, Coot, Isolde, Phenix | |
EM015 | G. Schröder, L. Schäfer, K. Pothula | 1 | optimized | refit | MD Force Field | partial | CDMD | ||
EM016 | D. Kumar | 1 | optimized | refit | Phenix eLBOW | partial | Coot, Phenix | ||
EM017 | S. Weyand, S.C. Vedithi, T. Blundell, S. Brohawn | 1 | optimized | refit | Schrödinger Ligprep | full | Schrödinger | ||
Totals | | 23 | 17 | 21 | | | | | |
Model Assessments
Submitted and PDB reference models for each target were evaluated by passing them through the EMDR Model Challenge validation pipeline8,16. Individual scores were obtained for many different sets of metrics, with a new Ligand analysis track added to the existing Fit-to-Map, Coordinates-only, Comparison-to-Reference, and Comparison-among-Models tracks.
Global Fit-to-Map metrics included Map-Model Fourier shell correlation (FSC)17, Atom Inclusion18, EMRinger7, density-based correlation scores from TEMPy19, Phenix20 and Q-score10.
Overall Coordinates-only quality was evaluated using Clashscore, Rotamer outliers, Ramachandran outliers, and CaBLAM from MolProbity9,21, as well as standard geometry measures (e.g., bond, chirality, planarity) from Phenix22. Davis-QA, a measure used in critical assessment of protein structure prediction (CASP) competitions, was used to assess similarity among submitted models23.
Assessment teams contributed a wide variety of ligand-specific assessments (Table 2, ids AT01-AT07) including ligand, ligand environment, solvent, and RNA-specific analyses. AT01 used Mogul24 to evaluate ligand covalent geometry as implemented in the wwPDB validation process2,4, with inclusion of a novel composite ligand geometry ranking score25. AT02 evaluated model ligands using Coot26 and AceDRG27. AT03 evaluated RNA conformation with DNATCO28,29 and solvent atom placement around protein residues using water distributions30,31. AT04 analyzed ligand all-atom contacts with MolProbity probescore9, and ion and water placements using UnDowser32. AT05 scored ligand placements using density fields derived from pharmacophore consensus field analysis33, a method utilized in computer-aided drug design to identify and extract possible interactions within a ligand–receptor complex based on steric and electronic features34. AT06 examined ligand strain energies using both molecular mechanics and neural net potential energy strategies35–37, where strain energy is the calculated difference in energy between the modeled conformation and the lowest energy conformation in solution. AT07 prepared Q-score analyses10 for model-fit-to-map of whole models, protein, ligands, and water, as well as ligand plus protein and/or nucleic acid polymer atoms in the immediate vicinity of the ligand (LIVQ). Assessor scores are available online at model-compare.emdataresource.org; results are briefly outlined below.
Table 2.
Assessment Team ID | Team members | Assessment method |
---|---|---|
AT01 | C. Shao | wwPDB validation report pipeline (Mogul) |
AT02 | P. Emsley | Coot Tools |
AT03 | B. Schneider, J. Černý | Nucleic acid conformations, protein hydration analysis |
AT04 | J.S. Richardson, C.J. Williams, V. Chen, D. Richardson | Contact analysis, probescore, occupancy, UnDowser, CaBLAM, visual examination |
AT05 | C.I. Williams, Chemical Computing Group Support Team | Pharmacophore density fields (PH4) |
AT06 | B. Sellers, A. Gobbi, S. Noreng, Y. Yang, A. Rohou | Molecular Mechanics Force Field Strain Energy (MM), Neural Net Potential Energy (NNP) |
AT07 | G. Pintilie, M. Schmid, W. Chiu | Q-score analysis |
Outcomes
The modeled ligands from each of the submissions are shown superimposed with their corresponding map density in Figure 2 panels D-F; selected ligand and whole-model score distributions are shown for all three targets in Figure 3. The full set of pipeline and assessment team scores and their definitions are provided in Supplementary Data S3. The superimposed views and score distributions demonstrate that the methods utilized by the modeling teams produced a range of ligand positions and conformations.
Overall model scoring.
With regard to overall Fit-to-Map evaluation, the majority of submitted models scored very similarly to PDB reference models for all targets, both in terms of the overall map-model FSC17 and protein Q-score10 (Figure 3, rows 9 and 11). For Targets 2 and 3, several teams modestly improved upon the EMRinger score7 (Figure 3, columns 2 and 3, row 10). With regard to overall Coordinates-only evaluation, many teams were able to improve upon PDB reference models for all targets in terms of Clashscore32 and CaBLAM32, metrics that identify steric clashes and evaluate protein backbone geometry, respectively (Figure 3, rows 6, 7).
Ligand and ligand environment scoring.
Ligand and ligand environment evaluation methods were challenged by missing atoms in some submissions, the covalently bound ligand (Target 2), and the presence of charged ligands (Targets 2 and 3). In terms of ligand-specific Fit-to-Map (Ligand Q-score), many teams made improvements relative to the PDB reference model of Target 1, but scored similarly to or worse than the PDB references for Targets 2 and 3 (Figure 3, row 1). In terms of covalent geometry (Mogul)24,25, many ligands in the submitted models were improved relative to references for Targets 1 and 3, while results were mixed for Target 2 (Figure 3, row 5). With respect to calculated ligand strain energy and pharmacophore ligand environment modeling, many of the submitted models were improved relative to references for Targets 1 and 2, but some poses were less favorable (Figure 3, rows 3–4). As a qualitative guide, ligand strain energy should be less than 3 kcal/mol after minor relaxation, using the sampling and scoring described in Online Methods. Only a subset of submitting groups carefully considered treatment of ions (Extended Data Figure 5).
Nucleic Acid scoring.
Target 2’s RNA (a typical A-form double helix, with two unpaired nucleotides at the 5′ end of the template strand) had close to expected geometries for most submitted models as assessed by DNATCO nucleic acid Confal scores29 (Figure 3, row 8). Values of torsion angles in the dinucleotide units assigned to DNATCO NtC classes agreed with expected distributions including sugar ring torsions that define pucker. Note that prior to running this Challenge, Target 2’s reference model (PDB 7bv2) had been re-versioned by the deposition authors and re-released by the PDB with several corrections to sequence, RNA conformation, and CaBLAM outliers38, thus limiting scope for model improvement.
Submitted Model rankings.
To evaluate and rank quality of ligand Fit-to-Map within the context of the macromolecular complex, we developed a novel score, the Ligand + Immediate Vicinity Q-score (LIVQ), which averages Q-scores of non-hydrogen atoms of the ligand together with all non-hydrogen polymer atoms in the immediate vicinity of the ligand. A distance cutoff of 5 Å was chosen to define the immediate vicinity of the ligand for model ranking purposes (LIVQ5, Figure 4A–C); extension to 10 Å yielded similar results (LIVQ10, Extended Data Figure 2). The analysis shows that for each target there are several models with very good model-to-map fit, comparable to that of the reference PDB-deposited models (Figure 4A–C, blue bars). Nine, two and three submitted models for Targets 1, 2 and 3, respectively, score better than the corresponding deposited reference model.
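The LIVQ definition above can be illustrated with a minimal sketch. It assumes that per-atom Q-scores have already been computed by a separate tool, and that "immediate vicinity" means within the cutoff distance of any non-hydrogen ligand atom; the dictionary layout, the function name `livq`, and the example coordinates and scores are hypothetical and are not taken from the Challenge pipeline.

```python
import math

def livq(ligand_atoms, polymer_atoms, cutoff=5.0):
    """Average Q-score over non-hydrogen ligand atoms plus non-hydrogen
    polymer atoms lying within `cutoff` angstroms of any such ligand atom.

    Each atom is a dict with keys 'xyz' (3-tuple), 'element' (str), 'q' (float).
    """
    heavy_ligand = [a for a in ligand_atoms if a["element"] != "H"]
    heavy_polymer = [a for a in polymer_atoms if a["element"] != "H"]
    # Assumption: vicinity is measured atom-to-atom against the ligand heavy atoms.
    nearby = [
        p for p in heavy_polymer
        if any(math.dist(p["xyz"], l["xyz"]) <= cutoff for l in heavy_ligand)
    ]
    selected = heavy_ligand + nearby
    if not selected:
        raise ValueError("no non-hydrogen ligand atoms supplied")
    return sum(a["q"] for a in selected) / len(selected)

# Hypothetical toy data: one ligand carbon (plus a hydrogen that is ignored)
# and two polymer atoms, only one of which lies within 5 angstroms.
example_ligand = [
    {"xyz": (0.0, 0.0, 0.0), "element": "C", "q": 0.80},
    {"xyz": (1.0, 0.0, 0.0), "element": "H", "q": 0.10},
]
example_polymer = [
    {"xyz": (3.0, 0.0, 0.0), "element": "N", "q": 0.90},
    {"xyz": (9.0, 0.0, 0.0), "element": "O", "q": 0.20},
]
livq5 = livq(example_ligand, example_polymer, cutoff=5.0)
livq10 = livq(example_ligand, example_polymer, cutoff=10.0)
```

Passing `cutoff=10.0` gives the LIVQ10 variant under the same assumptions; in the toy data it pulls in the second, poorly fitting polymer atom and lowers the average.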
Group rankings.
Overall ranking of participating groups (Figure 4D) employed a combination of LIVQ5 and MolProbity score, itself a weighted function of clashes, Ramachandran favored, and rotamer outliers9. LIVQ5 was weighted higher than stereochemical plausibility, similar to the approach customarily used in CASP39:
where z.metric is the number of standard deviations relative to the mean of the score distribution for all models from each group on the selected target according to the selected metric. Overall, group EM003 (DiMaio) had the best relative performance by this ranking criterion, being the only group that outscored all deposited reference PDB models (Figure 4A–C).
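The ranking formula itself does not appear in this text, so the following is only a sketch of a weighted z-score combination consistent with the description. The weights `W_LIVQ5` and `W_MOLPROBITY` are placeholders chosen solely so that LIVQ5 outweighs MolProbity score; the use of the population standard deviation, the sign flip for MolProbity score (where lower is better), the per-model rather than per-group aggregation, and the example values are all assumptions.

```python
from statistics import mean, pstdev

# Placeholder weights (assumption): the text states only that LIVQ5 is weighted higher.
W_LIVQ5 = 2.0
W_MOLPROBITY = 1.0

def z_scores(values, higher_is_better=True):
    """Standard deviations from the mean, signed so that larger is always better."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma for v in values]

def rank_models(models):
    """models: list of (name, livq5, molprobity_score) tuples for one target.

    Returns (name, combined_z) pairs sorted from best to worst.
    """
    z_livq = z_scores([m[1] for m in models], higher_is_better=True)
    z_mp = z_scores([m[2] for m in models], higher_is_better=False)
    combined = [
        (name, W_LIVQ5 * zl + W_MOLPROBITY * zm)
        for (name, _, _), zl, zm in zip(models, z_livq, z_mp)
    ]
    return sorted(combined, key=lambda item: item[1], reverse=True)

# Hypothetical scores for three models on a single target.
example_models = [
    ("model_a", 0.84, 1.2),
    ("model_b", 0.80, 1.0),
    ("model_c", 0.76, 2.0),
]
ranking = rank_models(example_models)
```

With these toy values, `model_b` has the best MolProbity score but `model_a` still ranks first because its LIVQ5 advantage carries more weight.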
Alternate group rankings.
The model-compare website Group Ranking calculator enables users to explore other possible ranking formulas: z-scores of up to 40 different individual metrics can be selected for inclusion with adjustable weighting. Extended Data Figure 3 illustrates an alternate ranking method based upon thirteen different metrics including ligand, ligand+environment, full model coordinates-only and full model fit-to-map. By this alternate method, five groups ranked higher than PDB reference models: EM010 (Chojnowski), EM008 (Emsley), EM012 (Palmer), EM003 (DiMaio), and EM009 (Moriarty), and one performed very close to reference, EM011 (Igaev).
Ligand Quality.
The ligand environment for the reference models and the best submitted models is compared for each target in Figure 5.
For Target 1 (β-Gal, Fig 5 A,D), the PTQ ligand O5 atom connected to the sugar ring is situated at the bottom of the binding pocket in the reference model and in eight submitted models, whereas in the top-scoring model, as well as five other submitted models, the sugar ring is flipped with oxygen O5 situated at the top. The flipped ligand fits the density better and has more optimal interatomic distances to water and protein atoms for hydrogen-bonding, with O5 H-bonded to a coordinated water of the nearby magnesium ion (see Supplemental section S5). The density shape does not preclude the possibility that both original and flipped conformations are present, each with partial occupancy, and probescores for the two states are nearly identical (Extended Data Fig 4A).
For Target 2 (RNAP; Fig 5 B,E), the F86 ligand is very similar for the deposited and top-scoring model, though distances to base-paired U10 are slightly different. F86 probescores varied greatly across models, with the reference at 10.1, model EM008_1 at 39.9, and the worst model at −106.9 (Extended Data Figure 4). Many models did not correctly create the RNA polymer – F86 (remdesivir) covalent bond. In addition, only five models indicated partial occupancy for F86, yet the map density for F86 and its paired base is almost exactly half that of adjacent base pairs (Extended Data Fig 4B), indicating 50% occupancy.
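The occupancy argument above amounts to a simple ratio, sketched here under the assumption that map density at a site scales linearly with occupancy and that the neighboring base pairs are fully occupied with comparable local resolution. The function name and the density values are illustrative only and are not measurements from the Target 2 map.

```python
def relative_occupancy(site_density, neighbor_densities):
    """Estimate occupancy as site density divided by the mean density of
    fully occupied neighbors, capped at 1.0."""
    if not neighbor_densities:
        raise ValueError("at least one neighbor density is required")
    reference = sum(neighbor_densities) / len(neighbor_densities)
    if reference <= 0:
        raise ValueError("reference density must be positive")
    return min(site_density / reference, 1.0)

# Illustrative values: a site with half the density of its two neighbors.
estimated = relative_occupancy(0.5, [1.0, 1.0])
```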
In the case of Target 3 (ORF3a ion channel; Fig 5 C,F), the PEE ligand has similar interactions with nearby atoms and placed water molecules, though with slightly different interatomic distances. The head-group amino N atom (which has no close contacts within 4 Å) points up in the deposited model but away from the camera view in the top-scoring model (Fig 5F). The long lipid tails of PEE have lower density, with confusingly interlaced and gapped connectivity that indicates disorder; the ensemble of all PEE ligand models shown in Fig 2F may be a more meaningful representation than any one individual model.
Discussion
The selected targets for the Ligand Challenge are some of the first structures deposited and released into PDB that contain ligands modeled into cryo-EM maps with resolution of 3 Å or better. Our Challenge results revealed that a deposited PDB model’s ligand and local ligand environment may not be fully optimal in terms of concurrent Fit-to-Map and Coordinates-only measures. For all three targets and especially for Target 1, adjustments in the ligand and/or ligand environment could be made to the deposited reference model that improved one or more validation criteria, as demonstrated by several modeler groups. Most of the submitted models were in the “better” range, where tiny differences in measured scores become inconsequential. In our previous Challenge, we showed that overall Fit-to-Map and Coordinates-only metrics are orthogonal measures8; here we see that at a local level, ligand/ligand-environment Fit-to-Map and Coordinates-only metrics are similarly independent (Figure 3, Extended Data Figure 3B, Supplementary Data S3). In other words, ligands that fit quite well into density may not be optimized with respect to ligand coordinates-only validation criteria, and vice versa.
Based on our analyses and experiences running the Challenge, we make the following recommendations.
Recommendation 1, regarding validation of the macromolecular models:
For ligand-macromolecular complexes, the macromolecular model should be subject to standard geometric checks, as is done for X-ray crystallography-based models1. These include standard covalent geometry checks and MolProbity evaluation, including CaBLAM and clashscore9,21,32. Sugar pucker and DNATCO conformational analysis28,29 should be checked for nucleic acid components. The macromolecular model-map fit should be evaluated by EMRinger7, Q-score10, and FSC17. Serious local outliers (which usually indicate an incorrect local conformation) should be emphasized, rather than overall average scores.
The individual MolProbity scores, CaBLAM and clashscore have more utility for validation of protein conformation than overall MolProbity score which incorporates Ramachandran and side-chain rotamer quality, since cryo-EM model refinement includes these as restraints.
Recommendation 2, regarding validation of ligand models:
Ligands in macromolecular complexes should conform to standard covalent geometry measures (bond lengths, angles, planarity, chirality) as recommended by the wwPDB validation report2,4. Additional checks that should be applied to ligands include fit to density using methods applicable to cryo-EM such as Q-score, occupancy (density strength, both absolute and relative to surroundings), and identification of missing atoms, including any surrounding ions.
Ligand energetics should also be examined. Ligand models should be assessed for their strain energy (the calculated difference in energy between the modeled conformation and the lowest energy conformation in solution) to identify improbable model geometries and lower energy alternatives35,36. Other methods can be used but may have different thresholds due to variation in absolute energy values. Strain energy calculations using neural net potentials offer speed close to that of force fields with the accuracy of QM calculations and are predicted to play a primary role in identifying accurate strain energies in the future. More research is needed to evaluate the overall utility of these novel deep learning methods.
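As a minimal sketch of the strain energy check, the following assumes that energies for the modeled pose and for a set of sampled solution conformers are already available from some external calculation, all in kcal/mol and computed with the same method. The 3 kcal/mol default is the qualitative guide quoted in the Results for the sampling and scoring used in this Challenge; as noted above, other methods may need a different threshold. Function names and energies are hypothetical.

```python
STRAIN_THRESHOLD_KCAL_MOL = 3.0  # qualitative guide from the Results; method-dependent

def strain_energy(modeled_energy, conformer_energies):
    """Energy of the modeled conformation minus the lowest energy found.

    The modeled energy is included in the minimum so that incomplete
    conformer sampling cannot produce a negative strain value.
    """
    if not conformer_energies:
        raise ValueError("at least one sampled conformer energy is required")
    lowest = min(min(conformer_energies), modeled_energy)
    return modeled_energy - lowest

def is_strained(modeled_energy, conformer_energies,
                threshold=STRAIN_THRESHOLD_KCAL_MOL):
    """True when the strain energy is at or above the chosen threshold."""
    return strain_energy(modeled_energy, conformer_energies) >= threshold

# Hypothetical energies (kcal/mol) for two candidate ligand poses.
sampled = [-12.5, -11.0, -9.0]
relaxed_pose_strain = strain_energy(-10.0, sampled)
distorted_pose_strain = strain_energy(-5.0, sampled)
```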
Recommendation 3, regarding validation of ligand environment:
The detailed interaction of the ligand with its binding site is of great importance and should be assessed by several independent metrics. Pharmacophore modeling33 is an optimized and time-tested energetic measure for how well the site would bind the specific ligand. LIVQ scores, introduced here, measure the density fit of the surrounding residues as well as the ligand itself. Probescore32 both quantifies and identifies specific all-atom contacts of H-bond, clash, and van der Waals interactions. All three types of measures should be taken into account. If the ligand model shows only weak interaction with its environment, the model is likely incorrect.
During the virtual wrap-up workshop, modelers and assessors shared their experiences and strategies to identify/assess the correct pose for the ligand based on the cryo-EM density maps. It was noted that the local map resolution for a ligand can be worse than the overall map resolution. As one objective measure, Q-scores were found to be lower for ligands in the best submitted models than for the nearby environment (Table 3). Factors that may affect resolvability of local ligand map features include incomplete occupancy, multiple conformations/poses present, regions of ligand flexibility or disorder, chemical modifications, and radiation damage.
Table 3.
Target Map (Reported Resolution) | Model with highest ligand Q-score | Q_ligand (ligand atoms) | Q_near (atoms ≤5Å of ligand) | LIVQ5 (ligand +atoms ≤5Å of ligand) | Expected_Q at reported map resolution |
---|---|---|---|---|---|
T1 β-gal (1.9Å) | EM005_2 | 0.809 | 0.849 | 0.845 | 0.846 |
T2 RNAP (2.5Å) | EM009_1 | 0.707 | 0.735 | 0.731 | 0.690 |
T3 ORF3a (2.1Å) | EM016_1 | 0.767 | 0.819 | 0.812 | 0.791 |
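The comparison drawn from Table 3 can be checked directly from the tabulated values. The sketch below copies the numbers from the table; the dictionary keys and the function name are illustrative.

```python
# Values copied from Table 3.
TABLE3 = {
    "T1": {"q_ligand": 0.809, "q_near": 0.849, "livq5": 0.845, "expected_q": 0.846},
    "T2": {"q_ligand": 0.707, "q_near": 0.735, "livq5": 0.731, "expected_q": 0.690},
    "T3": {"q_ligand": 0.767, "q_near": 0.819, "livq5": 0.812, "expected_q": 0.791},
}

def ligand_q_deficits(table):
    """Ligand Q-score minus the nearby-atom Q-score and minus the Q-score
    expected at the reported map resolution, per target."""
    return {
        target: {
            "vs_near": row["q_ligand"] - row["q_near"],
            "vs_expected": row["q_ligand"] - row["expected_q"],
        }
        for target, row in table.items()
    }

deficits = ligand_q_deficits(TABLE3)
for target, d in deficits.items():
    print(f"{target}: vs_near={d['vs_near']:+.3f} vs_expected={d['vs_expected']:+.3f}")
```

For all three targets the ligand Q-score is below that of the surrounding atoms. Relative to the value expected at the reported resolution, it is lower for Targets 1 and 3 and slightly higher for Target 2.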
Recommendation 4, regarding organization of future Challenges:
Future cryo-EM Model Challenges should be organized similarly to the well-established CASP and CAPRI challenge events of the X-ray crystallography and prediction communities23, with incorporation of automated checks and immediate author feedback on all model submissions.
Recommendation 5, regarding topics for future Challenges:
For future Challenge topics, consider validation of RNA models, including identification of RNA-associated ions, owing to the rapidly rising numbers of RNA-containing cryo-EM structures40–42. We also recommend maps determined in the 3.5-to-10 Å resolution range be considered as future targets to reflect the rapid rise in depositions of maps from subtomogram averaging of components in cell tomograms43–45. There are very few validation tools for that resolution range.
Online Methods
Challenge process and organization
The Ligand Model Challenge process closely followed the streamlined procedure adopted in the previous Model Metrics Challenge8. In the fall of 2020, a panel of advisors with expertise in cryo-EM methods, ligand modeling and/or ligand model assessment was recruited (J. Černý, P. Emsley, A. Joachimiak, J. Richardson, R. Read, A. Rohou, B. Schneider). The panel worked with EMDR team members to develop the challenge goals and guidelines, to identify suitable ligand-containing reference models from the PDB with cryo-EM map targets from EMDB, and to recommend metrics to be calculated for each submitted model.
The main stated goal was to identify metrics most suitable for evaluating and comparing fit of ligands in atomic coordinate models into cryo-EM maps with 3.0 Å or better reported overall resolution. The specific focus areas for assessor teams suggested by the expert panel were: (1) Geometry and fit to map of small molecules including ligands, water, metal ions, detergent, lipid, nanodiscs. (2) Model geometry (including backbone and side-chain conformations, clashes) in the neighborhood surrounding the small molecules. (3) Local model Fit-to-Map density per residue and per atom. (4) Resolvability at residue or atom level. (5) Recommended practice for optimizing atomic displacement parameters (B-factors). Key questions to be answered: How reliable are ligands/waters/ions built into cryo-EM maps? Can they be placed automatically or is manual intervention needed?
Modeling teams were tasked with creating and uploading their optimized model for each Target Map. The challenge rules and guidance were as follows: (1) Submitted models should be as complete and as accurate as possible (i.e., close to publication-ready), with atomic coordinates and atomic displacement parameters for all model components. (2) Submitted models must use the deposited PDB Reference Model’s residue, ligand, and chain numbering/labeling for all shared model components. (3) Ligands should ideally be deleted and refitted independently. (4) Additional polymer residues should be labeled according to the Reference Model’s sequence/residue numbering/chain ids. (5) If additional waters/ions/ligands are included, they should be labeled with unique chain ids. (6) If predicted hydrogen atom positions are part of the modeling process, hydrogens should be included in the submitted coordinates. (7) Models are expected to adhere to the reconstruction’s point symmetry (D2 for Target 1, C1 for Target 2, C2 for Target 3).
Members of the cryo-EM and modeling communities were invited to participate in February 2021 and details were posted at challenges.emdataresource.org. Models were submitted by participant teams between March 1 and April 15. For each submitted model, metadata describing the full modeling workflow were collected via a Drupal webform (see Supplementary Data S1, S2), and coordinates were uploaded and converted to PDBx/mmCIF format using PDBextract46. Model coordinates were then processed for atom/residue ordering and nomenclature consistency using PDB annotation software (Feng Z., https://sw-tools.rcsb.org/apps/MAXIT) and additionally checked for sequence consistency, ligand atom naming, and correct position relative to the designated target map. Models were then evaluated as described below (see Model evaluation system).
In mid-April 2021, models, workflows and initial calculated scores were made publicly available for evaluation, blinded to modeler team identity and software used. In the period mid-April to mid-May, evaluators discovered several problems with the submitted models that blocked assessment software from completing calculations. The primary issue identified was inconsistent ligand atom naming. Approximately half of all submitted models had to be revised to make atom names consistent with the deposited reference models (see Challenge rule (2) above). Corrected coordinate files were provided by the submitting modeler teams, which were then re-processed as described above and re-released to evaluators.
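The kind of atom-naming inconsistency described above can be caught before submission with a simple set comparison. This is a sketch only: it assumes the ligand atom names have already been extracted from the reference and submitted coordinate files, and the atom names in the example are illustrative rather than the actual naming of any Challenge ligand.

```python
def atom_name_mismatches(reference_names, model_names):
    """Compare ligand atom names in a submitted model against the reference.

    Returns names present only in the reference ('missing') and names
    present only in the model ('unexpected'), each sorted alphabetically.
    """
    ref = set(reference_names)
    mod = set(model_names)
    return {
        "missing": sorted(ref - mod),
        "unexpected": sorted(mod - ref),
    }

# Illustrative names: the model uses a primed name where the reference does not.
reference_ligand = ["C1", "C2", "O5", "S1"]
submitted_ligand = ["C1", "C2", "O5'", "S1"]
report = atom_name_mismatches(reference_ligand, submitted_ligand)
```

An empty `missing` and `unexpected` list indicates that the names agree; anything else points to atoms that need renaming or that are absent from the model.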
A virtual 3-day (~4 hours/day) workshop was held in mid-July 2021 to review the Challenge results. All modeling participants were invited to attend remotely and present overviews of their modeling processes and/or assessment strategies. Recommendations were made for additional evaluations of the submitted models as well as for future challenges. Modeler teams, workflows and software were unblinded during the workshop.
Data sources and Modeling
Target maps were obtained from EM Data Bank47. Target 1 E. coli β-Galactosidase/PETG12: EMD-7770, Target 2 SARS-CoV-2 RNA-dependent RNA polymerase/Remdesivir13: EMD-30210, Target 3 SARS-CoV-2 ORF3a putative ion channel/phospholipid in nanodisc15: EMD-22898.
Table 1 summarizes the approach and lists the software used by each modeling team. Further details for each model can be found in Supplement S2. Modeling teams categorized their polymer modeling type as either ab initio (followed by optimization), optimized, or not optimized. Non-ab initio approaches made use of polymer coordinates from the following PDB entries. Target 1: 6cvm, 1jz7, 6tte. Target 2: 7bv2, 7b3d, 6x71, 3ovb. Target 3: 7kjr.
Submitted models were further categorized by ligand modeling type, either independently refit or optimized. Initial ligand coordinates and restraints were obtained from the PDB Chemical Component Dictionary (CCD)48, Crystallography Open Database (COD)49, or from a PDB entry. Ligand restraint generation software included BUSTER Grade (Global Phasing Ltd., Cambridge, UK), Phenix eLBOW50, CCP4 AceDRG51, PyRosetta52, AMBER Antechamber53, OpenBabel54, CHARMM CGenFF55, LigPrep (Schrödinger LLC, New York, USA), and CCP4 monomer library56. Restraints were not applied by teams using MD-based approaches.
Ab initio modeling software included ARP/wARP57, Mainmast58, Mainmastseg59, Pathwalker60, Rosetta61, Modeller62, and DeepTracer63,64. Model optimization software included CDMD65, Phenix22, REFMAC66, Servalcat67, ProSMART68, MDFF69, CryoFold70, Amber53, MELD71,72, and Schrödinger (Schrödinger LLC, New York, USA). The program doubleHelix73 was used to assign RNA sequence and refinement restraints. Atomic displacement parameters (B-factors) were optimized for 32 of 61 models, with 23 applying individual atomic B-factors.
Participants made use of VMD74, Chimera75, ChimeraX76, Coot26, ISOLDE77, EMDA78 and PyMOL for visual evaluation and/or manual model improvement of map-model fit. Manipulation of map densities was carried out using CCP-EM79, EMDA, and LAFTER80.
Model evaluation system
The evaluation pipeline for the 2021 challenge (model-compare.emdataresource.org) was built on the 2019 Model Challenge pipeline8,16. Submitted models were evaluated for >70 individual metrics in four established tracks: Fit-to-Map, Coordinates-only, Comparison-to-Reference and Comparison-among-Models, plus a new Ligand track, created for comparison of ligand-specific scores (see Supplementary Data S3). Ligand and Nucleic-acid specific scores provided by Assessor teams (Table 2) were integrated into data tables alongside scores from the evaluation pipeline to enable comparisons and composite score generation.
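This section does not specify how composite scores were assembled from the data tables. As one illustration only, the sketch below averages per-metric z-scores across models, flipping the sign of metrics for which lower values are better. The metric names, score values, and equal weighting are assumptions made for the example and do not describe the Challenge procedure.

```python
from statistics import mean, pstdev


def z_scores(values, higher_is_better=True):
    """Standardize one metric across models; flip sign if lower is better."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All models scored identically, so the metric carries no ranking signal.
        return [0.0 for _ in values]
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma for v in values]


def composite_scores(table, directions):
    """Average the per-metric z-scores for each model.

    table: {metric_name: [score for each model]}
    directions: {metric_name: True if higher is better, else False}
    """
    per_metric = [z_scores(table[m], directions[m]) for m in table]
    n_models = len(next(iter(table.values())))
    return [mean(col[i] for col in per_metric) for i in range(n_models)]


# Hypothetical scores for three models, for illustration only.
table = {"fit_to_map": [0.8, 0.6, 0.7], "strain_kcal_mol": [5.0, 15.0, 10.0]}
directions = {"fit_to_map": True, "strain_kcal_mol": False}
composite = composite_scores(table, directions)
print(composite)
```

Standardizing each metric before averaging keeps scores on different scales (for example a correlation and an energy) from dominating one another.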
Pharmacophore Modeling
The Molecular Operating Environment platform (MOE) was used to score the placement of ligands. Starting from the model coordinates submitted by each group, the MOE QuickPrep application was used to prepare all-atom structures with hydrogens and atomic partial charges. For each target, an ensemble of structures consisting of all submitted models was input into the db_AutoPH4 application to produce pharmacophore consensus fields based on the ensemble. The pharmacophore consensus fields were then used to score the ligand poses of each submission. Additional details are provided in Supplementary Data S4.
Strain energy calculations
Preparation: ligands were extracted from model files. For the T2 F86 ligand, strain energy was measured after deleting the covalent bond to the RNA polymer (SMILES:Nc(ncn1)c2n1c([C@]3(C#N)O[C@@H]([C@H]([C@H]3O)O)COP([O-])([O-])=O)cc2). For the T3 PEE ligand, all models were truncated to just the head group (SMILES:CCC(OC[C@@H](OC(CC)=O)CO[P@]([O-])(OCC[NH3+])=O)=O). Hydrogens were added using MOE/Protonate3D from the Chemical Computing Group.
Molecular Mechanics (MM) Forcefield Strain Energy: predicted ligand energy was calculated by minimizing each ligand structure using OpenEye/SZYBKI (MMFF94S with Sheffield solvation model) with a maximum RMSD of 0.6 Å. Predicted global minimum energy was identified by sampling conformations using OpenEye/Omega and then minimizing each conformer structure using OpenEye/SZYBKI (MMFF94S with Sheffield solvation model) with no restraints, then selecting the conformer with the lowest minimized energy.
Neural Net Potential (NNP) Energy: predicted ligand energy was calculated by minimizing each ligand structure in an in-house implementation of the ANI neural net potential37 with a maximum RMSD of 0.6 Å. Predicted global minimum energy was identified by sampling conformations using OpenEye/Omega and then minimizing each conformer structure using the same in-house implementation of the ANI neural net potential with no restraints.
Reported scores are predicted strain energy as (predicted ligand energy - global minimum energy) in kcal/mol. NNP was only calculated for the T1 ligand as the method currently does not support atomic charges.
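The arithmetic of the reported score can be written out as a short sketch. It assumes the energies have already been computed by the minimization and conformer-sampling steps described above; the function name and the numerical values are hypothetical.

```python
def strain_energy(ligand_energy, conformer_energies):
    """Return predicted strain energy in kcal/mol.

    ligand_energy: energy of the modeled ligand after restrained minimization.
    conformer_energies: minimized energies of the sampled conformers; the
    lowest of these is taken as the predicted global minimum.
    """
    if not conformer_energies:
        raise ValueError("need at least one sampled conformer energy")
    global_minimum = min(conformer_energies)
    return ligand_energy - global_minimum


# Hypothetical energies in kcal/mol, for illustration only.
conformers = [-41.2, -44.0, -43.1]
strain = strain_energy(-38.5, conformers)
print(f"predicted strain: {strain:.1f} kcal/mol")
```

A value near zero indicates that the modeled pose is close in energy to the best conformer found, while a large positive value flags a strained pose.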
Molecular Graphics
Molecular graphics images were generated using UCSF Chimera (Figures 2, 5, Extended Data Figure 1).
Classification of unique ligands in PDB introduced by Cryo-EM
A search of the Protein Data Bank via RCSB PDB’s data API81 identified 981 unique non-polymer ligands/PDB Chemical Component Dictionary (CCD) IDs in EM-derived PDB structures released through December 2021. Next, for each ligand, the PDB entry that first introduced the ligand/CCD ID was identified. The 403 unique non-polymer ligands that were found to be introduced in structures determined by cryo-EM were then manually classified as enzyme modulators (substrates, inhibitors, agonists, co-factors), medically relevant drugs, lipids, photochemicals (e.g., carotenoids), peptides (amino-acid-based), reagents (buffers or labels), nucleotides, or steroids (fused rings).
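The first-introduction step can be sketched as follows, assuming a table of (CCD ID, PDB entry, release date, experimental method) records has already been retrieved. The record layout, the "EM" and "XRAY" method labels, and the example identifiers are hypothetical, and no RCSB PDB API call is shown.

```python
from datetime import date


def ligands_introduced_by_em(records):
    """Return sorted CCD IDs whose earliest-released entry is an EM structure.

    records: iterable of (ccd_id, pdb_id, release_date, method) tuples.
    """
    first = {}
    for ccd_id, pdb_id, released, method in records:
        # Keep only the earliest-released entry seen for each ligand.
        if ccd_id not in first or released < first[ccd_id][0]:
            first[ccd_id] = (released, pdb_id, method)
    return sorted(c for c, (_, _, method) in first.items() if method == "EM")


# Hypothetical records, for illustration only.
records = [
    ("LIG1", "ent1", date(2019, 5, 1), "XRAY"),
    ("LIG1", "ent2", date(2020, 7, 1), "EM"),
    ("LIG2", "ent3", date(2021, 3, 1), "EM"),
    ("LIG2", "ent4", date(2021, 9, 1), "XRAY"),
]
introduced = ligands_introduced_by_em(records)
print(introduced)
```

In this example LIG1 appears in an EM structure but was first released with a crystal structure, so only LIG2 counts as introduced by cryo-EM.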
Extended Data
Acknowledgements
EMDataResource (CLL, AK, GP, HMB, WC) was supported by the US National Institutes of Health (NIH)/National Institute of General Medical Sciences (R01GM079429).
The following additional grants are acknowledged for participant support.
JSR, CJW, VBC, and DCR acknowledge support from the US National Institutes of Health (P01GM063210, R01GM073919, R35GM131883).
NWM, PVA, CJS and OVS gratefully acknowledge the financial support of NIH/NIGMS through grants P01GM063210, R01GM071939, R24GM141254 and the Phenix Industrial Consortium. This work was supported in part by the US Department of Energy under Contract DE-AC02-05CH11231.
JC and BS acknowledge support to the Institute of Biotechnology of the Czech Academy of Sciences RVO 86652036.
RJR was supported by a Principal Research Fellowship from the Wellcome Trust (209407/Z/17/Z).
DK acknowledges support from the US National Institutes of Health (R01GM133840).
AP acknowledges support from an NSF-CAREER award (CHE-2235785).
HG and MI acknowledge support by the German Science Foundation (DFG, RTG 2756).
JC and NG acknowledge the support from the US National Institutes of Health (grant #: R01GM146340).
CH and W-CK were supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy (CIBSS – EXC-2189 – Project ID 390939984).
SCV was supported by the American Leprosy Missions (grant number G88726) at the Department of Biochemistry, University of Cambridge.
PSB was supported by the Biotechnology and Biological Sciences Research Council (grant number BB/S005099/1).
SWH was supported by the Biotechnology and Biological Sciences Research Council (grant number BB/T012935/1).
SM acknowledges support from SERB (Project No. CRG/2022/002761).
CMP, TB, APJ and MW were supported by the Medical Research Council (grant MR/V000403/1).
FD and AMuenks acknowledge support from the National Institute of General Medical Sciences (1R01GM123089-01). AMuenks acknowledges support by the NSF Graduate Research Fellowship (DGE-1762114).
Author Contributions: HB and WC conceived the project; CL and AK organized the Challenge with the assessors and modelers; GP and MFS assisted in the analysis.
Footnotes
Competing Interests
S. Noreng, A. Gobbi, A. Rohou, B. Sellers and Y. Yang are current or former employees of Genentech. The remaining authors declare no competing interests.
Supplementary Files
Supplementary Information
S1: Ligand Challenge Model Submission Statistics and Form (.pdf)
S2: Ligand Challenge Submitted Model Metadata (.xlsx)
S3: Ligand Challenge Scores (.xlsx)
S4: MOE Pharmacophore Assessment Summary (.pdf)
References
- 1. Adams P. D. et al. Outcome of the First wwPDB/CCDC/D3R Ligand Validation Workshop. Structure 24, 502–508 (2016).
- 2. Gore S. et al. Validation of Structures in the Protein Data Bank. Structure 25, 1916–1927 (2017).
- 3. Smart O. S. et al. Validation of ligands in macromolecular structures determined by X-ray crystallography. Acta Crystallogr D Struct Biol 74, 228–236 (2018).
- 4. Feng Z. et al. Enhanced validation of small-molecule ligands and carbohydrates in the Protein Data Bank. Structure 29, 393–400.e1 (2021).
- 5. Lawson C. L., Berman H. M. & Chiu W. Evolving data standards for cryo-EM structures. Struct Dyn 7, 014701 (2020).
- 6. Lawson C. L. & Chiu W. Comparing cryo-EM structures. J. Struct. Biol. 204, 523–526 (2018).
- 7. Barad B. A. et al. EMRinger: side chain-directed model and map validation for 3D cryo-electron microscopy. Nat. Methods 12, 943–946 (2015).
- 8. Lawson C. L. et al. Cryo-EM model validation recommendations based on outcomes of the 2019 EMDataResource challenge. Nat. Methods 18, 156–164 (2021).
- 9. Williams C. J. et al. MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315 (2018).
- 10. Pintilie G. et al. Measurement of atom resolvability in cryo-EM maps with Q-scores. Nat. Methods 17, 328–334 (2020).
- 11. Wang Z., Patwardhan A. & Kleywegt G. J. Validation analysis of EMDB entries. Acta Crystallogr D Struct Biol 78, 542–552 (2022).
- 12. Bartesaghi A. et al. Atomic Resolution Cryo-EM Structure of β-Galactosidase. Structure 26, 848–856.e3 (2018).
- 13. Yin W. et al. Structural basis for inhibition of the RNA-dependent RNA polymerase from SARS-CoV-2 by remdesivir. Science 368, 1499–1504 (2020).
- 14. Kokic G. et al. Mechanism of SARS-CoV-2 polymerase stalling by remdesivir. Nat. Commun. 12, 279 (2021).
- 15. Kern D. M. et al. Cryo-EM structure of SARS-CoV-2 ORF3a in lipid nanodiscs. Nat. Struct. Mol. Biol. 28, 573–582 (2021).
- 16. Kryshtafovych A., Adams P. D., Lawson C. L. & Chiu W. Evaluation system and web infrastructure for the second cryo-EM model challenge. J. Struct. Biol. 204, 96–108 (2018).
- 17. Rosenthal P. B. & Henderson R. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryo-microscopy. J. Mol. Biol. 333, 721–745 (2003).
- 18. Lagerstedt I. et al. Web-based visualisation and analysis of 3D electron-microscopy data from EMDB and PDB. J. Struct. Biol. 184, 173–181 (2013).
- 19. Joseph A. P., Lagerstedt I., Patwardhan A., Topf M. & Winn M. Improved metrics for comparing structures of macromolecular assemblies determined by 3D electron-microscopy. J. Struct. Biol. 199, 12–26 (2017).
- 20. Afonine P. V. et al. New tools for the analysis and validation of cryo-EM maps and atomic models. Acta Crystallogr D Struct Biol 74, 814–840 (2018).
- 21. Chen V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 66, 12–21 (2010).
- 22. Liebschner D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr D Struct Biol 75, 861–877 (2019).
- 23. Kryshtafovych A. et al. Challenging the state of the art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10. Proteins 82 Suppl 2, 26–42 (2014).
- 24. Bruno I. J. et al. Retrieval of crystallographically-derived molecular geometry information. J. Chem. Inf. Comput. Sci. 44, 2133–2144 (2004).
- 25. Shao C. et al. Simplified quality assessment for small-molecule ligands in the Protein Data Bank. Structure 30, 252–262.e4 (2022).
- 26. Casañal A., Lohkamp B. & Emsley P. Current developments in Coot for macromolecular model building of Electron Cryo-microscopy and Crystallographic Data. Protein Sci. 29, 1069–1078 (2020).
- 27. Nicholls R. A. et al. Modelling covalent linkages in CCP4. Acta Crystallogr D Struct Biol 77, 712–726 (2021).
- 28. Černý J., Božíková P., Svoboda J. & Schneider B. A unified dinucleotide alphabet describing both RNA and DNA structures. Nucleic Acids Res. 48, 6367–6381 (2020).
- 29. Černý J. et al. Structural alphabets for conformational analysis of nucleic acids available at dnatco.datmos.org. Acta Crystallogr D Struct Biol 76, 805–813 (2020).
- 30. Biedermannová L. & Schneider B. Structure of the ordered hydration of amino acids in proteins: analysis of crystal structures. Acta Crystallogr. D Biol. Crystallogr. 71, 2192–2202 (2015).
- 31. Černý J., Schneider B. & Biedermannová L. WatAA: Atlas of Protein Hydration. Exploring synergies between data mining and ab initio calculations. Phys. Chem. Chem. Phys. 19, 17094–17102 (2017).
- 32. Prisant M. G., Williams C. J., Chen V. B., Richardson J. S. & Richardson D. C. New tools in MolProbity validation: CaBLAM for Cryo-EM backbone, UnDowser to rethink ‘waters,’ and NGL Viewer to recapture online 3D graphics. Protein Sci. 29, 315–329 (2020).
- 33. Jiang S., Feher M., Williams C., Cole B. & Shaw D. E. AutoPH4: An Automated Method for Generating Pharmacophore Models from Protein Binding Pockets. J. Chem. Inf. Model. 60, 4326–4338 (2020).
- 34. Tyagi R., Singh A., Chaudhary K. K. & Yadav M. K. Chapter 17 - Pharmacophore modeling and its applications. in Bioinformatics (eds. Singh D. B. & Pathak R. K.) 269–289 (Academic Press, 2022).
- 35. Sellers B. D., James N. C. & Gobbi A. A Comparison of Quantum and Molecular Mechanical Methods to Estimate Strain Energy in Druglike Fragments. J. Chem. Inf. Model. 57, 1265–1275 (2017).
- 36. Lee M.-L. et al. chemalot and chemalot_knime: Command line programs as workflow tools for drug discovery. J. Cheminform. 9, 38 (2017).
- 37. Smith J. S., Isayev O. & Roitberg A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
- 38. Croll T. I., Williams C. J., Chen V. B., Richardson D. C. & Richardson J. S. Improving SARS-CoV-2 structures: Peer review by early coordinate release. Biophys. J. 120, 1085–1096 (2021).
- 39. Modi V., Xu Q., Adhikari S. & Dunbrack R. L. Jr. Assessment of template-based modeling of protein structure in CASP11. Proteins 84 Suppl 1, 200–220 (2016).
- 40. Zhang K. et al. Cryo-EM structure of a 40 kDa SAM-IV riboswitch RNA at 3.7 Å resolution. Nat. Commun. 10, 5511 (2019).
- 41. Su Z. et al. Cryo-EM structures of full-length Tetrahymena ribozyme at 3.1 Å resolution. Nature 596, 603–607 (2021).
- 42. Lawson C. L., Berman H. M., Chen L., Vallat B. & Zirbel C. L. The Nucleic Acid Knowledgebase: a new portal for 3D structural information about nucleic acids. Nucleic Acids Res. (2023) doi: 10.1093/nar/gkad957.
- 43. Sun S. Y. et al. Cryo-ET of parasites gives subnanometer insight into tubulin-based structures. Proc. Natl. Acad. Sci. U. S. A. 119, (2022).
- 44. Liu H.-F. et al. nextPYP: a comprehensive and scalable platform for characterizing protein variability in situ using single-particle cryo-electron tomography. Nat. Methods (2023) doi: 10.1038/s41592-023-02045-0.
- 45. Chmielewski D. et al. Integrated analyses reveal a hinge glycan regulates coronavirus spike tilting and virus infectivity. Res Sq (2023) doi: 10.21203/rs.3.rs-2553619/v1.
- 46. Yang H. et al. Automated and accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank. Acta Crystallogr. D Biol. Crystallogr. 60, 1833–1839 (2004).
- 47. wwPDB Consortium. EMDB-the Electron Microscopy Data Bank. Nucleic Acids Res. (2023) doi: 10.1093/nar/gkad1019.
- 48. Westbrook J. D. et al. The chemical component dictionary: complete descriptions of constituent molecules in experimentally determined 3D macromolecules in the Protein Data Bank. Bioinformatics 31, 1274–1278 (2015).
- 49. Gražulis S. et al. Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Res. 40, D420–7 (2012).
- 50. Moriarty N. W., Grosse-Kunstleve R. W. & Adams P. D. electronic Ligand Builder and Optimization Workbench (eLBOW): a tool for ligand coordinate and restraint generation. Acta Crystallogr. D Biol. Crystallogr. 65, 1074–1080 (2009).
- 51. Nicholls R. A. et al. The missing link: covalent linkages in structural models. Acta Crystallogr D Struct Biol 77, 727–745 (2021).
- 52. Chaudhury S., Lyskov S. & Gray J. J. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 26, 689–691 (2010).
- 53. Wang J., Wolf R. M., Caldwell J. W., Kollman P. A. & Case D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174 (2004).
- 54. O’Boyle N. M. et al. Open Babel: An open chemical toolbox. J. Cheminform. 3, 33 (2011).
- 55. Vanommeslaeghe K. et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 31, 671–690 (2010).
- 56. Vagin A. A. et al. REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use. Acta Crystallogr. D Biol. Crystallogr. 60, 2184–2195 (2004).
- 57. Chojnowski G., Sobolev E., Heuser P. & Lamzin V. S. The accuracy of protein models automatically built into cryo-EM maps with ARP/wARP. Acta Crystallogr D Struct Biol 77, 142–150 (2021).
- 58. Terashi G. & Kihara D. De novo main-chain modeling for EM maps using MAINMAST. Nat. Commun. 9, 1618 (2018).
- 59. Terashi G., Kagaya Y. & Kihara D. MAINMASTseg: Automated Map Segmentation Method for Cryo-EM Density Maps with Symmetry. J. Chem. Inf. Model. 60, 2634–2643 (2020).
- 60. Chen M. & Baker M. L. Automation and assessment of de novo modeling with Pathwalking in near atomic resolution cryo-EM density maps. J. Struct. Biol. 204, 555–563 (2018).
- 61. DiMaio F., Tyka M. D., Baker M. L., Chiu W. & Baker D. Refinement of protein structures into low-resolution density maps using rosetta. J. Mol. Biol. 392, 181–190 (2009).
- 62. Webb B. & Sali A. Protein structure modeling with MODELLER. Methods Mol. Biol. 1137, 1–15 (2014).
- 63. Si D. et al. Deep Learning to Predict Protein Backbone Structure from High-Resolution Cryo-EM Density Maps. Sci. Rep. 10, 4282 (2020).
- 64. Pfab J., Phan N. M. & Si D. DeepTracer for fast de novo cryo-EM protein structure modeling and special studies on CoV-related complexes. Proc. Natl. Acad. Sci. U. S. A. 118, (2021).
- 65. Igaev M., Kutzner C., Bock L. V., Vaiana A. C. & Grubmüller H. Automated cryo-EM structure refinement using correlation-driven molecular dynamics. Elife 8, (2019).
- 66. Brown A. et al. Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions. Acta Crystallogr. D Biol. Crystallogr. 71, 136–153 (2015).
- 67. Yamashita K., Palmer C. M., Burnley T. & Murshudov G. N. Cryo-EM single-particle structure refinement and map calculation using Servalcat. Acta Crystallogr D Struct Biol 77, 1282–1291 (2021).
- 68. Nicholls R. A., Fischer M., McNicholas S. & Murshudov G. N. Conformation-independent structural comparison of macromolecules with ProSMART. Acta Crystallogr. D Biol. Crystallogr. 70, 2487–2499 (2014).
- 69. Singharoy A. et al. Molecular dynamics-based refinement and validation for sub-5 Å cryo-electron microscopy maps. Elife 5, (2016).
- 70. Shekhar M. et al. CryoFold: determining protein structures and data-guided ensembles from cryo-EM density maps. Matter 4, 3195–3216 (2021).
- 71. MacCallum J. L., Perez A. & Dill K. A. Determining protein structures by combining semireliable data with atomistic physical models by Bayesian inference. Proc. Natl. Acad. Sci. U. S. A. 112, 6985–6990 (2015).
- 72. Perez A., MacCallum J. L. & Dill K. A. Accelerating molecular simulations of proteins using Bayesian inference on weak information. Proc. Natl. Acad. Sci. U. S. A. 112, 11846–11851 (2015).
- 73. Chojnowski G. DoubleHelix: nucleic acid sequence identification, assignment and validation tool for cryo-EM and crystal structure models. Nucleic Acids Res. 51, 8255–8269 (2023).
- 74. Hsin J., Arkhipov A., Yin Y., Stone J. E. & Schulten K. Using VMD: an introductory tutorial. Curr. Protoc. Bioinformatics Chapter 5, Unit 5.7 (2008).
- 75. Pettersen E. F. et al. UCSF Chimera--a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
- 76. Goddard T. D. et al. UCSF ChimeraX: Meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).
- 77. Croll T. I. ISOLDE: a physically realistic environment for model building into low-resolution electron-density maps. Acta Crystallogr D Struct Biol 74, 519–530 (2018).
- 78. Warshamanage R., Yamashita K. & Murshudov G. N. EMDA: A Python package for Electron Microscopy Data Analysis. J. Struct. Biol. 214, 107826 (2022).
- 79. Burnley T., Palmer C. M. & Winn M. Recent developments in the CCP-EM software suite. Acta Crystallogr D Struct Biol 73, 469–477 (2017).
- 80. Ramlaul K., Palmer C. M. & Aylett C. H. S. A Local Agreement Filtering Algorithm for Transmission EM Reconstructions. J. Struct. Biol. 205, 30–40 (2019).
- 81. Rose Y. et al. RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive. J. Mol. Biol. 433, 166704 (2021).
- 82. Burley S. K. et al. Electron microscopy holdings of the Protein Data Bank: the impact of the resolution revolution, new validation tools, and implications for the future. Biophys. Rev. 14, 1281–1301 (2022).
- 83. Chen V. B., Davis I. W. & Richardson D. C. KING (Kinemage, Next Generation): a versatile interactive molecular and scientific visualization program. Protein Sci. 18, 2403–2409 (2009).