Abstract
Structures of biomolecular systems are increasingly computed by integrative modeling. In this approach, a structural model is constructed by combining information from multiple sources, including varied experimental methods and prior models. In 2019, a Workshop was held as a Biophysical Society Satellite Meeting to assess progress and discuss further requirements for archiving integrative structures. The primary goal of the Workshop was to build consensus for addressing the challenges involved in creating common data standards, building methods for federated data exchange, and developing mechanisms for validating integrative structures. The summary of the Workshop and the recommendations that emerged are presented here.
1. Introduction
When the Protein Data Bank (PDB) (Protein Data Bank, 1971) was first established in 1971, X-ray crystallography (X-ray) was the only method for determining three-dimensional structures of biological macromolecules at sufficient resolution to build atomic models. A decade later, structures of biomolecules in solution could also be determined by Nuclear Magnetic Resonance (NMR) spectroscopy (Williamson et al., 1985). Recently, three-dimensional cryo-electron microscopy (3DEM) (Henderson et al., 1990) began to achieve unprecedented near-atomic resolution for large complex assemblies. Increasingly, investigators are also modeling structures based on data from more than one method (Rout and Sali, 2019). These integrative/hybrid approaches to structure determination consist of collecting information about a system using multiple experimental and computational methods, followed by integrative/hybrid modeling that converts this information into integrative/hybrid structure models. For succinctness, we will use the term integrative hereafter to refer to integrative/hybrid approaches, modeling, and models.
The PDB has established a data processing pipeline for depositing, validating, archiving, and disseminating structures determined by single methods, and to a limited extent structures based on data from two different experimental methods. Examples of the latter include structures derived from a combination of X-ray plus neutron diffraction data, NMR or X-ray plus Small Angle Scattering (SAS) data. However, the processing of structures produced by integrating data from many different methods and/or those depicted by non-atomic, coarse-grained representations, poses a greater challenge. Given the importance of integrative structures for advancing biological sciences and the significant investment made to determine them, the Worldwide Protein Data Bank (wwPDB) (Berman et al., 2003) initiated an effort to address the key challenges in enhancing its data processing pipeline to accommodate integrative structures.
In 2014, the wwPDB convened an Integrative/Hybrid Methods (IHM) Task Force and sponsored a workshop held at the European Bioinformatics Institute (EBI). The purpose of the workshop was to engage a community of experts to make recommendations for how to responsibly archive integrative structures. The five recommendations formulated by the workshop participants were:
In addition to archiving the models themselves, all relevant experimental data and metadata as well as experimental and computational protocols should be archived; inclusivity is key.
A flexible model representation needs to be developed, allowing for multi-scale models, multi-state models, ensembles of models, and models related by time or other order.
Procedures for estimating the uncertainty of integrative models should be developed, validated, and adopted.
A federated system of model and data archives should be created.
Publication standards for integrative models should be established.
A Whitepaper was published (Sali et al., 2015), and two working groups were established; the Federation Working Group was to address the issues of data federation (Figure 1) and the Model Working Group was tasked with helping set up the framework for model representation, validation, and visualization.
Figure 1. Illustration of federating structural models and experimental data.
At the center are the three structural biology model repositories: the PDB archive of experimentally determined structures of macromolecules (wwPDB consortium, 2019); the ModelArchive of in silico structural models (www.modelarchive.org); and the PDB-Dev prototype system for archiving integrative structures (Burley et al., 2017; Vallat et al., 2018). The outer circle indicates experimental data that contribute to integrative structural biology. Existing data exchange mechanisms for X-ray, NMR, 3DEM, and SAS data are represented by black arrows. Ongoing and future projects aim to develop methods for data exchange with archives for other types of experimental data as well as among the existing structural model repositories (gray arrows).
Over the last five years, steady progress has been made in implementing the IHM Task Force recommendations. Members of the Federation and Model Working Groups have met periodically in person and via video conferencing. One key challenge has been to develop common data standards for describing the multiple experimental and computational methods used to produce integrative structures. Thus, the PDB exchange/Macromolecular Crystallographic Information File (PDBx/mmCIF) dictionary (Fitzgerald et al., 2005; Westbrook, 2013) for describing structures has been extended to include the terms necessary for representing and archiving integrative structures (Vallat et al., 2018). Software support for these dictionary extensions has been developed, including software tools for visualizing integrative structures (Goddard et al., 2018) and a prototype archiving system called PDB-Dev (pdb-dev.wwpdb.org) (Burley et al., 2017; Vallat et al., 2018; Vallat et al., 2019). Mechanisms that facilitate data exchange among archives (e.g., transfer of restraints from an experimental data archive to a structure archive) are being developed, as are methods for validating integrative structures.
The wwPDB has proposed a governance structure for structural biology archives. These archives include Core Archives, currently the Protein Data Bank (PDB) (wwPDB consortium, 2019) and the Biological Magnetic Resonance Data Bank (BMRB) (Ulrich et al., 2008), as well as Federated Resources that participate in data exchange with the Core Archives. The Electron Microscopy Data Bank (EMDB) (Tagari et al., 2002) is proposed to become a Core Archive in the near future. Federated Resources expected to align with the wwPDB in 2019 include the Small Angle Scattering Biological Data Bank (SASBDB) (Valentini et al., 2015) and the Electron Microscopy Public Image Archive (EMPIAR) (Iudin et al., 2016). A proof-of-concept software system for bidirectional data exchange between SASBDB and the PDB is under development.
In 2019, a Workshop was held as a Biophysical Society (BPS) Satellite Meeting to assess progress and discuss further requirements for archiving integrative structures. The primary goal of the Workshop was to build consensus for addressing the challenges involved in creating common data standards, building methods for federated data exchange, and developing mechanisms for validating integrative structures. This goal is aligned with the “FAIR” (Findable, Accessible, Interoperable and Reusable) guiding principles of scientific data management (Wilkinson et al., 2016). The summary of the Workshop and the recommendations that emerged are presented here.
2. Progress on archiving integrative structures
2.1. Archiving requirements
The requirements for archiving integrative structures include: (1) creating standard definitions for the experimental data used for structure determination and the structural features of the models; (2) developing methods for curation and validation of models and data; and (3) building the infrastructure for acquiring, archiving, and disseminating the models and the data. Because integrative structures are based on data derived from multiple experimental methods, the wwPDB IHM Task Force came up with the concept of Federated Resources, wherein structural models and experimental data could be seamlessly exchanged. Within the Federation model, expert communities are responsible for the creation of data standards in their respective areas. Experts in multiple domains contribute to multiple resources and provide coordination on common data standards among resources. The development of well-aligned data standards and efficient methods for data exchange among the different repositories as well as software platforms are key prerequisites for an effective Federation. An integrated Federated system will provide a mechanism for archiving the experimental data, structural models, and associated metadata, such as citations, software, authors, workflows, sample, and data and model quality metrics. Furthermore, the availability of experimental data used for building structural models will facilitate the development of methods for building and validating integrative structures.
2.2. Molecular representation of integrative structures
One of the fundamental requirements for all operations involving integrative structures, including computing, archiving, validating, visualizing, disseminating, and analyzing, is the creation of standards for representing these models. Integrative structures are often computed for large conformationally and compositionally heterogeneous systems, based on relatively sparse and potentially low-resolution datasets. Thus, a molecular representation of ensembles of multi-scale and/or multi-state structures is required. The first version of the prototype archiving system for integrative structures (Vallat et al., 2018) adopted the molecular representation developed as part of the open source Integrative Modeling Platform (IMP) program (Russel et al., 2012).
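The representation requirements described above can be made concrete with a small data-structure sketch. The class and field names below (`Atom`, `Bead`, `Model`, `State`) are hypothetical and chosen for illustration only; they are not the IMP or IHM Dictionary data model. The sketch only shows how atomic and coarse-grained particles might coexist in one model, and how models might be grouped into states.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Atom:
    """Atomic-resolution particle: one atom of one residue."""
    chain: str
    residue: int
    name: str
    xyz: Tuple[float, float, float]


@dataclass
class Bead:
    """Coarse-grained particle covering an inclusive residue range."""
    chain: str
    first_residue: int
    last_residue: int
    xyz: Tuple[float, float, float]
    radius: float


@dataclass
class Model:
    """One structure; it may mix atoms and beads (multi-scale)."""
    atoms: List[Atom] = field(default_factory=list)
    beads: List[Bead] = field(default_factory=list)

    def covered_residues(self, chain):
        """Return the set of residue numbers represented at any scale."""
        covered = {a.residue for a in self.atoms if a.chain == chain}
        for b in self.beads:
            if b.chain == chain:
                covered.update(range(b.first_residue, b.last_residue + 1))
        return covered


@dataclass
class State:
    """One conformational or compositional state, held as an ensemble of models."""
    name: str
    models: List[Model] = field(default_factory=list)


# Two residues at atomic detail plus one bead standing in for residues 3-12.
model = Model(
    atoms=[Atom("A", 1, "CA", (0.0, 0.0, 0.0)),
           Atom("A", 2, "CA", (3.8, 0.0, 0.0))],
    beads=[Bead("A", 3, 12, (10.0, 0.0, 0.0), radius=8.0)],
)
states = [State("open", [model]), State("closed", [])]
n_covered = len(model.covered_residues("A"))
```

Keeping the residue range on each bead is one possible way to let a validation or visualization tool map a coarse-grained particle back to sequence, whatever the scale of the rest of the model.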
2.3. An extensible standard dictionary of terms
During the 2000s, the wwPDB transitioned from using the PDB Format (Callaway et al., 1996) to the mmCIF data representation (Fitzgerald et al., 2005; Westbrook, 2013) for archiving structural models. The PDBx/mmCIF standard provides a rich framework for defining macromolecular components, small-molecule ligands, polymeric sequences, and atomic coordinates. The PDBx/mmCIF data representation was extended by adding terms to accommodate the expanded molecular representation for integrative structures (Section 2.2) and the many experimental and computational methods used to determine them. These additional definitions are maintained as an extension dictionary called the IHM Dictionary (Vallat et al., 2018). The organization of the extension dictionary capturing these additional data definitions is depicted in Figure 2. Descriptions of starting structural models of the components used in integrative modeling of assemblies are also included, along with definitions of spatial restraints derived from multiple methods, including chemical crosslinking mass spectrometry (CX-MS), two-dimensional electron microscopy (2DEM), 3DEM, SAS, Förster resonance energy transfer (FRET), and electron paramagnetic resonance (EPR) spectroscopy. Generic methods for describing modeling workflows and for referencing data residing in external resources are also provided.
Figure 2. Depiction of the data content captured in the IHM Dictionary.
The green boxes represent existing external repositories that provide information referenced from the IHM Dictionary. Macromolecular sequence information is available from UniProt (The UniProt Consortium, 2017) and the International Nucleotide Sequence Database Collaboration (INSDC) (Nakamura et al., 2013); small-molecule chemical information is available from the Cambridge Crystallographic Data Center (CCDC) (Groom et al., 2016); macromolecular structures are archived in the PDB (wwPDB consortium, 2019), ModelArchive (www.modelarchive.org), and PDB-Dev (Burley et al., 2017; Vallat et al., 2018); and various types of experimental data are available from the PDB (wwPDB consortium, 2019), BMRB (Ulrich et al., 2008), EMDB (Tagari et al., 2002), and SASBDB (Valentini et al., 2015). The yellow boxes show the information derived from the repositories used in integrative modeling. The chemistry of the molecular components is already contained in the PDBx/mmCIF dictionary. The starting structural models derived from the structural data repositories and the spatial restraints derived from experimental methods are described in the IHM Dictionary. The orange box depicts the combination of multi-scale, multi-state, ordered ensembles whose representations are defined in the IHM Dictionary (Vallat et al., 2018).
A software library called python-ihm (github.com/ihmwg/python-ihm) has been built to support reading, writing, and managing data files compliant with the IHM Dictionary. The library can be used as a standalone package or as part of an integrative modeling package. The IMP modeling program (Russel et al., 2012) and the ChimeraX visualization software (Goddard et al., 2018) already use the python-ihm library to support the IHM Dictionary.
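To give a sense of what reading such files involves, the following is a minimal standard-library sketch that parses a single `loop_` block of an mmCIF-style file into rows. It is not the python-ihm API and handles far less than a real reader: it assumes one-line rows, ignores multi-line (semicolon-delimited) values, and will fail on unquoted values containing apostrophes. The function name `parse_loop` is hypothetical; the `_atom_site` item names used in the example are standard PDBx/mmCIF items.

```python
import shlex


def parse_loop(text):
    """Parse the first loop_ block into a list of {item_name: value} dicts.

    Simplified on purpose: one row per line, whitespace-separated values,
    optional single or double quotes around a value.
    """
    names, rows = [], []
    lines = iter(text.splitlines())
    # Skip everything up to the first loop_ keyword.
    for line in lines:
        if line.strip() == "loop_":
            break
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("_") and not rows:
            names.append(line)          # still reading the header
        elif line.startswith(("#", "_", "loop_", "data_")):
            break                       # end of this loop
        else:
            rows.append(dict(zip(names, shlex.split(line))))
    return rows


EXAMPLE = """\
data_example
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
1 CA MET A 1 10.000 2.500 -1.250
2 CA 'ALA' A 2 13.800 2.500 -1.250
#
"""

rows = parse_loop(EXAMPLE)
first_x = float(rows[0]["_atom_site.Cartn_x"])
```

Because the IHM Dictionary is an extension of PDBx/mmCIF, its categories use the same category-and-item layout, which is why a dedicated library is preferable to ad hoc parsing like this for anything beyond a quick inspection.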
2.4. PDB-Dev: a prototype archiving system for integrative structures
A prototype archiving system called PDB-Dev (Vallat et al., 2018) supporting integrative modeling was announced in 2017 (Burley et al., 2017). PDB-Dev (pdb-dev.wwpdb.org) currently contains ~35 structures and is growing rapidly. The structures in PDB-Dev range from small- and medium-size complexes (such as human Rev7 dimer (Rizzo et al., 2018), diubiquitin complex (Liu et al., 2018), 16S rRNA complexed with methyltransferase A (van Zundert et al., 2015), and human mitochondrial iron sulfur cluster core complex (Cai et al., 2018)), to large complexes (such as the yeast nuclear pore complex (Kim et al., 2018) and the RNF168-RING domain nucleosome complex (Horn et al., 2019)). The structures were determined based on data from experimental methods such as CX-MS, 2DEM, 3DEM, NMR, SAS, FRET, EPR, and other proteomics and biophysical techniques. Various modeling programs, such as IMP (Russel et al., 2012), HADDOCK (Dominguez et al., 2003; van Zundert et al., 2016), Rosetta (Leaver-Fay et al., 2011), XPLOR-NIH (Schwieters et al., 2018), TADbit (Trussart et al., 2015; Serra et al., 2017), iSPOT (Huang et al., 2016; Hsieh et al., 2017), FPS (Dimura et al., 2016), PatchDock (Schneidman-Duhovny et al., 2005), and BioEn (Hummer and Kofinger, 2015), have been used in building these structures.
2.5. A pipeline for deposition, curation, validation, visualization, and dissemination
Work is in progress to expand the PDB-Dev system into a pipeline that can handle deposition, curation, validation, and dissemination of integrative structures and associated data. A key objective is to integrate this PDB-Dev prototype into the wwPDB OneDep system (Young et al., 2017) (Figure 3) and the integrative structures into the PDB archive.
Figure 3. Schematic representation of the pipeline for archiving integrative structures (top panel) and the future wwPDB OneDep pipeline (bottom panel).
The blue boxes in the top panel show the past and ongoing development projects for archiving integrative structures. These projects include creation of the data representation, development of specific methods for annotation and validation of integrative structures, and creation of a prototype deposition and archiving system, called PDB-Dev (Vallat et al., 2018). The green boxes show current and future components of the wwPDB OneDep pipeline (Young et al., 2017). The methods developed for processing and archiving integrative structures in the top panel will be transferred into the wwPDB OneDep pipeline in the bottom panel to provide support for integrative structures within OneDep.
3. Resources for computing and visualizing integrative structures
A variety of resources and approaches for integrative modeling exist (Table 2 in (Rout and Sali, 2019)), including programs developed specifically for integrative modeling and scripts that exploit programs originally developed for other types of modeling. Several modeling programs used to compute integrative structures deposited in PDB-Dev and software tools used to visualize these structures are outlined below.
3.1. Integrative Modeling Platform (IMP)
IMP is an open-source software package that provides programmatic support for implementing and distributing integrative modeling protocols (Russel et al., 2012). Building a structural model is cast as a computational optimization problem, where knowledge about the modeled system can be used in five different ways, guided by maximizing the accuracy and precision of the model while remaining computationally feasible: (i) representing components of a model, (ii) scoring a model for its consistency with input information, (iii) searching for good-scoring models, (iv) filtering models based on input information, and (v) validating the resulting models (Rout and Sali, 2019). IMP is designed to allow mixing-and-matching of different molecular representations, scoring functions, and sampling schemes. It has been used mainly for structural modeling of macromolecular complexes by assembling subunits of known structure based on data from 3DEM, CX-MS, FRET, SAS, Hydrogen Deuterium Exchange Mass Spectrometry (HDX-MS), and various proteomics and bioinformatics methods. Integrative structures of several complexes determined using IMP have been deposited in PDB-Dev, including the nuclear pore complex (Kim et al., 2018) and various of its sub-complexes (Kim et al., 2014; Shi et al., 2014; Fernandez-Martinez et al., 2016; Upla et al., 2017), exosome (Shi et al., 2015), mediator (Robinson et al., 2015), 26S proteasome (Wang et al., 2017), complement C3(H2O) (Chen et al., 2016), and Pol II (G) (Jishage et al., 2018).
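The casting of modeling as an optimization problem can be illustrated with a toy example. The sketch below is not IMP code and uses none of its API; it is a hypothetical, standard-library illustration of three of the five stages (representation as points, scoring against upper-bound distance restraints, and Monte Carlo sampling). The function names, the harmonic penalty, and all parameter values are assumptions made for the example.

```python
import math
import random


def score(coords, restraints, k=1.0):
    """Sum of harmonic penalties for violated upper-bound distance restraints.

    Each restraint is (i, j, d_max): particles i and j should lie within d_max.
    """
    total = 0.0
    for i, j, d_max in restraints:
        excess = math.dist(coords[i], coords[j]) - d_max
        if excess > 0:
            total += k * excess ** 2
    return total


def sample(coords, restraints, steps=2000, step_size=1.0,
           temperature=1.0, seed=0):
    """Metropolis Monte Carlo search; returns the best-scoring model found."""
    rng = random.Random(seed)
    current = [list(c) for c in coords]
    current_score = score(current, restraints)
    best, best_score = [c[:] for c in current], current_score
    for _ in range(steps):
        i = rng.randrange(len(current))
        old = current[i][:]
        # Propose a random displacement of one particle.
        current[i] = [x + rng.uniform(-step_size, step_size) for x in old]
        new_score = score(current, restraints)
        accept = (new_score <= current_score or
                  rng.random() < math.exp((current_score - new_score) / temperature))
        if accept:
            current_score = new_score
            if new_score < best_score:
                best, best_score = [c[:] for c in current], new_score
        else:
            current[i] = old            # reject: restore the old position
    return best, best_score


# Representation: three subunits as points, initially far apart.
start = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0), (0.0, 30.0, 0.0)]
# Input information: every pair should be within 10 distance units.
restraints = [(0, 1, 10.0), (1, 2, 10.0), (0, 2, 10.0)]

initial_score = score(start, restraints)
model, final_score = sample(start, restraints)
```

In a realistic protocol, the search would be repeated from many starting points, the resulting models filtered against the input information, and the spread of the good-scoring models used to estimate precision.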
3.2. High Ambiguity Driven protein-protein DOCKing (HADDOCK)
HADDOCK (Dominguez et al., 2003; van Zundert et al., 2016) is an information-driven flexible docking approach for modeling macromolecular complexes that builds upon CNS (Brünger et al., 1998) as its computational engine. It leverages ambiguous and low-resolution data to guide the docking process. HADDOCK is versatile in handling any type of interface mapping information that is translated into ambiguous interaction restraints (AIRs). It supports the incorporation of distance restraints derived from a variety of experimental techniques, such as CX-MS and FRET, as well as other NMR-based restraints, such as residual dipolar couplings (RDCs), pseudo-contact chemical shifts (PCSs), and dihedral angle restraints. In addition, HADDOCK can use 3DEM maps and other shape-based restraints. Structures archived in PDB-Dev that have been determined using HADDOCK include the 16S rRNA complexed with methyltransferase A (van Zundert et al., 2015), the human mitochondrial iron sulfur cluster core complex (Cai et al., 2018), the human Rev7 dimer (Rizzo et al., 2018), and the nucleosome complex with RNF168-RING domain and Ubiquitin (Horn et al., 2019). Work is in progress to support automated deposition of files created by HADDOCK into PDB-Dev.
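An ambiguous restraint is commonly evaluated through an effective distance that sums inverse sixth powers over all candidate atom pairs, so that any one close pair is enough to satisfy it. The sketch below illustrates that r^-6 summation as we understand it from the published description of AIRs; it is not HADDOCK or CNS code, and the function name and coordinates are hypothetical.

```python
import math


def effective_distance(atoms_a, atoms_b):
    """Effective distance d_eff = (sum over pairs of d^-6)^(-1/6).

    atoms_a: coordinates of the atoms of one residue.
    atoms_b: coordinates of all atoms of the candidate partner residues.
    Assumes no two atoms coincide (a zero distance would divide by zero).
    """
    inv6 = sum(math.dist(a, b) ** -6 for a in atoms_a for b in atoms_b)
    return inv6 ** (-1.0 / 6.0)


residue_atoms = [(0.0, 0.0, 0.0)]
partner_atoms = [(3.0, 0.0, 0.0), (0.0, 40.0, 0.0), (0.0, 0.0, 60.0)]

d_eff = effective_distance(residue_atoms, partner_atoms)
closest = min(math.dist(a, b) for a in residue_atoms for b in partner_atoms)
```

The effective distance is never larger than the shortest pair distance and is dominated by it, which is what makes the restraint tolerant of ambiguity about exactly which partner residue is in contact.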
3.3. Rosetta
Rosetta (Leaver-Fay et al., 2011) is a comprehensive software suite for macromolecular modeling and design. Rosetta provides a wide range of functionalities, including de novo structure prediction, protein design, small molecule and protein docking, and modeling based on restraints derived from a variety of experimental techniques such as X-ray, NMR, 3DEM, SAS, HDX-MS, CX-MS, and EPR. Restraints can be combined in flexible forms. RosettaScripts (Fleishman et al., 2011) and PyRosetta (Chaudhury et al., 2010) allow for the development of problem-tailored protocols in a plug-and-play fashion, allowing incorporation of multiple sources of experimental data in a single computational experiment. It has been demonstrated that Rosetta can refine integrative structures and accurately add atomic details not present in the experimental data (Wang et al., 2016). The Rosetta software package is open-source, free for academic use, and developed by the RosettaCommons consortium that new developers can join readily. Rosetta-based integrative structures that have been deposited into PDB-Dev include structures of the serum albumin domains in human blood serum (Belsom et al., 2016), the peptide Ghrelin bound to its G-protein coupled receptor (Bender et al., 2019), HCN voltage gated ion channel (Dai et al., 2019) and the native BBSome (Chou et al., 2019). Work is in progress to implement support within Rosetta for creating data files that can be archived in PDB-Dev.
3.4. Bayesian Inference of Ensembles (BioEn)
BioEn is a modeling application that integrates data from diverse experiments with reference ensemble information obtained from simulation or modeling using a Bayesian framework (Hummer and Kofinger, 2015). It enables assessment of the quality and consistency of the experimental data as well as the reference ensemble. The method has been successfully applied to model structures based on EPR data, such as the dimeric SLC26 transporter (Chang et al., 2019), which has been deposited into PDB-Dev. In addition, ensemble refinement based on SAS data has been used to determine the solution structures of the Atg1-Atg13 and Atg17-Atg31-Atg29 subcomplexes and the Atg1 complex (Kofinger et al., 2015). Ongoing research is focused on the development of mechanisms to deal with inconsistent data, automated assessment of model and data quality, and designing a formalism to assess error estimates (Kofinger et al., 2019).
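The balance between staying close to the reference ensemble and fitting the data can be written as a single objective. The sketch below evaluates a negative log-posterior of the form θ·S_KL + χ²/2, which is our reading of the formulation in Hummer and Kofinger (2015); it is not the BioEn software, and the function name, argument layout, and numerical values are assumptions made for illustration.

```python
import math


def bioen_objective(weights, ref_weights, calc, obs, sigma, theta):
    """Return theta * KL(weights || ref_weights) + chi^2 / 2.

    weights, ref_weights: normalized weights of the ensemble members.
    calc[a][i]: observable i calculated for ensemble member a.
    obs[i], sigma[i]: measured value and uncertainty of observable i.
    theta: confidence in the reference ensemble (larger keeps weights closer).
    """
    kl = sum(w * math.log(w / w0)
             for w, w0 in zip(weights, ref_weights) if w > 0)
    chi2 = 0.0
    for i, (y_obs, s) in enumerate(zip(obs, sigma)):
        y_avg = sum(w * calc[a][i] for a, w in enumerate(weights))
        chi2 += ((y_avg - y_obs) / s) ** 2
    return theta * kl + 0.5 * chi2


# Two ensemble members, one observable measured as 2.5 +/- 0.5.
calc = [[1.0], [3.0]]
obs, sigma = [2.5], [0.5]
reference = [0.5, 0.5]

l_reference = bioen_objective(reference, reference, calc, obs, sigma, theta=1.0)
l_reweighted = bioen_objective([0.25, 0.75], reference, calc, obs, sigma, theta=1.0)
```

In this toy case the reweighted ensemble reproduces the measurement exactly, and the cost of departing from the reference weights is smaller than the misfit it removes, so the objective decreases.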
3.5. Integrative modeling with CNS and X-plor
The flexibility of general purpose structure refinement programs, such as X-plor (Brünger, 1992) and Crystallography and NMR System (CNS) (Brünger et al., 1998), made it possible to generate protocols for integrative structure modeling. For example, the complex between single-stranded DNA and single-stranded DNA binding protein of a filamentous bacteriophage was modeled based on stoichiometry and data from low resolution electron microscopy and NMR spectroscopy (Folmer et al., 1994); the complex of multifunctional hexameric arginine repressor with DNA was modeled based on chemical footprinting (Sunnerhagen et al., 1997); and structures of bacterial pili were modeled based on symmetry derived from low resolution 3DEM data, cross-linking, and double charge inversion mutations (Campos et al., 2010; Campos et al., 2011). Similarly, a coarse-grained model of RNA polymerase Pol III was sampled by a Bayesian, ISD-like method implemented in CNS, based on restraints from cross-linking mass spectrometry (Ferber et al., 2016).
3.6. Biochemical Library (BCL)
The Biochemical Library (BCL) program models proteins as assemblies of secondary structure elements (Karakas et al., 2012). The BCL can simultaneously use experimental restraints from 3DEM (Lindert et al., 2009), NMR (Weiner et al., 2014), EPR (Fischer et al., 2015), CX-MS (Hofmann et al., 2015), and SAS (Putnam et al., 2015) experiments. The rationale for replacing flexible loop regions with a loop closure constraint is to substantially reduce the conformational space of a protein, correspondingly reducing the sampling challenge. As many experimental data points relate to secondary structure elements, sampling can often be simplified without substantially reducing the experimental data used for structure determination. The strength of BCL lies in modeling proteins that are rich in secondary structure, such as membrane proteins (Weiner et al., 2013). It has been used, for example, to compute a structural model for the phage T4 recombination mediator protein UvsY (Gajewski et al., 2016).
3.7. Modeling of Genomes using Hi-C data
Data obtained from chromosome conformation capture (Hi-C) experiments can be used to model the three-dimensional structures of genomes (Oluwadare et al., 2019). TADbit (Serra et al., 2017) and Population-based Genome Structure (PGS) (Hua et al., 2018) are two software packages that model 3D genome structures from Hi-C data. TADbit relies on IMP, using modeling by satisfaction of spatial restraints to build 3D structures of genomes from chromatin interaction frequencies obtained through Hi-C experiments. PGS uses a population-based probabilistic approach to model 3D genome structures that are consistent with chromatin-chromatin interaction probabilities obtained from Hi-C data. The multi-scale 3D Chromatin model of the first 4.5Mb of Chromosome 2L from the Drosophila melanogaster genome (Trussart et al., 2015) obtained using TADbit has been deposited in PDB-Dev. Work is in progress to archive 3D models of the human genome obtained using PGS.
3.8. ChimeraX
ChimeraX (Goddard et al., 2018) is a new software application for the visualization and analysis of molecular structures and associated data built using the extensive code base, knowledge, and experience gained from Chimera (Pettersen et al., 2004). It can be used to visualize the integrative structures archived in PDB-Dev. Specifically, ChimeraX enables the visualization of multi-scale ensembles composed of atomic and coarse-grained beaded representations, input spatial restraints such as distances from CX-MS experiments, 2DEM images and 3DEM maps, as well as preliminary validation information regarding satisfaction of input restraints. Satisfied and violated crosslinks are displayed in different colors, so that restraint violations can be identified at a glance.
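The satisfied/violated distinction amounts to comparing each crosslinked pair's distance in the model with an upper bound. The sketch below shows that check in plain Python; it is not ChimeraX code, and the function name, the keying of coordinates by (chain, residue), and the 35 Å default threshold are assumptions. An appropriate bound depends on the crosslinker chemistry and on the resolution of the model.

```python
import math


def classify_crosslinks(coords, crosslinks, threshold=35.0):
    """Split crosslinks into satisfied and violated lists.

    coords: maps a (chain, residue) key to an (x, y, z) position.
    crosslinks: iterable of (key_a, key_b) pairs.
    threshold: hypothetical upper distance bound, in the units of coords.
    """
    satisfied, violated = [], []
    for a, b in crosslinks:
        d = math.dist(coords[a], coords[b])
        (satisfied if d <= threshold else violated).append((a, b, d))
    return satisfied, violated


coords = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("B", 25): (12.0, 5.0, 0.0),
    ("C", 7): (60.0, 0.0, 0.0),
}
crosslinks = [(("A", 10), ("B", 25)), (("A", 10), ("C", 7))]

satisfied, violated = classify_crosslinks(coords, crosslinks)
```

For an ensemble, the same check would typically be applied per model, with a crosslink counted as satisfied if at least one model (or a chosen fraction of models) meets the bound.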
3.9. Visual Molecular Dynamics (VMD)
VMD is a rapidly evolving modeling and visualization platform that provides tools for simulation preparation, visualization, and analysis (Humphrey et al., 1996). In particular, it is applicable to large-scale systems and datasets. VMD uses advanced technologies to enable cell-scale modeling and visualization, including using all-atom and coarse-grained molecular representations. It can also integrate experimental data, such as cryo-EM density maps. Work is in progress to support visualization of integrative structures archived in PDB-Dev and to create new graphical interfaces to query and interact with the data. The current focus is on visualizing multi-scale ensembles, restraint information from experiments, statistical inferences, and associated model uncertainties.
4. Standards for representing, validating, and archiving experimental data
Data standards are required to build stable databases and to exchange data among different software programs. The various levels of data standards include data definitions for the experimental and computational methods as well as descriptions of the chemistry and structures. As validation methods are developed, clear definitions for the relevant terms must be created for these methods. The process of creating generally adopted standards requires participation among community stakeholders. These stakeholders include experimentalists, software developers, and the stewards of databases. Once the standards are created and codified into dictionaries, there needs to be cooperation by the journals and funders in enforcing the standards.
We describe below standards for structures derived from traditional single experimental methods followed by emerging standards for experimental and computational methods contributing to integrative structural biology.
4.1. Standards for models derived by single methods
Following the establishment of the PDB and the enforcement of data deposition into the PDB as a requirement for publication in journals, efforts to further standardize the data began. A Data Dictionary for macromolecular crystallography was created as an International Union of Crystallography (IUCr)-sponsored community effort (Bourne et al., 1997). The dictionary, called mmCIF, contained over 3000 definitions for many aspects of the X-ray experiments, as well as definitions for the chemistry and the three-dimensional structures. Over time, extensions have been added for the other methods used for structure determination. The extended dictionary is called PDBx. A resource site contains the dictionary, software, and general information about mmCIF (mmcif.wwpdb.org). The Master Format for the PDB Core Archive is now PDBx/mmCIF (Fitzgerald et al., 2005). After the community demanded that structure factors be required as part of data deposition in 2008, an X-ray Validation Task Force was established with the goal of creating standards for validation of structures determined using X-ray data. Their recommendations were published in 2011 (Read et al., 2011) and were implemented as part of the wwPDB OneDep system (Gore et al., 2012; Young et al., 2017).
Biomolecular NMR data are deposited into BMRB (Ulrich et al., 2008), and the structural models into the PDB. An NMR Data Exchange Format (NEF) for representation of chemical shift and restraint data with future extensions to various other data, as well as relevant metadata, has been created (Gutmanas et al., 2015). NEF is a subset of the more comprehensive NMR-STAR format employed for the BMRB Core Archive (Ulrich et al., 2018). The wwPDB NMR Validation Task Force (NMR VTF) was established and published recommendations in 2013 (Montelione et al., 2013). The first set of recommendations was implemented in the wwPDB NMR validation pipeline using existing software. The NMR VTF has worked with the NMR community to develop standards for designating representative structures from a set of deposited models, and for defining well- vs ill-defined regions of protein structures. It has recommended that the depositor be allowed to also provide a depositor-designated representative structure. This structural representation information is essential for users of models generated from NMR data. Longer term goals include handling of all aspects of dynamic processes, including multi-conformer, multi-model ensembles, partially and completely unfolded proteins, as well as all types of biomolecules studied by NMR, including proteins, nucleic acids, polysaccharides, and small molecules.
The 3DEM community has developed a common metadata standard for archiving both experimental maps and map-derived structural models (Lawson et al., 2011; Patwardhan and Lawson, 2016). Incorporation of the standard into the PDBx/mmCIF dictionary enables joint deposition of 3DEM maps into Electron Microscopy Data Bank (EMDB) (Tagari et al., 2002) and 3DEM models into PDB (wwPDB consortium, 2019). Raw 2D image datasets may be archived separately into the Electron Microscopy Public Image Archive (EMPIAR; (Iudin et al., 2016)). A 3DEM Validation Task Force that met in 2010 emphasized the need to develop and standardize validation practices and metrics for evaluation and comparison of maps and models (Henderson et al., 2012). Subsequent workshops and community challenge activities are helping to advance this effort (Patwardhan et al., 2012; Patwardhan et al., 2014; Baker, 2018; Editorial, 2018; Lawson and Chiu, 2018). A follow-up meeting focused on 3DEM map/model validation is planned for 2020.
4.2. Standards for other experimental methods providing information for integrative modeling
The experimental methods that can contribute to integrative structure determination include traditional 3D structure determination methods (X-ray crystallography, NMR spectroscopy, and 3DEM) as well as many other methods that provide restraints on, for example, solvent exposure, regions of interaction, and shapes and relative dispositions of components (Table 1 in (Rout and Sali, 2019)). The heterogeneity of input information presents a significant challenge not only for archiving the final model, as is addressed above, but for making the input information available for validation and potentially further refinement as new data emerge. The challenges are manifold. First, individual communities have to agree on standards for their data and criteria to ensure quality and reliability. Next, these communities must communicate with each other to ensure that data exchange is facilitated. Various communities are at different stages of this coordination.
SAS was one of the first methods to be combined with the PDB standard bearers (X-ray crystallography, NMR spectroscopy, 3DEM) in computing integrative structures. With the rapid increase in the number of non-expert users, the field saw a wide variability in reporting of data and results. Thus, experts in SAS recognized the need for quality assurance regarding sample provenance, measurement, and processing of data, underpinned by standard tools for assessing the data and models. With sustained community input, preliminary guidelines were developed (Jacques et al., 2012a; Jacques et al., 2012b), followed by their adoption by the International Union of Crystallography (IUCr) Commission Journals in 2012. In 2014, the wwPDB SAS Validation Task Force (SAS VTF) was established (Trewhella et al., 2013) and expanded the guidelines to provide additional recommendations for archiving SAS data. One of the key recommendations of the SAS VTF was to bring together structural biology leaders to address the challenges involved in archiving integrative structures. The 2014 wwPDB IHM Task Force meeting (Sali et al., 2015) was the realization of this recommendation.
A universal exchange dictionary for SAS named sasCIF was established in 2000 (Malfois and Svergun, 2000). The sasCIF Data Dictionary was then extended to describe the experimental information, results, and models, including relevant metadata for analysis and validation of the data and models (Kachala et al., 2016). Processing tools for these files have been developed and made available as open-source programs. The SASBDB repository (Valentini et al., 2015) was established as a searchable public repository for SAS data and models; it currently contains over 1100 released entries with more than 350 additional entries on hold. In 2017, the biomolecular SAS publication guidelines were updated (Trewhella et al., 2017). Recently, a community project was initiated to generate SAS data sets for benchmarking different approaches to predicting SAS profiles from atomic coordinates (sas.wwpdb.org). Finally, a proof-of-concept software system for bidirectional data exchange between SASBDB and the PDB is currently under development.
The CX-MS community has recommended proteomics data standards established by the Proteomics Standard Initiative (www.psidev.info, (Deutsch et al., 2017b)). These standards include mzML (Martens et al., 2011) as a standard format for raw data and mzIdentML for search results (crosslink identifications). Support for crosslinking data has been established in mzIdentML 1.2 (Vizcaino et al., 2017), but at this point not all workflows used by the community are supported. Data are increasingly archived in repositories of the ProteomeXchange consortium (Deutsch et al., 2017a) and ChorusProject (chorusproject.org). Work is in progress to reach agreement on minimal metadata standards, to expand crosslinking support in mzIdentML, and to develop reporting standards for publication. A definition for reporting crosslinking restraints is already available in the new extension dictionary for integrative modeling; the development of tools for the seamless integration of MS and modeling data is therefore an obvious next step.
An extension of the PDBx/mmCIF dictionary with terms for fluorescence-based experiments with a current focus on FRET has been created recently (github.com/ihmwg/FLR-dictionary). This extension includes the description of fluorescent probes and resulting FRET-derived inter-dye distances. These extensions can also be applied to other probe-based spectroscopies, such as paramagnetic relaxation enhancement in NMR and spin labels for double electron-electron resonance (DEER) in EPR. A recent multi-laboratory FRET benchmark study demonstrated the precision and accuracy of FRET measurements for dsDNA rulers (estimated uncertainty in relative distance measurement deviation of less than 0–5% is well within the expected error) as well as documented measurement and analysis procedures (Hellenkamp et al., 2018). The FRET community (www.FRET.community) was founded to enhance dissemination, community-driven development of analysis tools, and sharing of data and tools. Even though the starting point and scientific focus of this community is FRET spectroscopy and imaging, it is open to members of other communities, including those that use other types of fluorescence techniques. Currently, researchers in the FRET community perform benchmark FRET studies for proteins with the aim to find the best tool for extracting kinetic information from single-molecule traces (kinSOFTChallenge 2019). In addition to these community-driven experimental and computational challenges, work is in progress to achieve agreement on minimal metadata, establish a standard file format to provide workflow support, establish guidelines for documentation and validation of experiments, analysis and simulations, as well as create reporting standards for publication. A key goal is to standardize methods for the validation of fluorescence-based structural models. A proposal to create a Fluorescence Biological Data Bank (FLBDB) is in progress, aiming to archive data from fluorescence experiments. 
A number of workshops have been held to discuss FRET and issues of standards and reproducibility in the FRET community. A yearly workshop is planned as a satellite meeting to MAF (Methods and Applications of Fluorescence) conferences (2019 at UC San Diego, 2020 at Chalmers University of Technology, Gothenburg).
The HDX community is in the early stages of developing its standards for reporting and data deposition. The International Society for HDX Mass Spectrometry was formed (www.hdxms.net) in 2017, in part to address the high degree of variability in methods, data reporting, and interpretation employed within this rapidly growing field. The community recently published the “Gothenburg Guidelines” describing best practices for performing and reporting HDX-MS experiments (Masson et al., 2019). A recent workshop engaged the wider structural community to learn from experiences in establishing durable community standards. As a result of these efforts, the international society formed a task-group to develop a position on the adoption of a data exchange dictionary, the creation of data standards, and an open archive for HDX-MS data. Discussions are underway with the proteomics community at the European Bioinformatics Institute for archiving data in the PRIDE database (Vizcaino et al., 2013; Vizcaino et al., 2016) as well as in ChorusProject (chorusproject.org). In addition to standardization of HDX-MS data reporting and deposition, the HDX community has also been engaged in interpretation of HDX-MS data. Despite being complementary to structure-based methods, the current role of HDX in integrative structural analysis is only qualitative; although solvent exchange is generally correlated with protein dynamics, the structure-rate relationship of protein solvent exchange remains ambiguous (Skinner et al., 2012b; Skinner et al., 2012a).
Electron Paramagnetic Resonance (EPR) spectroscopy, also known as electron spin resonance (ESR) spectroscopy, in combination with site-directed spin labeling generates long range distance restraints (in the 1.5 – 8.0 nm range) for macromolecular characterization. Recently, different paramagnetic labels have been developed and optimized for such applications. Several software tools to obtain distance distributions from the time-domain EPR data are available (e.g., DeerAnalysis (Jeschke et al., 2006)). The EPR community is currently working on a Whitepaper with recommendations for experimental procedures and data standards for pulsed dipolar spectroscopy. In a first step, the German Research Society will initiate an international EPR expert workshop at the end of 2019. This initiative results from the strong interactions between the German Priority Program New Frontiers in Sensitivity for EPR Spectroscopy (spp1601.de) and the NSF-funded US-based sharedEPRnetwork (sharedepr.org). Expected outcomes of this meeting are recommendations for experimental procedures, data standards for publications, and quality assessments of EPR data. A task force will describe the final protocols in a Whitepaper. The International EPR (ESR) Society (www.ieprs.org) has committed to supporting and hosting an open database for original EPR time traces and the resulting distance restraints.
4.3. Standards for computational methods providing information for integrative modeling
In addition to experimental information, prior models, such as computationally-derived structural models of components, secondary structure predictions, disorder region predictions, and predicted residue-residue contacts, can also be used in integrative structure modeling.
Following a decision reached at a workshop held in 2006 (Berman et al., 2006), the PDB archive is restricted to structural models derived from experimental methods. Based on community recommendations (Schwede et al., 2009), the macromolecular ModelArchive (www.modelarchive.org) has been built to archive structural models that are not based on experimental information about the modeled system, such as homology models, ab initio predictions, and models based on contact distances predicted by co-evolutionary analysis and deep learning approaches (Ovchinnikov et al., 2015; Kosciolek and Jones, 2016; Hou et al., 2019). About 1500 models have been made publicly accessible in ModelArchive so far. An extension of the PDBx/mmCIF dictionary for representing computational models was developed recently (github.com/ihmwg/MA-dictionary), aiming to facilitate the development of methods for efficient data exchange among the structural model repositories (PDB, ModelArchive, and PDB-Dev; Figure 1). Work is in progress to support the new dictionary within the SWISS-MODEL repository (Bienert et al., 2017; Waterhouse et al., 2018) and ModelArchive.
The Critical Assessment of Protein Structure Prediction (CASP) has been exploring modeling methods based in part on sparse experimental data, including data from SAS, NMR, cross-linking, and FRET. This Integrative CASP Experiment was highlighted at the recent CASP13 meeting (www.predictioncenter.org/casp13), and the resulting manuscripts are currently in review. In particular, CASP has catalyzed continued development of methods for contact prediction from evolutionary co-variance data (Schaarschmidt et al., 2018). Several of the fully automated structure prediction methods participating within the Continuous Automated Model EvaluatiOn (CAMEO, (Haas et al., 2018)) platform infer and subsequently integrate contact predictions in their pipelines. Such contact predictions have already been combined with sparse experimental NMR data for integrative modeling of protein structures (Tang et al., 2015).
5. Standards for validating integrative structures
A structural model of any type must be validated to evaluate how it can be interpreted. Standardized validation of integrative structures will ultimately be part of deposition into the PDB, as is already the case for structures derived using traditional methods (Read et al., 2011; Henderson et al., 2012; Montelione et al., 2013; Trewhella et al., 2013; Gore et al., 2017; Trewhella et al., 2017). Thus, an effort to build a validation pipeline for integrative structures and incorporate it into the OneDep (Young et al., 2017) deposition system was initiated under the auspices of the wwPDB. The input for validation will be the integrative structure and the data used to compute it, represented in the standard format. The output will be a validation report listing validation criteria, presented graphically in a pdf file or on a web page, relying heavily on the extensive experience of the wwPDB working with the structural biology community. The validation report will facilitate reviewing, publishing, and using the results of integrative structural biology studies. A standardized table will report key parameters of a study, similar to such tables used for other structure determination methods (Read et al., 2011; Trewhella et al., 2017).
The proposed wwPDB validation pipeline for integrative structures borrows from the validation implemented in IMP (Rout and Sali, 2019). In addition, it is informed by feedback from the members of the Model Working Group of the wwPDB IHM Task Force and members of the broader integrative structural biology community. The validation pipeline will leverage existing software developed by the structural biology community (e.g., wwPDB (Gore et al., 2017), MolProbity (Williams et al., 2018), BMRB (Ulrich et al., 2008), EMDB (Tagari et al., 2002; Lawson et al., 2016; Patwardhan and Lawson, 2016), SASBDB (Valentini et al., 2015), PHENIX (Adams et al., 2010), and PDBStat (Tejero et al., 2013)). For the time being, the proposed wwPDB validation criteria for integrative structures are organized into five broad categories (c.f., Sections 5.1–5.5).
5.1. Quality of the data
The quality of an integrative structure clearly depends on the quality of the data used to compute it (c.f., garbage in, garbage out). Thus, it is essential to annotate integrative structures with data quality measures. These measures are best established by the communities generating the data, illustrating one benefit of the wwPDB Federation model. Importantly, the data quality criteria need to be computable only from the deposited data and its annotations, without requiring non-deposited information or the structural model itself. Examples include the resolution of the EM map, the false-positive rate of chemical cross-links, and the adequacy of the measurement range and signal-to-noise ratio of a SAS profile.
5.2. Standard criteria for assessing atomic models
Some integrative structures or their parts may be represented at atomic resolution. In such cases, all criteria for assessing the quality of atomic structures already implemented in OneDep (Young et al., 2017) (e.g., clash score, Ramachandran plot outliers, and sidechain outliers) will be adopted, as provided by the MolProbity program (Williams et al., 2018). This assessment may result in annotating some regions as well-defined versus ill-defined, similar to the annotation of structural ensembles determined by NMR spectroscopy (c.f., Section 5.5). Using tools developed in the CAMEO project (Haas et al., 2018) will also be explored.
5.3. Fit of a model to information used to compute it
A model must sufficiently satisfy the data used to compute it. We will adopt standard validation criteria for assessing the fit of a model to these data; for example, cross-correlation coefficient between the model and the EM map, the fraction of chemical cross-links satisfied by the model, and the discrepancy χ2 value between the computed and experimental SAS profiles combined with the goodness-of-fit test for the correlation map (e.g., the P-value from (Franke et al., 2015)). We may need to improve these validation criteria; for example, the threshold on the cross-correlation coefficient between an EM map and a model may depend on the degree of coarse-graining of the model. We will also ensure that all criteria are compatible with the richness of the molecular representations available for integrative structures (i.e., ensembles of multi-scale and multi-state structures) (Section 2.2). Because both integrative structure modeling and NMR-based modeling involve satisfaction of spatial restraints, lessons will be learned from quantifying spatial restraint satisfaction in NMR-based modeling (Tejero et al., 2013; Gutmanas et al., 2015).
Violations of input data by the model occur when the data are more uncertain than assumed (e.g., the false positive rate of chemical cross-links is higher than the presumed threshold), the representation of a model is incorrect (e.g., a subunit structure in the modeled complex is not rigid or the system exists in multiple states instead of a single state), the scoring is incorrect (e.g., a cross-link restraint does not consider the ambiguity resulting from multiple copies of a cross-linked subunit in the modeled system), and/or the sampling is not sufficient (i.e., a model that satisfies all the data does exist but was simply not found by the sampling scheme). Thus, this test provides immediate feedback for improving the modeling protocol.
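Two of the fit criteria named in this section, the fraction of chemical cross-links satisfied by a model and the χ2 discrepancy between computed and experimental SAS profiles, can be illustrated with a minimal Python sketch. It is not the wwPDB pipeline implementation. The function names, the 35 Å distance cutoff, the least-squares scaling of the computed profile, and the use of n − 1 degrees of freedom are all assumptions made for illustration.

```python
import math


def fraction_crosslinks_satisfied(coords, crosslinks, max_distance=35.0):
    """Fraction of cross-links whose end points lie within max_distance.

    coords: dict mapping a residue (or bead) identifier to (x, y, z).
    crosslinks: list of (id_a, id_b) pairs.
    max_distance: cutoff in the same units as coords (35.0 is an
        assumed, illustrative value, not a recommended threshold).
    """
    if not crosslinks:
        raise ValueError("no cross-links supplied")
    satisfied = sum(
        1 for a, b in crosslinks
        if math.dist(coords[a], coords[b]) <= max_distance
    )
    return satisfied / len(crosslinks)


def reduced_chi_squared(i_exp, i_calc, sigma):
    """Reduced chi-squared between experimental and computed intensities.

    The computed profile is first multiplied by the scale factor that
    minimizes chi-squared (an assumption of this sketch); one degree of
    freedom is subtracted for that fitted scale factor.
    """
    n = len(i_exp)
    if not (n == len(i_calc) == len(sigma)) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    # Least-squares scale factor between computed and experimental data.
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den
    chi2 = sum(((e - scale * c) / s) ** 2
               for e, c, s in zip(i_exp, i_calc, sigma))
    return chi2 / (n - 1)
```

For a coarse-grained model, `coords` would hold bead centers rather than atoms, and the cutoff would likely need to be relaxed to account for the bead size.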
5.4. Fit of a model to information not used to compute it
A particularly informative test is a comparison of a model against the data that were not used to compute the model. Validation criteria described in the previous section apply, except perhaps with more lenient thresholds. We will encourage deposition of such additional unused data with the model, so that the corresponding standard tests can be performed during deposition.
Resampling tests (e.g., jack-knifing and bootstrapping) consist of repetitively omitting a random subset of the input data, recomputing the model, and comparing the models against the omitted data, to validate both the model and the data. Such tests are the basis for the Rfree criterion in X-ray crystallography (Brunger, 1993) and the use of half-maps in modeling based on 3DEM data (van Heel and Schatz, 2005; Chen et al., 2013; Afonine et al., 2018). An example from integrative structure modeling is using multiple random subsets of chemical cross-links to assess the Nup84 heptamer model (Fernandez-Martinez et al., 2012; Shi et al., 2014). Unfortunately, these resampling tests can only be performed by the depositors themselves, because the wwPDB validation pipeline cannot reproduce a modeling protocol used for each deposited structure. Accordingly, the authors will be encouraged to perform resampling tests before the deposition and report the results in a standardized manner during model deposition.
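The resampling procedure described above can be sketched generically in Python. The sketch assumes that the depositor supplies two hypothetical callables: `build_model`, which recomputes a model from a subset of the data, and `is_satisfied`, which checks one omitted data point against that model. The toy usage replaces real integrative modeling with averaging of scalar measurements, purely to keep the example runnable.

```python
import random


def resampling_test(data, build_model, is_satisfied,
                    n_rounds=10, omit_fraction=0.2, seed=0):
    """Repeatedly omit a random subset of data, rebuild the model,
    and report the fraction of omitted data the model satisfies.

    build_model and is_satisfied are hypothetical user-supplied
    callables; they stand in for a full modeling protocol.
    """
    rng = random.Random(seed)
    n_omit = max(1, int(round(omit_fraction * len(data))))
    if n_omit >= len(data):
        raise ValueError("omit_fraction leaves no data for modeling")
    fractions = []
    for _ in range(n_rounds):
        omitted_idx = set(rng.sample(range(len(data)), n_omit))
        kept = [d for i, d in enumerate(data) if i not in omitted_idx]
        omitted = [d for i, d in enumerate(data) if i in omitted_idx]
        model = build_model(kept)
        n_ok = sum(1 for d in omitted if is_satisfied(model, d))
        fractions.append(n_ok / len(omitted))
    return fractions


# Toy usage: the "model" is the mean of the retained measurements, and
# an omitted measurement is satisfied if it lies within 1.0 of the mean.
measurements = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 10.1]
fractions = resampling_test(
    measurements,
    build_model=lambda kept: sum(kept) / len(kept),
    is_satisfied=lambda model, d: abs(model - d) <= 1.0,
    n_rounds=5,
)
```

A standardized report could then list the per-round fractions or their mean and spread; which summary to report is a choice left open here.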
5.5. Uncertainty of the model
One of the most useful assessments of a model is quantification of its uncertainty. Model uncertainty is most explicitly described by the set of “all” models that are sufficiently consistent with the input information (i.e., the model ensemble; correspondingly, the entire ensemble, not just a single representative member, is in fact the model). In practice, computing such an ensemble requires sufficient structural sampling, which is often neither performed nor tested (Viswanath et al., 2017). If an ensemble is available, model precision can be assessed by analyzing the variability among the models comprising the ensemble. The ensemble can optionally be described by one or more representative models and their uncertainties (e.g., when an ensemble consists of multiple clusters of models, each cluster can be represented by its centroid model). Importantly, the uncertainty is generally not distributed evenly across a model. Only those model features that are coarser than model uncertainty can be interpreted. Thus, the model needs to be annotated by its uncertainty and tools for visualizing this uncertainty need to be further developed. The model uncertainty reflects the actual heterogeneity of the physical sample(s) used to obtain the data as well as the uncertainties in the input information, representation of the model, and scoring of the alternative models. It is generally difficult to deconvolute the effects of these different uncertainties on the model uncertainty.
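As an illustration of assessing model precision from the variability within an ensemble, the following Python sketch computes the mean pairwise RMSD, selects a representative (medoid) model, and reports a per-component fluctuation that shows how unevenly the uncertainty is distributed. It assumes that all models contain the same components in the same order and are already expressed in a common reference frame; superposition, clustering, and sampling-precision estimation are not covered. All function names are illustrative.

```python
import math


def rmsd(model_a, model_b):
    """Root-mean-square deviation between two models given as
    equal-length lists of (x, y, z) positions, without superposition."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(model_a, model_b))
    return math.sqrt(total / len(model_a))


def ensemble_precision(ensemble):
    """Mean pairwise RMSD over all pairs of models in the ensemble."""
    n = len(ensemble)
    if n < 2:
        raise ValueError("need at least two models")
    values = [rmsd(ensemble[i], ensemble[j])
              for i in range(n) for j in range(i + 1, n)]
    return sum(values) / len(values)


def medoid_index(ensemble):
    """Index of the model with the smallest summed RMSD to all others."""
    totals = [sum(rmsd(m, other) for other in ensemble) for m in ensemble]
    return totals.index(min(totals))


def per_component_rmsf(ensemble):
    """Root-mean-square fluctuation of each component about its mean
    position, one value per component."""
    n_models = len(ensemble)
    result = []
    for b in range(len(ensemble[0])):
        points = [model[b] for model in ensemble]
        mean = tuple(sum(p[k] for p in points) / n_models for k in range(3))
        msd = sum(math.dist(p, mean) ** 2 for p in points) / n_models
        result.append(math.sqrt(msd))
    return result
```

Components with a large fluctuation would be candidates for annotation as ill-defined, although the threshold for such an annotation is not specified here.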
Because of the importance of estimating model uncertainty, the authors will be encouraged to develop and apply modeling methods that compute a complete ensemble of models consistent with input information and estimate sampling precision for their method (Viswanath et al., 2017). However, not all useful methods for computing integrative structures are able to produce a representative ensemble of models (e.g., when models are constructed by hand or a single model computation is performed). Therefore, we will allow for the following three deposition scenarios:
First, a single structural model is deposited. In such a case, not much can be inferred about the uncertainty of the model from the model itself, although some empirical methods for estimating uncertainty based on a single model may yet be developed (c.f., the accuracy of a comparative model is correlated with the sequence similarity to the template structure on which it is based or with a structure-dependent statistical potential score). To encourage quantification of uncertainty, the IHM Dictionary will provide terms for specifying the uncertainty of each part of an integrative structure, similarly to the atomic B-factors in the crystallographic structure files.
Second, a small ensemble of structural models is deposited, potentially representing more than one cluster of solutions. Here, we will consider adopting the best practices of the NMR community (Montelione et al., 2013), as follows. The total uncertainty of a model, resulting from both the lack of information and sample heterogeneity, is represented approximately by a relatively small ensemble of 20–30 structures, which is often selected from a larger ensemble of 50–100 structures. The deposited ensemble is annotated by identifying the medoid structure that is most similar to all the other structures. Furthermore, well- and ill-defined regions within the ensemble are identified, using domain identification and local superposition to eliminate artefacts that can result from global superposition (Kirchner and Guntert, 2011).
Third, a large ensemble of structural models is deposited, again potentially representing more than one cluster of solutions. For example, IMP routinely generates thousands of structural models that represent as completely as possible all structures that satisfy the input information (Rout and Sali, 2019). The ensemble is used to estimate the sampling precision (Viswanath et al., 2017), cluster these models based on their structural similarity, and represent the resulting clusters with their localization densities (i.e., the probability of any model component at any grid point (Alber et al., 2007)). These clusters and localization densities are a useful representation of model uncertainty. The corresponding visualization will be implemented in the validation pipeline by relying on the programs such as ChimeraX (Goddard et al., 2018) and VMD (Humphrey et al., 1996) as well as the Molstar web application (molstar.org).
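The localization density mentioned for the third scenario can be approximated on a grid as the fraction of models in which a component occupies each voxel. The Python sketch below is a simplified illustration, not the implementation used in IMP or any other package: beads are treated as points (their radii are ignored), the models are assumed to share a reference frame, and the voxel size of 10.0 is an arbitrary illustrative value.

```python
import math
from collections import Counter


def localization_density(ensemble, voxel_size=10.0):
    """Estimate a localization density for ONE model component.

    ensemble: list of models; each model is a list of (x, y, z) bead
        centers belonging to the component of interest.
    Returns a dict mapping an integer voxel index (i, j, k) to the
    fraction of models with at least one bead center in that voxel.
    """
    if not ensemble:
        raise ValueError("empty ensemble")
    counts = Counter()
    for model in ensemble:
        # Each voxel is counted at most once per model.
        occupied = {
            tuple(math.floor(c / voxel_size) for c in bead)
            for bead in model
        }
        counts.update(occupied)
    return {voxel: n / len(ensemble) for voxel, n in counts.items()}
```

In this representation, a compact density with values close to 1 indicates a well-localized component, whereas a diffuse density with low values indicates high positional uncertainty.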
Finally, care will be taken to expand the representation of integrative structures in the IHM Dictionary to allow for deposition of all commonly used ensemble depictions (e.g., ensemble modeling of intrinsically disordered proteins or regions based on SAS data).
5.6. Remarks
While the validation pipeline proposed above will certainly be helpful, it does not include all useful tests, because some criteria cannot be easily applied during deposition at this time. As mentioned above, examples include an estimate of sampling precision, which requires extensive stochastic sampling, and data resampling tests, which require repeated modeling with subsets of data. Therefore, describing such validations will by necessity be limited to original papers, contributed by the authors during deposition. It is expected that the validation pipeline will mature over time, as more advanced methods are developed and adopted by the community.
Similarly, the validation of structural models entirely within the Bayesian framework will eventually be explored. Such a formulation promises the most rigorous and general validation, especially if the models are also computed within the Bayesian framework in the first place. The current proposal does not reflect these future advances; even if they were in hand, many existing useful criteria are not Bayesian. However, we expect that our validation pipeline will eventually be informed by the Bayesian view of computing, assessing, and using models.
6. Recommendations
To address the challenges involved in archiving integrative structures, the Workshop participants were divided into two discussion groups that focused on (1) standards and data exchange and (2) validation of integrative models. Their collective recommendations are summarized below.
Continue to develop the IHM Dictionary for integrative structures with standard definitions for the experimental and computational methods used for integrative modeling. This dictionary-based approach will allow for maximum interoperability among the experimental and computational methods used for structure determination and ultimately facilitate deposition of integrative structures into the PDB.
Develop new tools that will facilitate dictionary development in the PDBx/mmCIF framework. Such tools are critical to accelerate the development of resources needed to archive structures.
Promote the development of common data standards that will enable efficient data exchange among scientific repositories contributing to structural biology.
Create a validation pipeline for integrative structures, including measures of the quality of the data on which the structures were based, the standard criteria for assessing atomic models, the fit of a model to information used to compute it, the fit of a model to information not used to compute it, and uncertainty in the model.
Raise awareness by journal editors of the new standards being developed for structure determination and the emergence of new data repositories, and advocate for depositing structures and data prior to publication.
Raise awareness broadly, including at funding agencies, of the critical need for support of the underlying hardware, software, and personnel with expert knowledge, that together form the infrastructure essential for the archiving of integrative structures.
7. Acknowledgments
Funding for the BPS Satellite meeting was provided by NSF OAC-1838628 to HMB. Additional funding for the development of the pipeline for integrative archiving was provided by NSF DBI-1756248 (HMB), NSF DBI-1756250 (AS), NSF DBI-1519158 (HMB, AS), DBI-1832184 (SKB, AS), R01GM083960 (AS), and P41GM109824 (AS).
Glossary of terms
- 2DEM
Two Dimensional Electron Microscopy
- 3DEM
Three Dimensional Electron Microscopy
- BioEn
Bayesian Inference of Ensembles
- BMRB
Biological Magnetic Resonance data Bank
- CASP
Critical Assessment of Protein Structure Prediction
- CNS
Crystallography and NMR System
- CX
Chemical Crosslinking
- EBI
European Bioinformatics Institute
- EMDB
Electron Microscopy Data Bank
- EMPIAR
Electron Microscopy Public Image Archive
- EPR
Electron Paramagnetic Resonance
- ESR
Electron Spin Resonance
- FLBDB
Fluorescence Biological Data Bank
- FRET
Förster Resonance Energy Transfer
- HADDOCK
High Ambiguity Driven protein-protein DOCKing
- HDX
Hydrogen Deuterium Exchange
- IHM
Integrative/Hybrid Modeling
- IHM Dictionary
Extension of the PDBx/mmCIF Data Dictionary for Integrative/Hybrid Models
- IMP
Integrative Modeling Platform
- IUCr
International Union of Crystallography
- mmCIF
Macromolecular Crystallographic Information File
- MS
Mass Spectrometry
- NEF
NMR Exchange Format
- NMR
NMR Spectroscopy
- NMR-STAR
NMR Self-defining Text Archive and Retrieval format
- NMR VTF
NMR Validation Task Force
- OneDep
wwPDB software system for deposition, validation, and biocuration of 3D structures
- PDB
Protein Data Bank
- PDBx/mmCIF
Protein Data Bank Exchange/Macromolecular Crystallographic Information File
- PGS
Population-based Genome Structure
- SAS
Small Angle Scattering
- SASBDB
Small Angle Scattering Biological Data Bank
- SAS VTF
SAS Validation Task Force
- VMD
Visual Molecular Dynamics
- wwPDB
Worldwide Protein Data Bank
- X-ray
X-ray Crystallography
Footnotes
Declaration of Interests
The authors declare no competing interests.
References
- Adams PD, Afonine PV, Bunkoczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung LW, Kapral GJ, Grosse-Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner R, Read RJ, Richardson DC, Richardson JS, Terwilliger TC and Zwart PH (2010). PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr 66, 213–221. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Afonine PV, Klaholz BP, Moriarty NW, Poon BK, Sobolev OV, Terwilliger TC, Adams PD and Urzhumtsev A (2018). New tools for the analysis and validation of cryo-EM maps and atomic models. Acta Crystallogr D Struct Biol 74, 814–840. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alber F, Dokudovskaya S, Veenhoff L, Zhang W, Kipper J, Devos D, Suprapto A, Karni-Schmidt O, Williams R, Chait B, Sali A and Rout M (2007). The molecular architecture of the nuclear pore complex. Nature 450, 695–701. [DOI] [PubMed] [Google Scholar]
- Baker M (2018). Cryo-electron microscopy shapes up. Nature 561, 565–567. [DOI] [PubMed] [Google Scholar]
- Belsom A, Schneider M, Fischer L, Brock O and Rappsilber J (2016). Serum Albumin Domain Structures in Human Blood Serum by Mass Spectrometry and Computational Biology. Mol Cell Proteomics 15, 1105–1116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bender BJ, Vortmeier G, Ernicke S, Bosse M, Kaiser A, Els-Heindl S, Krug U, Beck-Sickinger A, Meiler J and Huster D (2019). Structural Model of Ghrelin Bound to its G Protein-Coupled Receptor. Structure 27, 537–544 e534. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Berman HM, Burley SK, Chiu W, Sali A, Adzhubei A, Bourne PE, Bryant SH, Dunbrack RL Jr., Fidelis K, Frank J, Godzik A, Henrick K, Joachimiak A, Heymann B, Jones D, Markley JL, Moult J, Montelione GT, Orengo C, Rossmann MG, Rost B, Saibil H, Schwede T, Standley DM and Westbrook JD (2006). Outcome of a workshop on archiving structural models of biological macromolecules. Structure 14, 1211–1217. [DOI] [PubMed] [Google Scholar]
- Berman HM, Henrick K and Nakamura H (2003). Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 10, 980. [DOI] [PubMed] [Google Scholar]
- Bienert S, Waterhouse A, de Beer TA, Tauriello G, Studer G, Bordoli L and Schwede T (2017). The SWISS-MODEL Repository-new features and functionality. Nucleic Acids Res 45, D313–D319. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bourne PE, Berman HM, McMahon B, Watenpaugh KD, Westbrook JD and Fitzgerald PM (1997). Macromolecular Crystallographic Information File. Methods Enzymol 277, 571–590. [DOI] [PubMed] [Google Scholar]
- Brunger AT (1993). Assessment of phase accuracy by cross validation: the free R value. Methods and applications. Acta Crystallogr D Biol Crystallogr 49, 24–36. [DOI] [PubMed] [Google Scholar]
- Brünger AT (1992). X-PLOR, version 3.1, a system for X-ray crystallography and NMR. New Haven, CT, Yale University Press. [Google Scholar]
- Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang J-S, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T and Warren GL (1998). Crystallographic and NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr D54, 905–921. [DOI] [PubMed] [Google Scholar]
- Burley SK, Kurisu G, Markley JL, Nakamura H, Velankar S, Berman HM, Sali A, Schwede T and Trewhella J (2017). PDB-Dev: a Prototype System for Depositing Integrative/Hybrid Structural Models. Structure 25, 1317–1318. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cai K, Frederick RO, Dashti H and Markley JL (2018). Architectural Features of Human Mitochondrial Cysteine Desulfurase Complexes from Crosslinking Mass Spectrometry and Small-Angle X-Ray Scattering. Structure 26, 1127–1136 e1124. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Callaway J, Cummings M, Deroski B, Esposito P, Forman A, Langdon P, Libeson M, McCarthy J, Sikora J, Xue D, Abola E, Bernstein F, Manning N, Shea R, Stampf D and Sussman J (1996). Protein Data Bank Contents Guide: Atomic coordinate entry format description, Brookhaven National Laboratory.
- Campos M, Francetic O and Nilges M (2011). Modeling pilus structures from sparse data. J Struct Biol 173, 436–444.
- Campos M, Nilges M, Cisneros DA and Francetic O (2010). Detailed structural and assembly model of the type II secretion pilus from sparse data. Proc Natl Acad Sci U S A 107, 13081–13086.
- Chang YN, Jaumann EA, Reichel K, Hartmann J, Oliver D, Hummer G, Joseph B and Geertsma ER (2019). Structural basis for functional interactions in dimers of SLC26 transporters. Nat Commun 10, 2032.
- Chaudhury S, Lyskov S and Gray JJ (2010). PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 26, 689–691.
- Chen S, McMullan G, Faruqi AR, Murshudov GN, Short JM, Scheres SHW and Henderson R (2013). High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy 135, 24–35.
- Chen ZA, Pellarin R, Fischer L, Sali A, Nilges M, Barlow PN and Rappsilber J (2016). Structure of Complement C3(H2O) Revealed By Quantitative Cross-Linking/Mass Spectrometry And Modeling. Mol Cell Proteomics 15, 2730–2743.
- Chou HT, Apelt L, Farrell DP, White SR, Woodsmith J, Svetlov V, Goldstein JS, Nager AR, Li Z, Muller J, Dollfus H, Nudler E, Stelzl U, DiMaio F, Nachury MV and Walz T (2019). The Molecular Architecture of Native BBSome Obtained by an Integrated Structural Approach. Structure 27, 1384–1394.e1384.
- Dai G, Aman TK, DiMaio F and Zagotta WN (2019). The HCN channel voltage sensor undergoes a large downward motion during hyperpolarization. Nat Struct Mol Biol 26, 686–694.
- Deutsch EW, Csordas A, Sun Z, Jarnuczak A, Perez-Riverol Y, Ternent T, Campbell DS, Bernal-Llinares M, Okuda S, Kawano S, Moritz RL, Carver JJ, Wang M, Ishihama Y, Bandeira N, Hermjakob H and Vizcaino JA (2017a). The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res 45, D1100–D1106.
- Deutsch EW, Orchard S, Binz PA, Bittremieux W, Eisenacher M, Hermjakob H, Kawano S, Lam H, Mayer G, Menschaert G, Perez-Riverol Y, Salek RM, Tabb DL, Tenzer S, Vizcaino JA, Walzer M and Jones AR (2017b). Proteomics Standards Initiative: Fifteen Years of Progress and Future Work. J Proteome Res 16, 4288–4298.
- Dimura M, Peulen TO, Hanke CA, Prakash A, Gohlke H and Seidel CA (2016). Quantitative FRET studies and integrative modeling unravel the structure and dynamics of biomolecular systems. Curr Opin Struct Biol 40, 163–185.
- Dominguez C, Boelens R and Bonvin AM (2003). HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc 125, 1731–1737.
- Editorial (2018). Challenges for cryo-EM. Nat Methods 15, 985.
- Ferber M, Kosinski J, Ori A, Rashid UJ, Moreno-Morcillo M, Simon B, Bouvier G, Batista PR, Muller CW, Beck M and Nilges M (2016). Automated structure modeling of large protein assemblies using crosslinks as distance restraints. Nat Methods 13, 515–520.
- Fernandez-Martinez J, Kim SJ, Shi Y, Upla P, Pellarin R, Gagnon M, Chemmama IE, Wang J, Nudelman I, Zhang W, Williams R, Rice WJ, Stokes DL, Zenklusen D, Chait BT, Sali A and Rout MP (2016). Structure and Function of the Nuclear Pore Complex Cytoplasmic mRNA Export Platform. Cell 167, 1215–1228.e1225.
- Fernandez-Martinez J, Phillips J, Sekedat MD, Diaz-Avalos R, Velazquez-Muriel J, Franke JD, Williams R, Stokes DL, Chait BT, Sali A and Rout MP (2012). Structure-function mapping of a heptameric module in the nuclear pore complex. J Cell Biol 196, 419–434.
- Fischer AW, Alexander NS, Woetzel N, Karakas M, Weiner BE and Meiler J (2015). BCL::MP-fold: Membrane protein structure prediction guided by EPR restraints. Proteins 83, 1947–1962.
- Fitzgerald PMD, Westbrook JD, Bourne PE, McMahon B, Watenpaugh KD and Berman HM (2005). 4.5 Macromolecular dictionary (mmCIF). In International Tables for Crystallography G: Definition and exchange of crystallographic data. Hall SR and McMahon B, eds. Springer: 295–443.
- Fleishman SJ, Leaver-Fay A, Corn JE, Strauch EM, Khare SD, Koga N, Ashworth J, Murphy P, Richter F, Lemmon G, Meiler J and Baker D (2011). RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. PLoS One 6, e20161.
- Folmer RH, Nilges M, Folkers PJ, Konings RN and Hilbers CW (1994). A model of the complex between single-stranded DNA and the single-stranded DNA binding protein encoded by gene V of filamentous bacteriophage M13. J Mol Biol 240, 341–357.
- Franke D, Jeffries CM and Svergun DI (2015). Correlation Map, a goodness-of-fit test for one-dimensional X-ray scattering spectra. Nat Methods 12, 419–422.
- Gajewski S, Waddell MB, Vaithiyalingam S, Nourse A, Li Z, Woetzel N, Alexander N, Meiler J and White SW (2016). Structure and mechanism of the phage T4 recombination mediator protein UvsY. Proc Natl Acad Sci U S A 113, 3275–3280.
- Goddard TD, Huang CC, Meng EC, Pettersen EF, Couch GS, Morris JH and Ferrin TE (2018). UCSF ChimeraX: Meeting modern challenges in visualization and analysis. Protein Sci 27, 14–25.
- Gore S, Sanz Garcia E, Hendrickx PMS, Gutmanas A, Westbrook JD, Yang H, Feng Z, Baskaran K, Berrisford JM, Hudson BP, Ikegawa Y, Kobayashi N, Lawson CL, Mading S, Mak L, Mukhopadhyay A, Oldfield TJ, Patwardhan A, Peisach E, Sahni G, Sekharan MR, Sen S, Shao C, Smart OS, Ulrich EL, Yamashita R, Quesada M, Young JY, Nakamura H, Markley JL, Berman HM, Burley SK, Velankar S and Kleywegt GJ (2017). Validation of Structures in the Protein Data Bank. Structure 25, 1916–1927.
- Gore S, Velankar S and Kleywegt GJ (2012). Implementing an X-ray validation pipeline for the Protein Data Bank. Acta Crystallogr D Biol Crystallogr 68, 478–483.
- Groom CR, Bruno IJ, Lightfoot MP and Ward SC (2016). The Cambridge Structural Database. Acta Crystallogr B 72, 171–179.
- Gutmanas A, Adams PD, Bardiaux B, Berman HM, Case DA, Fogh RH, Guntert P, Hendrickx PM, Herrmann T, Kleywegt GJ, Kobayashi N, Lange OF, Markley JL, Montelione GT, Nilges M, Ragan TJ, Schwieters CD, Tejero R, Ulrich EL, Velankar S, Vranken WF, Wedell JR, Westbrook J, Wishart DS and Vuister GW (2015). NMR Exchange Format: a unified and open standard for representation of NMR restraint data. Nat Struct Mol Biol 22, 433–434.
- Haas J, Barbato A, Behringer D, Studer G, Roth S, Bertoni M, Mostaguir K, Gumienny R and Schwede T (2018). Continuous Automated Model EvaluatiOn (CAMEO) complementing the critical assessment of structure prediction in CASP12. Proteins 86 Suppl 1, 387–398.
- Hellenkamp B, Schmid S, Doroshenko O, Opanasyuk O, Kuhnemuth R, Rezaei Adariani S, Ambrose B, Aznauryan M, Barth A, Birkedal V, Bowen ME, Chen H, Cordes T, Eilert T, Fijen C, Gebhardt C, Gotz M, Gouridis G, Gratton E, Ha T, Hao P, Hanke CA, Hartmann A, Hendrix J, Hildebrandt LL, Hirschfeld V, Hohlbein J, Hua B, Hubner CG, Kallis E, Kapanidis AN, Kim JY, Krainer G, Lamb DC, Lee NK, Lemke EA, Levesque B, Levitus M, McCann JJ, Naredi-Rainer N, Nettels D, Ngo T, Qiu R, Robb NC, Rocker C, Sanabria H, Schlierf M, Schroder T, Schuler B, Seidel H, Streit L, Thurn J, Tinnefeld P, Tyagi S, Vandenberk N, Vera AM, Weninger KR, Wunsch B, Yanez-Orozco IS, Michaelis J, Seidel CAM, Craggs TD and Hugel T (2018). Precision and accuracy of single-molecule FRET measurements--a multi-laboratory benchmark study. Nat Methods 15, 669–676.
- Henderson R, Baldwin JM, Ceska TA, Zemlin F, Beckmann E and Downing KH (1990). Model for the structure of bacteriorhodopsin based on high-resolution electron cryo-microscopy. J Mol Biol 213, 899–929.
- Henderson R, Sali A, Baker ML, Carragher B, Devkota B, Downing KH, Egelman EH, Feng Z, Frank J, Grigorieff N, Jiang W, Ludtke SJ, Medalia O, Penczek PA, Rosenthal PB, Rossmann MG, Schmid MF, Schroder GF, Steven AC, Stokes DL, Westbrook JD, Wriggers W, Yang H, Young J, Berman HM, Chiu W, Kleywegt GJ and Lawson CL (2012). Outcome of the first electron microscopy validation task force meeting. Structure 20, 205–214.
- Hofmann T, Fischer AW, Meiler J and Kalkhof S (2015). Protein structure prediction guided by crosslinking restraints--A systematic evaluation of the impact of the crosslinking spacer length. Methods 89, 79–90.
- Horn V, Uckelmann M, Zhang H, Eerland J, Aarsman I, le Paige UB, Davidovich C, Sixma TK and van Ingen H (2019). Structural basis of specific H2A K13/K15 ubiquitination by RNF168. Nat Commun 10, 1751.
- Hou J, Wu T, Cao R and Cheng J (2019). Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins.
- Hsieh A, Lu L, Chance MR and Yang S (2017). A Practical Guide to iSPOT Modeling: An Integrative Structural Biology Platform. Adv Exp Med Biol 1009, 229–238.
- Hua N, Tjong H, Shin H, Gong K, Zhou XJ and Alber F (2018). Producing genome structure populations with the dynamic and automated PGS software. Nat Protoc 13, 915–926.
- Huang W, Ravikumar KM, Parisien M and Yang S (2016). Theoretical modeling of multiprotein complexes by iSPOT: Integration of small-angle X-ray scattering, hydroxyl radical footprinting, and computational docking. J Struct Biol 196, 340–349.
- Hummer G and Kofinger J (2015). Bayesian ensemble refinement by replica simulations and reweighting. J Chem Phys 143, 243150.
- Humphrey W, Dalke A and Schulten K (1996). VMD: visual molecular dynamics. J Mol Graph 14, 33–38.
- Iudin A, Korir PK, Salavert-Torres J, Kleywegt GJ and Patwardhan A (2016). EMPIAR: a public archive for raw electron microscopy image data. Nat Methods 13, 387–388.
- Jacques DA, Guss JM, Svergun DI and Trewhella J (2012a). Publication guidelines for structural modelling of small-angle scattering data from biomolecules in solution. Acta Crystallogr D Biol Crystallogr 68, 620–626.
- Jacques DA, Guss JM and Trewhella J (2012b). Reliable structural interpretation of small-angle scattering data from bio-molecules in solution--the importance of quality control and a standard reporting framework. BMC Struct Biol 12, 9.
- Jeschke G, Chechik V, Ionita P, Godt A, Zimmermann H, Banham J, Timmel CR, Hilger D and Jung H (2006). DeerAnalysis2006--a comprehensive software package for analyzing pulsed ELDOR data. Appl Magn Reson 30, 473–498.
- Jishage M, Yu X, Shi Y, Ganesan SJ, Chen WY, Sali A, Chait BT, Asturias FJ and Roeder RG (2018). Architecture of Pol II(G) and molecular mechanism of transcription regulation by Gdown1. Nat Struct Mol Biol 25, 859–867.
- Kachala M, Westbrook J and Svergun D (2016). Extension of the sasCIF format and its applications for data processing and deposition. J Appl Crystallogr 49, 302–310.
- Karakas M, Woetzel N, Staritzbichler R, Alexander N, Weiner BE and Meiler J (2012). BCL::Fold--de novo prediction of complex and large protein topologies by assembly of secondary structure elements. PLoS One 7, e49240.
- Kim SJ, Fernandez-Martinez J, Nudelman I, Shi Y, Zhang W, Raveh B, Herricks T, Slaughter BD, Hogan JA, Upla P, Chemmama IE, Pellarin R, Echeverria I, Shivaraju M, Chaudhury AS, Wang J, Williams R, Unruh JR, Greenberg CH, Jacobs EY, Yu Z, de la Cruz MJ, Mironska R, Stokes DL, Aitchison JD, Jarrold MF, Gerton JL, Ludtke SJ, Akey CW, Chait BT, Sali A and Rout MP (2018). Integrative structure and functional anatomy of a nuclear pore complex. Nature 555, 475–482.
- Kim SJ, Fernandez-Martinez J, Sampathkumar P, Martel A, Matsui T, Tsuruta H, Weiss TM, Shi Y, Markina-Inarrairaegui A, Bonanno JB, Sauder JM, Burley SK, Chait BT, Almo SC, Rout MP and Sali A (2014). Integrative structure-function mapping of the nucleoporin nup133 suggests a conserved mechanism for membrane anchoring of the nuclear pore complex. Mol Cell Proteomics 13, 2911–2926.
- Kirchner DK and Guntert P (2011). Objective identification of residue ranges for the superposition of protein structures. BMC Bioinformatics 12, 170.
- Kofinger J, Ragusa MJ, Lee IH, Hummer G and Hurley JH (2015). Solution structure of the Atg1 complex: implications for the architecture of the phagophore assembly site. Structure 23, 809–818.
- Kofinger J, Stelzl LS, Reuter K, Allande C, Reichel K and Hummer G (2019). Efficient Ensemble Refinement by Reweighting. J Chem Theory Comput 15, 3390–3401.
- Kosciolek T and Jones DT (2016). Accurate contact predictions using covariation techniques and machine learning. Proteins 84 Suppl 1, 145–151.
- Lawson CL, Baker ML, Best C, Bi C, Dougherty M, Feng P, van Ginkel G, Devkota B, Lagerstedt I, Ludtke SJ, Newman RH, Oldfield TJ, Rees I, Sahni G, Sala R, Velankar S, Warren J, Westbrook JD, Henrick K, Kleywegt GJ, Berman HM and Chiu W (2011). EMDataBank.org: unified data resource for CryoEM. Nucleic Acids Res 39, D456–464.
- Lawson CL and Chiu W (2018). Comparing cryo-EM structures. J Struct Biol 204, 523–526.
- Lawson CL, Patwardhan A, Baker ML, Hryc C, Garcia ES, Hudson BP, Lagerstedt I, Ludtke SJ, Pintilie G, Sala R, Westbrook JD, Berman HM, Kleywegt GJ and Chiu W (2016). EMDataBank unified data resource for 3DEM. Nucleic Acids Res 44, D396–403.
- Leaver-Fay A, Tyka M, Lewis SM, Lange OF, Thompson J, Jacak R, Kaufman K, Renfrew PD, Smith CA, Sheffler W, Davis IW, Cooper S, Treuille A, Mandell DJ, Richter F, Ban YE, Fleishman SJ, Corn JE, Kim DE, Lyskov S, Berrondo M, Mentzer S, Popovic Z, Havranek JJ, Karanicolas J, Das R, Meiler J, Kortemme T, Gray JJ, Kuhlman B, Baker D and Bradley P (2011). ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol 487, 545–574.
- Lindert S, Staritzbichler R, Wotzel N, Karakas M, Stewart PL and Meiler J (2009). EM-fold: De novo folding of alpha-helical proteins guided by intermediate-resolution electron microscopy density maps. Structure 17, 990–1003.
- Liu Z, Gong Z, Cao Y, Ding YH, Dong MQ, Lu YB, Zhang WP and Tang C (2018). Characterizing Protein Dynamics with Integrative Use of Bulk and Single-Molecule Techniques. Biochemistry 57, 305–313.
- Malfois M and Svergun DI (2000). sasCIF: an extension of core Crystallographic Information File for SAS. J Appl Crystallogr 33, 812–816.
- Martens L, Chambers M, Sturm M, Kessner D, Levander F, Shofstahl J, Tang WH, Rompp A, Neumann S, Pizarro AD, Montecchi-Palazzi L, Tasman N, Coleman M, Reisinger F, Souda P, Hermjakob H, Binz PA and Deutsch EW (2011). mzML--a community standard for mass spectrometry data. Mol Cell Proteomics 10, R110.000133.
- Masson GR, Burke JE, Ahn NG, Anand GS, Borchers C, Brier S, Bou-Assaf GM, Engen JR, Englander SW, Faber J, Garlish R, Griffin PR, Gross ML, Guttman M, Hamuro Y, Heck AJR, Houde D, Iacob RE, Jorgensen TJD, Kaltashov IA, Klinman JP, Konermann L, Man P, Mayne L, Pascal BD, Reichmann D, Skehel M, Snijder J, Strutzenberg TS, Underbakke ES, Wagner C, Wales TE, Walters BT, Weis DD, Wilson DJ, Wintrode PL, Zhang Z, Zheng J, Schriemer DC and Rand KD (2019). Recommendations for performing, interpreting and reporting hydrogen deuterium exchange mass spectrometry (HDX-MS) experiments. Nat Methods 16, 595–602.
- Montelione GT, Nilges M, Bax A, Guntert P, Herrmann T, Richardson JS, Schwieters CD, Vranken WF, Vuister GW, Wishart DS, Berman HM, Kleywegt GJ and Markley JL (2013). Recommendations of the wwPDB NMR Validation Task Force. Structure 21, 1563–1570.
- Nakamura Y, Cochrane G and Karsch-Mizrachi I (2013). The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res 41, D21–24.
- Oluwadare O, Highsmith M and Cheng J (2019). An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data. Biol Proced Online 21, 7.
- Ovchinnikov S, Kinch L, Park H, Liao Y, Pei J, Kim DE, Kamisetty H, Grishin NV and Baker D (2015). Large-scale determination of previously unsolved protein structures using evolutionary information. Elife 4, e09248.
- Patwardhan A, Ashton A, Brandt R, Butcher S, Carzaniga R, Chiu W, Collinson L, Doux P, Duke E, Ellisman MH, Franken E, Grunewald K, Heriche JK, Koster A, Kuhlbrandt W, Lagerstedt I, Larabell C, Lawson CL, Saibil HR, Sanz-Garcia E, Subramaniam S, Verkade P, Swedlow JR and Kleywegt GJ (2014). A 3D cellular context for the macromolecular world. Nat Struct Mol Biol 21, 841–845.
- Patwardhan A, Carazo JM, Carragher B, Henderson R, Heymann JB, Hill E, Jensen GJ, Lagerstedt I, Lawson CL, Ludtke SJ, Mastronarde D, Moore WJ, Roseman A, Rosenthal P, Sorzano CO, Sanz-Garcia E, Scheres SH, Subramaniam S, Westbrook J, Winn M, Swedlow JR and Kleywegt GJ (2012). Data management challenges in three-dimensional EM. Nat Struct Mol Biol 19, 1203–1207.
- Patwardhan A and Lawson CL (2016). Databases and Archiving for CryoEM. Methods Enzymol 579, 393–412.
- Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC and Ferrin TE (2004). UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 25, 1605–1612.
- Protein Data Bank (1971). Crystallography: Protein Data Bank. Nature New Biol 233, 223.
- Putnam DK, Weiner BE, Woetzel N, Lowe EW Jr. and Meiler J (2015). BCL::SAXS: GPU accelerated Debye method for computation of small angle X-ray scattering profiles. Proteins 83, 1500–1512.
- Read RJ, Adams PD, Arendall WB 3rd, Brunger AT, Emsley P, Joosten RP, Kleywegt GJ, Krissinel EB, Lutteke T, Otwinowski Z, Perrakis A, Richardson JS, Sheffler WH, Smith JL, Tickle IJ, Vriend G and Zwart PH (2011). A new generation of crystallographic validation tools for the protein data bank. Structure 19, 1395–1412.
- Rizzo AA, Vassel FM, Chatterjee N, D’Souza S, Li Y, Hao B, Hemann MT, Walker GC and Korzhnev DM (2018). Rev7 dimerization is important for assembly and function of the Rev1/Polzeta translesion synthesis complex. Proc Natl Acad Sci U S A 115, E8191–E8200.
- Robinson PJ, Trnka MJ, Pellarin R, Greenberg CH, Bushnell DA, Davis R, Burlingame AL, Sali A and Kornberg RD (2015). Molecular architecture of the yeast Mediator complex. Elife 4, e08719.
- Rout MP and Sali A (2019). Principles for Integrative Structural Biology Studies. Cell 177, 1384–1403.
- Russel D, Lasker K, Webb B, Velazquez-Muriel J, Tjioe E, Schneidman-Duhovny D, Peterson B and Sali A (2012). Putting the pieces together: integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol 10, e1001244.
- Sali A, Berman HM, Schwede T, Trewhella J, Kleywegt G, Burley SK, Markley J, Nakamura H, Adams P, Bonvin AM, Chiu W, Peraro MD, Di Maio F, Ferrin TE, Grunewald K, Gutmanas A, Henderson R, Hummer G, Iwasaki K, Johnson G, Lawson CL, Meiler J, Marti-Renom MA, Montelione GT, Nilges M, Nussinov R, Patwardhan A, Rappsilber J, Read RJ, Saibil H, Schroder GF, Schwieters CD, Seidel CA, Svergun D, Topf M, Ulrich EL, Velankar S and Westbrook JD (2015). Outcome of the First wwPDB Hybrid/Integrative Methods Task Force Workshop. Structure 23, 1156–1167.
- Schaarschmidt J, Monastyrskyy B, Kryshtafovych A and Bonvin A (2018). Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 86 Suppl 1, 51–66.
- Schneidman-Duhovny D, Inbar Y, Nussinov R and Wolfson HJ (2005). PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 33, W363–367.
- Schwede T, Sali A, Honig B, Levitt M, Berman HM, Jones D, Brenner SE, Burley SK, Das R, Dokholyan NV, Dunbrack RL Jr., Fidelis K, Fiser A, Godzik A, Huang YJ, Humblet C, Jacobson MP, Joachimiak A, Krystek SR Jr., Kortemme T, Kryshtafovych A, Montelione GT, Moult J, Murray D, Sanchez R, Sosnick TR, Standley DM, Stouch T, Vajda S, Vasquez M, Westbrook JD and Wilson IA (2009). Outcome of a workshop on applications of protein models in biomedical research. Structure 17, 151–159.
- Schwieters CD, Bermejo GA and Clore GM (2018). Xplor-NIH for molecular structure determination from NMR and other data sources. Protein Sci 27, 26–40.
- Serra F, Bau D, Goodstadt M, Castillo D, Filion GJ and Marti-Renom MA (2017). Automatic analysis and 3D-modelling of Hi-C data using TADbit reveals structural features of the fly chromatin colors. PLoS Comput Biol 13, e1005665.
- Shi Y, Fernandez-Martinez J, Tjioe E, Pellarin R, Kim SJ, Williams R, Schneidman-Duhovny D, Sali A, Rout MP and Chait BT (2014). Structural characterization by cross-linking reveals the detailed architecture of a coatomer-related heptameric module from the nuclear pore complex. Mol Cell Proteomics 13, 2927–2943.
- Shi Y, Pellarin R, Fridy PC, Fernandez-Martinez J, Thompson MK, Li Y, Wang QJ, Sali A, Rout MP and Chait BT (2015). A strategy for dissecting the architectures of native macromolecular assemblies. Nat Methods 12, 1135–1138.
- Skinner JJ, Lim WK, Bedard S, Black BE and Englander SW (2012a). Protein dynamics viewed by hydrogen exchange. Protein Sci 21, 996–1005.
- Skinner JJ, Lim WK, Bedard S, Black BE and Englander SW (2012b). Protein hydrogen exchange: testing current models. Protein Sci 21, 987–995.
- Sunnerhagen M, Nilges M, Otting G and Carey J (1997). Solution structure of the DNA-binding domain and model for the complex of multifunctional hexameric arginine repressor with DNA. Nat Struct Biol 4, 819–826.
- Tagari M, Newman R, Chagoyen M, Carazo JM and Henrick K (2002). New electron microscopy database and deposition system. Trends Biochem Sci 27, 589.
- Tang Y, Huang YJ, Hopf TA, Sander C, Marks DS and Montelione GT (2015). Protein structure determination by combining sparse NMR data with evolutionary couplings. Nat Methods 12, 751–754.
- Tejero R, Snyder D, Mao B, Aramini JM and Montelione GT (2013). PDBStat: a universal restraint converter and restraint analysis software package for protein NMR. J Biomol NMR 56, 337–351.
- The UniProt Consortium (2017). UniProt: the universal protein knowledgebase. Nucleic Acids Res 45, D158–D169.
- Trewhella J, Duff AP, Durand D, Gabel F, Guss JM, Hendrickson WA, Hura GL, Jacques DA, Kirby NM, Kwan AH, Perez J, Pollack L, Ryan TM, Sali A, Schneidman-Duhovny D, Schwede T, Svergun DI, Sugiyama M, Tainer JA, Vachette P, Westbrook J and Whitten AE (2017). 2017 publication guidelines for structural modelling of small-angle scattering data from biomolecules in solution: an update. Acta Crystallogr D Struct Biol 73, 710–728.
- Trewhella J, Hendrickson WA, Kleywegt GJ, Sali A, Sato M, Schwede T, Svergun DI, Tainer JA, Westbrook J and Berman HM (2013). Report of the wwPDB Small-Angle Scattering Task Force: data requirements for biomolecular modeling and the PDB. Structure 21, 875–881.
- Trussart M, Serra F, Bau D, Junier I, Serrano L and Marti-Renom MA (2015). Assessing the limits of restraint-based 3D modeling of genomes and genomic domains. Nucleic Acids Res 43, 3465–3477.
- Ulrich EL, Akutsu H, Doreleijers JF, Harano Y, Ioannidis YE, Lin J, Livny M, Mading S, Maziuk D, Miller Z, Nakatani E, Schulte CF, Tolmie DE, Kent Wenger R, Yao H and Markley JL (2008). BioMagResBank. Nucleic Acids Res 36, D402–408.
- Ulrich EL, Baskaran K, Dashti H, Ioannidis YE, Livny M, Romero PR, Maziuk D, Wedell JR, Yao H, Eghbalnia HR, Hoch JC and Markley JL (2018). NMR-STAR: comprehensive ontology for representing, archiving and exchanging data from nuclear magnetic resonance spectroscopic experiments. J Biomol NMR, 1–5.
- Upla P, Kim SJ, Sampathkumar P, Dutta K, Cahill SM, Chemmama IE, Williams R, Bonanno JB, Rice WJ, Stokes DL, Cowburn D, Almo SC, Sali A, Rout MP and Fernandez-Martinez J (2017). Molecular Architecture of the Major Membrane Ring Component of the Nuclear Pore Complex. Structure 25, 434–445.
- Valentini E, Kikhney AG, Previtali G, Jeffries CM and Svergun DI (2015). SASBDB, a repository for biological small-angle scattering data. Nucleic Acids Res 43, D357–363.
- Vallat B, Webb B, Westbrook J, Sali A and Berman HM (2019). Archiving and disseminating integrative structure models. J Biomol NMR.
- Vallat B, Webb B, Westbrook JD, Sali A and Berman HM (2018). Development of a Prototype System for Archiving Integrative/Hybrid Structure Models of Biological Macromolecules. Structure.
- van Heel M and Schatz M (2005). Fourier shell correlation threshold criteria. J Struct Biol 151, 250–262.
- van Zundert GCP, Melquiond ASJ and Bonvin A (2015). Integrative Modeling of Biomolecular Complexes: HADDOCKing with Cryo-Electron Microscopy Data. Structure 23, 949–960.
- van Zundert GCP, Rodrigues J, Trellet M, Schmitz C, Kastritis PL, Karaca E, Melquiond ASJ, van Dijk M, de Vries SJ and Bonvin A (2016). The HADDOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes. J Mol Biol 428, 720–725.
- Viswanath S, Chemmama IE, Cimermancic P and Sali A (2017). Assessing Exhaustiveness of Stochastic Sampling for Integrative Modeling of Macromolecular Structures. Biophys J 113, 2344–2353.
- Vizcaino JA, Cote RG, Csordas A, Dianes JA, Fabregat A, Foster JM, Griss J, Alpi E, Birim M, Contell J, O’Kelly G, Schoenegger A, Ovelleiro D, Perez-Riverol Y, Reisinger F, Rios D, Wang R and Hermjakob H (2013). The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res 41, D1063–1069.
- Vizcaino JA, Csordas A, del-Toro N, Dianes JA, Griss J, Lavidas I, Mayer G, Perez-Riverol Y, Reisinger F, Ternent T, Xu QW, Wang R and Hermjakob H (2016). 2016 update of the PRIDE database and its related tools. Nucleic Acids Res 44, D447–456.
- Vizcaino JA, Mayer G, Perkins S, Barsnes H, Vaudel M, Perez-Riverol Y, Ternent T, Uszkoreit J, Eisenacher M, Fischer L, Rappsilber J, Netz E, Walzer M, Kohlbacher O, Leitner A, Chalkley RJ, Ghali F, Martinez-Bartolome S, Deutsch EW and Jones AR (2017). The mzIdentML Data Standard Version 1.2, Supporting Advances in Proteome Informatics. Mol Cell Proteomics 16, 1275–1285.
- Wang RY, Song Y, Barad BA, Cheng Y, Fraser JS and DiMaio F (2016). Automated structure refinement of macromolecular assemblies from cryo-EM maps using Rosetta. Elife 5.
- Wang X, Chemmama IE, Yu C, Huszagh A, Xu Y, Viner R, Block SA, Cimermancic P, Rychnovsky SD, Ye Y, Sali A and Huang L (2017). The proteasome-interacting Ecm29 protein disassembles the 26S proteasome in response to oxidative stress. J Biol Chem 292, 16310–16320.
- Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, Heer FT, de Beer TAP, Rempfer C, Bordoli L, Lepore R and Schwede T (2018). SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46, W296–W303.
- Weiner BE, Alexander N, Akin LR, Woetzel N, Karakas M and Meiler J (2014). BCL::Fold--protein topology determination from limited NMR restraints. Proteins 82, 587–595.
- Weiner BE, Woetzel N, Karakas M, Alexander N and Meiler J (2013). BCL::MP-fold: folding membrane proteins through assembly of transmembrane helices. Structure 21, 1107–1117.
- Westbrook J (2013). “PDBx/mmCIF Dictionary Resources.” Retrieved August 25, 2015, from http://mmcif.wwpdb.org.
- Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, 't Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J and Mons B (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 1–9.
- Williams CJ, Headd JJ, Moriarty NW, Prisant MG, Videau LL, Deis LN, Verma V, Keedy DA, Hintze BJ, Chen VB, Jain S, Lewis SM, Arendall WB 3rd, Snoeyink J, Adams PD, Lovell SC, Richardson JS and Richardson DC (2018). MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci 27, 293–315.
- Williamson MP, Havel TF and Wuthrich K (1985). Solution conformation of proteinase inhibitor IIA from bull seminal plasma by 1H nuclear magnetic resonance and distance geometry. J Mol Biol 182, 295–315.
- wwPDB consortium (2019). Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res 47, D520–D528.
- Young JY, Westbrook JD, Feng Z, Sala R, Peisach E, Oldfield TJ, Sen S, Gutmanas A, Armstrong DR, Berrisford JM, Chen L, Chen M, Di Costanzo L, Dimitropoulos D, Gao G, Ghosh S, Gore S, Guranovic V, Hendrickx PM, Hudson BP, Igarashi R, Ikegawa Y, Kobayashi N, Lawson CL, Liang Y, Mading S, Mak L, Mir MS, Mukhopadhyay A, Patwardhan A, Persikova I, Rinaldi L, Sanz-Garcia E, Sekharan MR, Shao C, Swaminathan GJ, Tan L, Ulrich EL, van Ginkel G, Yamashita R, Yang H, Zhuravleva MA, Quesada M, Kleywegt GJ, Berman HM, Markley JL, Nakamura H, Velankar S and Burley SK (2017). OneDep: Unified wwPDB System for Deposition, Biocuration, and Validation of Macromolecular Structures in the PDB Archive. Structure 25, 536–545.