Nucleic Acids Research. 2013 Nov 23;42(Database issue):D336–D346. doi: 10.1093/nar/gkt1144

ModBase, a database of annotated comparative protein structure models and associated resources

Ursula Pieper 1,2, Benjamin M Webb 1,2, Guang Qiang Dong 1,2, Dina Schneidman-Duhovny 1,2, Hao Fan 1,2, Seung Joong Kim 1,2, Natalia Khuri 1,2,3, Yannick G Spill 4,5, Patrick Weinkam 1,2, Michal Hammel 6, John A Tainer 7,8, Michael Nilges 4, Andrej Sali 1,2,*
PMCID: PMC3965011  PMID: 24271400

Abstract

ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller/). ModBase currently contains almost 30 million reliable models for domains in 4.7 million unique protein sequences. ModBase allows users to compute or update comparative models on demand, through an interface to the ModWeb modeling server (http://salilab.org/modweb). ModBase models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/). Recently developed associated resources include the AllosMod server for modeling ligand-induced protein dynamics (http://salilab.org/allosmod), the AllosMod-FoXS server for predicting a structural ensemble that fits an SAXS profile (http://salilab.org/allosmod-foxs), the FoXSDock server for protein–protein docking filtered by an SAXS profile (http://salilab.org/foxsdock), the SAXS Merge server for automatic merging of SAXS profiles (http://salilab.org/saxsmerge) and the Pose & Rank server for scoring protein–ligand complexes (http://salilab.org/poseandrank). In this update, we also highlight two applications of ModBase: a PSI:Biology initiative to maximize the structural coverage of the human alpha-helical transmembrane proteome and a determination of structural determinants of human immunodeficiency virus-1 protease specificity.

INTRODUCTION

Genome sequencing efforts provide us with the complete genetic blueprints of thousands of organisms, including many eukaryotic genomes. We are now faced with the challenge of assigning, investigating and modifying the functions of proteins encoded by these genomes. This task is generally facilitated by the knowledge of the 3D protein structures, which are best determined by experimental methods such as X-ray crystallography and nuclear magnetic resonance spectroscopy. While the number of experimentally determined structures deposited in the Protein Data Bank (PDB) (1) increased by nearly 40% to ∼93 000 in the past 3 years (as of September 2013), the number of sequences in the comprehensive sequence databases, such as UniProtKB (2) and GenPept (3), continues to grow even more rapidly; for example, the number of sequences in UniProtKB has now reached >41 million, compared with 12 million only 3 years ago. Therefore, protein structure prediction is essential to bridge this gap. The need for accurate models can frequently be met by homology or comparative modeling (4–13). Comparative modeling is carried out in four sequential steps: identifying known structures (templates) related to the sequence to be modeled (target), aligning the target sequence with the templates, building models and assessing the models. Because the first step requires a template, comparative modeling is only applicable when the target sequence is detectably related to a known protein structure.

As more proteins are modeled, web-accessible resources that assist biologists in evaluating and analyzing models become increasingly useful. Here, we describe the current state of the ModBase database of comparative protein structure models, the ModWeb comparative modeling web-server and several new associated resources, including web-servers that use SAXS data in the context of comparative modeling: The AllosMod server for modeling ligand-induced protein dynamics (http://salilab.org/allosmod) (14), the AllosMod-FoXS server for predicting the ensemble of conformations that best fit a given SAXS profile (http://salilab.org/allosmod-foxs) (Weinkam et al. in preparation), the FoXSDock server that performs protein–protein docking filtered by a SAXS profile (http://salilab.org/foxsdock) (15), the SAXS Merge server for merging SAXS profiles (http://salilab.org/saxsmerge) (Spill et al. accepted) and the Pose & Rank server for scoring protein–ligand complexes based on a statistical potential (http://salilab.org/poseandrank) (16). Finally, we highlight applications of ModBase models to maximize the structural coverage of the human α-helical transmembrane proteome in a PSI:Biology effort; and to an analysis of structural determinants of human immunodeficiency virus-1 (HIV-1) protease specificity.

CONTENTS

Model generation by comparative modeling (Modeller and ModPipe)

Models in ModBase are calculated using our automated software pipeline for comparative protein structure modeling, ModPipe (17). ModPipe relies mostly on modules of Modeller (18) as well as fold assignment and sequence-structure alignment by PSI-BLAST (19) and the HHSuite modules HHBlits (20) and HHSearch (21). To be able to process a large number of sequences, it is implemented on a Linux cluster.

ModPipe uses sequence–sequence (22), sequence–profile (19,23) and profile–profile (5,24) methods for fold assignment and target–template alignment, using a promiscuous E-value threshold of 1.0 to increase the likelihood of identifying the best available template structure. In addition to the previously implemented profile methods (Modeller’s Build-Profile and PPScan, and PSI-BLAST), we recently added an option to use HHBlits and HHSearch. These will be included in the next public release of ModPipe (2.3.0, expected December 2013). Alignments created by any of these methods can cover the complete target sequence, or only a segment of it, depending on the availability of suitable PDB templates. With the added functionality of HHBlits and HHSearch, some ModPipe models are now based on multiple templates.

To increase efficiency, the available target–template alignments are filtered by sequence identity (ModPipe template option: TOP): if the highest target–template sequence identity is ≤40%, ModPipe selects alignments for all detected templates. Otherwise, ModPipe selects only the alignments whose sequence identity lies in a 20% window starting from the highest sequence identity. For each selected target–template alignment, 10 models are calculated (18), and the model with the best value of the DOPE statistical potential (25) is selected and then evaluated by several additional quality criteria: (i) target–template sequence identity, (ii) GA341 score (26), (iii) Z-DOPE score (25), (iv) MPQS score (ModPipe quality score) (27) and (v) TSVMod score (28). The models that score best with at least one of these quality criteria are selected for further filtering. If >30 residues of a target sequence are not covered by a selected model, additional models are selected even if they do not score best with at least one of the quality criteria. Finally, only the models with quality criteria values above specified thresholds or with an E-value <10⁻⁴ are included in the final model set.
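The identity-window filter and the DOPE-based choice among the 10 models can be sketched as follows. This is a minimal illustration and not ModPipe's actual code: the function names and data layout are hypothetical, and the 20% window is read here as "within 20 percentage points below the highest identity", which is an assumption about the TOP option.

```python
def select_alignments(identities, threshold=40.0, window=20.0):
    """Return indices of target-template alignments kept by a TOP-like filter.

    identities: percent sequence identity of each alignment for one target.
    threshold and window follow the values quoted in the text; their exact
    use inside ModPipe is an assumption.
    """
    if not identities:
        return []
    best = max(identities)
    if best <= threshold:
        # No close template: keep alignments for all detected templates.
        return list(range(len(identities)))
    # Otherwise keep only alignments inside the window below the best identity.
    return [i for i, ident in enumerate(identities) if ident >= best - window]


def pick_best_model(dope_scores):
    """Return the index of the model with the lowest DOPE score.

    DOPE is an energy-like potential, so lower values are better.
    """
    return min(range(len(dope_scores)), key=dope_scores.__getitem__)
```

For example, a target whose alignments have identities of 90%, 75%, 60% and 30% would keep only the first two under this reading, whereas a target whose best identity is 35% would keep all of its alignments.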

A key feature of the pipeline is that the validity of sequence–structure relationships is not prejudged at the fold-assignment stage; instead, sequence–structure matches are assessed after the construction of the models and their evaluation. This approach enables a thorough exploration of fold assignments, sequence–structure alignments and conformations, with the aim of finding the model with the best evaluation score, at the expense of increasing the computational time significantly; for some sequences, a few thousand models can be calculated. For sequences with high-quality templates, the optional ‘TOP’ keyword can reduce the amount of computer time by up to 60%.

The source code for ModPipe is freely accessible under the GNU General Public License (http://salilab.org/modpipe). The binary code for Modeller is also freely available to academics for a number of different operating systems (http://salilab.org/modeller).

Statistically optimized atomic potentials (SOAP) for assessing protein interfaces and loops

Both loop modeling and protein–protein docking require accurate scoring functions for selecting the most accurate sampled models. Statistically Optimized Atomic Potentials (SOAP)-PP and SOAP-Loop are atomic statistical potentials for assessing protein interfaces and loops, respectively (http://salilab.org/SOAP, also available in Modeller) (29). They were derived using a Bayesian framework for inferring SOAP. When using SOAP-PP for scoring protein–protein docking models, a near-native model is within the top 10 scoring models in 52% of the PatchDock decoys (30), compared with 23 and 27% for the state-of-the-art ZRANK (31) and FireDock (32) scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark (33), the average main-chain root-mean-square-deviation (RMSD) of the best-scored conformations by SOAP-Loop is 1.5 Å, close to the average RMSD of the best-sampled conformations (1.2 Å) and significantly better than that selected by the Rosetta (34) (2.1 Å), DFIRE (35) (2.3 Å), DOPE (2.5 Å) (25) and PLOP scoring functions (3.0 Å). The SOAP-PP score is used by our AllosMod-FoXS server (below). We are incorporating SOAP scores into the modeling and model assessment modules of ModPipe.
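The "top 10" success rates quoted above are a standard way to compare scoring functions on a decoy benchmark. The sketch below shows one way to compute such a rate; the data layout is hypothetical, and it assumes that a lower score is better, which may not hold for every scoring function.

```python
def top_n_success_rate(benchmark, n=10):
    """Fraction of cases with a near-native model among the n best-scored models.

    benchmark: list of cases; each case is a list of (score, is_near_native)
    pairs for the decoys of one complex. Lower scores are assumed better.
    """
    hits = 0
    for models in benchmark:
        ranked = sorted(models, key=lambda m: m[0])
        if any(near_native for _, near_native in ranked[:n]):
            hits += 1
    return hits / len(benchmark)
```

A rate of 0.52 from this function would correspond to the 52% reported for SOAP-PP, provided that "near-native" is defined in the same way as in the benchmark.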

ModBase model sets

Models in ModBase are organized in datasets. Because of the rapid growth of the public sequence databases, we concentrate our efforts on adding datasets that are useful for specific projects, rather than attempt to model all known protein sequences based on all detectably related known structures. Currently, ModBase includes a model dataset for each of 65 complete genomes, as well as datasets for all sequences in the Structure Function Linkage Database (SFLD) (36), and for the complete SwissProt/TrEMBL database as of 2005 (http://salilab.org/modbase/statistics). Additionally, available models for new SFLD sequences are added weekly. Together with other project-oriented datasets, ModBase currently contains ∼29 million reliable models for domains in 4.7 million unique sequences. The ‘Nominate a modelome!’ feature allows community users to request modeling of additional complete genomes as our computational resources allow. This feature has been used, for example, to support the Tropical Disease Initiative (http://tropicaldisease.org) (37–40).

ModWeb: comparative modeling web-server

The ModWeb comparative modeling web-server is an integral module of ModBase (http://salilab.org/modweb) (17). In the default mode, ModWeb accepts one or more sequences in the FASTA format, followed by calculating and evaluating their models using ModPipe based on the best available templates from the PDB. Alternatively, ModWeb also accepts a protein structure as input (template-based calculation), calculates a multiple sequence profile and identifies all homologous sequences in the UniProtKB database, followed by modeling these homologs based on the user-provided structure. This alternative protocol is a useful tool for measuring the impact of new structures, such as those generated by structural genomics efforts (41). Moreover, new members of sequence superfamilies with at least one known structure can be identified (42).

In addition to anonymous access, registered users get unified access to all their ModWeb datasets and can submit template-based calculations.

ASSOCIATED RESOURCES

A number of web services are associated with ModBase. Some of these are tightly integrated with ModBase, whereas others contain data that are derived through ModBase [e.g. single-nucleotide polymorphism (SNP) annotations created by LS-SNP (43)]. We already described the interactions of ModBase with the ModLoop server for loop modeling in protein structures (http://salilab.org/modloop) (44), the PIBASE database of protein–protein interaction (http://salilab.org/pibase) (45), the DBAli database of structural alignments (http://salilab.org/dbali) (46,47), the LS-SNP database of structural annotations of human non-synonymous SNPs (http://salilab.org/LS-SNP) (43,48,49), the SALIGN server for multiple sequence and structure alignment (http://salilab.org/salign) (27,50), the ModEval server for predicting the accuracy of protein structure models (http://salilab.org/modeval) (27), the PCSS server for predicting which peptides bind to a given protein (http://salilab.org/pcss) (27) and the FoXS server for calculating and fitting small angle X-ray scattering profiles (http://salilab.org/foxs) (27,51). Here, we describe several new servers that interact with ModBase.

AllosMod: a web-server for modeling ligand-induced protein dynamics

Conformational transitions of biomolecules are key to many aspects of biology. These dynamic changes span a broad range of time and size scales, and include protein folding, aggregation, induced fit and allostery.

The AllosMod web server (http://salilab.org/allosmod) predicts conformational changes that occur in the native ensemble, such as allosteric conformational transitions. The input is one or more macromolecular coordinate files (including DNA, RNA and sugar molecules) and the corresponding sequence(s). The output is a set of molecular dynamics trajectories based on a simplified energy landscape. The documentation includes analysis examples to help the user in interpreting the expected output. Carefully designed energy landscapes allow efficient molecular dynamics sampling at constant temperatures, thereby providing ergodic sampling of conformational space. AllosMod energy landscapes are constructed using contacts in crystal structure(s) to define the energetic minima. This model is referred to as a structure-based or Gō model (52–54). The energy landscapes are sampled using many short constant temperature molecular dynamics simulations. Sampling occurs quickly, even for large systems with up to 10 000 residues, because the simplified landscapes can be stored in memory. The user can also download Python scripts necessary to run and modify the simulations, which are performed using Modeller (18).

The capabilities of the AllosMod server have been demonstrated in a study of allosteric systems with known effector bound and unbound crystal structures (14,55). Effector bound and unbound simulations are performed using a landscape with a single minimum for the interactions in the effector binding site, corresponding to the bound or unbound structure and dual minima for interactions in the rest of the protein, corresponding to the bound and unbound structures. AllosMod can also be used to predict coupling (i.e. ΔΔG) between a mutation site and the effector binding site.

A family of web-servers for computation and application of SAXS profiles

SAXS is a common technique for low-resolution structural characterization of molecules in solution (56,57). SAXS experiments determine the scattering intensity of a molecule as a function of spatial frequency, resulting in a SAXS profile that can be easily converted into the approximate distribution of atomic distances in the measured system. The experiments can be performed with the protein sample in solution, and usually take only a few minutes on a well-equipped synchrotron beamline (57). Here, we describe new features of the FoXS server for calculating and fitting SAXS profiles, the AllosMod-FoXS server that predicts the structural ensemble that best fits a given SAXS profile, the FoXSDock server that performs protein–protein docking filtered by a SAXS profile and the SAXS Merge server for merging SAXS profiles measured at different concentrations and exposure times.

FoXS (http://salilab.org/foxs) is a rapid and accurate server for calculating a SAXS profile of a given molecular structure (51). The input is one or more macromolecular coordinate files or PDB codes and an experimental profile. The output is a calculated SAXS profile for each input structure, fitted onto the experimental profile. The method explicitly computes all inter-atomic distances and models the first solvation layer based on solvent accessibility. FoXS was tested on 11 protein, 1 DNA and 2 RNA structures, revealing superior accuracy and speed versus CRYSOL (58), AquaSAXS (59), the Zernike polynomials-based method (60) and Fast-SAXS-pro (61). In addition, we demonstrated a significant correlation of the SAXS score with the accuracy of a structural model (62). We have recently updated the server to an interactive user interface; profiles are displayed via an HTML5 canvas element and structures are shown in a Jmol window (Figure 1). If the user uploads multiple structures, the server automatically performs the minimal ensemble computation with Minimal Ensemble Search (MES) (64).
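Computing a profile from all inter-atomic distances is commonly done with the Debye formula, I(q) = Σᵢ Σⱼ fᵢ fⱼ sin(q dᵢⱼ)/(q dᵢⱼ). The sketch below illustrates that formula and a χ-type fit of a calculated profile to an experimental one. It is a simplified illustration, not the FoXS implementation: it assumes constant, q-independent form factors, omits the solvation layer described above, and uses hypothetical function names.

```python
import math


def debye_profile(coords, form_factors, q_values):
    """Scattering intensity I(q) from the Debye formula over all atom pairs."""
    n = len(coords)
    profile = []
    for q in q_values:
        total = 0.0
        for i in range(n):
            for j in range(n):
                x = q * math.dist(coords[i], coords[j])
                # sin(x)/x tends to 1 as x -> 0 (same atom, or q = 0).
                sinc = 1.0 if x < 1e-12 else math.sin(x) / x
                total += form_factors[i] * form_factors[j] * sinc
        profile.append(total)
    return profile


def chi_fit(exp_intensity, exp_error, calc_intensity):
    """Scale a calculated profile onto experimental data; return (chi, scale)."""
    # Closed-form least-squares scale factor for error-weighted residuals.
    num = sum(e * c / s ** 2
              for e, c, s in zip(exp_intensity, calc_intensity, exp_error))
    den = sum(c * c / s ** 2 for c, s in zip(calc_intensity, exp_error))
    scale = num / den
    chi2 = sum(((e - scale * c) / s) ** 2
               for e, c, s in zip(exp_intensity, calc_intensity, exp_error))
    return math.sqrt(chi2 / len(exp_intensity)), scale
```

The double loop costs O(N²) per q value, which is acceptable for a sketch but is the step a production tool would need to accelerate for large structures.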

Figure 1.

The computed profiles for filament models of the XLF–XRCC4 complex (63) are fitted to the experimental SAXS profile with FoXS. The interactive user interface displays the profiles on the left and the models on the right, using the same color for each model/profile pair. The table below the panels displays the fit parameters and includes buttons to simultaneously show or hide each model/profile pair. Clicking on Minimal Ensemble Search (MES) results (above the display panel) takes the user to the MES output page.

AllosMod-FoXS (http://salilab.org/allosmod-foxs) is a server that predicts the structural ensemble that best fits a given SAXS profile. The input is one or more macromolecular coordinate files, the corresponding sequence(s) and an experimental SAXS profile. The output is the structural ensemble that best fits the input SAXS profile. The server relies on AllosMod conformational sampling (14), FoXS calculations of theoretical SAXS profiles, minimal ensemble computation with MES (64) and the SOAP-PP score. The server was motivated by the need to describe conformational changes in proteins, such as allostery, based on both modeling considerations (as represented by AllosMod) and experimental SAXS data (as represented by FoXS).

The AllosMod-FoXS server uses various sampling algorithms in AllosMod to generate structures that are directly entered into FoXS. Because FoXS explicitly computes all inter-atomic distances and models the first solvation layer based on solvent accessibility, it can be used to score the similarity of the experimental SAXS profile to the predicted SAXS profiles corresponding to structures from the AllosMod simulations. In addition to the FoXS score, each conformation is assessed for structural quality, using the SOAP-PP score. These two scores are combined to predict structures that collectively best explain the experimental SAXS profile.

FoXSDock (http://salilab.org/foxsdock) is a web server that uses SAXS profiles to filter the models produced by protein–protein docking. It accepts as input structures of two docked proteins and an experimental SAXS profile of their complex. The output is a set of docking models and their calculated SAXS profiles fitted onto the experimental profile. Although many structures of single protein components are becoming available, structural characterization of their complexes remains challenging. Although general, protein–protein docking methods suffer from large errors because of protein flexibility and inaccurate scoring functions. However, when additional information, such as a SAXS profile, is available, it is possible to significantly increase the accuracy of the computational docking.

FoXSDock combines rigid global docking by PatchDock, filtering of the models based on the SAXS profile and interface refinement by FireDock (15). The approach was benchmarked on 176 protein complexes with simulated SAXS profiles, as well as on 7 complexes with experimentally determined SAXS profiles (30). When induced fit is <1.5 Å interface Cα RMSD and the fraction of residues missing from the component structures is <3%, FoXSDock can find a model close to the native structure within the top 10 predictions in 77% of the cases; in comparison, docking alone succeeds in only 34% of the cases.

SAXS Merge (http://salilab.org/saxsmerge) is a web server that uses automated statistical methods to merge SAXS profiles determined at different concentrations and exposure times. High-throughput SAXS data collection requires robust, accurate and automated tools for data processing and merging (57,65). However, SAXS data are generally processed highly subjectively, often manually with the aid of the PRIMUS software package (66). The operation requires an experienced user who can manually inspect each profile to be merged and decide whether the SAXS profiles agree or not. The SAXS Merge web-server reduces the need for user intervention through an automated and statistically principled merging procedure based on a Bayesian approach (Spill et al. submitted). The SAXS Merge web server was successfully validated on a benchmark of 16 SAXS datasets. The input consists only of the buffer-subtracted SAXS profiles in a common three-column text format. The output comprises (i) a list of individual q points with associated source profiles, (ii) an estimate of the mean profile, along with a 95% Bayesian credible interval and (iii) the most suitable parametric mean function for the resulting profile, together with an estimate of the noise level in the pooled dataset. The output is visualized interactively through the web-browser and can also be downloaded.

Pose & rank: a web-server for scoring protein–ligand complexes

Molecular recognition between proteins and ligands plays an important role in many biological processes. Predicting the structures of protein–ligand complexes and finding ligands by virtual screening of small molecule databases are two long-standing goals in molecular biophysics and medicinal chemistry. Solving both problems requires the development of an accurate and efficient scoring function to assess protein–ligand interactions.

The Pose & Rank web server (http://salilab.org/poseandrank) (16) provides access to two atomic distance-dependent statistical scoring functions based on probability theory that can be used in protein–ligand docking: The PoseScore was optimized for recognizing native binding geometries of ligands from other poses, and the RankScore was optimized for distinguishing ligands from non-binding molecules. The server accepts as input a coordinate file of the target protein structure in the PDB format and docking poses of small molecules. The output is a list of scores for each protein–small molecule complex. PoseScore ranks a near-native binding pose the best, top 5 and top 10 for 88%, 97% and 99% of targets, respectively. RankScore improves the overall ligand enrichment (logAUC) and early enrichment (EF1) scores computed by DOCK 3.6 (67) for 68% and 74% of targets, respectively. The Pose & Rank resource can contribute to many applications, such as selecting ligand candidates from virtual screening for experimental testing, predicting the binding geometries for known ligands and suggesting binding site mutations that alter the ligand binding properties and consequently protein functions.
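Early enrichment measures such as EF1 compare how often known ligands appear near the top of a ranked database with how often they would appear by chance. The sketch below uses a common definition of the enrichment factor; whether it matches the exact definition behind the numbers above is an assumption, as are the function name and the convention that a lower score is better.

```python
def enrichment_factor(scores, is_ligand, fraction=0.01):
    """Enrichment factor for the top `fraction` of a score-ranked database.

    scores: one score per molecule (lower assumed better).
    is_ligand: parallel list of booleans marking known ligands.
    """
    ranked = sorted(zip(scores, is_ligand), key=lambda pair: pair[0])
    n_top = max(1, round(fraction * len(ranked)))
    found = sum(1 for _, ligand in ranked[:n_top] if ligand)
    total = sum(1 for ligand in is_ligand if ligand)
    # Ligand rate in the selected top slice relative to the whole database.
    return (found / n_top) / (total / len(ranked))
```

With `fraction=0.01` this corresponds to an EF1-style value: a result of 1 means no improvement over random selection, and larger values mean that ligands are concentrated at the top of the ranking.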

APPLICATION EXAMPLES

Coordinating the impact of structural genomics on the human α-helical transmembrane proteome

With the recent successes in determining membrane protein structures, we explored the tractability of determining representatives for the entire human transmembrane proteome (68) (http://salilab.org/membrane). This proteome contains 2925 unique integral α-helical transmembrane domain sequences that cluster into 1201 families sharing >25% sequence identity. We assessed the modeling coverage by processing all sequences through ModPipe and analyzing the resulting ModBase dataset. We then clustered all sequences [BlastClust (69)] and annotated them with cluster size, modeling coverage and number of predicted transmembrane helices. Finally, we explored several target selection strategies. Structures of 100 optimally selected targets would increase the fraction of modelable human α-helical transmembrane domains from 26 to 58%, thus providing structure/function information not otherwise available.

To leverage the results of this study, the PSI:Biology Network (http://www.nigms.nih.gov/Research/FeaturedPrograms/PSI/psi_biology/), including high-throughput and membrane PSI centers as well as the Structural Genomics Consortium, is attempting to express nearly 100 human transmembrane proteins using their standard high-throughput methods. The goal of this survey is to determine which methods best express certain classes of transmembrane proteins. The sequences of our previous analysis were further annotated by fraction of predicted disordered regions (70,71), number of glycosylation sites (2,72,73), clone availability (74–76), HUGO annotations (77), sequence length and several additional metrics. Eighty-six targets were hand-picked from the largest clusters to represent a diverse selection of human membrane proteins with maximum coverage of the transmembrane proteome. Cloning, expression and solubility experiments of these targets using the pipelines of the 10 participating research groups are currently in progress. Participants also use shared and individual sets of six controls. A standard method will be used by all to visualize the protein bands to quantify yield. A final full comparison will determine the most successful methods for each representative transmembrane protein. Progress of the survey is cataloged by the portal of the Protein Structure Initiative Structural Biology Knowledgebase [PSI SBKB (78); http://hmpps.sbkb.org/] and will be accessible to the public after the conclusion of the experiment. A final publication will summarize the survey’s findings.

Structural determinants of HIV-1 protease

The maturation of the HIV virion is facilitated by the cleavage of the Gag and Pol polyproteins (79). A homodimeric aspartic protease (HIV-1 protease) catalyzes these processing events at 10 non-homologous sites and is the target of some of the most effective antiretroviral drugs (80–82). These sites are eight amino acid residues in length; the cleavage occurs between the third and fourth residues (83–86). In addition to processing viral proteins, HIV-1 protease cleaves several human proteins during infection, such as the eukaryotic translation initiation factor 3 subunit D (eIF3D) (87–90).

To predict cleavage sites in human proteins, we began by examining sequence and structural features of >120 cellular substrates of HIV-1 protease that were recently identified in vitro (91) (for an example, see Figure 2). First, every residue of the cleaved and non-cleaved octapeptides was encoded using >512 physicochemical amino acid indices (93,94). To account for cooperativity between residues in different positions of the octapeptide, frequencies of dipeptides and gapped dipeptides (i.e. two specific residues separated by any residue) were also used to train machine learning algorithms for binary classification. Second, a greedy feature selection procedure was applied to determine features of octapeptides important for protease activity. Interestingly, although features encoding the known viral cleavage motif ELLE were important for classification, the most discriminating features encode structural preferences of amino acid residues in the second and fifth positions of the octapeptide. Therefore, we created a ModBase dataset of 405 models for 118 human proteins cleaved in vitro. PSI-Pred (95) was used to predict secondary structure elements for protein regions without templates. Analysis of the structural models showed the enrichment of the alpha+beta protein class (SCOP ID = 53931) among cleaved proteins and of coiled secondary structure (∼41%) among cleaved sites. We added structure-based descriptors of cleaved and non-cleaved sites to the sequence-based features and assessed the classifiers’ performance in a 5-fold cross-validation procedure. The average area under the receiver operating characteristic curve for the classifier trained with the Random Forest algorithm (96) was 0.965 (72% sensitivity and 98% specificity). The entire human proteome was then scanned for putative human substrates of the HIV-1 protease. We are currently experimentally validating several of the predicted cleavage sites.
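The dipeptide and gapped-dipeptide features can be illustrated with a short sketch. The function below counts adjacent residue pairs and pairs separated by one intervening residue in a peptide. The function name and return layout are hypothetical, and raw counts are returned here, whereas the study refers to frequencies, so a normalization step may be needed.

```python
from collections import Counter


def dipeptide_features(peptide, gap=1):
    """Count adjacent and gapped residue pairs in a peptide.

    A gapped pair is two residues separated by `gap` intervening residues.
    Returns (adjacent_counts, gapped_counts), keyed by two-letter strings.
    """
    adjacent = Counter(peptide[i:i + 2] for i in range(len(peptide) - 1))
    step = gap + 1
    gapped = Counter(peptide[i] + peptide[i + step]
                     for i in range(len(peptide) - step))
    return adjacent, gapped
```

For the Lupus La cleavage site shown in Figure 2 (Ile-Asp-Tyr-Tyr-Phe-Gly-Glu-Phe, or IDYYFGEF in one-letter code), an octapeptide yields seven adjacent pairs and six gapped pairs, for example the adjacent pair YY and the gapped pair I_Y.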

Figure 2.

Cleavage of human proteins by the HIV-1 protease: crystal structure of the N-terminal domain of human Lupus La protein (92) (left). Residues of the cleavage site (Ile-Asp-Tyr-Tyr-Phe-Gly-Glu-Phe) are shown in orange. The scissile bond between Tyr and Phe in the alpha-helix is cleaved by the HIV-1 protease in vitro.

ACCESS AND INTERFACE

Direct access

The main access to ModBase is through its web interface at http://salilab.org/modbase, by querying with UniProtKB (2,3) and GI (97) identifiers, gene names, annotation keywords, PDB (1) codes, dataset names, organism names, sequence similarity to the modeled sequences [BLAST (19)] and model-specific criteria such as model reliability, model size and target–template sequence identity. Additionally, it is possible to retrieve coordinate files and alignment files of all models for a specific sequence as text files. Metadata for all current ModBase models (updated weekly), all genome datasets and several additional project-specific datasets are also available from our FTP server (ftp://salilab.org/databases/modbase/projects).

The output of a search is displayed on pages with varying amounts of information about the modeled sequences, template structures, alignments and functional annotations. Output examples from a search resulting in one model are shown in Figure 3. A ribbon diagram of the model with the highest target–template sequence identity is displayed by default, together with some details of the modeling calculation. Ribbon thumbprints of additional models for this sequence link to corresponding pages with more information. Ribbon diagrams are generated on the fly using Molscript (98) and Raster3D (99). A pull-down menu provides links to additional functionalities: the SNP module; retrieval of coordinate and alignment files; molecular visualization by UCSF Chimera (100) that allows the user to display template and model coordinates together with their alignment; and Chimera visualization of predicted cavities [ConCavity (101)]. If mutation information is available for a protein sequence, links to the details are provided in the cross-references section. Additionally, cross-references to various other databases, including PDB (102), UniProtKB (103), the UCSC Genome Browser (104), EBI’s InterPro (105), PharmGKB (106) and SFLD (36) are given. Other ModBase pages provide overviews of more than one sequence or structure. All ModBase pages are interconnected to facilitate easy navigation between different views.

Figure 3.

ModBase interface elements. Search Form: search options are available through the pull-down menu. A quick overview of the available representations is displayed below the search form. Model Details Sketch: the Model details page provides information for all models of a given sequence. The sketch comprises two parts: the model coverage sketch that indicates the sequence coverage by all models (top line) and the sequence coverage by the current model (second line), and a ribbon diagram of the current model. Other models are available via thumbprints. Update and Remodel: this box shows the date of the last modeling calculation for the current sequence, and allows the user to request an update. Chimera Visualization: the visualization includes the model and template structures and the alignment. Cross-references: links to the PMP, UniProtKB, GenBank, UCSC Genome Browser and other databases. Model Details Options: the pull-down menu switches between representations and allows downloads of coordinate and alignment files. Quality Criteria: red indicates unreliable, green reliable. Model Overview: a different representation for several sequences gives a quick overview of modeling coverage and quality. Chimera Cavity View: visualizes cavities predicted by ConCavity.

Access through external databases

The Protein Model Portal

The Protein Model Portal (PMP) has become a valuable option for accessing ModBase models (http://proteinmodelportal.org) (107). The PMP is a single point of entry for accessing protein structure models from a number of different databases. It queries all participating source model databases and serves the user with the model coordinates, alignments and quality criteria from a central location. It has been developed as a module of the Protein Structure Initiative Knowledgebase (PSI KB) (79,108). The PMP provides a flexible search interface for all deposited models, quality estimation, cross-links to other sequence and structure databases, annotations of sequences and their models, a central point of entry to comparative modeling servers (including ModWeb) and quality estimation servers (including ModEval), and detailed tutorials on all aspects of comparative modeling. Currently, the PMP retrieves ∼450 000 model coordinate files from ModBase each week.
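The single-point-of-entry pattern described above, in which one query is sent to every participating source database and the results are returned from a central location, can be sketched as follows. This is an illustrative toy only and does not reflect the actual PMP implementation or any real API: the `ModelRecord` fields, the `quality` score, the in-memory sources and the source name `"other_db"` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical summary of one model returned by a source database."""
    source: str
    sequence_id: str
    model_id: str
    quality: float  # assumed score in [0, 1]; higher means more reliable


# A source is modeled as any callable that maps a sequence ID to its records.
Source = Callable[[str], List[ModelRecord]]


def query_all(sources: Dict[str, Source], sequence_id: str) -> List[ModelRecord]:
    """Query every source for one sequence and return a merged, ranked list."""
    hits: List[ModelRecord] = []
    for name, lookup in sources.items():
        try:
            hits.extend(lookup(sequence_id))
        except Exception:
            # In this sketch, one failing source does not block the others.
            continue
    return sorted(hits, key=lambda rec: rec.quality, reverse=True)


def make_source(records: List[ModelRecord]) -> Source:
    """Build a toy in-memory source from a fixed list of records."""
    def lookup(sequence_id: str) -> List[ModelRecord]:
        return [rec for rec in records if rec.sequence_id == sequence_id]
    return lookup


if __name__ == "__main__":
    # Invented example data; identifiers and scores are placeholders.
    sources = {
        "modbase": make_source([ModelRecord("modbase", "SEQ1", "m1", 0.9)]),
        "other_db": make_source([ModelRecord("other_db", "SEQ1", "o1", 0.6)]),
    }
    for rec in query_all(sources, "SEQ1"):
        print(rec.source, rec.model_id, rec.quality)
```

Treating each source as a plain callable keeps the aggregation step independent of how any individual database is actually accessed.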

CAMEO (http://cameo3d.org) (107), a sister web service to the PMP, continuously evaluates the accuracy and reliability of several comparative protein structure prediction servers in a fully automated manner. The ModWeb server currently participates in the testing mode and is expected to move into the production mode in the first quarter of 2014.

Access through external databases

ModBase models in academic and public datasets are also directly accessible from several databases, including the PMP (107), UniProtKB (109), PIR’s iProClass (103), EBI’s InterPro (105), the UCSC Genome Browser (104), PubMed (LinkOut) (110), PharmGKB (106) and SFLD (36).

FUTURE DIRECTIONS

ModBase will grow by adding models calculated on demand by external users (using ModWeb) as well as our own calculations of model datasets that are needed for our research projects (using ModPipe, ModWeb or Modeller). These updates will reflect improvements in the methods and software used for calculating the models as well as new template structures in the PDB and new sequences in UniProtKB. In the future, we expect that most users will access ModBase models through the PMP.

CITATION

Users of ModBase are requested to cite this article in their publications.

FUNDING

National Institutes of Health [U54 GM094662, U54 GM094625, U54 GM093342, MINOS R01GM105404 to J.A.T. and M.H.]; Sandler Family Supporting Foundation (A.S.); Department of Energy Lawrence Berkeley National Lab IDAT program (to J.A.T. and M.H.); European Union [FP7-IDEAS-ERC 294809 to M.N.]. The authors thank Tom Ferrin and the UCSF Resource for Biocomputing, Visualization and Informatics for making UCSF Chimera (supported by [NIGMS P41-GM103311]) available to the ModBase database and tools. Funding for open access charge: NIH.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

For linking to ModBase from their databases, the authors thank Torsten Schwede and Jürgen Haas (Protein Model Portal), David Haussler and Jim Kent (UCSC Genome Browser), Rolf Apweiler, Maria Jesus Martin and Claire O’Donovan (UniProt), Rolf Apweiler and Sarah Hunter (InterPro), Patsy Babbitt (SFLD), Russ Altman (PharmGKB) and Kathy Wu (PIR/iProClass). We are also grateful for computing hardware gifts from Mike Homer, Ron Conway, NetApp, IBM, Hewlett Packard and Intel.

REFERENCES

  • 1.Rose PW, Bi C, Bluhm WF, Christie CH, Dimitropoulos D, Dutta S, Green RK, Goodsell DS, Prlic A, Quesada M, et al. The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res. 2013;41:D475–D482. doi: 10.1093/nar/gks1200. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.The UniProt Consortium. Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 2011;39:D214–D219. doi: 10.1093/nar/gkq1020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Benson DA, Karsch-Mizrachi I, Clark K, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2012;40:D48–D53. doi: 10.1093/nar/gkr1202. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Baker D, Sali A. Protein structure prediction and structural genomics. Science. 2001;294:93–96. doi: 10.1126/science.1065659. [DOI] [PubMed] [Google Scholar]
  • 5.Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using MODELLER. Curr. Protoc. Bioinformatics. 2006 doi: 10.1002/0471250953.bi0506s15. Chapter 5, Unit 5.6.1–5.6.30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Eswar N, Eramian D, Webb B, Shen MY, Sali A. Protein structure modeling with MODELLER. Methods Mol. Biol. 2008;426:145–159. doi: 10.1007/978-1-60327-058-8_8. [DOI] [PubMed] [Google Scholar]
  • 7.Schwede T, Sali A, Eswar N, Peitsch MC. In: Computational Structural Biology. Schwede T, Peitsch MC, editors. Singapore: World Scientific Publishing Ltd; 2008. pp. 3–35. [Google Scholar]
  • 8.Forrest LR, Tang CL, Honig B. On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys. J. 2006;91:508–517. doi: 10.1529/biophysj.106.082313. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Liu T, Tang GW, Capriotti E. Comparative modeling: the state of the art and protein drug target structure prediction. Comb. Chem. High Throughput Screen. 2011;14:532–547. doi: 10.2174/138620711795767811. [DOI] [PubMed] [Google Scholar]
  • 10.Fiser A. Template-based protein structure modeling. Methods Mol. Biol. 2010;673:73–94. doi: 10.1007/978-1-60761-842-3_6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Daga PR, Patel RY, Doerksen RJ. Template-based protein modeling: recent methodological advances. Curr. Top. Med. Chem. 2010;10:84–94. doi: 10.2174/156802610790232314. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Hillisch A, Pineda LF, Hilgenfeld R. Utility of homology models in the drug discovery process. Drug Discov. Today. 2004;9:659–669. doi: 10.1016/S1359-6446(04)03196-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Wallner B, Elofsson A. All are not equal: a benchmark of different homology modeling programs. Protein Sci. 2005;14:1315–1327. doi: 10.1110/ps.041253405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Weinkam P, Pons J, Sali A. Structure-based model of allostery predicts coupling between distant sites. Proc. Natl Acad. Sci. USA. 2012;109:4875–4880. doi: 10.1073/pnas.1116274109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Schneidman-Duhovny D, Hammel M, Sali A. Macromolecular docking restrained by a small angle X-ray scattering profile. J. Struct. Biol. 2011;3:461–471. doi: 10.1016/j.jsb.2010.09.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Fan H, Schneidman D, Irwin JJ, Dong G, Shoichet B, Sali A. Statistical potential for modeling and ranking protein-ligand interactions. J. Chem. Inf. Model. 2011;51:3078–3092. doi: 10.1021/ci200377u. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Eswar N, John B, Mirkovic N, Fiser A, Ilyin VA, Pieper U, Stuart AC, Marti-Renom MA, Madhusudhan MS, Yerkovich B, et al. Tools for comparative protein structure modeling and analysis. Nucleic Acids Res. 2003;31:3375–3380. doi: 10.1093/nar/gkg543. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 1993;234:779–815. doi: 10.1006/jmbi.1993.1626. [DOI] [PubMed] [Google Scholar]
  • 19.Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Remmert M, Biegert A, Hauser A, Soding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods. 2012;9:173–175. doi: 10.1038/nmeth.1818. [DOI] [PubMed] [Google Scholar]
  • 21.Soding J. Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005;21:951–960. doi: 10.1093/bioinformatics/bti125. [DOI] [PubMed] [Google Scholar]
  • 22.Smith TF, Waterman MS. Identification of common molecular subsequences. J. Mol. Biol. 1981;147:195–197. doi: 10.1016/0022-2836(81)90087-5. [DOI] [PubMed] [Google Scholar]
  • 23.Eswar N, Webb B, Marti-Renom M, Madhusudhan M, Eramian D, Shen M, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Curr. Protoc. Bioinformatics. 2006 doi: 10.1002/0471250953.bi0506s15. Chapter 5, Unit 5.6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Marti-Renom MA, Madhusudhan MS, Sali A. Alignment of protein sequences by their profiles. Protein Sci. 2004;13:1071–1087. doi: 10.1110/ps.03379804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci. 2006;15:2507–2524. doi: 10.1110/ps.062416606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Melo F, Sali A. Fold assessment for comparative protein structure modeling. Protein Sci. 2007;16:2412–2426. doi: 10.1110/ps.072895107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Pieper U, Webb BM, Barkan DT, Schneidman-Duhovny D, Schlessinger A, Braberg H, Yang Z, Meng EC, Pettersen EF, Huang CC, et al. ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res. 2011;39:465–474. doi: 10.1093/nar/gkq1091. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Eramian D, Eswar N, Shen M, Sali A. How well can the accuracy of comparative protein structure models be predicted? Protein Sci. 2008;17:1881–1893. doi: 10.1110/ps.036061.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A. Optimized atomic statistical potentials: Assessment of protein interfaces and loops. Bioinformatics. 2013 doi: 10.1093/bioinformatics/btt560. (epub ahead of print October 23, 2013) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Schneidman-Duhovny D, Rossi A, Avila-Sakar A, Kim SJ, Velazquez-Muriel J, Strop P, Liang H, Krukenberg KA, Liao M, Kim HM, et al. A method for integrative structure determination of protein-protein complexes. Bioinformatics. 2012;28:3282–3289. doi: 10.1093/bioinformatics/bts628. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Pierce B, Weng Z. ZRANK: reranking protein docking predictions with an optimized energy function. Proteins. 2007;67:1078–1086. doi: 10.1002/prot.21373. [DOI] [PubMed] [Google Scholar]
  • 32.Andrusier N, Nussinov R, Wolfson HJ. FireDock: fast interaction refinement in molecular docking. Proteins. 2007;69:139–159. doi: 10.1002/prot.21495. [DOI] [PubMed] [Google Scholar]
  • 33.Jacobson MP, Pincus DL, Rapp CS, Day TJ, Honig B, Shaw DE, Friesner RA. A hierarchical approach to all-atom protein loop prediction. Proteins. 2004;55:351–367. doi: 10.1002/prot.10613. [DOI] [PubMed] [Google Scholar]
  • 34.Gront D, Kulp DW, Vernon RM, Strauss CE, Baker D. Generalized fragment picking in Rosetta: design, protocols and applications. PloS One. 2011;6:e23294. doi: 10.1371/journal.pone.0023294. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Zhang C, Liu S, Zhou Y. Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential. Protein Sci. 2004;13:391–399. doi: 10.1110/ps.03411904. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Pegg SC, Brown SD, Ojha S, Seffernick J, Meng EC, Morris JH, Chang PJ, Huang CC, Ferrin TE, Babbitt PC. Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database. Biochemistry. 2006;45:2545–2555. doi: 10.1021/bi052101l. [DOI] [PubMed] [Google Scholar]
  • 37.Maurer SM, Rai A, Sali A. Finding cures for tropical diseases: is open source an answer? PLoS Med. 2004;1:e56. doi: 10.1371/journal.pmed.0010056. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Orti L, Carbajo R, Pieper U, Eswar N, Maurer S, Rai A, Taylor G, Todd M, Pineda-Lucena A, Sali A, et al. A kernel for open source drug discovery in tropical diseases. PLoS Negl. Trop. Dis. 2009;3:e418. doi: 10.1371/journal.pntd.0000418. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Aguero F, Al-Lazikani B, Aslett M, Berriman M, Buckner F, Campbell R, Carmona S, Carruthers I, Chan A, Chen F, et al. Genomic-scale prioritization of drug targets: the TDR Targets database. Nat. Rev. Drug Discov. 2008;7:900–907. doi: 10.1038/nrd2684. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Martinez-Jimenez F, Papadatos G, Yang L, Wallace IM, Kumar V, Pieper U, Sali A, Brown JR, Overington JP, Marti-Renom MA. Target prediction for an open access set of compounds active against Mycobacterium tuberculosis. PLoS Comp. Biol. 2013;9:e1003253. doi: 10.1371/journal.pcbi.1003253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Sampathkumar P, Lu F, Zhao X, Li Z, Gilmore J, Bain K, Rutter ME, Gheyi T, Schwinn K, Bonanno J, et al. Structure of a putative BenF-like porin from Pseudomonas fluorescens Pf-5 at 2.6 Å resolution. Proteins. 2010;78:3056–3062. doi: 10.1002/prot.22829. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Pieper U, Chiang R, Seffernick J, Brown S, Glasner M, Kelly L, Eswar N, Sauder J, Bonanno J, Swaminathan S, et al. Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies. J. Struct. Funct. Genom. 2009;10:107–125. doi: 10.1007/s10969-008-9056-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N, Haussler D, Sali A. LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics. 2005;21:2814–2820. doi: 10.1093/bioinformatics/bti442. [DOI] [PubMed] [Google Scholar]
  • 44.Fiser A, Sali A. ModLoop: automated modeling of loops in protein structures. Bioinformatics. 2003;19:2500–2501. doi: 10.1093/bioinformatics/btg362. [DOI] [PubMed] [Google Scholar]
  • 45.Davis F, Sali A. PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics. 2005;21:1901–1907. doi: 10.1093/bioinformatics/bti277. [DOI] [PubMed] [Google Scholar]
  • 46.Marti-Renom MA, Pieper U, Madhusudhan MS, Rossi A, Eswar N, Davis FP, Al-Shahrour F, Dopazo J, Sali A. DBAli tools: mining the protein structure space. Nucleic Acids Res. 2007;35:W393–W397. doi: 10.1093/nar/gkm236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Marti-Renom MA, Ilyin VA, Sali A. DBAli: a database of protein structure alignments. Bioinformatics. 2001;17:746–747. doi: 10.1093/bioinformatics/17.8.746. [DOI] [PubMed] [Google Scholar]
  • 48.Pieper U, Eswar N, Webb B, Eramian D, Kelly L, Barkan D, Carter H, Mankoo P, Karchin R, Marti-Renom M, et al. MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2009;37:D347–D354. doi: 10.1093/nar/gkn791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi A, Marti-Renom M, Karchin R, Webb BM, Eramian D, et al. MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2006;34:D291–D295. doi: 10.1093/nar/gkj059. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Braberg H, Webb B, Tjioe E, Pieper U, Sali A, Madhusudhan MS. SALIGN: a webserver for alignment of multiple protein sequences and structures. Bioinformatics. 2012;15:2072–2073. doi: 10.1093/bioinformatics/bts302. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Schneidman-Duhovny D, Hammel M, Sali A. FoXS: a web server for rapid computation and fitting of SAXS Profiles. Nucleic Acids Res. 2010;38:541–544. doi: 10.1093/nar/gkq461. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Ueda Y, Taketomi H, Go N. Studies on protein folding, unfolding, and fluctuations by computer simulation. II.A. Three dimensional lattice model of lysozyme. Biopolymers. 1978;17:1531–1548. [Google Scholar]
  • 53.Okazaki K, Koga N, Takada S, Onuchic JN, Wolynes PG. Multiple-basin energy landscapes for large-amplitude conformational motions of proteins: Structure-based molecular dynamics simulations. Proc. Natl Acad. Sci. USA. 2006;103:11844–11849. doi: 10.1073/pnas.0604375103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Whitford PC, Noel JK, Gosavi S, Schug A, Sanbonmatsu KY, Onuchic JN. An all-atom structure-based potential for proteins: bridging minimal models with all-atom empirical forcefields. Proteins. 2009;75:430–441. doi: 10.1002/prot.22253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Weinkam P, Chen YC, Pons J, Sali A. Impact of mutations on the allosteric conformational equilibrium. J. Mol. Biol. 2013;425:647–661. doi: 10.1016/j.jmb.2012.11.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Petoukhov MV, Svergun DI. Analysis of X-ray and neutron scattering from biomacromolecular solutions. Curr. Opin. Struct. Biol. 2007;17:562–571. doi: 10.1016/j.sbi.2007.06.009. [DOI] [PubMed] [Google Scholar]
  • 57.Hura GL, Menon AL, Hammel M, Rambo RP, Poole FL, II, Tsutakawa SE, Jenney FE, Jr, Classen S, Frankel KA, Hopkins RC, et al. Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nat. Methods. 2009;6:606–612. doi: 10.1038/nmeth.1353. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Svergun D, Barberato C, Koch MHJ. CRYSOL-a program to evaluate X-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Crystallogr. 1995;28:768–773. [Google Scholar]
  • 59.Poitevin F, Orland H, Doniach S, Koehl P, Delarue M. AquaSAXS: a web server for computation and fitting of SAXS profiles with non-uniformally hydrated atomic models. Nucleic Acids Res. 2011;39:W184–W189. doi: 10.1093/nar/gkr430. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Liu H, Morris RJ, Hexemer A, Grandison S, Zwart PH. Computation of small-angle scattering profiles with three-dimensional Zernike polynomials. Acta Crystallogr. A. 2012;68:278–285. doi: 10.1107/S010876731104788X. [DOI] [PubMed] [Google Scholar]
  • 61.Ravikumar KM, Huang W, Yang S. Fast-SAXS-pro: a unified approach to computing SAXS profiles of DNA, RNA, protein, and their complexes. J. Chem. Phys. 2013;138:024112. doi: 10.1063/1.4774148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Schneidman-Duhovny D, Hammel M, Tainer JA, Sali A. Accurate SAXS profile computation and its assessment by contrast variation experiments. Biophys. J. 2013;105:962–974. doi: 10.1016/j.bpj.2013.07.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Hammel M, Rey M, Yu Y, Mani RS, Classen S, Liu M, Pique ME, Fang S, Mahaney BL, Weinfeld M, et al. XRCC4 protein interactions with XRCC4-like factor (XLF) create an extended grooved scaffold for DNA ligation and double strand break repair. J. Biol. Chem. 2011;286:32638–32650. doi: 10.1074/jbc.M111.272641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Pelikan M, Hura GL, Hammel M. Structure and flexibility within proteins as identified through small angle X-ray scattering. Gen. Physiol. Biophys. 2009;28:174–189. doi: 10.4149/gpb_2009_02_174. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Grant TD, Luft JR, Wolfley JR, Tsuruta H, Martel A, Montelione GT, Snell EH. Small angle X-ray scattering as a complementary tool for high-throughput structural studies. Biopolymers. 2011;95:517–530. doi: 10.1002/bip.21630. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Konarev PV, Volkov VV, Sokolova AV, Koch MHJ, Svergun DI. PRIMUS: a Windows PC-based system for small-angle scattering data analysis. J. Appl. Crystallogr. 2003;36:1277–1282. [Google Scholar]
  • 67.Shoichet BK, Kuntz ID. Protein docking and complementarity. J. Mol. Biol. 1991;221:327–346. doi: 10.1016/0022-2836(91)80222-g. [DOI] [PubMed] [Google Scholar]
  • 68.Pieper U, Schlessinger A, Kloppmann E, Chang GA, Chou JJ, Dumont M, Fox B, Fromme P, Hendrickson W, Malkowski M, et al. Coordinating the impact of structural genomics on the human α-helical transmembrane proteome. Nat. Struct. Mol. Biol. 2013;20:135–138. doi: 10.1038/nsmb.2508. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008;36:W5–W9. doi: 10.1093/nar/gkn201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT. The DISOPRED server for the prediction of protein disorder. Bioinformatics. 2004;20:2138–2139. doi: 10.1093/bioinformatics/bth195. [DOI] [PubMed] [Google Scholar]
  • 71.Dosztanyi Z, Csizmok V, Tompa P, Simon I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics. 2005;21:3433–3434. doi: 10.1093/bioinformatics/bti541. [DOI] [PubMed] [Google Scholar]
  • 72.Steentoft C, Vakhrushev SY, Joshi HJ, Kong Y, Vester-Christensen MB, Schjoldager KT, Lavrsen K, Dabelsteen S, Pedersen NB, Marcos-Silva L, et al. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J. 2013;32:1478–1488. doi: 10.1038/emboj.2013.79. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Gupta R, Jung E, Gooley AA, Williams KL, Brunak S, Hansen J. Scanning the available Dictyostelium discoideum proteome for O-linked GlcNAc glycosylation sites using neural networks. Glycobiology. 1999;9:1009–1022. doi: 10.1093/glycob/9.10.1009. [DOI] [PubMed] [Google Scholar]
  • 74.Cormier CY, Park JG, Fiacco M, Steel J, Hunter P, Kramer J, Singla R, LaBaer J. PSI:Biology-materials repository: a biologist's resource for protein expression plasmids. J. Struct. Funct. Genomics. 2011;12:55–62. doi: 10.1007/s10969-011-9100-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Lamesch P, Li N, Milstein S, Fan C, Hao T, Szabo G, Hu Z, Venkatesan K, Bethel G, Martin P, et al. hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes. Genomics. 2007;89:307–315. doi: 10.1016/j.ygeno.2006.11.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Temple G, Gerhard DS, Rasooly R, Feingold EA, Good PJ, Robinson C, Mandich A, Derge JG, Lewis J, Shoaf D, et al. The completion of the mammalian gene collection (MGC). Genome Res. 2009;19:2324–2333. doi: 10.1101/gr.095976.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Seal RL, Gordon SM, Lush MJ, Wright MW, Bruford EA. genenames.org: the HGNC resources in 2011. Nucleic Acids Res. 2011;39:D514–D519. doi: 10.1093/nar/gkq892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Gifford LK, Carter LG, Gabanyi MJ, Berman HM, Adams PD. The protein structure initiative structural biology knowledgebase technology portal: a structural biology web resource. J. Struct. Funct. Genomics. 2012;13:57–62. doi: 10.1007/s10969-012-9133-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Freed EO. HIV-1 replication. Somatic Cell Mol. Genet. 2001;26:13–33. doi: 10.1023/a:1021070512287. [DOI] [PubMed] [Google Scholar]
  • 80.McDonald CK, Kuritzkes DR. Human immunodeficiency virus type 1 protease inhibitors. Arch. Int. Med. 1997;157:951–959. [PubMed] [Google Scholar]
  • 81.Drag M, Salvesen GS. Emerging principles in protease-based drug discovery. Nat. Rev. Drug Discov. 2010;9:690–701. doi: 10.1038/nrd3053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Flexner C. HIV-protease inhibitors. N. Engl. J. Med. 1998;338:1281–1292. doi: 10.1056/NEJM199804303381808. [DOI] [PubMed] [Google Scholar]
  • 83.Kohl NE, Emini EA, Schleif WA, Davis LJ, Heimbach JC, Dixon RA, Scolnick EM, Sigal IS. Active human immunodeficiency virus protease is required for viral infectivity. Proc. Natl Acad. Sci. USA. 1988;85:4686–4690. doi: 10.1073/pnas.85.13.4686. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Murthy KH, Winborne EL, Minnich MD, Culp JS, Debouck C. The crystal structures at 2.2-Å resolution of hydroxyethylene-based inhibitors bound to human immunodeficiency virus type 1 protease show that the inhibitors are present in two distinct orientations. J. Biol. Chem. 1992;267:22770–22778. [PubMed] [Google Scholar]
  • 85.Prabu-Jeyabalan M, Nalivaika E, Schiffer CA. How does a symmetric dimer recognize an asymmetric substrate? A substrate complex of HIV-1 protease. J. Mol. Biol. 2000;301:1207–1220. doi: 10.1006/jmbi.2000.4018. [DOI] [PubMed] [Google Scholar]
  • 86.Mahalingam B, Louis JM, Hung J, Harrison RW, Weber IT. Structural implications of drug-resistant mutants of HIV-1 protease: high-resolution crystal structures of the mutant protease/substrate analogue complexes. Proteins. 2001;43:455–464. doi: 10.1002/prot.1057. [DOI] [PubMed] [Google Scholar]
  • 87.Nie Z, Phenix BN, Lum JJ, Alam A, Lynch DH, Beckett B, Krammer PH, Sekaly RP, Badley AD. HIV-1 protease processes procaspase 8 to cause mitochondrial release of cytochrome c, caspase cleavage and nuclear fragmentation. Cell Death Differ. 2002;9:1172–1184. doi: 10.1038/sj.cdd.4401094. [DOI] [PubMed] [Google Scholar]
  • 88.Algeciras-Schimnich A, Belzacq-Casagrande AS, Bren GD, Nie Z, Taylor JA, Rizza SA, Brenner C, Badley AD. Analysis of HIV protease killing through caspase 8 reveals a novel interaction between caspase 8 and mitochondria. Open Virol. J. 2007;1:39–46. doi: 10.2174/1874357900701010039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Nie Z, Bren GD, Rizza SA, Badley AD. HIV protease cleavage of procaspase 8 is necessary for death of HIV-infected cells. Open Virol. J. 2008;2:1–7. doi: 10.2174/1874357900802010001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Castello A, Franco D, Moral-Lopez P, Berlanga JJ, Alvarez E, Wimmer E, Carrasco L. HIV-1 protease inhibits Cap- and poly(A)-dependent translation upon eIF4GI and PABP cleavage. PloS One. 2009;4:e7997. doi: 10.1371/journal.pone.0007997. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Impens F, Timmerman E, Staes A, Moens K, Arien KK, Verhasselt B, Vandekerckhove J, Gevaert K. A catalogue of putative HIV-1 protease host cell substrates. Biol. Chem. 2012;393:915–931. doi: 10.1515/hsz-2012-0168. [DOI] [PubMed] [Google Scholar]
  • 92.Kotik-Kogan O, Valentine ER, Sanfelice D, Conte MR, Curry S. Structural analysis reveals conformational plasticity in the recognition of RNA 3' ends by the human La protein. Structure. 2008;16:852–862. doi: 10.1016/j.str.2008.02.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Nakai K, Kidera A, Kanehisa M. Cluster analysis of amino acid indices for prediction of protein structure and function. Protein Eng. 1988;2:93–100. doi: 10.1093/protein/2.2.93. [DOI] [PubMed] [Google Scholar]
  • 94.Tomii K, Kanehisa M. Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins. Protein Eng. 1996;9:27–36. doi: 10.1093/protein/9.1.27. [DOI] [PubMed] [Google Scholar]
  • 95.Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 1999;292:195–202. doi: 10.1006/jmbi.1999.3091. [DOI] [PubMed] [Google Scholar]
  • 96.Breiman L, Schapire E. Random forests. Mach. Learn. 2001;45:5–32. [Google Scholar]
  • 97.Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2010;38:D46–D51. doi: 10.1093/nar/gkp1024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Kraulis PJ. MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 1991;24:946–950. [Google Scholar]
  • 99.Merritt EA, Bacon DJ. Raster3D: photorealistic molecular graphics. Methods Enzymol. 1997;277:505–524. doi: 10.1016/s0076-6879(97)77028-9. [DOI] [PubMed] [Google Scholar]
  • 100.Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. UCSF Chimera–a visualization system for exploratory research and analysis. J Comput Chem. 2004;25:1605–1612. doi: 10.1002/jcc.20084. [DOI] [PubMed] [Google Scholar]
  • 101.Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comp. Biol. 2009;5:e1000585. doi: 10.1371/journal.pcbi.1000585. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Henrick K, Feng Z, Bluhm WF, Dimitropoulos D, Doreleijers JF, Dutta S, Flippen-Anderson JL, Ionides J, Kamada C, Krissinel E, et al. Remediation of the protein data bank archive. Nucleic Acids Res. 2008;36:D426–D433. doi: 10.1093/nar/gkm937. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006;34:D187–D191. doi: 10.1093/nar/gkj161. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith KE, Rosenbloom KR, Raney BJ, et al. The UCSC Genome Browser database: update 2010. Nucleic Acids Res. 2010;38:D613–D619. doi: 10.1093/nar/gkp939. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 2009;37:D211–D215. doi: 10.1093/nar/gkn785. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Klein TE, Chang JT, Cho MK, Easton KL, Fergerson R, Hewett M, Lin Z, Liu Y, Liu S, Oliver DE, et al. Integrating genotype and phenotype information: an overview of the PharmGKB project. Pharmacogenetics research network and knowledge base. Pharmacogenomics J. 2001;1:167–170. doi: 10.1038/sj.tpj.6500035. [DOI] [PubMed] [Google Scholar]
  • 107.Haas J, Roth S, Arnold K, Kiefer F, Schmidt T, Bordoli L, Schwede T. The Protein Model Portal—a comprehensive resource for protein structure and model information. Database. 2013;2013:bat031. doi: 10.1093/database/bat031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Schwede T, Sali A, Honig B, Levitt M, Berman H, Jones D, Brenner S, Burley S, Das R, Dokholyan N, et al. Outcome of a workshop on applications of protein models in biomedical research. Structure. 2009;17:151–159. doi: 10.1016/j.str.2008.12.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005;33:D154–D159. doi: 10.1093/nar/gki070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Giglia E. New year, new PubMed. Eur. J. Phys. Rehabil. Med. 2009;45:155–159. [PubMed] [Google Scholar]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
