Abstract
Systems biology is a data-heavy field that focuses on systems-wide depictions of biological phenomena, necessarily sacrificing a detailed characterization of individual components. As an example, genome-wide protein interaction networks are widely used in systems biology and continuously extended and refined as new sources of evidence become available. Despite the vast amount of information about individual protein structures and protein complexes that has accumulated in the past 50 years in the Protein Data Bank, the data, computational tools, and language of structural biology are not an integral part of systems biology. However, increasing effort has been devoted to this integration, and the related literature is reviewed here. Relationships between proteins that are detected via structural similarity offer a rich source of information not available from sequence similarity, and homology modeling can be used to leverage Protein Data Bank structures to produce 3D models for a significant fraction of many proteomes. Several structure-informed genomic and cross-species (i.e., virus–host) interactomes will be described, and the unique information they provide will be illustrated with examples. Tissue- and tumor-specific interactomes have also been developed through computational strategies that exploit patient information and through genetic interactions available from increasingly sensitive screens. Strategies to integrate structural information with these alternate data sources will be described. Finally, efforts to link protein structure space with chemical compound space offer novel sources of information in drug design, off-target identification, and the identification of targets for compounds found to be effective in phenotypic screens.
Keywords: systems biology, protein structure, protein–protein interaction, homology modeling, computational biology
Abbreviations: HT, high-throughput; ML, machine learning; PDB, Protein Data Bank; P-HIPSTer, Pathogen Host Interactome Prediction using structure similarity; PPI, protein–protein interaction; PrePPI, Predicting Protein-Protein Interactions; TCGA, The Cancer Genome Atlas
The growth of protein structure information has stimulated a parallel growth in computational tools that predict protein structure and function. These tools provide fundamental insights into the physical principles that underlie the behavior of biological macromolecules. For example, molecular dynamics simulations allow realistic descriptions of conformational heterogeneity; Poisson–Boltzmann calculations have revealed how electrostatic interactions play a central role in biological functions; and the forces that determine the stability of the native folded state are now well understood. Advances such as these have been transformative and are part of the language and intellectual foundation of modern structural biology.
A parallel set of computational methods falls under the rubric of “structural genomics,” which includes the goal of structurally characterizing enough members of sequence families so as to enable the construction of homology models for the others. A key development has been the computational identification of geometric relationships among protein structures. Since structural similarity can identify functional relationships even in the absence of statistically significant sequence similarity, structural alignment has become a powerful tool to detect evolutionary relationships between proteins that cannot be detected from sequence alone. We have used the term Structural Blast (1) to imply the use of structural alignment to identify relationships between proteins in analogy to the widely used BLAST suite of programs for sequence alignment (2). Figure 1 provides two examples of functional relationships that can be detected this way: protein–protein interaction (PPI) and protein–compound interaction. Figure 1A illustrates the structural alignment of four protein domains where BLAST fails to detect any sequence relationship between them. Figure 1B shows the experimentally determined complex between the pleckstrin homology (PH) domain from phospholipase C-gamma-2 (yellow) and the small GTPase Rac2 (gray). Structural alignment of the Ezrin F3 lobe (red) with the PH domain produces a model for the complex between Ezrin and Rac2 (red–gray). Similarly, Figure 1C shows the experimentally determined complex between the PH domain from mouse Beta-II spectrin (green) and inositol 1,4,5-trisphosphate (sticks). Structural alignment of the Tiam-2 PH domain (blue) with the Beta-II spectrin PH domain produces a model for the complex between Tiam-2 and inositol 1,4,5-trisphosphate (blue and sticks). These examples provide the basis of many of the methods highlighted later that, as will be described, enable the use of structural information on a genomic scale.
The Protein Data Bank (PDB) (3) stands as a centerpiece of structural biology. It has created standards that impact the entire community, organized data in easily accessible form, and provided a battery of tools and links to other databases that have revealed multiple ways in which 3D structural information can be exploited for the detailed annotation of protein function and interactions. Indeed, much of the research that is discussed here would not have been possible without extensive use of the PDB and its many auxiliary resources.
There are areas of biomedical research where protein structure is still underutilized. Specifically, cellular systems biology, with its heavy emphasis on the study of pathways and networks, has made only limited use of 3D information. In networks, PPIs are typically described as nodes (proteins) connected by edges (interactions), without reference to the structures of the proteins involved or the nature of the interactions. With 20,000 human protein coding genes and potentially millions of PPIs, it is not possible to obtain experimental structures for every node and edge in the interactome. Computational methods to interrogate these interactions can complement the available experimental evidence, enabling more meaningful insights from systems biology approaches.
This article summarizes some of the advances in structural systems biology and points to strategies through which structural information can be integrated with the vast quantities of data emerging from high-throughput (HT) genomic technologies and patient records (summarized in Table 1). There are a number of computational methodologies that are central to this integration. First, the ability to construct homology models for most proteins in a given genome implies that, in principle, structure can be used on a genome-wide scale. Homology models dramatically enhance structural genomics efforts; for example, while there are structures available for about 5000 human proteins in the PDB, there are homology models for at least one domain of about 18,000 human proteins in databases such as ModBase (4) and SwissModel (5).
Table 1.
| Systems level | Insight from computational structural biology |
|---|---|
| Protein | Models of protein domains (4, 5) |
| | Delineation of intrinsically disordered regions (97) |
| | Prediction of interaction surfaces (38, 94) |
| | Context of missense mutations (60, 98) |
| PPIs (33, 34, 35) | Determination of direct versus indirect |
| | Domain-level models of protein regions involved |
| | Atomic-level detail of interfaces |
| Pathways/networks | Molecular mechanisms for information flow |
| | Molecular depiction of complexes and series of PPIs |
| | Pathway/submodule crosstalk |
| | Hypothesis generation for effects of perturbations |
| | Rational targeting to alter phenotypic outcome (75) |
| | Integration with subcellular localization (99) |
| Tissue/tumor | Integration with context-specific data (27) |
| | Differential pathways/networks (100) |
| | Models for protein-mediated cell–cell interactions (101) |
A second methodology has been the use of Structural Blast, as illustrated in Figure 1. The structure-based identification of a large number of functional relationships combined with extensive structural coverage of multiple genomes with homology models enables the prediction of PPIs on a genomic scale. Third, machine learning (ML) is crucial to the integration of structural and genomic data. ML not only facilitates the combination of data from multiple sources but also mitigates inaccuracies in structural models since training will determine the extent to which the models have predictive value. In this regard, it is important to emphasize that inferences yielded in systems biology are often statistical in nature, and structural information must be used in a way that conforms to this reality.
This article is not meant as a comprehensive review of the literature, and many substantial studies do not appear on the reference list. Rather, our goal is to convey our own perspective of the development of a new interdisciplinary field and highlight articles that provide useful examples along with access to a larger literature. Our perspective is also embodied in our own contributions, some of which are summarized later.
PPIs
The discovery and analysis of PPI networks has become an important area of systems biology where a particular focus has been specific applications to human disease. In systems-based approaches, genes or proteins are identified as disease associated based on their topological location in interaction networks (6, 7, 8). A necessary step in the creation of a network is the identification of interactions among proteins, which may include the formation of stable dimeric or multimeric complexes; transient engagements that in some cases may be of low affinity and in others may involve post-translational modification; or nonphysical interactions where, for example, one protein may regulate the expression of another in the absence of any physical contact between the two. It is necessary to keep these distinctions in mind when reading the PPI literature.
Given the centrality of PPIs in so many cellular processes, their experimental detection and computational prediction constitute a major research focus. Only HT experimental methods and highly efficient computational approaches are capable of detecting/predicting PPIs on a genomic scale. Complicating the challenge is the fact that physiological PPIs are context dependent: two proteins found to interact in an in vitro assay may well form a complex if expressed at appropriate levels but may never actually encounter one another in vivo.
Databases of experimentally observed PPIs
There are many genome-wide PPI databases for human and different model organisms (9). Some are based on HT methods, such as yeast two-hybrid (10) and tandem affinity purification mass spectrometry (11), whereas others are based entirely on literature curation (e.g., BioGRID (12), IntAct (13), MINT (14)). Databases such as HINT (15), HURI (16), and APID (17) curate these resources to provide high-quality interactions and/or to extract only binary or physical associations. The widely used STRING database (18) combines literature curation with predictions based primarily on sequence relationships. With few exceptions, existing databases do not include context-specific information, such as the cell line, tissue, tumor type, disease condition, and others, in which the interactions are observed.
Context-specific associations can be derived from methods based on the correlation of gene profiles across many conditions (e.g., cell lines or drug treatments) (19, 20). These profiles are typically obtained from HT genomic screens of cancer cell lines or human tissue samples: Project Achilles for RNAi and CRISPR–Cas9 knockdowns (21, 22); the Library of Integrated Network-Based Cellular Signatures (LINCS) (23) and the Connectivity Map (CMap) (24) for phenotypic drug screens; The Cancer Genome Atlas (TCGA) for tumor-specific genetic variation (25); and Genotype-Tissue Expression (GTEx) for nondiseased tissue-specific genetic variation (24). The Califano laboratory has pioneered the use of algorithms to predict tumor-specific regulatory interactions based on the analysis of large-scale molecular profile data taken, for example, from TCGA (26). As will be discussed later, the integration of patient-specific regulatory networks with predicted physical interactions between proteins enables the development of context-specific structure-informed protein interaction networks, thus providing mechanistic insights not available from resources mentioned previously (27).
Structure-informed prediction of PPIs in the human proteome
PPI prediction can involve (a) predicting the structure of known complexes given the structures of interacting monomers; (b) predicting whether and how two proteins interact given their structures, which requires building a model of the putative complex and then scoring it; (c) predicting whether two proteins interact given their sequence, which can be accomplished either by purely sequence-based methods, that is, sequence relationships to proteins in known complexes, or through some combination of methods (a) and (b). There are two main computational approaches for method (a): docking and template-based modeling. Docking methods (28, 29) are widely used but have not reached the point in terms of computation time where they can truly be used for genome-scale interactomes. Template modeling (30) involves superimposing the structures of two query proteins on structurally similar interacting proteins in a PDB complex (e.g., Fig. 1). Algorithms to find such structurally related proteins are currently quite efficient (31, 32).
The Interactome3D server was an early resource for the prediction of the structures of protein complexes for different organisms (33). The current release lists binary interactions taken from experimental databases and, where possible, structural models for 18 organisms. Structures of complexes are obtained from either the PDB or template-based modeling with templates identified based on sequence relationships. For the human proteome, structural models are provided for ~15,000 binary complexes involving ~10,000 proteins; about half of the complexes are taken from the PDB. Overall, Interactome3D lists 125,000 experimentally observed binary PPIs for the human proteome with structural models for 12%.
Interactome INSIDER (34) also builds models for experimentally determined binary interactions. It is based in part on the Ensemble Classifier Learning Algorithm to predict Interface Residues (ECLAIR) framework, which combines features derived from individual proteins, such as surface properties, with pairwise PPI features obtained from docking and coevolution analysis. ECLAIR is trained on high-quality experimental data sets of PPIs (15). The current version contains over 120,000 predictions of structurally resolved interfaces for experimentally observed human PPIs. The high structural coverage of Interactome INSIDER is achieved by the use of docking, which avoids the necessity of a binary complex as a structural template; that is, only the structures of individual interacting proteins are needed.
The Predicting Protein-Protein Interactions (PrePPI) algorithm is fundamentally different from Interactome3D and Interactome INSIDER in that it makes structure-informed predictions of whether two proteins interact independent of whether they appear in experimental databases (35, 36). Furthermore, PrePPI uses structure on a truly genome-wide scale, effectively screening most of the ∼200 million possible human PPIs. Like other methods, it begins with a database of ∼18,000 PDB structures and homology models for proteins and their constituent domains. PrePPI then uses structural alignment to establish relationships among protein structures: every one of the ∼18,000 query proteins is assigned a set of “structural neighbors” derived from structure alignments to protein structures in the PDB, regardless of species. Each query protein will have, on average, hundreds of neighbors. This large number results both from the use of distant structural relationships in multiple genomes and from the fact that the PrePPI alignment procedure defines neighbors when as few as three secondary structure elements can be aligned. If any two query proteins have neighbors that interact in the same PDB file (templates), then each of the query proteins is superimposed on its appropriate neighbor to generate a structural model for the interaction between those two query proteins. This is illustrated in Figure 2A, where models of the proteins for human CUL5 (yellow) and DCUN1D5 (green) are superimposed on chains of the PDB complex between the yeast proteins for CDC53 (brown) and DCN1 (purple). In this case, the template complex was identified because the proteins for human CUL5 and yeast CDC53 are structural neighbors as are the proteins for human DCUN1D5 and yeast DCN1.
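The template-identification step described above can be sketched in a few lines. This is a simplified illustration, not the PrePPI implementation: it assumes that structural neighbors have already been computed and that interacting chains in PDB entries are available as a set of pairs. The identifiers `template1:CDC53`, `template1:DCN1`, and `template2:X` are placeholder labels rather than real PDB chain identifiers.

```python
from itertools import combinations

# Hypothetical input: each query protein mapped to the PDB chains that are its
# structural neighbors (in practice there may be many neighbors per query).
neighbors = {
    "CUL5": {"template1:CDC53"},
    "DCUN1D5": {"template1:DCN1"},
    "EZR": {"template2:X"},
}

# Hypothetical input: pairs of chains that interact within the same PDB entry.
template_interfaces = {frozenset({"template1:CDC53", "template1:DCN1"})}


def candidate_models(neighbors, template_interfaces):
    """List (query1, query2, neighbor1, neighbor2) for every pair of queries
    whose neighbors interact in a template complex."""
    models = []
    for q1, q2 in combinations(sorted(neighbors), 2):
        for n1 in sorted(neighbors[q1]):
            for n2 in sorted(neighbors[q2]):
                if frozenset({n1, n2}) in template_interfaces:
                    models.append((q1, q2, n1, n2))
    return models


models = candidate_models(neighbors, template_interfaces)
```

Here the only candidate is the CUL5/DCUN1D5 pair, mirroring the Figure 2A example. In a real pipeline each candidate would then be built by superimposing the query structures on the template chains and passed on for scoring.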
The use of structural alignment in this way generates an extensive set of PPI models that are quickly scored by a naïve Bayesian ML algorithm, trained on experimentally determined PPIs. Scoring is a unique feature of PrePPI. Since hundreds of millions of interaction models are generated, some of them quite crude, applying standard energy functions would be computationally prohibitive. The approach used to enable the scoring of so many models is to transform the problem to one where pairwise information for the modeled interface (Fig. 2B) is transferred directly from the template interface (Fig. 2C). PrePPI scoring is based on the quality of the structural alignment of each individual protein to its template and on features of the alignment of query residues to interfacial residues in the template (37, 38); see figure legend for details. A likelihood ratio is calculated for each interaction, and a cutoff is defined for a “high-confidence” prediction.
PrePPI not only relies on structural information but also calculates likelihood ratios for nonstructural evidence such as whether the two query proteins have a similar function and whether their orthologs interact in other species, are coexpressed, or have a similar phylogenetic history (35). Nonstructural sources of evidence can increase the probability that a structural signal is real but can also have the effect of detecting interactions that are indirect. Overall, PrePPI performance at recovering known (gold standard) PPIs is comparable to that of other large-scale PPI databases and is comparable in accuracy to HT experimental methods (35). At present, the PrePPI database contains high-confidence predictions for over 1.3 million human PPIs where about 500,000 are predicted to be binary physical interactions. Many of these predictions are novel since the use of 3D structure detects many relationships that are not detectable with sequence. Of the 500,000 binary predictions, about 75% are predicted to be domain–domain interactions and 25% are predicted to be protein–peptide interactions. High confidence is of course a vague term, and indeed, PrePPI undoubtedly contains many false positives despite its overall success rate. However, it represents an attempt to replace sequence relationships with structural relationships on a genomic scale and, in doing so, generates testable hypotheses not available from other approaches. Of note, the approximately 800,000 PPIs that are not predicted to involve physical interactions likely involve proteins that are present in the same complex or participate in the same pathway but are not in direct contact.
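The way a naïve Bayesian classifier combines structural and nonstructural evidence can be illustrated with a short sketch. Under the naïve assumption that evidence sources are conditionally independent, their likelihood ratios multiply. The evidence names, likelihood-ratio values, cutoff, and prior odds below are all invented for illustration and are not PrePPI's actual parameters.

```python
from math import prod


def combined_likelihood_ratio(evidence_lrs):
    """Multiply per-evidence likelihood ratios (naive independence assumption)."""
    return prod(evidence_lrs.values())


def is_high_confidence(evidence_lrs, cutoff):
    """Apply a likelihood-ratio cutoff (the cutoff value is an assumption)."""
    return combined_likelihood_ratio(evidence_lrs) >= cutoff


def posterior_probability(likelihood_ratio, prior_odds):
    """Convert a likelihood ratio and prior odds into a posterior probability."""
    odds = likelihood_ratio * prior_odds
    return odds / (1.0 + odds)


# Hypothetical likelihood ratios for one candidate interaction.
evidence = {
    "structural_model": 20.0,
    "coexpression": 5.0,
    "orthologs_interact": 3.0,
}

lr = combined_likelihood_ratio(evidence)          # 20 * 5 * 3
confident = is_high_confidence(evidence, cutoff=100.0)
posterior = posterior_probability(lr, prior_odds=0.01)
```

The sketch also shows why nonstructural evidence can raise a weak structural signal above the cutoff: each additional source with a ratio greater than 1 increases the product, whether or not the underlying association is a direct physical contact.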
There has also been major progress in the use of sequence-based approaches that exploit coevolution relationships to predict PPIs (39, 40). For the most part, these techniques require multiple sequence alignments of many orthologs and are, thus, largely limited to bacterial proteomes. Recently, Cong et al. (41) developed a hybrid approach to predict PPIs for the Escherichia coli proteome that first used coevolution to filter 4 million pairs of query protein sequences and then implemented docking with structures and homology models of the query proteins to produce a set of 800 predicted PPIs. Indeed, the combination of structural and coevolution information offers numerous strategies to predict PPIs, and there are likely to be exciting developments in this area in the coming years.
Structure-informed prediction of virus/host PPIs
Viruses deploy an array of genetically encoded strategies to co-opt host machinery and support viral replicative cycles. Molecular mimicry, manifested by structural similarity between viral and endogenous host proteins, allows viruses to harness or disrupt cellular functions including nucleic acid metabolism and modulation of immune responses. Mimicry relationships have been detected through sequence similarity and linear motif co-occurrence (42, 43); however, structural similarity enables identification of mimics between pathogen and host proteins that cannot be observed from sequence alone (44). Structural mimicry can occur at the level of entire protein domains or in the form of “interface mimicry,” where the structure of host protein residues involved in PPIs is mimicked on the surface of a viral protein (45, 46, 47). Indeed, analysis of PDB structures has demonstrated that the interfaces in complexes involving a viral and human protein mimic the interfaces of human PPIs (48), and interface mimicry has been used as a basis for predicting virus/host PPIs (49, 50).
A recent study reported a systematic analysis of molecular mimicry across the entire virome (51). Protein structure similarity was used to scan for viral structure mimics from thousands of catalogued viruses and hosts spanning broad ecological niches and taxonomic range, including bacteria, plants and fungi, invertebrates, and vertebrates. The results point to molecular mimicry as a pervasive strategy employed by viruses and indicate that the protein structure space used by a given virus is dictated by the host proteome. In particular, analysis of the proteins mimicked by human-infecting viruses points to broad diversification of cellular pathways targeted via structural mimicry, identifies biological processes that may underlie autoimmune disorders, and reveals virally encoded mimics that may serve as targets for therapeutics.
Viral mimicry and, in particular, interface mimicry, indicate that viral proteins compete with host proteins for host interaction partners and, indeed, it is clear that knowledge of virus/host PPIs is critical for understanding mechanisms of infection. The PrePPI computational pipeline was used to create the Pathogen Host Interactome Prediction using structure similarity (P-HIPSTer) database (50). P-HIPSTer employs structural information to predict 282,000 pan viral–human PPIs with an experimental validation rate of 75%, comparable to what was found for PrePPI for human PPIs (36). In addition to rediscovering known biology, P-HIPSTer has yielded a series of new findings: the discovery of shared and unique machinery employed across human-infecting viruses; a likely role for interactions between Zika Virus proteins and human Estrogen Receptor 1 in modulating viral replication; the identification of PPIs that discriminate between human papilloma viruses with high and low oncogenic potential; and a structure-enabled history of evolutionary selective pressure imposed on the human proteome. Furthermore, P-HIPSTer enables discovery of previously unappreciated cellular circuits that act on human-infecting viruses.
Disease driver mutations and PPI networks
There has been enormous interest in understanding the role of mutations in disease, and 3D structural information has played an important role in this process. Much effort has been invested in the study of somatic mutations identified in the sequenced genomes of tumors and normal tissue available in resources such as TCGA (25) and the International Cancer Genome Consortium (ICGC) (52). There are tens of thousands of somatic mutations present in these genomes, and a major focus has been to identify “driver genes” that contain mutations capable of effecting tumorigenesis. Driver genes were initially identified as containing more mutations than expected from the background mutation rate, but the distribution of mutations on a particular protein also provides an important signal. Given that most tumors contain a large number of unique mutations, it has been necessary to develop sophisticated bioinformatics tools to analyze patient samples. These have focused on the identification of oncogenic “driver mutations” that are generally distinguished from “passenger mutations” that have no oncogenic potential. These classifications are somewhat ambiguous since a single driver mutation is not necessarily sufficient to cause cancer, whereas some passenger mutations might well be oncogenic when present along with other mutations or in specific contexts. The reader is referred to the excellent review by Martinez-Jimenez et al. (53) for an illuminating historical discussion of the large literature in the field.
In another insightful review, Porta-Pardo et al. (54) summarized algorithms that have been developed to identify driver genes based on the distribution of mutations they present. Some algorithms look for clusters of mutations along a protein sequence, whereas others identify clusters within a 3D structure (55, 56, 57); however, such approaches do not necessarily reveal mechanistic insights. Observations that disease mutations are enriched in protein–protein interfaces (58) suggest that cancer driver mutations can be identified on this basis. Indeed, mapping of somatic mutations obtained from TCGA onto PPI interfaces taken from the PDB and high-quality homology models identified about 100 interfaces enriched in somatic mutations involving proteins not previously identified as cancer drivers (59). In a landmark study, Bailey et al. (60) combined 26 computational tools, including some that were structure based, to classify about 750,000 pan-cancer missense mutations and identified 299 driver genes and over 3400 driver mutations. The information and mechanistic insights obtained from these studies are unique but perhaps limited by their focus on individual proteins. Algorithms that treat mutations as perturbations of both the nodes and edges in networks have been successful at annotating disease-associated genes and mutations (8). The integration of structural information into network biology is thus likely to yield important new insights into the identification of driver genes and molecular mechanisms underlying tumorigenesis.
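One simple way to ask whether mutations are enriched in an interface is a one-sided binomial test, sketched below. It assumes a uniform null model in which each mutation is equally likely to fall on any residue; this is a deliberate simplification and is not the statistical procedure of the studies cited above. The protein length, interface size, and mutation counts are invented.

```python
from math import comb


def binomial_tail(n, m, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


def interface_enrichment_pvalue(protein_length, n_interface_residues,
                                n_mutations, n_in_interface):
    """One-sided p-value for observing at least n_in_interface mutations in the
    interface, assuming mutations fall uniformly along the protein."""
    p_interface = n_interface_residues / protein_length
    return binomial_tail(n_mutations, n_in_interface, p_interface)


# Hypothetical protein: 400 residues, 40 of them at an interface, 20 mutations.
p_enriched = interface_enrichment_pvalue(400, 40, 20, 8)   # 8 of 20 at interface
p_expected = interface_enrichment_pvalue(400, 40, 20, 2)   # 2 of 20, as expected
```

With 10% of residues at the interface, 2 of 20 mutations is what chance predicts, whereas 8 of 20 gives a small p-value. A realistic analysis would also need to account for sequence-dependent background mutation rates and for multiple testing across interfaces.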
Adding context to interactome analysis
Networks derived from pairwise-interaction assays or computational predictions generally neither account for nor discriminate between cellular contexts (61). Recent approaches have started to address the challenge of “context-specific interactions” by incorporating cell line-, tumor-, or tissue-specific information (62, 63, 64, 65, 66). However, comprehensive proteome-wide depiction of human interactomes across different tissue contexts remains elusive. To address these challenges, we developed an integrative ML framework (OncoSig) using PrePPI and other computationally derived interactomes for the systematic, de novo reconstruction of tumor-specific molecular-interaction signaling maps (SigMaps), anchored on any oncoprotein of interest (27). Specifically, as illustrated in Figure 3, an oncoprotein-specific SigMap recapitulates the molecular architecture necessary to functionally modulate and mediate its activity within a specific cellular context, including its physical cognate binding partners.
OncoSig infers context-specific SigMaps by integrating PrePPI with complementary evidence from transcriptional and post-translational interactions from gene expression and mutational profiles from TCGA. PrePPI provides context-independent and structure-based information on the “reference” human protein interactome. ARACNe (67, 68), VIPER (69), and CINDy (70, 71) provide information from genomic data, including, as depicted in Figure 3A, upstream modulators (orange) and downstream effectors (blue) of a protein of interest (rose) and regulatory interactions, such as feedback loops, among them (green dotted line). They further account for tumor specificity since they are based on the analysis of molecular profile data from patient samples corresponding to different TCGA tumor types (e.g., lung adenocarcinoma or colon adenocarcinoma). The SigMap generated for lung adenocarcinoma recapitulated published KRas biology and identified novel KRas-associated proteins whose genes were experimentally validated as synthetic lethal with KRASmut in 3D spheroid models derived from primary lung cancer cells (27).
Increasingly, PPIs in existing networks are inferred from genetic interactions, which are typically based on the correlation of gene profiles across many conditions (e.g., cell lines or drug treatments) (19). While protein complexes are enriched in genetic interactions (72, 73), genetic interactions do not necessarily correspond to physical PPIs and, thus, serve as an orthogonal and complementary resource for direct physical PPIs as contained, for example, in the PrePPI database. Thus, in parallel to the development of OncoSig where PrePPI was integrated with genetic interactions derived from TCGA, context-specific PPI networks (or SigNets) can be obtained by integrating physical protein interactomes with genetic interactions based on gene profiles derived from HT genomic screens of human cancer cell lines (23, 74). Figure 4 illustrates a generalized scheme to derive context-dependent SigNets. Of note, Figure 4D highlights the description of individual pathways at the level of interactions between individual protein domains.
Structural systems pharmacology
Systems pharmacology approaches typically aim to leverage network topology to elucidate drug mechanism of action, discover new targets, and design combination therapies (75). This has been made possible through the integration of omics technologies with large-scale chemical compound repositories and databases of drug–protein interactions and bioactivity data (76, 77, 78, 79, 80, 81). Moreover, the application of HT screening and sequencing technologies at the single patient level has facilitated the application of systems pharmacology in precision medicine (“N-of-1”) contexts (82, 83, 84). Systems pharmacology thus leverages network-based perspectives of human disease in next-generation drug discovery.
While the intersection of network analysis and phenotypic screens has proved powerful, systems-level implementation of traditional drug discovery tools is necessary for maximum impact. For example, if a new target is identified via network analysis, it is then necessary to find a compound that effectively and specifically inhibits that target. Or, if a particular drug is found to be effective in a phenotypic screen, in many cases, it will be necessary to identify the actual target(s). Furthermore, although drug repurposing has yielded important discoveries, the continuing exploration of chemical space is clearly of great importance.
Traditional drug discovery has relied on both cheminformatic tools and protein structure–based tools. The former is ultimately based on the assumption that chemically similar ligands will bind to similar proteins (e.g., (85, 86)). Numerous tools are available to represent chemicals as molecular fingerprints in a format that can be used for rapid similarity searches based, for example, on Tanimoto coefficients (87). The Similarity Ensemble Approach (SEA) uses this principle to relate proteins based on the ligands they bind and, thus, identifies new protein targets for existing drugs (88). ML is playing an increasingly important role in this area where, in effect, pairwise chemical similarity relationships are supplanted by “learning” what compounds might target a particular protein or have a desired biological effect as determined by training data obtained from aptly designed HT screens (89).
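The Tanimoto coefficient mentioned above is the number of fingerprint bits shared by two compounds divided by the number of bits set in either. The following minimal sketch represents each fingerprint as a set of "on" bit indices; the fingerprints and compound names are invented, and a real application would generate fingerprints with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rank_by_similarity(query_fp, library):
    """Sort library compounds by decreasing Tanimoto similarity to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints.
query = {1, 2, 3, 4}
library = {
    "compound_X": {3, 4, 5, 6},   # shares 2 of 6 distinct bits with the query
    "compound_Y": {1, 2, 3, 4},   # identical to the query
    "compound_Z": {10, 11},       # no bits in common
}

ranking = rank_by_similarity(query, library)
```

Because the score depends only on set overlap, it can be computed very quickly over large compound libraries, which is what makes fingerprint-based similarity searches practical.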
The most common current uses of protein structure are in ligand docking and lead optimization, and significant advances continue to be made in both these technologies. For example, flexible docking helps escape the constraint of using rigid protein structures (90), and neural networks have been trained to score docking poses (91). In the area of lead optimization, free energy perturbation methods can yield accurate relative binding free energies for a congeneric series of compounds (92, 93), although accuracy is inevitably compromised if a homology model rather than a crystal structure of the protein–ligand complex is used. Algorithmic advances combined with high-performance computing, and particularly the use of graphics processing units (GPUs), have enabled the ever-expanding use of these tools, but there are still limitations for their use on a true genome-wide scale.
Our group and others are developing alternate approaches that leverage the Structural Blast concept. Similar to what has been described previously for PPIs, these methods exploit available structural information under the assumption that structural similarities between proteins provide clues as to what compounds will bind a protein and where. One approach is to align entire protein structures or substructures to PDB protein–compound complexes, which has the effect of moving the ligand in a template structure into the coordinate system of the query protein structure (94) (Fig. 1C). The resulting ligand–protein interaction model can then be scored by enumerating the physicochemical features of the predicted binding site. An alternative approach is to search for regions in potential target proteins that structurally align to binding pockets in PDB complexes (e.g., (95, 96)).
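The coordinate transfer described above can be sketched as follows, assuming a structural alignment has already produced a rigid-body transform (a 3×3 rotation matrix and a translation vector) that superimposes the template protein onto the query protein. The function names, the 4 Å cutoff, and all coordinates are hypothetical, and counting nearby query atoms is only a simple stand-in for binding-site scoring, not the scoring scheme of ref. 94.

```python
import math


def apply_transform(coords, rotation, translation):
    """Apply x' = R x + t to a list of (x, y, z) coordinates."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
            for i in range(3)
        ))
    return moved


def count_contacts(ligand_coords, protein_coords, cutoff=4.0):
    """Count query-protein atoms within `cutoff` angstroms of any ligand atom."""
    contacts = 0
    for p in protein_coords:
        if any(math.dist(p, q) <= cutoff for q in ligand_coords):
            contacts += 1
    return contacts


# Hypothetical transform from a template-to-query alignment:
# a 90-degree rotation about the z axis followed by a shift along x.
rotation = [[0.0, -1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
translation = [10.0, 0.0, 0.0]

# Hypothetical ligand atoms (template frame) and protein atoms (query frame).
template_ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
query_protein = [(10.0, 3.0, 0.0), (10.0, 8.0, 0.0), (30.0, 0.0, 0.0)]

placed_ligand = apply_transform(template_ligand, rotation, translation)
n_contacts = count_contacts(placed_ligand, query_protein)
```

The key point is that no docking search is performed: the ligand pose is inherited from the template complex, so the cost per query protein is dominated by the structural alignment itself.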
Structural alignment is a way to explore protein structure space, whereas chemical similarity searches enable the exploration of chemical space. A number of efforts to combine the two have been described (95, 96) where the link is a PDB complex. For example, one can start with a query compound identified in a phenotypic screen, search for chemically similar compounds in a database of PDB complexes, and then use structural alignment to identify other proteins that might bind to the original compound. In parallel, starting with a target protein, structural alignment can be used to identify related proteins in PDB complexes, and then chemical similarity can be used to identify lead compounds that bind to the original query protein. This low-resolution strategy, when combined with a battery of docking and lead optimization technologies, offers the possibility of true genome-wide structure-based prediction of ligand–protein interactions.
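The first of the two search directions above, from a query compound to candidate target proteins, can be sketched as a two-step lookup. Everything here is a toy stand-in: the PDB-like complex records, the fingerprints, the precomputed table of structural neighbors, and the similarity threshold are hypothetical, and a real pipeline would obtain them from a fingerprinting toolkit and a structural alignment program.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def candidate_targets(query_fp, complexes, structural_neighbors, min_similarity=0.5):
    """Map a query compound to candidate targets through ligand-bound complexes.

    complexes: list of (protein_id, ligand_fingerprint) pairs.
    structural_neighbors: dict mapping a protein to structurally similar proteins.
    Returns a dict of candidate protein -> best supporting ligand similarity.
    """
    candidates = {}
    for protein_id, ligand_fp in complexes:
        similarity = tanimoto(query_fp, ligand_fp)
        if similarity < min_similarity:
            continue  # step 1: keep only complexes with a chemically similar ligand
        # step 2: extend from the complexed protein to its structural neighbors
        for target in [protein_id] + structural_neighbors.get(protein_id, []):
            candidates[target] = max(similarity, candidates.get(target, 0.0))
    return candidates


# Hypothetical data for illustration only.
complexes = [("protein_P1", {1, 2, 3, 4}), ("protein_P2", {7, 8, 9})]
structural_neighbors = {
    "protein_P1": ["protein_Q1", "protein_Q2"],
    "protein_P2": ["protein_Q3"],
}
hits = candidate_targets({1, 2, 3, 5}, complexes, structural_neighbors)
```

The reverse direction, from a target protein to lead compounds, swaps the order of the two steps: structural neighbors of the target are found first, and ligands from their complexes then seed a chemical similarity search.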
Concluding remarks
We have highlighted a daunting array of genomic technologies and databases that have emerged in the past few years and that offer the possibility of transforming both basic and translational biomedical research. We have argued that, given the proper tools, exploiting the information available in the PDB can make this database the critical resource that enables the integration of structural biology with systems biology. We are now in a position to create and probe tissue- and disease-specific structure-informed protein interaction networks and similar networks that describe pathogen infection. The integration of structure in these networks is the only way to gain mechanistic insights and to link these networks to drug discovery tools, which themselves are undergoing rapid evolution. As the information available in the PDB grows, the ways in which that information can be used to carry out systems-wide analysis of biological processes will grow as well.
Conflict of interest
The authors declare that they have no conflicts of interest with the contents of this article.
Acknowledgments
Author contributions
D. M., D. P., and B. H. wrote the paper.
Funding and additional information
This work was supported by National Institutes of Health grants R01-GM030518, R35-GM139585 and U54-CA209997 (B. H.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Biography
Barry Honig, Professor of Systems Biology, Biochemistry and Molecular Biophysics, and Medical Sciences in Medicine, Columbia University Medical Center, has been a leader throughout his career in the use of computational approaches to understand the structure, energetics, and dynamics that underlie the function of biological macromolecules.
Edited by Wolfgang Peti
References
- 1. Dey F., Cliff Zhang Q., Petrey D., Honig B. Toward a "structural BLAST": Using structural relationships to infer function. Protein Sci. 2013;22:359–366. doi: 10.1002/pro.2225.
- 2. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 3. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
- 4. Pieper U., Webb B.M., Dong G.Q., Schneidman-Duhovny D., Fan H., Kim S.J., Khuri N., Spill Y.G., Weinkam P., Hammel M., Tainer J.A., Nilges M., Sali A. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2014;42:D336–346. doi: 10.1093/nar/gkt1144.
- 5. Waterhouse A., Bertoni M., Bienert S., Studer G., Tauriello G., Gumienny R., Heer F.T., de Beer T.A.P., Rempfer C., Bordoli L., Lepore R., Schwede T. SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46:W296–W303. doi: 10.1093/nar/gky427.
- 6. Menche J., Sharma A., Kitsak M., Ghiassian S.D., Vidal M., Loscalzo J., Barabasi A.L. Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science. 2015;347:1257601. doi: 10.1126/science.1257601.
- 7. Huang J.K., Carlin D.E., Yu M.K., Zhang W., Kreisberg J.F., Tamayo P., Ideker T. Systematic evaluation of molecular networks for discovery of disease genes. Cell Syst. 2018;6:484–495. doi: 10.1016/j.cels.2018.03.001.
- 8. Carter H., Hofree M., Ideker T. Genotype to phenotype via network analysis. Curr. Opin. Genet. Dev. 2013;23:611–621. doi: 10.1016/j.gde.2013.10.003.
- 9. Szklarczyk D., Jensen L.J. Protein-protein interaction databases. Methods Mol. Biol. 2015;1278:39–56. doi: 10.1007/978-1-4939-2425-7_3.
- 10. Rolland T., Tasan M., Charloteaux B., Pevzner S.J., Zhong Q., Sahni N., Yi S., Lemmens I., Fontanillo C., Mosca R., Kamburov A., Ghiassian S.D., Yang X., Ghamsari L., Balcha D. A proteome-scale map of the human interactome network. Cell. 2014;159:1212–1226. doi: 10.1016/j.cell.2014.10.050.
- 11. Huttlin E.L., Ting L., Bruckner R.J., Gebreab F., Gygi M.P., Szpyt J., Tam S., Zarraga G., Colby G., Baltier K., Dong R., Guarani V., Vaites L.P., Ordureau A., Rad R. The BioPlex network: A systematic exploration of the human interactome. Cell. 2015;162:425–440. doi: 10.1016/j.cell.2015.06.043.
- 12. Oughtred R., Stark C., Breitkreutz B.J., Rust J., Boucher L., Chang C., Kolas N., O'Donnell L., Leung G., McAdam R., Zhang F., Dolma S., Willems A., Coulombe-Huntington J., Chatr-Aryamontri A. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 2019;47:D529–D541. doi: 10.1093/nar/gky1079.
- 13. Orchard S., Ammari M., Aranda B., Breuza L., Briganti L., Broackes-Carter F., Campbell N.H., Chavali G., Chen C., del-Toro N., Duesbury M., Dumousseau M., Galeota E., Hinz U., Iannuccelli M. The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014;42:D358–D363. doi: 10.1093/nar/gkt1115.
- 14. Ceol A., Chatr Aryamontri A., Licata L., Peluso D., Briganti L., Perfetto L., Castagnoli L., Cesareni G. MINT, the molecular interaction database: 2009 update. Nucleic Acids Res. 2010;38:D532–D539. doi: 10.1093/nar/gkp983.
- 15. Das J., Yu H. HINT: High-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol. 2012;6:92. doi: 10.1186/1752-0509-6-92.
- 16. Luck K., Kim D.K., Lambourne L., Spirohn K., Begg B.E., Bian W., Brignall R., Cafarelli T., Campos-Laborie F.J., Charloteaux B., Choi D., Cote A.G., Daley M., Deimling S., Desbuleux A. A reference map of the human binary protein interactome. Nature. 2020;580:402–408. doi: 10.1038/s41586-020-2188-x.
- 17. Alonso-Lopez D., Campos-Laborie F.J., Gutierrez M.A., Lambourne L., Calderwood M.A., Vidal M., De Las Rivas J. APID database: Redefining protein-protein interaction experimental evidences and binary interactomes. Database (Oxford) 2019;2019. doi: 10.1093/database/baz005.
- 18. Franceschini A., Szklarczyk D., Frankild S., Kuhn M., Simonovic M., Roth A., Lin J., Minguez P., Bork P., von Mering C., Jensen L.J. STRING v9.1: Protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013;41:D808–D815. doi: 10.1093/nar/gks1094.
- 19. McDermott U. Large-scale compound screens and pharmacogenomic interactions in cancer. Curr. Opin. Genet. Dev. 2019;54:12–16. doi: 10.1016/j.gde.2019.02.002.
- 20. Rouillard A.D., Gundersen G.W., Fernandez N.F., Wang Z., Monteiro C.D., McDermott M.G., Ma'ayan A. The Harmonizome: A collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database (Oxford) 2016;2016. doi: 10.1093/database/baw100.
- 21. Cowley G.S., Weir B.A., Vazquez F., Tamayo P., Scott J.A., Rusin S., East-Seletsky A., Ali L.D., Gerath W.F., Pantel S.E., Lizotte P.H., Jiang G., Hsiao J., Tsherniak A., Dwinell E. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Sci. Data. 2014;1:140035. doi: 10.1038/sdata.2014.35.
- 22. Meyers R.M., Bryan J.G., McFarland J.M., Weir B.A., Sizemore A.E., Xu H., Dharia N.V., Montgomery P.G., Cowley G.S., Pantel S., Goodale A., Lee Y., Ali L.D., Jiang G., Lubonja R. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet. 2017;49:1779–1784. doi: 10.1038/ng.3984.
- 23. Stathias V., Turner J., Koleti A., Vidovic D., Cooper D., Fazel-Najafabadi M., Pilarczyk M., Terryn R., Chung C., Umeano A., Clarke D.J.B., Lachmann A., Evangelista J.E., Ma'ayan A., Medvedovic M. LINCS data portal 2.0: Next generation access point for perturbation-response signatures. Nucleic Acids Res. 2020;48:D431–D439. doi: 10.1093/nar/gkz1023.
- 24. Tsherniak A., Vazquez F., Montgomery P.G., Weir B.A., Kryukov G., Cowley G.S., Gill S., Harrington W.F., Pantel S., Krill-Burger J.M., Meyers R.M., Ali L., Goodale A., Lee Y., Jiang G. Defining a cancer dependency map. Cell. 2017;170:564–576.e516. doi: 10.1016/j.cell.2017.06.010.
- 25. Hutter C., Zenklusen J.C. The Cancer Genome Atlas: Creating lasting value beyond its data. Cell. 2018;173:283–285. doi: 10.1016/j.cell.2018.03.042.
- 26. Califano A., Alvarez M.J. The recurrent architecture of tumour initiation, progression and drug sensitivity. Nat. Rev. Cancer. 2017;17:116–130. doi: 10.1038/nrc.2016.124.
- 27. Broyde J., Simpson D.R., Murray D., Paull E.O., Chu B.W., Tagore S., Jones S.J., Griffin A.T., Giorgi F.M., Lachmann A., Jackson P., Sweet-Cordero E.A., Honig B., Califano A. Oncoprotein-specific molecular interaction maps (SigMaps) for cancer network analyses. Nat. Biotechnol. 2021;39:215–224. doi: 10.1038/s41587-020-0652-7.
- 28. Barradas-Bautista D., Rosell M., Pallara C., Fernandez-Recio J. Structural prediction of protein-protein interactions by docking: Application to biomedical problems. Adv. Protein Chem. Struct. Biol. 2018;110:203–249. doi: 10.1016/bs.apcsb.2017.06.003.
- 29. Vakser I.A. Protein-protein docking: From interaction to interactome. Biophys. J. 2014;107:1785–1793. doi: 10.1016/j.bpj.2014.08.033.
- 30. Petrey D., Chen T.S., Deng L., Garzon J.I., Hwang H., Lasso G., Lee H., Silkov A., Honig B. Template-based prediction of protein function. Curr. Opin. Struct. Biol. 2015;32:33–38. doi: 10.1016/j.sbi.2015.01.007.
- 31. Zhang Y., Skolnick J. TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005;33:2302–2309. doi: 10.1093/nar/gki524.
- 32. Yang A.S., Honig B. An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. J. Mol. Biol. 2000;301:665–678. doi: 10.1006/jmbi.2000.3973.
- 33. Mosca R., Ceol A., Aloy P. Interactome3D: Adding structural details to protein networks. Nat. Methods. 2013;10:47–53. doi: 10.1038/nmeth.2289.
- 34. Meyer M.J., Beltran J.F., Liang S., Fragoza R., Rumack A., Liang J., Wei X., Yu H. Interactome INSIDER: A structural interactome browser for genomic studies. Nat. Methods. 2018;15:107–114. doi: 10.1038/nmeth.4540.
- 35. Garzon J.I., Deng L., Murray D., Shapira S., Petrey D., Honig B. A computational interactome and functional annotation for the human proteome. Elife. 2016;5:e18715. doi: 10.7554/eLife.18715.
- 36. Zhang Q.C., Petrey D., Deng L., Qiang L., Shi Y., Thu C.A., Bisikirska B., Lefebvre C., Accili D., Hunter T., Maniatis T., Califano A., Honig B. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature. 2012;490:556–560. doi: 10.1038/nature11503.
- 37. Zhang Q.C., Deng L., Fisher M., Guan J., Honig B., Petrey D. PredUs: A web server for predicting protein interfaces using structural neighbors. Nucleic Acids Res. 2011;39:W283–287. doi: 10.1093/nar/gkr311.
- 38. Hwang H., Petrey D., Honig B. A hybrid method for protein-protein interface prediction. Protein Sci. 2016;25:159–165. doi: 10.1002/pro.2744.
- 39. Hopf T.A., Scharfe C.P., Rodrigues J.P., Green A.G., Kohlbacher O., Sander C., Bonvin A.M., Marks D.S. Sequence co-evolution gives 3D contacts and structures of protein complexes. Elife. 2014;3:e03430. doi: 10.7554/eLife.03430.
- 40. Weigt M., White R.A., Szurmant H., Hoch J.A., Hwa T. Identification of direct residue contacts in protein-protein interaction by message passing. Proc. Natl. Acad. Sci. U. S. A. 2009;106:67–72. doi: 10.1073/pnas.0805923106.
- 41. Cong Q., Anishchenko I., Ovchinnikov S., Baker D. Protein interaction networks revealed by proteome coevolution. Science. 2019;365:185–189. doi: 10.1126/science.aaw6718.
- 42. Ludin P., Nilsson D., Mäser P. Genome-wide identification of molecular mimicry candidates in parasites. PLoS One. 2011;6:e17546. doi: 10.1371/journal.pone.0017546.
- 43. Doxey A.C., McConkey B.J. Prediction of molecular mimicry candidates in human pathogenic bacteria. Virulence. 2013;4:453–466. doi: 10.4161/viru.25180.
- 44. Stebbins C.E., Galan J.E. Structural mimicry in bacterial virulence. Nature. 2001;412:701–705. doi: 10.1038/35089000.
- 45. Jensen S., Thomsen A.R. Sensing of RNA viruses: A review of innate immune receptors involved in recognizing RNA virus invasion. J. Virol. 2012;86:2900–2910. doi: 10.1128/JVI.05738-11.
- 46. Ivanov K.A., Thiel V., Dobbe J.C., van der Meer Y., Snijder E.J., Ziebuhr J. Multiple enzymatic activities associated with severe acute respiratory syndrome coronavirus helicase. J. Virol. 2004;78:5619–5632. doi: 10.1128/JVI.78.11.5619-5632.2004.
- 47. Adedeji A.O., Lazarus H. Biochemical characterization of Middle East respiratory syndrome coronavirus helicase. mSphere. 2016;1. doi: 10.1128/mSphere.00235-16.
- 48. Franzosa E.A., Xia Y. Structural principles within the human-virus protein-protein interaction network. Proc. Natl. Acad. Sci. U. S. A. 2011;108:10538–10543. doi: 10.1073/pnas.1101440108.
- 49. Guven-Maiorov E., Tsai C.J., Ma B.Y., Nussinov R. Interface-based structural prediction of novel host-pathogen interactions. Comput. Methods Protein Evol. 2019;1851:317–335. doi: 10.1007/978-1-4939-8736-8_18.
- 50. Lasso G., Mayer S.V., Winkelmann E.R., Chu T., Elliot O., Patino-Galindo J.A., Park K., Rabadan R., Honig B., Shapira S.D. A structure-informed atlas of human-virus interactions. Cell. 2019;178:1526–1541.e1516. doi: 10.1016/j.cell.2019.08.005.
- 51. Lasso G., Honig B., Shapira S. A sweep of Earth’s virome reveals host-guided viral protein structural mimicry and points to determinants of human disease. Cell Syst. 2021;12:82–91.e3. doi: 10.1016/j.cels.2020.09.006.
- 52. International Cancer Genome Consortium. International network of cancer genome projects. Nature. 2010;464:993–998. doi: 10.1038/nature08987.
- 53. Martinez-Jimenez F., Muinos F., Sentis I., Deu-Pons J., Reyes-Salazar I., Arnedo-Pac C., Mularoni L., Pich O., Bonet J., Kranas H., Gonzalez-Perez A., Lopez-Bigas N. A compendium of mutational cancer driver genes. Nat. Rev. Cancer. 2020;20:555–572. doi: 10.1038/s41568-020-0290-x.
- 54. Porta-Pardo E., Kamburov A., Tamborero D., Pons T., Grases D., Valencia A., Lopez-Bigas N., Getz G., Godzik A. Comparison of algorithms for the detection of cancer drivers at subgene resolution. Nat. Methods. 2017;14:782–788. doi: 10.1038/nmeth.4364.
- 55. Porta-Pardo E., Godzik A. e-Driver: a novel method to identify protein regions driving cancer. Bioinformatics. 2014;30:3109–3114. doi: 10.1093/bioinformatics/btu499.
- 56. Kamburov A., Lawrence M.S., Polak P., Leshchiner I., Lage K., Golub T.R., Lander E.S., Getz G. Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc. Natl. Acad. Sci. U. S. A. 2015;112:E5486–E5495. doi: 10.1073/pnas.1516373112.
- 57. Gao J.J., Chang M.T., Johnsen H.C., Gao S.P., Sylvester B.E., Sumer S.O., Zhang H.X., Solit D.B., Taylor B.S., Schultz N., Sander C. 3D clusters of somatic mutations in cancer reveal numerous rare mutations as functional targets. Genome Med. 2017;9:4. doi: 10.1186/s13073-016-0393-x.
- 58. Sahni N., Yi S., Zhong Q., Jailkhani N., Charloteaux B., Cusick M.E., Vidal M. Edgotype: A fundamental link between genotype and phenotype. Curr. Opin. Genet. Dev. 2013;23:649–657. doi: 10.1016/j.gde.2013.11.002.
- 59. Porta-Pardo E., Garcia-Alonso L., Hrabe T., Dopazo J., Godzik A. A pan-cancer catalogue of cancer driver protein interaction interfaces. PLoS Comput. Biol. 2015;11:e1004518. doi: 10.1371/journal.pcbi.1004518.
- 60. Bailey M.H., Tokheim C., Porta-Pardo E., Sengupta S., Bertrand D. Comprehensive characterization of cancer driver genes and mutations. Cell. 2018;173:371–385.e318. doi: 10.1016/j.cell.2018.02.060.
- 61. Prahallad A., Sun C., Huang S., Di Nicolantonio F., Salazar R., Zecchin D., Beijersbergen R.L., Bardelli A., Bernards R. Unresponsiveness of colon cancer to BRAF(V600E) inhibition through feedback activation of EGFR. Nature. 2012;483:100–103. doi: 10.1038/nature10868.
- 62. Bild A.H., Yao G., Chang J.T., Wang Q.L., Potti A., Chasse D., Joshi M.B., Harpole D., Lancaster J.M., Berchuck A., Olson J.A., Marks J.R., Dressman H.K., West M., Nevins J.R. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature. 2006;439:353–357. doi: 10.1038/nature04296.
- 63. Krogan N.J., Lippman S., Agard D.A., Ashworth A., Ideker T. The Cancer Cell Map Initiative: Defining the hallmark networks of cancer. Mol. Cell. 2015;58:690–698. doi: 10.1016/j.molcel.2015.05.008.
- 64. Greene C.S., Krishnan A., Wong A.K., Ricciotti E., Zelaya R.A., Himmelstein D.S., Zhang R., Hartmann B.M., Zaslavsky E., Sealfon S.C., Chasman D.I., FitzGerald G.A., Dolinski K., Grosser T., Troyanskaya O.G. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 2015;47:569–576. doi: 10.1038/ng.3259.
- 65. Hill S.M., Nesser N.K., Johnson-Camacho K., Jeffress M., Johnson A., Boniface C., Spencer S.E., Lu Y., Heiser L.M., Lawrence Y., Pande N.T., Korkola J.E., Gray J.W., Mills G.B., Mukherjee S. Context specificity in causal signaling networks revealed by phosphoprotein profiling. Cell Syst. 2017;4:73–83.e10. doi: 10.1016/j.cels.2016.11.013.
- 66. Will T., Helms V. PPIXpress: Construction of condition-specific protein interaction networks based on transcript expression. Bioinformatics. 2016;32:571–578. doi: 10.1093/bioinformatics/btv620.
- 67. Basso K., Margolin A.A., Stolovitzky G., Klein U., Dalla-Favera R., Califano A. Reverse engineering of regulatory networks in human B cells. Nat. Genet. 2005;37:382–390. doi: 10.1038/ng1532.
- 68. Margolin A.A., Nemenman I., Basso K., Wiggins C., Stolovitzky G., Dalla Favera R., Califano A. ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006;7(Suppl 1):S7. doi: 10.1186/1471-2105-7-S1-S7.
- 69. Alvarez M.J., Shen Y., Giorgi F.M., Lachmann A., Ding B.B., Ye B.H., Califano A. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat. Genet. 2016;48:838–847. doi: 10.1038/ng.3593.
- 70. Giorgi F.M., Lopez G., Woo J.H., Bisikirska B., Califano A., Bansal M. Inferring protein modulation from gene expression data using conditional mutual information. PLoS One. 2014;9:e109569. doi: 10.1371/journal.pone.0109569.
- 71. Wang K., Saito M., Bisikirska B.C., Alvarez M.J., Lim W.K., Rajbhandari P., Shen Q., Nemenman I., Basso K., Margolin A.A., Klein U., Dalla-Favera R., Califano A. Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nat. Biotechnol. 2009;27:829–839. doi: 10.1038/nbt.1563.
- 72. Boyle E.A., Pritchard J.K., Greenleaf W.J. High-resolution mapping of cancer cell networks using co-functional interactions. Mol. Syst. Biol. 2018;14:e8594. doi: 10.15252/msb.20188594.
- 73. Pan J., Meyers R.M., Michel B.C., Mashtalir N., Sizemore A.E., Wells J.N., Cassel S.H., Vazquez F., Weir B.A., Hahn W.C., Marsh J.A., Tsherniak A., Kadoch C. Interrogation of mammalian protein complex structure, function, and membership using genome-scale fitness screens. Cell Syst. 2018;6:555–568.e557. doi: 10.1016/j.cels.2018.04.011.
- 74. Ghandi M., Huang F.W., Jane-Valbuena J., Kryukov G.V., Lo C.C., McDonald E.R., 3rd, Barretina J. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature. 2019;569:503–508. doi: 10.1038/s41586-019-1186-3.
- 75. Xie L., Ge X., Tan H., Xie L., Zhang Y., Hart T., Yang X., Bourne P.E. Towards structural systems pharmacology to study complex diseases and personalized medicine. PLoS Comput. Biol. 2014;10:e1003554. doi: 10.1371/journal.pcbi.1003554.
- 76. Corsello S.M., Bittker J.A., Liu Z., Gould J., McCarren P., Hirschman J.E., Johnston S.E., Vrcic A., Wong B., Khan M., Asiedu J., Narayan R., Mader C.C., Subramanian A., Golub T.R. The Drug Repurposing Hub: A next-generation drug library and information resource. Nat. Med. 2017;23:405–408. doi: 10.1038/nm.4306.
- 77. Sterling T., Irwin J.J. ZINC 15--ligand discovery for everyone. J. Chem. Inf. Model. 2015;55:2324–2337. doi: 10.1021/acs.jcim.5b00559.
- 78. Mendez D., Gaulton A., Bento A.P., Chambers J., De Veij M., Felix E., Magarinos M.P., Mosquera J.F., Mutowo P., Nowotka M., Gordillo-Maranon M., Hunter F., Junco L., Mugumbate G., Rodriguez-Lopez M. ChEMBL: Towards direct deposition of bioassay data. Nucleic Acids Res. 2019;47:D930–D940. doi: 10.1093/nar/gky1075.
- 79. Kim S., Chen J., Cheng T., Gindulyte A., He J., He S., Li Q., Shoemaker B.A., Thiessen P.A., Yu B., Zaslavsky L., Zhang J., Bolton E.E. PubChem 2019 update: Improved access to chemical data. Nucleic Acids Res. 2019;47:D1102–D1109. doi: 10.1093/nar/gky1033.
- 80. Wishart D.S., Knox C., Guo A.C., Shrivastava S., Hassanali M., Stothard P., Chang Z., Woolsey J. DrugBank: A comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34:D668–672. doi: 10.1093/nar/gkj067.
- 81. Gilson M.K., Liu T., Baitaluk M., Nicola G., Hwang L., Chong J. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res. 2016;44:D1045–1053. doi: 10.1093/nar/gkv1072.
- 82. Alvarez M.J., Subramaniam P.S., Tang L.H., Grunn A., Aburi M., Rieckhof G. A precision oncology approach to the pharmacological targeting of mechanistic dependencies in neuroendocrine tumors. Nat. Genet. 2018;50:979–989. doi: 10.1038/s41588-018-0138-4.
- 83. Dugger S.A., Platt A., Goldstein D.B. Drug development in the era of precision medicine. Nat. Rev. Drug Discov. 2018;17:183–196. doi: 10.1038/nrd.2017.226.
- 84. Filipp F.V. Precision medicine driven by cancer systems biology. Cancer Metastasis Rev. 2017;36:91–108. doi: 10.1007/s10555-017-9662-4.
- 85. Willett P. Similarity searching using 2D structural fingerprints. Chemoinformatics Comput. Chem. Biol. 2011;672:133–158. doi: 10.1007/978-1-60761-839-3_5.
- 86. Bajorath J. Molecular similarity concepts for informatics applications. Methods Mol. Biol. 2017;1526:231–245. doi: 10.1007/978-1-4939-6613-4_13.
- 87. Maggiora G., Vogt M., Stumpfe D., Bajorath J. Molecular similarity in medicinal chemistry. J. Med. Chem. 2014;57:3186–3204. doi: 10.1021/jm401411z.
- 88. Keiser M.J., Setola V., Irwin J.J., Laggner C., Abbas A.I., Hufeisen S.J., Jensen N.H., Kuijer M.B., Matos R.C., Tran T.B., Whaley R., Glennon R.A., Hert J., Thomas K.L.H., Edwards D.D. Predicting new molecular targets for known drugs. Nature. 2009;462:175–181. doi: 10.1038/nature08506.
- 89. Lo Y.C., Rensi S.E., Torng W., Altman R.B. Machine learning in chemoinformatics and drug discovery. Drug Discov. Today. 2018;23:1538–1546. doi: 10.1016/j.drudis.2018.05.010.
- 90. Pagadala N.S., Syed K., Tuszynski J. Software for molecular docking: A review. Biophys. Rev. 2017;9:91–102. doi: 10.1007/s12551-016-0247-1.
- 91. Ragoza M., Hochuli J., Idrobo E., Sunseri J., Koes D.R. Protein-ligand scoring with convolutional neural networks. J. Chem. Inf. Model. 2017;57:942–957. doi: 10.1021/acs.jcim.6b00740.
- 92. Wang L., Berne B.J., Friesner R.A. On achieving high accuracy and reliability in the calculation of relative protein-ligand binding affinities. Proc. Natl. Acad. Sci. U. S. A. 2012;109:1937–1942. doi: 10.1073/pnas.1114017109.
- 93. Fratev F., Sirimulla S. An improved free energy perturbation FEP+ sampling protocol for flexible ligand-binding domains. Sci. Rep. 2019;9:16829. doi: 10.1038/s41598-019-53133-1.
- 94. Hwang H., Dey F., Petrey D., Honig B. Structure-based prediction of ligand-protein interactions on a genome-wide scale. Proc. Natl. Acad. Sci. U. S. A. 2017;114:13685–13690. doi: 10.1073/pnas.1705381114.
- 95. Lim H., He D., Qiu Y., Krawczuk P., Sun X.R., Xie L. Rational discovery of dual-indication multi-target PDE/Kinase inhibitor for precision anti-cancer therapy using structural systems pharmacology. PLoS Comput. Biol. 2019;15:e1006619. doi: 10.1371/journal.pcbi.1006619.
- 96. Zhou H., Cao H., Skolnick J. FINDSITE(comb2.0): A new approach for virtual ligand screening of proteins and virtual target screening of biomolecules. J. Chem. Inf. Model. 2018;58:2343–2354. doi: 10.1021/acs.jcim.8b00309.
- 97. Oldfield C.J., Dunker A.K. Intrinsically disordered proteins and intrinsically disordered protein regions. Annu. Rev. Biochem. 2014;83:553–584. doi: 10.1146/annurev-biochem-072711-164947.
- 98. Porta-Pardo E., Valencia A., Godzik A. Understanding oncogenicity of cancer driver genes and mutations in the cancer genomics era. FEBS Lett. 2020;594:4233–4246. doi: 10.1002/1873-3468.13781.
- 99. Lundberg E., Borner G.H.H. Spatial proteomics: A powerful discovery tool for cell biology. Nat. Rev. Mol. Cell Biol. 2019;20:285–302. doi: 10.1038/s41580-018-0094-y.
- 100. Ideker T., Krogan N.J. Differential network biology. Mol. Syst. Biol. 2012;8:565. doi: 10.1038/msb.2011.99.
- 101. Honig B., Shapiro L. Adhesion protein structure, molecular affinities, and principles of cell-cell recognition. Cell. 2020;181:520–535. doi: 10.1016/j.cell.2020.04.010.
- 102. Hyvonen M., Macias M.J., Nilges M., Oschkinat H., Saraste M., Wilmanns M. Structure of the binding site for inositol phosphates in a PH domain. EMBO J. 1995;14:4676–4685. doi: 10.1002/j.1460-2075.1995.tb00149.x.