Estimating the accuracy of pharmacophore-based detection of cognate receptor-ligand pairs in the immunoglobulin superfamily

Nelson Gil; Rojan Shrestha; Andras Fiser

doi:10.1002/prot.26046

Author manuscript; available in PMC: 2022 Jun 1.

Published in final edited form as: Proteins. 2021 Jan 28;89(6):632–638. doi: 10.1002/prot.26046

PMCID: PMC8906256 NIHMSID: NIHMS1663933 PMID: 33483991

Abstract

Secreted and membrane-bound members of the immunoglobulin superfamily (IgSF) encompass a large, diverse array of proteins that play central roles in immune response and neural development, and are implicated in diseases ranging from cancer to rheumatoid arthritis. Despite the potential biomedical benefits of understanding IgSF:IgSF cognate receptor-ligand interactions, relatively little about them is known at a molecular level, and experimentally probing all possible receptor-ligand pairs is prohibitively costly. The Protein Ligand Interface Design (ProtLID) algorithm is a computational pharmacophore-based approach to identify cognate receptor-ligand pairs that was recently validated in a pilot study on a small set of IgSF complexes. Although ProtLID has shown a success rate of 61% at identifying at least one cognate ligand for a given receptor, it currently lacks any form of confidence measure that can prioritize individual receptor-ligand predictions to pursue experimentally. In this study, we expanded the application of ProtLID to cover all IgSF complexes with available structural data. In addition, we introduced an approach to estimate the confidence of predictions made by ProtLID based on a statistical analysis of how the ProtLID-constructed pharmacophore matches the structures of candidate ligands. The confidence score combines the physicochemical compatibility, spatial consistency, and mathematical skewness of the distribution of matches throughout a set of candidate ligands. Our results suggest that a subset of cases meeting stringent confidence criteria will always have at least one successful receptor-ligand prediction.

Keywords: Immunoglobulin Superfamily, Binding Partner Identification, Pharmacophores

Introduction

Secreted and membrane-bound cell-surface proteins of the immunoglobulin superfamily (IgSF) are essential to a wide variety of human biological processes, ranging from the regulation of the immune response to the establishment of neural synapses [1–3]. IgSF proteins are structurally defined by the presence of at least one immunoglobulin (Ig) domain, a beta-sandwich topology of antiparallel beta sheets surrounding a hydrophobic core [4]. Ig domains are physicochemically well-suited for participating in protein-protein interactions, generally, but not necessarily, with other Ig domains [2–4]. As a result, cell-surface and secreted IgSF proteins (extracellular IgSFs) are common effectors of molecular recognition and mediators of signaling among proximal cells through trans-binding interfaces [2, 4]. Given the widespread roles of extracellular IgSFs in human biology, their dysfunction is implicated in many pathologies, especially autoimmune, infectious, and oncologic ones [5–7]. Thus, the modulation of the protein-protein interactions of extracellular IgSFs represents an important pharmaceutical goal, and several approved biologic drugs already target these proteins. A crucial step towards gaining insight into the principles of molecular recognition in the immune synapse is the elucidation of the IgSF interactome: the mapping of all IgSF-IgSF protein-protein interactions.

Accurate identification of binding partners among IgSF proteins remains a critically important problem with far-reaching effects on the development of macromolecular therapeutics. Although the IgSF is one of the largest known superfamilies in the human proteome, with nearly 500 members [8], relatively few molecular-level annotations exist regarding their protein-binding sites and cognate binding partners: only ~25% of the extracellular IgSFs have experimentally annotated binding partners, and only ~5% with known binding partners have crystallized complex structures [9, 10]. Even if only considering interactions within the IgSF itself, there are on the order of 10⁵ possible interacting protein pairs. This large number of possible interactions makes a brute-force experimental determination of the IgSF interactome prohibitively expensive, such that new techniques to reduce the experimental search space are required.

Protein Ligand Interface Design (ProtLID) is a computational strategy to limit the experimental search space by providing a shortlist of probable protein-protein interactions (Suppl. Fig. 1) [11]. For an input protein binding interface (the “receptor”), ProtLID performs extensive molecular dynamics (MD) simulations of amino-acid-based probes set throughout the interface to create a statistical description of amino acid interaction preferences, termed a “residue-specific-pharmacophore” (rs-pharmacophore). A subproteome of candidate ligands is then searched for structural matches to the receptor’s rs-pharmacophore, with the hypothesis that true cognate ligands will be the strongest, most highly-ranked matches. This pharmacophore-based approach has also allowed ProtLID to be adapted to design specificity-enhancing mutations; a recent example is one that allows the IgSF protein Programmed Death 1 (PD-1) [12] to selectively bind to PD Ligand 1 (PD-L1) and not PD Ligand 2 (PD-L2), and thus permit tissue-specific immune regulation [13].

In its original benchmarking study on a set of 11 IgSF proteins with 36 known ligands, ProtLID successfully identified known cognate partners in 61% of cases [11]. However, it is important to recognize that ProtLID’s previous benchmarking was a retrospective analysis; in practice, one would have no way of knowing a priori which specific cases ProtLID was more likely to be correct on and should be prioritized for experimental study. In order to make this approach even more practical, an algorithm to assess the confidence of predictions is necessary.

In the current work, we introduce a quality measure for ProtLID that gauges the likelihood of the algorithm successfully identifying cognate binding partners. This quality measure represents the physicochemical compatibility and spatial consistency of the rs-pharmacophore matches on the candidate ligand database structures. Additionally, we demonstrate that the skewness of the distribution of rs-pharmacophore matches on all candidate ligand structures can be used to further augment the quality measure by rewarding well-matched cognate partners and penalizing well-matching non-cognate-partner outliers. Our approach was validated on the entire dataset of currently known IgSF members, containing 23 receptors forming 77 complexes due to the redundancy among available ligand structures (Table 1). This work represents an extended validation of the ProtLID approach, a novel computational development that will aid experimentalists in triaging the exploration of putative protein-protein interactions, and a full molecular-level exploration of currently known IgSF complexes.

Table 1.

Application of confidence assignment procedure for individual receptor proteins in the dataset.

Receptor PDB with ProtLID Score Skewness > 2.5	Top-15-Ranked Candidate Ligands with Relative MatchSize > 0.60	Top-15-Ranked Cognate Partners with Relative MatchSize > 0.60 (Enrichment)
1F5W.B	14	8 (57%)
1I8L.C	13	2 (15%)
2JJS.C	4	1 (25%)
2PTT.A	6	1 (17%)
3RBG.A	14	2 (14%)

The “enrichment” represents the fraction of top-15-ranked candidate ligands meeting the relative MatchSize cutoff that are cognate partners.

Methods

IgSF dataset.

A set of 23 IgSF receptor-ligand pairs was collected based on previous studies. The 477 secreted or integral membrane human IgSF proteins [9] were queried in the UniProt sequence database [14]. The output of the search was parsed using the criteria described in previous studies [9, 10], and proteins lacking tertiary structures were excluded. The resulting proteins were manually curated using previously defined criteria [11] to identify trans-binding complexes that were collected from the Protein Data Bank [15]. We obtained 16 human or murine IgSF complexes (six were previously included in the original ProtLID benchmarking study [11]) that were solved by X-ray crystallography as of July 2018 and involved in either homophilic or heterophilic trans-binding interactions. From the 16 complexes, we obtained 23 unique receptor-ligand interactions (Suppl. Table 1).

ProtLID algorithm.

The initial version of ProtLID has been described in a previous work [11]. Briefly, the algorithm consists of the following three steps (Suppl. Fig. 1):

1. Given a receptor’s protein-binding interface, ProtLID generates a mesh of points spaced 1 Å apart across the interface using EDTSurf [16]. Input binding interfaces were defined using Contacts of Structural Units (CSU) [17] on IgSF complex structures (Suppl. Table 1). At each mesh point, molecular probes representing the twenty standard amino acids are initialized for molecular dynamics (MD) simulations 40 ps in length using AMBER [18], with each simulation repeated seven times to ensure reproducibility of results. Each probe is represented by a functional atom (FA) and tracked at 1 ps intervals, with FA definitions and MD parameters as set previously [11]. For the 12 proteins in the dataset not present in the original ProtLID study, we introduced a “dead-end-elimination” approach that statistically evaluated FA positions every 5 ps and prematurely ended simulations where residue probes had early, systematic preference or avoidance for specific locations. This new approach results in equivalent performance to the previously reported ProtLID algorithm at a greatly reduced computational cost.
2. The end-of-simulation FA positions, averaged over seven runs, are used to assign FAs to mesh points, creating a statistical description of FA and amino acid preferences throughout the receptor binding interface. The FA assignments are consolidated into N “interactors” (a typical N is 15), which consist of the receptor site amino acid position, the predicted ligand position corresponding to a mesh point, the allowed interacting FA types at the mesh point, and a distance restraint between the receptor and ligand atoms. The N interactors constitute the receptor binding interface’s rs-pharmacophore.
3. ProtLID then decomposes the N-interactor rs-pharmacophore template into all possible sub-templates made up of five interactors (5-mers). The structures of an input candidate ligand database are then searched for matches to all C(N, 5) 5-mers (N choose 5) and subject to least-squares-based docking on the receptor using a previously described pose refinement and clustering process [11]. The candidate ligand database was mostly identical to that in the original ProtLID study [11]; the only additions were cognate partners for the newly included proteins in the present work. The number of 5-mer matches for a specific candidate ligand structure measures its similarity to the rs-pharmacophore and is reflected by the number of poses in the largest cluster; the set of candidate ligand residues that was matched in the largest cluster also constitutes a binding interface prediction. The number of 5-mer matches for each docked pose is normalized by the docked binding interface area and summed over all poses to yield a “ProtLID Score” for the candidate. Finally, the top 15 candidate ligands ranked by ProtLID Score comprise a shortlist predicted to include at least one cognate partner for the input receptor.
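The size of the 5-mer search space in step 3 can be illustrated with a minimal Python sketch. This is not the ProtLID implementation: the interactor labels and the function name `five_mer_subtemplates` are hypothetical placeholders, and only the enumeration of sub-templates is shown, without any structure matching or docking.

```python
from itertools import combinations
from math import comb


def five_mer_subtemplates(interactors, k=5):
    """Return every k-interactor sub-template of an rs-pharmacophore template."""
    return list(combinations(interactors, k))


# Hypothetical labels standing in for the N = 15 interactors of a typical template.
interactors = [f"interactor_{i}" for i in range(1, 16)]
subtemplates = five_mer_subtemplates(interactors)

# The number of enumerated 5-mers equals the binomial coefficient C(N, 5).
print(len(subtemplates), comb(len(interactors), 5))
```

For the typical N = 15 quoted above, this enumeration yields 3003 sub-templates per receptor, each of which would be searched against the candidate ligand structures.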

Basic confidence score.

For a given receptor, ProtLID Scores were used to rank the candidate ligand database proteins. The goal of a confidence score is to indicate the likelihood that cognate partners are included in the top ranks. The high rank of a cognate partner, as expected, is strongly related to the numerical value of its ProtLID Score relative to other ligands in the candidate database. In addition, high cognate partner rank correlates with the number of residues in the surface patch matched by the 5-mer search. This is because the more the 5-mer search matches a specific area of the candidate ligand surface, the more residues from that area are likely to be matched. In other words, the size of the matched surface patch indicates the consistency of the 5-mer search result for a specific candidate ligand. To compare numbers of matched residues across different proteins, we normalized the number of matched residues for any given candidate ligand relative to the number of residues matched across all other candidate ligands to create a “MatchSize” score.

Based on the above observations, the confidence score was initially modeled as a linear combination (CombScore) of the ProtLID Score and the MatchSize score (both of which are relative measures that range from 0 to 1), weighted by a factor w:

\mathrm{CombScore} = w \cdot \mathrm{ProtLID\ Score} + (1 - w) \cdot \mathrm{MatchSize}

Any cognate partners whose CombScore exceeds an empirically set cutoff would be predicted to rank in the top 15 of the full candidate ligand list. The success of the CombScore cutoff at identifying cognate partners ranked in the top 15 could be assessed by partitioning the Rank-CombScore correlation plot into four quadrants, representing true and false positives and negatives (Fig. 1). These were used to calculate the performance metrics of precision (ratio of true positives to sum of true and false positives), sensitivity (ratio of true positives to sum of true positives and false negatives), specificity (ratio of true negatives to sum of true negatives and false positives), and F-Score (harmonic mean of precision and sensitivity: 2*precision*sensitivity/(precision + sensitivity)) [19]. The optimal value of w for the CombScore, chosen to maximize the F-Score, was 0.9 (Suppl. Fig. 2). Furthermore, at the optimal value of w, the CombScore cutoff that maximized the F-Score was 0.50. The same cutoff is supported by comparing the F-Score as a function of w under two conditions: a universal CombScore cutoff of 0.50 applied to all tested cases, and a CombScore cutoff individually adjusted to give the best performance for each case (the theoretical best possible outcome) (Suppl. Fig. 3).

Figure 1. — The ranks of cognate partners are correlated to (A) their ProtLID Score relative to the rest of the subproteome to which the rs-pharmacophore was matched, and (B) the size of the surface patch matched by the rs-pharmacophore, relative to all other cognate ligands in the dataset used in this work. Each plot was partitioned into four quadrants: the y-axis numerical ranks of 15 or less were considered the “truth”, while the x-axis variable in each plot could be used to set a cutoff beyond which any points would be a “predicted” top-15-ranked cognate partner. These partitions then define true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN), which can be used to compute performance metrics to determine optimal cutoff values for the x-axis variables.
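The CombScore and the quadrant-based performance metrics defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the six rank and score values are invented toy inputs rather than data from this study. Only the weight w = 0.9, the CombScore cutoff of 0.50, and the top-15 rank criterion are taken from the text.

```python
def comb_score(protlid_score, match_size, w=0.9):
    """Linear combination of the relative ProtLID Score and MatchSize (both in [0, 1])."""
    return w * protlid_score + (1.0 - w) * match_size


def confusion_counts(ranks, scores, score_cutoff=0.50, rank_cutoff=15):
    """Count TP, FP, FN, TN: truth is rank <= rank_cutoff, prediction is score > score_cutoff."""
    tp = fp = fn = tn = 0
    for rank, score in zip(ranks, scores):
        truth = rank <= rank_cutoff
        predicted = score > score_cutoff
        if truth and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Precision, sensitivity, specificity, and F-Score (harmonic mean of the first two)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, specificity, f_score


# Invented toy values for six cognate partners (NOT data from the study).
ranks = [1, 3, 12, 40, 80, 9]
protlid = [0.95, 0.70, 0.30, 0.20, 0.55, 0.45]
match = [0.90, 0.80, 0.70, 0.30, 0.60, 0.60]

scores = [comb_score(p, m) for p, m in zip(protlid, match)]
tp, fp, fn, tn = confusion_counts(ranks, scores)
precision, sensitivity, specificity, f_score = metrics(tp, fp, fn, tn)
print(tp, fp, fn, tn, round(f_score, 3))
```

Scanning `w` and `score_cutoff` over a grid and keeping the pair with the highest `f_score` would mirror the optimization described in the text, although the exact search procedure used by the authors is not specified here.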

Skewness-enhanced confidence score.

The basic ProtLID confidence score was combined with the skewness of the ProtLID Score distribution since this quantity also correlated with cognate partner rank. Given a set of ProtLID Scores X₁, …, X_n, the sample skewness Skew[X] was calculated as follows [20]:

\mathrm{Skew}[X] = \frac{\mu_{3}}{\sigma^{3}} = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_{i} - \bar{X})^{3}}{\left( \frac{1}{n-1} \sum_{i=1}^{n} (X_{i} - \bar{X})^{2} \right)^{3/2}}

Here, μ₃ is the third moment, σ is the standard deviation, and X̄ is the arithmetic mean of the ProtLID Score distribution. Based on the discrete relationship between the cognate partner rank and ProtLID Score distribution skewness, we defined the Skewness-Enhanced CombScore as equal to 1 if the skewness was greater than 2.5, and equal to the basic ProtLID-MatchSize CombScore otherwise. The optimal values of the ProtLID Score weight w and the CombScore cutoff remain unchanged from those of the basic confidence score (Suppl. Fig. 2).
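The sample skewness formula and the skewness-based override can be sketched as follows. This is a minimal Python illustration, assuming hypothetical function names and invented score lists; it follows the formula above term by term (1/n in the numerator, 1/(n − 1) in the denominator) and uses the 2.5 skewness cutoff from the text.

```python
def sample_skewness(xs):
    """Third central moment (1/n) divided by the cubed sample standard deviation (1/(n-1))."""
    n = len(xs)
    mean = sum(xs) / n
    third_moment = sum((x - mean) ** 3 for x in xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return third_moment / variance ** 1.5


def skewness_enhanced_comb_score(basic_comb_score, skewness, skew_cutoff=2.5):
    """Return 1 when the score distribution is highly skewed, else the basic CombScore."""
    return 1.0 if skewness > skew_cutoff else basic_comb_score


# Invented ProtLID Score lists (NOT data from the study).
symmetric_scores = [1.0, 2.0, 3.0, 4.0, 5.0]   # no lopsidedness, skewness of 0
lopsided_scores = [0.05] * 29 + [1.0]          # one strong outlier among weak matches

skew_sym = sample_skewness(symmetric_scores)
skew_lop = sample_skewness(lopsided_scores)

print(skewness_enhanced_comb_score(0.42, skew_sym))  # basic score is kept
print(skewness_enhanced_comb_score(0.42, skew_lop))  # overridden to 1.0
```

Because the skewness is computed once per receptor, every cognate partner matched against the same rs-pharmacophore receives the same override decision.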

Results & Discussion

Highly-ranked cognate partners are physicochemically and spatially compatible with the rs-pharmacophore

For a given input protein receptor and candidate ligand subproteome, the output of ProtLID is a residue-specific pharmacophore (rs-pharmacophore) and a list of potential binding partners ranked by their “ProtLID Score”. The fundamental goal of ProtLID is for cognate partners to be ranked highly on this list; the highest-ranking candidate ligands are those with a surface patch most physicochemically compatible with the rs-pharmacophore. Indeed, IgSF cognate partners with high ProtLID Scores relative to the rest of the candidate ligand subproteome are ranked highly, although this relationship is not perfectly linear (Fig. 1A) and shows a correlation coefficient (CC) of −0.73 (higher ranks have lower rank numbers). Our results demonstrate that a relative ProtLID Score of at least 0.50 is a specific and precise characteristic of top-15-ranked cognate ligands which leads to no false positive identifications. However, the perfect specificity and precision comes at the cost of reduced sensitivity as can be seen by the presence of false negatives (i.e. top-15-ranked cognate ligands whose relative ProtLID Scores are below 0.50), resulting in an F-Score of 0.71. Nevertheless, the relationship between the relative ProtLID Score and cognate ligand rank represents an extension of the first ProtLID benchmarking study: 15 of the 23 total receptors (65%) included in this work had a cognate ligand in the top 15 ranks, a very similar result to what was reported on the earlier, smaller dataset (61%) [11]. This suggests that the previous study already delivered a robust performance result as it did not change on this expanded list where none of the newly added test cases were used in any parameterization or testing of the core method.

We also examined whether the results depend on the type of complex structure used. The results shown in Suppl. Table 2 were separated into two groups: in group 1, the receptor and ligand were solved in the same complex, while group 2 collects the rankings of ligands that were solved in isolation or in complex with receptors other than the query. The average rank over all cases is 27.7, while group 1 (“holo structures”) and group 2 (“apo structures”) have average ranks of 25.4 and 28.6, respectively. A Wilcoxon rank sum test shows that this difference in average ranking is not statistically significant (p = 0.4861).

The present work introduces the size of the interface matched by the rs-pharmacophore (the “MatchSize”) as a novel correlate with the cognate ligand rank (Fig. 1B). The MatchSize is indicative of the spatial consistency of rs-pharmacophore matches on the candidate ligand: the larger the matched patch, the more robust it was through the rs-pharmacophore matching process. Although the MatchSize shows a weaker overall correlation to cognate ligand rank than the ProtLID Score (CC: −0.55), a MatchSize cutoff of 0.60 has a comparable discriminative ability to identify top-15-ranked cognate partners, with an F-Score of 0.69.

The skewness of a ProtLID Score distribution can assess rs-pharmacophore reliability

The set of ProtLID Scores of candidate ligands defines a statistical distribution whose properties characterize the relationship between the rs-pharmacophore and the candidate ligand subproteome. For example, a candidate ligand list whose ProtLID score distribution is highly lopsided indicates that only the few top ligands were strong matches to the rs-pharmacophore; these may be expected to be more likely cognate partners than if the ProtLID Scores of the top candidate ligands were close to those of the rest of the subproteome. Conversely, an rs-pharmacophore may be an especially strong match to a non-cognate partner outlier, causing true cognate partners to have low relative ProtLID Scores and become false negatives. Intuitively, accounting for the shape of the ProtLID Score distribution should improve the reliability of cognate partner identification in both cases.

Indeed, the mathematical skewness of the ProtLID Score distribution is strongly related to the ranks of cognate ligands for a given receptor (Fig. 2). It is apparent that receptors for which the skewness of the ProtLID Score distribution exceeds a value of 2.5 have mostly high-ranking cognate ligands; this cutoff’s ability to distinguish top-15-ranked cognate partners resembles that of the ProtLID Score and the MatchSize, with an F-Score of 0.68. However, unlike the ProtLID Score and MatchSize, the skewness is fundamentally a property of the rs-pharmacophore: all cognate ligands to which the specific rs-pharmacophore was matched will be assigned the same skewness. Thus, these results suggest that the skewness can be used to directly evaluate the likelihood that an rs-pharmacophore successfully identified a receptor’s cognate partners in the candidate ligand subproteome.

Figure 2. — The skewness of the ProtLID Score distribution is a property that relates the rs-pharmacophore to the entire candidate ligand subproteome. Therefore, all cognate partner structures for a given receptor will be assigned the same skewness value. This correlation plot can be partitioned into true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN), as in Fig. 1, to determine an optimal skewness cutoff.

A combination score can be used to confidently identify cognate partners

A principal goal of the present work is to synergistically combine the characteristics of cognate partners and the rs-pharmacophore to develop a confidence score that can indicate the reliability of ProtLID candidate binding partner predictions. A basic confidence score was modeled as a simple linear combination of the ProtLID Score and the MatchSize; respective weights of 90% and 10% were found to be optimal, improving the cognate partner identification F-Score to 0.78 at a combination score cutoff of 0.50 (Suppl. Fig. 2). The ProtLID Score distribution skewness served as a further enhancement to the confidence score: any cognate ligands coming from a candidate list where the skewness exceeded 2.5 were assigned a combination score of 1, while those that did not were assigned the basic ProtLID-MatchSize combination score (Suppl. Fig. 2). This skewness-enhanced combination score demonstrates a superior ability to identify cognate partners (Fig. 3), with an F-Score of 0.91 at a combination score cutoff of 0.50. This dramatic improvement in cognate partner identification is chiefly due to the correction of false negatives by the skewness component of the combination score. The skewness-enhanced combination score also correlates with the ability of the rs-pharmacophore to match cognate ligands’ true protein-binding interfaces (Fig. 3), an observation that is consistent with the prior performance of the ProtLID algorithm [11].

Figure 3. — The ProtLID Score and MatchSize are combined linearly with respective weights of 90% and 10%. The combination score is such that any cognate partners coming from a ProtLID Score distribution with a skewness greater than 2.5 are assigned a combination score of 1 – for visibility in this plot, these points were alternately assigned combination scores of 0.98 and 1. In addition, each point is colored according to the “Binding Interface F-Score”, which indicates how well the rs-pharmacophore matched the cognate partner’s crystal-complex-defined “true” binding interface. Highly-ranked cognate partners tend to have their true binding interfaces matched: 19/30 top-15-ranked cognate partners with a combination score greater than 0.50 have binding interface F-Scores of at least 0.40, whereas only 9/47 cases with a combination score equal to or less than 0.50 outside the top 15 ranks had a binding interface F-Score of at least 0.40.

Receiver operating characteristic (ROC) curves were constructed by plotting sensitivity (the true positive rate) against 1 – specificity (the false positive rate) as these vary with the confidence score cutoff for top-15-ranked cognate partner identification using the ProtLID Score, the MatchSize, and the basic and skewness-enhanced combination scores (Fig. 4). The area under the ROC curve (AUC) is an alternative to the F-Score that rewards true negative identification: an AUC of 1 represents perfect sensitivity and specificity throughout the entire range of confidence score cutoff values, while an AUC of 0.50 would represent predictions no better than random. By this measure, the MatchSize has the lowest performance in cognate ligand identification, with an AUC of 0.735, while the ProtLID Score and basic combination scores show the same AUC of 0.869. That the ProtLID Score and basic combination scores have the same AUC suggests that the ProtLID-MatchSize synergy is drowned out by the relatively large number of true negatives in the dataset (Fig. 1). Nevertheless, ROC curves allow for a clear visualization of the improvements made by the skewness-enhanced combination score: its AUC of 0.938 is due to an increase in sensitivity at a slight cost in specificity when compared to the relative ProtLID Score and basic combination score.

Figure 4. — Each curve is constructed by varying from 0 to 1 the score cutoffs used to partition rank correlation plots such as those in Figures 1 and 2. The area under each curve represents the performance of each score at cognate partner identification, with larger areas representing greater sensitivity and specificity throughout the score cutoff variation range. The dashed diagonal line indicates the expected performance of a random cognate partner identification.
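The construction of an ROC curve by sweeping the score cutoff, and the computation of its AUC, can be sketched in Python as below. This is an illustrative sketch only: the function names are hypothetical, the labels and scores are invented toy values rather than data from this study, and the trapezoidal rule is one common way to integrate the curve (the integration method used for Fig. 4 is not stated). The sketch assumes at least one positive and one negative case.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for a descending cutoff sweep.

    labels: True when the cognate partner is ranked in the top 15.
    scores: confidence scores in [0, 1]; a case is predicted positive when score >= cutoff.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if l and s >= cutoff)
        fp = sum(1 for l, s in zip(labels, scores) if not l and s >= cutoff)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule over consecutive points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented toy values (NOT data from the study).
labels = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
area = auc(curve)
print(round(area, 3))
```

The resulting area equals the fraction of positive-negative pairs in which the positive case receives the higher score, which is why an AUC of 0.50 corresponds to random ordering.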

The skewness-enhanced combination score can enrich the set of experimental tests with cognate partners

On its own, ProtLID would suggest the experimental examination of the top 15 ranked candidate ligands for each of the 23 receptors in the present work’s dataset; without accounting for redundancy, there would be 15*23 = 345 protein pairs to test. Out of these, 33 (9.6%) constitute correctly identified cognate receptor-ligand pairs. When considering the subset of 5 receptors whose ProtLID Score distribution skewness was greater than 2.5 (Table 1), 18 of 75 (24%) possible cognate receptor-ligand pairs are correctly identified. All 5 of these receptors had at least one cognate partner in their top 15 candidate ligand ranks. Furthermore, the set of top 15 ranked candidate ligands can be further prioritized by considering only those ligands with a relative MatchSize greater than 0.60. The set of 75 candidate ligands would be reduced to 51, out of which 14 (27%) overall would be cognate partners. The specific percentage of cognate partners meeting the MatchSize cutoff in the list of predicted ligands for individual receptors ranges from 14% to 57% (Table 1). These rates represent a marked increase over the reported success rate of < 5% for small-molecule virtual screening [21]. Thus, the present work provides a way to enrich the set of computational protein-protein interaction predictions with likely cognate binding partners and provides an avenue for the rapid development, evaluation, and prioritization of testable biological hypotheses.
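The enrichment figures in this section can be rechecked from the counts reported in Table 1 and in the paragraph above. The short Python sketch below only repeats that arithmetic; the dictionary layout and variable names are illustrative choices, while the counts themselves are copied from Table 1 and the text.

```python
# Per receptor: (top-15 candidates with relative MatchSize > 0.60,
#                top-15 cognate partners with relative MatchSize > 0.60), from Table 1.
table1 = {
    "1F5W.B": (14, 8),
    "1I8L.C": (13, 2),
    "2JJS.C": (4, 1),
    "2PTT.A": (6, 1),
    "3RBG.A": (14, 2),
}

# Per-receptor enrichment: cognate partners as a fraction of the retained candidates.
for receptor, (candidates, cognates) in table1.items():
    print(f"{receptor}: {cognates}/{candidates} = {cognates / candidates:.0%}")

total_candidates = sum(c for c, _ in table1.values())
total_cognates = sum(k for _, k in table1.values())

baseline_rate = 33 / (15 * 23)   # all 23 receptors, top 15 each
skewed_rate = 18 / (15 * 5)      # the 5 receptors with skewness > 2.5
filtered_rate = total_cognates / total_candidates  # after the MatchSize > 0.60 filter

print(f"{baseline_rate:.1%} -> {skewed_rate:.0%} -> {filtered_rate:.0%}")
```

Read in sequence, the three rates show how each successive filter (skewness, then MatchSize) raises the share of cognate partners among the pairs proposed for experimental testing.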

Conclusions

The principal contribution of this work is the development of a confidence score that can expedite the experimental exploration of pharmacophore-based protein-protein interaction predictions. This work also expands on our previous pilot study. We recently showed that the residue-based pharmacophore approach implemented in ProtLID performs favorably compared to leading docking approaches [11]. In the current work, on an expanded list of test cases, this performance is virtually unchanged: the success rate rose from 61% to 65%, a difference that is not statistically significant. As a caveat, this performance comes at the cost of a longer calculation time. The most time-consuming step in the approach is the generation of tens of thousands of short MD trajectories. Although this process is highly parallelizable, it can still take 2–3 days to complete on an average computer cluster with about 500 computing cores. There are a number of possible ways to speed up this calculation, such as freezing irrelevant parts of the structure or employing enhanced sampling approaches such as replica exchange [22]. These issues will be the subject of future studies.

Supplementary Material

supinfo

NIHMS1663933-supplement-supinfo.docx^{(464.4KB, docx)}

Acknowledgments

This work was supported by the following grants and agencies: National Institutes of Health (NIH) grant GM118709, GM100482 and AI141816; NG was supported by the National Research Service Award (NRSA) individual fellowship F31GM116570 and the Medical Scientist Training Program (MSTP) grant T32GM007288.

Footnotes

Data availability

Software implementing the method described in this article is available on request from the authors.

Conflict of Interest

U.S. patent application No. 16/068,938 has been filed for the ProtLID program.

References

1. Zinn K, Ozkan E: Neural immunoglobulin superfamily interaction networks. Curr Opin Neurobiol 2017, 45:99–105.
2. Barclay AN: Membrane proteins with immunoglobulin-like domains--a master superfamily of interaction molecules. Semin Immunol 2003, 15(4):215–223.
3. Williams AF, Barclay AN: The immunoglobulin superfamily--domains for cell surface recognition. Annu Rev Immunol 1988, 6:381–405.
4. Chattopadhyay K, Lazar-Molnar E, Yan Q, Rubinstein R, Zhan C, Vigdorovich V, Ramagopal UA, Bonanno J, Nathenson SG, Almo SC: Sequence, structure, function, immunity: structural genomics of costimulation. Immunological Reviews 2009, 229(1):356–386.
5. Wai Wong C, Dye DE, Coombe DR: The role of immunoglobulin superfamily cell adhesion molecules in cancer metastasis. International Journal of Cell Biology 2012, 2012:340296.
6. Dermody TS, Kirchner E, Guglielmi KM, Stehle T: Immunoglobulin superfamily virus receptors and the evolution of adaptive immunity. PLoS Pathog 2009, 5(11):e1000481.
7. Vincenti F, Luggen M: T cell costimulation: a rational target in the therapeutic armamentarium for autoimmune diseases and transplantation. Annu Rev Med 2007, 58:347–358.
8. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W et al.: Initial sequencing and analysis of the human genome. Nature 2001, 409(6822):860–921.
9. Yap EH, Rosche T, Almo S, Fiser A: Functional clustering of immunoglobulin superfamily proteins with protein-protein interaction information calibrated hidden Markov model sequence profiles. J Mol Biol 2014, 426(4):945–961.
10. Rubinstein R, Ramagopal UA, Nathenson SG, Almo SC, Fiser A: Functional classification of immune regulatory proteins. Structure 2013, 21(5):766–776.
11. Yap EH, Fiser A: ProtLID, a Residue-Based Pharmacophore Approach to Identify Cognate Protein Ligands in the Immunoglobulin Superfamily. Structure 2016, 24(12):2217–2226.
12. Dai S, Jia R, Zhang X, Fang Q, Huang L: The PD-1/PD-Ls pathway and autoimmune diseases. Cell Immunol 2014, 290(1):72–79.
13. Shrestha R, Garrett SC, Almo SC, Fiser A: Computational Redesign of PD-1 Interface for PD-L1 Ligand Selectivity. Structure 2019, 27(5):829–836 e823.
14. UniProt Consortium: UniProt: a hub for protein information. Nucleic Acids Res 2015, 43(Database issue):D204–212.
15. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28(1):235–242.
16. Xu D, Zhang Y: Generating triangulated macromolecular surfaces by Euclidean Distance Transform. PLoS One 2009, 4(12):e8140.
17. Sobolev V, Sorokine A, Prilusky J, Abola EE, Edelman M: Automated analysis of interatomic contacts in proteins. Bioinformatics 1999, 15(4):327–332.
18. Case DA, Cheatham TE 3rd, Darden T, Gohlke H, Luo R, Merz KM Jr., Onufriev A, Simmerling C, Wang B, Woods RJ: The Amber biomolecular simulation programs. J Comput Chem 2005, 26(16):1668–1688.
19. Witten IH, Frank E, Hall MA: Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn. Burlington, MA: Morgan Kaufmann; 2011.
20. Casella G, Berger RL: Statistical Inference. Pacific Grove, California: Duxbury/Thomson Learning; 2002.
21. Kontoyianni M: Docking and Virtual Screening in Drug Discovery. Methods Mol Biol 2017, 1647:255–266.
22. Yoda T, Sugita Y, Okamoto Y: Protein folding simulations by generalized-ensemble algorithms. Adv Exp Med Biol 2014, 805:1–27.

Supplementary Materials

supinfo

NIHMS1663933-supplement-supinfo.docx (464.4 KB, docx)

