Computational and Structural Biotechnology Journal. 2020 May 15;18:1153–1159. doi: 10.1016/j.csbj.2020.05.003

Coevolutionary data-based interaction networks approach highlighting key residues across protein families: The case of the G-protein coupled receptors

Filippo Baldessari a, Riccardo Capelli b, Paolo Carloni b, Alejandro Giorgetti a,b
PMCID: PMC7260681  PMID: 32489528

Keywords: GPCRs, Coevolution, Interaction network, Conformational states, Functionally relevant residues

Abstract

We present an approach that, by integrating structural data with Direct Coupling Analysis, is able to pinpoint most of the interaction hotspots (i.e. key residues for the biological activity) across very sparse protein families in a single run. An application to the Class A G-protein coupled receptors (GPCRs), both in their active and inactive states, demonstrates the predictive power of our approach. The latter can be easily extended to any other kind of protein family, where it is expected to highlight most key sites involved in their functional activity.

1. Introduction

The discovery of correlated mutations in protein families across different organisms has been shown to provide valuable information on the functional role of residues [1]. These mutations arise from evolutionary pressure, which drives changes that enhance stability and/or biological function. Starting from the correlation between mutations in sequence alignments (i.e., coevolutionary analysis, CA), one can infer that the destabilization induced by a single-point mutation can be attenuated or even counterbalanced by a corresponding mutation in a different portion of the sequence. CA approaches include DCA [2], [3], [4], plmDCA [5], GREMLIN [6], PSICOV [7], and many others [8], [9]. These methods have been used for different goals [10]: from the definition of coarse-grained force fields for molecular simulation [11], to the prediction of mutation energetics [12], [13], the direct inference of 3D structures [3], the refinement of structure prediction [14], [15], and the investigation of protein–protein interactions [16], [17].

Here we introduce a structure-based CA that identifies, in a single run, residues of high structural and/or functional relevance across a protein family. This is achieved by integrating structural information into a modified version of Lui and Tiana’s approach [12]. The latter uses internal interaction networks to uncover frustrated interactions and mutation free energy differences for a specific protein. Our protocol has several advantages. Unlike previous approaches, which are mainly based on single protein domains (usually obtained from PFAM [18]), our protocol includes entire proteins in the calculations. In addition, it provides insight into different conformational states of proteins, such as receptor active/inactive states or ion channel closed/open states (unlike “classical” DCA analysis [2], which is based on pure sequence information). Finally, and most importantly, it can be applied to large sparse families, i.e. families with very large sequence variability due to evolution. An application to the sparse and fairly well structurally characterized [19] “class A” subfamily of the human G-protein coupled receptor (hGPCR) superfamily shows the predictive power of the approach: in a single run, it identifies coevolution-related hotspots previously pinpointed by techniques other than CA [20], [21], while also integrating structural information to highlight differences between distinct conformational states. These hotspots are fundamental structural/functional residues or residues correlated with disease. The protocol is totally general and can easily be extended to other subfamilies of GPCRs, to organisms other than Homo sapiens, as well as to other large receptor families with large intrinsic variability, such as the pentameric ligand-gated ion channels (pLGICs) or the voltage-gated ion channels.

2. Results

The approach

In the following sections, we briefly describe our multistep strategy (Fig. 1). Our approach uses structural information in two different stages of the protocol, i.e. during the multiple sequence alignment (MSA) generation and in the construction of the interaction matrix, in contrast to most previously used coevolutionary approaches, which usually do not consider this information.

Fig. 1.

Schematic representation of the workflow. We employ available sequence alignments and structures to build a coevolution-based interaction matrix that we refine using a contact map, building a network that contains spatial and interaction information about the protein of interest. Hotspots are finally identified by analyzing the betweenness centrality of every node, and are subsequently labeled based on the data available in the literature.

Below we report details of each of these steps.

2.1. Experimental data

Let us consider the general case in which sequences across a given family are highly diverse. In this case several bottlenecks arise, starting from the MSA generation, and state-of-the-art methodologies are needed to overcome them. We consider here an alignment formed by M sequences of L residues σi (where σi denotes the amino acid type at position i), obtained from curated databases (Uniprot [22], Pfam [18], or GPCRdb [23]). The MSA can be generated using Promals3D [24], an algorithm that utilizes structural data and can thus alleviate the difficulty of aligning families with sparse sequences.

2.2. Interaction analysis – DCA and Sequence-specific interaction matrix

Our coevolutionary analysis protocol is based on Direct Coupling Analysis [2], [3]. The basic assumption of this technique is that the interaction between different residues can be written as

$$U(\{\sigma_i\}) = \sum_{i<j} E_{ij}(\sigma_a,\sigma_b) + \sum_i h_i(\sigma_a) \qquad (1)$$

where $\sigma_i$ is the amino acid at the i-th position of the sequence, $E_{ij}(\sigma_a,\sigma_b)$ is the two-body term (analogous to the interaction term of a Potts model) that contains the interaction energy between residue types $\sigma_a$ and $\sigma_b$ at positions i and j of the alignment, respectively, and $h_i(\sigma_a)$ is a one-body potential acting on the residues (analogous to the field term of a Potts model).

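The key quantity here is the two-body term $E_{ij}(\sigma_a,\sigma_b)$, which contains the information needed to build the interaction graph, leading, eventually, to the identification of the hotspots (see below). The frequency counts of pairs $f_{ij}(\sigma_a,\sigma_b)$ and of single residues $f_i(\sigma_a)$ in a sequence with n amino acids can be seen as marginals of a probability distribution

$$f_i(\sigma_a) = \sum_{\sigma_k;\, k \neq a} p(\sigma_1,\sigma_2,\ldots,\sigma_n) \qquad (2)$$

$$f_{ij}(\sigma_a,\sigma_b) = \sum_{\sigma_k;\, k \neq a,b} p(\sigma_1,\sigma_2,\ldots,\sigma_n) \qquad (3)$$

where the probability p, normalized by the partition function Z, is defined as

$$p(\sigma_1,\ldots,\sigma_n) = \frac{1}{Z}\exp\left[-U(\{\sigma_i\})\right]. \qquad (4)$$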

Starting from our alignment of M sequences of length L, we can compute the empirical frequencies $\tilde f_{ij}(\sigma_a,\sigma_b)$ and $\tilde f_i(\sigma_a)$. For $M \to \infty$, the empirical frequencies $\tilde f_i, \tilde f_{ij}$ will match the theoretical distributions $f_i, f_{ij}$. In a realistic situation, the empirical distribution is affected by finite-size effects, given by the finite number of sequences available. To overcome this issue, we reweight the empirical frequencies with the appropriate pseudocounts [25]. These pseudocounts are weighted on the uniform distribution of residue types (via the weight x), on the distribution of residue types in the considered alignment (via the weight y), and on the distribution of residue types in a specific pair of positions in the alignment (via the weight z), namely

$$f_i(\sigma_a) = \frac{1}{M_e(x+y+z+1)} \left[ \tilde f_i(\sigma_a) + \frac{x M_e}{q} + y\,\frac{\sum_j \tilde f_j(\sigma_a)}{L} + z\,\tilde f_i(\sigma_a) \right]$$

$$f_{ij}(\sigma_a,\sigma_b) = \frac{1}{M_e(x+y+z+1)} \left[ \tilde f_{ij}(\sigma_a,\sigma_b) + \frac{x M_e}{q^2} + \frac{y}{L^2 M_e} \sum_{kl} \tilde f_k(\sigma_a)\,\tilde f_l(\sigma_b) + \frac{z}{M_e}\,\tilde f_i(\sigma_a)\,\tilde f_j(\sigma_b) \right], \qquad (5)$$

where q is the number of residue types (here we have 21 types: the 20 amino acids and the gap), $M_e = \sum_s 1/m_s$ is an effective number of sequences and $m_s$ is the number of sequences with similarity to sequence s larger than 70%. In this work we fixed x = 0.5, y = 0.1, and z = 1.0 following Contini and Tiana [13], who tested these parameters for both cytosolic and membrane proteins.
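The single-site part of the reweighting in Eq. (5) can be illustrated with a minimal Python sketch. It rests on two assumptions that are not stated in the text: the empirical frequencies are taken to be sequence-weighted counts (each sequence s contributing 1/m_s), which is the reading under which Eq. (5) sums to one over residue types, and m_s counts sequence s itself. The function names and the toy four-letter alphabet are hypothetical and are not taken from the authors' code.

```python
def sequence_weights(msa, cutoff=0.7):
    """Weight 1/m_s per sequence; m_s counts the sequences (s included)
    whose fractional identity to s exceeds the cutoff."""
    weights = []
    for s in msa:
        m_s = sum(
            1
            for t in msa
            if sum(a == b for a, b in zip(s, t)) / len(s) > cutoff
        )
        weights.append(1.0 / m_s)
    return weights


def reweighted_single_site(msa, alphabet, x=0.5, y=0.1, z=1.0):
    """Single-site frequencies f_i(a) with pseudocounts (first line of Eq. 5)."""
    length, q = len(msa[0]), len(alphabet)
    weights = sequence_weights(msa)
    m_eff = sum(weights)  # effective number of sequences M_e

    # Sequence-weighted counts, used here as the empirical f~_i(a) (assumption).
    counts = [{a: 0.0 for a in alphabet} for _ in range(length)]
    for seq, w in zip(msa, weights):
        for i, a in enumerate(seq):
            counts[i][a] += w

    # Average count of each residue type over all alignment columns (y term).
    column_mean = {a: sum(c[a] for c in counts) / length for a in alphabet}

    norm = m_eff * (x + y + z + 1.0)
    freqs = [
        {a: (c[a] + x * m_eff / q + y * column_mean[a] + z * c[a]) / norm
         for a in alphabet}
        for c in counts
    ]
    return freqs, m_eff


# Toy alignment: the two identical sequences share one unit of weight.
toy_msa = ["ACA", "ACA", "GAC"]
toy_freqs, toy_m_eff = reweighted_single_site(toy_msa, alphabet="ACG-")
```

In this toy case the two identical sequences each receive weight 1/2, so the effective number of sequences is 2 rather than 3, and residue types never observed in a column still receive a non-zero frequency through the x and y terms.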

After the reweighting, we apply a mean-field approximation: we compute the associated correlation matrix $C_{ij}$ for the frequencies, defined as

$$C_{ij}(\sigma_a,\sigma_b) = f_{ij}(\sigma_a,\sigma_b) - f_i(\sigma_a)\,f_j(\sigma_b), \qquad (6)$$

and finally obtain the two-body interaction energy from its inverse,

$$E_{ij}(\sigma_a,\sigma_b) = (C^{-1})_{ij}(\sigma_a,\sigma_b) \qquad (7)$$

as shown in Refs. [3], [12].

Table 2.

Details of the hotspots. For every hotspot identified, we highlight the state (active or inactive) of the hGPCR where the residue was identified, the presence of a documented function, interaction with a ligand, the existence of a mutant or variant in the GPCRdb, and the amino acid consensus. Hy indicates general hydrophobic residues; Ha, hydrophobic aliphatic; Hb, hydrogen bonding; Sm, small. For references of the experimental data, see [11] to [100] of SI.

This two-body term $E_{ij}(\sigma_a,\sigma_b)$ (which has the form of a 4-dimensional tensor) contains the two-body interactions of all possible residue pairs. By choosing our sequence of interest, we can extract from this tensor the interaction matrix $E_{ij}$, which describes the interaction between all pairs of amino acids in that sequence.
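The correlation matrix, its inversion, and the extraction of the sequence-specific matrix can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: one reference residue type per position is dropped so that the correlation matrix is invertible (a common choice in mean-field DCA, not stated in the text), a small pure-Python Gauss-Jordan routine stands in for a linear algebra library, and all function names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (pure Python)."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if r == c else 0.0 for c in range(n)]
        for r, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mean_field_energies(f_single, f_pair):
    """Return E[i][j][a][b] = (C^-1)_ij(a, b), following Eqs. (6)-(7).

    f_single[i][a] and f_pair[i][j][a][b] run over the non-reference
    residue types only; f_pair[i][i] must be diagonal with f_single[i][a].
    """
    length, states = len(f_single), len(f_single[0])
    index = [(i, a) for i in range(length) for a in range(states)]
    corr = [
        [f_pair[i][j][a][b] - f_single[i][a] * f_single[j][b] for (j, b) in index]
        for (i, a) in index
    ]
    inv = invert(corr)
    return [
        [
            [[inv[i * states + a][j * states + b] for b in range(states)]
             for a in range(states)]
            for j in range(length)
        ]
        for i in range(length)
    ]


def sequence_specific_matrix(energies, sequence_states):
    """Pick E_ij for one sequence; None marks the reference type (energy 0)."""
    n = len(sequence_states)
    matrix = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(sequence_states):
        for j, b in enumerate(sequence_states):
            if i != j and a is not None and b is not None:
                matrix[i][j] = energies[i][j][a][b]
    return matrix


# Toy case: two positions, one non-reference residue type each.
f_single = [[0.5], [0.5]]
f_pair = [[[[0.5]], [[0.4]]], [[[0.4]], [[0.5]]]]
E = mean_field_energies(f_single, f_pair)
E_seq = sequence_specific_matrix(E, [0, 0])  # E_seq[0][1] is close to -3.75
```

In the toy case the two residue types co-occur more often than expected from their single-site frequencies (0.4 against 0.25), and the resulting energy is negative, in line with the sign convention of Eq. (4).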

2.3. Network analysis

$E_{ij}$ contains interaction information that is complementary to 3D structural data, because it can discriminate the most important energetic contributions of residues that are located in spatial proximity. To integrate spatial and energetic information in a single object, we computed a residue-residue contact map $Q_{ij}$ for every protein structure:

$$Q_{ij} = \begin{cases} 1, & \text{if } |r_i - r_j| \le 10\ \text{Å} \\ 0, & \text{if } |r_i - r_j| > 10\ \text{Å} \end{cases} \qquad (8)$$

where $|r_i - r_j|$ is the distance between the Cβ atoms of the i-th and the j-th residues (in the case of glycine, the hydrogen atom in the analogous position).

To insert structural information into $E_{ij}$, we follow Scarabelli et al. [26]. There, the authors perform a Hadamard product (an element-by-element multiplication of the two matrices) between the contact map and the interaction matrix, obtaining the local interaction matrix $L_{ij}$, namely

$$L_{ij} = Q_{ij}\,E_{ij} \qquad (9)$$

If the experimental structure considered contains non-resolved residues, one can remove the parts of $E_{ij}$ involving such residues.
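As a minimal sketch of Eqs. (8) and (9), the following Python snippet builds the contact map from a list of Cβ coordinates and multiplies it element by element with an interaction matrix. The function names, the toy coordinates, and the toy energies are hypothetical; reading coordinates from a PDB file is left out.

```python
import math


def contact_map(cb_coords, cutoff=10.0):
    """Q_ij = 1 if the C-beta atoms of residues i and j are within the cutoff (in Å)."""
    n = len(cb_coords)
    return [
        [1 if math.dist(cb_coords[i], cb_coords[j]) <= cutoff else 0
         for j in range(n)]
        for i in range(n)
    ]


def local_interaction_matrix(contacts, energies):
    """Hadamard (element-wise) product L_ij = Q_ij * E_ij."""
    return [
        [q * e for q, e in zip(q_row, e_row)]
        for q_row, e_row in zip(contacts, energies)
    ]


# Three residues on a line: 0 and 1 are 6 Å apart, 2 is far from both.
coords = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (18.0, 0.0, 0.0)]
energies = [[0.0, -2.0, 1.5], [-2.0, 0.0, 0.7], [1.5, 0.7, 0.0]]
Q = contact_map(coords)
L_local = local_interaction_matrix(Q, energies)
```

Only the interaction between residues 0 and 1 survives in this toy case, because the other two pairs are more than 10 Å apart.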

Now, let us consider each residue of the protein family as a node of a weighted graph G. In this graph, we connect two nodes if the modulus of the interaction obtained from DCA between the respective residues is larger than a threshold value $e_{thr}$, which is defined iteratively: we start by building a network with $e_{thr}$ set to the maximum energy of the matrix (in modulus), obtaining an unconnected graph (i.e., a network of isolated nodes except for the two residues with the strongest interaction). We then iteratively lower $e_{thr}$ until we obtain a connected graph (see footnote 1). The maximum value of $e_{thr}$ that still returns a connected graph defines the final interaction network, i.e. the connected graph G itself.
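The iterative choice of the threshold can be sketched as follows. The sketch assumes that an edge is kept when the modulus of the interaction is greater than or equal to the threshold (so that the strongest pair is connected at the first step, as described above), that zero entries never form edges, and that candidate thresholds are the distinct moduli found in the matrix; the function names are hypothetical.

```python
from collections import deque


def is_connected(n_nodes, edges):
    """Breadth-first search from node 0; True if every node is reached."""
    adjacency = {v: set() for v in range(n_nodes)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, queue = {0}, deque([0])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n_nodes


def connectivity_threshold(matrix):
    """Largest threshold (in modulus) whose graph is connected, with its edges."""
    n = len(matrix)
    pairs = [
        (abs(matrix[i][j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] != 0
    ]
    # Lower the threshold step by step, starting from the strongest interaction.
    for e_thr in sorted({w for w, _, _ in pairs}, reverse=True):
        edges = [(i, j) for w, i, j in pairs if w >= e_thr]
        if is_connected(n, edges):
            return e_thr, edges
    return None, []  # no threshold yields a connected graph


# Toy symmetric matrix with four residues.
toy = [
    [0.0, -5.0, 0.0, 1.0],
    [-5.0, 0.0, 3.0, 0.0],
    [0.0, 3.0, 0.0, -2.0],
    [1.0, 0.0, -2.0, 0.0],
]
e_thr, edges = connectivity_threshold(toy)
```

For the toy matrix the graph first becomes connected at a threshold of 2.0, so the weakest interaction (modulus 1.0) does not enter the final network.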

2.4. Hotspots

The betweenness centrality B(k) of a node k in G reads [27]:

$$B(k) = \sum_{i,j \in G} \frac{\xi(i,j|k)}{\xi(i,j)} \qquad (10)$$

where ξ(i,j|k) is the number of the shortest paths in the graph that connect i and j passing through k, and ξ(i,j) is the number of the shortest paths in the graph that connect i and j.

If the considered node is “central” in the network (i.e., the information flow passes through it to connect different portions of the protein), its betweenness centrality will be larger. We considered residues as hotspots if the betweenness centrality of their associated node is larger than half of the maximum betweenness centrality over all the nodes.
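A compact way to evaluate Eq. (10) and the hotspot criterion is the accumulation scheme by Brandes, sketched below in pure Python. The sketch assumes that shortest paths are counted on the unweighted graph (the text does not specify whether edge weights enter the path length) and that each unordered pair (i, j) is counted once; the function names and the toy graph are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Shortest-path betweenness for an undirected, unweighted graph.

    adjacency maps each node to an iterable of its neighbours.
    """
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                                  # nodes by increasing distance
        predecessors = {v: [] for v in adjacency}
        n_paths = {v: 0 for v in adjacency}         # number of shortest paths
        n_paths[source] = 1
        distance = {v: -1 for v in adjacency}
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was visited from both ends.
    return {v: c / 2.0 for v, c in centrality.items()}


def hotspots(adjacency):
    """Nodes whose betweenness exceeds half of the maximum value."""
    centrality = betweenness_centrality(adjacency)
    cutoff = 0.5 * max(centrality.values())
    return sorted(v for v, c in centrality.items() if c > cutoff)


# Two triangles (0-1-2 and 4-5-6) joined through node 3.
toy_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
toy_hotspots = hotspots(toy_graph)
```

In the toy graph every path between the two triangles passes through nodes 2, 3 and 4, which are therefore the only nodes selected as hotspots.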

2.5. Application to class A hGPCRs

hGPCRs, with more than 800 members [19], are the largest family of cell-surface receptors. This family translates external signals into cell stimuli. A widely used classification system of hGPCRs is the A-F system, which is mainly based on amino acid sequences and functional similarities and identifies six classes, labeled A-F. Class A, also known as the “rhodopsin-like family”, is the largest group of hGPCRs [28]; it includes hormone, neurotransmitter, and light receptors and accounts for around 80% of hGPCRs. These proteins share a common topological signature, namely seven α-helical transmembrane (TM) domains [29]. The members of the family also share positions of residues directly involved in ligand binding and receptor activation. These include, for example, positions 3.28, 3.32, 3.33, 3.36, 3.37, 5.39, 6.44, 6.48, 6.55, 7.35 and 7.39 [30], [31], [32], functionally conserved along the entire class A [32], [20] (see footnote 2 for the numbering scheme). Such a common structural organization contrasts strongly with the agonists’ structural diversity, from subatomic particles (a photon), to small molecules, up to peptides and even proteins [29]. Agonist binding to class A hGPCRs triggers receptor activation. The residues involved in the activation extend from the binding cavity [20], [34], [35], [36], [37] to the intracellular side of the receptor. Activation leads to the binding of cognate proteins, e.g. G-proteins and β-arrestins, and finally to downstream signaling pathways. However, the receptors do not simply switch between alternative agonist-bound and inactive forms in this process. They rather adopt a series of intermediate states (likely represented by an ensemble of conformations [38]), influenced not only by agonist binding, but also by other receptors, signaling and regulatory proteins, by post-translational modifications, and by environmental cues [39].

The input of the workflow (Fig. 1) consists of sequence alignments and of experimentally solved (PDB) structures of vertebrate class A GPCRs (see footnote 3). We considered vertebrate GPCRs for building up the evolutionary history of the family because, outside vertebrate species, the classification into subfamilies is more difficult and not always accurate [19].

The subclass sequences were downloaded from the Uniprot database [22]. The reviewed sequences were chosen first (2514 sequences). New sequences from the unreviewed data set were then manually added. All the resulting curated sequences (5,000) were aligned using the Promals3D web-server [24] with the default parameters of the server. The MSA obtained with this program satisfies all the class A hGPCR features, i.e. a set of highly conserved residues in each of the transmembrane helices [30], which gives rise to the Ballesteros-Weinstein nomenclature [33] (see footnote 2). The alignment was aided by 50 experimental or predicted structures belonging to 28 different human class A GPCRs, both in the active and inactive states (see Table 1). In all cases, the structure was not resolved in its entirety: parts of the sequence (typically the intracellular loop and the C- and N-termini of the chain) were missing in the experimental structure. To match structural and sequence information (and the matrix dimensions), we removed the parts of Eij that involved unresolved residues. Next, we built the local interaction network and computed the betweenness centrality of every residue. As mentioned in the description of the method, in this phase we used again the structural information of Table 1. Several hotspot positions have undergone site-directed mutagenesis experiments (see Table 2, Table 1 SI, and references therein). Many mutants have a lower ligand activity, prevent activation (https://gpcrdb.org [23]), or are linked to disease. Residues identified as hotspots across 30% or more hGPCRs have a documented biological function (Table 2), such as (i) belonging to the ligand binding site, (ii) belonging to the micro-switch network of activation, or (iii) being located within the allosteric Na+ binding cavity. Not all the hotspots listed in Table 2 are present in all the structures in Table 1 (the number of hotspots per single structure ranges from 2 to 26, see Table 2 SI).
In particular, for some identified hotspots we can infer their functional role via a direct comparison with other members of the same family. Some examples are briefly discussed below.

Table 1.

Human Class A GPCRs experimental and predicted structures. The homology models were obtained from GPCRdb [23] if the sequence identity between target and template was >50%; values in parentheses indicate homology models, with the corresponding sequence identity. Some of the human chemokine receptor structures are available only in the inactive state; for chemokine receptors without an experimental structure, we also analyzed models of the inactive conformation.

| Name | Species | UNIPROT | Active | Inactive |
|---|---|---|---|---|
| Rhodopsin | Human | OPSD_HUMAN | 6CMO | (98%) |
| Cannabinoid-1 | Human | CNR1_HUMAN | 6N4B | 5U09 |
| Cannabinoid-2 | Human | CNR2_HUMAN | (66%) | 5ZTY |
| Muscarinic M1 | Human | ACM1_HUMAN | 6OIJ | 5CXV |
| Muscarinic M2 | Human | ACM2_HUMAN | 4MQS | 3UON |
| Muscarinic M4 | Human | ACM4_HUMAN | (91%) | 5DSG |
| β2-Adrenoreceptor | Human | ADRB2_HUMAN | 4LDE | 2RH1 |
| Adenosine A1 | Human | AA1R_HUMAN | 6D9H | 5UEN |
| Adenosine A2A | Human | AA2AR_HUMAN | 5G53 | 5NM4 |
| δ-Opioid | Human | OPRD_HUMAN | (82%) | 4N6H |
| μ-Opioid | Human | OPRM_HUMAN | 5C1M | 4DKL |
| κ-Opioid | Human | OPRK_HUMAN | 6B73 | 4DJK |
| NOP Receptor | Human | OPRX_HUMAN | (77%) | 5DHH |
| Serotonin 1B | Human | 5HT1B_HUMAN | 6G79 | (60%) |
| Serotonin 2A | Human | 5HT2A_HUMAN | (83%) | 6A94 |
| Serotonin 2B | Human | 5HT2B_HUMAN | 5TUD | (78%) |
| Serotonin 2C | Human | 5HT2C_HUMAN | 6BQG | 6BQH |
| Dopamine 2 Receptor | Human | DRD2_HUMAN | (60%) | 6CM4 |
| Dopamine 3 Receptor | Human | DRD3_HUMAN | (57%) | 3PBL |
| Dopamine 4 Receptor | Human | DRD4_HUMAN | (57%) | 5WIU |
| Angiotensin 1 | Human | AGTR1_HUMAN | 6DO1 | 4YAY |
| Apelin Receptor | Human | APJ_HUMAN | (54%) | 5VBL |
| C–C Chemokine 2 | Human | CCR2_HUMAN | - | 6GPX |
| C–C Chemokine 5 | Human | CCR5_HUMAN | - | 5UIX |
| C–C Chemokine 9 | Human | CCR9_HUMAN | - | 5LWE |
| C–C Chemokine 1 | Human | CCR1_HUMAN | - | (78%) |
| C–C Chemokine 3 | Human | CCR3_HUMAN | - | (77%) |
| C–C Chemokine-Like 2 | Human | CCRL2_HUMAN | - | (62%) |

2.5.1. Ligand binding

Our protocol is able to capture residues with a fundamental role in selectivity, in ligand binding affinity [30], and in the dynamical events underlying hGPCR activation (see Table 2; [31], [32], [20]). For example, it identifies the conserved hotspots 3.32, 3.37 and 6.48, which are involved in ligand binding [40], [30], [32], [20] (see Table 2 and Table 1 SI for references). The first residue plays a role in ligand charge detection [32]. It is an aspartic acid in 22% of the class A hGPCRs, where it interacts with amines or with other positively charged groups [32]. The second residue is involved not only in ligand recognition but also in receptor activation [41]. The last one is a tryptophan in more than 77% of the class A subfamily. This position is well known in the literature, since it is a hub involved both in ligand detection and in forming the ‘toggle switch’ involved in receptor activation (see below) [42].

2.5.2. Micro-switches

The so-called “micro-switches” are small groups of residues that undergo conformational changes during receptor activation and are mechanistically involved in the activation of GPCRs [43], [42], [44]. They include: (i) D[E]R3.50Y in helix III, a common motif that forms the ‘ionic lock’ during inactivation, and (ii) NP7.50xxY in helix VII, which plays an important role in the protein’s conformational changes upon activation [42]. The link between this region and the binding cavity is the ‘toggle switch’ formed by positions 6.44, 6.48, 3.40 and 5.50: upon ligand binding, the residue at position 3.40 rotates and moves between 6.48 and 6.44. This induces a conformational change of the “hydrophobic barrier”, located below the ‘toggle switch’, which includes positions 2.43, 2.46, 3.43, 3.46, 6.36, 6.37 and 6.40 [45], [42]. The conformational change is important for receptor activation [30], [31]. All the residues involved in this complex mechanism were detected as hotspots in a single run of our protocol (Table 2).

2.5.3. Na+ binding cavity

An allosteric binding cavity for a partially hydrated Na+ ion is conserved across class A hGPCRs, excluding the opsins [46]. The hydrated Na+ is bound in the middle of the 7TM helix bundle, where it stabilizes the inactive state and reduces basal G-protein activity [46]. D2.50 (90% conserved as Asp) directly coordinates the Na+ ion, while N1.50 (97% Asn), S3.39 (75% Ser), N7.45 (70% Asn), S7.46 (66% Ser), N7.49 (75% Asn), and finally Y7.53 (89% Tyr) complete the coordination of the ion [46]. The protocol identifies D2.50, S3.39 and N7.49 as hotspots across class A subfamily members, except for the opsins, consistent with experiments [46] (Table 2 SI).

3. Conclusions

We have presented an approach able to capture most of the coevolution-related events relevant for the function of a very sparse protein family or subfamily (Fig. 1). The protocol provides information not only on residues spanning the full length of the protein, but also on different activation states of the receptors. In contrast to previous approaches, we make use of structural information.

Application to human class A GPCR structures shows that, from a multiple sequence alignment of a sparse family, we were able to extract all the residues known to be involved in the different aspects of receptor activity that were previously identified [20], [21]. These include (i) all the positions within the binding cavity with a conserved functional role; (ii) residues forming the activation micro-switches; and (iii) residues forming the Na+ allosteric binding cavity, as well as those found to be mutated in correlation with disease. Importantly, the method was able to capture the functional role of all these residues in a single run.

Our approach is totally general and can easily be extended to other subfamilies of GPCRs for which experimental structures are available (see footnote 4), and to other protein families. As an example, we cite here the pentameric ligand-gated ion channel (pLGIC) protein family, which mediates fast neurotransmission in the nervous system [47], [48], [49]. Its members are evolutionarily correlated, share a common architecture that consists of three distinct domains, and exist in at least three distinct functional states [47], [48], [50]. By exploiting the available structural information [47], [48], the approach might be able to identify all hotspots across the family in a single run. As soon as a statistically significant number of structures becomes available, a more specific analysis of single subfamilies of class A hGPCRs can readily be performed.

One of the present limitations of the protocol concerns the study of oligomers, because the integration of the structural data removes the interactions between residues that are far apart in the single monomer. In the future, we plan to remove this restriction by integrating oligomer data coming from experiments.

CRediT authorship contribution statement

Filippo Baldessari: Investigation. Riccardo Capelli: Conceptualization, Investigation. Paolo Carloni: Conceptualization. Alejandro Giorgetti: Conceptualization.

Acknowledgments

The authors thank Guido Tiana for providing the code to compute interaction matrices from sequence alignments. This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement 785907 (HBP SGA2).

Footnotes

1

In graph theory a connected graph is a network where, choosing any possible pair of nodes i and j, it is possible to go from i to j via a path defined by the edges of the graph.

2

From here on, we will use the Ballesteros-Weinstein (or generic) numbering scheme [33] commonly used for class A GPCRs. Within this framework, the first number indicates the helix and the second number indicates the residue position with respect to the most conserved residue in that helix (x.50). For example, position 3.52 refers to a residue in helix 3, two positions after the most conserved residue, the 3.50.
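The arithmetic behind this numbering can be written as a short Python sketch. The function names are hypothetical, the anchor value used in the example is purely illustrative, and the conversion to a sequence position assumes that no insertions or deletions separate the residue from the x.50 reference.

```python
def parse_bw(label):
    """Split a label such as '3.52' into (helix, offset from the x.50 residue)."""
    helix, position = label.split(".")
    return int(helix), int(position) - 50


def to_sequence_index(label, anchor_index):
    """Sequence number of a residue, given the sequence number of x.50 in its helix."""
    _, offset = parse_bw(label)
    return anchor_index + offset


helix, offset = parse_bw("3.52")                       # helix 3, two positions after 3.50
index = to_sequence_index("3.32", anchor_index=131)    # 18 positions before the anchor
```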

3

We consider only non-olfactory class A GPCRs as in [19].

4

In its current version, the approach cannot deal with the olfactory receptors, as they form a huge, dispersed subfamily without a crystal structure.

Appendix A

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.csbj.2020.05.003.

References

  • 1.Ovchinnikov S., Park H., Varghese N., Huang P.-S., Pavlopoulos G.A., Kim D.E., Kamisetty H., Kyrpides N.C., Baker D. Protein structure determination using metagenome sequence data. Science. 2017;355(6322):294–298. doi: 10.1126/science.aah4043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Weigt M., White R.A., Szurmant H., Hoch J.A., Hwa T. Identification of direct residue contacts in protein–protein interaction by message passing. Proc Nat Acad Sci. 2009;106(1):67–72. doi: 10.1073/pnas.0805923106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Morcos F., Pagnani A., Lunt B., Bertolino A., Marks D.S., Sander C., Zecchina R., Onuchic J.N., Hwa T., Weigt M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Nat Acad Sci. 2011;108(49):E1293–E1301. doi: 10.1073/pnas.1111471108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Baldassi C., Zamparo M., Feinauer C., Procaccini A., Zecchina R., Weigt M., Pagnani A. Fast and accurate multivariate gaussian modeling of protein families: predicting residue contacts and protein-interaction partners. PloS One. 2014;9(3) doi: 10.1371/journal.pone.0092721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Ekeberg M., Hartonen T., Aurell E. Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences. J Comput Phys. 2014;276:341–356. [Google Scholar]
  • 6.Kamisetty H., Ovchinnikov S., Baker D. Assessing the utility of coevolution-based residue–residue contact predictions in a sequence-and structure-rich era. Proc Nat Acad Sci. 2013;110(39):15674–15679. doi: 10.1073/pnas.1314045110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Jones D.T., Buchan D.W., Cozzetto D., Pontil M. Psicov: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2011;28(2):184–190. doi: 10.1093/bioinformatics/btr638. [DOI] [PubMed] [Google Scholar]
  • 8.Skwark M.J., Abdel-Rehim A., Elofsson A. Pconsc: combination of direct information methods and alignments improves contact prediction. Bioinformatics. 2013;29(14):1815–1816. doi: 10.1093/bioinformatics/btt259. [DOI] [PubMed] [Google Scholar]
  • 9.Burger L., Van Nimwegen E. Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Computat Biol. 2010;6(1) doi: 10.1371/journal.pcbi.1000633. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Morcos F., Onuchic J.N. The role of coevolutionary signatures in protein interaction dynamics, complex inference, molecular recognition, and mutational landscapes. Curr Opin Struct Biol. 2019;56:179–186. doi: 10.1016/j.sbi.2019.03.024. [DOI] [PubMed] [Google Scholar]
  • 11.Sutto L., Marsili S., Valencia A., Gervasio F.L. From residue coevolution to protein conformational ensembles and functional dynamics. Proc Nat Acad Sci. 2015;112(44):13567–13572. doi: 10.1073/pnas.1508584112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Lui S., Tiana G. The network of stabilizing contacts in proteins studied by coevolutionary data. J Chem Phys. 2013;139(15):10B618_1. doi: 10.1063/1.4826096. [DOI] [PubMed] [Google Scholar]
  • 13.Contini A., Tiana G. A many-body term improves the accuracy of effective potentials based on protein coevolutionary data. J Chem Phys. 2015;143(2):07B608_1. doi: 10.1063/1.4926665. [DOI] [PubMed] [Google Scholar]
  • 14.Ovchinnikov S., Kim D.E., Wang R.Y.-R., Liu Y., DiMaio F., Baker D. Improved de novo structure prediction in CASP 11 by incorporating coevolution information into Rosetta. Proteins: Struct Funct Bioinf. 2016;84:67–75. doi: 10.1002/prot.24974. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Evans R., Jumper J., Kirkpatrick J., Sifre L., Green T., Qin C., Zidek A., Nelson A., Bridgland A., Penedones H. De novo structure prediction with deeplearning based scoring. Annu Rev Biochem. 2018;77(363–382):6. [Google Scholar]
  • 16.Ovchinnikov S., Kamisetty H., Baker D. Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information. Elife. 2014;3 doi: 10.7554/eLife.02030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Marchetti F., Capelli R., Rizzato F., Laio A., Colombo G. The subtle trade-off between evolutionary and energetic constraints in protein–protein interactions. J Phys Chem Lett. 2019;10(7):1489–1497. doi: 10.1021/acs.jpclett.9b00191. [DOI] [PubMed] [Google Scholar]
  • 18.El-Gebali S., Mistry J., Bateman A., Eddy S.R., Luciani A., Potter S.C., Qureshi M., Richardson L.J., Salazar G.A., Smart A. The pfam protein families database in 2019. Nucl Acids Res. 2019;47(D1):D427–D432. doi: 10.1093/nar/gky995. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Fredriksson R., Lagerström M.C., Lundin L.-G., Schiöth H.B. The g-protein-coupled receptors in the human genome form five main families. phylogenetic analysis, paralogon groups, and fingerprints. Molecul Pharmacol. 2003;63(6):1256–1272. doi: 10.1124/mol.63.6.1256. [DOI] [PubMed] [Google Scholar]
  • 20.Zhou Q, Yang D, Wu M, Guo Y, Guo W, Zhong L, Cai X, Dai A, Jang W, Shakhnovich EI, et al. Common activation mechanism of class A GPCRs, eLife 8. [DOI] [PMC free article] [PubMed]
  • 21.van Westen G.J., Wegner J.K., Bender A., IJzerman A.P., van Vlijmen H.W. Mining protein dynamics from sets of crystal structures using ”consensus structures”. Protein Sci. 2010;19(4):742–752. doi: 10.1002/pro.350. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Consortium U. Uniprot: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47(D1):D506–D515. doi: 10.1093/nar/gky1049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Pándy-Szekeres G., Munk C., Tsonkov T.M., Mordalski S., Harpsøe K., Hauser A.S., Bojarski A.J., Gloriam D.E. GPCRdb in 2018: adding GPCR structure models and ligands. Nucleic Acids Res. 2017;46(D1):D440–D446. doi: 10.1093/nar/gkx1109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Pei J., Kim B.-H., Grishin N.V. Promals3d: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res. 2008;36(7):2295–2300. doi: 10.1093/nar/gkn072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Altschul S.F., Gertz E.M., Agarwala R., Schäffer A.A., Yu Y.-K. Psi-blast pseudocounts and the minimum description length principle. Nucleic Acids Res. 2008;37(3):815–824. doi: 10.1093/nar/gkn981. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Scarabelli G., Morra G., Colombo G. Predicting interaction sites from the energetics of isolated proteins: a new approach to epitope mapping. Biophys J. 2010;98(9):1966–1975. doi: 10.1016/j.bpj.2010.01.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Brandes U. On variants of shortest-path betweenness centrality and their generic computation. Social Networks. 2008;30(2):136–145. [Google Scholar]
  • 28.Attwood T., Findlay J. Fingerprinting G-protein-coupled receptors. Protein Eng, Des Selection. 1994;7(2):195–203. doi: 10.1093/protein/7.2.195.
  • 29.Kobilka B.K. G protein coupled receptor structure and activation. Biochimica et Biophysica Acta (BBA)-Biomembranes. 2007;1768(4):794–807. doi: 10.1016/j.bbamem.2006.10.021.
  • 30.Venkatakrishnan A., Deupi X., Lebon G., Tate C.G., Schertler G.F., Babu M.M. Molecular signatures of G-protein-coupled receptors. Nature. 2013;494(7436):185. doi: 10.1038/nature11896.
  • 31.Veprintsev D, Venkatakrishnan A, Deupi X, Lebon G, Heydenreich FM, Flock T, Miljus T, Balaji S, Bouvier M, Tate CG, et al. Diverse activation pathways in class A GPCRs converge near the G-protein-coupling region.
  • 32.Suku E., Giorgetti A. Common evolutionary binding mode of rhodopsin-like GPCRs: Insights from structural bioinformatics. AIMS Biophysics. 2017;4:543–556. AIMS Press.
  • 33.Ballesteros JA, Weinstein H. [19] Integrated methods for the construction of three-dimensional models and computational probing of structure-function relations in G protein-coupled receptors. In: Methods in Neurosciences, vol. 25, Elsevier, 1995, pp. 366–428.
  • 34.Granier S., Kobilka B. A new era of GPCR structural and chemical biology. Nature Chem Biol. 2012;8(8):670–673. doi: 10.1038/nchembio.1025.
  • 35.Dalton J.A., Lans I., Giraldo J. Quantifying conformational changes in GPCRs: glimpse of a common functional mechanism. BMC Bioinf. 2015;16(1):124. doi: 10.1186/s12859-015-0567-3.
  • 36.Dror R.O., Arlow D.H., Maragakis P., Mildorf T.J., Pan A.C., Xu H., Borhani D.W., Shaw D.E. Activation mechanism of the β2-adrenergic receptor. Proc Nat Acad Sci. 2011;108(46):18684–18689. doi: 10.1073/pnas.1110499108.
  • 37.Kruse A.C., Hu J., Pan A.C., Arlow D.H., Rosenbaum D.M., Rosemond E., Green H.F., Liu T., Chae P.S., Dror R.O. Structure and dynamics of the M3 muscarinic acetylcholine receptor. Nature. 2012;482(7386):552. doi: 10.1038/nature10867.
  • 38.Gumbart J., Khalili-Araghi F., Sotomayor M., Roux B. Constant electric field simulations of the membrane potential illustrated with simple systems. Biochimica et Biophysica Acta (BBA)-Biomembranes. 2012;1818(2):294–302. doi: 10.1016/j.bbamem.2011.09.030.
  • 39.Geppetti P., Veldhuis N.A., Lieu T., Bunnett N.W. G protein-coupled receptors: dynamic machines for signaling pain and itch. Neuron. 2015;88(4):635–649. doi: 10.1016/j.neuron.2015.11.001.
  • 40.Sandal M., Behrens M., Brockhoff A., Musiani F., Giorgetti A., Carloni P., Meyerhof W. Evidence for a transient additional ligand binding site in the TAS2R46 bitter taste receptor. J Chem Theory Comput. 2015;11(9):4439–4449. doi: 10.1021/acs.jctc.5b00472.
  • 41.Ponzoni L., Rossetti G., Maggi L., Giorgetti A., Carloni P., Micheletti C. Unifying view of mechanical and functional hotspots across class A GPCRs. PLoS Computat Biol. 2017;13(2) doi: 10.1371/journal.pcbi.1005381.
  • 42.Trzaskowski B., Latek D., Yuan S., Ghoshdastider U., Debinski A., Filipek S. Action of molecular switches in GPCRs-theoretical and experimental studies. Curr Medicinal Chem. 2012;19(8):1090–1109. doi: 10.2174/092986712799320556.
  • 43.Nygaard R., Frimurer T.M., Holst B., Rosenkilde M.M., Schwartz T.W. Ligand binding and micro-switches in 7TM receptor structures. Trends Pharmacol Sci. 2009;30(5):249–259. doi: 10.1016/j.tips.2009.02.006.
  • 44.Schönegge A.-M., Gallion J., Picard L.-P., Wilkins A.D., Le Gouill C., Audet M., Stallaert W., Lohse M.J., Kimmel M., Lichtarge O. Evolutionary action and structural basis of the allosteric switch controlling β2AR functional selectivity. Nat Commun. 2017;8(1):1–12. doi: 10.1038/s41467-017-02257-x.
  • 45.Tehan B.G., Bortolato A., Blaney F.E., Weir M.P., Mason J.S. Unifying family A GPCR theories of activation. Pharmacol Therapeutics. 2014;143(1):51–60. doi: 10.1016/j.pharmthera.2014.02.004.
  • 46.Katritch V., Fenalti G., Abola E.E., Roth B.L., Cherezov V., Stevens R.C. Allosteric sodium in class A GPCR signaling. Trends Biochem Sci. 2014;39(5):233–244. doi: 10.1016/j.tibs.2014.03.002.
  • 47.Amundarain M.J., Ribeiro R.P., Costabel M.D., Giorgetti A. GABAA receptor family: overview on structural characterization. Future Medicinal Chem. 2019;11(3):229–245. doi: 10.4155/fmc-2018-0336.
  • 48.Amundarain M.J., Viso J.F., Zamarreño F., Giorgetti A., Costabel M. Orthosteric and benzodiazepine cavities of the α1β2γ2 GABAA receptor: insights from experimentally validated in silico methods. J Biomol Struct Dyn. 2019;37(6):1597–1615. doi: 10.1080/07391102.2018.1462733.
  • 49.Jaiteh M, Taly A, Hénin J. Evolution of pentameric ligand-gated ion channels: pro-loop receptors. PLoS One 11 (3).
  • 50.Miller P.S., Smart T.G. Binding, activation and modulation of Cys-loop receptors. Trends Pharmacol Sci. 2010;31(4):161–174. doi: 10.1016/j.tips.2009.12.005.

Supplementary Materials

mmc1.pdf (716.1KB, pdf)
