Computational and Structural Biotechnology Journal. 2023 Aug 12;21:4070–4078. doi: 10.1016/j.csbj.2023.08.007

Cross-kingdom analyses of transmembrane protein kinases show their functional diversity and distinct origins in protists

Zhiyuan Yin a, Danyu Shen a, Yaning Zhao a, Hao Peng c, Jinding Liu b, Daolong Dou a,b
PMCID: PMC10463195  PMID: 37649710

Abstract

Transmembrane kinases (TMKs) are important mediators of cellular signaling cascades. The kinase domains of most metazoan and plant TMKs belong to the serine/threonine/tyrosine kinase (S/T/Y-kinase) superfamily. They share a common origin with prokaryotic kinases and have diversified into distinct subfamilies. Diverse members of the eukaryotic crown radiation such as amoebae, ciliates, and red and brown algae (grouped here under the umbrella term “protists”) diverged from higher eukaryotes long ago, making them ideal organisms for studying TMK evolution. Here, we developed an accurate and high-throughput pipeline to predict TMKomes in cellular organisms. Cross-kingdom analyses revealed distinct features of TMKomes in each grouping. Two-transmembrane histidine kinases constitute the main TMKomes of bacteria, while metazoans, plants, and most protists have a large proportion of single-pass TM S/T/Y-kinases. Phylogenetic analyses classified most protist S/T/Y-kinases into three clades, with clades II and III specifically expanded in amoebae and oomycetes, respectively. In contrast, clade I kinases were widespread in all protists examined here, and likely shared a common origin with other eukaryotic S/T/Y-kinases. Functional annotation further showed that most non-kinase domains were grouping-specific, suggesting that their recombination with the more conserved kinase domains led to the divergence of S/T/Y-kinases. However, we also found that protist leucine-rich repeat (LRR)- and G-protein-coupled receptor (GPCR)-type TMKs shared similar sensory domain architectures with the respective plant and animal TMKs, even though they belong to distinct kinase subfamilies. Collectively, our study revealed the functional diversity of TMKomes and the distinct origins of S/T/Y-kinases in protists.

Keywords: Kinome, Histidine kinase, Receptor-like kinase, Serine/threonine/tyrosine kinase, Oomycete

1. Introduction

‘The species that survive are the ones who most accurately perceive their environment and successfully adapt to it’ wrote Charles Darwin in proposing the theory that natural selection drives evolution. Viable cells need to sense and respond to their ever-changing external milieu to sustain life, a process that relies heavily on cell surface receptors embedded in the plasma membrane. Protein kinase-mediated phosphorylation of cell signaling proteins is the most extensively studied post-translational modification [30]. Since the first receptor tyrosine kinase (RTK) was cloned from human in 1984 [39], transmembrane kinases (TMKs) have been extensively studied in bacteria [18], [28], plants [10], and animals [2]. TMKs consist of extracellular domains (ECDs) that bind extracellular molecules, transmembrane (TM) domains, and cytoplasmic protein kinase domains for signaling. Distinct types of TMKs are employed by different domains of life. The two-component sensor-response system is the major signal transduction model in bacteria, and uses histidine kinases to transmit signals (Fig. S1) [18], [45]. Almost all the plant TMKs are serine/threonine kinases. Plant TMKs, also known as receptor-like kinases (RLKs), have a monophyletic origin and are closely related to animal Pelle kinases (Fig. S1) [33]. Tyrosine and serine/threonine kinases diverged to form multiple distinct families before the emergence of fungi, plants, and animals [33], [37]. Nevertheless, animal tyrosine kinases (TKs), plant RLKs, and bacterial eukaryotic-like serine/threonine protein kinases share conserved features and a common evolutionary theme [14], [36]. There are four tyrosine kinase-like subfamilies including mixed lineage kinases (MLK), rapidly accelerated fibrosarcoma (Raf) kinases, animal serine threonine kinase receptors (STKR), and Drosophila Pelle kinases (Pelle) [27]. Combinations between different ECDs and kinase domains further increase the diversification of TMKs [10], [46]. 
However, the evolutionary origins of eukaryotic TMKs have not been adequately explored.

Diverse members of the eukaryotic crown radiation such as amoebae, ciliates, and red and brown algae (grouped here under the term “protists”) diverged from higher eukaryotes long ago [4]. Because protists encompass multiple independent evolutionary lineages, they have often been used to analyze eukaryotic kinase evolution. For example, phylogenetic analysis of TKs from the choanoflagellate Monosiga brevicollis and two filasterean species revealed that the paraphyletic pre-opisthokont TKs diverged before the monophyletic holozoan TKs [37]. The RLK/Pelle receptor configuration (combination of ECDs and kinase domains) probably occurred before the divergence of land plants from charophytes [10], [24]. Animal and plant TMKs are primarily classified into three families: RTK, STKR, and RLK/Pelle. Distinct families of TMKs have also been reported in various protist lineages. The oomycete Phytophthora infestans, within the kingdom Stramenopila, contains 139 TMKs, most of which belong to oomycete-specific groups (Fig. S1) [19]. Interestingly, 24 TMKs from Phytophthora sojae, a model oomycete, all harbor plant-like extracellular leucine-rich repeat (LRR) domains. However, their kinase domains are different from plant kinases and form a separate clade [34]. Likewise, the amoebozoan Entamoeba histolytica contains a large and novel TMK family that branches close to Raf and STKR (Fig. S1) [5]. Thus, a systematic survey of the TMKomes of representative model organisms from diverse protist lineages would provide clues about the evolutionary origins of eukaryotic TMKs.

TMKs originated from associations between their ECDs and kinase domains. The plant RLK/Pelle family can be divided into at least 17 subfamilies based on the ECDs [10]. The 58 human RTKs belong to 20 subfamilies with different ECDs [31]. These plant and animal TMK subfamilies are highly divergent with distinct ECDs [10], probably because each organismal lineage needs to adapt to its own environment. Furthermore, diverse types of TMKs appeared at the earliest stages of metazoan and land plant lineage evolution, with their TMK repertoires being largely stable after the initial expansion [10], [24], [46]. Interestingly, charophytes and land plants share TMK subfamilies such as LRR-RLKs and proline-rich extensin-like receptor kinases (PERKs) [10], while no RTK orthologs were shared among metazoans and closely related protists (choanoflagellates and filastereans) [25], [37]. These findings indicate an elaborate evolutionary pattern of eukaryotic TMKs.

Here, we selected 128 model organisms from prokaryotes, diverse protists, fungi, plants, and metazoans [15] to analyze the origins of eukaryotic TMKs in the various protist lineages. Cross-kingdom surveys of TMKomes revealed distinct features of each lineage. The kinase domains from protists can be grouped into three clades. Clade I is conserved across all protists examined, while clades II and III appear to be amoebozoan- and oomycete-specific, respectively. Tyrosine and tyrosine-like kinases in animals and plants diverged from the protist clade I kinases, which are likely the shared origin of eukaryotic TMKs. In addition, LRR- and G-protein-coupled receptor (GPCR)-type TMKs from protists share similar sensor domain architectures with the respective plant and animal TMKs, although their kinase domains belong to distinct subfamilies.

2. Results

2.1. Cross-kingdom identification of transmembrane protein kinomes (TMKomes) in model organisms

Pfam HMM profiles were used for protein kinase annotation. Following a previously described classification [17], we roughly divided protein kinases into three groups based on their conserved structures and amino acid phosphorylation targets: serine/threonine/tyrosine kinases (S/T/Y-kinases), histidine-specific kinases (H-kinases), and amino acid kinases (AA-kinase, Pfam: PF00696) that phosphorylate a variety of other amino acid substrates. We developed a topology-based high-throughput pipeline to comprehensively identify TM protein kinomes from different domains of life (Fig. 1A, see the Materials and Methods section for details). The accuracy of this pipeline was first manually checked by using it to annotate the well-studied model phytopathogenic bacterium Xanthomonas campestris, whose genome encodes 31 TM histidine kinases [44]. Our pipeline identified 30 histidine-type TMKs in X. campestris (Table S1) with the only exception of XC_1965 (AAY49028), which was predicted to contain a TM domain using Psort-B [44]. Deeper analysis revealed that its potential cytoplasmic membrane localization was predicted by SCL-BLAST, which performs a BLASTP search against the PSORTdb database. However, we failed to predict this TM region using either TMHMM 2.0 or the deep learning-based DeepTMHMM, indicating that XC_1965 is not actually a TMK. The human gene group ‘receptor kinases’ approved by the HUGO Gene Nomenclature Committee (HGNC) contains 58 RTKs and 12 receptor serine/threonine kinases. Our pipeline successfully detected all 70 known TMKs as well as 17 additional TMKs (Table S2). These newly identified TMKs include five guanylyl cyclase 2 (GUCY2) members that contain a kinase domain and a GUCY catalytic domain. The human genome contains 7 GUCY2-encoding genes, two of which are pseudogenes and thus were not identified by our pipeline.
We finally tested our pipeline using the most extensively studied model plant Arabidopsis thaliana, which has 610 RLK-related sequences [33], though 16 were removed in the newest version of Arabidopsis proteome due to gene model update. For the remaining 594 RLK sequences, 466 have a TM domain and were all successfully predicted by using our pipeline. The remaining 128 kinases do not have a TM. Moreover, we identified 21 additional Arabidopsis TMKs including 10 H-kinases and 11 S/T/Y-kinases (Table S3). Collectively, these testing results demonstrate that our pipeline is highly accurate and suitable for the high-throughput identification of TMKomes and different groups of protein kinases in both prokaryotes and eukaryotes.
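The benchmark bookkeeping in the two paragraphs above can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the variable and function names are illustrative, and the X. campestris rate is computed against the 31 previously annotated kinases, even though the text argues that XC_1965 is probably not a TMK.

```python
# Counts quoted in the text: (organism, previously known TMKs, recovered by the pipeline).
benchmarks = [
    ("Xanthomonas campestris", 31, 30),   # XC_1965 was the single miss
    ("Homo sapiens", 58 + 12, 70),        # 58 RTKs + 12 receptor S/T kinases
    ("Arabidopsis thaliana", 466, 466),   # RLKs that carry a TM domain
]


def recovery_rate(known: int, recovered: int) -> float:
    """Fraction of previously annotated TMKs that the pipeline recovered."""
    return recovered / known


rates = {name: recovery_rate(known, hit) for name, known, hit in benchmarks}

# Arabidopsis RLK bookkeeping from the text.
rlk_reported = 610                          # RLK-related sequences reported in [33]
rlk_removed = 16                            # dropped by gene model updates
rlk_remaining = rlk_reported - rlk_removed  # sequences still in the proteome
rlk_without_tm = rlk_remaining - 466        # kinases lacking a TM domain

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```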

Fig. 1.

The overall features of transmembrane kinases (TMKs) from different kingdoms. (A) The pipeline used for cross-kingdom identification of transmembrane kinases (TMKs). Model organisms were selected according to Hedges [15]. The profile hidden Markov models from Pfam-A were used for annotating three types of protein kinases (H, histidine; S/T/Y, serine/threonine/tyrosine; AA, amino acid) and other domains by HMMER v3.3.2. The transmembrane (TM) topology was predicted using the deepTMHMM server v1.0.13 (https://dtu.biolib.com/DeepTMHMM). (B) Comparison of the TMK proportions in the proteomes among five groupings. Different letters above the boxplots indicate significant difference (one-way ANOVA, n = 18–39). (C) The percentages of different TMK types (H, S/T/Y, and AA) in the five groupings. (D) Major TM topology types of TMKs in each grouping. N, amino-terminus; C, carboxyl-terminus; TM, transmembrane; ECD, extracellular domain.

We identified a total of 22,321 TMKs spanning 128 model species from prokaryotes, protists, fungi, plants, and metazoans (Table S4). The numbers of TMKs found in each species varied extensively, ranging from 0 (several archaea and bacteria) to 2939 (wheat). However, the average proteome percentages of TMKs in prokaryotes, protists, and metazoans were relatively similar at around 0.4% (Fig. 1B). Metazoans often have dozens of RTKs [46], and fungi are evolutionarily close to metazoans. However, consistent with a previous report that fungi lost TKs after their divergence from metazoans [37], many fewer TMKs were identified in fungal proteomes (Table S4). Although most plant RLKs have a monophyletic origin [33], tandem and whole-genome duplications have resulted in their extensive expansion [10], [32] to hundreds of members per species, with an average proteome percentage of 1.75% (Fig. 1B and Table S4).

Most known TMKs are histidine or S/T/Y-kinases. More than 90% of bacterial TMKs are histidine kinases that function as sensors in the two-component system [7]. Histidine kinases are also widely distributed in eukaryotes, such as the receptors of phytohormones [20], [29]. Here, we found that metazoan and plant TMKomes have only ∼0.2% of histidine kinases with TM domains (Fig. 1C). Although fungi have many fewer TMKs in total, about 30% of the fungal TMKs are histidine kinases (Fig. 1C), which is in line with their important functions in fungi [16]. Interestingly, most of the protist lineages examined in this study have no or very few histidine-type TMKs. Exceptions were 201 and 51 histidine-type TMKs detected in two ciliates, Paramecium tetraurelia and Tetrahymena thermophila, respectively. Most of those ciliate TMKs had a similar topological organization with six TM domains. Since both ciliates feed on bacteria, we were prompted to test whether they obtained bacterial TMKs via horizontal gene transfer. We generated a maximum likelihood phylogenetic tree using all identified prokaryotic and ciliate histidine TMKs with six TM domains. The results revealed that bacterial and ciliate TMKs were separated into two distinct clades (Fig. S2), suggesting that ciliate TMKs were unlikely to have been acquired by horizontal gene transfer from the bacteria they feed on. In fact, genomic analyses revealed that lineage-specific duplication in T. thermophila led to the expansion of histidine TMKs in that species [12]. Likewise, whole-genome duplication events in P. tetraurelia likely doubled the copy numbers of TMKs twice [3]. An unusual class of TMKs with a cytoplasmic AA-kinase domain was found in the oomycete Pythium insidiosum (two TMKs: PINS_001650 and PINS_001466).

TM topology and the numbers of TM domains are often associated with specific functions [42]. Deep learning-based deepTMHMM is currently the TM topology prediction method with the best performance [13]. A TMK defined here contains a cytoplasmic kinase domain and at least one TM domain. We also counted the TM domain numbers to categorize TMKs by topology. Consistent with the high proportion of histidine TMKs in prokaryotes, more than 60% of prokaryotic TMKs have two TM domains (Fig. 1D and S3), which is typical for the two-component sensing system [18]. The 2-TM histidine TMKs, together with other histidine TMKs with 1, 4, or 6 TM domains, constitute the major types of TMKs in prokaryotes (Fig. 1D). In contrast, more than 94% of the TMKs in metazoans and plants are S/T/Y-kinases with a single-pass TM. The TMKs found among protist lineages, especially in the ciliates, are mainly 1-TM S/T/Y-kinases and 6-TM histidine kinases. Overall, our results reveal distinct features of TMKs from different groupings.
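The topology categories above (for example "2-TM histidine kinase" or "1-TM S/T/Y-kinase") can be derived by counting membrane-spanning segments in a per-residue topology string. The following is a minimal sketch, assuming a label string in which `M` marks membrane residues, `I` cytoplasmic residues, and `O` extracellular residues; the toy inputs and the helper names are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter
from itertools import groupby


def count_tm_segments(topology: str) -> int:
    """Count contiguous runs of 'M' (membrane) labels in a per-residue string."""
    return sum(1 for label, _run in groupby(topology) if label == "M")


def topology_class(kinase_type: str, topology: str) -> str:
    """Build a category label such as '2-TM H' from kinase type and TM count."""
    return f"{count_tm_segments(topology)}-TM {kinase_type}"


# Toy (kinase type, topology) pairs for illustration only.
toy_tmks = [
    ("H", "IIIMMMOOOMMMIII"),   # two TM segments, histidine kinase
    ("S/T/Y", "OOOOMMMIIII"),   # single-pass S/T/Y-kinase
    ("S/T/Y", "OOMMMIIIII"),    # single-pass S/T/Y-kinase
]

tally = Counter(topology_class(kind, topo) for kind, topo in toy_tmks)
for category, n in tally.most_common():
    print(category, n)
```

Grouping such tallies by kingdom would give the kind of per-grouping breakdown summarized in Fig. 1D.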

2.2. The functional diversity of TMKs from different kingdoms

TMK ECDs perceive environmental signals, which are essential for organisms to adapt to their external environment [22]. Using Pfam-A, we annotated conserved non-kinase domains in TMKs for cross-kingdom comparisons. Since the 18 fungi examined have only 54 TMKs in total (Table S4), they were not included in the statistical description of conserved domains. About 90% of the metazoan and plant TMKs have conserved domains with known functions (Fig. 2A), probably due to the intensive investigations of them over the past three decades. By contrast, about 40% of the protist TMK non-kinase domains currently remain unannotated by Pfam-A. We next analyzed how the annotated conserved domains are shared among the different groupings. A Venn diagram indicated high non-kinase domain diversity of TMKs from different groupings (Fig. 2B). Only seven domains were shared by all four groups, while a large proportion of domains were specific to each group (protists: 70.1%, prokaryotes: 81.0%, metazoans: 66.2%, and plants: 83.3%).
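The shared and grouping-specific counts behind such a Venn diagram reduce to set operations. The sketch below illustrates the idea with placeholder domain identifiers (`D1`, `D2`, and so on); these sets are invented for the example and do not reproduce the paper's domain lists or percentages.

```python
# Placeholder domain sets per grouping (illustrative only, not study data).
domains = {
    "prokaryotes": {"D1", "D2", "D3"},
    "protists": {"D1", "D2", "D4", "D5"},
    "metazoans": {"D1", "D2", "D6"},
    "plants": {"D1", "D2", "D7"},
}

# Domains found in every grouping (the central region of the Venn diagram).
shared_by_all = set.intersection(*domains.values())

# Domains found in exactly one grouping, and their share of that grouping's domains.
specific = {}
specific_fraction = {}
for group, own in domains.items():
    others = set().union(*(d for g, d in domains.items() if g != group))
    specific[group] = own - others
    specific_fraction[group] = len(specific[group]) / len(own)

print("shared by all:", sorted(shared_by_all))
for group in domains:
    print(group, sorted(specific[group]), f"{specific_fraction[group]:.0%}")
```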

Fig. 2.

Lineage-specific diversification of conserved non-kinase domains in TMKs. (A) Comparison of the proportions of TMKs with conserved non-kinase domains. The average proportions of TMKs with Pfam hits are shown above the boxplots. Different letters above the boxplots indicate significant differences (one-way ANOVA, n = 22–39). (B) A schematic summary of non-kinase domains in TMKs. The domains were annotated by searching Pfam-A with HMMER v3.3.2. Domains from the same family were consolidated into a single domain. (C) Word clouds showing the most frequent domains in each grouping. The clouds were generated by using the online server BioLadder (www.bioladder.cn).

To further determine the major domain families within each group, the annotated domains were ranked by frequency (Fig. 2C and Table S5). In plants, the top domain, leucine-rich repeat (LRR), was the largest family in all higher plant genomes examined [10]. The most prevalent domain family among metazoan TMKs was the fibronectin type III domain (PF00041), which mediates protein interactions, as does the LRR domain. In bacteria, the histidine kinase, adenyl cyclase, methyl-accepting protein, and phosphatase (HAMP) domains (PF00672 and PF18947) and the Period circadian protein, aryl hydrocarbon receptor nuclear translocator protein, and Per-Arnt-Sim (PAS) domains (PF13426, PF13188, PF12860, PF08448, PF08447, PF13596, and PF00989) occurred most frequently. HAMP and PAS domains act as molecular sensors and signal transducers in histidine kinases, respectively [1], [38]. Mainly found in oomycetes, LRRs were the largest TMK domain family among the protist lineages [9], [34], similar to the observation in higher plants. However, the specific kinase domains of oomycete LRR TMKs were not closely related to those of human tyrosine-like kinases or plant Pelle/RLKs [19], [34]. The above observations suggest that TMK diversification is extensive and lineage-specific.

2.3. The serine/threonine/tyrosine kinase families in protist lineages are specific with distinct origins

Apart from the specific expansion of histidine kinases in ciliates, single-pass S/T/Y-kinases were the major type of TMKs in protists, metazoans, and plants (Fig. 1D). S/T/Y-kinases diversified into multiple subfamilies from a common origin [27], [36]. Most metazoan TMKs are TKs and STKR-type tyrosine kinase-like (TKL) kinases, while almost all the plant TMKs are Pelle/RLK-type TKL kinases [33]. To determine the kinase types of TMKs among the protist lineages, we generated a phylogenetic tree using metazoan TKLs, plant RLKs, and several established protein kinase families as outgroups, namely inositol-requiring enzyme 1 (IRE1), cAMP-dependent, cGMP-dependent and protein kinase C (AGC), Ca2+/calmodulin-dependent protein kinase (CAMK), cyclin-dependent kinase (CDK), mitogen-activated protein kinase (MAPK), glycogen synthase kinase (GSK), CDC-like kinase (CLK) (CMGC), “Sterile” serine/threonine kinases (STE), and casein kinase I (CK1). All S/T/Y-kinase domains of TMKs from the 12 non-oomycete protists formed three clades in the maximum likelihood phylogenetic tree (Fig. 3). Most kinases fell in clade I, with only a few sequences clustering within the TKL outgroup. Clade I contained all the established protein kinase families in the outgroup (Fig. 3A). Among them, IRE1 homologs [40] formed the major TMK family in fungi (Fig. S4). The kinase domains of human and Arabidopsis IRE1 were nested in clade Ib (Fig. 3). Interestingly, clade II was Evosea (Amoebozoa)-specific, including most of the TMKs in E. histolytica.

Fig. 3.

The specific and shared protein kinase families in protists. (A) A maximum likelihood tree (evolutionary model LG + F + G4) of the kinase domains from 12 non-oomycete protists. The kinase domains (TK, TKL, IRE1, AGC, CAMK, CMGC, STE, and CK1) from human (metazoan), Neurospora (fungi), and Arabidopsis (plants) were used as the outgroup. The scale bar corresponds to the number of substitutions per site. Branches with bootstrap values of 70–90% and over 90% are marked with gray and black dots, respectively. Major clades are highlighted by different colors. Kinases from different phyla are marked with colored dots.

Since the kinases of E. histolytica are distributed in all the clades mentioned above, we next built a maximum likelihood tree of kinases from E. histolytica and P. sojae. Most of the P. sojae TMKs formed a separate clade (clade III in the tree), with the remaining sequences clustering with TKLs or clade I (Fig. 4). TMKs from all of the other 10 oomycetes showed a phylogeny similar to that of P. sojae (Fig. S5). Based on the conserved substrate-binding site (tryptophan), the oomycete-specific clade III, as well as clade II, are likely variants branching out from clade I (Fig. 5A). The conserved kinase residues in protist clades Ib, II, and III are consistent with the motifs of protein kinases reported previously [27], [46]. However, their substrate-binding residues were distinct, especially those in the C-terminus of the catalytic loop (Fig. 5A). Consistent with a previous report of TKs found in oomycetes [19], most oomycetes examined here contained at least one TK (Fig. 5B). However, no TKs were found in other protists in this study. Collectively, TMKs in different protist lineages have different kinase types. TKs were mainly found in Choanoflagellata and Filasterea [37], while clades II and III were specifically expanded in Evosea and Oomycota, respectively. In contrast, clade I TMKs were widely distributed in different protist lineages. The sub-clade Ia appeared to be the common origin of S/T/Y-kinases in the protists examined here.

Fig. 4.

A maximum likelihood tree of kinase domains from Entamoeba histolytica and Phytophthora sojae. The tree was constructed using IQ-TREE (evolutionary model LG + G4).

Fig. 5.

Distribution and conserved motifs of protist serine/threonine/tyrosine kinases. (A) Comparison of the catalytic and activation loop motifs in different kinase clades. Weblogos show the relative frequencies of residues at the conserved sites. The conserved kinase and substrate-binding residues are indicated by red and blue stars, respectively. The numbers of sequences used to build each logo are 20 in clade Ia, 78 in clade Ib, 95 in clade II, and 38 in TKL. (B) Counts of predicted protist TMKs with different kinase types in this study. The species tree was generated by using the online server TimeTree (http://www.timetree.org/). HK, histidine kinase; TK, tyrosine kinase; TKL, tyrosine kinase-like.

2.4. LRR- and GPCR-type TMKs from different groupings share similar sensor domain architectures within each type of family

As the largest group of plant RLKs, LRR-RLKs contain extracellular LRR sensory domains and cytoplasmic Pelle/RLK-type kinase domains [10]. Interestingly, the unrelated oomycetes also have TMKs that resemble plant LRR-RLKs [19], [34]. Plant and oomycete LRRs even share a conserved motif [34]. However, the kinase domains of most oomycete TMKs were oomycete-specific (Fig. 4; Fig. S5). We generated a maximum likelihood tree using the kinase domains of P. sojae TMKs, with human and Arabidopsis kinases being the outgroup. Almost all the P. sojae LRR-TMKs belonged to the oomycete-specific clade III (Fig. 6A). Taken together, plant LRR-RLKs and oomycete LRR-TMKs shared a similar domain architecture but with distinct kinase domains (Fig. 6B).

Fig. 6.

LRR- and GPCR-TMKs from different groupings share similar sensor domain architectures within each type of family. (A) A maximum likelihood tree (from IQ-TREE, evolutionary model LG + I + G4) of the kinase domains of P. sojae TMKs. TMKs with an extracellular LRR domain are marked with red dots. The kinase clades are highlighted by different colors. (B) A model of the evolution and assembly of LRR-type TMKs in oomycetes and plants. (C) A maximum likelihood tree (from IQ-TREE, evolutionary model LG + G4) of the kinase domains of GPCR-TMKs from oomycetes, ciliates, and amoebae. Kinases from different species are marked with colored dots. (D) A model of the evolution and assembly of GPCR-TMKs in protists and metazoans.

The seven-TM GPCRs are central sensors in eukaryotes that activate G-protein complex signaling. GPCRs may combine with domains from other signaling systems, such as the phosphatidylinositol phosphate (PIP) kinases (PIPKs) [41]. Here, we found that GPCR-type TMKs with a C-terminal protein kinase domain were widely distributed in oomycetes, ciliates, and amoebae (Fig. 6C). The GPCR-linked kinases fell into two clades. Most oomycete kinases belonged to the clade I described above, while the kinases of ciliates and amoebae were closely related to the TKL clade (Fig. 6C). In addition, several GPCR-TMKs were found in metazoans, including sea urchins and felids. However, they were nested within the TK clade (Fig. 6C). Similar to the scenario of LRR kinases, GPCR-TMKs shared similar sensor architectures but involved distinct kinase domains (Fig. 6D).

3. Discussion

TMKs are pivotal signaling components in both prokaryotes and eukaryotes. They are usually predicted by homology-based identification of protein kinases with TM domains [19], [37]. However, the multiple kinase types and their diverse TM topologies make it time-consuming to identify TMKomes accurately. Recently, Hallgren et al. [13] developed deepTMHMM, a deep learning model for TM topology prediction and classification, which is the best-performing method to date. Here, we established a topology-based pipeline for high-throughput identification of TMKomes in all cellular organisms, and demonstrated its high accuracy by testing with model prokaryotic and eukaryotic organisms whose TMKs have been well-studied. This pipeline was also effective in detecting TMKs missed in previous homology-based studies due to different kinase types and/or TM topologies. For example, Si et al. [34] reported 24 LRR-RLKs in P. sojae. We increased the number to 32 by adopting our pipeline combined with manual checks (Table S6).

We surveyed the TMKomes of 128 model organisms across multiple kingdoms, with distinct TMK features found in each grouping. First, bacteria and eukaryotes deploy histidine- and S/T/Y-kinases as major signal transducers, respectively. Although widespread in eukaryotes [20], the histidine-type TMKs generally contribute only a small proportion of their TMKomes. Likewise, TM S/T/Y-kinases are also widely found in bacteria [28], but are far less numerous than TM histidine kinases. Second, S/T/Y-kinases from protists and higher eukaryotes underwent multiple divergence events, which resulted in distinct subfamilies. TKs were the major TMKs in metazoans and related protists (choanoflagellates and filastereans) [37], while Pelle/RLK TMKs were dominant in plants and related protists (charophytes) [10]. Clade I TMKs were widespread in different lineages of protists. Thus, the base of clade I appears to represent the shared origin of S/T/Y-kinases. Clades II and III TMKs were specifically expanded in amoebae and oomycetes, respectively.

TMK non-kinase domains were also highly variable across kingdoms. About 70% of these domains with Pfam annotations were grouping-specific. The high diversity of kinase and non-kinase sensory domain combinations raises the question of TMK origin and evolution. Kinases and other domains appear to have arisen independently and formed TMKs by domain fusions in certain lineages [21], [46]. Natural selection presumably drives the evolution of TMKs by domain shuffling, since TMKs play essential roles in detecting changes in the extracellular environment [22]. Interestingly, plant and oomycete LRR domains share a similar architecture, but their fused kinase domains belong to the distinct Pelle/RLK and protist clade III subfamilies, respectively. Likewise, animals and several groups of protists contain three types of GPCR-TMKs with a similar GPCR architecture, but their kinase domains belong to different clades, namely TK, TKL, and protist clade I. These findings suggest an independent evolution of TMK domain combinations, with distinct evolutionary origins of their kinase and sensory domains.

4. Materials and methods

4.1. The pipeline for TMKome identification

Protein sequences from 128 model organisms were retrieved from NCBI, Ensembl, and PLAZA (https://bioinformatics.psb.ugent.be/plaza/) databases, as described previously [15]. Representative or reference genomes with the highest assembly levels were selected for analysis, as informed by the NCBI database (Table S4). The genomes selected showed an average of 97% completeness when assessed using BUSCO [35]. For each eukaryotic sequence with alternative splicing, only the longest variant was kept for TMK identification. Protein kinase domains subject to searches with HMMER (version 3.3.2, option -E 1e-5) included S/T/Y-kinases (PF07714.20 and PF00069.28), histidine kinases (PF02518.29, PF07730.16, PF00072.27, PF00512.28, and PF07568.15) [29], and AA-kinases (PF00696.31). The TM topologies of protein kinases were predicted using deepTMHMM (version 1.0.13) [13]. A protein with both TM domain(s) and cytosolic kinase domain(s) was defined as a TMK.
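The TMK definition above (at least one TM domain plus a cytosolic kinase domain) can be expressed as a small filter over two inputs: kinase domain coordinates from the HMM search and a per-residue topology string from the topology predictor. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes labels `M` (membrane), `I` (inside), and `O` (outside), 1-based inclusive domain coordinates, and a hypothetical `min_inside_fraction` threshold for calling a domain cytosolic, which the paper does not specify.

```python
from typing import List, Tuple


def is_tmk(
    topology: str,
    kinase_spans: List[Tuple[int, int]],
    min_inside_fraction: float = 0.8,  # hypothetical cutoff, not from the paper
) -> bool:
    """Return True if the protein has a TM segment and a cytosolic kinase domain.

    topology     -- per-residue labels, e.g. 'OOOMMMIII' (assumed label scheme)
    kinase_spans -- 1-based inclusive (start, end) coordinates of kinase domain hits
    """
    if "M" not in topology:
        return False  # no predicted TM segment at all
    for start, end in kinase_spans:
        region = topology[start - 1:end]
        if region and region.count("I") / len(region) >= min_inside_fraction:
            return True  # this kinase domain sits on the cytoplasmic side
    return False


# Toy receptor-like layout: 20 extracellular, 20 membrane, 60 cytoplasmic residues.
receptor_like = "O" * 20 + "M" * 20 + "I" * 60
print(is_tmk(receptor_like, [(50, 100)]))  # kinase domain in the cytoplasmic tail
print(is_tmk("I" * 100, [(10, 50)]))       # soluble kinase, no TM segment
```

Requiring the kinase domain to be on the cytoplasmic side, rather than merely co-occurring with a TM segment, is what makes the filter topology-based.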

4.2. Phylogeny

Maximum likelihood phylogenetic trees were inferred using S/T/Y-kinase domains. Sequences were aligned with MUSCLE [11]. Poorly aligned regions were trimmed by trimAl [6]. The trees were constructed with IQ-TREE 2 (options -m TEST -alrt 1000 -bb 1000) [26] and visualized using the online server tvBOT [43]. Phylogenetic trees were constructed using FlyFish (https://www.bot163.com), a program for automated batch data processing. The species tree of protists was obtained from TimeTree [23]. All the alignments and tree files are provided in Supplementary dataset 1.
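For readers who want to reproduce the tree-building step, the IQ-TREE options quoted above can be assembled into a command line as follows. This is a sketch only: the executable name `iqtree2`, the `-s` input flag for the trimmed alignment, and the file name are assumptions that do not appear in the text, and the command is printed rather than executed.

```python
import shlex
from typing import List


def iqtree_command(alignment_path: str, binary: str = "iqtree2") -> List[str]:
    """Assemble an IQ-TREE 2 call using the options reported in the Methods."""
    return [
        binary,
        "-s", alignment_path,  # input alignment (flag assumed, not stated in the text)
        "-m", "TEST",          # model selection, as reported
        "-alrt", "1000",       # SH-aLRT replicates, as reported
        "-bb", "1000",         # ultrafast bootstrap replicates, as reported
    ]


# Hypothetical file name for a MUSCLE-aligned, trimAl-trimmed kinase domain alignment.
cmd = iqtree_command("kinase_domains.trimmed.fasta")
print(shlex.join(cmd))
# To run it, pass the list to subprocess.run(cmd, check=True).
```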

4.3. Conserved protein domain and motif analyses

Non-kinase Pfam domains in TMK sequences were searched with HMMER (option -E 1e-2). The Venn diagram was generated using the online server BioLadder (https://www.bioladder.cn/web/#/pro/index). The sequence patterns of catalytic and activation loops in the kinase domains were analyzed with WebLogo [8].
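The motif comparison rests on per-position residue frequencies in aligned catalytic and activation loop segments, which is what the logos in Fig. 5A display. A minimal sketch of that calculation is given below; the aligned toy sequences are made up for illustration, gaps are simply ignored, and the function is not part of WebLogo.

```python
from collections import Counter
from typing import Dict, List


def column_frequencies(aligned: List[str]) -> List[Dict[str, float]]:
    """Relative residue frequency at each alignment column, ignoring gaps ('-')."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    frequencies = []
    for column in zip(*aligned):
        counts = Counter(residue for residue in column if residue != "-")
        total = sum(counts.values())
        frequencies.append(
            {residue: n / total for residue, n in counts.items()} if total else {}
        )
    return frequencies


# Toy aligned loop segments (illustrative only, not sequences from the study).
toy_alignment = ["HRDLK", "HRDIK", "HRDLA"]
freqs = column_frequencies(toy_alignment)
for position, column in enumerate(freqs, start=1):
    print(position, column)
```

Positions where one residue approaches a frequency of 1.0 correspond to the conserved sites marked in the logos, while mixed columns indicate clade-specific variation.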

CRediT authorship contributions statement

D.D. conceived the research. Z.Y. and J.L. designed research. J.L. developed the pipeline for TMKome identification. Z.Y., H.P., and D.S. performed the bioinformatics analyses. Z.Y. and H.P. wrote the manuscript. Y.Z. prepared figures. All authors read and approved the final manuscript.

Declaration of interests

The authors declare no competing interests.

Acknowledgments

We would like to thank Prof. Brett Tyler for his valuable help in proofreading the manuscript. The work was supported by the Natural Science Foundation of Jiangsu Province (BK20221000), the National Natural Science Foundation of China (32202251 and 32230089), and the Jiangsu Funding Program for Excellent Postdoctoral Talent (2022ZB343). Mention of trade names or commercial products in this publication is solely for providing specific information and does not imply recommendation or endorsement by the USDA. USDA is an equal opportunity provider and employer.

Appendix A

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.csbj.2023.08.007.

Appendix A. Supplementary material

mmc1.zip (735.6KB, zip)
mmc2.pdf (665.6KB, pdf)
mmc3.xlsx (11.7KB, xlsx)
mmc4.xlsx (17.8KB, xlsx)
mmc5.xlsx (46.8KB, xlsx)
mmc6.xlsx (22KB, xlsx)
mmc7.xlsx (12KB, xlsx)
mmc8.xlsx (2MB, xlsx)

Data availability

The datasets presented in this study are available in this paper’s supplemental information. Any additional information required to reanalyze the data reported in this paper is available upon request.

References

  • 1. Airola M.V., Watts K.J., Bilwes A.M., Crane B.R. Structure of concatenated HAMP domains provides a mechanism for signal transduction. Structure. 2010;18:436–448. doi: 10.1016/j.str.2010.01.013.
  • 2. Amit I., Wides R., Yarden Y. Evolvable signaling networks of receptor tyrosine kinases: relevance of robustness to malignancy and to cancer therapy. Mol Syst Biol. 2007;3 doi: 10.1038/msb4100195.
  • 3. Aury J.M., Jaillon O., Duret L., Noel B., Jubin C., Porcel B.M., et al. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature. 2006;444:171–178. doi: 10.1038/nature05230.
  • 4. Baldauf S.L. An overview of the phylogeny and diversity of eukaryotes. J Syst Evol. 2008;46:263–273.
  • 5. Beck D.L., Boettner D.R., Dragulev B., Ready K., Nozaki T., Petri W.A. Identification and gene expression analysis of a large family of transmembrane kinases related to the Gal/GalNAc lectin in Entamoeba histolytica. Eukaryot Cell. 2005;4:722–732. doi: 10.1128/EC.4.4.722-732.2005.
  • 6. Capella-Gutiérrez S., Silla-Martínez J.M., Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. doi: 10.1093/bioinformatics/btp348.
  • 7. Cheung J., Hendrickson W.A. Sensor domains of two-component regulatory systems. Curr Opin Microbiol. 2010;13:116–123. doi: 10.1016/j.mib.2010.01.016.
  • 8. Crooks G.E., Hon G., Chandonia J.-M., Brenner S.E. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004.
  • 9. Diévart A., Gilbert N., Droc G., Attard A., Gourgues M., Guiderdoni E., Périn C. Leucine-rich repeat receptor kinases are sporadically distributed in eukaryotic genomes. BMC Evol Biol. 2011;11:367. doi: 10.1186/1471-2148-11-367.
  • 10. Dievart A., Gottin C., Périn C., Ranwez V., Chantret N. Origin and diversity of plant receptor-like kinases. Annu Rev Plant Biol. 2020;71:131–156. doi: 10.1146/annurev-arplant-073019-025927.
  • 11. Edgar R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
  • 12. Eisen J.A., Coyne R.S., Wu M., Wu D., Thiagarajan M., Wortman J.R., Badger J.H., Ren Q., Amedeo P., Jones K.M., et al. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol. 2006;4 doi: 10.1371/journal.pbio.0040286.
  • 13. Hallgren J., Tsirigos K.D., Pedersen M.D., Almagro Armenteros J.J., Marcatili P., Nielsen H., Krogh A., Winther O. DeepTMHMM predicts alpha and beta transmembrane proteins using deep neural networks. BioRxiv. 2022 doi: 10.1101/2022.04.08.487609.
  • 14. Hanks S.K., Quinn A.M., Hunter T. The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science. 1988;241:42–52. doi: 10.1126/science.3291115.
  • 15. Hedges S.B. The origin and evolution of model organisms. Nat Rev Genet. 2002;3:838–849. doi: 10.1038/nrg929.
  • 16. Hérivaux A., So Y.S., Gastebois A., Latgé J.-P., Bouchara J.-P., Bahn Y.-S., et al. Major sensing proteins in pathogenic fungi: the hybrid histidine kinase family. PLoS Pathog. 2016;12 doi: 10.1371/journal.ppat.1005683.
  • 17. Hunter T. Protein kinase classification. Methods Enzym. 1991;200:3–37. doi: 10.1016/0076-6879(91)00125-g.
  • 18. Jacob-Dubuisson F., Mechaly A., Betton J.-M., Antoine R. Structural insights into the signalling mechanisms of two-component systems. Nat Rev Microbiol. 2018;16:585–593. doi: 10.1038/s41579-018-0055-7.
  • 19. Judelson H.S., Ah-Fong A.M.V. The kinome of Phytophthora infestans reveals oomycete-specific innovations and links to other taxonomic groups. BMC Genom. 2010;11 doi: 10.1186/1471-2164-11-700.
  • 20. Kabbara S., Hérivaux A., Dugé de Bernonville T., Courdavault V., Clastre M., Gastebois A., Osman M., Hamze M., Cock J.M., Schaap P., et al. Diversity and evolution of sensor histidine kinases in eukaryotes. Genome Biol Evol. 2018;11:86–108. doi: 10.1093/gbe/evy213.
  • 21. King N., Carroll S.B. A receptor tyrosine kinase from choanoflagellates: molecular insights into early animal evolution. Proc Natl Acad Sci USA. 2001;98:15032–15037. doi: 10.1073/pnas.261477698.
  • 22. King N., Hittinger C.T., Carroll S.B. Evolution of key cell signaling and adhesion protein families predates animal origins. Science. 2003;301:361–363. doi: 10.1126/science.1083853.
  • 23. Kumar S., Stecher G., Suleski M., Hedges S.B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol. 2017;34:1812–1819. doi: 10.1093/molbev/msx116.
  • 24. Lehti-Shiu M.D., Zou C., Hanada K., Shiu S.H. Evolutionary history and stress regulation of plant receptor-like kinase/Pelle genes. Plant Physiol. 2009;150:12–26. doi: 10.1104/pp.108.134353.
  • 25. Manning G., Young S.L., Miller W.T., Zhai Y. The protist, Monosiga brevicollis, has a tyrosine kinase signaling network more elaborate and diverse than found in any known metazoan. Proc Natl Acad Sci USA. 2008;105:9674–9679. doi: 10.1073/pnas.0801314105.
  • 26. Minh B.Q., Schmidt H.A., Chernomor O., Schrempf D., Woodhams M.D., von Haeseler A., et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–1534. doi: 10.1093/molbev/msaa015.
  • 27. Mohanty S., Oruganty K., Kwon A., Byrne D.P., Ferries S., Ruan Z., Hanold L.E., Katiyar S., Kennedy E.J., Eyers P.A., et al. Hydrophobic core variations provide a structural framework for tyrosine kinase evolution and functional specialization. PLoS Genet. 2016;12 doi: 10.1371/journal.pgen.1005885.
  • 28. Nagarajan S.N., Lenoir C., Grangeasse C. Recent advances in bacterial signaling by serine/threonine protein kinases. Trends Microbiol. 2022;30:553–566. doi: 10.1016/j.tim.2021.11.005.
  • 29. Papon N., Stock A.M. What do archaeal and eukaryotic histidine kinases sense? F1000Res. 2019;8 doi: 10.12688/f1000research.20094.1.
  • 30. Pawson T., Scott J.D. Protein phosphorylation in signaling-50 years and counting. Trends Biochem Sci. 2005;30:286–290. doi: 10.1016/j.tibs.2005.04.013.
  • 31. Robinson D.R., Wu Y.M., Lin S.F. The protein tyrosine kinase family of the human genome. Oncogene. 2000;19:5548–5557. doi: 10.1038/sj.onc.1203957.
  • 32. Shiu S.H., Bleecker A.B. Expansion of the receptor-like kinase/Pelle gene family and receptor-like proteins in Arabidopsis. Plant Physiol. 2003;132:530–543. doi: 10.1104/pp.103.021964.
  • 33. Shiu S.H., Bleecker A.B. Receptor-like kinases from Arabidopsis form a monophyletic gene family related to animal receptor kinases. Proc Natl Acad Sci USA. 2001;98:10763–10768. doi: 10.1073/pnas.181141598.
  • 34. Si J., Pei Y., Shen D., Ji P., Xu R., Xue X., Peng H., Liang X., Dou D. Phytophthora sojae leucine-rich repeat receptor-like kinases: diverse and essential roles in development and pathogenicity. iScience. 2021;24 doi: 10.1016/j.isci.2021.102725.
  • 35. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. doi: 10.1093/bioinformatics/btv351.
  • 36. Stancik I.A., Šestak M.S., Ji B., Axelson-Fisk M., Franjevic D., Jers C., Domazet-Lošo T., Mijakovic I. Serine/threonine protein kinases from bacteria, archaea and eukarya share a common evolutionary origin deeply rooted in the tree of life. J Mol Biol. 2018;430:27–32. doi: 10.1016/j.jmb.2017.11.004.
  • 37. Suga H., Dacre M., de Mendoza A., Shalchian-Tabrizi K., Manning G., Ruiz-Trillo I. Genomic survey of premetazoans shows deep conservation of cytoplasmic tyrosine kinases and multiple radiations of receptor tyrosine kinases. Sci Signal. 2012;5:ra35. doi: 10.1126/scisignal.2002733.
  • 38. Taylor B.L., Zhulin I.B. PAS domains: Internal sensors of oxygen, redox potential, and light. Microbiol Mol Biol Rev. 1999;63:479–506. doi: 10.1128/mmbr.63.2.479-506.1999.
  • 39. Ullrich A., Coussens L., Hayflick J.S., Dull T.J., Gray A., Tam A.W., Lee J., Yarden Y., Libermann T.A., Schlessinger J., et al. Human epidermal growth factor receptor cDNA sequence and aberrant expression of the amplified gene in A431 epidermoid carcinoma cells. Nature. 1984;309:418–425. doi: 10.1038/309418a0.
  • 40. Urano F., Bertolotti A., Ron D. IRE1 and efferent signaling from the endoplasmic reticulum. J Cell Sci. 2000;113:3697–3702. doi: 10.1242/jcs.113.21.3697.
  • 41. van den Hoogen J., Govers F. GPCR-bigrams: enigmatic signaling components in oomycetes. PLoS Pathog. 2018;14 doi: 10.1371/journal.ppat.1007064.
  • 42. von Heijne G. Membrane-protein topology. Nat Rev Mol Cell Biol. 2006;7:909–918. doi: 10.1038/nrm2063.
  • 43. Xie J., Chen Y., Cai G., Cai R., Hu Z., Wang H. Tree Visualization By One Table (tvBOT): a web application for visualizing, modifying and annotating phylogenetic trees. Nucleic Acids Res. 2023:gkad359. doi: 10.1093/nar/gkad359.
  • 44. Qian W., Han Z.J., He C. Two-component signal transduction systems of Xanthomonas spp.: a lesson from genomics. Mol Plant Microbe Inter. 2008;21(2):151–161. doi: 10.1094/MPMI-21-2-0151.
  • 45. Wuichet K., Cantwell B.J., Zhulin I.B. Evolution and phyletic distribution of two-component signal transduction systems. Curr Opin Microbiol. 2010;13:219–225. doi: 10.1016/j.mib.2009.12.011.
  • 46. Yeung W., Kwon A., Taujale R., Bunn C., Venkat A., Kannan N. Evolution of functional diversity in the holozoan tyrosine kinome. Mol Biol Evol. 2021;38:5625–5639. doi: 10.1093/molbev/msab272.

Articles from Computational and Structural Biotechnology Journal are provided here courtesy of Research Network of Computational and Structural Biotechnology
