Abstract
Cell surface receptors play essential roles in perceiving and processing external and internal signals at the cell surface of plants and animals. The receptor-like protein kinases (RLK) and receptor-like proteins (RLPs), two major classes of proteins with membrane receptor configuration, play a crucial role in plant development and disease defense. Although RLPs and RLKs share a similar single-pass transmembrane configuration, RLPs harbor short divergent C-terminal regions instead of the conserved kinase domain of RLKs. This RLP receptor structural design precludes sequence comparison algorithms from being used for high-throughput predictions of the RLP family in plant genomes, as has been extensively performed for RLK superfamily predictions. Here, we developed the RLPredictiOme, implemented with machine learning models in combination with Bayesian inference, capable of predicting RLP subfamilies in plant genomes. The ML models were simultaneously trained using six types of features, along with three stages to distinguish RLPs from non-RLPs (NRLPs), RLPs from RLKs, and classify new subfamilies of RLPs in plants. The ML models achieved high accuracy, precision, sensitivity, and specificity for predicting RLPs with relatively high probability ranging from 0.79 to 0.99. The prediction of the method was assessed with three datasets, two of which contained leucine-rich repeats (LRR)-RLPs from Arabidopsis and rice, and the last one consisted of the complete set of previously described Arabidopsis RLPs. In these validation tests, more than 90% of known RLPs were correctly predicted via RLPredictiOme. In addition to predicting previously characterized RLPs, RLPredictiOme uncovered new RLP subfamilies in the Arabidopsis genome. These include probable lipid transfer (PLT)-RLP, plastocyanin-like-RLP, ring finger-RLP, glycosyl-hydrolase-RLP, and glycerophosphoryldiester phosphodiesterase (GDPD, GDPDL)-RLP subfamilies, yet to be characterized. Compared to the only Arabidopsis GDPDL-RLK, molecular evolution studies confirmed that the ectodomain of GDPDL-RLPs might have undergone a purifying selection with a predominance of synonymous substitutions. Expression analyses revealed that predicted GDPGL-RLPs display a basal expression level and respond to developmental and biotic signals. The results of these biological assays indicate that these subfamily members have maintained functional domains during evolution and may play relevant roles in development and plant defense. Therefore, RLPredictiOme provides a framework for genome-wide surveys of the RLP superfamily as a foundation to rationalize functional studies of surface receptors and their relationships with different biological processes.
Keywords: RLPredictiOme, probable lipid transfer (PLT)-RLP, plastocyanin-like-RLP, ring finger-RLP, glycosyl-hydrolase-RLP, glycerophosphoryldiester phosphodiesterase (GDPD GDPDL)-RLP, receptor-like protein kinases, receptor-like proteins
1. Introduction
The capacity to transiently regulate cellular processes in response to external environmental signals is crucial to all living organisms. While the downstream regulatory events in a signaling cascade can involve biochemical modifications, including protein phosphorylation, ligand binding, and allosteric regulation, as well as changes in the transcription/translation profiles, the initial sensing event is predominantly mediated by membrane receptors. In plants, two major classes of proteins with membrane receptor structural configuration co-exist, namely receptor-like kinases (RLK) and receptor-like proteins (RLP) [1,2]. The receptor-like kinases comprise a large family with more than 420 family members in Arabidopsis [3]. These transmembrane receptors harbor a divergent extracellular domain (ectodomain, ECD) at the N-terminal region, followed by a transmembrane segment (TM) and a C-terminal cytoplasmic signaling domain. This configuration of a single-pass transmembrane kinase receptor invokes a mechanism of ligand binding-induced homo or hetero oligomerization of RLKs as the essential early event for signaling and transducing from the receptor, similarly to the receptor-tyrosine kinases (RTK) of mammalian cells [4,5]. In this scenario, ECD is the stimulus-sensing, ligand recognition domain that induces multimerization, and the kinase domain functions as the phosphorylation-dependent transducing module that relays the signal intracellularly.
Phylogenetic analyses based on the RLK kinase domains organized their ectodomain into clusters of conserved motifs and classified the RLKs into 15 subfamilies. Among them, the leucine-rich repeat (LRR)-RLK subfamily is further subdivided into 13 subfamilies (LRRI-XIII) according to the LRR motif organization ranging from 3 to 26 LRRs [6,7]. The RLK family size has been determined in other plant species, which revealed even larger RLK gene families in the genome of soybean, rice, and tomato [3,8,9,10]. The complexity of the RLK superfamily may reflect the intricate coordination of plant responses to external signals during plant development and interactions with the biotic and abiotic environment. Accordingly, several RLKs have been functionally characterized in development, environmental stresses, and plant defenses (for more details, see references [11,12,13,14,15,16,17,18,19,20,21,22]).
RLKs are also involved in plant immunity and function as pattern recognition receptors (PRRs), which perceive pathogen-associated molecular patterns (PAMPs) or damage-associated molecular patterns (DAMPs) presented, respectively, by pathogens and plants during infection. Interaction of PRRs with PAMPs/DAMPs initiates PAMP-triggered immunity (PTI), the first layer of the innate immune system in plants [23]. Many examples of leucine-rich repeat receptor-like kinases (LRR-RLKs) have been functionally characterized as PRRs (for more details, see references [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]).
The second class of plant transmembrane proteins, RLPs, are built into an N-terminal extracellular domain, which shares similar motifs with RLK ectodomains, an internal single transmembrane segment followed by a short cytoplasmic domain that lacks a transducing-kinase domain [23]. RLPs are structurally similar to Toll-like receptors (TLRs) involved in mammalian immunity, which also contain a leucine-rich repeat ectodomain and a short cytoplasmic tail [5]. The RLP configuration poses a higher degree of complexity for signaling as they depend on heterodimerization with RLKs or association with receptor-like cytoplasmic kinases (RLCK) for transducing a stimulus from the receptor. Accordingly, the leucine-rich repeat receptor-like protein (LRR-RLP) TOO MANY MOUTHS (TMM) forms complexes with LRR-RLKs ERECTA and ERECTA-LIKE 1 (ERL1) to perceive the EPIDERMAL PATTERNING FACTOR 1 (EPF1) and EPF2 peptides for the regulation of stomatal patterning [43], and CLAVATA2 RLP is required for the stability of CLAVATA1 (CLV1) RLK [44]. Likewise, lysine motif (LysM)-RLPs, LYSIN-MOTIF 1 (LYM1), and LYM3 associate with the LysM-RLK CERK1 (CHITIN ELICITOR RECEPTOR KINASE 1) to recognize bacterial peptidoglycans [45], and the LRR-RLP RLP23 forms a complex with the LRR-RLK SUPPRESSOR OF BIR1-1 (SOBIR1) that recognizes NECROSIS- AND ETHYLENE-INDUCING PEPTIDE 1 (NEP1)-LIKE PROTEINS (NLPs) to trigger PTI signaling [46]. In addition to these Arabidopsis RLPs, the first characterized RLP, Cf-9, was identified in tomato plants as an LRR-RLP and has been shown to trigger effector-triggered immunity (ETI)-like signaling, elicited specifically by the Cladosporium fulvum Av9 effector [47]. The tomato LRR-RLP Cf-4 is also required for resistance to C. fulvum expressing the Avr4 gene [48]. Cf-9 and Cf-4 associate with the RLKs SOBIR1 AND BRI1-ASSOCIATED KINASE 1 to initiate receptor endocytosis and plant immunity [49]. Likewise, N. benthamiana LRR-RLP RESPONSE TO XEG1 (RXEG1), which recognizes the glycoside hydrolase 12 protein XEG1, and RLP RE02 (Response to VmE02) forms a complex with BAK1 and SOBIR1 to transduce the XEG1- and VmE02- induced defense signals, respectively [50,51]. The rice RLP, OsRLP1, also interacts with OsSOBIR1 to induce immune responses against viral infection [52].
Although some progress has been reached in characterizing RLPs, a biological function has been assigned to only a few plant RLPs, despite their conceptual relevance in cell signaling events. While 15 RLK subfamilies with distinct ECD have been detected in Arabidopsis, only three Arabidopsis RLP subfamilies have been identified based on single-gene identification and functional studies [2]. The only genome-wide study of RLPs was restricted to the LRR-RLP subfamily [53]. In the case of RLKs, the successful identification and organization of the superfamily in different subfamilies relied on methods that use algorithms, such as BLAST and hidden Markov models (HMM), to perform searches for sequence alignments of conserved regions. One possible explanation for the poor characterization of RLPs may be the difficulty of assigning members to this family based on sequence comparison, as they lack the conserved C-terminal serine/threonine kinase domain, restricting the prediction of novel RLPs. In addition to requiring RLPs to be associated with a kinase domain-containing receptor for signaling, the lack of a cytoplasmic transducing kinase domain prevents genome-wide predictions of RLP subfamilies based on sequence comparisons. Therefore, a complete inventory of the RLP family in the genome of different plant species is lacking, and, hence, functional studies have been limited.
The limitation of software based on multiple sequence alignments for identifying RLPs may be overcome with the application of artificial intelligence algorithms developed based on filters that support the point features of these receptors. In artificial intelligence, machine learning (ML) has emerged as a potential tool in molecular biology to analyze massive datasets and extract knowledge from complex biosystems [54]. ML has been extensively used in all sorts of thematic issues, from medicine to robotics [55,56,57]. In plant science, ML has been applied for viral gene identification [58], the diagnosis of bacterial infection [59], salt stress tolerance [60], and the taxonomy of grapevine [61], in addition to global analysis of gene expression, in response to hormones and environmental stresses [62], plant immunity, and miRNA network prediction [54]. Trained models have also been successfully used for functional protein classification in plant genomes [63].
To provide a framework for identifying and predicting RLP function, we developed the RLPredictiOme as a machine learning method associated with Bayesian inference approaches. In addition to six different features to train ML models, the method used multiple datasets based on RLK ectodomains and the hypothesis that RLP lacks the kinase domain but retains the same RLK receptor configuration. It is reasonable to suppose that the RLP family may contain all RLK-identified ectodomains as they may have emerged during evolution from kinase domain-losing RLKs. So far, five RLK different ectodomains-containing RLP groups have been identified [53]. Our ML models could distinguish RLPs from non-RLPs (NRLPs), RLPs from RLKs and classify subfamilies with relatively high accuracy, precision, sensitivity, and specificity. To prove the capacity to predict RLP families, we validated the method with biological experiments describing a new RLP family, designated GDPDL-RLP. The RLPredictiOme may facilitate the prediction and provide new insights into the role of RLPs in plants.
2. Results
2.1. Revisiting the Ectodomain of the RLK Superfamily in Plants
We performed a survey in the genome of 80 plant species to identify the functional ectodomains of RLKs based on in silico models as a first step for defining the datasets. A total of 40,418 sequences were retrieved. We identified 100 classes of RLK ectodomains associated with C-terminal kinase domains (Table 1). However, most of these ectodomains generated subfamilies with less than 10 members. Sequence identities higher than 0.85 were removed through CD-hit software. Additionally, only sequences with a single membrane segment were selected. A total of 14,787 amino acid sequences were recovered, and their ectodomains were used as positive datasets for filtering RLPs versus NRLPs and RLPs versus RLKs.
Table 1.
Number of RLKs harboring the indicated ectodomain type.
| Description | Total | Description | Total | Description | Total |
|---|---|---|---|---|---|
| LRR-RLK | 14,087 | CHASE-RLK | 8 | CUB-RLK | 2 |
| Unknown-RLK | 10,020 | Cysteine-rich-secretory-RLK | 7 | DUF1084-RLK | 2 |
| S-domain-RLK | 3859 | GDPDL-RLK | 7 | DUF726-RLK | 2 |
| Malectin-RLK | 3299 | Universal-stress-RLK | 6 | Endomembrane-RLK | 2 |
| Salt-stress-response/antifungal-RLK | 2345 | ACT-RLK | 5 | GAF-domain | 2 |
| L-Lectin-RLK | 2213 | Probable-lipid-transfer-RLK | 5 | GTPase-RLK | 2 |
| WAK-RLK | 1844 | Ankyrin-Kinase | 4 | Glycosyl hydrolases-RLK | 2 |
| B-lectin-RLK | 549 | Chromo-RLK | 4 | Glycosyltransferase-RLK | 2 |
| LysM-RLK | 381 | PAN-like-Kinase | 4 | HAD-RLK | 2 |
| WAK-EGF-RLK | 285 | PB1-RLK | 4 | HAD-hyrolase-like-RLK | 2 |
| EGF-like-RLK | 212 | Sel1-RLK | 4 | MSP-RLK | 2 |
| WAK-EFG-RLK | 177 | Alpha/beta-hydrolase-RLK | 3 | NB-ARC-RLK | 2 |
| RCC1-RLK | 148 | Cytochrome P450-RLK | 3 | PQQ-enzyme-RLK | 2 |
| B-Lectin-RLK | 145 | Helix-loop-helix-DNA-binding-RLK | 3 | Peptidase-RLK | 2 |
| PAN-RLK | 131 | Histidine-phosphatase-RLK | 3 | PfkB-RLK | 2 |
| C-Lectin-RLK | 90 | Major-Facilitator-RLK | 3 | Wnt-and-FGF-inhibitory-regulator-RLK | 2 |
| Glycosyl-hydrolases-RLK | 90 | MatE-RLK | 3 | Adenylate-cyclase-associated-(CAP)-N-terminal-RLK | 1 |
| Thaumatin-RLK | 86 | PPR | 3 | Alcohol-dehydrogenase-GroES-like-RLK | 1 |
| NAF-RLK | 79 | PPR-RLK | 3 | Aldose-1-epimerase-RLK | 1 |
| Ethylene-responsive-RLK | 74 | Phospholipase-RLK | 3 | Ankyrin-RLK | 1 |
| EF-hand-RLK | 50 | Proline-rich-RLK | 3 | Castor-and-Pollux-RLK | 1 |
| Cache-RLK | 32 | Sugar-(and other)-transporter-RLK | 3 | Cyclic nucleotide-binding-RLK | 1 |
| Chitinase-RLK | 15 | Transmembrane-RLK | 3 | Cyclic-nucleotide-binding-RLK | 1 |
| PAS-RLK | 12 | Alpha-amylase-catalytic-RLK | 2 | Cytochrome-P450-RLK | 1 |
| Plastocyanin-like-RLK | 12 | Barwin-RLK | 2 | DEAD/DEAH-box-helicase-RLK | 1 |
| Ring-finger-RLK | 9 | C2-RLK | 2 | DUF1221-RLK | 1 |
| Adenovirus E3-RLK | 8 |
Three datasets were created to represent a higher number of negative examples. The first dataset contained 14,973 positive examples and 15,993 negative ones. The second and third ones contained the same examples, 14,973 positives, and 15,973 negative examples. To distinguish RLPs from NRLPs, we used six types of features (see Methods sections) from the three datasets, thus implying a total of 18 training sets. On the other hand, to distinguish RLPs from RLKs, only one dataset with 14,973 positives (ectodomain of the RLKs) and negative (full-length sequence of the RLKs) examples were used, implied in six training sets based on the assumed number of features.
The RLP subfamily members were assigned according to the ectodomains of RLKs. For each training set, 15 classes were considered, and a 16th class, designated Other RLPs, was defined by grouping the smaller subfamilies (Table 2). In some plant species, uncharacterized RLK subfamilies have at least one to ten members and were grouped in the class Other-RLPs. LRR-RLKs, unknown-RLK, S-domain-RLK, and WAK-RLKs are over-represented RLK subfamilies in plants. In contrast, thaumatin, GDPD, and malectin are small subfamilies not represented in all plant species [9]. For each super-represented subfamily, 500 sequences were randomly selected to compose ten additional datasets; thereby, considering the previously mentioned six types of features, 60 training sets were obtained for training.
Table 2.
Subfamily size of receptor-like kinase proteins.
| No | Label | Count |
|---|---|---|
| 1 | L-Lectin-RLK | 980 |
| 2 | LRR-RLK | 5404 |
| 3 | S-domain-RLK | 1626 |
| 4 | Malectin-RLK | 1313 |
| 5 | Salt-stress-response/antifungal-RLK | 1004 |
| 6 | WAK-RLK | 1362 |
| 7 | B-Lectin-RLK | 362 |
| 8 | Unknown-RLK | 3285 |
| 10 | PAN-RLK | 41 |
| 11 | Ethylene-responsive-RLK | 29 |
| 12 | Thaumatin-RLK | 52 |
| 13 | RCC1-RLK | 65 |
| 14 | Glycosyl-hydrolases-RLK | 40 |
| 15 | C-Lectin-RLK | 21 |
| 16 | Other-RLKs | 192 |
2.2. Feature Analysis
We implemented the RLPredictiOme method using six distinct types of attributes (Figure 1). These included (i) the frequency of the chemical properties of amino acid side chains (CPAASC), which have 9 features, and (ii) CPAASC2 extracted from N-terminal and C-terminal regions with 18 features; (iii) the amino acid composition with 20 features and (iv) amino acid composition extracted from N-terminal and C-terminal regions with 40 features (Figure 1B). Furthermore, we used (v) dipeptide and (vi) tripeptide compositions resulting in 400 and 8000 features, respectively. The simultaneous use of six types of features and multiple datasets provided RLPredictiOme with information to apply Bayesian inference (see Section 4) as a powerful ensemble method to make robust predictions.
Figure 1.
Schematic representation of the RLPredictiOme method. Amino acid sequences are submitted to the method with the sequential filters A to F. (A) The signal peptide and segment transmembrane prediction. (B) Attribute vector provided to the ML models. (C) The first step of the classification to distinguish RLP from NRLP (RLP/NRLP). The result (binary vector) of the classification is submitted to perform Bayesian inference using probability distribution Binomial conjugated with Beta distribution. (D) The second classification step to distinguish RLP from RLK (RLP/RLK). The result (binary vector) is submitted to perform Bayesian inference using probability distribution Binomial conjugated with Beta distribution. (E) The ML models for subfamily classification is the third step to classify RLP families. The result (numerical vector) of the classification is submitted to perform Bayesian inference through the Multinomial and Dirichlet probability distributions. (F) The Bayesian inference for making decisions and final prediction using binary vector resulting from the preview inferences.
For the classification models for RLPs/NRLPs (first step, Figure 1C), the tripeptide composition was the feature with the best performance among all tested features of the models built with the RLPs/NRLPs datasets using the logistic regression algorithm (Table 3). The three models built with tripeptide composition achieved accuracy (ACC) of 0.953, 0.955, and 0.953, respectively, and Matthew’s correlation coefficient (MCC) of 0.906, 0.910, and 0.96, respectively. Furthermore, the false discovery rate (FDR) was lower than 0.05.
Table 3.
Summarized results of the evaluation models built with the RLPs/NRLPs datasets.
| Data Set | Algorithm | ACC | F1 | FDR | MCC | Precision | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|---|
| AAComposition_1 | Logistic RegressionCV | 0.9173 | 0.9211 | 0.0878 | 0.8343 | 0.9303 | 0.9303 | 0.9032 |
| AAComposition_2 | Logistic RegressionCV | 0.9205 | 0.9241 | 0.0839 | 0.8407 | 0.9322 | 0.9322 | 0.9078 |
| AAComposition_3 | Logistic RegressionCV | 0.9209 | 0.9245 | 0.0831 | 0.8416 | 0.9321 | 0.9321 | 0.9088 |
| AAComposition_N_C terminal_1 | MLP Classifier | 0.9457 | 0.9478 | 0.0534 | 0.8912 | 0.9490 | 0.9490 | 0.9421 |
| AAComposition_N_C terminal_2 | MLP Classifier | 0.9468 | 0.9487 | 0.0513 | 0.8934 | 0.9487 | 0.9487 | 0.9446 |
| AAComposition_N_C terminal_3 | MLP Classifier | 0.9482 | 0.9499 | 0.0457 | 0.8964 | 0.9456 | 0.9456 | 0.9511 |
| CPAASC_1 | Linear Discriminant Analysis | 0.9020 | 0.9102 | 0.1315 | 0.8074 | 0.9561 | 0.9561 | 0.8436 |
| CPAASC_2 | Linear Discriminant Analysis | 0.9042 | 0.9120 | 0.1282 | 0.8116 | 0.9562 | 0.9562 | 0.8481 |
| CPAASC_3 | Linear Discriminant Analysis | 0.9040 | 0.9119 | 0.1288 | 0.8113 | 0.9566 | 0.9566 | 0.8473 |
| CPAASC_N_C terminal_1 | Linear Discriminant Analysis | 0.9104 | 0.9172 | 0.1183 | 0.8232 | 0.9558 | 0.9558 | 0.8614 |
| CPAASC_N_C terminal_2 | Linear Discriminant Analysis | 0.9132 | 0.9196 | 0.1148 | 0.8284 | 0.9569 | 0.9569 | 0.8660 |
| CPAASC_N_C terminal_3 | Linear Discriminant Analysis | 0.9140 | 0.9204 | 0.1137 | 0.8301 | 0.9572 | 0.9572 | 0.8674 |
| Dipeptide_1 | MLP Classifier | 0.9439 | 0.9457 | 0.0497 | 0.8878 | 0.9412 | 0.9412 | 0.9468 |
| Dipeptide_2 | MLP Classifier | 0.9481 | 0.9500 | 0.0501 | 0.8960 | 0.9500 | 0.9500 | 0.9459 |
| Dipeptide_3 | MLP Classifier | 0.9447 | 0.9466 | 0.0497 | 0.8894 | 0.9428 | 0.9428 | 0.9468 |
| Tripeptide_1 | Logistic RegressionCV | 0.9535 | 0.9551 | 0.0410 | 0.9069 | 0.9511 | 0.9511 | 0.9561 |
| Tripeptide_2 | Logistic RegressionCV | 0.9550 | 0.9565 | 0.0389 | 0.9100 | 0.9519 | 0.9519 | 0.9584 |
| Tripeptide_3 | Logistic RegressionCV | 0.9534 | 0.9549 | 0.0404 | 0.9067 | 0.9502 | 0.9502 | 0.9568 |
| Mean | 0.9303 | 0.9342 | 0.0784 | 0.8615 | 0.9480 | 0.9480 | 0.9112 |
For the classification models for RLPs/RLKs (second step, Figure 1D), the amino acid composition of the N-terminus and C-terminus and tripeptide composition were the features archiving both the best performance, resulting in ACC of 0.97, MCC of 0.95 and FDR lower than 0.05 (Table 4). In the RLP subfamily models built with RLP subfamily datasets (third step, Figure 1E), the tripeptide composition outperformed the others, with ACC and MCC of 0.984 and 0.866, respectively (Table 5).
Table 4.
Summarized results of the evaluation models built with the RLPs/RLKs datasets.
| Data Set | Algorithm | ACC | F1 | FDR | MCC | Precision | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|---|
| AAComposition_N_C terminal | Quadratic Discriminant Analysis | 0.9775 | 0.9773 | 0.0337 | 0.9552 | 0.9884 | 0.9884 | 0.9670 |
| Tripeptide | Gradient Boosting Classifier | 0.9762 | 0.9760 | 0.0367 | 0.9527 | 0.9890 | 0.9890 | 0.9639 |
| CPAASC_N_C_terminal | Linear Discriminant Analysis | 0.9707 | 0.9706 | 0.0479 | 0.9421 | 0.9899 | 0.9899 | 0.9523 |
| CPAASC | Linear Discriminant Analysis | 0.9647 | 0.9647 | 0.0572 | 0.9304 | 0.9877 | 0.9877 | 0.9426 |
| Dipeptide | MLP Classifier | 0.9627 | 0.9617 | 0.0344 | 0.9254 | 0.9579 | 0.9579 | 0.9673 |
| AAComposition | Quadratic Discriminant Analysis | 0.9571 | 0.9571 | 0.0627 | 0.9151 | 0.9777 | 0.9777 | 0.9374 |
| Mean | 0.9681 | 0.9679 | 0.0454 | 0.9368 | 0.9818 | 0.9818 | 0.9551 |
Table 5.
Summarized results of the evaluation models built with the RLP subfamily datasets.
| Data Set | Algorithm | ACC | F1 | MCC | Precision | Sensitivity |
|---|---|---|---|---|---|---|
| AAComposition_10 | Linear Discriminant Analysis | 0.984 | 0.872 | 0.864 | 0.872 | 0.872 |
| AAComposition_1 | Calibrated ClassifierCV | 0.984 | 0.869 | 0.861 | 0.869 | 0.869 |
| AAComposition_2 | Calibrated ClassifierCV | 0.984 | 0.874 | 0.866 | 0.874 | 0.874 |
| AAComposition_3 | Linear Discriminant Analysis | 0.984 | 0.873 | 0.864 | 0.873 | 0.873 |
| AAComposition_4 | Linear Discriminant Analysis | 0.984 | 0.870 | 0.862 | 0.870 | 0.870 |
| AAComposition_5 | Linear Discriminant Analysis | 0.983 | 0.867 | 0.858 | 0.867 | 0.867 |
| AAComposition_6 | Linear Discriminant Analysis | 0.984 | 0.871 | 0.863 | 0.871 | 0.871 |
| AAComposition_7 | Calibrated ClassifierCV | 0.984 | 0.869 | 0.861 | 0.869 | 0.869 |
| AAComposition_8 | Calibrated ClassifierCV | 0.985 | 0.876 | 0.868 | 0.876 | 0.876 |
| AAComposition_9 | Linear Discriminant Analysis | 0.984 | 0.875 | 0.867 | 0.875 | 0.875 |
| Mean | 0.984 | 0.872 | 0.863 | 0.872 | 0.872 | |
| AAComposition_N_C_terminal_10 | Calibrated ClassifierCV | 0.989 | 0.911 | 0.905 | 0.911 | 0.911 |
| AAComposition_N_C_terminal_1 | Calibrated ClassifierCV | 0.988 | 0.904 | 0.897 | 0.904 | 0.904 |
| AAComposition_N_C_terminal_2 | Calibrated ClassifierCV | 0.989 | 0.908 | 0.902 | 0.908 | 0.908 |
| AAComposition_N_C_terminal_3 | Calibrated ClassifierCV | 0.988 | 0.902 | 0.896 | 0.902 | 0.902 |
| AAComposition_N_C_terminal_4 | KNeighbors Classifier | 0.989 | 0.911 | 0.905 | 0.911 | 0.911 |
| AAComposition_N_C_terminal_5 | KNeighbors Classifier | 0.989 | 0.909 | 0.903 | 0.909 | 0.909 |
| AAComposition_N_C_terminal_6 | KNeighbors Classifier | 0.988 | 0.903 | 0.896 | 0.903 | 0.903 |
| AAComposition_N_C_terminal_7 | KNeighbors Classifier | 0.988 | 0.900 | 0.894 | 0.900 | 0.900 |
| AAComposition_N_C_terminal_8 | Calibrated ClassifierCV | 0.988 | 0.903 | 0.897 | 0.903 | 0.903 |
| AAComposition_N_C_terminal_9 | Calibrated ClassifierCV | 0.988 | 0.907 | 0.900 | 0.907 | 0.907 |
| Mean | 0.988 | 0.906 | 0.899 | 0.906 | 0.906 | |
| CPAASC_10 | Linear Discriminant Analysis | 0.972 | 0.778 | 0.764 | 0.778 | 0.778 |
| CPAASC_1 | AdaBoost Classifier | 0.971 | 0.772 | 0.757 | 0.772 | 0.772 |
| CPAASC_2 | AdaBoost Classifier | 0.972 | 0.776 | 0.761 | 0.776 | 0.776 |
| CPAASC_3 | AdaBoost Classifier | 0.972 | 0.773 | 0.759 | 0.773 | 0.773 |
| CPAASC_4 | Linear Discriminant Analysis | 0.971 | 0.770 | 0.755 | 0.770 | 0.770 |
| CPAASC_5 | Linear Discriminant Analysis | 0.972 | 0.773 | 0.759 | 0.773 | 0.773 |
| CPAASC_6 | Linear Discriminant Analysis | 0.971 | 0.771 | 0.756 | 0.771 | 0.771 |
| CPAASC_7 | AdaBoos tClassifier | 0.972 | 0.773 | 0.758 | 0.773 | 0.773 |
| CPAASC_8 | Linear Discriminant Analysis | 0.972 | 0.778 | 0.763 | 0.778 | 0.778 |
| CPAASC_9 | AdaBoost Classifier | 0.972 | 0.774 | 0.759 | 0.774 | 0.774 |
| Mean | 0.972 | 0.774 | 0.759 | 0.774 | 0.774 | |
| CPAASC_N_C_terminal_10 | AdaBoost Classifier | 0.975 | 0.800 | 0.787 | 0.800 | 0.800 |
| CPAASC_N_C_terminal_1 | Linear Discriminant Analysis | 0.976 | 0.810 | 0.797 | 0.810 | 0.810 |
| CPAASC_N_C_terminal_2 | AdaBoost Classifier | 0.975 | 0.803 | 0.790 | 0.803 | 0.803 |
| CPAASC_N_C_terminal_3 | Linear Discriminant Analysis | 0.976 | 0.804 | 0.792 | 0.804 | 0.804 |
| CPAASC_N_C_terminal_4 | Linear Discriminant Analysis | 0.976 | 0.805 | 0.793 | 0.805 | 0.805 |
| CPAASC_N_C_terminal_5 | AdaBoost Classifier | 0.975 | 0.802 | 0.789 | 0.802 | 0.802 |
| CPAASC_N_C_terminal_6 | Linear Discriminant Analysis | 0.976 | 0.808 | 0.795 | 0.808 | 0.808 |
| CPAASC_N_C_terminal_7 | Linear Discriminant Analysis | 0.976 | 0.808 | 0.795 | 0.808 | 0.808 |
| CPAASC_N_C_terminal_8 | AdaBoost Classifier | 0.975 | 0.802 | 0.789 | 0.802 | 0.802 |
| CPAASC_N_C_terminal_9 | Linear Discriminant Analysis | 0.976 | 0.805 | 0.792 | 0.805 | 0.805 |
| Mean | 0.976 | 0.805 | 0.792 | 0.805 | 0.805 | |
| Dipeptide_10 | KNeighbors Classifier | 0.992 | 0.935 | 0.931 | 0.935 | 0.935 |
| Dipeptide_1 | KNeighbors Classifier | 0.992 | 0.937 | 0.933 | 0.937 | 0.937 |
| Dipeptide_2 | KNeighbors Classifier | 0.992 | 0.935 | 0.931 | 0.935 | 0.935 |
| Dipeptide_3 | KNeighbors Classifier | 0.992 | 0.934 | 0.930 | 0.934 | 0.934 |
| Dipeptide_4 | KNeighbors Classifier | 0.991 | 0.932 | 0.927 | 0.932 | 0.932 |
| Dipeptide_5 | KNeighbors Classifier | 0.992 | 0.934 | 0.930 | 0.934 | 0.934 |
| Dipeptide_6 | KNeighbors Classifier | 0.991 | 0.931 | 0.926 | 0.931 | 0.931 |
| Dipeptide_7 | KNeighbors Classifier | 0.992 | 0.933 | 0.929 | 0.933 | 0.933 |
| Dipeptide_8 | KNeighbors Classifier | 0.991 | 0.925 | 0.920 | 0.925 | 0.925 |
| Dipeptide_9 | KNeighbors Classifier | 0.991 | 0.929 | 0.925 | 0.929 | 0.929 |
| Mean | 0.992 | 0.932 | 0.928 | 0.932 | 0.932 | |
| Tripeptide_1 | KNeighbors Classifier | 0.995 | 0.957 | 0.954 | 0.957 | 0.957 |
| Tripeptide_2 | KNeighbors Classifier | 0.994 | 0.955 | 0.952 | 0.955 | 0.955 |
| Tripeptide_3 | KNeighbors Classifier | 0.994 | 0.956 | 0.953 | 0.956 | 0.956 |
| Tripeptide_4 | KNeighbors Classifier | 0.995 | 0.958 | 0.955 | 0.958 | 0.958 |
| Tripeptide_5 | KNeighbors Classifier | 0.995 | 0.958 | 0.955 | 0.958 | 0.958 |
| Tripeptide_6 | KNeighbors Classifier | 0.994 | 0.954 | 0.951 | 0.954 | 0.954 |
| Tripeptide_7 | KNeighbors Classifier | 0.994 | 0.955 | 0.952 | 0.955 | 0.955 |
| Tripeptide_8 | KNeighbors Classifier | 0.994 | 0.951 | 0.948 | 0.951 | 0.951 |
| Tripeptide_9 | KNeighbors Classifier | 0.995 | 0.958 | 0.955 | 0.958 | 0.958 |
| Tripeptide_10 | KNeighbors Classifier | 0.995 | 0.959 | 0.957 | 0.959 | 0.959 |
| Mean | 0.994 | 0.956 | 0.953 | 0.956 | 0.956 |
2.3. ML Model Capacity of Distinguishing RLPs from NRLPs
The ability of the ML models to distinguish RLPs from NRLPs was examined through the predictive capacity of the models created with the RLPs/NRLPs datasets (Figure 1C). The models that classify RLPs/NRLPs were evaluated using 10-fold cross-validation based on the following metrics: ACC, sensitivity, precision, F-measure, specificity, FDR, and MCC. For each dataset, 21 models (21 algorithms) were selected, and the performance results are presented in Table 3. In general, the selected models provided average values for ACC, F-measure, FDR, MCC, precision, sensitivity, and specificity equal to 0.93, 0.934, 0.070, 0.861, 0.948, 0.948, and 0.911, respectively.
2.4. ML Model Abilities to Distinguish RLPs from RLKs
To distinguish RLPs from RLKs, we assessed the generality of models constructed with RLP/RLK datasets (Figure 1D). The outcome of 10-fold cross-validations and evaluated metrics for RLPs/RLKs models are shown in Table 4. The quadratic discriminant analysis and gradient boosting classifier with the amino acid composition of the N-terminus, C-terminus, and tripeptide features outperformed the others (Table 4). The average performance of the six models provided ACC 0.968, F-measure 0.967, FDR 0.04, MCC 0.936, precision 0.981, sensitivity 0.981, and specificity 0.955, respectively.
2.5. The Ability of ML Models to Classify RLP Subfamilies
To classify the RLP subfamily, we evaluated models built with RLP subfamily datasets using 10-fold cross-validation. The performance of the models was examined by the previously mentioned metrics (Figure 1E). The tripeptide and dipeptide composition features achieved average MCC values higher than 0.90 when using the K-nearest neighbor algorithm. The N-terminus and C-terminus amino acid composition feature achieved an average MCC value of 0.899 using a calibrated classifier and linear discriminant analysis (Table 4). The average performance of the six models provided ACC 0.98, F-measure 0.874, FDR, MCC, precision 0.877, sensitivity 0.87, while MCC varied from 0.759 to 0.953 (Table 5).
2.6. Validation of RLPredictiOme
For RLPredictiOme validation, we tested the ML models in combination with Bayesian inference as an ensemble method approach (Figure 1). In the first validation, we submitted 47 near-characterized sequences of RLPs against the RLPredictiOme. The validation data set comprises thirty-nine LRR-RLPs, six LysM-RLPs, two WAK-RLPs, and one salt stress-responsive/antifungal-RLP (Table 6). However, six of these RLPs were not characterized as RLP as they did not have a TM. The test resulted in thirty-seven LRR-RLPs correctly classified, two LysM-RLPs were correctly classified, and two LysM-RLPs were classified as undefined due to relative low probability (p) provided by Bayesian inference of the RLP subfamily. The remaining two LysM-RLPs (Q67UE8.1 LYP4 and Q69T51.1 LYP6), one WAK-RLP (AKP45167), and one salt stress-responsive/antifungal- RLP (LOC_Os04g56430.1) were not classified as RLPs due to the TM absence (Table 6).
Table 6.
Validation of the almost characterized RLPs.
| Accession | SP | TM | RLP-NRLP | RLP-NRLP Probability | RLP-RLK | RLP-RLK Probability | RLP-Subfamily | RLP-Subfamily Probability | Classification | Decision Probability |
|---|---|---|---|---|---|---|---|---|---|---|
| NP_001234733.2 | Y | Y | RLP | 0.9961 | RLP | 0.5751 | LRR-RLP | 0.7666 | (LRR-RLP) | 0.9894 |
| sQ9LNV9.2_RLP1 | Y | Y | RLP | 0.9961 | RLP | 0.7161 | LRR-RLP | 0.7671 | (LRR-RLP) | 0.9891 |
| sp—Q93ZH0.1—LYM1 | Y | Y | RLP | 0.8941 | RLP | 0.9915 | LysM-RLP | 0.467 | (LysM-RLP) | 0.989 |
| CAC40826.1_HcrVf2 | Y | Y | RLP | 0.9961 | RLP | 0.9895 | LRR-RLP | 0.8333 | (LRR-RLP) | 0.9888 |
| AAA65235.1_Cf-9 | Y | Y | RLP | 0.9965 | RLP | 0.9906 | LRR-RLP | 0.8331 | (LRR-RLP) | 0.9887 |
| AAC78594.1_Hcr2-2A | Y | Y | RLP | 0.9965 | RLP | 0.8569 | LRR-RLP | 0.849 | (LRR-RLP) | 0.9885 |
| Q9SSD1.1 | Y | Y | RLP | 0.9966 | RLP | 0.991 | LRR-RLP | 0.4667 | (LRR-RLP) | 0.9885 |
| AAC15779.1_Cf-2.1 | Y | Y | RLP | 0.9965 | RLP | 0.855 | LRR-RLP | 0.85 | (LRR-RLP) | 0.9882 |
| sp—Q7FZR1.1—RLP52 | Y | Y | RLP | 0.9966 | RLP | 0.9903 | LRR-RLP | 0.8336 | (LRR-RLP) | 0.9882 |
| QED40966.1 | Y | Y | RLP | 0.9962 | RLP | 0.7168 | LRR-RLP | 0.8506 | (LRR-RLP) | 0.9881 |
| CAC40827.1_HcrVf3 | Y | Y | RLP | 0.9964 | RLP | 0.9909 | LRR-RLP | 0.8501 | (LRR-RLP) | 0.988 |
| sp—Q9LJS0.1—RLP42 | Y | Y | RLP | 0.9966 | RLP | 0.9911 | LRR-RLP | 0.8502 | (LRR-RLP) | 0.988 |
| AAC78593.1_Hcr2-0B | Y | Y | RLP | 0.9962 | RLP | 0.991 | LRR-RLP | 0.8495 | (LRR-RLP) | 0.9879 |
| Q9FK66.1_RLP55 | Y | Y | RLP | 0.9958 | RLP | 0.9915 | LRR-RLP | 0.6669 | (LRR-RLP) | 0.9879 |
| sQ9SN38.1_RLP5 | Y | Y | RLP | 0.9963 | RLP | 0.9912 | LRR-RLP | 0.8497 | (LRR-RLP) | 0.9879 |
| AAC78596.1_Hcr2-5D | Y | Y | RLP | 0.9959 | RLP | 0.9909 | LRR-RLP | 0.85 | (LRR-RLP) | 0.9878 |
| BAE95828.1 (LysM) | Y | Y | RLP | 0.9964 | RLP | 0.99 | Undefined | 0.4169 | (Undefined) | 0.9878 |
| Q9LJS2.1 | Y | Y | RLP | 0.9964 | RLP | 0.9906 | LRR-RLP | 0.8505 | (LRR-RLP) | 0.9878 |
| AJG42080.1_RLM2 | Y | Y | RLP | 0.9963 | RLP | 0.9908 | LRR-RLP | 0.8493 | (LRR-RLP) | 0.9877 |
| CAA05269.1_Hcr9-4E | Y | Y | RLP | 0.9962 | RLP | 0.9893 | LRR-RLP | 0.8332 | (LRR-RLP) | 0.9877 |
| AJG42091.1_LEPR3 | Y | Y | RLP | 0.9967 | RLP | 0.9911 | LRR-RLP | 0.8508 | (LRR-RLP) | 0.9875 |
| Q9M2Y3.1_RLP44 | Y | Y | RLP | 0.9962 | RLP | 0.9902 | LRR-RLP | 0.7503 | (LRR-RLP) | 0.9875 |
| CAC40825.1_HcrVf1 | Y | Y | RLP | 0.9965 | RLP | 0.9921 | LRR-RLP | 0.8166 | (LRR-RLP) | 0.9874 |
| NP_001234474.2 | Y | Y | RLP | 0.9963 | RLP | 0.991 | LRR-RLP | 0.8332 | (LRR-RLP) | 0.9874 |
| Solyc08g016270.1.1 | Y | Y | RLP | 0.9961 | RLP | 0.72 | LRR-RLP | 0.6335 | (LRR-RLP) | 0.9874 |
| AAC78595.1_Hcr2-5B | Y | Y | RLP | 0.9963 | RLP | 0.8517 | LRR-RLP | 0.85 | (LRR-RLP) | 0.9873 |
| O80809.1_CLV2 | Y | Y | RLP | 0.9964 | RLP | 0.991 | LRR-RLP | 0.8496 | (LRR-RLP) | 0.9873 |
| sp—O23006.1—LYM2 | Y | Y | RLP | 0.9962 | RLP | 0.9908 | Undefined | 0.5005 | (Undefined) | 0.9873 |
| sp—O48849.1—RLP23 | Y | Y | RLP | 0.9959 | RLP | 0.9906 | LRR-RLP | 0.7833 | (LRR-RLP) | 0.9873 |
| AAC78592.1_Hcr2-0A | Y | Y | RLP | 0.9966 | RLP | 0.8518 | LRR-RLP | 0.8513 | (LRR-RLP) | 0.9872 |
| sp—Q6NPN4.1—LYM3 | Y | Y | RLP | 0.9452 | RLP | 0.99 | LysM-RLP | 0.4501 | (LysM-RLP) | 0.9872 |
| AAC78591.1 | Y | Y | RLP | 0.9966 | RLP | 0.9899 | LRR-RLP | 0.8507 | (LRR-RLP) | 0.9871 |
| AJV90937.1 | Y | Y | RLP | 0.9968 | RLP | 0.8507 | LRR-RLP | 0.8332 | (LRR-RLP) | 0.9871 |
| AUT14025.1 | Y | Y | RLP | 0.9962 | RLP | 0.8537 | LRR-RLP | 0.7329 | (LRR-RLP) | 0.987 |
| AAC15780.1_Cf-2.2 | Y | Y | RLP | 0.9961 | RLP | 0.8555 | LRR-RLP | 0.8491 | (LRR-RLP) | 0.9863 |
| AGI92782.1_RLP1.813 | Y | Y | RLP | 0.9963 | RLP | 0.9906 | LRR-RLP | 0.4005 | (LRR-RLP) | 0.9862 |
| NP_187187.1 | Y | Y | RLP | 0.9964 | RLP | 0.9913 | LRR-RLP | 0.6497 | (LRR-RLP) | 0.986 |
| AKR80573.1_I-7 | Y | Y | RLP | 0.9963 | RLP | 0.8605 | LRR-RLP | 0.65 | (LRR-RLP) | 0.9855 |
| NP_001362850.1_EIX2 | Y | Y | RLP | 0.9961 | RLP | 0.8581 | LRR-RLP | 0.6005 | (LRR-RLP) | 0.985 |
| sp—Q9SHI4.1—RLP3 | N | Y | RLP | 0.9965 | RLP | 0.9904 | LRR-RLP | 0.8328 | (LRR-RLP) | 0.8015 |
| NP_001355132.1 | N | Y | RLP | 0.9965 | RLP | 0.9903 | LRR-RLP | 0.5163 | (LRR-RLP) | 0.8012 |
| Q940E8.1_FEA2 | Y | N | RLP | 0.9487 | RLP | 0.8554 | LRR-RLP | 0.849 | NRLP | 0.2048 |
| sp—Q67UE8.1—LYP4 | Y | N | RLP | 0.7894 | RLP | 0.8564 | Undefined | 0.0 | NRLP | 0.2017 |
| AFB75328.1 | Y | N | RLP | 0.9472 | RLP | 0.857 | LRR-RLP | 0.5667 | NRLP | 0.2012 |
| AKP45167.1 | Y | N | RLP | 0.9462 | RLP | 0.8543 | Undefined | 0.4495 | NRLP | 0.201 |
| sp—Q69T51.1—LYP6 | Y | N | RLP | 0.8422 | RLP | 0.8544 | Undefined | 0.0 | NRLP | 0.2007 |
| LOC_Os04g56430.1 | Y | N | RLP | 0.9471 | RLP | 0.8518 | Salt-stress-response/antifungal-RLP | 0.4334 | NRLP | 0.1986 |
In the second validation, we used the data of a genome-wide study of RLPs restricted to the LRR-RLP subfamily [53]. The 57 LRR-RLPs of Arabidopsis were submitted to the RLPredictiOme predictor. As a result, 47 LRR-RLPs were classified correctly, although 13 LRR-RLPs did not have a signal peptide (SP). One LRR-RLP harboring SP was undefined, and the remaining nine LRR-RLPs were not classified as RLP due to the TM absence (Table 7). Interestingly, the AtRLP4 protein was previously classified as LRR-RLP; however, the RLPredictiOme classified it as malectin-RLP due to one di-glucose binding domain within the endoplasmic reticulum-associated LRR domain.
Table 7.
Validation of the RLPs from the genome-wide study of Arabidopsis RLPs restricted to the LRR-RLP subfamily.
| Accession | SP | TM | RLP-NRLP | RLP-NRLP Probability | RLP-RLK | RLP-RLK Probability | RLP-Subfamily | RLP-Subfamily Probability | Classification | Decision Probability |
|---|---|---|---|---|---|---|---|---|---|---|
| AT1G65380.1 | Y | Y | RLP | 0.9962 | RLP | 0.9907 | LRR-RLP | 0.8505 | (LRR-RLP) | 0.9902 |
| AT1G17240.1 | Y | Y | RLP | 0.9962 | RLP | 0.9913 | LRR-RLP | 0.8497 | (LRR-RLP) | 0.9886 |
| AT4G13880.1 | Y | Y | RLP | 0.9963 | RLP | 0.9899 | LRR-RLP | 0.8001 | (LRR-RLP) | 0.9884 |
| AT5G27060.1 | Y | Y | RLP | 0.9962 | RLP | 0.991 | LRR-RLP | 0.6669 | (LRR-RLP) | 0.9884 |
| AT3G23110.1 | Y | Y | RLP | 0.9964 | RLP | 0.9912 | LRR-RLP | 0.6502 | (LRR-RLP) | 0.9883 |
| AT1G80080.1 | Y | Y | RLP | 0.9961 | RLP | 0.9911 | LRR-RLP | 0.5506 | (LRR-RLP) | 0.9883 |
| AT2G32680.1 | Y | Y | RLP | 0.9967 | RLP | 0.9918 | LRR-RLP | 0.7838 | (LRR-RLP) | 0.9882 |
| AT1G74180.1 | Y | Y | RLP | 0.9959 | RLP | 0.858 | LRR-RLP | 0.8163 | (LRR-RLP) | 0.988 |
| AT3G05370.1 | Y | Y | RLP | 0.9962 | RLP | 0.8556 | LRR-RLP | 0.6337 | (LRR-RLP) | 0.988 |
| AT3G11080.1 | Y | Y | RLP | 0.9962 | RLP | 0.991 | LRR-RLP | 0.8496 | (LRR-RLP) | 0.988 |
| AT3G28890.1 | Y | Y | RLP | 0.9966 | RLP | 0.8561 | LRR-RLP | 0.6336 | (LRR-RLP) | 0.988 |
| AT2G25440.1 | Y | Y | RLP | 0.9962 | RLP | 0.9902 | LRR-RLP | 0.4832 | (LRR-RLP) | 0.9878 |
| AT5G45770.1 | Y | Y | RLP | 0.9965 | RLP | 0.99 | LRR-RLP | 0.683 | (LRR-RLP) | 0.9878 |
| AT2G42800.1 | Y | Y | RLP | 0.9963 | RLP | 0.9908 | LRR-RLP | 0.6665 | (LRR-RLP) | 0.9876 |
| AT3G05360.1 | Y | Y | RLP | 0.9967 | RLP | 0.9913 | LRR-RLP | 0.6668 | (LRR-RLP) | 0.9876 |
| AT5G65830.1 | Y | Y | RLP | 0.9966 | RLP | 0.8566 | LRR-RLP | 0.667 | (LRR-RLP) | 0.9876 |
| AT1G28340.1 | Y | Y | RLP | 0.8425 | RLP | 0.9905 | Malectin-RLP | 0.4502 | (Malectin-RLP) | 0.9875 |
| AT1G74190.1 | Y | Y | RLP | 0.9959 | RLP | 0.8564 | LRR-RLP | 0.8499 | (LRR-RLP) | 0.9871 |
| AT2G15080.1 | Y | Y | RLP | 0.9965 | RLP | 0.9904 | LRR-RLP | 0.8502 | (LRR-RLP) | 0.987 |
| AT3G05650.1 | Y | Y | RLP | 0.9964 | RLP | 0.9906 | LRR-RLP | 0.6664 | (LRR-RLP) | 0.9868 |
| AT1G45616.1 | Y | Y | RLP | 0.9961 | RLP | 0.9913 | LRR-RLP | 0.7665 | (LRR-RLP) | 0.9868 |
| AT3G05660.1 | Y | Y | RLP | 0.9966 | RLP | 0.8557 | LRR-RLP | 0.85 | (LRR-RLP) | 0.9866 |
| AT1G58190.1 | Y | Y | RLP | 0.9962 | RLP | 0.8521 | LRR-RLP | 0.6663 | (LRR-RLP) | 0.9866 |
| AT3G49750.1 | Y | Y | RLP | 0.9963 | RLP | 0.9909 | LRR-RLP | 0.7502 | (LRR-RLP) | 0.9865 |
| AT4G13920.1 | Y | Y | RLP | 0.9967 | RLP | 0.9911 | LRR-RLP | 0.8498 | (LRR-RLP) | 0.9865 |
| AT5G25910.1 | Y | Y | RLP | 0.9964 | RLP | 0.9899 | LRR-RLP | 0.8501 | (LRR-RLP) | 0.9864 |
| AT2G33060.1 | Y | Y | RLP | 0.9966 | RLP | 0.9914 | LRR-RLP | 0.8332 | (LRR-RLP) | 0.9863 |
| AT4G04220.1 | Y | Y | RLP | 0.9962 | RLP | 0.9911 | LRR-RLP | 0.8506 | (LRR-RLP) | 0.9863 |
| AT2G33050.1 | Y | Y | RLP | 0.9964 | RLP | 0.9915 | LRR-RLP | 0.7498 | (LRR-RLP) | 0.986 |
| AT1G71400.1 | Y | Y | RLP | 0.996 | RLP | 0.8563 | LRR-RLP | 0.6831 | (LRR-RLP) | 0.9851 |
| AT4G18760.1 | Y | Y | RLP | 0.9967 | RLP | 0.9903 | LRR-RLP | 0.8495 | (LRR-RLP) | 0.9885 |
| AT1G71390.1 | N | Y | RLP | 0.9966 | RLP | 0.99 | LRR-RLP | 0.6667 | (LRR-RLP) | 0.8021 |
| AT2G25470.1 | N | Y | RLP | 0.9964 | RLP | 0.8556 | LRR-RLP | 0.8502 | (LRR-RLP) | 0.8014 |
| AT1G47890.1 | N | Y | RLP | 0.9967 | RLP | 0.9908 | LRR-RLP | 0.8501 | (LRR-RLP) | 0.8001 |
| AT4G13810.1 | N | Y | RLP | 0.9964 | RLP | 0.9907 | LRR-RLP | 0.833 | (LRR-RLP) | 0.7997 |
| AT3G23010.1 | N | Y | RLP | 0.9965 | RLP | 0.9908 | LRR-RLP | 0.667 | (LRR-RLP) | 0.7995 |
| AT1G74170.1 | N | Y | RLP | 0.9964 | RLP | 0.8561 | LRR-RLP | 0.7164 | (LRR-RLP) | 0.7994 |
| AT3G24982.1 | N | Y | RLP | 0.9963 | RLP | 0.989 | LRR-RLP | 0.8512 | (LRR-RLP) | 0.7993 |
| AT1G17250.1 | N | Y | RLP | 0.9965 | RLP | 0.9911 | LRR-RLP | 0.8496 | (LRR-RLP) | 0.799 |
| AT3G23120.1 | N | Y | RLP | 0.997 | RLP | 0.9905 | LRR-RLP | 0.6835 | (LRR-RLP) | 0.7976 |
| AT3G53240.1 | N | Y | RLP | 0.9961 | RLP | 0.9905 | LRR-RLP | 0.783 | (LRR-RLP) | 0.7973 |
| AT1G07390.1 | N | Y | RLP | 0.9957 | RLP | 0.7119 | LRR-RLP | 0.7826 | (LRR-RLP) | 0.7969 |
| AT3G11010.1 | N | Y | RLP | 0.9961 | RLP | 0.9902 | LRR-RLP | 0.6665 | (LRR-RLP) | 0.7958 |
| AT1G34290.1 | Y | Y | RLP | 0.9964 | RLP | 0.9898 | Undefined | 0.2166 | (Undefined) | 0.7949 |
| AT5G49290.1 | N | Y | RLP | 0.9966 | RLP | 0.9901 | LRR-RLP | 0.6833 | (LRR-RLP) | 0.7941 |
| AT2G32660 | N | |||||||||
| AT2G33020 | N | |||||||||
| AT2G33030 | N | |||||||||
| AT2G33080 | N | |||||||||
| AT3G24900 | N | |||||||||
| AT3G25010 | N | |||||||||
| AT4G13900 | N | |||||||||
| AT5G40170 | N | |||||||||
| AT3G25020 | N |
In a third validation, we selected 148 LRR-RLPs described in a genome-wide study of rice RLPs [64] (Table S1). The results show that 78 LRR-RLPs with SP and TM were correctly classified with a relatively high probability (greater than 0.98). Additionally, from 73 LRR-RLPs with a single TM, 71 were correctly classified, whereas 2 were classified as Other-RLPs with an estimated probability ranging from 0.792 to 0.805. Only four predicted LRR-RLPs from rice were classified as NRLPs; two lack both SP and TM, and two do not harbor TM. The fourth validation was carried out to ensure that RLPredictiOme does not randomly classify proteins. For this, 100 randomly generated sequences were confronted against RLPredictiOme, and all sequences were classified as NRLP in the first step (Table 8).
Table 8.
Random sequences confronted against RLPredictiOme.
| Accession | SP | TM | RLP-NRLP | RLP-NRLP Probability | RLP-RLK | RLP-RLK Probability | RLP-Subfamily | RLP-Subfamily Probability | Classification | Decision Probability |
|---|---|---|---|---|---|---|---|---|---|---|
| Alien_71_464 | Y | Y | NRLP | 0.0532 | RLP | 0.7145 | Other-RLP | 0.4166 | NRLP | 0.4033 |
| Alien_78_801 | Y | Y | NRLP | 0.0532 | RLP | 0.857 | WAK-RLP | 0.3169 | NRLP | 0.4014 |
| Alien_88_471 | N | Y | NRLP | 0.369 | RLP | 0.855 | Unknown | 0.2837 | NRLP | 0.2068 |
| Alien_90_956 | N | Y | NRLP | 0.0527 | RLK-like | 0.5721 | Other-RLP | 0.3499 | NRLP | 0.2064 |
| Alien_94_666 | N | Y | NRLP | 0.0535 | RLP | 0.8558 | S-domain-RLP | 0.3164 | NRLP | 0.2045 |
| Alien_11_789 | N | Y | NRLP | 0.0524 | RLK-like | 0.4288 | Other-RLP | 0.4331 | NRLP | 0.2034 |
| Alien_34_248 | N | Y | NRLP | 0.2093 | RLP | 0.8571 | Other-RLP | 0.4004 | NRLP | 0.2022 |
| Alien_70_660 | N | Y | NRLP | 0.3677 | RLP | 0.8564 | Unknown | 0.2491 | NRLP | 0.2002 |
| Alien_59_959 | N | Y | NRLP | 0.052 | RLK-like | 0.576 | S-domain-RLP | 0.417 | NRLP | 0.1994 |
| Alien_20_195 | Y | N | NRLP | 0.3704 | RLP | 0.8544 | Unknown | 0.2671 | NRLP | 0.1987 |
| Alien_23_503 | N | Y | NRLP | 0.3698 | RLP | 0.8596 | Unknown | 0.3 | NRLP | 0.1987 |
| Alien_69_854 | N | Y | NRLP | 0.0542 | RLP | 0.7198 | Other-RLP | 0.4327 | NRLP | 0.1985 |
| Alien_2_750 | N | Y | NRLP | 0.0526 | RLK-like | 0.5768 | Other-RLP | 0.3331 | NRLP | 0.1956 |
| Alien_66_528 | N | N | NRLP | 0.0001 | RLP | 0.8549 | S-domain-RLP | 0.3829 | NRLP | 0.0195 |
| Alien_1_268 | N | N | NRLP | 0.0002 | RLP | 0.8536 | Other-RLP | 0.3831 | NRLP | 0.0093 |
| Alien_51_917 | N | N | NRLP | 0.0002 | RLK-like | 0.573 | Unknown | 0.283 | NRLP | 0.0044 |
| Alien_79_429 | N | N | NRLP | 0.3166 | RLP | 0.8588 | Other-RLP | 0.3001 | NRLP | 0.0041 |
| Alien_61_779 | N | N | NRLP | 0.0002 | RLP | 0.7131 | S-domain-RLP | 0.3834 | NRLP | 0.0036 |
| Alien_67_112 | N | N | NRLP | 0.1591 | RLP | 0.7131 | Other-RLP | 0.3342 | NRLP | 0.0035 |
| Alien_42_363 | N | N | NRLP | 0.316 | RLP | 0.8576 | S-domain-RLP | 0.3336 | NRLP | 0.003 |
| Alien_4_417 | N | N | NRLP | 0.0002 | RLK-like | 0.5712 | WAK-RLP | 0.4337 | NRLP | 0.0029 |
| Alien_24_102 | N | N | NRLP | 0.4222 | RLP | 0.861 | WAK-RLP | 0.3498 | NRLP | 0.0027 |
| Alien_9_882 | N | N | NRLP | 0.0002 | RLP | 0.7132 | S-domain-RLP | 0.3664 | NRLP | 0.0019 |
| Alien_7_199 | N | N | NRLP | 0.3166 | RLP | 0.8564 | WAK-RLP | 0.3504 | NRLP | 0.0018 |
| Alien_29_460 | N | N | NRLP | 0.2089 | RLP | 0.8554 | Unknown | 0.284 | NRLP | 0.0017 |
| Alien_50_474 | N | N | NRLP | 0.0009 | RLP | 0.8548 | Unknown | 0.2495 | NRLP | 0.0017 |
| Alien_72_442 | N | N | NRLP | 0.0002 | RLP | 0.8498 | Unknown | 0.2333 | NRLP | 0.0017 |
| Alien_97_120 | N | N | NRLP | 0.3685 | RLP | 0.8566 | Unknown | 0.2999 | NRLP | 0.0017 |
| Alien_38_893 | N | N | NRLP | 0.0003 | RLK-like | 0.5771 | S-domain-RLP | 0.4499 | NRLP | 0.0016 |
| Alien_73_528 | N | N | NRLP | 0.0002 | RLP | 0.857 | S-domain-RLP | 0.3665 | NRLP | 0.0016 |
| Alien_83_641 | N | N | NRLP | 0.0003 | RLP | 0.7085 | Other-RLP | 0.3502 | NRLP | 0.0016 |
| Alien_44_248 | N | N | NRLP | 0.0003 | RLP | 0.7133 | S-domain-RLP | 0.3833 | NRLP | 0.0015 |
| Alien_62_945 | N | N | NRLP | 0.0002 | RLK-like | 0.5733 | S-domain-RLP | 0.4834 | NRLP | 0.0015 |
| Alien_16_855 | N | N | NRLP | 0.0002 | RLK-like | 0.4308 | Unknown | 0.2658 | NRLP | 0.0014 |
| Alien_40_703 | N | N | NRLP | 0.0002 | RLP | 0.711 | S-domain-RLP | 0.3499 | NRLP | 0.0014 |
| Alien_45_534 | N | N | NRLP | 0.0002 | RLP | 0.8553 | WAK-RLP | 0.3165 | NRLP | 0.0014 |
| Alien_74_665 | N | N | NRLP | 0.0001 | RLP | 0.8547 | Unknown | 0.2503 | NRLP | 0.0014 |
| Alien_18_925 | N | N | NRLP | 0.0001 | RLK-like | 0.5679 | Other-RLP | 0.4166 | NRLP | 0.0013 |
| Alien_33_955 | N | N | NRLP | 0.0003 | RLK-like | 0.4348 | Unknown | 0.2332 | NRLP | 0.0013 |
| Alien_39_171 | N | N | NRLP | 0.1577 | RLP | 0.8516 | Unknown | 0.2665 | NRLP | 0.0012 |
| Alien_49_350 | N | N | NRLP | 0.0002 | RLP | 0.8573 | S-domain-RLP | 0.4842 | NRLP | 0.0012 |
| Alien_63_622 | N | N | NRLP | 0.0002 | RLP | 0.8555 | Unknown | 0.2664 | NRLP | 0.0012 |
| Alien_89_627 | N | N | NRLP | 0.0002 | RLP | 0.8567 | Other-RLP | 0.3835 | NRLP | 0.0012 |
| Alien_91_929 | N | N | NRLP | 0.0003 | RLK-like | 0.573 | Other-RLP | 0.4331 | NRLP | 0.0012 |
| Alien_14_450 | N | N | NRLP | 0.3148 | RLP | 0.7157 | WAK-RLP | 0.333 | NRLP | 0.0011 |
| Alien_15_536 | N | N | NRLP | 0.0007 | RLP | 0.8566 | Unknown | 0.2668 | NRLP | 0.0011 |
| Alien_22_586 | N | N | NRLP | 0.001 | RLP | 0.8562 | S-domain-RLP | 0.3993 | NRLP | 0.0011 |
| Alien_3_226 | N | N | NRLP | 0.0003 | RLK-like | 0.431 | Unknown | 0.2991 | NRLP | 0.0011 |
| Alien_57_326 | N | N | NRLP | 0.3151 | RLP | 0.8605 | Unknown | 0.2502 | NRLP | 0.0011 |
| Alien_13_137 | N | N | NRLP | 0.2113 | RLK-like | 0.5764 | Unknown | 0.1667 | NRLP | 0.001 |
| Alien_35_659 | N | N | NRLP | 0.0002 | RLK-like | 0.5687 | Other-RLP | 0.3829 | NRLP | 0.001 |
| Alien_37_440 | N | N | NRLP | 0.0003 | RLK-like | 0.5743 | Unknown | 0.2666 | NRLP | 0.001 |
| Alien_48_571 | N | N | NRLP | 0.0002 | RLP | 0.8586 | Unknown | 0.2999 | NRLP | 0.001 |
| Alien_54_839 | N | N | NRLP | 0.0004 | RLP | 0.7158 | Unknown | 0.2674 | NRLP | 0.001 |
| Alien_12_553 | N | N | NRLP | 0.3185 | RLP | 0.858 | Unknown | 0.2335 | NRLP | 0.0009 |
| Alien_17_304 | N | N | NRLP | 0.3169 | RLP | 0.8541 | Unknown | 0.2828 | NRLP | 0.0009 |
| Alien_25_176 | N | N | NRLP | 0.0003 | RLP | 0.8568 | Unknown | 0.2667 | NRLP | 0.0009 |
| Alien_30_623 | N | N | NRLP | 0.0002 | RLP | 0.8547 | Other-RLP | 0.3833 | NRLP | 0.0009 |
| Alien_32_240 | N | N | NRLP | 0.1576 | RLP | 0.8531 | Unknown | 0.2499 | NRLP | 0.0009 |
| Alien_53_589 | N | N | NRLP | 0.0006 | RLP | 0.7103 | Unknown | 0.3 | NRLP | 0.0009 |
| Alien_58_715 | N | N | NRLP | 0.0001 | RLK-like | 0.5748 | S-domain-RLP | 0.3842 | NRLP | 0.0009 |
| Alien_82_456 | N | N | NRLP | 0.0001 | RLP | 0.855 | S-domain-RLP | 0.3165 | NRLP | 0.0009 |
| Alien_85_415 | N | N | NRLP | 0.0004 | RLP | 0.715 | Unknown | 0.2167 | NRLP | 0.0009 |
| Alien_8_947 | N | N | NRLP | 0.0001 | RLK-like | 0.5689 | Unknown | 0.25 | NRLP | 0.0009 |
| Alien_10_555 | N | N | NRLP | 0.0002 | RLP | 0.8536 | Unknown | 0.2996 | NRLP | 0.0008 |
| Alien_19_229 | N | N | NRLP | 0.0003 | RLP | 0.8599 | PAN-RLP | 0.3336 | NRLP | 0.0008 |
| Alien_27_824 | N | N | NRLP | 0.0002 | RLP | 0.7111 | Unknown | 0.3337 | NRLP | 0.0008 |
| Alien_41_731 | N | N | NRLP | 0.0004 | RLP | 0.7117 | Unknown | 0.2666 | NRLP | 0.0008 |
| Alien_43_686 | N | N | NRLP | 0.0001 | RLP | 0.7129 | S-domain-RLP | 0.3662 | NRLP | 0.0008 |
| Alien_47_420 | N | N | NRLP | 0.0004 | RLP | 0.8546 | Other-RLP | 0.4172 | NRLP | 0.0008 |
| Alien_52_779 | N | N | NRLP | 0.0003 | RLK-like | 0.4383 | Unknown | 0.2999 | NRLP | 0.0008 |
| Alien_55_478 | N | N | NRLP | 0.0002 | RLP | 0.7179 | Other-RLP | 0.3997 | NRLP | 0.0008 |
| Alien_60_817 | N | N | NRLP | 0.0002 | RLP | 0.7135 | Unknown | 0.2999 | NRLP | 0.0008 |
| Alien_64_626 | N | N | NRLP | 0.0002 | RLP | 0.7138 | Other-RLP | 0.4 | NRLP | 0.0008 |
| Alien_75_673 | N | N | NRLP | 0.0002 | RLP | 0.8548 | Unknown | 0.2832 | NRLP | 0.0008 |
| Alien_81_442 | N | N | NRLP | 0.0003 | RLK-like | 0.5736 | S-domain-RLP | 0.4833 | NRLP | 0.0008 |
| Alien_87_495 | N | N | NRLP | 0.0005 | RLP | 0.8555 | S-domain-RLP | 0.3838 | NRLP | 0.0008 |
| Alien_93_110 | N | N | NRLP | 0.3149 | RLP | 0.8597 | WAK-RLP | 0.467 | NRLP | 0.0008 |
| Alien_99_622 | N | N | NRLP | 0.0002 | RLP | 0.8568 | Unknown | 0.25 | NRLP | 0.0008 |
| Alien_21_499 | N | N | NRLP | 0.0002 | RLP | 0.86 | S-domain-RLP | 0.3498 | NRLP | 0.0007 |
| Alien_31_429 | N | N | NRLP | 0.0002 | RLP | 0.7128 | Unknown | 0.2996 | NRLP | 0.0007 |
| Alien_46_860 | N | N | NRLP | 0.0002 | RLK-like | 0.571 | Unknown | 0.2995 | NRLP | 0.0007 |
| Alien_56_859 | N | N | NRLP | 0.0005 | RLK-like | 0.5724 | S-domain-RLP | 0.3328 | NRLP | 0.0007 |
| Alien_5_855 | N | N | NRLP | 0.0003 | RLK-like | 0.572 | Unknown | 0.2997 | NRLP | 0.0007 |
| Alien_65_609 | N | N | NRLP | 0.0002 | RLK-like | 0.4257 | Unknown | 0.2667 | NRLP | 0.0007 |
| Alien_6_529 | N | N | NRLP | 0.0001 | RLP | 0.8565 | Unknown | 0.2504 | NRLP | 0.0007 |
| Alien_86_232 | N | N | NRLP | 0.1581 | RLP | 0.8535 | Other-RLP | 0.3495 | NRLP | 0.0007 |
| Alien_92_960 | N | N | NRLP | 0.0005 | RLK-like | 0.5741 | Other-RLP | 0.3168 | NRLP | 0.0007 |
| Alien_95_597 | N | N | NRLP | 0.157 | RLP | 0.8588 | Unknown | 0.2833 | NRLP | 0.0007 |
| Alien_96_597 | N | N | NRLP | 0.3704 | RLP | 0.8544 | WAK-RLP | 0.3999 | NRLP | 0.0007 |
| Alien_0_119 | N | N | NRLP | 0.0528 | RLP | 0.7163 | PAN-RLP | 0.4339 | NRLP | 0.0006 |
| Alien_26_112 | N | N | NRLP | 0.5285 | RLP | 0.8585 | Unknown | 0.2664 | NRLP | 0.0006 |
| Alien_76_327 | N | N | NRLP | 0.0003 | RLP | 0.7066 | Other-RLP | 0.4002 | NRLP | 0.0006 |
| Alien_77_685 | N | N | NRLP | 0.0002 | RLK-like | 0.569 | Unknown | 0.2494 | NRLP | 0.0006 |
| Alien_98_323 | N | N | NRLP | 0.1046 | RLP | 0.7172 | Other-RLP | 0.5328 | NRLP | 0.0006 |
| Alien_28_468 | N | N | NRLP | 0.0001 | RLP | 0.8563 | Unknown | 0.2831 | NRLP | 0.0005 |
| Alien_36_821 | N | N | NRLP | 0.0001 | RLP | 0.717 | Unknown | 0.2337 | NRLP | 0.0005 |
| Alien_68_626 | N | N | NRLP | 0.0002 | RLP | 0.8541 | Unknown | 0.2835 | NRLP | 0.0005 |
| Alien_80_637 | N | N | NRLP | 0.0002 | RLK-like | 0.5715 | S-domain-RLP | 0.4333 | NRLP | 0.0005 |
| Alien_84_494 | N | N | NRLP | 0.1614 | RLP | 0.8574 | S-domain-RLP | 0.3501 | NRLP | 0.0005 |
2.7. High Throughput Prediction of RLPs in the Arabidopsis Genome Using RLPredictiOme
We performed high throughput prediction by submitting the Arabidopsis sequences against RLPredictiOme. The cutoff tuning for the probability filter was assumed to be 0.6 in the first two-step and 0.7 in the last step (Figure 1F). In the third step, the probability estimates were more flexible in order to predict the RLP subfamilies.
From this genome-wide prediction, RLPredictiOme classified 176 RLP sequences into 15 subfamilies (Table S2). Table 9 summarizes the correct predictions within the subfamily. The number of proteins with unknown functions is highlighted in red, whereas the blue description represents the RLPs subfamilies predicted in other subfamilies. The LRR-RLPs subfamily contained 49 members. Three new members (AT5G37360, AT5G19230, and AT4G28560), predicted with relatively high probability, were not classified into a known subfamily, whereas two sequences were incorrectly classified. Interestingly, AtRLP4 has two domains, an LRR domain, and an endoplasmic reticulum protein-associated Di-glucose binding domain, which characterizes malectin proteins. The RLPredictiOme method classified the AtRLP4 into the malectin-RLP subfamily (see Table S2).
Table 9.
Number of RLPs and predicted RLKs.
| Class (Subfamily) | RLP | Correctly Classified * | Unknown Function ** |
Incorrectly Subfamily Classified *** | Mistakenly Classified **** | RLKs in Arabidopsis |
|---|---|---|---|---|---|---|
| LRR-RLP | 49 | 46 | 3 | 0 | 2 | 235 |
| L-Lectin-RLP | 5 | 0 | 5 | 5 | 45 | |
| Salt stress response/antifungal-RLP | 9 | 3 | 1 | 5 | 0 | 44 |
| WAK-RLP | 6 | 5 | 1 | 4 | 42 | |
| S-domain-RLP | 1 | 1 | 1 | 37 | ||
| Unknown-RLP (Extensin, PERK, RKF3, URKI) | 43 | 43 | 11 | 28 | ||
| Malectin-RLP | 6 | 2 | 3 | 1 | 5 | 15 |
| RCC1-RLP | 4 | 4 | 8 | |||
| LysM-RLP | 4 | 2 | 2 | 3 | ||
| B-lectin-RLP | 1 | 1 | 2 | |||
| C-Lectin-RLP | 0 | 2 | ||||
| Ethylene-responsive-RLP | 3 | 3 | 3 | 2 | ||
| PAS-RLP | 0 | 2 | ||||
| Thaumatin-RLP | 6 | 6 | 2 | |||
| PPR-RLP | 0 | 1 | ||||
| Glycosyl-hydrolases-RLP | 3 | 3 | 0 | |||
| PAN-RLP | 1 | 1 | 1 | 0 | ||
| Other-RLP | 35 | 11 | 24 | 13 | 0 | |
| Undefined | 78 | |||||
| Total | 176 | 122 | 47 | 7 | 45 | 468 |
The candidate sequences with a legume lectin domain were classified into two RLP subfamilies, B-Lectin-RLP and L-Lectin-RLP (Table S2). Only one member was classified as B-Lectin-RLP with an unknown function, while six members were classified into the L-Lectin-RLP subfamily, also designated as unknown function proteins. Seven proteins were classified incorrectly into this subfamily. The 20 Lysin motif-containing candidate proteins were classified as LysM-RLP (Table S2). Two (AT1G77630.1 and AT2G17120.1) of the three previously characterized LysM-RLPs [65] and two classified LysM-RLPs (AT3G06360.1 and AT5G26270.1) belong to subfamilies previously identified as unknown function subfamilies, and one sequence (AT1G63550.1) belongs to the salt stress response/antifungal-RLP family. The other 15 sequences may belong to the lipid transfer protein family, not yet characterized. Additionally, the ectodomain lipid transfer family associated with a kinase domain was allocated in the other-RLP group as probable lipid transfer-RLK. Twelve sequences were classified as probable lipid transfer-RLP; however, this misclassification occurred in the LysM-RLP and unknown-RLP groups, which may be functionally similar. It may be due to the over-representability of these two mentioned groups.
In the malectin-RLP subfamily, RLPredictiOme correctly classified two members previously characterized (AT1G28340.1 and AT1G24485.1). Four candidate members were identified into subfamilies of unknown function, and seven sequences were incorrectly predicted (Table S2). Furthermore, the third previously identified malectin-RLP (AT3G46240.1) was predicted as an RCC1-RLP. This subfamily has seven predicted members without known functions. One salt stress response/antifungal-RLP was predicted within this family. The salt stress response/antifungal-RLPs had four members correctly classified and four predicted within other subfamilies (three in WAK-RLP and one in RCC1-RLP). The S-domain-RLP had a correctly and an incorrectly predicted sequence (Table S2).
As for the thaumatin-RLP subfamily, all six members were correctly predicted (Table S2). The WAK-RLP subfamily correctly predicted five members but also incorporated one candidate sequence with an unknown function and three salt stress response/antifungal-RLPs. Ectodomains without a functional domain were classified within a subfamily designated unknown-RLPs. This group also includes RLPs harboring the ectodomains PERK-like, extensin, RKF3-like, CrRLK1, and RLK10-like proline-rich proteins. RLPredictiOme predicted 46 sequences with unknown functions classified as an unknown-RLP subfamily (Table S2). The protein sequences, which are not classified correctly or have a low relative probability of subfamily classification, were designated as undefined and not considered RLPs. In summary, a total of 78 proteins were classified in this group (Table S2).
RLPredictiOme identified probable lipid transfer-RLPs, considered a novel RLP class associated with RLKs, yet to be characterized. Furthermore, three new classes of RLPs were predicted: plastocyanin-like-RLP, ring finger-RLP, and glycosyl-hydrolase-RLP, which contained eight, five, and seven members, respectively. Interestingly, five glycerophosphoryl diester phosphodiesterase family (GDPDL members were predicted as other-RLPs. As a rare protein family in plants, we selected GDPDL-RLP to carry out an experimental validation for these receptor-like protein candidates. The number of predicted RLPs in each subfamily is shown in Table 9.
2.8. GDPDL Family Downstream Analysis
Phylogenetic analysis of the kinase domain of the RLK family and the kinase domain of IRE1A and IRE1B, endoplasmic reticulum (ER)-specific protein kinase, clustered the kinase domain of GDPDL-RLK and thaumatin in the same group distinct from the ER kinases (Figure 2A). These results suggest that GDPDL-RLKs are not ER transmembrane proteins. The secondary structure and the topology of GDPDL show that the N-terminal region of GDPDL-RLK is composed of a signal peptide, a GDPD domain, and more than 10 candidate sites for N-glycosylation (Figure 2B). As an RLK, GDPDL-RLK contains an ectodomain facing the extracellular space, a transmembrane segment, and a cytoplasmic portion harboring the kinase domain. The topology of classified GDPDL-RLPs fits a typical RLP configuration with an N-terminal peptide signal, the glycerophosphoryl diester phosphodiesterase ectodomain, the transmembrane segment, and it lacks a short C-terminal cytoplasmic domain. GDPDL1 and GDPDl6 harbor two glycerophosphoryl diester phosphodiesterase domains, whereas GDPDL3/4/5 has a single domain localized in a similar position compared with GDPDL-RLK.
Figure 2.
Analysis in silico of the GDPDL-RLPs. (A) Phylogenetic tree of the kinase catalytic domain of RLKs, IRE1A and IRE1B. (B) The topology of GDPDL-RLPs.
The molecular evolution of the new GDPDLs and the GDPDL-RLK ectodomain was investigated by calculating the ratio between non-synonymous and synonymous substitutions (Ka/Ks). Compared to the full-length sequence of GDPDL-RLK, only the gene pair GDPDL-RLK/GDPDL6 with a ratio of Ka/Ks > 1 may have undergone a positive selection (Table 10). The ectodomain sequence of GDPDL-RLK compared with gene pairs GDPL1/3/4 was submitted to purifying selection, as suggested by their Ka/Ks ratio < 1 and p-value < 0.05. The divergence time of GDPL1/3/4 was 23.7, 32.5, and 120.1 Mya. These results suggest that despite the divergence time of GDPL1/3/4 compared to the GDPDL-RLK ectodomain, the higher frequency of synonymous mutations may have maintained the GDPL1/3/4 and the ectodomain GDPDL-RLK functionally similar.
Table 10.
Molecular evolution analysis of the GDPDLs.
| Sequence | Ka | Ks | Ka/Ks | Selection | Date (Mya) | p-Value |
|---|---|---|---|---|---|---|
| GDPDL5-GDPDL3 | 0.382 | 1.578 | 0.242 | Purifying | 129.316 | 7.98 × 10−49 |
| GDPD (ectodomain)- GDPDL4 | 0.214 | 1.466 | 0.146 | Purifying | 120.193 | 2.22 × 10-45 |
| GDPDL4-GDPD-RLK | 0.214 | 1.288 | 0.166 | Purifying | 105.602 | 9.31 × 10−45 |
| GDPDL1-GDPDL4 | 0.180 | 0.940 | 0.192 | Purifying | 77.037 | 1.60 × 10−51 |
| GDPDL3-GDPDL4 | 0.164 | 0.852 | 0.192 | Purifying | 69.822 | 1.12 × 10−46 |
| GDPDL4-GDPDL6 | 0.646 | 0.802 | 0.805 | Purifying | 65.744 | 0.146094 |
| GDPD-RLK-GDPDL6 | 0.695 | 0.638 | 1.090 | Positive | 52.286 | 0.109708 |
| GDPD (ectodomain)- GDPDL3 | 0.170 | 0.397 | 0.428 | Purifying | 32.525 | 4.56 × 10−13 |
| GDPDL3-GDPD-RLK | 0.167 | 0.394 | 0.423 | Purifying | 32.333 | 3.06 × 10−13 |
| GDPD-RLK-GDPDL3 | 0.167 | 0.394 | 0.423 | Purifying | 32.333 | 3.06 × 10−13 |
| GDPDL1-GDPDL3 | 0.141 | 0.390 | 0.363 | Purifying | 31.961 | 1.05 × 10−17 |
| GDPDL1-GDPD-RLK | 0.120 | 0.327 | 0.368 | Purifying | 26.786 | 5.38 × 10−16 |
| GDPD-RLK-GDPDL1 | 0.120 | 0.327 | 0.368 | Purifying | 26.786 | 5.38 × 10−16 |
| GDPDL1-GDPD (ectodomain) | 0.125 | 0.326 | 0.384 | Purifying | 26.730 | 5.08 × 10−15 |
2.9. Identification of GDPDLs- and SNC4-Interacting Proteins from Arabidopsis
Protein–protein interactions between the GDPDLs and GDPDL-RLK, also designated SUPPRESSOR OF NPR1, CONSTITUTIVE 4 (SNC4), and the Arabidopsis proteins were identified in silico through the protein–protein interactome using Cytoscape software and several databases (BioGRID database, Arabidopsis interactome database, and the String database). This procedure identified the protein-protein interaction (PPI) network containing GDPDLs and directly interacting Arabidopsis proteins (Figure 3). The GDPDL6 formed the largest hub (degree 38). Among the GDLDL6-interacting proteins, the glycogen synthase kinase 3/SHAGGY-like kinases (GSKs-AT1G57870) may represent a candidate protein for signaling (Figure 3A, Table 11). Although GSKs have been recently discovered in plants, evidence suggests that they are involved in different biological processes, such as brassinosteroid signaling, flower development, and injury responses [66]). The node-hub GDPDL5 contains the AtMLP328 pathogenesis-related protein and other proteins of unknown function (Figure 3A, Table 11). The AtMLP328 is a member of the major latex protein-like (MLPL) gene family responsible for promoting vegetative growth and delaying flowering.
Figure 3.
GDPDL-RLPs-interacting Arabidopsis proteins. (A) GDPDL-RLP-interacting proteins were identified in the Arabidopsis interactome, and the network was assembled by the Cytoscape software. GDPDL-RLPs and SNC4 (GDPDL2) are indicated in green, GDPDL-specifically interacting proteins in light blue, RNA-binding proteins, which interact with all 6 GDPDLs, including GDPDL_RLK (SNC4), are shown in red. In orange, CSN5A as a central hub of plant-pathogen interactions (B) Gene enrichment of proteins under the molecular function term from the GDPD-RLP-Arabidopsis protein-protein interactions (PPI) network. (C) Gene enrichment of proteins from the GDPD-RLP-Arabidopsis PPI network under the cellular component term.
Table 11.
Protein-protein interactions between the GDPDL proteins and Arabidopsis proteins. The colors indicate the hubs from Figure 3A.
| Name | Betweenness Centrality | Closeness Centrality | Degree | Eccentricity | Description |
|---|---|---|---|---|---|
| SNC4 | 0.19234075 | 0.37614679 | 12 | 3 | glycerophosphoryl diester phosphodiesterase family protein, putative, expressed |
| RLP51 | 0.0 | 0.27516779 | 2 | 4 | leucine rich repeat family protein, putative, expressed |
| SNC1 | 3.0111 × 10−4 | 0.27702703 | 4 | 4 | rp3 protein, putative, expressed |
| SUA | 1.0037 × 10−4 | 0.27702703 | 4 | 4 | RNA recognition motif family protein, expressed |
| DRT111 | 1.0037 × 10−4 | 0.27702703 | 4 | 4 | G-patch domain containing protein, expressed |
| AT2G20050 | 0.0 | 0.27424749 | 1 | 4 | AGC_PKA/PKG_like.1-ACG kinases include homologs to PKA, PKG and PKC, expressed |
| AT1G59780 | 0.0 | 0.27424749 | 1 | 4 | NBS-LRR disease resistance protein, putative, expressed |
| AT3G55350 | 0.0 | 0.27609428 | 3 | 4 | trp repressor/replication initiator, putative, expressed |
| BPA1 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif containing protein, putative, expressed |
| AT4G1772 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif, putative, expressed |
| AT1G22920 | 0.0 | 0.27424749 | 2 | 4 | COP9 signalosome complex subunit 5b, putative, expressed |
| GDPDL5 | 0.17835276 | 0.37104072 | 10 | 3 | glycerophosphoryl diester phosphodiesterase family protein, putative, expressed |
| MLP328 | 0.0 | 0.27702703 | 7 | 4 | pathogenesis-related Bet v I family protein, putative, expressed |
| AGL46 | 0.0 | 0.27702703 | 7 | 4 | OsMADS89-MADS-box family gene with M-gamma type-box, expressed |
| AT2G47115 | 0.04302 | 0.2779661 | 8 | 4 | expressed protein |
| AT1G29660 | 0.04302 | 0.2779661 | 8 | 4 | GDSL-like lipase/acylhydrolase, putative, expressed |
| AT5G51950 | 0.04302 | 0.2779661 | 8 | 4 | HOTHEAD precursor, putative, expressed |
| AT1G20680 | 0.04302 | 0.2779661 | 8 | 4 | Ser/Thr-rich protein T10 in DGCR region, putative, expressed |
| AT2G17710 | 0.04302 | 0.2779661 | 8 | 4 | expressed protein |
| AT5G42530 | 0.04302 | 0.2779661 | 8 | 4 | |
| BPA1 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif containing protein, putative, expressed |
| AT4G17720 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif, putative, expressed |
| GDPDL3 | 0.1693342 | 0.37104072 | 10 | 3 | glycerophosphoryl diester phosphodiesterase family protein, putative, expressed |
| SHV2 | 0.0 | 0.27516779 | 5 | 4 | COBRA-like protein 7 precursor, putative, expressed |
| MRH1 | 0.0 | 0.27516779 | 5 | 4 | MRH1, putative, expressed |
| BST1 | 0.0 | 0.27516779 | 5 | 4 | endonuclease/exonuclease/phosphatase family domain containing protein, expressed |
| MRH6 | 0.0 | 0.27516779 | 5 | 4 | universal stress protein domain containing protein, putative, expressed |
| MRH2 | 0.0 | 0.27516779 | 5 | 4 | kinesin motor domain containing protein, expressed |
| ATCOAE | 0.0 | 0.27152318 | 1 | 4 | dephospho-CoA kinase, putative, expressed |
| AT3G23750 | 0.0 | 0.27152318 | 1 | 4 | receptor protein kinase TMK1 precursor, putative, expressed |
| BPA1 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif containing protein, putative, expressed |
| AT4G17720 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif, putative, expressed |
| GDPDL1 | 0.12794717 | 0.37442922 | 10 | 3 | glycerophosphoryl diester phosphodiesterase family protein, putative, expressed |
| AT1G49750 | 0.0 | 0.27333333 | 1 | 4 | uncharacterized protein At4g06744 precursor, putative, expressed |
| AT3G45710 | 0.0 | 0.27333333 | 1 | 4 | peptide transporter PTR2, putative, expressed |
| PLDGAMMA1 | 0.00779455 | 0.29181495 | 3 | 4 | phospholipase D, putative, expressed |
| MAP18 | 0.0 | 0.27333333 | 1 | 4 | Unknown function |
| CDS1 | 0.0 | 0.28275862 | 2 | 4 | phosphatidate cytidylyltransferase, putative, expressed |
| BPA1 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif containing protein, putative, expressed |
| AT4G17720 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif, putative, expressed |
| GDPDL4 | 0.21573054 | 0.38497653 | 14 | 3 | glycerophosphoryl diester phosphodiesterase family protein, putative, expressed |
| AT5G38480 | 0.0 | 0.27891156 | 1 | 4 | 14-3-3 protein, putative, expressed |
| FLA7 | 0.00445805 | 0.29390681 | 6 | 4 | fasciclin domain containing protein, expressed |
| SKU5 | 0.0 | 0.2877193 | 4 | 4 | monocopper oxidase, putative, expressed |
| FLA8 | 0.0 | 0.2877193 | 4 | 4 | fasciclin-like arabinogalactan protein, putative, expressed |
| ZW9 | 0.00445805 | 0.29390681 | 6 | 4 | ubiquitin carboxyl-terminal hydrolase, putative, expressed |
| AT1G32860 | 0.00853443 | 0.29496403 | 2 | 4 | glycosyl hydrolases family 17, putative, expressed |
| AT3G56370 | 0.0 | 0.27891156 | 1 | 4 | receptor-like protein kinase precursor, putative, expressed |
| AT4G09000 | 0.0 | 0.27891156 | 1 | 4 | 14-3-3 protein, putative, expressed |
| BG_PPAP | 0.0 | 0.27891156 | 1 | 4 | glycosyl hydrolases family 17, putative, expressed |
| AT1G01080 | 0.06480132 | 0.39047619 | 3 | 4 | RNA recognition motif containing protein, putative, expressed |
| AT5G65430 | 0.0 | 0.27891156 | 1 | 4 | 14-3-3 protein, putative, expressed |
| BPA1 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif containing protein, putative, expressed |
| AT4G17720 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif, putative, expressed |
| GDPDL6 | 0.67455299 | 0.4969697 | 38 | 3 | glycerophosphoryl diester phosphodiesterase family protein, putative, expressed |
| AT4G11860 | 0.0 | 0.27891156 | 1 | 4 | ubiquitin interaction motif family protein, expressed |
| AT3G23410 | 0.0 | 0.33333333 | 1 | 4 | alcohol oxidase, putative, expressed |
| AT4G23400 | 0.0 | 0.33333333 | 1 | 4 | aquaporin protein, putative, expressed |
| AT4G30850 | 0.0 | 0.33333333 | 1 | 4 | haemolysin-III, putative, expressed |
| AT1G57870 | 0.0 | 0.33333333 | 1 | 4 | CGMC_GSK.5-CGMC includes CDA, MAPK, GSK3, and CLKC kinases, expressed |
| AT1G31812 | 0.0 | 0.33333333 | 1 | 4 | acyl CoA binding protein, putative, expressed |
| AT1G14360 | 0.0 | 0.33333333 | 1 | 4 | solute carrier family 35 member B1, putative, expressed |
| AT5G06320 | 0.0 | 0.33333333 | 1 | 4 | harpin-induced protein 1 domain containing protein, expressed |
| AT1G07550 | 0.0 | 0.33333333 | 1 | 4 | senescence-induced receptor-like serine/threonine-protein kinase precursor, putative, expressed |
| AT5G07340 | 0.0 | 0.33333333 | 1 | 4 | calreticulin precursor protein, putative, expressed |
| AT2G41705 | 0.0 | 0.33333333 | 1 | 4 | crcB-like protein, expressed |
| AT3G12180 | 0.0 | 0.33333333 | 1 | 4 | cornichon protein, putative, expressed |
| AT5G11890 | 0.0 | 0.33333333 | 1 | 4 | harpin-induced protein 1 domain containing protein, expressed |
| AT1G14020 | 0.0 | 0.33333333 | 1 | 4 | auxin-independent growth promoter protein, putative, expressed |
| AT1G34640 | 0.0 | 0.33333333 | 1 | 4 | expressed protein |
| AT3G66654 | 0.0 | 0.33333333 | 1 | 4 | peptidyl-prolyl cis-trans isomerase, putative, expressed |
| AT2G22425 | 0.0 | 0.33333333 | 1 | 4 | signal peptidase complex subunit 1, putative, expressed |
| AT2G27290 | 0.0 | 0.33333333 | 1 | 4 | protein of unknown function DUF1279 domain containing protein, expressed |
| AT5G49540 | 0.0 | 0.33333333 | 1 | 4 | transmembrane protein 93, putative, expressed |
| AT1G13770 | 0.0 | 0.33333333 | 1 | 4 | DUF647 domain containing protein, putative, expressed |
| AT1G29060 | 0.0 | 0.33333333 | 1 | 4 | expressed protein |
| AT4G14455 | 0.0 | 0.33333333 | 1 | 4 | SNARE domain containing protein, putative, expressed |
| AT4G25360 | 0.0 | 0.33333333 | 1 | 4 | leaf senescence related protein, putative, expressed |
| AT4G12250 | 0.0 | 0.33333333 | 1 | 4 | UDP-glucuronate 4-epimerase, putative, expressed |
| AT5G35460 | 0.0 | 0.33333333 | 1 | 4 | integral membrane protein, putative, expressed |
| AT1G16170 | 0.0 | 0.33333333 | 1 | 4 | expressed protein |
| AT5G03345 | 0.0 | 0.33333333 | 1 | 4 | expressed protein |
| AT1G47640 | 0.0 | 0.33333333 | 1 | 4 | SSA2-2S albumin seed storage family protein precursor, putative, expressed |
| AT5G52420 | 0.0 | 0.33333333 | 1 | 4 | expressed protein |
| BPA1 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif containing protein, putative, expressed |
| AT4G17720 | 0.30818366 | 0.51898734 | 6 | 2 | RNA recognition motif, putative, expressed |
The cluster of GDPDL3-interacting proteins includes the BRASSINOSTEROIDE INSENTIVE 1 (BRI1)-ASSOCIATED RECEPTOR KINASE 1 (BAK1), also designated SOMATIC EMBRYOGENESIS RECEPTOR KINASE 3 (SERK3). BAK1 has been shown to function as a co-receptor for many RLKs, including the recruitment of receptor-like proteins and SOBIR to form a heterodimeric complex upon recognition of ligands by RLPs, for example, RLP23-SOBIR1-BAK1, cf-4-BAK1/SERK3- SOBIR1, RE02-BAK1-SOBIR1, and RXEG1-BAK1-SOBIR1 [46,49,51,67] (Figure 3A, Table 11).
The interactions of GDPDLs- and SNC4 converge to centralized hubs represented by BPA1, AT1G01080, and AT4G17720 (BPL1), which contain an RNA binding motif (Figure 3A, Table 11). The BPA1 protein has been shown to interact with Arabidopsis ACD11, which induces the expression of genes associated with disease resistance and genes involved in the ROS-mediated response defense upon recognizing fungal elicitors [68,69]. Furthermore, BPA1 and BPL1 are induced during geminivirus infection [70]. The GDPDLs-Arabidopsis PPI network is enriched for proteins involved in plant defense response to pathogens and vegetative growth, indicating that this new RLP family may be involved in immunity and developmental signaling.
To gain further insights into the cellular processes involved by GDPDLs, we performed functional enrichment analyses of their direct interactors. In all three categories, biological process, molecular function, and cellular component ontology, we identified enriched GO terms with a p-value < 0.05. Under molecular function, we identified enriched terms for Glycerophosphodiester phosphodiesterase activity, nucleotide binding, purine ribonucleotide binding, and hydrolase activity, which are unusual enzyme activities associated with membrane receptor activity (Table 10). Under the cellular component ontology, we observed an over-representation of proteins from plasma membrane term, membrane-bounded term, and plant-type cell wall term, which may suggest that the location and functional activities of these hubs are specific to transmembrane proteins. (Figure 3B). Under the biological process ontology, the response to defense response, response to external stimulus, and developmental growth term represented significantly enriched GO terms, which show that this family of proteins may be related to immunity and plant development (Table S3).
2.10. The Expression Profile of the GDPDLs in Response to Pathogens and Different Organs
To gain insights into the potential defense response of the GDPDLs genes and to validate these candidate receptor-like proteins as expressed genes, we investigated their expression profiles through publicly available expression datasets using the gene investigator (NEBION, AG, Zurich, Switzerland; www.genevestigator.com, academic free license, accessed on 28 February 2020) (Figure 4A). From these microarray data, GDPDL1-RLK was induced by aphids, the bacteria Pseudomonas syringae, and the begomovirus cabbage leaf curl virus (CabLCV), but not by nematodes. Likewise, GDPDL2-RLP is induced by bacteria and aphids, and begomoviruses to a lesser extent. GDPDL3-RLP and GDPDL4-RLP are upregulated by aphids and bacteria and down-regulated by begomovirus. GDPDL5 and GDPDL6 are not induced by aphids and bacteria but downregulated by CabLCV. As for organ-specific expression, except for GDPDL5-RLP and GDPDL6-RLP which only expressed in flowers and siliques, the remaining GDPDLs are expressed in all organs tested, although to a different extent (Figure 4B). While GDPDL1 and GDPDL2 expressions predominate in the developed rosette, GDPDL3 is highly expressed in germinated seeds, and the GDPDL4 expression is fairly distributed in all organs.
Figure 4.
Analysis in silico of the expression of GDPDL-RLPs. (A) The expression profile of the GDPDL-RLPs in response to pathogens. (B) The expression profile of the GDPDL-RLPs in different organs and developmental stages.
Pathogen-induced and organ-specific expression profiles of the predicted GDPDL-RLP genes were confirmed by qRT-PCR (Figure 5 and Figure 6). We also monitored the expression of the GDPDL-RLP genes in response to infections with tobacco rattle virus (TRV) and CabLCV. The antibacterial immune responses (PTI) were activated by treatment with flg22, and the expression of GDPDLs was monitored (Figure 5). Consistent with the microarray data, GDPL5 and GDPL6 expression was not affected by flg22 treatment but was downregulated by CabLCV, whereas GDPDL1 and GDPDL2 were induced by flg22 and CabLCV. All 5 GDPDLs analyzed by qRT-PCR were induced by TRV, a plant RNA virus. Remarkably, these GDPDL proteins are interconnected via interactions with RNA recognition motif-containing proteins, which form centralized hubs in the network interaction (Figure 3A, Table 11). This result may suggest an involvement of GDPDLs in the antiviral response induced by an RNA virus.
Figure 5.
Expression analysis of the GDPDL genes in response to biotic signals. For the flg22-induced expression of GDPDLs (as indicated in the figure), 12-day-old Arabidopsis seedlings were treated with 100 nM flg22, and total RNA was prepared from 100 µg of a pool of 10 flg22-treated plants. For TRV infection, Arabidopsis leaves were mechanically inoculated with TRV from N. benthamiana-infected leaves, and TRV infection was diagnosed by PCR. For CabLCV infection, Arabidopsis plants were inoculated with infectious DNA-A and DNA-B clones, and viral accumulation was monitored by PCR. After 15 days of TRV inoculation and 21 days of CabLCV inoculation, total RNA was extracted from a pool of 10 TRV- and CabLCV-infected plants. The transcript accumulation of the indicated genes was monitored by quantitative RT-PCR with gene-specific primers. The gene expression was calculated by the 2−∆CT method using actin as an endogenous control. The error or standard bars indicate the mean ± SD (n = 3, technical replicates). * p < 0.05.
Figure 6.
Organ-specific expression of the GDPDL genes. Total RNA was extracted from different Arabidopsis organs (as indicated in the figure) of 35-day-grown plants. We used 3 samples of different pools of 10 plants each (therefore n = 3, biological replicates), and the transcript levels of the indicated genes (GDPDL1, GDPDL2, GDPDL3, GDPDL, GDPDL5, and GDPDL6) were determined by qRT-PCR using gene-specific primers. The gene expression was calculated by the 2−∆CT method using actin as an endogenous control. The error or standard bars indicate the mean ± SD (n = 3 biological replicates + n = 3 technical replicates each) of three independent experiments.
We also confirmed the expression profile of these GDPDL genes in different tissues by qRT-PCR. We used the root, pedicel, inflorescence axis, and flower tissues. The expression levels of GDPDL1 and GDPDL2 are similar in all tissues (Figure 6A,B). The highest expression levels were identified in the inflorescence axis and pedicel, suggesting distinct functions in development. Likewise, GDPDL3 is most expressed in roots and barely detected in other tissues (Figure 6C). Interestingly, the expression levels of GDPDL4 are regular in all tissues, showing that this protein may have a varied role during development (Figure 6D). In contrast, qRT-PCR confirmed that the GDPDL5 and GDPDL6 transcripts accumulated to elevated levels in flowers (Figure 6E, 6F). These gene expression analyses confirmed that GDPDL-RLPs are expressed in response to stimuli and development, substantiating the argument that they may form a new class of RLPs involved in immunity and developmental signaling.
3. Discussion
Due to the functional relevance of the RLK family in several biological processes, this large family has been extensively studied in different plant species [6,9,71,72,73,74,75]. In contrast, far less is known about the plant RLP family, despite their conceptual relevance in signaling modules. RLPs can perceive external signals but depend on association with RLKs for signal transduction due to the lack of a cytoplasmic kinase domain at the C-terminus. The absence of a conserved kinase domain precludes using sequence comparison algorithms for genome-wide studies of the plant RLP family. Thus, identifying RLPs in plant genomes is challenging, and few RLPs have been described in plant species. Moreover, a large-scale RLP prediction tool has not been developed. Here, we developed the RLPredictiOme method based on machine learning approaches and Bayesian inference for the throughout prediction of RLPs.
Typically, the ML classification models applied in plant molecular biology require actual data to train ML-supervised algorithms [54,76,77,78]. The RLPredictiOme can predict RLP subfamilies using the RLK ectodomain and simultaneously six types of features during the prediction process. The prediction model consists of three steps subsequently built with trained models and different algorithms capable of distinguishing RLP from NRLP, RLP from RLKs, and finally, predicting an RLP subfamily. The combination of several ML models with different algorithms has been applied for protein and viral sequence classification [58,63]. Using different classifiers requires methods that compile the results of the classifiers into a single final prediction. Some methods have used different techniques for model combinations, including a majoritarian vote of the classifiers or an average probability for the classifications [63,79]. The approaches applied in the RLPredictiOme by combining models are based on the success and failure of predictions, which are modeled with Bayesian inference. In each step after the classifications, the Bayesian inference is applied. The validation results of the RLPredictiOme showed high probabilities for classifying RLPs proteins (See Table 7, columns RLP-NRLP Probability, RLP-RLK Probability, and RLP-Subfamily Probability). In contrast, NRLP proteins were predicted with a lower probability (Table 8). Finally, based on the probability of Bayesian inferences for each step, the last step is used as a decision-making process for the prediction of RLPs (Figure 1F). The RLPredictiOme predicts RLP proteins with a probability ranging from 0.79 to 0.99 (See Table 7, Table 8 and Table 9, column Decision probability). Thus, the ML models can be successfully combined with Bayesian inference to perform robust high-throughput predictions of RLPs in plant genomes.
The RLPredictiOme could predict new RLP subfamilies with higher probability in all steps, although groups less represented were also classified into a corresponding subfamily, yet with lower probability. Furthermore, groups less represented by RLPs tended to be classified within other RLP subfamilies. This other RLP classification was the case of the probable lipid transfer-RLP subfamily, which shares similar functional characteristics with LysM-RLP. The lipid transfer proteins (LTPs), already described as non-specific lipid transfer proteins (nsLTPs), contain an eight-cysteine motif that is stabilized by four disulfide bonds (Wang et al., 2019). The probable lipid transfer family (PLT)-RLPs found by RLPredictiOme harbor a five-cysteine motif (CC-Xn-CXC-Xn-C) in the TP_2 functional domain differently from the typical nsLTPs [80]. Phylogenetics relationships, structure, and genome-wide distribution of LTPs, involved in response to nematodes, have been described in cucumbers (Wang et al., 2019). Furthermore, PLTs have been shown to play a crucial role in regulating various plant biological processes and responding to biotic and abiotic stress [81,82]. Due to evidence of association with kinases, PTL-RLPs may be classified as a new subfamily of RLPs or may represent an expansion of the LysM-RLP subfamily, which exhibits similar functional roles.
In silico and in vitro analyses of GDPDL-RLPs confirmed the efficiency of the RLPredictiOme in identifying a new family of RLPs based on the ectodomain of GDPDL-RLK sequences. The GDPDL-RLK is a reduced class of RLKs in plants. Among all the plant species analyzed, they have been found only in Arabidopsis halleri (Araha.28943s0001.1), Arabidopsis lyrata (475793), Arabidopsis thaliana (AT1G66980.1), Boechera stricta (Bostr.26959s0213.1, Bostr.26959s0216.1), and Brassica rapa (Brara.K00110.1), all from the Brassicaceae family, and Capsella grandiflora (Cagra.0792s0001.1) and Panicum virgatum (Pavir.6NG294600.1), from the Poaceae family. Despite only one GDPDL-RLK in the Arabidopsis genome [83], RLPreditiOme identified five sequences as GDPDL-RLP. Furthermore, the GDPDL-RLK subfamily has been maintained in only a few plant species; thereby, this family is likely suffering a reduction in size and distribution. The GDPDL2-RLK (AT1G66980) has been previously characterized as SNC4, an atypical receptor-like kinase with a predicted extracellular GDPD domain involved in regulating plant immunity [84]. The glycerophosphodiester phosphodiesterase (GDPD) hydrolyzes glycerophosphodiesters into sn-glycerol-3-phosphate (G-3-P) and plays a significant role in various biological processes [84]. The GDPDL2-RLK ectodomain is structurally similar to the predicted GDPDL-RLPs (Figure 2B). Molecular evolution investigated by calculating ka/ks of GDPDL-RLP-GDPDL-RLK pairs revealed a significant rate of synonymous substitutions indicating that although the kinase domain has been lost, the functional characteristics of the ectodomain remained conserved among evolution (Table 10).
A common feature of the RLK subfamilies is that they are often more extensive than the RLP subfamily counterparts, which suggests that some members of the RLK subfamilies have lost their conserved C-terminal kinase domain during evolution. In contrast, RLPredictiOme identified a new RLP subfamily, GDPDL-RLP, which seems to have expanded compared to the corresponding GDPDL-RLK subfamily. Therefore, we were interested in examining the expression profile of the GDPDL-RLP members to ensure a basal level of expression during development or in response to pathogens. In silico analyses from publicly available expression databases indicated that the RLP members display differential expression profiles in response to pathogens and different organs, indicating that they may be involved in development and immunity.
GDPDL1 (GDPGL-RLP) has been previously shown to be expressed in the rosettes of Arabidopsis plants [85]. We confirmed by qRT-PCR that GDPDL1 is expressed in the pedicels of the rosette and flowers. GDPDL1 has also been shown to be involved in processes that confer rigidity to the cell wall, related to defense against insects, nematodes, and oomycetes [85]. Accordingly, the previously published microarray data showed a high GDPDL1 induction in response to these pathogens and pests.
GDPDL1 and GDPDL2 displayed the highest expression in pedicels and flower stems and were highly expressed in response to pathogens and flg22. Among all members of this new GDPDL family, GDPDL3 was barely detected in the organs examined except in roots, consistent with its role in root morphogenesis [86]. GDPL4 was uniformly expressed in all organs evaluated. GDPDL4 has been described as a highly expressed gene in rosettes and is involved in the development of root hair [85,87]. Therefore, the expression profile of already described GDPDLs is coordinated with their assigned function.
Two undescribed family members, GDPDL6 and GDPDL5, displayed elevated levels of expression in flowers, showing that both genes may be involved in the development of reproductive organs and structures. These genes are also induced by biotic signals, as RT-qPCR demonstrated they were upregulated by TRV infection and microarray data showed their slight induction by nematodes. We found that all GDPDLs are induced by the RNA virus TRV and form interconnected protein-protein hubs with RNA binding proteins. It would be relevant to investigate whether GDPDLs function in RNA virus infection. The expression pattern and evolution studies of members of the GDPGL-RLP subfamily further substantiate the notion that the members of this subfamily have maintained functional domains and may play relevant roles in development and plant defense.
4. Materials and Methods
4.1. Reclassification of the Plant RLK Ectodomains for Composing Datasets
The amino acid sequences of 80 plant species were retrieved from the Phytozome database (version 11.1 by DOE Joint Genome Institute, Lawrence Berkeley National Laboratory; https://phytozome.jgi.doe.gov/, accessed on 28 February 2020). We applied filters to remove unknown sequence proteins without functional annotation. The sequences were re-annotated using SMART (version 8.0, licensed by Creative Commons Licence, manufactured by Heidelberg, Germany; smart.embl-heidelberg.de) and Pfam (pfam.sanger.ac.uk) databases. Then, the amino acid sequences containing a predicted kinase domain were selected. The signal peptide was predicted using SignalP v.4.0 [50] and Phobius [88] software, whereas the transmembrane segment was identified using TMHMM [89] and Phobius software. Then, the sequences were filtered by using the criteria based on the presence of a signal peptide and a transmembrane segment. Furthermore, the redundant sequences were removed through CD-HIT algorithm [90]. Subsequently, the amino acid sequences were grouped according to the functional domain of the extracellular ectodomain (LRR-RLK, WAK-RLK, and LysMRLK, for example) [9,91].
4.2. Dataset Composition
For the classification of RLPs, we used three steps: two steps of binary classification and one multilabel classification. In summary, the first stage compares RLPs with other families of NRLP; the second compares RLP with receptor-like kinases (RLKs); and the third performs the classification of a protein sequence within an RLP subfamily using the functional ectodomain present in RLKs. In the first stage, the training dataset consisted of amino acid sequences containing the extracellular ectodomain, the region of the membrane segment, and the cytoplasmic region that precedes (upstream) the kinase domain of RLKs (but without the kinase domain) as a positive class (RLP). The negative class was composed of full-length amino acid randomly selected sequences (NRLP); the sequences of the positive class were removed from the negative dataset. The dataset was divided into three different datasets to increase the number of negative examples.
In the second stage, the positive class contained the training dataset (RLP), and the negative class used the full-length amino acid sequences of RLKs. In the third stage, the data from RLP positive classes were labeled according to the reclassification of RLKs based on their ectodomain. In this case, a putative LRR-RLP, for instance, contained an ectodomain of the leucine-rich repeat kinase receptor-like kinase (LRR-RLK), a transmembrane segment, and a short cytoplasmic region excluding a kinase domain. Furthermore, the whole dataset was distributed into ten different sub-datasets to work around the computational time limitations of the training.
4.3. Feature Extraction
Six types of feature types representing residue frequency composition were calculated for each residue sequence. These included (i) amino acid composition frequency of full-length sequence, (ii) amino acid composition frequency (monopeptide) of the N-terminal and C- terminal regions, (iii) dipeptide frequency, (iv) tripeptide frequency, (v) frequency of chemical properties of amino acid side chains (CPAASC), and (vi) CPAASC2 frequency of the N-terminal and C-terminal regions. A numerical feature vector was created for each sequence of positive and negative datasets. The CPAASC feature describes the frequency of the chemical properties of amino acid side chains, such as positively charged, negatively charged, polar uncharged, aromatic, nonpolar aliphatic, hydrophobicity, volume, and mass of the total number of amino acids in the full-length peptide sequence [63]. In contrast, the CPAASC2 is calculated by the frequency of the chemical properties of amino acid side chains of the N-terminal and C-terminal regions. The full-length sequence is split into two equal (or nearly equal) regions, and the proportion of amino acid composition was also calculated for each of these regions. We consider the N-terminus the first region of the complete amino acid sequence and the C-terminus the second region of the full-length sequence.
The amino acid composition feature describes the frequency of an individual amino acid type within the total number of amino acids in the full-length peptide sequence (Saravanan and Gautham, 2015). The amino acid composition comprises 20 features (ACDEFGHIKLMNPQRSTVWY). The amino acid composition frequency is calculated by the individual amino acid type of the N-terminal and C-terminal regions. The amino acid composition frequency in the N-terminal and C-terminal regions comprises 40 features. The dipeptide frequency describes all combinations of amino acid pairs and comprises 400 features [92]. The tripeptide frequency describes all combinations of three amino acids resulting in 8000 features [93].
The six types of features were used to train all classification models in the three proposed steps. In summary, three training datasets totaling 18 training sets were created for each feature type to compare RLPs with NRLPs proteins (first stage). However, to compare RLPs with RLKs (second stage), one training dataset for each feature type was created. Finally, to classify RLPs within a subfamily (third stage), ten training datasets for each feature type were created, resulting in 60 training sets.
4.4. Dealing with Imbalanced Datasets
The superfamily RLK in plants has been broadly characterized and is subdivided into different groups with a different number of members in the subfamilies. The LRR-RLK is the largest subfamily, whereas other subfamilies have a lower frequency of plant members; we used the SMOTE algorithm [94] to oversample the minority class, resulting in a balanced dataset. The SMOTE creates synthetic samples based on the values of the features from the minor class.
4.5. Machine Learning Algorithms
The RLPredictiOme method embeds several ML models built with the previously described training sets. This study tested 20 ML algorithms to select the one that suits the supervised learning task. Those algorithms are implemented in the Python library Scikit-learn v.0.22.1 [95]. The algorithms AdaBoost, probability calibration, Gradient Boosting, K-Nearest Neighbors, Linear discriminant analysis, Logistic Regression, and Deep Neural Network were selected, respectively, to compose RLPredictiOme [96,97,98,99,100,101,102,103,104].
4.6. Performance Assessment of the Models
The evaluation metrics used in bioinformatics were applied to choose the most efficient algorithms and training models. We evaluated accuracy, F-measure, false discovery rate (FDR), Mathew’s correlation coefficient (MCC), precision, sensitivity, and specificity for each training set and algorithm. These metrics are calculated based on the confusion matrix (contingence matrix) using the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), respectively. For multi-class models, PyCM python library was used (multi-class confusion matrix library in Python) [105].
4.7. Bayesian Inference in Ensemble Methods
Ensemble methods under an ML approach combine the predictions of several classification models with improving the overall performance. Thus, it attempts to avoid misclassification due to noise, bias, and data variance reductions. In an ensemble method, several models are used to predict each data instance. In the binary classification contrasts involving the models RLPs versus NRLPs, and RLPs versus RLKs, we assumed the results provided by n independent Bernoulli trials (0 or 1 values) with probability parameter π. Thus, the number of successes (x) derived from these trials follows a binomial distribution [106]. In this context, we assumed a Beta distribution as the prior distribution for π [107]. Under the Bayes theorem, the posterior distribution for π (probability of success of classification) is a beta distribution and is conjugated with a binomial distribution. The multilabel models to classify RLP sub-families have different probabilities of success. Thus, the sum of the classification success for each subfamily follows a multivariate generalization of the binomial distribution, named multinomial distribution. We assumed the multinomial distribution for response vector x and probability of observed, and N is a vector of the total counts in each RLP sub-families. Thus, the data distribution assumes a multinomial model for all trials. The prior probability widely used for multinomial models is the Dirichlet distribution, which presents the parameters π and θ. The data vector (x) accounts for the total counts in each RLP sub-family.
We perform Bayesian inference using the Bayesian statistical modeling and PyMC3 Python library, which uses the Markov chain Monte Carlo (MCMC) algorithms to explore the posterior distributions [108]. Based on previous analyses with MCMC chains, we opted to use a single chain with 10,000 iterations per amino acid sequence. We used burn-in to 2000 iterations and four chains for all models. The Gibbs sampler algorithm was used to generate random samples from the posterior distribution for all analyses [109].
4.8. Classifier Evaluation Strategy
The classification models were evaluated using 10-fold cross-validation. Thus, the data were divided into ten subsets, assuming the training with nine datasets and validation with one dataset. This procedure was repeated ten times, whereas the testing for the RLPredictiOme method was performed with three independent datasets. One dataset was composed of 44 RLPs already described in the literature, and other datasets with 57 LRR-RLPs and legume-like (L-type) lectins, G-type lectins, calcium-dependent (C-type) lectins, and the lectin-like Lysin-motifs (LysM) described in Arabidopsis [53,110,111]. In addition, 100 random amino acid sequences were created by an in-house algorithm to demonstrate that the classifiers do not calculate random predictions.
4.9. RLP Subfamilies Downstream Analysis
The function domain prediction analysis was carried out with the Pfam database (version 31, licensed by Creative Commons Zero (“CC0”), manufactured by European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI; Hinxton, Cambridge; http://pfam.xfam.org/) with a Hidden Markov Model (HMM) algorithm implemented in Hummer software. The signal peptide and transmembrane segment were predicted with SignalP v.4.0 and TMHMM software, respectively [50]. The topology diagram was performed with Protter Web server [112]. The sequence alignment of the RLP superfamily was conducted using the Muscle algorithm (version V1.4.4 by EMBL-EBI, Hinxton, Cambridges, United Kingdom; www.ebi.ac.uk/Tools/msa/muscle/). The phylogenetic analysis was performed by the maximum likelihood statistical method with 10.000 bootstraps using FastTree software [113]. The tree was edited using the FigTree (version V1.4.4 by Andrew Rambaut; http://tree.bio.ed.ac.uk/software/figtree/) software. The gene expression of the glycerophosphoryl diester phosphodiesterase RLP subfamily was investigated through the meta-analysis of transcriptomes using Geneinvestigator V3 [114] and ePlant [115] for the expression in tissues and responses to pathogens.
4.10. Protein-Protein Interaction (PPI) Network Analysis
GDPDLs- and SNC4-interacting proteins from Arabidopsis were used as a query term to identify their respective interactions described in the BAR database (Genome Evolution and Function (CAGEF, University of Toronto, Toronto, Canadá; http://bar.utoronto.ca/interactions/). The IntAct and Biogrid databases were selected for searching. The protein–protein interactions (PPI) were visualized in the Cytoscape software (version 3.8.1, licensed by LGP, manufactured by National Resource for Network Biology (NRNB, USA; https://cytoscape.org/), which allowed us to spot the firework topology of the interactions network and measure the network centrality metrics for each protein. We used betweenness, closeness, eccentricity, and degree. Briefly, the betweenness centrality in the PPI network of the graph G = (V, E) was calculated by the number of times a protein interacts along the shorter paths among all nodes. The closeness centrality of a protein v is the sum of the shortest path distances from w to all other proteins. The eccentricity centrality of a protein v is the maximum distance from v to all other proteins in graph G. The degree of centrality of protein v is the total number of adjacent proteins.
4.11. Plant Growth, Treatment with flg22, and Viral infection with TRV and CabLCV
All gene expression experiments used Arabidopsis thaliana ecotype Columbia (Col-0) at different ages. The seeds were germinated on half-strength Murashige and Skoog (MS; Sigma = Aldrich) plates containing 10% (w/v) sucrose and 0.8% (w/v) agar, sterile, and grown under normal growth conditions at 21 °C under a 16 h light/8 h dark cycle. After 10 days, the seedlings were transferred to a tissue culture plate containing 2 mL of 100 nM flg22 (Sigma-Aldrich), and incubated for 15 min [116]. For the viral infection assay with tobacco rattle virus (TRV), Agrobacterium cultures containing TRV-RNA1 (pTRV1) and TRV-RNA2 (pTRV2) T-DNA constructs were infiltrated onto the lower leaf of four-leaf stage N. benthamiana plants using a 1-mL needleless syringe. Infected leaves were confirmed by conventional RT-PCR using TRV-RNA2-specific primers. TRV was mechanically inoculated in A. thaliana grown in soil in a growth chamber for 14 days by rubbing the leaves with sap (0.05 M K2HPO4, pH 7.2, 0.01 M Na2SO3) from infected N. benthamiana leaves. After 2 weeks of inoculation, viral infection was confirmed by RT-PCR. For infection with cabbage leaf curl virus (CabLCV), plants at the seven-leaf stage were inoculated with plasmids containing partial tandem repeats of CabLCV DNA-A and DNA-B [117], using biolistic delivery as previously described [118,119]. Inoculated plants were transferred to a growth chamber, and infection was confirmed by conventional PCR using CabLCV DNA-B-specific primers.
4.12. RNA Extraction, Synthesis of cDNA, and qRT-PCR Analysis
For quantitative RT-PCR, total RNA was extracted from frozen leaves or seedlings with TRIzol (Invitrogen) according to the instructions from the manufacturer. To quantify flg22-induced expression, total RNA was extracted from a pool of 10 flg22-treated seedlings (as described in 4.11). For the TRV infection experiment, total RNA was extracted from a pool of 10 infected plants two weeks post-inoculation (as described in 4.11). For CabLCV infection, total RNA was extracted from a pool of 10 infected plants after 21 days of inoculation. To quantify gene expression in different organs, total RNA was extracted from flowers, the inflorescence axis, pedicels of 35 days-soil-grown Col-0 plants, and from roots of 10 days-grown plants in MS medium under the conditions described in 4.11. We used 3 samples of different pools of 10 plants each (therefore n = 3, biological replicates) and three technical replicates.
Total RNA was treated with 2 units of RNase-free DNase (Promega). First-strand cDNA was synthesized from 3.5 mg of total RNA using oligo-dT(18) and Transcriptase Reverse M-MLV (Invitrogen), according to the manufacturer’s instructions. Real-time RT-PCR reactions were performed on ABI7500 equipment (Applied Biosystems), using SYBR Green PCR Master Mix (Bio-rad). The amplification reactions were performed as follows: 2 min at 50 °C, 10 min at 95 °C, and 40 cycles of 94 °C for 15 s and 60 °C for 1 min. To quantify gene expression, we used the 2−∆Ct method and actin 3 (At3g53750) as the endogenous control genes for data normalization.
5. Conclusions
An extensive family of RLKs and RLPs on the cell surface perceive external stimuli and allows communication of plant cells with the environment. Due to their conceptual relevance in cell signaling, RLKs have been extensively studied and characterized. In contrast, little is known about the RLP family that does not harbor conserved domains to prototype genome-wide searching and characterization of members in different plant species. As a result of this investigation, a new method, based on artificial intelligence and machine learning models in combination with Bayesian inference, designated RLPredictiOme, is proposed to perform genome-wide surveys of RLPs in plant species.
We provided evidence indicating that RLPredictiOme reliably predicts RLP subfamilies in plant genomes. First, the ML models achieved high accuracy, precision, sensitivity, and specificity for predicting RLPs with relatively high probability ranging from 0.79 to 0.99. Second, in the validation tests, more than 90% of known RLPs from Arabidopsis and rice were correctly predicted via RLPredictiOme. Finally, RLPredictiOme may have outperformed the predicting methods based on sequence comparison because it discovered new RLP subfamilies in the Arabidopsis genome. Therefore, PredctOme provides a reliable means to rationalize functional studies of the RLP gene family.
The new GDPDL-RLP subfamily seems to have expanded from the only GDPDL-RLK representative in the Arabidopsis genome. All five GDPDL-RLPs were expressed in different organs and responded to biotic signals. Evolution studies showed that their ectodomain may have undergone purifying selection, indicating that the members of this subfamily may have kept conserved functional signatures during evolution. In addition, an in silico analysis demonstrated that GDPDL-RLPs form biologically relevant hubs in the GDPDL-RLP-Arabidopsis protein-protein interactions network. Collectively, these biological studies confirmed the prediction of the new GDPDL-RLP subfamily.
In addition to using a set of conventional extractable features for training the classification models, RLPredictiOme also filters the conserved characteristics of the RLP configuration. These conserved attributes include the presence of a signal peptide, RLK ectodomains, a transmembrane segment, and the lack of a C-terminal kinase domain. Therefore, RLPredictiOme has the potential to predict RLPs from other organisms as well. Furthermore, the consistent and expanded results using RLPredctOme, which applies a different approach from sequence comparison methods, certify this new method as an innovative and promising tool for predicting RLPs. RLPredictOme will ultimately serve as an essential complement for protein annotation, identification, and functional prediction of novel RLPs in different plant species and organisms.
Acknowledgments
This work was partially supported by the Brazilian funding agencies: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), and the National Institute of Science and Technology in Plant-Pest Interactions (INCTIPP).
Abbreviations
| ACC | accuracy | ML | machine learning |
| BAK1 | BRI1-ASSOCIATED RECEPTOR KINASE1 | MLPL | major latex protein-like |
| BRI1 | BRASSINOSTEROID INSENSITIVE 1 | MS | Murashige and Skoog |
| CabLCV | cabbage leaf curl virus | NEP1 | NECROSIS- AND ETHYLENE-INDUCING PEPTIDE 1 |
| CAP | adenylate-cyclase-associated | NLPs | NEP1-LIKE PROTEINS |
| CERK1 | CHITIN ELICITOR RECEPTOR KINASE 1 | NRLPs | non-RLPs |
| CLV1 | CLAVATA1 | nsLTP | non-specific lipid transfer proteins |
| CPAASC2 | chemical properties of amino acid side chains 2 | PAMPs | pathogen-associated molecular patterns |
| DAMPs | damage-associated molecular patterns | PEPR1 | PEP1 RECEPTOR 1 |
| ECD | extracellular domain | PEPR2 | PEP1 RECEPTOR 2 |
| EPF1 | EPIDERMAL PATTERNING FACTOR 1 | PPI | protein-protein interaction |
| EPF2 | EPIDERMAL PATTERNING FACTOR 2 | PRRs | pattern recognition receptors |
| ER | endoplasmic reticulum | PSK | PHYTOSULFOKINE |
| ERL1 | ERECTA-LIKE 1 | PSKR1 | PHYTOSULFOKINE RECEPTOR 1 |
| ETI | effector-triggered immunity | PPI | protein-protein interactions |
| FDR | false discovery rate | PTI | PAMP-triggered immunity |
| GDPDL | glycerophosphoryl diester phosphodiesterase family | RLCK | receptor-like cytoplasmic kinases |
| HMM | hidden Markov model | RLP | receptor-like protein |
| LRR | leucine-rich repeat | SOBIR1 | SUPPRESSOR OF BIR1-1 |
| LRR-RLK | leucine-rich repeat kinase receptor-like kinase | SP | signal peptide |
| LYM1 | LYSIN-MOTIF 1 | TMM | RLP TOO MANY MOUTHS |
| LYM3 | LYSIN-MOTIF 3 | TN | true negatives |
| LysM | lysin-motifs | TP | true positives |
| MCC | Mathew’s correlation coefficient | TRV | tobacco rattle virus |
Supplementary Materials
The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms232012176/s1.
Author Contributions
J.C.F.S., conceptualization, writing—original draft preparation; M.A.F. conducted laboratory experiment; T.F.M.C., server configuration online and front-end developer; F.F.S., S.d.A.S., S.H.B., E.P.B.F., writing—review and editing, supervision; E.P.B.F., project administration, and funding acquisition. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data are available at http://209.145.56.49:8080/web/.
Conflicts of Interest
The authors declare no conflict of interest.
Funding Statement
This research was funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq Grant no. 403819/2021-0 to E.P.B.F.) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais, Brazil (Fapemig Grants no APQ-01282-17 and RED-00205-22 to E.P.B.F.).
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Tang D., Wang G., Zhou J.M. Receptor kinases in plant-pathogen interactions: More than pattern recognition. Plant Cell. 2017;29:618–637. doi: 10.1105/tpc.16.00891. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.He Y., Zhou J., Shan L., Meng X. Plant cell surface receptor-mediated signaling–a common theme amid diversity. J. Cell Sci. 2018;131:jcs209353. doi: 10.1242/jcs.209353. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Shiu S.H., Karlowski W.M., Pan R., Tzeng Y.H., Mayer K.F., Li W.H. Comparative analysis of the receptor-like kinase family in Arabidopsis and rice. Plant Cell. 2004;16:1220–1234. doi: 10.1105/tpc.020834. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Ma X., Xu G., He P., Shan L. SERKing coreceptors for receptors. Trends Plant Sci. 2016;21:1017–1033. doi: 10.1016/j.tplants.2016.08.014. [DOI] [PubMed] [Google Scholar]
- 5.Botos I., Segal D.M., Davies D.R. The structural biology of Toll-like receptors. Structure. 2011;19:447–459. doi: 10.1016/j.str.2011.02.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Shiu S.H., Bleecker A.B. Plant receptor-like kinase gene family: Diversity, function, and signaling. Sci. STKE. 2001;2001:re22. doi: 10.1126/stke.2001.113.re22. [DOI] [PubMed] [Google Scholar]
- 7.Shiu S.H., Bleecker A.B. Expansion of the receptor-like kinase/Pelle gene family and receptor-like proteins in Arabidopsis. Plant Physiol. 2003;132:530–543. doi: 10.1104/pp.103.021964. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Gao L.L., Xue H.W. Global analysis of expression profiles of rice receptor-like kinase genes. Mol. Plant. 2012;5:143–153. doi: 10.1093/mp/ssr062. [DOI] [PubMed] [Google Scholar]
- 9.Sakamoto T., Deguchi M., Brustolini O.J., Santos A.A., Silva F.F., Fontes E.P. The tomato RLK superfamily: Phylogeny and functional predictions about the role of the LRRII-RLK subfamily in antiviral defense. BMC Plant Biol. 2012;12:229. doi: 10.1186/1471-2229-12-229. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Zhou F., Guo Y., Qiu L.J. Genome-wide identification and evolutionary analysis of leucine-rich repeat receptor-like protein kinase genes in soybean. BMC Plant Biol. 2016;16:58. doi: 10.1186/s12870-016-0744-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Li J., Chory J. A putative leucine-rich repeat receptor kinase involved in brassinosteroid signal transduction. Cell. 1997;90:929–938. doi: 10.1016/S0092-8674(00)80357-8. [DOI] [PubMed] [Google Scholar]
- 12.Lee J.S., Kuroha T., Hnilova M., Khatayevich D., Kanaoka M.M., McAbee J.M., Sarikaya M., Tamerler C., Torii K.U. Direct interaction of ligand–receptor pairs specifying stomatal patterning. Genes Dev. 2012;26:126–136. doi: 10.1101/gad.179895.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Jia G., Liu X., Owen H.A., Zhao D. Signaling of cell fate determination by the TPD1 small protein and EMS1 receptor kinase. Proc. Natl. Acad. Sci. USA. 2008;105:2220–2225. doi: 10.1073/pnas.0708795105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Cho S.K., Larue C.T., Chevalier D., Wang H., Jinn T.L., Zhang S., Walker J.C. Regulation of floral organ abscission in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA. 2008;105:15629–15634. doi: 10.1073/pnas.0805539105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Kumpf R.P., Shi C.L., Larrieu A., Stø I.M., Butenko M.A., Péret B., Riiser E.S., Bennett M.J., Aalen R.B. Floral organ abscission peptide IDA and its HAE/HSL2 receptors control cell separation during lateral root emergence. Proc. Natl. Acad. Sci. USA. 2013;110:5235–5240. doi: 10.1073/pnas.1210835110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Chen D., Guo H., Chen S., Yue Q., Wang P., Chen X. Receptor-like kinase HAESA-like 1 positively regulates seed longevity in Arabidopsis. Planta. 2022;256:21. doi: 10.1007/s00425-022-03942-y. [DOI] [PubMed] [Google Scholar]
- 17.Ogawa M., Shinohara H., Sakagami Y., Matsubayashi Y. Arabidopsis CLV3 peptide directly binds CLV1 ectodomain. Science. 2008;319:294. doi: 10.1126/science.1150083. [DOI] [PubMed] [Google Scholar]
- 18.Ou Y., Kui H., Li J. Receptor-like kinases in root development: Current progress and future directions. Mol. Plant. 2021;14:166–185. doi: 10.1016/j.molp.2020.12.004. [DOI] [PubMed] [Google Scholar]
- 19.Hirakawa Y., Shinohara H., Kondo Y., Inoue A., Nakanomyo I., Ogawa M., Sawa S., Ohashi-Ito K., Matsubayashi Y., Fukuda H. Non-cell-autonomous control of vascular stem cell fate by a CLE peptide/receptor system. Proc. Natl. Acad. Sci. USA. 2008;105:15208–15213. doi: 10.1073/pnas.0808444105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Wang J., Li H., Han Z., Zhang H., Wang T., Lin G., Chang J., Yang W., Chai J. Allosteric receptor activation by the plant peptide hormone phytosulfokine. Nature. 2015;525:265–268. doi: 10.1038/nature14858. [DOI] [PubMed] [Google Scholar]
- 21.Haruta M., Sabat G., Stecker K., Minkoff B.B., Sussman M.R. A peptide hormone and its receptor protein kinase regulate plant cell expansion. Science. 2014;343:408–411. doi: 10.1126/science.1244454. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Zhong S., Li L., Wang Z., Ge Z., Li Q., Bleckmann A., Wang J., Song Z., Shi Y., Liu T., et al. RALF peptide signaling controls the polytubey block in Arabidopsis. Science. 2022;375:290–296. doi: 10.1126/science.abl4683. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Macho A.P., Zipfel C. Plant PRRs and the activation of innate immune signaling. Mol. Cell. 2014;54:263–272. doi: 10.1016/j.molcel.2014.03.028. [DOI] [PubMed] [Google Scholar]
- 24.Gómez-Gómez L., Boller T. FLS2: An LRR receptor–like kinase involved in the perception of the bacterial elicitor flagellin in Arabidopsis. Mol. Cell. 2000;5:1003–1011. doi: 10.1016/S1097-2765(00)80265-8. [DOI] [PubMed] [Google Scholar]
- 25.Zipfel C., Kunze G., Chinchilla D., Caniard A., Jones J.D., Boller T., Felix G. Perception of the bacterial PAMP EF-Tu by the receptor EFR restricts Agrobacterium-mediated transformation. Cell. 2006;125:749–760. doi: 10.1016/j.cell.2006.03.037. [DOI] [PubMed] [Google Scholar]
- 26.Yamaguchi Y., Pearce G., Ryan C.A. The cell surface leucine-rich repeat receptor for At Pep1, an endogenous peptide elicitor in Arabidopsis, is functional in transgenic tobacco cells. Proc. Natl. Acad. Sci. USA. 2006;103:10104–10109. doi: 10.1073/pnas.0603729103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Yamaguchi Y., Huffaker A., Bryan A.C., Tax F.E., Ryan C.A. PEPR2 is a second receptor for the Pep1 and Pep2 peptides andcontributes to defense responses in Arabidopsis. Plant Cell. 2010;22:508–522. doi: 10.1105/tpc.109.068874. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Miya A., Albert P., Shinya T., Desaki Y., Ichimura K., Shirasu K., Narusaka Y., Kawakami N., Kaku H., Shibuya N. CERK1, a LysM receptor kinase, is essential for chitin elicitor signaling in Arabidopsis. Proc. Natl. Acad. Sci. USA. 2007;104:19613–19618. doi: 10.1073/pnas.0705147104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Wan J., Zhang X.C., Neece D., Ramonell K.M., Clough S., Kim S.y., Stacey M.G., Stacey G. A LysM receptor-like kinase plays a critical role in chitin signaling and fungal resistance in Arabidopsis. Plant Cell. 2008;20:471–481. doi: 10.1105/tpc.107.056754. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Wan J., Tanaka K., Zhang X.C., Son G.H., Brechenmacher L., Nguyen T.H.N., Stacey G. LYK4, a lysin motif receptor-like kinase, is important for chitin signaling and plant innate immunity in Arabidopsis. Plant Physiol. 2012;160:396–406. doi: 10.1104/pp.112.201699. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Petutschnig E.K., Jones A.M., Serazetdinova L., Lipka U., Lipka V. The lysin motif receptor- like kinase (LysM-RLK) CERK1 is a major chitin-binding protein in Arabidopsis thaliana and subject to chitin-induced phosphorylation. Plant Biotechnol. J. 2010;285:28902–28911. doi: 10.1074/jbc.M110.116657. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Cao Y., Liang Y., Tanaka K., Nguyen C.T., Jedrzejczak R.P., Joachimiak A., Stacey G. The kinase LYK5 is a major chitin receptor in Arabidopsis and forms a chitin-induced complex with related kinase CERK1. eLife. 2014;3:e03766. doi: 10.7554/eLife.03766. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Ranf S., Gisch N., Schäffer M., Illig T., Westphal L., Knirel Y.A., Sánchez-Carballo P.M., Zähringer U., Hückelhoven R., Lee J., et al. A lectin S-domain receptor kinase mediates lipopolysaccharide sensing in Arabidopsis thaliana. Nat. Immun. 2015;16:426–433. doi: 10.1038/ni.3124. [DOI] [PubMed] [Google Scholar]
- 34.Yu H., Ruan H., Xia X., Chicowski A.S., Whitham S.A., Li Z., Wang G., Liu W. Maize FERONIA-like receptor genes are involved in the response of multiple disease resistance in maize. Mol. Plant Pathol. 2022;23:1331–1345. doi: 10.1111/mpp.13232. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Ortiz-Morea F.A., Liu J., Shan L., He P. Malectin-like receptor kinases as protector deities in plant immunity. Nat. Plants. 2022;8:27–37. doi: 10.1038/s41477-021-01028-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Chen X., Ding Y., Yang Y., Song C., Wang B., Yang S., Guo Y., Gong Z. Protein kinases in plant responses to drought, salt, andcold stress. J. Integr. Plant Biol. 2021;63:53–78. doi: 10.1111/jipb.13061. [DOI] [PubMed] [Google Scholar]
- 37.Invernizzi M., Hanemian M., Keller J., Libourel C., Roby D. PERKing up our understanding of the proline-rich extensin-like receptor kinases, a forgotten plant receptor kinase family. New Phytol. 2022;235:875–884. doi: 10.1111/nph.18166. [DOI] [PubMed] [Google Scholar]
- 38.Xie Y., Sun P., Li Z., Zhang F., You C., Zhang Z. FERONIA receptor kinase integrates with hormone signaling to regulate plant growth, development, and responses to environmental stimuli. Int. J. Mol. Sci. 2022;23:3730. doi: 10.3390/ijms23073730. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Xie Y.H., Zhang F.J., Sun P., Li Z.Y., Zheng P.F., Gu K.D., Hao Y.J., Zhang Z., You C.X. Apple receptor-like kinase FERONIA regulates salt tolerance and ABA sensitivity in Malus domestica. J. Plant Physiol. 2022;270:153616. doi: 10.1016/j.jplph.2022.153616. [DOI] [PubMed] [Google Scholar]
- 40.Yang L., Gao C., Jiang L. Leucine-rich repeat receptor-like protein kinase AtORPK1 promotes oxidative stress resistance in and AtORPK1-AtKAPP mediated module in Arabidopsis. Plant Sci. J. 2022;315:111147. doi: 10.1016/j.plantsci.2021.111147. [DOI] [PubMed] [Google Scholar]
- 41.Zhou H., Xiao F., Zheng Y., Liu G., Zhuang Y., Wang Z., Zhang Y., He J., Fu C., Lin H. PAMP-INDUCED SECRETED PEPTIDE 3 modulates salt tolerance through RECEPTOR-LIKE KINASE 7 in plants. Plant Cell. 2022;34:927–944. doi: 10.1093/plcell/koab292. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Liu Z., Hou S., Rodrigues O., Wang P., Luo D., Munemasa S., Lei J., Liu J., Ortiz-Morea F.A., Wang X., et al. Phytocytokine signalling reopens stomata in plant immunity and water loss. Nature. 2022;605:332–339. doi: 10.1038/s41586-022-04684-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Lin G., Zhang L., Han Z., Yang X., Liu W., Li E., Chang J., Qi Y., Shpak E.D., Chai J. A receptor-like protein acts as a specificity switch for the regulation of stomatal development. Genes Dev. 2017;31:927–938. doi: 10.1101/gad.297580.117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Jeong S., Trotochaud A.E., Clark S.E. The Arabidopsis CLAVATA2 gene encodes a receptor-like protein required for the stability of the CLAVATA1 receptor-like kinase. Plant Cell. 1999;11:1925–1933. doi: 10.1105/tpc.11.10.1925. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Willmann R., Lajunen H.M., Erbs G., Newman M.A., Kolb D., Tsuda K., Katagiri F., Fliegmann J., Bono J.J., Cullimore J.V., et al. Arabidopsis lysin-motif proteins LYM1 LYM3 CERK1 mediate bacterial peptidoglycan sensing and immunity to bacterial infection. Proc. Natl. Acad. Sci. USA. 2011;108:19824–19829. doi: 10.1073/pnas.1112862108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Albert I., Böhm H., Albert M., Feiler C.E., Imkampe J., Wallmeroth N., Brancato C., Raaymakers T.M., Oome S., Zhang H., et al. An RLP23–SOBIR1–BAK1 complex mediates NLP-triggered immunity. Nat. Plants. 2015;1:15140. doi: 10.1038/nplants.2015.140. [DOI] [PubMed] [Google Scholar]
- 47.Jones D.A., Thomas C.M., Hammond-Kosack K.E., Balint-Kurti P.J., Jones J.D. Isolation of the tomato Cf-9 gene for resistance to Cladosporium fulvum by transposon tagging. Science. 1994;266:789–793. doi: 10.1126/science.7973631. [DOI] [PubMed] [Google Scholar]
- 48.Thomas C.M., Jones D.A., Parniske M., Harrison K., Balint-Kurti P.J., Hatzixanthis K., Jones J. Characterization of the tomato Cf-4 gene for resistance to Cladosporium fulvum identifies sequences that determine recognitional specificity in Cf-4 and Cf-9. Plant Cell. 1997;9:2209–2224. doi: 10.1105/tpc.9.12.2209. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Postma J., Liebrand T.W., Bi G., Evrard A., Bye R.R., Mbengue M., Kuhn H., Joosten M.H., Robatzek S. Avr4 promotes Cf-4 receptor-like protein association with the BAK1/SERK3 receptor-like kinase to initiate receptor endocytosis and plant immunity. New Phytol. 2016;210:627–642. doi: 10.1111/nph.13802. [DOI] [PubMed] [Google Scholar]
- 50.Nielsen H. Protein Function Prediction. Springer; New York, NY, USA: 2017. Predicting secretory proteins with SignalP; pp. 59–73. [DOI] [PubMed] [Google Scholar]
- 51.Wang Y., Xu Y., Sun Y., Wang H., Qi J., Wan B., Ye W., Lin Y., Shao Y., Dong S., et al. Leucine-rich repeat receptor-like gene screen reveals that Nicotiana RXEG1 regulates glycoside hydrolase 12 MAMP detection. Nat. Commun. 2018;9:594. doi: 10.1038/s41467-018-03010-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Yu H., Xie W., Li J., Zhou F., Zhang Q. A whole-genome SNP array (RICE 6 K) for genomic breeding in rice. Plant Biotechnol. J. 2014;12:28–37. doi: 10.1111/pbi.12113. [DOI] [PubMed] [Google Scholar]
- 53.Jamieson P.A., Shan L., He P. Plant cell surface molecular cypher: Receptor-like proteins and 957 their roles in immunity and development. Plant Sci. J. 2018;274:242–251. doi: 10.1016/j.plantsci.2018.05.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Silva J.C.F., Teixeira R.M., Silva F.F., Brommonschenkel S.H., Fontes E.P. Machine learning approaches and their current application in Plant Mol Biol: A systematic review. Plant Sci. J. 2019;284:37–47. doi: 10.1016/j.plantsci.2019.03.020. [DOI] [PubMed] [Google Scholar]
- 55.Gastaldo P., Pinna L., Seminara L., Valle M., Zunino R. A tensor-based approach to touch modality classification by using machine learning. Rob. Auton. Syst. 2015;63:268–278. doi: 10.1016/j.robot.2014.09.022. [DOI] [Google Scholar]
- 56.Kang J., Schwartz R., Flickinger J., Beriwal S. Machine learning approaches for predicting radiation therapy outcomes: A clinician’s perspective. Int. J. Radiat. Oncol. Biol. Phys. 2015;93:1127–1135. doi: 10.1016/j.ijrobp.2015.07.2286. [DOI] [PubMed] [Google Scholar]
- 57.Zhang B., He X., Ouyang F., Gu D., Dong Y., Zhang L., Mo X., Huang W., Tian J., Zhang S. Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma. Cancer Lett. 2017;403:21–27. doi: 10.1016/j.canlet.2017.06.004. [DOI] [PubMed] [Google Scholar]
- 58.Silva J.C.F., Carvalho T.F., Fontes E.P., Cerqueira F.R. Fangorn Forest (F2): A machine learning approach to classify genes and genera in the family Geminiviridae. BMC Bioinform. 2017;18:431. doi: 10.1186/s12859-017-1839-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Pineda M., Pérez-Bueno M.L., Barón M. Detection of bacterial infection in melon plants by classification methods based on imaging data. Front. Plant Sci. 2018;9:164. doi: 10.3389/fpls.2018.00164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Moghimi A., Yang C., Miller M.E., Kianian S.F., Marchetto P.M. A novel approach to assess salt stress tolerance in wheat using hyperspectral imaging. Front. Plant Sci. 2018;9:1182. doi: 10.3389/fpls.2018.01182. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Gutiérrez S., Fernández-Novales J., Diago M.P., Tardaguila J. On-the-go hyperspectral imaging under field conditions and machine learning for the classification of grapevine varieties. Front. Plant Sci. 2018;9:1102. doi: 10.3389/fpls.2018.01102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Ma C., Xin M., Feldmann K.A., Wang X. Machine learning–based differential network analysis: A study of stress-responsive transcriptomes in Arabidopsis. Plant Cell. 2014;26:520–537. doi: 10.1105/tpc.113.121913. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Carvalho T.F.M., Silva J.C.F., Calil I.P., Fontes E.P.B., Cerqueira F.R. Rama: A machine learning approach for ribosomal protein prediction in plants. Sci. Rep. 2017;7:16273. doi: 10.1038/s41598-017-16322-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Fritz-Laylin L.K., Krishnamurthy N., Tör M., Sjölander K.V., Jones J.D. Phylogenomic analysis of the receptor-like proteins of rice and Arabidopsis. Plant Physiol. 2005;138:611–623. doi: 10.1104/pp.104.054452. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Buendia L., Girardin A., Wang T., Cottret L., Lefebvre B. LysM receptor-like kinase and LysM receptor-like protein families: An update on phylogeny and functional characterization. Front. Plant Sci. 2018;9:1531. doi: 10.3389/fpls.2018.01531. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Jonak C., Hirt H. Glycogen synthase kinase 3/SHAGGY-like kinases in plants: An emerging family with novel functions. Trends Plant Sci. 2002;7:457–461. doi: 10.1016/S1360-1385(02)02331-2. [DOI] [PubMed] [Google Scholar]
- 67.Nie J., Zhou W., Liu J., Tan N., Zhou J.M., Huang L. A receptor-like protein from Nicotiana benthamiana mediates VmE02 PAMP-triggered immunity. New Phytol. 2021;229:2260–2272. doi: 10.1111/nph.16995. [DOI] [PubMed] [Google Scholar]
- 68.Petersen N.H., Joensen J., McKinney L.V., Brodersen P., Petersen M., Hofius D., Mundy J. Identification of proteins interacting with Arabidopsis ACD11. J. Plant Physiol. 2009;166:661–666. doi: 10.1016/j.jplph.2008.08.003. [DOI] [PubMed] [Google Scholar]
- 69.Li Q., Ai G., Shen D., Zou F., Wang J., Bai T., Chen Y., Li S., Zhang M., Jing M., et al. A Phytophthora capsici effector targets ACD11 binding partners that regulate ROS-mediated defense response in Arabidopsis. Mol. Plant. 2019;12:565–581. doi: 10.1016/j.molp.2019.01.018. [DOI] [PubMed] [Google Scholar]
- 70.Ascencio-Ibánez J.T., Sozzani R., Lee T.J., Chu T.M., Wolfinger R.D., Cella R., Hanley-Bowdoin L. Global analysis of Arabidopsis gene expression uncovers a complex array of changes impacting pathogen response and cell cycle during geminivirus infection. Plant Physiol. 2008;148:436–454. doi: 10.1104/pp.108.121038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Liu J., Chen N., Grant J.N., Cheng Z.M., Stewart Jr C.N., Hewezi T. Soybean kinome: Functional classification and gene expression patterns. J. Exp. Bot. 2015;66:1919–1934. doi: 10.1093/jxb/eru537. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Yan J., Su P., Wei Z., Nevo E., Kong L. Genome-wide identification, classification, evolutionary analysis and gene expression patterns of the protein kinase gene family in wheat and Aegilops tauschii. Plant Mol. Biol. 2017;95:227–242. doi: 10.1007/s11103-017-0637-1. [DOI] [PubMed] [Google Scholar]
- 73.Zuo C., Liu H., Lv Q., Chen Z., Tian Y., Mao J., Chu M., Ma Z., An Z., Chen B. Genome-wide analysis of the apple (Malus domestica) cysteine-rich receptor-like kinase (CRK) family: Annotation, genomic organization, and expression profiles in response to fungal infection. Plant Mol. Biol. Rep. 2020;38:14–24. doi: 10.1007/s11105-019-01179-w. [DOI] [Google Scholar]
- 74.Yan J., Li G., Guo X., Li Y., Cao X. Genome-wide classification, evolutionary analysis and gene expression patterns of the kinome in Gossypium. PLoS ONE. 2018;13:e0197392. doi: 10.1371/journal.pone.0197392. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Dezhsetan S. Genome scanning for identification and mapping of receptor-like kinase (RLK) gene superfamily in Solanum tuberosum. Physiol. Mol. Biol. Plants. 2017;23:755–765. doi: 10.1007/s12298-017-0471-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Pal T., Jaiswal V., Chauhan R.S. DRPPP: A machine learning based tool for prediction of disease resistance proteins in plants. Comput. Biol. Med. 2016;78:42–48. doi: 10.1016/j.compbiomed.2016.09.008. [DOI] [PubMed] [Google Scholar]
- 77.Ni Y., Aghamirzaie D., Elmarakeby H., Collakova E., Li S., Grene R., Heath L.S. A machine learning approach to predict gene regulatory networks in seed development in Arabidopsis. Front. Plant Sci. 2016;7:1936. doi: 10.3389/fpls.2016.01936. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Kushwaha S.K., Chauhan P., Hedlund K., Ahrén D. NBSPred: A support vector machine-based high-throughput pipeline for plant resistance protein NBSLRR prediction. Bioinformatics. 2016;32:1223–1225. doi: 10.1093/bioinformatics/btv714. [DOI] [PubMed] [Google Scholar]
- 79.Dietterich T.G. Ensemble methods in machine learning; Proceedings of the International Workshop on Multiple Classifier Systems; Cagliari, Italy. 21–23 June 2000; pp. 1–15. [Google Scholar]
- 80.Wang X., Li Q., Cheng C., Zhang K., Lou Q., Li J., Chen J. Genome-wide analysis of a putative lipid transfer protein LTP_2 gene family reveals CsLTP_2 genes involved in response of cucumber against root-knot nematode (Meloidogyne incognita) Genome. 2020;63:225–238. doi: 10.1139/gen-2019-0157. [DOI] [PubMed] [Google Scholar]
- 81.Torres-Schumann S., Godoy J.A., Pintor-Toro J.A. A probable lipid transfer protein gene is induced by NaCl in stems of tomato plants. Plant Mol. Biol. 1992;18:749–757. doi: 10.1007/BF00020016. [DOI] [PubMed] [Google Scholar]
- 82.Kapoor R., Kumar G., Arya P., Jaswal R., Jain P., Singh K., Sharma T.R. Genome-wide analysis and expression profiling of rice hybrid proline-rich proteins in response to biotic and abiotic stresses, and hormone treatment. Plants. 2019;8:343. doi: 10.3390/plants8090343. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Bi D., Cheng Y.T., Li X., Zhang Y. Activation of plant immune responses by a gain-of-function mutation in an atypical receptor-like kinase. Plant Physiol. 2010;153:1771–1779. doi: 10.1104/pp.110.158501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Zhang Z., Liu Y., Ding P., Li Y., Kong Q., Zhang Y. Splicing of receptor-like kinase-encoding SNC4 and CERK1 is regulated by two conserved splicing factors that are required for plant immunity. Mol. Plant. 2014;7:1766–1775. doi: 10.1093/mp/ssu103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Duruflé H., Hervé V., Ranocha P., Balliau T., Zivy M., Chourré J., San Clemente H., Burlat V., Albenne C., Déjean S., et al. Cellwall modifications of two Arabidopsis thaliana ecotypes, Col, and Sha, in response to sub-optimal growth conditions: An integrative study. PlantSci.J. 2017;263:183–193. doi: 10.1016/j.plantsci.2017.07.015. [DOI] [PubMed] [Google Scholar]
- 86.Hayashi S., Ishii T., Matsunaga T., Tominaga R., Kuromori T., Wada T., Shinozaki K., Hirayama T. The glycerophosphoryl diester phosphodiesterase-like proteins SHV3 and its homologs play important roles in cell wall organization. Plant Cell Physiol. 2008;49:1522–1535. doi: 10.1093/pcp/pcn120. [DOI] [PubMed] [Google Scholar]
- 87.Salazar-Henao J.E., Lin W.D., Schmidt W. Discriminative gene co-expression network analysis uncovers novel modules involved in the formation of phosphate deficiency-induced root hairs in Arabidopsis. Sci. Rep. 2016;6:26820. doi: 10.1038/srep26820. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Käll L., Krogh A., Sonnhammer E.L. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 2004;338:1027–1036. doi: 10.1016/j.jmb.2004.03.016. [DOI] [PubMed] [Google Scholar]
- 89.Sonnhammer E.L., Von Heijne G., Krogh A. A hidden Markov model for predicting transmembrane helices in protein sequences; Proceedings of the ISMB; Montréal, QC, Canada. 28 June–1 July 1998; pp. 175–182. [PubMed] [Google Scholar]
- 90.Fu L., Niu B., Zhu Z., Wu S., Li W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Shiu S.H., Bleecker A.B. Receptor-like kinases from Arabidopsis form a monophyletic gene family related to animal receptor kinases. Proc. Natl. Acad. Sci. USA. 2001;98:10763–10768. doi: 10.1073/pnas.181141598. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Saravanan V., Gautham N. Harnessing computational biology for exact linear B-cell epitope prediction: A novel amino acid composition-based feature descriptor. OMICS. 2015;19:648–658. doi: 10.1089/omi.2015.0095. [DOI] [PubMed] [Google Scholar]
- 93.Bhasin M., Raghava G.P. Classification of nuclear receptors based on amino acid composition and dipeptide composition. J. Biol. Chem. 2004;279:23262–23266. doi: 10.1074/jbc.M401932200. [DOI] [PubMed] [Google Scholar]
- 94.Chawla N.V., Bowyer K.W., Hall L.O., Kegelmeyer W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002;16:321–357. doi: 10.1613/jair.953. [DOI] [Google Scholar]
- 95.Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., et al. Scikit-learn: Machine learning in Python. J. Mach. Learn Res. 2011;12:2825–2830. [Google Scholar]
- 96.Freund Y., Schapire R.E. A decision-theoretic generalization of online learning and an application to boosting; Proceedings of the European Conference on Computational Learning Theory; Barcelona, Spain. 13–15 March 1995. [Google Scholar]
- 97.Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 1999;10:61–74. [Google Scholar]
- 98.Friedman J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002;38:367–378. doi: 10.1016/S0167-9473(01)00065-2. [DOI] [Google Scholar]
- 99.Samworth R.J. Optimal weighted nearest neighbour classifiers. Ann. Stat. 2012;40:2733–2763. doi: 10.1214/12-AOS1049. [DOI] [Google Scholar]
- 100.Hastie T., Tibshirani R., Friedman J.H., Friedman J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Volume 2 Springer; New York, NY, USA: 2009. [Google Scholar]
- 101.Kim K.S., Choi H.H., Moon C.S., Mun C.W. Comparison of k-nearest neighbor, quadratic discriminant and linear discriminant analysis in classification of electromyogram signals based on the wrist-motion directions. Curr. Appl. Phys. 2011;11:740–745. doi: 10.1016/j.cap.2010.11.051. [DOI] [Google Scholar]
- 102.Schmidt M., LeRoux N., Bach F. Minimizing finite sums with the stochastic average gradient. Math. Program. 2017;162:83–112. doi: 10.1007/s10107-016-1030-6. [DOI] [Google Scholar]
- 103.King G., Zeng L. Logistic regression in rare events data. Polit. Anal. 2001;9:137–163. doi: 10.1093/oxfordjournals.pan.a004868. [DOI] [Google Scholar]
- 104.Hinton G., Deng L., Yu D., Dahl G.E., Mohamed A.-r., Jaitly N., Senior A., Vanhoucke V., Nguyen P., Sainath T.N., et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process Mag. 2012;29:82–97. doi: 10.1109/MSP.2012.2205597. [DOI] [Google Scholar]
- 105.Haghighi S., Jasemi M., Hessabi S., Zolanvari A. PyCM: Multiclass confusion matrix library in Python. J. Open Source Softw. 2018;3:729. doi: 10.21105/joss.00729. [DOI] [Google Scholar]
- 106.Feller W. An Introduction to Probability Theory and Its Applications. Volume 2 John Wiley & Sons; Hoboken, NJ, USA: 2008. [Google Scholar]
- 107.Gupta A.K., Nadarajah S. Handbook of Beta Distribution and Its Applications. CRC Press; Boca Raton, FL, USA: 2004. [Google Scholar]
- 108.Salvatier J., Wiecki T.V., Fonnesbeck C. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2016;2:e55. doi: 10.7717/peerj-cs.55. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 109.Geman S., Geman D. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. J. Appl. Stat. 1993;20:25–62. doi: 10.1080/02664769300000058. [DOI] [PubMed] [Google Scholar]
- 110.Faulkner C., Petutschnig E., Benitez-Alfonso Y., Beck M., Robatzek S., Lipka V., Maule A.J. LYM2-dependent chitin perception limits molecular flux via plasmodesmata. Proc. Natl. Acad. Sci. USA. 2013;110:9166–9170. doi: 10.1073/pnas.1203458110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111.Liu B., Li J.F., Ao Y., Qu J., Li Z., Su J., Zhang Y., Liu J., Feng D., Qi K., et al. Lysin motif–containing proteins LYP4 and LYP6 play dual roles in peptidoglycan and chitin perception in rice innate immunity. Plant Cell. 2012;24:3406–3419. doi: 10.1105/tpc.112.102475. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 112.Omasits U., Ahrens C.H., Müller S., Wollscheid B. Protter: Interactive protein feature visualization and integration with experimental proteomic data. Bioinformatics. 2014;30:884–886. doi: 10.1093/bioinformatics/btt607. [DOI] [PubMed] [Google Scholar]
- 113.Price M.N., Dehal P.S., Arkin A.P. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5:e9490. doi: 10.1371/journal.pone.0009490. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114.Hruz T., Laule O., Szabo G., Wessendorp F., Bleuler S., Oertle L., Widmayer P., Gruissem W., Zimmermann P. Genevestigator v3: A reference expression database for the meta-analysis of transcriptomes. Adv. Bioinform. 2008;2008:420747. doi: 10.1155/2008/420747. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115.Waese J., Fan J., Pasha A., Yu H., Fucile G., Shi R., Cumming M., Kelley L.A., Sternberg M.J., Krishnakumar V., et al. ePlant: Visualizing and exploring multiple levels of data for hypothesis generation in plant biology. Plant Cell. 2017;29:1806–1821. doi: 10.1105/tpc.17.00073. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116.Li B., Ferreira M.A., Huang M., Camargos L.F., Yu X., Teixeira R.M., Carpinetti P.A., Mendes G.C., Gouveia-Mageste B.C., Liu C., et al. The receptor-like kinase NIK1 targets FLS2/BAK1 immune complex and inversely modulates antiviral and antibacterial immunity. Nat. Commun. 2019;10:4996. doi: 10.1038/s41467-019-12847-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Fontes E.P., Santos A.A., Luz D.F., Waclawovsky A.J., Chory J. The geminivirus nuclear shuttle protein is a virulence factor that suppresses transmembrane receptor kinase activity. Genes Dev. 2004;18:2545–2556. doi: 10.1101/gad.1245904. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118.Santos A.A., Carvalho C.M., Florentino L.H., Ramos H.J., Fontes E.P. Conserved threonine residues within the A-loop of the receptor NIK differentially regulate the kinase function required for antiviral signaling. PLoS ONE. 2009;4:e5781. doi: 10.1371/journal.pone.0005781. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 119.Zorzatto C., Machado J.P.B., Lopes K.V., Nascimento K.J., Pereira W.A., Brustolini O.J., Reis P.A., Calil I.P., Deguchi M., Sachetto-Martins G., et al. NIK1-mediated translation suppression functions as a plant antiviral immunity mechanism. Nature. 2015;520:679–682. doi: 10.1038/nature14171. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The data are available at http://209.145.56.49:8080/web/.






