Abstract
Background
Cucumber (Cucumis sativus L.) is a globally important crop, yet its production is severely hampered by pathogen attacks, leading to substantial economic losses. Nucleotide-binding site-leucine-rich repeat (NLR) genes are critical components of the plant immune system, but a comprehensive understanding of their genetic diversity and evolutionary mechanisms in cucumber has been lacking.
Methods
Leveraging a cucumber pan-genome of 12 representative accessions, this study systematically investigated the NLR gene family through orthologous gene group (OGG) analysis, focusing on presence/absence variation (PAV), phylogeny, gene structure, integrated domains (IDs), NLR-pairs, copy number variation, and selection pressures. Publicly available transcriptome data were also integrated to analyze NLR gene expression patterns upon pathogen inoculation.
Results
This study identified a total of 879 pan-NLR genes, which were clustered into 33 core and 54 dispensable orthologous gene groups (OGGs). Comparative genomic analysis revealed that the cucumber NLR repertoire has undergone a contraction during its evolution. Phylogenetic analysis uncovered distinct evolutionary paths among NLR subfamilies: RNLs are highly conserved, CNLs and TNLs exhibit species-specific expansions, and NLs show a polyphyletic distribution, primarily arising from N-terminal domain loss events. Compared to wild populations (Indian), cultivated populations (especially the Eurasian group) exhibit reduced NLR diversity, mainly due to the loss of dispensable genes. This study identified specific IDs and NLR-pairs in wild germplasm, which are potentially associated with responses to specific pathogen pressures. Selection pressure analysis indicated that the NLR family is predominantly under purifying selection, although some genes experienced strong positive selection during domestication. Transcriptome analysis identified a candidate gene, CsaV4_2G000768, which contains a unique ID and shows a broad-spectrum response to multiple pathogens.
Conclusion
This study provides a systematic, pan-genomic view of the contraction and divergent evolution of the cucumber NLR gene family and elucidates the impact of domestication on its diversity. The identification of unique NLR-IDs and NLR-pairs from wild germplasm, along with a candidate gene potentially associated with broad-spectrum response, offers valuable theoretical insights and genetic resources for cucumber disease resistance breeding.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12870-026-08208-3.
Keywords: Cucumber, Pan NLR gene family, Evolution, Domestication, Integrated Domain, Disease resistance
Introduction
Cucumber (Cucumis sativus L.), a member of the Cucurbitaceae family, is one of the most widely cultivated and economically significant vegetable crops globally [1, 2]. Beyond its common use as a food source, cucumber is highly valued for its rich nutritional composition and potential medicinal properties [3]. Originating from the Indian subcontinent—a region abundant in wild cucumber resources—the crop has been cultivated and domesticated, leading to a diverse array of germplasm worldwide [4, 5]. However, in modern agricultural production, cucumber is highly susceptible to a variety of pathogens, including fungi, bacteria, and viruses, which result in substantial economic losses [2].
To counteract diverse pathogenic threats, plants have evolved a multi-layered innate immune system that relies on two primary classes of receptors to mediate immune responses [6]. The first layer is PAMP-Triggered Immunity (PTI), initiated by the recognition of Pathogen-Associated Molecular Patterns [7–9]. The second, more robust layer is Effector-Triggered Immunity (ETI), which is mediated by intracellular Nucleotide-binding site-Leucine-rich repeat (NLR) receptors [10, 11]. NLR proteins typically possess a conserved tripartite domain architecture: a central nucleotide-binding domain (NB-ARC), a C-terminal Leucine-Rich Repeat (LRR) domain, and a variable N-terminal domain that facilitates downstream signaling upon receptor activation [6, 11, 12]. Based on the variation in their N-terminal domains, NLRs are primarily classified into four major subfamilies: TNLs (TIR-NLRs), which contain a Toll/Interleukin-1 Receptor (TIR) domain; CNLs (CC-NLRs), which feature a Coiled-Coil (CC) domain; RNLs (RPW8-NLRs), characterized by a domain similar to RESISTANCE TO POWDERY MILDEW 8; and NLs, which lack a canonical N-terminal domain [12–14]. Plant NLRs recognize pathogen effectors through three principal mechanisms: indirect recognition by monitoring effector-induced modifications of host targets, direct binding of effectors to canonical NLR domains, and recognition mediated by integrated domains (IDs) [13–15]. Studies have shown that IDs can enable a host to rapidly acquire the ability to recognize pathogen effector proteins, and most functionally characterized NLRs with IDs exist in paired orientations. In these pairs, the ID-containing member often acts as a pathogen sensor, while the other member serves as a signaling executor [14, 16–19]. This recognition often culminates in a form of programmed cell death known as the Hypersensitive Response (HR), which restricts pathogen proliferation by sacrificing infected local cells [20, 21]. 
Although NLRs are crucial for plant disease resistance, functional research on cucumber NLRs lags behind that in model plants. Previous systematic studies have predominantly focused on the expression and function of specific subfamilies (CNL, RNL, TNL) within single cultivars, without adequately considering the genetic diversity among cucumber varieties and its impact on gene function [22–24]. Therefore, a pan-genome-based analysis of the NLR gene family can provide a more precise and comprehensive understanding of their functional diversity.
A cucumber pan-genome was recently released, comprising 12 accessions from four geographic groups: one Xishuangbanna, three Eurasian, three East Asian, and five Indian [4]. Previous studies have established that these populations are differentiated not only geographically but also genetically [25, 26]. Together with the public Cucumber-DB [27], this pan-genome provides a basis for investigating the functional differences of NLR family members across geographic groups and under artificial selection. This study leverages the cucumber pan-genome to conduct a comprehensive analysis of the NLR gene family, examining orthologous gene groups (OGGs) for presence/absence variation (PAV), phylogeny, gene structure, IDs, and NLR-pairs. Complemented by analyses of copy number variation and selection pressure, this study aims to dissect the compositional features and evolutionary dynamics of the NLR gene family across different cucumber geographic groups and to compare cultivated versus wild germplasm. Furthermore, by integrating public transcriptome data, this study analyzes the expression patterns of NLR genes under pathogen challenge to uncover their potential for genetic improvement, thereby providing a theoretical basis for the precision breeding of disease-resistant cucumber cultivars.
Methods
NLR identification and classification
Genome sequences, proteome sequences, and gene annotation (GFF3) files of the 12 cucumber pan-genome accessions were downloaded from the Cucumber-DB database (http://www.cucumberdb.com/#/download) [27]. All protein sequences in each genome were first screened by Hidden Markov Model (HMM) searches using HMMER 3.0 with default parameters. HMM profiles for the NB-ARC (PF00931), TIR (PF01582), and RPW8 (PF05659) domains were downloaded from InterPro (https://www.ebi.ac.uk/interpro/), while profiles for LRR-related domains (PF00560, PF07725, PF13306, PF13855) were used for subsequent structural annotation. CC motifs were predicted using NLR-Annotator, based on two NLR-specific coiled-coil motifs (motif16 and motif17) [28]. The resulting amino acid sequences of putative NLR genes were then used as queries in a genome-wide BLASTP analysis (E-value ≤ 1.0) for each genome. To confirm the presence of detectable NBS and IDs in each sequence, all candidates were further analyzed using the hmmscan tool in HMMER 3.0 against a local Pfam-A database (E-value ≤ 10⁻⁵, domE ≤ 10⁻³). For genes with multiple annotated transcripts, the longest transcript was selected. Finally, a gene was defined as an NLR if it contained at least one NB-ARC, TIR, or RPW8 domain; the presence of only an LRR or CC motif was deemed insufficient [14]. Protein sequences shorter than 50 amino acids were discarded. The Arabidopsis thaliana NLR sequences were sourced from the TAIR database (https://www.arabidopsis.org/). They were identified with the same method as the cucumber NLRs and cross-referenced against the existing TAIR annotations to ensure a comprehensive and accurate list.
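The inclusion rule and subfamily assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name, its arguments, and the precedence order (TIR, then RPW8, then CC) are assumptions made for the example, and the inputs are taken to be the Pfam accessions that already passed the hmmscan thresholds.

```python
# Pfam accessions named in the text above.
NB_ARC = "PF00931"
TIR = "PF01582"
RPW8 = "PF05659"
MIN_LENGTH = 50  # proteins shorter than 50 aa are discarded


def classify_nlr(pfam_hits, has_cc_motif, protein_length):
    """Return 'TNL', 'CNL', 'RNL', 'NL', or None for a candidate protein.

    pfam_hits: set of Pfam accessions that passed the hmmscan thresholds.
    has_cc_motif: True if a coiled-coil motif was predicted for the protein.
    """
    if protein_length < MIN_LENGTH:
        return None
    # An LRR or CC motif alone is not sufficient evidence for an NLR.
    if not pfam_hits & {NB_ARC, TIR, RPW8}:
        return None
    # The precedence TIR > RPW8 > CC is an assumption of this sketch.
    if TIR in pfam_hits:
        return "TNL"
    if RPW8 in pfam_hits:
        return "RNL"
    if has_cc_motif:
        return "CNL"
    return "NL"


# Hypothetical candidates: an NB-ARC plus TIR protein, and an LRR-only protein.
print(classify_nlr({NB_ARC, TIR}, False, 900))
print(classify_nlr({"PF00560"}, True, 400))
```

Under this rule a protein carrying only an NB-ARC domain and no recognizable N-terminal domain falls into the NL subfamily.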
Identification of Orthologous Gene Groups (OGGs)
This study identified OGGs following the methodology of Van de Weyer et al. [14]. First, all-versus-all alignments of the full-length identified NLR protein sequences were performed using DIAMOND software (v2.1.3, parameters: --max-target-seqs --more-sensitive --comp-based-stats) [29]. Next, potential orthologous and paralogous relationships were identified using orthAgogue (v82dcb7aeb67c, parameters: --use_scores --strict_coorthologs) [30]. Finally, based on the homology information, proteins were clustered into groups using the mcl software (v89609fe, parameter: -I 1.5) [31]. OGGs were required to contain at least two genes; all other genes were classified as private genes.
Phylogenetic analysis of the cucumber NLR gene family
The amino acid sequences of cucumber and Arabidopsis thaliana NLRs were extracted and aligned using MUSCLE v5.3 [32]. The alignment was trimmed using trimAl v1.2 [33]. An unrooted maximum likelihood phylogenetic tree was constructed using IQ-TREE2 v2.4.0 [34]. The best-fit model (JTT + F + R6) was determined based on the lowest Bayesian Information Criterion [35]. The phylogenetic tree was annotated and visualized using the R packages ggtree [36] and ggtreeExtra [37]. Furthermore, to specifically resolve the evolutionary origin of the NL type, which lacks an N-terminal domain, a separate phylogenetic analysis was performed using only the conserved NB-ARC domain sequences. NB-ARC domain sequences (Pfam: PF00931) from cucumber and Arabidopsis NLRs were extracted based on the hmmscan results. This NB-ARC domain-specific tree (Fig. S3) was constructed and visualized as described above for the full-length tree.
NLR-pair analysis
A Python 3.11.9 script was used to identify physically adjacent NLRs (< 15 kb apart) arranged in a head-to-head orientation within the pan-genome annotation; these were defined as NLR-pairs. Candidate pairs were manually inspected to identify IDs. Visualization was performed using the gggenes package (commit d218ed3) in R 4.4.2.
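The pair search described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the script used in the study: the `Gene` record, the function name, and the example coordinates are hypothetical; "adjacent" is taken to mean consecutive NLR genes on the same chromosome; and "head-to-head" is taken to mean divergent transcription (left gene on the minus strand, right gene on the plus strand).

```python
from collections import namedtuple

# Hypothetical record type; coordinates are 1-based and inclusive.
Gene = namedtuple("Gene", "gene_id chrom start end strand")

MAX_GAP = 15_000  # bp; paired genes must be < 15 kb apart


def find_head_to_head_pairs(genes, max_gap=MAX_GAP):
    """Return (left_id, right_id) tuples for divergently transcribed neighbours."""
    pairs = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom:
            continue
        gap = right.start - left.end
        # Head-to-head: the 5' ends of the two genes face each other.
        if left.strand == "-" and right.strand == "+" and gap < max_gap:
            pairs.append((left.gene_id, right.gene_id))
    return pairs


# Made-up NLR coordinates for illustration only.
nlrs = [
    Gene("nlr_a", "chr4", 1_000, 5_000, "-"),
    Gene("nlr_b", "chr4", 9_000, 14_000, "+"),
    Gene("nlr_c", "chr4", 40_000, 45_000, "+"),
]
print(find_head_to_head_pairs(nlrs))
```

Only NLR genes should be passed in, so that "neighbouring" refers to adjacent NLRs rather than to any adjacent gene.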
Gene duplication analysis
The duplication types of NLR genes were analyzed using DupGen_finder-unique.pl from DupGen_finder v.1.0.0 [38]. An all-versus-all BLASTP analysis (E-value threshold 1e−5) was conducted on the protein sequences from the 12 cucumber pan-genome accessions and Arabidopsis thaliana (as an outgroup), retaining the top five BLASTP hits based on E-value. The BLASTP output, along with GFF3 gene position files, served as input for DupGen_finder.
Transposable element identification
The panHiTE pipeline [39] from HiTE v3.3.0 [40] was used to identify transposable elements (TEs) in the genomes and annotation files of the 12 cucumber accessions. The genomic locations of NLR genes and TEs were compared using bedtools v2.31.1 to determine their overlap [41].
Selection pressure analysis
The rates of non-synonymous substitutions (Ka), synonymous substitutions (Ks), and their ratio (Ka/Ks) for paired genes were calculated using the Simple Ka/Ks Calculator (NG) tool integrated into TBtools [42]. Orthologous NLR gene pairs among the cucumber pan-genome accessions were used as input. The mean and standard deviation (SD) of Ka, Ks, and Ka/Ks values were calculated using the groupby() function in pandas v2.3.1 in Python 3.11.9.
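The per-group summary described above can be sketched as follows. To keep the example dependency-free it uses the standard-library `statistics` module as a stand-in for the pandas `groupby()` call named in the text; `statistics.stdev` is the sample standard deviation, which matches the pandas default. The function name, the input layout, and the choice to skip pairs with Ks = 0 are assumptions, and the numbers are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(records):
    """records: iterable of (group, ka, ks); returns {group: {metric: (mean, sd)}}."""
    grouped = defaultdict(lambda: {"Ka": [], "Ks": [], "Ka/Ks": []})
    for group, ka, ks in records:
        if ks == 0:
            continue  # Ka/Ks is undefined when Ks is zero (assumed handling)
        grouped[group]["Ka"].append(ka)
        grouped[group]["Ks"].append(ks)
        grouped[group]["Ka/Ks"].append(ka / ks)
    summary = {}
    for group, metrics in grouped.items():
        summary[group] = {
            # SD needs at least two values; report NaN otherwise.
            name: (mean(vals), stdev(vals) if len(vals) > 1 else float("nan"))
            for name, vals in metrics.items()
        }
    return summary


# Invented Ka and Ks values for two subfamilies.
example = [("CNL", 0.002, 0.004), ("CNL", 0.004, 0.008), ("NL", 0.003, 0.005)]
for group, stats in summarize_by_group(example).items():
    print(group, stats["Ka/Ks"])
```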
Transcriptome analysis
Transcriptome data (TPM values) for cucumber line “Chinese Long inbred line 9930” following inoculation with Cladosporium cucumerinum (CCU), Botrytis cinerea (GM), Podosphaera xanthii (PM), and Pseudomonas syringae pv. lachrymans (ALS) were obtained from a recent publication hosted on Cucumber-DB (http://www.cucumberdb.com/). A heatmap of gene expression was generated using the heatmap function in TBtools.
Data visualization and statistical analysis
Bar charts, pie charts, violin plots, and raincloud plots were generated using the matplotlib v3.10.1 and seaborn v0.13.2 packages in Python 3.11.9. Data analysis was performed using the SciPy v1.16.1 and NumPy v1.26.4 packages. Gene location ideograms were drawn using a modified Ideogram function from the RIdeogram R package [43].
Results
Identification, classification, and population distribution of cucumber Pan-NLR genes
To comprehensively characterize the cucumber pan-NLR gene family, this study conducted a systematic analysis of a pan-genome representing 12 accessions. Through scanning for key domains (NB-ARC, TIR, RPW8, and LRR) and precise identification of coiled-coil (CC) motifs with NLR-Annotator, this study identified a total of 879 pan-NLR genes (pan-NLRs). Based on their N-terminal domain configuration, these NLRs were classified into four major subfamilies: TNL (351), CNL (236), NL (242), and RNL (50) (Fig. 1A). This study further subdivided them into 21 structural subclasses based on their domain combinations (Fig. 1A, Table S1). The structural analysis revealed that the CNL subfamily is the most structurally conserved, whereas the TNL subfamily exhibits high structural plasticity, frequently showing loss of the NB-ARC or LRR domain.
Fig. 1.
Identification, classification, and population distribution of the cucumber pan-NLR gene family. A Domain architecture and classification of cucumber pan-NLR genes. A total of 879 NLRs were identified and classified into four major subfamilies (TNL, CNL, NL, and RNL) based on their N-terminal domains and further divided into 21 structural subclasses. B Distribution of NLR genes across different pan-genome components. Genes are categorized as core, dispensable (including soft-core, shell, cloud), and private based on their presence/absence patterns across the 12 accessions. C Number and composition of NLR genes in the 12 representative cucumber accessions. Bars are colored according to NLR subfamilies, and stacked patterns represent the pan-genome component
The NL subfamily is enriched with truncated variants containing only the NB-ARC domain (N-only), indicating that domain loss is a key feature of its evolution. OGG analysis clustered the 879 NLRs into 87 OGGs and 53 private genes, revealing extensive presence/absence variation (PAV). Based on their PAV patterns, the 87 OGGs were categorized as 33 core OGGs (present in all accessions) and 54 dispensable OGGs. The latter were further divided into 17 soft-core OGGs (absent in one accession), 28 shell OGGs (present in 3 to 10 accessions), and 9 cloud OGGs (present in only two accessions) (Table S2). Core OGGs contained the largest number of genes (46.5%), while cloud OGGs contained the fewest (18 genes), even fewer than the private genes (53) (Fig. 1B). For standardization, this study renamed these OGGs using a systematic pan-genome nomenclature: NLR.CR01–CR33 (core), NLR.SC34–SC50 (soft-core), NLR.SH51–SH78 (shell), and NLR.CL79–CL87 (cloud) (Table S2). NL subfamily members frequently clustered with structurally intact CNL or TNL members within the same OGG. For instance, NLR.CR01 consists mainly of TNLs but includes some NLs, and NLR.SC38 is predominantly CNLs but contains some NLs, suggesting a common ancestry followed by the loss of conserved TIR or CC domains. Conversely, NLR.SC39 and NLR.SC49 are mainly composed of NLs but contain a small number of CNL members and TNL members, respectively.
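The PAV categories defined above reduce to a simple rule on the number of accessions in which a group is present. The sketch below encodes that rule for the 12 accessions; the function name is hypothetical, and treating a gene found in a single accession as "private" is a simplification of the definition in the Methods (genes not assigned to any OGG).

```python
N_ACCESSIONS = 12


def pav_category(n_present, n_accessions=N_ACCESSIONS):
    """Map the number of accessions carrying a group to its pan-genome component."""
    if not 1 <= n_present <= n_accessions:
        raise ValueError("n_present must be between 1 and n_accessions")
    if n_present == n_accessions:
        return "core"       # present in all accessions
    if n_present == n_accessions - 1:
        return "soft-core"  # absent in exactly one accession
    if n_present >= 3:
        return "shell"      # present in 3 to 10 accessions
    if n_present == 2:
        return "cloud"      # present in only two accessions
    return "private"        # simplification: found in a single accession


for n in (12, 11, 7, 2, 1):
    print(n, pav_category(n))
```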
To assess the scale of the cucumber NLR gene family, this study compared it with other plant species reported by Liu et al. [44]. This comparative analysis revealed that the cucumber NLR gene repertoire is relatively small within the angiosperms. Among the 305 plant species compared, cucumber's NLR count was higher than that of only 42 species. Even within a subset of 51 horticultural crops, its NLR number surpassed that of only five species, three of which are close relatives in the Cucurbitaceae family (Fig. S1). This pattern is consistent with a contraction of the cucumber NLR gene pool during its evolution. The size and composition of the NLR gene family exhibited heterogeneity across different geographic groups (East Asian, Eurasian, Indian, Xishuangbanna), with individual accessions containing 68 to 85 NLRs (Fig. 1C, Table S3). This variation is primarily driven by differences in dispensable genes, while the number of core genes remains highly conserved (33–35). All RNL subfamily genes were classified as core genes. The Indian group possessed the highest average number of NLRs and the greatest intra-group variation, with 'Csaw8' having the most NLRs due to a large number of private genes, and 'Csa64' having the fewest due to extensive loss of shell genes. The Eurasian group had the lowest total NLR count, largely attributable to a substantial loss of shell genes and a smaller number of private and cloud genes, similar to 'Csa64'. The East Asian group had a higher total NLR count than the Eurasian group but also contained few private genes and no cloud genes.
Pan-genome-based phylogenetic analysis of the cucumber NLR gene family
To investigate the evolutionary history of the cucumber NLR gene family, this study constructed an integrated phylogenetic tree comprising 879 cucumber NLRs and 207 Arabidopsis thaliana NLRs (Fig. 2). The tree resolved all NLR members into seven major groups (Groups I–VII), revealing distinct evolutionary patterns among the different NLR subfamilies. The results demonstrated species-specific evolutionary dynamics for genes of the CNL and TNL subfamilies. The vast majority of Arabidopsis thaliana CNLs clustered together to form a small, paraphyletic (descended from a common ancestor but not including all the descendant groups) branch (Group I), whereas nearly all cucumber CNLs, along with a few Arabidopsis thaliana CNLs, formed a separate branch (Group III). Similarly, although the TNL genes from both species constitute a large monophyletic (containing an ancestor and all its descendants) group (Group VI), the branching patterns within this clade are clearly species-specific. As shown in Fig. 2, the Arabidopsis thaliana TNLs (represented by circles) and the cucumber TNLs (represented by stars) do not intermingle. Instead, they form distinct, large species-specific sub-clades within Group VI. These patterns suggest that both the CNL and TNL subfamilies have undergone species-specific lineage expansions and contractions since the divergence of the two species. A single Arabidopsis thaliana TNL gene and a distinct lineage composed solely of cucumber TNLs (Group VII) form a cluster that is topologically distant from the other major TNL group (Group VI). This suggests an independent evolutionary trajectory for the Group VII genes in cucumber. In contrast, the RNL subfamily displayed a high degree of cross-species conservation. RNLs from both species clustered together in an independent evolutionary branch (Group V), suggesting they originated from a common ancestor.
The most complex evolutionary pattern was observed in the NL subfamily, which exhibited a clear polyphyletic distribution. In addition to forming two exclusive NL groups (Group II and Group IV), numerous NL members were interspersed within the major TNL and CNL branches. This unique phylogenetic distribution was consistently observed across all four cucumber geographic groups and in Arabidopsis thaliana, implying a universal evolutionary mechanism. This study therefore infers that during evolution, some CNL and TNL genes likely underwent frequent loss of their N-terminal conserved domains (CC or TIR), thereby evolving into NLs. To further validate the hypothesis that NLs result from the loss of the N-terminal domain, this study also conducted a phylogenetic analysis using only the conserved NB-ARC domain sequences. The results clearly indicate that although the NLR.SC39 and NLR.SC49 OGGs are predominantly composed of NLs and contain only a few CNL and TNL members, respectively, these OGGs still cluster within the major CNL and TNL clades based on their NB-ARC sequences (Fig. S3).
Fig. 2.
Phylogenetic analysis of NLR genes in cucumber and Arabidopsis thaliana. The unrooted phylogenetic tree, presented here in a circular layout for visualization, includes 879 cucumber NLRs and 207 Arabidopsis thaliana NLRs. Different colors and shapes represent genes from distinct NLR subfamilies (TNL, CNL, RNL, NL) and different species (Cucumber/Arabidopsis thaliana). Major evolutionary groups are labeled I-VII
Analysis of IDs in Cucumber Pan-NLRs
To explore the functional diversity of NLR genes, this study systematically identified IDs within the cucumber pan-NLR gene family. Among the 879 NLRs, 262 genes contained at least one ID, and these IDs belonged to 25 distinct Pfam families. The AAA domain was the most prevalent, present in 207 NLRs. The distribution of IDs showed population specificity, with each geographic group possessing unique ID types. For example, the East Asian group exclusively had the PSP and DUF382 domains, while the Indian group had five unique IDs and the Eurasian group had six (Fig. 3A, B). At the pan-genome level, IDs also showed a biased distribution between core and dispensable genes. For instance, 13 ID types were found only in core genes, 5 only in soft-core genes, 2 only in private genes, and 1 only in shell genes (Fig. 3C). Domain integration exhibited a clear preference for specific NLR subfamilies: the TNL subfamily integrated the highest diversity of IDs (19 types), whereas CNLs had fewer associations with IDs, although 105 CNLs integrated an AAA domain. In the RNL subfamily, the only ID detected was a GRAM domain, found specifically in the core OGG NLR.CR03 (Fig. 3D, Table S4).
Fig. 3.
Identification and distribution characteristics of IDs in the cucumber pan-NLR gene family. A Statistics on the number of IDs. B Types and quantities of IDs in different geographic groups (East Asia, Eurasia, India, Xishuangbanna). C Distribution preferences of different types of IDs across pan-genome components (core, soft-core, shell). D Types and frequencies of IDs integrated into different NLR subfamily frameworks (TNL, CNL, RNL)
Genomic Localization and NLR-pair Analysis of Cucumber Pan-NLRs
Physical mapping revealed that cucumber NLR genes are non-randomly distributed across the chromosomes, forming subfamily-specific clusters (Fig. 4A). TNL gene clusters are predominantly located on chromosomes 2 and 5, while CNL gene clusters are enriched on chromosomes 4 and 7. Members of the RNL subfamily are highly concentrated on chromosome 6, often occurring as gene pairs. NL subfamily members are commonly found within both TNL and CNL gene clusters, further supporting the phenomenon of domain loss during NLR evolution.
Fig. 4.
Chromosomal localization of cucumber pan-NLR genes and analysis of NLR-pairs. A Physical map showing the distribution of cucumber NLR genes on the seven chromosomes. Different colored markers represent different NLR subfamilies, while different shaped markers represent distinct geographic groups. B Genomic localization and schematic structure of high-confidence NLR-pairs
In plants, head-to-head arranged NLR-pairs often function as "sensor-executor" modules [18, 45]. Accordingly, this study screened the cucumber pan-genome for NLR gene pairs that are physically linked (< 15 kb) and arranged in a head-to-head orientation, initially identifying 43 candidate pairs (Fig. S2). Because previously reported NLR-pairs often contain IDs, the candidates were further filtered for pairs in which at least one member contained an ID, yielding 26 high-confidence NLR-pairs in the cucumber pan-genome. These pairs were primarily located on chromosomes 4 and 5, with only one pair, in “Chinese Long inbred line 9930”, found on chromosome 2 (Fig. 4B). NLR-pairs are rare in individual genomes, with only 1–4 pairs per accession. These pairs can be classified into seven unique OGG pairing patterns. Among them, two patterns (NLR.SC34/NLR.CR02 and NLR.SH55/NLR.CR15) are conserved across all geographic groups and integrate only the AAA domain. Additionally, this study discovered two NLR-pairs exclusive to the wild Indian group, which integrate multiple redox-related (Redoxin) or kinase-related (PKinase and PK_Tyr_Ser-Thr) IDs (Table S5). This complex and specific combination of IDs suggests they may have evolved in response to particular pathogen pressures. In terms of subfamily composition, most NLR-pairs consist of members of the same NLR subfamily (e.g., homologous CNL-CNL pairs). However, two heterologous pairs were identified: NLR.SH73/NLR.SC42 and NLR.SC35/NLR.SH69, both pairing an NL with a full-length NLR, which may represent functional divergence via domain loss after duplication (Table S5).
Gene Duplication, Copy Number Variation (CNV), and Transposable Element (TE) Analysis
Copy Number Variation (CNV) is a major driver of gene family evolution. To investigate its role in the cucumber NLR gene family, this study analyzed CNV in the pan-genome NLR OGGs. The analysis showed that CNV events occurred in all four geographic groups, though not all accessions carried duplicated NLR genes; for example, no CNVs were detected in 'Csacu2' and 'Csahx14'. The copy number of NLR genes appears to be tightly regulated, as CNV events were infrequent. Simple gene duplications (2 copies) were identified in only seven core or soft-core OGGs. Some of these duplicated copies underwent subfamily differentiation (e.g., a TNL evolving into an NL), suggesting that rapid functional divergence or degradation may occur following duplication (Fig. 5A). Furthermore, CNVs were most frequent in the Eurasian population, the group with the fewest NLRs.
Fig. 5.
Analysis of expansion mechanisms and transposable element associations in the cucumber NLR gene family. A Copy number variation (CNV) analysis of NLR OGGs (orthologous gene groups), showing gene duplication and subfamily differentiation. B Comparison of gene duplication types (dispersed, tandem, proximal, transposed, and whole-genome duplication) for NLR genes across different pan-genome components in four geographic populations. C Classification statistics of transposable elements (TEs) associated with NLR loci. D Comparison of TE insertion patterns in NLR gene regions across different pan-genome components among four geographic populations
To elucidate the expansion mechanisms of cucumber NLR genes, this study systematically analyzed their duplication types. Dispersed duplication was the primary driver of NLR gene family expansion, accounting for 40.05% of the total, followed by tandem duplication (29.69%). Proximal duplication also contributed a proportion (22.63%), while transposed (3.87%) and whole-genome duplications (WGD) (3.76%) made smaller contributions (Fig. 5B). Duplication patterns differed among subfamilies: CNL expansion was driven mainly by dispersed and tandem duplications, NLs primarily by dispersed duplication, RNLs by proximal and tandem duplications, while TNL expansion was associated with tandem duplication. At the population level, the composition of NLR duplication types was largely similar across the four geographic groups (East Asian cultivated, Eurasian cultivated, Indian wild, Xishuangbanna wild), with notable exceptions. WGD-derived copies of RNL genes were detected in the wild populations (Indian and Xishuangbanna) but were absent in both cultivated populations. Additionally, the Indian wild population had more gene categories derived from WGD than the other geographic groups (Fig. 5B).
This study also analyzed intact TEs closely associated with the 879 NLR genes, screening the 2 kb upstream and downstream flanking regions as well as the internal exonic and intronic regions. A total of 3,135 intact TEs were identified in the vicinity of NLR loci. Among these, DNA transposons (Class II TEs) were predominant, with 1,751 elements (55.78%), including families such as TcMar, PIF-Harbinger, MULE, hAT, Helitron, and CMC-EnSpm. In contrast, RNA-mediated retrotransposons (Class I TEs) were less numerous, with 422 elements (13.54%), mainly comprising Copia, Gypsy, SINE, and LINE types. Additionally, 962 TEs (30.68%) could not be classified (Unknown), making this the second-largest category after DNA transposons (Fig. 5C). TE insertion showed strong subfamily specificity: for RNLs, TEs were primarily enriched in the downstream 2 kb flanking region, but in the Indian group's RNL genes, TEs were preferentially located in intronic regions, revealing a population-specific insertion bias. For CNLs, TEs were mainly in the flanking regions, while for TNLs and NLs, TEs were distributed in all regions except exons. At the population level, although the overall TE insertion patterns for NLR genes were consistent across the four geographic groups, this study found group-specific biases. For instance, in the East Asian group, TEs were lacking in the intronic regions of shell NL genes, and LTR/Copia elements were completely absent, which may reflect lineage-specific selective pressures against TEs (Fig. 5D).
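The region assignment used in this analysis (2 kb upstream, 2 kb downstream, exon, intron) can be illustrated with a small interval-overlap routine. The study itself used bedtools for the overlap step; the pure-Python version below is a stand-in for illustration, and the function names, the 1-based inclusive coordinates, and the strand-aware definition of upstream are assumptions.

```python
FLANK = 2_000  # bp screened on each side of a gene


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def te_regions(te_start, te_end, gene_start, gene_end, strand, exons):
    """Return the set of gene regions that a TE overlaps.

    exons: list of (start, end) tuples inside the gene, 1-based inclusive.
    """
    regions = set()
    left = (max(1, gene_start - FLANK), gene_start - 1)
    right = (gene_end + 1, gene_end + FLANK)
    # On the minus strand, upstream lies to the right of the gene.
    upstream, downstream = (left, right) if strand == "+" else (right, left)
    if overlaps(te_start, te_end, *upstream):
        regions.add("upstream_2kb")
    if overlaps(te_start, te_end, *downstream):
        regions.add("downstream_2kb")
    if overlaps(te_start, te_end, gene_start, gene_end):
        if any(overlaps(te_start, te_end, s, e) for s, e in exons):
            regions.add("exon")
        # Introns are the gaps between consecutive exons.
        ordered = sorted(exons)
        introns = [
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        ]
        if any(overlaps(te_start, te_end, s, e) for s, e in introns if s <= e):
            regions.add("intron")
    return regions


# Made-up gene with two exons and one intron.
exons = [(10_000, 11_000), (12_000, 14_000)]
print(te_regions(11_200, 11_500, 10_000, 14_000, "+", exons))
```

A TE that spans a boundary is reported in every region it touches, so per-region counts can exceed the number of TEs.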
Selection pressure analysis
To elucidate the selection pressures acting on the cucumber NLR gene family during its evolution, this study calculated the non-synonymous substitution rate (Ka), synonymous substitution rate (Ks), and the Ka/Ks ratio for different OGGs. This study first analyzed the four major NLR subfamilies (CNL, NL, RNL, TNL) and found their median Ka/Ks ratios to be 0.485, 0.687, 0.522, and 0.531, respectively. Because all of these medians are below 1, the NLR gene family as a whole appears to be under purifying selection, consistent with maintenance of its core functions (Fig. 6C). Despite this overall trend, evolutionary rates differed among subfamilies. The NL subfamily exhibited relatively higher median values for Ka (0.0034), Ks (0.0057), and Ka/Ks (0.687), suggesting it may have experienced more relaxed selective constraints (Fig. 6A and B). When comparing different pan-genome components, the cloud genes, which represent the most variable component, had a significantly higher non-synonymous substitution rate (median Ka = 0.011401) than core, soft-core, and shell genes (p < 0.05) (Fig. 6D). A similar trend was observed for the synonymous substitution rate (Ks), where the median Ks of cloud genes was also significantly higher (Fig. 6E). However, despite these significant differences in evolutionary rates, the intensity of selection pressure on these four gene categories was remarkably similar. Their median Ka/Ks values were all very close (ranging from 0.47 to 0.59) and, despite a slight overall statistical difference (p = 0.0156), all indicated comparable levels of purifying selection (Fig. 6F). The Ka/Ks values for core NLRs showed greater dispersion, suggesting that some core genes may have been subjected to more diverse selection events. Beyond the median values, the RNL subfamily showed a distinctive pattern (Fig. 6C): it alone exhibits a bimodal distribution of Ka/Ks values, with one peak clearly below 1.0 and a second distinct peak centered near 1.0.
This interesting pattern implies that the RNL subfamily may be evolving under two different modes of selection: approximately half of the members are under strong purifying selection and functionally conserved, while the other half appear to be released from selection pressure and are evolving neutrally. This pattern is potentially consistent with functional divergence or lineage-specific adaptation. To investigate changes in selection pressure during domestication, this study compared the Ka/Ks values of homologous genes between the wild population (Indian) and cultivated populations (East Asian, Eurasian) (Fig. 6J). This comparison revealed that while most genes are under purifying selection (Ka/Ks < 1), some genes within OGGs (mostly NL, CNL, and TNL) have Ka/Ks ratios > 1. In particular, seven CNL genes and one TNL gene had ratios exceeding 2, indicating that these genes may have been under strong positive selection during domestication or local adaptation (Fig. 6J).
Fig. 6.
Analysis of selection pressure on the cucumber NLR gene family. A, B, C Comparison of the nonsynonymous substitution rate (Ka), synonymous substitution rate (Ks), and Ka/Ks ratio among the four major NLR subfamilies (CNL, NL, RNL, TNL). D, E, F Comparison of Ka, Ks, and Ka/Ks values among genes in different pan-genome components (core, soft core, shell, cloud). J Violin plot of Ka/Ks values for homologous genes between wild populations (India) and cultivated populations (East Asia, Eurasia), used to identify genes under positive selection. Asterisks denote statistical significance: *p < 0.05, **p < 0.01, ***p < 0.001
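The way Ka/Ks ratios are read in this section (below 1 for purifying selection, near 1 for neutral evolution, above 1 for positive selection, above 2 for strong positive selection) can be sketched in a few lines of Python. This is only an illustration: the width of the "near neutral" band, the function names, and the toy (Ka, Ks) values are assumptions and are not taken from the study's methods.

```python
from statistics import median


def classify_ka_ks(ratio, neutral_band=0.1, strong_positive=2.0):
    """Map a Ka/Ks ratio to the selection regime it is usually taken to suggest.

    neutral_band and strong_positive are illustrative cutoffs (assumptions),
    not values reported by the study.
    """
    if ratio > strong_positive:
        return "strong positive"
    if ratio > 1 + neutral_band:
        return "positive"
    if ratio >= 1 - neutral_band:
        return "near neutral"
    return "purifying"


def summarize(pairs):
    """Return median Ka, median Ks and median Ka/Ks for (Ka, Ks) pairs.

    Pairs with Ks == 0 are skipped because the ratio is undefined.
    """
    usable = [(ka, ks) for ka, ks in pairs if ks > 0]
    if not usable:
        raise ValueError("no gene pair with Ks > 0")
    return {
        "median_ka": median([ka for ka, _ in usable]),
        "median_ks": median([ks for _, ks in usable]),
        "median_ka_ks": median([ka / ks for ka, ks in usable]),
    }


# Hypothetical (Ka, Ks) values for four gene pairs in one OGG.
toy_pairs = [(0.002, 0.008), (0.003, 0.006), (0.005, 0.005), (0.012, 0.005)]

if __name__ == "__main__":
    stats = summarize(toy_pairs)
    print(stats)
    # The median of per-pair ratios generally differs from the ratio of medians.
    print(stats["median_ka"] / stats["median_ks"])
    for ka, ks in toy_pairs:
        print(ka, ks, classify_ka_ks(ka / ks))
```

Because the median Ka/Ks is taken over per-pair ratios, it need not equal the median Ka divided by the median Ks, which is why the three medians reported for a subfamily are not expected to be arithmetically consistent with one another.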
NLR expression and regulation analysis
To investigate the functional roles of NLR genes, this study systematically analyzed their expression dynamics in the cucumber line “Chinese Long inbred line 9930” following inoculation by four different pathogens: Cladosporium cucumerinum (CCU), Botrytis cinerea (GM), Podosphaera xanthii (PM), and Pseudomonas syringae pv. lachrymans (ALS) (Fig. 7). The results showed that the vast majority of NLR genes were either silent or expressed at extremely low levels (TPM < 1) under both normal and stress conditions. Only a small subset of specific NLRs participated in the transcriptional response to inoculation. Among the responsive NLRs, two patterns emerged: broad-spectrum response and specific response. An NLX gene, CsaV4_2G000768, and a co-expressed TNL-pair (CsaV4_5G003443/CsaV4_5G003442) exhibited broad-spectrum activity, responding to all four pathogens. In contrast, an RNL gene, CsaV4_4G002977, showed a specific response, being induced only by scab and angular leaf spot pathogens. Further analysis of the most strongly and broadly responsive gene, CsaV4_2G000768, revealed that it integrates three distinct domains: AAA, DUF382, and PSP (Table S4).
Fig. 7.
Expression pattern analysis of cucumber NLR genes under inoculation by four pathogens. The heatmap displays the Log2-transformed transcript abundance (Log2(TPM + 1)) of NLR genes in the cucumber line “Chinese Long inbred line 9930” after inoculation with Cladosporium cucumerinum (CCU), Botrytis cinerea (GM), Podosphaera xanthii (PM), or Pseudomonas syringae pv. lachrymans (ALS). Rows represent genes, columns represent treatment conditions, and color intensity indicates expression levels
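The expression categories described above (silent at TPM < 1, specific response, broad-spectrum response) and the Log2(TPM + 1) heatmap scale can be illustrated with a small Python sketch. Only the TPM < 1 cutoff and the transform come from the text; the availability of a mock-inoculated control, the change threshold of one log2 unit, the function names, and all TPM values below are assumptions made for illustration.

```python
import math

PATHOGENS = ("CCU", "GM", "PM", "ALS")


def log2_tpm(tpm):
    """Heatmap scale used in Fig. 7: log2(TPM + 1)."""
    return math.log2(tpm + 1)


def classify_response(mock_tpm, treated_tpm, min_tpm=1.0, min_abs_log2_change=1.0):
    """Classify one gene from its control TPM and per-pathogen TPM values.

    Returns (label, responsive_pathogens). min_abs_log2_change is an
    assumed threshold, not one reported by the study.
    """
    if mock_tpm < min_tpm and all(v < min_tpm for v in treated_tpm.values()):
        return "silent", []
    baseline = log2_tpm(mock_tpm)
    responsive = [
        pathogen
        for pathogen, tpm in treated_tpm.items()
        if abs(log2_tpm(tpm) - baseline) >= min_abs_log2_change
    ]
    if not responsive:
        return "unresponsive", []
    if len(responsive) == len(treated_tpm):
        return "broad-spectrum", responsive
    return "specific", responsive


# Hypothetical genes: (control TPM, TPM after each inoculation).
toy_genes = {
    "gene_A": (2.0, dict(zip(PATHOGENS, (20.0, 20.0, 20.0, 20.0)))),
    "gene_B": (3.0, dict(zip(PATHOGENS, (15.0, 3.2, 2.8, 31.0)))),
    "gene_C": (0.2, dict(zip(PATHOGENS, (0.1, 0.5, 0.3, 0.4)))),
}

if __name__ == "__main__":
    for name, (mock, treated) in toy_genes.items():
        print(name, classify_response(mock, treated))
```

Using the absolute change treats induction and repression alike; a stricter variant could count induction only.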
Discussion
The NLR gene family, a critical component of the plant immune system, is shaped by genome evolution and selection pressure. To gain a deeper understanding of the evolutionary strategies of the cucumber NLR genome, this study utilized a pan-genome framework to analyze OGGs for presence/absence variation, phylogeny, gene structure, IDs, NLR-pairs, copy number variation, selection pressure, and transcriptomic responses. This study found that the cucumber NLR repertoire has undergone contraction compared to non-Cucurbitaceae species, containing far fewer members than other horticultural plants (Fig. S1). This may be one of the reasons for cucumber's susceptibility to a wide range of pathogens [22, 46]. Similar to cucumber, other major cucurbit crops also exhibit contracted NLR repertoires compared to Solanaceae or Poaceae species; for instance, melon (Cucumis melo) and watermelon (Citrullus lanatus) genomes typically contain only ~80 and ~50 NLR genes, respectively [47, 48]. To systematically explore the genomic diversity of cucumber NLRs, this study analyzed 12 representative Cucumis sativus L. genomes spanning four distinct geographic groups: Indian, Xishuangbanna, East Asian (cultivated), and Eurasian (cultivated). Regarding gene distribution, the number of core genes was similar across the four geographic groups. The Indian group (Csa64, Csahx117, Csahx14, Csaw4, Csaw8), representing a wild population, exhibited greater intra-population variation in NLR gene count and possessed a rich diversity of dispensable genes (Fig. 1C, Table S3). This suggests that intense selection pressures may have driven more frequent gene acquisition, duplication, and mutation events in wild cucumber, leading to greater genetic diversity. Among the cultivated groups, the Eurasian group (Csa37, Csa9110gt, Csagy14) has markedly fewer NLRs than the other groups, primarily due to a reduction in shell genes within the dispensable OGGs (Fig. 1C, Table S3).
The East Asian group (CsaV4, Csacu2, Csaxtmc), also cultivated, did not experience a similar reduction in total NLR numbers during domestication. However, both cultivated groups (Eurasian group and East Asian group) have fewer cloud and private genes (Fig. 1C). This indicates that different artificial selection pressures during cucumber domestication have led to varying degrees of contraction and loss of diversity in the NLR gene family [49]. This conclusion is strongly supported by a recent, parallel comparative genomics study [50]. Although Xie et al. [50] used a different species comparison method (comparing the cultivated cucumber '9930' with its wild relatives), they also independently concluded that the cultivated cucumber ('9930', 63 genes) possesses fewer NLR genes than its wild progenitor C. sativus var. hardwickii (67 genes) and its wild relative C. hystrix (89 genes). Whether from our pan-genome (intra-species) perspective or the comparative genomics (inter-species) perspective of Xie et al. [50], the data collectively point to the conclusion that the cucumber NLR gene pool contracted during domestication. This observed loss of genetic diversity under domestication is not unique to cucumber; it reflects a general pattern of genomic changes reported in many crop species [51], with well-documented examples in crops such as maize [52].
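The core, soft-core, shell, cloud, and private categories discussed here are assigned from presence/absence variation across the 12 accessions. The Python sketch below shows one way such an assignment could work. The frequency cutoffs (soft-core at 90% or more, cloud at 20% or less), the OGG names, and the placeholder name for the Xishuangbanna accession are assumptions for illustration; the other accession names are those listed in the text.

```python
GROUPS = {
    "Indian": ["Csa64", "Csahx117", "Csahx14", "Csaw4", "Csaw8"],
    "Eurasian": ["Csa37", "Csa9110gt", "Csagy14"],
    "East Asian": ["CsaV4", "Csacu2", "Csaxtmc"],
    "Xishuangbanna": ["XSBN_placeholder"],  # hypothetical accession name
}
ACCESSIONS = [acc for members in GROUPS.values() for acc in members]


def classify_ogg(n_present, n_total, soft_core_frac=0.9, cloud_frac=0.2):
    """Assign a pan-genome component from the number of accessions carrying an OGG.

    soft_core_frac and cloud_frac are assumed cutoffs, not the study's.
    """
    if not 1 <= n_present <= n_total:
        raise ValueError("n_present must be between 1 and n_total")
    if n_present == n_total:
        return "core"
    if n_present == 1:
        return "private"
    frac = n_present / n_total
    if frac >= soft_core_frac:
        return "soft-core"
    if frac <= cloud_frac:
        return "cloud"
    return "shell"


def non_core_per_group(pav, groups, components):
    """Count non-core OGGs present in at least one accession of each group."""
    counts = {}
    for group, members in groups.items():
        member_set = set(members)
        counts[group] = sum(
            1
            for ogg, present in pav.items()
            if components[ogg] != "core" and present & member_set
        )
    return counts


# Hypothetical presence/absence table: OGG name -> accessions carrying it.
toy_pav = {
    "OGG_core": set(ACCESSIONS),
    "OGG_softcore": set(ACCESSIONS) - {"Csa37"},
    "OGG_shell": set(GROUPS["Indian"]) | {"CsaV4"},
    "OGG_cloud": {"Csaw4", "Csaw8"},
    "OGG_private": {"Csa64"},
}
toy_components = {
    ogg: classify_ogg(len(present), len(ACCESSIONS))
    for ogg, present in toy_pav.items()
}

if __name__ == "__main__":
    print(toy_components)
    print(non_core_per_group(toy_pav, GROUPS, toy_components))
```

In this toy table the Indian group carries the most non-core OGGs, mirroring the qualitative pattern described for wild accessions; the numbers themselves carry no biological meaning.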
Our phylogenetic analysis reveals that different subfamilies within the cucumber NLR gene family have followed distinct evolutionary trajectories, potentially reflecting their functional diversification in the plant immune system. All cucumber RNL members were classified as core genes and clustered into a single conserved branch in the phylogenetic tree (Figs. 1C and 2), showing no evidence of species-specific expansion. This high degree of cross-species conservation strongly suggests that RNLs play a fundamental and indispensable role in plant immunity [12, 14], analogous to essential downstream signaling components like ADR1 and NRG1 in Arabidopsis thaliana [53]. In stark contrast to the conservation of RNLs, CNLs and TNLs exhibit species-specific expansion (Fig. 2). On the phylogenetic tree, cucumber and Arabidopsis thaliana CNL and TNL subfamilies form separate evolutionary branches, indicating that they have undergone independent, lineage-specific evolution. This evolutionary pattern is not unique; for example, a substantial lineage-specific expansion of the CNL subfamily has also been observed in Solanaceae. Such rapid, lineage-specific evolution is often considered a direct product of the co-evolutionary arms race between plants and pathogens [54].
The NL subfamily displays a clear polyphyletic distribution, with its members scattered among the major TNL and CNL branches (Fig. 2). This finding, combined with evidence from OGG clustering (NLs grouping with full-length TNLs/CNLs) and CNV analysis (domain loss accompanying copy number variation) (Fig. 5A), supports a model in which new NLR subfamily members are generated through domain loss. This interpretation is further supported by cases like NLR.SC39 and NLR.SC49. Although these OGGs are predominantly composed of NLs, they contain a few CNL and TNL members, respectively. Crucially, a phylogenetic analysis based on the NB-ARC domain places these OGGs firmly within the major CNL and TNL groups (Fig. S3), providing strong evidence that these NLs arose from N-terminal domain loss rather than domain acquisition. This mechanism is widespread in angiosperms. For instance, Seong et al. [54] also reported the loss of the N-terminal conserved domain in a CNL lineage in Solanaceae. Furthermore, several studies on plant TNLs have found that in tandemly duplicated TNL gene clusters across multiple dicot species, it is common for some members to lack the typical TIR domain [55, 56]. While those studies focused on N-terminal loss in a single subfamily, our work shows this phenomenon occurs in both major subfamilies in cucumber. Moreover, our analysis suggests that some domain loss events may have occurred recently during artificial domestication. For example, the N-terminal deletion in NLR.CR01 resulting from copy number variation is observed only in cultivated accessions. This suggests that domestication may not only have reduced the diversity of NLR genes in modern cultivars but also promoted such structural domain loss events. Gene duplication patterns provide further evidence for the different evolutionary paths of each subfamily.
Tandem duplication has contributed to the lineage-specific expansion of CNLs and TNLs, consistent with many studies showing that recent NLR evolution is characterized by tandem duplication events that play a key role in their numerical expansion [56–58]. Differing slightly from the findings of Guo et al. [56], this study observed that N-terminal loss in cucumber occurs not only in tandem duplicates but also in proximal duplicates, particularly in wild populations (Table S6).
Selection pressure analysis revealed a dual pattern in the evolution of the cucumber NLR gene family: strong purifying selection constraining core functions coexists with the rapid evolution of adaptive sites. Whether categorized by subfamily or by pan-genome component (core, dispensable), the median Ka/Ks ratio for all NLR classes was well below 1 (Fig. 6C, F). This indicates that the entire NLR gene family is under strong purifying selection to maintain its fundamental functions. Such strong purifying selection on NLRs, which ensures deleterious mutations are rapidly purged, is a common finding, with similar reports in other crops like tomato [59]. As shown in Fig. 6C, the NL subfamily exhibited a significantly higher evolutionary rate than the CNL and TNL subfamilies (CNL, p < 1e-5; TNL, p < 1e-5), a finding similar to that in wild strawberries [60]. This may be linked to the polyphyletic distribution of NLs; relatively relaxed purifying selection could allow NLs to accumulate more variation, potentially providing a substrate for the evolution of novel resistance specificities. While most NLRs remained under purifying selection during domestication, some genes experienced strong positive selection, likely as a result of local adaptation. This phenomenon is common in the domestication of many species and reflects rapid adaptive evolution in response to specific agricultural environments or pathogen pressures [60, 61].
Most functionally characterized NLRs with IDs exist in paired orientations [14, 16–19]. Therefore, this study analyzed IDs and NLR-pairs in cucumber. This study found that cucumber NLRs integrate 25 classes of IDs, with the AAA superfamily domain being particularly prevalent, found in about a quarter of all NLRs (Fig. 3A-D). The AAA superfamily is common in various ATPases and is widely involved in fundamental cellular processes such as organelle division, protein quality control, membrane transport, and signal transduction [62, 63]. The integration of AAA domains into NLR genes is widespread in plants, having been identified in the pan-NLRome of Arabidopsis thaliana [14]. Notably, a 2024 study reported that a soybean TNL protein with a fused C-terminal AAA_22 domain acted as a "decoy" and conferred broad-spectrum resistance to multiple viruses [64]. However, this study did not detect classic IDs such as the WRKY domain integrated into RRS1 [17] or the HMA domain in rice Pik-1 and RGA5, which may reflect selective integration events driven by differences in pathogen ecology across species [65, 66]. ID distribution exhibited population and subfamily specificity. The Eurasian and Indian groups integrated the most unique IDs. Although the Eurasian group has the fewest NLRs, it possesses six unique IDs, which could be a compensatory mechanism for its smaller gene repertoire. Previous research has indicated that ID integration in grasses is primarily driven by DNA transposition or non-allelic recombination [67, 68]. Our analysis of TE insertions in and around NLR genes showed that TNL and NL subfamilies, which have a greater diversity of IDs, also have more frequent TE insertions in regions outside of exons. This study therefore speculates that ID integration in cucumber may also be dependent on TE activity. Our NLR-pair analysis revealed a pattern different from previous reports; only a small fraction of ID-containing NLRs in cucumber exist in pairs.
This study identified only seven unique OGG pairing patterns, located mainly on chromosomes 4 and 5. However, the wild Indian population contains two unique NLR-pairs that integrate multiple redox- or kinase-related IDs (Table S5). These pairs are located on chromosome 5, a locus known to be enriched with resistance sites for downy and powdery mildew in cucumber [69]. This suggests they are valuable genetic resources that may have evolved in response to specific pathogen pressures. The precise roles of these specific IDs and NLR-pairs in cucumber immunity, particularly those from wild populations, warrant further investigation as they represent a promising source of novel genes for cultivated cucumber.
To validate the functional relevance of NLR genes, this study analyzed their expression patterns in the cucumber line “Chinese Long inbred line 9930”, using data from a public database, following inoculation with four different pathogens. The results showed that most NLRs remained silent, but a few were responsive (Fig. 7). Among the NLRs examined, CsaV4_6G003127 and CsaV4_4G002977 are the orthologs in the “Chinese Long inbred line 9930” genome of CsRSF1 and CsRSF2, respectively, which were previously characterized by Wang et al. [23]. In that work, the two genes were more highly expressed in powdery mildew-resistant cucumber varieties than in susceptible ones; knockdown of CsRSF1/2 significantly reduced resistance, while overexpression enhanced it [23]. In the “Chinese Long inbred line 9930” data analyzed here, however, the response was different: CsaV4_6G003127 showed no transcriptional response to any of the four pathogens, while CsaV4_4G002977, though unresponsive to powdery mildew, was induced by scab (CCU) and angular leaf spot (ALS). This difference in response patterns could be due to functional divergence between alleles. It suggests that RNLs like CsaV4_4G002977 may recognize multiple effector proteins, providing a theoretical basis for broadening their disease resistance spectrum through genetic engineering. This study also identified a broad-spectrum responsive NLX gene, CsaV4_2G000768, which integrates a variety-specific ID (the PSP domain). Its response to multiple pathogens is consistent with the capacity of a few R genes to target a diversity of pathogens [70]. This study speculates that its broad responsiveness may be linked to its unique integrated domain.
Although the lack of transcriptomic data from more lines under pathogen challenge limits a comprehensive analysis of ID functional diversity across the cucumber population, the case of CsaV4_2G000768 implies that unique NLR-ID genes from wild or landrace varieties represent a highly promising avenue for breeding broad-spectrum, durable resistance.
By employing a pan-genome family approach, this study provides a comprehensive and unique perspective on the evolution and function of the cucumber NLR gene family. However, further biological experiments, such as gene knockout or overexpression assays, are necessary to definitively validate the broad-spectrum resistance function of CsaV4_2G000768. Our work not only deepens the understanding of the cucumber immune system and the evolutionary mechanisms of NLR genes but also uncovers important resistance gene resources, particularly those with specific IDs or in NLR-pair configurations from wild germplasm, to address the limited resistance in modern cultivated cucumbers. The pan-NLR genomic analysis offers a more diverse and distinctive viewpoint than single-genome studies, enabling a more precise and thorough understanding of the NLR gene family. The value of such pan-genome approaches for gene family characterization has been recently demonstrated in other crops as well, for example, in barley [61, 71]. However, given that our expression analysis was limited to the single cultivated line “Chinese Long inbred line 9930” and the pan-genome comprised 12 representative accessions, future research should aim to expand the population scale to capture rare variants and investigate the pathogen-response dynamics of wild-specific NLRs in their native genetic backgrounds. Furthermore, priority should be given to the functional validation of the candidate genes and unique NLR-pairs identified here, with the aim of achieving improvements in disease-resistance breeding for cucumber.
Conclusion
This study provides a systematic pan-genomic characterization of the NLR gene family in cucumber, revealing a significant contraction of the NLR repertoire during evolution. Our analysis elucidates distinct evolutionary trajectories among subfamilies: RNLs remain highly conserved, CNLs and TNLs exhibit lineage-specific expansions, while NLs display a polyphyletic origin driven primarily by N-terminal domain loss. We observed that domestication has reduced NLR diversity, particularly in cultivated Eurasian accessions, whereas wild germplasms (Indian group) retain unique IDs and NLR-pairs potentially associated with specific pathogen responses. Additionally, transcriptomic profiling identified CsaV4_2G000768, a gene containing a unique ID, as a candidate gene involved in broad-spectrum pathogen responses. These findings offer theoretical insights into the evolutionary dynamics of the cucumber immune system and highlight wild germplasm as a valuable reservoir for genetic improvement. Future research should prioritize the functional validation of these candidate genes and unique NLR-pairs to support the breeding of disease-resistant cucumber varieties.
Supplementary Information
Acknowledgements
We thank Li et al. for publicly providing the cucumber pan-genome data and Guan et al. for establishing the cucumber-DB website.
Abbreviations
- ALS: Pseudomonas syringae pv. lachrymans
- CC: Coiled-coil domain
- CCU: Cladosporium cucumerinum
- CNL: Coiled-coil nucleotide-binding site-leucine-rich repeat
- CNV: Copy number variation
- ETI: Effector-triggered immunity
- GM: Botrytis cinerea
- HMM: Hidden Markov Model
- HR: Hypersensitive response
- ID: Integrated domain
- Ka: Non-synonymous substitution rate
- Ks: Synonymous substitution rate
- LRR: Leucine-rich repeat
- NB-ARC: Nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4
- NL: NLRs lacking a canonical N-terminal domain
- NLR: Nucleotide-binding site-leucine-rich repeat
- OGG: Orthologous gene group
- PAMP: Pathogen-associated molecular pattern
- PAV: Presence/absence variation
- PM: Podosphaera xanthii
- PTI: PAMP-triggered immunity
- RNL: RPW8-like nucleotide-binding site-leucine-rich repeat
- RPW8: Resistance to Powdery Mildew 8
- TE: Transposable element
- TIR: Toll/interleukin-1 receptor
- TNL: TIR-nucleotide-binding site-leucine-rich repeat
- TPM: Transcripts per million
- WGD: Whole-genome duplication
Authors’ contributions
Y.D. was responsible for the study design and supervision. B.Z. wrote the manuscript, conducted literature research, and performed all analyses. All authors reviewed and approved the final manuscript.
Funding
This work was supported by the innovation capacity construction of breeding scientific research platform in Guizhou Province (Qianke Hefuqi [2022] 014), the Guizhou Academy of Agricultural Sciences "15th Five-Year Plan" Outstanding Team Project on Mountain Vegetable Biological Breeding and Key Core Technology Innovation and Application (Qiannongke Outstanding Team [2026] No. 07), the innovation capacity construction of biological breeding for specialty crops in karst mountainous areas (Qianke Hefuqi [2024]003–1), the construction of scientific research capacity and conditions for the provincial key laboratory of biological breeding of characteristic grain and oil crops in karst mountainous areas (Qianke Hefuqi [2024]003–2), and the National Bulk Vegetable Industry Technology System, Guiyang Comprehensive Experimental Station.
Data availability
The cucumber genome sequence used in this study is publicly available in the cucumber-DB database (http://www.cucumberdb.com/).
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
References
- 1.Eifediyi EK, Remison SU. Growth and yield of cucumber (Cucumis sativus L.) as influenced by farmyard manure and inorganic fertilizer. J Plant Breed Crop Sci. 2010.
- 2.Mallick PK. Evaluating potential importance of cucumber (Cucumis sativus L. -Cucurbitaceae): a brief review. Int J Appl Sci Biotechnol. 2022;10:12–5. 10.3126/ijasbt.v10i1.44152. [Google Scholar]
- 3.Mukherjee PK, Nema NK, Maity N, Sarkar BK. Phytochemical and therapeutic potential of cucumber. Fitoterapia. 2013;84:227–36. 10.1016/j.fitote.2012.10.003. [DOI] [PubMed] [Google Scholar]
- 4.Li H, Wang S, Chai S, Yang Z, Zhang Q, Xin H, et al. Graph-based pan-genome reveals structural and sequence variations related to agronomic traits and domestication in cucumber. Nat Commun. 2022;13:682. 10.1038/s41467-022-28362-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Sebastian P, Schaefer H, Telford IRH, Renner SS. Cucumber (Cucumis sativus) and melon (C. melo) have numerous wild relatives in Asia and Australia, and the sister species of melon is from Australia. Proc Natl Acad Sci USA. 2010;107:14269–73. 10.1073/pnas.1005338107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Wang J, Song W, Chai J. Structure, biochemical function, and signaling mechanism of plant NLRs. Mol Plant. 2023;16:75–95. 10.1016/j.molp.2022.11.011. [DOI] [PubMed] [Google Scholar]
- 7.Bigeard J, Colcombet J, Hirt H. Signaling mechanisms in pattern-triggered immunity (PTI). Mol Plant. 2015;8:521–39. 10.1016/j.molp.2014.12.022. [DOI] [PubMed] [Google Scholar]
- 8.DeFalco TA, Zipfel C. Molecular mechanisms of early plant pattern-triggered immune signaling. Mol Cell. 2021;81:3449–67. 10.1016/j.molcel.2021.07.029. [DOI] [PubMed] [Google Scholar]
- 9.Yu X, Feng B, He P, Shan L. From chaos to harmony: responses and signaling upon microbial pattern recognition. Annu Rev Phytopathol. 2017;55(1):109–37. 10.1146/annurev-phyto-080516-035649. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Dodds PN, Rathjen JP. Plant immunity: towards an integrated view of plant–pathogen interactions. Nat Rev Genet. 2010;11:539–48. 10.1038/nrg2812. [DOI] [PubMed] [Google Scholar]
- 11.Jones JDG, Vance RE, Dangl JL. Intracellular innate immune surveillance devices in plants and animals. Science. 2016;354:aaf6395. 10.1126/science.aaf6395. [DOI] [PubMed] [Google Scholar]
- 12.Monteiro F, Nishimura MT. Structural, functional, and genomic diversity of plant NLR proteins: an evolved resource for rational engineering of plant immunity. Annu Rev Phytopathol. 2018;56(1):243–67. 10.1146/annurev-phyto-080417-045817. [DOI] [PubMed] [Google Scholar]
- 13.Kroj T, Chanclud E, Michel-Romiti C, Grand X, Morel J-B. Integration of decoy domains derived from protein targets of pathogen effectors into plant immune receptors is widespread. New Phytol. 2016;210:618–26. 10.1111/nph.13869. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Van de Weyer A-L, Monteiro F, Furzer OJ, Nishimura MT, Cevik V, Witek K, et al. A species-wide inventory of NLR genes and alleles in arabidopsis thaliana. Cell. 2019;178:1260-1272.e14. 10.1016/j.cell.2019.07.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Kourelis J, van der Hoorn RAL. Defended to the nines: 25 years of resistance gene cloning identifies nine mechanisms for R protein function. Plant Cell. 2018;30:285–99. 10.1105/tpc.17.00579. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Cesari S, Bernoux M, Moncuquet P, Kroj T, Dodds PN. A novel conserved mechanism for plant NLR protein pairs: the “integrated decoy” hypothesis. Front Plant Sci. 2014;5:606. 10.3389/fpls.2014.00606. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Le Roux C, Huet G, Jauneau A, Camborde L, Trémousaygue D, Kraut A, et al. A receptor pair with an integrated decoy converts pathogen disabling of transcription factors to immunity. Cell. 2015;161:1074–88. 10.1016/j.cell.2015.04.025. [DOI] [PubMed] [Google Scholar]
- 18.Narusaka M, Shirasu K, Noutoshi Y, Kubo Y, Shiraishi T, Iwabuchi M, et al. RRS1 and RPS4 provide a dual resistance- gene system against fungal and bacterial pathogens. Plant J. 2009;60:218–26. 10.1111/j.1365-313X.2009.03949.x. [DOI] [PubMed] [Google Scholar]
- 19.Zhang Y, Wang Y, Liu J, Ding Y, Wang S, Zhang X, et al. Temperature-dependent autoimmunity mediated by chs1 requires its neighboring TNL gene SOC3. New Phytol. 2017;213(3):1330–45. 10.1111/nph.14216. [DOI] [PubMed] [Google Scholar]
- 20.Cui H, Tsuda K, Parker JE. Effector-triggered immunity: from pathogen perception to robust defense. Annu Rev Plant Biol. 2015;66(1):487–511. 10.1146/annurev-arplant-050213-040012. [DOI] [PubMed] [Google Scholar]
- 21.Jones JDG, Dangl JL. The plant immune system. Nature. 2006;444:323–9. 10.1038/nature05286. [DOI] [PubMed] [Google Scholar]
- 22.Wan H, Yuan W, Bo K, Shen J, Pang X, Chen J. Genome-wide analysis of NBS-encoding disease resistance genes in Cucumis sativusand phylogenetic study of NBS-encoding genes in Cucurbitaceae crops. BMC Genomics. 2013;14:109. 10.1186/1471-2164-14-109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Wang X, Chen Q, Huang J, Meng X, Cui N, Yu Y, et al. Nucleotide-Binding Leucine-Rich Repeat Genes CsRSF1 and CsRSF2 Are Positive Modulators in the Cucumis sativus Defense Response to Sphaerotheca fuliginea. Int J Mol Sci. 2021;22:3986. 10.3390/ijms22083986. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Zhang W, Yuan Q, Wu Y, Zhang J, Nie J. Genome-wide identification and characterization of the CC-NBS-LRR gene family in cucumber (Cucumis sativus L.). Int J Mol Sci. 2022;23:5048. 10.3390/ijms23095048. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Lv J, Qi J, Shi Q, Shen D, Zhang S, Shao G, et al. Genetic diversity and population structure of cucumber (Cucumis sativus L.). PLoS ONE. 2012;7:e46919. 10.1371/journal.pone.0046919. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Qi J, Liu X, Shen D, Miao H, Xie B, Li X, et al. A genomic variation map provides insights into the genetic basis of cucumber domestication and diversity. Nat Genet. 2013;45:1510–5. 10.1038/ng.2801. [DOI] [PubMed] [Google Scholar]
- 27.Guan J, Miao H, Zhang Z, Dong S, Zhou Q, Liu X, et al. A near-complete cucumber reference genome assembly and Cucumber-DB, a multi-omics database. Mol Plant. 2024;17:1178–82. 10.1016/j.molp.2024.06.012. [DOI] [PubMed] [Google Scholar]
- 28.Steuernagel B, Witek K, Krattinger SG, Ramirez-Gonzalez RH, Schoonbeek H, Yu G, et al. The NLR-annotator tool enables annotation of the intracellular immune receptor Repertoire1 [OPEN]. Plant Physiol. 2020;183:468–82. 10.1104/pp.19.01273. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12:59–60. 10.1038/nmeth.3176. [DOI] [PubMed] [Google Scholar]
- 30.Ekseth OK, Kuiper M, Mironov V. OrthAgogue: an agile tool for the rapid prediction of orthology relations. Bioinformatics. 2014;30:734–6. 10.1093/bioinformatics/btt582. [DOI] [PubMed] [Google Scholar]
- 31.Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30:1575–84. 10.1093/nar/30.7.1575. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Edgar RC. Muscle5: high-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nat Commun. 2022;13:6968. 10.1038/s41467-022-34630-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–3. 10.1093/bioinformatics/btp348. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, Von Haeseler A, et al. Iq-tree 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14:587–9. 10.1038/nmeth.4285. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 2017;8:28–36. 10.1111/2041-210X.12628. [Google Scholar]
- 37.Xu S, Dai Z, Guo P, Fu X, Liu S, Zhou L, et al. GgtreeExtra: compact visualization of richly annotated phylogenetic data. Mol Biol Evol. 2021;38:4039–42. 10.1093/molbev/msab166. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Qiao X, Li Q, Yin H, Qi K, Li L, Wang R, et al. Gene duplication and evolution in recurring polyploidization–diploidization cycles in plants. Genome Biol. 2019;20:38. 10.1186/s13059-019-1650-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Hu K, Xu M, Wang J. panHiTE: a comprehensive and accurate pipeline for TE detection in large-scale population genomes. 2025;:2025.02.15.638472. 10.1101/2025.02.15.638472. [DOI] [PubMed]
- 40.Hu K, Ni P, Xu M, Zou Y, Chang J, Gao X, et al. Hite: a fast and accurate dynamic boundary adjustment approach for full-length transposable element detection and annotation. Nat Commun. 2024;15:5573. 10.1038/s41467-024-49912-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2. 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Chen C, Wu Y, Li J, Wang X, Zeng Z, Xu J, et al. TBtools-II: a “one for all, all for one” bioinformatics platform for biological big-data mining. Mol Plant. 2023;16:1733–42. 10.1016/j.molp.2023.09.010. [DOI] [PubMed] [Google Scholar]
- 43.Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, et al. Rideogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput Sci. 2020;6:e251. 10.7717/peerj-cs.251. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Liu Y, Zeng Z, Zhang Y-M, Li Q, Jiang X-M, Jiang Z, et al. An angiosperm NLR atlas reveals that NLR gene reduction is associated with ecological specialization and signal transduction component deletion. Mol Plant. 2021;14:2015–31. 10.1016/j.molp.2021.08.001.
- 45. Saucet SB, Ma Y, Sarris PF, Furzer OJ, Sohn KH, Jones JDG. Two linked pairs of Arabidopsis TNL resistance genes independently confer recognition of bacterial effector AvrRps4. Nat Commun. 2015;6:6338. 10.1038/ncomms7338.
- 46. Wang X, Bao K, Reddy UK, Bai Y, Hammar SA, Jiao C, et al. The USDA cucumber (Cucumis sativus L.) collection: genetic diversity, population structure, genome-wide association studies, and core collection development. Hortic Res. 2018;5:64. 10.1038/s41438-018-0080-8.
- 47. Sun Y, Kou D-R, Li Y, Ni J-P, Wang J, Zhang Y-M, et al. Pan-genome of Citrullus genus highlights the extent of presence/absence variation during domestication and selection. BMC Genomics. 2023;24:332. 10.1186/s12864-023-09443-w.
- 48. Sun Y, Wang J, Li Y, Jiang B, Wang X, Xu W-H, et al. Pan-genome analysis reveals the abundant gene presence/absence variations among different varieties of melon and their influence on traits. Front Plant Sci. 2022. 10.3389/fpls.2022.835496.
- 49. Liu Y, Zhang Y-M, Tang Y, Chen J-Q, Shao Z-Q. The evolution of plant NLR immune receptors and downstream signal components. Curr Opin Plant Biol. 2023;73:102363. 10.1016/j.pbi.2023.102363.
- 50. Xie S-Y, Wang M, Yang S, Peng Q, Liu W, Yan J, et al. Genome-wide identification and comparative analyses of NLR gene families in Cucumis sativus and its related species. Sci Hortic. 2025;351:114382. 10.1016/j.scienta.2025.114382.
- 51. Shi J, Lai J. Patterns of genomic changes with crop domestication and breeding. Curr Opin Plant Biol. 2015;24:47–53. 10.1016/j.pbi.2015.01.008.
- 52. Yamasaki M, Wright SI, McMullen MD. Genomic screening for artificial selection during domestication and improvement in maize. Ann Bot. 2007;100:967–73. 10.1093/aob/mcm173.
- 53. Saile SC, Kasmi FE. Small family, big impact: RNL helper NLRs and their importance in plant innate immunity. PLoS Pathog. 2023;19:e1011315. 10.1371/journal.ppat.1011315.
- 54. Seong K, Seo E, Witek K, Li M, Staskawicz B. Evolution of NLR resistance genes with noncanonical N-terminal domains in wild tomato species. New Phytol. 2020;227:1530–43. 10.1111/nph.16628.
- 55. Baggs E, Dagdas G, Krasileva KV. NLR diversity, helpers and integrated domains: making sense of the NLR IDentity. Curr Opin Plant Biol. 2017;38:59–67. 10.1016/j.pbi.2017.04.012.
- 56. Guo B-C, Zhang Y-R, Liu Z-G, Li X-C, Yu Z, Ping B-Y, et al. Deciphering plant NLR genomic evolution: synteny-informed classification unveils insights into TNL gene loss. Mol Biol Evol. 2025;42:msaf015. 10.1093/molbev/msaf015.
- 57. Shao Z-Q, Zhang Y-M, Hang Y-Y, Xue J-Y, Zhou G-C, Wu P, et al. Long-term evolution of nucleotide-binding site-leucine-rich repeat genes: understanding gained from and beyond the legume family. Plant Physiol. 2014;166:217–34. 10.1104/pp.114.243626.
- 58. Shao Z-Q, Xue J-Y, Wang Q, Wang B, Chen J-Q. Revisiting the origin of plant NBS-LRR genes. Trends Plant Sci. 2019;24:9–12. 10.1016/j.tplants.2018.10.015.
- 59. Bashir S, Rehman N, Fakhar Zaman F, Naeem MK, Jamal A, Tellier A, et al. Genome-wide characterization of the NLR gene family in tomato (Solanum lycopersicum) and their relatedness to disease resistance. Front Genet. 2022. 10.3389/fgene.2022.931580.
- 60. Zhu N, Feng Y, Shi G, Zhang Q, Yuan B, Qiao Q. Evolutionary analysis of TIR- and non-TIR-NBS-LRR disease resistance genes in wild strawberries. Front Plant Sci. 2024;15. 10.3389/fpls.2024.1452251.
- 61. Bai Y, Luo X, Qian W, Geng X, Bi X, Zhang Y. Identification and analysis of the AP2/ERF gene family in barley based on pan-genome and pan-transcriptome. J Agric Food Chem. 2025;73:18448–55. 10.1021/acs.jafc.5c05138.
- 62. Bouchnak I, van Wijk KJ. Structure, function, and substrates of Clp AAA+ protease systems in cyanobacteria, plastids, and apicoplasts: a comparative analysis. J Biol Chem. 2021;296:100338. 10.1016/j.jbc.2021.100338.
- 63. McNally FJ, Roll-Mecak A. Microtubule-severing enzymes: from cellular functions to molecular mechanism. J Cell Biol. 2018;217:4057–69. 10.1083/jcb.201612104.
- 64. Shao W, Shi G, Chu H, Du W, Zhou Z, Wuriyanghan H. Development of an NLR-ID toolkit and identification of novel disease-resistance genes in soybean. Plants. 2024;13:668. 10.3390/plants13050668.
- 65. Kanzaki H, Yoshida K, Saitoh H, Fujisaki K, Hirabuchi A, Alaux L, et al. Arms race co-evolution of Magnaporthe oryzae AVR-Pik and rice Pik genes driven by their physical interactions. Plant J. 2012;72:894–907. 10.1111/j.1365-313X.2012.05110.x.
- 66. Maqbool A, Saitoh H, Franceschetti M, Stevenson CEM, Uemura A, Kanzaki H, et al. Structural basis of pathogen recognition by an integrated HMA domain in a plant NLR immune receptor. Elife. 2015;4:e08709. 10.7554/eLife.08709.
- 67. Bailey PC, Schudoma C, Jackson W, Baggs E, Dagdas G, Haerty W, et al. Dominant integration locus drives continuous diversification of plant immune receptors with exogenous domain fusions. Genome Biol. 2018;19:23. 10.1186/s13059-018-1392-6.
- 68. Grund E, Tremousaygue D, Deslandes L. Plant NLRs with integrated domains: unity makes strength. Plant Physiol. 2019;179:1227–35. 10.1104/pp.18.01134.
- 69. Bansuli, Sharma A, Rana RS, Lata H, AlishaThakur, Sharma A. Mapping quantitative trait loci (QTLs) for resistance to downy mildew and powdery mildew in cucumber (Cucumis sativus L.). J Plant Biochem Biotechnol. 2025. 10.1007/s13562-025-00985-6.
- 70. Dangl JL, Jones JD. Plant pathogens and integrated defence responses to infection. Nature. 2001;411:826–33. 10.1038/35081161.
- 71. Tong C, Jia Y, Hu H, Zeng Z, Chapman B, Li C. Pangenome and pantranscriptome as the new reference for gene-family characterization: a case study of basic helix-loop-helix (bHLH) genes in barley. Plant Commun. 2025;6:101190. 10.1016/j.xplc.2024.101190.
Data Availability Statement
The cucumber genome sequence used in this study is publicly available in the cucumber-DB database (http://www.cucumberdb.com/).







