Abstract
Genetic information flows from DNA to RNA to protein with high fidelity. While highly essential or conserved genomic sequences are often thought to be compensated by post-transcriptional mechanisms such as alternative splicing (AS) and RNA editing, which enhance molecular diversity, this pattern does not necessarily hold true across all genomic regions. Signal peptides are short N-terminal sequences that direct the localization of proteins. By analyzing the genomes and transcriptomes of Drosophila melanogaster and several other commonly studied species, we observed a consistent pattern: genes encoding signal peptides tend to produce fewer protein isoforms than those without. Moreover, AS events at the N-terminal region are significantly underrepresented in signal peptide-containing genes. In both fruitfly and human, RNA recoding events are notably avoided in signal peptide regions. These observations suggest that the presence of signal peptides imposes constraints on both genomic evolution and transcriptomic diversity. Our results provide new insight into the relationship between genome conservation and post-transcriptional regulation, showing that conserved genomic elements, such as signal peptides, do not necessarily coincide with increased post-transcriptional diversification. This study advances our understanding of the evolutionary principles governing RNA-based regulatory mechanisms.
Introduction
The central dogma of molecular biology dictates that genetic information flows from DNA to RNA and then to protein [1]. The accuracy of DNA replication, transcription, and mRNA (messenger RNA) translation ensures the precise inheritance of genetic information. However, to adapt to changeable environments during evolution, one version of protein per gene is hardly sufficient. Since DNA mutations in conserved genomic regions are largely disfavored due to their pleiotropic effects [2], organisms have to find another way to achieve proteomic diversity and plasticity.
Alternative splicing (AS) is frequently utilized as a post-transcriptional strategy to increase molecular and phenotypic diversity/plasticity [3, 4]. With the aid of spliceosomes, an AS event recombines the exons and enables one gene to produce different protein isoforms [5–7]. Apart from AS, another post-transcriptional approach to diversify the proteome is adenosine-to-inosine (A-to-I) mRNA editing [8, 9]. Because inosine is read as guanosine in mRNAs, A-to-I editing in the coding sequence (CDS) can change amino acids (AAs), recoding the protein sequence [10]. The flexibility conferred by A-to-I RNA editing is reported to be widely used by organisms to adapt to changeable environments [11–13], resolve survival–reproduction trade-offs [14, 15], compensate for recent G-to-A genomic mutations in populations [16, 17], or serve as a transition state at genomic sites where the G allele is fitter than the A allele [18]. Interestingly, both post-transcriptional processes, AS and RNA editing, are enriched in evolutionarily conserved genes, especially those related to neuronal functions, compensating for the limited proteomic diversity of genomically constrained sequences [19–21]. This leaves the impression that the more conserved (or essential) a sequence is, the more likely it is to rely on post-transcriptional approaches to increase proteomic diversity. This compensation strategy circumvents the pleiotropic effect of DNA mutations.
Signal peptides are the N-terminal part of particular immature proteins that guide the protein to the appropriate location such as the cell membrane and extracellular matrix [22–24]. During translation, the signal peptide is synthesized, recognized, and bound by receptors on the endoplasmic reticulum (ER) membrane [25]. Then, the nascent protein in the ER lumen undergoes further folding and conformational changes to form a mature protein [26]. The proteins are guided to specific locations, and then the signal peptides are hydrolyzed and cleaved [27–29]. This conserved mechanism ensures that proteins are accurately delivered to specific cellular or extracellular locations [30, 31]. Signal peptides typically consist of 13–30 AAs, but the length varies across species. They share three main structural regions: a strongly polar N-terminal region, a central hydrophobic core region, and a polar C-terminal region [32]. This sequence conservation allows signal peptides to exert similar functions across different species [33–35]. A single AA alteration caused by a point mutation can lead to dramatic changes in the function of signal peptides [36–38]. This raises a question: since the genomic sequences encoding signal peptides are highly constrained, does this property restrict the genome evolution of host genes? Or, on the contrary, are signal peptide-containing genes compensated by post-transcriptional processes such as AS and RNA editing to increase proteomic diversity?
In this study, we systematically investigate the relationships between signal peptides, AS, and RNA editing. We mainly focus on fruitfly (Drosophila melanogaster) and complement our analyses with other well-annotated genomes of species such as human (Homo sapiens), mouse (Mus musculus), and mouse-ear cress (Arabidopsis thaliana). We found that genes with signal peptides tend to have fewer protein isoforms, and that AS events at the N-terminal of proteins are strongly avoided in these genes. Moreover, for D. melanogaster and human, which have sufficient RNA editing sites in CDS, we observed that RNA recoding events are significantly suppressed in signal peptide regions. Our results provide incremental insight that genomically conserved/constrained regions are not necessarily compensated by extensive post-transcriptional regulation: the signal peptides in a particular set of genes serve as an exception. This study advances our understanding of the evolutionary principles governing post-transcriptional regulatory mechanisms.
Materials and methods
Data acquisition
The reference genome, annotation file, and protein sequences of D. melanogaster were downloaded from FlyBase (https://flybase.org/), genome version r6.06. The full list of candidate A-to-I RNA editing sites in D. melanogaster was retrieved from the literature [39, 40]. The known RNA editing sites in Apis mellifera were retrieved from our previous works [39, 41]. The known RNA editing sites in H. sapiens were combined from two databases, RADAR [42] and REDIportal [43]. The known RNA editing sites in M. musculus were retrieved from the RADAR database. The reference genomes of the other species were downloaded from the following sources. Homo sapiens: Ensembl database (https://asia.ensembl.org/), genome version hg38 GRCh38.85. Mus musculus: Ensembl database (https://asia.ensembl.org/), genome version mm10 GRCm38.85. Arabidopsis thaliana: Ensembl plants (https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/), genome version TAIR10.59. Apis mellifera: NCBI (https://www.ncbi.nlm.nih.gov/), genome version HAv3.1. The honeybee transcriptomes used in our study were downloaded from NCBI under accession IDs SRR445999–SRR446004 (nurse) and SRR446005–SRR446010 (forager). The SNP data of D. melanogaster were downloaded from the Drosophila melanogaster Genetic Reference Panel (DGRP) [44]. These data were re-annotated and corrected for ancestral state (to determine the derived allele frequency, DAF) in our previous study [45].
Prediction of signal peptides
The signal peptides were predicted from the reference protein sequences using SignalP 6.0 [33] with default settings. The output includes the position of the cleavage site, which defines the signal peptide region on each protein sequence.
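To illustrate how a predicted cleavage site translates into a signal peptide region, the following Python sketch converts a cleavage position into amino acid and CDS coordinates. It assumes that the cleavage site is given as the 1-based index of the last signal peptide residue; the class and function names are hypothetical and do not correspond to SignalP output fields. The example uses the lengths quoted in the Results for CG43200 (protein of 21 AAs, signal peptide of 18 AAs).

```python
from dataclasses import dataclass


@dataclass
class SignalPeptideRegion:
    aa_start: int    # 1-based, inclusive
    aa_end: int      # 1-based, inclusive (last residue before the cleavage site)
    cds_start: int   # 1-based nucleotide position within the CDS
    cds_end: int     # 1-based nucleotide position within the CDS
    fraction: float  # signal peptide length / protein length


def signal_peptide_region(protein_length: int, last_sp_residue: int) -> SignalPeptideRegion:
    """Assumption: cleavage occurs immediately after residue `last_sp_residue`."""
    if not 1 <= last_sp_residue < protein_length:
        raise ValueError("cleavage site must fall inside the protein")
    return SignalPeptideRegion(
        aa_start=1,
        aa_end=last_sp_residue,
        cds_start=1,
        cds_end=3 * last_sp_residue,  # three nucleotides per codon
        fraction=last_sp_residue / protein_length,
    )


# Lengths taken from the text: 21-AA protein with an 18-AA signal peptide.
region = signal_peptide_region(protein_length=21, last_sp_residue=18)
print(f"signal peptide covers {region.fraction:.1%} of the protein")
```

The CDS coordinates returned here are what would be intersected with SNPs, editing sites, or hairpin intervals in the downstream comparisons.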
Quantification of RNA editing level from the transcriptome
The scheme for read mapping, filtering, and quantifying the editing level of each site was described in detail in our previous work [41]. In brief, RNA-seq reads were mapped to the reference sequence using BWA mem version 0.7.17-r1198 [46]. Variation status at known RNA editing sites was extracted by REDItools v2.0 [47] with the following non-default parameters: mapping quality (q) ≥10 and base quality (Q) ≥30. This excludes poorly aligned reads and low-quality bases. Known RNA editing sites at which the G allele was detected in the RNA-seq reads were regarded as having an editing level >0.
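As a minimal sketch of the last step, the editing level at a known site can be expressed as the fraction of G reads among the reads supporting A or G. This definition is an assumption made here for illustration (the exact procedure is given in ref. [41]), the function names are hypothetical, and the read counts below are invented. For genes on the minus strand the corresponding alleles on the reference would be T and C.

```python
from typing import Optional


def editing_level(a_count: int, g_count: int) -> Optional[float]:
    """Fraction of G reads among A+G reads at a site (assumed definition).

    Returns None when the site has no coverage after quality filtering.
    """
    if a_count < 0 or g_count < 0:
        raise ValueError("read counts must be non-negative")
    depth = a_count + g_count
    if depth == 0:
        return None
    return g_count / depth


def is_edited(a_count: int, g_count: int) -> bool:
    """A site counts as edited (level > 0) once any G read is observed."""
    level = editing_level(a_count, g_count)
    return level is not None and level > 0


# Hypothetical read counts (A reads, G reads) for three known sites.
for site, (a, g) in {"site1": (90, 10), "site2": (50, 0), "site3": (0, 0)}.items():
    print(site, editing_level(a, g), is_edited(a, g))
```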
dsRNA prediction
We obtained the pre-mRNA sequences of all coding genes in D. melanogaster and used RNALfold [48] with parameter “-z 3” to predict the regions of stable hairpins in pre-mRNAs. All output hairpin regions were merged into a union using “bedtools merge” [49]. The proportion of hairpin in each region was calculated as the length of hairpin divided by the total length of the region [50]. This proportion reflects the tendency to form double-stranded RNA (dsRNA). We folded pre-mRNA rather than CDS because RNA editing typically takes place co-transcriptionally in pre-mRNA; the dsRNA predicted in pre-mRNA is therefore more relevant to the real RNA editing process.
Let each element be (i) a signal peptide region in a gene, (ii) a non-signal peptide region in a signal peptide gene, or (iii) the CDS of a gene without a signal peptide. Each element has a total length L and a hairpin length h within the element. The proportion of hairpin for each element is then h/L.
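The h/L calculation can be sketched in Python as follows. This is an illustration rather than the pipeline used in this study: it assumes half-open BED-style coordinates, treats each element as one contiguous interval, and reimplements the interval union in plain Python instead of calling bedtools. All coordinates are invented.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED files


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Union of overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def hairpin_proportion(element: Interval, hairpins: List[Interval]) -> float:
    """h / L: hairpin-covered length within the element over element length."""
    el_start, el_end = element
    length = el_end - el_start
    if length <= 0:
        raise ValueError("element must have positive length")
    covered = 0
    for start, end in merge_intervals(hairpins):
        covered += max(0, min(end, el_end) - max(start, el_start))
    return covered / length


# Hypothetical 60-nt element (e.g. a 20-AA signal peptide) and predicted hairpins.
element = (100, 160)
hairpins = [(90, 120), (110, 130), (150, 200)]
print(merge_intervals(hairpins))              # union of the hairpin regions
print(hairpin_proportion(element, hairpins))  # h / L for this element
```

Merging before measuring overlap matters because overlapping hairpin predictions would otherwise be counted twice in h.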
Statistics and graphical works
Statistical tests and plotting were performed in R (version 3.6.3).
Results
Systematic identification of signal peptides in D. melanogaster genome
To understand whether signal peptides restrict genome evolution, we systematically identified signal peptides in the reference genome of D. melanogaster (genome version r6.06). Among the 13 919 protein coding genes, 3191 (22.9%) genes have signal peptides and the remaining 10 728 (77.1%) genes do not have signal peptides (Fig. 1A). The median length of signal peptides is 21 AAs (Fig. 1B), and the longest signal peptide is 55 AAs, located at the N-terminal of gene FBgn0001083 (fw, furrowed). Most of the signal peptides make up <10% of the total protein length (Fig. 1B; median percentage = 6.2%). The highest percentage of signal peptide appears in gene FBgn0262836 (CG43200), where the entire protein has only 21 AAs and the signal peptide is 18 AAs in length. Interestingly, we found that the proteins with signal peptides are significantly shorter than proteins without signal peptides (Fig. 1C). Since these signal peptide-containing proteins are targeted to cell membranes or transported outside the cell, the significant difference in protein length might reflect selection for energy saving, as shorter proteins require less energy to traffic. Moreover, secretory and membrane proteins with signal peptides typically have simple, compact, stable, and specialized structures without the need for complex functional domains [51, 52]. In contrast, cytosolic and nuclear proteins, such as transcription factors, DNA/RNA binding proteins, and metabolic enzymes, are more complex and require larger, multifunctional domains to support their diverse roles [53–55].
Figure 1.
Statistics of signal peptides and the impact on genome evolution. (A) Number of genes with or without signal peptide and number of genes with one or more than one protein isoforms. (B) Length distribution of signal peptides and their length fraction in the protein. (C) Protein length of genes with or without signal peptide. P-value was calculated with Wilcoxon rank sum test. (D) Fraction of genes with signal peptide. Genes were ranked with increasing number of protein isoforms. (E) Protein isoforms of a gene can have identical or different N-terminals depending on whether the first “CDS exon” is alternatively spliced. (F) The fraction of genes with identical N-terminal among different isoforms was compared between genes with or without signal peptide. P-value was calculated by Fisher’s exact test.
Signal peptides restrict splicing in Drosophila
Since we observed that the presence of a signal peptide constrains the length of the host protein, we wondered whether the proteomic diversity (conferred by splicing) is also restricted by signal peptides. Among the 13 919 protein coding genes in D. melanogaster, roughly half of the genes (6902, 49.6%) have only one annotated protein isoform and the other half (7017, 50.4%) have more than one protein isoform (Fig. 1A). We then examined this fraction according to whether a coding gene has a signal peptide. Among the 3191 genes with signal peptide, 59.4% (1897/3191) have only one protein isoform, whereas this fraction is only 46.7% (5005/10 728) for genes without signal peptide, and the difference is significant (Supplementary Fig. S1). If we directly compare the number of isoforms per gene, this number is still significantly lower for the signal peptide genes (median = 1, mean = 1.77) than for the non-signal peptide genes (median = 2, mean = 2.31; P = 5.0e−53, Wilcoxon rank sum test). This result suggests that signal peptides not only restrict protein length, but also affect proteomic diversity by confining AS. Moreover, we divided the coding genes into five bins having one, two, three, four, and more than four isoforms, finding that the fraction of genes with signal peptide decreases monotonically with increasing numbers of isoforms (Fig. 1D). This finer-grained view shows that the restriction on isoform diversity in signal peptide-containing genes holds across the whole range of isoform numbers.
As signal peptides reside at the N-terminal of proteins, we expect AS to be underrepresented mainly at the N-terminal. We define a “CDS exon” as an exon overlapping with the CDS. Only genes with at least two CDS exons are considered. In addition, for 169 of the 13 919 genes, some protein isoforms have a signal peptide while others do not. These few genes are ambiguous and are excluded from the downstream analysis.
For a given gene, if the first CDS exon is alternatively spliced, then the different protein isoforms will differ at the N-terminal; if the first CDS exon is retained in all protein isoforms, then they will have identical N-terminals (Fig. 1E). Assuming that the sequence of the signal peptide is essential for the proper function of the host protein, we predict that genes with signal peptide tend to have an identical N-terminal for all isoforms. Our data show that 93.5% (1052/1125) of genes with signal peptides have identical N-terminals, and this fraction is significantly lower (4139/5723 = 72.3%) in genes without signal peptides (Fig. 1F). This indicates that AS at the first CDS exon is evolutionarily suppressed in order to maintain the integrity of the signal peptide. The remaining 73 (6.5%) signal peptide-containing genes have different N-terminals for different isoforms, representing the rare case in which the same gene can encode proteins with different signal peptides.
Notably, our results show that although splicing events are suppressed in the N-terminal of proteins with signal peptides (Fig. 1F), the remaining parts of the proteins (which are not constrained by the signal peptide) do not seem to play a compensatory role, because the total number of protein isoforms is anticorrelated with the presence of signal peptide (Fig. 1D). However, AS is only one of the various routes to diversity; we therefore asked whether the mutation profile is different in signal peptide regions.
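The single-isoform comparison (1897/3191 versus 5005/10 728) can be framed as a 2 × 2 contingency table. The Python sketch below is illustrative only: the analyses in this study were performed in R, the test behind Supplementary Fig. S1 is not specified here, and the function names are hypothetical. It implements a two-sided Fisher's exact test with the standard library by summing the hypergeometric probabilities that do not exceed the probability of the observed table.

```python
from math import exp, lgamma


def log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = log_choose(row1 + row2, col1)

    def log_pmf(x: int) -> float:
        return log_choose(row1, x) + log_choose(row2, col1 - x) - log_denom

    log_obs = log_pmf(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_pmf(x)
        if lp <= log_obs + 1e-7:  # small tolerance for floating-point ties
            p_value += exp(lp)
    return min(1.0, p_value)


# Rows: genes with / without signal peptide.
# Columns: one isoform / more than one isoform (counts from the text).
single_sp, total_sp = 1897, 3191
single_non, total_non = 5005, 10728
p_value = fisher_exact_two_sided(
    single_sp, total_sp - single_sp, single_non, total_non - single_non
)
odds_ratio = (single_sp * (total_non - single_non)) / (
    (total_sp - single_sp) * single_non
)
print(f"odds ratio = {odds_ratio:.2f}, P = {p_value:.3g}")
```

Working in log space avoids overflow in the binomial coefficients for tables of this size.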
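The classification of a gene as having identical or different N-terminals can be sketched in Python as follows. The sketch assumes that each isoform is represented by its CDS exons listed in 5′-to-3′ order and that two isoforms share an N-terminal only when their first CDS exons have exactly the same coordinates; both the data layout and the function name are hypothetical, and the coordinates are invented.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end) of a CDS exon


def has_identical_n_terminal(isoforms: Dict[str, List[Exon]]) -> bool:
    """True if all protein isoforms of a gene share the same first CDS exon.

    Assumption: exon lists are already ordered 5' to 3' along the transcript.
    """
    if len(isoforms) < 2:
        raise ValueError("need at least two protein isoforms")
    if any(len(exons) == 0 for exons in isoforms.values()):
        raise ValueError("every isoform needs at least one CDS exon")
    first_exons = {exons[0] for exons in isoforms.values()}
    return len(first_exons) == 1


# Hypothetical gene whose isoforms differ only downstream of the first CDS exon.
gene_a = {
    "isoform1": [(100, 250), (400, 520), (700, 910)],
    "isoform2": [(100, 250), (700, 910)],
}
# Hypothetical gene whose isoforms use alternative first CDS exons.
gene_b = {
    "isoform1": [(100, 250), (400, 520)],
    "isoform2": [(300, 350), (400, 520)],
}
print(has_identical_n_terminal(gene_a), has_identical_n_terminal(gene_b))

# Fractions reported in the text for multi-isoform genes.
print(f"with signal peptide: {1052 / 1125:.1%}; without: {4139 / 5723:.1%}")
```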
Mutations are underrepresented in signal peptides revealed by worldwide Drosophila population
To further elucidate the impact of signal peptide on genome evolution, we downloaded the worldwide SNP data from DGRP [44], the data of which were analyzed in our previous study [45] (Supplementary Data S1). We divided all the protein coding regions into genes with or without signal peptide, and the CDSs with signal peptide were further divided into the signal peptide region and the remaining region (Fig. 2A). Using synonymous sites as control, we calculated the nonsynonymous-to-synonymous ratio (Nonsyn/Syn) of SNPs and found that signal peptide regions have significantly lower Nonsyn/Syn ratio than non-signal peptide regions (Fig. 2B). This suggests an overall avoidance of Nonsyn mutations in signal peptides. Moreover, the derived allele frequency (DAF) spectrum also reflects a suppression of Nonsyn SNPs in signal peptide regions, while Syn sites do not exhibit such patterns (Fig. 2C). These observations imply a certain level of evolutionary constraint on protein sequences imposed by signal peptides.
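For readers less familiar with the DAF, the following Python sketch shows the quantity for a single SNP once the ancestral state is known. It is only an illustration under the assumption that the DAF is the pooled frequency of all non-ancestral alleles; the function name and the allele counts are hypothetical and are not taken from the DGRP data.

```python
from typing import Dict


def derived_allele_frequency(allele_counts: Dict[str, int], ancestral: str) -> float:
    """Frequency of the non-ancestral (derived) allele(s) at one SNP."""
    if ancestral not in allele_counts:
        raise ValueError("ancestral allele is not among the observed alleles")
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no genotyped chromosomes at this SNP")
    return (total - allele_counts[ancestral]) / total


# Hypothetical SNP: 150 chromosomes carry the ancestral A, 50 carry the derived G.
daf = derived_allele_frequency({"A": 150, "G": 50}, ancestral="A")
print(daf)
```

Under purifying selection, deleterious derived alleles are kept at low frequency, which is why a DAF spectrum shifted toward low values for Nonsyn SNPs is read as a signature of constraint.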
Figure 2.
Underrepresentation of SNPs in signal peptide regions. (A) CDSs are classified into with or without signal peptide, and the former is further divided into signal peptide region and the remaining region. Nonsynonymous and synonymous SNPs were considered. (B) Nonsyn/Syn ratio of SNPs in different genomic regions. Categories with different letters have significant difference under Fisher’s exact tests. (C) DAF of SNPs in different genomic regions. Categories with different letters have significant difference under Wilcoxon rank sum tests.
Signal peptides restrict A-to-I RNA editing in Drosophila
In addition to AS, another post-transcriptional approach to achieve proteomic diversity is A-to-I RNA editing (Fig. 3A) [8, 56–58]. In animals, endogenous A-to-I RNA editing is mediated by the ADAR enzymes that preferentially target dsRNAs [59–63]. Thousands to millions of RNA editing sites are identified in the transcriptomes of mammals [64–68], insects [69–71], cephalopods [11–13, 19], and worms [72]. Particularly, in insects and cephalopods, a large proportion of RNA editing sites is located in coding region and alters the protein sequence [73–75], termed nonsynonymous or recoding sites [10, 76, 77].
Figure 3.
Underrepresentation of A-to-I RNA editing in signal peptide regions and implications. (A) An introduction on the occurrence and consequence of A-to-I RNA editing. (B) Nonsynonymous RNA editing levels in different regions. P-values were calculated with Wilcoxon rank sum tests. (C) Proportion of dsRNA structure (hairpin) in different genomic regions. Signal peptides were divided into five bins with increasing length.
Since signal peptides restrict proteomic diversity by avoiding splicing events at the beginning of the CDS, the same selection pressure may also constrain A-to-I RNA editing events. By collecting the full list of candidate A-to-I RNA editing sites in D. melanogaster [39, 40] and requiring editing level >0 in heads, we obtained 1008 nonsynonymous editing sites (Supplementary Data S2). We found two sites located in signal peptide regions, 117 sites in signal peptide genes but outside the signal peptide region, and 889 sites in genes without signal peptide. Overall, 46 out of 3191 genes with signal peptide (46/3191 = 1.4%) have nonsynonymous editing events, while this fraction is 3.5% (373/10 728) in genes without signal peptide (P = 2.7e−10, Fisher’s exact test). This suggests an overall avoidance of nonsynonymous RNA editing in signal peptide genes. Moreover, the two nonsynonymous sites in signal peptide regions have significantly lower editing levels than the nonsynonymous sites outside the signal peptide region or in non-signal peptide genes (Fig. 3B). In addition, the non-signal peptide regions of the host genes do not show higher editing efficiency than the genes without signal peptides. This indicates that although proteomic diversity is decreased for signal peptide sequences, the remaining parts of the genes do not seem to increase RNA editing efficiency to compensate for the lost diversity.
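The counts in this paragraph can be cross-checked with a few lines of Python. The sketch below only restates the arithmetic reported in the text; the variable names are hypothetical.

```python
# Nonsynonymous editing sites by location (counts from the text).
site_counts = {
    "signal peptide region": 2,
    "signal peptide gene, outside signal peptide": 117,
    "gene without signal peptide": 889,
}
total_sites = sum(site_counts.values())

# (genes with nonsynonymous editing, all genes) for each gene class.
genes = {
    "with signal peptide": (46, 3191),
    "without signal peptide": (373, 10728),
}
fractions = {label: edited / total for label, (edited, total) in genes.items()}

# How much less often signal peptide genes carry a nonsynonymous editing site.
relative = fractions["with signal peptide"] / fractions["without signal peptide"]

print(f"total nonsynonymous sites: {total_sites}")
for label, value in fractions.items():
    print(f"genes {label}: {value:.1%}")
print(f"ratio of the two fractions: {relative:.2f}")
```

The three site categories are mutually exclusive, so their sum should recover the 1008 sites stated above.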
We further predicted the potential impact of the two A-to-I RNA editing events in the signal peptide region (Supplementary Fig. S2). The host genes of the two editing sites are vsg (visgun, involved in cell proliferation and long-term memory) and Nplp2 (Neuropeptide-like precursor 2, a widely expressed gene involved in humoral immune response, https://flybase.org/). We found that the two RNA editing events, vsg p.6 Lys>Glu and Nplp2 p.3 Lys>Glu, do not alter the boundary of the three major parts of signal peptides (Supplementary Fig. S2, N-terminal region, central region, and C-terminal region). Given that the editing levels of these two sites are suppressed to an extremely low level, the impact of such editing on the entire cellular environment might be limited.
We then looked for synonymous and non-CDS editing sites as controls. This would help clarify whether the observed suppression is specific to signal peptide regions or reflects a broader genomic trend. However, no synonymous RNA editing is recorded in signal peptide regions, and therefore we can only compare synonymous editing between genes with and without signal peptide. The same goes for non-CDS editing sites. We found that the synonymous, UTR, or intronic editing levels do not show significant differences between genes with and without signal peptides (Supplementary Fig. S3). Although this result does not directly demonstrate the constraint in signal peptide regions, it suggests that the overall editing efficiency within a gene is not affected by the presence or absence of signal peptides.
Since dsRNA structure is required for ADAR-mediated A-to-I editing [78], we are curious whether this factor could mechanistically underpin the observed underrepresentation of RNA editing in signal peptide. We calculated the proportion of stable dsRNA (hairpin) in different regions (see the “Materials and methods” section), finding that although signal peptide regions have overall lower proportion of RNA structure compared to non-signal peptide region, this difference is likely caused by the shorter length of signal peptides compared to other regions because such proportion is correlated with signal peptide length (Fig. 3C). Nevertheless, the current observations cannot rule out the role of dsRNA in the underrepresentation of Nonsyn RNA editing in signal peptide.
Signal peptides restrict AS in other representative species
To test whether the evolutionary constraint exerted by signal peptides is a universal trend across the tree of life, we retrieved the commonly used and well-annotated reference genomes, including human (H. sapiens), mouse (M. musculus), and mouse-ear cress (A. thaliana). As we have stressed, many non-model organisms are not suitable for such analysis due to their potentially inadequate annotation of reference genomes (where most genes only have one isoform).
We excluded the genes in which some isoforms have a signal peptide and the others do not. These genes account for <10% of the total genes in human (2090/22 964 = 9.1%), mouse (999/22 964 = 4.4%), and A. thaliana (501/27 628 = 1.8%). Interestingly, we obtained the same trend as in D. melanogaster. First, genes with signal peptides have a significant tendency to be single-isoform genes, suggesting that AS is suppressed in those genes (Supplementary Fig. S4). A more nuanced analysis shows that the fraction of genes with signal peptide also decreases monotonically with increasing numbers of isoforms (Fig. 4A), echoing the results in fruitfly (Fig. 1D). Then, for the multi-isoform genes, we calculated how many of these genes have isoforms that differ at the N-terminal. We again found that genes with signal peptide tend to have isoforms with identical N-terminals, while genes without signal peptide more often have isoforms with different N-terminals (Fig. 4B). This indicates that the presence of a signal peptide significantly restricts AS at the 5′ end of the CDS. Together with the comparison of the number of isoforms per gene, we conclude that signal peptides restrict the evolution of splicing in CDS and thus constrain proteomic diversity. This pattern is highly consistent across mammals, insects, and plants.
Figure 4.
Restriction exerted by signal peptide in other representative species including human (H. sapiens), mouse (M. musculus), and mouse-ear cress (A. thaliana). (A) Fraction of genes with signal peptide. Genes were ranked with increasing number of protein isoforms. (B) The fraction of multiple isoform genes having isoforms with identical N-terminal. P-values were calculated by Fisher’s exact tests.
Note that the baseline proportions of genes with signal peptide or genes with single isoform might vary across different species. For example, in the human genome, 75.0% genes have multiple protein isoforms, while this fraction is only 50.4% in D. melanogaster. For the proportion of genes with signal peptide, this fraction is 20.8% in human compared to a similar value of 22.9% in D. melanogaster. However, the restriction of signal peptide on genome evolution tends to be conserved.
Suppression of RNA editing in signal peptides is also observed in honeybee and human
For the non-model organisms, the annotation of alternatively spliced isoforms is not as comprehensive as in D. melanogaster, preventing us from precisely testing the relationship between splicing isoforms and signal peptides. However, the availability of known RNA editing sites in those species enables us to check the evolutionary constraint on RNA editing imposed by the presence of signal peptides. In particular, a species with overrepresented CDS or nonsynonymous RNA editing is favorable. We selected the honeybee A. mellifera, in which the recoding sites are overall positively selected.
According to the latest genome version HAv3.1 of A. mellifera, we obtained 9935 unique coding genes, among which 1348 genes contain signal peptides and the remaining 8587 genes do not have signal peptides. By using a union of candidate RNA editing sites identified in our previous studies [39, 41] and requiring editing level >0 in our collected RNA-seq data, we obtained 298 nonsynonymous editing sites (Supplementary Data S3). Eighteen sites are located in 10 unique genes with signal peptide but not in the signal peptide regions, and 280 sites are located in 156 unique genes without signal peptides. A simple calculation on the proportion of genes with nonsynonymous RNA editing suggests that the genes with signal peptide have significantly lower proportion (10/1348 = 0.74%) compared to the genes without signal peptide (156/8587 = 1.8%, P = .0014, Fisher’s exact test). This pattern nicely echoes the results in fruitfly that nonsynonymous RNA editing is underrepresented in genes with signal peptides.
In human, the RNA editing sites are well recorded in databases such as RADAR [42] and REDIportal [43]. A combination of the two sources produced 18 921 CDS editing sites, not including a few sites at start codons, stop codons, and splicing junctions (Supplementary Data S4). The Nonsyn/Syn ratios of editing sites were 32/23 = 1.39, 3135/1099 = 2.85, and 10 494/4138 = 2.54 for the signal peptide region, the remaining regions in signal peptide genes, and genes without signal peptide, respectively (Supplementary Fig. S5). The differences were significant between the signal peptide region and the other regions. This indicates that the evolutionary constraint on RNA editing imposed by the presence of signal peptides might be highly conserved across insects and primates. For mouse RNA editing sites recorded in the RADAR database, no sites are located in signal peptide regions, preventing us from testing the generality of this pattern in a representative rodent. Nevertheless, the consistency between insects and humans supports the broad applicability of our notion.
Note that since plant RNA editing mainly takes place in chloroplast and mitochondrial genes [79–82], where one gene typically produces one isoform and does not encode a signal peptide, it is not feasible to study the relationship between RNA editing, AS, and signal peptides in plants.
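The Nonsyn/Syn ratios quoted for human can be reproduced directly from the site counts. The short Python sketch below restates that arithmetic and checks that the six counts add up to the 18 921 CDS editing sites; the dictionary keys and function name are hypothetical labels.

```python
from typing import Dict, Tuple

# (nonsynonymous sites, synonymous sites) per region, counts from the text.
editing_counts: Dict[str, Tuple[int, int]] = {
    "signal peptide region": (32, 23),
    "rest of signal peptide genes": (3135, 1099),
    "genes without signal peptide": (10494, 4138),
}


def nonsyn_syn_ratio(nonsyn: int, syn: int) -> float:
    """Number of nonsynonymous sites per synonymous site."""
    if syn == 0:
        raise ValueError("ratio is undefined without synonymous sites")
    return nonsyn / syn


ratios = {region: nonsyn_syn_ratio(n, s) for region, (n, s) in editing_counts.items()}
total_cds_sites = sum(n + s for n, s in editing_counts.values())

for region, ratio in ratios.items():
    print(f"{region}: Nonsyn/Syn = {ratio:.2f}")
print(f"total CDS editing sites: {total_cds_sites}")
```

Using synonymous sites as the denominator controls for differences in region length and overall editing density, so a lower ratio points to depletion of recoding events specifically.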
Discussion
Trade-offs between genome evolution and transcriptomic diversity are commonly seen in organisms [19]. Due to the pleiotropic effects of DNA mutations, highly essential, conserved, and unchangeable DNA sequences are usually compensated “epigenetically” by various post-transcriptional mechanisms like AS and RNA editing [19–21, 83]. The functionally important signal peptide sequences are such cases: they are intuitively unchangeable at the DNA level and might be expected to be compensated by post-transcriptional processes. We acknowledge that the evolutionary constraint on signal peptide sequences is not an entirely novel concept. For example, the literature demonstrates that the signal peptide of parathyroid hormone-related protein exhibits significant evolutionary conservation among mammalian species, underscoring its functional importance [84]. Mutations in signal peptides can dramatically change the fitness of host organisms [85]. However, previous works are mainly case reports on individual gene(s). The advent of the omics era allows us to systematically test this hypothesis at the genome-wide and multi-species level. Such scale and depth set our study apart from prior reports.
In this work, we fully take advantage of the commonly used reference genomes of fruitfly, honeybee, human, mouse, and mouse-ear cress and the well-annotated RNA editomes and SNP data of a few species. We found that the presence of signal peptides remarkably constrains the genome evolution and proteomic diversity of host genes. Post-transcriptional processes such as AS and A-to-I RNA editing are avoided in signal peptide regions but are not compensated in the remaining regions of host genes. We provide incremental insight that there are indeed some genomic regions that are constrained at both DNA and post-transcriptional levels.
Due to the unchangeable property of the signal peptide region, AS is underrepresented at the N-terminal of related proteins, and this potentially restricts the proteomic diversity. One may argue that although some mutations can abolish the function of signal peptides [86–88], there are still some flexible regions that are tolerant to mutations. However, the sequence alteration caused by splicing is much more severe than that caused by point mutations. A new question then arises: since signal peptides do not appear in the mature protein, how is natural selection imposed on signal peptides? We provide the following rationales: (i) Although the sequence of the mature protein is independent of the signal peptide, mutations in the signal peptide can affect the transportation and localization of host proteins. This functional shift is also subject to strong purifying selection. (ii) The first exon of the CDS not only contains the signal peptide, but might also include part of the mature protein. The suppression of AS at this exon prevents the possible shuffling of the N-terminal of the mature protein. Thus, the presence of a signal peptide still constrains the “evolvability” of the host protein.
Regarding the underrepresentation of A-to-I RNA editing in the signal peptide regions, although this phenomenon is only verified in Drosophila, honeybee, and human, where there are sufficient numbers of RNA editing sites in CDS, this does not preclude the generality of the pattern. Further comparative genomic analysis might reveal the underrepresentation of other types of cis-regulatory elements due to the presence of signal peptides. For future functional validation of this hypothesis, fly strains (or other experimental animals) with point mutations in signal peptide regions would be valuable. Artificial introduction of mutations into signal peptides might be associated with decreased fitness in many ways, depending on the function of the host gene. Moreover, mechanistic insights can be gained from the structural and biochemical differences between the wild-type and mutated proteins.
Taken together, our study provides incremental insight that conserved genomic sequences are not necessarily correlated with extensive post-transcriptional compensation [19]. At least for the genes with signal peptides, AS and RNA editing are not overrepresented compared to the genome-wide baseline. This study deepens our understanding of the evolutionary principles of epigenetic regulations.
Acknowledgements
We thank the funders for their financial support. We thank Dr Ling Ma for help with the scanning of signal peptides. The computational work was supported by the High-Performance Computing Platform of China Agricultural University, and we thank the platform for this support.
Author contributions: Conceptualization and supervision: Y.D., H.L., and W.S. Data analysis: Y.D. and S.C. Writing—original draft: Y.D., S.C., W.C., W.S., and H.L. Writing—review & editing: Y.D., S.C., W.C., W.S., and H.L. All authors approved the submission of this manuscript.
Contributor Information
Yuange Duan, Department of Entomology and State Key Laboratory of Agricultural and Forestry Biosecurity, MOA Key Lab of Pest Monitoring and Green Management, College of Plant Protection, China Agricultural University, Beijing 100193, China.
Shuxian Chen, Department of Entomology and State Key Laboratory of Agricultural and Forestry Biosecurity, MOA Key Lab of Pest Monitoring and Green Management, College of Plant Protection, China Agricultural University, Beijing 100193, China.
Wanzhi Cai, Department of Entomology and State Key Laboratory of Agricultural and Forestry Biosecurity, MOA Key Lab of Pest Monitoring and Green Management, College of Plant Protection, China Agricultural University, Beijing 100193, China.
Wen Song, State Key Laboratory of Plant Environmental Resilience, College of Biological Sciences, China Agricultural University, Beijing 100193, China.
Hu Li, Department of Entomology and State Key Laboratory of Agricultural and Forestry Biosecurity, MOA Key Lab of Pest Monitoring and Green Management, College of Plant Protection, China Agricultural University, Beijing 100193, China.
Supplementary data
Supplementary data is available at NAR Genomics & Bioinformatics online.
Conflict of interest
None declared.
Funding
This study was financially supported by the National Natural Science Foundation of China (nos. 32120103006 and 32300371), the Beijing Natural Science Foundation (Natural Science Foundation of Beijing Municipality, no. 6252012), the Young Elite Scientist Sponsorship Program by CAST (no. 2023QNRC001), the Young Elite Scientist Sponsorship Program by BAST (no. BYESS2023160), and the 2115 Talent Development Program of China Agricultural University.
Data availability
The reference genome, annotation file, and protein sequences of D. melanogaster were downloaded from FlyBase (https://flybase.org/), genome version r6.06. The full list of candidate A-to-I RNA editing sites in D. melanogaster was retrieved from previous studies [39, 40]. The list of known RNA editing sites in A. mellifera was retrieved from our previous works [39, 41]. The list of known RNA editing sites in H. sapiens was combined from two databases, RADAR [42] and REDIportal [43]. The list of known RNA editing sites in M. musculus was retrieved from the RADAR database. The reference genomes of the other species were downloaded from the following sources: Homo sapiens: Ensembl database (https://asia.ensembl.org/), genome version hg38 GRCh38.85; Mus musculus: Ensembl database (https://asia.ensembl.org/), genome version mm10 GRCm38.85; Arabidopsis thaliana: Ensembl plants (https://ftp.ensemblgenomes.ebi.ac.uk/pub/plants/), genome version TAIR10.59; and Apis mellifera: NCBI (https://www.ncbi.nlm.nih.gov/), genome version HAv3.1. The honeybee transcriptomes used in our study were downloaded from NCBI under accession IDs SRR445999–SRR446004 (nurse) and SRR446005–SRR446010 (forager). The SNP data of D. melanogaster were downloaded from DGRP [44]. These data were re-annotated and corrected for ancestral state (to determine the derived allele frequency, DAF) in our previous study [45].
References
- 1. Crick F. Central dogma of molecular biology. Nature. 1970; 227:561–3. 10.1038/227561a0.
- 2. Meng F, Jia Z, Zheng J et al. A deafness-associated mitochondrial DNA mutation caused pleiotropic effects on DNA replication and tRNA metabolism. Nucleic Acids Res. 2022; 50:9453–69. 10.1093/nar/gkac720.
- 3. Wright CJ, Smith CWJ, Jiggins CD. Alternative splicing as a source of phenotypic diversity. Nat Rev Genet. 2022; 23:697–710. 10.1038/s41576-022-00514-4.
- 4. Reddy AS. Alternative splicing of pre-messenger RNAs in plants in the genomic era. Annu Rev Plant Biol. 2007; 58:267–94. 10.1146/annurev.arplant.58.032806.103754.
- 5. Roy SW, Gilbert W. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet. 2006; 7:211–21.
- 6. Wang X, Wang J, Li S et al. An overview of RNA splicing and functioning of splicing factors in land plant chloroplasts. RNA Biol. 2022; 19:897–907. 10.1080/15476286.2022.2096801.
- 7. Duan Y, Cao Q. Systematic revelation and meditation on the significance of long exons using representative eukaryotic genomes. BMC Genomics. 2025; 26:290. 10.1186/s12864-025-11504-1.
- 8. Eisenberg E, Levanon EY. A-to-I RNA editing—immune protector and transcriptome diversifier. Nat Rev Genet. 2018; 19:473–90. 10.1038/s41576-018-0006-1.
- 9. Ma L, Zheng C, Xu S et al. A full repertoire of Hemiptera genomes reveals a multi-step evolutionary trajectory of auto-RNA editing site in insect Adar gene. RNA Biol. 2023; 20:703–14. 10.1080/15476286.2023.2254985.
- 10. Alon S, Garrett SC, Levanon EY et al. The majority of transcripts in the squid nervous system are extensively recoded by A-to-I RNA editing. eLife. 2015; 4:e05198. 10.7554/eLife.05198.
- 11. Birk MA, Liscovitch-Brauer N, Dominguez MJ et al. Temperature-dependent RNA editing in octopus extensively recodes the neural proteome. Cell. 2023; 186:2544–55. 10.1016/j.cell.2023.05.004.
- 12. Garrett S, Rosenthal JJ. RNA editing underlies temperature adaptation in K+ channels from polar octopuses. Science. 2012; 335:848–51. 10.1126/science.1212795.
- 13. Rangan KJ, Reck-Peterson SL. RNA recoding in cephalopods tailors microtubule motor protein function. Cell. 2023; 186:2531–43. 10.1016/j.cell.2023.04.032.
- 14. Qi Z, Lu P, Long X et al. Adaptive advantages of restorative RNA editing in fungi for resolving survival-reproduction trade-offs. Sci Adv. 2024; 10:eadk6130. 10.1126/sciadv.adk6130.
- 15. Xin K, Zhang Y, Fan L et al. Experimental evidence for the functional importance and adaptive advantage of A-to-I RNA editing in fungi. Proc Natl Acad Sci USA. 2023; 120:e2219029120. 10.1073/pnas.2219029120.
- 16. An NA, Ding W, Yang XZ et al. Evolutionarily significant A-to-I RNA editing events originated through G-to-A mutations in primates. Genome Biol. 2019; 20:24. 10.1186/s13059-019-1638-y.
- 17. Mai TL, Chuang TJ. A-to-I RNA editing contributes to the persistence of predicted damaging mutations in populations. Genome Res. 2019; 29:1766–76. 10.1101/gr.246033.118.
- 18. Popitsch N, Huber CD, Buchumenski I et al. A-to-I RNA editing uncovers hidden signals of adaptive genome evolution in animals. Genome Biol Evol. 2020; 12:345–57. 10.1093/gbe/evaa046.
- 19. Liscovitch-Brauer N, Alon S, Porath HT et al. Trade-off between transcriptome plasticity and genome evolution in cephalopods. Cell. 2017; 169:191–202. 10.1016/j.cell.2017.03.025.
- 20. Driscoll S, Merkuri F, Chain FJJ et al. Splicing is dynamically regulated during limb development. Sci Rep. 2024; 14:19944. 10.1038/s41598-024-68608-z.
- 21. Weissbach S, Milkovits J, Pastore S et al. Cortexa: a comprehensive resource for studying gene expression and alternative splicing in the murine brain. BMC Bioinformatics. 2024; 25:293. 10.1186/s12859-024-05919-y.
- 22. Blobel G, Dobberstein B. Transfer of proteins across membranes. I. Presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. J Cell Biol. 1975; 67:835–51. 10.1083/jcb.67.3.835.
- 23. Blobel G, Dobberstein B. Transfer of proteins across membranes. II. Reconstitution of functional rough microsomes from heterologous components. J Cell Biol. 1975; 67:852–62. 10.1083/jcb.67.3.852.
- 24. Owji H, Nezafat N, Negahdaripour M et al. A comprehensive review of signal peptides: structure, roles, and applications. Eur J Cell Biol. 2018; 97:422–41. 10.1016/j.ejcb.2018.06.003.
- 25. Janda CY, Li J, Oubridge C et al. Recognition of a signal peptide by the signal recognition particle. Nature. 2010; 465:507–10. 10.1038/nature08870.
- 26. Liaci AM, Forster F. Take me home, protein roads: structural insights into signal peptide interactions during ER translocation. Int J Mol Sci. 2021; 22:11871. 10.3390/ijms222111871.
- 27. Bhadra P, Helms V. Molecular modeling of signal peptide recognition by eukaryotic sec complexes. Int J Mol Sci. 2021; 22:10705. 10.3390/ijms221910705.
- 28. Karagyozov L, Grozdanov PN, Bohmer FD. The translation attenuating arginine-rich sequence in the extended signal peptide of the protein-tyrosine phosphatase PTPRJ/DEP1 is conserved in mammals. PLoS One. 2020; 15:e0240498. 10.1371/journal.pone.0240498.
- 29. Lumangtad LA, Bell TW. The signal peptide as a new target for drug design. Bioorg Med Chem Lett. 2020; 30:127115. 10.1016/j.bmcl.2020.127115.
- 30. Gemmer M, Forster F. A clearer picture of the ER translocon complex. J Cell Sci. 2020; 133:jcs231340. 10.1242/jcs.231340.
- 31. Liaci AM, Steigenberger B, Telles de Souza PC et al. Structure of the human signal peptidase complex reveals the determinants for signal peptide cleavage. Mol Cell. 2021; 81:3934–48. 10.1016/j.molcel.2021.07.031.
- 32. Mirdha L, Sengupta T, Chakraborty H. Lipid composition dependent binding of apolipoprotein E signal peptide: importance of membrane cholesterol in protein trafficking. Biophys Chem. 2022; 291:106907. 10.1016/j.bpc.2022.106907.
- 33. Teufel F, Almagro Armenteros JJ, Johansen AR et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol. 2022; 40:1023–5. 10.1038/s41587-021-01156-3.
- 34. Wu JM, Liu YC, Chang DT. SigUNet: signal peptide recognition based on semantic segmentation. BMC Bioinformatics. 2019; 20:677. 10.1186/s12859-019-3245-z.
- 35. Zhang WX, Pan X, Shen HB. Signal-3L 3.0: improving signal peptide prediction through combining attention deep learning with window-based scoring. J Chem Inf Model. 2020; 60:3679–86. 10.1021/acs.jcim.0c00401.
- 36. Jimenez HJ, Procopio RA, Thuma TBT et al. Signal peptide variants in inherited retinal diseases: a multi-institutional case series. Int J Mol Sci. 2022; 23:13361. 10.3390/ijms232113361.
- 37. Smets D, Smit J, Xu Y et al. Signal peptide-rheostat dynamics delay secretory preprotein folding. J Mol Biol. 2022; 434:167790. 10.1016/j.jmb.2022.167790.
- 38. Zhang Z, Wan X, Li X et al. Effects of a shift of the signal peptide cleavage site in signal peptide variant on the synthesis and secretion of SARS-CoV-2 spike protein. Molecules. 2022; 27:6688.
- 39. Duan Y, Xu Y, Song F et al. Differential adaptive RNA editing signals between insects and plants revealed by a new measurement termed haplotype diversity. Biol Direct. 2023; 18:47. 10.1186/s13062-023-00404-7.
- 40. Zheng C, Ma L, Song F et al. Comparative genomic analyses reveal evidence for adaptive A-to-I RNA editing in insect Adar gene. Epigenetics. 2024; 19:2333665. 10.1080/15592294.2024.2333665.
- 41. Liu J, Zhao T, Zheng C et al. An orthology-based methodology as a complementary approach to retrieve evolutionarily conserved A-to-I RNA editing sites. RNA Biol. 2024; 21:929–45. 10.1080/15476286.2024.2397757.
- 42. Ramaswami G, Li JB. RADAR: a rigorously annotated database of A-to-I RNA editing. Nucleic Acids Res. 2014; 42:D109–13. 10.1093/nar/gkt996.
- 43. Picardi E, D’Erchia AM, Lo Giudice C et al. REDIportal: a comprehensive database of A-to-I RNA editing events in humans. Nucleic Acids Res. 2017; 45:D750–7. 10.1093/nar/gkw767.
- 44. Mackay TF, Richards S, Stone EA et al. The Drosophila melanogaster genetic reference panel. Nature. 2012; 482:173–8. 10.1038/nature10811.
- 45. Ma L, Zheng C, Liu J et al. Learning from the codon table: convergent recoding provides novel understanding on the evolution of A-to-I RNA editing. J Mol Evol. 2024; 92:488–504. 10.1007/s00239-024-10190-z.
- 46. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009; 25:1754–60. 10.1093/bioinformatics/btp324.
- 47. Picardi E, Pesole G. REDItools: high-throughput RNA editing detection made easy. Bioinformatics. 2013; 29:1813–4. 10.1093/bioinformatics/btt287.
- 48. Gruber AR, Lorenz R, Bernhart SH et al. The Vienna RNA websuite. Nucleic Acids Res. 2008; 36:W70–4. 10.1093/nar/gkn188.
- 49. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841–2. 10.1093/bioinformatics/btq033.
- 50. Duan Y, Ma L, Liu J et al. The first A-to-I RNA editome of hemipteran species Coridius chinensis reveals overrepresented recoding and prevalent intron editing in early-diverging insects. Cell Mol Life Sci. 2024; 81:136. 10.1007/s00018-024-05175-6.
- 51. Hubbard C, Singleton D, Rauch M et al. The secretory carrier membrane protein family: structure and membrane topology. Mol Biol Cell. 2000; 11:2933–47. 10.1091/mbc.11.9.2933.
- 52. Maxwell M, Undheim EAB, Mobli M. Secreted cysteine-rich repeat proteins “SCREPs”: a novel multi-domain architecture. Front Pharmacol. 2018; 9:1333. 10.3389/fphar.2018.01333.
- 53. Suen S, Lu HH, Yeang CH. Evolution of domain architectures and catalytic functions of enzymes in metabolic systems. Genome Biol Evol. 2012; 4:976–93. 10.1093/gbe/evs072.
- 54. Yu L, Tanwar DK, Penha EDS et al. Grammar of protein domain architectures. Proc Natl Acad Sci USA. 2019; 116:3636–45. 10.1073/pnas.1814684116.
- 55. Soto LF, Li Z, Santoso CS et al. Compendium of human transcription factor effector domains. Mol Cell. 2022; 82:514–26. 10.1016/j.molcel.2021.11.007.
- 56. Shoshan Y, Liscovitch-Brauer N, Rosenthal JJC et al. Adaptive proteome diversification by nonsynonymous A-to-I RNA editing in coleoid cephalopods. Mol Biol Evol. 2021; 38:3775–88. 10.1093/molbev/msab154.
- 57. Liu J, Zheng C, Duan Y. New comparative genomic evidence supporting the proteomic diversification role of A-to-I RNA editing in insects. Mol Genet Genomics. 2024; 299:46. 10.1007/s00438-024-02141-6.
- 58. Peng X, Xu X, Wang Y et al. A-to-I RNA editing contributes to proteomic diversity in cancer. Cancer Cell. 2018; 33:817–28. 10.1016/j.ccell.2018.03.026.
- 59. Duan Y, Ma L, Song F et al. Autorecoding A-to-I RNA editing sites in the Adar gene underwent compensatory gains and losses in major insect clades. RNA. 2023; 29:1509–19. 10.1261/rna.079682.123.
- 60. Savva YA, Rieder LE, Reenan RA. The ADAR protein family. Genome Biol. 2012; 13:252. 10.1186/gb-2012-13-12-252.
- 61. Chen CX, Cho DS, Wang Q et al. A third member of the RNA-specific adenosine deaminase gene family, ADAR3, contains both single- and double-stranded RNA binding domains. RNA. 2000; 6:755–67. 10.1017/S1355838200000170.
- 62. Grice LF, Degnan BM. The origin of the ADAR gene family and animal RNA editing. BMC Evol Biol. 2015; 15:4. 10.1186/s12862-015-0279-3.
- 63. Palladino MJ, Keegan LP, O’Connell MA et al. dADAR, a Drosophila double-stranded RNA-specific adenosine deaminase is highly developmentally regulated and is itself a target for RNA editing. RNA. 2000; 6:1004–18. 10.1017/S1355838200000248.
- 64. Bazak L, Haviv A, Barak M et al. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes. Genome Res. 2014; 24:365–76. 10.1101/gr.164749.113.
- 65. Kim DD, Kim TT, Walsh T et al. Widespread RNA editing of embedded Alu elements in the human transcriptome. Genome Res. 2004; 14:1719–25. 10.1101/gr.2855504.
- 66. Licht K, Kapoor U, Amman F et al. A high resolution A-to-I editing map in the mouse identifies editing events controlled by pre-mRNA splicing. Genome Res. 2019; 29:1453–63. 10.1101/gr.242636.118.
- 67. Adetula AA, Fan X, Zhang Y et al. Landscape of tissue-specific RNA editome provides insight into co-regulated and altered gene expression in pigs (Sus scrofa). RNA Biol. 2021; 18:439–50. 10.1080/15476286.2021.1954380.
- 68. Zhang Y, Han D, Dong X et al. Genome-wide profiling of RNA editing sites in sheep. J Anim Sci Biotechnol. 2019; 10:31. 10.1186/s40104-019-0331-z.
- 69. Li Q, Wang Z, Lian J et al. Caste-specific RNA editomes in the leaf-cutting ant Acromyrmex echinatior. Nat Commun. 2014; 5:4943. 10.1038/ncomms5943.
- 70. Ma L, Duan Y, Wu Y et al. Comparative genomic analyses on assassin bug Rhynocoris fuscipes (Hemiptera: Reduviidae) reveal genetic bases governing the diet-shift. iScience. 2024; 27:110411. 10.1016/j.isci.2024.110411.
- 71. Porath HT, Hazan E, Shpigler H et al. RNA editing is abundant and correlates with task performance in a social bumblebee. Nat Commun. 2019; 10:1605. 10.1038/s41467-019-09543-w.
- 72. Zhao HQ, Zhang P, Gao H et al. Profiling the RNA editomes of wild-type C. elegans and ADAR mutants. Genome Res. 2015; 25:66–75. 10.1101/gr.176107.114.
- 73. Duan Y, Li H, Cai W. Adaptation of A-to-I RNA editing in bacteria, fungi, and animals. Front Microbiol. 2023; 14:1204080. 10.3389/fmicb.2023.1204080.
- 74. Yablonovitch AL, Deng P, Jacobson D et al. The evolution and adaptation of A-to-I RNA editing. PLoS Genet. 2017; 13:e1007064. 10.1371/journal.pgen.1007064.
- 75. Yu Y, Zhou H, Kong Y et al. The landscape of A-to-I RNA editome is shaped by both positive and purifying selection. PLoS Genet. 2016; 12:e1006191. 10.1371/journal.pgen.1006191.
- 76. Zhao T, Ma L, Xu S et al. Narrowing down the candidates of beneficial A-to-I RNA editing by comparing the recoding sites with uneditable counterparts. Nucleus. 2024; 15:2304503. 10.1080/19491034.2024.2304503.
- 77. Duan Y, Ma L, Zhao T et al. Conserved A-to-I RNA editing with non-conserved recoding expands the candidates of functional editing sites. Fly. 2024; 18:2367359. 10.1080/19336934.2024.2367359.
- 78. Porath HT, Knisbacher BA, Eisenberg E et al. Massive A-to-I RNA editing is common across the Metazoa and correlates with dsRNA abundance. Genome Biol. 2017; 18:185. 10.1186/s13059-017-1315-y.
- 79. Duan Y, Cai W, Li H. Chloroplast C-to-U RNA editing in vascular plants is adaptive due to its restorative effect: testing the restorative hypothesis. RNA. 2023; 29:141–52. 10.1261/rna.079450.122.
- 80. Chu D, Wei L. The chloroplast and mitochondrial C-to-U RNA editing in Arabidopsis thaliana shows signals of adaptation. Plant Direct. 2019; 3:e00169. 10.1002/pld3.169.
- 81. Gray MW. RNA editing in plant mitochondria: 20 years later. IUBMB Life. 2009; 61:1101–4. 10.1002/iub.272.
- 82. Lo Giudice C, Hernandez I, Ceci LR et al. RNA editing in plants: a comprehensive survey of bioinformatics tools and databases. Plant Physiol Biochem. 2019; 137:53–61. 10.1016/j.plaphy.2019.02.001.
- 83. Xie Q, Duan Y. An ultimate question for functional A-to-I mRNA editing: why not a genomic G? J Mol Evol. 2025; 93:185–92. 10.1007/s00239-025-10238-8.
- 84. Amaya Y, Nakai T, Miura S. Evolutionary well-conserved region in the signal peptide of parathyroid hormone-related protein is critical for its dual localization through the regulation of ER translocation. J Biochem. 2016; 159:393–406. 10.1093/jb/mvv111.
- 85. Ouyang HB, Wang YP, He MH et al. Mutations in the signal peptide of effector gene Pi04314 contribute to the adaptive evolution of the Phytophthora infestans. BMC Ecol Evol. 2025; 25:21. 10.1186/s12862-025-02360-4.
- 86. Naderi M, Ghaderi R, Khezri J et al. Crucial role of non-hydrophobic residues in H-region signal peptide on secretory production of L-asparaginase II in Escherichia coli. Biochem Biophys Res Commun. 2022; 636:105–11. 10.1016/j.bbrc.2022.10.029.
- 87. Cao Q, Hao Z, Li C et al. Molecular basis of inherited protein C deficiency results from genetic variations in the signal peptide and propeptide regions. J Thromb Haemost. 2023; 21:3124–37. 10.1016/j.jtha.2023.06.021.
- 88. Benito-Vicente A, Uribe KB, Larrea-Sebal A et al. Leu22_Leu23 duplication at the signal peptide of PCSK9 promotes intracellular degradation of LDLr and autosomal dominant hypercholesterolemia. Arterioscler Thromb Vasc Biol. 2022; 42:e203–16. 10.1161/ATVBAHA.122.315499.