ABSTRACT
Pseudogenes, nonfunctional genomic sequences derived from functional protein-coding genes, form by duplication or retrotransposition followed by loss of gene function through disabling mutations. Studies on the evolution and functional aspects of plant pseudogenes are limited, despite their abundance in plant genomes. To date, most research on pseudogenes has focused on mammals. Here, we summarize current knowledge on pseudogenes, including historical and recent progress, and analyze their essential roles in gene regulation, in the hope of stimulating further research in plant species toward understanding gene regulation and evolution.
KEYWORDS: Pseudogene, origin, evolution, function, noncoding RNAs
Introduction
Pseudogenes have long been defined as nonfunctional sequences of genomic DNA that are formed by duplication or retrotransposition, followed by loss of gene function through subsequent disabling mutations. Pseudogene assignments depend largely on reliable and stable gene annotations within the genome. Rapid genome sequencing technology therefore makes possible comprehensive comparisons of the pseudogene (Ψ) complement across organisms, down to specific families or classes of pseudogenes. Systematic studies have been performed in human (Homo sapiens),1,2 and later in worm (Caenorhabditis elegans) and fly (Drosophila melanogaster),3 and in plant species.4–7
Pseudogenes are by definition nonfunctional sequences and are therefore expected to evolve neutrally. However, studies in Arabidopsis and rice reveal that plant pseudogenes seem to experience much stronger purifying selection than animal pseudogenes.5 Further analyses reveal that the selection pressure along a pseudogene is not uniform, suggesting that these sequences were functional for a relatively long time.5
Although pseudogenes lack functional signals at the protein level, this does not preclude the possibility that some pseudogenes function as RNA genes.4,5 Our recent large-scale study of plant pseudogenes in seven angiosperm species revealed that the proximal regions of pseudogenes are strongly associated with regulatory noncoding RNAs (microRNAs and long noncoding RNAs),4 suggesting that pseudogenes may have regulatory roles in plants. The aim of this mini-review is to summarize current knowledge of pseudogenes in order to understand pseudogene origin, evolution and regulation in plant species.
Types and origin of pseudogenes
Pseudogenes are disabled copies of protein-coding genes that are abundant in genomes.8 A protein-coding gene becomes a pseudogene when it acquires degenerative features such as frameshifts, in-frame stop codons, or disruptive interspersed repeats in the original protein-coding sequence or in functional promoters.6,9,10 These disruptions may therefore occur in either coding regions or promoter regions.6,9 Pseudogenes are thought to arise by duplication of protein-coding genes11 or by duplication of existing pseudogenes, with gradual accumulation of disabling mutations.5 In 1977, the first “pseudogene” was discovered in Xenopus laevis: a truncated copy of a 5S rRNA gene with compromised function.12
Depending on the mechanism of the duplication event that created them, pseudogenes can be classified into three categories: processed, nonprocessed and unitary pseudogenes. Processed pseudogenes, also termed retro-pseudogenes, arise by reverse transcription of processed mRNAs, followed by integration into the genome.5,13 Thus, the rate at which new processed pseudogenes are generated can be affected by bursts of retrotransposition and by mRNA expression levels.1,2 Nonprocessed pseudogenes are derived from segmental duplication,14 whole genome duplication (WGD),4 or tandem duplication.15 An additional type, termed the unitary pseudogene (UPG), arises when a gene becomes disabled without necessarily having been duplicated beforehand.16 This type of pseudogene is similar to a nonprocessed pseudogene, but its paralogous counterpart is not found.17 The evolution of land plants is characterized by WGD.18 Immediately after gene duplication, either copy may become genetically redundant, because no selective pressure acts against a loss-of-function mutation in it,19 so that a substantial number of pseudogenes can be generated by this process.20
Pseudogene prediction
Pseudogene prediction is necessary for genome annotation. However, identifying pseudogenes experimentally is laborious and impractical at genome scale. Bioinformatics algorithms for pseudogene prediction have therefore been developed, such as PseudoPipe,10 PSF (Pseudogene Finder),21 Shiu’s pipeline,5 PPFinder (Processed Pseudogene Finder),22 and PlantPseudo.4 Because pseudogenes are highly similar to their parental genes, these tools predominantly use a homology-based strategy to initiate the search for pseudogenes.23 Three main steps are typically taken: (1) a search of intergenic regions (with genic and transposon regions masked) for sequence similarity to known proteins; (2) detection of destructive mutations, in which pseudogene candidates are aligned to their parental genes to examine disabling features such as premature stop codons or frameshifts; and (3) quality control of the pseudogene candidates, including match length and identity. To date, PseudoPipe, Shiu’s pipeline, PSF, and PlantPseudo are publicly available automatic pseudogene prediction pipelines with great potential for application to plant genomes. Among them, PlantPseudo can detect WGD-derived pseudogenes.4 However, UPGs cannot be predicted by any of the current bioinformatics tools, because they have no homologous genes.
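Steps (2) and (3) above can be illustrated with a minimal Python sketch. It is not the implementation used by PseudoPipe, PSF, Shiu’s pipeline, PPFinder or PlantPseudo; it assumes that a gapped pairwise nucleotide alignment between a parental coding sequence and a candidate is already available from step (1). The function names, the crude frame-0 stop-codon scan and the thresholds `min_length` and `min_identity` are all hypothetical choices made for illustration.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(candidate, frame=0):
    """Return 0-based codon indices of in-frame stop codons before the last codon."""
    seq = candidate.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # The final codon is skipped because a terminal stop codon is expected there.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def frameshift_gaps(aligned_parent, aligned_candidate):
    """Return (start, length) of alignment gap runs whose length is not a multiple of 3."""
    if len(aligned_parent) != len(aligned_candidate):
        raise ValueError("aligned sequences must have equal length")
    shifts = []
    for row in (aligned_parent, aligned_candidate):
        for match in re.finditer(r"-+", row):
            if len(match.group()) % 3 != 0:
                shifts.append((match.start(), len(match.group())))
    return sorted(shifts)


def passes_quality(match_length, identity, min_length=90, min_identity=0.4):
    """Quality control on match length and identity (thresholds are illustrative only)."""
    return match_length >= min_length and identity >= min_identity


def assess_candidate(aligned_parent, aligned_candidate, identity,
                     min_length=90, min_identity=0.4):
    """Summarize disabling features and quality for one aligned candidate."""
    ungapped = aligned_candidate.replace("-", "")
    # Simplification: stops are scanned in frame 0 of the ungapped candidate,
    # so stops created downstream of a frameshift are reported as well.
    stops = premature_stops(ungapped)
    shifts = frameshift_gaps(aligned_parent, aligned_candidate)
    return {
        "premature_stops": stops,
        "frameshifts": shifts,
        "disabled": bool(stops or shifts),
        "passes_quality": passes_quality(len(ungapped), identity,
                                         min_length, min_identity),
    }


if __name__ == "__main__":
    parent = "ATGGCCAAAGGGTGA"
    candidate = "ATGGCCTAAGG-TGA"  # one in-frame stop and a 1-bp deletion
    print(assess_candidate(parent, candidate, identity=0.87, min_length=10))
```

A candidate would be retained as a putative pseudogene only if it is both disabled and passes quality control; real pipelines apply considerably more elaborate alignment and filtering than this sketch.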
Evolution and function
Most pseudogenes are devoid of function and are thus expected to evolve neutrally.8 In contrast to animal pseudogenes,24 many plant pseudogenes have experienced much stronger purifying selection, and the regions at their 5′ and 3′ ends have experienced stronger selection pressure, suggesting that both regions remained functional for a longer period of time after the loss-of-function mutation appeared.4,5 The distribution of pseudogenes is strongly associated with the recombination rate of the genome, with higher preservation in regions with low recombination rates.3,4 Processed pseudogenes have apparently evolved neutrally because they lack active promoters.1 Studies have shown that 2%–5% of mammalian pseudogenes and 2%–32.5% of plant pseudogenes are expressed.4,5,25,26 Recent studies revealed that a substantial number of pseudogenes, mostly nonprocessed pseudogenes, are enriched in rare alleles,3 have conserved upstream sequences,3 and experience ongoing constraint, with active transcription factor (TF) and RNA polymerase II (Pol II) binding sites in their upstream proximal regions,4 hinting at potential regulatory roles. Indeed, many pseudogenes exhibit varying degrees of partial activity, such as being transcribed as highly active RNA genes,4,9,25,27 long noncoding RNAs (lncRNAs),28 small interfering RNAs,29-32 and microRNA decoys.33,34 Some can even be translated into short regulatory peptides that may increase oxidative stress tolerance in Arabidopsis.35 It is therefore becoming apparent that many pseudogenes (Ψs) can function as regulatory RNA genes or encode short regulatory peptides.
A number of in-depth case studies have reported important roles for pseudogenes in physiology and disease.36 For example, pseudogenes of the frequently mutated cancer genes PTEN, KRAS, and BRAF function as competing endogenous RNAs (ceRNAs) in vitro.36,37 The pseudogene PTENP1 can function as a ceRNA to suppress clear-cell renal cell carcinoma progression.38 Together, these findings suggest that pseudogenes can participate in the transcriptional and posttranscriptional regulation of protein-coding genes within biological networks, in the form of noncoding ceRNAs and regulatory peptides.
Pseudogenes in plants
Pseudogenes have been reported mostly in mammals.3,13,14,39 In contrast, studies in plants remain limited. In rice, a total of 1,439 pseudogenes have been identified.20 Zou et al. identified 28,330 and 4,771 pseudogenes in intergenic regions of rice and Arabidopsis, respectively.5 Some plant pseudogenes have experienced selective constraints as strong as those experienced by most functional genes.5 Pseudogenes have also been identified in Triticum species.40 Recently, we performed large-scale pseudogene prediction in seven plant species (Arabidopsis thaliana, Brachypodium distachyon, Glycine max [soybean], Medicago truncatula, Oryza sativa [rice], Populus trichocarpa, and Sorghum bicolor) and found that a surprisingly large fraction of non-TE regulatory noncoding RNAs (microRNAs and long noncoding RNAs) originate from transcription of the proximal upstream regions of pseudogenes.4 We therefore proposed that the rapid rewiring of Ψ transcriptional regulatory regions is an essential mechanism driving the origin of novel regulatory modules.4 Studies of Arabidopsis populations revealed that loss-of-function mutations occurring in coding regions play important roles in adaptation and phenotypic diversification.7 Experimental data showed that OSIP108 is contained within a pseudogene and is induced in A. thaliana leaves by both the reactive oxygen species inducer paraquat (PQ) and the necrotrophic fungal pathogen Botrytis cinerea.35 Overall, more comprehensive functional analyses of plant pseudogenes are much needed to accelerate research in this area.
Conclusions
We have summarized evidence that pseudogenes can function by participating in the regulation of gene expression and by generating genetic diversity (Figure 1). Plant genomes contain some pseudogenes that are under strong constraints because they are important components of the genome. Mechanisms of regulation include the formation of endogenous siRNAs, lncRNAs and miRNAs, and the provision of transcription factor (TF) and RNA polymerase II (Pol II) binding sites. Additional studies are nevertheless needed to confirm these regulatory roles in plant genomes.
Figure 1.
Illustration of pseudogene formation and function. (*) indicates deleterious mutations.
Funding Statement
This work was supported by the Fundamental Research Funds for the Central Universities [2018ZY32]; Young Elite Scientists Sponsorship Program by CAST [2018QNRC001]; the Project of the National Natural Science Foundation of China [31670333]; the Project of the National Natural Science Foundation of China [31600537]; Guangdong Provincial Key Laboratory of Silviculture, Protection and Utilization [SPU 2018-01].
Acknowledgments
This work was supported by the Fundamental Research Funds for the Central Universities (2018ZY32), Young Elite Scientists Sponsorship Program by CAST (2018QNRC001), Guangdong Provincial Key Laboratory of Silviculture, Protection and Utilization (No. SPU 2018-01), the Project of the National Natural Science Foundation of China (No. 31600537 and 31670333).
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
References
- 1. Zhang Z, Harrison PM, Liu Y, Gerstein M. Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res 2003; 13:2541–2558. doi: 10.1101/gr.1429003
- 2. Zhang Z, Carriero N, Gerstein M. Comparative analysis of processed pseudogenes in the mouse and human genomes. Trends Genet 2004; 20:62–67. doi: 10.1016/j.tig.2003.12.005
- 3. Sisu C, Pei B, Leng J, Frankish A, Zhang Y, Balasubramanian S, Harte R, Wang D, Rutenberg-Schoenberg M, Clark W, Diekhans M, Rozowsky J, Hubbard T, Harrow J, Gerstein MB. Comparative analysis of pseudogenes across three phyla. P Natl Acad Sci USA 2014; 111:13361–13366. doi: 10.1073/pnas.1407293111
- 4. Xie J, Li Y, Liu X, Zhao Y, Li B, Ingvarsson PK, Zhang D. Evolutionary origins of pseudogenes and their association with regulatory sequences in plants. Plant Cell 2019; 31:563–578. doi: 10.1105/tpc.18.00601
- 5. Zou C, Lehti-Shiu MD, Thibaud-Nissen F, Prakash T, Buell CR, Shiu SH. Evolutionary and Expression Signatures of Pseudogenes in Arabidopsis and Rice. Plant Physiol 2009; 151:3–15. doi: 10.1104/pp.109.140632
- 6. Yang L, Takuno S, Waters ER, Gaut BS. Lowly expressed genes in Arabidopsis thaliana bear the signature of possible pseudogenization by promoter degradation. Mol Biol Evol 2011; 28:1193. doi: 10.1093/molbev/msq298
- 7. Xu YC, Niu XM, Li XX, He W, Chen JF, Zou YP, Wu Q, Zhang YE, Busch W, Guo YL. Adaptation and phenotypic diversification through loss-of-function mutations in Arabidopsis protein-coding genes. Plant Cell 2019; 31:1012–1025. doi: 10.1105/tpc.18.00791
- 8. Li WH, Gojobori T, Nei M. Pseudogenes as a paradigm of neutral evolution. Nature 1981; 292:237–239.
- 9. Balakirev ES, Ayala FJ. Pseudogenes: are they “junk” or functional DNA? Annu Rev Genet 2003; 37:123–151. doi: 10.1146/annurev.genet.37.040103.103949
- 10. Zhang Z, Carriero N, Zheng D, Karro J, Harrison PM, Gerstein M. PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics 2006; 22:1437–1439. doi: 10.1093/bioinformatics/btl116
- 11. Balakirev ES, Ayala FJ. Pseudogenes: are they “junk” or functional DNA? Annu Rev Genet 2003; 37:123. doi: 10.1146/annurev.genet.37.040103.103949
- 12. Jacq C, Miller JR, Brownlee GG. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 1977; 12:109–120.
- 13. Balasubramanian S, Zheng DY, Liu YJ, Fang G, Frankish A, Carriero N, Robilotto R, Cayting P, Gerstein M. Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes. Genome Biol 2009; 10:R2. doi: 10.1186/gb-2009-10-1-r2
- 14. Khurana E, Lam HY, Cheng C, Carriero N, Cayting P, Gerstein MB. Segmental duplications in the human genome reveal details of pseudogene formation. Nucleic Acids Res 2010; 38:6997–7007. doi: 10.1093/nar/gkq587
- 15. Allen CL, Kelly JM. Trypanosoma cruzi: mucin pseudogenes organized in a tandem array. Exp Parasitol 2001; 97:173–177. doi: 10.1006/expr.2001.4600
- 16. Zhang ZD, Frankish A, Hunt T, Harrow J, Gerstein M. Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates. Genome Biol 2010; 11:R26. doi: 10.1186/gb-2010-11-11-r110
- 17. Xiao J, Sekhwal MK, Li P, Ragupathy R, Cloutier S, Wang X, You F. Pseudogenes and Their Genome-Wide Prediction in Plants. International Journal of Molecular Sciences 2016; 17:1991. doi: 10.3390/ijms17121991
- 18. Mühlhausen S, Kollmar M. Whole genome duplication events in plant evolution reconstructed and predicted using myosin motor proteins. BMC Evolutionary Biology 2013; 13:202. doi: 10.1186/1471-2148-13-202
- 19. Ho-Huu J. Contrasted patterns of selective pressure in three recent paralogous gene pairs in the Medicago genus (L.). BMC Evol Biol 2012; 12:195. doi: 10.1186/1471-2148-12-195
- 20. Thibaud-Nissen F, Ouyang S, Buell CR. Identification and characterization of pseudogenes in the rice gene complement. Bmc Genomics 2009; 10:317. doi: 10.1186/1471-2164-10-317
- 21. Solovyev V, Kosarev P, Seledsov I, Vorobyev D. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol 2006; 7 Suppl 1:S10.1. doi: 10.1186/gb-2006-7-8-R72
- 22. van Baren MJ, Brent MR. Iterative gene prediction and pseudogene removal improves genome annotation. Genome Res 2006; 16:678–685. doi: 10.1101/gr.4766206
- 23. Rouchka EC, Cha IElizabeth. Current Trends in Pseudogene Detection and Characterization. Current Bioinformatics 2009; 4. doi: 10.2174/157489309788184792
- 24. Torrents D, Suyama M, Zdobnov E, Bork P. A genome-wide survey of human pseudogenes. Genome Res 2003; 13:2559–2567. doi: 10.1101/gr.1455503
- 25. Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, Ruan Y, Wei C-L, Gingeras TR, Guigó R, Harrow J, Gerstein MB. Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res 2007; 17:839–851. doi: 10.1101/gr.5586307
- 26. Harrison PM, Deyou Z, Zhaolei Z, Nicholas C, Mark G. Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res 2005; 33:2374–2383. doi: 10.1093/nar/gki531
- 27. Laura P, Leonardo S, Jiangwen Z, Brett C, Haveman WJ, Pier Paolo P. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 2015; 465:1033–1038.
- 28. Milligan MJ, Lipovich L. Pseudogene-derived lncRNAs: emerging regulators of gene expression. Front Genet 2014; 5:476.
- 29. Tam OH, Aravin AA, Stein P, Girard A, Murchison EP, Cheloufi S, Hodges E, Anger M, Sachidanandam R, Schultz RM, Hannon GJ. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 2008; 453:534–538. doi: 10.1038/nature06904
- 30. Ma HW, Xie M, Sun M, Chen TY, Jin RR, Ma TS, Chen QN, Zhang EB, He XZ, et al. The pseudogene derived long noncoding RNA DUXAP8 promotes gastric cancer cell proliferation and migration via epigenetically silencing PLEKHO1 expression. Oncotarget 2017; 8:52211–52224. doi: 10.18632/oncotarget.11075
- 31. Guo X, Zhang Z, Gerstein MB, Zheng D. Small RNAs originated from pseudogenes: cis- or trans-acting? Plos Comput Biol 2009; 5:e1000449. doi: 10.1371/journal.pcbi.1000509
- 32. Wen YZ, Zheng LL, Liao JY, Wang MH, Wei Y, Guo XM, Qu L-H, Ayala FJ, Lun Z-R. Pseudogene-derived small interference RNAs regulate gene expression in African Trypanosoma brucei. P Natl Acad Sci USA 2011; 108:8345–8350. doi: 10.1073/pnas.1103894108
- 33. Swami M. Small RNAs: pseudogenes act as microRNA decoys. Nature Reviews Cancer 2010; 10:535. doi: 10.1038/nrc2898
- 34. Devor EJ. Primate microRNAs miR-220 and miR-492 lie within processed pseudogenes. J Hered 2006; 97:186. doi: 10.1093/jhered/esj037
- 35. Coninck BD, Carron D, Tavormina P, Willem L, Craik DJ, Vos C, Thevissen K, Mathys J, Cammue BPA. Mining the genome of Arabidopsis thaliana as a basis for the identification of novel bioactive peptides involved in oxidative stress tolerance. J Exp Bot 2013; 64:5297–5307. doi: 10.1093/jxb/ert295
- 36. Karreth FA, Reschke M, Ruocco A, Ng C, Chapuy B, Leopold V, Sjoberg M, Keane TM, Verma A, Ala U, Tay Y, Wu D, Seitzer N, Velasco-Herrera Martin Del Castillo, Bothmer A, Fung J, Langellotto F, Rodig SJ, Elemento O, Shipp MA, Adams DJ, Chiarle R, Pandolfi PP. The BRAF Pseudogene Functions as a Competitive Endogenous RNA and Induces Lymphoma In Vivo. Cell 2015; 161:319–332. doi: 10.1016/j.cell.2015.02.043
- 37. Laura P. Pseudogenes: newly discovered players in human cancer. Science Signaling 2012; 5:re5. doi: 10.1126/scisignal.2003289
- 38. Yu G, Yao WM, Gumireddy K, Li AP, Wang J, Xiao W, Chen K, Xiao H, Li H, Tang K, Ye Z, Huang Q, Xu H. Pseudogene PTENP1 Functions as a Competing Endogenous RNA to Suppress Clear-Cell Renal Cell Carcinoma Progression. Mol Cancer Ther 2014; 13:3086–3097. doi: 10.1158/1535-7163.MCT-14-0245
- 39. Lai P, Bahl G, Gremigni M, Matarazzo V, Clotfaybesse O, Ronin C, et al. An Olfactory Receptor Pseudogene whose Function emerged in Humans. 2007. doi: 10.1094/PDIS-91-4-0467B
- 40. Thomas W, Mayer KFX, Heidrun G, Mihaela M, Burkhard S, Uwe S, et al. Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their relatives. Plant Cell 2011; 23:1706–1718. doi: 10.1105/tpc.111.086629