Characterization of circRNA-derived pseudogenes and their potential role in reshaping genome architecture. (A) A schematic diagram of the generation of a pseudogene from a (linear) mRNA. (B) A schematic diagram of the generation of a circRNA-derived pseudogene. (C) Genome-wide identification of circRNA-derived pseudogenes by CIRCpseudo. A reference of 40 bp back-splicing junction sequences (20 bp on either side of the junction) from expressed circRNAs was constructed and then mapped to the genome to identify circRNA-derived pseudogenes (Supplementary information, Data S1). (D) Characterization of mouse circRFWD2-derived pseudogenes. A circRFWD2 containing exons 2, 3, 4, 5 and 6, joined by back-splicing at the exon 6-exon 2 junction, is produced from the mouse chr1:159 232 326-159 347 580 locus (top) and can be retrotransposed (middle, gray box) to generate pseudogenes at different genomic regions (bottom). *, MMERVK10C-int LTR retrotransposon sequences. (E) Counts of adenosines at the 3′ ends (poly(A) tails) of all RFWD2-originated pseudogenes, both linear and circular, in the UCSC RetroGene annotation. Thirty-nine of the 42 putative circRFWD2-derived pseudogenes annotated by UCSC have significantly fewer adenosines than the six linear RFWD2 mRNA-derived pseudogenes. *** P = 6.0 × 10⁻⁴, Wilcoxon rank-sum test. (F) LTRs are highly enriched in the flanking regions of the circRFWD2-derived pseudogenes. The frequency of nearest LTRs is significantly higher in the flanking regions of all 42 circRFWD2-derived pseudogenes (red solid line) than in those of mouse RefGenes (gray dashed line) or mouse RetroGenes (blue dashed line). (G) A CTCF-binding site resides in the mouse circSATB1-derived pseudogene region in the mouse ENCODE MEL and G1E cell lines. Consistent with this, the region is also a predicted enhancer with active H3K4me1 signals. Blue peaks, CTCF-binding signals; gray peaks, H3K4me1-binding signals; black bars over the binding signals, predicted CTCF/H3K4me1-binding regions.
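The junction-reference step in panel C can be illustrated with a minimal sketch. This is not the CIRCpseudo implementation: the function names, the toy exon sequences and the toy contig are all hypothetical, and exact string matching stands in for whatever aligner the authors used. The sketch assumes only what the legend states, namely that each reference entry is 40 bp long, with 20 bp taken from either side of the back-splicing junction.

```python
FLANK = 20  # bases taken on each side of the back-splicing junction (from the legend)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def back_splice_junction(last_exon, first_exon, flank=FLANK):
    """Build the junction sequence of a circRNA.

    Back-splicing joins the 3' end of the last circularized exon
    (e.g. exon 6) to the 5' end of the first one (e.g. exon 2), so the
    junction is the tail of `last_exon` followed by the head of `first_exon`.
    """
    if len(last_exon) < flank or len(first_exon) < flank:
        raise ValueError("each exon must be at least `flank` bases long")
    return last_exon[-flank:] + first_exon[:flank]


def find_junction_hits(junction, genome):
    """Return (contig, 0-based start, strand) for every exact junction match.

    `genome` is a dict mapping contig name to sequence. Exact matching is an
    assumption made for illustration; a real search would tolerate mismatches.
    """
    junction = junction.upper()
    queries = (("+", junction), ("-", reverse_complement(junction)))
    hits = []
    for contig, seq in genome.items():
        seq = seq.upper()
        for strand, query in queries:
            start = seq.find(query)
            while start != -1:
                hits.append((contig, start, strand))
                start = seq.find(query, start + 1)
    return hits


# Hypothetical toy data, not real RFWD2 sequence.
first_exon = "ATGGCCATTGTAATGGGCCGCTGAAAGGGT"
last_exon = "CCGATAGCTTAGGCTAACGTTAGCATCGGA"
junction = back_splice_junction(last_exon, first_exon)
toy_genome = {"chrToy": "TTTT" + junction + "GGGG"}
hits = find_junction_hits(junction, toy_genome)
```

Because the parental locus carries the exons in linear order, the exon 6-exon 2 order should not occur there, which is why a genomic match to the junction sequence points to a retrotransposed circRNA rather than to the host gene.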