a, Schematic of the CRISPR/Cas9 library design and targeted sequencing of de novo genes expressed in HEK293T cells. b, Mutations identified in U1/splice sites and their effects on the nuclear/cytoplasmic (N/C) distribution of the corresponding transcripts. The innermost and second layers of the circular heat map show the ratio of the read depth of the mutated allele to that of the reference allele in the cytoplasm (innermost layer) and in the nucleus (second layer). The ratio of the scores of these two layers is shown on the outermost layer, coloured blue or red according to the odds ratio of the N/C ratios between the mutants and the wild-type controls. According to the benchmark scale bar, blue indicates that the mutation introduces a decreased N/C ratio for the corresponding transcript (increased nuclear export activity), whereas red indicates that the mutation introduces an increased N/C ratio for the corresponding transcript (decreased nuclear export activity). The mutations are ranked according to the difference in the U1 score (right part of the circular heat map) or the PSI value (left part) between the mutant and reference alleles (red arrow, mutants with higher U1 scores or lower PSI than the reference alleles; blue arrow, mutants with lower U1 scores or higher PSI than the reference alleles). c,d, Proportions of reads in the nucleus (red) and cytoplasm (blue) for one mutation introducing a stronger splice site (c, two-sided Fisher’s exact test, P < 2.2 × 10⁻¹⁶) and another mutation introducing a lower U1 score (d, two-sided Fisher’s exact test, P < 2.2 × 10⁻¹⁶). e, Statistics of the segregating sites fixed after the divergence of humans and rhesus macaques during the process of de novo gene origin. The effects of these sites on RNA splicing activity and on U1 binding affinity were predicted, and the proportions of sites with different effects are shown.
*P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001.
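As an illustration of the quantities described in b to d, the following is a minimal sketch of how an odds ratio of N/C ratios and a two-sided Fisher's exact test could be computed from allele-specific read counts. The function names and the read counts are hypothetical and are not taken from the study, and the authors' actual pipeline may differ. The test is written out with the hypergeometric distribution so that the sketch needs only the Python standard library.

```python
from math import comb


def nc_odds_ratio(mut_nuc, mut_cyt, ref_nuc, ref_cyt):
    """Odds ratio of the N/C read ratio: mutant allele versus reference allele."""
    return (mut_nuc / mut_cyt) / (ref_nuc / ref_cyt)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are alleles (mutant, reference); columns are compartments
    (nucleus, cytoplasm). The P value is the sum of the probabilities of
    all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical read counts for one mutation (not from the source data).
mut_nuc, mut_cyt, ref_nuc, ref_cyt = 30, 70, 60, 40
odds = nc_odds_ratio(mut_nuc, mut_cyt, ref_nuc, ref_cyt)
p_value = fisher_exact_two_sided(mut_nuc, mut_cyt, ref_nuc, ref_cyt)
print(f"odds ratio of N/C ratios = {odds:.3f}, P = {p_value:.3g}")
```

Under the colour convention of b, an odds ratio below 1 corresponds to a decreased N/C ratio for the mutant allele (blue), and an odds ratio above 1 corresponds to an increased N/C ratio (red).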