Chr | Model start | Model end | Strand | Number of PhyloCSF regions | PhyloCSF region Ids | Top rank | GENCODE v22 annotation | Human GENCODE v28 gene ID | Human contemporary RefSeq model | Transcribed? | Mouse GENCODE M18 gene ID | Mouse GENCODE M18 gene biotype | Current MGI symbol | Mouse contemporary RefSeq model | Point of pseudogenisation | Non-murine representative coding model | Disablement event | Notes
1 | 9951231 | 9960832 | -1 | 4 | chr1:9951232-9951291-, chr1:9952339-9952470-, chr1:9955834-9955902-, chr1:9960725-9960787- | 384 | absent | ENSG00000283611 | / | No | ENSMUSG00000087198 | protein-coding | Gm13097 | ncRNA XR_001784505.1 | human | "coelacanth PLAR LongAUGORFlinc|3P|XLOC_005457|TCONS_00005606:1.86419|166AA|136AA|" | exon loss and PTC | Uncharacterised protein. While syntenic conservation is currently defined by the human / coelacanth clade, potential homologies for this CDS extend beyond vertebrates (see legend to Figure 3b).
1 | 21950735 | 21951578 | -1 | 4 | chr1:21950738-21950881-, chr1:21951083-21951142-, chr1:21951161-21951205-, chr1:21951297-21951365- | 280 | absent | ENSG00000283234 | / | Yes | ENSMUSG00000041399 | protein-coding | 1700013G24Rik | protein coding NM_027063.2 | primates | / | PTC and frameshift | Uncharacterised protein.
1 | 248692152 | 248698932 | 1 | 2 | chr1:248692176-248692208+, chr1:248697030-248697161+ | 5148 | two lincRNA | ENSG00000229703 | ncRNA NR_125950.1 | Yes | ENSMUSG00000052642 | protein-coding | 4930504O13Rik | protein coding NM_207527.3 | apes | / | loss of ATG | Protein containing a u-PAR / Ly-6 domain. The human locus is transcribed and could theoretically encode a 5'-truncated CDS.
1 | 230138601 | 230138783 | 1 | 1 | chr2:230138601-230138765+ | 569 | absent | ENSG00000283451 | / | No | ENSMUSG00000062783 | protein-coding | Csprs | protein coding NM_033616.3 | primates | / | multiple exon loss | Characterised in mouse as component of Sp100-rs (Csprs), a putative G-protein coupled receptor.
2 | 112440396 | 112440601 | -1 | 2 | chr2:112440419-112440472-, chr2:112440491-112440583- | 803 | absent | ENSG00000283698 | / | Yes | ENSMUSG00000079051 | protein-coding | Gm14025 | protein coding XM_006500514.2 | primates | / | loss of ATG and multiple exons | Vinculin superfamily domain-containing protein.
3 | 141660536 | 141729405 | 1 | 5 | chr3:141713327-141713374+, chr3:141721005-141721052+, chr3:141721068-141721094+, chr3:141729293-141729376+, chr3:141738332-141738361+ | 281 | lincRNA | ENSG00000242104 | ncRNA NR_136190.1 | Yes | ENSMUSG00000099564 | protein-coding | Gm28729 | protein coding XM_006511608.3 | human | / | PTC | The mouse gene is unofficially known as NACHT, LRR and PYD domains-containing protein 5.
3 | 46719749 | 46736388 | -1 | 2 | chr3:46719752-46719787-, chr3:46719827-46719997- | 462 | protein coding | ENSG00000261603 | pseudo NR_147121.1 | Yes | ENSMUSG00000049719 | protein-coding | Prss46 | protein coding NM_183103.2 | apes | / | exon loss linked to frameshift | Protease, serine 46 was re-annotated as a pseudogene in human after PhyloCSF indicated that the locus is significantly truncated.
3 | 58180009 | 58183978 | 1 | 2 | chr3:58180356-58180451+, chr3:58180482-58180517+ | 64 | absent | ENSG00000283511 | protein coding XM_017007608.1 | Yes | ENSMUSG00000109649 | protein-coding | Gm45521 | protein coding XM_017316351.1 | primates | Xenopus LOC105947030 | PTC | Protein containing a FAM183 domain.
3 | 167903304 | 167920676 | 1 | 4 | chr3:167912863-167912901+, chr3:167915073-167915105+, chr3:167915124-167915201+, chr3:167917330-167917494+ | 287 | two lincRNA | ENSG00000244227 | pseudo NR_033843.2 | Yes | ENSMUSG00000100962 | protein-coding | / | / | apes | eagle LOC104324177 | indel linked to frameshift and PTC | Protein containing leucine rich repeat 7 domains.
3 | 100185687 | 100211766 | 1 | 4 | chr3:100208048-100208074+, chr3:100208090-100208179+, chr3:100208198-100208245+, chr3:100211575-100211682+ | 486 | protein coding (TMEM30C) | ENSG00000235156 | pseudo NR_028357.1 | Yes | ENSMUSG00000022753 | protein-coding | Tmem30c | protein coding NM_027651.1 | human / chimp | / | two splice site mutations | Transmembrane protein 30C was converted to a transcribed unitary pseudogene in human because the loss of splice sites is coupled with a lack of expression data for the final 4 exons. However, as speculated in PMID:17258408, the production of a shorter protein isoform is theoretically possible.
3 | 195701618 | 195711643 | 1 | 2 | chr3:195708133-195708165+, chr3:195711593-195711628+ | 5786 | lincRNA | ENSG00000283426 | ncRNA NR_122105.1 | Yes | ENSMUSG00000094430 | protein-coding | Gm933 | protein coding NM_001256309.1 | primates | / | multiple PTCs | Certain other mammalian RefSeq models have been unofficially named novel somatomedin-B and thrombospondin type-1 domain-containing protein-like.
3 | 112321140 | 112321443 | 1 | 1 | chr3:112321220-112321279+ | 1395 | absent | ENSG00000283669 | / | No | ENSMUSG00000053182 | protein-coding | Gm609 | protein coding NM_001005854.2 | primates | / | multiple exon loss | Immunoglobulin V-set domain-containing protein.
3 | 44421316 | 44421811 | -1 | 2 | chr3:44421325-44421354-, chr3:44421725-44421772- | 2995 | lincRNA | ENSG00000225873 | ncRNA XR_171376.1 (LINC00694) | Yes | ENSMUSG00000107504 | protein-coding | Gm35549 | protein coding XM_017313737.1 | apes | / | indel linked to frameshift and PTC | The coding potential of this locus was initially suggested by weak mass spectrometry evidence, and it briefly existed as a protein-coding gene in GENCODE v24. Subsequent PhyloCSF analysis made it apparent that the human CDS is truncated by 70aa, which is nearly 50% of the protein. The locus was converted to a pseudogene on this basis, while high-confidence peptide evidence has yet to be found. Nonetheless, this locus remains an edge case, and RefSeq has recently annotated the truncated CDS as NM_001351479.1.
3 | 120357754 | 120367874 | 1 | 7 | chr3:120366477-120366563+, chr3:120366639-120366722+, chr3:120366732-120366758+, chr3:120366780-120366827+, chr3:120367884-120367967+, chr3:120367974-120368024+, chr3:120368031-120368084+ | 11 | antisense | ENSG00000240661 | ncRNA XR_001740865.1 | Yes | ENSMUSG00000108763 | protein-coding | Gm36028 | protein coding XM_006522825.2 | primates | / | loss of ATG and multiple splice sites, multiple PTCs | V-set domain-containing protein.
3 | 48997221 | 48999442 | -1 | / | found by re-ranking | / | absent | ENSG00000285416 | / | No | ENSMUSG00000115688 | unitary pseudogene | / | / | mammals | chicken LOC395991 | loss of 5' sequence, splice site mutation | Ortholog of chicken avian secreted frizzled-related protein 5.
4 | 82285053 | 82340013 | 1 | 4 | chr4:82285065-82285130+; chr4:82285263-82285295+; chr4:82290504-82290551+; chr4:82295689-82295757+ | 819 | absent | ENSG00000284516 | ncRNA XR_938934.2 | Yes | ENSMUSG00000105078 | protein-coding | Gm35911 | protein coding XM_017321222.1 | primates | / | multiple PTCs | Vesicle-associated membrane protein family member. While homologies are found beyond vertebrates, syntenic conservation is apparently limited to mammals.
5 | 140569962 | 140594570 | -1 | 5 | chr5:140590883-140590909-, chr5:140591081-140591131-, chr5:140591663-140591701-, chr5:140592018-140592047-, chr5:140594242-140594367- | 66 | 5' UTR of APBB3 | ENSG00000283602 | / | Yes | ENSMUSG00000044719 | protein-coding | E230025N22Rik | protein coding NM_172831.2 | primates | / | loss of ATG, multiple PTCs, splice site mutation | Uncharacterised protein. The human model was identified within the 5' UTR of APBB3.
5 | 151173981 | 151180617 | 1 | / | found by re-ranking | / | absent | ENSG00000285420 | / | No | / | absent | / | / | primate / rodent | Myotis brandtii LOC102242364 | exon loss, splice site mutation | Mammalian RefSeq models are typically named unofficially as novel platelet-derived growth factor receptor-like protein. Conservation potentially extends across vertebrates (e.g. fugu ENSTRUT00000033722), although without synteny beyond mammalian genomes.
6 | 30056965 | 30058612 | -1 | 5 | chr6:30056989-30057018-, chr6:30057112-30057156-, chr6:30057241-30057297-, chr6:30057580-30057663-, chr6:30058118-30058171- | 93 | antisense | ENSG00000204623 | pseudo NR_026751.2 | Yes | ENSMUSG00000036214 | protein-coding | Znrd1as | protein coding NM_029602.1 | apes | / | multiple frameshifts linked to indels and PTCs | Uncharacterised protein, named zinc ribbon domain containing 1, antisense in mouse.
7 | 37796058 | 37796888 | 1 | 6 | chr7:37796070-37796132+, chr7:37796163-37796255+, chr7:37796281-37796406+, chr7:37796513-37796542+, chr7:37796595-37796783+, chr7:37796802-37796888+ | 35 | absent | ENSG00000283444 | / | No | ENSMUSG00000047462 | protein-coding | A530099J19Rik | protein coding NM_175688.4 | primates | / | multiple frameshifts linked to indels | G-protein coupled receptor family member.
7 | 2383698 | 2388652 | 1 | 2 | chr7:2384592-2384675+, chr7:2388551-2388658+ | 304 | lincRNA | ENSG00000230914 | / | Yes | ENSMUSG00000106350 | protein-coding | Gm4869 | / | primates | / | loss of multiple exons, non-canonical splicing, PTC | Similar to kinesin-like protein KIF19.
7 | 81489472 | 81608710 | -1 | 10 | chr7:81560120-81560206-, chr7:81560743-81560862-, chr7:81570270-81570329-, chr7:81570403-81570444-, chr7:81574077-81574166-, chr7:81581744-81581842-, chr7:81586891-81586944-, chr7:81590981-81591064-, chr7:81597741-81597815-, chr7:81607525-81607572- | 904 | two lincRNA | ENSG00000233491 | ncRNA NR_126025.1/ | Yes | ENSMUSG00000109903 | protein-coding | / | / | apes | / | multiple indels linked to frameshifts and PTCs | Cadherin-related family member 3-like protein. The mouse gene has 21 exons, though its structure was previously unclear because it is not supported by Sanger-sequenced transcript evidence. PacBio data allowed the correct structure to be resolved.
7 | 73735407 | 73736073 | 1 | 4 | chr7:73735456-73735500+, chr7:73735729-73735848+, chr7:73735924-73735983+, chr7:73743038-73743100+ | 140 | antisense | ENSG00000225969 | ncRNA NR_026690.1 | Yes | ENSMUSG00000085042 | transcribed unitary pseudogene | Abhd11os | ncRNA NR_026688.1 | primate / rodent | pig: XP_013851001.1 (bicaudal D-related protein homolog isoform X2) | loss of first exon | Certain RefSeq models in other mammals have been named bicaudal D-related protein.
8 | 8719536 | 8714055 | -1 | 5 | chr8:8714226-8714285-, chr8:8714879-8714917-, chr8:8715416-8715499-, chr8:8715532-8715576-, chr8:8719469-8719546- | 167 | absent | ENSG00000284717 | / | No | ENSMUSG00000109372 | protein-coding | Gm19410 | protein coding XM_006509234.3 | primates | / | multiple exon loss | Lipoprotein receptor-like protein.
8 | 144049079 | 144051203 | 1 | 9 | chr8:144049161-144049202+, chr8:144049291-144049581+, chr8:144049636-144049896+, chr8:144050020-144050055+, chr8:144050302-144050370+, chr8:144050386-144050448+, chr8:144050794-144050856+, chr8:144051015-144051050+, chr8:144051114-144051191+ | 902 | antisense | ENSG00000204791 | protein coding XM_017014128.1 | Yes | ENSMUSG00000071724 | protein-coding | Smpd5 | protein coding NM_001195537.1 | primates | / | multiple frameshifts linked to indels and PTCs | Long established as sphingomyelin phosphodiesterase 5 in mouse. The frameshift and PTC in ape genomes lie shortly downstream of the ancestral ATG. Translation from the next in-frame ATG would give a very small CDS, as that ATG is found in exon 5. The RefSeq model uses an upstream ATG that is poorly conserved and has little evidence for transcription.
9 | 120847294 | 120851339 | 1 | 1 | chr9:120847295-120847354+ | 1125 | antisense | ENSG00000226752 | ncRNA NR_024408.1 | Yes | ENSMUSG00000026870 | protein-coding | Cutal | protein coding NM_001308024.1 | primates | / | multiple exon loss | Approximately half of the cutA divalent cation tolerance homolog-like protein-coding gene has been lost in primates.
10 | 100105754 | 100140151 | -1 | 8 | chr10:100105748-100105804-, chr10:100108360-100108389-, chr10:100108448-100108504-, chr10:100117267-100117368-, chr10:100117406-100117438-, chr10:100121044-100121079-, chr10:100122506-100122571-, chr10:100122574-100122651- | 105 | absent | ENSG00000283232 | / | No | ENSMUSG00000025197 | protein-coding | Cyp2c44 | protein coding NM_001001446.3 | primates | / | multiple exon loss linked to frameshifts and PTCs | Cytochrome P450, family 2, subfamily c, polypeptide 44 is highly degraded in primates.
10 | 26664336 | 26697295 | -1 | 2 | chr10:26664276-26664392-, chr10:26667561-26667644- | 1751 | lincRNA | ENSG00000227932 | ncRNA XR_930765.1 | Yes | / | absent | / | / | primate / rodent | "PLAR model from ferret: EnsCodingFull|3P|XLOC_025805|TCONS_00068935:4.5679 (LOC101691422 in RefSeq)" | multiple exon loss linked to frameshifts and PTCs | Certain protein-coding RefSeq models from other mammals are known unofficially as selenoprotein O-like.
10 | 26379034 | 26387374 | -1 | / | found by re-ranking | / | absent | ENSG00000284951 | / | No | / | absent | / | / | placental mammals | opossum LOC103093635 | exon loss, multiple PTCs, splice site mutation | Protein containing a TNFR / NGFR cysteine-rich region. Conservation without synteny apparently extends across vertebrates, e.g. Larimichthys crocea LOC104936695.
11 | 111448450 | 111473381 | 1 | 37 | chr11:111447053-111447100+, chr11:111448458-111448505+, chr11:111448581-111448643+, chr11:111449336-111449365+, chr11:111449437-111449472+, chr11:111449753-111449803+, chr11:111450349-111450399+, chr11:111450415-111450444+, chr11:111451043-111451141+, chr11:111451248-111451313+, chr11:111451338-111451409+, chr11:111452409-111452501+, chr11:111453424-111453489+, chr11:111454244-111454366+, chr11:111454984-111455016+, chr11:111455382-111455477+, chr11:111455806-111455871+, chr11:111456553-111456738+, chr11:111457066-111457335+, chr11:111457329-111457391+, chr11:111457847-111457927+, chr11:111457988-111458044+, chr11:111458117-111458194+, chr11:111459630-111459725+, chr11:111459756-111459782+, chr11:111460211-111460348+, chr11:111461911-111461937+, chr11:111462322-111462393+, chr11:111462672-111462710+, chr11:111463102-111463155+, chr11:111463318-111463368+, chr11:111463396-111463428+, chr11:111466815-111466889+, chr11:111469225-111469287+, chr11:111469784-111469840+, chr11:111469916-111469942+, chr11:111473319-111473375+ | 394 | two lincRNA | ENSG00000255093 | ncRNA XR_001748381.1/ncRNA XR_001748382.1 | Yes | ENSMUSG00000110266 | protein-coding | Gm32742 | protein coding XM_017313740.1 | primates | / | multiple frameshifts linked to indels and PTCs, multiple splice site mutations | Uncharacterised protein.
12 | 22116629 | 22134851 | -1 | 6 | chr12:22116633-22116662-, chr12:22116662-22116733-, chr12:22126639-22126758-, chr12:22130471-22130527-, chr12:22134804-22134851-, chr12:22134869-22134940- | 65 | absent | ENSG00000283582 | ncRNA XR_001749042.1 | No | ENSMUSG00000048473 | protein-coding | Sult6b2 | protein coding NM_001145390.1 | apes | / | missing exon linked to frameshift | Sulfotransferase family 6B, member 2.
12 | 95311312 | 95416098 | -1 | 4 | chr12:95405372-95405443-, chr12:95405471-95405503-, chr12:95407918-95407956-, chr12:95408086-95408124- | 1638 | three lincRNA | ENSG00000257943 | ncRNA XR_001749266.1/ncRNA XR_001749265.1 | Yes | ENSMUSG00000114757 | unitary pseudogene | / | / | primate / rodent | Xenopus XP_008109055 | exon loss, multiple splice site mutations, multiple PTCs and frameshifts | Protein containing a FAM194 domain.
12 | 2761200 | 2775658 | -1 | 3 | chr12:2767965-2768021-, chr12:2770415-2770456-, chr12:2775566-2775661- | 2025 | lincRNA | ENSG00000255669 | ncRNA NR_033958.1 | Yes | / | absent | / | / | primate / rodent | "dog PLAR model: EnsCodingFull|3P|XLOC_033065|TCONS_00081178|n/a:8.02699 (XP_005637371.1)" | multiple PTCs and splice site mutations | Maestro heat-like repeat-containing protein family member.
12 | 121053732 | 121059670 | -1 | 7 | chr12:121053768-121053842-, chr12:121053873-121053899-, chr12:121053950-121054006-, chr12:121054019-121054045-, chr12:121059440-121059478-, chr12:121059542-121059586-, chr12:121059614-121059655- | 1742 | absent | ENSG00000283542 | / | No | ENSMUSG00000029561 | protein-coding | Oasl2 | protein coding NM_011854.2 | primates | / | multiple exon loss linked to frameshifts, multiple PTCs | 2'-5' oligoadenylate synthetase-like 2.
13 | 24611966 | 24613983 | 1 | 2 | chr13:24611978-24612046+, chr13:24613945-24613986+ | 1873 | absent | ENSG00000285228 | ncRNA XR_001749794.1 | No | / | absent | / | / | primate / rodent | cat: XP_019681345 | loss of most exons | Just two exons remain of a large protein-coding gene found in non-primate / non-rodent mammals. The cat model is unofficially known as zonadhesin and has a 3,514aa CDS.
14 | 62117346 | 62128397 | 1 | 3 | chr14:62127829-62127903+, chr14:62127863-62127889+, chr14:62128023-62128391+ | 275 | lincRNA | ENSG00000186369 | ncRNA NR_015358.2 | Yes | ENSMUSG00000071265 | unitary pseudogene | 1700086L19Rik | ncRNA NR_030733.1 | primate / rodent | "dog PLAR model: LongAUGORFlinc|3P|XLOC_055903|TCONS_00135715:5.66804|194AA|103AA||Pfam_1_doms|RNACode|HSS_1991_9.6E-10" | indels linked to frameshifts and PTCs | Uncharacterised protein.
14 | 34954859 | 34982526 | -1 | 1 | chr14:34982422-34982517- | 353 | antisense | ENSG00000258704 | ncRNA XR_943744.2 | Yes | ENSMUSG00000062198 | protein-coding | 2700097O09Rik | protein coding NM_028314.2 | base of primates (CDS intact in bushbaby but not in New or Old World monkey genomes) | / | multiple exon loss, ATG loss, splice site mutation | Protein containing an AdoMet_Mtase superfamily domain.
15 | 82711246 | 82711877 | -1 | 4 | chr15:82711282-82711380-, chr15:82711677-82711727-, chr15:82711740-82711784-, chr15:82711806-82711838- | 594 | lincRNA | ENSG00000228141 | pseudo NR_034139.1 | Yes | ENSMUSG00000061877 | protein-coding | BC048679 | protein coding NM_001193274.1 | primates | / | exon loss linked to frameshift, multiple PTCs | Protein containing an NLPC_P60 superfamily domain.
15 | 75226401 | 75234014 | -1 | 2 | chr15:75229590-75229646-; chr15:75228852-75228920- | 22059 | antisense | ENSG00000260660 | protein coding NM_001321990.1 | Yes | ENSMUSG00000070298 | protein-coding | Trcg1 | protein coding NM_001014398.2 | primates | / | multiple indels linked to frameshifts and PTCs | Taste receptor cell gene 1. The RefSeq CDS starts downstream of the reading frame disruption, and it is highly 5' truncated.
17 | 68691809 | 68716064 | -1 | 1 | chr17:68693973-68694062- | 839 | absent | ENSG00000283376 | / | No | ENSMUSG00000020617 | protein-coding | 1700012B07Rik | protein coding NM_001162428.1 | primates | / | multiple indels linked to frameshifts and PTCs | Uncharacterised protein.
19 | 6396380 | 6412338 | 1 | 8 | chr19:6411742-6411786+, chr19:6411844-6411915+, chr19:6412270-6412332+, chr19:6414855-6414923+, chr19:6414920-6414964+, chr19:6424412-6424450+, chr19:6424801-6424848+, chr19:6424859-6424885+ | 18 | lincRNA | ENSG00000214347 | protein coding NM_001321601.1 | Yes | ENSMUSG00000109739 | protein-coding | Gm17949 | inferred pseudogene Gm17949 | primates | / | exon loss linked to PTC | Adenylate kinase isoenzyme 1-like. The RefSeq pseudogene in mouse has an incorrect structure.
19 | 50812707 | 50808145 | -1 | 2 | chr19:50808166-50808204-, chr19:50808655-50808759- | 99 | absent | ENSG00000284711 | / | Yes | ENSMUSG00000108622 | protein-coding | Gm36864 | protein coding XM_006541343.3 | primates | / | exon loss | Uncharacterised protein.
22 | 21657811 | 21661021 | 1 | 3 | chr22:21657814-21658014+, chr22:21658266-21658385+, chr22:21660896-21660946+ | 82 | absent | ENSG00000284630 | / | Yes | ENSMUSG00000049916 | protein-coding | 2610318N02Rik | protein coding NM_183287.2 | base of primates (CDS intact in bushbaby but not in New or Old World monkey genomes) | / | multiple exon loss | Uncharacterised protein. The locus has undergone rearrangement in human, leaving a complex distribution of fragments.
X | 64405660 | 64406074 | 1 | 1 | chrX:64405660-64406073+ | 335 | absent | ENSG00000283349 | / | Yes | ENSMUSG00000109858 | protein-coding | Gm45800 | / | apes | / | PTC | Profilin 2 family-like protein, since named profilin 5 (Pfn5) by the Mouse Genome Informatics resource based on this annotation. The single-exon structure suggests it originated through retrotransposition.
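The "PhyloCSF region Ids" column packs several genomic intervals into one cell. Each interval is written as chromosome:start-end followed by a strand sign, and the intervals are separated by commas (semicolons in a few rows). The following is a minimal Python sketch for splitting and parsing such a cell, for example to check the parsed count against the "Number of PhyloCSF regions" column. The names `Region`, `parse_region_ids` and `region_length` are hypothetical, and the length calculation assumes 1-based, inclusive coordinates, which the table itself does not state. Cells that hold no Ids, such as "found by re-ranking", are rejected with a `ValueError`.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


# Assumed Id pattern: chromosome name, colon, start-end, then a trailing strand sign.
REGION_RE = re.compile(r"^(chr[0-9XYM]+):(\d+)-(\d+)([+-])$")


def parse_region_ids(field: str) -> List[Region]:
    """Split a region-Id cell on commas or semicolons and parse each Id."""
    regions = []
    for token in re.split(r"[;,]", field):
        token = token.strip()
        if not token:
            continue  # tolerate trailing separators
        match = REGION_RE.match(token)
        if match is None:
            # e.g. "found by re-ranking" rows carry no region Ids
            raise ValueError(f"unrecognised region Id: {token!r}")
        chrom, start, end, strand = match.groups()
        regions.append(Region(chrom, int(start), int(end), strand))
    return regions


def region_length(region: Region) -> int:
    # Assumes 1-based, inclusive coordinates; drop the "+ 1" for half-open intervals.
    return region.end - region.start + 1


if __name__ == "__main__":
    cell = ("chr1:9951232-9951291-, chr1:9952339-9952470-, "
            "chr1:9955834-9955902-, chr1:9960725-9960787-")
    regions = parse_region_ids(cell)
    print(len(regions), sum(region_length(r) for r in regions))  # prints "4 324"
```

Accepting both separators in a single `re.split` keeps the parser tolerant of the table's mixed conventions. Raising an error on unrecognised tokens, rather than silently skipping them, makes rows without region Ids easy to spot.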