Abstract
Large-scale experimental analyses are finding ever more evidence of translation from start codons upstream of the canonical start site. This translation either generates entirely new proteins (from novel upstream open reading frames) or, when the novel start codon is in frame, produces isoforms with extended N-terminals.
Most extended N-terminals are likely to simply add a disordered region to the canonical protein isoform, but some may also block recognition of the signal peptide, causing the isoform to accumulate in the wrong cellular compartment. This analysis finds that upstream translations that would interfere with signal peptides are detected in the expected quantities in ribosome profiling experiments, but that the equivalent N-terminally extended protein isoforms are significantly depleted in multiple proteomics experiments.
This suggests that these isoforms are likely to be degraded shortly after translation via the ubiquitination pathway, preventing the build-up of potentially harmful proteins with exposed hydrophobic regions in the cytoplasm. The results also provide further evidence that most of the transcripts translated from upstream start sites are the result of an inefficient translation initiation process. This has implications for protein annotation, given the huge numbers of upstream translations that are being detected in large-scale experiments.
Introduction
Translation from regions upstream of canonical start codons has been shown to be commonplace [1–4], and in particular in-frame upstream translations that produce isoforms with extended N-terminals have abundant proteomics support [4,6]. Evidence suggests that the upstream translation detected in these experiments is only the tip of the iceberg [4]. This translation has two main characteristics that distinguish it from translation from canonical coding exons. Firstly, most start codons are non-canonical [2–5], and secondly, with a few notable exceptions [4], these upstream regions have little detectable conservation signal [3,4].
Translation from upstream of the canonical ATG can be divided into three main types [4]: open reading frames (ORFs) that have both their initiation codon and stop codon upstream of the canonical ATG (uORFs); ORFs that start upstream of the principal ATG and overlap the coding exons in a different frame (overlapping uORFs, or uoORFs); and 5’ extensions that initiate upstream of the ATG and read through into the coding exons in the same frame. Translated uORFs and uoORFs produce proteins that are very different from the canonical proteins, whereas 5’ extensions generate proteins that are identical to the canonical protein apart from a longer N-terminal. There is considerably more protein evidence for translation from 5’ extensions than for other types of upstream translation [5,6].
The GC content of the 5’ UTRs that include translated upstream regions is remarkably high [4]. GC-rich codons encode residues such as alanine, glycine, proline and arginine, so in the case of N-terminal extensions the extra section of protein sequence is likely to be disordered and in most cases will have little effect on the function of the canonical isoform. This means that even if an N-terminal extension is produced as a result of an inefficient translation process, the isoform may not be harmful to the cell, especially if it is translated in relatively small quantities, as appears to be the case [4]. However, there is at least one situation in which an N-terminal extension could alter protein function drastically: when the extension blocks a signal peptide.
Signal peptides are N-terminal hydrophobic sequences that are required for translocation across the endoplasmic reticulum membrane [7]. The targeting of secreted and trans-membrane proteins that pass through the endoplasmic reticulum is controlled by the signal recognition particle, which binds to ribosomes from the start of translation. If the nascent protein emerging from the ribosome is recognised as a signal peptide, translation is halted and the signal recognition particle delivers the nascent protein, along with the ribosome, to the Sec61 complex [8,9], from where the protein is translated directly into the endoplasmic reticulum lumen. However, if the protein does not have a signal peptide at its N-terminal, the signal recognition particle detaches from the ribosome [10]. A mostly disordered N-terminal extension that precedes a signal sequence will probably cause the signal recognition particle to detach early, leaving the translated protein in the cytoplasm, the wrong cellular compartment [11].
Proteins must reach the appropriate cellular compartment. If they accumulate, mislocalized proteins can disrupt cell function and homeostasis by interacting with proteins vital to the organelle, and interactions between mislocalised and cytosolic proteins can eventually lead to aggregation and even neurodegeneration [12,13]. In the cytoplasm, the exposed hydrophobic regions of mislocalised trans-membrane and secretory proteins are recognised by the BAG6 complex [11,14], leading to their polyubiquitination and degradation [15].
Here, data from multiple proteomics experiments provide clear evidence that protein isoforms with N-terminal extensions ahead of signal peptides are likely to be mislocalized and degraded via the ubiquitination pathway [11].
Methods
Signal peptide prediction
Signal peptides were predicted for coding genes as part of the APPRIS database [16], based on scores from SignalP (v4.1 [17]). The APPRIS signal peptide module generates a score between −4 and 4 for each protein isoform, and the APPRIS database assumes that any protein with a score of 2 or more has a signal peptide. However, this analysis also included isoforms with an APPRIS signal peptide score of 1, because these also agreed with the predicted or experimental subcellular location annotated by UniProtKB [18]. Scores of 0 or lower agreed much less often with UniProtKB subcellular location annotations. Signal peptides were predicted for the principal isoforms of all GENCODE v36 coding genes [19]. APPRIS principal isoforms were used as the reference because they have been shown to be the best approximation of the canonical protein isoform [20,21].
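The threshold rule described above can be sketched as a simple filter. This is a minimal sketch assuming a hypothetical per-isoform record: the `Isoform` class, its field names and the toy identifiers are illustrative and do not reflect the APPRIS file format. Only the score range (−4 to 4), the APPRIS default cutoff of 2 and the relaxed cutoff of 1 come from the text.

```python
from dataclasses import dataclass

@dataclass
class Isoform:
    """Hypothetical per-isoform record; field names are illustrative only."""
    gene: str
    transcript_id: str
    is_principal: bool
    signal_peptide_score: int  # APPRIS signal peptide module score, -4 to 4

APPRIS_DEFAULT_CUTOFF = 2  # APPRIS default: a score >= 2 means a signal peptide
RELAXED_CUTOFF = 1         # this analysis also accepts a score of 1

def genes_with_signal_peptide(isoforms, cutoff=RELAXED_CUTOFF):
    """Return the genes whose principal isoform scores at or above the cutoff."""
    genes = set()
    for iso in isoforms:
        if not -4 <= iso.signal_peptide_score <= 4:
            raise ValueError(f"score out of range for {iso.transcript_id}")
        # Only principal isoforms are used as the reference for each gene.
        if iso.is_principal and iso.signal_peptide_score >= cutoff:
            genes.add(iso.gene)
    return genes

# Toy input with made-up gene and transcript identifiers.
isoforms = [
    Isoform("GENE_A", "tx_A1", True, 3),
    Isoform("GENE_B", "tx_B1", True, 1),
    Isoform("GENE_C", "tx_C1", True, 0),
    Isoform("GENE_D", "tx_D1", True, -2),
    Isoform("GENE_D", "tx_D2", False, 4),  # alternative isoform: ignored
]

print(sorted(genes_with_signal_peptide(isoforms)))
print(sorted(genes_with_signal_peptide(isoforms, cutoff=APPRIS_DEFAULT_CUTOFF)))
```

With the relaxed cutoff the toy set yields GENE_A and GENE_B; with the APPRIS default only GENE_A passes. GENE_D is excluded in both cases because its high score belongs to a non-principal isoform.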
The same analysis was also carried out using the SignalP predictions from the Human Protein Atlas [22]. The Human Protein Atlas predicts signal peptides using SignalP 6.0 [17], which predicts more types of signal peptide, but it does not include all GENCODE v36 coding genes. The results were almost identical to those of the APPRIS analysis.
Fedorova et al. analysis
Upstream translations were taken from the supplementary materials of this analysis [3]. All upstream translations were included in the study, excluding those that were not protein coding in GENCODE v36. They were separated into those that had protein evidence and those that only had ribosome profiling support based on the data from the supplementary materials.
Zhu et al. analysis
Upstream translations were again listed in the supplementary materials of this analysis [2]. The results from both the A431 cell line and normal tissues were analysed. All upstream translations tagged by the authors as “N-terminal extensions” were included, except those that were not protein coding in GENCODE v36. All N-terminal extensions were supported by peptides.
Rodriguez et al. analysis
Upstream translations were listed in the supplementary materials of this analysis [4] and tagged as “N-terminal extension” by the authors. All upstream translations were protein coding in GENCODE v36. All N-terminal extensions were supported by peptides.
PeptideAtlas analysis
Genes with novel upstream translations that had peptide evidence from PeptideAtlas [23] in this analysis [6] were provided by the investigators.
Chen et al. analysis
Upstream 5’ extensions were listed in the supplementary materials in this analysis [24] and tagged as “extension”. Those that were not protein coding in GENCODE v36 were excluded. All extensions were supported by ribosome profiling data.
Annotated N-terminal extensions
The set of genes that had alternative isoforms with N-terminal extensions that were not conserved across primates were provided by the investigators of the Rodriguez et al. paper [4].
Results
The C1Q-like family
Two of the large-scale experiments in this analysis [3, 4] confirm the presence of translated upstream regions in members of the C1Q-like (C1QL) family. Transcript evidence for upstream translation in C1QL2 and C1QL3 was reported in one of the ribosome profiling experiments [3], and an N-terminal extension in C1QL4 had proteomics support [4].
C1Q-like (C1QL) proteins are secreted proteins found across all vertebrates. C1QL1, C1QL2 and C1QL3 are mostly expressed in nervous tissues and C1QL1 and C1QL3 mediate cell adhesion through the ADGRB3 receptor [25, 26]. In adult mice trans-synaptic interaction between C1ql1 and Adgrb3 has been shown to promote the elongation of climbing fibers in Purkinje cells [27, 28]. Less is known about C1QL2 and C1QL4, except that C1QL4 is testis-expressed rather than brain-expressed. Jawed vertebrate species have all four C1QL proteins.
C1QL genes clearly undergo upstream translation. In C1QL4, the upstream region, translated from a highly conserved ATT codon, is clearly under purifying selection across mammals [4]. The ATT codon is conserved in the other three ancient C1QL genes and in the ancestor of the C1QL family in lamprey [Figure 1], and the upstream regions of each C1QL gene have conserved basic residues and a conserved valine and alanine-rich region immediately before the canonical methionine [Figure 1].
Figure 1. Alignment of C1Q-like proteins from human, fish and lamprey.
The position of the conserved ATT start codon is marked as an isoleucine in orange (even though it probably codes for a methionine), a conserved basic motif in dark red, and a conserved alanine, valine and glycine-rich region in green. The canonical ATG is shown in blue and the hydrophobic region of the predicted signal peptide in pink.
Canonical C1QL proteins are predicted to be secreted and have signal peptides, and at least C1QL1 and C1QL3 are known to function as secreted proteins. However, signal peptides must start at the N-terminal end of the protein sequence [10], so C1QL isoforms with extended N-terminals will not be secreted because the signal peptide will not be recognised. Although proteins that build up in the wrong cellular compartment are expected to interfere with cellular processes, the ancient origins and clear cross-species conservation of the C1QL N-terminal extensions show that two populations of C1QL proteins (secreted and non-secreted) are meant to exist. The conserved VGA region in C1QL extensions is predicted to extend the signal peptide helix [29], and all extensions have a conserved basic region next to the predicted extended helix.
Does the loss of signal peptides lead to degradation?
The translation of the upstream region of C1QL4 is supported by protein evidence. Since the extended isoform cannot be secreted, it must have a different role from the canonical isoform. One question is whether or not this example can be extrapolated to all signal peptide blocking N-terminal extensions. If the C1QL4 example is typical, we would expect to detect peptides for other upstream regions, showing that other extended N-terminals that precede signal peptides also accumulate in the cytoplasm. However, if there is little or no peptide evidence for these regions, it strongly suggests that these N-terminal extensions are degraded to protect the cell because they could interfere with cellular processes.
Signal peptides were mapped using the annotations from the APPRIS database [see methods]. Over the whole GENCODE v36 reference set (with read-through genes eliminated [30]), 14.7% of the genes had signal peptides [Figure 2], while the percentage (13.8%) was slightly lower for those 14,888 genes that were detected across the five large-scale proteomics experiments [4].
Figure 2. Percentage of genes with signal peptides predicted via APPRIS.
The percentage of different sets of genes that have signal peptides in their principal isoform according to the SignalP [17] predictor in APPRIS [16]. In orange, the percentage for the whole gene set (Genome) and for the genes detected by five large-scale proteomics experiments in Rodriguez et al. (Experimental). In yellow, the percentage of genes with peptides for their translated upstream regions that had signal peptides in the Fedorova [3], PeptideAtlas [6], Rodriguez [4] and Zhu [2] analyses. In green, the percentage of genes with ribosome profiling evidence for upstream translation that have signal peptides in the Fedorova [3] and Chen [24] analyses. In blue, the percentage of genes with non-conserved annotated upstream translations (Annotated, [4]) that have signal peptides.
The Rodriguez et al. analysis [4] detected peptides for N-terminal extensions in 170 genes. If N-terminal extensions were not regulated, around 23 of these genes (13.8%) would be expected to have signal peptides in their principal isoforms. However, C1QL4 was the only gene with a predicted signal peptide in its principal isoform that also had peptides supporting its upstream region. Just one in 170 genes (0.59%) is much lower than expected, and a chi-squared test indicated that this difference was highly significant (p < 0.005).
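The expected count and significance test above can be reproduced approximately. The text does not specify the exact test set-up, so this sketch assumes a one-degree-of-freedom chi-squared goodness-of-fit test against the 13.8% background rate; `chi_squared_depletion` is a hypothetical helper name. It relies on the identity that the upper-tail probability of a chi-squared variable with one degree of freedom is erfc(√(x/2)), so only the standard library is needed.

```python
import math

def chi_squared_depletion(observed_hits, n_genes, background_rate):
    """Goodness-of-fit test (1 degree of freedom) comparing the observed number of
    genes with signal peptides against the number expected from a background rate."""
    expected_hits = n_genes * background_rate
    expected_misses = n_genes - expected_hits
    observed_misses = n_genes - observed_hits
    stat = ((observed_hits - expected_hits) ** 2 / expected_hits
            + (observed_misses - expected_misses) ** 2 / expected_misses)
    # Upper tail of chi-squared with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return expected_hits, stat, p_value

# Rodriguez et al. set: 1 of 170 extensions precedes a signal peptide; 13.8% background.
expected, stat, p = chi_squared_depletion(observed_hits=1, n_genes=170, background_rate=0.138)
print(f"expected {expected:.1f}, observed 1, chi2 = {stat:.1f}, p = {p:.1e}")
```

The same comparison could be run with `scipy.stats.chisquare(f_obs=[1, 169], f_exp=[23.46, 146.54])`, which computes the same statistic with one degree of freedom.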
The C1QL4 isoform is highly conserved and is almost certainly not in a different cellular compartment by mistake. However, it seems that other proteins with N-terminal extensions blocking the signal peptides are degraded in the cytoplasm.
To confirm these results, three other data sets with peptide evidence for translated upstream regions were analysed. Zhu et al. [2] found evidence for 51 N-terminal extensions, and none of the 51 translations were from genes that had signal peptides in their principal isoforms [Figure 2]. Peptides from PeptideAtlas [23] support 63 N-terminal extensions [6], and just one of these extensions preceded a signal peptide [Figure 2]. Finally, Fedorova et al. [3] verified 102 N-terminal extensions with an in-house proteomics analysis, and three of these (2.9%) were upstream of signal peptides in principal isoforms [Figure 2]. For all three analyses, chi-squared tests found that the number of peptide-supported N-terminal extensions blocking signal peptides was significantly lower than would be expected by chance (p < 0.005), confirming that N-terminally extended isoforms that block signal peptides are regulated by the cell.
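Because the expected counts in these smaller data sets are modest (about seven for the Zhu et al. set), an exact binomial lower-tail probability is a useful cross-check on the chi-squared results. This is a sketch of that alternative test, not the author's method: it assumes the 13.8% background rate and treats each detected extension as an independent draw. The observed and total counts are those given in the text.

```python
from math import comb

def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of seeing k or fewer extensions
    upstream of signal peptides among n detected ones if detection were unbiased."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

BACKGROUND = 0.138  # signal peptide rate among genes detected in proteomics experiments

# (extensions upstream of signal peptides, total peptide-supported extensions)
datasets = {
    "Zhu et al.": (0, 51),
    "PeptideAtlas": (1, 63),
    "Fedorova et al.": (3, 102),
}

results = {}
for name, (hits, total) in datasets.items():
    results[name] = binomial_lower_tail(hits, total, BACKGROUND)
    print(f"{name}: {hits}/{total} observed, {total * BACKGROUND:.1f} expected, "
          f"P(X <= {hits}) = {results[name]:.1e}")
```

Under these assumptions all three lower-tail probabilities fall below 0.005, in line with the chi-squared results reported above.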
The Fedorova et al. analysis is instructive because the 5’ UTR extensions it detected can be separated into two groups: those with accompanying peptide evidence and those with only transcript evidence from ribosome profiling experiments. This second group is larger (338 genes), and remarkably 13.6% of the principal isoforms in these genes have signal peptides, very close to the background level in the genome. Together, these results indicate that proteins likely to build up in the wrong cellular compartment are removed during or shortly after translation, suggesting that the BAG6-dependent polyubiquitination pathway [11] is the most likely means of degradation.
Other data sets support the Fedorova et al. data. Chen et al. [24] also detected ribosome profiling evidence (but not peptide evidence) for in-frame upstream translations, this time in 1143 genes. A total of 13.7% of the principal isoforms in these genes were predicted to have signal peptides, close to the percentage for the reference gene set with peptide evidence, and confirming the results from the Fedorova set.
Finally, 262 GENCODE v36 protein isoforms already annotated with extended N-terminals relative to their principal isoforms [4] were analysed. These annotated N-terminal extensions resemble those detected in the proteomics and ribosome profiling experiments in that the extensions have no equivalent regions beyond primates, but differ in that all the start codons are ATG. In total, 36 of these genes had signal peptides in their principal isoforms, a similar proportion (13.7%) to the background genes detected in proteomics experiments and to the 5’ extensions detected in ribosome profiling experiments.
Pathogenic variants in N-terminal extensions
The American College of Medical Genetics and Genomics (ACMG) recommends that researchers select the longest or most clinically significant transcript for a gene as the reference transcript [31]. Since there is considerable evidence for transcripts with upstream start codons [3,4,24], many of these regions inevitably will be annotated as coding in gene reference sets and these novel transcripts will almost always become the longest in each gene. So, over time, many researchers will assume that these extended transcripts should be the reference transcript and as a result many will be annotated with pathogenic variants.
This effect can already be seen in some genes. At least two of the genes from the set of 236 annotated non-conserved N-terminal extensions upstream of target peptides in GENCODE v36 are already predicted to have likely pathogenic mutations in the 5’ UTR that produces these extensions. These genes are TMCO1 [Figure 3], which produces an endoplasmic reticulum-based multipass membrane protein that binds the Sec61 complex [32], and EDNRB [Figure 3], a multi-pass transmembrane receptor [33].
Figure 3. Annotating translated upstream regions attracts pathogenic variants.
The likely pathogenic variants annotated in TMCO1 and EDNRB. A. The 3D structure of the principal isoform of TMCO1 predicted by AlphaFold [29]. Hydrophobic trans-membrane regions coloured in yellow, the hydrophobic region of the predicted signal anchor (uncleaved signal peptide) in orange. B. The AlphaFold 3D structure of EDNRB. Hydrophobic trans-membrane regions coloured in yellow, the hydrophobic region of the predicted signal peptide in orange. C. A screenshot of the 5’ exon of TMCO1 in the UCSC genome browser [34] showing the amino acid sequence, methionines (from ATGs) in green. The methionine on the far right indicates the position of the upstream start codon and the central methionine the canonical start codon. Pathogenic and likely pathogenic variants are shown as red dots below the sequence. The likely pathogenic frame shift variant that affects the upstream region of TMCO1 is in the centre of the image.
Variants in TMCO1 can cause craniofacial and skeletal anomalies and glaucoma [35]. The upstream start codon that initiates translation of the TMCO1 N-terminal extension is only present in Homininae species, and there is no RNASeq evidence for this region in GTEx [36], nor peptide evidence in PeptideAtlas. Yet Ensembl/GENCODE and UniProtKB annotate the 5’ extension [Figure 3], and UniProtKB has even made the extended isoform the representative sequence for this gene. The likely pathogenic variant in the 5’ extension [Figure 3] would lead to a frameshift and has a one-star rating in ClinVar [37], but the submitting group has not provided any supporting evidence.
Variants in EDNRB may cause Waardenburg syndrome and Hirschsprung disease [38]. The upstream region that produces the N-terminal extension in EDNRB is also only intact within Homininae species. There is residual RNASeq expression of this region in GTEx, but no reliably identified peptides in PeptideAtlas. Again, the extended transcript/isoform is annotated in both Ensembl/GENCODE and UniProtKB, and the likely pathogenic variant in the 5’ extension (a premature stop codon) has a one-star rating in ClinVar.
Conclusions
Evidence that 5’ UTRs upstream of canonical start codons are being translated is indisputable [3,24], and an ever-increasing number of studies find peptide support for these upstream translations [1–4,6]. The most common form of upstream translation is the in-frame translation of 5’ UTRs that produces N-terminally extended canonical protein isoforms [4,6].
This analysis finds clear evidence that at least some of these N-terminally extended protein isoforms are under regulatory control and are degraded by the cellular machinery, preventing the build-up of secretory and trans-membrane proteins in the cytoplasm, where they might be harmful to the cell.
Protein isoforms with extended N-terminals have additional amino acids at the N-terminal end, and if the protein has a signal peptide, these extra amino acids move it further away from the N-terminal. This makes it more likely that the signal peptide will still be in the ribosome exit tunnel when the signal recognition particle disengages from the ribosome [10]. Once the signal recognition particle has disengaged, the protein is destined for the cytoplasm rather than the endoplasmic reticulum [11,14]. Unless it is recognised by cellular surveillance and tagged for degradation, the N-terminally extended isoform will accumulate in the cytoplasm.
In each of the four independent proteomics analyses, N-terminal extensions that would abolish signal peptides were detected significantly less frequently than expected, suggesting that these N-terminally extended isoforms are degraded. Since the unrecognised signal peptide would leave an exposed hydrophobic region, the most likely path to degradation would involve BAG6, which has been shown to promote the degradation of proteins with substantial hydrophobic regions at their N- or C-terminals [11,14].
That degradation of the mislocalised proteins occurs post-translationally is supported by the fact that the 5’ extensions detected in ribosome profiling experiments [3,24] had the same proportion of signal peptides as the control set (the genes detected in proteomics experiments [4]). When the pooled results of the four large-scale proteomics analyses were compared with those of the two ribosome profiling experiments, there was more than eight times as much support for upstream regions that would block signal peptides among the ribosome profiling transcripts.
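The pooled comparison above can be approximated from the numbers reported in the Results. This is a sketch under stated assumptions: the proteomics counts come from the text, but only percentages are given for the two ribosome profiling sets, so their counts are back-calculated and rounded; genes shared between data sets are not de-duplicated, and the author's compendium may have been built differently.

```python
# Peptide-supported N-terminal extensions: (upstream of a signal peptide, total), from the text.
proteomics = {"Rodriguez": (1, 170), "Zhu": (0, 51), "PeptideAtlas": (1, 63), "Fedorova": (3, 102)}

# Ribosome-profiling-only extensions: only percentages are reported (13.6% of 338 genes,
# 13.7% of 1143 genes), so these hit counts are approximate back-calculations.
ribo_profiling = {"Fedorova": (round(0.136 * 338), 338), "Chen": (round(0.137 * 1143), 1143)}

def pooled_rate(sets):
    """Pool counts across data sets (no de-duplication of shared genes)."""
    hits = sum(h for h, _ in sets.values())
    total = sum(n for _, n in sets.values())
    return hits / total

prot_rate = pooled_rate(proteomics)
ribo_rate = pooled_rate(ribo_profiling)
print(f"proteomics: {prot_rate:.2%}, ribosome profiling: {ribo_rate:.2%}, "
      f"ratio: {ribo_rate / prot_rate:.1f}x")
```

Under these assumptions the pooled ratio comes out at about 10.6, consistent with the "more than eight times" stated above.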
Given that translation from upstream start codons is so frequent [3,4,6,24], one important question is whether there are any adaptive benefits: have these upstream translations gained functional roles [39], or are they simply the result of noisy translation initiation and not under selection pressure [40]? The simple answer seems to be both, but mostly the result of inefficient translation initiation.
Novel translated upstream regions have a series of characteristics that are not indicative of coding regions [4]. Most importantly, a large majority of upstream translations are not conserved among primates [3,4]. There is no evidence of purifying selection in germline variation data [41] from the majority of upstream regions that are not conserved across primates [4]. At the same time, a small proportion, including the members of the C1QL family, has strong cross species conservation and does appear to be under selection pressure [4].
The results from this analysis demonstrate that practically all the N-terminal extensions that precede signal peptides are degraded. This would seem to confirm that this sub-group of non-conserved upstream translations is a potentially harmful biological aberration. Since the production of harmful biological aberrations is not the result of evolutionary selection, it is almost certainly a side-effect of a noisy translation initiation process.
If one sub-group of unconserved N-terminal extensions is a side-effect of a noisy translation initiation process, then this is almost certainly also true for the rest of the non-conserved N-terminal extensions. The only difference is that these other N-terminally extended isoforms are produced in small quantities and remain in the correct cellular compartment, so they are more likely to be tolerated by the cell than degraded. The degradation of isoforms translated from upstream start codons supports the view that translation initiation is an especially noisy biological process.
The observation that ATGs other than the canonical start codon, and their associated Kozak sequences, are selected against in all frames upstream and downstream of the canonical ATG [42] is another indication that translation initiation is prone to molecular errors. Since ATGs are much more efficient translation initiation codons than non-ATG start codons [43], accidental translation from such an ATG has the potential to be much more of a metabolic burden [42]. The inefficiency of translation from non-canonical start codons may be one of the reasons that so much upstream translation is tolerated by the cell.
If the majority of non-conserved upstream translations are the result of molecular errors, there has to be some consideration of how they are annotated in reference gene sets, to avoid them being annotated with unsupported pathogenic variants, as in TMCO1 and EDNRB, and to avoid propagating errors.
Although the ACMG still recommends that researchers select the longest transcript for a gene as the reference transcript [31], MANE Select [44] transcripts are now available for more recent versions of reference gene sets. While this will go some way towards avoiding the circular annotation of unsupported pathogenic variants, it is not a panacea, partly because protein length and pre-existing pathogenic variants were important inputs in this automatic process.
One extreme case of what could happen can be seen in the gene PUS1. The MANE Select transcript is the longest transcript, generated from an upstream start codon, presumably because an unsupported pathogenic variant (a premature stop) was already annotated in the upstream exon. However, the region between the upstream start codon and the canonical start codon has little transcript support and shows no coding conservation, even among primates. At present, the PUS1 upstream region is annotated with seven pathogenic or likely pathogenic variants in ClinVar, most of which have been added since it was tagged as part of the MANE Select transcript, and none of which have any support beyond being high-impact variants in a gene known to play a role in a rare mitochondrial myopathy [45]. One of the submitters was swayed to record a variant as pathogenic because of the ClinVar pathogenic variants [37] already present in the exon, the very definition of circular annotation.
A definition of functional importance that includes conservation would help distinguish canonical and conserved alternative isoforms from those isoforms that are the result of imprecise translation initiation, but no reference gene set incorporates any such method at present. The APPRIS database includes TRIFID functional scores [46] for each isoform, which give an idea of the relative conservation. High scoring TRIFID isoforms have been shown to capture almost all reliably annotated pathogenic variants in ClinVar [21].
This analysis shows that degradation by the ubiquitination pathway is the likely fate of erroneous upstream translation initiation when it leads to the blocking of N-terminal localization signals. That a large proportion of translations from upstream start codons appear to be products of a misfiring translation initiation process has implications for the huge numbers of upstream translations being detected in large-scale analyses, and strongly suggests that further work is needed in this area.
Acknowledgements
This work was funded by the National Human Genome Research Institute of the National Institutes of Health (grant number U41 HG007234). The author would like to thank Federico Abascal for his input on the paper.
References
- 1. Kim M.S., Pinto S.M., Getnet D., Nirujogi R.S., Manda S.S., Chaerkady R., Madugundu A.K., Kelkar D.S., Isserlin R., Jain S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.
- 2. Zhu Y., Orre L.M., Johansson H.J., Huss M., Boekel J., Vesterlund M., Fernandez-Woodbridge A., Branca R.M.M. and Lehtiö J. (2018) Discovery of coding regions in the human genome by integrated proteogenomics analysis workflow. Nat. Commun., 9, 903.
- 3. Fedorova A.D., Kiniry S.J., Andreev D.E., Mudge J.M. and Baranov P.V. (2022) Thousands of human non-AUG extended proteoforms lack evidence of evolutionary selection among mammals. Nat Commun., 13, 7910.
- 4. Rodriguez J.M., Abascal F., Cerdán-Vélez D., Gómez L.M., Vázquez J. and Tress M.L. (2024) Evidence for widespread translation of 5’ untranslated regions. Nucleic Acids Res., 52, 8112–8126.
- 5. Fritsch C., Herrmann A., Nothnagel M., Szafranski K., Huse K., Schumann F., Schreiber S., Platzer M., Krawczak M., Hampe J. and Brosch M. (2012) Genome-wide search for novel human uORFs and N-terminal protein extensions using ribosomal footprinting. Genome Res., 22, 2208–2218.
- 6. Rodriguez J.M., Maquedano M., Cerdán-Vélez D., Calvo E., Vázquez J. and Tress M.L. (2024) A deep audit of the PeptideAtlas database uncovers evidence for unannotated coding genes and aberrant translation. bioRxiv, 2024.11.14.623419v1.
- 7. von Heijne G. (1990) The signal peptide. J Membr Biol., 115, 195–201.
- 8. Saraogi I. and Shan S.O. (2011) Molecular mechanism of co-translational protein targeting by the signal recognition particle. Traffic, 12, 535–542.
- 9. Voorhees R.M. and Hegde R.S. (2015) Structures of the scanning and engaged states of the mammalian SRP-ribosome complex. Elife, 4, e07975.
- 10. Bornemann T., Jöckel J., Rodnina M.V. and Wintermeyer W. (2008) Signal sequence-independent membrane targeting of ribosomes containing short nascent peptides within the exit tunnel. Nat Struct Mol Biol., 15, 494–499.
- 11. Hessa T., Sharma A., Mariappan M., Eshleman H.D., Gutierrez E. and Hegde R.S. (2011) Protein targeting and degradation are coupled for elimination of mislocalized proteins. Nature, 475, 394–397.
- 12. Rane N.S., Chakrabarti O., Feigenbaum L. and Hegde R.S. (2010) Signal sequence insufficiency contributes to neurodegeneration caused by transmembrane prion protein. J Cell Biol., 188, 515–526.
- 13. Benarroch R., Austin J.M., Ahmed F. and Isaacson R.L. (2019) The roles of cytosolic quality control proteins, SGTA and the BAG6 complex, in disease. Adv Protein Chem Struct Biol., 114, 265–313.
- 14. Yamamoto K., Hayashishita M., Minami S., Suzuki K., Hagiwara T., Noguchi A. and Kawahara H. (2017) Elimination of a signal sequence-uncleaved form of defective HLA protein through BAG6. Sci Rep., 7, 14545.
- 15. Suzuki R. and Kawahara H. (2016) UBQLN4 recognizes mislocalized transmembrane domain proteins and targets these to proteasomal degradation. EMBO Rep., 17, 842–857.
- 16. Rodriguez J.M., Pozo F., Cerdán-Vélez D., Di Domenico T., Vázquez J. and Tress M.L. (2022) APPRIS: selecting functionally important isoforms. Nucleic Acids Res., 50, D54–D59.
- 17. Nielsen H., Teufel F., Brunak S. and von Heijne G. (2024) SignalP: the evolution of a web server. Methods Mol Biol., 2836, 331–367.
- 18. UniProt Consortium (2023) UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res., 51, D523–D531.
- 19.Frankish A., Carbonell-Sala S., Diekhans M., Jungreis I., Loveland J.E., Mudge J.M., Sisu C., Wright J.C., Arnan C., Barnes I. et al. (2023) GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res., 51, D942–D949. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Pozo F., Martinez Gomez L., Rodriguez J.M., Vazquez J. and Tress M.L. (2022) APPRIS principal isoforms and MANE Select transcripts define reference splice variants. Bioinformatics, 38, ii89–ii94. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Pozo F., Rodriguez J.M., Vázquez J. and Tress M.L. (2022) Clinical variant interpretation and biologically relevant reference transcripts. NPJ Genom. Med., 7, 59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Uhlén M., Fagerberg L., Hallström B.M., Lindskog C., Oksvold P., Mardinoglu A., Sivertsson Å., Kampf C., Sjöstedt E., Asplund A., et al. (2015) Tissue-based map of the human proteome. Science, 347, 1260419. [DOI] [PubMed] [Google Scholar]
- 23.Desiere F., Deutsch E.W., King N.L., Nesvizhskii A.I., Mallick P., Eng J., Chen S., Eddes J., Loevenich S.N. and Aebersold R. (2006) The PeptideAtlas project. Nucleic Acids Res., 34, D655–D658. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Chen J., Brunner A.D., Cogan J.Z., Nuñez J.K., Fields A.P., Adamson B., Itzhak D.N., Li J.Y., Mann M., Leonetti M.D. and Weissman J.S. (2020) Pervasive functional translation of noncanonical human open reading frames. Science, 367, 1140–1146. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Gupta R., Nguyen D.C., Schaid M.D., Lei X., Balamurugan A.N., Wong G.W., Kim J.A., Koltes J.E., Kimple M.E. and Bhatnagar S. (2018) Complement 1q-like-3 protein inhibits insulin secretion from pancreatic β-cells via the cell adhesion G protein-coupled receptor BAI3. J. Biol. Chem., 293, 18086–18098.
- 26. Sticco M.J., Peña Palomino P.A., Lukacsovich D., Thompson B.L., Földy C., Ressl S. and Martinelli D.C. (2021) C1QL3 promotes cell-cell adhesion by mediating complex formation between ADGRB3/BAI3 and neuronal pentraxins. FASEB J., 35, e21194.
- 27. Kakegawa W., Mitakidis N., Miura E., Abe M., Matsuda K., Takeo Y.H., Kohda K., Motohashi J., Takahashi A., Nagao S. et al. (2015) Anterograde C1ql1 signaling is required in order to determine and maintain a single-winner climbing fiber in the mouse cerebellum. Neuron, 85, 316–329.
- 28. Aimi T., Matsuda K. and Yuzaki M. (2023) C1ql1-Bai3 signaling is necessary for climbing fiber synapse formation in mature Purkinje cells in coordination with neuronal activity. Mol. Brain, 16, 61.
- 29. Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., Ronneberger O., Willmore L., Ballard A.J., Bambrick J. et al. (2024) Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630, 493–500.
- 30. Abascal F., Juan D., Jungreis I., Kellis M., Martinez L., Rigau M., Rodriguez J.M., Vazquez J. and Tress M.L. (2018) Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res., 46, 7070–7084.
- 31. Richards S., Aziz N., Bale S., Bick D., Das S., Gastier-Foster J., Grody W.W., Hegde M., Lyon E., Spector E. et al. (2015) Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med., 17, 405–424.
- 32. McGilvray P.T., Anghel S.A., Sundaram A., Zhong F., Trnka M.J., Fuller J.R., Hu H., Burlingame A.L. and Keenan R.J. (2020) An ER translocon for multi-pass membrane protein biogenesis. eLife, 9, e56889.
- 33. Bondurand N., Dufour S. and Pingault V. (2018) News from the endothelin-3/EDNRB signaling pathway: Role during enteric nervous system development and involvement in neural crest-associated disorders. Dev. Biol., 444, S156–S169.
- 34. Raney B.J., Barber G.P., Benet-Pagès A., Casper J., Clawson H., Cline M.S., Diekhans M., Fischer C., Navarro Gonzalez J., Hickey G. et al. (2024) The UCSC Genome Browser database: 2024 update. Nucleic Acids Res., 52, D1082–D1088.
- 35. Xin B., Puffenberger E.G., Turben S., Tan H., Zhou A. and Wang H. (2010) Homozygous frameshift mutation in TMCO1 causes a syndrome with craniofacial dysmorphism, skeletal anomalies, and mental retardation. Proc. Natl. Acad. Sci. U.S.A., 107, 258–263.
- 36. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.
- 37. Landrum M.J., Lee J.M., Benson M., Brown G.R., Chao C., Chitipiralla S., Gu B., Hart J., Hoffman D., Jang W. et al. (2018) ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res., 46, D1062–D1067.
- 38. Montalva L., Cheng L.S., Kapur R., Langer J.C., Berrebi D., Kyrklund K., Pakarinen M., de Blaauw I., Bonnard A. and Gosain A. (2023) Hirschsprung disease. Nat. Rev. Dis. Primers, 9, 54.
- 39. Kearse M.G. and Wilusz J.E. (2017) Non-AUG translation: a new start for protein synthesis in eukaryotes. Genes Dev., 31, 1717–1731.
- 40. Xu C. and Zhang J. (2020) Mammalian Alternative Translation Initiation Is Mostly Nonadaptive. Mol. Biol. Evol., 37, 2015–2028.
- 41. Cummings B.B., Karczewski K.J., Kosmicki J.A., Seaby E.G., Watts N.A., Singer-Berk M., Mudge J.M., Karjalainen J., Satterstrom F.K., O’Donnell-Luria A.H. et al. (2020) Transcript expression-aware annotation improves rare variant interpretation. Nature, 581, 452–458.
- 42. Zur H. and Tuller T. (2013) New universal rules of eukaryotic translation initiation fidelity. PLoS Comput. Biol., 9, e1003136.
- 43. Kozak M. (1989) Context effects and inefficient initiation at non-AUG codons in eucaryotic cell-free translation systems. Mol. Cell. Biol., 9, 5073–5080.
- 44. Morales J., Pujar S., Loveland J.E., Astashyn A., Bennett R., Berry A., Cox E., Davidson C., Ermolaeva O., Farrell C.M. et al. (2022) A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature, 604, 310–315.
- 45. Bykhovskaya Y., Casas K., Mengesha E., Inbal A. and Fischel-Ghodsian N. (2004) Missense mutation in pseudouridine synthase 1 (PUS1) causes mitochondrial myopathy and sideroblastic anemia (MLASA). Am. J. Hum. Genet., 74, 1303–1308.
- 46. Pozo F., Martinez-Gomez L., Walsh T.A., Rodriguez J.M., Di Domenico T., Abascal F., Vazquez J. and Tress M.L. (2021) Assessing the functional relevance of splice isoforms. NAR Genom. Bioinform., 3, lqab044.



