Extended Data Figure 4. Nucleocapsid-overlapping ORF9c is not protein-coding.
Sarbecovirus alignment of frame 3-encoded ORF9c (top), which overlaps frame 2-encoded Nucleocapsid (bottom). The ORF9c start codon is lost in one strain, and most strains have an earlier UAG stop codon (magenta) 3 codons before the end. In Nucleocapsid-encoding frame 2 (bottom), nearly all nucleotide substitutions are amino-acid-preserving (synonymous, light green), indicating strong purifying selection for protein-coding function. By contrast, in ORF9c-encoding frame 3 (top), nearly all nucleotide substitutions result in function-disrupting (radical) amino acid changes (red), and very few are synonymous (light green) or function-preserving (conservative, dark green), indicating a lack of purifying selection for protein-coding function; ORF9c therefore does not encode a conserved protein. In addition, ORF9c is unlikely to be translated via leaky ribosome scanning, because its start codon lies 460 nucleotides downstream of N’s start codon (red arrow) with 9 intervening AUG codons (green dots). Direct-RNA sequencing found no ORF9c-specific subgenomic RNAs16–18, and no TRS is appropriately positioned to create one. Finally, several SARS-CoV-2 isolates35 contain stop-introducing mutations7, indicating that ORF9c is not a recently evolved strain-specific gene either. We conclude that ORF9c is not protein-coding.
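The frame-by-frame reasoning in this legend can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the figure: the function names, the toy sequences, and the 0-based coordinates are all hypothetical. The sketch also only separates synonymous, nonsynonymous, and stop-introducing changes; it does not attempt the conservative versus radical distinction shown in the figure. It shows how a single nucleotide substitution can be silent in one reading frame and amino-acid-changing in an overlapping frame, and how AUG codons between two start codons might be counted when assessing leaky scanning.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(seq, pos, new_base, frame_offset):
    """Classify a single-nucleotide substitution within one reading frame.

    seq          -- DNA string (toy data here, not a real genome)
    pos          -- 0-based index of the substituted base
    new_base     -- replacement nucleotide
    frame_offset -- 0-based index of the first base of the frame's first codon
    """
    # Start of the codon (in this frame) that contains `pos`.
    codon_start = pos - (pos - frame_offset) % 3
    if codon_start < 0 or codon_start + 3 > len(seq):
        return "outside frame"
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    old_aa = CODON_TABLE[seq[codon_start:codon_start + 3]]
    new_aa = CODON_TABLE[mutated[codon_start:codon_start + 3]]
    if old_aa == new_aa:
        return "synonymous"
    if new_aa == "*":
        return "stop-introducing"
    return "nonsynonymous"


def count_intervening_starts(seq, upstream_start, downstream_start):
    """Count ATG triplets (any frame) lying strictly between two start codons."""
    return seq[upstream_start + 3:downstream_start].count("ATG")


# Toy overlapping region: frame offset 0 reads ATG GCT GAA,
# frame offset 1 reads TGG CTG AAA.
toy = "ATGGCTGAAA"
print(classify_substitution(toy, 5, "C", 0))  # GCT -> GCC (Ala -> Ala)
print(classify_substitution(toy, 5, "C", 1))  # CTG -> CCG (Leu -> Pro)
print(classify_substitution(toy, 3, "A", 1))  # TGG -> TGA (Trp -> stop)

# Toy scanning example: ATG triplets at indices 0, 5, 10 and 14.
scan = "ATGAAATGCCATGGATGTT"
print(count_intervening_starts(scan, 0, 14))
```

In this toy case the same T-to-C change is synonymous in one frame and nonsynonymous in the overlapping frame, which is the kind of asymmetry the legend uses to argue that selection acts on the Nucleocapsid frame and not on the ORF9c frame.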