Significance
All members of the order Nidovirales, including Simian hemorrhagic fever virus (SHFV), express their 3′ genes from subgenomic mRNAs (sg mRNAs) whose synthesis is regulated by genomic transcription regulatory sequences (TRSs). We used a next-generation sequencing–facilitated approach to comprehensively analyze a nidovirus sg mRNA transcriptome. The discovery of high sg mRNA redundancy for individual genes and of multiple previously unreported sg mRNAs encoding nonstructural proteins, alternative reading frame proteins, or C-terminal peptides of known proteins represents a paradigm shift in our understanding of SHFV genome-coding capacity and of the complexity of transcription regulation, which is expected to also be characteristic of other nidoviruses. High sg mRNA redundancy would ensure continued protein synthesis if a TRS were inactivated by random mutation.
Keywords: nidovirus, Simian hemorrhagic fever virus, transcription regulatory sequences, subgenomic mRNAs, next-generation sequencing
Abstract
Members of the order Nidovirales express their structural protein ORFs from a nested set of 3′ subgenomic mRNAs (sg mRNAs), and for most of these ORFs, a single genomic transcription regulatory sequence (TRS) has been identified. Nine TRSs were previously reported for the arterivirus Simian hemorrhagic fever virus (SHFV). In the present study, which was facilitated by next-generation sequencing, 96 SHFV body TRSs were identified that were functional in both infected MA104 cells and macaque macrophages. The abundance of sg mRNAs produced from individual TRSs was consistent over time in the two different cell types. Most of the TRSs are located in the genomic 3′ region, but some are in the 5′ ORF1a/1b region and provide alternative sources of nonstructural proteins. Multiple functional TRSs were identified for the majority of the SHFV 3′ ORFs, and four previously identified TRSs were found not to be the predominant ones used. A third of the TRSs generated sg mRNAs with variant leader–body junction sequences. Sg mRNAs encoding E′, GP2, or ORF5a as their 5′ ORF, as well as sg mRNAs encoding six previously unreported alternative frame ORFs and 14 previously unreported C-terminal ORFs of known proteins, were also identified. Mutation of the start codon of two C-terminal ORFs in an infectious clone reduced virus yield. Mass spectrometry detected one previously unreported protein and suggested translation of some of the C-terminal ORFs. The results reveal the complexity of the transcriptional regulatory mechanism and an expanded coding capacity for SHFV, which may also be characteristic of other nidoviruses.
The virus families Coronaviridae, Arteriviridae, Mesoniviridae, and Roniviridae constitute the order Nidovirales. Nidoviruses are single-stranded, positive-sense RNA viruses that share a similar genome organization and generate a 3′ coterminal nested set of subgenomic (sg) mRNAs to express structural and accessory proteins (1–3). The genomes of arteriviruses are approximately half the size of those of coronaviruses. Members of the family Arteriviridae include Simian hemorrhagic fever virus (SHFV), Lactate dehydrogenase-elevating virus (LDV), and the well-studied viruses Equine arteritis virus (EAV) and Porcine reproductive and respiratory syndrome virus (PRRSV). SHFV infections in species of African monkeys are typically persistent and asymptomatic (4–7). In contrast, SHFV infections in Asian macaque monkeys trigger an acute, fatal hemorrhagic fever disease with death occurring 7–14 d after infection (6, 8). Macrophages (MΦs) and dendritic cells are target cells for SHFV (9).
Arterivirus genomes have a 5′ cap and a 3′ poly(A) tail. The 5′ two-thirds of the genome encodes two polyproteins, ORF1a and ORF1ab, that are auto-cleaved into 13–15 nonstructural proteins required for virus replication and transcription (10, 11). The 3′ one-third of the arterivirus genome encodes either five or nine minor structural proteins and three major structural proteins that are required for infectious virus (iv) particle production. Among the major structural proteins, GP5 and M form heterodimers in the viral envelope and were shown to be essential for EAV particle assembly (12–14). The nucleocapsid (N) protein forms homodimers that interact with each other as well as with the genomic RNA to form a “cage-like” nucleocapsid (15–17). The minor structural glycoproteins GP2, GP3, and GP4 form heterotrimers and function in cell-receptor recognition (18, 19). The minor structural protein E forms oligomers and is thought to function as an ion channel in the virion membrane during cell entry (20). The function of the recently discovered ORF5a, a possible additional minor structural protein, is not known, but knocking out the expression of this protein in the EAV genome reduced the virus yield (21, 22). Among arteriviruses, SHFV has the largest genome at ∼15.7 kb and encodes an additional nsp1 protein (nsp1γ) and an extra set of minor structural proteins (GP2′, GP3′, GP4′, and E′) (23, 24). The functions of these additional minor structural proteins are not currently known, but each is required for the production of infectious extracellular virions (25).
All nidoviruses generate a 3′ coterminal nested set of sg mRNAs to express structural and also, in some cases, accessory proteins. The production of the minus-strand templates for these sg mRNAs is regulated by transcription regulatory sequences (TRSs) located in the 3′ region of the genome, termed “body TRSs.” In addition to being 3′ coterminal, the sg mRNAs of coronaviruses, arteriviruses, and mesoniviruses are also 5′ coterminal due to a discontinuous RNA synthesis mechanism (1, 3, 26, 27). The arterivirus TRSs are usually 30 to 40 nt in length and consist of a 6- to 9-nt core sequence with 10- to 15-nt flanking sequences on each side. It has been proposed that the leader TRS folds into a stem–loop structure with the core sequence located in the loop and the flanking sequences forming the stem (28, 29). There is a single 5′ leader TRS but multiple 3′ body TRSs in arterivirus genomes. The core sequences of different body TRSs vary in the extent of their sequence identity to the core sequence of the leader TRS (30). As the viral RNA polymerase copies a minus strand from the 3′ end of the genome, it sequentially encounters the 3′ body TRSs. At each body TRS, the polymerase either reads through or terminates prematurely within the TRS sequence. If termination occurs, the polymerase carrying a partially transcribed minus-strand RNA dissociates from the genome. Because the 3′ end of the nascent minus-strand RNA is complementary to part of the body TRS, and the body TRS core resembles the leader TRS core, the nascent strand can realign with the leader TRS near the 5′ end of the genome (31). Transcription then continues, generating a minus-strand sg RNA with a unique leader–body junction sequence (32, 33). The minus-strand sg RNAs generated are then efficiently transcribed into sg mRNAs.
Each sg mRNA typically expresses one structural protein from the first 5′ start codon (5′ proximal ORF), but in a few instances the expression of one or more additional ORFs from a single sg mRNA has been proposed (21, 22, 24, 34).
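The nested-set organization described above can be illustrated with a toy model. This is a minimal sketch that assumes each sg mRNA is exactly the 5′ leader fused to the genome from a body position to the 3′ end; the 20-nt sequence, the leader length, and the body positions are hypothetical, and poly(A) tails and junction-level detail are ignored.

```python
def build_sg_mrnas(genome: str, leader_len: int, body_starts: list[int]) -> dict[int, str]:
    """Toy model of discontinuous transcription.

    Each sg mRNA is the 5' leader (genome[:leader_len]) joined to the genome
    from a body position (0-based index) to the 3' end, so every product
    shares the same 5' leader and the same 3' end (5' and 3' coterminal).
    """
    leader = genome[:leader_len]
    return {start: leader + genome[start:] for start in body_starts}


# Hypothetical 20-nt "genome" with a 5-nt leader and two body junction sites.
toy_genome = "ACGTACGTACGTACGTACGT"
for start, mrna in sorted(build_sg_mrnas(toy_genome, 5, [8, 14]).items()):
    print(start, len(mrna), mrna)
```

In this model, sg mRNA length equals leader length plus (genome length minus body start), which is why sg mRNAs generated from more 3′ body TRSs are shorter and migrate faster on Northern blots.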
Similar to other arteriviruses, the SHFV genome was initially thought to encode six structural protein ORFs because six strong sg mRNA bands were detected by Northern blotting of infected cell extracts (35). However, subsequent sequencing of the 3′ region of the SHFV genome revealed the presence of ORFs for an extra set of minor structural proteins, GP2′, GP3′, and GP4′ (36). The body TRSs for GP2′ and GP4′ were then discovered, but a separate body TRS for SHFV GP3′ was not identified (24, 36). In a recent study, a sg mRNA3′ encoding GP3′ as the 5′ ORF was detected by Northern blotting, and the corresponding body TRS3′ was identified by RT-PCR amplification, cloning, and sequencing (25). This study also detected an additional sg mRNA band between the sg mRNA5 and sg mRNA6 bands on the Northern blots that was not characterized (25). Additional ORFs encoding the E and ORF5a proteins were discovered in other arterivirus genomes, and the SHFV genome was predicted to encode three additional proteins, E, E′, and ORF5a. It was proposed that the arterivirus E and ORF5a proteins are expressed from the second ORF of bicistronic sg mRNAs (21, 22, 34).
In the present study, the presence of an additional sg mRNA band between sg mRNA5 and sg mRNA6 was confirmed, and this band was shown to contain seven different sg mRNAs of similar size, each regulated by a unique body TRS. All seven of these sg mRNAs encode the same in-frame C-terminal region of GP5. To identify additional functional TRSs, the SHFV sg mRNA transcriptome was analyzed first by amplification, cloning, and sequencing of sg mRNA leader–body junctions and then by next-generation sequencing (NGS) of mRNAs extracted from SHFV-infected MA104 cells and macaque MΦs. A total of 96 functional body TRSs were identified in the SHFV genome; the majority were located in the 3′ structural protein region of the genome, but some were also located in the ORF1a/1b region and produced sg mRNAs encoding nonstructural proteins. Thirty-four of the identified TRSs produced sg mRNAs with two or three variant leader–body junction sequences. The relative abundance of sg mRNAs produced at individual TRSs remained consistent at early and late times post infection in both cell types analyzed. The majority of the newly identified TRSs produced alternative sg mRNAs for known structural proteins. Separate TRSs and sg mRNAs were identified for E′, GP2, and ORF5a. In addition, sg mRNAs encoding in-frame C-terminal ORFs of a number of structural proteins as well as previously unreported ORFs in alternative reading frames were detected. Mass spectrometry analysis of the viral proteome in SHFV-infected MA104 cells detected all the previously identified and predicted SHFV proteins except for ORF5a and nsp6. This analysis also detected one of the alternative reading frame proteins and suggested the production of some of the C-terminal ORFs. The expanded sg mRNA transcriptome and coding capacity and the complex but consistently regulated sg mRNA production for individual ORFs discovered for SHFV are likely characteristic of other nidoviruses.
Results
An Additional sg mRNA Band Was Consistently Detected in SHFV-Infected MA104 Cell Lysates by Northern Blotting.
The SHFV structural protein ORFs are expressed from sg mRNAs. In an initial study, six strong sg mRNA bands, which were thought to encode three minor and three major structural proteins as in other arteriviruses, were detected by Northern blotting of RNA extracted from SHFV-infected MA104 cell lysates (35, 37). However, subsequent sequencing of the 3′ end of the SHFV genome predicted the presence of three additional minor structural protein ORFs in this region and suggested that nine sg mRNAs should be produced (36). A recent Northern blot analysis performed with a digoxigenin (DIG)-labeled 5′ leader probe on RNA extracted from SHFV-infected MA104 cells [multiplicity of infection (MOI) of 1] at 8, 16, and 24 h post infection (hpi) detected nine sg mRNA bands with sizes corresponding to those predicted for the nine ORFs (25). However, an additional sg mRNA band was consistently detected between sg mRNA5 and sg mRNA6 (25). To confirm the production of the additional sg mRNA band, MA104 cells were infected with SHFV infectious clone (SHFVic) virus at an MOI of 1, and total intracellular RNA was extracted at different times after infection and subjected to Northern blot analysis using a DIG-labeled sg mRNA7 probe (Table S1). A genomic RNA band and 10 sg mRNA bands were detected by 8 hpi (Fig. 1A). The sizes of nine of these sg mRNA bands were 5.0, 4.4, 4.0, 3.5, 2.8, 2.6, 1.9, 1.2, and 0.6 kb and corresponded to the sizes of the previously identified sg mRNA2′, sg mRNA3′, sg mRNA4′, sg mRNA2, sg mRNA3, sg mRNA4, sg mRNA5, sg mRNA6, and sg mRNA7, respectively (25). The additional ∼1.7-kb band was detected between sg mRNA5 (1.9 kb) and sg mRNA6 (1.2 kb).
To further characterize this sg mRNA, MA104 cells were infected with SHFVic at an MOI of 1; at 12 hpi, total intracellular RNA was extracted in three biological-repeat experiments and subjected to Northern blotting using either a sg mRNA5 probe (targeting the region between TRS5 and TRS6) or a sg mRNA6 probe (targeting the region between TRS6 and TRS7) (Table S1). Because the sg mRNAs are 3′ coterminal, each probe can hybridize only to sg mRNAs whose body sequence begins upstream of, or within, its target region. The sg mRNA5 probe detected the bands corresponding to all the previously identified sg mRNAs except sg mRNA6 and sg mRNA7, and it also detected the additional ∼1.7-kb band (Fig. 1B). The sg mRNA6 probe detected all the sg mRNAs except sg mRNA7 and also strongly detected the ∼1.7-kb band. These Northern blotting data suggested that the ∼1.7-kb sg mRNA band is generated from a body TRS located between TRS5 and TRS6.
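The probe reasoning above can be made explicit as an interval-overlap check. This sketch uses the probe coordinates from Table S1 and a genome end of 15,717 nt; the body start positions are hypothetical placeholders chosen only to reflect the order TRS5 < ORF5-C-68aa TRSs < TRS6 < TRS7, not measured TRS coordinates. The leader is ignored because both probes target body sequence.

```python
def probe_overlap(body_start: int, probe_start: int, probe_end: int, genome_end: int) -> int:
    """Number of probe-target nucleotides (inclusive coordinates) present in an
    sg mRNA whose body runs from body_start to the genome 3' end."""
    lo = max(body_start, probe_start)
    hi = min(probe_end, genome_end)
    return max(0, hi - lo + 1)


GENOME_END = 15_717
PROBES = {"sg mRNA5 probe": (14_098, 14_638), "sg mRNA6 probe": (14_719, 15_226)}  # Table S1
# Hypothetical body start positions, for illustration only.
BODY_STARTS = {"sg mRNA5": 13_900, "~1.7-kb sg mRNA": 14_300, "sg mRNA6": 14_700, "sg mRNA7": 15_250}

for probe, (p_start, p_end) in PROBES.items():
    for name, start in BODY_STARTS.items():
        overlap = probe_overlap(start, p_start, p_end, GENOME_END)
        print(f"{probe} vs {name}: {'detected' if overlap else 'not detected'} ({overlap} nt)")
```

A partial overlap (as for a body starting inside the probe target) is still counted as detectable; in practice, signal strength would also depend on sg mRNA abundance and hybridization length.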
Table S1.
Primers used to generate SHFV 3′ end probes for Northern blotting
Name | Sequence (5′ to 3′) | Targeted region | Probe length, nt (nucleotide positions) |
sg mRNA5-F | TCTATTACATTCAGCAGCACCGGCGCATCC | TRS5 to TRS6 | 541 (14,098–14,638) |
sg mRNA5-R | cggcggataatacgactcactatagggCCATACAATTTACCACTAAC | | |
sg mRNA6-F | ATTTCCGACCCACAGGGACTGCGGGTTGGACCTCATAAG | TRS6 to TRS7 | 508 (14,719–15,226) |
sg mRNA6-R | cggcggataatacgactcactatagggCATGAGACTACCATTGACTGC | | |
sg mRNA7-F | GCTGGCAAACCAAAAACAAATAACAAGGG | TRS7 to 3′ UTR | 392 (15,298–15,689) |
sg mRNA7-R | cggcggtaatacgactcactatagggTTAGTCCTTAGCCTAGGGAAG | | |
The T7 promoter sequence is shown in lowercase. Nucleotide numbering is according to the SHFV infectious clone genomic RNA.
Fig. 1.
Northern blot analysis of sg mRNAs produced in SHFVic-infected MA104 cells. (A) MA104 cells were mock infected or infected with SHFVic at an MOI of 1. At different times post infection, total intracellular RNA was extracted, and 1 µg of RNA was separated on a 1% denaturing agarose gel followed by transfer to a Hybond-N+ membrane. After UV cross-linking, the membrane was hybridized to a DIG-labeled RNA probe specific for sg mRNA7. (B) Total intracellular RNA collected at 12 hpi from three biological repeats was hybridized separately to DIG-labeled RNA probes specific for sg mRNA5 or sg mRNA6. The RNA bands detected were labeled based on the estimated sizes of the predicted structural protein sg mRNAs and on the probe used. The additional sg mRNA band detected is indicated. The RNA ladder was cut from the membrane, stained with methylene blue, and imaged. L, RNA ladder; M, mock-infected; 1, 2, and 3 indicate the different biological repeat samples tested.
Identification of the Body TRS for the ∼1.7-kb sg mRNA.
To identify the body TRS for the ∼1.7-kb sg mRNA, a forward primer matching the 5′ leader region and a reverse primer complementary to a region upstream of TRS6 were designed and used for RT-PCR amplification of RNA extracted from SHFVic-infected MA104 cells (MOI of 1) at 24 hpi (Fig. S1A). Multiple strong bands close to the predicted size (852 bp) of the sg mRNA5 leader–body junction, as well as a cluster of fainter, faster-migrating bands (∼500 bp), were detected (Fig. S1B). No bands were detected in the mock-infected samples. The region of the gel containing the faster-migrating bands was excised, and the DNA was extracted and cloned into a TA vector. Forty colonies were randomly selected, and the plasmid DNAs were extracted. The leader–body junction inserts were excised by restriction digestion and separated by gel electrophoresis (Fig. S1C). Clones containing leader–body junctions of different sizes were sequenced. Alignment of the resulting sequences with both the 5′ leader and the 3′ SHFV genome sequences revealed seven additional functional body TRSs located between TRS5 and TRS6 (Fig. S2). The sg mRNAs generated from all seven of these TRSs encoded the same in-frame C-terminal peptide of GP5, which was designated “ORF5-C-68aa” (Fig. S1D).
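The size difference between the ∼852-bp and ∼500-bp products carries positional information. With the same leader forward primer and the same reverse primer, the leader-derived part of each product has the same length, so a product that is N bp shorter implies a body junction roughly N nt farther downstream. A minimal sketch under that assumption; all coordinates below are hypothetical, and gel-based size estimates are approximate.

```python
def junction_amplicon_size(fwd_start: int, leader_switch: int, body_switch: int, rev_end: int) -> int:
    """Length (bp) of an RT-PCR product spanning a leader-body junction.

    fwd_start: genome position of the forward primer 5' end (in the leader)
    leader_switch: last leader nucleotide before the junction
    body_switch: first body nucleotide after the junction
    rev_end: genome position matched by the reverse primer 5' end
    Positions are 1-based and inclusive.
    """
    return (leader_switch - fwd_start + 1) + (rev_end - body_switch + 1)


def infer_body_switch(ref_size: int, ref_body_switch: int, new_size: int) -> int:
    """Estimate a new junction position from a reference product made with the same primers."""
    return ref_body_switch + (ref_size - new_size)


# Hypothetical coordinates chosen so that the TRS5 product is 852 bp.
ref_size = junction_amplicon_size(1, 200, 13_900, 14_551)
new_pos = infer_body_switch(ref_size, 13_900, 500)
print(ref_size, new_pos, junction_amplicon_size(1, 200, new_pos, 14_551))
```

Because the leader segment cancels out, only the difference between product sizes is needed to estimate how far apart two body junctions lie.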
Fig. S1.
Amplification and cloning of sg mRNA leader–body junctions generated from additional functional TRSs located in the genomic region between identified TRS5 and TRS6. (A) Diagram indicating the positions of the primers used and the estimated size for the amplified leader–body junction sequences (thick black line). The white open box represents ORF5, and the short black box represents the leader sequence in transcribed sg mRNAs. (B) MA104 cells were either mock-infected (M) or infected with SHFVic at an MOI of 1. At 24 hpi, total intracellular RNA was extracted and subjected to RT-PCR, and the products were separated on a 2% agarose gel. The band with the size estimated for the leader–body junction in sg mRNA5 produced from the known TRS5 is indicated by an arrow. PCR bands with sizes estimated for the leader–body junctions of ∼1.7-kb sg mRNAs are indicated by a bracket. L, ladder. (C) The bracketed region of the gel was excised, and the DNA was extracted and cloned into a TA vector. Forty colonies were randomly selected and subjected to restriction digestion, and the inserts were separated by gel electrophoresis. The results from 10 representative clones are shown. L, ladder. (D) Diagram showing the locations of the known and previously unreported body TRSs. The TRSs are indicated by black vertical bars. The previously unreported functional TRSs are within a dotted box. The ORFs encoded by the individual sg mRNAs are indicated by white open boxes. 5-C, ORF5-C-68aa; L, leader region.
Fig. S2.
Sequence alignment of each of the seven ORF5-C-68aa sg mRNAs with the 5′ leader sequence and the 3′ genome sequence. The core sequences of the leader TRS and each of the body TRSs are underlined. The region in italics in each ORF5-C-68aa sg mRNA has the same sequence as a 3′ region of the genome. The sequences shaded in gray were used to calculate the stability of the duplex formed between the leader TRS and each of the seven body TRSs using ViennaRNA Package 2.3.5 software. The lowest free energy (kcal/mol) for each duplex is shown on the right together with the estimated relative abundance for each ORF5-C-68aa sg mRNA.
Identification of Additional Functional Body TRSs in the SHFV Genome.
The discovery of multiple previously unreported functional body TRSs between TRS5 and TRS6 suggested that the additional band detected on the Northern blots (Fig. 1) contained a group of sg mRNAs generated from nearby body TRSs. Multiple PCR bands were also detected in the region of the gel containing the predicted sg mRNA5 leader–body junction (852 bp) (Fig. S1B), strongly suggesting the existence of additional functional TRSs near TRS5 and possibly also adjacent to the previously identified body TRSs for the other 3′ SHFV ORFs. To identify additional functional TRSs within the 3′ end of the SHFV genome, nine reverse primers were designed, each targeting a region downstream of the identified body TRS of one of the 3′ ORFs. Each reverse primer together with the same 5′ leader forward primer was used for RT-PCR amplification of the leader–body junctions of sg mRNAs generated from the corresponding 3′ region of the genome (Table S2). After RT-PCR amplification and TA cloning, 20 clones were randomly selected for each of the nine regions and sequenced. All nine of the previously published body TRSs were detected, and a total of 36 additional functional body TRSs were discovered (Table S3). Twenty-one of these 36 body TRSs functioned as redundant body TRSs generating minus-strand templates for sg mRNAs encoding known structural proteins (Table S3). The remaining 15 of these additional body TRSs generated minus-strand templates for sg mRNAs encoding an in-frame, C-terminal peptide of a known structural protein. These peptides were designated “ORF2a′-C-25aa,” “ORF4′-C-84aa,” “ORF2a-C-161aa,” ORF5-C-68aa, and “ORF6-C-79aa” (Fig. 2 and Table S3).
Table S2.
Primers used to amplify the leader–body junctions of individual SHFV sg mRNAs
Name | Sequence, 5′ to 3′ |
SHFV-5′-Leader-F | tagcccggattggataagc |
SHFV-TRS2′-R | agtacctgcgtgtactgtgg |
SHFV-TRS3′-R | ccagatgctaagatctgcc |
SHFV-TRS4′-R | cagtcagcaaagtcaagagc |
SHFV-TRS2-R | ttgacatgaagggtggcatgg |
SHFV-TRS3-R | agtgagtgcagaagaagagcc |
SHFV-TRS4-R | ttatgatggaagcccaccg |
SHFV-TRS5-R | gaaaggaggcagttgtagc |
SHFV-TRS5-C-R | cctcggtcgtcaatgatg |
SHFV-TRS6-R | catccgcacacgtagaatgg |
SHFV-TRS7-R | accagttagtccttagcc |
Table S3.
SHFV body TRSs identified by amplification and cloning of sg mRNA leader–body junctions
5′ ORF encoded (protein) | TRS | Body TRS, 5′ to 3′ | Leader–body junction, 5′ to 3′ |
ORF2a′ (GP2′) | This study | ttctgaacc | TCCTGAACCCTTGCA |
ORF2a′ (GP2′) | TRS2′* | tctttaact | TCCTTAACTTCTGTT |
ORF2b′ (E′) | This study | ggtttaatc | TCCTTAATCCTACTG |
ORF2b′ (E′) | This study | tccttaaac | TCCTTAAACCAGTCG |
ORF3′ (GP3′) | This study | ttatttccc | TCCTTTCCCATTGCG |
ORF3′ (GP3′) | This study | gggctaaac | TCCCTAAACATCCTC |
ORF3′ (GP3′) | TRS3′* | cttaaaacc | TCCTAAACCTATTCA |
ORF3′ (GP3′) | This study | gatttcaac | TCCTTCAACATCACT |
ORF3′ (GP3′) | This study | accttcacc | TCCTTCACCATCAAT |
ORF3′ (GP3′) | This study | ttctgtacc | TCCTGTACCATGGTT |
ORF2a′-C-25aa | This study | cccttaact | TCCTTAACTTTACTA |
ORF2a′-C-25aa | This study | aactttact | TCCTTTACTAATAGT |
ORF2a′-C-25aa | This study | tcgcaaacc | TCCTAAACCCCGGTT |
ORF2a′-C-25aa | This study | caatcaacc | TCCTCAACCTCACAG |
ORF2a′-C-25aa | This study | gcctcaacg | TCCTCAACGCATGAG |
ORF4′ (GP4′) | This study | gccttaacc | TCCTTAACCTTTCAC |
ORF4′ (GP4′) | TRS4′* | cctttcacc | TCCTTCACCCTGAAC |
ORF4′-C-84aa | This study | ggcgaaacg | TCCGAAACGTGTGAT |
ORF2b (E) | TRS2* | ctattaacc | TCCTTAACCAAATAC |
ORF2b (E) | This study | tgggcaacc | TCCTCAACCATCATT |
ORF2a (GP2) | This study | atactcaccc† | TACTCACCCACATCA |
ORF2a-C-161aa | This study | caattaaat | TCCTTAAATCTATTT |
ORF3 (GP3) | This study | ttctaaata | TCCTAAATACCTCAC |
ORF3 (GP3) | TRS3* | tcactaacc | TCCCTAACCCATGGA |
ORF4 (GP4) | This study | acttcaaca | TCCTCAACACCTCAG |
ORF4 (GP4) | This study | atccaaacc | TCCTAAACCCGTCAA |
ORF4 (GP4) | TRS4* | tcattgacc | TCCTTGACCAAAACA |
ORF5 (GP5) | This study | acaacaacc | TCCTTAACCAGCATC |
ORF5 (GP5) | This study | cccataacc | TCCTTAACCATAGTG |
ORF5 (GP5) | This study | ttcttcgcc | TCCTTCGCCAATCAC |
ORF5 (GP5) | This study | gtgataatc | TCCTTAATCTGTATC |
ORF5 (GP5) | This study | ttgttatca | TCCTTATCATGACTC |
ORF5 (GP5) | This study | atcataacc | TCCATAACCTTGTTC |
ORF5 (GP5) | TRS5* | tccttaact | TCCTTAACTACCTAA |
ORF5 (GP5) | This study | taattatgt | TCCTTATGTACTTAT |
ORF5-C-68aa | This study | aaaataaca | TCCATAACACTCACT |
ORF5-C-68aa | This study | agtcaaacc | TCCTTAACCAATCTT |
ORF5-C-68aa | This study | ttgttcaac | TCCTTCAACGTGTTT |
ORF5-C-68aa | This study | ttcttcaca | TTCTTCACACTTACA |
ORF5-C-68aa | This study | tcaataaca | TCCTTAACACTAGGC |
ORF5-C-68aa | This study | ttactaacc | TCCTTAACCCATTTA |
ORF5-C-68aa | This study | tgtttcacc | TCCTTCACCGGCGAT |
ORF6 (M) | TRS6* | ttgtcaacc | TCCTCAACCACGACG |
ORF6-C-79aa | This study | tacttgacc | TCCTTGACCTATAAA |
ORF7 (N) | TRS7* | ttgttaacc | TCGTTAACCTGAGGA |
*Previously published SHFV body TRSs.
†An additional nucleotide was added at the 5′ end of the body TRS to differentiate the TRS from homologous sequences at other locations in the SHFV genome.
Fig. 2.
Diagram of the genome locations of the identified SHFV body TRSs and ORFs. Previously published body TRSs are in red. When more than one body TRS is used for an ORF, an asterisk indicates the TRS with the highest sg mRNA abundance.
The SHFV sg mRNA2′, sg mRNA2, and sg mRNA5 were previously predicted to bicistronically express GP2′/E′, E/GP2, and GP5/ORF5a proteins, respectively. In sg mRNA2′, the start codon for E′ is located 91 nt downstream of the GP2′ start codon. Two additional functional body TRSs were identified in this 91-nt region that produced sg mRNAs encoding E′ as the 5′ proximal ORF (Table S3). In sg mRNA2, the GP2 start codon is located only 37 nt downstream of the E start codon. One additional functional body TRS was identified in this 37-nt region that produced a sg mRNA encoding GP2 as the 5′ proximal ORF (Table S3). The data indicate that E′ and GP2 can be expressed monocistronically from separate sg mRNAs, although they do not rule out bicistronic expression of these proteins as well. A body TRS that generates a separate sg mRNA encoding ORF5a as its 5′ proximal ORF was not identified after TA cloning of the amplified leader–body junctions.
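The 5′-proximal ORF logic can be sketched as a scan for the first ATG followed by in-frame reading to a stop codon. The toy sequences below are hypothetical; they show how a junction placed between two start codons in different frames (as for E and GP2, whose start codons are 37 nt apart) makes the downstream ORF 5′ proximal. The sketch deliberately ignores start-codon context and leaky scanning, which are the mechanisms that could still allow translation of a second ORF from a bicistronic sg mRNA.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(mrna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the ORF beginning at the first ATG, end exclusive.

    Reads in frame from the first ATG to the first stop codon; returns None
    if there is no ATG or no in-frame stop (DNA alphabet, 0-based indices).
    """
    start = mrna.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


leader = "GGCC"                       # hypothetical leader end with no ATG
body = "CCATGGCATGAAACCCTAA"          # hypothetical body: two ATGs in different frames
upstream_junction = leader + body     # junction upstream of both ATGs
between_junction = leader + body[5:]  # junction between the two ATGs
for mrna in (upstream_junction, between_junction):
    orf = first_orf(mrna)
    print(mrna, orf, mrna[orf[0]:orf[1]] if orf else None)
```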
Mutation of the Start Codon of ORF5-C-68aa or ORF6-C-79aa Reduced SHFV Yield from MA104 Cells.
Among the 36 body TRSs identified by this approach, 15 produced sg mRNAs encoding an in-frame C-terminal peptide of a known structural protein as the 5′ proximal ORF. As an initial test of the functional relevance of these C-terminal peptides, the start codon of each peptide was separately mutated in the SHFVic LVR strain to generate SHFVic-∆ORF2a′-C-25aa, SHFVic-∆ORF4′-C-84aa, SHFVic-∆ORF2a-C-161aa, SHFVic-∆ORF5-C-68aa, and SHFVic-∆ORF6-C-79aa. In each case, the single-nucleotide substitution replaced the internal Met codon of the parental in-frame protein (with Thr or Leu), and where the start codon overlapped an ORF in another reading frame (GP3′ or E), the substitution was synonymous in that frame (Table S4). Passage 1 (P1) virus stocks were generated as described in Materials and Methods. MA104 cells were infected with either wild-type infectious clone P1 virus or one of the mutant P1 viruses at an MOI of 0.5, and viral infectivity in culture fluids harvested at different times after infection was assessed by plaque assay in MA104 cells. Compared with wild-type SHFV, the SHFVic-∆ORF5-C-68aa and SHFVic-∆ORF6-C-79aa mutant viruses produced decreased virus yields starting at 24 hpi (Fig. 3A), whereas the SHFVic-∆ORF2a′-C-25aa, SHFVic-∆ORF4′-C-84aa, and SHFVic-∆ORF2a-C-161aa mutant viruses produced yields similar to those of the wild-type virus (Fig. 3B). The SHFVic-∆ORF5-C-68aa and SHFVic-∆ORF6-C-79aa mutant viruses also produced small plaques (Fig. 3C). These data suggested that ORF5-C-68aa and ORF6-C-79aa could be functionally important during the viral replication cycle.
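The design logic of Table S4, knocking out an internal ATG while keeping any overlapping alternative-frame codon synonymous, can be checked with a small translation helper. This is a sketch using the standard genetic code (NCBI translation table 1); the 8-nt sequence is a hypothetical stand-in that mimics the ATG/CAT overlap reported for ∆ORF2a′-C-25aa, not the actual SHFV sequence.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def codon_change(seq: str, frame_start: int, pos: int, new_base: str):
    """Effect of replacing seq[pos] with new_base on the codon read in the
    frame that begins at frame_start (0-based; pos must be >= frame_start).
    Returns (old codon, new codon, old amino acid, new amino acid)."""
    codon_start = frame_start + (pos - frame_start) // 3 * 3
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    old = seq[codon_start:codon_start + 3]
    new = mutated[codon_start:codon_start + 3]
    return old, new, CODE[old], CODE[new]


toy = "CATGAAAC"  # hypothetical: CAT in frame 0 overlaps ATG in frame 1
print(codon_change(toy, 0, 2, "C"))  # ('CAT', 'CAC', 'H', 'H'): synonymous
print(codon_change(toy, 1, 2, "C"))  # ('ATG', 'ACG', 'M', 'T'): start codon lost
```

The same helper applied to a first-position A-to-C change turns ATG into CTG (Leu), matching the pattern used for the ∆ORF4′, ∆ORF5, and ∆ORF6 constructs.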
Table S4.
Primers used to mutate the start codons of identified C-terminal in-frame ORFs of known structural proteins
Mutant ORF | Primer sequences, 5′ to 3′* | Effects of the mutation on the overlapping ORFs |
∆ORF2a′-C-25aa | F: gcgcctcaacgcaCgagtgcacctcg; R: cgaggtgcactcGtgcgttgaggcgc | GP2′-C: ATG to ACG (start codon knockout); GP2′: ATG to ACG (Met to Thr); GP3′: CAT to CAC (His to His) |
∆ORF4′-C-84aa | F: gagacaattacaggcaacCtgactggtatcaaagaagc; R: gcttctttgataccagtcaGgttgcctgtaattgtctc | GP4′-C: ATG to CTG (start codon knockout); GP4′: ATG to CTG (Met to Leu) |
∆ORF2a-C-161aa | F: gtcccgaggcgcttacaaaaaCgcttttgcgcc; R: ggcgcaaaagcGtttttgtaagcgcctcgggac | GP2-C: ATG to ACG (start codon knockout); GP2: ATG to ACG (Met to Thr); E: AAT to AAC (Asn to Asn) |
∆ORF5-C-68aa | F: caagaagttagtggtaaattgtCtggccctccgc; R: gcggagggccaGacaatttaccactaacttcttg | GP5-C: ATG to CTG (start codon knockout); GP5: ATG to CTG (Met to Leu) |
∆ORF6-C-79aa | F: ccattctacgtgtgcggCtgtgttggctcggc; R: gccgagccaacacaGccgcacacgtagaatgg | M-C: ATG to CTG (start codon knockout); M: ATG to CTG (Met to Leu) |
*Uppercase letters indicate the mutated nucleotides in the primer sequences.
Fig. 3.
Effect of mutagenesis of the start codons of the C-terminal ORFs encoded by identified sg mRNAs on virus production in MA104 cells. (A and B) MA104 cells were infected (MOI of 0.5) with P1 of wild-type or the indicated start codon-mutant SHFVic virus. At different times after infection, the culture fluid was collected, and virus infectivity was determined by plaque assay on MA104 cells. (C) Comparison of the diameters of the plaques produced by wild-type and mutant SHFVic viruses harvested at 24 hpi.
NGS Identification of Functional SHFV Body TRSs and sg mRNAs with Alternative Leader–Body Junctions Generated from the Same Body TRS.
Although the strategy based on RT-PCR amplification of leader–body junctions from SHFV sg mRNAs facilitated the discovery of a number of previously unreported functional body TRSs, the depth of this analysis was limited both by the total number of clones screened in each region and by the relative abundance of each sg mRNA. To increase the depth of the analysis and to identify low-abundance sg mRNAs, MA104 cells were infected with SHFVic at an MOI of 1, total RNA was extracted at 8 and 18 hpi, and mRNAs were isolated and subjected to Illumina HiSeq analysis. Three biological repeats for each time point were sequenced and analyzed with CLC Genomics Workbench software. A workflow, described in Materials and Methods, was designed to identify sg mRNA reads containing a leader–body junction sequence and then to map them sequentially to each of the 45 identified 15-nt leader–body junction sequences. Multiple reads in both the 8- and 18-h samples mapped to each of the 45 junction sequences, confirming the production of sg mRNAs from all the identified SHFV body TRSs at both an early and a later time post infection. Surprisingly, ∼64% of the sg mRNA reads in the samples collected at both times remained unmapped, indicating that the junction sequences they contained were not 100% identical to any of the identified leader–body junctions and suggesting the existence of many additional sg mRNAs and body TRSs. To locate the additional functional body TRSs in the SHFV genome, all the remaining sg mRNA reads from the 18-h MA104 sample were mapped to an SHFV genome sequence lacking the 5′ end (nucleotides 250–15,717) (Fig. 4). The majority of the remaining sg mRNA reads mapped to the 3′ region of the genome, with the highest number mapping close to ORF7, as indicated by the large peak in read coverage that contained about half the remaining reads.
Inspection of the reads mapping to this region revealed a sg mRNA with an alternative leader–body junction sequence (5′-TCCTTAACCTGAGGA-3′) generated from the published TRS7 during discontinuous synthesis of minus-strand RNA (Table S5). Unexpectedly, some of the remaining sg mRNA reads mapped to various locations in the ORF1a/1b region (Fig. 4). Although these reads were less abundant than those mapping to the 3′ end of the genome, their detection indicated that discontinuous RNA synthesis can also occur within the ORF1a/1b region, with more reads from the ORF1b region than from the ORF1a region.
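The first step of the junction analysis, assigning reads that contain an exact 15-nt leader–body junction and setting the rest aside as unmapped, can be sketched as an exact-substring classifier. This is not the CLC Genomics Workbench workflow used in the study; it is a simplified stand-in that assumes the reads have already been restricted to sg mRNA reads and that reverse-strand reads should be checked via the reverse complement. The two junction sequences come from Table S3; the reads are made up, including one carrying the alternative TRS7 junction, which falls into the unmapped set.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def classify_reads(reads, junctions):
    """Assign each read to the first junction found as an exact substring
    (either orientation); return per-junction counts and unmatched reads."""
    counts, unmapped = Counter(), []
    for read in reads:
        hit = next((name for name, j in junctions.items()
                    if j in read or reverse_complement(j) in read), None)
        if hit is None:
            unmapped.append(read)
        else:
            counts[hit] += 1
    return counts, unmapped


JUNCTIONS = {"TRS5": "TCCTTAACTACCTAA", "TRS7": "TCGTTAACCTGAGGA"}  # from Table S3
reads = [
    "AAAA" + "TCGTTAACCTGAGGA" + "GGG",     # TRS7 junction
    "CC" + "TCCTTAACTACCTAA" + "TT",        # TRS5 junction
    reverse_complement("TCCTTAACTACCTAA"),  # TRS5 junction, reverse strand
    "TCCTTAACCTGAGGA" + "AAA",              # alternative TRS7 junction, not in the list
]
counts, unmapped = classify_reads(reads, JUNCTIONS)
print(counts, f"unmapped fraction = {len(unmapped) / len(reads):.2f}")
```

Exact matching is what makes a single-nucleotide difference in the junction (TCC versus TCG here) send a read to the unmapped pool, which is consistent with how unmapped reads were used to find additional junctions.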
Fig. 4.
Genome alignment of the NGS sg mRNA reads that did not contain one of the 45 identified leader–body junction sequences. MA104 cells were infected with SHFVic at an MOI of 1. At 18 hpi, total intracellular RNA was extracted, and mRNA was isolated and subjected to library preparation and RNA sequencing (RNA-seq) using Illumina HiSeq. The resulting reads were trimmed, and SHFV sg mRNA reads were extracted as described in Materials and Methods. All the sg mRNA reads were mapped to each of the 36 newly identified as well as to the nine previously published 3′ region leader–body junctions. The remaining unmapped sg mRNA reads were then mapped to an SHFV genome sequence without the 5′ end (nucleotides 250–15,717). The mapping results are displayed with the read depth of 0–120 shown in detail and the remainder of the reads shown in compact view. The previously known SHFV genome ORFs are indicated by horizontal arrows at the top with the start site of each ORF indicated by a vertical line. The pink coverage peak indicates a region with a very high number of mapped reads. Each green dot represents a single forward read, each red dot represents a single reverse read, and each blue dot represents a paired read. The numbers on the left side indicate the read depth.
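One way to picture how a single body TRS such as TRS7 can yield two junction sequences (TCGTTAACCTGAGGA and TCCTTAACCTGAGGA) is to let the template switch occur at different positions within the region where the nascent minus strand pairs with the leader. The sketch below enumerates junctions of the form leader[:i] + body[i:] over every crossover point i. The 9-nt leader segment TCCTTAACC is a hypothetical placeholder chosen for illustration, not the published SHFV leader TRS; the body TRS7 sequence (ttgttaacc) and the downstream TGAGGA are taken from Table S3.

```python
def candidate_junctions(leader_seg: str, body_seg: str, downstream: str) -> list[str]:
    """Enumerate leader-body junctions for every crossover point i.

    The polymerase copies downstream + body_seg[i:] from the body template,
    then switches and copies leader_seg[:i] from the leader template; in
    plus-sense orientation the junction reads leader_seg[:i] + body_seg[i:].
    Identical products from different crossover points are collapsed.
    """
    if len(leader_seg) != len(body_seg):
        raise ValueError("leader and body segments must be aligned (equal length)")
    return sorted({leader_seg[:i] + body_seg[i:] + downstream
                   for i in range(len(leader_seg) + 1)})


LEADER_SEG = "TCCTTAACC"  # hypothetical leader TRS segment
BODY_TRS7 = "TTGTTAACC"   # body TRS7 from Table S3, upper case
print(candidate_junctions(LEADER_SEG, BODY_TRS7, "TGAGGA"))
```

Under this assumed leader segment, the two observed TRS7 junctions arise from switches at different positions, while the third candidate is identical to the genomic sequence and therefore could not be recognized as a junction from these 15 nt alone.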
Table S5.
SHFV body TRSs and sg mRNA leader–body junction sequences identified by NGS
5′ ORF encoded (protein) | TRS | Body TRSs* 5′ to 3′ | Leader–body junctions 5′ to 3′ | 5′ ORF encoded (protein) | TRS | Body TRSs* 5′ to 3′ | Leader–body junctions 5′ to 3′ |
ORF2a′ (GP2′) | This study | ttctgaacc | TCCTGAACCCTTGCA | ORF5 (GP5) | This study | cccataacc† | TCCATAACCATAGTG |
TRS2′‡ | tctttaact† | TCTTTAACTTCTGTT | TCCTTAACCATAGTG | ||||
TCCTTAACTTCTGTT | This study | ttcttcgcc | TCCTTCGCCAATCAC | ||||
This study | ATTTTGACC† | TCTTTGACCAAAATC | This study | gtgataatc | TCCTTAATCTGTATC | ||
TCCTTGACCAAAATC | This study | ttgttatca | TCCTTATCATGACTC | ||||
This study | CATTTTGAC | TCCTTTGACCAAAAT | This study | atcataacc† | TCCTTAACCTTGTTC | ||
ORF2b′ (E′) | This study | ggtttaatc | TCCTTAATCCTACTG | TCCATAACCTTGTTC | |||
This study | tccttaaac† | TCCTAAACCAGTCGA | TRS5‡ | tccttaact | TCCTTAACTACCTAA | ||
TCCTTAAACCAGTCG | This study | taattatgt | TCCTTATGTACTTAT | ||||
ORF3′ (GP3′) | This study | ttatttccc | TCCTTTCCCATTGCG | This study | GCATCAACG† | TCATCAACGTCACAG | |
This study | gggctaaac | TCCCTAAACATCCTC | TCCTCAACGTCACAG | ||||
TRS3′‡ | cttaaaacc† | TCCAAAACCTATTCA | This study | TTCTTCACC† | TTCTTCACCTGTCGA | ||
TCCTAAACCTATTCA | TCCTTCACCTGTCGA | ||||||
This study | gatttcaac | TCCTTCAACATCACT | This study | CTGTCGACC | TCCTCGACCACGTCC | ||
This study | accttcacc | TCCTTCACCATCAAT | ORF5–C, 68 aa | This study | aaaataaca† | TCCTTAACACTCACT | |
This study | ttctgtacc | TCCTGTACCATGGTT | TCCATAACACTCACT | ||||
This study | TCTGTACCA | TCCTTACCATGGTTT | This study | agtcaaacc† | TCCTAAACCAATCTT | ||
This study | TATTTCCCA | TCCTTCCCATTGCGA | TCCTTAACCAATCTT | ||||
ORF2a′-C-25aa | This study | cccttaact | TCCTTAACTTTACTA | This study | ttgttcaac | TCCTTCAACGTGTTT | |
This study | aactttact | TCCTTTACTAATAGT | This study | ttcttcaca† | TCCTTCACACTTACA | ||
This study | tcgcaaacc | TCCTAAACCCCGGTT | TTCTTCACACTTACA | ||||
This study | caatcaacc | TCCTCAACCTCACAG | This study | tcaataaca† | TCCATAACACTAGGC | ||
This study | gcctcaacg | TCCTCAACGCATGAG | TCCTTAACACTAGGC | ||||
This study | GCGGTAACG† | TCCGTAACGTGTCGT | This study | ttactaacc† | TCCCTAACCCATTTA | ||
TCCTTAACGTGTCGT | TCACTAACCCATTTA | ||||||
ORF4′ (GP4′) | This study | gccttaacc | TCCTTAACCTTTCAC | TCCTTAACCCATTTA | |||
TRS4′‡ | cctttcacc | TCCTTCACCCTGAAC | This study | tgtttcacc | TCCTTCACCGGCGAT | ||
This study | TCTTTAATT | TCCTTAATTTCACTG | This study | CATTTATTG | TCCTTATTGAGTTAT | ||
This study | ACCTTTCAC | TCCTTTCACCCTGAA | This study | CAGTCAAAC | TCCTCAAACCAATCT | ||
This study | CCCTGAACT | TCCTGAACTCTTTGG | ORF6 (GP6) | TRS6‡ | ttgtcaacc† | TCGTCAACCACGACG | |
ORF4′-C-84aa | This study | ggcgaaacg | TCCGAAACGTGTGAT | TCCTTAACCACGACG | |||
ORF2b (E) | TRS2‡ | ctattaacc† | TCATTAACCAAATAC | TCCTCAACCACGACG | |||
TCCTTAACCAAATAC | ORF6-C-79aa | This study | tacttgacc† | TACTTGACCTATAAA | |||
This study | tgggcaacc† | TCCGCAACCATCATT | TCCTTGACCTATAAA | ||||
TCCTTAACCATCATT | This study | TTACTAACA† | TCCTTAACAATTGGG | ||||
TCCTCAACCATCATT | TCCCTAACAATTGGG | ||||||
ORF2b-C-51aa | This study | TCTTTGACC | TCCTTGACCTGCTCA | This study | ACGTGGACC | TCCTGGACCATTCTA | |
ORF2a (GP2) | This study | atactcaccc†,§ | TCCTCACCCACATCA | ORF6-C-40aa | This study | CCCTTGGCC | TCCTTGGCCGTTTAG |
TACTCACCCACATCA | This study | CGATTAACG† | TCATTAACGCGACTG | ||||
ORF2a-C-161aa | This study | caattaaat | TCCTTAAATCTATTT | TCCTTAACGCGACTG | |||
ORF2a-C-137aa | This study | ACCTTACCC | TCCTTACCCTAACCA | This study | GTCGTAACT† | TCCGTAACTCGCCGA | |
This study | ACCCTAACC† | TCCCTAACCATCCTC | TCCTTAACTCGCCGA | ||||
TCCTTAACCATCCTC | ORF7 (GP7) | TRS7‡ | ttgttaacc† | TCCTTAACCTGAGGA | |||
ORF3 (GP3) | This study | ttctaaata | TCCTAAATACCTCAC | TCGTTAACCTGAGGA | |||
TRS3‡ | tcactaacc† | TCACTAACCCATGGA | ORF15,419–15,505 | This study | TGGCAAACC | TCCTTAACCAAAAAC | |
TCCTTAACCCATGGA | This study | CAAATAACA† | TCCATAACAAGGGAA | ||||
TCCCTAACCCATGGA | TCCTTAACAAGGGAA | ||||||
This study | TCTTTCACC | TCCTTCACCCTCGAC | This study | TCCCCAACG | TCCCCAACGACCTCG | ||
This study | GTTCTTTCAC§ | TCCTTTCACCCTCGA | This study | GGCTTCCCC | TCCTTCCCCAACGAC | ||
ORF4 (GP4) | This study | acttcaaca† | TCTTCAACACCTCAG | This study | CACTCAACA | TCCTCAACAACGTAG | |
TCCTCAACACCTCAG | ORF12,407–12,628 | This study | TCTTTACCG | TCCTTACCGGGAACA | |||
This study | atccaaacc† | TCCCAAACCCGTCAA | This study | TTCTTTACC | TCCTTTACCGGGAAC | ||
TCCTAAACCCGTCAA | ORF12,191–12,271 | This study | ATCATACCA | TCCATACCAACATTA | |||
TRS4‡ | tcattgacc† | TCATTGACCAAAACA | ORF10,071–10,331 | This study | CGACTAACC† | TCACTAACCGGAATA | |
TCCTTGACCAAAACA | TCCCTAACCGGAATA | ||||||
This study | TACTTCCCA | TCCTTCCCAGTTCTT | TCCTTAACCGGAATA | ||||
This study | TTTTCAACG† | TCTTCAACGCCACTT | This study | GGCTGTCAC | TCCTGTCACCTCTAT | ||
TCCTTCAACGCCACT | ORF5,920–5,997 | This study | AACTCAACT | TCCTCAACTTCCTTG | |||
TCCTCAACGCCACTT | ORF13,699–13,836 | This study | TTGCAAACC | TCCTAAACCTTACTT | |||
This study | CTTCTGTCCA§ | TCCTGTCCAGTTTGA | This study | GGCTTACTG | TCCTTACTGGGAGAC | ||
This study | TTGATAACG† | TCCATAACGAGACAT | This study | TACTTACTC | TCCTTACTCCTCCAA | ||
TCCTTAACGAGACAT | nsp12–C, 107 aa | This study | CGCTAAACC | TCCTAAACCTCTGAG | |||
This study | TTCGTAACA | TCCGTAACATACAAA | This study | GCGCTAAAC | TCCCTAAACCTCTGA | ||
This study | CACTTCAAC | TCCTTCAACACCTCA | ORF1b–C, 189 aa | This study | ATGTTGACC | TCCTTGACCACAAGG | |
ORF5a | This study | TACTTATGT | TCCTTATGTTTAGGG | ORF1b–C, 215 aa | This study | ACGGTAACT | TCCGTAACTGATATC |
ORF5 (GP5) | This study | acaacaacc† | TCCTCAACCAGCATC | ORF1b–C, 579 aa | This study | ACGCTAACC† | TCCCTAACCCCTTTG |
TCCACAACCAGCATC | TCCTTAACCCCTTTG | ||||||
TCCTTAACCAGCATC | ORF1b–C, 1021 aa | This study | GCCAATACC | TCCAATACCATCTAT | |||
ORF1a–C, 67 aa | This study | TCGCTGACC | TCCTTGACCACCCTG |
*Lowercase letters represent body TRSs identified by amplification, cloning, and sequencing of leader–body junctions; uppercase letters represent body TRSs identified by NGS.
†Body TRSs that can generate two or more sg mRNAs with alternative leader–body junction sequences.
‡Previously published SHFV body TRSs.
§An additional nucleotide was added at the 5′ end of the body TRS to differentiate this TRS from homologous sequences at other locations in the SHFV genome.
To identify all functional body TRSs in the SHFV genome, as well as all alternative leader–body junctions generated from the same body TRS, the remaining mapped sg mRNA reads were analyzed for leader–body junction sequences. Fifty-one additional previously unreported body TRSs were identified, giving a total of 96 functional body TRSs in the SHFV genome (Fig. 2). Thirty-four of these 96 body TRSs generate two or more sg mRNAs with alternative leader–body junction sequences, resulting in a total of 137 sg mRNAs, each containing a unique leader–body junction sequence (Table S5). A previously unreported body TRS (5′-TACTTATGT-3′) was identified that generates the template for a separate sg mRNA with ORF5a as its 5′-proximal ORF. Two previously unreported functional body TRSs were discovered in the ORF1a region and eight in the ORF1b region; these generated minus-strand templates for long sg mRNAs with 30 or more reads mapped at 18 hpi. Seven of these long sg mRNAs encode in-frame C-terminal ORF1a or ORF1b peptides of different lengths. The other three encode two previously unreported ORFs in alternative reading frames: ORF10,071–10,331, encoding an 86-aa protein, and ORF5,920–5,997, encoding a 25-aa protein (Fig. 2). Eleven functional TRSs were also identified in the 3′ structural protein region that generated sg mRNAs encoding four additional previously unreported ORFs in alternative reading frames: ORF15,419–15,505 encoding a 28-aa protein, ORF12,407–12,628 encoding a 73-aa protein, ORF12,191–12,271 encoding a 26-aa protein, and ORF13,699–13,836 encoding a 45-aa protein (Fig. 2).
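Because the alternative-frame ORFs are named by their genome coordinates, the reported protein lengths can be cross-checked arithmetically. The sketch below assumes the coordinates are 1-based and inclusive and run from the first base of the start codon to the last base of the stop codon; this convention is an assumption, but it reproduces all six lengths given in the text. The helper name `orf_length_aa` is hypothetical.

```python
def orf_length_aa(start, end):
    """Protein length in aa for an ORF given 1-based, inclusive coordinates.

    Assumes the span covers the start codon through the stop codon, so one
    codon is subtracted for the stop.
    """
    n_nt = end - start + 1
    if n_nt % 3:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return n_nt // 3 - 1


# Alternative-frame ORFs named by their genome coordinates in the text.
NEW_ORFS = {
    "ORF10,071-10,331": (10_071, 10_331),
    "ORF5,920-5,997": (5_920, 5_997),
    "ORF15,419-15,505": (15_419, 15_505),
    "ORF12,407-12,628": (12_407, 12_628),
    "ORF12,191-12,271": (12_191, 12_271),
    "ORF13,699-13,836": (13_699, 13_836),
}

for name, (start, end) in NEW_ORFS.items():
    print(name, orf_length_aa(start, end), "aa")
# prints 86, 25, 28, 73, 26 and 45 aa, matching the lengths reported above
```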
The Relative Abundance of Individual SHFV sg mRNAs Was Consistent at Early and Late Times Post Infection in MA104 Cells and Macaque MΦs.
The number of reads mapping to each of the 137 unique leader–body junction sequences was used to estimate the level of sg mRNA transcription from each body TRS. Separate calculations were made from the data generated for each of the three biological repeats. Because the values obtained for the three repeats were very similar, read data from one representative repeat are shown in Table S6. When a single body TRS produced sg mRNAs with variant leader–body junction sequences (indicated by a double dagger in Table S6), the reads for each of these sg mRNAs were summed to obtain the total transcription level from that body TRS (Table S6, “per TRS” column). When multiple body TRSs produced sg mRNAs encoding the same ORF, the reads for all these TRSs were summed to obtain the total transcription level for that ORF (Table S6, “per ORF” column). The number of reads was higher at 18 hpi than at 8 hpi for every body TRS except 5′-ttcttcgcc-3′ (four reads at 8 hpi and three at 18 hpi), indicating increased transcription from nearly all body TRSs with time after infection.
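The per-TRS and per-ORF totals and the fold change over time described above amount to nested sums and a ratio. The sketch below is a hypothetical illustration (the names `aggregate` and `fold_change` are not from the study); because Table S6 reports only per-TRS totals, each TRS is entered as a single-element list rather than split into its junction variants.

```python
def aggregate(junction_reads):
    """Sum junction-level read counts per body TRS and per ORF.

    junction_reads: {orf: {body_trs: [reads for each junction variant]}}
    """
    per_trs = {orf: {trs: sum(counts) for trs, counts in by_trs.items()}
               for orf, by_trs in junction_reads.items()}
    per_orf = {orf: sum(totals.values()) for orf, totals in per_trs.items()}
    return per_trs, per_orf


def fold_change(early, late):
    """Late/early read ratio, rounded to two decimals as in Table S6."""
    return round(late / early, 2)


# ORF2a' (GP2') rows of Table S6 (MA104 cells, 8 and 18 hpi).
HPI_8 = {"ORF2a'": {"ttctgaacc": [6151], "tctttaact": [22945],
                    "ATTTTGACC": [209], "CATTTTGAC": [198]}}
HPI_18 = {"ORF2a'": {"ttctgaacc": [7458], "tctttaact": [26889],
                     "ATTTTGACC": [274], "CATTTTGAC": [268]}}

trs_8, orf_8 = aggregate(HPI_8)
trs_18, orf_18 = aggregate(HPI_18)
print(orf_8["ORF2a'"], orf_18["ORF2a'"],
      fold_change(orf_8["ORF2a'"], orf_18["ORF2a'"]))
# prints: 29503 34889 1.18
```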
Table S6.
NGS reads mapping to the SHFV sg mRNA leader–body junctions produced from each identified body TRS at 8 and 18 hpi in SHFV-infected MA104 cells
5′ ORF encoded (protein) | TRS | Body TRSs* 5′ to 3′ | 8 hpi, per TRS | 8 hpi, per ORF | 18 hpi, per TRS | 18 hpi, per ORF | Fold change over time, per TRS | Fold change over time, per ORF |
ORF2a′ (GP2′) | This study | ttctgaacc | 6,151 | 29,503 | 7,458 | 34,889 | 1.21 | 1.18 |
TRS2′† | tctttaact‡ | 22,945 | 26,889 | 1.17 | ||||
This study | ATTTTGACC‡ | 209 | 274 | 1.31 | ||||
This study | CATTTTGAC | 198 | 268 | 1.35 | ||||
ORF2b′ (E′) | This study | ggtttaatc | 2,491 | 17,539 | 3,326 | 22,374 | 1.34 | 1.28 |
This study | tccttaaac‡ | 15,048 | 19,048 | 1.27 | ||||
ORF3′ (GP3′) | This study | ttatttccc | 3,371 | 7,220 | 3,965 | 9,095 | 1.18 | 1.26 |
This study | gggctaaac | 187 | 198 | 1.06 | ||||
TRS3′† | cttaaaacc‡ | 826 | 1,102 | 1.33 | ||||
This study | gatttcaac | 53 | 74 | 1.40 | ||||
This study | accttcacc | 964 | 1,118 | 1.16 | ||||
This study | ttctgtacc | 1,610 | 2,322 | 1.44 | ||||
This study | TCTGTACCA | 87 | 105 | 1.21 | ||||
This study | TATTTCCCA | 122 | 211 | 1.73 | ||||
ORF2a′-C-25aa | This study | cccttaact | 1,435 | 2,055 | 1,884 | 2,694 | 1.31 | 1.31 |
This study | aactttact | 33 | 59 | 1.79 | ||||
This study | tcgcaaacc | 267 | 299 | 1.12 | ||||
This study | caatcaacc | 81 | 121 | 1.49 | ||||
This study | gcctcaacg | 98 | 125 | 1.28 | ||||
This study | GCGGTAACG‡ | 141 | 206 | 1.46 | ||||
ORF4′ (GP4′) | This study | gccttaacc | 9,905 | 14,235 | 11,342 | 16,322 | 1.15 | 1.15 |
TRS4′† | cctttcacc | 3,171 | 3,516 | 1.11 | ||||
This study | TCTTTAATT | 286 | 376 | 1.31 | ||||
This study | ACCTTTCAC | 213 | 272 | 1.28 | ||||
This study | CCCTGAACT | 660 | 816 | 1.24 | ||||
ORF4′-C-84aa | This study | ggcgaaacg | 16 | 16 | 40 | 40 | 2.50 | 2.50 |
ORF2b (E) | TRS2† | ctattaacc‡ | 18,007 | 20,125 | 22,461 | 25,386 | 1.25 | 1.26 |
This study | tgggcaacc‡ | 2,118 | 2,925 | 1.38 | ||||
ORF2b-C-51aa | This study | TCTTTGACC | 169 | 169 | 282 | 282 | 1.67 | 1.67 |
ORF2a (GP2) | This study | atactcaccc‡,§ | 393 | 393 | 559 | 559 | 1.42 | 1.42 |
ORF2a-C-161aa | This study | caattaaat | 119 | 119 | 120 | 120 | 1.01 | 1.01 |
ORF2a-C-137aa | This study | ACCTTACCC | 296 | 708 | 498 | 1,151 | 1.68 | 1.63 |
This study | ACCCTAACC‡ | 412 | 653 | 1.58 | ||||
ORF3 (GP3) | This study | ttctaaata | 16 | 4,354 | 33 | 7,045 | 2.06 | 1.62 |
TRS3† | tcactaacc‡ | 3,178 | 5,387 | 1.70 | ||||
This study | TCTTTCACC | 1,069 | 1,489 | 1.39 | ||||
This study | GTTCTTTCAC§ | 91 | 136 | 1.49 | ||||
ORF4 (GP4) | This study | acttcaaca‡ | 6,328 | 19,655 | 9,564 | 29,030 | 1.51 | 1.48 |
This study | atccaaacc‡ | 3,048 | 4,315 | 1.42 | ||||
TRS4† | tcattgacc‡ | 4,606 | 6,443 | 1.40 | ||||
This study | TACTTCCCA | 138 | 193 | 1.40 | ||||
This study | TTTTCAACG‡ | 834 | 1,396 | 1.67 | ||||
This study | CTTCTGTCCA§ | 50 | 58 | 1.16 | ||||
This study | TTGATAACG‡ | 2,887 | 4,325 | 1.50 | ||||
This study | TTCGTAACA | 1,243 | 1,894 | 1.52 | ||||
This study | CACTTCAAC | 521 | 842 | 1.62 | ||||
ORF5a | This study | TACTTATGT | 10 | 10 | 11 | 11 | 1.10 | 1.10 |
ORF5 (GP5) | This study | acaacaacc‡ | 382 | 49,886 | 521 | 58,298 | 1.36 | 1.17 |
This study | cccataacc‡ | 7,508 | 8,399 | 1.12 | ||||
This study | ttcttcgcc | 4 | 3 | 0.75 | ||||
This study | gtgataatc | 75 | 130 | 1.73 | ||||
This study | ttgttatca | 103 | 178 | 1.73 | ||||
This study | atcataacc‡ | 24,774 | 29,219 | 1.18 | ||||
TRS5† | tccttaact | 15,407 | 17,610 | 1.14 | ||||
This study | taattatgt | 21 | 31 | 1.48 | ||||
This study | GCATCAACG‡ | 365 | 506 | 1.39 | ||||
This study | TTCTTCACC‡ | 1,127 | 1,515 | 1.34 | ||||
This study | CTGTCGACC | 120 | 186 | 1.55 | ||||
ORF5-C-68aa | This study | aaaataaca‡ | 1,874 | 6,302 | 2,577 | 8,799 | 1.38 | 1.40 |
This study | agtcaaacc‡ | 1,563 | 2,006 | 1.28 | ||||
This study | ttgttcaac | 226 | 295 | 1.31 | ||||
This study | ttcttcaca‡ | 791 | 1,100 | 1.39 | ||||
This study | tcaataaca‡ | 717 | 1,222 | 1.70 | ||||
This study | ttactaacc‡ | 496 | 653 | 1.32 | ||||
This study | tgtttcacc | 271 | 439 | 1.62 | ||||
This study | CATTTATTG | 202 | 265 | 1.31 | ||||
This study | CAGTCAAAC | 162 | 242 | 1.49 | ||||
ORF6 (GP6) | TRS6† | ttgtcaacc‡ | 48,768 | 48,768 | 62,270 | 62,270 | 1.28 | 1.28 |
ORF6-C-79aa | This study | tacttgacc‡ | 2,035 | 2,879 | 3,510 | 5,190 | 1.72 | 1.80 |
This study | TTACTAACA‡ | 381 | 688 | 1.81 | ||||
This study | ACGTGGACC | 463 | 992 | 2.14 | ||||
ORF6-C-40aa | This study | CCCTTGGCC | 48 | 4,024 | 100 | 7,467 | 2.08 | 1.86 |
This study | CGATTAACG‡ | 3,089 | 5,659 | 1.83 | ||||
This study | GTCGTAACT‡ | 887 | 1,708 | 1.93 | ||||
ORF7 (GP7) | TRS7† | ttgttaacc‡ | 232,720 | 232,720 | 310,224 | 310,224 | 1.33 | 1.33 |
ORF15,419–15,505 | This study | TGGCAAACC | 91 | 2,968 | 143 | 4,481 | 1.57 | 1.51 |
This study | CAAATAACA‡ | 71 | 116 | 1.63 | ||||
This study | TCCCCAACG | 1,015 | 1,619 | 1.60 | ||||
This study | GGCTTCCCC | 905 | 1,439 | 1.59 | ||||
This study | CACTCAACA | 886 | 1,164 | 1.31 | ||||
ORF12,407–12,628 | This study | TCTTTACCG | 216 | 2,300 | 270 | 2,604 | 1.25 | 1.13 |
This study | TTCTTTACC | 2,084 | 2,334 | 1.12 | ||||
ORF12,191–12,271 | This study | ATCATACCA | 386 | 386 | 534 | 534 | 1.38 | 1.38 |
ORF10,071–10,331 | This study | CGACTAACC‡ | 321 | 424 | 328 | 440 | 1.02 | 1.04 |
This study | GGCTGTCAC | 103 | 112 | 1.09 | ||||
ORF5,920–5,997 | This study | AACTCAACT | 26 | 26 | 31 | 31 | 1.19 | 1.19 |
ORF13,699–13,836 | This study | TTGCAAACC | 736 | 1,185 | 1,038 | 1,672 | 1.41 | 1.41 |
This study | GGCTTACTG | 236 | 341 | 1.44 | ||||
This study | TACTTACTC | 213 | 293 | 1.38 | ||||
nsp12-C-107aa | This study | CGCTAAACC | 1,472 | 1,518 | 2,030 | 2,077 | 1.38 | 1.37 |
This study | GCGCTAAAC | 46 | 47 | 1.02 | ||||
ORF1b-C-189aa | This study | ATGTTGACC | 41 | 41 | 48 | 48 | 1.17 | 1.17 |
ORF1b-C-215aa | This study | ACGGTAACT | 102 | 102 | 153 | 153 | 1.50 | 1.50 |
ORF1b-C-579aa | This study | ACGCTAACC‡ | 173 | 173 | 231 | 231 | 1.34 | 1.34 |
ORF1b-C-1,021aa | This study | GCCAATACC | 32 | 32 | 42 | 42 | 1.31 | 1.31 |
ORF1a-C-67aa | This study | TCGCTGACC | 61 | 61 | 89 | 89 | 1.46 | 1.46 |
*Lowercase letters represent body TRSs identified by amplification, cloning, and sequencing of leader–body junctions; uppercase letters represent body TRSs identified by NGS analysis.
†Previously published SHFV body TRSs.
‡Body TRSs that can generate two or more sg mRNAs with alternative leader–body junction sequences.
§An additional nucleotide was added at the 5′ end of the body TRS to differentiate the TRS from homologous sequences at other locations in the SHFV genome.
The relative abundance of sg mRNAs generated from each SHFV body TRS was calculated by dividing the transcription level from that body TRS by the total transcription level from all body TRSs (Table S7, per TRS column). Relative abundance varied widely among body TRSs. The sg mRNAs generated from the published TRS7 were the most abundant (232,720 reads at 8 hpi and 310,224 at 18 hpi; ∼49.53% and ∼50.55% abundance, respectively). The least abundant sg mRNA was generated from a previously unreported body TRS (5′-ttcttcgcc-3′) located near the published TRS5 (four reads at 8 hpi and three at 18 hpi; ∼0.0009% and ∼0.0005% abundance, respectively) (Tables S6 and S7). Notably, the sg mRNA generated from an alternative TRS5 (5′-atcataacc-3′), rather than the published TRS5, was the third most abundant, indicating that the published TRS5 is not the primary TRS used. Likewise, the previously identified TRSs for GP3′, GP4′, and GP4 were not the major ones used. To compare the relative abundance of the sg mRNAs generated from each TRS at 8 and 18 hpi, the fold change over time was calculated for each TRS. The relative abundance of sg mRNAs from each body TRS changed little with time after infection (Table S7), indicating that the production of SHFV sg mRNAs from each body TRS is consistent throughout the infection cycle.
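The relative-abundance calculation, the fold change in relative abundance over time, and the ranking used to identify the most abundant TRSs can be sketched as follows. This is a minimal, hypothetical illustration; the function names are not from the study. Returning `None` when a TRS has no reads at the early time point mirrors the "N/A" entries in Table S7.

```python
def relative_abundance(reads_per_trs):
    """Percent of all leader-body junction reads contributed by each body TRS."""
    total = sum(reads_per_trs.values())
    return {trs: 100.0 * n / total for trs, n in reads_per_trs.items()}


def abundance_fold_change(early_pct, late_pct):
    """Late/early ratio of relative abundance (None if absent early)."""
    return {trs: late_pct[trs] / early_pct[trs] if early_pct[trs] else None
            for trs in early_pct}


def ranked(pct):
    """Body TRSs ordered from most to least abundant."""
    return sorted(pct, key=pct.get, reverse=True)
```

Applied to all 96 per-TRS totals in Table S6, `relative_abundance` should reproduce the per-TRS percentages in Table S7 (for example, 232,720 TRS7 reads corresponding to ∼49.53% at 8 hpi), and `ranked` would place TRS7 first. Note that the fold change here is a ratio of percentages, not of raw reads, so it can fall below 1 even when the read count for a TRS rises.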
Table S7.
Relative abundance of SHFV sg mRNAs generated from each identified body TRS at early and late times in MA104 cells and macaque MΦs
5′ ORF encoded (protein) | Body TRSs* 5′ to 3′ | MA104, 8 hpi, per TRS (%) | MA104, 8 hpi, per ORF (%) | MA104, 18 hpi, per TRS (%) | MA104, 18 hpi, per ORF (%) | MA104 fold change, per TRS | MA104 fold change, per ORF | MΦ, 7 hpi, per TRS (%) | MΦ, 7 hpi, per ORF (%) | MΦ, 16 hpi, per TRS (%) | MΦ, 16 hpi, per ORF (%) | MΦ fold change, per TRS | MΦ fold change, per ORF |
ORF2a′ (GP2′) | ttctgaacc | 1.309 | 6.279 | 1.215 | 5.686 | 0.93 | 0.91 | 0.661 | 2.621 | 0.788 | 3.181 | 1.19 | 1.21 |
tctttaact†,‡ | 4.883 | 4.382 | 0.90 | 1.930 | 2.336 | 1.21 | |||||||
ATTTTGACC‡ | 0.044 | 0.045 | 1.00 | 0.015 | 0.036 | 2.33 | |||||||
CATTTTGAC | 0.042 | 0.044 | 1.04 | 0.015 | 0.021 | 1.46 | |||||||
ORF2b′ (E′) | ggtttaatc | 0.530 | 3.733 | 0.542 | 3.646 | 1.02 | 0.98 | 0.305 | 1.945 | 0.292 | 2.163 | 0.96 | 1.11 |
tccttaaac‡ | 3.202 | 3.104 | 0.97 | 1.640 | 1.871 | 1.14 | |||||||
ORF3′ (GP3′) | ttatttccc | 0.717 | 1.537 | 0.646 | 1.482 | 0.90 | 0.96 | 0.599 | 1.092 | 0.627 | 1.255 | 1.05 | 1.15 |
gggctaaac | 0.040 | 0.032 | 0.81 | 0.032 | 0.056 | 1.79 | |||||||
cttaaaacc†,‡ | 0.176 | 0.180 | 1.02 | 0.199 | 0.294 | 1.47 | |||||||
gatttcaac | 0.011 | 0.012 | 1.07 | 0.003 | 0.009 | 3.02 | |||||||
accttcacc | 0.205 | 0.182 | 0.89 | 0.096 | 0.100 | 1.04 | |||||||
ttctgtacc | 0.343 | 0.378 | 1.10 | 0.132 | 0.136 | 1.03 | |||||||
TCTGTACCA | 0.019 | 0.017 | 0.92 | 0.009 | 0.012 | 1.27 | |||||||
TATTTCCCA | 0.026 | 0.034 | 1.32 | 0.022 | 0.021 | 0.93 | |||||||
ORF2a′-C-25aa | cccttaact | 0.305 | 0.437 | 0.307 | 0.439 | 1.01 | 1.00 | 0.176 | 0.260 | 0.263 | 0.373 | 1.50 | 1.44 |
aactttact | 0.007 | 0.010 | 1.37 | 0.005 | 0.014 | 2.65 | |||||||
tcgcaaacc | 0.057 | 0.049 | 0.86 | 0.023 | 0.033 | 1.42 | |||||||
caatcaacc | 0.017 | 0.020 | 1.14 | 0.022 | 0.018 | 0.80 | |||||||
gcctcaacg | 0.021 | 0.020 | 0.98 | 0.012 | 0.013 | 1.10 | |||||||
GCGGTAACG‡ | 0.030 | 0.034 | 1.12 | 0.021 | 0.032 | 1.53 | |||||||
ORF4′ (GP4′) | gccttaacc | 2.108 | 3.029 | 1.848 | 2.660 | 0.88 | 0.88 | 1.034 | 1.580 | 1.170 | 1.769 | 1.13 | 1.12 |
cctttcacc† | 0.675 | 0.573 | 0.85 | 0.416 | 0.434 | 1.04 | |||||||
TCTTTAATT | 0.061 | 0.061 | 1.01 | 0.026 | 0.034 | 1.28 | |||||||
ACCTTTCAC | 0.045 | 0.044 | 0.98 | 0.026 | 0.028 | 1.07 | |||||||
CCCTGAACT | 0.140 | 0.133 | 0.95 | 0.076 | 0.103 | 1.34 | |||||||
ORF4′-C-84aa | ggcgaaacg | 0.003 | 0.003 | 0.007 | 0.007 | 1.91 | 1.91 | 0.004 | 0.004 | 0.007 | 0.007 | 1.81 | 1.81 |
ORF2b (E) | ctattaacc†,‡ | 3.832 | 4.283 | 3.660 | 4.137 | 0.96 | 0.97 | 3.578 | 4.071 | 4.809 | 5.408 | 1.34 | 1.33 |
tgggcaacc‡ | 0.451 | 0.477 | 1.06 | 0.493 | 0.599 | 1.22 | |||||||
ORF2b-C-51aa | TCTTTGACC | 0.036 | 0.036 | 0.046 | 0.046 | 1.28 | 1.28 | 0.015 | 0.015 | 0.011 | 0.011 | 0.72 | 0.72 |
ORF2a (GP2) | atactcaccc§,‡ | 0.084 | 0.084 | 0.091 | 0.091 | 1.09 | 1.09 | 0.051 | 0.051 | 0.057 | 0.057 | 1.11 | 1.11 |
ORF2a-C-161aa | caattaaat | 0.025 | 0.025 | 0.020 | 0.020 | 0.77 | 0.77 | 0.018 | 0.018 | 0.015 | 0.015 | 0.84 | 0.84 |
ORF2a-C-137aa | ACCTTACCC | 0.063 | 0.151 | 0.081 | 0.188 | 1.29 | 1.24 | 0.055 | 0.119 | 0.058 | 0.119 | 1.04 | 1.00 |
ACCCTAACC‡ | 0.088 | 0.106 | 1.21 | 0.063 | 0.061 | 0.97 | |||||||
ORF3 (GP3) | ttctaaata | 0.003 | 0.927 | 0.005 | 1.148 | 1.58 | 1.24 | 0.004 | 0.794 | 0.003 | 0.907 | 0.75 | 1.14 |
tcactaacc†,‡ | 0.676 | 0.878 | 1.30 | 0.654 | 0.726 | 1.11 | |||||||
TCTTTCACC | 0.227 | 0.243 | 1.07 | 0.117 | 0.156 | 1.33 | |||||||
GTTCTTTCAC§ | 0.019 | 0.022 | 1.14 | 0.019 | 0.022 | 1.14 | |||||||
ORF4 (GP4) | acttcaaca‡ | 1.347 | 4.183 | 1.559 | 4.731 | 1.16 | 1.13 | 0.746 | 2.501 | 0.916 | 3.214 | 1.23 | 1.28 |
atccaaacc‡ | 0.649 | 0.703 | 1.08 | 0.431 | 0.545 | 1.26 | |||||||
tcattgacc†,‡ | 0.980 | 1.050 | 1.07 | 0.536 | 0.629 | 1.17 | |||||||
TACTTCCCA | 0.029 | 0.031 | 1.07 | 0.024 | 0.033 | 1.33 | |||||||
TTTTCAACG‡ | 0.177 | 0.227 | 1.28 | 0.108 | 0.136 | 1.26 | |||||||
CTTCTGTCCA§ | 0.011 | 0.009 | 0.89 | 0.008 | 0.007 | 0.83 | |||||||
TTGATAACG‡ | 0.614 | 0.705 | 1.15 | 0.448 | 0.688 | 1.54 | |||||||
TTCGTAACA | 0.265 | 0.309 | 1.17 | 0.149 | 0.197 | 1.33 | |||||||
CACTTCAAC | 0.111 | 0.137 | 1.24 | 0.051 | 0.064 | 1.24 | |||||||
ORF5a | TACTTATGT | 0.002 | 0.002 | 0.002 | 0.002 | 0.84 | 0.84 | 0.002 | 0.002 | 0.001 | 0.001 | 0.60 | 0.60 |
ORF5 (GP5) | acaacaacc‡ | 0.081 | 10.616 | 0.085 | 9.500 | 1.04 | 0.89 | 0.063 | 9.406 | 0.059 | 11.835 | 0.93 | 1.26 |
cccataacc‡ | 1.598 | 1.369 | 0.86 | 0.959 | 1.065 | 1.11 | |||||||
ttcttcgcc | 0.001 | 0.000 | 0.57 | 0.000 | 0.001 | N/A | |||||||
gtgataatc | 0.016 | 0.021 | 1.33 | 0.024 | 0.042 | 1.71 | |||||||
ttgttatca | 0.022 | 0.029 | 1.32 | 0.029 | 0.033 | 1.12 | |||||||
atcataacc‡ | 5.272 | 4.762 | 0.90 | 5.881 | 7.549 | 1.28 | |||||||
tccttaact† | 3.279 | 2.870 | 0.88 | 2.241 | 2.834 | 1.26 | |||||||
taattatgt | 0.004 | 0.005 | 1.13 | 0.006 | 0.005 | 0.88 | |||||||
GCATCAACG‡ | 0.078 | 0.082 | 1.06 | 0.055 | 0.073 | 1.32 | |||||||
TTCTTCACC‡ | 0.240 | 0.247 | 1.03 | 0.126 | 0.142 | 1.13 | |||||||
CTGTCGACC | 0.026 | 0.030 | 1.19 | 0.021 | 0.032 | 1.53 | |||||||
ORF5-C-68aa | aaaataaca‡ | 0.399 | 1.341 | 0.420 | 1.434 | 1.05 | 1.07 | 0.415 | 1.138 | 0.367 | 1.092 | 0.89 | 0.96 |
agtcaaacc‡ | 0.333 | 0.327 | 0.98 | 0.189 | 0.155 | 0.82 | |||||||
ttgttcaac | 0.048 | 0.048 | 1.00 | 0.037 | 0.033 | 0.88 | |||||||
ttcttcaca‡ | 0.168 | 0.179 | 1.06 | 0.087 | 0.084 | 0.97 | |||||||
tcaataaca‡ | 0.153 | 0.199 | 1.31 | 0.216 | 0.235 | 1.09 | |||||||
ttactaacc‡ | 0.106 | 0.106 | 1.01 | 0.072 | 0.065 | 0.90 | |||||||
tgtttcacc | 0.058 | 0.072 | 1.24 | 0.038 | 0.053 | 1.38 | |||||||
CATTTATTG | 0.043 | 0.043 | 1.00 | 0.025 | 0.030 | 1.18 | |||||||
CAGTCAAAC | 0.034 | 0.039 | 1.14 | 0.058 | 0.071 | 1.22 | |||||||
ORF6 (M) | ttgtcaacc†,‡ | 10.378 | 10.378 | 10.148 | 10.148 | 0.98 | 0.98 | 9.187 | 9.187 | 12.179 | 12.179 | 1.33 | 1.33 |
ORF6-C-79aa | tacttgacc‡ | 0.433 | 0.613 | 0.572 | 0.846 | 1.32 | 1.38 | 0.281 | 0.505 | 0.408 | 0.771 | 1.45 | 1.53 |
TTACTAACA‡ | 0.081 | 0.112 | 1.38 | 0.067 | 0.111 | 1.66 | |||||||
ACGTGGACC | 0.099 | 0.162 | 1.64 | 0.157 | 0.252 | 1.60 | |||||||
ORF6-C-40aa | CCCTTGGCC | 0.010 | 0.856 | 0.016 | 1.217 | 1.60 | 1.42 | 0.011 | 0.833 | 0.009 | 0.919 | 0.82 | 1.10 |
CGATTAACG‡ | 0.657 | 0.922 | 1.40 | 0.692 | 0.753 | 1.09 | |||||||
GTCGTAACT‡ | 0.189 | 0.278 | 1.47 | 0.130 | 0.157 | 1.21 | |||||||
ORF7 (N) | ttgttaacc†,‡ | 49.526 | 49.526 | 50.554 | 50.554 | 1.02 | 1.02 | 62.534 | 62.534 | 53.415 | 53.415 | 0.85 | 0.85 |
ORF15,419–15,505 | TGGCAAACC | 0.019 | 0.632 | 0.023 | 0.730 | 1.20 | 1.16 | 0.017 | 0.407 | 0.013 | 0.177 | 0.75 | 0.44 |
CAAATAACA‡ | 0.015 | 0.019 | 1.25 | 0.013 | 0.005 | 0.39 | |||||||
TCCCCAACG | 0.216 | 0.264 | 1.22 | 0.115 | 0.056 | 0.49 | |||||||
GGCTTCCCC | 0.193 | 0.234 | 1.22 | 0.107 | 0.055 | 0.52 | |||||||
CACTCAACA | 0.189 | 0.190 | 1.01 | 0.156 | 0.049 | 0.31 | |||||||
ORF12,407–12,628 | TCTTTACCG | 0.046 | 0.489 | 0.044 | 0.424 | 0.96 | 0.87 | 0.028 | 0.361 | 0.021 | 0.357 | 0.75 | 0.99 |
TTCTTTACC | 0.444 | 0.380 | 0.86 | 0.333 | 0.337 | 1.01 | |||||||
ORF12,191–12,271 | ATCATACCA | 0.082 | 0.082 | 0.087 | 0.087 | 1.06 | 1.06 | 0.046 | 0.046 | 0.056 | 0.056 | 1.21 | 1.21 |
ORF10,071–10,331 | CGACTAACC‡ | 0.068 | 0.090 | 0.053 | 0.072 | 0.78 | 0.79 | 0.044 | 0.078 | 0.060 | 0.104 | 1.36 | 1.32 |
GGCTGTCAC | 0.022 | 0.018 | 0.83 | 0.034 | 0.044 | 1.28 | |||||||
ORF5,920–5,997 | AACTCAACT | 0.006 | 0.006 | 0.005 | 0.005 | 0.91 | 0.91 | 0.005 | 0.005 | 0.005 | 0.005 | 1.07 | 1.07 |
ORF13,699–13,836 | TTGCAAACC | 0.157 | 0.252 | 0.169 | 0.272 | 1.08 | 1.08 | 0.109 | 0.182 | 0.145 | 0.244 | 1.32 | 1.34 |
GGCTTACTG | 0.050 | 0.056 | 1.11 | 0.041 | 0.054 | 1.33 | |||||||
TACTTACTC | 0.045 | 0.048 | 1.05 | 0.032 | 0.045 | 1.44 | |||||||
nsp12-C-107aa | CGCTAAACC | 0.313 | 0.323 | 0.331 | 0.338 | 1.06 | 1.05 | 0.149 | 0.151 | 0.225 | 0.230 | 1.51 | 1.53 |
GCGCTAAAC | 0.010 | 0.008 | 0.78 | 0.002 | 0.006 | 2.71 | |||||||
ORF1b-C-189aa | ATGTTGACC | 0.009 | 0.009 | 0.008 | 0.008 | 0.90 | 0.90 | 0.010 | 0.010 | 0.010 | 0.010 | 1.02 | 1.02 |
ORF1b-C-215aa | ACGGTAACT | 0.022 | 0.022 | 0.025 | 0.025 | 1.15 | 1.15 | 0.041 | 0.041 | 0.064 | 0.064 | 1.56 | 1.56 |
ORF1b-C-579aa | ACGCTAACC‡ | 0.037 | 0.037 | 0.038 | 0.038 | 1.02 | 1.02 | 0.027 | 0.027 | 0.041 | 0.041 | 1.47 | 1.47 |
ORF1b-C-1021aa | GCCAATACC | 0.007 | 0.007 | 0.007 | 0.007 | 1.01 | 1.01 | 0.003 | 0.003 | 0.002 | 0.002 | 0.97 | 0.97 |
ORF1a-C-67aa | TCGCTGACC | 0.013 | 0.013 | 0.015 | 0.015 | 1.12 | 1.12 | 0.014 | 0.014 | 0.016 | 0.016 | 1.16 | 1.16 |
*Lowercase letters indicate body TRSs identified from amplification, cloning, and sequencing of leader–body junctions; uppercase letters indicate body TRSs identified from NGS analysis.
†Previously published SHFV body TRSs.
‡Body TRSs that can generate two or more sg mRNAs with alternative leader–body junction sequences.
§An additional nucleotide was added at the 5′ end of the body TRS to differentiate the TRS from homologous sequences at other locations in the SHFV genome.
To analyze the effect of the location of a TRS in the genome on the abundance of sg mRNA generated, the location of each body TRS in the SHFV genome was plotted against the corresponding relative sg mRNA abundance from that TRS. The highest density of body TRSs is in the 3′ region where the SHFV structural proteins are encoded (nucleotides 10,953–15,630) (Fig. 5). Although the TRS for the 3′ terminal ORF7 produced the highest abundance of sg mRNAs, a sequential decrease in sg mRNA abundance with increasing distance of a TRS from the 3′ end of the genome was not observed. Instead, multiple regions containing body TRSs with high transcription activity were flanked by regions containing body TRSs with low transcription activity. To analyze the impact of duplex stability between leader and body TRSs on sg mRNA abundance, the lowest free energy (kcal/mol) of the duplex (12 bp) formed between the leader TRS and each of the seven closely located body TRSs for ORF5-C-68aa was calculated (Fig. S2). A linear correlation between duplex stability and sg mRNA abundance was not observed, suggesting that other factors, such as RNA secondary structure, play a major role in determining sg mRNA abundance.
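One way to quantify the absence of a linear relationship between duplex stability and sg mRNA abundance is a Pearson correlation coefficient. The sketch below is a hypothetical illustration, not the analysis reported in Fig. S2: the ΔG values themselves are not reproduced in this section, and the abundance values assume that the seven closely located body TRSs are the seven lowercase TRSs listed for ORF5-C-68aa in Table S7.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for paired samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def duplex_vs_abundance(delta_g, abundance):
    """Correlate duplex free energy (kcal/mol) with relative abundance (%).

    Both arguments are dicts keyed by body TRS; only shared keys are used.
    A more negative delta G means a more stable leader-body duplex, so a
    purely stability-driven model would predict a strongly negative r.
    """
    keys = sorted(delta_g.keys() & abundance.keys())
    return pearson_r([delta_g[k] for k in keys], [abundance[k] for k in keys])


# Relative abundance (%, MA104 cells, 8 hpi, Table S7) of the lowercase
# ORF5-C-68aa body TRSs; matching delta G values would come from Fig. S2.
ORF5_C68_ABUNDANCE = {
    "aaaataaca": 0.399, "agtcaaacc": 0.333, "ttgttcaac": 0.048,
    "ttcttcaca": 0.168, "tcaataaca": 0.153, "ttactaacc": 0.106,
    "tgtttcacc": 0.058,
}
```

With only seven points, r is a coarse summary; a rank-based statistic such as Spearman's would be less sensitive to the two high-abundance TRSs.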
Fig. 5.
Relative abundance of the sg mRNAs produced from each identified functional SHFV body TRS. The percent abundance of the sg mRNAs produced from each identified functional body TRS was graphed according to its genome location. A schematic diagram of the SHFV genome is shown at the top. RFS, ribosomal frameshift.
MΦs and dendritic cells are targeted by SHFV during infections in monkeys. To determine whether the TRS use and relative abundance of individual sg mRNAs observed in MA104 cells accurately represent what occurs during an in vivo infection, primary rhesus macaque MΦs were infected with SHFVic at an MOI of 1, and mRNA isolated from total RNA extracted at 7 or 16 hpi was subjected to NGS using Illumina MiSeq. The sequencing data were analyzed with CLC Genomics Workbench software using the same workflow described above. Although fewer total reads were obtained because of the lower capacity of Illumina MiSeq compared with HiSeq, each of the 96 body TRSs identified in the SHFV genome was also functional at both early and late times post infection in MΦs, with the exception of the body TRS 5′-ttcttcgcc-3′, which had the lowest number of mapped reads in MA104 cells (Table S6). No reads mapped to this TRS in the 7-hpi MΦ sample, but two reads mapped to it in the 16-hpi sample (Table S8). The lower capacity of Illumina MiSeq likely reduced the detection of the rare sg mRNAs produced from this TRS. The relative sg mRNA abundance produced from the majority of the SHFV body TRSs was similar in infected MΦ and MA104 samples and remained consistent in both cell types with time after infection (Table S7). The consistent sg mRNA abundance for individual TRSs at different times after infection in both a cell line and primary MΦs indicates that transcription is regulated by viral, not cellular, factors.
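The argument that lower sequencing depth explains the missing 5′-ttcttcgcc-3′ reads at 7 hpi can be checked with a back-of-the-envelope Poisson estimate. This sketch rests on stated assumptions: the rare sg mRNA is taken to represent the same fraction of junction reads in MΦs as in MA104 cells, read counts are treated as Poisson distributed, and total junction reads per library are back-calculated from the TRS7 rows (reads divided by percent abundance). The function `p_no_reads` is hypothetical.

```python
from math import exp


def p_no_reads(rare_reads, ref_total, target_total):
    """Poisson estimate of missing a rare junction in a smaller library.

    Assumes the rare sg mRNA makes up the same fraction of junction reads
    in both libraries. Returns (expected reads in the target library,
    probability of observing zero reads).
    """
    expected = rare_reads / ref_total * target_total
    return expected, exp(-expected)


# Total junction reads back-calculated from the TRS7 rows of Tables S6-S8.
MA104_8HPI_TOTAL = 232_720 / 0.49526   # about 469,900 reads
MPHI_7HPI_TOTAL = 122_844 / 0.62534    # about 196,400 reads

# ttcttcgcc: 4 reads at 8 hpi in MA104 cells (Table S6).
lam, p0 = p_no_reads(4, MA104_8HPI_TOTAL, MPHI_7HPI_TOTAL)
print(f"expected reads = {lam:.2f}, P(0 reads) = {p0:.2f}")
# prints: expected reads = 1.67, P(0 reads) = 0.19
```

Under these assumptions, roughly one MΦ library in five of this size would contain no reads from this TRS, so the zero count at 7 hpi is plausible from sampling alone.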
Table S8.
NGS reads mapping to the SHFV sg mRNA leader–body junctions produced from each identified body TRS at 7 and 16 hpi in macaque MΦs
5′ ORF encoded (protein) | TRS | Body TRSs* 5′ to 3′ | 7 hpi, per TRS | 7 hpi, per ORF | 16 hpi, per TRS | 16 hpi, per ORF | Fold change over time, per TRS | Fold change over time, per ORF |
ORF2a′ (GP2′) | This study | ttctgaacc | 1,298 | 5,148 | 1,283 | 5,181 | 0.99 | 1.01 |
TRS2′† | tctttaact‡ | 3,791 | 3,805 | 1.00 | ||||
This study | ATTTTGACC‡ | 30 | 58 | 1.93 | ||||
This study | CATTTTGAC | 29 | 35 | 1.21 | ||||
ORF2b′ (E′) | This study | ggtttaatc | 599 | 3,821 | 476 | 3,523 | 0.79 | 0.92 |
This study | tccttaaac‡ | 3,222 | 3,047 | 0.95 | ||||
ORF3′ (GP3′) | This study | ttatttccc | 1,176 | 2,145 | 1,021 | 2,044 | 0.87 | 0.95 |
This study | gggctaaac | 62 | 92 | 1.48 | ||||
TRS3′† | cttaaaacc‡ | 391 | 478 | 1.22 | ||||
This study | gatttcaac | 6 | 15 | 2.50 | ||||
This study | accttcacc | 189 | 163 | 0.86 | ||||
This study | ttctgtacc | 259 | 222 | 0.86 | ||||
This study | TCTGTACCA | 18 | 19 | 1.06 | ||||
This study | TATTTCCCA | 44 | 34 | 0.77 | ||||
ORF2a′-C-25aa | This study | cccttaact | 346 | 510 | 429 | 607 | 1.24 | 1.19 |
This study | aactttact | 10 | 22 | 2.20 | ||||
This study | tcgcaaacc | 46 | 54 | 1.17 | ||||
This study | caatcaacc | 44 | 29 | 0.66 | ||||
This study | gcctcaacg | 23 | 21 | 0.91 | ||||
This study | GCGGTAACG‡ | 41 | 52 | 1.27 | ||||
ORF4′ (GP4′) | This study | gccttaacc | 2,032 | 3,104 | 1,906 | 2,881 | 0.94 | 0.93 |
TRS4′† | cctttcacc | 818 | 707 | 0.86 | ||||
This study | TCTTTAATT | 52 | 55 | 1.06 | ||||
This study | ACCTTTCAC | 52 | 46 | 0.88 | ||||
This study | CCCTGAACT | 150 | 167 | 1.11 | ||||
ORF4′-C-84aa | This study | ggcgaaacg | 8 | 8 | 12 | 12 | 1.50 | 1.50 |
ORF2b (E) | TRS2† | ctattaacc‡ | 7,029 | 7,997 | 7,831 | 8,807 | 1.11 | 1.10 |
This study | tgggcaacc‡ | 968 | 976 | 1.01 | ||||
ORF2b-C-51aa | This study | TCTTTGACC | 30 | 30 | 18 | 18 | 0.60 | 0.60 |
ORF2a (GP2) | This study | atactcaccc‡,§ | 101 | 101 | 93 | 93 | 0.92 | 0.92 |
ORF2a-C-161aa | This study | caattaaat | 36 | 36 | 25 | 25 | 0.69 | 0.69 |
ORF2a-C-137aa | This study | ACCTTACCC | 109 | 233 | 94 | 194 | 0.86 | 0.83 |
This study | ACCCTAACC‡ | 124 | 100 | 0.81 | ||||
ORF3 (GP3) | This study | ttctaaata | 8 | 1,560 | 5 | 1,477 | 0.63 | 0.95 |
TRS3† | tcactaacc‡ | 1,284 | 1,182 | 0.92 | ||||
This study | TCTTTCACC | 230 | 254 | 1.10 | ||||
This study | GTTCTTTCAC§ | 38 | 36 | 0.95 | ||||
ORF4 (GP4) | This study | acttcaaca‡ | 1,465 | 4,914 | 1,491 | 5,234 | 1.02 | 1.07 |
This study | atccaaacc‡ | 847 | 887 | 1.05 | ||||
TRS4† | tcattgacc‡ | 1,053 | 1,025 | 0.97 | ||||
This study | TACTTCCCA | 48 | 53 | 1.10 | ||||
This study | TTTTCAACG‡ | 212 | 221 | 1.04 | ||||
This study | CTTCTGTCCA§ | 16 | 11 | 0.69 | ||||
This study | TTGATAACG‡ | 880 | 1,121 | 1.27 | ||||
This study | TTCGTAACA | 292 | 321 | 1.10 | ||||
This study | CACTTCAAC | 101 | 104 | 1.03 | ||||
ORF5a | This study | TACTTATGT | 4 | 4 | 2 | 2 | 0.50 | 0.50 |
ORF5 (GP5) | This study | acaacaacc‡ | 124 | 18,477 | 96 | 19,273 | 0.77 | 1.04 |
This study | cccataacc‡ | 1,884 | 1,734 | 0.92 | ||||
This study | ttcttcgcc | 0 | 2 | N/A | ||||
This study | gtgataatc | 48 | 68 | 1.42 | ||||
This study | ttgttatca | 57 | 53 | 0.93 | ||||
This study | atcataacc‡ | 11,553 | 12,293 | 1.06 | ||||
TRS5† | tccttaact | 4,403 | 4,616 | 1.05 | ||||
This study | taattatgt | 11 | 8 | 0.73 | ||||
This study | GCATCAACG‡ | 109 | 119 | 1.09 | ||||
This study | TTCTTCACC‡ | 247 | 232 | 0.94 | ||||
This study | CTGTCGACC | 41 | 52 | 1.27 | ||||
ORF5-C-68aa | This study | aaaataaca‡ | 815 | 2,235 | 598 | 1,778 | 0.73 | 0.80 |
This study | agtcaaacc‡ | 371 | 252 | 0.68 | ||||
This study | ttgttcaac | 73 | 53 | 0.73 | ||||
This study | ttcttcaca‡ | 170 | 136 | 0.80 | ||||
This study | tcaataaca‡ | 425 | 383 | 0.90 | ||||
This study | ttactaacc‡ | 142 | 106 | 0.75 | ||||
This study | tgtttcacc | 75 | 86 | 1.15 | ||||
This study | CATTTATTG | 50 | 49 | 0.98 | ||||
This study | CAGTCAAAC | 114 | 115 | 1.01 | ||||
ORF6 (GP6) | TRS6† | ttgtcaacc‡ | 18,047 | 18,047 | 19,833 | 19,833 | 1.10 | 1.10 |
ORF6-C-79aa | This study | tacttgacc‡ | 552 | 992 | 665 | 1,256 | 1.20 | 1.27 |
This study | TTACTAACA‡ | 131 | 180 | 1.37 | ||||
This study | ACGTGGACC | 309 | 411 | 1.33 | ||||
ORF6-C-40aa | This study | CCCTTGGCC | 22 | 1,636 | 15 | 1,497 | 0.68 | 0.92 |
This study | CGATTAACG‡ | 1,359 | 1,227 | 0.90 | ||||
This study | GTCGTAACT‡ | 255 | 255 | 1.00 | ||||
ORF7 (GP7) | TRS7† | ttgttaacc‡ | 122,844 | 122,844 | 86,988 | 86,988 | 0.71 | 0.71 |
ORF15,419–15,505 | This study | TGGCAAACC | 34 | 800 | 21 | 289 | 0.62 | 0.36 |
This study | CAAATAACA‡ | 25 | 8 | 0.32 | ||||
This study | TCCCCAACG | 225 | 91 | 0.40 | ||||
This study | GGCTTCCCC | 210 | 90 | 0.43 | ||||
This study | CACTCAACA | 306 | 79 | 0.26 | ||||
ORF12,407–12,628 | This study | TCTTTACCG | 55 | 709 | 34 | 582 | 0.62 | 0.82 |
This study | TTCTTTACC | 654 | 548 | 0.84 | ||||
ORF12,191–12,271 | This study | ATCATACCA | 91 | 91 | 91 | 91 | 1.00 | 1.00 |
ORF10,071–10,331 | This study | CGACTAACC‡ | 87 | 154 | 98 | 169 | 1.13 | 1.10 |
This study | GGCTGTCAC | 67 | 71 | 1.06 | ||||
ORF5,920–5,997 | This study | AACTCAACT | 9 | 9 | 8 | 8 | 0.89 | 0.89 |
ORF13,699–13,836 | This study | TTGCAAACC | 215 | 357 | 236 | 398 | 1.10 | 1.11 |
This study | GGCTTACTG | 80 | 88 | 1.10 | ||||
This study | TACTTACTC | 62 | 74 | 1.19 | ||||
nsp12-C-107aa | This study | CGCTAAACC | 292 | 296 | 366 | 375 | 1.25 | 1.27 |
This study | GCGCTAAAC | 4 | 9 | 2.25 | ||||
ORF1b-C-189aa | This study | ATGTTGACC | 19 | 19 | 16 | 16 | 0.84 | 0.84 |
ORF1b-C-215aa | This study | ACGGTAACT | 81 | 81 | 105 | 105 | 1.30 | 1.30 |
ORF1b-C-579aa | This study | ACGCTAACC‡ | 54 | 54 | 66 | 66 | 1.22 | 1.22 |
ORF1b-C-1021aa | This study | GCCAATACC | 5 | 5 | 4 | 4 | 0.80 | 0.80 |
ORF1a-C-67aa | This study | TCGCTGACC | 27 | 27 | 26 | 26 | 0.96 | 0.96 |
Lowercase letters represent body TRSs identified by amplification, cloning, and sequencing of leader–body junctions; uppercase letters represent body TRSs identified by NGS analysis.
† Previously published SHFV body TRSs.
‡ Body TRSs that can generate two or more sg mRNAs with heterogeneous leader–body junction sequences.
§ An additional nucleotide was added at the 5′ end of the body TRS to differentiate the TRS from homologous sequences at other locations in the SHFV genome.
Mass Spectrometry Analysis of SHFV Proteins in Infected MA104 Cells.
The relative abundance of an SHFV ORF product can be estimated from the relative abundance of the sg mRNAs expressing it. When an ORF was encoded as the 5′ ORF in multiple sg mRNAs generated from different TRSs, the combined sg mRNA abundance from all these TRSs was used to estimate the relative abundance of that ORF (Table S7, per ORF column). Among the three major structural proteins, ORF7 (N, 50.55%) had the highest abundance, followed by ORF6 (M, 10.15%) and ORF5 (GP5, 9.50%). The abundances for the minor structural proteins were lower: 5.69% for ORF2a′ (GP2′), 4.73% for ORF4 (GP4), 4.14% for ORF2b (E), 3.65% for ORF2b′ (E′), 2.66% for ORF4′ (GP4′), 1.48% for ORF3′ (GP3′), and 1.15% for ORF3 (GP3). ORF2a (GP2, 0.09%) and ORF5a (0.002%) had the lowest abundance. Among all the in-frame C-terminal ORFs identified, ORF5-C-68aa had the highest abundance (1.43%), which is consistent with the detection of its sg mRNAs on the Northern blots (Fig. 1). Each of the previously unreported ORFs encoded in an alternative reading frame had low abundance (Table S7, per ORF column).
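The per-ORF estimate described above amounts to summing the per-TRS read counts of every TRS whose sg mRNAs carry a given ORF as the 5′ ORF. A minimal Python sketch of that bookkeeping follows, using the ORF2a′-C-25aa and ORF4′ (GP4′) rows of the table above. The names `ROWS`, `per_orf_totals`, and `percent_share` are hypothetical. The two counts are the table's two per-TRS count columns, and the percentages here are taken over these two ORFs only, not over the whole transcriptome as in Table S7.

```python
from collections import defaultdict

# Hypothetical input: (ORF, body TRS, count in first column, count in second column),
# copied from the per-TRS columns of the table above.
ROWS = [
    ("ORF2a'-C-25aa", "cccttaact", 346, 429),
    ("ORF2a'-C-25aa", "aactttact", 10, 22),
    ("ORF2a'-C-25aa", "tcgcaaacc", 46, 54),
    ("ORF2a'-C-25aa", "caatcaacc", 44, 29),
    ("ORF2a'-C-25aa", "gcctcaacg", 23, 21),
    ("ORF2a'-C-25aa", "GCGGTAACG", 41, 52),
    ("ORF4' (GP4')", "gccttaacc", 2032, 1906),
    ("ORF4' (GP4')", "cctttcacc", 818, 707),
    ("ORF4' (GP4')", "TCTTTAATT", 52, 55),
    ("ORF4' (GP4')", "ACCTTTCAC", 52, 46),
    ("ORF4' (GP4')", "CCCTGAACT", 150, 167),
]


def per_orf_totals(rows):
    """Sum the per-TRS counts of every sg mRNA that carries the same 5' ORF."""
    totals = defaultdict(lambda: [0, 0])
    for orf, _trs, count_a, count_b in rows:
        totals[orf][0] += count_a
        totals[orf][1] += count_b
    return {orf: (a, b) for orf, (a, b) in totals.items()}


def percent_share(totals, column=0):
    """Express each ORF total as a percentage of the summed totals in one column."""
    grand_total = sum(counts[column] for counts in totals.values())
    return {orf: 100 * counts[column] / grand_total for orf, counts in totals.items()}


totals = per_orf_totals(ROWS)
for orf, (a, b) in totals.items():
    print(f"{orf}: {a} / {b} reads, ratio {b / a:.2f}")
print(percent_share(totals))
```

The summed totals and ratios reproduce the per-ORF columns of the table (510 and 607 reads, ratio 1.19, for ORF2a′-C-25aa; 3,104 and 2,881 reads, ratio 0.93, for ORF4′).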
To directly analyze the expression of the SHFV proteins, MA104 cells were infected with SHFVic at an MOI of 1. At 18 hpi, the cell lysate was collected in radioimmunoprecipitation assay (RIPA) buffer and briefly electrophoresed 1 cm into a 12% NuPAGE Bis-Tris gel. The region of the gel containing proteins smaller than 75 kDa was excised and subjected to LC-MS/MS analysis. The MS/MS spectra generated were searched against an SHFV protein database containing all the predicted ORFs (≥20 aa) encoded in all three reading frames of both the plus- and minus-strand SHFV RNAs. Each SHFV protein detected was quantified using the intensity-based absolute quantification (iBAQ) method (Table 1). The mass spectrometry analysis detected all the previously identified or predicted SHFV structural proteins except for ORF5a, which had the lowest sg mRNA abundance (0.002%) (Table S7). The relative amount of each SHFV structural protein correlated well with its estimated sg mRNA abundance, with the exception of ORF2a (GP2), ORF2a′ (GP2′), and ORF3′ (GP3′). ORF2a′ (GP2′) had the fourth most abundant sg mRNA, whereas its protein level ranked eighth. Conversely, the ORF3′ (GP3′) protein level ranked sixth, while its sg mRNA level ranked ninth. ORF2a (GP2) showed the most surprising discrepancy, with a very low sg mRNA abundance (0.09%, ranked 11th) but a very high amount of protein (iBAQ intensity of 9.88E+08, ranked seventh). Thus, although a separate body TRS was identified for ORF2a (GP2), the high abundance of this protein strongly suggests that it is also expressed bicistronically as the second ORF of the highly abundant sg mRNAs encoding ORF2b (E).
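One way to make "correlated well ... with the exception of" concrete is to rank the 11 detected structural proteins by per-ORF sg mRNA abundance (the Table S7 percentages quoted above) and by iBAQ intensity (Table 1), then compare the ranks. The sketch below computes a Spearman rank correlation with the no-ties formula, which is valid here because no two values are equal, and flags proteins whose ranks differ by three or more. The threshold and all names are illustrative assumptions, not the authors' analysis.

```python
# (per-ORF sg mRNA abundance in %, iBAQ intensity) for the detected structural proteins.
ABUNDANCE = {
    "N": (50.55, 1.54e10),
    "M": (10.15, 2.71e9),
    "GP5": (9.50, 1.39e9),
    "GP2'": (5.69, 5.48e8),
    "GP4": (4.73, 1.23e9),
    "E": (4.14, 1.25e9),
    "E'": (3.65, 5.01e8),
    "GP4'": (2.66, 1.33e8),
    "GP3'": (1.48, 9.92e8),
    "GP3": (1.15, 9.20e7),
    "GP2": (0.09, 9.88e8),
}


def rank_desc(values):
    """Rank keys from 1 (largest value) downward; assumes no tied values."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {key: rank for rank, key in enumerate(ordered, start=1)}


def spearman_no_ties(pairs):
    """Spearman rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    mrna_rank = rank_desc({k: v[0] for k, v in pairs.items()})
    protein_rank = rank_desc({k: v[1] for k, v in pairs.items()})
    n = len(pairs)
    d_squared = sum((mrna_rank[k] - protein_rank[k]) ** 2 for k in pairs)
    rho = 1 - 6 * d_squared / (n * (n ** 2 - 1))
    return rho, mrna_rank, protein_rank


rho, mrna_rank, protein_rank = spearman_no_ties(ABUNDANCE)
discordant = sorted(k for k in ABUNDANCE if abs(mrna_rank[k] - protein_rank[k]) >= 3)
print(f"Spearman rho = {rho:.2f}")
print("Rank shift of 3 or more:", discordant)
```

With these inputs rho is about 0.75, and the flagged proteins are GP2, GP2′, and GP3′, the same three exceptions named in the text.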
Table 1.
Mass spectrometry analysis of SHFV proteins produced in MA104 cells
SHFV proteins | kDa | % coverage | Razor + unique peptides | iBAQ Intensity* |
SHFV ORF7 (N) | 12.284 | 62.2 | 8 | 1.54E+10 |
SHFV ORF6 (M) | 17.851 | 61.7 | 10 | 2.71E+09 |
SHFV ORF5 (GP5) | 31.294 | 19.8 | 5 | 1.39E+09 |
SHFV ORF2b (E) | 8.7474 | 20 | 2 | 1.25E+09 |
SHFV ORF4 (GP4) | 19.793 | 24.2 | 3 | 1.23E+09 |
SHFV ORF3′ (GP3′) | 22.732 | 26 | 4 | 9.92E+08 |
SHFV ORF2a (GP2) | 24.13 | 28 | 5 | 9.88E+08 |
SHFV ORF2a′ (GP2′) | 31.792 | 48 | 11 | 5.48E+08 |
SHFV ORF2b′ (E′) | 10.456 | 56.4 | 5 | 5.01E+08 |
SHFV ORF4′ (GP4′) | 23.061 | 29.3 | 3 | 1.33E+08 |
SHFV ORF3 (GP3) | 19.655 | 26.8 | 2 | 9.20E+07 |
ORF10,071–10,331 | 9.4705 | 19.8 | 2 | 8.54E+06 |
SHFV nsp1α | 17.898 | 93.3 | 13 | 4.11E+09 |
SHFV nsp3 | 24.016 | 42.3 | 7 | 2.38E+09 |
SHFV nsp1β | 20.663 | 78 | 10 | 1.98E+09 |
SHFV nsp4 | 20.858 | 47.8 | 8 | 1.80E+09 |
SHFV nsp1γ | 15.313 | 77.6 | 6 | 1.66E+09 |
SHFV nsp2 | 81.172 | 51.9 | 31 | 1.63E+09 |
SHFV nsp7 | 22.932 | 73.1 | 11 | 8.21E+08 |
SHFV nsp5 | 18.089 | 4.8 | 1 | 5.94E+08 |
SHFV nsp12 | 19.828 | 45.2 | 6 | 2.58E+08 |
SHFV nsp9 | 70.347 | 42.6 | 24 | 1.08E+08 |
SHFV nsp11 | 24.767 | 43.5 | 6 | 1.03E+08 |
SHFV nsp8 | 5.249 | 68 | 2 | 8.29E+07 |
SHFV nsp10 | 49.348 | 38 | 10 | 7.31E+07 |
SHFV nsp2-2 TF | 68.896 | 37.1 | 2 | 3.08E+07 |
SHFV nsp2-1 TF | 52.161 | 45.5 | 1 | 3.76E+06 |
The structural proteins and then the nonstructural proteins are listed in the order of their relative abundance.
Because the mass spectrometry analysis could not differentiate trypsin cleavage peptides generated from in-frame C-terminal proteins from those generated from the overlapping full-length proteins, the amounts of the C-terminal proteins were not estimated. However, the peptide-mapping diagrams for ORF4′, ORF2b, ORF2a, ORF5, and ORF6 showed a greater abundance of peptides mapping to the C-terminal region of these proteins, suggesting the possibility that these in-frame C-terminal proteins, as well as the full-length proteins, are produced in infected cells. Among the ORF1a/1b C-terminal ORFs, the in-frame C-terminal ORF of nsp12 (nsp12-C-107aa) had the highest sg mRNA abundance, followed by ORF1b-C-579aa and ORF1b-C-215aa (Table S7). Accordingly, even though the ORF1b polyprotein is translated at only 15–20% of the efficiency of ORF1a (38), the mass spectrometry analysis detected higher amounts of nsp11 (iBAQ intensity of 1.03E+08) and nsp12 (iBAQ intensity of 2.58E+08), the 3′-terminal proteins of ORF1b, than of nsp8 (iBAQ intensity of 8.29E+07), the 3′-terminal protein of ORF1a (Table 1). The data suggest that the ORF1b C-terminal ORFs are translated and provide an additional source of nsp11 and nsp12.
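The argument in the preceding paragraph is a stoichiometric one: if nsp11 and nsp12 came only from the ORF1a/1b polyprotein, they should accumulate at roughly the ORF1b translation efficiency relative to nsp8. The back-of-the-envelope sketch below assumes that iBAQ intensities are roughly proportional to molar amounts and, as the text does, borrows the 15–20% EAV efficiency (38) for SHFV. The names `IBAQ`, `ORF1B_EFFICIENCY`, and `excess_over_polyprotein` are illustrative.

```python
# iBAQ intensities from Table 1.
IBAQ = {"nsp8": 8.29e7, "nsp11": 1.03e8, "nsp12": 2.58e8}

# ORF1b translation efficiency relative to ORF1a, as reported for EAV (ref. 38);
# applying it to SHFV is an assumption.
ORF1B_EFFICIENCY = (0.15, 0.20)


def excess_over_polyprotein(protein, reference="nsp8"):
    """Observed protein/reference ratio and how many times it exceeds the
    ratio expected if the protein came only from the ORF1a/1b polyprotein."""
    observed = IBAQ[protein] / IBAQ[reference]
    low, high = ORF1B_EFFICIENCY
    # Dividing by the highest expected ratio gives the most conservative excess.
    return observed, observed / high, observed / low


for name in ("nsp11", "nsp12"):
    ratio, min_excess, max_excess = excess_over_polyprotein(name)
    print(f"{name}/nsp8 = {ratio:.2f} ({min_excess:.0f}- to {max_excess:.0f}-fold above expectation)")
```

Under these assumptions nsp12/nsp8 comes out at about 3.1, roughly 16- to 21-fold above the polyprotein-only expectation of 0.15 to 0.20.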
Among the six previously unreported ORFs located in alternative reading frames encoded by newly identified sg mRNAs, only the product of ORF10,071–10,331 was detected by mass spectrometry, confirming its expression in infected cells (Table 1). ORF10,071–10,331 encodes an 86-aa protein from an alternative reading frame in the ORF1b region and is the largest of the proteins expressed from an alternative reading frame. The mass spectrometry analysis also detected all the previously known and predicted SHFV nonstructural proteins except for nsp6, which is only 13 aa in length and contains a trypsin cleavage site (Table 1). Evidence for both −1 and −2 frameshift activities at a site in the nsp2 region of the genome was previously reported for PRRSV (39). The nsp2 frameshift sequence is conserved in the SHFV genome. The mass spectrometry analysis detected the SHFV nsp2 −1 and −2 frameshift products and showed that the SHFV nsp2 −2 frameshift product (nsp2-2 TF, iBAQ intensity of 3.08E+07) is more abundant than the −1 frameshift product (nsp2-1 TF, iBAQ intensity of 3.76E+06) (Table 1). No peptides from any of the predicted ORFs in the SHFV minus-strand RNA were detected by the mass spectrometry analysis.
Discussion
The defining characteristic of members of the order Nidovirales is that the protein ORFs located at the 3′ end of the genome are expressed from a 3′ coterminal nested set of sg mRNAs, and the synthesis of the minus-strand templates for these sg mRNAs is regulated by TRSs. Eight body TRSs were initially identified in the SHFV genome, with each considered to be the TRS used for generating the sg RNA of one of the known structural proteins (24). We subsequently identified a body TRS and sg mRNA for a ninth structural protein, GP3′, by Northern blotting and RT-PCR (25). In addition to the nine SHFV sg mRNA bands detected by Northern blotting, whose sizes corresponded to the sizes predicted for the sg mRNAs of the known structural proteins (GP2′, GP3′, GP4′, GP2, GP3, GP4, GP5, M, and N), an additional uncharacterized band was detected between sg mRNA5 and sg mRNA6 with a 5′ leader probe (25). Although not characterized, a sg mRNA band at the same relative gel position was evident on multiple previously published Northern blots of EAV sg mRNAs, suggesting that this additional sg mRNA may also be produced in EAV-infected cells (31, 33, 40). In the present study, the additional band was also detected with probes specific for sg mRNA7, sg mRNA6, or sg mRNA5, confirming its location between sg mRNA5 and sg mRNA6. Leader–body junctions amplified from sg mRNAs produced in SHFV-infected MA104 cells using a 5′ leader primer and a 3′ body primer located upstream of TRS6 identified seven adjacent body TRSs between TRS5 and TRS6 and indicated that the broad band detected by Northern blotting was composed of multiple sg mRNAs of slightly different sizes.
Subsequent NGS analyses identified two more body TRSs between TRS5 and TRS6 for a total of nine functional TRSs in this region. The sg mRNAs produced from each of these nine TRSs were predicted to encode the same in-frame, C-terminal peptide of GP5 (ORF5-C-68aa). The start codon for this ORF is located downstream of the last GP5 transmembrane domain, so this protein would be expected to be located in the cytoplasm of infected cells. Although the expression of an ORF5-C peptide by an arterivirus has not previously been predicted or reported, two previous studies of the coronaviruses Severe acute respiratory syndrome virus and Infectious bronchitis virus identified a novel sg mRNA encoding an in-frame, C-terminal peptide of the virion spike protein (41, 42), and an in-frame truncated spike protein was also reported for Porcine respiratory coronavirus (43, 44). Compared with the other identified SHFV in-frame C-terminal peptides, ORF5-C-68aa and ORF6-C-79aa have much higher sg mRNA abundances, and these were the two that showed a decreased virus yield after mutation of their start codon, suggesting that ORF5-C-68aa and ORF6-C-79aa are functionally important during the virus replication cycle. However, the possibility that the conservative Met-to-Leu substitution in the full-length protein was responsible for the observed effect was not ruled out. Although no evidence of function in cell culture was obtained for the ORF4′-C-84aa, ORF2a-C-161aa, and ORF2a′-C-25aa proteins, the possibility that these peptides may be functionally relevant in infected animals has not been investigated.
The initial discovery of multiple functional TRSs for ORF5-C-68aa suggested the possibility that additional TRSs were used for the expression of sg mRNAs for some or all of the other SHFV 3′ ORFs. A total of 36 functional TRSs were subsequently identified by amplification, cloning, and sequencing, and another 51 were identified by NGS analysis of SHFV sg mRNAs from infected MA104 cells. The majority of these TRSs produced alternative sg mRNAs for known SHFV structural proteins. For five of the structural protein ORFs, the relative abundance of the sg mRNA produced from the previously published TRS was higher than that of the sg mRNAs produced from any of the newly identified alternative TRSs, confirming that the published TRS is the major TRS used. However, for GP3′, GP4′, GP4, and GP5, one of the newly identified alternative TRSs produced sg mRNAs at a higher abundance than the previously published TRS. The major TRSs for those proteins were likely not identified in the previous study due to the limited number of leader–body junction clones analyzed (24). Evidence from previous studies suggested that the genomes of other nidoviruses also contain functional alternative body TRSs for their structural proteins. One or two alternative body TRSs producing sg mRNAs encoding the structural proteins GP3, GP4, or GP5 were identified by amplification and cloning of a few sg mRNA leader–body junctions for PRRSV and EAV (40, 45). Although the relative abundance of the EAV sg mRNAs produced from these alternative body TRSs was lower than that from the major TRS, the combined amount produced from them was sufficient to generate infectious progeny virus when the major TRS was inactivated (although the virus yield was reduced) (40). Coronavirus genomes have also been shown to be able to use functional alternative body TRSs when the major body TRS of a 3′ ORF is mutated (46–48). 
However, the finding that, for the majority of the SHFV proteins detected by mass spectrometry, the relative protein level correlated well with the total sg mRNA abundance produced from multiple TRSs strongly suggests that the alternative body TRSs are not just reserve backups that function only to ensure virus survival when the primary TRS becomes mutated. Instead, the use of multiple alternative body TRSs for each ORF during each replication cycle appears to be the mechanism used to fine-tune the production of the optimal total amount of each viral protein, especially for those proteins that lack a major TRS with high transcription activity.
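Checking whether the previously published TRS is the dominant one for an ORF reduces to a per-ORF maximum over the TRS counts. The sketch below uses the first count column of the table above for ORF2b, ORF3, ORF4, ORF4′, and ORF5; GP3′ is not among these rows, so only three of the four cases named in the text can appear. The data layout and the function name `published_trs_not_dominant` are hypothetical.

```python
# Hypothetical layout: ORF -> list of (body TRS, count, previously published?).
TRS_COUNTS = {
    "ORF2b (E)": [("ctattaacc", 7029, True), ("tgggcaacc", 968, False)],
    "ORF3 (GP3)": [("ttctaaata", 8, False), ("tcactaacc", 1284, True),
                   ("TCTTTCACC", 230, False), ("GTTCTTTCAC", 38, False)],
    "ORF4 (GP4)": [("acttcaaca", 1465, False), ("atccaaacc", 847, False),
                   ("tcattgacc", 1053, True), ("TACTTCCCA", 48, False),
                   ("TTTTCAACG", 212, False), ("CTTCTGTCCA", 16, False),
                   ("TTGATAACG", 880, False), ("TTCGTAACA", 292, False),
                   ("CACTTCAAC", 101, False)],
    "ORF4' (GP4')": [("gccttaacc", 2032, False), ("cctttcacc", 818, True),
                     ("TCTTTAATT", 52, False), ("ACCTTTCAC", 52, False),
                     ("CCCTGAACT", 150, False)],
    "ORF5 (GP5)": [("acaacaacc", 124, False), ("cccataacc", 1884, False),
                   ("ttcttcgcc", 0, False), ("gtgataatc", 48, False),
                   ("ttgttatca", 57, False), ("atcataacc", 11553, False),
                   ("tccttaact", 4403, True), ("taattatgt", 11, False),
                   ("GCATCAACG", 109, False), ("TTCTTCACC", 247, False),
                   ("CTGTCGACC", 41, False)],
}


def published_trs_not_dominant(trs_counts):
    """Return {ORF: (top TRS, its share of the ORF total)} for every ORF whose
    most productive TRS is not the previously published one."""
    flagged = {}
    for orf, rows in trs_counts.items():
        top_trs, top_count, top_is_published = max(rows, key=lambda row: row[1])
        if not top_is_published:
            total = sum(count for _, count, _ in rows)
            flagged[orf] = (top_trs, top_count / total)
    return flagged


for orf, (trs, share) in published_trs_not_dominant(TRS_COUNTS).items():
    print(f"{orf}: {trs} gives {share:.0%} of the sg mRNA reads")
```

With these rows the function flags ORF4, ORF4′, and ORF5, and leaves ORF2b and ORF3, whose published TRSs remain dominant.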
After the discovery of the E ORF in arterivirus genomes and the prediction of the additional E′ ORF in the SHFV genome, it was proposed that SHFV sg mRNA2′ and sg mRNA2 bicistronically express GP2′/E′ and E/GP2, respectively (10, 34). In the present study, sg mRNAs were identified that encode E′ as the 5′ ORF. The total sg mRNA abundance estimated for GP2′ was ∼1.5 times higher than that for E′, which correlated with the relative protein amounts estimated by mass spectrometry indicating that GP2′ was ∼1.1 times more abundant than E′. These data suggest that the majority of E′ is expressed from the 5′ ORF of multiple E′ sg mRNAs. Multiple sg mRNAs were also identified with GP2 as the 5′ ORF. However, although the total abundance estimated for the GP2 sg mRNAs was ∼46 times lower than that for E, the protein abundance estimated for GP2 was comparable to that for E. This finding suggests that, although sg mRNAs expressing GP2 as the 5′ ORF are produced at low levels, the majority of GP2 produced is translated as a second ORF from the E sg mRNAs. Monocistronic sg mRNAs have also subsequently been identified for some coronavirus proteins, such as the Mouse hepatitis virus (MHV) E protein, that were initially thought to be expressed bicistronically (41, 49, 50). However, an in vitro translation study showed that the MHV E protein can also be expressed from the second ORF of a larger sg mRNA (50). The SHFV sg mRNA5 was previously predicted to express both GP5 and ORF5a. In the present study, separate ORF5a sg mRNAs were identified at the very low abundance of 0.002%. The lack of detection of ORF5a peptides by the mass spectrometry analysis was consistent with the low sg mRNA abundance and suggested that it is unlikely that ORF5a is expressed bicistronically from the highly abundant GP5 sg mRNAs.
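The bicistronic argument rests on comparing two fold changes for each ORF pair: the sg mRNA ratio and the protein ratio. A short sketch with the per-ORF sg mRNA percentages quoted earlier and the iBAQ intensities in Table 1 (the dictionary and function names are illustrative):

```python
# Per-ORF sg mRNA abundance (%) and iBAQ intensity (Table 1).
SG_MRNA_PERCENT = {"GP2'": 5.69, "E'": 3.65, "E": 4.14, "GP2": 0.09}
IBAQ = {"GP2'": 5.48e8, "E'": 5.01e8, "E": 1.25e9, "GP2": 9.88e8}


def fold_changes(first, second):
    """Return (sg mRNA fold, protein fold) of `first` relative to `second`."""
    return (SG_MRNA_PERCENT[first] / SG_MRNA_PERCENT[second],
            IBAQ[first] / IBAQ[second])


for first, second in (("GP2'", "E'"), ("E", "GP2")):
    mrna_fold, protein_fold = fold_changes(first, second)
    print(f"{first} vs {second}: sg mRNA {mrna_fold:.1f}x, protein {protein_fold:.1f}x")
```

For GP2′ versus E′ the two folds are close (about 1.56 and 1.09), whereas E versus GP2 is about 46-fold at the sg mRNA level but only about 1.27-fold at the protein level; that gap is what the text attributes to translation of GP2 from the E sg mRNAs.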
About a third of the body TRSs identified in the SHFV genome produced detectable levels of sg mRNAs with two or three variant leader–body junction sequences. The junction sequence of a sg mRNA is determined by the position at which the polymerase dissociates from the body TRS core sequence and by the location where the 3′ end of the nascent minus strand anneals to the leader TRS core sequence. Heterogeneity of the leader–body junction sequences generated from the same body TRS has also been observed for some coronavirus sg mRNAs (51, 52). Despite the complexity of SHFV sg mRNA production, the total abundance of sg mRNA generated at each body TRS was consistent at early and late times postinfection in two different cell types. Although the sg mRNAs generated at the 3′ terminal TRS7 were by far the most abundant, a serial decrease in sg mRNA abundance with increasing TRS distance from the 3′ end was not observed. Instead, multiple regions of high transcriptional activity flanked by regions of low transcriptional activity were detected. Previous studies showed that increasing the duplex stability between leader–body TRSs by site-directed mutagenesis enhanced sg mRNA synthesis but that duplex stability was not the only determinant (33, 47, 53, 54). In the present study, a linear correlation between the sg mRNA abundance and the stability of the duplex formed between the leader TRS and each of the seven nearby body TRSs for ORF5-C-68aa was not observed. These results are consistent with previous observations for both coronavirus and arterivirus genomes indicating that the local RNA secondary structural context of the different body TRSs as well as long-distance RNA interactions in the viral genome play important roles in determining the efficiency of sg mRNA production (40, 54–57).
Surprisingly, multiple functional body TRSs were also identified in the ORF1a/1b region of the SHFV genome, indicating that sg RNA transcription is not restricted to the 3′ structural protein region. A previous study of the coronavirus MHV also identified three functional body TRSs in the ORF1a region (58). Most of the TRSs in the SHFV ORF1a/1b region produced long sg mRNAs encoding different lengths of in-frame, C-terminal portions of ORF1b (truncated ORF1b). A previous study on the arterivirus LDV also identified a body TRS in ORF1b that produced a long sg mRNA encoding a truncated ORF1b (200 aa) (59). ORF1b is translated as part of an ORF1a/1b polyprotein after a −1 frameshift occurs near the end of ORF1a, and the 1b region is cleaved to produce nsp9 (RdRp), nsp10 (helicase), nsp11 (NendoU), and nsp12 (function unknown) (10). Except for nsp12-C, all the truncated SHFV ORF1b ORFs encode full-length nsp12. The longest truncated ORF1b ORF, ORF1b-C-1,021aa, encodes full-length nsp12, nsp11, and nsp10 as well as the C terminus of nsp9. The translation efficiency of EAV ORF1b was previously reported to be only 15–20% of that of ORF1a (38). The production of sg mRNAs encoding ORF1b proteins provides an alternative means for producing nsp10, nsp11, and especially nsp12. Consistent with this hypothesis, the mass spectrometry analysis detected three times more nsp12 (iBAQ intensity of 2.58E+08) than nsp8 (iBAQ intensity of 8.29E+07), which is cleaved from the 3′ end of the ORF1a polyprotein.
Studies on different coronaviruses identified additional functional body TRSs that produce sg mRNAs encoding accessory proteins or previously unidentified protein ORFs, some of which are virus-strain specific (41, 42, 51, 58, 60, 61). In the present study, additional functional body TRSs that produce sg mRNAs encoding six previously unidentified ORFs in an alternative reading frame were discovered. However, only the protein product expressed from ORF10,071–10,331, which is located within the ORF1b region, was detected by mass spectrometry analysis. This 86-aa product was the largest among those of the additional alternative reading frame ORFs and was predicted to be composed primarily of an α-helix, to have two transmembrane domains, and to contain a protein kinase C phosphorylation site (PredictProtein, https://www.predictprotein.org/). Five of the previously unreported body TRSs produced sg mRNAs encoding ORF15,419–15,505 (28 aa) from the +2 reading frame inside ORF7 (N), which is very similar to the recently identified small ORF7a of both type I and type II PRRSV that is also translated from the +2 reading frame inside ORF7 (N) (62). An alternative ORF inside the N gene encoding an accessory protein has also been identified in the genomes of many betacoronaviruses (63–65). The lack of detection of peptides from ORF15,419–15,505 and the other SHFV alternative frame ORFs by mass spectrometry could be due to small peptide size, low abundance, the location of trypsin cleavage sites, and/or posttranslational modifications preventing trypsin cleavage.
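The protein sizes quoted for the alternative-frame ORFs follow from their genome coordinates. Assuming each coordinate range runs from the first base of the start codon to the last base of the stop codon (an assumption that reproduces the 86-aa and 28-aa lengths stated in the text), the lengths of all six ORFs listed in the table above can be derived as follows; the function name `protein_length` is illustrative.

```python
# Genome coordinates (1-based, inclusive) of the six alternative-frame ORFs in the table above.
ALT_FRAME_ORFS = [
    (15419, 15505),
    (12407, 12628),
    (12191, 12271),
    (10071, 10331),
    (5920, 5997),
    (13699, 13836),
]


def protein_length(start, end):
    """Amino acids encoded by an ORF spanning start..end, assuming the range
    includes the stop codon."""
    n_nt = end - start + 1
    if n_nt % 3:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return n_nt // 3 - 1  # the stop codon encodes no amino acid


lengths = {orf: protein_length(*orf) for orf in ALT_FRAME_ORFS}
for (start, end), n_aa in sorted(lengths.items(), key=lambda item: -item[1]):
    print(f"ORF{start:,}-{end:,}: {n_aa} aa")
```

All six ranges are whole numbers of codons; ORF10,071–10,331 (86 aa) is the longest, and the others encode 25 to 73 aa.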
An additional functional arterivirus ORF1a frameshift site that primarily produces the −2 frameshift product nsp2TF was previously identified in the nsp2 region of the PRRSV genome. This site was predicted to be conserved in the SHFV LVR nsp2 region, and a −2 frameshift would produce a fusion protein consisting of the N-terminal two-thirds of nsp2 fused to an alternative 225-aa extension (39). Some −1 frameshift activity at the site in the PRRSV genome was also detected and produced a fusion protein (nsp2N) with a very short extension due to an immediate stop codon. Unlike PRRSV, the −1 frameshift in the SHFV LVR nsp2 produces a fusion protein with an alternative 77-aa extension. Peptides mapping to the unique extensions of the −1 or −2 nsp2 frameshift products were detected in SHFV-infected MA104 cells by mass spectrometry. Similar to the data obtained for PRRSV, the SHFV nsp2 −2 frameshift was more efficient than the −1 frameshift. The intracellular locations and biological functions of the SHFV nsp2 frameshift products are not known.
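How a −1 or −2 shift at one site changes the downstream product can be illustrated with a toy model in which the ribosome simply re-reads one or two nucleotides at the slip position and continues in the new frame. The sequence below is a made-up 24-nt example, not the SHFV or PRRSV nsp2 frameshift site, and the model ignores the slippery sequence and stimulatory elements that set frameshift efficiency in real viruses. The names `translate` and `frameshift_product` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate in frame 0 from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def frameshift_product(seq, slip_pos, shift):
    """Product when the ribosome moves back |shift| nt (shift = 0, -1 or -2)
    after decoding the first slip_pos nt (slip_pos must be a multiple of 3)."""
    if slip_pos % 3 or shift not in (0, -1, -2):
        raise ValueError("slip_pos must be in frame and shift must be 0, -1 or -2")
    # Re-reading the last |shift| nt is equivalent to splicing them in twice.
    return translate(seq[:slip_pos] + seq[slip_pos + shift:])


toy = "ATGGCTCCGTAGCTAAGCGGTTAA"  # hypothetical sequence, not from SHFV
for shift in (0, -1, -2):
    print(shift, frameshift_product(toy, 6, shift))
```

Here the 0 frame gives MAP, the −1 frame gives MASVAKRL (running to the end of the toy sequence), and the −2 frame gives MALRS: a shared N-terminal portion followed by frame-specific extensions of different lengths, which is the pattern described for the nsp2 frameshift products.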
The detection of functional body TRSs in the SHFV 5′ ORF1a/1b region suggests that nidovirus sg RNA transcription can occur within the 5′ region as well as within the 3′ region of the genome. The SHFV ORF1b sg mRNAs identified provide an alternative means of increasing the abundance of the ORF1b nonstructural proteins, especially nsp11 and nsp12. The identification of multiple functional body TRSs for all but four of the SHFV 3′ ORFs and the consistency of sg mRNA abundance produced from individual TRSs at different times during the infection cycle in two different cell types indicate that nidovirus transcription is precisely regulated and more complex than previously appreciated. The alternating regions of high and low transcription activity detected across the 3′ region of the SHFV genome strongly suggest a role for local secondary and genomic higher-order RNA structures in regulating nidovirus transcription. The discovery of many previously unreported in-frame C-terminal ORFs and alternative frame ORFs encoded by newly identified sg mRNAs indicates that the coding capacity of the SHFV genome and very likely that of other nidovirus genomes was previously underestimated. In fact, a recent gene-expression study of the coronavirus MHV that used ribosome profiling and RNA sequencing identified 15 previously unreported sg mRNAs with unique junction sequences at 5 hpi and detected heterogeneous leader–body junction sequences generated from the same body TRS (66). Translation initiation was also detected at internal AUG codons in the MHV ORF5 by Ribo-Seq. In addition, translation of several small, previously unidentified ORFs either upstream of or embedded within known viral protein ORFs was detected that was often initiated from a noncanonical start codon, such as CUG. These data predict an even bigger coding capacity for nidoviruses. The small MHV ORFs were proposed to regulate the translation of the downstream ORF in the same sg mRNA.
Some of the SHFV small C-terminal ORFs and alternative frame ORFs identified in the present study may also function as regulators of a downstream ORF. Alternatively, increasing evidence from different organisms suggests that small proteins, as short as 11 aa, translated from in-frame or alternative frame ORFs are functional and are involved in regulating a variety of cellular processes (67, 68).
Materials and Methods
Methods and sources of reagents can be found in SI Materials and Methods.
SI Materials and Methods
Virus.
The SHFVic LVR strain was constructed from cDNA amplified from the viral genome RNA as described previously (25). SHFV RNA (100 ng) transcribed from the infectious clone in vitro was transfected into MA104 cells, and culture fluid containing progeny virus was collected at 120 h after transfection. The progeny virus was passaged once by infecting MA104 cells at an MOI of 1, and the culture fluid was harvested at 24 hpi, clarified by centrifugation, aliquoted, and stored at –80 °C. The titer of the SHFVic stock was 1.3 × 10⁷ pfu/mL.
Cells.
MA104, an African green monkey embryonic kidney cell line, was a gift from O. Nianan, Centers for Disease Control and Prevention, Atlanta. MA104 cells were cultured in Eagle’s minimal essential medium (Gibco; Life Technologies) supplemented with 10% FBS, 1% l-glutamine, and 0.1% gentamicin and were maintained at 37 °C in a 5% CO2 atmosphere. Peripheral blood mononuclear cells (PBMCs) were isolated from whole blood collected from rhesus macaques (Yerkes Regional Primate Research Center) using Ficoll 400 (Mediatech Inc.) density gradient centrifugation according to standard protocols. The isolated monocytes were seeded in 24-well plates at a density of ∼10⁶ cells per well. After a 1-h attachment period, monocytes were gently washed with RPMI-1640 medium (Gibco; Life Technologies) and incubated in RPMI-1640 medium supplemented with 10% autologous serum and/or 10% FBS, 50 U/mL penicillin, 50 μg/mL streptomycin, and 25 ng/mL human recombinant macrophage colony-stimulating factor (R&D Systems) for 8 d at 37 °C in 5% CO2 to differentiate them into MΦs. Two-thirds of the culture medium was replaced with fresh growth medium every 3 d during differentiation.
Plaque Assay.
Culture fluid containing wild-type or mutant SHFV was collected at different times after infection of MA104 cells, clarified by centrifugation at 100 × g for 5 min at 4 °C, and subjected to serial 10-fold dilution. Diluted virus (100 µL per well) was adsorbed onto confluent MA104 monolayers in six-well plates for 1 h at 37 °C. The inoculum was removed and replaced with 2 mL of overlay medium (2× MEM containing 5% FCS mixed 1:1 with 1% SeaKem ME agarose). The plates were incubated at 37 °C for 48 h before removal of the agarose plug. Cells were stained with 0.05% crystal violet in 10% ethanol to visualize the plaques. Three biological repeats of each sample were each assayed in duplicate.
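Titers from this assay follow the usual plaque-count arithmetic: titer (pfu/mL) = mean plaque count ÷ (inoculum volume in mL × dilution). A minimal sketch for the 100-µL inoculum used here; the function name is illustrative, and the duplicate counts in the example are hypothetical, chosen only to show that a mean of 13 plaques at a 10⁻⁵ dilution would correspond to a titer of 1.3 × 10⁷ pfu/mL.

```python
from statistics import mean


def titer_pfu_per_ml(plaque_counts, dilution, inoculum_ml=0.1):
    """Titer from replicate wells at one dilution (e.g. 1e-5 for a 10^-5 dilution)."""
    return mean(plaque_counts) / (inoculum_ml * dilution)


# Hypothetical duplicate wells at the 10^-5 dilution.
print(f"{titer_pfu_per_ml([12, 14], 1e-5):.2e} pfu/mL")
```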
DIG-Labeled SHFV Probes.
High-fidelity PCR (AccuPrime, Invitrogen) was used to amplify regions of the SHFVic genome to generate templates for in vitro-synthesized RNA probes targeting different sg mRNAs. Three sets of primers were designed to amplify regions between TRS5 and TRS6, between TRS6 and TRS7, and between TRS7 and the 3′ UTR (Table S1). A T7 promoter was included in the reverse primer. The PCR product templates were transcribed in vitro in the presence of DIG-labeled UTPs using a DIG Northern Starter Kit (Roche) according to the manufacturer’s protocol. The concentrations of the DIG-labeled RNA probes were determined by a dot-blot assay using a DIG-labeled human actin RNA standard (Roche) according to the manufacturer’s protocol. Briefly, a standard curve was generated using serial dilutions of human actin RNA of known concentration. A 1-µL aliquot of each dilution was spotted onto an Amersham Hybond-N+ membrane (GE Healthcare) and UV cross-linked. The membrane was then blocked in DIG blocking buffer (Roche) and incubated with anti-DIG antibody at a 1:10,000 dilution. After washing with DIG washing buffer (Roche), the membrane was developed with CDP-Star (Roche) and imaged with an LAS4000 mini Luminescent Image Analyzer (GE Healthcare). The intensity of each spot on the membrane was measured using Multi Gauge V2.3 software (Fujifilm) and compared with the standard curve to estimate the concentration.
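Estimating a probe concentration from the dot blot is an inverse reading of a standard curve. The sketch below fits an ordinary least-squares line to spot intensity versus known standard concentration and inverts it for an unknown probe. It assumes a linear response over the range of the standards, the function names are illustrative, and the numbers are hypothetical placeholders rather than data from this study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_intensity(intensity, slope, intercept):
    """Invert the standard curve to estimate an unknown concentration."""
    return (intensity - intercept) / slope


# Hypothetical standards (e.g. pg per spot) and their measured spot intensities.
standards = [10, 30, 100, 300]
intensities = [25, 65, 205, 605]
slope, intercept = fit_line(standards, intensities)
print(concentration_from_intensity(405, slope, intercept))
```

If the chemiluminescent signal saturates at the high end, a log-log fit or a narrower range of standards would be the safer choice.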
Northern Blot Hybridization.
MA104 cells were grown in six-well plates to confluence and were infected with SHFVic at an MOI of 1. At different times after infection, cell lysates were harvested in TRI reagent (Molecular Research Center, Inc.) followed by extraction of total intracellular RNA. NorthernMax formaldehyde loading dye (Ambion) was added to 1 µg of extracted RNA, and the samples were incubated at 80 °C for 10 min. The denatured RNA was electrophoresed on a 1% formaldehyde agarose gel for 2.5 h at 100 V. An RNA ladder (Millennium Markers-Formamide; Ambion) was used as a size marker. After overnight capillary transfer to an Amersham Hybond-N+ membrane (GE Healthcare) and UV cross-linking of the transferred RNA to the membrane, the lane containing the RNA ladder was cut from the membrane and stained with methylene blue. The rest of the membrane was prehybridized in DIG Easy Hyb buffer (Roche) at 68 °C for 30 min, followed by overnight hybridization with individual DIG-labeled, denatured RNA probes (100 ng/mL) at 68 °C. The membrane was then washed with a low-stringency buffer containing 2× SSC plus 0.1% SDS at room temperature followed by a wash with a high-stringency buffer containing 0.1× SSC plus 0.1% SDS at 68 °C. The membrane was then blocked in DIG blocking solution (Roche), incubated with anti-DIG antibody diluted 1:10,000 (Roche), developed with CDP-Star (Roche), and imaged with an LAS4000 mini Luminescent Image Analyzer (GE Healthcare).
Analysis of the Leader–Body Junction Sequences.
One forward and 10 reverse RT-PCR primers were designed to amplify the leader–body junctions and flanking sequences of individual sg mRNAs generated from known or predicted body TRSs in the 3′ region of the SHFV genome (Table S2). The forward primer targeted a region in the 5′ leader (genome nucleotides 61–79). The reverse primers targeted sequences downstream of each known or predicted 3′ body TRS, with the 5′-most reverse primer located 418 nt downstream of TRS2′. MA104 cells were infected with SHFVic at an MOI of 1, and cell lysates were harvested in TRI reagent at 24 hpi. Total intracellular RNA was extracted from the cell lysate and subjected to high-fidelity RT-PCR. The RT-PCR products were separated on a 2% agarose gel, stained with ethidium bromide, and imaged with a BioDoc-It imaging system (UVP, LLC). The remainder of the RT-PCR product was run on a 2% agarose gel and stained with ethidium bromide. Bands visualized under long-wavelength UV light were excised, and the DNAs were eluted, ligated with pCR4-TOPO DNA (Invitrogen), and transformed into TOP10 chemically competent cells (Invitrogen), followed by plating on LB plates containing 50 µg/mL kanamycin and overnight growth at 37 °C. Forty colonies were picked randomly from the plates for the regions between TRS2′ and TRS4′ and between TRS5 and TRS6, and 20 colonies were picked randomly for each of the other eight regions. The picked colonies were grown overnight in liquid culture. Plasmid DNA was extracted, digested with EcoRI, separated on a 2% agarose gel, and imaged with a BioDoc-It imaging system. For each leader–body junction region, plasmid DNAs carrying inserts of different sizes were sequenced using a T3 primer and were analyzed using DNASTAR Lasergene software to identify leader–body junctions. The body TRS core sequences were identified after alignment of the leader–body junction sequences to the 5′ leader and to the 3′ genomic sequence.
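Locating the body TRS core in a cloned leader–body junction can be sketched as two alignments: extend the match to the 5′ leader from the start of the read, and extend the match to the genome backward from a unique anchor near the end of the read. Bases that agree with both the leader and the body lie where the template switch occurred, and they mark the TRS core. The sequences below are short hypothetical examples, not SHFV sequence. The sketch also assumes ungapped alignments, a read that starts at the first base of the supplied leader, and a unique anchor; `locate_trs_core` is an illustrative name, not the DNASTAR Lasergene procedure.

```python
def locate_trs_core(read, leader, body, anchor_len=5):
    """Return (core, body_start, body_end) for a leader-body junction read,
    with body_start/body_end as 0-based, end-exclusive positions in `body`.
    Return None if the anchor is missing or the two alignments do not meet."""
    # Length of the read prefix that matches the leader base for base.
    leader_end = 0
    while (leader_end < min(len(read), len(leader))
           and read[leader_end] == leader[leader_end]):
        leader_end += 1

    # Place the read on the body using its last anchor_len bases.
    anchor_start = len(read) - anchor_len
    body_anchor = body.find(read[anchor_start:])
    if body_anchor < 0:
        return None
    offset = body_anchor - anchor_start  # body index = read index + offset

    # Extend the body alignment toward the 5' end of the read.
    body_start_in_read = anchor_start
    while (body_start_in_read > 0 and body_start_in_read - 1 + offset >= 0
           and read[body_start_in_read - 1] == body[body_start_in_read - 1 + offset]):
        body_start_in_read -= 1

    if body_start_in_read > leader_end:
        return None  # a gap between the alignments: not a simple junction
    core = read[body_start_in_read:leader_end]
    return core, body_start_in_read + offset, leader_end + offset


leader = "AAAAGCTTAACCGT"        # hypothetical leader with a TTAACC core
body = "CCCCCTATTAACCTGGGG"      # hypothetical genomic region carrying a body copy
read = "AAAAGCTTAACCTGGGG"       # hypothetical leader-body junction read
print(locate_trs_core(read, leader, body))
```

A mismatch inside the core, as in the heterogeneous junctions described in the Discussion, would stop the two alignments at different points and shrink or remove the shared window, so real junction reads need mismatch-tolerant, gapped alignment.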
For each of the seven body TRSs for ORF5-C-68aa, a 12-nt region that could form a duplex with the leader TRS was identified, and the stability of each duplex was calculated using ViennaRNA Package 2.3.5 (https://www.tbi.univie.ac.at/RNA/).
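The duplex in question forms between the sense leader TRS and the antisense copy of a body TRS in the nascent minus strand. As a rough, energy-free proxy for how such a 12-nt region can be located, the sketch below slides a 12-nt window along a body TRS region and counts Watson-Crick and G·U pairs with the leader TRS. It is not the ViennaRNA calculation described above: pair counts ignore stacking and loop energies, so duplex free energies would still come from a tool such as ViennaRNA's RNAduplex. The function names and sequences are hypothetical.

```python
# RNA base pairs allowed in the duplex: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def duplex_pairs(leader_trs, body_window):
    """Count pairs between the sense leader TRS and the antisense copy of body_window.

    The antisense body TRS is the reverse complement of body_window; laid antiparallel
    against the leader TRS, leader_trs[i] faces the complement of body_window[i].
    Both inputs are RNA strings (A, C, G, U) of equal length.
    """
    if len(leader_trs) != len(body_window):
        raise ValueError("sequences must have equal length")
    return sum((a, COMPLEMENT[b]) in PAIRS for a, b in zip(leader_trs, body_window))


def best_duplex_window(leader_trs, body_region, width=12):
    """Return (start, pairs) for the width-nt window of body_region with the most pairs.

    Ties go to the most 5' window.
    """
    if len(leader_trs) != width:
        raise ValueError("leader TRS must be exactly `width` nt long")
    best_start, best_pairs = None, -1
    for start in range(len(body_region) - width + 1):
        pairs = duplex_pairs(leader_trs, body_region[start:start + width])
        if pairs > best_pairs:
            best_start, best_pairs = start, pairs
    return best_start, best_pairs
```

A window with many pairs is only a candidate; two windows with the same pair count can differ considerably in nearest-neighbor free energy.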
NGS of SHFV sg mRNAs.
MA104 cells grown in T-75 flasks to ∼100% confluence were infected with SHFVic at an MOI of 1. At 8 and 18 hpi, culture fluid was removed, and cell lysates were harvested in TRI reagent (Molecular Research Center) according to the manufacturer’s protocol and sent to the University of North Carolina, Chapel Hill, for library construction using an Illumina TruSeq Stranded mRNA Sample Preparation Kit (Illumina). Briefly, mRNA was obtained by poly(A) selection. The quality and quantity of the selected mRNA were assessed using an Agilent RNA 6000 Nano Kit (Agilent Technologies), and the mRNA was then fragmented and subjected to RT-PCR using random hexamers, followed by end repair and adenylation. The 3′-adenylated dsDNAs were ligated to Illumina adapters and amplified by PCR. The final dsDNA library was validated with an Agilent High Sensitivity DNA Kit (Agilent Technologies) and subjected to RNA-seq on an Illumina HiSeq (paired-end reads, 100-bp read length) with one sample per lane. Libraries from three biological replicates for each time point were prepared and sequenced.
Macaque MΦs differentiated from isolated PBMCs and grown to ∼80% confluence in 24-well plates were mock-infected or infected with SHFVic at an MOI of 1. At 7 and 16 hpi, culture fluid was removed, and cell lysates from four replicate wells were harvested in TRI reagent and combined. Cellular RNA was extracted following the manufacturer’s protocol and sent to the Georgia Genomics Facility of the University of Georgia for library preparation using an NGS Stranded RNA Library Preparation Kit (KAPA Biosystems). After mRNA was selected from the mock- and SHFVic-infected macaque MΦ RNA samples by poly(A) selection, libraries were prepared and subjected to RNA-seq on an Illumina MiSeq (paired-end reads, 75-bp read length) with four samples pooled per lane.
NGS Data Analysis Workflow.
CLC Genomics Workbench 8.5.1 software was used to analyze the RNA-seq data. The following four sequences were imported as references: the SHFV LVR full-length genome (NC_003092.2), the same SHFV genome without the 5′ end (genome nucleotides 250–15,717), a 15-nt leader sequence (genome nucleotides 186–200), and the downstream 15-nt leader TRS/ORF1a sequence (genome nucleotides 201–215). The 15-nt leader–body junction sequences for each of the nine previously identified and 36 newly identified sg mRNAs were also imported as references; each of these consisted of the 5′ leader TRS core sequence fused to an individual body TRS region sequence. The Illumina HiSeq and MiSeq fastq reads were trimmed before alignment to the SHFV genome using the default settings. All reads that aligned to the SHFV genome (30–50% of the total cellular RNA reads) were then extracted and mapped again to the 15-nt leader sequence (genome nucleotides 186–200) using a high-stringency setting (∼85% identity required). The reads mapping to the leader sequence were then mapped to the downstream 15-nt region (genome nucleotides 201–215), which contains both the 5′ leader TRS core sequence and the beginning of ORF1a, using the highest-stringency setting (100% identity required). The reads that did not map to this region were collected and subjected to a workflow that sequentially mapped them to each known 15-nt sg mRNA leader–body junction using the highest-stringency setting. The remaining reads that contained the 5′ leader sequence but failed to map to any of the identified leader–body junctions were then collected and mapped to the SHFV genome sequence without the 5′ end (nucleotides 250–15,717) using the default settings. Reads that mapped to a region of the SHFV genome were then manually searched for previously unidentified leader–body junction sequences.
Once a unique leader–body junction sequence (15 nt) was identified, it was used as a reference for read mapping to determine the number of reads containing this sequence.
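The filtering cascade described above can be summarized as a sequence of tests applied to each read. The sketch below is a simplified, hypothetical stand-in for the CLC Genomics Workbench workflow: it uses exact substring matching instead of read mapping with identity thresholds (so the ∼85% leader step becomes 100%), assumes reads are already in the sense orientation, and uses toy 15-nt sequences rather than SHFV coordinates.

```python
from collections import Counter


def classify_reads(reads, leader, leader_orf1a, junctions):
    """Exact-match sketch of the sequential leader/junction mapping steps.

    reads: iterable of read sequences (assumed sense orientation)
    leader: 15-nt leader sequence (genome nt 186-200 in the text)
    leader_orf1a: 15-nt leader TRS/ORF1a sequence (genome nt 201-215 in the text)
    junctions: dict of name -> 15-nt leader-body junction, tested in insertion order
    Returns (counts, unassigned).
    """
    counts = Counter()
    unassigned = []
    for read in reads:
        if leader not in read:
            continue  # no leader: uninformative about sg mRNA junctions
        if leader_orf1a in read:
            counts["genomic"] += 1  # leader runs straight into ORF1a: genome-length RNA
            continue
        for name, junction in junctions.items():
            if junction in read:
                counts[name] += 1  # first matching junction wins, as in sequential mapping
                break
        else:
            unassigned.append(read)  # leader present but no known junction: search manually
    return counts, unassigned
```

Reads left in `unassigned` correspond to the pool that was mapped to the genome without its 5′ end and searched for new junctions; each new 15-nt junction would then be added to `junctions` and the reads recounted.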
The number of reads mapping to each identified leader–body junction was used as an estimate of the transcription level of the sg mRNA containing that unique junction sequence. Because a single body TRS can generate sg mRNAs with alternative leader–body junction sequences, the total number of reads mapping to all the junction sequences generated from the same body TRS was used to estimate the total sg mRNA transcription level from that TRS. The relative sg mRNA abundance for each TRS was calculated by dividing the total number of reads for all the sg mRNAs produced from that TRS by the total number of mapped reads containing leader–body junction sequences. To estimate the relative sg mRNA abundance for each known structural protein or previously unreported ORF encoded by a newly identified sg mRNA, the relative abundances of all body TRSs producing sg mRNAs that encode the same ORF were summed. The fold changes in transcription level and sg mRNA abundance between the earlier and later times postinfection were also calculated for both cell types.
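The arithmetic in this paragraph has four steps: pool alternative junctions per TRS, normalize to all junction-containing reads, sum TRSs that share an ORF, and take late/early ratios. The sketch below writes these out; the dictionary layout and identifiers (`J1a`, `TRS-x`, `ORF-A`) are hypothetical, and the fold change is assumed to be the later value divided by the earlier one.

```python
from collections import defaultdict


def reads_per_trs(junction_counts, junction_to_trs):
    """Sum reads over all alternative leader-body junctions made from the same body TRS."""
    totals = defaultdict(int)
    for junction, n in junction_counts.items():
        totals[junction_to_trs[junction]] += n
    return dict(totals)


def relative_abundance(junction_counts, junction_to_trs):
    """Reads per TRS divided by all reads containing any leader-body junction."""
    total = sum(junction_counts.values())
    per_trs = reads_per_trs(junction_counts, junction_to_trs)
    return {trs: n / total for trs, n in per_trs.items()}


def abundance_per_orf(trs_abundance, trs_to_orf):
    """Add up the abundances of all TRSs whose sg mRNAs encode the same ORF."""
    totals = defaultdict(float)
    for trs, value in trs_abundance.items():
        totals[trs_to_orf[trs]] += value
    return dict(totals)


def fold_change(early, late):
    """late / early for every key with a nonzero early value."""
    return {k: late.get(k, 0) / v for k, v in early.items() if v}
```

For example, if two junctions from one TRS have 30 and 10 reads out of 100 junction-containing reads in total, that TRS's relative abundance is 0.4.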
Construction of Mutant SHFV Infectious Clones.
Five sets of primers were designed to mutate the start codon of each identified in-frame C-terminal ORF without changing the amino acid sequence translated from the overlapping ORF (Table S4). Each primer set was used to introduce a single-nucleotide substitution into the appropriate SHFVic fragment (fragment III for ∆GP2′-C, fragment IV for ∆GP4′-C and ∆GP2-C, and fragment V for ∆GP5-C and ∆GP6-C) (25) using a QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent Technologies) according to the manufacturer’s protocol. After PflMI digestion, each sequence-verified mutant fragment was ligated to the other four wild-type SHFV genome fragments and cloned into a linearized pACYC184 vector. Mutant SHFV genome RNA was then transcribed in vitro from each linearized full-length mutant clone, and ∼100 ng of purified mutant RNA was transfected into subconfluent MA104 cells in a six-well plate. At 120 h after transfection, culture fluid (500 μL) was collected and used to infect a confluent MA104 monolayer in a 10-cm dish. At 28 hpi, culture fluid was collected from the dish, clarified by centrifugation at 100 × g for 5 min at 4 °C, aliquoted, and stored at −80 °C as a P1 mutant SHFV stock. Part of the clarified supernatant was used to extract mutant SHFV genome RNA, and the mutated regions were amplified by RT-PCR and sequenced to verify that the mutation had been retained. Each P1 virus stock was also titered by plaque assay on MA104 cells.
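Choosing a start-codon knockout that is silent in the overlapping ORF is a small search over the nine single-nucleotide substitutions inside the ATG. The sketch below uses the standard genetic code and hypothetical 0-based coordinates to list the substitutions that leave the overlapping-frame amino acid unchanged. It assumes the overlapping ORF is read in a different frame from the ATG; in the same frame no such substitution exists, because ATG is the only methionine codon. This is an illustration, not the primer-design procedure behind Table S4.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(CODONS, AMINO_ACIDS))  # standard genetic code, '*' = stop


def silent_start_knockouts(seq, atg_pos, overlap_frame):
    """List substitutions that destroy the ATG at atg_pos but are silent in the overlapping ORF.

    seq: DNA sequence (A, C, G, T), 0-based coordinates
    atg_pos: position of the A of the C-terminal ORF's ATG
    overlap_frame: 0, 1 or 2, the offset of the overlapping ORF's codons within seq
    Returns a list of (position, ref_base, alt_base) tuples.
    """
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at atg_pos")
    hits = []
    for pos in range(atg_pos, atg_pos + 3):
        # First base of the overlapping-frame codon that contains this position.
        start = pos - (pos - overlap_frame) % 3
        if start < 0 or start + 3 > len(seq):
            continue  # codon runs off the sequence; cannot check it
        ref_codon = seq[start:start + 3]
        for alt in "ACGT":
            if alt == seq[pos]:
                continue
            mutated = seq[:pos] + alt + seq[pos + 1:]
            if CODE[mutated[start:start + 3]] == CODE[ref_codon]:
                hits.append((pos, seq[pos], alt))
    return hits
```

In the toy sequence `CTATGC`, with the ATG at position 2 and the overlapping ORF in frame 0, only changes to the A are silent, because that A is the wobble base of the overlapping `CTA` (Leu) codon. A practical design would also check that the new codon does not create an alternative start such as CTG or GTG.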
Mass Spectrometry Analysis of SHFV Proteins.
MA104 cells were grown to confluence in a six-well plate and then infected with SHFVic at an MOI of 3. At 20 hpi, cell lysates were collected in RIPA buffer (1× PBS, 1% Nonidet P-40, 0.5% sodium deoxycholate, and 0.1% SDS) containing 1× Halt protease inhibitor mixture (Thermo Scientific). The total protein content of each cell lysate was measured using a BCA assay (Pierce) according to the manufacturer’s protocol. Approximately 28 µg of total protein was loaded onto one lane of a 12% NuPAGE Bis-Tris gel and electrophoresed ∼1 cm into the gel in NuPAGE MOPS SDS running buffer (Life Technologies). Precision Plus Protein Standards (Bio-Rad) were run in a separate lane of the same gel. After the proteins were stained with a colloidal blue staining kit (Life Technologies), the gel was shipped to the Wistar Institute Proteomics and Metabolomics Facility, Philadelphia. The gel region containing proteins from 0 to 75 kDa was excised as a single band, the proteins were digested with trypsin, and the tryptic peptides were analyzed by LC-MS/MS on a Q Exactive Plus mass spectrometer (Thermo Fisher Scientific). Peptide sequences were identified using MaxQuant 1.5.2.8 (69). MS/MS spectra were searched against a UniProt Chlorocebus database (August 2016) combined with a custom SHFV database containing all predicted ORFs (≥20 aa) from all three reading frames of both the plus- and minus-strand viral RNAs. The search parameters included full tryptic specificity with up to two missed cleavages, static carbamidomethylation of Cys, variable oxidation of Met, and protein N-terminal acetylation. Consensus identification lists were generated with false-discovery rates of 1% at the protein and peptide levels. The detected SHFV proteins were quantified from the identified razor plus unique peptides using the iBAQ method (70).
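Two computational steps in this paragraph benefit from being spelled out: building the custom viral database from ORFs of at least 20 aa in all six frames, and iBAQ quantification, which divides a protein's summed peptide intensity by its number of theoretically observable tryptic peptides. The sketch below is illustrative only, since MaxQuant performs its own digestion and iBAQ computation. The following are assumptions rather than details from the text: ORFs run from ATG to a stop codon, trypsin does not cleave before proline (some search settings ignore this rule), and observable peptides are fully cleaved peptides of 6 to 30 residues.

```python
BASES = "TCAG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES),
                "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))
COMP = str.maketrans("ACGT", "TGCA")


def six_frame_orfs(seq, min_aa=20):
    """ATG-to-stop ORFs of at least min_aa residues in all six frames of a DNA sequence.

    Returns (strand, frame, protein) tuples; ORFs without a stop codon are ignored.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", seq.translate(COMP)[::-1])):
        for frame in range(3):
            protein = "".join(CODE.get(s[i:i + 3], "X") for i in range(frame, len(s) - 2, 3))
            start = None
            for i, aa in enumerate(protein):
                if start is None and aa == "M":
                    start = i  # first ATG after the previous stop opens an ORF
                elif aa == "*" and start is not None:
                    if i - start >= min_aa:
                        orfs.append((strand, frame, protein[start:i]))
                    start = None
    return orfs


def tryptic_fragments(protein):
    """Cleave after K or R unless the next residue is P (assumed trypsin rule)."""
    frags, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            frags.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        frags.append(protein[start:])
    return frags


def tryptic_peptides(protein, max_missed=2):
    """All fully tryptic peptides with up to max_missed missed cleavages."""
    frags = tryptic_fragments(protein)
    return ["".join(frags[i:j + 1])
            for i in range(len(frags))
            for j in range(i, min(i + max_missed + 1, len(frags)))]


def ibaq(summed_intensity, protein, min_len=6, max_len=30):
    """Summed peptide intensity divided by the number of theoretically observable peptides."""
    observable = [p for p in tryptic_fragments(protein) if min_len <= len(p) <= max_len]
    return summed_intensity / len(observable) if observable else float("nan")
```

Because the iBAQ denominator grows with protein length, the value is closer to a molar estimate than raw summed intensity, which is why it is suited to comparing the abundance of different viral proteins in one sample.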
Acknowledgments
We thank B. Stockman and F. Ede for technical assistance and M. Basu for help with the final figure files. The research was supported by Public Health Service Research Grants AI073824 (to M.A.B.), National Institute of Allergy and Infectious Diseases, NIH Grant U19 AI107810 (to R.S.B.), and by a Georgia State University Molecular Basis of Disease Seed Grant (to H.D.). H.D. was supported by a Georgia State University Molecular Basis of Disease Fellowship.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1706696114/-/DCSupplemental.
References
1. Pasternak AO, Spaan WJ, Snijder EJ. Nidovirus transcription: How to make sense...? J Gen Virol. 2006;87:1403–1421. doi: 10.1099/vir.0.81611-0.
2. Gorbalenya AE, Enjuanes L, Ziebuhr J, Snijder EJ. Nidovirales: Evolving the largest RNA virus genome. Virus Res. 2006;117:17–37. doi: 10.1016/j.virusres.2006.01.017.
3. Zirkel F, et al. Identification and characterization of genetically divergent members of the newly established family Mesoniviridae. J Virol. 2013;87:6346–6358. doi: 10.1128/JVI.00416-13.
4. London WT. Epizootiology, transmission and approach to prevention of fatal simian haemorrhagic fever in rhesus monkeys. Nature. 1977;268:344–345. doi: 10.1038/268344a0.
5. Lauck M, et al. Exceptional simian hemorrhagic fever virus diversity in a wild African primate community. J Virol. 2013;87:688–691. doi: 10.1128/JVI.02433-12.
6. Vatter HA, et al. A simian hemorrhagic fever virus isolate from persistently infected baboons efficiently induces hemorrhagic fever disease in Japanese macaques. Virology. 2015;474:186–198. doi: 10.1016/j.virol.2014.10.018.
7. Bailey AL, et al. Two novel simian arteriviruses in captive and wild baboons (Papio spp.). J Virol. 2014;88:13231–13239. doi: 10.1128/JVI.02203-14.
8. Allen AM, Palmer AE, Tauraso NM, Shelokov A. Simian hemorrhagic fever. II. Studies in pathology. Am J Trop Med Hyg. 1968;17:413–421. doi: 10.4269/ajtmh.1968.17.413.
9. Brinton MA, Di H, Vatter HA. Simian hemorrhagic fever virus: Recent advances. Virus Res. 2015;202:112–119. doi: 10.1016/j.virusres.2014.11.024.
10. Snijder EJ, Kikkert M, Fang Y. Arterivirus molecular biology and pathogenesis. J Gen Virol. 2013;94:2141–2163. doi: 10.1099/vir.0.056341-0.
11. Snijder EJ, Meulenberg JJM. The molecular biology of arteriviruses. J Gen Virol. 1998;79:961–979. doi: 10.1099/0022-1317-79-5-961.
12. de Vries AA, Post SM, Raamsman MJ, Horzinek MC, Rottier PJ. The two major envelope proteins of equine arteritis virus associate into disulfide-linked heterodimers. J Virol. 1995;69:4668–4674. doi: 10.1128/jvi.69.8.4668-4674.1995.
13. Wieringa R, et al. Structural protein requirements in equine arteritis virus assembly. J Virol. 2004;78:13019–13027. doi: 10.1128/JVI.78.23.13019-13027.2004.
14. Snijder EJ, Dobbe JC, Spaan WJ. Heterodimerization of the two major envelope proteins is essential for arterivirus infectivity. J Virol. 2003;77:97–104. doi: 10.1128/JVI.77.1.97-104.2003.
15. Dea S, Gagnon CA, Mardassi H, Pirzadeh B, Rogan D. Current knowledge on the structural proteins of porcine reproductive and respiratory syndrome (PRRS) virus: Comparison of the North American and European isolates. Arch Virol. 2000;145:659–688. doi: 10.1007/s007050050662.
16. Doan DN, Dokland T. Structure of the nucleocapsid protein of porcine reproductive and respiratory syndrome virus. Structure. 2003;11:1445–1451. doi: 10.1016/j.str.2003.09.018.
17. Dokland T. The structural biology of PRRSV. Virus Res. 2010;154:86–97. doi: 10.1016/j.virusres.2010.07.029.
18. Tian D, et al. Arterivirus minor envelope proteins are a major determinant of viral tropism in cell culture. J Virol. 2012;86:3701–3712. doi: 10.1128/JVI.06836-11.
19. Das PB, et al. The minor envelope glycoproteins GP2a and GP4 of porcine reproductive and respiratory syndrome virus interact with the receptor CD163. J Virol. 2010;84:1731–1740. doi: 10.1128/JVI.01774-09.
20. Lee C, Yoo D. The small envelope protein of porcine reproductive and respiratory syndrome virus possesses ion channel protein-like properties. Virology. 2006;355:30–43. doi: 10.1016/j.virol.2006.07.013.
21. Firth AE, et al. Discovery of a small arterivirus gene that overlaps the GP5 coding sequence and is important for virus production. J Gen Virol. 2011;92:1097–1106. doi: 10.1099/vir.0.029264-0.
22. Johnson CR, Griggs TF, Gnanandarajah J, Murtaugh MP. Novel structural protein in porcine reproductive and respiratory syndrome virus encoded by an alternative ORF5 present in all arteriviruses. J Gen Virol. 2011;92:1107–1116. doi: 10.1099/vir.0.030213-0.
23. Vatter HA, et al. Functional analyses of the three simian hemorrhagic fever virus nonstructural protein 1 papain-like proteases. J Virol. 2014;88:9129–9140. doi: 10.1128/JVI.01020-14.
24. Godeny EK, de Vries AA, Wang XC, Smith SL, de Groot RJ. Identification of the leader-body junctions for the viral subgenomic mRNAs and organization of the simian hemorrhagic fever virus genome: Evidence for gene duplication during arterivirus evolution. J Virol. 1998;72:862–867. doi: 10.1128/jvi.72.1.862-867.1998.
25. Vatter HA, Di H, Donaldson EF, Baric RS, Brinton MA. Each of the eight simian hemorrhagic fever virus minor structural proteins is functionally important. Virology. 2014;462–463:351–362, and erratum (2014) 464–465:461. doi: 10.1016/j.virol.2014.06.001.
26. Smits SL, et al. Torovirus non-discontinuous transcription: Mutational analysis of a subgenomic mRNA promoter. J Virol. 2005;79:8275–8281. doi: 10.1128/JVI.79.13.8275-8281.2005.
27. Sawicki SG, Sawicki DL, Siddell SG. A contemporary view of coronavirus transcription. J Virol. 2007;81:20–29. doi: 10.1128/JVI.01358-06.
28. Van Den Born E, Gultyaev AP, Snijder EJ. Secondary structure and function of the 5′-proximal region of the equine arteritis virus RNA genome. RNA. 2004;10:424–437. doi: 10.1261/rna.5174804.
29. van den Born E, Posthuma CC, Gultyaev AP, Snijder EJ. Discontinuous subgenomic RNA synthesis in arteriviruses is guided by an RNA hairpin structure located in the genomic leader region. J Virol. 2005;79:6312–6324. doi: 10.1128/JVI.79.10.6312-6324.2005.
30. de Vries AA, et al. All subgenomic mRNAs of equine arteritis virus contain a common leader sequence. Nucleic Acids Res. 1990;18:3241–3247. doi: 10.1093/nar/18.11.3241.
31. van Marle G, et al. Arterivirus discontinuous mRNA transcription is guided by base pairing between sense and antisense transcription-regulating sequences. Proc Natl Acad Sci USA. 1999;96:12056–12061. doi: 10.1073/pnas.96.21.12056.
32. den Boon JA, Kleijnen MF, Spaan WJ, Snijder EJ. Equine arteritis virus subgenomic mRNA synthesis: Analysis of leader-body junctions and replicative-form RNAs. J Virol. 1996;70:4291–4298. doi: 10.1128/jvi.70.7.4291-4298.1996.
33. Pasternak AO, van den Born E, Spaan WJ, Snijder EJ. Sequence requirements for RNA strand transfer during nidovirus discontinuous subgenomic RNA synthesis. EMBO J. 2001;20:7220–7228. doi: 10.1093/emboj/20.24.7220.
34. Snijder EJ, van Tol H, Pedersen KW, Raamsman MJ, de Vries AA. Identification of a novel structural protein of arteriviruses. J Virol. 1999;73:6335–6345. doi: 10.1128/jvi.73.8.6335-6345.1999.
35. Zeng L, Godeny EK, Methven SL, Brinton MA. Analysis of simian hemorrhagic fever virus (SHFV) subgenomic RNAs, junction sequences, and 5′ leader. Virology. 1995;207:543–548. doi: 10.1006/viro.1995.1114.
36. Smith SL, Wang X, Godeny EK. Sequence of the 3′ end of the simian hemorrhagic fever virus genome. Gene. 1997;191:205–210. doi: 10.1016/S0378-1119(97)00061-9.
37. Godeny EK, Zeng L, Smith SL, Brinton MA. Molecular characterization of the 3′ terminus of the simian hemorrhagic fever virus genome. J Virol. 1995;69:2679–2683. doi: 10.1128/jvi.69.4.2679-2683.1995.
38. den Boon JA, et al. Equine arteritis virus is not a togavirus but belongs to the coronaviruslike superfamily. J Virol. 1991;65:2910–2920. doi: 10.1128/jvi.65.6.2910-2920.1991.
39. Fang Y, et al. Efficient −2 frameshifting by mammalian ribosomes to synthesize an additional arterivirus protein. Proc Natl Acad Sci USA. 2012;109:E2920–E2928. doi: 10.1073/pnas.1211145109.
40. Pasternak AO, Gultyaev AP, Spaan WJ, Snijder EJ. Genetic manipulation of arterivirus alternative mRNA leader-body junction sites reveals tight regulation of structural protein expression. J Virol. 2000;74:11642–11653. doi: 10.1128/jvi.74.24.11642-11653.2000.
41. Hussain S, et al. Identification of novel subgenomic RNAs and noncanonical transcription initiation signals of severe acute respiratory syndrome coronavirus. J Virol. 2005;79:5288–5295. doi: 10.1128/JVI.79.9.5288-5295.2005.
42. Bentley K, Keep SM, Armesto M, Britton P. Identification of a noncanonically transcribed subgenomic mRNA of infectious bronchitis virus and other gammacoronaviruses. J Virol. 2013;87:2128–2136. doi: 10.1128/JVI.02967-12.
43. Vaughn EM, Halbur PG, Paul PS. Sequence comparison of porcine respiratory coronavirus isolates reveals heterogeneity in the S, 3, and 3-1 genes. J Virol. 1995;69:3176–3184. doi: 10.1128/jvi.69.5.3176-3184.1995.
44. Callebaut P, Correa I, Pensaert M, Jiménez G, Enjuanes L. Antigenic differentiation between transmissible gastroenteritis virus of swine and a related porcine respiratory coronavirus. J Gen Virol. 1988;69:1725–1730. doi: 10.1099/0022-1317-69-7-1725.
45. Lin YC, Chang RY, Chueh LL. Leader-body junction sequence of the viral subgenomic mRNAs of porcine reproductive and respiratory syndrome virus isolated in Taiwan. J Vet Med Sci. 2002;64:961–965. doi: 10.1292/jvms.64.961.
46. Ozdarendeli A, et al. Downstream sequences influence the choice between a naturally occurring noncanonical and closely positioned upstream canonical heptameric fusion motif during bovine coronavirus subgenomic mRNA synthesis. J Virol. 2001;75:7362–7374. doi: 10.1128/JVI.75.16.7362-7374.2001.
47. Zúñiga S, Sola I, Alonso S, Enjuanes L. Sequence motifs involved in the regulation of discontinuous coronavirus subgenomic RNA synthesis. J Virol. 2004;78:980–994. doi: 10.1128/JVI.78.2.980-994.2004.
48. Schelle B, Karl N, Ludewig B, Siddell SG, Thiel V. Selective replication of coronavirus genomes that express nucleocapsid protein. J Virol. 2005;79:6620–6630. doi: 10.1128/JVI.79.11.6620-6630.2005.
49. O’Connor JB, Brian DA. Downstream ribosomal entry for translation of coronavirus TGEV gene 3b. Virology. 2000;269:172–182. doi: 10.1006/viro.2000.0218.
50. Zhang X, Liu R. Identification of a noncanonical signal for transcription of a novel subgenomic mRNA of mouse hepatitis virus: Implication for the mechanism of coronavirus RNA transcription. Virology. 2000;278:75–85. doi: 10.1006/viro.2000.0637.
51. Schaad MC, Baric RS. Evidence for new transcriptional units encoded at the 3′ end of the mouse hepatitis virus genome. Virology. 1993;196:190–198. doi: 10.1006/viro.1993.1467.
52. Makino S, Soe LH, Shieh CK, Lai MM. Discontinuous transcription generates heterogeneity at the leader fusion sites of coronavirus mRNAs. J Virol. 1988;62:3870–3873. doi: 10.1128/jvi.62.10.3870-3873.1988.
53. Pasternak AO, van den Born E, Spaan WJ, Snijder EJ. The stability of the duplex between sense and antisense transcription-regulating sequences is a crucial factor in arterivirus subgenomic mRNA synthesis. J Virol. 2003;77:1175–1183. doi: 10.1128/JVI.77.2.1175-1183.2003.
54. Sola I, Moreno JL, Zúñiga S, Alonso S, Enjuanes L. Role of nucleotides immediately flanking the transcription-regulating sequence core in coronavirus subgenomic mRNA synthesis. J Virol. 2005;79:2506–2516. doi: 10.1128/JVI.79.4.2506-2516.2005.
55. Pasternak AO, Spaan WJ, Snijder EJ. Regulation of relative abundance of arterivirus subgenomic mRNAs. J Virol. 2004;78:8102–8113. doi: 10.1128/JVI.78.15.8102-8113.2004.
56. Moreno JL, Zúñiga S, Enjuanes L, Sola I. Identification of a coronavirus transcription enhancer. J Virol. 2008;82:3882–3893. doi: 10.1128/JVI.02622-07.
57. Mateos-Gomez PA, Morales L, Zuñiga S, Enjuanes L, Sola I. Long-distance RNA-RNA interactions in the coronavirus genome form high-order structures promoting discontinuous RNA synthesis during transcription. J Virol. 2013;87:177–186. doi: 10.1128/JVI.01782-12.
58. La Monica N, Yokomori K, Lai MM. Coronavirus mRNA synthesis: Identification of novel transcription initiation signals which are differentially regulated by different leader sequences. Virology. 1992;188:402–407. doi: 10.1016/0042-6822(92)90774-J.
59. Chen Z, et al. Sequences of 3′ end of genome and of 5′ end of open reading frame 1a of lactate dehydrogenase-elevating virus and common junction motifs between 5′ leader and bodies of seven subgenomic mRNAs. J Gen Virol. 1993;74:643–659. doi: 10.1099/0022-1317-74-4-643.
60. Shieh CK, et al. Identification of a new transcriptional initiation site and the corresponding functional gene 2b in the murine coronavirus RNA genome. J Virol. 1989;63:3729–3736. doi: 10.1128/jvi.63.9.3729-3736.1989.
61. Yokomori K, Banner LR, Lai MM. Heterogeneity of gene expression of the hemagglutinin-esterase (HE) protein of murine coronaviruses. Virology. 1991;183:647–657. doi: 10.1016/0042-6822(91)90994-M.
62. Olasz F, et al. Immunological and biochemical characterisation of 7ap, a short protein translated from an alternative frame of ORF7 of PRRSV. Acta Vet Hung. 2016;64:273–287. doi: 10.1556/004.2016.027.
63. Fischer F, Peng D, Hingley ST, Weiss SR, Masters PS. The internal open reading frame within the nucleocapsid gene of mouse hepatitis virus encodes a structural protein that is not essential for viral replication. J Virol. 1997;71:996–1003. doi: 10.1128/jvi.71.2.996-1003.1997.
64. Meier C, et al. The crystal structure of ORF-9b, a lipid binding protein from the SARS coronavirus. Structure. 2006;14:1157–1165. doi: 10.1016/j.str.2006.05.012.
65. Liu DX, Fung TS, Chong KK, Shukla A, Hilgenfeld R. Accessory proteins of SARS-CoV and other coronaviruses. Antiviral Res. 2014;109:97–109. doi: 10.1016/j.antiviral.2014.06.013.
66. Irigoyen N, et al. High-resolution analysis of coronavirus gene expression by RNA sequencing and ribosome profiling. PLoS Pathog. 2016;12:e1005473. doi: 10.1371/journal.ppat.1005473.
67. Pueyo JI, Magny EG, Couso JP. New peptides under the s(ORF)ace of the genome. Trends Biochem Sci. 2016;41:665–678. doi: 10.1016/j.tibs.2016.05.003.
68. Landry CR, Zhong X, Nielly-Thibault L, Roucou X. Found in translation: Functions and evolution of a recently discovered alternative proteome. Curr Opin Struct Biol. 2015;32:74–80. doi: 10.1016/j.sbi.2015.02.017.
69. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26:1367–1372. doi: 10.1038/nbt.1511.
70. Schwanhäusser B, et al. Global quantification of mammalian gene expression control. Nature. 2011;473:337–342. doi: 10.1038/nature10098.