Abstract
Kaposi’s sarcoma (KS)-associated herpesvirus or human herpesvirus 8 (HHV8) DNA is found consistently in nearly all classical, endemic, transplant, and AIDS-associated KS lesions, as well as in several AIDS-associated lymphomas. We have previously sequenced the genes for the highly variable open reading frame K1 (ORF-K1) protein from more than 60 different HHV8 samples and demonstrated that they display up to 30% amino acid variability and cluster into four very distinct evolutionary subgroups (the A, B, C, and D subtypes) that correlate with the major migratory diasporas of modern humans. Here we have extended this type of analysis to six other loci across the HHV8 genome to further evaluate overall genotype patterns and the potential for chimeric genomes. Comparison of the relatively conserved ORF26, T0.7/K12, and ORF75 gene regions at map positions 0.35, 0.85, and 0.96 revealed typical ORF-K1-linked subtype patterns, except that between 20 and 30% of the genomes analyzed proved to be either intertypic or intratypic mosaics. In addition, a 2,500-bp region found at the extreme right-hand side of the unique segment in 45 HHV8 genomes proved to be highly diverged from the 3,500-bp sequence found at this position in the other 18 HHV8 genomes examined. Furthermore, these previously uncharacterized “orphan” region sequences proved to encompass multiexon latent-state mRNAs encoding two highly diverged alleles of the novel ORF-K15 protein. The predominant (P) and minor (M) forms of HHV8 ORF-K15 are structurally related integral membrane proteins that share only 33% overall amino acid identity but retain conserved motifs that are likely involved in tyrosine kinase signaling, and they may be distant evolutionary relatives of the LMP2 latency protein of Epstein-Barr virus. The M allele of ORF-K15 is also physically linked to a distinctive M subtype of the adjacent ORF75 gene locus, and in some cases this linkage extends as far back as the T0.7 locus.
Overall, the results suggest that an original recombination event with a related primate virus from an unknown source introduced exogenous right-hand side ORF-K15(M) sequences into an ancient M form of HHV8, followed by eventual acquisition into the subtype C lineage of the modern P-form of the HHV8 genome and subsequent additional, more recent transfers by homologous recombination events into several subtype A and B lineages as well.
Kaposi’s sarcoma (KS) herpesvirus or human herpesvirus 8 (HHV8) DNA is present in virtually all tumor samples of both classical and AIDS-associated forms of KS (1a, 5), as well as in peripheral blood mononuclear cells in up to 50% of homosexual AIDS patients with KS (35). Infection without disease is also widespread in parts of Africa, the Middle East, and Mediterranean countries, with up to 60% seropositivity in Central Africa and 15% seropositivity in Italy, where classic and endemic KS rates are relatively high (11, 12, 16). Seropositivity is also very high among both KS-positive and KS-negative male homosexual AIDS patients (85 and 50%, respectively) but not usually among human immunodeficiency virus (HIV)-positive intravenous drug users or hemophiliac AIDS patients. The seroprevalence in blood donors in the United States, the United Kingdom, and Japan may be no greater than 1% (11, 12), which is consistent with the low incidence of KS among renal transplant patients in those countries, in contrast to reports of significant iatrogenic KS in patients of Middle Eastern or Mediterranean Jewish ancestry in Toronto and in Saudi Arabia (13, 29).
HHV8 is a class gamma-2 herpesvirus that is distantly related to herpesvirus saimiri and Epstein-Barr virus (EBV) but contains several novel loci that include diverged viral homologues of exogenously acquired cellular genes encoding proteins such as interleukin-6, MIP-IA, MIP-IB, BCK, dihydrofolate reductase, TS, IRF, BCL-2, OX-2, FLIP, GCR, and CYC-D in place of the latent-state EBNA proteins and several other specific regulatory gene products of EBV (4, 22–27, 31). The nearly complete primary nucleotide sequences of the 190-kb double-stranded DNA molecules of two HHV8 genomes, one derived from an AIDS body cavity-based lymphoma cell line (31) and the other derived from a KS lesion (24), have been determined, and they differ overall by only 0.4% from each other.
Comparison of low-level sequence variability among different HHV8 samples in the open reading frame 26 (ORF26) and ORF75 gene regions originally led us to conclude that at least three distinct subtypes of HHV8 genomes occur (38). Subsequently, we found that the 289-amino-acid ORF-K1 membrane protein of HHV8 displays unusually high levels of genetic variability, resulting in four major subtypes, referred to as A, B, C, and D, that differ by 15 to 30% at the amino acid level (27, 37). Even within these ORF-K1 subtypes, amino acid differences of up to 6 to 8% among different clusters of subtype A ORF-K1 proteins and of 9 to 12% between two major branches of subtype C ORF-K1 proteins, together with small in-frame deletions, permitted the division of 46 genomes within the A-plus-C supergroup into 10 distinct variants plus several more narrowly diverged clades. Most U.S. AIDS KS samples are A1, A4, or C3 variants, whereas most classic KS cases from the Middle East, Asia, Europe, and the United States are C2 variants. In contrast, samples from Africa are predominantly of the B subtype (plus occasional A5 variants), and the rare D subtype appears to be of Pacific Island origin. For example, among a total of 63 genomes examined at the ORF-K1 locus, 10 of 11 samples from central and southern Africa were subtype B; 6 of 7 from Saudi Arabia were subtype C2, C4, or C5; 7 of 9 from Taiwan represented a novel C3′ clade; 6 of 7 from AIDS patients in Maryland were A1 variants; and the only examples of the novel D subtype were the 3 samples from classic KS patients of native Pacific Island heritage.
We interpreted these results to imply that the major evolutionary branches of HHV8 correlate with the modern human population divisions that arose through migrations from Africa, first to southern Asia and Oceania 60,000 years ago and then to Europe and northern Asia some 35,000 years ago (37). That cladal populations of the virus are still recognizable from paleolithic times presumably reflects both the relatively limited distribution of the virus in many present-day populations and a relatively slow rate of horizontal transmission, which together would keep recombination and scrambling of the differences to minimal levels. This scenario provides an appropriate evolutionary time scale for the action of some unknown but powerful biological selective pressure that generated ORF-K1 diversity. The ORF-K1 amino acid variability levels are similar to those found in immunoglobulin variable chains and in the HIV ENV protein, but we presume that, unlike those situations, herpesviruses have no specialized mechanisms to rapidly generate this level of diversity. Nevertheless, some of the HHV8 cladal distribution patterns, especially within the hypervariable VR* loop, could plausibly reflect much more recent founder effects associated with higher horizontal infectivity rates within the AIDS epidemic. We undertook further studies of HHV8 strain differences because of the many intriguing questions raised about the molecular evolution of HHV8 genomes and the possibility that strain differences are associated with different disease states.
In the present study, we have evaluated variability at six other loci across the HHV8 genome, including three segments in the central conserved portions of the genome and three at the extreme right-hand end of the DNA molecule. Overall, the results have (i) confirmed that the divisions into three or four major subtypes can also be recognized within the more conserved portions of the genome, (ii) led to the identification of two highly diverged alternative allelic forms (referred to as predominant [P] and minor [M]) of the ORF-K15 region at the right-hand side (RHS) of the genome, and (iii) provided strong evidence that some genomes represent mosaics derived either by recombination between major subtypes or with some unknown but related exogenous virus source. In addition, we present evidence from reverse transcription (RT)-PCR analysis that the orphan extreme RHS regions of both the P and M subtypes of HHV8 encompass complex spliced latent-state genes encoding highly diverged integral membrane proteins that are distantly related to the LMP2 protein of EBV.
MATERIALS AND METHODS
HHV8 genomic phage library and subclones.
A set of phage clones derived from two partial Sau3A BCBL-R tumor DNA genomic libraries in λEMBL3 and λDASHII backgrounds was described previously (25, 38). A λEMBL3 clone (λB3-2) containing a 17-kb insert spanning map positions 120.6 to 137.4 kb proved to encompass the RHS terminus across the boundary of the unique region into the RHS terminal tandem repeats (TTRs). A 3.3-kb HindIII-SalI plasmid subclone, pDJA62, encompassing all of ORF-K15(P) and the RHS TTR from a subtype A genome was generated and sequenced by shotgun M13 procedures (part of the updated GenBank ORF75/K15 file [accession no. U85269]). Two additional λEMBL3 clones, λE-A2 (108.3 to 120.4 kb) and λE-C2 (115.7 to 129.0 kb), encompassing the T0.7 gene region were also isolated, and a 5.0-kb EcoRI-to-EcoRI plasmid subclone was generated for initial PCR sequencing of the BCBL-R T0.7 region, equivalent to coordinates 115470 to 120379 of BC1 (31).
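The clone map positions above are given in kb. As a purely arithmetic sketch (the table and helper names are hypothetical), the insert lengths and pairwise overlaps follow directly from those coordinates:

```python
# Hypothetical table of BCBL-R phage lambda clones and their map positions (kb),
# copied from the text; the helpers below only do interval arithmetic.
CLONES_KB: dict[str, tuple[float, float]] = {
    "λE-A2": (108.3, 120.4),
    "λE-C2": (115.7, 129.0),
    "λB3-2": (120.6, 137.4),
}


def span_kb(interval: tuple[float, float]) -> float:
    """Insert length in kb, rounded to 0.1 kb to hide floating-point noise."""
    start, end = interval
    return round(end - start, 1)


def overlap_kb(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Length of the segment shared by two clones in kb (0.0 if disjoint)."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    return round(max(0.0, shared), 1)


if __name__ == "__main__":
    for name, interval in CLONES_KB.items():
        print(f"{name}: {span_kb(interval)} kb")
    for a, b in [("λE-A2", "λE-C2"), ("λE-C2", "λB3-2"), ("λE-A2", "λB3-2")]:
        print(f"{a}/{b} overlap: {overlap_kb(CLONES_KB[a], CLONES_KB[b])} kb")
```

With these coordinates λB3-2 spans 16.8 kb (the ~17-kb insert cited above), λE-C2 overlaps both neighbors (4.7 kb with λE-A2 and 8.4 kb with λB3-2), and λE-A2 and λB3-2 do not overlap directly.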
HHV8-positive DNA samples.
The sources and relevant characteristics of all of the KS and primary effusion lymphoma (PEL) DNA samples used here, as well as the procedures used for extraction of archival paraffin or frozen blocks, were described previously (37, 38).
PCR amplification and sequencing primers.
Direct or nested PCR products from the following regions of the HHV8 genome were generated with Bethesda Research Laboratories (BRL) Taq or Stratagene extend polymerase in a Techne PHC3 thermocycler, using 20 to 100 ng of template DNA and 35 cycles of 94°C for 1 min, 50°C for 1 min, and 72°C for 2 min. The loci, product sizes, and primer pairs were as follows:
ORF26 (330 bp), LGH1701 [5′-(GGAT)GGATCCCTCTGACAACC-3′] and LGH1702 [5′-(ACGT)GGATCCGTGTTGTCTACG-3′];
ORF75-E (854 bp), LGH2087 (5′-CAGGTCGTCTACTATTCTG-3′) and LGH1704 [5′-(GTAC)GGATCCACGGAGCATAC-3′], as well as LGH1984 [5′-(CTAG)AGATCTGTTTAGTCCGGAG-3′] and LGH2000 (5′-GGAAACAGGGTGCTGTG-3′);
T0.7 (646 bp), LGH2076 (5′-GCTGCAATGTACTGCCATG-3′) and LGH2075 (5′-CTCCAATCCCAATGCATGGA-3′);
ORF72 (vCYC-D) (594 bp), LGH2045 (5′-GTAGAACGGAAACATCGCA-3′) and LGH2046 (5′-GATTGGTATTGGGACCTTTC-3′);
ORF-K1 (1,065 bp), LGH2089 (5′-GTTCTGCCAGGCATAGTC-3′) and LGH2088 (5′-AATAAGTATCCGACCTCAT-3′);
K14.1(P) (362 bp), LGH2079 (5′-GAGATCACTCTCCAACCAC-3′) and LGH2033 (5′-GGAGTGCCTTCCGTATAG-3′);
K14.1(M) (450 bp), LGH2079 (5′-GAGATCACTCTCCAACCAC-3′) and LGH2506 (5′-CACAGTCACCTATGCTAG-3′);
K15(P) (580 bp), LGH2476 (5′-GCAGTGTTTTATTAACGTC-3′) and LGH2477 (5′-CAAACCCCATTTACTTC-3′);
K15(M) (370 bp), LGH2473 (5′-CATGCAGCGAGCTTGAGA-3′) and LGH2474 (5′-CTTTGAGTACTGTTTGTG-3′);
U/TTR (850 bp), LGH2098 (5′-AAGATATAGACCCACCATAC-3′) and LGH2097 (5′-CACGTAGCAAGCACTGAG-3′).
The BamHI (GGATCC) and BglII (AGATCT) restriction sites in LGH1701, LGH1702, LGH1704, and LGH1984 were added for cloning purposes.
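A short sketch can locate the BamHI (GGATCC) and BglII (AGATCT) recognition sequences in the primers listed above. It assumes that the bases shown in parentheses are part of the synthesized primer; the names SITES, PRIMERS, and find_sites are illustrative only, and only a subset of the primers is included.

```python
# Restriction enzymes considered in this sketch. Both recognition sequences are
# palindromic, so scanning only the given primer strand is sufficient.
SITES: dict[str, str] = {"BamHI": "GGATCC", "BglII": "AGATCT"}

# A subset of the primers listed above, written 5'->3' with the bases shown in
# parentheses included (an assumption about how the primers were synthesized).
PRIMERS: dict[str, str] = {
    "LGH1701": "GGATGGATCCCTCTGACAACC",
    "LGH1702": "ACGTGGATCCGTGTTGTCTACG",
    "LGH1704": "GTACGGATCCACGGAGCATAC",
    "LGH1984": "CTAGAGATCTGTTTAGTCCGGAG",
    "LGH2087": "CAGGTCGTCTACTATTCTG",
    "LGH2076": "GCTGCAATGTACTGCCATG",
}


def find_sites(seq: str) -> list[tuple[str, int]]:
    """Return (enzyme, 1-based start position) for every site found in seq."""
    hits: list[tuple[str, int]] = []
    for enzyme, site in SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.append((enzyme, start + 1))
            start = seq.find(site, start + 1)
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq) or "no BamHI/BglII site")
```

In each of the four modified primers the site begins immediately after the four parenthesized bases (position 5), whereas LGH2087 and LGH2076 carry neither site.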
The PCR DNA products were fractionated on 1% agarose gels (GIBCO BRL ultrapure catalog no. 15510-027), stained with ethidium bromide, and photographed under longwave UV light. Isolated bands were excised from the gel, extracted with QIAEX II glass milk suspension (Qiagen Gel Extraction Kit catalog no. 20021), and subjected to manual 32P-labelled dideoxynucleotide double-stranded cycle DNA sequencing (GIBCO BRL catalog no. 18196) using both left-hand side (LHS) and RHS external primers and, in some cases, internal primers as well. Virtually all of these sequence data were generated on both complementary strands, and many analyses included redundant overlapping runs or duplicate sequencing of multiple independently amplified PCR products. Whenever possible, direct PCR products were used as sequencing templates in preference to nested amplification products.
Cloning and sequencing of RT-PCR products.
Total RNA was isolated from uninduced HBL6 or BCBL-1 cells with Trizol reagent (GIBCO BRL). For each cell line, three RNA samples (3 μg each) were processed in parallel: one was DNase treated and subjected to RT-PCR, one was not DNase treated but was subjected to RT-PCR, and one was DNase treated and subjected to DNA PCR without reverse transcription. DNase treatment involved incubation with 10 U of RNase-free DNase I (Boehringer Mannheim) for 4 h at 37°C. The samples were then extracted with phenol-chloroform, and the RNA was precipitated with 2.5 volumes of ethanol and 1/10 volume of 3 M sodium acetate (pH 5.2). Samples were resuspended in diethylpyrocarbonate-treated distilled H2O, and reverse transcription was carried out at 50°C for 50 min with Superscript II (GIBCO BRL) and an oligo(dT) primer (Boehringer Mannheim). Subsequent DNA amplification was performed by PCR for 45 cycles with Taq DNA polymerase (Promega) and the following 5′ and 3′ primers, designed to encompass the complete ORF-K15 coding region of both the P and M forms: HBL6(M) N terminus (LGH3129), 5′-CTAGGAATTCATGAATTACAAAAAATACCTGTGGGG-3′; HBL6(M) C terminus (LGH3132), 5′-CTAGGAATTCGTCCGTGGGAAACAAAAC-3′; BCBL-1(P) N terminus (LGH3110), 5′-CTAGGGATCCATGAAGACACTCATATTCTTCTGG-3′; BCBL-1(P) C terminus (LGH3111), 5′-CTAGAGATCTGTTCCTGGGAAATAAAACCTCC-3′.
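The three parallel reactions described above serve as controls for contaminating viral DNA. The decision logic can be sketched as follows; the ControlSet class and interpret function are hypothetical, and the rules express the standard reasoning for RT-PCR controls rather than a procedure stated in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSet:
    """Whether a band of the expected size was seen in each reaction."""
    dnase_rt_pcr: bool     # DNase-treated RNA, reverse transcribed, then PCR
    no_dnase_rt_pcr: bool  # untreated RNA, reverse transcribed, then PCR
    dnase_dna_pcr: bool    # DNase-treated RNA amplified without reverse transcription


def interpret(c: ControlSet) -> str:
    """Classify the likely template of a product from the three control reactions."""
    if c.dnase_dna_pcr:
        # A product without reverse transcription means DNA survived DNase treatment.
        return "DNA contamination"
    if c.dnase_rt_pcr:
        # Product needs reverse transcription and survives DNase: RNA template.
        return "RNA-derived"
    if c.no_dnase_rt_pcr:
        # Product disappears after DNase: most likely amplified from genomic DNA.
        return "probably DNA-derived"
    return "no product"
```

For a spliced gene such as ORF-K15, an RT-PCR product shorter than the corresponding genomic span is additional evidence for an RNA template, which is why the cloned products were sequenced across the splice junctions.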
The 1.5-kb PCR products were cloned into Flag-tagged pSG5 expression vectors and sequenced to verify all splice junctions.
Nucleotide sequence accession numbers.
The GenBank accession numbers for these cDNA sequences are AF156886 [BCBL1 ORF-K15(P)] and AF156885 [HBL6 ORF-K15(M)].
RESULTS
Linkage analysis at multiple genetic loci reveals evidence for recombinant genotypes.
An overview map of the HHV8 genome which shows the positions of the six PCR loci involved in this work relative to other key features of the genome and to several characterized phage lambda clones from the RHS of the BCBL-R genome is presented in Fig. 1A. In our initial strain difference studies (38), the DNA sequence patterns found in the ORF26, ORF75, and UPS75 gene loci each fell into three distinct clusters represented by the prototypes of subtypes A (BCBL-R), B (431KAP), and C (ASM72), which all differed by 0.8 to 1.5% at the nucleotide level at each locus. Among the first 12 genomes tested, there were seven examples of subtype A, three of B, and two of C, and in 10 of these cases, all three genetic loci cosegregated but in 2 they did not (38). Addition of the more recent ORF-K1 data introduced additional complexities (37). For example, HBL6 was interpreted to belong to subtype C (identical to ASM72) at both the ORF75 and UPS75 loci, but was subtype A for ORF26 and ORF-K1. Similarly, ST1 was subtype A at both the ORF75 and UPS75 loci but gave a subtype C pattern for ORF26 and a subtype B ORF-K1 pattern. Finally, a third sample, EKS1, although giving a subtype A-like pattern at all three original loci, proved to be subtype C in ORF-K1.
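The reasoning used above to flag HBL6, ST1, and EKS1 as possible mosaics amounts to checking whether every locus of one genome receives the same subtype call. A minimal sketch using only the calls quoted in the text, with BCBL-R included as a concordant control; the names are hypothetical, and the check cannot by itself distinguish a true recombinant from a sample containing a mixture of genomes.

```python
from collections import defaultdict

# Subtype calls per locus as quoted in the text (BCBL-R is the subtype A prototype).
CALLS: dict[str, dict[str, str]] = {
    "BCBL-R": {"ORF-K1": "A", "ORF26": "A", "ORF75": "A", "UPS75": "A"},
    "HBL6": {"ORF-K1": "A", "ORF26": "A", "ORF75": "C", "UPS75": "C"},
    "ST1": {"ORF-K1": "B", "ORF26": "C", "ORF75": "A", "UPS75": "A"},
    "EKS1": {"ORF-K1": "C", "ORF26": "A", "ORF75": "A", "UPS75": "A"},
}


def group_by_subtype(calls: dict[str, str]) -> dict[str, list[str]]:
    """Map each subtype to the sorted list of loci that received that call."""
    groups: dict[str, list[str]] = defaultdict(list)
    for locus, subtype in calls.items():
        groups[subtype].append(locus)
    return {subtype: sorted(loci) for subtype, loci in sorted(groups.items())}


def is_mosaic(calls: dict[str, str]) -> bool:
    """True when the loci of one genome do not all share a single subtype."""
    return len(set(calls.values())) > 1


if __name__ == "__main__":
    for genome, calls in CALLS.items():
        status = "mosaic" if is_mosaic(calls) else "concordant"
        print(genome, status, group_by_subtype(calls))
```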
FIG. 1.
Organization of the RHS of HHV8 genomes relative to the loci used for PCR analysis. (A) Relative genomic positions of the ORF-K1, ORF26, T0.7, ORF75, and ORF-K15 loci are indicated as solid bars with approximate genomic percentile coordinates included. Solid circles and overhead arrows refer to the duplicated predicted ORI-Lyt regions (DL and DR) (27). The G+C-rich TTRs are indicated. The names and map positions of four relevant overlapping phage lambda clones from the RHS of HHV8(BCBL-R) genomic DNA are illustrated by horizontal two-headed arrows. (B) Overall structure of the 3.3-kb sequenced region representing the RHS terminus of the unique region that encompasses the ORF-K15(P) allele in HHV8(BCBL-R). The intact region spanning the N terminus of the ORF75 gene across to the TTR (solid bar) on the RHS was subcloned into plasmid pDJA62 as a HindIII/SalI fragment of phage λB3-2 derived from a PEL tumor genomic DNA library (27). The relative organization of ORF75 and the predicted short ORF-K14.1 gene, together with the multiple spliced exons of the extended ORF-K15 gene(s), are indicated below the plasmid map. The comparative structures of the P and M alleles are also shown. Parentheses (del) denote the unstable G+A-rich repeats near the 5′ end of the M allele of ORF-K15 (31). TTR regions are shown as solid bars, and ORF-K15(M) regions are shown as dotted bars. (C) Map locations and sizes (in base pairs) of the PCR loci used to characterize P and M subtype genomes. Among the three ORF75 PCR products shown (hatched bars), only the new ORF75-E region was used here for extensive sequence analysis (see Fig. 6). K14.1(P) and K14.1(M) represent PCR products of the triple-primer reaction covering the divergent junction of the two subtypes of ORF-K15 genes (see Fig. 3). The K15(P) and U/TTR products are unique to the P subtypes (open bars), and K15(M) is unique to the M subtypes (dotted bars).
This tentative evidence for chimeric or recombinant genomes was reproduced several times with independently amplified PCR products, thus ruling out most sources of contamination. The possibility that some samples might contain mixtures of more than one genome subtype could not be eliminated, but our data for HBL6 were later confirmed when the complete sequence of the BC1 cell line (which was derived from the same patient) proved to be identical at all of these loci (31). As described below, some (but not all) of these complexities were explained by our subsequent realization that the ORF75 gene block from the RHS of both of the subtype C genomes used originally (i.e., ASM72 and HBL6/BC1) is not representative of all subtype C ORF-K1 patterns but instead represents another, unlinked level of subtype diversity that is found only at the RHS of the genome.
To evaluate both potential additional diversity and the levels of cosegregation and linkage between subtype patterns across the entire HHV8 genome, we developed a series of consensus primers for amplification and sequence analysis of multiple loci from the same expanded set of more than 60 different KS and PEL samples that were used previously for the ORF-K1 analysis (37). A representative example of an ethidium bromide-stained agarose gel profile showing the PCR products from all of these loci using DNA extracted from the JSC1 PEL cell line (2) is presented in Fig. 2A. Complete or nearly complete DNA sequence data were obtained at three internal loci from all samples for which DNA remained available, namely, ORF26 at genome map position 0.35, T0.7/K12 at genome map position 0.85, and ORF75-E at genome map position 0.96 (see below). Additional PCR product size and sequence data for the extreme RHS ORF-K14.1, ORF-K15, and U/TTR boundary loci were also obtained for most samples, although only the most relevant or representative samples were sequenced (between 10 and 15 genomes each), and complete DNA sequence data for the entire ORF-K15 coding region were confined to the BCBL-R prototype genome, as described below.
FIG. 2.
Representative examples of PCR DNA products involved in these studies. (A) Photograph of ethidium bromide-stained DNA bands in an agarose gel displaying a complete set of directly amplified PCR products from across the HHV8 genome present in the JSC1 cell line for use in DNA sequence analysis. Lanes: 1 and 9, 123-bp oligomer size marker ladder; 2, ORF26 primers (LGH1701 and LGH1702); 3, ORF75-E (LGH2087 and LGH1704); 4, U/TTR (LGH2097 and LGH2098); 5, T0.7 (LGH2076 and LGH2075); 6, CYC-D (LGH2044 and LGH2045); 7, ORF-K1 (LGH2089 and LGH2088); 8, ORF-K14.1 (triple-primer reaction of LGH2079, LGH2033, and LGH2506). (B) Photographs of ethidium bromide-stained RHS junction region PCR DNA products (K14.1) from various P (362 bp) and M (450 bp) allele genomes. Lanes 1 and 9 contained a 123-bp oligomer size marker ladder. The KS and PEL DNA samples used included the following: lanes 2 and 10, HBL6 (M); lane 3, BC2 (P); lane 4, BC3 (P); lane 5, AKS1 (P); lane 6, BCP1 (P); lane 7, BCBL-B (P); lane 8, ASM72 (M); lane 11, ASM76 (M); lane 12, BCBL1 (P); lane 13, EKS1 (P); lane 14, C282 (P); lane 15, a mixture of BCBL1 and HBL6 DNAs. The triple-primer combination of LGH2079 (both alleles) plus LGH2033 (P) and LGH2506 (M) was used in standard direct PCRs.
Two distinct types of ORF-K15 region sequences at the extreme RHS of the genome.
Our initial primary DNA sequence analysis of the extreme RHS of the HHV8 genome just upstream of ORF75 was carried out from a genomic phage library of our prototype subtype A KS genome, HHV8(BCBL-R). Curiously, several different PCR primer pairs derived from this region of the BCBL-R sequence all failed to yield any amplified products from either our prototype subtype C genome (ASM72) or the A/C hybrid genome (HBL6) (38). Direct PCR sequence analysis of the proximal upstream ORF75 region in all three prototype strains (BCBL-R, ASM72, and 431KAP) revealed that although subtypes A and B continued to be nearly identical upstream of the ORF75 coding region, both subtype C patterns proved to diverge toward a nearly totally different sequence beginning within 300 bp of the N terminus of the ORF75 coding region. To analyze the remainder of the RHS unique region from our subtype A genome, we first mapped and subcloned a 16-kb genomic region from the extreme RHS unique-segment terminus of BCBL-R DNA in a selected phage lambda clone (λB3-2) that proved to encompass both the ORF75 region and the adjacent G-plus-C-rich TTR sequences (Fig. 1B). M13 sequencing on both strands of a 3.3-kb HindIII-to-SalI subclone containing the junction fragment (in plasmid pDJA62) revealed the presence of a 2,354-bp unique sequence between the ORF75 initiator codon and the beginning of the TTR region. The junction with the RHS unique region begins at position 653 of the published TTR sequence of HHV8(BC1) (GenBank accession no. U75699) (31) and then crosses into the next TTR unit and ends at position 551, representing the Sau3A site at the end of the RHS of λB3-2 (Fig. 1B). 
As judged by the finding of identical-size PCR-amplified DNA products for a region spanning the unique-region–TTR junction from the original BCBL-R tumor DNA, the λB3-2 lambda clone, and the M13-sequenced pDJA62 plasmid DNA (data not shown), we conclude that this sequence represents the complete, undeleted RHS unique region of a prototype subtype A genome (GenBank accession no. U85269). Subsequent PCR sequencing of BCBL-1 DNA in this region also showed it to be almost identical to that of BCBL-R.
In contrast, when the complete genome sequence of BC1 from Russo et al. (31) became available, it proved to have a 3,500-bp unique sequence upstream of the ORF75 gene, in the region to the right of the point where the subtype A and C patterns diverge, together with an unclonable G+A-rich repeat segment of unknown size, before reaching the first copy of the 801-bp TTR sequence at the extreme RHS of the genome (Fig. 1B). Surprisingly, more than 2,000 bp of unique-region sequence to the right of ORF75 in our BCBL-R sequence showed no significant nucleotide homology to the RHS (or to any other sequences) of the published complete HHV8(BC1) genome (GenBank accession no. U75698). Data for this region are missing from the nearly complete HHV8 KS sequence presented by Neipel et al. (24) (GenBank accession no. U93872), but subsequent PCR sequence data for the equivalent regions from our prototype subtype C (ASM72) genome and from HBL6 confirmed that both were virtually identical to the BC1 pattern across the ORF75-to-ORF-K15 junctions.
Fig. 3 illustrates the pattern of nucleotide differences across the junction between the common and divergent RHS regions in our prototype subtype A sequence from BCBL-R (and BCBL1) compared with the subtype C sequence from HBL6 (and BC1). The differences begin within the proximal N-terminal domain of the leftward-oriented ORF75 coding region, continue across the presumed ORF75 promoter region, and end with nearly total divergence beginning 350 bp upstream of the ORF75 ATG codon. Interestingly, both types of sequence encompass a small potential rightward-oriented coding region of 79 amino acids (designated the ORF-K14.1 gene here), which lies precisely within the 240 bp between the N terminus of ORF75 and the C terminus of ORF-K15 (79 codons plus a stop codon account for exactly 240 bp) and spans the beginning of the diverged region. The presumed consensus TATAAA motifs for ORF-K14.1 mark the beginning of the junction region sequences shown in Fig. 3. Residual patchy nucleotide homology beyond the ORF-K14.1 coding region reflects several short, conserved amino acid motifs (highlighted) within the extreme C terminus of the leftward-oriented coding region from exon 8 of ORF-K15 (see below). The two DNA sequences become totally nonhomologous beyond this point as they progress further into the ORF-K15 genes.
FIG. 3.
Comparison of DNA sequences across the divergent junctions between the leftward-oriented ORF75 and ORF-K15 gene regions in both the P and M subtypes of HHV8. The hypothetical rightward-oriented ORF-K14.1 genes encompassed by this region are also shown. Potential transcriptional control elements and the predicted amino acids encoded by the extreme N terminus of ORF75, the extreme C terminus of ORF-K15, and the intervening ORF-K14.1 region are shown. Nucleotide identities between the P allele in BCBL-R (top line) and the M allele in HBL6 (lower line) are indicated by asterisks, and missing bases are represented by hyphens. The positions of the PCR primers, LGH2079, LGH2033, and LGH2506, that were used to discriminate between the P and M allele forms of ORF-K14.1 in the triple-primer PCR (Fig. 2B) are indicated. The nucleotide sequences shown correspond to genomic positions 1268 to 1803 in the P allele (BCBL-R; GenBank accession no. U85269) and 134365 to 134881 in the M allele (BC1; GenBank accession no. U75698).
Many subtype A, B, and C HHV8 genomes contain the alternative M form of the RHS junction region.
To evaluate the relative abundance of the two types of RHS among HHV8 genomes, we designed a three-primer PCR containing a single primer within the LHS common region at position +42 in ORF75 and two alternative RHS primers in the divergent regions, at position −320 of BCBL-R (generating a 362-bp product) or at position −408 of HBL6/BC1 (giving a 450-bp product spanning genome positions 134399 to 134848). Representative results are shown in Fig. 2B, and the positions of the primers are indicated in Fig. 3. Analysis of all 63 different HHV8 DNA samples for ORF-K14.1 PCR product size variations revealed 45 of the BCBL-R/BCBL1 type and 18 of the HBL6/BC1/ASM72 type. The latter were associated with more than half (13 of 24) of the genomes with subtype C ORF-K1 genes, 3 of 19 genomes with subtype A genes, and 2 of the 3 American subtype B genomes, but not with any of the 12 African subtype B genomes or the 3 Pacific subtype D genomes. Therefore, the two versions of the RHS ORF-K14.1 region appear to be only partially linked to the LHS ORF-K1 subtype patterns. To avoid confusion with the subtype C nomenclature used for ORF-K1, we have replaced the previously introduced subtype C terminology for the ORF75 regions found in ASM72 and HBL6/BC1 (38): these alternative, minor population forms of the RHS are referred to here as M alleles, in contrast to the predominant P alleles found in BCBL-R and the majority of the HHV8 genomes tested.
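The expected product sizes follow from the primer positions if the ORF75-relative numbering has no position 0, as in the Fig. 6 numbering, where −223 and +1 lie 223 bp apart. A small sketch under that assumption, treating the stated positions as the outer ends of the amplified region; the function names are illustrative.

```python
def span_no_zero(upstream: int, downstream: int) -> int:
    """Inclusive length from a negative to a positive position when the
    numbering skips 0 (..., -2, -1, +1, +2, ...)."""
    if upstream >= 0 or downstream <= 0:
        raise ValueError("expects a negative upstream and a positive downstream position")
    return downstream - upstream


def span_inclusive(start: int, end: int) -> int:
    """Inclusive length between two ordinary genome coordinates."""
    return abs(end - start) + 1


# Common primer at +42 in ORF75; allele-specific primers at -320 (P) or -408 (M).
p_product = span_no_zero(-320, 42)  # 362 bp
m_product = span_no_zero(-408, 42)  # 450 bp
# The same M product expressed in BC1 coordinates (134399 to 134848).
m_product_bc1 = span_inclusive(134399, 134848)  # 450 bp
```

Both ways of counting give 450 bp for the M product, which supports reading the ORF75 positions as a numbering without a zero.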
Importantly, all of the HHV8-positive DNA samples that we tested with the three-primer ORF-K14.1 PCR assay gave either the larger (M) or the smaller (P) product, but never both (Fig. 2B). To confirm that this pattern always reflected the presence of one region in place of the other, rather than preferential amplification or deletion of one of these regions from a putative larger wild-type structure encompassing both, we designed a mixture of four primers targeting the ORF-K15(M) gene region (370 bp, including BC1 genome coordinates 136044 to 136413) and the positionally equivalent ORF-K15(P) region (580 bp, including BCBL-R positions 3125 to 3704). Again, all 10 M allele genomes tested gave 370-bp PCR products only, and all 10 P allele genomes tested gave 580-bp PCR products only (data not shown).
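Taken together, the two assays give a simple rule for calling the RHS allele from gel band sizes. A sketch of that rule follows; the sizing tolerance and all names are hypothetical, and a sample yielding both bands (as in the deliberate BCBL1-plus-HBL6 mixture in Fig. 2B) is reported as a mixture rather than assigned to either allele.

```python
# Expected product sizes (bp) for each allele, as stated in the text.
K14_1_SIZES: dict[int, str] = {362: "P", 450: "M"}  # triple-primer junction assay
K15_SIZES: dict[int, str] = {580: "P", 370: "M"}    # four-primer ORF-K15 assay


def call_allele(bands: set[int], expected: dict[int, str], tolerance: int = 15) -> str:
    """Return 'P', 'M', 'mixture', or 'no call' from observed band sizes (bp).

    `tolerance` is a hypothetical allowance for gel sizing error."""
    alleles = {
        allele
        for band in bands
        for size, allele in expected.items()
        if abs(band - size) <= tolerance
    }
    if alleles == {"P", "M"}:
        return "mixture"
    if len(alleles) == 1:
        return alleles.pop()
    return "no call"


def concordant(k14_1_bands: set[int], k15_bands: set[int]) -> bool:
    """True when both assays give the same single-allele call."""
    first = call_allele(k14_1_bands, K14_1_SIZES)
    second = call_allele(k15_bands, K15_SIZES)
    return first == second and first in {"P", "M"}
```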
U.S. HHV8 samples with the HBL6/BC1-like M type of RHS included both of our ORF-K1 C1 variant genomes (ASM72 and BKS13), one with a C3 variant ORF-K1 sequence (BKS12), two A2 variants (HBL6/BC1 and WKS1), one A1 variant (BKS16), and both subtype B genomes from Florida (OKS7 and OKS8). In addition, all seven of the genomes with clade C3′ ORF-K1 genes and the one C2 variant (TKS11) from Taiwan were of this type, together with two of seven tested renal transplant KS samples from Saudi Arabia (SKS3 and SKS9), although many other subtype C2 and C3 genomes were not. For a summary of the distribution of P and M allele RHS ORF-K15 patterns relative to the assigned ORF-K1 subtypes, see the K14.1 and K15 columns in Fig. 9 (see also the asterisks in Fig. 6, 7, and 8). Genomes with subtype M RHS alleles are also designated by solid-circle symbols in the summary of ORF-K1 data given in Fig. 4 of reference 37.
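As a consistency check on the counts given in this section, a sketch that tallies the M-allele genomes by major ORF-K1 subtype. The seven Taiwanese clade C3′ genomes are not named individually here, so they receive hypothetical placeholder labels; the major subtype C for SKS3 and SKS9 is inferred from the count of 13 subtype C genomes rather than stated explicitly.

```python
from collections import Counter

# M-allele (HBL6/BC1-like) genomes named in the text, with ORF-K1 (sub)types.
NAMED_M_GENOMES: dict[str, str] = {
    "ASM72": "C1", "BKS13": "C1", "BKS12": "C3",
    "HBL6/BC1": "A2", "WKS1": "A2", "BKS16": "A1",
    "OKS7": "B", "OKS8": "B",
    "TKS11": "C2",
    "SKS3": "C", "SKS9": "C",  # major subtype inferred; variant not given here
}
# Hypothetical placeholder labels for the seven Taiwanese clade C3' genomes.
TAIWAN_C3_PRIME = {f"Taiwan-C3prime-{i}": "C3'" for i in range(1, 8)}

M_GENOMES = {**NAMED_M_GENOMES, **TAIWAN_C3_PRIME}
TOTAL_GENOMES = 63

# Tally by major subtype (first letter of the ORF-K1 designation).
m_by_subtype = Counter(subtype[0] for subtype in M_GENOMES.values())
p_allele_count = TOTAL_GENOMES - len(M_GENOMES)
```

The tally reproduces the 13 subtype C, 3 subtype A, and 2 subtype B M-allele genomes reported above, and the 45 P-allele genomes follow by subtraction from the 63 samples.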
FIG. 9.
Summary of the overall subclass assignments and patterns of linkage obtained at eight different loci across the HHV8 genome. The comparison includes all 63 HHV8 samples for which complete ORF-K1 sequence data (K1 column) were generated previously (37) and also includes updated ORF26 and UPS75 locus data relative to the original 12 genomes evaluated (38). ORF72 (vCYC-D) data are not included, and because there were only limited amounts of DNA available for some samples, sequences were not obtained at all loci in all cases (blanks). Type information in the column alongside the sample designations summarizes sample characteristics where + is HIV or AIDS associated, B is BCBL or PEL, K is KS, C is classic or endemic, and R is renal transplant. The source column indicates geographic locality (or origin), where A is Africa, F is Florida, G is Germany, P is Pacific Islands, S is Saudi Arabia, T is Taiwan, Z is New Zealand, and U is United States (other than Florida). Designations in parentheses in the K14.1 and K15 columns indicate data derived from PCR product size measurements only. All other entries are derived from PCR sequence data. Only P subtype information can be obtained at the U/TTR locus, and dashes in this column indicate where no PCR products were detected from M subtype genomes. Note that 9 samples are considered to be intertypic recombinants and 18 others are chimeras containing M allele sequences at the RHS (see Fig. 10).
FIG. 6.
Comparison of polymorphic nucleotide patterns that identify four distinct subgroups of HHV8 genomes within the extended 854-bp ORF75-E block (map coordinate 0.96). Dashes indicate identities to the prototype A1 class sequence (BCBL-R) in the top line. The numbering system refers to positions (+ or −) relative to those used by Zong et al. (38) for ORF75 based on the ORF75 631-bp RDA fragment of Chang et al. (5). Positions −223, +1, and +631 are equivalent to positions 132875, 133098, and 133728 in the BC1 complete genome sequence (primers LGH2087 and LGH1704). ORF-K1 and ORF75-E subclass assignments are given in the far left-hand and far right-hand columns, respectively. Importantly, genomes with P allele ORF-K15 genes that have subtype A or C ORF-K1 genes (and some with subtype B ORF-K1 genes) cannot be distinguished here (=A/C class), whereas nearly all of the genomes tested with M allele ORF-K15 genes (denoted by asterisks) form a distinctive M subclass of ORF75-E patterns. Note that all of the genomes examined differed from the original 631-bp sequence of Chang et al. (5) by having 167-T, 442-C, and 496-T instead of 167-C, 442-T, and 496-C.
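The position equivalences given in the Fig. 6 caption imply that the ORF75-E numbering has no position 0. A minimal conversion sketch, anchored only on the stated equivalence of +1 with BC1 position 133098 (the constant and function names are hypothetical):

```python
BC1_POSITION_OF_PLUS_1 = 133098  # stated equivalence of ORF75 position +1 in BC1


def orf75_to_bc1(position: int) -> int:
    """Convert ORF75-relative numbering (..., -1, +1, ...) to BC1 coordinates."""
    if position == 0:
        raise ValueError("the ORF75 numbering scheme has no position 0")
    offset = position - 1 if position > 0 else position
    return BC1_POSITION_OF_PLUS_1 + offset
```

Applying it to the end points reproduces the stated coordinates (−223 maps to 132875 and +631 to 133728) and the 854-bp length of the ORF75-E block.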
FIG. 7.
Comparison of polymorphic nucleotide patterns that identify four distinct subgroups of HHV8 genomes within the 646-bp T0.7/K12 gene locus (map coordinate 0.85). Dashes indicate identities to the prototype A1 class sequence (BCBL-R) in the top line. The numbering system refers to the leftward noncoding strand direction of the gene, which is inverted relative to the genomic sequence. Therefore, position 118114 in the BC1 complete genome sequence represents position 1 of the PCR product and position 117469 in BC1 is position 646 (primers LGH2076 and LGH2075). ORF-K1 and T0.7 subclass assignments are given in the far left-hand and far right-hand columns, respectively. Importantly, genomes with P allele ORF-K15 genes that have subtype A or C ORF-K1 genes cannot be distinguished, and many (but not all) of the genomes with M allele ORF-K15 regions fall into a distinctive M subclass. Note that all of the genomes examined, including HBL6, differed from BC1 of Russo et al. (31) by having 348-A in place of 348-T (genomic position 117769).
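Because the T0.7 product is numbered on the leftward strand, product positions run opposite to BC1 coordinates. The sketch below is anchored only on the two endpoints stated in this legend (position 1 = 118114, position 646 = 117469); the function names are hypothetical.

```python
# T0.7/K12 PCR product positions (Fig. 7) <-> BC1 genome coordinates.
# The product is numbered on the leftward strand, so BC1 coordinates decrease
# as product positions increase.
T07_POS1_BC1 = 118114  # BC1 coordinate of product position 1 (Fig. 7 legend)
T07_LENGTH = 646


def t07_to_bc1(pos: int) -> int:
    """Convert a 1-based T0.7 product position to a BC1 coordinate."""
    if not 1 <= pos <= T07_LENGTH:
        raise ValueError(f"position {pos} lies outside the {T07_LENGTH}-bp product")
    return T07_POS1_BC1 - (pos - 1)


def bc1_to_t07(coord: int) -> int:
    """Inverse conversion: BC1 coordinate -> 1-based product position."""
    pos = T07_POS1_BC1 - coord + 1
    if not 1 <= pos <= T07_LENGTH:
        raise ValueError(f"coordinate {coord} lies outside the product")
    return pos


print(t07_to_bc1(1), t07_to_bc1(T07_LENGTH))  # 118114 117469
```

If base calls are reported in the product's orientation, they lie on the opposite strand to the BC1 database entry and would also need to be complemented before a direct base-by-base comparison.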
FIG. 8.
Comparison of polymorphic nucleotide patterns that identify several distinct subgroups of HHV8 genomes within the 330-bp ORF26 gene locus (map coordinate 0.35). Dashes indicate identities to the prototype A1 class sequence (BCBL-R) in the top line. The numbering system conforms to that used by Zong et al. (38) and introduced by Chang et al. (5) for the ORF26 RDA fragment. Positions 893 (=1) and 1222 (=330) in these PCR products are equivalent to positions 47193 and 47522 in the complete BC1 genomic sequence (primers LGH1701 and LGH1702). Overall, the ORF26 sequences resolve into four distinct patterns roughly equivalent to the ORF-K1 A, C3, C2, and B subtypes. However, the patterns for several of the B and C variants are either very similar or identical and the D patterns are indistinguishable from those of subtype A patterns. Therefore, the use of ORF26 patterns alone for classification is not recommended. Note that all of the genomes examined here differ from the original data of Chang et al. (5) by having 1033-T instead of 1033-C (genomic position 47333). All of the genomes examined here, including HBL6 and BC2, also differed from the data given for BC1 and BC2 by Cesarman et al. (3) by having 926-A, 989-C, and 1199-T.
Finally, to determine whether there is any significant variation among either the P or M alleles within the divergent ORF-K14.1 block, the products of the triple-primer PCRs were sequenced for 12 different HHV8 samples (for a summary, see Fig. 9). Among the genomes with the P allele at the RHS terminus, all six of the A and C variants tested were identical over a total of 362 bp, whereas the B version (431KAP) showed only a 1-bp change. Similarly, all five of the M allele variants tested from both ORF-K1 subtype A and C genomes were identical to one another across the equivalent 450-bp divergent region.
Evaluation of the coding capacity of the two ORF-K15 regions.
In their original analysis of HHV8(BC1) coding regions, Russo et al. (31) identified only a single short, leftward-oriented coding region of 100 amino acids (termed ORF-K15) within the 3,500-bp region between ORF75 and the RHS TTR. Although our DNA sequence of the equivalent 2,500-bp block in HHV8(BCBL-R) showed no significant nucleotide homology to that of BC1, it displayed a very similar overall distribution of GC-rich and AT-rich patterns, as well as a small ORF-K15-like region, but with no associated ATG codon. Furthermore, both sequences contained numerous potential consensus splice signals located on one strand only (leftward orientation), together with a second, larger potential ORF lacking ATG motifs just upstream of ORF75. Most strikingly, there were also several very small ORFs with potentially conserved amino acids, as well as numerous potential hydrophobic transmembrane (TM) domain-like segments present at apparently colinear loci in the leftward orientation (27).
Subsequent extensive visual comparison of both unique sequences revealed that each region could potentially produce complex spliced mRNAs (Fig. 4) consisting of eight almost exactly matching coding exons spliced in a linear manner to create either a 489-amino-acid (P) or a 498-amino-acid (M) protein. These cDNAs are predicted to encode large intact integral membrane proteins consisting of 12 matching hydrophobic TM-spanning domains and a relatively large C-terminal cytoplasmic tail (Fig. 4). Several alternative splicing patterns appear plausible as well, especially toward the N terminus of the P form. Overall, this prototype structure is remarkably similar to that of the 497-amino-acid LMP2 latency protein of EBV, which also contains 12 TM domains encoded in multiple short exons and is an exact positional analogue of ORF-K15, lying upstream of the ORF75 homologue (BNRF1) at the equivalent end of that genome.
FIG. 4.
The structures and intron-exon boundaries of the ORF-K15(P) and ORF-K15(M) proteins and mRNAs are remarkably similar and resemble those of the EBV LMP2 protein. Overall spliced structure and organization of coding exons of sequenced RT-PCR products from uninduced latent-state mRNAs from the BCBL1 (P) and HBL6 (M) PEL cell lines are shown. Numbers and solid bars show the relative size and amino acid (aa) content of each coding exon (E1 to E8) from ORF-K15 (P) and ORF-K15 (M) mRNAs compared to the organization and size of exons (E1A to E9 or E1B to E9) from EBV LMP2A and LMP2B mRNAs. Seven additional amino acids are created across the splice junctions in both ORF-K15 mRNAs. Hatched and solid bars indicate coding regions, and hatched regions denote the 12 TM domains. Open bars indicate noncoding regions and exons. Circles represent conserved consensus YXXL-type SH2-binding tyrosine kinase interaction motifs. pA, polyadenylation signal.
The existence of latent-state HHV8 mRNAs containing each of the seven predicted splice junctions listed in Table 1 was confirmed by sequencing the most abundant RT-PCR products from uninduced mRNA obtained from both the BCBL1 (P) and HBL6 (M) cell lines. In both cases, all seven introns are short and similar in size, ranging in length from 77 to 110 bp in BCBL-R and from 78 to 96 bp in BC1. They also include several unusual nonconsensus donor or acceptor motifs associated with introns 4 and 5. The intact BCBL1 ORF-K15(P) protein encoded by the spliced cDNA clone proved to be nearly identical in amino acid sequence to that predicted from our genomic DNA sequence of BCBL-R (Fig. 5). In addition, except for an apparent reading frame discrepancy in exon 1 of the published BC1 sequence, our intact HBL6 cDNA clone encodes an ORF-K15(M) protein identical to that predicted from BC1 genomic DNA. Sequencing of the appropriate portions of HBL6, TKS1, and ASM72 genomic DNAs confirmed that they too contain an additional base pair in exon 1 near the N terminus of the protein relative to the database sequence for HHV8(BC1). This change extends the intact M allele ORF-K15 reading frame to an ATG initiator codon (at position 136771 in BC1) exactly equivalent to that in the P allele ORF-K15 genes in BCBL-R (position 3652) and BCBL1 (Fig. 5). The intact P and M versions of the ORF-K15 protein display an overall amino acid identity of only 33%, with 50% similarity.
TABLE 1.
Comparison of ORF-K15(P) and ORF-K15(M) splicing signals^a
Gene (form) and domain | Defining motif | Position^b | Exon size (bp) | Intron size (bp)
---|---|---|---|---
BCBL-R (P) | | | |
Exon 1/initiator | ATG | 3652 | (217) |
Intron 1 | G′GTAAGT | 3435 | | 109
Exon 2 | TTTGTTTTTATAG′ | 3326 | 93 |
Intron 2 | G′GTAAGG | 3233 | | 83
Exon 3 | TTGTATTTTATAG′ | 3150 | 243 |
Intron 3 | G′GTAAGT | 2907 | | 87
Exon 4 | ATCTTTTTTATAG′ | 2821 | 90 |
Intron 4 | G′GTACAG | 2730 | | 77
Exon 5 | ACATTTTTTGTAG′ | 2653 | 153 |
Intron 5 | G′GTTTGT | 2500 | | 82
Exon 6 | ATGTATAAACAG′ | 2419 | 108 |
Intron 6 | G′GTAGGT | 2310 | | 82
Exon 7 | CTCTATTTTTTAG′ | 2228 | 102 |
Intron 7 | A′GTAAGT | 2126 | | 85
Exon 8 | TTTTAAAATTTAG′ | 2041 | (461) |
Terminator | TAG | 1581 | |
BC1 (M) | | | |
Exon 1/initiator | ATG | 136771 | (232) |
Intron 1 | G′GTAAGT | 136540 | | 85
Exon 2 | TCTTTTGTTTTAG′ | 136455 | 93 |
Intron 2 | G′GTAAGT | 136362 | | 96
Exon 3 | ATGTGTTTTTCAG′ | 136266 | 243 |
Intron 3 | G′GTAAGT | 136023 | | 85
Exon 4 | TTTACTACTACAG′ | 135938 | 90 |
Intron 4 | G′GTATAT | 135848 | | 87
Exon 5 | AACTTTATTCAG′ | 135761 | 156 |
Intron 5 | G′GTATGT | 135605 | | 94
Exon 6 | TTTTATTTACAG′ | 135511 | 108 |
Intron 6 | G′GTAAGG | 135403 | | 78
Exon 7 | TTTATTTCCTTAG′ | 135325 | 102 |
Intron 7 | A′GTGAGT | 135223 | | 80
Exon 8 | TTTTTCTTATTAG′ | 135143 | (470) |
Terminator | TAA | 134674 | |
Predicted splicing patterns and intron and exon sizes and structures of the ORF-K15(P) and ORF-K15(M) genes of HHV8 compared to those for LMP2 in EBV (17). The data are based on DNA sequence results from cloned RT-PCR products of single intact mRNA species encompassing all seven introns obtained from latent-state mRNAs from the BCBL1 (P) and HBL6 (M) PEL cell lines. The underlined sequences represent key consensus positions. Parentheses indicate coding positions only.
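The coding-exon sizes in Table 1 can be checked against the protein lengths given in the text and against the seven junction-spanning amino acids noted in the Fig. 4 legend. The sketch below assumes that the listed exon sizes are coding nucleotides in reading order (the parenthesized sizes for exons 1 and 8 counting coding positions only, as the table footnote states). It sums the exons, divides by three, and reports the reading-frame phase at each of the seven junctions; a nonzero phase means that one codon is split across that junction.

```python
# Consistency check on Table 1 coding-exon sizes (bp), listed in reading order.
# Assumption: the parenthesized sizes for exons 1 and 8 are coding nucleotides only.
EXON_SIZES = {
    "ORF-K15(P), BCBL-R": [217, 93, 243, 90, 153, 108, 102, 461],
    "ORF-K15(M), BC1": [232, 93, 243, 90, 156, 108, 102, 470],
}


def codon_count(exons: list[int]) -> int:
    """Total coding length expressed in codons; fails if not a multiple of 3."""
    total = sum(exons)
    if total % 3:
        raise ValueError(f"{total} nt is not a whole number of codons")
    return total // 3


def junction_phases(exons: list[int]) -> list[int]:
    """Reading-frame phase (0, 1, or 2) at each exon-exon junction."""
    phases, running = [], 0
    for size in exons[:-1]:  # the last exon has no downstream junction
        running += size
        phases.append(running % 3)
    return phases


def split_codons(exons: list[int]) -> int:
    """Codons that straddle a junction, i.e., junctions with nonzero phase."""
    return sum(1 for phase in junction_phases(exons) if phase)


for name, exons in EXON_SIZES.items():
    print(name, sum(exons), codon_count(exons), junction_phases(exons), split_codons(exons))
```

With these sizes both totals (1,467 and 1,494 nt) are exact multiples of three, giving the 489 and 498 codons that match the lengths stated in the text, and every junction falls in phase 1, so each of the seven splices creates one junction-spanning codon. Whether the stop codon is counted within exon 8 is not stated, so this is a consistency check rather than an independent determination of protein length.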
FIG. 5.
Comparison of the complete predicted protein sequences of the ORF-K15(P) and ORF-K15(M) alleles of HHV8. Each protein consists of eight matching exons and 12 TM domains (TM1 to TM12) with an extended C-terminal cytoplasmic domain. The ORF-K15(P) allele coding region occupies nucleotide positions 1578 to 3652 and is oriented leftward in the HHV8(BCBL-R) and HHV8(BCBL1) DNA sequences (GenBank accession no. U85269). The ORF-K15(M) allele coding region occupies nucleotide positions 134671 to 136771 and is oriented leftward in the HHV8(BC1) DNA sequence (GenBank accession no. U75698) but has an additional A inserted after position 136610 in our HBL6 version. The 12 presumed TM domains are denoted by the broken overlines, and intron-exon junctions are indicated by carets. Amino acid identities are signified by asterisks, and similarities are shown by vertical lines. The potential protein tyrosine kinase binding motifs YASIL and YEEVL plus a conserved QSG(M/I)S motif in the C-terminal cytoplasmic domain in exon 8 are highlighted.
Both the P and M allele proteins contain predicted 142- to 145-amino-acid C-terminal cytoplasmic tail domains in exon 8, which include conserved YXXL-like tyrosine kinase signalling motifs (e.g., YASI and YEEV) that would be expected to be targets of, and to interact with, SH2 domain-containing protein kinases. They also have several possible Pro-rich SH3 domain-like motifs (SPPPLPP, PPLPS, PPPFQP, and TPPPT) and a conserved QSG(M/I)S motif. Curiously, this arrangement of probable C-terminal tail signalling motifs contrasts with that of the EBV LMP2A protein, in which a functional YXXLN6YXXL-containing immunoreceptor tyrosine-based activation motif (ITAM), a single YEEA motif, and several SH3 domains lie instead in the 140-amino-acid N-terminal cytoplasmic tail. Nevertheless, although there is no residual amino acid homology between them, the core integral membrane sections of the ORF-K15 and LMP2 genes display intriguing organizational parallels: TM domains 3, 4, 7, 8, and 12 all span introns, and the relative sizes of the first five coding exons are very similar (Fig. 4). Only the apparent fusion of exons 6 and 7, with the accompanying loss of the intron in TM domain 11, represents a significant change in the organization of this region in EBV LMP2 compared to the two HHV8 ORF-K15 alleles.
Overall, although the originally defined K15 ORF in HHV8 BC1 represents only exon 3 of the complete protein, the extended multiexon spliced genes for ORF-K15(P) and ORF-K15(M) each occupy close to 2,100 bp and essentially fill the entire orphan region at the extreme RHS of the genome between ORF75 and the TTR in both BCBL-R and HBL6/BC1. Our data have not permitted identification of the actual 5′ or 3′ ends of the ORF-K15 mRNAs, although we anticipate that, as for LMP2 in EBV, the poly(A) signal is probably 3′ coterminal with that of the downstream ORF75 gene. Interestingly, a proximal TATATAA box motif and probable associated lytic-cycle promoter elements are present just upstream of ORF-K15(M) exon 1 in BC1 DNA but not in the ORF-K15(P) version, implying that the transcriptional control of the two alleles is likely to differ significantly.
The unique-sequence–TTR junction in the P form of the RHS end differs between the B/D and A/C subtypes.
To further evaluate the level of heterogeneity within the two forms of the extreme RHS of HHV8 genomes, we also analyzed another PCR locus, an 850-bp region spanning the unique-sequence–TTR boundary immediately upstream and to the right of the ORF-K15(P) genes, in several genomes representing each of the major ORF-K1 subtypes. The results revealed two principal alternative PCR products that differed in size by approximately 45 bp, with the shorter forms detected only in genomes with subtype B and D ORF-K1 genes (data not shown). As expected, this PCR primer pair failed to give any products with M subtype genomes.
Sequence data (not shown) for the U/TTR locus confirmed that all African KS genomes of the B/P type fell into two groups that could be readily distinguished from one another and from all A/P and C/P subtypes (for a summary, see Fig. 9). In particular, 431KAP, OKS4, RKS1, RKS3, and RKS4 all have either a 53-bp or a 55-bp deletion across the U/TTR boundary, as well as a nearby 10-bp insertion in intron 1 and up to 10 other single-nucleotide changes. We refer to these as B1 and B2 variants (see Fig. 9). In contrast, OKS3, RKS2, RKS5, and JKS15 form a very different subset, which we refer to as A/B (see Fig. 9) because they do not have the deletions and differ only slightly from the A/C pattern. The subtype D1 genome from TKS10 was similar to the deleted B1 and B2 patterns but included a second 15-bp insertion in the center of intron 1, as well as the 10-bp insertion and the 53-bp U/TTR junction deletion. Details of the upstream ORF-K15 regions will be presented elsewhere. However, except for possible alternatively spliced forms of exon 1, there is little effect of any of these subtype B- and D-specific changes on the structure or amino acid sequence of the ORF-K15(P) protein.
In contrast to the B/P and D/P subtypes, the nine A/P and C/P subtypes sequenced showed minimal variations in U/TTR; all four of the subtype A1 or A4 genomes examined are identical to BCBL-R over the entire 849 bp, whereas the four subtype C3 or C4 genomes tested and the A3 variant (BCBL1) differ by only two to four nucleotides, none of which affects the ORF-K15 coding region. Note that the U/TTR boundaries in all of the P allele genomes analyzed are less than 300 bp upstream from the presumed ORF-K15(P) ATG initiator codon, whereas the upstream ORF-K15(M) region in BC1 contains an additional 3 to 4 kb of complex (G+A)-rich repeat sequence before the U/TTR boundary (31).
The M allele of ORF-K15 is linked to a specific M subtype of the adjacent ORF75 gene.
To understand the differences noted previously between ASM72 and HBL6/BC1 compared to subtype A and B genomes within the ORF75 coding region, we PCR sequenced an expanded 854-bp region (ORF75-E) from 58 of the HHV8 genomes in our ORF-K1 set. The results revealed that the influence of the M allele, versus that of the P allele, was still readily detectable even in the adjacent downstream ORF75 coding region (38). A chart showing the positions of all 20 nucleotide polymorphisms detected within ORF75-E and our interpretation of subgroup patterns is presented in Fig. 6. Of the 28 subtype A and C genomes having the P type of RHS, 25 were identical; the exceptions were a single-nucleotide change in both subtype A3 genomes (BCBL1 and BKS14) and another in one of the subtype C2 samples (EKS1). Among the 14 subtype B genomes tested, two distinct patterns emerged. Six gave a characteristic B pattern, which differed from that of all of the A-plus-C versions at up to seven positions. The other eight samples (plus OKS3) were identical to the A/C pattern, except for 159-G in JKS20 and 563-G in the three American B samples. The three subtype D1 and D2 ORF75-E sequences were also unique and differed at six positions from the A/C pattern and at seven positions from the B pattern. Remarkably, among genomes with the M type of RHS region, 14 of the 16 examples tested, representing a variety of C1, C2, C3, C5, A1, and A2 ORF-K1 genomes, were identical to one another and to HBL6/BC1 except at one position, yet differed from the B/P subgroup ORF75-E pattern at 9 positions, from the D/P pattern at 13 positions, and from the A/P and C/P subtype RHS genomes at 14 positions. The two B/M genomes OKS7 and OKS8 from Florida were the only exceptions found in which M allele ORF-K15 genes were associated with P subtype ORF75-E patterns.
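The linkage argument amounts to a per-genome check: an M allele of ORF-K15 is expected to co-occur with the M pattern of ORF75-E, and a P allele with any non-M pattern. The sketch below applies that assumed rule to a small, illustrative subset of genomes whose calls are stated in this section; it is not the full Fig. 9 data set, and the names `CALLS`, `is_linked`, and `discordant` are hypothetical.

```python
# Illustrative subset of genotype calls stated in the text (not the full Fig. 9 set):
# genome -> (ORF-K15 allele, ORF75-E pattern class)
CALLS = {
    "BCBL-R": ("P", "A/C"),
    "TKS10": ("P", "D"),
    "HBL6": ("M", "M"),
    "ASM72": ("M", "M"),
    "OKS7": ("M", "A/C"),
    "OKS8": ("M", "A/C"),
}


def is_linked(k15_allele: str, orf75e_class: str) -> bool:
    """Assumed rule: M allele <-> M ORF75-E pattern; P allele <-> any non-M pattern."""
    return (orf75e_class == "M") == (k15_allele == "M")


def discordant(calls: dict[str, tuple[str, str]]) -> list[str]:
    """Genomes whose ORF75-E class breaks the assumed linkage rule."""
    return sorted(name for name, (k15, orf75e) in calls.items() if not is_linked(k15, orf75e))


print(discordant(CALLS))  # ['OKS7', 'OKS8']
```

On this subset only the two Florida B/M genomes are flagged, mirroring the two exceptions described above.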
These results show that all subtype A and C genomes with the P form of RHS are virtually identical in both the ORF75-E block and the ORF-K14.1 junction regions that lie immediately to the left (downstream) of ORF-K15, although subtypes B and D both differ from them significantly (by 0.7 to 0.8% within the ORF75-E locus). In contrast, all clade A or C genomes that have the M form of RHS are identical to one another at both of these sites but give patterns that are quite distinct from those of the P type (by up to 1.5%). Thus, all but 2 of the more than 50 genomes examined showed complete linkage between their P or M patterns for ORF75-E and ORF-K14.1 and their P or M alleles of ORF-K15. Overall, we conclude that HHV8 genomes with the ASM72 or HBL6/BC1 type of RHS are all chimeras containing an M variant of the RHS end, which includes very different ORF-K15 alleles and U/TTR junction regions, plus associated distinctive (but far less diverged) versions of the adjacent ORF-K14.1 and ORF75 genes. Accordingly, our original C prototype genome, ASM72, is now judged to be a probable C1/M recombinant, whereas HBL6/BC1 represents an A2/M recombinant (see Fig. 9). Interestingly, 8 of the 18 genomes identified so far with the M type of RHS represent classic or iatrogenic non-AIDS KS samples. In addition, all eight samples from both AIDS and classic Chinese KS patients from Taiwan were of the M type, but no M allele genomes have been detected among our 15 KS samples from Africa and the Pacific.
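The percentage divergence values quoted for ORF75-E are the number of differing positions divided by the length of the compared block. A minimal sketch for the 854-bp ORF75-E block, using the six and seven differences that separate the D and B patterns, respectively, from the A/C pattern:

```python
def percent_divergence(differences: int, length_bp: int) -> float:
    """Nucleotide divergence as a percentage of the compared block length."""
    if length_bp <= 0:
        raise ValueError("block length must be positive")
    return 100.0 * differences / length_bp


ORF75E_LENGTH = 854  # extended ORF75-E block (bp)
for label, diffs in (("D vs A/C", 6), ("B vs A/C", 7)):
    print(label, round(percent_divergence(diffs, ORF75E_LENGTH), 1))
```

Rounded to one decimal place, the two values reproduce the 0.7 to 0.8% range given for subtypes D and B.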
Further penetration of M allele-associated patterns into the T0.7/K12 gene region.
A short and highly abundant viral RNA species referred to as T0.7, which potentially encodes the small ORF-K12 membrane protein, was the first HHV8 gene product identified that is unambiguously associated with latent-state expression in both PEL cell lines and KS tumor spindle cells (33, 34, 36). This gene lies within the DL-E divergent locus toward the RHS of the genome at map position 0.85. The latent-state ORF72 (vCYC-D) gene region at map coordinate 0.90 (data not shown) was also examined initially as a potential RHS locus that we expected might not be linked to the P and M genotypes for ORF75 and ORF-K15. Although both loci gave subtype-specific polymorphisms, the T0.7 gene proved to display a more useful level of subtype divergence (up to 5% nucleotide variation). Therefore, PCR sequence analysis of the 646-bp T0.7 segment equivalent to nucleotide positions 118114 to 117469 of the BC1 genome was carried out on a total of 58 different HHV8 samples representing almost all of those included in our previous ORF-K1 analysis (37). The results are summarized in the chart in Fig. 7 in the form of variations from our usual prototype subtype A1 genome BCBL-R. Among a total of 35 polymorphic positions within the T0.7 region, four major subgroup patterns were detected, with the A-plus-C group giving two very distinctive types of patterns and the B and D patterns also each being very different from the others. However, again there were some significant changes from the typical ORF-K1 groupings within the A-plus-C subtypes and several additional minor variant patterns were discernible among both the A and B subtypes.
Among the 21 T0.7 samples tested that were subtype A for ORF-K1, all of the A1 and A4 examples were identical, except for BKS1 and BKS16, whereas the A2 (HBL6/BC1) pattern (also found in BKS16) differed at two or three positions and the A3 pattern (BCBL1 and BKS14) differed at four positions, with nucleotide changes to 245-A and 550-C being common to both the A2 and A3 patterns and 78-T plus 525-T being specific for A3. Surprisingly, excluding the Taiwan clade, six of the seven genomes that possessed C3 variant ORF-K1 genes were also identical to the A1 pattern and the other one (BKS12) could be considered an A2 pattern. Similarly, five of the six subtype C2 ORF-K1 genomes were identical to either the A1 (SKS2, SKS6), A2 (SKS7, EKS1), or A3 (SKS3) pattern for T0.7. In contrast, 10 of the 11 remaining genomes with subtype C ORF-K1 genes displayed a dramatically different T0.7 pattern with either 10 or 11 bp changes from the A1 pattern (1.6% variation). Importantly, all 10 of these genomes with the novel T0.7 pattern, which include ASM72 and BKS13 (C1), TKS11 (C2), and SKS9 (C5) plus all six examples of the Taiwan C3′ clade tested, have the M allele type of ORF-K15 gene. Only OKS3(A5) and SKS1(C4) among all of the subgroup A and C genomes did not fit into one or the other of these two major A/C patterns for T0.7, with OKS3 giving a typical B pattern and SKS1 having a novel pattern intermediate between those of the A/C and D subtypes.
Unlike the complexities of the subtype A and C patterns of T0.7, all 12 of the ORF-K1 subtype B genomes tested (and the sole African A5 genome) fitted readily into a very distinct subtype B T0.7 group, which could be further divided into three subpatterns. Five of the African KS samples clustered as B1 variants, and four others represented B2 variants, whereas all three American B subtypes (OKS7, OKS8, and JKS15) represented a distinct B3 variant pattern. Overall, the B patterns differed from the A1 pattern at 11 to 12 positions, from C1 at 8 to 9 positions, and among each other at 6 to 8 positions. Again, the T0.7 gene in the unusual OKS3 sample (with the A5 variant ORF-K1 gene) closely resembled the B1 pattern observed in other African samples. Finally, all three D1 and D2 genomes were nearly identical and displayed a unique subtype D pattern which differed from A1, C1, and B1 at 9, 6, and 6 positions, respectively, and from the M pattern at 13 positions.
There are two alternative ways to interpret the finding of two very different types of A/C patterns in T0.7. We could conclude either that the highly diverged T0.7 pattern displayed by four of the C1, C2, and C5 genomes, together with all of the C3′ TKS genomes from Taiwan, represents a prototype subtype C pattern or, alternatively, that it again represents a distinctive M allele-linked pattern. However, the first interpretation is complicated by the following considerations. Firstly, five other genomes with subtype C2 ORF-K1 genes do not have a distinctive subtype C2 T0.7 pattern. Secondly, it seems unlikely that the seven non-Taiwan-derived C3 patterns would all be essentially indistinguishable from A1 patterns while the novel T0.7 pattern displayed by 10 other subtype C genomes differs from the A1/C3 pattern at as many as 10 positions. Thirdly, we would also have to conclude that all other subtype C genomes (13 of 23) had recombined with subtype A genomes, although admittedly this must be the case anyway for 5 of them that show chimeric patterns (see below). Fourthly, and most compellingly, it is unlikely to be coincidental that all 10 of the genomes with this novel T0.7 pattern also have the M type of ORF-K15 gene. Overall, 10 of the 16 genomes with M genotype RHS ends that we tested gave this novel type of T0.7 pattern. Therefore, we conclude, firstly, that (just as in ORF75-E) the A and C subtypes of P allele genomes are virtually indistinguishable and, secondly, that the highly diverged T0.7 pattern must reflect further leftward penetration of the M allele-associated RHS pattern.
Re-evaluation and extension of the ORF26 region data.
Although we originally defined the major A, B, and C subtypes in the ORF-K1 region based on the presumption of linkage to the original assignments for the ORF26 and ORF75-C regions in our prototype strains (i.e., BCBL-R as A, 431KAP as B, and ASM72 as C), it has become evident that ORF26 data alone are insufficient for accurate discrimination between subtype B and C patterns, especially within the smaller 233-bp ORF26 block used in most previous studies. Partly because some tentative assignments in Table 2 of Zong et al. (38) appeared to need revision and partly because there were no subtype C3 genomes in our original analysis, we have re-evaluated a somewhat larger 330-bp ORF26 block at map position 0.35 by sequencing 60 of the HHV8 genomes included in the ORF-K1 analysis. A summary chart of the results and interpretations is presented in Fig. 8. The data include our reanalysis of BC2, which differs at two positions from the original published BC2 sequence of Cesarman et al. (3), as well as our data for HBL6, which agree with those for BC1 of Russo et al. (31) but not with those of Cesarman et al. (3). We have also omitted the KSHV sequence of Chang et al. (5) because we suspect that its unusual base 1033-C (rather than 1033-T) in ORF26 and three nonconsensus bases in ORF75-E, none of which has been found in any other sample, may all represent PCR errors introduced during representational difference analysis (RDA).
The new data revealed essentially identical patterns for all A1, A2, and A4 variants and the subtype D genomes but with a single C-to-A change in the two A3 variants at position 1103. In contrast, the prototype C1, C2, and C3 variants differed from the A pattern at 4, 3, and 5 positions, respectively, including changes at nucleotide positions 981-C, 1032-A, 1055-T, 1086-T, 1094-A, 1132-G, and 1139-C. Similarly, three distinct patterns emerged among known subtype B genomes, although one (B3) is identical to the C2 pattern, and all involve different arrangements of the same changes seen within the subtype C patterns at positions 981-C, 1032-A, 1132-G, and 1139-C. However, there is no evidence for any M-linked pattern in ORF26. Finally, we need to point out that the previously published ORF26 pattern for BC2 was originally assigned incorrectly as a B′ type in Tables 2 and 3 of Zong et al. (38), but the corrected pattern is now recognized as the prototype for C3 variants seen also in BC3, BKS3, and JSC1.
Complexities and possible overlaps among some ORF26 patterns within the B and C subgroups.
In the ORF-K1 analysis (37), only 1 of 10 African KS samples tested did not belong to subtype B, and that was the single example found of subtype A5 (OKS3). The ORF26 pattern for OKS3 is also distinctive, being unlike any of the other A patterns and different from the prototype C3 (BC2) and B (431KAP) patterns. However, the ORF26 sequence pattern for OKS3 matches those of TKS11 (C2) and BKS13 (C1), as well as our unusual ST1 chimeric African sample, which proved to have a subtype B ORF-K1 gene, although it clearly had an A/C subtype ORF75-E pattern. Furthermore, four other African KS samples reported by Huang et al. (14), as well as four KS samples from Saudi Arabian renal transplant patients, have this type of ORF26 pattern (9). Subsequent sequencing of the ORF-K1 region of six Saudi Arabian samples with this type of ORF26 gene revealed four C2 patterns, as well as our prototype examples of the C4 and C5 patterns (37). Therefore, we now believe that the 981-C, 1086-T, and 1139-C pattern for ORF26, as found in TKS11, BKS13, SKS1, SKS2, SKS3, and SKS6, which all differ by only 1 bp (A-1094) from ASM72 (C1), probably represents the prototype C2 pattern. Since most C3 genomes have the distinctive 981-C, 1032-A, 1055-T, 1132-G, and 1139-C pattern for ORF26 (Fig. 8), we also have to conclude that the Taiwan clade TKS1 to TKS9 genomes, which are identical to C2 in ORF26 except for one additional distinctive change to 935-A, presumably all represent a novel C3′/C2′/M recombinant genome. Evidently, KS-F and EKS1 must both also represent chimeric C/A recombinant genomes based on discordant data for their ORF-K1 and ORF26 loci (see later).
The predominant subtype B patterns for ORF26 are either 981-C, 1132-G, and 1139-C (referred to as B1 variants) or 981-C, 1032-A, 1132-G, and 1139-C (referred to as B2 variants) (Fig. 8). However, four samples (ST1, RKS5, JKS20, and OKS3) all have the 981-C, 1086-T, and 1139-C pattern for ORF26, which could be a B3 subgroup but is identical to the C2 pattern (Fig. 8). Similarly, ZKS6 differs by only 1 bp (B3′). The B3 patterns may represent true B variants that happen to be identical to the C2 pattern, or alternatively, they might represent mosaic genomes. Overall, we conclude that there is considerable similarity among and overlap between the ORF26 patterns for several variants of the subtype B and C genomes. Furthermore, the subtype A and D patterns are not distinguishable either. Therefore, we caution that this ambiguity could lead to misinterpretations and incorrect strain assignments when judgements are based solely on ORF26 data, as has been the case in most previous analyses of HHV8 strain variability (8, 9, 14).
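The overlap problem can be made concrete by treating each ORF26 pattern as a set of variant alleles relative to the A1 prototype, using only the positions listed in this and the preceding paragraphs. A lookup that returns every exactly matching label then shows where an ORF26 sequence on its own cannot separate subtypes. The labels and data structure are illustrative assumptions; a real classifier would also need to handle sequences that match no listed pattern, such as the Taiwan clade with its additional 935-A change.

```python
# ORF26 variant alleles relative to the A1 prototype, as listed in the text.
# A (A1/A2/A4) and D share the empty set; B3 is identical to C2.
ORF26_PATTERNS = {
    "A": frozenset(),
    "D": frozenset(),
    "A3": frozenset({"1103A"}),
    "C1": frozenset({"981C", "1086T", "1094A", "1139C"}),
    "C2": frozenset({"981C", "1086T", "1139C"}),
    "C3": frozenset({"981C", "1032A", "1055T", "1132G", "1139C"}),
    "B1": frozenset({"981C", "1132G", "1139C"}),
    "B2": frozenset({"981C", "1032A", "1132G", "1139C"}),
    "B3": frozenset({"981C", "1086T", "1139C"}),
}


def classify_orf26(variants) -> list[str]:
    """Return every pattern label whose variant set matches exactly."""
    observed = frozenset(variants)
    return sorted(label for label, pattern in ORF26_PATTERNS.items() if pattern == observed)


print(classify_orf26({"981C", "1086T", "1139C"}))  # ['B3', 'C2']
print(classify_orf26(set()))                        # ['A', 'D']
```

Any call that returns more than one label is exactly the kind of ambiguity that makes ORF26 data alone unsuitable for strain assignment.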
Correlations and linkage between multiple loci.
The available data and subgroup assignments at up to seven distinct central and RHS loci across the genomes of 63 KS and PEL patient samples are summarized in Fig. 9 and compared there with the LHS ORF-K1 data from Zong et al. (37). Setting aside the complexities introduced by the ORF-K15(M) RHS recombinants, a reasonably coherent picture has emerged showing linkage and cosegregation between subgroup patterns across the whole genome. In most of the 49 subtype A, C, and D genomes examined, the subgroup patterns of the ORF-K1, ORF26, T0.7, ORF75-E, and UPS75 loci are fully consistent across the entire DNA molecule. Obvious exceptions include EKS1 and SKS7 (C2/A2), SKS3 (C2/A3), KS-F (C3/A1), and BKS12 (C3/A2), which are best interpreted as intertypic recombinants between different A and C variants, as well as the seven TKS1 to TKS9 genomes (C3′/C2′), which are intratypic recombinants. Note that many of these also have an M allele pattern at the RHS. In addition, BCBL1, which may be an A3/C3 recombinant, has ORF26, T0.7, and ORF75-E patterns slightly different from those of all of the other subtype A samples (in each case differing at only one position) and a novel RHS U/TTR junction pattern (4-bp differences from all of the other subtype A and C samples tested) that includes a C3-like variant of the TTR. Our current interpretation of the overall structures of 13 clearly chimeric HHV8 genome types is illustrated and summarized in Fig. 10. Most of the intertypic and intratypic chimerism observed was associated with the nine distinct types of M allele-containing genomes and apparently reflects multiple sequential recombination events that generated these genomes, but four types of intertypic chimerism within P allele genomes are also illustrated.
FIG. 10.
Summary of the deduced overall genome structures of 13 different mosaic HHV8 genotypes that are interpreted to represent either intertypic recombinants or chimeras that have acquired ORF-K15(M) alleles and associated M-linked ORF75 and T0.7 genes. The relative genomic locations of the seven PCR loci involved in these studies are shown at the top. Key for recombinant genomes: solid horizontal lines, subtype A and RHS P allele sequences; open bars, subtype C sequences; dotted bars, subtype B sequences; hatched bars, M allele-linked sequences; solid bars, presumed exotic nonhuman ORF-K15(M) sequences. (A) Four examples of exclusively intertypic P allele genomes. Note that the EKS1 pattern is also found in SKS7 and that variations of the RKS2/5 pattern are also found in OKS3, ST1, JKS15, and JKS20. (B) Nine examples of chimeric genome patterns containing M allele RHS sequences, including both ORF-K15(M) itself and adjacent M-linked genes. TKS1/9 and BKS16 are also considered to be intratypic recombinants independently of the M allele sequences, whereas BKS12, SKS3, and OKS7/8 are also intertypic recombinants. Note that the ASM72 pattern is also found in BKS13 and the TKS1/9 pattern is also found in TKS2, TKS3, TKS5, TKS6, and TKS7. (C) Model for the origin of M allele sequences from a hypothetical original intact M subtype parent genome. Note that this is itself interpreted to be a chimera containing a highly diverged ORF-K15(M) gene that was likely to have been derived initially by recombination between an ancient human M subtype HHV8 and an exotic nonhuman HHV8-like virus source.
The overall RHS genome assignments for the three D1 and D2 variants that form the Pacific ORF-K1 D subtype are of particular interest. Although the ORF-K1 patterns of the Hwalien sample from Taiwan (TKS10) and the two Polynesian samples (ZKS3 and ZKS4) are dramatically different (21% amino acid divergence), we had judged previously that there were sufficient similarities to warrant their provisional assignment to a single novel D or Pacific subgroup (37). This view is now strengthened by the finding not only that these three samples were again distinguishable from all other subtype patterns at the T0.7, ORF75-E, UPS75, and U/TTR loci (Fig. 9) but also that they were almost indistinguishable from each other at these RHS loci, as well as in ORF26.
Complex chimerism among some subtype B genomes.
Approximately half (7 of 12) of the subtype B KS samples from Africa that have been analyzed appear to have simple uncomplicated subtype B characteristics throughout their genomes. However, the other five, as well as the three subtype B genomes from America, display a complex mosaicism that may reflect a history of both intertypic and intratypic recombination. Interestingly, all 10 of the samples obtained directly from Africa (as well as those from two recent African emigrants) have some features of subtype B genomes. This includes ST1, in which only the LHS is B, and OKS3, in which ORF-K1 is an A5 variant, but other parts of the genome (especially T0.7) have distinctive subtype B characteristics.
At the RHS of the mosaic subset of subtype B genomes, there is a common pattern for OKS3, RKS2, RKS5, JKS15, and JKS20, whereby they all have an A/C-like rather than a B-like ORF75-E pattern, as well as a U/TTR pattern (referred to as A/B in Fig. 9) that is intermediate between the A/C and B patterns but much closer to A/C than to B. The two chimeric B/M genomes from Florida also have the A/C-like pattern for ORF75-E, although in this case the U/TTR region has been displaced by an M allele at the RHS. These findings are illustrated, probably in oversimplified form, for the three genome structure prototypes exemplified by RKS2/5, OKS3, and OKS7/8 in Fig. 10. Subsequent analysis has also revealed that in both OKS7 and OKS8 the LHS recombination boundary shows a very distinct crossover point between the B pattern and the M-linked pattern within the N terminus of the ORF75 gene (data not shown). Importantly, all seven of these genomes have standard B pattern T0.7 genes, but all except RKS2 also have indeterminate B/C pattern ORF26 genes, which may or may not represent additional mosaicism toward the LHS. In addition, most of the subtype B patterns at constant-region loci (ORF26, T0.7, ORF75-E, and UPS75) can be divided relatively cleanly into at least three subpatterns (B1, B2, and B3 variants). However, there appears to be significant discordance among these subpatterns at different loci within individual genomes, suggesting the possibility of additional ancient intratypic mosaicism. Clearly, much more analysis is required to define the overall evolutionary history of many of these subtype B genomes. However, based on the results of Kasolo et al. (15), who found that 15 of 15 HHV8-positive children in Zambia had subtype A5 ORF-K1 genomes in their peripheral lymphocytes, as well as on our analysis of two other A5/B pattern mosaic genomes from South Africa (1), this OKS3-like class of genomes may be quite prevalent and important in Africa.
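Crossover mapping of the kind described for OKS7 and OKS8 can be sketched as finding the single breakpoint that best separates alignment sites matching one parental pattern from sites matching the other. The Python below is a hypothetical illustration, not the procedure used in these studies: it takes three aligned sequences (the query and two reference patterns, for example a B pattern and an M-linked pattern), considers only informative columns where the references differ, and scores every possible breakpoint under a one-crossover model.

```python
from typing import List, Optional, Tuple


def locate_crossover(query: str, left_ref: str, right_ref: str) -> Optional[Tuple[Optional[int], Optional[int]]]:
    """Bracket a single crossover between two aligned reference patterns (hypothetical helper).

    Returns (last column assigned to left_ref, first column assigned to right_ref),
    using 0-based alignment columns, or None if no informative column is usable.
    """
    # Informative columns: the references differ and the query matches one of them.
    calls: List[Tuple[int, str]] = []
    for col, (q, l, r) in enumerate(zip(query, left_ref, right_ref)):
        if l != r and q in (l, r):
            calls.append((col, "L" if q == l else "R"))
    if not calls:
        return None

    # Score each breakpoint k: calls before k should match left_ref, calls from k on right_ref.
    best_k, best_score = 0, -1
    for k in range(len(calls) + 1):
        score = sum(p == "L" for _, p in calls[:k]) + sum(p == "R" for _, p in calls[k:])
        if score > best_score:
            best_k, best_score = k, score

    left_col = calls[best_k - 1][0] if best_k > 0 else None
    right_col = calls[best_k][0] if best_k < len(calls) else None
    return left_col, right_col


# Toy sequences: the query follows left_ref at column 2 and right_ref at columns 5 and 8.
print(locate_crossover("ACGTAGGTTC", "ACGTACGTAC", "ACCTAGGTTC"))  # (2, 5)
```

Ties are resolved in favor of the leftmost breakpoint; real data would also need to handle sequencing errors and sites that match neither reference.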
DISCUSSION
The ORF-K1 genes from more than 60 previously analyzed HHV8 genomes have evolved into four major subtypes with distinctive ethnic and geographic associations that we believe are likely to have arisen during the migrationary divergence of modern humans in Paleolithic times (37). Specifically, the separation of the precursor D subtype of HHV8 from a precursor B subtype was predicted to have occurred approximately 60,000 years ago, as humans first spread from Africa to Southern Asia and the Pacific, whereas the derivation and subsequent divergence of the A and C branches probably correlated with later waves of migration into Europe and Northern Asia via the Middle East about 35,000 years ago. The nature of the biological selection pressure that has been driving the unusually high level of amino acid diversity displayed by the ORF-K1 membrane receptor signaling protein (18, 20, 28) is not understood, but obvious additional questions have arisen about whether the ORF-K1 protein subtypes also contribute to the different disease patterns observed in different regions of Africa compared with Mediterranean countries and elsewhere. However, before this concept can be evaluated seriously, we considered it necessary to determine whether other segments of the HHV8 genome display parallel strain heterogeneity patterns and whether previous hints that some genomes may be chimeric recombinants between subtypes represent additional complicating factors.
Therefore, we carried out a systematic analysis of sequence variability at three internal constant-region loci and at the extreme RHS of the genome in the same large set of samples that had been analyzed for ORF-K1 (37). The results of these studies at genomic map coordinates 0.35 (ORF26), 0.85 (T0.7/K12), and 0.96 (ORF75-E), as well as extensive evaluation of an apparent dichotomy in the DNA sequence patterns at the RHS of the genome (27), have led to three major conclusions. Firstly, consistent subtype A, B, C, and D patterns matching those in ORF-K1 are detectable throughout the more conserved internal regions of the genome, although the level of divergence is greatly diminished. Secondly, a very different pattern of strain subgrouping occurs at the extreme RHS of the genome, whereby two alternative highly diverged P and M subtypes or alleles of the ORF-K15 protein are essentially unlinked to the LHS subtype patterns. Thirdly, a combination of the presence of distinctive alleles of other adjacent genes linked to the P and M forms, together with evidence for simple recombination between some major subtypes, complicates the analysis of strain assignments in as many as 30% of all HHV8 genomes.
Origins of the ORF75 and T0.7 M-linked alleles.
Our original interpretation (38) that the ORF75 sequence pattern present in both the ASM72 and HBL6/BC1 genomes represented the prototype subtype C pattern has had to be revised. Instead, this pattern clearly represents the M-linked allele of ORF75, which was present in 14 of the 16 examined genomes that carry the M subtype of the ORF-K15 gene; these include genomes with C2, C3, C5, A1, and A2 variant ORF-K1 genes. Furthermore, examination of a number of ORF-K15(P) allele genomes having subtype C ORF-K1 genes revealed that the presumed true C pattern for ORF75-E cannot be distinguished from that of most subtype A genomes, although the B and D patterns are distinctive. The nucleotide divergence of the subtype P[D] and P[B] ORF75-E loci from P[A/C] is 0.7% and 0.8%, respectively, whereas the M allele ORF75-E pattern differs from P[A/C] by 1.4% and from P[B] by 1.0%. It is therefore tempting to argue that the M allele of ORF75 represents an older form that diverged from the P prototype before modern HHV8 genomes split into the A, B, C, and D subgroups. Similarly, at the T0.7 locus, all remaining M patterns (10 of the 18) are linked to both ORF75(M) and ORF-K15(M) genes, and again the M form is at least as far diverged from the P[A/C] and P[D] patterns as is the P[B] pattern.
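Assuming the percentages above are uncorrected pairwise distances (the proportion of aligned nucleotide sites that differ), the calculation, and the comparison used to argue that the M allele diverged earlier, might look like the sketch below. The function name is hypothetical, the toy sequences are not ORF75-E data, and the dictionary simply restates the values reported in this paragraph.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ, skipping columns with a gap ("-") in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


print(p_distance("ACGTACGTAC", "ACGTACGTTC"))  # 0.1 (1 difference in 10 sites)

# Reported ORF75-E divergences (percent) from this paragraph.
reported = {
    ("P[D]", "P[A/C]"): 0.7,
    ("P[B]", "P[A/C]"): 0.8,
    ("M", "P[A/C]"): 1.4,
    ("M", "P[B]"): 1.0,
}
deepest_p_split = max(reported[("P[D]", "P[A/C]")], reported[("P[B]", "P[A/C]")])
# The M pattern is farther from P[A/C] than either P subtype is, consistent with an older split.
print(reported[("M", "P[A/C]")] > deepest_p_split)  # True
```

For loci this short, one or two nucleotide differences shift these percentages noticeably, so the comparison is suggestive rather than conclusive.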
There are two plausible explanations for these findings. First, the acquisition of the ORF-K15(M) gene from some exotic but related primate HHV8-like virus may have increased the rate of evolutionary divergence of the adjacent genes in a prototype chimeric virus. Second, the exotically acquired ORF-K15(M) gene may originally have been associated with an evolutionarily older and presumably intact M subtype of the HHV8 genome rather than with the more modern P subtype. Either way, we see virtually no sequence variation at all within the ORF-K15(M) or ORF75(M) genes among the 18 genomes identified so far that carry the M alleles. Therefore, a single relatively recent event must subsequently have introduced this apparent relic of the RHS of an ancient M subtype virus into a modern P subtype human virus. This event must have occurred at some point after the subtype C lineage diverged from the subtype A lineage, but early enough that some descendant chimeric genomes could spread into both Europe and Asia, as well as into the Middle East. We therefore suggest that the M alleles of ORF75 and T0.7 probably represent part of a novel virus from an intermediate evolutionary stage of the modern human lineage, dating back to between 100,000 and 300,000 years ago, and that the subsequent recombination event that introduced it into a modern human P subtype genome occurred in the Middle East, perhaps between 25,000 and 35,000 years ago. Clearly, many of the chimeric viruses found represent even more recent transfers into other subtype A, B, and C genomes by further, relatively rare homologous recombination events. Importantly, no M forms have been detected among African or Pacific samples, and the survival and apparent spread of this chimeric form of the virus within many classical KS patients, as well as AIDS patients, may imply some form of selective advantage for the M allele of ORF-K15 over the P allele.
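The text does not state how the 100,000 to 300,000-year range was derived. Purely to illustrate how such a figure could be bracketed, a strict molecular clock allows a calibration date to be scaled by a ratio of divergences. The sketch below assumes (i) equal substitution rates in all lineages, (ii) no saturation at these low divergences, and (iii) that the 0.7% P[D]-versus-P[A/C] ORF75-E distance corresponds to the roughly 60,000-year D split discussed earlier; none of these assumptions is established by the data presented, and the helper is hypothetical.

```python
def clock_scaled_age(d_target: float, d_calibration: float, t_calibration: float) -> float:
    """Scale a calibration date by a divergence ratio under a strict molecular clock.

    Assumes divergence grows linearly with time at one shared rate, so
    t_target / t_calibration == d_target / d_calibration.
    """
    if d_calibration <= 0:
        raise ValueError("calibration divergence must be positive")
    return t_calibration * d_target / d_calibration


# ORF75-E distances (percent) from the text; the calibration pairing is an assumption.
age_m_split = clock_scaled_age(d_target=1.4, d_calibration=0.7, t_calibration=60_000)
print(round(age_m_split))  # 120000
```

Under these assumptions the M/P split would fall near 120,000 years ago, inside the stated range. Because a single nucleotide difference changes the ratio substantially at these divergence levels, such an estimate is very rough.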
Specific features of M subtype recombinant genomes.
The viruses found in all seven of the classic and AIDS-associated KS lesions examined from patients of Northern Chinese heritage in Taiwan (whose ancestors arrived in Taiwan no more than 400 years ago) represent a particularly interesting and complex clade. Although these seven genomes all differ from one another in the VR* loop and other variable loci within ORF-K1 (37), they all contain a distinctive subset of C3′ ORF-K1 genes and a unique C2′ variant of the ORF26 gene, together with an apparently invariant M allele of the ORF-K15 gene and linked, almost invariant M subtypes of T0.7 and ORF75 (Fig. 9 and 10). Outside this clade, only 11 of the 56 genomes that we have examined from elsewhere around the world contained the ORF-K15(M) type of RHS. In contrast, all five of the classic KS patients of Hwalien heritage who have been tested (whose ancestors are believed to have arrived in Taiwan from the Pacific Islands about 4,000 years ago) had D1/P subtype HHV8 genomes (33a, 37).
Irrespective of the origin of the linked M alleles of ORF75 and T0.7, all of the genomes examined that contain these genes are clearly chimeras that retain only RHS segments from the unknown exotic source virus (Fig. 10). The largest such fragment extends leftward beyond T0.7 at map position 0.85 but certainly not as far as ORF26 at 0.35. We have identified four distinct sets of genomes of this type that all have subtype C ORF-K1 genes, namely, (i) ASM72 and BKS13 (both C1/M), (ii) SKS9 (C5/M), (iii) TKS11 (C2/M), and (iv) all seven examples of the Taiwan TKS1-to-TKS9 clade (C3′/C2′/M). A smaller segment (not including the M allele of T0.7) was then presumably transferred secondarily to A2 and A1 variants exemplified by HBL6/BC1 and BKS16. Since the only other United States sample found to contain M sequences (BKS12) also has an A2-like T0.7 gene within a C3/A2/M chimeric genome, we presume that it also represents a secondary transfer recombination event. Finally, both the unusual Saudi Arabian C4/A3/M genome in SKS3 and the two Florida B/A/M recombinants (OKS7 and OKS8) also appear to represent alternative and less extensive secondary transfer lineages (Fig. 10). The two Florida AIDS KS samples came from patients of black Haitian and Hispanic Mexican ethnicity, suggesting that they may have arisen through interactions between African and Hispanic cultures in Central America within the past few hundred years. Overall, in addition to the two Florida cases, we have identified only three other subtype B ORF-K1 genomes outside of Africa. Two were from African emigrants, one a classic KS case in Baltimore (JKS20) and the other an AIDS-associated case in New Zealand (ZKS6), whereas the third was from an African American with AIDS in Baltimore (JKS15); none of these were M recombinants.
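The argument about segment sizes in this paragraph amounts to bracketing the left-hand end of the M-linked block between the last locus with a non-M pattern and the first locus with an M-linked pattern. A minimal sketch follows, assuming the map positions given earlier in the text (ORF26 at 0.35, T0.7 at 0.85, ORF75-E at 0.96) and placing ORF-K1 and ORF-K15 at nominal positions 0.0 and 1.0 as approximations for the extreme LHS and RHS; the helper name is hypothetical.

```python
from typing import List, Optional, Tuple


def bracket_m_block(loci: List[Tuple[str, float, bool]]) -> Optional[Tuple[Optional[float], float]]:
    """Bracket the LHS end of a contiguous RHS block of M-linked loci (hypothetical helper).

    loci: (name, map_position, is_m_linked) tuples sorted by map position.
    Returns (position of last non-M locus or None, position of first M locus),
    or None if no locus is M-linked.
    """
    first_m = next((i for i, (_, _, is_m) in enumerate(loci) if is_m), None)
    if first_m is None:
        return None
    if not all(is_m for _, _, is_m in loci[first_m:]):
        raise ValueError("M-linked loci do not form one contiguous RHS block")
    outer = loci[first_m - 1][1] if first_m > 0 else None
    return outer, loci[first_m][1]


# Largest M segment described above: M-linked at T0.7 and beyond, but not at ORF26.
large = [("ORF-K1", 0.0, False), ("ORF26", 0.35, False), ("T0.7", 0.85, True),
         ("ORF75-E", 0.96, True), ("ORF-K15", 1.0, True)]
# Smaller secondary-transfer segment: T0.7 is not M-linked.
small = [("ORF-K1", 0.0, False), ("ORF26", 0.35, False), ("T0.7", 0.85, False),
         ("ORF75-E", 0.96, True), ("ORF-K15", 1.0, True)]
print(bracket_m_block(large))  # (0.35, 0.85)
print(bracket_m_block(small))  # (0.85, 0.96)
```

The interval only narrows as more loci are typed between the two bracketing positions, which is why a within-gene crossover such as the one mapped in ORF75 for OKS7 and OKS8 requires sequence-level analysis.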
Lack of divergence between subtypes A and C at the RHS is paralleled within the constant regions of ORF-K1.
Our interpretation that all subtype A and C genomes with P allele RHS patterns are almost invariant within their T0.7, ORF75-E, and ORF-K15 genes was initially rather surprising, considering that the A and C subtypes of ORF-K1 at the LHS of the genome differ by as much as 14% at the amino acid level. However, a re-evaluation of the distribution of sequence variation across relatively well-conserved segments of the ORF-K1 membrane signaling protein supports the notion that they are, in fact, much more closely related than the overall level of variation indicates. As we pointed out previously (37), there are two distinct types of variation in ORF-K1: first, the relatively consistent and characteristic intertypic variations and, second, the seemingly random intratypic hypervariability displayed both between and within the different variants and clades of each subtype. Both levels of divergence appear to result from powerful biological selection pressures, but the latter is concentrated mostly within the two 40-amino-acid blocks (VR1 and VR2) within the extracellular domains and especially within the 23-amino-acid Cys-bridged VR* loop (37). In fact, most of the variation between subtypes A and C of ORF-K1 occurs within these two hypervariable domains. In contrast, when only the four most conserved regions (amino acids 1 to 36, 105 to 147, 173 to 199, and 229 to 289, totaling nearly 60% of the protein) are considered, the prototype A and C subtypes differ by only 2.5% (six amino acid changes), whereas A and D still differ by 19% (32 of 164) and A and B differ by 27% (47 of 164). Therefore, we conclude that the hypervariability displayed within and between the A and C subtypes and variants predominantly represents a much more recent, ongoing intratypic process (one that has occurred even within the Taiwan C3′ clade, for example).
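The comparison restricted to conserved regions can be sketched as counting amino acid differences only inside specified blocks of an aligned protein pair. In this hypothetical Python helper, block coordinates are 1-based and inclusive and are assumed to index alignment columns directly; with gapped alignments, residue numbers of the prototype protein would first have to be mapped onto alignment columns. The toy sequences are not ORF-K1 data.

```python
from typing import Iterable, Tuple

# Conserved ORF-K1 blocks named in the text (1-based, inclusive).
CONSERVED_BLOCKS = [(1, 36), (105, 147), (173, 199), (229, 289)]


def block_divergence(seq1: str, seq2: str, blocks: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (differences, positions compared) within the given blocks of two aligned sequences.

    Columns with a gap ("-") in either sequence are skipped.
    """
    diffs = compared = 0
    for start, end in blocks:
        for a, b in zip(seq1[start - 1:end], seq2[start - 1:end]):
            if a == "-" or b == "-":
                continue
            compared += 1
            diffs += int(a != b)
    return diffs, compared


# Toy example: the difference at position 5 lies outside both blocks and is ignored.
diffs, compared = block_divergence("MKLVAAGT", "MKIVCAGS", [(1, 4), (7, 8)])
print(diffs, compared, f"{100 * diffs / compared:.1f}%")  # 2 6 33.3%
```

With real aligned ORF-K1 sequences, CONSERVED_BLOCKS would be passed as the blocks argument.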
In contrast, the specific variations common to all members of a subtype obviously provide a more accurate reflection of older evolutionary divergence. In this sense, even on the LHS, the much greater similarity between the constant regions of the subtype A and C proteins, compared with the subtype B and D proteins, closely matches the patterns observed at the RHS. A similar situation applies to the D1 and D2 subtypes, which are virtually identical at the ORF26, T0.7, ORF75, and UPS75 loci despite showing 21% overall amino acid divergence from each other in ORF-K1.
Nature and origin of the two types of ORF-K15 genes.
The M and P subtypes (or alleles) of the HHV8 ORF-K15 integral membrane protein genes, although they share only 33% amino acid identity, have practically identical intron-exon splicing patterns in latent-state mRNA, and they show considerable overall structural and exon pattern resemblance (but no detectable amino acid sequence homology) to the LMP2 latency protein of EBV. All three proteins have 12 similarly spaced hydrophobic TM domains; in fact, our initial prediction of the complete ORF-K15 protein structure, which has been confirmed by RT-PCR analysis, depended partly upon the presumption of a linear splicing pattern closely parallel to that of LMP2 (17). However, the ORF-K15 proteins contain two conserved likely SH2-binding tyrosine kinase signaling motifs and other proline-rich SH3-like binding domains within a large C-terminal cytoplasmic domain, whereas LMP2 has functional ITAM, SH2, and SH3 binding domains in an N-terminal cytoplasmic domain encoded on the other side of the TTRs (Fig. 4). Nevertheless, given their equivalent genomic positions and orientation, as well as the similar sizes and splicing patterns of exons 2 to 7, there appears to be little doubt that they are evolutionarily related genes.
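Hydrophobic TM segments of the kind counted here are conventionally located with a sliding-window hydropathy plot. The sketch below uses the Kyte-Doolittle scale with the commonly cited 19-residue window and 1.6 threshold; it is a generic illustration, not the prediction method used for ORF-K15, which relied partly on splicing parallels with LMP2. Function names are hypothetical, only the 20 standard amino acids are handled, and merging overlapping windows tends to overextend segment ends.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of each full window; element i covers residues i+1..i+window (1-based)."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def predict_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge above-threshold windows into 1-based, inclusive candidate TM segments."""
    segments: List[Tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score <= threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)  # overlaps or abuts the previous segment
        else:
            segments.append((start, end))
    return segments


# Toy sequence: one 22-residue leucine stretch flanked by lysines.
print(predict_tm_segments("K" * 10 + "L" * 22 + "K" * 10))  # [(6, 37)]
```

Counting the segments returned for a full-length protein would give a rough TM tally to compare across related proteins; closely spaced TM helices can merge into one segment with this simple approach.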
There is no ORF-K15-like gene in HVS; however, several HHV8-like viruses have been described in Old World primates (7, 30) and the complete sequence of one of these, referred to as rhesus rhadinovirus (RRV), has recently been presented (32). Although the presence or absence of an ORF-K15-like gene in RRV was not addressed by those authors, one can predict from the sequence at the extreme RHS that RRV does contain an equivalent highly spliced gene encoding a protein with a structure similar to that of HHV8 ORF-K15 but without any significant residual amino acid homology. Therefore, we conclude that even the exotic ORF-K15(M) gene must have originated within an HHV8-like gammaherpesvirus. It seems probable that the 33% amino acid identity between ORF-K15(M) and ORF-K15(P) represents a level of evolutionary divergence that is consistent with an origin for one of them (the M allele form) by recombination with a related virus from a great ape or other Old World primate species. In comparison, the LMP2A genes of human EBV and the equivalent gamma-1 class virus from baboons (herpesvirus papio) still display 50% amino acid identity (10). However, the possibility that there was once more than one HHV8-like virus with this level of divergence within the human lineage cannot be ruled out.
The origins of the two very distinctive subtype A and B EBNA2 genes in EBV (65% amino acid identity, compared with 55% identity between EBV and herpesvirus papio) (6) and of the three even further diverged alternative STP-like genes within herpesvirus saimiri (19, 21) present similar problems. Do these alleles all represent divergent forms of key genes that suddenly evolved very rapidly, with all intermediate forms being lost as each subtype of the virus occupied a new biological niche? Or do they instead (as suggested here) represent relics of older forms of the virus or of related viral species that may now persist only as small pieces of their original genomes by virtue of rare recombination events with more modern forms? Our DNA sequence data from multiple samples of both the P and M subtypes of HHV8 show that neither of the two ORF-K15 alleles displays any more nucleotide variation than do the adjacent conserved genes of this virus, which appears to strongly support the recombination model. On the other hand, the hypervariability in ORF-K1 clearly represents an example of a gene that is currently undergoing very rapid evolution within the modern form of HHV8. The equivalent ORF-R1 gene in RRV has also diverged so far that it retains very little homology to ORF-K1 of HHV8 (32). While we favor the interpretation that a small number of very rare recombination events gave rise to all of the chimeric P and M class genomes described here, as well as to most of the intratypic and intertypic chimeras detected, the possibility that more-complex mosaics, such as those seen in many of the subtype B genomes, represent coordinated evolutionary drift at multiple loci cannot be ruled out.
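The comparison underlying the recombination argument (that neither ORF-K15 allele is more variable than its conserved neighbors) can be framed as comparing nucleotide diversity, the mean pairwise proportion of differing sites, for each locus across the same set of genomes. A minimal sketch follows; the function name and the toy alignments are hypothetical, sequences are assumed to be aligned without gaps, and no correction for multiple substitutions is applied.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise proportion of differing sites (pi) across equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s1, s2)) / length for s1, s2 in pairs) / len(pairs)


# Hypothetical alignments for one allele and an adjacent conserved gene in the same genomes.
allele_locus = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTTC"]
adjacent_gene = ["TTGACCGGAA", "TTGACCGGTA", "TTGTCCGGAA"]
print(nucleotide_diversity(allele_locus) <= nucleotide_diversity(adjacent_gene))  # True
```

Under the recombination model, each allele would be expected to show diversity no higher than that of the flanking genes, whereas a rapidly evolving gene such as ORF-K1 would stand out with much higher values.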
ADDENDUM
We are aware that A. Davison (6a) has made similar independent predictions about the ORF-K15 structure, with supporting evidence from cDNA analysis carried out by Glenn et al. (12a).
ACKNOWLEDGMENTS
These studies were funded by Public Service grant R01 CA73585 to G.S.H. from the National Cancer Institute, N.I.H. L.J.P. was supported by a graduate student stipend from the Biochemistry, Cell and Molecular Biology Training Program at the Johns Hopkins School of Medicine (2T32-GM07445).
We thank Margit Lucskay for technical assistance and Sarah Heaggans for help with preparation of the manuscript. We thank our colleagues K. Foreman, B. Nickoloff, and S. Alkan (SKS series); P. Browning, P. Rady, and S. Tyring (BKS); K. Powell and M. Croxson (ZKS); I.-J. Su (TKS); and C. Rabkin (RKS), as well as A. Blauvelt and the AIDS Malignancy Bank (OKS) for providing various KS DNA or frozen and paraffin block samples.
REFERENCES
- 1. Alagiozoglou L, Zong J C, Hayward G S. Unpublished data.
- 1a. Beral V, Peterman T A, Berkelman R C, Jaffe H W. Kaposi’s sarcoma among persons with AIDS: a sexually transmitted infection? Lancet. 1990;335:123–128. doi: 10.1016/0140-6736(90)90001-l.
- 2. Cannon J S, Hawkins A C, Griffin C A, Tao Q, Borowitz M, Hayward G S, Ambinder R F. Characterization of a new EBV+/HHV8+ primary effusion lymphoma-derived cell line. J Infect Dis. In press.
- 3. Cesarman E, Moore P S, Rao P H, Inghirami G, Knowles D M, Chang Y. In vitro establishment and characterization of two AIDS-related lymphoma cell lines containing Kaposi’s sarcoma-associated herpesvirus-like (KSHV) DNA sequences. Blood. 1995;86:2708–2714.
- 4. Cesarman E, Nador R G, Bai F, Bohenzky R A, Russo J J, Moore P S, Chang Y, Knowles D M. Kaposi’s sarcoma-associated herpesvirus contains G protein-coupled receptor and cyclin D homologs which are expressed in Kaposi’s sarcoma and malignant lymphoma. J Virol. 1996;70:8218–8223. doi: 10.1128/jvi.70.11.8218-8223.1996.
- 5. Chang Y, Cesarman E, Pessin M S, Lee F, Culpepper J, Knowles D M, Moore P S. Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi’s sarcoma. Science. 1994;266:1865–1869. doi: 10.1126/science.7997879.
- 6. Dambaugh T, Hennessey K, Chamnankit L, Kieff E. U2 region of Epstein-Barr virus DNA may encode Epstein-Barr virus nuclear antigen 2. Proc Natl Acad Sci USA. 1984;81:7632–7636. doi: 10.1073/pnas.81.23.7632.
- 6a. Davison A, et al. Personal communication.
- 7. Desrosiers R C, Sasseville V G, Czajak S C, Zhang X, Lackner A A, Jung J U. A herpesvirus of rhesus monkeys related to the human Kaposi’s sarcoma-associated herpesvirus. J Virol. 1997;71:9764–9769. doi: 10.1128/jvi.71.12.9764-9769.1997.
- 8. Di Alberti L, Piattelli A, Artese L, Favia G, Patel S, Saunders N, Porter S R, Scully C M, Ngui S-L, Teo C-G. Human herpesvirus 8 variants in sarcoid tissues. Lancet. 1997;350:1655–1659. doi: 10.1016/s0140-6736(97)10102-7.
- 9. Foreman K E, Alkan S, Krueger A E, Panella J R, Swinnen L J, Nickoloff B J. Geographically distinct HHV-8 DNA sequences in Saudi Arabian iatrogenic Kaposi’s sarcoma lesions. Am J Pathol. 1998;153:1001–1004. doi: 10.1016/S0002-9440(10)65642-8.
- 10. Franken M, Annis B, Ali A N, Wang F. 5′ coding and regulatory region sequence divergence with conserved function of the Epstein-Barr virus LMP2A homolog in herpesvirus papio. J Virol. 1995;69:8011–8019. doi: 10.1128/jvi.69.12.8011-8019.1995.
- 11. Gao S-J, Kingsley L, Hoover D R, Spira T J, Rinaldo C R, Saah A, Phair J, Detels R, Parry P, Chang Y, Moore P S. Seroconversion to antibodies against Kaposi’s sarcoma-associated herpesvirus-related latent nuclear antigens before the development of Kaposi’s sarcoma. N Engl J Med. 1996;335:233–241. doi: 10.1056/NEJM199607253350403.
- 12. Gao S-J, Kingsley L, Li M, Zheng W, Parravicini C, Ziegler J, Newton R, Rinaldo C R, Saah A, Phair J, Detels R, Chang Y, Moore P S. KSHV antibodies among Americans, Italians and Ugandans with and without Kaposi’s sarcoma. Nat Med. 1996;2:925–928. doi: 10.1038/nm0896-925.
- 12a. Glenn M, Rainbow L, Auradé F, Davison A, Schulz T F. Identification of a spliced gene from Kaposi’s sarcoma-associated herpesvirus encoding a protein with similarities to latent membrane proteins 1 and 2A of Epstein-Barr virus. J Virol. 1999;73:6953–6963. doi: 10.1128/jvi.73.8.6953-6963.1999.
- 13. Harwood A R, Osoba D, Hofstader S L, Goldstein M B, Cardella C, Holecek M J, Kunynetz R, Giammarco R A. Kaposi’s sarcoma in recipients of renal transplants. Am J Med. 1979;67:759–765. doi: 10.1016/0002-9343(79)90731-9.
- 14. Huang Y Q, Li J J, Kaplan M H, Poiesz B, Katabira E, Zhang W C, Feiner D, Friedman-Kien A E. Human herpesvirus-like nucleic acid in various forms of Kaposi’s sarcoma. Lancet. 1995;345:759–761. doi: 10.1016/s0140-6736(95)90641-x.
- 15. Kasolo F C, Monze M, Obel N, Anderson R A, French C, Gompels U A. Sequence analyses of human herpesvirus-8 strains from both African human immunodeficiency virus-negative and -positive childhood endemic Kaposi’s sarcomas show a close relationship with strains identified in febrile children and high variation in the K1 glycoprotein. J Gen Virol. 1998;79:3055–3065. doi: 10.1099/0022-1317-79-12-3055.
- 16. Kedes D H, Operskalski E, Busch M, Kohn R, Flood J, Ganem D. The seroepidemiology of human herpesvirus 8 (Kaposi’s sarcoma-associated herpesvirus): distribution of infection in KS risk groups and evidence for sexual transmission. Nat Med. 1996;2:918–924. doi: 10.1038/nm0896-918.
- 17. Laux G, Perricaudet M, Farrell P J. A spliced Epstein-Barr virus gene expressed in immortalized lymphocytes is created by circularization of the linear viral genome. EMBO J. 1988;7:769–774. doi: 10.1002/j.1460-2075.1988.tb02874.x.
- 18. Lee H, Guo J, Li M, Choi J-K, DeMaria M, Rosenzweig M, Jung J U. Identification of an immunoreceptor tyrosine-based activation motif of K1 transforming protein of Kaposi’s sarcoma-associated herpesvirus. Mol Cell Biol. 1998;18:5219–5228. doi: 10.1128/mcb.18.9.5219.
- 19. Lee H, Trimble J J, Yoon D-W, Regier D, Desrosiers R C, Jung J U. Genetic variation of herpesvirus saimiri subgroup A transforming protein and its association with cellular src. J Virol. 1997;71:3817–3825. doi: 10.1128/jvi.71.5.3817-3825.1997.
- 20. Lee H, Veazey R, Williams K, Li M, Guo J, Neipel F, Fleckenstein B, Lackner A, Desrosiers R C, Jung J U. Deregulation of cell growth by the K1 gene of Kaposi’s sarcoma-associated herpesvirus. Nat Med. 1998;4:435–440. doi: 10.1038/nm0498-435.
- 21. Medveczky P, Szomolanyi E, Desrosiers R C, Mulder C. Classification of herpesvirus saimiri into three groups based on extreme variation in a DNA region required for oncogenicity. J Virol. 1984;52:938–944. doi: 10.1128/jvi.52.3.938-944.1984.
- 22. Moore P S, Boshoff C, Weiss R A, Chang Y. Molecular mimicry of human cytokine and cytokine response pathway genes by KSHV. Science. 1996;274:1739–1744. doi: 10.1126/science.274.5293.1739.
- 23. Moore P S, Gao S-J, Dominguez G, Cesarman E, Lungu O, Knowles D, Garber R, Pellett P E, McGeoch D J, Chang Y. Primary characterization of a herpesvirus agent associated with Kaposi’s sarcoma. J Virol. 1996;70:549–558. doi: 10.1128/jvi.70.1.549-558.1996.
- 24. Neipel F, Albrecht J-C, Fleckenstein B. Cell-homologous genes in the Kaposi’s sarcoma-associated rhadinovirus human herpesvirus 8: determinants of its pathogenicity? J Virol. 1997;71:4187–4192. doi: 10.1128/jvi.71.6.4187-4192.1997.
- 25. Nicholas J, Ruvolo V, Zong J, Ciufo D, Guo H-G, Reitz M S, Hayward G S. A single 13-kilobase divergent locus in Kaposi sarcoma-associated herpesvirus (human herpesvirus 8) genome contains nine open reading frames that are homologous to or related to cellular proteins. J Virol. 1997;71:1963–1974. doi: 10.1128/jvi.71.3.1963-1974.1997.
- 26. Nicholas J, Ruvolo V R, Burns W H, Sandford G, Wan X, Ciufo D, Hendrickson S B, Guo H-G, Hayward G S, Reitz M S. Kaposi’s sarcoma-associated human herpesvirus-8 encodes homologues of macrophage inflammatory protein-1 and interleukin-6. Nat Med. 1997;3:287–292. doi: 10.1038/nm0397-287.
- 27. Nicholas J, Zong J-C, Alcendor D J, Ciufo D M, Poole L J, Sarisky R T, Chiou C J, Zhang X, Wan X, Guo H-G, Reitz M S, Hayward G S. Novel organizational features, captured cellular genes and strain variability within the genome of KSHV/HHV8. J Natl Cancer Inst Monogr. 1998;23:79–88. doi: 10.1093/oxfordjournals.jncimonographs.a024179.
- 28. Poole L J, Ciufo D M, Chandran B, Hayward G S. Identification and expression of the immunoglobulin receptor-like glycoprotein encoded by the hypervariable ORF-K1 transforming gene of KSHV/HHV8. Submitted for publication.
- 29. Qunibi W Y, Barri Y, Alfurayh O, Almeshari K, Khan B, Taher S, Sheth K. Kaposi’s sarcoma in renal transplant recipients: a report on 26 cases from a single institution. Transplant Proc. 1993;25:1402–1405.
- 30. Rose T M, Strand K B, Schultz E R, Schaefer G, Rankin J G W, Thouless M E, Tsai C-C, Bosch M L. Identification of two homologs of the Kaposi’s sarcoma-associated herpesvirus (human herpesvirus 8) in retroperitoneal fibromatosis of different macaque species. J Virol. 1997;71:4138–4144. doi: 10.1128/jvi.71.5.4138-4144.1997.
- 31. Russo J J, Bohenzky R A, Chien M-C, Chen J, Yan M, Maddalena D, Parry J P, Peruzzi D, Edelman I S, Chang Y, Moore P S. Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8). Proc Natl Acad Sci USA. 1996;93:14862–14867. doi: 10.1073/pnas.93.25.14862.
- 32. Searles R P, Bergquam E P, Axthelm M K, Wong S W. Sequence and genomic analysis of rhesus macaque rhadinovirus with similarity to Kaposi’s sarcoma-associated herpesvirus/human herpesvirus 8. J Virol. 1999;73:3040–3053. doi: 10.1128/jvi.73.4.3040-3053.1999.
- 33. Staskus K A, Zhong W, Gebhard K, Herndier B, Wang H, Renne R, Beneke J, Pudney J, Anderson D J, Ganem D, Haase A T. Kaposi’s sarcoma-associated herpesvirus gene expression in endothelial (spindle) tumor cells. J Virol. 1997;71:715–719. doi: 10.1128/jvi.71.1.715-719.1997.
- 33a. Su I-J. Personal communication.
- 34. Sun R, Lin S-F, Gradoville L, Miller G. Polyadenylated nuclear RNA encoded by Kaposi sarcoma-associated herpesvirus. Proc Natl Acad Sci USA. 1996;93:11883–11888. doi: 10.1073/pnas.93.21.11883.
- 35. Whitby D, Howard M R, Tenant-Flowers M, Brink N S, Copas A, Boshoff C, Hatzioannou T, Suggett R E, Aldam D M, Denton A S, et al. Detection of Kaposi sarcoma associated herpesvirus in peripheral blood of HIV-infected individuals and progression to Kaposi’s sarcoma. Lancet. 1995;346:799–802. doi: 10.1016/s0140-6736(95)91619-9.
- 36. Zhong W, Wang H, Herndier B, Ganem D. Restricted expression of Kaposi sarcoma-associated herpesvirus (human herpesvirus 8) genes in Kaposi sarcoma. Proc Natl Acad Sci USA. 1996;93:6641–6646. doi: 10.1073/pnas.93.13.6641.
- 37. Zong J-C, Ciufo D M, Alcendor D J, Wan X, Nicholas J, Browning P J, Rady P L, Tyring S K, Orenstein J M, Rabkin C S, Su I-J, Powell K F, Croxson M, Foreman K E, Nickoloff B J, Alkan S, Hayward G S. High-level variability in the ORF-K1 membrane protein gene at the left end of the Kaposi’s sarcoma-associated herpesvirus genome defines four major virus subtypes and multiple variants or clades in different human populations. J Virol. 1999;73:4156–4170. doi: 10.1128/jvi.73.5.4156-4170.1999.
- 38. Zong J-C, Metroka C, Reitz M S, Nicholas J, Hayward G S. Strain variability among Kaposi sarcoma-associated herpesvirus (human herpesvirus 8) genomes: evidence that a large cohort of United States AIDS patients may have been infected by a single common isolate. J Virol. 1997;71:2505–2511. doi: 10.1128/jvi.71.3.2505-2511.1997.