ABSTRACT
Undifferentiated nasopharyngeal carcinoma (NPC) has a 100% association with Epstein-Barr virus (EBV). However, only three EBV genomes isolated from NPC patients have been sequenced to date, and the role of EBV genomic variations in the pathogenesis of NPC is unclear. We sought to obtain the sequences of EBV genomes in multiple NPC biopsy specimens in the same geographic location in order to reveal their sequence diversity. Three published EBV (B95-8, C666-1, and HKNPC1) genomes were first resequenced using the sequencing workflow of target enrichment of EBV DNA by hybridization, followed by next-generation sequencing, de novo assembly, and joining of contigs by Sanger sequencing. The sequences of eight NPC biopsy specimen-derived EBV (NPC-EBV) genomes, designated HKNPC2 to HKNPC9, were then determined. They harbored 1,736 variations in total, including 1,601 substitutions, 64 insertions, and 71 deletions, compared to the reference EBV. Furthermore, genes encoding latent, early lytic, and tegument proteins and glycoproteins were found to contain nonsynonymous mutations of potential biological significance. Phylogenetic analysis showed that the HKNPC6 and -7 genomes, which were isolated from tumor biopsy specimens of advanced metastatic NPC cases, were distinct from the other six NPC-EBV genomes, suggesting the presence of at least two parental lineages of EBV among the NPC-EBV genomes. In conclusion, much greater sequence diversity among EBV isolates derived from NPC biopsy specimens is demonstrated on a whole-genome level through a complete sequencing workflow. Large-scale sequencing and comparison of EBV genomes isolated from NPC and normal subjects should be performed to assess whether EBV genomic variations contribute to NPC pathogenesis.
IMPORTANCE This study established a sequencing workflow from EBV DNA capture and sequencing to de novo assembly and contig joining. We reported eight newly sequenced EBV genomes isolated from primary NPC biopsy specimens and revealed the sequence diversity on a whole-genome level among these EBV isolates. At least two lineages of EBV strains are observed, and recombination among these lineages is inferred. Our study has demonstrated the value of, and provided a platform for, genome sequencing of EBV.
INTRODUCTION
The incidence rate of undifferentiated nasopharyngeal carcinoma (NPC) is exceptionally high in the southern part of China, and this type of carcinoma is 100% associated with Epstein-Barr virus (EBV) (1). To investigate the role of EBV genomic variation in the pathogenesis of NPC, EBV strains have been characterized in NPC by genotyping polymorphic markers in the EBER1 and -2, LMP1, BHRF1, BZLF1, and EBNA1 gene loci in tumor samples obtained from China, southern Asia, and northern Africa (2–7). An association of the LMP1 deletion variant Asp335 with NPC in Hong Kong was reported (8). Specific EBNA1 (V-val) and LMP1 subtypes (China 1) also showed preferential occurrence in NPC biopsy specimens (9, 10). However, genetic variations in the small subsets of genes investigated were not sufficient to assess the geographical distribution of EBV variants and their precise association with diseases. Whole-genome sequencing and genome-wide comparison of variations found in EBV genomes isolated from diseased and normal subjects are needed to determine the role of EBV genomic variations in the pathogenesis of diseases.
The EBV genomes reported to date include B95-8, AG876, Akata, Mutu, GD1, GD2, HKNPC1, C666-1, K4413-Mi, and K4123-Mi. The prototypic type 1 EBV strain B95-8 was the first complete viral genome sequenced. It was established by infecting marmoset B cells with EBV from the 883L cell line, which was obtained by culturing lymphocytes from an individual with infectious mononucleosis (11). The DNA sequence was analyzed by constructing M13 subclone libraries from suitable EcoRI and BamHI fragments, followed by random sequencing using the dideoxynucleotide method (12). The B95-8 genome has been extensively mapped for transcripts, promoters, open reading frames, and other structural elements by means of Northern blotting and other methods (13, 14). A more representative type 1 EBV reference genome (GenBank accession number NC_007605) was constructed by using B95-8 as the backbone, while the 11-kb deleted segment was provided by the Raji sequences (15).
AG876 originated from a Ghanaian case of Burkitt's lymphoma and is the first and only complete type 2 EBV sequence available to date (16). Sequence analysis was performed by Sau3AI digestion, cosmid cloning, and dideoxynucleotide sequencing. Whole-genome comparison of type 1 and type 2 EBV, made possible by the determination of the AG876 sequence, validated that the two major types of EBV are generally very similar outside the known divergent regions at the EBNA2 and EBNA3 genes.
Akata and Mutu are African Burkitt's lymphoma cell lines that are commonly used model cell lines. Their EBV genomes were sequenced by next-generation sequencing and constructed by de novo assembly (17). C666-1 is a subclone of C666, an epithelial cell line derived from an NPC xenograft of southern Chinese origin (18). C666-1 is unique among NPC cell lines in that it retains the native EBV, while other NPC-derived cell lines have lost their EBV through in vitro culture. It is therefore the most representative NPC line to date. A consensus EBV genomic sequence of C666-1 was recently constructed by reference mapping (19). Most recently, two more EBV genomes in immortalized human B lymphocyte cell lines were sequenced using the Illumina MiSeq platform (20). Sequencing reads from total DNA of the cell lines were mapped to the EBV reference genome, and the mappable reads were assembled to yield the two EBV genomes, K4413-Mi and K4123-Mi.
There are only three EBV genomes isolated from NPC patients: GD1, GD2, and HKNPC1. GD1 was isolated by infecting umbilical cord mononuclear cells with EBV from the saliva of an NPC patient (21). The EBV DNA was PCR amplified, subcloned, and sequenced by Sanger sequencing. Both GD2 and HKNPC1 were direct isolates from primary NPC biopsy specimens (22). GD2 was obtained as a small subset of sequence data from next-generation sequencing of the total DNA sequences derived from an NPC biopsy specimen (22). HKNPC1 was sequenced from a primary NPC biopsy specimen by next-generation sequencing after enrichment of EBV DNA by PCR amplification (23).
In this study, we sought to establish a complete sequencing workflow comprising target enrichment of EBV DNA by hybridization, followed by next-generation sequencing, de novo assembly, and joining of contigs by Sanger sequencing to yield whole EBV genomes. Three published EBV genomes—B95-8 (accession no. V01555), C666-1 (KC617875), and HKNPC1 (JQ009376)—were resequenced to validate the sequencing workflow. The sequences of eight NPC biopsy specimen-derived EBV (NPC-EBV) genomes, designated HKNPC2 to HKNPC9, were then determined. Heterogeneity, mutation, and phylogenetic analyses of the NPC-EBV genomes were subsequently performed to assess their genomic diversity.
MATERIALS AND METHODS
NPC patients.
The primary nasopharyngeal tumors were biopsied after obtaining informed written consent from patients diagnosed with NPC between 2008 and 2010, prior to the commencement of their treatment at Queen Mary Hospital, Hong Kong, China. Collection of the tumor biopsy specimens from the NPC patients was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster for the purpose of EBV genomic sequencing. The clinical information for the NPC patients included is summarized in Table 1. Fresh NPC tumor biopsy specimens were temporarily stored in phosphate-buffered saline with 1% fetal bovine serum (FBS), and DNA extraction was performed within 1 h after obtaining the biopsy specimen.
TABLE 1.
Clinical information for patients
Case | Age (yr) | TNM staging | Overall staging |
---|---|---|---|
HKNPC1 | 20 | T3N3aM0 | III |
HKNPC2 | 46 | T2bN2 | III |
HKNPC3 | 37 | T3N2 | III |
HKNPC4 | 54 | T4N3b | IVB |
HKNPC5 | 49 | T3N2 | III |
HKNPC6 | 38 | T2N3M1 | IVC |
HKNPC7 | 56 | T2N2M1 | IVC |
HKNPC8 | 60 | T4N2 | IVA |
HKNPC9 | 62 | T3N1 | III |
All of the patients were male, and the histology for all of the patients was undifferentiated NPC.
Cell lines.
Two cell lines, B95-8 (24) and C666-1 (18), were used. They were cultured in RPMI 1640 medium and 10% FBS before DNA extraction. All cultures were maintained at 37°C in 5% CO2 before harvest.
Sample DNA preparation.
The DNA of the NPC tumor biopsy specimens and the two cell lines was extracted by using a Qiagen blood and tissue kit according to the manufacturer's protocol (Qiagen, Hilden, Germany). A NanoDrop spectrophotometer (Thermo Scientific) and a Qubit dsDNA high-sensitivity assay kit (Life Technologies) were used to determine the concentration of the DNA samples. Nondegraded DNA with an A260/A280 ratio between 1.8 and 2.0 was used for the subsequent experiments.
Complete workflow for sequencing of EBV genomes.
The workflow from library preparation, target capture, next-generation sequencing, de novo assembly, and joining of contigs to subsequent analyses is described in detail below. A flow chart for the complete workflow used is shown in Fig. 1.
FIG 1.
Workflow of sequencing and analysis of EBV genomes. Input DNA extracted from primary NPC biopsy specimens or cell lines goes through library extraction, EBV DNA capture, index tagging, and next-generation sequencing to generate sequencing reads. Reads mappable to EBV go through filtering and trimming and are assembled de novo into contigs. The contigs are aligned to the reference to generate scaffolds and are joined by Sanger sequencing. Subsequent analyses are performed on the resulting EBV genomes.
Library preparation, target capture, and index tagging.
The input DNA amount for each sample was 2 μg. RNA bait was designed as overlapping 120-mers covering the EBV genome at fivefold coverage (25). DNA shearing, end repair, nontemplated addition of adenine nucleobase, adaptor ligation, hybridization, enrichment PCR, and all post-reaction cleanup steps were performed according to the SureSelect Illumina paired-end sequencing library protocol (version 1.3), observing all of the recommended quality control steps. The captured libraries were index tagged by PCR, normalized, and pooled at equal molar quantities to totals of 8 and 16 pM for MiSeq sequencing and GAIIx sequencing, respectively. Since the EBV-enriched sequences were expected to be of high GC content, 50% of the PhiX control was used as spike-in for MiSeq sequencing, and one full lane of PhiX control was dedicated for GAIIx sequencing to balance the base content.
Next-generation sequencing by using MiSeq personal sequencer and Genome Analyzer IIx.
To validate the sequencing workflow, the MiSeq personal sequencer was used to resequence the B95-8, C666-1, and HKNPC1 genomes, whose sequences had been published previously (12, 19, 23). Cluster generation and 150-bp paired-end sequencing reactions were performed in succession using the cartridge and the flow cell included in the MiSeq reagent kit (300 cycles; Illumina) according to the manufacturer's protocol. Genome Analyzer IIx was used to sequence the eight NPC-EBV genomes (HKNPC2 to -9). Cluster generation and 76-bp paired-end sequencing reactions were performed using the TruSeq PE Cluster kit v5–CS–GA (Illumina) and the TruSeq SBS kit v5-GA (Illumina), respectively, according to the manufacturer's protocols.
Quality assessment and demultiplexing.
Sequencing reads from the MiSeq personal sequencer were demultiplexed into individual samples using the MiSeq reporter (Illumina, Inc.), allowing one mismatch in the index sequence. Demultiplexing of output sequences from the GAIIx system was performed using the CASAVA (version 1.8.2) software. Thirty-three million reads were assigned to eight samples, allowing one mismatch in the index sequence. Quality assessment of the raw reads was then carried out to filter reads containing adaptor sequences using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and in-house scripts. Per-sequence quality scores for B95-8 and HKNPC2 (see Fig. SA1 and SA2 in the supplemental material) are shown as representative cases of the read qualities in the MiSeq and GAIIx runs, respectively.
Coverage of reads was assessed by mapping untrimmed reads of each sample to the reference EBV genome by using the Burrows-Wheeler Aligner (BWA) software (version 0.5.8c, default settings) (26). Pile-up files were generated from the BAM files using the SAMtools software (27). The position and coverage information were extracted from the pile-up files and visualized using the R statistical package. Type 1 (NC_007605) or type 2 EBV genome (NC_009334) was used as the reference sequence for the alignment. The average coverage was calculated by dividing the total number of sequenced bases by the total number of bases of the reference genome.
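The average-coverage calculation described above can be sketched as follows. This is a minimal illustration rather than the in-house script used in the study: the function name is hypothetical, the read count is the B95-8 mappable-read figure from Table 2, and the reference length of 171,823 bp for NC_007605 is an assumption that is not stated in the text.

```python
def average_coverage(n_reads: int, read_length: int, reference_length: int) -> float:
    """Total number of sequenced bases divided by the reference genome length."""
    return n_reads * read_length / reference_length


# B95-8: mappable reads from Table 2, nominal 150-bp MiSeq reads.
# ASSUMPTION: 171,823 bp is taken as the length of NC_007605.
b95_8_coverage = average_coverage(659_683, 150, 171_823)
print(f"B95-8 average coverage: {b95_8_coverage:.0f}-fold")
```

Under these assumptions the result is close to the 576-fold value reported for B95-8 in Table 2, which suggests that the tabulated coverage was derived from mappable reads.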
De novo assembly of EBV genomes.
The last 51 bases of the output reads of the MiSeq personal sequencer were trimmed from the 3′ ends of all reads by the FASTX Trimmer of FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit), so that the first 100 bases from the 5′ end were retained. Similarly, output reads of the GAIIx were trimmed to a read length of 55 bp in read 1 sequences and 50 bp in read 2 sequences. High-quality reads were assembled using Velvet 1.2.07 (28). The settings were optimized for each sample using an expected average k-mer coverage of 200 to 600, k-mer lengths of 33 to 47, and a minimum k-mer coverage of 20 to 70. Raw reads were aligned to the assemblies by using BWA (26). A manual visual inspection with Integrative Genomics Viewer (29) was conducted to detect any misassemblies throughout the length of the assembled sequences. In cases where mismatches between reads and assembled contigs were detected, the base quality was inspected. Sanger sequencing was performed to validate the mismatches where the mismatched bases were of high quality (>Q20) and were consistently detected in the majority of reads at that position.
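The 3′-end trimming step can be illustrated with a short sketch. It is not the FASTX-Toolkit implementation; the function names and the `KEEP` table are hypothetical, and only the retained lengths (100 bp for MiSeq reads, 55 bp and 50 bp for GAIIx read 1 and read 2) come from the text.

```python
# Retained read lengths (read 1, read 2) per platform, as given in the text.
KEEP = {"miseq": (100, 100), "gaiix": (55, 50)}


def trim_read(sequence: str, quality: str, keep: int) -> tuple:
    """Keep the first `keep` bases from the 5' end, with their quality scores."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")
    return sequence[:keep], quality[:keep]


def trim_pair(platform: str, read1: tuple, read2: tuple) -> tuple:
    """Trim both mates of a read pair to the platform-specific lengths."""
    keep1, keep2 = KEEP[platform]
    return trim_read(*read1, keep1), trim_read(*read2, keep2)


# Toy 76-bp GAIIx read pair (sequence, quality); real input would come from FASTQ files.
toy_read = ("ACGT" * 19, "I" * 76)
trimmed1, trimmed2 = trim_pair("gaiix", toy_read, toy_read)
print(len(trimmed1[0]), len(trimmed2[0]))
```

Trimming by position rather than by per-base quality keeps all reads at a uniform length, which is convenient for a k-mer based assembler.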
Scaffolding and joining of contigs by Sanger sequencing.
The location and orientation of contigs were evaluated by pairwise alignment of the contigs to the reference EBV genome. PCR primers were designed at the breakpoints between the contigs. Sequences of the primers are listed in Table SA1 in the supplemental material. PCR using HotStarTaq Plus master mix (Qiagen) was performed across these breakpoints to amplify the missing regions. The products were purified by using a QIAEX II gel extraction kit (Qiagen), and Sanger sequencing was performed using the BigDye Terminator v3.1 cycle sequencing kit (Invitrogen) according to the manufacturer's protocol. The same copy number of internal repeat 1 as that of the reference EBV genome was adopted for all of the sequenced NPC-EBV genomes.
Heterogeneity analysis.
Quality assessment was carried out on the raw reads using Illumina's default parameters to remove reads of low quality or those that consisted of adaptor or homopolymer sequences. Reads for each sample were aligned to the assembled genomes using BWA (26). Read counts for each type of base at every individual position were extracted and tabulated by SAMtools (27), VarScan version 2.2 (30), and in-house scripts. A cutoff Q25 quality score was adopted for the base quality of reads. A position was defined as heterogeneous if there were two or more bases of variant frequency between 20 and 94% and each had a read depth of ≥5. Nucleotide positions with read depths of <5 were classified as ambiguous sites since there was insufficient read depth to make a high-confidence call.
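The site classification rule can be expressed as a small function. This is a sketch under stated assumptions, not the in-house script: the function name is hypothetical, the base counts are assumed to have already been filtered at Q25, and the 20% and 94% bounds are treated as inclusive, which the text does not specify.

```python
def classify_position(base_counts: dict, min_depth: int = 5,
                      low: float = 0.20, high: float = 0.94) -> str:
    """Classify one genomic position from per-base read counts (Q25-filtered)."""
    depth = sum(base_counts.values())
    if depth < min_depth:
        # Too few reads to make a high-confidence call.
        return "ambiguous"
    # Bases supported by >= min_depth reads whose frequency lies within [low, high].
    qualifying = [base for base, count in base_counts.items()
                  if count >= min_depth and low <= count / depth <= high]
    return "heterogeneous" if len(qualifying) >= 2 else "homogeneous"


print(classify_position({"A": 70, "G": 30}))  # two bases within the frequency window
print(classify_position({"A": 98, "G": 2}))   # minor base below 20%
print(classify_position({"A": 3, "G": 1}))    # total depth below 5
```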
Mutation analysis.
Lists of single nucleotide variants (SNVs), insertions and deletions (indels) were generated from pairwise comparison of the DNA sequence of each of the EBV genomes against that of the reference EBV genome, NC_007605, using the cross_match software (http://www.phrap.org/phredphrapconsed.html). The variations identified in the major repeats, including internal repeats (IR) 1 to 4, terminal repeats (TR), and a family of repeats (FR), were disregarded. Positions without sequence data or those marked with “N” were also ignored in the mutation analysis. Substitutions, insertions, and deletions were all considered variations. The variability for each EBV genome was calculated by dividing the number of variations by the total number of bases of the genome. Nonsynonymous mutations were noted for their locations in proteins with known or putative function. Particular attention was paid to those that resulted in amino acid changes in known CD4+ and CD8+ T-cell epitopes. SNVs located in coding sequences and noncoding regions, including TATA boxes, poly(A) signals, microRNAs, and other noncoding RNAs, were also examined.
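The masking of repeat regions and the variability calculation can be sketched as below. The `Variation` type, the function names, and all coordinates are hypothetical toy values chosen for illustration; they are not the actual repeat coordinates of NC_007605 or output of cross_match.

```python
from typing import NamedTuple


class Variation(NamedTuple):
    position: int  # 1-based coordinate on the reference genome
    kind: str      # "substitution", "insertion", or "deletion"


def in_any_region(position: int, regions: list) -> bool:
    """True if the position falls inside any (start, end) interval, inclusive."""
    return any(start <= position <= end for start, end in regions)


def variability(variations: list, masked_regions: list, genome_length: int):
    """Drop variations in masked regions, then divide the count by genome length."""
    kept = [v for v in variations if not in_any_region(v.position, masked_regions)]
    return len(kept) / genome_length, kept


# Toy example: one masked repeat (positions 100-200) on a 1,000-bp genome.
toy_variations = [Variation(50, "substitution"),
                  Variation(150, "insertion"),   # inside the masked repeat, ignored
                  Variation(300, "deletion")]
fraction, kept = variability(toy_variations, [(100, 200)], 1000)
print(f"{fraction:.2%} variability from {len(kept)} variations")
```

Positions marked with "N" could be handled in the same way by adding them to the masked intervals.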
Phylogenetic analysis of EBV genomes derived from cell lines and NPC biopsy specimens.
Multiple sequence alignments of all of the available EBV genomes were performed using MAFFT version 6 (31) software. The alignment was visualized and edited using Jalview version 2.8 (32). Poorly aligned regions were trimmed before construction of the phylogenetic trees. Phylogenetic analysis was performed using Molecular Evolutionary Genetics Analysis version 5 (MEGA5) (33) using the neighbor-joining algorithm. This approach was based on multiple sequence alignments of whole genomes or nucleotide sequences of individual genes of all of the sequenced EBV genomes in the study. LMP1, LMP2A, BLLF1, BPLF1, and BZLF1 genes were selected. All phylogenetic trees were rooted with the sequence of the rhesus lymphocryptovirus (accession no. NC_006146). Bootstrap analysis of 500 replicates was performed on each tree to determine the confidence.
Recombination analysis of EBV genomes.
Phylogenetic and mutation analysis suggested potential recombinant strains; hence, recombination analysis was performed on the EBV genomes. The BootScan feature of the Recombination Detection Program version 4.27 (34) was used to detect recombination signal. Using a window size of 200 bp and a step size of 200 bp, 100 bootstrap replicates were performed for the BootScan run. SimPlot version 3.5.1 (35) was used to generate the similarity plot of parental strains to the recombinant strain. A window size of 200 bp, a step size of 200 bp, and a Kimura two-parameter model were adopted for the SimPlot analysis. Each recombinant fragment was subjected to phylogenetic analysis using MEGA5 software. Neighbor-joining trees of the parental and recombinant strains were constructed by using the B95-8 sequence as the root.
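The window settings and the Kimura two-parameter model used in the SimPlot analysis can be illustrated with a short sketch. It does not reproduce the internals of RDP or SimPlot; the function names are hypothetical, and the distance uses the standard Kimura formula d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions among compared sites.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences (gaps/N skipped)."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    p = sum(1 for pair in pairs if pair in TRANSITIONS) / n          # transitions
    q = sum(1 for a, b in pairs
            if a != b and (a, b) not in TRANSITIONS) / n              # transversions
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def windows(alignment_length: int, size: int = 200, step: int = 200):
    """Yield (start, end) half-open windows; a trailing partial window is dropped."""
    for start in range(0, alignment_length - size + 1, step):
        yield start, start + size


def windowed_distances(query: str, parent: str, size: int = 200, step: int = 200):
    """K2P distance of the query to one parental strain in each window."""
    return [(start, k2p_distance(query[start:end], parent[start:end]))
            for start, end in windows(len(query), size, step)]


one_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(round(one_transition, 4), list(windows(600)))
```

With a step size equal to the window size, as in the settings above, the windows do not overlap, so each 200-bp segment contributes to exactly one point on the plot.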
Accession numbers.
Sequence data for the eight NPC-EBV genomes were submitted to GenBank under accession numbers KF992564 (HKNPC2), KF992565 (HKNPC3), KF992566 (HKNPC4), KF992567 (HKNPC5), KF992568 (HKNPC6), KF992569 (HKNPC7), KF992570 (HKNPC8), and KF992571 (HKNPC9). Raw sequencing data were submitted to the Sequence Read Archive (study accession number PRJNA253520).
RESULTS
Summary of the sequencing data.
The MiSeq platform generated 789,660 (B95-8), 722,652 (C666-1), and 787,212 (HKNPC1) 150-bp paired-end reads for the two cell lines and the HKNPC1 sample, equivalent to 119 Mb (B95-8), 109 Mb (C666-1), and 119 Mb (HKNPC1) of sequence data. The numbers of reads for the B95-8, C666-1, and HKNPC1 genomes that mapped to the reference EBV were 659,683 (83.54%), 578,471 (80.05%), and 609,387 (77.41%), respectively. The NPC-EBV genomes were sequenced by using the GAIIx platform, generating a total of 33,468,614 76-bp paired-end reads (2,543 Mb). The proportion of mapped EBV reads ranged from 42.08% (1,941,831 reads, HKNPC9) to 84.34% (3,011,304 reads, HKNPC6). Taking into account all cell lines and NPC biopsy samples sequenced here, the EBV portion of total DNA was substantially enriched, to 75.2% of the total raw reads on average. The mean coverage for the HKNPC2 to -9 genomes was 1,278-fold, while the mean for the B95-8, C666-1, and HKNPC1 genomes was 538-fold. Given the small size of the EBV genome and the high proportion of EBV sequences among the total reads, it is feasible to multiplex a large number of samples while retaining a high coverage. Details of the sequencing data are listed in Table 2 and in Table SA2 in the supplemental material, and read coverage in HKNPC2 to -9 is illustrated in Fig. SA3 in the supplemental material. Raw sequencing data have been submitted to the Sequence Read Archive.
TABLE 2.
Summary of data from Illumina MiSeq and GAIIx analyses
Sample | No. of raw reads | Throughput (Mb) | No. of mappable reads | % Mappable reads | Avg coverage (fold) |
---|---|---|---|---|---|
B95-8* | 789,660 | 119 | 659,683 | 83.54 | 576 |
C666-1* | 722,652 | 109 | 578,471 | 80.05 | 505 |
HKNPC1* | 787,212 | 119 | 609,387 | 77.41 | 532 |
HKNPC2 | 4,515,480 | 343 | 3,732,274 | 82.66 | 1,564 |
HKNPC3 | 4,775,468 | 363 | 3,693,719 | 77.35 | 1,548 |
HKNPC4 | 3,997,242 | 304 | 3,256,549 | 81.47 | 1,365 |
HKNPC5 | 3,767,634 | 286 | 2,100,869 | 55.76 | 880 |
HKNPC6 | 3,570,608 | 271 | 3,011,304 | 84.34 | 1,262 |
HKNPC7 | 3,991,446 | 303 | 3,239,159 | 81.15 | 1,357 |
HKNPC8 | 4,235,676 | 322 | 3,429,954 | 80.98 | 1,437 |
HKNPC9 | 4,615,060 | 351 | 1,941,831 | 42.08 | 814 |
Total | 3,064,224 | 462 |
*, Sequenced by the MiSeq personal sequencer.
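As a consistency check on Table 2, the percentage of mappable reads and its average across samples can be recomputed from the read counts. This is a minimal sketch; the variable names are hypothetical, and the counts are copied from the table.

```python
# (raw reads, mappable reads) per sample, copied from Table 2.
READS = {
    "B95-8":  (789_660, 659_683),
    "C666-1": (722_652, 578_471),
    "HKNPC1": (787_212, 609_387),
    "HKNPC2": (4_515_480, 3_732_274),
    "HKNPC3": (4_775_468, 3_693_719),
    "HKNPC4": (3_997_242, 3_256_549),
    "HKNPC5": (3_767_634, 2_100_869),
    "HKNPC6": (3_570_608, 3_011_304),
    "HKNPC7": (3_991_446, 3_239_159),
    "HKNPC8": (4_235_676, 3_429_954),
    "HKNPC9": (4_615_060, 1_941_831),
}

# Percentage of raw reads that mapped to the EBV reference, per sample.
percent_mappable = {name: 100 * mapped / raw for name, (raw, mapped) in READS.items()}

# Unweighted mean across the 11 samples.
mean_percent = sum(percent_mappable.values()) / len(percent_mappable)

# Total GAIIx raw reads (all samples except the three MiSeq samples).
gaiix_total = sum(raw for name, (raw, _) in READS.items()
                  if name not in ("B95-8", "C666-1", "HKNPC1"))

print(f"mean % mappable: {mean_percent:.1f}")
print(f"GAIIx raw reads: {gaiix_total:,}")
```

The unweighted mean reproduces the 75.2% enrichment figure quoted in the text, and the GAIIx total matches the 33,468,614 reads reported for the eight NPC-EBV samples.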
De novo assembly was performed for all of the sequenced samples. The number of contigs, represented as the number of nodes in graphs constructed by the de Bruijn graph assembler, ranged from 33 (HKNPC3) to 56 (HKNPC9). Contigs shorter than 100 bp were filtered out. N50 sizes of contigs ranged from 16,694 bp (HKNPC6) to 29,863 bp (HKNPC1). The largest contigs were ∼44 kb in length for all of the samples. A summary of the assembled sequences and the contig sizes is given in Table SA2 in the supplemental material. The gaps between the contigs were linked up either by Sanger sequencing or tracts of “N” with length estimated based on the EBV reference, NC_007605.
Resequencing B95-8, C666-1, and HKNPC1 genomes.
B95-8, C666-1, and HKNPC1 genomes were resequenced to validate the sequencing workflow. The resulting contigs were aligned to the published sequences. The major repeats, which generally lacked reliable sequence data for comparison, were disregarded. The newly assembled B95-8 sequence had 14 mismatches to the B95-8 part of the type 1 reference sequence (NC_007605). Considering the occurrence of sequencing errors and spontaneous mutations over prolonged culture, the resequenced B95-8 is highly similar to the original sequence.
Resequencing of the C666-1 genome revealed 148 discrepancies relative to the previously published C666-1 sequence, including 47 substitutions, 42 deletions, and 59 insertions. The raw reads were mapped to the newly assembled C666-1 EBV contigs and were checked manually for misalignment, as described in Materials and Methods. In view of the discrepancies mentioned above, we performed two additional validating steps. First, we mapped the reads to the published C666-1 sequence and found that the mismatches contained in the raw reads at these positions of discrepancy were of high base quality (>Q30). Second, a subset of these discrepancies, namely, 3 substitutions, 5 deletions, and 11 insertions, was validated by Sanger sequencing. These sites (see Table S1 in the supplemental material) were chosen for validation under the constraints of primer design. Representative cases of Sanger-verified mismatches are shown in Fig. SA5 to SA9 in the supplemental material. This resequenced C666-1 genomic sequence appears in GenBank under accession no. KC617875.
Resequencing of the HKNPC1 genome using target enrichment by hybridization resulted in an average read depth of 532-fold and a much more even coverage across the whole genome in comparison to that generated by the amplicon sequencing approach (see Fig. SA4 in the supplemental material). Neither method could resolve the major repeats: IR2, IR3, and IR4. However, the newly assembled HKNPC1 sequence resolved 84 “Ns” of the original HKNPC1 sequence. The updated HKNPC1 sequence appears in GenBank under accession no. JQ009376.
Discrepancies between the resequenced genomes and the published genomes are listed in Table S1 in the supplemental material. Taken together, resequencing of the B95-8, C666-1, and HKNPC1 genomes validated the workflow from target EBV DNA enrichment and next-generation sequencing to de novo assembly and joining of the contigs.
Sequencing HKNPC2 to -9 genomes.
All eight NPC-EBV genomes were successfully sequenced. The genome sizes estimated based on the reference EBV sequence ranged from 170,062 bp (HKNPC2) to 171,556 bp (HKNPC3 and -6), and their GC contents were all ∼58%. The sequencing reads of each NPC-EBV sample were mapped to their individually assembled genomes to assess the presence of heterogeneity. The number of heterogeneous sites and their percentages in relation to the size of the genomes were as follows: HKNPC1 (32 sites, 0.019%), HKNPC2 (61 sites, 0.036%), HKNPC3 (67 sites, 0.039%), HKNPC4 (70 sites, 0.041%), HKNPC5 (88 sites, 0.051%), HKNPC6 (95 sites, 0.055%), HKNPC7 (67 sites, 0.038%), HKNPC8 (65 sites, 0.038%), and HKNPC9 (81 sites, 0.047%). The heterogeneous sites are listed in Table S2 in the supplemental material. The low level of heterogeneity observed is consistent with the current view of the monoclonal origin of EBV in NPC.
Mutation analysis of the NPC-EBV genomes.
In comparison to the reference EBV, the HKNPC2 to -9 genomes harbored 1,736 variations in all, including 1,601 substitutions, 64 insertions, and 71 deletions. Totals of 1,261 substitutions, 36 insertions, and 42 deletions were located in the coding regions of the genomes, while 340 substitutions, 28 insertions, and 29 deletions were found in the noncoding regions. The data for mutations in individual genomes are shown in Table SA3 in the supplemental material. The variability in terms of the number of variations as a proportion of the total number of bases of the NPC-EBV genomes ranged from 0.65% (HKNPC2) to 0.73% (HKNPC6). Figure 2A illustrates the variations of all of the available NPC-EBV genomes relative to the reference EBV sequence. Except for GD1, HKNPC6, and HKNPC7, the pattern of distribution of variations among other NPC-EBV genomes was very similar, showing high density of variations among the latent genes and highly conserved sequences around the regions of BMRF1, BMRF2, and BRLF1 genes. Figure 2B illustrates the variations of the NPC-EBV genomes relative to the HKNPC1 sequence. Variations for the HKNPC2, -3, -4, -5, -8, and -9 genomes had markedly decreased, whereas the HKNPC6 and -7 genomes showed significant differences. Subsequent phylogenetic analyses demonstrated that the latter two genomes were likely of a distinct parental EBV lineage (see below).
FIG 2.
Genetic variations among NPC-EBV strains. (A) Mutations of EBV strains isolated from NPC relative to the reference EBV strain (NC_007605). Mutations in internal repeats and terminal repeats are disregarded, and the regions are shaded in gray. (B) Mutations of EBV strains isolated from NPC relative to HKNPC1. Rightward and leftward open reading frames of EBV are overlaid on top of the mutations.
Latent genes in all of the HKNPC genomes were found to harbor the highest numbers of nonsynonymous mutations, followed by genes encoding tegument and membrane glycoproteins (Fig. 3A), as defined by Tarbouriech et al. (36). HKNPC2, -3, -4, -5, -8, and -9 had 105 to 111 nonsynonymous mutations located in the latent genes, whereas HKNPC6 and -7 had higher numbers of 145 and 137 mutations, respectively. These latent gene mutations accounted for 34.3% (HKNPC2) to 38.9% (HKNPC6) of all nonsynonymous mutations detected for each genome. Genes encoding tegument proteins contained 65 (21.2%, HKNPC2) to 82 (22.8%, HKNPC7) nonsynonymous mutations. Genes encoding membrane glycoproteins contained 35 (11.4%, HKNPC2) to 49 (13.1%, HKNPC6) nonsynonymous mutations. The remaining nonsynonymous mutations were located in proteins for replication, transcription, capsid, packaging, or nucleotide metabolism or in proteins of unknown function.
FIG 3.
Nonsynonymous mutations of HKNPC2 to -9. (A) Number of nonsynonymous mutations contained in the nine categories of EBV-encoded proteins. The majority of the amino acid changes are located in latent proteins (blue) in all of the HKNPC strains, followed by tegument (red) and membrane protein (green). (B) Amino acid changes in CD8+ and CD4+ specific T cell epitopes. Amino acid changes in at least one of the HKNPC strains at CD8+ and CD4+ specific T cell epitopes are marked with solid and hollow arrows, respectively. Stacking arrows indicate that the amino acid change is in a peptide which serves as both CD8+ and CD4+ epitopes. X, nonsense mutation causing a truncation of BLLF1 at QNP epitope; Ins, insertion at EBNA2.
Nonsynonymous mutations in protein-coding and noncoding genes.
The HKNPC6 genome contained a truncating mutation (C→T, coordinate 79381) within the coding region of the BLLF1 gene. As a result, the transmembrane domain of the encoded gp350 protein would not be translated. Three of the nonsynonymous mutations found in the BALF4 gene, which encodes the gp110 protein, were found in all of the HKNPC genomes, resulting in changes at amino acids 128 (D→E), 433 (D→N), and 803 (A→V). The same mutations were also observed in C666-1 and M81, an in vitro-transformed line harboring an NPC-derived EBV strain (37). Such polymorphisms are postulated to play a role in the epitheliotropism of EBV (37). An 11-bp frameshift deletion was found in the EBNA3C gene of the HKNPC9 genome at NC_007605 coordinate 87876, causing the protein to terminate at amino acid residue 610. Both the EBNA3C and the BLLF1 mutations were validated by Sanger sequencing. The EBNA1 genes of the NPC-EBV genomes contained 15 nonsynonymous mutations in total, with 3 in the N-terminal regions and 10 in the C-terminal regions. The LMP1 gene of the NPC-EBV genomes contained 33 nonsynonymous mutations, with 3 in the N-terminal cytoplasmic regions, 18 in the transmembrane regions, and 12 in the C-terminal cytoplasmic regions. Polymorphisms in tegument protein-encoding genes have been largely unreported. We found that the BPLF1 and BOLF1 genes accounted for the majority of the nonsynonymous mutations in the tegument protein-encoding genes. BPLF1, which has the largest open reading frame of the EBV genome, harbored 52 nonsynonymous substitutions, 68 synonymous mutations, 9 insertions, and 3 deletions in the NPC-EBV genomes, whereas BOLF1 contained 34 nonsynonymous substitutions, 33 synonymous mutations, 4 insertions, and 2 deletions.
Eleven nonsynonymous mutations in the BZLF1 gene were found to be shared by the HKNPC1 to -5, -8, and -9 genomes. Two of the resulting amino acid changes were located in the transactivation domain (residues 68 [T→A] and 76 [S→P]), one in the DNA-binding domain (residue 195 [Q→H]), and one in the dimerization domain (residue 205 [A→S]). These mutations were also found in the M81 strain, in which the BZLF1 gene was associated with increased and sustained lytic replication in the infected B cells (37).
Noncoding RNAs, including EBERs and microRNAs, are generally highly conserved in EBV. EBER1 did not contain any mutations, whereas EBER2 contained six substitutions, all of which were found in all of the NPC-EBV genomes. Only one substitution (T→C, coordinate 148231) was found in the mature miR-BART19-5p but not at the seed region.
Amino acid changes in CD4+ and CD8+ T-cell epitopes.
All latent proteins except EBNA-LP and lytic proteins, including BZLF1, BRLF1, BCRF1, and BLLF1, harbored nonsynonymous mutations in epitopes specific for both CD4+ and CD8+ T cells. These epitopes were defined and reviewed in previous publications (38–40). Amino acid changes were found in seven CD8+ epitopes of LMP2, five epitopes of EBNA3A, and three or fewer in other proteins. Thirteen CD4+ epitopes of EBNA1, six in LMP1, six in LMP2, five in EBNA2, and three or fewer in other proteins contained amino acid changes. Some of the nonsynonymous mutations affected multiple epitopes. For example, a C-to-T substitution at coordinate 97121 resulted in the change of residue 487 (A→V) in EBNA1, where CD4+ epitopes SNP, NPK, ENI, IAE, and LRA were located. Another C-to-T substitution at coordinate 168229 caused a change in residue 212 (G→S) of LMP1, where CD4+ epitopes QAT, SSH, and SGH were located. The positions of the nonsynonymous changes located in the epitopes are illustrated in Fig. 3B and tabulated in Table S3 in the supplemental material.
Phylogenetic analysis of the NPC-EBV genomes.
The phylogenetic trees constructed based on edited whole-genome alignment of all published EBV genomes and the NPC-EBV genomes are illustrated in Fig. 4. It is known from the sequences of the EBNA-2 and EBNA-3 genes that all of the HKNPC genomes are type 1 viruses. AG876, the only type 2 EBV strain included in the analysis, was clearly segregated from the type 1 EBV sequences in the analyses of the whole-genome, EBNA-2, and EBNA-3A sequences (see Fig. SA10 in the supplemental material). Analyses of the LMP-1 (Fig. 4) and LMP-2A genes (see Fig. SA10 in the supplemental material) indicated that type 1 EBV generally clustered according to the geographical location from which the samples were collected. All Asian EBV strains, including HKNPC1 to -9, GD1 and -2, C666-1, and the Japanese strain (Akata), were clustered in a branch distant from the non-Asian strains AG876, B95-8, Mutu, K4413-Mi, and K4123-Mi. Despite being a virus derived from a Chinese NPC patient, GD1 seemed to harbor many mutations that were not present in the other Chinese strains. Analyses of the nucleotide sequences of the type-independent genes BLLF1, BPLF1 (Fig. 4), and BZLF1 (see Fig. SA10 in the supplemental material) illustrated that all of the Chinese viruses, except HKNPC6 and -7, were closely related.
FIG 4.
Phylogenetic analysis of whole EBV genomes and protein-encoding nucleotide sequences of the LMP-1, BLLF1, and BPLF1 genes. The layout of the phylogenetic trees generally follows the geographical origins of the EBV strains. The only type 2 sequence included in the analysis, AG876, is the most distant from the other strains in the whole-genome and BLLF1 trees. Analysis of the whole-genome, BLLF1, and BPLF1 sequences shows that HKNPC6 and -7 are distinct from the other HKNPC strains. All analyses are rooted using rhesus lymphocryptovirus. The numbers at the internal nodes correspond to the bootstrap values, obtained in an analysis of 500 replicates.
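The bootstrap values mentioned in the legend come from resampling alignment columns with replacement and repeating the analysis on each replicate. The Python sketch below illustrates only that resampling idea, using a simple pairwise p-distance instead of tree building; it is not the neighbor-joining procedure used for Fig. 4. The function names, the toy sequences, and the seed are hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites among columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def support_closer(alignment, query, near, far, replicates=500, seed=0):
    """Fraction of replicates in which `query` is closer to `near` than to `far`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_alignment(alignment, rng)
        if p_distance(rep[query], rep[near]) < p_distance(rep[query], rep[far]):
            hits += 1
    return hits / replicates


# Toy, made-up aligned sequences (not real EBV data).
toy = {
    "strain_x": "ACGTACGTACGTACGT",
    "strain_y": "ACGTACGTACGTACGA",
    "strain_z": "ACCTACTTACGAACGA",
}
print(support_closer(toy, "strain_x", "strain_y", "strain_z"))
```

A support value near 1 means the grouping is stable under resampling of sites; a value near 0.5 means the data do not clearly favor either grouping.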
HKNPC6 and -7 genomes are distinct from the other NPC-EBV genomes.
Phylogenetic analyses of the whole-genome sequences and of the BLLF1, BPLF1, and BZLF1 genes showed that HKNPC6 and -7 were segregated in a different branch from that of the other NPC-EBV genomes. Comparison of the HKNPC6 and -7 sequences against the reference EBV and the HKNPC1 sequences revealed a region from IR2 to downstream of IR3 containing variations unique to these two EBV variants (Fig. 2A and B). A total of 411 mutations were found only in the HKNPC6 and -7 genomes, and 364 of these were shared by the two EBV variants. Among these shared mutations, 340 were in the coding regions, and 139 were nonsynonymous. Almost half of these nonsynonymous mutations were located in the EBNA3A, -3B, and -3C genes. Some of these mutations resulted in amino acid changes in CD4+ and CD8+ epitopes, such as VQP (residue 1 [V→E]) and QVA (residue 7 [R→H] and residue 8 [A→T]) in EBNA3A and TYS (residue 9 [I→L]) and AVF (residue 2 [V→F]) in EBNA3B.
HKNPC2 is a recombinant genome of two EBV lineages.
The HKNPC2 genome contained mutations in the BFRF1, BFLF1, and BFLF2 genes that were otherwise found only in the HKNPC6 and -7 genomes. Phylogenetic analysis of the whole-genome sequences showed that HKNPC2 was located in a branch distinct from both that of HKNPC6 and -7 and that of the other NPC-EBV genomes (refer to Fig. 4). We hypothesized that HKNPC6 and -7 belong to one parental EBV lineage and HKNPC3, -4, -5, -8, and -9 to another, whereas HKNPC2 may have arisen from recombination between these two lineages. BootScan analysis identified a recombination breakpoint at approximately position 46625 (HKNPC2 coordinates; see Fig. 5A). The recombinant fragment upstream of the breakpoint aligned with the highest identity to the HKNPC7 sequence, whereas the sequence further downstream aligned with the highest identity to the HKNPC9 sequence. A similarity plot generated by the SimPlot software showed a slight decrease in the similarity of HKNPC9 in the region of recombination (Fig. 5C). Neighbor-joining trees of portions of the recombinant regions indicated that HKNPC7 is closer to HKNPC2, whereas in the flanking regions HKNPC9 is closer in phylogeny.
FIG 5.
Recombination analysis of HKNPC2. (A) BootScan profile of HKNPC7 (red) and -9 (blue) against HKNPC2. HKNPC7 has a higher percentage of permutated trees in regions flanking internal repeat 2 (IR2), indicating a recombination event. (B) Closeup view of regions flanking IR2. (C) Similarity plot generated by SimPlot. Both HKNPC7 and -9 are highly similar to HKNPC2, with a slight drop of similarity of HKNPC9 in the region of recombination. (D) Recombination model of HKNPC2. (E to G) Neighbor-joining trees of regions of and flanking the recombinant fragments.
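The reasoning behind a similarity plot can be sketched in a few lines of Python: slide a window along an alignment, compute percent identity between the query and each candidate parent, and note where the closest parent changes. This is a simplified illustration under stated assumptions, not the algorithm implemented in SimPlot or BootScan; the function names, window settings, and toy sequences are hypothetical.

```python
def window_identity(query, other, window, step):
    """Percent identity between two aligned sequences in sliding windows."""
    if len(query) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        o = other[start:start + window]
        # Ignore columns in which either sequence has a gap.
        pairs = [(a, b) for a, b in zip(q, o) if a != "-" and b != "-"]
        matches = sum(a == b for a, b in pairs)
        # Windows with no comparable columns are scored 0.0 in this sketch.
        identity = 100.0 * matches / len(pairs) if pairs else 0.0
        profile.append((start, identity))
    return profile


def closest_parent(query, parents, window, step):
    """For each window, name the candidate parent with the highest identity."""
    profiles = {name: window_identity(query, seq, window, step)
                for name, seq in parents.items()}
    starts = [start for start, _ in next(iter(profiles.values()))]
    return [(start, max(profiles, key=lambda name: profiles[name][i][1]))
            for i, start in enumerate(starts)]


# Toy, made-up sequences: the query matches parent_a in the first half
# and parent_b in the second half.
query = "AAAAAAAACCCCCCCC"
parents = {
    "parent_a": "AAAAAAAAGGGGGGGG",
    "parent_b": "TTTTTTTTCCCCCCCC",
}
for start, name in closest_parent(query, parents, window=8, step=8):
    print(start, name)
```

A switch in the closest parent between adjacent windows is the kind of signal that suggests a breakpoint, which would then need confirmation by a dedicated recombination test and by phylogenies of the individual segments.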
DISCUSSION
We resequenced the B95-8 genome, the backbone of the type 1 reference sequence NC_007605, on which the SureSelect target capture bait was designed. Minor discrepancies were still observed between the B95-8 genome sequenced in the present study and the original B95-8 genome (V01555). Several reasons could explain these discrepancies. Most were located within the repeat regions and stretches of homopolymers, which are prone to sequencing error by slippage and to misalignment during mapping. Genetic drift during prolonged cell culture could also occur. Despite these potential sources of error, the number of discrepancies between our sequence and the original B95-8 sequence remains very low, indicating that the sequencing workflow is reliable for obtaining genomic sequences of EBV.
The resequenced C666-1 genome had a coverage very similar to that of the previously published C666-1 sequence (505- and 504-fold, respectively) (19). However, there are important differences in the sequencing approach. The previous study used reference mapping instead of the de novo assembly used in the present study, and, because it lacked target DNA enrichment, it required 251 Gb of data, versus 0.11 Gb in the current study, to reach a similar depth of coverage of ∼500-fold (19). A considerable number of discrepancies (n = 148) between the two sequences was found. Hence, we took several steps to verify our newly assembled C666-1 sequence: manually aligning the raw reads to the de novo-assembled sequence and to the published sequence, and additional Sanger sequencing. These steps showed that there was no significant misalignment and that at least some of the substitutions, insertions, and deletions found in the current assembled C666-1 genome were indeed genuine. Although reference mapping could miss large insertions and deletions, de novo assembly (such as that achieved using Velvet) could also erroneously generate small apparent insertions and deletions. A combination of the two methods might help researchers to obtain more complete and accurate EBV genomes.
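Tallying discrepancies between two versions of the same genome, such as those compared above, amounts to walking a pairwise alignment and counting substitutions, insertions, and deletions. The Python sketch below shows one simple way to do this, assuming the two sequences have already been aligned with "-" marking gaps. It treats each run of consecutive gap columns as a single indel event; the function name and the toy alignment are hypothetical and do not reproduce the verification steps used in this study.

```python
def count_variants(ref_aln, qry_aln):
    """Count substitutions, insertions and deletions in a pairwise alignment.

    Both inputs are aligned strings of equal length with '-' for gaps.
    A run of consecutive gap columns counts as one insertion or deletion.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    open_indel = None  # type of the indel run currently being extended
    for r, q in zip(ref_aln, qry_aln):
        if r == "-" and q == "-":
            continue  # gap-only column carries no information
        if r == "-":
            kind = "insertion"  # base present only in the query
        elif q == "-":
            kind = "deletion"   # base present only in the reference
        else:
            kind = None
            if r != q:
                counts["substitution"] += 1
        if kind is not None and kind != open_indel:
            counts[kind] += 1   # a new indel run starts here
        open_indel = kind
    return counts


# Toy, made-up alignment (not real EBV sequence).
reference = "ACGT--ACGTAC"
query = "ACCTGGAC--AC"
print(count_variants(reference, query))
```

Counting indel runs instead of individual gap columns avoids inflating the total when a single multi-base insertion or deletion spans several columns.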
HKNPC1 was previously sequenced using the amplicon sequencing approach (23). A large variation in read depth, from >10,000-fold to zero, was evident throughout the genome, which would interfere with de novo assembly. Sequencing by the target enrichment approach resulted in much more evenly distributed reads across the genome, greatly favoring the process of de novo assembly. De novo assembly has the edge over reference mapping in resolving sequences of medium to high variability, such as those of BPLF1, BOLF1, and the latent genes, into continuous contigs. In contrast, these are the regions where a consensus cannot be confidently called in reference mapping.
Given the relatively small size of the EBV genome, the sequencing capacity of Illumina sequencers allows for multiplexing of a large number of genomes in a run. We showed here that the EBV portion of the total DNA in cell lines and NPC biopsy specimens can be enriched to ca. 75% of the total raw reads. If we aim for 500-fold coverage, as in our resequencing of B95-8, C666-1, and HKNPC1, roughly 100 Mb of sequence output per sample is required. The MiSeq sequencer is reported to have 15 Gb of output with 25 million sequencing reads; hence, it is possible to sequence as many as 150 EBV genomes per run. The output of the HiSeq2500 is reported to be 1 Tb at best, which is equivalent to 10,000 genomes' worth of output. This dramatic increase in genome output highlights the advantage of using target capture to enrich EBV DNA from the total DNA of samples.
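The multiplexing estimate above is simple arithmetic, sketched here in Python. The genome size of about 172 kb is an assumed round figure for EBV that is not stated in this section, and the function names are hypothetical; the coverage, enrichment fraction, and instrument outputs are the values quoted in the text.

```python
def output_per_sample_bp(genome_size_bp, target_coverage, on_target_fraction):
    """Raw sequence output needed per sample to reach the target coverage."""
    return genome_size_bp * target_coverage / on_target_fraction


def genomes_per_run(run_output_bp, per_sample_bp):
    """Number of samples that fit into one run at the given per-sample output."""
    return run_output_bp // per_sample_bp


# Assumed EBV genome size (~172 kb), 500-fold coverage, 75% of reads on target.
needed = output_per_sample_bp(172_000, 500, 0.75)
print(f"{needed / 1e6:.0f} Mb per sample")

# Using the rounded figure of 100 Mb per sample quoted in the text.
per_sample = 100_000_000
print(genomes_per_run(15_000_000_000, per_sample))     # MiSeq, 15 Gb
print(genomes_per_run(1_000_000_000_000, per_sample))  # HiSeq2500, 1 Tb
```

Under these assumptions the per-sample requirement comes to about 115 Mb, in line with the rounded estimate of roughly 100 Mb. Without enrichment the on-target fraction would be far smaller and the required output per sample correspondingly larger.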
The heterogeneity of HKNPC1 and the other eight newly sequenced NPC-EBV genomes ranged from 0.019% (HKNPC1) to 0.055% (HKNPC6) and was much lower than the interstrain variability of ca. 0.5% for viruses of the same type, indicating that the heterogeneity is unlikely to be the result of coinfection with different viral strains. The small degree of heterogeneity suggests that the current view of clonal expansion in NPC remains valid. However, spontaneous mutations might arise in the EBV genomes during the process of clonal expansion.
Whole-genome sequencing has revealed much wider sequence diversity in many previously unexamined genomic regions of EBV. Polymorphisms in the genes encoding tegument and early lytic proteins and glycoproteins have been identified here. BPLF1 is one of the few tegument proteins whose function has been relatively well investigated (41). We identified two nonsynonymous mutations in the N-terminal region of the protein, which exhibits deubiquitinating activity. Postulation on the effect of other mutations in the tegument proteins would first require elucidation of the function of the individual domains of these proteins. Glycoproteins, including gp350 and gp110, are important molecules for viral entry into host cells (42, 43). A nonsense mutation in the BLLF1 gene of the HKNPC6 genome results in loss of the transmembrane domain in the encoded gp350 protein, which might prevent gp350 from anchoring in the viral envelope and thus from mediating attachment to the cell receptor for viral entry. Since only HKNPC6, but not the closely related HKNPC7, harbored the mutation, it might have been acquired recently, after the virus had entered the host cell. Tsai et al. showed that polymorphisms within the BALF4 gene might contribute to the tropism of the M81 EBV strain for epithelial cells (37). Interestingly, the translated amino acid sequences of BALF4, which encodes gp110, of all the NPC-EBV genomes are identical to that of the M81 strain except for one residue in the HKNPC1 genome. Future work using mutagenesis will help to pinpoint the mutations of BALF4 that contribute to epitheliotropism. The same group also showed that polymorphisms in the BZLF1 gene might be related to increased and sustained lytic replication in EBV-infected B cells (37). The translated amino acid sequences of BZLF1 of HKNPC1 to -5, -8, and -9 are highly similar to that of M81. However, the effect of such BZLF1 polymorphisms in an epithelial cell system has yet to be tested.
As many as one-third of the nonsynonymous mutations in each of the NPC-EBV genomes were found in the latent genes. A frameshift mutation was identified in the EBNA3C gene of HKNPC9, causing a truncation of the protein. Although EBNA3C is critical for maintaining lymphoblastoid cell growth (44), its function in epithelial cells, such as NPC cells (45), is unclear. Some of the mutations found in latent genes caused amino acid changes in the immune epitopes. Variants of two HLA A11-restricted immunodominant epitopes in the EBNA-3B protein were found in the NPC-EBV genomes. An AVF epitope with a mutated fourth residue (D→N) was found in HKNPC1 to -5, -8, and -9. HKNPC6 and -7 had the first (A→S) and second (V→F) residues of the AVF epitope changed. The IVT epitope was mutated in residue 2 (V→L) in HKNPC6 and -7 and in residue 9 (K→N) in HKNPC1 to -5, -8, and -9. These variants were reported to be less immunogenic and poorly recognized by IVT- and AVF-specific cytotoxic T cells compared to the wild-type epitopes (46). YFL, an HLA A2-restricted epitope of LMP-1, is a variant peptide of the wild-type YLL. This variant epitope, with changes in residue 2 (L→F) and residue 5 (M→I), was found in all of the NPC-EBV genomes. CTL recognition of YFL was abrogated compared to that of the wild-type epitope (47). These variations in epitopes might thus contribute to the evasion of T-cell surveillance by EBV-infected cells.
The phylogenetic tree constructed based on whole-genome alignment of all strains is strongly weighted toward the highly polymorphic EBNA-2 and EBNA-3 genes (48). However, phylogenetic analysis based on other, type-independent genes might tell a different story. Comparison of all the available EBV genomes to the reference EBV (NC_007605) identified 230 variations in the BPLF1 gene. A phylogenetic tree based on BPLF1 illustrated that the HKNPC6, HKNPC7, and GD1 genomes are actually closer to the B95-8, Mutu, and AG876 strains than to the other Asian variants, including Akata and the other NPC-EBV genomes (see Fig. 4). Analysis of LMP1 and -2 showed a phylogenetic relationship corresponding to the geographical origin of the viral genomes instead of the type 1 and 2 dichotomy, suggesting that the LMP1 and -2 genes can serve as geographical markers. The HKNPC6 and -7 genomes, which may constitute a distinct EBV lineage, as illustrated by the phylogenetic analyses based on the BPLF1, BLLF1, and BZLF1 genes and the whole-genome nucleotide sequences, were both isolated from primary tumor biopsy specimens of aggressive stage IV NPC cases. Future work should investigate the relationship between this distinct lineage of EBV and the clinical stages of NPC. The HKNPC6 and -7 genomes also harbored unique variations that lead to amino acid changes in CD4+ and CD8+ T-cell epitopes. Whether changes in such epitopes confer immune evasion on the NPC cells may constitute another hypothesis for future testing.
Recombination has been observed in human herpesviruses (49–51). It has also been shown in EBV that phylogeny is not always consistent for different parts of the genome, suggesting recombination events among different strains of the virus (52, 53). Lineage analysis of the B95-8, AG876, and GD1 genomes showed that these strains had defined EBV haplotype regions, and models of recombination of distinct lineages were proposed (53). Putative recombination breakpoints among GD1, Akata, and a few African strains have also been suggested (52). However, the recombination analyses of EBV to date were performed mainly on EBV strains from different geographical origins. The sequences of the eight NPC-EBV genomes generated in the present study provide some information on the relationship among the viral variants in NPC cells from the same geographic region. We provide evidence of recombination among viruses that are phylogenetically close. The recombinant segment found in HKNPC2 resembled the haplotype region 2 (HR2) defined in a previous study (53). Regions flanking the recombinant segment might serve as recombination hot spots. Large-scale sequencing of EBV genomes will shed light on the frequency of recombination among EBV lineages in both normal and diseased subjects. In order to investigate how genomic variations of EBV contribute to the pathogenesis of NPC, it is essential to sequence the EBV genomes of a large number of normal and diseased individuals in the same geographical region and conduct a case-control comparison.
In summary, we established a complete sequencing workflow to delineate the genomic sequences of EBV genomes. This workflow will facilitate future large-scale sequencing of EBV genomes. We reported eight newly sequenced EBV genomes isolated from primary NPC biopsy specimens and demonstrated the sequence diversity on a whole-genome level among these EBV isolates. In the future, large-scale sequencing and comparison of EBV genomes isolated from NPC and normal subjects of the same geographic region should be performed to assess whether EBV genomic variations contribute to NPC pathogenesis.
ACKNOWLEDGMENTS
This study was funded by Research Grant Council GRF grant HKU763208M and EBV research grant 200004525 of the AKSC and by a grant from the University Development Fund of the University of Hong Kong to the Centre of Genomic Sciences. Hin Kwok's Ph.D. work was supported by The University of Hong Kong's postgraduate studentship and HoTung Pediatrics Education and Research Fund.
We thank W. L. Wang, Department of Pediatrics and Adolescent Medicine, The University of Hong Kong, for critical reviews of the manuscript.
Footnotes
Published ahead of print 2 July 2014
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JVI.01665-14.
REFERENCES
- 1. Neel HB, III, Pearson GR, Taylor WF. 1984. Antibodies to Epstein-Barr virus in patients with nasopharyngeal carcinoma and in comparison groups. Ann. Otol. Rhinol. Laryngol. 93:477–482.
- 2. Grunewald V, Bonnet M, Boutin S, Yip T, Louzir H, Levrero M, Seigneurin JM, Raphael M, Touitou R, Martel-Renoir D, Cochet C, Durandy A, Andre P, Lau W, Zeng Y, Joab I. 1998. Amino acid change in the Epstein-Barr-virus ZEBRA protein in undifferentiated nasopharyngeal carcinomas from Europe and North Africa. Int. J. Cancer 75:497–503.
- 3. Sacaze C, Henry S, Icart J, Mariame B. 2001. Tissue specific distribution of Epstein-Barr virus (EBV) BZLF1 gene variants in nasopharyngeal carcinoma (NPC) bearing patients. Virus Res. 81:133–142. doi:10.1016/S0168-1702(01)00376-8
- 4. Dardari R, Khyatti M, Cordeiro P, Odda M, El Gueddari B, Hassar M, Menezes J. 2006. High frequency of latent membrane protein 1 30-bp deletion variant with specific single mutations in Epstein-Barr virus-associated nasopharyngeal carcinoma in Moroccan patients. Int. J. Cancer 118:1977–1983. doi:10.1002/ijc.21595
- 5. Chang KP, Hao SP, Lin SY, Ueng SH, Pai PC, Tseng CK, Hsueh C, Hsieh MS, Yu JS, Tsang NM. 2006. The 30-bp deletion of Epstein-Barr virus latent membrane protein-1 gene has no effect in nasopharyngeal carcinoma. Laryngoscope 116:541–546. doi:10.1097/01.mlg.0000201993.53410.40
- 6. Nguyen-Van D, Ernberg I, Phan-Thi Phi P, Tran-Thi C, Hu L. 2008. Epstein-Barr virus genetic variation in Vietnamese patients with nasopharyngeal carcinoma: full-length analysis of LMP1. Virus Genes 37:273–281. doi:10.1007/s11262-008-0262-9
- 7. See HS, Yap YY, Yip WK, Seow HF. 2008. Epstein-Barr virus latent membrane protein-1 (LMP-1) 30-bp deletion and XhoI-loss is associated with type III nasopharyngeal carcinoma in Malaysia. World J. Surg. Oncol. 6:18. doi:10.1186/1477-7819-6-18
- 8. Cheung ST, Leung SF, Lo KW, Chiu KW, Tam JS, Fok TF, Johnson PJ, Lee JC, Huang DP. 1998. Specific latent membrane protein 1 gene sequences in type 1 and type 2 Epstein-Barr virus from nasopharyngeal carcinoma in Hong Kong. Int. J. Cancer 76:399–406.
- 9. Edwards RH, Sitki-Green D, Moore DT, Raab-Traub N. 2004. Potential selection of LMP1 variants in nasopharyngeal carcinoma. J. Virol. 78:868–881. doi:10.1128/JVI.78.2.868-881.2004
- 10. Zhang XS, Wang HH, Hu LF, Li A, Zhang RH, Mai HQ, Xia JC, Chen LZ, Zeng YX. 2004. V-val subtype of Epstein-Barr virus nuclear antigen 1 preferentially exists in biopsies of nasopharyngeal carcinoma. Cancer Lett. 211:11–18. doi:10.1016/j.canlet.2004.01.035
- 11. Farrell PJ. 2001. Epstein-Barr virus. The B95-8 strain map. Methods Mol. Biol. 174:3–12.
- 12. Baer R, Bankier AT, Biggin MD, Deininger PL, Farrell PJ, Gibson TJ, Hatfull G, Hudson GS, Satchwell SC, Seguin C, et al. 1984. DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature 310:207–211. doi:10.1038/310207a0
- 13. Hummel M, Kieff E. 1982. Epstein-Barr virus RNA. VIII. Viral RNA in permissively infected B95-8 cells. J. Virol. 43:262–272.
- 14. Weigel R, Miller G. 1983. Major EB virus-specific cytoplasmic transcripts in a cellular clone of the HR-1 Burkitt lymphoma line during latency and after induction of viral replicative cycle by phorbol esters. Virology 125:287–298. doi:10.1016/0042-6822(83)90202-7
- 15. de Jesus O, Smith PR, Spender LC, Elgueta Karstegl C, Niller HH, Huang D, Farrell PJ. 2003. Updated Epstein-Barr virus (EBV) DNA sequence and analysis of a promoter for the BART (CST, BARF0) RNAs of EBV. J. Gen. Virol. 84:1443–1450. doi:10.1099/vir.0.19054-0
- 16. Dolan A, Addison C, Gatherer D, Davison AJ, McGeoch DJ. 2006. The genome of Epstein-Barr virus type 2 strain AG876. Virology 350:164–170. doi:10.1016/j.virol.2006.01.015
- 17. Lin Z, Wang X, Strong MJ, Concha M, Baddoo M, Xu G, Baribault C, Fewell C, Hulme W, Hedges D, Taylor CM, Flemington EK. 2013. Whole-genome sequencing of the Akata and Mutu Epstein-Barr virus strains. J. Virol. 87:1172–1182. doi:10.1128/JVI.02517-12
- 18. Cheung ST, Huang DP, Hui AB, Lo KW, Ko CW, Tsang YS, Wong N, Whitney BM, Lee JC. 1999. Nasopharyngeal carcinoma cell line (C666-1) consistently harbouring Epstein-Barr virus. Int. J. Cancer 83:121–126.
- 19. Tso KK, Yip KY, Mak CK, Chung GT, Lee SD, Cheung ST, To KF, Lo KW. 2013. Complete genomic sequence of Epstein-Barr virus in nasopharyngeal carcinoma cell line C666-1. Infect. Agents Cancer 8:29. doi:10.1186/1750-9378-8-29
- 20. Lei H, Li T, Hung GC, Li B, Tsai S, Lo SC. 2013. Identification and characterization of EBV genomes in spontaneously immortalized human peripheral blood B lymphocytes by NGS technology. BMC Genomics 14:804. doi:10.1186/1471-2164-14-804
- 21. Zeng MS, Li DJ, Liu QL, Song LB, Li MZ, Zhang RH, Yu XJ, Wang HM, Ernberg I, Zeng YX. 2005. Genomic sequence analysis of Epstein-Barr virus strain GD1 from a nasopharyngeal carcinoma patient. J. Virol. 79:15323–15330. doi:10.1128/JVI.79.24.15323-15330.2005
- 22. Liu P, Fang X, Feng Z, Guo YM, Peng RJ, Liu T, Huang Z, Feng Y, Sun X, Xiong Z, Guo X, Pang SS, Wang B, Lv X, Feng FT, Li DJ, Chen LZ, Feng QS, Huang WL, Zeng MS, Bei JX, Zhang Y, Zeng YX. 2011. Direct sequencing and characterization of a clinical isolate of Epstein-Barr virus from nasopharyngeal carcinoma tissue using next-generation sequencing technology. J. Virol. 85:11291–11299. doi:10.1128/JVI.00823-11
- 23. Kwok H, Tong AH, Lin CH, Lok S, Farrell PJ, Kwong DL, Chiang AK. 2012. Genomic sequencing and comparative analysis of Epstein-Barr virus genome isolated from primary nasopharyngeal carcinoma biopsy. PLoS One 7:e36939. doi:10.1371/journal.pone.0036939
- 24. Miller G, Lipman M. 1973. Release of infectious Epstein-Barr virus by transformed marmoset leukocytes. Proc. Natl. Acad. Sci. U. S. A. 70:190–194. doi:10.1073/pnas.70.1.190
- 25. Depledge DP, Palser AL, Watson SJ, Lai IY, Gray ER, Grant P, Kanda RK, Leproust E, Kellam P, Breuer J. 2011. Specific capture and whole-genome sequencing of viruses from clinical samples. PLoS One 6:e27805. doi:10.1371/journal.pone.0027805
- 26. Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. doi:10.1093/bioinformatics/btp324
- 27. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. doi:10.1093/bioinformatics/btp352
- 28. Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18:821–829. doi:10.1101/gr.074492.107
- 29. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. 2011. Integrative genomics viewer. Nat. Biotechnol. 29:24–26. doi:10.1038/nbt.1754
- 30. Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, Weinstock GM, Wilson RK, Ding L. 2009. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25:2283–2285. doi:10.1093/bioinformatics/btp373
- 31. Katoh K, Asimenos G, Toh H. 2009. Multiple alignment of DNA sequences with MAFFT. Methods Mol. Biol. 537:39–64. doi:10.1007/978-1-59745-251-9_3
- 32. Clamp M, Cuff J, Searle SM, Barton GJ. 2004. The Jalview Java alignment editor. Bioinformatics 20:426–427. doi:10.1093/bioinformatics/btg430
- 33. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum-parsimony methods. Mol. Biol. Evol. 28:2731–2739. doi:10.1093/molbev/msr121
- 34. Martin DP, Lemey P, Lott M, Moulton V, Posada D, Lefeuvre P. 2010. RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics 26:2462–2463. doi:10.1093/bioinformatics/btq467
- 35. Lole KS, Bollinger RC, Paranjape RS, Gadkari D, Kulkarni SS, Novak NG, Ingersoll R, Sheppard HW, Ray SC. 1999. Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 73:152–160.
- 36. Tarbouriech N, Buisson M, Geoui T, Daenke S, Cusack S, Burmeister WP. 2006. Structural genomics of the Epstein-Barr virus. Acta Crystallogr. Sect. D Biol. Crystallogr. 62:1276–1285. doi:10.1107/S0907444906030034
- 37. Tsai MH, Raykova A, Klinke O, Bernhardt K, Gartner K, Leung CS, Geletneky K, Sertel S, Munz C, Feederle R, Delecluse HJ. 2013. Spontaneous lytic replication and epitheliotropism define an Epstein-Barr virus strain found in carcinomas. Cell Reports 5:458–470. doi:10.1016/j.celrep.2013.09.012
- 38. Long HM, Leese AM, Chagoury OL, Connerty SR, Quarcoopome J, Quinn LL, Shannon-Lowe C, Rickinson AB. 2011. Cytotoxic CD4+ T cell responses to EBV contrast with CD8 responses in breadth of lytic cycle antigen choice and in lytic cycle recognition. J. Immunol. 187:92–101. doi:10.4049/jimmunol.1100590
- 39. Sitompul LS, Widodo N, Djati MS, Utomo DH. 2012. Epitope mapping of gp350/220 conserved domain of Epstein-Barr virus to develop nasopharyngeal carcinoma (NPC) vaccine. Bioinformation 8:479–482. doi:10.6026/97320630008479
- 40. Hislop AD, Taylor GS, Sauce D, Rickinson AB. 2007. Cellular responses to viral infection in humans: lessons from Epstein-Barr virus. Annu. Rev. Immunol. 25:587–617. doi:10.1146/annurev.immunol.25.022106.141553
- 41. Whitehurst CB, Ning S, Bentz GL, Dufour F, Gershburg E, Shackelford J, Langelier Y, Pagano JS. 2009. The Epstein-Barr virus (EBV) deubiquitinating enzyme BPLF1 reduces EBV ribonucleotide reductase activity. J. Virol. 83:4345–4353. doi:10.1128/JVI.02195-08
- 42. Maruo S, Yang L, Takada K. 2001. Roles of Epstein-Barr virus glycoproteins gp350 and gp25 in the infection of human epithelial cells. J. Gen. Virol. 82:2373–2383.
- 43. Hutt-Fletcher LM. 2007. Epstein-Barr virus entry. J. Virol. 81:7825–7832. doi:10.1128/JVI.00445-07
- 44. Maruo S, Wu Y, Ito T, Kanda T, Kieff ED, Takada K. 2009. Epstein-Barr virus nuclear protein EBNA3C residues critical for maintaining lymphoblastoid cell growth. Proc. Natl. Acad. Sci. U. S. A. 106:4419–4424. doi:10.1073/pnas.0813134106
- 45. Young LS, Dawson CW, Clark D, Rupani H, Busson P, Tursz T, Johnson A, Rickinson AB. 1988. Epstein-Barr virus gene expression in nasopharyngeal carcinoma. J. Gen. Virol. 69(Pt 5):1051–1065.
- 46. Midgley RS, Bell AI, Yao QY, Croom-Carter D, Hislop AD, Whitney BM, Chan AT, Johnson PJ, Rickinson AB. 2003. HLA-A11-restricted epitope polymorphism among Epstein-Barr virus strains in the highly HLA-A11-positive Chinese population: incidence and immunogenicity of variant epitope sequences. J. Virol. 77:11507–11516. doi:10.1128/JVI.77.21.11507-11516.2003
- 47. Lin HJ, Cherng JM, Hung MS, Sayion Y, Lin JC. 2005. Functional assays of HLA A2-restricted epitope variant of latent membrane protein 1 (LMP-1) of Epstein-Barr virus in nasopharyngeal carcinoma of Southern China and Taiwan. J. Biomed. Sci. 12:925–936. doi:10.1007/s11373-005-9017-y
- 48. Sample J, Young L, Martin B, Chatman T, Kieff E, Rickinson A, Kieff E. 1990. Epstein-Barr virus types 1 and 2 differ in their EBNA-3A, EBNA-3B, and EBNA-3C genes. J. Virol. 64:4084–4092.
- 49. Kakoola DN, Sheldon J, Byabazaire N, Bowden RJ, Katongole-Mbidde E, Schulz TF, Davison AJ. 2001. Recombination in human herpesvirus-8 strains from Uganda and evolution of the K15 gene. J. Gen. Virol. 82:2393–2404.
- 50. Bowden R, Sakaoka H, Donnelly P, Ward R. 2004. High recombination rate in herpes simplex virus type 1 natural populations suggests significant coinfection. Infect. Genet. Evol. 4:115–123. doi:10.1016/j.meegid.2004.01.009
- 51. Faure-Della Corte M, Samot J, Garrigue I, Magnin N, Reigadas S, Couzi L, Dromer C, Velly JF, Dechanet-Merville J, Fleury HJ, Lafon ME. 2010. Variability and recombination of clinical human cytomegalovirus strains from transplantation recipients. J. Clin. Virol. 47:161–169. doi:10.1016/j.jcv.2009.11.023
- 52. Santpere G, Darre F, Blanco S, Alcami A, Villoslada P, Mar Alba M, Navarro A. 2014. Genome-wide analysis of wild-type Epstein-Barr virus genomes derived from healthy individuals of the 1000 Genomes Project. Genome Biol. Evol. 6:846–860. doi:10.1093/gbe/evu054
- 53. McGeoch DJ, Gatherer D. 2007. Lineage structures in the genome sequences of three Epstein-Barr virus strains. Virology 359:1–5. doi:10.1016/j.virol.2006.10.009