Journal of Clinical Microbiology. 2012 Mar;50(3):857–866. doi: 10.1128/JCM.05715-11

Use of Illumina Deep Sequencing Technology To Differentiate Hepatitis C Virus Variants

Masashi Ninomiya a, Yoshiyuki Ueno a, Ryo Funayama b, Takeshi Nagashima b, Yuichiro Nishida b, Yasuteru Kondo a, Jun Inoue a, Eiji Kakazu a, Osamu Kimura a, Keiko Nakayama b, Tooru Shimosegawa a
PMCID: PMC3295113  PMID: 22205816

Abstract

Hepatitis C virus (HCV) is a positive-strand enveloped RNA virus that shows diverse viral populations even in one individual. Though Sanger sequencing has been used to determine viral sequences, deep sequencing technologies are much faster and can perform large-scale sequencing. We demonstrate the successful use of Illumina deep sequencing technology and subsequent analyses to determine the genetic variants and amino acid substitutions in both treatment-naïve (patient 1) and treatment-experienced (patient 7) isolates from HCV-infected patients. As a result, almost the full nucleotide sequence of HCV was detectable for patients 1 and 7. The reads were mapped to the HCV reference sequence. The coverage was 99.8% and the average depth was 69.5× for patient 7, with values of 99.4% (coverage) and 51.1× (average depth) for patient 1. In patient 7, amino acid (aa) 70 in the core region showed arginine, with methionine at aa 91, by Sanger sequencing. Major variants showed the same amino acid sequence, but minor variants were detectable in 18% (6/34 sequences) of sequences, with replacement of methionine by leucine at aa 91. In NS3, 8 amino acid positions showed mixed variants (T72T/I, K213K/R, G237G/S, P264P/S/A, S297S/A, A358A/T, S457S/C, and I615I/M) in patient 7. In patient 1, 3 amino acid positions showed mixed variants (L14L/F/V, S61S/A, and I586I/T). In conclusion, deep sequencing technologies are powerful tools for obtaining more profound insight into the dynamics of variants in the HCV quasispecies in human serum.

INTRODUCTION

Hepatitis C virus (HCV) is a positive-strand enveloped RNA virus of approximately 9,600 nucleotide (nt) bases, consisting of a single open reading frame and two untranslated regions, and belongs to the genus Hepacivirus within the family Flaviviridae (6). The single open reading frame encodes a polyprotein of 3,011 amino acids (aa) that is cleaved by viral and cellular proteases into 10 different proteins. The three structural proteins, which constitute the viral particle, include the core protein and the envelope glycoproteins E1 and E2. Two regions in E2, known as hypervariable regions 1 and 2, are reported to have extreme sequence variability. The seven nonstructural components include the p7 polypeptide, the NS2-3 protease, the NS3 serine protease and RNA helicase, the NS4A polypeptide, the NS4B and NS5A proteins, and the NS5B RNA-dependent RNA polymerase (RdRp) (29). At both ends of the open reading frame lie the 5′- and 3′-untranslated regions (5′-UTR and 3′-UTR). The nucleotide sequence of the 5′-UTR is relatively well conserved among different HCV genotypes. The HCV 5′-UTR contains an internal ribosome entry site (IRES) that directs the cap-independent initiation of virus translation and forms four characteristic stem-loop structures (17, 18). HCV displays very high genetic variability both in populations and within infected individuals, where it exists as a cluster of closely related but distinct variants, termed “quasispecies,” as occurs in many other RNA viruses with a polymerase enzyme lacking proofreading ability (6, 8, 26).

Current standard treatment of chronic HCV infection is based on the combination of pegylated alpha interferon (peg-IFN-α) and ribavirin (RBV). However, patients with a high load of genotype 1b virus (>1 × 10^5 IU/ml) do not achieve high sustained virological response (SVR) rates (<50%), even when the most effective combination treatment (IFN plus RBV) is administered for 48 weeks (14, 25). Investigations of therapeutic prediction based on virological features revealed that substitutions of arginine by glutamine at amino acid (aa) 70 and/or of leucine by methionine at aa 91 in the core region are independent and significant factors associated with SVR. They also revealed that patients whose viruses have more than 4 amino acid changes relative to HCV-J in the NS5A interferon sensitivity-determining region (ISDR) (aa 2209 to 2248) (mutant type) have high responses to IFN therapy, whereas patients whose viruses have no amino acid changes (wild type) or 1 to 3 amino acid changes (intermediate type) have low responses (1, 2, 9, 10).

Recently, direct-acting antiviral (DAA) molecules active on HCV, such as NS3/4A protease inhibitors, nucleoside/nucleotide analogue inhibitors of RdRp, nonnucleoside inhibitors of RdRp, and NS5A inhibitors, have been developed. These DAA molecules, either alone or in combination with peg-IFN plus RBV, were recently described as showing large antiviral effects (15, 21). However, the next problem to consider is viral resistance to DAAs, which arises through the selection of viral variants carrying amino acid substitutions that alter the drug target and render the virus less susceptible to the drug's inhibitory activity (35). Additionally, drug-resistant variants already preexist as minor populations within a patient's quasispecies. Drug exposure intensively inhibits replication of the drug-sensitive viral population, and the resistant variants gradually predominate in the HCV population (7). In the future, analysis of the nucleotide or amino acid sequence of HCV will become important for determining the most appropriate treatment for HCV patients.

Initial attempts to identify the HCV genome sequence relied on Sanger sequencing and the use of PCR primers targeting relatively conserved regions, methods that would likely fail if the virus had more variants (32–34). In recent years, new technologies have been developed that are able to sequence viruses from environmental samples without using specific primers or cloning by recombinant DNA techniques and thus can obtain the sequence information for the complete virome in an unbiased way. Metagenomic approaches such as deep sequencing have proven increasingly successful at identifying variants or mutations of the nucleotide sequence (23, 42, 45).

Here we demonstrate the successful use of Illumina deep sequencing technology and subsequent analyses to determine the genetic variants and amino acid substitutions of both treatment-naïve (patient 1) and treatment-experienced (IFN) (patient 7) isolates of HCV without using specific HCV primers.

MATERIALS AND METHODS

Patients.

Two patients with chronic hepatitis C virus infection with genotype 1b virus and one healthy control were enrolled in this study. Each serum sample was collected before treatment with peg-IFN-α and RBV and was stored at −20°C until testing. In the laboratory data, the HCV load was 6.8 log IU/ml for patient 1, who was treatment naïve, and 7.0 log IU/ml for patient 7, who was treated with IFN-α2b and RBV for 6 months in 2002, but with no treatment effect (Cobas TaqMan HCV test; Roche Molecular Systems, Pleasanton, CA). More clinical information is described in Table 1.

Table 1.

Clinical data for patients enrolled in this study^a

| Patient (storage date of sample) | Sex | Age (yr) | Diagnosis | HCV RNA load (log IU/ml) | HCV genotype | Past treatment (period) | Therapeutic effect | Core aa 70 | Core aa 91 | No. of aa substitutions in NS5A ISDR |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 (October 2008) | Male | 43 | Chronic hepatitis C | 6.8 | 1b | None | NA | Wild type | Wild type | 0 |
| 7 (May 2010) | Male | 57 | Chronic hepatitis C | 7.0 | 1b | IFN-α2b plus RBV (March–September 2002) | Nontherapeutic | Wild type | Mutant | 0 |
| 9 (January 2011) | Female | 64 | Control | NA | NA | NA | NA | NA | NA | NA |

^a Abbreviations: aa, amino acid; IFN, interferon; NA, not applicable; RBV, ribavirin. All patients were negative for hepatitis B surface antigen (HBsAg) and hepatitis B surface antibody (HBsAb).

Sanger sequencing in the core region and NS5A ISDR.

Total RNA was extracted from 100-μl serum samples by use of a MagMAX viral RNA isolation kit (Ambion, Austin, TX), and the RNA preparation thus obtained was subjected to cDNA synthesis with reverse transcriptase (SuperScript III RNase H reverse transcriptase; Invitrogen) and to PCR amplification using Prime Star HS DNA polymerase (TaKaRa Bio, Shiga, Japan) with nested primers derived from the core region and the NS5A ISDR of the HCV genome. Nested PCR amplification of the core region of the HCV genome was carried out with primers C008 (sense; 5′-AAC CTC AAA GAA AAA CCA AAC G-3′) and C011 (antisense; 5′-CAT GGG GTA CAT YCC GCT YG-3′) in the first round, for 35 cycles (98°C for 10 s, 55°C for 15 s, and 72°C for 1 min, with an additional 7 min in the last cycle), and with primers C009 (sense; 5′-CCA CAG GAC GTY AAG TTC CC-3′) and C010 (antisense; 5′-AGG GTA TCG ATG ACC TTA CC-3′) in the second round, for 25 cycles. Nested primers derived from the NS5A ISDR of the HCV genome were designed to amplify a 188-bp product, using primers C004 (sense; 5′-ATG CCC ATG CCA GGT TCC AG-3′) and C005 (antisense; 5′-AGC TCC GCC AAG GCA GAA GA-3′) in the first round and primers C006 (sense; 5′-ACC GGA TGT GGC AGT GCT CA-3′) and C007 (antisense; 5′-GTA ATC CGG GCG TGC CCA TA-3′) in the second round. The PCR products were sequenced directly on both strands by use of a BigDye Terminator, version 3.1, cycle sequencing kit on an ABI Prism 3100 genetic analyzer (Applied Biosystems, Foster City, CA). Sequence analysis was performed using Genetyx-Mac ver. 12.2.6 (Genetyx Corp., Tokyo, Japan) and ODEN (version 1.1.1) from the DNA Data Bank of Japan (National Institute of Genetics, Mishima, Japan) (19).

Library preparation and Illumina sequencing.

Total RNA was extracted from 800 μl of serum by use of a MagMAX viral RNA isolation kit (Ambion) according to the manufacturer's protocol, with the slight modification that carrier RNA was not included. A library was prepared from approximately 200 ng of total RNA by use of an mRNA-seq sample prep kit (Illumina, San Diego, CA). The quality of the library was evaluated with Bioanalyzer (Agilent, Santa Clara, CA). Before deep sequencing, we confirmed the presence of the HCV genome in the libraries by conducting quantitative PCR with StepOnePlus (Applied Biosystems), using SYBR Ex Taq premix (TaKaRa, Shiga, Japan) and the specific primers C112 (sense; 5′-GCW GTS CAR TGG ATG AAC CG-3′) and C113 (antisense; 5′-GCT YTC MGG CAC RTA GTG CG-3′), derived from the 81-bp region encoding HCV NS4B, and then loaded each sample into two or three lanes of a flow cell. Libraries were clonally amplified on the flow cell and sequenced on an Illumina IIx genome analyzer (SCS 2.8 software; Illumina, San Diego, CA), with a 76-mer single end sequence. Image analysis and base calling were performed using RTA 1.8 software.

Analysis.

Seventy-six-mer single-end reads were classified by strict bar codes, split into individual reads, and stripped of any remaining primer sequences by using CLC Genomics Workbench (4.6) (http://www.clcbio.co.jp). Sequence reads aligned to the human genome by hg19 (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz), GenBank (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/mrna.fa.gz), RefSeq (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/refMrna.fa.gz), and Ensembl (ftp://ftp.ensembl.org/pub/release62/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.62.cdna.all.fa.gz and ftp://ftp.ensembl.org/pub/release62/fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh37.62.ncrna.fa.gz) were removed in the first mapping analysis of the human genome. Sequence reads not of human origin were aligned with 970 reference HCV sequences registered at the Hepatitis Virus Database server (http://s2as02.genes.nig.ac.jp/index.html) by use of BWA (0.5.9-r16), allowing mismatches within 5 to 10 nucleotide bases (24). The reads could be defined as being of HCV origin by identification with reference to the HCV sequences, allowing mismatches within 10 nucleotide bases. Duplicate reads were completely excluded to avoid sequence bias, using Samtools (0.1.16) (24). Additionally, the variants compared with HCV-J were identified by Samtools. The result of the analysis was displayed using Integrative Genomics Viewer (IGV; 2.0.3) (36).
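The mismatch limit and duplicate exclusion described above can be illustrated with a toy sketch. It is not how BWA or Samtools work internally. It assumes ungapped alignments and treats reads sharing a start position and strand as duplicates, keeping the first one seen. The names `Alignment`, `count_mismatches`, and `filter_and_deduplicate` are hypothetical and do not come from the study.

```python
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    """Hypothetical record for one ungapped read alignment."""
    read_id: str
    ref_start: int  # 0-based start position on the reference
    strand: str     # "+" or "-"
    read_seq: str   # read bases as aligned to the forward strand


def count_mismatches(read_seq: str, ref_seq: str, ref_start: int) -> Optional[int]:
    """Count mismatched bases; return None if the read overhangs the reference."""
    window = ref_seq[ref_start:ref_start + len(read_seq)]
    if ref_start < 0 or len(window) != len(read_seq):
        return None
    return sum(a != b for a, b in zip(read_seq, window))


def filter_and_deduplicate(alignments, ref_seq, max_mismatches=10):
    """Keep reads within the mismatch limit, one read per (start, strand)."""
    seen = set()
    kept = []
    for aln in alignments:
        mismatches = count_mismatches(aln.read_seq, ref_seq, aln.ref_start)
        if mismatches is None or mismatches > max_mismatches:
            continue
        key = (aln.ref_start, aln.strand)
        if key in seen:  # a read with the same start and strand was already kept
            continue
        seen.add(key)
        kept.append(aln)
    return kept
```

With `max_mismatches=10` the function mirrors the 10-base limit quoted in the text. A production pipeline would read these values from the aligner's output rather than recompute them.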

Ethics statement.

Written informed consent was obtained from each individual, and the study for detecting host genomes was approved by the Ethics Committee of the Tohoku University School of Medicine (2010-404).

RESULTS

Evaluation of the quality of the libraries.

We conducted deep sequencing analysis for two patients (patient 1 [treatment naïve] and patient 7 [treatment experienced, with IFN]) who had been infected with chronic hepatitis C virus of genotype 1b, as well as one healthy control (patient 9) (Table 1). Since there is only a small quantity of circulating RNAs, including those of viral origin, in serum, it was important to evaluate the quality of the libraries. The library for patient 7 showed good quality using an Agilent bioanalyzer, but the primer and adaptor dimers were mixed in the libraries of patients 1 and 9 (Fig. 1A). Before deep sequencing, we evaluated whether the HCV genome was included in the libraries, and the amplification plots for quantitative PCR showed that the HCV genome could be detected in the libraries from both patients 1 and 7 with the specific primers C112 and C113, derived from the NS4B region (Fig. 1B).

Fig 1.

Evaluation of the quality of libraries. (A) Bioanalyzer (Agilent) traces showing that the library was well refined for patient 7 but that primer and adaptor dimers were mixed in the libraries of patients 1 and 9. The horizontal axis shows the DNA size, and the vertical axis shows the quantity of DNA. The blue arrows indicate the desired products, and the red arrows indicate the primer or adaptor dimers. The peaks at 35 bp and 10,380 bp are the size markers. (B) Amplification plots for patients 1 (treatment naïve) and 7 (treatment experienced), showing the presence of the HCV genome in the libraries by quantitative PCR with StepOnePlus (Applied Biosystems).

Distribution of free RNA in human serum.

To characterize the metagenomics of HCV infection in humans, we analyzed the samples by single-end deep sequencing on three lanes for patient 7 and on two lanes for patient 1 and control patient 9, using an Illumina IIx genome analyzer. After trimming the reads to exclude ambiguous nucleotides, primers, or adaptor sequences, 96,079,465 high-quality 76-bp reads were subjected to analysis. Of these reads, a total of 86,868,619 could be aligned to human genomic DNA.

We then mapped the remaining 9,210,846 reads, including 1,453,419 reads for patient 1, 7,099,434 reads for patient 7, and 657,993 reads for patient 9 (Table 2).

Table 2.

Distribution of viral reads in human serum

No. (%) of reads per sample:

| Read category | Patient 1 (treatment naïve) | Patient 7 (treatment experienced) | Patient 9 (healthy control) |
|---|---|---|---|
| Total reads | 27,717,487 (100.00) | 94,151,356 (100.00) | 15,032,130 (100.00) |
| Adaptor and primer reads | 6,502,508 (23.46) | 28,605,006 (30.38) | 5,713,994 (38.01) |
| Modified total reads | 21,214,979 (76.54) | 65,546,350 (69.62) | 9,318,136 (61.99) |
| Reads of human origin | 19,761,560 (71.30) | 58,446,916 (62.08) | 8,660,143 (57.61) |
| Unknown reads | 1,453,419 (5.24) | 7,099,434 (7.54) | 657,993 (4.38) |
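The read accounting in Table 2 can be reproduced with a short sketch. The counts are copied from the table, but the dictionary layout and the function name `summarize` are illustrative assumptions. Modified total reads are taken as total reads minus adaptor and primer reads, and unknown reads as modified total reads minus reads of human origin.

```python
# Read counts copied from Table 2; the key names are hypothetical labels.
TABLE2 = {
    "patient 1": {"total": 27_717_487, "adaptor_primer": 6_502_508, "human": 19_761_560},
    "patient 7": {"total": 94_151_356, "adaptor_primer": 28_605_006, "human": 58_446_916},
    "patient 9": {"total": 15_032_130, "adaptor_primer": 5_713_994, "human": 8_660_143},
}


def summarize(counts):
    """Derive modified-total and unknown reads, plus percentages of the total."""
    total = counts["total"]
    modified = total - counts["adaptor_primer"]  # reads left after trimming
    unknown = modified - counts["human"]         # reads not aligned to human sequences
    return {
        "modified": modified,
        "unknown": unknown,
        "unknown_pct": round(100 * unknown / total, 2),
    }


summaries = {name: summarize(c) for name, c in TABLE2.items()}
total_modified = sum(s["modified"] for s in summaries.values())
total_unknown = sum(s["unknown"] for s in summaries.values())
```

Summed over the three samples, the derived values match the 96,079,465 high-quality reads and the 9,210,846 remaining reads quoted in the text.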

Mapping of the HCV genome sequence.

The reads were aligned to 970 HCV genome sequences by using BWA, allowing mismatches within 5 to 10 nucleotide bases. Accordingly, MD5-1 (GenBank accession no. AF165053) was expected to be the closest HCV strain to that in patient 1, and MD2-2 (GenBank accession no. AF165048) was expected to be the closest to that in patient 7. The reads obtained from healthy subject 9 were not aligned to MD5-1 or MD2-2, allowing mismatches within 10 nucleotide bases (see the supplemental material). Whereas some strains, for example, HC-J4, HCV-KT9, and HC-J6, could be mapped to the reads from healthy subject 9, all of the reads were aligned at the 3′-UTR of the U-rich region, and we could not evaluate whether they were of HCV origin. Therefore, we constructed the HCV genome sequences of patients 1 and 7 without the 3′-UTR. In this alignment, the duplicate reads were completely excluded. For patient 1, 6,303 reads were mapped to MD5-1, allowing for 10 mismatched nucleotide bases. The coverage was 99.4%, and the average depth was 51.1× (Fig. 2). For patient 7, 8,583 reads could be identified with MD2-2. The coverage was 99.8%, and the average sequencing depth was 69.5× (Fig. 2).
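Coverage and average depth can be computed from read start positions, as in the following minimal sketch. It is an illustration, not the study's pipeline. It assumes ungapped reads of fixed length and averages depth over the whole reference rather than over covered positions only. The function names are hypothetical.

```python
def coverage_and_depth(read_starts, read_length, ref_length):
    """Return (fraction of reference positions covered, mean depth)."""
    depth = [0] * ref_length
    for start in read_starts:
        # Add one to every reference position spanned by this read.
        for pos in range(max(start, 0), min(start + read_length, ref_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / ref_length, sum(depth) / ref_length


def implied_reference_length(n_reads, read_length, mean_depth):
    """Back-of-envelope reference length if every read lies fully on the reference."""
    return n_reads * read_length / mean_depth
```

As a rough consistency check under those assumptions, `implied_reference_length(6303, 76, 51.1)` and `implied_reference_length(8583, 76, 69.5)` both come out near 9,400 nt, which is plausible for an approximately 9,600-nt genome analyzed without the 3′-UTR.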

Fig 2.

Mapping to the HCV reference genome. For patient 1, 6,303 reads were mapped to MD5-1. The coverage was 99.4%, and the average depth was 51.1×. For patient 7, 8,583 reads were aligned to MD2-2. The coverage was 99.8%, and the average depth was 69.5×.

Notably, the only regions for which the genome sequence could not be obtained were 8 nt in the 5′-UTR and 52 nt in the core region for patient 1 and 18 nt in the E2 region for patient 7.

Amino acid substitutions in the core region and the NS5A ISDR.

To identify potential mutations at key sites in the genome that mediate the effect of IFN-based therapy, we compared the HCV genome obtained from patient 7 with HCV-J, which is known as the prototypical HCV 1b strain and whose complete genomic sequence has been determined (20). Previous studies reported that substitutions at aa 70 and/or 91 in the core region and three or fewer substitutions in the region of aa 2209 to 2248 (NS5A ISDR) might be associated with resistance to IFN-based therapy (1, 2, 9, 10). By Sanger sequencing, position 70 in the core region showed arginine and position 91 showed methionine. In deep sequencing, the major variants showed the same amino acid sequence, but minor variants in which methionine was replaced by leucine at aa 91 were detectable in 18% (6/34) of sequences (Fig. 3A). In the NS5A ISDR, no substitution was indicated for the major variants, in agreement with direct Sanger sequencing, but 16% (14/89) of sequences showed a minor variant in which aspartic acid was replaced by valine at aa 2220 (Fig. 3B). We confirmed that variants detected at frequencies of more than 10% were valid. For patient 1, core aa 70 (arginine), core aa 91 (leucine), and the number of NS5A ISDR mutations (zero) were the same as those by Sanger sequencing.
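The minor-variant percentages quoted above follow from simple counting of the residues observed at one position across the mapped reads. The sketch below is illustrative only. It assumes that the reads not carrying the minor residue all carry the major one (28 methionine and 6 leucine at core aa 91; 75 aspartic acid and 14 valine at aa 2220), and the function names and the 10% default threshold parameter are hypothetical.

```python
from collections import Counter


def variant_frequencies(observed_residues):
    """Map each residue to (count, percentage) at a single position."""
    counts = Counter(observed_residues)
    total = sum(counts.values())
    return {aa: (n, 100 * n / total) for aa, n in counts.most_common()}


def minor_variants(observed_residues, threshold_pct=10.0):
    """Return non-major residues whose frequency exceeds the threshold."""
    freqs = variant_frequencies(observed_residues)
    major = next(iter(freqs))  # most_common() puts the major residue first
    return {aa: pct for aa, (n, pct) in freqs.items()
            if aa != major and pct > threshold_pct}


core_91 = variant_frequencies("M" * 28 + "L" * 6)   # assumed split of the 34 reads
isdr_2220 = variant_frequencies("D" * 75 + "V" * 14)  # assumed split of the 89 reads
```

Under these assumptions, leucine at core aa 91 accounts for 6 of 34 reads (about 18%) and valine at aa 2220 for 14 of 89 reads (about 16%).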

Fig 3.

Amino acid substitutions of aa 70 and 91 in the core region (A) and aa 2209 to 2248 of the NS5A ISDR (B). Mutations in these regions were reported to affect the outcome of IFN-based therapies for chronic hepatitis C patients. The lower two lines show the nucleotide sequence and amino acid sequence of HCV-J. Amino acid abbreviations: F, phenylalanine; S, serine; Y, tyrosine; C, cysteine; W, tryptophan; L, leucine; P, proline; H, histidine; Q, glutamine; R, arginine; I, isoleucine; M, methionine; T, threonine; N, asparagine; K, lysine; V, valine; A, alanine; D, aspartic acid; E, glutamic acid; G, glycine.

Amino acid substitutions in NS3 and NS5B.

The recent development of DAA molecules, such as protease inhibitors and polymerase inhibitors, has raised the concern that resistance may weaken the effects of DAA-based therapy (35). It is necessary to obtain the amino acid sequences of NS3 or NS5B variants bearing substitutions which alter the target of the drug. In NS3 in the virus from patient 7, 26 amino acids were changed in comparison with the prototype strain HCV-J. Eight amino acids showed mixed variants, with T72T/I (57 variants [73%]/21 variants [27%]), K213K/R (18 variants [26%]/52 variants [74%]), G237G/S (41 variants [46%]/49 variants [54%]), P264P/S/A (20 variants [42%]/19 variants [40%]/9 variants [19%]), S297S/A (63 variants [81%]/15 variants [19%]), A358A/T (20 variants [21%]/75 variants [79%]), S457S/C (56 variants [64%]/32 variants [36%]), and I615I/M (42 variants [46%]/49 variants [54%]) substitutions (Table 3). For NS5B, the full amino acid sequence was observed, and 31 amino acids were altered in comparison with HCV-J. No mixed variants were seen (Table 3). With patient 1, full amino acid sequences were detected for NS3 and NS5B. Compared with HCV-J, 31 amino acids were altered in NS3, and 3 amino acids showed mixed variants, with L14L/F/V (67 variants [89%]/6 variants [8%]/2 variants [3%]), S61S/A (48 variants [71%]/20 variants [29%]), and I586I/T (4 variants [12%]/30 variants [88%]) substitutions (Table 3). In NS5B, 31 amino acids were altered (Table 3). Note that minor variants detected at frequencies of more than 10% were confirmed to be valid.

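Table 3.

Amino acid substitutions compared with HCV-J in NS3 and NS5B from viruses of patients 1 and 7

| Protein and patient | Nucleotide position | Amino acid position | Prototype amino acid | Nucleotide sequence^a | Amino acid substitution^b |
|---|---|---|---|---|---|
| NS3, patient 1 | 3426 | 7 | S | gCC | A |
| | 3447 | 14 | L | (C/t/g)TT | L/F/V (67 [89]/6 [8]/2 [3]) |
| | 3495 | 30 | D | GAg | E |
| | 3513 | 36 | L | gTt | V |
| | 3588 | 61 | S | (T/g)CG | S/A (48 [71]/20 [29]) |
| | 3591 | 62 | K | AgG | R |
| | 3618 | 71 | I | gTC | V |
| | 3645 | 80 | Q | CtG | L |
| | 3663 | 86 | P | CaG | Q |
| | 3747 | 114 | V | aTc | I |
| | 3801 | 132 | I | gTC | V |
| | 3855 | 150 | V | GcT | A |
| | 3915 | 170 | I | gT(A/g) | V |
| | 4149 | 248 | I | gTC | V |
| | 4152 | 249 | E | GAc | D |
| | 4191 | 262 | G | aGC | S |
| | 4194 | 263 | G | Gct | A |
| | 4218 | 271 | C | gGC | G |
| | 4302 | 299 | T | tCC | S |
| | 4392 | 329 | I | gTC | V |
| | 4479 | 358 | A | aaC | N |
| | 4551 | 382 | T | tCg | S |
| | 4554 | 383 | G | GcC | A |
| | 4563 | 386 | L | gTC | V |
| | 4659 | 418 | F | TaT | Y |
| | 4758 | 451 | L | gTG | V |
| | 4782 | 459 | A | tCG | S |
| | 4815 | 470 | S | gGg | G |
| | 4935 | 510 | S | aCa | T |
| | 5007 | 534 | S | gGC | G |
| | 5076 | 557 | L | tTC | F |
| | 5163 | 586 | I | A(T/c)A | I/T (4 [12]/30 [88]) |
| | 5232 | 609 | V | aTC | I |
| | 5268 | 621 | A | aCA | T |
| NS3, patient 7 | 3495 | 30 | D | GAg | E |
| | 3513 | 36 | L | gTt | V |
| | 3531 | 42 | S | aCT | T |
| | 3549 | 48 | V | aTC | I |
| | 3621 | 72 | T | A(C/t)C | T/I (57 [73]/21 [27]) |
| | 3663 | 86 | P | CaG | Q |
| | 3672 | 89 | P | tCC | S |
| | 3687 | 94 | M | tTG | L |
| | 3747 | 114 | V | aTc | I |
| | 3801 | 132 | I | gTC | V |
| | 3855 | 150 | V | GcT | A |
| | 3915 | 170 | I | gTA | V |
| | 4044 | 213 | K | A(A/g)g | K/R (18 [26]/52 [74]) |
| | 4116 | 237 | G | (G/a)G(C/t) | G/S (41 [46]/49 [54]) |
| | 4149 | 248 | I | gTt | V |
| | 4152 | 249 | E | Gac | D |
| | 4194 | 263 | G | Gct | A |
| | 4197 | 264 | P | (C/t/g)CC | P/S/A (20 [42]/19 [40]/9 [19]) |
| | 4218 | 271 | C | gGC | G |
| | 4296 | 297 | S | (T/g)CG | S/A (63 [81]/15 [19]) |
| | 4302 | 299 | T | tCC | S |
| | 4479 | 358 | A | (G/a)CC | A/T (20 [21]/74 [79]) |
| | 4542 | 379 | A | tCA | S |
| | 4551 | 382 | T | tCA | S |
| | 4554 | 383 | G | acC | T |
| | 4563 | 386 | L | aTC | I |
| | 4659 | 418 | F | TaT | Y |
| | 4758 | 451 | L | gTG | V |
| | 4776 | 457 | S | T(C/g)(G/t) | S/C (56 [64]/32 [36]) |
| | 4782 | 459 | A | tCg/a | S |
| | 4815 | 470 | S | gcg | A |
| | 5076 | 557 | L | tTC | F |
| | 5172 | 589 | K | AgG | R |
| | 5250 | 615 | I | AT(A/g) | I/M (42 [46]/49 [54]) |
| | 5259 | 618 | Y | TtC | F |
| NS5B, patient 1 | 7659 | 25 | P | gCG | A |
| | 7689 | 35 | S | Aac | N |
| | 7701 | 39 | S | gCC | A |
| | 7725 | 47 | L | CaG | Q |
| | 7827 | 81 | R | Aaa | K |
| | 7839 | 85 | I | gTA | V |
| | 7878 | 98 | K | AgA | R |
| | 7914 | 110 | S | AaC | N |
| | 7932 | 116 | V | aTt | I |
| | 7944 | 120 | R | CaC | H |
| | 7956 | 124 | E | aAG | K |
| | 7989 | 135 | D | aAc | N |
| | 8025 | 147 | V | aTt | I |
| | 8151 | 189 | P | tCC | S |
| | 8205 | 207 | T | gCC | A |
| | 8223 | 213 | C | aaC | N |
| | 8238 | 218 | S | gCA | A |
| | 8289 | 235 | T | gtT | V |
| | 8370 | 262 | V | aTt | I |
| | 8484 | 300 | T | tCT | S |
| | 8532 | 316 | N | tgC | C |
| | 8589 | 335 | A | aaC | N |
| | 8598 | 338 | A | GtC | V |
| | 8784 | 400 | V | GcT | A |
| | 8937 | 451 | C | acT | T |
| | 8976 | 464 | E | cAg | Q |
| | 9153 | 523 | K | Aga | R |
| | 9177 | 531 | K | AgG | R |
| | 9252 | 556 | N | AgC | S |
| | 9276 | 564 | L | gTG | V |
| | 9306 | 574 | L | TgG | W |
| NS5B, patient 7 | 7599 | 5 | T | tCA | S |
| | 7689 | 35 | S | Aa(c/t) | N |
| | 7701 | 39 | S | gCC | A |
| | 7725 | 47 | L | Caa | Q |
| | 7827 | 81 | R | Aaa | K |
| | 7839 | 85 | I | gTg | V |
| | 7878 | 98 | K | AgA | R |
| | 7914 | 110 | S | AaC | N |
| | 7926 | 114 | R | Aaa | K |
| | 7956 | 124 | E | aAG | K |
| | 8151 | 189 | P | tCC | S |
| | 8205 | 207 | T | gCC | A |
| | 8223 | 213 | C | acC | T |
| | 8238 | 218 | S | gCA | A |
| | 8277 | 231 | N | AgT | S |
| | 8289 | 235 | T | gtT | V |
| | 8298 | 238 | S | gCA | A |
| | 8370 | 262 | V | aTC | I |
| | 8484 | 300 | T | tCT | S |
| | 8532 | 316 | N | tgC | C |
| | 8589 | 335 | A | agC | S |
| | 8784 | 400 | V | GcT | A |
| | 8937 | 451 | C | acT | T |
| | 8940 | 452 | Y | cAC | H |
| | 8946 | 454 | I | gTT | V |
| | 8976 | 464 | E | cAA | Q |
| | 9177 | 531 | K | AgG | R |
| | 9213 | 543 | S | TtC | F |
| | 9231 | 549 | G | aaC | N |
| | 9252 | 556 | N | AgC | S |
| | 9306 | 574 | L | TgG | W |

^a Uppercase letters indicate prototype nucleotides, and lowercase letters indicate mutations.

^b The numbers of variants are shown in parentheses, with percentages in brackets.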
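The codon notation in Table 3 can be translated with the standard genetic code, and nucleotide positions can be converted to amino acid positions, as in the sketch below. The start coordinates 3408 (NS3) and 7587 (NS5B) are not stated in the article. They are inferred from table rows (nt 3426 is NS3 aa 7; nt 7659 is NS5B aa 25) and should be treated as assumptions, as should all function names.

```python
import re
from itertools import product

# Standard genetic code with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Inferred first-codon coordinates in HCV-J numbering (assumptions, see lead-in).
NS3_START = 3408
NS5B_START = 7587


def expand_codon(notation):
    """Expand e.g. "(C/t/g)TT" into ["CTT", "TTT", "GTT"]; case is ignored."""
    parts = re.findall(r"\(([^)]+)\)|([A-Za-z])", notation)
    options = [group.split("/") if group else [single] for group, single in parts]
    # With two mixed positions this lists every combination, including
    # combinations that may never occur together on a single read.
    return ["".join(combo).upper() for combo in product(*options)]


def translate(notation):
    """Return the distinct amino acids encoded by the notation, in order."""
    return list(dict.fromkeys(CODON_TABLE[c] for c in expand_codon(notation)))


def aa_position(nt_position, protein_start):
    """Convert the first-base coordinate of a codon to a 1-based aa position."""
    return (nt_position - protein_start) // 3 + 1
```

For example, `translate("(C/t/g)TT")` gives leucine, phenylalanine, and valine, which matches the L/F/V entry at NS3 aa 14, and `aa_position(3447, NS3_START)` returns 14.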

DISCUSSION

In this study, we attempted to detect the HCV genome directly in human serum without using specific primers and succeeded in determining nearly the full genome sequence and confirming its high genetic diversity by using Illumina deep sequencing. HCV has already been reported to be a highly variable virus with a quasispecies distribution, large viral populations, and very rapid turnover in individual patients (6, 26). Previous studies using metagenomic sequencing of other viruses from human clinical samples mostly employed pyrosequencing (11, 12, 23, 30, 46). The longer reads from pyrosequencing (250 to 450 bp) facilitate the assembly of individual reads into contigs, which in turn aids the classification of the sequence data by homology-based BLAST alignment. In contrast to metagenomic analysis using pyrosequencing, Illumina short-read sequencing enables a greater depth (by an order of magnitude), which is reflected in a very low detection limit. A recent report revealed that viral transcripts could be found at frequencies of <1 in 1,000,000 (28). However, because the reads are short, de novo assembly without any reference is difficult to conduct, so the method is not well suited to discovering an unknown viral genome. Nevertheless, it seems quite useful for resequencing or for detecting variants of known viruses for which abundant nucleotide sequence data have already been reported.

We defined the Illumina 76-mer reads as being of HCV origin by relying on the 970 HCV genome sequences in this study. HCV shows considerable genetic diversity and has been classified into 11 genogroups or 6 groups on the basis of nucleotide divergence in the core, E1, or NS5B region. Only about 75% similarity was shown in the variable region, even within the same group (38, 39, 44). The reads were aligned to each of the 970 HCV sequences, allowing mismatches within 10 nucleotide bases. Under these conditions, no reads from patient 9, who had not been infected with HCV, could be mapped to HCV isolates outside the 3′-UTR. Because the 3′-UTR has a U-rich region, reads aligned there cannot be assigned specifically to HCV. Therefore, we aligned the reads to the full HCV sequence without the 3′-UTR for patients 1 and 7.

Many variants with different nucleotide bases, known as quasispecies, were detected in both patients 1 and 7. Looking closely at the positions in NS3 with mixed amino acid variants, patient 7 showed 8 such positions, while patient 1 showed only 3. Recent studies reported that HCV genomic sequences in treatment relapsers displayed significantly more mutations than those in nonresponders. HCV sequence analysis of a 4-year post-antiviral-therapy follow-up revealed that the vast majority of mutations selected during the therapy phase were maintained in the relapsers, while very few new mutations arose during the 4-year posttherapy span (5, 47). Based on the studies mentioned above, treatment with IFN may lead to the emergence of mutations. Deep sequencing is considered a useful tool for detecting viral variants and determining the mutational rate without cloning.

Although only a few reads of HCV origin were obtained from serum, metagenomic analysis could be conducted with the enormous data sets generated by deep sequencing. Consequently, almost the full genome sequence of HCV was determined by computational analysis of sequential alignments of individual reads, with average depths of 51.1× and 69.5×. However, the regions for which we could not obtain the sequence were 8 nt in the 5′-UTR and 52 nt in the core region for patient 1 and 18 nt in the E2 region for patient 7. Since the 5′-UTR forms characteristic stem-loop structures, Sanger sequencing of this region is generally difficult (31). Similarly, even Illumina deep sequencing appears to be difficult. Since the E2 region, known as a hypervariable region, is reported to have extreme sequence variability, it was predicted that there would be too many mismatches with the reference HCV genome and that the reads could not be mapped by this analysis. Analysis of this hypervariable region will require further work.

When the libraries from patients 1 and 7 were compared, that of patient 7 was well refined; hence, the quality of the library is important for obtaining large numbers of the expected reads. However, many duplicate reads were found for patient 7, so two lanes of Illumina sequencing are considered sufficient to analyze the full sequence of HCV.

Amino acid substitutions of core aa 70 and 91 and within the NS5A ISDR, as well as genetic polymorphisms in the host IL28B gene, encoding IFN-λ-3 on chromosome 19, affect the outcome of interferon-based therapies for chronic hepatitis C patients (1, 2, 9, 10, 16, 40, 43). If only the amino acid substitutions of the major variant are considered, the therapeutic effect cannot be predicted accurately. The proportion of minor variants may change the therapeutic effect, and minor variants cannot be detected by direct Sanger sequencing alone. In fact, patient 7 showed a methionine at aa 91 in the core region and no substitution in the NS5A ISDR by direct Sanger sequencing, but deep sequencing indicated minor variants (at aa 91 in the core region [18% of sequences] and in the NS5A ISDR [16% of sequences]).

Recently, DAA molecules have been developed for HCV therapies, and these drugs may lead to the selection of resistant viruses if administered alone. The first-generation NS3/4A inhibitors are telaprevir and boceprevir, and most of the reported clinical data on drug resistance were obtained from patients treated with telaprevir. As an illustration, based on in vitro studies, telaprevir resistance related to amino acid substitutions V36A/M/C, T54A/S, R155K/T/Q, A155V/T, and A156T has been reported (4, 22, 37). Substitutions that generated boceprevir resistance included those detected in a patient treated with telaprevir, plus V170A/T and V55A substitutions (13, 41). For the nonnucleoside inhibitors of RdRp, which inhibit the RdRp enzyme encoded by NS5B, few in vivo resistance data have been reported (27), because studies of antiviral efficacy are generally limited to 3 to 5 days. Nevertheless, the S282T substitution, for example, has been reported to confer a loss of in vitro sensitivity to nucleoside/nucleotide analogue inhibitors (3). The triple combination of peg-IFN-α, RBV, and a protease inhibitor or several combinations of DAAs will soon become the standard therapy for treatment-naïve and treatment-experienced patients with HCV genotype 1. In this study, neither patient 1 nor patient 7 showed drug-resistant variants, but identifying resistant or minor variants prior to treatment would be very important for the selection of therapy. Additionally, when treatment has failed, it is necessary to consider viral factors as a cause.

In conclusion, deep sequencing technologies are a powerful tool for obtaining more profound insight into the dynamics of variants in the HCV quasispecies in human serum. Although the cost of deep sequencing is still much greater than the reagent costs for Sanger sequencing, it is still attractive in clinical medicine because deep sequencing is able to generate much more information on the viral genome sequences in internal organs. The cost will decrease in the future as the technology of deep sequencing develops. As DAA combination treatment of HCV infection is developed, obtaining sequence information on variants in individual cases by use of deep sequencing will be feasible for determining optimal antiviral treatment.

ACKNOWLEDGMENTS

We thank M. Tsuda, N. Koshita, and K. Kuroda for technical assistance. We also acknowledge the support of the Biomedical Research Core of the Tohoku University Graduate School of Medicine.

This study was supported in part by a Grant-in-Aid for Young Scientists (B) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan (assignment no. 22790627) and by grants from the Ministry of Health, Labor, and Welfare of Japan.

Footnotes

Published ahead of print 28 December 2011

Supplemental material for this article may be found at http://jcm.asm.org/.

REFERENCES

  • 1. Akuta N, et al. 2007. Prediction of response to pegylated interferon and ribavirin in hepatitis C by polymorphisms in the viral core protein and very early dynamics of viremia. Intervirology 50:361–368
  • 2. Akuta N, et al. 2005. Association of amino acid substitution pattern in core protein of hepatitis C virus genotype 1b high viral load and non-virological response to interferon-ribavirin combination therapy. Intervirology 48:372–380
  • 3. Ali S, et al. 2008. Selected replicon variants with low-level in vitro resistance to the hepatitis C virus NS5B polymerase inhibitor PSI-6130 lack cross-resistance with R1479. Antimicrob. Agents Chemother. 52:4356–4369
  • 4. Barbotte L, et al. 2010. Characterization of V36C, a novel amino acid substitution conferring hepatitis C virus (HCV) resistance to telaprevir, a potent peptidomimetic inhibitor of HCV protease. Antimicrob. Agents Chemother. 54:2681–2683
  • 5. Cannon NA, Donlin MJ, Fan X, Aurora R, Tavis JE. 2008. Hepatitis C virus diversity and evolution in the full open-reading frame during antiviral therapy. PLoS One 3:e2123
  • 6. Choo QL, et al. 1991. Genetic organization and diversity of the hepatitis C virus. Proc. Natl. Acad. Sci. U. S. A. 88:2451–2455
  • 7. Clavel F, Hance AJ. 2004. HIV drug resistance. N. Engl. J. Med. 350:1023–1035
  • 8. Domingo E, et al. 1985. The quasispecies (extremely heterogeneous) nature of viral RNA genome populations: biological relevance—a review. Gene 40:1–8
  • 9. Enomoto N, et al. 1995. Comparison of full-length sequences of interferon-sensitive and resistant hepatitis C virus 1b. Sensitivity to interferon is conferred by amino acid substitutions in the NS5A region. J. Clin. Invest. 96:224–230
  • 10. Enomoto N, et al. 1996. Mutations in the nonstructural protein 5A gene and response to interferon in patients with chronic hepatitis C virus 1b infection. N. Engl. J. Med. 334:77–81
  • 11. Feng H, Shuda M, Chang Y, Moore PS. 2008. Clonal integration of a polyomavirus in human Merkel cell carcinoma. Science 319:1096–1100
  • 12. Finkbeiner SR, et al. 2009. Identification of a novel astrovirus (astrovirus VA1) associated with an outbreak of acute gastroenteritis. J. Virol. 83:10836–10839
  • 13. Flint M, et al. 2009. Selection and characterization of hepatitis C virus replicons dually resistant to the polymerase and protease inhibitors HCV-796 and boceprevir (SCH 503034). Antimicrob. Agents Chemother. 53:401–411
  • 14. Fried MW, et al. 2002. Peginterferon alfa-2a plus ribavirin for chronic hepatitis C virus infection. N. Engl. J. Med. 347:975–982
  • 15. Gao M, et al. 2010. Chemical genetics strategy identifies an HCV NS5A inhibitor with a potent clinical effect. Nature 465:96–100
  • 16. Ge D, et al. 2009. Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance. Nature 461:399–401
  • 17. Honda M, Beard MR, Ping LH, Lemon SM. 1999. A phylogenetically conserved stem-loop structure at the 5′ border of the internal ribosome entry site of hepatitis C virus is required for cap-independent viral translation. J. Virol. 73:1165–1174
  • 18. Honda M, et al. 1996. Structural requirements for initiation of translation by internal ribosome entry within genome-length hepatitis C virus RNA. Virology 222:31–42
  • 19. Ina Y. 1994. ODEN: a program package for molecular evolutionary analysis and database search of DNA and amino acid sequences. Comput. Appl. Biosci. 10:11–12
  • 20. Kato N, et al. 1990. Molecular cloning of the human hepatitis C virus genome from Japanese patients with non-A, non-B hepatitis. Proc. Natl. Acad. Sci. U. S. A. 87:9524–9528
  • 21. Kieffer TL, et al. 2007. Telaprevir and pegylated interferon-alpha-2a inhibit wild-type and resistant genotype 1 hepatitis C virus replication in patients. Hepatology 46:631–639
  • 22. Kwong AD, McNair L, Jacobson I, George S. 2008. Recent progress in the development of selected hepatitis C virus NS3.4A protease and NS5B polymerase inhibitors. Curr. Opin. Pharmacol. 8:522–531
  • 23. Lataillade M, et al. 2010. Prevalence and clinical significance of HIV drug resistance mutations by ultra-deep sequencing in antiretroviral-naive subjects in the CASTLE study. PLoS One 5:e10952
  • 24. Li H, et al. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
  • 25. Manns MP, et al. 2001. Peginterferon alfa-2b plus ribavirin compared with interferon alfa-2b plus ribavirin for initial treatment of chronic hepatitis C: a randomised trial. Lancet 358:958–965
  • 26. Martell M, et al. 1992. Hepatitis C virus (HCV) circulates as a population of different but closely related genomes: quasispecies nature of HCV genome distribution. J. Virol. 66:3225–3229
  • 27. McCown MF, et al. 2008. The hepatitis C virus replicon presents a higher barrier to resistance to nucleoside analogs than to nonnucleoside polymerase or protease inhibitors. Antimicrob. Agents Chemother. 52:1604–1612
  • 28. Moore RA, et al. 2011. The sensitivity of massively parallel sequencing for detecting candidate infectious agents associated with human tissue. PLoS One 6:e19838
  • 29. Moradpour D, Penin F, Rice CM. 2007. Replication of hepatitis C virus. Nat. Rev. Microbiol. 5:453–463
  • 30. Nakamura S, et al. 2009. Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach. PLoS One 4:e4219
  • 31. Ninomiya M, Takahashi M, Shimosegawa T, Okamoto H. 2007. Analysis of the entire genomes of fifteen torque teno midi virus variants classifiable into a third group of genus Anellovirus. Arch. Virol. 152:1961–1975
  • 32. Okamoto H, et al. 1992. Genetic drift of hepatitis C virus during an 8.2-year infection in a chimpanzee: variability and stability. Virology 190:894–899
  • 33. Okamoto H, et al. 1992. Full-length sequence of a hepatitis C virus genome having poor homology to reported isolates: comparative study of four distinct genotypes. Virology 188:331–341
  • 34. Okamoto H, et al. 1993. Characterization of the genomic sequence of type V (or 3a) hepatitis C virus isolates and PCR primers for specific detection. J. Gen. Virol. 74:2385–2390
  • 35. Pawlotsky JM. 2011. Treatment failure and resistance with direct-acting antiviral drugs against hepatitis C virus. Hepatology 53:1742–1751
  • 36. Robinson JT, et al. 2011. Integrative genomics viewer. Nat. Biotechnol. 29:24–26
  • 37. Sarrazin C, et al. 2007. Dynamic hepatitis C virus genotypic and phenotypic changes in patients treated with the protease inhibitor telaprevir. Gastroenterology 132:1767–1777
  • 38. Simmonds P, et al. 1994. Identification of genotypes of hepatitis C virus by sequence comparisons in the core, E1 and NS-5 regions. J. Gen. Virol. 75:1053–1061
  • 39. Smith DB, et al. 1997. The origin of hepatitis C virus genotypes. J. Gen. Virol. 78:321–328
  • 40. Suppiah V, et al. 2009. IL28B is associated with response to chronic hepatitis C interferon-α and ribavirin therapy. Nat. Genet. 41:1100–1104
  • 41. Susser S, et al. 2009. Characterization of resistance to the protease inhibitor boceprevir in hepatitis C virus-infected patients. Hepatology 50:1709–1718
  • 42. Szpara ML, Parsons L, Enquist LW. 2010. Sequence variability in clinical and laboratory isolates of herpes simplex virus 1 reveals new mutations. J. Virol. 84:5303–5313
  • 43. Tanaka Y, et al. 2009. Genome-wide association of IL28B with response to pegylated interferon-α and ribavirin therapy for chronic hepatitis C. Nat. Genet. 41:1105–1109
  • 44. Tokita H, et al. 1996. Hepatitis C virus variants from Jakarta, Indonesia classifiable into novel genotypes in the second (2e and 2f), tenth (10a) and eleventh (11a) genetic groups. J. Gen. Virol. 77:293–301
  • 45. Verbinnen T, et al. 2010. Tracking the evolution of multiple in vitro hepatitis C virus replicon variants under protease inhibitor selection pressure by 454 deep sequencing. J. Virol. 84:11124–11133
  • 46. Victoria JG, et al. 2009. Metagenomic analyses of viruses in stool samples from children with acute flaccid paralysis. J. Virol. 83:4642–4651
  • 47. Xiang X, et al. 2011. Viral sequence evolution in Chinese genotype 1b chronic hepatitis C patients experiencing unsuccessful interferon treatment. Infect. Genet. Evol. 11:382–390

Articles from Journal of Clinical Microbiology are provided here courtesy of American Society for Microbiology (ASM)