Journal of Clinical Microbiology. 2013 Feb;51(2):444–451. doi: 10.1128/JCM.00739-12

Factors Influencing the Sensitivity and Specificity of Conventional Sequencing in Human Immunodeficiency Virus Type 1 Tropism Testing

David J H F Knapp, Rachel A McGovern, Winnie Dong, Art F Y Poon, Luke C Swenson, Xiaoyin Zhong, Conan K Woods, P Richard Harrigan
PMCID: PMC3553919  PMID: 23175258

Abstract

The V3 loop sequence of human immunodeficiency virus type 1 (HIV-1) can be used to infer viral coreceptor use. The effect of input copy number on population-based sequencing of the HIV-1 V3 loop was examined through replicate deep and population-based sequencing of samples with known tropism, of a heterogeneous clinical sample (624 population-based sequences and 47 deep-sequencing replicates), and of a large cohort of clinical samples from phase III clinical trials of maraviroc, including the MOTIVATE/A4001029 studies (n = 1,521). Proviral DNA from two independent samples from each of 101 patients from the MOTIVATE/A4001029 studies was also analyzed. Cumulative technical error occurred at a rate of 3 × 10⁻⁴ mismatches/bp, with no observed effect on inferred tropism. Increasing PCR replication increased minority species detection: an ∼10% minority population was detected in 18% of cases using a single replicate at a viral load of 1,072 copies/ml and in 44% of cases using three replicates. Nucleotide prevalences measured by population-based and deep sequencing were highly correlated (Spearman's ρ, 0.73), and accuracy increased with increasing input copy number (P < 0.001). Triplicate sequencing predicted tropism changes in the MOTIVATE/A4001029 studies at both low (P = 0.05) and high (P = 0.02) viral loads. Sequences derived from independently extracted and processed proviral DNA samples from the same patient were as similar to each other as replicates from the same extraction (P = 0.45) and had correlated position-specific scoring matrix (PSSM) scores (Spearman's ρ, 0.75; P ≪ 0.001); however, concordance in tropism inference was only 83%. Input copy number and PCR replication are important factors in minority species detection in samples with significant heterogeneity.

INTRODUCTION

Human immunodeficiency virus type 1 (HIV-1) requires one of two primary coreceptors, CCR5 or CXCR4 (1), to mediate viral entry. HIV-1 can be CCR5-using (R5) or CXCR4-using (X4). In addition, the virus may use both coreceptors, or the population may be composed of both R5 and X4 viruses (dual/mixed [DM]) (1, 2). Coreceptor usage can change over the course of infection (3, 4). R5 virus is associated with early infection, whereas X4 virus has been linked to later stages of infection, faster disease progression, and poor clinical outcome (3, 5). The recently approved drug maraviroc (ViiV Healthcare) inhibits viral entry by selectively binding the CCR5 coreceptor and is therefore effective only against R5 virus (6, 7). Screening patients for viral tropism before initiating maraviroc is thus essential for successful treatment (8).

Tropism was initially determined using the MT2 assay, which distinguishes between syncytium-inducing and non-syncytium-inducing viruses (9). Although the ability to induce syncytia correlates with coreceptor usage, it is not a perfect predictor (10). Phenotypic screens using recombinant virus assays are currently in common use for determining viral tropism (8); however, these methods are both costly and time-consuming compared with genotypic testing (8). Genotypic tropism inference can be achieved by sequencing the V3 loop of the HIV-1 gp120 protein and interpreting the sequence with a bioinformatic algorithm (11–16). Such genotypic methods can successfully predict non-R5 tropism (17), with predictive power equivalent to that of the original Trofile assay (a phenotypic tropism assay) (18). Next-generation deep sequencing, a powerful method that sequences individual template molecules after physically separating them (19), is also gaining widespread use. Deep sequencing has shown improved sensitivity and tropism prediction compared with population-based sequencing, with sensitivity comparable to that of the latest phenotypic assay, the enhanced-sensitivity Trofile assay (17, 20).

The PCR used to amplify DNA prior to sequencing can introduce errors (21). Determining the primary sequence from Sanger sequencing chromatograms is called base calling. Base calling is error prone; its accuracy is estimated by base-calling software such as Phred (CodonCode Corp.), which estimates the relative likelihood of a base-calling error (22–24). Deep sequencing, despite its greater power, is subject to further sources of error, such as insertion/deletion errors associated with homopolymer stretches (19). Detection of a given variant is also subject to sampling error, which depends on the input concentration (e.g., viral load), the proportion of the total population that is sampled (e.g., volume of input plasma), and the efficiency with which sampled templates are detected (e.g., reverse transcription-PCR [RT-PCR] efficiency). The effects of these sources of error on minority variant detection and tropism determination have not been systematically studied.
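The sampling component of this error can be sketched with simple arithmetic: the number of templates reaching the RT-PCR scales with viral load, input volume, and efficiency, and the chance that a variant is present among them follows a binomial model. The plasma volume and the combined extraction/RT efficiency below are hypothetical illustration values, not parameters reported in this study, and templates are assumed to be sampled independently. Being sampled is necessary but not sufficient for a variant to be called.

```python
def expected_input_copies(viral_load_per_ml: float, plasma_ml: float, efficiency: float) -> float:
    """Mean number of RNA templates that reach the RT-PCR.

    plasma_ml (input plasma volume) and efficiency (combined extraction + RT yield,
    between 0 and 1) are hypothetical parameters used only for illustration.
    """
    return viral_load_per_ml * plasma_ml * efficiency


def prob_variant_sampled(n_templates: int, variant_fraction: float) -> float:
    """Probability that at least one of n independently sampled templates carries the variant."""
    return 1.0 - (1.0 - variant_fraction) ** n_templates


# Hypothetical example: 1,072 copies/ml, 0.5 ml of plasma, 10% overall efficiency,
# and a variant at the 9.4% non-R5 prevalence reported for the clinical sample.
n = round(expected_input_copies(1072, 0.5, 0.10))
print(n, round(prob_variant_sampled(n, 0.094), 3))
```

Halving either the input volume or the efficiency halves the expected template count, which is why the same viral load can behave differently across extraction protocols.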

This article presents an analysis of factors that influence the sensitivity and specificity of genotypic testing by RT-PCR followed by conventional (Sanger) sequencing, with particular emphasis on HIV-1 tropism testing.

MATERIALS AND METHODS

Sample material.

The HIV-1 clones pNL4-3 (NIH catalog number 114, lot number 8 098164) and Bal (NIH catalog number 510, lot number 14 098109) were used as known X4 and R5 viruses, respectively. Viral NL4-3 RNA was generated by transfecting 293T cells with the pNL4-3 plasmid and harvesting the cell culture supernatant. The Bal plasmid was transfected similarly, but the viral supernatant was then used to reinfect additional 293T cells for several passages prior to RNA extraction (for details, see Methods in the supplemental material). A clinical plasma sample from a patient infected with HIV-1 clade B, known to contain multiple minority variants including low levels of non-R5 variants, was repeatedly sequenced using both conventional sequencing and amplicon-based deep sequencing on a GS FLX (454) sequencer (Roche, Basel, Switzerland); the latter method allows thousands of clones from a single sample to be sequenced in parallel. This was done to determine the sensitivity and reproducibility of conventional sequencing, because this nonhomogeneous clinical sample is representative of the samples that are most difficult to characterize by conventional sequencing. Samples from the phase 3 clinical trials of maraviroc, the MOTIVATE and A4001029 studies (7, 25), that had both population-based sequence data and matched deep-sequencing data (26) in both the forward and reverse directions were used as a wider data set for comparison (n = 1,521 samples). Two independent samples of peripheral blood mononuclear cells (PBMC) from each of 39 patients from the MOTIVATE study and 62 patients from the A4001029 study were also sequenced. Ethical approval for this study was granted by the Providence Health Care/University of British Columbia Research Ethics Board.

Sample processing.

Samples were extracted using a NucliSens easyMAG system (bioMérieux), amplified by nested RT-PCR, and sequenced on an ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA) (see Methods in the supplemental material for details). The resulting sequences were analyzed using the custom base-calling software ReCall (27, 28). Base calling was automated: a mixture was called when the area of the minority peak divided by that of the majority peak exceeded 12.5%, where peak area is the area under the chromatogram curve computed by ReCall from Phred peak areas. Sequences were then scored with a position-specific scoring matrix (PSSM) (29). Sequences with a score of −4.75 or greater were inferred to come from non-R5 virus, and those with a score below −4.75 were inferred to come from R5 virus.
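The two decision rules above can be sketched as follows: a position is called a mixture when the minority-to-majority peak-area ratio exceeds 12.5%, and a sequence is called non-R5 when its PSSM score is −4.75 or greater. This is not ReCall's implementation; the peak areas are taken as given, and the function names and dictionary input format are hypothetical.

```python
MIXTURE_RATIO_CUTOFF = 0.125   # minority/majority peak-area ratio (from the text)
PSSM_NON_R5_CUTOFF = -4.75     # scores >= this are inferred non-R5 (from the text)


def call_position(peak_areas: dict[str, float]) -> str:
    """Return the called base(s) at one position, e.g. 'A', or 'AG' for an A/G mixture.

    peak_areas maps each of the four nucleotides to its chromatogram peak area
    (hypothetical input format). Only the largest minority peak is considered,
    a simplification of real base callers.
    """
    ranked = sorted(peak_areas.items(), key=lambda kv: kv[1], reverse=True)
    (major_base, major_area), (minor_base, minor_area) = ranked[0], ranked[1]
    if major_area > 0 and minor_area / major_area > MIXTURE_RATIO_CUTOFF:
        return "".join(sorted(major_base + minor_base))
    return major_base


def infer_tropism(pssm_score: float) -> str:
    """Apply the PSSM cutoff used in this study."""
    return "non-R5" if pssm_score >= PSSM_NON_R5_CUTOFF else "R5"
```

For example, peak areas of 1,000 (A) and 150 (G) give a ratio of 0.15 and an A/G mixture, whereas 1,000 and 100 give a ratio of 0.10 and a plain A call.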

Deep sequencing.

The nonhomogeneous clinical sample described under “Sample material” was deep sequenced a total of 47 times over the course of 18 runs on a Genome Sequencer FLX (Roche, Basel, Switzerland), using the amplicon method described by Swenson et al. (26), to establish the prevalence of each unique sequence (taken as the mean over the 47 replicates). Only sequencing runs yielding 750 or more sequences were included. Individual sequences containing insertions or deletions of 1 or 2 bases were discarded, leaving a mean of 1,873 sequences in the forward direction and 2,210 sequences in the reverse direction per sequencing replicate. Sequence prevalence was calculated by dividing the number of times a particular unique sequence was observed by the total number of sequences in that sequencing replicate.
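The prevalence calculation above can be sketched as follows, assuming reads have already been filtered for 1- and 2-base indels. Applying the 750-sequence rule to each replicate, and counting a sequence absent from a replicate as 0% in that replicate, are assumptions made for this illustration.

```python
from collections import Counter
from statistics import mean


def replicate_prevalence(reads: list[str]) -> dict[str, float]:
    """Prevalence of each unique sequence within one sequencing replicate."""
    counts = Counter(reads)
    total = len(reads)
    return {seq: n / total for seq, n in counts.items()}


def mean_prevalence(replicates: list[list[str]], min_reads: int = 750) -> dict[str, float]:
    """Mean prevalence of each unique sequence across qualifying replicates.

    Replicates with fewer than min_reads reads are skipped (per-replicate use of the
    750 threshold is an assumption); a sequence missing from a replicate contributes 0.
    """
    per_rep = [replicate_prevalence(r) for r in replicates if len(r) >= min_reads]
    all_seqs = set().union(*per_rep)
    return {s: mean(p.get(s, 0.0) for p in per_rep) for s in all_seqs}
```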

Statistical analysis. (i) Sensitivity, specificity, and error rate with laboratory-grown virus with known tropism.

To determine the sensitivity and specificity of the assay, samples of known X4 (NL4-3) and known R5 (Bal) tropism were used. NL4-3 was diluted in HIV-uninfected plasma to nominal viral loads of 4,957, 2,478, 1,239, 620, 310, and 155 copies/ml. Likewise, Bal was diluted in HIV-uninfected plasma to nominal viral loads of 5,035, 2,518, 1,259, 629, 315, and 157 copies/ml. Plasma at each viral load was extracted once, followed by 14 independent nested RT-PCRs and sequencing reactions. Deviations from the known sequences of these clones were quantified as a measure of technical error in the sequencing pipeline. Sensitivity was calculated as the number of true positives divided by the number of known-positive (X4) tests, and specificity as the number of true negatives divided by the number of known-negative (R5) tests.
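These metrics reduce to simple ratios. The sketch below assumes every clone sequence is already aligned to its reference and of equal length, and counts any differing character (including an IUPAC mixture code) as one mismatch; the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positives / all known-positive (X4) tests."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """True negatives / all known-negative (R5) tests."""
    return true_neg / (true_neg + false_pos)


def mismatch_rate(observed: list[str], reference: str) -> float:
    """Mismatches per sequenced base relative to the known clone sequence.

    Assumes each observed sequence is pre-aligned to the reference and has the same length.
    """
    mismatches = 0
    for seq in observed:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference")
        mismatches += sum(o != r for o, r in zip(seq, reference))
    return mismatches / (len(observed) * len(reference))
```

As a scale check, a rate of 3 × 10⁻⁴ mismatches/bp corresponds to roughly one mismatch per 3,300 sequenced bases.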

(ii) Sensitivity at a low-input viral load.

To test the effect of viral load on the sensitivity of the assay for detecting minority non-R5 virus, the well-characterized nonhomogeneous clinical sample (previously analyzed by repeated deep sequencing) was serially diluted in negative plasma to nominal viral loads of 2,145, 1,072, 536, and 268 copies/ml. Each dilution was extracted separately 10 times (14 times at 536 copies/ml). Each extract was subjected to six separate nested RT-PCRs and sequencing reactions. A second serial dilution, to nominal viral loads of 8,580, 1,072, 536, 214, and 107 copies/ml, was also performed; RNA from plasma at each viral load was extracted, amplified, and sequenced 95 or 96 times. Chi-square (χ²) tests were used to compare the proportions of sequences called non-R5 between different extracts at the same viral load and between extracts at different viral loads.
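A χ² test of homogeneity compares the proportion of non-R5 calls across groups (extracts or viral loads). The stdlib-only sketch below computes the statistic without Yates' continuity correction (the text does not say whether one was used) and evaluates the χ² tail probability with a series for the regularized incomplete gamma function. In practice a library routine such as SciPy's `scipy.stats.chi2_contingency` would normally be used; the counts in the tests are made up.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution (series expansion)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_homogeneity(non_r5: list[int], totals: list[int]) -> tuple[float, int, float]:
    """Test that the non-R5 proportion is the same in every group.

    non_r5[i] sequences out of totals[i] were called non-R5 in group i.
    Returns (statistic, degrees of freedom, p-value).
    """
    grand_pos, grand_n = sum(non_r5), sum(totals)
    stat = 0.0
    for pos, n in zip(non_r5, totals):
        for observed, margin in ((pos, grand_pos), (n - pos, grand_n - grand_pos)):
            expected = n * margin / grand_n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = len(totals) - 1
    return stat, df, chi2_sf(stat, df)
```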

To determine the sensitivity of the assay for detecting minority non-R5 virus, a bootstrap analysis (n = 10,000 resamplings) was performed on the nonhomogeneous clinical sample data for extracts from each viral load dilution, sampling 1, 2, 3, 4, 6, or 10 sequences (replicates) per test. A test was inferred to be non-R5 if any of its replicates was non-R5. Binomial models were then constructed to determine the theoretical proportion of tests that should detect a sequence at various prevalence levels and numbers of replicates.
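The bootstrap and the binomial model can be sketched side by side. A simulated test draws replicates with replacement from the observed per-sequence calls at one viral load and is called non-R5 if any replicate is non-R5; the binomial model assumes replicates are independent. The seed and list-of-booleans input are illustrative choices.

```python
import random


def bootstrap_detection(calls: list[bool], replicates: int, n_boot: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated tests called non-R5.

    calls holds observed per-sequence results at one viral load (True = non-R5).
    Each test draws `replicates` sequences with replacement and is non-R5 if any is non-R5.
    """
    rng = random.Random(seed)
    hits = sum(any(rng.choices(calls, k=replicates)) for _ in range(n_boot))
    return hits / n_boot


def binomial_detection(p_single: float, replicates: int) -> float:
    """Theoretical P(at least one non-R5 call) if replicates are independent."""
    return 1.0 - (1.0 - p_single) ** replicates
```

As a back-of-envelope check under the independence assumption, a single-replicate detection rate of 18% (the figure reported in the Abstract at 1,072 copies/ml) gives 1 − 0.82³ ≈ 0.45 for three replicates, close to the reported 44%.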

(iii) Base calling and chromatogram analysis.

The relative proportion of a minority species measured by population-based sequencing was estimated, at positions called as mixtures, by dividing the minority chromatogram peak area (calculated by Phred) by the total (minority plus majority) peak area. Non-R5 prevalence per sample was estimated as the mean minority prevalence at polymorphisms specific to sequences called non-R5 by PSSM in the nonhomogeneous clinical sample (specifically, V3 nucleotide positions 2, 4, 5, 8, 10, 13, 16, 17, 20, 21, 23, 29, 33, 39, 55, 80, and 81). A weighted average non-R5 prevalence was then calculated at each viral load. This was done to determine the effect of viral load on the measured base prevalence of a minority species of known prevalence (in this case, non-R5 virus).
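A minimal sketch of this estimate is shown below. The text does not specify the weights used for the per-viral-load average, so the weights are left as an explicit argument (for example, the number of informative mixture positions per sample would be one plausible, hypothetical choice). The input format is also hypothetical.

```python
from statistics import mean
from typing import Optional

# Non-R5-specific V3 positions listed in the text (1-based).
NON_R5_POSITIONS = (2, 4, 5, 8, 10, 13, 16, 17, 20, 21, 23, 29, 33, 39, 55, 80, 81)


def minority_fraction(minor_area: float, major_area: float) -> float:
    """Minority peak area as a fraction of the total (minority + majority) peak area."""
    return minor_area / (minor_area + major_area)


def sample_non_r5_prevalence(mixtures: dict[int, tuple[float, float]]) -> Optional[float]:
    """Mean minority fraction over the non-R5-specific positions called as mixtures.

    mixtures maps a V3 position to (minority_area, majority_area) for positions called
    as mixtures (hypothetical input format). Returns None if no listed position applies.
    """
    fractions = [minority_fraction(*mixtures[pos]) for pos in NON_R5_POSITIONS if pos in mixtures]
    return mean(fractions) if fractions else None


def weighted_average(values: list[float], weights: list[float]) -> float:
    """Weighted mean across samples at one viral load; the weighting scheme is an assumption."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```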

To determine how often specific sequences present in the heterogeneous clinical sample were detected, the 10 most common sequences identified by repeated deep sequencing (n = 47) were compared with the population-based sequences. To simulate triplicate PCR, a bootstrap analysis (n = 10,000 resamplings) was performed in which three sequences were chosen at random from the population-based sequences at each sampling. The overall proportion of samplings in which the population-based sequences detected each of the 10 most common deep-sequencing variants was then determined.
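Per the Fig. 3 legend, a population-based sequence counts as detecting a deep-sequencing variant if it matches it or contains mixtures that allow it to match. The sketch below expresses that with the standard IUPAC ambiguity codes and repeats the triplicate bootstrap. It assumes pre-aligned, equal-length, upper-case sequences without gaps; the function names are illustrative.

```python
import random

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(population_seq: str, variant: str) -> bool:
    """True if every base of the deep-sequencing variant is allowed by the (possibly
    mixed) population-based sequence. Sequences must be pre-aligned and gap-free."""
    return len(population_seq) == len(variant) and all(
        v in IUPAC[p] for p, v in zip(population_seq, variant)
    )


def triplicate_detection(population_seqs: list[str], variant: str,
                         n_boot: int = 10_000, seed: int = 1) -> float:
    """Fraction of bootstrap triplicates in which at least one sequence can match the variant."""
    rng = random.Random(seed)
    hits = sum(
        any(compatible(s, variant) for s in rng.choices(population_seqs, k=3))
        for _ in range(n_boot)
    )
    return hits / n_boot
```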

(iv) Chromatogram concordance between population-based and deep sequencing in a large data set.

Screening samples from the MOTIVATE and A4001029 studies were amplified and sequenced by both population-based and deep sequencing as described above. For each base in the sequences from this data set (n = 1,521), the prevalence of each base (the chromatogram peak area for that base divided by the total peak area at that position) was compared with the prevalence of that base in the corresponding deep-sequencing data using Spearman's rank correlation. The relationship was displayed as a heat map of a randomly selected subset of 100,000 nucleotides.
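Spearman's ρ is the Pearson correlation of the ranks, with tied values given the average of their ranks; ties are common here because many base prevalences are exactly 0 or 1. A stdlib sketch follows; for millions of paired values a vectorized library routine such as `scipy.stats.spearmanr` would be the practical choice.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Here x would hold chromatogram-derived prevalences and y the matching deep-sequencing prevalences, one pair per base.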

(v) Reproducibility of proviral DNA sequencing for tropism inference.

To determine the reproducibility of proviral DNA sequencing for genotypic tropism inference, two separate PBMC samples from each of 101 patients from the MOTIVATE and A4001029 studies (39 and 62 patients, respectively) were each sequenced in triplicate. Pairwise evolutionary distances between sequences were calculated using MUSCLE (multiple sequence comparison by log-expectation) (30). Mean evolutionary distance was calculated among replicates of each sample, between the two samples from each patient, and between different patients. These values were compared using Wilcoxon rank sum tests with continuity correction. Spearman's rank correlation was used to compare maximum PSSM scores between the two samples from each patient.
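The Wilcoxon rank sum test with continuity correction can be sketched with the usual normal approximation, including a tie correction to the variance; this should mirror R's `wilcox.test` with `exact = FALSE`. The inputs would be, for example, within-sample and between-sample mean distances. The variance is zero if every value is tied, in which case this sketch raises ZeroDivisionError.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided rank sum test, normal approximation with continuity correction.

    Returns (W, p), where W is the rank sum of x minus n_x(n_x + 1)/2.
    """
    n_x, n_y = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    n = n_x + n_y
    tie_term = sum(t**3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    sigma = math.sqrt(n_x * n_y / 12 * ((n + 1) - tie_term))
    diff = w - n_x * n_y / 2
    correction = math.copysign(0.5, diff) if diff != 0 else 0.0
    z = (diff - correction) / sigma
    return w, min(1.0, math.erfc(abs(z) / math.sqrt(2)))
```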

(vi) Effect of viral load in a large clinical data set.

Samples identified as R5 from the maraviroc arm of the MOTIVATE data set that had at least three successful genotypes and follow-up tropism phenotype data (n = 530) were stratified by viral load at the time of genotyping (screening). Genotypic tropism was called using both single-replicate and triplicate sequence results. Kaplan-Meier plots were constructed for each viral load stratum, and differences between genotype calls within each stratum were tested using a log rank test (31).
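A minimal stdlib sketch of the two survival tools follows. An event is a change from R5 to X4 or DM, and censored samples are counted as at risk at their own censoring time. For the log rank test, the two groups would presumably be samples called R5 versus non-R5 by genotype within one viral load stratum; that grouping is our reading of the text, not a documented detail. The O(n²) loops are for clarity only.

```python
import math
from collections import Counter


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate of the proportion remaining R5.

    events[i] is 1 if tropism changed (R5 -> X4/DM) at times[i], 0 if censored.
    Returns (time, S(time)) at each distinct event time.
    """
    events_at = Counter(t for t, e in zip(times, events) if e)
    leaving_at = Counter(times)
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving_at[t]
    return curve


def log_rank(times_a, events_a, times_b, events_b) -> tuple[float, float]:
    """Two-group log rank test. Returns (chi-square statistic, p-value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({s for s, e, _ in data if e}):
        n = sum(1 for s, _, _ in data if s >= t)                    # at risk overall
        n_a = sum(1 for s, _, g in data if s >= t and g == 0)        # at risk in group A
        d = sum(1 for s, e, _ in data if s == t and e)               # events overall
        d_a = sum(1 for s, e, g in data if s == t and e and g == 0)  # events in group A
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    stat = obs_minus_exp**2 / var
    return stat, math.erfc(math.sqrt(stat / 2))
```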

RESULTS

Sensitivity, specificity, and error rates among laboratory-grown viruses with known tropism.

A total of 65 HIV V3 sequences were obtained from NL4-3 supernatant (14 each at viral loads of 4,957, 2,478, and 1,239 copies/ml, 8 at 620 copies/ml, 9 at 310 copies/ml, and 7 at 155 copies/ml). A total of 71 sequences were obtained for Bal (14 each at viral loads of 5,035 and 2,517 copies/ml, 13 at 1,259 copies/ml, 12 at 629 copies/ml, 11 at 315 copies/ml, and 7 at 157 copies/ml). For these laboratory-grown viruses of known tropism (i.e., 100% X4 or 100% R5), the test showed 100% sensitivity and 100% specificity, respectively, at all viral loads. The X4 virus (NL4-3) had an error rate of 3 × 10⁻⁴ mismatches/bp, whereas the R5 virus (Bal) had an error rate of 5 × 10⁻³ mismatches/bp. In both cases, most errors were nucleotide substitutions appearing as mixtures. None of these errors affected the tropism result.

Amplification/sequencing success rate in a heterogeneous clinical sample.

A total of 624 sequences were successfully generated in 741 attempts using a well-characterized clinical sample. As expected, the rate of successful amplification and sequencing increased with increasing viral load, exceeding 90% with viral loads of 1,000 copies/ml or greater (Fig. 1).

Fig 1.


Amplification/Sanger genotype success rate as a function of viral load. Plasma from a well-characterized heterogeneous clinical sample was diluted to the indicated viral loads, extracted, RT-PCR amplified, and Sanger sequenced. The number of attempted amplification/sequencing reactions is shown for each viral load tested. Amplification/sequencing attempts from which a sequence was generated were considered a success, while those from which sequences could not be generated were considered failures. A cubic smoothing spline fit of the data is shown as a black line. Bars indicate 95% confidence intervals calculated using the Agresti-Coull method.
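The Agresti-Coull interval used for the error bars adds z² pseudo-observations (half successes, half failures) before applying the normal approximation. A short sketch follows; the per-viral-load counts are in Fig. 1 and not reproduced here, so the example uses the overall 624 successes in 741 attempts.

```python
import math
from statistics import NormalDist


def agresti_coull(successes: int, trials: int, confidence: float = 0.95) -> tuple[float, float]:
    """Agresti-Coull confidence interval for a binomial proportion, clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = trials + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


lo, hi = agresti_coull(624, 741)
print(f"{624 / 741:.3f} ({lo:.3f} to {hi:.3f})")
```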

Effects of extraction and viral load on non-R5 detection.

Because multiple extractions of the nonhomogeneous clinical sample were performed at each viral load tested (except 8,580 copies/ml), the proportions of sequences called non-R5 were compared among the extractions at each viral load. There were no significant differences in the proportions of sequences called non-R5 between independent extractions performed at the same viral load (χ² P values from 0.07 to 0.62). To determine whether the viral load of a sample affected inferred tropism, the proportions of sequences called non-R5 were also compared between sequences derived from different viral loads (for example, 536 copies/ml versus 8,580 copies/ml). This revealed a significant difference in inferred tropism between viral loads (χ² P = 0.03).

Sensitivity for detecting non-R5 virus in a well-characterized clinical sample as a function of input viral load and number of replicates.

The nonhomogeneous clinical sample contained a median of 9.4% (interquartile range [IQR], 8.0 to 10.0%) non-R5 virus by PSSM analysis of 47 deep-sequencing results. Serial dilutions of this sample were repeatedly sequenced to determine the approximate distribution of R5 and non-R5 sequences detected at each dilution. Bootstrap resampling of the observed sequence distributions, with different numbers of sequences sampled per test (i.e., simulated replicates), was used to determine the sensitivity of the assay at different input viral loads and numbers of replicates. Sensitivity of population sequencing was strongly influenced by both viral load and replication (Fig. 2A). At low viral loads, the sensitivity of the assay was approximately equal to the proportion of non-R5 virus in the sample (in this case, 9.4%) (Fig. 2A). Increasing the viral load increased sensitivity (Fig. 2A), as predicted by a binomial model (see Fig. SA1 in the supplemental material); however, at a viral load of 8,580 copies/ml, the non-R5 detection rate fell below the actual proportion of non-R5 virus (Fig. 2A). Increasing replication increased sensitivity for detecting non-R5 virus regardless of viral load (Fig. 2A). The effect of viral load on sensitivity was mirrored by its effect on the number of non-R5-specific mixtures (mixtures at positions characteristic of sequences called non-R5 by PSSM) detected in the sample (Fig. 2B). The greatest number of these mixtures occurred at viral loads between 536 and 2,145 copies/ml (Fig. 2B).

Fig 2.


Sensitivity for the detection of minority non-R5 virus by viral load. (A) Percent detection of minority non-R5 virus, from bootstrap resampling of the observed sequence distributions from a heterogeneous clinical sample with different numbers of sequences sampled per test (i.e., simulated replicates), used to determine the sensitivity of the assay at different input viral loads and numbers of replicates. Points were generated from 10,000 bootstrap resamplings for each number of replicates. Numbers of replicates are indicated on the figure. (B) Mean numbers of non-R5-specific mixtures over a range of viral loads. Results from the forward primer (V3O2F) are shown in blue, and those from the reverse primer (SQV3R2) are in red. (C) Percentages of non-R5 virus estimated from population-based sequence chromatogram peak areas at various viral loads. The median prevalence of non-R5 virus by deep sequencing is shown as a dashed line. Results from the forward primer (V3O2F) are shown in blue, and those from the reverse primer (SQV3R2) are in red.

Minority non-R5 prevalence inferred from chromatogram peak areas was significantly negatively correlated with viral load (r = −0.95) in both the forward and reverse sequencing directions (P < 0.001 for both), approaching the true prevalence of 9.4% at a viral load of 8,580 copies/ml (Fig. 2C).

Sequence detection of individual variants.

In order to determine the ability of population-based sequencing to detect sequences present at various proportions, detection by population-based sequencing of the 10 most common variants observed by repeated deep sequencing (n = 47) of this clinical sample was quantified using a bootstrap analysis to simulate triplicate sequencing. In general, the proportion of samples in which a given sequence was detected by population-based sequencing was consistent with the average measurement obtained using deep sequencing. Triplicate resampling allowed consistent detection of common sequences and increased rates of detection of the lower-prevalence sequences (Fig. 3). The most common sequence by deep sequencing (24% of the total population) was detected 68.5% of the time at a viral load of 1,072 copies/ml, while the second most common sequence by deep sequencing (12.3% of the total population) was detected 87.9% of the time at a viral load of 1,072 copies/ml (Fig. 3). Detection of the two most common sequences increased with viral load, giving Spearman's ρ values of 0.82 and 0.96, respectively (P = 0.03 and 0.003, respectively). Detection of the non-R5 sequence with ∼3.8% prevalence decreased with increasing viral load, giving a Spearman's ρ of −0.78 (P = 0.04) (Fig. 3). Non-R5 sequences from deep sequencing were detected only in samples that were called non-R5 by population-based sequencing. Detection of the common non-R5 sequences occurred more frequently at lower viral loads (Fig. 3). Not all sequences called non-R5 by population-based sequencing contained either of the common non-R5 sequences detected by deep sequencing (Fig. 3).

Fig 3.


Population-based sequencing detection of the 10 most common sequences detected by deep sequencing of a heterogeneous clinical sample. Three population-based sequences were sampled from the observed distribution of sequences at each viral load in a total of 10,000 bootstraps. Each point represents the proportion of times in which at least one of the three sampled population-based sequences matched, or contained mixtures which could allow it to match, the specific sequence detected by deep sequencing. The prevalence of each sequence averaged over 47 deep-sequencing runs is given on the figure.

Concordance of nucleotide prevalence between population-based and deep sequencing in the MOTIVATE and A4001029 trials.

Base prevalence inferred from population-based sequencing peak areas was compared with that measured by deep sequencing for over 6 million nucleotides from 1,521 samples from the MOTIVATE and A4001029 studies. Base prevalence was highly correlated between the two platforms (Spearman's ρ, 0.73; P ≪ 0.001), indicating that both platforms can independently measure base prevalence. Fig. 4A shows a subset of 100,000 randomly sampled bases.

Fig 4.


Comparison of population-based sequencing chromatograms with deep-sequencing results. (A) The concordance between base prevalence measured by deep sequencing and estimated from population-based sequencing peak area is shown for 100,000 randomly selected bases from the MOTIVATE/A4001029 studies (n = 1,521). The density of bases in a given area of the plot is shown by color intensity, with darker colors indicating more dense regions. Perfect concordance is shown as a red line. (B) Chromatograms for a single arbitrarily chosen sequence from the MOTIVATE/A4001029 studies are shown. The prevalence of minority species as measured by deep sequencing is shown underneath bases called as ambiguous by ReCall.

Reproducibility of proviral DNA sequencing for tropism inference.

Each of two independent PBMC samples from a subset of 101 patients from the MOTIVATE and A4001029 studies was sequenced in triplicate to determine the reproducibility of proviral DNA sequencing for genotypic tropism inference. Mean evolutionary distance did not differ between sequence replicates from the same sample and independently extracted and processed samples from the same patient (P = 0.45) but was significantly different for sequences from different patients (P ≪ 0.001) (see Fig. SA2A in the supplemental material), confirming that proviral DNA sequencing can reproducibly detect patient sequences. Maximum PSSM scores from the two samples from each patient were correlated (Spearman's ρ, 0.75; P ≪ 0.001) (see Fig. SA2B). Tropism inferred from PSSM scores was concordant between the two samples in 83% of patients.

The effect of viral load and number of replicates on tropism determination.

The ability of genotypic testing to predict tropism changes increased with increasing viral load and with increasing numbers of replicates. In patients with a median viral load of 4.53 log10 copies/ml (IQR, 4.18 to 4.79 log10 copies/ml), Kaplan-Meier curves did not differ significantly between samples called R5 and non-R5 by single-replicate genotyping (P = 0.16), whereas triplicate genotyping gave a relatively narrow, though significant, separation (P = 0.05) (Fig. 5). In patients with a median viral load of 5.41 log10 copies/ml (IQR, 5.19 to 5.65 log10 copies/ml), predictions based on single-replicate genotyping were again not significantly different (P = 0.17), whereas those based on triplicate genotyping were (P = 0.02) (Fig. 5).

Fig 5.


The effect of viral load on genotypic tropism determination in a large data set. Kaplan-Meier curves showing time-to-change in tropism for samples from the maraviroc arm of the MOTIVATE data set that were phenotypically R5 at screening by the original Trofile assay, had at least three available PSSM scores, and had postscreening phenotypes are shown (n = 530). An event was considered to occur when a sample went from being R5 by the original Trofile assay to being X4 or DM. The proportion remaining R5 is shown on the y axis. Samples were stratified by viral load (pVL; log10 copies/ml) at screening.

DISCUSSION

This study investigated the accuracy and reproducibility of conventional sequencing and its application to genotypic tropism inference in HIV-1 via replicate sequencing of a variety of clonal and patient samples over a range of viral loads, as well as by comparison to deep-sequencing results of the same samples.

Population-based sequencing correctly identified the known R5 and non-R5 (X4) viruses with high sensitivity and specificity. Furthermore, technical error rates, measured as differences from the known sequences of the laboratory-grown viruses, were low. For NL4-3, these rates were in line with the number of errors expected from the Taq polymerases used in the RT-PCR amplification required for sequencing (21). The higher rate observed with Bal was likely a result of true minority populations that had arisen during passage in culture. Overall, these results suggest that amplification followed by Sanger sequencing can robustly detect majority species and that the error rate is low enough that technical errors are unlikely to alter tropism inference.

Sequencing of a heterogeneous clinical sample with low levels of minority non-R5 virus demonstrated that population-based sequencing can also accurately detect minority species, although minority species detection was strongly influenced by viral load (and thus by input template copy number). At low viral loads, minority species tended to have a measured prevalence of ∼50% (i.e., a 1:1 ratio with the majority sequence), suggesting that amplification started from only two input molecules. Increasing viral load (and thus input template copies) increased the accuracy of measured minority prevalence, which approached the true proportion of non-R5 virus as measured by deep sequencing at the highest viral loads tested. This suggests that chromatogram peak area provides a more accurate measure of the actual minority species proportion as viral load increases.
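The ∼50% figure follows from a simple sampling argument. As an idealized sketch (assuming every template present in the first PCR cycle is amplified equally, so that peak areas are proportional to template counts), let k be the number of input templates, p the true minority fraction, and j the number of input templates carrying the minority base:

```latex
j \sim \operatorname{Binomial}(k,\, p), \qquad
\hat{p} = \frac{j}{k}, \qquad
\operatorname{SD}(\hat{p}) = \sqrt{\frac{p(1-p)}{k}}, \qquad
\Pr(j = 1 \mid k = 2) = 2p(1-p).
```

With k = 2, a mixture can arise only when j = 1, so every detected mixture reads as j/k = 1/2 regardless of the true p; as k grows, j/k concentrates around p.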

A strong correlation between chromatogram peak area and prevalence by deep sequencing was also seen in the samples from the MOTIVATE and A4001029 studies, including at positions with nucleotide mixtures. To our knowledge, this is the largest direct comparison of base prevalence measurements by deep and population-based sequencing. These data covered a wide range of samples with relatively high viral loads, suggesting that both population-based and deep sequencing provide a robust approximation of true proportion.

Detection of rare species decreased with increasing viral load, while detection of common species increased, consistent with sampling error during RT-PCR. Paradoxically, the greater minority peak accuracy at higher viral loads could cause minority species with a prevalence below the 12.5% mixture threshold to remain uncalled, which would explain the decrease in detection of rare species with increasing viral load. One observation that could not be adequately explained was the consistently higher representation of the sequence that, according to deep sequencing, should have been the second most common (12.3% prevalence) rather than the most common (24% prevalence). This anomaly may be due to biases in the sequencing reaction or to a phenomenon particular to the sample itself, as some deep-sequencing runs also showed differences in the order of these sequences (data not shown).
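Both effects (inflated measured fractions at low input and uncalled rare variants at high input) can be reproduced with a toy simulation. It assumes equal amplification of every input template, so that peak areas are proportional to template counts, and applies the 12.5% minority/majority peak-area cutoff described in Methods. This is an illustrative model, not the analysis performed in the study.

```python
import math
import random


def simulate_minority_calls(k: int, p: float, trials: int = 20_000, seed: int = 1,
                            ratio_cutoff: float = 0.125) -> tuple[float, float]:
    """Simulate mixture calling when amplification starts from k input templates.

    Idealized assumptions: equal amplification of every template, and a mixture is
    called when the minority/majority ratio exceeds ratio_cutoff.
    Returns (fraction of trials in which the variant is called as a mixture,
             mean measured variant fraction among those trials, or nan if never called).
    """
    rng = random.Random(seed)
    called, measured = 0, 0.0
    for _ in range(trials):
        j = sum(rng.random() < p for _ in range(k))  # input templates carrying the variant
        minor, major = min(j, k - j), max(j, k - j)
        if minor > 0 and minor / major > ratio_cutoff:
            called += 1
            measured += j / k
    return called / trials, (measured / called if called else math.nan)
```

With k = 2 and p = 0.1, a mixture is called in about 2p(1 − p) = 18% of trials and always reads as 50%; with k = 200 and p = 0.05, the variant essentially never crosses the cutoff.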

The increasing ability of genotypic testing of screening samples in the MOTIVATE and A4001029 studies to predict tropism changes with increasing viral load is consistent with a linear relationship between minority peak area accuracy and viral load. Even the low-viral-load stratum in this data set was above the viral load range at which low-prevalence minority species were preferentially detected in the dilution experiments. Both strata would therefore be likely to miss minority species below the mixture threshold. Increasing viral load would increase the accuracy of minority species measurements, making the assay more likely to detect species near the cutoff. Both the high and low strata missed some samples that changed tropism from R5 to non-R5. This could be due either to preexisting minorities present at below 12.5% prevalence or to the selection of new variants in these individuals.

Repeat PBMC samples from patients in the MOTIVATE/A4001029 studies showed that proviral DNA can give a reproducible measurement of patient sequence. However, the correlation of PSSM scores between the two samples from each patient was not perfect, and 17% of patients had discordant tropism results between samples. This may be a result of the relatively low copy number of proviral DNA compared with plasma RNA.

One potential limitation of this study was that viral load was measured in plasma, so the extracts may have differed in template quality and quantity depending on extraction efficiency. Reported input copy numbers are therefore subject to some uncertainty. In addition, the data presented come from retrospective analyses; ideally, prospective trials should address the relative performance of all of these assays.

Overall, these findings suggest that population-based sequencing of the V3 loop of HIV-1 can be used to infer viral tropism, given sufficient input copy number and PCR replication. Because RT-PCR drift will affect any sample at low input copy numbers, these data have implications for the sequencing or recombinant phenotypic testing of any heterogeneous sample.
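
The benefit of PCR replication can be sketched under an independence assumption: if each replicate detects a given minority species with probability q, and the species is called when any of k replicates detects it, the overall detection probability is 1 - (1 - q)^k. Treating replicates as independent, and calling on any positive replicate, are assumptions made here for illustration; `detection_with_replicates` is a hypothetical helper.

```python
def detection_with_replicates(q_single: float, k: int) -> float:
    """Probability that at least one of k replicates detects a minority species.

    Assumes (for illustration) that replicates are independent and that each
    detects the species with the same probability q_single.
    """
    if not 0.0 <= q_single <= 1.0:
        raise ValueError("q_single must be a probability")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - q_single) ** k


if __name__ == "__main__":
    # 0.18 is the single-replicate detection rate reported in the Abstract
    # for an ~10% minority at 1,072 copies/ml.
    for k in (1, 2, 3):
        print(f"{k} replicate(s): {detection_with_replicates(0.18, k):.2f}")
```

Starting from the single-replicate rate of 18% reported in the Abstract, this model gives about 45% for three replicates, close to the 44% reported there, although that agreement alone does not show that replicates behave independently.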

Supplementary Material

Supplemental material

ACKNOWLEDGMENTS

We thank Peter Cheung for providing the clones used in this study.

Funding for this paper was provided by the Canadian Institutes of Health Research (CIHR) and by Pfizer/ViiV Healthcare. A.F.Y.P. is funded by a CIHR operating grant and supported by a Michael Smith Foundation for Health Research/St. Paul's Hospital Foundation-Providence Health Care Research Institute Career Investigator Salary Award. P.R.H. is funded by a CIHR/GSK Research Chair in Clinical Virology.

P.R.H. received grants from, served as an ad hoc advisor to, or spoke at various events sponsored by Pfizer, GlaxoSmithKline, Abbott, Merck, Virco, and Monogram.

The funding sources of the study (Pfizer/ViiV Healthcare) had no involvement in the study design.

Footnotes

Published ahead of print 21 November 2012

Supplemental material for this article may be found at http://dx.doi.org/10.1128/JCM.00739-12

REFERENCES

1. Este JA, Telenti A. 2007. HIV entry inhibitors. Lancet 370:81–88
2. Berger EA, Doms RW, Fenyo EM, Korber BT, Littman DR, Moore JP, Sattentau QJ, Schuitemaker H, Sodroski J, Weiss RA. 1998. A new classification for HIV-1. Nature 391:240. doi:10.1038/34571
3. Connor RI, Sheridan KE, Ceradini D, Choe S, Landau NR. 1997. Change in coreceptor use correlates with disease progression in HIV-1–infected individuals. J. Exp. Med. 185:621–628
4. Delobel P, Sandres-Saune K, Cazabat M, Pasquier C, Marchou B, Massip P, Izopet J. 2005. R5 to X4 switch of the predominant HIV-1 population in cellular reservoirs during effective highly active antiretroviral therapy. J. Acquir. Immune Defic. Syndr. 38:382–392
5. Lifson JD, Feinberg MB, Reyes GR, Rabin L, Banapour B, Chakrabarti S, Moss B, Wong-Staal F, Steimer KS, Engleman EG. 1986. Induction of CD4-dependent cell fusion by the HTLV-III/LAV envelope glycoprotein. Nature 323:725–728
6. Dorr P, Westby M, Dobbs S, Griffin P, Irvine B, Macartney M, Mori J, Rickett G, Smith-Burchnell C, Napier C, Webster R, Armour D, Price D, Stammen B, Wood A, Perros M. 2005. Maraviroc (UK-427,857), a potent, orally bioavailable, and selective small-molecule inhibitor of chemokine receptor CCR5 with broad-spectrum anti-human immunodeficiency virus type 1 activity. Antimicrob. Agents Chemother. 49:4721–4732
7. Gulick RM, Lalezari J, Goodrich J, Clumeck N, DeJesus E, Horban A, Nadler J, Clotet B, Karlsson A, Wohlfeiler M, Montana JB, McHale M, Sullivan J, Ridgway C, Felstead S, Dunne MW, van der Ryst E, Mayer H, MOTIVATE Study Teams. 2008. Maraviroc for previously treated patients with R5 HIV-1 infection. N. Engl. J. Med. 359:1429–1441
8. Poveda E, Alcami J, Paredes R, Cordoba J, Gutierrez F, Llibre JM, Delgado R, Pulido F, Iribarren JA, Garcia Deltoro M, Hernandez Quero J, Moreno S, Garcia F. 2010. Genotypic determination of HIV tropism—clinical and methodological recommendations to guide the therapeutic use of CCR5 antagonists. AIDS Rev. 12:135–148
9. Koot M, Vos AH, Keet RP, de Goede RE, Dercksen MW, Terpstra FG, Coutinho RA, Miedema F, Tersmette M. 1992. HIV-1 biological phenotype in long-term infected individuals evaluated with an MT-2 cocultivation assay. AIDS 6:49–54
10. Rose JD, Rhea AM, Weber J, Quinones-Mateu ME. 2009. Current tests to evaluate HIV-1 coreceptor tropism. Curr. Opin. HIV AIDS 4:136–142
11. Chueca N, Garrido C, Alvarez M, Poveda E, de Dios Luna J, Zahonero N, Hernandez-Quero J, Soriano V, Maroto C, de Mendoza C, Garcia F. 2009. Improvement in the determination of HIV-1 tropism using the V3 gene sequence and a combination of bioinformatic tools. J. Med. Virol. 81:763–767. doi:10.1002/jmv.21425
12. Hwang SS, Boyle TJ, Lyerly HK, Cullen BR. 1991. Identification of the envelope V3 loop as the primary determinant of cell tropism in HIV-1. Science 253:71–74
13. Lengauer T, Sander O, Sierra S, Thielen A, Kaiser R. 2007. Bioinformatics prediction of HIV coreceptor usage. Nat. Biotechnol. 25:1407–1410
14. Sander O, Sing T, Sommer I, Low AJ, Cheung PK, Harrigan PR, Lengauer T, Domingues FS. 2007. Structural descriptors of gp120 V3 loop for the prediction of HIV-1 coreceptor usage. PLoS Comput. Biol. 3:e58. doi:10.1371/journal.pcbi.0030058
15. Seclen E, Garrido C, Gonzalez Mdel M, Gonzalez-Lahoz J, de Mendoza C, Soriano V, Poveda E. 2010. High sensitivity of specific genotypic tools for detection of X4 variants in antiretroviral-experienced patients suitable to be treated with CCR5 antagonists. J. Antimicrob. Chemother. 65:1486–1492
16. Sing T, Low AJ, Beerenwinkel N, Sander O, Cheung PK, Domingues FS, Buch J, Daumer M, Kaiser R, Lengauer T, Harrigan PR. 2007. Predicting HIV coreceptor usage on the basis of genetic and clinical covariates. Antivir. Ther. 12:1097–1106
17. Kagan RM, Johnson EP, Siaw M, Biswas P, Chapman DS, Su Z, Platt JL, Pesano RL. 2012. A genotypic test for HIV-1 tropism combining Sanger sequencing with ultradeep sequencing predicts virologic response in treatment-experienced patients. PLoS One 7:e46334. doi:10.1371/journal.pone.0046334
18. McGovern RA, Thielen A, Mo T, Dong W, Woods CK, Chapman D, Lewis M, James I, Heera J, Valdez H, Harrigan PR. 2010. Population-based V3 genotypic tropism assay: a retrospective analysis using screening samples from the A4001029 and MOTIVATE studies. AIDS 24:2517–2525
19. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–380
20. Swenson LC, Mo T, Dong WW, Zhong X, Woods CK, Thielen A, Jensen MA, Knapp DJ, Chapman D, Portsmouth S, Lewis M, James I, Heera J, Valdez H, Harrigan PR. 2011. Deep third variable sequencing for HIV type 1 tropism in treatment-naive patients: a reanalysis of the MERIT trial of maraviroc. Clin. Infect. Dis. 53:732–742
21. Eckert KA, Kunkel TA. 1991. DNA polymerase fidelity and the polymerase chain reaction. PCR Methods Appl. 1:17–24
22. Ewing B, Green P. 1998. Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8:186–194
23. Ewing B, Hillier L, Wendl MC, Green P. 1998. Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 8:175–185
24. Richterich P. 1998. Estimation of errors in “raw” DNA sequences: a validation study. Genome Res. 8:251–259
25. Saag M, Goodrich J, Fatkenheuer G, Clotet B, Clumeck N, Sullivan J, Westby M, van der Ryst E, Mayer H, A4001029 Study Group. 2009. A double-blind, placebo-controlled trial of maraviroc in treatment-experienced patients infected with non-R5 HIV-1. J. Infect. Dis. 199:1638–1647
26. Swenson LC, Moores A, Low AJ, Thielen A, Dong W, Woods C, Jensen MA, Wynhoven B, Chan D, Glascock C, Harrigan PR. 2010. Improved detection of CXCR4-using HIV by V3 genotyping: application of population-based and “deep” sequencing to plasma RNA and proviral DNA. J. Acquir. Immune Defic. Syndr. 54:506–510
27. Brooks JI, Woods CK, Merks H, Wynhoven B, Hall TA, Sandstrom PA, Harrigan PR. 2009. Evaluation of an automated sequence analysis tool to standardize HIV genotyping results, abstr O061. Abstr. Can. Conf. HIV/AIDS Res., Vancouver, British Columbia, Canada
28. Harrigan PR, Dong W, Wynhoven B, Mo T, Hall T, Galli RA. 2002. Performance of ReCall basecalling software for high-throughput HIV drug resistance basecalling using “in-house” methods, abstr TuPeB4598. Abstr. XIV Int. AIDS Conf., Barcelona, Spain, 7 to 12 July 2002
29. Jensen MA, Li FS, van't Wout AB, Nickle DC, Shriner D, He HX, McLaughlin S, Shankarappa R, Margolick JB, Mullins JI. 2003. Improved coreceptor usage prediction and genotypic monitoring of R5-to-X4 transition by motif analysis of human immunodeficiency virus type 1 env V3 loop sequences. J. Virol. 77:13376–13388
30. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–1797
31. Harrington DP, Fleming TR. 1982. A class of rank test procedures for censored survival data. Biometrika 69:553–566

Articles from Journal of Clinical Microbiology are provided here courtesy of American Society for Microbiology (ASM)