Author manuscript; available in PMC: 2020 Mar 1.
Published in final edited form as: Cancer Res. 2019 Jul 9;79(17):4532–4538. doi: 10.1158/0008-5472.CAN-18-3933

No measurable substitution rate in the HPV16 genome in women with subsequent in situ or invasive cervical cancer: prospective population-based study.

Laila-Sara Arroyo-Mühr 1, Camilla Lagheden 1, Emilie Hultin 1, Carina Eklund 1, Hans-Olov Adami 2,3, Joakim Dillner 1,4, Karin Sundström 1,4
PMCID: PMC6876554  NIHMSID: NIHMS1533540  PMID: 31289133

Abstract

Knowledge of the rate of evolution of human papillomavirus (HPV) is essential for cancer-preventive strategies targeting HPV. We analyzed variability over time in a prospective, population-based nested case-control study of in situ (CIS) and invasive squamous cervical cancer (SCC). Among 757,690 women who participated in cervical screening in Sweden during 1969–2002, we identified 94 women who had HPV16 persistence in two serial cervical screening samples (median 24 months apart, range 0.5–178 months) and who were later diagnosed with CIS (n=59) or SCC (n=32), or remained healthy (n=3). Whole-HPV16-genome sequencing and comparison of sequences in the serial samples revealed that each woman had the same HPV16 lineage, most commonly lineage A, in both serial smears. Fifty-six percent of women had an identical 7906 base pair HPV16 sequence in both samples and no woman had more than 15 nucleotide substitutions. The median substitution rate was 0 substitutions/site/year (confidence interval 0–0.00008), with no variation between quartiles of follow-up. We concluded that in most women with HPV16 persistence preceding disease, the nucleotide substitution rate was not measurable within up to 15 years of follow-up. This slow rate of evolution has important implications for both HPV-based screening and HPV vaccination.

Keywords: cervical cancer, human papillomavirus, HPV, HPV16, substitution rate

INTRODUCTION

Persistent human papillomavirus (HPV) infection is essential for cervical cancer development, and variant lineages and single nucleotide polymorphisms (SNPs) may play a role in the oncogenic potential (1–4). Knowledge of the HPV rate of evolution is essential, as rapidly evolving viruses may escape recognition by screening tests or vaccine-induced antibodies (5). Although HPV is considered to evolve slowly, there are limited data about mutation rates (6). Estimates of the nucleotide substitution rate of the most common and most oncogenic HPV type (HPV16) have ranged from 5 × 10^−4 to 4 × 10^−3 substitutions/site/year (7). There are also important reports of a tremendous variability of HPV16 isolates circulating in healthy human populations (8). How these results can be reconciled with the observation of slow evolution is not yet entirely clear, but it has been proposed that the plethora of isolates originated already in our distant forebears (9) and posited that decreased viral fitness in relation to productive or disease-causing infections should be considered (8).

Also, when a particular HPV genotype is highly prevalent in the population of women under study, reinfection with the same genotype may occur and lead to misclassification of serial transient HPV infections as a single persistent infection (10). However, as HPV persistence is established as a requirement for development of cervical cancer, serial sampling of women who later develop cervical cancer or cancer in situ may provide a biological basis for identification of persistent infections. Recent advances in next generation sequencing (8) now allow identification of whole viral genome sequences at previously unparalleled sequencing depths (11). Reliable knowledge of HPV nucleotide substitution rates enabled by these advances may inform natural history studies of cervical disease as well as screening and HPV vaccination strategies.

Since such knowledge on HPV16 substitution rates has been noted as lacking (6,7), we used deep sequencing to analyze serial cervical smears in a prospective, population-based study of invasive and in situ cervical cancer.

MATERIALS AND METHODS

Study participants

The participant inclusion into this study has been previously described (12). In brief, we sampled all squamous cervical cancer (SCC) case women in Sweden during the period 1969–2002, using the Swedish National Cervical Screening Register (http://nkcx.se/index_e.htm) as the sampling frame. We furthermore used the same register to draw a random sample of carcinoma in situ (CIS; equivalent to CIN3). Using case-control sampling, one woman, matched on county, date of entry into cohort (±3 months), and age at first normal smear (±1 year), was then randomly selected as an individually matched control for each CIS and SCC case. To verify the diagnoses of CIS (the equivalent to CIN3) or SCC, histologic samples from the identified cases were reviewed by a senior pathologist. All archival (conventional) Papanicolaou-stained smears from the women were retrieved and HPV-tested by PCR, and HPV16 positive samples were further quantified for HPV16 viral load using real-time PCR, as previously described (13).

To investigate HPV16 genomic variation over time in this study, we included the first and last smear from each woman who contributed at least two smears during the study, and who had an HPV16 viral load of at least 100 copies/microliter in both smears (to ensure sufficient viral DNA for reliable sequencing). HPV16 infections in healthy control women were rare. Thus, the number of included cases was much greater than the number of included controls compared to the 1:1 ratio in the original study. Nevertheless, we judged it best to here include these few controls as well, for comprehensiveness. We thus included all 188 samples from 94 different women who later had CIS/CIN3 (122 samples: 59 cases and 2 controls) or SCC (66 samples: 32 cases and 1 control). The median follow-up between the two smears was 24 months (range 0.5–178 months).

DNA extraction, amplification and primer pooling

DNA was extracted from archived slides as previously described (14) and purified using the MagNa Pure LC Total Nucleic Acid Isolation Kit (Roche Molecular Systems, Inc., Alameda, CA, USA), as described (11). The entire HPV16 genome (7906 base pairs) was amplified with 47 overlapping amplicons as described previously (11,15). Primers were divided into 5 different PCR reactions to avoid the formation of cross-primer dimers and self-dimers.

All 188 samples together with 13 negative controls (9 PCR negative controls with water PCR-reagent (Sigma) and 4 PCR negative controls containing human DNA) were amplified separately and the 5 reactions were pooled together according to sample name, before library preparation. DNA amplicon length was checked after PCR (Bioanalyzer, Agilent) as quality analysis prior to library preparation.

Illumina library preparation

Libraries were prepared for all 201 samples (188 samples and 13 negative controls), using the TruSeq Nano DNA Sample Preparation kit according to the user guide revision A (Illumina). As PCR products ranged in size from 181 bp to 375 bp, no tagmentation was necessary and library preparation started with adenylation of 3′-ends. PCR product at 4.4 ng/μl (75 ng in a total volume of 17.5 μl) was used as input material and 2 indexed adaptors were ligated to each sample. Individual libraries were validated, normalized and pooled, resulting in a 1.8 pM denatured DNA solution. Pools containing approximately 47 libraries were sequenced once, paired-end with 151+151 cycles, using the NextSeq 500 instrument and NextSeq 500 High Output reagent kit (Illumina) as described in the user guides Denature and Dilute Libraries Guide v02 for the NextSeq System, NextSeq 500 kit Reference Guide revision F and NextSeq 500 System Guide v02.

Sequence analyses

Reads obtained from the NextSeq 500 (Illumina) platform were quality checked with Trimmomatic (16) using default parameters and a minimal length of 150 bp. Reads passing these filters were aligned with NextGenMap (17) to a modified HPV16 reference sequence (7906 bp) from the PapillomaVirus Episteme (PAVE; via https://pave.niaid.nih.gov). Only paired-end reads where both reads mapped to the genome, with the correct orientation and distance, and with >90% identity over 75% of their length were considered valid and analyzed further. The linear PAVE HPV16 reference genome was modified because the actual HPV genome is circular. Amplicons spanning both the last nucleotides of the reference sequence (positions 7800–7906) and the beginning of the reference genome (1–100 bp) might not be considered as mapped, because part of the sequence does not map to the linear reference. Therefore, we added the first 258 bp to the end of the reference genome (after position 7906) so as not to lose coverage of amplicons spanning that junction. BAM files were merged and left-aligned using the LeftAlignIndels module of GATK version 3.8. For coverage analysis, we used GATK DepthOfCoverage with a minimum of 6 reads required to consider any position as covered. Primers were trimmed from aligned reads with BamUtil trimBam, trimming 32 bases from the 5′ end (http://genome.sph.umich.edu/wiki/BamUtil:_trimBam).
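The handling of the circular genome described above can be illustrated with a minimal sketch. The function names are hypothetical and are not part of the pipeline used in the study; only the genome length (7906 bp) and the padding length (258 bp) come from the text.

```python
GENOME_LENGTH = 7906  # HPV16 reference length stated in the text
PAD_LENGTH = 258      # bases copied from the start to the end, per the text


def extend_circular_reference(sequence: str, pad: int = PAD_LENGTH) -> str:
    """Append the first `pad` bases to the end of a linear reference sequence."""
    return sequence + sequence[:pad]


def to_circular_position(position: int, genome_length: int = GENOME_LENGTH) -> int:
    """Map a 1-based position on the extended reference back to 1..genome_length."""
    return (position - 1) % genome_length + 1


# Toy example with a 10-base "genome" padded by 3 bases.
toy_reference = "ACGTACGTAC"
print(extend_circular_reference(toy_reference, pad=3))
# A read aligned at extended position 7907 corresponds to genome position 1.
print(to_circular_position(7907))
```

With this layout, an amplicon that crosses the 7906/1 junction aligns contiguously to the padded reference, and coordinates beyond 7906 can be folded back onto the original numbering afterwards.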

HPV16 variant calling and comparison of serial smears

The HPV16 genome was genotyped by GATK HaplotypeCaller Version 3.8. SNP and indel calls were made and hard filtered, following GATK Best Practices and default parameters. All identified nucleotide variants were manually inspected and were only considered as confirmed if the position was covered with at least 100 reads. For each woman, variant calls detected in both smears were compared to each other and the differences were annotated.

No differences were annotated if a smear presented a nucleotide variant in a position which was not covered with at least 100 reads for both smears.
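The pairwise comparison rule can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the dictionary-based representation of variant calls and depths, and the function name, are hypothetical; only the 100-read threshold in both smears comes from the text.

```python
MIN_DEPTH = 100  # reads required at a position in both smears, per the text


def annotate_differences(first_calls, last_calls, first_depth, last_depth,
                         min_depth=MIN_DEPTH):
    """Return sorted positions whose called base differs between two smears.

    *_calls map position -> variant base (absent positions match the reference).
    *_depth map position -> read depth at that position.
    """
    differences = []
    for position in sorted(set(first_calls) | set(last_calls)):
        covered = (first_depth.get(position, 0) >= min_depth
                   and last_depth.get(position, 0) >= min_depth)
        if not covered:
            continue  # not comparable: too few reads in at least one smear
        if first_calls.get(position) != last_calls.get(position):
            differences.append(position)
    return differences


# Hypothetical calls and depths for one woman.
first = {695: "T", 1000: "G"}
last = {695: "T", 2000: "A", 3000: "C"}
depth_first = {695: 500, 1000: 150, 2000: 300, 3000: 40}
depth_last = {695: 800, 1000: 120, 2000: 900, 3000: 700}
print(annotate_differences(first, last, depth_first, depth_last))
```

In this made-up example, position 3000 is skipped because the first smear has only 40 reads there, so a variant seen in the last smear alone is not counted as a difference.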

HPV16 (Sub)lineages

Maximum likelihood (ML) trees were constructed using MEGA 7 (18), including 10 HPV16 A (previously termed European) and B-D (previously termed non-European) variant lineage reference sequences (19). Variant lineage and variant sublineage assignments were confirmed with SNP patterns.

Reproducibility of the study

For reproducibility and validation of our protocol, we cultured CaSki cells and extracted DNA using MagNa Pure LC (Roche), according to the manufacturer's instructions. Serial dilutions were performed and subjected to HPV16 real-time PCR, as previously described (13), to quantify HPV viral load and select concentrations similar to those of the tumour samples. Two CaSki dilutions together with one replicate each were subjected to PCR amplification, pooling, library preparation and sequencing, in the same fashion as all samples included in the study.

Coinfections with several (sub)lineages

Identification of possible HPV16 co-infections with a different HPV16 lineage or sublineage than the one detected, was performed by analyzing putative heterozygous calls at nucleotide positions that distinguished between variant lineages (i.e. A, B, C, D) and sublineages (e.g. A1, A2, A3 and A4 for the A lineage), in samples that showed complete genome coverage with sequencing median depths >200x. Alignment of 10 HPV16 A-D variant lineage reference sequences (19) was performed to identify positions (diagnostic calls) that were known to be variable between lineages.

All bases with a quality score of at least Q30 (meaning that the probability of the base being called incorrectly is at most 1 in 1000) present at these diagnostic positions were retained for each sample. In a sensitivity analysis in a random selection of 30 samples, we allowed bases with a quality score of at least Q15 to be retained instead (meaning a probability of at most 1 in 31.6 of being called incorrectly). This was to investigate whether the stringency of our quality criteria had a strong impact on the calling of co-infections. A heterozygous allele call (HPV is monoploid) was made if at least 2 bases were detected at a position, the less abundant variant accounted for >1% of total reads (threshold of 1% for the less abundant variant), and there were at least 5 reads supporting the less frequent variant.
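The quality thresholds and the heterozygous-call rule can be made concrete with a short sketch. The Phred relation (error probability = 10^(−Q/10)) is standard; the function names and the choice to treat the second most abundant base as "the less abundant variant" are assumptions made for illustration, not details given in the text.

```python
def phred_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability of an incorrect base call."""
    return 10 ** (-q / 10)


def is_heterozygous(base_counts: dict, min_fraction: float = 0.01,
                    min_reads: int = 5) -> bool:
    """Apply the calling rule to counts of quality-filtered bases at one position.

    Assumption: the second most abundant base is "the less abundant variant".
    """
    total = sum(base_counts.values())
    ranked = sorted(base_counts.values(), reverse=True)
    if total == 0 or len(ranked) < 2:
        return False
    minor = ranked[1]
    return minor >= min_reads and minor / total > min_fraction


print(phred_error_probability(30))      # about 1 in 1000
print(1 / phred_error_probability(15))  # about 1 in 31.6
print(is_heterozygous({"A": 980, "G": 20}))  # 2% minor fraction, 20 reads
print(is_heterozygous({"A": 995, "G": 5}))   # 0.5% minor fraction
```

Raising `min_fraction` to 0.05, 0.10 or 0.30 corresponds to the stricter thresholds reported alongside the 1% threshold in the Results.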

Statistical analyses

Descriptive statistics are presented along with 95% confidence intervals (CI) appropriate for paired measurements where applicable. Statistical significance testing for differences between samples was carried out using chi-squared tests, or two-sample t-tests assuming unequal variances (the latter when indicated by test statistics). The number of SNPs observed between first and last smears (X) was divided by the total number of positions in the genome (n=7906 bp) and by the number of years between smears, giving a measure of substitutions/site/year (i.e. X/7906/years), which we equate with the substitution rate. The median substitution rate was taken as the median of the substitutions/site/year values calculated for the individual women. The average substitution rate was taken as the average of all substitution rates calculated in the women. Confidence intervals (CI) for the median substitution rate were calculated for the overall median rate, and for the median rate by quartile of follow-up time in months, respectively. Using the z-statistic of 1.96, these CIs approximate the equivalent of a 95% CI by representing a range of ranked values capturing the variation around the median rate, but without making any assumptions of a normal distribution (the latter would not be possible for a set of ranked values).
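As a worked illustration of the rate definition (X/7906/years) and of a rank-based interval for the median, consider the sketch below. The per-woman rate follows the text directly. The rank formula (lower rank n/2 − z√n/2, upper rank 1 + n/2 + z√n/2, here rounded outward) is a commonly used distribution-free approximation and is an assumption: the text does not state the exact formula applied in SAS. The example data are invented.

```python
import math
from statistics import median

GENOME_LENGTH = 7906  # positions in the HPV16 genome, per the text


def substitution_rate(snps: int, months_between: float,
                      genome_length: int = GENOME_LENGTH) -> float:
    """Substitutions/site/year for one woman: X / 7906 / years."""
    years = months_between / 12
    return snps / genome_length / years


def median_rank_ci(values, z: float = 1.96):
    """Approximate distribution-free CI for the median from ranked values.

    Assumed formula: ranks n/2 - z*sqrt(n)/2 and 1 + n/2 + z*sqrt(n)/2,
    rounded outward and clipped to the range 1..n.
    """
    ordered = sorted(values)
    n = len(ordered)
    half_width = z * math.sqrt(n) / 2
    lower_rank = max(1, math.floor(n / 2 - half_width))
    upper_rank = min(n, math.ceil(1 + n / 2 + half_width))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


# Hypothetical (SNPs, months between smears) pairs, for illustration only.
pairs = [(0, 24), (0, 36), (0, 12), (2, 24), (6, 18)]
rates = [substitution_rate(x, m) for x, m in pairs]
print(median(rates))
print(median_rank_ci(rates))
```

Because more than half of the invented pairs have zero SNPs, the median rate is 0 regardless of the few non-zero values, which is the same property that makes the median robust to outliers in the study data.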

We used a correlation test to test the relationship between number of SNPs detected in serial smears and time in months between these. All tests were two-sided and a p-value of <0.05 was considered statistically significant. Data management and statistical analyses were carried out in Excel and SAS version 9.4.

Ethical approval and consent from study participants

The study was approved by the Regional Ethical Review Board of Stockholm (approval numbers 02–201, 2005/640–32 and 2008/1540–32), which determined that informed consent from the participants was not required.

RESULTS

We included a total of 188 HPV16 positive samples from 94 women (2 smears per woman), of which most derived from women who later developed CIS/CIN3 (62.8%) or SCC (34.0%). Those with subsequent CIS were younger at both first and last smear, and at diagnosis, than the women with subsequent SCC (p<.0001). However, the time elapsed from first and last smears to diagnosis (3.2 years, and 0.14–0.15 years respectively, whether median or average) was similar (Table 1).

Table 1.

Basic characteristics of the study participants.

| Diagnosis | n=94 (100%) | Age at first smear (a) | Time from first smear to diagnosis (b) | Age at last smear (a) | Time from last smear to diagnosis (c) | Age at diagnosis (a) |
|---|---|---|---|---|---|---|
| CIS/CIN3 | 59 (62.8%) | 29.1 (18.7–56.2) | 3.2 (0.1–10.5) | 30.7 (21.8–60.4) | 0.15 (0.0–5.4) | 31.8 (23.1–60.6) |
| SCC | 32 (34.0%) | 40.4 (25.6–67.1) | 3.2 (0.3–19.5) | 44.7 (28.4–70.8) | 0.14 (0.0–18.6) | 45.0 (28.5–75.5) |
| Controls | 3 (3.2%) | 33.9 (28.7–40.4) | - | 38.8 (32–41.2) | - | - |

Age and time from smear to diagnosis are both given in median number of years (range). CIN3: cervical intraepithelial neoplasia grade 3; CIS: cervical carcinoma in situ; SCC: Squamous cell carcinoma.

(a) p<0.0001 for difference in means between CIS/CIN3 and SCC;
(b) p>0.05 for difference in means between CIS/CIN3 and SCC;
(c) p>0.05 for difference in means between CIS/CIN3 and SCC.

HPV16 genome coverage

The 188 samples presented a median viral load of 648 copies/μl (range 103–112,427 copies/μl) and all covered at least 80% of the HPV16 complete genome (7906 bp) with sequencing median depths of >200x (median sequence depth per sample, 13,262x) (Table 2). 112/188 of the samples (59.6%) covered the entire HPV16 genome with sequencing median depths >2000x. Coverage varied substantially in one viral region located within HPV16 L2 (nucleotide 5314–5416), where coverage failure occurred in 31–61/188 samples (16.5–32.5% of samples depending on the exact nucleotide position considered). This 103 bp region accounts for 1.3% of the genome and corresponds to the non-overlapping part of amplicon 32. Water controls and HPV16 negative/human DNA positive controls contained no HPV16.

Table 2.

Sequence depth across genome coverage for HPV16

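Number of samples = 188

| Sequence depth | 80% genome coverage | 85% genome coverage | 90% genome coverage | 95% genome coverage |
|---|---|---|---|---|
| >200x | 188 | 182 | 175 | 162 |
| >500x | 184 | 179 | 172 | 161 |
| >1000x | 178 | 175 | 169 | 161 |
| >2000x | 170 | 170 | 167 | 161 |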

Number of samples that exceeded 80, 85, 90 and 95 percent sequence coverage at median depths greater than 200x, 500x, 1000x and 2000x. More than 5 reads were required at a position to be considered as covered.
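The two quantities tabulated per sample, genome coverage and median depth, can be computed from a per-position depth vector as in the sketch below. The function name and the toy depth values are hypothetical, and taking the median over all genome positions is an assumption; only the "more than 5 reads" (i.e. at least 6 reads) rule comes from the text.

```python
from statistics import median

MIN_READS = 6  # a position counts as covered with more than 5 reads


def coverage_summary(depths, min_reads: int = MIN_READS):
    """Return (fraction of positions covered, median depth) for one sample."""
    covered = sum(1 for depth in depths if depth >= min_reads)
    return covered / len(depths), median(depths)


# Toy depth vector; a real sample would have 7906 entries.
fraction_covered, median_depth = coverage_summary([0, 5, 6, 300, 1000])
print(fraction_covered, median_depth)
```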

Variant lineage assignment

HPV16 variant lineage assignment revealed that 178/188 samples (94.7%) had HPV16 variant lineage A infection, corresponding to 132 (70.2%) A1, 40 (21.3%) A2, and 6 (3.2%) A4 sublineages. We found 10/188 (5.3%) samples with an HPV16 non-A variant lineage infection (Table 3).

Table 3.

HPV16 lineages detected by diagnosis/case-control status

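| Diagnosis | Samples (n) | A1, n (%) | A2, n (%) | A4, n (%) | C, n (%) | D2, n (%) | D3, n (%) |
|---|---|---|---|---|---|---|---|
| CIS/CIN3 | 118 | 84 (71.2) | 26 (22.0) | 4 (3.4) | 0 | 2 (1.7) | 2 (1.7) |
| SCC | 64 | 42 (65.6) | 14 (21.9) | 2 (3.1) | 2 (3.1) | 0 | 4 (6.3) |
| Control | 6 | 6 (100) | 0 | 0 | 0 | 0 | 0 |
| Total | 188 | 132 (70.2) | 40 (21.3) | 6 (3.2) | 2 (1.1) | 2 (1.1) | 6 (3.2) |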

Number of HPV16 (sub)lineages detected in all 188 samples from 94 women. CIN3: cervical intraepithelial neoplasia grade 3; CIS: cervical carcinoma in situ; SCC: Squamous cell carcinoma.

Lineage C was detected in 2/188 (1.1%) samples and Lineage D in 8/188 (4.3%) samples, corresponding to 2 samples classified as sublineage D2 and 6 classified as sublineage D3. Lineage B was not found in any sample. When we compared smears taken over time in the same woman, all women presented the same sublineage in both smears.

Nucleotide variants in serial cervical smears

We analyzed the number of SNPs/indels that differed from the first smear to the last, for each of the 94 women. 53/94 (56.4%) women had no nucleotide differences between their smears, whereas 19/94 (20.2%) women presented 1 or 2 SNPs, and 22/94 (23.4%) women showed between 3 and 15 SNP differences from the first to the last smear (Table 4).

Table 4:

Number of SNPs detected in serial cervical smears over time.

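| Number of SNPs | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of women (total n=94) | 53 | 8 | 11 | 6 | 3 | 2 | 5 | 2 | 1 | 2 | 1 |
| Proportion of women (%) of total sample | 56.38 | 8.51 | 11.70 | 6.38 | 3.19 | 2.13 | 5.32 | 2.13 | 1.06 | 2.13 | 1.06 |
| Average time (years) between smears | 2.77 | 2.83 | 5.67 | 3.46 | 1.97 | 2.71 | 1.37 | 0.63 | 4.92 | 5.33 | 5.58 |
| % Genome identity | 100.00 | 99.99 | 99.97 | 99.96 | 99.95 | 99.94 | 99.92 | 99.91 | 99.90 | 99.89 | 99.81 |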

Number of single nucleotide polymorphisms (SNPs) and genome identity detected in pairs of serial samples from 94 women.
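The genome identity row in Table 4 appears to follow directly from the SNP count and the genome length. A minimal sketch, assuming identity is computed as 100 × (1 − SNPs/7906) and rounded to two decimals (the function name is hypothetical):

```python
GENOME_LENGTH = 7906  # HPV16 genome length used throughout the study


def genome_identity_percent(snps: int, genome_length: int = GENOME_LENGTH) -> float:
    """Percent of genome positions that are identical between two smears."""
    return 100 * (1 - snps / genome_length)


for snps in (0, 1, 2, 9, 15):
    print(snps, round(genome_identity_percent(snps), 2))
```

Even the most divergent pair (15 SNPs) remains 99.81% identical across the genome.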

The median time between smears was 24 months (range 0.5–178 months) (Supplementary Table 1). The median substitution rate was 0 substitutions/site/year (SSY) (approximate CI: 0 to 0.8 × 10^−4 SSY). The mean substitution rate was 1.89 × 10^−4 SSY (95% CI 0.99 × 10^−4 to 1.96 × 10^−4; range 0 to 2.77 × 10^−4 SSY), which would entail on average 1.89 × 10^−4 × 7906 ≈ 1.5 SNPs per year in a woman infected with HPV16. However, there was no statistically established correlation between number of SNPs and progression of time (the CI for median SSY by quartile of follow-up time in months always included 0 (Supplementary Table 2)). Among the few outlier women with a high number of SNPs between their first and last smear (four women with 8, 9, 9 and 15 SNPs, respectively), we analyzed interim samples. The SNPs were all observed already between their first and their second smear. Sequence isolate conservation (i.e. no more SNPs) was then observed between the women’s second smear and their last smear, implying a switch to a new infection/isolate between smears 1 and 2 (Supplementary Table 3).

A total of 148 SNPs and no indels were detected when we compared both smears within each woman (Table 5). Nucleotide variant calls were spread across the whole genome, with no region/gene being particularly hyper/hypovariable. The percentage of variability for each gene ranged from 1.35% to 3.13%, with E7 appearing to be the relatively most conserved and E4 the relatively most variable, but without any statistically significant difference between the two (p=0.14, Table 5). Four SNPs were detected in a non-coding region located between the E5 and L2 genes. A total of 4 SNPs were detected in the E7 region. These SNPs occurred in four different women (one SNP in each woman) and were all non-synonymous: Ala→Val (C695T), Lys→Glu (A739G), Thr→Met (C752T) and Thr→Ile (C818T). 90/94 women showed the same sequence in E7 when comparing both smears.

Table 5.

SNPs/Indel calls detected by region of HPV16 genome

| Region | SNPs | Indels | Size (bp) | % Variable sites (95% CI) |
|---|---|---|---|---|
| E6 | 12 | 0 | 477 | 2.52 (1.11–3.92) |
| E7 | 4 | 0 | 297 | 1.35 (0.04–2.66) |
| E1 | 47 | 0 | 1950 | 2.41 (1.73–3.09) |
| E2 | 24 | 0 | 1098 | 2.19 (1.32–3.05) |
| E4 | 9 | 0 | 288 | 3.13 (1.12–5.13) |
| E5 | 6 | 0 | 252 | 2.38 (0.50–4.26) |
| L2 | 20 | 0 | 1422 | 1.41 (0.79–2.02) |
| L1 | 22 | 0 | 1596 | 1.38 (0.81–1.95) |
| URR | 15 | 0 | 832 | 1.80 (0.90–2.71) |
| Total | 148 | 0 | 7906 | 1.87 (1.57–2.17) |

Presence in the HPV16 genome of single nucleotide polymorphisms/insertion deletions (SNPs/indels) identified in pairs of serial samples from 94 women. Bp: base pairs; CI: confidence interval; URR: Upstream regulatory region.

Reproducibility of the study protocol

Two dilutions of DNA extracted from CaSki cells (HPV16 viral load 310 and 825.8 copies/μl, respectively) were amplified and sequenced with one replicate each to evaluate reproducibility of our protocol. Both samples and their replicates covered the complete HPV16 genome with sequencing median depths >100,000x, with no genome position covered with less than 2000 reads. All 4 samples were confirmed to have the same isolate, corresponding to HPV16 sublineage A2. No SNP/indel differences were detected between the 4 samples; i.e. each nucleotide of the two sequences of 7906 base pairs was called as entirely identical in its respective replicate.

Co-infections

We evaluated the potential for HPV16 lineage co-infection presence in all samples that showed complete genome coverage with sequencing median depths >200x, (n=112), by analyzing putative heterozygous calls at nucleotide positions that distinguished between variant (sub)lineages. Within all 112 samples, a maximum of 40 heterozygous calls/sample were detected at diagnostic positions when the threshold was set at 1% for the less abundant variant, decreasing to a maximum of 14 heterozygous calls/sample when the threshold reached 10% (Table 6). Relaxing the stringency criteria for quality to allow Q15 instead of Q30 did not change these results. The possibility of co-infection with other lineages than the one detected was thus rejected in any sample, as different lineages within HPV16 differ by at least 75 nucleotides and we could only identify a maximum of 40 heterozygous calls in any one sample.

Table 6.

Number of putative heterozygous calls in cervical smears by HPV16 (sub)lineage

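Threshold for less abundant variant: 1%, 5%, 10%, 30%. Cells show median (minimum, maximum).

| (Sub)lineage | n | 1% | 5% | 10% | 30% |
|---|---|---|---|---|---|
| A1 | 77 | 24 (11, 37) | 7 (4, 17) | 6 (3, 11) | 3 (0, 6) |
| A2 | 25 | 25 (16, 36) | 7 (5, 13) | 5 (3, 9) | 3 (1, 6) |
| A4 | 5 | 25 (21, 27) | 7 (6, 11) | 5 (3, 8) | 2 (1, 3) |
| C | 1 | 20 | 3 | 2 | 0 |
| D2 | 2 | 38 (36, 40) | 11.5 (10, 13) | 7 (6, 8) | 1.5 (1, 2) |
| D3 | 2 | 30.5 (30, 31) | 14 (11, 17) | 11 (8, 14) | 4.5 (0, 9) |
| CaSki controls | 4 | 28 (27, 29) | 11 (10, 11) | 7 (7, 8) | 1 (1, 1) |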

Identification of putative heterozygous calls at nucleotide positions that distinguished between variant (sub)lineages in first compared to last smear. The median numbers of putative heterozygous calls/sample are shown in the table, with minimum and maximum numbers of calls included in brackets.

Deeper analysis of heterozygous calls located at diagnostic positions specific to other sublineages, within the lineage detected, subsequently revealed that not more than 11 diagnostic calls/sample were specific to any sublineage reference, with a 1% threshold for the less abundant variant, effectively discarding also the possibility of coinfection with other sub-lineages in any of the samples.

DISCUSSION

While HPV is traditionally known as a mutationally stable virus, important work recently identified thousands of variable HPV16 genomes circulating in human populations when studying differences in viral sequences occurring between individual women (8). Substitution/mutation rate over time is a critical component of the virus’s resulting evolutionary rate (20), along with ecological and other selective factors. Yet, since longitudinal data have been sparse, there is little consensus on the actual rate of change and HPV16 has even been reported to evolve faster than expected from its genome size (6). HPV16 substitution rates within the same individuals over time are key to investigate, as HPV16 is a major globally circulating oncogenic agent and the results carry implications for both HPV-based screening and HPV vaccination programs. To further inform this issue, we leveraged a nested population-based study using a sequencing protocol capable of the hitherto deepest reported coverage of the entire HPV16 genome (11). We found that the median substitution rate was zero in women who later developed high-grade cervical lesions or cancer; indeed, most women displayed exactly the same sequence isolate persisting over time.

We included primarily women who later developed disease to increase the likelihood of studying a true, persistent infection, rather than repeated infections with new HPV16 isolates acquired serially by the same woman. The proximity of the infection in the last smear to time of diagnosis (median time <2 months) further increased the likelihood that we had identified the true disease-causing infection for study. We analyzed isolates with up to 15 years between them and found so few substitutions that there was not even a correlation over time. In a sensitivity analysis, we considered only those infections with no more than 4 years between samplings, to match the period of time used in the previous study (7), and the median rate remained 0 substitutions/site/year.

Given these findings, we believe that future studies on the velocity of HPV16 acquiring mutations in case women would likely need a sufficient sequencing depth across the entire genome and a long follow-up time.

Previous work investigating L1-only HPV16 sequences reported to GenBank suggested a mean rate of 3.94 × 10^−3 substitutions/site/year, but the authors noted that there was risk for bias in the estimate (7). We analyzed the median rate in this study, since the mean is more sensitive to outliers and we had such outliers in our data, where 4 women alone accounted for 41 SNPs. Also, as reported above, based on serial analyses of these women’s smears we could confirm that their relative wealth of SNPs probably derived from a switch to a new isolate rather than rapid changes in the same isolate. Indeed, observations on HPV16 divergence would appear to support this choice: if the true HPV16 substitution rate (as measured by the mean) were in the order of ~4 × 10^−3 substitutions/site/year, this should translate into about 31.6 nucleotide substitutions per year across the 7906 bp genome. This seems unreasonably high, as HPVs are known to have co-existed with humans since antiquity, yet no HPV16 isolates in GenBank differ by more than 791 bp (10% of the total genome). Possible explanations for such a high estimated rate, apart from using mean rather than median measures, might include analyses of too short periods between smears (as discussed in 6,7); incorrectly identifying different infections with different isolates as the same isolate; and/or inherent differences in the nature of populations included, i.e. whether individuals more likely to become cases, or remain disease-free, respectively, are studied.
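The arithmetic behind this argument can be laid out explicitly. The sketch below converts a per-site rate into expected substitutions per genome per year and, as a deliberately naive linear extrapolation that ignores saturation and back-mutation (an assumption added here for illustration), estimates how quickly the maximum divergence seen in GenBank would be reached. The function names are hypothetical.

```python
GENOME_LENGTH = 7906  # HPV16 genome length in bp


def expected_substitutions_per_year(rate: float,
                                    genome_length: int = GENOME_LENGTH) -> float:
    """Expected substitutions per genome per year for a per-site rate."""
    return rate * genome_length


def years_to_reach(differences: int, rate: float,
                   genome_length: int = GENOME_LENGTH) -> float:
    """Years needed to accumulate `differences` substitutions, assuming linear accrual."""
    return differences / expected_substitutions_per_year(rate, genome_length)


print(round(expected_substitutions_per_year(4e-3), 1))     # rate discussed above
print(round(expected_substitutions_per_year(1.89e-4), 1))  # mean rate in this study
print(round(years_to_reach(791, 4e-3)))                    # years to reach 791 bp
```

Under these simplifying assumptions, a rate of ~4 × 10^−3 would produce the full 791 bp divergence within a few decades, which illustrates why such a rate is hard to reconcile with a long co-existence of HPV16 and humans.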

The strengths of this study include the samples deriving from a population-based prospective design nested within a national cervical screening program, rather than from GenBank, where uploaded sequences may be biased towards deviating from the reference sequence (21). We also considered the entire sequence for HPV16, rather than selected parts of the genome. HPV16-positive samples from case women included in the study thus have a very high generalizability to cases of cervical disease occurring in the female population. We focused on archival smears with at least 100 copies/μl of HPV16 DNA in the original extract, to increase the number of input genomes.

An advanced deep sequencing method (15) with additional modifications (11) enabled us to obtain very good statistics on depth and coverage. Repeat analyses of the HPV16 genomes from CaSki cells repeatedly found identical sequences, an important control to establish reproducibility of the method. We thus likely avoided false calls that might otherwise have inflated our estimates of substitution rates. To further ascertain that nucleotide variant (i.e. putative mutation) calls belonged to the isolate analyzed rather than concomitant co-infection with another isolate, we looked for co-infections with other HPV16 lineages and/or sublineages in the same sample. However, most diagnostic calls detected were specific to the lineage/sublineage already identified, and too few possible heterozygous calls at diagnostic positions were verified to confirm a coinfection specific to other sublineages within the same lineage (e.g. A1 and A2). A relaxing of criteria to allow lower base quality score (Q15) at sublineage-specific nucleotide sites did not alter this result. Also, the number of specific heterozygous diagnostic calls decreased when increasing the threshold for the less abundant variant. We thus conclude that our finding of no co-infections in this setting is robust and that the high coverage obtained by the Illumina platform, combined with primer pooling reducing the risk for cross-reactions, likely reduced mistaken heterozygous calls that could otherwise have led us to overestimate the prevalence of co-infections in this study.

However, we emphasize the fact that our study was performed in pairs of samples from the same women originating from a homogenous, predominantly Caucasian population, which may have reduced the underlying likelihood of co-infections with other lineages and/or sublineages. Also, we focused on persistent HPV16 infections from women developing cervical disease.

There are thus important phenotypic and design-based differences between our data and those deriving from healthy populations, where up to 20% of samples were reported as containing co-infections with >1 HPV16 isolate (8), but case genomes indeed exhibited less variation which was postulated as related to decreased viral fitness. Thus, these respective findings are not in as stark a contrast as they initially may appear.

Potentially even larger intra-host variation has been reported if HIV-positive women are included (22). Limitations of our study thus include restriction by design to the intra-individual longitudinal perspective, restriction to women developing squamous cervical lesions, the limited inference to other ethnic groups than Caucasians, and the small number of controls which precluded more sophisticated analyses by e.g. case-control status over time. The latter fact was due to the original study identifying that very few women with repeated/persistent HPV16 infection remained healthy, particularly in case of high HPV16 viral DNA load.

To summarize, in this population of women who later developed in situ or invasive cervical cancer, there was no measurable substitution rate over up to 15 years. More than half of women identified in the population-based setting who later developed disease had persistent infection with the exact HPV16 sequence isolate conserved over time. As the price of sequencing is diminishing rapidly, sequencing could be of interest to establish if there is a persistent infection with a conserved isolate. In the context of HPV vaccination programs, the stability of this virus implies that it is unlikely that HPV16 will be able to mutate beyond recognition by the neutralizing monoclonal antibodies deriving from today’s VLP-based vaccine technologies, as previously posited (23).

Supplementary Material

1
2
3

Statement of significance.

Findings show there is no measurable genomic variation over time in HPV16 infections progressing to cervical cancer. This could influence risk-stratification of women when screening for cervical cancer, and inform HPV vaccination strategies.

ACKNOWLEDGEMENTS

The authors thank Kristina Glimsjö and Helena Andersson for administrative support.

FUNDING

This study was funded by the NIH, National Cancer Institute (grant number 1 RO1 CA93378–01), the Swedish Cancer Society (grant number 2014/518), the Swedish Foundation for Strategic Research (grant number RB13–0011) and the Jonas Söderquist Scholarship Foundation (scholarship to KS). The funders had no role in the writing, preparation or decision to submit the manuscript.

CONFLICTS OF INTEREST

KS has received funding to her institution from Merck and Co, Inc, for other register-based studies on HPV vaccination in Sweden. JD has previously received funding to his institution from the same company. The other authors have no conflicts of interest to declare.

References:

  • 1. Cornet I, Gheit T, Iannacone MR, Vignat J, Sylla BS, Del Mistro A, et al. HPV16 genetic variation and the development of cervical cancer worldwide. Br J Cancer 2013;108:240–4
  • 2. Hildesheim A, Schiffman M, Bromley C, Wacholder S, Herrero R, Rodriguez A, et al. Human papillomavirus type 16 variants and risk of cervical cancer. J Natl Cancer Inst 2001;93:315–8
  • 3. Mirabello L, Yeager M, Cullen M, Boland JF, Chen Z, Wentzensen N, et al. HPV16 Sublineage Associations With Histology-Specific Cancer Risk Using HPV Whole-Genome Sequences in 3200 Women. J Natl Cancer Inst 2016;108.
  • 4. Schiffman M, Rodriguez AC, Chen Z, Wacholder S, Herrero R, Hildesheim A, et al. A population-based prospective study of carcinogenic human papillomavirus variant lineages, viral persistence, and cervical neoplasia. Cancer Res 2010;70:3159–69
  • 5. Dillner J, Arbyn M, Unger E, Dillner L. Monitoring of human papillomavirus vaccination. Clin Exp Immunol 2011;163:17–25
  • 6. Sanjuan R. From molecular genetics to phylodynamics: evolutionary relevance of mutation rates across viruses. PLoS Pathog 2012;8:e1002685.
  • 7. Firth C, Kitchen A, Shapiro B, Suchard MA, Holmes EC, Rambaut A. Using time-structured data to estimate evolutionary rates of double-stranded DNA viruses. Mol Biol Evol 2010;27:2038–51
  • 8. Mirabello L, Yeager M, Yu K, Clifford GM, Xiao Y, Zhu B, et al. HPV16 E7 Genetic Conservation Is Critical to Carcinogenesis. Cell 2017;170:1164–74 e6
  • 9. Pimenoff VN, de Oliveira CM, Bravo IG. Transmission between Archaic and Modern Human Ancestors during the Evolution of the Oncogenic Human Papillomavirus 16. Mol Biol Evol 2017;34:4–19
  • 10. Mayrand MH, Coutlee F, Hankins C, Lapointe N, Forest P, de Ladurantaye M, et al. Detection of human papillomavirus type 16 DNA in consecutive genital samples does not always represent persistent infection as determined by molecular variant analysis. J Clin Microbiol 2000;38:3388–93
  • 11. Arroyo-Muhr LS, Lagheden C, Hultin E, Eklund C, Adami HO, Dillner J, et al. Human papillomavirus type 16 genomic variation in women with subsequent in situ or invasive cervical cancer: prospective population-based study. Br J Cancer 2018;119:1163–8
  • 12. Sundstrom K, Eloranta S, Sparen P, Arnheim Dahlstrom L, Gunnell A, Lindgren A, et al. Prospective study of human papillomavirus (HPV) types, HPV persistence, and risk of squamous cell carcinoma of the cervix. Cancer Epidemiol Biomarkers Prev 2010;19:2469–78
  • 13. Sundstrom K, Ploner A, Dahlstrom LA, Palmgren J, Dillner J, Adami HO, et al. Prospective study of HPV16 viral load and risk of in situ and invasive squamous cervical cancer. Cancer Epidemiol Biomarkers Prev 2013;22:150–8
  • 14. Chua KL, Hjerpe A. Polymerase chain reaction analysis of human papillomavirus in archival cervical cytologic smears. Anal Quant Cytol Histol 1995;17:221–9
  • 15. Cullen M, Boland JF, Schiffman M, Zhang X, Wentzensen N, Yang Q, et al. Deep sequencing of HPV16 genomes: A new high-throughput tool for exploring the carcinogenicity and natural history of HPV16 infection. Papillomavirus Res 2015;1:3–11
  • 16. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30:2114–20
  • 17. Sedlazeck FJ, Rescheneder P, von Haeseler A. NextGenMap: fast and accurate read mapping in highly polymorphic genomes. Bioinformatics 2013;29:2790–1
  • 18. Kumar S, Stecher G, Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol 2016;33:1870–4
  • 19. Burk RD, Harari A, Chen Z. Human papillomavirus genome variants. Virology 2013;445:232–43
  • 20. Sanjuan R, Nebot MR, Chirico N, Mansky LM, Belshaw R. Viral mutation rates. J Virol 2010;84:9733–48
  • 21. Ahmed AI, Bissett SL, Beddows S. Amino acid sequence diversity of the major human papillomavirus capsid protein: implications for current and next generation vaccines. Infect Genet Evol 2013;18:151–9
  • 22. Dube Mandishora RS, Gjotterud KS, Lagstrom S, Stray-Pedersen B, Duri K, Chin’ombe N, et al. Intra-host sequence variability in human papillomavirus. Papillomavirus Res 2018;5:180–91
  • 23. Pastrana DV, Vass WC, Lowy DR, Schiller JT. NHPV16 VLP vaccine induces human antibodies that neutralize divergent variants of HPV16. Virology 2001;279:361–9
