Skip to main content
Microbial Genomics logoLink to Microbial Genomics
. 2025 Jul 15;11(7):001451. doi: 10.1099/mgen.0.001451

‘Vivaldi’: an amplicon-based whole-genome sequencing method for the four seasonal human coronaviruses, 229E, NL63, OC43 and HKU1, alongside SARS-CoV-2

C Patrick McClure 1,2,3,*, Theocharis Tsoleridis 1,2,4, Jack D Hill 1,2, Nadine Holmes 2,5, Joseph G Chappell 1,2, Timothy Byaruhanga 1,2, Joshua Duncan 1,2, Miruna Tofan 2, Louise Berry 6, Gemma Clark 6, William L Irving 1,2,3,6, Alexander W Tarr 1,2,3, Jonathan K Ball 1,2,3,7, Stuart Astbury 3,8,*, Matt Loose 2,5,*
PMCID: PMC12263288  PMID: 40663464

Abstract

Prior to the emergence of SARS-CoV-2 in 2019, alphacoronaviruses 229E and NL63 and betacoronaviruses OC43 and HKU1 were already established endemic ‘common cold’ viral infections. Despite their collective contribution towards global respiratory morbidity and mortality and potential to inform the future trajectory of SARS-CoV-2 endemicity, they are infrequently sequenced. We therefore developed a 1,200 bp amplicon whole-genome sequencing scheme targeting all four seasonal coronaviruses and SARS-CoV-2. The ‘Vivaldi’ method was applied retrospectively and prospectively using Oxford Nanopore Technology to ~400 seasonal coronavirus infections diagnosed in Nottingham, UK, from February 2016 to July 2023. We demonstrate that the amplicon multiplex strategy can be applied agnostically to determine the complete genomes of five different species from two coronaviral genera. Three hundred and four unique seasonal coronavirus genomes of greater than 95% coverage were achieved: 64 for 229E, 85 for NL63, 128 for OC43 and 27 for HKU1. They collectively indicated a dynamic seasonal coronavirus genomic landscape, with co-circulation of multiple variants emerging and declining over the UK winter respiratory infection season, with further geographical distinction when compared to a global dataset. Prolonged infection with concomitant intra-host evolution was also observed for both alpha- (NL63) and betacoronaviruses (OC43). This data represents the largest single cohort of seasonal coronavirus genomes to date and also a novel amplicon scheme for their future global surveillance suitable for widespread and easy adoption in the post-SARS-CoV-2 era of viral genomics.

Keywords: 229E, common cold, HKU1, NL63, OC43, seasonal coronavirus


Impact Statement.

As SARS-CoV-2 makes the transition from pandemic to endemic viral infection, there is a significant requirement to better understand the trajectory of the established four seasonal coronavirus (sCOV) species to have previously made this journey. The dearth of available genome sequences for the alphacoronaviruses 229E and NL63 and betacoronaviruses OC43 and HKU1 was felt most acutely as SARS-CoV-2 emerged as a global threat in 2020, and their natural history was probed for future predictions. To address both the current gap in knowledge and also provide a straightforward assay for future adoption by others, we have developed a 1,200 bp amplicon whole-genome sequencing scheme, applying and validating it on a sCoV cohort of unprecedented size, spanning a broad sampling period before and after the emergence of SARS-CoV-2. We illustrate that the sweeping emergence of variants and significant intra-host evolution in the immunosuppressed is not a feature novel to COVID-19 infection but common to the established endemic sCoV and could have been determined pre-pandemic had there been greater genomic surveillance. Within the current perturbed global viral infection landscape, alongside the coronaviruses’ well-characterized ability to recombine, it is essential that the exemplary pandemic genomic surveillance undertaken is maintained and expanded for all sCoV species. This can be achieved both straightforwardly and cost-effectively using the Vivaldi scheme presented.

Data Summary

The genetic data that support the findings of this study is openly available in GenBank at https://www.ncbi.nlm.nih.gov/genbank/, accession numbers PQ449607, PQ466789–PQ466858, PQ611288–PQ611387, PQ629997–PQ630181 and PQ658381–PQ658412. Raw sequence reads for all sequencing attempts have been deposited in the NCBI Sequence Resource Archive under project ID PRJNA1269771.

Introduction

The Coronaviridae family belongs to the Nidovirales order of positive single-stranded RNA viruses with relatively large genomes (27–33 kb) and four genera (Alpha-, Beta-, Gamma- and Deltacoronavirus) within the Orthocoronavirinae subfamily, with Alpha- and Betacoronavirus genera members capable of human infection [1,2].

Prior to the emergence of SARS-CoV-2 in 2019, four other coronaviruses – alphacoronaviruses 229E and NL63 and betacoronaviruses OC43 and HKU1 – had previously established endemic status [3,6]. Collectively, these are responsible for a significant proportion of global respiratory infections as part of the informal ‘common cold’ viral grouping [7]. Whilst they present a broad spectrum of disease severity in all age groups, infections are typically but not exclusively symptomatically mild [8], which has in part led to their clinical and genomic under-investigation to the detriment of understanding the emergence and future trajectory of SARS-CoV-2 endemicity [7,9, 10].

To date, six 229E genotypes [1,6] and three for NL63 (A–C) [11,12] have been described for the alphacoronaviruses, whilst betacoronaviruses OC43 and HKU1 have been categorized into ten (A–K) and three (A–C) genotypes, respectively [12,16]. Genetic diversity is also generated by extensive recombination events within and between genera [11,15, 17].

Amplicon-based whole-genome sequencing (WGS), utilizing contiguous overlapping PCR products of various sizes, has previously been described for a wide range of viruses [18,24], yielding both population-level epidemiological insight and outbreak management with near-real-time capability [19]. There is a trade-off in the priming strategy between the increased sensitivity of smaller amplicon tiling versus the heightened chance of primer mismatch where a greater number of primers are used. Greater diversity in the targeted viral taxon, as observed in established endemic pathogens, creates a more challenging target preferentially necessitating fewer amplicons to span the genome [18,24]. Continuous viral evolution, especially noticeable during the early phases of rapid viral spread following a recent zoonotic spillover, can result in primer mismatches, necessitating careful surveillance for amplicon ‘drop-out’ and swift primer redesign [25].

Due to the perceived low medical threat posed by seasonal coronaviruses, their inclusion in diagnostic panels and molecular epidemiological surveillance programmes has been overlooked in some instances. Since the emergence of SARS-CoV-2, there is increasing interest in the seasonal coronavirus molecular and clinical epidemiology [10,26,28].

This study describes an amplicon-based approach to provide whole-genome sequences of the four seasonal coronaviruses alongside SARS-CoV-2. We have applied the methodology both retrospectively to a large cohort of archival extracts pre-SARS-CoV-2 emergence and prospectively to contemporary post-pandemic infections. The resulting sequence data provides a greater depth of insight into seasonal coronaviral variation, potentially informing future SARS-CoV-2 endemicity. Contemporary viral genomes and future uptake of the methodology will facilitate further understanding of seasonal coronavirus evolution in a post-pandemic immune landscape.

Methods

Samples

Surplus total nucleic acid from anonymized coronaviral-positive patient respiratory samples diagnosed as part of the routine care pathway at Nottingham University Hospitals National Health Service Trust (NUH NHST) was stored at −70 °C from February 2016 to December 2018 and May 2021 to July 2023 as previously described [29,30]. Samples were taken predominantly as throat swabs (65%) and also nasopharyngeal aspirates (21%). Prior to May 2021, the AusDiagnostics Respiratory 16-plex clinical diagnostic panel (REF 20602, AusDiagnostics, Australia) used could discriminate seasonal coronavirus type, but this was not always recorded by laboratory personnel, nor was the assay’s absolute quantitative template copy number output. Due to potential clinical confusion and focussing of diagnostic resources, seasonal coronaviruses were not routinely investigated at the onset of the SARS-CoV-2 pandemic between February 2020 and April 2021, whereafter no diagnostic distinction was made between 229E, NL63, OC43 and HKU1 with a newly configured AusDiagnostics respiratory multiplex quantitative reverse transcription PCR (RT-qPCR) in use. Available nucleic acid with a recorded seasonal coronavirus type and a diagnostic laboratory quantitation of greater than ten copies per 10 µl RNA was selected from the 2016 to 2018 archive (n=377 from 989 retrieved samples), with the exception of HKU1, where all available positives were tested (n=23). Subsequently, all available diagnosed seasonal coronavirus-positive extracts were investigated from the 2021 to 2023 sub-cohort. Anonymized diagnostic laboratory PCR results were curated and analysed in Excel. NUH NHST approved an extended molecular investigation of diagnosed coronavirus positives under clinical audit number 23-078C.

Amplicon scheme primer design

All seasonal coronavirus genomes deposited in GenBank by July 2021 (n=344) were downloaded, aligned as below and used to construct maximum-likelihood trees, from which five distinct lineage sequences were chosen and concatenated with different SARS-CoV-2 variant sequences to generate five templates for PrimalScheme [21]. PrimalScheme was then instructed to generate primer sequences targeting 1,200 bp regions of the concatenated template, covering all four seasonal coronaviruses and SARS-CoV-2 genomes. The 278 primer scheme output was manually inspected in mega7 for mismatch against the alignment of all available reference genomes. Where a 3′ mismatch was observed in a significant proportion of the global dataset, primers were either edited to include degeneracy or redesigned at an alternative better-conserved location. Amplicon coverage was subsequently reviewed with each sequencing run, redesigning primers where amplicon drop-out or excessively lower coverage was observed. Amplicon balancing was further attempted in a minority of instances by doubling or halving primer concentration where coverage remained significantly low or high, respectively. Final selected primer sequences and relative concentrations are listed in Table S1 (Supplementary Material 1).

cDNA synthesis, PCR and WGS

Coronaviral cDNA was prepared with RNA to cDNA EcoDry Premix containing random hexamers (Takara Bio) as per the manufacturer’s instruction, then processed similarly to previously described [20]. Briefly, up to 2.5 µl of cDNA was used as a template in a 25 µl PCR reaction assembled with Q5® High-Fidelity 2X Master Mix (New England Biolabs) and either primer pool 1 (odd-numbered primer pairs) or 2 (even-numbered primer pairs) as detailed in Supplementary Material 1. Primer pools either contained oligonucleotides specific to individual coronaviral species (targeted Vivaldi) or all five species combined (Full Vivaldi), at a final concentration of 0.015 µM unless stated otherwise (Table S1, Supplementary Material 1). PCR reactions were thermocycled as follows: 98 °C for 30 s, then 45 cycles of 98 °C for 15 s and 65 °C for 5 min. PCR products from reactions 1 and 2 were inspected for specificity and yield on a 2% agarose gel with ethidium bromide and subsequently combined before Qubit quantification and normalization to 100 ng of DNA in 7.5 µl of water. Amplicons were sequenced as previously described [19]. Briefly, barcoded sequencing libraries were prepared using the Ligation Sequencing Kit (Oxford Nanopore Technologies; SQK-LSK109) and the Native Barcoding Expansion 96 Kit (Oxford Nanopore Technologies; EXP-NBD196). Up to 96 barcoded samples (range: 22–96) were sequenced over either one MinION flow cell (Oxford Nanopore Technologies; FLO-MIN106 R9.4.1) on the GridION X5 mk1 platform or one PromethION flow cell (Oxford Nanopore Technologies; FLO-PRO002 R9.4.1) on the PromethION P24 platform. Default sequencing parameters were used apart from barcoding options, which were set to barcode both ends. Some smaller batches of samples were sequenced over a partial PromethION flow cell, washed using the Flow Cell Wash Kit (Oxford Nanopore Technologies; EXP-WSH004). To eliminate the possibility of carryover between sequencing runs, no more than 96 samples, each with unique barcodes, were run on a single flow cell.

Sequence, phylogenetic and recombination analysis

Following basecalling using Guppy (v6.5.7), demultiplexed reads passing the quality threshold (average score>7) were used as the input for the ARTIC pipeline (v1.2.4) [21]. Briefly, reads were filtered to remove those below 700 bp and above 1,400 bp and aligned to a reference.fasta containing the four seasonal coronaviruses and SARS-CoV-2 using Minimap2 [31]. A .bed file corresponding to the primers used in the scheme was then used to softmask resulting alignments via the ARTIC align_trim script to ensure that variants were not called in primer sites. Fasta and bed files can be downloaded from https://github.com/stuartastbury/vivaldi. Finally, the soft-masked alignment was used as the input for nanopolish [32] to generate a consensus sequence which was then aligned to the reference using muscle [33]. A coverage threshold of 20× was used to call drop-outs, with sites falling below this removed from the consensus sequence and replaced with ‘N’.

Genomes were aligned for the presented analysis using the Geneious Prime 2019.0.4 software with the relevant seasonal coronavirus species genomes deposited in GenBank by December 2023 (n=543). Maximum-likelihood trees were generated to visualize the evolutionary relationships between high-quality (>95% complete) study and publicly available genomes with IQ-TREE2 using the following models of evolution as suggested by the software’s model finder, respectively: 229E – TIM+F+R2; NL63 – TIM+F+R3; OC43 – TIM+F+R6; HKU1 – TN+F+R2, with 1,000 bootstraps of the Shimodaira–Hasegawa approximate likelihood ratio test (SH-aLRT). Seasonality was assigned for all sequences with >95% coverage, using the first instance of infection in serially sampled patients only. The core season was defined as October to May (e.g. 16/17 for samples sequenced between October 2016 and May 2017), whilst June to September was considered out of typical season (e.g. 2017 for a sequence generated in June 2017). Snip-it plots from the CIVET tool (https://github.com/artic-network/civet) were used to illustrate genetic difference between sequences derived from prolonged infection [34].

The occurrence of recombination in UKN18_NL63_24 was assessed using RDP4.97 (http://web.cbio.uct.ac.za/~darren/rdp.html). Genomes were aligned with the Clustal W algorithm with manual adjustment, to accommodate length polymorphisms in the sequences. Phylogenetic trees for major and minor parental strains were assessed using unweighted pair group mean averages.

Figure visualization and data availability

Figures were generated in R v4.3.2 using the package ggplot2 v3.5.1. Study sequences have been deposited in GenBank as follows: 229E: PQ449607 and PQ466789–PQ466858; NL63: PQ611288–PQ611387; OC43: PQ629997–PQ630181; HKU1: PQ658381–PQ658412. Raw sequence reads for all sequencing attempts have been deposited in the NCBI Sequence Resource Archive under project ID PRJNA1269771, with this data summarized in Table S2 (Supplementary Material 1). See supplementary materials for scheme primer and bed files (Supplementary Material 1) .

Results

Seasonal coronavirus diagnosis in a regional UK diagnostic lab

Seasonal coronaviruses represent a significant proportion of the viral respiratory pathogens identified in the NUH NHST diagnostic laboratory. Since their inclusion in the respiratory diagnostic RT-qPCR multiplex in February 2016, through to the study end in July 2023, 1,540 positives have been recorded in c. 62,000 tests (data not shown), representing 2.5% of all samples tested and 5.75% of those with a diagnosed viral infection.

Pre-pandemic, broadly similar numbers of coronaviral positives were identified in each yearly infection season, most abundantly in one of the first 3 months of the year (Fig. 1), with a peak seen in March of 2016, February of 2017 and January of both 2018 and 2019. However, infections were recorded all year round but are significantly decreased to a background level of sporadic cases outside of the wider coronaviral season of October to May. Post-pandemic, considerably fewer samples were submitted for seasonal coronavirus testing due to reconfigured diagnostic practice, reducing discrimination of seasonality. Nonetheless, 2023 appeared to indicate seasonal corona infections peaking in January as previously.

Fig. 1. Diagnosed seasonal coronavirus infections, NUH NHST UK, February 2016 to July 2023. Seasonal coronavirus testing was undertaken with the AusDiagnostics Respiratory 16-plex assay beginning in February 2016 and able to discriminate seasonal coronavirus type, but this was not always recorded, in which case ‘sCoV’ was indicated. In February 2020, seasonal coronavirus detection was switched off to avoid clinical confusion at the onset of the SARS-CoV-2 pandemic. Seasonal coronavirus testing was resumed in May 2021 in a more selective capacity, whereafter no diagnostic distinction was made between 229E, NL63, OC43 and HKU1 on a newly reconfigured AusDiagnostics panel. All available samples collected post-SARS-CoV-2 emergence were attempted for WGS, and where successful, the type is presented. Contemporary local SARS-CoV-2 test results are not shown.

Fig. 1.

Whilst seasonal coronavirus type was only noted for approximately half (798) of recorded positives (1,531), with further periodic unevenness (e.g. mid July 2018 onwards no discrimination made in laboratory records), OC43 (312) and NL63 (308) were indicated in approximately equal numbers. In contrast, 229E was observed much more infrequently with 80 positives, and HKU1 was the rarest of types with just 33 recorded infections. With typing determined by this sequencing study, all four seasonal coronaviruses were again detected post-pandemic, with OC43 the most abundant and in line with more extensive seasonal coronavirus epidemiological studies reported in the UK [7] and USA [10] over similar time periods.

A method to sequence seasonal coronaviruses

To interrogate the molecular epidemiology of the seasonal coronaviruses detected at NUH NHST, a novel 1,200 bp amplicon sequencing scheme to generate near-complete genomes was developed and applied to available typed and untyped archived nucleic acid extracts.

In total, 402 unique samples were sequenced across 700 reactions summarized in Table 1 and Fig. 2.

Table 1. Summary of coronaviral sequencing investigations and genomic coverage achieved.

All sequencing reactions attempted*
Genome coverage†, ‡
CoV type Total Median % n>90 %>90 n>95 %>95 n>99 %>99
229E 115 99.98 99 86.09 88 76.52 72 62.61
NL63 233 96.35 178 76.39 167 71.67 85 36.48
OC43 265 96.65 207 78.11 166 62.64 106 40.00
HKU1 78 97.75 68 87.18 54 69.23 26 33.33
SARS-CoV-2 9 93.27 7 77.78 6 66.67 5 55.56
Grand total 700
Unique samples with final scheme iteration§
Genome coverage
Virus Total Median % n>90 %>90 n>95 %>95 n>99 %>99
229E 71 99.99 69 97.18 64 90.14 58 81.69
NL63 100 99.83 86 86.00 85 85.00 65 65.00
OC43 185 97.87 149 80.54 128 69.19 94 50.81
HKU1 38 99.21 29 76.32 27 71.05 25 65.79
SARS-CoV-2 8 96.68 7 87.50 6 75.00 5 62.50
Grand total 402

*Includes preliminary and iterated versions of the scheme performed during the optimization process and samples sequenced with both single targeted primer sets and all five primer sets together (see also Fig. S1, Supplementary Material 2).

†% of nucleotide positions with ≥20 reads.

‡Average read number for all sequences attempted was 120,644 (229E), 93,318 (NL63), 84,866 (OC43), 94,796 (HKU1) and 136,450 (SARS-CoV-2).

§From 372 unique patients, with 21 sampled two or more times.

Fig. 2. Genome coverage of seasonal coronaviruses by the ‘Vivaldi’ 1,200 bp amplicon scheme. The genome coverage of each unique sample for seasonal coronaviruses 229E, NL63, HKU1 and OC43 (y-axis), and additionally SARS-CoV-2, expressed as a percentage of the total region targeted by the amplicon scheme is shown compared to the log template copy number calculated by the diagnostic assay (copies per 10 µl RNA, AusDiagnostics, x-axis). Diagnostic assay quantities were not recorded for 21 samples sequenced and thus excluded from this dataset (n=381). Where multiple attempts were made to sequence the same sample either in scheme development or when comparing ‘full’ with ‘targeted’ schemes, the highest coverage output is presented.

Fig. 2.

The relatively higher prevalence of OC43 and NL63 was reflected in both attempted and ultimately unique sequenced genomes. High-quality (>95%) genomes were achieved for 69% (OC43) to 90% (229E) of coronaviral samples, whilst complete amplicon coverage (and near complete genomes) was achieved in 51–82% of cases. However, all 402 sequenced samples generated at least 25% of the genome, facilitating conclusive typing, although further samples were initially attempted by PCR, but failed to generate any observable amplicons and were thus not progressed to sequencing (data not shown).

Complete genome sequences could be retrieved over the wide dynamic range of recorded diagnostic lab template copy inputs for all four seasonal coronavirus types, from as little as ten copies to in excess of a million (Fig. 2, NB c. 5% of copy numbers were not recorded in the diagnostic lab records). 229E and HKU1 samples with less than 90% coverage had a copy number value of c. 100 copies (per 10 µl) template or less. By contrast, NL63 and OC43 samples often yielded sequence data with low coverage, even at relatively high copy number, suggesting primer mismatch as the likely cause (Fig. 2).

A limited subset (n=46) of predominantly post-pandemic samples were amplified and sequenced both with all five primer sets combined in each pool (the Full Vivaldi, containing 70 and 69 primer pairs in pools 1 and 2, respectively) and in some cases additionally with the conventional single primer sets (targeted Vivaldi, containing 13–15 primer pairs in each pool) (Fig. S1 and Table S1, Supplementary Material 2 and Supplementary Material 1 respectively). The Full Vivaldi approach did yield some full genomes but was more prone to produce incomplete genomes compared to the targeted approach. It was, however, still capable of typing seasonal coronavirus positives (Fig. S1, Supplementary Material 2). When tested, all but two of the partial genomes could be converted to near-complete (>96% coverage) by repeating amplification with targeted primers.

The average read depth of the finalized schemes was relatively consistent across both amplicons and species (Fig. S2, Supplementary Material 2). The vast majority of amplicons returned were within a tenfold depth range of 1,000- to 10,000-fold, with only amplicon 2 of 229E significantly outlying this range at 71,621-fold. Conversely, amplicons 20 of OC43 and 23 of SARS-CoV-2 performed the least well; however, they still delivered average depths of 278- and 235-fold, respectively (Fig. S2, Supplementary Material 2). The priming schemes performed similarly either when all five were multiplexed as the Full Vivaldi, or separately, targeting a specific coronaviral template (Fig. S3, Supplementary Material 2).

A sample positive for both OC43 (3,431 copies per 10 µl template) and SARS-CoV-2 (959 copies) generated complete coverage for the OC43 genome (UKN23_OC43_5) and a partial SARS-CoV-2 genome (65.95%) when amplified by the Full Vivaldi method. The SARS-CoV-2 genome coverage could be improved to 83.36% (UKN23_CoV2_1) when amplified with the targeted scheme, whilst the OC43 targeted scheme maintained complete coverage.

A small number of patients infected with OC43 were sampled and sequenced within a typical acute phase timeframe of infection and generated paired genomes of >95% coverage, allowing insight into short-term intra-patient virus evolution and accuracy of base-calling across the 30,202 sequenced positions. Three paired samples taken 1 (n=2) and 3 days apart exhibited no differences. A further two patients saw single differences over 2- and 7-day timepoints, whilst the sixth patient presented two mutations in a 6-day interval. Taken together, these limited changes are within the expected mutational rate of seasonal coronavirus genomes [35] and indicative of the high fidelity of the amplicon method.

Genetic epidemiology of seasonal coronaviruses in Nottingham, UK, 2016–2023

The developed amplicon scheme thus generated a considerable number of genomes across the pre- and post-pandemic years at our single regional centre, even when compared to the entire available global dataset. High-quality sequences (>95 %) from both the study (64 for 229E, 85 for NL63, 128 for OC43 and 27 for HKU1) and GenBank were aligned and used to construct phylogenetic trees (Figs36). Lineages were assigned as per previous studies, as summarized by Ye et al. [16].

Fig. 3. Phylogeny of 229E seasonal coronavirus. Phylogenetic relationship by the maximum-likelihood method of all Nottingham, UK, study sequences with >95% coverage (blue text) and publicly available genomes (black text) retrieved from GenBank in December 2023 [identified by accession number/country (with the region where noted)/year]. Numbers to the right of tree nodes indicate SH-aLRT branch support, with values <70 not shown. Branch lengths are drawn to a scale of nucleotide substitutions per site, with scale indicated. Tentative genotypes based on well-supported clustering (>90 branch support), with lineage exemplar sequences as presented by Ye et al. [16], are indicated by vertical bars.

Fig. 3.

Fig. 6. Phylogeny of OC43 seasonal coronavirus. Phylogenetic relationship by the maximum-likelihood method of all Nottingham, UK, study sequences with >95% coverage (blue text) and publicly available genomes (black text) retrieved from GenBank in December 2023 [identified by accession number/country (with the region where noted)/year]. Numbers to the right of tree nodes indicate SH-aLRT branch support, with values <70 not shown. Branch lengths are drawn to a scale of nucleotide substitutions per site, with scale indicated. Tentative genotypes based on well-supported clustering (>90 branch support), with lineage exemplar sequences as presented by Ye et al. [16], are indicated by vertical bars.

Fig. 6.

229E

Alphacoronaviral 229E study sequences broadly group into three well-supported clades (Fig. 3; genotypes 6, 7a and 7b). Three sequences from 2016 and one from January 2017 cluster with contemporary samples from the USA (2016 and 2017) and Germany (2015) and Vietnam (2013) assigned to genotype 6 in previous studies [11,16]. Sparse sequence number representation, relatively long branch lengths and additional well-supported nodes are suggestive of an unsampled diversity within 229E genotype 6. However, the bulk of study sequences situate in a previously noted emerging lineage 7 [16] and are further segregated into two well-supported and populated clusters, nominally 7a and 7b, both of which contain study genomes from 2016.

Interestingly, putative genotype 7a was comprised predominantly from our UK study sequences and none from the most extensive prior global study of 229E genomes, set in China immediately preceding the pandemic [16]. Genotype 7a was only sampled elsewhere in the USA between 2015 and 2017 and not after the summer of 2018 in our study, having been the significant majority variant in the core 2016/2017 coronaviral season (Fig. 7) . In contrast, genotype 7b was also seen from the outset of our study period, in samples derived from the USA and the key pre-pandemic Chinese study [16], as well as post-SARS-CoV-2 emergence in Nottingham, as well as China, Russia and the USA.

Fig. 7. Lineage counts of seasonal coronaviruses by season. The lineages of 229E, NL63, HKU1 and OC43 seasonal coronaviruses (panels a to d, respectively) were determined for each genome sequence of greater than 95% coverage, based on well-supported clustering (>90 branch support) with lineage exemplar sequences as presented by Ye et al. [16]. Core infection season was defined as October to May (e.g. 16/17 for samples collected between October 2016 and May 2017), whilst June to September was considered out of typical season (e.g. 2017 for a sequence generated in June 2017).

Fig. 7.

Increasing phylogenetic granularity, we can observe a single UK study sequence UKN18_229E_4 (from a 20-year-old patient in June, outside of typical UK coronaviral season) sitting within an extensive sub-cluster of Chinese sequences. Similarly, the earliest study sequence UKN16_229E_1 is more closely related to 2017 and 2018 Chinese sequences than anything else from the UK. Conversely, well-supported and populated 7b sub-clusters of predominantly UK sequences with accordingly sparse Chinese representation can be observed. More recent 2022 and 2023 study sequences are found in these distinct genotype 7b clades.

Taken together, the genetic epidemiology of 229E supports the previously suggested notion of genetic drift [36], potentially with local lineage displacement of 7a by 7b (Fig. 7) suggesting a more nuanced segregation of geographical transmission at regional levels.

NL63

With the significant NL63 cohort presented here and elsewhere recently [16], previously undetected diversity in circulating NL63 could also be observed. Diverse B and C lineages were sequenced throughout the study timeline (Fig. 4), with many subtypes detected within the most heavily sampled 2017/2018 core season (Fig. 7). Notably in lineage B, the increased sampling supports an expansion of nomenclature towards the designation of subtypes, similar to lineage C [37]. We have tentatively annotated the well-supported lineage B clusters as subtypes B1 and B2 to assist with discrimination here and potentially elsewhere.

Fig. 4. Phylogeny of NL63 seasonal coronavirus. Phylogenetic relationship by the maximum-likelihood method of all Nottingham, UK, study sequences with >95% coverage (blue text) and publicly available genomes (black text) retrieved from GenBank in December 2023 [identified by accession number/country (with the region where noted)/year]. Numbers to the right of tree nodes indicate SH-aLRT branch support, with values <70 not shown. Branch lengths are drawn to a scale of nucleotide substitutions per site, with scale indicated. Tentative genotypes based on well-supported clustering (>90 branch support), with lineage exemplar sequences as presented by Ye et al. [16], are indicated by vertical bars.

Fig. 4.

Like its counterpart Alphacoronavirus 229E, NL63 exhibits geographical segregation within its subtypes. For instance, UKN17_NL63_22 is situated in a well-supported subgroup of B1 with reference sequences from China (2016–2019) and Japan (2018), whilst the bulk of B1, the ‘emerging cluster’ reported by Ye et al. [16] with many 2016–2019 Chinese samples, contains no other UK study sequences despite our extensive sampling in a similar timeframe. Within lineage B, our UK sampling presents most commonly as B2, with only single representatives from the USA (2015) and Japan (2017), in contrast to multiple 2017, 2018, 2022 and 2023 UK study sequences.

A similar picture presents with lineage C2, indicating a well-supported sub-cluster containing predominantly study sequences from 2016 to 2018 and 2021 to 2022, alongside contemporary US (2019), Japanese (2019) and Russian (2022) references. Elsewhere, more limited numbers of further C2 and C3 sequences were observed pre- and post-pandemic, alongside more numerous Chinese and Japanese reference samples. Intriguingly, a single study sample UKN18_NL63_24 stands distinctly from all others (Fig. 4), potentially representing a novel intra-species recombinant (Fig. S4, Supplementary Material 2) as extensively analysed previously [17,38].

HKU1

HKU1 is the least sequenced seasonal coronavirus globally due to an accordingly lower detection rate, and therefore, even our relatively limited contribution of 27 high-quality (>95% coverage) genomes is significant (Fig. 5). Genotypes A (n=13) and B (N=14) were detected in approximately equal numbers across the study period (Fig. 5), presenting in sub-clusters with other contemporary sequences; we did not detect the recombinant and, to date, rarest genotype C [14,39].

Fig. 5. Phylogeny of HKU1 seasonal coronavirus. Phylogenetic relationship by the maximum-likelihood method of all Nottingham, UK, study sequences with >95% coverage (blue text) and publicly available genomes (black text) retrieved from GenBank in December 2023 [identified by accession number/country (with the region where noted)/year]. Numbers to the right of tree nodes indicate SH-aLRT branch support, with values <70 not shown. Branch lengths are drawn to a scale of nucleotide substitutions per site, with scale indicated. Tentative genotypes based on well-supported clustering (>90 branch support), with lineage exemplar sequences as presented by Ye et al. [16], are indicated by vertical bars.

Fig. 5.

Genotype A study samples from 2016 to 2018 and 2022 segregated with high bootstrap support alongside contemporary Japanese (2014 and 2019), US (2015 and 2021), Chinese (2018) and Russian (2022) references, hinting at the extinction of more historically detected genotype A lineages. Genotype B also offers only limited granularity currently, although the two most recent 2023 study samples segregate with earlier UK study genomes, separate from a cluster of US sequences seen at the onset of the SARS-CoV-2 pandemic [28]. Other genotype B study sequences appear more similar to contemporary Chinese (2015–2018), Japanese (2016), Australian (2019), Thai (2017) and US (2017, with a temporally outlying 2005 sequence) isolates.

OC43

OC43 is the most extensively sequenced seasonal coronavirus globally, contributing towards a more complicated genotypic nomenclature. We observed study sequences segregating with genotypes E, G, I, J and K both across the entire study and all within the most sampled 8-month core season of 2017/2018 (Figs6 7). The seasonal lineage data is again suggestive in the better-sampled pre-pandemic era of a predominance in 2016/2017 of genotype I being displaced in the following 2017/2018 season by the emergent genotype K [15]. Post-pandemic, a parity of genotype K with the additional emerging genotype J [15] in the 2021/2022 season gives way to a predominance of genotype J in the most recent core season of 2022/2023. Interestingly, sample UKN18_OC43_38 stands alone on a long, well-supported branch, suggestive of further unsampled diversity in circulating viruses.

Intragenotypic temporal geographical segregation can again be observed, with genotype E infrequently detected elsewhere globally, but featuring as the second most abundant lineage in our most extensively sampled 2017/2018 season (Figs6 7), in stark contrast to an absence of genotype E in 74 contemporary reference genomes from China [16]. Genotype J indicated some additional well-supported clades, potentially indicative of further emerging lineages set to predominate in future seasons.

Prolonged infection by both seasonal alpha- and betacoronaviruses

Several patients presented multiple samples of the same seasonal coronavirus type, although all but two of the serial samplings were within a 3-week timeframe, and most within a few days (data not shown). However, one 23-year-old patient was determined to be NL63-positive over 195 days and 7 consecutive timepoints in 2017 (UKN17_NL63_1, 6, 7, 15, 16, 18 and 19), whilst another 35-year-old was OC43-positive over 164 days and 4 consecutive timepoints in 2023 (UKN23_OC43_7, 12, 15 and 17). The sequential samples clustered closely on their respective trees (Figs4 6; UKN17_NL63_16 and 19 with <95 % coverage were excluded from Fig. 4). Both patients were noted to have been under respiratory surveillance post-bone marrow transplant and thus highly likely to be immunosuppressed. Two weeks after the final positive timeline, the OC43-positive individual was negative for all targets in the viral respiratory multiplex RT-qPCR. The NL63-positive patient was not screened again after the final positive timepoint.

The proximity of such genetically related positive samples suggested a prolonged infection as previously presented with SARS-CoV-2 [40], so the sequence differences were investigated in isolation for intra-patient evolution (Fig. 8). The alphacoronaviral NL63 patient presented an identical genome 22 days after the initial detection. However, as the infection continued, 18 mutations arose in ORF1ab [n=3, 2 non-synonymous (NS)], Spike (8, all NS), NS3 (2, 1 NS), Matrix (1 NS) and Nucleocapsid (4, 2 NS and a 9 bp). Half of the mutations, involving all genes and including the three-amino-acid nucleocapsid deletion, appeared in the last timepoint.

Fig. 8. Putative prolonged seasonal coronaviral infection. Two sets of genome sequences were derived from the same individuals on dates suggestive of prolonged single infections, for both Alphacoronavirus NL63 (panel a) and Betacoronavirus OC43 (panel b). Sequence differences were visualized by Snip-it plots from the CIVET tool (https://github.com/artic-network/civet), using the first sample as the ‘Day 0’ timepoint and reference sequence. Genome nucleotide positions and genes are indicated relative to reference sequences LC488390 and LC756670 for NL63 and OC43, respectively. ‘N’ indicates sequence read drop-out, and ‘−9’ indicates a 9 bp deletion in the final NL63 nucleocapsid sequence.

Fig. 8.

Similar early and late time course mutation patterns were observed for the Betacoronavirus OC43, with no differences seen in the initial acute phase at 9 days. In contrast, mutations at nine sites were observed at the 107-day timepoint and 14 further at 164 days, in ORF1ab (n=10, 8 NS), NS2a (1 NS), Spike (10, 8 NS), Envelope (1 NS) and Membrane (2 NS).

Co- and re-infected patients

In addition to the above-mentioned OC43/SARS-CoV-2 co-infected patient, Full Vivaldi sequencing indicated a further individual positive for both alpha- and betacoronaviruses (NL63 and HKU1), previously undetected by the non-discriminatory diagnostic assay, with a combined seasonal coronavirus genomic template value of 1,771 copies per 10 µl. A complete HKU1 genome and 47% of the NL63 genome were recovered by the Full Vivaldi method, elevated to 82% by subsequent NL63-exclusive sequencing.

Further to the consecutive timepoint NL63 and OC43 infections, four instances of multiple heterologous coronaviral infections in the same season were observed, all with NL63 and OC43 (data not shown). These ranged from as little as 38 to as many as 128 days apart, and all were observed within the most heavily sampled 2017/2018 season.

Discussion

The scale and rapidity of the sequencing efforts since the start of the SARS-CoV-2 pandemic [41,42], accompanied by the extensive media coverage and its impact on government health policies, have underscored the importance of viral genomics. Whilst there has been significant global attention on this novel coronavirus spillover, there has been a notable absence of genomic data for the four endemic seasonal coronaviruses. This represents a critical gap in our knowledge that could provide insights into the future course of SARS-CoV-2 in the human population [35,43, 44]. Our study delivers a simple method to improve ongoing surveillance of seasonal coronaviruses and contributes substantially by providing ‘backfill’ of recent pre-pandemic genomes of all four species.

To reproducibly detect whole genomes from seasonal coronavirus strains [45,47] in unenriched clinical samples, demanded not only an amplicon approach, but also a fine balance between minimizing priming sites and maximizing the sensitivity of detecting smaller RNA fragments. We thus opted for a 1,200 bp amplicon scheme as used previously for SARS-CoV-2 [20], rather than the commonly adopted 400 bp amplicon scheme [19,21], larger amplicon schemes [18,23, 24] or an unbiased metagenomic approach [28,48]. Initial attempts on even a small batch of clinical samples can quickly identify amplicon drop-out, whilst careful continued evaluation is required to guard against evolving drop-out such as seen previously in SARS-CoV-2 [25], and similarly seen in our cohort in post-pandemic 229E infections.

This approach has high sensitivity, with whole-genome retrieval from samples with less than 100 genome copies of input, in line with previously reported sensitivity of the 400 bp amplicon of 50 copies per reaction [21], and likely orders of magnitude more sensitive than metagenomic approaches to coronaviral genome sequencing [28]. The comparable sensitivity of our larger 1,200 bp to previous 400 bp schemes can be explained in part by increasing the amplification cycle number from the typically used 35 to 45 (data not shown [20,21]). Concerns about the introduction of PCR error and/or detection of contaminating sequences by elevated cycle number can be minimized by not only the consistent use of negative controls [49], but also the use of proofreading enzymes [19]. The minimal differences observed in our closely but independently sampled OC43 infections – four differences across six paired c. 30 kb genomes – indicate a robust degree of fidelity in the method, but this could be further tested with clonal templates to probe error rates. Similarly, to metagenomic sequencing, strong signals of only fractional parts of a coronaviral genome should be treated with suspicion [50].

To maximize the scope of our method, we ensured that our scheme could target an unprecedented five near-complete species’ genomes from two coronaviral genera in the minimal two-reaction format [21]. To our knowledge, this is the broadest complete genome amplicon method presented to date and showcases the considerable potential of the methodology in sensitively targeting multiple viruses in whole, or part [51]. We validated this by confirming clinically diagnosed coronavirus co-infections and identifying others missed by RT-qPCR multiplex assay panels, which can have poor discriminatory potential.

Combining our novel method and extensive diagnostic surplus archive, we have generated the largest single cohort of seasonal coronaviral genomes to date. Thus, when combined with the global dataset, and notably other contemporary in-depth studies from China [16] and the USA [28], we were able to gain greater insight into the ebb and flow of seasonal coronaviral genotypes/lineages, such as has been monitored with unprecedented detail with the analogous SARS-CoV-2 variants. In contrast to the currently observed pattern of emergence and apparent total selective sweeps of SARS-CoV-2 lineages [35,52], the seasonal coronaviruses were observed to present a more complex co-circulation of genetically distinct lineages rising and falling in predominance, sometimes with contemporary geographical variation and prevalence. For example, 229E saw a progressive sweep in our UK cohort through lineages 6 and 7a pre-pandemic, to exclusively 7b post-pandemic; however, 7b was predominant in a pre-pandemic Chinese cohort [16]. Similarly, OC43 lineage I was displaced in predominance by the closely genetically related lineage K [15] between the 2016/2017 and 2017/2018 seasons. Whilst lineage J, a variant of precursor lineage H [15], was also present in a minority in both these seasons, it presented as the overwhelming OC43 lineage of the 2022/2023 UK season. It was generally notable that the season that we sampled most extensively (2017/2018) also exhibited the greatest diversity of both alphacoronaviral NL63 and betacoronaviral OC43 lineages, further emphasizing the need for greater genomic surveillance to capture the true extent of circulating seasonal coronavirus variability. Underlining this point was the high prevalence of OC43 genotype E in 2017/2018, infrequently observed elsewhere globally in recent sampling, but strikingly associated with a fatal encephalitis case previously also in the UK in 2011 [53].

It has been noted for NL63 that genotype switching was not a prerequisite to re-infection of an individual [48], supported by phylogenetic analysis and in contrast to 229E and OC43 [44]. The selective force on 229E to antigenically evolve has been further demonstrated in vivo with historical sera failing to neutralize more contemporary isolates through significant variation in receptor-binding domains [43], with our data indicating that the trend continues. Although an older study suggested re-infection by the same 229E strain was indeed possible in the short term [54], this was in a highly controlled experimental setting and in direct contradiction to a similar earlier study in the 1980s [55]. As noted elsewhere [44], HKU1 data is insufficient to draw conclusions about its evolutionary trajectory. However, the novel method and relatively large HKU1 cohorts presented here and recently elsewhere [28] could redress this imbalance in future studies. However, more broadly, the absence of a recombinant HKU1 A/B lineage C extant around the time HKU1 was discovered [6,14] and was in agreement with other key contemporary studies [28,39].

Overall, this may suggest that dominating SARS-CoV-2 lineages in well-sampled regions may not subsequently be swept to extinction but could continue undetected circulation and further evolution in certain populations before re-emergence in others. The unexpected appearance and subsequent domination of novel SARS-CoV-2 variants Delta and Omicron were indeed suggestive of transmission and evolution in unsampled populations [42,56, 57] or longer-term intra-host evolution in immunosuppressed individuals as discussed below. The significant differences between the genomes in this single-centre study and the other largest contemporary seasonal coronavirus cohort to date, from 36 hospitals in Beijing with a catchment of 25 million people [16], underscore the need for greater and broader surveillance of respiratory viral infections, not only for seasonal coronaviruses, but other pathogens also [58].

The effect of non-pharmaceutical interventions to reduce pandemic transmission of SARS-CoV-2 has had a significant effect on the well-monitored circulation of influenza, notably with the apparent disappearance of influenza B Yamagata [59]. These measures may also have exerted pressure on some seasonal coronavirus lineages, with a significant delay in the typical US seasonal onset observed in 2020/2021 attributed to the SARS-CoV-2 pandemic [10]. Pre-pandemic patterns of alternating seasonal betacoronaviral dominance of OC43 and HKU1, punctuated by parity [10], may be further perturbed by significantly enhanced population immunity (by infection and/or vaccination) of SARS-CoV-2 as a third betacoronavirus [60]. But ultimately, immunity against re-infection, cross-protective or otherwise, is short-lived [61], whilst protection against severe disease remains [60,62].

Paradoxically, ‘out of season’ seasonal coronaviral infections were observed. In some instances, phylogenetic analysis was suggestive of importation to the UK by international travel, as we and others reported previously for SARS-CoV-2 [41,42]. Conversely, there may be continued low-level circulation within the UK and elsewhere. In other instances, out-of-season positives were the result of likely persistent infections acquired by the immunocompromised within the typical epidemic months. Long-term SARS-CoV-2 infection of individuals (especially immunosuppressed), leading to intra-patient virus evolution, has been described [40,63] and offered as a potential mechanism for the emergence of some highly evolved variants of concern [63,64]. We similarly demonstrate that increased genomic sequencing of seasonal coronaviruses in even our single-centre cohort can reveal prolonged infection and evolution in both beta- and alphacoronaviruses. Perhaps most strikingly, we observed a 9 bp nucleotide deletion immediately prior to the start of the C-terminal domain of the nucleocapsid in the disordered linker region responsible for RNA binding and oligomerization [65,66] in the last timepoint of our chronically infected NL63 patient. This had parallels in SARS-CoV-2, with a 3 aa nucleocapsid deletion relative to the ancestral Wuhan strain being a feature of the Omicron lineage, albeit in the N terminal domain [67]. The recent spillover of a novel canine coronavirus in Malaysia also reported a larger 12 aa deletion in the middle of the N protein, hypothesized to be associated with adaptation to a new host [68], similar to previously reported nucleocapsid deletions (and insertions) found to determine nuclear localization in MERS and SARS-CoV-1 infections [69]. However, caution must be taken in extrapolating any isolated changes in immunocompromised patients to effects on transmissibility in the wider population.

In conclusion, given the recent spillover of SARS-CoV-2, its uncertain evolutionary path, the concurrent circulation and potential co-infection with endemic coronaviruses and the propensity of coronaviruses for genetic recombination, there is a pressing need for enhanced surveillance of seasonal coronaviruses. The methodology outlined in this study closely resembles widely accepted amplicon schemes used for SARS-CoV-2, but with a notable advancement: it offers unprecedented coverage across two genera and five species. Furthermore, it has demonstrated high sensitivity in both pre- and post-pandemic cohorts of all four seasonal coronaviruses. Therefore, it is well-suited for widespread adoption, serving as a valuable tool to complement the increasing genomic sequencing efforts of other respiratory viruses. This will provide crucial insights for informed epidemiological analyses and public health decision-making.

Supplementary material

Supplementary Material 1.
mgen-11-01451-s001.xlsx (110.9KB, xlsx)
DOI: 10.1099/mgen.0.001451
Supplementary Material 2.
mgen-11-01451-s002.pdf (747.3KB, pdf)
DOI: 10.1099/mgen.0.001451

Abbreviations

NHS

National Health Service

NS

non-synonymous

NUH NHST

Nottingham University Hospitals NHS Trust

RT-qPCR

quantitative reverse transcription PCR

sCOV

seasonal coronavirus

SH-aLRT

Shimodaira–Hasegawa approximate likelihood ratio test

WGS

whole-genome sequencing

Footnotes

Funding: The study was funded by COG-UK (COVID-19 Genomics UK Consortium).

Ethical statement: The study utilized surplus nucleic acid extracted in routine diagnostic investigation of viral respiratory illness at NUH NHST, who approved extended molecular investigation of diagnosed coronavirus positives under clinical audit number 23-078C.

Contributor Information

C. Patrick McClure, Email: patrick.mcclure@nottingham.ac.uk.

Theocharis Tsoleridis, Email: Theo.Tsoleridis@pirbright.ac.uk.

Jack D. Hill, Email: jack.hill.17@outlook.com.

Nadine Holmes, Email: mbznh@exmail.nottingham.ac.uk.

Joseph G. Chappell, Email: mbajc3@exmail.nottingham.ac.uk.

Timothy Byaruhanga, Email: tssekandi@gmail.com.

Joshua Duncan, Email: Joshua.Duncan@nottingham.ac.uk.

Miruna Tofan, Email: tmirunaioana@gmail.com.

Louise Berry, Email: louise.berry17@nhs.net.

Gemma Clark, Email: gemma.clark3@nhs.net.

William L. Irving, Email: mbawi@exmail.nottingham.ac.uk.

Alexander W. Tarr, Email: mrzawt@exmail.nottingham.ac.uk.

Jonathan K. Ball, Email: Jonathan.Ball@lstmed.ac.uk.

Stuart Astbury, Email: stuart.astbury@nottingham.ac.uk.

Matt Loose, Email: matt.loose@nottingham.ac.uk.

References

  • 1.Su S, Wong G, Shi W, Liu J, Lai ACK, et al. Epidemiology, genetic recombination, and pathogenesis of coronaviruses. Trends Microbiol. 2016;24:490–502. doi: 10.1016/j.tim.2016.03.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Woo PCY, Groot RJ, Haagmans B, Lau SKP, Neuman BW, et al. ICTV virus taxonomy profile: coronaviridae 2023. J Gen Virol. 2023;104 doi: 10.1099/jgv.0.001843. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Hamre D, Procknow JJ. A new virus isolated from the human respiratory tract. Proc Soc Exp Biol Med. 1966;121:190–193. doi: 10.3181/00379727-121-30734. [DOI] [PubMed] [Google Scholar]
  • 4.McIntosh K, Dees JH, Becker WB, Kapikian AZ, Chanock RM. Recovery in tracheal organ cultures of novel viruses from patients with respiratory disease. Proc Natl Acad Sci USA. 1967;57:933–940. doi: 10.1073/pnas.57.4.933. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.van der Hoek L, Pyrc K, Jebbink MF, Vermeulen-Oost W, Berkhout RJM, et al. Identification of a new human coronavirus. Nat Med. 2004;10:368–373. doi: 10.1038/nm1024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Woo PC, Lau SK, Chu C, Chan K, Tsoi H, et al. Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia. J Virol. 2005;79:884–895. doi: 10.1128/JVI.79.2.884-895.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Nickbakhsh S, Ho A, Marques DFP, McMenamin J, Gunson RN, et al. Epidemiology of seasonal coronaviruses: establishing the context for the emergence of coronavirus disease 2019. J Infect Dis. 2020;222:17–25. doi: 10.1093/infdis/jiaa185. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Veiga ABG da, Martins LG, Riediger I, Mazetto A, Debur M do C, et al. More than just a common cold: endemic coronaviruses OC43, HKU1, NL63, and 229E associated with severe acute respiratory infection and fatality cases among healthy adults. J Med Virol. 2021;93:1002–1007. doi: 10.1002/jmv.26362. [DOI] [PubMed] [Google Scholar]
  • 9.Torjesen I. Covid-19 will become endemic but with decreased potency over time, scientists believe. BMJ. 2021:n494. doi: 10.1136/bmj.n494. [DOI] [PubMed] [Google Scholar]
  • 10.Shah MM, Winn A, Dahl RM, Kniss KL, Silk BJ, et al. Seasonality of common human coronaviruses, United States, 2014-20211. Emerg Infect Dis . 2022;28:1970–1976. doi: 10.3201/eid2810.220396. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Al-Khannaq MN, Ng KT, Oong XY, Pang YK, Takebe Y, et al. Diversity and evolutionary histories of human coronaviruses NL63 and 229E associated with acute upper respiratory tract symptoms in Kuala Lumpur, Malaysia. Am J Trop Med Hyg. 2016;94:1058–1064. doi: 10.4269/ajtmh.15-0810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Shao N, Zhang C, Dong J, Sun L, Chen X, et al. Molecular evolution of human coronavirus-NL63, -229E, -HKU1 and -OC43 in hospitalized children in China. Front Microbiol. 2022;13:1023847. doi: 10.3389/fmicb.2022.1023847. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Zhu Y, Li C, Chen L, Xu B, Zhou Y, et al. A novel human coronavirus OC43 genotype detected in mainland China. Emerg Microbes Infect. 2018;7:173. doi: 10.1038/s41426-018-0171-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Woo PCY, Lau SKP, Yip CCY, Huang Y, Tsoi H-W, et al. Comparative analysis of 22 coronavirus HKU1 genomes reveals a novel genotype and evidence of natural recombination in coronavirus HKU1. J Virol. 2006;80:7136–7145. doi: 10.1128/JVI.00509-06. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Zhang Z, Liu W, Zhang S, Wei P, Zhang L, et al. Two novel human coronavirus OC43 genotypes circulating in hospitalized children with pneumonia in China. Emerg Microbes Infect. 2022;11:168–171. doi: 10.1080/22221751.2021.2019560. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Ye R-Z, Gong C, Cui X-M, Liu J-Y, Fan H, et al. Continuous evolution and emerging lineage of seasonal human coronaviruses: a multicenter surveillance study. J Med Virol. 2023;95:e28861. doi: 10.1002/jmv.28861. [DOI] [PubMed] [Google Scholar]
  • 17.Müller NF, Kistler KE, Bedford T. A bayesian approach to infer recombination patterns in coronaviruses. Nat Commun. 2022;13:4186. doi: 10.1038/s41467-022-31749-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Dong X, Deng Y-M, Aziz A, Whitney P, Clark J, et al. A simplified, amplicon-based method for whole genome sequencing of human respiratory syncytial viruses. J Clin Virol. 2023;161:105423. doi: 10.1016/j.jcv.2023.105423. [DOI] [PubMed] [Google Scholar]
  • 19.Francis RV, Billam H, Clarke M, Yates C, Tsoleridis T, et al. The impact of real-time whole-genome sequencing in controlling healthcare-associated SARS-CoV-2 outbreaks. J Infect Dis. 2022;225:10–18. doi: 10.1093/infdis/jiab483. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Freed NE, Vlková M, Faisal MB, Silander OK. Rapid and inexpensive whole-genome sequencing of SARS-CoV-2 using 1200 bp tiled amplicons and oxford nanopore rapid barcoding. Biol Methods Protoc. 2020;5:bpaa014. doi: 10.1093/biomethods/bpaa014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Quick J, Grubaugh ND, Pullan ST, Claro IM, Smith AD, et al. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat Protoc. 2017;12:1261–1276. doi: 10.1038/nprot.2017.066. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Quick J, Loman NJ, Duraffour S, Simpson JT, Severi E, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature. 2016;530:228–232. doi: 10.1038/nature16996. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Maes M, Khokhar F, Wilkinson SAJ, Smith AD, Kovalenko G, et al. Multiplex MinION sequencing suggests enteric adenovirus F41 genetic diversity comparable to pre-COVID-19 era. Microb Genom. 2023;9:mgen000920. doi: 10.1099/mgen.0.000920. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Joffret ML, Polston PM, Razafindratsimandresy R, Bessaud M, Heraud JM, et al. Whole genome sequencing of Enteroviruses species A to D by high-throughput sequencing: application for viral mixtures. Front Microbiol. 2018;9:2339. doi: 10.3389/fmicb.2018.02339. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Borcard L, Gempeler S, Terrazos Miani MA, Baumann C, Grädel C, et al. Investigating the extent of primer dropout in SARS-CoV-2 genome sequences during the early circulation of delta variants. FrontVirol. 2022;2:2022–April. doi: 10.3389/fviro.2022.840952. [DOI] [Google Scholar]
  • 26.Kovacs D, Mambule I, Read JM, Kiran A, Chilombe M, et al. Epidemiology of human seasonal coronaviruses among people with mild and severe acute respiratory illness in blantyre, malawi, 2011-2017. J Infect Dis. 2024 doi: 10.1093/infdis/jiad587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Zhao M-C, Wen C, Sun L, Duan S-X, Zang K-X, et al. Epidemiology and clinical characteristics of seasonal human coronaviruses in children hospitalized in hebei province, China before and during the COVID-19 pandemic. Risk Manag Healthc Policy. 2023;16:1801–1807. doi: 10.2147/RMHP.S423077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Chow EJ, Casto AM, Rogers JH, Roychoudhury P, Han PD, et al. The clinical and genomic epidemiology of seasonal human coronaviruses in congregate homeless shelter settings: a repeated cross-sectional study. Lancet Reg Health Am . 2022;15:100348. doi: 10.1016/j.lana.2022.100348. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Chellapuri A, Smitheman M, Chappell JG, Clark G, Howson-Wells HC, et al. Human parainfluenza 2 & 4: clinical and genetic epidemiology in the UK, 2013-2017, reveals distinct disease features and co-circulating genomic subtypes. Influenza Other Respir Viruses. 2022;16:1122–1132. doi: 10.1111/irv.13012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Howson-Wells HC, Tsoleridis T, Zainuddin I, Tarr AW, Irving WL, et al. Enterovirus D68 epidemic, UK, 2018, was caused by subclades B3 and D1, predominantly in children and adults, respectively, with both subclades exhibiting extensive genetic diversity. Microb Genom. 2022;8:mgen000825. doi: 10.1099/mgen.0.000825. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Simpson JT, Workman RE, Zuzarte PC, David M, Dursi LJ, et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat Methods. 2017;14:407–410. doi: 10.1038/nmeth.4184. [DOI] [PubMed] [Google Scholar]
  • 33.Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113. doi: 10.1186/1471-2105-5-113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.O’Toole Á, Hill V, Jackson B, Dewar R, Sahadeo N, et al. Genomics-informed outbreak investigations of SARS-CoV-2 using civet. PLoS Glob Public Health. 2022;2:e0000704. doi: 10.1371/journal.pgph.0000704. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Markov PV, Ghafari M, Beer M, Lythgoe K, Simmonds P, et al. The evolution of SARS-CoV-2. Nat Rev Microbiol. 2023;21:361–379. doi: 10.1038/s41579-023-00878-2. [DOI] [PubMed] [Google Scholar]
  • 36.Chibo D, Birch C. Analysis of human coronavirus 229E spike and nucleoprotein genes demonstrates genetic drift between chronologically distinct strains. J Gen Virol. 2006;87:1203–1208. doi: 10.1099/vir.0.81662-0. [DOI] [PubMed] [Google Scholar]
  • 37.Wang Y, Li X, Liu W, Gan M, Zhang L, et al. Discovery of a subgenotype of human coronavirus NL63 associated with severe lower respiratory tract infection in China, 2018. Emerg Microbes Infect. 2020;9:246–255. doi: 10.1080/22221751.2020.1717999. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Pollett S, Conte MA, Sanborn M, Jarman RG, Lidl GM, et al. A comparative recombination analysis of human coronaviruses and implications for the SARS-CoV-2 pandemic. Sci Rep. 2021;11:17365. doi: 10.1038/s41598-021-96626-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Chen X, Zhu Y, Li Q, Lu G, Li C, et al. Genetic characteristics of human coronavirus HKU1 in mainland China during 2018. Arch Virol. 2022;167:2173–2180. doi: 10.1007/s00705-022-05541-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Kemp SA, Collier DA, Datir RP, Ferreira IATM, Gayed S, et al. SARS-CoV-2 evolution during treatment of chronic infection. Nature. 2021;592:277–282. doi: 10.1038/s41586-021-03291-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Chappell JG, Tsoleridis T, Clark G, Berry L, Holmes N, et al. Retrospective screening of routine respiratory samples revealed undetected community transmission and missed intervention opportunities for SARS-CoV-2 in the United Kingdom. J Gen Virol. 2021;102:001595. doi: 10.1099/jgv.0.001595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.du Plessis L, McCrone JT, Zarebski AE, Hill V, Ruis C, et al. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science. 2021;371:708–712. doi: 10.1126/science.abf2946. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Eguia RT, Crawford KHD, Stevens-Ayers T, Kelnhofer-Millevolte L, Greninger AL, et al. A human coronavirus evolves antigenically to escape antibody immunity. PLoS Pathog. 2021;17:e1009453. doi: 10.1371/journal.ppat.1009453. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Kistler KE, Bedford T. Evidence for adaptive evolution in the receptor-binding domain of seasonal coronaviruses OC43 and 229e. Elife. 2021;10:e64509. doi: 10.7554/eLife.64509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Corman VM, Muth D, Niemeyer D, Drosten C. Hosts and sources of endemic human coronaviruses. Adv Virus Res. 2018;100:163–188. doi: 10.1016/bs.aivir.2018.01.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Cui J, Li F, Shi ZL. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol. 2019;17:181–192. doi: 10.1038/s41579-018-0118-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Forni D, Cagliani R, Arrigoni F, Benvenuti M, Mozzi A, et al. Adaptation of the endemic coronaviruses HCoV-OC43 and HCoV-229E to the human host. Virus Evol. 2021;7:veab061. doi: 10.1093/ve/veab061. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Kiyuka PK, Agoti CN, Munywoki PK, Njeru R, Bett A, et al. Human coronavirus NL63 molecular epidemiology and evolutionary patterns in rural coastal Kenya. J Infect Dis. 2018;217:1728–1739. doi: 10.1093/infdis/jiy098. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.López-Labrador FX, Brown JR, Fischer N, Harvala H, Van Boheemen S, et al. Recommendations for the introduction of metagenomic high-throughput sequencing in clinical virology, part I: wet lab procedure. J Clin Virol. 2021;134:104691. doi: 10.1016/j.jcv.2020.104691. [DOI] [PubMed] [Google Scholar]
  • 50.de Vries JJC, Brown JR, Couto N, Beer M, Le Mercier P, et al. Recommendations for the introduction of metagenomic next-generation sequencing in clinical virology, part II: bioinformatic analysis and reporting. J Clin Virol. 2021;138:104812. doi: 10.1016/j.jcv.2021.104812. [DOI] [PubMed] [Google Scholar]
  • 51.Brinkmann A, Uddin S, Ulm S-L, Pape K, Förster S, et al. RespiCoV: simultaneous identification of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and 46 respiratory tract viruses and bacteria by amplicon-based Oxford-Nanopore MinION sequencing. PLoS One. 2022;17:e0264855. doi: 10.1371/journal.pone.0264855. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Viana R, Moyo S, Amoako DG, Tegally H, Scheepers C, et al. Rapid epidemic expansion of the SARS-CoV-2 omicron variant in southern Africa. Nature. 2022;603:679–686. doi: 10.1038/s41586-022-04411-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Morfopoulou S, Brown JR, Davies EG, Anderson G, Virasami A, et al. Human coronavirus OC43 associated with fatal encephalitis. N Engl J Med. 2016;375:497–498. doi: 10.1056/NEJMc1509458. [DOI] [PubMed] [Google Scholar]
  • 54.Callow KA, Parry HF, Sergeant M, Tyrrell DA. The time course of the immune response to experimental coronavirus infection of man. Epidemiol Infect. 1990;105:435–446. doi: 10.1017/s0950268800048019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Reed SE. The behaviour of recent isolates of human respiratory coronavirus in vitro and in volunteers: evidence of heterogeneity among 229E-related strains. J Med Virol. 1984;13:179–192. doi: 10.1002/jmv.1890130208. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Volz E, Mishra S, Chand M, Barrett JC, Johnson R, et al. Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature. 2021;593:266–269. doi: 10.1038/s41586-021-03470-x. [DOI] [PubMed] [Google Scholar]
  • 57.Tegally H, Moir M, Everatt J, Giovanetti M, Scheepers C, et al. Emergence of SARS-CoV-2 omicron lineages BA.4 and BA.5 in South Africa. Nat Med. 2022;28:1785–1790. doi: 10.1038/s41591-022-01911-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Tang JW, Lam TT, Zaraket H, Lipkin WI, Drews SJ, et al. Global epidemiology of non-influenza RNA respiratory viruses: data gaps and a growing need for surveillance. Lancet Infect Dis. 2017;17:e320–e326. doi: 10.1016/S1473-3099(17)30238-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Paget J, Caini S, Del Riccio M, van Waarden W, Meijer A. Has influenza B/Yamagata become extinct and what implications might this have for quadrivalent influenza vaccines? Euro Surveill. 2022;27:2200753. doi: 10.2807/1560-7917.ES.2022.27.39.2200753. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Kundu R, Narean JS, Wang L, Fenn J, Pillay T, et al. Cross-reactive memory T cells associate with protection against SARS-CoV-2 infection in COVID-19 contacts. Nat Commun. 2022;13:80. doi: 10.1038/s41467-021-27674-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Edridge AWD, Kaczorowska J, Hoste ACR, Bakker M, Klein M, et al. Seasonal coronavirus protective immunity is short-lasting. Nat Med. 2020;26:1691–1693. doi: 10.1038/s41591-020-1083-1. [DOI] [PubMed] [Google Scholar]
  • 62.Lineburg KE, Grant EJ, Swaminathan S, Chatzileontiadou DSM, Szeto C, et al. CD8+ T cells specific for an immunodominant SARS-CoV-2 nucleocapsid epitope cross-react with selective seasonal coronaviruses. Immunity. 2021;54:1055–1065. doi: 10.1016/j.immuni.2021.04.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Chaguza C, Hahn AM, Petrone ME, Zhou S, Ferguson D, et al. Accelerated SARS-CoV-2 intrahost evolution leading to distinct genotypes during chronic infection. Cell Rep Med . 2023;4:100943. doi: 10.1016/j.xcrm.2023.100943. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Gonzalez-Reiche AS, Alshammary H, Schaefer S, Patel G, Polanco J, et al. Sequential intrahost evolution and onward transmission of SARS-CoV-2 variants. Nat Commun. 2023;14:3235. doi: 10.1038/s41467-023-38867-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Zhang B, Tian J, Zhang Q, Xie Y, Wang K, et al. Comparing the nucleocapsid proteins of human coronaviruses: structure, immunoregulation, vaccine, and targeted drug. Front Mol Biosci. 2022;9:761173. doi: 10.3389/fmolb.2022.761173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Szelazek B, Kabala W, Kus K, Zdzalik M, Twarda-Clapa A, et al. Structural characterization of human coronavirus NL63 N protein. J Virol. 2017;91:e02503-16. doi: 10.1128/JVI.02503-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Suharsono H, Mahardika BK, Sudipa PH, Sari TK, Suardana IBK, et al. Consensus insertion/deletions and amino acid variations of all coding and noncoding regions of the SARS-CoV-2 omicron clades, including the XBB and BQ.1 lineages. Arch Virol. 2023;168:156. doi: 10.1007/s00705-023-05787-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Vlasova AN, Diaz A, Damtie D, Xiu L, Toh T-H, et al. Novel canine coronavirus isolated from a hospitalized patient with pneumonia in East Malaysia. Clin Infect Dis. 2022;74:446–454. doi: 10.1093/cid/ciab456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Gussow AB, Auslander N, Faure G, Wolf YI, Zhang F, et al. Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses. Proc Natl Acad Sci USA. 2020;117:15193–15199. doi: 10.1073/pnas.2008176117. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Material 1.
mgen-11-01451-s001.xlsx (110.9KB, xlsx)
DOI: 10.1099/mgen.0.001451
Supplementary Material 2.
mgen-11-01451-s002.pdf (747.3KB, pdf)
DOI: 10.1099/mgen.0.001451

Articles from Microbial Genomics are provided here courtesy of Microbiology Society

RESOURCES