Table 9. Results of Genome Sequencing and Phylogenetic Analyses.
Study ID | Study Setting | Method used for WGS | Phylogenetic analysis | Results |
---|---|---|---|---|
Böhmer 2020 | Home,
workplace |
Whole genome sequencing involved Roche KAPA
HyperPlus library preparation and sequencing on Illumina NextSeq and MiSeq instruments as well as RT-PCR product sequencing on Oxford Nanopore MinION using the primers described in Corman and colleagues. Patient 1 was sequenced on all three platforms; patients 2–7 were sequenced on Illumina NextSeq, both with and without RT-PCR product sequencing with primers as in Corman and colleagues; and patients 8–11, 14, and 16 were sequenced on Oxford Nanopore MinION. Sequencing of patient 15 was not successful. Sequence gaps were filled by Sanger sequencing. |
Not reported | Presymptomatic transmission from patient 4 to patient
5 was strongly supported by virus sequence analysis: a nonsynonymous nucleotide polymorphism (a G6446A substitution) was found in the virus from patients 4 and 5 onwards but not in any cases detected before this point (patients 1–3). Later cases with available specimens, all containing this same substitution, were all traced back to patient 5. The possibility that patient 4 could have been infected by patient 5 was excluded by detailed sequence analysis: patient 4 had the novel G6446A virus detected in a throat swab and the original 6446G virus detected in her sputum, whereas patient 5 had a homogeneous virus population containing the novel G6446A substitution in the throat swab. |
Cerami 2021 | Household | cDNA libraries were generated using ARTIC Network
amplicons to generate cDNA followed by library construction with a QIAGEN® (Hilden, Germany) QIAseq FX kit. Paired-end libraries were sequenced on an Illumina MiSeq at the UNC High-Throughput Sequencing Facility. Following demultiplexing, libraries underwent adapter and quality trimming according to default parameters for paired-end reads in Trim Galore!. Trimmed fastq files were converted to unaligned BAM format, trimmed of primer sequences, aligned to the Wuhan reference sequence, and assembled into fasta format using the Broad Institute viral NGS pipelines implemented in Docker Desktop. The resulting fasta files were aligned via MAFFT v7.450 implemented in Geneious Prime® 2021. |
Relatedness between viral sequences was
assessed via phylogenomic analysis in MrBayes v3.2.6 implemented in Geneious Prime® 2021 using default parameters and setting the Wuhan reference sequence as the outgroup. Samples from the same household were considered to be related if they were assigned to the same larger clade by Nextclade as well as the same clade in MrBayes. All sequences included in this analysis are available on GISAID under the accession numbers EPI_ISL_3088340 to EPI_ISL_3088373. |
High density amplicon sequencing of viral isolates from
these late secondary cases and others in their household confirmed that 4/5 were indeed due to household transmission |
Firestone 2020 | Motorcycle
rally |
WGS was conducted at the MDH Public Health Laboratory
on 38 specimens using previously described methods. |
Phylogenetic relationships, including distinct
clustering of viral whole genome sequences, were inferred based on nucleotide differences via IQ-TREE using general time reversible substitution models as a part of the Nextstrain workflow. |
38 (73%) specimens (23 [61%] from primary and 15 [39%]
from secondary and tertiary cases) were successfully sequenced, covering at least 98% of the SARS-CoV-2 genome. Six genetically similar clusters with known epidemiologic links were identified (i.e., cases in patients who were close contacts or who had common exposures at the rally), five of which demonstrated secondary or secondary and tertiary transmission. |
Huang 2021 | Local | Not described | Not described | WGS revealed that all 5 isolates belong to the same
clade, with only four nucleotide changes in two, while the remaining three showed identical viral genomes |
Jeewandara
2021 |
Household
Community |
Library preparation was attempted using the AmpliSeq
for Illumina SARS-CoV-2 Community Panel, in combination with AmpliSeq for Illumina library prep, index, and accessories (Illumina, San Diego, USA), and a targeted RNA/cDNA amplicon assay was used. The representative lineage sequences were downloaded from https://github.com/cov-lineages/lineages (anonymised.encrypted.aln.safe.fasta) |
Sequence lineage, nucleotide mutations and
amino acid replacements were generated using the CoV-GLUE graphical user interface. GISAID database used. |
Two viruses (only 2/89 samples had RT-qPCR Ct values
<25) were sequenced from this cohort which revealed that they were of clades B.4 and B.1, suggesting that many different virus strains were circulating within the Bandaranayaka watta during this time. One of the viruses had the D614G mutation. |
Jiang 2020 | Home | Positive samples were sequenced directly from the
original specimens as previously described. *Reference virus genomes were obtained from GenBank using Blastn with 2019-nCoV as a query. The open reading frames of the verified genome sequences were predicted using Geneious (version 11.1.5) and annotated using the Conserved Domain Database. Pairwise sequence identities were also calculated using Geneious. Potential genetic recombination was investigated using SimPlot software and phylogenetic analysis. |
The maximum likelihood phylogenetic tree
of the complete genomes was constructed using RAxML software with 1000 bootstrap replicates, employing the general time-reversible nucleotide substitution model. |
The full genomes of 8 patients were >99.9% identical
across the whole genome. Phylogenetic analysis showed that viruses from patients were clustered in the same clade and genetically similar to other SARS-CoV-2 sequences reported in other countries. |
Klompas 2021 | Local | Total nucleic acid from respiratory specimens was
extracted using the Roche MagNA Pure 96 DNA and Viral NA Small Volume Pack. Presence and abundance estimates of SARS-CoV-2 RNA were evaluated by the CDC 2019-Novel Coronavirus Real-Time RT-PCR Diagnostic Panel. Tiled, whole-genome amplicon sequencing was performed using an adapted ARTIC V3 SARS-CoV-2 protocol and a common protocol developed by a collaborative group of state public health laboratories, and the CDC. The samples were combined after PCR tiling, screened, and quantified for Illumina DNA Prep. |
The Cecret pipeline (https://github.com/UPHL-BioNGS/Cecret) was used, with minor modifications for our local environment, to generate consensus genomes for each sample. To ensure accuracy of results, we only considered highly complete (≥95% coverage) genomes in downstream analyses. These sequences were aligned, and pairwise distances between sample genomes were computed. Resultant SNP distances were discussed within the context of epidemiologic linkage to rule in or rule out individuals from this particular cluster. |
Whole-genome sequencing confirmed that 2 staff
members were infected despite wearing surgical masks and eye protection. |
Kolodziej 2022 | Household | Sequences were obtained from saliva samples with
the highest viral load and are labelled per household. Amplicon-based SARS-CoV-2 sequencing was performed on the positive saliva sample with the highest viral load for each individual using the Nanopore protocol “PCR tiling of COVID-19 virus (Version: PTC_9096_v109_revE_06FEB2020)”, which is based on the ARTIC v3 amplicon sequencing protocol. Several modifications were made to the protocol as primer concentrations were increased from 0.125 to 1 pmol for the following amplicon primer pairs. AMPure XP beads purification was only performed on clinical samples with an initial Cp-value <32. Both libraries were generated using native barcode kits from Nanopore SQK-LSK109 (EXP-NBD104, EXP-NBD114 and EXP-NBD196) and sequencing was performed on a R9.4.1 flow cell multiplexing 48–96 samples per sequence run. |
Not described | Each household shows a distinct cluster in phylogenetic
analyses with minimal sequence differences indicative of a single introduction within each household. For certain households only a single genome could be determined, for which no conclusions could be drawn. |
Ladhani 2020a | Care homes | Whole genome sequencing (WGS) was performed on all
RT-PCR positive samples. Viral amplicons were sequenced using Illumina library preparation kits (Nextera) and sequenced on Illumina short-read sequencing machines. Raw sequence data was trimmed and aligned against a SARS-CoV-2 reference genome (NC_045512.2). A consensus sequence representing each genome base was derived from the reference alignment. |
Consensus sequences were assessed for
quality, aligned using MAFFT (Multiple Alignment using Fast Fourier Transform, version 7.310), manually curated and maximum likelihood phylogenetic trees derived using IQtree (version 2.04). |
All 158 PCR positive samples underwent WGS analysis
and 99 (68 residents, 31 staff) distributed across all the care homes yielded sequence sufficient for WGS analysis. Phylogenetic analysis identified informal clusters, with evidence for multiple introductions of the virus into care home settings. All care home clusters of SARS-CoV-2 genomes included at least one staff member, apart from care home B with no PCR positive staff and high rates of staff self-isolation. Care home A exhibited three distinct sequence clusters and six singletons, potentially representing up to nine separate introductions. Genomic analysis did not identify any differences between asymptomatic/symptomatic residents/staff. The 10 sequences from residents who died were distributed across the lineages identified and were closely matched to sequences derived from non-fatal cases in the same care homes. |
Lucey 2020 | Hospital | Complementary DNA was obtained from isolated
RNA through reverse transcription and multiplex PCR according to the protocol provided by the Artic Network initiative. Libraries were prepared using the NEBNext Ultra II kit (New England Biolabs) and sequenced on an Illumina MiSeq using 300-cycle v2 reagent kits (Illumina). Bowtie 2 was used for aligning the sequencing reads to the reference genome for SARS-CoV-2 (GenBank number, MN908947.3) and SAMtools for manipulating the alignments. |
SNPs were used to define clusters and a
median-joining network was generated including these data from this study and an additional 1,000 strains collected from GISAID available on May 22nd. Clade annotation was included for the Pangolin, GISAID and NextStrain systems. |
WGS identified six clusters of nosocomial SARS-CoV-2
transmission. The average sequence quality per sample was >99% for 46 samples, and between 92 and 94% for 4 samples. Phylogenetic analysis identified six independent groups of which clusters 1–3 were related to 39 patients. |
Pang 2022 | Local | Three specific real-time RT-PCR methods targeting the
N, S, and ORF1ab genes were designed to detect the presence of SARS-CoV-2 in clinical samples. Thermal cycling for N gene real-time RT-PCR assays was performed at 50°C for 20 min for reverse transcription, 95°C for 15 min, 50 cycles of 94°C for 5 s, 55°C for 1 min. |
Residual RNA was subjected to tiled
amplicon PCR using ARTIC nCoV-2019 version 3 panel, where One-Step RT-PCR was performed using the SuperScript™ III One-Step RT-PCR System with Platinum™ Taq DNA Polymerase (Thermo Fisher Scientific, MA, USA). Sequencing libraries were prepared using the Nextera XT and sequenced on MiSeq (Illumina, CA, USA) to generate 300 bp paired-end reads. The reads were subjected to a hard-trim of 50 bp on each side to remove primer artifacts using BBMap prior to consensus sequence generation. The generated consensus sequences were shared via a global initiative on sharing avian flu data (GISAID). Closely related representative strains from other countries (99.99% identity and matching the time window) were identified in the GISAID database using BLASTN. |
With the exception of one sequence, phylogenetic analysis grouped the
SARS-CoV-2 genome sequences obtained from the cases (13/14; 92.9%), including H1, into a single cluster. This cluster was supported by a single mutation (T27588A) not found in other sequences in the database before the nursing home outbreak. |
Powell 2022 | Local | Not described | Not described | Whole genome sequencing was successful in two of
five index cases (the initial confirmed case that led to the bubble self-isolating) and all nine positive direct contacts. Overall, four of the nine sequences available for comparison identified different SARS-CoV-2 strains, therefore, ruling out transmission between affected individuals. |
Pung 2020 | Multiple:
Company conference, church, tour group. |
Strain names, GISAID EpiCoV accession numbers used for
genomic sequencing |
The phylogenetic tree was built using the Neighbor-Joining method and confirmed using Maximum Likelihood approaches. Bootstrap replicate trees were used. All ambiguous positions were removed for each sequence pair (pairwise deletion option). Evolutionary analyses were conducted in MEGA X. Strain names, GISAID EpiCoV accession numbers and collection dates are shown, followed by the case number if available. |
Cluster A: Viral genomic sequences were available for
four cases (AH1, AH2, AH3, and AT1) and phylogenetic analysis confirmed their linkage, as suggested by the epidemiological data. |
Sikkema 2020 | Hospital | Samples were selected based on a Ct <32. A SARS-CoV-2-specific multiplex PCR for nanopore sequencing was done. The resulting raw sequence data were demultiplexed using qcat. Primers were trimmed using cutadapt, after which a reference-based alignment to the GISAID (Global Initiative on Sharing All Influenza Data) sequence EPI_ISL_412973 was done using minimap2. The consensus genome was extracted and positions with a coverage less than 30 reads were replaced with N using a custom script using biopython software (version 1.74) and the python module pysam (version 0.15.3). Mutations in the genome were confirmed by manually checking the alignment, and homopolymeric regions were manually checked and resolved, consulting the reference genome. Genomes were included when having greater than 90% genome coverage. All available full-length SARS-CoV-2 genomes were retrieved from GISAID on March 20, 2020 (appendix 1 pp 8–65), and aligned with the newly obtained SARS-CoV-2 sequences in this study using the multiple sequence alignment software MUSCLE (version 3.8.1551). Sequences with more than 10% of N position replacements were excluded. The alignment was manually checked for discrepancies, after which the phylogenomic software IQ-TREE (version 1.6.8) was used to do a maximum-likelihood phylogenetic analysis, with the generalised time reversible substitution model GTR+F+I+G4 as best predicted model. The ultrafast bootstrap option was used with 1000 replicates. Clusters were ascertained based on visual clustering and lineage designations. |
The code to generate the minimum
spanning phylogenetic tree was written in the R programming language. Ape and igraph software packages were used to write the code to generate the minimum spanning tree, and the visNetwork software package was used to generate the visualisation. Pairwise sequence distance (used to generate the network) was calculated by adding up the absolute nucleotide distance and indel-block distance. Unambiguous positions were dealt with in a pairwise manner. Sequences that were mistakenly identified as identical, because of transient connections with sequences containing missing data, were resolved. |
46 (92%) of 50 sequences from health-care workers in the
study were grouped in three clusters. Ten (100%) of 10 sequences from patients in the study grouped into the same three clusters: |
Speake 2020 | Aircraft | Processed reads were mapped to the SARS-CoV-2
reference genome (GenBank accession no. MN908947). Primer-clipped alignment files were imported into Geneious Prime version 2020.1.1 for coverage analysis before consensus calling, and consensus sequences were generated by using iVar version 1.2.2. |
Genome sequences of SARS-CoV-2
from Western Australia were assigned to lineages by using the Phylogenetic Assignment of Named Global Outbreak LINeages (PANGOLIN) tool (https://github.com/cov-lineages/pangolin). On July 17, 2020, we retrieved SARS-CoV-2 complete genomes with corresponding metadata from the GISAID database. The final dataset contained 540 GISAID whole-genome sequences that were aligned with the sequences from Western Australia generated in this study by using MAFFT version 7.467. Phylogenetic trees were visualized in iTOL (Interactive Tree Of Life, https://itol.embl.de) and MEGA version 7.014. |
100% coverage was obtained for 21 and partial coverage
(81%–99%) for 4 samples. In the phylogenetic tree, the 21 complete genomes belonged to either the A.2 (n = 17) or B.1 (n = 4) sublineages of SARS-CoV-2. |
Taylor 2020 | Skilled nursing
facilities |
WGS was conducted by MDH-PHL on available specimens
using previously described methods. |
Phylogenetic relationships, including distinct
clustering of viral whole genome sequences, were inferred based on nucleotide differences via IQ-TREE, using general time reversible substitution models |
Specimens from 18 (35%) residents and seven (18%) HCP
at facility A were sequenced; strains from 17 residents and five HCP were genetically similar. At facility B, 75 (66%) resident specimens and five (7%) HCP specimens were sequenced, all of which were genetically similar. |
Wang 2020 | Home | Full genomes were sequenced using the BioelectronSeq
4000. WGS integrated information from 60 published genomic sequences of SARS-CoV-2. Full-length genomes were combined with published SARS-CoV-2 genomes and other coronaviruses and aligned using the FFT-NS-2 model by MAFFT. |
Maximum-likelihood phylogenies were
inferred under a generalised time-reversible (GTR) + gamma substitution model and bootstrapped 1000 times to assess confidence using RAxML. |
The phylogenetic tree of full-length genomes showed
that SARS-CoV-2 strains form a monophyletic clade with a bootstrap support of 100%. Sequences from six HCWs in the Department of Neurosurgery and one family member were closely related in the phylogenetic tree. 33 family members of the HCWs were not secondarily infected, due to the strict self-quarantine strategies taken by the HCWs immediately after their onset of illness, including wearing a facial mask when they came home, living alone in a separated room, never eating together with their families. |
Zhang 2021 | Local
Household |
Sequencing raw reads were trimmed to remove
sequencing adaptors and low-quality bases. Clean reads were aligned to the reference genome of SARS-CoV-2 (GenBank: NC_045512.2) using Bowtie2 v.2.2.5 with default parameters. Duplicate reads were removed with Picard Tools. Samtools (v.1.10) “mpileup” was used to call SNPs using mpileup files as input with parameter -Q 20. Each site was re-calculated, and variants were screened using a Perl script with the following parameters: (i) depth of alternate allele ≥5, (ii) alternate allele frequency ≥70%, and (iii) discarding the sites only supported by a single strand. The C337T variant in P4 was also considered an SNP, as it was supported by sequencing reads (with 67% frequency) and validated by Sanger sequencing. Consensus sequences were called using BCFtools based on the reference sequence. |
Phylogenomic analysis of 13 high-quality
(coverage: ≥70%) viral genomes was performed together with 72 strains circulating in Beijing during the same period, including 33 public viral genomes (from global initiative on sharing all influenza data [GISAID]) and 39 viral genomes from a local centre. Viral genomes were obtained from all 14 patients. 72 viral genomes were obtained: 33 were from GISAID (https://gisaid.org) and 39 were from Beijing Ditan Hospital (GenBank: PRJNA667180). Consensus sequences were trimmed to 5ʹ and 3ʹ untranslated regions due to their poor quality. Multiple sequence alignment was conducted with parameters --auto --keeplength --addfragments using MAFFT v.7.453. The maximum likelihood tree was constructed using IQ-TREE v.1.6.12 with 1000 bootstrap replicates. The substitution model GTR+F+R2 was selected based on Bayesian information criteria score. TreeTime v.0.7.6 was used for time-resolved phylogenomic analysis. iTOL (itol.embl.de) was applied for displaying the topology of the phylogenomic tree. The nucleotide frequency of each genomic locus was calculated with the 85 viral genomes of circulating strains in Beijing, including 13 genomes (P1–P13) from the outbreak cluster and 72 local genomes mentioned above (Figure S1). A median joining network was constructed using NETWORK v.10.1.0.0 on the Fluxus Technology website (https://www.fluxus-engineering.com/). |
Twelve viral genomes from this outbreak were tightly
clustered into two clades with bootstrap values of at least 77%. |
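
Several of the studies above compare consensus genomes by pairwise SNP distance (for example Klompas 2021), and Zhang 2021 screens variants by alternate-allele depth, alternate-allele frequency, and strand support. The Python sketch below illustrates both ideas on toy data. It is not the code used by any of the included studies: the function names, the toy sequences, and the choice to skip any site where either sequence is not A, C, G, or T are assumptions made for illustration. Only the default thresholds (depth ≥5, frequency ≥70%, support on both strands) are taken from the Zhang 2021 row.

```python
from itertools import combinations
from typing import Dict, Tuple

UNAMBIGUOUS = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing sites between two aligned sequences.

    Sites where either sequence has a gap, N, or other ambiguity code are
    skipped for that pair only (an assumed pairwise-deletion rule).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS and a != b
    )


def pairwise_snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of sample names."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


def passes_variant_filter(
    alt_depth: int,
    total_depth: int,
    fwd_alt: int,
    rev_alt: int,
    min_alt_depth: int = 5,      # threshold (i) reported for Zhang 2021
    min_alt_freq: float = 0.70,  # threshold (ii) reported for Zhang 2021
) -> bool:
    """Apply depth, frequency, and strand-support screening to one site."""
    if total_depth <= 0:
        return False
    if alt_depth < min_alt_depth:
        return False
    if alt_depth / total_depth < min_alt_freq:
        return False
    # Criterion (iii): discard sites supported by reads from one strand only.
    return fwd_alt > 0 and rev_alt > 0


if __name__ == "__main__":
    # Hypothetical 10-base alignment, for illustration only.
    toy_alignment = {
        "p1": "ACGTACGTAC",
        "p2": "ACGAACGTAC",
        "p3": "ACGANCGTAT",
    }
    for pair, dist in pairwise_snp_distances(toy_alignment).items():
        print(pair, dist)
    print(passes_variant_filter(alt_depth=8, total_depth=10, fwd_alt=5, rev_alt=3))
```

In practice the input would be a multiple sequence alignment (for example MAFFT output), and the distance at which two genomes count as linked is a study-specific judgement that this sketch does not attempt to set.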