. 2022 Nov 17;10:280. Originally published 2021 Apr 9. [Version 3] doi: 10.12688/f1000research.52439.3

Table 9. Results of Genome Sequencing and Phylogenetic Analyses.

Study ID | Study setting | Method used for WGS | Phylogenetic analysis | Results

Böhmer 2020
Study setting: Home, workplace
Method used for WGS: Whole genome sequencing involved Roche KAPA HyperPlus library preparation and sequencing on Illumina NextSeq and MiSeq instruments, as well as RT-PCR product sequencing on Oxford Nanopore MinION using the primers described by Corman and colleagues. Patient 1 was sequenced on all three platforms; patients 2–7 were sequenced on Illumina NextSeq, both with and without RT-PCR product sequencing with the primers of Corman and colleagues; and patients 8–11, 14, and 16 were sequenced on Oxford Nanopore MinION. Sequencing of patient 15 was not successful. Sequence gaps were filled by Sanger sequencing.
Phylogenetic analysis: Not reported
Results: Presymptomatic transmission from patient 4 to patient 5 was strongly supported by virus sequence analysis. A nonsynonymous nucleotide polymorphism (a G6446A substitution) was found in the virus from patient 4 and patient 5 onwards, but not in any case detected before this point (patients 1–3). Later cases with available specimens all contained this same substitution and were all traced back to patient 5. Detailed sequence analysis excluded the possibility that patient 4 had been infected by patient 5. Patient 4 had the novel G6446A virus in a throat swab and the original 6446G virus in her sputum. Patient 5 had a homogeneous virus population containing the novel G6446A substitution in the throat swab.
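
The direction-of-transmission argument in the Böhmer entry turns on which samples carry the G6446A substitution. The sketch below is a minimal Python version of that check. It assumes consensus sequences already aligned to reference coordinates, so a 1-based genome position maps directly to a string index. The helper names, the toy sequences and the "G4A" label are hypothetical and are not data from the study.

```python
import re

def parse_substitution(label):
    """Split a label such as 'G6446A' into (reference base, 1-based position, alternate base)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label.upper())
    if match is None:
        raise ValueError(f"not a single-nucleotide substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)

def carries_substitution(seq, label):
    """Return True for the alternate base, False for the reference base,
    and None for anything else (for example, a position masked as N)."""
    ref, pos, alt = parse_substitution(label)
    base = seq[pos - 1].upper()  # 1-based genome position -> 0-based index
    if base == alt:
        return True
    if base == ref:
        return False
    return None

# Toy 10-nt "genomes" and a hypothetical substitution at position 4.
samples = {"early": "ACTGACTGAC", "later": "ACTAACTGAC", "masked": "ACTNACTGAC"}
for name, seq in samples.items():
    print(name, carries_substitution(seq, "G4A"))
```

Returning None for masked positions keeps "no call" distinct from "reference base". This matters when a later case lacks coverage at the informative site.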

Cerami 2021
Study setting: Household
Method used for WGS: cDNA was generated using ARTIC Network amplicons, followed by library construction with a QIAGEN® (Hilden, Germany) QIAseq FX kit. Paired-end libraries were sequenced on an Illumina MiSeq at the UNC High-Throughput Sequencing Facility. After demultiplexing, libraries were adapter- and quality-trimmed in Trim Galore! using the default parameters for paired-end reads. Trimmed FASTQ files were then converted to unaligned BAM format, trimmed of primer sequences, aligned to the Wuhan reference sequence, and assembled into FASTA format. These steps used the Broad Institute viral NGS pipelines running in Docker Desktop. The resulting FASTA files were aligned with MAFFT v7.450 implemented in Geneious Prime® 2021.
Phylogenetic analysis: Relatedness between viral sequences was assessed by phylogenomic analysis in MrBayes v3.2.6 implemented in Geneious Prime® 2021, using default parameters and setting the Wuhan reference sequence as the outgroup. Samples from the same household were considered related if they were assigned to the same larger clade by Nextclade as well as to the same clade in MrBayes. All sequences included in this analysis are available on GISAID under accession numbers EPI_ISL_3088340 to EPI_ISL_3088373.
Results: High-density amplicon sequencing of viral isolates from these late secondary cases and others in their household confirmed that 4/5 were indeed due to household transmission.
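
Trim Galore! delegates quality trimming to Cutadapt. Cutadapt trims the 3′ end of each read with a BWA-style rule:

1. Subtract a Phred cutoff from every base quality.
2. Accumulate those values from the 3′ end.
3. Cut where the running sum is lowest.

The sketch below implements that rule. It assumes Phred+33 encoded qualities and Trim Galore's default cutoff of 20. The function names are hypothetical, and adapter removal is not shown.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]

def quality_trim_index(qualities, cutoff=20):
    """Return the 3' cut position: bases from this index onward are trimmed.

    Walk from the 3' end, accumulating (quality - cutoff); stop as soon as the
    sum turns positive, and cut where the running sum was lowest.
    """
    running = 0
    lowest = 0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            cut = i
    return cut

read = "ACGTAC"
quals = phred_scores("IIII++")  # [40, 40, 40, 40, 10, 10]
print(read[:quality_trim_index(quals)])  # ACGT
```

Accumulating from the 3′ end lets a single good base inside a poor-quality tail be trimmed along with it. Trimming stops once high-quality sequence dominates.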

Firestone 2020
Study setting: Motorcycle rally
Method used for WGS: WGS was conducted at the MDH Public Health Laboratory on 38 specimens using previously described methods.
Phylogenetic analysis: Phylogenetic relationships, including distinct clustering of viral whole genome sequences, were inferred from nucleotide differences with IQ-TREE, using general time reversible substitution models as part of the Nextstrain workflow.
Results: 38 (73%) specimens were successfully sequenced, covering at least 98% of the SARS-CoV-2 genome; 23 (61%) came from primary cases and 15 (39%) from secondary and tertiary cases. Six genetically similar clusters with known epidemiologic links were identified (i.e., cases in patients who were close contacts or who had common exposures at the rally). Five of these clusters showed secondary transmission, or secondary and tertiary transmission.
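
Coverage thresholds such as "at least 98% of the genome" can be expressed as the fraction of reference positions with a called base. The same applies to the ≥95%, >90% and ≥70% cut-offs used by other studies in this table. The sketch below makes two assumptions: uncalled positions in the consensus are written as N, and the reference is 29,903 nt long (Wuhan-Hu-1, NC_045512.2). Individual pipelines may define coverage differently, for example by read depth.

```python
REF_LENGTH = 29903  # Wuhan-Hu-1 reference length (NC_045512.2 / MN908947.3), in nt

def genome_coverage(consensus, ref_length=REF_LENGTH):
    """Fraction of reference positions that carry a called base (A, C, G, or T)."""
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return called / ref_length

def passes_coverage(consensus, threshold, ref_length=REF_LENGTH):
    """True when coverage meets the threshold, e.g. 0.98 for 'at least 98%'."""
    return genome_coverage(consensus, ref_length) >= threshold

toy = "A" * 29500 + "N" * 403
print(round(genome_coverage(toy), 4))  # 0.9865
```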

Huang 2021
Study setting: Local
Method used for WGS: Not described
Phylogenetic analysis: Not described
Results: WGS revealed that all 5 isolates belong to the same clade. Two isolates differed by only four nucleotide changes, and the remaining three had identical viral genomes.

Jeewandara 2021
Study setting: Household, community
Method used for WGS: Library preparation was attempted using the AmpliSeq for Illumina SARS-CoV-2 Community Panel, in combination with AmpliSeq for Illumina library prep, index, and accessories (Illumina, San Diego, USA). A targeted RNA/cDNA amplicon assay was used. The representative lineage sequences were downloaded from https://github.com/cov-lineages/lineages (anonymised.encrypted.aln.safe.fasta).
Phylogenetic analysis: Sequence lineage, nucleotide mutations, and amino acid replacements were generated using the CoV-GLUE graphical user interface. The GISAID database was used.
Results: Only 2/89 samples had RT-qPCR Ct values <25, so two viruses were sequenced from this cohort. They belonged to lineages B.4 and B.1, suggesting that many different virus strains were circulating within the Bandaranayaka watta during this time. One of the viruses had the D614G mutation.
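
Amino acid replacements such as D614G are derived in two steps: map a nucleotide position onto a codon within a gene, then translate the reference and alternate codons. The sketch below assumes that the spike (S) gene starts at position 21563 of NC_045512.2 and uses the standard genetic code. Spike D614G is commonly attributed to the nucleotide change A23403G. This illustrates the principle only and is not CoV-GLUE's implementation; the function names are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order for each position.
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

SPIKE_START = 21563  # assumed 1-based start of the S gene in NC_045512.2

def codon_position(genome_pos, gene_start=SPIKE_START):
    """Map a 1-based genome position to (codon number in the gene, position 1-3 within the codon)."""
    offset = genome_pos - gene_start
    return offset // 3 + 1, offset % 3 + 1

def replacement(ref_codon, codon_number, pos_in_codon, alt_base):
    """Name the amino acid replacement caused by one base change within a codon."""
    i = pos_in_codon - 1
    alt_codon = ref_codon[:i] + alt_base + ref_codon[i + 1:]
    return f"{CODON_TABLE[ref_codon]}{codon_number}{CODON_TABLE[alt_codon]}"

codon, pos = codon_position(23403)
print(codon, pos)                            # 614 2
print(replacement("GAT", codon, pos, "G"))   # D614G
```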

Jiang 2020
Study setting: Home
Method used for WGS: Positive samples were sequenced directly from the original specimens as previously described. Reference virus genomes were obtained from GenBank using BLASTN with 2019-nCoV as the query. The open reading frames of the verified genome sequences were predicted using Geneious (version 11.1.5) and annotated using the Conserved Domain Database. Pairwise sequence identities were also calculated using Geneious. Potential genetic recombination was investigated using SimPlot software and phylogenetic analysis.
Phylogenetic analysis: A maximum likelihood phylogenetic tree of the complete genomes was constructed with RAxML software, using 1000 bootstrap replicates and the general time-reversible nucleotide substitution model.
Results: The full genomes from the 8 patients were >99.9% identical. Phylogenetic analysis showed that the patients' viruses clustered in the same clade and were genetically similar to SARS-CoV-2 sequences reported in other countries.
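
ORF prediction of the kind run in Geneious can be approximated by scanning each forward reading frame for an ATG followed by an in-frame stop codon. The sketch below is deliberately simplified: it searches the forward strand only and ignores alternative start codons. It would not recover ORF1ab correctly, because ORF1ab is translated via a −1 ribosomal frameshift. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=100):
    """Return 0-based, half-open (start, end) spans of forward-strand ATG...stop ORFs.

    Spans include the stop codon; min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the end of the sequence
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [(2, 14)]
```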

Klompas 2021
Study setting: Local
Method used for WGS: Total nucleic acid from respiratory specimens was extracted using the Roche MagNA Pure 96 DNA and Viral NA Small Volume Pack. The presence and abundance of SARS-CoV-2 RNA were estimated with the CDC 2019-Novel Coronavirus Real-Time RT-PCR Diagnostic Panel. Tiled, whole-genome amplicon sequencing was performed using an adapted ARTIC V3 SARS-CoV-2 protocol and a common protocol developed by a collaborative group of state public health laboratories and the CDC. The samples were combined after PCR tiling, screened, and quantified for Illumina DNA Prep.
Phylogenetic analysis: The Cecret pipeline (https://github.com/UPHL-BioNGS/Cecret) was used, with minor modifications for the local environment, to generate consensus genomes for each sample. To ensure accuracy, only highly complete genomes (≥95% coverage) were considered in downstream analyses. These sequences were aligned, and pairwise distances between sample genomes were computed. The resulting SNP distances were interpreted in the context of epidemiologic linkage to rule individuals in or out of this particular cluster.
Results: Whole-genome sequencing confirmed that 2 staff members were infected despite wearing surgical masks and eye protection.
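
The Klompas entry compares pairwise SNP distances with epidemiologic linkage. Those distances can be computed by counting positions where two aligned consensus genomes both have a called base and differ. The sketch below assumes equal-length aligned sequences with N for uncalled bases. The `max_snps` cut-off for flagging candidate links is a hypothetical parameter, since the study's threshold is not given here.

```python
from itertools import combinations

CALLED = set("ACGT")

def snp_distance(a, b):
    """Count aligned positions where both genomes have a called base and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x in CALLED and y in CALLED and x != y)

def pairwise_snp_distances(genomes):
    """Map each unordered pair of sample names to its SNP distance."""
    return {(m, n): snp_distance(genomes[m], genomes[n])
            for m, n in combinations(sorted(genomes), 2)}

def candidate_links(distances, max_snps):
    """Pairs close enough to be considered for epidemiologic linkage (hypothetical cut-off)."""
    return sorted(pair for pair, d in distances.items() if d <= max_snps)

genomes = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGAACNA"}
d = pairwise_snp_distances(genomes)
print(d)  # {('s1', 's2'): 1, ('s1', 's3'): 3, ('s2', 's3'): 2}
```

Skipping N positions avoids inflating distances between genomes that differ only in which regions failed to amplify.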

Kolodziej 2022
Study setting: Household
Method used for WGS: Sequences were obtained from the saliva samples with the highest viral load and are labelled per household. For each individual, amplicon-based SARS-CoV-2 sequencing was performed on the positive saliva sample with the highest viral load. Sequencing used the Nanopore protocol "PCR tiling of COVID-19 virus (Version: PTC_9096_v109_revE_06FEB2020)", which is based on the ARTIC v3 amplicon sequencing protocol. Several modifications were made to the protocol; for specific amplicon primer pairs, primer concentrations were increased from 0.125 to 1 pmol. AMPure XP bead purification was performed only on clinical samples with an initial Cp value <32. Both libraries were generated using Nanopore native barcoding kits (EXP-NBD104, EXP-NBD114, and EXP-NBD196) with SQK-LSK109. Sequencing was performed on an R9.4.1 flow cell, multiplexing 48–96 samples per sequencing run.
Phylogenetic analysis: Not described
Results: Each household forms a distinct cluster in phylogenetic analyses, with minimal sequence differences indicative of a single introduction into each household. For certain households only a single genome could be determined, so no conclusions could be drawn for them.
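
The sample-selection rules in the Kolodziej entry can be written as a small planning step:

- Sequence each individual's highest-viral-load saliva sample, i.e. the one with the lowest Cp.
- Purify with AMPure XP only when Cp is below 32.
- Multiplex up to 96 barcoded samples per run.

The sketch below uses a hypothetical `SalivaSample` record. Its field names and the function names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class SalivaSample:
    sample_id: str
    individual: str
    cp: float  # qPCR crossing point; a lower Cp means more viral RNA

def pick_highest_viral_load(samples):
    """Keep, for each individual, the sample with the lowest Cp."""
    best = {}
    for s in samples:
        if s.individual not in best or s.cp < best[s.individual].cp:
            best[s.individual] = s
    return best

def needs_ampure(sample, cp_cutoff=32.0):
    """AMPure XP purification only for samples with Cp below the cut-off."""
    return sample.cp < cp_cutoff

def plan_runs(sample_ids, max_per_run=96):
    """Split barcoded samples into sequencing runs of at most max_per_run."""
    return [sample_ids[i:i + max_per_run] for i in range(0, len(sample_ids), max_per_run)]

samples = [SalivaSample("S1", "A", 28.0), SalivaSample("S2", "A", 22.5), SalivaSample("S3", "B", 35.1)]
chosen = pick_highest_viral_load(samples)
print({k: (v.sample_id, needs_ampure(v)) for k, v in chosen.items()})
# {'A': ('S2', True), 'B': ('S3', False)}
```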

Ladhani 2020a
Study setting: Care homes
Method used for WGS: Whole genome sequencing (WGS) was performed on all RT-PCR positive samples. Viral amplicons were prepared with Illumina library preparation kits (Nextera) and sequenced on Illumina short-read sequencing machines. Raw sequence data were trimmed and aligned against a SARS-CoV-2 reference genome (NC_045512.2). A consensus sequence representing each genome base was derived from the reference alignment.
Phylogenetic analysis: Consensus sequences were assessed for quality, aligned using MAFFT (Multiple Alignment using Fast Fourier Transform, version 7.310), and manually curated. Maximum likelihood phylogenetic trees were then derived using IQ-TREE (version 2.04).
Results: All 158 PCR positive samples underwent WGS. Of these, 99 samples (68 residents, 31 staff), distributed across all the care homes, yielded sequence sufficient for WGS analysis. Phylogenetic analysis identified informal clusters, with evidence of multiple introductions of the virus into care home settings. Every care home cluster of SARS-CoV-2 genomes included at least one staff member. The exception was care home B, which had no PCR positive staff and high rates of staff self-isolation. Care home A showed three distinct sequence clusters and six singletons, potentially representing up to nine separate introductions. Genomic analysis did not identify any differences between asymptomatic and symptomatic residents or staff. The 10 sequences from residents who died were distributed across the lineages identified. They closely matched sequences derived from non-fatal cases in the same care homes.
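
In the Ladhani entry, counting potential introductions from clusters and singletons gives up to nine for care home A (three clusters plus six singletons). That count depends on how genomes are grouped. Ladhani and colleagues grouped them on maximum likelihood trees. The sketch below swaps in a simpler technique: single-linkage grouping of pairwise SNP distances with a union-find structure. The `max_snps` threshold and the toy distances are hypothetical.

```python
def cluster_by_threshold(names, distances, max_snps):
    """Single-linkage groups: join any pair within max_snps (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= max_snps:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

def count_introductions(groups):
    """Clusters, singletons, and their sum (an upper bound on separate introductions)."""
    clusters = sum(1 for g in groups if len(g) > 1)
    singletons = sum(1 for g in groups if len(g) == 1)
    return clusters, singletons, clusters + singletons

names = ["a", "b", "c", "d", "e"]
distances = {("a", "b"): 0, ("b", "c"): 1, ("c", "d"): 10, ("d", "e"): 12, ("a", "e"): 15}
groups = cluster_by_threshold(names, distances, max_snps=2)
print(groups, count_introductions(groups))  # [['a', 'b', 'c'], ['d'], ['e']] (1, 2, 3)
```

The sum is an upper bound because two singletons could still share a source that was never sampled.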

Lucey 2020
Study setting: Hospital
Method used for WGS: Complementary DNA was obtained from isolated RNA through reverse transcription and multiplex PCR, following the protocol provided by the ARTIC Network initiative. Libraries were prepared using the NEBNext Ultra II kit (New England Biolabs) and sequenced on an Illumina MiSeq using 300-cycle v2 reagent kits (Illumina). Bowtie 2 was used to align the sequencing reads to the SARS-CoV-2 reference genome (GenBank accession MN908947.3), and SAMtools was used to manipulate the alignments.
Phylogenetic analysis: SNPs were used to define clusters. A median-joining network was generated from the data in this study plus an additional 1,000 strains available from GISAID on May 22nd. Clade annotation was included for the Pangolin, GISAID, and Nextstrain systems.
Results: WGS identified six clusters of nosocomial SARS-CoV-2 transmission. The average sequence quality per sample was >99% for 46 samples and between 92% and 94% for 4 samples. Phylogenetic analysis identified six independent groups, of which clusters 1–3 were related to 39 patients.
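
Defining clusters by SNPs, as in the Lucey entry, can be sketched in two steps: list each genome's substitutions relative to the reference, then group genomes by which cluster-defining substitutions they share. The sketch below assumes sequences aligned to reference coordinates. The `defining` list is a hypothetical input, and the code does not implement the median-joining network itself.

```python
def snp_profile(seq, ref):
    """Substitutions relative to the reference, e.g. 'T8A'; ambiguous bases are skipped."""
    called = set("ACGT")
    return frozenset(
        f"{r}{pos}{s}"
        for pos, (r, s) in enumerate(zip(ref.upper(), seq.upper()), start=1)
        if r in called and s in called and r != s
    )

def group_by_defining_snps(genomes, ref, defining):
    """Group samples by which of the (hypothetical) cluster-defining SNPs they carry."""
    defining = frozenset(defining)
    groups = {}
    for name in sorted(genomes):
        key = snp_profile(genomes[name], ref) & defining
        groups.setdefault(key, []).append(name)
    return groups

ref = "ACGTACGT"
genomes = {"p1": "ACGTACGA", "p2": "ACGTACGA", "p3": "TCGTACGA", "p4": "ACGTACGT"}
print(group_by_defining_snps(genomes, ref, ["T8A"]))
```

Grouping on defining SNPs only, rather than whole profiles, keeps private mutations (such as A1T in p3 here) from splitting an otherwise shared cluster.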

Pang 2022
Study setting: Local
Method used for WGS: Three specific real-time RT-PCR methods targeting the N, S, and ORF1ab genes were designed to detect SARS-CoV-2 in clinical samples. Thermal cycling for the N gene real-time RT-PCR assays ran as follows: 50°C for 20 min for reverse transcription, then 95°C for 15 min, followed by 50 cycles of 94°C for 5 s and 55°C for 1 min.
Phylogenetic analysis: Residual RNA was subjected to tiled amplicon PCR using the ARTIC nCoV-2019 version 3 panel. One-step RT-PCR was performed using the SuperScript™ III One-Step RT-PCR System with Platinum™ Taq DNA Polymerase (Thermo Fisher Scientific, MA, USA). Sequencing libraries were prepared using Nextera XT and sequenced on a MiSeq (Illumina, CA, USA) to generate 300 bp paired-end reads. Before consensus sequence generation, the reads were hard-trimmed by 50 bp on each side with BBMap to remove primer artifacts. The resulting consensus sequences were shared via GISAID (Global Initiative on Sharing Avian Influenza Data). Closely related representative strains from other countries (99.99% identity and matching the time window) were identified in the GISAID database using BLASTN.
Results: With the exception of one sequence, the SARS-CoV-2 genome sequences obtained from the cases (13/14; 92.9%), including H1, grouped into a single cluster in phylogenetic analysis. This cluster was supported by a single mutation (T27588A) not found in other sequences in the database before the nursing home outbreak.

Powell 2022
Study setting: Local
Method used for WGS: Not described
Phylogenetic analysis: Not described
Results: Whole genome sequencing was successful in two of five index cases (the initial confirmed case that led to the bubble self-isolating) and in all nine positive direct contacts. Overall, four of the nine sequences available for comparison were different SARS-CoV-2 strains, ruling out transmission between the affected individuals.

Pung 2020
Study setting: Multiple: company conference, church, tour group
Method used for WGS: Strain names and GISAID EpiCoV accession numbers used for genomic sequencing
Phylogenetic analysis: The phylogenetic tree was built with the Neighbor-Joining method and confirmed using Maximum Likelihood approaches. Bootstrap replicate trees were used. All ambiguous positions were removed for each sequence pair (pairwise deletion option). Evolutionary analyses were conducted in MEGA X. Strain names, GISAID EpiCoV accession numbers, and collection dates are shown, followed by the case number if available.
Results: Cluster A: viral genomic sequences were available for four cases (AH1, AH2, AH3, and AT1). Phylogenetic analysis confirmed their linkage, as suggested by the epidemiological data.
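
The Pung entry uses MEGA X's pairwise deletion option. Under pairwise deletion, a site that is ambiguous in either sequence is dropped only for that pair, so each pair can be compared over a different number of sites. Complete deletion would instead drop such a site for every pair. The sketch below computes an uncorrected p-distance this way. It assumes aligned sequences in which anything other than A, C, G or T counts as ambiguous. The distance model actually used in MEGA X is not stated in this table.

```python
def p_distance_pairwise_deletion(a, b):
    """Uncorrected p-distance over sites unambiguous in BOTH sequences of this pair."""
    unambiguous = set("ACGT")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in unambiguous and y in unambiguous:
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites for this pair")
    return differences / compared

print(p_distance_pairwise_deletion("ACGTNA", "ACGAAR"))  # 0.25
```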

Sikkema 2020
Study setting: Hospital
Method used for WGS: Samples were selected based on a Ct <32, and a SARS-CoV-2-specific multiplex PCR for nanopore sequencing was performed. The resulting raw sequence data were demultiplexed using qcat. Primers were trimmed using cutadapt [17]. The reads were then aligned with minimap2 against the GISAID (Global Initiative on Sharing All Influenza Data) reference sequence EPI_ISL_412973. The consensus genome was extracted. Positions covered by fewer than 30 reads were replaced with N using a custom script built on Biopython (version 1.74) and the Python module pysam (version 0.15.3). Mutations in the genome were confirmed by manually checking the alignment. Homopolymeric regions were also checked manually and resolved by consulting the reference genome. Genomes were included if they had greater than 90% genome coverage.
Phylogenetic analysis: All available full-length SARS-CoV-2 genomes were retrieved from GISAID [20] on March 20, 2020 (appendix 1 pp 8–65). They were aligned with the newly obtained SARS-CoV-2 sequences from this study using the multiple sequence alignment software MUSCLE (version 3.8.1551). Sequences with more than 10% of positions replaced by N were excluded. The alignment was manually checked for discrepancies. The phylogenomic software IQ-TREE (version 1.6.8) was then used for a maximum-likelihood phylogenetic analysis, with GTR+F+I+G4 (generalised time reversible) as the best predicted substitution model. The ultrafast bootstrap option was used with 1000 replicates. Clusters were ascertained based on visual clustering and lineage designations.
The code to generate the minimum spanning phylogenetic tree was written in the R programming language. The ape [24] and igraph packages were used to build the minimum spanning tree, and the visNetwork package was used for the visualisation. The pairwise sequence distance used to generate the network was the sum of the absolute nucleotide distance and the indel-block distance. Unambiguous positions were dealt with in a pairwise manner. Some sequences were mistakenly identified as identical because of transient connections with sequences containing missing data; these cases were resolved.
Results: 46 (92%) of 50 sequences from health-care workers in the study were grouped in three clusters. Ten (100%) of 10 sequences from patients in the study grouped into the same three clusters.
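
The network step in the Sikkema entry combines a distance (nucleotide differences plus indel blocks) with a minimum spanning tree. Sikkema and colleagues did this in R with ape and igraph. The Python sketch below shows the same idea using Prim's algorithm. It assumes that an indel block is a contiguous run of columns where exactly one of the two sequences has a gap. The function names and toy alignment are hypothetical.

```python
import heapq

def combined_distance(a, b):
    """Nucleotide differences plus indel blocks between two aligned sequences.

    An indel block is assumed to be a contiguous run of columns where exactly
    one of the two sequences has a gap ('-'); each run counts once.
    """
    snps = blocks = 0
    in_block = False
    for x, y in zip(a.upper(), b.upper()):
        if (x == "-") != (y == "-"):
            if not in_block:
                blocks += 1
            in_block = True
            continue
        in_block = False
        if x in "ACGT" and y in "ACGT" and x != y:
            snps += 1
    return snps + blocks

def minimum_spanning_tree(names, dist):
    """Prim's algorithm on the complete graph of samples; returns (a, b, distance) edges."""
    if not names:
        return []
    in_tree = {names[0]}
    heap = [(dist(names[0], n), names[0], n) for n in names[1:]]
    heapq.heapify(heap)
    edges = []
    while heap and len(in_tree) < len(names):
        d, a, b = heapq.heappop(heap)
        if b in in_tree:
            continue  # stale entry: b is already in the tree
        in_tree.add(b)
        edges.append((a, b, d))
        for n in names:
            if n not in in_tree:
                heapq.heappush(heap, (dist(b, n), b, n))
    return edges

aln = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "AC--ACGTAA"}
tree = minimum_spanning_tree(sorted(aln), lambda m, n: combined_distance(aln[m], aln[n]))
print(tree)  # [('s1', 's2', 1), ('s2', 's3', 1)]
```

Counting a multi-base deletion once, rather than once per column, keeps a single indel event from outweighing several independent point mutations.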

Speake 2020
Study setting: Aircraft
Method used for WGS: Processed reads were mapped to the SARS-CoV-2 reference genome (GenBank accession no. MN908947). Primer-clipped alignment files were imported into Geneious Prime version 2020.1.1 for coverage analysis before consensus calling. Consensus sequences were generated using iVar version 1.2.2.
Phylogenetic analysis: SARS-CoV-2 genome sequences from Western Australia were assigned to lineages using the Phylogenetic Assignment of Named Global Outbreak LINeages (PANGOLIN) tool (https://github.com/cov-lineages/pangolin). On July 17, 2020, SARS-CoV-2 complete genomes with corresponding metadata were retrieved from the GISAID database. The final dataset contained 540 GISAID whole-genome sequences. These were aligned with the Western Australian sequences generated in this study using MAFFT version 7.467. Phylogenetic trees were visualized in iTOL (Interactive Tree Of Life, https://itol.embl.de) and MEGA version 7.014.
Results: 100% coverage was obtained for 21 samples and partial coverage (81%–99%) for 4 samples. In the phylogenetic tree, the 21 complete genomes belonged to either the A.2 (n = 17) or B.1 (n = 4) sublineages of SARS-CoV-2.
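
Consensus calling, done in the Speake entry with iVar, assigns each reference position the most frequent base among the reads covering it and masks poorly covered positions with N. The sketch below is a simplified majority-rule version, not iVar's algorithm. The `min_depth` and `min_freq` defaults are hypothetical; for comparison, Sikkema and colleagues masked positions with fewer than 30 reads.

```python
from collections import Counter

def call_consensus(pileup, min_depth=10, min_freq=0.5):
    """Majority-rule consensus from per-position base counts.

    Positions with fewer than min_depth reads, or whose top base is below
    min_freq of the reads, become 'N'. Both defaults are hypothetical.
    """
    bases = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            bases.append("N")
            continue
        base, n = counts.most_common(1)[0]
        bases.append(base if n / depth >= min_freq else "N")
    return "".join(bases)

pileup = [Counter(A=40), Counter(C=30, T=5), Counter(G=3), Counter(T=12, C=11)]
print(call_consensus(pileup))                # ACNT
print(call_consensus(pileup, min_freq=0.6))  # ACNN
```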

Taylor 2020
Study setting: Skilled nursing facilities
Method used for WGS: WGS was conducted by MDH-PHL on available specimens using previously described methods.
Phylogenetic analysis: Phylogenetic relationships, including distinct clustering of viral whole genome sequences, were inferred from nucleotide differences with IQ-TREE, using general time reversible substitution models.
Results: At facility A, specimens from 18 (35%) residents and seven (18%) HCP were sequenced; strains from 17 residents and five HCP were genetically similar. At facility B, 75 (66%) resident specimens and five (7%) HCP specimens were sequenced, all of which were genetically similar.

Wang 2020
Study setting: Home
Method used for WGS: Full genomes were sequenced using the BioelectronSeq 4000. The WGS analysis integrated information from 60 published SARS-CoV-2 genomic sequences. Full-length genomes were combined with published SARS-CoV-2 genomes and other coronaviruses and aligned with MAFFT using the FFT-NS-2 strategy.
Phylogenetic analysis: Maximum-likelihood phylogenies were inferred with RAxML under a generalised time-reversible (GTR) + gamma substitution model, with 1000 bootstrap replicates to assess confidence.
Results: The phylogenetic tree of full-length genomes showed that the SARS-CoV-2 strains form a monophyletic clade with 100% bootstrap support. Sequences from six HCWs in the Department of Neurosurgery and one family member were closely related in the phylogenetic tree. Thirty-three family members of the HCWs were not secondarily infected. This was attributed to the strict self-quarantine strategies the HCWs adopted immediately after their onset of illness: wearing a facial mask when they came home, living alone in a separate room, and never eating together with their families.
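
Bootstrapping 1000 times, as in the Wang entry's RAxML analysis, involves three steps:

1. Resample alignment columns with replacement.
2. Rebuild the tree on each pseudo-replicate.
3. Report the fraction of replicates in which a clade reappears.

The sketch below implements the resampling and the support fraction. The tree-building step is replaced by a hypothetical `clade_test` callable. The distance comparison in the usage example is only a stand-in for a real tree search.

```python
import random

def bootstrap_alignment(alignment, rng):
    """One pseudo-replicate: resample alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in columns) for n in names}

def bootstrap_support(alignment, clade_test, replicates=1000, seed=1):
    """Fraction of pseudo-replicates for which clade_test(replicate) is True.

    In a real analysis clade_test would build a tree (e.g. with RAxML) and check
    whether the clade is present; here it is any caller-supplied predicate.
    """
    rng = random.Random(seed)
    hits = sum(bool(clade_test(bootstrap_alignment(alignment, rng))) for _ in range(replicates))
    return hits / replicates

def differences(x, y):
    """Number of mismatched columns between two equal-length strings."""
    return sum(a != b for a, b in zip(x, y))

# Stand-in test (not a tree search): is 'a' closer to 'b' than to 'c'?
aln = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTGTCCGAAA"}
support = bootstrap_support(aln, lambda r: differences(r["a"], r["b"]) < differences(r["a"], r["c"]))
print(support)
```

Resampling whole columns keeps every sequence's bases at a site together, which is what makes each pseudo-replicate a valid alignment.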

Zhang 2021
Study setting: Local, household
Method used for WGS: Raw sequencing reads were trimmed to remove sequencing adaptors and low-quality bases. Clean reads were aligned to the SARS-CoV-2 reference genome (GenBank: NC_045512.2) using Bowtie2 v.2.2.5 [37] with default parameters. Duplicate reads were removed with Picard Tools. SAMtools (v.1.10) "mpileup" was used to call SNPs, with the parameter -Q 20. Each site was recalculated, and variants were screened with a Perl script using three criteria: (i) depth of the alternate allele ≥5; (ii) alternate allele frequency ≥70%; and (iii) exclusion of sites supported by only a single strand. The C337T variant in P4 was nevertheless considered an SNP. It was supported by sequencing reads (67% frequency) and validated by Sanger sequencing. Consensus sequences were called with BCFtools against the reference sequence.
Phylogenetic analysis: Phylogenomic analysis of 13 high-quality viral genomes (coverage ≥70%) was performed together with 72 strains circulating in Beijing during the same period. These comprised 33 public viral genomes from GISAID (Global Initiative on Sharing All Influenza Data) and 39 viral genomes from a local centre. Viral genomes were obtained from all 14 patients.
Of the 72 viral genomes, 33 were from GISAID (https://gisaid.org) and 39 were from Beijing Ditan Hospital (GenBank: PRJNA667180). The 5ʹ and 3ʹ untranslated regions were trimmed from the consensus sequences because of their poor quality. Multiple sequence alignment was performed with MAFFT v.7.453 [24, 39] using the parameters --auto --keeplength --addfragments. The maximum likelihood tree was constructed using IQ-TREE v.1.6.12 with 1000 bootstrap replicates. The substitution model GTR+F+R2 was selected based on the Bayesian information criterion score. TreeTime v.0.7.6 was used for time-resolved phylogenomic analysis [41]. iTOL (itol.embl.de) was used to display the topology of the phylogenomic tree.
The nucleotide frequency at each genomic locus was calculated from 85 viral genomes of circulating strains in Beijing (Figure S1). These were the 13 genomes (P1–P13) from the outbreak cluster and the 72 local genomes mentioned above. A median-joining network was constructed using NETWORK v.10.1.0.0 from the Fluxus Technology website (https://www.fluxus-engineering.com/).
Results: Twelve viral genomes from this outbreak clustered tightly into two clades, with bootstrap values of at least 77%.
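
The three variant filters in the Zhang entry can be sketched as a predicate over per-site read counts:

- (i) alternate-allele depth ≥5
- (ii) alternate-allele frequency ≥70%
- (iii) rejection of sites supported by a single strand

The `SiteCounts` record and its read counts are hypothetical. Criterion (iii) is interpreted here as requiring alternate-allele reads on both strands. Under these rules, a site at about 67% frequency, like C337T in P4, fails the frequency filter. This is consistent with that variant being retained only after separate Sanger validation.

```python
from dataclasses import dataclass

@dataclass
class SiteCounts:
    position: int
    ref: str
    alt: str
    depth: int        # all reads covering the site
    alt_forward: int  # alternate-allele reads on the forward strand
    alt_reverse: int  # alternate-allele reads on the reverse strand

def passes_filters(site, min_alt_depth=5, min_alt_freq=0.70):
    """Criteria (i)-(iii): alt depth >= 5, alt frequency >= 70%, alt seen on both strands."""
    alt_depth = site.alt_forward + site.alt_reverse
    if alt_depth < min_alt_depth:                                   # (i)
        return False
    if site.depth == 0 or alt_depth / site.depth < min_alt_freq:    # (ii)
        return False
    return site.alt_forward > 0 and site.alt_reverse > 0            # (iii)

# Hypothetical read counts, not the study's data.
sites = [
    SiteCounts(100, "A", "G", depth=50, alt_forward=20, alt_reverse=20),  # 80%, both strands
    SiteCounts(200, "G", "T", depth=50, alt_forward=45, alt_reverse=0),   # one strand only
    SiteCounts(337, "C", "T", depth=30, alt_forward=10, alt_reverse=10),  # about 67%
]
print([s.position for s in sites if passes_filters(s)])  # [100]
```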