2021 Apr 9;10:280. [Version 1] doi: 10.12688/f1000research.52439.1

Table 9. GS and phylogenetic analysis.

Study ID | Study setting | Method used for WGS | Phylogenetic analysis | Results
Böhmer 2020 Home, workplace Whole genome sequencing involved
Roche KAPA HyperPlus library preparation
and sequencing on Illumina NextSeq
and MiSeq instruments as well as RT-PCR
product sequencing on Oxford Nanopore
MinION using the primers described in
Corman and colleagues. Patient 1 was
sequenced on all three platforms; patients
2–7 were sequenced on Illumina NextSeq,
both with and without RT-PCR product
sequencing with primers as in Corman
and colleagues; and patients 8–11, 14, and
16 were sequenced on Oxford Nanopore
MinION. Sequencing of patient 15 was not
successful. Sequence gaps were filled by
Sanger sequencing.
Not reported
Presymptomatic transmission from patient 4 to patient 5 was strongly
supported by virus sequence analysis: a nonsynonymous nucleotide
polymorphism (a G6446A substitution) was found in the virus from
patients 4 and 5 onwards but not in any cases detected before this point
(patients 1–3). Later cases with available specimens, all containing this
same substitution, were all traced back to patient 5. The possibility that
patient 4 could have been infected by patient 5 was excluded by detailed
sequence analysis: patient 4 had the novel G6446A virus detected in a
throat swab and the original 6446G virus detected in her sputum, whereas
patient 5 had a homogeneous virus population containing the novel
G6446A substitution in the throat swab.
Firestone 2020 Motorcycle rally WGS was conducted at the MDH Public
Health Laboratory on 38 specimens using
previously described methods.
Phylogenetic relationships,
including distinct clustering of
viral whole genome sequences,
were inferred based on
nucleotide differences via
IQ-TREE using general time
reversible substitution models
as a part of the Nextstrain
workflow.
38 (73%) specimens (23 [61%] from primary and 15 [39%] from secondary
and tertiary cases) were successfully sequenced, covering at least 98%
of the SARS-CoV-2 genome. Six genetically similar clusters with known
epidemiologic links were identified (i.e., cases in patients who were
close contacts or who had common exposures at the rally), five of which
demonstrated secondary or secondary and tertiary transmission.
Jiang 2020 Home Positive samples were sequenced directly
from the original specimens as previously
described.
*Reference virus genomes were obtained
from GenBank using Blastn with 2019-
nCoV as a query. The open reading frames
of the verified genome sequences were
predicted using Geneious (version 11.1.5)
and annotated using the Conserved
Domain Database. Pairwise sequence
identities were also calculated using
Geneious. Potential genetic recombination
was investigated using SimPlot software
and phylogenetic analysis.
The maximum likelihood
phylogenetic tree of the
complete genomes was
constructed using RAxML
software with 1000 bootstrap
replicates, employing the
general time-reversible
nucleotide substitution model.
The full genomes of 8 patients were >99.9% identical across the whole
genome. Phylogenetic analysis showed that viruses from patients were
clustered in the same clade and genetically similar to other
SARS-CoV-2 sequences reported in other countries.
Ladhani 2020a Care homes Whole genome sequencing (WGS) was
performed on all RT-PCR positive samples.
Viral amplicons were sequenced using
Illumina library preparation kits (Nextera)
and sequenced on Illumina short-read
sequencing machines. Raw sequence data
was trimmed and aligned against a SARS-
CoV-2 reference genome (NC_045512.2).
A consensus sequence representing
each genome base was derived from the
reference alignment.
Consensus sequences were
assessed for quality, aligned
using MAFFT (Multiple Alignment
using Fast Fourier Transform,
version 7.310), manually curated
and maximum likelihood
phylogenetic trees derived using
IQtree (version 2.04).
All 158 PCR positive samples underwent WGS analysis and 99 (68
residents, 31 staff) distributed across all the care homes yielded sequence
sufficient for WGS analysis. Phylogenetic analysis identified informal
clusters, with evidence for multiple introductions of the virus into care
home settings. All care home clusters of SARS-CoV-2 genomes included at
least one staff member, apart from care home B with no PCR positive staff
and high rates of staff self-isolation. Care home A exhibited three distinct
sequence clusters and six singletons, potentially representing up to nine
separate introductions. Genomic analysis did not identify any differences
between asymptomatic/symptomatic residents/staff. The 10 sequences
from residents who died were distributed across the lineages identified
and were closely matched to sequences derived from non-fatal cases in
the same care homes.
Lucey 2020 Hospital Complementary DNA was obtained from
isolated RNA through reverse transcription
and multiplex PCR according to the
protocol provided by the ARTIC Network
initiative. Libraries were prepared using
the NEBNext Ultra II kit (New England
Biolabs) and sequenced on an Illumina
MiSeq using 300-cycle v2 reagent kits
(Illumina). Bowtie 2 was used for aligning
the sequencing reads to the reference
genome for SARS-CoV-2 (GenBank
number, MN908947.3) and SAMtools for
manipulating the alignments.
SNPs were used to define
clusters and a median-joining
network was generated
including the data from this
study and an additional 1,000
strains collected from GISAID
available on May 22nd. Clade
annotation was included for the
Pangolin, GISAID and NextStrain
systems.
WGS identified six clusters of nosocomial SARS-CoV-2 transmission. The
average sequence quality per sample was >99% for 46 samples, and
between 92 and 94% for 4 samples. Phylogenetic analysis identified six
independent groups of which clusters 1–3 were related to 39 patients.
Pung 2020 Multiple: company conference, church, tour group.
Strain names, GISAID EpiCoV accession
numbers used for genomic sequencing
The phylogenetic tree was built using the
Neighbor-Joining method and
confirmed using Maximum
Likelihood approaches. Replicate
trees with bootstrap were used. All
ambiguous positions were
removed for each sequence
pair (pairwise deletion option).
Evolutionary analyses were
conducted in MEGA X. Strain
names, GISAID EpiCoV accession
numbers and collection dates
are shown, followed by the case
number if available.
Cluster A: Viral genomic sequences were available for four cases (AH1,
AH2, AH3, and AT1) and phylogenetic analysis confirmed their linkage, as
suggested by the epidemiological data.
Sikkema 2020 Hospital Samples were selected based on a Ct
<32. A SARS-CoV-2-specific multiplex
PCR for nanopore sequencing was done.
The resulting raw sequence data were
demultiplexed using qcat. Primers were
trimmed using cutadapt, after which a
reference-based alignment to the GISAID
(Global Initiative on Sharing All Influenza
Data) sequence EPI_ISL_412973 was done
using minimap2. The consensus genome
was extracted and positions with a
coverage less than 30 reads were replaced
with N by a custom script written with
Biopython (version 1.74) and the
Python module pysam (version 0.15.3).
Mutations in the genome were confirmed
by manually checking the alignment, and
homopolymeric regions were manually
checked and resolved, consulting the
reference genome. Genomes were
included if they had greater than 90%
genome coverage.
All available full-length SARS-CoV-2
genomes were retrieved from GISAID
on March 20, 2020 (appendix 1 pp 8–65),
and aligned with the newly obtained
SARS-CoV-2 sequences in this study
using the multiple sequence alignment
software MUSCLE (version 3.8.1551).
Sequences with more than 10% of N
position replacements were excluded.
The alignment was manually checked
for discrepancies, after which the
phylogenomic software IQ-TREE (version
1.6.8) was used to do a maximum-
likelihood phylogenetic analysis, with the
generalised time reversible substitution
model GTR+F+I+G4 as best predicted
model. The ultrafast bootstrap option was
used with 1000 replicates. Clusters were
ascertained based on visual clustering and
lineage designations.
The code to generate
the minimum spanning
phylogenetic tree was written in
the R programming language.
Ape and igraph software
packages were used to write
the code to generate the
minimum spanning tree, and the
visNetwork software package
was used to generate the
visualisation. Pairwise sequence
distance (used to generate
the network) was calculated
by adding up the absolute
nucleotide distance and indel-
block distance. Unambiguous
positions were dealt with in a
pairwise manner. Sequences
that were mistakenly identified
as identical, because of transient
connections with sequences
containing missing data, were
resolved.
46 (92%) of 50 sequences from health-care workers in the study were
grouped in three clusters. Ten (100%) of 10 sequences from patients in the
study grouped into the same three clusters:
Speake 2020 Aircraft Processed reads were mapped to the
SARS-CoV-2 reference genome (GenBank
accession no. MN908947). Primer-
clipped alignment files were imported
into Geneious Prime version 2020.1.1
for coverage analysis before consensus
calling, and consensus sequences were
generated by using iVar version 1.2.2.
Genome sequences of SARS-
CoV-2 from Western Australia
were assigned to lineages
by using the Phylogenetic
Assignment of Named Global
Outbreak LINeages (PANGOLIN)
tool (https://github.com/cov-lineages/pangolin).
On July 17, 2020, we retrieved
SARS-CoV-2 complete genomes
with corresponding metadata
from the GISAID database.
The final dataset contained
540 GISAID whole-genome
sequences that were aligned
with the sequences from
Western Australia generated
in this study by using MAFFT
version 7.467. Phylogenetic
trees were visualized in iTOL
(Interactive Tree Of Life, https://itol.embl.de) and
MEGA version 7.014.
Complete (100%) coverage was obtained for 21 samples and partial coverage
(81%–99%) for 4 samples. In the phylogenetic tree, the 21 complete genomes
belonged to either the A.2 (n = 17) or B.1 (n = 4) sublineage of SARS-CoV-2.
Taylor 2020 Skilled nursing facilities
WGS was conducted by MDH-PHL on
available specimens using previously
described methods.
Phylogenetic relationships,
including distinct clustering of
viral whole genome sequences,
were inferred based on
nucleotide differences via
IQ-TREE, using general time
reversible substitution models.
Specimens from 18 (35%) residents and seven (18%) HCP at facility A
were sequenced. Strains from 17 residents and five HCP were genetically
similar. At facility B, 75 (66%) resident specimens and five (7%) HCP
specimens were sequenced, all of which were genetically similar.
Wang 2020 Home Full genomes were sequenced using the
BioelectronSeq 4000. WGS integrated
information from 60 published genomic
sequences of SARS-CoV-2. Full-length
genomes were combined with published
SARS-CoV-2 genomes and other
coronaviruses and aligned using the FFT-NS-2
method in MAFFT.
Maximum-likelihood
phylogenies were inferred under
a general time-reversible
(GTR) + gamma substitution
model and bootstrapped 1000
times to assess confidence using
RAxML.
The phylogenetic tree of full-length genomes showed that SARS-CoV-2
strains form a monophyletic clade with a bootstrap support of 100%.
Sequences from six HCWs in the Department of Neurosurgery and one
family member were closely related in the phylogenetic tree.
33 family members of the HCWs were not secondarily infected, due to the
strict self-quarantine strategies taken by the HCWs immediately after their
onset of illness, including wearing a facial mask when they came home,
living alone in a separate room, and never eating together with their families.
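
The consensus filtering rules reported for Sikkema 2020 in the table above (positions covered by fewer than 30 reads replaced with N, genomes kept only when coverage exceeds 90%) can be illustrated with a minimal Python sketch. It is not the study's script: the function names and the toy inputs are hypothetical, the consensus and per-position read depths are assumed to be already available, and the distance function counts nucleotide differences only, skipping masked positions pair by pair and omitting the indel-block distance that the study added.

```python
from typing import List


def mask_low_coverage(consensus: str, depth: List[int], min_depth: int = 30) -> str:
    """Replace every base covered by fewer than min_depth reads with 'N'."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if reads >= min_depth else "N"
        for base, reads in zip(consensus, depth)
    )


def genome_coverage(sequence: str) -> float:
    """Fraction of positions that carry a called base (anything other than 'N')."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(base != "N" for base in sequence) / len(sequence)


def passes_coverage_filter(sequence: str, threshold: float = 0.90) -> bool:
    """Keep a genome only when its coverage is strictly greater than threshold."""
    return genome_coverage(sequence) > threshold


def pairwise_nucleotide_distance(seq_a: str, seq_b: str) -> int:
    """Count mismatches between two aligned sequences.

    Positions where either sequence is 'N' are skipped for this pair only
    (pairwise deletion). Simplification: indel blocks are not scored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != "N" and b != "N" and a != b
    )


if __name__ == "__main__":
    # Toy 10-base example; real genomes are roughly 30,000 bases long.
    toy_consensus = "ACGTACGTAC"
    toy_depth = [50, 50, 10, 50, 50, 50, 29, 30, 50, 50]
    masked = mask_low_coverage(toy_consensus, toy_depth)
    print(masked, genome_coverage(masked), passes_coverage_filter(masked))
    print(pairwise_nucleotide_distance("ACNTACGTAC", "ACGTTCGTNC"))
```

Skipping masked positions per pair, rather than dropping them from the whole alignment, keeps as many comparable sites as possible for each pair of genomes, which matters when different samples have gaps in different regions.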