Table 9. WGS and phylogenetic analysis.
Study ID | Study Setting | Method used for WGS | Phylogenetic analysis | Results |
---|---|---|---|---|
Böhmer 2020 | Home, workplace | Whole genome sequencing involved Roche KAPA HyperPlus library preparation and sequencing on Illumina NextSeq and MiSeq instruments as well as RT-PCR product sequencing on Oxford Nanopore MinION using the primers described in Corman and colleagues. Patient 1 was sequenced on all three platforms; patients 2–7 were sequenced on Illumina NextSeq, both with and without RT-PCR product sequencing with primers as in Corman and colleagues; and patients 8–11, 14, and 16 were sequenced on Oxford Nanopore MinION. Sequencing of patient 15 was not successful. Sequence gaps were filled by Sanger sequencing. | Not reported | Presymptomatic transmission from patient 4 to patient 5 was strongly supported by virus sequence analysis: a nonsynonymous nucleotide polymorphism (a G6446A substitution) was found in the virus from patients 4 and 5 onwards but not in any cases detected before this point (patients 1–3). Later cases with available specimens, all containing this same substitution, were all traced back to patient 5. The possibility that patient 4 could have been infected by patient 5 was excluded by detailed sequence analysis: patient 4 had the novel G6446A virus detected in a throat swab and the original 6446G virus detected in her sputum, whereas patient 5 had a homogeneous virus population containing the novel G6446A substitution in the throat swab. |
Firestone 2020 | Motorcycle rally | WGS was conducted at the MDH Public Health Laboratory on 38 specimens using previously described methods. | Phylogenetic relationships, including distinct clustering of viral whole genome sequences, were inferred based on nucleotide differences via IQ-TREE using general time reversible substitution models as a part of the Nextstrain workflow. | 38 (73%) specimens (23 [61%] from primary and 15 [39%] from secondary and tertiary cases) were successfully sequenced, covering at least 98% of the SARS-CoV-2 genome. Six genetically similar clusters with known epidemiologic links were identified (i.e., cases in patients who were close contacts or who had common exposures at the rally), five of which demonstrated secondary or secondary and tertiary transmission. |
Jiang 2020 | Home | Positive samples were sequenced directly from the original specimens as previously described. Reference virus genomes were obtained from GenBank using Blastn with 2019-nCoV as a query. The open reading frames of the verified genome sequences were predicted using Geneious (version 11.1.5) and annotated using the Conserved Domain Database. Pairwise sequence identities were also calculated using Geneious. Potential genetic recombination was investigated using SimPlot software and phylogenetic analysis. | The maximum likelihood phylogenetic tree of the complete genomes was constructed using RAxML software with 1000 bootstrap replicates, employing the general time-reversible nucleotide substitution model. | The full genomes from 8 patients were >99.9% identical across the whole genome. Phylogenetic analysis showed that viruses from patients were clustered in the same clade and genetically similar to other SARS-CoV-2 sequences reported in other countries. |
Ladhani 2020a | Care homes | Whole genome sequencing (WGS) was performed on all RT-PCR positive samples. Viral amplicons were sequenced using Illumina library preparation kits (Nextera) and sequenced on Illumina short-read sequencing machines. Raw sequence data was trimmed and aligned against a SARS-CoV-2 reference genome (NC_045512.2). A consensus sequence representing each genome base was derived from the reference alignment. | Consensus sequences were assessed for quality, aligned using MAFFT (Multiple Alignment using Fast Fourier Transform, version 7.310), manually curated and maximum likelihood phylogenetic trees derived using IQtree (version 2.04). | All 158 PCR positive samples underwent WGS analysis and 99 (68 residents, 31 staff) distributed across all the care homes yielded sequence sufficient for WGS analysis. Phylogenetic analysis identified informal clusters, with evidence for multiple introductions of the virus into care home settings. All care home clusters of SARS-CoV-2 genomes included at least one staff member, apart from care home B with no PCR positive staff and high rates of staff self-isolation. Care home A exhibited three distinct sequence clusters and six singletons, potentially representing up to nine separate introductions. Genomic analysis did not identify any differences between asymptomatic/symptomatic residents/staff. The 10 sequences from residents who died were distributed across the lineages identified and were closely matched to sequences derived from non-fatal cases in the same care homes. |
Lucey 2020 | Hospital | Complementary DNA was obtained from isolated RNA through reverse transcription and multiplex PCR according to the protocol provided by the ARTIC Network initiative. Libraries were prepared using the NEBNext Ultra II kit (New England Biolabs) and sequenced on an Illumina MiSeq using 300-cycle v2 reagent kits (Illumina). Bowtie 2 was used for aligning the sequencing reads to the reference genome for SARS-CoV-2 (GenBank number, MN908947.3) and SAMtools for manipulating the alignments. | SNPs were used to define clusters and a median-joining network was generated including data from this study and an additional 1,000 strains collected from GISAID available on May 22nd. Clade annotation was included for the Pangolin, GISAID and NextStrain systems. | WGS identified six clusters of nosocomial SARS-CoV-2 transmission. The average sequence quality per sample was >99% for 46 samples, and between 92 and 94% for 4 samples. Phylogenetic analysis identified six independent groups of which clusters 1–3 were related to 39 patients. |
Pung 2020 | Multiple: company conference, church, tour group. | Strain names, GISAID EpiCoV accession numbers used for genomic sequencing | The phylogenetic tree was built using the Neighbor-Joining method and confirmed using Maximum Likelihood approaches. Bootstrap replicate trees were used. All ambiguous positions were removed for each sequence pair (pairwise deletion option). Evolutionary analyses were conducted in MEGA X. Strain names, GISAID EpiCoV accession numbers and collection dates are shown, followed by the case number if available. | Cluster A: Viral genomic sequences were available for four cases (AH1, AH2, AH3, and AT1) and phylogenetic analysis confirmed their linkage, as suggested by the epidemiological data. |
Sikkema 2020 | Hospital | Samples were selected based on a Ct <32. A SARS-CoV-2-specific multiplex PCR for nanopore sequencing was done. The resulting raw sequence data were demultiplexed using qcat. Primers were trimmed using cutadapt, after which a reference-based alignment to the GISAID (Global Initiative on Sharing All Influenza Data) sequence EPI_ISL_412973 was done using minimap2. The consensus genome was extracted and positions with a coverage less than 30 reads were replaced with N using a custom script using Biopython software (version 1.74) and the Python module pysam (version 0.15.3). Mutations in the genome were confirmed by manually checking the alignment, and homopolymeric regions were manually checked and resolved, consulting the reference genome. Genomes were included when having greater than 90% genome coverage. All available full-length SARS-CoV-2 genomes were retrieved from GISAID on March 20, 2020 (appendix 1 pp 8–65), and aligned with the newly obtained SARS-CoV-2 sequences in this study using the multiple sequence alignment software MUSCLE (version 3.8.1551). Sequences with more than 10% of N position replacements were excluded. The alignment was manually checked for discrepancies, after which the phylogenomic software IQ-TREE (version 1.6.8) was used to do a maximum-likelihood phylogenetic analysis, with the generalised time reversible substitution model GTR+F+I+G4 as best predicted model. The ultrafast bootstrap option was used with 1000 replicates. Clusters were ascertained based on visual clustering and lineage designations. | The code to generate the minimum spanning phylogenetic tree was written in the R programming language. The ape and igraph software packages were used to write the code to generate the minimum spanning tree, and the visNetwork software package was used to generate the visualisation. Pairwise sequence distance (used to generate the network) was calculated by adding up the absolute nucleotide distance and indel-block distance. Unambiguous positions were dealt with in a pairwise manner. Sequences that were mistakenly identified as identical, because of transient connections with sequences containing missing data, were resolved. | 46 (92%) of 50 sequences from health-care workers in the study were grouped in three clusters. Ten (100%) of 10 sequences from patients in the study grouped into the same three clusters. |
Speake 2020 | Aircraft | Processed reads were mapped to the SARS-CoV-2 reference genome (GenBank accession no. MN908947). Primer-clipped alignment files were imported into Geneious Prime version 2020.1.1 for coverage analysis before consensus calling, and consensus sequences were generated by using iVar version 1.2.2. | Genome sequences of SARS-CoV-2 from Western Australia were assigned to lineages by using the Phylogenetic Assignment of Named Global Outbreak LINeages (PANGOLIN) tool (https://github.com/cov-lineages/pangolin). On July 17, 2020, we retrieved SARS-CoV-2 complete genomes with corresponding metadata from the GISAID database. The final dataset contained 540 GISAID whole-genome sequences that were aligned with the sequences from Western Australia generated in this study by using MAFFT version 7.467. Phylogenetic trees were visualized in iTOL (Interactive Tree Of Life, https://itol.embl.de) and MEGA version 7.014. | 100% coverage was obtained for 21 samples and partial coverage (81%–99%) for 4 samples. The 21 complete genomes belonged to either the A.2 (n = 17) or B.1 (n = 4) sublineages of SARS-CoV-2. |
Taylor 2020 | Skilled nursing facilities | WGS was conducted by MDH-PHL on available specimens using previously described methods. | Phylogenetic relationships, including distinct clustering of viral whole genome sequences, were inferred based on nucleotide differences via IQ-TREE, using general time reversible substitution models. | Specimens from 18 (35%) residents and seven (18%) HCP at facility A were sequenced. Strains from 17 residents and five HCP were genetically similar. At facility B, 75 (66%) resident specimens and five (7%) HCP specimens were sequenced, all of which were genetically similar. |
Wang 2020 | Home | Full genomes were sequenced using the BioelectronSeq 4000. WGS integrated information from 60 published genomic sequences of SARS-CoV-2. Full-length genomes were combined with published SARS-CoV-2 genomes and other coronaviruses and aligned using the FFT-NS-2 model by MAFFT. | Maximum-likelihood phylogenies were inferred under a general time-reversible (GTR) + gamma substitution model and bootstrapped 1000 times to assess confidence using RAxML. | The phylogenetic tree of full-length genomes showed that SARS-CoV-2 strains form a monophyletic clade with a bootstrap support of 100%. Sequences from six HCWs in the Department of Neurosurgery and one family member were closely related in the phylogenetic tree. 33 family members of the HCWs were not secondarily infected, due to the strict self-quarantine strategies taken by the HCWs immediately after their onset of illness, including wearing a facial mask when they came home, living alone in a separate room, and never eating together with their families. |
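
Two of the steps described in the table can be illustrated with a minimal Python sketch: replacing consensus positions covered by fewer than 30 reads with N and checking genome coverage (Sikkema 2020), and counting nucleotide differences between two aligned sequences while skipping positions that are ambiguous in either sequence (the pairwise deletion option in Pung 2020). This is not the code used by any of the study authors. The function names, the toy sequences and the depth values are hypothetical, and the indel-block component of the Sikkema 2020 distance is not modelled.

```python
# Illustrative sketch only; names and toy data are assumptions, not taken from the studies.

UNAMBIGUOUS = set("ACGT")


def mask_low_coverage(consensus, depths, min_depth=30):
    """Replace bases covered by fewer than min_depth reads with 'N'."""
    if len(consensus) != len(depths):
        raise ValueError("consensus and depths must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(consensus, depths)
    )


def genome_coverage(sequence):
    """Fraction of positions that are not 'N'."""
    if not sequence:
        return 0.0
    called = sum(1 for base in sequence.upper() if base != "N")
    return called / len(sequence)


def pairwise_distance(seq_a, seq_b):
    """Count nucleotide differences between two aligned sequences.

    Positions where either sequence is not A, C, G or T are skipped
    for this pair only (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS and a != b
    )


# Toy example: a 10-base consensus with two poorly covered positions.
masked = mask_low_coverage(
    "ACGTACGTAC", [40, 40, 10, 40, 40, 29, 30, 40, 40, 40]
)
coverage = genome_coverage(masked)
distance = pairwise_distance(masked, "ACGTTNGTAA")
```

In this toy example a genome would pass a 90% inclusion threshold only if `coverage` exceeded 0.9. Skipping ambiguous positions per pair, instead of dropping whole alignment columns, keeps more sites available for each comparison when missing data differ between genomes.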