[Preprint]. 2023 Jul 31:2022.06.24.497555. Originally published 2022 Jun 27. [Version 4] doi: 10.1101/2022.06.24.497555

Figure 2. SPLASH identifies strain-defining and other variation in SARS-CoV-2.


In A-C, sets of targets that distinguish between SARS-CoV-2 strains are shown; all are in the spike protein (S) gene. Each heatmap has columns for different patients and rows for different targets; the coloring indicates the fraction of the given target observed in the given patient. Summary anchor counts are given for rows and columns to indicate how many observations each one summarizes. Also shown is a map of the categorical metadata giving the strains (primary and secondary) identified in each patient in the original study; these data were not used by SPLASH, but there is evident agreement between the target fractions in the heatmap and the strain assignments in the metadata. We give binomial p-values to quantify the distinctions in the plots (per Note S7).
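The heatmap values described above can be illustrated with a minimal sketch. It assumes, as one reading of the caption, that each cell is the count of a target in a patient divided by the total count of all targets for that anchor in the same patient, and that the row and column summaries are plain sums. The function name, the input layout, and the counts are hypothetical and are not taken from SPLASH or the original study.

```python
from collections import defaultdict


def target_fractions(counts):
    """Compute per-patient target fractions for one anchor.

    counts: dict mapping (target, patient) -> observed count (hypothetical layout).
    Returns (fractions, row_totals, col_totals), where fractions maps
    (target, patient) -> count / total count for that patient.
    """
    row_totals = defaultdict(int)  # total count per target (heatmap rows)
    col_totals = defaultdict(int)  # total count per patient (heatmap columns)
    for (target, patient), n in counts.items():
        row_totals[target] += n
        col_totals[patient] += n

    # Assumption: each column is normalized to sum to 1; patients with no
    # observations for this anchor are skipped to avoid division by zero.
    fractions = {
        (target, patient): n / col_totals[patient]
        for (target, patient), n in counts.items()
        if col_totals[patient] > 0
    }
    return fractions, dict(row_totals), dict(col_totals)


# Made-up counts for two targets in two patients.
counts = {
    ("target_1", "patient_A"): 30,
    ("target_2", "patient_A"): 10,
    ("target_1", "patient_B"): 0,
    ("target_2", "patient_B"): 25,
}
fractions, row_totals, col_totals = target_fractions(counts)
```

With these made-up counts, patient_A shows a mixture of both targets while patient_B shows only target 2, which is the kind of contrast the heatmap colors convey.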

A. Mutation K417N, identified by SPLASH in target 2, distinguishes at the major strain level: it is absent from Delta but present in all Omicron (both BA.1 and BA.2 sub-strains). Patients classified as Delta all express target 1; two patients co-infected with Delta and Omicron show both targets. (p = 6.4E-07)

B. An anchor with three targets identified by SPLASH distinguishes at the sub-strain level: target 1, with no mutations, matches Delta; target 2, with V213G, is specific for BA.2; and target 3 has both a deletion mutation (NL211I) and an insertion mutation (R214REPE) characteristic of BA.1. Targets 1 and 2 associate inversely with Delta and BA.2; target 3 is more mixed, because all samples with this anchor express some level of BA.1. (p = 1.0E-13)

C. An anchor with four main targets identifies mutations that are not associated with a specific strain: targets 2 and 3 encode Q677H (with different mutations) together with the Delta-specific mutation P681R, and each target is predominant in a different patient. Target 1 has only P681R and lacks Q677H. Target 4 has the Omicron-specific mutations N679K and P681H. SPLASH can identify complicated mutation patterns. (p = 4.9E-12)

D. Protein domain profiling in SARS-CoV-2. The top four protein domain types found by Pfam for translated extended sequences for SPLASH significant anchors (green bars) and Control anchors (highest abundance but generally without significant p-values; gray bars) are shown. S1 Receptor binding domain (RBD) and S2 domain, known to be under strong selective pressure, show high variation by SPLASH in both datasets. Other abbreviations used in the Pfam short-names are: bCoV = beta-coronavirus; CoV = coronavirus; nucleocap = nucleocapsid; N = N-terminal domain; SARS = Severe acute respiratory syndrome coronavirus. M, NS7A, NSP1, NSP8, 3b, and NSP10 are viral protein names.