PLOS ONE. 2022 Sep 22;17(9):e0274967. doi: 10.1371/journal.pone.0274967

Untargeted saliva metabolomics by liquid chromatography-mass spectrometry reveals markers of COVID-19 severity

Cecile F Frampas 1,#, Katie Longman 1,#, Matt Spick 1, Holly-May Lewis 1, Catia D S Costa 2, Alex Stewart 3, Deborah Dunn-Walters 3, Danni Greener 4, George Evetts 4, Debra J Skene 3, Drupad Trivedi 5, Andy Pitt 5, Katherine Hollywood 5, Perdita Barran 5, Melanie J Bailey 1,2,*
Editor: Tommaso Lomonaco 6
PMCID: PMC9498978  PMID: 36137157

Abstract

Background

The COVID-19 pandemic is likely to represent an ongoing global health issue given the potential for new variants, vaccine escape and the low likelihood of eliminating all reservoirs of the disease. Whilst diagnostic testing has progressed at a fast pace, the metabolic drivers of outcomes, and whether markers can be found in different biofluids, are not well understood. Recent research has shown that serum metabolomics has potential for prognosis of disease progression. In a hospital setting, collection of saliva samples is more convenient for both staff and patients, and therefore offers an alternative sampling matrix to serum.

Methods

Saliva samples were collected from hospitalised patients with clinical suspicion of COVID-19, alongside clinical metadata. COVID-19 diagnosis was confirmed using RT-PCR testing, and COVID-19 severity was classified using clinical descriptors (respiratory rate, peripheral oxygen saturation score and C-reactive protein levels). Metabolites were extracted and analysed using high resolution liquid chromatography-mass spectrometry, and the resulting peak area matrix was analysed using multivariate techniques.

Results

A partial least squares-discriminant analysis metabolomics model employing a panel of 6 features (5 amino acids and one feature that could be identified by formula only) achieved a positive percent agreement of 1.00 with the clinical diagnosis of COVID-19 severity. The negative percent agreement with the clinical severity diagnosis was also 1.00, giving an area under the receiver operating characteristic curve of 1.00 for the panel of features identified.

Conclusions

In this exploratory work, we found that saliva metabolomics and in particular amino acids can be capable of separating high severity COVID-19 patients from low severity COVID-19 patients. This expands the atlas of COVID-19 metabolic dysregulation and could in future offer the basis of a quick and non-invasive means of sampling patients, intended to supplement existing clinical tests, with the goal of offering timely treatment to patients with potentially poor outcomes.

1. Introduction

The SARS-CoV-2 pandemic has caused a sustained threat to global health since the discovery of the virus in 2019 [1]. Whilst great strides have been made in both treatment and vaccination development [2, 3], the disease has inflicted multiple waves of infection throughout the world during 2020 and into 2021 [4, 5]. COVID-19 has higher fatality rates than seasonal influenza [6], and in addition, new variants are constantly evolving with the potential for either reduced vaccine effectiveness or altered lethality [7]. As a consequence, there is a continuing need for both better understanding of the impact of COVID-19 on the host metabolism as well as for prognostic tests that can be used to triage the high volumes of patients arriving in hospital settings.

Nasopharyngeal swabs followed by polymerase chain reaction (PCR) have been adopted worldwide for SARS-CoV-2 detection. However, supply chains for swabs rapidly collapsed amongst exponential increases in demand for testing, highlighting the urgency for alternative sample types and testing approaches. Furthermore, whilst PCR tests are easily deployable and highly selective for the virus, these approaches yield no prognostic information and cannot easily deliver rapid turnaround at the point of care, for example during a hospital admissions process. In contrast, analyses based on mass spectrometry can be provided in minutes, and have shown promise in the diagnosis of COVID-19 [8]. Furthermore, mass spectrometry instrumentation is often available to hospitals through third party providers or in-house laboratories. Prognostic tests, whilst challenging due to the varied phenotypes that may present themselves [9], could be used to manage demand for hospitalisation and treatment, especially if vaccine escape leads to future waves of severe COVID-19 infection.

Metabolic biomarkers in blood have been identified that carry prognostic information [10–12], but sampling blood is invasive. Our experience in collecting and analysing patient samples is that saliva samples are significantly easier to collect and handle than blood. Blood collection requires trained phlebotomists, causes discomfort to patients and must be spun soon after collection to preserve the metabolome [13]. In contrast, a saliva sample can be donated quickly and painlessly by a patient [14]. Saliva is itself a carrier of the coronavirus [15], and additionally can convey information on wellness via its own characteristic metabolites [16]. To date, saliva is relatively under-researched as a biofluid for metabolism analysis. It has been used for breast, pancreatic and also oral cancers [17, 18], and saliva multi-omics has been used to distinguish between COVID-19 inpatients and outpatients [19]. Here we undertook a preliminary and explorative study to investigate the suitability of saliva metabolomics for identifying biomarkers of COVID-19 positivity as well as biomarkers specific to COVID-19 severity within a hospital inpatient cohort (Fig 1).

Fig 1. Workflow summary: recruitment, processing and results, produced with Biorender.com.

This work took place as part of the efforts of the COVID-19 International Mass Spectrometry (MS) Coalition [20]. This consortium aims to provide molecular level information on SARS-CoV-2 in infected humans, in order to better understand, diagnose and treat cases of COVID-19 infection. Data related to this work will be stored and fully accessible on the MS Coalition open repository on publication. The website URL is https://covid19-msc.org/

2. Materials and methods

2.1. Participant recruitment and ethics

Ethical approval for this project (IRAS project ID 155921) was obtained via the NHS Health Research Authority (REC reference: 14/LO/1221). 88 participants were recruited at NHS Frimley NHS Foundation Trust hospitals by researchers from the University of Surrey. Participants were identified by clinical staff to ensure that they had the capacity to consent to the study and were asked to sign an Informed Consent Form, witnessed by two University of Surrey researchers; written / verbal informed consent was obtained from all participants for inclusion in the study, and those that did not have this capacity or who did not provide written consent were not sampled. Consenting participants were categorised by the hospital as either “query COVID” (meaning there was clinical suspicion of COVID-19 infection) or “COVID positive” (meaning that a positive COVID test result had been recorded during their admission). All participants were provided with a Patient Information Sheet explaining the goals of the study.

Inclusion for participants was determined by reverse transcription polymerase chain reaction (RT-PCR) results; participants with an inconclusive RT-PCR test (clinically positive only and/or inconclusive test result, n = 6) or where the time lag between initial RT-PCR test and sampling exceeded fourteen days were excluded (n = 7). These additional exclusion criteria reduced the participant population from 88 to 75.

2.2. Sample collection, extraction and instrumental analysis

Patients were sampled immediately upon recruitment to the study in two waves, one between May and August 2020 and the second between October and November 2020. The time between symptom onset and saliva sampling ranged from 1 day to > 1 month, an inevitable consequence of collecting samples in a pandemic situation. Subsequently, the population was filtered prior to statistical analysis to exclude patients whose RT-PCR test was more than 14 days from saliva collection. Each participant provided a sample of saliva by spitting directly into a Falcon tube, which was placed on ice immediately after collection. Samples were collected between the hours of 9 a.m. and 1 p.m. and transferred on ice from the hospital to the University of Surrey by courier within 4 hours of collection, to minimise changes to salivary metabolites [21]. Once received at the University of Surrey, the samples were stored at −80°C until analysis.

Alongside saliva collection, metadata for all participants was also collected covering inter alia sex, age, comorbidities (based on whether the participant was receiving treatment), the results and dates of COVID PCR tests, bilateral chest X-Ray changes, smoking status, drug regimen, and whether and when the participant presented with clinical symptoms of COVID-19. This included access to medical records, for which consent was given according to the Informed Consent Form described previously. Participants were also sampled for sebum and serum [22]. Values for lymphocytes, CRP and eosinophils were also taken; values obtained within five days of the saliva sampling were recorded. Each participant was attributed a “severity score” in relation to their fitness observations at the time of hospital admission using the metadata collected. We adapted the “mortality scoring” approach of Knight et al. [23] to provide a score for symptom severity. This was derived from the sum of the respiratory rate score (with patients scoring 0 for <20, +1 for 20–29 and +2 for ≥30 breaths per min), peripheral oxygen saturation score (%) (0 for ≥92, +2 for <92) and C-reactive protein level score (0 for <50 mg/L, +1 for 50–99 mg/L and +2 for ≥100 mg/L). This score ranged from 0 to 6; patients scoring 0 to 3 were attributed low severity and patients scoring 4 to 6 were attributed high severity.
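The scoring rule above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' code: the function and argument names are hypothetical, and the oxygen saturation cut-off is assumed to be 92% on both sides (0 at or above 92%, +2 below), following the Knight et al. score that the text adapts.

```python
def severity_score(resp_rate, spo2_percent, crp_mg_per_l):
    """Sum of the three sub-scores described in the text (range 0 to 6)."""
    # Respiratory rate: 0 for <20, +1 for 20-29, +2 for >=30 breaths per min
    if resp_rate < 20:
        rr_score = 0
    elif resp_rate < 30:
        rr_score = 1
    else:
        rr_score = 2

    # Peripheral oxygen saturation (assumed cut-off): 0 for >=92%, +2 for <92%
    spo2_score = 0 if spo2_percent >= 92 else 2

    # C-reactive protein: 0 for <50, +1 for 50-99, +2 for >=100 mg/L
    if crp_mg_per_l < 50:
        crp_score = 0
    elif crp_mg_per_l < 100:
        crp_score = 1
    else:
        crp_score = 2

    return rr_score + spo2_score + crp_score


def severity_class(score):
    """Scores 0 to 3 are low severity; scores 4 to 6 are high severity."""
    return "low" if score <= 3 else "high"
```

For example, a hypothetical patient breathing 25 times per minute with 91% saturation and a CRP of 60 mg/L would score 1 + 2 + 1 = 4 and fall in the high severity group.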

Sample preparation and processing followed the guidelines set out by the COVID-19 Mass Spectrometry Coalition, which included safe handling procedures [13]. Saliva samples were separated into aliquots: 50 μL of saliva was added to 200 μL of ice-cold isopropanol (this also had the effect of deactivating the virus to allow transfer into a lower biological safety level laboratory). The samples were agitated for one hour, then sonicated three times for 30 seconds, with 30 seconds of rest on ice between sonications. Each sample was then left to stand on ice for 30 minutes, then centrifuged for 10 minutes at room temperature at 10,000 g before resting on ice. The supernatant was removed and the precipitated protein pellet reserved for future analysis. The supernatant then underwent centrifugal filtration (0.22 μm cellulose acetate) for five minutes at 10,000 g, and the filtered supernatant was then dried under nitrogen and stored at −80°C.

Samples were reconstituted on the day of analysis in 100 μL water:methanol (95:5) with 0.1% formic acid by volume. 10 μL of each sample was set aside for combination in a pooled QC. The samples were analysed over a period of eleven days. Each day consisted of a run incorporating blank injections (n = 2), field blank injections (n = 3), pooled QC injections (n = 6, 3 at the start and finish), as well as QCs to measure instrumental variation and extraction variation (n = 7 and 3 respectively), and 10 participant samples, randomised for positive/negative (n = 3 for each).

2.3. Materials and chemicals

The materials and solvents utilised in this study were as follows: 2 mL microcentrifuge tubes (Eppendorf, UK), 0.22 μm cellulose acetate sterile Spin-X centrifuge tube filters (Corning Incorporated, USA), 200 μL micropipette tips (Starlab, UK) and Qsert™ clear glass insert LC vials (Supelco, UK). LC-MS grade 2-propanol was used as an inactivation solvent. Optima™ LC-MS grade methanol and water were used as reconstitution solvents and mobile phases. Formic acid was added to the mobile phase solvents at 0.1% (v/v). Solvents were purchased from Fisher Scientific, UK.

2.4. Instrumentation and operating conditions

Analysis of samples was carried out using an UltiMate 3000 UHPLC equipped with a binary solvent manager, column compartment and autosampler, coupled to a Q Exactive™ Plus Hybrid Quadrupole-Orbitrap™ mass spectrometer (Thermo Fisher Scientific, UK) at the University of Surrey's Ion Beam Centre. Chromatographic separation was performed on a Waters ACQUITY UPLC BEH C18 column (1.7 μm, 2.1 mm × 100 mm) operated at 55°C with a flow rate of 0.3 mL min⁻¹.

Mobile phase A was water:methanol (95:5, v/v) with 0.1% formic acid (v/v), whilst mobile phase B was methanol:water (95:5, v/v) with 0.1% formic acid (v/v). An injection volume of 5 μL was used. The initial solvent mixture was 2% B for one minute, increasing to 98% B over 16 minutes and held at this level for four minutes. The gradient was finally reduced back to 2% B and held for two minutes to allow for column equilibration. Analysis on the Q Exactive Plus mass spectrometer was performed with a scan range of m/z 100 to 1,000 and a mass resolution of 70,000. MS/MS validation of features was carried out on pooled QC samples using data dependent acquisition mode and normalised collision energies of 30 and 35 (arbitrary units). Operating conditions are summarised in S1 Table.

2.5. Data processing

LC-MS outputs (.raw files) were pre-processed for alignment and peak identification using Compound Discoverer version 3.1 and Freestyle 1.6 (Thermo Fisher Scientific, UK). Peak picking was set to a mass tolerance of ±5 ppm, and alignment to a retention time window of 120 seconds. Missing values were imputed using K-nearest neighbour imputation [24]. Features identified by mass spectrometry were initially annotated using accurate mass match with reference to external databases (explored in parallel; KEGG, Human Metabolome Database, DrugBank, LipidMaps and BioCyc), and then validation was performed using data dependent MS/MS analysis. This process yielded an initial peak area matrix with 10,700 discrete features. Two criteria were used for inclusion in the final analysis. First, only features with identities validated by MS/MS were retained, reducing the number of features to 1,874. Second, the 1,514 features that were present in fewer than 30% of participant samples were excluded. This left 360 features that were used in the analysis. Normalisation was performed using EigenMS in NOREVA for each dataset analysed [25, 26].
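Two of the numerical criteria in this paragraph, the ±5 ppm mass tolerance and the 30% presence threshold, can be illustrated with a minimal Python sketch. It is not the Compound Discoverer implementation; the function names, the dictionary layout and the use of `None` to mark a feature not detected in a sample are all assumptions made for illustration.

```python
def within_ppm(observed_mz, reference_mz, tol_ppm=5.0):
    """True if the observed m/z lies within tol_ppm of the reference m/z."""
    error_ppm = abs(observed_mz - reference_mz) / reference_mz * 1e6
    return error_ppm <= tol_ppm


def presence_fraction(peak_areas):
    """Fraction of samples in which a feature was detected.

    Assumption: None marks a sample in which the feature was not detected.
    """
    detected = sum(1 for area in peak_areas if area is not None)
    return detected / len(peak_areas)


def filter_by_presence(matrix, min_fraction=0.30):
    """Keep features present in at least min_fraction of samples.

    matrix maps a feature name to its list of peak areas, one per sample.
    """
    return {
        name: areas
        for name, areas in matrix.items()
        if presence_fraction(areas) >= min_fraction
    }
```

In the study itself this step took the 1,874 MS/MS-validated features down to 360 (1,874 minus the 1,514 excluded).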

2.6. Statistical analysis

Principal component analyses (PCA) were conducted in SIMCA (Sartorius Stedim Biotech, France), with additional machine learning conducted in RStudio Version 1.3.959 and MetaboAnalyst [27, 28]. Initial biomarker investigation was carried out by PLS-DA using 5 components and Pareto scaling, maximising separation by Mahalanobis distance. Panels of the discriminatory biomarkers were identified by varying the number of features employed but otherwise using the same hyperparameters as for the PLS-DA analyses. Reduced panels were employed to improve robustness, given that when the number of features employed exceeds the number of samples, machine learning can overfit models that lack predictive power [29]. Furthermore, panels emphasising named compounds such as amino acids make future targeted analysis more straightforward using already-existing assays. Leave-one-out cross-validation was used to estimate model accuracy, sensitivity and specificity; variable importance in projection (VIP) scores were used to assess feature significance alongside p-values and effect sizes (fold change). Batch effects were assessed by PCA of both collection batches (waves one and two) and also instrument and extraction batching by day (in S1 and S2 Figs), showing no clustering by batches.
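Pareto scaling, mentioned above, mean-centres each feature and divides by the square root of its standard deviation, which damps the dominance of high-abundance features less aggressively than unit-variance scaling. The sketch below shows the transformation for a single feature column. It does not reproduce the SIMCA or MetaboAnalyst implementation, and the use of the sample (n − 1) standard deviation is an assumption.

```python
import math
import statistics


def pareto_scale(column):
    """Mean-centre a feature and divide by the square root of its SD.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    if sd == 0:
        # A constant feature carries no variance; return zeros after centring
        return [0.0 for _ in column]
    return [(x - mean) / math.sqrt(sd) for x in column]
```

After scaling, a feature's standard deviation equals the square root of its original standard deviation, so larger peaks still carry somewhat more weight than smaller ones.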

In prognostic analysis, given the lack of a “gold standard” reference test for whether COVID-19 is likely to be high severity or low severity (as this depends on clinical judgement), positive percent agreement (PPA) between the generated model and a high severity clinical diagnosis was used in preference to sensitivity, which measures the detection of positive instances of a disease relative to a ground truth value. Similarly, negative percent agreement (NPA) between the model and the clinical severity diagnosis was used in preference to specificity, which measures the detection of negative instances of a disease relative to a ground truth value. In diagnostic analysis, given that RT-PCR tests were available to establish a ground truth, sensitivity and specificity values were calculated alongside diagnostic accuracy.
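PPA and NPA are computed in the same way as sensitivity and specificity, except that the comparator is a clinical classification rather than a ground truth. A minimal Python sketch follows; the function name and the "high"/"low" labels are hypothetical, and the code assumes each class appears at least once in the clinical calls.

```python
def agreement_metrics(model_calls, clinical_calls, positive="high"):
    """Positive, negative and overall percent agreement (as fractions)."""
    both_pos = both_neg = model_only_pos = clinical_only_pos = 0
    for model, clinical in zip(model_calls, clinical_calls):
        if clinical == positive and model == positive:
            both_pos += 1
        elif clinical == positive:
            clinical_only_pos += 1   # clinician says positive, model does not
        elif model == positive:
            model_only_pos += 1      # model says positive, clinician does not
        else:
            both_neg += 1
    total = both_pos + both_neg + model_only_pos + clinical_only_pos
    return {
        # Share of clinically positive cases that the model also calls positive
        "ppa": both_pos / (both_pos + clinical_only_pos),
        # Share of clinically negative cases that the model also calls negative
        "npa": both_neg / (both_neg + model_only_pos),
        # Share of all cases on which model and clinician agree
        "opa": (both_pos + both_neg) / total,
    }
```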

3. Results

3.1. Population metadata overview

The study population analysed in this work included 75 participants, comprising 47 participants presenting with a positive COVID-19 RT-PCR test and 28 participants presenting without. Of the positive participants, 10 were classed as presenting with high severity COVID-19, 34 were classed as presenting with low severity COVID-19, and 3 lacked sufficient clinical information for severity scoring. A summary of the metadata is shown in Table 1.

Table 1. Summary of clinical characteristics by participant cohort.

| Parameters | COVID-19 Low Severity | COVID-19 High Severity | p-value (High vs Low Severity) | COVID-19 Negative | p-value (Pos vs Neg) |
|---|---|---|---|---|---|
| N | 34 | 10 | | 28 | |
| Age (mean ± standard deviation; years) | 60 ± 18 | 63 ± 13 | 0.61 | 62 ± 22 | 0.74 |
| Male / Female (n) | 16 / 18 | 8 / 2 | 0.083 | 16 / 12 | 0.26 |
| Treated for Hypertension (n) | 6 | 6 | 0.041 | 12 | 0.21 |
| Treated for High Cholesterol (n) | 2 | 0 | 1.00 | 6 | 0.05 |
| Treated for Type 2 Diabetes Mellitus (n) | 5 | 3 | 0.39 | 10 | 0.29 |
| Treated for Ischemic Heart Disease (n) | 1 | 2 | 0.149 | 7 | 0.09 |
| Current Smoker (n) | 1 | 0 | 1.00 | 0 | NA |
| Ex-Smoker (n) | 12 | 5 | 0.71 | 8 | 0.46 |
| Medical Acute Dependency admission (n) | 10 | 6 | 0.26 | 4 | 0.06 |
| Intensive Care Unit admission (n) | 0 | 0 | NA | 0 | NA |
| Survived Admission (n) | 34 | 8 | 0.048 | 27 | 1.00 |
| Lymphocytes (mean ± standard deviation; cells / μL) | 0.8 ± 0.5 | 0.9 ± 0.7 | 0.77 | 1.0 ± 0.5 | 0.302 |
| C-Reactive Protein (mean ± standard deviation; mg / L) | 115 ± 85 | 170 ± 83 | 0.075 | 127 ± 105 | 0.80 |
| Eosinophils (mean ± standard deviation; 100 / μL) | 0.1 ± 0.1 | 0.0 ± 0.0 | 0.018 | 0.3 ± 0.4 | 0.002 |
| Bilateral Chest X-Ray changes (n) | 15 | 8 | 0.26 | 3 | 0.0009 |
| Continuous Positive Airway Pressure (n) | 1 | 1 | 0.442 | 3 | 0.36 |
| O2 required (n) | 9 | 4 | 0.69 | 8 | 1.00 |

In this study all participants were recruited in a hospital setting with at least potential suspicion of COVID-19 infection; controls were age matched and had similar profiles in terms of gender, oxygen requirements and survival rates. The COVID-19 positive cohort did, however, present with statistically significantly more frequent bilateral chest X-ray changes (p-value 0.0009) and lower levels of eosinophils (p-value 0.002; see Table 1), in agreement with literature observations [23], whereas C-reactive protein did not differ significantly (CRP, p-value 0.80). Type 2 diabetes mellitus (T2DM) was more prevalent in the COVID-19 negative population than the positive population, being observed in 36% of COVID-19 negatives versus 30% of high severity COVID-19 patients and 15% of low severity COVID-19 patients; a similar trend of greater comorbidity in the negative population was seen for ischemic heart disease (IHD) and hypertension (HTN). The greater preponderance of underlying comorbidities within the negative population represents a confounding factor.

Within the COVID-19 positive cohort, the severity groups were again age matched, but the high severity group had more males (80% male for high severity versus 47% for low severity), a statistically significantly higher proportion presenting with hypertension (p-value 0.04) and statistically significantly lower eosinophil levels (p-value 0.02). Interestingly, CRP was 1.5-fold higher in high severity participants than in low severity participants, but CRP for low severity participants was lower than for COVID-19 negative participants. This can be explained by the fact that elevated CRP is associated with a large number of comorbidities, and patients were only recruited if they had clinical suspicion of COVID-19.

3.2. Overview of features identified by liquid chromatography mass spectrometry (LC-MS)

360 features with MS/MS validation were identified as being present in 30% or more of participant samples. Of these 360 features, 36 were identified as related to medical interventions or food and were excluded, leaving 324 for statistical analysis. Of the 324, 38 were annotated by m/z value only, 171 were putatively annotated by formula (elemental composition), and 114 were putatively annotated as metabolites, with annotations considered level two as set out by the metabolomics standards initiative (MSI) [30].

3.3. Analysis of cohorts by multivariate techniques

Initially, separation of COVID-19 positive versus negative participants was tested, as well as separation of COVID-19 high severity and low severity participants. As shown in Fig 2A, separation for diagnostic purposes was poor by visual inspection and delivered R2Y of 0.78 and Q2Y of 0.18. Leave-one-out cross-validation (LOOCV) provided sensitivity of 0.74 (95% confidence interval of 0.60–0.86) and specificity of 0.75 (0.55–0.89). The most significantly dysregulated identified metabolites (measured by p-value) between positive and negative COVID-19 status are listed in S2 Table.

Fig 2. Saliva metabolomics analysis for COVID-19 diagnosis and prognosis via LC-MS in positive mode, showing: A PLS-DA plot for 75 participants and 324 features, COVID-19 positive / negative. B PLS-DA plot for 44 participants and 324 features, high severity / low severity. C LOOCV confusion matrix, COVID-19 positive / negative. D LOOCV confusion matrix, high severity / low severity.

Fig 2B shows separation for COVID-19 high severity participants versus low severity participants. The optimal separation was found using 5 components. Using leave-one-out cross validation, PPA for COVID-19 high severity was 1.00 (95% confidence interval of 0.69–1.00) and NPA was 1.00 (0.90–1.00), for overall percent agreement with the clinical diagnosis of 1.00 (0.92–1.00).
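The text does not name the method used for these confidence intervals, but the reported bounds are consistent with an exact (Clopper-Pearson) binomial interval, which for perfect agreement in n cases has a lower bound of (α/2)^(1/n). The sketch below computes the interval by bisection on the binomial distribution; the function names are hypothetical and the choice of Clopper-Pearson is an assumption.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(func, target, iterations=60):
    """Bisection on [0, 1] for func(p) = target, with func increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    if successes == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= successes | p) = alpha / 2
        lower = _solve_increasing(
            lambda p: 1 - binom_cdf(successes - 1, n, p), alpha / 2
        )
    if successes == n:
        upper = 1.0
    else:
        # Upper bound: P(X <= successes | p) = alpha / 2
        upper = _solve_increasing(
            lambda p: 1 - binom_cdf(successes, n, p), 1 - alpha / 2
        )
    return lower, upper
```

Under this assumption, `clopper_pearson(10, 10)`, `clopper_pearson(34, 34)` and `clopper_pearson(44, 44)` give lower bounds of about 0.69, 0.90 and 0.92, matching the intervals quoted for PPA, NPA and overall agreement. The width of the PPA interval reflects the fact that only 10 high severity participants were available.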

A volcano plot is shown in Fig 3. Amino acids are highlighted because this class of metabolites was identified as differentiated between high and low severity (see also S3 Table).

Fig 3. Volcano plot of statistical significance versus effect size for MS/MS validated features identified in the patient samples.


In order to improve robustness and reduce overfitting, sparse PLS-DA models were also constructed for the purposes of establishing a smaller panel of metabolites or features capable of discriminating between high and low severity COVID-19 participants. A putative panel comprising valine, leucine, phenylalanine, tyrosine, proline and a feature identified putatively only by formula as C44H74N8O16 was capable of discriminating between the two populations with 100% accuracy and AUC of 1.00. This panel of predictive metabolites is additionally shown as boxplots in Fig 4 below, and a complete list of metabolites showing statistically significant differences between high and low COVID-19 severity populations is shown in S3 Table.

Fig 4. Boxplots of features selected for ability to differentiate high and low COVID-19 severity (corresponding p-values for high and low severity, left to right: < 0.001, 0.041, 0.051, 0.65, 0.16, and 0.02).


3.4. Validation set

Since no fully independent prognostic validation set was available, we projected the reduced-feature PLS-DA model obtained for high severity versus low severity participants on to COVID-19 negative participants. Given that these participants should not show features associated with high severity COVID-19, this was considered to offer additional information. The confusion matrix for the results of the projection is shown in Table 2 below.

Table 2. Confusion matrix for reduced-feature PLS-DA model projected on to COVID-19 negative participants.

| Reduced-feature PLS-DA model result | COVID-19 negative participants (n) |
|---|---|
| High Severity | 1 |
| Low Severity | 27 |

4. Discussion

Whilst age and recruitment venue were well matched (all participants were recruited in a hospital setting including controls), several variables within the metadata illustrate the natural difficulties in experimental design experienced during a pandemic. Age ranges of participants were large, a wide range of comorbidities were present, and the time between symptom onset and saliva sampling ranged from 1 day to > 1 month. This variation in time between symptom onset and saliva sampling was addressed through exclusion of participants whose RT-PCR result was greater than 14 days from study sampling. However, participant recruitment of the most severely affected was limited by ethics approval only covering patients who could give informed consent, thereby precluding the participation of patients with the highest COVID severity. Furthermore, given the small n in this pilot study, precision was necessarily low and confidence intervals wide.

In this study, saliva samples were provided under conditions that could be practically achieved in a hospital pandemic setting. This meant no scope for abstinence from food and / or drink before saliva sampling, and no prior rinsing of the mouth, leading to potential confounding factors. Diagnostic sensitivity of 0.74 (95% confidence interval of 0.60–0.86) and specificity of 0.75 (0.55–0.89) were considered insufficient to justify further investigation, given that proteomic and serum / plasma based metabolomic diagnostic tests have shown markedly better performance in diagnosing COVID-19 by both meta-analysis and in a recent matched-sample study [8, 31]. Fig 4 illustrates that a more marked separation exists between low severity and high severity than between hospital-recruited controls and low severity. We hypothesise that mild COVID-19 causes more limited alteration of the salivary metabolome versus controls, especially given that the controls in this work were recruited in a hospital setting with similar symptoms to COVID-19. The data presented here suggest that salivary dysregulation specific to COVID-19 (and not indicative of general poor health) only reaches clearly identifiable levels at greater levels of COVID-19 severity, at least using the uncontrolled but pragmatic sampling approach described in this work.

Superior differentiation by multivariate analysis was, however, achieved in relation to COVID-19 severity. The reduced-feature PLS-DA model showed separation of high severity COVID-19 positive participants from low severity COVID-19 positive participants, with PPA and NPA of 1.00 by LOOCV. Furthermore, whilst not a true independent validation set, projecting the reduced-feature PLS-DA model on to COVID-19 negative participants showed that the model classified 96% of them (27 of 28) as low severity, i.e. the marker levels characteristic of high severity COVID-19 were largely absent in COVID-19 negative participants.

A number of identified metabolites showed statistically significant differences between the high and low severity participants. As shown in Fig 4, amino acids constituted the class of metabolites seeing the most changes between high and low severity, similar to literature observations of changes in either amino acids or ratios of amino acids in serum or plasma. Encouragingly for clinical application, in this work a reduced feature panel of just six features was still capable of discriminating between high and low severity COVID-19 participants, with five of the six features being amino acids (all downregulated in the saliva of high-severity participants). One previous study found in contrast that salivary myo-inositol and 2-pyrrolidineacetic acid were capable of distinguishing an inpatient cohort from an outpatient cohort [19], rather than amino acids, but all recruitment in this work took place in a hospital setting, i.e. the results shown herein represent separation based on severity within the inpatient cohort.

A number of limitations in this study should be acknowledged. In this work, we were unable to standardise the saliva collection by asking patients to rinse the mouth or abstain from eating, due to health and safety considerations. Additionally, we were unable to access patients immediately after admission to hospital, meaning that sample collection took place between 1 day and 1 month after admission. Further, high resolution mass spectrometry was only performed in positive mode, due to competing demands for participant samples. Analysis in both positive mode and negative mode could have identified additional significant features. The analysis was also untargeted, and so lacked the use of internal standards that would be available in a targeted assay. As an untargeted analysis, many features were putatively identified, resulting in a noisy dataset for machine learning. In addition, the supervised multivariate analysis used in this work can lead to overfitting and false discovery, especially given the relatively small numbers of participants recruited in this work. Validation of these results in a larger and more balanced study cohort is required, using a standardised approach for assessing COVID-19 severity, when such an approach is universally accepted. Furthermore, future studies may need to take account of biomarker changes with new variants, as these have been found to be dependent upon collection wave [32].

It should be noted, however, that whilst this work was untargeted for discovery purposes, the features selected for the PLS-DA panel (valine, leucine, phenylalanine, tyrosine, proline and C44H74N8O16) were on average present in 85% of participant samples and targeted methods are available that can reliably quantify amino acids for future investigation [33]. Therefore, whilst this is a preliminary and exploratory study, we see these results as encouraging first evidence for distinctive changes in the salivary metabolome of hospitalised individuals with severe COVID-19. We believe that saliva has potential to add to understanding of the progression and severity of COVID-19, providing evidence that the salivary metabolome is disrupted, and more generally illustrating the potential for saliva as a biofluid for investigating dysregulated metabolism related to infectious diseases.

Supporting information

S1 Fig. Clustering of patient samples according to extraction batch.

Principal component analysis of each patient sample (circles) and batch QCs (squares), coloured according to extraction batch, showing no significant clustering of patient samples according to extraction batch.

(DOCX)

S2 Fig. Principal component analysis of each patient sample and run QC. Figure shows low levels of QC variation according to position in the run sequence.

(DOCX)

S3 Fig. Principal component analysis for 75 participants and 324 features, COVID-19 positive / negative, LC-MS analysis in positive mode.

(DOCX)

S4 Fig. Principal component analysis for 44 participants and 324 features, high severity / low severity, LC-MS analysis in positive mode.

(DOCX)

S1 Table. Operating conditions of the mass spectrometer used in this research.

(DOCX)

S2 Table. Distinctive features between COVID-19 positive and negative.

(DOCX)

S3 Table. Distinctive features between COVID-19 high severity and low severity.

(DOCX)

Acknowledgments

The authors acknowledge Samiksha Ghimire from Groningen Medical School for translation of participant information sheets and consent forms into Nepalese, as well as Kyle Saunders of the University of Surrey for access to batch controls. The authors are additionally grateful to Thanuja Weerasinge (Jay), Manjula Meda, Chris Orchard and Joanne Zamani of Frimley Park NHS Foundation Trust for their help with ethics approvals and access to hospital patients.

Abbreviations

COVID-19: Coronavirus disease 19

CRP: C-reactive protein

HTN: Hypertension

IHD: Ischemic heart disease

KEGG: Kyoto Encyclopedia of Genes and Genomes

LC: Liquid chromatography

LC-MS: Liquid chromatography mass spectrometry

LOOCV: Leave-one-out cross validation

MS: Mass spectrometry

MS/MS or MS2: Tandem mass spectrometry

NPA: Negative percent agreement

PCA: Principal components analysis

PCR: Polymerase chain reaction

PLS-DA: Partial least squares-discriminant analysis

PPA: Positive percent agreement

QC: Quality control

RT-PCR: Reverse transcription polymerase chain reaction

SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2

T2DM: Type 2 diabetes mellitus

VIP: Variable importance in projection

Data Availability

The aligned and annotated LC-MS data matrices used in this work are available at (https://doi.org/10.5281/zenodo.6924738). The analytical protocols used are openly available for all researchers to access. The website URL for the protocols is (https://covid19-msc.org/).

Funding Statement

The authors would like to acknowledge funding from the Engineering and Physical Sciences Research Council (EPSRC) Impact Acceleration Account for sample collection and processing, as well as EPSRC Fellowship Funding EP/R031118/1, the University of Surrey and Biotechnology and Biological Sciences Research Council (BBSRC) BB/T002212/1. Mass Spectrometry was funded under EPSRC grant EP/P001440/1. https://epsrc.ukri.org/ https://www.ukri.org/councils/bbsrc/ The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. WHO. Novel Coronavirus–China. 2020. [cited 27 Jul 2020]. Available: https://www.who.int/csr/don/12-january-2020-novel-coronavirus-china/en/
  • 2. Knoll MD, Wonodi C. Oxford–AstraZeneca COVID-19 vaccine efficacy. The Lancet. 2021. pp. 72–74. doi: 10.1016/S0140-6736(20)32623-4
  • 3. The RECOVERY Collaborative Group. Dexamethasone in Hospitalized Patients with Covid-19—Preliminary Report. N Engl J Med. 2020. doi: 10.1056/nejmoa2021436
  • 4. Cacciapaglia G. Second wave COVID-19 pandemics in Europe: a temporal playbook. Sci Rep. 2020; 1–8. doi: 10.1038/s41598-020-72611-5
  • 5. Lai JW, Cheong KH. Superposition of COVID-19 waves, anticipating a sustained wave, and lessons for the future. 2020; 1–12. doi: 10.1002/bies.202000178
  • 6. Piroth L, Cottenet J, Mariet A-S, Bonniaud P, Blot M, Tubert-Bitter P, et al. Comparison of the characteristics, morbidity, and mortality of COVID-19 and seasonal influenza: a nationwide, population-based retrospective cohort study. Lancet Respir Med. 2020;2600: 1–9. doi: 10.1016/s2213-2600(20)30527-0
  • 7. WHO. SARS-CoV-2 Variant–United Kingdom of Great Britain and Northern Ireland. 2020. [cited 25 Jan 2021]. Available: https://www.who.int/csr/don/21-december-2020-sars-cov2-variant-united-kingdom/en/
  • 8. Spick M, Lewis HM, Wilde MJ, Hopley C, Huggett J, Bailey MJ. Systematic review with meta-analysis of diagnostic test accuracy for COVID-19 by mass spectrometry. Metabolism. 2021; 154922. doi: 10.1016/j.metabol.2021.154922
  • 9. Darmon M, Dumas G. Anticipating outcomes for patients with COVID-19 and identifying prognosis patterns. Lancet Infect Dis. 2021;21: 744–745. doi: 10.1016/S1473-3099(21)00073-6
  • 10. Gallo Marin B, Aghagoli G, Lavine K, Yang L, Siff EJ, Chiang SS, et al. Predictors of COVID-19 severity: A literature review. Rev Med Virol. 2021;31: 1–10. doi: 10.1002/rmv.2146
  • 11. Wu D, Shu T, Yang X, Song J-X, Zhang M, Yao C, et al. Plasma metabolomic and lipidomic alterations associated with COVID-19. Natl Sci Rev. 2020;7: 1157–1168. doi: 10.1093/nsr/nwaa086
  • 12. Biagini D, Franzini M, Oliveri P, Lomonaco T, Ghimenti S, Bonini A, et al. MS-based targeted profiling of oxylipins in COVID-19: A new insight into inflammation regulation. Free Radic Biol Med. 2022;180: 236–243. doi: 10.1016/j.freeradbiomed.2022.01.021
  • 13. COVID-19 Mass Spectrometry Coalition. COVID-19 metabolomics and lipidomics protocol. 2020. [cited 25 May 2021]. Available: https://covid19-msc.org/metabolomics-and-lipidomics-protocol/
  • 14. Bellagambi FG, Lomonaco T, Salvo P, Vivaldi F, Hangouët M, Ghimenti S, et al. Saliva sampling: Methods and devices. An overview. TrAC—Trends Anal Chem. 2020;124. doi: 10.1016/j.trac.2019.115781
  • 15. To KK, Tsang OT, Yip CC, Chan K, Wu T, Chan JM, et al. Consistent Detection of 2019 Novel Coronavirus in Saliva. 2020;71: 841–843. doi: 10.1093/cid/ciaa149
  • 16. Mandel ID. The Functions of Saliva. J Dent Res. 1987;66: 623–627. doi: 10.1177/00220345870660S203
  • 17. Sugimoto M, Wong DT, Hirayama A, Soga T, Tomita M. Capillary electrophoresis mass spectrometry-based saliva metabolomics identified oral, breast and pancreatic cancer-specific profiles. Metabolomics. 2010;6: 78–95. doi: 10.1007/s11306-009-0178-y
  • 18. Assad DX, Acevedo AC, Cançado E, Mascarenhas P, Gabriela A, Normando C, et al. Using an Untargeted Metabolomics Approach to Identify Salivary Metabolites in Women with Breast Cancer. 2020; 1–13.
  • 19. Pozzi C, Levi R, Braga D, Carli F, Darwich A, Spadoni I, et al. A ‘Multiomic’ Approach of Saliva Metabolomics, Microbiota, and Serum Biomarkers to Assess the Need of Hospitalization in Coronavirus Disease 2019. Gastro Hep Adv. 2022;1: 194–209. doi: 10.1016/j.gastha.2021.12.006
  • 20. Struwe W, Emmott E, Bailey M, Sharon M, Sinz A, Corrales FJ, et al. The COVID-19 MS Coalition—accelerating diagnostics, prognostics, and treatment. Lancet. 2020;395: 1761–1762. doi: 10.1016/S0140-6736(20)31211-3
  • 21. Duarte D, Castro B, Pereira JL, Marques JF, Lu A, Gil AM. Evaluation of Saliva Stability for NMR Metabolomics: Collection and Handling Protocols.: 1–15.
  • 22. Spick M, Longman K, Frampas C, Costa C, Walters DD, Stewart A, et al. Changes to the sebum lipidome upon COVID-19 infection observed via non-invasive and rapid sampling from the skin. 2020; 1–22.
  • 23. Knight SR, Ho A, Pius R, Buchan I, Carson G, Drake TM, et al. Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score. BMJ. 2020;370: m3339. doi: 10.1136/bmj.m3339
  • 24. Beretta L, Santaniello A. Nearest neighbor imputation algorithms: A critical evaluation. BMC Med Inform Decis Mak. 2016;16. doi: 10.1186/s12911-016-0318-z
  • 25. Karpievitch Y V, Taverner T, Adkins JN, Callister SJ, Anderson GA, Smith RD, et al. Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition. Bioinformatics. 2009;25: 2573–2580. doi: 10.1093/bioinformatics/btp426
  • 26. Li B, Tang J, Yang Q, Li S, Cui X, Li Y, et al. NOREVA: Normalization and evaluation of MS-based metabolomics data. Nucleic Acids Res. 2017;45: W162–W170. doi: 10.1093/nar/gkx449
  • 27. Chong J, Wishart DS, Xia J. Using MetaboAnalyst 4.0 for Comprehensive and Integrative Metabolomics Data Analysis. Curr Protoc Bioinforma. 2019;68: e86. doi: 10.1002/cpbi.86
  • 28. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2020. Available: https://www.r-project.org/
  • 29. Lê Cao KA, Boitard S, Besse P. Sparse PLS discriminant analysis: Biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics. 2011;12. doi: 10.1186/1471-2105-12-253
  • 30. Sumner LW, Amberg A, Barrett D, Beale MH, Beger R, Daykin CA, et al. Proposed minimum reporting standards for chemical analysis: Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics. 2007;3: 211–221. doi: 10.1007/s11306-007-0082-2
  • 31. Spick M, Lewis H-M, Frampas CF, Longman K, Costa C, Stewart A, et al. An integrated analysis and comparison of serum, saliva and sebum for COVID-19 metabolomics. Sci Rep. 2022;12: 11867. doi: 10.1038/s41598-022-16123-4
  • 32. Lewis H-M, Liu Y, Frampas CF, Longman K, Spick M, Stewart A, et al. Metabolomics Markers of COVID-19 Are Dependent on Collection Wave. Metabolites. 2022;12. doi: 10.3390/metabo12080713
  • 33. Fernández-del-Campo-García MT, Casas-Ferreira AM, Rodríguez-Gonzalo E, Moreno-Cordero B, Pérez-Pavón JL. Development of a screening and confirmatory method for the analysis of polar endogenous compounds in saliva based on a liquid chromatographic-tandem mass spectrometric system. J Chromatogr A. 2019;1590: 88–95. doi: 10.1016/j.chroma.2019.01.001

Decision Letter 0

Tommaso Lomonaco

12 Jul 2022

PONE-D-22-09481

Untargeted saliva metabolomics by liquid chromatography - mass spectrometry reveals biomarkers of COVID-19 severity

PLOS ONE

Dear Dr. Matt P. Spick,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 26 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Tommaso Lomonaco, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf  and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please amend either the abstract on the online submission form (via Edit Submission) or the abstract in the manuscript so that they are identical.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

4. Please expand the acronym “EPSRC, BBSRC” (as indicated in your financial disclosure) so that it states the name of your funders in full.

This information should be included in your cover letter; we will change the online submission form on your behalf.

5. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this paper, the untargeted profiling of the salivary metabolome in COVID-19 patients is interestingly presented. This topic seems to be really appealing in this field and the work takes place as part of the “COVID-19 International Mass Spectrometry (MS) Coalition” attempt.

I would like to suggest some comments to improve the manuscript and clarify some points:

Main general comments:

Even considering the natural difficulties experienced in patient recruitment and sample collection during a pandemic, the pilot study described in this work has not been designed in a systematic manner. This is evident from the lack of detailed instructions for saliva sample collection and a variable timeframe occurring between symptom onset and saliva sampling (from 1 day to > 1 month). These conditions represent substantial confounding factors when assessing the predictive role of salivary metabolites in COVID-19. A scattered sample collection generally produces random results and wrong data interpretation. Considering the unsystematic approach and the small population size (10 high severity, 34 low-severity, 28 COVID-19 negative), it is not appropriate to describe your approach as a prognostic tool to assess COVID-19 severity. The proposed pilot study should be described as a preliminary and explorative study to hypothesize the role of salivary metabolome as a predictor of COVID-19 severity, more than a study to identify biomarkers. As mentioned in the discussion section, this pilot study should be confirmed through a wide-scale prospective observational study, mainly focused on a subset of selected chemicals analysed in targeted mode. Considering all these aspects, please restate properly the title and the objective of the work/conclusions in the abstract section. Be in line with the considerations included in the discussion section.

Specific comments:

Abstract:

Line 35: Please add the list of clinical descriptors between brackets.

Lines 36-37: Please consider changing “mass spectrometry” with “high-resolution mass spectrometry”

Please delete the colon between peak and area.

Line 41: Specify the sixth feature

Lines 44-45: Please add the ROC curve to the Supplementary information section.

Lines 48-49: Unfortunately, the criteria to define low/high severity in COVID-19 are not standardized. Guidelines are not unique worldwide and, as you said, the threshold between low and high depends on clinical judgement. In my opinion, it is recommended to wait for a wider population size and more standardised criteria to define COVID-19 severity before proposing predictive models for prognostic purposes.

Introduction:

Line 77: “In contrast, tests based on mass spectrometry can be provided in minutes..” Are you talking about on-line instrumentation? Unfortunately, when dealing with off-line approaches and sample treatment, the time is actually longer. Please, restate or insert a reference.

Line 78: “Furthermore, mass spectrometry instrumentation is often available in hospital pathology laboratories.” Unfortunately, this is not true. MS is not among the facilities commonly found in hospitals/clinics. Please restate or insert a reference.

Line 81: “especially should vaccine escape lead to..”. Unclear. Please, restate.

Line 87: “and must be spun soon after collection to preserve the metabolome”. Please add a reference.

Line 88: “a saliva sample can be donated quickly and painlessly by a patient” Add the following reference on saliva collection (doi.org/10.1016/j.trac.2019.115781).

Line 89: “Saliva is itself a carrier of the coronavirus” This is not a good thing from a biosafety point of view. How did you ensure a safe collection/manipulation of the saliva specimen? Did you follow any standardized protocol? Please, insert some details on this important aspect.

Lines 89-90: “information via its own characteristic metabolites. [13]..” What do you mean? Please, clarify.

Materials and methods:

Lines 123-124: “saliva sampling ranged from 1 day to > 1 month..” This aspect could represent a critical issue especially for the main objective of the study, i.e. identifying prognostic markers. The levels and, thus, the predictivity are influenced by the timeframe occurring between symptoms/collection. I kindly suggest organising the next study by collecting samples 2-3 days maximum after the occurrence of symptoms for all the investigated subjects.

Line 125: “Each participant provided a sample of saliva by spitting directly into a falcon tube which was placed on ice immediately after collection.” What about additional details on saliva collection protocol? Same period of the day? Morning or afternoon? No food, smoke, beverages before sampling? Mouth rinsing? The straightforward application of a defined protocol in the clinical routine is not easy, but an unsystematic sample collection (especially for a biofluid as saliva) leads to random data acquisition.

Lines 141-142: Supplementary oxygen, spontaneous and/ or assisted ventilation? Be specific.

Line 153: At which temperature were the samples centrifuged?

Line 158: Considering the reconstitution with 100 μL water:methanol (95:5), you are diluting your sample 1:2 v/v. Are you losing this way all the information related to trace/ultra-trace components? Would a clean-up and pre-concentration step be better?

Line 186: Considering the sophisticated instrumentation available in your lab, why not choose a 2 ppm accuracy?

Line 189: Did you work both in positive and negative mode? If not, why not consider sample characterization in negative mode as well? Why choose 70 000 as the resolution?

Lines 196-197: Did you employ all the cited databases sequentially? Specify. What about METLIN?

Line 208: Since we are referring to an explorative pilot study characterized by a small sample size for each group (≤30), I’d rather use an unsupervised technique such as PCA instead of PLS-DA. The use of PLS would be more indicated when dealing with a restricted panel of markers (selected by this preliminary work, confirmed through pure standards and analysed in targeted mode) and a wider population to build up robust predictive models.

Just out of curiosity, please furnish the PCA plot for both positive/negative and low/high severity conditions.

Results:

Table 1: Please insert the correct digits for the age and C-reactive protein data.

Line 243: “controls were age matched and had similar profiles in terms of gender, oxygen requirements and survival rates”. Do controls present oxygen requirements similar to both low/high severe patients?

Lines 258-260: Do you have an explanation for this unexpected behaviour?

Lines 272-273: Did you characterise your samples both in positive and negative mode? Figures 2A and 2B refer to positive acquisition mode, don’t they? How many features? Specify.

Line 280: “The optimal separation was found using 5 components”. By using just the first two PLS components, we can explain 22.5% of the variation in the response variable. This value is quite low, suggesting the presence of a lot of noise in your raw data instead of valuable information. Please add a comment.

Line 294: Please change “COVID” with “COVID-19”.

Line 296: Insert a p-value in the text just to understand what you mean by “most disturbed”.

Lines 298-299: “MS/MS validated features separating..” Specify how many features are you referring to.

Line 311-313: “It was decided to project the PLS-DA model obtained for high severity versus low severity participants on to COVID-19 negative participants”. I disagree. The population of the negative participants may not be representative of people infected by SARS-CoV-2 and thus may not be a proper test set. Please justify your decision.

Discussion:

Line 325: Please see comments lines 123-124.

Lines 336-339: How can you explain the presence of a more marked difference between low/high severity than positive/negative subjects (where high severity is not so high as you mentioned, because most of the very severe patients were excluded from the study)?

Please add some references about recent experimental works which already suggested differences between low and high severity in COVID-19 based on biomarkers (as the most famous markers of inflammation) (doi.org/10.1002/rmv.2146; doi.org/10.1093/nsr/nwaa086; doi.org/10.1016/j.freeradbiomed.2022.01.021) to strengthen your results.

Lines 340-342. I disagree. The alteration of plasmatic metabolome in infected people is generally dramatically marked when compared to healthy subjects (probably this is reflected even in saliva). Please delete this sentence or confirm it by adding some references.

Line 349: “the features associated with high severity were present neither in low severity nor in COVID-19 negative participants.” Were they absent or still present but characterised by lower levels?

Line 350: How many? Only in positive mode? Please specify.

Line 357: Do you have a biochemical explanation for this down-regulation?

Lines 364-365: I fully agree. The best choice would be to postpone the use of predictive models to future studies characterized by a wider cohort of subjects and a selected panel of chemicals fully identified and quantified in targeted mode.

Reviewer #2: Saliva samples were collected from COVID-19 suspected patients and confirmed by RT-PCR. Then the samples were analyzed for metabolites by Liquid Chromatography Mass Spectroscopy (LC-MS), and the metabolites were correlated to the COVID-19 infection status of the patients and to patient data (sex, age, comorbidities and whether any were treated, the results and dates of COVID PCR tests, bilateral chest X-Ray changes, smoking status, drug regimen, and whether and when the participant presented with clinical symptoms of COVID-19, etc.). Authors concluded that COVID-19 severity was related to the alteration of the salivary metabolome.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Veli Cengiz Ozalp

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Sep 22;17(9):e0274967. doi: 10.1371/journal.pone.0274967.r002

Author response to Decision Letter 0


31 Aug 2022

Please note that these responses have also been uploaded as a file.

Response to reviewers

We are grateful to the reviewers for taking their time to review our manuscript. Please find herein our point-by-point response to each of the reviewers.

Editorial comments

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

We have resubmitted in accordance with file naming requirements.

2. Please amend either the abstract on the online submission form (via Edit Submission) or the abstract in the manuscript so that they are identical.

We have resubmitted identical Abstracts.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

We have amended the manuscript to include a direct link to the Zenodo data repository where the complete dataset is saved for open access by all.

https://doi.org/10.5281/zenodo.6924738

4. Please expand the acronym “EPSRC, BBSRC” (as indicated in your financial disclosure) so that it states the name of your funders in full. This information should be included in your cover letter; we will change the online submission form on your behalf.

This has been expanded in the manuscript (financial disclosure).

5. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section.

We have deleted the section “Ethics Declaration” so that the only ethics statement is now in the Methods section.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

This has been updated.

Reviewer 1 comments

In this paper, the untargeted profiling of the salivary metabolome in COVID-19 patients is interestingly presented. This topic seems to be really appealing in this field and the work takes place as part of the “COVID-19 International Mass Spectrometry (MS) Coalition” attempt.

I would like to suggest some comments to improve the manuscript and clarify some points:

Main general comments:

Even considering the natural difficulties experienced in patient recruitment and sample collection during a pandemic, the pilot study described in this work has not been designed in a systematic manner. This is evident from the lack of detailed instructions for saliva sample collection and a variable timeframe occurring between symptom onset and saliva sampling (from 1 day to > 1 month). These conditions represent substantial confounding factors when assessing the predictive role of salivary metabolites in COVID-19. A scattered sample collection generally produces random results and wrong data interpretation. Considering the unsystematic approach and the small population size (10 high severity, 34 low-severity, 28 COVID-19 negative), it is not appropriate to describe your approach as a prognostic tool to assess COVID-19 severity. The proposed pilot study should be described as a preliminary and explorative study to hypothesize the role of salivary metabolome as a predictor of COVID-19 severity, more than a study to identify biomarkers. As mentioned in the discussion section, this pilot study should be confirmed through a wide-scale prospective observational study, mainly focused on a subset of selected chemicals analysed in targeted mode. Considering all these aspects, please restate properly the title and the objective of the work/conclusions in the abstract section. Be in line with the considerations included in the discussion section.

We agree with the reviewer’s identifications of the challenges we faced. We have amended the text to state that this is a preliminary and explorative study, as suggested, and rather than discussing saliva metabolomics in terms of its ability to act as a prognostic tool, have rephrased the text to refer to saliva metabolomics in terms of its ability to differentiate between severe and non-severe cases of COVID-19. These points are also reflected in the changes made in response to the detailed comments below.

Specific comments:

Abstract:

Line 35: Please add the list of clinical descriptors between brackets.

This has been amended

Lines 36-37: Please consider changing “mass spectrometry” with “high-resolution mass spectrometry”

We have made this change.

Please delete the colon between peak and area.

We have made this change.

Line 41: Specify the sixth feature

This has been amended

Lines 44-45: Please add the ROC curve to the Supplementary information section.

Given that the area under the receiver operating characteristic curve is 1.00, we think that adding a ROC curve would not add value for the reader.
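To illustrate the reasoning in this response: when every high-severity score lies above every low-severity score, the positive and negative percent agreement and the area under the curve all equal 1.00, and the ROC curve simply follows the left and top edges of the plot. The sketch below uses invented scores and a hypothetical 0.5 decision threshold; none of these values come from the study data, and the function names are illustrative only.

```python
def percent_agreement(clinical, predicted):
    """Return (PPA, NPA) of predicted labels against clinical labels (1 = high severity)."""
    pairs = list(zip(clinical, predicted))
    tp = sum(1 for c, p in pairs if c == 1 and p == 1)
    fn = sum(1 for c, p in pairs if c == 1 and p == 0)
    tn = sum(1 for c, p in pairs if c == 0 and p == 0)
    fp = sum(1 for c, p in pairs if c == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(clinical, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for c, s in zip(clinical, scores) if c == 1]
    neg = [s for c, s in zip(clinical, scores) if c == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: every high-severity score exceeds every low-severity score.
clinical = [1, 1, 1, 0, 0, 0, 0]
scores = [0.91, 0.84, 0.77, 0.40, 0.32, 0.25, 0.10]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

ppa, npa = percent_agreement(clinical, predicted)
roc_auc = auc(clinical, scores)
print(ppa, npa, roc_auc)
```

With perfectly separated scores all three quantities are 1.0; any overlap between the two groups would lower the AUC and give the curve a visible shape.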

Lines 48-49: Unfortunately, the criteria to define low/high severity in COVID-19 are not standardized. Guidelines are not unique worldwide and, as you said, the threshold between low and high depends on clinical judgement. In my opinion, it is recommended to wait for a wider population size and more standardised criteria to define COVID-19 severity before proposing predictive models for prognostic purposes.

We agree with the reviewer that there is no accepted method for classifying COVID-19 severity, and that this is a current research topic. However, there are several published precedents for using blood to provide prognostic information for COVID-19. We have amended the discussion to explain this limitation.

Introduction:

Line 77: “In contrast, tests based on mass spectrometry can be provided in minutes..” Are you talking about on-line instrumentation? Unfortunately, when dealing with off-line approaches and sample treatment, the time is actually longer. Please, restate or insert a reference.

We appreciate the reviewer’s concern here. This refers to off-line instrumentation (both for PCR and mass spectrometry). We appreciate that the run time on the instrument is not the same as the time to send the sample to a laboratory and return a test result. However, this overhead is the same in both cases, so it is correct that mass spectrometry is faster than RT-PCR. There is a fundamental difference between mass spectrometry and RT-PCR run times, because the amplification step in RT-PCR takes several hours due to the temperature cycling. The reference cited in the text gives more detail on both approaches. We have adapted the text to specify that the analysis (rather than the test) is provided in minutes.

Line 78: “Furthermore, mass spectrometry instrumentation is often available in hospital pathology laboratories.” Unfortunately, this is not true. MS is not among the facilities commonly found in hospitals/clinics. Please restate or insert a reference.

Our experience in the UK is that it is, but we agree with the reviewer that this is not necessarily the case internationally, or for smaller hospitals. We have rephrased to show that a mass spectrometry service is normally at least available (perhaps from a third party).

Line 81: “especially should vaccine escape lead to..”. Unclear. Please, restate.

This has been amended

Line 87: “and must be spun soon after collection to preserve the metabolome”. Please add a reference.

We have added a reference to the MS Coalition protocols for biofluid collection to meet this concern.

Line 88: “a saliva sample can be donated quickly and painlessly by a patient” Add the following reference on saliva collection (doi.org/10.1016/j.trac.2019.115781).

This reference has been added to the manuscript.

Line 89: “Saliva is itself a carrier of the coronavirus” This is not a good thing from a biosafety point of view. How did you ensure a safe collection/manipulation of the saliva specimen? Did you follow any standardized protocol? Please, insert some details on this important aspect.

The saliva handling is carried out under the guidance given by the mass spectrometry coalition, which is referenced in the paper (reference 22). The protocols also explain the health and safety aspects of handling saliva from hospitalised patients. This has been added to the methods section.

Lines 89-90: “information via its own characteristic metabolites. [13]..” What do you mean? Please, clarify.

The text has been expanded to explain

Materials and methods:

Lines 123-124: “saliva sampling ranged from 1 day to > 1 month..” This aspect could represent a critical issue especially for the main objective of the study, i.e. identifying prognostic markers. The levels and, thus, the predictivity are influenced by the timeframe occurring between symptoms/collection. I kindly suggest organising the next study by collecting samples 2-3 days maximum after the occurrence of symptoms for all the investigated subjects.

Patients whose RT-PCR result was greater than 14 days from saliva sampling were removed from the sample set for statistical analysis. This has now been clarified within the methods section. However, we do agree with this point, and have made this a recommendation for future studies – see our discussion section.
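The exclusion rule described in this response can be sketched as a simple date filter. This is a minimal illustration only: the participant records, field names and dates are invented, and treating the gap as an absolute difference (in either direction) is an assumption, since the response does not say which event came first.

```python
from datetime import date

MAX_GAP_DAYS = 14  # window stated in the response

# Hypothetical records; IDs and dates are invented for illustration.
participants = [
    {"id": "P01", "pcr_date": date(2020, 5, 1), "saliva_date": date(2020, 5, 3)},
    {"id": "P02", "pcr_date": date(2020, 4, 1), "saliva_date": date(2020, 5, 6)},
    {"id": "P03", "pcr_date": date(2020, 5, 10), "saliva_date": date(2020, 5, 24)},
]


def within_window(record, max_gap_days=MAX_GAP_DAYS):
    """True if the RT-PCR result and saliva sample are at most max_gap_days apart."""
    gap = abs((record["saliva_date"] - record["pcr_date"]).days)  # assumed: absolute gap
    return gap <= max_gap_days


retained = [p["id"] for p in participants if within_window(p)]
print(retained)
```

Here the record with a 35-day gap is dropped, while a gap of exactly 14 days is retained because only gaps greater than 14 days are excluded.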

Line 125: “Each participant provided a sample of saliva by spitting directly into a falcon tube which was placed on ice immediately after collection.” What about additional details on saliva collection protocol? Same period of the day? Morning or afternoon? No food, smoke, beverages before sampling? Mouth rinsing? The straightforward application of a defined protocol in the clinical routine is not easy, but an unsystematic sample collection (especially for a biofluid as saliva) leads to random data acquisition.

Samples were collected at the same time of day (morning). However, it was not possible to control mouth rinsing (due to health and safety restrictions at the time) or the consumption of beverages (due to ethical considerations: we were not allowed to control the dietary intake of our patients). We agree that a consistent sampling procedure would be ideal, but unfortunately, we had operational constraints. We decided to take the pragmatic approach and work around these. Since these constraints were likely to be faced in any practical implementation of this test, we aimed at finding markers for COVID-19 severity that were robust enough not to be perturbed by the sample collection method. We have made several edits to the manuscript to explain this better.

Lines 141-142: Supplementary oxygen, spontaneous and/ or assisted ventilation? Be specific.

This information was not recorded as it was not required to generate the respiratory rate score. The score was generated using measurements of breaths per minute and blood oxygen saturation on admission to A&E.

Line 153: Which temperature were centrifuged the samples?

They were centrifuged at room temperature. This has been added to the text.

Line 158: Considering the reconstitution with 100 μL water:methanol (95:5), you are diluting your sample 1:2 v/v. Are you losing this way all the information related to trace/ultra-trace components? Would a clean-up and pre-concentration step be better?

We took the method published by the International COVID-19 mass spectrometry coalition, which aimed at providing standardised methods that could be adopted by multiple laboratories to enable data sharing. We agree that the method could have been better optimised, but chose to use this method to enable the comparison of our data with that from other laboratories around the world.

Line 186: Considering the sophisticated instrumentation available in your lab, why not choosing a 2 ppm accuracy?

Again, we agree that the method used could have been better optimised. However, for this work, we prioritised using the standardised method which would allow for global comparison of COVID-19 data.
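For readers unfamiliar with the mass accuracy setting under discussion, the sketch below shows how a tolerance in parts per million translates into an accept/reject decision for a measured m/z. The m/z values are illustrative assumptions (the protonated phenylalanine ion is chosen arbitrarily, and the measured value is invented), and the 5 ppm figure is a hypothetical wider tolerance used only for comparison with the 2 ppm suggested by the reviewer.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tolerance_ppm):
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tolerance_ppm


theoretical_mz = 166.0863  # [M+H]+ of phenylalanine, an arbitrary example
measured_mz = 166.0868     # hypothetical measured value

error_ppm = ppm_error(measured_mz, theoretical_mz)
print(round(error_ppm, 2))
print(within_tolerance(measured_mz, theoretical_mz, 2.0))  # tighter tolerance
print(within_tolerance(measured_mz, theoretical_mz, 5.0))  # hypothetical wider tolerance
```

In this example the error is about 3 ppm, so the match would be accepted at the wider tolerance and rejected at 2 ppm. A tighter tolerance reduces false annotations but can discard genuine matches if calibration drifts.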

Line 189: Did you work both in positive and negative mode? If not, why not considering a sample characterization even in negative mode? Why choosing 70 000 as resolution?

We ran a set of pooled saliva samples using the method and found that positive mode identified a greater number of features than negative mode. This is also the case in the literature. We do, of course, recognise that utilising both modes will always be additive, but we were constrained on two fronts. First, saliva samples were scarce and we faced competing demands for the patient samples (proteomics, sequencing). Second, the LC-MS method employed was lengthy, taking two weeks of instrumental run-time, and limited instrument availability meant that we were unable to run both positive and negative modes in the time available. We have adjusted the text to recognise this as a limitation.

Lines 196-197: Did you employ all the cited databases sequentially? Specify. What about METLIN?

These are employed in parallel by the Compound Discoverer software. METLIN was not used, but we believe the databases listed are comprehensive.

Line 208: Since we are referring to an explorative pilot study characterized by a small sample-size for each group (≤30), I’d rather use an unsupervised technique as PCA instead of PLS-DA. The use of PLS would be more indicated when dealing with a restricted panel of marker (selected by this preliminary work, confirmed through pure standard and analysed in targeted mode) and a wider population to build up robust predictive models.

We agree with the reviewer that unsupervised approaches are important in the exploratory stage. Supervised analyses are also helpful in identifying variability in the feature set specifically related to the condition being investigated, but with small n and in a pilot study, we agree that they are prone to overfitting and should not be over-interpreted until validated in a more robust analysis of a wider population. We have extended the discussion of limitations to address this specific point.

Just out of curiosity, please furnish the PCA plot for both positive/negative and low/high severity conditions.

This has been added to the supplementary materials

Results:

Table 1: Please insert the correct digits for the age and C-reactive protein data.

This has been changed

Line 243: “controls were age matched and had similar profiles in terms of gender, oxygen requirements and survival rates”. Do controls present oxygen requirements similar to both low/high severe patients?

Oxygen requirements were slightly skewed towards the high severity group at 40% versus 22% in low severity patients. This is as expected in severe cases of COVID-19 where low blood oxygen level is associated with poor patient outcomes.

Lines 258-260: Do you have an explanation for this unexpected behaviour?

This has been adapted in the text. We believe the reason is that CRP is not specific to COVID and all patients had clinical suspicion of COVID-19 infection, as discussed.

Lines 272-273: Did you characterise your samples both in positive and negative mode? Figure 2A/2B are referred to positive acquisition mode. Isn’t it? How many features? Specify.

The reviewer is correct: this is positive mode only, as discussed above. Added.

Line 280: “The optimal separation was found using 5 components”. By using just the first two PLS components, we can explain 22.5% of the variation in the response variable. This value is quite low, suggesting the presence of a lot of noise in your raw data instead of valuable information. Please add a comment.

We agree and we have added this limitation to the discussion

Line 294: Please change “COVID” with “COVID-19”.

We have made this change.

Line 296: Insert a p-value in the text just to understand what you mean by “most disturbed”.

The text has been amended

Lines 298-299: “MS/MS validated features separating..” Specify how many features are you referring to.

We have edited the text of this caption – this is all the MS/MS validated features.

Line 311-313: “It was decided to project the PLS-DA model obtained for high severity versus low severity participants on to COVID-19 negative participants”. I disagree. The population of the negative participants may not be representative of people infected by SARS-CoV-2 and thus may not be a proper test set. Please justify your decision.

This was done in order to test the model. We agree with the reviewer that there is a possibility that the negative participants and low severity participants are not representative and have acknowledged in the text that this does not constitute the ideal validation strategy. However, as stated at the beginning of the results section, the two populations are age and gender matched, and in the same hospital environment. In addition, we found very little separation between COVID-19 positive and negative patients. Therefore we have no evidence to suggest that these populations are not representative.

Discussion:

Line 325: Please see comments lines 123-124.

The text has been amended to specify removal of patients whose PCR was greater than 14 days from saliva sampling.

Lines 336-339: How can you explain the presence of a more marked difference between low/high severity than positive/negative subjects (where high severity is not so high as you mentioned, because most of the very severe patients were excluded from the study)?

We hypothesise that at low severity, the differences versus our controls are modest – we also note that our controls are also hospitalised inpatients with clinical symptoms suggestive of COVID-19 infection. This may reduce the differentiation between mild cases and equivalently sick patients regarding e.g. biomarkers relating to generalised inflammation. We have expanded the discussion of this point in the Discussion.

Please add some references about recent experimental works which already suggested differences between low and high severity in COVID-19 based on biomarkers (as the most famous markers of inflammation)

(doi.org/10.1002/rmv.2146; doi.org/10.1093/nsr/nwaa086; doi.org/10.1016/j.freeradbiomed.2022.01.021)

to strengthen your results.

We agree that these references are helpful and have added them to the manuscript.

Lines 340-342. I disagree. The alteration of plasmatic metabolome in infected people is generally dramatically marked when compared to healthy subjects (probably this is reflected even in saliva). Please delete this sentence or confirm it by adding some references.

We agree and have modified the text to improve clarity – our original comment was intended to apply only to the salivary metabolome. In homeostatically regulated biofluids, infection does have a more marked influence.

Line 349: “the features associated with high severity were present neither in low severity nor in COVID-19 negative participants.” Were they absent or still present but characterised by lower levels?

The markers were not absent in the negative patients – their pattern was more like the low severity patients, and the text has been modified to explain this.

Line 350: How many? Only in positive mode? Please specify.

Only in positive mode – we have amended the text in various places to make clear that the experiment was performed in positive mode.

Line 357: Have you a biochemical explanation for this down-regulation?

We don’t have a biochemical explanation, and can only point to previous work which shows that the cytokine storm that is characteristic of severe COVID-19 infection is highly disruptive to metabolism. We think that more work is needed before the research community can establish whether salivary metabolite concentrations can be indicative of specific biological processes.

Lines 364-365: I fully agree. The best choice would be to postpone the use of predictive models to future studies characterized by a wider cohort of subjects and a selected panel of chemicals fully identified and quantified in targeted mode.

We agree, but we do think there is a role for preliminary, untargeted discovery studies to identify a set of metabolites that are suitable for research in targeted mode.

Reviewer 2 comments

We thank Reviewer 2 for the generally favourable response and hope that the improvements and changes made will be acceptable to Reviewer 2 also.

Attachment

Submitted filename: PLOS ONE_reviewer_responses.docx

Decision Letter 1

Tommaso Lomonaco

8 Sep 2022

Untargeted saliva metabolomics by liquid chromatography - mass spectrometry reveals markers of COVID-19 severity

PONE-D-22-09481R1

Dear Dr. Matt Spick,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Tommaso Lomonaco, Ph.D

Academic Editor

PLOS ONE

Acceptance letter

Tommaso Lomonaco

13 Sep 2022

PONE-D-22-09481R1

Untargeted saliva metabolomics by liquid chromatography - mass spectrometry reveals markers of COVID-19 severity

Dear Dr. Spick:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Tommaso Lomonaco

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Fig. Clustering of patient samples according to extraction batch.

    Principal component analysis of each patient sample (circles) and batch QCs (squares), coloured according to extraction batch, showing no significant clustering of patient samples according to extraction batch.

    (DOCX)

    S2 Fig. Principal component analysis of each patient sample and run QC. Figure shows low levels of QC variation according to position in the run sequence.

    (DOCX)

    S3 Fig. Principal component analysis for 75 participants and 324 features, COVID-19 positive / negative, LC-MS analysis in positive mode.

    (DOCX)

    S4 Fig. Principal component analysis for 44 participants and 324 features, high severity / low severity, LC-MS analysis in positive mode.

    (DOCX)

    S1 Table. Operating conditions of the mass spectrometer used in this research.

    (DOCX)

    S2 Table. Distinctive features between COVID-19 positive and negative.

    (DOCX)

    S3 Table. Distinctive features between COVID-19 high severity and low severity.

    (DOCX)

    Data Availability Statement

    The aligned and annotated LC-MS data matrices used in this work are available at (https://doi.org/10.5281/zenodo.6924738). The analytical protocols used are openly available for all researchers to access. The website URL for the protocols is (https://covid19-msc.org/).
