Abstract
Background: Definitive early-stage diagnosis of severe acute respiratory syndrome (SARS) is important despite the number of laboratory tests that have been developed to complement clinical features and epidemiologic data in case definition. Pathologic changes in response to viral infection might be reflected in proteomic patterns in sera of SARS patients.
Methods: We developed a mass spectrometric decision tree classification algorithm using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. Serum samples were grouped into acute SARS (n = 74; ≤7 days after onset of fever) and non-SARS [n = 1067; fever and influenza A (n = 203); pneumonia (n = 176); lung cancer (n = 29); and healthy controls (n = 659)] cohorts. Diluted samples were applied to WCX-2 ProteinChip arrays (Ciphergen), and the bound proteins were assessed on a ProteinChip Reader (Model PBS II). Bioinformatic calculations were performed with Biomarker Wizard software 3.1.1 (Ciphergen).
Results: The discriminatory classifier with a panel of four biomarkers determined in the training set could precisely detect 36 of 37 (sensitivity, 97.3%) acute SARS and 987 of 993 (specificity, 99.4%) non-SARS samples. More importantly, this classifier accurately distinguished acute SARS from fever and influenza with 100% specificity (187 of 187).
Conclusions: This method is suitable for preliminary assessment of SARS and could potentially serve as a useful tool for early diagnosis.
Since November 1, 2002, severe acute respiratory syndrome (SARS) has affected 32 countries and regions, with 8422 reported probable cases, 916 deaths, and local transmission in at least 6 countries (1). Collective efforts have been made to identify its epidemiologic determinant as a novel member of Coronaviridae, SARS-associated coronavirus (SARS-CoV) (2)(3)(4)(5)(6), and etiologic experiments in cynomolgus macaques have confirmed the virus as the causative agent for SARS (7)(8). Rapid progress has also been made in the determination of its genome sequences (9)(10)(11) and the molecular evolution of the coronavirus (12). Identification of angiotensin-converting enzyme 2 as the viral receptor provided further information toward deciphering its molecular mechanisms of infection (13).
Despite such advances in virologic studies, early diagnosis of SARS has been based primarily on the clinical definitions released by WHO and CDC (14)(15), which can be confusing or contradictory (16). Available serologic tests cannot guarantee an early diagnosis (17), and PCR-based molecular detection of the viral RNA suffers from unsatisfactory sensitivity and specificity (3)(17)(18)(19). In the last year, failure to develop diagnostic tests for SARS, especially in the acute phase, severely impacted specific prevention and treatment measures for SARS. There is a need to establish a reliable diagnostic methodology for SARS-CoV, in particular, to distinguish the similar clinical manifestations of SARS and other respiratory tract infections. This urgency is reinforced by the first SARS case not linked to laboratory contamination, which occurred in Guangdong, China this year (20).
Proteomic analysis has provided a unique tool for the identification of diagnostic biomarkers, evaluation of disease progression, and drug development (21)(22). Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) enables rapid, reproducible protein/peptide profiling of multiple disease-specific biomarkers directly from crude samples (e.g., tissue cell lysates or body fluids) (23)(24). Small amounts of sample can be applied directly to a biochip coated with specific chemical matrices (e.g., hydrophobic, cationic, or anionic) or specific biochemical materials such as DNA fragments or purified proteins. The bound proteins/peptides can then be analyzed by MS to obtain the protein fingerprints, or even amino acid sequence determinants, when interfaced to a mass spectrometric microsequencing device.
Analogous to the proteomic detection of various cancers (25)(26), we used a weakly cationic ProteinChip (WCX2 chip surface) to retrospectively analyze SARS sera to determine whether there are distinct and reproducible protein fingerprints potentially applicable to the diagnosis of SARS. We established a decision tree algorithm consisting of four unique biomarkers for acute SARS in the training set and subsequently validated the accuracy of this classifier by use of a completely blinded test set.
Materials and Methods
patients and samples
More than 2000 serum specimens from suspected/probable SARS patients admitted to 38 major hospitals in the Beijing area between April 14 and June 5, 2003, were eligible for inclusion. The serum procurement, data management, and blood collection protocols were approved by the Beijing SARS-Control Working Group and were in accordance with WHO biosafety guidelines (27). Among the retrospective samples, only 74 were selected; these came from probable SARS patients whose blood was collected at admission, within 7 days of the onset of fever (acute SARS patients; Table 1). Probable cases were defined by the eligibility criteria set forth by WHO (15). These cases also had radiographic evidence of infiltrates consistent with pneumonia or respiratory distress syndrome on chest x-ray. The paired convalescent serum samples from the SARS cohort tested positive for IgM seroconversion by the IFA method (Beijing Genomics Institute), and four samples also tested positive in a DNA array test using nasopharyngeal samples. The 1067 non-SARS control serum samples (Table 2) were obtained from recruited healthy donors (n = 659) or from patients with respiratory infections [pneumonia (n = 176) or high fever (n = 203; 66 with influenza A)] or lung cancer (n = 29). The control samples were all negative for SARS-CoV seroconversion.
Table 1.
Patients with acute SARS who fit the WHO SARS case definition.
| Days after symptom onset¹ | Patients, n (F/M) | Hospital identification² | IFA³,⁴ | PCR,⁵ n/NA | Sample partition: training | Sample partition: blinded test |
|---|---|---|---|---|---|---|
| 1 | 7 (6/1) | A, C, J, K | + | 0/7 | 4 | 3 |
| 2 | 2 (1/1) | J, K | + | 0/2 | 1 | 1 |
| 3 | 9 (4/5) | A, B, H, N, S, Y | + | 1/6 | 5 | 4 |
| 4 | 12 (5/7) | D, E, H, J, K, S, T, X, Y | + | 2/10 | 8 | 4 |
| 5 | 15 (7/8) | A, D, E, J, K, M | + | 0/15 | 7 | 8 |
| 6 | 17 (7/10) | A, D, H, J, S, T, X, Y | + | 1/16 | 8 | 9 |
| 7 | 12 (6/8) | C, E, J, M, O, P, S, T | + | 0/7 | 4 | 8 |
Cases from April 15 to June 5, 2003, with retrospective serum samples collected ≤7 days after self-described onset of symptoms. The ages of these cohorts varied from 6 to 74 years. Each group of samples was divided into two parts for training and blinded tests.
Abbreviations for hospitals in Beijing area: A, Civil Aviation Hospital; B, Beijing Center for Disease Control and Prevention; C, Concord Hospital; D, Dongzhimen Hospital; E, Earth Temple Hospital; H, Chaoyang Hospital; J, Jishuitan Hospital; K, Peking University Medical School 3rd Affiliate Hospital; M, Martial Police General Hospital; N, North Suburban Hospital; O, Osier Hospital; P, State Power Hospital; S, Shijingshan Hospital; T, Tongren Hospital; X, Jiuxianqiao Hospital; Y, Youan Hospital.
IFA, immunofluorescence assays; NA, not available.
Included patients were positive for IgM seroconversion in immunofluorescence assays with the paired convalescent sera. Other information on microbiological tests, clinical records, or treatment was not accessible because of the classified nature of the work performed by the Beijing SARS-Control Working Group.
Four included patients tested positive in a DNA chip array method (Xiao et al, manuscript in preparation) with four sets of DNA probes derived from SARS-CoV genome coding replicase 1A (2 independent probes), spike, and nucleocapsid genes. Other patients were negative by real-time fluorescent RT-PCR of nasopharyngeal aspirates.
Table 2.
Control cohorts with various respiratory inflammations and carcinomas.
| Cohort | Symptoms | Patients, n (M/F) | Clinical manifestations | Sample partition: training | Sample partition: blinded test |
|---|---|---|---|---|---|
| 1 | Healthy¹ | 659 (340/319) | | 40 | 619 |
| 2 | Fever² | 203 (97/106) | 38.7–40.1 °C; Flu³ (n = 66) | 16 | 187 |
| 3 | Pneumonia⁴ | 176 (90/86) | CXR, P (n = 75); MP (n = 57); P+TB (n = 44) | 8 | 168 |
| 4 | Lung cancer⁵ | 29 (15/14) | CXR + pathology (n = 3); CT (n = 16) | 10 | 19 |
Sera from healthy persons attending Anzhen Hospital (n = 14) were collected in 2001, sera from 307 Hospital (n = 10) were collected before November 2002, and sera from Deyi Diagnostic Institute (n = 21; Beijing; epidemic region) and Taizhou Hospital (n = 34; Zhejiang Province; nonepidemic region) were collected on June 3, 2003. The rest of the healthy control sera, from Beijing Red Cross Blood Center, were collected between July and December 2003.
Serum samples from patients with high fevers were collected from Taizhou Hospital, Zhejiang Province (nonepidemic region), on June 3, 2003; from Chaoyang Hospital on November 15, 2003; and from Di Tan Hospital on November 22 and December 3, 2003. Among them, 66 were positive in the influenza A IgM ELISA.
Flu, influenza; CXR, chest x-ray; MP, mycoplasma; P, pneumonia; TB, Mycobacterium tuberculosis; CT, computed tomography.
Serum samples were collected from Tiantan Hospital (n = 12), Beijing, on May 3, 2003; from Taizhou Hospital (n = 54), Zhejiang Province, on June 3, 2003; from Chaoyang Hospital (n = 38) on November 25, 2003; and from Ditan Hospital (n = 72) on December 3, 2003. All patients had positive chest x-rays and presented with pneumonia or atypical pneumonia; 57 tested positive in the mycoplasma IgM ELISA, and 44 were positive in both the pneumonia and tuberculosis PCR assays.
Diagnosis was based on the criteria in Surgery, 5th edition (Zaide Wu. Beijing, China: Public Health Press). Clinical features included various forms of metastasis in the pericardium (n = 1), upper right clavicle (n = 1), lymph nodes (n = 1), liver (n = 1), and brain (n = 1); accompanying hydrothorax was also observed in nine patients.
The patients and serum samples were then divided into two groups: one for the “training” set and the other for the blinded “test” set (Tables 1 and 2). SARS and non-SARS control sera were all stored at −80 °C in 30-μL aliquots. Before each round of mass spectrometric assays, we routinely performed quality control of serum samples by checking the appearance and peak intensity of m/z 6635.09 (Fig. 3A). Because the peak intensity of m/z 6635.09 remained relatively constant among spectra from different assays and different instruments, it was also used for normalization between rounds of analyses.
Figure 3.

Intra- and interassay reproducibility.
(A), example of intraassay reproducibility of mass spectra and decision tree classification. Serum from an unaffected healthy control was individually applied to seven bait surfaces on eight chips, and seven randomly selected peaks (arrows) in each spectrum over a course of 21 days were used as surrogate markers for calculation of CV. The reproducibility of SELDI spectra, mass location, and intensity from spectrum to spectrum was determined accordingly. (B and C), examples of interassay reproducibility evaluation of the same chip loaded with duplicate serum samples from a healthy control (C1-A and -B), a SARS patient (S4-A and -B), and patients with pneumonia (P10-A and -B) or fever (F7-A and -B). Spectra from a PBS II (B) and PBS IIc (C) are aligned for comparison.
proteomic analysis
Three different chip chemistries (hydrophobic, anionic, and cationic) were first evaluated to determine which affinity chemistry gave the best serum profiles in terms of the number and resolution of proteins. The weakly cationic exchange chip (WCX) gave the best results with mass spectra from 0 to 200 kDa. The WCX chips in an 8-well bioprocessor format (Ciphergen) were chosen to allow a larger volume of serum for the chip array. The bioprocessor was pretreated with 150 μL of 100 mmol/L sodium acetate (pH 4) on a platform shaker at 250 rpm for 5 min. The excess sodium acetate was removed by inverting the bioprocessor on a paper towel. This process was repeated twice. The serum samples were thawed on ice in a Biosafety Level II cabinet, and 20 μL of each sample was mixed with 30 μL of U9 buffer (9 mol/L urea, 10 g/L CHAPS in phosphate-buffered saline) in a 1.5-mL Eppendorf tube and vortex-mixed at 4 °C for 20 min. We then added 100 μL of U1 buffer [U9 buffer diluted by ninefold (100 mL of U9 buffer plus 800 mL of Tris-HCl) with 50 mmol/L Tris-HCl (pH 7)] to the serum/urea mixture, vortex-mixed it for 10 min, and stopped the reaction by addition of 600 μL of sodium acetate on ice. We applied 50 μL of the serum/urea sample to each well, and the bioprocessor was sealed and shaken on a platform shaker at 250 rpm for 30 min. The excess serum/urea solution was discarded, and the bioprocessor was washed three times with 100 mmol/L sodium acetate as described above. The chips were removed from the bioprocessor, washed twice with deionized water, and air-dried. Subsequently 0.5 μL of EAM sinapinic acid saturated in 500 mL/L acetonitrile–5 g/L trifluoroacetic acid was added to each well. After air-drying, the sinapinic acid application was repeated.
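As a rough check on the dilution scheme above, the amount of serum that reaches each well can be worked out from the stated volumes. This is only a sketch: it assumes that the volumes are additive and that the 600 μL of sodium acetate is added to the whole serum/urea mixture; the variable names are illustrative.

```python
# Volumes in microliters, taken from the protocol text.
serum = 20.0
u9_buffer = 30.0
u1_buffer = 100.0
sodium_acetate = 600.0
applied_per_well = 50.0

# Assumption: volumes are additive.
total_volume = serum + u9_buffer + u1_buffer + sodium_acetate
dilution_factor = total_volume / serum
serum_per_well = applied_per_well / dilution_factor

print(f"total {total_volume:g} uL, dilution 1:{dilution_factor:g}, "
      f"serum per well {serum_per_well:.2f} uL")
```

Under these assumptions the serum is diluted 37.5-fold, so each 50-μL application carries about 1.3 μL of serum.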
Chips were then placed in the Protein Biological System II (PBS II) mass spectrometer reader (Ciphergen), and TOF spectra were generated by an average of 104 laser shots collected in the positive mode. The settings for low-energy readings were set with a high mass of 50 kDa and were optimized from 3 to 15 kDa at a laser intensity of 200, detector sensitivity of 8, and a focus by optimization center. High-energy readings were set with a high mass of 200 kDa and were optimized from 10 to 50 kDa at a laser intensity of 230 and a detector sensitivity of 9. Mass accuracy was calibrated externally by use of the All-in-One peptide molecular mass calibrator (Ciphergen).
Sera from a healthy control were individually applied to seven bait surfaces of eight WCX2 chips and run during 3-day intervals for analysis of within-run reproducibility. In parallel, 40 samples (10 from SARS patients, 10 from patients with fever, 10 from patients with pneumonia, and 10 from healthy controls) were applied in duplicate to a single chip and run on two different instruments (PBS II and PBS IIc; Ciphergen) for between-run analysis of instrument drift. To avoid the possibility that placement or run order of samples would affect assay accuracy, samples were loaded on chips in a rotational fashion. In brief, sample 1 was spotted on the 8-well directional chip (wells A to H) in duplicate in wells A and B and then in wells G and H of the second chip. Samples 2, 3, and 4 were loaded on chips in the same rotation order. We also randomized the order of chip placement in the spectrometer to minimize bias from run order. Spectra were collected for each sample and analyzed independently using the classification algorithm established in the training step.
The peak at m/z 6635.09 in the quality-control serum was adjusted to have an intensity of 40–60 for both the PBS II and PBS IIc. The peak intensity of m/z 6635.09 in the quality-control serum was used to normalize instrument resolution between the PBS II and PBS IIc. We normalized spectra using total ion current with an identical normalization coefficient and a low mass cutoff <2000 Da. If the factor was <0.3 or >2.9 after normalization to total ion current for the peak at m/z 3939, repeated runs would be performed. No outlier was rejected in the test. The “root” biomarker, m/z 3939, yielded the lowest and similar P value in both the PBS II and PBS IIc.
bioinformatics and biostatistics
Peak detection was performed with Biomarker Wizard software 3.1.1 (Ciphergen). The m/z ratios between 2000 and 20 000 were selected for analysis because this range contained the majority of the resolved proteins and peptides. The m/z range between 0 and 2000 was eliminated from analysis to avoid interference from adducts, artifacts of the energy-absorbing molecules, and other possible chemical contaminants. Peak detection involved baseline subtraction, mass normalization using a common calibrant peak (m/z 6635.09), and normalization to the total ion current intensity with a minimum m/z of 2000, using an external normalization coefficient of 0.2 (normalization factor for an individual spectrum = 0.2/average ion current for that spectrum) for spectra obtained at different times or locations. For the first pass of automatic peak detection and clustering, the settings were a signal-to-noise ratio of 5 and a minimum peak threshold of 5% of all spectra. The peak clusters were completed by second-pass peak detection using a signal-to-noise ratio of 2 and a cluster window of 0.3% of mass. An average of 99 peaks was detected in each spectrum. The mass range from 20 to 200 kDa was analyzed in parallel.
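The normalization rule and the cluster window described above can be sketched as follows. This is an illustration only, not the Biomarker Wizard implementation: the function names, the representation of a spectrum as parallel lists of m/z and intensity values, and the greedy clustering strategy are all assumptions.

```python
from statistics import mean

LOW_MASS_CUTOFF = 2000.0     # m/z below this is excluded (from the text)
EXTERNAL_COEFFICIENT = 0.2   # external normalization coefficient (from the text)
CLUSTER_WINDOW = 0.003       # cluster window of 0.3% of mass (from the text)


def normalization_factor(mz, intensity):
    """Return 0.2 / average ion current, using only points at m/z >= 2000."""
    kept = [i for m, i in zip(mz, intensity) if m >= LOW_MASS_CUTOFF]
    return EXTERNAL_COEFFICIENT / mean(kept)


def normalize(mz, intensity):
    """Scale every intensity in a spectrum by its normalization factor."""
    factor = normalization_factor(mz, intensity)
    return [i * factor for i in intensity]


def cluster_peaks(masses, window=CLUSTER_WINDOW):
    """Greedy clustering (an assumed strategy): a peak joins the current
    cluster if it lies within `window` of that cluster's first mass."""
    clusters = []
    for m in sorted(masses):
        if clusters and abs(m - clusters[-1][0]) <= window * clusters[-1][0]:
            clusters[-1].append(m)
        else:
            clusters.append([m])
    return clusters


# Toy spectrum with made-up intensities; only the m/z >= 2000 points
# contribute to the average ion current.
scaled = normalize([1000.0, 3000.0, 5000.0], [10.0, 2.0, 6.0])
groups = cluster_peaks([3939.08, 3941.0, 4137.71])
```

In the toy call, the two peaks near m/z 3940 fall within the 0.3% window and form one cluster, whereas m/z 4137.71 starts a new one.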
analytical procedure
Data analysis.
The data analysis process used in this study involved three stages: (a) peak detection and alignment; (b) selection of peaks with the highest discriminatory power; and (c) data analysis using a decision tree algorithm. A random sampling (acute SARS, fever, pneumonia, lung cancer, and healthy) with two strata (acute SARS and non-SARS) was used to separate the entire data set into training and test data sets. The training data set consisted of SELDI spectra from 37 acute SARS and 74 non-SARS serum samples. The validity and accuracy of the classification algorithm were then challenged with a blinded test data set consisting of 37 acute SARS and 993 non-SARS samples.
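The partition described above can be sketched as a stratified random split with fixed training counts per cohort. The counts come from Tables 1 and 2; the sample identifiers, the random seed, and the function name are placeholders, and the text does not specify how the random draw was actually performed (Table 1 additionally splits the SARS cases by day after onset, which this sketch ignores).

```python
import random

# Cohort sizes and training-set counts as reported in Tables 1 and 2.
COHORT_SIZES = {"acute SARS": 74, "healthy": 659, "fever": 203,
                "pneumonia": 176, "lung cancer": 29}
TRAINING_COUNTS = {"acute SARS": 37, "healthy": 40, "fever": 16,
                   "pneumonia": 8, "lung cancer": 10}


def stratified_split(samples_by_cohort, training_counts, seed=0):
    """Randomly assign a fixed number of samples per cohort to training."""
    rng = random.Random(seed)  # arbitrary seed, used only for repeatability
    training, test = {}, {}
    for cohort, samples in samples_by_cohort.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_train = training_counts[cohort]
        training[cohort] = shuffled[:n_train]
        test[cohort] = shuffled[n_train:]
    return training, test


# Placeholder identifiers; real sample identifiers are not given in the text.
samples = {c: [f"{c}-{i}" for i in range(n)] for c, n in COHORT_SIZES.items()}
training, test = stratified_split(samples, TRAINING_COUNTS)

n_train_non_sars = sum(len(v) for c, v in training.items() if c != "acute SARS")
n_test_non_sars = sum(len(v) for c, v in test.items() if c != "acute SARS")
print(len(training["acute SARS"]), n_train_non_sars)  # 37 74
print(len(test["acute SARS"]), n_test_non_sars)       # 37 993
```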
Decision tree classification.
Construction of the decision tree classification algorithm was performed as described previously (26) with modifications based on the Biomarker Patterns Software (Ciphergen). Classification trees were split into two branches or nodes, using one rule at a time. We set the target variable level at 2 and the minimum value at 0, and each decision was based on the presence or absence and the intensity of one peak. Trees were grown using the Gini or Twoing method, with the setting favoring even splits varied from 0.00 to 2.00 in steps of 0.2 and V-fold cross-validation varied from 6 to 12 in steps of 2, for the growth of 88 trees. The lowest cost tree (value = 0.068; Gini = 2.0; V-fold = 10) was selected for the final test.
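One reading of this parameter sweep that reproduces the reported total of 88 trees is a full grid over two splitting methods, 11 even-split settings, and four V-fold values. The sketch below only enumerates that grid; it is an interpretation of the text, not a reconstruction of the Biomarker Patterns Software, and the variable names are illustrative.

```python
from itertools import product

splitting_methods = ["Gini", "Twoing"]
favor_even_splits = [round(0.2 * k, 1) for k in range(11)]  # 0.0, 0.2, ..., 2.0
v_folds = list(range(6, 13, 2))                             # 6, 8, 10, 12

# Assumption: every combination of the three settings yields one tree.
configurations = list(product(splitting_methods, favor_even_splits, v_folds))
print(len(configurations))  # 2 * 11 * 4 = 88
```

Under this reading, the selected tree corresponds to the grid point ("Gini", 2.0, 10).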
Results
tree classification and pattern discovery
To identify the serum biomarkers that could distinguish SARS from non-SARS samples, we used a training set of specimens (37 acute SARS and 74 controls; Tables 1 and 2) and constructed the decision tree classification algorithm using 10 989 peaks [99 peaks × (37 + 74) spectra] of statistical significance identified in the low-energy readings (see Materials and Methods). The classification algorithm used four peaks between 3 and 12 kDa (m/z 3939.08, 4137.71, 8136.64, and 11 514.2) and generated five terminal nodes (Fig. 1). These discriminatory peaks efficiently split SARS specimens into terminal nodes 3 and 5 and non-SARS samples into terminal nodes 1, 2, and 4. For each mass peak, the mean intensities in SARS and non-SARS samples differed more than threefold, with a P value close to 0 (Table 3). Notably, the proteins or peptides with masses of 3939.08, 8136.64, and 11 514.2 Da were up-regulated in patients with acute SARS, whereas the one with a mass of 4137.71 Da was down-regulated, compared with healthy controls or patients with respiratory tract infections. A representative spectrum of a SARS specimen aligned with that of a healthy control (Fig. 2A) showed the four fingerprints in node 3 required for pattern recognition in the classifier. The unique presence of the root biomarker, m/z 3939.08, is demonstrated in the alignment of representative spectra of samples from patients with acute SARS (1, 3, 5, and 7 days after the onset of fever; from terminal node 5) and those from healthy controls and patients with fever and influenza or pneumonia (Fig. 2B). This decision algorithm correctly classified 37 of 37 (100%) of the acute SARS samples and 72 of 74 (97.3%) of the non-SARS controls in the training set (Table 3).
Figure 1.

Diagram of the decision tree classification in the training data set.
The numbers in the root node (top), the descendant nodes (ovals), and the terminal nodes 1–5 (rectangles) represent the classes. S, SARS; NS, non-SARS; N, sum of S and NS. The numbers below the root and descendant nodes are the mass values followed by the peak intensity values. For example, the mass value under the root node is 3939.08 Da, and the intensity is ≤1.7107.
Table 3.
Biomarker statistics for SARS vs non-SARS spectra and decision tree classification.
| m/z | P | Acute SARS, mean | Acute SARS, SD | Non-SARS, mean | Non-SARS, SD | Fold | Proteomic analysis | Sensitivityᵃ, % | Specificityᵇ, % |
|---|---|---|---|---|---|---|---|---|---|
| 3939.08 | 0 | 11.80255 | 10.26216 | 0.71233 | 1.72247 | 16.57 | Training | 100.0 (37/37) | 97.3 (72/74) |
| 4137.72 | 3 × 10⁻¹⁰ | 0.69703 | 0.94952 | 2.25324 | 2.73292 | 0.31 | | | |
| 8136.64 | 0 | 3.33836 | 2.74166 | 0.99829 | 1.44389 | 3.34 | Test | 97.3 (36/37) | 99.4 (987/993) |
| 11514.28 | 0 | 2.18812 | 2.89383 | 0.26264 | 0.68112 | 8.33 | | | |
a,b The 95% confidence intervals were estimated from the binomial distribution:
for sensitivity, the 95% confidence interval was 90.5–100.0% for the training set and 85.8–99.9% for the test set;
for specificity, the 95% confidence interval was 90.6–99.7% for the training set and 98.7–99.8% for the test set.
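A common way to obtain binomial confidence intervals of this kind is the exact (Clopper-Pearson) method. The sketch below implements it with the standard library only; whether the authors used exactly this variant is an assumption, and the function names are illustrative. For 37 of 37, the lower limit reduces to 0.025^(1/37) ≈ 0.905, in line with the 90.5% reported for the training-set sensitivity.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def _solve_increasing(f, iterations=60):
    """Bisection on [0, 1] for the zero of a function increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for x successes out of n."""
    # Lower limit: P(X >= x | p) = alpha / 2 (left side increases with p).
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2.0)
    # Upper limit: P(X <= x | p) = alpha / 2 (the CDF decreases with p).
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: alpha / 2.0 - binom_cdf(x, n, p))
    return lower, upper


for label, x, n in [("training sensitivity", 37, 37),
                    ("training specificity", 72, 74),
                    ("test sensitivity", 36, 37),
                    ("test specificity", 987, 993)]:
    low, high = clopper_pearson(x, n)
    print(f"{label}: {100 * low:.1f}-{100 * high:.1f}%")
```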
Figure 2.

Representative SELDI spectra.
(A), combination of four peak masses required to correctly classify the sample (S4d-B, patient B 4 days after the onset of illness) as SARS in terminal node 3. The arrows in the magnified panels indicate the differentially expressed protein peaks compared with the healthy control (C6-B) used in the classifier. The mass and peak intensity are displayed as in Fig. 1 . (B), alignment of representative SARS and non-SARS controls [healthy, pneumonia, and influenza and fever (Flu/Fever)] spectra with the mass range (boxed) for the root biomarker m/z 3939.08 (arrow) highlighted. Shown are examples of SARS spectra from days 1, 3, 5, and 7 after the onset of symptoms.
The above classifier used only those masses in the low-energy readings (m/z <50 000). To exhaust all meaningful serum biomarkers, we expanded the analysis of the same training samples in the high-energy setting (m/z <200 kDa, see Materials and Methods) and pooled both low- and high-energy readings together [161 × (37 + 74) = 17 871 peaks]. The classification algorithm then used five peaks between 4 and 16 kDa (m/z 4824.28, 8136.64, 11505.30, 14 023.00, and 15 369.20; peaks at m/z 8136.64 and 11 505.30 overlapped with those in Fig. 1 ) in six terminal nodes and yielded a sensitivity and specificity of 94.6% (35 of 37) and 95.9% (71 of 74), respectively (data not shown). The peaks at m/z 3939.08 and 4137.71 in this new classifier disappeared because their corresponding peak intensities were beyond the limits after normalization with the intensity for the peak at m/z 6635.09 (see the section on patients and samples in the Materials and Methods). However, because most of the SARS cases in this alternative classifier (34 of 37) fell into the terminal node where the proteins/peptides were down-regulated (m/z 14023.0 ≤0.611087, m/z 4824.28 ≤0.746989, and m/z 15369.2 ≤3.27656), and because this algorithm had to combine two energy settings for analysis, we reasoned that the decision tree generated with only low-energy readings (Fig. 1 ) would be more sensitive (100%) and more convenient for a clinical application.
To determine the reproducibility of SELDI spectra, mass location, and intensity from array to array on a single chip (intraassay) and between instruments (interassay), we first spotted the serum from a healthy control on seven baits in a single chip and collected seven independent spectra over a time span of 21 days (Fig. 3A). We then selected seven proteins in the range of 3–10 kDa (m/z 4089.59, 5334.17, 5631.18, 5901.49, 6625.63, 7762.24, and 7966.63; black arrows in Fig. 3A) to calculate the intraassay CV. These peaks were selected because they were in the proximity of the four biomarkers with comparable current intensities. The interassay experiments were similar except that sera from healthy controls and from patients with high fever, pneumonia, and SARS were applied to a single chip, and the independent spectra were collected from two different instruments (PBS II and PBS IIc; Fig. 3, B and C). The mean intra- and interassay CVs for peak location were 0.02% and 0.03%, respectively. We considered masses with accuracies within 0.1% between spectra to be the same. The mean intra- and interassay CVs for the normalized intensity were 15% and 20%, respectively. CV calculations using lower intensity peaks (Fig. 3A, gray arrowheads), on the other hand, yielded results similar to those obtained with the seven high-intensity peaks (peak location, intra- and interassay CVs both 0.03%; peak intensity, intraassay CV = 17% and interassay CV = 18%).
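The two reproducibility measures used above, the coefficient of variation and the 0.1% mass-matching rule, can be sketched as follows. The function names are illustrative, the replicate values are invented for the example, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not say which was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def same_mass(mz_a, mz_b, tolerance=0.001):
    """Treat two peaks as the same mass if they differ by at most 0.1%."""
    return abs(mz_a - mz_b) <= tolerance * max(mz_a, mz_b)


# Hypothetical replicate readings of one peak's location across spectra.
replicate_locations = [6634.2, 6635.1, 6636.0]
location_cv = coefficient_of_variation(replicate_locations)

# Peaks reported in the text at slightly different masses.
overlap = same_mass(11505.30, 11514.2)   # differ by less than 0.1%
distinct = same_mass(3939.08, 4137.71)   # differ by about 5%
```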
detection of sars
Analysis of spectra from the completely blinded test set (37 acute SARS and 993 controls; Tables 1 and 2) correctly classified 36 of 37 (97.3%) SARS specimens and 987 of 993 (99.4%) controls (Table 3). More importantly, the classification algorithm distinguished acute SARS from fever and influenza with a sensitivity of 97.3% (36 of 37) and a specificity of 100% (187 of 187; 60 of 60 with influenza). Interestingly, when we tested the classifier using an additional control population of 40 samples from patients in the Beijing area with measles after July 16, 2003, who had no history of close contact with SARS patients and had not visited those hospitals treating SARS patients, the classifier had a specificity of 100% (95% confidence interval, 89–100%; data not shown).
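The sensitivity and specificity figures quoted above follow directly from the classification counts. A minimal sketch, with illustrative function names and the counts taken from the text:

```python
def sensitivity(true_positives, false_negatives):
    """Percentage of SARS samples classified as SARS."""
    return 100.0 * true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Percentage of non-SARS samples classified as non-SARS."""
    return 100.0 * true_negatives / (true_negatives + false_positives)


# Blinded test set: 36 of 37 SARS and 987 of 993 non-SARS correctly classified.
test_sensitivity = round(sensitivity(36, 1), 1)     # 97.3
test_specificity = round(specificity(987, 6), 1)    # 99.4

# Fever and influenza subgroup of the test set: 187 of 187 classified as non-SARS.
fever_specificity = round(specificity(187, 0), 1)   # 100.0
```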
Discussion
Several laboratory tests, based on either viral RNA (3)(17)(19) or serology (6)(17), have been developed to complement clinical characteristics and epidemiologic data in the identification of SARS, but early detection of SARS with sufficiently high sensitivity and specificity has not been achieved.
The identification of proteins/peptides of pathophysiologic significance (phenomic fingerprints) in crude biological and clinical samples by SELDI-TOF MS has been demonstrated in various cancer studies (28). Using a similar profiling strategy, we have established a classification algorithm that delineates probable SARS patients as early as day 1 after self-described onset of symptoms from healthy individuals and from patients with respiratory tract infections in the training set (sensitivity = 100%; specificity = 97.3%). When applied to the blinded test set, this discriminatory profiling method precisely classified 97.3% of patients with acute SARS and 99.4% of non-SARS patients. More strikingly, our classifier was able to discriminate SARS-CoV infection from bacterial (mycoplasma, tuberculosis) and other local (influenza) or systemic (measles) viral infections of the respiratory tract with a specificity reaching 100%. This was attributable to the inclusion of corresponding inflammatory control samples in the training set and optimization of the classification algorithm. The biomarkers identified in the acute phase of SARS seemed to remain throughout the convalescent phase of the disease because when we applied the identical tree classification to samples from patients in whom onset of fever had been >2, 3, 4, and >5 weeks previously, we could detect SARS with sensitivities and specificities reaching 89.2% and 91.8%, 86.0% and 91.8%, 93.1% and 91.8%, and 79.5% and 91.8%, respectively (data not shown). One intriguing observation was that SARS patients clustering in terminal node 3 all demonstrated moderate clinical features, whereas those in node 5 were severe cases. We are investigating the correlation between this proteomic pattern and the pathology of SARS.
These results represent, to the best of our knowledge, the most accurate laboratory technique for early detection of SARS: PCR-based assays have a maximum sensitivity of 80% when used to test nasopharyngeal aspirates or plasma specimens (29)(30). The proteomic method described here also has advantages over PCR-based assays in that it does not require BSL-3 containment and it can detect SARS in serum samples. This is a critical alternative to PCR-based tests, which are challenged by low viral loads in nasopharyngeal aspirates and throat swab specimens in the acute phase of SARS.
Instead of traditional chromatographic fractionation of samples, we directly spotted the crude serum on the WCX chips. By doing this we avoided the unnecessarily biased depletion of thousands of proteins and/or peptides associated with human serum albumin before MS analysis. Processing of samples and generation of the diagnostic mass spectra by our method required only a small amount of serum (20 μL vs several milliliters needed for PCR methods) and took <3 h. High-throughput proteomic screening for SARS in a 96-well format is also feasible.
We adhered to the WHO case definition and eligibility criteria for SARS and avoided using samples from non-SARS controls from hospitals where SARS patients had been admitted because these persons might have a history of close contact with SARS patients or had been inside those SARS hospitals. We further emphasized this point by sampling control sera from a nonepidemic region of the country. Although the possibility might exist that the difference in serum fingerprints would reflect differences among SARS and non-SARS hospitals, the fact that all SARS cases from 38 different hospitals fit into the single classification algorithm would likely rule out such a concern. More importantly, severe and mild cases of SARS from different hospitals, which had been completely randomized in the experimental analysis, fell into distinct nodes of the tree classification, strongly indicating that the biomarkers we have identified were specific to SARS and not the sites at which blood samples were collected. We further minimized the potential sampling bias by simultaneously using four biomarkers instead of one (e.g., m/z 3939.08), which nevertheless could sufficiently delineate SARS from non-SARS (sensitivity = 93.7%; specificity = 91.8%; data not shown). All SARS and non-SARS samples were from patients with the same ethnic background. SARS and non-SARS control sera collected at different times were all freshly aliquoted and properly stored at −80 °C.
The differential protein pattern that discriminates between SARS and non-SARS is independent of protein identities. The origins and full identities of the discriminating biomarkers are under investigation. Knowing their identities is not absolutely required for differential diagnosis, as shown by numerous studies on the diagnosis of cancers by SELDI methods. However, characterizing these peaks would certainly help in understanding the biological roles of these peptides/proteins and could potentially lead to the discovery of more direct diagnostic tools and novel therapeutic targets for SARS-CoV.
Acknowledgments
We thank Drs. C. Stohr and F.C. He for helpful discussions and comments and Drs. T. Yip, E. Fung, L. Ma, W. Zhang, and F. Zhang for their communication of results and assistance in statistical analyses. We thank the SARS Control Scientific Committee of the Ministry of Science and Technology of China (MOST), the National Institute for the Control of Pharmaceutical and Biological Products (NICPBP), and the Beijing SARS-Control Working Group for their encouragement and technical assistance. This work was supported by an Outstanding Young Investigators Fellowship of National Natural Science Foundation of China (NSFC30025010) and the “973 Plan” of MOST (2002CB513000 and 2003CB514116) to H.T.
Footnotes
Nonstandard abbreviations: SARS, severe acute respiratory syndrome; CoV, coronavirus; SELDI-TOF MS, surface-enhanced laser desorption/ionization time-of-flight mass spectrometry; and PBS, Protein Biological System.
References
- 1. WHO. Cumulative number of reported probable cases of SARS. 2003. http://www.who.int/csr/sars/country/en/country2003_08_15.pdf (accessed August 2003).
- 2. Donnelly CA, Ghani AC, Leung GM, Hedley AJ, Fraser C, Riley S, et al. Epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in Hong Kong. Lancet 2003;361:1761-1766.
- 3. Drosten C, Gunther S, Preiser W, van der Werf S, Brodt HR, Becker S, et al. Identification of a novel coronavirus in patients with severe acute respiratory syndrome. N Engl J Med 2003;348:1967-1976.
- 4. Ksiazek TG, Erdman D, Goldsmith CS, Zaki SR, Peret T, Emery S, et al. A novel coronavirus associated with severe acute respiratory syndrome. N Engl J Med 2003;348:1953-1966.
- 5. Poutanen SM, Low DE, Henry B, Finkelstein S, Rose D, Green K, et al. Identification of severe acute respiratory syndrome in Canada. N Engl J Med 2003;348:1995-2005.
- 6. Peiris JS, Lai ST, Poon LL, Guan Y, Yam LY, Lim W, et al. Coronavirus as a possible cause of severe acute respiratory syndrome. Lancet 2003;361:1319-1325.
- 7. Kuiken T, Fouchier RA, Schutten M, Rimmelzwaan GF, van Amerongen G, van Riel D, et al. Newly discovered coronavirus as the primary cause of severe acute respiratory syndrome. Lancet 2003;362:263-270.
- 8. Fouchier RA, Kuiken T, Schutten M, van Amerongen G, van Doornum GJ, van den Hoogen BG, et al. Aetiology: Koch’s postulates fulfilled for SARS virus. Nature 2003;423:240.
- 9. Rota PA, Oberste MS, Monroe SS, Nix WA, Campagnoli R, Icenogle JP, et al. Characterization of a novel coronavirus associated with severe acute respiratory syndrome. Science 2003;300:1394-1399.
- 10. Marra MA, Jones SJ, Astell CR, Holt RA, Brooks-Wilson A, Butterfield YS, et al. The genome sequence of the SARS-associated coronavirus. Science 2003;300:1399-1404.
- 11. Ruan YJ, Wei CL, Ee AL, Vega VB, Thoreau H, Su ST, et al. Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection. Lancet 2003;361:1779-1785.
- 12. Stadler K, Masignani V, Eickmann M, Becker S, Abrignani S, Klenk HD, et al. SARS—beginning to understand a new virus. Nat Rev Microbiol 2003;1:209-218.
- 13. Li W, Moore MJ, Vasilieva N, Sui J, Wong SK, Berne MA, et al. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature 2003;426:450-454.
- 14. CDC. Updated interim U.S. case definition of severe acute respiratory syndrome (SARS). 2003. http://www.cdc.gov/ncidod/sars/casedefinition.htm (accessed August 2003).
- 15. WHO. Case definitions for surveillance of severe acute respiratory syndrome (SARS). 2003. http://www.who.int/csr/sars/casedefinition/en/ (accessed August 2003).
- 16. Hon KL, Leung CW, Cheng WT, Chan PK, Chu WC, Kwan YW, et al. Clinical presentations and outcome of severe acute respiratory syndrome in children. Lancet 2003;361:1701-1703.
- 17. Peiris JS, Chu CM, Cheng VC, Chan KS, Hung IF, Poon LL, et al. Clinical progression and viral load in a community outbreak of coronavirus-associated SARS pneumonia: a prospective study. Lancet 2003;361:1767-1772.
- 18. Ng EKO, Hui DS, Chan KC, Hung E, Chiu R, Lee N, et al. Quantitative analysis and prognostic implication of SARS coronavirus RNA in the plasma and serum of patients with severe acute respiratory syndrome. Clin Chem 2003;49:1976-1980.
- 19. Poon LL, Wong OK, Luk W, Yuen KY, Peiris JS, Guan Y. Rapid diagnosis of a coronavirus associated with severe acute respiratory syndrome (SARS). Clin Chem 2003;49:953-955.
- 20. WHO. Laboratory confirmation of a SARS case in southern China. http://www.who.int/csr/don/2004_01_05/en/ (accessed January 2004).
- 21. Hanash S. Disease proteomics. Nature 2003;422:226-232.
- 22. Boguski MS, McIntosh MW. Biomedical informatics for proteomics. Nature 2003;422:233-237.
- 23. Petricoin EF, Zoon KC, Kohn EC, Barrett JC, Liotta LA. Clinical proteomics translating benchside promise to bedside reality. Nat Rev Drug Discov 2002;1:683-695.
- 24. Wright GL, Jr. SELDI proteinchip MS: a platform for biomarker discovery and cancer diagnosis. Expert Rev Mol Diagn 2002;2:549-563.
- 25. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002;359:572-577.
- 26. Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH, et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res 2002;62:3609-3614.
- 27. WHO. WHO biosafety guidelines for handling of SARS specimens. 2003. http://www.who.int/csr/sars/biosafety2003_04_25/en/ (accessed August 2003).
- 28. Wulfkuhle JD, Liotta LA, Petricoin EF. Proteomic applications for the early detection of cancer. Nat Rev Cancer 2003;3:267-275.
- 29. Grant PR, Garson JA, Tedder RS, Chan PK, Tam JS, Sung JJ. Detection of SARS coronavirus in plasma by real-time RT-PCR. N Engl J Med 2003;349:2468-2469.
- 30. Poon LL, Chan KH, Peiris JS. Crouching tiger, hidden dragon: the laboratory diagnosis of severe acute respiratory syndrome. Clin Infect Dis 2004;38:297-299.
