Abstract
Objective:
To better understand the pathogenesis of knee osteoarthritis (OA) through identification of serum diagnostics.
Design:
We conducted multiple reaction monitoring mass spectrometry analysis of 107 peptides in baseline sera of two cohorts: the Foundation for NIH (n=596 Kellgren-Lawrence (KL) grade 1–3 knee OA participants); and the Johnston County Osteoarthritis Project (n=127 multi-joint controls free of radiographic OA of the hands, hips, knees (bilateral KL=0), and spine). Data were split into training (70%) and testing (30%) sets. Diagnostic peptide and clinical predictors were selected by random forest (RF), with final selection based on association (p<0.05) with OA status in multivariable logistic regression models. Model performance was assessed by the area under the curve (AUC) of receiver operating characteristic (ROC) and precision-recall (PR) curves.
Results:
RF selected 23 peptides (19 proteins) and BMI as diagnostic of OA. BMI weakly diagnosed OA (ROC-AUC 0.57, PR-AUC 0.812) and was diagnostic only of symptomatic OA cases. ACTG was the strongest univariable predictor (ROC-AUC 0.705, PR-AUC 0.897). The final model (8 serum peptides) was highly diagnostic in the testing set (ROC-AUC 0.833, 95% CI 0.751, 0.905; PR-AUC 0.929, 95% CI 0.876, 0.973), was equally diagnostic of non-symptomatic and symptomatic cases (ROC-AUCs 0.830–0.835), and was not significantly improved by the addition of BMI. The STRING database predicted multiple high-confidence interactions among the 19 diagnostic OA proteins.
Conclusions:
No more than 8 serum protein biomarkers were required to discriminate knee OA from non-OA. These biomarkers lend strong support to the involvement and cross-talk of complement and coagulation pathways in the development of OA.
Keywords: osteoarthritis, diagnostic, serum, knee, coagulation, complement, marker, mass spectrometry, proteomics
Introduction
Currently, OA is diagnosed on the basis of symptoms, physical examination, and radiographic evidence, but these methods are reactive, not predictive1, and lack sensitivity. For example, knee pain has only 23% sensitivity and 88% specificity for a diagnosis of radiographic OA2. Moreover, in what has been termed the “OA iceberg”3, OA symptoms may manifest before imaging abnormalities and are therefore difficult to interpret; for example, based on a systematic literature review, 25–75% of painful knees could not be diagnosed as OA by radiography4.
Therefore, a search for reliable diagnostic biomarkers of OA is underway. There have been previous efforts to identify diagnostic systemic (serum, plasma, or urine) and synovial fluid biomarkers of OA1, 5, but even the most prominent candidate, urinary C-terminal telopeptide of collagen type II (uCTXII), is still not sufficiently discriminating6–8. More recently, many OA diagnostic biomarkers have been proposed based on multi-marker proteomic analysis9–13. The goal of our study was to add to this body of work with a targeted mass spectrometry-based multiple reaction monitoring (MRM) analysis of multiple serum markers in a sample that is relatively large compared with most OA diagnostic studies to date. We addressed statistical challenges in the field by using precision-recall (PR) curves, and random forest (RF)14 both for unbiased variable selection and to impute data missing at random (MAR).
Methods
FNIH and JoCoOA cohorts
The Foundation for NIH (FNIH) cohort (n=600), a subset of the Osteoarthritis Initiative (OAI), comprises individuals with Kellgren-Lawrence15 (KL) grade 1–3 radiographic knee OA at baseline based on scoring using a standardized atlas16; the majority (61%) had a baseline WOMAC pain score >0. There were sufficient baseline sera to perform proteomic analyses on n=599.
The Johnston County Osteoarthritis Project (JoCoOA) cohort included a subset (n=129) of ‘multi-joint controls’, all >45 years of age, selected to be: i) free of baseline radiographic OA, based on not meeting designated OA criteria for the hand (KL≥2 in at least 3 hand joints), hip (KL≥2), knee (KL≥1), and lumbar spine (moderate anterior vertebral osteophyte and mild disc space narrowing); ii) for the majority (68.2%), confirmed by follow-up data as free of radiographic OA development over the subsequent 5–15 years, establishing them as non-incident OA controls; iii) minimally symptomatic in the hands and spine at baseline17, 18; and iv) free of knee or hip symptoms (pain, aching, or stiffness on most days) at baseline and, when available, at 5- to 15-year follow-up. The majority (76%) had a baseline WOMAC pain score of 0.
Baseline non-depleted sera (tryptic digests of 1/50th volume of neat serum) from a total of 728 individuals, n=599 (FNIH) OA cases and n=129 (JoCoOA) controls, underwent proteomic analysis by mass spectrometry (MS)-based multiple reaction monitoring (MRM) (method previously described19). The serum samples were randomly split into twelve sets; sub-aliquots of equal size from each of the samples in Set 1 were used for quality control. The average percent coefficients of variation (%CVs) of the sample pool quality control (SPQC) were 11.4–17.5% for the 12 sets. The average %CVs of the digestion quality control (DQC) samples were 10.2–17.5% for the 12 sets. For more on sample processing, see Zhou et al19.
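The %CV figures above summarize QC precision. As a minimal sketch of that arithmetic, assuming (this is not stated in the text) that each set's average %CV is the mean of per-peptide %CVs, each computed as 100 × sample SD / mean of the replicate QC measurements, in Python; the function names and toy values are hypothetical:

```python
import statistics


def percent_cv(values):
    """Percent coefficient of variation: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


def average_percent_cv(qc_by_peptide):
    """Mean of the per-peptide %CVs for one QC set.

    qc_by_peptide maps a (hypothetical) peptide label to its replicate QC
    measurements, e.g. endogenous/SIL ratios from repeated SPQC injections.
    """
    return statistics.fmean(percent_cv(v) for v in qc_by_peptide.values())


# Toy replicate QC ratios for two peptides in one set.
qc_set = {"PEPTIDE_A": [1.0, 1.2, 0.8], "PEPTIDE_B": [2.0, 2.0, 2.0]}
print(round(average_percent_cv(qc_set), 1))  # 10.0
```

PEPTIDE_A has a 20% CV and PEPTIDE_B 0%, so the set-level average is 10%.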
Statistical Analysis
The proteomic data were expressed as the ratio of the endogenous to stable isotope labelled (SIL) peptide quantity, a unitless measure; the SIL mixture was spiked volumetrically at 20 fmol of SIL per 1 μg of serum, assuming a serum protein concentration of 50 μg/μL. We computed descriptive statistics (mean, median, etc.) of these ratios and converted them to z-scores to compare values across the multiple biomarkers. To evaluate sample quality and outliers, we computed principal components (PCs) based on all biomarkers and examined the participant clustering pattern. Two outliers were identified by their clear deviation from the main clusters on a plot of the first two PCs and were removed from further analyses. Three samples (one OA, two controls) with a peptide missing rate >15% were also excluded. For the remaining samples, missing biomarker data were imputed with the chained random forest implementation in the ‘missRanger’ package20; we performed 10 multiple imputations of the missing values for FNIH and JoCoOA together. Following guidance in the literature21, 22, we included in the imputation scheme all variables that we intended to study: the 107 peptides; age, body mass index (BMI), sex, and race; and a binary variable indicating OA status.
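A minimal sketch of these preprocessing steps in Python with NumPy, under stated assumptions: ratios are held in a samples × peptides array with NaN for missing values, z-scores use the sample SD, and PCA is done by SVD on the z-score matrix with missing entries set to the column mean for screening only. The authors inspected the PC1 vs. PC2 plot visually; the distance ranking in the toy example is only a stand-in, and the missRanger imputation is not reproduced. All function names are hypothetical.

```python
import numpy as np

# Nominal spike level implied by the text: 20 fmol SIL per µg serum protein
# at an assumed 50 µg protein per µL of serum -> 1000 fmol (1 pmol) per µL.
SIL_FMOL_PER_UL_SERUM = 20 * 50


def sample_missing_rate(ratios):
    """Fraction of peptides missing (NaN) in each sample (row)."""
    return np.isnan(ratios).mean(axis=1)


def drop_high_missing(ratios, max_rate=0.15):
    """Drop samples whose peptide missing rate exceeds max_rate."""
    keep = sample_missing_rate(ratios) <= max_rate
    return ratios[keep], keep


def zscore_columns(ratios):
    """Standardize each peptide's endogenous/SIL ratios, ignoring NaNs."""
    mean = np.nanmean(ratios, axis=0)
    sd = np.nanstd(ratios, axis=0, ddof=1)
    return (ratios - mean) / sd


def first_two_pc_scores(z):
    """Sample scores on PC1 and PC2 via SVD of the centered z-score matrix.

    NaNs are set to 0 (the column mean after z-scoring) only for this
    screening step; imputation is a separate step.
    """
    filled = np.where(np.isnan(z), 0.0, z)
    centered = filled - filled.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2]


# Toy data: 50 samples x 10 peptides; row 0 has 30% missing, row 1 is shifted.
rng = np.random.default_rng(0)
X = rng.normal(1.0, 0.1, size=(50, 10))
X[0, :3] = np.nan
X[1, :] += 5.0
kept, mask = drop_high_missing(X)
scores = first_two_pc_scores(zscore_columns(kept))
print(mask.sum(), int(np.argmax(np.hypot(scores[:, 0], scores[:, 1]))))  # 49 0
```

Original row 0 is excluded for missingness, and the shifted sample (original row 1, now row 0 of `kept`) sits farthest from the origin in PC1–PC2 space.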
We annotated each peptide by its amino acid positions and protein name; for example, the two CRAC1 peptides were denoted CRAC1(101–108) and CRAC1(170–178). To select and validate the diagnostic predictors, FNIH and JoCoOA data were split into non-overlapping training (70% of data) and test (30% of data) datasets balanced by age, BMI, sex, and KL grade (controls had knee KL grade=0 bilaterally). To assess the biomarkers univariately, we performed Wilcoxon-Mann-Whitney (WMW) tests on the training data. The WMW odds (95% CIs) were calculated for each variable (107 peptides, BMI, and age); the WMW odds is the odds that a randomly selected value from an OA participant is higher than a randomly selected value from a control, so WMW-odds > 1 indicates values tending to be higher, and WMW-odds < 1 lower, in OA than in controls. We applied the Benjamini-Yekutieli (BY) adjustment to the p-values to control the false discovery rate (FDR) at level 0.05.
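The WMW odds and the BY adjustment can be sketched in Python with NumPy as below. This assumes ties are split evenly between the two directions (a common convention, assumed here) and omits the confidence intervals; the BY step follows the same algorithm as R's `p.adjust(method = "BY")`, which multiplies the Benjamini-Hochberg factor m/i by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m. Function names are illustrative.

```python
import numpy as np


def wmw_odds(oa, control):
    """WMW odds: P(OA > control) / P(OA < control), with ties split evenly."""
    x = np.asarray(oa, dtype=float)[:, None]
    y = np.asarray(control, dtype=float)[None, :]
    p = np.mean((x > y) + 0.5 * (x == y))  # P(X > Y) + 0.5 * P(X = Y)
    return p / (1.0 - p)


def benjamini_yekutieli(pvalues):
    """BY-adjusted p-values, mirroring R's p.adjust(method = "BY")."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))  # correction for arbitrary dependence
    order = np.argsort(p)[::-1]  # largest p-value first
    ranks = np.arange(m, 0, -1)  # rank of each p-value in that descending order
    adj = np.minimum.accumulate(p[order] * m * c_m / ranks)
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out


print(wmw_odds([2, 3], [1, 2]))  # 7.0
print(benjamini_yekutieli([0.01, 0.02, 0.03, 0.04]))
```

In the toy call, 3.5 of the 4 OA/control pairs favour OA (one tie counted as half), giving odds 0.875/0.125 = 7.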
We implemented a three-step process for variable selection. First, each of the 10 multiply imputed training datasets underwent 100 repetitions of RF, for a total of 1000 repetitions; a variable was considered important if it was selected at least 90% of the time (900 or more of the 1000 repetitions). Second, the selected peptides were pruned by retaining only one peptide from each highly correlated peptide cluster (rs >0.8). Third, the remaining uncorrelated peptides and the selected clinical variables were entered into a multivariable logistic regression model with OA status as the outcome; the peptides meeting p<0.05 constituted the final parsimonious set of biomarkers.
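The selection-frequency rule and the correlation pruning can be sketched as follows. This is a hypothetical illustration, not the authors' code: the RF fits themselves are not reproduced (only their selection counts are consumed), and the pruning is assumed to be greedy, keeping the peptide with the highest RF importance from each group whose Spearman rs exceeds 0.8; the text also weighs WMW results, which this sketch ignores.

```python
import numpy as np


def stable_variables(selection_counts, min_count=900):
    """Variables selected in at least min_count of the 1000 RF repetitions."""
    return [name for name, count in selection_counts.items() if count >= min_count]


def average_ranks(x):
    """Ranks 1..n with tied values given their average rank (as for Spearman)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(x.size)
    i = 0
    while i < x.size:
        j = i
        while j + 1 < x.size and x[order[j + 1]] == x[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def prune_correlated(X, names, importance, threshold=0.8):
    """Greedily keep the most important peptide from each correlated group.

    Peptides are visited in decreasing importance; one is dropped if its
    Spearman rs with an already-kept peptide exceeds threshold.
    """
    ranked = np.column_stack([average_ranks(X[:, j]) for j in range(X.shape[1])])
    rs = np.corrcoef(ranked, rowvar=False)  # Pearson on ranks = Spearman rs
    col = {name: j for j, name in enumerate(names)}
    kept = []
    for name in sorted(names, key=lambda n: -importance[n]):
        if all(rs[col[name], col[k]] <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: B is a near-copy of A; C is independent.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
X = np.column_stack([a, a + rng.normal(scale=0.05, size=200), rng.normal(size=200)])
print(stable_variables({"pep1": 1000, "pep2": 950, "age": 412}))  # ['pep1', 'pep2']
print(prune_correlated(X, ["A", "B", "C"], {"A": 0.95, "B": 1.0, "C": 0.90}))  # ['B', 'C']
```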
Model performance was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and of the precision-recall (PR) curve; the PR curve can be more informative than the ROC curve when case and control sample sizes are imbalanced, as in this study23. The reference value for PR-AUC was 185/(185+45)=0.80, the number of OA cases divided by the total sample size of the test data, whereas the reference value for ROC-AUC is 0.5. The logistic regression results from the imputed datasets were combined using Rubin’s rules24 and evaluated on the testing data by applying the final coefficients from the training set. Confidence intervals for the PR-AUC and ROC-AUC were obtained by bootstrapping using the percentile method. For each model, sensitivity and specificity were calculated in the testing set using the probability threshold that maximized Youden’s J statistic in the training set. Sensitivity analyses evaluated performance of the models in the non-symptomatic (baseline WOMAC pain=0) and symptomatic (baseline WOMAC pain>0) FNIH OA participants in the testing set.
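The pooling and interval steps can be sketched as follows. This assumes the per-imputation variances are squared standard errors from each imputed-data fit, and assumes simple (unstratified) resampling of subjects for the bootstrap; the text does not state its resampling scheme, so this is illustrative only. The function names are hypothetical.

```python
import numpy as np


def rubin_pool(estimates, variances):
    """Pool per-imputation coefficients and squared SEs with Rubin's rules.

    estimates, variances: arrays of shape (m imputations, p coefficients).
    Returns the pooled coefficients and their total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.shape[0]
    q_bar = q.mean(axis=0)      # pooled point estimate
    w = u.mean(axis=0)          # within-imputation variance
    b = q.var(axis=0, ddof=1)   # between-imputation variance
    return q_bar, w + (1.0 + 1.0 / m) * b


def roc_auc(y, score):
    """ROC-AUC as the probability a case outscores a control (ties count 1/2)."""
    y = np.asarray(y, dtype=bool)
    s = np.asarray(score, dtype=float)
    cases, controls = s[y][:, None], s[~y][None, :]
    return np.mean((cases > controls) + 0.5 * (cases == controls))


def percentile_bootstrap_ci(y, score, stat=roc_auc, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-method CI from resampling subjects with replacement."""
    rng = np.random.default_rng(seed)
    y, score = np.asarray(y), np.asarray(score)
    draws = []
    while len(draws) < n_boot:
        idx = rng.integers(0, y.size, y.size)
        if y[idx].min() == y[idx].max():
            continue  # resample holds only one class; AUC undefined
        draws.append(stat(y[idx], score[idx]))
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Because the bootstrap helper takes the statistic as an argument, the same function can produce a PR-AUC interval by passing a PR-AUC estimator instead of `roc_auc`.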
All statistical analyses were conducted using R version 4.2.2. The WMW test results were used in Ingenuity Pathway Analysis (IPA, Qiagen) to create network diagrams and identify pathways with which the diagnostic proteins most aligned. STRING (version 11.5) database25 protein-protein interaction network analyses were conducted to elucidate first order interactions of the identified OA diagnostics. STRING combined interaction scores are reported that reflect direct (physical) and indirect (functional) associations, determined experimentally and by text mining.
Results
Cohort characteristics
Compared with controls (n=127), the knee OA cases (n=596) were slightly older (median age 61 vs. 59 years), had higher BMI (median 30.2 vs. 28.2 kg/m2), and included fewer females (59% vs. 63%) (Table 1). The majority of study participants were White (79% of cases and 67% of controls); the FNIH OA cohort included 5 Asian participants and 11 of other non-White races, while JoCoOA controls were limited to White and Black participants.
Table 1:
Summary statistics of the JoCoOA and FNIH cohorts.
| Baseline Characteristics | JoCoOA Controls | FNIH OA Cases |
|---|---|---|
| Total n | 127 | 596 |
| Female n (%) | 80 (63%) | 350 (59%) |
| Age median (years) [Min, Max] | 59.0 [45.0, 95.0] | 61.0 [45.0, 79.0] |
| BMI median (kg/m2) [Min, Max] | 28.2 [20.0, 54.4] | 30.2 [18.6,46.7] |
| Race, n (%) | | |
| Other | 0 (0%) | 11 (2%) |
| White | 85 (67%) | 474 (79%) |
| Black | 42 (33%) | 108 (18%) |
| Asian | 0 (0%) | 5 (1%) |
| Knee KL grade, n (%) | | |
| 0 | 127 (100%) | 0 (0%) |
| 1 | 0 (0%) | 75 (12.6%) |
| 2 | 0 (0%) | 303 (50.8%) |
| 3 | 0 (0%) | 218 (36.6%) |
| WOMAC pain mean (SD) | 1 (2.5)* | 12.0 (15.5)* |
KL: Kellgren Lawrence grade knee OA
WOMAC: Western Ontario and McMaster Universities Osteoarthritis Index pain score normalized on a 0–100 scale
*The majority (76%) of JoCoOA controls had baseline WOMAC pain scores of 0; a total of 231 (39%) of FNIH cases had baseline WOMAC pain scores of 0.
The random split yielded well-balanced training and testing sets that did not differ significantly from the overall combined cohort shown in Table 1 (data not shown).
Data Exclusion
Six samples were excluded from analysis based on the following: 1) one OA sample was exhausted in laboratory preparation; 2) two OA cases were designated as outliers based on the clusters of PC1 vs. PC2; and 3) three samples (one OA, two controls) had peptide missing rate >15%. These resulted in 596 FNIH and 127 control study participants in the final analysis dataset. The final missing rate for each peptide ranged from 0.1% to 5.1% and there was one missing BMI value (0.1%).
WMW results and diagnostic variable selection
The WMW analysis revealed 65 peptides with distributional differences between OA and controls (p<0.05, Table S1), of which 41 remained significant after FDR adjustment (FDR p<0.05). The medians of most peptides (101/107, 94%) were lower in OA than in controls (WMW-odds <1).
Of the 41 peptides with significantly different serum concentrations in cases vs controls, ACTG(96–113) was the top predictor of OA status (p=1.9×10⁻¹³), with WMW-odds of 3.1: the probability that a randomly chosen OA individual had a higher ACTG(96–113) value than a randomly chosen control was 3.1 times the probability that it was lower. We therefore also modelled ACTG(96–113) on its own, in addition to the other models described below.
Among all the variables (107 peptides, age, BMI, sex, knee KL grade), RF selected 24 variables as important diagnostic indicators at least 90% of the time (Table 2): 23 peptides corresponding to 19 proteins (5 higher and 18 lower in OA, based on medians) (Table S2) and BMI (higher in OA). Assessment of correlation among the 23 selected peptides (Figure S1) identified 17 uncorrelated representative peptides, chosen based on their RF variable importance scores and WMW results.
Table 2.
Diagnostic variables (n=24; 23 peptides and BMI*) selected by random forest (RF).
| Peptide | Protein (Gene) Name | RF importance score | WMW P value | WMW odds (95% CI) | Function | Peptide Sequence | Accession Number |
|---|---|---|---|---|---|---|---|
| ACTG(96–113) | Actin, cytoplasmic 2 (ACTG1) | 1 | 1.85e-13 | 3.12 (2.28,4.23) | Cell motility | VAPEEHPVLLTEAPLNPK | P63261–1 |
| HEMO(198–201) | Hemopexin (HPX) | 1 | 3.02e-10 | 0.39 (0.29,0.53) | Acute phase protein, transports heme to liver for breakdown | YYCFQGNQFLR | P02790–1 |
| KNG1(317–324) | Kininogen-1 (KNG1) | 1 | 3.33e-10 | 0.39 (0.29,0.53) | Coagulation factor and regulator of Insulin-like Growth Factor | YFIDFVAR | P01042–1 |
| CO5(767–782) | Complement C5 (C5) | 1 | 2.02e-09 | 0.41 (0.31,0.55) | Complement component | ALEQDLPVNIK | P01031–1 |
| C1R(229–235) | Complement C1r subcomponent (C1R) | 1 | 4.28e-09 | 0.42 (0.31,0.56) | Serine protease, part of the first component of the classical pathway of the complement system | GLTLHLK | P00736–1 |
| A1BG(262–280) | Alpha-1B-glycoprotein (A1BG) | 1 | 9.43e-09 | 0.49 (0.37,0.65) | Novel member of immunoglobulin superfamily | IFFHLNAVALGDGGHYTCR | P04217–1 |
| CD14(94–111) | Monocyte differentiation antigen CD14 (CD14) | 1 | 6.67e-07 | 0.48 (0.36,0.65) | Coreceptor for bacterial lipopolysaccharide | LTVGAAQVPAQLLVGALR | P08571–1 |
| HPT(392–401) | Haptoglobin (HP) | 1 | 3.70e-05 | 0.55 (0.42,0.73) | Antioxidant, has antibacterial activity, and plays a role in modulating many aspects of the acute phase response | VTSIQDWVQK | P00738–1 |
| HRG(188–201) | Histidine-rich glycoprotein (HRG) | 1 | 0.0002 | 0.59 (0.44,0.78) | Regulates many processes such as immune complex and pathogen clearance, cell chemotaxis, cell adhesion, angiogenesis, coagulation and fibrinolysis | GGEGTGYFVDFSVR | P04196–1 |
| CNDP1(182–192) | Beta-Ala-His dipeptidase (CNDP1) | 1 | 0.0009 | 1.60 (1.21,2.11) | Catalyzes the peptide bond hydrolysis in Xaa-His dipeptides | ALEQDLPVNIK | Q96KN2–1 |
| CRAC1(170–178) | Cartilage acidic protein 1 (CRTAC1) | 1 | 0.02 | 1.39 (1.05,1.82) | Glycosylated extracellular matrix protein | GVASLFAGR | Q9NQ79–1 |
| KNG1(479–496) | Kininogen-1 (KNG1) | 1 | 0.09 | 0.79 (0.60,1.04) | See above | LDDDLEHQGGHVLDHGHK | P01042–1 |
| HPT(216–227) | HP | 0.999 | 2.31e-05 | 0.54 (0.41,0.72) | See above | DIAPTLTLYVGK | P00738–1 |
| HEMO(92–102) | HPX | 0.999 | 0.0002 | 0.58 (0.44,0.77) | See above | NFPSPVDAAFR | P02790–1 |
| IC1(287–298) | Plasma protease C1 inhibitor (SERPING1) | 0.998 | 4.99e-06 | 0.52 (0.39,0.69) | Plays crucial role in regulating important physiological pathways including complement activation, blood coagulation, fibrinolysis and the generation of kinins | LVLLNAIYLSAK | P05155–1 |
| CRAC1(101–108) | Cartilage acidic protein 1 (CRTAC1) | 0.998 | 0.10 | 1.26 (0.96,1.66) | Glycosylated extracellular matrix protein | SSPYYALR | Q9NQ79–1 |
| MASP1(194–208) | Mannan-binding lectin serine protease 1 (MASP1) | 0.983 | 3.00e-05 | 0.55 (0.41,0.73) | Functions in lectin pathway of complement, which performs a key role in innate immunity by recognizing pathogens through patterns of sugar moieties and neutralizing them | TGVITSPDFPNPYPK | P48740–1 |
| CFAI(346–358) | Complement factor I (CFI) | 0.964 | 3.08e-06 | 0.51 (0.38,0.68) | Trypsin-like serine protease that plays an essential role in regulating the immune response by controlling all complement pathways | AQLGDLPWQVAIK | P05156–1 |
| VTDB(95–114) | Vitamin D-binding protein (GC) | 0.955 | 0.007 | 0.68 (0.52,0.90) | Vitamin D transport and storage, scavenging of extracellular G-actin, enhancement of the chemotactic activity of C5 alpha for neutrophils in inflammation and macrophage activation | SCESNSPFPVHPGTAECCTK | P02774–1 |
| FA5(1506–1517) | Coagulation factor V (F5) | 0.943 | 3.54e-06 | 0.51 (0.38,0.68) | Central regulator of hemostasis; serves as a critical cofactor for the prothrombinase activity of factor Xa that results in activation of prothrombin to thrombin | EFNPLVIVGLSK | P12259–1 |
| HABP2(123–135)* | Hyaluronan-binding protein 2 (HABP2) | 0.937 | 0.65 | 0.94 (0.71,1.23) | Extracellular serine protease that binds hyaluronic acid, involved in extrinsic pathway of blood coagulation, activates urinary plasminogen activator and coagulation factor VII | GQCLITQSPPYYR | Q14520–1 |
| VTDB(208–218) | GC | 0.934 | 5.47e-06 | 0.52 (0.39,0.70) | See above | HLSLLTTLSNR | P02774–1 |
| AMBP(298–309) | Alpha-1-microglobulin (AMBP) | 0.906 | 5.06e-06 | 0.52 (0.39,0.69) | Antioxidant and tissue repair protein | AFIQLWAFDAVK | P02760–1 |
*Results for BMI: RF importance score 0.995; WMW p value 8.3e-06; WMW odds (95% CI) 1.91 (1.44, 2.54). Amino acid numbering is per UniProt/Swiss-Prot (database searched 4.8.21). The table is ordered by RF score, then WMW p value. WMW odds: >1 indicates a higher probability that a randomly chosen OA value will be greater than a randomly chosen control value; <1 indicates a higher probability that a randomly chosen OA value will be less than a randomly chosen control value. The finally selected 17 uncorrelated peptides are underlined, and the finally selected 8 significant peptides are in bold.
OA diagnostic models
For all models (Table 3), PR-AUCs in the full training and testing sets were uniformly higher than standard ROC-AUCs (consistently >0.8). The clinical models (BMI alone, and BMI with age and sex) were the weakest predictors (testing set ROC-AUCs 0.570 and 0.593). Neither age nor sex was selected by RF, nor did they add significantly to the prediction provided by BMI alone. For these reasons, we tested the discriminative capacity of the proteomic biomarkers with and without BMI.
Table 3.
Test set results for clinical and proteomic variables as OA diagnostics.
| Model | Discovery (training set) ROC-AUC (95% CI) | Discovery (training set) PR-AUC (95% CI) | Validation (testing set) ROC-AUC (95% CI) | Validation (testing set) PR-AUC (95% CI) | Sensitivity analysis, FNIH WOMAC pain=0 subset (testing set) ROC-AUC (95% CI) | Sensitivity analysis, FNIH WOMAC pain=0 subset (testing set) PR-AUC (95% CI) | Sensitivity analysis, FNIH WOMAC pain>0 subset (testing set) ROC-AUC (95% CI) | Sensitivity analysis, FNIH WOMAC pain>0 subset (testing set) PR-AUC (95% CI) |
|---|---|---|---|---|---|---|---|---|
| BMI | 0.657 (0.588, 0.721) | 0.886 (0.848, 0.926) | 0.570 (0.473, 0.665) | 0.812 (0.747, 0.882) | 0.460 (0.348, 0.58) | 0.547 (0.442, 0.678) | 0.629 (0.524, 0.73) | 0.767 (0.680, 0.864) |
| BMI, age, sex | 0.689 (0.62, 0.753) | 0.893 (0.856, 0.932) | 0.593 (0.497, 0.687) | 0.823 (0.757, 0.893) | 0.515 (0.405, 0.629) | 0.582 (0.472, 0.725) | 0.636 (0.533, 0.732) | 0.773 (0.687, 0.868) |
| ACTG | 0.757 (0.697, 0.811) | 0.934 (0.906, 0.956) | 0.705 (0.612, 0.788) | 0.897 (0.844, 0.941) | 0.697 (0.591, 0.793) | 0.750 (0.633, 0.855) | 0.709 (0.617, 0.798) | 0.854 (0.780, 0.917) |
| ACTG + BMI | 0.776 (0.719, 0.825) | 0.942 (0.92, 0.96) | 0.703 (0.608, 0.787) | 0.884 (0.828, 0.936) | 0.650 (0.537, 0.754) | 0.688 (0.563, 0.813) | 0.733 (0.635, 0.822) | 0.848 (0.766, 0.919) |
| 23 RF peptides | 0.929 (0.9, 0.955) | 0.984 (0.977, 0.991) | 0.848 (0.766, 0.916) | 0.936 (0.884, 0.979) | 0.837 (0.751, 0.913) | 0.834 (0.717, 0.938) | 0.854 (0.771, 0.922) | 0.908 (0.833, 0.968) |
| 23 RF peptides + BMI | 0.938 (0.913, 0.961) | 0.987 (0.98, 0.992) | 0.853 (0.767, 0.923) | 0.929 (0.874, 0.979) | 0.832 (0.743, 0.913) | 0.809 (0.694, 0.932) | 0.864 (0.778, 0.932) | 0.903 (0.825, 0.972) |
| 17 uncorrelated peptides | 0.919 (0.884, 0.949) | 0.981 (0.97, 0.99) | 0.841 (0.757, 0.913) | 0.932 (0.878, 0.976) | 0.834 (0.744, 0.913) | 0.826 (0.708, 0.937) | 0.846 (0.761, 0.918) | 0.901 (0.823, 0.965) |
| 17 uncorrelated peptides + BMI | 0.925 (0.89, 0.953) | 0.982 (0.972, 0.99) | 0.845 (0.758, 0.918) | 0.926 (0.871, 0.976) | 0.829 (0.74, 0.911) | 0.807 (0.691, 0.93) | 0.854 (0.766, 0.927) | 0.897 (0.818, 0.968) |
| 8 significant peptides | 0.907 (0.871, 0.938) | 0.978 (0.967, 0.987) | 0.833 (0.751, 0.905) | 0.929 (0.876, 0.973) | 0.830 (0.738, 0.911) | 0.819 (0.700, 0.933) | 0.835 (0.749, 0.907) | 0.898 (0.819, 0.962) |
| 8 significant peptides + BMI | 0.916 (0.881, 0.945) | 0.981 (0.971, 0.989) | 0.842 (0.758, 0.914) | 0.925 (0.87, 0.975) | 0.824 (0.732, 0.906) | 0.802 (0.686, 0.927) | 0.853 (0.765, 0.925) | 0.897 (0.818, 0.968) |
23 peptides and BMI were selected based on a random forest (RF) importance score >0.9; 17 uncorrelated peptides were selected based on RF=1, WMW p value and uncorrelated status; 8 significant peptides were selected based on RF=1, WMW p value, uncorrelated status, and association with OA status in multivariable logistic regression; for the list of these sets of peptides see Table 2. 231 (39%) FNIH cases had baseline WOMAC pain scores of 0.
All models with peptides performed well as diagnostics of knee OA based on ROC-AUCs or PR-AUCs. ACTG(96–113), the top marker in univariable analyses, alone yielded ROC-AUC and PR-AUC of 0.705 and 0.897, respectively; these AUCs were significantly better than predictions with BMI alone or BMI combined with age and sex. All models with combinatorial peptide biomarkers (23, 17 or 8) performed significantly better as diagnostics of knee OA than ACTG(96–113) alone, with ROC-AUCs ranging from 0.833 to 0.848 and PR-AUCs ranging from 0.929 to 0.936, compared with 0.705 and 0.897 for ACTG(96–113) (Table 3). The addition of BMI to peptide-only models did not improve the prediction. The final parsimonious set of 8 peptides also performed as well as the models using 23 or 17 peptides (ROC-AUCs: 0.833 vs. 0.848 and 0.841). The forest plot derived from multivariable logistic regression of these 8 final significant peptides (Figure 1) shows that half were positive predictors of OA status and half were negative predictors (i.e., higher concentrations predicted lower odds of knee OA).
Figure 1. Forest plot of 8 significant OA diagnostic peptides.

Multivariable model results for the final 8 peptides selected by random forest as diagnostic predictors of knee OA. Odds of OA with 95% CIs are depicted, and p values are shown. aa=amino acid.
Sensitivity analyses demonstrated that the proteomic biomarkers were equally effective for diagnosing non-symptomatic and symptomatic OA cases (baseline WOMAC pain=0 and baseline WOMAC pain>0 OA FNIH study participants, respectively), whereas BMI was only diagnostic for symptomatic OA cases (Table 3).
Diagnostic indices of the models
For the 8 significant peptide model, the probability threshold that maximized Youden’s J statistic in the training dataset was 0.81; applied to the testing dataset, it yielded sensitivity 78% and specificity 78% (Table 4). Although these results are based on the first of the ten imputed datasets, there were so few missing values that the results are expected to be approximately the same across all 10 multiply imputed datasets. The highest sensitivity (94%) was provided by the combination of the 8 significant peptides plus BMI, but at the cost of specificity (69% vs. 78% for 8 peptides alone). The model of 23 RF peptides yielded the best balance of sensitivity (81%) and specificity (82%).
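Youden's J is sensitivity + specificity minus 1, and the Table 4 thresholds are cut-offs on predicted OA probability chosen to maximize J in the training data and then reused unchanged in the testing data. A minimal sketch of that procedure, assuming that predicted probabilities at or above the threshold are classified as OA; names and toy values are hypothetical:

```python
import numpy as np


def youden_threshold(y_train, p_train):
    """Cut-off on predicted probability maximizing J = sens + spec - 1."""
    y = np.asarray(y_train, dtype=bool)
    p = np.asarray(p_train, dtype=float)
    best_t, best_j = None, -np.inf
    for t in np.unique(p):  # candidate cut-offs: each observed probability
        pred = p >= t
        j = np.mean(pred[y]) + np.mean(~pred[~y]) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def sens_spec(y_test, p_test, threshold):
    """Sensitivity and specificity at a fixed (training-derived) threshold."""
    y = np.asarray(y_test, dtype=bool)
    pred = np.asarray(p_test, dtype=float) >= threshold
    return np.mean(pred[y]), np.mean(~pred[~y])


t, j = youden_threshold([0, 0, 1, 1, 1], [0.1, 0.4, 0.35, 0.8, 0.9])
print(t, round(j, 3))  # 0.8 0.667
print(sens_spec([1, 1, 0, 0], [0.85, 0.5, 0.2, 0.9], t))
```

Fixing the threshold on the training set, rather than re-optimizing it on the test set, keeps the reported test-set sensitivity and specificity free of optimistic bias.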
Table 4.
Youden Indices of biomarker sets for the diagnosis of OA.
| Model | Threshold based on optimal Youden’s index in training dataset | Sensitivity in testing dataset using training dataset threshold (95% CI) | Specificity in testing dataset using training dataset threshold (95% CI) |
|---|---|---|---|
| BMI | 0.82 | 0.52 (0.46,0.59) | 0.60 (0.54,0.66) |
| BMI, age, sex | 0.85 | 0.42 (0.36,0.49) | 0.82 (0.77,0.87) |
| ACTG | 0.8 | 0.71 (0.66,0.77) | 0.62 (0.56,0.68) |
| ACTG + BMI | 0.73 | 0.71 (0.65,0.77) | 0.67 (0.61,0.73) |
| 23 RF peptides | 0.86 | 0.81 (0.75,0.86) | 0.82 (0.77,0.87) |
| 23 RF peptides + BMI | 0.81 | 0.86 (0.81,0.9) | 0.76 (0.7,0.81) |
| 17 uncorrelated peptides | 0.83 | 0.77 (0.72,0.83) | 0.84 (0.8,0.89) |
| 17 uncorrelated peptides + BMI | 0.79 | 0.88 (0.83,0.92) | 0.73 (0.68,0.79) |
| 8 significant peptides | 0.81 | 0.78 (0.73,0.84) | 0.78 (0.72,0.83) |
| 8 significant peptides + BMI | 0.9 | 0.94 (0.91,0.97) | 0.69 (0.63,0.75) |
Network interactions
Five proteins of the complement system, an important part of the innate immune system26, 27 implicated in the development of OA28, were identified as OA diagnostics in this study: C1R (C1R), IC1 (SERPING1), CFAI (CFI), CO5 (C5), and MASP1 (MASP1) (Figure S2A); all were lower in OA than in controls, consistent with complement consumption in the context of OA inflammation. Three proteins related to coagulation, a pathway linked to the pathogenesis of OA, were also identified as OA diagnostics in this study: KNG1 and FA5 (both lower in OA), and HABP2 (higher in OA; also called factor VII activating protease (FSAP)) (Figure S2B). The KNG1 and FA5 results are consistent with coagulation pathway activation and factor consumption in the context of inflammation; the HABP2 results are consistent with its upregulation by inflammatory mediators, such as low-molecular-weight hyaluronan fragments, and its potential role as an acute phase reactant25. STRING analyses revealed a major interaction network among all the RF-selected proteins except CRAC1 and CNDP1 (Figure S2C), with interaction scores ranging from 0.403 (CFI / GC interaction) to 0.999 (C1R / SERPING1 interaction), reflecting medium to highest confidence (Table S7). STRING analyses also demonstrated medium to high confidence of specific interactions between the complement and coagulation proteins identified as diagnostic of OA, with interaction scores ranging from 0.41 to 0.87 (highest for the SERPING1 interaction with KNG1) (Table S7). These results provide evidence for crosstalk among several biological pathways in OA, chief among them the complement and coagulation cascades.
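The confidence labels above correspond to STRING's standard minimum-score cut-offs (low 0.150, medium 0.400, high 0.700, highest 0.900). A small Python helper, hypothetical and for illustration only, maps the combined scores quoted in the text onto those labels:

```python
def string_confidence(score):
    """Label a STRING combined score (0-1) using STRING's standard cut-offs."""
    cutoffs = ((0.900, "highest"), (0.700, "high"), (0.400, "medium"), (0.150, "low"))
    for cutoff, label in cutoffs:
        if score >= cutoff:
            return label
    return "below low"


# Combined scores quoted in the text.
for pair, score in [("CFI/GC", 0.403), ("SERPING1/KNG1", 0.87), ("C1R/SERPING1", 0.999)]:
    print(pair, score, string_confidence(score))
```

The three pairs fall into the medium, high, and highest bands, respectively.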
Discussion
In this study, we used MS-based MRM analysis of serum, random forest (an ensemble of decision trees) for feature selection, and logistic regression models to evaluate the diagnostic capability of proteomic markers for OA. This culminated in a set of biomarkers that can diagnose both non-symptomatic and symptomatic radiographic OA. Inclusion of BMI in the models had minimal effect on the AUCs, suggesting that these biomarkers would be useful in differentiating the two groups even in clinical trials with BMI-matched cases and controls. Many studies have turned to proteomic strategies to identify novel biomarkers9 with the plan of future validation. Recently, 677 proteins were identified in the synovial fluid of 10 individuals with OA29; interestingly, these included all of the serum proteins that were selected by RF here. In another study, using two-dimensional difference gel electrophoresis (2D-DIGE) and MS, 66 proteins were identified as differentially expressed in healthy (n=10) vs. OA (n=20) synovial fluid30. These differentially expressed proteins were associated with three pathways (the acute phase response, the complement pathway, and the coagulation pathway30), which were also associated with the differential expression of biomarkers identified here as diagnostic of OA. In cartilage samples obtained at the time of total hip replacement for either OA (n=9) or femoral neck fracture (n=12, control), 7 proteins were identified as upregulated in OA using selected reaction monitoring (SRM) targeted proteomics and MS after controlling the false discovery rate13. Another study that targeted 35 peptides by MS-based MRM analysis of sera from 116 individuals (n=39 controls, n=77 OA cases) identified haptoglobin and von Willebrand Factor as upregulated in OA12. We add to this prior knowledge by demonstrating the diagnostic capability of a multi-marker panel of proteomic biomarkers in testing data.
Interestingly, many of the protein biomarkers selected here, including ACTG, CRAC1, C1R, CO5, HPT, HEMO, HRG and HABP2, are also predictive of incident radiographic knee OA31.
A number of the diagnostic markers identified in this study have putative roles in the pathogenesis of OA, including ACTG32 and CRAC119, 33–35, both of which were higher in OA in this and these previous studies. We previously observed expression of ACTG in our existing scRNAseq data in lesioned and non-lesioned cartilage and synovium19. Although actins have received some attention in the OA literature36, none of this work has concerned ACTG, which we found to be the most important single predictor of OA. ACTG is one isoform of actin, a highly abundant intracellular protein; severe cell injury can cause its release into the systemic circulation37. VTDB is an actin-binding protein that acts as an actin-sequestering agent in the extracellular space37. VTDB, along with gelsolin, plays a crucial role in the clearance of actin filaments from the circulation37; this is consistent with the higher ACTG but lower VTDB (VTDB(208–218)) that we observed in OA compared with controls. CRAC1 has been studied both as a diagnostic biomarker of OA and as a prognostic biomarker of OA progression. Compared with controls, CRAC1 was upregulated in the synovial fluid30 and in the secretome of cartilage38 of individuals with knee OA. Serum CRAC1 has been shown to predict incident radiographic OA31, radiographic OA progression19, 39, and knee OA-related pain progression19. In all cases, CRAC1 was higher in OA progressors, and prior to the onset of incident radiographic OA, compared with controls; this aligns with our finding that CRAC1 was higher in individuals with OA than in controls. Five proteins related to the complement pathways were selected by RF as OA diagnostics in this study: C1R, IC1, CFAI, CO5, and MASP1. Serum concentrations of all five proteins were lower in OA than in controls, and both C1R and MASP1, members of the final set of 8 diagnostic biomarkers, were associated with lower odds of OA status.
We previously found that mean serum concentrations of C1R were lower in individuals with knee OA progression than in non-progressors19. Interestingly, C1R and MASP1 are involved in initiating the classical and lectin pathways of complement activation, respectively27. C1R is a proteolytic subunit of the complement C1 complex, and MASP-1 is a homologue of C1s and C1r27. IC1 (SERPING1), an inhibitor of the C1 complex, was also lower in association with OA status. Taken together, the lower concentrations of these peptides are consistent with complement activation and net consumption of complement components in OA, a condition in which complement is known to be activated by various extracellular matrix components and their cleavage products released during OA-associated cartilage degradation26.
Three proteins related to coagulation, a pathway linked to the pathogenesis of OA, were identified as OA diagnostics in this study: KNG130, HPT10, 12, and FA5. Based on two-dimensional gel electrophoresis, KNG1 was upregulated in knee synovial fluid of individuals with OA versus asymptomatic individuals without radiographic OA30. In contrast, lower, not higher, serum concentrations of KNG1 (both measured peptides) were associated with OA in our study. HPT, which has a systemic anti-inflammatory effect40, has been shown to be upregulated in synovial fluid in association with meniscal injury10. In contrast to a prior study showing that a higher serum concentration of HPT(392–401) was associated with 1.22-times higher odds of OA compared with controls12, we observed decreased odds, i.e., higher HPT was associated with lower odds of OA. FA5 is both downstream and upstream of thrombin in the coagulation cascade (Figure S2). The peptide we identified as an OA diagnostic, FA5(1506–1517), is part of the activator/connector domain of FA5 that is lost upon proteolytic cleavage by thrombin41 (thrombin proteolytic activity is associated with OA activity). Thus, a lower serum FA5(1506–1517) concentration would be consistent with OA status, as observed in this study.
It is commonly stated that no blood test can diagnose OA; however, blood tests are routinely used in clinical settings to support or confirm diagnoses of other arthritides such as rheumatoid arthritis (RA), gout, or lupus. With these strong diagnostic biochemical markers of knee OA, we are now poised to better understand the pathogenesis of OA and potentially to help meet the medical need for OA diagnostics. With as few as 1–8 proteomic markers, a multiplexed, laboratory-derived clinical mass spectrometry test for OA would be feasible using minimal serum (a few microliters) and without immunodepletion of high-abundance proteins from the clinical sample. Clinical mass spectrometry is already in use, for example to diagnose metabolic deficiencies (typically in newborn screening), for toxicology testing, and to quantify and distinguish 25-hydroxyvitamin D2 and D3 with high accuracy, forms that cannot be distinguished by immunoassay42. Given how robustly a serum biomarker panel of as few as 1–8 peptides identified knee OA, diagnosing OA with biomarkers appears to be a somewhat easier task than predicting OA progression19 or incident OA31. This accords with the growing list of biomarkers for diagnosing OA. It may, however, be harder to differentiate OA from other arthritides, chief among them RA, because active OA, like RA, is an inflammatory arthropathy with inflammatory serum biomarkers that can identify active (progressing) joint disease19.
We aimed to optimize several technical and statistical aspects of the study design. Although proteomics has been used in previous studies to identify diagnostic biomarkers of OA, many of these studies relied on older, label-free proteomics10, 28, 30, 38. We instead used targeted MS-based MRM with SILs, which provide more accurate quantification than label-free methods43. Univariable comparisons are necessary when sample sizes are small, as they were in many of these studies10, 13, 28, 30, 38; with our larger sample size, we could instead use RF permutation variable importance scores, which are based on predictive performance, to select variables while accounting for interactions among them. Interestingly, all of our models performed as well, in terms of ROC-AUC and PR-AUC, as the model derived from the top 8 univariable peptides. Because of the unbalanced design of the study, we used PR-AUC in addition to ROC-AUC to assess the prediction models. The ROC curve and its AUC are well known and useful for evaluating predictive models whose outputs span a continuous range44. However, ROC curves, and by extension their AUC, can be misleading when the numbers of cases and controls differ. In this circumstance, the precision-recall (PR) curve can be more informative about future classification performance because precision evaluates the fraction of true positives among positive predictions45. Additional strengths of this study included a relatively large sample size and, to the best of our knowledge, the most stringent OA controls in the literature to date: JoCoOA participants with low or no burden of multi-joint OA across the hand, hip, knee and spine. Past studies may have been hampered by unrecognized occult OA in controls.
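The selection-and-evaluation logic described above can be illustrated with a minimal Python sketch. It assumes scikit-learn and simulated data: the study cites the ranger package in R20, whose permutation importance is computed on out-of-bag samples, whereas this sketch scores importance on a validation split carved from the training set. The number of features, effect sizes, and hyperparameters are hypothetical; only the group sizes (596 cases, 127 controls) and the 70/30 split come from the study. `average_precision_score` is one common PR-AUC estimator and may differ slightly from the estimator the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical synthetic data with the group sizes reported in the study:
# 596 cases (label 1) and 127 controls (label 0), 20 features,
# of which only features 0, 1 and 2 differ between groups.
n_cases, n_controls, n_features = 596, 127, 20
y = np.concatenate([np.ones(n_cases, dtype=int), np.zeros(n_controls, dtype=int)])
X = rng.normal(size=(y.size, n_features))
X[:, :3] += 1.5 * y[:, None]  # shift the informative features upward in cases

# 70% training / 30% testing split, stratified to preserve case prevalence.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hold out part of the training data for permutation importance so that the
# test set is used only for the final evaluation.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_tr, y_tr, test_size=0.25, stratify=y_tr, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_fit, y_fit)

# Permutation importance: mean drop in validation ROC-AUC when one feature's
# values are shuffled, breaking its link to the outcome.
imp = permutation_importance(rf, X_val, y_val, scoring="roc_auc",
                             n_repeats=10, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]
print("features ranked by importance:", ranking[:5])

# Evaluate on the untouched test set with both summary measures.
scores = rf.predict_proba(X_te)[:, 1]
roc_auc = roc_auc_score(y_te, scores)
pr_auc = average_precision_score(y_te, scores)  # step-wise PR-AUC estimate
no_skill_pr = y_te.mean()  # an uninformative classifier scores about the prevalence
print(f"ROC-AUC {roc_auc:.3f} (no-skill 0.5)")
print(f"PR-AUC  {pr_auc:.3f} (no-skill {no_skill_pr:.3f})")
```

The last lines make the imbalance point concrete: with roughly 82% cases, an uninformative classifier already reaches a PR-AUC near 0.82, so PR-AUC values are best read against that baseline rather than against 0.5.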
This study had several limitations. Although no additional large cohort was available to validate our model, our validation (testing) dataset was still substantially larger than the OA sample sets used for discovery in many other studies11–13, 30, 38, 46. As is typical of proteomic data, our discovery data included missing values. However, the SILs enabled us to identify which values were below the limit of quantification and which were MAR, so that we could remove peptides accordingly and impute the remaining missing values under the assumption that they were MAR. We then combined RF, a flexible imputation scheme, with multiple imputation to report appropriate p-values and CIs. Notably, among the most frequently selected peptides, there were few missing values in the training dataset: three (0.6%) in C1R(229–235), one (0.2%) in A1BG(262–280), one (0.1%) in CRAC1(170–178), and five (1%) in VTDB(95–114), with no missing values in the other peptides. There were also few missing values for the selected peptides in the testing data: one (0.4%) in A1BG(262–280) and five (1.7%) in VTDB(95–114), with no missing values in the other peptides. Combining the results from ten imputed datasets using Rubin's rules relies on a normality assumption; given our sample size, we can appeal to the central limit theorem to justify approximate normality of the regression coefficient estimates. Finally, although self-report indicated that most sera in both cohorts (98% FNIH and 64% JoCoOA) were obtained after 2–8 hours of fasting, we do not know what effect, if any, fasting status would have on these predictions.
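Rubin's rules for pooling one regression coefficient across the ten imputed datasets can be sketched with the Python standard library alone. The sketch assumes each imputed dataset has already been analysed (for example with a logistic regression) to give a point estimate and its standard error; the estimates below are hypothetical, and the normal-approximation 95% CI reflects the large-sample argument above rather than the exact t-based interval.

```python
import math
from statistics import NormalDist, fmean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: per-imputation point estimates (e.g. logistic-regression log-odds)
    variances: the matching squared standard errors
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = fmean(estimates)          # pooled point estimate: mean across imputations
    w = fmean(variances)              # within-imputation variance
    b = variance(estimates)           # between-imputation variance (divides by m - 1)
    t = w + (1 + 1 / m) * b           # total variance of the pooled estimate
    if b == 0:
        return q_bar, t, math.inf     # no imputation uncertainty: df is unbounded
    r = (1 + 1 / m) * b / w           # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2   # Rubin's (1987) degrees of freedom
    return q_bar, t, df


# Hypothetical log-odds estimates and squared SEs from ten imputed datasets.
est = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.50]
var = [0.01] * 10
q, t, df = pool_rubin(est, var)

# Normal-approximation 95% CI, in the spirit of the large-sample argument.
z = NormalDist().inv_cdf(0.975)
se = math.sqrt(t)
print(f"pooled beta {q:.3f}, SE {se:.3f}, "
      f"95% CI ({q - z * se:.3f}, {q + z * se:.3f}), df {df:.0f}")
```

Because the between-imputation variance here is small relative to the within-imputation variance, the degrees of freedom are large and the t and normal intervals practically coincide.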
In summary, we identified and validated a model to diagnose radiographic knee OA using serum proteomics. No more than 8 serum protein markers, with or without BMI, were required to discriminate knee OA from non-OA. Interestingly, many of the proteins indicative of an OA diagnosis have also been shown to identify incident radiographic OA as much as 8 years before the onset of radiographic abnormalities. These biomarkers further implicate the complement and coagulation pathways, lending support to the involvement of innate immunity in the development of OA.
Supplementary Material
Figure S1. Correlations of random forest-selected peptides and BMI. Spearman correlation coefficients depict the strength of pairwise correlations; these were used to prune the random forest-selected set to 17 uncorrelated peptides +/− BMI.
Figure S2. Network of serum proteins identified as OA diagnostics. A) IPA diagram of the complement components identified in this study as OA diagnostics (highlighted in red). B) IPA diagram of the coagulation components identified in this study as OA diagnostics (highlighted in red). C) Interactions of random forest-identified OA diagnostic proteins in this study based on STRING database analysis. The gene names shown in the figure correspond to the following proteins (in parentheses). Complement: C1R (C1R), SERPING1 (IC1), CFI (CFAI), C5 (CO5), and MASP1 (MASP1); Coagulation: KNG1 (KNG1), F5 (FA5), and HABP2 (HABP2). Bradykinin, a peptide released from high-molecular-weight KNG1, is shown in red in panel B. (Graphic compiled in BioRender, lab license JP25CXHDAG.)
Table S1. Results of Wilcoxon-Mann-Whitney (WMW) univariable analyses of training data. WMW analysis results are shown for all 107 peptides, BMI and age. Based on WMW analysis, 65 peptides differed between the two groups (OA vs control, p<0.05); of these, 41 peptides remained significantly different after FDR (Benjamini-Yekutieli) adjustment (adjusted p<0.05). The WMW odds (95% CIs) were calculated for each variable (107 peptides, BMI, and age); the WMW odds are the odds that a randomly selected value from the OA group exceeds a randomly selected value from the control group, so WMW odds > 1 indicate higher and WMW odds < 1 lower values in OA than in controls. Descriptive statistics of the endogenous to stable isotope labelled (SIL) ratios are provided, including means, standard deviations, medians, minimum and maximum, and first and third quartile values. Because these values are ratios, the peptide measurements have no units.
Table S2. Descriptive statistics for each peptide (endogenous to stable isotope labelled (SIL) ratio values) for the training and test datasets separated for the control and OA study participants.
Table S3. Logistic regression models of demographics with OA status as outcome. Shown are BMI alone (left) and all demographics, i.e., BMI, age and sex (right). All estimates from the developed model are provided (column B).
Table S4. Logistic regression models of 23 random forest selected peptides, without (on left) and with (on right) BMI, with OA status as outcome. The 23 peptides and BMI were selected based on random forest (RF)>0.9. All estimates from the developed model are provided (column B).
Table S5. Logistic regression models of 17 uncorrelated peptides from among those random forest selected peptides, without (on left) and with (on right) BMI, with OA status as outcome. The 17 uncorrelated peptides were selected based on random forest (RF)=1, WMW p value and uncorrelated status. All estimates from the developed model are provided (column B).
Table S6. Logistic regression models of 8 final peptides without (on left) and with (on right) BMI, with OA status as outcome. The 8 significant peptides were selected based on random forest (RF)=1, WMW p value, uncorrelated status, and association with OA status in multivariable logistic regression.
Table S7. STRING scores for interactions of the queried 23 random forest-selected peptides (corresponding to 19 proteins) and their first-order interactors. Interaction scores range from 0 to 1, with 1 being the highest possible confidence. STRING suggests thresholds of 0.15 (low confidence), 0.40 (medium confidence), 0.70 (high confidence) or 0.90 (highest confidence). The STRING combined interaction scores reported (column C) reflect direct (physical) and indirect (functional) associations, determined experimentally and by text mining.
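The WMW odds reported in Table S1 can be computed by comparing every OA value with every control value. The Python sketch below assumes that ties count half toward each direction, a common convention that the captions do not specify; it omits the 95% CIs, and the input values are hypothetical. The result equals c/(1 − c), where c is the concordance (ROC-AUC, with ties given half credit) between group membership and the measured value.

```python
import math


def wmw_odds(cases, controls):
    """Wilcoxon-Mann-Whitney (WMW) odds for two independent samples.

    Compares every (case, control) pair and returns
        (P(case > control) + 0.5 * P(tie)) / (P(case < control) + 0.5 * P(tie)),
    i.e. the odds that a randomly chosen case value exceeds a randomly chosen
    control value, with ties split equally (an assumed convention).
    """
    if not cases or not controls:
        raise ValueError("both samples must be non-empty")
    higher = lower = ties = 0
    for x in cases:
        for y in controls:
            if x > y:
                higher += 1
            elif x < y:
                lower += 1
            else:
                ties += 1
    num = higher + 0.5 * ties
    den = lower + 0.5 * ties
    if den == 0:
        return math.inf  # every case value exceeds every control value
    return num / den


# Hypothetical endogenous/SIL ratios (unitless), not data from this study.
oa = [0.8, 1.1, 1.3, 0.9, 1.2]
control = [0.7, 0.9, 1.0, 0.6]
print(wmw_odds(oa, control))
```

The pairwise loop is O(n·m), which is adequate for cohorts of a few hundred participants; a rank-based formula gives the same concordance more efficiently for larger samples.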
Acknowledgments
We wish to thank M. Arthur Moseley, Greg Waitt, and Tricia Ho for expert technical assistance with the proteomic analyses.
Role of funding sources:
The funding sources had no influence on conduct of the study, the manuscript content or decision to publish these data. This work has been funded in part by the following sources.
National Institutes of Health grant R01 AR071450
National Institutes of Health grant P30 AG028716
The Johnston County Osteoarthritis Project has been funded in part by: Association of Schools of Public Health/Centers for Disease Control and Prevention (CDC) S043, S1734, S3486; CDC U01DP003206; National Institutes of Health/National Institute of Arthritis and Musculoskeletal and Skin Diseases P60AR30701, P60AR049465, P60AR064166, and P30AR072580.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Author contributions:
Conceptualization: VBK
Methodology: VBK, AR, EJS, YJL
Investigation: VBK, AR, EJS, YJL
Visualization: VBK, AR
Supervision: VBK, YJL
Writing – original draft: VBK, AR
Writing – review & editing: VBK, AR, EJS, YJL, YG, AN
Competing interests: VBK and EJS are named inventors on a patent related to serum proteomic predictors of knee osteoarthritis progression. No other author has conflicts related to this work.
Data and materials availability:
All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. All proteomic data for this study are available at ftp://massive.ucsd.edu/ or massive.ucsd.edu.
REFERENCES
1. Munjal A, Bapat S, Hubbard D, Hunter M, Kolhe R, Fulzele S. Advances in Molecular biomarker for early diagnosis of Osteoarthritis. Biomol Concepts 2019; 10: 111–119.
2. Hart DJ, Spector TD, Brown P, Wilson P, Doyle DV, Silman AJ. Clinical signs of early osteoarthritis: reproducibility and relation to x ray changes in 541 women in the general population. Ann Rheum Dis 1991; 50: 467–470.
3. Hochberg M, Kraus V, Lohmander S, Guermazi A, Roemer F, Mobasheri A. Osteoarthritis. In: Clinical Innovation in Rheumatology, Past, Present, and Future, Liebowitz J, Seo P Eds.: CRC Press, Taylor and Francis; 2023: 47–63.
4. Bedson J, Croft PR. The discordance between clinical and radiographic knee osteoarthritis: a systematic search and summary of the literature. BMC Musculoskelet Disord 2008; 9: 116.
5. van Spil WE, Degroot J, Lems WF, Oostveen JC, Lafeber FP. Serum and urinary biochemical markers for knee and hip-osteoarthritis: a systematic review applying the consensus BIPED criteria. Osteoarthritis Cartilage 2010; 18: 605–612.
6. Kraus VB, Burnett B, Coindreau J, Cottrell S, Eyre D, Gendreau M, et al. Application of biomarkers in the development of drugs intended for the treatment of osteoarthritis. Osteoarthritis Cartilage 2011; 19: 515–542.
7. Nguyen LT, Sharma AR, Chakraborty C, Saibaba B, Ahn ME, Lee SS. Review of Prospects of Biological Fluid Biomarkers in Osteoarthritis. Int J Mol Sci 2017; 18: 601.
8. Ali N, Turkiewicz A, Hughes V, Folkesson E, Tjörnstand J, Neuman P, et al. Proteomics Profiling of Human Synovial Fluid Suggests Increased Protein Interplay in Early-Osteoarthritis (OA) That Is Lost in Late-Stage OA. Mol Cell Proteomics 2022; 21: 100200.
9. Hsueh MF, Onnerfjord P, Kraus VB. Biomarkers and proteomic analysis of osteoarthritis. Matrix Biol 2014; 39C: 56–66.
10. Liao W, Li Z, Zhang H, Li J, Wang K, Yang Y. Proteomic analysis of synovial fluid as an analytical tool to detect candidate biomarkers for knee osteoarthritis. Int J Clin Exp Pathol 2015; 8: 9975–9989.
11. Hsueh MF, Khabut A, Kjellstrom S, Onnerfjord P, Kraus VB. Elucidating the Molecular Composition of Cartilage by Proteomics. J Proteome Res 2016; 15: 374–388.
12. Fernandez-Puente P, Calamia V, Gonzalez-Rodriguez L, Lourido L, Camacho-Encina M, Oreiro N, et al. Multiplexed mass spectrometry monitoring of biomarker candidates for osteoarthritis. J Proteomics 2017; 152: 216–225.
13. Hosseininia S, Onnerfjord P, Dahlberg LE. Targeted proteomics of hip articular cartilage in OA and fracture patients. J Orthop Res 2019; 37: 131–135.
14. Stekhoven DJ, Bühlmann P. MissForest--non-parametric missing value imputation for mixed-type data. Bioinformatics 2012; 28: 112–118.
15. Kellgren JH, Lawrence JS. Radiological assessment of osteo-arthrosis. Ann Rheum Dis 1957; 16: 494–502.
16. Altman RD, Gold GE. Atlas of individual radiographic features in osteoarthritis, revised. Osteoarthritis Cartilage 2007; 15 Suppl A: A1–56.
17. Jordan JM, Helmick CG, Renner JB, Luta G, Dragomir AD, Woodard J, et al. Prevalence of knee symptoms and radiographic and symptomatic knee osteoarthritis in African Americans and Caucasians: the Johnston County Osteoarthritis Project. J Rheumatol 2007; 34: 172–180.
18. Kraus VB, Hargrove DE, Hunter DJ, Renner JB, Jordan JM. Establishment of reference intervals for osteoarthritis-related soluble biomarkers: the FNIH/OARSI OA Biomarkers Consortium. Ann Rheum Dis 2017; 76: 179–185.
19. Zhou K, Li Y-J, Soderblom E, Reed A, Jain V, Sun S, et al. A ‘best-in-class’ systemic biomarker predictor of clinically relevant knee osteoarthritis structural and pain progression. Science Advances 2023; 9: abq5095.
20. Wright M, Ziegler A. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software 2017; 77: 1–17.
21. Moons KG, Donders RA, Stijnen T, Harrell FE. Using the outcome for imputation of missing predictor values was preferred. J Clin Epidemiol 2006; 59: 1092–1101.
22. Rubin DB. Multiple Imputation After 18+ Years. J American Statistical Association 1996; 91.
23. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 2015; 10: e0118432.
24. Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons; 1987.
25. Kwiatkowska I, Żekanowska E, Lattanzi S, Alexandre AM, Kister-Kowalska A, Słomka A. Factor VII Activating Protease (FSAP) and Its Importance in Hemostasis-Part I: FSAP Structure, Synthesis and Activity Regulation: A Narrative Review. Int J Mol Sci 2023; 24: 5473.
26. Silawal S, Triebel J, Bertsch T, Schulze-Tanzil G. Osteoarthritis and the Complement Cascade. Clin Med Insights Arthritis Musculoskelet Disord 2018; 11: 1179544117751430.
27. Gál P, Dobó J, Závodszky P, Sim RB. Early complement proteases: C1r, C1s and MASPs. A structural insight into activation and functions. Mol Immunol 2009; 46: 2745–2752.
28. Wang Q, Rozelle AL, Lepus CM, Scanzello CR, Song JJ, Larsen DM, et al. Identification of a central role for complement in osteoarthritis. Nat Med 2011; 17: 1674–1679.
29. Balakrishnan L, Nirujogi RS, Ahmad S, Bhattacharjee M, Manda SS, Renuse S, et al. Proteomic analysis of human osteoarthritis synovial fluid. Clin Proteomics 2014; 11: 6.
30. Ritter SY, Subbaiah R, Bebek G, Crish J, Scanzello CR, Krastins B, et al. Proteomic analysis of synovial fluid from the osteoarthritic knee: comparison with transcriptome analyses of joint tissues. Arthritis Rheum 2013; 65: 981–992.
31. Sun S, Li Y-J, Soderblom E, Moseley M, Zhou K, Reed A, et al. Serum prognostic biomarkers for incident radiographic knee osteoarthritis. Osteoarthritis & Cartilage 2021; 29: S8–S9.
32. Li C, Luo J, Xu X, Zhou Z, Ying S, Liao X, et al. Single cell sequencing revealed the underlying pathogenesis of the development of osteoarthritis. Gene 2020; 757: 144939.
33. Styrkarsdottir U, Lund SH, Saevarsdottir S, Magnusson MI, Gunnarsdottir K, Norddahl GL, et al. The CRTAC1 Protein in Plasma Is Associated With Osteoarthritis and Predicts Progression to Joint Replacement: A Large-Scale Proteomics Scan in Iceland. Arthritis Rheumatol 2021; 73: 2025–2034.
34. Tardif G, Paré F, Gotti C, Roux-Dalvai F, Droit A, Zhai G, et al. Mass spectrometry-based proteomics identify novel serum osteoarthritis biomarkers. Arthritis Res Ther 2022; 24: 120.
35. Szilagyi IA, Vallerga CL, Boer CG, Schiphof D, Ikram MA, Bierma-Zeinstra SMA, et al. Plasma proteomics identifies CRTAC1 as a biomarker for osteoarthritis severity and progression. Rheumatology (Oxford) 2023; 62: 1286–1295.
36. Blain EJ. Involvement of the cytoskeletal elements in articular cartilage homeostasis and pathology. Int J Exp Pathol 2009; 90: 1–15.
37. Bouillon R, Schuit F, Antonio L, Rastinejad F. Vitamin D Binding Protein: A Historic Overview. Front Endocrinol (Lausanne) 2020; 10.
38. Lourido L, Calamia V, Mateos J, Fernandez-Puente P, Fernandez-Tajes J, Blanco FJ, et al. Quantitative proteomic profiling of human articular cartilage degradation in osteoarthritis. J Proteome Res 2014; 13: 6096–6106.
39. Reed A, Li Y-J, Soderblom E, Moseley A, Attur M, Abramsom S, et al. A parsimonious approach to qualification of serum proteomic biomarkers for predicting osteoarthritis progression. Osteoarthritis & Cartilage 2020; 28: S326:471.
40. Kwon J-O, Jin WJ, Kim B, Ha H, Kim H-H, Lee ZH. Haptoglobin Acts as a TLR4 Ligand to Suppress Osteoclastogenesis via the TLR4–IFN-β Axis. The Journal of Immunology 2019; 202: 3359–3369.
41. Lam W, Moosavi L. Physiology, Factor V. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2023.
42. Banerjee S. Empowering Clinical Diagnostics with Mass Spectrometry. ACS Omega 2020; 5: 2041–2048.
43. Gharbi M, Deberg M, Henrotin Y. Application for Proteomic Techniques in Studying Osteoarthritis: A Review. Frontiers in Physiology 2011; 2.
44. Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L. The use of receiver operating characteristic curves in biomedical informatics. Journal of Biomedical Informatics 2005; 38: 404–415.
45. Brock G, Saito T, Rehmsmeier M. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. Plos One 2015; 10.
46. Fuentes M, Ruiz-Romero C, Misiego S, Juanes-Velasco P, Landeira-Viñuela A, Torres-Roda A, et al. Exploring High-Throughput Immunoassays for Biomarker Validation in Rheumatic Diseases in the Context of the Human Proteome Project. J Proteome Res 2022; 22: 1105–1115.
