Table 3. Potential plasma protein biomarkers for ME/CFS.
Gene Name | UniProt ID | Direction | Lasso: Percentage1 | Lasso: Rank4 | Random Forest: Mean Decrease in Accuracy2 | Random Forest: Rank4 | XGBoost: Gain3 | XGBoost: Rank4 |
---|---|---|---|---|---|---|---|---|
All ME/CFS | ||||||||
CAMP | P49913 | Increased | 22.80% | 1 | 0.1284 | 4 | 0.0652 | 2 |
LRG1 | P02750 | Decreased | 9.90% | 9 | 0.1302 | 3 | 0.0327 | 4 |
IGF1 | P05019 | Decreased | 3.90% | 19 | 0.1320 | 2 | 0.0318 | 6 |
GSN | P06396 | Decreased | 3.70% | 20 | 0.0743 | 9 | 0.0281 | 8 |
IGFALS | P35858 | Decreased | 11.60% | 7 | 0.0988 | 7 | 0.0292 | 7 |
IGLV1-47 | P01700 | Decreased | 14.10% | 2 | 0.0639 | 14 | 0.0319 | 5 |
FCRL3 | Q96P31 | Decreased | 4.70% | 17 | 0.0545 | 20 | 0.0127 | 17 |
CRTAC1 | Q9NQ79 | Decreased | 13.30% | 3 | 0.2653 | 1 | 0.1225 | 1 |
ME/CFS with sr-IBS | ||||||||
CAMP | P49913 | Increased | 30.10% | 1 | 0.1772 | 2 | 0.0852 | 2 |
SERPINA3 | P01011 | Decreased | 4.20% | 16 | 0.0731 | 7 | 0.0249 | 6 |
IGF1 | P05019 | Decreased | 11.00% | 6 | 0.1768 | 3 | 0.1132 | 1 |
ITIH2 | P19823 | Decreased | 13.60% | 4 | 0.1870 | 1 | 0.0529 | 4 |
IGHV1-18 | A0A0C4DH31 | Decreased | 19.30% | 3 | 0.0535 | 17 | 0.0157 | 14 |
CRTAC1 | Q9NQ79 | Decreased | 4.80% | 13 | 0.0922 | 4 | 0.0577 | 3 |
ME/CFS without sr-IBS | ||||||||
PON3 | Q15166 | Increased | 7.20% | 3 | 0.0601 | 19 | 0.0571 | 2 |
KNG1 | P01042 | Increased | 3.70% | 13 | 0.0674 | 17 | 0.0122 | 20 |
LRG1 | P02750 | Decreased | 5.50% | 6 | 0.0960 | 8 | 0.0400 | 4 |
IGLC7 | A0M8Q6 | Decreased | 8.40% | 2 | 0.0664 | 18 | 0.0196 | 14 |
CRTAC1 | Q9NQ79 | Decreased | 3.90% | 12 | 0.1031 | 6 | 0.0740 | 1 |
Proteins with more than 50% undetectable/filtered values were excluded. All 250 protein analytes were fitted as predictors in three different classifiers: Lasso, Random Forests, and XGBoost. The table shows the proteins that ranked in the top 20 by all three importance measures for all ME/CFS patients, ME/CFS patients with sr-IBS, and ME/CFS patients without sr-IBS. Direction is measured relative to controls. ME/CFS: myalgic encephalomyelitis/chronic fatigue syndrome, sr-IBS: self-reported irritable bowel syndrome, CAMP: cathelicidin antimicrobial protein, LRG1: Leucine-rich glycoprotein 1, IGF1: insulin-like growth factor 1, IGFALS: Insulin-like growth factor-binding protein complex acid labile subunit, IGLV1-47: immunoglobulin lambda variable region 1–47, FCRL3: Fc receptor-like protein 3, SERPINA3: Alpha-1-antichymotrypsin, ITIH2: Inter-alpha-trypsin inhibitor heavy chain H2, IGHV1-18: immunoglobulin heavy variable region 1–18, PON3: Serum paraoxonase/lactonase 3, KNG1: Kininogen 1, IGLC7: immunoglobulin lambda constant region 7.
1Percentage: Lasso regularizes least squares by constraining the L1 norm of the parameter vector to be no greater than a given value; increasing the penalty drives more coefficients of unimportant predictors to exactly zero. Importance can therefore be represented as the percentage of iterations (out of 1,000 random-resampling cross-validation iterations) in which the predictor’s parameter estimate in the best-fitting model is nonzero.
2Mean Decrease in Accuracy: Random Forests measures the mean decrease in accuracy when values of the predictor are randomly permuted. For unimportant predictors, the permutation should have little to no effect on model accuracy, while permuting values of important predictors should significantly decrease it.
3Gain: XGBoost reports predictor importance as ‘Gain’, which indicates the relative contribution of each predictor to the model and is calculated from that predictor’s contribution in each tree of the model.
4Rank: We selected the protein analytes that were ranked in the top 20 in all three importance measurements.
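The selection logic described in footnotes 1 and 4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' analysis code: the function names (`selection_percentage`, `rank_descending`, `top_k_in_all`), the predictor names (`prot_a` to `prot_d`), and all numbers are hypothetical, and the Lasso fits are assumed to have been run already, so each resampling iteration is represented simply by its fitted coefficients. The toy example uses k = 2 rather than the top 20.

```python
def selection_percentage(coef_runs):
    """Percentage of resampling iterations in which each predictor's
    coefficient in the fitted model is nonzero (footnote 1)."""
    n_runs = len(coef_runs)
    return {
        name: 100.0 * sum(1 for run in coef_runs if run[name] != 0.0) / n_runs
        for name in coef_runs[0]
    }


def rank_descending(scores):
    """Rank predictors from most (1) to least important.
    Ties keep their original order, which is an arbitrary choice here."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def top_k_in_all(score_tables, k=20):
    """Predictors ranked within the top k by every importance measure (footnote 4)."""
    rank_tables = [rank_descending(scores) for scores in score_tables]
    return sorted(
        name for name in rank_tables[0]
        if all(ranks[name] <= k for ranks in rank_tables)
    )


# Hypothetical coefficients from four resampling iterations (the paper uses 1,000).
coef_runs = [
    {"prot_a": 0.8, "prot_b": 0.0, "prot_c": -0.2, "prot_d": 0.0},
    {"prot_a": 0.5, "prot_b": 0.0, "prot_c": 0.0, "prot_d": 0.1},
    {"prot_a": 0.9, "prot_b": 0.3, "prot_c": -0.1, "prot_d": 0.0},
    {"prot_a": 0.7, "prot_b": 0.0, "prot_c": -0.3, "prot_d": 0.0},
]
lasso_pct = selection_percentage(coef_runs)

# Hypothetical importance scores standing in for the other two measures.
rf_mean_decrease = {"prot_a": 0.12, "prot_b": 0.01, "prot_c": 0.09, "prot_d": 0.15}
xgb_gain = {"prot_a": 0.30, "prot_b": 0.05, "prot_c": 0.20, "prot_d": 0.02}

selected = top_k_in_all([lasso_pct, rf_mean_decrease, xgb_gain], k=2)
print(selected)  # ['prot_a']
```

In this toy data `prot_d` ranks first by one measure but last by the other two, so it is dropped; requiring agreement across all three measures favors predictors that are consistently important over those that only one classifier relies on.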