Table 2.
Metabolites selected with proportion of explained variation for predicting breast cancer and colorectal cancer 1.
Metabolites Selected | Proportion of Explained Variation 2 | Direction of Coefficient for Metabolites 3 |
---|---|---|
Breast Cancer | All covariates + metabolites: 0.27 | |
Serum | ||
LC-MS | ||
Azelaic acid | 0.23 | − |
Choline | 0.23 | + |
Cysteinyl glycine | 0.23 | − |
Ethanolamine | 0.23 | + |
Gamma tocopherol | 0.23 | + |
Hippuric acid | 0.23 | − |
Isovaleryl carnitine | 0.23 | + |
N-isovaleryl glycine | 0.23 | − |
Sucrose | 0.23 | − |
Trimethylamine-N-oxide | 0.23 | + |
Valine | 0.23 | + |
Xylose | 0.23 | − |
Lipidyzer 4 | ||
Cholesteryl ester (CE 12:0) | 0.23 | − |
Cholesteryl ester (CE 20:0) | 0.23 | − |
Diacylglycerol (DAG 14:1) | 0.24 | + |
Free fatty acid (FFA 18:4) | 0.23 | − |
Free fatty acid (FFA 20:2) | 0.23 | + |
Hexosylceramide (HCER 22:0) | 0.23 | + |
Hexosylceramide (HCER 22:0) | 0.23 | − |
Phosphatidylcholine (PC 18:1) | 0.23 | + |
Phosphatidylcholine (PC 18:2) | 0.23 | + |
Phosphatidylcholine (PC 16:0/18:2) | 0.24 | − |
Phosphatidylethanolamine (PE 18:2) | 0.23 | + |
Triacylglycerol (TAG 12:0) | 0.23 | − |
Triacylglycerol (TAG 16:0) | 0.23 | − |
Triacylglycerol (TAG 18:0) | 0.23 | − |
Triacylglycerol (TAG 47:0/15:0) | 0.23 | − |
Triacylglycerol (TAG 48:4/18:2) | 0.23 | − |
Triacylglycerol (TAG 50:0/16:0) | 0.23 | + |
Triacylglycerol (TAG 50:2/18:2) | 0.23 | − |
Triacylglycerol (TAG 50:5/18:3) | 0.24 | − |
Triacylglycerol (TAG 52:2/18:2) | 0.24 | + |
Triacylglycerol (TAG 55:4/18:1) | 0.23 | − |
Urine | ||
NMR | ||
Dimethylamine | 0.23 | − |
Propanediol | 0.23 | − |
Formate | 0.23 | + |
Sucrose | 0.23 | − |
Taurine | 0.23 | + |
Uracil | 0.23 | − |
Trimethylamine-N-oxide | 0.23 | − |
2-Hydroxyisobutyrate | 0.23 | + |
2-Oxoglutarate | 0.23 | − |
GC-MS | ||
Unknown 73.0 12.10 5 | 0.23 | − |
Unknown 73.0 14.49 5 | 0.23 | + |
Unknown 73.0 16.52 5 | 0.23 | + |
Colorectal Cancer | All covariates + metabolites: 0.31 | |
Serum | ||
LC-MS | ||
Adenosine | 0.23 | − |
Leucic Acid | 0.21 | + |
Glycerate | 0.25 | + |
Myo-inositol | 0.22 | + |
N-Acetyl-glutamate | 0.22 | − |
N-Acetyl-glycine | 0.23 | + |
N-Acetylneuraminate | 0.22 | + |
2-Hydroxyglutarate | 0.22 | + |
Hydroxyproline | 0.21 | + |
7-Methylguanine | 0.22 | + |
Lipidyzer 4 | ||
Lysophosphatidylcholine (LPC 20:3) | 0.22 | − |
Urine | ||
NMR | ||
Acetate | 0.21 | + |
Allantoin | 0.21 | − |
Histidine | 0.22 | − |
Isoleucine | 0.21 | + |
Taurine | 0.22 | + |
Threonine | 0.21 | + |
Trimethylamine-N-oxide | 0.21 | + |
Uracil | 0.22 | − |
GC-MS | ||
Unknown 103 17.03 5 | 0.21 | − |
Unknown 285 22.41 5 | 0.22 | + |
Unknown 57 9.58 5 | 0.22 | + |
Unknown 73 10.76 5 | 0.21 | − |
Unknown 73 17.66 5 | 0.21 | + |
1 All variables listed below were selected by either the lasso or SL selection procedure in the corresponding platform-specific analysis. The base set of covariates (forced into all models) were age, WHI enrollment date, and self-reported race or ethnicity. Selected covariates for breast cancer: education level, income, alcohol intake, current smoking, total folate intake, Gail 5-year risk, family history of CRC, prior removal of ≤1 colon polyp, currently using estrogen, waist circumference, BMI (kg/m²), randomized to CaD or HT, date of sample draw visit. Selected covariates for colorectal cancer: age, self-reported race/ethnicity, education, income, alcohol intake, total folate intake, waist circumference, BMI (kg/m²), ≥1 colonoscopy, prior removal of ≥1 colon polyp, sample draw visit, randomized to DM control arm.
2 The proportion of explained variation (PEV) was estimated by first creating a dataset with only the selected metabolites and covariates for each outcome. Then, we used cross-validation to fit a logistic regression on each set of training data and predict on the test data; the PEV is defined as the correlation between the observed outcomes and the predictions.
3 Positive direction of the estimated coefficient from the multiple logistic regression model implies higher odds of being a case; negative direction implies lower odds of being a case.
4 In CE, X:A; FFA, X:A; DAG, X:A/Y:B; HCER, X:A; PC, X:A/Y:B; PE, X:A/Y:B; and LPC, X:A, X and Y indicate the number of carbon atoms and A and B indicate the number of double bonds in the fatty acid chains. Lipids without both A and B represent the sum of all fatty acids in that class. For example, DAG (14:1) equals the sum of all diacylglycerol, i.e., summing all DAG (x/14:1) and DAG (14:1/x). In TAG, X:A/Y:B, X indicates the total number of carbon atoms and A indicates the total number of double bonds in the three fatty acid chains, and Y indicates the number of carbon atoms and B indicates the number of double bonds in one of the fatty acid chains.
5 Values represent mass at retention time of the unknown metabolites, i.e., 73 12.10 indicates a mass of 73 at 12.10 min.
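The PEV calculation described in footnote 2 can be illustrated with a minimal sketch. This is not the authors' code: the number of folds (5), the pooling of out-of-fold predictions before computing a single Pearson correlation, the gradient-descent fitting routine, and the simulated two-predictor dataset are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit an unpenalized logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(X, w, b):
    # Predicted probability of being a case for each row of X.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def pearson(a, b):
    # Pearson correlation between two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_validated_pev(X, y, k=5, seed=0):
    """Correlation between observed outcomes and out-of-fold predictions.

    Assumption: predictions from all k test folds are pooled and one
    correlation is computed; the source does not say whether the
    correlation was pooled or averaged across folds.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(X)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict_proba([X[i] for i in test], w, b)):
            preds[i] = p
    return pearson(y, preds)


def simulate(n=300, seed=1):
    # Simulated data only: two standard-normal predictors with one positive
    # and one negative coefficient, standing in for selected metabolites.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(1.5 * x[0] - 1.0 * x[1])
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = simulate()
pev = cross_validated_pev(X, y)
print(f"Cross-validated PEV on simulated data: {pev:.2f}")
```

Because every prediction comes from a model that never saw the corresponding observation, a PEV computed this way reflects out-of-sample performance rather than in-sample fit.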