2022 Jan 27;7(2):238–250. doi: 10.1038/s41564-021-01030-7

Fig. 5. CRC-associated functional alterations and performance of models constructed with KO genes.


a, The box plots (left) show the relative abundance of each pathway in controls (blue) and patients with CRC (red) in each cohort. Sample sizes were: AUS (patients with CRC = 46, controls = 63), FRA (patients with CRC = 53, controls = 61), GER (patients with CRC = 60, controls = 65), CHN (patients with CRC = 80, controls = 86) and JPN (patients with CRC = 258, controls = 251). All box plots represent the 25th–75th percentiles of the distribution; the median is shown as a thick line in the middle of the box; the whiskers extend to the most extreme points within 1.5× the IQR, and outliers are represented as dots. The heatmap (centre) shows the integrated meta-analysis that identified significantly changed KO gene expression in each metabolic pathway examined across the five geographical populations. The cell colour and intensity represent the generalized abundance fold change of KO genes. Significantly differential KO genes (P < 0.05, two-sided test) were identified via MMUPHin. P values are shown in the cells. b, Normalized log abundance of the functional genes bdhA/B (K00100), oraE (K17898) and oraS (K17899) compared between controls (n = 494) and patients with CRC (n = 491). Statistical significance was determined via MMUPHin, treating age, BMI and sex as covariates (two-sided test). c,d, Expression of bdhA and bdhB in the butanoate metabolism pathway (c) and of oraE and oraS in the D-arginine and D-ornithine metabolism pathway (d) was higher in patients with CRC (n = 24) than in controls (n = 24), as determined via qPCR with gDNA. Data are presented as the mean ± s.d. of three biological replicates. P values were calculated using a two-sided Wilcoxon signed-rank test and were Bonferroni-adjusted. The box plots show the IQRs as boxes, with the median as a black horizontal line and the whiskers extending to the most extreme points within 1.5× the IQR. e, AUROC matrix of models built with the 175 important EggNOG genes. 
Values on the diagonal refer to the average AUROC of 20× repeated fivefold stratified cross-validations. Values off the diagonal refer to the AUROCs obtained by training the model on the population of the corresponding row and applying it to the population of the corresponding column. The LOCO row refers to the performances obtained by training the model using all but the cohort dataset of the corresponding column and applying it to the dataset of the corresponding column. The asterisk represents the significance of models assessed with 1,000 permutations (two-sided test). *P = 0.001.
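
The layout of the matrix in panel e (off-diagonal cohort-to-cohort transfer plus a LOCO row) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the caption does not state which classifier was used, so `fit_centroid` is a hypothetical toy stand-in, the data from `make_cohort` are synthetic, and only three of the cohort labels are reused as dictionary keys. The cross-validated diagonal is omitted for brevity.

```python
import random


def auroc(labels, scores):
    """Rank-based AUROC: probability that a case outscores a control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(X, y):
    """Hypothetical toy classifier: project samples onto the case-minus-control mean."""
    d = len(X[0])

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(d)]

    mu_case = mean([x for x, lab in zip(X, y) if lab == 1])
    mu_ctrl = mean([x for x, lab in zip(X, y) if lab == 0])
    w = [a - b for a, b in zip(mu_case, mu_ctrl)]
    return lambda rows: [sum(wj * xj for wj, xj in zip(w, r)) for r in rows]


def make_cohort(seed, n_per_class=40, shift=2.0, n_features=3):
    """Synthetic cohort (assumption): cases are shifted on feature 0 only."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += shift * label
            X.append(row)
            y.append(label)
    return X, y


def transfer_matrix(cohorts):
    """Off-diagonal cells: train on the row cohort, test on the column cohort.
    LOCO row: train on all cohorts except the column cohort, test on it."""
    names = list(cohorts)
    cells = {}
    for train in names:
        score = fit_centroid(*cohorts[train])
        for test in names:
            if test != train:
                X_test, y_test = cohorts[test]
                cells[(train, test)] = auroc(y_test, score(X_test))
    for test in names:
        X_pool = [x for n in names if n != test for x in cohorts[n][0]]
        y_pool = [lab for n in names if n != test for lab in cohorts[n][1]]
        X_test, y_test = cohorts[test]
        cells[("LOCO", test)] = auroc(y_test, fit_centroid(X_pool, y_pool)(X_test))
    return cells


cohorts = {name: make_cohort(seed) for seed, name in enumerate(["AUS", "FRA", "GER"])}
matrix = transfer_matrix(cohorts)
for (row, col), value in sorted(matrix.items()):
    print(f"{row:>4} -> {col}: AUROC = {value:.3f}")
```

In this sketch, each key of `matrix` is a `(row, column)` pair, so `matrix[("LOCO", "GER")]` corresponds to a model trained on the pooled AUS and FRA samples and evaluated on GER.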
