a, The box plots (left) show the relative abundance of each pathway in
controls (blue) and patients with CRC (red) in each cohort. Sample sizes
were: AUS (patients with CRC, n = 46; controls, n = 63), FRA (patients
with CRC, n = 53; controls, n = 61), GER (patients with CRC, n = 60;
controls, n = 65), CHN (patients with CRC, n = 80; controls, n = 86) and
JPN (patients with CRC, n = 258; controls, n = 251). In all box plots,
the box spans the 25th–75th percentiles of the distribution, the thick
line in the middle of the box marks the median, the whiskers extend to
the most extreme points within 1.5× the IQR from the box, and outliers
are shown as dots. The heatmap (centre) shows the integrated
meta-analysis that identified KO genes with significantly altered
abundance in each metabolic pathway examined across the five
geographical populations.
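The meta-analysis pools one effect estimate per cohort into a single estimate per KO gene. As far as we understand, MMUPHin's lm_meta fits a covariate-adjusted linear model within each cohort and combines the per-cohort coefficients with a random-effects model (via the R package metafor). The Python sketch below is not MMUPHin's code: it swaps in the closed-form DerSimonian-Laird random-effects estimator for brevity, and the function name `dersimonian_laird` and its inputs (per-cohort effect sizes and standard errors) are hypothetical.

```python
import math


def dersimonian_laird(effects, ses):
    """Pool per-cohort effects with a DerSimonian-Laird random-effects model.

    effects: per-cohort effect estimates (e.g. log fold changes), hypothetical inputs.
    ses:     their standard errors.
    Returns (pooled effect, its standard error, between-cohort variance tau^2,
    two-sided normal-approximation P value).
    """
    w = [1.0 / s ** 2 for s in ses]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw  # fixed-effect mean
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # heterogeneity, floored at 0
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]        # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2.0))               # = 2 * (1 - Phi(|z|))
    return pooled, se, tau2, p
```

With five cohorts (AUS, FRA, GER, CHN, JPN) one would call it once per KO gene with five effects and five standard errors. When all cohorts agree, tau² collapses to 0 and the result equals the fixed-effect estimate; disagreement between cohorts inflates tau² and widens the pooled standard error.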
The cell colour and intensity represent the generalized fold change in
KO gene abundance. Differentially abundant KO genes (P < 0.05, two-sided
test) were identified using MMUPHin; P values are shown in the cells.
b, Normalized log abundance of the functional genes bdhA/B (K00100),
oraE (K17898) and oraS (K17899) in controls (n = 494) and patients with
CRC (n = 491). Statistical significance was determined using MMUPHin
with age, BMI and sex as covariates (two-sided test). c,d, Levels of
bdhA and bdhB in the butanoate metabolism pathway (c) and of oraE and
oraS in the D-arginine and D-ornithine metabolism pathway (d) were
higher in patients with CRC (n = 24) than in controls (n = 24), as
determined by qPCR of gDNA. Data are presented as the mean ± s.d. of
three biological replicates. P values were calculated using a two-sided
Wilcoxon signed-rank test and were Bonferroni-adjusted. The box plots
show the IQRs as boxes, with the median as a black horizontal line and
the whiskers extending to the most extreme points within 1.5× the IQR.
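The whisker rule used in these box plots (Tukey style: whiskers stop at the most extreme data point inside the 1.5× IQR fences, and anything beyond is drawn as an outlier) can be made concrete with a short Python sketch. It assumes linear interpolation between order statistics for the quartiles (numpy's default percentile method); plotting libraries may use other quantile definitions, so box edges can differ slightly. `tukey_box_stats` is a hypothetical helper name.

```python
def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list (numpy's default method)."""
    pos = (len(sorted_xs) - 1) * q
    lo = int(pos)                       # floor, since pos >= 0
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def tukey_box_stats(values):
    """Box, median, whisker ends and outliers under the 1.5x IQR rule."""
    xs = sorted(values)
    q1, med, q3 = (quantile(xs, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        # Whiskers end at real data points, not at the fences themselves.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": [x for x in xs if x < lo_fence or x > hi_fence],
    }
```

For the values 1 to 9 plus 100, the box spans 3.25 to 7.75, the upper fence is 14.5, so the upper whisker ends at 9 and 100 is plotted as an outlier dot.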
e, AUROC matrix of models built with the 175 important EggNOG genes.
Values on the diagonal are the mean AUROCs of 20 repeats of fivefold
stratified cross-validation. Off-diagonal values are the AUROCs obtained
by training the model on the cohort in the corresponding row and
applying it to the cohort in the corresponding column. The LOCO
(leave-one-cohort-out) row shows the performance obtained by training
the model on all cohorts except the one in the corresponding column and
applying it to that cohort. Asterisks indicate model significance
assessed with 1,000 permutations (two-sided test); *P = 0.001.
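The layout of the AUROC matrix (within-cohort cross-validation on the diagonal, cohort-to-cohort transfer off it, and a LOCO row) can be sketched in plain Python. The model is a hypothetical stand-in (a centroid-difference linear score), not the classifier trained on the 175 EggNOG genes, and all function names are illustrative. The sketch assumes that the diagonal averages per-fold AUROCs, that null AUROCs come from models refit on label-shuffled data, and that the permutation P value uses the add-one estimator (b + 1)/(n + 1); under that last assumption 1,000 permutations give a smallest attainable P of 1/1,001 ≈ 0.001.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC as the probability that a CRC sample (1) outscores a control (0); ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(X, y):
    """Hypothetical stand-in model: per-feature weight = mean(CRC) - mean(control)."""
    cases = [x for x, lab in zip(X, y) if lab == 1]
    ctrls = [x for x, lab in zip(X, y) if lab == 0]
    return [mean(c[j] for c in cases) - mean(c[j] for c in ctrls) for j in range(len(X[0]))]


def score(w, X):
    """Linear score of each sample under weights w."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]


def stratified_folds(y, k, rng):
    """Assign samples to k folds so that each class is spread evenly across folds."""
    folds = [0] * len(y)
    for cls in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[i] = rank % k
    return folds


def cv_auroc(X, y, k=5, repeats=20, seed=0):
    """Mean per-fold AUROC over `repeats` rounds of stratified k-fold cross-validation."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        folds = stratified_folds(y, k, rng)
        for f in range(k):
            train = [i for i in range(len(y)) if folds[i] != f]
            test = [i for i in range(len(y)) if folds[i] == f]
            w = fit_centroid([X[i] for i in train], [y[i] for i in train])
            aucs.append(auroc([y[i] for i in test], score(w, [X[i] for i in test])))
    return mean(aucs)


def auroc_matrix(cohorts):
    """cohorts maps name -> (X, y); returns {(row, column): AUROC}, including a LOCO row."""
    names = list(cohorts)
    m = {}
    for r in names:
        Xr, yr = cohorts[r]
        w = fit_centroid(Xr, yr)  # model trained on the whole row cohort
        for c in names:
            Xc, yc = cohorts[c]
            # Diagonal: within-cohort CV; off-diagonal: train on row, test on column.
            m[(r, c)] = cv_auroc(Xr, yr) if r == c else auroc(yc, score(w, Xc))
    for c in names:
        # LOCO: pool every cohort except column c for training, then test on c.
        X_tr = [x for n in names if n != c for x in cohorts[n][0]]
        y_tr = [lab for n in names if n != c for lab in cohorts[n][1]]
        m[("LOCO", c)] = auroc(cohorts[c][1], score(fit_centroid(X_tr, y_tr), cohorts[c][0]))
    return m


def permutation_p(observed, null_aurocs):
    """Two-sided add-one permutation P value, measured as distance from chance (AUROC 0.5)."""
    b = sum(1 for s in null_aurocs if abs(s - 0.5) >= abs(observed - 0.5))
    return (b + 1) / (len(null_aurocs) + 1)
```

Keeping the diagonal as cross-validated estimates avoids scoring a model on its own training data, whereas off-diagonal and LOCO cells are true external validations, which is why they are usually the stricter test of generalization across populations.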