a Agglomerative hierarchical clustering (HClust) dendrogram of the training cohort based on target organ involvement. An HClust distance threshold of 30 split the cohort into four clusters, which were ordered numerically (grade I: n = 763; II: n = 1149; III: n = 191; IV: n = 216). The red dashed line indicates the cutoff level for four grades. b Kaplan–Meier OS curves with 95% CI for the four HClust-aGVHD grades (I–IV). Strata were compared with the two-sided log-rank test. c Performance indicators of K-means partitional clustering: SSD (sum of squared distances, green dashed line) and silhouette coefficient (green line), with their axis labels on either side of the plot. The red dashed line indicates the cutoff level for four grades; the gray dashed line indicates the cutoff level for eight grades, the optimal number of clusters according to both indicators (k = 8, silhouette coefficient 0.62). A further cutoff point with 14 clusters is evaluated in the Supplementary Notes. d Kaplan–Meier OS curves with 95% CI for the four K-means-4 grades (I–IV). Strata were compared with the two-sided log-rank test. e Multivariate competing risk regression analysis of 12-month NRM in the test cohort (n = 541 evaluable for all covariates), with PC1 aGVHD grade as a time-dependent variable. The model was adjusted for potential confounders; the covariates are listed in e. Horizontal bars represent 95% CIs. P values were computed with the Wald test. The hazard ratio (HR) is the ratio of the hazards of NRM in two groups: an HR of 1 corresponds to the reference level, HR < 1 to a lower risk and HR > 1 to a higher risk of NRM than the reference. The HR was 2.12 (95% CI 1.17–3.83) for PC1 grade II, 7.2 (95% CI 4.72–10.99) for grade III and 16.30 (95% CI 8.12–32.75) for grade IV.
Significant covariates in this NRM model were diagnosis (acute lymphoblastic leukemia (ALL): HR 2.3, 95% CI 1.17–4.66; myelodysplastic syndromes (MDS): HR 1.74, 95% CI 1.06–2.84; other diagnoses: HR 3.8, 95% CI 1.27–11.88), year of HCT (HR 0.91, 95% CI 0.85–0.97) and EBMT risk score (HR 1.40, 95% CI 1.21–1.63). Donor age, donor sex, donor type and Karnofsky performance index ≥80 were not significant in univariate regression analysis and were therefore not included in the multivariate model. Source data are provided as a Source Data file.
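The reported HRs and 95% CIs are sufficient to back-calculate approximate Wald statistics for the aGVHD grade effects in e. The Python sketch below assumes that each CI is a symmetric Wald interval on the log-HR scale, exp(log HR ± 1.96·SE), which is the usual construction for Cox-type and competing-risk regression output but is not stated in the legend. `wald_from_ci` is a hypothetical helper name, and because the published bounds are rounded, the recovered P values are approximations, not the values computed by the authors.

```python
import math

# 97.5th percentile of the standard normal distribution (two-sided 95% CI).
Z_975 = 1.959963984540054


def wald_from_ci(hr, lower, upper, z_crit=Z_975):
    """Hypothetical helper: recover SE(log HR), the Wald z statistic and the
    two-sided P value from an HR and its 95% CI, ASSUMING the CI was built as
    exp(log(HR) +/- z_crit * SE)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided standard-normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# HRs and 95% CIs for PC1 aGVHD grades II-IV as reported in panel e.
reported = {
    "II": (2.12, 1.17, 3.83),
    "III": (7.2, 4.72, 10.99),
    "IV": (16.30, 8.12, 32.75),
}

results = {}
for grade, (hr, lo, hi) in reported.items():
    se, z, p = wald_from_ci(hr, lo, hi)
    # Under the Wald construction the HR is the geometric mean of the CI bounds.
    midpoint = math.sqrt(lo * hi)
    results[grade] = (se, z, p, midpoint)
    print(f"grade {grade}: SE(log HR)={se:.3f}  z={z:.2f}  P={p:.2g}  "
          f"geometric CI midpoint={midpoint:.2f}")
```

For grade II this gives z ≈ 2.48 and P ≈ 0.013; grades III and IV give P values far below 10⁻⁶. Checking that each HR sits at the geometric midpoint of its CI is also a quick way to catch transcription errors in reported estimates.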
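Panel c chooses the number of K-means clusters with two indicators: the SSD (within-cluster sum of squared distances) and the mean silhouette coefficient. The following Python sketch shows how both can be computed over a range of k. It runs on hypothetical two-dimensional toy data, not on the cohort's organ-involvement features, and uses plain Lloyd's algorithm with a deterministic farthest-point initialization so that the output is reproducible. This seeding, the toy data and all function names are assumptions for illustration; the authors' software and settings are not given here.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Assumed deterministic seeding: start at points[0], then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, ssd)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # SSD: sum of squared distances of points to their assigned centroid.
    ssd = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, ssd


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to all other points, grouped by cluster.
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:  # point i is alone in its cluster
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(sum(d) / len(d) for c, d in dists.items() if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: three well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]

results = {}
for k in range(2, 6):
    labels, _, ssd = kmeans(points, k)
    results[k] = (ssd, silhouette(points, labels))
    print(f"k={k}  SSD={ssd:.2f}  silhouette={results[k][1]:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("best k by silhouette:", best_k)
```

On this toy data the silhouette coefficient peaks at k = 3, the number of groups built into the data, and SSD drops sharply from k = 2 to k = 3 (the elbow). Panel c applies the same two criteria to the real cohort, where they agree on eight clusters.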