Figure 1.
Study design and analysis protocol. (A) Study design and flow diagram. A total of 654 PBMC samples were prospectively collected. Thirty-four CpGs were investigated using multiplex bisulfite sequencing in 341 HBVLD samples (CHB: 168, LC: 173) and 313 early HCC samples (BCLC-0: 148, BCLC-A: 165). Least absolute shrinkage and selection operator (LASSO) regression with cross-validation was used to select six CpGs and build a six-CpG-scorer in the training dataset (n = 442; 212 early HCC and 230 HBVLD). (B) Schematic representation of the statistical analysis. The analyses were performed on 18 selected predictors measured in the 442 patients of the training set. Twenty complete datasets were created by multiple imputation of missing values, and 17 candidate variables were examined for a possible nonlinear relationship with early HCC, of which 7 were transformed nonlinearly. Each of the 20 complete datasets was resampled 500 times, giving a total of 10,000 bootstrap datasets. Backward feature selection based on the Akaike information criterion (AIC) was repeated on each of the 10,000 bootstrap datasets to select the most relevant risk variables for early HCC. The factors chosen at least once during the feature selection procedure constituted the final model, whose coefficients were estimated from the 20 complete datasets using Rubin’s rules. In internal validation, model performance was assessed and corrected for overfitting using 10,000 separate enhanced bootstrap datasets. The final model is presented as a nomogram. Decision curve analysis and clinical impact curves were used to evaluate the clinical usefulness of the final model. The final model also showed clear diagnostic potential in the test set (n = 212; 101 early HCC and 111 HBVLD). HBVLD, HBV-related liver disease; HCC, hepatocellular carcinoma; BCLC, Barcelona Clinic Liver Cancer staging system.
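
To illustrate the pooling step in panel B, the following minimal Python sketch applies Rubin's rules to a single coefficient that has been estimated separately in each imputed dataset. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study; the caption does not describe the software or implementation actually used.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: per-dataset point estimates of the coefficient
    variances: per-dataset squared standard errors of that coefficient
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputed datasets")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical values for illustration only (not study data).
est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In the design described above, `m` would be 20 and the function would be applied once per coefficient of the final model; the pooled standard error is the square root of the returned total variance.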
