2019 Oct 10;47(22):e146. doi: 10.1093/nar/gkz868

Figure 1.

Model schematic and performance. (A) CNN framework for detecting regulatory patterns shared by risk variants that reside in multiple association blocks centered on lead SNPs. In this example, block #1 contains n SNPs, including the lead SNP. We apply k different kernels that learn particular patterns composed of various regulatory features, encompassing DHSs, histone modifications, target gene function and TF binding sites. At this stage, an autoencoder is used for pre-training. In this manner, the first convolution layer scores the n SNPs with k pattern detectors. A second convolution layer then combines the k scores, enabling nonlinear combinatorial modeling of regulatory patterns. The output of the second layer serves as the prediction score for each SNP. The model is trained to maximize the likelihood derived from the block scores, which are obtained by max pooling over the SNP scores within each block. (B) Model performance for rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), Crohn's disease (CD), ulcerative colitis (UC), attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder (ASD), bipolar disorder (BPD), major depressive disorder (MDD) and schizophrenia (SCZ), measured by AUC and F1 score. The red, blue and gray bars represent the original CNN model, a linear model with only one convolution layer, and a model using only the lead SNPs, respectively. Model training and performance evaluation were carried out on the training, validation and testing sets (Supplementary Figure S2 and Supplementary Table S2).
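The scoring pipeline in panel (A) can be sketched in NumPy as below. This is a minimal forward-pass illustration under stated assumptions, not the authors' implementation. Assumptions: each SNP is encoded as a vector of f regulatory features; each of the k first-layer kernels spans all features of one SNP (a width-1 convolution along the SNP axis); a ReLU between the two layers supplies the nonlinearity; the second layer is a linear combination of the k pattern scores; and the block-level likelihood is taken to be Bernoulli with a sigmoid of the max-pooled block score, with hypothetical labels 1 for risk blocks and 0 for control blocks. Autoencoder pre-training and the training loop are omitted, and all function names (`snp_scores`, `block_score`, `log_likelihood`) are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def snp_scores(X, W1, b1, w2, b2):
    """Return one prediction score per SNP in a block (hypothetical sketch).

    X  : (n, f) array, f regulatory features (DHS, histone marks, ...) per SNP
    W1 : (k, f) k pattern-detector kernels of the first convolution layer
    b1 : (k,)   biases of the first layer
    w2 : (k,)   weights of the second layer that combine the k pattern scores
    b2 : float  bias of the second layer
    """
    # Layer 1 (assumed width-1 kernels): every kernel sees all features of a
    # single SNP, giving an (n, k) matrix of pattern scores.
    H = relu(X @ W1.T + b1)
    # Layer 2: combine the k rectified scores of each SNP into one score.
    # The ReLU between the layers makes this combination nonlinear.
    return H @ w2 + b2

def block_score(X, params):
    # Max pooling over the SNPs of a block yields the block score.
    return float(np.max(snp_scores(X, *params)))

def log_likelihood(blocks, labels, params):
    # Assumed Bernoulli likelihood: P(y = 1) = sigmoid(block score).
    # log sigmoid(s) = -log(1 + e^-s) and log(1 - sigmoid(s)) = -log(1 + e^s),
    # computed with logaddexp for numerical stability.
    total = 0.0
    for X, y in zip(blocks, labels):
        s = block_score(X, params)
        total += -np.logaddexp(0.0, -s) if y == 1 else -np.logaddexp(0.0, s)
    return total

# Usage with random, purely illustrative inputs.
rng = np.random.default_rng(0)
n, f, k = 5, 8, 3          # SNPs per block, features per SNP, kernels
params = (rng.normal(size=(k, f)), np.zeros(k), rng.normal(size=k), 0.0)
blocks = [rng.normal(size=(n, f)) for _ in range(4)]
labels = [1, 1, 0, 0]      # hypothetical: 1 = risk block, 0 = control block
print(log_likelihood(blocks, labels, params))
```

Dropping `relu` and the second layer reduces `snp_scores` to a single linear map per SNP, which corresponds roughly to the one-layer linear baseline shown by the blue bars in panel (B).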