Fig. 4. Characteristics of GTEx gene expression prediction models.
In panel (a), we illustrate the proportions of different window sizes among selected PUMICE models. Each boxplot is derived from the percent window composition across 48 GTEx tissues. In panel (b), we show the proportions of different values of the penalty factor among selected PUMICE models. The penalty factor is the tuning parameter that reduces the L1 and L2 penalties for essential predictors that overlap with ENCODE annotations. Each boxplot is derived from the percent penalty factor composition across 48 GTEx tissues. In the boxplots, the lower and upper ends of the whiskers represent the minimum and maximum values (excluding outliers), the bold line in the middle represents the median, and the lower and upper bounds of the box represent the first and third quartiles. In panel (c), we show the distribution of the number of SNPs with non-zero weights in gene expression prediction models across different TWAS methods. The vertical line represents the median number of SNPs with non-zero weights. PUMICE models have the lowest median (n = 13), while UTMOST models have the highest (n = 73). In panel (d), we plot the distribution of the locations of SNPs with non-zero weights (for PrediXcan, EpiXcan, PUMICE, and UTMOST) or of the top 100 SNPs with the highest weights (for FUSION and TIGAR). Variant counts are plotted against their locations relative to the 5’ gene transcription start site (TSS) and the 3’ gene transcription end site (TES) across different TWAS methods.