[Preprint]. 2023 Mar 20:2023.03.14.23287257. [Version 1] doi: 10.1101/2023.03.14.23287257

Fig. 2. Quantifying single-cell HLA expression using a personalized pipeline.

(A) Frequency of each imputed two-field HLA allele across all cohorts. Most common alleles (up to ten) are labeled for each gene, with other alleles grouped into “other.” Alleles with frequency of less than five are not labeled. (B) Boxplot (each observation is one sample) showing percentage change (y-axis) in the estimated UMIs for each HLA gene (x-axis) summed across all cells after quantification with scHLApers (compared to a pipeline using the standard reference genome) in the Synovium dataset (n=69 individuals). The center line of the boxplot represents the median, the lower and upper box limits represent the 25% and 75% quantiles, respectively, the whiskers extend to the box limit ±1.5 × IQR, and outlying points are plotted individually. Plot for all cohorts is in Fig. S4C. Dotted red line denotes no change. (C) Percentage change in estimated expression (y-axis) for three example genes (HLA-DQA1, HLA-DQB1, and HLA-DRB1) per sample as a function of the mean (between the individual’s two alleles) Levenshtein distance relative to the reference allele at the 3’ end (x-axis). Dotted red line denotes no change. Plot for all genes is in Fig. S4D. (D) Comparing the estimated HLA expression as measured using shorter reads (84 bp, y-axis) versus longer reads (289 bp, x-axis) in the standard pipeline (top row) compared to scHLApers (bottom row). Each dot shows the mean log(CP10k+1)-normalized expression across cells for one sample in the PBMC-cultured dataset (n=146 samples from 73 individuals). r is calculated as Pearson correlation; dashed gray line is the identity line. (E) HLA expression in different cell types across cohorts: myeloid (n=145,090 cells), B (n=180,935), T (n=805,389), NK (n=125,865), fibroblasts (n=82,651) and endothelial (n=26,300). Dot size indicates the proportion of cells with nonzero expression; color indicates log(CP10k+1)-normalized expression (mean across cells).
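The quantities named in panels B, C and D can be illustrated with a minimal sketch. This is not the scHLApers implementation: all function names and inputs below are hypothetical, the logarithm is assumed to be the natural log, and CP10k is assumed to mean counts per 10,000 total UMIs in a cell.

```python
import math


def percent_change(personalized_umis, standard_umis):
    """Percentage change in summed UMIs relative to the standard-reference pipeline (panel B)."""
    return 100.0 * (personalized_umis - standard_umis) / standard_umis


def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_allele_distance(allele_1, allele_2, reference):
    """Mean distance of an individual's two alleles from the reference allele (panel C x-axis)."""
    return (levenshtein(allele_1, reference) + levenshtein(allele_2, reference)) / 2


def log_cp10k(gene_umis, total_umis):
    """log(CP10k+1) for one gene in one cell; natural log is an assumption."""
    return math.log(1e4 * gene_umis / total_umis + 1)
```

With these definitions, a gene whose summed UMIs rise from 100 to 120 after personalized quantification shows a 20% change, and a value of 0 corresponds to the dotted red "no change" line in panels B and C.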