Fig. 2. Identification of aberrant basaloid cells in IPF and COPD lungs.
(A) UMAPs of 21,184 epithelial cells from 32 IPF, 18 COPD, and 28 control lungs labeled by cell type (top left), disease status (bottom right), and subject (bottom right). In the subject plot, each color depicts a distinct subject. (B) Boxplots representing the nonzero percent makeup distributions of epithelial cell types as a proportion of all sampled epithelial cells per subject within each disease group. Each dot represents a single subject, and whiskers represent 1.5 × interquartile range (IQR). FDR-adjusted Wilcoxon rank sum test results comparing IPF and control proportions are reported in data S12. (C) Heat map of average gene expression and predicted transcription factor activity per subject across each of the identified epithelial cell types. Columns are hierarchically ordered by disease status and cell type. The average gene expression per subject per cell type is unity normalized between 0 and 1 across samples. Top (green): Transcription factor signatures predicted by analysis with pySCENIC (43); z scores are calculated across samples. Right: Zoom annotation of distinguishing markers for aberrant basaloid cells. (D) IHC staining of aberrant basaloid cells in IPF lungs: epithelial cells covering fibroblast foci are p63+ KRT17+ basaloid cells that stain positive for COX2, p21, and HMGA2, whereas basal cells in bronchi do not. (E) Correlation matrix comparing epithelial cell populations identified and reannotated in an independent dataset (3) with analogous cell types from our data. Matrix cells are colored by Spearman’s rho, and cell populations are ordered with hierarchical clustering. The origin dataset for each cell population is denoted in the annotation bars.
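The two scalings named for panel (C) can be illustrated with a minimal sketch. It assumes a genes × samples matrix of per-subject averages and uses the population standard deviation for the z scores. The function names, the NumPy implementation, and the handling of constant rows are illustrative assumptions, not the authors' code.

```python
import numpy as np

def unity_normalize(avg_expr):
    """Rescale each row (gene) to the range [0, 1] across samples (columns)."""
    avg_expr = np.asarray(avg_expr, dtype=float)
    lo = avg_expr.min(axis=1, keepdims=True)
    span = avg_expr.max(axis=1, keepdims=True) - lo
    span[span == 0] = 1.0  # assumption: constant rows map to 0 rather than NaN
    return (avg_expr - lo) / span

def zscore_rows(activity):
    """Z-score each row (e.g., a transcription factor signature) across samples."""
    activity = np.asarray(activity, dtype=float)
    mu = activity.mean(axis=1, keepdims=True)
    sd = activity.std(axis=1, keepdims=True)  # population SD (ddof=0), an assumption
    sd[sd == 0] = 1.0  # assumption: constant rows map to 0 rather than NaN
    return (activity - mu) / sd
```

Both functions operate row-wise, so each gene or signature is scaled independently of the others and only its relative pattern across samples is retained.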
