Abstract
BACKGROUND
Genomic assays with single-cell resolution (e.g., scRNA-seq) are becoming ubiquitous in biomedical research. Machine learning, including its subfield deep learning, has broad application in scRNA-seq analysis. However, methods to facilitate the classification of cell populations are lacking. We present HD Spot, a novel computational framework that generates interpretable and robust deep learning classifiers, enabling unbiased interrogation of linear and non-linear genomic signatures.
METHODS
HD Spot is written in Python and relies on Google’s TensorFlow 2 deep learning framework. Four immune-cell datasets, generated on the 10x Chromium platform, were obtained from the publicly available Seurat repository. Data were preprocessed using standard Seurat methodology. HD Spot generated optimized classifiers via a custom platform. Network interpretability was achieved using Shapley values. Ontology analysis was performed using Metascape.
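To illustrate the Shapley-value attribution mentioned above, the following is a minimal sketch under stated assumptions, not HD Spot's implementation. The toy model, the all-zero baseline, and the names `shapley_values`, `toy_model`, and `value_fn` are hypothetical. The function computes exact Shapley values by enumerating feature subsets, replacing each "absent" feature with its baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model standing in for a trained classifier's output:
# one additive term and one interaction term.
def toy_model(x):
    return 2.0 * x[0] + x[1] * x[2]


sample = (1.0, 2.0, 3.0)    # the input being explained (illustrative values)
baseline = (0.0, 0.0, 0.0)  # assumed reference input for "absent" features


def value_fn(subset):
    # Features outside the coalition are replaced by their baseline values.
    masked = [sample[j] if j in subset else baseline[j]
              for j in range(len(sample))]
    return toy_model(masked)


phi = shapley_values(value_fn, len(sample))
print(phi)
```

In this sketch the attributions sum to the difference between the model output on the sample and on the baseline, and the two interacting features share their joint contribution equally. Exact enumeration scales exponentially with the number of features, so applying the idea to thousands of genes requires an approximation scheme, which is not specified here.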
RESULTS
HD Spot identified meaningful ontologic signatures across all tested datasets. In the binary case of control versus IFN-β-stimulated CD4+ T cells, gene ontologies reflected Th0 and Th2 T cell populations, congruent with T cell activation. In the 9-class case of PBMCs, HD Spot identified meaningful gene networks characteristic of the ground-truth populations using raw feature counts alone. When feature counts were processed into expression values, HD Spot demonstrated increased specificity of top genes and their respective ontologies between subpopulations.
CONCLUSION
This work introduces a broadly applicable computational tool that allows the advanced bioinformatician to decipher complex cellular heterogeneity (e.g., within tumors) in an unbiased way. Additionally, HD Spot lowers the barrier for novice bioinformaticians to derive actionable insights from their data.
