Abstract
The spatial arrangement of cells is vital in developmental processes and organogenesis in multicellular organisms. Deep learning models trained with spatial omics data uncover complex patterns and relationships among cells, genes, and proteins in a high-dimensional space, providing new insights into biological processes and diseases. State-of-the-art in silico spatial multi-cell gene expression methods use histological images of tissue stained with hematoxylin and eosin (H&E) to characterize cellular heterogeneity. These computational techniques can analyze vast amounts of spatial data in a scalable and automated manner, thereby accelerating scientific discovery and enabling more precise medical diagnostics and treatments.
In this work, we developed a vision transformer (ViT) framework, named SPiRiT (Spatial Omics Prediction and Reproducibility integrated Transformer), to map histological signatures to spatial single-cell transcriptomic signatures. We enhanced the framework by integrating cross-validation with model interpretation during hyperparameter tuning. SPiRiT predicts single-cell spatial gene expression from matched histopathological image tiles of human breast cancer and whole mouse pup, and was evaluated on Xenium (10x Genomics) datasets. Furthermore, ViT model interpretation reveals the high-resolution, high-attention area (HAR) that the ViT model uses to predict gene expression, including marker genes for invasive cancer cells (FASN), stromal cells (POSTN), and lymphocytes (IL7R). In an apples-to-apples comparison with the ST-Net convolutional neural network algorithm, SPiRiT improved predictive accuracy by 40% on a human breast cancer Visium (10x Genomics) dataset. The predicted cancer biomarker genes and their expression levels are highly consistent with the tumor region annotation. In summary, our work highlights the feasibility of inferring spatial single-cell gene expression from tissue morphology across multiple species (human and mouse) and multiple organs (mouse whole-body morphology). Importantly, the combination of model interpretation and a vision transformer is expected to serve as a general-purpose framework for spatial transcriptomics.
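For readers unfamiliar with vision transformers, the following minimal sketch illustrates the generic first step of any ViT: an image tile is split into fixed-size, non-overlapping patches that become the tokens over which attention is computed. It is not SPiRiT's implementation. The function name `split_into_patches`, the grayscale list-of-lists tile representation, and the patch size are assumptions made for illustration only; the abstract does not state the tile size, patch size, or architecture details.

```python
from typing import List

# Hypothetical representation: a grayscale tile as rows of pixel values.
Image = List[List[float]]


def split_into_patches(tile: Image, patch: int) -> List[List[float]]:
    """Split a tile into non-overlapping patch x patch blocks.

    Each block is flattened row by row, and blocks are returned in raster
    order (left to right, top to bottom), as in a standard ViT tokenizer
    before the linear projection.
    """
    height, width = len(tile), len(tile[0])
    if height % patch or width % patch:
        raise ValueError("tile dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([tile[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


# Toy 4x4 tile with pixel values 0..15, split into four 2x2 patches.
tile = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = split_into_patches(tile, patch=2)
```

In a full model, each flattened patch would be linearly projected, combined with a positional embedding, and passed through transformer layers; per-patch attention weights are what make patch-level interpretation, such as a high-attention area, possible.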
Full Text Availability
The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.