2022 Jan 12;23:20. doi: 10.1186/s13059-021-02595-6

Fig. 1.

Overview of the scMVP framework. a Given the scRNA-seq gene expression counts and the TF-IDF-transformed scATAC-seq chromatin accessibility peak profile of each cell as input, scMVP learns the optimal joint embedding for downstream analysis with a multi-view deep generative model. Two independent attention-based channels form the backbone of the encoder and adapt it to the two input modalities: a canonical mask attention subnetwork for scRNA and a transformer-derived self-attention subnetwork for TF-IDF-transformed scATAC. Their outputs are then joined to derive the posterior distribution parameters of the common latent embedding z, which follows a Gaussian mixture model prior. Next, the imputed scRNA and scATAC profiles are reconstructed by an attention-based two-channel decoder network whose structure is similar to that of the encoder. An auxiliary attention module, which takes as input the cluster probability of the common latent embedding z under the prior distribution (denoted p(c|z)), weights each decoder channel of the imputed scRNA and scATAC profiles. The imputed RNA and ATAC values are the means of the Gamma distribution for scRNA data and of the Poisson distribution for scATAC data, respectively. To keep the embeddings of the original and imputed data consistent, two single-channel encoders embed the imputed RNA and ATAC separately, and the KL divergence between the common latent embedding z and each imputed embedding is minimized. b Clustering accuracy (ARI) as the latent embedding dimension varies from 2 to 20. c Running times for training models on the resampled SHARE-seq cell line datasets with 8000 genes and 23,000 peaks. scMVP, scVI, WNN, and cisTopic were tested on a server with one 10-core Intel Xeon E5-2680 CPU, 32 GB RAM, and one NVIDIA 1080 Ti GPU with 11 GB of memory
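Two steps named in the caption can be illustrated with a minimal NumPy sketch: the TF-IDF transform applied to the scATAC peak matrix, and a KL divergence of the kind used for the embedding-consistency term. This is not the scMVP implementation. The function names `tfidf_transform` and `gaussian_kl` are hypothetical, the TF-IDF variant (per-cell frequency times log-scaled inverse peak frequency) is one common choice and may differ from the one the authors used, and the KL term assumes both embeddings are described by diagonal Gaussians, which the caption does not state.

```python
import numpy as np


def tfidf_transform(counts):
    """TF-IDF transform of a (n_cells, n_peaks) non-negative count matrix.

    Assumed variant: TF = count / total counts in the cell,
    IDF = log(1 + n_cells / number of cells in which the peak is open).
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    # Term frequency: normalize each cell (row) by its total count.
    cell_totals = counts.sum(axis=1, keepdims=True)
    tf = counts / np.maximum(cell_totals, 1.0)  # guard against empty cells
    # Inverse document frequency: down-weight peaks open in many cells.
    peak_freq = (counts > 0).sum(axis=0)
    idf = np.log1p(n_cells / np.maximum(peak_freq, 1))  # guard against unused peaks
    return tf * idf


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions.

    Assumption: both embeddings are parameterized by a mean and a log-variance.
    """
    mu_q, logvar_q = np.asarray(mu_q, float), np.asarray(logvar_q, float)
    mu_p, logvar_p = np.asarray(mu_p, float), np.asarray(logvar_p, float)
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(axis=-1)


# Toy example: 2 cells x 2 peaks.
atac = np.array([[1, 0], [1, 1]])
atac_tfidf = tfidf_transform(atac)

# KL between an embedding of the original data and one of the imputed data.
kl_same = gaussian_kl([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
kl_shift = gaussian_kl([1.0], [0.0], [0.0], [0.0])
```

In this sketch the KL term is zero when the two embeddings coincide and grows as their means or variances drift apart, which is the behavior a consistency penalty between the joint embedding and each imputed embedding relies on.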