Figure 2.
Simulation of bulk RNA-seq data from mixtures of cell types. (A) Cell type profiles are generated from a multivariate log-normal distribution, with parameters estimated from pure cell line RNA-seq data. (B) Cell type proportions are generated from a Dirichlet distribution, with parameters estimated from a collection of 16 well-labeled scRNA-seq datasets. (C) Observed RNA-seq data are generated from a Gamma-Poisson compound distribution, which incorporates dispersion. (D) The simulated data closely resemble the real data, as shown by the distributions of gene expression means and dispersions.
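
The generative steps in panels (A) to (C) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `simulate_bulk`, all parameter names, the per-gene form of the dispersion, the rescaling to a fixed library size (`depth`), and the toy parameter values are assumptions made for illustration; in the figure, the log-normal and Dirichlet parameters are estimated from real data rather than set by hand.

```python
import numpy as np

def simulate_bulk(mu_log, cov_log, alpha, dispersion, n_samples, depth=1e6, seed=0):
    """Hypothetical sketch of the simulation in panels (A)-(C).

    mu_log:     (n_types, n_genes) log-scale means per cell type (assumed layout)
    cov_log:    (n_types, n_genes, n_genes) log-scale covariances (assumed layout)
    alpha:      (n_types,) Dirichlet concentration parameters
    dispersion: (n_genes,) dispersion phi, so that Var = mean + phi * mean**2
    """
    rng = np.random.default_rng(seed)
    n_types = mu_log.shape[0]

    # (A) Cell type profiles: exponentiate multivariate normal draws (log-normal).
    profiles = np.stack([
        np.exp(rng.multivariate_normal(mu_log[k], cov_log[k]))
        for k in range(n_types)
    ])  # shape (n_types, n_genes)

    # (B) Cell type proportions: one Dirichlet draw per sample.
    props = rng.dirichlet(alpha, size=n_samples)  # shape (n_samples, n_types)

    # Expected expression of each mixture, rescaled to a common library size
    # (the rescaling step is an assumption, not stated in the caption).
    mix = props @ profiles
    mix = mix / mix.sum(axis=1, keepdims=True) * depth

    # (C) Gamma-Poisson compound: Gamma-distributed rates with mean `mix`,
    # followed by Poisson sampling of the observed counts.
    rates = rng.gamma(shape=1.0 / dispersion, scale=mix * dispersion)
    counts = rng.poisson(rates)  # shape (n_samples, n_genes)
    return counts, props, profiles

# Toy usage with made-up parameters: 3 cell types, 4 genes, 5 samples.
mu_log = np.zeros((3, 4))
cov_log = np.stack([0.1 * np.eye(4)] * 3)
counts, props, profiles = simulate_bulk(
    mu_log, cov_log, alpha=np.ones(3), dispersion=np.full(4, 0.2), n_samples=5
)
print(counts.shape, props.sum(axis=1))
```

With a Gamma shape of 1/phi and a scale of mean times phi, the resulting counts are negative binomial with variance mean + phi * mean^2, which is one common way to embed dispersion in a Gamma-Poisson model.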
