Bioinformatics. 2023 Nov 11;39(12):btad685. doi: 10.1093/bioinformatics/btad685

© The Author(s) 2023. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 4. Random intercept model multisource catalogue. (A) Significance of the ANOVA test (on a negative binomial generalized linear model) for each gene/cell-type pair across datasets. Each dot is a gene/cell-type pair, coloured by cell type. The dashed line represents the significance threshold of 0.05 false-discovery rate. (B-top) A selection of three gene/cell-type pairs that cluster significantly by dataset. (B-bottom) A selection of three pairs in which transcript abundance is consistent across datasets. (C) Cartoon of how the Bayesian random intercept model estimates gene/cell-type transcript abundance, shown for one gene/cell-type pair and four datasets. The four bright densities are the observed data distributions. The four bright dots are the point estimates. The dark distributions are the posterior densities of the estimated group log-mean transcript abundance. The purple dashed line is the posterior density for the group-level log-mean transcript abundance; the brown dashed density is the group-level standard deviation of transcript abundance. The thick line in the histogram represents the mean generated data distribution informed by the log-mean posterior density. In contrast, the thin lines represent the part of the generated data distribution informed by the overdispersion posterior density. (D) The edgeR trend of the tag-wise dispersion, on which estimation shrinkage is based. (E) The association between log-mean and log-overdispersion is linear, compared with the association between log counts per million and coefficient of variation modelled by edgeR (D). Red dots are point estimates, and the ellipse represents the uncertainty described by the posterior distribution (95% credible interval). (F) Marker genes (red-shaded points) have high transcription (x-axis) and low variability across datasets (y-axis; for the comparison of endothelial versus immune, fibroblasts, and epithelial). (G) The variability of gene-transcript abundance across datasets (x-axis) and within datasets (y-axis) are not associated (for endothelial cells).
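
The 0.05 false-discovery-rate threshold in panel A can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg adjustment, which the caption does not name, and it uses made-up p-values. The function name `benjamini_hochberg` and the variable names are hypothetical and are not taken from the article.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the caption only states a 0.05 FDR threshold; BH is used
    here as a common choice, not as the article's confirmed method.
    """
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values stay monotone in the rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per gene/cell-type pair (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(p_values)

# Pairs at or below the threshold would be the ones flagged as varying
# significantly across datasets.
significant = [q <= 0.05 for q in adjusted]
```

With these illustrative inputs, the third and fourth pairs have raw p-values below 0.05 but are not flagged after adjustment, which is why a plot like panel A draws its dashed line on the adjusted scale.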