2025 Jan 2;13:1180. Originally published 2024 Oct 9. [Version 2] doi: 10.12688/f1000research.155230.2

Table 4. Calculation of data characteristics in R.

| Name of data characteristic | Name in matrix (data.prop) summarizing all data characteristics | Calculation in R | Dimension |
|---|---|---|---|
| dat.cpm: counts per million normalized and log-transformed data | | `edgeR::cpm(dat, log = TRUE, prior.count = 1)` | m × n |
| Feature sparsity | `data.prop$P0_feature` | `apply(dat == 0, 1, sum) / ncol(dat)` | m |
| Sample sparsity | `data.prop$P0_sample` | `apply(dat == 0, 2, sum) / nrow(dat)` | n |
| Feature mean abundance | `data.prop$mean_log2cpm` | `apply(dat.cpm, 1, mean, na.rm = T)` | m |
| Feature median abundance | `data.prop$median_log2cpm` | `apply(dat.cpm, 1, median, na.rm = T)` | m |
| Feature variance | `data.prop$var_log2cpm` | `apply(dat.cpm, 1, var)` | m |
| Library size | `data.prop$lib_size` | `colSums(dat)` | n |
| Sample means | `data.prop$sample_means` | `apply(dat, 2, mean)` | n |
| Sample correlation | `data.prop$corr_sample` | `cor(dat, dat, method = "spearman", use = "na.or.complete")` | n × n |
| Feature correlation | `data.prop$corr_feature` | `cor(t(dat), t(dat), method = "spearman", use = "na.or.complete")` | m × m |

Data characteristics that are vectors or matrices. These are used in Table 3 to calculate the final 43 integer value characteristics. Here, dat is an (m × n) matrix, with features as rows and samples as columns.
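As a rough illustration of how the calculations in Table 4 fit together, the sketch below applies them to a small made-up count matrix. The toy values, the use of a plain list as a stand-in for data.prop, and the requireNamespace() guard around the edgeR call are assumptions made for this example and are not taken from the original analysis.

```r
# Toy count matrix: m = 4 features (rows) x n = 3 samples (columns).
# The values are invented for illustration only.
dat <- matrix(
  c(0, 5, 2,
    3, 0, 0,
    1, 1, 4,
    0, 0, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("feature", 1:4), paste0("sample", 1:3))
)

# A plain list is used here as a stand-in for data.prop (assumption).
data.prop <- list()

# Sparsity: proportion of zeros per feature (row) and per sample (column).
data.prop$P0_feature <- apply(dat == 0, 1, sum) / ncol(dat)
data.prop$P0_sample  <- apply(dat == 0, 2, sum) / nrow(dat)

# Library size (total counts) and mean raw count per sample.
data.prop$lib_size     <- colSums(dat)
data.prop$sample_means <- apply(dat, 2, mean)

# Spearman correlation between samples (n x n) and between features (m x m).
data.prop$corr_sample  <- cor(dat, dat, method = "spearman", use = "na.or.complete")
data.prop$corr_feature <- cor(t(dat), t(dat), method = "spearman", use = "na.or.complete")

# The log2-CPM based characteristics need the edgeR package;
# this part is skipped if edgeR is not installed.
if (requireNamespace("edgeR", quietly = TRUE)) {
  dat.cpm <- edgeR::cpm(dat, log = TRUE, prior.count = 1)
  data.prop$mean_log2cpm   <- apply(dat.cpm, 1, mean, na.rm = TRUE)
  data.prop$median_log2cpm <- apply(dat.cpm, 1, median, na.rm = TRUE)
  data.prop$var_log2cpm    <- apply(dat.cpm, 1, var)
}

# Inspect the dimensions of each characteristic.
str(data.prop)
```

The feature-level entries have length m, the sample-level entries have length n, and the two correlation matrices are n × n and m × m, matching the Dimension column of the table.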