Table 4. Calculation of data characteristics in R.
| Name of data characteristic | Name in matrix (data.prop) summarizing all data characteristics | Calculation in R | Dimension |
|---|---|---|---|
| Counts per million normalized and log transformed data (dat.cpm) | | edgeR::cpm (dat, log=TRUE, prior.count = 1) | mxn |
| Feature sparsity | data.prop$P0_feature | apply (dat==0,1,sum)/ncol (dat) | m |
| Sample sparsity | data.prop$P0_sample | apply (dat==0,2,sum)/nrow (dat) | n |
| Feature mean abundance | data.prop$mean_log2cpm | apply (dat.cpm, 1,mean,na.rm=T) | m |
| Feature median abundance | data.prop$median_log2cpm | apply (dat.cpm,1, median,na.rm=T) | m |
| Feature variance | data.prop$var_log2cpm | apply (dat.cpm, 1, var) | m |
| Library size | data.prop$lib_size | colSums (dat) | n |
| Sample means | data.prop$sample_means | apply (dat,2,mean) | n |
| Sample correlation | data.prop$corr_sample | cor (dat, dat, method="spearman",use="na.or.complete") | nxn |
| Feature correlation | data.prop$corr_feature | cor(t (dat), t (dat), method="spearman",use="na.or.complete") | mxm |
The data characteristics in this table are vectors or matrices. They are used in Table 3 to calculate the final 43 integer-valued characteristics. Here, dat is an (mxn) matrix with features as rows and samples as columns.
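
As a minimal sketch, the calculations in this table can be assembled as follows. The toy count matrix and the choice to store data.prop as a list are assumptions made for illustration only. The fallback branch is a plain base-R log2-CPM with a pseudocount, used only when edgeR is not installed; it approximates edgeR::cpm and does not reproduce its values exactly.

```r
# Toy count matrix (made-up values): m = 4 features (rows) x n = 3 samples (columns).
dat <- matrix(c(0, 5, 10,
                3, 0,  0,
                8, 2,  1,
                0, 0,  4),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))

# Log2-CPM as in the table if edgeR is available.
# Otherwise use a simple pseudocount-based log2-CPM (an approximation only).
if (requireNamespace("edgeR", quietly = TRUE)) {
  dat.cpm <- edgeR::cpm(dat, log = TRUE, prior.count = 1)
} else {
  dat.cpm <- log2(t(t(dat + 1) / (colSums(dat) + 2)) * 1e6)
}

# Collect the vector- and matrix-valued characteristics.
# Storing them in a list is an assumption; the table only gives the element names.
data.prop <- list(
  P0_feature     = apply(dat == 0, 1, sum) / ncol(dat),      # length m
  P0_sample      = apply(dat == 0, 2, sum) / nrow(dat),      # length n
  mean_log2cpm   = apply(dat.cpm, 1, mean, na.rm = TRUE),    # length m
  median_log2cpm = apply(dat.cpm, 1, median, na.rm = TRUE),  # length m
  var_log2cpm    = apply(dat.cpm, 1, var),                   # length m
  lib_size       = colSums(dat),                             # length n
  sample_means   = apply(dat, 2, mean),                      # length n
  corr_sample    = cor(dat, dat, method = "spearman",
                       use = "na.or.complete"),              # n x n
  corr_feature   = cor(t(dat), t(dat), method = "spearman",
                       use = "na.or.complete")               # m x m
)

# Inspect the structure and dimensions of each characteristic.
str(data.prop)
```

Because features are rows, row-wise summaries (margin 1) have length m and column-wise summaries (margin 2) have length n, which matches the Dimension column above.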