Table 5. Final integer-valued data characteristics and their calculation in R.
| Name of data characteristic | Calculation in R |
|---|---|
| Number of features | nrow (dat) |
| Number of samples | ncol (dat) |
| Sparsity of data set | sum (dat==0)/length (dat) |
| Median of data set | median (dat,na.rm=TRUE) |
| 95th Quantile | quantile (dat,probs=.95) |
| 99th Quantile | quantile (dat,probs=.99) |
| Mean library size | mean (colSums (dat),na.rm = T) |
| Median library size | median (colSums (dat),na.rm = T) |
| Standard deviation library size | sd (colSums (dat),na.rm = T) |
| Coefficient of variation of library size | sd (colSums (dat),na.rm = T)/mean (colSums (dat),na.rm = T)*100 |
| Maximum library size | max (colSums (dat),na.rm = T) |
| Minimum library size | min (colSums (dat),na.rm = T) |
| Read depth range between samples | diff (range (colSums (dat),na.rm = T)) |
| Mean sample richness | mean (colSums (dat>0), na.rm=T) |
| Spearman correlation library size with P0*(sample) | cor (data.prop$P0_sample, data.prop$lib_size, method="spearman") |
| Bimodality of feature correlations | bimodalIndex (matrix (data.prop$corr_feature,nrow=1))$BI |
| Bimodality of sample correlations | bimodalIndex (matrix (data.prop$corr_sample,nrow=1))$BI |
| Mean of all feature means | mean (data.prop$mean_log2cpm,na.rm=T) |
| SD of all feature means | sd (data.prop$mean_log2cpm,na.rm=T) |
| Median of all feature medians | median (data.prop$median_log2cpm,na.rm=T) |
| SD of all feature medians | sd (data.prop$median_log2cpm,na.rm=T) |
| Mean of all feature variances | mean (data.prop$var_log2cpm,na.rm=T) |
| SD of all feature variances | sd (data.prop$var_log2cpm,na.rm=T) |
| Mean of all sample means | mean (data.prop$sample_means,na.rm=T) |
| SD of all sample means | sd (data.prop$sample_means,na.rm=T) |
| Mean of sample correlation matrix | mean (data.prop$corr_sample,na.rm=T) |
| SD of sample correlation matrix | sd (data.prop$corr_sample,na.rm=T) |
| Mean of feature correlation matrix | mean (data.prop$corr_feature,na.rm=T) |
| SD of feature correlation matrix | sd (data.prop$corr_feature,na.rm=T) |
| Mean-Variance relation: Linear component | res <- lm(y~x+I(x^2),data=data.frame(y=data.prop$var_log2cpm,x=data.prop$mean_log2cpm)); res$coefficients[2] |
| Mean-Variance relation: Quadratic component | res <- lm(y~x+I(x^2),data=data.frame(y=data.prop$var_log2cpm,x=data.prop$mean_log2cpm)); res$coefficients[3] |
| Slope feature sparsity vs. feature mean | res <- lm(y~slope,data=data.frame (slope=data.prop$P0_feature-1,y=data.prop$mean_log2cpm)); res$coefficients[2] |
| Clustering of features | coef.hclust (hcluster (dat.tmp)) |
| Clustering of samples | coef.hclust (hcluster(t (dat.tmp))) |
| Sample sparsity | apply (dat==0,2,sum)/nrow (dat) |
| Library sizes | colSums (dat) |
| Mean read depths | apply (dat,2,mean) |
| Feature sparsity | apply (dat==0,1,sum)/ncol (dat) |
| Feature mean intensity | apply (dat.cpm, 1,mean) |
| Feature median intensity | apply (dat.cpm,1, median) |
| Feature variances | apply (dat.cpm, 1, var) |
| Sample correlations | cor (dat, dat, method="spearman") |
| Feature correlations | calc_feature_corr(dat) |
| Mean inverse Simpson diversity | mean (vegan::diversity (dat, index = "invsimpson"),na.rm=T) |
| Mean Pielou evenness | pilou <- function(count.data) { shannon_div <- vegan::diversity (count.data, index = "shannon"); richness <- apply (count.data, 1, function(x) sum(x > 0,na.rm = T)); shannon_div/log (richness) }; mean (pilou (dat),na.rm=T) |
| Mean Bray-Curtis dissimilarity | mean (vegan::vegdist (dat, method = "bray"),na.rm=T) |
List of the 43 integer-valued characteristics and their calculation in R. P0 refers to ‘Percent of zeros’.
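
As a minimal sketch of how a few of the tabulated expressions behave, the following base-R snippet evaluates them on a small made-up count matrix with features in rows and samples in columns. The toy values, the list name `chars`, and in particular the log2-CPM transformation with a pseudocount of 1 are assumptions made for illustration only; the table does not specify how `dat.cpm` or `data.prop` are constructed.

```r
# Toy count matrix: 4 features (rows) x 3 samples (columns); values are made up.
dat <- matrix(
  c(0, 5, 10,
    3, 0,  7,
    0, 0,  2,
    1, 4,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("feature", 1:4), paste0("sample", 1:3))
)

lib_size <- colSums(dat)  # library sizes per sample: 4, 9, 20

# A few scalar characteristics, using the expressions from the table.
chars <- list(
  n_features    = nrow(dat),
  n_samples     = ncol(dat),
  sparsity      = sum(dat == 0) / length(dat),
  mean_lib_size = mean(lib_size, na.rm = TRUE),
  cv_lib_size   = sd(lib_size, na.rm = TRUE) / mean(lib_size, na.rm = TRUE) * 100,
  depth_range   = diff(range(lib_size, na.rm = TRUE)),
  mean_richness = mean(colSums(dat > 0), na.rm = TRUE)
)

# ASSUMPTION: log2 counts-per-million with a pseudocount of 1.
# t(dat) / lib_size divides each sample (row of t(dat)) by its library size.
dat.cpm <- log2(t(t(dat) / lib_size) * 1e6 + 1)

# Per-feature mean and variance, then the quadratic mean-variance fit.
mean_log2cpm <- apply(dat.cpm, 1, mean)
var_log2cpm  <- apply(dat.cpm, 1, var)
res <- lm(y ~ x + I(x^2), data = data.frame(y = var_log2cpm, x = mean_log2cpm))

linear_component    <- res$coefficients[2]
quadratic_component <- res$coefficients[3]
```

With only four features the quadratic fit is nearly saturated, so the coefficients here carry no meaning beyond showing which element of `res$coefficients` corresponds to which table row.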