Table 3. Data set overview.
| Data set | Features | Samples | Normalisation method |
|---|---|---|---|
| Leukemia MILE study | 67191 | 2095 | 1 |
| Normal human hematopoiesis with AMLs | 67191 | 296 | 1,7 |
| Immgen Key populations | 47273 | 256 | 2 |
| AML versus normal | 67191 | 252 | 3 |
| AML TCGA data set | 67191 | 244 | 1 |
| AML TCGA data set versus normal | 67191 | 244 | 3 |
| AML Normal Karyotype | 54675 | 234 | 1 |
| AML Normal Karyotype versus normal | 67191 | 234 | 3 |
| Normal human hematopoiesis (DMAP) | 35459 | 211 | 4 |
| Immgen abT cells | 47273 | 190 | 2 |
| Immgen Dendritic cells | 47273 | 151 | 2 |
| Immgen MFs Monocytes Neutrophils | 47273 | 114 | 2 |
| Immgen B cells | 47273 | 103 | 2 |
| Normal human hematopoiesis (HemaExplorer) | 57270 | 77 | 5 |
| Immgen gdT cells | 47273 | 76 | 2 |
| Immgen Stem and progenitor cells | 47273 | 76 | 2 |
| Mouse normal hematopoietic system | 57613 | 67 | 4 |
| Immgen Activated T cells | 47273 | 55 | 2 |
| Immgen NK cells | 47273 | 47 | 2 |
| Immgen Stromal cells | 47273 | 39 | 2 |
| Mouse normal (RNA seq) | 45426 | 52 | 6 |
| BloodPool | 67191 | 2120 | 1,7 |
| BloodPool versus normal | 67191 | 2076 | 3,7 |
Normalisation method legend:
1 Each cancer sample is normalised together with a set of samples from sorted normal myeloid populations. All samples were normalised using RMA. Comparison of gene expression values with other data sets in BloodSpot is not possible.
2 All samples from the ImmGen data sets were normalised together with RMA. Samples were subsequently attributed to the different data sets in BloodSpot. This means that comparison of gene expression values is possible across all ImmGen data sets.
3 The data are normalised according to Rapin et al. Briefly, each cancer sample is normalised together with a set of samples from sorted normal myeloid populations. Next, using a PCA-based method, the 5 normal samples closest to the cancer sample are averaged, and this computed normal sample is then compared to the cancer sample, allowing computation of gene expression fold changes. See Supplementary Methods and Rapin et al. (10).
4 All samples were normalised using RMA. Comparison of gene expression values with other data sets in BloodSpot is not possible.
5 See our previous work (Bagger et al. (3)).
6 The data were processed using the bcbio nextgen RNA-seq pipeline. Count data were subsequently processed with DESeq2's variance stabilising transformation method.
7 The data were batch corrected using ComBat, with study number as the batch variable.
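
As a rough illustration of normalisation method 3, the sketch below projects one cancer sample and a set of normal samples into PCA space, averages the 5 nearest normal samples, and subtracts that average from the cancer sample to obtain per-gene fold changes. It is not the implementation of Rapin et al. The function name `fold_change_vs_nearest_normals`, the use of NumPy, the number of principal components, the Euclidean distance in PCA space, and the assumption that expression values are already jointly normalised and on a log2 scale are all choices made here for illustration only.

```python
import numpy as np


def fold_change_vs_nearest_normals(cancer, normals, n_components=2, k=5):
    """Hypothetical helper, not the published method.

    cancer  : array of shape (genes,), log2 expression of one cancer sample
    normals : array of shape (n_normals, genes), log2 expression of normal samples
    Returns (log2 fold change per gene, indices of the k nearest normal samples).
    """
    cancer = np.asarray(cancer, dtype=float)
    normals = np.asarray(normals, dtype=float)

    # Stack normals and the cancer sample, then centre each gene.
    data = np.vstack([normals, cancer])
    centered = data - data.mean(axis=0)

    # PCA via SVD: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T

    # Distance from the cancer sample to every normal sample in PCA space
    # (Euclidean distance is an assumption of this sketch).
    distances = np.linalg.norm(scores[:-1] - scores[-1], axis=1)
    nearest = np.argsort(distances)[:k]

    # Average the k nearest normals into one computed normal sample.
    reference = normals[nearest].mean(axis=0)

    # On a log2 scale, the fold change is a difference.
    return cancer - reference, nearest


if __name__ == "__main__":
    # Toy data: two groups of normal samples and one cancer sample that
    # resembles the first group except for an elevated third gene.
    group_a = [[5 + 0.01 * i, 5, 5, 5] for i in range(6)]
    group_b = [[9 + 0.01 * i, 9, 5, 5] for i in range(6)]
    log2_fc, used = fold_change_vs_nearest_normals(
        [5, 5, 7, 5], np.array(group_a + group_b)
    )
    print("nearest normal samples:", sorted(used.tolist()))
    print("log2 fold changes:", log2_fc)
```

Restricting the comparison to the nearest normal samples means each cancer sample is compared against the normal populations it most resembles, rather than against an average of all normal samples.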