Table 6.
GEO ID | Disease | Data set names/data retrieval methods | Data normalization timing | Data normalization methods |
---|---|---|---|---|
GSE46579 | AD | GSE46579_AD_ngs_data_summarized.xls.gz | before FE | zero mean/variance is one |
GSE37472 | carcinoma | getGEO | before FE | zero mean/variance is one |
GSE49823 | CAD | getGEO | after FE | zero mean/variance is one ∗ |
GSE43329 | NPC | getGEO | before FE | zero mean/variance is one + |
GSE50013 | HCC | getGEO | before FE # | zero mean/variance is one ∗ |
GSE41922 | BC | GSE41922_series_matrix.txt.gz | after FE | zero mean/variance is one ∗ |
GSE49665 | AML | getGEO | after FE | zero mean/variance is one ∗ |
∗No normalization for SVM/lasso; +no normalization for SVM with PCA-based FE; #after FE for PCA-based LDA with universal features. All normalizations were sample-based; i.e., each sample was normalized to have both zero mean and unit variance. AD, Alzheimer disease; CAD, coronary artery disease; NPC, nasopharyngeal carcinoma; HCC, hepatocellular carcinoma; BC, breast cancer; AML, acute myeloid leukemia. The data set names/data retrieval methods column gives either the name of the file used for the analysis or the retrieval method. getGEO indicates that individual sample profiles whose file names started with “GEO” were downloaded by the getGEO command in R.
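
As an illustration of the sample-based normalization described in the legend, the following is a minimal Python sketch, not the code used in the study. It rescales each sample's profile independently to zero mean and unit variance. The function names and the toy profiles are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the legend does not specify which was used.

```python
from statistics import fmean, pstdev


def normalize_sample(values):
    """Rescale one sample's expression values to zero mean and unit variance."""
    mean = fmean(values)
    # Assumption: population standard deviation; the source does not say which.
    sd = pstdev(values, mu=mean)
    if sd == 0:
        raise ValueError("a constant profile cannot be scaled to unit variance")
    return [(v - mean) / sd for v in values]


def normalize_samples(profiles):
    """Apply normalize_sample to every sample independently (sample-based)."""
    return {sample_id: normalize_sample(vals) for sample_id, vals in profiles.items()}


# Hypothetical toy profiles (sample ID -> expression values), not data from the study.
profiles = {
    "sample_1": [1.0, 2.0, 3.0, 4.0],
    "sample_2": [10.0, 10.5, 12.0, 15.5],
}
normalized = normalize_samples(profiles)
```

Whether this step is applied before or after FE (the timing column) only changes which features are passed in; the per-sample computation itself stays the same.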