2020 Nov 26;10(4):247. doi: 10.3390/jpm10040247

Table 1.

Details of the feature selection approaches followed by each study prior to the machine learning experiments. The “Data/Instances” column summarises the initial number of instances before further filtering, together with details of the datasets used by each study. The “Features” column gives the initial number of features in each dataset before feature selection and after quality control (the latter applies only to studies using genotype data). The “Epistasis” column indicates whether the study incorporated epistatic events (i.e., multi-locus interactions) as a feature selection method. The “Cis-Regulatory Elements” column marks studies that filtered their initial dataset to include only non-coding regulatory regions. Studies that used prior ALS-related information (e.g., already known ALS-associated SNPs/genes, known functional information, or filtering based on the p-value of an ALS versus control genotype-phenotype association analysis) to select and reduce their initial instance and feature space are indicated under “ALS-Linked Knowledge”. Lastly, the “ML Methods” column lists the machine learning methods that were used to select only highly informative features based on specific criteria. ML: Machine Learning, SNP: Single Nucleotide Polymorphism, MDR: Multifactor Dimensionality Reduction, CNN: Convolutional Neural Network, PPIs: Protein-Protein Interactions, DHS: DNase I hypersensitive sites, TFBS: Transcription Factor Binding Sites, PCA: Principal Component Analysis, t-SNE: t-distributed Stochastic Neighbor Embedding, UMAP: Uniform Manifold Approximation and Projection.

| Study | Data/Instances | Features | Genomic Structure | Epistasis | Cis-Regulatory Elements | ALS-Linked Knowledge | ML Methods |
|---|---|---|---|---|---|---|---|
| Vitsios et al. [103] | 18,626 coding genes (label: positive/unlabelled) | 1,249 gene-annotations: generic, disease- and tissue-specific features | No | No | No | Yes | PCA, t-SNE, UMAP |
| Yousefian et al. [104] | 8,697,640 SNP p-values of 14,791 ALS cases and 26,898 controls [36] | 2,252 functional features: DHS mapping data, histone modifications, target gene functions, and TFBS | Yes | No | Yes | Yes | None |
| Bean et al. [105] | ALS-linked gene lists: DisGeNet: 101 genes, ALSoD: 126 genes, ClinVar: 44 genes, Manual list: 40 genes, Union: 199 genes | PPIs, disease-gene associations and functional annotations | No | No | No | Yes | None |
| Yin et al. [90] | 4,511 cases and 7,397 controls [16] | 823,504 SNPs from 7, 9, 17 and 22 chromosomes | Yes | No | Yes | Yes | CNN |
| Kim et al. [91] | SNP pairwise interactions | 550,000 SNPs of 276 cases/271 controls and 211 cases/211 controls [76,106] | No | Yes | No | Yes | MDR |
| Greene et al. [107] | SNP pairwise interactions | 210,382 SNPs of 276 cases/271 controls and 211 cases/211 controls [76,106] | No | Yes | No | No | MDR |
| Sha et al. [108] | SNP pairwise interactions | 555,352 SNPs of 276 cases/271 controls [76] | No | No | No | Yes | None |
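
To give a sense of scale for the rows whose instances are SNP pairwise interactions, the following minimal sketch counts the unordered SNP pairs implied by the SNP totals listed in the table. It is an illustration only and assumes an exhaustive, unfiltered pairwise search; the studies themselves may have reduced the SNP set before evaluating pairs, and the helper name `pairwise_count` is hypothetical.

```python
from math import comb

# SNP totals copied from the "Features" column of Table 1.
snp_counts = {
    "Kim et al. [91]": 550_000,
    "Greene et al. [107]": 210_382,
    "Sha et al. [108]": 555_352,
}


def pairwise_count(n_snps: int) -> int:
    """Number of unordered SNP pairs among n_snps SNPs (n choose 2)."""
    return comb(n_snps, 2)


# Assumption: every SNP is paired with every other SNP, with no pre-filtering.
for study, n in snp_counts.items():
    print(f"{study}: {n:,} SNPs -> {pairwise_count(n):,} candidate pairs")
```

Even for the smallest of these SNP sets, the candidate pairs number in the tens of billions, which is why feature selection matters so much before any pairwise analysis.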