Table 1.
| Phenotype | Method | Main software packages | Analysis of entire GWAS datasets | Advantages | Disadvantages |
|---|---|---|---|---|---|
| Binary traits | Stepwise logistic regression (19) | Orange (20), WEKA (21), stats^a, MASS^a | Limited to candidate variants | Results can be easily interpreted | Results can be negatively influenced by collinearity; computationally intensive; R implementations^a require advanced computer skills |
| | LASSO (22) | Orange (20), PLINK (23), HyperLASSO (24), glmnet^a, lars^a, penalized^a, ldlasso^a, scikit-learn^b | Yes (HyperLASSO); otherwise limited to candidate variants | Fast computation; internal cross-validation (CV) to select the optimal λ parameter | Does not necessarily yield good results in the presence of high collinearity or when the number of variants exceeds the number of samples; R^a, Python^b, and PLINK implementations require advanced computer skills |
| | Elastic net (25) | elasticnet^a, glmnet^a, scikit-learn^b | Limited to candidate variants | Combines the strengths of LASSO and Ridge regression (26), overcoming issues due to collinearity and an unbalanced variants-to-samples ratio | Requires advanced computer skills |
| | BOSS (27) | BOSS | Limited to candidate variants | Works properly even when the number of features exceeds the number of samples | Computationally intensive; requires advanced computer skills |
| | BoNB (28) | BoNB | Yes | Fast computation; robust to linkage disequilibrium (LD) between variants | Requires advanced computer skills |
| | Classification trees (29) | Orange (20), WEKA (21), rpart^a, tree^a, scikit-learn^b | Limited to candidate variants | Fast computation; easy to interpret | May not perform well in the presence of complex interactions; overfitting may lead to instability; R^a and Python^b implementations require advanced computer skills |
| | Random forest (30) | Orange (20), WEKA (21), randomForest^a, randomForestSRC^a, scikit-learn^b, RFF (31) | Yes (RFF); otherwise limited to candidate variants | Robust to noise; fast computation | Results are difficult to interpret; R^a, Python^b, and RFF implementations require advanced computer skills |
| | ABACUS (32) | ABACUS^a | Limited to candidate regions mapping to specific pathways | Able to simultaneously consider common and rare variants and different directions of genotype effect | Requires advanced computer skills |
| Time to event | Stepwise Cox proportional hazards model | survival^a, MASS^a | Limited to candidate variants | Results can be easily interpreted | Results can be negatively influenced by collinearity; computationally intensive; requires advanced computer skills |
| | LASSO (22) | glmnet^a, penalized^a, coxnet^a | Limited to candidate variants | Fast computation; internal CV to select the optimal λ parameter | Does not necessarily yield good results in the presence of high collinearity or when the number of variants exceeds the number of samples; requires advanced computer skills |
| | Elastic net (25) | coxnet^a | Limited to candidate variants | Combines the strengths of LASSO and Ridge regression (26), overcoming issues due to collinearity and an unbalanced variants-to-samples ratio | Requires advanced computer skills |
| | Classification (survival) trees (29) | rpart^a | Limited to candidate variants | Fast computation; easy to interpret | May not perform well in the presence of complex interactions; overfitting may lead to instability; requires advanced computer skills |
| | Random forest (30) | randomForestSRC^a | Limited to candidate variants | Robust to noise; fast computation | Results are difficult to interpret; requires advanced computer skills |
| Quantitative traits | Stepwise linear regression | stats^a, MASS^a | Limited to candidate variants | Results can be easily interpreted | Results can be negatively influenced by collinearity; computationally intensive; requires advanced computer skills |
| | LASSO (22) | Orange (20), PLINK (23), HyperLASSO (24), glmnet^a, lars^a, penalized^a, ldlasso^a, scikit-learn^b | Yes (HyperLASSO); otherwise limited to candidate variants | Fast computation; internal CV to select the optimal λ parameter | Does not necessarily yield good results in the presence of high collinearity or when the number of variants exceeds the number of samples; R^a, Python^b, and PLINK implementations require advanced computer skills |
| | Elastic net (25) | elasticnet^a, glmnet^a, scikit-learn^b | Limited to candidate variants | Combines the strengths of LASSO and Ridge regression (26), overcoming issues due to collinearity and an unbalanced variants-to-samples ratio | Requires advanced computer skills |
| | GUESS (33) | GUESS/R2GUESS^a | Yes | Fast parallel computation | Requires advanced computer skills |
| | Regression trees (29) | Orange (20), rpart^a, tree^a, scikit-learn^b | Limited to candidate variants | Fast computation; easy to interpret | May not perform well in the presence of complex interactions; overfitting may lead to instability; R^a and Python^b implementations require advanced computer skills |
| | Random forest (30) | Orange (20), randomForest^a, randomForestSRC^a, scikit-learn^b, RFF (31) | Yes (RFF); otherwise limited to candidate variants | Robust to noise; fast computation | Results are difficult to interpret; R^a, Python^b, and RFF implementations require advanced computer skills |
Phenotype, distribution of the dependent variable; Method, algorithm or approach; Main software packages, main software, packages, or functions implementing the described method; Analysis of entire GWAS datasets, whether the method can be applied to whole GWAS data; Advantages, advantages of the method; Disadvantages, disadvantages of the method. Brief usage sketches for some of the scikit-learn implementations listed above are shown after the footnotes.
^a R package.
^b Python package.
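
As a concrete illustration of the LASSO rows for binary traits, the sketch below fits an L1-penalized logistic regression on a genotype matrix coded 0/1/2 using scikit-learn, one of the packages listed in the table, with internal cross-validation to select the penalty strength (scikit-learn's `C` is the inverse of λ). This is a minimal sketch under simulated placeholder data, not a full analysis pipeline; a real run would load candidate-variant genotypes and case/control labels.

```python
# Minimal sketch: LASSO (L1-penalized) logistic regression for a binary trait,
# using scikit-learn as listed in Table 1. Genotypes and labels are simulated
# placeholders; substitute a real candidate-variant genotype matrix.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
n_samples, n_variants = 500, 200
X = rng.integers(0, 3, size=(n_samples, n_variants)).astype(float)  # additive 0/1/2 coding
y = rng.integers(0, 2, size=n_samples)                              # case/control labels

# Internal cross-validation over a grid of penalty strengths
# (scikit-learn's C corresponds to 1/λ).
model = LogisticRegressionCV(
    Cs=20, cv=5, penalty="l1", solver="liblinear", scoring="roc_auc", max_iter=1000
)
model.fit(X, y)

selected = np.flatnonzero(model.coef_[0])  # variants with non-zero coefficients
print(f"Best C (1/λ): {model.C_[0]:.4g}; {selected.size} variants retained")
```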
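For the elastic net rows, the table notes that the method combines the LASSO and Ridge penalties through a mixing parameter. The hedged sketch below applies scikit-learn's ElasticNetCV to a simulated quantitative phenotype with more variants than samples, cross-validating jointly over the mixing parameter (`l1_ratio`) and the penalty strength (`alpha`); all data are placeholders.

```python
# Minimal sketch: elastic net regression for a quantitative trait with
# scikit-learn (listed in Table 1). The l1_ratio grid spans near-Ridge (0.1)
# to pure-LASSO (1.0) mixing; both l1_ratio and alpha are chosen by internal
# cross-validation. Data are simulated placeholders.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
n_samples, n_variants = 400, 1000               # more variants than samples
X = rng.integers(0, 3, size=(n_samples, n_variants)).astype(float)
beta = np.zeros(n_variants)
beta[:10] = rng.normal(0.5, 0.1, size=10)       # a few truly associated variants
y = X @ beta + rng.normal(size=n_samples)       # quantitative phenotype

model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], n_alphas=50, cv=5, max_iter=5000)
model.fit(X, y)

selected = np.flatnonzero(model.coef_)
print(f"l1_ratio={model.l1_ratio_}, alpha={model.alpha_:.4g}, "
      f"{selected.size} variants with non-zero coefficients")
```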
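For the random forest rows, the sketch below trains scikit-learn's RandomForestClassifier on the same kind of 0/1/2 genotype matrix and ranks variants by impurity-based feature importance, one common way to recover a variant ranking from an ensemble whose individual predictions are otherwise hard to interpret. The data and the choice of 500 trees are illustrative assumptions, not recommendations from the cited methods.

```python
# Minimal sketch: random forest for a binary trait with scikit-learn (Table 1),
# ranking variants by impurity-based feature importance. Data are simulated
# placeholders; a real analysis might prefer permutation importance and more
# trees for stable rankings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n_samples, n_variants = 600, 300
X = rng.integers(0, 3, size=(n_samples, n_variants)).astype(float)
# Phenotype driven by the first two variants plus noise (placeholder signal).
y = (X[:, 0] + X[:, 1] + rng.normal(scale=2.0, size=n_samples) > 2).astype(int)

forest = RandomForestClassifier(n_estimators=500, n_jobs=-1, oob_score=True, random_state=0)
forest.fit(X, y)

top = np.argsort(forest.feature_importances_)[::-1][:10]
print(f"Out-of-bag accuracy: {forest.oob_score_:.3f}")
print("Top-ranked variant indices:", top.tolist())
```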