Front Cardiovasc Med. 2016 Jun 8;3:17. doi: 10.3389/fcvm.2016.00017

Table 1.

Summary of the main multivariate methods for common variants analysis.

| Phenotype | Method | Main software packages | Analysis of entire GWAS datasets | Advantages | Disadvantages |
|---|---|---|---|---|---|
| Binary traits | Stepwise logistic regression (19) | Orange (20), WEKA (21), statsᵃ, MASSᵃ | Limited to candidate variants | Results can be easily interpreted | Results may be negatively influenced by collinearity; computationally intensive; R implementationsᵃ require advanced computer skills |
| | LASSO (22) | Orange (20), PLINK (23), HyperLASSO (24), glmnetᵃ, larsᵃ, penalizedᵃ, ldlassoᵃ, scikit-learnᵇ | Yes (HyperLASSO); otherwise limited to candidate variants | Fast computation; internal CV to learn the optimal λ parameter | May perform poorly in the presence of high collinearity and when the number of variants exceeds the number of samples; Rᵃ, Pythonᵇ, and PLINK implementations require advanced computer skills |
| | Elastic net (25) | elasticnetᵃ, glmnetᵃ, scikit-learnᵇ | Limited to candidate variants | Combines the strengths of LASSO and ridge regression (26), overcoming issues due to collinearity and to an unbalanced variants/samples ratio | Requires advanced computer skills |
| | BOSS (27) | BOSS | Limited to candidate variants | Works properly even when the number of features exceeds the number of samples | Computationally intensive; requires advanced computer skills |
| | BoNB (28) | BoNB | Yes | Fast computation; robust to LD between variants | Requires advanced computer skills |
| | Classification trees (29) | Orange (20), WEKA (21), rpartᵃ, treeᵃ, scikit-learnᵇ | Limited to candidate variants | Fast computation; easy to interpret | May not perform well in the presence of complex interactions; overfitting may lead to instability; Rᵃ and Pythonᵇ implementations require advanced computer skills |
| | Random forest (30) | Orange (20), WEKA (21), randomForestᵃ, randomForestSRCᵃ, scikit-learnᵇ, RFF (31) | Yes (RFF); otherwise limited to candidate variants | Robust to noise; fast computation | Results are difficult to interpret; Rᵃ, Pythonᵇ, and RFF implementations require advanced computer skills |
| | ABACUS (32) | ABACUSᵃ | Candidate regions mapping to specific pathways | Able to simultaneously consider common and rare variants and different directions of genotype effect | Requires advanced computer skills |
| Time to event | Stepwise Cox proportional hazards model | survivalᵃ, MASSᵃ | Limited to candidate variants | Results can be easily interpreted | Results may be negatively influenced by collinearity; computationally intensive; requires advanced computer skills |
| | LASSO (22) | glmnetᵃ, penalizedᵃ, coxnetᵃ | Limited to candidate variants | Fast computation; internal CV to learn the optimal λ parameter | May perform poorly in the presence of high collinearity and when the number of variants exceeds the number of samples; requires advanced computer skills |
| | Elastic net (25) | coxnetᵃ | Limited to candidate variants | Combines the strengths of LASSO and ridge regression (26), overcoming issues due to collinearity and to an unbalanced variants/samples ratio | Requires advanced computer skills |
| | Classification (survival) trees (29) | rpartᵃ | Limited to candidate variants | Fast computation; easy to interpret | May not perform well in the presence of complex interactions; overfitting may lead to instability; requires advanced computer skills |
| | Random forest (30) | randomForestSRCᵃ | Limited to candidate variants | Robust to noise; fast computation | Results are difficult to interpret; requires advanced computer skills |
| Quantitative traits | Stepwise linear regression | statsᵃ, MASSᵃ | Limited to candidate variants | Results can be easily interpreted | Results may be negatively influenced by collinearity; computationally intensive; requires advanced computer skills |
| | LASSO (22) | Orange (20), PLINK (23), HyperLASSO (24), glmnetᵃ, larsᵃ, penalizedᵃ, ldlassoᵃ, scikit-learnᵇ | Yes (HyperLASSO); otherwise limited to candidate variants | Fast computation; internal CV to learn the optimal λ parameter | May perform poorly in the presence of high collinearity and when the number of variants exceeds the number of samples; Rᵃ, Pythonᵇ, and PLINK implementations require advanced computer skills |
| | Elastic net (25) | elasticnetᵃ, glmnetᵃ, scikit-learnᵇ | Limited to candidate variants | Combines the strengths of LASSO and ridge regression (26), overcoming issues due to collinearity and to an unbalanced variants/samples ratio | Requires advanced computer skills |
| | GUESS (33) | GUESS/R2GUESSᵃ | Yes | Fast parallel computation | Requires advanced computer skills |
| | Regression trees (29) | Orange (20), rpartᵃ, treeᵃ, scikit-learnᵇ | Limited to candidate variants | Fast computation; easy to interpret | May not perform well in the presence of complex interactions; overfitting may lead to instability; Rᵃ and Pythonᵇ implementations require advanced computer skills |
| | Random forest (30) | Orange (20), randomForestᵃ, randomForestSRCᵃ, scikit-learnᵇ, RFF (31) | Yes (RFF); otherwise limited to candidate variants | Robust to noise; fast computation | Results are difficult to interpret; Rᵃ, Pythonᵇ, and RFF implementations require advanced computer skills |

Phenotype, distribution of the dependent variable; Method, algorithm or statistical method; Main software packages, main software tools, packages, or functions implementing the method; Analysis of entire GWAS datasets, whether the method can be applied to complete GWAS datasets; Advantages, advantages of the method; Disadvantages, disadvantages of the method; CV, cross-validation; LD, linkage disequilibrium.

ᵃ R package.

ᵇ Python package.
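The LASSO and elastic net rows of Table 1 refer to penalized regression, in which the penalty weight λ is tuned by internal cross-validation. The following is a minimal, illustrative pure-Python sketch for a quantitative trait, not the implementation used by any listed package. It fits the glmnet-style objective (1/2n)‖y − β₀ − Xβ‖² + λ[α‖β‖₁ + (1 − α)/2 ‖β‖₂²] by cyclic coordinate descent. Here α = 1 gives the LASSO and 0 < α < 1 the elastic net. It then picks λ from a user-supplied grid by k-fold CV.

Several choices here are assumptions made for the example: the function names, the deterministic fold assignment (i mod k), the toy genotype matrix, and the convergence settings. None of them reproduces the internals of glmnet, scikit-learn, or the other packages in the table. For binary traits, the same penalty would be applied to the logistic log-likelihood instead of the squared error.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
                   alpha: float = 1.0, max_iter: int = 1000,
                   tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Minimise (1/2n)||y - b0 - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    by cyclic coordinate descent. alpha=1 is the LASSO; 0<alpha<1 is the elastic net."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre genotypes and trait so that the (unpenalised) intercept drops out.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    r = [yi - y_mean for yi in y]  # current residuals (all coefficients start at 0)
    v = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Inner product of variant j with the partial residual that excludes variant j.
            z = sum(Xc[i][j] * r[i] for i in range(n)) / n + v[j] * beta[j]
            denom = v[j] + lam * (1.0 - alpha)
            new = soft_threshold(z, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta  # keep residuals in sync with beta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return intercept, beta


def cv_select_lambda(X, y, lambdas, alpha=1.0, k=5):
    """Return (lambda, CV mean squared error) for the grid value with the lowest error.
    Fold membership is i % k, a simplification of random fold assignment."""
    n = len(X)
    best_lam, best_mse = None, float("inf")
    for lam in lambdas:
        sq_err = 0.0
        for fold in range(k):
            train = [i for i in range(n) if i % k != fold]
            b0, beta = elastic_net_cd([X[i] for i in train], [y[i] for i in train],
                                      lam, alpha)
            for i in range(fold, n, k):  # held-out samples of this fold
                pred = b0 + sum(xj * bj for xj, bj in zip(X[i], beta))
                sq_err += (y[i] - pred) ** 2
        mse = sq_err / n
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam, best_mse


# Hypothetical toy data: 8 samples, 3 variants (0 or 2 minor alleles), trait driven by variant 0.
X_demo = [[a, b, c] for a in (0, 2) for b in (0, 2) for c in (0, 2)]
y_demo = [2.0 * row[0] for row in X_demo]

if __name__ == "__main__":
    b0, beta = elastic_net_cd(X_demo, y_demo, lam=0.1, alpha=1.0)
    print(round(b0, 6), [round(b, 6) for b in beta])
    print(cv_select_lambda(X_demo, y_demo, [1.0, 0.1, 0.01], alpha=0.5, k=4))
```

Because the toy design is balanced, the non-causal variants receive coefficients of exactly zero. The causal one is shrunk from 2 to 2 − λ under the LASSO. This shrinkage is the price paid for automatic variant selection.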
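The classification-tree rows describe recursive partitioning of the kind implemented, for example, in rpart. The core step is choosing the variant and genotype cut-point that best separates cases from controls. The sketch below illustrates only that single step. It uses Gini impurity, the default splitting index for classification in rpart. It does not grow a full tree, prune it, or handle missing genotypes, and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels (e.g. 0 = control, 1 = case)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(genotypes: Sequence[Sequence[int]],
               labels: Sequence[int]) -> Optional[Tuple[int, int, float]]:
    """Search every variant j and genotype threshold t for the binary split
    'genotype <= t' vs 'genotype > t' that minimises the size-weighted Gini
    impurity of the two child nodes. Returns (j, t, impurity) or None."""
    n = len(labels)
    best = None
    for j in range(len(genotypes[0])):
        observed = sorted({row[j] for row in genotypes})
        for t in observed[:-1]:  # the largest observed value cannot split the node
            left = [labels[i] for i in range(n) if genotypes[i][j] <= t]
            right = [labels[i] for i in range(n) if genotypes[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical example: 6 subjects, 2 variants coded 0/1/2, labels 1 = case.
G_demo = [[0, 0], [0, 2], [1, 0], [1, 2], [2, 0], [2, 2]]
labels_demo = [0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(best_split(G_demo, labels_demo))
```

On the toy data, the best split is variant 0 at threshold 0, i.e. non-carriers versus carriers, and it separates cases from controls perfectly. Repeating this search inside each child node grows the tree. Without pruning or a minimum node size, the tree can fit noise, which is the source of the instability noted in the table.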
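The time-to-event rows rely on the Cox proportional hazards model. Its coefficients are estimated by maximizing the log partial likelihood ℓ(β) = Σ_{i: δ_i = 1} [x_iᵀβ − log Σ_{j: t_j ≥ t_i} exp(x_jᵀβ)]. Stepwise Cox regression compares fits of this quantity with and without each variant. The penalized versions (LASSO, elastic net) add an L1 or elastic-net penalty to its negative.

The sketch below only evaluates ℓ(β) for a given coefficient vector. In this sketch, tied event times are handled with the Breslow approximation. The function name, its arguments, and the toy data are assumptions for illustration, not the interface of survival, glmnet, or coxnet.

```python
import math
from typing import Sequence


def cox_log_partial_likelihood(beta: Sequence[float],
                               X: Sequence[Sequence[float]],
                               times: Sequence[float],
                               events: Sequence[int]) -> float:
    """Log partial likelihood of a Cox proportional hazards model.

    X[i]: covariates of subject i (e.g. genotypes coded 0/1/2)
    times[i]: follow-up time; events[i]: 1 if the event was observed, 0 if censored.
    Ties are handled with the Breslow approximation (risk set = all t_j >= t_i).
    """
    # Linear predictor x_i' beta for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:  # censored subjects contribute only through the risk sets
            risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            ll += eta[i] - math.log(risk)
    return ll


if __name__ == "__main__":
    # Hypothetical data: 3 subjects, one variant, all events observed.
    print(cox_log_partial_likelihood([0.0], [[0], [1], [2]], [1.0, 2.0, 3.0], [1, 1, 1]))
```

With β = 0 every subject has the same hazard. The three events therefore contribute −log 3, −log 2, and −log 1, for a total of −log 6. A censored subject adds no term of its own but still enlarges the risk sets of earlier events.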