| Algorithm 1: 1D-SRA |
|---|
| **input:** feature data X (N × P); target class Y (N × 1) |
| **constants:** P = 11,915,233 // total number of features (SNPs) |
| S = 250 // number of features per reduced model |
| K = 47,660 // number of reduced models |
| **step A1:** generate reduced data sets and assess the performance of the reduced models |
| shuffle the vector of all P features |
| for each reduced model k = 1 to K // over reduced models |
| randomly sample S features without replacement from the shuffled vector |
| fit a multinomial logistic regression model to the N individuals described by the S features (dependent variable: target class; independent variables: features) |
| model_performance_k = cross-entropy loss of the fitted model |
| for each feature s = 1 to S // over features |
| feature_performance_k_s = max of the SNP's effect estimates across target classes |
| feature_index_k_s = ID of feature s in the feature vector |
| end for |
| end for |
| **step A2:** generate the model performance matrix C |
| initialize C as a K × (P + 1) matrix of zeros |
| for each reduced model k = 1 to K // over reduced models |
| C[k, P + 1] = model_performance_k |
| for each feature s = 1 to S // over features |
| C[k, feature_index_k_s] = feature_performance_k_s |
| end for |
| end for |
| **step A3:** supervised rank aggregation based on matrix C |
| fit an LMM (dependent variable: model performance C[:, P + 1]; independent variables: feature performance C[:, 1:P]) |
| collect the vector of P LMM parameter estimates, one per feature |
| **step A4:** feature selection based on the LMM parameter estimates |
| partition the LMM parameter estimates into 2 clusters using 1D k-means clustering |
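Steps A1 and A2 can be sketched at toy scale as follows. Note that K · S = 47,660 × 250 = 11,915,000 ≤ P = 11,915,233, so consecutive slices of one shuffled feature vector can supply a disjoint feature set to every reduced model (233 features are left over); the sketch assumes this reading of "without replacement". A dense C would have about 5.7 × 10^11 cells (roughly 4.5 TB as float64) while only K · S = 11,915,000 entries are nonzero, so the sketch stores each row sparsely as a dict. The names `build_performance_matrix`, `FitFn` and `toy_fit` are hypothetical, and `toy_fit` is a random stand-in that returns a loss and per-class effects without fitting any regression; a real multinomial logistic regression would go in its place.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical fit signature: feature ids -> (cross-entropy loss, per-class effects per feature)
FitFn = Callable[[Sequence[int]], Tuple[float, Dict[int, List[float]]]]


def build_performance_matrix(P: int, S: int, K: int, fit: FitFn, seed: int = 0):
    """Steps A1-A2: returns sparse rows C[k, 1:P] and the column C[:, P + 1]."""
    if K * S > P:
        raise ValueError("K * S > P: disjoint feature sets are impossible")
    rng = random.Random(seed)
    features = list(range(P))
    rng.shuffle(features)  # shuffle the vector of all P features
    rows: List[Dict[int, float]] = []  # nonzero entries of C[k, 1:P], keyed by feature index
    perf: List[float] = []  # C[k, P + 1]: model performance (cross-entropy loss)
    for k in range(K):
        # Consecutive slices of one shuffled vector = random sampling without
        # replacement, within and (by assumption) across reduced models.
        subset = features[k * S:(k + 1) * S]
        loss, effects = fit(subset)
        perf.append(loss)
        # feature performance = max of the feature's effect estimates across classes
        rows.append({j: max(effects[j]) for j in subset})
    return rows, perf


def toy_fit(feature_ids: Sequence[int], n_classes: int = 3):
    """Random stand-in for a multinomial logistic regression fit; NOT a real model."""
    rng = random.Random(min(feature_ids))
    effects = {j: [rng.gauss(0.0, 1.0) for _ in range(n_classes - 1)] for j in feature_ids}
    return rng.uniform(0.5, 1.5), effects


rows, perf = build_performance_matrix(P=1_000, S=25, K=40, fit=toy_fit)
```

Feature indices in this sketch are 0-based, and the separate `perf` list plays the role of the pseudocode's 1-based column C[:, P + 1].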
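Step A3 does not state the LMM's exact form. Since P far exceeds K, one plausible reading, assumed here, treats the feature effects as i.i.d. random effects: y = 1μ + Cb + e with b ~ N(0, σ_b² I), e ~ N(0, σ_e² I), and a variance ratio λ = σ_e² / σ_b² taken as known (a real analysis would estimate the variance components, e.g. by REML, which this sketch does not do). The BLUP is then b̂ = C'(CC' + λI)⁻¹(y − 1μ̂). When no two reduced models share a feature, CC' is diagonal, so both the GLS intercept and the BLUP have closed forms. The function name `blup_feature_effects` and the parameter `lam` are hypothetical.

```python
from typing import Dict, List, Tuple


def blup_feature_effects(
    rows: List[Dict[int, float]], perf: List[float], lam: float
) -> Tuple[float, Dict[int, float]]:
    """Assumed model: y = 1*mu + C b + e, b ~ N(0, s_b^2 I), e ~ N(0, s_e^2 I),
    with lam = s_e^2 / s_b^2 treated as known. Requires disjoint rows."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    seen: set = set()
    for r in rows:
        if seen.intersection(r):
            raise ValueError("reduced models share features; CC' is not diagonal")
        seen.update(r)
    # Var(y) is proportional to CC' + lam*I, which is diagonal for disjoint rows.
    w = [1.0 / (sum(c * c for c in r.values()) + lam) for r in rows]
    # GLS estimate of the intercept: weighted mean of model performance.
    mu = sum(wk * yk for wk, yk in zip(w, perf)) / sum(w)
    b: Dict[int, float] = {}
    for r, wk, yk in zip(rows, w, perf):
        resid = wk * (yk - mu)  # k-th entry of (CC' + lam*I)^{-1} (y - 1*mu)
        for j, c in r.items():
            b[j] = c * resid  # b_hat = C' (CC' + lam*I)^{-1} (y - 1*mu)
    return mu, b


mu, b = blup_feature_effects([{0: 1.0, 1: 2.0}, {2: 1.0}], [3.0, 1.0], lam=1.0)
```

Features that appear in no reduced model are absent from `b`, which corresponds to a BLUP of 0 (the prior mean). If reduced models overlapped, CC' + λI would have to be solved as a full K × K system instead.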
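For step A4, k-means with k = 2 on a single dimension can be solved exactly rather than with Lloyd's iterations: an optimal partition is always a split of the sorted values, so scanning every split point with prefix sums finds the global minimum of the within-cluster sum of squares. Which of the two clusters is retained is not covered by the steps as given, so this sketch (the function name `two_means_1d` is hypothetical) only returns a 0/1 label per estimate, with label 1 marking the cluster of larger values.

```python
from typing import List, Sequence


def two_means_1d(values: Sequence[float]) -> List[int]:
    """Exact 2-cluster k-means on a line via a scan over sorted split points.
    Returns a 0/1 label per value; 1 marks the cluster with the larger mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    order = sorted(range(n), key=lambda i: values[i])
    xs = [values[i] for i in order]
    pre, pre2 = [0.0], [0.0]  # prefix sums of x and x^2 over the sorted values
    for x in xs:
        pre.append(pre[-1] + x)
        pre2.append(pre2[-1] + x * x)

    def sse(a: int, b: int) -> float:
        # within-cluster sum of squares of xs[a:b]
        s, s2, m = pre[b] - pre[a], pre2[b] - pre2[a], b - a
        return s2 - s * s / m

    best = min(range(1, n), key=lambda t: sse(0, t) + sse(t, n))
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = 1 if rank >= best else 0
    return labels


labels = two_means_1d([0.1, 5.0, 0.2, 5.1, -0.1])
```

The scan costs O(P log P) for sorting plus O(P) for the split search. The prefix-sum form of the sum of squares can lose precision when values are large relative to their spread; centering the estimates first mitigates this.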