2025 Aug 18;26(16):7961. doi: 10.3390/ijms26167961
Algorithm 1: 1D-SRA
input:   Feature data X (N x P)
     Target class Y (N x 1)

constants: P = 11,915,233 // total number of features (SNPs)
     S = 250 // reduced number of features
     K = 47,660 // number of reduced models

step A1: generate reduced data sets and assess performance of reduced models

shuffle the vector of all P features

for each reduced model k in K // over reduced models
  randomly sample S features without replacement from the
  shuffled vector

  fit a multinomial logistic regression model to N individuals
  described by S features (dependent variable–target class,
  independent variables–features)

  collect model_performance_k = cross-entropy loss

  for each s in S // over features
    feature_performance_k_s = max(SNP effect estimate)
    feature_index_k_s = feature Id from feature vector
  end for
end for
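
Step A1 can be illustrated with a minimal sketch on toy data. Several things here are assumptions and not taken from the algorithm: the toy sizes, the random genotype matrix, the gradient-descent fitter `fit_softmax` (a hypothetical helper standing in for whatever multinomial logistic regression routine is actually used), and the reading of "max(SNP effect estimate)" as the maximum coefficient across classes. The sketch also assumes that the shuffled vector is cut into K disjoint chunks of S features, which would be consistent with K x S = 11,915,000 being just under P, although the pseudocode does not state this explicitly.

```python
import numpy as np

def softmax_probs(X, W, b):
    """Class probabilities of a multinomial logistic model."""
    Z = X @ W + b
    Z = Z - Z.max(axis=1, keepdims=True)      # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.1, n_iter=200):
    """Illustrative gradient-descent fit; returns coefficients and loss."""
    n, s = X.shape
    W = np.zeros((s, n_classes))              # one effect per feature and class
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                  # one-hot targets
    for _ in range(n_iter):
        G = (softmax_probs(X, W, b) - Y) / n  # gradient w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    probs = softmax_probs(X, W, b)
    loss = -np.mean(np.log(probs[np.arange(n), y] + 1e-12))  # cross-entropy
    return W, loss

# Toy stand-ins for N, P, S (assumed values, far smaller than in the paper)
rng = np.random.default_rng(0)
N, P, S, n_classes = 60, 40, 5, 3
K = P // S                                    # number of reduced models
X = rng.integers(0, 3, size=(N, P)).astype(float)  # toy genotypes 0/1/2
y = rng.integers(0, n_classes, size=N)

shuffled = rng.permutation(P)                 # shuffle all P feature indices
results = []
for k in range(K):
    idx = shuffled[k * S:(k + 1) * S]         # S features, no reuse (assumption)
    W, loss = fit_softmax(X[:, idx], y, n_classes)
    feature_perf = W.max(axis=1)              # max effect across classes (assumption)
    results.append((idx, feature_perf, loss))
```

Each entry of `results` holds the feature indices, the per-feature performance, and the cross-entropy loss of one reduced model, which is what step A2 consumes.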

step A2: generate model performance matrix C

generate a model performance matrix of K rows and P + 1 columns filled with zeros

for each k in K // over reduced models
  C[k,P + 1] = model_performance_k

  for each s in S // over features
    C[k,feature_index_k_s] = feature_performance_k_s
  end for
end for
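
The construction of C in step A2 can be sketched as follows. The inputs `feature_index`, `feature_perf` and `model_perf` are hypothetical placeholders for the outputs of step A1, and the sizes are toy values. Note that with the constants above a dense C would have roughly 5.7 x 10^11 entries, so a real run would presumably need sparse storage; the dense array here is only for illustration.

```python
import numpy as np

K, P, S = 4, 12, 3                            # toy sizes (assumed)
rng = np.random.default_rng(1)

# Hypothetical step A1 outputs: indices, feature scores, model losses
feature_index = rng.permutation(P).reshape(K, S)
feature_perf = rng.normal(size=(K, S))
model_perf = rng.uniform(0.5, 1.5, size=K)

# K rows, P + 1 columns, filled with zeros
C = np.zeros((K, P + 1))

# Python is 0-based: column index P is the pseudocode's column P + 1
C[:, P] = model_perf

# Scatter each model's S feature scores into that model's row
rows = np.repeat(np.arange(K), S)
C[rows, feature_index.ravel()] = feature_perf.ravel()
```

Every row of `C` then has exactly S non-zero feature entries plus the model performance in the last column.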

step A3: supervised rank aggregation based on matrix C

run LMM (dependent variable–model performance C[:,P + 1], independent variable–feature performance C[:,1:P])

collect 1:P vector of LMM parameter estimates
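
One way to picture step A3 is the sketch below. It assumes that LMM denotes a linear mixed model with a fixed intercept and feature effects treated as random with a common variance; the pseudocode does not specify the model structure or how variance components are estimated. Under that assumption, and with the variance ratio `lam` fixed by hand (a hypothetical parameter), the predicted feature effects only require solving K x K systems, which matters when P is much larger than K. The function name `blup_feature_effects` is illustrative.

```python
import numpy as np

def blup_feature_effects(C, lam=1.0):
    """Feature effects for y = mu + Z u + e with u ~ N(0, s_u^2 I),
    e ~ N(0, s_e^2 I) and lam = s_e^2 / s_u^2 fixed (assumption)."""
    y = C[:, -1]                              # model performance
    Z = C[:, :-1]                             # feature performance, K x P
    K = Z.shape[0]
    V = Z @ Z.T + lam * np.eye(K)             # K x K, covariance up to scale
    Vinv_y = np.linalg.solve(V, y)
    Vinv_1 = np.linalg.solve(V, np.ones(K))
    mu = Vinv_y.sum() / Vinv_1.sum()          # generalized least squares intercept
    u = Z.T @ (Vinv_y - mu * Vinv_1)          # one estimate per feature
    return u, mu

# Toy matrix C (assumed sizes and values)
rng = np.random.default_rng(2)
K, P, S = 6, 20, 3
C = np.zeros((K, P + 1))
idx = rng.permutation(P)[:K * S].reshape(K, S)
for k in range(K):
    C[k, idx[k]] = rng.normal(size=S)
C[:, P] = rng.uniform(0.5, 1.5, size=K)

u, mu = blup_feature_effects(C, lam=0.5)
```

With a fixed `lam` these estimates coincide with ridge regression on the centred response, and features that were never sampled receive an estimate of exactly zero.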

step A4: feature selection based on LMM parameter estimates

define 2 clusters of LMM parameter estimates based on 1D-K-means clustering
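
The clustering in step A4 can be sketched with a plain Lloyd iteration in one dimension. The initialisation at the minimum and maximum, the function name `kmeans_1d_two_clusters`, and the toy estimates are assumptions; the excerpt does not say which of the two clusters is retained, so the sketch only returns labels and centres.

```python
import numpy as np

def kmeans_1d_two_clusters(values, n_iter=100):
    """Two-cluster k-means on a 1-D vector (illustrative)."""
    v = np.asarray(values, dtype=float)
    centers = np.array([v.min(), v.max()])    # initial centres (assumption)
    labels = np.zeros(v.shape, dtype=int)
    for _ in range(n_iter):
        # Assign each value to the nearer centre
        labels = (np.abs(v - centers[1]) < np.abs(v - centers[0])).astype(int)
        # Recompute centres; keep the old one if a cluster is empty
        new_centers = np.array([
            v[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in (0, 1)
        ])
        if np.allclose(new_centers, centers):
            break                             # converged
        centers = new_centers
    return labels, centers

# Toy LMM parameter estimates (assumed values)
estimates = [0.01, -0.02, 0.0, 0.03, 0.9, 1.1, -0.01]
labels, centers = kmeans_1d_two_clusters(estimates)
```

In this toy input the two large estimates end up in cluster 1 and the near-zero estimates in cluster 0.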