2022 Oct 12;11(10):1495. doi: 10.3390/biology11101495
Algorithm 1: Steps to implement the proposed algorithm
  1. Load the data set into R and assign classes 1 and 0 to the two selected groups of cells to form a binary classification problem.
  2. Shuffle the cells within each class to randomize their order.
  3. Remove genes with no variability in expression across all cells.
  4. Split the data set into training (90%) and test (10%) sets for 10-fold cross-validation.
     (a) Fit ridge, lasso, elastic net, and drop lasso.
     (b) Find the top important genes from each method. The top genes are those whose coefficients are above a cutoff (the mean of the absolute values of the coefficients).
     (c) Form a gene pool by taking the union of the top important genes from the 4 models; for instance, Figure 3 and Figure 4 represent the gene pools of data sets GSE123818 and GSE71585, respectively.
     (d) Fit SGL with the new gene pool, pre-grouped by hierarchical clustering.
     (e) Save the coefficients of SGL.
     (f) Repeat steps (a) through (e) for each of the 10 folds.
  5. Calculate the average coefficient of each gene across the 10 folds and sort the genes.
  6. Plot the genes against their coefficients and select the final set of genes using an elbow curve.
  7. Cluster all the cells by applying K-means clustering to the top important genes.
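
The bookkeeping in steps 3, 4(b), 4(c), and 5 can be illustrated with a minimal sketch. It is written in Python with the standard library only, purely for illustration; the algorithm itself is run in R, and the model fitting (ridge, lasso, elastic net, drop lasso, SGL), hierarchical clustering, and K-means are not shown. The function names, the toy data, and three choices are assumptions rather than details from the source: the cutoff is compared against the absolute value of each coefficient, a gene missing from a fold's SGL fit is treated as having coefficient 0, and genes are sorted by decreasing absolute average coefficient.

```python
from statistics import mean


def drop_constant_genes(expr):
    """Step 3: keep only genes whose expression varies across cells.

    expr maps a gene name to its expression values, one per cell.
    """
    return {g: v for g, v in expr.items() if max(v) != min(v)}


def top_genes(coefs):
    """Step 4(b): genes whose |coefficient| exceeds the mean |coefficient|.

    Comparing the absolute value against the cutoff is an assumption.
    """
    cutoff = mean(abs(c) for c in coefs.values())
    return {g for g, c in coefs.items() if abs(c) > cutoff}


def gene_pool(model_coefs):
    """Step 4(c): union of the top genes over all fitted models.

    model_coefs maps a model name to a {gene: coefficient} dict.
    """
    pool = set()
    for coefs in model_coefs.values():
        pool |= top_genes(coefs)
    return pool


def rank_genes(fold_coefs):
    """Step 5: average each gene's coefficient over the folds and sort.

    fold_coefs is a list with one {gene: coefficient} dict per fold.
    Assumptions: a gene absent from a fold counts as 0, and the sort is
    by decreasing absolute average (ties broken by gene name).
    """
    n_folds = len(fold_coefs)
    genes = set().union(*fold_coefs)
    avg = {g: sum(f.get(g, 0.0) for f in fold_coefs) / n_folds for g in genes}
    return sorted(avg.items(), key=lambda kv: (-abs(kv[1]), kv[0]))


# Toy data (hypothetical): four genes measured in four cells.
expr = {"g1": [0, 0, 0, 0], "g2": [1, 0, 3, 2], "g3": [5, 5, 5, 5], "g4": [0, 2, 0, 1]}
variable = drop_constant_genes(expr)

# Hypothetical coefficients from two of the four models in one fold.
model_coefs = {
    "ridge": {"g2": 0.5, "g4": -0.1, "g5": 0.05},
    "lasso": {"g2": 0.0, "g4": -0.9, "g5": 0.0},
}
pool = gene_pool(model_coefs)

# Hypothetical SGL coefficients saved from two folds.
fold_coefs = [{"g2": 0.4, "g4": -0.8}, {"g2": 0.6}]
ranking = rank_genes(fold_coefs)

print(sorted(variable), sorted(pool), ranking)
```

In a full run, `gene_pool` would be called once per fold on the four fitted models, and `rank_genes` would receive the ten SGL coefficient vectors saved in step 4(e); the sorted output is what the elbow curve in step 6 would be drawn from.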