Abstract
Single-cell RNA sequencing (scRNA-seq) provides a high throughput, quantitative and unbiased framework for scientists in many research fields to identify and characterize cell types within heterogeneous cell populations from various tissues. However, scRNA-seq based identification of discrete cell-types is still labor intensive and depends on prior molecular knowledge. Artificial intelligence has provided faster, more accurate, and user-friendly approaches for cell-type identification. In this review, we discuss recent advances in cell-type identification methods using artificial intelligence techniques based on single-cell and single-nucleus RNA sequencing data in vision science. The main purpose of this review paper is to assist vision scientists not only to select suitable datasets for their problems, but also to be aware of the appropriate computational tools to perform their analysis. Developing novel methods for scRNA-seq data analysis remains to be addressed in future studies.
Index Terms: Artificial Intelligence, Single-cell RNA Sequencing, Vision Science, Review
1. Introduction
Single-cell RNA sequencing (scRNA-seq) technologies were established in the 1990s [1]. These technologies provide a powerful way for researchers to efficiently resolve the phenotypic heterogeneity of complex cell populations in biological samples [2].
The utilization of scRNA-seq technology has profoundly transformed the way vision research is conducted. It has enabled large-scale and reliable characterization of various cell types in the ocular tissues resulting in a growing number of ocular (in particular, retinal) cell atlases from different species. These atlases have provided new insights into several rare and common eye diseases in two major ways. First, they have revealed cell-class and cell-type specific expression patterns for many disease-associated genes, and second, they have pinpointed similarities and differences among human and other animals’ ocular cell types validating their use in vision research.
Traditional scRNA-seq cell-type identification methods comprise two stages. First, cells are clustered through an unsupervised approach; second, the clusters are annotated based on canonical markers found among the differentially expressed genes of each cluster [3–5]. In these methods, cluster annotation is very time-consuming because it requires broad knowledge of the marker-gene literature and forces decisions on whether to lump or split clusters that show overlapping expression of marker genes.
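The annotation stage described above can be illustrated with a minimal sketch. It assumes that clustering has already produced a mean expression profile per cluster; the marker panel, the expression values, and the function name `annotate_clusters` are hypothetical and serve only as an illustration, not as the procedure of any cited study.

```python
from statistics import mean

# Hypothetical marker panel; a real panel would be curated from the literature.
MARKERS = {
    "rod photoreceptor": ["RHO", "NRL"],
    "Muller glia": ["GLUL", "RLBP1"],
}

def annotate_clusters(cluster_means, markers):
    """Give each cluster the cell type whose markers have the highest mean expression."""
    labels = {}
    for cluster_id, expr in cluster_means.items():
        scores = {
            cell_type: mean(expr.get(gene, 0.0) for gene in genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster_id] = max(scores, key=scores.get)
    return labels

# Toy per-cluster mean expression values (made up for illustration).
cluster_means = {
    0: {"RHO": 8.1, "NRL": 5.2, "GLUL": 0.3, "RLBP1": 0.1},
    1: {"RHO": 0.4, "NRL": 0.2, "GLUL": 6.7, "RLBP1": 4.9},
}
annotation = annotate_clusters(cluster_means, MARKERS)
print(annotation)
```

In practice, the difficulty noted above arises when two candidate cell types obtain similar scores for the same cluster, which is exactly the lump-or-split decision that a simple maximum cannot resolve.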
In recent years, another class of single-cell data analysis models has been introduced to automatically identify clusters (e.g., cell types) without manual annotation. These models are trained on annotated scRNA-seq atlases and assign labels to unlabeled cells based on the labels of the cells in the atlases. Most of these models use feature learning based on annotated training data (e.g., related reference atlases). Feature learning comprises a set of techniques that allow a model to automatically discover, from raw data, the representations appropriate for a classification task. These approaches share a common purpose, accurate cell annotation; however, they differ in their algorithms and in their use of prior knowledge [6, 7]. They are useful when the user has access to large and well-annotated datasets (atlases) [7–9].
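As a rough illustration of atlas-based annotation, the sketch below transfers labels from a reference to query cells with a nearest-centroid rule. This is a deliberately simple stand-in for the learned models discussed in this review; the cosine similarity measure, the `min_similarity` threshold, and the toy expression vectors are all assumptions made for the example.

```python
import math
from collections import defaultdict

def centroids(ref_cells, ref_labels):
    """Average the expression vectors of reference cells that share a label."""
    groups = defaultdict(list)
    for vec, label in zip(ref_cells, ref_labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def transfer_labels(query_cells, ref_centroids, min_similarity=0.5):
    """Label each query cell with its most similar centroid, or 'unassigned'."""
    out = []
    for cell in query_cells:
        label, sim = max(
            ((lab, cosine(cell, c)) for lab, c in ref_centroids.items()),
            key=lambda pair: pair[1],
        )
        out.append(label if sim >= min_similarity else "unassigned")
    return out

# Toy reference atlas (three genes per cell) and three query cells.
ref_cells = [[9, 1, 0], [8, 2, 0], [0, 1, 9], [1, 0, 8]]
ref_labels = ["rod", "rod", "bipolar", "bipolar"]
query = [[7, 1, 1], [0, 2, 7], [0, 0, 0]]
predicted = transfer_labels(query, centroids(ref_cells, ref_labels))
print(predicted)
```

The "unassigned" outcome reflects a practical point: a query cell whose type is absent from the atlas should not be forced into the closest available label.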
scRNA-seq technology has been used in numerous research domains, including vision science and ophthalmology. For example, scRNA-seq was applied to retinal tissues for cell identification as early as 2015 [10]. Shortly after, several scRNA-seq approaches were applied to various tissues of the visual system, including the retina [11–28], outflow pathways [29–31], cornea [31–34], iris [31, 35], and ciliary body [31, 35, 36]. The eye can be used as a “window to the brain” because its structures are more accessible and can be visualized more readily with various imaging technologies than the brain proper. However, identifying the eye’s cell types is highly challenging due to the specialized/hybrid nature of many cells that facilitate vision. Additionally, the numerous cell types of the eye collectively make it a structure of interest for investigating different health conditions. Light enters the eye and reaches the retina by passing through the cornea and lens, which together focus the light on the retina. The muscles of the iris, located posterior to the cornea and anterior to the lens, regulate the amount of light that can enter the eye. The ciliary body and trabecular meshwork control intraocular pressure by regulating the production and drainage of the aqueous humor, respectively. The retinal pigment epithelial (RPE) cells are located adjacent to the photoreceptor cells lining the back of the eye and serve to recycle visual pigment and provide nutrients to retinal photoreceptors. All these ocular tissues have major clinical impacts: for instance, lens opacification leads to cataracts, elevated intraocular pressure (IOP) may lead to glaucoma, and dysfunction of the retinal pigment epithelium may eventually lead to age-related macular degeneration (AMD). According to the WHO report, approximately 33.6 million people aged 50 years and older are blind worldwide [37].
The leading identified causes of blindness are cataracts (~15.2 million individuals), glaucoma (~3.6 million individuals), and age-related macular degeneration (AMD) (~1.8 million individuals) [38]. Over the past few years, a substantial amount of scRNA-seq data has been generated from different eye tissues corresponding to these and other ocular conditions using various cell type and subtype identification approaches. While important and promising, these findings would be made even more useful by benchmarking different artificial intelligence approaches to determine which methods are better suited to solve a particular problem or identify specific cell types. Additionally, a benchmarking review would help investigators who are new to single-cell and vision research choose the most appropriate tools, thus avoiding potential confusion and selection by trial and error.
In this paper, we provide several taxonomies based on scRNA-seq datasets (atlases), well-established artificial intelligence approaches, and ocular diseases. First, we review the published scRNA-seq datasets related to the visual system. We then review the artificial intelligence approaches applied to broad cell type and subtype identification. We discuss the analysis pipeline and provide visualization of the scRNA-seq results. Furthermore, we review the identification of ocular diseases based on scRNA-seq data. Finally, we benchmark different artificial intelligence approaches on three vision-related datasets, which are of moderate size (neither too large nor too small) and therefore well suited to this type of analysis, and discuss their advantages and disadvantages as well as potential applications.
2. The scRNA-seq/snRNA-seq atlases (datasets) of the visual system
2.1. Atlases of the mouse retina
Macosko et al. [10] were among the first groups to apply high-throughput scRNA-seq (i.e., Drop-seq technology) and generate scRNA-seq data from ocular tissues. They used unsupervised machine learning methods and identified 39 cell clusters from 44,808 single retinal cells. They annotated 33 cell types comprising both rod and cone photoreceptors (PRs), retinal ganglion cells (RGCs), horizontal cells (HCs), bipolar cells (BCs), amacrine cells (ACs), Müller glia (MG), and non-neuronal cells. A follow-up study led by Shekhar [11] enriched mouse retinal BCs before subjecting them to scRNA-seq. From the transcriptomes of 23,300 putative single BCs, they derived a classification that identified 15 distinct BC subtypes, including all types identified previously by Macosko et al. [10] and two additional novel BC subtypes. In total, they identified 18 BC clusters in this study. In another study, Rheaume et al. [12] profiled 6225 RGCs from mouse eyes, developed a computational pipeline, and analyzed the scRNA-seq data. They identified 40 clusters of cells corresponding to 40 different subtypes of RGCs, of which 30 had been recognized previously. They found that most of the previously known RGC markers were expressed in multiple RGC clusters (subtypes) and that only the NPY, Jam2, Trhr, Pde1a, and Gna14 genes were expressed in a single cluster (RGC subtype). Therefore, they could determine the gene expression (variability) threshold required for RGC subtype segregation and diversification, and presented a hierarchy from RGC cell types to subtypes. Further, they generated a portal for comparing gene expression in RGC subtypes. Tran et al. [14] in 2019 and Yan et al. [24] in 2020 adapted the pipeline introduced by Shekhar et al. [11] to classify a large number of RGCs and amacrine cells obtained from the mouse retina.
They identified 45 distinct clusters corresponding to RGC subtypes from an scRNA-seq dataset of 35,699 single cells, as well as 63 subtypes of amacrine cells. Yan et al. [24] adopted a method of selective depletion based on 55,287 single cells, of which 32,523 cells were ultimately recognized as amacrine cells. They identified 63 clusters corresponding to different subtypes of amacrine cells, which led to the determination of class-specific markers for amacrine cells. They recognized molecular markers in each cluster and used them to characterize the morphology of multiple types. They showed that the identified markers included all of the previously known AC type markers as well as numerous new markers. They observed that most AC types expressed markers for the canonical inhibitory neurotransmitters glycine or GABA, but that some types expressed markers for neither. They also found transcriptomic relationships between AC types and recognized transcription factors that were expressed in single or multiple closely related cell types. In addition, they observed that the Tcf4 and Meis2 genes were expressed by most glycinergic and most GABAergic types, respectively.
Heng et al. [16] conducted an scRNA-seq study of spontaneous uveoretinitis in Aire−/− mice using 64,196 single retinal cells. They analyzed the patterns of gene expression in various immune and retinal cells based on the abundance of different immune cell types and identified six different classes: Th1 cells, Cd8a+ T cells, T follicular helper (Tfh) cells, regulatory T (Treg) cells, NK cells, and NK T cells. Sun et al. [25] constructed an atlas based on 14,424 single cells collected from healthy mice and mice with diabetes to identify pathological alterations of diabetic retinopathy (DR). They determined several pathological variations of DR and introduced potential guides for DR therapy.
While the mouse visual system is an invaluable resource for studying neural circuit structure, function, and development, it has various shortcomings as a model of the human visual system. In particular, the retina, vascular arrangement, and lamina of primates, including humans, differ from those of rodents, although some other structures, such as the conventional outflow system, are very similar. Notably, unlike rodents, primates have a fovea, a small region located in the center of the retina that is primarily responsible for chromatic vision [40]. To address these shortcomings and gain a comprehensive picture of the specific mechanisms underlying cell heterogeneity in the human retina, several groups have profiled single cells from human and other primate retinas.
2.2. Retina atlases in non-human and human primates
In 2019, Peng et al. [15] generated a retinal atlas profiling the foveal and peripheral retina of the macaque monkey separately. To increase the probability of identifying rare RGC subtypes (e.g., ipRGC subtypes) within the highly rod-dominated peripheral region, they enriched RGCs using anti-Thy1 (from peripheral samples) and depleted rods using anti-CD73. Based on 165,000 single cells, they identified 64 foveal and 71 peripheral clusters corresponding to different cell types and subtypes. In several studies, macaque atlases have been used to annotate different clusters of retinal cells [17–23, 27]. Over the past few years, several research groups have extended related studies to the foveal and peripheral retina of humans [19, 21, 22, 27].
Consistent with physiological and morphological studies, cell types in human and macaque retinas are highly similar. In 2019, Menon et al. [17] generated the first human retina single-cell transcriptomic atlas based on 20,091 single cells. Their analysis used a multi-resolution network to identify different clusters corresponding to the retinal cell types and their representative markers (i.e., differentially expressed genes). They found that human retinal microglia are more heterogeneous than previously thought and recognized some cell types (glia, cone photoreceptors, and vascular cells) associated with the risk of age-related macular degeneration (AMD). Lukowski et al. [18] compiled a transcriptome atlas of the human retina by profiling 20,009 single cells. They identified 18 transcriptionally distinct cell types representing all known neural retinal cells: retinal ganglion cells, Müller glia, bipolar cells, amacrine cells, rod photoreceptors, cone photoreceptors, horizontal cells, microglia, and astrocytes. Liang et al. [20] generated an atlas based on 5873 single nuclei collected from human retinal tissue. They identified major retinal cell types and their corresponding marker genes. More specifically, they identified several differentially expressed genes in photoreceptor cells that may serve as candidate genes for inherited retinal diseases (IRD) and showed that their atlas may improve the prioritization of genes related to human retinal diseases in comparison to mouse snRNA-seq and human bulk RNA-seq data.
Several teams have generated atlases of single cells collected from the human fovea and peripheral retina [19, 21–23, 27]. Voigt et al. [19] performed an scRNA-seq study of the human fovea and peripheral retina. They retrieved 8217 cells, including 3578 cells from the fovea and 4639 cells from the periphery. They found two clusters of cone photoreceptor cells based on differentially expressed genes. One cone cluster included 96% of foveal cells, while the other cluster included exclusively peripherally isolated cone cells. The major marker for peripheral cones was Beta-Carotene Oxygenase 2. They suggested that a relative deficiency of this enzyme in foveal cones may account for the accumulation of carotenoids, which are primarily responsible for the yellow pigment in the macula.
Another study aimed at understanding disease pathogenesis and identifying causal AMD genes was performed by Orozco et al. [23]. They presented an atlas of the human retina and retinal pigment epithelium (RPE) comprising an expression quantitative trait loci (eQTL) resource that included RPE/choroid and macula-specific retina, together with single-nucleus RNA-seq data from the human retina. They found enriched expression of AMD candidate genes in RPE cells. They identified 15 putative causal genes for AMD based on the overlap of AMD-risk genetic association signals with eye eQTLs, including TSPAN10 and TRPM1. Yan et al. [21] analyzed the transcriptomes of 84,982 cells from the fovea and peripheral retina and recognized 58 clusters corresponding to different cell types and subtypes within the retinal ganglion, horizontal, bipolar, photoreceptor, amacrine, and non-neuronal cell groups. Their follow-up analyses suggested that almost all cell types are shared between the two retinal regions; however, for shared cell types there are significant differences between the fovea and periphery in gene expression levels and in the number of expressed genes.
Cowan et al. [22] used human retinal tissues with multiple synaptic and nuclear layers for studying disease mechanisms and treatment in human retinas. They sequenced 285,441 single cells from the fovea and periphery, choroid, and pigment epithelium of human retinas. Comparing the periphery to the fovea, they recognized regional characteristics and features of clusters and, through comparing organoid to organ, they determined that organoid cell type transcriptomes converge to those of peripheral retinas in adult humans.
Yi et al. [27] generated a transcriptomic atlas from 119,520 single cells collected from the foveal and peripheral retina of humans and macaques of different ages. They observed that retinal molecular features differed between the two species, indicating distinct regional and species-dependent characteristics of human and macaque retinas. They also observed that aging of the human retina occurred in specific cell types and regions, reflecting a foveal-to-peripheral gradient. Further quantitative analysis of the scRNA-seq data showed that the populations of MYO9A− rods and horizontal cell subtypes decreased significantly during retinal aging.
2.3. Retina atlases in insects, avians, and fish
Ariss et al. [13] profiled 11,500 single eye disc cells, including all main known cell types, and determined the impact of an Rbf mutation on Drosophila eye development. They identified genes differentiating photoreceptor cells during axonogenesis and found a cell group that showed intracellular acidification, primarily due to increased glycolytic activity. Yamagata et al. [26] used scRNA-seq to generate a chick retina cell atlas for studying the avian visual system. They recognized 136 cell types and 14 developmental or positional intermediates distributed among six classes: retinal ganglion, amacrine, photoreceptor, horizontal, bipolar, and glial cells. They adopted a CRISPR-based approach, named eCHIKIN, to annotate various clusters and to analyze selectively expressed genes in order to characterize the morphology of the molecularly defined types. For Müller glia, they observed that transcriptionally different cells were regionally localized along the dorsal-ventral, anterior-posterior, and central-peripheral retinal axes. Additionally, they identified several cell types, including immature horizontal cells, photoreceptors, and oligodendrocyte types, which persist into late embryonic stages. Kölsch et al. [28] developed a systematic approach to classify RGCs in adult and larval zebrafish, then identified marker genes for more than 30 mature cell types and various developmental intermediates.
2.4. Atlases from other ocular tissues
Most scRNA-seq atlases for vision were generated based on retinal tissues. However, since 2020, several researchers have generated atlases based on other ocular tissues as well.
The outflow pathway for aqueous humor (intraocular fluid) includes tissues responsible for maintaining the homeostasis of the intraocular pressure (IOP). Dysfunction of the cell types in these tissues may lead to ocular hypertension and subsequent risk for glaucoma, which is one of the leading causes of blindness worldwide. Patel et al. [29] generated an scRNA-seq atlas of human outflow tissues. They obtained expression profiles from 8758 cells and identified 12 different cell types. A major utility of their atlas is to map glaucoma-relevant genes to the human outflow cell types. They also described two different trabecular meshwork (TM) cell types, showed that Schlemm’s canal (SC) is a hybrid blood-lymphatic vessel, and highlighted the abundance of resident macrophages in the outflow tissues. Van Zyl et al. [30] performed similar studies of outflow tissues in humans as well as four other species: mouse (Mus musculus), rhesus macaque (Macaca mulatta), cynomolgus macaque (Macaca fascicularis), and pig (Sus scrofa). They analyzed scRNA-seq data from 24,023 single cells and identified 19 clusters. They found that although many human outflow pathway cell types had counterparts in other species, there were clear differences in both the clusters and the marker gene expression between human and non-human tissues. They also identified clusters that expressed genes known to be associated with glaucoma. Thomson et al. [39] showed that tissue-specific elimination of Svep1 or Angpt1 from the trabecular meshwork caused primary congenital glaucoma in mice, with severe defects in the adjacent Schlemm’s canal.
By single-cell transcriptomic analysis on glaucomatous and normal Angpt1, they recognized distinct trabecular meshwork and Schlemm’s canal cell populations and discovered additional distinct trabecular meshwork-Schlemm’s canal signaling pathways.
As light enters the eye, it passes through the five layers of the cornea: the epithelium, Bowman’s layer, the stroma, Descemet’s membrane, and the endothelium. The cornea’s structure and transparency are maintained by the various cell types that populate each layer. Attempts to understand disease conditions and to regenerate corneal tissue require extensive knowledge of the cell profiles across this heterogeneous tissue. Collin et al. [32] analyzed 21,343 single cells collected from human corneas and adjacent conjunctivas and identified 21 clusters of cells. Subsequently, they extended their analysis to keratoconus and observed that activation of collagenase in the corneal stroma and a reduction in the number of limbal suprabasal cells could be key indicators of the disease phenotype. Català et al. [33] generated a single-cell transcriptomic atlas based on 19,472 single cells collected from human corneas. They analyzed the heterogeneity of the corneal layers and identified HOMER3, CAV1, and CPVL expression in the corneal epithelial limbal stem cell niche. They showed that STMN1, CKS2, and UBE2C were specifically expressed in highly proliferative transit-amplifying cells, NNMT was specifically expressed in stromal keratocytes, and CXCL14 was specifically expressed in the suprabasal/superficial limbus. These results provide a basis for future amendments to current primary cell expansion protocols in order to enhance future profiling of corneal disease states. Wang et al. [34] introduced a transcriptomic atlas based on 16,924 single cells collected from human corneal endothelium. The corneal endothelium is a major tissue for maintaining corneal clarity, mediating hydration through its pump and barrier functions. Their findings provide novel insights into the development of Fuchs endothelial corneal dystrophy (FECD) and suggest that NEAT1 may be an attractive target for treating FECD.
The iris is responsible for controlling the level of retinal illumination by regulating pupil diameter. Wang et al. [35] presented a study that provided snRNA-seq data generated from the iris cells of mice. They identified the major cell types of the mouse iris and ciliary body, which led to the detection of two kinds of iris stromal cells and of iris sphincter cells. They also characterized the differences in cell-type transcriptomes between the dilated and resting states, then identified and validated antibody and in situ hybridization (ISH) probes to visualize the major iris clusters.
Youkilis et al. [36] introduced a transcriptomic atlas based on 10,024 cells collected from the ciliary body of mice and contiguous tissues, which play a major role in ocular homeostasis. They used scRNA-seq to assess the transcriptional signatures of the ciliary body and adjacent tissues. They identified two fibroblast signatures in the ciliary body cells (sclera and uvea), which were subsequently confirmed via in situ hybridization (ISH).
Van Zyl et al. [31] generated a single nucleus atlas of ocular anterior segment from the human eye. The anterior segment of human eye includes cornea, iris, crystalline lens, aqueous humor outflow pathways, and ciliary body. They profiled 195,248 nuclei from anterior segment tissues and identified more than 60 cell types.
Table 1 provides an overview of the scRNA-seq and snRNA-seq atlases generated for the visual system. It is worth mentioning that the Spectacle web tool houses all single-cell atlases for vision [41].
TABLE 1.
Overview of the scRNA-seq/snRNA-seq atlases generated for the visual system
Reference | Year | # of Cells | # of Clusters | Species | Tissues | Diseases | Sequencing Instruments | Available Data File Formats | Deposited Data |
---|---|---|---|---|---|---|---|---|---|
Macosko et al. [10] | 2015 | 44,808 | 39 | Mouse | Retina | - | Illumina NextSeq 500 | BAM, FASTA, GTF, TXT | GSE63473 |
Shekhar et al. [11] | 2016 | 23,300 | 18 | Mouse | Retinal bipolar cells | - | Illumina HiSeq 2500, Illumina NextSeq 500 | BAM | GSE81905 |
Rheaume et al. [12] | 2018 | 6,225 | 41 | Mouse | RGCs | Glaucoma | Illumina HiSeq 4000 | CSV | GSE115404 |
Ariss et al. [13] | 2018 | 11,500 | 15 | Drosophila | Eye disc | Retinoblastoma | Illumina NextSeq 500 | TXT | GSE115476 |
Tran et al. [14] | 2019 | 35,699 | 45 | Mouse | RGCs | Glaucoma | Illumina HiSeq 2500, Illumina NextSeq 500 | CSV | GSE137400 |
Peng et al. [15] | 2019 | 165,000 | Fovea: 64, Periphery: 71 | Macaque | F&PR | DR | Illumina HiSeq 2500 | - | GSE118480 |
Heng et al. [16] | 2019 | 64,196 | 12 | Mouse | Retina | Uveoretinitis | Illumina HiSeq 2500, Illumina NovaSeq 6000 | MTX, TSV | GSE132229 GSM3854512-3854519 |
Menon et al. [17] | 2019 | 20,091 | 9 | Human | Retina | AMD | Illumina NextSeq 500 | MTX, TXT, TSV | GSE137537 GSE137847 |
Lukowski et al. [18] | 2019 | 20,009 | 18 | Human | Retina | - | Illumina HiSeq 2500 | TXT, TSV | E-MTAB-7316 |
Voigt et al. [19] | 2019 | 8,217 | 17 | Human | F&PR | AMD | Illumina HiSeq 4000 | TSV | GSE130636 |
Liang et al. [20] | 2019 | 5,873 | 7 | Human | Retina | AMD | Illumina HiSeq 2500 | TXT | GSE133707 |
Yan et al. [21] | 2020 | 84,982 | 58 | Human | F&PR | AMD, DR, Glaucoma | Illumina HiSeq 2500 | CSV | GSE148077 |
Cowan et al. [22] | 2020 | 285,441 | 65 | Human | F&PR, RPE, choroid | AMD, DR, Glaucoma | Illumina HiSeq 2500 | TXT | GSE104827 |
Orozco et al. [23] | 2020 | 100,055 | 27 | Human | Retina and RPE | AMD | Illumina HiSeq 2500 | TSV | GSE135092 GSE135133 |
Yan et al. [24] | 2020 | 32,523 | 63 | Mouse | RACs | - | Illumina HiSeq 2500 | CSV | GSE149715 |
Patel et al. [29] | 2020 | 8,758 | 12 | Human | OP | Glaucoma | Illumina NextSeq 500 | - | PRJNA616025 |
Van Zyl et al. [30] | 2020 | 24,023 | 19 | Human, Macaque, Pig, Mouse | OP | Glaucoma | Illumina HiSeq 2500 | CSV | GSE146188 |
Sun et al. [25] | 2021 | 14,424 | 28 | Mouse | Retina | DR | Illumina NovaSeq 6000 | XLS | GSE178121 |
Yamagata et al. [26] | 2021 | 40,000 | 150 | Chick | Retina | - | Illumina HiSeq 2500 | CSV | GSE159107 |
Yi et al. [27] | 2021 | 119,520 | 56 | Humans, Macaques | F&PR | AMD | - | XLSX | GSA:CRA002680 GSA:HRA000182 |
Kölsch et al. [28] | 2021 | 32,679 | 32 | Zebrafish | RGCs | - | Illumina HiSeq 2500 | CSV | GSE152842 |
Collin et al. [32] | 2021 | 21,343 | 21 | Human | Cornea | Keratoconus | Illumina NovaSeq 6000 | CSV, TXT | GSE155683 |
Català et al. [33] | 2021 | 19,472 | 15 | Human | Cornea | - | Illumina NovaSeq 6000 | MTX, TSV | GSE186433 |
Wang et al. [35] | 2021 | 34,357 | 10 | Mouse | Iris and ciliary body | Iris disorders | Illumina NovaSeq 6000 | MTX, TSV | GSE183690 |
Youkilis et al. [36] | 2021 | 10,024 | 22 | Mouse | CB&CT | - | Illumina NovaSeq 6000 | MTX, TSV | GSE178667 |
Thomson et al. [39] | 2021 | 26,008 | 25 | Mouse | OP | Glaucoma | Illumina HiSeq 4000 | CSV | GSE168200 |
Wang et al. [34] | 2022 | 16,924 | 4 | Human | Corneal endothelium | FECD | Illumina NovaSeq 6000 | - | GSA:HRA000781 |
Van Zyl et al. [31] | 2022 | 195,248 | 60 | Human | Anterior Segment | Glaucoma | Illumina HiSeq 2500, Illumina NovaSeq 6000 | CSV | GSE199013 |
RGCs = Retinal Ganglion Cells, F&PR = Fovea and peripheral retina, RPE = Retinal pigment epithelium, CB&CT = Ciliary body and contiguous tissues, RACs = Retinal amacrine cells, OP = Outflow pathways, DR = Diabetic Retinopathy, FECD = Fuchs endothelial corneal dystrophy
3. Artificial intelligence approaches in cell-type identification methods for scRNA-seq/snRNA-seq
Table 2 presents a taxonomy of the artificial intelligence methods applied to scRNA-seq and snRNA-seq data to recognize cell types and subtypes. In this taxonomy, we have classified models based on how they learn to find different clusters or classes. Accordingly, we have divided them into three main categories: unsupervised, semi-supervised, and supervised approaches. We have then provided sub-categories at each level, together with references to those approaches.
TABLE 2.
Taxonomy of cell-type identification methods that were included in this study
Category | Approach | Technique | Methods |
---|---|---|---|
Unsupervised | Graph-based | Nearest neighbor | Scmap [42], Seurat [4], [10], [11], [13], [14], [15], [17], [18], [19], [20], [22], [29], [25], [26], [27], [28], [32], [33], [21], [23], [24], [35], [36], [34], [30], scType [43] |
Unsupervised | Partition-based | K-means clustering | [16] |
Unsupervised | Density-based | DBSCAN clustering | [12] |
Unsupervised | Deep NN-based | Recurrent network | ScScope [44] |
Unsupervised | Deep NN-based | Autoencoder | DESC [45], ScAIDE [46], scETM [47] |
Unsupervised | Deep NN-based | Hierarchical Bayesian | scVI [48] |
Semi-supervised | Deep NN-based | Autoencoder | DISC [49], ScDCC [50] |
Supervised | Similarity-based | - | ScLearn [51] |
Supervised | General classifier-based | XGBoost | CaSTLe [52] |
Supervised | General classifier-based | Logistic regression | SCCAF [53] |
Supervised | General classifier-based | Fisher’s linear discriminant analysis | ScID [54] |
Supervised | Deep NN-based | Capsule networks | ScCapsNet [55] |
Supervised | Deep NN-based | Weighted GNN | ScDeepSort [56] |
Supervised | Deep NN-based | Hierarchical FFNN | NeuCA [57] |
Supervised | Transfer learning-based | - | ItClust [58], SCTL [59] |
3.1. Unsupervised methods for cell type and subtype identification
Unsupervised cell-type identification methods recognize cell types and subtypes using different clustering techniques. In unsupervised learning, the data have no labels (here, cells have no type or subtype labels), where labeling means assigning samples to categories. Unsupervised learning models identify clusters (groups) of samples based on structures or patterns in the data. Once the clusters have been identified, labels are assigned to the samples in each cluster. Because there is no prior knowledge of the true labels, the clusters (here, cell types or subtypes) must be validated. Unsupervised models can be further categorized into four subgroups: graph-based, partition-based, density-based, and deep neural network-based clustering. Some of these methods are discussed in more detail below.
Scmap [42] is a graph-based clustering technique that evaluates the maximum similarity between cells in the reference data and the query data to identify different clusters corresponding to cell types. The optimum projection of a new query cell onto a reference dataset is identified by the nearest neighbor approach. Seurat 3.0 [4] is a widely used toolkit for single-cell data analysis that employs this approach. Essentially, Seurat uses canonical correlation analysis (CCA) [3] and mutual nearest neighbors (MNNs) [60] to identify cells with shared properties across datasets. Seurat first jointly reduces the dimensionality of the reference and query datasets using diagonalized CCA, then applies L2-normalization to the canonical correlation vectors. The model then searches for MNNs in the common low-dimensional representation. Seurat thereby encodes cellular relationships among datasets, which serve as the basis for all subsequent integration analyses.
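The mutual nearest neighbor search mentioned above can be sketched as follows. The sketch assumes that reference and query cells have already been embedded in a shared low-dimensional space (in Seurat, this would follow the CCA and L2-normalization steps) and uses Euclidean distance; it illustrates the general MNN idea rather than Seurat's implementation, and the coordinates are toy values.

```python
import math

def k_nearest(source, target, k):
    """For each source cell, return the index set of its k nearest target cells."""
    result = []
    for s in source:
        order = sorted(range(len(target)), key=lambda j: math.dist(s, target[j]))
        result.append(set(order[:k]))
    return result

def mutual_nearest_neighbors(reference, query, k=1):
    """Return (reference_index, query_index) pairs found in each other's k-NN lists."""
    ref_to_query = k_nearest(reference, query, k)
    query_to_ref = k_nearest(query, reference, k)
    return sorted(
        (i, j)
        for i, neighbors in enumerate(ref_to_query)
        for j in neighbors
        if i in query_to_ref[j]
    )

# Toy embeddings: the third reference cell has no counterpart in the query.
reference = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
query = [(0.2, 0.1), (5.1, 4.8)]
pairs = mutual_nearest_neighbors(reference, query, k=1)
print(pairs)
```

Requiring the relationship to hold in both directions is what excludes the unmatched reference cell: its nearest query cell prefers a different reference cell, so no pair is formed.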
In contrast to Seurat, Scmap learns cell-type-specific gene expression information from the reference dataset only. As such, Scmap could be more vulnerable to batch effects if the reference and query datasets are generated in different batches. It is worth mentioning that although Seurat 3.0 uses the information (anchor pairs) extracted from both the reference and query datasets, it does not specifically use the cluster information in the reference dataset. Many studies [10, 13, 15, 18–20, 25–29, 32–36] have used Seurat for clustering as part of their computational pipelines to identify cell types and subtypes. The scType [43] approach introduces a marker database together with a cell-type identification method based on the Louvain algorithm for unsupervised cell-type annotation, providing an end-to-end pipeline for identifying cell types and subtypes from a built-in marker database. Yan et al. [21, 24] and Van Zyl et al. [30] performed unsupervised clustering based on the Louvain algorithm with Jaccard correction, while Shekhar et al. [11], Tran et al. [14], and Cowan et al. [22] employed the Infomap algorithm [61] in their graph-based unsupervised clustering approaches.
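One common way to read "Jaccard correction" is that each edge of the k-nearest-neighbor graph is weighted by the overlap between the neighborhoods of its two endpoints before community detection is run. The sketch below builds such a weighted graph under that assumption; including each cell in its own neighborhood, the choice of `k`, and the toy coordinates are illustrative, and the community detection step itself (Louvain or Infomap) is not implemented here.

```python
import math

def knn_sets(cells, k):
    """Return, for each cell, the index set of its k nearest other cells."""
    sets = []
    for i, c in enumerate(cells):
        others = [j for j in range(len(cells)) if j != i]
        others.sort(key=lambda j: math.dist(c, cells[j]))
        sets.append(set(others[:k]))
    return sets

def jaccard_graph(cells, k=2):
    """Weight each k-NN edge (i, j) by the Jaccard overlap of the two neighborhoods."""
    neighbors = knn_sets(cells, k)
    edges = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            a = neighbors[i] | {i}  # assumption: a cell belongs to its own neighborhood
            b = neighbors[j] | {j}
            edges[(min(i, j), max(i, j))] = len(a & b) / len(a | b)
    return edges

# Two well-separated toy groups of three cells each.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = jaccard_graph(cells, k=2)
print(edges)
```

The resulting edge dictionary could then be passed to a community detection routine; edges between cells that share few neighbors receive low weights, which discourages merging loosely connected groups.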
Some studies, including that of Heng and colleagues [16], have used conventional unsupervised clustering models such as k-means. With this model, cell types and subtypes are identified by cross-referencing the clusters for expression of multiple known cell type-specific markers.
Rheaume et al. [12] used a density-based clustering approach, namely DBSCAN [62], to identify cell types and subtypes. DBSCAN is an unsupervised clustering method that identifies clusters as regions with a high local density of samples, separated from one another by regions of low sample density.
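To make the density criterion concrete, the following is a minimal sketch of the DBSCAN procedure in plain Python. It is a teaching example under simplifying assumptions (brute-force neighbor search, Euclidean distance, toy two-dimensional points) and is not the code used by Rheaume et al.; the parameter names `eps` and `min_pts` follow common convention.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Toy data (illustrative): two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The isolated point receives the label -1, reflecting that DBSCAN does not force every cell into a cluster.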
scScope [44] is a deep learning-based cell-type identification method that utilizes a recurrent network layer to iteratively impute zero-valued entries of the input scRNA-seq data. The scScope model lets the imputed output be iteratively amended through a selected number of recurrent steps. If the number of recurrent steps is equal to 1, the model reduces to a standard autoencoder [63].
DESC [45] is another unsupervised deep learning model, which iteratively learns the gene expression patterns corresponding to each cluster. Essentially, DESC first initializes its parameters via an autoencoder and then learns a nonlinear function that maps the original scRNA-seq data from the high-dimensional space into a low-dimensional feature space by iteratively optimizing an unsupervised clustering objective function. Through this iterative procedure, each cell moves toward its nearest cluster centroid, technical and biological differences among clusters are balanced, and the influence of the batch effect gradually decreases. ScAIDE [46] is another unsupervised deep learning clustering method that utilizes an autoencoder imputation network along with a distance-preserved embedding network (AIDE) to learn data representations and then uses random projection hashing based on the k-means algorithm (RPH-kmeans) to accommodate the identification of rare cell types. Several other models are also based on neural networks and deep learning. The Single-cell Embedded Topic Model (scETM) [47] is a neural network-based autoencoder that has a linear decoder along with matrix tri-factorization. This method simultaneously learns topic embeddings, highly expressed gene embeddings, batch-effect linear intercepts, and encoder network parameters from scRNA-seq data. It incorporates existing pathway information into the gene embeddings during training to further enhance interpretability, tri-factorizing the cells-by-genes matrix into cells-by-topics, topics-by-embeddings, and embeddings-by-genes matrices.
Single-cell variational inference (scVI) [48] uses a deep neural network together with stochastic optimization to obtain a probabilistic representation of gene expression in single-cell data and to analyze it.
scVI is based on a hierarchical Bayesian model [64] with conditional distributions specified through deep neural networks. The scRNA-seq data are encoded by a nonlinear mapping function and projected onto a low-dimensional feature space as a latent vector of normal random variables. The latent representation is then decoded via another nonlinear mapping function to produce a posterior estimate of the distributional parameters of each gene in each cell.
3.2. Semi-supervised methods for cell type and subtype identification
Approaches that use a combination of supervised and unsupervised learning strategies for cell type and subtype identification are categorized in the semi-supervised learning (SSL) group.
Semi-supervised learning approaches are promising when only a small amount of labeled data is available, allowing unsupervised components to complement their training with unlabeled data [65].
SSL approaches may address dropout problems in single cell data. Dropout is one of the main challenges in scRNA-seq data analysis. Dropouts, the extra false zero expression values, cause a skewed distribution of gene expression and result in the misclassification of cell types [66]. Recent advances in combinatorial indexing-based and droplet-based sequencing methods have significantly increased the throughput from thousands to more than a million cells in a single experiment and have caused more intense dropout problems because of the shallow sequencing depth per cell [67]. In the following, we discuss a few SSL-based cell identification methods.
DISC [49] is an autoencoder-based semi-supervised deep learning method to infer gene structure and expression of single cell data generated based on the dropout technology. Semi-supervised learning here provides a reliable imputation approach by learning information from zero-count genes and positive-count genes, which can be treated as unlabeled and labeled data, respectively. DISC works well in scenarios in which the information from the labeled data is limited. Single Cell Deep Constrained Clustering (scDCC) [50] is another autoencoder-based semi-supervised clustering approach. It encodes prior knowledge as constraint information that is integrated into the clustering algorithm through a loss function, thereby bringing domain knowledge into the clustering step.
3.3. Supervised methods for cell type and subtype identification
Different approaches in this category typically utilize a previously annotated dataset as a reference to train supervised machine learning classifiers and use the trained model to identify cell types and subtypes from unlabeled data. However, in such supervised methods, it is expected that the reference and query datasets resemble each other, which is not often the case. This poses challenges in successful label propagation [60].
scLearn [51] is a supervised single cell classification method that utilizes the similarity between single cells in the query dataset and cluster centroids of cells in the reference dataset, based on information learned from the reference datasets. scLearn can identify novel cell types that are not present in the reference datasets. It uses a multilabel single-cell assignment approach to assign a single cell to the proper time status and cell type simultaneously.
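The general idea of centroid-based assignment with a rejection option can be sketched as follows. This is an illustration of the concept only, not scLearn's algorithm: the Pearson correlation, the fixed threshold of 0.5, the toy expression values, and the function names are all assumptions made for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def centroids(cells, labels):
    """Mean expression vector of the reference cells in each labeled cluster."""
    groups = {}
    for cell, label in zip(cells, labels):
        groups.setdefault(label, []).append(cell)
    return {label: [mean(col) for col in zip(*members)]
            for label, members in groups.items()}


def assign(cell, cents, threshold=0.5):
    """Assign the most similar reference type, or 'unassigned' if none is similar enough."""
    best_label, best_sim = max(
        ((label, pearson(cell, c)) for label, c in cents.items()),
        key=lambda pair: pair[1],
    )
    return best_label if best_sim >= threshold else "unassigned"


# Toy reference: four genes, two annotated cell types (illustrative values).
ref_cells = [[9, 8, 0, 1], [10, 9, 1, 0], [0, 1, 9, 10], [1, 0, 8, 9]]
ref_labels = ["rod", "rod", "cone", "cone"]
cents = centroids(ref_cells, ref_labels)

known = assign([8, 9, 1, 0], cents)   # resembles the rod centroid
novel = assign([5, 0, 5, 0], cents)   # resembles neither centroid
```

Leaving a query cell unassigned when no centroid is similar enough is what allows candidate novel cell types to be flagged instead of being forced into an existing label.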
General classifier-based methods are another subgroup of supervised cell type identification methods. CaSTLe [52], classification of single cells through transfer learning, is a supervised method based on the XGBoost classifier [68]. This method selects genes with mutual information gain and genes with top mean expression, then removes correlated genes before classification. All these steps are performed to bring the reference and query datasets to a common denominator so that the classification model can be transferred accurately. SCCAF [53] presents a self-projection method based on linear regression [69], and ScID (Single Cell IDentification) [54] utilizes Fisher's Linear Discriminant Analysis [70] to recognize transcriptionally related cell types among scRNA-seq datasets. ScID extracts markers from the reference dataset and then weighs their relevance in the target dataset by learning a classifier based on the putative populations of cells either expressing or not expressing these genes.
scCapsNet [55] and scDeepSort [56] form a subgroup of supervised deep neural network-based methods in the proposed taxonomy. scCapsNet [55] has an interpretable deep-learning architecture utilizing capsule networks. A capsule is a neuron vector that represents the properties of a specific object and captures hierarchical relations. scCapsNet makes the decision-making black box transparent by analyzing the internal weight parameters between capsule structures. scDeepSort [56] is a pre-trained cell-type identification approach based on a weighted graph neural network (GNN) model. The GNN is a widely used deep learning method [71] that captures graph dependency via message passing among the graph nodes and keeps a state that represents the required information from its neighborhood with ideal depth [72]. NeuCA [57] computes the mean gene expression of each cell group and then evaluates the correlation matrix among different cell types. If cell types are highly correlated, the model constructs a tree structure based on hierarchical feed-forward clustering and subsequently trains several neural networks (based on the tree structure) to perform the final classification task.
ItClust [58] has received more attention compared to unsupervised models as it utilizes the information from external well-annotated reference datasets. As the performance of supervised methods is highly dependent on the quality of the reference dataset, ItClust employs a transfer learning approach [73] that takes advantage of cell type-specific gene expression information learned from a high-quality reference dataset to help classify cell types in a newly generated query dataset. The approach considers highly variable genes within the reference dataset in order to extract relevant information. This ensures that the transferred expression patterns will be useful for separating cell types in the new datasets. SCTL [59] is another deep transfer learning method to detect cell types and subtypes in scRNA-seq data. SCTL includes four loss functions, where one loss function represents domain adaptation and three loss functions represent the adversarial network. These loss functions eventually minimize the classification error.
4. Analysis pipeline and visualizing scRNA-seq/snRNA-seq results
In this section, we discuss key steps in scRNA-seq data visualization. Numerous statistical and machine learning methods are applied to the gene expression data to gain insight into the transcriptome heterogeneity of cell types. These techniques typically include normalization, scaling, dimension reduction, clustering, and classification. However, visualization plays an important role in both qualitative and quantitative scRNA-seq data analysis (see Figure 1).
Fig. 1.
Analysis pipeline and visualizing scRNA-seq/snRNA-seq results. (A) Two-dimensional visualization of the molecular diversity of single cells using t-SNE. (B) Feature plot with cells colored by expression level of a single differentially expressed gene. (C) Probability distribution of gene expression levels within each cluster represented using a violin plot. (D) Expression patterns of genes across a set of cell clusters represented using a dot plot. (E) Expression patterns of specific marker genes for single cells grouped by their cell type clusters using a heat map plot. (F) Transcriptional correspondence among different clusters corresponding to cell types [5].
There are several different techniques for visualizing scRNA-seq/snRNA-seq results (Figure 1 A–F). The t-distributed stochastic neighbor embedding (t-SNE) is a widely used statistical method that visualizes high-dimensional data by assigning each data point a location on a low-dimensional (typically two-dimensional) map. Figure 1A shows clusters reflecting the molecular diversity of cells using t-SNE on a two-dimensional map [74]. Each point corresponds to a single cell, with transcriptionally similar cells being clustered together. Cells are typically colored based on their defined cluster identity. Figure 1B shows the feature plot of the cells in panel A, in which cells are colored according to their expression level of genes of interest (i.e., genes that were differentially expressed between clusters). Uniform manifold approximation and projection (UMAP) is similar to t-SNE. Both t-SNE and UMAP build a graph that represents the data in the high-dimensional space and reconstruct the graph in a lower-dimensional space. Figure 1C shows a violin plot of the expression levels (y-axis) of the single cells in panel A for a specific gene (i.e., a gene that was differentially expressed in cluster number 4 compared to other clusters). Each violin represents the distribution of expression levels of cells within that particular cluster. The box and whisker plots within each violin correspond to the median value (black horizontal line), range (vertical lines), interquartile range (bars), and outliers (dots). Figure 1D shows a sample dot plot visualizing the expression pattern (both the number of cells expressing a gene and the level of expression) of a set of genes across different clusters (x-axis). The size of each circle indicates the percentage of cells in that cluster expressing the gene, and color saturation represents the normalized expression level.
Along with the dot plot, a dendrogram calculated using hierarchical clustering is typically shown on top to represent transcriptional interrelationships among clusters. Figure 1E shows a sample heat map plot. This ordered heat map represents the expression patterns of specific marker genes for single cells grouped by cluster. Figure 1F shows a sample confusion matrix plot representing transcriptional correspondence between clusters in two different datasets. Colors indicate the proportion of cells of a given cluster in one dataset that are assigned to a corresponding cluster in the other dataset by a classification algorithm trained on cells from one of the datasets.
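The quantity displayed in a confusion matrix plot such as Figure 1F can be computed with a few lines of code. The sketch below is illustrative only: the cluster names and assignments are made-up toy values, and the function name `correspondence` is hypothetical.

```python
from collections import Counter, defaultdict


def correspondence(own_clusters, assigned_clusters):
    """For each cluster of one dataset, the proportion of its cells that a
    classifier trained on the other dataset assigned to each of that
    dataset's clusters (each row sums to 1)."""
    counts = defaultdict(Counter)
    for own, assigned in zip(own_clusters, assigned_clusters):
        counts[own][assigned] += 1
    return {own: {assigned: n / sum(row.values()) for assigned, n in row.items()}
            for own, row in counts.items()}


# Toy example: six cells, their own cluster and the cluster assigned by the classifier.
own = ["RGC1", "RGC1", "RGC1", "RGC1", "RGC2", "RGC2"]
assigned = ["A", "A", "A", "B", "B", "B"]
proportions = correspondence(own, assigned)
```

Each row of the resulting table corresponds to one row of the plot, and the proportions are the values mapped to color.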
5. Identification of ocular diseases based on scRNA-seq/snRNA-seq data
In this section, we discuss ocular diseases and related tissues that have been studied using single-cell and single-nucleus RNA sequencing data. Most irreversible vision loss is caused by retinal diseases, which explains why most research focuses on the retina. The three most prevalent irreversible causes of vision loss are diabetic retinopathy, age-related macular degeneration (AMD), and glaucoma. Based on a single cell study, Lukowski et al. [18] observed that MALAT1, a long non-coding RNA, plays a major role in retinal homeostasis and disease [75]. They also recognized that MIO-M1 cells have high levels of the thymosin beta 4 gene (TMSB4X), which has been related to glioma malignancy [76], and the calcyclin gene (S100A6), which is implicated in cone-associated or macular diseases [77]. These results show the differences and similarities between MIO-M1 cells and human retinal glial cells. Many groups have utilized scRNA-seq/snRNA-seq data to assess which retinal cell types express genes implicated in different retinal diseases [15, 21–23, 27]. For example, most genes that are mutated in retinitis pigmentosa and initially affect rods are expressed by rods or by retinal pigment epithelial cells and are required for rod viability. Also, most genes mutated in autosomal dominant optic atrophies and Leber hereditary optic neuropathies, which result in RGC death, are selectively expressed in RGCs, and many susceptibility genes related to diabetic retinopathy are expressed in vascular endothelial cells. There are various studies on AMD pathogenesis. The CFH and ARMS2/HTRA1 genes impart the biggest risk for AMD [78]. Voigt et al. [19] evaluated the differentially expressed genes among cells of foveal versus peripheral origin. Menon et al. [17] identified various cell types associated with AMD. Their results suggest that the genetic risk variants related to AMD affect cone photoreceptors, and they emphasized the importance of vascular and glial cells for disease pathogenesis.
They found that expression of COL4A3, HTRA1, and vascular endothelial growth factor (VEGFA) had high scores for leading to AMD. Orozco et al. [22] identified enriched expression of AMD candidate genes in RPE cells. They identified TSPAN10 and TRPM1, which were enriched in the retinal pigment epithelium, as causal genes that have a high impact in early AMD. Yi et al. [27] reported that genes related to AMD are highly enriched in cones and foveal MG, suggesting a relation between regional cell subtypes and this disease. Lyu et al. [79] showed that compositional changes are more pronounced in the macula in rods, endothelium, microglia, astrocytes, and Müller glia in the transition from normal to advanced AMD. They also identified enrichment in coagulation and complement pathways, signaling pathways, tissue remodeling, and antigen presentation, including PI3K-Akt, Rap1, Toll-like, and NOD-like. Sun et al. [25] studied diabetic retinopathy and identified four stress-inducible genes, Rbm3, Cirbp, Mt1, and Mt2, which are commonly expressed in most retinal cell types. Diabetes increases the expression of inflammatory factor genes in retinal microglia and stimulates the expression of immediate early genes (IEGs) in retinal astrocytes. Van Zyl et al. [30] studied glaucoma and identified the cell types that express genes implicated in glaucoma. They found that several genes, such as MYOC, PITX2, CYP1B1, Cav1, and Cav2, are implicated in glaucoma. In their other study [31], they showed that MYOC, ANGPT1, LMX1B, ANGPT2, PITX2, LTBP2, FOXC1, and CPAMD8 were associated with glaucoma. Patel et al. [29] assigned glaucoma-relevant genes to outflow cell clusters. They found that MYOC, PDPN, ANGPTL7, CHI3L1, and ANGPT1 were highly expressed in TM1 and TM2 cell types, while CAV1, CAV2, Tie2 (TEK), NOS3, ANGPT2, and PLAT were highly expressed in vascular endothelial and lymphatic-like cell types.
In Figure 2, we show the percentage of scRNA-seq/snRNA-seq studies based on different ocular diseases and ocular tissues from different species. We found that in vision and ophthalmology, mouse, human, macaque, and chick were studied the most (55%, 19%, 11%, and 9%, respectively), followed by drosophila, pig, and zebrafish, which were studied similarly (about 2% each).
Fig. 2.
Percentage of single cell studies in vision and ophthalmology based on different species, tissues, and diseases. (A) Percentage of studies conducted on mouse, pig, macaque, zebrafish, human, chick, drosophila were 55%, 2%, 11%, 2%, 19%, 9%, and 2%, respectively. From studies done on mouse species, about 89%, 7%, 4% are on retina, iris, and outflow pathways tissues, respectively. Almost all pig studies have only been done on outflow pathways tissues. From studies performed on macaque, 82% were on retina and 18% on outflow pathways tissues. Almost all zebrafish and drosophila studies have only been done on retina. From studies conducted on human, about 90% are on retina while about 10% are focused on outflow pathways tissues. From studies on chick, about 22% are focused on retina while nearly 78% are focused on cornea. (B) The percentage of studies performed on different diseases include about 33% on glaucoma, 29% on age related macular degeneration (AMD), 17% on diabetic retinopathy (DR), 4% on uveoretinitis, and 17% on other uvea diseases.
Of the mouse studies, about 89%, 7%, and 4% focused on retina, iris, and outflow pathway tissues, respectively. It is worth mentioning that almost all of the studies on pig have been applied only to the outflow pathway tissues. Of the studies performed on macaque, about 82% were on retina and 18% on outflow pathway tissues. Almost all zebrafish studies have been conducted only on retina. Of the studies done on human tissues, about 90% analyzed retina and the remainder focused on the outflow pathway tissues. Of the studies performed on chick, about 22% were performed on retina and 78% on cornea (see Figure 2A).
Also, we observed that most single cell studies have focused on AMD, diabetic retinopathy, glaucoma, and uveoretinitis. About 33% of the single cell studies focused on glaucoma, whereas AMD, diabetic retinopathy, uveoretinitis, and other uvea diseases constituted 29%, 17%, 4%, and 17% of studies, respectively (see Figure 2B).
6. Benchmarking cell-type identification methods for scRNA-seq/snRNA-seq data
6.1. Benchmark Datasets
Benchmarking could aid investigators who are new to single cell and vision research to select the most appropriate tools and validation datasets thus avoiding potential confusion and the need to perform trial and error. We introduced numerous scRNA-seq atlases in vision science and now benchmark numerous methods based on several of those atlases to highlight the advantages and limitations of those methods.
The datasets used for benchmarking vary in the number of cells, genes, and cell populations, thus resulting in different levels of challenges in identification of each cell type (see Table 3).
TABLE 3.
The datasets used for benchmarking in this study
Macosko et al. [10] | | Rheaume et al. [12] | | Tran et al. [14] | |
---|---|---|---|---|---|
Cluster name | # of Cells | Cluster name | # of Cells | Cluster name | # of Cells |
HC | 252 | P5 RGC0 | 161 | W3-like1: RGC1 | 3,000 |
RGC | 432 | W3D1: P5 RGC1 | 426 | W3D1: RGC2 | 2,859 |
AC | 289 | P5 RGC2 | 188 | F-mini-ON: RGC3 | 1,990 |
AC | 73 | P5 RGC3 | 196 | F-mini-OFF: RGC4 | 1,868 |
AC | 77 | F-mini-ON: P5 RGC4 | 329 | J-RGC: RGC5 | 1,715 |
AC | 211 | P5 RGC5 | 66 | W3B: RGC6 | 1,590 |
AC | 326 | P5 RGC6 | 52 | RGC7 | 1,579 |
AC | 159 | F-mini-OFF: P5 RGC7 | 268 | RGC8 | 1,258 |
AC | 350 | F-RGC: P5 RGC8 | 95 | T-RGC: RGC9 | 1,223 |
AC | 191 | P5 RGC9 | 185 | RGC10 | 1,170 |
AC | 214 | P5 RGC10 | 143 | RGC11 | 990 |
AC | 274 | P5 RGC11 | 88 | ooDSGC-N: RGC12 | 953 |
AC | 50 | P5 RGC12 | 135 | W3-like2: RGC13 | 943 |
AC | 111 | W3-like1: P5 RGC13 | 429 | RGC14 | 875 |
AC | 73 | J-RGC: P5 RGC14 | 235 | RGC15 | 865 |
AC | 262 | T-RGC-S2: P5 RGC15 | 91 | ooDSGC-D/V: RGC16 | 829 |
AC | 375 | P5 RGC16 | 135 | T-RC-S1: RGC17 | 828 |
AC | 83 | P5 RGC17 | 121 | RGC18 | 826 |
AC | 127 | W3D3: P5 RGC18 | 80 | RGC19 | 775 |
AC | 389 | P5 RGC19 | 115 | RGC20 | 711 |
AC | 254 | P5 RGC20 | 224 | T-RGC-S2: RGC21 | 687 |
AC | 274 | P5 RGC21 | 102 | MX: RGC22 | 610 |
AC | 264 | P5 RGC22 | 48 | W3D2: RGC23 | 601 |
Rods | 29,400 | P5 RGC23 | 93 | RGC24 | 553 |
Cones | 1,868 | P5 RGC24 | 89 | RGC25 | 542 |
BC | 2,217 | P5 RGC25 | 175 | RGC26 | 534 |
BC | 664 | P5 RGC26 | 233 | RGC27 | 529 |
BC | 496 | W3-like2: P5 RGC27 | 147 | F-midi-OFF: RGC28 | 517 |
BC | 591 | P5 RGC28 | 100 | RGC29 | 499 |
BC | 636 | P5 RGC29 | 124 | W3D3: RGC30 | 491 |
BC | 512 | P5 RGC30 | 186 | M2: RGC31 | 444 |
BC | 320 | ooDSGCN: P5 RGC31 | 168 | F-RGC: RGC32 | 407 |
BC | 849 | P5 RGC32 | 133 | M1a: RGC33 | 323 |
M | 1,624 | P5 RGC33 | 108 | RGC34 | 312 |
A | 54 | W3B: P5 RGC34 | 155 | RGC35 | 310 |
F | 85 | P5 RGC35 | 70 | RGC36 | 236 |
V | 252 | P5 RGC36 | 183 | RGC37 | 213 |
P | 63 | P5 RGC37 | 135 | F-midi-ON: RGC38 | 207 |
M | 67 | P5 RGC38 | 150 | RGC39 | 202 |
| | | P5 RGC39 | 44 | M1b: RGC40 | 174 |
| | | P5 RGC40 | 20 | aON-T: RGC41 | 126 |
| | | | | aOFF-S: RGC42 | 113 |
| | | | | aON-S/M4: RGC43 | 106 |
| | | | | RGC44 | 62 |
| | | | | aOFF-T: RGC45 | 54 |
HC = Horizontal Cells, RGC = Retinal Ganglion Cells, AC = Amacrine Cells, BC = Bipolar Cells, M = Müller Glia, A = Astrocytes, F = Fibroblasts, V = Vascular Endothelium, P = Pericytes, M = Microglia.
The first dataset, selected from the Macosko et al. [10] study, included 44,808 single cells from 39 retinal cell types (and subtypes) and 24,658 common genes, generated using droplet-based RNA-seq technology. Single cells in this dataset were obtained from retinas of several 14-day-old wild-type C57BL/6 mice. Because mice within a strain are genetically identical, they present highly uniform inherited characteristics and responses to experimental treatments. The experimental workflow of this study used seven batches, generating 3226, 6020, 8336, 5683, 6991, 6971, and 7581 cells (44,808 cells in total).
The second dataset, selected from the Rheaume et al. [12] study, included 6225 single RGCs comprising 41 different RGC subtypes and 13,616 common genes generated using a single batch based on Droplet-based RNA-seq technology. Single RGCs were obtained from retinas of eight postnatal C57Bl/6 mice at day five including both sexes.
The third dataset, acquired from the Tran et al. [14] study, comprised 35,699 single RGCs in 45 subtypes and 18,222 common genes. Single RGCs were collected from retinas of adult C57BL/6 mice aged 6 to 20 weeks. Single RGCs were generated in three batches using droplet-based RNA-seq technology. Characteristics of the cells and cell types are shown in Table 3.
We randomly split the cells in these datasets into two subsets, where 80% of cells were allocated to the training data and the remaining 20% of cells were allocated to testing data. The performance of the cell-type identification methods was assessed by cross-validation (calculating the mean accuracy based on five rounds of testing).
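One way to read this protocol is as five-fold cross-validation, in which every round holds out a different 20% of cells. The sketch below illustrates that reading; the exact splitting procedure used in the benchmark is not specified beyond the description above, so the shuffling, the seed, and the function name `k_fold_splits` are assumptions.

```python
import random


def k_fold_splits(n_cells, k=5, seed=0):
    """Yield (train_indices, test_indices) for k rounds; each round holds out
    a different ~1/k of the cells (20% when k = 5)."""
    rng = random.Random(seed)
    indices = list(range(n_cells))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, test


# Example with the number of RGCs in the second dataset (6225 cells).
splits = list(k_fold_splits(6225, k=5))
fold_sizes = [(len(train), len(test)) for train, test in splits]
```

The reported accuracy would then be the mean of the per-round accuracies over the five held-out subsets.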
6.2. Data preprocessing
Single cell datasets, count matrices, and annotated labels were downloaded from public resources (GSE63473, GSE115404, GSE137400). We first applied the Seurat [4] R package (version 4.0.6) to normalize and scale the transcriptome datasets. We applied two exclusion criteria: we excluded cells with fewer than 200 expressed genes and then removed genes that were expressed in fewer than three cells. We then selected a subset of highly variable genes to further reduce dimensionality. Specifically, we computed the dispersion (variance/mean) and the mean expression level of each gene [10] and computed the z-normalized dispersion of all genes to find highly variable genes. We used the same cutoff level of 1.7 as was used by Macosko et al. [10], which is also the default parameter in the Seurat package. The preprocessing led to the inclusion of about 2000 highly variable genes for the downstream analysis.
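The filtering and gene-selection steps can be sketched in plain Python as follows. This is a simplified illustration and not the Seurat code path: it z-normalizes the dispersion across all genes at once, whereas the original procedure may normalize within bins of mean expression, and the toy matrix, the lowered `min_genes` value used for the demonstration, and the function names are assumptions.

```python
from statistics import mean, pstdev, pvariance


def filter_matrix(counts, min_genes=200, min_cells=3):
    """counts: list of cells, each a list of per-gene counts.
    Drop cells expressing fewer than min_genes genes, then drop genes
    expressed in fewer than min_cells of the remaining cells."""
    cells = [c for c in counts if sum(v > 0 for v in c) >= min_genes]
    kept_genes = [g for g in range(len(counts[0]))
                  if sum(c[g] > 0 for c in cells) >= min_cells]
    return [[c[g] for g in kept_genes] for c in cells], kept_genes


def highly_variable(counts, z_cutoff=1.7):
    """Column indices of genes whose z-normalized dispersion (variance/mean)
    exceeds z_cutoff. Simplification: one global z-score, no mean binning."""
    genes = list(zip(*counts))
    dispersion = [pvariance(g) / mean(g) if mean(g) > 0 else 0.0 for g in genes]
    mu, sd = mean(dispersion), pstdev(dispersion)
    if sd == 0:
        return []
    return [i for i, d in enumerate(dispersion) if (d - mu) / sd > z_cutoff]


# Toy matrix: 5 cells x 7 genes (illustrative values only).
toy = [
    [1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 20, 3],
    [1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],   # empty cell, removed by the cell filter
]
filtered, kept_genes = filter_matrix(toy, min_genes=2, min_cells=3)
hvg = highly_variable(filtered)
```

On real data the defaults (`min_genes=200`, `min_cells=3`, `z_cutoff=1.7`) correspond to the thresholds quoted in the text.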
6.3. Benchmark methods
Among the forty-five methods discussed in this review, we included seventeen published cell identification methods whose source code or programs were publicly accessible (see Table 4).
TABLE 4.
Cell-type identification methods used for benchmarking
These methods are divided into three main categories: (1) Unsupervised cell-type identification methods, which are based on unsupervised clustering of the transcriptomes [4, 42, 45–48], (2) Semi-supervised cell-type identification methods, which require some limited labeled data in order to allow models to complement their training with unlabeled data [49, 50], (3) Supervised cell-type identification approaches, which need a training dataset labeled with the corresponding cell populations for training the classifier [51–59].
6.4. Evaluation indicators
We evaluated performance based on accuracy, speed, and memory usage. Accuracy is defined as the number of cells whose type is correctly predicted divided by the total number of annotated cells [9]. Memory requirements were obtained by reading the rss (resident set size) attribute returned by calling Process().memory_info() from the psutil Python package [80]. For the approaches implemented in R, we used the reticulate package [81] to call the same Python function for consistency.
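The two indicators can be written down compactly. The sketch below is an illustration rather than the benchmark harness itself: the helper names are hypothetical, and a single reading of the resident set size gives current rather than peak memory, so a real measurement would need to sample it repeatedly during a run.

```python
def accuracy(predicted, annotated):
    """Fraction of annotated cells whose predicted type matches the annotation."""
    if len(predicted) != len(annotated) or not annotated:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, annotated))
    return correct / len(annotated)


def resident_memory_gb():
    """Current resident set size of this process in GB.
    Requires the third-party psutil package; imported lazily so that
    accuracy() remains usable without it."""
    import psutil
    return psutil.Process().memory_info().rss / 1024 ** 3


# Toy labels (illustrative): three of four cells are predicted correctly.
example = accuracy(["rod", "cone", "rod", "BC"], ["rod", "cone", "BC", "BC"])
```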
6.5. Results
6.5.1. Accuracy of cell type identification
We compared seventeen cell-type identification methods for correctly identifying the number of cell types through applying each method on three ocular datasets [10, 12, 14] that contain 39, 41, and 45 cell types. Figure 3 illustrates the accuracy of these models applied on three different datasets.
Fig. 3.
Accuracy (%) of seventeen cell-type identification methods based on (A) Macosko et al. (B) Rheaume et al. (C) Tran et al. benchmark ocular datasets.
Figure 3 presents the performance of different cell-type identification methods based on three ocular datasets. The Macosko et al. [10] dataset contains heterogeneous numbers of cells in different cell types and also includes single cells from seven different batches, more than the Tran et al. [14] and Rheaume et al. [12] datasets. The benchmarking results on the Macosko et al. dataset showed that ItClust and SCTL generate the highest accuracy in comparison to the other methods. Both methods use a transfer learning-based algorithm, which improves cell-type identification, particularly on datasets with varying numbers of cells from different cell types, and removes the batch effect better than the other methods. In benchmarking the methods based on the Rheaume et al. and Tran et al. datasets, with over 40 cell types, NeuCA, ItClust, and SCTL outperformed the other methods. Generally, the results on all three datasets show that supervised cell identification methods obtained higher accuracy. On average, these three methods, ItClust, NeuCA, and SCTL, obtained the highest accuracy across all three datasets.
6.5.2. Accuracy of batch correction
We compared seventeen cell-type identification methods and evaluated the accuracy of the models when the data have been generated in several different batches, using the Macosko et al. dataset. From the seven batches of this dataset, we assigned one batch as the testing dataset and the remaining batches as the training dataset, and repeated this process seven times so that each of the seven batches served as the testing dataset once. Batch B1 contains a substantially smaller number of single cells than batches B2 to B7. Figure 4 (A–G) shows the accuracy (and standard error) of these methods based on batches B1 to B7, respectively.
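This leave-one-batch-out design can be sketched as follows. The batch sizes are the seven cell counts reported for the Macosko et al. dataset; mapping them to B1 through B7 in the listed order is an assumption made for illustration, as is the function name.

```python
def leave_one_batch_out(batch_labels):
    """Yield (held_out_batch, train_indices, test_indices), holding out each batch once."""
    for held_out in sorted(set(batch_labels)):
        test = [i for i, b in enumerate(batch_labels) if b == held_out]
        train = [i for i, b in enumerate(batch_labels) if b != held_out]
        yield held_out, train, test


# Batch sizes as reported for the dataset; the B1..B7 order is assumed.
batch_sizes = [3226, 6020, 8336, 5683, 6991, 6971, 7581]
batch_labels = [f"B{i + 1}" for i, n in enumerate(batch_sizes) for _ in range(n)]

folds = [(name, len(train), len(test))
         for name, train, test in leave_one_batch_out(batch_labels)]
```

Because the held-out cells always come from a batch the classifier has never seen, the accuracy in each fold reflects robustness to batch effects and not only classification quality.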
Fig. 4.
Accuracy (%) of seventeen cell-type identification methods based on seven different batches of the Macosko et al. dataset. We assigned each of batches B1 (A), B2 (B), B3 (C), B4 (D), B5 (E), B6 (F), and B7 (G) as the testing dataset and the remaining batches as the training dataset.
The benchmarking results on Macosko et al. dataset showed that SCTL and ItClust generate the highest accuracy in comparison to the other methods based on different batches. Both methods use a transfer learning-based algorithm, which seems better in removing the batch effect. The accuracy of different models based on the B1 batch is lower compared to the other batches (see Figure 4A) which may suggest lower quality of cells in this batch.
6.5.3. Running time and peak memory usage
We assessed the computational time (Figure 5B) and peak memory usage (Figure 5A) of the cell-type identification methods based on different numbers of cells (sample sizes) randomly selected from the Macosko et al. dataset.
Fig. 5.
Benchmarking cell-type identification methods. (A) Average memory used (GB) of sixteen cell-type identification methods on the Macosko et al. dataset. (B) Average time used (minutes) of sixteen cell-type identification methods based on the Macosko et al. dataset.
The experiments were performed on a workstation with 16 CPU cores at 3.90 GHz, 64 GB of memory, and one GeForce RTX 3070 graphics processing unit (GPU) card.
As expected, almost all of the neural network-based methods (for example, ScDeepSort, ScCapsNet) need more computational time than the other approaches, since a large number of parameters are required to be optimized. Generally, unsupervised cell-type identification models show excellent performance in terms of speed and almost all of these methods require no more than 5 minutes to complete cell type or subtype identification.
To compare the computational time of the cell-type identification methods and to see how they scale when the number of cells increases, we randomly selected subsets of cells from the Macosko et al. dataset and benchmarked the methods.
Results showed that the increase in computational time of Scmap and ScAIDE methods is smooth, suggesting that computational time increases almost linearly with increasing the sample size. Figure 5B shows the performance of cell-type identification methods based on computational time.
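A scaling experiment of this kind can be organized with a small timing harness. The sketch below is a generic illustration: `method` stands for any cell-type identification routine, the built-in `sorted` is used only as a stand-in workload, and the subset sizes are arbitrary.

```python
import random
import time


def time_on_subsets(method, cells, sizes, seed=0):
    """Run `method` on random subsets of increasing size and record wall-clock time."""
    rng = random.Random(seed)
    timings = []
    for n in sizes:
        subset = rng.sample(cells, n)          # random subset without replacement
        start = time.perf_counter()
        method(subset)
        timings.append((n, time.perf_counter() - start))
    return timings


# Stand-in workload: sorting integers instead of annotating real cells.
cells = list(range(1000))
timings = time_on_subsets(sorted, cells, sizes=[100, 500, 1000])
```

Plotting the recorded time against the subset size shows whether a method scales roughly linearly, as reported above for Scmap and ScAIDE, or more steeply.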
To compare the peak memory usage of the cell-type identification approaches, we randomly selected subsets of cells from the Macosko et al. dataset and compared the methods. Results indicated that scVI, scETM, and DESC required the least amount of memory to complete the analyses.
Collectively, the results suggest that applying cell-type identification methods to large datasets requires more memory and longer run times; thus, the dataset size and the computational resources may be planned based on the estimates in Figure 5.
7. Discussion
We discussed numerous vision-related scRNA-seq datasets and explained the overall advantages and disadvantages of different cell-type identification models. We showed that scRNA-seq data analysis methods based on supervised learning were more accurate than methods based on unsupervised or semi-supervised models.
Transfer learning-based (e.g., ItClust, SCTL) and deep NN-based (e.g., NeuCA) methods, in comparison, performed favorably in general. Therefore, when a large number of labeled samples is available, supervised models may identify cell types more accurately; when the data are unlabeled, unsupervised methods (based on autoencoders, e.g., scETM) may be the better choice.
In terms of computational complexity, almost all of the neural network-based approaches (e.g., ScDeepSort and ScCapsNet) require more computational time than the other methods because they must optimize a large number of parameters. In general, unsupervised cell-type identification approaches are fast.
In terms of peak memory usage, unsupervised methods use less memory. In particular, hierarchical Bayesian (e.g., scVI) and autoencoder (e.g., scETM and DESC) methods required the least memory to complete the analyses.
8. Conclusion
In this study, we reviewed single cell studies conducted in vision science. We described numerous vision-related scRNA-seq/snRNA-seq atlases and provided taxonomies based on species, tissue, and ocular disease. We also described and benchmarked different machine-learning-based models for single cell data analysis and categorized them into three main subgroups: unsupervised, semi-supervised, and supervised cell-type and subtype identification methods. We also provided insights into scRNA-seq data visualization and described several approaches for effectively visualizing the outcomes of single cell studies. Finally, we benchmarked seventeen single cell type and subtype identification methods on three different datasets encompassing diverse retinal cell types and subtypes. We reported accuracy, computational time, and memory usage, allowing vision researchers to make the best use of their available computational resources when working with single cell transcriptomic datasets. Our study provides a valuable review of available vision-related single cell datasets and scRNA-seq data analysis techniques and discusses future development of scRNA-seq based cell-type identification methods in vision science.
Acknowledgment
This work was supported in part by grants from the Bright Focus Foundation, NIH Grants EY033005, EY031725, and P30DA044223.
Biographies
Yeganeh Madadi received her Ph.D. in Computer Engineering, Artificial Intelligence from the Azad University of Tehran in 2020 and her MSc in Computer Science from the Amirkabir University of Technology in 2015. Dr. Madadi is a Postdoctoral Research Fellow at the University of Tennessee Health Science Center. She was an Artificial Intelligence researcher at Aalborg University from 2019–2020. In addition, she has over 17 years of executive experience in Computer Engineering at the University of Tehran. Her research interests are Machine Learning, Computer Vision, and Bioinformatics.
Aboozar Monavarfeshani earned his Ph.D. in Biological Sciences from Virginia Tech, and he is currently completing his Postdoctoral training at Boston Children’s Hospital and the Center for Brain Science at Harvard University. During his doctoral training, he investigated molecules that are important for the formation of neural connections between retinal ganglion cells (RGCs) and neurons in several subcortical visual centers in the brain including the dorsal and ventral lateral geniculate nucleus, superior colliculus, and suprachiasmatic nucleus. In his Postdoc, Dr. Monavarfeshani is using single nucleus RNA sequencing to measure, both in post-mortem human tissues and in model organisms, the transcriptional changes that occur in degenerating RGCs, and in other cell types residing in the optic nerve head (ONH)–the primary site of glaucomatous injury to the axons of RGCs.
Hao Chen received his Ph.D. degree in Anatomy from Michigan State University. He is a full professor in the Department of Pharmacology, Addiction Science and Toxicology at the University of Tennessee Health Science Center. Professor Chen has a wide range of research interests, with a focus on the genetics and genomics of substance abuse related phenotypes modeled using rats. In addition, his lab has contributed to many transcriptome studies of brain regions critically involved in the reward circuitry. Lastly, his lab is working on using deep learning to analyze rat social behaviors.
W. Daniel Stamer, Ph.D. was educated at the University of Arizona, earning his Bachelor of Science in Molecular and Cellular Biology in 1990 and doctorate in Pharmacology and Toxicology in 1996. After completing two research fellowships, Professor Stamer started his research program in 1998 at the University of Arizona, where he remained for 13 years; rising through the ranks to full Professor and Director of Ophthalmic Research. He was subsequently recruited to Duke University in 2011, where he currently serves as the Joseph A.C. Wadsworth Professor of Ophthalmology and Professor of Biomedical Engineering. The primary research focus of the Stamer laboratory is to understand the molecular and cellular mechanisms that regulate conventional outflow such that novel targets can be identified, validated and used for the development of therapeutics that target/modify the diseased tissue responsible for elevated intraocular pressure in glaucoma. Over the past 30+ years, Professor Stamer has pioneered the development of cellular, tissue, organ culture and murine model systems for use by his laboratory and others to study conventional outflow physiology and pharmacology. His laboratory has worked closely with industry, assisting in the development/pre-clinical testing of several new classes of glaucoma drugs that target the diseased conventional outflow pathway responsible for ocular hypertension. Research progress is documented in over 175 peer-reviewed primary contributions to the literature and two dozen reviews/book chapters/editorials, having over 9900 citations. His work was recognized by the Rudin Prize for Glaucoma in 2012 and the Research to Prevent Blindness Foundation in 2013. More recently, Professor Stamer was elected as ARVO trustee in 2015 and ARVO president in 2019/20 and elected into the Glaucoma Research Society in 2022. 
He currently holds prominent editorial positions in three premier ophthalmology journals: the Journal of Ocular Pharmacology and Therapeutics, Investigative Ophthalmology and Visual Science and Experimental Eye Research. Moreover, Professor Stamer currently serves on the scientific advisory boards for 6 companies and three foundations that support glaucoma research.
Robert (Rob) W. Williams received a BA in neuroscience from UC Santa Cruz (1975) and a Ph.D. in system physiology at UC Davis (1983). He did postdoctoral work in developmental neurobiology at Yale where he developed novel stereological methods to estimate neuron populations. In 1989 Williams moved to the University of Tennessee and in 2013 established the Department of Genetics, Genomics and Informatics in the College of Medicine. He holds UT Oak Ridge National Laboratory Governor’s Chair in Computational Genomics. Williams is director of the Complex Trait Community (www.complextrait.org) and founding editor-in-chief of Frontiers in Neurogenomics.
Siamak Yousefi is an Assistant Professor at the Department of Ophthalmology and the Department of Genetics, Genomics, and Informatics of the University of Tennessee Health Science Center (UTHSC) in Memphis. He received his PhD in Electrical Engineering from the University of Texas at Dallas in 2012 and completed two postdoctoral trainings at the University of California Los Angeles (UCLA) and University of California San Diego (UCSD). He is the director of the Data Mining and Machine Learning (DM2L) laboratory at UTHSC working on broad applications of Artificial Intelligence (AI) in vision and ophthalmology particularly glaucoma diagnosis, prognosis, and monitoring. Dr. Yousefi is a senior member of the IEEE and a member of ARVO.
Contributor Information
Yeganeh Madadi, Department of Ophthalmology, University of Tennessee Health Science Center, Memphis, TN, USA.
Aboozar Monavarfeshani, Center for Brain Science and Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA; F.M. Kirby Neurobiology Center, Boston Children’s Hospital, Boston, MA, USA.
Hao Chen, Department of Pharmacology, Addiction Science and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA.
W. Daniel Stamer, Department of Ophthalmology, Duke Eye Center, Duke University, Durham, NC, USA.
Robert W. Williams, Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA.
Siamak Yousefi, Department of Ophthalmology, University of Tennessee Health Science Center, Memphis, TN, USA; Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA.
References
- [1].Eberwine J et al. “Analysis of gene expression in single live neurons,” Proceedings of the National Academy of Sciences, vol. 89, no. 7, pp. 3010–3014, 1992.
- [2].Islam S et al. “Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq,” Genome Res, vol. 21, no. 7, pp. 1160–7, Jul 2011, doi: 10.1101/gr.110882.110.
- [3].Butler A, Hoffman P, Smibert P, Papalexi E, and Satija R, “Integrating single-cell transcriptomic data across different conditions, technologies, and species,” Nat Biotechnol, vol. 36, no. 5, pp. 411–420, Jun 2018, doi: 10.1038/nbt.4096.
- [4].Stuart T et al. “Comprehensive Integration of Single-Cell Data,” Cell, vol. 177, no. 7, pp. 1888–1902 e21, Jun 13 2019, doi: 10.1016/j.cell.2019.05.031.
- [5].Shekhar K and Sanes JR, “Generating and using transcriptomically based retinal cell atlases,” Annual Review of Vision Science, vol. 7, pp. 43–72, 2021.
- [6].Zappia L, Phipson B, and Oshlack A, “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database,” PLoS Comput Biol, vol. 14, no. 6, p. e1006245, Jun 2018, doi: 10.1371/journal.pcbi.1006245.
- [7].Abdelaal T et al. “A comparison of automatic cell identification methods for single-cell RNA sequencing data,” Genome Biol, vol. 20, no. 1, p. 194, Sep 9 2019, doi: 10.1186/s13059-019-1795-z.
- [8].Kiselev VY, Andrews TS, and Hemberg M, “Challenges in unsupervised clustering of single-cell RNA-seq data,” Nat Rev Genet, vol. 20, no. 5, pp. 273–282, May 2019, doi: 10.1038/s41576-018-0088-9.
- [9].Xie B, Jiang Q, Mora A, and Li X, “Automatic cell type identification methods for single-cell RNA sequencing,” Comput Struct Biotechnol J, vol. 19, pp. 5874–5887, 2021, doi: 10.1016/j.csbj.2021.10.027.
- [10].Macosko EZ et al. “Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets,” Cell, vol. 161, no. 5, pp. 1202–1214, May 21 2015, doi: 10.1016/j.cell.2015.05.002.
- [11].Shekhar K et al. “Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics,” Cell, vol. 166, no. 5, pp. 1308–1323 e30, Aug 25 2016, doi: 10.1016/j.cell.2016.07.054.
- [12].Rheaume BA et al. “Single cell transcriptome profiling of retinal ganglion cells identifies cellular subtypes,” Nat Commun, vol. 9, no. 1, p. 2759, Jul 17 2018, doi: 10.1038/s41467-018-05134-3.
- [13].Ariss MM, Islam A, Critcher M, Zappia MP, and Frolov MV, “Single cell RNA-sequencing identifies a metabolic aspect of apoptosis in Rbf mutant,” Nat Commun, vol. 9, no. 1, p. 5024, Nov 27 2018, doi: 10.1038/s41467-018-07540-z.
- [14].Tran NM et al. “Single-Cell Profiles of Retinal Ganglion Cells Differing in Resilience to Injury Reveal Neuroprotective Genes,” Neuron, vol. 104, no. 6, pp. 1039–1055 e12, Dec 18 2019, doi: 10.1016/j.neuron.2019.11.006.
- [15].Peng YR et al. “Molecular Classification and Comparative Taxonomics of Foveal and Peripheral Cells in Primate Retina,” Cell, vol. 176, no. 5, pp. 1222–1237 e22, Feb 21 2019, doi: 10.1016/j.cell.2019.01.004.
- [16].Heng JS et al. “Comprehensive analysis of a mouse model of spontaneous uveoretinitis using single-cell RNA sequencing,” Proc Natl Acad Sci U S A, Dec 16 2019, doi: 10.1073/pnas.1915571116.
- [17].Menon M et al. “Single-cell transcriptomic atlas of the human retina identifies cell types associated with age-related macular degeneration,” Nat Commun, vol. 10, no. 1, p. 4902, Oct 25 2019, doi: 10.1038/s41467-019-12780-8.
- [18].Lukowski SW et al. “A single-cell transcriptome atlas of the adult human retina,” EMBO J, vol. 38, no. 18, p. e100811, Sep 16 2019, doi: 10.15252/embj.2018100811.
- [19].Voigt AP et al. “Molecular characterization of foveal versus peripheral human retina by single-cell RNA sequencing,” Exp Eye Res, vol. 184, pp. 234–242, Jul 2019, doi: 10.1016/j.exer.2019.05.001.
- [20].Liang Q et al. “Single-nuclei RNA-seq on human retinal tissue provides improved transcriptome profiling,” Nat Commun, vol. 10, no. 1, p. 5743, Dec 17 2019, doi: 10.1038/s41467-019-12917-9.
- [21].Yan W et al. “Cell Atlas of The Human Fovea and Peripheral Retina,” Sci Rep, vol. 10, no. 1, p. 9802, Jun 17 2020, doi: 10.1038/s41598-020-66092-9.
- [22].Cowan CS et al. “Cell Types of the Human Retina and Its Organoids at Single-Cell Resolution,” Cell, vol. 182, no. 6, pp. 1623–1640 e34, Sep 17 2020, doi: 10.1016/j.cell.2020.08.013.
- [23].Orozco LD et al. “Integration of eQTL and a Single-Cell Atlas in the Human Eye Identifies Causal Genes for Age-Related Macular Degeneration,” Cell Rep, vol. 30, no. 4, pp. 1246–1259 e6, Jan 28 2020, doi: 10.1016/j.celrep.2019.12.082.
- [24].Yan W, Laboulaye MA, Tran NM, Whitney IE, Benhar I, and Sanes JR, “Mouse Retinal Cell Atlas: Molecular Identification of over Sixty Amacrine Cell Types,” J Neurosci, vol. 40, no. 27, pp. 5177–5195, Jul 1 2020, doi: 10.1523/JNEUROSCI.0471-20.2020.
- [25].Sun L et al. “Single cell RNA sequencing (scRNA-Seq) deciphering pathological alterations in streptozotocin-induced diabetic retinas,” Exp Eye Res, vol. 210, p. 108718, Sep 2021, doi: 10.1016/j.exer.2021.108718.
- [26].Yamagata M, Yan W, and Sanes JR, “A cell atlas of the chick retina based on single-cell transcriptomics,” Elife, vol. 10, Jan 4 2021, doi: 10.7554/eLife.63907.
- [27].Yi W et al. “A single-cell transcriptome atlas of the aging human and macaque retina,” Natl Sci Rev, vol. 8, no. 4, p. nwaa179, Apr 2021, doi: 10.1093/nsr/nwaa179.
- [28].Kolsch Y et al. “Molecular classification of zebrafish retinal ganglion cells links genes to cell types to behavior,” Neuron, vol. 109, no. 4, pp. 645–662 e9, Feb 17 2021, doi: 10.1016/j.neuron.2020.12.003.
- [29].Patel G et al. “Molecular taxonomy of human ocular outflow tissues defined by single-cell transcriptomics,” Proc Natl Acad Sci U S A, vol. 117, no. 23, pp. 12856–12867, Jun 9 2020, doi: 10.1073/pnas.2001896117.
- [30].van Zyl T et al. “Cell atlas of aqueous humor outflow pathways in eyes of humans and four model species provides insight into glaucoma pathogenesis,” Proc Natl Acad Sci U S A, vol. 117, no. 19, pp. 10339–10349, May 12 2020, doi: 10.1073/pnas.2001250117.
- [31].van Zyl T, Yan W, McAdams AM, Monavarfeshani A, Hageman GS, and Sanes JR, “Cell atlas of the human ocular anterior segment: Tissue-specific and shared cell types,” Proceedings of the National Academy of Sciences, vol. 119, no. 29, p. e2200914119, 2022.
- [32].Collin J et al. “A single cell atlas of human cornea that defines its development, limbal progenitor cells and their interactions with the immune cells,” Ocul Surf, vol. 21, pp. 279–298, Jul 2021, doi: 10.1016/j.jtos.2021.03.010.
- [33].Catala P et al. “Single cell transcriptomics reveals the heterogeneity of the human cornea to identify novel markers of the limbus and stroma,” Sci Rep, vol. 11, no. 1, p. 21727, Nov 5 2021, doi: 10.1038/s41598-021-01015-w.
- [34].Wang Q et al. “Heterogeneity of human corneal endothelium implicates lncRNA NEAT1 in Fuchs endothelial corneal dystrophy,” Mol Ther Nucleic Acids, vol. 27, pp. 880–893, Mar 8 2022, doi: 10.1016/j.omtn.2022.01.005.
- [35].Jie Wang AR, Nathans Jeremy, “A transcriptome atlas of the mouse iris at single-cell resolution defines cell types and the genomic response to pupil dilation,” eLife, 2021, doi: 10.7554/eLife.73477.
- [36].Youkilis JC and Bassnett S, “Single-cell RNA-sequencing analysis of the ciliary epithelium and contiguous tissues in the mouse eye,” Exp Eye Res, vol. 213, p. 108811, Dec 2021, doi: 10.1016/j.exer.2021.108811.
- [37].Bourne R et al. “Trends in prevalence of blindness and distance and near vision impairment over 30 years: an analysis for the Global Burden of Disease Study,” The Lancet Global Health, vol. 9, no. 2, pp. e130–e143, 2021, doi: 10.1016/s2214-109x(20)30425-3.
- [38].Steinmetz JD et al. “Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study,” The Lancet Global Health, vol. 9, no. 2, pp. e144–e160, 2021.
- [39].Thomson BR et al. “Cellular crosstalk regulates the aqueous humor outflow pathway and provides new targets for glaucoma therapies,” Nat Commun, vol. 12, no. 1, p. 6072, Oct 18 2021, doi: 10.1038/s41467-021-26346-0.
- [40].Bringmann A et al. “The primate fovea: Structure, function and development,” Prog Retin Eye Res, vol. 66, pp. 49–84, Sep 2018, doi: 10.1016/j.preteyeres.2018.03.006.
- [41].A. P. Voigt et al. “Spectacle: An interactive resource for ocular single-cell RNA sequencing data analysis,” Experimental Eye Research, vol. 200, 2020, doi: 10.1016/j.exer.2020.108204.
- [42].Kiselev VY, Yiu A, and Hemberg M, “scmap: projection of single-cell RNA-seq data across data sets,” Nat Methods, vol. 15, no. 5, pp. 359–362, May 2018, doi: 10.1038/nmeth.4644.
- [43].Ianevski A, Giri AK, and Aittokallio T, “Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data,” Nat Commun, vol. 13, no. 1, p. 1246, Mar 10 2022, doi: 10.1038/s41467-022-28803-w.
- [44].Deng Y, Bao F, Dai Q, Wu LF, and Altschuler SJ, “Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning,” Nat Methods, vol. 16, no. 4, pp. 311–314, Apr 2019, doi: 10.1038/s41592-019-0353-7.
- [45].Li X et al. “Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis,” Nat Commun, vol. 11, no. 1, p. 2338, May 11 2020, doi: 10.1038/s41467-020-15851-3.
- [46].Xie K, Huang Y, Zeng F, Liu Z, and Chen T, “scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types,” NAR Genom Bioinform, vol. 2, no. 4, p. lqaa082, Dec 2020, doi: 10.1093/nargab/lqaa082.
- [47].Zhao Y, Cai H, Zhang Z, Tang J, and Li Y, “Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data,” Nat Commun, vol. 12, no. 1, p. 5261, Sep 6 2021, doi: 10.1038/s41467-021-25534-2.
- [48].Lopez R, Regier J, Cole MB, Jordan MI, and Yosef N, “Deep generative modeling for single-cell transcriptomics,” Nat Methods, vol. 15, no. 12, pp. 1053–1058, Dec 2018, doi: 10.1038/s41592-018-0229-2.
- [49].He Y, Yuan H, Wu C, and Xie Z, “DISC: a highly scalable and accurate inference of gene expression and structure for single-cell transcriptomes using semi-supervised deep learning,” Genome Biol, vol. 21, no. 1, p. 170, Jul 10 2020, doi: 10.1186/s13059-020-02083-3.
- [50].Tian T, Zhang J, Lin X, Wei Z, and Hakonarson H, “Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data,” Nat Commun, vol. 12, no. 1, p. 1873, Mar 25 2021, doi: 10.1038/s41467-021-22008-3.
- [51].Bin Duan CZ, Chuai Guohui, Tang Chen, Chen Xiaohan, Chen Shaoqi, and Shaliu Fu GL, Liu Qi, “Learning for single-cell assignment,” SCIENCE ADVANCES, 2020.
- [52].Lieberman Y, Rokach L, and Shay T, “CaSTLe - Classification of single cells by transfer learning: Harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments,” PLoS One, vol. 13, no. 10, p. e0205499, 2018, doi: 10.1371/journal.pone.0205499.
- [53].Miao Z, Moreno P, Huang N, Papatheodorou I, Brazma A, and Teichmann SA, “Putative cell type discovery from single-cell gene expression data,” Nat Methods, vol. 17, no. 6, pp. 621–628, Jun 2020, doi: 10.1038/s41592-020-0825-9.
- [54].Boufea K, Seth S, and Batada NN, “scID Uses Discriminant Analysis to Identify Transcriptionally Equivalent Cell Types across Single-Cell RNA-Seq Data with Batch Effect,” iScience, vol. 23, no. 3, p. 100914, Mar 27 2020, doi: 10.1016/j.isci.2020.100914.
- [55].Wang L et al. “An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data,” Nature Machine Intelligence, vol. 2, no. 11, pp. 693–703, 2020, doi: 10.1038/s42256-020-00244-4.
- [56].Shao X et al. “scDeepSort: a pre-trained cell-type annotation method for single-cell transcriptomics using deep learning with a weighted graph neural network,” Nucleic Acids Res, vol. 49, no. 21, p. e122, Dec 2 2021, doi: 10.1093/nar/gkab775.
- [57].Li Z and Feng H, “A neural network-based method for exhaustive cell label assignment using single cell RNA-seq data,” Sci Rep, vol. 12, no. 1, p. 910, Jan 18 2022, doi: 10.1038/s41598-021-04473-4.
- [58].Hu J, Li X, Hu G, Lyu Y, Susztak K, and Li M, “Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis,” Nat Mach Intell, vol. 2, no. 10, pp. 607–618, Oct 2020, doi: 10.1038/s42256-020-00233-7.
- [59].Madadi Y, Sun J, Chen H, Williams R, and Yousefi S, “Detecting retinal neural and stromal cell classes and ganglion cell subtypes based on transcriptome data with deep transfer learning,” Bioinformatics, 2022, doi: 10.1093/bioinformatics/btac514.
- [60].Haghverdi L, Lun ATL, Morgan MD, and Marioni JC, “Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors,” Nat Biotechnol, vol. 36, no. 5, pp. 421–427, Jun 2018, doi: 10.1038/nbt.4091.
- [61].Martin Rosvall CTB, “Maps of random walks on complex networks reveal community structure,” APPLIED MATHEMATICS, vol. 105, no. 4, pp. 1118–1123, 2008, doi: 10.1073/pnas.0706851105.
- [62].Zheng GX et al. “Massively parallel digital transcriptional profiling of single cells,” Nat Commun, vol. 8, p. 14049, Jan 16 2017, doi: 10.1038/ncomms14049.
- [63].Pascal Vincent HL, Lajoie Isabelle, Bengio Yoshua, Manzagol Pierre-Antoine, “Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion,” Journal of Machine Learning Research, vol. 11, 2010.
- [64].Gelman A and Hill J, Data analysis using regression and multilevel/hierarchical models Cambridge university press, 2006.
- [65].Kostopoulos G, Karlos S, Kotsiantis S, and Ragos O, “Semi-supervised regression: A recent review,” Journal of Intelligent & Fuzzy Systems, vol. 35, no. 2, pp. 1483–1500, 2018.
- [66].Pierson E and Yau C, “ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis,” Genome biology, vol. 16, no. 1, pp. 1–10, 2015.
- [67].Cao J et al. “Comprehensive single-cell transcriptional profiling of a multicellular organism,” Science, vol. 357, no. 6352, pp. 661–667, 2017.
- [68].Guo X, Li W, and Iorio F, “Convolutional neural networks for steady flow approximation,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016, pp. 481–490.
- [69].Ntranos V, Yi L, Melsted P, and Pachter L, “A discriminative learning approach to differential expression analysis for single-cell RNA-seq,” Nature Methods, vol. 16, no. 2, pp. 163–166, 2019.
- [70].Mika S, Ratsch G, Weston J, Scholkopf B, and Mullers K-R, “Fisher discriminant analysis with kernels,” in Neural networks for signal processing IX: Proceedings of the 1999 IEEE signal processing society workshop (cat. no. 98th8468), 1999: Ieee, pp. 41–48.
- [71].Wu Z, Pan S, Chen F, Long G, Zhang C, and Philip SY, “A comprehensive survey on graph neural networks,” IEEE transactions on neural networks and learning systems, vol. 32, no. 1, pp. 4–24, 2020.
- [72].Zhou J et al. “Graph neural networks: A review of methods and applications,” AI Open, vol. 1, pp. 57–81, 2020.
- [73].Wang J et al. “Data denoising with transfer learning in single-cell transcriptomics,” Nature methods, vol. 16, no. 9, pp. 875–878, 2019.
- [74].Maaten L. v. d., “Visualizing Data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
- [75].Wan P, Su W, and Zhuo Y, “Precise long non-coding RNA modulation in visual maintenance and impairment,” J Med Genet, vol. 54, no. 7, pp. 450–459, Jul 2017, doi: 10.1136/jmedgenet-2016-104266.
- [76].Wirsching HG et al. “Thymosin beta 4 gene silencing decreases stemness and invasiveness in glioblastoma,” Brain, vol. 137, no. Pt 2, pp. 433–48, Feb 2014, doi: 10.1093/brain/awt333.
- [77].Yoshida S et al. “Expression profiling of the developing and mature Nrl−/− mouse retina: identification of retinal disease candidates and transcriptional regulatory targets of Nrl,” Hum Mol Genet, vol. 13, no. 14, pp. 1487–503, Jul 15 2004, doi: 10.1093/hmg/ddh160.
- [78].Matušková V et al. “An association of neovascular age-related macular degeneration with polymorphisms of CFH, ARMS2, HTRA1 and C3 genes in Czech population,” Acta ophthalmologica, vol. 98, no. 6, pp. e691–e699, 2020.
- [79].Lyu Y et al. “Implication of specific retinal cell-type involvement and gene expression changes in AMD progression using integrative analysis of single-cell and bulk RNA-seq profiling,” Sci Rep, vol. 11, no. 1, p. 15612, Aug 2 2021, doi: 10.1038/s41598-021-95122-3.
- [80].Rodola G, “psutil: Cross-platform lib for process and system monitoring in python,” vol. Version: 5.8.0, 2020. [Online]. Available: https://psutil.readthedocs.io.
- [81].Allaire JJ, Ushey K, Tang Y & Eddelbuettel D, “reticulate: R Interface to Python,” vol. Version: 1.18, 2017. [Online]. Available: https://github.com/rstudio/reticulate.