Summary
Cytometry data, including flow and mass cytometry, are widely used in immunological studies such as cancer immunotherapy and vaccine trials. These data provide rich insights into immune cell dynamics and their relationship to clinical outcomes. However, traditional analyses based on summary statistics may overlook critical single-cell information. To address this, we introduce cytoGPNet, a novel method for predicting individual-level outcomes from cytometry data. cytoGPNet addresses four key challenges: (1) accommodating varying numbers of cells per sample, (2) analyzing longitudinal cytometry data to capture temporal patterns, (3) maintaining robustness despite limited sample sizes, and (4) ensuring interpretability for biomarker discovery. We apply cytoGPNet across multiple immunological studies with diverse designs and show that it consistently outperforms existing methods in predictive accuracy. Importantly, cytoGPNet also offers interpretable insights at multiple levels, enhancing our understanding of immune responses. These results highlight cytoGPNet’s potential to advance cytometry-based analysis in immunological research.
Keywords: longitudinal cytometry data, small cohort studies, machine learning in immunology, predictive modeling, deep Gaussian process, autoencoders, explainable deep learning, attention-based regression
Highlights
• cytoGPNet predicts individual-level outcomes from single-cell cytometry data
• Integrates deep learning and Gaussian processes to model longitudinal samples
• Handles small cohort studies and varying numbers of cells per sample
• Enables interpretable and scalable biomarker discovery
The bigger picture
Advances in cytometry have enabled detailed profiling of the immune system at single-cell resolution, opening new opportunities for precision medicine in areas such as cancer, autoimmune disorders, and infectious diseases. Translating these complex datasets into clinical insights, however, remains challenging, especially in studies with small patient cohorts or with samples collected at only one or a few time points per subject. cytoGPNet addresses these challenges by using machine learning to predict outcomes directly from single-cell cytometry data, offering a scalable, accurate, and interpretable framework for biomarker discovery and immune monitoring.
cytoGPNet is a deep learning framework that uses Gaussian processes to predict individual-level outcomes from high-dimensional, longitudinal cytometry data. It tackles key challenges in immune profiling, including small sample sizes, variable cell counts, and the need for interpretability. When applied to diverse datasets, it outperforms existing methods and reveals biological insights. This framework advances single-cell data analysis and supports biomarker discovery for precision medicine.
Introduction
Routinely monitoring individuals' immune responses to diseases or interventions, including therapies and vaccinations, is of paramount significance in clinical trials, treatment modalities, and vaccine development. This imperative involves not only identifying immune responses that confer protection against infection or disease but also predicting an individual's response to intervention. The evolution of cytometry technologies, including flow cytometry and mass cytometry (CyTOF), empowers researchers to systematically monitor the peripheral immune status of an individual at a reasonable cost. These technologies offer comprehensive insights into immune cell subsets, activation status, polyfunctionality, and other features. With the capacity to measure 10 to 50 parameters (cell markers/variables), encompassing both phenotypic and functional markers, on a large number of single cells (ranging from hundreds of thousands to several million) for a single sample, such as blood, these multi-dimensional cytometry data provide a rich source of information. This information is invaluable for predicting clinical outcomes, including responses to vaccinations and cancer immune therapies.1,2,3,4,5,6,7,8,9,10,11,12,13
The conventional analysis for predicting clinical outcomes, employing traditional gating strategies14 for cytometry data, can be time-consuming and suboptimal. This approach independently conducts quantitative feature selection and summarization, such as deriving cell subset proportions and mean expression based on gating results, which then serve as a feature vector for outcome prediction. Despite the development of advanced methods to automate manual gating such as SPADE, PhenoGraph, FlowSOM, FlowMeans, FlowPeaks, OpenCyto, and others,15,16,17,18,19,20,21,22,23,24,25,26,27 the stepwise approach—deriving summary statistics first and then using them for prediction—may compromise single-cell resolution, potentially introducing variation and obscuring crucial single-cell information relevant to clinical outcomes.28,29,30 To address this, approaches that directly utilize single-cell matrix data as input have recently gained popularity as potent tools for predicting outcomes, such as CloudPred,31 ProtoCell4P,32 scPheno,33 DeepGeneX,34 and ScRAT,35 for single-cell RNA sequencing (scRNA-seq) data. Notable examples for cytometry data include CellCnn,36 CyTOF_DL,37 and CytoSet.38
In contrast to the existing applications of statistical and machine learning (ML) methods for cytometry data analysis—such as dimensionality reduction, visualization, identification of cell subsets, and data integration39,40—the direct utilization of the single-cell data matrix as input for individual subject outcome prediction presents unique challenges. Firstly, employing advanced ML methods, such as deep neural networks (DNNs), for prediction offers a more flexible model capacity, potentially leading to more accurate outcome prediction. However, training DNNs requires a large number of well-labeled training samples to achieve good prediction performance, posing challenges in studies with a limited number of subjects. Secondly, different samples, represented as single-cell data matrices, often contain varying numbers of cells, hindering the direct use of most existing ML methods without compromising single-cell resolution. For instance, approaches such as summarizing single-cell data by mean marker expressions34 or subsampling the cells to achieve an equal number of cells per sample37,38 can lead to information loss during cell aggregation and cell resampling from the original data. Thirdly, the black-box nature of DNN classifiers restricts interpretability, hindering the identification of predictive biomarkers. Lastly, and most importantly, existing methods do not directly model the relationships between single-cell data obtained longitudinally, which can be crucial for understanding the effects of diseases or interventions.41,42,43,44,45,46 Studies such as immunotherapy trials and early-phase vaccine trials typically enroll a limited number of patients/participants but conduct extensive cytometry-based immune profiling, including longitudinal samples from multiple time points such as pre- and post-treatment/vaccination. Consequently, maintaining coherence in the analysis and explicitly recognizing the longitudinal relationships between samples is essential.41,44,46
While existing methods address some of the challenges associated with predicting clinical outcomes, they fall short of addressing all the aforementioned obstacles. To bridge this gap, we propose a new analysis strategy named cytoGPNet, which seamlessly integrates DNNs into the classical Gaussian process (GP) model. By employing cell-level pre-training of an autoencoder, cytoGPNet learns meaningful cell representations even with small sample sizes (number of subjects). Additionally, the flexibility of the GP model allows for robust handling of cell representations, accommodating variations in cell subsets and the number of cells per sample, as well as capturing temporal dependency. Moreover, we incorporate customized attention layers to adaptively summarize cell-level information to facilitate subject-level prediction. To further enhance interpretability, we introduce a lightweight prediction model based on GP along with a post-hoc interpretation technique.
Results
The cytoGPNet pipeline
As demonstrated in Figure 1, our proposed cytoGPNet architecture has three components: (1) a DNN-based autoencoder (AE) that serves as a dimensionality reduction module, encoding single-cell cytometry data into a low-dimensional latent space, (2) a GP model that captures correlations between cells within the same individual, across different individuals, and over time, thereby enhancing the model's ability to capture inter-cell dependencies and temporal relationships for improved prediction outcomes, and (3) an attention-based outcome predictor that adaptively summarizes cell-level information from each sample, facilitating prediction of the subject's clinical outcome.
Figure 1.
Overview of the proposed cytoGPNet framework
First, we pre-train an autoencoder model at the single-cell level, using the encoder to obtain a low-dimensional representation for each cell (highlighted by the red dashed line). Next, the encoder output is fed into our GP model, which captures correlations across subjects and time points. We then obtain a sample representation by aggregating the cell representations using multiple attention layers, with each attention layer applied to the cell representations at a specific time point. Finally, the sample representation is used as input to a logistic regression for outcome prediction.
More specifically, the DNN-based AE transforms single-cell cytometry data into a lower-dimensional latent space. In contrast to conventional unsupervised dimensionality reduction methods such as UMAP47 and tSNE,48 our AE module is uniquely designed for end-to-end training with the subsequent GP model. The AE is initially pre-trained at the cell level without requiring subject outcome information, fully leveraging the vast number of cells to learn meaningful cell representations even with a limited number of subjects. However, our analysis reveals that, while the AE excels at learning single-cell embeddings, it falls short in capturing multi-cell correlations. This is where the GP becomes well suited, as it is better equipped to explore the immunological variation across cells of different subjects and time points. Modeling such variability is crucial for predicting the subjects' outcomes.41,44,46,49,50 However, the GP requires relatively noise-free input for optimal performance. By combining the AE with the GP, we fully harness the GP's potential for more informative single-cell learning. This design also benefits from automatically adapting the latent space to align with the GP's kernel function during training, thereby enhancing the GP's capability to capture correlations among input cells. In our model, we train the latent space as a Euclidean space, employing the Euclidean distance as the distance metric. In the next step, our GP model takes these latent representations as input and outputs a more compressed representation for each cell that also contains information from other cells. In particular, we choose the squared exponential (SE) kernel51 for our GP. This kernel is used to extract cell differences or correlations from the AE-derived latent space, which is a Euclidean space.
Finally, in predicting subject-level outcomes, we employ a set of attention layers to adaptively consider pertinent information from all cells of a subject across all time points. The attention layers are followed by a simple classifier—in this case, logistic regression (LR)—to generate prediction outcomes. The attention layers are a key feature of cytoGPNet, allowing the model to accommodate varying numbers of cells across samples by automatically learning optimal weights during training. This enhances model flexibility and reduces the complexity of hyper-parameter tuning required for cell summarization per sample. In contrast to basic summary statistics such as the mean or sum, which assign predefined and fixed weights to each cell, our approach offers a more adaptive and effective method for predicting subject-level outcomes.
Our proposed model is designed for end-to-end training. It incorporates the AE, GP, and attention layers to collectively model, learn, and summarize crucial cell information, thereby enhancing prediction accuracy. Additionally, we develop a customized posterior inference and parameter learning method based on inducing points and variational inference,52 facilitating efficient model training with a first-order optimization method (e.g., stochastic gradient descent53 or Adam54). Finally, we develop a post-hoc explanation technique to enhance the interpretability of cytoGPNet.
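To make the two-stage training procedure concrete, the sketch below illustrates cell-level autoencoder pre-training followed by end-to-end fine-tuning with Adam. It is a minimal illustration under stated assumptions: the layer sizes, the stand-in predictor, and the toy cell-level labels are not taken from the released cytoGPNet implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the cytoGPNet modules (sizes follow the text where stated).
n_markers, latent_dim = 25, 4
encoder = nn.Sequential(nn.Linear(n_markers, 16), nn.ReLU(), nn.Linear(16, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, n_markers))
predictor = nn.Linear(latent_dim, 1)            # placeholder for the GP + attention + LR head

cells = torch.randn(5000, n_markers)            # pooled cells from all subjects
labels = torch.randint(0, 2, (5000,)).float()   # toy cell-level labels for illustration only

# Stage 1: cell-level pre-training of the autoencoder (no outcome labels needed).
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(cells)), cells)  # reconstruction error
    loss.backward()
    opt.step()

# Stage 2: end-to-end fine-tuning of the encoder together with the downstream predictor.
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4)
for _ in range(100):
    opt.zero_grad()
    logits = predictor(encoder(cells)).squeeze(-1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    loss.backward()
    opt.step()
print(float(loss))
```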
cytoGPNet enables accurate outcome prediction
We first benchmarked the prediction performance of cytoGPNet against competing methods on five cytometry datasets—three flow cytometry and two CyTOF—spanning various diseases: COVID-19 (SDY1708), influenza (SDY212), HIV (HEUvsUE), non-small cell lung cancer (TOP1501), and cytomegalovirus (CMV) infection (CMV). These datasets feature diverse dimensionalities (with the number of subjects ranging from 20 to 308 and the marker dimension ranging from 8 to 49) and different temporal dynamics (with one to three time points per subject) (see datasets). Additionally, we expanded our analysis by incorporating scRNA-seq data (SC4) to further evaluate cytoGPNet's predictive performance across different single-cell data types. The summaries of all six datasets are provided in Table S1. CellCnn, CyTOF_DL, CytoSet, an AE-based method (AE), LR, and random forest (RF) were used for comparison (see comparison methods). For each of the datasets, we applied a 5-fold cross-validation strategy to ensure robust model evaluation and to prevent overfitting. The input to cytoGPNet is the preprocessed cytometry data (see data processing) from all individual subjects, formatted as matrices with columns representing markers and rows representing individual cells. The outputs of the model are subject-level outcomes of interest, which are binary for our six datasets.
To evaluate the prediction performance, we use the following four metrics: area under the receiver operating characteristic curve (AUC), F1 score (F1), precision, and recall. The AUC measures the overall performance of a model by evaluating the area under the ROC curve, which plots the true positive rate against the false positive rate across various threshold settings. An AUC of 0.5 indicates that the model performs no better than random guessing. The F1 score is a metric that assesses a model’s accuracy by considering both precision and recall. It is the harmonic mean of precision (the proportion of true positive predictions out of all positive predictions) and recall (the proportion of true positive predictions out of all actual positive instances). The F1 score aims to balance the trade-off between precision and recall, providing a single measure of a model’s ability to correctly classify instances as positive or negative. For all four metrics, the values range from 0 to 1, with 1 indicating a perfect classifier.
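For reference, the four metrics can be computed with scikit-learn as in the brief sketch below; the labels and predicted probabilities here are toy values rather than outputs of any of the benchmarked models.

```python
# Illustrative computation of the four evaluation metrics with scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])  # predicted P(y = 1)
y_pred = (y_prob >= 0.5).astype(int)                          # threshold at 0.5

print("AUC      :", roc_auc_score(y_true, y_prob))   # threshold-free ranking quality
print("F1       :", f1_score(y_true, y_pred))        # harmonic mean of precision and recall
print("Precision:", precision_score(y_true, y_pred)) # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))    # TP / (TP + FN)
```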
Figure 2 presents a detailed comparison of the prediction performance of cytoGPNet against other methods. The bar plots highlight the AUC, F1, precision, and recall for each method, evaluated across the six datasets. Our method, cytoGPNet, consistently outperforms the alternatives, as indicated by the highest AUC and F1 scores, demonstrating its superior predictive accuracy. Notably, cytoGPNet shows significant improvement over the AE method, which is limited by its inability to model cell-cell correlations in single-cell inputs. This suggests that the GP model within cytoGPNet effectively models complex cell-cell interactions, thereby enhancing prediction accuracy and contributing to cytoGPNet's improved generalizability. The other three deep learning-based methods—CellCnn, CytoSet, and CyTOF_DL—exhibit varying degrees of performance but generally show lower AUC and F1 values compared with cytoGPNet. Sometimes, these methods perform even worse than the traditional LR and RF models. For instance, this is observed in the CMV dataset, which has a relatively small sample size and a relatively high marker dimensionality. All three methods exhibit lower AUC values than LR and RF, suggesting their potential limitations in handling datasets with less information in terms of sample size and dimensionality. This highlights the necessity of more robust methods such as cytoGPNet that can effectively manage these challenges and provide more accurate predictions. In addition, a comparison of computational time between cytoGPNet and other deep learning-based methods is provided in Figure S1. While our method has a slightly longer running time than the others, the difference is modest, indicating comparable computational cost across all methods.
Figure 2.
Performance comparison of cytoGPNet and competing methods
Prediction accuracy is evaluated using AUC, F1, precision, and recall, based on 5-fold cross-validation across six datasets: SDY1708, SDY212, HEUvsUE, TOP1501, CMV, and SC4. The height of each bar represents the corresponding mean value across the five folds, and the vertical lines (error bars) indicate the standard error.
For the HEUvsUE data analysis, we conducted three experiments: (1) aggregating all 308 blood samples across the seven conditions, (2) using only the 44 unstimulated blood samples, and (3) incorporating condition information for all 308 samples as additional features (all samples with covariates). In the third experiment, we introduced six dummy variables in the final classification layer as covariates in the LR model to account for the seven experimental conditions, with the unstimulated condition serving as the reference group. Each dummy variable corresponds to one of the six stimulation conditions, taking a value of 1 if the sample belongs to that condition and 0 otherwise. This encoding allows the model to estimate the effect of each stimulation condition relative to the unstimulated baseline. The results, shown in the second row and the left panel of the third row of Figure 2, demonstrate that our method remains robust to variations in sample size. However, in the third experiment (all samples with covariates), we observe a slight decrease in the predictive performance of cytoGPNet. This decline is likely due to the lack of significant associations between the six dummy variables and the outcome variable when fitting cytoGPNet to the entire dataset (the smallest p value is 0.24, and the largest is 0.76). Since these covariates do not provide informative contributions to the outcome, their inclusion may introduce unnecessary complexity, ultimately reducing predictive power. Additionally, it is worth noting that none of the benchmarked deep learning-based models can incorporate condition information as covariates into the analysis without modifying their existing code. These findings highlight our algorithm's adaptability to different data configurations while maintaining its predictive capability.
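The sketch below illustrates this dummy-variable encoding with pandas; the condition labels are placeholders rather than the exact ligand names used in the HEUvsUE study.

```python
# Sketch of the covariate encoding: six dummy variables for the stimulation conditions,
# with "unstimulated" as the reference level (all-zero row). Condition names are placeholders.
import pandas as pd

conditions = ["unstimulated", "TLR2", "TLR3", "TLR4", "TLR5", "TLR7/8", "TLR9"]
samples = pd.DataFrame({"condition": ["unstimulated", "TLR4", "TLR9", "TLR3"]})

samples["condition"] = pd.Categorical(samples["condition"], categories=conditions)
dummies = pd.get_dummies(samples["condition"], prefix="stim")
dummies = dummies.drop(columns="stim_unstimulated")  # reference group -> all zeros
print(dummies.astype(int))
```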
For scRNA-seq data (SC4), there are two key differences compared with mass and flow cytometry data: they contain significantly fewer cells per sample and exhibit much higher dimensionality due to the large number of genes measured per cell. To accommodate this high-dimensional nature, we modified our AE architecture used for cytometry data by adding three additional fully connected hidden layers with output sizes of 256, 64, and 16. We used the ReLU activation function and trained the autoencoders using the Adam optimization algorithm. The widths of the three hidden layers were pre-specified without additional tuning, informed by prior experience with scRNA-seq data analysis using autoencoders.55 This was the only structural adjustment made to cytoGPNet, underscoring its flexibility in handling diverse data modalities while maintaining strong predictive performance. To showcase cytoGPNet’s flexibility in handling non-binary outcome data, we further categorized patient outcomes into three groups: healthy, mild or moderate, and severe. To accommodate this multiclass prediction, we extended our model’s final classification layer to a multinomial framework, replacing the sigmoid activation function with a softmax function. This modification preserves cytoGPNet’s core structure while enhancing its adaptability to diverse outcome types. The prediction results, illustrated in Figure S2, are presented in the form of a confusion matrix.
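A minimal sketch of the deeper encoder/decoder and the softmax head is shown below; beyond the stated hidden widths (256, 64, and 16), the exact wiring, latent dimension, and per-cell usage here are illustrative assumptions.

```python
# Sketch of the deeper autoencoder used for scRNA-seq input and a softmax head for the
# three-class outcome (healthy / mild-moderate / severe).
import torch
import torch.nn as nn

n_genes, latent_dim, n_classes = 1000, 4, 3

encoder = nn.Sequential(
    nn.Linear(n_genes, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 16), nn.ReLU(),
    nn.Linear(16, latent_dim),
)
decoder = nn.Sequential(
    nn.Linear(latent_dim, 16), nn.ReLU(),
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, n_genes),
)
# Multiclass head: a softmax over the three outcome groups replaces the sigmoid.
classifier = nn.Sequential(nn.Linear(latent_dim, n_classes), nn.Softmax(dim=-1))

cells = torch.randn(32, n_genes)            # 32 cells x 1,000 highly variable genes
recon = decoder(encoder(cells))             # reconstruction used for pre-training
class_probs = classifier(encoder(cells))    # per-cell class probabilities (toy usage)
print(recon.shape, class_probs.shape)
```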
We also conducted analyses using cell type proportions as features—after applying a centered log-ratio transformation—to perform prediction via LR with a Lasso penalty. Specifically, for datasets lacking manual gating results (i.e., all datasets except TOP1501), we performed automatic clustering using FlowSOM with the default setting of 100 clusters, allowing the algorithm to capture granular cellular subpopulations. FlowSOM then aggregates these 100 clusters into 10 metaclusters, facilitating higher-level biological interpretation. For longitudinal datasets, clustering was performed separately at each time point, and the proportions of each cluster across all time points were included as covariates in the Lasso regression model. As illustrated in Figure S3, the performance of proportion-based prediction varies considerably across datasets, whereas cytoGPNet demonstrates consistent performance. Figure S4 further examines the TOP1501 dataset: Figure S4A displays p values from univariate analyses—where each cell type proportion is added as a covariate individually—and reveals that none of the gated cell types show a statistically significant association with the outcome. Figure S4B compares the performance of cytoGPNet with predictions based on cell type proportions obtained from FlowSOM (using both 100 clusters and 10 metaclusters) as well as manual gating results. For this dataset, we observe a dramatic decrease in prediction performance when relying solely on cell type proportion-based analysis. This suggests that while cell type proportions are a commonly used summary statistic for single-cell data, other features, such as marker expression levels, can also provide valuable insights.
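The proportion-based baseline can be sketched as follows, assuming cluster counts are already available (here simulated rather than produced by FlowSOM or manual gating).

```python
# Sketch of the baseline: centered log-ratio (CLR) transform of per-sample cluster
# proportions followed by L1-penalized (Lasso) logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
counts = rng.integers(1, 500, size=(40, 10)).astype(float)  # 40 samples x 10 clusters
props = counts / counts.sum(axis=1, keepdims=True)

def clr(p, eps=1e-6):
    logp = np.log(p + eps)                       # small offset guards against log(0)
    return logp - logp.mean(axis=1, keepdims=True)

X = clr(props)
y = rng.integers(0, 2, size=40)                  # toy binary outcomes

lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
lasso_lr.fit(X, y)
print("non-zero coefficients:", np.sum(lasso_lr.coef_ != 0))
```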
For both TOP1501 and CMV datasets, which include temporal information and have extremely small sample sizes, all alternative methods experience a decline in terms of AUC. This underscores the critical importance of explicitly modeling longitudinal data to effectively capture temporal dynamics and improve predictive accuracy. We also explored refining our model by explicitly incorporating the longitudinal structure into the additive GP layer. Specifically, we introduced a separate GP designed to learn time-step correlations by obtaining a unified representation of all cells at the same time step and applying a GP to these unified representations. This approach allows the covariance matrix to directly capture temporal dependencies across time points. However, our empirical results did not show a significant or consistent improvement across longitudinal datasets. For instance, while the AUC across five folds increased marginally by 0.2% for the TOP1501 dataset, it declined by 1.4% for the SDY212 dataset and 5% for the CMV dataset. These results indicate that incorporating this additional GP does not improve prediction performance, at least for small sample sizes, suggesting that our original GP formulation is already effectively capturing temporal correlations at a fine granularity. Overall, cytoGPNet’s ability to maintain high prediction performance across various datasets underscores its robustness and adaptability.
cytoGPNet demonstrates robustness to missing time points and smaller cell numbers
We next evaluated the robustness of cytoGPNet when the data contain missing time points or a reduced number of cells. For this purpose, we utilized the SDY212, TOP1501, and CMV datasets, which feature cell measurements across different time points. To simulate a smaller number of cells, we randomly selected 50% of all cells for each blood sample. We then re-trained our model using the same pipeline and assessed its performance using AUC and F1 scores, presented in Table 1. Remarkably, cytoGPNet maintained similar performance on both the full datasets and the datasets with only 50% of the cells. This demonstrates that reducing the number of cells does not significantly impact the accuracy of our model. In contrast, all competing methods either exhibit a decline in performance or continue to perform poorly even with the full datasets.
Table 1.
Table summarizing the prediction performance of cytoGPNet and competing methods using AUC and F1-score metrics
Method | AUC: Full data | AUC: 50% cells | AUC: Missing time points | F1: Full data | F1: 50% cells | F1: Missing time points |
---|---|---|---|---|---|---|
SDY212 | ||||||
cytoGPNet | 78.1 (10.3) | 77.7 (8.17) | 67.6 (12.3) | 83.6 (6.54) | 84.9 (3.75) | 74.2 (8.81) |
AE | 68.4 (7.67) | 70.0 (7.76) | 62.7 (12.5) | 81.1 (3.84) | 81.2 (2.97) | 62.6 (7.74) |
CyTOF_DL | 56.58 (15.29) | 50.47 (11.08) | N/A | 36.28 (21.34) | 32.34 (19.97) | N/A |
CytoSet | 65.42 (17.42) | 62.78 (15.98) | N/A | 48.41 (14.93) | 34.67 (20.63) | N/A |
CellCnn | 57.55 (15.55) | 62.76 (2.93) | N/A | 28.43 (26.14) | 38.89 (21.87) | N/A |
LR | 59.3 (19.6) | 59.3 (19.6) | 50.2 (11.9) | 77.6 (4.97) | 77.6 (4.97) | 68.5 (2.70) |
RF | 57.8 (7.73) | 56.8 (4.81) | 46.9 (6.22) | 81.2 (2.00) | 81.2 (2.00) | 77.2 (4.93) |
TOP1501 | ||||||
cytoGPNet | 85.0 (33.5) | 85.0 (33.5) | 68.0 (28.0) | 78.3 (15.8) | 86.7 (16.3) | 60.8 (12.7) |
AE | 78.0 (22.8) | 69.5 (19.2) | 60.5 (29.4) | 64.0 (25.1) | 52.8 (9.88) | 50 (13.4) |
CyTOF_DL | 57.5 (44.72) | 58.5 (35.95) | N/A | 32.38 (20.52) | 25.71 (25.05) | N/A |
CytoSet | 36.00 (29.35) | 46.5 (29.98) | N/A | 22.67 (21.43) | 29.05 (29.76) | N/A |
CellCnn | 60.00 (32.06) | 63.00 (24.39) | N/A | 32.38 (20.52) | 32.38 (20.52) | N/A |
LR | 65.0 (14.1) | 50.5 (24.5) | 65.0 (14.1) | 42.7 (7.23) | 40.8 (6.87) | 42.7 (7.23) |
RF | 59.5 (38.5) | 59.5 (38.9) | 55.0 (11.2) | 66.7 (N/A) | 66.7 (N/A) | 38.1 (11.0) |
CMV | ||||||
cytoGPNet | 90.0 (22.4) | 87.6 (19.4) | 72.3 (21.8) | 82.7 (16.7) | 84.6 (10.1) | 69.4 (22.9) |
AE | 80.0 (27.4) | 74.8 (22.4) | 64.1 (10.2) | 79.3 (21.7) | 72.0 (20.1) | 59.3 (10.2) |
CyTOF_DL | 60.00 (45.41) | 60.00 (45.41) | N/A | 69.33 (21.27) | 27.33 (23.85) | N/A |
CytoSet | 56.67 (30.84) | 55.00 (37.08) | N/A | 53.80 (32.63) | 57.14 (32.99) | N/A |
CellCnn | 65.00 (48.73) | 61.67 (26.09) | N/A | 62.48 (35.62) | 56.67 (36.51) | N/A |
LR | 84.0 (18.4) | 84.0 (18.4) | 59.2 (29.7) | 82.0 (20.5) | 82.0 (20.5) | 48.2 (31.4) |
RF | 85.0 (22.4) | 81.2 (19.7) | 60.2 (21.4) | 78.0 (17.9) | 79.1 (19.1) | 59.7 (29.3) |
Evaluation is conducted on three longitudinal datasets (SDY212, TOP1501, and CMV) using 5-fold cross-validation, comparing three scenarios: the full dataset (Full data), 50% reduced cells (50% cells), and 10% randomly masked samples from the second time point (Missing time points). Values are reported as the mean across the five folds, with the standard deviation in parentheses; both are scaled by a factor of 100.
To evaluate the robustness of the models in the presence of missing time points, we randomly masked 10% of the samples' data from the second time point, simulating real-world scenarios where patients miss their follow-up visits. Given that CellCnn, CyTOF_DL, and CytoSet cannot handle missing temporal data, we excluded these models from the analysis and focused on comparing cytoGPNet with the remaining methods. More specifically, during the 5-fold cross-validation process, we randomly masked 10% of the training samples' data from the second time point. For subjects with missing cell data, we addressed this by padding with zeros. As shown in Table 1, our model demonstrates superior performance compared with other models, even when accounting for missing time points. Notably, cytoGPNet retains relatively high AUC and F1 values. This indicates the robustness and reliability of our model in handling incomplete datasets while preserving predictive accuracy. This robustness is particularly crucial in longitudinal studies where patient follow-up can be inconsistent, ensuring that cytoGPNet can provide reliable insights even with partial data.
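The two perturbations can be reproduced schematically as below; the data structure and cell counts are simplified stand-ins for the preprocessed datasets.

```python
# Sketch of the robustness perturbations: random 50% cell subsampling per sample, and
# masking the second time point for 10% of subjects (replaced by zero padding).
import numpy as np

rng = np.random.default_rng(1)
# dataset[subject][time] = cell-by-marker matrix with a subject-specific cell count
dataset = {s: {t: rng.normal(size=(rng.integers(500, 2000), 25)) for t in (0, 1)}
           for s in range(20)}

def subsample_cells(X, frac=0.5):
    keep = rng.choice(X.shape[0], size=int(frac * X.shape[0]), replace=False)
    return X[keep]

half_cells = {s: {t: subsample_cells(X) for t, X in times.items()}
              for s, times in dataset.items()}

# Mask the second time point (t = 1) for 10% of subjects, padding with zeros so the
# model still receives an input of the expected marker dimension.
masked = {s: {t: X.copy() for t, X in times.items()} for s, times in dataset.items()}
for s in rng.choice(list(masked), size=int(0.1 * len(masked)), replace=False):
    masked[s][1] = np.zeros((1, 25))
print(half_cells[0][0].shape, masked[0][1].shape)
```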
The ability of cytoGPNet to handle missing data and reduced cell numbers without a significant loss in performance indicates its potential applicability in diverse clinical and research settings. For example, in scenarios such as early-phase clinical trials or studies involving rare diseases, where data may be scarce or incomplete, cytoGPNet’s resilience ensures that valuable predictive insights can still be obtained.
cytoGPNet is able to mitigate batch effect
To evaluate our model’s resilience to batch effects, we utilized the TOP1501 dataset, which consists of samples from three distinct batches in wet lab experiments. Since each batch contained blood samples from the same healthy control subject, we generated line plots for each measured marker by averaging expression levels across cells within the same blood sample. The averaged marker expression was then standardized across the three batches for visualization and comparison. Figure 3A illustrates the presence of batch effects. To further assess these effects, we performed the Kruskal-Wallis test, a non-parametric method for comparing multiple groups. This analysis determined whether statistically significant differences existed between sample distributions from different batches. The results indicated that all markers had significant p values well below 0.05, confirming the presence of batch effects across the measured markers.
Figure 3.
Assessment of batch effects in cytoGPNet
(A) Line plots showing the mean expression of 7 selected markers (out of 25) from samples of the same healthy control subject across 3 batches.
(B) Bar plot illustrating the batch effects across different layers in cytoGPNet, measured by the negative logarithm of p values obtained using the Kruskal-Wallis test. The horizontal dotted red line indicates the negative logarithm of 0.05.
(C–F) Dot plots displaying the activation scores in different layers of cytoGPNet, with each dot representing a sample within a batch, color coded by the patient’s outcome (blue, responder; red, non-responder), with shapes indicating the time point when the sample was collected (circle, baseline; triangle, post-treatment).
Next, following the methodology outlined in Hu et al.,37 we first derived the activation score for each layer of the model. For a given layer, the score is calculated by averaging the output values for each cell after it passes through that layer. This average is computed across all cells and output dimensions within each sample. This score provides a scalar summary measure of the model's internal representation per sample at different stages of cytoGPNet. We then assessed cross-batch heterogeneity in each layer of cytoGPNet using the Kruskal-Wallis test. Our results, shown in Figure 3B, indicate that heterogeneity gradually decreases across the layers: it is strongest at the input layer but becomes insignificant in both the GP and attention layers. This trend is further visualized in Figures 3C–3F. Each dot represents a single flow cytometry sample, color coded by the patient's outcome, with shapes indicating the time point when the sample was collected. These plots visualize how samples from different batches are distributed across cytoGPNet's layers, highlighting how the model processes and integrates information. Starting at the input layer (Figure 3C), the blue (responders) and red (non-responders) dots are intermixed both within each batch and across batches, showing no discernible pattern or separation. However, as cytoGPNet progresses through the subsequent layers, a gradual trend emerges: the blue and red dots begin to separate more distinctly within each batch. For example, in the GP layer, the blue dots tend to have higher values than the majority of the red dots across all three batches, with this separation becoming more pronounced compared with the input and AE layers. This suggests an increasing ability of cytoGPNet to distinguish between responders and non-responders as the data are processed through its layers. By the attention layer (Figure 3F), the separation between the blue and red dots becomes even more distinct. For instance, in batch 2, the distinction between the two groups is noticeably clearer than in the preceding GP layer, indicating that the model has effectively captured and utilized relevant signals to differentiate patient outcomes. This trend suggests that cytoGPNet effectively mitigates batch effects as the data progress through the network layers. Despite the presence of batch effects, cytoGPNet reliably predicted clinical outcomes at the individual subject level, highlighting its ability to discern meaningful signals amid batch-related noise.
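The per-layer batch-effect assessment can be sketched as follows, with simulated layer outputs standing in for the actual cytoGPNet activations.

```python
# Sketch: average a layer's outputs over cells and output dimensions to get one activation
# score per sample, then compare the three batches with the Kruskal-Wallis test.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(2)

def activation_score(layer_output):
    """Mean over all cells and output dimensions -> one scalar per sample."""
    return float(np.mean(layer_output))

# Three batches, each with several samples; each sample is a (cells x dims) layer output.
scores = {b: [activation_score(rng.normal(loc=b * 0.1, size=(1000, 4)))
              for _ in range(6)] for b in range(3)}

stat, p = kruskal(scores[0], scores[1], scores[2])
print(f"Kruskal-Wallis H={stat:.2f}, p={p:.3g}, -log10(p)={-np.log10(p):.2f}")
```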
cytoGPNet suggests potential biomarker dynamics
Identifying blood-based biomarkers to direct immune therapy for cancers remains a major area of unmet need. In non-small cell lung cancer in particular, obtaining adequate tumor biopsies for immune profiling in a safe and timely manner is often not feasible. Traditional immune profiling of blood has limitations for identifying immune cell subsets associated with benefits from programmed death 1 (PD-1) checkpoint therapy. To uncover the biomarkers associated with cytoGPNet’s high predictive accuracy on the TOP1501 dataset, we employed our model’s masking (explanation) algorithm. This algorithm was applied iteratively to gain insights into the potential dynamics of the biomarkers: first on baseline data and then on post-treatment data. The results, as depicted in Figure 4A, reveal that Killer cell lectin-like receptor G1 (KLRG1) emerges as the most predictive biomarker for baseline data. Figure 4B presents tSNE plots of randomly subsampled cells from baseline samples, shown separately for responders and non-responders, color coded by KLRG1 expression levels. Cells with expression values above the median are marked in orange, while those below the median are marked in green. Notably, at least two distinct clusters of cells (in the bottom left and bottom right regions of the figure) are observed. In these clusters, non-responders exhibit a higher proportion of cells with elevated KLRG1 expression compared with responders, suggesting a potential association between KLRG1 expression and response status. Interestingly, none of the markers individually stand out as highly predictive for post-treatment data alone. As a validation, we have also included six markers (the first six markers in Figure 4A) in our analysis, which were not used in the gating process for identifying T cells. As expected, our analysis did not identify these six markers as predictive biomarkers, suggesting that the markers selected by our model are relevant to predictive power. Among these, Violet E and A (405_610/20_vE-A, 405_800/60_vA-A) were part of the Innate panel but not the T cell panel, while Blue A, C, D, and Green E (488_610/20_bD-A, 488_660/40_bC-A, 488_820/60_bA-A, 532_575/25_gE-A) were not used in either panel and served as empty channels. These empty channels were included in the analysis as noise without compensation. Additionally, we performed a separate analysis excluding all six markers, yielding a mean AUC of 86.19 (SD = 0.29) and a mean F1 score of 81.74 (SD = 0.157) across 5-fold cross-validation—results unchanged from the analysis including these markers. This demonstrates that cytoGPNet effectively identifies key predictive markers.
Figure 4.
Marker importance and cell-type associations
(A) Boxplots visualizing the distribution of masking scores (ranging from 0 to 1) for each marker (x axis) for baseline data (left) and post-treatment data (right). The marker with the highest masking score is highlighted with a blue rectangle.
(B) tSNE plots of randomly subsampled cells from baseline samples, shown separately for responders and non-responders and color coded by KLRG1 expression levels. Cells with KLRG1 expression values above the median are marked in orange, while those below the median are marked in green.
(C) Boxplots visualizing the distribution of masking scores for KLRG1 across all cell subsets for both baseline and post-treatment data.
(D) Boxplots depicting the distribution of masking scores for CD4+ CD39+ cells.
Our finding on KLRG1 is consistent with existing research.56 KLRG1, identified as a co-inhibitory receptor for natural killer (NK) cells and antigen-experienced T cells, has been implicated in immune regulation in patients with non-small cell lung cancer. Studies have shown that KLRG1 knockdown affects tumor cell proliferation, highlighting its potential as a therapeutic target. While we found no predictive markers post-treatment, this does not necessarily imply the absence of differential marker expression between responder and non-responder groups post-treatment; rather, post-treatment differences may simply be subtler than those at baseline. One possible reason for this could be the administration of 200 mg intravenous pembrolizumab over two cycles. Pembrolizumab functions by blocking interactions between PD-1 on T cells and its ligands PD-L1 and PD-L2 on tumor cells, thereby restoring effective T cell responses against cancer.57 This mechanism could reduce individual-level differences post-treatment, resulting in a more uniform count of effective T cells across patients and consequently less variability in expression data among all patients. Our subsequent analysis further supports this interpretation.
We next applied our model’s explanation algorithm to different cell subsets identified by manual gating, leading to the identification of multiple cell subset-specific predictive biomarkers. Among them, Figure 4C, top panel, shows five cell subsets where KLRG1 emerges as the most predictive biomarker for baseline data. Interestingly, the bottom panel of Figure 4C demonstrates that, while KLRG1 is not a predictive biomarker for post-treatment data when considering all cells collectively, it remains predictive for patient outcomes for two specific cell subsets: CD8+ CD45RA+ CD197+ and CD8+ CD38− HLADR+ cells (annotated as CD8+ Q1 in the figure).
For CD8+ CD45RA+ CD197+ cells, the biological function has been well characterized in mouse data. KLRG1, defined as MPEC in mice, is known to be highly expressed in noncytotoxic cells with memory functions. These cells possess high levels of CD45RA and CD197 and serve as markers for distinguishing cytotoxic cells within the CD8 population.58 Noncytotoxic cells, also known as “natural killers,” play crucial roles in protective immunity against a wide range of pathogens and tumors.59,60 In contrast, cytotoxic cells are more effective in eliminating cancer cells as tumor antagonist effectors.61 Therefore, responders are expected to exhibit lower proliferation of KLRG1 expression in these cells to maintain tumor suppression, especially after pembrolizumab treatment, as supported by our data (not shown in the figure). This finding aligns with a novel treatment approach that combines the anti-cytotoxic T-lymphocyte-associated antigen-4 monoclonal antibody quavonlimab with pembrolizumab to enhance safety and efficacy in treating extensive-stage lung cancer.62 For CD8+ CD38− HLADR+ cells, KLRG1 appears to be a predictive biomarker for both baseline and post-treatment data. This observation aligns with previous studies showing that the percentage of PD-1+ CD8 T cells that were KLRG1-negative was strongly associated with response in both pre-treatment and 2-week on-treatment blood samples.63 Given that PD-1+ CD8 T cells are significantly enriched in the specified gating (CD38+ HLADR+), it is plausible to infer that KLRG1 can be a critical marker of terminal differentiation.63,64
Figure 4D reveals that, within the CD4+ CD39+ cell subset, both CD127 and CD197 markers are particularly predictive of outcomes at baseline compared with other markers. Previous research has identified CD127 as a crucial factor in the dynamic regulation of the T cell compartment, playing a key role in the maintenance of memory T cells.65 CD197, on the other hand, is predominantly expressed in naive and memory cells, highlighting its importance in these populations.66 Furthermore, for post-treatment data, HLADR emerges as the most predictive marker within the CD4+ CD39+ cell subset. HLADR is a significant marker involved in antigen presentation to T cell receptors on T-helper cells, which subsequently leads to antibody production.67
We also compared our findings with other competing methods. LR fails to detect differentially expressed markers for prediction using a z-test on the parameters. Figure S4C illustrates the variable importance of RF for both baseline and post-treatment samples. CD197 emerges as the most important marker for baseline data. However, another marker, 488_660/40_bC-A, also appears highly important, despite not being used in the gating process for identifying T cells. Additionally, the relatively low prediction accuracy for RF further undermines the reliability of its importance computation. CytoSet lacks interpretability features, making it challenging to discern the significance of individual markers. In contrast, CellCnn provides cell type-specific filter responses, allowing for differentiation of outcomes based on variations in cell filter response values between outcome groups. Figure S4D shows the boxplots of cell filter response values for each cell subset, according to manual gating, for both baseline and post-treatment data. None of the cell subsets show a significant difference between responder and non-responder groups. CyTOF_DL uses a permutation-based method to interpret the model, involving up-sampling of cells and a decision tree algorithm to select significant markers for each cell subset. However, due to its high computational demands, we were unable to obtain any results after 24 h of computation. We also compared our findings with diffcyt, an R package designed for the differential analysis of cytometry data.68 It provides a framework for detecting differentially abundant cell populations and differentially expressed markers across experimental conditions. It outputs p values for each marker across all cell types, indicating whether a marker is differentially expressed within a specific cell type. Additionally, the Benjamini-Hochberg (BH) procedure is applied to adjust for multiple testing. Figure S5 presents a heatmap of adjusted p values from the differential abundance analysis across manually gated cell types and markers under baseline and post-treatment conditions. Overall, both diffcyt and our approach indicate that differential patterns are more pronounced in the baseline data than in the post-treatment data. Furthermore, both methods identify KLRG1 as a significant marker, consistently exhibiting differential expression across most gating labels. Notably, diffcyt detects a larger number of differentially expressed markers, which may be due to its differential analysis within manually gated, overlapping subsets of cells that are not mutually exclusive. In this context, multiple testing correction methods such as the BH procedure tend to be less conservative than when p values are independent.69,70,71,72 This likely contributes to diffcyt identifying more differentially expressed markers compared with our analysis method. Furthermore, the significant adjusted p values are generally close to 0.05, suggesting that applying a more conservative adjustment method could further reduce the number of differentially expressed markers.
Lastly, we also applied cytoGPNet to the analysis of the CMV data. CMV infection significantly impacts immune system dynamics in renal transplant recipients, primarily through the activation and expansion of cytotoxic lymphocytes. A study by Ishiyama et al.73 showed that CMV-infected patients exhibit increased expression of proliferation and cytotoxicity markers, such as Ki67, Granzyme B, NKG2C, and CD57, largely due to the adaptive response of NK cells and CD8+ T cells in controlling viral replication. Interestingly, similar expression trends were observed in non-CMV-infected renal transplant patients, suggesting that immune activation may be influenced by additional factors beyond CMV infection. In our analysis of the CMV dataset, as shown in Figure S6, we identified NKG2C and CD57 as highly predictive markers for CMV classification at day 0, highlighting their potential role in early immune responses to CMV infection. Furthermore, Ki67 was found to be a predictive marker in the post-viremia stage, suggesting its involvement in the proliferative response of cytotoxic lymphocytes following viral clearance. However, contrary to the original study, which identified an additional marker as differentially expressed in CMV-infected patients, our analysis did not replicate this finding. This discrepancy may stem from differences in patient cohorts and underlying biological variations, which merit further exploration.
Methods
The cytoGPNet model
Model design
Formally, consider a study comprising $N$ individual subjects. Each subject $i$ has immune cells measured at $T$ time points $t = 1, \ldots, T$. For instance, with $T = 2$, $t = 1$ could correspond to the baseline measurement, and $t = 2$ corresponds to the measurement after the intervention. Thus, we denote a dataset $\mathcal{D} = \{(X_{i,t}, y_i) : i = 1, \ldots, N;\ t = 1, \ldots, T\}$. $X_{i,t} \in \mathbb{R}^{n_{i,t} \times d}$ is the single-cell cytometry data for subject $i$ measured at time point $t$, where $n_{i,t}$ is the number of cells (rows) for $X_{i,t}$, and $d$ is the number of measured markers for each cell. $n_{i,t}$ typically varies across subjects and time. We define $n = \sum_{i=1}^{N} \sum_{t=1}^{T} n_{i,t}$ as the total number of cells for the entire dataset $\mathcal{D}$. $y_i$ denotes the outcome for patient $i$. For example, $y_i$ can be a binary value with 1 (responder/protected) or 0 (non-responder/non-protected) in an immunotherapy study/vaccine trial setting. We let $x_{i,t,j} = X_{i,t}[j,:]^\top \in \mathbb{R}^d$ be the $j$-th row (cell) of $X_{i,t}$, where $^\top$ denotes transpose. We let $X = \{x_{i,t,j}\}$ denote all cells in the dataset $\mathcal{D}$.
As demonstrated in Figure 1, our proposed model combines DNNs with the classical GP model. Our rationale is: (1) non-parametric models are often preferred in longitudinal study designs because they require fewer assumptions about the underlying mechanism that generates the data. GP is a principled framework for learning non-parametric models probabilistically.74,75,76 Thus, by utilizing all given single-cell data as input, GP learns a unified covariance matrix that captures the correlations between cells within the same subject, across subjects and time, thereby improving the modeling of cell dependency and temporal relationship and the final prediction result. (2) The property of GP that any subset of random variables follows a joint Gaussian distribution, regardless of the number and order of variables in the set, enables the construction of a robust model that can handle variations in the number of cells per sample. This feature is particularly advantageous in studying immune responses where cell counts and cell heterogeneity vary across samples. (3) To further strengthen the capabilities of GP, we aim to enhance the robustness and informativeness of single-cell representations by combining GP with the AE. By training AE jointly with the GP model, the AE can learn an adaptable latent space optimized for the subsequent GP model. In other words, the AE model will be trained to convert noisy input into a latent space that aligns with a Gaussian distribution, making it appropriate for GP inputs. This property ensures that the model remains effective even if the original input does not follow a Gaussian distribution. Furthermore, GP and AE together transform the original input of each cell from a vector into a scalar representation, effectively reducing the dimensionality and making the problem easier for the subsequent outcome prediction model. Additional discussion of design decisions is included in the supplemental note.
Specifically, each cell $x_{i,t,j}$ from subject $i$ and time $t$ is first fed into an AE model consisting of an encoder $E$ and a decoder $D$. The encoder is a multi-layer perceptron that transforms $x_{i,t,j}$ into a latent representation $z_{i,t,j} = E(x_{i,t,j})$, where $z_{i,t,j} \in \mathbb{R}^{d'}$ with $d' < d$. As shown in Figure 1, the decoder has a structure symmetric to the encoder and reconstructs the input using $z_{i,t,j}$, i.e., $\hat{x}_{i,t,j} = D(z_{i,t,j})$. The AE is first pre-trained by minimizing the reconstruction error, i.e., $\sum_{i,t,j} \lVert x_{i,t,j} - \hat{x}_{i,t,j} \rVert_2^2$. Then, we preserve only the encoder and apply our customized GP model to the encoder outputs. Here, we use $Z = \{z_1, \ldots, z_n\}$ to represent the encoder outputs for all cells in the dataset, where $z_j$ is the representation for cell $j$. Formally, a GP defines a prior over a non-parametric function $f \sim \mathcal{GP}(0, k(\cdot, \cdot))$. Following the common setup in the literature,75 we assume a zero mean function, and $k(\cdot, \cdot)$ denotes a positive semi-definite kernel function. Here, we use the SE kernel function $k(z, z') = \sigma^2 \exp\!\left(-\lVert z - z' \rVert_2^2 / (2\ell^2)\right)$ with $\ell$ as the scale parameter, and we set the variance $\sigma^2$ to 1 by default. Any finite collection $\mathbf{f} = (f(z_1), \ldots, f(z_n))^\top$ follows a multivariate Gaussian distribution $\mathcal{N}(\mathbf{0}, K_{nn})$. Here, $K_{nn}$ is the covariance matrix with $[K_{nn}]_{jj'} = k(z_j, z_{j'})$. For simplicity, we omit the subject and time indices and use $z_j$ and $z_{j'}$ to represent the representations of any two cells from $Z$.
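As a concrete reference, the SE kernel on the AE latent space can be computed as in the short sketch below, using the default length scale and unit variance reported in "Model parameters and tuning"; the embeddings are random placeholders.

```python
# Squared exponential (SE) kernel on autoencoder embeddings, giving the GP covariance
# matrix K with K[j, j'] = k(z_j, z_j').
import torch

def se_kernel(Z1, Z2, lengthscale=0.1, variance=1.0):
    # Pairwise squared Euclidean distances between embeddings.
    d2 = torch.cdist(Z1, Z2, p=2) ** 2
    return variance * torch.exp(-0.5 * d2 / lengthscale**2)

Z = torch.randn(500, 4)            # 500 cell embeddings of dimension d' = 4
K = se_kernel(Z, Z)                # (500, 500) positive semi-definite covariance
print(K.shape, K.diagonal()[:3])   # diagonal equals the variance (here 1.0)
```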
So far, we have obtained a compressed representation for each input cell. To enable prediction at the subject level, we first need to derive a subject-level representation from the output of our GP model. Instead of using simple summary statistics, we utilized multiple attention layers to obtain a representation for each subject. The attention layer can learn to automatically adjust the weight for each cell in the subject representation and their contribution to the final outcome prediction. It enables unique cell weights for different subjects and avoids manual effort in determining the proper cell weights.
Specifically, given the GP outputs of all cells in subject $i$, $\mathbf{f}_i = (f_{i,1,1}, \ldots, f_{i,T,n_{i,T}})$, where each $f_{i,t,j}$ is a scalar, we first split it into $T$ sub-vectors according to the temporal information. In other words, the representations of cells from the same subject at the same time are concatenated as one sub-vector. Then, we input each sub-vector into an individual attention layer and output a scalar representation, denoted as $h_{i,t}$. Finally, we concatenate the attention outputs into a subject representation $\mathbf{h}_i = (h_{i,1}, \ldots, h_{i,T})^\top$. The rationale for having an individual attention layer for each time point is to fuse the cell information at the same time and help the subsequent classification model better capture the correlations across time points. We then let $y_i$ follow a Bernoulli distribution whose success probability is given by the LR output, $p(y_i = 1 \mid \mathbf{h}_i) = \mathrm{sigmoid}(\mathbf{w}^\top \mathbf{h}_i + b)$.
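The per-time-point attention pooling and the logistic output can be sketched as follows; the module names, the linear scoring function, and the shapes are illustrative assumptions rather than the exact cytoGPNet layers.

```python
# Sketch: scalar GP outputs of one subject are split by time point, each time point is
# pooled by its own attention layer, and the concatenated summaries feed a logistic head.
import torch
import torch.nn as nn

class TimeAttentionPool(nn.Module):
    def __init__(self):
        super().__init__()
        self.score = nn.Linear(1, 1)                 # per-cell attention score

    def forward(self, f_t):                          # f_t: (n_cells_t, 1) GP outputs
        w = torch.softmax(self.score(f_t), dim=0)    # weights sum to 1 over cells
        return (w * f_t).sum(dim=0)                  # scalar summary h_{i,t}

T = 2
pools = nn.ModuleList([TimeAttentionPool() for _ in range(T)])
logit_head = nn.Linear(T, 1)                         # logistic regression on h_i

# GP outputs for one subject: different numbers of cells at the two time points.
f_subject = [torch.randn(1200, 1), torch.randn(800, 1)]
h = torch.cat([pools[t](f_subject[t]) for t in range(T)])   # h_i in R^T
p_y1 = torch.sigmoid(logit_head(h))                         # P(y_i = 1 | h_i)
print(p_y1.item())
```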
Learning and inference
Direct inference of our model requires $O(n^3)$ computational complexity. To improve scalability, we adopt the inducing points method.52 This method simplifies the posterior computation by reducing the effective number of samples in the GP from $n$ to $m$, where $m \ll n$ is the number of inducing points. More specifically, we define $m$ inducing points $U = (u_1, \ldots, u_m)$, where each inducing point $u_l$ lies in the same latent space as the encoder outputs, and we let $\mathbf{v} = (f(u_1), \ldots, f(u_m))^\top$ be the GP output at $U$. Then, the joint prior of the GP outputs $\mathbf{f}$ on the actual inputs and the outputs $\mathbf{v}$ for the inducing points, together with the conditional prior, are given by:
$$
p(\mathbf{f}, \mathbf{v}) = \mathcal{N}\!\left( \mathbf{0},\ \begin{bmatrix} K_{nn} & K_{nm} \\ K_{mn} & K_{mm} \end{bmatrix} \right), \qquad
p(\mathbf{f} \mid \mathbf{v}) = \mathcal{N}\!\left( K_{nm} K_{mm}^{-1} \mathbf{v},\ K_{nn} - K_{nm} K_{mm}^{-1} K_{mn} \right) \quad \text{(Equation 1)}
$$
where $K_{nn}$, $K_{nm} = K_{mn}^\top$, and $K_{mm}$ are the covariance matrices evaluated between the encoder outputs, between the encoder outputs and the inducing points, and between the inducing points, respectively. With inducing points, we only need to compute the inverse of $K_{mm}$, which significantly reduces the computational cost from $O(n^3)$ to $O(nm^2)$. For simplicity, we omit the explicit dependence on the inputs from the GP outputs $\mathbf{f}$ and $\mathbf{v}$.
So far, our model has introduced the following parameters: the AE parameters $\theta_{\mathrm{AE}}$, the GP kernel parameters $\theta_{\mathrm{GP}}$, the attention layer and prediction model parameters $\theta_{\mathrm{pred}}$, and the inducing points $U$. To learn these parameters, we follow the idea of empirical Bayes75 and maximize the log marginal likelihood $\log p(\mathbf{y})$. Maximizing this log marginal likelihood is computationally expensive and, more importantly, intractable for models with non-Gaussian likelihoods. To provide a factorized approximation to the marginal likelihood and enable efficient learning, we assume a variational posterior over the inducing variables, $q(\mathbf{v}) = \mathcal{N}(\mathbf{m}_v, \mathbf{S}_v)$, and a factorized joint posterior $q(\mathbf{f}, \mathbf{v}) = p(\mathbf{f} \mid \mathbf{v})\, q(\mathbf{v})$, where $p(\mathbf{f} \mid \mathbf{v})$ is the conditional prior in Equation 1. By Jensen's inequality, we can derive the evidence lower bound (ELBO): $\mathcal{L} = \mathbb{E}_{q(\mathbf{f})}\!\left[\log p(\mathbf{y} \mid \mathbf{f})\right] - \mathrm{KL}\!\left(q(\mathbf{v}) \,\|\, p(\mathbf{v})\right)$. The likelihood term is intractable. To address this, we first compute the marginal variational posterior distribution of each $f_j$, denoted as $q(f_j) = \mathcal{N}(\mu_j, \sigma_j^2)$. Then, we apply the reparameterization trick77 to $q(f_j)$. We define $f_j = \mu_j + \sigma_j \epsilon$, with $\epsilon \sim \mathcal{N}(0, 1)$, $\mu_j = K_{jm} K_{mm}^{-1} \mathbf{m}_v$, and $\sigma_j^2 = K_{jj} - K_{jm} K_{mm}^{-1} K_{mj} + K_{jm} K_{mm}^{-1} \mathbf{S}_v K_{mm}^{-1} K_{mj}$, where $K_{jm}$ denotes the $j$-th row of $K_{nm}$. With this reparameterization, we can sample $\epsilon$ from the standard Gaussian distribution and approximate the likelihood term with the Monte Carlo (MC) method78: $\mathbb{E}_{q(\mathbf{f})}\!\left[\log p(\mathbf{y} \mid \mathbf{f})\right] \approx \frac{1}{S} \sum_{s=1}^{S} \log p\!\left(\mathbf{y} \mid \mathbf{f}^{(s)}\right)$, where $S$ is the number of MC samples. With the above approximations, our model parameters can be efficiently learned by maximizing the ELBO using a first-order optimization method.
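The sketch below walks through the corresponding sparse variational GP computations (inducing points, the marginal posterior $q(f_j)$, the reparameterized MC likelihood, and the KL term). It is a generic SVGP illustration with a toy per-cell Bernoulli likelihood, not the exact cytoGPNet objective, which couples the GP outputs to subject-level labels through the attention layers.

```python
# Generic sparse variational GP sketch under stated assumptions.
import torch

def se_kernel(A, B, ls=0.1):
    return torch.exp(-0.5 * torch.cdist(A, B) ** 2 / ls**2)

n, m, d = 200, 20, 4
Z = torch.randn(n, d)                              # encoder outputs (GP inputs)
U = torch.randn(m, d, requires_grad=True)          # inducing point locations
mu = torch.zeros(m, requires_grad=True)            # variational mean of q(v)
L = torch.eye(m, requires_grad=True)               # Cholesky factor of the q(v) covariance

Kmm = se_kernel(U, U) + 1e-5 * torch.eye(m)
Knm = se_kernel(Z, U)
A = Knm @ torch.linalg.inv(Kmm)                    # K_nm K_mm^{-1}
S = L @ L.T

# Marginal variational posterior q(f_j) = N(mean_j, var_j) for every cell j.
f_mean = A @ mu
f_var = (1.0 - (A * Knm).sum(dim=1)                # diag(Knn - Knm Kmm^{-1} Kmn); Knn diag = 1
         + ((A @ S) * A).sum(dim=1)).clamp(min=1e-8)

# Reparameterization trick + Monte Carlo estimate of E_q[log p(y | f)].
y = torch.randint(0, 2, (n,)).float()              # toy per-cell targets for illustration
S_mc = 8
eps = torch.randn(S_mc, n)
f_samples = f_mean + f_var.sqrt() * eps
log_lik = -torch.nn.functional.binary_cross_entropy_with_logits(
    f_samples, y.expand(S_mc, n), reduction="sum") / S_mc

# KL(q(v) || p(v)) between two multivariate Gaussians closes the ELBO.
qv = torch.distributions.MultivariateNormal(mu, scale_tril=L)
pv = torch.distributions.MultivariateNormal(torch.zeros(m), covariance_matrix=Kmm)
elbo = log_lik - torch.distributions.kl_divergence(qv, pv)
print(elbo.item())
```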
Post-hoc explanation
We design a novel technique for global model interpretation using the trained model. Our method can identify the most influential features/markers contributing to the final prediction. The identification of these markers could provide a more comprehensive understanding of the biological mechanisms underlying the clinical outcomes. More specifically, our explanation goal is to search for a minimal set of markers that effectively maintain prediction accuracy across all subjects. We consider these markers to be globally significant since, regardless of perturbations applied to the remaining markers, as long as the values of the identified markers are preserved, the model's prediction remains largely unchanged. This procedure can be formulated as an optimization problem, where the objective is to discover the minimum subset of markers that ensure high prediction performance when perturbing the values of the remaining markers. Formally, we use a mask matrix $M \in \{0, 1\}^{n \times d}$ to indicate marker importance. For the identification of globally important markers, we constrain each column of $M$ to take the same value: $M_{\cdot k} = \mathbf{1}$ indicates that the $k$-th marker is important and its value needs to be preserved, and $M_{\cdot k} = \mathbf{0}$ otherwise. The optimization problem can be formulated as finding the optimal $M$ by solving the objective function $\min_{\boldsymbol{\pi}} \; \mathbb{E}_{M \sim p(M \mid \boldsymbol{\pi})}\!\left[ \ell\!\left( g(\tilde{X}), \mathbf{y} \right) \right] + \lambda\, \Omega(M)$, where $g$ denotes our proposed model. We assume the value of each column $M_{\cdot k}$ is independently sampled from a Bernoulli distribution parameterized by $\pi_k$. This design guarantees the value of $M_{\cdot k}$ to be either 0 or 1 and enables stochastic searching for the proper mask. The joint distribution is $p(M \mid \boldsymbol{\pi}) = \prod_{k=1}^{d} \mathrm{Bernoulli}(M_{\cdot k} \mid \pi_k)$, from which the entire matrix $M$ is sampled. $X$ is the concatenation of the original input cells of all samples across all time points, and $\tilde{X} = M \odot X + (1 - M) \odot \bar{X}$ is a perturbation of $X$, where the values of non-important markers are replaced with the mean value across all cells. Thus, the first term in the loss function aims to minimize the prediction loss on the perturbed samples. For the second term, $\Omega(M)$ is the lasso regularization that restricts the number of non-zero elements in $M$, and $\lambda$ is a hyperparameter that controls the strength of $\Omega(M)$. The objective function can be hard to optimize because we cannot sample from the Bernoulli distribution with unknown parameters, and the Bernoulli distribution is discrete. To tackle these challenges, we approximate the Bernoulli distribution with its continuous relaxation—the Gumbel-Softmax distribution77—which samples $u$ from a uniform distribution and computes $M_{\cdot k}$ as a deterministic function of $u$ and $\pi_k$. With this approximation, we can move the unknown parameter $\boldsymbol{\pi}$ inside the expectation and sample from a known distribution to solve the objective with a first-order optimization method. We use $\pi_k$ as the final marker importance score, i.e., a larger $\pi_k$ indicates that the $k$-th marker is more important.
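A compact sketch of this mask-learning procedure is given below, using a relaxed Bernoulli (Gumbel-Softmax-style) sample per marker column and a placeholder network standing in for the trained cytoGPNet model.

```python
# Learn per-marker keep probabilities pi_k: sample a relaxed Bernoulli mask, replace
# masked markers with their mean, and penalize the mask size. `model` is a placeholder.
import torch
import torch.nn as nn

d, n = 25, 2000
X = torch.randn(n, d)                         # all cells, all samples/time points
y = torch.randint(0, 2, (n,)).float()         # toy labels broadcast to cells
model = nn.Sequential(nn.Linear(d, 8), nn.ReLU(), nn.Linear(8, 1))  # stand-in network

logits_pi = torch.zeros(d, requires_grad=True)    # parameterizes the Bernoulli probs pi_k
marker_means = X.mean(dim=0, keepdim=True)
opt = torch.optim.Adam([logits_pi], lr=0.05)
lam = 0.05                                        # strength of the sparsity penalty

for _ in range(300):
    opt.zero_grad()
    # Relaxed Bernoulli sample in (0, 1) per marker; identical within each column.
    u = torch.rand(d).clamp(1e-6, 1 - 1e-6)
    gumbel = torch.log(u) - torch.log(1 - u)
    mask = torch.sigmoid((logits_pi + gumbel) / 0.5)          # temperature 0.5
    X_pert = mask * X + (1 - mask) * marker_means             # perturb unimportant markers
    pred = model(X_pert).squeeze(-1)
    loss = nn.functional.binary_cross_entropy_with_logits(pred, y) \
           + lam * torch.sigmoid(logits_pi).sum()             # lasso-style penalty on pi
    loss.backward()
    opt.step()

importance = torch.sigmoid(logits_pi).detach()    # larger pi_k -> more important marker
print(importance.topk(5).indices)
```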
Model parameters and tuning
In the cytoGPNet model, we begin by processing input cells across all subjects using an AE with a single hidden layer of 16 neurons, generating a latent embedding of dimension four. Our GP layer includes inducing points and utilizes the SE kernel function, with the output variance set to 1 and the length scale set to 0.1. For AE pre-training, we set the batch size to 128, the number of training epochs to 1,000, and the optimizer to Adam. During end-to-end fine-tuning, we adjust the learning rate, reduce the batch size to 10, and train for 100 epochs on the cytometry datasets. All the hyper-parameters are obtained through a grid search strategy. For scRNA-seq data, we modify the AE architecture by adding three fully connected hidden layers with output sizes of 256, 64, and 16, using the ReLU activation function. The widths of these hidden layers are pre-specified and not tuned. Through a grid search strategy, we set the batch size to 256 and the number of training epochs to 200, using the Adam optimizer for AE pre-training. During end-to-end fine-tuning, we adjust the learning rate, reduce the batch size to 10, and train for 100 epochs.
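A grid search over these hyper-parameters can be organized as in the sketch below; candidate values beyond those stated above are placeholders, and train_and_evaluate is a hypothetical helper standing in for the 5-fold cross-validation routine.

```python
# Minimal grid-search loop over candidate hyper-parameter values.
import itertools

grid = {
    "pretrain_batch_size": [128, 256],
    "finetune_batch_size": [10],
    "pretrain_epochs": [1000],
    "finetune_epochs": [100],
    "latent_dim": [4],
    "lengthscale": [0.1, 1.0],
}

def train_and_evaluate(config):        # placeholder: returns the mean cross-validated AUC
    return 0.0

best_config, best_auc = None, -1.0
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    auc = train_and_evaluate(config)
    if auc > best_auc:
        best_config, best_auc = config, auc
print(best_config, best_auc)
```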
Datasets
SDY1708
A total of 64 patients were enrolled in the Stanford University COVID-19 Biobanking studies from March to June 2020. All patients were over 18 years old and had a positive SARS-CoV-2 test result from the RT-PCR of a nasopharyngeal swab. Additionally, 8 asymptomatic adult donors were included as healthy controls in this study. Thus, the total number of individuals in the dataset is n = 72, where 64 are COVID-19 positive and 8 are healthy controls. For all 72 individuals, blood was collected into cell preparation tubes or heparin vacutainers, and CyTOF processing with a 49-marker panel was performed on peripheral blood immune cells preserved in liquid nitrogen to generate the single-cell mass cytometry data.79 In our analysis, we let y = 1 represent COVID-19 positive and y = 0 represent healthy control.
SDY212
This dataset contains subjects who were enrolled in the influenza vaccine study. Peripheral blood samples, both pre- and post-vaccine, were collected. Flow cytometry data with an 8-marker panel were obtained from experiment EXP13405. Based on seroconversion, 52 individuals were classified as poor responders (PRs), and the remaining were classified as good responders (GRs).80 In our analysis, we let y = 1 represent GRs and y = 0 represent PRs.
HEUvsUE
This dataset contains n = 44 African infants, comprising 20 who were exposed to HIV in the maternal body but remained uninfected (HEU) and 24 who were unexposed (UE).81 A total of 308 blood samples were collected from these infants at 6 months after birth and were either left unstimulated (serving as a control) or stimulated with six Toll-like receptor ligands. Flow cytometry data with an 8-marker panel were obtained from all 308 blood samples. In our analyses, we considered two settings: one using only the unstimulated blood samples and one aggregating all blood samples across the 7 conditions. We defined y = 1 for HEU infants and y = 0 for UE infants.
TOP1501
Patients with stage 1B-3A non-small cell lung cancer received 2 cycles of pembrolizumab, surgery, adjuvant chemotherapy, and 4 cycles of pembrolizumab in a phase II study (NCT02818920). Viable and functional PBMCs were collected and stored at baseline and after pembrolizumab. Major pathologic response was observed in 7 of 29 patients after pembrolizumab. Functional immune cells were measured by flow cytometry using a 25-marker panel at baseline and after pembrolizumab. In our analysis, we use T cells identified by manual gating as input and let y = 1 represent the major pathologic response (responder) and y = 0 represent the non-major pathologic response (non-responder).
CMV
This longitudinal dataset comprises PBMC CyTOF files from an NK-centric panel (singlet or live cells post-debarcoding), collected from 11 CMV-viremic patients and 9 non-viremic (NV) patients,73 for a total of n = 20 patients. Each NV patient has 3 FCS files corresponding to three distinct time points of blood sample collection. Each CMV-viremic patient has 4 to 5 FCS files representing 5 sampling time points, with 4 patients missing one collection. Due to sampling discrepancies between CMV-viremic and NV patients, and the absence of intermediate samples for certain CMV-viremic patients, our analysis focused on three standardized time points: day 0, pre-viremia, and post-viremia, to facilitate comparative analysis. Mass cytometry data with a total of marker panels were generated from these participants. We define y = 1 to indicate a CMV-viremic patient and y = 0 to denote an NV control subject.
SC4
SC4 is a large COVID scRNA-seq dataset comprising n = 196 subjects: 25 healthy controls and 171 COVID patients. Among the COVID patients, 79 exhibited mild or moderate symptoms, and 92 were hospitalized with severe symptoms.82 The dataset includes scRNA-seq data from 284 PBMC samples, processed using the 10X Genomics 5′ sequencing platform. In total, 27,647 genes were detected, from which 1,000 highly variable genes were selected for our subsequent analyses. In our analysis, we define the binary outcome variable such that y = 1 indicates a COVID-positive patient and y = 0 indicates a healthy control.
Data preprocessing
To preprocess the cytometry data, we utilize the hyperbolic arcsine (arcsinh) transformation. This transformation is advantageous because it behaves similarly to a logarithmic transformation for high values while maintaining linearity around zero. Unlike the logarithmic transformation, arcsinh can effectively handle zero and small negative values. A key adjustable parameter of the arcsinh transformation is known as the “cofactor,” which controls the width of the linear region around zero. The transformation is defined as arcsinh(x/c), where x is the raw measured marker intensity value and c is the cofactor. It is recommended to use a cofactor of 5 for CyTOF, and 150 for flow cytometry.83 To preprocess scRNA-seq data, we perform several steps to ensure data quality and prepare for downstream analysis. First, we filter out genes that are expressed in fewer than 10 cells. Next, we normalize the data to account for differences in sequencing depth across cells, scaling the counts in each cell to a total of 10,000. We then apply a logarithmic transformation (log1p) to stabilize variance and approximate a normal distribution of the expression values. The log1p transformation computes log(1 + x) for each expression value x. Finally, we identify the top 1,000 highly variable genes for our analysis.84,85,86
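As a concrete illustration of these steps, the sketch below applies the arcsinh transformation to a cytometry matrix and runs the Scanpy-based scRNA-seq pipeline (gene filtering, depth normalization, log1p, and highly variable gene selection). The function names and the example cofactor are illustrative; this is a sketch of the described workflow, not the authors’ released code.

```python
# Illustrative preprocessing for cytometry (arcsinh) and scRNA-seq (Scanpy) data.
import numpy as np
import scanpy as sc

def arcsinh_transform(intensities, cofactor=5):
    # arcsinh(x / cofactor): log-like for large values, linear near zero,
    # and well defined for zero or small negative intensities.
    return np.arcsinh(intensities / cofactor)

def preprocess_scrnaseq(adata):
    # Gene filtering, depth normalization to 10,000 counts per cell,
    # log1p transformation, and selection of 1,000 highly variable genes.
    sc.pp.filter_genes(adata, min_cells=10)
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=1000)
    return adata[:, adata.var["highly_variable"]].copy()
```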
Comparison methods
CellCnn
CellCnn is an end-to-end model designed to identify rare cell subsets associated with potential disease status within a large number of cells.36 The model employs a representation learning approach, enabling automatic feature extraction and deep learning simultaneously. The filter layers within the convolutional neural network extract significant subject-level phenotypes from raw cytometry data inputs. To ensure that each sample contains an equal number of cells, CellCnn performs random subsampling from the original dataset before each run.
However, compared with later works in cytometry data modeling and analysis, CellCnn only captures limited cellular complexity37 and does not generalize well to datasets with more significant batch effects and measurement noise.38 Moreover, the model supports only basic interpretability by identifying cell populations that most frequently activate each convolutional layer over many runs. CellCnn also does not utilize any form of temporal cytometry data in its predictions.
We downloaded Python codes for CellCnn from its GitHub repository: https://github.com/eiriniar/CellCnn/tree/python3. All four preprocessed datasets in CSV format were fed into the CellCnn model. We performed extensive hyperparameter tuning by exploring different combinations of key parameters: maxpool percent, number of filters, learning rate, dropout probability, and L2 regularization coefficient. For datasets containing temporal information, we combined the cytometry data across different time points for each individual subject into a single data file, as CellCnn is unable to handle time-related information of cells.
CyTOF_DL
CyTOF_DL is an end-to-end model that uses a deep convolutional neural network to predict subject-level outcomes from cytometry data.37 This model architecture captures the high-dimensional characteristics of cells through hidden layers and achieves invariance to cell permutation by using single-cell filters and max/mean pooling. The model also requires subsampling to keep the number of cells equal within each sample. To extract useful biological insights from the trained model, a permutation-based interpretation pipeline was developed. This pipeline quantifies how much each cell in the dataset influences the model’s prediction outcome. However, the model does not support the input of temporal cell information and exhibits a significant drop in prediction accuracy when the sample size is less than 200 or when fewer than 8 cell markers are used.37
We downloaded Python codes for CyTOF_DL from its GitHub repository: https://github.com/hzc363/DeepLearningCyTOF. All four preprocessed datasets in CSV format were fed into the CyTOF_DL model. During model optimization, we experimented with different architectural configurations: convolutional filters, dense units, and learning rates. For datasets containing temporal information, we combined the cytometry data across different time points for each individual subject into a single data file, as CyTOF_DL is unable to handle time-related information of cells.
CytoSet
CytoSet is an end-to-end model designed for clinical outcome prediction, utilizing a custom permutation-invariant architecture.38 CytoSet treats a collection of cells in a dataset as an unordered set, rather than a regular ordered matrix, based on the assumption that the order in which cells are profiled does not have significant biological implications. Due to this unordered approach to dataset handling, the CytoSet architecture does not consider temporal information. CytoSet also requires an equal cell count across different samples, necessitating subsampling before training. Additionally, the model lacks interpretability features.
We downloaded Python codes for CytoSet from its GitHub repository: https://github.com/CompCy-lab/cytoset. Similar to the above methods, all four preprocessed datasets in CSV format were fed into the CytoSet model. We conducted hyperparameter tuning by varying the number of blocks, hidden dimension sizes, learning rates, β1, and β2. The parameters β1 and β2 are the exponential decay rates for the moment estimates in the Adam optimizer: β1 controls the decay rate of the first moment (mean) of the gradient, while β2 controls the decay rate of the second moment (uncentered variance) of the gradient. These hyperparameters affect how quickly the optimizer adapts to changes in the gradient during training. For datasets containing temporal information, we combined the cytometry data across different time points for each individual subject into a single data file, as CytoSet is unable to handle time-related information of cells.
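For reference, the snippet below shows where these two parameters enter when constructing an Adam optimizer in PyTorch; the model and the values shown are the PyTorch defaults, given purely for illustration, not the tuned CytoSet settings.

```python
import torch

model = torch.nn.Linear(8, 1)  # placeholder model, for illustration only
# betas[0] is beta_1 (first-moment/mean decay); betas[1] is beta_2
# (second-moment/uncentered-variance decay).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
```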
AE
We design this AE model as a baseline for comparison with cytoGPNet. Like cytoGPNet, we use an AE to encode the cytometry data. However, unlike cytoGPNet, we bypass the GP layer and instead pass the latent output directly to the set of attention layers and LR to generate subject-level outputs. Similar to cytoGPNet, the AE is first pre-trained by applying a reconstruction loss to the cytometry data at the single-cell level with a default learning rate over 100 epochs. Subsequently, the parameters of the AE are fine-tuned along with the remaining layers by minimizing the binary cross-entropy loss based on the subject-level outputs. To process temporal data, all cells are collectively passed through a single AE and are then directed to distinct attention layers corresponding to each time point. These time-specific representations then serve as separate covariates for LR in the generation of subject-level predictions.
LR
LR is a widely utilized parametric model for analyzing multivariate data (features) with binary outcomes, as is the case in our paper. LR assumes a linear relationship between the logit transform of the class probability and the covariates. In our analysis, we employed mean expression values for all the markers of each subject as features to predict subject-level outcomes. For the cytometry data collected across time, we concatenated the mean expression values of all markers at each time point to form the full set of features. We used the glm function in R to implement LR and did not apply any regularization in our analysis.
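As an illustration of this feature construction, the sketch below builds the per-subject mean-marker features (concatenated across time points) and fits an unregularized logistic regression. We show a Python/scikit-learn analogue purely for illustration; the analysis in the paper used the R glm function as stated above, and the data structure and variable names here are hypothetical.

```python
# Hypothetical feature construction for the LR/RF baselines: per-subject mean
# marker expression, concatenated across time points.
import numpy as np
from sklearn.linear_model import LogisticRegression

def mean_marker_features(samples):
    # `samples` (assumed structure): subject id -> {time point -> (n_cells, n_markers) array}.
    subjects = sorted(samples)
    rows = []
    for subj in subjects:
        per_time = [samples[subj][t].mean(axis=0) for t in sorted(samples[subj])]
        rows.append(np.concatenate(per_time))  # concatenate mean vectors across time points
    return subjects, np.vstack(rows)

# Example usage, assuming `samples` and a matching binary outcome vector `y`:
# subjects, X = mean_marker_features(samples)
# clf = LogisticRegression(penalty=None, max_iter=1000).fit(X, y)
#   # unregularized fit, mirroring glm (penalty=None requires scikit-learn >= 1.2)
```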
RF
RF is an ensemble classification method that utilizes decision trees as its base classifiers. Each decision tree is trained on a random sample, drawn with replacement, of the original data. Moreover, during the split of each node in a decision tree, only a random subset of features is considered. A key advantage of RF is its resistance to overfitting, achieved through the voting mechanism on decisions from multiple trees. Similar to the use of the above LR, we used mean expression values for all the markers as the feature set and applied the randomForest package in R with the default settings of 500 trees and ⌊√p⌋ randomly sampled features at each split, where p is the total number of features.
Discussion
In this study, we introduce cytoGPNet, a novel approach designed to enhance the prediction accuracy of individual-level clinical outcomes using longitudinal cytometry data, particularly under the constraints of limited sample sizes. Traditional cell-gating techniques, while prevalent for extracting salient features from cytometry data, often lack direct integration with predictions, thereby reducing the relevance of cell-gating outcomes to the desired clinical results. In contrast, deep learning models leverage the extensive cytometry data directly, employing back-propagation algorithms to iteratively adjust parameters, thus refining feature detection pertinent to the desired outputs. Deep learning models show significant potential for analyzing cytometry data. However, recent methods, despite their innovation, encounter challenges in achieving consistent performance across diverse cytometry datasets. This issue is especially pronounced in datasets with very few subjects, such as the TOP1501 dataset, where stability of performance metrics such as AUC and F1 score can be compromised. Moreover, many existing methods require uniform cell counts per sample input, necessitating random sampling during preprocessing, which can result in loss of crucial cell information and reduce their effectiveness in handling longitudinal cytometry data.
Our developed method, cytoGPNet, addresses these multiple challenges inherent in cytometry data analysis: accommodating varying cell counts per sample, modeling temporal relationships, maintaining robustness with limited samples, and ensuring interpretability. cytoGPNet demonstrates substantial improvements over existing methods. Our application of cytoGPNet across diverse studies highlights its consistent superiority in prediction accuracy and its ability to provide valuable, interpretable insights. These findings underscore the potential of cytoGPNet to significantly advance the analysis of cytometry data, offering a powerful tool for immunological research. In addition to the strengths of our model, it is important to acknowledge potential extensions and limitations. Incorporating demographic information as additional covariates in the regression layer, as is common in many DNN models, could further enhance its predictive power. We opted not to include this feature in our current model due to inconsistencies across the four datasets, with two lacking demographic data. However, any future design incorporating demographic variables should be carefully implemented in subject-level outcome prediction, particularly given the relatively small sample size, to avoid overfitting. Our current approach employs a unified GP to capture cell-cell correlations across different time points. We visualized the covariance matrix but could not find clear patterns of cell correlations and dependencies. An extension of the current explanation mechanism could involve designing more detailed analysis methods that leverage the covariance matrix to uncover hidden patterns; for example, clustering methods could be applied and the resulting clusters compared for distinct correlation patterns. Meanwhile, our model substantially outperforms existing methods on longitudinal data but may offer smaller gains on cytometry data collected at a single time point. Our cytoGPNet framework’s flexibility is evident in its adaptability to different data types. For lower-dimensional cytometry data, the AE model effectively extracts cell representations. However, for higher-dimensional single-cell data, such as scRNA-seq, AEs may lose information. In such cases, transformer-based architectures could serve as alternative approaches for representation learning. Transformers have shown promise in modeling complex dependencies in high-dimensional data.87 Nonetheless, transformers do not inherently perform dimensionality reduction, so additional steps would be necessary to integrate their outputs into the GP component.
Resource availability
Lead contact
Requests for resources should be directed to the lead contact, Lin Lin (l.lin@duke.edu).
Materials availability
This study did not generate new materials.
Data and code availability
The datasets used in this paper are publicly available from the following sources. SDY170879 and SDY21280 are available from ImmPort (https://www.immport.org) under study accessions ImmPort: SDY1708 and SDY212, respectively. The HEUvsUE dataset81 is available through FlowRepository under repository ID FR-FCM-ZZZU. The CMV dataset73 is hosted on Mendeley Data at https://data.mendeley.com/datasets/fnbvcyf223/1, and the SC4 dataset82 can be accessed via CELLxGENE at https://cellxgene.cziscience.com/collections/0a839c4b-10d0-4d64-9272-684c49a2c8ba. Raw flow cytometry data for the TOP1501 dataset have been deposited on Zenodo.88 An open-source implementation of cytoGPNet is available on GitHub at https://github.com/llin-lab/cytoGPNet and has also been archived on Zenodo.89
Acknowledgments
Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, provided financial support for the study. The opinions expressed in this paper are those of the authors and do not necessarily represent those of Merck Sharp & Dohme LLC. This research was also supported by the Duke University Center for AIDS Research (CFAR), an NIH funded program (5P30 AI064518), and NIH P01 (2 P01 AI129859). The authors gratefully recognize the contributions of Jennifer Enzor and Prekshaben Patel, who generated all of the original TOP1501 flow cytometry data in the Duke Immune Profiling Core (DIPC), a designated Shared Resource of the NIH-sponsored Duke Cancer Institute (5P30-CA014236-50). We would like to thank Xiaoyu Liu for his valuable assistance in performing the benchmarking analysis, which contributed to the early stages of this project.
Author contributions
J.Z. programmed the model, performed data analyses, and edited the manuscript. L.S. performed benchmark analysis and edited the manuscript. N.E.R. reviewed data and edited the manuscript. W.G. conceptualized the new model and wrote the manuscript. L.L. conceived the study, conceptualized the new model, interpreted the results, and wrote the manuscript.
Declaration of interests
The authors declare no competing interests.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the authors used ChatGPT to improve conciseness. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Published: June 25, 2025
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.patter.2025.101297.
Contributor Information
Wenbo Guo, Email: henrygwb@ucsb.edu.
Lin Lin, Email: l.lin@duke.edu.
Supplemental information
References
- 1.Lin L., Finak G., Ushey K., Seshadri C., Hawn T.R., Frahm N., Scriba T.J., Mahomed H., Hanekom W., Bart P.-A., et al. COMPASS identifies t-cell subsets correlated with clinical outcomes. Nat. Biotechnol. 2015;33:610–616. doi: 10.1038/nbt.3187. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Daud A.I., Loo K., Pauli M.L., Sanchez-Rodriguez R., Sandoval P.M., Taravati K., Tsai K., Nosrati A., Nardo L., Alvarado M.D., et al. Tumor immune profiling predicts response to anti–pd-1 therapy in human melanoma. J. Clin. Investig. 2016;126:3447–3452. doi: 10.1172/JCI87324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Lingblom C.M.D., Kowli S., Swaminathan N., Maecker H.T., Lambert S.L. Baseline immune profile by cytof can predict response to an investigational adjuvanted vaccine in elderly adults. J. Transl. Med. 2018;16:153. doi: 10.1186/s12967-018-1528-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Schilling H.-L., Glehr G., Kapinsky M., Ahrens N., Riquelme P., Cordero L., Bitterer F., Schlitt H.J., Geissler E.K., Haferkamp S., et al. Development of a flow cytometry assay to predict immune checkpoint blockade-related complications. Front. Immunol. 2021;12 doi: 10.3389/fimmu.2021.765644. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Shah J.A., Musvosvi M., Shey M., Horne D.J., Wells R.D., Peterson G.J., Cox J.S., Daya M., Hoal E.G., Lin L., et al. A functional toll-interacting protein variant is associated with bacillus calmette-guérin–specific immune responses and tuberculosis. Am. J. Respir. Crit. Care Med. 2017;196:502–511. doi: 10.1164/rccm.201611-2346OC. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Tamariz-Amador L.-E., Battaglia A.M., Maia C., Zherniakova A., Guerrero C., Zabaleta A., Burgos L., Botta C., Fortuño M.-A., Grande C., et al. Immune biomarkers to predict sars-cov-2 vaccine effectiveness in patients with hematological malignancies. Blood Cancer J. 2021;11:202. doi: 10.1038/s41408-021-00594-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Aleman A., Upadhyaya B., Tuballes K., Kappes K., Gleason C.R., Beach K., Agte S., Srivastava K., PVI/Seronet Study Group, Van Oekelen O., et al. Variable cellular responses to sars-cov-2 in fully vaccinated patients with multiple myeloma. Cancer Cell. 2021;39:1442–1444. doi: 10.1016/j.ccell.2021.09.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Stefanski A.-L., Rincon-Arevalo H., Schrezenmeier E., Karberg K., Szelinski F., Ritter J., Chen Y., Jahrsdörfer B., Ludwig C., Schrezenmeier H., et al. B cell characteristics at baseline predict vaccination response in rtx treated patients. Front. Immunol. 2022;13 doi: 10.3389/fimmu.2022.822885. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Seshadri C., Lin L., Scriba T.J., Peterson G., Freidrich D., Frahm N., DeRosa S.C., Moody D.B., Prandi J., Gilleron M., et al. T cell responses against mycobacterial lipids and proteins are poorly correlated in south african adolescents. J. Immunol. 2015;195:4595–4603. doi: 10.4049/jimmunol.1501285. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Schillebeeckx I., Earls J., Flanagan K.C., Hiken J., Bode A., Armstrong J.R., Messina D.N., Adkins D., Ley J., Alborelli I., et al. T cell subtype profiling measures exhaustion and predicts anti-pd-1 response. Sci. Rep. 2022;12:1342. doi: 10.1038/s41598-022-05474-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Xiao X., Guo Q., Cui C., Lin Y., Zhang L., Ding X., Li Q., Wang M., Yang W., Kong Y., Yu R. Multiplexed imaging mass cytometry reveals distinct tumor-immune microenvironments linked to immunotherapy responses in melanoma. Commun. Med. 2022;2:131. doi: 10.1038/s43856-022-00197-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Rubio A.M., Everaert C., Damme E.V., Preter K.D., Vermaelen K. Circulating immune cell dynamics as outcome predictors for immunotherapy in non-small cell lung cancer. J. Immunother. Cancer. 2023;11 doi: 10.1136/jitc-2023-007023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Nuñez N.G., Berner F., Friebel E., Unger S., Wyss N., Gomez J.M., Purde M.-T., Niederer R., Porsch M., Lichtensteiger C., et al. Immune signatures predict development of autoimmune toxicity in patients with cancer treated with immune checkpoint inhibitors. Med. 2023;4:113–129.e7. doi: 10.1016/j.medj.2022.12.007. [DOI] [PubMed] [Google Scholar]
- 14.Brummelman J., Haftmann C., Núñez N.G., Alvisi G., Mazza E.M.C., Becher B., Lugli E. Development, application and computational analysis of high-dimensional fluorescent antibody panels for single-cell flow cytometry. Nat. Protoc. 2019;14:1946–1969. doi: 10.1038/s41596-019-0166-2. [DOI] [PubMed] [Google Scholar]
- 15.Qiu P., Simonds E.F., Bendall S.C., Gibbs K.D., Bruggner R.V., Linderman M.D., Sachs K., Nolan G.P., Plevritis S.K. Extracting a cellular hierarchy from high-dimensional cytometry data with spade. Nat. Biotechnol. 2011;29:886–891. doi: 10.1038/nbt.1991. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Levine J.H., Simonds E.F., Bendall S.C., Davis K.L., Amir E.a.D., Tadmor M.D., Litvin O., Fienberg H.G., Jager A., Zunder E.R., et al. Data-driven phenotypic dissection of aml reveals progenitor-like cells that correlate with prognosis. Cell. 2015;162:184–197. doi: 10.1016/j.cell.2015.05.047. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Van Gassen S., Callebaut B., Van Helden M.J., Lambrecht B.N., Demeester P., Dhaene T., Saeys Y. Flowsom: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry. A. 2015;87:636–645. doi: 10.1002/cyto.a.22625. [DOI] [PubMed] [Google Scholar]
- 18.Aghaeepour N., Nikolic R., Hoos H.H., Brinkman R.R. Rapid cell population identification in flow cytometry data. Cytometry. A. 2011;79:6–13. doi: 10.1002/cyto.a.21007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Ge Y., Sealfon S.C. flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding. Bioinformatics. 2012;28:2052–2058. doi: 10.1093/bioinformatics/bts300. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Finak G., Frelinger J., Jiang W., Newell E.W., Ramey J., Davis M.M., Kalams S.A., De Rosa S.C., Gottardo R. Opencyto: An open source infrastructure for scalable, robust, reproducible, and automated, end-to-end flow cytometry data analysis. PLoS Comput. Biol. 2014;10:e1003806. doi: 10.1371/journal.pcbi.1003806. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Lin L., Chan C., Hadrup S.R., Froesig T.M., Wang Q., West M. Hierarchical bayesian mixture modelling for antigen-specific t-cell subtyping in combinatorially encoded flow cytometry studies. Stat. Appl. Genet. Mol. Biol. 2013;12:309–331. doi: 10.1515/sagmb-2012-0001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Pyne S., Hu X., Wang K., Rossin E., Lin T.-I., Maier L.M., Baecher-Allan C., McLachlan G.J., Tamayo P., Hafler D.A., et al. Automated high-dimensional flow cytometric data analysis. Proc. Natl. Acad. Sci. USA. 2009;106:8519–8524. doi: 10.1073/pnas.0903028106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Lin L., Chan C., West M. Discriminative variable subsets in bayesian classification with mixture models, with application in flow cytometry studies. Biostatistics. 2016;17:40–53. doi: 10.1093/biostatistics/kxv021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Cron A., Gouttefangeas C., Frelinger J., Lin L., Singh S.K., Britten C.M., Welters M.J.P., van der Burg S.H., West M., Chan C. Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples. PLoS Comput. Biol. 2013;9:e1003130. doi: 10.1371/journal.pcbi.1003130. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Cheung M., Campbell J.J., Whitby L., Thomas R.J., Braybrook J., Petzing J. Current trends in flow cytometry automated data analysis software. Cytometry. A. 2021;99:1007–1021. doi: 10.1002/cyto.a.24320. [DOI] [PubMed] [Google Scholar]
- 26.Lin L., Li J. Clustering with hidden markov model on variable blocks. J. Mach. Learn. Res. 2017;18:1–49. [Google Scholar]
- 27.Lin L., Frelinger J., Jiang W., Finak G., Seshadri C., Bart P.-A., Pantaleo G., McElrath J., DeRosa S., Gottardo R. Identification and visualization of multidimensional antigen-specific t-cell populations in polychromatic cytometry data. Cytometry. A. 2015;87:675–682. doi: 10.1002/cyto.a.22623. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Maecker H.T., McCoy J.P., Nussenblatt R. Standardizing immunophenotyping for the human immunology project. Nat. Rev. Immunol. 2012;12:191–200. doi: 10.1038/nri3158. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Hu Z., Glicksberg B.S., Butte A.J. Robust prediction of clinical outcomes using cytometry data. Bioinformatics. 2019;35:1197–1203. doi: 10.1093/bioinformatics/bty768. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Zhang J., Li J., Lin L. Statistical and machine learning methods for immunoprofiling based on single-cell data. Hum. Vaccin. Immunother. 2023;19 doi: 10.1080/21645515.2023.2234792. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.He B., Thomson M., Subramaniam M., Perez R., Ye C.J., Zou J. CloudPred: Predicting patient phenotypes from single-cell RNA-seq. 2021:337–348. doi: 10.1142/9789811250477_0031. [DOI] [PubMed]
- 32.Xiong G., Bekiranov S., Zhang A. ProtoCell4P: an explainable prototype-based neural network for patient classification using single-cell RNA-seq. Bioinformatics. 2023;39 doi: 10.1093/bioinformatics/btad493. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Zeng F., Kong X., Yang F., Chen T., Han J. scpheno: A deep generative model to integrate scrna-seq with disease phenotypes and its application on prediction of covid-19 pneumonia and severe assessment. bioRxiv. 2022 doi: 10.1101/2022.06.20.496916. Preprint at. [DOI] [Google Scholar]
- 34.Kang Y., Vijay S., Gujral T.S. Deep neural network modeling identifies biomarkers of response to immune-checkpoint therapy. iScience. 2022;25 doi: 10.1016/j.isci.2022.104228. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Mao Y., Lin Y.-Y., Wong N.K.Y., Volik S., Sar F., Collins C., Ester M. Phenotype prediction from single-cell rna-seq data using attention-based neural networks. Bioinformatics. 2024;40:btae067. doi: 10.1093/bioinformatics/btae067. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Arvaniti E., Claassen M. Sensitive detection of rare disease-associated cell subsets via representation learning. Nat. Commun. 2017;8 doi: 10.1038/ncomms14825. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Hu Z., Tang A., Singh J., Bhattacharya S., Butte A.J. A robust and interpretable end-to-end deep learning model for cytometry data. Proc. Natl. Acad. Sci. USA. 2020;117:21373–21380. doi: 10.1073/pnas.2003026117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Yi H., Stanley N. Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. BCB ’21. Association for Computing Machinery; New York, NY, USA: 2021. Cytoset: Predicting clinical outcomes via set-modeling of cytometry data. [Google Scholar]
- 39.Hu Z., Bhattacharya S., Butte A.J. Application of machine learning for cytometry data. Front. Immunol. 2021;12 doi: 10.3389/fimmu.2021.787574. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Lo Y.-C., Keyes T.J., Jager A., Sarno J., Domizi P., Majeti R., Sakamoto K.M., Lacayo N., Mullighan C.G., Waters J., et al. Cytofin enables integrated analysis of public mass cytometry datasets using generalized anchors. Nat. Commun. 2022;13:934. doi: 10.1038/s41467-022-28484-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Nixon A.B., Schalper K.A., Jacobs I., Potluri S., Wang I.-M., Fleener C. Peripheral immune-based biomarkers in cancer immunotherapy: can we realize their predictive potential? J. Immunother. Cancer. 2019;7:325. doi: 10.1186/s40425-019-0799-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Mann E.R., Menon M., Knight S.B., Konkel J.E., Jagger C., Shaw T.N., Krishnan S., Rattray M., Ustianowski A., Bakerly N.D., et al. Longitudinal immune profiling reveals key myeloid signatures associated with covid-19. Sci. Immunol. 2020;5 doi: 10.1126/sciimmunol.abd6197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Xiong S., Zhu D., Liang B., Li M., Pan W., He J., Wang H., Sutter K., Dittmer U., Lu M., et al. Longitudinal characterization of phenotypic profile of t cells in chronic hepatitis b identifies immune markers associated with hbsag loss. EBioMedicine. 2021;69 doi: 10.1016/j.ebiom.2021.103464. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Hiam-Galvez K.J., Allen B.M., Spitzer M.H. Systemic immunity in cancer. Nat. Rev. Cancer. 2021;21:345–359. doi: 10.1038/s41568-021-00347-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Sasaki T., Bracero S., Keegan J., Chen L., Cao Y., Stevens E., Qu Y., Wang G., Nguyen J., Sparks J.A., et al. Longitudinal immune cell profiling in patients with early systemic lupus erythematosus. Arthritis Rheumatol. 2022;74:1808–1821. doi: 10.1002/art.42248. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Leung E.L.-H., Li R.-Z., Fan X.-X., Wang L.Y., Wang Y., Jiang Z., Huang J., Pan H.-D., Fan Y., Xu H., et al. Longitudinal high-dimensional analysis identifies immune features associating with response to anti-pd-1 immunotherapy. Nat. Commun. 2023;14:5115. doi: 10.1038/s41467-023-40631-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.McInnes L., Healy J., Saul N., Großberger L. Umap: Uniform manifold approximation and projection. J. Open Source Softw. 2018;3:861. [Google Scholar]
- 48.van der Maaten L., Hinton G. Visualizing data using t-sne. J. Mach. Learn. Res. 2008;9:2579–2605. [Google Scholar]
- 49.Yao C., Rich J.B., Tirona K., Bernstein L.J. Intraindividual variability in reaction time before and after neoadjuvant chemotherapy in women diagnosed with breast cancer. Psychooncology. 2017;26:2261–2268. doi: 10.1002/pon.4351. [DOI] [PubMed] [Google Scholar]
- 50.Leete J.C., Zager M.G., Musante C.J., Shtylla B., Qiao W. Sources of inter-individual variability leading to significant changes in anti-pd-1 and anti-pd-l1 efficacy identified in mouse tumor models using a qsp framework. Front. Pharmacol. 2022;13 doi: 10.3389/fphar.2022.1056365. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Rasmussen C.E., Williams C.K.I. The MIT Press; 2005. Gaussian Processes for Machine Learning. [DOI] [Google Scholar]
- 52.Wilson A.G., Hu Z., Salakhutdinov R., Xing E.P. In: Proceedings of the International Conference on Neural Information Processing Systems. Lee D., Sugiyama M., Luxburg U., Guyon I., Garnett R., editors. Curran Associates Inc; Red Hook, NY, USA: 2016. Stochastic variational deep kernel learning; pp. 2594–2602. [DOI] [Google Scholar]
- 53.Ruder S. An overview of gradient descent optimization algorithms. arXiv. 2016 doi: 10.48550/arXiv.1609.04747. Preprint at. [DOI] [Google Scholar]
- 54.Kingma D., Ba J. In: Proceedings of the 3rd International Conference on Learning Representations. Bengio Y., LeCun Y., editors. ICLR; 2015. Adam: A method for stochastic optimization. [Google Scholar]
- 55.Xi N.M., Li J.J. Exploring the optimization of autoencoder design for imputing single-cell rna sequencing data. Comput. Struct. Biotechnol. J. 2023;21:4079–4095. doi: 10.1016/j.csbj.2023.07.041. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Yang X., Zheng Y., Han Z., Zhang X. Functions and clinical significance of KLRG1 in the development of lung adenocarcinoma and immunotherapy. BMC Cancer. 2021;21:752. doi: 10.1186/s12885-021-08510-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Toor S.M., Sasidharan Nair V., Pfister G., Elkord E. Effect of pembrolizumab on CD4+ CD25+ , CD4+ LAP+ and CD4+ TIM-3+ T cell subsets. Clin. Exp. Immunol. 2019;196:345–352. doi: 10.1111/cei.13264. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Remmerswaal E.B.M., Hombrink P., Nota B., Pircher H., Ten Berge I.J.M., van Lier R.A.W., van Aalderen M.C. Expression of IL-7Rα and KLRG1 defines functionally distinct CD8+ t-cell populations in humans. Eur. J. Immunol. 2019;49:694–708. doi: 10.1002/eji.201847897. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Strowig T., Brilot F., Münz C. Noncytotoxic functions of NK cells: direct pathogen restriction and assistance to adaptive immunity. J. Immunol. 2008;180:7785–7791. doi: 10.4049/jimmunol.180.12.7785. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Al Moussawy M., Abdelsamed H.A. Non-cytotoxic functions of CD8 T cells: “repentance of a serial killer”. Front. Immunol. 2022;13 doi: 10.3389/fimmu.2022.1001129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Weigelin B., den Boer A.T., Wagena E., Broen K., Dolstra H., de Boer R.J., Figdor C.G., Textor J., Friedl P. Cytotoxic T cells are able to efficiently eliminate cancer cells by additive cytotoxicity. Nat. Commun. 2021;12:5217. doi: 10.1038/s41467-021-25282-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Cho B.C., Yoh K., Perets R., Nagrial A., Spigel D.R., Gutierrez M., Kim D.-W., Kotasek D., Rasco D., Niu J., et al. Anti–cytotoxic t-lymphocyte–associated antigen-4 monoclonal antibody quavonlimab in combination with pembrolizumab: Safety and efficacy from a phase i study in previously treated extensive-stage small cell lung cancer. Lung Cancer. 2021;159:162–170. doi: 10.1016/j.lungcan.2021.07.009. [DOI] [PubMed] [Google Scholar]
- 63.Luoma A.M., Suo S., Wang Y., Gunasti L., Porter C.B.M., Nabilsi N., Tadros J., Ferretti A.P., Liao S., Gurer C., et al. Tissue-resident memory and circulating T cells are early responders to pre-surgical cancer immunotherapy. Cell. 2022;185:2918–2935.e29. doi: 10.1016/j.cell.2022.06.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Wu P., Zhao L., Chen Y., Xin Z., Lin M., Hao Z., Chen X., Chen D., Wu D., Chai Y. CD38 identifies pre-activated CD8+ T cells which can be reinvigorated by anti-PD-1 blockade in human lung cancer. Cancer Immunol. Immunother. 2021;70:3603–3616. doi: 10.1007/s00262-021-02949-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Dunham R.M., Cervasi B., Brenchley J.M., Albrecht H., Weintrob A., Sumpter B., Engram J., Gordon S., Klatt N.R., Frank I., et al. CD127 and CD25 expression defines CD4+ T cell subsets that are differentially depleted during HIV infection. J. Immunol. 2008;180:5582–5592. doi: 10.4049/jimmunol.180.8.5582. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Wald O., Weiss I.D., Galun E., Peled A. Chemokines in hepatitis c virus infection: Pathogenesis, prognosis and therapeutics. Cytokine. 2007;39:50–62. doi: 10.1016/j.cyto.2007.05.013. [DOI] [PubMed] [Google Scholar]
- 67.Cruz-Tapias P., Castiblanco J., Anaya J.-M. El Rosario University Press; 2013. Major Histocompatibility Complex: Antigen Processing and Presentation. [Google Scholar]
- 68.Weber L.M., Nowicka M., Soneson C., Robinson M.D. diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering. Commun. Biol. 2019;2:183. doi: 10.1038/s42003-019-0415-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Benjamini Y., Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 2001;29:1165–1188. [Google Scholar]
- 70.Efron B. Correlation and large-scale simultaneous significance testing. J. Am. Stat. Assoc. 2007;102:93–103. doi: 10.1198/016214506000001211. [DOI] [Google Scholar]
- 71.Blanchard G., Roquain E. Adaptive false discovery rate control under independence and dependence. J. Mach. Learn. Res. 2009;10:2837–2871. [Google Scholar]
- 72.Goeman J.J., Solari A. Multiple hypothesis testing in genomics. Stat. Med. 2014;33:1946–1978. doi: 10.1002/sim.6082. [DOI] [PubMed] [Google Scholar]
- 73.Ishiyama K., Arakawa-Hoyt J., Aguilar O.A., Damm I., Towfighi P., Sigdel T., Tamaki S., Babdor J., Spitzer M.H., Reed E.F., et al. Mass cytometry reveals single-cell kinetics of cytotoxic lymphocyte evolution in CMV-infected renal transplant patients. Proc. Natl. Acad. Sci. USA. 2022;119 doi: 10.1073/pnas.2116588119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Rasmussen C.E., Williams C.K.I. Adaptive Computation and Machine Learning. MIT Press; 2006. Gaussian processes for machine learning. [Google Scholar]
- 75.Murphy K.P. MIT press; 2012. Machine Learning: A Probabilistic Perspective. [Google Scholar]
- 76.Cheng L., Ramchandran S., Vatanen T., Lietzén N., Lahesmaa R., Vehtari A., Lähdesmäki H. An additive gaussian process regression model for interpretable non-parametric analysis of longitudinal data. Nat. Commun. 2019;10:1798. doi: 10.1038/s41467-019-09785-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Maddison C.J., Mnih A., Teh Y.W. Proceedings of the 5th International Conference on Learning Representations. ICLR; 2017. The concrete distribution: A continuous relaxation of discrete random variables. [Google Scholar]
- 78.Thrun S. Monte Carlo POMDPs. In: Proceedings of the International Conference on Neural Information Processing Systems. 1999;12. [Google Scholar]
- 79.Wilk A.J., Lee M.J., Wei B., Parks B., Pi R., Martínez-Colón G.J., Ranganath T., Zhao N.Q., Taylor S., Becker W., et al. Multi-omic profiling reveals widespread dysregulation of innate immunity and hematopoiesis in COVID-19. J. Exp. Med. 2021;218 doi: 10.1084/jem.20210582. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Furman D., Jojic V., Kidd B., Shen-Orr S., Price J., Jarrell J., Tse T., Huang H., Lund P., Maecker H.T., et al. Apoptosis and other immune biomarkers predict influenza vaccine responsiveness. Mol. Syst. Biol. 2013;9:659. doi: 10.1038/msb.2013.15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Aghaeepour N., Finak G., FlowCAP Consortium, DREAM Consortium. Hoos H., Mosmann T.R., Brinkman R., Gottardo R., Scheuermann R.H. Critical assessment of automated flow cytometry data analysis techniques. Nat. Methods. 2013;10:228–238. doi: 10.1038/nmeth.2365. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Ren X., Wen W., Fan X., Hou W., Su B., Cai P., Li J., Liu Y., Tang F., Zhang F., et al. COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas. Cell. 2021;184:1895–1913.e19. doi: 10.1016/j.cell.2021.01.053. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Melsen J.E., van Ostaijen-ten Dam M.M., Lankester A.C., Schilham M.W., van den Akker E.B. A Comprehensive Workflow for Applying Single-Cell Clustering and Pseudotime Analysis to Flow Cytometry Data. J. Immunol. 2020;205:864–871. doi: 10.4049/jimmunol.1901530. [DOI] [PubMed] [Google Scholar]
- 84.Wolf F.A., Angerer P., Theis F.J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:15. doi: 10.1186/s13059-017-1382-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Luecken M.D., Theis F.J. Current best practices in single-cell rna-seq analysis: a tutorial. Mol. Syst. Biol. 2019;15 doi: 10.15252/msb.20188746. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Hafemeister C., Satija R. Normalization and variance stabilization of single-cell rna-seq data using regularized negative binomial regression. Genome Biol. 2019;20 doi: 10.1186/s13059-019-1874-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L.U., Polosukhin I. In: Guyon I., Luxburg U.V., Bengio S., Wallach H., Fergus R., Vishwanathan S., Garnett R., editors. Vol. 30. Curran Associates, Inc; 2017. Attention is all you need. (Advances in Neural Information Processing Systems). [Google Scholar]
- 88.Zhang, J., Sun, L., Ready, N. E., Guo, W., and Lin, L. (2025). Top1501 dataset and analysis for the paper “cytoGPNet: Enhancing clinical outcome prediction accuracy using longitudinal cytometry data in small cohort studies”. Zenodo. doi: 10.5281/zenodo.14999103. [DOI]
- 89.Zhang, J., Sun, L., Ready, N. E., Guo, W., and Lin, L. (2025). Python code for the paper “cytoGPNet: Enhancing clinical outcome prediction accuracy using longitudinal cytometry data in small cohort studies”. Zenodo. doi: 10.5281/zenodo.15258365. [DOI]