Summary
The application and evaluation of single-cell foundation models (scFMs) present significant challenges due to heterogeneous architectures and coding standards. To address this, we introduce BioLLM (biological large language model), a unified framework for integrating and applying scFMs to single-cell RNA sequencing analysis. BioLLM provides a unified interface that integrates diverse scFMs, eliminating architectural and coding inconsistencies to enable streamlined model access. With standardized APIs and comprehensive documentation, BioLLM supports seamless model switching and consistent benchmarking. Our comprehensive evaluation of scFMs revealed distinct strengths and limitations, highlighting scGPT’s robust performance across all tasks in both zero-shot and fine-tuning settings. Geneformer and scFoundation demonstrated strong capabilities in gene-level tasks, benefiting from effective pretraining strategies. In contrast, scBERT lagged behind, likely due to its smaller model size and limited training data. Ultimately, BioLLM aims to empower the scientific community to leverage the full potential of foundational models, advancing our understanding of complex biological systems through enhanced single-cell analysis.
Keywords: single-cell foundation models, unified framework, model benchmarking, zero-shot and fine-tuning
Highlights
• Unified interface for diverse single-cell foundational models
• Standardized APIs enable seamless model integration and evaluation
• Zero-shot and fine-tuning support for benchmarking tasks
• Reveals performance trade-offs across leading scFM architectures
The bigger picture
Advanced computer tools are helping scientists identify patterns in large datasets of gene activity data and gain a deeper understanding of complex biology. These tools, however, can be challenging to use because they vary widely and lack standardized methods for evaluating their performance. BioLLM (biological large language model), a unified system, simplifies the process of using, comparing, and improving these models. With clear instructions and built-in tests, it enables researchers to work more efficiently and uncover valuable insights.
BioLLM is a unified framework that simplifies the use of single-cell foundational models by bridging diverse architectures through standardized APIs and evaluation protocols. It enables seamless integration, model switching, and benchmarking in both zero-shot and fine-tuned settings. Through a comprehensive comparison of leading models, BioLLM reveals key performance differences and practical trade-offs. This work advances the accessibility, usability, and reproducibility of foundational models in single-cell transcriptomics.
Introduction
Single-cell RNA sequencing (scRNA-seq) has revolutionized molecular biology by enabling high-resolution transcriptome profiling, offering new insights into the complexity of biological systems.1,2,3,4 However, as vast amounts of single-cell data accumulate, effectively mining and extracting key features from these datasets pose significant challenges in biological research.5,6,7,8,9 The advancement of deep learning, particularly through foundation models,10,11,12,13,14,15 presents substantial potential in addressing these challenges. Foundation models are AI systems trained on extensive and diverse datasets without reliance on human annotations, enabling them to capture complex patterns and adapt to a wide range of tasks with significantly fewer data compared to models trained from scratch.16,17,18 Central to this innovation are transformer19 architectures, exemplified by models such as BERT20 and GPT-4,21 which have transformed the landscape of machine learning. The inherent flexibility and computational efficiency of transformers make them exceptionally well suited for the intricate task of mining single-cell data, thereby facilitating the extraction of meaningful biological insights from complex cellular landscapes.22,23,24,25,26
Leveraging the potential of foundation models to address the complexities of single-cell data analysis, several models, such as scBERT,10 Geneformer,11 scGPT,12 and scFoundation,13 have been developed to tackle specific challenges in this field. However, these models demonstrate both commonalities and distinctions in their architectural design and pretraining strategies, accompanied by differences in dataset size and parameter count. For example, scBERT employs a bidirectional transformer trained by masked language modeling and incorporates gene2vec embeddings27 to represent gene identities,10 while scGPT employs an autoregressive training strategy with flash-attention blocks28 and random gene identity embeddings, focusing on generating summaries for each cell.12 Given these differences, the performance of each model can vary significantly across various downstream tasks, such as batch-effect correction and cell-type classification. Therefore, it is essential to evaluate these models systematically to determine which performs best in specific contexts.
Moreover, the varying levels of code accessibility and documentation across these single-cell foundation models (scFMs) create significant challenges for their unified implementation. While some models, such as Geneformer and scGPT, provide extensive documentation and well-structured open-source repositories that facilitate easy integration and customization, others may lack comprehensive guidelines or present less user-friendly implementations. This inconsistency can hinder researchers from effectively leveraging multiple models, as differing frameworks and coding standards lead to compatibility issues. Consequently, integrating these diverse models into a single analytical pipeline becomes challenging, limiting the ability to conduct comparative analyses across multiple tasks simultaneously. This lack of standardization not only affects the usability of the models but also complicates reproducibility and collaboration within the research community. To fully realize the potential of foundation models in scRNA-seq, it is essential to advocate for a standardized framework for model usage and integration. Such a framework would enable researchers to seamlessly call upon various models in a unified context, thereby enhancing the reproducibility of research findings and fostering greater collaboration.
Here, we introduce BioLLM (biological large language model), a standardized framework aimed at facilitating the integration and utilization of large language models specifically for scRNA-seq analyses. A core feature of BioLLM is its ability to seamlessly incorporate various scFMs for downstream analysis through a cohesive interface, allowing researchers to access different models regardless of their architectural differences or coding standards. The framework supports benchmarking of scFMs, providing critical insights that help in selecting the most suitable models for specific tasks. With standardized APIs and comprehensive documentation, BioLLM facilitates easy model switching and comparative analyses while also incorporating best practices for model evaluation to ensure consistent performance assessment across various tasks. This advancement significantly enhances the quality and reliability of bioinformatics analyses in scRNA-seq.
Results
BioLLM provides a unified framework for scalable scFM analysis
scFMs represent a breakthrough in cellular heterogeneity analysis,29 yet their widespread utilization faces three critical challenges: inconsistent preprocessing pipelines, heterogeneous model interfaces, and non-standardized evaluation metrics. To address these limitations, we developed BioLLM, a unified framework that standardizes the deployment of scFMs through three integrated modules (Figure 1; Table S1). The first module implements a decision-tree-based preprocessing interface that establishes rigorous quality control standards for input data (Figure S1A). The BioTask executor functions as the central analytical engine of the framework, implementing a systematic workflow that progresses through five stages: configuration parsing, model initialization, data preprocessing, data-loader construction, and task execution. This sophisticated pipeline facilitates both zero-shot inference via cell or gene embeddings and targeted model fine-tuning for specialized applications, including cell-type annotation and drug response prediction. At the core of BioLLM lies its foundation model loader, which provides a unified interface for seamlessly integrating prominent scFMs such as scBERT, Geneformer, scFoundation, and scGPT (Figure S1B). This standardized approach enables systematic deployment and comparative evaluation of multiple foundation models within a consistent analytical framework. The third module complements this architecture by implementing comprehensive performance metrics that assess three crucial aspects: embedding quality through silhouette scores, biological fidelity through gene regulatory network (GRN) analysis, and prediction accuracy through standard classification metrics.
Figure 1.
BioLLM framework for single-cell data analysis
The BioLLM framework consists of three components: entries, BioTask executor, and evaluation. Entries include the input dataset, configuration file, and pretrained model. The BioTask executor processes tasks through five steps: configuration parsing, model initialization, data preprocessing, data-loader construction, and task execution (including zero-shot and fine-tuning tasks). Evaluation involves cell embedding, GRN analysis, cell-type annotation, and drug response prediction.
Through this integrated approach, BioLLM advances the field by providing a standardized, reproducible framework for large-scale single-cell data analysis across multiple foundation models, addressing a critical need in single-cell genomics research.
BioLLM supports a comprehensive evaluation of the cell representation capacity of scFMs
scFMs leverage extensive training datasets to learn and generate cell embeddings, effectively transforming potentially noisy gene expression data into a biologically meaningful latent space.30 We evaluated the performance of these models in zero-shot settings by assessing the quality of cell embeddings in both individual dataset and joint dataset contexts (Table S2), utilizing average silhouette width (ASW) as the evaluation metric. ASW measures how similar an object is to its own cluster; high values reflect high-quality embeddings that capture biological differences, while low values suggest poor differentiation and potential quality issues. Our initial evaluations comprised four distinct individual datasets to confirm the biological relevance of the zero-shot cell embeddings. The results demonstrated that scGPT consistently outperformed other models (Figure 2A). Uniform manifold approximation and projection (UMAP) visualizations further revealed that scGPT achieved superior separation of cell types compared to other foundational models (Figure S2). This advantage can be attributed to scGPT’s capacity to capture complex cellular features, thereby enhancing separability. Its architecture is particularly proficient at preserving biologically relevant information, rendering it more effective for clustering tasks.
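As a concrete illustration of the metric, the ASW used throughout this evaluation can be computed directly from its definition. The snippet below is a plain-NumPy sketch of the standard silhouette formula applied to cell embeddings; BioLLM's implementation may additionally rescale scores, so this is illustrative rather than the exact code.

```python
import numpy as np

def average_silhouette_width(X, labels):
    """Average silhouette width over all cells.

    X: (n_cells, n_dims) embedding matrix; labels: per-cell cell-type ids.
    Plain-NumPy rendering of the standard silhouette formula.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # full pairwise Euclidean distance matrix between cells
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    classes = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():                    # singleton cluster scores 0
            continue
        a = dist[i, same].mean()              # mean intra-cluster distance
        b = min(dist[i, labels == c].mean()   # nearest other cluster
                for c in classes if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())
```

Well-separated cell types push the score toward 1, while interleaved or poorly resolved embeddings drive it toward 0 or below.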
Figure 2.
Evaluation of cell representation by foundational models
(A) Circular bar plot of ASW scores for scBERT, Geneformer, scGPT, and scFoundation across multiple datasets (Zheng68K, liver, kidney, and blood).
(B) Summary table of ASW scores for cell type and batch correction across three datasets (humanDC, hPancreas, and hPBMC).
(C) 3D surface plots of ASW variation with the number of highly variable genes (HVGs) from 500 to 3,000.
(D) Line charts of running time (top) and GPU memory usage (bottom) for cell embeddings with varying gene lengths.
Batch effects pose significant challenges in single-cell datasets, potentially compromising accurate interpretations. Consequently, a primary objective in single-cell analysis is to mitigate batch effects while preserving essential biological distinctions, thereby ensuring effective data integration.31,32 We evaluated the batch-effect-removal capabilities of scFMs in zero-shot cell embedding tasks using three joint datasets characterized by varying degrees of batch effects. ASW scores, incorporating both cell-type and batch information, were analyzed. Notably, scGPT outperformed the other models across both metrics, yielding superior results compared to principal-component analysis (PCA), while the other models performed worse than PCA (Figure 2B). UMAP visualizations demonstrated that while scGPT effectively integrated cells of the same type under consistent experimental conditions, it generally struggled to correct for batch effects across different technologies, whereas Geneformer and scFoundation distinguished certain cell types, but scBERT exhibited particularly poor performance (Figure S3).
We further investigated the impact of varying gene input lengths on the cell embeddings generated by each foundation model (Figure 2C). The results indicated that as the input sequence length increased, scGPT embeddings represented true biological features more accurately, suggesting that longer input sequences enable scGPT to capture richer information. In contrast, Geneformer and scFoundation exhibited a slight negative correlation between input length and embedding quality in some datasets, although the overall changes were minimal. Notably, scBERT’s performance declined as input sequence length increased across most datasets, potentially due to its difficulty in learning meaningful cell features, which led to more inconsistent embeddings. Additionally, we assessed the computational efficiency and resource usage associated with generating cell embeddings across the models (Figure 2D). Both scGPT and Geneformer demonstrated superior efficiency in terms of memory usage and computational time compared to scBERT and scFoundation, underscoring their practicality for large-scale analyses.
In addition to the existing zero-shot setting, we implemented fine-tuned cell embedding extraction. These embeddings are derived from supervised training on cell-type labels, and our findings indicate that this approach significantly enhances performance (Figures S4A and S4B). Specifically, supervised fine-tuning proved highly effective for both cell embedding extraction and batch-effect correction (Figures S4C and S4D). These results highlight the importance of incorporating fine-tuning techniques to optimize the accuracy and reliability of cell embeddings, ultimately contributing to more precise biological interpretations.
Overall, our evaluations demonstrate that BioLLM serves effectively as a comprehensive framework for assessing cell embeddings derived from various scFMs. While scGPT excels in generating biologically relevant embeddings and accurately distinguishing between cell types, it faces challenges in handling batch effects. Conversely, models like Geneformer and scFoundation demonstrate competitive performance, whereas scBERT shows notable deficiencies in this area. The incorporation of fine-tuning across these models significantly enhances their performance, underscoring the essential role of fine-tuning in achieving accurate single-cell analysis.
BioLLM facilitates in-depth analysis of GRNs across scFMs
Zero-shot gene-level evaluation is essential for benchmarking foundational models. BioLLM leverages gene-level embeddings to construct GRNs, enhancing our understanding of gene interactions and regulatory mechanisms (Figure 3A). The biological significance of the inferred GRNs was assessed by examining their ability to delineate regulatory pathways and clarify functional relationships among genes, particularly through the lens of Gene Ontology (GO) pathways, which encompass biological processes (BPs), molecular functions (MFs), and cellular components (CCs). The results indicated that scGPT, scFoundation, and Geneformer consistently identified a greater number of enriched pathways across all clustering resolutions compared to scBERT, with notable performance at lower resolutions for gene module identification (Figure 3B). Focusing on core gene regulatory modules, the analysis of networks targeting human leukocyte antigen-DR alpha (HLA-DRA) revealed that both scGPT and Geneformer effectively grouped HLA family genes, such as HLA-DRB5, HLA-DPA1, HLA-DQB1, HLA-DMB, HLA-DQA1, and HLA-DRB1, demonstrating a higher degree of interaction (Figure 3C).
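The GRN pipeline in Figure 3A can be sketched in a few lines: cosine similarities between gene embeddings define a thresholded adjacency matrix, whose communities form candidate gene modules. The snippet below is an illustrative stand-in: BioLLM applies resolution-based community detection, whereas here connected components over the thresholded graph serve as a minimal, dependency-free substitute, and the threshold value is an assumption.

```python
import numpy as np

def gene_modules(embeddings, gene_names, sim_threshold=0.8):
    """Cosine similarity -> thresholded adjacency -> gene modules."""
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    sim = E @ E.T                                     # cosine similarity
    adj = sim >= sim_threshold                        # adjacency matrix
    np.fill_diagonal(adj, False)

    # union-find over edges: connected components stand in for
    # the resolution-based community detection used in the paper
    parent = list(range(len(gene_names)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in zip(*np.nonzero(adj)):
        parent[find(i)] = find(j)

    modules = {}
    for idx, g in enumerate(gene_names):
        modules.setdefault(find(idx), []).append(g)
    return list(modules.values())
```

Each recovered module can then be passed to a GO enrichment tool, matching the community detection and enrichment steps described for Figure 3A.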
Figure 3.
GRN analysis
(A) Overview of GRN construction: gene embeddings are used to compute similarities and build the adjacency matrix, followed by community detection and GO enrichment analysis.
(B) Line plots of enriched GO pathways for each model at different resolution settings, indicating biological process, molecular function, and cellular component enrichment.
(C) Network visualizations of HLA-DRA regulation for different foundational models, showing regulatory relationships among genes. The edge weight values represent the cosine similarity between gene pairs, indicating their similarity degree. The color gradient represents similarity scores, with deeper blue indicating higher similarity and lighter blue indicating lower similarity between gene pairs.
Collectively, these findings underscore the efficacy of BioLLM in utilizing gene-level embeddings to construct informative GRNs, thereby facilitating the identification of critical gene interactions and regulatory pathways. The enhanced performance of scGPT, scFoundation, and Geneformer in GRN construction emphasizes their potential to yield valuable biological insights into gene regulatory mechanisms.
BioLLM allows for comparative performance evaluation of cell annotation tasks among scFMs
As cell annotation is a crucial aspect of single-cell analysis, BioLLM incorporated this task to evaluate the performance of scFMs across 13 datasets from diverse tissues (Table S2). The performance of these models was benchmarked against three established annotation methods: SingleR,33 CellTypist,34 and scANVI.35 Four classification metrics were employed to rigorously assess model efficacy: accuracy, precision, recall, and macro F1 score. The results indicated that scGPT outperformed all other foundational models, followed by Geneformer, scBERT, and scFoundation (Figures 4A and S6). Compared to traditional annotation tools, scGPT consistently demonstrated superior performance, although Geneformer achieved a slightly lower F1 score than both CellTypist and SingleR. Additionally, the logistic regression-based CellTypist outperformed both scBERT and scFoundation overall. Notably, in the context of rare cell-type identification, scGPT exhibited greater capability than the other scFMs (Figures 4B and 4C). To simulate real-world scenarios, two datasets were allocated for cross-dataset evaluation, reflecting conditions where query data often suffer from batch effects relative to reference data (Figure S6A). In this scenario, scGPT also exhibited superior annotation performance (Figures S6B and S6C).
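The four classification metrics can be reproduced from the confusion counts alone. The NumPy sketch below mirrors sklearn's accuracy_score and macro-averaged precision/recall/F1 (with zero-division classes scored as 0):

```python
import numpy as np

def annotation_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    prec, rec, f1 = [], [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return {"accuracy": float(np.mean(y_true == y_pred)),
            "precision": float(np.mean(prec)),
            "recall": float(np.mean(rec)),
            "macro_f1": float(np.mean(f1))}
```

Because the macro average weights every class equally, rare cell types contribute as much as abundant ones, which is why macro F1 is informative for the rare-type comparison in Figures 4B and 4C.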
Figure 4.
Cell-type annotation evaluation
(A) Overview of cell-type annotation performance for four foundational models and three traditional methods across 13 datasets using accuracy and macro F1.
(B) t-Distributed stochastic neighbor embedding (t-SNE) plots of cells from the Zheng68K dataset, showing ground-truth and model predictions.
(C) Confusion matrices for Zheng68K dataset annotations by each model, comparing true and predicted cell types.
(D) Line plots showing how learning rate, number of HVGs, and epochs influence annotation accuracy.
(E) Bar charts of GPU memory usage (left) and line charts of running time (right) for cell annotation tasks.
To evaluate the impact of model adaptation, we systematically compared zero-shot and fine-tuned variants of scGPT and Geneformer across two representative datasets, COVID-19 and Lung-Kim (Figure S7). Fine-tuning led to consistent performance gains across all evaluated metrics, with the most pronounced improvements observed in precision and macro F1 scores (Figures S7A and S7C). A cell-type confusion matrix further demonstrated enhanced classification fidelity in fine-tuned models, as indicated by stronger diagonal dominance and reduced off-diagonal noise across annotated populations in the COVID-19 dataset (Figure S7B) and the Lung-Kim dataset (Figure S7D). While both models maintained high overall accuracy, fine-tuning conferred marginal but systematic improvements. These results highlight the importance of task-specific adaptation in boosting the robustness and resolution of scFM-based cell-type annotation.
The impact of various hyperparameter settings on the performance of scFMs was also evaluated (Figure 4D). The results indicated that a lower learning rate and an increased number of training epochs generally improved model performance. Additionally, increasing the input gene sequence length positively influenced the annotation performance of scGPT, Geneformer, and scFoundation, while it had a minimal effect on scBERT. Furthermore, we assessed the time required for the annotation task as the number of cells increased (Figure 4E). Geneformer achieved the shortest runtime and lowest GPU consumption, efficiently annotating 100,000 cells in under 1 h, followed by scGPT. In contrast, scBERT required the longest time for the annotation process.
Taken together, the results highlight the effectiveness of BioLLM in utilizing scFMs for cell annotation tasks. Among these models, scGPT emerged as the preeminent model, demonstrating enhanced performance across multiple metrics and excelling in the identification of rare cell types.
BioLLM enables seamless integration of scFMs and specific bioinformatics tools
To further investigate the capabilities of BioLLM, we aimed to determine whether integrating external bioinformatics tools during the fine-tuning of scFMs could enhance the framework’s applicability. In this study, we specifically replaced the transcriptomic feature extraction network in DeepCDR36 with four scFMs, utilizing their embeddings in subsequent DeepCDR network modules (Figure 5A). This integration was designed to predict the half-maximal inhibitory concentration (IC50) values of various drugs across multiple cell line datasets.
Figure 5.
Cancer drug response prediction using foundational models
(A) Schematic of drug response prediction workflow combining gene expression, drug molecular structure, and foundational models.
(B) Bar plot of PCC and SRCC for foundational models compared to DeepCDR.
(C) Scatterplots comparing PCC for all drugs between DeepCDR and foundational models.
(D) Scatterplots comparing PCC and SRCC for all cancer types, highlighting improved accuracy for foundational models like Geneformer and scGPT.
(E) Bar charts of GPU memory usage (left) and running time (right) for cell embedding drug response prediction task.
We assessed the performance of these scFMs in comparison to the traditional DeepCDR model across various drugs and cell lines using the Pearson correlation coefficient (PCC) and Spearman’s rank correlation coefficient (SRCC) as key performance metrics (Figures 5B and 5C). The results indicated that replacing the gene expression feature extraction in DeepCDR with scFMs generally led to improved performance, except for scBERT, which did not exhibit any significant enhancement. Notably, Geneformer and scGPT achieved the highest performance, producing comparable results, followed by scFoundation. Across all cancer types examined, embeddings from Geneformer and scGPT consistently yielded higher PCC and SRCC values (Figure 5D). Additionally, Geneformer exhibited the shortest runtime and lowest GPU consumption for this task, with scGPT and scFoundation following closely behind (Figure 5E).
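Both correlation metrics used here are straightforward to compute from predicted and observed IC50 values. The sketch below mirrors scipy.stats' pearsonr and spearmanr (the rank step ignores ties, which SciPy averages):

```python
import numpy as np

def pcc_srcc(pred, true):
    """Pearson (PCC) and Spearman rank (SRCC) correlation coefficients."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    pcc = float(np.corrcoef(pred, true)[0, 1])
    # argsort-of-argsort yields 0-based ranks for tie-free vectors
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    srcc = float(np.corrcoef(rank(pred), rank(true))[0, 1])
    return pcc, srcc
```

SRCC is insensitive to monotone rescaling of the predictions, whereas PCC rewards linear agreement, which is why the benchmark reports both.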
In conclusion, BioLLM facilitates the seamless integration of scFMs with external bioinformatics tools, thereby enhancing biologically relevant discoveries. The evaluation results underscore the effectiveness of scFMs in improving predictive performance for cancer drug response.
Discussion
In this study, we introduced BioLLM, an integrative tool that consolidates multiple scFMs into a cohesive framework, thereby enhancing the analysis of scRNA-seq data. BioLLM facilitates the streamlined selection and application of various models, specifically scBERT, Geneformer, scGPT, and scFoundation, enabling researchers to efficiently navigate the complexities inherent in single-cell genomics. By supporting both zero-shot and fine-tuning tasks, BioLLM not only optimizes research workflows but also ensures high-quality, reproducible outcomes in downstream analyses (Table S3). This is particularly critical given the urgent need for standardized methodologies to manage the increasing volume and complexity of single-cell data.
Our comprehensive evaluation of the foundational models revealed distinct strengths and limitations that warrant further discussion (Figure 6). Among all benchmarked scFMs, scGPT demonstrated robust performance across all downstream tasks, excelling in both zero-shot and fine-tuning scenarios. This suggests that its generative pretraining is particularly effective for synthesizing biological insights from complex datasets. However, its ability to mitigate batch effects was found to be suboptimal, likely due to insufficient incorporation of batch-related information during pretraining. This limitation underscores the critical need to integrate batch-specific variability into model training to enhance performance in practical applications.37
Figure 6.
Summary table for scFM benchmarking
Overview of the characteristics and performance of four scFMs: scBERT, Geneformer, scGPT, and scFoundation. The table summarizes key metrics, including the number of cells, tissues/organs, and species and the gene vocabulary size for each model. Architectural details are provided, highlighting model size and encoder/decoder types. Performance is evaluated through masked value prediction and read-level value prediction tasks. Computational resource requirements, such as GPU memory, are noted, along with the availability of open-source implementations. This analysis reveals distinct strengths and weaknesses among the models, informing their potential applications in biological data interpretation.
Notably, Geneformer and scFoundation demonstrated strong capabilities in gene-level tasks, with larger model parameters significantly enhancing their performance. Geneformer’s effective pretraining strategy, which focuses on the ordering of gene sequences and the prediction of gene IDs, significantly enhances its understanding of gene-gene interactions. This targeted approach appears to confer advantages in tasks that require nuanced knowledge of molecular relationships.38 In contrast, scBERT demonstrated considerable underperformance compared to other scFMs. This may be attributed to its pretraining strategy, which focuses on encoding full-length gene sequences, along with limited parameters and insufficient training data. These factors hinder its ability to accurately represent relationships at both the cellular and gene levels. The architectural differences among these models, as highlighted in Figure 6, further illuminate their performance disparities. The findings indicate a critical need for reevaluating the architectures employed in these models and suggest that refining pretraining methodologies could substantially improve their efficacy across tasks.
These findings highlight the necessity for ongoing refinement in the design and training of scFMs to overcome the limitations identified in our evaluations. Future research should focus on enhancing model specificity and generalization, particularly across diverse datasets and biological contexts, to ensure that these models can be applied effectively in a variety of experimental scenarios. Notably, emerging models, such as CellPLM,39 which encodes cell-cell relations, present promising avenues for improving annotation accuracy and biological interpretability (Table S4). As part of this effort, BioLLM has integrated support for CellPLM, further demonstrating the framework’s compatibility with diverse foundational architectures. Additionally, exploring potential synergies between complementary models could yield more comprehensive insights into biological systems at the single-cell level. Such integrative approaches may ultimately lead to the development of hybrid frameworks that capitalize on the unique strengths of individual foundational models, thereby advancing the field of single-cell genomics.40
In summary, BioLLM establishes a standardized framework for integrating scFMs and facilitating biological exploration. This cohesive architecture not only streamlines model selection and application but also promotes consistency and reproducibility in single-cell analysis. By addressing the complexities of single-cell genomics, BioLLM empowers researchers to derive meaningful insights and fosters advancements in our understanding of biological systems at the single-cell level.
Limitations of the study
A key limitation of the current BioLLM framework lies in its exclusive support for scRNA-seq data, restricting its applicability to other single-cell modalities—such as ATAC-seq—or multiome datasets. Furthermore, the heterogeneity in data-loading pipelines and fine-tuning strategies across different scFMs introduces variability that may compromise cross-model comparability and task-level consistency. These challenges underscore the need for a more standardized and extensible architecture in future versions of the framework, enabling broader modality integration and harmonized model interfacing.
Methods
BioLLM framework design and implementation
The BioLLM framework is structured around three core components: foundational architecture, task management, and model-loading interfaces. This design emphasizes modularity, flexibility, and extensibility, facilitating the integration of diverse foundational models and a wide range of downstream tasks for single-cell data analysis.
Framework architecture
BioLLM establishes a standardized infrastructure for managing configurations, loading models, and executing analytical tasks. This architecture promotes seamless interactions between model and task modules, enabling independent operation while ensuring consistent interoperability. Such a feature is critical in the rapidly evolving landscape of single-cell genomics, where adaptability is paramount.
Configuration management
The framework includes a unified configuration management module that allows users to specify parameters via a configuration file or an API. This includes model selection, data preprocessing specifications, and task types (e.g., fine-tuning or zero-shot inference). The configuration manager efficiently parses these parameters and supplies the necessary inputs for model loading and task execution, thereby simplifying the setup process and enhancing user experience while minimizing errors.
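A configuration for such a run might look like the following. The keys shown are assumptions based on the parameters named above (model selection, preprocessing, task type), not BioLLM's actual schema, and the validation step is a minimal illustration of configuration parsing:

```python
# Illustrative BioLLM-style configuration; keys and values are assumptions.
config = {
    "model": "scgpt",                    # which scFM loader to initialize
    "task": "zero-shot",                 # or "fine-tune"
    "checkpoint": "weights/scgpt.pt",    # pretrained weights path
    "preprocess": {"n_hvg": 2000, "normalize": True},
}

def parse_config(cfg):
    """Minimal validate-and-fill step mirroring configuration parsing."""
    required = {"model", "task"}
    missing = required - cfg.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if cfg["task"] not in {"zero-shot", "fine-tune"}:
        raise ValueError(f"unknown task: {cfg['task']}")
    return {"preprocess": {}, **cfg}     # default then user overrides
```

Failing early on malformed configurations is what lets the downstream model-loading and task-execution stages assume well-formed inputs.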
Modular task and model separation
To meet diverse analytical requirements, BioLLM implements a clear separation between task management and model loading. The task management module is responsible for data preprocessing, training, and inference, while the model module is dedicated to initializing and managing models independently. This organization not only enhances code structure but also allows users to customize and extend functionalities with ease.
LoaderBase model management class
To ensure modularity and maintain consistency across model implementations, BioLLM adopts a unified architectural interface for foundational model integration. The LoaderBase class serves as the central abstraction for model management, encapsulating standardized methods such as load_pretrain_model() and get_embedding(). All model-specific loaders inherit from this base class and are restricted to tasks involving model instantiation and embedding extraction, thereby promoting structural consistency, ease of maintenance, and interoperability across scFMs.
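A minimal sketch of this abstraction follows, using the two method names given above; the method bodies and the ScGPTLoader subclass are illustrative placeholders, not the real implementation:

```python
from abc import ABC, abstractmethod

class LoaderBase(ABC):
    """Central abstraction for model management (sketch)."""
    def __init__(self, config):
        self.config = config
        self.model = None

    @abstractmethod
    def load_pretrain_model(self):
        """Instantiate the scFM and load its pretrained weights."""

    @abstractmethod
    def get_embedding(self, data):
        """Return cell or gene embeddings for downstream tasks."""

class ScGPTLoader(LoaderBase):           # hypothetical subclass
    def load_pretrain_model(self):
        # a real loader would build the network and load the checkpoint
        self.model = f"scGPT weights from {self.config['checkpoint']}"
        return self.model

    def get_embedding(self, data):
        # a real loader runs a forward pass; placeholder vectors here
        dim = self.config.get("emb_dim", 4)
        return [[0.0] * dim for _ in data]
```

Restricting subclasses to instantiation and embedding extraction is what keeps every scFM interchangeable behind the same two calls.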
To decouple data preprocessing from model logic, we further introduced a centralized DataHandler base class. This class defines a unified preprocessing pipeline through methods such as read_h5ad() (for loading and preprocessing AnnData objects), process() (for model-specific data transformation), and make_dataset()/make_dataloader() (for generating PyTorch-compatible datasets and loaders). All dataset-specific handlers inherit from this class and are modularized under the dataset/ directory. This design enables flexible yet standardized data handling, facilitating robust cross-model compatibility and ensuring that the embeddings produced are uniformly structured for downstream tasks.
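The DataHandler pipeline can be sketched as below. To keep the example self-contained, the AnnData and PyTorch specifics are replaced by plain-Python stand-ins; method bodies are illustrative assumptions only.

```python
# Sketch of the DataHandler pipeline (read_h5ad -> process -> make_dataset
# -> make_dataloader). A real handler would use anndata and torch objects.
class DataHandler:
    def read_h5ad(self, path: str):
        """Load an expression matrix; a real handler would call anndata.read_h5ad."""
        raise NotImplementedError

    def process(self, matrix):
        """Model-specific transformation; here, a log1p normalization."""
        import math
        return [[math.log1p(v) for v in row] for row in matrix]

    def make_dataset(self, matrix):
        """Wrap processed rows; a real handler would build a torch Dataset."""
        return list(matrix)

    def make_dataloader(self, dataset, batch_size=2):
        """Yield mini-batches; a real handler would return a torch DataLoader."""
        for i in range(0, len(dataset), batch_size):
            yield dataset[i:i + batch_size]

handler = DataHandler()
processed = handler.process([[0.0, 1.0], [3.0, 7.0]])
batches = list(handler.make_dataloader(handler.make_dataset(processed)))
```

A model-specific subclass would override process() (e.g., rank-value encoding for Geneformer) while inheriting the loader construction unchanged.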
BioTask class for task management
The BioTask class is central to managing downstream analytical tasks specific to single-cell data and incorporates several key functionalities.
Configuration parsing
BioTask loads and parses the configuration file to identify essential parameters, including the task type (fine-tuning or zero-shot inference), model selection, and preprocessing requirements. Based on these parameters, BioTask dynamically selects and initializes the appropriate LoaderBase subclass corresponding to the specified foundational model.
Compatibility checks
During model loading, LoaderBase performs rigorous compatibility checks on model parameters and task specifications, ensuring that the selected model is suitable for the intended task type—whether for fine-tuning or zero-shot inference—thus minimizing potential runtime issues.
Data preprocessing and data-loader creation
BioTask processes input data into the AnnData format, widely utilized in single-cell analyses, facilitating organized access to observations (cells) and variables (genes). The class standardizes raw input data to meet model requirements and constructs a data loader to manage large-scale datasets efficiently during execution.
Task execution
The execution logic within the BioTask class is tailored to the analytical task at hand. For fine-tuning tasks, BioTask loads the pretrained model and adjusts its parameters to optimize performance on the target dataset. In contrast, for zero-shot tasks, it leverages the pretrained model to directly extract cell and gene features, generating embeddings suitable for subsequent analyses without retraining.
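The two execution modes can be sketched as a single dispatch function. The function name and signature are illustrative assumptions; real fine-tuning would run an optimizer over the model's parameters rather than the placeholder loop shown here.

```python
# Illustrative sketch of BioTask-style execution: zero-shot extraction
# versus a fine-tuning loop (stubbed out here for brevity).
def run_task(task_type, model, data, n_epochs=20):
    if task_type == "zero_shot":
        # No retraining: directly extract embeddings with the pretrained model.
        return [model(x) for x in data]
    elif task_type == "fine_tune":
        # Stand-in "training" loop; real code would compute a loss and
        # update the model's weights each epoch before inference.
        for _ in range(n_epochs):
            pass
        return [model(x) for x in data]
    raise ValueError(f"unsupported task type: {task_type}")

embeddings = run_task("zero_shot", model=lambda x: x * 2, data=[1, 2, 3])
```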
Extensibility and modular design
BioLLM is designed to facilitate the rapid integration of new tasks and models, ensuring adaptability to emerging research needs.
Adding new task types
Developers can extend BioLLM’s functionality by defining new analytical tasks within the BioTask class. This is accomplished by implementing specific task execution methods, allowing for the seamless incorporation of additional task types without modifying the core framework logic.
Integrating new models
The framework supports the introduction of new foundational models through subclassing the LoaderBase class and implementing required interface methods. This modular design ensures compatibility with existing BioLLM task structures, streamlining the model integration process and encouraging innovation in modeling techniques.
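A common pattern for this kind of integration is a name-to-class registry, sketched below. The registry, decorator, and the minimal base class here are hypothetical illustrations of the subclassing workflow, not BioLLM's actual API.

```python
# Hypothetical sketch of integrating a new foundational model: subclass a
# minimal LoaderBase and register it by name so task code can resolve it.
MODEL_REGISTRY = {}

def register_model(name):
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

class LoaderBase:
    def load_pretrain_model(self): raise NotImplementedError
    def get_embedding(self, batch): raise NotImplementedError

@register_model("my_new_scfm")
class MyNewLoader(LoaderBase):
    def load_pretrain_model(self):
        # Stand-in "model": mean over each cell's gene values.
        self.model = lambda cell: sum(cell) / len(cell)
        return self.model

    def get_embedding(self, batch):
        return [self.model(cell) for cell in batch]

loader = MODEL_REGISTRY["my_new_scfm"]()
loader.load_pretrain_model()
emb = loader.get_embedding([[1.0, 3.0]])
```

Once registered, the new model is selectable from the configuration file like any built-in scFM, with no changes to the task modules.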
Documentation and user guide
Comprehensive documentation accompanies the BioLLM framework, providing users with detailed guidance on configuring and executing tasks. This includes step-by-step instructions for model integration, task execution, and extending the framework to accommodate new analytical tasks. This resource enhances user experience, supporting both novice and experienced users in effectively utilizing BioLLM for advanced single-cell data analysis.
Evaluation of downstream tasks
Cell embedding
The input dataset underwent preprocessing tailored to the specific requirements of each foundational model. For Geneformer and scGPT, which support input sequence lengths of 2,048 and 1,200, respectively, we selected 3,000 highly variable genes as input features. In contrast, the other two foundational models utilized full-length gene sequences without feature selection. The preprocessing steps adhered to each model’s pretraining conditions to ensure optimal performance. Specifically, scBERT, scFoundation, and scGPT required a log1p transformation of the gene expression data, while Geneformer utilized raw counts without normalization. For the generation of cell embeddings, BioLLM supports three methods for scBERT: CLS, mean, and sum. The CLS method utilizes the CLS token embedding as the cell representation, while the mean and sum pooling methods apply mean and sum operations over the token embeddings produced by the model’s encoder. For scGPT, the cell embeddings are derived from the CLS token embedding from the original model. In the case of Geneformer, embeddings are extracted using the model’s native mean_nonpadding_embs function, which computes the mean of non-padded token embeddings. For scFoundation, the final cell embedding is generated through max pooling applied to the token embeddings.
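The four pooling strategies above (CLS token, mean, sum, max) can be sketched on a toy token-embedding matrix. Real models apply these to their encoder outputs and, as with Geneformer's mean_nonpadding_embs, exclude padded positions; the matrix here is a made-up example.

```python
# Toy token-embedding matrix: one row per token, no padding for simplicity.
import numpy as np

tokens = np.array([[1.0, 0.0],   # position 0: CLS token embedding
                   [2.0, 2.0],
                   [4.0, 6.0]])  # remaining rows: gene-token embeddings

cls_emb  = tokens[0]                # CLS strategy (scBERT, scGPT)
mean_emb = tokens[1:].mean(axis=0)  # mean pooling over non-CLS tokens
sum_emb  = tokens[1:].sum(axis=0)   # sum pooling
max_emb  = tokens.max(axis=0)       # max pooling (scFoundation-style)
```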
GRN analysis
The GRN analysis was conducted using the Immune_ALL_human dataset as input, processed through foundational models integrated with the BioLLM framework. The input dataset was preprocessed according to the following quality criteria: (1) genes with at least three counts were retained, and (2) the raw data were subset to 1,200 highly variable genes. The preprocessed dataset was then passed through the foundational model, which generated gene embeddings as output. Following embedding generation, pairwise cosine similarity was computed to construct an adjacency matrix, in which each element quantified the similarity between a pair of genes, highlighting potential regulatory relationships. This adjacency matrix then served as the foundation for constructing the GRN, wherein nodes represented genes and edges depicted their inferred regulatory connections.
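The adjacency-matrix construction can be sketched with plain numpy; the embedding values below are made up for illustration.

```python
# Cosine similarity between gene embeddings: normalize rows to unit length,
# then take the inner product, yielding a gene-by-gene adjacency matrix.
import numpy as np

def cosine_adjacency(gene_embeddings: np.ndarray) -> np.ndarray:
    """Rows are gene embeddings; returns the pairwise similarity matrix."""
    norms = np.linalg.norm(gene_embeddings, axis=1, keepdims=True)
    unit = gene_embeddings / norms
    return unit @ unit.T

emb = np.array([[1.0, 0.0],   # toy embeddings for three "genes"
                [0.0, 1.0],
                [2.0, 0.0]])
adj = cosine_adjacency(emb)
```

Edges of the GRN are then read off the matrix, e.g., by keeping gene pairs whose similarity exceeds a chosen threshold.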
Cell-type annotation
Cell-type annotation was conducted in both intra-dataset and inter-dataset scenarios to assess the performance of the foundational models. Each dataset was divided into training and testing sets at an 8:2 ratio, ensuring a robust evaluation of model performance. The training set was further subdivided into training and validation subsets, also using an 8:2 split. During the preprocessing phase, all datasets retained their complete gene expression profiles without applying any selection criteria for highly variable genes. This approach aimed to provide a comprehensive input for model training, facilitating the evaluation of each model’s capacity to leverage the full range of gene expression data. Model training involved optimizing two critical hyperparameters: the number of training epochs and the learning rate. A uniform epoch count of 20 was applied across all models to standardize the training duration. The learning rates were set according to the default values recommended for each foundational model: 0.0001 for both scGPT and scFoundation, 0.001 for scBERT, and 0.00005 for Geneformer. To further investigate the impact of hyperparameter settings on cell annotation performance, the Zheng68k dataset was employed. This dataset allowed for a detailed exploration of how variations in epoch numbers, learning rates, and the selection of variable genes influenced the efficacy of the cell annotation models, providing insights into the optimal configuration for accurate cell-type classification.
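The nested 8:2/8:2 splitting scheme can be sketched with numpy index shuffling. This is a simplified illustration; actual runs may additionally stratify by cell type.

```python
# Split n samples 80/20 into (train+val)/test, then split the 80% again
# 80/20 into train/validation, yielding a 64/16/20 partition overall.
import numpy as np

def split_indices(n, frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(n * frac)
    return idx[:cut], idx[cut:]

train_val, test = split_indices(100)                     # 80 / 20
train_pos, val_pos = split_indices(len(train_val), seed=1)
train, val = train_val[train_pos], train_val[val_pos]    # 64 / 16
```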
Drug response
To evaluate and validate drug sensitivity predictions across the four scFMs, we utilized two datasets from DeepCDR: the Genomics of Drug Sensitivity in Cancer (GDSC), which provides IC50 values for each drug-cell pair, and the Cancer Cell Line Encyclopedia (CCLE), which contains comprehensive gene expression, mutation, and methylation data for various cancer cell lines. The embedding module of DeepCDR comprises two primary networks. The first network is a graph convolutional network (GCN) tasked with feature extraction from the drug’s feature matrix and adjacency matrix. Concurrently, three distinct sub-networks are employed to encode the gene expression, methylation, and mutation data of the cell lines, effectively capturing the latent biological characteristics of the cells. The encoded features from both the drug and the cell line are then concatenated and processed through a convolutional neural network (CNN) to predict the IC50 values. In the integration of scFMs with DeepCDR, we focused on leveraging gene expression data from the CCLE dataset. Prior to inputting these gene expression data into the sub-networks of DeepCDR, we employed the selected scFM to extract cell-specific features. These features were subsequently incorporated into the sub-networks, allowing for enhanced processing by the CNN to predict the IC50 values. Model performance was rigorously assessed by calculating both the PCC and SRCC between the predicted and actual IC50 values. These statistical measures provided a robust evaluation of the prediction accuracy, facilitating a comprehensive comparison of the models' effectiveness in drug response assessment.
Evaluation metrics
Cell embedding metrics
To evaluate the effectiveness of cell embeddings, we assessed the separation of cell types and the presence of batch effects in the embedding space. The ASW was employed as the primary metric; it quantifies the relationship between within-cluster and between-cluster distances. We computed ASW scores for both cell types and batch effects using the variants provided by scIB. For cell types, the ASW score is normalized to the range 0–1, where a score of 0 indicates that cells lie closer to other clusters than to their own, 0.5 suggests overlapping clusters, and 1 signifies tightly cohesive, well-separated clusters. Higher ASW values indicate better cluster separation and overall performance. The ASW for cell types is calculated as follows:

$$\mathrm{ASW}_{\text{cell type}} = \frac{1 + \mathrm{ASW}(C)}{2},$$

where $C$ represents the set of all cell identity labels.
In the context of batch effects, ASW is computed based on batch labels, where a score of 0 denotes perfect mixing and scores deviating from 0 indicate varying degrees of batch effects. To ensure that higher scores reflect better batch mixing, we scale the scores by subtracting them from 1, resulting in a range from 0 to 1. A score of 0 indicates complete batch separation, while a score of 1 reflects ideal batch mixing and integration.
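A minimal sketch of the cell-type ASW normalization, using scikit-learn's silhouette implementation on a toy embedding with two well-separated "cell types":

```python
# Raw silhouette lies in [-1, 1]; rescaling via (s + 1) / 2 maps it to
# [0, 1], matching the cell-type ASW normalization described above.
import numpy as np
from sklearn.metrics import silhouette_score

X = np.array([[0.0, 0.0], [0.1, 0.0],    # "cell type" 0
              [5.0, 5.0], [5.1, 5.0]])   # "cell type" 1
cell_types = np.array([0, 0, 1, 1])

asw = silhouette_score(X, cell_types)   # raw ASW
asw_cell_type = (asw + 1) / 2           # normalized to [0, 1]
```

For batch ASW, the same silhouette widths would instead be computed on batch labels and rescaled so that higher values indicate better mixing.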
GRN metrics
To assess the performance of the foundational models in GRN analysis, we utilized the Immune_ALL_human dataset. The dataset was preprocessed according to the specific requirements of each model to ensure optimal embedding generation. Gene embeddings were then extracted to construct a neighborhood graph illustrating the connectivity among individual cells. The Leiden algorithm was employed to cluster cells based on this connectivity, with the resolution parameter varied systematically from 0.1 to 1.0 in increments of 0.1. This approach allowed for an exploration of clustering at various levels of granularity, resulting in distinct subgroups that provided insights into cellular heterogeneity. Subsequently, we filtered out clusters with more than 25 genes and performed GO enrichment analysis on the genes within each of these clusters, evaluating BPs, MFs, and CCs associated with these genes. The final result of GO enrichment only included categories with adjusted p values less than 0.01. This analysis offered valuable insights into the functional characteristics of the identified subgroups. Additionally, we selected HLA-DRA as the target gene for visualization of the GRN. The visualization elucidated the regulatory relationships and interactions involving HLA-DRA, enhancing our understanding of its role in the context of immune response regulation.
Cell-type annotation metrics
Model performance in cell-type annotation was assessed by comparing predicted results with true labels using four key metrics: accuracy, precision, recall, and macro F1 score. The macro F1 score, which accounts for class imbalances, is calculated by first determining the F1 score for each class. The F1 score is defined as the harmonic mean of precision and recall:

$$F1_i = \frac{2 \cdot P_i \cdot R_i}{P_i + R_i},$$

where $F1_i$ is the F1 score for class $i$, and $P_i$ and $R_i$ represent the precision and recall for class $i$. The macro F1 score is then computed by averaging the F1 scores across all $N$ classes:

$$\text{macro F1} = \frac{1}{N} \sum_{i=1}^{N} F1_i.$$
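The per-class F1 and its macro average can be implemented directly from their definitions with plain Python counts:

```python
# Macro F1: compute precision, recall, and F1 per class, then average the
# per-class F1 scores with equal weight (robust to class imbalance).
def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

In practice this matches scikit-learn's f1_score with average="macro".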
Drug response metrics
For the evaluation of drug sensitivity predictions, we utilized preprocessed data from DeepCDR. The primary metrics employed were the PCC and SRCC. These metrics assess the correlation between predicted and true IC50 values for each cell line across the foundational models, providing a robust measure of prediction accuracy. The PCC is calculated as follows:

$$\mathrm{PCC} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}},$$

where $x_i$ and $y_i$ refer to the predicted IC50 value and the actual IC50 value for the $i$-th drug-cell pair, respectively, and $\bar{x}$ and $\bar{y}$ represent the mean values of all $x_i$ and $y_i$.
The SRCC is calculated as follows:

$$\mathrm{SRCC} = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},$$

where $d_i$ is the difference in ranks between the predicted IC50 value and the actual IC50 value for the $i$-th drug-cell pair, and $n$ is the number of drug-cell pairs.
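Both correlation metrics can be implemented directly from their formulas with plain numpy (the rank-based SRCC form below assumes no tied values):

```python
# PCC: covariance of centered values over the product of their norms.
# SRCC: rank-transform both vectors, then apply 1 - 6*sum(d^2)/(n(n^2-1)).
import numpy as np

def pcc(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

def srcc(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    rx, ry = x.argsort().argsort(), y.argsort().argsort()  # ranks (no ties)
    d = rx - ry
    n = len(x)
    return 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))
```

SRCC rewards any monotonic agreement between predicted and actual IC50 values, whereas PCC is specific to linear agreement; reporting both therefore gives a fuller picture of prediction quality.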
Resource availability
Lead contact
Requests for further information and resources should be directed to and will be fulfilled by the lead contact, Luni Hu (huluni@genomics.cn).
Materials availability
This study did not generate new unique reagents.
Data and code availability
• All datasets used in this study were obtained from published sources and can be downloaded via the links in Table S2.
• The source code is publicly available on GitHub (https://github.com/BGIResearch/BioLLM) and archived on Zenodo.41 Model checkpoints related to scFMs are also available on Zenodo.42
Acknowledgments
We acknowledge the Stomics Cloud platform (https://cloud.stomics.tech/) for providing GPU computational resources. This work was supported by the National Natural Science Foundation of China (32300526).
Author contributions
L.H. conceptualized the study. P.Q. and L.H. were responsible for the framework design and tool implementation. Q.C., H.Q., and Yilin Zhang performed data analysis and model evaluation. S.F., Yanlin Zhang, T.X., and L.C. contributed key ideas and advice. P.Q., Q.C., and L.H. wrote the manuscript. L.H., Y.L., X.F., and Yong Zhang supervised the study.
Declaration of interests
The authors declare no competing interests.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the authors used a large language model (ChatGPT) to improve conciseness. After using this tool or service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Published: July 30, 2025
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.patter.2025.101326.
Contributor Information
Yong Zhang, Email: zhangyong2@genomics.cn.
Xiaodong Fang, Email: fangxd@genomics.cn.
Yuxiang Li, Email: liyuxiang@genomics.cn.
Luni Hu, Email: huluni@genomics.cn.
References
1. Heumos L., Schaar A.C., Lance C., Litinetskaya A., Drost F., Zappia L., Lücken M.D., Strobl D.C., Henao J., Curion F., et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 2023;24:550–572. doi: 10.1038/s41576-023-00586-w.
2. Chambers D.C., Carew A.M., Lukowski S.W., Powell J.E. Transcriptomics and single-cell RNA-sequencing. Respirology. 2019;24:29–36. doi: 10.1111/resp.13412.
3. Jovic D., Liang X., Zeng H., Lin L., Xu F., Luo Y. Single-cell RNA sequencing technologies and applications: A brief overview. Clin. Transl. Med. 2022;12. doi: 10.1002/ctm2.694.
4. Chen G., Ning B., Shi T. Single-Cell RNA-Seq Technologies and Related Computational Data Analysis. Front. Genet. 2019;10:317. doi: 10.3389/fgene.2019.00317.
5. Bacher R., Kendziorski C. Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol. 2016;17:63. doi: 10.1186/s13059-016-0927-y.
6. Hicks S.C., Townes F.W., Teng M., Irizarry R.A. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics. 2018;19:562–578. doi: 10.1093/biostatistics/kxx053.
7. Lähnemann D., Köster J., Szczurek E., McCarthy D.J., Hicks S.C., Robinson M.D., Vallejos C.A., Campbell K.R., Beerenwinkel N., Mahfouz A., et al. Eleven grand challenges in single-cell data science. Genome Biol. 2020;21:31. doi: 10.1186/s13059-020-1926-6.
8. Sengupta D., Rayan N.A., Lim M., Lim B., Prabhakar S. Fast, scalable and accurate differential expression analysis for single cells. Preprint at bioRxiv. 2016. doi: 10.1101/049734.
9. Sinha D., Kumar A., Kumar H., Bandyopadhyay S., Sengupta D. dropClust: efficient clustering of ultra-large scRNA-seq data. Nucleic Acids Res. 2018;46:e36. doi: 10.1093/nar/gky007.
10. Yang F., Wang W., Wang F., Fang Y., Tang D., Huang J., Lu H., Yao J. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 2022;4:852–866. doi: 10.1038/s42256-022-00534-z.
11. Theodoris C.V., Xiao L., Chopra A., Chaffin M.D., Al Sayed Z.R., Hill M.C., Mantineo H., Brydon E.M., Zeng Z., Liu X.S., Ellinor P.T. Transfer learning enables predictions in network biology. Nature. 2023;618:616–624. doi: 10.1038/s41586-023-06139-9.
12. Cui H., Wang C., Maan H., Pang K., Luo F., Duan N., Wang B. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods. 2024;21:1470–1480. doi: 10.1038/s41592-024-02201-0.
13. Hao M., Gong J., Zeng X., Liu C., Guo Y., Cheng X., Wang T., Ma J., Zhang X., Song L. Large-scale foundation model on single-cell transcriptomics. Nat. Methods. 2024;21:1481–1491. doi: 10.1038/s41592-024-02305-7.
14. Rosen Y., Roohani Y., Agarwal A., Samotorčan L., Tabula Sapiens Consortium, Quake S.R., Leskovec J. Universal Cell Embeddings: A Foundation Model for Cell Biology. Preprint at bioRxiv. 2023. doi: 10.1101/2023.11.28.568918.
15. Yang X., Liu G., Feng G., Bu D., Wang P., Jiang J., Chen S., Yang Q., Miao H., Zhang Y., et al. GeneCompass: deciphering universal gene regulatory mechanisms with a knowledge-informed cross-species foundation model. Cell Res. 2024;34:830–845. doi: 10.1038/s41422-024-01034-y.
16. Bommasani R., Hudson D.A., Adeli E., Altman R., Arora S., von Arx S., Bernstein M.S., Bohg J., Bosselut A., Brunskill E., et al. On the Opportunities and Risks of Foundation Models. Preprint at arXiv. 2021. doi: 10.48550/arXiv.2108.07258.
17. Liang W., Tadesse G.A., Ho D., Fei-Fei L., Zaharia M., Zhang C., Zou J. Advances, challenges and opportunities in creating data for trustworthy AI. Nat. Mach. Intell. 2022;4:669–677. doi: 10.1038/s42256-022-00516-1.
18. Moor M., Banerjee O., Abad Z.S.H., Krumholz H.M., Leskovec J., Topol E.J., Rajpurkar P. Foundation models for generalist medical artificial intelligence. Nature. 2023;616:259–265. doi: 10.1038/s41586-023-05881-4.
19. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser Ł., Polosukhin I. Attention is All you Need. Preprint at arXiv. 2017. doi: 10.48550/arXiv.1706.03762.
20. Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint at arXiv. 2019. doi: 10.48550/arXiv.1810.04805.
21. Achiam J., Adler S., Agarwal S., Ahmad L., Akkaya I., Leoni Aleman F., Almeida D., Altenschmidt J., Altman S., Anadkat S., et al. GPT-4 Technical Report. Preprint at arXiv. 2023. doi: 10.48550/arXiv.2303.08774.
22. Chen J., Xu H., Tao W., Chen Z., Zhao Y., Han J.D.J. Transformer for one stop interpretable cell type annotation. Nat. Commun. 2023;14:223. doi: 10.1038/s41467-023-35923-4.
23. Cui H., Wang C., Maan H., Duan N., Wang B. scFormer: A Universal Representation Learning Approach for Single-Cell Data Using Transformers. Preprint at bioRxiv. 2022. doi: 10.1101/2022.11.20.517285.
24. Ma A., Wang X., Li J., Wang C., Xiao T., Liu Y., Cheng H., Wang J., Li Y., Chang Y., et al. Single-cell biological network inference using a heterogeneous graph transformer. Nat. Commun. 2023;14:964. doi: 10.1038/s41467-023-36559-0.
25. Xu J., Zhang A., Liu F., Chen L., Zhang X. CIForm as a Transformer-based model for cell-type annotation of large-scale single-cell RNA-seq data. Brief. Bioinform. 2023;24. doi: 10.1093/bib/bbad195.
26. Szalata A., Hrovatin K., Becker S., Tejada-Lapuerta A., Cui H., Wang B., Theis F.J. Transformers in single-cell omics: a review and new perspectives. Nat. Methods. 2024;21:1430–1443. doi: 10.1038/s41592-024-02353-z.
27. Du J., Jia P., Dai Y., Tao C., Zhao Z., Zhi D. Gene2vec: distributed representation of genes based on co-expression. BMC Genom. 2019;20:82. doi: 10.1186/s12864-018-5370-x.
28. Dao T., Fu D.Y., Ermon S., Rudra A., Ré C. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Preprint at arXiv. 2022. doi: 10.48550/arXiv.2205.14135.
29. Yang X., Mann K.K., Wu H., Ding J. scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in silico exploration. Genome Biol. 2024;25:198. doi: 10.1186/s13059-024-03338-z.
30. Fischer F., Fischer D.S., Mukhin R., Isaev A., Biederstedt E., Villani A.C., Theis F.J. scTab: Scaling cross-tissue single-cell annotation models. Nat. Commun. 2024;15:6611. doi: 10.1038/s41467-024-51059-5.
31. Haghverdi L., Lun A.T.L., Morgan M.D., Marioni J.C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 2018;36:421–427. doi: 10.1038/nbt.4091.
32. Andreatta M., Hérault L., Gueguen P., Gfeller D., Berenstein A.J., Carmona S.J. Semi-supervised integration of single-cell transcriptomics data. Nat. Commun. 2024;15:872. doi: 10.1038/s41467-024-45240-z.
33. Aran D., Looney A.P., Liu L., Wu E., Fong V., Hsu A., Chak S., Naikawadi R.P., Wolters P.J., Abate A.R., et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 2019;20:163–172. doi: 10.1038/s41590-018-0276-y.
34. Dominguez Conde C., Xu C., Jarvis L.B., Rainbow D.B., Wells S.B., Gomes T., Howlett S.K., Suchanek O., Polanski K., King H.W., et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science. 2022;376. doi: 10.1126/science.abl5197.
35. Gayoso A., Lopez R., Xing G., Boyeau P., Valiollah Pour Amiri V., Hong J., Wu K., Jayasuriya M., Mehlman E., Langevin M., et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 2022;40:163–166. doi: 10.1038/s41587-021-01206-w.
36. Liu Q., Hu Z., Jiang R., Zhou M. DeepCDR: a hybrid graph convolutional network for predicting cancer drug response. Bioinformatics. 2020;36:i911–i918. doi: 10.1093/bioinformatics/btaa822.
37. Ma Q., Jiang Y., Cheng H., Xu D. Harnessing the deep learning power of foundation models in single-cell omics. Nat. Rev. Mol. Cell Biol. 2024;25:593–594. doi: 10.1038/s41580-024-00756-6.
38. Zhou J., Troyanskaya O.G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods. 2015;12:931–934. doi: 10.1038/nmeth.3547.
39. Wen H., Tang W., Dai X., Ding J., Jin W., Xie Y., Tang J. CellPLM: Pre-training of Cell Language Model Beyond Single Cells. Preprint at bioRxiv. 2023. doi: 10.1101/2023.10.03.560734.
40. Lukassen S., Ten F.W., Adam L., Eils R., Conrad C. Gene set inference from single-cell sequencing data using a hybrid of matrix factorization and variational autoencoders. Nat. Mach. Intell. 2020;2:800–809. doi: 10.1038/s42256-020-00269-9.
41. Qiu P., Chen Q., Qin H., Hu L. Chelsey-chen/BioLLM: v1.0.0. Zenodo. 2025. doi: 10.5281/zenodo.15597639.
42. Qiu P., Chen Q., Qin H., Hu L. Bio_tools. Zenodo. 2024. doi: 10.5281/zenodo.14189969.