Chinese Medical Journal. 2024 Jul 30;137(17):2052–2064. doi: 10.1097/CM9.0000000000003254

Bioinformatics tools and resources for cancer and application

Jin Huang 1, Lingzi Mao 2, Qian Lei 1, An-Yuan Guo 1
Editor: Yuanyuan Ji
PMCID: PMC11374212  PMID: 39075637

Abstract

Tumor bioinformatics plays an important role in cancer research and precision medicine. The primary focus of traditional cancer research has been molecular and clinical studies of a number of fundamental pathways and genes. In recent years, driven by breakthroughs in high-throughput technologies, large-scale cancer omics data have accumulated rapidly. How to effectively utilize and share these data is particularly important. To address this crucial task, many computational tools and databases have been developed over the past few years. To help researchers quickly learn and understand the functions of these tools, in this review, we summarize publicly available bioinformatics tools and resources for pan-cancer multi-omics analysis, regulatory analysis of tumorigenesis, tumor treatment and prognosis, immune infiltration analysis, immune repertoire analysis, cancer driver gene and driver mutation analysis, and cancer single-cell analysis, which may further help researchers find more suitable tools for their research.

Keywords: Tumor bioinformatics, Cancer research, Tools, Databases

Introduction

Cancer is a highly aggressive and heterogeneous disease encompassing many tissue types and diverse oncogenic drivers. Many hallmarks of cancer have been identified, including sustained proliferative signaling, evasion of growth suppressors, and avoidance of immune destruction,[1] which contribute to the extensive complexity of cancer phenotypes and genotypes. With biotechnological advancements, massive amounts of omics data, including genomics, transcriptomics, epigenomics, and other omics data, have been generated to systematically measure different aspects of tumor characteristics.[2] These datasets provide a comprehensive overview of the molecular landscape of cancer. The surge in omics data has also led to the development of numerous bioinformatics databases and tools to facilitate the storage, retrieval, integration, and analysis of these data.[3] Integrating and analyzing these datasets has revolutionized cancer research by offering insights into the molecular mechanisms driving tumorigenesis, unraveling the tumor microenvironment (TME), identifying potential biomarkers for early detection and prognosis, and guiding the development of targeted therapies for individual patients.[4]

In this review, we introduce the publicly available bioinformatics tools and data resources for cancer research. These tools and resources are available for the analysis of pan-cancer, regulation of tumorigenesis, treatment and prognosis, immune infiltration, immune repertoire, cancer driver gene, and cancer single-cell datasets [Figure 1]. This review aims to help researchers choose suitable bioinformatics tools and resources for cancer research according to different purposes and applications.

Figure 1.

An overview of computational approaches and tools for tumor bioinformatics (by Figdraw).

Online Resources for Pan-cancer Multi-omics Analysis

High-throughput sequencing can generate a large amount of cancer omics data. To make it easy for users to use these data, many databases and web servers have been constructed. Here, we introduce the main databases and analysis platforms for pan-cancer multi-omics data, with detailed descriptions of their features and functions [Table 1].

Table 1.

Summary of online pan-cancer multi-omics analysis web servers and databases.

Name | Description | Link
TCGA | The Cancer Genome Atlas | https://www.cancer.gov/ccg/research/genome-sequencing/tcga
ICGC | International Cancer Genome Consortium | https://dcc.icgc.org
COSMIC | Catalogue of Somatic Mutations in Cancer | https://cancer.sanger.ac.uk/cosmic
cBioPortal | Interactive exploration of multidimensional cancer genomics datasets | https://www.cbioportal.org
UCSC Xena | Exploration tool for public and private, multi-omic and clinical/phenotype data | https://xena.ucsc.edu
GEPIA2 | Large-scale expression profiling and interactive analysis | http://gepia2.cancer-pku.cn
GSCA (GSCALite) | Pan-cancer analysis of expression, mutation, immune infiltration, and drug sensitivity for a gene set | https://guolab.wchscu.cn/GSCA
GDAC Firehose | Downloadable TCGA datasets, summary reports, and graphical tools | https://gdac.broadinstitute.org
CVCDAP | Interactive and customizable tool for cohort-level analysis | https://omics.bjcancer.org/cvcdap/home.do
CPTAC | Clinical proteomic tumor analysis consortium | https://proteomics.cancer.gov/programs/cptac
UALCAN | Interactive web resource for analyzing cancer omics data | http://ualcan.path.uab.edu
TCPA | Resource for accessing and analyzing cancer functional proteomics | http://tcpaportal.org/tcpa
CancerProteome | Resource to functionally decipher the proteome landscape in cancer | http://bio-bigdata.hrbmu.edu.cn/CancerProteome

CVCDAP: Cancer Virtual Cohort Discovery Analysis Platform; COSMIC: Catalogue of Somatic Mutations in Cancer; GSCA: Gene Set Cancer Analysis; TCPA: The Cancer Proteome Atlas.

The Cancer Genome Atlas (TCGA)[5] is a landmark cancer genomics database that has generated gene expression, DNA mutation, DNA methylation, chromatin accessibility, copy number alterations (CNA), protein expression, and histopathology data for more than 20,000 primary cancer and matched normal samples spanning 33 cancer types. The TCGA database contains the largest and most comprehensive pan-cancer multi-omics dataset and is widely used in the research community. The TCGA data can be accessed through the Genomic Data Commons Data Portal, along with web-based analysis and visualization tools. Broad Genome Data Analysis Center (GDAC) Firehose (http://gdac.broadinstitute.org/) initially organizes and analyzes genomic data and allows users to view and download it efficiently. The International Cancer Genome Consortium (ICGC)[6] is a global initiative to generate a comprehensive catalog of genetic mutational abnormalities in more than 50 tumor types. The ICGC incorporates data from 84 global cancer projects derived from the TCGA, including approximately 77 million somatic mutations and molecular data from over 20,000 participants. Catalogue of Somatic Mutations in Cancer (COSMIC)[7] is the world’s largest source of expert manually curated somatic mutations for human cancers. Currently it contains known and suspected cancer genes and mutations, including CNA, methylation, gene fusion, single nucleotide polymorphism (SNP), and gene expression information from over 37,000 genomes.

To conveniently use these data, many online analysis platforms have been developed for data integration, mining, and visualization. cBioPortal[8] is a TCGA database-based resource for the interactive exploration of integrated cancer genomics datasets. It supports various analyses and visualizations of mutations and the expression of multiple genes in a variety of cancers. University of California, Santa Cruz (UCSC) Xena is an online tool for both large public and private datasets. Through the Xena Browser[9] (https://xenabrowser.net), users can explore visualization and analysis functions, such as survival analyses, genomic signatures, and statistical tests. Cancer Virtual Cohort Discovery Analysis Platform (CVCDAP)[10] is an interactive and customizable tool for cohort-level analysis of TCGA and Clinical Proteomic Tumor Analysis Consortium (CPTAC) public datasets, as well as user-uploaded datasets. CVCDAP allows flexible selection of patients from different projects as a virtual cohort, and provides dozens of features for seamless genomic, transcriptomic, proteomic, and clinical analysis.

Gene Expression Profiling Interactive Analysis 2 (GEPIA2)[11] is used to analyze the gene expression data of tumors and normal tissue samples from the TCGA and Genotype-Tissue Expression (GTEx) databases. Users can customize features such as tumor/normal differential expression analysis, patient survival analysis, and similar gene detection. Gene Set Cancer Analysis (GSCA)[3] integrates genomic (mutation and expression), pharmacogenomic, and immunogenomic analysis of clinical phenotypes. The unique function of GSCA is that it can analyze a gene set as a whole rather than one gene at a time.

In addition to genomics, several proteomic tools for cancer data have been developed. The CPTAC[12] contains genomic, transcriptomic, proteomic, and clinical data from more than 1000 tumors in 10 cancer cohorts. Users can identify proteomic-centric subtypes, prioritize driver mutations, and understand cancer-relevant pathways. The University of ALabama at Birmingham CANcer data analysis Portal (UALCAN)[13] is designed to provide easy access to data from CPTAC and TCGA. The Cancer Proteome Atlas (TCPA)[14] is a comprehensive resource for accessing, visualizing, and analyzing cancer functional proteomics data for cancer-related genes. The CancerProteome[15] contains manually curated publicly available mass spectrometry (MS)-based quantification and post-translational modification (PTM) proteomes, including 7406 samples from 21 different cancer types. Protein abundances correlated with corresponding transcript or PTM levels are also available.

The Genomic Data Commons (GDC) data portal, ICGC data portal, Broad GDAC Firehose, and CPTAC data portal are the main repositories used to browse, query, and download data for their corresponding projects. Web-based, user-friendly data analysis tools, including cBioPortal, UCSC Xena, GEPIA2, GSCA (previous version GSCALite[16]), Broad GDAC Firehose, CVCDAP, UALCAN, TCPA, and CancerProteome, can help scientists without a computational background gain greater biological insight. Each of these online analysis tools has the distinctive features described above. Generally, GEPIA2 is a good option for differential gene expression analysis because of its large volume of cancer sample data and user-friendly online analysis portal. For pan-cancer analysis, cBioPortal, GSCA, and UALCAN are recommended. cBioPortal provides the most detailed TCGA online analyses. GSCA integrates expression, mutation, drug sensitivity, and clinical data for gene set-level analysis. UALCAN provides analysis of clinical proteomic consortium data, including phosphoprotein expression analysis of tumor and normal samples.

Online Databases for Expression and Regulatory Analysis of Tumorigenesis

Gene expression during tumorigenesis is regulated by a variety of factors, including DNA methylation, chromatin openness, and transcription factors (TFs). Here, we summarize several online databases for expression and regulatory analysis of tumorigenesis.

AnimalTFDB[17] is a comprehensive database of genome-wide TF lists and annotations that has been maintained and updated for more than 10 years with more than 1000 citations. JASPAR[18] is a freely accessible database offering curated TF binding profiles, utilizing position frequency matrices (PFMs) and TF flexible models (TFFMs) to predict binding sites across various species. hTFtarget[19] provides a comprehensive, almost one-stop solution for exploring TF-target regulation in humans. It integrates thousands of Chromatin Immunoprecipitation Sequencing (ChIP-seq) datasets and epigenetic modification information to predict reliable TF-target regulations, and provides online tools to predict potential co-association and coregulation between TFs. Cistrome[20] is a resource for transcriptional regulation that integrates ChIP-seq and Deoxyribonuclease (DNase) data to view and analyze the binding sites of TFs or histone modifications in the genome.
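To illustrate how a position frequency matrix can be used to predict binding sites, the following minimal sketch converts a PFM into log-odds scores and scans a DNA sequence. The matrix, the sequence, the pseudocount, and the uniform background are all assumptions made for illustration; they are not taken from JASPAR, and the code does not use any JASPAR API.

```python
import math

# Hypothetical 4-position PFM: observed base counts at each motif position.
PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def pfm_to_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    length = len(next(iter(pfm.values())))
    pwm = {base: [] for base in pfm}
    for i in range(length):
        total = sum(pfm[b][i] for b in pfm) + pseudocount * len(pfm)
        for b in pfm:
            freq = (pfm[b][i] + pseudocount) / total
            pwm[b].append(math.log2(freq / background))
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; return (start, window, score)."""
    length = len(next(iter(pwm.values())))
    hits = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        score = sum(pwm[base][i] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits


pwm = pfm_to_pwm(PFM)
hits = scan("TTACGTGGACGA", pwm)  # illustrative sequence, A/C/G/T only
best = max(hits, key=lambda hit: hit[2])
print(best)
```

In practice, a score threshold and a scan of the reverse strand would also be needed; both are omitted here for brevity.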

MicroRNA-related Single Nucleotide Polymorphisms (miRNASNP)[21] aims to provide a resource of microRNA (miRNA)-related SNPs, which includes SNPs in pre-microRNA (pre-miRNA) of humans, and target gain and loss by SNPs (or disease-related variants) in miRNA seed regions or 3′UTR of target messenger RNAs (mRNAs).

The Platelet Expression Atlas (PEA)[22] is a repository of gene/miRNA expression profiles of human platelets under different conditions, providing a comprehensive biological workbench for browsing platelet expression patterns. ChIPBase v3.0[23] is a comprehensive tool for studying transcriptional regulation across genes and regulators, uncovering millions of regulatory relationships from a vast dataset of ChIP-seq experiments. It features modules for enhancer prediction, co-expression maps, disease variations, and epitranscriptome–epigenome interactions. TRRUST v2[24] provides curated human and mouse transcriptional regulatory networks with over 8000 TF-target interactions per species. It uses PubMed text mining for detailed regulatory mode information and offers tools to prioritize key TFs relevant to specific physiological conditions. The Extracellular Vesicle Atlas (EVAtlas)[25] is the most advanced repository of extracellular vesicle (EV) non-coding RNAs (ncRNAs), storing comprehensive expression profiles of seven types of ncRNAs in EVs, which will facilitate functional research and biomarker discovery for EVs. The Extracellular Vesicles miRNA database (EVmiRNA)[26] is the first database to focus on miRNA expression profiles in EVs. Tissue-specific Gene Expression and Regulation (TiGER)[27] specializes in tissue-specific gene expression and regulation in human tissues, featuring comprehensive datasets on gene expression profiles, TF interactions, and cis-regulatory module detections.

The FFLtool[28] is a web server for analyzing feed forward loop (FFL) regulatory motifs among TFs, miRNAs, and genes. The FFLtool could help biomedical researchers or bioinformaticians better explore complex regulatory mechanisms, such as by identifying key regulatory motifs and regulators for further experimental validation or analyzing the TF and miRNA coregulatory networks involved in diseases. It has been applied to the study of gene expression regulation in complex diseases with good results. GRAND[29] offers a comprehensive collection of gene regulatory networks (GRNs) across human tissues, cancers, cell lines, and drugs, linking TFs or miRNAs to target genes. It supports TF enrichment analysis, identifies regulatory compounds, and enables network querying and visualization. GRNdb[30] predicts GRNs from omics data, providing searchable TF-target pairs and motifs. It analyzes gene expression and survival in various conditions, including TCGA cancers. The Gene Network Construction Kit (GeNeCK)[31] is an online tool kit that integrates various statistical methods to construct gene networks based on gene expression data and optional hub gene information. It allows users to use 10 different network construction methods (such as partial correlation, likelihood, Bayes, and mutual information) and integrate the networks obtained from multiple methods.
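As a rough illustration of what a feed forward loop looks like in a regulatory network, the sketch below searches a small directed edge list for triples in which regulator A targets B, and both A and B target a common gene C. The edge list and node names are hypothetical, and this is not FFLtool's implementation, only a minimal version of the motif definition.

```python
from collections import defaultdict

# Hypothetical regulatory edges (regulator, target); names are illustrative.
EDGES = [
    ("TF1", "miR-A"), ("TF1", "GeneX"), ("miR-A", "GeneX"),
    ("TF2", "GeneY"), ("miR-B", "GeneY"),
    ("miR-C", "TF3"), ("miR-C", "GeneZ"), ("TF3", "GeneZ"),
]


def find_ffls(edges):
    """Return (A, B, C) triples where A->B, A->C, and B->C all exist."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)

    loops = []
    for a in sorted(targets):
        for b in sorted(targets[a]):
            if b == a:
                continue
            # Genes regulated by both A and its downstream regulator B.
            shared = targets[a] & targets.get(b, set())
            for c in sorted(shared):
                if c not in (a, b):
                    loops.append((a, b, c))
    return loops


for loop in find_ffls(EDGES):
    print(" -> ".join(loop))
```

Here TF2 and miR-B share the target GeneY but do not regulate each other, so they do not form a loop.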

In summary, the above-mentioned resources provide useful data for systematic exploration of tumor regulation [Table 2].

Table 2.

Summary of online resources for expression and regulation analysis in tumor.

Name | Description | Link
hTFtarget | Comprehensive database for human TF regulation and their targets | https://guolab.wchscu.cn/hTFtarget
JASPAR | Largest TF binding profile database across species | https://jaspar.elixir.no/
AnimalTFDB | Comprehensive database for genome-wide TFs, transcription co-factors, and chromatin remodeling factors in 183 animal genomes | https://guolab.wchscu.cn/AnimalTFDB4
Cistrome | A collection of ChIP-seq, chromatin accessibility data (DNase-seq and ATAC-seq), and regulation in human and mouse | http://cistrome.org/db
miRNASNP | Database for miRNA-related SNPs and their functions | https://guolab.wchscu.cn/miRNASNP
PEA | Platelet expression atlas resource and platelet transcriptome | https://guolab.wchscu.cn/PEA
ChIPBase | The encyclopedia of transcriptional regulations of ncRNAs and protein-coding genes | http://rna.sysu.edu.cn/chipbase/
TRRUST v2 | Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining | http://www.grnpedia.org/trrust
EVAtlas | Comprehensive database for ncRNA expression in human extracellular vesicles | https://guolab.wchscu.cn/EVAtlas
EVmiRNA | Database of miRNA profiling in extracellular vesicles | https://guolab.wchscu.cn/EVmiRNA
TiGER | Tissue-specific Gene Expression and Regulation | http://bioinfo.wilmer.jhu.edu/tiger/
GeNeCK | Web server for gene network construction and visualization | https://lce.biohpc.swmed.edu/geneck/
FFLtool | Web server for TF and miRNA feed forward loop analysis in human | https://guolab.wchscu.cn/FFLtool
GRAND | Gene Regulatory Network Database | https://grand.networkmedicine.org/
GRNdb | Gene regulatory networks database | http://www.grndb.com/

ncRNA: Non-coding RNA; SNPs: Single nucleotide polymorphisms; TF: Transcription factor; TFDB: Transcription Factor Database; ATAC-seq: Assay for Transposase-Accessible Chromatin with high throughput sequencing.

Tumor Treatment and Prognosis

The combination of gene sequencing data and clinical data has gained widespread attention for the diagnosis and treatment of cancer. Many databases and web servers have been developed to create prediction models and provide treatment information, including some of the databases described above: ICGC, cBioPortal, UCSC Xena, GEPIA2, GSCA (GSCALite), GDAC Firehose, CVCDAP, UALCAN, TCPA, and CancerProteome can help assess the correlation between genomic data (mRNA, miRNA, protein, DNA, and methylation) and patient survival. Here, we list other useful tools for cancer prognosis and treatment purposes [Table 3].

Table 3.

Summary of tumor treatment and prognosis-related databases.

Name | Description | Link
Kaplan–Meier Plotter | Assess the correlation between the expression of all genes and survival | https://kmplot.com/analysis
OncoLnc | Link TCGA survival data to mRNAs, miRNAs, and lncRNAs | http://www.oncolnc.org
MethSurv | Perform multivariable survival analysis using DNA methylation data | https://biit.cs.ut.ee/methsurv
ICBcomb | Expression resource and functional analysis for ICB combination therapy studies | https://guolab.wchscu.cn/ICBcomb
ICBatlas | Comprehensive expression resources and functional analysis for patients with ICB therapy | https://guolab.wchscu.cn/ICBatlas
GDSC | Genomics of Drug Sensitivity in Cancer | http://www.cancerrxgene.org
CancerDR | Compilation of mutation data and pharmacological drug profiles from COSMIC and Cancer Cell Line Encyclopedia | http://crdd.osdd.net/raghava/cancerdr

COSMIC: Catalogue of Somatic Mutations in Cancer; ICB: Immune checkpoint blockade; lncRNAs: Long non-coding RNAs; TCGA: The Cancer Genome Atlas; DR: Drug Resistance.

The Kaplan–Meier (KM) Plotter[32], which contains 35,000+ samples across 21 tumor types, assesses the correlation between the expression of all genes and survival time based on data from the Gene Expression Omnibus (GEO), European Genome-Phenome Archive (EGA), and TCGA databases. OncoLnc[33] is a tool for interactively exploring survival correlations and for downloading clinical data coupled to expression data for mRNAs, miRNAs, or long non-coding RNAs (lncRNAs). Users can separate patients by gene expression and then generate KM plots from precomputed, stored results. MethSurv[34] is a web tool for performing multivariable survival analysis using 7358 methylomes from 25 different human cancers. This tool allows for survival analysis of a CpG located near a query gene. Additionally, it provides cluster analysis to associate methylation patterns with clinical characteristics for the query gene.
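The idea of separating patients by gene expression and drawing KM curves can be sketched in a few lines. The example below splits a hypothetical cohort at the median expression of one gene and computes the Kaplan–Meier survival estimate for each group. The patient values, the median cutoff, and the function names are assumptions for illustration; the tools above offer other cutoffs and also report statistical tests, which are not shown.

```python
from statistics import median

# Hypothetical cohort: (gene expression, follow-up time in months, event).
# Event 1 = death observed, 0 = censored. Values are illustrative only.
PATIENTS = [
    (2.1, 5, 1), (8.4, 20, 0), (1.3, 8, 1), (7.9, 15, 1),
    (3.0, 12, 0), (9.2, 30, 0), (2.7, 6, 1), (6.5, 25, 1),
]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == current:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((current, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def km_by_expression(patients):
    """Split patients at the median expression and estimate both curves."""
    cutoff = median(expr for expr, _, _ in patients)
    groups = {"high": ([], []), "low": ([], [])}
    for expr, time, event in patients:
        label = "high" if expr > cutoff else "low"
        groups[label][0].append(time)
        groups[label][1].append(event)
    return {label: kaplan_meier(t, e) for label, (t, e) in groups.items()}


curves = km_by_expression(PATIENTS)
for label, curve in curves.items():
    print(label, curve)
```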

Immune checkpoint blockade (ICB) therapy is an effective strategy for activating antitumor immunity. ICBcomb[35] is the first database on expression resources for ICB combination therapy. It provides useful data for studies related to ICB combination therapy, and expression is compared two-by-two between the ICB, drug, combination (ICB and drug), and control groups. ICBatlas[36] provides the transcriptome features of ICB therapy through the analysis of 1515 ICB-treated samples from 25 studies across nine cancer types. Users can investigate clinical outcomes, treatment-related genes, biological pathways, and immune cell infiltration at the response or treatment level, and compare the degree to which genes impact the response.

Genomics of Drug Sensitivity in Cancer (GDSC)[37] is a program for identifying the molecular features of cancers that predict the response to anticancer drugs. Users can find drug response data and genomic markers of sensitivity. GDSC has characterized 1000 human cancer cell lines and screened them with hundreds of compounds. CancerDR[38] provides information on 148 anticancer drugs and their pharmacological profiles across 952 cancer cell lines, including sequences of natural variants, mutations, tertiary structures, and alignment profiles of mutants/variants. A number of web-based tools have been integrated into CancerDR. Users can identify genetic alterations in genes responsible for drug resistance.

For survival analysis, the KM Plotter contains the most authoritative survival analyses for patients and is easy to use. OncoLnc enables users to download raw expression data and the corresponding prognostic information. The MethSurv tool is used for methylation-based cancer biomarkers. The GDSC is the largest public resource for information on drug sensitivity in cancer cells. ICBcomb and ICBatlas specialize in human and mouse expression data related to ICB therapy. ICBcomb divides samples into groups treated with ICB, other drugs, or their combinations, while ICBatlas focuses on drug responses.

Resources for Immune Infiltration and Tumor Antigen-HLA Typing

The tumor immune microenvironment, consisting of tumor cells, fibroblasts, immune cells, and various molecules, is essential for understanding immune infiltration in cancer research. Accurate prediction of tumor antigens and their Human Leukocyte Antigen (HLA) binding affinities is essential for effective cancer immunotherapy. Below, we outline the key tools used for these predictions [Table 4].

Table 4.

Summary of computational tools and resources for immune infiltration and tumor antigen-HLA typing.

Name | Description | Link
CIBERSORTx | Provide an estimation of the abundances of member cell types in a mixed cell population | https://cibersortx.stanford.edu/
ImmuCellAI | A unique method for comprehensive T-cell subsets abundance prediction and ICB response prediction | https://guolab.wchscu.cn/ImmuCellAI
ImmuCellAI-mouse | A tool for comprehensive prediction of mouse immune cell abundance and immune microenvironment depiction | https://guolab.wchscu.cn/ImmuCellAI-mouse
TIMER2.0 | A comprehensive resource for systematical analysis of immune infiltrates across diverse cancer types | http://timer.cistrome.org/
xCell | Digitally portraying the tissue cellular heterogeneity landscape | http://xCell.ucsf.edu/
TIP | A web server for resolving tumor immunophenotype profiling | http://biocc.hrbmu.edu.cn/TIP/
EPIC | Estimating the proportions of immune and cancer cells | http://epic.gfellerlab.org
ESTIMATE | Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data | https://bioinformatics.mdanderson.org/estimate/
MCP-counter | Microenvironment cell populations-counter | https://github.com/ebecht/MCPcounter
quanTIseq | Constrained least square regression | http://icbii-med.ac.at/software/quantisea/doc/index.html
pVAC-seq | Personalized variant antigens by cancer sequencing | https://github.com/griffithlab/pVAC-Seq
INTEGRATE-Neo | A pipeline for personalized gene fusion neoantigen discovery | https://github.com/ChrisMaherLab/INTEGRATE-Neo
TSNAD | Tumor-specific neoantigen detection | http://biopharm.zju.edu.cn/tsnad/
CloudNeo | A cloud pipeline for identifying patient-specific tumor neoantigens | https://github.com/TheJacksonLaboratory/CloudNeo
ScanNeo | Identifying indel-derived neoantigens using RNA-Seq data | https://github.com/ylab-hi/ScanNeo
ASNEO | Identification of personalized alternative splicing based neoantigens with RNA-seq | https://github.com/bm2-lab/ASNEO
NetMHCpan | Prediction of peptide-MHC class I binding using artificial neural networks | http://www.cbs.dtu.dk/services/NetMHCpan-4.1/
DeepHLApan | Neoantigen prediction considering both HLA-peptide binding and immunogenicity | https://github.com/jiujiezz/deephlapan

HLA: Human Leukocyte Antigen; ICB: Immune checkpoint blockade; MHC: Major histocompatibility complex.

Immune Cell Abundance Identifier (ImmuCellAI)[39] is a tool used to estimate the abundance of 24 immune cell types (especially 18 T cell subtypes) from human gene expression datasets, including RNA Sequencing (RNA-seq) and microarray data. It has a powerful and unique function in the estimation of tumor immune infiltration and the prediction of immunotherapy response. ImmuCellAI-mouse[40] is a tool used to estimate the abundance of 36 immune cell types based on gene expression profiles from mouse RNA-Seq or microarray data. CIBERSORTx[41] is an analytical tool that provides an estimation of the abundances of member cell types in a mixed cell population, using gene expression data. By minimizing platform-specific variation, CIBERSORTx allows large-scale tissue dissection using RNA-Seq data. Estimating the Proportions of Immune and Cancer Cells (EPIC)[42] is a tool for estimating the proportions of different cell types from bulk gene expression data. It can analyze the percentage of infiltration of eight types of immune cells. Microenvironment Cell Populations-counter (MCP-counter)[43] is an R package used to robustly quantify the abundance of multiple immune and non-immune stromal populations in the transcriptome of heterogeneous tissues, such as normal or malignant tissues. quanTIseq[44] is a computational pipeline for quantification of the tumor immune contexture from human RNA-seq data.
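Several of these tools estimate cell-type proportions by treating a bulk expression profile as a weighted mixture of cell-type reference profiles; Table 4 summarizes quanTIseq as constrained least square regression. The sketch below illustrates that general idea with a non-negative least squares fit solved by projected gradient descent. The signature matrix, the cell-type labels, the solver, and the iteration count are all assumptions chosen for a small self-contained example, and this is not the algorithm of any specific tool.

```python
# Hypothetical signature matrix: rows are marker genes, columns are the
# reference expression of three cell types (T cell, B cell, macrophage).
SIGNATURE = [
    [10.0, 1.0, 0.0],
    [8.0, 0.0, 1.0],
    [1.0, 9.0, 0.0],
    [0.0, 7.0, 2.0],
    [1.0, 0.0, 12.0],
]

# Bulk profile simulated as 60% T cell, 30% B cell, 10% macrophage.
MIXTURE = [6.3, 4.9, 3.3, 2.3, 1.8]


def deconvolve(signature, mixture, iterations=2000):
    """Estimate non-negative cell fractions that best explain the mixture."""
    n_genes = len(signature)
    n_types = len(signature[0])
    # A safe step size: the squared Frobenius norm bounds the largest
    # eigenvalue of signature^T * signature.
    step = 1.0 / sum(v * v for row in signature for v in row)
    fractions = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [
            sum(signature[g][k] * fractions[k] for k in range(n_types))
            - mixture[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto non-negative values.
        fractions = [
            max(0.0, f - step * d) for f, d in zip(fractions, gradient)
        ]
    total = sum(fractions)
    return [f / total for f in fractions] if total else fractions


estimated = deconvolve(SIGNATURE, MIXTURE)
print([round(f, 3) for f in estimated])
```

Real bulk data are noisy and signature genes overlap between cell types, so published tools add steps such as signature gene selection, normalization, and batch correction that are not shown here.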

TIMER2.0[45] uses six different computational methods to estimate immune cell infiltration in TCGA tumors or user-supplied transcriptome data. It also helps to discover the relationship between immune infiltration, gene expression, mutation, and survival characteristics in the TCGA cohort. xCell[46] is a web tool that performs cell type enrichment analysis from gene expression data for 64 immune and stroma cell types. Tracking Tumor Immunophenotype (TIP)[47] is an open access, user-friendly web-based tool for one-stop, comprehensive assessment of tumor immunophenotypes, including activity status and immune cell infiltration in the seven-step cancer immune cycle. ESTIMATE[48] provides researchers with scores for tumor purity, the level of stromal cells present, and the infiltration level of immune cells in tumor tissues based on expression data.

pVAC-seq[49] is the first software to streamline tumor neoantigen prediction, focusing primarily on point mutations. INTEGRATE-Neo[50] extends the functionality of the high-precision gene fusion discovery tool INTEGRATE to identify gene fusion neoantigens using next-generation sequencing (NGS) data. Tumor-specific neoantigen detection (TSNAD)[51] supports the analysis of whole genome/exome and transcriptome data, identifying tumor neoantigens from different mutation sources such as point mutations, insertions and deletions, and fusion genes. CloudNeo[52] is the first tool to provide a cloud platform for tumor neoantigen prediction. ScanNeo[53] provides insertion and deletion (INDEL)-derived tumor neoantigen predictions by using RNA-Seq data. Alternative Splicing NEOantigens (ASNEO)[54] provides tumor neoantigen prediction from alternative splicing. NetMHCpan[55] was the earliest developed and is the most widely used tool for predicting peptide-MHC binding. Many comprehensive tumor neoantigen prediction tools rely on NetMHCpan to predict interactions between peptides and major histocompatibility complex (MHC) types. DeepHLApan[56] incorporates peptide and MHC interactions with the immunogenicity of peptide-HLA complex (pHLA), using deep learning to construct its prediction algorithm.
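A common first step in point-mutation neoantigen prediction is to enumerate every short peptide that spans the mutated residue, so that each candidate can then be scored against the patient's HLA alleles by a binding predictor. The sketch below shows only that enumeration step. The protein sequence, the mutation, the 8 to 11 residue length range, and the function name are hypothetical choices for illustration; no binding prediction is performed and no tool's actual pipeline is reproduced.

```python
def mutant_peptides(protein, position, mutant_residue, lengths=(8, 9, 10, 11)):
    """List all peptides of the given lengths that contain the mutation.

    position is the 0-based index of the substituted residue.
    """
    mutated = protein[:position] + mutant_residue + protein[position + 1:]
    peptides = set()
    for k in lengths:
        # Window starts that keep the mutated residue inside the peptide
        # and the peptide inside the protein.
        first = max(0, position - k + 1)
        last = min(position, len(mutated) - k)
        for start in range(first, last + 1):
            peptides.add(mutated[start:start + k])
    return sorted(peptides)


# Hypothetical 22-residue protein with a Q-to-L substitution at index 10.
PROTEIN = "MKTAYIAKQRQISFVKSHFSRQ"
candidates = mutant_peptides(PROTEIN, 10, "L")
nonamers = mutant_peptides(PROTEIN, 10, "L", lengths=(9,))
print(len(candidates), nonamers[:3])
```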

Immune Repertoire Analysis

Exploring the B-cell and T-cell receptor (BCR and TCR) repertoire is critical for understanding adaptive immune responses and immunotherapy. To handle the enormous amounts of repertoire sequencing data, several bioinformatics resources have been designed to identify the recombined V/D/J genes, annotate the complementarity-determining regions (CDRs), and annotate the whole immune repertoire [Table 5].

Table 5.

Summary of immune repertoire analysis tools and databases.

Name | Description | Link
MiXCR | A universal framework that processes big immunome data from raw sequences to quantitated clonotypes | http://mixcr.milaboratory.com
CATT | An ultra-sensitive and accurate tool for characterizing TCR sequence in bulk and single cell TCR-Seq and RNA-Seq data | https://guolab.wchscu.cn/CATT
Immunarch | Fast and seamless exploration of single-cell and bulk T-cell/antibody immune repertoires in R | https://immunarch.com
VDJtools | Java/Groovy-based framework to facilitate analysis of immune repertoire sequencing data | https://github.com/mikessh/vdjtools
TCRdist | Python API-enabled toolkit for analyzing TCR repertoires with flexible distance measures | https://github.com/kmayerb/tcrdist3
TRUST4.0 | Analyze TCR and BCR sequences using unselected RNA sequencing data | https://github.com/liulab-dfci/TRUST4
IGoR | Quantitatively characterizes the statistics of TCR/BCR generation from both cDNA and gDNA | https://github.com/qmarcou/IGoR
VisTCR | An interactive visualization of high-throughput TCR sequencing data | https://github.com/qingshanni/VisTCR
TCRosetta | A powerful server for analyzing and annotating the TCR repertoire | https://guolab.wchscu.cn/TCRosetta
TCRdb | The largest TCR sequence database | https://guolab.wchscu.cn/TCRdb2

BCR: B-cell receptor; DNA: Deoxyribonucleic acid; RNA: Ribonucleic acid; TCR: T-cell receptor.

MiXCR[57] is software for immune profiling that takes raw TCR or RNA sequencing data through to receptor sequence identification and works with any wet laboratory protocol and data type. CharActerizing TCR repertoires (CATT)[58] is a tool for characterizing TCR sequences in bulk and single cell TCR-Seq and RNA-Seq data. CATT showed superior recall and precision, especially for short read length, small size, and single-cell sequencing data. Immunarch[59] is an R package designed to analyze TCR and BCR repertoires. Given its ability to work with any type of data, user-friendly coding, and automatic format detection, this tool has a wide range of applications. VDJtools[60] is designed for analysis of immune repertoire sequencing data and can perform various forms of cross-sample analysis through a simple command-line tool. Tcrdist3[61] is a Python Application Programming Interface (API)-enabled toolkit for analyzing TCR repertoires and forming groups of biochemically similar TCRs that can be used to robustly quantify functionally similar TCRs. TRUST4.0[62] is an open-source algorithm for the reconstruction of immune receptor repertoires in αβ/γδ T cells and B cells from RNA-seq data; it is faster and more sensitive when assembling longer receptor repertoires. Inference and Generation of Repertoires (IGoR)[63] is a comprehensive tool that processes B or T cell receptor sequence reads and quantitatively characterizes the statistics of receptor generation from both complementary DNA (cDNA) and genomic DNA (gDNA).

Since there are thousands of TCRs in a sample, it is difficult to analyze the features of the TCR repertoire; a few web-based tools have been designed with user-friendly interfaces. Visualization of high-throughput TCR sequencing (VisTCR)[64] provides an interactive visualization of high-throughput TCR sequencing data, while also incorporating a friendly graphical user interface and a flexible workflow for data analysis. TCRosetta[65] is a server for comprehensive TCR repertoire analysis, including TCR repertoire clonality, diversity, V/J gene usage, and disease-associated TCR annotation. Moreover, users can annotate disease condition and cell type information based on antigen-associated TCR sequences. T-Cell Receptor sequence Database (TCRdb2.0)[66] is the largest human TCR sequences database and it includes more than 691 million T-Cell Receptor Beta chain (TRB) sequences from more than 19,700 samples. It uses a uniform pipeline to characterize the TCR sequences, describing TCR diversity, length distribution, and V-J gene utilization, with outputs such as interactive data visualization charts. Its powerful search function allows users to identify their TCR sequences of interest under different conditions.
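Repertoire-level summaries such as diversity and clonality can be computed directly from clonotype counts. The sketch below calculates clonotype richness, Shannon entropy, and a clonality score defined as one minus the normalized entropy, which is a commonly used definition. The CDR3 sequences are hypothetical, the single-clonotype case is assigned a clonality of 1 by assumption, and the tools above may use different or additional metrics.

```python
import math
from collections import Counter


def repertoire_metrics(cdr3_sequences):
    """Summarize a repertoire given one CDR3 sequence per observed read."""
    counts = Counter(cdr3_sequences)
    total = sum(counts.values())
    frequencies = [c / total for c in counts.values()]
    shannon = -sum(p * math.log(p) for p in frequencies)
    richness = len(counts)
    # Clonality = 1 - Pielou's evenness; 0 means perfectly even,
    # values near 1 mean a few clonotypes dominate.
    if richness > 1:
        clonality = 1 - shannon / math.log(richness)
    else:
        clonality = 1.0  # assumption: a single clonotype is fully clonal
    return {"richness": richness, "shannon": shannon, "clonality": clonality}


# Hypothetical repertoires; sequences are illustrative only.
expanded = ["CASSLGQETQYF"] * 6 + ["CASSPDRGYEQYF"] * 3 + ["CASRTGNTEAFF"]
even = ["CASSLGQETQYF", "CASSPDRGYEQYF", "CASRTGNTEAFF", "CASSQDTQYF"]

print(repertoire_metrics(expanded))
print(repertoire_metrics(even))
```

The expanded repertoire yields a higher clonality than the even one, reflecting the dominance of a single clonotype.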

With comprehensive sequencing analysis and superior performance, MiXCR is most commonly used to process TCR and BCR sequences. It focuses on sequence assembly and error correction, and is thus suitable even for sequences with many errors and hypermutations. Immunarch and VDJtools provide global evaluation methods on the TCR/BCR sequencing data, including diversity measurements and clonotype distribution. TRUST4 is faster and more sensitive for assembling longer receptor repertoires. Tcrdist3 can be used to robustly quantify functionally similar TCRs in bulk repertoires across individuals. CATT showed superior recall and precision, especially for short read length, small-size, and single-cell sequencing data.
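The diversity measurements mentioned above can be illustrated with a short Python sketch. It computes the Shannon entropy of clonotype frequencies and a normalized clonality score (1 minus entropy divided by its maximum), two metrics commonly used for repertoires. This is a generic calculation, not the implementation used by Immunarch or VDJtools, and the function names and example counts are assumptions.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of clonotype read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def clonality(counts):
    """Clonality = 1 - H / ln(N); 0 for an even repertoire, near 1 for an oligoclonal one."""
    n_clonotypes = sum(1 for c in counts if c > 0)
    if n_clonotypes <= 1:
        return 1.0  # a single clonotype is maximally clonal by convention
    return 1.0 - shannon_entropy(counts) / math.log(n_clonotypes)


# Hypothetical read counts per clonotype.
even_repertoire = [10, 10, 10, 10]
skewed_repertoire = [97, 1, 1, 1]

even_score = clonality(even_repertoire)
skewed_score = clonality(skewed_repertoire)
```

A repertoire dominated by one expanded clone yields a score close to 1, whereas evenly distributed clonotypes yield a score close to 0.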

Resources for Cancer Driver Gene and Somatic Mutation Analysis

To date, significant progress has been made in understanding the mutations and abnormal genes involved in human cancers. For example, genome-wide association studies (GWAS) have provided insights into the genetic architecture of cancer. However, the systematic identification of genes that harbor driver mutations and mediate tumor physiological effects from large-scale genomic data continues to be a great challenge. Here we introduce databases and tools devoted to identifying cancer somatic mutations, GWAS cancer risk variants, and driver mutations [Table 6].

Table 6.

Summary of online resources and tools for identifying cancer driver genes and somatic mutations.

Name | Description | Link
GATK-Mutect2 | Genome Analysis Toolkit Mutect2 | https://gatk.broadinstitute.org/hc/en-us
VarScan | Variant detection in massively parallel sequencing data | https://varscan.sourceforge.net/
NeuSomatic | Deep CNNs for accurate somatic mutation detection | https://github.com/bioinform/neusomatic
SyRI | Synteny and Rearrangement Identifier | https://github.com/schneebergerlab/syri
TSGene | A web resource for tumor suppressor genes | http://bioinfo.mc.vanderbilt.edu/TSGene/
ONGene | A literature-based database for human oncogenes | http://ongene.bioinfo-minzhao.org/
GWAS Catalog | A manually curated resource of all published GWAS and association results | https://www.ebi.ac.uk/gwas
GRASP | Genome-Wide Repository of Associations Between SNPs and Phenotypes | https://grasp.nhlbi.nih.gov/Overview.aspx
PancanQTL | A user-friendly database to store cis/trans-eQTLs and GWAS-related eQTLs in cancers | http://gong_lab.hzau.edu.cn/PancanQTL/
ABC-GWAS | Analysis of Breast Cancer GWAS | http://education.knoweng.org/abc-gwas/
DriverDBv4 | A database for human cancer driver gene research | http://driverdb.bioinfomics.org/
NCG | Network of cancer genes | http://ncg.kcl.ac.uk/
OncoVar | An integrated database and analysis platform for oncogenic driver variants in cancers | https://oncovar.org/
CNCDatabase | Cornell Non-coding Cancer driver Database | https://cncdatabase.med.cornell.edu
IntOGen | Integrative oncogenomics | https://www.intogen.org
DriverML | Integrating Rao’s score test and supervised machine learning to identify cancer driver genes | https://github.com/HelloYiHan/DriverML
OncodriveCLUST | Identify genes with a significant bias toward mutation clustering within the protein sequence | http://bg.upf.edu/oncodriveclust
MuSiC | A pipeline for determining the mutational significance in cancer |
MutSigCV | An integrative approach that corrects for variants using patient-specific mutation frequency and spectrum, and gene-specific background mutation model | http://bg.upf.edu/oncodrive
OncodriveFM | An approach based on functional impact bias using three well-known methods | https://doi.org/10.5281/zenodo.61372
ContrastRank | A method based on estimating the putative defective rate of each gene in tumor against normal and samples from the 1000 Genomes Project data |
PARADIGM | A novel method for detecting consistent pathways in cancers by incorporating patient-specific genetic data into carefully curated NCI pathways | http://sbenz.github.com/Paradigm
Helios | An algorithm that predicts SMGs by integrating genomic and functional RNAi screening data from primary tumors |

CNNs: Convolutional Neural Networks; DB: Database; eQTLs: Expression quantitative trait loci; GWAS: Genome-wide association studies; NCI: National Cancer Institute; SNPs: Single nucleotide polymorphisms; SMGs: Significantly mutated genes.

Numerous computational tools are available for identifying somatic mutations in cancer. Genome Analysis ToolKit (GATK) Mutect2[67] is highly recognized for detecting somatic short variants, particularly single nucleotide variants (SNVs) and indels, from tumor sequencing data, optionally with a matched normal sample. It leverages the assembly-based capabilities of HaplotypeCaller to achieve industry-standard performance. VarScan[68] is another versatile tool for variant detection in next-generation sequencing (NGS) data, comparing genomic loci between tumor and normal samples to identify enriched suballeles. Unlike probabilistic frameworks, VarScan employs a robust heuristic method for variant calling. NeuSomatic[69] was the first somatic mutation caller to utilize Convolutional Neural Networks (CNNs); it extracts important mutation signals from raw data and achieves high accuracy across various sequencing technologies, sample purities, and sequencing strategies. SyRI[70] performs pairwise whole-genome comparisons at the chromosome level, using whole-genome alignments (WGAs) as input. It distinguishes rearranged regions from non-rearranged (syntenic) regions and offers comprehensive regional annotations, including transpositions and duplications.
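To make the idea of heuristic tumor-normal comparison concrete, the sketch below classifies a single site from read counts, using simple allele-frequency thresholds followed by a one-sided Fisher's exact test. It is a simplified illustration of the general approach and is not VarScan's implementation; the function names, thresholds (`min_tumor_vaf`, `max_normal_vaf`, `alpha`), and category labels are assumptions chosen for the example.

```python
from math import comb


def fisher_one_sided(tumor_ref, tumor_var, normal_ref, normal_var):
    """One-sided Fisher's exact test: probability of seeing at least
    `tumor_var` variant reads in the tumor if variant reads were
    distributed between tumor and normal by chance."""
    n_tumor = tumor_ref + tumor_var
    n_var = tumor_var + normal_var
    n_total = n_tumor + normal_ref + normal_var
    denom = comb(n_total, n_tumor)
    p_value = 0.0
    for k in range(tumor_var, min(n_tumor, n_var) + 1):
        p_value += comb(n_var, k) * comb(n_total - n_var, n_tumor - k) / denom
    return p_value


def classify_site(tumor_ref, tumor_var, normal_ref, normal_var,
                  min_tumor_vaf=0.10, max_normal_vaf=0.02, alpha=0.05):
    """Label a site using illustrative (assumed) thresholds."""
    tumor_depth = tumor_ref + tumor_var
    normal_depth = normal_ref + normal_var
    if tumor_depth == 0 or normal_depth == 0:
        return "no_call"
    if tumor_var / tumor_depth < min_tumor_vaf:
        return "reference"
    if normal_var / normal_depth > max_normal_vaf:
        return "germline_or_artifact"
    p_value = fisher_one_sided(tumor_ref, tumor_var, normal_ref, normal_var)
    return "somatic" if p_value < alpha else "uncertain"


# Hypothetical read counts: (tumor_ref, tumor_var, normal_ref, normal_var).
somatic_like = classify_site(30, 20, 50, 0)
germline_like = classify_site(25, 25, 25, 25)
```

Real callers add many further filters (base and mapping quality, strand bias, tumor purity), which this sketch deliberately omits.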

The GWAS Catalog[71] is the largest public resource for GWAS data, featuring significant variant-trait associations and metadata from over 5000 human traits, including 14,017 cancer risk variants from 1344 studies. Users can explore SNP-cancer associations through its search interface. The Genome-Wide Repository of Associations Between SNPs and Phenotypes (GRASP)[72] database builds on the GWAS Catalog by re-annotating results using sources such as RNA editing sites, lincRNAs, and post-translational modifications (PTMs), offering tens of thousands of cancer risk variants and related publications. PancanQTL[73] lists cis- and trans-expression quantitative trait loci (eQTLs) across 33 cancer types, including GWAS-related eQTLs, allowing users to search, browse, and download eQTLs that overlap GWAS risk loci. Analysis of Breast Cancer GWAS (ABC-GWAS)[74] annotates estrogen receptor-positive breast cancer GWAS variants, identifying functions for 2813 single nucleotide variants in 93 genomic loci via eQTLs, chromatin interactions, and TF binding motifs.

TSGene[75] is a web resource for tumor suppressor genes, containing 716 human, 628 mouse, and 567 rat tumor suppressor genes. TSGene offers detailed annotations, including information on gene expression, mutation data, and literature references. The ONGene[76] database includes 803 human oncogenes collated from thousands of research papers, providing a resource for exploring oncogenes’ mutation patterns and cross-cancer comparisons.

DriverDBv4[77] is a cancer omics database that incorporates somatic mutation, RNA expression, miRNA expression, protein expression, methylation, Copy-Number Variation (CNV), and clinical data in addition to annotation databases. The Network of Cancer Genes (NCG)[78] is a manually curated collection of cancer genes, healthy drivers, and their properties, including 711 known tumor driver genes and 1661 candidate driver genes. OncoVar[79] employs published bioinformatics algorithms and incorporates known driver events to identify driver mutations and driver genes. The CNCDatabase[80] documents 1111 protein-coding genes and 90 ncRNAs with reported drivers in their non-coding regions from 32 cancer types, based on computational predictions of positive selection using whole-genome sequences and on differential gene expression in samples with and without mutations.

Integrative OncoGenomics (IntOGen)[81] is a pipeline for obtaining a compendium of mutational cancer drivers. It was applied to somatic mutations in more than 28,000 tumors across 66 cancer types, revealing 568 cancer genes and pointing out their tumorigenesis mechanisms. DriverML[82] combines Rao’s score test with a supervised machine learning approach to prioritize cancer driver genes. OncodriveCLUSTL[83] is a linear clustering algorithm for detecting significant clustering signals across genomic regions. The method is complementary to other driver detection methods for coding sequences and can also be applied to mutation cluster detection in non-coding regions and non-human data. MuSiC[84] uses statistical methods to separate important events that may be drivers of disease from passenger mutations and provides unique practical advantages over existing software. MutSigCV[85] is a tool for prioritizing driver genes, using gene expression and replication timing information to build a patient-specific background mutation model.
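The principle shared by frequency-based approaches, testing whether a gene is mutated more often than a background model predicts, can be sketched with a simple binomial test. The example assumes a single constant background mutation rate per base, a deliberate simplification that tools such as MutSigCV replace with patient- and gene-specific models. The function names, the rate of 1e-6, and the cohort numbers are assumptions for illustration, not values from any of the tools above.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    The probability mass is computed in log space to stay numerically stable.
    """
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log1p(-p))
    return min(1.0, sum(exp(log_pmf(i)) for i in range(k, n + 1)))


def gene_p_value(mutated_samples, total_samples, gene_length, background_rate):
    """P-value for observing at least `mutated_samples` mutated tumors,
    assuming every base mutates independently at `background_rate`."""
    # Probability that a gene carries at least one mutation in one tumor.
    p_gene = 1.0 - (1.0 - background_rate) ** gene_length
    return binomial_tail(mutated_samples, total_samples, p_gene)


# Hypothetical cohort: 200 tumors, a 3000-bp gene, background rate 1e-6 per base.
p_recurrent = gene_p_value(12, 200, 3000, 1e-6)  # mutated in 12 tumors
p_sporadic = gene_p_value(1, 200, 3000, 1e-6)    # mutated in 1 tumor
```

Under these assumed numbers, the gene mutated in 12 tumors is far beyond the background expectation, while the gene mutated once is not.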

OncodriveFM[86] is a method for detecting candidate cancer drivers that does not depend on recurrence. Users can calculate functional mutation (FM) bias for genes and pathways in their own somatic mutation dataset even if only a few cancer samples are sequenced. ContrastRank,[87] a method for prioritizing putative impaired genes in cancer, is based on the comparison of exome sequencing data from different cohorts and can detect putative cancer driver genes. PARADIGM[88] is a method for inferring patient-specific genetic activities by incorporating curated pathway interactions among genes. The method predicts the degree to which a pathway’s activities are altered in the patient using probabilistic inference. Helios[89] is an algorithm that integrates genomic data from primary tumors with data from functional RNA interference (RNAi) screening to identify driver genes in large regions of DNA that are repeatedly amplified. Applying Helios to breast cancer data identified a set of candidate drivers that are highly enriched with known drivers.

However, traditional methods based on mutation frequency often face limitations due to tumor heterogeneity and other factors. Under a constant background mutation model, low mutation frequencies can lead to false positive results. To address this issue, several complementary methods have been proposed. A representative tool for detecting genes whose mutations are significantly clustered in specific regions of the amino acid sequence is OncodriveCLUST. It primarily uses silent mutations in coding regions as background. However, silent mutations may play an important functional role in cancer. In addition, background models based on silent mutations are not effective in assessing constraints in certain genomic regions because of the low recurrence rate of synonymous mutations. MutSigCV[85] is a popular driver gene prioritization tool that uses gene expression and replication timing information to construct patient-specific background mutation models. There are also algorithms based on gene function: using the protein information corresponding to the mutated genes, they build a model that evaluates how deleterious each mutation is and identify the genes with the most deleterious mutations as driver genes, as in OncodriveFM. Other approaches are network- or pathway-based and evaluate the effects of tumor mutations in the context of networks and pathways, as in PARADIGM. Integrating somatic mutation, structural variation, gene expression, and methylation profiles is also important, as tools such as Helios do. Empirically, tools built on one principle can identify driver genes that other detection methods miss, so we recommend combining tools based on different principles for integrative detection of tumor driver genes.
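One simple way to combine tools built on different principles is to count how many methods report each gene, while keeping track of genes reported by only one method. The sketch below shows this voting scheme in Python. It is one possible strategy, not a method described by any of the tools above, and the tool labels and gene lists are hypothetical inputs.

```python
from collections import Counter


def consensus_drivers(predictions, min_support=2):
    """Return (gene, n_tools) pairs for genes reported by at least
    `min_support` tools, sorted by support and then by gene name."""
    votes = Counter()
    for genes in predictions.values():
        votes.update(set(genes))  # each tool votes once per gene
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, n) for gene, n in ranked if n >= min_support]


def tool_specific_calls(predictions):
    """Return, for each tool, the genes that no other tool reported."""
    votes = Counter(g for genes in predictions.values() for g in set(genes))
    return {tool: sorted(g for g in set(genes) if votes[g] == 1)
            for tool, genes in predictions.items()}


# Hypothetical outputs from three tools based on different principles.
predictions = {
    "frequency_based": ["TP53", "KRAS", "PIK3CA"],
    "clustering_based": ["KRAS", "PIK3CA", "BRAF"],
    "function_based": ["TP53", "PIK3CA", "PTEN"],
}
consensus = consensus_drivers(predictions)
unique_calls = tool_specific_calls(predictions)
```

Genes supported by several tools are higher-confidence candidates, whereas the tool-specific calls are the ones a single-method analysis would have missed and may merit separate follow-up.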

Cancer Single-cell Database and Analysis Resources

The explosion of massive cancer single-cell RNA-seq (scRNA-seq) datasets in the past few years has led to the need for integration. Here we summarize various databases and tools for cancer scRNA-seq data [Table 7].

Table 7.

Summary of cancer single-cell databases and analysis resources.

Name | Description | Link
HCA | Human cell atlas | https://data.humancellatlas.org
CancerSCEM | Cancer Single-cell Expression Map | https://ngdc.cncb.ac.cn/cancerscem
CancerSEA | Cancer Single-cell State Atlas | http://biocc.hrbmu.edu.cn/CancerSEA
TISCH2 | Tumor Immune Single-cell Hub 2 | http://tisch.comp-genomics.org
IMMUcan | IMMUcan Single-cell RNAseq Database | https://immucanscdb.vital-it.ch
CeDR Atlas | Cellular Drug Response Atlas | https://ngdc.cncb.ac.cn/cedr
scLiverDB | Human and mouse liver transcriptome landscapes at single-cell resolution | https://guolab.wchscu.cn/liverdb
Cell BLAST | A cell-querying tool effectively handling batch effect | https://cblast.gao-lab.org/

RNA: Ribonucleic acid.

The Human Cell Atlas (HCA)[90] is a large, authoritative source of scRNA-seq data across cell types and cell states at different life stages to create cellular reference maps. CancerSCEM[91] is a user-friendly web interface for browsing, searching, and online analysis of human cancer scRNA-seq data. To date, CancerSCEM consists of 208 cancer samples across 20 human cancer types with multiscale analyses, including accurate cell type annotation and functional gene expression. CancerSEA[92] is the first dedicated database that aims to explore distinct functional states of cancer cells at the single-cell level, involving 14 functional states of 41,900 cancer single cells from 25 cancer types. It allows users to query which functional states are associated with the genes of interest in different cancers. Tumor Immune Single-cell Hub 2 (TISCH2)[93] is a resource of scRNA-seq data from human and mouse tumors including 190 tumor scRNA-seq datasets covering 6 million cells in 50 cancer types. TISCH2 enables comprehensive characterization of gene expression in the TME across multiple cancer types. Integrated iMMUnoprofiling of large adaptive CANcer patient cohorts (IMMUcan)[94] is a fully integrated scRNA-seq database for human cancer, which is used to explore datasets across tumor locations in a gene/cell-centric manner, and ranks immune cell types and genes correlated to malignant transformation.

The Cellular Drug Response (CeDR) Atlas[95] is a knowledge base reporting computational inference of cellular drug response for hundreds of cell types, with results for more than 582 single-cell datasets from human and mouse. scLiverDB[96] is a specialized database for human and mouse liver transcriptomes to unravel the landscape of liver cell types, cell heterogeneity, and gene expression at single-cell resolution. Cell BLAST[97] is an approach to single-cell transcriptome data retrieval and annotation built on a neural network-based generative model and a customized cell-to-cell similarity metric. Cell BLAST is notably effective at annotating discrete cell types and continuous cell differentiation potential, as well as at identifying novel cell types.
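The general idea of querying a reference by cell-to-cell similarity can be sketched as follows. This example uses plain cosine similarity on raw expression vectors as a stand-in; it is not Cell BLAST's neural network-based generative model or its customized metric. The function names, the threshold `min_similarity`, and the four-gene toy profiles are assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two expression vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def annotate_cell(query, reference, min_similarity=0.8):
    """Assign the label of the most similar reference profile, or
    'unassigned' when no reference is similar enough (a possible novel type)."""
    best_label, best_score = None, -1.0
    for label, profile in reference:
        score = cosine_similarity(query, profile)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return "unassigned", best_score
    return best_label, best_score


# Hypothetical reference profiles over four marker genes.
reference = [
    ("T cell", [9, 1, 0, 0]),
    ("B cell", [0, 8, 1, 0]),
    ("Hepatocyte", [0, 0, 1, 9]),
]
known_label, known_score = annotate_cell([8, 2, 0, 0], reference)
novel_label, novel_score = annotate_cell([3, 3, 3, 3], reference)
```

Rejecting queries below a similarity threshold is what allows a cell that matches no reference profile to be flagged rather than forced into an existing label.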

Conclusion and Perspectives

In the past decade, as sequencing technologies have advanced significantly, there has been a substantial increase in the amount of omics data generated by the cancer research community. To interpret these data, effective collection and comprehensive analysis are required. This review introduces a range of important databases that store large amounts of cancer omics data, as well as online analysis platforms and tools for integrating, mining, and visualizing cancer data. These resources benefit cancer researchers in exploring the molecular mechanisms, personalized treatment, and prognosis of cancer.

With the development of bioinformatics methods, the integration and analysis of multi-omics data, as well as the integration of omics data with other data modalities such as histopathological data,[98] will deepen our comprehensive understanding of cancer. The development of new methods to extract more detailed clinical information from electronic health records will also contribute to the interpretation of omics data in disease contexts. Additionally, user-friendly interfaces and online platforms facilitate convenient access and analysis of data for individuals without expertise in bioinformatics. Limited access to data due to incomplete data annotations by submitting researchers often impedes data utilization.[99] Therefore, strengthening the standardization of data annotation, establishing efficient data sharing strategies, and developing secure and convenient data sharing platforms are crucial for enhancing the accessibility and comparability of data. Continuous updates and expansions of existing databases and atlases are essential to encompass a broader range of tumor types, cell types, and data types. Innovative methods are needed to analyze and interpret single-cell sequencing data accurately, identify cell types and states, and elucidate the complex interactions between different cell types in the TME. Finally, this review has some limitations. Its coverage may be limited due to the ongoing development of new tools and databases. Furthermore, a detailed description of the specific usage methods of each database and bioinformatics tool is lacking, which may limit researchers’ comprehensive understanding of these tools.

Acknowledgments

The authors thank all those who contributed to this work.

Funding

This study was supported by a grant of the 1.3.5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (No. ZYYC23007).

Conflicts of interest

None.

Footnotes

Jin Huang and Lingzi Mao contributed equally to this work.

How to cite this article: Huang J, Mao LZ, Lei Q, Guo AY. Bioinformatics tools and resources for cancer and application. Chin Med J 2024;137:2052–2064. doi: 10.1097/CM9.0000000000003254

References

  • 1. Hanahan D. Hallmarks of cancer: New dimensions. Cancer Discov 2022;12:31–46. doi: 10.1158/2159-8290.Cd-21-1059.
  • 2. Connor AA, Gallinger S. Pancreatic cancer evolution and heterogeneity: Integrating omics and clinical data. Nat Rev Cancer 2022;22:131–142. doi: 10.1038/s41568-021-00418-1.
  • 3. Liu CJ, Hu FF, Xie GY, Miao YR, Li XW, Zeng Y, et al. GSCA: An integrated platform for gene set cancer analysis at genomic, pharmacogenomic and immunogenomic levels. Brief Bioinform 2023;24:bbac558. doi: 10.1093/bib/bbac558.
  • 4. Jiménez-Santos MJ, García-Martín S, Fustero-Torre C, Di Domenico T, Gómez-López G, Al-Shahrour F. Bioinformatics roadmap for therapy selection in cancer genomics. Mol Oncol 2022;16:3881–3908. doi: 10.1002/1878-0261.13286.
  • 5. Cancer Genome Atlas Research Network; Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The cancer genome atlas pan-cancer analysis project. Nat Genet 2013;45:1113–1120. doi: 10.1038/ng.2764.
  • 6. Zhang J, Bajari R, Andric D, Gerthoffert F, Lepsa A, Nahal-Bose H, et al. The international cancer genome consortium data portal. Nat Biotechnol 2019;37:367–369. doi: 10.1038/s41587-019-0055-9.
  • 7. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: The catalogue of somatic mutations in cancer. Nucleic Acids Res 2019;47(D1):D941–D947. doi: 10.1093/nar/gky1015.
  • 8. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio cancer genomics portal: An open platform for exploring multidimensional cancer genomics data. Cancer Discov 2012;2:401–404. doi: 10.1158/2159-8290.Cd-12-0095.
  • 9. Goldman MJ, Craft B, Hastie M, Repečka K, McDade F, Kamath A, et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol 2020;38:675–678. doi: 10.1038/s41587-020-0546-8.
  • 10. Guan X, Cai M, Du Y, Yang E, Ji J, Wu J. CVCDAP: An integrated platform for molecular and clinical analysis of cancer virtual cohorts. Nucleic Acids Res 2020;48(W1):W463–W471. doi: 10.1093/nar/gkaa423.
  • 11. Tang Z, Kang B, Li C, Chen T, Zhang Z. GEPIA2: An enhanced web server for large-scale expression profiling and interactive analysis. Nucleic Acids Res 2019;47(W1):W556–W560. doi: 10.1093/nar/gkz430.
  • 12. Edwards NJ, Oberti M, Thangudu RR, Cai S, McGarvey PB, Jacob S, et al. The CPTAC data portal: A resource for cancer proteomics research. J Proteome Res 2015;14:2707–2713. doi: 10.1021/pr501254j.
  • 13. Chandrashekar DS, Karthikeyan SK, Korla PK, Patel H, Shovon AR, Athar M, et al. UALCAN: An update to the integrated cancer data analysis platform. Neoplasia 2022;25:18–27. doi: 10.1016/j.neo.2022.01.001.
  • 14. Li J, Lu Y, Akbani R, Ju Z, Roebuck PL, Liu W, et al. TCPA: A resource for cancer functional proteomics data. Nat Methods 2013;10:1046–1047. doi: 10.1038/nmeth.2650.
  • 15. Lv D, Li D, Cai Y, Guo J, Chu S, Yu J, et al. CancerProteome: A resource to functionally decipher the proteome landscape in cancer. Nucleic Acids Res 2024;52(D1):D1155–D1162. doi: 10.1093/nar/gkad824.
  • 16. Liu CJ, Hu FF, Xia MX, Han L, Zhang Q, Guo AY. GSCALite: A web server for gene set cancer analysis. Bioinformatics 2018;34:3771–3772. doi: 10.1093/bioinformatics/bty411.
  • 17. Shen WK, Chen SY, Gan ZQ, Zhang YZ, Yue T, Chen MM, et al. AnimalTFDB 4.0: A comprehensive animal transcription factor database updated with variation and expression annotations. Nucleic Acids Res 2023;51(D1):D39–D45. doi: 10.1093/nar/gkac907.
  • 18. Castro-Mondragon JA, Riudavets-Puig R, Rauluseviciute I, Lemma RB, Turchi L, Blanc-Mathieu R, et al. JASPAR 2022: The 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2022;50(D1):D165–D173. doi: 10.1093/nar/gkab1113.
  • 19. Zhang Q, Liu W, Zhang HM, Xie GY, Miao YR, Xia M, et al. hTFtarget: A comprehensive database for regulations of human transcription factors and their targets. Genomics Proteomics Bioinformatics 2020;18:120–128. doi: 10.1016/j.gpb.2019.09.006.
  • 20. Zheng R, Wan C, Mei S, Qin Q, Wu Q, Sun H, et al. Cistrome data browser: Expanded datasets and new tools for gene regulatory analysis. Nucleic Acids Res 2019;47(D1):D729–D735. doi: 10.1093/nar/gky1094.
  • 21. Liu CJ, Fu X, Xia M, Zhang Q, Gu Z, Guo AY. miRNASNP-v3: A comprehensive database for SNPs and disease-related variations in miRNAs and miRNA targets. Nucleic Acids Res 2021;49(D1):D1276–D1281. doi: 10.1093/nar/gkaa783.
  • 22. Xie GY, Liu CJ, Miao YR, Xia M, Zhang Q, Guo AY. A comprehensive platelet expression atlas (PEA) resource and platelet transcriptome landscape. Am J Hematol 2021;97:E18–E21. doi: 10.1002/ajh.26393.
  • 23. Huang J, Zheng W, Zhang P, Lin Q, Chen Z, Xuan J, et al. ChIPBase v3.0: The encyclopedia of transcriptional regulations of non-coding RNAs and protein-coding genes. Nucleic Acids Res 2023;51(D1):D46–D56. doi: 10.1093/nar/gkac1067.
  • 24. Han H, Cho JW, Lee S, Yun A, Kim H, Bae D, et al. TRRUST v2: An expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res 2018;46(D1):D380–D386. doi: 10.1093/nar/gkx1013.
  • 25. Liu CJ, Xie GY, Miao YR, Xia M, Wang Y, Lei Q, et al. EVAtlas: A comprehensive database for ncRNA expression in human extracellular vesicles. Nucleic Acids Res 2022;50(D1):D111–D117. doi: 10.1093/nar/gkab668.
  • 26. Liu T, Zhang Q, Zhang J, Li C, Miao YR, Lei Q, et al. EVmiRNA: A database of miRNA profiling in extracellular vesicles. Nucleic Acids Res 2019;47(D1):D89–D93. doi: 10.1093/nar/gky985.
  • 27. Liu X, Yu X, Zack DJ, Zhu H, Qian J. TiGER: A database for tissue-specific gene expression and regulation. BMC Bioinformatics 2008;9:271. doi: 10.1186/1471-2105-9-271.
  • 28. Xie GY, Xia M, Miao YR, Luo M, Zhang Q, Guo AY. FFLtool: A web server for transcription factor and miRNA feed forward loop analysis in human. Bioinformatics 2020;36:2605–2607. doi: 10.1093/bioinformatics/btz929.
  • 29. Ben Guebila M, Lopes-Ramos CM, Weighill D, Sonawane AR, Burkholz R, Shamsaei B, et al. GRAND: A database of gene regulatory network models across human conditions. Nucleic Acids Res 2022;50(D1):D610–D621. doi: 10.1093/nar/gkab778.
  • 30. Fang L, Li Y, Ma L, Xu Q, Tan F, Chen G. GRNdb: Decoding the gene regulatory networks in diverse human and mouse conditions. Nucleic Acids Res 2021;49(D1):D97–D103. doi: 10.1093/nar/gkaa995.
  • 31. Zhang M, Li Q, Yu D, Yao B, Guo W, Xie Y, et al. GeNeCK: A web server for gene network construction and visualization. BMC Bioinformatics 2019;20:12. doi: 10.1186/s12859-018-2560-0.
  • 32. Gyorffy B, Lánczky A, Szállási Z. Implementing an online tool for genome-wide validation of survival-associated biomarkers in ovarian-cancer using microarray data from 1287 patients. Endocr Relat Cancer 2012;19:197–208. doi: 10.1530/erc-11-0329.
  • 33. Anaya J. OncoLnc: Linking TCGA survival data to mRNAs, miRNAs, and lncRNAs. PeerJ Comp Sci 2016;2:e67. doi: 10.7287/PEERJ.PREPRINTS.1780.
  • 34. Modhukur V, Iljasenko T, Metsalu T, Lokk K, Laisk-Podar T, Vilo J. MethSurv: A web tool to perform multivariable survival analysis using DNA methylation data. Epigenomics 2018;10:277–288. doi: 10.2217/epi-2017-0118.
  • 35. Xia Y, Gao Y, Liu MY, Li L, Pan W, Mao LZ, et al. ICBcomb: A comprehensive expression database for immune checkpoint blockade combination therapy. Brief Bioinform 2023;25:bbad457. doi: 10.1093/bib/bbad457.
  • 36. Yang M, Miao YR, Xie GY, Luo M, Hu H, Kwok HF, et al. ICBatlas: A comprehensive resource for depicting immune checkpoint blockade therapy characteristics from transcriptome profiles. Cancer Immunol Res 2022;10:1398–1406. doi: 10.1158/2326-6066.Cir-22-0249.
  • 37. Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, et al. Genomics of drug sensitivity in cancer (GDSC): A resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res 2013;41(Database issue):D955–D961. doi: 10.1093/nar/gks1111.
  • 38. Kumar R, Chaudhary K, Gupta S, Singh H, Kumar S, Gautam A, et al. CancerDR: Cancer drug resistance database. Sci Rep 2013;3:1445. doi: 10.1038/srep01445.
  • 39. Miao YR, Zhang Q, Lei Q, Luo M, Xie GY, Wang H, et al. ImmuCellAI: A unique method for comprehensive T-cell subsets abundance prediction and its application in cancer immunotherapy. Adv Sci 2020;7:1902880. doi: 10.1002/advs.201902880.
  • 40. Miao YR, Xia M, Luo M, Luo T, Yang M, Guo AY. ImmuCellAI-mouse: A tool for comprehensive prediction of mouse immune cell abundance and immune microenvironment depiction. Bioinformatics 2022;38:785–791. doi: 10.1093/bioinformatics/btab711.
  • 41. Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 2019;37:773–782. doi: 10.1038/s41587-019-0114-2.
  • 42. Racle J, Gfeller D. EPIC: A tool to estimate the proportions of different cell types from bulk gene expression data. Methods Mol Biol 2020;2120:233–248. doi: 10.1007/978-1-0716-0327-7_17.
  • 43. Becht E, Giraldo NA, Lacroix L, Buttard B, Elarouci N, Petitprez F, et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol 2016;17:218. doi: 10.1186/s13059-016-1070-5.
  • 44. Finotello F, Mayer C, Plattner C, Laschober G, Rieder D, Hackl H, et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Med 2019;11:34. doi: 10.1186/s13073-019-0638-6.
  • 45. Li T, Fu J, Zeng Z, Cohen D, Li J, Chen Q, et al. TIMER2.0 for analysis of tumor-infiltrating immune cells. Nucleic Acids Res 2020;48(W1):W509–W514. doi: 10.1093/nar/gkaa407.
  • 46. Aran D, Hu Z, Butte AJ. xCell: Digitally portraying the tissue cellular heterogeneity landscape. Genome Biol 2017;18:220. doi: 10.1186/s13059-017-1349-1.
  • 47. Xu L, Deng C, Pang B, Zhang X, Liu W, Liao G, et al. TIP: A web server for resolving tumor immunophenotype profiling. Cancer Res 2018;78:6575–6580. doi: 10.1158/0008-5472.Can-18-0689.
  • 48. Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 2013;4:2612. doi: 10.1038/ncomms3612.
  • 49. Hundal J, Carreno BM, Petti AA, Linette GP, Griffith OL, Mardis ER, et al. pVAC-Seq: A genome-guided in silico approach to identifying tumor neoantigens. Genome Med 2016;8:11. doi: 10.1186/s13073-016-0264-5.
  • 50. Zhang J, Mardis ER, Maher CA. INTEGRATE-neo: A pipeline for personalized gene fusion neoantigen discovery. Bioinformatics 2017;33:555–557. doi: 10.1093/bioinformatics/btw674.
  • 51. Zhou Z, Wu J, Ren J, Chen W, Zhao W, Gu X, et al. TSNAD v2.0: A one-stop software solution for tumor-specific neoantigen detection. Comput Struct Biotechnol J 2021;19:4510–4516. doi: 10.1016/j.csbj.2021.08.016.
  • 52. Bais P, Namburi S, Gatti DM, Zhang X, Chuang JH. CloudNeo: A cloud pipeline for identifying patient-specific tumor neoantigens. Bioinformatics 2017;33:3110–3112. doi: 10.1093/bioinformatics/btx375.
  • 53.Wang TY, Wang L, Alam SK, Hoeppner LH, Yang R. ScanNeo: Identifying indel-derived neoantigens using RNA-seq data. Bioinformatics 2019;35:4159–4161. doi: 10.1093/bioinformatics/btz193. [DOI] [PubMed] [Google Scholar]
  • 54.Zhang Z Zhou C Tang L Gong Y Wei Z Zhang G, et al. ASNEO: Identification of personalized alternative splicing based neoantigens with RNA-seq. Aging (Albany NY) 2020;12:14633–14648. doi: 10.18632/aging.103516. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Jurtz V, Paul S, Andreatta M, Marcatili P, Peters B, Nielsen M. NetMHCpan-4.0: Improved peptide-MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J Immunol 2017;199:3360–3368. doi: 10.4049/jimmunol.1700893. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Wu J Wang W Zhang J Zhou B Zhao W Su Z, et al. DeepHLApan: A deep learning approach for neoantigen prediction considering both HLA-peptide binding and immunogenicity. Front Immunol 2019;10:2559. doi: 10.3389/fimmu.2019.02559. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, Putintseva EV, et al. MiXCR: Software for comprehensive adaptive immunity profiling. Nat Methods 2015;12:380–381. doi: 10.1038/nmeth.3364.
  • 58. Chen SY, Liu CJ, Zhang Q, Guo AY. An ultra-sensitive T-cell receptor detection method for TCR-Seq and RNA-Seq data. Bioinformatics 2020;36:4255–4262. doi: 10.1093/bioinformatics/btaa432.
  • 59. Team I. Immunarch: An R package for painless bioinformatics analysis of T-cell and B-cell immune repertoires. Zenodo 2019;10:5281. doi: 10.5281/ZENODO.3367200.
  • 60. Shugay M, Bagaev DV, Turchaninova MA, Bolotin DA, Britanova OV, Putintseva EV, et al. VDJtools: Unifying post-analysis of T cell receptor repertoires. PLoS Comput Biol 2015;11:e1004503. doi: 10.1371/journal.pcbi.1004503.
  • 61. Dash P, Fiore-Gartland AJ, Hertz T, Wang GC, Sharma S, Souquette A, et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature 2017;547:89–93. doi: 10.1038/nature22383.
  • 62. Song L, Cohen D, Ouyang Z, Cao Y, Hu X, Liu XS. TRUST4: Immune repertoire reconstruction from bulk and single-cell RNA-seq data. Nat Methods 2021;18:627–630. doi: 10.1038/s41592-021-01142-2.
  • 63. Marcou Q, Mora T, Walczak AM. High-throughput immune repertoire analysis with IGoR. Nat Commun 2018;9:561. doi: 10.1038/s41467-018-02832-w.
  • 64. Ni Q, Zhang J, Zheng Z, Chen G, Christian L, Grönholm J, et al. VisTCR: An interactive software for T cell repertoire sequencing data analysis. Front Genet 2020;11:771. doi: 10.3389/fgene.2020.00771.
  • 65. Yue T, Chen SY, Shen WK, Cheng L, Guo AY. TCRosetta: A powerful server for analyzing and annotating T-cell receptor repertoire. Research Square 2022. doi: 10.21203/rs.3.rs-1621224/v1.
  • 66. Chen SY, Yue T, Lei Q, Guo AY. TCRdb: A comprehensive database for T-cell receptor sequences with powerful search function. Nucleic Acids Res 2021;49(D1):D468–D474. doi: 10.1093/nar/gkaa796.
  • 67. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011;43:491–498. doi: 10.1038/ng.806.
  • 68. Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, et al. VarScan: Variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 2009;25:2283–2285. doi: 10.1093/bioinformatics/btp373.
  • 69. Sahraeian SME, Liu R, Lau B, Podesta K, Mohiyuddin M, Lam HYK. Deep convolutional neural networks for accurate somatic mutation detection. Nat Commun 2019;10:1041. doi: 10.1038/s41467-019-09027-x.
  • 70. Goel M, Sun H, Jiao WB, Schneeberger K. SyRI: Finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol 2019;20:277. doi: 10.1186/s13059-019-1911-0.
  • 71. Sollis E, Mosaku A, Abid A, Buniello A, Cerezo M, Gil L, et al. The NHGRI-EBI GWAS Catalog: Knowledgebase and deposition resource. Nucleic Acids Res 2023;51(D1):D977–D985. doi: 10.1093/nar/gkac1010.
  • 72. Leslie R, O’Donnell CJ, Johnson AD. GRASP: Analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database. Bioinformatics 2014;30:i185–i194. doi: 10.1093/bioinformatics/btu273.
  • 73. Gong J, Mei S, Liu C, Xiang Y, Ye Y, Zhang Z, et al. PancanQTL: Systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types. Nucleic Acids Res 2018;46(D1):D971–D976. doi: 10.1093/nar/gkx861.
  • 74. Manjunath M, Zhang Y, Zhang S, Roy S, Perez-Pinera P, Song JS. ABC-GWAS: Functional annotation of estrogen receptor-positive breast cancer genetic variants. Front Genet 2020;11:730. doi: 10.3389/fgene.2020.00730.
  • 75. Zhao M, Sun J, Zhao Z. TSGene: A web resource for tumor suppressor genes. Nucleic Acids Res 2013;41(D1):D970–D976. doi: 10.1093/nar/gks937.
  • 76. Liu Y, Sun J, Zhao M. ONGene: A literature-based database for human oncogenes. J Genet Genomics 2017;44:119–121. doi: 10.1016/j.jgg.2016.12.004.
  • 77. Liu CH, Lai YL, Shen PC, Liu HC, Tsai MH, Wang YD, et al. DriverDBv4: A multi-omics integration database for cancer driver gene research. Nucleic Acids Res 2024;52(D1):D1246–D1252. doi: 10.1093/nar/gkad1060.
  • 78. Repana D, Nulsen J, Dressler L, Bortolomeazzi M, Venkata SK, Tourna A, et al. The network of cancer genes (NCG): A comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens. Genome Biol 2019;20:1. doi: 10.1186/s13059-018-1612-0.
  • 79. Wang T, Ruan S, Zhao X, Shi X, Teng H, Zhong J, et al. OncoVar: An integrated database and analysis platform for oncogenic driver variants in cancers. Nucleic Acids Res 2021;49(D1):D1289–D1301. doi: 10.1093/nar/gkaa1033.
  • 80. Liu EM, Martinez-Fundichely A, Bollapragada R, Spiewack M, Khurana E. CNCDatabase: A database of non-coding cancer drivers. Nucleic Acids Res 2021;49(D1):D1094–D1101. doi: 10.1093/nar/gkaa915.
  • 81. Martínez-Jiménez F, Muiños F, Sentís I, Deu-Pons J, Reyes-Salazar I, Arnedo-Pac C, et al. A compendium of mutational cancer driver genes. Nat Rev Cancer 2020;20:555–572. doi: 10.1038/s41568-020-0290-x.
  • 82. Han Y, Yang J, Qian X, Cheng WC, Liu SH, Hua X, et al. DriverML: A machine learning algorithm for identifying driver genes in cancer sequencing studies. Nucleic Acids Res 2019;47:e45. doi: 10.1093/nar/gkz096.
  • 83. Arnedo-Pac C, Mularoni L, Muiños F, Gonzalez-Perez A, Lopez-Bigas N, Schwartz R. OncodriveCLUSTL: A sequence-based clustering method to identify cancer drivers. Bioinformatics 2019;35:4788–4790. doi: 10.1093/bioinformatics/btz501.
  • 84. Dees ND, Zhang Q, Kandoth C, Wendl MC, Schierding W, Koboldt DC, et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Res 2012;22:1589–1598. doi: 10.1101/gr.134635.111.
  • 85. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 2013;499:214–218. doi: 10.1038/nature12213.
  • 86. Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias reveals cancer drivers. Nucleic Acids Res 2012;40:e169. doi: 10.1093/nar/gks743.
  • 87. Tian R, Basu MK, Capriotti E. ContrastRank: A new method for ranking putative cancer driver genes and classification of tumor samples. Bioinformatics 2014;30:i572–i578. doi: 10.1093/bioinformatics/btu466.
  • 88. Vaske CJ, Benz SC, Sanborn JZ, Earl D, Szeto C, Zhu J, et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 2010;26:i237–i245. doi: 10.1093/bioinformatics/btq182.
  • 89. Sanchez-Garcia F, Villagrasa P, Matsui J, Kotliar D, Castro V, Akavia UD, et al. Integration of genomic data enables selective discovery of breast cancer drivers. Cell 2014;159:1461–1475. doi: 10.1016/j.cell.2014.10.048.
  • 90. Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E, et al. The human cell atlas. Elife 2017;6:e27041. doi: 10.7554/eLife.27041.
  • 91. Zeng J, Zhang Y, Shang Y, Mai J, Shi S, Lu M, et al. CancerSCEM: A database of single-cell expression map across various human cancers. Nucleic Acids Res 2022;50(D1):D1147–D1155. doi: 10.1093/nar/gkab905.
  • 92. Yuan H, Yan M, Zhang G, Liu W, Deng C, Liao G, et al. CancerSEA: A cancer single-cell state atlas. Nucleic Acids Res 2019;47(D1):D900–D908. doi: 10.1093/nar/gky939.
  • 93. Han Y, Wang Y, Dong X, Sun D, Liu Z, Yue J, et al. TISCH2: Expanded datasets and new tools for single-cell transcriptome analyses of the tumor microenvironment. Nucleic Acids Res 2023;51(D1):D1425–D1431. doi: 10.1093/nar/gkac959.
  • 94. Camps J, Noël F, Liechti R, Massenet-Regad L, Rigade S, Götz L, et al. Meta-analysis of human cancer single-cell RNA-seq datasets using the IMMUcan database. Cancer Res 2023;83:363–373. doi: 10.1158/0008-5472.Can-22-0074.
  • 95. Wang YY, Kang H, Xu T, Hao L, Bao Y, Jia P. CeDR atlas: A knowledgebase of cellular drug response. Nucleic Acids Res 2022;50(D1):D1164–D1171. doi: 10.1093/nar/gkab897.
  • 96. Pan Q, Li B, Lin D, Miao YR, Luo T, Yue T, et al. scLiverDB: A database of human and mouse liver transcriptome landscapes at single-cell resolution. Small Methods 2023;7:e2201421. doi: 10.1002/smtd.202201421.
  • 97. Cao ZJ, Wei L, Lu S, Yang DC, Gao G. Searching large-scale scRNA-seq databases via unbiased cell embedding with Cell BLAST. Nat Commun 2020;11:3458. doi: 10.1038/s41467-020-17281-7.
  • 98. Coudray N, Ocampo PS, Sakellaropoulos T, Narula N, Snuderl M, Fenyö D, et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med 2018;24:1559–1567. doi: 10.1038/s41591-018-0177-5.
  • 99. Cortés-Ciriano I, Gulhan DC, Lee JJ, Melloni GEM, Park PJ. Computational analysis of cancer genome sequencing data. Nat Rev Genet 2022;23:298–314. doi: 10.1038/s41576-021-00431-y.

Articles from Chinese Medical Journal are provided here courtesy of Wolters Kluwer Health
