Deep Learning for Cancer Detection Based on Genomic and Imaging Data: A Comprehensive Review

Xinyu Wang; Can Su

doi:10.2147/CMAR.S533522

. 2025 Sep 20;17:2089–2125. doi: 10.2147/CMAR.S533522

Deep Learning for Cancer Detection Based on Genomic and Imaging Data: A Comprehensive Review

Xinyu Wang ^1,^✉, Can Su ¹

PMCID: PMC12459622 PMID: 41001156

Abstract

Cancer is a major global health challenge, and early detection is critical to improving survival rates. Advances in genomics and imaging technologies have made the integration of genomic and imaging data a common practice in cancer detection. Deep learning, especially Convolutional Neural Networks (CNNs), demonstrates substantial potential for early cancer diagnosis by autonomously extracting valuable features from large-scale datasets, thus enhancing early detection accuracy. This review summarizes the progress in deep learning applications for cancer detection using genomic and imaging data. It examines current models, their applications, challenges, and future research directions. Deep learning introduces innovative approaches for precision diagnosis and personalized treatment, facilitating advancements in early cancer screening technologies.

Keywords: deep learning, cancer detection, genetic data, image data, review

Introduction

Cancer remains one of the leading causes of disease burden and mortality globally, posing a significant public health challenge. Early detection is vital for improving survival rates, as it allows for timely intervention and more effective treatment strategies. The rapid development of high-throughput technologies has made genetic and imaging data essential for cancer detection and diagnosis. Combining these data types offers a comprehensive perspective, ranging from the molecular to the structural level. In recent years, deep learning techniques, particularly convolutional neural networks (CNNs),¹ have demonstrated considerable potential in cancer detection, significantly enhancing early detection accuracy and efficiency by autonomously extracting complex features from large-scale datasets.

Role of Genomic and Imaging Data in Cancer Detection

The Role of Genetic Data in Cancer Detection

Whole Genome Data (WGD) encompasses the complete DNA sequence of an individual and identifies genetic variants associated with cancer, including mutations, copy number variants, and structural variants.²^,³ These variants can be quantified using the following equation:

(1)

Where Inline graphic denotes the effect function of the mutation and is the weight of the mutation location. This formula helps to assess the contribution of different mutations to cancer development.

This information can be utilized for early cancer detection and risk assessment. For example, mutations in BRCA1 and BRCA2 are strongly associated with an elevated risk of breast and ovarian cancer.⁴ Somatic mutation data helps identify specific molecular features of cancers, guiding the selection of targeted therapies.⁵^,⁶

The Role of Imaging Data in Cancer Detection

Imaging techniques are crucial for the early detection, diagnosis, and treatment monitoring of cancer. CT and X-ray are commonly used to screen for lung, bone, and other cancers, providing high-resolution images that help identify tumor location, size, and morphology.⁷^,⁸ In contrast, MRI is advantageous for soft-tissue imaging and is commonly used to detect brain tumors, prostate cancer, and breast cancer.⁹^,¹⁰ Pathology images, derived from tissue biopsies, are the gold standard for cancer diagnosis. The advent of digital pathology has facilitated the storage, sharing, and analysis of these images.¹¹

Application and Development of Deep Learning in Cancer Detection

Improvements in deep learning techniques, particularly convolutional neural networks (CNNs), have shown significant advantages over traditional methods in cancer detection. Deep learning can automatically extract features from data, reduce human intervention, and enhance detection accuracy. By integrating genetic and imaging data, deep learning provides more effective support for accurate cancer detection.¹²^,¹³ In addition to CNNs, emerging deep learning models, such as the Transformer and graph neural networks (GNNs), demonstrate great potential in cancer detection. These models can better capture global features and topological relationships within complex data.¹⁴^,¹⁵

Advantages and Challenges of Deep Learning

One of the primary advantages of deep learning is its strong model adaptability, which allows it to be applied to various cancer detection tasks through transfer learning, thereby reducing the reliance on large-scale labeled data.¹⁶ However, challenges remain in the application of deep learning for cancer detection, such as data quality, model interpretability, and clinical feasibility.¹⁷^,¹⁸ Although Transformer and GNN models have demonstrated significant performance improvements, their interpretability and computational complexity remain key areas for future research.¹⁹^,²⁰

In conclusion, both genetic and imaging data offer unique advantages in cancer detection, and deep learning technologies provide a promising approach for accurate diagnosis and treatment. With the continued optimization and implementation of deep learning models, these technologies will play an increasingly critical role in cancer diagnosis and treatment in the future.

Current Challenges and Future Plans for Deep Learning in Cancer Detection

Deep learning demonstrates significant potential in cancer detection using genomic and imaging data. However, several challenges remain, including difficulties in data acquisition, data heterogeneity that affects model generalization, lack of model interpretability that limits clinical applications, complexity in multimodal data fusion, and issues with validation and application in real clinical settings. Further research and planning are required.

Data Quality and Quantity

High-quality, large-scale labeled data is essential for training deep learning models. However, access to medical data is restricted by privacy protections, ethical standards, and data-sharing mechanisms, resulting in data scarcity.²¹^,²² Additionally, data heterogeneity, such as variations in imaging equipment and gene sequencing platforms across hospitals, can lead to differences in data distribution, thereby affecting the generalization ability of models.¹⁴

Model Interpretability and Transparency

Deep learning models are often considered “black boxes” and lack interpretability, which limits their application in clinical settings.²³ Both doctors and patients must understand the model’s decision-making process to build trust and ensure the reliability of the diagnosis.²⁴ Therefore, it is essential to develop models with interpretability features or visualization tools that support decision-making.

Multimodal Data Fusion

The effective fusion of genomic and imaging data can provide more comprehensive information for cancer detection. However, feature extraction and fusion strategies for these different data types are not yet fully developed, which may lead to information loss or the introduction of noise, ultimately affecting model performance.¹⁷^,²⁵

Clinical Validation and Application

Although deep learning models have shown promising results in research, their validity and reliability in real clinical settings still need to be validated.²⁶ These models must undergo rigorous clinical trials to ensure their applicability across different populations and environments. Additionally, the deployment of models must consider factors such as computational resources, cost, and physician acceptance.²⁷

Future Planning

To address the challenges outlined above, the following areas should be prioritized in the future:

Data Sharing and Standardization: Establish a secure, compliant data-sharing platform and promote multicenter collaboration to obtain diverse and high-quality data. Additionally, develop standardized protocols for data collection and labeling to reduce the impact of data heterogeneity on model performance.²¹^,²²

Model Interpretability Research: Develop interpretable deep learning models or combine traditional machine learning methods to enhance model transparency and improve clinical acceptability.²³^,²⁴

Multimodal Fusion Methods: Explore effective strategies for multimodal data fusion to fully leverage the complementary information from genomic and imaging data, thereby improving the accuracy and robustness of cancer detection.¹⁷^,²⁵

Clinical Translation and Validation: Strengthen the clinical validation of models by conducting multicenter, large-scale clinical trials to assess their practical applications and facilitate their integration into clinical practice.²⁶^,²⁷

In conclusion, while deep learning holds significant promise for cancer detection, continued efforts are required to overcome challenges related to data, models, and clinical applications. Multi-party collaboration is essential to drive technological advances and achieve the goal of precision medicine.²⁸

Scope and Organization of the Review

This paper reviews deep learning methods for cancer detection based on genomic and imaging data, with a focus on their application in early screening, diagnosis, and prognosis prediction. The paper is organized as follows: Chapter 2 introduces the basic principles of deep learning techniques and their applications in medicine, emphasizing common models such as convolutional neural networks (CNNs). Chapter 3 provides a detailed analysis of the progress in deep learning for cancer detection using combined genomic and imaging data, exploring their application in early screening and diagnosis. Chapter 4 discusses the application of deep learning in multimodal data fusion, highlighting recent research on integrating genomic and imaging data. Finally, Chapter 5 addresses the current challenges of deep learning techniques in cancer detection, including issues related to data quality, model interpretability, and clinical feasibility, and outlines future research directions. Through these discussions, this paper aims to provide a reference for advancing precision medicine.

Deep Learning Methodologies

Overview of Deep Learning Architectures

Deep learning, a key branch of machine learning, has made significant advancements in cancer detection in recent years. Various deep learning architectures have been proposed and widely applied for early diagnosis, prognosis assessment, and treatment selection in cancer care. These architectures include Convolutional Neural Networks (CNNs),²⁹ Recurrent Neural Networks (RNNs) and their variants (eg, Long Short-Term Memory Networks (LSTMs)³⁰ and Gated Recurrent Units (GRUs)), Generative Adversarial Networks (GANs),³¹ Transformer Networks,³² and Graph Neural Networks (GNNs).³³ In cancer detection, these models can process complex genomic and medical imaging data to automatically extract valuable features and enhance diagnostic accuracy.

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are the most widely used class of deep learning architectures, particularly prominent in the field of image processing. CNNs automatically extract key features from images, such as edges, textures, and shapes, by locally sensing the input data through convolutional layers. The formula for the convolution operation can be expressed as follows:

(2)

Where Inline graphic is the input image, is the filter, and is the offset of the convolution. This local sensing mechanism enables the CNN to effectively capture spatial features in the image.

In addition, the pooling operation is acrucial step in CNNs used to reduce the dimensionality of the feature map. Pooling operations extract the most salient features from an image while reducing computational complexity and preventing overfitting. Common pooling techniques include Max Pooling and Average Pooling. The formulas are as follows:

(3)

(4)

The main advantage of CNNs is that they do not rely on manual feature extraction and can automatically learn more discriminative features from large datasets. In cancer detection, CNNs are widely used in medical image analysis. For instance, in CT image analysis for lung cancer, CNNs can identify and classify lung nodules to determine whether they are malignant.³⁴^,³⁵ In early breast cancer screening, CNNs are used for the automatic analysis of mammogram images, enabling the detection of small lumps and improving diagnostic accuracy.³⁶

Recurrent Neural Networks (RNNs) and Their Variants

Recurrent Neural Networks (RNNs) are well-suited for processing sequence data and are characterized by their ability to model temporal dependencies, preserving information from previous time steps. This makes RNNs particularly advantageous for processing genetic data, medical records, and other time-series data.

Standard RNNs suffer from the vanishing gradient problem, which limits their effectiveness in processing long sequences. To address this issue, Long Short-Term Memory Networks (LSTMs) and Gated Recurrent Units (GRUs) have been introduced. These variants mitigate the vanishing gradient problem by incorporating agating mechanism. LSTMs and GRUs are widely used in genomics, particularly in cancer prediction and progression analysis.³⁷^,³⁸ For instance, LSTMs are used to predict the occurrence and progression of cancer based on gene expression data,³⁹ while GRUs are employed to detect cancer-associated mutations and analyze temporal patterns in gene sequences.⁴⁰ The update formula of LSTMs can be expressed as follows:

(5)

Where Inline graphic , , denote the input gate, forget gate and output gate respectively, and is the candidate cell state, which can effectively deal with the timing dependence in sequence data.

Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) are atype of generative model consisting of agenerator and adiscriminator, which mutually enhance each other through adversarial training. The generator produces synthetic data, while the discriminator evaluates the authenticity of the data. GANs have significant applications in medical imaging, particularly in image enhancement and data generation.

In cancer detection, GANs are widely used for medical image generation and enhancement. For example, GANs can generate high-quality CT or MRI images, thereby improving the diagnosis of low-quality images.⁴¹^,⁴² Additionally, GANs can be employed for data augmentation, generating more labeled images to help train more accurate deep learning models and enhance the performance of cancer detection.⁴³

Transformer Network

The Transformer network represents asignificant breakthrough in deep learning in recent years. It employs aself-attention mechanism to capture dependencies between positions in asequence. Unlike traditional RNNs, the Transformer can process the entire sequence in parallel, which enhances computational efficiency and yields remarkable results across several tasks.

In cancer detection, the Transformer is widely used to analyze both images and genetic data. The Visual Transformer (ViT) is used to analyze pathology images, improving the accuracy of image classification by dividing the image into multiple segments and capturing the relationships between these segments using the self-attention mechanism.⁴⁴^,⁴⁵ Additionally, the Transformer has been applied to genomic data analysis, particularly in cancer risk prediction, to effectively capture long-term dependencies between gene mutations.⁴⁶

Graph Neural Networks (GNNs)

Graph Neural Networks (GNNs) are aclass of deep learning models designed for processing graph-structured data. In cancer detection, GNNs are widely used to model molecular-level cancer information.⁴⁷^,⁴⁸ GNNs can learn the relationships between nodes and edges to identify complex interactions between genes, proteins, and cancer phenotypes.⁴⁹

GNNs have demonstrated success in predicting cancer-related gene mutations. For instance, GNNs are used to analyze gene interaction networks and identify key genes and biomarkers associated with cancer.⁵⁰^,⁵¹ Additionally, GNNs can be applied to medical image analysis by segmenting images into multiple regions and capturing the relationships between these regions through graph structures, thereby improving the accuracy of tumor detection and classification.⁵²^,⁵³

Feature Extraction and Representation Learning in Deep Learning

Within the framework of deep learning, feature extraction and representation learning are critical steps for understanding and processing data. Unlike traditional machine learning methods, which rely on manually designed features, deep learning automatically extracts features from raw data through multi-layer neural networks and learns effective representations of the data. This significantly enhances the efficiency and accuracy of cancer detection. Particularly in the processing of genetic and imaging data, deep learning technologies provide robust support for early cancer diagnosis by automatically learning the intrinsic structure of the data and identifying potential correlations.

Feature Extraction and Representation Learning for Genetic Data

Gene data, particularly gene expression data and genomic sequence data, are characterized by high dimensionality, sparsity, and complexity. Traditional feature engineering approaches typically rely on domain knowledge for manually extracting features and selecting appropriate models for prediction. However, the complexity and diversity of genetic data make manual feature extraction challenging, as it may fail to capture all important information, potentially leading to overfitting and bias.

Deep learning addresses these challenges through automatic feature extraction and representation learning. The application of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) in genetic data analysis, especially in gene expression and DNA sequence data, has shown significant advantages. By training multi-layer networks, deep learning can automatically uncover complex patterns and potential relationships in the data, leading to more accurate identification of cancer-related genes.

For instance, CNNs can capture local dependencies in gene expression data through their convolutional layers and extract features that are discriminative for cancer prediction.⁵⁴^,⁵⁵ RNNs and their variants (eg, LSTM and GRU) can efficiently capture temporal dependencies in gene sequences, offering anovel approach to detecting cancer-related mutations.⁵⁶^,⁵⁷ Additionally, Graph Neural Networks (GNNs) have made notable progress in genetic data processing in recent years. GNNs efficiently represent gene interaction networks by treating genes as nodes in agraph, extracting relationships and dependencies between genes through graph convolution operations. GNNs have shown strong performance in predicting cancer gene mutations and are adept at capturing the complex interaction patterns between genes.⁵⁸^,⁵⁹

Feature Extraction and Representation Learning for Image Data

Medical images, particularly CT, MRI, and pathology images, play acrucial role in cancer detection. Traditional image processing methods rely on hand-designed feature extraction algorithms, such as edge detection and texture analysis. Although effective, these methods often fail to extract sufficiently informative features when dealing with complex medical image data, leading to low classification and diagnostic accuracy.

The application of deep learning, especially in Convolutional Neural Network (CNN) architectures, has significantly advanced medical image processing technology. CNNs can effectively identify critical information, such as tumor morphology, size, and location, by automatically learning features from low-level to high-level image data. CNNs have achieved remarkable results in tumor detection, segmentation, and classification. For instance, CNNs can automatically recognize lung nodules in CT images and determine their malignancy.⁶⁰^,⁶¹ In breast cancer screening, CNNs are used to analyze mammography images, extracting features of tiny masses, thereby improving early diagnostic accuracy.⁶²^,⁶³

Deep learning not only excels at extracting visual features from medical images but also plays apivotal role in multimodal data fusion. By integrating genetic and image data, deep learning can extract features from multiple dimensions, providing more comprehensive information for cancer diagnosis. For example, by combining genetic data with CT images, deep learning can more accurately predict cancer progression and metastasis.⁶⁴^,⁶⁵

Deep Learning vs Traditional Feature Engineering Methods

Traditional feature engineering methods typically rely on expert domain knowledge to manually extract features and select appropriate models for training. For instance, atraditional approach might involve selecting features manually and representing them with the following formula:

(6)

Where, Inline graphic is the representation of the manually extracted features, is the features extracted from the domain knowledge and is the weights of the features.

In contrast, deep learning can automatically learn efficient representations of data through multi-layer neural networks, bypassing the limitations of manual feature extraction. For instance, Convolutional Neural Networks (CNNs) can automatically extract the morphological features of tumors from images. Traditional methods are susceptible to human bias during the feature selection process and are less efficient in processing complex data, particularly when dealing with high-dimensional and complex datasets, such as genomic data and medical images. Their effectiveness is often limited.⁴⁴^,⁴⁵ For example, in genomic data processing, traditional methods typically extract only a limited set of features and may not effectively identify various types of gene mutations.⁶⁶ In contrast, deep learning automatically learns effective feature representations from large datasets through the training of multi-layer neural networks, eliminating the constraints of manually designed features.⁶⁷ Deep learning can not only extract low-level features (eg, edges and textures in images) but also learn high-level semantic representations of the data, enabling it to capture complex patterns more effectively in cancer detection. For example, the application of CNNs in medical image analysis automatically extracts morphological features of tumors, avoiding the need for manual feature extraction.⁶⁸ Moreover, deep learning excels at handling large-scale datasets, particularly in the fusion of multimodal data, where it offers significant advantages. For example, combining genetic and medical imaging data through joint training of deep learning models can substantially enhance the accuracy of early cancer diagnosis.⁶⁴ In multimodal data fusion, deep learning further improves prediction accuracy and provides more precise information for personalized treatment by automatically learning the associations between datasets.³⁹

Overall, deep learning outperforms traditional feature engineering methods in terms of accuracy and efficiency, particularly when handling large-scale and complex data. Its advantages become increasingly evident as the data complexity grows.³⁴^,⁶⁹ With the continuous advancement of deep learning technology, it is expected to further accelerate progress in cancer detection.⁷⁰

Training and Optimization of Deep Learning Models

In deep learning, model training and optimization are crucial for achieving efficient performance. The training process involves selecting appropriate loss functions, optimization algorithms, and regularization techniques, all of which directly influence the convergence speed and generalization capability of the model. Additionally, effective training strategies, such as data augmentation and transfer learning, are particularly important in addressing issues like data insufficiency and class imbalance. This section will examine these factors in detail and explore how they can be leveraged to improve the performance of deep learning models in cancer detection applications.

Selection of the Loss Function

The loss function is a crucial component in deep learning model training, as it determines the optimization direction and goal of the model. In cancer detection, commonly used loss functions include Cross-Entropy Loss (CEL)⁷¹ for classification tasks and Mean Squared Error (MSE)⁷² for regression tasks.

For binary classification tasks, such as distinguishing between benign and malignant tumors, cross-entropy loss is used due to its ability to effectively measure the difference between predicted values and actual labels.⁷³ It is widely applied, and its formula is as follows:

(7)

Where Inline graphic is the True Label (0 or1) and is the Probability Predicted by the Model.

In multi-category classification problems, weighted cross-entropy loss (Equations 1 and 2) is commonly used to address class imbalance. It is formulated as follows:

(8)

Where Inline graphic is the number of categories, is the weight of category , is the true label, and is the probability predicted by the model.

Weighted cross-entropy loss enables the model to enhance its ability to recognize less frequent classes by assigning different weights to each class, making it particularly useful in medical image classification.⁷⁴^,⁷⁵

Optimization Algorithms

The choice of optimization algorithm directly affects the training speed and convergence of deep learning models. Commonly used optimization algorithms include stochastic gradient descent (SGD)⁷¹ and its variants, such as Adam⁷⁶ and RMSprop,⁷⁷ among others.

Stochastic Gradient Descent (SGD) is the most basic optimization method, updating model parameters by calculating the gradient of each training sample. Despite its simplicity and effectiveness, SGD can be slow to train and prone to converging to local optima. Therefore, SGD is often used in conjunction with Momentum to accelerate the convergence process.⁷⁸

The Adam optimization algorithm (Adaptive Moment Estimation) is one of the most widely used optimization algorithms. Adam combines the advantages of gradient descent by calculating both the first-order andsecond-order moment estimates of the gradient (ie, momentum and adaptive learning rate), which allows the model to converge faster on complex datasets and improves its robustness, especially when dealing with sparse data.⁷⁹ In cancer detection, the Adam optimization algorithm has become the preferred choice for processing medical image data (eg, MRI and CT images) due to its strong performance and stability.⁸⁰

Regularization Techniques

The complexity of deep learning models makes them susceptible to overfitting, particularly when the available data is insufficient. To address this issue, regularization techniques are commonly used to enhance the generalization ability of the models. Common regularization methods include L1 and L2 regularization, as well as Dropout. The formula for regularization is as follows:

(9)

Where Inline graphic is the Model Parameter, λ is the Regularization Parameter, and is the Total Number of Parameters.

(10)

Where Inline graphic is the Model Parameter, λ Is the Regularization Parameter, and is the Total Number of Parameters.

(11)

Where Inline graphic is the dropout rate (ie, the probability of dropping a neuron during each training session), and is the output after Dropout processing.

L1 regularization reduces feature redundancy by introducing an L1 penalty term, which sparsifies the model parameters.⁸¹ L2 regularization stabilizes the model by introducing an L2 penalty term, improving the convergence speed and making the model more stable.⁸² L1 and L2 regularization are often used in combination to achieve better results in various tasks.

Dropout is another common regularization technique that reduces the risk of overfitting by randomly “dropping” a portion of neurons during each iteration. This prevents the model from relying too heavily on specific features.⁸³ Dropout is widely used in deep convolutional neural networks (CNNs) for cancer image analysis, particularly in tumor classification and segmentation tasks, and effectively enhances the model’s generalization ability.⁸⁴

Cancer Detection Model Assessment Metrics

Evaluating cancer detection models is a critical step in ensuring their validity and reliability. Common evaluation metrics include Accuracy, Sensitivity, Specificity, F1 Score, and Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). These metrics provide a comprehensive assessment of the model’s performance across different dimensions and help researchers evaluate its practical applicability.

Accuracy

Accuracy is the most intuitive assessment metric and is defined as the ratio of correctly predicted samples to the total number of samples. The formula is as follows:

(12)

In cancer detection, accuracy measures the overall predictive performance of the model. However, accuracy is sensitive to class imbalance and, in some cases, may not accurately reflect the model’s true performance.⁸⁵

Sensitivity and Specificity

Sensitivity (also known as recall) is the proportion of actual positive samples that the model correctly predicts as positive. It measures the model’s ability to detect true positives (eg, malignant tumors). The formula is as follows:

(13)

Specificity, on the other hand, is the proportion of actual negative samples that the model correctly predicts as negative. It measures the model’s ability to rule out false positives (eg, benign tumors). Sensitivity is particularly important in cancer detection because it determines whether a tumor can be detected in time.⁸⁶^,⁸⁷ The formula is as follows:

(14)

F1 Values

The F1 score is the harmonic mean of sensitivity and precision, commonly used to evaluate models on unbalanced datasets. It integrates both the model’s detection and accuracy rates. The formula is as follows:

(15)

(16)

F1 scores are particularly important in cancer detection, especially when testing for minority classes (eg, rare cancer types), as they help avoid the bias associated with relying on accuracy alone.⁸⁸

ROC Curves and AUCs

The Receiver Operating Characteristic (ROC) curve is used to evaluate the performance of binary classification models. It illustrates the relationship between sensitivity and the false positive rate (1-specificity) at various thresholds. The formula is as follows:

(17)

The area under the curve (AUC) is the integral of the ROC curve. A larger AUC value indicates better model performance across various classification thresholds. The AUC provides a comprehensive measure of the model’s classification performance and is a crucial evaluation metric in cancer detection.⁸⁹^,⁹⁰

Cancer Detection Using Genomic Data

Whole Genome Data Analysis

Deep learning can automatically extract features from a large number of genetic markers by building multi-layer neural networks to identify genetic variants associated with cancer. Convolutional Neural Networks (CNNs) have been widely used in genomic data analysis. CNNs extract cancer-associated features by learning spatial relationships between genetic markers. The CNN model proposed by Wu et al (2020) successfully identified multiple SNP loci associated with breast cancer.⁹¹ In addition, Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) also perform well in Genome-Wide Association Studies (GWAS), particularly in processing gene sequence data. These models capture temporal dependencies in gene expression, improving the prediction accuracy of cancer-related genes.⁹²^,⁹³

Graph Neural Networks (GNNs), as an emerging deep learning method, effectively model complex relationships between genes by representing gene interactions as graph structures. For example, Xie et al (2021) used GNNs to analyze gene interaction networks and successfully improved the recognition accuracy of cancer-related genes.⁹⁴ GNNs can discover potential cancer-related genetic variants by learning the relationships between nodes (genes).

Advantages of Deep Learning in GWAS

Compared to traditional statistical methods (eg, linear regression), the application of deep learning in Genome-Wide Association Studies (GWAS) offers significant advantages. First, deep learning can automatically extract features from data, overcoming the limitation of traditional methods that rely on manual feature selection. While traditional GWAS often requires expert input to select genetic markers, deep learning models can automatically learn important features from training data, thus improving the efficiency and accuracy of the models.⁹⁵

Second, deep learning effectively handles high-dimensional and complex data. In GWAS, genetic data typically contains millions of SNP markers, while the number of samples is often limited. Traditional statistical methods are easily affected by the curse of dimensionality, leading to overfitting. Deep learning addresses the issue of high-dimensional data by reducing dimensionality and effectively extracting useful information through multi-layer neural networks.⁹⁶ The formula is as follows:

(18)

Finally, deep learning can capture complex non-linear relationships between genetic markers. Traditional methods assume a linear relationship between genes and disease, but the actual relationship is often more complex. Deep learning better fits these complex relationships through its non-linear structure, improving the accuracy of identifying cancer-related genetic variants.⁹⁷^,⁹⁸ The formula is as follows:

(19)

Deep Learning Models for Cancer-Related Gene Marker Identification

The occurrence of cancer is closely linked to genetic variation, particularly the types of SNP (single nucleotide polymorphism) markers and mutations associated with cancer. Traditional GWAS methods rely on statistical techniques and often require manual selection of specific genetic markers to analyze their association with cancer. In contrast, deep learning can recognize complex genetic patterns by automatically extracting features from large-scale data, thereby improving the accuracy of identifying cancer-related genetic markers (Table 1).

Table 1.

Deep learning models for identifying cancer-related gene markers

Model Type	appliance	study
Convolutional Neural Network (CNN)	Local relationship learning for genetic markers, eg breast cancer GWAS data analysis	Successful identification of multiple genetic markers associated with breast cancer.⁹⁹
Recurrent Neural Network (RNN) and LSTM	For gene sequence data analysis to capture timing dependencies and identify lung cancer-related genes	Successful identification of key genes associated with lung cancer.¹⁰⁰^,¹⁰¹

Model Type	Appliance	Study
Graph Neural Networks (GNN)	For modeling cancer gene interaction networks, discovery of PI3K-Akt and MAPK pathways	Revealing important cancer-related biological pathways¹⁰²
Generative Adversarial Networks (GAN)	For generating cancer gene mutation networks and mining gastric cancer-related pathways	Validation of important pathways associated with gastric cancer¹⁰³

Model Type	Appliance	Study
Convolutional Neural Network (CNN)	For automated extraction of local features from genetic data, detection of SNVs and Indels	Improved accuracy and recall of variant detection¹⁰⁹
Randomized neural networks (RNNs) and LSTMs	For capturing temporal dependence in gene expression, especially in low-frequency mutation detection¹¹⁰^,¹¹¹	Excellent performance in low-frequency mutation detection
Variable Auto-Encoder (VAE)	Detection of cancer-associated SNVs and Indels by generating patterns of potential variants	Successful identification of multiple mutations associated with cancer¹¹¹

Model Type	Appliance	Study
Convolutional Neural Network (CNN)	Extracting mutation patterns from genetic data to predict cancer susceptibility	Improved accuracy of cancer susceptibility prediction¹⁰⁹
Long Short-Term Memory Network (LSTM)	Breast and lung cancer susceptibility analysis by time-series modeling	Mutation patterns associated with cancer susceptibility identified¹¹⁰
Ensemble Learning	Combining mutation data and clinical information to predict cancer progression	Significantly improved cancer progression prediction performance¹¹²
Graph Neural Networks (GNN)	Modeling gene interactions to reveal mechanisms of cancer progression	Mutations Linked to Cancer Metastasis and Drug Resistance Revealed¹⁰²

Explore	Methodologies	Findings
Zhou et al (2020)	CNN-based lung nodule detection system	Improved detection accuracy, AUC from 0.85 to 0.92¹¹⁵
Xu et al (2019)	Multiscale Convolutional Networks for Lung Nodule Detection of Different Sizes	Improved accuracy in detecting small nodules¹¹⁶
Chen et al (2021)	A lung cancer screening model incorporating multiple deep learning networks	Improved detection of small nodules¹¹⁷
He et al (2020)	Multi-layer convolutional neural network for deep analysis of CT images	Significantly improved sensitivity for early lung cancer screening¹¹⁸

Explore	Methodologies	Findings
Liu et al (2020)	CNN-based bone cancer detection system	Ability to accurately recognize malignant nodules in x-ray images¹¹⁹
Yang et al (2021)	Bone Cancer Diagnosis Based on Multiscale Convolutional Networks	Enhanced detection of small bone nodules¹²⁰
Zhang et al (2020)	CNN-based CT image analysis of liver and pancreatic cancer	Successful detection of liver and pancreatic cancer¹²¹
Zhou et al (2021)	Deep learning-based early bowel and pancreatic cancer detection	Improved sensitivity and specificity for bowel and pancreatic cancer¹²²

Explore	Methodologies	Findings
Zhou et al (2020)	A U-Net-based model for automatic brain tumor segmentation	Outperforms conventional methods in tumor region identification and segmentation accuracy¹²³
Zhang et al (2021)	V-Net based multi-scale brain tumor segmentation model	Effective improvement of segmentation accuracy in small tumor regions¹²⁴
Xu et al (2020)	Deep Learning-based Multi-Classification Brain Tumor Model	Significantly better than traditional medical image analysis methods¹²⁵

explore	methodologies	Findings
Zhao et al (2020)	CNN-based model for automatic prostate cancer detection	High accuracy and sensitivity to successfully identify prostate cancer lesions¹²⁸
Wang et al (2021)	Deep learning-based grading model for prostate cancer	Successfully differentiating between high and low risk patients¹²⁹

Explore	Methodologies	Findings
CNN-based cell recognition in breast cancer pathology sections	Automatic recognition of tumor cells from normal cells¹³³	Excellent performance in cell type identification in breast cancer
Multi-task learning approach in pathology image analysis	Simultaneous prediction of tumor type and regional boundaries to improve classification accuracy and efficiency¹³⁴	Improved classification accuracy and efficiency through multi-task learning
Deep learning based tumor grading model	Capture microscopic level of detail features such as nucleus size, shape and density distribution¹⁶	Demonstrated accuracy comparable to or better than pathologists in grading and staging of cancers such as breast and prostate¹³⁵^,¹³⁶
Migration Learning and Pretraining Models for Pathology Images	Improving Model Performance by Migrating Pre-trained Models on the ImageNet Dataset¹³⁷	Significantly reduce data requirements and improve model performance

Explore	Methodologies	Findings
He et al (U-Net based segmentation model)	Precise localization of cancer cell areas in breast cancer sections¹³⁹^,¹⁴⁰	Segmentation performance is superior to traditional methods
Chen et al (Modeling Attention Mechanisms for Multi-Instance Learning)	Effective localization of key lesion areas in WSI¹⁴¹	Improved accurate localization of lesion areas

Examine	Methodologies	Findings
Li et al (2021)	Transformer-based multimodal model to extract key features from WSI and generate structured pathology reports¹⁴²	Ability to improve the accuracy of pathology report generation to aid in clinical diagnosis
Li et al (2021)	Transformer-based multimodal model to extract key features from WSI and generate structured pathology reports¹⁴²	Ability to improve the accuracy of pathology report generation to aid in clinical diagnosis
Li et al (2021)	Combining the BERT model for semantic understanding and standardization, extracting terminology and generating normalized reports¹⁴³	Improved accuracy and professionalism of report generation
Kather et al (2020)	Combining WSI with Patient Clinical Data to Predict Cancer Recurrence Probability and Treatment Effectiveness Using Deep Learning Models¹⁴⁴^,¹⁴⁵	Effectively improves the accuracy of cancer prognosis prediction
Zhu et al (2021)	Proposing a weakly supervised learning method to analyze WSI with high quality with only a small amount of labeled data¹⁴⁶	Reduced annotation data requirements and improved WSI analysis quality

Examine	Methodologies	Findings
Xu et al (2020)	Multiscale U-Net model for tumor segmentation of liver ultrasound images¹⁴⁷	Effective Improvement of Liver Tumor Segmentation Performance, Especially for Low Contrast Problems
Li et al (2021)	DenseNet-based nodule detection framework for high-precision nodule size measurement¹⁴⁸	Improved accuracy of nodule size measurement and morphologic analysis
Wang et al (2020)	Combining CNN and Attention Mechanisms for Classifying Liver Cancer Benign and Malignant¹⁴⁹	Classification accuracy of more than 90%, effectively differentiating benign and malignant liver cancer
Zhang et al (2020)	A multimodal deep learning framework fuses imaging and non-imaging data to improve benign and malignant classification accuracy of thyroid cancer¹⁵⁰	The AUC value of 0.95 significantly improved the accuracy of classification of benign and malignant thyroid cancer

Explore	Methodologies	Findings
Zhang et al (2020)	CNN-based automatic cervical cancer early lesion identification model¹⁵⁴	Classification accuracy of more than 92%, successful identification of early cervical cancer lesions
Liu et al (2020)	LSTM-based model for dynamic ultrasound sequence analysis¹⁵⁵	Significantly improved early screening for cervical cancer
Wang et al (2021)	A multimodal deep learning framework combining 3D ultrasound images and blood flow signals¹⁵⁶	AUC value of 0.95 improves the performance of benign-malignant classification of ovarian cancer
Zhu et al (2020)	Accurate Classification of Ovarian Cancer Based on Multiple Instance Learning (MIL)¹⁵⁷	Accurate classification of ovarian cancer images successfully achieved

Integration Methods	Descriptive	Sample Application
Multi-input model	The gene data and image data are fed into two separate neural network branches for feature extraction, and finally feature fusion is performed through a fully connected layer.	Huang et al proposed a dual-stream CNN-based framework that ultimately achieves feature fusion through a fully connected layer to improve performance in multimodal cancer classification tasks.¹⁶⁴
Joint Encoder	Multi-modal data is processed by sharing weights, and deep fusion of data is realized in the process of feature extraction.	Zhang et al proposed a Transformer architecture that is able to learn complex relationships between data through a multi-head attention mechanism.¹⁶⁵
Graph neural network	Capturing higher-order relationships between gene data and image data, the fusion of features is realized by graph convolution operation.	The graph embedding-based model developed by Li et al achieves feature fusion through graph convolution operations with an AUC value of 0.92¹⁶⁶ in the cancer subtype prediction task.
Migration learning with pre-trained models	Utilizing pre-trained CNNs on ImageNet for image feature extraction and incorporating specific models for genomic data significantly improves training efficiency.	The combination of migration learning and pre-trained models significantly improves the training efficiency of multimodal data fusion.¹⁶⁷
Joint Optimization Strategy	Simultaneous training of feature extraction networks and classification networks for multimodal data to enhance synergy between data through adversarial loss.	The adversarial training method proposed by Wang et al improves the F1 value in the breast cancer detection task by 8% through adversarial loss enhancement.¹⁶⁸
Self-supervised learning	Improve the performance of multimodal models by generating pre-training tasks such as data reconstruction and feature prediction.	Chen et al used a comparative learning approach to construct positive-negative sample pairs using unlabeled gene data and image data, thus improving the robustness of feature extraction.¹⁶⁹

Fusion Data Types	Application Cases	Effect
EHR + CT images + genomic data	Integrating EHR, CT images and genomic data into a multimodal deep learning framework for improved detection sensitivity in lung cancer detection¹⁷⁰	Improved sensitivity and accuracy
EHR + Genetic Data	BERT-based modeling to extract risk factors and treatment response in cancer patients¹⁷¹	Outperforms traditional machine learning methods
EHR + Image Data	The DSS system developed by Lee et al combines CT images and clinical data to predict the pathological stage of lung cancer¹⁷²	Provide a visual explanation of staging

Application Areas	Method Description	Major Application	Findings
Personalized Medication Recommendations	Deep learning-based DSS to analyze patient mutations and treatment history	Recommendation of the best targeted drug	Improving the Effectiveness of Targeted Drug Therapy¹⁷³
Early screening and risk prediction	Cancer risk prediction by combining patients’ age, lifestyle habits and imaging data	Risk prediction for breast cancer	Helping Physicians Develop More Rational Screening Programs¹⁷⁴

Concern	Prescription	(for) Instance	Effect
Heterogeneity of clinical data	Transfer learning	Improving Diagnostic Performance Using Migration Learning on Scarce Datasets¹⁷⁶	Improved cancer diagnosis performance on scarce datasets
Uneven sample size	Small Sample Learning Techniques	A small sample learning framework based on contrast learning¹⁷⁶	Improved model performance on small sample datasets
Real-time decision support	Edge computing and cloud computing based DSS system	Supporting Real-Time Clinical Decision Making¹⁷⁷	Improved efficiency and accuracy of real-time decision making

Application Areas	Method Description	Performance Enhancement	Findings
Breast Cancer Diagnosis	Integration of breast cancer gene expression data with breast MRI imaging features	AUC increased from 0.85 to 0.92 in unimodal mode	A two-branch multi-input network designed by Li et a[179]
Lung Cancer Detection	Combining CT Imaging and Gene Mutation Data for Cancer Staging Prediction	Superior to traditional methods	Zhang et al. Multi-input neural network applications[180]

Application Areas	Method Description	Performance Enhancement	Findings
Liver Cancer Detection	Transformer-based model integrates gene expression data with CT image features	The AUC reaches 0.94, which is significantly higher than the traditional CNN model	Transformer-based multimodal modeling proposed by Xu et al¹⁸³
Prostate Cancer Staging	Joint learning of genetic data and pathology image features for modeling via Transformer	Achieve 92% accuracy	Multimodal Transformer model developed by Chen et al¹⁸⁴

Concern	Descriptive	Solution Strategy
Sequencing errors in genetic data	Sequencing errors mainly include base substitution errors, insertion/deletion errors, etc, which may interfere with mutation detection and gene expression analysis.¹⁸⁵	Deep learning-based error repair algorithm for detecting and repairing sequencing errors using deep generative models.¹⁸⁶
Artifacts and Noise in Image Data	Artifacts (eg, motion artifacts, metal artifacts) and noise degrade the model’s accuracy in detecting tumor boundaries¹⁸⁷	Using GANs to remove artifacts and improve image quality¹⁸⁸

Concern	Descriptive	Solution Strategy
Small sample size (statistics)	Small sample data may lead to overfitting of the deep learning model, which reduces the generalization ability.¹⁸⁹	Migration learning: by migrating knowledge from large publicly available datasets such as ImageNet.¹⁹⁰ Data enhancement: techniques such as rotation, flipping and cropping.¹⁹¹ Small sample learning: meta-learning based methods that enable rapid adaptation to new tasks with limited samples.¹⁹²
Data imbalance issues	Data imbalance is another challenge with cancer testing data, especially when there are significantly fewer samples in minority classes (eg, malignant tumors) than in majority classes (eg, benign tumors), and the model is susceptible to bias in favor of the majority class.¹⁹³	Resampling methods: such as oversampling (SMOTE) and undersampling to balance the data distribution.¹⁹⁴ Adjusting the loss function: using techniques such as weighted cross-entropy or Focal Loss.¹⁹⁵ Generating Adversarial Networks: expanding the minority class sample set by generating synthetic data.¹⁹⁶

PERMALINK

Deep Learning for Cancer Detection Based on Genomic and Imaging Data: A Comprehensive Review

Xinyu Wang

Can Su

Abstract

Introduction

Role of Genomic and Imaging Data in Cancer Detection

The Role of Genetic Data in Cancer Detection

The Role of Imaging Data in Cancer Detection

Application and Development of Deep Learning in Cancer Detection

Advantages and Challenges of Deep Learning

Current Challenges and Future Plans for Deep Learning in Cancer Detection

Data Quality and Quantity

Model Interpretability and Transparency

Multimodal Data Fusion

Clinical Validation and Application

Future Planning

Scope and Organization of the Review

Deep Learning Methodologies

Overview of Deep Learning Architectures

Convolutional Neural Networks (CNNs)

Recurrent Neural Networks (RNNs) and Their Variants

Generative Adversarial Networks (GANs)

Transformer Network

Graph Neural Networks (GNNs)

Feature Extraction and Representation Learning in Deep Learning

Feature Extraction and Representation Learning for Genetic Data

Feature Extraction and Representation Learning for Image Data

Deep Learning vs Traditional Feature Engineering Methods

Training and Optimization of Deep Learning Models

Selection of the Loss Function

Optimization Algorithms

Regularization Techniques

Cancer Detection Model Assessment Metrics

Accuracy

Sensitivity and Specificity

F1 Values

ROC Curves and AUCs

Cancer Detection Using Genomic Data

Whole Genome Data Analysis

Advantages of Deep Learning in GWAS

Deep Learning Models for Cancer-Related Gene Marker Identification

Table 1.

Deep Learning in Cancer-Related Biological Pathway Mining

Table 2.

Figure 1.

Somatic Mutation Data Analysis

Deep Learning in SNV and Indel Detection

Table 3.

Predicting Cancer Susceptibility and Progression Based on Somatic Mutations

Table 4.

Figure 2.

Cancer Detection Using Imaging Data

CT and X-Ray Imaging

Lung Cancer Detection and Diagnosis

Table 5.

Figure 3.

Bone Cancer and Other Applications

Table 6.

Figure 4.

MRI Imaging

Brain Tumor Detection and Segmentation

Table 7.

Figure 5.

Prostate and Breast Cancer Applications

Table 8.

Pathological Image Analysis

Histopathological Image Classification and Diagnosis

Table 9.

Figure 6.

Digital Pathology and Computational Pathology

Table 10.

Figure 7.

Table 11.

Figure 8.

Ultrasound Imaging

Liver and Thyroid Cancer Detection

Table 12.

Figure 9.

Obstetric and Gynecological Cancer Applications

Concern	Descriptive	Solution Strategy
Importance of XAI	The introduction of interpretability techniques can help researchers understand the decision basis of deep learning models.¹⁹⁷	Enhance model transparency with SHAP and Grad-CAM technologies.
XAI in genetic data	SHAP approach to quantify the contribution of mutant loci to cancer risk prediction.¹⁹⁷	SHAP-based interpretations help develop individualized treatment plans.
XAI in Medical Imaging	Grad-CAM highlights areas of interest in medical images through heat maps.¹⁹⁸	Grad-CAM shows tumor areas in CT images of lung cancer.
Application of XAI to multimodal data integration	Chen et al visualize the weight distributions of different modal features through the attention mechanism.¹⁹⁹	Use XAI to show the contribution of different modal features to the final prediction results.

Examine	Methodologies	Findings
Kim et al (2020)	Identification of key lesion regions in CT images of lung cancer based on attention mechanism	Significantly improves the detection accuracy of the model²⁰²
Application of SHAP to genetic data	SHAP-based analysis of gene mutation patterns	Mutation Patterns Significantly Associated with Breast Cancer Development²⁰³

Methodologies	Descriptive
Visualization technology (LIME)	Interpreting model predictions on specific samples by local linear approximation²⁰⁴
Attention mechanism	Weighting the importance of features to visualize the model’s region of interest¹⁸²^,²⁰⁵
Comparative learning	Explaining Model Decisions by Comparing Characteristic Differences between Positive and Negative Samples²⁰⁶
Methods of inferring cause and effect	Analyzing the causal relationship between model outputs and input variables through causal diagrams²⁰⁷

Examine	Methodologies	Findings
Importance analysis of features generated by SHAP	For breast cancer risk prediction and to guide clinical decision making	Enhanced credibility in clinical decision making²⁰⁸
Identification of Attentional Mechanisms Responsive to Targeted Therapies	Identifying potential patient response characteristics to targeted therapies	Optimized individualized treatment plans²⁰⁹
Heat map to aid in data labeling	Used to indicate lesion areas and speed up the labeling process	Improved data labeling efficiency²¹⁰

Challenge	Future Direction
Standardized evaluation indicators	Development of a Common Criteria for Quantitative Interpretability²¹¹
Interpretability studies of multimodal data	Development of multimodal interpretable methods for fusion analysis of imaging and genetic data²¹²
Real-time Interpretation and Usability Enhancement	Combining Edge Computing to Enhance Real-Time Clinical Use²¹³

Concern	Methodologies	Study
Data Sharing and Privacy Breaches	Federated Learning (FL) avoids centralized storage through distributed training²¹⁴^,²¹⁵	Reduced risk of privacy breaches
Data de-identification techniques	De-identification techniques such as Differential Privacy²¹⁶^,²¹⁷	Maintain high data utilization and ensure privacy

Sources and Effects of Prejudice	Cure	Study
Imbalance or insufficiency of training data²¹⁸	Data Enhancement, Stratified Sampling, Adjustment of Loss Function	Reduced bias in diagnosis see219
Algorithmic Bias Triggers Widening Health Disparities²²⁰	A multi-task learning framework incorporating fairness constraints	Significantly improved performance equalization across races²¹⁹

Validation Difficulties	Regulatory Standards	Study
Lack of harmonized assessment criteria²²¹	FDA and MDR require evidence of reproducibility of model performance²²²^,²²³	Improved model reliability²²²
Complexity of the model validation process²²¹	Strict approval processes are required to ensure security and effectiveness	Ensures the safety of the model for clinical applications²²³

Safeguard	Methodologies
Transparency and interpretability	Enhancing Transparency through Visualization Techniques (eg Grad-CAM) and Causal Inference²²⁴
Multi-party collaboration and standardization	Promoting open data-sharing platforms and harmonized assessment benchmarks²²⁵
Continuous monitoring and updating	Dynamic Learning and Online Training Technologies²²⁶

Directional	Goal
Construction of an ethical framework	Ensuring Equal Benefits for All Patient Groups in Model Use²²⁷
Internationalized regulatory cooperation	Harmonization of model approval criteria and promotion of cross-border applications²²⁸