Machine Learning for Multi-Omics Characterization of Blood Cancers: A Systematic Review

2025 Sep 4;14(17):1385. doi: 10.3390/cells14171385

Artificial Intelligence (AI)	Computer systems designed to perform tasks that usually require human intelligence, such as recognizing patterns, making decisions, or interpreting data.
Machine Learning (ML)	A subset of AI that allows computers to learn from data and improve their performance over time without being explicitly programmed.
Deep Learning	A type of ML that uses multi-layered neural networks to detect intricate patterns in data.
Neural Networks	A set of algorithms loosely inspired by the human brain, built from layers of interconnected nodes that learn to recognize patterns and relationships in data.
Hematological Malignancies	Cancers that begin in blood-forming tissue, such as leukemia, lymphoma, and multiple myeloma.
Molecular Characterization	The process of identifying the unique molecular features (like genes, proteins, or metabolites) of a disease.
Omics	A broad term for fields of biological study ending in ‘-omics’—such as genomics (genes), proteomics (proteins), and metabolomics (metabolites).
Multi-Omics Integration	Combining data from multiple ‘omics’ levels (e.g., genes, proteins, metabolites) to achieve a more complete view of disease biology.
Genomics	The study of an organism’s complete set of DNA, including all its genes.
Transcriptomics	The study of RNA molecules to understand which genes are actively being expressed.
Proteomics	The large-scale study of proteins and how they function in the body.
Metabolomics	The study of small molecules (metabolites) produced during metabolism, providing clues about disease states.
Biomarker	A measurable indicator (like a protein or gene mutation) that helps detect or predict disease.
Explainability/Interpretability	The degree to which the inner workings of an ML model can be understood and interpreted by humans.
Validation (Internal/External)	Processes to test how well a model’s predictions hold up on data it was not trained on (internal validation uses held-out portions of the same dataset; external validation uses an independent dataset).
Overfitting	A problem where a model fits the training data too closely, capturing noise rather than the underlying pattern, and performs poorly on new data.
False Discovery	An incorrect identification of a feature or result as significant when it is not, often due to multiple testing or noise in the data.
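AUC (Area Under the Curve)	A score from 0 to 1, typically the area under the receiver operating characteristic (ROC) curve, indicating how well a model distinguishes between different disease states. A value of 0.5 corresponds to chance-level discrimination; higher is better.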
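Sensitivity and Specificity	Sensitivity measures the true positive rate (the proportion of actual positives correctly identified); specificity measures the true negative rate (the proportion of actual negatives correctly identified).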
C-index	The concordance index, a metric used in prognosis models to evaluate how well the ranking of predicted risks agrees with the order of actual outcomes over time.
Cross-validation	A method to evaluate model performance by partitioning data into training and testing sets multiple times.
SHAP (SHapley Additive exPlanations)	A tool used to explain how much each feature contributes to a model’s output.
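LIME (Local Interpretable Model-agnostic Explanations)	A method that explains individual predictions of complex models in a human-understandable way by approximating the model locally with a simpler, interpretable one.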
Bias Mitigation	Efforts to prevent AI models from producing unfair results due to biases in the training data.
Federated Learning	A privacy-preserving approach where data stays at its source and only model updates are shared.
Digital Twin	A digital replica of a biological system used to simulate and predict disease progression or treatment response.
Principal Components Analysis (PCA)	A method for reducing the dimensionality of data while preserving trends and patterns.
t-distributed Stochastic Neighbor Embedding (t-SNE)	A technique for visualizing high-dimensional data in two or three dimensions so that similar data points appear close together, making patterns easier to see.
Random Forests (RF)	An ensemble learning method using many decision trees to improve prediction accuracy and control overfitting.
Support Vector Machines (SVMs)	A supervised ML algorithm that finds the best boundary (hyperplane) between data classes.
Decision Trees	A model that splits data into branches to reach a decision or classification based on input features.
Naive Bayes	A classification method based on Bayes’ theorem, assuming independence between predictors.
Ensemble Methods	Techniques that combine multiple models (like trees and SVMs) to improve overall performance.
Logistic Regression	A statistical model used to predict the probability of a binary outcome (e.g., disease vs. no disease).
Generalization (in ML)	The ability of a model to perform well on new, unseen data—not just the data it was trained on.
Ethical AI	Developing AI systems that are fair, transparent, and protect individual rights and privacy.
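
The evaluation metrics defined above (sensitivity, specificity, AUC, and the C-index) can be illustrated with a minimal Python sketch. This is not code from the reviewed studies: the function names, the 0.5 decision threshold, and the toy labels, scores, and survival times are all hypothetical. The AUC is computed as the fraction of positive/negative pairs ranked correctly, and the C-index follows the usual pairwise (Harrell-style) definition in a simplified form that skips tied survival times.

```python
from itertools import combinations


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs where the positive
    case receives the higher score; tied scores count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def c_index(times, events, risks):
    """Pairwise concordance index for right-censored survival data.
    A pair is comparable only if the patient with the shorter follow-up
    time had an observed event (events[i] == 1)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: tied times are skipped
        short, long_ = (i, j) if times[i] < times[j] else (j, i)
        if not events[short]:
            continue  # shorter time is censored, so order is unknown
        comparable += 1
        if risks[short] > risks[long_]:
            concordant += 1.0
        elif risks[short] == risks[long_]:
            concordant += 0.5
    return concordant / comparable


# Hypothetical toy data: 1 = disease, 0 = no disease.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
auc_value = auc(y_true, scores)

# Hypothetical survival data: follow-up time, event flag, predicted risk.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_value = c_index(times, events, risks)

print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"AUC={auc_value:.3f} C-index={c_value:.3f}")
```

Unlike sensitivity and specificity, the AUC and the C-index do not depend on a decision threshold, because both are computed from the ranking of scores rather than from hard class labels.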