| Artificial Intelligence (AI) | Computer systems designed to perform tasks that usually require human intelligence, such as recognizing patterns, making decisions, or interpreting data. |
| Machine Learning (ML) | A subset of AI that allows computers to learn from data and improve their performance over time without being explicitly programmed. |
| Deep Learning | A subset of ML that uses neural networks with many layers to detect intricate patterns in data. |
| Neural Networks | Algorithms loosely inspired by the human brain, built from layers of interconnected nodes (‘neurons’) that learn to recognize patterns and relationships in data. |
| Hematological Malignancies | Cancers that arise in blood-forming tissue, such as the bone marrow, or in cells of the lymphatic system; examples include leukemia, lymphoma, and multiple myeloma. |
| Molecular Characterization | The process of identifying the unique molecular features (like genes, proteins, or metabolites) of a disease. |
| Omics | A broad term for fields of biological study ending in ‘-omics’—such as genomics (genes), proteomics (proteins), and metabolomics (metabolites). |
| Multi-Omics Integration | Combining data from multiple ‘omics’ levels (e.g., genes, proteins, metabolites) to achieve a more comprehensive view of disease biology. |
| Genomics | The study of an organism’s complete set of DNA, including all its genes. |
| Transcriptomics | The study of RNA molecules to understand which genes are actively being expressed. |
| Proteomics | The large-scale study of proteins and how they function in the body. |
| Metabolomics | The study of small molecules (metabolites) produced during metabolism, providing clues about disease states. |
| Biomarker | A measurable indicator (like a protein or gene mutation) that helps detect or predict disease. |
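| Explainability/Interpretability | The degree to which humans can understand and interpret the inner workings of an ML model and the reasons behind its predictions. |
| Validation (Internal/External) | Processes to test how well a model’s predictions hold up on data not used for training. Internal validation uses held-out portions of the original dataset (e.g., via cross-validation or bootstrapping); external validation uses an independent dataset, such as a cohort from a different institution. |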
| Overfitting | A problem where a model fits the training data too closely, capturing noise rather than the underlying pattern, and performs poorly on new data. |
| False Discovery | An incorrect identification of a feature or result as significant when it is not, often due to multiple testing or noise in the data. |
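| AUC (Area Under the Curve) | Usually the area under the receiver operating characteristic (ROC) curve: a score from 0 to 1 indicating how well a model distinguishes between classes, such as different disease states. A value of 0.5 corresponds to random guessing and 1 to perfect discrimination; higher is better. |
| Sensitivity and Specificity | Sensitivity (true positive rate) is the proportion of actual positive cases that the model correctly identifies; specificity (true negative rate) is the proportion of actual negative cases that it correctly identifies. |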
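| C-index | Concordance index; a metric for prognostic models (e.g., survival models) that measures how well predicted risks rank patients by their observed outcomes: the proportion of comparable patient pairs in which the patient with the higher predicted risk experiences the event earlier. A value of 0.5 indicates random ranking and 1 perfect concordance. |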
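| Cross-validation | A method to evaluate model performance by repeatedly partitioning the data into training and testing sets (e.g., in k-fold cross-validation, each of k subsets serves once as the test set) and averaging the results. |
| SHAP (SHapley Additive exPlanations) | A method, based on Shapley values from cooperative game theory, that quantifies how much each feature contributes to a model’s output for a given prediction. |
| LIME (Local Interpretable Model-agnostic Explanations) | A method that explains individual predictions of complex models by approximating the model locally with a simple, interpretable model. |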
| Bias Mitigation | Efforts to prevent AI models from producing unfair results due to biases in the training data. |
| Federated Learning | A privacy-preserving approach where data stays at its source and only model updates are shared. |
| Digital Twin | A digital replica of a biological system used to simulate and predict disease progression or treatment response. |
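| Principal Component Analysis (PCA) | A method for reducing the dimensionality of data by projecting it onto a small number of uncorrelated components that capture as much of the variance (the main trends and patterns) as possible. |
| t-distributed Stochastic Neighbor Embedding (t-SNE) | A nonlinear technique that maps high-dimensional data into two or three dimensions for visualization, preserving local neighborhood structure so that clusters and patterns are easier to see. |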
| Random Forests (RF) | An ensemble learning method using many decision trees to improve prediction accuracy and control overfitting. |
| Support Vector Machines (SVMs) | A supervised ML algorithm that finds the boundary (hyperplane) separating data classes with the largest possible margin. |
| Decision Trees | A model that splits data into branches to reach a decision or classification based on input features. |
| Naive Bayes | A classification method based on Bayes’ theorem that assumes the predictors are conditionally independent of each other given the class. |
| Ensemble Methods | Techniques that combine multiple models (like trees and SVMs) to improve overall performance. |
| Logistic Regression | A statistical model used to predict the probability of a binary outcome (e.g., disease vs. no disease). |
| Generalization (in ML) | The ability of a model to perform well on new, unseen data—not just the data it was trained on. |
| Ethical AI | Developing AI systems that are fair, transparent, and protect individual rights and privacy. |
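The sensitivity and specificity entries in the glossary reduce to simple ratios of confusion-matrix counts. The following is a minimal Python sketch, assuming binary labels coded 1 for disease present and 0 for disease absent; the function name and the example labels are hypothetical and for illustration only.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels: 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical example: 4 patients with disease, 6 without
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.75, specificity=0.67
```

Here 3 of the 4 diseased patients are detected (sensitivity 0.75) and 4 of the 6 healthy patients are correctly cleared (specificity about 0.67).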
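The AUC entry can be made concrete through an equivalent reading of the ROC AUC: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as half. The sketch below assumes labels coded 1/0 and real-valued scores; the function name and data are hypothetical. It compares every positive-negative pair, which is clear but quadratic in cost; library implementations such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
def roc_auc(y_true, scores):
    """Pairwise ROC AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties in score count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: the positive case scored 0.4 is outranked by one negative (0.7)
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(y_true, scores))  # 8 of 9 pairs correctly ordered, so AUC = 8/9
```

A model that assigns the same score to everyone gets 0.5 under this definition, matching the "random guessing" reference point.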
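The C-index entry describes a pairwise ranking check on survival data. A minimal sketch of Harrell's C-index follows, assuming right-censored data in which `events[i]` is 1 if the event (e.g., relapse or death) was observed and 0 if the patient was censored, and in which a higher risk score means the event is expected sooner. The function name and example data are hypothetical, and for simplicity pairs with tied follow-up times are skipped; established survival-analysis libraries handle ties and censoring conventions more carefully.

```python
def harrell_c_index(times, events, risks):
    """Simplified Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. It is concordant when
    i also has the higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier time in a comparable pair must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients (patient 3 is censored at time 6)
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(times, events, risks))  # 4 of 5 comparable pairs concordant
```

In this example the only discordant pair is patients 2 and 3: patient 2 relapsed first but was assigned a lower risk than patient 3.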
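The cross-validation entry can be illustrated with the index bookkeeping behind k-fold splitting. The sketch below is a minimal version using only the standard library; the function name, the fixed seed, and the sample counts are hypothetical choices. Each sample lands in exactly one test fold, and the model would be refit on each training split, scored on the matching test split, and the scores averaged.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Indices are shuffled once, then split into k nearly equal folds;
    each fold serves as the test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # avoid folds that follow collection order
    # The first (n_samples % k) folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for fold, (train, test) in enumerate(k_fold_indices(10, k=3), start=1):
    print(f"fold {fold}: {len(train)} train, {len(test)} test")
```

With 10 samples and k = 3 the folds hold 4, 3, and 3 test samples. For imbalanced outcomes, such as a rare disease subtype, a stratified variant that preserves class proportions in each fold is usually preferable; scikit-learn provides both `KFold` and `StratifiedKFold` for this purpose.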
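The false discovery entry notes that multiple testing inflates spurious findings, which matters when thousands of genes or proteins are screened at once. One widely used correction, swapped in here for illustration, is the Benjamini-Hochberg procedure, which controls the expected proportion of false discoveries at a chosen level. The sketch below assumes independent (or positively dependent) tests; the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected under
    Benjamini-Hochberg false discovery rate control at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    # Largest rank k with p_(k) <= (k / m) * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True  # every hypothesis up to rank k is rejected
    return rejected


# Hypothetical p-values from eight candidate biomarkers
pvals = [0.039, 0.001, 0.205, 0.008, 0.041, 0.060, 0.042, 0.074]
print(sum(p < 0.05 for p in pvals))  # 5 pass an uncorrected 0.05 cutoff
print(benjamini_hochberg(pvals))     # only the features with p = 0.001 and 0.008 survive
```

The contrast between five uncorrected hits and two corrected ones shows how an unadjusted threshold can produce false discoveries when many features are tested together.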