Semin Ultrasound CT MR. Author manuscript; available in PMC: 2023 Apr 1.
Published in final edited form as: Semin Ultrasound CT MR. 2022 Feb 12;43(2):142–146. doi: 10.1053/j.sult.2022.02.003

Radiomics: A primer on processing workflow and analysis

Emily Avery a, Pina C Sanelli b, Mariam Aboian a, Seyedmehdi Payabvash a,*
PMCID: PMC8961004  NIHMSID: NIHMS1786502  PMID: 35339254

Abstract

Quantitative analysis of medical images can provide objective tools for diagnosis, prognostication, and disease monitoring. Radiomics refers to automated extraction of a large number of quantitative features from medical images for characterization of underlying pathologies. In this review, we will discuss the principles of radiomics, image preprocessing, feature extraction workflow, and statistical analysis. We will also address the limitations and future directions of radiomics.

Background

In recent decades, dramatic advances in high-throughput computing and the availability of large-scale medical data have led to the emergence of the “-omics” fields of translational science. From genomics to proteomics to metabolomics, the “-omics” fields employ collective quantification, characterization, and exploitation of pools of biologic information. Similar in principle to these large-scale analytic techniques, radiomics is a quantitative approach to medical imaging, wherein a large number of features are extracted from medical images and used to detect clinically relevant information that is invisible to the naked eye. Preset algorithms are applied to two-dimensional or multi-section volumetric imaging data to derive a comprehensive representation of tissue shape, intensity, and texture, a representation that is well suited to artificial intelligence applications.

The prospect of utilizing information encoded in medical images has proven appealing to radiologists and other specialists in the interest of improving disease detection, increasing prognostic accuracy, and guiding timely treatment decisions. This is particularly relevant in the age of telemedicine, as automated radiomics-based models have the potential to provide objective data that increase the speed and accuracy of patient evaluation. Furthermore, the opportunity to process hundreds to thousands of features in large datasets allows for the discovery of previously unknown patterns or markers of particular disease processes[1]. Radiomics is especially well suited to such data mining through a population imaging approach.

Radiomics has proven particularly ripe for applications in oncologic imaging. Heterogeneity at the cellular level is known to be indicative of tumor aggressiveness[2], and numerous radiomics features from medical images are correlated with such heterogeneity[3]. It therefore comes as no surprise that radiomics features and radiomics-based models have been proposed to predict clinical variables such as survival and treatment response in oncologic subfields ranging from non-small cell lung cancer[4] to squamous cell carcinoma of the head and neck[5]. Radiomics-based models have also been used to predict specific clinical events, such as metastasis of a primary lesion[6, 7], or to indicate specific genetic profiles[8, 9].

The range of radiomics applications extends far beyond oncologic imaging, however, with recent studies utilizing radiomics features to predict prognosis after intracerebral hemorrhage[10], classify the etiology of liver cirrhosis[11], and predict development of immunotherapy-induced pneumonitis[12], to name a few examples. Imaging modalities utilized in such recent works include MRI, CT, PET, SPECT, and even ultrasound. Given its relevance to such a wide range of imaging modalities, medical specialties, and clinical problems, radiomics research has grown rapidly throughout the past decade, with a publication growth rate of 178% between 2013 and 2018[13], a trend that continues. As radiomics becomes increasingly prevalent across medical specialties, it is important for physicians to understand the field’s principles, methodologies, and applications. In the present work, we provide a step-by-step description of the radiomics workflow and discuss the future of this analytic technique, its potential, and its limitations.

Radiomics Workflow

Image Acquisition and Processing

The first step in all radiomics research is image acquisition. CT, MRI, and PET imaging are among the most commonly utilized modalities, though SPECT, ultrasound[14], plain films[15], and other modalities have also been utilized.

Once a sufficiently large dataset of images is collected from the target patient population, the region of interest (ROI, for two-dimensional data) or volume of interest (VOI, for three-dimensional data) must be segmented. Manual segmentation by qualified physicians or researchers is possible, though this approach is time consuming and introduces observer bias. Intra- and inter-observer reproducibility must therefore be measured and reported, and non-reproducible radiomics features excluded.
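One common way to quantify the inter-observer reproducibility described above is the intraclass correlation coefficient (ICC) of each feature across observers. The minimal sketch below, which uses only the Python standard library, computes ICC(2,1) (two-way random effects, absolute agreement, single measurement, in the Shrout and Fleiss notation) from a table with one row per lesion and one column per observer, and keeps features at or above a cutoff. The 0.75 cutoff, the feature names, and the values are hypothetical; studies differ in which ICC form and threshold they adopt.

```python
from typing import Dict, List, Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n x k table: one row per lesion (subject),
    one column per observer (or per repeated segmentation).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), observers (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reproducible_features(
    features: Dict[str, List[List[float]]], threshold: float = 0.75
) -> List[str]:
    """Keep features whose inter-observer ICC(2,1) meets the (assumed) threshold."""
    return [name for name, table in features.items() if icc_2_1(table) >= threshold]


if __name__ == "__main__":
    # Hypothetical values: 5 lesions, each segmented by 2 observers.
    features = {
        "firstorder_Mean": [[10.1, 10.0], [20.2, 19.9], [30.0, 30.3], [40.1, 39.8], [50.0, 50.2]],
        "glcm_Contrast": [[1.0, 4.0], [3.0, 1.0], [2.0, 5.0], [5.0, 2.0], [4.0, 3.0]],
    }
    for name, table in features.items():
        print(f"{name}: ICC = {icc_2_1(table):.3f}")
    print("Retained:", reproducible_features(features))
```

In this toy example the mean intensity agrees closely between observers and is retained, while the contrast values disagree and yield a negative ICC, so that feature is excluded.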

To improve speed and reproducibility, automated and semi-automated segmentation tools have become increasingly popular. Both open-source and commercial segmentation tools are available, including 3D Slicer[16], FreeSurfer[17], ITK-SNAP[18], and ImageJ[19], as well as deep learning-based algorithms such as U-Net[20] and iW-Net[21].
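When an automated segmentation is compared with a manual reference, or two observers' masks are compared with each other, overlap is often summarized with the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|). The minimal sketch below represents each binary mask as a set of voxel coordinates; the nested-list masks are hypothetical stand-ins for arrays exported from a segmentation tool, and treating two empty masks as perfect agreement is a convention chosen here for illustration.

```python
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def mask_to_voxels(mask: Iterable[Iterable[Iterable[int]]]) -> Set[Voxel]:
    """Convert a nested z/y/x list of 0/1 labels to a set of foreground voxel indices."""
    return {
        (z, y, x)
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, label in enumerate(row)
        if label
    }


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A & B| / (|A| + |B|); 1.0 means identical masks."""
    if not a and not b:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    manual = [[[0, 1, 1], [0, 1, 1], [0, 0, 0]]]  # hypothetical 1 x 3 x 3 reference mask
    automated = [[[0, 1, 1], [0, 1, 0], [0, 1, 0]]]  # hypothetical automated mask
    print(f"Dice = {dice(mask_to_voxels(manual), mask_to_voxels(automated)):.3f}")  # Dice = 0.750
```

Here the two masks share 3 of their 4 voxels each, giving 2 × 3 / (4 + 4) = 0.75.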

After segmentation, the imaging data should be processed to minimize inconsistencies before radiomics features are extracted. This is critical for the reproducibility of radiomics-based models, particularly across imaging protocols, scanners, and institutions, and the processing parameters should therefore be reported[22, 23]. Processing commonly includes interpolation to isotropic voxel spacing, range re-segmentation, intensity outlier filtering, and discretization of voxel intensities. Automated image processing is supported by many of the software tools listed above and is commonly achieved with open-source packages such as pyRadiomics[24], which reads its processing settings from a parameter file that can be shared across software platforms and Python frameworks.
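The intensity-processing steps listed above can be sketched on a flat list of ROI intensities. The example below, a standard-library sketch rather than any package's implementation, applies range re-segmentation, removes outliers beyond the mean ± 3 standard deviations, and discretizes with a fixed bin width using the rule floor((x − x_min) / w) + 1, where x_min is the lower bound of the re-segmentation range. The soft-tissue window of −150 to 250 HU, the 25 HU bin width, the order of the steps, and the sample intensities are all assumptions; interpolation to isotropic spacing is omitted. pyRadiomics exposes comparable options in its parameter file (for example `binWidth`, `resegmentRange`, and `resampledPixelSpacing`), but its bin-edge conventions may differ, so bin indices need not match its output exactly.

```python
import math
from statistics import mean, pstdev
from typing import List


def resegment_range(values: List[float], low: float, high: float) -> List[float]:
    """Range re-segmentation: keep only ROI intensities inside [low, high]."""
    return [v for v in values if low <= v <= high]


def filter_outliers(values: List[float], n_sd: float = 3.0) -> List[float]:
    """Intensity outlier filtering: drop values outside mean +/- n_sd * SD."""
    mu, sd = mean(values), pstdev(values)
    return [v for v in values if mu - n_sd * sd <= v <= mu + n_sd * sd]


def discretize_fixed_width(values: List[float], bin_width: float, x_min: float) -> List[int]:
    """Fixed-bin-width discretization: bin index = floor((x - x_min) / bin_width) + 1."""
    return [math.floor((v - x_min) / bin_width) + 1 for v in values]


def preprocess(values: List[float], low: float, high: float, bin_width: float) -> List[int]:
    """Range re-segmentation, then outlier filtering, then discretization (assumed order)."""
    in_range = resegment_range(values, low, high)
    kept = filter_outliers(in_range)
    return discretize_fixed_width(kept, bin_width, x_min=low)


if __name__ == "__main__":
    # Hypothetical CT intensities (HU) inside a VOI, including air and a metal artifact.
    roi_hu = [-1000.0, -120.0, -40.0, 10.0, 35.0, 60.0, 80.0, 2000.0]
    # Assumed soft-tissue window [-150, 250] HU and a 25 HU bin width.
    print(preprocess(roi_hu, low=-150.0, high=250.0, bin_width=25.0))  # [2, 5, 7, 8, 9, 10]
```

Reporting the window, bin width, and outlier rule alongside results, as the text recommends, is what allows another group to reproduce these bin assignments.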

Radiomics Feature Extraction

In the feature extraction step, hundreds to thousands of radiomics features per ROI/VOI are quantified based on preset algorithms. To increase reproducibility and facilitate validation of radiomics studies, researchers most commonly adhere to the guidelines set forth by the Image Biomarker Standardization Initiative (IBSI) in 2020[25]. Radiomics features comprise several classes, most commonly histogram, texture, model-based, transform-based, and shape-based features. Common first-order (histogram) and texture features are listed in Figure 2. Feature extraction is typically run as a scripted function in Python or from the command line on Linux platforms.
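To make the first-order (histogram) class concrete, the sketch below computes a handful of such features for one ROI from its raw intensities and from the same voxels after discretization. The feature names mirror common pyRadiomics first-order names, but the formulas are simplified assumptions for illustration: population moments are used, entropy omits any small stabilizing constant, and kurtosis is reported without subtracting 3. Definitions differ between software packages and the IBSI on details like these, so the documentation of the chosen tool remains the reference.

```python
import math
from collections import Counter
from typing import Dict, List


def first_order_features(values: List[float], bins: List[int]) -> Dict[str, float]:
    """A few first-order (histogram) features for one ROI.

    `values`: raw ROI intensities; `bins`: the same voxels after discretization.
    """
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # population variance
    sd = math.sqrt(var)
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    probs = [c / n for c in Counter(bins).values()]  # discretized histogram
    return {
        "Mean": mu,
        "Variance": var,
        "Skewness": m3 / sd ** 3 if sd else 0.0,
        "Kurtosis": m4 / var ** 2 if var else 0.0,  # not excess kurtosis (no -3)
        "Energy": sum(v ** 2 for v in values),
        "Entropy": -sum(p * math.log2(p) for p in probs),
        "Uniformity": sum(p ** 2 for p in probs),
    }


if __name__ == "__main__":
    # Hypothetical ROI intensities (HU) and their fixed-bin-width indices.
    hu = [-120.0, -40.0, 10.0, 35.0, 60.0, 80.0]
    bins = [2, 5, 7, 8, 9, 10]
    for name, value in first_order_features(hu, bins).items():
        print(f"{name}: {value:.3f}")
```

Because entropy and uniformity depend on the discretized histogram, their values change with the bin width, which is one reason the processing parameters must be reported.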

Figure 2. Commonly extracted first-order and texture radiomics features.

A complete list of radiomics features is described in van Griethuysen et al., 2017[24], and feature definitions are given in the pyRadiomics documentation.
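Texture features such as those in Figure 2 are derived from matrices that count how gray levels co-occur in space. The minimal sketch below builds a symmetric, normalized gray-level co-occurrence matrix (GLCM) for a single pixel offset in a 2D discretized image and derives three features whose names follow pyRadiomics (Contrast, JointEnergy, Idm). The single horizontal offset, the treatment of the whole array as the ROI, and the toy image are simplifying assumptions; packages such as pyRadiomics typically aggregate over several directions and, in 3D, over neighboring slices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def glcm(image: List[List[int]], dx: int = 1, dy: int = 0) -> Dict[Pair, float]:
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset.

    `image` holds discretized gray levels; the whole 2D array is treated as the ROI.
    """
    counts: Dict[Pair, int] = defaultdict(int)
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                i, j = image[y][x], image[y2][x2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both directions to make the matrix symmetric
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def glcm_features(p: Dict[Pair, float]) -> Dict[str, float]:
    """Contrast, joint energy, and inverse difference moment from a normalized GLCM."""
    return {
        "Contrast": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "JointEnergy": sum(v ** 2 for v in p.values()),
        "Idm": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
    }


if __name__ == "__main__":
    image = [[1, 1, 2], [1, 2, 2]]  # hypothetical discretized 2 x 3 ROI
    print(glcm_features(glcm(image)))
```

A perfectly uniform region yields zero contrast and maximal joint energy and Idm, whereas heterogeneous regions raise contrast, which is how such features capture the tissue heterogeneity discussed in the Background.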

Feature Selection and Dimensionality Reduction

Though hundreds to thousands of radiomics features are extracted, only a subset will contribute to the radiomics signature of a particular disease process or research question. In the feature selection step, these most relevant features must be identified and redundant variables removed: features that are redundant (highly correlated with other features) or non-reproducible are combined or excluded. General guidelines for the assessment of feature reproducibility and robustness are described in Sullivan et al., 2015[26]. To this end, and to identify relevant features, researchers typically utilize established feature selection or dimensionality reduction algorithms. Common examples include hierarchical clustering[27], principal component analysis[28], least absolute shrinkage and selection operator (LASSO) regularized logistic regression, and maximum relevance minimum redundancy (mRMR) filtering[29]. These methods can be implemented in R, for example through the ‘stats’ (hierarchical clustering, principal component analysis), ‘glmnet’[30] (LASSO), and ‘mRMRe’[29] (mRMR) packages. Some of these approaches, such as hierarchical clustering and principal component analysis, reduce redundancy without incorporating knowledge about the predicted or ‘target’ variable(s), whereas LASSO and mRMR do use the target variable. When knowledge about the predicted variables is considered during feature selection, researchers must exercise caution to avoid overfitting, that is, tailoring models so specifically to a given dataset that they lose generalizability to unseen data.
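A simple form of the unsupervised redundancy reduction described above is a correlation filter: walk through the features in order and keep each one only if its absolute Pearson correlation with every already-kept feature stays at or below a cutoff. The sketch below is a standard-library Python illustration of that idea, not the glmnet or mRMRe implementations; the 0.9 cutoff, the greedy keep-first order, and the feature values are assumptions. Because it ignores the outcome, it does not by itself risk target leakage, whereas supervised selection such as LASSO should be fitted on training data only.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either feature is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features: Dict[str, List[float]], cutoff: float = 0.9) -> List[str]:
    """Greedy redundancy filter: keep a feature only if |r| <= cutoff vs. every kept feature."""
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical feature values for 4 patients.
    features = {
        "firstorder_Mean": [1.0, 2.0, 3.0, 4.0],
        "firstorder_Median": [2.0, 4.0, 6.0, 8.1],  # nearly collinear with the mean
        "glcm_Contrast": [4.0, 1.0, 3.0, 2.0],
    }
    print(drop_correlated(features))  # ['firstorder_Mean', 'glcm_Contrast']
```

The result depends on feature order, which is why some pipelines instead keep, from each correlated group, the feature most reproducible across observers.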

Model Construction

After selection, the radiomics features are ready to be fed into a modeling pipeline that predicts the pathology of interest. Machine learning is the most common approach to radiomics-based prediction, and this subset of artificial intelligence research has grown rapidly along with the increasing availability of high-dimensional data in recent years. The basic premise of machine learning is to train a model by showing it examples of input-output behavior, such that the system learns the desired relationship without explicit manual programming. Numerous machine learning classifiers have been developed to this end, and researchers typically search a space of candidate algorithms and settings for optimal performance across repeated iterations of training and testing. Data representations utilized by machine learning classifiers vary widely, from decision trees to mathematical functions. Specific popular classifiers include extreme gradient boosting[31], random forest[32], naïve Bayes, and support vector machine, to name a few. There is no clear ‘best’ classifier for any given research question or dataset; rather, multiple machine learning classifiers, feature selection methods, and combinations thereof can and should be assessed for their prediction accuracy.
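As a small illustration of training and testing across repeated iterations, the sketch below implements a Gaussian naïve Bayes classifier (one of the classifiers named above) in standard-library Python and estimates its accuracy with k-fold cross-validation. It is a teaching sketch under stated assumptions, not a recommended production pipeline: folds are assigned by index modulo k rather than by shuffled, stratified sampling, a small variance floor of 1e-9 is added to avoid division by zero, and the feature values are hypothetical. Libraries such as scikit-learn provide maintained implementations (for example `GaussianNB` and `RandomForestClassifier`); in any pipeline, supervised feature selection belongs inside each training fold.

```python
import math
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Model = Dict[int, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: List[List[float]], y: List[int]) -> Model:
    """Per-class prior, feature means, and feature variances (Gaussian naive Bayes)."""
    model: Model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),
            [mean(c) for c in cols],
            [pvariance(c) + 1e-9 for c in cols],  # small floor avoids zero variance
        )
    return model


def predict_gaussian_nb(model: Model, x: List[float]) -> int:
    """Return the class with the highest log posterior (features assumed independent)."""
    def log_posterior(label: int) -> float:
        prior, mus, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, mus, variances)
        )
    return max(model, key=log_posterior)


def k_fold_accuracy(X: List[List[float]], y: List[int], k: int = 5) -> float:
    """Mean accuracy over k folds; fold i holds the samples whose index % k == i."""
    accuracies = []
    for i in range(k):
        train = [j for j in range(len(X)) if j % k != i]
        test = [j for j in range(len(X)) if j % k == i]
        model = fit_gaussian_nb([X[j] for j in train], [y[j] for j in train])
        correct = sum(predict_gaussian_nb(model, X[j]) == y[j] for j in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Hypothetical two-feature radiomics signatures for 10 patients (0 = benign, 1 = malignant).
    X = [[0.0, 0.5], [0.2, 0.0], [0.1, 0.3], [0.4, 0.1], [0.3, 0.4],
         [10.0, 10.2], [10.3, 9.9], [9.8, 10.1], [10.1, 10.4], [9.9, 9.7]]
    y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(f"5-fold accuracy: {k_fold_accuracy(X, y):.2f}")
```

Swapping `fit_gaussian_nb` and `predict_gaussian_nb` for another classifier while keeping the same folds is the kind of side-by-side comparison the text recommends.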

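Prediction accuracy of radiomics-based models is most commonly assessed by the area under the receiver operating characteristic (ROC) curve (AUC). The ROC curve plots the true-positive rate (sensitivity) against the false-positive rate (1 − specificity) across all classification thresholds, and therefore incorporates the true positive, true negative, false positive, and false negative predictions made by the radiomics-based model; an AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination. Predicted or target variables vary widely in nature, with recent work targeting categorical, ordinal, binary, and other types of outcome measures; the ROC AUC applies directly to binary outcomes, whereas other outcome types require extensions (such as one-versus-rest analysis) or different metrics.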
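The relationship between the ROC curve and the AUC can be checked in a few lines. The standard-library sketch below builds ROC points by sweeping a threshold over the model's predicted scores, integrates them with the trapezoidal rule, and compares the result with the equivalent rank-based definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The labels and scores are hypothetical; in practice, functions such as scikit-learn's `roc_curve` and `roc_auc_score` are commonly used.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def roc_points(labels: List[int], scores: List[float]) -> List[Point]:
    """(false-positive rate, true-positive rate) at each distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for lab, s in zip(labels, scores) if s >= t and lab == 1)
        fp = sum(1 for lab, s in zip(labels, scores) if s >= t and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_rank(labels: List[int], scores: List[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # hypothetical binary outcome
    scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical model probabilities
    print(auc_trapezoid(roc_points(labels, scores)), auc_rank(labels, scores))  # 0.75 0.75
```

Both routes give the same value, which shows why the AUC can be read as a threshold-free measure of how well the model ranks positive cases above negative ones.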

A final, crucial step in model performance analysis is independent validation. In this step, the radiomics-based model is applied to datasets on which it was not trained, including datasets collected at other institutions (external validation). This is critical to demonstrate a model’s generalizability, robustness, and potential for wider clinical utility.

Limitations and Future Directions

Despite the increasing prevalence and promise of radiomics-based studies, researchers should be aware of challenges and limitations facing the field. Technical limitations related to image acquisition and processing can hinder reproducibility of radiomics features and the resulting models. While data acquisition and analysis should ideally be standardized and optimized to maximize reproducibility, there remains considerable variation in acquisition parameters used in real-world clinical settings[33]. Furthermore, image quality and characteristics can be affected by patient motion, the specific scanner used, and institution-specific protocols. To counteract such inconsistencies, researchers should continue to exclude poor-quality images and non-reproducible radiomics features. Public phantom-based datasets have been proposed to assess the effects of acquisition parameters and site-specific protocols on radiomics features and to eliminate features that are not robust, though the relationship between phantom- and human-based features remains under investigation[34]. Independent/external validation of models, particularly on datasets collected at other institutions, continues to be critical in assessing generalizability of radiomics-based models. Inter-institutional data sharing thus remains an important aspect of producing clinically relevant radiomics-based research.
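One simple way to use repeated phantom scans for the robustness screening described above is to compute each feature's coefficient of variation (CV, the standard deviation divided by the absolute mean) across scanners or protocols and exclude features whose CV exceeds a cutoff. The standard-library sketch below assumes a 10% cutoff and hypothetical phantom measurements; published phantom studies use a variety of robustness metrics and thresholds, so both the metric and the cutoff here are illustrative choices.

```python
from statistics import mean, pstdev
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """CV = SD / |mean|, as a percentage; infinite if the mean is zero."""
    m = mean(values)
    return 100.0 * pstdev(values) / abs(m) if m else float("inf")


def robust_features(phantom: Dict[str, List[float]], max_cv: float = 10.0) -> List[str]:
    """Keep features whose CV across phantom scans is at most `max_cv` percent (assumed cutoff)."""
    return [name for name, vals in phantom.items() if coefficient_of_variation(vals) <= max_cv]


if __name__ == "__main__":
    # Hypothetical values of two features from the same phantom on four scanners.
    phantom = {
        "firstorder_Mean": [100.0, 102.0, 98.0, 101.0],
        "glszm_SmallAreaEmphasis": [10.0, 20.0, 5.0, 15.0],
    }
    for name, vals in phantom.items():
        print(f"{name}: CV = {coefficient_of_variation(vals):.1f}%")
    print("Robust:", robust_features(phantom))
```

Features that pass such a screen on phantoms still need confirmation on patient data, given the open question the text raises about how phantom- and human-based features relate.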

In addition, collaboration between physicians and computer scientists is imperative to conducting high-quality work that answers important questions for a given biomedical field. Physicians and computer scientists have not historically shared a common language, but interdisciplinary communication and collaborative research continue to progress as open-source code and data become more prevalent and as radiomics itself becomes more widespread. We hope that articles such as this one will help familiarize physicians with radiomics techniques and bridge the computational gap.

Challenges acknowledged, radiomics has the potential for considerable impact. Particularly in the age of telehealth, radiomics research can provide automated, objective, time- and cost-efficient tools for treatment triage. Similarly, radiomics can increase radiologists’ efficiency and effectiveness by flagging studies that demonstrate highly important or ‘don’t miss’ findings. Radiomics-based models may also provide a means by which a patient’s disease processes can be detected at an early stage, or a method by which pathophysiology can be indicated without invasive, costly procedures. As researchers and physicians continue to improve standardization of the radiomics workflow and create robust radiomics-based models, clinically relevant applications of this methodology will continue to enhance clinical decision making and improve patient care.

Figure 1. Radiomics Workflow

Declaration of Competing Interest

Dr. Sanelli received funding support from the NINDS R56NS114275, Siemens Healthineers, and the Harvey L. Neiman Health Policy Institute. Dr. Aboian received funding support from NCATS/NIH KL2 TR001862 and NIH roadmap for Medical Research. Dr. Payabvash is supported by NIH/NINDS K23NS118056, Doris Duke Charitable Foundation (2020097), and Foundation of American Society of Neuroradiology.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Volzke H, et al., Population imaging as valuable tool for personalized medicine. Clin Pharmacol Ther, 2012. 92(4): p. 422–4.
2. Moon SH, et al., Correlations between metabolic texture features, genetic heterogeneity, and mutation burden in patients with lung cancer. Eur J Nucl Med Mol Imaging, 2019. 46(2): p. 446–454.
3. Morris LG, et al., Pan-cancer analysis of intratumor heterogeneity as a prognostic determinant of survival. Oncotarget, 2016. 7(9): p. 10051–63.
4. Huang Y, et al., Radiomics Signature: A Potential Biomarker for the Prediction of Disease-Free Survival in Early-Stage (I or II) Non-Small Cell Lung Cancer. Radiology, 2016. 281(3): p. 947–957.
5. Haider SP, et al., Applications of radiomics in precision diagnosis, prognostication and treatment planning of head and neck squamous cell carcinomas. Cancers Head Neck, 2020. 5: p. 6.
6. Huang YQ, et al., Development and Validation of a Radiomics Nomogram for Preoperative Prediction of Lymph Node Metastasis in Colorectal Cancer. J Clin Oncol, 2016. 34(18): p. 2157–64.
7. Ji GW, et al., A radiomics approach to predict lymph node metastasis and clinical outcome of intrahepatic cholangiocarcinoma. Eur Radiol, 2019. 29(7): p. 3725–3735.
8. Zhou M, et al., Non-Small Cell Lung Cancer Radiogenomics Map Identifies Relationships between Molecular and Imaging Phenotypes with Prognostic Implications. Radiology, 2018. 286(1): p. 307–315.
9. Gevaert O, et al., Non-small cell lung cancer: identifying prognostic imaging biomarkers by leveraging public gene expression microarray data--methods and preliminary results. Radiology, 2012. 264(2): p. 387–96.
10. Haider SP, et al., Admission computed tomography radiomic signatures outperform hematoma volume in predicting baseline clinical severity and functional outcome in the ATACH-2 trial intracerebral hemorrhage population. Eur J Neurol, 2021. 28(9): p. 2989–3000.
11. Elkilany A, et al., A radiomics-based model to classify the etiology of liver cirrhosis using gadoxetic acid-enhanced MRI. Sci Rep, 2021. 11(1): p. 10778.
12. Colen RR, et al., Radiomics to predict immunotherapy-induced pneumonitis: proof of concept. Invest New Drugs, 2018. 36(4): p. 601–607.
13. Song J, et al., A review of original articles published in the emerging field of radiomics. Eur J Radiol, 2020. 127: p. 108991.
14. Hu HT, et al., Ultrasound-based radiomics score: a potential biomarker for the prediction of microvascular invasion in hepatocellular carcinoma. Eur Radiol, 2019. 29(6): p. 2890–2901.
15. Tamal M, et al., An integrated framework with machine learning and radiomics for accurate and rapid early diagnosis of COVID-19 from Chest X-ray. Expert Syst Appl, 2021. 180: p. 115152.
16. Fedorov A, et al., 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magn Reson Imaging, 2012. 30(9): p. 1323–41.
17. Fischl B, FreeSurfer. Neuroimage, 2012. 62(2): p. 774–81.
18. Yushkevich PA, Yang G, and Gerig G, ITK-SNAP: An interactive tool for semi-automatic segmentation of multi-modality biomedical images. Annu Int Conf IEEE Eng Med Biol Soc, 2016. 2016: p. 3342–3345.
19. Girish V and Vijayalakshmi A, Affordable image analysis using NIH Image/ImageJ. Indian J Cancer, 2004. 41(1): p. 47.
20. Falk T, et al., U-Net: deep learning for cell counting, detection, and morphometry. Nat Methods, 2019. 16(1): p. 67–70.
21. Aresta G, et al., iW-Net: an automatic and minimalistic interactive lung nodule segmentation deep network. Sci Rep, 2019. 9(1): p. 11591.
22. Bailly C, et al., Revisiting the Robustness of PET-Based Textural Features in the Context of Multi-Centric Trials. PLoS One, 2016. 11(7): p. e0159984.
23. Shafiq-Ul-Hassan M, et al., Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels. Med Phys, 2017. 44(3): p. 1050–1062.
24. van Griethuysen JJM, et al., Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res, 2017. 77(21): p. e104–e107.
25. Zwanenburg A, et al., The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology, 2020. 295(2): p. 328–338.
26. Sullivan DC, et al., Metrology Standards for Quantitative Imaging Biomarkers. Radiology, 2015. 277(3): p. 813–25.
27. Loewenstein Y, et al., Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Bioinformatics, 2008. 24(13): p. i41–9.
28. Labbe DR, et al., Feature selection using a principal component analysis of the kinematics of the pivot shift phenomenon. J Biomech, 2010. 43(16): p. 3080–4.
29. De Jay N, et al., mRMRe: an R package for parallelized mRMR ensemble feature selection. Bioinformatics, 2013. 29(18): p. 2365–8.
30. Friedman J, Hastie T, and Tibshirani R, Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw, 2010. 33(1): p. 1–22.
31. Chen TQ and Guestrin C, XGBoost: A Scalable Tree Boosting System. KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016: p. 785–794.
32. Belgiu M and Dragut L, Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 2016. 114: p. 24–31.
33. Rizzo S, et al., Radiomics: the facts and the challenges of image analysis. Eur Radiol Exp, 2018. 2(1): p. 36.
34. Kalendralis P, et al., Multicenter CT phantoms public dataset for radiomics reproducibility tests. Med Phys, 2019. 46(3): p. 1512–1518.

Articles from Seminars in ultrasound, CT, and MR are provided here courtesy of Health Research Alliance manuscript submission
