Author manuscript; available in PMC: 2018 Nov 1.
Published in final edited form as: Cancer Res. 2017 Nov 1;77(21):e104–e107. doi: 10.1158/0008-5472.CAN-17-0339

Computational Radiomics System to Decode the Radiographic Phenotype

Joost JM van Griethuysen 1,3,4, Andriy Fedorov 2, Chintan Parmar 1, Ahmed Hosny 1, Nicole Aucoin 2, Vivek Narayan 1, Regina GH Beets-Tan 3,4, Jean-Christophe Fillion-Robin 5, Steve Pieper 6, Hugo JWL Aerts 1,2
PMCID: PMC5672828  NIHMSID: NIHMS894056  PMID: 29092951

Abstract

Radiomics aims to quantify phenotypic characteristics on medical imaging through the use of automated algorithms. Radiomic artificial intelligence (AI) technology, based either on engineered hard-coded algorithms or on deep learning methods, can be used to develop non-invasive imaging-based biomarkers. However, the lack of standardized algorithm definitions and image processing severely hampers the reproducibility and comparability of results. To address this issue, we developed PyRadiomics, a flexible open-source platform capable of extracting a large panel of engineered features from medical images. PyRadiomics is implemented in Python and can be used standalone or in combination with 3D Slicer. Here, we discuss the workflow and architecture of PyRadiomics and demonstrate its application in characterizing lung lesions. Source code, documentation, and examples are publicly available at www.radiomics.io. With this platform, we aim to establish a reference standard for radiomic analyses, provide a tested and maintained resource, and grow the community of radiomic developers addressing critical needs in cancer research.

Keywords: Python, radiomics, PyRadiomics, feature extraction, medical image analysis

Introduction

Medical imaging is considered one of the top innovations that have transformed clinical cancer care, as it significantly changed how physicians measure, manage, diagnose, and treat cancer. Imaging can noninvasively visualize the radiographic phenotype of a tumor before, during, and after treatment. Radiomics refers to the comprehensive and automated quantification of this radiographic phenotype using data-characterization algorithms (1-3). Radiomics can quantify a large panel of phenotypic characteristics, such as shape and texture, potentially reflecting biologic properties like intra- and inter-tumor heterogeneity (4).

Radiomic technologies, based on artificial intelligence (AI) methods, are defined either using engineered hard-coded features, which often rely on expert domain knowledge, or using deep learning methods, which can learn feature representations automatically from data (5). The potential of radiomics has been shown across multiple tumor types, including brain, head-and-neck, cervical, and lung tumors. Furthermore, these data, extracted from MRI, PET, or CT images, have been associated with several clinical outcomes and hence potentially provide complementary information for decision support in clinical oncology (1).

However, there is a lack of standardization of both feature definitions and image processing, which has been shown to have a substantial impact on the reliability of radiomic data (6-8). Furthermore, many studies use in-house developed software that is often not shared publicly, which makes reproducing and comparing results difficult.

To address this issue, we developed a comprehensive open-source platform, called PyRadiomics, which enables processing and extraction of radiomic features from medical image data using a large panel of engineered hard-coded feature algorithms. PyRadiomics provides a flexible analysis platform with both a simple and convenient front-end interface in 3D Slicer, a free open-source platform for medical image computing (9), and a back-end interface allowing automation in data processing, feature definition, and batch handling. PyRadiomics is implemented in Python, a language that has established itself as a popular open-source language for scientific computing, and can be installed on any system.

Here, we discuss the workflow and architecture of PyRadiomics and demonstrate its application in characterizing benign and malignant lung lesions. Source code, documentation, instruction videos (see Videos 1 and 2), and examples are available at www.radiomics.io/pyradiomics.html. With this resource, we aim to establish a reference standard for radiomic analyses, provide a tested and maintained open-source platform, and raise awareness among scientists of the potential of radiomics technologies.

Platform

The PyRadiomics platform can extract radiomic data from medical imaging (such as CT, PET, MRI) using four main steps: I) Loading and preprocessing of the image and segmentation maps, II) Application of enabled filters, III) Calculation of features using the different feature classes, and IV) Returning results. See Figure 1A for an illustration of this process.
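As a minimal sketch of these four steps in Python, an extraction with default settings could look as follows; the file paths are placeholders, and depending on the installed PyRadiomics version the extractor class may be named RadiomicsFeaturesExtractor instead of RadiomicsFeatureExtractor.

    from radiomics import featureextractor

    # Placeholder paths; any image format readable by SimpleITK (NRRD, NIfTI, MHA, ...) can be used.
    image_path = 'lung_ct.nrrd'
    mask_path = 'lung_lesion_label.nrrd'

    # Instantiate the extractor with default settings and run the full pipeline (steps I-IV).
    extractor = featureextractor.RadiomicsFeatureExtractor()
    result = extractor.execute(image_path, mask_path)  # ordered dictionary: feature name -> value

    for name, value in result.items():
        print(name, value)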

Figure 1. (A) Overview of the PyRadiomics process. First, medical images are segmented; second, features are extracted using the PyRadiomics platform; and third, features are analyzed for associations with clinical or biologic factors. (B) Stability of radiomic features under variation in manual segmentations by expert radiologists. (C) Heatmap showing expression values of radiomic features (rows) across 429 lesions (columns); note the four subtypes identified from the expression values and their associations with malignancy. (D) Area under the curve (AUC) showing the performance of the multivariate biomarker in predicting nodule malignancy.

I) Loading and preprocessing

In this step, medical images (e.g., CT, PET, MRI) and segmentation maps (e.g., delineations performed by a radiologist) are loaded into the platform. The large majority of image handling is done using SimpleITK, which provides a streamlined interface to the widely used open-source Insight Toolkit (ITK) (10). This enables PyRadiomics to support a wide variety of image formats, while also ensuring that much of the low-level functionality and basic image processing is thoroughly tested and maintained. For texture and shape features, several resampling options are included to ensure isotropic voxels with equal distances between neighboring voxels in all directions.
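As a sketch of this step, resampling and discretization can be controlled through extractor settings; the values below are illustrative examples rather than recommendations.

    from radiomics import featureextractor

    # Illustrative preprocessing settings (names follow the PyRadiomics customization documentation).
    settings = {
        'resampledPixelSpacing': [1.0, 1.0, 1.0],  # resample to isotropic 1 mm voxels
        'interpolator': 'sitkBSpline',             # SimpleITK interpolator used for resampling
        'binWidth': 25,                            # gray-level discretization width for texture features
        'label': 1,                                # label value in the segmentation map to analyze
    }
    extractor = featureextractor.RadiomicsFeatureExtractor(**settings)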

II) Filtering

Features can be calculated on the original image or on images pre-processed using a choice of several built-in filters. These include wavelet and Laplacian of Gaussian (LoG) filters, as well as several simple filters, including square, square root, logarithm, and exponential filters. For the wavelet and LoG filters, the platform makes use of PyWavelets and SimpleITK, respectively. The remaining filters are implemented using NumPy.
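Filters (image types) can be enabled per extraction; a short sketch, with illustrative sigma values for the LoG filter:

    # Enable derived images in addition to the original image; sigma values are illustrative.
    extractor.enableImageTypeByName('LoG', customArgs={'sigma': [1.0, 3.0, 5.0]})  # Laplacian of Gaussian
    extractor.enableImageTypeByName('Wavelet')                                     # wavelet decompositions
    extractor.enableImageTypeByName('Square')                                      # simple intensity transform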

III) Feature calculation

The platform contains five feature classes: a class for first-order statistics, a class for shape descriptors, and the texture classes Gray Level Co-occurrence Matrix (11), Gray Level Run Length Matrix (12,13), and Gray Level Size Zone Matrix (14). All first-order statistics and texture classes can be used for feature extraction from both filtered and unfiltered images. Shape descriptors are independent of intensity values and therefore can only be extracted from unfiltered images. Feature extraction is supported for both single-slice (2D) and whole-volume (3D) segmentations.
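Feature classes, or individual features within a class, can be enabled selectively; a sketch:

    # Restrict extraction to specific feature classes ...
    extractor.disableAllFeatures()
    extractor.enableFeatureClassByName('firstorder')
    extractor.enableFeatureClassByName('shape')
    extractor.enableFeatureClassByName('glcm')

    # ... or enable individual features by name.
    extractor.enableFeaturesByName(firstorder=['Mean', 'Entropy'], glcm=['Contrast'])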

IV) Results

Calculated features are stored and returned in an ordered dictionary. Every feature is identified by a unique name consisting of the applied filter, the feature class, and the feature name. Besides the calculated features, this dictionary also contains additional information on the extraction, including the current version, applied filters, settings, and original image spacing.
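Continuing the first sketch above, the returned dictionary can be handled as follows; note that the prefix of the extraction-information keys differs between PyRadiomics versions ('general_info_' in early releases, 'diagnostics_' later).

    # Separate feature values from extraction metadata in the returned ordered dictionary.
    features = {k: v for k, v in result.items()
                if not k.startswith(('general_info_', 'diagnostics_'))}

    for name, value in features.items():
        # Feature names encode filter, feature class and feature, e.g. 'original_glcm_Contrast'.
        image_type, feature_class, feature_name = name.split('_')
        print(image_type, feature_class, feature_name, value)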

To enhance usability, PyRadiomics has a modular implementation, centered around the featureextractor module, which defines the feature extraction pipeline and handles interaction with the other modules in the platform. All feature classes are defined in separate modules and inherit from a common base feature class, providing a shared interface. Finally, the platform contains two helper modules: generalinfo, which provides the additional extraction information included in the returned result, and imageoperations, which implements the functions used during image preprocessing and filtering.

Aside from interactive use in Python scripts through the featureextractor module, PyRadiomics supports direct usage from the command line. Two scripts are available, pyradiomics and pyradiomicsbatch, for single-image and batch processing, respectively. For both scripts, an additional parameter file can be used to customize the extraction, and results can be directly imported into many statistical packages for analysis, including R and SPSS. Additionally, a convenient front-end interface for PyRadiomics is provided as the ‘radiomics’ extension within 3D Slicer. All code, including the Slicer extension, documentation, frequently asked questions, and instruction videos (see Videos 1 and 2), is available at www.radiomics.io/pyradiomics.html. Detailed descriptions of feature definitions, the dataset, and analyses can be found in the Supplementary Information.
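For programmatic batch processing, the same parameter-file mechanism can be used from Python and the results written to a flat table for import into statistical software; the paths and the parameter file name below are hypothetical.

    import pandas as pd
    from radiomics import featureextractor

    # A YAML parameter file can centralize settings, enabled filters and feature classes.
    extractor = featureextractor.RadiomicsFeatureExtractor('Params.yaml')  # hypothetical file name

    # Hypothetical batch of (case id, image, mask) entries.
    cases = [('case_001', 'ct_001.nrrd', 'mask_001.nrrd'),
             ('case_002', 'ct_002.nrrd', 'mask_002.nrrd')]

    rows = []
    for case_id, image, mask in cases:
        row = {'case': case_id}
        row.update(extractor.execute(image, mask))
        rows.append(row)

    # One row per case, one column per feature; readable by R, SPSS and other packages.
    pd.DataFrame(rows).to_csv('radiomic_features.csv', index=False)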

Case Study

In a case study, we demonstrated an application of PyRadiomics for lung lesion characterization, discriminating between benign and malignant nodules. We used the publicly available cohort of the Lung Image Database Consortium (LIDC-IDRI) (15), which consists of diagnostic and lung cancer screening CT scans along with marked-up annotated lesions and per-lesion malignancy ratings (i.e., whether a nodule is benign or malignant) from experienced radiologists (Supplementary Methods 1). From 302 patients, we included 429 distinct lesions in our analysis, each with four volumetric segmentations and malignancy ratings. In total, 1,120 radiomic features (14 shape features, 19 first-order intensity statistics features, 60 texture features, 395 LoG features, and 632 wavelet features) were extracted from all four delineations of every lesion (Supplementary Methods 2-4).

To assess the effect of variations in the manual segmentations on radiomic feature values, we calculated the stability of each feature across the four segmentations performed by expert radiologists. Stability was quantified using the intraclass correlation coefficient (ICC; Figure 1B). High stability (ICC > 0.8; reported as median ± SD) was observed for LoG (ICC = 0.91 ± 0.11), first-order intensity statistics (ICC = 0.88 ± 0.13), and texture features (ICC = 0.91 ± 0.11), whereas shape (ICC = 0.60 ± 0.31) and wavelet (ICC = 0.63 ± 0.23) features showed moderate stability, indicating their sensitivity to delineation variability.
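The specific ICC form is detailed in the Supplementary Methods; purely as an illustration, a two-way random-effects, absolute-agreement, single-rater ICC(2,1) for one feature measured on n lesions by k segmentations could be computed as in the following sketch.

    import numpy as np

    def icc_2_1(x):
        """ICC(2,1): two-way random effects, absolute agreement, single rater.

        x is an (n lesions) x (k segmentations) array with the values of one radiomic feature.
        """
        x = np.asarray(x, dtype=float)
        n, k = x.shape
        grand_mean = x.mean()
        ss_rows = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()    # between lesions
        ss_cols = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()    # between segmentations
        ss_err = ((x - grand_mean) ** 2).sum() - ss_rows - ss_cols  # residual
        msr = ss_rows / (n - 1)
        msc = ss_cols / (k - 1)
        mse = ss_err / ((n - 1) * (k - 1))
        return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)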

Selecting all features with high stability (ICC > 0.8) resulted in 535 radiomic features (5 shape features, 14 first-order intensity statistics features, 48 texture features, 310 LoG features, and 158 wavelet features). Figure 1C displays unsupervised clustering of the standardized expression values of the 535 stable radiomic features (rows) in 429 nodules (columns). We observed four distinct clusters of lesions with similar expression values. Comparing these clusters with lesion malignancy status, we observed a significant difference between them (P = 2.56e-24, χ2 test). Of the samples in cluster S1 (n = 88), 92% (n = 81) were malignant, whereas 95% (n = 38) of the samples in cluster S2 (n = 40) were benign. For clusters S3 (n = 143) and S4 (n = 158), the proportion of malignant samples was 54% (n = 78) and 34% (n = 53), respectively. These results demonstrate associations between imaging-based subtypes and the malignancy status of lung lesions.
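The exact clustering procedure is given in the Supplementary Methods; as an illustrative sketch only, lesions could be grouped by hierarchical clustering of the standardized features and the resulting clusters compared against malignancy status with a χ2 test.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.stats import chi2_contingency

    def cluster_and_test(features, malignant, n_clusters=4):
        """features: (lesions x stable features) array; malignant: boolean array per lesion."""
        z = (features - features.mean(axis=0)) / features.std(axis=0)  # standardize each feature
        labels = fcluster(linkage(z, method='ward'), t=n_clusters, criterion='maxclust')
        # Contingency table of cluster membership versus malignancy status.
        table = np.array([[np.sum((labels == c) & malignant),
                           np.sum((labels == c) & ~malignant)]
                          for c in range(1, n_clusters + 1)])
        _, p_value, _, _ = chi2_contingency(table)
        return labels, p_value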

To evaluate the performance of a multivariate imaging biomarker, we divided the cohort into training (n = 214) and validation (n = 215) sets. Using Minimum Redundancy Maximum Relevance (mRMR), we selected 25 stable radiomic features from the training cohort (Supplementary Table 1). A multivariate biomarker was developed by fitting the selected features to a random forest classifier using the training data. The biomarker demonstrated strong and significant performance in characterizing lung nodules in the validation cohort (AUC = 0.79 [0.73-0.85], Noether test P = 4.12e-22; Figure 1D). More details on feature extraction and analysis methods are provided in the Supplementary Information.
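A minimal sketch of the classification step, assuming the 25 mRMR-selected features are already available as training and validation matrices; the hyperparameters below are illustrative and not those of the study.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score

    # X_train, X_valid: (patients x 25 selected features); y_train, y_valid: binary malignancy labels.
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(X_train, y_train)

    scores = clf.predict_proba(X_valid)[:, 1]  # predicted probability of malignancy
    print('Validation AUC:', roc_auc_score(y_valid, scores))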

Conclusion

PyRadiomics provides a flexible radiomic quantification platform, with a simple and convenient front-end interface in 3D Slicer as well as a back-end interface within Python allowing automation in data processing, feature definition, and batch handling. By providing a tested and maintained open-source radiomics platform, we aim to establish a reference standard for radiomic analyses, promote reproducible science within the quantitative imaging field, raise awareness of this platform among scientists to support their work, and provide a practical go-to resource. By doing so, we hope to grow the community of radiomic technology developers addressing critical needs in cancer research.

Supplementary Material


Acknowledgments

Financial Support: The authors acknowledge financial support from the Informatics Technology for Cancer Research (ITCR) program (NIH-USA U24CA194354) and the Quantitative Imaging Network (QIN) program (NIH-USA U01CA190234) of the National Cancer Institute (NCI) of the National Institutes of Health (NIH).

Footnotes

Conflicts of interest: None

References

1. Aerts HJWL. The Potential of Radiomic-Based Phenotyping in Precision Medicine: A Review. JAMA Oncol. 2016;2:1636–42. doi: 10.1001/jamaoncol.2016.2631.
2. Aerts HJWL, Velazquez ER, Leijenaar RTH, Parmar C, Grossmann P, Carvalho S, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun. 2014;5:4006. doi: 10.1038/ncomms5006.
3. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RGPM, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48:441–6. doi: 10.1016/j.ejca.2011.11.036.
4. Marusyk A, Almendro V, Polyak K. Intra-tumour heterogeneity: a looking glass for cancer? Nat Rev Cancer. 2012;12:323–34. doi: 10.1038/nrc3261.
5. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–44. doi: 10.1038/nature14539.
6. Yip SSF, Aerts HJWL. Applications and limitations of radiomics. Phys Med Biol. 2016;61:R150–66. doi: 10.1088/0031-9155/61/13/R150.
7. Orlhac F, Soussan M, Maisonobe JA, Garcia CA, Vanderlinden B, Buvat I. Tumor Texture Analysis in 18F-FDG PET: Relationships Between Texture Parameters, Histogram Indices, Standardized Uptake Values, Metabolic Volumes, and Total Lesion Glycolysis. J Nucl Med. 2014;55:414–22. doi: 10.2967/jnumed.113.129858.
8. Tixier F, Hatt M, Le Rest CC, Le Pogam A, Corcos L, Visvikis D. Reproducibility of tumor uptake heterogeneity characterization through textural feature analysis in 18F-FDG PET. J Nucl Med. 2012;53:693–700. doi: 10.2967/jnumed.111.099127.
9. Fedorov A, Beichel R, Kalpathy-Cramer J, Finet J, Fillion-Robin JC, Pujol S, et al. 3D Slicer as an Image Computing Platform for the Quantitative Imaging Network. Magn Reson Imaging. 2012;30:1323–41.
10. Johnson HJ, McCormick MM. The ITK Software Guide, Book 2: Design and Functionality. Fourth edition, updated for ITK version 4.10; 2016.
11. Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Transactions on Systems, Man and Cybernetics. 1973:610–21.
12. Galloway MM. Texture analysis using gray level run lengths. Computer Graphics and Image Processing. 1975;4:172–9.
13. Chu A, Sehgal CM, Greenleaf JF. Use of gray value distribution of run lengths for texture analysis. Pattern Recognit Lett. 1990;11:415–9.
14. Thibault G, Fertil B, Navarro C, Pereira S, Cau P, Levy N, et al. Texture Indexes and Gray Level Size Zone Matrix: Application to Cell Nuclei Classification. Pattern Recognition and Information Processing. 2009:140–5.
15. Armato SG 3rd, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys. 2011;38:915–31. doi: 10.1118/1.3528204.
