A Non-invasive Method to Diagnose Lung Adenocarcinoma

Mengmeng Yan; Weidong Wang

doi:10.3389/fonc.2020.00602

2020 Apr 29;10:602. doi: 10.3389/fonc.2020.00602

PMCID: PMC7200977 PMID: 32411600

Abstract

Purpose: To identify CT radiomics features that differentiate lung adenocarcinoma from other lung cancer histological types.

Methods: This was a historical cohort study that included three independent lung cancer cohorts. One cohort was used to evaluate the stability of radiomics features, one was used for feature selection, and the last was used to construct and evaluate classification models. The research was divided into four steps: region of interest segmentation, feature extraction, feature selection, and model building and validation. The feature selection methods included the intraclass correlation coefficient, the ReliefF algorithm, and the Partition-Membership filter. The performance metrics of the classification models included accuracy (Acc), precision (Pre), area under the curve (AUC), and the kappa statistic.

Results: Ten features (first-order shape features: Sphericity and Compacity; Gray-Level Run Length Matrix: Short-Run Emphasis, Low Gray-level Run Emphasis, and High Gray-level Run Emphasis; Gray Level Co-occurrence Matrix: Homogeneity, Energy, Contrast, Correlation, and Dissimilarity) showed the best stability and classification capability. Six classifiers, namely logistic regression (LR), Sequential Minimal Optimization, Random Forest, KStar, Naive Bayes, and Random Committee, performed well on both the train and the test sets, and LR performed best on the test set (Acc = 98.72%, Pre = 0.988, AUC = 1, and kappa = 0.974).

Conclusion: Lung adenocarcinoma can be identified based on CT radiomics features. We can diagnose lung adenocarcinoma with CT non-invasively.

Keywords: radiomics, texture analysis, lung adenocarcinoma, multi-instance learning, lung cancer histological types

Introduction

Medical imaging can assess the characteristics of human tissues non-invasively and is often used in clinical practice for the diagnosis, treatment guidance, and monitoring of tumors. Radiomics can extract and quantify the differences in tumor tissues from these images (1–4).

The radiomics workflow is usually divided into four steps (1, 5, 6). The first step is image collection and segmentation. Radiomics can be applied to all kinds of medical images; CT radiomics accounts for the largest number of studies, followed by PET, MR, and ultrasound. Segmentation can be manual or semi-automatic. The second step is feature extraction, which is easy to standardize. The third step is feature selection. Feature selection methods are either supervised or unsupervised; in both cases, the stability and the performance of the selected features should be evaluated. The influence of feature redundancy varies with the algorithm. The final step is model building. Model-building algorithms can be roughly divided into machine learning and deep learning, and the choice between them depends on the amount of data available. In addition, basic medical statistical methods, such as hypothesis testing, can also be used for radiomics analysis. Figure 1 shows the pipeline of our proposed radiomics analysis.

Figure 1. The pipeline of our proposed radiomics analysis. (1) Original images of lung cancer patients. (2) Segmentation of the tumor region of interest (ROI) on each CT slice. (3) Extraction of shape, first-order, and higher-order features from the ROI. (4) Prediction model building based on machine learning classifiers, with ROC curves used to assess model performance. Adc is lung adenocarcinoma, and Oth is other lung cancer histological subtypes.

The histological type diagnosis of lung cancer is fundamental in guiding patient management. Lung biopsy is a well-established method for the differential diagnosis of lung lesions (7), but it is expensive and invasive. Lung adenocarcinoma (Adc) is the most common subtype of lung cancer (8), and diagnosing Adc by biopsy does not benefit patients who are unfit for an invasive diagnostic procedure. It is therefore important to distinguish Adc from other types (binary classification) by radiomics, so that patients can receive accurate treatment earlier without an invasive procedure. In addition, such a model could be the basis for a multi-class classification model that reduces or avoids the use of invasive diagnostic methods.

This paper tests the hypothesis that Adc can be distinguished from other lung cancer histological types (Oth) by radiomics. To investigate this, we analyzed three independent lung cancer cohorts and built several lung Adc classifiers that differentiate Adc from Oth without considering clinical parameters. To our knowledge, this work is the first radiomics-based study to distinguish Adc from Oth (including squamous cell carcinoma, other primary lung cancer, and metastases), and the proposed models are non-invasive and cost-effective.

Results

The Most Stable Features With High Classification Capability

Table 1.1 lists the 30 most stable features ranked by intraclass correlation coefficient (threshold value 0.85, p < 0.01) in the RIDER (9) data set. Most of the extracted radiomics features have good stability. Among the 30 most stable radiomics features, the ReliefF algorithm (Kira et al., presented at the 1992 Machine Learning Proceedings) with 10-fold cross-validation identified 10 features with classification ability (threshold value 0.01), listed in Table 1.2. The features based on shape, Gray Level Co-occurrence Matrix (GLCM), and Gray-Level Run Length Matrix (GLRLM) had better classification ability. Sphericity and Compacity describe the tumor shape, such as spherical, round, or elongated. Contrast_GLCM describes local differences, and a higher value indicates a greater difference between neighboring voxels. SRE_GLRLM measures the distribution of short run lengths, and larger values indicate finer texture.

Table 1.

The analysis results of three independent data sets.

Class	Features
1.1 The 30 most stable features on RIDER data set
^aFH	Skewness, kurtosis, energy
^bFS	Sphericity, compacity, volume
^cGLZLM	Short-zone emphasis, high gray-level zone emphasis, short-zone low gray-level emphasis, short-zone high gray-level emphasis, long-zone low gray-level emphasis, zone length non-uniformity, low gray-level run emphasis, high gray-level run emphasis
^dGLRLM	Short-run emphasis, long-run emphasis, low gray-level run emphasis, high gray-level run emphasis, short-run high gray-level emphasis
^eNGLDM	Coarseness, contrast
^fGLCM	Homogeneity, energy, contrast, correlation, dissimilarity
Conventional Indices	minValue, maxValue, meanValue, stdValue
1.2 The 10 most stable features with classification capability on Lung 1 data set
^bFS	Sphericity, compacity
^dGLRLM	Short-run emphasis, low gray-level run emphasis, high gray-level run emphasis
^fGLCM	Homogeneity, energy, contrast, correlation, dissimilarity
Classifiers	Accuracy(%)
1.3 Accuracy ratio of 6 machine learning classifiers on Lung 2 test set
^gLR	98.72
^hRC	98.72
ⁱSMO	97.44
^jRF	97.44
^kNB	98.72
KStar	96.15

^aFirst-order features, histogram.
^bFirst-order features, shape.
^cGray-Level Zone Length Matrix; provides information on the size of homogeneous zones for each gray level in 3 dimensions.
^dGray-Level Run Length Matrix; gives the size of homogeneous runs for each gray level. This matrix is computed for the 13 different directions in 3D (4 in 2D); for each of the 11 texture indices derived from this matrix, the 3D value is the average over the 13 directions in 3D (4 in 2D).
^eNeighborhood Gray-Level Different Matrix; corresponds to the difference of gray level between one voxel and its 26 neighbors in 3 dimensions (8 in 2D).
^fGray Level Co-occurrence Matrix; takes into account the arrangements of pairs of voxels to calculate textural indices.
^gLogistic regression.
^hRandom Committee.
ⁱSequential minimal optimization.
^jRandom Forest.
^kNaive Bayes.
The best accuracy ratios are highlighted in bold.

The Partition-Membership filter (PMF) used the Random Committee algorithm as the partition generator to divide the 10 features into 1940 partitions (Supplementary Material). Correlation-based feature subset selection (CFS) then selected the minimum feature subset with the highest classification capability, which contained 122 partitions.

Model Performance

Table 1.3 shows the accuracy of 6 machine learning classifiers on the test set: logistic regression (LR), Sequential Minimal Optimization (SMO), Random Forest (RF), KStar, Naive Bayes (NB), and Random Committee (RC). All of them perform well on the test set, and LR, RC, and NB reach the highest accuracy of 98.72%. This also reflects the strong classification capability of the 10 selected features in diagnosing Adc.

Table 2 and Figure 2 show the performance of the 6 classifiers on the train and the test sets. The best performance metrics for each set are highlighted in bold. Overall, the 6 classifiers have excellent classification performance on both the train and the test sets, which shows that they can not only diagnose Adc but also rule out Oth with high accuracy. There is no significant difference between the prediction models (P > 0.05), from which it can be inferred that the selected 10 features have great ability to diagnose Adc. On the test set, the kappa statistics are close to 1 for all models, with a minimum of 0.923 (KStar), which shows that the models have great stability. Meanwhile, the mean absolute errors (MAE) are close to 0, with a maximum of 0.09 (KStar).

Table 2.

Performance metrics of 6 classifiers on the train set and test set.

Performance	Accuracy (%)	^lTPR	^mTNR	Precision	ⁿAUC	Kappa	^°MAE
^gLR
Train set	98.70	0.980	0.993	0.987	0.996	0.973	0.02
Test set	98.72	0.987	1.000	0.988	1.000	0.974	0.01
^hRC
Train set	96.40	0.967	0.961	0.964	0.997	0.928	0.07
Test set	98.72	0.974	1.000	0.988	1.000	1.000	0.05
ⁱSMO
Train set	97.72	0.961	0.993	0.978	0.977	0.954	0.02
Test set	97.44	0.974	0.974	0.974	0.974	0.949	0.03
^jRF
Train set	97.72	0.974	0.980	0.977	0.997	0.954	0.10
Test set	97.44	0.974	0.974	0.974	0.999	0.949	0.08
^kNB
Train set	97.01	0.948	0.993	0.972	0.994	0.942	0.06
Test set	98.72	0.974	1.000	0.988	1.000	0.974	0.05
Kstar
Train set	96.08	0.922	1.000	0.964	0.997	0.921	0.10
Test set	96.15	0.949	0.949	0.974	0.997	0.923	0.10

^gLogistic regression.
^hRandom Committee.
ⁱSequential minimal optimization.
^jRandom Forest.
^kNaive Bayes.
^lTrue Positive Rate.
^mTrue Negative Rate.
ⁿArea under curve.
^°Mean absolute error.
The best performance metrics for each set are highlighted in bold.

Figure 2. Mean ROC curves obtained by six machine learning models for predicting lung adenocarcinoma. The black diagonal line in each diagram is the random line, which corresponds to the performance of random guessing. **(A)** Logistic regression (LR), naive Bayes (NB), and random committee (RC) classifiers all have the same AUC. **(B)** Random forest (RF) classifier. **(C)** KStar classifier. **(D)** Sequential minimal optimization (SMO) classifier.

The LR classifier has the best performance on the test set; it also has the highest accuracy, true positive rate (TPR), true negative rate (TNR), and precision, and the lowest MAE, on the train set. RC and NB follow, with the highest TNR, precision, and area under the curve (AUC) on the test set. It is important to identify Adc correctly among patients who actually have Adc, so that they receive accurate treatment earlier. Table 2 shows that LR identifies Adc among actual Adc cases with over 98% accuracy on the test set (TPR = 0.987), and that LR, RC, and NB identify Oth among actual Oth cases perfectly (TNR = 1.000).

Discussion

Radiomics provides a non-invasive and fast method to predict clinical outcomes. It could not only support precision medicine but also become an everyday diagnostic tool. Using radiomics to support therapy decision-making is an effective way to advance personalized medicine. Radiomics has been applied to a variety of organs and systems, such as the brain, breast, lung, heart, liver, kidney, adrenal gland, cervix, limbs, and prostate (6, 10, 11). For example, Chaddad et al. (6, 12) proposed multiscale texture features to predict progression-free and overall survival in patients newly diagnosed with glioblastoma. They also reviewed the clinical implementation of radiomics in the current management of glioblastoma, which is important for advancing the personalized treatment of glioblastoma patients.

The correlation between radiomics features and tumor phenotype has been demonstrated (12–22). Many studies have found that Adc can be predicted by radiomics (22–28). Tang et al. (27) developed a radiomics model to discriminate Adc from squamous cell carcinoma (Sqc) with an AUC of 0.82, and Yang et al. (24) developed an LR model to predict lymph node metastasis in solid Adc with an AUC of 0.86. Romeo et al. (23) studied the diagnosis of ground-glass nodules by radiomics and found that a radiomics classifier may be a reliable tool for clinical decision-making. Ferreira-Junior et al. (28) found some radiomics features associated with Adc and Sqc and obtained an AUC of 0.88 with a machine learning model.

However, the data sets of these studies contain only Adc and Sqc, and in clinical practice the existence of other subtypes cannot be ruled out before lung biopsy. From the perspective of clinical diagnosis, a study predicting Adc should therefore include as many subtypes of lung cancer as possible. In addition, the performance of the CT radiomics models in these studies still needs to be improved.

The proposed radiomics models showed great performance in diagnosing Adc both on the train and the test sets. The models are available and can be applied in Weka.

In this study, lung cancer patients with various histological subtypes were included in the patient cohorts. We used stratified random sampling to balance the covariates. In feature selection, we first tested the stability of the features using the public RIDER data set and then picked the features with classification capability. The selected 10 features showed excellent classification ability after PMF and CFS. PMF was used to transform the features, and CFS is good at picking the most representative minimum feature subset. PMF has been shown not only to solve binary classification problems but also to improve classification accuracy (29, 30). Meanwhile, to avoid over-fitting as much as possible, the train and the test sets were divided with stratified random sampling to keep them balanced. For model development, independent data sets were used for feature selection and model construction, and cross-validation was used for resampling. In model selection, we used several classifiers to show the classification ability of the selected features, including three frequently used classifiers: LR, RF, and NB. RF contains multiple trees; even if some trees over-fit, voting or averaging across trees reduces over-fitting. Many radiomics studies have used RF for classification. RC is an ensemble method that builds an ensemble of randomizable base classifiers, each built using a different random number seed. The final prediction is a straight average of the predictions generated by the individual base classifiers. KStar is an instance-based learner that uses an entropic distance measure to solve the smoothness problem. SMO is used for training a support vector classifier, which has good robustness and generalization ability.

Several issues regarding the stability and reproducibility of radiomics features have been raised in recent years (31–33). Changes in multiple parameters (e.g., slice thickness) generally produce greater measurement errors. Therefore, parameters such as slice thickness, dose, kernel, and segmentation method should not be altered when assessing the features of a radiomics model. For this reason, we selected the most stable features across test-retest. To find the most representative feature subset and reduce the running time of the classifiers, we used CFS to pick the most representative minimum feature subset. CFS uses heuristic and best-first search methods to evaluate feature subsets and selects features that are highly correlated with the classes but have the lowest correlation with each other.

Although we tried our best to reduce random errors and ensure the correctness of the statistical analysis in this study, there are several limitations. Two cohorts in our study are from public data sets, so we cannot accurately estimate the size and direction of systematic bias. The regions of interest of the Lung 1 and Lung 2 data sets were delineated in different ways, which will lead to measurement errors. In addition, we need more cases to improve the classification model.

In conclusion, CT-based radiomics can identify Adc. Therefore, we can distinguish Adc using CT images alone. We will include multicenter data to improve the classifier and make it a clinical diagnostic tool.

Materials and Methods

Our work was approved by the institutional Ethics Committee.

The tools used for statistical analysis were IBM SPSS Statistics 25.0 (USA), and Weka (Frank et al. presented at the 2009 Data mining and knowledge discovery handbook) (Weka v3.8.3, Hamilton, New Zealand).

Data Sets

We analyzed three independent data sets: a public RIDER data set (9), a lung cancer cohort from our institute (Lung 1), and a public radiomics features data set (Lung 2) (4). Table 3 shows the patient characteristics of Lung 1 and Lung 2. Detailed patient characteristics, the criteria for patient selection, and the CT scan protocol of Lung 2 have already been published (4).

Table 3.

Patient characteristics.

Characteristics	Lung 1	Lung 2
Size, N	180	535
Mean Age	66	69
Gender (%)
Female	30.6	33.3
Male	69.4	66.7
Histological type, N
Adenocarcinoma	90	193
Squamous cell carcinoma	30	132
Other primary lung cancer	30	79
Metastases	30	131
^aThe significance of radiomics features, N
P ≤ 0.05	^b8
P> 0.05	33

^aPaired t-test with 95% Confidence Interval, two-tailed.
^bThey are Volume_Shape, Long-Run Emphasis_Gray-Level Run Length Matrix, Coarseness_Neighborhood Gray-Level Different Matrix, Contrast_Neighborhood Gray-Level Different Matrix, Long-Zone Low Gray-level Emphasis_Gray-Level Zone Length Matrix, Zone Length Non-Uniformity_Gray-Level Zone Length Matrix, Low Gray-level Run Emphasis_Gray-Level Zone Length Matrix, High Gray-level Run Emphasis_Gray-Level Zone Length Matrix.

The RIDER data set consists of 31 non-small cell lung cancer patients with two CT scans obtained in an interval of about 15 min. We use this data set to evaluate the stability of features for test-retest.

The Lung 1 data set consists of 180 lung cancer patients (adenocarcinoma: squamous cell carcinoma: other types of lung cancer: metastasis = 3:1:1:1) from our institutional database in 2010–2018. For these patients, CT images, manual delineations, and clinical data were available. The criteria for patient selection are the same as for Lung 2. We used this data set for feature selection.

The Lung 2 data set consists of 535 lung cancer patients. For these patients, texture features were available. We used this data set for model building and validation. To keep the classes balanced on the train and the test sets (adenocarcinoma: squamous cell carcinoma: other types of lung cancer: metastasis = 3:1:1:1) and include as many patients as possible, we randomly divided it into a train set (n = 306) and a test set (n = 78). Specific patients were selected using pseudorandom numbers.
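The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class counts come from Table 3, while the per-class quotas (153:51:51:51 and 39:13:13:13) are our own arithmetic for a 3:1:1:1 ratio that sums to 306 and 78, and the function name, class labels, and seed are hypothetical. The original selection was not necessarily done this way.

```python
import random

def stratified_split(labels, per_class_train, per_class_test, seed=0):
    """Return (train_idx, test_idx), drawing a fixed number of cases per class."""
    rng = random.Random(seed)  # pseudorandom but reproducible
    train_idx, test_idx = [], []
    for cls, n_train in per_class_train.items():
        n_test = per_class_test[cls]
        members = [i for i, lab in enumerate(labels) if lab == cls]
        # Sample without replacement, then cut the sample into train and test.
        chosen = rng.sample(members, n_train + n_test)
        train_idx.extend(chosen[:n_train])
        test_idx.extend(chosen[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Class counts from Table 3 (Lung 2); the label strings are hypothetical.
counts = {"Adc": 193, "Sqc": 132, "OtherPrimary": 79, "Met": 131}
labels = [cls for cls, n in counts.items() for _ in range(n)]

# Assumed quotas: a 3:1:1:1 ratio that sums to 306 (train) and 78 (test).
train_quota = {"Adc": 153, "Sqc": 51, "OtherPrimary": 51, "Met": 51}
test_quota = {"Adc": 39, "Sqc": 13, "OtherPrimary": 13, "Met": 13}

train_idx, test_idx = stratified_split(labels, train_quota, test_quota, seed=42)
print(len(train_idx), len(test_idx))
```

Under these quotas the smallest class (other primary lung cancer, 79 cases) limits the total to 384 of the 535 patients, which matches the reported train and test sizes.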

According to the histological diagnosis, the cases were divided into two classes: Adc and Oth (including squamous cell carcinoma, other primary histological subtypes, and metastatic lung cancer). The work on the data set was divided into two phases: a training phase and a validation phase. The training phase included CT image acquisition, texture feature extraction, feature selection, and model building. The validation phase included model testing and performance evaluation.

CT Image Acquisition and Texture Feature Extraction

The acquisition and processing of the Lung 1 and Lung 2 CT images followed the Image Biomarker Standardization Initiative (IBSI) (34). The volume of interest (VOI) of the Lung 1 data set was delineated independently by two experienced radiologists, who were blinded to the histological subtype of each patient. Inconsistent segmentations were compared and re-segmented until the outcomes were consistent. The VOI of the Lung 2 data set was segmented (semi)automatically.

The LIFEx package (35) was used to extract texture features. It can efficiently perform textural analysis and radiomics feature measurements from CT images. In total, 41 features were extracted from the CT images.

Feature Selection

The stability of the radiomics features was evaluated using the RIDER data set. For each patient, we extracted image features from the two scans. The stability of each feature was calculated using the intraclass correlation coefficient; a higher intraclass correlation coefficient corresponds to a more stable feature (1).
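As an illustration of the test-retest stability measure, the sketch below computes two common single-measurement forms of the intraclass correlation coefficient from a patients-by-scans table. The text does not state which ICC form was used, so the choice of ICC(2,1) and ICC(3,1) is an assumption, and the function name and the example feature values are hypothetical.

```python
def icc_two_way(scores):
    """Return (ICC(2,1), ICC(3,1)) for a subjects-by-measurements table."""
    n = len(scores)        # subjects (patients)
    k = len(scores[0])     # repeated measurements (scans)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # ICC(2,1): absolute agreement; ICC(3,1): consistency.
    icc21 = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    icc31 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return icc21, icc31

# Hypothetical values of one feature for five patients, scan 1 vs. scan 2.
test_retest = [[0.61, 0.63], [0.72, 0.70], [0.55, 0.58], [0.80, 0.79], [0.66, 0.66]]
icc21, icc31 = icc_two_way(test_retest)
is_stable = icc21 >= 0.85   # threshold reported in the Results
```

A feature would be kept for the next step only if its coefficient reaches the 0.85 threshold across the RIDER test-retest scans.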

Based on the results of feature stability, the ReliefF algorithm (ReliefF Attribute Eval with Ranker in Weka) was used to remove the irrelevant features from the Lung 1 data set.
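The idea behind this relevance ranking can be illustrated with the basic two-class Relief update, which ReliefF generalizes. The sketch below is a simplified stand-in, not Weka's implementation: it uses one nearest hit and one nearest miss per instance, visits every instance once, and runs on invented toy data. The function name is hypothetical.

```python
def relief_weights(X, y):
    """Score features with the basic Relief update (one nearest hit and miss)."""
    n, d = len(X), len(X[0])
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(hi[f] - lo[f]) or 1.0 for f in range(d)]  # avoid division by zero

    def diff(f, a, b):
        # Range-normalized difference of feature f between two instances.
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # Reward features that differ across classes, penalize those
            # that differ within the same class.
            w[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n
    return w

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.1], [0.1, 0.9], [0.2, 0.5], [0.8, 0.2], [0.9, 0.8], [1.0, 0.5]]
y = ["Adc", "Adc", "Adc", "Oth", "Oth", "Oth"]
weights = relief_weights(X, y)
selected = [f for f, w in enumerate(weights) if w > 0.01]  # threshold from the text
```

Features whose weight falls below the threshold are treated as irrelevant and dropped; in this toy case only the separating feature survives.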

The selected features were filtered by propositionalization and partitioning using the Partition-Membership filter (Partition Membership Filter with option Random Committee in Weka) on the Lung 2 train and test sets. The filter can apply any partition generator to a given feature vector to obtain filtered vectors for all instances; the filtered instances are composed of the partition membership values plus the class attribute and are rendered as sparse instances (29).
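To make the transformation concrete, the sketch below maps one feature vector to a sparse vector of partition memberships. It is a conceptual illustration only: the two hand-written partitioners, their thresholds, and the feature values are hypothetical stand-ins for the randomized trees of a Random Committee, and recording only the final cell of each partitioner is a simplification of what the Weka filter produces.

```python
def membership_vector(x, partitioners):
    """Map a feature vector to sparse one-hot partition memberships.

    `partitioners` is a list of (assign, n_cells) pairs, where assign(x)
    returns the index of the cell that x falls into.
    """
    sparse, offset = {}, 0
    for assign, n_cells in partitioners:
        sparse[offset + assign(x)] = 1   # only non-zero entries are stored
        offset += n_cells
    return sparse, offset

# Two hypothetical partitioners standing in for random trees.
def by_sphericity(x):            # 2 cells
    return 0 if x["sphericity"] < 0.6 else 1

def by_contrast_and_energy(x):   # 3 cells
    if x["contrast"] < 10.0:
        return 0
    return 1 if x["energy"] < 0.05 else 2

partitioners = [(by_sphericity, 2), (by_contrast_and_energy, 3)]
case = {"sphericity": 0.72, "contrast": 14.5, "energy": 0.02}
vector, n_partitions = membership_vector(case, partitioners)
```

Each instance ends up described by which partitions it belongs to rather than by its raw feature values, which is why a small number of features can expand into a much larger number of partition attributes.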

Then we used CFS to filter the results. CFS selects the minimum feature set that is highly correlated with the classes while keeping the correlation between features low, which reduces feature redundancy. The final result is therefore the feature set with the highest prediction ability and low inter-feature correlation.
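For reference, the trade-off that CFS optimizes is usually written as a subset merit score, following Hall's formulation of correlation-based feature selection. The symbols are notation introduced here rather than taken from the text: $k$ is the number of features in the subset $S$, $\overline{r_{cf}}$ is the mean feature-class correlation, and $\overline{r_{ff}}$ is the mean feature-feature correlation.

$$\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}$$

The numerator grows when the features predict the class well, and the denominator grows when the features are redundant with each other, so a small subset of relevant, weakly inter-correlated features scores highest.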

Model Building and Performance Evaluation

We used 6 machine learning classifiers with 10-fold cross-validation: LR (Logistic with options -R 1.0E-8 -M -1 in Weka), the ensemble learning classifier RF (Random Forest with options -K 0 -M 1.0 -V 0.001 -S 1 in Weka), SMO (SMO with options -C 1.0 -L 0.001 -P 1.0E-12 -N 1 -V -1 -W 1 -K in Weka), NB (Naive Bayes in Weka), RC (Random Committee with options -S 1 -num-slots 1 -I 10 -W in Weka), and KStar (KStar in Weka). The performance metrics of the classification models included TPR, TNR, accuracy, precision, AUC, the kappa statistic, and MAE. Table 4 shows the calculation formulas of these metrics.

Table 4.

The calculation formulas of performance metrics.

Metric	Formula
TPR	$\frac{TP}{TP + FN}$
TNR	$\frac{TN}{TN + FP}$
Accuracy	$\frac{TP + TN}{TP + FP + TN + FN}$
Precision	$\frac{TP}{TP + FP}$
AUC	$\int_{x = 0}^{1} TPR(FPR^{-1}(x))\,dx = P(x_1 > x_0)$, where $x_1$ is the score for a positive instance and $x_0$ is the score for a negative instance.
Kappa	$\frac{P_o - P_e}{1 - P_e}$, where $P_o = \text{Accuracy}$ and $P_e = \frac{P(TP + FP) + N(TN + FN)}{(P + N)^2}$.
MAE	$\frac{1}{n}\sum_{i = 1}^{n} |p(i) - a(i)|$, where $p(i)$ is the predicted value for case $i$, $a(i)$ is the actual value, and $n$ is the total number of cases.

TP is true positive: the predicted outcome is lung adenocarcinoma (Adc) and the actual value is also Adc. FN is false negative: the predicted outcome is another lung cancer histological type (Oth) while the actual value is Adc. TN is true negative: both the predicted outcome and the actual value are Oth. FP is false positive: the predicted outcome is Adc while the actual value is Oth. P is condition positive, N is condition negative, and MAE is the mean absolute error. TPR is the true positive rate; it measures the proportion of actual patients with Adc who are correctly identified. A negative result in a test with high TPR is useful for ruling out Adc; it signifies a high probability of the presence of Oth. TNR is the true negative rate; it measures the proportion of actual patients with Oth who are correctly identified. A test with 100% TNR will recognize all patients with Oth by testing negative, so a positive test result definitively rules out the presence of Oth in a patient.
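The formulas in Table 4 can be checked with a short calculation. The sketch below is illustrative: the function names are hypothetical, the confusion matrix is an invented example for a 78-case test set with 39 Adc and 39 Oth cases (the paper does not report confusion matrices), and the scores passed to the AUC function are made up. Weka's summary output typically reports class-weighted averages of TPR and precision, which can differ from the single-class values computed here.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Compute the Table 4 metrics that depend only on the confusion matrix."""
    p, n = tp + fn, tn + fp          # condition positive / condition negative
    total = p + n
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    p_e = (p * (tp + fp) + n * (tn + fn)) / total ** 2
    return {
        "TPR": tp / p,
        "TNR": tn / n,
        "accuracy": accuracy,
        "precision": tp / (tp + fp),
        "kappa": (accuracy - p_e) / (1 - p_e),
    }

def auc_from_scores(pos_scores, neg_scores):
    """AUC as P(x1 > x0): the share of positive/negative pairs ranked correctly."""
    wins = 0.0
    for x1 in pos_scores:
        for x0 in neg_scores:
            wins += 1.0 if x1 > x0 else 0.5 if x1 == x0 else 0.0  # ties count half
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical confusion matrix: one Adc case missed, no Oth case misclassified.
m = confusion_metrics(tp=38, fn=1, tn=39, fp=0)
# Hypothetical classifier scores for three Adc and three Oth cases.
auc = auc_from_scores([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
print(round(m["accuracy"] * 100, 2), round(m["kappa"], 3), round(auc, 3))
```

With balanced classes the chance agreement is 0.5, so kappa is simply twice the accuracy minus one, which is why accuracy and kappa move together in Table 2.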

Data Availability Statement

All datasets generated for this study are included in the article/Supplementary Material.

Author Contributions

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

Funding. This study was supported by National Key Research and Development program [2017YFC0113904].

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc.2020.00602/full#supplementary-material

References

1.Aerts HJ, Velazquez ER, Leijenaar RT, Parmar C, Grossmann P, Carvalho S, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun. (2014) 5:4006. 10.1038/ncomms5644 [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Forghani R, Savadjiev P, Chatterjee A, Muthukrishnan N, Reinhold C, Forghani B, et al. Radiomics and artificial intelligence for biomarker and prediction model development in oncology. Comput Struct Biotechnol J. (2019) 17:995–1008. 10.1016/j.csbj.2019.07.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. (2012) 48:441–6. 10.1016/j.ejca.2011.11.036 [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Kirienko M, Cozzi L, Rossi A, Voulaz E, Antunovic L, Fogliata A, et al. Ability of FDG PET and CT radiomics features to differentiate between primary and metastatic lung lesions. Eur J Nucl Med Mol Imaging. (2018) 45:1649–60. 10.1007/s00259-018-3987-2 [DOI] [PubMed] [Google Scholar]
5.Chaddad A, Desrosiers C, Toews M, Abdulkarim B. Predicting survival time of lung cancer patients using radiomic analysis. Oncotarget. (2017) 8:104393–407. 10.18632/oncotarget.22251 [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Chaddad A, Kucharczyk MJ, Daniel P, Sabri S, Jean-Claude BJ, Niazi T, et al. Radiomics in glioblastoma: current status and challenges facing clinical implementation. Front Oncol. (2019) 9:374. 10.3389/fonc.2019.00374 [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Yan W, Guo X, Zhang J, Zhou J, Chen C, Wang M, et al. Lobar location of lesions in computed tomography-guided lung biopsy is correlated with major pneumothorax: a STROBE-compliant retrospective study with 1452 cases. Medicine. (2019) 98:e16224. 10.1097/MD.0000000000016224 [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Bashir U, Kawa B, Siddique M, Mak SM, Nair A, Mclean E, et al. Non-invasive classifcation of non-small cell lung cancer: a comparison between random forest models utilising radiomic and semantic features. Br J Radiol. (2019) 92:20190159. 10.1259/bjr.20190159 [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Zhao B, James LP, Moskowitz CS, Guo P, Ginsberg MS, Lefkowitz R, et al. Evaluating variability in tumor measurements from same-day repeat CT scans of patients with non–small cell lung cancer. Radiology. (2009) 252:263–72. 10.1148/radiol.2522081593 [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Koçak B, Durmaz ES, Ateş E, Kiliçkesmez Ö. Radiomics with artificial intelligence: a practical guide for beginners. Diagn Interv Radiol. (2019) 25:485–95. 10.5152/dir.2019.19321 [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Meng Y, Sun J, Qu N, Zhang G, Yu T, Piao H. Application of radiomics for personalized treatment of cancer patients. Cancer Manage Res. (2019) 11:10851–58. 10.2147/CMAR.S232473 [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Chaddad A, Sabri S, Niazi T, Abdulkarim B. Prediction of survival with multi-scale radiomic analysis in glioblastoma patients. Med Biol Eng Comput. (2018) 56:2287–300. 10.1007/s11517-018-1858-4 [DOI] [PubMed] [Google Scholar]
13.Mendelson EB. Artificial intelligence in breast imaging: potentials and limitations. Am J Roentgenol. (2019) 212:293–9. 10.2214/AJR.18.20532 [DOI] [PubMed] [Google Scholar]
14.Zhang J, Zhao X, Zhao Y. Value of pre-therapy 18F-FDG PET/CT radiomics in predicting EGFR mutation status in patients with non-small cell lung cancer. Eur J Nucl Med Mol Imaging. (2019) 1137–46. 10.1007/s00259-019-04592-1 [DOI] [PubMed] [Google Scholar]
15.Lee SW, Park H, Lee HY, Sohn I, Lee S, Kang J, et al. Deciphering clinicoradiologic phenotype for thymidylate synthase expression status in patients with advanced lung adenocarcinoma using a radiomics approach. Sci Rep. (2018) 8:8968. 10.1038/s41598-018-27273-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
16.van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, et al. Computational radiomics System to Decode the Radiographic Phenotype. Cancer Res. (2017) 77:e104–7. 10.1158/0008-5472.CAN-17-0339 [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Grossmann P, Stringfield O, El-Hachem N. Defining the biological basis of radiomic phenotypes in lung cancer. Elife. (2017) 6:e23421. 10.7554/eLife.23421 [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Carvalho S, Leijenaar RTH, Troost EGC, van Timmeran JE, Oberije C, van Elmpt W, et al. 18F-fluorodeoxyglucose positron-emission tomography (FDG-PET)-Radiomics of metastatic lymph nodes and primary tumor in non-small cell lung cancer (NSCLC) - a prospective externally validated study. PLoS ONE. (2018). 13:e0192859. 10.1371/journal.pone.0192859 [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Yang B, Guo L, Lu G, Shan W, Duan L, Duan S. Radiomic signature: a non-invasive biomarker for discriminating invasive and non-invasive cases of lung adenocarcinoma. Cancer Manag Res. (2019) 11:7825–34. 10.2147/CMAR.S217887
20. Lee G, Bak SH, Lee HY, Choi JY, Park H. Radiomics and imaging genomics for evaluation of tumor response. Ther Response Imaging Oncol. (2020) 221–38. 10.1007/978-3-030-31171-1_13
21. Voigt W, Manegold C, Pilz L, Wu YL, Müllauer L, Pirker R, et al. Beyond tissue biopsy: a diagnostic framework to address tumor heterogeneity in lung cancer. Curr Opin Oncol. (2020) 32:68–77. 10.1097/CCO.0000000000000598
22. Zhao W, Zhang W, Sun Y. Convolution kernel and iterative reconstruction affect the diagnostic performance of radiomics and deep learning in lung adenocarcinoma pathological subtypes. Thorac Cancer. (2019) 10:1893–903. 10.1111/1759-7714.13161
23. Romeo V, Cuocolo R, Ricciardi C, Ugga L, Cocozza S, Verde F, et al. Prediction of tumor grade and nodal status in oropharyngeal and oral cavity squamous-cell carcinoma using a radiomic approach. Anticancer Res. (2020) 40:271–80. 10.21873/anticanres.13949
24. Yang X, Pan X, Liu H. A new approach to predict lymph node metastasis in solid lung adenocarcinoma: a radiomics nomogram. J Thorac Dis. (2018) 10:S807–19. 10.21037/jtd.2018.03.126
25. Yang B, Ji H, Ge Y. Correlation study of 18-fluorodeoxyglucose positron emission tomography/computed tomography in pathological subtypes of invasive lung adenocarcinoma and prognosis. Front Oncol. (2019) 9:908. 10.3389/fonc.2019.00908
26. Wang B, Tang Y, Chen Y, Hamal P, Zhu Y, Wang T, et al. Joint use of the radiomics method and frozen sections should be considered in the prediction of the final classification of peripheral lung adenocarcinoma manifesting as ground-glass nodules. Lung Cancer. (2020) 139:103–10. 10.1016/j.lungcan.2019.10.031
27. Tang X, Xu X, Han Z, Bai G, Wang H, Liu Y, et al. Elaboration of a multimodal MRI-based radiomics signature for the preoperative prediction of the histological subtype in patients with non-small-cell lung cancer. BioMed Eng OnLine. (2020) 19:5. 10.1186/s12938-019-0744-0
28. Ferreira-Junior JR, Koenigkam-Santos M, Tenório APM, Faleiros MC, Cipriano FEG, Fabro AT, et al. CT-based radiomics for prediction of histologic subtype and metastatic disease in primary malignant lung neoplasms. Int J Comput Assist Radiol Surg. (2020) 15:163–72. 10.1007/s11548-019-02093-y
29. Frank E, Pfahringer B. Propositionalisation of multi-instance data using random forests. In Australasian Joint Conference on Artificial Intelligence. Springer: Dunedin, New Zealand; (2013) p. 362–73. 10.1007/978-3-319-03680-9_37
30. Weidmann N, Frank E, Pfahringer B. A two-level learning method for generalized multi-instance problems. In European Conference on Machine Learning. Springer: Berlin, Heidelberg; (2003) p. 468–79. 10.1007/978-3-540-39857-8_42
31. Erdal BS, Demirer M, Amadi CC, Ibrahim GFM, Grimmeruline R, Little K, et al. Are quantitative features of lung nodules reproducible at different CT acquisition and reconstruction parameters? (2019) arXiv [Preprint]. arXiv:1908.05667.
32. Zhao B, Tan Y, Tsai WY, Qi J, Xie C, Schwartz LH. Reproducibility of radiomics for deciphering tumor phenotype with imaging. Sci Rep. (2016) 6:1–7. 10.1038/srep23428
33. Tunali I, Hall LO, Napel S, Cherezov D, Guvenis A, Gillies RJ, et al. Stability and reproducibility of computed tomography radiomic features extracted from peritumoral regions of lung cancer lesions. Med Phys. (2019) 46:5075–85. 10.1002/mp.13808
34. Zwanenburg A, Leger S, Vallières M, Löck S. Image biomarker standardisation initiative. (2016) arXiv [Preprint]. arXiv:1612.07003.
35. Nioche C, Orlhac F, Boughdad S, Reuzé S, Goya-Outi J, Robert C, et al. A freeware for tumor heterogeneity characterization in PET, SPECT, CT, MRI and US to accelerate advances in radiomics. J Nucl Med. (2017) 58(suppl. 1):1316. 10.1158/0008-5472.CAN-18-0125

Supplementary Materials

Additional data file (CSV, 1.4 MB).

Data Availability Statement

All datasets generated for this study are included in the article/Supplementary Material.

