Comput Biol Med. 2022 Jan 23;142:105251. doi: 10.1016/j.compbiomed.2022.105251

Performance change with the number of training data: A case study on the binary classification of COVID-19 chest X-ray by using convolutional neural networks

Kuniki Imagawa, Kohei Shiomoto
PMCID: PMC9749084  PMID: 35093727

Abstract

One of the features of artificial intelligence/machine learning (AI/ML)-based medical devices is their ability to learn from real-world data. However, obtaining a large amount of training data in the early phase is difficult, and device performance may change after the first introduction into the market. To bring safe and effective devices to the market in a timely manner, an appropriate post-market performance change plan must be established at the time of premarket approval. In this work, we evaluate how performance changes with the number of training images. Two publicly available datasets are used, providing 4000 COVID-19 images and 4000 Normal images. The combined dataset was split into 7000 images for training and validation and 1000 images for testing, and 16 training and validation datasets of different sizes were selected from the former. Two convolutional neural networks, AlexNet and ResNet34, with and without fine-tuning, were used to classify the two image types. The area under the curve, sensitivity, and specificity were evaluated for each dataset. Our results show that all performance measures improved rapidly as the number of training images increased and then reached an equilibrium state. AlexNet outperformed ResNet34 when the number of images was small; the difference tended to decrease as the number of training images increased, and fine-tuning improved all performance measures. In conclusion, the appropriate model and method should be selected considering the intended performance and the available amount of data.

Keywords: Performance change, Deep learning, Convolutional neural network, COVID-19, Chest X-ray, Medical device regulation

1. Introduction

Interest in applying artificial intelligence (AI) and machine learning (ML) in the medical field is growing. This growth is primarily driven by the impressive progress of deep learning (DL), a subset of ML, made possible by increased computational power and an explosion in the availability of large datasets. The number of research papers applying DL in medicine has increased accordingly; in medical image analysis, it has grown dramatically since 2015, and the number of papers in major conferences and journals exceeded 300 by the end of 2016 [1]. Furthermore, more than 60 AI/ML-based medical devices have already been approved by the U.S. Food and Drug Administration (FDA) [2]. These devices are of several types: Muehlematter et al. [3] reported that most AI/ML-based medical devices approved by the FDA are used in radiology, but they span various medical specialties, such as cardiovascular medicine and neurology. Applications in radiology, for example, can be categorized into classification, detection, segmentation, and so on, and the required performance varies greatly depending on the modality and target disease.

AI/ML-based medical devices are roughly divided into two types: 1) the locked type, whose performance is fixed before marketing and does not change with use; and 2) the continuous type, whose performance can change through continued training on data after market introduction. To date, FDA-approved AI/ML-based medical devices have typically been of the locked type, but the FDA announced a marketing authorization for a continuous-type device in February 2020 [4], and the number of such devices on the market is expected to increase. In April 2019, the FDA published a discussion paper proposing a regulatory framework that accounts for the iterative nature of AI/ML-based medical devices [5]. The paper describes a total product lifecycle approach. As part of this framework, it describes the need to submit a predetermined change control plan, in which manufacturers specify, before marketing, the anticipated modifications to performance, inputs, or intended use. Other jurisdictions are also preparing regulatory guidance based on concepts similar to those in this discussion paper. Therefore, an appropriate post-market performance change plan must be established at the time of premarket approval.

Since the outbreak of Coronavirus Disease 2019 (COVID-19), many studies have used convolutional neural networks (CNNs) to detect COVID-19 on chest X-ray (CXR) images. For COVID-19 diagnosis, testing through viral RNA identification by reverse transcriptase polymerase chain reaction (RT-PCR) is currently recommended. However, chest imaging techniques, such as computed tomography (CT) and chest radiography, are considered part of the diagnostic workup of patients with suspected or probable COVID-19 when RT-PCR is not available, when its results are delayed, or when they are initially negative in the presence of symptoms suggestive of COVID-19 [6,7]. Applying ML methods to COVID-19 radiological imaging may improve diagnostic accuracy compared with the gold-standard RT-PCR while providing valuable insight into the prognosis of patient outcomes. In particular, chest radiography is widely available, requires little imaging time, and is an accessible modality that can easily be brought to the patient's bedside. Following its success in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 [8], the CNN, a class of artificial neural networks, has become dominant in object recognition, including in radiology [9]. Most recent studies on COVID-19 diagnosis used existing off-the-shelf models to classify images into two classes, COVID-19 and Normal [[10], [11], [12], [13]], or three classes, COVID-19, non-COVID-19 pneumonia, and Normal [[13], [14], [15]]. Their reported accuracies exceed 90% with data augmentation, transfer learning, or the combination of publicly available datasets. For example, Nayak et al. [16] evaluated eight pretrained CNN models that had achieved excellent results in the ILSVRC for classifying COVID-19 and Normal, and conducted a comparative analysis of hyperparameters such as batch size, learning rate, and optimizer. ResNet-34 [17] exhibited the best performance, followed by AlexNet [18]. Rahaman et al. [19] evaluated 15 pretrained CNN models for classifying COVID-19, non-COVID-19 pneumonia, and Normal; VGG19 [20] obtained the highest classification accuracy. Pham [21] performed two- and three-class COVID-19 classification using three fine-tuned CNN models (AlexNet, GoogLeNet [22], and SqueezeNet [23]) without data augmentation and achieved high performance in terms of accuracy, sensitivity, specificity, precision, F1 score, and area under the curve (AUC). The results suggested that fine-tuning the network learning parameters is important because it can avoid the development of more complex models when existing ones can achieve the same or better performance. However, many current studies were based on a small number of training images or on combinations of datasets without demographic information (e.g., age and sex distributions) owing to the limited availability of public datasets. Their estimated performances are therefore likely to be optimistic and misleading because of a high risk of bias from the non-representative selection of control patients and from model overfitting [24,25]. In 2020, Signoroni et al. [26] publicly released BrixIA, a large, fully annotated dataset of 4703 COVID-19 CXR images with additional information on severity, participants' age and sex, and modality manufacturer. A large dataset from the Valencian Region Medical Image Bank containing 3141 CXR and 2239 CT images of patients with COVID-19, together with radiological findings and locations, pathologies, radiological reports, Digital Imaging and Communications in Medicine (DICOM) metadata, diagnostic antibody tests, and more, was also publicly released [27]. Such large databases are expected to be actively developed and made public in the future. It is hoped that these large datasets with patient demographics will enable appropriate AI/ML-based medical devices that support medical decision making to reach the market under proper regulation.

Against this background, this study evaluates how performance changes as the number of training images increases. Two CNNs, AlexNet and ResNet34, with and without fine-tuning, are used to classify COVID-19 and Normal images. CXR images from the large BrixIA COVID-19 dataset, together with Normal images from the ChestX-ray14 dataset, are used as training, validation, and test data. The AUC, sensitivity, and specificity are used as performance measures because they are the main items evaluated in FDA premarket approvals [28]. The major outcomes of this study are as follows: 1) All performance measures improved rapidly as the number of training images increased and then reached an equilibrium state. 2) AlexNet outperformed ResNet34 when the training data were small, and the difference between them decreased as the training data increased. 3) The fine-tuned CNNs performed better than the CNNs trained from scratch on all training datasets; this effect was particularly noticeable with small training data. 4) The performance change observed for binary classification of COVID-19 and Normal can be generalized to the classification of COVID-19 and non-COVID-19 pneumonia.

2. Material

2.1. Datasets

Two independent datasets were used for the evaluation. The first is the BrixIA dataset, which comprises 4703 CXR images of patients with COVID-19 acquired for both triage and patient monitoring in sub-intensive and intensive care units over one month, between March 4 and April 4, 2020, at ASST Spedali Civili di Brescia. The images were retrieved from the facility's Picture Archiving and Communication System (PACS) and released in DICOM format with additional information (severity, patient age and sex, and modality manufacturer). The second is the ChestX-ray14 dataset, comprising 112 120 CXR images from 30 805 unique patients, labeled with 14 diseases or as Normal. These images were collected from the PACS database of the National Institutes of Health Clinical Center between 1992 and 2015, labeled through natural language processing (NLP), and released in Portable Network Graphics (PNG) format with additional information (patient ID, age, sex, and view position).

DICOM images in the BrixIA dataset whose window width (WW) and window center (WC) could not be derived from the DICOM tags were excluded. The remaining DICOM images were converted from 16-bit to 8-bit through windowing, which uses the WW and WC from the DICOM tags to adjust the image contrast so that particular structures are highlighted. Fig. 1 shows example CXR images from the COVID-19 and Normal classes. We randomly selected 4000 COVID-19 images from the BrixIA dataset and 4000 images labeled as Normal from the ChestX-ray14 dataset. The combined dataset of 8000 CXR images was split into 7000 images for training and validation and 1000 images for testing. From the training and validation data, 16 datasets of different sizes were selected (N = 250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, and 7000). The COVID-19 and Normal classes were equally represented in all training, validation, and test data. Table 1 and Fig. 2 show the detailed patient demographics and the age distribution for each dataset.
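The windowing step above can be sketched as a linear mapping from stored pixel values to the 0–255 range. This is a minimal sketch, assuming the linear VOI LUT function defined in the DICOM standard (PS3.3, Section C.11.2.1.2) and pixel values already expressed in the units the window refers to; the paper does not say which windowing variant or software was used, and the helper name `apply_window` is hypothetical.

```python
import numpy as np

def apply_window(pixels: np.ndarray, center: float, width: float) -> np.ndarray:
    """Hypothetical helper: map a 16-bit image to 8 bits with a linear window.

    Follows the DICOM linear VOI LUT: values at or below the window are set
    to 0, values above it to 255, and values inside it are mapped linearly.
    """
    if width <= 1:
        raise ValueError("window width must be greater than 1")
    x = pixels.astype(np.float64)
    lower = center - 0.5 - (width - 1) / 2
    upper = center - 0.5 + (width - 1) / 2
    # Linear part: ((x - (c - 0.5)) / (w - 1) + 0.5) * (y_max - y_min) + y_min
    y = ((x - (center - 0.5)) / (width - 1) + 0.5) * 255.0
    y = np.where(x <= lower, 0.0, y)   # below the window -> black
    y = np.where(x > upper, 255.0, y)  # above the window -> white
    return np.clip(np.round(y), 0, 255).astype(np.uint8)

# Usage with made-up 16-bit values and an illustrative window (WC = 2000, WW = 2000).
raw = np.array([0, 1000, 2000, 3000, 4000], dtype=np.uint16)
print(apply_window(raw, center=2000, width=2000))
```

In practice, WC and WW would be read from the Window Center (0028,1050) and Window Width (0028,1051) tags, for example with pydicom; images lacking usable values for these tags are the ones excluded above.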

Fig. 1.

Example CXR images: (a) COVID-19 image without windowing and (b) COVID-19 image with windowing, both from the BrixIA dataset; (c) Normal image from the ChestX-ray14 dataset.

Table 1.

Data and patient characteristics. AP: anteroposterior; PA: posteroanterior (view position). The view position could not be found in the DICOM tags of the BrixIA dataset; the ratio reported for this dataset is AP 87% and PA 13%.

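| Database | Purpose | Label | Images | Patients | Female | Male | Age (mean ± SD) | AP | PA |
|---|---|---|---|---|---|---|---|---|---|
| BrixIA | Training and validation | COVID-19 | 3500 | 1804 | 1060 | 2440 | 59 ± 14 | n/a | n/a |
| BrixIA | Test | COVID-19 | 500 | 429 | 130 | 370 | 58 ± 14 | n/a | n/a |
| ChestX-ray14 | Training and validation | Normal | 3500 | 3026 | 1521 | 1979 | 45 ± 17 | 1229 | 2271 |
| ChestX-ray14 | Test | Normal | 500 | 491 | 223 | 277 | 45 ± 17 | 175 | 325 |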

Fig. 2.

Age distribution for each dataset.
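The 16 class-balanced training and validation subsets described in Section 2.1 can be drawn as sketched below. The paper does not state how the subsets were sampled (for example, whether smaller subsets are nested within larger ones) or how each subset was divided into training and validation portions, so this sketch assumes independent random draws with half of each subset taken from each class; `make_subsets` and the integer image IDs are hypothetical.

```python
import random

# Subset sizes (COVID-19 + Normal combined) listed in Section 2.1.
SUBSET_SIZES = [250, 500, 750, 1000, 1500, 2000, 2500, 3000,
                3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000]

def make_subsets(covid_ids, normal_ids, sizes=SUBSET_SIZES, seed=0):
    """Return {size: [(image_id, label), ...]} with equal numbers per class.

    Assumption: each subset is an independent random draw (not nested).
    Label 1 = COVID-19, label 0 = Normal.
    """
    rng = random.Random(seed)
    subsets = {}
    for n in sizes:
        if n % 2:
            raise ValueError(f"subset size {n} cannot be split evenly")
        half = n // 2
        picked = ([(i, 1) for i in rng.sample(covid_ids, half)]
                  + [(i, 0) for i in rng.sample(normal_ids, half)])
        rng.shuffle(picked)
        subsets[n] = picked
    return subsets

# Usage: 3500 hypothetical IDs per class, matching the training/validation pool in Table 1.
subsets = make_subsets(list(range(3500)), list(range(3500, 7000)))
print({n: len(items) for n, items in subsets.items()})
```

Nested subsets, in which each larger subset contains the smaller ones, would be an equally plausible reading of the text.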

3. Methodology

Figs. 3 and 4 show the proposed method for classifying COVID-19 and Normal images, which consists mainly of preprocessing and classification with a fine-tuned CNN. Preprocessing, classification, and evaluation are detailed in the following subsections. Each CNN model, with and without fine-tuning, was trained on each of the 16 training and validation datasets and evaluated on the common test dataset.

Fig. 3.

Proposed method for classifying COVID-19 and Normal images using fine-tuned AlexNet.

Fig. 4.

Proposed method for classifying COVID-19 and Normal images using fine-tuned ResNet34.

3.1. Preprocessing

Before the CXR images were fed to the networks, all images were resized to 256 × 256 px and center-cropped to 224 × 224 px. The grayscale images were converted into a three-channel (RGB: red, green, and blue) format. The pixel values were scaled to the range 0–1 and then normalized using the mean and standard deviation, both to maintain numerical stability in the CNN architectures and to match the input expected by CNNs pretrained on ImageNet, which contains 1.4 million images in 1000 classes.
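The preprocessing steps above can be sketched in NumPy as follows. This is an approximation under stated assumptions: nearest-neighbour resizing stands in for the interpolation that an image library would normally use, and the per-channel ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225) conventionally used with torchvision's pretrained models are assumed, because the text only says "the mean and the standard deviation".

```python
import numpy as np

# Assumed normalization constants (ImageNet statistics used by torchvision models).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array (a simplification of bilinear resizing)."""
    in_h, in_w = img.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]

def preprocess(gray_u8):
    """8-bit grayscale CXR -> normalized float array of shape (3, 224, 224)."""
    img = resize_nearest(gray_u8, 256, 256)        # resize to 256 x 256
    top = (256 - 224) // 2                         # 16-pixel border on each side
    img = img[top:top + 224, top:top + 224]        # center crop to 224 x 224
    x = img.astype(np.float64) / 255.0             # scale to [0, 1]
    x = np.repeat(x[None, :, :], 3, axis=0)        # grayscale -> 3 identical channels
    return (x - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

# Usage with a synthetic 512 x 400 image.
out = preprocess(np.random.default_rng(0).integers(0, 256, (512, 400), dtype=np.uint8))
print(out.shape)
```

Replicating the single grayscale channel three times is what lets a network whose first convolution expects RGB input, such as an ImageNet-pretrained AlexNet or ResNet34, accept CXR images unchanged.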

3.2. Classification

The AlexNet and ResNet34 models were used for classification. AlexNet, the winner of the 2012 ILSVRC, is regarded as the first breakthrough CNN architecture. It has an eight-layer structure consisting of five convolutional layers and three fully connected layers, with approximately 60 million parameters. Max pooling is applied after the first, second, and fifth convolutional layers to reduce the spatial size of the feature maps. Data augmentation and dropout were used to reduce overfitting, and rectified linear units (ReLUs) were used as the activation function instead of sigmoid or hyperbolic tangent functions. ResNet, the winner of the 2015 ILSVRC, enables the training of networks with hundreds or even thousands of layers and has inspired many other models. When a network is very deep, the gradients propagated back from the loss function can shrink toward zero after repeated applications of the chain rule; as a result, the weights of the early layers are barely updated and learning stalls. ResNet addresses this degradation problem by introducing residual (shortcut) connections that add the input of a block to its output, so that each block only has to learn a residual mapping, and it achieves compelling performance. ResNet34 has 34 layers and 21.8 million parameters.
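The parameter counts quoted for AlexNet and ResNet34 can be checked with simple arithmetic over the layer shapes. The sketch below assumes the layer configurations of the torchvision implementations (AlexNet with 64/192/384/256/256 convolution channels; ResNet34 with basic residual blocks in stages of 3, 4, 6, and 3) and counts only learnable weights and biases; the helper names are hypothetical. By default it uses the 1000-class ImageNet head, and passing `num_classes=2` gives the counts for a two-class COVID-19/Normal head.

```python
def conv(c_in, c_out, k, bias=True):
    """Parameters of a k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def fc(n_in, n_out):
    """Parameters of a fully connected layer."""
    return n_in * n_out + n_out

def bn(c):
    """Learnable scale and shift of batch norm (running statistics are not parameters)."""
    return 2 * c

def alexnet_params(num_classes=1000):
    convs = (conv(3, 64, 11) + conv(64, 192, 5) + conv(192, 384, 3)
             + conv(384, 256, 3) + conv(256, 256, 3))
    fcs = fc(256 * 6 * 6, 4096) + fc(4096, 4096) + fc(4096, num_classes)
    return convs + fcs

def resnet34_params(num_classes=1000):
    total = conv(3, 64, 7, bias=False) + bn(64)  # 7 x 7 stem convolution
    c_in = 64
    for c_out, n_blocks in [(64, 3), (128, 4), (256, 6), (512, 3)]:
        for _ in range(n_blocks):
            # Basic residual block: two 3 x 3 convolutions, each followed by batch norm.
            total += conv(c_in, c_out, 3, bias=False) + bn(c_out)
            total += conv(c_out, c_out, 3, bias=False) + bn(c_out)
            if c_in != c_out:
                # 1 x 1 projection shortcut where the channel count changes.
                total += conv(c_in, c_out, 1, bias=False) + bn(c_out)
            c_in = c_out
    return total + fc(512, num_classes)

print(f"AlexNet:  {alexnet_params():,}")
print(f"ResNet34: {resnet34_params():,}")
```

The totals come to about 61.1 million and 21.8 million, consistent with the figures in the text; most of AlexNet's parameters sit in its fully connected layers, whereas ResNet34's sit in its convolutions.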

In addition to AlexNet and ResNet34 trained without pretraining, we used pretrained networks, an approach referred to here as the fine-tuning method. This method is often applied in radiology studies: the fully connected layers of the pretrained model are replaced with a new set of fully connected layers that are retrained on the given dataset, and all kernels in the pretrained convolutional base are fine-tuned by backpropagation. All convolutional base layers can be fine-tuned; alternatively, some earlier layers can be frozen while the remaining deeper layers are fine-tuned. In our method, the entire model, originally trained on the large-scale labeled ImageNet dataset, is unfrozen and retrained on CXR images. Data augmentation was not used in this study. This approach is motivated by the observation that early-layer features are more generic (e.g., edges applicable to various datasets and tasks), whereas later features become progressively more specific to a particular dataset or task [29,30].

3.3. Evaluation

The AUC derived from the receiver operating characteristic (ROC) curve, sensitivity, and specificity were used for the evaluation. The ROC curve is a graphical display of the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis as the cut-off point varies; both axes range from 0 to 1. The TPR equals the sensitivity, and the FPR equals 1 − specificity. The AUC is the definite integral of the ROC curve and is an effective combined measure of sensitivity and specificity that assesses the inherent validity of a diagnostic test; an AUC closer to 1 indicates better test performance. Sensitivity and specificity were determined at the cut-off point defined by the Youden index [31], which corresponds to the point on the ROC curve with the greatest vertical distance from the line of equality (the diagonal). The optimal cut-off is thus the value at which J = sensitivity + specificity − 1 is maximized. To construct 95% confidence intervals, the standard error was calculated using the Hanley and McNeil method [32] for the AUC and the Wald method [33] for sensitivity and specificity. In addition, the nonlinear function $y = a - b x^{-c}$, where $x$ is the number of training images and $a$, $b$, and $c$ are fitted parameters, was fitted to each performance measure to describe its relationship with the number of training images.
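The evaluation procedure above (ROC curve, AUC, Youden cut-off, Hanley and McNeil standard error, and Wald interval) can be sketched in plain NumPy. The function names are hypothetical, labels are assumed to be 1 = COVID-19 (positive) and 0 = Normal, and scores are assumed to be the predicted COVID-19 probability; in practice, scikit-learn's `roc_curve` and `roc_auc_score` would typically replace the first two helpers.

```python
import math
import numpy as np

Z_95 = 1.959964  # two-sided 95% standard normal quantile

def roc_points(y_true, scores):
    """ROC curve obtained by sweeping the cut-off over every distinct score."""
    y = np.asarray(y_true, dtype=int)
    s = np.asarray(scores, dtype=float)
    cutoffs = np.concatenate(([np.inf], np.unique(s)[::-1]))  # strictest cut-off first
    n_pos, n_neg = int(y.sum()), int((1 - y).sum())
    tpr = np.array([np.sum((s >= t) & (y == 1)) / n_pos for t in cutoffs])
    fpr = np.array([np.sum((s >= t) & (y == 0)) / n_neg for t in cutoffs])
    return fpr, tpr, cutoffs

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

def youden_cutoff(fpr, tpr, cutoffs):
    """Cut-off maximizing J = sensitivity + specificity - 1 = TPR - FPR."""
    i = int(np.argmax(tpr - fpr))
    return cutoffs[i], tpr[i], 1.0 - fpr[i]

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC (Hanley and McNeil)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)

def wald_ci(p, n, z=Z_95):
    """Wald interval for a proportion (can extend beyond [0, 1] near the extremes)."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# Toy usage with four made-up predictions.
fpr, tpr, cut = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc = auc_trapezoid(fpr, tpr)
print(auc, youden_cutoff(fpr, tpr, cut))
```

For the test set in Table 1 (500 COVID-19 and 500 Normal images), the AUC interval would be `auc ± Z_95 * hanley_mcneil_se(auc, 500, 500)`, the sensitivity interval `wald_ci(sens, 500)` using the 500 positives, and the specificity interval `wald_ci(spec, 500)` using the 500 negatives.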

4. Experiment and results

The 16 training and validation datasets were divided into mini-batches of 32 images. The AlexNet and ResNet34 models, with and without fine-tuning, were trained using the Adam optimizer (β1 = 0.9, β2 = 0.999) for 50 epochs. The learning rate was set to 1 × 10⁻⁵ for AlexNet and 1 × 10⁻⁶ for ResNet34. These hyperparameters were chosen based on the comprehensive study by Nayak et al. [16]. To assess the performance of each CNN model, the ROC curve and AUC on the common test dataset were first determined for the models trained on each of the 16 datasets. Fig. 5 shows the relationship between the AUC and the number of training images for all CNN models. Sensitivity and specificity were determined using the cut-off value given by the Youden index for each ROC curve. Fig. 6 presents example ROC curves and cut-off values obtained with the AlexNet model trained on 500, 1000, 2000, and 4000 CXR images. Figs. 7 and 8 show the relationships of sensitivity and specificity, respectively, with the number of training images. Figs. 5, 7, and 8 also show the 95% confidence intervals and the fitted nonlinear functions, whose parameters a, b, and c were estimated with SciPy (version 1.4.1), the open-source Python library for scientific and technical computing, using the Levenberg–Marquardt algorithm. Table 2 lists parameters a, b, and c.
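The fit of the nonlinear function to the 16 training-set sizes can be sketched with SciPy, taking the function to be the inverse power law $y = a - b x^{-c}$. `scipy.optimize.curve_fit` with `method="lm"` uses the Levenberg–Marquardt algorithm. Because the study's per-size measurements are not reproduced in the text, the "measurements" below are synthetic, noise-free values generated from the AlexNet AUC parameters in Table 2; they are not the study's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(x, a, b, c):
    """Inverse power-law learning curve y = a - b * x**(-c)."""
    return a - b * np.power(x, -c)

# Training-set sizes used in the study (Section 2.1).
sizes = np.array([250, 500, 750, 1000, 1500, 2000, 2500, 3000,
                  3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000], dtype=float)

# Synthetic, noise-free values generated from the AlexNet AUC row of Table 2.
true_params = (1.0072, 1.1554, 0.5274)
auc = learning_curve(sizes, *true_params)

# method="lm" selects Levenberg-Marquardt (SciPy's default for unbounded problems).
params, cov = curve_fit(learning_curve, sizes, auc, p0=(1.0, 1.0, 0.5), method="lm")
a, b, c = params
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
```

In this form, $a$ is the plateau the curve approaches as the number of training images grows, while $b$ and $c$ control how far below the plateau the curve starts and how quickly it closes the gap. With real, noisy measurements, the returned covariance matrix `cov` gives approximate standard errors for the parameters.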

Fig. 5.

Relation between the AUC and the number of training CXR images.

Fig. 6.

Example of the ROC curve and the cut-off value defined as the Youden Index.

Fig. 7.

Relation between sensitivity and the number of training CXR images.

Fig. 8.

Relation between specificity and the number of training CXR images.

Table 2.

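Determined parameters of the nonlinear function $y = a - b x^{-c}$.

| Model | AUC a | AUC b | AUC c | Sensitivity a | Sensitivity b | Sensitivity c | Specificity a | Specificity b | Specificity c |
|---|---|---|---|---|---|---|---|---|---|
| AlexNet | 1.0072 | 1.1554 | 0.5274 | 1.0643 | 0.9958 | 0.2734 | 1.0176 | 0.6486 | 0.3053 |
| Fine-tuned AlexNet | 1.0033 | 0.0502 | 0.2715 | 72.969 | 72.0594 | 0.0001 | 1.0043 | 1.4673 | 0.5727 |
| ResNet34 | 0.9994 | 70.156 | 1.0889 | 115.1554 | 114.5618 | 0.0004 | 0.967 | 2336.8376 | 1.6034 |
| Fine-tuned ResNet34 | 0.9991 | 78.6357 | 1.3567 | 0.9885 | 26.3301 | 1.0009 | 1.0035 | 2.94 | 0.6165 |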

5. Discussion

One of the features of AI/ML-based medical devices is their ability to learn from real-world data. However, obtaining a large amount of training data in the early phase is difficult, and device performance may change after the first market introduction. To bring safe and effective devices to the market in a timely manner, an appropriate post-market performance change plan should be established at the time of premarket approval, and real-world performance changes must be monitored. In this work, we studied how performance changes with the number of training images.

Fig. 9 uses the nonlinear functions in Table 2 to show the AUC versus the training sample size and the AUC increment versus the increment in training sample size. All AUCs improved rapidly as the training data increased and then reached an equilibrium state, and the 95% confidence intervals narrowed as the training data increased. Fang et al. [34] and Samala et al. [35] reported similar trends for deep learning-based organ auto-segmentation in head-and-neck patients using CT images and for binary classification of malignant and benign masses in digital breast tomosynthesis, respectively. It must therefore be determined whether performance at the preapproval stage is still changing rapidly or has reached a steady state. If it is still changing rapidly, the manufacturer and the regulatory authority should carefully monitor real-world performance changes after the first market introduction.
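Curves such as those in Fig. 9 can be computed directly from the AUC parameters in Table 2. The sketch below evaluates the fitted AUC $y = a - b x^{-c}$ and its per-image increment $dy/dx = b c x^{-c-1}$; the helper names are hypothetical. The criterion the authors used to locate the point where "the gradient disappears" is not stated, so any threshold passed to `n_where_gain_below` is an arbitrary illustrative choice, not a value from the study. Note also that some fitted asymptotes $a$ exceed 1, so the curves are only meaningful within the sampled range of 250 to 7000 images.

```python
# Fitted AUC parameters (a, b, c) from Table 2 for y = a - b * x**(-c).
AUC_FITS = {
    "AlexNet": (1.0072, 1.1554, 0.5274),
    "Fine-tuned AlexNet": (1.0033, 0.0502, 0.2715),
    "ResNet34": (0.9994, 70.156, 1.0889),
    "Fine-tuned ResNet34": (0.9991, 78.6357, 1.3567),
}

def auc_at(n, a, b, c):
    """Fitted AUC for n training images."""
    return a - b * n ** (-c)

def auc_gain_per_image(n, a, b, c):
    """Derivative dy/dn = b * c * n**(-c - 1): AUC increment per additional image."""
    return b * c * n ** (-c - 1)

def n_where_gain_below(threshold, a, b, c):
    """Training-set size at which the per-image gain falls to `threshold` (closed form)."""
    return (b * c / threshold) ** (1 / (c + 1))

for name, p in AUC_FITS.items():
    print(f"{name:20s} AUC(250) = {auc_at(250, *p):.3f}  "
          f"AUC(7000) = {auc_at(7000, *p):.3f}  "
          f"gain at 1000 = {auc_gain_per_image(1000, *p):.2e} per image")
```

Because $b$ and $c$ are positive for every model, the fitted AUC increases monotonically while the per-image gain shrinks, which is the rapid-improvement-then-equilibrium pattern described above.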

Fig. 9.

AUC versus the training sample size and the increment of the AUC versus the increment of the training sample size.

In addition, the AUCs of the fine-tuned CNNs were higher than those of the CNNs trained from scratch for all training datasets, and this effect was particularly noticeable when the training data were small. Tajbakhsh et al. [36] considered four distinct medical imaging applications in three specialties (radiology, cardiology, and gastroenterology), involving classification, detection, and segmentation across three imaging modalities, and compared deep CNNs trained from scratch with pretrained CNNs fine-tuned from ImageNet. They concluded that deeply fine-tuned CNNs are useful for medical image analysis, performing as well as CNNs trained from scratch and even outperforming them when training data are limited. They also showed that the performance gap between deeply fine-tuned CNNs and those trained from scratch widened as the training set size was reduced. Our results agree with their findings, supporting the generality of this observation.

Most previous studies used publicly available COVID-19 datasets limited to tens to a few hundred images and relied on data augmentation, transfer learning, or the combination of datasets. Zhang et al. [37] collected over 10 000 CXR images labeled as COVID-19 or non-COVID-19 pneumonia from five hospitals and more than 30 clinics by using NLP and performed binary classification with a modified DenseNet model. They examined the AUC versus the training sample size and the AUC increment versus the increment in training sample size. Their results demonstrated that more than 3000 training samples are needed to achieve an AUC above 0.90 and that, beyond 3000 training samples, the performance gain from additional samples diminishes. Their test dataset comprised 500 chest radiographs from 500 patients, with equal proportions of COVID-19 and non-COVID-19 pneumonia. A similar trend was observed in our results: the number of training images beyond which the gradient of the fitted curve effectively vanished was approximately 1600, 400, 2200, and 1100 for AlexNet, fine-tuned AlexNet, ResNet34, and fine-tuned ResNet34, respectively (corresponding AUCs: 0.98, 0.99, 0.98, and 0.99). To generalize our results, additional experiments with fine-tuned AlexNet and ResNet34 were conducted to compare model performance on datasets labeled as COVID-19 and Normal with that on datasets labeled as COVID-19 and non-COVID-19 pneumonia. Each dataset comprised 2500 CXR images randomly selected from the BrixIA and ChestX-ray14 datasets and was split into 2000 images for training and validation and 500 images for testing; 15 training and validation datasets of different sizes were then selected. Fig. 10 shows the AUC for the COVID-19/Normal and COVID-19/non-COVID-19 pneumonia datasets; similar trends are observed for both. The AUC at equilibrium for the “pneumonia” dataset was almost the same as that for the “Normal” dataset, indicating good performance. Considering the work of Ibrahim et al. [38], who achieved high performance with an AlexNet-based automatic detection method when classifying CXR images of COVID-19 versus non-COVID-19 viral pneumonia and of COVID-19 pneumonia versus healthy subjects, our results for binary classification of COVID-19 and Normal can be generalized to the classification of COVID-19 and non-COVID-19 pneumonia. The additional experiments also indicate that the main reason our performance was higher than that reported by Zhang et al. is not the difference in target diseases but the fact that their dataset was collected from multiple medical institutions.

Fig. 10.

AUC versus the training sample size for COVID-19 and Normal and COVID-19 and non–COVID-19 pneumonia.

Fig. 11 shows the relationship of sensitivity and specificity with the training sample size, using the nonlinear functions in Table 2. With a small number of CXR images, AlexNet outperformed ResNet34 in sensitivity and specificity, both with and without fine-tuning, and the difference tended to decrease as the number of training images increased. In other words, if the available data are limited, AlexNet is a more appropriate model than ResNet34. D'souza et al. [39] conducted a structural analysis and optimization of CNNs with small sample sizes, because datasets can be relatively small in many real-world applications. They trained and tested each network structure, followed by optimization of the layer dimension (layer width), using small subsets of datasets of entirely different nature (calligraphic, photographic, and microscopic). Their results suggest that “the deeper the better” does not always hold for CNNs trained on small datasets and clearly show that, as depth increases, the classification error initially drops but soon rises sharply (calligraphic and microscopic). However, this may not always hold, as the microscopic dataset showed no clear bias toward deeper or shallower networks. They concluded that the optimally performing network is largely determined by the nature of the data. Our finding that the 8-layer AlexNet outperformed the 34-layer ResNet34 on small training datasets follows the same trend as their results, especially for the calligraphic dataset (100, 500, and 1000 training samples). In the medical field, few comprehensive studies have used only small datasets without data augmentation, particularly to examine the relationship between the number of layers and performance. Therefore, how the number of layers affects performance with small training datasets, taking the target diseases into account, should be investigated in future work. The sensitivity and specificity required of AI/ML-based medical devices vary with their intended use: in general practice, high sensitivity is required for screening, and high specificity is required for definitive diagnosis. In the real world, the amount of available labeled data is limited. Therefore, the appropriate model and method must be selected, and an appropriate post-market performance change plan must be established, by considering the intended use and the available real-world labeled data.

Fig. 11.

Relation between AlexNet and ResNet34 for sensitivity and specificity.

Our study has some limitations. First, our dataset contains only COVID-19 and Normal images, in equal proportions. Real-world training data are expected to include CXR images with many pathologies beyond COVID-19 and Normal, and the ratio of COVID-19 to other findings, including Normal, may also vary. Therefore, in future work, we will study the performance change when many pathologies are added to the training data and when the class ratio is varied. Second, our study focused on binary classification of COVID-19 and Normal. For more effective classification, it is desirable to validate multi-class classification, such as COVID-19, other viral pneumonia, bacterial pneumonia, and Normal, using CXR images. Although Cohen et al. [40,41] have made publicly available CXR images with detailed labels (e.g., viral, bacterial, and fungal), the number of images is very limited. These data are continuously being collected from public sources and indirectly from hospitals and physicians, and it is hoped that many such data will become available in the future.

6. Conclusion

Performance changes must be appropriately predicted to manage the performance of AI/ML-based medical devices. In this study, we performed binary classification of COVID-19 and Normal CXR images using large datasets and observed how performance (AUC, sensitivity, and specificity) changes with the number of training images. In the medical field, few comprehensive studies have used large datasets, particularly to examine the relationship between performance and the amount of training data, because building large datasets is costly and burdensome for professionals and raises ethical and privacy concerns. This study provides fundamental insight for regulators, policy makers, researchers, and manufacturers on how to develop appropriate post-market performance change plans.

Declaration of competing interest

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

1. Litjens G., Kooi T., Bejnordi B.E., Setio A.A.A., Ciompi F., Ghafoorian M., van der Laak J.A.W.M., van Ginneken B., Sánchez C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017;42:60–88. doi: 10.1016/j.media.2017.07.005.
2. Hamamoto Ryuji, Suvarna Kruthi, Yamada Masayoshi, Kobayashi Kazuma, Shinkai Norio, Miyake Mototaka, Takahashi Masamichi, Jinnai Shunichi, Shimoyama Ryo, Sakai Akira, Takasawa Ken, Bolatkan Amina, Shozu Kanto, Ai Dozen, Machino Hidenori, Takahashi Satoshi, Asada Ken, Komatsu Masaaki, Sese Jun, Kaneko Syuzo. Application of artificial intelligence technology in oncology: towards the establishment of precision medicine. Cancers. 2020;12(12). doi: 10.3390/cancers12123532.
3. Muehlematter Urs J., Daniore Paola, Vokinger Kerstin N. Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): a comparative analysis. The Lancet Digital Health. 2021.
4. U.S. Food and Drug Administration. Artificial Intelligence and Machine Learning (AI/ML) Software as a Medical Device Action Plan. January 2021.
5. U.S. Food and Drug Administration. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD): Discussion Paper and Request for Feedback. April 2019.
6. World Health Organization. Use of Chest Imaging in COVID-19: A Rapid Advice Guide. June 2020.
7. American College of Radiology. ACR Recommendations for the Use of Chest Radiography and Computed Tomography (CT) for Suspected COVID-19 Infection. March 2020.
8. Russakovsky Olga, Deng Jia, Su Hao, Krause Jonathan, Satheesh Sanjeev, Ma Sean, Huang Zhiheng, Karpathy Andrej, Khosla Aditya, Bernstein Michael, et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015;115(3):211–252.
9. Yamashita R., Nishio M., Do R.K.G., Togashi K. Convolutional neural networks: an overview and application in radiology. Insights Imaging. 2018;9(4):611–629. doi: 10.1007/s13244-018-0639-9.
10. Hemdan Ezz El-Din, Shouman Marwa A., Karar Mohamed Esmail. COVIDX-Net: a framework of deep learning classifiers to diagnose COVID-19 in X-ray images. arXiv:2003.11055. 2020.
11. Narin Ali, Kaya Ceren, Pamuk Ziynet. Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. arXiv:2003.10849. 2020.
12. Sethy Prabira Kumar, Behera Santi Kumari. Detection of coronavirus disease (COVID-19) based on deep features. MDPI AG; 2020.
13. Ozturk T., Talo M., Yildirim E.A., Baloglu U.B., Yildirim O., Rajendra Acharya U. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput. Biol. Med. 2020;121:103792. doi: 10.1016/j.compbiomed.2020.103792.
14. Wang Linda, Wong Alexander. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. arXiv. 2020.
15. Ucar F., Korkmaz D. COVIDiagnosis-Net: deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images. Med. Hypotheses. 2020;140:109761. doi: 10.1016/j.mehy.2020.109761.
16. Nayak S.R., Nayak D.R., Sinha U., Arora V., Pachori R.B. Application of deep learning techniques for detection of COVID-19 cases using chest X-ray images: a comprehensive study. Biomed. Signal Process. Control. 2021;64:102365. doi: 10.1016/j.bspc.2020.102365.
17. He Kaiming, Zhang Xiangyu, Ren Shaoqing, Sun Jian. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 2016.
18. Krizhevsky Alex, Sutskever Ilya, Hinton Geoffrey E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012;25:1097–1105.
19. Rahaman M.M., Li C., Yao Y., Kulwa F., Rahman M.A., Wang Q., Qi S., Kong F., Zhu X., Zhao X. Identification of COVID-19 samples from chest X-ray images using deep learning: a comparison of transfer learning approaches. J. X Ray Sci. Technol. 2020;28(5):821–839. doi: 10.3233/XST-200715.
20. Simonyan Karen, Zisserman Andrew. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. 2014.
21. Pham Tuan D. Classification of COVID-19 chest X-rays with deep learning: new models or fine tuning? Health Inf. Sci. Syst. 2021;9(1):1–11. doi: 10.1007/s13755-020-00135-3.
22. Szegedy Christian, Liu Wei, Jia Yangqing, Sermanet Pierre, Reed Scott, Anguelov Dragomir, Erhan Dumitru, Vanhoucke Vincent, Rabinovich Andrew. Going deeper with convolutions. arXiv:1409.4842. 2014.
23. Iandola Forrest N., Han Song, Moskewicz Matthew W., Ashraf Khalid, Dally William J., Keutzer Kurt. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360. 2016.
24. Wynants L., Van Calster B., Collins G.S., Riley R.D., Heinze G., Schuit E., Bonten M.M.J., Dahly D.L., Damen J.A.A., Debray T.P.A., de Jong V.M.T., De Vos M., Dhiman P., Haller M.C., Harhay M.O., Henckaerts L., Heus P., Kammer M., Kreuzberger N., Lohmann A., Luijken K., Ma J., Martin G.P., McLernon D.J., Andaur Navarro C.L., Reitsma J.B., Sergeant J.C., Shi C., Skoetz N., Smits L.J.M., Snell K.I.E., Sperrin M., Spijker R., Steyerberg E.W., Takada T., Tzoulaki I., van Kuijk S.M.J., van Bussel B., van der Horst I.C.C., van Royen F.S., Verbakel J.Y., Wallisch C., Wilkinson J., Wolff R., Hooft L., Moons K.G.M., van Smeden M. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. BMJ. 2020;369:m1328. doi: 10.1136/bmj.m1328.
25. Roberts Michael, Driggs Derek, Thorpe Matthew, Gilbey Julian, Yeung Michael, Ursprung Stephan, Aviles-Rivero Angelica I., Etmann Christian, McCague Cathal, Beer Lucian, et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 2021;3(3):199–217.
26. Signoroni A., Savardi M., Benini S., Adami N., Leonardi R., Gibellini P., Vaccher F., Ravanelli M., Borghesi A., Maroldi R., Farina D. BS-Net: learning COVID-19 pneumonia severity on a large chest X-ray dataset. Med. Image Anal. 2021;71:102046. doi: 10.1016/j.media.2021.102046.
27. Vayá Maria de la Iglesia, Saborit Jose Manuel, Montell Joaquim Angel, Pertusa Antonio, Bustos Aurelia, Cazorla Miguel, Galant Joaquin, Barber Xavier, Orozco-Beltrán Domingo, García-García Francisco, et al. BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients. arXiv:2006.01174. 2020.
28. Wang Lu, Wang Hao, Xia Chen, Wang Yao, Tang Qiaohong, Li Jiage, Zhou Xiao-Hua. Toward standardized premarket evaluation of computer aided diagnosis/detection products: insights from FDA-approved products. Expert Rev. Med. Devices. 2020;17(9):899–918. doi: 10.1080/17434440.2020.1813566.
29. Zeiler Matthew D., Fergus Rob. Visualizing and understanding convolutional networks. European Conference on Computer Vision (ECCV). Springer; 2014. pp. 818–833.
30. Yosinski Jason, Clune Jeff, Bengio Yoshua, Lipson Hod. How transferable are features in deep neural networks? arXiv:1411.1792. 2014.
31. Kumar Rajeev, Indrayan Abhaya. Receiver operating characteristic (ROC) curve for medical researchers. Indian Pediatr. 2011;48(4):277–287. doi: 10.1007/s13312-011-0055-4.
32. Hanley J.A., McNeil B.J. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148(3):839–843. doi: 10.1148/radiology.148.3.6878708.
33. Youden William J. Index for rating diagnostic tests. Cancer. 1950;3(1):32–35. doi: 10.1002/1097-0142(1950)3:1<32::aid-cncr2820030106>3.0.co;2-3.
34. Fang Yingtao, Wang Jiazhou, Ou Xiaomin, Ying Hongmei, Hu Chaosu, Zhang Zhen, Hu Weigang. The impact of training sample size on deep learning-based organ auto-segmentation for head-and-neck patients. Phys. Med. Biol. 2021;66(18):185012. doi: 10.1088/1361-6560/ac2206.
35. Samala Ravi K., Chan Heang-Ping, Hadjiiski Lubomir, Helvie Mark A., Richter Caleb D., Cha Kenny H. Breast cancer diagnosis in digital breast tomosynthesis: effects of training sample size on multi-stage transfer learning using deep neural nets. IEEE Trans. Med. Imag. 2018;38(3):686–696. doi: 10.1109/TMI.2018.2870343.
36. Tajbakhsh Nima, Shin Jae Y., Gurudu Suryakanth R., Hurst R. Todd, Kendall Christopher B., Gotway Michael B., Liang Jianming. Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans. Med. Imag. 2016;35(5):1299–1312. doi: 10.1109/TMI.2016.2535302.
37. Zhang Ran, Xin Tie, Qi Zhihua, Bevins Nicholas B., Zhang Chengzhu, Griner Dalton, Song Thomas K., Nadig Jeffrey D., Schiebler Mark L., Garrett John W., et al. Diagnosis of coronavirus disease 2019 pneumonia by using chest radiography: value of artificial intelligence. Radiology. 2021;298(2):E88–E97. doi: 10.1148/radiol.2020202944.
38. Umar Ibrahim Abdullahi, Ozsoz Mehmet, Serte Sertan, Al-Turjman Fadi, Shizawaliyi Yakoi Polycarp. Pneumonia classification using deep learning from chest X-ray images during COVID-19. Cognitive Computation. 2021:1–13.
39. D’souza R.N., Huang P.Y., Yeh F.C. Structural analysis and optimization of convolutional neural networks with a small sample size. Sci. Rep. 2020;10(1):834. doi: 10.1038/s41598-020-57866-2.
40. Cohen Joseph Paul, Morrison Paul, Dao Lan, Roth Karsten, Duong Tim Q., Ghassemi Marzyeh. COVID-19 image data collection: prospective predictions are the future. arXiv:2006.11988. 2020.
41. Cohen Joseph Paul, Morrison Paul, Dao Lan. COVID-19 image data collection. arXiv:2003.11597. 2020.

Articles from Computers in Biology and Medicine are provided here courtesy of Elsevier
