2022 Dec 15;84:102722. doi: 10.1016/j.media.2022.102722

Bilateral adaptive graph convolutional network on CT based Covid-19 diagnosis with uncertainty-aware consensus-assisted multiple instance learning

Yanda Meng a, Joshua Bridge a, Cliff Addison b, Manhui Wang b, Cristin Merritt c, Stu Franks c, Maria Mackey d, Steve Messenger d, Renrong Sun e, Thomas Fitzmaurice f, Caroline McCann g, Qiang Li h, Yitian Zhao h,i,⁎, Yalin Zheng a,j,⁎⁎
PMCID: PMC9753459  PMID: 36574737

Abstract

Coronavirus disease (COVID-19) has caused a worldwide pandemic, putting millions of people’s health and lives in jeopardy. Detecting infected patients early on chest computed tomography (CT) is critical in combating COVID-19. Harnessing uncertainty-aware consensus-assisted multiple instance learning (UC-MIL), we propose to diagnose COVID-19 using a new bilateral adaptive graph-based (BA-GCN) model that can use both 2D and 3D discriminative information in 3D CT volumes with an arbitrary number of slices. Given the importance of lung segmentation for this task, we have created the largest manual annotation dataset so far, with 7,768 slices from COVID-19 patients, and have used it to train a 2D segmentation model that segments the lungs from individual slices and masks them as the regions of interest for the subsequent analyses. We then used the UC-MIL model to estimate the uncertainty of each prediction and the consensus between multiple predictions on each CT slice, in order to automatically select a fixed number of CT slices with reliable predictions for the subsequent model reasoning. Finally, we adaptively constructed a BA-GCN with vertices from different granularity levels (2D and 3D) to aggregate multi-level features for the final diagnosis, benefiting from the graph convolutional network’s superiority in tackling cross-granularity relationships. Experimental results on the three largest COVID-19 CT datasets demonstrate that our model can produce reliable and accurate COVID-19 predictions using CT volumes with any number of slices, and that it outperforms existing approaches in terms of learning and generalisation ability. To promote reproducible research, we have made the datasets, including the manual annotations and the cleaned CT dataset, as well as the implementation code, available at https://doi.org/10.5281/zenodo.6361963.

Keywords: COVID-19, Multiple instance learning, Graph convolutional network, Uncertainty and consensus, CT images

1. Introduction

Coronavirus disease (COVID-19) is a highly contagious respiratory infection caused by the new coronavirus SARS-CoV-2. The most frequent symptoms of infection in the majority of infected people are fever, dry cough, and malaise (Wang et al., 2020c). Some of these patients quickly deteriorate, developing acute respiratory distress syndrome, septic shock, multiple organ failure, and even death, among other complications (Huang et al., 2020, Li et al., 2020a, Chen et al., 2020b). Nearly 600 million people have been infected worldwide so far, over 6 million have lost their lives, and COVID-19 still spreads across the world. Timely and accurate COVID-19 diagnosis is critical for estimating the need for intensive care unit admission, oxygen therapy, prompt treatment, and so on. Despite the large number of deep learning models that have been proposed so far for the diagnosis of COVID-19 using computed tomography (CT) and X-ray, none of them is clinically usable due to methodological flaws and/or underlying biases (Roberts et al., 2021). There is an unmet need for accurate and robust diagnosis models for COVID-19.

Given existing COVID-19 related datasets, such as computed tomography (CT), X-ray, etc., previous deep learning based diagnosis methods (Goncharov et al., 2021, Bai et al., 2020b, Li et al., 2020b, Wang et al., 2020b, Han et al., 2020, He et al., 2021a) focus on the identification of three classes: novel coronavirus pneumonia (NCP), normal controls (Normal), and common pneumonia (CP), at either the 2-dimensional (2D) or 3-dimensional (3D) level depending on the type of data they use. Specifically, CT plays an important role in diagnosing and quantifying COVID-19 and other pneumonia (He et al., 2021a, Wang et al., 2021, Yang et al., 2021a, Yao et al., 2021, Xie et al., 2020, Zhu et al., 2021, Chao et al., 2021, Yang et al., 2021b, Xu et al., 2021). The appearances of infective pneumonia on CT can give clues to its aetiology, as certain consolidation patterns are associated with specific pneumonias. Fig. 1 compares axial CT slices showing various patterns of pneumonia.

Fig. 1.

Fig. 1

Axial CT slices demonstrating various patterns of pneumonia (emphasised by red arrows). A: consolidation in the posterior right upper lobe and superior right lower lobe showing typical air bronchograms and a segmental/lobar distribution in an individual with bacterial pneumonia. B: multifocal patches of airspace change in the posterior right upper lobe in an individual with viral pneumonia. C: bilateral multifocal ground glass changes in the upper lobes with some smaller reticulonodular opacities, in an individual with COVID-19 pneumonia.

The CP group consists of different disease types, normally including community-acquired bacterial pneumonia and viral pneumonia. In detail, community-acquired bacterial pneumonia is described as showing focal segmental or lobar opacities, but may also show ground glass attenuation or centrilobular nodules (Vilar et al., 2004). Viral pneumonia is often described as multifocal, patchy or ground glass consolidation, with influenza specifically demonstrating bilateral reticulonodular opacities (Koo et al., 2018, Kim et al., 2002). COVID-19 is associated with ground glass opacities (GGO) and areas of consolidation that are often bilateral and peripheral (Shi et al., 2020, Li et al., 2020d, Hani et al., 2020). However, given the overlap in radiological appearances between aetiological agents, with a few exceptions no reliable diagnosis of bacterial or viral origin can be made from CT (Reittner et al., 2003), and attempts to differentiate definitively between COVID-19 and other viral pneumonia by imaging have previously met with similarly limited success (Kim et al., 2021, Bai et al., 2020a). Additionally, the underlying correlations among CT slices are essential in NCP diagnosis and infection detection tasks (Greenspan et al., 2020) but have not been sufficiently considered in existing methods. Thus, in this work we specifically design and evaluate the proposed model on COVID-19 CT datasets. The proposed framework can also be readily applied to other medical applications where 3D data such as CT or MRI are used. Fig. 2 shows an overview of our proposed methods, where we propose a novel diagnosis framework in an attempt to address four critical difficulties that were rarely discussed or left unsolved by earlier CT-based COVID-19 approaches. The four critical issues are discussed and elaborated as follows.

Fig. 2.

Fig. 2

Overview of the proposed diagnosis framework. Our framework first automatically segments and crops the lung regions from the input raw 3D CT volume. Then, we automatically select trustworthy slices and the corresponding 2D features via the proposed UC-MIL. Afterwards, a graph-based reasoning model, BA-GCN, is proposed to aggregate and fuse the information (vertices) at the 2D and 3D levels simultaneously, which contributes to the final diagnosis.

Firstly, lung segmentation is an essential step prior to performing the COVID-19 classification task; however, it has received little attention in previous methods. Due to a lack of ground truth masks, previous methods (Wu et al., 2021a, Goncharov et al., 2021) segmented the lungs with models pre-trained on non-COVID datasets, while others (Wang et al., 2020b, Wang et al., 2020a) adopted un-/weakly-supervised schemes. However, due to the noticeable domain gap and the complex appearance of CT images specific to COVID-19 (e.g. severe cases with massive GGO), the major issue is poor segmentation performance, which compromises the subsequent NCP classification task. As a result, these methods need to manually clean a large number of wrong segmentations, which increases the labour cost and the inconvenience of use. Here we manually annotated 7,768 slices from public COVID-19 datasets and trained a segmentation model under a fully-supervised learning mechanism. Our segmentation model achieves more accurate results than pre-trained models or previous un-/weakly-supervised methods; please refer to Fig. 6 for the qualitative comparison between our model’s and others’ segmentations. We also show that, without lung segmentation, the subsequent diagnosis model may only learn a specific format pattern of the different classes rather than the actual radiographic diagnosis characteristics (i.e. GGO for NCP). This may be due to specific CT scanner models, protocol standards, data sources of different classes, etc. The potential dataset issues related to the lung segmentation process are further discussed in Section 7.1.

Fig. 6.

Fig. 6

Qualitative comparison of pre-segmented slices and our segmentation results on the CC-CCII (Zhang et al., 2020) dataset. The top row shows the pre-segmented slices provided by CC-CCII and the bottom row shows our segmentation examples on un-segmented cases. Red bounding boxes indicate the pre-segmented slices’ false positive or false negative predictions. In particular, the top left and top right examples illustrate typical false negative predictions, where potential GGO regions may be treated as background, as the patient-level label for this case is COVID-19 positive. Such false negative segmentation would perturb the subsequent COVID-19 classification model training because no infection areas or diagnosis characteristics are left in the segmented CT slices. On the other hand, our segmentation model can produce a complete lung region, even when there are large infection regions (e.g. GGO). Please note that CC-CCII only provides the pre-segmented CT slices without the original ones, thus we cannot directly compare the segmentation results on the same examples.

Secondly, selecting a fixed number of slices from each CT volume is often compulsory, as the size of the inputs has to be the same for specific models (Fang et al., 2021, He et al., 2021a, Wang et al., 2021, Li et al., 2020c). Manual selection of CT slices is labour-intensive and time-consuming, which is incompatible with the goal of using AI models. Automatic selection following pre-defined slice sampling rules, on the other hand, may result in a hand-crafted optimisation process. Additionally, missing possibly infected slices may produce a noisy dataset with intrinsic uncertainty. In this work, we propose to automatically select reliable CT slices according to the model’s probability prediction on 2D slices. A specially designed Uncertainty-aware Consensus-assisted Multiple Instance Learning (UC-MIL) model is proposed to achieve this goal. Our UC-MIL can extract 2D level features for each CT slice and automatically select trustworthy slices.

Thirdly, several methods (Shamsi et al., 2021, Mallick et al., 2020, Calderon-Ramirez et al., 2021) have attempted to quantify the uncertainty in the COVID-19 classification task but rarely exploited it during the training process. In other words, they only treated uncertainty as a quantification tool after the model had been trained, which overlooked the potential benefits of uncertainty during model training. In general, there are two types of uncertainty (Kendall and Gal, 2017): epistemic uncertainty, which corresponds to the uncertainty in the model parameters and can be addressed when sufficient data is available; and aleatoric uncertainty, which corresponds to the inevitable noisy perturbations present in the data. Publicly available CT datasets (e.g. Zhang et al. (2020)) contain inevitable inherent uncertainties and constraints (He et al., 2021a), such as data from multiple domains, duplicated or noisy slices, damaged data, disordered and incomplete slices, etc. Alleviating the aleatoric uncertainty and exploiting it during supervision is important for the COVID-19 classification task. In this work, we propose a UC-MIL to extract 2D level features and select reliable CT slices. Specifically, an uncertainty and consensus estimation module is proposed to assist the supervision process of the multiple instance learning (MIL) model. The underlying motivations are threefold: (1) As discussed before, the inherent uncertainty in the CT dataset may perturb the model learning process. (2) In some NCP cases, there might be only a few slices with COVID-19 features. Under the assumption of a class-imbalanced slice distribution in a CT volume, classic MIL might place the classification decision boundary closer to the uncertain (rare) slices (Li et al., 2021a). (3) Owing to the weakly supervised learning nature of MIL, the model is prone to overfitting to noisy and uncertain slices (Khan et al., 2019), as all the slices from a COVID-19 positive CT volume will have the same positive label. Nonetheless, because many slices may still look normal, this label assignment may mistakenly introduce label noise and uncertainty.

Fourthly, previous COVID-19 related deep learning methods only rely on features extracted at either the 2D or the 3D level, for example, 2D CNN models on 2D X-ray images (Zhong et al., 2021, Minaee et al., 2020, Soda et al., 2021) or 3D CNN models for CT volumes (He et al., 2021a, Wang et al., 2021, Zhu et al., 2021). Differently, we propose to aggregate and reason over features from the 2D and 3D levels concurrently during the model learning process. Specifically, we adopt the pre-trained 2D CNN model of the proposed UC-MIL as the initialised 2D level feature extractor. We also adopt a 3D CNN as the backbone network to extract the 3D level features. Please refer to Section 3.3 for further details. With the 2D and 3D features extracted from the input CT slices, we propose a BA-GCN to aggregate the 2D and 3D information. Previous graph-based methods (Meng et al., 2021c, Luo et al., 2020a, Guo et al., 2021) have proven the superiority of graph-based models in tackling cross-granularity relationships. In this work, we regard the 2D and 3D features as the bilateral vertices in a graph. A Graph Convolution Network (GCN) based model is proposed to aggregate information and exchange messages between cross-granularity vertices (2D and 3D). Note that the graph structure and the edge relationships between vertices are adaptively learnt during the reasoning process, according to the 2D and 3D level features, respectively. In this way, the proposed BA-GCN can adaptively fuse and reason over the bilateral relationships between 2D and 3D vertices. Specifically, in this work, the message exchange and information aggregation within the 2D/3D vertices is considered ‘inner-granularity’, and that between the 2D and 3D vertices ‘cross-granularity’. Our experiments show that such an adaptively learnt graph can better tackle the cross-granularity relationships and achieves superior classification performance compared with previous GCN reasoning based methods. Please refer to Section 6.2.2 for more details.

In summary, this work makes the following contributions:

  • This work shows that lung segmentation is an essential and necessary pre-processing step for the COVID-19 classification task on public CT datasets. We establish the largest lung region mask dataset, with precise manual annotations of lung boundaries on the public COVID-19 CT dataset. Because of its significance, we will make it publicly available to promote related research in the community.

  • We propose an Uncertainty-aware Consensus-assisted Multiple Instance Learning (UC-MIL) model for 2D level feature extraction and automatic selection of reliable CT slices simultaneously. This avoids handcrafted data preparation and also allows the framework to work on CT volumes with an arbitrary number of slices. It also alleviates the effects of inherent noise in public datasets on the learning and the potential uncertainty from the weakly-supervised learning mechanism of MIL.

  • We propose a Bilateral Adaptive Graph Convolution Network (BA-GCN) to aggregate information and exchange messages between bilateral cross-granularity vertices (2D and 3D levels). An adaptively learned graph structure and edge relationships are built during the graph learning process to fuse and reason the relationships between 2D and 3D vertices. This helps our proposed method consider features at both levels when making inference, thus improving the classification performance.

  • Extensive experiments show that our framework comprising UC-MIL and BA-GCN outperforms existing related approaches in terms of learning ability on the three largest publicly available COVID-19 CT datasets. With respect to varying dataset sources, we evaluate the generalisation ability of the proposed model by using one of them as the external dataset, demonstrating its superior robustness and generalisability compared with previous methods.

2. Related works

In this section, we review previous COVID-19 related works at the 2D and 3D levels, respectively, covering several aspects such as classification, infection segmentation, and severity assessment. Additionally, as lung segmentation is an essential pre-processing step for the diagnosis, we review and compare previous works with such pre-processing in a separate section. Apart from that, GCN related works on biomedical image tasks (segmentation, classification, etc.) are also discussed.

2.1. COVID-19 diagnosis at 2D level

It is known that tackling the NCP diagnosis problem with 2D X-ray or 2D ultrasound images can achieve promising results in many tasks, such as severity assessment (Signoroni et al., 2021, Xue et al., 2021), infection localisation (Vieira et al., 2021, Wang et al., 2021b, Roy et al., 2020, Malhotra et al., 2021) and diagnosis (Zhong et al., 2021, Minaee et al., 2020, Soda et al., 2021, Oh et al., 2020, Aviles-Rivero et al., 2022, Guarrasi et al., 2022, Kumar et al., 2022, Shorfuzzaman and Hossain, 2021, Fan et al., 2021, Bridge et al., 2020). However, compared with CT images, X-ray cannot reveal the significant appearance characteristics of NCP, such as GGO, multi-focal patchy consolidation and bilateral patchy shadows (Zhang et al., 2020). On the other hand, CT images are 3D volumes, which contain correlated spatial information among slices, essential for NCP diagnosis and infection localisation tasks. However, some previous methods (Gao et al., 2021, Wu et al., 2021b, Uemura et al., 2021a, Wang et al., 2020d, Liu et al., 2021b, Fan et al., 2020b, Wang et al., 2020e, Zhou et al., 2020, Hou et al., 2021) overlooked the 3D spatial information and developed 2D deep learning models for the aforementioned tasks on selected CT slices only. This is mainly due to the limited 3D data at the early pandemic stage, and the varying number of slices in CT scans from different patients. Thus, it is difficult to develop models that can directly take CT volumes with an arbitrary number of slices as input. A potential solution adopted by previous methods (Goncharov et al., 2021, Bai et al., 2020b, Li et al., 2020b, Qian et al., 2020) is to extract 2D features independently for each slice, and then combine all slices’ feature maps via pooling operations. Although all the slices are considered, the features are still extracted independently, and correlations between slices are not utilised. Other than that, hand-crafted selection of a fixed number of slices is commonly used by most CT based COVID-19 methods. We will discuss these methods in the following section (Section 2.2).

2.2. COVID-19 diagnosis at 3D level

Information at the 3D level is essential for tasks related to COVID-19. Most deep learning based models use 3D CT volumes as the input, for tasks such as classification (He et al., 2021a, Wang et al., 2021, Tan and Liu, 2021), segmentation (Yang et al., 2021a, Yao et al., 2021, Xie et al., 2020, Yang et al., 2021b), and disease progression (Zhu et al., 2021, Chao et al., 2021, Xu et al., 2021). However, all of them need a pre-processing step to select a fixed number of slices as the input of these models. For example, Fang et al. (2021) selected 64 slices per CT volume as the model’s input. Similarly, He et al. (2021a) and Tan and Liu (2021) utilised different slice sampling rules, including random sampling and symmetrical sampling, to select a fixed number of slices. Then a neural architecture search (NAS) technique was proposed to search 3D models for the NCP diagnosis. Along the same line, Wang et al. (2021) used an equal interval sampling rule to select slices. A joint segmentation and classification model was proposed to indicate 3D lesion regions and the NCP diagnosis simultaneously. Li et al. (2020c) proposed to extract the features of COVID-19 positive and negative samples as the pretext task, and then a downstream model was developed to tackle the NCP classification. However, the pre-selection step was not discussed, where a fixed size of 256 × 192 × 56 voxels was cropped from the CT volume as the input. Ouyang et al. (2020) proposed a size-balanced slice sampling mechanism to train the model by repeating NCP data with small infections and CP data with large infections in each mini-batch. A pre-selection process for different patients w.r.t. different infection regions (small or large) needed to be done manually as well. Excessive manual pre-processing makes the whole framework labour-intensive and unsuitable for real-world applications.

Despite the cutting-edge performance of the models mentioned above, manual selection of a fixed number of slices is an underrated and rarely discussed issue in CT based COVID-19 tasks. For example, manual selection of CT slices is labour-intensive and time-consuming, which defeats the purpose of developing AI models. Automatic selection under manually designed slice sampling rules may lead to a handcrafted optimisation process and may miss potentially infected slices, which results in noisy data and unreliable predictions. Furthermore, more hyper-parameters, such as the interval value, are introduced into the developed model, which impairs the model's generalisability. Differently, we propose a UC-MIL framework that works as an automated trustworthy slice selection module, according to the estimated uncertainty and consensus scores during inference. Thus, our framework can automatically select the corresponding slices, eliminating the labour-intensive pre-selection process and meeting the needs of real-world applications. In other words, our model can work with a raw CT volume with an arbitrary number of slices instead of pre-selected stacked CT slices.

2.3. Multiple instance learning

Multiple Instance Learning (MIL) based methods (Li et al., 2021b, Chikontwe et al., 2021, He et al., 2021b, Han et al., 2020) play a significant role in addressing the aforementioned challenges. In detail, a whole CT volume of a patient is considered as a bag of slices (instances) that can be COVID-19 positive or negative. Then a patient-level label is given to train the model under a weakly-supervised learning mechanism. Most of the aforementioned MIL based methods are inspired by Ilse et al. (2018), where an attention mechanism is proposed to learn a scoring system among different instances for the patient-level inference. For example, Li et al. (2021b) proposed an attention-based MIL framework for the task of NCP severity assessment, where the instance-level attention module assigns attention scores to different instances automatically during inference. Along the same line, Chikontwe et al. (2021) and Han et al. (2020) both exploited instance-level attention mechanisms in the task of NCP diagnosis. In contrast, we propose to study the uncertainty and interpretability of the MIL model. A scoring system among different slices is achieved via the uncertainty value of each instance’s probability predicted by our UC-MIL model. On the other hand, previous MIL based methods only rely on the extracted features at the 2D instance level. The attention module can only be seen as a weighting system among the embeddings of bags; the underlying correlations between instances are still understudied. Nevertheless, these correlations are essential in NCP diagnosis and infection detection tasks (Greenspan et al., 2020). In our proposed framework, the UC-MIL performs feature extraction and reliable slice selection in the first stage. Moreover, we developed a 3D volume-based BA-GCN model in the second stage to simultaneously exploit the 2D pixel-level features and 3D slice-level correlations for better diagnosis performance.

2.4. Segmentation before classification

To mitigate the influence of non-lung regions in CT slices, a standard pipeline is to segment the lung region as a prerequisite before the NCP diagnosis (Wu et al., 2021a, Wang et al., 2020b, Wang et al., 2020a, Zhao et al., 2021b). For example, Wu et al. (2021a) and Goncharov et al. (2021) segmented the lung regions using a U-net pre-trained on other (non-COVID) disease datasets, such as NSCLC (Kiser et al., 2020) and LUNA16 (Team, 2011), and then directly applied it to COVID-19 CT datasets (e.g. CC-CCII (Zhang et al., 2020) or MosMed (Morozov et al., 2020)). However, NSCLC and LUNA16 are CT datasets containing epithelial lung cancers, which differ noticeably from CC-CCII (Zhang et al., 2020) and MosMed (Morozov et al., 2020). The domain gaps between these datasets cause poor segmentation performance of the pre-trained model, which in turn compromises the NCP diagnosis performance. Differently, Wang et al. (2020b) utilised an unsupervised method (Liao et al., 2019) to extract the connected component activation regions, which are regarded as the lung regions. However, the segmentation performance is relatively poor. This is due mainly to the distinct appearance of NCP CT slices compared with normal ones, such as GGO, multi-focal patchy consolidation and patchy bilateral shadows. Thus, they had to manually clean a large number of failure cases. On the other hand, Wang et al. (2020a), following Wang et al. (2019), used primitive thresholding and connected-component labelling algorithms to obtain a binary lung mask that indicates the coarse lung regions. Then, a sub-image was cropped to contain the lung regions covered by the convex hull of the lung masks. They treated the rough mask as the ground truth to train a model to segment the lungs, which led to inevitably noisy training data because of the inaccurate lung regions.

In summary, the aforementioned methods either adopted a pre-trained model or un-/weakly-supervised methods to segment the lung region due to the lack of ground truth. The primary issue is poor segmentation performance, which perturbs the following NCP diagnosis task. Again, some methods need to clean the wrong segmentations manually, which increases the labour cost and reduces repeatability. On the contrary, we trained our segmentation module with the manually annotated lung masks under a fully-supervised learning mechanism; our segmentation model can achieve highly accurate results. We will make this manual annotation dataset publicly available. For more details about the dataset, readers are referred to Section 4.2.

2.5. Uncertainty-assisted COVID-19 diagnosis

In recent years, the uncertainty and interpretability of deep learning models have been explored in several different computer vision tasks, such as scene understanding (Meng et al., 2021b, Zhang et al., 2021) and medical image analysis (Yu et al., 2019, Ji et al., 2021, Wang et al., 2020f, Luo et al., 2020b). Quantifying the uncertainty is crucial for the COVID-19 classification task since publicly available CT datasets contain inherent constraints, such as data from multiple domains, limited dataset size, etc. Shamsi et al. (2021) proposed a transfer learning-based framework with the help of quantified uncertainty to address the COVID-19 diagnosis problem. They estimated the epistemic uncertainty with an ensemble learning scheme (Lakshminarayanan et al., 2017). Differently, Mallick et al. (2020) developed a deep uncertainty-aware classifier using a probabilistic generalisation of the non-parametric KNN approach. The proposed probabilistic neighbourhood component analysis method maps samples to latent probability distributions and then minimises a form of nearest-neighbour loss to develop classifiers. They then estimated the uncertainty in terms of a threshold on the fraction of correctly classified examples. On the other hand, Calderon-Ramirez et al. (2021) investigated the underlying capability of unlabelled data to improve the reliability of uncertainty. They estimated the uncertainty with Monte Carlo Dropout (Le et al., 2018) under the MixMatch (Berthelot et al., 2019) semi-supervised learning scheme.

Although these aforementioned methods studied the uncertainty in the diagnosis of COVID-19 cases, the estimated uncertainty is only used as a quantification tool at the inference stage, which overlooked the potential benefits of uncertainty during the model training. Instead, we exploit the value of uncertainty throughout the training process. Specifically, an uncertainty-aware consensus-assisted training mechanism is proposed to help the model produce more reliable predictions. Please read Section 3.2.2 for more details.

2.6. Graph-based diagnosis and reasoning

Graph-based reasoning algorithms have been studied in recent years. Benefiting from the Graph Neural Network (GNN)’s superior ability of information propagation and message exchange, they have achieved promising results in segmentation (Meng et al., 2020a, Meng et al., 2020b, Huang et al., 2021, Meng et al., 2021c, Meng et al., 2021a, Zhang et al., 2019), classification (Liang et al., 2018, Chen et al., 2019, Rhee et al., 2018, Chen et al., 2020a, Noh et al., 2020, Hao et al., 2021) and reconstruction (Zhao et al., 2021a, Yao et al., 2019, Wickramasinghe et al., 2020, Kong et al., 2021, Chen et al., 2021) tasks in the field of biomedical image analysis. Graph based techniques (Di et al., 2021, Aviles-Rivero et al., 2022, Liu et al., 2021a) have been used to tackle COVID-19 related tasks as well. For example, Aviles-Rivero et al. (2022) proposed a graph diffusion model that reinforces the natural relation among a tiny labelled set and vast unlabelled data in a semi-supervised learning scheme. Specifically, the graph is built on the initial embeddings of the network, where each node represents an image, to produce pseudo labels, which are used for the semi-supervised NCP classification task. Moreover, Kumar et al. (2022) combined a CNN and a GCN to learn relation-aware representations from NCP X-ray images. Along the same line, Di et al. (2021) proposed a hypergraph model for the diagnosis of NCP. In detail, various types of features (e.g. regional features and radiographic features) are extracted from CT images for each case (CT volume). Then, the relationship among different cases is formulated by a hypergraph structure. Again, each case represents a vertex (node) in the hypergraph. Similarly, Liu et al. (2021a) proposed a distance-aware pooling procedure along with a GCN to gradually aggregate the slice level features to the patient level. The CT scan is converted to a densely connected graph, where each slice represents a vertex (node) in the graph. The problem becomes a graph classification task, and each graph represents a different patient (CT volume).

The aforementioned methods share a similar idea: each instance (single slice or whole CT volume) is represented as a vertex in the proposed graph. A subsequent graph reasoning mechanism then propagates the vertex and edge information at the instance level. However, there are some fundamental limitations: (1) the instance level features are reasoned individually. For example, the work by Liu et al. (2021a) focused only on slice level feature reasoning by a graph; the same situation occurs in the works of Aviles-Rivero et al. (2022) and Di et al. (2021) at the patient level (whole CT volume or X-ray) as well. This setting limits the graph-based model’s capability to tackle cross-granularity or cross-feature information propagation. In other words, the GCNs mentioned above only serve to build long-range relationships between instances. However, such functionality can also be achieved by pure CNN based methods, according to the recent development of Non-local methods (Wang et al., 2018a) or Transformer-based methods (Dosovitskiy et al., 2021). (2) GCN based methods (Kumar et al., 2022, Liu et al., 2021a) adopted Laplacian smoothing-based graph convolution (Kipf and Welling, 2017), which provides specific benefits in the sense of global long-range information reasoning. However, they estimated the initial graph structure from a data-independent Laplacian matrix. Such a matrix is defined by a handcrafted or randomly initialised adjacency matrix (Meng et al., 2021a), which leads a model to learn a specific long-range context pattern (Li et al., 2020e). Differently, our graph-based model considers features from both the 2D and 3D levels to propagate the cross-granularity information. Also, as seen in previous works, the graph structure can be estimated with a similarity matrix from the input data (Li and Gupta, 2018). We estimate the initial adjacency matrix in an input-dependent way. Specifically, a reasoning mechanism is achieved by propagating information and passing messages among inner-granularity and cross-granularity vertices (2D and 3D). Additionally, the structure of our BA-GCN is adaptively built during the graph reasoning according to the information at the 2D and 3D levels. Thus, the graph representations can be adaptively learnt in an input-dependent way instead of the pre-defined handcrafted way of the previous methods. Please read Section 3.3.2 for more details. Notably, a recent work (Zhao et al., 2020) built the adjacency matrix based on the instance features in a bag under the MIL paradigm, which can also be regarded as input-dependent. However, they handcrafted the adjacency matrix weights, and the major differences between ours and theirs are threefold: (1) Zhao et al. (2020) built a binary adjacency matrix with edge weight values of 0 or 1 to indicate whether two vertices are connected or not. However, the similarity among vertices is overlooked. Differently, ours exploits the vertices’ own correlations and can indicate the similarity of different vertices with normalised edge weights between 0 and 1. (2) Zhao et al. (2020) introduced a hyper-parameter (γ) to determine whether two vertices are connected, according to their Euclidean distance. Conversely, ours does not introduce any hyper-parameter and only relies on the vertices’ own correlations. (3) Ours constructs a fully-connected graph with every vertex connected to every other vertex, whereas Zhao et al. (2020) did not because of the potential edge weight of 0.

3. Methods

Fig. 3 shows the proposed method’s pipeline. It contains three sub-tasks: (1) lung region segmentation, (2) reliable CT slice selection and COVID-19 classification at the 2D level (UC-MIL), and (3) COVID-19 classification at both the 2D and 3D levels (BA-GCN). Given an input CT volume with an arbitrary number of slices, we first segment the lung regions for each slice, then feed the segmented CT volume into a UC-MIL model to learn and extract the relevant features at the 2D slice level under a weakly supervised learning mechanism. After that, we select D slices from each CT volume according to the predicted probability of the UC-MIL model. The D slices and the corresponding 2D features extracted from UC-MIL are regarded as the input to the proposed BA-GCN. The BA-GCN learns the features at the 3D volume level (D slices) and also propagates the information from the 2D level features among different vertices in the bilateral graph. Notably, the hyper-parameter D is empirically set to 16 in this work. The details of each task and the developed models are elaborated as follows.
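To make the three-stage pipeline concrete, the hedged sketch below traces one CT volume through it. The callables segment_lungs, ucmil and bagcn are hypothetical stand-ins for the three sub-modules, and the slice ranking shown here is a simplification of the uncertainty-aware consensus-assisted selection detailed in Section 3.2.2; D = 16 follows the value stated above.

```python
import torch

def diagnose(ct_volume, segment_lungs, ucmil, bagcn, D=16):
    """Hypothetical end-to-end sketch of the pipeline in Fig. 3.
    ct_volume: (N_i, H, W) raw slices of one patient; N_i is arbitrary."""
    # Stage 0: mask out non-lung regions slice by slice.
    lungs = torch.stack([segment_lungs(s) * s for s in ct_volume])
    # Stage 1: UC-MIL scores every slice and yields per-slice 2D feature maps.
    slice_probs, feats_2d = ucmil(lungs)              # (N_i, C), (N_i, 128, 7, 7)
    # Keep the D most reliable slices (simplified ranking; see Section 3.2.2).
    top = slice_probs.max(dim=-1).values.topk(D).indices
    # Stage 2: BA-GCN fuses the stacked 3D sub-volume with the 2D features.
    return bagcn(lungs[top], feats_2d[top])           # patient-level class probabilities
```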

Fig. 3.

Fig. 3

Illustration of the proposed method’s pipeline. In addition to the lung segmentation and region cropping, the two stage diagnosis mechanism w.r.t. UC-MIL and BA-GCN is shown at the top and bottom, respectively. Seg represents the lung region segmentation; UC score denotes the estimated uncertainty and consensus scores. Notably, the non-lung regions are masked out from the raw CT data by our lung segmentation model before being input into the UC-MIL. The 2D/3D vertices are initialised by the feature maps at the 2D/3D level, which are extracted from the UC-MIL and the MF-Net backbone, respectively.

3.1. Lung segmentation

Because our primary focus was the COVID-19 classification task, here we only utilised existing segmentation methods, including classic models such as UNet (Ronneberger et al., 2015) and UNet++ (Zhou et al., 2019), and other cutting-edge methods such as PraNet (Fan et al., 2020a), RBA-Net (Meng et al., 2020a), CABNet (Meng et al., 2020b), GRB-GCN (Meng et al., 2021c), and BI-GConv (Meng et al., 2021a). We trained those models with the annotated slices at the 2D level, applied the trained model to the remaining unannotated images and then cropped the lung regions. After that, the CT volume containing only the lungs is ready for the following COVID-19 diagnosis task. Please note that the lung segmentation process is essential and necessary in the task of COVID-19 classification, primarily due to the dataset issue. Please refer to Section 7.1 for further details.

3.2. UC-MIL for diagnosis on 2D level

To develop a comprehensive COVID-19 classification model, we built a UC-MIL model to learn the diagnosis features at 2D level. In the MIL paradigm (Amores, 2013, Dietterich et al., 1997, Maron and Lozano-Pérez, 1998), unlabelled instances belong to labelled bags of instances. The goal is to predict the label of a new bag or the label of each instance. We will elaborate the mechanisms of the proposed UC-MIL in the following subsections.

3.2.1. Multiple instance learning

We denote a patient’s CT volume as a bag and the slices herein as instances, following the standard MIL formulation. We associate the bag label with the corresponding instances. In other words, all instances from the same bag have the same label and are considered discriminatory. Nonetheless, this assignment may inadvertently add label noise in positive bags due to the possibility of a certain number of slices being negative. Thus, exploiting the discriminative training samples is essential under this circumstance. Here, ‘discriminative’ represents that the true hidden label of the instance is the same as the true label of the bag.

Let $\mathbf{X}=\{X_1, X_2, \ldots, X_N\}$ be the dataset containing $N$ bags. Each bag $X_i=\{x_{i,1}, x_{i,2}, \ldots, x_{i,N_i}\}$ consists of $N_i$ instances, where $x_{i,j}$ is the $j$-th instance and $y_{i,j}$ denotes its associated label in the $i$-th bag. Please note that $N_i$ may differ, because different CT volumes contain different numbers of slices. The label $Y_i$ of bag $X_i$ is given by:

$$Y_i = \begin{cases} 0, & \text{iff } \sum_{j} y_{i,j} = 0,\\ 1, & \text{otherwise}. \end{cases} \tag{1}$$

Generally, a MIL based prediction model contains an appropriate transformation $f$ and a permutation-invariant transformation $g$ (Wang et al., 2018b, Ilse et al., 2018, Li et al., 2021a). Thus, the MIL prediction for bag $X_i$ is defined as:

$$P(X_i)=g\big(f(x_{i,1}), f(x_{i,2}), \ldots, f(x_{i,N_i})\big). \tag{2}$$

With respect to the choice of $f$ and $g$, there are generally two types: 1.) Instance-based approach. $f$ is an instance classifier that assigns a score to each instance, and $g$ is a pooling operator (e.g. max pooling) that fuses the instance scores to obtain a bag score. Specifically, a 2D CNN was trained to predict the class probability of each instance. A few instances with the highest responses were selected and back-propagation was performed on them during training. An iterative process was used with a new set of discriminative instances until convergence. 2.) Embedding-based approach. $f$ is an instance-level feature extractor that maps each instance to an embedding; $g$ is an aggregation operator that produces a bag embedding from the instance embeddings and outputs a bag score based on the bag embedding. The embedding-based method generates a bag score from a bag embedding supervised by the bag label. The discriminative and non-discriminative instances’ embeddings contribute differently to the overall bag prediction (Wang et al., 2018b). However, it is typically more challenging to identify the discriminative instances that activate the classifier, compared with instance-based approaches (Liu et al., 2017).
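As a concrete illustration of the instance-based variant (the direction we take in Section 3.2.2), a minimal sketch is given below: a shared 2D CNN plays the role of $f$ and max-pooling over the instance scores plays the role of $g$ in Eq. (2). The class and argument names are placeholders, not part of our released implementation.

```python
import torch
import torch.nn as nn

class InstanceBasedMIL(nn.Module):
    """Minimal instance-based MIL: f scores every slice, g = max-pooling (Eq. (2))."""
    def __init__(self, instance_classifier: nn.Module):
        super().__init__()
        self.f = instance_classifier              # any 2D CNN ending in a C-way head

    def forward(self, bag: torch.Tensor):
        # bag: (N_i, 3, H, W) -- all slices of one CT volume
        instance_probs = self.f(bag).softmax(dim=-1)   # (N_i, C) per-slice scores
        bag_probs = instance_probs.max(dim=0).values   # g: the most confident slice drives the bag score
        return instance_probs, bag_probs
```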

3.2.2. Uncertainty-aware consensus-assisted multiple instance learning

All the previous MIL based COVID-19 diagnosis methods (Li et al., 2021b, Chikontwe et al., 2021, Han et al., 2020) are embedding-based methods adapted from Ilse et al. (2018). In this work, we take another direction and propose an instance-based UC-MIL method. Our experimental results show that the proposed method outperforms previous instance-based and embedding-based methods under two different evaluation settings (learning ability and generalisation ability). Additionally, we conducted extensive ablation studies to determine the backbone network of the proposed MIL method. More experimental details are given in Section 6.2.1.

Previous instance-based MIL methods (Wu et al., 2015, Pinheiro and Collobert, 2015, Hou et al., 2016, Perdomo et al., 2018, Qiu and Sun, 2019, Campanella et al., 2019, Zhang et al., 2022) achieved promising results on different medical image classification tasks, such as whole slide image classification and optical coherence tomography image classification. However, two significant challenges remain for these works. Firstly, the distribution of instances in the positive bags may be extremely imbalanced when only a tiny proportion of instances are positive, and models are prone to misclassify those positive instances as negative, especially when a simple aggregation operator, such as max-pooling, is used. This is because, under the assumptions of MIL and imbalanced instances in a bag, max-pooling might place the classification decision boundary closer to the uncertain (rare) instances (Li et al., 2021a). Secondly, as discussed above, all the instances from the same bag have the same label and are considered discriminatory. Nonetheless, this assignment may inevitably add label noise to positive bags due to the possibility of a certain number of slices being negative. Due to such a weakly supervised learning mechanism, the model is prone to overfitting to noisy and uncertain instances, resulting in poor generalisability in real-world clinical practice. Additionally, instances with high uncertainty have a disproportionate presence in the classification space, making it difficult to generalise the learnt boundaries to new test examples (Khan et al., 2019).

To solve this problem, we integrate an uncertainty estimation module and a consensus achievement module into the standard instance-based MIL training pipeline, where an uncertainty-aware consensus-assisted supervision process is conducted. Firstly, to quantify the reliability of each instance’s prediction, we adopt the Shannon entropy (Shannon and Weaver, 1949) as the metric to measure the randomness of the information (Shannon, 2001), which is referred to as the uncertainty in this work. Formally, given a $C$-dimensional softmax predicted class score $P_{x_{i,j}}(C)$ for an input instance $x_{i,j}$, the uncertainty $I_{x_{i,j}}$ is defined as:

$$I_{x_{i,j}} = -\sum_{c=1}^{C} P_{x_{i,j}}(C) \odot \log P_{x_{i,j}}(C), \tag{3}$$

where $\odot$ is the Hadamard product and $C$ is the number of classes. In practice, we perform $T$ stochastic forward passes of the instance classifier under random dropout and Gaussian-noise-perturbed input for each input instance. Note that $T$ is empirically set to 8 in this work. Therefore, under such a self-ensembling mechanism, we obtain a set of softmax probability vectors $\{P^{t}_{x_{i,j}}\}_{t=1}^{T}$, and the mean predicted class score $\tilde{P}_{x_{i,j}}(C)$ is given as:

$$\tilde{P}_{x_{i,j}}(C) = \frac{1}{T}\sum_{t=1}^{T} P^{t}_{x_{i,j}}, \tag{4}$$

thus, based on Eq. (3), we can obtain the uncertainty $\tilde{I}_{x_{i,j}}$ for input instance $x_{i,j}$ as:

$$\tilde{I}_{x_{i,j}} = -\sum_{c=1}^{C} \tilde{P}_{x_{i,j}}(C) \odot \log \tilde{P}_{x_{i,j}}(C). \tag{5}$$

With the quantified uncertainty $\tilde{I}_{x_{i,j}}$ for instance $x_{i,j}$, we normalise $\tilde{I}_{x_{i,j}}$ into $[0,1]$ and then perform element-wise broadcasting multiplication between $\tilde{I}_{x_{i,j}}$ and the softmax predicted class score $P_{x_{i,j}}(C)$. In this way, the uncertainty-weighted probability prediction $P^{\tilde{I}}_{x_{i,j}}(C)$ for each instance $x_{i,j}$ is calculated as:

$$P^{\tilde{I}}_{x_{i,j}}(C) = \tilde{I}_{x_{i,j}} \odot P_{x_{i,j}}(C), \tag{6}$$

where $\odot$ denotes the element-wise broadcasting multiplication. In other words, the operator $g$ in our UC-MIL considers the reliability of each $f(x_{i,j})$ in Eq. (2), and only the trustworthy slices are considered when the model learns the features.
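A minimal sketch of Eqs. (3)-(6) is given below, assuming the self-ensembling scheme described above (T = 8 stochastic passes under dropout and Gaussian input noise). The noise level, the batch-maximum normalisation of the entropy, and applying the weighting to the mean score are illustrative assumptions rather than the exact released implementation.

```python
import torch

def uncertainty_weighted_scores(classifier, x, T=8, noise_std=0.05):
    """x: (N, 3, H, W) batch of instances (slices). Returns the stacked per-pass
    probabilities, the mean score (Eq. (4)) and the uncertainty-weighted score (Eq. (6))."""
    classifier.train()                                    # keep dropout stochastic
    passes = []
    with torch.no_grad():
        for _ in range(T):
            x_t = x + noise_std * torch.randn_like(x)     # Gaussian-perturbed input (std assumed)
            passes.append(classifier(x_t).softmax(dim=-1))
    probs = torch.stack(passes)                           # (T, N, C)
    p_mean = probs.mean(dim=0)                            # Eq. (4)
    entropy = -(p_mean * p_mean.clamp_min(1e-8).log()).sum(-1, keepdim=True)  # Eq. (5)
    entropy = entropy / entropy.max().clamp_min(1e-8)     # normalise into [0, 1]
    p_weighted = entropy * p_mean                         # Eq. (6), broadcast over the C classes
    return probs, p_mean, p_weighted
```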

Secondly, under a certain perturbation, network predictions for memorised features learned from noise change significantly, while those for generalised features do not (Lee and Chung, 2020). In other words, the predictions of a generalisable instance classifier should be robust to input perturbation, and a predicted class score that changes significantly under a certain perturbation hence strongly suggests a noisy instance (Li et al., 2021a). Thus, we quantify the consensus in terms of the standard deviation over a self-ensembling model's multiple outputs, with the same input but under various perturbations. Formally, for an instance $x_{i,j}$, given a set of softmax probability vectors $\{P^{t}_{x_{i,j}}\}_{t=1}^{T}$ and the mean predicted class score $\tilde{P}_{x_{i,j}}(C)$, the standard deviation $\hat{P}_{x_{i,j}}(C)$ of the predicted class score is defined as:

$$\hat{P}_{x_{i,j}}(C) = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\Big(P^{t}_{x_{i,j}} - \tilde{P}_{x_{i,j}}(C)\Big)^{2}}, \tag{7}$$

which is regarded as the metric of consensus in this work. With this quantified consensus, we exclude the uncertain instances so as to guide the model to learn from more reliable instances. More specifically, the reliable instances $x_{i,j_r}$ in bag $X_i$ are selected iff $\hat{P}_{x_{i,j}}(C)$ is smaller than a threshold $\gamma$. Formally, for bag $X_i$, the trustworthy instance set $\Omega$ is given by:

$$\Omega=\{x_{i,j} \mid \hat{P}_{x_{i,j}}(C) < \gamma\}. \tag{8}$$

Notably, we performed extensive experiments to tune the hyper-parameter $\gamma$, which is empirically set to 0.02 in this work. The number of trustworthy slices in $\Omega$ ranges from 16 to 45 for all of the data used in this work. This comes with the advantage that our framework can deal with CT volumes with an arbitrary number of slices.
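A possible realisation of Eqs. (7)-(8) is sketched below; it reuses the stacked per-pass probabilities from the previous sketch. Reducing the per-class standard deviation to one scalar per slice with a maximum is an assumption made here for illustration.

```python
import torch

def trustworthy_set(probs, gamma=0.02):
    """probs: (T, N_i, C) softmax scores over T perturbed passes for one bag.
    Returns the indices of the slices kept in the trustworthy set Omega (Eq. (8))."""
    p_mean = probs.mean(dim=0)                               # Eq. (4)
    p_std = ((probs - p_mean) ** 2).mean(dim=0).sqrt()       # Eq. (7), per class
    consensus = p_std.max(dim=-1).values                     # one scalar per slice (assumed reduction)
    return (consensus < gamma).nonzero(as_tuple=True)[0]     # gamma = 0.02 as stated in the text
```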

Combining the uncertainty and consensus scores discussed above, the whole optimisation procedure of the proposed UC-MIL method in a single bag ($X_i$) can be found in Algorithm 1. An iterative process is used with new sets of bags $(X_1, \ldots, X_N)$ to update the parameters of the instance classifier until convergence.

Algorithm 1. The optimisation procedure of the proposed UC-MIL in a single bag (provided as a figure in the original article).

In this way, whether a retrieved discriminative instance is trustworthy or noisy can be differentiated by the model during training. The learnt classifier considers the uncertainty level of the instance predictions to re-adjust boundaries (i.e., providing more room to uncertain samples). This improves the generalisation ability of the proposed model under both imbalanced instances and the weakly supervised learning mechanism (Khan et al., 2019). Furthermore, our experiments show that, with the UC-MIL training, our model outperforms previous instance-level MIL methods by a large margin in the evaluation of generalisation ability. Notably, previous instance-level MIL methods achieve promising classification results on seen data (i.e. the evaluation of learning ability), but drop dramatically on unseen data (i.e. the evaluation of generalisation ability).

During training, we adopted the same method as used in Campanella et al. (2019), which selects the top instances with maximum prediction probability within a bag as the bag’s prediction. Such bag-level aggregation derives directly from the standard multiple instance assumption, is generally referred to as ‘max-pooling’ (Campanella et al., 2019), and is shown in Fig. 3. With the proposed UC-MIL, we obtain temporary patient-level diagnosis results in the first stage. However, the instance level features are learned individually during the whole training process. In other words, only 2D level information is considered in UC-MIL. Thus we aggregate both 2D and 3D features in the subsequent BA-GCN, which helps to make the diagnosis more reliable and accurate.
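Since Algorithm 1 is only available as a figure, the sketch below gives one plausible training iteration for a single bag that combines the pieces described above: uncertainty-weighted scoring, consensus-based slice selection and max-pooling bag aggregation. The optimiser, loss and the two helper functions from the previous sketches are assumptions, not the released code.

```python
import torch
import torch.nn.functional as F

def ucmil_training_step(classifier, optimizer, bag, bag_label, T=8, gamma=0.02):
    """One hypothetical UC-MIL iteration for a single bag (CT volume).
    bag: (N_i, 3, H, W); bag_label: scalar class-index tensor."""
    probs, p_mean, p_weighted = uncertainty_weighted_scores(classifier, bag, T=T)
    keep = trustworthy_set(probs, gamma=gamma)          # consensus-selected slices (Eq. (8))
    if keep.numel() == 0:                               # fall back to all slices if none pass
        keep = torch.arange(bag.shape[0], device=bag.device)
    # Re-run the kept slices with gradients enabled and aggregate by max-pooling.
    logits = classifier(bag[keep])                      # (|keep|, C)
    bag_logit = logits.max(dim=0).values                # top-instance ('max-pooling') bag prediction
    loss = F.cross_entropy(bag_logit.unsqueeze(0), bag_label.view(1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```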

3.3. Diagnosis at both 2D and 3D levels

In this section, we present the proposed BA-GCN for COVID-19 diagnosis at both the 2D and 3D levels. As discussed in Section 2.2, the correlations between different CT slices are essential for the COVID-19 diagnosis. For bag $X_i$, we select the top $D$ instances (slices) according to the ranked order of the uncertainty-aware consensus-assisted instance prediction probability ($P^{\tilde{I}}_{x_{i,j_r}}(C)$) from the corresponding trustworthy set $\Omega$. Then we stack the slices along the depth channel as the 3D input for the proposed BA-GCN. In this way, we can automatically select a fixed number of reliable slices from each CT volume, which avoids the labour-intensive manual selection process or other hand-crafted slice sampling strategies adopted by previous methods (Ouyang et al., 2020, Li et al., 2020c, Wang et al., 2021, He et al., 2021a, Fang et al., 2021). Additionally, the extracted slice-level features of UC-MIL are used as the 2D feature map input for the proposed BA-GCN. Specifically, for each of the $D$ slice classifiers in UC-MIL, we extract the feature map before the pooling layer and add a 1×1 convolution layer to reduce the channel size to 128. Then, for each CT volume, we stack all the corresponding $D$ feature maps along the depth channel as the ‘2D’ input for the proposed BA-GCN. We denote this ‘2D’ input as $X_{2D}$ in this work. Notably, $X_{2D}$ has a size of D × 128 × 7 × 7. The size format follows (D × C × H × W), where D is the number of slices, C is the channel size, and H and W represent the height and width of the feature maps, respectively. There are two primary modules in the proposed BA-GCN: (1) the Backbone Network, and (2) the Bilateral Adaptive Graph Reasoning Module. The details of each are elaborated as follows.
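The sketch below shows one way the two BA-GCN inputs could be assembled from the UC-MIL outputs, following the shapes stated above (D = 16 slices; 128-channel 7 × 7 feature maps). The argument names and the externally supplied 1 × 1 convolution are illustrative assumptions.

```python
import torch

def build_bagcn_inputs(slices, feats, scores, reduce_1x1, D=16):
    """slices: (N_i, H, W) lung-masked CT slices; feats: (N_i, C_f, 7, 7) per-slice
    feature maps from UC-MIL; scores: (N_i,) uncertainty/consensus-assisted slice
    probabilities used for ranking; reduce_1x1: a trained nn.Conv2d(C_f, 128, 1)."""
    top = scores.topk(min(D, scores.numel())).indices    # top-D trustworthy slices
    volume_3d = slices[top].unsqueeze(0).unsqueeze(0)    # (1, 1, D, H, W) input to the 3D backbone
    x_2d = reduce_1x1(feats[top])                        # (D, 128, 7, 7), the '2D' input X_2D
    return volume_3d, x_2d
```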

3.3.1. Backbone network

We first feed the 3D CT volumes into a backbone network to extract features and learn the correlations between different slices at the 3D level. Different from previous methods (Li et al., 2020c, Wang et al., 2021, Ouyang et al., 2020), where 3D extensions of ResNet (He et al., 2016) or Inception-Net (Szegedy et al., 2015) are used as the backbone, we adopt the Multi-Fibre Network (MF-Net) (Chen et al., 2018) due to its superior ability to extract discriminative features in recognition tasks. MF-Net (Chen et al., 2018) is a sparsely connected 3D CNN backbone that incurs minimal computational overhead but brings a boosted feature representation capability. The multiple separated lightweight residual units, called fibres, can effectively reduce the number of connections within the network and enhance the model efficiency. These advantages of MF-Net fit and benefit our model in this specific task. Our ablation study results also show that the MF-Net based backbone outperforms ResNet and Inception-Net variants in this work. Specifically, the 3D Multi-Fibre Units enhance the model efficiency while effectively reducing the number of computations. In detail, we extract the feature map before the pooling layer, then add a 1×1×1 convolution layer to reduce the channel size to 128, and save it for the subsequent information aggregation process in the proposed BA-GCN. We refer to these feature maps as $X_{3D}$ in this work. Notably, $X_{3D}$ has the same size as $X_{2D}$, namely D × 128 × 7 × 7.

3.3.2. Bilateral adaptive graph reasoning module

Given the feature maps extracted from UC-MIL as 2D level’s information (X2D) and the feature maps extracted from MF-Net Backbone as 3D level’s information (X3D), we propose a bilateral adaptive graph to aggregate the features from both 2D and 3D levels. In detail, a graph reasoning module is achieved by information exchange and propagation among different granularity levels of vertices. Additionally, our graph structure and the edge relationship are adaptively learnt during the reasoning process according to the 2D and 3D level features’ own information. Thus, a bilateral adaptive graph representation can be learnt in an input-dependent way, rather than predefined hand-craft ones (Di et al., 2021, Aviles-Rivero et al., 2022, Liu et al., 2021a).

Classic Graph Convolution We begin with a review of classic graph convolution. Given a graph G = (V, E), the normalised Laplacian matrix is defined as $L = I - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, where $I$ is the identity matrix, $A$ is the adjacency matrix, and $D$ is a diagonal matrix representing the degree of each vertex in $V$, such that $D_{ii}=\sum_{j} A_{i,j}$. Because the graph’s Laplacian is a symmetric and positive semi-definite matrix, $L$ may be diagonalised using the Fourier basis $U \in \mathbb{R}^{N\times N}$, resulting in $L=U\Lambda U^{T}$. Thus, the spectral graph convolution of two signals $x_i$ and $x_j$ in Fourier space may be described as $x_i \star x_j = U\big((U^{T}x_i)\odot(U^{T}x_j)\big)$. The columns of $U$ correspond to the orthogonal eigenvectors $U=[u_1,\ldots,u_n]$, and $\Lambda=\mathrm{diag}([\lambda_1,\ldots,\lambda_n])\in\mathbb{R}^{N\times N}$ is a diagonal matrix with non-negative eigenvalues. Due to the fact that $U$ is not a sparse matrix, this operation is computationally inefficient. Defferrard et al. (2016) hypothesised that the convolution operation on a graph may be characterised by constructing spectral filtering with a kernel $g_{\theta}$ in Fourier space through a recursive Chebyshev polynomial. The filter $g_{\theta}$ is parameterised as a Chebyshev polynomial expansion of order $K$, such that $g_{\theta}(L)=\sum_{k}\theta_{k}T_{k}(\hat{L})$, where $\theta\in\mathbb{R}^{K}$ is a vector of Chebyshev coefficients, and $\hat{L}=2L/\lambda_{max}-I_{N}$ is the rescaled Laplacian. $T_{k}\in\mathbb{R}^{N\times N}$ is the Chebyshev polynomial of order $k$. Kipf and Welling (2017) further simplified the graph convolution to $g_{\theta}=\theta(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}})$, where $\hat{A}=A+I$, $\hat{D}_{ii}=\sum_{j}\hat{A}_{ij}$, and $\theta$ are the only remaining Chebyshev coefficients. The corresponding graph Laplacian adjacency matrix $\hat{A}$ is handcrafted, causing the model to learn a specific long-range context pattern rather than an input-related one (Li et al., 2020e). Thus, we refer to the classic graph convolution (Kipf and Welling, 2017) as handcrafted input-independent graph convolution.
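For reference, a minimal dense implementation of the simplified graph convolution of Kipf and Welling (2017) is sketched below, with a fixed, handcrafted adjacency matrix; it is meant only to contrast with the adaptive construction introduced next.

```python
import torch
import torch.nn as nn

class ClassicGConv(nn.Module):
    """g_theta = theta(D^-1/2 (A + I) D^-1/2) X W with a fixed adjacency matrix A."""
    def __init__(self, A: torch.Tensor, in_dim: int, out_dim: int):
        super().__init__()
        A_hat = A + torch.eye(A.shape[0])                       # add self-loops
        d = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(d.clamp_min(1e-8).pow(-0.5))
        # Handcrafted, input-independent propagation matrix.
        self.register_buffer("prop", D_inv_sqrt @ A_hat @ D_inv_sqrt)
        self.W = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, X: torch.Tensor) -> torch.Tensor:         # X: (N, in_dim) vertex features
        return torch.relu(self.prop @ self.W(X))
```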

Bilateral Adaptive Graph Convolution Given $X_{2D}\in\mathbb{R}^{N_{2d}\times C}$ and $X_{3D}\in\mathbb{R}^{N_{3d}\times C}$, where $C$ is the channel size, $N_{2d}=H_{2d}\times W_{2d}\times D$ and $N_{3d}=H_{3d}\times W_{3d}\times D$ are the numbers of spatial locations of the 2D and 3D level input features, which are referred to as the numbers of vertices. Note that $H$, $W$ and $D$ represent the height, width, and depth of the corresponding level of feature map, respectively. Firstly, we construct the bilateral adjacency matrix ($\tilde{A}$) in an adaptive way. The vertices of the 2D and 3D levels ($X_{2D}$, $X_{3D}$) contribute to the adjacency matrix construction concurrently and adaptively. In detail, we stack them together and represent the result as $X_{all}\in\mathbb{R}^{(N_{3d}+N_{2d})\times C}$, which is regarded as the input vertices of the BA-GConv (shown in Eq. (12)). Then, we implement two matrices ($\tilde{\Lambda}_{c}$ and $\tilde{\Lambda}_{s}$) to execute channel-wise attention on the dot-product distance and to quantify spatially weighted relations between the various input vertex embeddings, respectively. For example, $\tilde{\Lambda}_{c}(X_{all})\in\mathbb{R}^{C\times C}$ is the matrix that contains channel-wise attention on the dot-product distance of the input vertex embeddings; $\tilde{\Lambda}_{s}(X_{all})\in\mathbb{R}^{(N_{3d}+N_{2d})\times(N_{3d}+N_{2d})}$ is the spatial-wise weighting matrix, measuring the spatial relationships among different vertices.

$$\tilde{\Lambda}_{c}(X_{all})=\big(\mathrm{MLP}(\mathrm{Pool}_{c}(X_{all}))\big)^{T}\otimes\big(\mathrm{MLP}(\mathrm{Pool}_{c}(X_{all}))\big), \tag{9}$$

where $\otimes$ denotes matrix product; $\mathrm{Pool}_{c}(\cdot)$ is the global max pooling for each vertex embedding; $\mathrm{MLP}(\cdot)$ is a multi-layer perceptron with one hidden layer. On the other hand,

$$\tilde{\Lambda}_{s}(X_{all})=\big(\mathrm{Conv}(\mathrm{Pool}_{s}(X_{all}))\big)\otimes\big(\mathrm{Conv}(\mathrm{Pool}_{s}(X_{all}))\big)^{T}, \tag{10}$$

where $\mathrm{Pool}_{s}(\cdot)$ represents the global max pooling for each position in the vertex embedding along the channel axis; $\mathrm{Conv}(\cdot)$ is a 1 × 1 convolution layer. The data-dependent adjacency matrix $\tilde{A}$ is given by the spatial and channel attention-enhanced input vertex embeddings. We initialise the bilateral adjacency matrix $\tilde{A}\in\mathbb{R}^{(N_{3d}+N_{2d})\times(N_{3d}+N_{2d})}$ as:

$$\tilde{A}=\psi(X_{all},W_{\psi})\otimes\tilde{\Lambda}_{c}(X_{all})\otimes\psi(X_{all},W_{\psi})^{T}+\big(\zeta(X_{all},W_{\zeta})\otimes\zeta(X_{all},W_{\zeta})^{T}\big)\odot\tilde{\Lambda}_{s}(X_{all}), \tag{11}$$

where $\otimes$ represents matrix product; $\odot$ denotes Hadamard product; $\psi(X_{all},W_{\psi})\in\mathbb{R}^{(N_{2d}+N_{3d})\times C}$ and $\zeta(X_{all},W_{\zeta})\in\mathbb{R}^{(N_{2d}+N_{3d})\times C}$ are both linear embeddings; $W_{\psi}$ and $W_{\zeta}$ are learnable parameters. Fig. 4 shows a detailed demonstration of the bilateral adjacency matrix $\tilde{A}$. Please note that the different granularity levels of relationships among the vertices from the 2D and 3D levels ($X_{2D}$, $X_{3D}$) are exploited in this bilateral graph, where the graph is adaptively built according to the multi-granularity vertices’ own correlations in a data-dependent way. With the constructed $\tilde{A}$, the normalised Laplacian matrix is given as $\tilde{L}=I-\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$, where $I$ is the identity matrix and $\tilde{D}$ is a diagonal matrix that represents the degree of each vertex, such that $\tilde{D}_{ii}=\sum_{j}\tilde{A}_{i,j}$; notably, a softmax is applied on $\tilde{A}$ to obtain normalised adjacency weights. We calculate the degree matrix $\tilde{D}$ in the same way as Meng et al. (2021a) and Li et al. (2020e), to reduce the computational overhead. Given the computed $\tilde{L}$, with $X_{all}$ as the input vertex embeddings, we formulate the single-layer BA-GConv as:

$Y = \sigma(\tilde{L} X_{all} W_G) + X_{all}$, (12)

where $W_G \in \mathbb{R}^{C \times C}$ denotes the trainable weights of the BA-GConv and $\sigma$ is the ReLU activation function. Additionally, we include a residual connection to preserve the features of the input vertices. $Y \in \mathbb{R}^{(N_{3d}+N_{2d}) \times C}$ is the output vertex features. Empirically, three layers of the proposed BA-GConv with residual connections build up a graph reasoning module (BA-GCN, shown in Fig. 4). After the BA-GCN, a convolution layer is added to reduce the channel size to one. Two layers of MLP, with ReLU and Softmax as the activation functions respectively, are used to aggregate the output vertex features and predict the final patient-level diagnosis probability.
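The following PyTorch sketch illustrates our reading of Eqs. (9)–(12) for a single BA-GConv layer; it is not the released implementation, and the exact pooling and MLP configurations inside $\tilde{\Lambda}_c$ and $\tilde{\Lambda}_s$ are simplifying assumptions. Only the overall structure (channel attention, spatial weighting, adaptive adjacency, normalised propagation with a residual connection) follows the text.

```python
# A minimal sketch (assumptions noted above) of one BA-GConv layer over the stacked
# 2D- and 3D-level vertex embeddings X_all of shape (N2d + N3d, C).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BAGConv(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.psi = nn.Linear(channels, channels, bias=False)    # ψ(·, Wψ) linear embedding
        self.zeta = nn.Linear(channels, channels, bias=False)   # ζ(·, Wζ) linear embedding
        self.mlp = nn.Sequential(nn.Linear(channels, channels), nn.ReLU(),
                                 nn.Linear(channels, channels))  # MLP with one hidden layer
        self.conv = nn.Conv1d(1, channels, kernel_size=1)        # 1x1 conv on the pooled map
        self.weight = nn.Linear(channels, channels, bias=False)  # W_G

    def forward(self, x_all: torch.Tensor) -> torch.Tensor:
        n, c = x_all.shape

        # Eq. (9): channel-wise attention matrix Λ̃c ∈ R^{C×C} (pooling details assumed).
        e = self.mlp(x_all)                                      # (N, C)
        lam_c = e.t() @ e                                        # (C, C)

        # Eq. (10): spatial weighting Λ̃s ∈ R^{N×N} from channel-pooled positions.
        pooled = x_all.max(dim=1, keepdim=True).values           # (N, 1), max over channels
        s = self.conv(pooled.t().unsqueeze(0)).squeeze(0).t()    # (N, C)
        lam_s = s @ s.t()                                        # (N, N)

        # Eq. (11): adaptive bilateral adjacency Ã, softmax-normalised.
        a_tilde = self.psi(x_all) @ lam_c @ self.psi(x_all).t() \
                  + (self.zeta(x_all) @ self.zeta(x_all).t()) * lam_s
        a_tilde = F.softmax(a_tilde, dim=-1)

        # Eq. (12): normalised propagation L̃ X W_G with a residual connection.
        d_inv_sqrt = a_tilde.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        lap = torch.eye(n, device=x_all.device) \
              - d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
        return F.relu(lap @ self.weight(x_all)) + x_all
```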

Fig. 4.

Fig. 4

Overview of the proposed BA-GCN, Bilateral Adaptive Graph Convolution (BA-GConv) and Bilateral Adaptive Adjacency Matrix (A~).

4. Experiments

4.1. Datasets

In this work, we perform experiments on the three currently largest publicly available COVID-19 CT datasets: CC-CCII (Zhang et al., 2020), MosMed (Morozov et al., 2020) and COVID-CTset (Rahimzadeh et al., 2021). All three datasets are used in PNG format in this work. The total number of slices per CT volume ranges from 16 to 375. They are utilised to evaluate the learning ability and the generalisation ability of our proposed model, respectively. The details of these three datasets w.r.t. the two evaluation settings are shown in Table 1. The CC-CCII (Zhang et al., 2020) dataset contains three classes (NCP, CP and Normal), while the other two datasets contain only two classes (NCP and Normal). We evaluate the learning ability of our proposed model on the CC-CCII (Zhang et al., 2020) dataset. On the other hand, we evaluate the generalisation ability of our proposed model on an external (unseen) test dataset. Firstly, in order to eliminate the effect of imbalanced class distribution, we combine the Normal class of CC-CCII (Zhang et al., 2020) with all of the MosMed (Morozov et al., 2020) dataset as the Train & Val dataset. Then, the COVID-CTset (Rahimzadeh et al., 2021) dataset is treated as the External Test dataset. We show this data setting in the middle of Table 1. We elaborate the details of each dataset below.

Table 1.

Descriptions of the three COVID-19 CT datasets. The cleaned CC-CCII (Zhang et al., 2020), MosMed (Morozov et al., 2020) and COVID-CTset (Rahimzadeh et al., 2021) are the three currently largest publicly available COVID-19 CT datasets. # Patients and # Slices represent the numbers of patients and slices, respectively. Train & Val represents the subset that contains the training and validation data. Note that we randomly select 10% of Train & Val as the validation dataset.

Datasets            Classes   # Patients                # Slices
                              Train & Val   Test        Train & Val   Test
CC-CCII             NCP       414           133         24,255        10,330
                    CP        773           186         59,080        12,509
                    Normal    675           174         50,874        15,266
                    Total     1,862         493         134,209       38,105

MosMed + CC-CCII    NCP       856           95          28,188        15,589
+ COVID-CTset       Normal    929           95          59,439        12,718
                    Total     1,785         190         97,627        28,307
  • CC-CCII (Zhang et al., 2020). The original CC-CCII dataset contains a total of 617,775 slices from 6,752 CT volumes of 4,154 patients. However, it has several problems, such as corrupted data, duplicated and noisy slices, incomplete slices, and non-unified data types; please see Fig. 5 for details. Considerable effort has been made to build a clean dataset for training and evaluation. We manually checked the whole dataset and removed the noisy data (damaged, duplicated and non-unified). Note that we only use complete scans, with one volume scan per patient, to avoid information leakage during training and evaluation. After addressing the above problems, we built a clean CC-CCII dataset, which consists of 172,314 slices of 2,355 scans from 2,355 patients (shown in Table 1). Apart from the issues above, CC-CCII provides pre-segmented CT slices only, without the original CT slices, for part of the dataset. For example, in the clean CC-CCII, 59,256 slices of 740 volumes from 740 patients are pre-segmented, and the remaining 113,058 slices of 1,615 scans from 1,615 patients are not. Our experimental results show that the lung segmentation pre-process is necessary for the task of COVID-19 classification, especially for models trained on the CC-CCII (Zhang et al., 2020) dataset. The details of the potential dataset issue related to the lung segmentation pre-process are discussed in Section 7.1. In addition, some qualitative visualisation results, such as GradCAMs (Selvaraju et al., 2017), are shown in Fig. 10 to demonstrate the importance of the lung segmentation pre-process in this task. To address the non-segmentation problem, we segmented the lungs of the non-segmented slices with our trained model. Compared with the pre-segmented lung slices of CC-CCII (Zhang et al., 2020), our model segments the lung regions more accurately. Qualitative results and comparisons are shown in Fig. 6. As illustrated, our segmentation generates smoother lung boundaries and produces fewer false positive predictions.

  • MosMed (Morozov et al., 2020). The MosMed dataset was collected from March 2020 to April 2020 at outpatient CT centres in Moscow, Russia. The CT scans were performed on Canon (Toshiba) Aquilion 64 units with standard scanner protocols and an 8 mm inter-slice distance. The dataset contains 36,753 slices of 1,110 volumes from 1,110 patients. Specifically, 28,188 slices of 856 volumes are NCP cases, and the remaining 8,565 slices of 254 volumes are Normal. Additionally, 50 CT volumes were annotated for infection regions such as GGO and consolidation. However, the ground truth of lung region segmentation is not provided. We segmented the lung regions of all the slices with our trained segmentation model and used the cropped slices as the clean dataset for the COVID-19 classification task. Please note that the data were provided in NIfTI format by Morozov et al. (2020) and were converted to PNG format, where a window (window centre: -600 HU, window width: 1200 HU) was applied for re-scaling and normalising the pixel values.

  • COVID-CTset (Rahimzadeh et al., 2021). The COVID-CTset dataset was collected from Negin radiology, located at Sari in Iran, between March 5th and April 23rd, 2020. This medical centre uses a SOMATOM Scope scanner and syngo CT VC30easyIQ software for capturing and visualising lung HRCT radiology images from the patients. The dataset contains 63,849 slices of 377 volumes from 377 patients. Specifically, 15,589 slices are from NCP volume scans, and the remaining 48,260 slices are from Normal volume scans. We randomly select 95 out of the 282 Normal volumes to construct a balanced external test dataset. Again, the ground truth of lung region segmentation is not provided. Thus, pre-segmentation is performed with our trained segmentation model to build a clean dataset with cropped slices.

Fig. 5.

Fig. 5

Examples of problematic slices from the original CC-CCII (Zhang et al., 2020) dataset. Those noisy data will inevitably introduce perturbations into both the lung segmentation task and the COVID-19 diagnosis task.

Fig. 10.

Fig. 10

Qualitative comparison of Grad-CAM on the same input with and without pre-segmentation step. Models without pre-segmentation (Ours, w/o Seg) attend to other regions (e.g. scanner bed) rather than the discriminatory parts (e.g. GGO) of the lung regions in the NCP CT images.

4.2. Annotation of COVID-19 CT images

As discussed in Section 2.4, previous methods, such as Wu et al. (2021a) and Goncharov et al. (2021), pre-trained the lung segmentation model on non-COVID datasets (i.e. the cancer nodule segmentation datasets NSCLC (Kiser et al., 2020) and LUNA16 (Team, 2011)) and then applied it to COVID-19 CT scans. The domain gap between different datasets causes a significant performance drop. For example, GGO regions are typical characteristics of NCP cases but are unseen features in the cancer nodule datasets. Thus, their pre-trained models are likely to treat them as background (similar examples are shown in the top left and top right of Fig. 6). To address these challenges and train a robust lung segmentation model, four trained medical students (trainee doctors, after training on the annotation tasks) from the University of Liverpool manually annotated 7,768 slices of NCP, CP, and Normal scans from the CC-CCII (Zhang et al., 2020) dataset. In detail, the boundaries of the left and right lungs were traced via the Labelme (Wada, 2016) annotation tool. Among the annotated 7,768 slices, 6,045 slices of 190 patients are NCP, 1,202 slices of 10 patients are CP, and 521 slices of 10 patients are Normal. In this way, our annotated slices contain NCP, CP and Normal examples, which addresses the domain gap between the train and test datasets.

4.3. Evaluation metrics

Segmentation Metrics Typical segmentation metrics, such as Dice similarity score (Dice), Mean Absolute Error (MAE) and Balanced Accuracy (B-Acc), are applied. 95% confidence intervals were calculated using 2000 sample bootstrapping for Dice, MAE, and B-Acc. Specifically, B-Acc is the mean value of Sensitivity and Specificity; MAE is used to measure the pixel-wise error between the segmentation and ground truth. MAE is defined as:

$MAE = \frac{1}{w \times h} \sum_{x}^{w} \sum_{y}^{h} |S_p(x,y) - S_{gt}(x,y)|$, (13)

where $w$ and $h$ are the width and height of the ground truth $S_{gt}$, $S_p$ is the predicted segmentation, and $(x, y)$ denotes the coordinate of each pixel in $S_{gt}$.
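A short sketch of how these segmentation metrics can be computed on a pair of binary masks is given below; this is an assumed helper, not part of the released code, with $S_p$ the predicted mask and $S_{gt}$ the ground truth as in Eq. (13).

```python
# Assumed helper computing Dice, MAE (Eq. (13)) and balanced accuracy on binary masks.
import numpy as np


def segmentation_metrics(s_p: np.ndarray, s_gt: np.ndarray, eps: float = 1e-8):
    s_p, s_gt = s_p.astype(bool), s_gt.astype(bool)
    tp = np.logical_and(s_p, s_gt).sum()
    tn = np.logical_and(~s_p, ~s_gt).sum()
    fp = np.logical_and(s_p, ~s_gt).sum()
    fn = np.logical_and(~s_p, s_gt).sum()

    dice = 2 * tp / (2 * tp + fp + fn + eps)
    mae = np.abs(s_p.astype(float) - s_gt.astype(float)).mean()   # Eq. (13), pixel-wise error
    sensitivity = tp / (tp + fn + eps)
    specificity = tn / (tn + fp + eps)
    b_acc = (sensitivity + specificity) / 2                        # mean of sensitivity and specificity
    return dice, mae, b_acc
```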

Classification Metrics Typical classification metrics, such as Sensitivity, Specificity, F1 score (F1), Precision, Receiver Operating Characteristic (ROC) Curves, and Area Under the ROC Curve (AUROC), are used for the evaluation of classification. In particular, F1 is introduced to eliminate the interference of data imbalance. 95% confidence intervals were calculated using DeLong's method (DeLong et al., 1988) for AUROC and using 2,000-sample bootstrapping for Sensitivity, Specificity, F1 and Precision.
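A minimal sketch of the 2,000-sample bootstrap used for the non-AUROC confidence intervals might look as follows; the F1 variant is shown, and the resampling details are assumptions about implementation choices not stated in the text.

```python
# A sketch (assumption, not the authors' script) of bootstrap 95% confidence intervals.
import numpy as np
from sklearn.metrics import f1_score


def bootstrap_ci(y_true, y_pred, n_boot: int = 2000, seed: int = 0):
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample cases with replacement
        if np.unique(y_true[idx]).size < 2:               # skip degenerate resamples
            continue
        scores.append(f1_score(y_true[idx], y_pred[idx], average="macro"))
    return np.percentile(scores, 2.5), np.percentile(scores, 97.5)  # 95% CI bounds
```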

4.4. Experimental details

In this section, we describe the experimental implementation details for the lung segmentation and COVID-19 classification tasks, respectively. All the training processes are performed on an Amazon Web Services p3.8xlarge node with four Tesla V100 16GiB GPUs and our workstation with four GEFORCE RTX 3090 24GiB GPUs. All the test experiments are conducted on a local workstation with an Intel(R) Xeon(R) W-2104 CPU and a Geforce RTX 2080Ti GPU. Notably, we have conducted extensive experiments to evaluate the sensitivity of the hyper-parameters, where γ has been set at 0.1, 0.05, 0.02, 0.01 and 0.005, and T has been set at 2, 4, 6, 8 and 10. We found no significant difference in diagnostic performance with a paired t-test (p > 0.05) in the two evaluation settings, which indicates that our model is robust to these hyper-parameters. Thus, we empirically set γ and T at 0.02 and 8, respectively.

4.4.1. Lung segmentation

Implementation Details The original slice images are resized from 512 × 512 to 224 × 224, using bilinear interpolation for CT slices and nearest-neighbour interpolation for binary annotation masks. To augment the dataset, we randomly rotate and horizontally flip the training data with a probability of 0.3; the rotation ranges from -30 to 30 degrees. In addition, a random crop of size 112 × 112 is applied to both the input image and the ground truth during training. Among all of our annotated data, 60% are randomly selected as the Train dataset, 10% as the Val dataset and 30% as the Test dataset. The network is trained end-to-end with an Adam optimiser (Kingma and Ba, 2014) for around 400 epochs, with an initial learning rate of 0.01 and a cosine decay schedule (Loshchilov and Hutter, 2017). The batch size is set at 126. We adopt the standard Dice Loss (Milletari et al., 2016) for training the lung segmentation model.
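As a sketch, the loss and optimiser configuration described above could be set up as follows; the stand-in network and the Dice-loss formulation are our assumptions, while the optimiser, learning rate, schedule and epoch count follow the text.

```python
# A sketch of the segmentation training configuration (assumptions noted above).
import torch
import torch.nn as nn


class DiceLoss(nn.Module):
    def forward(self, logits, target, eps: float = 1e-6):
        prob = torch.sigmoid(logits)                        # (B, 1, H, W) foreground probability
        inter = (prob * target).sum(dim=(1, 2, 3))
        union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        return 1 - ((2 * inter + eps) / (union + eps)).mean()


model = nn.Conv2d(1, 1, kernel_size=3, padding=1)           # stand-in for the 2D segmentation network
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)   # Adam, initial learning rate 0.01
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimiser, T_max=400)  # cosine decay, ~400 epochs
criterion = DiceLoss()
```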

4.4.2. COVID-19 classification

Implementation Details. The input image size is 224 × 224 after lung segmentation. Similarly, to augment the dataset, we randomly rotate and horizontally and vertically flip the training data with a probability of 0.3; the rotation ranges from -30 to 30 degrees. 10% of the Train & Val dataset is randomly selected as the validation dataset. The network is trained end-to-end for 400 epochs, with an initial learning rate of 1e-4 and a cosine decay schedule (Loshchilov and Hutter, 2017). The optimiser is Adam (Kingma and Ba, 2014), and the batch size is set at 48 and 36 for 2D and 3D COVID-19 diagnosis training, respectively. We adopt the standard Cross Entropy Loss for both 2D and 3D COVID-19 classification.

Table 2.

Quantitative segmentation results of the lung regions on CT slices. The performance is reported as Dice (%), B-Acc (%) and MAE (%). 95% confidence intervals are presented in brackets. We performed experiments with classic segmentation methods such as U-Net (Ronneberger et al., 2015) and U-Net++ (Zhou et al., 2019), and cutting-edge methods such as PraNet (Fan et al., 2020a), RBA-Net (Meng et al., 2020a), CABNet (Meng et al., 2020b), GRB-GCN (Meng et al., 2021c) and BI-GConv (Meng et al., 2021a). Notably, we sampled 120 vertices for CABNet (Meng et al., 2020b) and RBA-Net (Meng et al., 2020a) to construct a smooth boundary.

Methods     Dice (%)↑            B-Acc (%)↑           MAE (%)↓
U-Net       95.7 (93.2, 97.6)    96.9 (95.0, 98.4)    1.49 (1.12, 1.68)
U-Net++     94.1 (92.2, 96.0)    95.0 (93.2, 97.5)    1.98 (1.56, 2.23)
PraNet      95.2 (94.0, 96.6)    96.0 (95.1, 98.0)    1.55 (1.38, 1.68)
RBA-Net     96.2 (95.2, 98.0)    96.8 (95.9, 98.0)    1.45 (1.29, 1.56)
CABNet      95.4 (93.8, 96.7)    96.4 (94.7, 98.1)    1.60 (1.42, 1.78)
GRB-GCN     96.6 (94.9, 97.9)    96.7 (95.8, 97.9)    1.50 (1.32, 1.68)
BI-GConv    96.3 (94.8, 98.0)    96.5 (94.7, 98.2)    1.52 (1.34, 1.69)

5. Results

5.1. Lung segmentation

Fig. 6 shows the qualitative lung segmentation result of the pre-segmented slices (provided by CC-CCII) and our segmentation results on CC-CCII.

Table 2 shows the quantitative results of classic segmentation models, such as U-Net (Ronneberger et al., 2015) and U-Net++ (Zhou et al., 2019), and cutting-edge methods such as PraNet (Fan et al., 2020a), RBA-Net (Meng et al., 2020a), CABNet (Meng et al., 2020b), GRB-GCN (Meng et al., 2021c) and BI-GConv (Meng et al., 2021a). There are no significant differences between these models. Among them, GRB-GCN (Meng et al., 2021c) achieves the best Dice of 96.6%, outperforming U-Net (Ronneberger et al., 2015) and U-Net++ (Zhou et al., 2019) by 0.9% and 2.7%, respectively.

5.2. COVID-19 diagnosis

This section provides the classification results in two evaluation settings with the pre-segmented COVID-19 CT data. Firstly, we train, validate and test our model on the CC-CCII dataset (seen data) only, which has three classes: Normal, NCP, and CP. In this way, the learning ability of our model can be illustrated on the seen data. Secondly, in order to address the unbalanced class issue of MosMed, we combine the Normal class data from CC-CCII and all of the data from MosMed to train and validate our model, while testing on COVID-CTset (unseen data). There are two classes in this setting: Normal and NCP. In this way, we demonstrate the generalisation ability of our model on the unseen data. Generalisability is essential for the real-world COVID-19 diagnosis task because of the different data domains w.r.t. scanning machine types, protocol standards and data sources. The details of the data settings in these two schemes can be found in Table 1. The quantitative comparison results on the respective test datasets of the two evaluation settings are shown in Table 3, where previous 3D CT based COVID-19 diagnosis methods, such as CCT-Net (Goncharov et al., 2021), C19C-Net (Bai et al., 2020b), COVNet (Li et al., 2020b), DeCoVNet (Wang et al., 2020b) and ASCo-MIL (Han et al., 2020), are presented. Notably, their results are reproduced by using their open-source code, and the experiments are conducted under the same settings as Ours, with our pre-segmented lung CT images.

Table 3.

Quantitative comparisons between Ours and previous 3D CT based COVID-19 diagnosis methods, such as CCT-Net (Goncharov et al., 2021), C19C-Net (Bai et al., 2020b), COVNet (Li et al., 2020b), DeCoVNet (Wang et al., 2020b) and ASCo-MIL (Han et al., 2020). The performance is reported as F1 (%), Precision (%), Specificity (%), Sensitivity (%) and AUROC (%). 95% confidence intervals are presented in brackets.

Learning ability
Methods      F1 (%)↑            Precision (%)↑     Specificity (%)↑   Sensitivity (%)↑   AUROC (%)↑
CCT-Net      76.8 (72.9, 80.6)  83.5 (81.0, 86.1)  84.4 (81.2, 87.4)  78.1 (74.6, 81.5)  96.1 (94.4, 97.1)
C19C-Net     66.2 (61.5, 70.8)  71.2 (66.2, 75.9)  80.0 (76.8, 82.9)  70.2 (66.3, 74.0)  86.7 (83.9, 88.4)
COVNet       59.6 (54.9, 64.6)  73.7 (64.5, 79.4)  75.5 (71.5, 78.7)  68.0 (63.9, 72.0)  87.5 (84.8, 89.3)
DeCoVNet     91.2 (88.5, 93.7)  91.6 (89.1, 94.0)  95.0 (93.4, 96.5)  91.3 (88.6, 93.7)  97.5 (96.7, 98.6)
ASCo-MIL     76.5 (72.5, 80.6)  79.6 (76.1, 82.9)  86.2 (83.8, 88.4)  77.9 (74.2, 81.5)  91.2 (88.9, 93.0)
Ours         94.9 (93.0, 96.8)  95.1 (93.3, 96.9)  97.1 (95.9, 98.2)  94.9 (93.1, 96.8)  98.7 (97.6, 99.4)

Generalisation ability
Methods      F1 (%)↑            Precision (%)↑     Specificity (%)↑   Sensitivity (%)↑   AUROC (%)↑
CCT-Net      71.6 (62.8, 79.2)  86.6 (77.8, 94.2)  90.5 (84.2, 96.0)  61.1 (50.5, 71.3)  85.9 (80.1, 90.7)
C19C-Net     70.4 (63.5, 76.2)  56.8 (48.7, 64.3)  29.5 (20.2, 38.8)  92.6 (87.4, 97.8)  80.0 (73.2, 86.0)
COVNet       33.6 (22.4, 43.7)  70.0 (52.0, 85.7)  90.5 (84.0, 96.0)  22.1 (14.1, 30.6)  71.5 (63.7, 78.6)
DeCoVNet     68.8 (59.6, 76.2)  87.1 (78.2, 94.9)  91.6 (85.7, 96.7)  56.8 (46.5, 67.0)  85.1 (79.2, 90.2)
ASCo-MIL     60.7 (50.7, 69.7)  88.0 (78.4, 96.1)  93.7 (88.5, 97.9)  46.3 (36.4, 56.6)  82.1 (75.9, 87.8)
Ours         88.0 (82.3, 92.7)  96.3 (91.5, 100.0) 96.8 (92.9, 100.0) 81.1 (72.9, 88.7)  91.8 (84.6, 93.3)

5.2.1. Learning ability

Table 3 shows the quantitative comparison in terms of learning ability between Ours and previous 3D CT based COVID-19 diagnosis methods on the CC-CCII dataset. Ours obtains an average F1 of 94.9%, which outperforms the pooled 2D slice feature based methods CCT-Net (Goncharov et al., 2021), C19C-Net (Bai et al., 2020b) and COVNet (Li et al., 2020b) by 23.6%, 43.4% and 59.2%, respectively. In addition, Ours outperforms the 3D-level CNN based approach DeCoVNet (Wang et al., 2020b) by 4.1% and the attention score based MIL method ASCo-MIL (Han et al., 2020) by 24.1%. Fig. 7 shows the ROC Curve comparison between the aforementioned methods; Ours achieves the best AUROC of 98.7%. Notably, the macro-averaged performance (i.e. the unweighted mean of per-class performance) of the three classes, computed in the one-vs-rest setting, is presented in Table 3 and Fig. 7 for the learning ability evaluation.
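For clarity, the macro-averaged one-vs-rest evaluation for the three-class setting can be reproduced with standard tooling as sketched below; the toy labels and probabilities are placeholders, not our data.

```python
# A sketch of macro-averaged, one-vs-rest metrics for the three-class learning-ability setting.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

y_true = np.array([0, 1, 2, 2, 1, 0])                              # Normal / NCP / CP (toy labels)
y_prob = np.random.default_rng(0).dirichlet(np.ones(3), size=6)    # per-class probabilities

auroc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
f1 = f1_score(y_true, y_prob.argmax(axis=1), average="macro")      # unweighted mean of per-class F1
print(f"macro AUROC = {auroc:.3f}, macro F1 = {f1:.3f}")
```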

Fig. 7.

Fig. 7

ROC Curve comparisons between Ours and previous 3D CT based COVID-19 diagnosis methods, such as CCT-Net (Goncharov et al., 2021), C19C-Net (Bai et al., 2020b), COVNet (Li et al., 2020b), DeCoVNet (Wang et al., 2020b) and ASCo-MIL (Han et al., 2020). The two evaluation settings of Learning Ability and Generalisation Ability are presented.

5.2.2. Generalisation ability

To evaluate the generalisation ability of the proposed model, we compare Ours with previous 3D CT based COVID-19 diagnosis approaches on external test data (unseen data). The generalisation ability part of Table 3 shows the quantitative results. Ours achieves the best F1 of 88.0%, outperforming the cutting-edge COVID-19 diagnosis methods DeCoVNet (Wang et al., 2020b) and ASCo-MIL (Han et al., 2020) by 27.9% and 45.0%, respectively. Fig. 7 shows the ROC Curve comparison; Ours achieves the best AUROC of 91.8%.

5.2.3. Attention heat maps visualisation

Fig. 8 shows the attention heat maps generated using gradient-weighted class activation mapping (Grad-CAM) (Selvaraju et al., 2017). Specifically, Grad-CAM results on different slices of different NCP patients are presented in each row of the figure. We compare Ours with previous methods, such as C19C-Net (Bai et al., 2020b), COVNet (Li et al., 2020b), ASCo-MIL (Han et al., 2020) and CCT-Net (Goncharov et al., 2021), presented in each column. Ours has a more accurate and comprehensive activation area that covers more diagnostic characteristics, such as GGO, multi-focal patchy consolidation and bilateral patchy shadows, which are highlighted within red bounding boxes in the figure. Notably, all the compared methods in Fig. 8 adopted at least the same D slices as ours to make the inference and prediction. Specifically, C19C-Net (Bai et al., 2020b) and COVNet (Li et al., 2020b) used the same selected D slices, which is also aligned with their original implementations. ASCo-MIL (Han et al., 2020) and DeCoVNet (Wang et al., 2020b) used all of the slices in a CT scan to make the inference, which thus include the selected D slices.
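A minimal Grad-CAM sketch (following Selvaraju et al., 2017, rather than our exact visualisation script) is given below: it hooks a chosen convolutional layer, weights each channel of its feature map by the average gradient for the target class, and up-samples the ReLU-ed sum to the input resolution. The model and layer handles are assumed to be supplied by the caller.

```python
# A sketch of Grad-CAM heat-map generation via forward/backward hooks.
import torch
import torch.nn.functional as F


def grad_cam(model, conv_layer, image, class_idx):
    feats, grads = {}, {}
    h_fwd = conv_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h_bwd = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

    score = model(image)[0, class_idx]        # class score for a (1, C, H, W) input
    model.zero_grad()
    score.backward()
    h_fwd.remove()
    h_bwd.remove()

    weights = grads["a"].mean(dim=(2, 3), keepdim=True)          # channel-wise gradient average
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))  # weighted sum over channels
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-8)                               # normalise to [0, 1]
```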

Fig. 8.

Fig. 8

Qualitative comparisons between Ours, C19C-Net (Bai et al., 2020b), COVNet (Li et al., 2020b), ASCo-MIL (Han et al., 2020) and DeCoVNet (Wang et al., 2020b). Specifically, attention heat map visualisations of Grad-CAM on NCP patients are presented in each row. Ours has a more precise and comprehensive activation area that encompasses more diagnostic characteristics, including GGO, multi-focal patchy consolidation and bilateral patchy shadows.

Table 4.

Computational efficiency. Model size, FLOPs, and inference time of different 3D CT based COVID-19 diagnosis methods on a 224 × 224 × D input volume.

Metric               CCT-Net   C19C-Net   COVNet   DeCoVNet   Ours
Params (M)           24.8      23.8       23.5     0.35       15.0
FLOPs (G)            67.1      39.0       65.8     28.9       35.0
Inference Time (s)   1.2       1.2        1.1      1.1        1.1

5.2.4. Computational efficiency

Table 4 presents the number of parameters (M), floating-point operations (FLOPs) and inference time (s) of the compared models. Notably, ignoring the slice selection process of the first stage, we represent the proposed BA-GCN as Ours when comparing with the other methods in Table 4. Ours adopts the light-weight backbone network MF-Net to extract the 3D-level features, which leads to a relatively small model size of 15.0 M parameters.

6. Ablation study

We conduct thorough ablation studies, and all the results demonstrate our model’s effectiveness. As an illustration, the ablation results for the lung segmentation and model components are elaborated as follows.

Table 5.

Ablation study of lung segmentation on CCT-Net (Goncharov et al., 2021), DeCoVNet (Wang et al., 2020b) and Ours. w/o Seg represents without lung segmentation pre-process; w/ Our seg represents adopting our fully supervised lung segmentation method. The performance is reported as F1 (%), AUROC (%). 95% confidence intervals are presented in brackets, respectively.

Methods                  Learning ability                          Generalisation ability
                         F1 (%)↑            AUROC (%)↑             F1 (%)↑            AUROC (%)↑
CCT-Net, w/o Seg         82.3 (80.0, 84.6)  97.2 (95.2, 98.5)      51.0 (48.7, 54.0)  65.3 (63.2, 68.1)
CCT-Net                  75.0 (73.0, 78.1)  95.1 (92.5, 97.7)      69.0 (66.9, 72.1)  83.8 (81.1, 85.9)
CCT-Net, w/ Our seg      76.8 (72.9, 80.6)  96.1 (94.4, 97.1)      71.6 (62.8, 79.2)  85.9 (80.1, 90.7)

DeCoVNet, w/o Seg        93.9 (91.0, 95.5)  99.2 (97.1, 99.8)      57.3 (55.8, 60.2)  76.5 (74.1, 78.8)
DeCoVNet                 88.7 (86.0, 90.3)  95.4 (93.2, 97.7)      66.7 (64.4, 68.9)  82.0 (80.0, 84.7)
DeCoVNet, w/ Our seg     91.2 (88.5, 93.7)  97.5 (96.7, 98.6)      68.8 (59.6, 76.2)  85.1 (79.2, 90.2)

Ours, w/o Seg            96.8 (94.7, 98.9)  99.4 (98.1, 99.7)      71.3 (69.2, 73.5)  84.3 (82.0, 86.6)
Ours                     94.9 (93.0, 96.8)  98.7 (97.6, 99.4)      88.0 (82.3, 92.7)  91.8 (84.6, 93.3)

6.1. Need of lung segmentation pre-process

Lung segmentation is an essential pre-processing step in this task. Please note that the original CCT-Net (Goncharov et al., 2021) adopted a lung segmentation model pre-trained on other (non-COVID) CT datasets, and the original DeCoVNet (Wang et al., 2020b) used an unsupervised approach to segment the lung regions as the pre-process for the subsequent classification task. In this experiment, we used our pre-segmented lung CT images (w/ Our seg) to provide more accurately cropped lung regions for their methods. Table 5 shows that w/ Our seg boosts their original classification F1 by 2.4% and 2.8% in the Learning Ability setting and by 3.8% and 3.1% in the Generalisation Ability setting, respectively. This demonstrates the importance and benefits of our fully supervised lung segmentation model in the task of 3D CT based COVID-19 classification. Additionally, Table 5 shows that the three methods without lung pre-segmentation (w/o Seg) produce better classification performance on the non-segmented CT data under the Learning Ability setting than their counterparts with segmentation (w/ Our seg). However, the qualitative results (Fig. 10) show that such models, trained on non-segmented data, only learn specific format patterns of the different classes rather than the real radiographic diagnosis characteristics (i.e. GGO for NCP), because of the specific scanning machine types, protocol standards and data sources of the different classes. Moreover, because the Learning Ability setting tests on seen data, such specific format patterns also exist in the test dataset, which helps the models achieve 'excellent' classification results without learning the real diagnosis features.

In contrast, under the Generalisation Ability setting, the methods w/o Seg produce poor classification performance, because an external test dataset (unseen data) is introduced to evaluate the trained model, in which the aforementioned specific format patterns do not exist. This further demonstrates the importance of pre-segmentation, generalisation ability and an external test dataset (unseen data) in this task. More visualisation comparisons and discussions related to this challenge are given in Section 7.1 and Fig. 10.

6.2. Model components

This section presents the results of our ablation study on the model components. We evaluate the effectiveness of the proposed UC-MIL and BA-GCN modules and present the quantitative results in Table 6. Firstly, we remove the BA-GCN and keep the rest of our model, reported as Ours w/o BA-GCN in the table. Secondly, we replace UC-MIL with random and symmetrical slice sampling rules to select a fixed number of slices for each CT scan, in the same manner as He et al. (2021a). In these two cases, the proposed bilateral graph model becomes a unilateral graph model, because no 2D feature information is included in the vertex features. Specifically, for both evaluation settings (Learning and Generalisation Abilities), BA-GCN helps our model gain an average 9.4% performance boost w.r.t. F1 and 4.3% w.r.t. AUROC; UC-MIL outperforms the handcrafted slice sampling rules, i.e. random and symmetrical, by 12.5% and 11.9% F1 on average, respectively.
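For reference, the two handcrafted slice-sampling baselines can be sketched as below; the exact index arithmetic used by He et al. (2021a) may differ, so this is an illustrative assumption only.

```python
# A sketch (illustrative assumption) of the handcrafted slice-sampling baselines in Table 6.
import numpy as np


def sample_slices(num_slices: int, d: int, rule: str = "symmetrical", seed: int = 0):
    """Return D slice indices from a CT volume containing `num_slices` slices."""
    d = min(d, num_slices)
    if rule == "random":
        rng = np.random.default_rng(seed)
        return np.sort(rng.choice(num_slices, size=d, replace=False))  # random draw without replacement
    # "symmetrical": evenly spaced indices centred on the volume
    return np.linspace(0, num_slices - 1, num=d).round().astype(int)
```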

Table 6.

Ablation study on the effectiveness of the proposed UC-MIL and BA-GCN. The performance is reported as F1 (%), AUROC (%). 95% confidence intervals are presented in brackets, respectively.

Methods                          Learning ability                          Generalisation ability
                                 F1 (%)↑            AUROC (%)↑             F1 (%)↑            AUROC (%)↑
Ours w/o BA-GCN                  82.6 (79.0, 86.0)  94.6 (92.9, 96.2)      81.0 (73.9, 87.2)  83.9 (77.1, 89.8)
Ours w/o UC-MIL (random)         91.5 (89.1, 93.3)  97.4 (95.5, 98.8)      72.6 (70.9, 74.1)  87.0 (85.2, 88.7)
Ours w/o UC-MIL (symmetrical)    91.9 (90.1, 93.3)  97.9 (95.8, 98.8)      73.1 (71.9, 75.2)  86.8 (84.8, 88.0)
Ours                             94.9 (93.0, 96.8)  98.7 (97.6, 99.4)      88.0 (82.3, 92.7)  91.8 (84.6, 93.3)

Additionally, we conduct extensive experiments to evaluate the effectiveness of the proposed components inside the UC-MIL and BA-GCN modules, such as the backbones, the Uncertainty-Aware mechanism, the Consensus-Assisted mechanism and the BA-GConv layers. The experimental results, which confirm their effectiveness, are elaborated as follows.

6.2.1. UC-MIL

Backbone Network We conduct experiments to evaluate the effectiveness of different backbone models in the proposed UC-MIL. We adopt several classic 2D classification backbones, such as ResNet (He et al., 2016) variants (e.g. 18, 34, 50, 101), and cutting-edge classification backbones such as ResWide (Zagoruyko and Komodakis, 2016) variants (e.g. 50, 101), ResNeXt (Xie et al., 2017) variants (e.g. 50, 101), EfficientNet (Tan and Le, 2019) series (e.g. B0, B3, B5, B7) and Res2Net (Gao et al., 2019) variants (e.g. 50, 101). For each model’s variants, we present the best performance in Table 7 for an intuitive comparison. Ours achieves the best performance of 94.9% and 88.0% F1 with ResNeXt50 and ResNet18 as the backbone in Learning Ability and Generalisation Ability settings, respectively.

Table 7.

Ablation study on the effectiveness of the UC-MIL's backbone networks and the proposed Uncertainty-aware Consensus-assisted mechanism. Specifically, we replace the proposed UC-MIL with two classic MIL methods: Campanella et al. (2019) (w/ Instance-based) and Ilse et al. (2018) (w/ Embedding-based). The performance is reported as F1 (%) and AUROC (%). 95% confidence intervals are presented in brackets.

Methods                Learning ability                          Generalisation ability
                       F1 (%)↑            AUROC (%)↑             F1 (%)↑            AUROC (%)↑
Backbone
w/ ResNet34            93.2 (90.9, 95.5)  98.0 (96.1, 99.1)      86.8 (84.7, 88.1)  90.5 (88.7, 92.0)
w/ ResWide50           93.3 (91.7, 95.0)  97.7 (95.4, 98.9)      86.0 (84.2, 88.1)  90.2 (88.4, 92.3)
w/ EfficientNetB3      90.2 (88.1, 92.3)  96.0 (93.9, 98.0)      84.6 (82.7, 86.1)  88.1 (86.5, 89.7)
w/ Res2Net50           91.7 (89.9, 93.2)  96.8 (94.7, 98.0)      85.0 (83.1, 87.2)  88.7 (86.6, 89.9)
Ours                   94.9 (93.0, 96.8)  98.7 (97.6, 99.4)      88.0 (82.3, 92.7)  91.8 (84.6, 93.3)

Component
w/o Uncertainty        92.2 (90.1, 94.1)  97.0 (95.8, 98.1)      86.2 (94.9, 88.3)  90.2 (88.1, 92.1)
w/o Consensus          92.2 (90.4, 94.6)  97.7 (95.1, 98.6)      85.8 (83.3, 87.0)  89.4 (87.2, 90.5)
w/ Instance-based      88.7 (86.0, 90.1)  95.5 (93.3, 96.8)      81.5 (80.0, 83.1)  85.7 (83.3, 87.1)
w/ Embedding-based     89.9 (87.4, 91.2)  95.9 (93.3, 97.6)      83.0 (81.0, 85.2)  87.0 (85.1, 88.9)
Ours                   94.9 (93.0, 96.8)  98.7 (97.6, 99.4)      88.0 (82.3, 92.7)  91.8 (84.6, 93.3)

Uncertainty & Consensus Mechanism We evaluate the effectiveness of the proposed Uncertainty-aware and Consensus-assisted mechanisms, respectively. In detail, we remove each of them in turn and keep the rest of the model unchanged, represented as w/o Uncertainty and w/o Consensus in Table 7. As a result, the reliable slice selection process then relies on the ranked order of the consensus-assisted instance probability ($P_{x^{r}_{i,j}}(C)$, $x^{r}_{i,j} \in \Omega$) or the ranked order of the uncertainty-aware instance probability ($P_{\tilde{I}_{x_{i,j}}}(C)$), respectively. Specifically, the Uncertainty-aware and Consensus-assisted modules boost the F1 performance by 2.9% and 2.9% on Learning Ability and by 2.0% and 2.6% on Generalisation Ability, respectively.
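Conceptually, the ablated selection rules reduce to ranking slices by a single score, as sketched below; how the full UC-MIL combines the two signals is simplified here (a product is assumed purely for illustration and is not the method's actual formulation).

```python
# A simplified sketch of the ablated slice-selection rules (combination assumed).
import numpy as np


def select_top_d(consensus_prob, uncertainty_prob, d: int, mode: str = "full"):
    consensus_prob = np.asarray(consensus_prob)       # consensus-assisted instance probability
    uncertainty_prob = np.asarray(uncertainty_prob)   # uncertainty-aware instance probability
    if mode == "wo_uncertainty":      # w/o Uncertainty: rank by the consensus signal only
        score = consensus_prob
    elif mode == "wo_consensus":      # w/o Consensus: rank by the uncertainty-aware signal only
        score = uncertainty_prob
    else:                             # full model (combination assumed for illustration)
        score = consensus_prob * uncertainty_prob
    return np.argsort(score)[::-1][:d]   # indices of the D most reliable slices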

Multiple Instance Learning To further verify the usefulness of the proposed UC-MIL, we replace it with two classic MIL methods, Campanella et al. (2019) (w/ Instance-based) and Ilse et al. (2018) (w/ Embedding-based), as shown in Table 7. Notably, w/ Instance-based can be seen as our UC-MIL but without the Uncertainty and Consensus mechanisms. As for w/ Embedding-based (Ilse et al., 2018), as discussed in Section 2.1, all previous 3D CT based COVID-19 diagnosis methods (Li et al., 2021b, Chikontwe et al., 2021, Han et al., 2020) adopted its attention scoring system. Specifically, we adopted the same backbone framework as Ours, but with the trainable attention score-based pooling mechanism from Ilse et al. (2018). In detail, two fully-connected layers with Softmax as the activation function are applied to learn a weighted average of instances (low-dimensional embeddings). We trained these two models (Campanella et al., 2019, Ilse et al., 2018) with all of the training CT slices/instances under the same experimental settings as ours. In Table 7, Ours outperforms w/ Instance-based and w/ Embedding-based by an average of 7.5% and 5.8% F1 across both evaluation settings. Notably, for w/ Instance-based (Campanella et al., 2019), we selected the top D instances (CT slices) according to the ranking of the predicted instance probabilities, which is straightforward to implement and has been adopted by previous MIL methods (Campanella et al., 2019, Su et al., 2022, Kraus et al., 2016). On the other hand, for w/ Embedding-based (Ilse et al., 2018), we used the ranked attention weights to select the corresponding top D instances, similar to previous methods (Ilse et al., 2018, Li et al., 2021a, Shao et al., 2021). The selected top D instances were then used as the 3D input for our proposed BA-GCN.
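The two baselines can be summarised with the following sketch (not the original implementations): instance-based MIL keeps the top D slices by predicted probability, while embedding-based MIL learns attention weights over slice embeddings; the hidden size and the tanh gating in the attention scorer are assumptions.

```python
# A sketch of the two classic MIL baselines compared in Table 7.
import torch
import torch.nn as nn


def instance_based_top_d(slice_probs: torch.Tensor, d: int) -> torch.Tensor:
    # Instance-based MIL (Campanella et al., 2019 style): rank per-slice probabilities
    # and keep the top D instances.
    return torch.topk(slice_probs, k=min(d, slice_probs.numel())).indices


class AttentionMILPooling(nn.Module):
    # Embedding-based MIL (Ilse et al., 2018 style): learn attention weights over
    # slice embeddings and aggregate them into a bag representation.
    def __init__(self, embed_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.Tanh(),
                                   nn.Linear(hidden_dim, 1))

    def forward(self, embeddings: torch.Tensor):
        # embeddings: (num_slices, embed_dim) low-dimensional slice embeddings.
        attn = torch.softmax(self.score(embeddings), dim=0)   # (num_slices, 1)
        bag = (attn * embeddings).sum(dim=0)                  # weighted average of instances
        return bag, attn.squeeze(-1)                          # weights can also rank slices
```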

As we have noted, previous instance-level MIL methods yield promising classification results in the seen data (i.e. the evaluation of Learning Ability), but poor on unseen data (i.e. the evaluation of Generalisation Ability). On the other hand, Ours can achieve more consistent results on the unseen data, with the benefit of the proposed uncertainty-aware and consensus-assisted mechanisms.

Table 8.

Ablation study on the effectiveness of the BA-GCN's backbone networks and the proposed Bilateral Adaptive Graph Convolution. Specifically, we replace the proposed BA-GConv layer with three cutting-edge graph reasoning based classification layers: SGR (Liang et al., 2018), DualGCN (Zhang et al., 2019) and GloRe (Chen et al., 2019). The performance is reported as F1 (%) and AUROC (%). 95% confidence intervals are presented in brackets.

Methods                  Learning ability                          Generalisation ability
                         F1 (%)↑            AUROC (%)↑             F1 (%)↑            AUROC (%)↑
Backbone
w/ 3D-ResNet50           93.2 (91.4, 95.1)  98.0 (96.3, 98.9)      87.1 (86.2, 88.7)  91.0 (89.7, 92.8)
w/ 3D-ResNeXt50          93.2 (91.7, 95.5)  97.7 (95.8, 98.8)      87.3 (85.8, 89.9)  91.2 (89.7, 93.0)
w/ 3D-EfficientNetB0     90.7 (87.1, 91.3)  95.5 (92.8, 97.0)      85.2 (82.5, 84.0)  89.9 (87.0, 91.2)
Ours (w/ MF-Net)         94.9 (93.0, 96.8)  98.7 (97.6, 99.4)      88.0 (82.3, 92.7)  91.8 (84.6, 93.3)

Component
w/ SGR                   90.3 (88.2, 92.8)  95.5 (93.0, 97.7)      85.7 (83.6, 87.6)  86.0 (83.9, 88.0)
w/ DualGCN               91.1 (89.4, 92.9)  96.0 (94.1, 97.8)      85.9 (83.8, 87.1)  86.3 (84.9, 87.7)
w/ GloRe                 90.8 (88.5, 92.0)  95.9 (93.1, 97.0)      86.1 (84.2, 88.2)  86.6 (84.1, 88.0)
Ours                     94.9 (93.0, 96.8)  98.7 (97.6, 99.4)      88.0 (82.3, 92.7)  91.8 (84.6, 93.3)

6.2.2. Bilateral adaptive graph convolution network

Backbone Network. We conduct experiments to evaluate the effectiveness of different backbone models in the proposed BA-GCN. We adopt several classic classification backbones, such as 3D-ResNet (He et al., 2016) variants (e.g. 18, 34, 50), and cutting-edge classification backbones such as MF-Net (Chen et al., 2018), 3D-EfficientNet (Tan and Le, 2019) variants (e.g. B0, B3, B5) and 3D-ResNeXt (Xie et al., 2017) variants (e.g. 50, 101). For each model’s variants, we present the best performance in Table 8 for an intuitive comparison. Ours achieves the best performance of 94.9% and 88.0% F1 with MF-Net as the backbone in Learning Ability and Generalisation Ability settings, respectively.

Graph Convolution To further verify the usefulness of the proposed BA-GCN, we replace it with three cutting-edge graph-based reasoning methods, SGR (Liang et al., 2018), DualGCN (Zhang et al., 2019) and GloRe (Chen et al., 2019), as shown in Table 8. In detail, we retain the same input vertices ($X_{all}$) and replace the proposed BA-GConv layer with their corresponding graph convolution layers, where SGR makes use of a knowledge graph mechanism, DualGCN investigates coordinate-space and feature-space graph convolution, and GloRe uses projection and re-projection mechanisms to reason about the relationships of different regions. In this way, the compared GCNs also consider both the 2D and 3D levels of information from the input vertices.

Table 8 shows that Ours achieves more accurate and reliable results, outperforming SGR, DualGCN and GloRe by an average of 3.9%, 2.9% and 3.4% F1 across both evaluation settings.

7. Discussion

7.1. Hidden challenges of the COVID-19 dataset

CC-CCII (Zhang et al., 2020) is currently the largest publicly available 3D CT dataset for COVID-19 diagnosis, with patients' CT scans of the NCP, CP and Normal classes. Many previous methods (He et al., 2021a, Wu et al., 2021a, Tan and Liu, 2021, Hou et al., 2021) reported evaluation results on it but rarely discussed the importance of the pre-segmentation process. In Table 5, we report better quantitative results when training the proposed model without the pre-segmentation process (Ours, w/o Seg) than with it (Ours). Similar circumstances are also observed for previous methods in Table 5, such as CCT-Net, w/o Seg and DeCoVNet, w/o Seg. However, the models trained without pre-segmentation may only learn a specific format pattern of the different classes, rather than the true radiographic diagnosis characteristics (i.e. GGO for NCP), because of the specific scanning machine types, protocol standards and data sources for the different classes in the dataset. For example, Fig. 9 shows ten randomly selected CT slices of different patients from the Normal and NCP classes; the model can easily learn the difference between the specific scanner bed parts of the different classes (highlighted with red bounding boxes).

Fig. 9.

Fig. 9

CT slices are randomly selected from different patients. The top and bottom rows represent Normal and NCP classes, respectively. Red bounding box highlights the differences between the scanner beds in the two classes.

To further demonstrate the necessity of pre-segmentation in this task, we visualise the trained model's attention heat maps, generated using Grad-CAM. Fig. 10 shows that the models without pre-segmentation (Ours, w/o Seg) attend to other regions (e.g. the scanner bed) rather than the diagnostic characteristics (e.g. GGO) of the lungs in the NCP CT images.

7.2. Limitations of the proposed model

Our proposed model achieves accurate classification results on the three largest publicly available CT datasets under the Learning Ability and Generalisation Ability evaluation settings. However, one limitation of our model is its two-stage design, which requires relatively longer inference and training time compared to one-stage methods. This is because we propose UC-MIL for 2D feature extraction and trustworthy slice selection in the first stage, and then propose BA-GCN to extract 3D features and aggregate the 2D and 3D information for a more comprehensive level of feature reasoning in the second stage. Such a design increases the diagnostic accuracy but also consumes more time for training and inference. Compared to the other methods in Table 3, Ours takes 30.0 more hours on average for training the first stage of UC-MIL, due to MIL's specific training mechanism. This is similar to ASCo-MIL (Han et al., 2020), which is also a MIL-based method. However, we believe that model training is often a one-off process, while inference speed plays a more important role in evaluating the algorithm and applying it to real applications. Specifically, Ours requires approximately 0.26 s more inference time per 3D CT volume on average for both evaluation settings. In addition, if the slice selection process in the first stage is ignored for both Ours and the compared methods, Table 4 shows that all the methods have a similar inference time. Moreover, for the diagnosis of COVID-19, diagnostic accuracy matters more than inference speed. This highlights the need for a trade-off between accuracy and running time when applying AI models to real-world applications. On the other hand, our proposed UC-MIL acts as an automatic, reliable CT slice selection step in the first stage, rather than the handcrafted slice sampling rules or manual slice selection of previous methods. In other words, previous methods also follow a two-stage pipeline, in which CT slices are selected in a handcrafted way in the first stage; in contrast, our method can work automatically with raw CT images without any manually designed pre-processing steps.

7.3. Future work

Future studies building on this work may wish to focus on the first stage of reliable slice selection, as the second stage of graph-based 2D/3D feature reasoning relies mainly on the selected slices and the 2D features from the first stage as its input. Consequently, a collection of noisy input slices will inevitably introduce noise into the second stage and in turn perturb the training process. The ablation studies of UC-MIL and BA-GCN in Table 7 and Table 8 further support this view; that is, unreliable slices from the first stage lead to lower diagnostic performance in the second stage, especially in the generalisation ability evaluation.

A potential concern about using the automatically selected top D CT slices from UC-MIL as the 3D input for the 3D CNN backbone in BA-GCN is that the non-adjacent top D CT slices may lack abundant spatial correlations along the channel axis, which may lead to insufficient use of the potential of the 3D CNN. To address this concern, we have experimentally demonstrated that such a 3D input can boost the COVID-19 diagnosis performance via the extracted 3D features in both the Learning Ability and Generalisation Ability settings (Table 8), compared to the variant without 3D features (Ours w/o BA-GCN in Table 6). The same circumstance has also been observed in many previous CT-based COVID-19 diagnosis studies (He et al., 2021a, Tan and Liu, 2021, Wang et al., 2021, Fang et al., 2021, Ouyang et al., 2020, Li et al., 2020c), which sampled a fixed number of non-adjacent CT slices to form a 3D input volume and demonstrated that such a volume can be used by a 3D CNN to extract COVID-19 diagnosis-related features and achieve satisfying results. An extensive analysis of the relationship between 3D CNNs and the effectiveness of non-adjacent CT slices will be of interest in future studies.

8. Conclusion

We have proposed a novel and comprehensive framework for diagnosing COVID-19 using CT scans of an arbitrary number of slices. It takes advantage of both 2D and 3D features of CT images by utilising the proposed UC-MIL and BA-GCN modules. Our experiments have demonstrated that our framework can locate the diagnosis characteristics in both seen and unseen evaluation settings by the graph-based information aggregation of trustworthy 2D and 3D features. Our approach is anticipated to be widely applicable to real-world applications.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This study was funded by EPSRC, United Kingdom Impact Acceleration Account (IAA) funding and Amazon Web Services, United Kingdom.

Appendix A. Supplementary data

The following is the Supplementary material related to this article.

MMC S1

Experiments on generalisation ability.

mmc1.pdf (268.8KB, pdf)

Data availability

Data will be made available on request.

References

  1. Amores J. Multiple instance classification: review, taxonomy and comparative study. Artificial Intelligence. 2013;201:81–105. [Google Scholar]
  2. Aviles-Rivero A.I., Sellars P., Schönlieb C.-B., Papadakis N. GraphXCOVID: explainable deep graph diffusion pseudo-labelling for identifying COVID-19 on chest X-rays. Pattern Recognit. 2022;122 doi: 10.1016/j.patcog.2021.108274. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bai H.X., Hsieh B., Xiong Z., Halsey K., Choi J.W., Tran T.M.L., Pan I., Shi L.-B., Wang D.-C., Mei J., et al. Performance of radiologists in differentiating COVID-19 from viral pneumonia on chest CT. Radiology. 2020 doi: 10.1148/radiol.2020200823. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bai H.X., Wang R., Xiong Z., Hsieh B., Chang K., Halsey K., Tran T.M.L., Choi J.W., Wang D.-C., Shi L.-B., et al. Artificial intelligence augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other origin at chest CT. Radiology. 2020;296(3):E156–E165. doi: 10.1148/radiol.2020201491. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Berthelot D., Carlini N., Goodfellow I., Papernot N., Oliver A., Raffel C.A. MixMatch: a holistic approach to semi-supervised learning. Adv. Neural Inf. Process. Syst. 2019;32 [Google Scholar]
  6. Bridge J., Meng Y., Zhao Y., Du Y., Zhao M., Sun R., Zheng Y. Introducing the GEV activation function for highly unbalanced data to develop COVID-19 diagnostic models. IEEE J. Biomed. Health Inf. 2020;24(10):2776–2786. doi: 10.1109/JBHI.2020.3012383. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Calderon-Ramirez S., Yang S., Moemeni A., Colreavy-Donnelly S., Elizondo D.A., Oala L., Rodríguez-Capitán J., Jiménez-Navarro M., López-Rubio E., Molina-Cabello M.A. Improving uncertainty estimation with semi-supervised deep learning for COVID-19 detection using chest X-ray images. IEEE Access. 2021 doi: 10.1109/ACCESS.2021.3085418. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Campanella G., Hanna M.G., Geneslaw L., Miraflor A., Silva V.W.K., Busam K.J., Brogi E., Reuter V.E., Klimstra D.S., Fuchs T.J. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine. 2019;25(8):1301–1309. doi: 10.1038/s41591-019-0508-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Chao H., Fang X., Zhang J., Homayounieh F., Arru C.D., Digumarthy S.R., Babaei R., Mobin H.K., Mohseni I., Saba L., et al. Integrative analysis for COVID-19 patient outcome prediction. Med. Image Anal. 2021;67 doi: 10.1016/j.media.2020.101844. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Chen, Y., Kalantidis, Y., Li, J., Yan, S., Feng, J., 2018. Multi-fiber networks for video recognition. In: Proceedings of the European Conference on Computer Vision. ECCV, pp. 352–367.
  11. Chen B., Li J., Lu G., Yu H., Zhang D. Label co-occurrence learning with graph convolutional networks for multi-label chest X-ray image classification. IEEE J. Biomed. Health Inf. 2020;24(8):2292–2302. doi: 10.1109/JBHI.2020.2967084. [DOI] [PubMed] [Google Scholar]
  12. Chen X., Ravikumar N., Xia Y., Attar R., Diaz-Pinto A., Piechnik S.K., Neubauer S., Petersen S.E., Frangi A.F. Shape registration with learned deformations for 3D shape reconstruction from sparse and incomplete point clouds. Med. Image Anal. 2021 doi: 10.1016/j.media.2021.102228. [DOI] [PubMed] [Google Scholar]
  13. Chen, Y., Rohrbach, M., Yan, Z., Shuicheng, Y., Feng, J., Kalantidis, Y., 2019. Graph-based global reasoning networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 433–442.
  14. Chen N., Zhou M., Dong X., Qu J., Gong F., Han Y., Qiu Y., Wang J., Liu Y., Wei Y., et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020;395(10223):507–513. doi: 10.1016/S0140-6736(20)30211-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Chikontwe P., Luna M., Kang M., Hong K.S., Ahn J.H., Park S.H. Dual attention multiple instance learning with unsupervised complementary loss for COVID-19 screening. Med. Image Anal. 2021 doi: 10.1016/j.media.2021.102105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Defferrard, M., Bresson, X., Vandergheynst, P., 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems. pp. 3844–3852.
  17. DeLong E.R., DeLong D.M., Clarke-Pearson D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988:837–845. [PubMed] [Google Scholar]
  18. Di D., Shi F., Yan F., Xia L., Mo Z., Ding Z., Shan F., Song B., Li S., Wei Y., et al. Hypergraph learning for identification of COVID-19 with CT imaging. Med. Image Anal. 2021;68 doi: 10.1016/j.media.2020.101910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Dietterich T.G., Lathrop R.H., Lozano-Pérez T. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence. 1997;89(1–2):31–71. [Google Scholar]
  20. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N., 2021. An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations.
  21. Fan D.-P., Ji G.-P., Zhou T., Chen G., Fu H., Shen J., Shao L. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2020. PraNet: parallel reverse attention network for polyp segmentation; pp. 263–273. [Google Scholar]
  22. Fan Y., Liu J., Yao R., Yuan X. COVID-19 detection from X-ray images using multi-kernel-size spatial-channel attention network. Pattern Recognit. 2021 doi: 10.1016/j.patcog.2021.108055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Fan D.-P., Zhou T., Ji G.-P., Zhou Y., Chen G., Fu H., Shen J., Shao L. Inf-net: automatic COVID-19 lung infection segmentation from CT images. IEEE Trans. Med. Imaging. 2020;39(8):2626–2637. doi: 10.1109/TMI.2020.2996645. [DOI] [PubMed] [Google Scholar]
  24. Fang C., Bai S., Chen Q., Zhou Y., Xia L., Qin L., Gong S., Xie X., Zhou C., Tu D., et al. Deep learning for predicting COVID-19 malignant progression. Med. Image Anal. 2021;72 doi: 10.1016/j.media.2021.102096. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Gao S., Cheng M.-M., Zhao K., Zhang X.-Y., Yang M.-H., Torr P.H. Res2net: a new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2019 doi: 10.1109/TPAMI.2019.2938758. [DOI] [PubMed] [Google Scholar]
  26. Gao K., Su J., Jiang Z., Zeng L.-L., Feng Z., Shen H., Rong P., Xu X., Qin J., Yang Y., et al. Dual-branch combination network (DCN): towards accurate diagnosis and lesion segmentation of COVID-19 using CT images. Med. Image Anal. 2021;67 doi: 10.1016/j.media.2020.101836. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Goncharov M., Pisov M., Shevtsov A., Shirokikh B., Kurmukov A., Blokhin I., Chernina V., Solovev A., Gombolevskiy V., Morozov S., et al. CT-based COVID-19 triage: deep multitask learning improves joint identification and severity quantification. Med. Image Anal. 2021;71 doi: 10.1016/j.media.2021.102054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Greenspan H., Estépar R.S.J., Niessen W.J., Siegel E., Nielsen M. Position paper on COVID-19 imaging and AI: from the clinical needs and technological challenges to initial AI solutions at the lab and national level towards a new era for AI in healthcare. Med. Image Anal. 2020;66 doi: 10.1016/j.media.2020.101800. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Guarrasi V., D’Amico N.C., Sicilia R., Cordelli E., Soda P. Pareto optimization of deep networks for COVID-19 diagnosis from chest X-rays. Pattern Recognit. 2022;121 doi: 10.1016/j.patcog.2021.108242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Guo, L., Tang, L., Chen, T., Zhu, L., Nguyen, Q.V.H., Yin, H., 2021. DA-GCN: a domain-aware attentive graph convolution network for shared-account cross-domain sequential recommendation. In: The 30th International Joint Conference on Artificial Intelligence (IJCAI).
  31. Han Z., Wei B., Hong Y., Li T., Cong J., Zhu X., Wei H., Zhang W. Accurate screening of COVID-19 using attention-based deep 3D multiple instance learning. IEEE Trans. Med. Imaging. 2020;39(8):2584–2594. doi: 10.1109/TMI.2020.2996256. [DOI] [PubMed] [Google Scholar]
  32. Hani C., Trieu N.H., Saab I., Dangeard S., Bennani S., Chassagnon G., Revel M.-P. COVID-19 pneumonia: a review of typical CT findings and differential diagnosis. Diagn. Interv. Imaging. 2020;101(5):263–268. doi: 10.1016/j.diii.2020.03.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Hao J., Liu J., Pereira E., Liu R., Zhang J., Zhang Y., Yan K., Gong Y., Zheng J., Zhang J., et al. Uncertainty-guided graph attention network for parapneumonic effusion diagnosis. Med. Image Anal. 2021 doi: 10.1016/j.media.2021.102217. [DOI] [PubMed] [Google Scholar]
  34. He, X., Wang, S., Chu, X., Shi, S., Tang, J., Liu, X., Yan, C., Zhang, J., Ding, G., 2021a. Automated model design and benchmarking of deep learning models for COVID-19 detection with chest CT scans. In: Proceedings of the AAAI Conference on Artificial Intelligence. 35, (6), pp. 4821–4829.
  35. He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778.
  36. He K., Zhao W., Xie X., Ji W., Liu M., Tang Z., Shi Y., Shi F., Gao Y., Liu J., et al. Synergistic learning of lung lobe segmentation and hierarchical multi-instance classification for automated severity assessment of COVID-19 in CT images. Pattern Recognit. 2021;113 doi: 10.1016/j.patcog.2021.107828. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Hou, L., Samaras, D., Kurc, T.M., Gao, Y., Davis, J.E., Saltz, J.H., 2016. Patch-based convolutional neural network for whole slide tissue image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2424–2433. [DOI] [PMC free article] [PubMed]
  38. Hou J., Xu J., Jiang L., Du S., Feng R., Zhang Y., Shan F., Xue X. Periphery-aware COVID-19 diagnosis with contrastive representation enhancement. Pattern Recognit. 2021;118 doi: 10.1016/j.patcog.2021.108005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Huang, H., Lin, L., Zhang, Y., Xu, Y., Zheng, J., Mao, X., Qian, X., Peng, Z., Zhou, J., Chen, Y.-W., et al., 2021. Graph-BAS3Net: boundary-aware semi-supervised segmentation network with bilateral graph convolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 7386–7395.
  40. Huang C., Wang Y., Li X., Ren L., Zhao J., Hu Y., Zhang L., Fan G., Xu J., Gu X., et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497–506. doi: 10.1016/S0140-6736(20)30183-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Ilse M., Tomczak J., Welling M. International Conference on Machine Learning. PMLR; 2018. Attention-based deep multiple instance learning; pp. 2127–2136. [Google Scholar]
  42. Ji, W., Yu, S., Wu, J., Ma, K., Bian, C., Bi, Q., Li, J., Liu, H., Cheng, L., Zheng, Y., 2021. Learning calibrated medical image segmentation via multi-rater agreement modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12341–12351.
  43. Kendall, A., Gal, Y., 2017. What uncertainties do we need in bayesian deep learning for computer vision?. In: Advances in Neural Information Processing Systems. pp. 5574–5584.
  44. Khan, S., Hayat, M., Zamir, S.W., Shen, J., Shao, L., 2019. Striking the right balance with uncertainty. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 103–112.
  45. Kim E.A., Lee K.S., Primack S.L., Yoon H.K., Byun H.S., Kim T.S., Suh G.Y., Kwon O.J., Han J. Viral pneumonias in adults: radiologic and pathologic findings. Radiographics. 2002;22(suppl_1):S137–S149. doi: 10.1148/radiographics.22.suppl_1.g02oc15s137. [DOI] [PubMed] [Google Scholar]
  46. Kim S.-H., Wi Y.M., Lim S., Han K.-T., Bae I.-G. Differences in clinical characteristics and chest images between coronavirus disease 2019 and influenza-associated pneumonia. Diagnostics. 2021;11(2):261. doi: 10.3390/diagnostics11020261. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Kingma, D.P., Ba, J., 2014. Adam: a method for stochastic optimization. In: International Conference on Learning Representations.
  48. Kipf, T.N., Welling, M., 2017. Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations.
  49. Kiser K., Ahmed S., Stieb S., Mohamed A., Elhalawani H., Park P., Doyle N., Wang B., Barman A., Fuller C., et al. Data from the thoracic volume and pleural effusion segmentations in diseased lungs for benchmarking chest CT processing pipelines. The Cancer Imaging Archive. 2020;10 doi: 10.1002/mp.14424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Kong F., Wilson N., Shadden S.C. A deep-learning approach for direct whole-heart mesh reconstruction. Med. Image Anal. 2021 doi: 10.1016/j.media.2021.102222. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Koo H.J., Lim S., Choe J., Choi S.-H., Sung H., Do K.-H., et al. Radiographic and CT features of viral pneumonia. Radiographics. 2018;38(3):719–739. doi: 10.1148/rg.2018170048. [DOI] [PubMed] [Google Scholar]
  52. Kraus O.Z., Ba J.L., Frey B.J. Classifying and segmenting microscopy images with deep multiple instance learning. Bioinformatics. 2016;32(12):i52–i59. doi: 10.1093/bioinformatics/btw252. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Kumar A., Tripathi A.R., Satapathy S.C., Zhang Y.-D. SARS-net: COVID-19 detection from chest X-rays by combining graph convolutional network and convolutional neural network. Pattern Recognit. 2022;122 doi: 10.1016/j.patcog.2021.108255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Lakshminarayanan B., Pritzel A., Blundell C. Simple and scalable predictive uncertainty estimation using deep ensembles. Adv. Neural Inf. Process. Syst. 2017;30 [Google Scholar]
  55. Le M.T., Diehl F., Brunner T., Knol A. Uncertainty estimation for deep neural object detectors in safety-critical applications. 2018 21st International Conference on Intelligent Transportation Systems; ITSC; IEEE; 2018. pp. 3873–3878. [Google Scholar]
  56. Lee, J., Chung, S.-Y., 2020. Robust training with ensemble consensus. In: International Conference on Learning Representations.
  57. Li Q., Guan X., Wu P., Wang X., Zhou L., Tong Y., Ren R., Leung K.S., Lau E.H., Wong J.Y., et al. Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. N. Engl. J. Med. 2020 doi: 10.1056/NEJMoa2001316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Li, Y., Gupta, A., 2018. Beyond grids: learning graph representations for visual recognition. In: Advances in Neural Information Processing Systems. pp. 9225–9235.
  59. Li, B., Li, Y., Eliceiri, K.W., 2021a. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14318–14328. [DOI] [PMC free article] [PubMed]
  60. Li L., Qin L., Xu Z., Yin Y., Wang X., Kong B., Bai J., Lu Y., Fang Z., Song Q., et al. Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy. Radiology. 2020;296(2):E65–E71. doi: 10.1148/radiol.2020200905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Li Y., Wei D., Chen J., Cao S., Zhou H., Zhu Y., Wu J., Lan L., Sun W., Qian T., et al. Efficient and effective training of COVID-19 classification networks with self-supervised dual-track learning to rank. IEEE J. Biomed. Health Inf. 2020;24(10):2787–2797. doi: 10.1109/JBHI.2020.3018181. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Li K., Wu J., Wu F., Guo D., Chen L., Fang Z., Li C. The clinical and chest CT features associated with severe and critical COVID-19 pneumonia. Investigative Radiology. 2020 doi: 10.1097/RLI.0000000000000672. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Li, X., Yang, Y., Zhao, Q., Shen, T., Lin, Z., Liu, H., 2020e. Spatial pyramid based graph reasoning for semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8950–8959.
  64. Li Z., Zhao W., Shi F., Qi L., Xie X., Wei Y., Ding Z., Gao Y., Wu S., Liu J., et al. A novel multiple instance learning framework for COVID-19 severity assessment via data augmentation and self-supervised learning. Med. Image Anal. 2021;69 doi: 10.1016/j.media.2021.101978. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Liang, X., Hu, Z., Zhang, H., Lin, L., Xing, E.P., 2018. Symbolic graph reasoning meets convolutions. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. pp. 1858–1868.
  66. Liao F., Liang M., Li Z., Hu X., Song S. Evaluate the malignancy of pulmonary nodules using the 3-d deep leaky noisy-or network. IEEE Trans. Neural Netw. Learn. Syst. 2019;30(11):3484–3495. doi: 10.1109/TNNLS.2019.2892409. [DOI] [PubMed] [Google Scholar]
  67. Liu C., Cui J., Gan D., Yin G. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2021. Beyond COVID-19 diagnosis: prognosis with hierarchical graph representation learning; pp. 283–292. [Google Scholar]
  68. Liu J., Dong B., Wang S., Cui H., Fan D.-P., Ma J., Chen G. COVID-19 lung infection segmentation with a novel two-stage cross-domain transfer learning framework. Med. Image Anal. 2021;74 doi: 10.1016/j.media.2021.102205. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Liu Y., Gadepalli K., Norouzi M., Dahl G.E., Kohlberger T., Boyko A., Venugopalan S., Timofeev A., Nelson P.Q., Corrado G.S., et al. 2017. Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442. [Google Scholar]
  70. Loshchilov I., Hutter F. SGDR: stochastic gradient descent with warm restarts. Int. Conf. Learn. Represent. 2017 [Google Scholar]
  71. Luo A., Li X., Yang F., Jiao Z., Cheng H., Lyu S. European Conference on Computer Vision. Springer; 2020. Cascade graph neural networks for RGB-D salient object detection; pp. 346–364. [Google Scholar]
  72. Luo L., Yu L., Chen H., Liu Q., Wang X., Xu J., Heng P.-A. Deep mining external imperfect data for chest X-ray disease screening. IEEE Trans. Med. Imaging. 2020;39(11):3583–3594. doi: 10.1109/TMI.2020.3000949. [DOI] [PubMed] [Google Scholar]
  73. Malhotra A., Mittal S., Majumdar P., Chhabra S., Thakral K., Vatsa M., Singh R., Chaudhury S., Pudrod A., Agrawal A. Multi-task driven explainable diagnosis of COVID-19 using chest X-ray images. Pattern Recognit. 2021 doi: 10.1016/j.patcog.2021.108243. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Mallick, A., Dwivedi, C., Kailkhura, B., Joshi, G., Han, T.Y.-J., 2020. Can your AI differentiate cats from COVID-19? sample efficient uncertainty estimation for deep learning safety. In: ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning.
  75. Maron O., Lozano-Pérez T. A framework for multiple-instance learning. Adv. Neural Inf. Process. Syst. 1998:570–576. [Google Scholar]
  76. Meng Y., Meng W., Gao D., Zhao Y., Yang X., Huang X., Zheng Y. European Conference on Computer Vision. Springer; 2020. Regression of instance boundary by aggregated CNN and GCN; pp. 190–207. [Google Scholar]
  77. Meng Y., Wei M., Gao D., Zhao Y., Yang X., Huang X., Zheng Y. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2020. CNN-GCN aggregation enabled boundary regression for biomedical image segmentation; pp. 352–362. [Google Scholar]
  78. Meng, Y., Zhang, H., Gao, D., Zhao, Y., Yang, X., Qian, X., Huang, X., Zheng, Y., 2021a. BI-GConv: boundary-aware input-dependent graph convolution for biomedical image segmentation. In: The British Machine Vision Conference. BMVC.
  79. Meng, Y., Zhang, H., Zhao, Y., Yang, X., Qian, X., Huang, X., Zheng, Y., 2021b. Spatial uncertainty-aware semi-supervised crowd counting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 15549–15559.
  80. Meng, Y., Zhang, H., Zhao, Y., Yang, X., Qiao, Y., Ian, J.C.M., Huang, X., Zheng, Y., 2021c. Graph-based region and boundary aggregation for biomedical image segmentation. In: IEEE Transactions on Medical Imaging. [DOI] [PubMed]
  81. Milletari F., Navab N., Ahmadi S.-A. 2016 Fourth International Conference on 3D Vision (3DV) IEEE; 2016. V-net: fully convolutional neural networks for volumetric medical image segmentation; pp. 565–571. [Google Scholar]
  82. Minaee S., Kafieh R., Sonka M., Yazdani S., Soufi G.J. Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning. Med. Image Anal. 2020;65 doi: 10.1016/j.media.2020.101794. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Morozov S.P., Andreychenko A.E., Blokhin I.A., Gelezhe P.B., Gonchar A.P., Nikolaev A.E., Pavlov N.A., Chernina V.Y., Gombolevskiy V.A. MosMedData: data set of 1110 chest CT scans performed during the COVID-19 epidemic. Digital Diagnostics. 2020;1(1):49–59. [Google Scholar]
  84. Noh K.J., Park S.J., Lee S. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2020. Combining fundus images and fluorescein angiography for artery/vein classification using the hierarchical vessel graph network; pp. 595–605. [Google Scholar]
  85. Oh Y., Park S., Ye J.C. Deep learning COVID-19 features on CXR using limited training data sets. IEEE Trans. Med. Imaging. 2020;39(8):2688–2700. doi: 10.1109/TMI.2020.2993291. [DOI] [PubMed] [Google Scholar]
  86. Ouyang X., Huo J., Xia L., Shan F., Liu J., Mo Z., Yan F., Ding Z., Yang Q., Song B., et al. Dual-sampling attention network for diagnosis of COVID-19 from community acquired pneumonia. IEEE Trans. Med. Imaging. 2020;39(8):2595–2605. doi: 10.1109/TMI.2020.2995508. [DOI] [PubMed] [Google Scholar]
  87. Perdomo O., Otálora S., González F.A., Meriaudeau F., Müller H. 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) IEEE; 2018. Oct-net: a convolutional network for automatic classification of normal and diabetic macular edema using sd-oct volumes; pp. 1423–1426. [Google Scholar]
  88. Pinheiro, P.O., Collobert, R., 2015. From image-level to pixel-level labeling with convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1713–1721.
  89. Qian X., Fu H., Shi W., Chen T., Fu Y., Shan F., Xue X. M3 Lung-Sys: a deep learning system for multi-class lung pneumonia screening from CT imaging. IEEE J. Biomed. Health Inf. 2020;24(12):3539–3550. doi: 10.1109/JBHI.2020.3030853. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Qiu J., Sun Y. Self-supervised iterative refinement learning for macular OCT volumetric data classification. Comput. Biol. Med. 2019;111 doi: 10.1016/j.compbiomed.2019.103327. [DOI] [PubMed] [Google Scholar]
  91. Rahimzadeh M., Attar A., Sakhaei S.M. A fully automated deep learning-based network for detecting COVID-19 from a new and large lung CT scan dataset. Biomed. Signal Process. Control. 2021 doi: 10.1016/j.bspc.2021.102588. URL: https://www.sciencedirect.com/science/article/pii/S1746809421001853. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Reittner P., Ward S., Heyneman L., Johkoh T., Müller N.L. Pneumonia: high-resolution CT findings in 114 patients. European Radiology. 2003;13(3):515–521. doi: 10.1007/s00330-002-1490-3. [DOI] [PubMed] [Google Scholar]
  93. Rhee, S., Seo, S., Kim, S., 2018. Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. pp. 3527–3534.
  94. Roberts M., Driggs D., Thorpe M., Gilbey J., Yeung M., Ursprung S., Aviles-Rivero A.I., Etmann C., McCague C., Beer L., et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 2021;3(3):199–217. [Google Scholar]
  95. Ronneberger O., Fischer P., Brox T. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2015. U-net: Convolutional networks for biomedical image segmentation; pp. 234–241. [Google Scholar]
  96. Roy S., Menapace W., Oei S., Luijten B., Fini E., Saltori C., Huijben I., Chennakeshava N., Mento F., Sentelli A., et al. Deep learning for classification and localization of COVID-19 markers in point-of-care lung ultrasound. IEEE Trans. Med. Imaging. 2020;39(8):2676–2687. doi: 10.1109/TMI.2020.2994459. [DOI] [PubMed] [Google Scholar]
  97. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2017. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 618–626.
  98. Shamsi A., Asgharnezhad H., Jokandan S.S., Khosravi A., Kebria P.M., Nahavandi D., Nahavandi S., Srinivasan D. An uncertainty-aware transfer learning-based framework for COVID-19 diagnosis. IEEE Trans. Neural Netw. Learn. Syst. 2021;32(4):1408–1417. doi: 10.1109/TNNLS.2021.3054306. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Shannon C.E. A mathematical theory of communication. ACM Sigmobile Mobile Comput. Commun. Rev. 2001;5(1):3–55. [Google Scholar]
  100. Shannon C.E., Weaver W. Illinois Press; Urbana, Illinois: 1949. The Mathematical Theory of Communication; p. 117. [Google Scholar]
  101. Shao Z., Bian H., Chen Y., Wang Y., Zhang J., Ji X., et al. TransMIL: transformer based correlated multiple instance learning for whole slide image classification. Adv. Neural Inf. Process. Syst. 2021;34:2136–2147. [Google Scholar]
  102. Shi H., Han X., Jiang N., Cao Y., Alwalid O., Gu J., Fan Y., Zheng C. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study. Lancet Infect. Dis. 2020;20(4):425–434. doi: 10.1016/S1473-3099(20)30086-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Shorfuzzaman M., Hossain M.S. MetaCOVID: a siamese neural network framework with contrastive loss for n-shot diagnosis of COVID-19 patients. Pattern Recognit. 2021;113 doi: 10.1016/j.patcog.2020.107700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  104. Signoroni A., Savardi M., Benini S., Adami N., Leonardi R., Gibellini P., Vaccher F., Ravanelli M., Borghesi A., Maroldi R., et al. BS-net: learning COVID-19 pneumonia severity on a large chest X-ray dataset. Med. Image Anal. 2021;71 doi: 10.1016/j.media.2021.102046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Soda P., D’Amico N.C., Tessadori J., Valbusa G., Guarrasi V., Bortolotto C., Akbar M.U., Sicilia R., Cordelli E., Fazzini D., et al. AIforCOVID: predicting the clinical outcomes in patients with COVID-19 applying AI to chest X-rays. An Italian multicentre study. Med. Image Anal. 2021;74 doi: 10.1016/j.media.2021.102216. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Su Z., Tavolara T.E., Carreno-Galeano G., Lee S.J., Gurcan M.N., Niazi M. Attention2majority: Weak multiple instance learning for regenerative kidney grading on whole slide images. Med. Image Anal. 2022;79 doi: 10.1016/j.media.2022.102462. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1–9.
  108. Tan M., Le Q. International Conference on Machine Learning. PMLR; 2019. EfficientNet: rethinking model scaling for convolutional neural networks; pp. 6105–6114. [Google Scholar]
  109. Tan, W., Liu, J., 2021. A 3D CNN network with BERT for automatic COVID-19 diagnosis from CT-scan images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 439–445.
  110. National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N. Engl. J. Med. 2011;365(5):395–409. doi: 10.1056/NEJMoa1102873. [DOI] [PMC free article] [PubMed] [Google Scholar]
  111. Uemura T., Näppi J.J., Watari C., Hironaka T., Kamiya T., Yoshida H. Weakly unsupervised conditional generative adversarial network for image-based prognostic prediction for COVID-19 patients based on chest CT. Med. Image Anal. 2021;73 doi: 10.1016/j.media.2021.102159. [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Vieira P., Sousa O., Magalhães D., Rabêlo R., Silva R. Detecting pulmonary diseases using deep features in X-ray images. Pattern Recognit. 2021 doi: 10.1016/j.patcog.2021.108081. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Vilar J., Domingo M.L., Soto C., Cogollos J. Radiology of bacterial pneumonia. Eur. J. Radiol. 2004;51(2):102–113. doi: 10.1016/j.ejrad.2004.03.010. [DOI] [PubMed] [Google Scholar]
  114. Wada K. 2016. labelme: image polygonal annotation with Python. https://github.com/wkentaro/labelme. [Google Scholar]
  115. Wang J., Bao Y., Wen Y., Lu H., Luo H., Xiang Y., Li X., Liu C., Qian D. Prior-attention residual learning for more discriminative COVID-19 screening in CT images. IEEE Trans. Med. Imaging. 2020;39(8):2572–2583. doi: 10.1109/TMI.2020.2994908. [DOI] [PubMed] [Google Scholar]
  116. Wang X., Deng X., Fu Q., Zhou Q., Feng J., Ma H., Liu W., Zheng C. A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT. IEEE Trans. Med. Imaging. 2020;39(8):2615–2625. doi: 10.1109/TMI.2020.2995965. [DOI] [PubMed] [Google Scholar]
  117. Wang, X., Girshick, R., Gupta, A., He, K., 2018a. Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7794–7803.
  118. Wang C., Horby P.W., Hayden F.G., Gao G.F. A novel coronavirus outbreak of global health concern. Lancet. 2020;395(10223):470–473. doi: 10.1016/S0140-6736(20)30185-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Wang X., Jiang L., Li L., Xu M., Deng X., Dai L., Xu X., Li T., Guo Y., Wang Z., et al. Joint learning of 3D lesion segmentation and classification for explainable COVID-19 diagnosis. IEEE Trans. Med. Imaging. 2021 doi: 10.1109/TMI.2021.3079709. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Wang Z., Liu Q., Dou Q. Contrastive cross-site learning with redesigned net for COVID-19 CT classification. IEEE J. Biomed. Health Inf. 2020;24(10):2806–2813. doi: 10.1109/JBHI.2020.3023246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Wang G., Liu X., Li C., Xu Z., Ruan J., Zhu H., Meng T., Li K., Huang N., Zhang S. A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images. IEEE Trans. Med. Imaging. 2020;39(8):2653–2663. doi: 10.1109/TMI.2020.3000314. [DOI] [PMC free article] [PubMed] [Google Scholar]
  122. Wang X., Tang F., Chen H., Luo L., Tang Z., Ran A.-R., Cheung C.Y., Heng P.-A. UD-MIL: uncertainty-driven deep multiple instance learning for OCT image classification. IEEE J. Biomed. Health Inf. 2020;24(12):3431–3442. doi: 10.1109/JBHI.2020.2983730. [DOI] [PubMed] [Google Scholar]
  123. Wang J., Wang J., Wen Y., Lu H., Niu T., Pan J., Qian D. Pulmonary nodule detection in volumetric chest CT scans using CNNs-based nodule-size-adaptive detection and classification. IEEE Access. 2019;7:46033–46044. [Google Scholar]
  124. Wang Z., Xiao Y., Li Y., Zhang J., Lu F., Hou M., Liu X. Automatically discriminating and localizing COVID-19 from community-acquired pneumonia on chest X-rays. Pattern Recognit. 2021;110 doi: 10.1016/j.patcog.2020.107613. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Wang X., Yan Y., Tang P., Bai X., Liu W. Revisiting multiple instance neural networks. Pattern Recognit. 2018;74:15–24. [Google Scholar]
  126. Wickramasinghe U., Remelli E., Knott G., Fua P. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2020. Voxel2mesh: 3D mesh model generation from volumetric data; pp. 299–308. [Google Scholar]
  127. Wu X., Chen C., Zhong M., Wang J., Shi J. COVID-AL: the diagnosis of COVID-19 with deep active learning. Med. Image Anal. 2021;68 doi: 10.1016/j.media.2020.101913. [DOI] [PMC free article] [PubMed] [Google Scholar]
  128. Wu Y.-H., Gao S.-H., Mei J., Xu J., Fan D.-P., Zhang R.-G., Cheng M.-M. JCS: an explainable COVID-19 diagnosis system by joint classification and segmentation. IEEE Trans. Image Process. 2021;30:3113–3126. doi: 10.1109/TIP.2021.3058783. [DOI] [PubMed] [Google Scholar]
  129. Wu, J., Yu, Y., Huang, C., Yu, K., 2015. Deep multiple instance learning for image classification and auto-annotation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3460–3469.
  130. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K., 2017. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1492–1500.
  131. Xie W., Jacobs C., Charbonnier J.-P., Van Ginneken B. Relational modeling for robust and efficient pulmonary lobe segmentation in CT scans. IEEE Trans. Med. Imaging. 2020;39(8):2664–2675. doi: 10.1109/TMI.2020.2995108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Xu G.-X., Liu C., Liu J., Ding Z., Shi F., Guo M., Zhao W., Li X., Wei Y., Gao Y., et al. Cross-site severity assessment of COVID-19 from CT images via domain adaptation. IEEE Trans. Med. Imaging. 2021 doi: 10.1109/TMI.2021.3104474. [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Xue W., Cao C., Liu J., Duan Y., Cao H., Wang J., Tao X., Chen Z., Wu M., Zhang J., et al. Modality alignment contrastive learning for severity assessment of COVID-19 from lung ultrasound and clinical information. Med. Image Anal. 2021;69 doi: 10.1016/j.media.2021.101975. [DOI] [PMC free article] [PubMed] [Google Scholar]
  134. Yang D., Xu Z., Li W., Myronenko A., Roth H.R., Harmon S., Xu S., Turkbey B., Turkbey E., Wang X., et al. Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan. Med. Image Anal. 2021;70 doi: 10.1016/j.media.2021.101992. [DOI] [PMC free article] [PubMed] [Google Scholar]
  135. Yang Z., Zhao L., Wu S., Chen C.Y.-C. Lung lesion localization of COVID-19 from chest CT image: a novel weakly supervised learning method. IEEE J. Biomed. Health Inf. 2021;25(6):1864–1872. doi: 10.1109/JBHI.2021.3067465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  136. Yao J., Cai J., Yang D., Xu D., Huang J. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2019. Integrating 3D geometry of organ for improving medical image segmentation; pp. 318–326. [Google Scholar]
  137. Yao Q., Xiao L., Liu P., Zhou S.K. Label-free segmentation of COVID-19 lesions in lung CT. IEEE Trans. Med. Imaging. 2021 doi: 10.1109/TMI.2021.3066161. [DOI] [PMC free article] [PubMed] [Google Scholar]
  138. Yu L., Wang S., Li X., Fu C.-W., Heng P.-A. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2019. Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation; pp. 605–613. [Google Scholar]
  139. Zagoruyko, S., Komodakis, N., 2016. Wide residual networks. In: The British Machine Vision Conference. BMVC.
  140. Zhang J., Fan D.-P., Dai Y., Anwar S., Saleh F., Aliakbarian S., Barnes N. Uncertainty inspired RGB-D saliency detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021 doi: 10.1109/TPAMI.2021.3073564. [DOI] [PubMed] [Google Scholar]
  141. Zhang, L., Li, X., Arnab, A., Yang, K., Tong, Y., Torr, P.H., 2019. Dual graph convolutional network for semantic segmentation. In: The British Machine Vision Conference. BMVC.
  142. Zhang K., Liu X., Shen J., Li Z., Sang Y., Wu X., Zha Y., Liang W., Wang C., Wang K., et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell. 2020;181(6):1423–1433. doi: 10.1016/j.cell.2020.04.045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  143. Zhang, H., Meng, Y., Zhao, Y., Qiao, Y., Yang, X., Coupland, S.E., Zheng, Y., 2022. DTFD-MIL: double-tier feature distillation multiple instance learning for histopathology whole slide image classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  144. Zhao, T., Cao, K., Yao, J., Nogues, I., Lu, L., Huang, L., Xiao, J., Yin, Z., Zhang, L., 2021a. 3D graph anatomy geometry-integrated network for pancreatic mass segmentation, diagnosis, and quantitative patient management. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13743–13752.
  145. Zhao C., Xu Y., He Z., Tang J., Zhang Y., Han J., Shi Y., Zhou W. Lung segmentation and automatic detection of COVID-19 using radiomic features from chest CT images. Pattern Recognit. 2021 doi: 10.1016/j.patcog.2021.108071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  146. Zhao, Y., Yang, F., Fang, Y., Liu, H., Zhou, N., Zhang, J., Sun, J., Yang, S., Menze, B., Fan, X., et al., 2020. Predicting lymph node metastasis using histopathological images based on multiple instance learning with deep graph convolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4837–4846.
  147. Zhong A., Li X., Wu D., Ren H., Kim K., Kim Y., Buch V., Neumark N., Bizzo B., Tak W.Y., et al. Deep metric learning-based image retrieval system for chest radiograph and its clinical applications in COVID-19. Med. Image Anal. 2021;70 doi: 10.1016/j.media.2021.101993. [DOI] [PMC free article] [PubMed] [Google Scholar]
  148. Zhou L., Li Z., Zhou J., Li H., Chen Y., Huang Y., Xie D., Zhao L., Fan M., Hashmi S., et al. A rapid, accurate and machine-agnostic segmentation and quantification method for CT-based COVID-19 diagnosis. IEEE Trans. Med. Imaging. 2020;39(8):2638–2652. doi: 10.1109/TMI.2020.3001810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  149. Zhou Z., Siddiquee M.M.R., Tajbakhsh N., Liang J. UNet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging. 2019 doi: 10.1109/TMI.2019.2959609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  150. Zhu X., Song B., Shi F., Chen Y., Hu R., Gan J., Zhang W., Li M., Wang L., Gao Y., et al. Joint prediction and time estimation of COVID-19 developing severe symptoms using chest CT scan. Med. Image Anal. 2021;67 doi: 10.1016/j.media.2020.101824. [DOI] [PMC free article] [PubMed] [Google Scholar]


Supplementary Materials

MMC S1. Experiments on generalisation ability (mmc1.pdf, 268.8 KB).

Data Availability Statement

Data will be made available on request.

