The British Journal of Radiology
Br J Radiol. 2021 Nov 29;94(1128):20210332. doi: 10.1259/bjr.20210332

Diagnostic test accuracy of artificial intelligence analysis of cross-sectional imaging in pulmonary hypertension: a systematic literature review

Conor J Hardacre 1, Joseph A Robertshaw 1, Shaney L Barratt 2, Hannah L Adams 3, Robert V MacKenzie Ross 4, Graham RE Robinson 4, Jay Suntharalingam 4,5, John D Pauling 4,5, Jonathan Carl Luis Rodrigues 4,6
PMCID: PMC8631018  PMID: 34541861

Abstract

Objectives:

To undertake the first systematic review examining the performance of artificial intelligence (AI) applied to cross-sectional imaging for the diagnosis of acquired pulmonary arterial hypertension (PAH).

Methods:

Searches of Medline, Embase and Web of Science were undertaken on 1 July 2020. Original publications studying AI applied to cross-sectional imaging for the diagnosis of acquired PAH in adults were identified through a two-stage, double-blinded review. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) and Checklist for Artificial Intelligence in Medical Imaging (CLAIM) frameworks. Narrative synthesis was undertaken following Synthesis Without Meta-Analysis guidelines. This review received no funding and was registered in the International Prospective Register of Systematic Reviews (ID: CRD42020196295).

Results:

Searches returned 476 citations. Three retrospective observational studies, published between 2016 and 2020, were selected for data-extraction. Two methods applied to cardiac-MRI demonstrated high diagnostic accuracy, with the best model achieving AUC=0.90 (95% CI: 0.85–0.93), 89% sensitivity and 81% specificity. Stronger results were achieved using cardiac-MRI for classification of idiopathic PAH, achieving AUC=0.97 (95% CI: 0.89–1.0), 96% sensitivity and 87% specificity. One study reporting CT-based AI demonstrated lower accuracy, with 64.6% sensitivity and 97.0% specificity.

Conclusions:

Automated methods for identifying PAH on cardiac-MRI are emerging with high diagnostic accuracy. AI applied to cross-sectional imaging may provide non-invasive support to reduce diagnostic delay in PAH, an aim that would be helped by stronger solutions in other imaging modalities.

Advances in knowledge:

There is a significant shortage of research in this important area. Early detection of PAH would be supported by further research advances on the promising emerging technologies identified.

Introduction

PAH is a progressive and fatal disease with an estimated incidence of 2.5–7.1 cases per million.1 PAH is diagnosed on right heart catheterisation (RHC) and was traditionally defined as a mean pulmonary artery pressure of ≥25 mm Hg, although subsequent classification has lowered the threshold to >20 mm Hg (above the 97.5th percentile of normal).2,3 Diagnosing PAH can be challenging, and diagnostic delay represents an important unmet need. Analysis of the Registry to Evaluate Early and Long-term Pulmonary Arterial Hypertension Disease Management (REVEAL Registry) suggests that one in five patients eventually diagnosed with PAH report symptoms for more than 2 years before diagnosis.4 There has been no meaningful decrease in the time from symptom onset to diagnosis of PAH over the past 20 years, and efforts have been made to alter the diagnostic algorithm and screening guidelines to avoid unnecessary invasive RHC studies and to balance “the benefits of earlier diagnosis and disease recognition against the economic healthcare burden of additional screening and increased referrals to PH centres”.5

Owing to its progressive nature, PAH has a short estimated median survival from diagnosis to death of 2.8 years, with a 5-year survival rate of 34%.6 These statistics make earlier diagnosis a key potential target for improving the prognosis of this condition. Assessment of PAH currently includes echocardiography, cross-sectional radiological imaging and nuclear medicine techniques such as ventilation/perfusion (V/Q) scanning, with definitive diagnosis finally provided by the gold standard of RHC, a significant use of healthcare resources. Developments in computing power and AI may allow successful automation of the diagnostic process, an important opportunity to address the persistent problems of diagnostic delay and high demand for healthcare resources in PAH diagnosis.
AI solutions to identify disease from medical images have already been shown to be effective in other clinical specialties, such as breast cancer screening7 and diabetic retinopathy screening.8 The primary objective of this review was to systematically review, appraise and summarise the published literature to date on the performance and potential future of AI applied to cross-sectional imaging for the diagnosis of acquired forms of PAH in adult patients with suspected diagnoses of PAH.

Methods and materials

Protocol

The protocol was prepared in accordance with Preferred Reporting Items for Systematic review and Meta-Analysis Protocols (PRISMA-P) guidelines and registered in the International Prospective Register of Systematic Reviews [(PROSPERO), ID:CRD42020196295].

Search strategy

Standardised searches (Supplementary Material 1) were conducted on 1 July 2020 using Medline, EMBASE and Web of Science. The search strategy was designed with input from the Head of Library Services at our institution and from clinicians (radiology, respiratory and rheumatology) with expertise in AI and/or PAH. The following search terms were designed to capture studies using a broad range of AI techniques and to include the diverse terminology used to describe these methods:

Supplementary Material 1.

((pulmonary hypertension) OR (PAH) OR (CTEPH)) AND ((artificial intelligence) OR (machine learning) OR (ARTIFICIAL INTELLIGENCE) OR (MACHINE LEARNING) OR (Decision Tree Analysis) OR (deep learning))

There were no restrictions on language or date of publication. Searches were undertaken using the Healthcare Databases Advanced Search platform provided in partnership by the National Institute for Health and Care Excellence (NICE) and Health Education England (HEE). No additional search methods were used.

Study selection

The titles and abstracts of all identified studies were assessed for inclusion based on the following criteria:

  • Articles incorporating AI or machine-learning-derived methods applied to cross-sectional imaging modalities, including but not limited to computed tomography (CT) and magnetic resonance imaging (MRI).

  • AI applied for the purposes of diagnosis or classification of acquired forms of PAH.

  • Adult study participants with a suspected diagnosis of acquired forms of PAH.

  • Reference tests limited to cross-sectional imaging modalities consisting of MRI and CT with echocardiography being excluded.

  • All clinical settings were considered.

Studies were excluded if they fulfilled the following exclusion criteria:

  • Unrelated subject area.

  • Animal studies.

  • Congenital/paediatric PAH studies.

  • Case reports/case series not considered open-label studies (n ≤ 5).

  • Non-original research publications.

  • Conference abstracts (taken through to full text review where uncertain for later exclusion).

  • Abbreviated reports (i.e., letters to editors).

Two reviewers (CH and JAR) undertook independent screening of all records acquired by the searches, identifying articles of potential relevance to the review. Studies deemed potentially relevant and/or where the title and abstract provided insufficient detail to exclude were taken to the full-text assessment stage. Agreement was measured using Cohen’s κ, with the intention of re-training both reviewers and repeating the study selection process if κ was ≤0.80.9 The same two reviewers (CH and JAR) then undertook double-blinded full-text assessment. Studies on which the reviewers disagreed were independently reviewed by a third reviewer (JCLR) to reach consensus.
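The agreement rule above can be sketched in Python. This is a minimal illustration, assuming each reviewer makes a binary include/exclude decision so that their decisions form a 2×2 table; the function names and the counts in the example are hypothetical and are not the review's screening data.

```python
def cohens_kappa(both_include: int, only_a_include: int,
                 only_b_include: int, both_exclude: int) -> float:
    """Cohen's kappa for two reviewers making include/exclude decisions."""
    n = both_include + only_a_include + only_b_include + both_exclude
    observed = (both_include + both_exclude) / n  # raw proportion of agreement
    # Agreement expected by chance, from each reviewer's marginal inclusion rate.
    p_a = (both_include + only_a_include) / n
    p_b = (both_include + only_b_include) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


RETRAIN_THRESHOLD = 0.80  # protocol: re-train and repeat screening if kappa <= 0.80


def needs_retraining(kappa: float) -> bool:
    return kappa <= RETRAIN_THRESHOLD


# Hypothetical counts for illustration only (not the review's data).
kappa = cohens_kappa(both_include=20, only_a_include=5,
                     only_b_include=10, both_exclude=15)
print(round(kappa, 3), needs_retraining(kappa))  # 0.4 True
```

Kappa discounts the agreement two reviewers would reach by chance, which matters in screening because most records are excluded by both; raw percentage agreement alone would therefore look high even for weak reviewers.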

Data extraction and synthesis

After training from the principal investigator (JCLR), a single reviewer (CH) systematically extracted data from the selected studies using a pre-designed proforma to capture study characteristics. AI characteristics, outcomes and a summary of the main reported findings were also collected where available, including but not limited to: area under the receiver operating characteristic curve (AUC), positive predictive value, negative predictive value, sensitivity and specificity (Table 1). As quality assurance, the principal investigator (JCLR) independently extracted data from a minimum of 2 studies or 20% of manuscripts, whichever was larger.
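The quality-assurance sample rule ("a minimum of 2 studies or 20% of manuscripts, whichever was larger") can be written as a one-line formula. This sketch assumes that a fractional 20% target is rounded up and that the sample cannot exceed the number of included studies; neither detail is stated in the protocol, and the function name is hypothetical.

```python
import math


def qa_sample_size(n_included: int, minimum: int = 2, fraction: float = 0.20) -> int:
    """Studies to double-extract: `minimum` or `fraction` of the included
    studies, whichever is larger, capped at the number of studies available.

    Rounding the fractional target up (math.ceil) and the cap are assumptions.
    """
    return min(n_included, max(minimum, math.ceil(fraction * n_included)))


print(qa_sample_size(3))   # 2, the situation in this review (three studies)
print(qa_sample_size(24))  # 5, where 20% (4.8 studies) exceeds the minimum
```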

Table 1.

Summary of data extraction of the three assessed studies

Swift et al10 (2020, United Kingdom)
Description: Retrospective observational study to determine the performance of a tensor-based ML model that provides an accessible means of automating the segmentation of structures in imaging data and learning diagnostic features, such that automated PH diagnosis is achieved with high accuracy.
Modality: MRI (CMR)
Study population (and inclusion criteria): 220 treatment-naive patients undergoing both CMR and RHC within 48 h of each other; 150 patients with a diagnosis of PH and 70 with no evidence of PH, enrolled consecutively.
Control: 70 patients without PH undergoing CMR and RHC within 48 h of each other.
Exclusions: Those not undergoing RHC and CMR within 48 h of each other; prevalent scans; incident patients with other causes of PH.
Summary of AI training methods: Cases (PH) and controls (no PH) were entered into the algorithm for training. Ten-fold cross-validation was used to evaluate the proposed ML approach in terms of AUC. The patients were divided into 10 disjoint subsets. Nine subsets were used for training (learning the MPCA projection, determining S and learning the classifier) and the remaining subset for prediction (evaluation). This process was repeated 10 times so that each subset was used for prediction exactly once, and the average over the 10 repeats was reported. Additionally, subgroup analysis assessed accuracy in differentiating patients with and without IPAH.
Summary of key reported outcomes: PAH vs no PH: the most successful method was the “short-axis with small ellipse model”, returning AUC = 0.90 [95% CI: 0.85–0.93]. IPAH vs no PH: the most successful method was the “short-axis with small ellipse model”, returning AUC = 0.97 [95% CI: 0.89–1.0].
Comments: Linear ML methods allow direct observation of the features deemed diagnostically relevant, and identified features can be mapped onto the image for user interpretation. Average run time per study was less than 1 s after manual point landmarking.
Funding: Wellcome (215799/Z/19/Z and 205188/Z/16/Z); EPSRC (EP/R014507/1); NIHR (NIHR-RP-R3-12-027); MRC (MR/M008894/1).

Chettrit et al11 (2019, Israel)
Description: Study comparing manual radiologist measurement of the PA:Ao ratio with a neural network model that automates this process. The study demonstrates a successful approach to automatically identifying and measuring the PA and Ao to determine the risk of PH based on a crude estimate (PA:Ao ratio >1 = high risk of PH).
Modality: CT (contrast-enhanced)
Study population (and inclusion criteria): 288 contrast-enhanced chest CT studies. Inclusion criteria and patient selection details not explicitly stated.
Control: Ground truth measurements were set by a radiologist on the test set of 288 CTs and compared with the algorithm outputs.
Exclusions: Undefined; patient selection details insufficiently reported.
Summary of AI training methods: 600 chest CT studies were manually annotated by three radiologists, with ground truth set to the mean of the three measurements. An additional 685 studies annotated with per-artery masks were used for segmentation model training and validation. In total, 1,285 chest CT studies were used for training and validation (81.7% of the data set) and the remaining 288 studies for testing (18.3%).
Summary of key reported outcomes: 91.9% of test cases correctly risk stratified (based on the PA:Ao ratio >1 definition). PPV = 80.3%; sensitivity = 64.6%; specificity = 97.0%. Algorithm measurements were judged “acceptable and in the correct location” for 99.05 and 99.76% of cases. Average run time was 70 s per study using standard hardware and 22 s per study using a high-end GPU (1080Ti).
Comments: Given a region of interest, the model uses two separate CNNs, one for slice selection (the slice containing the main PA bifurcation) and one for image segmentation. Image processing techniques are then used to measure the diameters.
Funding: Unspecified.

Lungu et al12 (2016, United Kingdom)
Description: Retrospective observational study using a model that combines computational metrics (reflecting haemodynamic changes in the pulmonary vasculature) with MRI measures of RV morphology and function in a decision support algorithm, achieving high diagnostic accuracy for PH.
Modality: MRI (CMR)
Study population (and inclusion criteria): 72 consecutive patients from the Sheffield Pulmonary Vascular Disease Unit being investigated for PH who underwent RHC and MRI within 48 h. Patients were referred from other centres to the Sheffield Unit on the basis of clinical features and a local non-invasive assessment that usually included echocardiography.
Control: 15 patients found to have mPAP at RHC <25 mm Hg (11 patients with mPAP 22–24 mm Hg).
Exclusions: MRI incompatibility; claustrophobia; pregnancy.
Summary of AI training methods: Ten non-invasive PH metrics of cardiopulmonary vascular function were computed or measured: distal resistance, characteristic resistance, total compliance, ratio of backward to total pressure wave power, relative area change, RV end-diastolic volume index, RV ejection fraction, ventricular mass index, systolic septal angle and RV mass index. A random forest classification algorithm in MATLAB was used to assign a diagnosis of PH or no PH to each subject.
Summary of key reported outcomes: Combining all of the non-invasive PH metrics correctly classified 66 of 72 patients (92%), with a high sensitivity of 97% (95% CI: 87.89–99.57%) and a good specificity of 73% (95% CI: 44.90–92.21%). The corresponding positive and negative predictive values were 93.22% (95% CI: 83.54–98.12%) and 84.62% (95% CI: 54.55–98.08%), respectively.
Comments: The definition of “no PH” includes 11 patients who would, under the revised definition, now be diagnosed with PH (mPAP >20 mm Hg at RHC).
Funding: NIHR

AI, Artificial intelligence; ANN, Artificial neural network; Ao, Ascending aorta; CI, Confidence interval; CMR, Cardiovascular magnetic resonance; CNN, Convolutional neural network; CT, Computed tomography; CXR, Chest x-ray/radiograph; EPSRC, Engineering and Physical Sciences Research Council; GPU, Graphics processing unit; Hrs, Hours; IPAH, Idiopathic pulmonary arterial hypertension; ML, Machine learning; MPCA, Multilinear principal component analysis; MRC, Medical Research Council; MRI, Magnetic resonance imaging; NIHR, National Institute for Health Research; PA, Pulmonary artery; PAH, Pulmonary arterial hypertension; PAP, Pulmonary arterial pressure; PE, Pulmonary embolism; PH, Pulmonary hypertension; PPV, Positive predictive value; RV, Right ventricle; V/Q, Ventilation-perfusion scan; mPAP, Mean pulmonary arterial pressure.
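The ten-fold cross-validation described for Swift et al in Table 1 can be sketched as follows. This is a generic Python illustration, not the authors' code: the shuffling seed, the unstratified split and the `fit_and_score` callback are assumptions, and the MPCA and classifier steps are left to that hypothetical callback.

```python
import random
from statistics import fmean
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n_patients: int, k: int = 10, seed: int = 0
                  ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) patient-index lists for k-fold cross-validation.

    Patients are shuffled once and dealt into k disjoint subsets; each subset
    is the test fold exactly once while the other k-1 subsets form the training set.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_score(
        n_patients: int,
        fit_and_score: Callable[[Sequence[int], Sequence[int]], float],
        k: int = 10) -> float:
    """Average a per-fold metric (such as AUC) over the k folds.

    `fit_and_score` is a hypothetical callback: it must learn everything
    (for Swift et al, the MPCA projection and classifier) from the training
    indices only, then return the metric on the held-out test indices.
    """
    return fmean(fit_and_score(train, test)
                 for train, test in k_fold_splits(n_patients, k))


# 220 patients, as in Swift et al: ten disjoint test folds of 22 patients.
print([len(test) for _, test in k_fold_splits(220)])
```

Fitting every learned component inside the training folds is what keeps the averaged AUC an honest estimate; fitting the projection on all 220 patients first would leak test information into training.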

In anticipation of heterogeneity in AI models, study designs and reporting, a narrative synthesis approach was planned using the SWiM reporting guidelines,13 grouping studies by imaging modality (and secondarily by form of PAH), describing the reporting metrics used in each and providing a descriptive data synthesis that draws out the key reported findings and limitations of each study. Our synthesis was to be informed by our assessments of study quality, highlighting where limitations in study design affect our ability to draw conclusions from study findings.

Risk of bias assessment

Study quality and risk of bias were assessed for each included study, using the latest revision of the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2)14 and the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) framework.15

Results

Study selection

Searches of the databases returned 476 records (Figure 1). After removing duplicates using EndNote, 457 unique records were identified, of which 23 were selected by both reviewers for full-text assessment (Figure 1). There was excellent agreement between the two reviewers (98.9%; Cohen’s κ 0.896, 95% CI: 0.806–0.986), representing “almost perfect” agreement.9 Senior arbitration was necessary for five studies, all of which were subsequently excluded. Of the 23 studies assessed at full-text review, 20 were excluded (Figure 1), the majority of which (n = 11, 55%) were conference abstracts without sufficient data for inclusion. The remaining three studies proceeded to data extraction and analysis.

Figure 1.

Study selection flow diagram, presented in conformity with the PRISMA statement.

Study characteristics

Two of the three studies described AI models applied to cardiac magnetic resonance imaging (CMR)10,12 and one described an AI model applied to contrast-enhanced CT.11 The studies were published between 2016 and 2020, and the work was undertaken in the UK10,12 and Israel.11 The studies had a median of 220 subjects (interquartile range (IQR) 72–288), of whom 253 in total had confirmed PAH (median 57 per study, IQR 46–150). The studies are summarised in Table 1.
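With only three studies, the reported IQRs depend on the quartile convention, which the review does not state. The Python sketch below shows that the default (“exclusive”) method of the standard-library `statistics.quantiles` reproduces the reported intervals, which for three values coincide with the minimum and maximum, whereas linear (“inclusive”) interpolation gives narrower intervals. Chettrit et al's PAH count is not reported directly and is inferred here by subtraction.

```python
import statistics

# Subjects per study (Table 1): Lungu et al 72, Swift et al 220, Chettrit et al 288.
subjects = [72, 220, 288]
# Subjects with PAH: Swift et al 150 and Lungu et al 57 (72 minus 15 controls);
# Chettrit et al's 46 is inferred here by subtraction from the reported total of 253.
pah = [150, 57, 253 - 150 - 57]


def median_and_quartiles(data, method="exclusive"):
    """Median plus lower/upper quartiles using statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method=method)
    return statistics.median(data), q1, q3


for label, data in (("subjects", subjects), ("PAH", pah)):
    print(label,
          median_and_quartiles(data),                      # default 'exclusive' method
          median_and_quartiles(data, method="inclusive"))  # linear interpolation
# subjects (220, 72.0, 288.0) (220, 146.0, 254.0)
# PAH (57, 46.0, 150.0) (57, 51.5, 103.5)
```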

Assessment of AI models

For the two MRI-based models, the primary reporting metric was the area under the receiver-operating-characteristic curve (AUC).10,12 Further metrics such as sensitivity, specificity and positive and negative predictive values were also provided. Both studies demonstrated high diagnostic accuracy. Swift et al10 used quantitative CMR imaging metrics and tensor-based machine-learning methods, applying multilinear subspace learning techniques to identify cardiac features with diagnostic utility in PAH. For the classification of no PAH vs PAH, the best-performing model achieved AUC = 0.90 (95% CI: 0.85–0.93), with 89% sensitivity and 81% specificity. The model’s ability to classify no PAH vs idiopathic pulmonary arterial hypertension (IPAH) was also assessed, and accuracy improved further, with AUC = 0.97 (95% CI: 0.89–1.0), 96% sensitivity and 87% specificity. Models using routine cine CMR short-axis images of both ventricles achieved a higher AUC than those using 4-chamber images for the classification of both PAH and IPAH. The stated operating time of the AI method reported by Swift et al10 is under 1 s, or under 10 s including manual landmarking.
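The AUC reported by both MRI studies has a simple rank interpretation: it equals the probability that a randomly chosen patient with PAH receives a higher classifier score than a randomly chosen patient without PAH. A sensitivity/specificity pair, by contrast, describes a single threshold on the same scores. The Python sketch below illustrates both on hypothetical scores; it is not the method used in either study.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen PAH case scores
    higher than a randomly chosen non-PAH case; ties count one half."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(pos: Sequence[float], neg: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Operating point: scores >= threshold are called PAH."""
    sensitivity = sum(p >= threshold for p in pos) / len(pos)
    specificity = sum(q < threshold for q in neg) / len(neg)
    return sensitivity, specificity


# Hypothetical classifier scores, for illustration only.
pah_scores, no_pah_scores = [0.9, 0.8, 0.4], [0.7, 0.3, 0.2]
print(auc_mann_whitney(pah_scores, no_pah_scores))              # 8 of 9 pairs ranked correctly
print(sensitivity_specificity(pah_scores, no_pah_scores, 0.5))  # two thirds each
```

Because AUC summarises ranking across all thresholds, two models with similar AUC can still differ at the operating point chosen for clinical use, which is why both studies also report sensitivity and specificity.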

Lungu et al12, meanwhile, used “MRI-derived metrics of cardiopulmonary function” to quantify haemodynamic changes in the pulmonary vasculature and right-ventricular function and morphology. These metrics were used with machine-learning methods (decision-tree analysis and random forest classification) to create a non-invasive model with high diagnostic accuracy for PAH. The best model described by Lungu et al12 combined computation-derived metrics with MRI measures of RV morphology and function, and used a random forest classification algorithm to determine the diagnosis, with a reported AUC of 0.91. The model correctly classified 92% of patients in the study, corresponding to 97% sensitivity (95% CI: 87.89–99.57%) and 73% specificity (95% CI: 44.90–92.21%).
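The figures reported for Lungu et al are mutually consistent with a single confusion matrix. Table 1 gives 72 patients, 15 of them without PH, so 57 had PH; the reported 97% sensitivity and 73% specificity then imply 55 true positives and 11 true negatives. These counts are our inference, not values stated in the study. The Python sketch below recomputes the metrics and computes exact (Clopper-Pearson) binomial confidence intervals by bisection on the binomial tail. On our check, the sensitivity interval produced this way matches the reported 87.89–99.57%, which suggests, but does not confirm, that the original intervals are exact intervals.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f: Callable[[float], float], target: float) -> float:
    """Solve f(p) = target on [0, 1] for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""
    # Lower bound: P(X >= x) = alpha/2, i.e. P(X <= x - 1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _bisect_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= x) = alpha/2.
    upper = 1.0 if x == n else _bisect_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# Counts inferred from Table 1 (57 with PH, 15 without) and the reported percentages.
tp, fn, tn, fp = 55, 2, 11, 4
for name, (x, n) in {"sensitivity": (tp, tp + fn), "specificity": (tn, tn + fp),
                     "PPV": (tp, tp + fp), "NPV": (tn, tn + fn),
                     "accuracy": (tp + tn, tp + tn + fp + fn)}.items():
    lo, hi = clopper_pearson(x, n)
    print(f"{name}: {x}/{n} = {100 * x / n:.2f}% (95% CI {100 * lo:.2f}-{100 * hi:.2f}%)")
```

The wide specificity interval follows directly from having only 15 patients without PH, which is worth bearing in mind when comparing this study with larger ones.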

Contrast-enhanced chest CT imaging was the focus of a study by Chettrit et al11, which compared an automated, convolutional neural network (CNN)-based measurement of the ratio of the pulmonary artery (PA) diameter to the ascending aorta (Ao) diameter with manual measurements made by a radiologist, using this ratio to stratify PAH risk. The PA:Ao ratio has previously been shown to be an indicator of the presence and severity of PAH,16 but it is a rudimentary marker and in isolation is considered insufficient for formal diagnosis. The model used two CNNs: one to select the correct axial slice from which to take measurements and another to segment the structures within that image for measurement. The work showed strong correlation between automated and manual measurements, with Pearson correlation coefficients of 0.93 for the Ao and 0.92 for the PA, and the system tended to marginally overestimate artery diameter (mean difference −0.94 mm for the Ao and −0.86 mm for the PA). The study quoted 64.6% sensitivity and 97.0% specificity. Secondarily, the work assessed the slice selection CNN’s performance: “slice selection” and “measurements” were judged acceptable and correctly located in 99.05 and 99.76% of cases, respectively. The average system run-time was 70 s per study using “hardware similar to that found on a radiologist’s workstation” (Intel Xeon Processor E5-2695 v4), falling to 22 s when using a high-end graphics processing unit (GTX 1080Ti GPU).
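The evaluation described above, comparing automated with manual diameters and the resulting PA:Ao risk categories, can be sketched in a few lines of Python. The diameters below are hypothetical, and the sign convention for the mean difference (manual minus automated, so a negative value means the algorithm reads larger) is our assumption; the review does not state the convention Chettrit et al used.

```python
import math
from statistics import fmean
from typing import Sequence


def pa_ao_high_risk(pa_mm: float, ao_mm: float, cutoff: float = 1.0) -> bool:
    """Crude PH risk flag: PA:Ao diameter ratio greater than the cutoff (1)."""
    return pa_mm / ao_mm > cutoff


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired measurement series."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_difference(manual: Sequence[float], automated: Sequence[float]) -> float:
    """Mean of (manual - automated); negative means the algorithm reads larger.
    The sign convention is an assumption, not taken from the original study."""
    return fmean(m - a for m, a in zip(manual, automated))


# Hypothetical diameters in mm as (PA, Ao) for four scans, for illustration only.
manual = [(28.0, 30.0), (33.0, 30.0), (25.0, 32.0), (31.0, 29.0)]
automated = [(31.5, 31.0), (33.5, 30.5), (26.0, 33.0), (32.0, 29.5)]

manual_pa = [pa for pa, _ in manual]
auto_pa = [pa for pa, _ in automated]
print("PA r =", round(pearson_r(manual_pa, auto_pa), 3))
print("PA mean difference (mm) =", mean_difference(manual_pa, auto_pa))  # -1.5
agree = [pa_ao_high_risk(*m) == pa_ao_high_risk(*a) for m, a in zip(manual, automated)]
print("risk category agreement:", sum(agree) / len(agree))  # 0.75
```

The example shows why high correlation does not guarantee correct stratification: a small systematic overestimate can push a case whose ratio is just below 1 over the cutoff, as happens with the first hypothetical scan.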

Risk of bias

Risk of bias assessment is summarised in Tables 2 and 3. Two of the three studies were considered to be at low risk of bias in all four domains assessed by QUADAS-214 (Patient Selection, Index Test, Reference Standard, and Flow and Timing). The exception was Chettrit et al11, which was judged to be at unclear risk of bias regarding patient selection owing to a lack of transparency in reporting patient selection methods; this “unclear” label should be interpreted as a potentially high risk of bias. Furthermore, the reference standard used by Chettrit et al11 (PA:Ao ratio measured manually from CT) cannot formally diagnose PAH, so the study received the “high concern” label for domain 3B, applicability of the reference standard. The risk of bias regarding the reference standard (domain 3A) was nevertheless judged low, as the approach used was an appropriate means of evaluating the ratio-based risk stratification on which the study focused. Conclusions drawn from Chettrit et al11 must be interpreted in the context of these limitations.

Table 2.

Summary of QUADAS-2 outcomes for the three assessed studies

Study | Risk of Bias: Patient Selection | Index Test | Reference Standard | Flow and Timing | Applicability Concerns: Patient Selection | Index Test | Reference Standard
Swift et al10 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
Chettrit et al11 | ? | ✓ | ✓ | ✓ | ? | ✓ | ?
Lungu et al12 | ✓ | ✓ | ✓ | ✓ | ✓

✓, Low Risk.

X, High Risk.

?, Unclear Risk.

Low risk of bias fields indicated by a tick, high risk of bias fields indicated by a cross, unclear risk of bias fields indicated by a question mark.

Table 3.

Summary of the CLAIM outcomes for the three assessed studies

CLAIM Item | Swift et al10 | Chettrit et al11 | Lungu et al12
1 Identification as a study of AI methodology, specifying the category of technology used (e.g. deep learning) | [icon] | [icon] | [icon]
2 Structured summary of study design, methods, results, and conclusions | [icon] | [icon] | [icon]
3 Scientific and clinical background, including the intended use and clinical role of the AI approach | [icon] | [icon] | [icon]
4 Study objectives and hypotheses | [icon] | [icon] | [icon]
5 Prospective or retrospective study | [icon] | [icon] | [icon]
6 Study goal, such as model creation, exploratory study, feasibility study, noninferiority trial | [icon] | [icon] | [icon]
7 Data sources | [icon] | [icon] | [icon]
8 Eligibility criteria: how, where, and when potentially eligible participants or studies were identified (e.g. symptoms, results from previous tests, inclusion in registry, patient-care setting, location, dates) | [icon] | [icon] | [icon]
9 Data preprocessing steps | [icon] | [icon] | [icon]
10 Selection of data subsets, if applicable | [icon] | [icon] | [icon]
11 Definitions of data elements, with references to common data elements | [icon] | [icon] | [icon]
12 De-identification methods | [icon] | [icon] | [icon]
13 How missing data were handled | N/A | [icon] | N/A
14 Definition of ground truth reference standard, in sufficient detail to allow replication | [icon] | [icon] | [icon]
15 Rationale for choosing the reference standard (if alternatives exist) | [icon] | [icon] | [icon]
16 Source of ground truth annotations; qualifications and preparation of annotators | N/A | [icon] | [icon]
17 Annotation tools | [icon] | [icon] | [icon]
18 Measurement of inter- and intrarater variability; methods to mitigate variability and/or resolve discrepancies | [icon] | [icon] | [icon]
19 Intended sample size and how it was determined | [icon] | [icon] | [icon]
20 How data were assigned to partitions; specify proportions | [icon] | [icon] | [icon]
21 Level at which partitions are disjoint (e.g. image, study, patient, institution) | [icon] | [icon] | [icon]
22 Detailed description of model, including inputs, outputs, all intermediate layers and connections | [icon] | [icon] | [icon]
23 Software libraries, frameworks, and packages | [icon] | [icon] | [icon]
24 Initialization of model parameters (e.g. randomization, transfer learning) | [icon] | [icon] | [icon]
25 Details of training approach, including data augmentation, hyperparameters, number of models trained | [icon] | [icon] | [icon]
26 Method of selecting the final model | [icon] | [icon] | [icon]
27 Ensembling techniques, if applicable | [icon] | [icon] | N/A
28 Metrics of model performance | [icon] | [icon] | [icon]
29 Statistical measures of significance and uncertainty (e.g. confidence intervals) | [icon] | [icon] | [icon]
30 Robustness or sensitivity analysis | [icon] | [icon] | [icon]
31 Methods for explainability or interpretability (e.g. saliency maps) and how they were validated | N/A | [icon] | [icon]
32 Validation or testing on external data | [icon] | [icon] | [icon]
33 Flow of participants or cases, using a diagram to indicate inclusion and exclusion | [icon] | [icon] | [icon]
34 Demographic and clinical characteristics of cases in each partition | [icon] | [icon] | [icon]
35 Performance metrics for optimal model(s) on all data partitions | [icon] | [icon] | [icon]
36 Estimates of diagnostic accuracy and their precision (such as 95% confidence intervals) | [icon] | [icon] | [icon]
37 Failure analysis of incorrectly classified cases | [icon] | [icon] | [icon]
38 Study limitations, including potential bias, statistical uncertainty, and generalizability | [icon] | [icon] | [icon]
39 Implications for practice, including the intended use and/or clinical role | [icon] | [icon] | [icon]
40 Registration number and name of registry | [icon] | [icon] | [icon]
41 Where the full study protocol can be accessed | [icon] | [icon] | [icon]
42 Sources of funding and other support; role of funders | [icon] | [icon] | [icon]
Total | 40 | 30 | 33

Satisfactory fields are indicated by a tick and unsatisfactory fields by a cross. Each total is the maximum possible score (42) minus the number of unsatisfactory fields; "N/A" fields are not deducted.
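The scoring rule in the note above can be sketched as a small function. This is a minimal illustration, assuming the 42-item CLAIM checklist and encoding each field as "tick", "cross" or "na"; the function name `claim_total` and these rating labels are hypothetical and are not part of the CLAIM tool itself.

```python
from collections import Counter

CLAIM_ITEMS = 42  # number of items in the CLAIM checklist


def claim_total(ratings: list[str], n_items: int = CLAIM_ITEMS) -> int:
    """Return the maximum score minus the number of unsatisfactory ("cross") fields.

    Fields rated "na" are not deducted, so they leave the score unchanged.
    """
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(ratings)}")
    counts = Counter(ratings)
    unknown = set(counts) - {"tick", "cross", "na"}
    if unknown:
        raise ValueError(f"unrecognised ratings: {sorted(unknown)}")
    return n_items - counts["cross"]


# Hypothetical ratings: two unsatisfactory fields (as reported for Swift et al)
# plus one N/A field, which is not deducted.
ratings = ["tick"] * 39 + ["na"] + ["cross"] * 2
print(claim_total(ratings))  # 40
```

Treating N/A as "no deduction" means a study is never penalised for items that do not apply to its design, which is why the totals can be compared directly across the three studies.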

No study received a maximal CLAIM score; however, all three studies were conducted before the CLAIM tool was published. Bearing this in mind, Swift et al10 achieved a highly satisfactory 40/42, failing only to state how data were deidentified and to declare registration in a public registry. The other papers, Lungu et al12 and Chettrit et al11, received scores of 33 and 30, respectively. Both scores were reduced by poor performance in the final "other details" section of the checklist, which is not AI specific: neither paper stated a registration number, registry or study protocol. Furthermore, Chettrit et al11 made no explicit statement regarding funding. The checklist items missed by the lowest-scoring study, Chettrit et al11, related principally to low transparency with respect to patients and data. The second-lowest-scoring paper, Lungu et al12, missed checklist items related to data partitions and certain evaluation steps.

Discussion

To our knowledge, this is the first study to systematically review and critically appraise studies reporting on the use of AI for the evaluation of PAH. Three original studies were identified from 457 unique records returned by searches of three biomedical databases.

Both identified studies describing automated diagnostic models based on MRI demonstrated diagnostic accuracy sufficient to reduce the need for RHC in PAH assessment. CMR-based AI may help to exclude PAH, raising the threshold for proceeding to RHC in at-risk populations. Once a diagnosis has been established, CMR may be used for non-invasive serial follow-up,17 replacing serial RHCs to guide therapeutic changes. Current AI could also flag patients who need conventional work-up from routine scans, or trigger specialist review of imaging for aetiological information.17,18 Some circumstances will still warrant RHC, such as measurement of pulmonary vascular resistance, exclusion of other diagnoses or vasodilator testing. However, advances in AI, along with innovations such as 4D-MRI for calculating pulmonary vascular resistance,19 are likely to further reduce the need for invasive assessment in future. Should these advances allow entirely non-invasive diagnosis and assessment, they would confer significant efficiency savings. RHC is estimated to cost £1200 per hour20 and each procedure lasts approximately 45 min.21 In 2018/19, specialist centres in the UK managed 5955 patients,22 and because patients commonly undergo RHC as often as annually, non-invasive automated methods could save more than £5 million per year across the NHS while improving patient experience.
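The £5 million figure can be reproduced from the inputs quoted in the paragraph above. This is a back-of-envelope sketch, assuming one RHC per patient per year (the text says "as often as annually", so this is an upper-bound scenario); the constant names are illustrative only.

```python
# Inputs quoted in the text; the once-per-year frequency is an assumption.
COST_PER_HOUR_GBP = 1200        # estimated cost of RHC per hour
PROCEDURE_HOURS = 45 / 60       # ~45 min per procedure
PATIENTS = 5955                 # patients managed by UK specialist centres, 2018/19
RHC_PER_PATIENT_PER_YEAR = 1    # assumption: one RHC per patient per year

cost_per_procedure = COST_PER_HOUR_GBP * PROCEDURE_HOURS
annual_cost = cost_per_procedure * PATIENTS * RHC_PER_PATIENT_PER_YEAR

print(f"£{cost_per_procedure:,.0f} per procedure")  # £900 per procedure
print(f"£{annual_cost:,.0f} per year")              # £5,359,500 per year
```

Under these assumptions each procedure costs about £900 and the annual total is roughly £5.4 million; fewer than one RHC per patient per year would scale the saving down proportionally.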

A key difference between the two most successful models is that the AI method reported by Swift et al10 does not require manual segmentation, only the easier and quicker labelling of three landmark points on short-axis images (the inferior and superior hinge points and the inferolateral inflection point of the right-ventricular free wall). The model described by Lungu et al12 requires more complicated and time-consuming manual segmentation. The linear machine-learning methods adopted by Swift et al10 give a more interpretable output, because the individual features chosen to justify a diagnosis are visualised on the images. Moreover, Lungu et al12 was published before the mean pulmonary arterial pressure threshold for diagnosing PAH was revised from 25 to 20 mm Hg. Although this threshold could likely be altered without harming performance, it is reasonable to suggest that the model developed by Swift et al10, which uses the new threshold, more faithfully represents the reality of current PAH diagnosis. Future developments could allow deep-learning methods to replace manual landmarking entirely, providing computer-led PAH diagnosis that requires no human input.

The Clinical Radiology UK Workforce Census 2019 found that more than two in three radiology departments report that they do not have enough radiologists to provide safe, effective care, and that demand for complex MRI and CT imaging is growing faster than the radiology workforce. The report found that the NHS spent £108 million on outsourcing scans in 2019 alone, and estimated that the shortfall of radiologists would reach 43% (3,331 radiologists) by 2024.23 Interpretation of CMR by a radiologist typically takes 20 min per subject24; the automated approach described by Swift et al10 reaches a diagnosis in less than 1% of this time. Because it automates segmentation from just three manually labelled landmarks, the skill level required for diagnosis falls from expert to near-entry level. This illustrates a key area in which AI implementation can help overcome these shortages, allowing senior, highly skilled staff to spend their time where it better serves patient flow and management.
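The "less than 1%" comparison in the paragraph above can be made concrete: 1% of a 20-min read is 12 s. A trivial sketch using only the two figures quoted in the text; the constant names are illustrative.

```python
READ_TIME_MIN = 20        # typical radiologist CMR read time per subject (from the text)
PERCENT_OF_READ = 1       # automated approach takes "less than 1%" of this time

# Upper bound on the automated time per subject, in seconds.
upper_bound_seconds = READ_TIME_MIN * 60 * PERCENT_OF_READ / 100
print(f"Automated diagnosis: under {upper_bound_seconds:.0f} s per subject")  # under 12 s
```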

The failure of Chettrit et al11 to use the gold standard of RHC, together with the return of only three studies, highlights the urgent need for more research in this area. Given the relatively weaker studies in non-MRI modalities, future work should focus on developing new diagnostic models, especially in modalities that are undertaken earlier in existing diagnostic pathways. CT is a particular priority: it is used more widely, with the UK undertaking 54% more CT than MRI scans annually (2019/20: 5.9 vs 3.8 million)25; it is the preferred method of assessing the lung parenchyma in connective-tissue disorders associated with PAH (such as systemic sclerosis)26; and CT pulmonary angiography is the investigation recommended in the Royal College of Radiologists' guidelines for patients with suspected pulmonary embolus or pre-existing pulmonary disease.27 Automated CT diagnosis of PAH would therefore, in conjunction with existing MRI models, broaden the ability to screen routine imaging for PAH, including scans in which PAH is not being considered, which is crucial for achieving earlier detection.

Further research into the performance of these AI models should consider not only generalisability across larger patient populations, but also the effects of MRI equipment and acquisition protocol. Both CMR studies originated from the same specialist centre and used the same equipment (GE HDx whole-body scanner, 1.5 T, eight-channel cardiac coil10,12) and acquisition approach. Concerns have been raised about applying radiology AI where equipment and scanning protocols diverge from those encountered in the training dataset, as this may degrade AI performance.28 Further research is required to validate the performance demonstrated in these studies and to quantify the robustness of the models to such input variation before widespread clinical adoption can be achieved.

Limitations

Heterogeneity in modality, AI methods and sources of bias prohibited pooled analysis. The low number of eligible studies and their relatively small study populations limit the certainty of the conclusions that can be drawn. Eleven conference abstracts were excluded at full-text review without contacting their authors, so relevant studies may in theory have been missed, although any such omissions are likely to be few in number. Data extraction was duplicated for only two of the three included studies (66.7%), a deviation from gold-standard systematic review methods; however, agreement between duplicated extractions was good, and any uncertainties were raised with the senior author (JCLR). As such, these limitations are unlikely to have significantly affected the results and findings of this review. Although few eligible studies were identified, the area is likely to expand in the short to medium term, and this work may provide a comprehensive baseline summary of the literature. In theory, the paucity of studies could reflect a shortage of reliable AI solutions combined with a bias against reporting negative results, but this has not been shown at present. Where applicable, future publications evaluating the same vendor's solution should cross-reference one another.

Conclusion

Systematic review of the literature identified a paucity of studies of AI applied to cross-sectional imaging for PAH diagnosis. Two MRI studies show promise and may allow a reduction in RHC. However, to automate PAH diagnosis most impactfully, further research is required and successful implementation across larger patient sets must be achieved. This goal would be assisted by AI models applicable to more modalities. This review did not identify any AI applicable to CT that is sufficient to diagnose PAH, but it did identify work that may act as a precursor to future models.

Key points

  1. AI for the diagnosis of PAH is emerging, and models applied to cardiac magnetic resonance imaging have been shown to achieve AUC = 0.90 (95% CI: 0.85–0.93).

  2. The diagnosis of idiopathic pulmonary arterial hypertension can be achieved using AI and cardiac magnetic resonance imaging with AUC = 0.97 (95% CI: 0.89–1.0).

  3. The one evaluated study of AI applied to CT imaging indicates that further improvement is required: its model showed lower diagnostic accuracy (64.6% sensitivity and 97.0% specificity), and the study carried a potential risk of bias (unclear reporting of patients).
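The sensitivity and specificity values quoted in these key points follow the standard definitions from a 2×2 confusion matrix. The sketch below uses hypothetical counts chosen only for illustration (they are not data from any included study) to show how a model can pair high specificity with modest sensitivity; the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of true PAH cases that the model flags (true-positive rate)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of non-PAH cases that the model correctly clears (true-negative rate)."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only: 100 patients with PAH, 100 without.
tp, fn = 60, 40
tn, fp = 95, 5

print(f"sensitivity = {sensitivity(tp, fn):.1%}")  # sensitivity = 60.0%
print(f"specificity = {specificity(tn, fp):.1%}")  # specificity = 95.0%
```

With these counts, 40 of the 100 patients with PAH would be missed, which is why high specificity alone does not make a model suitable for excluding PAH before RHC.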

Footnotes

Acknowledgements: Mr Jason Ovens, Head of Library Services, Royal United Hospitals Bath NHS Foundation Trust, for assistance with devising and performing the literature searches.

Contributors: Declarations by authors of financial activities outside the submitted work are as follows. JP: Boehringer Ingelheim (personal fees); Actelion Pharmaceuticals (grant, personal fees, attendance at educational meetings); Sojournix Pharma (personal fees). SB: Boehringer Ingelheim (grant, personal fees, attendance at educational meetings). JCLR: NHSX (consultancy/personal fees); Sanofi (speaker's fees).

Data availability: The data underlying this article will be shared on reasonable request to the corresponding author.

Contributor Information

Conor J Hardacre, Email: ch17008@bristol.ac.uk.

Joseph A Robertshaw, Email: ch17008@bristol.ac.uk.

Shaney L Barratt, Email: shaney.barratt@nbt.nhs.uk.

Hannah L Adams, Email: h.l.adams@doctors.org.uk.

Robert V MacKenzie Ross, Email: rob.mackenzieross@nhs.net.

Graham RE Robinson, Email: grobinson1@nhs.net.

Jay Suntharalingam, Email: jay.suntharalingam@nhs.net.

John D Pauling, Email: johnpauling@nhs.net.

Jonathan Carl Luis Rodrigues, Email: j.rodrigues1@nhs.net.

REFERENCES

  • 1. Peacock AJ, Murphy NF, McMurray JJV, Caballero L, Stewart S. An epidemiological study of pulmonary arterial hypertension. Eur Respir J 2007; 30: 104–9. doi: 10.1183/09031936.00092306
  • 2. Condon DF, Nickel NP, Anderson R, Mirza S, de Jesus Perez VA. The 6th World Symposium on pulmonary hypertension: what's old is new. F1000Res 2019; 8: 888. doi: 10.12688/f1000research.18811.1
  • 3. Simonneau G, Montani D, Celermajer DS, Denton CP, Gatzoulis MA, Krowka M, et al. Haemodynamic definitions and updated clinical classification of pulmonary hypertension. Eur Respir J 2019; 53: 1801913. doi: 10.1183/13993003.01913-2018
  • 4. Brown LM, Chen H, Halpern S, Taichman D, McGoon MD, Farber HW, et al. Delay in recognition of pulmonary arterial hypertension: factors identified from the REVEAL registry. Chest 2011; 140: 19–26. doi: 10.1378/chest.10-1166
  • 5. Frost A, Badesch D, Gibbs JSR, Gopalan D, Khanna D, Manes A, et al. Diagnosis of pulmonary hypertension. Eur Respir J 2019; 53: 1801904. doi: 10.1183/13993003.01904-2018
  • 6. McLaughlin VV, Presberg KW, Doyle RL, Abman SH, McCrory DC, Fortin T, et al. Prognosis of pulmonary arterial hypertension: ACCP evidence-based clinical practice guidelines. Chest 2004; 126: 78S–92S. doi: 10.1378/chest.126.1_suppl.78S
  • 7. McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, et al. International evaluation of an AI system for breast cancer screening. Nature 2020; 577: 89–94. doi: 10.1038/s41586-019-1799-6
  • 8. Bhaskaranand M, Ramachandra C, Bhat S, Cuadros J, Nittala MG, Sadda SR, et al. The value of automated diabetic retinopathy screening with the EyeArt system: a study of more than 100,000 consecutive encounters from people with diabetes. Diabetes Technol Ther 2019; 21: 635–43. doi: 10.1089/dia.2019.0164
  • 9. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–74. doi: 10.2307/2529310
  • 10. Swift AJ, Lu H, Uthoff J, Garg P, Cogliano M, Taylor J, et al. A machine learning cardiac magnetic resonance approach to extract disease features and automate pulmonary arterial hypertension diagnosis. Eur Heart J Cardiovasc Imaging 2021; 22: 236–45. doi: 10.1093/ehjci/jeaa001
  • 11. Chettrit D, Amitai OB, Tamir I, Bar A, Elnekave E. PHT-bot: deep-learning based system for automatic risk stratification of COPD patients based upon signs of pulmonary hypertension. In: Mori K, Hahn HK, eds. Medical imaging 2019: computer-aided diagnosis. USA: SPIE; 2019. doi: 10.1117/12.2512469
  • 12. Lungu A, Swift AJ, Capener D, Kiely D, Hose R, Wild JM. Diagnosis of pulmonary hypertension from magnetic resonance imaging-based computational models and decision tree analysis. Pulm Circ 2016; 6: 181–90. doi: 10.1086/686020
  • 13. Campbell M, McKenzie JE, Sowden A, Katikireddi SV, Brennan SE, Ellis S, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ 2020; 368: l6890. doi: 10.1136/bmj.l6890
  • 14. Whiting PF, Rutjes AWS, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011; 155: 529–36. doi: 10.7326/0003-4819-155-8-201110180-00009
  • 15. Mongan J, Moy L, Kahn CE Jr. Checklist for artificial intelligence in medical imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell 2020; 2: e200029. doi: 10.1148/ryai.2020200029
  • 16. Karakus G, Kammerlander AA, Aschauer S, Marzluf BA, Zotter-Tufaro C, Bachmann A, et al. Pulmonary artery to aorta ratio for the detection of pulmonary hypertension: cardiovascular magnetic resonance and invasive hemodynamics in heart failure with preserved ejection fraction. J Cardiovasc Magn Reson 2015; 17: 79. doi: 10.1186/s12968-015-0184-3
  • 17. Rich JD, Rich S. Clinical diagnosis of pulmonary hypertension. Circulation 2014; 130: 1820–30. doi: 10.1161/CIRCULATIONAHA.114.006971
  • 18. Goerne H, Batra K, Rajiah P. Imaging of pulmonary hypertension: an update. Cardiovasc Diagn Ther 2018; 8: 279–96. doi: 10.21037/cdt.2018.01.10
  • 19. Kheyfets VO, Schafer M, Podgorski CA, Schroeder JD, Browning J, Hertzberg J, et al. 4D magnetic resonance flow imaging for estimating pulmonary vascular resistance in pulmonary hypertension. J Magn Reson Imaging 2016; 44: 914–22. doi: 10.1002/jmri.25251
  • 20. NHS Institute for Innovation and Improvement. The productive operating theatre: improving quality and efficiency in the operating theatre. 2020. Available from: https://www.england.nhs.uk/improvement-hub/publication/the-productive-operating-theatre/.
  • 21. Glowny MG, Resnic FS. What to expect during cardiac catheterization. Circulation 2012; 125: e363–4. doi: 10.1161/CIRCULATIONAHA.111.025916
  • 22. NHS Digital. National audit of pulmonary hypertension Great Britain, 2018-19: 10th annual report. In: Gibbs S, ed. Leeds: NHS; 2019. https://digital.nhs.uk/data-and-information/publications/statistical/national-pulmonary-hypertension-audit/2019.
  • 23. The Royal College of Radiologists. Clinical radiology UK workforce census 2019 report. London: The Royal College of Radiologists; 2020. pp. 1–61. https://www.rcr.ac.uk/publication/clinical-radiology-uk-workforce-census-2019-report.
  • 24. Bai W, Sinclair M, Tarroni G, Oktay O, Rajchl M, Vaillant G, et al. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. J Cardiovasc Magn Reson 2018; 20: 20. doi: 10.1186/s12968-018-0471-x
  • 25. NHS England and NHS Improvement. Diagnostic imaging dataset statistical release 2019/20. In: Dixon S, ed. Diagnostic imaging dataset annual statistical release. 1.0. Leeds: NHS England and NHS Improvement; 2020. p. 7. https://www.england.nhs.uk/statistics/statistical-work-areas/diagnostic-imaging-dataset/diagnostic-imaging-dataset-2019-20-data/.
  • 26. Vij R, Strek ME. Diagnosis and treatment of connective tissue disease-associated interstitial lung disease. Chest 2013; 143: 814–24. doi: 10.1378/chest.12-0741
  • 27. The Royal College of Radiologists. iRefer: making best use of a department of clinical radiology, guidelines for doctors. 7th ed. London: The Royal College of Radiologists; 2012.
  • 28. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer 2018; 18: 500–10. doi: 10.1038/s41568-018-0016-5

Supplementary Materials

Supplementary Material 1.

Articles from The British Journal of Radiology are provided here courtesy of Oxford University Press
