Abstract
Objective:
To assess the diagnostic accuracy and clinical impact of automated artificial intelligence (AI) measurement of thoracic aorta diameter on routine chest CT.
Methods:
This single-centre retrospective study included three cohorts. In the first, 210 consecutive ECG-gated CT aorta scans (mean age 75 ± 13 years) underwent automated analysis (AI-Rad Companion Chest CT, Siemens), and aortic diameter measurements were compared with a reference standard of specialist cardiothoracic radiologists. A repeated measures analysis tested reporting consistency in a second cohort (29 patients, mean age 61 ± 17 years) with immediately sequential pre-contrast and contrast CT aorta acquisitions. Potential clinical impact was assessed in a third cohort of 197 routine chest CTs (mean age 66 ± 15 years).
Results:
AI analysis produced a full report in 387/436 (89%) scans and at least a partial report in 421/436 (97%). Agreement between manual and AI measurements was good to excellent (ICC 0.76–0.92). On repeated measures analysis, AI reliability was moderate to good (ICC 0.57–0.88) and lower than that of the expert reader (ICC 0.78–0.96). On ECG-gated CTs, AI measurements crossed the maximum acceptable limits of agreement (>5 mm) at the aortic root. Against the expert reader, the AI detected aortic dilatation with a per-patient specificity of 99% and sensitivity of 77%. On routine thoracic imaging, it newly identified aortic dilatation in 27% of patients.
Conclusion:
AI has good agreement with expert readers, particularly at the mid-ascending aorta, and detects dilated aortas with high specificity but low sensitivity. On non-dedicated chest CTs, it newly identified aortic dilatation in a substantial proportion of patients.
Advances in knowledge:
An AI tool may improve the detection of previously unknown thoracic aorta dilatation on chest CTs vs current routine reporting.
Introduction
Aortic aneurysm is pathological dilatation of the aorta. It is defined as a diameter >50% greater than the normal expected width for a given aortic segment, with a tendency to expand further. 1 Normal aortic root measurements vary with sex, age and anthropometrics. However, thoracic aorta dilatation has previously been defined as ≥40 mm in the ascending aorta and ≥30 mm in the descending aorta, including in European guidelines. 2–4
Aortic aneurysms enlarge by approximately 0.07 cm/year, although this varies with location, demographics and aetiology. 5 The risk of fatal complications increases with size. 6 The reported incidence is 5.3 cases per 100,000 population per year, although its accuracy is debated given the indolent nature of the disease. 7
Most thoracic aortic aneurysms remain asymptomatic until they present with life-threatening complications. In patients with an aortic diameter greater than 6 cm, the annual risk of complications, including rupture and dissection, is 14.1%. 8 In-hospital mortality for type A dissection is reported at 57% without surgery and remains approximately 23% with emergency surgery. 9 Mortality from dissection is 50% at 30 days. 10 Additionally, patients with thoracic aortic aneurysms are at increased risk of other major adverse cardiovascular events (MACE), which is thought to be related to shared risk factors (e.g. smoking, hypertension). 2
There are no known preventative treatments against developing thoracic aneurysm, so early diagnosis and management are key to reducing mortality. 11 Because these aneurysms are indolent and grow slowly, early diagnosis often relies on an incidental finding, or on screening where there is a strong family history. 11,12
In the UK, around half a million CT chest and/or abdomen studies were performed in 2020–2021. 13 In routine clinical practice, the aorta is very rarely measured manually on CT scans that are not dedicated to the aorta, and double oblique measurements are taken even less often. This is particularly true when a general on-call radiologist reports the scan, because both techniques require a degree of training and can be time consuming. Artificial intelligence (AI) analysis could help address this. AI tools now exist that automatically segment the aorta on a variety of CT chest acquisitions and measure vessel diameter at recognised anatomical levels. 12,14,15 Although such a tool is commercially available, the authors felt that local validation was warranted before reaching a final judgement.
The aims of this validation study were to: (1) compare aortic diameter measurements on CT between an automated AI assessment and that of specialist cardiothoracic radiology consultants; (2) determine the consistency of the automated measurements, and (3) assess the potential clinical impact of automated aortic segmentation on routine CT chest studies.
Methods and materials
Patient population
This was a retrospective study of CT chest imaging at our institution, Royal United Hospital Bath, an NHS trust covering a population of 500,000. Three separate patient cohorts were included in the study to address the separate aims of measuring diagnostic accuracy, consistency, and clinical impact. Recorded demographic data included patient age and sex.
Inclusion criteria for the first cohort, which evaluated the diagnostic accuracy of the AI tool, were consecutive ECG-gated CT aorta studies reported by an expert reader between February 2018 and August 2021. This study period was selected to ensure contemporaneous imaging protocols and to make use of existing expert reports. Exclusion criteria were: (1) artefact reported by the expert reader as significantly impairing accurate measurement (excluded at the relevant anatomical level if focal, or completely if severe), (2) aortic valve replacement (AVR) or transcatheter aortic valve implantation (TAVI), and (3) a reported dissection. Both AVR/TAVI and dissections were excluded at the anatomical level of the abnormality rather than excluding the entire study.
For the second cohort, which evaluated the reliability of the AI tool, the inclusion criterion was a CT angiogram aorta acquisition performed between February 2019 and July 2021 that included both a non-ECG-gated non-contrast phase and an ECG-gated contrast phase. Exclusion criteria were: (1) significant artefact affecting accurate measurement (excluded at the relevant anatomical level), and (2) failure of the AI analysis on at least one phase of the acquisition. To test the AI tool’s performance on non-ECG-gated non-contrast imaging, its accuracy was additionally compared with that of an expert reader on the pre-contrast acquisition of these studies.
The final cohort assessed the potential clinical impact of the AI tool. Inclusion criteria were sequential thoracic CT studies (undertaken April–May 2019), collected until ≥48 studies had been included for each of CT thorax, CT pulmonary angiogram, high-resolution CT (HRCT) and acute ECG-gated CT aorta. These study types were selected because the AI tool may have its greatest potential clinical utility in scans not routinely reported by expert cardiothoracic readers. Studies failing AI analysis were excluded.
CT acquisition
All CT scans were obtained with suspended respiration from lung apices to bases on either Siemens Definition Edge or Drive CT scanners (Siemens Healthineers, Erlangen, Germany). Routine acquisition parameters were used, with automated tube voltage and current settings. Standard CT protocols were followed and are outlined in the supplementary materials.
AI tool
The thinnest available axial reconstructed images were sent for fully automated analysis by AI-based post-processing software (AI-Rad Companion Chest CT, Siemens).
Siemens conducted the training and validation of the software, so the following details are external to our study. The software was trained on over 1000 chest CT scans, including contrast and non-contrast scans, and was further validated on 193 different chest CT data sets. 16 None of these data sets came from our institution.
Deep reinforcement learning is used in combination with scale-space analysis to detect the aortic landmarks automatically. 17 The aortic root is defined as the region of interest (ROI) for the segmentation algorithm. 16 Segmentation from the ROI is performed by an adversarial deep image-to-image network with a symmetric convolutional encoder–decoder architecture. 18 The front end is a convolutional encoder–decoder network and the back end is a deep supervision network. The blocks inside the network consist of convolutional and upscaling layers. 16 A centreline model generates the aortic centreline, which is combined with the aortic landmarks to identify a measurement plane at each anatomical location. In each measurement plane, multiple diameters are computed and the maximum in-plane diameter is reported. 16 The output includes an image of the aorta alongside the measurements so that the results are easy to review. The accompanying table colour-codes measurements in a traffic-light fashion to highlight abnormalities to the interpreter, and omits anatomical levels where measurement failed. Where the AI tool cannot analyse a scan at any anatomical level, it returns a report stating “aortic landmarks not found”.
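The maximum in-plane diameter step can be illustrated with a short sketch. Siemens does not describe its measurement code beyond the summary above, so the example is only a hypothetical illustration. It assumes the segmented cross-section is already available as contour points (in mm) in the measurement plane. It takes the caliper width along many evenly spaced directions and reports the largest. The function name `max_in_plane_diameter` and the 180-direction sampling are assumptions, not features of AI-Rad Companion.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (x, y) in mm within the measurement plane


def max_in_plane_diameter(contour: Iterable[Point], n_directions: int = 180) -> float:
    """Largest caliper width (mm) of a closed cross-sectional contour.

    For each of n_directions evenly spaced directions over 180 degrees, every
    contour point is projected onto that direction; the spread of the
    projections is the diameter in that direction. The maximum is returned.
    """
    pts = list(contour)
    if len(pts) < 2:
        raise ValueError("need at least two contour points")
    best = 0.0
    for i in range(n_directions):
        theta = math.pi * i / n_directions  # 0 to <180 degrees suffices by symmetry
        ux, uy = math.cos(theta), math.sin(theta)
        projections = [x * ux + y * uy for x, y in pts]
        best = max(best, max(projections) - min(projections))
    return best


# Toy cross-section: ellipse with semi-axes 20 mm and 15 mm, sampled at 360 points
ellipse = [(20 * math.cos(2 * math.pi * j / 360), 15 * math.sin(2 * math.pi * j / 360))
           for j in range(360)]
print(round(max_in_plane_diameter(ellipse), 1))  # 40.0
```

For this elliptical toy section, the function returns the 40 mm major axis. A single measurement in a fixed direction could instead return any value between 30 and 40 mm.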
We recorded how often the AI analysis failed completely, completed but produced a random output, or completed with inaccurate segmentation. We also recorded the frequency of full reports and of partial reports (segmentation considered visually appropriate and >1 aortic measurement provided).
Diagnostic accuracy
Dedicated ECG-gated CT aorta acquisitions reported by three expert cardiothoracic radiologists were considered the most appropriate reference standard for AI segmentation of the aorta, and the expert read was treated as the ground truth. Each expert (a cardiothoracic fellowship-trained consultant radiologist with >10 years’ experience) reported scans from multiplanar reconstructions in a double oblique view generated using Syngo.via (Siemens Healthineers, Germany). Measurements were taken at widely reported anatomical levels of the thoracic aorta: 3 the sinus of Valsalva (SoV, measured cusp-to-cusp), sinotubular junction (STJ), mid-ascending aorta, mid arch, isthmus, mid-descending aorta and hiatus (level of the diaphragm).
For all included studies, expert reader measurements (using the largest reported diameter) and AI measurements were compared at each anatomical landmark. Intraclass correlation was used to measure reliability between expert and AI readings. A single-rater, absolute-agreement, two-way mixed-effects model was used to define agreement between the AI and the manual reporters.
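The example below makes the agreement statistic concrete. It computes the single-measures, absolute-agreement ICC (ICC(A,1) in McGraw and Wong's notation) from a subjects × raters table. It is an illustration, not the SPSS routine used in this study. It returns only the point estimate, with no confidence interval, and it assumes a complete table with no missing levels. It also relies on the two-way mixed and two-way random absolute-agreement models sharing the same point-estimate formula. The function name `icc_a1` is hypothetical.

```python
from typing import Sequence


def icc_a1(ratings: Sequence[Sequence[float]]) -> float:
    """Single-measures, absolute-agreement ICC (McGraw and Wong ICC(A,1)).

    ratings[i][j] is subject i measured by rater j, e.g. column 0 = expert
    reader and column 1 = AI, both in mm. Requires a complete n x k table.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between-subject
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between-rater (systematic bias)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy data: the second reader is always 1 mm higher than the first
print(icc_a1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]))
```

Absolute agreement penalises a constant offset between readers. In the toy table, the readers are perfectly correlated, yet the ICC is 2/3.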
The sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the AI tool for detecting a dilated aorta were calculated against the gold-standard expert reader. These were calculated separately for the ascending and descending aorta (dilatation defined as ≥40 mm in the ascending aorta and ≥30 mm in the descending aorta 2–4 ) and for the thoracic aorta as a whole on a per-patient basis.
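The per-patient classification and the four accuracy measures can be sketched as follows. Two choices here are assumptions made for illustration. First, levels are grouped into 'ascending' and 'descending' regions, with the arch left out. Second, the intervals are Wilson score intervals; the confidence intervals in the tables may have been derived by a different method. All function and constant names are hypothetical, and the toy measurements are invented.

```python
import math
from typing import Dict, List, Mapping, Tuple

# Hypothetical grouping of reported levels into regions (the arch is omitted).
ASCENDING_LEVELS = ("SoV", "STJ", "MidAsc")
DESCENDING_LEVELS = ("Isthmus", "MidDesc", "Hiatus")


def is_dilated(levels_mm: Mapping[str, float],
               asc_threshold: float = 40.0,
               desc_threshold: float = 30.0) -> bool:
    """Per-patient call: dilated if any ascending level >= 40 mm or any
    descending level >= 30 mm. Levels missing from the mapping (failed or
    excluded measurements) are skipped."""
    asc = any(levels_mm[lvl] >= asc_threshold
              for lvl in ASCENDING_LEVELS if lvl in levels_mm)
    desc = any(levels_mm[lvl] >= desc_threshold
               for lvl in DESCENDING_LEVELS if lvl in levels_mm)
    return asc or desc


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval (lower, upper) for a binomial proportion."""
    if total == 0:
        return (math.nan, math.nan)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def accuracy_measures(reference: List[bool], index_test: List[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV of the index test (AI)
    against the reference standard (expert reader)."""
    pairs = list(zip(reference, index_test))
    tp = sum(1 for r, t in pairs if r and t)
    fn = sum(1 for r, t in pairs if r and not t)
    tn = sum(1 for r, t in pairs if not r and not t)
    fp = sum(1 for r, t in pairs if not r and t)

    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy example (invented values, not study data)
expert = [{"MidAsc": 41.0}, {"MidAsc": 36.0, "Isthmus": 31.0},
          {"MidAsc": 34.0}, {"SoV": 40.0}]
ai = [{"MidAsc": 40.5}, {"MidAsc": 35.0, "Isthmus": 29.0},
      {"MidAsc": 33.0}, {"SoV": 38.5}]
ref_calls = [is_dilated(p) for p in expert]   # [True, True, False, True]
ai_calls = [is_dilated(p) for p in ai]        # [True, False, False, False]
print(accuracy_measures(ref_calls, ai_calls))
```

The toy case shows how a measurement just below a threshold, such as 38.5 mm against 40 mm, becomes a false negative. A small absolute error can therefore lower sensitivity.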
Repeated measures
Acute CT aorta studies reported by expert readers provided an opportunity for a repeated measures assessment across separate phases of the same acquisition protocol, using routinely acquired clinical imaging. For 29 patients, the non-ECG-gated non-contrast and ECG-gated contrast phases of the same CT angiogram acquisition were sent for AI analysis. The same expert cardiothoracic reader also measured both phases on two separate occasions >4 weeks apart. The repeated measures performance of the AI tool was then compared with that of the expert reader. The diagnostic accuracy of the AI tool on non-ECG-gated non-contrast imaging was also assessed against the expert read, using the same analysis as described under Diagnostic accuracy above.
Clinical impact
Consecutive chest CTs reported at the time of the scan (April–May 2021) by unselected general consultant radiologists were reviewed and sent for AI analysis. Studies were selected to represent a real-life cohort with contemporary imaging protocols, reflecting the current practice of non-expert thoracic readers. The cohort comprised 50 CT thorax, 50 CT pulmonary angiograms (CTPAs), 49 high-resolution CT chests (HRCTs) and 48 ECG-gated CT aortogram studies (acute scans reported by non-cardiothoracic radiologists).
The original report by the general radiologist and the electronic clinical records for each patient were reviewed for previously documented aortic dilatation. The proportion of patients in whom AI analysis would have newly identified a dilated thoracic aorta was recorded.
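Each patient contributes a paired call: dilatation documented in the original report or records, and dilatation on AI analysis. The change in diagnosis is therefore a paired comparison, and the statistical analysis below tests it with McNemar's test. The sketch below shows the continuity-corrected form. The function name is hypothetical, and the use of a continuity correction is an assumption, because the text does not state which variant SPSS applied.

```python
import math
from typing import Tuple


def mcnemar_corrected(b: int, c: int) -> Tuple[float, float]:
    """McNemar test with continuity correction for paired binary calls.

    b: patients called dilated by AI but not in the original report/records
    c: patients called dilated in the original report/records but not by AI
    Concordant pairs do not enter the statistic.
    Returns (chi-square statistic on 1 df, two-sided p value).
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square (1 df) upper tail: P(X >= x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

As an arithmetic illustration only, consider 53 discordant pairs in one direction and none in the other. Calling `mcnemar_corrected(53, 0)` then gives a statistic of (53 - 1)^2 / 53 ≈ 51, with p far below 0.001.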
Statistical analysis
Statistical analysis was performed using SPSS v. 21 (IBM Corp., Armonk, NY). Normality was assessed by visual inspection of histograms and with the Shapiro–Wilk test. Categorical data are presented as frequency and percentage, and continuous data as mean ± standard deviation or median with interquartile range (IQR). Comparative testing was performed using paired or independent t-tests, as appropriate. Reliability was assessed with Bland–Altman plots and the intraclass correlation coefficient (ICC), with <0.5 indicating poor reliability, 0.5–0.75 moderate, 0.75–0.9 good, and >0.9 excellent reliability. 19 ICC estimates and their 95% confidence intervals were calculated using a single-measures, absolute-agreement, two-way mixed-effects model. For Bland–Altman analysis, the maximum acceptable limits of agreement were set at ±5 mm, based on a prior assessment of interobserver variability among expert readers. 20 Clinical impact was assessed as the change in diagnosis of aortic dilatation using McNemar’s test. Statistical significance was defined as a two-tailed p value ≤ 0.05.
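The Bland–Altman criterion and the ICC rating bands described above can be expressed compactly. In this sketch, differences are taken as AI minus expert. The limits of agreement are the bias ± 1.96 × the sample SD of the differences, and a level is flagged when either limit lies outside ±5 mm. How values falling exactly on the ICC band boundaries (0.75 and 0.9) are rated is an assumption. All names are hypothetical, and the code is not the SPSS procedure used in the study.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Sequence


@dataclass
class BlandAltman:
    bias: float       # mean difference, AI minus expert (mm)
    lower_loa: float  # bias - 1.96 * SD of differences
    upper_loa: float  # bias + 1.96 * SD of differences

    def exceeds(self, max_loa_mm: float = 5.0) -> bool:
        """True if either limit of agreement lies outside +/- max_loa_mm."""
        return self.lower_loa < -max_loa_mm or self.upper_loa > max_loa_mm


def bland_altman(expert: Sequence[float], ai: Sequence[float]) -> BlandAltman:
    """Bias and 95% limits of agreement for paired measurements (mm)."""
    if len(expert) != len(ai) or len(expert) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - e for e, a in zip(expert, ai)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return BlandAltman(bias, bias - 1.96 * sd, bias + 1.96 * sd)


def icc_rating(icc: float) -> str:
    """Rating bands used in this study; boundary handling is an assumption."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Toy data: one large disagreement widens the limits beyond +/- 5 mm
ba = bland_antman = bland_altman([40.0, 35.0, 30.0, 45.0], [48.0, 35.0, 30.0, 45.0])
print(ba, ba.exceeds())
```

A single large disagreement is enough to push the limits of agreement past ±5 mm, even when most paired measurements match exactly.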
Ethics
This retrospective service evaluation was approved by our institution’s Trust Audit Committee, and the requirement for formal written consent was waived in line with the Health Research Authority decision tool. 21
Results
Across all three patient cohorts, a full report including an output for all anatomical levels was produced in 387/436 (89%) scans. No analysis was possible in 14 scans (3%). One scan (<1%) failed the manual quality assurance (QA) check because spurious anatomical locations were selected. The remaining 34 (8%) produced a partial result only (i.e. ≥1 anatomical level not measured). Of the scans with partial AI reports, 30/34 (88%) lacked only the hiatus measurement. The AI tool produced a measurement at ≥1 anatomical level that passed the manual QA check in 422/436 (96.8%).
Demographics across the three study cohorts included are presented in Table 1.
Table 1.
Demographics of the three study cohorts
| Patient cohort | Age (years; mean [±SD]) | Sex (n [%] male) |
|---|---|---|
| Diagnostic accuracy (n=210) | 75 [±13] | 127 [60] |
| Repeat measures (n=30) | 63 [±17] | 20 [67] |
| Clinical impact | | |
| HRCT (n=49) | 66 [±13] | 25 [51] |
| CT chest (n=50) | 67 [±13] | 31 [62] |
| CTPA (n=50) | 63 [±17] | 27 [54] |
| CT aorta (n=48) | 67 [±15] | 25 [52] |
CTPA, CT pulmonary angiogram; HRCT, high-resolution CT; SD, standard deviation.
Diagnostic accuracy
Of the 195 included scans, 1317/1365 (96%) individual anatomical measurements were analysed after manual QA check with 48 exclusions at focal levels for dissection, incomplete analysis or inaccurate segmentation (Figure 1).
Figure 1.
Study flowchart for (a) diagnostic accuracy, (b) repeated measures analysis and (c) clinical impact. CTPA, CT pulmonary angiogram; HRCT, high-resolution CT.
Agreement between the expert reader and AI assessment of aortic dimensions on ECG-gated contrast imaging, as measured by the ICC, was at least ‘good’ at every anatomical level (Table 2). AI analysis agreed most closely with the expert reader at the mid-ascending aorta. Table 2 also shows the mean absolute and median proportionate differences in aortic measurements at each anatomical level. The mean absolute difference was largest at the SoV (2.6 ± 2.1 mm). On Bland–Altman analysis, measurements crossed the maximum acceptable limits of agreement (>5 mm) at the SoV and STJ (Figure 2). Further detailed comparisons of expert reader and AI measurements are available in Supplementary Table 1.
Table 2.
Comparison of expert reader and AI measurements at each thoracic aortic anatomical level on ECG-gated contrast imaging, showing absolute difference (mm), proportionate difference (%) and the single-measures ICC at each level with its associated rating.
| Anatomical location | Absolute difference (mm; mean [±SD]) | Proportionate difference (%; median [IQR]) | ICC (95% CI, p) | ICC rating |
|---|---|---|---|---|
| SoV | 2.6 [2.1] | 6 [3–11] | 0.78 (0.67–0.85, p<0.001) | Good |
| STJ | 1.9 [1.9] | 4 [3–8] | 0.83 (0.74–0.88, p<0.001) | Good |
| Mid Asc | 1.3 [1.3] | 3 [0–6] | 0.92 (0.89–0.94, p<0.001) | Excellent |
| Mid Arch | 1.7 [1.7] | 4 [3–8] | 0.76 (0.70–0.82, p<0.001) | Good |
| Isthmus | 1.7 [1.7] | 4 [3–8] | 0.81 (0.76–0.86, p<0.001) | Good |
| Mid Desc | 1.3 [1.4] | 4 [0–8] | 0.88 (0.78–0.92, p<0.001) | Good |
| Hiatus | 1.4 [1.6] | 4 [0–8] | 0.81 (0.76–0.86, p<0.001) | Good |
CI, confidence interval; ICC, intraclass correlation coefficient; IQR, interquartile range; SD, standard deviation; STJ, sinotubular junction; SoV, sinus of Valsalva.
Figure 2.

Bland–Altman analysis comparing expert reader with AI analysis at each anatomical level for ECG-gated contrast imaging. MidArch, mid aortic arch; MidAsc, mid-ascending aorta; MidDesc, mid-descending aorta; SoV, sinus of Valsalva; STJ, sinotubular junction.
Across the 195 studies analysed, relative to the expert reader reference standard, the AI analysis would have missed 14 patients (7%) with a diameter ≥40 mm in the ascending aorta (10 at the level of the SoV, 4 at the mid-ascending aorta and 2 at both). It would also have missed 10 patients with a diameter ≥30 mm in the descending aorta (five at the level of the isthmus, four at the mid-descending aorta and one at both). Of the patients with a dilated ascending aorta missed by the AI tool, 6/14 (43%) had a maximum diameter of 40 mm reported by the expert reader. The largest diameters reported by the expert reader but missed by the AI tool were 43 mm in the ascending aorta and 36 mm in the descending aorta (at the level of the isthmus). In these scans, the mean difference between AI and expert reader was 0.9 mm. Additionally, 5/14 of the patients with a missed dilated ascending aorta (including both measuring 43 mm) would have been flagged as abnormal based on descending aorta measurements ≥30 mm. The sensitivity, specificity, PPV and NPV of the AI tool for detecting aortic dilatation are presented in Table 3.
Table 3.
The AI-Rad Companion sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for detecting aortic dilatation versus expert cardiothoracic reader on ECG-gated contrast and non-ECG-gated non-contrast imaging.
| Imaging | Aortic region | Sensitivity (%, 95% CI) | Specificity (%, 95% CI) | PPV (%, 95% CI) | NPV (%, 95% CI) |
|---|---|---|---|---|---|
| ECG-gated contrast | Thoracic aorta | 77 (66–85) | 99 (95–100) | 98 (90–100) | 86 (80–90) |
| | Ascending aorta | 74 (62–84) | 98 (94–100) | 96 (87–99) | 87 (82–91) |
| | Descending aorta | 68 (66–83) | 98 (94–99) | 85 (68–94) | 94 (90–96) |
| Non-ECG-gated non-contrast | Thoracic aorta | 100 (63–100) | 77 (50–93) | 67 (46–83) | 100 (–) |
| | Ascending aorta | 90 (56–100) | 92.3 (64.0–99.8) | 90 (58–98) | 92 (65–99) |
| | Descending aorta | 100 (59–100) | 59 (33–82) | 50 (36–64) | 100 (–) |
NPV, negative-predictive value; PPV, positive-predictive value.
Repeated measures
After exclusions (Figure 1), 25 patients had both a non-ECG-gated pre-contrast thorax acquisition and an ECG-gated CT aortogram acquisition from the same scan, each assessed by both AI and the expert radiologist. Of these, 16 (64%) were male, and the mean age was 61 ± 17 years. Repeated measures assessment by both the expert reader and AI analysis is presented in Table 4.
Table 4.
Comparison of expert reader vs AI reliability as assessed with the intraclass correlation coefficient (ICC) for repeated measures performance using the single measures rating, absolute agreement and two-way mixed effects model.
| Anatomical location | Expert reader ICC (95% CI, p) | Expert reader rating | AI analysis ICC (95% CI, p) | AI analysis rating |
|---|---|---|---|---|
| SoV | 0.89 (0.71–0.96, p<0.001) | Good | 0.78 (0.53–0.91, p<0.001) | Good |
| STJ | 0.78 (0.57–0.90, p<0.001) | Good | 0.69 (0.40–0.85, p<0.001) | Moderate |
| Mid Asc | 0.96 (0.92–0.98, p<0.001) | Excellent | 0.88 (0.74–0.95, p<0.001) | Good |
| Mid Arch | 0.79 (0.57–0.91, p<0.001) | Good | 0.74 (0.49–0.88, p<0.001) | Moderate |
| Isthmus | 0.87 (0.69–0.94, p<0.001) | Good | 0.79 (0.58–0.91, p<0.001) | Good |
| Mid Desc | 0.84 (0.68–0.93, p<0.001) | Good | 0.66 (0.35–0.84, p<0.001) | Moderate |
| Hiatus | 0.84 (0.66–0.93, p<0.001) | Good | 0.57 (0.22–0.79, p=0.002) | Moderate |
CI, confidence interval; ICC, intraclass correlation coefficient; STJ, sinotubular junction; SoV, sinus of Valsalva.
Diagnostic accuracy was also measured on the non-ECG-gated pre-contrast acquisitions. The mean absolute difference between AI and expert measurements ranged from 1.2 to 4.3 mm across anatomical levels (Supplementary Table 2). The ICC between the expert reader and AI assessment was graded as moderate at all locations except the mid-ascending aorta and isthmus, where it was graded as good. The Bland–Altman limits of agreement crossed the pre-set threshold of 5 mm at all anatomical levels (Supplementary Table 2 and Supplementary Figure 1). For detecting aortic dilatation across the full thoracic aorta, sensitivity against the expert reader was 100% and specificity 76.5% (Table 3).
Clinical impact
Figure 3 presents the potential impact of AI aortic measurement on detecting new thoracic aorta dilatation on scans reported by non-expert readers, subdivided by acquisition type, together with the AI success rate. Across all successfully analysed scans, the AI tool newly identified aortic dilatation in 53 (27%) patients in whom it was not previously documented in the original clinical report or electronic patient records (McNemar χ² = 51, p < 0.001; Figure 4). These included patients without a known aneurysm who had dilatation of up to 47 mm in the ascending aorta and 37 mm in the descending aorta.
Figure 3.
Detection of new aortic dilatation via AI analysis of non-ECG-gated chest CT imaging, subdivided by acquisition type. AI, artificial intelligence; CTPA, CT pulmonary angiogram; HRCT, high-resolution CT.
Figure 4.

Detection of new aortic dilatation with AI analysis vs known aortic aneurysm or routine clinical reports by non-expert readers of non-dedicated aorta thoracic imaging. AI, artificial intelligence.
Discussion
To our knowledge, this is the first study to compare expert cardiothoracic readers with AI software segmentation of the aorta for accuracy and potential clinical impact in a UK population and across all chest CT imaging. Accuracy varied across anatomical levels. However, the study demonstrates that the Siemens AI-Rad Companion Chest CT segmentation software has the potential to newly identify a dilated aorta on routine thoracic imaging, with high specificity (99.1% on a per-patient basis) but suboptimal sensitivity (76.5%).
The AI measurements performed modestly against expert manual measurement (an example of the imaging output is provided in Figure 5). Comparative measurements between AI and expert reader for the thoracic aorta demonstrated good to excellent intraclass correlation (Table 2). On ECG-gated contrast imaging, mean absolute differences between AI and expert measurements ranged from 1.3 mm at the mid-ascending and mid-descending aorta to 2.6 mm at the SoV. Other studies of AI-Rad Companion Chest CT have also found the largest mean measurement variation at the SoV. For example, Rueckel et al reported mean deviations at the SoV of up to 4.7 mm compared with manual radiologist measurements. 22 When Rueckel et al gave radiology reporters the option to correct the anatomical level selected by the AI, 50% of AI measurements at the SoV were manually adjusted. By comparison, 34% of measurements were adjusted across the other anatomical levels, which suggests a systematic AI error at the SoV. Rueckel et al proposed that the poorer performance at the SoV is due to the geometrically complex shape of the proximal aorta. This shape can change with respiration, blood pressure and the bio-elasticity of the aorta. 22 Other research on the difficulty of measuring the thoracic aorta has likewise considered the SoV a difficult measurement level because of its asymmetric structure. 20
Figure 5.

Example analyses of expert reader and AI aortic measurements, including (a) ECG-gated CT aorta (coronal plane), (b) measured manually in double oblique view with Syngo.Via (Siemens Healthineers), (c) mediastinal window of a HRCT, with (d, e) AI analysis and report of aorta assessment of HRCT in (c). AI, artificial intelligence; HRCT, high-resolution CT.
Monti et al demonstrated no significant difference between manual and automatic (Siemens, AI-Rad Companion Chest CT) maximum aortic diameter measurements. 14 However, they found a negative bias of –1.5 mm in automatic measurements, with a coefficient of reproducibility of 8 mm between AI and manual measurements. 14 In addition, Monti et al and Pradella et al reported lower accuracy at the SoV. They attributed this to the aortic centreline not being perpendicular to the aortic origin, which tilts the measurement plane. 14,15 This is consistent with the geometrical complexities described by Quint and Rueckel. 20,22 It may also explain the disproportionate number of dilated aortas (≥40 mm) at the SoV that were missed by the AI compared with manual reporting (10/14 [71%] of all missed dilated ascending aortas). These 14 patients represent the 7% of scans in which the AI tool would have missed a dilated ascending aorta defined by a ≥40 mm threshold. Pradella et al found a similar rate of AI-misclassified aneurysms (8.5%), although it is unclear whether these aneurysms were over- or underreported. 15 In our study, the mean difference between the AI report and the expert reader in these cases was low (0.9 mm). In 6/14 (43%) cases missed by the AI tool, the maximum diameter reported by the expert reader was exactly at the pre-defined threshold for dilatation (40 mm). Additionally, the AI tool would have flagged 5/14 (36%) as abnormal based on descending aorta measurements ≥30 mm. The largest missed ascending aorta in the diagnostic accuracy cohort measured 43 mm, and both such cases would again have been flagged to the reporter based on a dilated descending aorta.
We found that the AI was most accurate at the mid-ascending level on ECG-gated contrast imaging, and that it performed better in the ascending aorta than in the descending aorta. This is consistent with Macruz et al, who reported an ICC greater than 0.8 for the ascending aorta and 0.7 for the descending aorta. 23
In our study, absolute differences of up to 2.6 mm were observed across all anatomical levels on ECG-gated contrast CTs. It has been argued that CT-measured increases of up to 4 mm may not always represent pathological dilatation of the aorta. Discrepancies may arise from multiple factors affecting measurement, such as image resolution, type of cursor, inclusion of the aortic wall on non-contrast studies and motion artefact. 24,25 All of these contribute to interscan, interobserver and intraobserver variability. Quint et al reviewed variability in manual measurement using double oblique views. Mean differences were no more than 1.3 mm at each level, but the 95% confidence interval extended up to 5.1 mm. 20 Frazao et al found that CT and MRI give larger estimates of aortic root diameter than transthoracic echocardiography (mean difference 4.9 ± 2.7 mm). 26
However, on ECG-gated contrast studies the AI tool crossed the pre-defined maximum acceptable limits of agreement (>5 mm) at the SoV and STJ. On non-ECG-gated non-contrast studies it crossed this threshold at all levels, although the small sample size limits interpretation of this finding. On this basis, the tool would be considered insufficiently accurate for routine surveillance imaging to track progression of known aortic dilatation. In that setting, subtle increases in size may drive major clinical management decisions. Dilatation also lies on a continuum of severity, as shown by the mortality rate rising with increasing dilatation. This continuum is reflected in the graded diameter thresholds, e.g. 40 mm for a ‘dilated’ mid-ascending aorta vs 50 mm for it to be considered ‘aneurysmal’ at the same anatomical location.
However, the failure to meet the limits of agreement at every anatomical level may be interpreted in a wider context. The tool may still achieve the proposed goal of identifying unknown dilated aortas on a per-patient basis on routine non-dedicated chest CTs reported by non-expert readers. Its specificity and sensitivity suggest that it may provide a simple and pragmatic way to do so. Automated assessment of the thoracic aorta across all chest imaging may identify a substantial proportion of patients (27% in our population) with previously undiagnosed aortic dilatation. In this patient group, non-cardiothoracic radiologists typically do not assess or report the aorta in detail. Measuring the aorta accurately takes considerable time and requires specific software to obtain double oblique views. These findings therefore suggest a role for such software in screening for thoracic aortic aneurysms on routine CT chest imaging, even though it may not be sufficiently accurate for subsequent surveillance imaging.
AI not only highlights shortcomings in the manual reporting of routine chest CT studies but also avoids the additional reporting time that manual measurement would require. The incidence of aortic aneurysm is considered to be underreported because of the indolent nature of the disease. Our findings may be relevant in three ways. They demonstrate the opportunity to newly diagnose a proportion of patients with a dilated aorta, to treat patients before potentially fatal complications occur, and to better assess its prevalence in the population. This includes identifying a population at increased risk of other future MACE, 2 likely because of shared cardiovascular risk factors, which may enable more aggressive, targeted primary prevention. Such opportunistic screening, using data already acquired for other purposes, may have different cost implications from a dedicated screening programme. Whether its use in this way translates into improved clinical outcomes or cost-effectiveness requires further investigation. That investigation would need to account for both the AI failure rate and the need for the reporting radiologist to perform a semi-automated QA assessment of the AI output. If the current version of the tool were introduced into clinical practice, it may be appropriate to manually review scans with an ascending aorta measurement of 35–40 mm to confirm normality.
Our identification of new aortic dilatation in 27% of patients is comparable with other studies. Monti et al found aneurysms in 31% of their scans, although they did not report the exact mix of heterogeneous scans (ECG-gated aorta vs non-ECG-gated). Only 29% of their scans demonstrated no pathology. Their findings therefore do not reflect the expected rate of new diagnoses on screening chest CT, and they raise the question of a higher pre-test probability in their cohort. 14 Similarly, Artzener et al found a dilated aorta in 20.5% of patients. However, only 50% of their cohort were random CT chest acquisitions, with the other 50% selected for aortopathy and mediastinal pathology. 27
Importantly, the moderate sensitivity on contrast imaging highlights the potential to miss patients with a dilated aorta if relying solely on AI performance. In our cohort, the largest missed thoracic aorta measured 43 mm. This occurred in two patients, and the AI report would have flagged both for a descending aorta ≥30 mm. The detection rate for new diagnoses of a dilated aorta was also significantly higher than that of the manual report by non-expert readers. The tool’s high specificity also supports appropriate resource utilisation, because a false-positive AI report is unlikely to trigger unnecessary surveillance imaging.
The AI software successfully returned a reading at ≥1 anatomical level in 96.8% of our scans, although a full report was produced in only 89%. The success rate of the AI's segmentation in our study is consistent with its reported performance in the wider literature. Pradella et al found that automated software (Chest AI, v. 0.2.9.2, Siemens Healthineers, Germany) returned a result for 94% of all measurements across 405 ECG-gated CT aorta studies, 15 and Artzner et al obtained results for 99.2% of scan parameters. 27 Monti et al analysed 250 CT examinations with Siemens AI-Rad Companion Chest CT software, with 93% of scans adequate for analysis and successfully transferred. They also found that the software performed poorly in pathological aortas, particularly those with dissection, 14 which were excluded from our analysis. Artzner et al included aortic dissections in their cohort and found that the true lumen was measured only if the false lumen was thrombosed; otherwise, the true and false lumens were measured as a single diameter. 27
The exclusion of imaging with significant artefact in our study may have led to an overestimation of diagnostic accuracy. However, artefact also limits manual reporting, and such scans may be considered non-diagnostic. Additionally, many AI tools currently embedded in clinical practice require a degree of manual QA in combination with AI outputs to reduce the potential for inaccurate reporting. How the relationship between AI and manual reporting varies with artefact severity warrants further research.
The reliability of the AI tool was assessed across pre- and post-contrast acquisitions obtained during the same patient study. The AI tool was less reliable than the expert reader, with lower ICC values. The expert reader was more reliable at all anatomical landmarks except the sinuses of Valsalva (SoV) and the isthmus, where reliability matched that of the AI. As discussed above, comparing images with and without contrast may lead to the aortic wall being included in measurements on non-contrast scans but not on contrast scans. 25 Expert readers may be better able than the AI to judge where the measurement should extend to the aortic wall.
Other studies have assessed the reproducibility of automated thoracic aorta segmentation software. Although Pradella et al achieved near-perfect reproducibility, they assessed it by sending a single phase through the AI software twice, 15 which may not truly represent the reproducibility of an AI tool. We instead sent separate contrast and non-contrast sequences from the same CT aorta study for AI segmentation. Whilst the ICC in our study was lower, and derived from a smaller cohort of 25 patients (50 studies), this likely reflects the different approach taken to testing the reproducibility of the software. However, it is important to recognise the limitations of our approach to reproducibility analysis. Our approach compared a non-ECG-gated with an ECG-gated acquisition. As a result, some variation in measurements would be expected owing to changes in aortic dimensions throughout the cardiac cycle. Ideally, patients would have undergone sequential ECG-gated acquisitions, getting off and back on the scanner between scans, to enable true test–retest analysis. However, in a retrospective validation study using routinely acquired clinical imaging, this was the closest we could achieve to a meaningful repeated measures assessment.
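Because each patient in the repeated measures comparison contributes a non-contrast and a contrast measurement, a systematic offset between the two (such as inclusion of the aortic wall on non-contrast scans) affects absolute-agreement and consistency forms of the ICC differently. This section does not state which ICC model was used; the sketch below assumes the two-way Shrout and Fleiss forms ICC(2,1) (absolute agreement) and ICC(3,1) (consistency), as described in the Koo and Li guideline, 19 and uses hypothetical diameters in which every contrast measurement is exactly 2 mm smaller. The function name `icc_two_way` is hypothetical.

```python
from statistics import fmean
from typing import Dict, Sequence


def icc_two_way(ratings: Sequence[Sequence[float]]) -> Dict[str, float]:
    """ICC(2,1) (absolute agreement) and ICC(3,1) (consistency), Shrout & Fleiss.

    ratings[i][j] is the diameter of subject i on occasion j
    (e.g. j=0 non-contrast, j=1 contrast); all rows need the same length k >= 2.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 columns")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Partition the total sum of squares into subjects, occasions and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    icc_consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    icc_agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return {"ICC(2,1)": icc_agreement, "ICC(3,1)": icc_consistency}


if __name__ == "__main__":
    # Hypothetical diameters (mm): contrast reads exactly 2 mm smaller,
    # mimicking a systematic wall-inclusion offset on the non-contrast series.
    non_contrast = [36.0, 41.0, 33.0, 45.0, 38.0]
    contrast = [x - 2.0 for x in non_contrast]
    result = icc_two_way(list(zip(non_contrast, contrast)))
    print({name: round(v, 3) for name, v in result.items()})
    # {'ICC(2,1)': 0.914, 'ICC(3,1)': 1.0}
```

A constant offset leaves ICC(3,1) at 1.0 but lowers ICC(2,1), which is one reason the choice of ICC model matters when comparing reproducibility figures across studies.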
This study was limited by its retrospective and single-centre design, although this was offset to some degree by the use of consecutively acquired imaging. Further limitations include that the statistical analysis was undertaken unblinded to the imaging, and that the sample sizes of the repeated measures cohort and of the non-ECG-gated non-contrast diagnostic accuracy cohort were small. Further, in our repeated measures analysis, the manual reporting of the sequential ECG-gated non-contrast and contrast CT aorta acquisitions was completed by a single reader. This may introduce observer bias, although to minimise this effect, >4 weeks elapsed between reporting the non-contrast and contrast scans, and reporting was performed blinded to the baseline measurements.
The exclusion of aortas with dissection or prior valve surgery may partially limit the applicability of our findings to opportunistic screening of all patients. Evaluating the AI tool in these patients, including its role in monitoring patients after intervention, was beyond the scope of this study and requires further research. Additionally, the AI tool was evaluated only on CTs acquired with a single vendor's scanners (Siemens) and may require validation on other CT systems to confirm the generalisability of our findings.
Conclusions
In conclusion, AI analysis was feasible in the majority of scans and showed good agreement with expert readers, with the exception of the aortic root. The high specificity for detecting aortic dilatation suggests the tool may have a role in opportunistic screening for aortic dilatation in CT chests performed for other clinical indications. Further work is required to determine whether the tool's sensitivity is acceptable in a cost-effectiveness analysis.
Footnotes
Competing interests: Dr Rodrigues and Dr Graby report personal fees from Sanofi, and Dr Rodrigues reports personal fees from NHSX, HeartFlow - Physicians' services, and is co-founder and partner for Heart & Lung health, all outside of the submitted work. All other authors report no conflict of interest.
Funding: Our institution (the Royal United Hospitals Bath NHS Foundation Trust) is a European reference centre for Siemens Healthineers CT scanners, and Siemens AI-Rad Companion was provided free of charge for the study, but the study was conducted independently of Siemens Healthineers.
Contributor Information
John Graby, Email: john.graby@nhs.net.
Maredudd Harris, Email: john.harris10@nhs.net.
Calum Jones, Email: calum.jones4@nhs.net.
Harry Waring, Email: harry.waring@nhs.net.
Stephen Lyen, Email: stephen.lyen1@nhs.net.
Benjamin J Hudson, Email: benjamin.hudson1@nhs.net.
Jonathan Carl Luis Rodrigues, Email: j.rodrigues1@nhs.net.
REFERENCES
- 1. Mathur A, Mohan V, Ameta D, Gaurav B, Haranahalli P. Aortic aneurysm. J Transl Int Med 2016; 4: 35–41. doi: 10.1515/jtim-2016-0008
- 2. Erbel R, Aboyans V, Boileau C. ESC guidelines on the diagnosis and treatment of aortic diseases. Eur Heart J 2014; 35: 2873–2926. doi: 10.1093/eurheartj/ehu281
- 3. Hiratzka LF, Bakris GL, Beckman JA, et al. ACCF/AHA/AATS/ACR/ASA/SCA/SCAI/SIR/STS/SVM guidelines for the diagnosis and management of patients with thoracic aortic disease: executive summary: a report of the American College of Cardiology Foundation/American Heart Association task force on Pra. Circulation 2010; 121: 266–369. doi: 10.1161/CIR.0b013e3181d4739e
- 4. Hager A, Kaemmerer H, Rapp-Bernhardt U, Blücher S, Rapp K, Bernhardt TM, et al. Diameters of the thoracic aorta throughout life as measured with helical computed tomography. J Thorac Cardiovasc Surg 2002; 123: 1060–66. doi: 10.1067/mtc.2002.122310
- 5. Sharples L, Sastry P, Freeman C, Bicknell C, Chiu YD, Vallabhaneni SR, et al. Aneurysm growth, survival, and quality of life in untreated thoracic aortic aneurysms: the effective treatments for thoracic aortic aneurysms study. Eur Heart J 2022; 43: 2356–69. doi: 10.1093/eurheartj/ehab784
- 6. Erbel R, Aboyans V, Boileau C. ESC guidelines on the diagnosis and treatment of aortic diseases. Eur Heart J 2014; 35: 2873–2926. doi: 10.1093/eurheartj/ehu281
- 7. Gouveia e Melo R, Silva Duarte G, Lopes A, Alves M, Caldeira D, Fernandes e Fernandes R, et al. Incidence and prevalence of thoracic aortic aneurysms: a systematic review and meta-analysis of population-based studies. Seminars in Thoracic and Cardiovascular Surgery 2022; 34: 1–16. doi: 10.1053/j.semtcvs.2021.02.029
- 8. Erbel R, Eggebrecht H. Aortic dimensions and the risk of dissection. Heart 2006; 92: 137–42. doi: 10.1136/hrt.2004.055111
- 9. Wang TKM, Desai MY. Thoracic aortic aneurysm: optimal surveillance and treatment. CCJM 2020; 87: 557–68. doi: 10.3949/ccjm.87a.19140-1
- 10. Melvinsdottir IH, Lund SH, Agnarsson BA, Sigvaldason K, Gudbjartsson T, Geirsson A. The incidence and mortality of acute thoracic aortic dissection: results from a whole nation study. Eur J Cardiothorac Surg 2016; 50: 1111–17. doi: 10.1093/ejcts/ezw235
- 11. Sastry P, Hughes V, Hayes P, Vallabhaneni S, Sharples L, Thompson M, et al. The ETTAA study protocol: a UK-wide observational study of 'effective treatments for thoracic aortic aneurysm'. BMJ Open 2015; 5: e008147. doi: 10.1136/bmjopen-2015-008147
- 12. Sedghi Gamechi Z, Bons LR, Giordano M, Bos D, Budde RPJ, Kofoed KF, et al. Automated 3D segmentation and diameter measurement of the thoracic aorta on non-contrast enhanced CT. Eur Radiol 2019; 29: 4613–23. doi: 10.1007/s00330-018-5931-z
- 13. NHS England and NHS Improvement. Diagnostic Imaging Dataset Annual Statistical Release 2020/21. Published 18 November 2021. Available from: https://www.england.nhs.uk/statistics/statistical-work-areas/diagnostic-imaging-dataset/diagnostic-imaging-dataset-2020-21-data/ (accessed 13 Apr 2022)
- 14. Monti CB, van Assen M, Stillman AE, Lee SJ, Hoelzer P, Fung GSK, et al. Evaluating the performance of a convolutional neural network algorithm for measuring thoracic aortic diameters in a heterogeneous population. Radiology: Artificial Intelligence 2022; 4. doi: 10.1148/ryai.210196
- 15. Pradella M, Weikert T, Sperl JI, Kärgel R, Cyriac J, Achermann R, et al. Fully automated guideline-compliant diameter measurements of the thoracic aorta on ECG-gated CT angiography using deep learning. Quant Imaging Med Surg 2021; 11: 4245–57. doi: 10.21037/qims-21-142
- 16. Siemens Healthineers. Features, data, and algorithms: AI-Rad Companion Chest CT VA13 whitepaper. 2021. Available from: https://cdn0.scrvt.com/39b415fb07de4d9656c7b516d8e2d907/3c06d6256fff0b5d/c01241ad9c70/DH-AI-Rad-Companion-Chest-CT-Whitepaper--2-.PDF (accessed 25 May 2022)
- 17. Ghesu FC, Georgescu B, Zheng Y, et al. Multi-scale deep reinforcement learning for real-time 3D-landmark detection in CT scans. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- 18. Yang D, Xu D, Zhou SK, et al. Automatic liver segmentation using an adversarial image-to-image network. 2017.
- 19. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016; 15: 155–63. doi: 10.1016/j.jcm.2016.02.012
- 20. Quint LE, Liu PS, Booher AM, Watcharotone K, Myles JD. Proximal thoracic aortic diameter measurements at CT: repeatability and reproducibility according to measurement method. Int J Cardiovasc Imaging 2013; 29: 479–88. doi: 10.1007/s10554-012-0102-9
- 21. NHS Health Research Authority. Decision tools: is my study research? 2020. Available from: http://www.hra-decisiontools.org.uk/research (accessed 20 Sep 2021)
- 22. Rueckel J, Reidler P, Fink N, Sperl J, Geyer T, Fabritius MP, et al. Artificial intelligence assistance improves reporting efficiency of thoracic aortic aneurysm CT follow-up. Eur J Radiol 2021; 134: 109424. doi: 10.1016/j.ejrad.2020.109424
- 23. Macruz FB de C, Lu C, Strout J, Takigami A, Brooks R, Doyle S, et al. Quantification of the thoracic aorta and detection of aneurysm at CT: development and validation of a fully automatic methodology. Radiology: Artificial Intelligence 2022; 4. doi: 10.1148/ryai.210076
- 24. Elefteriades JA, Farkas EA. Thoracic aortic aneurysm: clinically pertinent controversies and uncertainties. J Am Coll Cardiol 2010; 55: 841–57. doi: 10.1016/j.jacc.2009.08.084
- 25. Elefteriades JA, Mukherjee SK, Mojibian H. Discrepancies in measurement of the thoracic aorta. Journal of the American College of Cardiology 2020; 76: 201–17. doi: 10.1016/j.jacc.2020.03.084
- 26. Frazao C, Tavoosi A, Wintersperger BJ, Nguyen ET, Wald RM, Ouzounian M, et al. Multimodality assessment of thoracic aortic dimensions. J Thorac Imaging 2020. doi: 10.1097/RTI.0000000000000514
- 27. Artzner C, Bongers MN, Kärgel R, Faby S, Hefferman G, Herrmann J, et al. Assessing the accuracy of an artificial intelligence-based segmentation algorithm for the thoracic aorta in computed tomography applications. Diagnostics (Basel) 2022; 12: 1790. doi: 10.3390/diagnostics12081790


