[Preprint]. 2025 Jul 8:arXiv:2312.17670v4. Originally published 2023 Dec 29. [Version 4]

Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

Kaiyuan Yang a,*, Fabio Musio a,b,*, Yihui Ma c,d,*, Norman Juchler b, Johannes C Paetzold e, Rami Al-Maskari f,g, Luciano Höher f, Hongwei Bran Li a,h, Ibrahim Ethem Hamamci a, Anjany Sekuboyina a, Suprosanna Shit a, Houjing Huang a, Chinmay Prabhakar a, Ezequiel de la Rosa a, Bastian Wittmann a, Diana Waldmannstetter a,g, Florian Kofler a,g,i,j, Fernando Navarro a,g,i, Martin J Menten g,k,l, Ivan Ezhov g, Daniel Rueckert g,i,k,l, Iris N Vos m, Ynte M Ruigrok n, Birgitta K Velthuis o, Hugo J Kuijf m, Pengcheng Shi p, Wei Liu p, Ting Ma p,q, Maximilian R Rokuss r,s, Yannick Kirchhoff r,s,u, Fabian Isensee r,t, Klaus Maier-Hein r,v, Chengcheng Zhu w, Huilin Zhao x, Philippe Bijlenga y, Julien Hämmerli y, Catherine Wurster y, Laura Westphal z, Jeroen Bisschop aa, Elisa Colombo ab, Hakim Baazaoui z, Hannah-Lea Handelsmann z, Andrew Makmur ac, James Hallinan ac, Amrish Soundararajan ad, Bene Wiestler i, Jan S Kirschke i, Roland Wiest ae, Emmanuel Montagnon af,#, Laurent Letourneau-Guillon af,#, Kwanseok Oh ag,ah,#, Dahye Lee ag,#, Orhun Utku Aydin ai,#, Adam Hilbert ai,#, Jana Rieger ai,#, Dimitrios Rallios ai,#, Satoru Tanioka ai,#, Alexander Koch ai,#, Dietmar Frey ai,#, Abdul Qayyum aj,#, Moona Mazher ak,#, Steven Niederer aj,#, Nico Disch r,s,u,#, Julius Holzschuh r,#, Dominic LaBella al,#, Francesco Galati am,#, Daniele Falcetta am,#, Maria A Zuluaga am,#, Chaolong Lin an,#, Haoran Zhao an,#, Zehan Zhang ao,#, Minghui Zhang ap,aq,#, Xin You ap,aq,#, Hanxiao Zhang ap,#, Guang-Zhong Yang ap,#, Yun Gu ap,aq,#, Sinyoung Ra ar,#, Jongyun Hwang ar,#, Hyunjin Park as,#, Junqiang Chen at,#, Marek Wodzinski au,av,#, Henning Müller au,#, Nesrin Mansouri aw,ax,#, Florent Autrusseau aw,ax,#, Cansu Yalçin ay,#, Rachika E Hamadache ay,#, Clara Lisazo ay,#, Joaquim Salvi ay,#, Adrià Casamitjana ay,#, Xavier Lladó ay,#, Uma Maria Lal-Trehan Estrada ay,#, Valeriia Abramova ay,#, Luca Giancardo az,#, Arnau Oliver ay,#, Paula Casademunt ba,#, Adrian Galdran ba,#, Matteo Delucchi b,bb,#, Jialu Liu bc,bd,#, Haibin Huang bc,bd,#, Yue Cui bc,bd,#, Zehang Lin be,#, Yusheng Liu bf,#, Shunzhi Zhu be,#, Tatsat R Patel bg,bi,#, Adnan H Siddiqui bg,bi,#, Vincent M Tutino bg,bh,#, Maysam Orouskhani w,#, Huayu Wang w,#, Mahmud Mossa-Basha w,#, Yuki Sato bj,#, Sven Hirsch b,**, Susanne Wegener z,**, Bjoern Menze a,**
PMCID: PMC10793481  PMID: 38235066

Abstract

The Circle of Willis (CoW) is an important network of arteries connecting the major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neurovascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two non-invasive angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but datasets with annotations of CoW anatomy are limited, especially for CTA. Therefore, we organized the TopCoW challenge and released an annotated CoW dataset. The TopCoW dataset is the first public dataset with voxel-level annotations for 13 CoW vessel components, enabled by virtual reality technology. It is also the first large dataset with 200 pairs of MRA and CTA scans from the same patients. As part of the benchmark, we invited submissions worldwide and attracted over 250 registered participants from six continents. The submissions were evaluated on internal and external test datasets totaling 226 scans from over five centers. The top performing teams achieved over 90% Dice scores at segmenting the CoW components, over 80% F1 scores at detecting key CoW components, and over 70% balanced accuracy at classifying CoW variants for nearly all test sets. The best algorithms also showed clinical potential in classifying fetal-type posterior cerebral artery and locating aneurysms with the CoW anatomy. TopCoW demonstrated the utility and versatility of CoW segmentation algorithms for a wide range of downstream clinical applications with explainability. The annotated datasets and best performing algorithms have been released as public Zenodo records to foster further methodological development and clinical tool building.

Keywords: Circle of Willis, Vessel Segmentation, Variant Classification, Brain CT Angiography, Brain MR Angiography, Virtual Reality, Fetal PCA, Aneurysm Location

1. Introduction

The Circle of Willis (CoW) is an important anastomotic network of arteries connecting the anterior and posterior circulations of the brain, as well as the left and right cerebral hemispheres [1]. Due to its centrality, the CoW is commonly involved in pathologies like aneurysms and stroke. Clinically, the vascular architecture of the CoW is believed to impact the occurrence and severity of stroke [2, 3, 4, 5], pose a potential risk for aneurysm formation [6], and affect the neurologic events and clinical outcomes of neurosurgeries [7, 8]. An accurate characterization of the CoW is therefore of great clinical relevance.

However, clinicians have articulated an unmet demand for efficient software tools to analyze the angio-architecture of the CoW. Assessing the anatomy and vascular components of the CoW from angiography images is still a manual and time-consuming task requiring specialist judgment. The CoW anatomy involves multiple connections and branches of different cerebral vessels. These vessels vary in diameter from around 1 to 4 mm [9]. CoW vessel components are difficult to identify accurately in isolation and often require subtle spatial relationships to distinguish them anatomically. The vessels also have curvatures and turns along their courses. This can result in vessels crossing paths on the angiography images, where it can be difficult to differentiate whether the vessels are merely touching or whether blood flow communicates at the crossing points. Furthermore, the CoW naturally has many variants in which certain principal artery components are hypoplastic or absent. It is estimated that less than half of the population has a complete CoW [9, 10], and CoW anatomy commonly varies markedly from person to person. Characterizing the CoW can therefore be a challenging task due to the complexity and heterogeneity of the anatomy.

The brain arteries, including the CoW, are commonly imaged and assessed using two non-invasive angiographic imaging modalities, namely magnetic resonance angiography (MRA) and computed tomography angiography (CTA). A number of datasets have been publicly available for the MRA modality. Earlier MRA datasets include the CASILab (also known as TubeTK or MIDAS) [11] and the IXI [12] datasets, which were acquired from scanners before 2006 and carry limited vessel annotations. Recently, more MRA datasets have been published and some were annotated with binary vessel masks, such as the CAS [13] dataset, the SMILE-UHURA [14] dataset on 7T MRA, and the COSTA [15] dataset that offered a subset of annotated CASILab and IXI images. However, these vessel annotations were binary, with no anatomical annotations of the CoW. Furthermore, to our knowledge, no annotated dataset existed for the other important modality, CTA.

Previously, there has been a high barrier to entry for annotating the CoW anatomy: one would not only need expert-level neuroanatomical knowledge to label or verify the complex and variable CoW anatomy, but also have to overcome the laborious and time-consuming process of 2D annotation for multiclass CoW vessels. To address such annotation obstacles, we turned to virtual reality (VR), which helped attract clinicians’ interest and engage them in the annotation process via its appealing visualization and gamification aspects. VR also significantly accelerated the annotation and verification process for tortuous and intertwined vessels via an intuitive 3D workflow.

Prior work on the CoW anatomy characterization task has mainly treated it as a labeling task built upon binary vessel masks, skeletons, or graphs [16, 17, 18, 19, 20, 21], with two recent studies directly tackling the problem as a multiclass segmentation task [22, 23]. However, only private annotated in-house data or public data without verified CoW annotations were used, and the studies were restricted to the MRA modality. Furthermore, given the complex and highly heterogeneous anatomies of the CoW in clinical settings, the difficulties associated with the CoW anatomy characterization task have not been sufficiently conveyed or discussed in past studies. We thus identify the following contributions we can make to the field: 1) We provide open data with verified annotations for CoW segmentation benchmarking. A public annotated CoW dataset can benefit algorithm development and comparison. 2) We include the CTA modality. Clinically, CTA is an equally important angiography modality as MRA for CoW anatomy diagnosis. 3) We shed light on the anatomical complexities of CoW variants and evaluate the clinical relevance of current algorithms in handling such complexities.

To this end, we organized a benchmark on “Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA”, or “TopCoW” for short, as registered and included in the Medical Image Computing and Computer Assisted Intervention (MICCAI) conference held in 2023 and 2024. TopCoW was the first public challenge on CoW anatomical segmentation featuring voxel-level vessel annotations on two common non-invasive angiographic imaging modalities, MRA and CTA. The main tasks of the challenge were to automatically segment the CoW vessels and to classify the CoW variants on 3D angiographic images. We collected submissions from global participants and evaluated their performance on both internal and external multi-center test datasets. In addition, we evaluated the CoW segmentation algorithms on two clinical downstream tasks that are routinely performed in practice: classifying fetal-type posterior cerebral artery (PCA) and locating intracranial aneurysms using the CoW anatomy, with the aim of assessing the ability of the submissions to address real-world clinical needs. For a transparent benchmark and continued development, we made our annotated datasets and the best algorithm submissions publicly available as Zenodo records.

2. Challenge Dataset and Study Design

2.1. TopCoW Dataset

The TopCoW data cohort was composed of patients admitted to the Stroke Center of the University Hospital Zurich (USZ) in 2018 and 2019. The data were acquired at the hospital during routine examinations following standard procedures of the respective imaging. Siemens scanners were used for both modalities. MRA scans were imaged with a magnetic field strength of 3 Tesla or 1.5 Tesla. In total, 200 pairs of MRA and CTA scans from unique patients were curated for the TopCoW challenge and subsequently split into train and test sets arbitrarily by patient. The inclusion criteria for the TopCoW data were: 1) both MRA and CTA scans were available and of good quality for that patient; 2) at least the MRA or CTA allowed for an assessment of the CoW anatomy; 3) no large aneurysms inside the CoW ROI; 4) certain rare CoW variants that could not be characterized by our 13 annotated CoW components were excluded. All TopCoW data were anonymized, defaced, and cropped to the braincase region. Training, validation, and internal test cases all had the MRA and CTA joint-modality pairs, with one scan for each modality. Between the two years of the challenge, the training dataset grew from 90 to 125 patients, and the test dataset grew from 35 to 70 patients. The final 2024 training dataset had 125 patients with the images and annotations released to the public. The validation set included 5 patients whose annotations were not publicly released but were used on the submission website to allow participants to dry-run their submissions; it was not included in the final benchmark evaluation. The internal test set had 70 patients and was hidden from the public. The use of the TopCoW data was approved by the local ethics committee. The anonymized image data were approved for release under the “Open use. Must provide the source. Use for commercial purposes requires permission of the data owner.” license from OpenData Swiss [24].

More information on the data cohort, inclusion and exclusion of CoW variants, the anonymization pre-processing, and the dataset changelog can be found in Supplementary S1, S2, S3, and S4.

2.2. Data Annotation

For each 3D angiography image, we provided three types of annotation regarding the CoW: the voxel-level multiclass segmentation mask of the CoW, a 3D bounding box for the CoW region of interest (ROI), and the CoW variant graph. Virtual reality (VR) was used to efficiently annotate and verify the CoW anatomy in 3D. Fig. 1a shows the workflow and view from VR. The VR annotation setup followed the method described in [25]. There were 13 CoW vessel components for the multiclass segmentation annotation: left and right internal carotid artery (ICA), left and right anterior cerebral artery (ACA), left and right middle cerebral artery (MCA), anterior communicating artery (Acom), left and right posterior communicating artery (Pcom), left and right posterior cerebral artery (PCA), and basilar artery (BA). Occasionally the anterior part of the CoW can have a third A2 artery arising from the Acom, which we labeled with the class 3rd-A2. Fig. 1a right shows an MRA example with all 13 CoW vessel classes labeled. The CoW annotation protocol was designed by a senior neurosurgeon (Y.M., over 10 years of experience) and reviewed by a senior neurosurgeon (P.B., over 15 years of experience) and a senior neurologist (S.W., over 15 years of experience). Y.M. used around 35 initial patients to educate and train the annotators (K.Y. and F.M.) on the CoW anatomical knowledge and the annotation protocol. Around another 40 patients from subsequent cases that the annotators were uncertain of were reviewed and verified by Y.M. Second opinions and verifications were also obtained from other neurosurgeons (P.B., J.H., C.W., E.C.) and neurologists (S.W., L.W., H.B.) for around 15 patients. All annotated data used in the benchmark were manually verified by at least one annotator. Further details on the CoW annotation protocol can be found in Supplementary S5.

Fig. 1.

CoW annotation. (a) Left shows data annotation using VR. Middle shows tailored settings by adjusting opacity, threshold, and window dynamically for suitable visualization during annotation. The right image shows the 13 anatomical labels for the CoW anatomy. (b) The TopCoW dataset has paired modalities, CTA and MRA, from the same patient. Voxel annotations of the CoW vessels and a 3D bounding box of the CoW ROI are labelled for each modality. (c) The CoW variant graph annotation was converted from the CoW segmentation mask. Each CoW can be classified by an anterior variant (AV) and a posterior variant (PV) graph. Both AV and PV are identified by a four-edge graph, with 0 being absent and 1 being present in the edge-list. (d) Inter-rater agreement for CoW variant classification on 40 TopCoW CTA test cases. Accuracy for each variant is shown along with the balanced accuracy and Cohen’s Kappa score.

Fig. 1b shows an example of the segmentation mask and ROI annotations for both MRA and CTA modalities from a TopCoW patient. The CoW ROI was defined as the 3D bounding box containing the volume required for the diagnosis of the CoW variant, plus a padding. For higher sensitivity, we padded the bounding box by roughly the diameter of the ICA to include slightly more region in the ROI.

The third type of annotation, the CoW variant graph, was derived from the segmentation mask and encompasses the anterior variant (AV) and posterior variant (PV) graphs. Fig. 1c shows the AV and PV graph composition. The AV graph was determined by four edges: L-A1, Acom, 3rd-A2, and R-A1. The corresponding edge-list was defined by 0 or 1 according to the edge presence. For example, AV-1001 is the anterior variant that has L-A1 present, Acom and 3rd-A2 absent, and R-A1 present. Similarly, the PV graph was determined by a four-element edge-list of L-Pcom, L-P1, R-P1, and R-Pcom.
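
To make the edge-list encoding concrete, below is a minimal Python sketch (our illustration, not challenge code) that converts a four-edge presence tuple into a variant name such as AV-1001 and decodes it back:

```python
# Minimal sketch of the AV/PV edge-list naming convention described above.
# Edge order follows the text: AV = (L-A1, Acom, 3rd-A2, R-A1),
# PV = (L-Pcom, L-P1, R-P1, R-Pcom).
AV_EDGES = ("L-A1", "Acom", "3rd-A2", "R-A1")
PV_EDGES = ("L-Pcom", "L-P1", "R-P1", "R-Pcom")

def variant_name(prefix: str, presence: tuple) -> str:
    """Encode a four-edge presence tuple as a variant name, e.g. AV-1001."""
    return f"{prefix}-" + "".join(str(int(p)) for p in presence)

def decode_variant(name: str, edges: tuple) -> dict:
    """Decode a variant name back into per-edge presence flags."""
    bits = name.split("-", 1)[1]
    return {edge: bit == "1" for edge, bit in zip(edges, bits)}

print(variant_name("AV", (1, 0, 0, 1)))     # -> AV-1001
print(decode_variant("AV-1001", AV_EDGES))  # L-A1 and R-A1 present, rest absent
```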

The TopCoW training data of 250 MRA and CTA scans and their annotations have been released in a public Zenodo repository at https://zenodo.org/records/15692630.

2.3. Inter-Rater Agreement

To estimate the variability of the annotations used for benchmarking and the upper bounds of algorithmic performance, we analyzed inter-rater agreement in two aspects:

CoW Variant Classification Agreement.

Senior neurosurgeon P.B. labeled the AV and PV classes for 40 CTA patients from the TopCoW test set, using the images in a dedicated 2-hour session. Labels from P.B. were compared with the annotations used in the benchmark. The selected cases covered all available CoW variants in the internal test data, including 4 AV classes and 6 PV classes, as shown in Fig. 1d. Balanced accuracies between the raters were 88% for AV and 78% for PV. Cohen’s Kappa scores were 83% for AV and 72% for PV, suggesting good agreement.
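
For illustration, the two agreement statistics can be computed with scikit-learn as in the sketch below; the label lists are hypothetical placeholders, not the actual rater annotations from the 40 CTA cases:

```python
# Minimal sketch of the inter-rater agreement statistics (balanced accuracy
# and Cohen's Kappa) using scikit-learn. Labels here are made-up examples.
from sklearn.metrics import balanced_accuracy_score, cohen_kappa_score

rater_benchmark = ["AV-1111", "AV-1001", "AV-1111", "AV-1011"]  # benchmark labels
rater_second    = ["AV-1111", "AV-1001", "AV-1011", "AV-1011"]  # second rater

print(f"balanced accuracy: {balanced_accuracy_score(rater_benchmark, rater_second):.2f}")
print(f"Cohen's Kappa:     {cohen_kappa_score(rater_benchmark, rater_second):.2f}")
```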

Voxel-Level Segmentation Agreement.

Voxel-level annotations were done on a subset of 5 patients from the TopCoW test set by the two manual annotators (K.Y. and F.M.). These five patients were selected because they each contained most or all of the CoW multiclass labels. The CoW anatomical annotations from both annotators were evaluated for Dice scores. Many CoW component classes had Dice scores of around 90% or above, while R-Pcom, L-Pcom, Acom, and 3rd-A2 had slightly lower Dice at 76-89%. Detailed results on the voxel-level segmentation agreement can be found in Supplementary S6.

2.4. External Multi-Center Test Data

In addition to the internal test sets from the TopCoW dataset, we gathered and annotated 86 MRA and CTA scans from four external multi-center test datasets to evaluate the robustness of the algorithms. These external test datasets were drawn from existing public datasets without CoW annotations. Two external CTA test sets came from the public ISLES’24 challenge training set (ISLES), acquired at the TUM University Hospital in Germany [26, 27], and the public Large IA Segmentation dataset (LargeIA), acquired at various hospitals in China [28, 29]. Two external MRA test sets came from a public OpenNeuro dataset (Lausanne), acquired at the Lausanne University Hospital in Switzerland [30, 31], and the public IXI dataset (IXI-HH), acquired at the Hammersmith Hospital in the UK [12]. The inclusion criteria were similar to those of the TopCoW dataset. For ISLES, we chose 26 CTA patients whose CoW was not occluded within the ROI. For the LargeIA dataset, we chose 20 CTAs without aneurysms inside the CoW ROI. For the Lausanne and IXI-HH datasets, we chose 20 MRAs each from the healthy control group. All external datasets were annotated in the same fashion as the TopCoW dataset for the CoW benchmark.

Fig. 2 shows a statistical summary of the image information of the training, internal test, and external test data within the ROI. We compared the voxel dimensions, entropy, and image intensity. TopCoW training and test data had similar distributions. The ISLES and LargeIA datasets had much thinner slice thickness than TopCoW CTA. The Lausanne and IXI-HH datasets had much larger pixel spacing in the X-Y dimension compared with TopCoW MRA. Voxel dimensions were quite different among the external datasets. IXI-HH had markedly lower entropy, which may be due to its dated nature, as the images were acquired around 20 years ago. IXI-HH also had a very different mean intensity distribution compared to the other datasets, with many MR images having ultra-high intensity values. Overall, the TopCoW internal test images were in-distribution while the external test datasets were out-of-distribution, which is useful for evaluating generalizability.
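
As a sketch of how such image statistics can be reproduced, the entropy inside the ROI can be computed from the intensity histogram as below; the 256-bin histogram is our assumption, not necessarily the exact setting behind Fig. 2:

```python
# Minimal sketch: Shannon entropy (in bits) of the ROI intensity histogram.
# The bin count is an assumed parameter; the volume below is a dummy example.
import numpy as np

def shannon_entropy(roi: np.ndarray, bins: int = 256) -> float:
    counts, _ = np.histogram(roi, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

roi = np.random.default_rng(0).normal(100.0, 20.0, size=(64, 64, 64))
print(f"entropy: {shannon_entropy(roi):.2f} bits")
```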

Fig. 2.

Data characteristics of the TopCoW training, internal test, and external multi-center test datasets. CTA datasets are in blue violin plots, and MRA datasets are in red. Datasets were compared in terms of pixel spacing, slice thickness, entropy, and mean intensity inside the ROI. One extreme outlier in mean intensity was removed from the TopCoW MRA training set for visualization purposes. ‘n’ is the number of cases. ‘*’ indicates external test datasets.

Distributions of the CoW variants in all our datasets are shown in Supplementary S7.

The multi-center test sets of 86 images with our CoW annotations used for the benchmark can be accessed from a public Zenodo repository at https://zenodo.org/records/15692630.

2.5. Algorithm Submission

Our challenge had two tracks for algorithm submissions, namely a CTA track and an MRA track. The main tasks were multiclass segmentation of the anatomical components of the CoW and classification of the CoW variant graph. The input to the algorithm was the whole 3D image volume, and the evaluation was conducted within the CoW ROI. For all tasks, the input to the submitted algorithm was initially intended to be a pair of CTA and MRA images from a patient, owing to the paired-modality feature of the TopCoW dataset. Algorithms that only needed one of the modalities could simply ignore the other modality input. In practice, all participating algorithms worked with single-modality input, and thus the submissions could also be evaluated on the single-modality external test datasets.

The submitted algorithms had to be fully automatic and packaged as isolated Docker containers. For internal test data, the Docker containers were run in the cloud on the submission platform, which provided an Nvidia T4 GPU with 16 GB of GPU memory. Submitted algorithms were limited to a runtime of 12-15 minutes per test case for inference on the cloud. Each team was given only one opportunity to upload their containers for the hidden test set. For external test sets, the Docker containers were run locally on a laptop with an RTX 3080 GPU with 16 GB of GPU memory.

2.6. Evaluation of Algorithms

Voxel-Level Multiclass Segmentation.

For voxel-level metrics, the multiclass CoW segmentation predictions were evaluated using the Dice similarity coefficient (Dice score), centerline Dice (clDice) [32], Hausdorff distance at the 95th percentile, and connected component (zeroth Betti number) error.

More details on the TopCoW tasks and the evaluation metrics are in Supplementary S8.
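
As an illustration of the first of these metrics, a per-class Dice computation on multiclass masks is sketched below; the exact evaluation code is open-sourced (see Code Availability), and the convention for a class absent in both masks is our assumption:

```python
# Minimal sketch of per-class Dice = 2|A∩B| / (|A| + |B|) on multiclass masks.
import numpy as np

def dice_per_class(gt: np.ndarray, pred: np.ndarray, label: int) -> float:
    a, b = gt == label, pred == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumed convention: class absent in both masks
    return 2.0 * np.logical_and(a, b).sum() / denom

gt = np.zeros((32, 32, 32), dtype=np.uint8)
gt[10:20, 10:20, 10:20] = 4                 # ground-truth blob for label 4
pred = np.zeros_like(gt)
pred[12:20, 10:20, 10:20] = 4               # slightly shifted prediction
print(f"Dice for label 4: {dice_per_class(gt, pred, 4):.3f}")
```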

Beyond Segmentation I: Key CoW Component Detection.

The first beyond segmentation metric was the average F1 score, which is the harmonic mean of the precision and recall, for detection of Acom, Pcoms, and 3rd-A2. Positive detection was defined as at least 25% intersection over union between the predicted and ground truth masks.
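
A sketch of this detection criterion follows; the per-case TP/FP/FN bookkeeping is a plausible reading of the metric, not the verbatim challenge implementation:

```python
# Minimal sketch: a key component counts as detected when predicted and
# ground-truth masks overlap with IoU >= 0.25; F1 then aggregates the counts.
import numpy as np

def iou(gt: np.ndarray, pred: np.ndarray, label: int) -> float:
    a, b = gt == label, pred == label
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def detection_outcome(gt, pred, label, thr=0.25):
    """Classify one case as 'TP', 'FP', 'FN', or 'TN' for one component."""
    in_gt, in_pred = bool((gt == label).any()), bool((pred == label).any())
    if in_gt and in_pred and iou(gt, pred, label) >= thr:
        return "TP"
    if in_pred and not in_gt:
        return "FP"
    if in_gt:
        return "FN"  # missed entirely, or overlap below the 25% IoU threshold
    return "TN"      # component absent in both masks

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```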

Beyond Segmentation II: CoW Variant Classification.

The second beyond segmentation metric was variant balanced accuracy (VarBalAcc) for CoW variant graph classification. The VarBalAcc was calculated for both anterior and posterior variants. The variant class was determined using the AV and PV edge-lists of the variant graph, as shown in Fig. 1c. The segmentation mask was converted to an edge-list based on the presence of the Acom, Pcoms, and 3rd-A2 labels, and on whether the ACA and PCA were connected to the relevant neighbouring ICA and BA labels for the A1 and P1 edges.
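
A simplified sketch of this mask-to-edge-list conversion for the anterior variant is given below; the label ids are hypothetical, and checking connectivity via a one-voxel binary dilation is a simplifying assumption:

```python
# Minimal sketch: derive the anterior edge-list (L-A1, Acom, 3rd-A2, R-A1)
# from a multiclass CoW mask. Label ids are assumed for illustration.
import numpy as np
from scipy.ndimage import binary_dilation

L_ICA, L_ACA, R_ICA, R_ACA, ACOM, THIRD_A2 = 1, 2, 3, 4, 5, 6  # assumed ids

def touches(mask: np.ndarray, label_a: int, label_b: int) -> bool:
    """True if the two labels are adjacent (one-voxel dilation overlaps)."""
    return bool(np.logical_and(binary_dilation(mask == label_a), mask == label_b).any())

def anterior_edge_list(mask: np.ndarray) -> tuple:
    return (
        int(touches(mask, L_ACA, L_ICA)),  # L-A1: L-ACA connected to L-ICA
        int((mask == ACOM).any()),         # Acom label present
        int((mask == THIRD_A2).any()),     # 3rd-A2 label present
        int(touches(mask, R_ACA, R_ICA)),  # R-A1: R-ACA connected to R-ICA
    )
```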

The separate CoW variant classification task required the algorithms to output the variant graph directly instead of a segmentation mask. The CoW variant classification task shared the same evaluation metric as the second beyond segmentation metric, which was the VarBalAcc for both anterior and posterior variants.

Ranking.

The algorithms were evaluated on the internal test sets using all the metrics, and their rank positions for each metric were averaged to reach a ranking for the leaderboards. To evaluate the ranking stability, we also created 10 bootstraps of the internal test sets and calculated the rankings on the bootstrapped test sets. We refer to the top performing teams on the averaged rank for CTA or MRA tracks as “top teams”. The top teams were further evaluated on the external multi-center test sets for generalizability. Details on the ranking analysis can be found in Supplementary S9.
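
The rank-then-aggregate scheme with bootstrapping can be sketched as follows; the scores and team names are made up, and ties are ignored for brevity:

```python
# Minimal sketch of metric-wise rank averaging with bootstrapped test sets.
import numpy as np

rng = np.random.default_rng(0)
teams = ["A", "B", "C"]
# scores[metric][team, case]; higher is better in this toy example
scores = rng.uniform(0.7, 1.0, size=(4, len(teams), 70))

def mean_rank(case_idx: np.ndarray) -> np.ndarray:
    """Average each team's rank position over all metrics on the given cases."""
    per_metric = scores[:, :, case_idx].mean(axis=2)           # (metric, team)
    ranks = (-per_metric).argsort(axis=1).argsort(axis=1) + 1  # 1 = best
    return ranks.mean(axis=0)

print(dict(zip(teams, mean_rank(np.arange(70)))))  # full test set
for _ in range(10):                                # 10 bootstrap samples
    print(dict(zip(teams, mean_rank(rng.integers(0, 70, size=70)))))
```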

Clinical Application I: Fetal PCA Classification.

We extracted diameters and centerlines of the CoW segmentation masks using the workflow from [33], with more details provided in Supplementary S10. The diameters along the Pcom and P1 segments were used to determine the CoW anatomical variant called the fetal PCA variant [1]. We compared the diameters of the Pcom and P1 at the 25th percentile. If the Pcom was slightly larger in diameter (≥ 1.05x the diameter of P1), the CoW was classified as having a fetal PCA variant type. Fetal PCA was assessed separately for the left and right sides. The same set of 40 TopCoW CTA cases used in the inter-rater agreement for variant classification was also labeled for the fetal PCA class by the senior neurosurgeon (P.B.), who labeled the fetal PCA by visually inspecting the images. The fetal PCA labels from P.B. were treated as ground truth.
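
A minimal sketch of this heuristic is given below, assuming per-segment diameter samples from the centerline-extraction workflow of [33]; the handling of absent segments is our assumption:

```python
# Minimal sketch of the fetal PCA heuristic: compare 25th-percentile diameters
# of the ipsilateral Pcom and P1; fetal-type if Pcom >= 1.05x P1.
import numpy as np

def is_fetal_pca(pcom_diams: np.ndarray, p1_diams: np.ndarray) -> bool:
    if pcom_diams.size == 0:
        return False  # assumed: no Pcom on this side, cannot be fetal-type
    if p1_diams.size == 0:
        return True   # assumed: Pcom present but P1 absent -> fetal-type
    return np.percentile(pcom_diams, 25) >= 1.05 * np.percentile(p1_diams, 25)

# Dummy diameter samples in mm along the two segments
print(is_fetal_pca(np.array([2.1, 2.3, 2.0]), np.array([1.6, 1.8, 1.7])))  # True
```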

Clinical Application II: Locating Aneurysm.

We selected 12 patients with intracranial aneurysms along their CoW vessels from the aforementioned external LargeIA dataset, which included aneurysm ground truth annotations. The aneurysm locations were then reviewed and labeled by a senior neurosurgeon (Y.M.). CoW segmentation algorithms were applied to the images of the aneurysm patients, and the resulting CoW predictions were overlaid with the provided aneurysm ground truth. The aneurysm’s location was determined by identifying the CoW labels adjacent to or overlapping with the aneurysm mask.
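
A minimal sketch of this localization step is shown below; using a one-voxel dilation of the aneurysm mask to capture adjacent labels is a simplifying assumption:

```python
# Minimal sketch: report the CoW labels that overlap with, or sit next to,
# the aneurysm mask in the overlaid prediction.
import numpy as np
from scipy.ndimage import binary_dilation

def aneurysm_location(cow_pred: np.ndarray, aneurysm: np.ndarray) -> set:
    """CoW label ids touching the (dilated) aneurysm mask; 0 = background."""
    near = binary_dilation(aneurysm > 0)
    labels = set(np.unique(cow_pred[near]).tolist())
    labels.discard(0)
    return labels
```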

3. Results

3.1. Progression of Submissions

Our two iterations of the TopCoW challenge received more than 250 registrations from participants across six continents. Over 25 teams made submissions to the benchmark. Notably, five teams participated in both years. As shown in Fig. 3a, by comparing the performance of these five teams on the 34 patients common to both years’ test sets, we observed a marked improvement in their results, especially for the CTA modality. While the increase in training data might have contributed to the performance improvement, two findings from Fig. 3a revealed that the progress came more from algorithmic innovation. Firstly, team ‘DKFZ’ was the only team that did not make any major changes to their algorithm, and the increased training data had little effect on their performance, especially on the CTA modality. Secondly, teams like ‘UZH’, ‘NIC-VICOROB’, and ‘junqiangchen’ that made major design changes to their algorithms achieved more drastic improvements. Thus, the performance improvement was largely propelled by new methodological breakthroughs.

Fig. 3.

Progression of TopCoW submissions and winning strategies. (a) Performance of the five teams that participated in both years on the same 34 patients in both years’ test sets. The metrics shown are class-average Dice, variant-balanced accuracy (VarBalAcc) of the anterior and posterior CoW variant classifications, and average F1 score for detection of Acom, Pcoms, and 3rd-A2. Upper rows are for the CTA track. Bottom rows are for the MRA track. (b) Key characteristics of the segmentation algorithms from the top teams in alphabetical team name order. (c) Two methodological breakthroughs in 2023 got picked up by many more teams in the following year.

Fig. 3b summarizes the key design choices of the segmentation algorithms from the 2024 top teams. All top teams used single-modality input even though the TopCoW challenge provided a pair of test images from both modalities as input for the internal test sets. All top teams trained with both modalities in a mixed-modality training pool, thus making their algorithms modality-agnostic. Only one team, ‘CLAIM’, used additional training data prepared independently, which included some external MRA test images without our ground truth labels. There was a mix of strategies for the number of stages in the pipeline: more than half of the top teams went for a two-stage approach, such as a localization stage to crop the CoW ROI followed by a segmentation stage on the zoomed-in ROI. All based their network architecture on nnUNet [34]. All used cross-entropy (CE) and Dice loss. All but one used a topology-based loss. All but one employed topological optimizations in their methods. The topological optimizations came in three aspects: centerline or skeleton; connected components (CCs); and relations among the labels such as neighborhood adjacency. Most top teams considered at least one aspect of the topological optimizations.

Two methodological breakthroughs stood out between 2023 and 2024: significantly more teams trained their algorithms using mixed modality and topological optimizations. As shown in Fig. 3c, only one team (‘DKFZ’) trained their model with mixed modalities in 2023. The same team stood out by winning most benchmark tasks, inspiring seven other teams to adopt mixed-modality training in 2024. Only two teams (‘DKFZ’ and ‘HITSZ’) employed any topological optimizations in 2023. These two teams achieved good performance, and seven additional teams followed suit by incorporating topological optimizations into their models in 2024.

For a full description of the submitted algorithms and their teams, please refer to Supplementary materials S11.

3.2. Voxel-Level Multiclass Segmentation

We show qualitative segmentation results for the TopCoW internal test sets from one of the top teams in Fig. 4a. Two patients for each modality were selected, representing a wide range of CoW variants and class-average Dice scores. Predictions were able to segment various complex CoW anatomies accurately, capturing the curvatures of various vessels and the boundaries between classes. Patient 116 from TopCoW MRA had a lower class-average Dice score of 79% because the Acom was a false-positive detection, resulting in a 0% Dice for that label. For the TopCoW internal test sets, top teams had a median class-average Dice of around 90% for both CTA and MRA.

Fig. 4.

Voxel-level multiclass segmentation performance. (a) Qualitative results for the multiclass segmentation task. The ground truth was compared with predictions from team ‘UZH’ on four selected patients from the internal test data. Note the increasing class-average Dice scores for the selected four cases. (b) Class-average Dice scores from the top 6 teams on CTA and MRA internal and external test sets. Team ‘NIC-VICOROB’ is abbreviated as ‘NIC’. Team ‘CLAIM’ used additional training data which included some external MRA test images from Lausanne and IXI-HH without our ground truth labels. Team ‘IMR’ had five cases in IXI-HH with Dice below the shown range. Charts show violin plots, with the middle bar indicating the median.

Fig. 4b shows the segmentation performance in class-average Dice by the top 6 teams on both internal and external test sets. The top teams were able to generalize to the external test sets for both modalities, with above 80% median Dice for all test sets. The results for the other voxel-level metrics also showed good performance and a similar generalization pattern to the Dice metric. Detailed results of all the 2024 teams for both internal and external test datasets can be found in Supplementary S12 and S13. We also report the inference time per test image for the best algorithms on the external test set in Supplementary S14.

3.3. Beyond Segmentation I: Key CoW Component Detection

The presence and absence of four CoW components directly determine many of the CoW variant types. The four key CoW components are the communicating arteries (Acom, R-Pcom and L-Pcom) and the 3rd-A2 segment. Correct detection of these components is a desirable and clinically relevant feature of segmentation algorithms. We investigated the detection performance for the aforementioned four key CoW components on all test sets in Fig. 5a. F1 scores, i.e. the harmonic mean of precision and recall, from the top 6 teams of each modality are shown in boxplots. Detection of the communicating arteries was consistent across test sets and modalities, with top teams achieving F1 scores above 75% for most of the test sets. The 3rd-A2 had poorer detection performance, with CTA datasets above 50% and MRA datasets above 65% F1 scores for most teams. The 3rd-A2 is a rarer vessel component, and its smaller sample size in the CTA test sets (6/70, 1/26, and 2/20 occurrences) and the MRA test sets (7/70, 2/20, and 2/20 occurrences) likely contributed to the less consistent detection results.

Fig. 5.

Beyond segmentation performance. (a) Detection of key CoW components for all test datasets by the top 6 teams from each modality. F1 scores from the six teams were displayed in box plots. (b) Four teams submitted both classification-based and segmentation-based algorithms for classifying the CoW variants. Classification-based algorithms were trained to predict CoW variant classes directly. Segmentation-based algorithms were purely for CoW segmentation and later evaluated for CoW variant classification performance. (c) CoW variant classification performance from the top 6 segmentation teams on internal and external test sets. Anterior and posterior VarBalAcc scores from the top 6 teams were displayed in box plots.

3.4. Beyond Segmentation II: CoW Variant Classification

CoW variant classification has long been a highly demanded task driven by its great clinical potential. In fact, concurrent with the first iteration of the TopCoW challenge, another CoW-related challenge, the CROWN challenge [21], was held at MICCAI 2023 with a main task on CoW variant classification. Inspired by the CROWN challenge, in our second iteration in 2024 we created a classification task where participants submitted algorithms that output CoW variant classes directly. Interestingly, the best performing algorithms for the classification task came from the segmentation algorithms with an extra postprocessing step to convert the segmentation masks to the CoW variant edge-lists. Notably, four teams took part in both the segmentation task and the classification task with segmentation-based and classification-based algorithms, respectively. Fig. 5b shows that the segmentation-based algorithms outperformed the classification-based methods by a factor of at least 2x on the internal test sets. The classification-based approaches listed here had explored various strategies ranging from graph learning to self-attention, but they all trained the algorithms as classification models that optimized for the variant classifier. On the other hand, the segmentation-based approaches focused solely on the multiclass CoW segmentation. The performance gap between segmentation-based and classification-based algorithms prompted us to focus on the top segmentation algorithms for the external test sets.

Fig. 5c shows the VarBalAcc for both anterior and posterior variants on all test sets from the top 6 segmentation teams. The top teams were able to generalize well to the external MRA datasets: on average, both the anterior and posterior VarBalAcc were around 80% for the MRA test sets. The CTA datasets showed good generalizability except for the posterior variants of the LargeIA dataset. This is likely because two PV classes in LargeIA had small sample sizes comprising difficult cases: PV-1011 had only one case and PV-1110 had only two cases (see Supplementary Table S3), and they were wrongly classified by most teams, resulting in a low balanced accuracy for the posterior variants. Apart from the posterior VarBalAcc performance on LargeIA, the CTA datasets also had fairly good classification performance and generalizability, with above 70% VarBalAcc on average.

3.5. Clinical Application I: Fetal PCA Classification

One potential clinical use of the CoW segmentation model is to classify the fetal PCA variant. The fetal PCA variant plays an important role in surgical planning and the interpretation of perfusion imaging [1]. Neurosurgeons and neuroradiologists routinely classify patients as having fetal PCA or not based on CTA and MRA findings. Here, we evaluated the clinical applicability of CoW segmentation algorithms for fetal PCA classification. As shown in Fig. 6a, diameters along the ipsilateral Pcom and P1 segmentation masks, if any, were used to predict the fetal PCA class. This simple heuristic allowed us to convert segmentation masks by top teams into fetal PCA labels. Fig. 6b shows that the segmentation outputs from the top teams were able to accurately classify fetal PCA variants, with around 80% and above precision and recall for both fetal L-PCA and fetal R-PCA.

Fig. 6.

Two clinical applications of CoW segmentation. (a) Segmentation masks of the ipsilateral Pcom and P1 segments were used to determine whether a fetal PCA class was present on that side. The upper row shows CTA images overlaid with segmentation masks related to fetal PCA. The bottom row shows a zoomed-in view of the segmentation masks. The examples shown illustrate the fetal R-PCA classification. (b) Fetal PCA classification performance in terms of precision and recall scores from the top 4 teams. (c) The upper row shows CTA images with aneurysms indicated by red arrowheads. The bottom row shows the aneurysm masks in silver overlaid with predicted CoW segmentation masks from team ‘UZH’. The “Location in CoW Pred” lists the predicted CoW labels that the aneurysm overlapped with or was adjacent to in the overlay.

3.6. Clinical Application II: Locating Aneurysm

Another important potential clinical use is locating aneurysms with the CoW vessel segments as reference. Clinicians routinely need to locate intracranial aneurysms relative to the CoW anatomy in MRA and CTA images, as aneurysms occur most frequently along the CoW vessels. Here, we evaluated the ability of the CoW segmentation models to help locate intracranial aneurysms automatically. 12 aneurysm patients with various aneurysm locations were segmented by the top 4 teams. Fig. 6c shows the aneurysm ground truth overlaid with the CoW segmentation predictions from team ‘UZH’ for four representative patients. CoW vessel labels that overlapped with or were adjacent to the aneurysm were used to describe the location of the aneurysm. Team ‘UZH’ was able to correctly locate the aneurysms of all 12 patients in relation to the CoW vessel segments. Teams ‘NIC-VICOROB’ and ‘CLAIM’ correctly located 11/12 patients, and team ‘IMR’ located 10/12 patients; the mistakes were minor and explainable. One common mistake concerned an aneurysm from patient Tr0004 which was located on the R-ACA but very near where the left and right ACAs touched, causing the location of this aneurysm to be wrongly predicted as adjacent to both ACAs. Another mistake was for patient Tr0019, who had an aneurysm on the ICA near where the Pcom tends to originate, and thus a part of the aneurysm was wrongly predicted as Pcom. Overall, the CoW segmentation predictions from the top teams were robust against the presence of large aneurysms and could be applied to locate the aneurysm for most of the patients. We highlight that these results also showed the robustness and generalizability of the best CoW algorithms, given that there were no aneurysm cases in the training data and the aneurysm patients came from an external test dataset. Detailed results for locating the 12 aneurysm patients can be found in Supplementary S15.

4. Discussion

4.1. Benefits of Mixed Modality Training for CTA

Training with mixed modalities was an effective strategy, particularly for the CTA modality. This strategy was pioneered by team ‘DKFZ’ in the first iteration of our challenge and was picked up by all top teams in the second iteration. The improvement was especially obvious for the more difficult CTA modality. Between MRA and CTA, CTA tended to have lower metric scores even when both modalities had the same amount of training data. This could be due to the extra veins and bones surrounding the CoW that are not seen in MRA, as well as the less detailed brain soft tissue in the background. CTA benefited much more from the mixed-modality training strategy, with a bigger improvement in performance than MRA. Given that the annotation effort was also lighter for MRA images, investing in an easier-to-annotate modality such as MRA can be a good way to reduce the annotation burden and boost the performance for the CTA modality. This is a key finding that can help solve other CTA segmentation tasks in the future.

4.2. Importance of Topological Optimizations

Topological optimizations enabled CoW segmentation algorithms to be used in topology-dependent downstream clinical tasks, such as CoW variant classification and fetal PCA classification. These downstream applications require the segmented vessels to capture key topological properties of the underlying anatomy such as centerline, connected components, and adjacency relation. A wide range of topological optimizations were used by the top submissions. Two of these methods were newly developed by two teams while taking part in the first iteration of our challenge, with our challenge being one of the first venues to test them. Both methods were centerline-based loss functions that improve vessel segmentation, and they were the “skeleton recall (SkelRecall) loss” [35] by team ‘DKFZ’ and “centerline boundary Dice (cbDice) loss” [36] by team ‘HITSZ’. SkelRecall loss was subsequently picked up and used by three other top teams, ‘CLAIM’, ‘NIC-VICOROB’, and ‘UZH’ in the second iteration in 2024. Topology optimization techniques from other medical scenarios were used as well: Team ‘IMR’ built upon a loss previously designed to optimize topology for lung airway segmentation [37]. Top teams also optimized other aspects of topology such as joining disconnected components, removing small isolated components, and handling adjacency relation. Collectively, the various topological optimizations allowed the CoW segmentation to have improved connectivity and centerline, and more accurate topology in general, to be effectively applied to topology-dependent downstream tasks like the CoW variant classification and fetal PCA classification.

4.3. Comparison with CROWN Results

Since both the CROWN and TopCoW challenges included a CoW variant classification task, we compare the performance of our algorithms with those reported in the CROWN challenge. Based on the merged common set of variants, the top two teams from the CROWN challenge achieved 24-30% anterior and 28-50% posterior balanced accuracy, whereas the top two teams from the TopCoW MRA submissions achieved 82-89% anterior and 75-82% posterior balanced accuracy. The main difference was that the CROWN challenge did not provide any CoW segmentation annotations, and the task was formulated purely as an image classification problem. Similar observations were made in our challenge in Fig. 5b, where we found the segmentation-based algorithms that focused on the CoW multiclass segmentation task performed much better on the variant classification task than classification-based algorithms that optimized for the classification task. Solving the CoW multiclass segmentation task can lead to better solutions for downstream tasks like CoW variant classification.

4.4. Explainability via Segmentation

CoW segmentation algorithms not only can solve downstream tasks, but also provide explainability to the solutions—a feature that is important in clinical settings. We showed that the best CoW segmentation algorithms could be used to effectively detect key CoW components, classify CoW variants and fetal-type PCA, and locate intracranial aneurysms. But more importantly, the intermediate steps to transform the segmentation masks into relevant downstream outputs were fully transparent and interpretable, in contrast to black-box approaches. When clinicians need a “confidence” score on why a certain prediction was made for the detection, classification, or localization by the model, they can easily interpret and explain the results by simply inspecting the CoW segmentation prediction.

4.5. VR to Handle Complex Anatomy

As one of the first challenges to use VR-generated annotations at scale, we believe this challenge has successfully shown that a VR-based annotation/verification workflow can overcome the otherwise prohibitively time-consuming annotation process for a complex multiclass anatomical segmentation problem. The depth dimension as viewed in 3D in VR offered efficient and powerful annotation/verification capabilities, which proved uniquely suitable for tortuous curvilinear structures like the CoW vessels, with their complicated spatial orientations and inter-vessel relations. VR enabled us to quickly and accurately produce annotations and check predictions for even complex CoW anatomies and rare variants. This allowed us to prepare a densely-annotated large-scale dataset that covered many clinically relevant CoW anatomies.

4.6. Limitations and Future Work

Due to the heterogeneity of the CoW anatomy, not all CoW variants were included in our annotation scheme. In the future, we can expand the multiclass labels to accommodate the rare CoW variants excluded from our dataset, which will make the trained models applicable to a broader range of the population and more robust.

Our existing CoW variant graph can be further sub-divided into more fine-grained variant types involving hypoplasticity and vessel diameters, such as whether the A1 segment of the ACA or the P1 segment of the PCA is hypoplastic [38]. This can be done by applying the same workflow used in our fetal PCA classification, where we extracted the diameters along the centerlines of the P1 and Pcom.

5. Conclusion

The two iterations of the TopCoW challenge attracted over 250 registered participants from six continents, resulting in over 25 submitted algorithms. We evaluated the performance on internal and external test sets from multiple centers. The top segmentation algorithms showed good performance on multiclass CoW segmentation, detection of key CoW components, and classification of CoW variants, and generalized well to new test data. We conducted additional evaluations on the ability of the segmentation models to classify fetal PCA and locate intracranial aneurysms, with results showing promising potential for clinical applications.

TopCoW has demonstrated the power, potential, and versatility of CoW multiclass segmentation for a wide range of tasks beyond segmentation. TopCoW released the first dataset of paired CTA and MRA with annotations, and thus enabled some of the first anatomical segmentation models for the CoW, especially for the CTA modality. TopCoW led to several methodological insights on how best to solve the CoW anatomical segmentation and variant classification tasks, and it catalyzed a few algorithmic breakthroughs from the participants. TopCoW was the first challenge to use a VR-based annotation workflow, which was crucial in preparing the multiclass annotations. As a first benchmark for such a CoW segmentation task, TopCoW gathered strong baseline results for further algorithm development and comparison. The annotated datasets and the best performing Docker submissions have been released in Zenodo records for public access.

Ultimately we want to solve real clinical problems, and one of them is to prototype an automated CoW characterization tool for diagnosis, screening, and treatment. An accurate characterization of the CoW is of great clinical relevance, and we hope the TopCoW challenge has piqued the interest of the community in this worthwhile endeavor.

Supplementary Material


Acknowledgments

The challenge is supported by the Digitalization Initiative of the Zurich Higher Education Institutions (DIZH) and the Helmut Horten Foundation. We thank Hrvoje Bogunović for helpful discussions and suggestions during the early planning stages. We also thank Nathan Spencer and Michael Morehead from syGlass for the technical assistance for the VR setup, and James Meakin and Chris van Run from grand-challenge.org for the technical support for the challenge infrastructure.

Ynte Ruigrok has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 852173). Hakim Baazaoui received funding from the Koetser Foundation and the “Young Talents in Clinical Research” program of the SAMS and of the G. & J. Bangerter-Rhyner Foundation.

Team 2i_mtl was supported by grants from the Quebec Bio-Imaging Network (Project No. 21.24) and start-up funds from the Centre de Recherche du CHUM and the Departement de radiologie, radio-oncologie et medecine nucleaire, Universite de Montreal/Bayer. Laurent Letourneau-Guillon is supported by a Clinical Research Scholarship-Junior 1 Salary Award (311203) from the Fonds de Recherche du Quebec en Sante and Fondation de l’Association des Radiologistes du Quebec. Team CLAIM acknowledges funding from the German Federal Ministry of Education and Research (ANONYMED Project, coordinator DF). Computation has been performed on the HPC for Research cluster of the Berlin Institute of Health. They also acknowledge the contribution of the MRCLEAN investigators in providing access to data from the MRCLEAN trial. Team DKFZ was supported by the Helmholtz Association under the joint research school “HIDSS4Health - Helmholtz Information and Data Science School for Health”, and part of their work was funded by Helmholtz Imaging (HI), a platform of the Helmholtz Incubator on Information and Data Science. Team EURECOM was partially funded by the French government, through the 3IA Cote d’Azur Investments in the Future project managed by the ANR (ANR-19-P3IA-0002), and by the ANR JCJC project I-VESSEG (22-CE45-0015-01). Team IMR was supported in part by the National Key R&D Program of China (Grant Number: 2022ZD0212400), the Natural Science Foundation of China (Grant Number: 62373243), the Science and Technology Commission of Shanghai Municipality, China (Grant Number: 20DZ2220400), and the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0102). Team IWantToGoToCanada was supported by the National Research Foundation (NRF-2020M3E5D2A01084892), Institute for Basic Science (IBS-R015-D1), ITRC support program (IITP-2023-2018-0-01798), AI Graduate School Support Program (2019-0-00421), ICT Creative Consilience program (IITP-2023-2020-0-01821), and the Artificial Intelligence Innovation Hub program (2021-0-02068). Team lWM acknowledges the Polish HPC infrastructure PLGrid support (No. PLG/2023/016239). Team NantesU was partially supported by the French ANR project “eCAN” and INSERM CoPoC #MAT-PI-22155-A-01 (RVF23037NSA). Team NIC-VICOROB was supported by the Ministerio de Ciencia e Innovacion (DPI2020-114769RB-I00) as well as by ICREA under the ICREA Academia programme. Members of the 2024 team received support from the PID2020-114769RBI00 and PID2023-146187OB-I00 projects funded by the Ministerio de Ciencia, Innovación y Universidades. Team Pamaad: P. Casademunt is supported by the European Union’s Horizon 2020 grant agreement No. 101136438 (GEMINI project), by the Agència de Gestió d’Ajuts Universitaris i de Recerca (grant No. 2024 FI-1 00419), and by the Maria de Maeztu grant of excellence. A. Galdran is supported by grant RYC2022-037144-I, funded by MCIN/AEI/10.13039/501100011033 and by FSE+. They would like to thank the HPC team from ZHAW, particularly Pascal Häussler and Stefan Weber, for their generous allocation of computational resources and technical assistance. Team UB-VTL acknowledges the computational resources provided by the Center of Computational Research (CCR) at the University of Buffalo. Team UW was supported by the United States National Institutes of Health (grants R01HL162743 and R00HL136883).


Data Availability

The TopCoW training data of 250 annotated images and our multi-center test sets of 86 annotated images are released in our public Zenodo repository (15 GB) at https://zenodo.org/records/15692630.

Code Availability

The Docker images from best performing teams and the scripts to help run them locally are released in our public Zenodo repository (45 GB) at https://zenodo.org/records/15665435.

For code availability of individual team submissions, please see Supplementary S11.

The implementation of our evaluation metric code is open sourced at https://github.com/CoWBenchmark/TopCoW_Eval_Metrics.

References

  • [1] Osborn A. G., Osborn's Brain: Imaging, Pathology, and Anatomy, Amirsys, 2013.
  • [2] Liebeskind D. S., Collateral circulation, Stroke 34 (2003) 2279–2284.
  • [3] Chuang Y.-M., Chan L., Lai Y.-J., Kuo K.-H., Chiou Y.-H., Huang L.-W., Kwok Y.-T., Lai T.-H., Lee S.-P., Wu H.-M., et al., Configuration of the circle of Willis is associated with less symptomatic intracerebral hemorrhage in ischemic stroke patients treated with intravenous thrombolysis, Journal of Critical Care 28 (2013) 166–172.
  • [4] van Seeters T., Hendrikse J., Biessels G. J., Velthuis B. K., Mali W. P., Kappelle L. J., van der Graaf Y., Group S. S., Completeness of the circle of Willis and risk of ischemic stroke in patients without cerebrovascular disease, Neuroradiology 57 (2015) 1247–1251.
  • [5] Kim K. M., Kang H.-S., Lee W. J., Cho Y. D., Kim J. E., Han M. H., Clinical significance of the circle of Willis in intracranial atherosclerotic stenosis, Journal of Neurointerventional Surgery 8 (2016) 251–255.
  • [6] Rinaldo L., McCutcheon B. A., Murphy M. E., Bydon M., Rabinstein A. A., Lanzino G., Relationship of A1 segment hypoplasia to anterior communicating artery aneurysm morphology and risk factors for aneurysm formation, Journal of Neurosurgery 127 (2016) 89–95.
  • [7] Yang F., Li H., Wu J., Li M., Chen X., Jiang P., Li Z., Cao Y., Wang S., Relationship of A1 segment hypoplasia with the radiologic and clinical outcomes of surgical clipping of anterior communicating artery aneurysms, World Neurosurgery 106 (2017) 806–812.
  • [8] Banga P. V., Varga A., Csobay-Novák C., Kolossváry M., Szántó E., Oderich G. S., Entz L., Sótonyi P., Incomplete circle of Willis is associated with a higher incidence of neurologic events during carotid eversion endarterectomy without shunting, Journal of Vascular Surgery 68 (2018) 1764–1771.
  • [9] Krabbe-Hartkamp M. J., Van der Grond J., De Leeuw F., de Groot J. C., Algra A., Hillen B., Breteler M., Mali W., Circle of Willis: morphologic variation on three-dimensional time-of-flight MR angiograms, Radiology 207 (1998) 103–111.
  • [10] Iqbal S., A comprehensive study of the anatomical variations of the circle of Willis in adult human brains, Journal of Clinical and Diagnostic Research: JCDR 7 (2013) 2423.
  • [11] Bullitt E., Zeng D., Gerig G., Aylward S., Joshi S., Smith J. K., Lin W., Ewend M. G., Vessel tortuosity and brain tumor malignancy: a blinded study, Academic Radiology 12 (2005) 1232–1240.
  • [12] IXI, IXI dataset - brain development, https://brain-development.org/ixi-dataset/, 2022. Accessed: 2022-09-30.
  • [13] CAS2023, Cerebral artery segmentation challenge (CAS) 2023, https://codalab.lisn.upsaclay.fr/competitions/9804, 2023. Accessed: 2023-10-01.
  • [14] Chatterjee S., Mattern H., Dörner M., Sciarra A., Dubost F., Schnurre H., Khatun R., Yu C.-C., Hsieh T.-L., Tsai Y.-S., et al., SMILE-UHURA challenge – small vessel segmentation at mesoscopic scale from ultra-high resolution 7T magnetic resonance angiograms, arXiv preprint arXiv:2411.09593 (2024).
  • [15] Mou L., Yan Q., Lin J., Zhao Y., Liu Y., Ma S., Zhang J., Lv W., Zhou T., Frangi A. F., et al., COSTA: A multi-center TOF-MRA dataset and a style self-consistency network for cerebrovascular segmentation, IEEE Transactions on Medical Imaging (2024).
  • [16] Bogunović H., Pozo J. M., Cárdenes R., San Román L., Frangi A. F., Anatomical labeling of the circle of Willis using maximum a posteriori probability estimation, IEEE Transactions on Medical Imaging 32 (2013) 1587–1599.
  • [17] Robben D., Sunaert S., Thijs V., Wilms G., Maes F., Suetens P., Anatomical labeling of the circle of Willis using maximum a posteriori graph matching, in: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013: 16th International Conference, Nagoya, Japan, September 22-26, 2013, Proceedings, Part I, Springer, 2013, pp. 566–573.
  • [18] Robben D., Türetken E., Sunaert S., Thijs V., Wilms G., Fua P., Maes F., Suetens P., Simultaneous segmentation and anatomical labeling of the cerebral vasculature, Medical Image Analysis 32 (2016) 201–215.
  • [19] Chen L., Hatsukami T., Hwang J.-N., Yuan C., Automated intracranial artery labeling using a graph neural network and hierarchical refinement, in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part VI, Springer, 2020, pp. 76–85.
  • [20] Hong S.-W., Song H.-N., Choi J.-U., Cho H.-H., Baek I.-Y., Lee J.-E., Kim Y.-C., Chung D., Chung J.-W., Bang O.-Y., et al., Automated in-depth cerebral arterial labelling using cerebrovascular vasculature reframing and deep neural networks, Scientific Reports 13 (2023) 3255.
  • [21] Vos I. N., Ruigrok Y. M., Bennink E., Velthuis M. R., Paic B., Ophelders M. E., Buser M. A., van der Velden B. H., Chen G., Coupet M., et al., Evaluation of techniques for automated classification and artery quantification of the circle of Willis on TOF-MRA images: The CROWN challenge, Medical Image Analysis (2025) 103650.
  • [22] Dumais F., Caceres M. P., Janelle F., Seifeldine K., Arès-Bruneau N., Gutierrez J., Bocti C., Whittingstall K., eICAB: A novel deep learning pipeline for circle of Willis multiclass segmentation and analysis, NeuroImage 260 (2022) 119425.
  • [23] Hilbert A., Rieger J., Madai V. I., Akay E. M., Aydin O. U., Behland J., Khalil A. A., Galinovic I., Sobesky J., Fiebach J., et al., Anatomical labeling of intracranial arteries with deep learning in patients with cerebrovascular disease, Frontiers in Neurology 13 (2022) 1000914.
  • [24] OpenData Swiss, Terms of use — OpenData Swiss, https://opendata.swiss/en/terms-of-use, 2024. Accessed: 2024-03-01.
  • [25] Kaltenecker D., Al-Maskari R., Negwer M., Hoeher L., Kofler F., Zhao S., Todorov M., Rong Z., Paetzold J. C., Wiestler B., Piraud M., Rueckert D., Geppert J., Morigny P., Rohm M., Menze B. H., Herzig S., Berriel Diaz M., Ertürk A., Virtual reality-empowered deep-learning analysis of brain cells, Nature Methods (2024) 1–10.
  • [26] de la Rosa E., Su R., Reyes M., Wiest R., Riedel E. O., Kofler F., Yang K., Baazaoui H., Robben D., Wegener S., et al., ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data, arXiv preprint arXiv:2408.10966 (2024).
  • [27] Riedel E. O., de la Rosa E., Baran T. A., Petzsche M. H., Baazaoui H., Yang K., Robben D., Seia J. O., Wiest R., Reyes M., et al., ISLES 2024: The first longitudinal multimodal multi-center real-world dataset in (sub-)acute stroke, arXiv preprint arXiv:2408.11142 (2024).
  • [28].Bo Z.-H., Large ia segmentation dataset, 2021. URL: https://doi.org/10.5281/zenodo.6801398. doi: 10.5281/zenodo.6801398. [DOI] [Google Scholar]
  • [29].Bo Z.-H., Qiao H., Tian C., Guo Y., Li W., Liang T., Li D., Liao D., Zeng X., Mei L., et al. , Toward human intervention-free clinical diagnosis of intracranial aneurysm via deep neural network, Patterns 2 (2021) 100197. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Noto T. D., Marie G., Tourbier S., Alemán-Gómez Y., Esteban O., Saliou G., Cuadra M. B., Hagmann P., Richiardi J., ”lausanne tof-mra aneurysm cohort”, 2022. doi:doi: 10.18112/openneuro.ds003949.v1.0.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Di Noto T., Marie G., Tourbier S., Alemán-Gómez Y., Esteban O., Saliou G., Cuadra M. B., Hagmann P., Richiardi J., Towards automated brain aneurysm detection in tof-mra: open data, weak labels, and anatomical knowledge, Neuroinformatics 21 (2023) 21–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [32].Shit S., Paetzold J. C., Sekuboyina A., Ezhov I., Unger A., Zhylka A., Pluim J. P., Bauer U., Menze B. H., clDice-a novel topology-preserving loss function for tubular structure segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 16560–16569. [Google Scholar]
  • [33].Musio F., Yang K., Shit S., Prabhakar C., Juchler N., Menze B., Hirsch S., Quantitative evaluation of the circle of willis vascular architecture in 3d ct and mr angiography, in: 8th International Conference on Computational and Mathematical Biomedical Engineering (CMBE24), Arlington, VA, USA, 24-26 June 2024, volume 2, Computational & Mathematical Biomedical Engineering, 2024, pp. 563–566. [Google Scholar]
  • [34].Isensee F., Jaeger P. F., Kohl S. A., Petersen J., Maier-Hein K. H., nnunet: a self-configuring method for deep learning-based biomedical image segmentation, Nature Methods 18 (2021). [DOI] [PubMed] [Google Scholar]
  • [35].Kirchhoff Y., Rokuss M. R., Roy S., Kovacs B., Ulrich C., Wald T., Zenk M., Vollmuth P., Kleesiek J., Isensee F., et al. , Skeleton recall loss for connectivity conserving and resource efficient segmentation of thin tubular structures, in: European Conference on Computer Vision, Springer, 2024, pp. 218–234. [Google Scholar]
  • [36].Shi P., Hu J., Yang Y., Gao Z., Liu W., Ma T., Centerline boundary dice loss for vascular segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2024, pp. 46–56. [Google Scholar]
  • [37].Zhang M., Gu Y., Towards connectivity-aware pulmonary airway segmentation, IEEE Journal of Biomedical and Health Informatics 28 (2023) 321–332. [DOI] [PubMed] [Google Scholar]
  • [38].Westphal L. P., Lohaus N., Winklhofer S., Manzolini C., Held U., Steigmiller K., Hamann J. M., El Amki M., Dobrocky T., Panos L. D., et al. , Circle of willis variants and their association with outcome in patients with middle cerebral artery-m1-occlusion stroke, European Journal of Neurology 28 (2021) 3682–3691. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [39].Li X., Morgan P. S., Ashburner J., Smith J., Rorden C., The first step for neuroimaging data analysis: Dicom to nifti conversion, Journal of Neuroscience Methods 264 (2016) 47–56. [DOI] [PubMed] [Google Scholar]
  • [40].Wasserthal J., Breit H.-C., Meyer M. T., Pradella M., Hinck D., Sauter A. W., Heye T., Boll D. T., Cyriac J., Yang S., et al. , Totalsegmentator: Robust segmentation of 104 anatomic structures in ct images, Radiology: Artificial Intelligence 5 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Schimke N., Hale J., Quickshear defacing for neuroimages, in: Proceedings of the 2nd USENIX Conference on Health Security and Privacy, HealthSec’11, USENIX Association, USA, 2011, p. 11. [Google Scholar]
  • [42].Isensee F., Schell M., Pflueger I., Brugnara G., Bonekamp D., Neuberger U., Wick A., Schlemmer H.-P., Heiland S., Wick W., et al. , Automated brain extraction of multisequence mri using artificial neural networks, Human Brain Mapping 40 (2019) 4952–4964. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Bouthillier A., Van Loveren H. R., Keller J. T., Segments of the internal carotid artery: a new classification, Neurosurgery 38 (1996) 425–433. [DOI] [PubMed] [Google Scholar]
  • [44].Cheng B., Girshick R., Dollár P., Berg A. C., Kirillov A., Boundary iou: Improving object-centric image segmentation evaluation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 15334–15342. [Google Scholar]
  • [45].Reinke A., Tizabi M. D., Baumgartner M., Eisenmann M., Heckmann-Nötzel D., Kavur A. E., Rädsch T., Sudre C. H., Acion L., Antonelli M., et al. , Understanding metric-related pitfalls in image analysis validation, Nature methods 21 (2024) 182–194. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [46].Meyer-Spradow J., Ropinski T., Mensmann J., Hinrichs K., Voreen: A rapid-prototyping environment for ray-casting-based volume visualizations, IEEE Computer Graphics and Applications 29 (2009) 6–13. [DOI] [PubMed] [Google Scholar]
  • [47].Oktay O., Schlemper J., Folgoc L. L., Lee M., Heinrich M., Misawa K., Mori K., McDonagh S., Hammerla N. Y., Kainz B., et al. , Attention u-net: Learning where to look for the pancreas, arXiv preprint arXiv:1804.03999 (2018). [Google Scholar]
  • [48].T. M. Consortium, Project monai, 10.5281/zenodo.4323059, 2020. Zenodo. [DOI] [Google Scholar]
  • [49].Cardoso M. J., Li W., Brown R., Ma N., Kerfoot E., Wang Y., Murrey B., Myronenko A., Zhao C., Yang D., et al. , Monai: An open-source framework for deep learning in healthcare, arXiv preprint arXiv:2211.02701 (2022). [Google Scholar]
  • [50].Celaya A., Riviere B., Fuentes D., A generalized surface loss for reducing the hausdorff distance in medical imaging segmentation, arXiv preprint arXiv:2302.03868 (2023). [Google Scholar]
  • [51].Jocher G., Qiu J., Chaurasia A., Ultralytics YOLO, 2023. URL: https://github.com/ultralytics/ultralytics. [Google Scholar]
  • [52].Oquab M., Darcet T., Moutakanni T., Vo H., Szafraniec M., Khalidov V., Fernandez P., Haziza D., Massa F., El-Nouby A., et al. , Dinov2: Learning robust visual features without supervision, arXiv preprint arXiv:2304.07193 (2023). [Google Scholar]
  • [53].Dutta P., Bose S., Roy S. K., Mitra S., Are vision xlstm embedded unet more reliable in medical 3d image segmentation?, arXiv (2024). URL: https://arxiv.org/abs/2406.16993. [Google Scholar]
  • [54].Myronenko A., 3d mri brain tumor segmentation using autoencoder regularization, in: International MICCAI brainlesion workshop, Springer, 2018, pp. 311–320. [Google Scholar]
  • [55].Galati F., Falcetta D., Cortese R., Casolla B., Prados F., Burgos N., Zuluaga M. A., A2v: A semi-supervised domain adaptation framework for brain vessel segmentation via two-phase training angiography-to-venography translation, arXiv preprint arXiv:2309.06075 (2023). [Google Scholar]
  • [56].Billot B., Greve D., Puonti O., Thielscher A., Van Leemput K., Fischl B., Dalca A., Iglesias J., Synthseg: Segmentation of brain mri scans of any contrast and resolution without retraining, Medical Image Analysis 86 (2023) 102789. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [57].Roy S., Koehler G., Ulrich C., Baumgartner M., Petersen J., Isensee F., Jaeger P. F., Maier-Hein K. H., Mednext: transformer-driven scaling of convnets for medical image segmentation, International Conference on Medical Image Computing and Computer-Assisted Intervention (MIC-CAI) (2023) 405–415. [Google Scholar]
  • [58].Lee H. H., Bao S., Huo Y., Landman B. A., 3d ux-net: A large kernel volumetric convnet modernizing hierarchical transformer for medical image segmentation, arXiv preprint arXiv:2209.15076 (2022). [Google Scholar]
  • [59].Shi P., Guo X., Yang Y., Ye C., Ma T., Nextou: Efficient topology-aware u-net for medical image segmentation, arXiv preprint arXiv:2305.15911 (2023). [Google Scholar]
  • [60].Zhang M., You X., Zhang H., Gu Y., Topology-aware exploration of circle of willis for cta and mra: Segmentation, detection, and classification, arXiv preprint arXiv:2410.15614 (2024). [Google Scholar]
  • [61].Hatamizadeh A., Nath V., Tang Y., Yang D., Roth H. R., Xu D., Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images, in: International MICCAI BrainLesion Workshop, Springer, 2021, pp. 272–284. [Google Scholar]
  • [62].Milletari F., Navab N., Ahmadi S.-A., V-net: Fully convolutional neural networks for volumetric medical image segmentation, in: 2016 fourth International Conference on 3D Vision (3DV), IEEE, 2016, pp. 565–571. [Google Scholar]
  • [63].Autrusseau F., Nader R., Nouri A., L’Allinec V., Bourcier R., Toward a 3d arterial tree bifurcation model for intra-cranial aneurysm detection and segmentation, in: 2022 26th International Conference on Pattern Recognition (ICPR), IEEE, 2022, pp. 4500–4506. [Google Scholar]
  • [64].Nader R., Autrusseau F., L’Allinec V., Bourcier R., A vascular synthetic model for improved aneurysm segmentation and detection via deep neural networks, arXiv preprint arXiv:2403.18734 (2024). [Google Scholar]
  • [65].Acebes C., Moustafa A. H., Camara O., Galdran A., The centerlinecross entropy loss for vessel-like structure segmentation: Better topology consistency without sacrificing accuracy, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2024, pp. 710–720. [Google Scholar]
  • [66].Liu Z., Ning J., Cao Y., Wei Y., Zhang Z., Lin S., Hu H., Video swin transformer, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 3202–3211. [Google Scholar]
  • [67].Karras T., Laine S., Aila T., A style-based generator architecture for generative adversarial networks, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 4401–4410. [Google Scholar]
  • [68].Redmon J., Divvala S., Girshick R., Farhadi A., You only look once: Unified, real-time object detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788. [Google Scholar]
  • [69].Hilbert A., Madai V. I., Akay E. M., Aydin O. U., Behland J., Sobesky J., Galinovic I., Khalil A. A., Taha A. A., Wuerfel J., et al. , Brave-net: fully automated arterial brain vessel segmentation in patients with cerebrovascular disease, Frontiers in artificial intelligence 3 (2020) 552258. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [70].Muschelli J., A publicly available, high resolution, unbiased ct brain template, in: Information Processing and Management of Uncertainty in Knowledge-Based Systems: 18th International Conference, IPMU 2020, Lisbon, Portugal, June 15–19, 2020, Proceedings, Part III 18, Springer, 2020, pp. 358–366. [Google Scholar]
  • [71].Warfield S. K., Zou K. H., Wells W. M., Simultaneous truth and performance level estimation (staple): an algorithm for the validation of image segmentation, IEEE transactions on medical imaging 23 (2004) 903–921. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [72].Baumgartner M., Jäger P. F., Isensee F., Maier-Hein K. H., nndetection: a self-configuring method for medical object detection, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part V 24, Springer, 2021, pp. 530–539. [Google Scholar]
  • [73].Drees D., Scherzinger A., Hägerling R., Kiefer F., Jiang X., Scalable robust graph and feature extraction for arbitrary vessel networks in large volumetric datasets, BMC bioinformatics 22 (2021) 346. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data



Data Availability Statement

The TopCoW training data of 250 annotated images and our multi-center test sets of 86 annotated images are released in our public Zenodo repository (15 GB) at https://zenodo.org/records/15692630.
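For orientation, the record's file listing can be fetched programmatically through Zenodo's public REST API, and the NIfTI volumes can be read with standard tooling. The sketch below is a minimal example, not part of the release itself: it assumes the requests and nibabel packages, and the annotation file name shown is hypothetical.

    import requests
    import nibabel as nib  # third-party: pip install requests nibabel

    # List the files attached to the Zenodo record via its public REST API.
    record = requests.get("https://zenodo.org/api/records/15692630").json()
    for f in record["files"]:
        print(f["key"], f["links"]["self"])  # file name and its download URL

    # After downloading and unpacking the archive, a labeled volume can be
    # inspected with nibabel; the file name below is a hypothetical example.
    img = nib.load("topcow_ct_whole_001_label.nii.gz")
    print(img.shape, img.header.get_zooms())  # voxel grid and spacing in mm
    labels = img.get_fdata().astype(int)      # integer labels of the annotated CoW components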

The Docker images from the best-performing teams and the scripts to help run them locally are released in our public Zenodo repository (45 GB) at https://zenodo.org/records/15665435.
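As a minimal sketch of running one of the released containers (the archive name, image tag, and host paths below are hypothetical placeholders; the bundled helper scripts document each team's actual interface):

    import subprocess

    # Load a team's exported image into the local Docker daemon, then run
    # inference with input/output folders bind-mounted into the container.
    subprocess.run(["docker", "load", "-i", "topcow_team_algorithm.tar.gz"], check=True)
    subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", "/data/cta_input:/input:ro",  # angiography scans to segment
            "-v", "/data/cow_output:/output",   # predicted multiclass CoW masks
            "topcow_team_algorithm:latest",
        ],
        check=True,
    )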

For the code availability of individual team submissions, please see Supplementary S11.

The implementation of our evaluation metrics is open-sourced at https://github.com/CoWBenchmark/TopCoW_Eval_Metrics.
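The repository contains the challenge's official implementation. Purely for orientation, the per-class Dice score it reports reduces to the textbook overlap formula, sketched here with NumPy (the function name and the NaN convention for absent classes are ours, not the repository's):

    import numpy as np

    def class_dice(pred: np.ndarray, gt: np.ndarray, label: int) -> float:
        """Textbook Dice overlap for a single CoW component label."""
        p, g = (pred == label), (gt == label)
        denom = p.sum() + g.sum()
        # Conventions for a class absent from both masks differ; returning NaN
        # lets absent classes be skipped when averaging across components.
        return 2.0 * np.logical_and(p, g).sum() / denom if denom else float("nan")

    # Example: average over the component labels present in the ground truth.
    # scores = [class_dice(pred, gt, c) for c in np.unique(gt) if c != 0]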

