NPJ Precision Oncology. 2023 Aug 31;7:83. doi: 10.1038/s41698-023-00432-6

Artificial intelligence in ovarian cancer histopathology: a systematic review

Jack Breen 1, Katie Allen 2, Kieran Zucker 3, Pratik Adusumilli 2,4, Andrew Scarsbrook 2,4, Geoff Hall 3, Nicolas M Orsi 2, Nishant Ravikumar 1
PMCID: PMC10471607  PMID: 37653025

Abstract

This study evaluates the quality of published research using artificial intelligence (AI) for ovarian cancer diagnosis or prognosis using histopathology data. A systematic search of PubMed, Scopus, Web of Science, Cochrane CENTRAL, and WHO-ICTRP was conducted up to May 19, 2023. Inclusion criteria required that AI was used for prognostic or diagnostic inferences in human ovarian cancer histopathology images. Risk of bias was assessed using PROBAST. Information about each model was tabulated and summary statistics were reported. The study was registered on PROSPERO (CRD42022334730) and PRISMA 2020 reporting guidelines were followed. Searches identified 1573 records, of which 45 were eligible for inclusion. These studies contained 80 models of interest, including 37 diagnostic models, 22 prognostic models, and 21 other diagnostically relevant models. Common tasks included treatment response prediction (11/80), malignancy status classification (10/80), stain quantification (9/80), and histological subtyping (7/80). Models were developed using 1–1375 histopathology slides from 1–776 ovarian cancer patients. A high or unclear risk of bias was found in all studies, most frequently due to limited analysis and incomplete reporting regarding participant recruitment. Limited research has been conducted on the application of AI to histopathology images for diagnostic or prognostic purposes in ovarian cancer, and none of the models have been demonstrated to be ready for real-world implementation. Key aspects to accelerate clinical translation include transparent and comprehensive reporting of data provenance and modelling approaches, and improved quantitative evaluation using cross-validation and external validations. This work was funded by the Engineering and Physical Sciences Research Council.

Subject terms: Translational research, Ovarian cancer, Cancer imaging

Introduction

Ovarian cancer is the eighth most common malignancy in women worldwide1. It is notoriously difficult to detect and diagnose, with ineffective screening2 and non-specific symptoms similar to those caused by menopause3. Encompassing primary malignant tumours of the ovaries, fallopian tubes, and peritoneum, the disease has often started to spread within the abdomen at the time of diagnosis (FIGO4 Stage 3). This typical late stage at diagnosis makes ovarian cancer a particularly deadly disease, with the 314,000 new cases diagnosed each year translating to 207,000 deaths per year globally1.

Most ovarian cancers are carcinomas (cancers of epithelial origin) which predominantly fall into five histological subtypes: high-grade serous, low-grade serous, clear cell, endometrioid, and mucinous. Non-epithelial ovarian cancers are much less common and include germ cell, sex cord-stromal, and mesenchymal tumours. Ovarian cancer subtypes differ morphologically and prognostically and have varying treatment options5. High-grade serous carcinoma is the most common form of ovarian cancer, accounting for approximately 70% of all cases6.

Histopathology, the examination of tissue specimens at the cellular level, is the gold standard for ovarian cancer diagnosis. Pathologists typically interpret tissue stained with haematoxylin and eosin (H&E), though interpretation can be a subjective, time-consuming process, with some tasks having a high level of inter-observer variation7–9. In the assessment of difficult cases, general pathologists may seek assistance from subspecialty gynaecological pathology experts, and/or use ancillary tests, such as immunohistochemistry (IHC). Referrals and ancillary testing can be essential to the accuracy of the diagnostic process but come at the cost of making it longer and more expensive. Worldwide, pathologists are in much greater demand than supply, with significant disparities in the number of pathologists between countries10, and with better-supplied countries still unable to meet demand11.

Traditionally, pathologists have analysed glass slides using a light microscope. However, the implementation of a digital workflow, where pathologists review scanned whole slide images (WSIs) using a computer, is becoming more common. While digital pathology uptake has likely been driven by efficiency benefits12, it has created an opportunity for the development of automated tools to assist pathologists. These tools often aim to improve the accuracy, efficiency, objectivity, and consistency of diagnosis. Such tools could help to alleviate the global workforce shortage of pathologists, increasing diagnostic throughput and reducing the demand for referrals and ancillary tests. This is an increasingly active area of research13 and, for some malignancies, these systems are starting to achieve clinical utility14.

In this study, we systematically reviewed all literature in which artificial intelligence (AI) techniques (comprising both traditional machine learning (ML) and deep learning methods) were applied to digital pathology images for the diagnosis or prognosis of ovarian cancer. This included research that focused on a single diagnostic factor such as histological subtype and studies that performed computer-aided diagnostic tasks such as tumour segmentation. The review characterises the state of the field, describing which diagnostic and prognostic tasks have been addressed, and assessing factors relevant to the clinical utility of these methods, such as the risks of bias. Despite ovarian cancer being a particularly difficult disease to detect and diagnose, and the shortage of available pathologists, AI models have not yet been implemented in clinical practice for this disease. This review aims to provide insights and recommendations based on published literature to improve the clinical utility of future research, including reducing risks of bias, improving reproducibility, and increasing generalisability.

Results

As shown in Fig. 1, the literature searches returned a total of 1573 records, of which 557 were duplicates. Nine hundred and thirty records were excluded during the screening of titles and abstracts, and 41 were excluded based on full paper screening, including 3 records for which full articles could not be obtained. The remaining 45 studies were included in the review, of which 11 were conference papers and 34 were journal papers. All accepted studies were originally identified through searches of research databases, with no records from trial registries meeting the inclusion criteria. While the searches returned literature from as early as 1949, all of the research which met the inclusion criteria was published since 2010, with over 70% of the included literature published since 2020. Study characteristics are shown in Table 1. The 45 accepted articles contained 80 models of interest, details of which are shown in Table 2.
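The record counts in the paragraph above can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the figures quoted in that paragraph; the variable names are illustrative and are not taken from the review.

```python
# Record counts as reported in the Results text (PRISMA 2020 flow).
records_identified = 1573
duplicates_removed = 557
excluded_on_title_abstract = 930
excluded_on_full_text = 41  # includes 3 reports that could not be obtained

# Each stage removes records from the pool carried over from the previous stage.
screened = records_identified - duplicates_removed
full_text_assessed = screened - excluded_on_title_abstract
included = full_text_assessed - excluded_on_full_text

# The included studies are split into conference and journal papers.
conference_papers, journal_papers = 11, 34

print(screened, full_text_assessed, included)  # 1016 86 45
```

The final count matches the 45 included studies, and the conference and journal papers sum to the same total.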

Fig. 1. PRISMA 2020 flowchart.

PRISMA 2020 flowchart of the study identification and selection process for the systematic review. Records were screened on titles and abstracts alone, and reports were assessed based on the full-text content. CENTRAL Central Register of Controlled Trials. WHO-ICTRP World Health Organisation International Clinical Trial Registry Platform.

Table 1.

Characteristics of the 45 studies included in this systematic review.

Publication | Ref. | Ovarian cancer data source | Models of interest | Outcome type | Model outcomes | Published code
Dong 2010 | 49 | Unclear | 1 | Other | Stain segmentation | None
Dong 2010 | 50 | Unclear | 1 | Other | Stain segmentation | None
Signolle 2010 | 51 | Unclear | 1 | Other | Tumour segmentation | None
Janowczyk 2011 | 52 | Unclear | 1 | Diagnosis | Malignancy | None
Janowczyk 2012 | 53 | Unclear | 1 | Other | Stain segmentation | None
Kothari 2012 | 18 | TCGA-OV (Multi-city, USA) | 1 | Diagnosis | Malignancy | None
Poruthoor 2013 | 21 | TCGA-OV (Multi-city, USA) | 2 | Diagnosis, prognosis | Grade; overall survival | None
BenTaieb 2015 | 29 | Transcanadian Study (Multi-city, Canada) | 1 | Diagnosis | Histological subtype | None
BenTaieb 2016 | 30 | Transcanadian Study (Multi-city, Canada) | 1 | Diagnosis | Histological subtype | Inaccessible
BenTaieb 2017 | 48 | Unclear | 1 | Diagnosis | Histological subtype | Inaccessible
Lorsakul 2017 | 66 | Unclear | 1 | Other | Cell type | None
Du 2018 | 40 | Unique (Oklahoma, USA) | 1 | Other | Tissue type | None
Heindl 2018 | 57 | TCGA-OV (Multi-city, USA) | 1 | Other | Cell type | https://yuanlab.org/file/Ov3sweave2.pdf
Kalra 2020 | 15 | TCGA-OV (Multi-city, USA) | 4 | Diagnosis | Primary cancer type | None
Levine 2020 | 26 | OVCARE (Vancouver, Canada) | 1 | Diagnosis | Histological subtype | https://github.com/AIMLab-UBC/pathGAN
Yaar 2020 | 22 | TCGA-OV (Multi-city, USA) | 1 | Prognosis | Treatment response | https://github.com/asfandasfo/LUPI
Yu 2020 | 19 | TCGA-OV (Multi-city, USA) | 4 | Diagnosis, prognosis | Malignancy, grade, transcriptomic subtype; treatment response | https://github.com/khyu/ovarian_ca/
Gentles 2021 | 55 | Unique (Newcastle, UK) | 6 | Other | Stain quantity/intensity | None
Ghoniem 2021 | 23 | TCGA-OV (Multi-city, USA) | 1 | Diagnosis | Stage | None
Jiang 2021 | 31 | Mayo Clinic (Rochester, USA) | 1 | Diagnosis | Malignancy | https://github.com/smujiang/CellularComposition
Laury 2021 | 56 | Unique (Helsinki, Finland) | 1 | Prognosis | Progression-free survival | None
Paijens 2021 | 37 | Unique (Groningen & Zwolle, The Netherlands) | 1 | Other | Tissue type | None
Shin 2021 | 38 | TCGA-OV (Multi-city, USA) + Unique (Ajou, Korea) | 1 | Diagnosis | Malignancy | https://github.com/ABMI/HistopathologyStyleTransfer
Zeng 2021 | 24 | TCGA-OV (Multi-city, USA) + Unique (Shanghai, China) | 5 | Diagnosis, prognosis | Genetic mutation, transcriptomic subtype, microsatellite instability; overall survival | None
Boehm 2022 | 17 | TCGA-OV (Multi-city, USA) + MSKCC (New York, USA) | 3 | Diagnosis, prognosis | Malignancy; overall survival, progression-free survival | https://github.com/kmboehm/onco-fusion
Boschman 2022 | 27 | OVCARE (Vancouver, Canada) | 1 | Diagnosis | Histological subtype | None
Elie 2022 | 61 | Unique (Caen, France) | 3 | Other | Stain quantity/intensity | None
Farahani 2022 | 28 | OVCARE (Vancouver, Canada) + Unique (Calgary, Canada) | 2 | Diagnosis | Malignancy, histological subtype | https://github.com/AIMLab-UBC/ModernPath2022
Hu 2022 | 41 | TCGA-OV (Multi-city, USA) | 1 | Diagnosis | Epithelial–mesenchymal transition | https://github.com/superhy/LCSB-MIL
Jiang 2022 | 32 | Mayo Clinic (Rochester, USA) | 4 | Diagnosis, other | Tumour–stroma reaction; tumour segmentation | https://github.com/smujiang/TumorStromaReaction
Kasture 2022 | 46 | TCGA-OV^a (Multi-city, USA) | 1 | Diagnosis | Histological subtype | https://github.com/kokilakasture/OvarianCancerPrediction
Kowalski 2022 | 47 | Unclear | 1 | Other | Tumour segmentation | None
Lazard 2022 | 42 | TCGA-OV (Multi-city, USA) | 1 | Diagnosis | Homologous recombination deficiency status | https://github.com/trislaz/wsi_mil
Liu 2022 | 20 | TCGA-OV (Multi-city, USA) | 1 | Prognosis | Overall survival | https://github.com/RanSuLab/EOCprognosis
Mayer 2022 | 39 | TCGA-OV (Multi-city, USA) + Unique (Frankfurt, Germany) | 1 | Diagnosis | Malignancy | None
Nero 2022 | 44 | Unique (Rome, Italy) | 2 | Diagnosis, prognosis | Genetic mutation; relapse | None
Salguero 2022 | 67 | TCGA-OV (Multi-city, USA) | 1 | Diagnosis | Malignancy | None
Wang 2022 | 33 | Tri-Service (Taipei, Taiwan) | 4 | Prognosis | Treatment response | None
Wang 2022 | 34 | Tri-Service (Taipei, Taiwan) | 1 | Prognosis | Treatment response | None
Yokomizo 2022 | 43 | Unique (Tokyo, Japan) | 3 | Prognosis | Overall survival, progression-free survival, relapse | Inaccessible
Ho 2023 | 36 | MSKCC (New York, USA) | 2 | Diagnosis, other | Genetic mutation; tumour segmentation | https://github.com/MSKCC-Computational-Pathology/DMMN-ovary
Meng 2023 | 16 | Unique (Beijing, China) | 1 | Diagnosis | Malignancy | https://github.com/dreambamboo/STT-BOX-public
Ramasamy 2023 | 54 | TCGA-OV^a (Multi-city, USA) | 2 | Diagnosis, other | Primary cancer type; tumour segmentation | None
Wang 2023 | 35 | Tri-Service (Taipei, Taiwan) | 4 | Prognosis | Treatment response | https://github.com/cwwang1979/OvaryTreatment_AnginPKM2VEGF
Wu 2023 | 45 | TCGA-OV (Multi-city, USA) | 1 | Prognosis | Overall survival | None

Details are shown for individual models in Table 2. Six data sources are used in multiple studies—The Cancer Genome Atlas (TCGA-OV)25, the British Columbia Ovarian Cancer Research Program (OVCARE), The Transcanadian Study72, and three individual centres (Mayo Clinic, Tri-Service, and Memorial Sloan Kettering Cancer Center (MSKCC)). Code is labelled as inaccessible where it could not be found despite a link being provided in the publication.

a Indicates papers where significant discrepancies were found regarding the data source, as described in the “Discussion”.

Table 2.

Characteristics of the 80 models of interest from the 45 papers included in this systematic review, grouped by model outcome.

[Table 2 is provided as three image files in the source publication: 41698_2023_432_Tab1_HTML.gif, 41698_2023_432_Tab2_HTML.gif, 41698_2023_432_Tab3_HTML.gif]

SVM support vector machine, CNN convolutional neural network, AUC area under the receiver operating characteristic (ROC) curve, HGSC high-grade serous carcinoma, LGSC low-grade serous carcinoma, CCC clear cell carcinoma, MC mucinous carcinoma, EC endometrioid carcinoma, H&E haematoxylin and eosin, IHC immunohistochemistry, TMA individual cores from tissue microarrays, WSI whole slide images of biopsy or resection specimens.

a Other data types are Genomics (G), Proteomics (P), Radiomics (R), and Transcriptomics (T).

Risk of bias assessment

The results of the PROBAST assessments are shown in Table 3. While some studies contained multiple models of interest, none of these contained models with different risk of bias scores for any section of the PROBAST assessment, so one risk of bias analysis is presented per paper. All models showed either a high overall risk of bias (37/45) or an unclear overall risk of bias (8/45). Every high-risk model had a high-risk score in the analysis section (37/45), with several also being at high risk for participants (6/45), predictors (11/45), or outcomes (13/45). Less than half of the studies achieved a low risk of bias in any domain (21/45), with most low risks being found in the outcomes (16/45) and predictors (9/45) sections. Nearly all of the papers had an unclear risk of bias in at least one domain, most commonly the participants (36/45) and predictors (25/45) domains. Qualitative summaries are presented in Fig. 2.
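The overall ratings in Table 3 are consistent with the usual PROBAST convention for combining domains: high if any domain is high, low only if every domain is low, and unclear otherwise. The sketch below encodes that convention as an assumption, since the review does not state the rule explicitly. The function name is hypothetical, and the three example rows are copied from Table 3.

```python
def overall_risk(participants, predictors, outcome, analysis):
    """Combine four PROBAST domain ratings into one overall rating.

    Assumed rule: any 'High' gives 'High'; all 'Low' gives 'Low';
    anything else is 'Unclear'.
    """
    ratings = [r.strip().lower() for r in (participants, predictors, outcome, analysis)]
    if "high" in ratings:
        return "High"
    if all(r == "low" for r in ratings):
        return "Low"
    return "Unclear"


# Domain ratings copied from Table 3 (participants, predictors, outcome, analysis).
table3_rows = {
    "Kothari 2012": ("Unclear", "Low", "Low", "Unclear"),
    "Laury 2021": ("Low", "High", "High", "High"),
    "Yokomizo 2022": ("Low", "Low", "Unclear", "Unclear"),
}

for paper, domains in table3_rows.items():
    print(paper, overall_risk(*domains))
```

Under this rule a single high-risk analysis domain is enough to make a study high risk overall, which is why the 37 studies with a high-risk analysis are also the 37 studies rated high overall.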

Table 3.

PROBAST risk of bias assessment results for the 45 papers included in this review.

Publication | Ref. | Participants | Predictors | Outcome | Analysis | Overall
Dong 2010 | 49 | High | High | High | High | High
Dong 2010 | 50 | High | High | High | High | High
Signolle 2010 | 51 | Unclear | Unclear | High | High | High
Janowczyk 2011 | 52 | Unclear | Unclear | Low | High | High
Janowczyk 2012 | 53 | Unclear | High | Unclear | High | High
Kothari 2012 | 18 | Unclear | Low | Low | Unclear | Unclear
Poruthoor 2013 | 21 | Unclear | High | High | High | High
BenTaieb 2015 | 29 | Unclear | Unclear | Low | High | High
BenTaieb 2016 | 30 | Unclear | High | Unclear | High | High
BenTaieb 2017 | 48 | Unclear | Unclear | Low | High | High
Lorsakul 2017 | 66 | Unclear | Unclear | High | High | High
Du 2018 | 40 | Unclear | Unclear | Unclear | Unclear | Unclear
Heindl 2018 | 57 | Unclear | Low | Low | High | High
Kalra 2020 | 15 | Unclear | Low | Low | High | High
Levine 2020 | 26 | Unclear | Low | Low | Unclear | Unclear
Yaar 2020 | 22 | Unclear | Unclear | Low | High | High
Yu 2020 | 19 | Unclear | Low | Low | High | High
Gentles 2021 | 55 | High | Unclear | High | High | High
Ghoniem 2021 | 23 | Unclear | Unclear | Unclear | High | High
Jiang 2021 | 31 | High | High | Unclear | High | High
Laury 2021 | 56 | Low | High | High | High | High
Paijens 2021 | 37 | Low | High | Unclear | High | High
Shin 2021 | 38 | Unclear | Unclear | Unclear | High | High
Zeng 2021 | 24 | Unclear | Unclear | Low | High | High
Boehm 2022 | 17 | Unclear | High | Unclear | High | High
Boschman 2022 | 27 | Unclear | Low | Low | High | High
Elie 2022 | 61 | Unclear | Low | High | High | High
Farahani 2022 | 28 | Unclear | Unclear | Low | Unclear | Unclear
Hu 2022 | 41 | Unclear | Unclear | Unclear | Unclear | Unclear
Jiang 2022 | 32 | Unclear | Unclear | High | High | High
Kasture 2022 | 46 | High | High | High | High | High
Kowalski 2022 | 47 | Unclear | Unclear | Unclear | High | High
Lazard 2022 | 42 | Unclear | Unclear | Unclear | Unclear | Unclear
Liu 2022 | 20 | Unclear | Unclear | Unclear | Unclear | Unclear
Mayer 2022 | 39 | Unclear | Unclear | High | High | High
Nero 2022 | 44 | Unclear | Low | High | High | High
Salguero 2022 | 67 | Unclear | Unclear | Low | High | High
Wang 2022 | 33 | Unclear | Unclear | Unclear | High | High
Wang 2022 | 34 | Unclear | Unclear | Low | High | High
Yokomizo 2022 | 43 | Low | Low | Unclear | Unclear | Unclear
Ho 2023 | 36 | Unclear | Unclear | Unclear | High | High
Meng 2023 | 16 | Unclear | Unclear | Low | High | High
Ramasamy 2023 | 54 | High | High | High | High | High
Wang 2023 | 35 | Unclear | Unclear | Unclear | High | High
Wu 2023 | 45 | Unclear | Unclear | Low | High | High

This is presented as one row for each paper because every paper that contained multiple models of interest was found to have the same risk of bias for every model.

Fig. 2. PROBAST risk of bias results.

PROBAST risk of bias results summarised for the 45 papers included in this review.

Data synthesis results

Data in included literature

The number of participants in internal datasets varied by orders of magnitude, with each study including 1–776 ovarian cancer patients, and one study including over 10,000 total patients across a range of 32 malignancies15. Most research only used data from the five most common subtypes of ovarian carcinoma, though one recent study included the use of sex cord-stromal tumours16. Only one study explicitly included any prospective data collection, and this was only for a small subset which was not used for external validation17.

As shown in Fig. 3, the number of pathology slides used was often much greater than the number of patients included, with three studies using over 1000 slides from ovarian cancer patients18–20. In most of the studies, model development samples were WSIs containing resected or biopsied tissue (34/45), with others using individual tissue microarray (TMA) core images (5/45) or pre-cropped digital pathology images (3/45). Most studies used H&E-stained tissue (33/45) and others used a variety of IHC stains (11/45), with no two papers reporting the use of the same IHC stains. Some studies included multi-modal approaches, using genomics17,21–24, proteomics21,24, transcriptomics24, and radiomics17 data alongside histopathological data.

Fig. 3. Number of patients and slides per model.

Histograms showing the number of (a) ovarian cancer patients and (b) ovarian cancer histopathology slides used in model development. Many of these values are uncertain due to incomplete reporting, as reflected in Table 2.

The most commonly used data source was The Cancer Genome Atlas (TCGA) (18/45), a project from which over 30,000 digital pathology images from 33 malignancies are publicly available. The ovarian cancer subset, TCGA-OV25, contains 1481 WSIs from 590 cases of ovarian serous carcinoma (mostly, but not exclusively, high-grade), with corresponding genomic, transcriptomic, and clinical data. This includes slides from eight data centres in the United States, with most slides containing frozen tissue sections (1374/1481) rather than formalin-fixed, paraffin-embedded (FFPE) sections. Other recurring data sources were the University of British Columbia Ovarian Cancer Research Program (OVCARE) repository26–28, the Transcanadian study29,30, and clinical records at the Mayo Clinic31,32, Tri-Service General Hospital33–35, and Memorial Sloan Kettering Cancer Center17,36. All other researchers either used a unique data source (12/45) or did not report the provenance of their data (8/45). TCGA-OV, OVCARE, and the Transcanadian study are all multi-centre datasets. Aside from these, few studies reported the use of multi-centre data17,24,28,37–39. Only two studies reported the use of multiple slide scanners, with every slide scanned on one of two available scanners27,28. The countries from which data were sourced included Canada, China, Finland, France, Germany, Italy, Japan, the Netherlands, South Korea, Taiwan, the United Kingdom, and the United States of America.

Methods in included literature

There was a total of 80 models of interest in the 45 included papers, with each paper containing 1–6 such models. There were 37 diagnostic models, 22 prognostic models, and 21 other models predicting diagnostically relevant information. Diagnostic model outcomes included the classification of malignancy status (10/37), histological subtype (7/37), primary cancer type (5/37), genetic mutation status (4/37), tumour-stroma reaction level (3/37), grade (2/37), transcriptomic subtype (2/37), stage (1/37), microsatellite instability status (1/37), epithelial-mesenchymal transition status (1/37), and homologous recombination deficiency status (1/37). Prognostic models included the prediction of treatment response (11/22), overall survival (6/22), progression-free survival (3/22), and recurrence (2/22). The other models performed tasks that could be used to assist pathologists in analysing pathology images, including measuring the quantity/intensity of staining, generating segmentation masks, and classifying tissue/cell types.

A variety of models were used, with the most common types being convolutional neural network (CNN) (41/80), support vector machine (SVM) (10/80), and random forest (6/80). CNN architectures included GoogLeNet [40], VGG16 [19,32], VGG19 [26,28], InceptionV3 [33–35,38], ResNet18 [17,27,28,39,41,42], ResNet34 [43], ResNet50 [16,44,45], ResNet182 [36], and MaskRCNN [32]. Novel CNNs typically used multiple standardised blocks involving convolutional, normalisation, activation, and/or pooling layers [22,46,47], with two studies also including attention modules [20,35]. One study generated their novel architecture by using a topology optimisation approach on a standard VGG16 [23].

Most researchers split their original images into patches to be separately processed, with patch sizes ranging from 60×60 to 2048×2048 pixels, the most common being 512×512 pixels (19/56) and 256×256 pixels (12/56). A range of feature extraction techniques were employed, including both hand-crafted/pre-defined features (23/80) and features that were automatically learned by the model (51/80). Hand-crafted features included a plethora of textural, chromatic, and cellular and nuclear morphological features. Hand-crafted features were commonly used as inputs to classical ML methods, such as SVM and random forest models. Learned features were typically extracted using a CNN, which was often also used for classification.
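To illustrate the patching step described above, the following minimal sketch enumerates the top-left corners of fixed-size patches over an image. It is not taken from any of the reviewed studies: the function name, the example image dimensions, and the choice to discard partial patches at the image edges are all assumptions made for illustration.

```python
def patch_grid(width, height, patch_size=512, stride=None):
    """Yield (x, y) top-left corners of square patches that fit inside the image.

    With stride equal to patch_size (the default) the patches do not overlap.
    Partial patches at the right and bottom edges are dropped (an assumption).
    """
    stride = stride or patch_size
    for y in range(0, height - patch_size + 1, stride):
        for x in range(0, width - patch_size + 1, stride):
            yield x, y


# Hypothetical 2048 x 1536 pixel region, tiled at the two most common patch sizes.
coords_512 = list(patch_grid(2048, 1536, patch_size=512))
coords_256 = list(patch_grid(2048, 1536, patch_size=256))
print(len(coords_512), len(coords_256))  # 12 48
```

Halving the patch side length quadruples the number of patches, which is one reason patch size has a large effect on the computational cost of whole slide image analysis.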

Despite the common use of patches, most models made predictions at the WSI level (29/80), TMA core level (18/80), or patient level (6/80), requiring aggregation of patch-level information. Two distinct aggregation approaches were used, one aggregating before modelling and one aggregating after modelling. The former approach requires the generation of slide-level features before modelling, the latter requires the aggregation of patch-level model outputs to make slide-level predictions. Slide-level features were generated using summation16, averaging21,24,36, attention-based weighted averaging20,41,42,44,45, concatenation15,30, as well as more complex embedding approaches using Fisher vector encoding29 and k-means clustering48. Patch-level model outputs were aggregated to generate slide-level predictions by taking the maximum22,35, median43, or average23, using voting strategies27,34, or using a random forest classifier28. These approaches are all examples of multiple instance learning (MIL), though few models of interest were reported using this terminology22,41,42,44.
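The two aggregation strategies described above can be sketched in a few lines of Python. The first function aggregates patch-level model outputs after modelling (maximum, median, average, or majority vote). The second aggregates patch-level features before modelling using a softmax-weighted average, in the style of attention-based pooling. In a real attention-based model the attention scores are learned; here they are supplied directly, and all function names and example values are hypothetical.

```python
import math
from statistics import mean, median


def aggregate_outputs(patch_probs, how="mean", threshold=0.5):
    """Aggregate patch-level probabilities into one slide-level score."""
    if how == "max":
        return max(patch_probs)
    if how == "median":
        return median(patch_probs)
    if how == "mean":
        return mean(patch_probs)
    if how == "vote":
        # Majority vote over thresholded patch predictions (ties count as negative).
        positives = sum(p >= threshold for p in patch_probs)
        return 1.0 if positives > len(patch_probs) - positives else 0.0
    raise ValueError(f"unknown aggregation: {how}")


def attention_pool(patch_features, attention_scores):
    """Softmax-weighted average of patch feature vectors (one slide-level vector)."""
    top = max(attention_scores)
    exps = [math.exp(s - top) for s in attention_scores]  # shifted for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(patch_features[0])
    return [sum(w * f[d] for w, f in zip(weights, patch_features)) for d in range(dim)]


# Made-up example: three patches from one slide.
probs = [0.1, 0.2, 0.9]
slide_max = aggregate_outputs(probs, "max")
slide_vote = aggregate_outputs(probs, "vote")
slide_vector = attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
```

With these values the maximum rule gives a positive slide-level score while the majority vote does not, showing that the choice of aggregation rule can change the slide-level prediction. Equal attention scores reduce attention pooling to a plain average.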

Most studies included segmentation at some stage, with many of these analysing tumour/stain segmentation as a model outcome32,36,37,47,49–54. Some other studies used segmentation to determine regions of interest for further modelling, either simply separating tissue from background15,18,44,45, or using tumour segmentation to select the most relevant tissue regions33–35,55,56. One study also used segmentation to detect individual cells for classification57. Some studies also used segmentation in determining hand-crafted features relating to the quantity and morphology of different tissues, cells, and nuclei17,18,21,24,30,31.

While attention-based approaches have been applied to other malignancies for several years58,59, they were only seen in the most recent ovarian cancer studies20,28,33–35,41,42,44,45, and none of the methods included self-attention, an increasingly popular method for other malignancies60. Most models were deterministic, though hidden Markov trees51, probabilistic boosting trees52, and Gaussian mixture models61 were also used. Aside from the common use of low-resolution images to detect and remove non-tissue areas, images were typically analysed at a single resolution, with only six papers including multi-magnification techniques in their models of interest. Four of these combined features from different resolutions for modelling29,30,36,48, and the other two used different magnifications for selecting informative tissue regions and for modelling33,34. Out of the papers for which it could be determined, the most common modelling magnifications were ×20 (35/41) and ×40 (7/41). Few models integrated histopathology data with other modalities (6/80). Multi-modal approaches included the concatenation of separately extracted uni-modal features before modelling21,23,24, the amalgamation of uni-modal predictions from separate models17, and a teacher–student approach where multiple modalities were used in model training but only histopathology data was used for prediction22.

Analysis in included literature

Analyses were limited, with less than half of the model outcomes being evaluated with cross-validation (39/80) and with very few externally validated using independent ovarian cancer data (7/80), despite small internal cohort sizes. Cross-validation methods included k-fold (22/39) with 3–10 folds, Monte Carlo (12/39) with 3–15 repeats, and leave-one-patient-out cross-validations (5/39). Some other papers included cross-validation on the training set to select hyperparameters but used only a small unseen test set from the same data source for evaluation. Externally validated models were all trained with WSIs, with validations either performed on TMA cores (2/7) or WSIs from independent data sources (5/7), with two of these explicitly using different scanners to digitise internal and external data27,28. Some reported methods were externally validated with data from non-ovarian malignancies, but none of these included ovarian cancer data in any capacity, so were not included in the review. However, there was one method which trained with only gastrointestinal tumour data and externally validated with ovarian tumour data16.
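The cross-validation schemes named above differ only in how the test sets are chosen. The sketch below generates k-fold and Monte Carlo splits using the Python standard library. Splitting by patient rather than by slide, so that slides from one patient never appear in both training and test sets, is an assumption made here and is not a detail reported for every study. All names and identifiers are hypothetical.

```python
import random


def k_fold_patient_splits(patient_ids, k=5, seed=0):
    """Yield (train, test) patient sets; every patient is tested exactly once."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for fold in folds:
        test = set(fold)
        yield set(ids) - test, test


def monte_carlo_patient_splits(patient_ids, repeats=10, test_fraction=0.2, seed=0):
    """Yield (train, test) patient sets from repeated independent random splits."""
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    n_test = max(1, round(len(ids) * test_fraction))
    for _ in range(repeats):
        test = set(rng.sample(ids, n_test))
        yield set(ids) - test, test


# Hypothetical cohort: 10 patients with two slides each (one ID per slide).
slide_patient_ids = [f"P{i}" for i in range(10)] * 2

k_fold = list(k_fold_patient_splits(slide_patient_ids, k=5))
monte_carlo = list(monte_carlo_patient_splits(slide_patient_ids, repeats=3))
# Leave-one-patient-out is k-fold with k equal to the number of patients.
leave_one_out = list(k_fold_patient_splits(slide_patient_ids, k=10))
```

In k-fold cross-validation the test sets partition the cohort, whereas Monte Carlo test sets are drawn independently and may overlap between repeats.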

Most classification models were evaluated using accuracy, balanced accuracy, and/or area under the receiver operating characteristic curve (AUC), with one exception where only a p-value was reported measuring the association between histological features and transcriptomic subtypes based on a Kruskal–Wallis test19. Some models were also evaluated using the F1-score, which we chose not to tabulate (in Table 2) as the other metrics were reported more consistently. Survival model performance was typically reported using AUC, with other metrics including p-value, accuracy, hazard ratios, and C-index, which is similar to AUC but can account for censoring. Segmentation models were almost all evaluated differently from each other, with different studies reporting AUC, accuracy, Dice coefficient, intersection over union, sensitivity, specificity, and qualitative evaluations. Regression models were all evaluated using the coefficient of determination (R²-statistic). For some models, performance was broken down per patient39,61, per subtype16, or per class15,24,32,57, without an aggregated, holistic measure of model performance.
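The relationship between AUC and the C-index can be made concrete with a small sketch. Both count correctly ordered pairs, but the C-index only counts pairs in which the patient with the shorter follow-up time had an observed event, which is how it accounts for censoring. The implementation below follows the standard pairwise definitions (Harrell's C for the C-index) and uses made-up data; it is not the evaluation code of any reviewed study.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def c_index(risk, time, event):
    """Harrell's C: among comparable pairs, how often the patient with the
    earlier observed event was assigned the higher predicted risk."""
    concordant = comparable = 0.0
    for i in range(len(time)):
        for j in range(len(time)):
            # A pair is comparable only if patient i had an observed event
            # (event == 1) strictly before patient j's last follow-up time.
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Made-up predictions for four patients.
scores = [0.9, 0.8, 0.3, 0.1]
example_auc = auc(scores, labels=[1, 0, 1, 0])
# Patient 2 is censored (event == 0) at time 5.
example_c = c_index(scores, time=[2, 5, 4, 8], event=[1, 0, 1, 1])
print(example_auc, example_c)  # 0.75 0.8
```

The censored patient never starts a comparable pair, but still contributes as the later member of pairs with patients whose events were observed earlier.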

The variability of model performance was not frequently reported (33/80), and when it was reported it was often incomplete. This included cases where it was unclear what the intervals represented (95% confidence interval, one standard deviation, variation, etc.), or not clear what the exact bounds of the interval were due to results being plotted but not explicitly stated. Within the entire review, there were only three examples in which variability was reported during external validation27,38,39, only one of which clearly reported both the bounds and the type of the interval38. No studies performed any Bayesian form of uncertainty quantification. Reported results are shown in Table 2, though direct comparisons between the performance of different models should be treated with caution due to the diversity of data and validation methods used to evaluate different models, the lack of variability measures, the consistently high risks of bias, and the heterogeneity in reported metrics.
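One way to avoid the ambiguity described above is to report the point estimate, the interval bounds, and the interval type together. The sketch below summarises a set of per-fold scores with the mean, the sample standard deviation, and a percentile bootstrap 95% confidence interval for the mean. The scores are made up, the function name is hypothetical, and the percentile bootstrap is only one of several reasonable choices of interval.

```python
import random
from statistics import mean, stdev


def summarise(values, n_boot=2000, seed=0):
    """Return the mean, sample SD, and a labelled 95% bootstrap interval."""
    rng = random.Random(seed)
    # Resample the scores with replacement and record the mean of each resample.
    boot_means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lower = boot_means[int(0.025 * n_boot)]
    upper = boot_means[int(0.975 * n_boot) - 1]
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "ci95": (lower, upper),
        "ci_type": "percentile bootstrap of the mean",
    }


# Made-up AUC values from five cross-validation folds.
fold_aucs = [0.78, 0.81, 0.75, 0.84, 0.80]
summary = summarise(fold_aucs)
print(f"AUC {summary['mean']:.3f} (SD {summary['sd']:.3f}; "
      f"95% CI {summary['ci95'][0]:.3f} to {summary['ci95'][1]:.3f}, {summary['ci_type']})")
```

Stating the interval type in the output removes the need for readers to guess whether a reported range is a confidence interval or a standard deviation.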

Discussion

The vast majority of published research on AI for diagnostic or prognostic purposes in ovarian cancer histopathology was found to be at a high risk of bias due to issues within the analyses performed. Researchers often used a limited quantity of data and conducted analyses on a single train-test data split without using any methods to account for overfitting and model optimism (cross-validation, bootstrapping, external validation). These limitations are common in gynaecological AI research using other data types, with recent reviews pointing to poor clinical utility caused by predominantly retrospective studies using limited data62,63 and limited methodologies with weak validation, which risk model performance being overestimated64,65.

The more robust analyses included one study in which several relevant metrics were evaluated using 10 repeats of Monte Carlo cross-validation on a set of 406 WSIs, with standard deviations reported for each metric26. Other positive examples included the use of both internal cross-validation and external validation for the same outcome, giving a more rigorous analysis28,34,39. While external validations were uncommon, those which were conducted offered a real insight into model generalisability, with a clear reduction in performance on all external validation sets except one28. The only study which demonstrated high generalisability included the largest training set out of all externally validated approaches, included more extensive data labelling than many similar studies, and implemented a combination of three colour normalisation approaches, indicating that these factors may benefit generalisability.

Studies frequently had an unclear risk of bias within the participants and predictors domains of PROBAST due to incomplete reporting. Frequently missing information included where the patients were recruited, how many patients were included, how many samples/images were used, whether any patients/images were excluded, and the methods by which tissue was processed and digitised. Reporting was often poor regarding open-access datasets. Only three papers were found to be at low risk of bias for participants, with these including clear and reasonable patient recruitment strategies and selection criteria, which can be seen as positive examples for other researchers37,43,56. Information about the predictors (histopathology images and features derived from them) was generally better reported, but still often missed key details, which meant that it was unclear whether all tissue samples were processed similarly to avoid risks of bias from visual heterogeneity. When patient characteristics were reported, they often showed a high risk of bias. Many studies included very small numbers of patients with specific differences from the majority (e.g. fewer than 20 patients with a different cancer subtype to the majority), causing a risk of spurious correlations and results which are not generalisable to the wider population.

Reporting was particularly sparse in studies which used openly accessible data, possibly indicating that AI-focused researchers were not taking sufficient time to understand these datasets and ensure their research was clinically relevant. For example, many of the researchers who used TCGA data included frozen tissue sections without commenting on whether this was appropriate, despite the fact that pathologists do not consider them to be of optimal diagnostic quality. One paper handled TCGA data more appropriately, with a clear explanation of the positives and negatives of the dataset, and entirely separate models for FFPE and frozen tissue slides15.

Sharing code can help to mitigate the effects of incomplete reporting and drastically improve reproducibility, but only 19 of the 45 papers did this, with some of these appearing to be incomplete or inaccessible. The better code repositories included detailed documentation to aid reproducibility, including environment set-up information16,19, overviews of included functions17,36,42, and code examples used to generate reported results57.

Two papers were found to have major discrepancies between the reported data and the study design, indicating much greater risks of bias than those seen in any other research46,54. In one paper46, it was reported that TCGA-OV data was used for subtyping with 5 classes, despite this dataset only including high-grade serous and low-grade serous carcinomas. In the other paper54, it was reported that TCGA-OV data was used for slide-level classification into ovarian cancer and non-ovarian cancer classes using PAS-stained tissue, despite TCGA-OV only containing H&E-stained ovarian cancer slides.

Limitations of the review

While the review protocol was designed to reduce biases and maximise the quantity of relevant research included, there were some limitations. This review is restricted to published literature in the English language; however, AI research may be published in other languages or made available as pre-prints without publication in peer-reviewed journals, so this review may be incomplete. While most of the review process was completed by multiple independent researchers, duplicate detection was performed by a single researcher, raising the possibility of errors in this step that could have resulted in incorrect exclusions. Due to the significant time gap between the initial and final literature searches (approximately 12 months), there may have been inconsistencies in interpretations, both for data extraction and risk of bias assessments. Finally, this review focused only on light microscopy images of human histopathology samples relating to ovarian cancer, so may have overlooked useful literature outside of this domain.

Development of the field

The field of AI in ovarian cancer histopathology diagnosis is rapidly growing, with more research published since the start of 2020 than in all preceding years combined. The earliest research, published between 2010 and 2013, used hand-crafted features to train classical ML methods such as SVMs. These models were used for segmentation49–51,53, malignancy classification18,52, grading21, and overall survival prediction21. Most of these early studies focused on IHC-stained tissue (5/7), which was much less commonly used in subsequent research (6/38).

The field was relatively dormant in the following years, with only 6 papers published between 2014 and 2019, half of which had the same primary author29,30,48. These models still used traditional ML classifiers, though some used learned features rather than the traditional hand-crafted features. The models developed were used for histological subtyping29,30,48 and cellular/tissue classification40,57,66.

Since 2020, there has been a much greater volume of research published, most of which has involved the use of deep neural networks for automatic feature extraction and classification, with a minority using traditional machine learning models17,24,31,61,67. Recent research has investigated a broader array of diagnostic outcomes, including the classification of primary cancer type15,54, mutation status24,36,44, homologous recombination deficiency status42, tumour–stroma reaction level32, transcriptomic subtypes19,24, microsatellite instability24, and epithelial-mesenchymal transition status41. Three additional prognostic outcomes have also been predicted in more recent literature: progression-free survival17,43,56, relapse43,44, and treatment response19,22,33–35.

Despite progress within a few specific outcomes, there was no obvious overall trend in the sizes of datasets used over time, either in terms of the number of slides or the number of participants. Similarly, there was no evidence that recent research included more rigorous internal validations, though external validations have been increasing in frequency—no research before 2021 included any external validation with ovarian cancer data, but seven studies published more recently did16,24,27,28,34,38,39. While these external validations were typically limited to small quantities of data, the inclusion of any external validation demonstrates progress from previous research. Such validations are essential to the clinical utility of these models as real-world implementation will require robustness to different sources of visual heterogeneity, with variation occurring across different data centres and within data centres over time. As this field continues to mature, we hope to see more studies conduct thorough validations with larger, high-quality independent datasets, including clearly reported protocols for patient recruitment and selection, pathology slide creation, and digitisation. This will help to reduce the biases, limited reproducibility, and limited generalisability identified in most of the existing research in this domain.

Current limitations and future recommendations

A large proportion of published work did not provide sufficient clinical and pathological information to assess the risk of bias. It is important that AI researchers thoroughly report data provenance to understand the extent of heterogeneity in the dataset, and to understand whether this has been appropriately accounted for in the study design. Modelling and analysis methods must also be thoroughly reported to improve reliability and reproducibility. Researchers may find it useful to refer to reporting checklists, such as transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD)68, to ensure that they have understood and reported all relevant details of their studies. In many studies, it is not clear how AI would fit in the clinical workflow, or whether there are limitations in how these methods could be applied. AI researchers should ensure they understand the clinical context of their data and potential models before undertaking research to reduce bias and increase utility. Ideally, this will involve regular interactions with expert clinicians, including histopathologists and oncologists.

To further improve reproducibility, we recommend that researchers should make code and data available where possible. It is relatively easy to publish code and generate documentation to enhance usability, and there are few drawbacks to doing so when publishing research. Making data available is more often difficult due to data security requirements and the potential storage costs, but it can provide benefits beyond the primary research of the original authors. Digital pathology research in ovarian cancer is currently limited by the lack of openly accessible data, leading to over-dependence on TCGA, and causing many researchers to painstakingly collate similar but distinct datasets. These datasets often contain little of the heterogeneity seen in multi-centre, multi-scanner data, making it difficult for researchers to train robust models or assess generalisability. Where heterogeneous data is included, it often includes small quantities of data which are different to the majority, introducing risks of bias and confounding rather than helping to overcome these issues. TCGA-based studies are prone to this, with significant differences between TCGA slides originating from different data centres69, but with many of these centres only providing small quantities of data. Many researchers are reliant on open-access data, but there is a severe shortage of suitable open-access ovarian cancer histopathology data. Making such data available, with detailed protocols describing data creation, allows researchers to conduct more thorough analyses and significantly improve model generalisability and clinical implementability.

For AI to achieve clinical utility, it is essential that more robust validations are performed, especially considering the limitations of the available datasets. We recommend that researchers should always conduct thorough analyses, using cross-validation, bootstrapping, and/or external validations to ensure that results are robust and truly reflect the ability of their model(s) to generalise to unseen data, and are not simply caused by chance. This should include reporting the variability of results (typically in a 95% confidence interval), especially when comparing multiple models to help to distinguish whether one model is genuinely better than another or whether the difference is due to chance. Statistical tests can also be beneficial for these evaluations. Another option for capturing variability is Bayesian uncertainty quantification, which can be used to separate aleatoric (inherent) and epistemic (modelling) uncertainty.
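As an illustration of reporting variability, the following is a minimal sketch of a percentile bootstrap for a 95% confidence interval on classification accuracy. It is not taken from any of the reviewed studies: the function name `bootstrap_accuracy_ci`, its parameters, and the toy labels are assumptions made for this example. Resampling here is done per sample, whereas data with several slides per patient would need resampling at the patient level.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) using a percentile bootstrap.

    Hypothetical helper for illustration only. Each resample draws
    len(y_true) samples with replacement and recomputes accuracy.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]

    resampled = []
    for _ in range(n_resamples):
        hits = sum(correct[rng.randrange(n)] for _ in range(n))
        resampled.append(hits / n)
    resampled.sort()

    # Percentile interval: cut alpha/2 from each tail of the sorted statistics.
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    point = sum(correct) / n
    return point, resampled[lower_idx], resampled[upper_idx]


if __name__ == "__main__":
    # Toy labels (assumed): 20 samples, 16 classified correctly.
    y_true = [1] * 10 + [0] * 10
    y_pred = [1] * 8 + [0] * 2 + [0] * 8 + [1] * 2
    acc, low, high = bootstrap_accuracy_ci(y_true, y_pred)
    print(f"accuracy = {acc:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With a test set this small the interval is wide, which is the kind of information that helps a reader judge whether a difference between two models could be due to chance.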

Current literature in this field can be largely characterised as model prototyping with homogeneous retrospective data. Researchers rarely consider the reality of human-machine interaction, perhaps believing that these models are a drop-in replacement for pathologists. However, these models perform narrow tasks within the pathology pipeline and do not take into consideration the clinical context beyond their limited training datasets and siloed tasks. We believe these models would be more beneficial (and more realistic to implement) as assistive tools for pathologists, providing secondary opinions or novel ancillary information. While current research is typically focused on assessing model accuracy without any pathologist input, different study designs could be employed to better assess the real-world utility of these models as assistive tools. For example, usability studies could investigate which models are most accessible and most informative to pathologists in practice, and prospective studies could quantify any benefits to diagnostic efficiency and patient outcomes, and investigate the robustness of models in practice. Understanding the effects of AI on the efficiency of diagnosis is particularly important given the limited supply of pathologists worldwide. As such, this type of research will significantly benefit clinical translation.

Summary of recommendations

To improve clinical utility, researchers should understand their data and ensure planned research is clinically relevant before any modelling, ideally involving clinicians throughout the project. They should also consider different study designs, including usability studies and/or prospective studies. When evaluating models, researchers should conduct thorough analyses using cross-validation, external validation, and/or bootstrapping. When reporting research, researchers should clearly report the context of any histopathology data, including how patients were recruited/selected, and how tissue specimens were processed to generate digital pathology images. Finally, researchers should make all code openly accessible, and make data available where possible.

Methods

Literature search

Searches were conducted in three research databases, PubMed, Scopus and Web of Science, and two trial registries, Cochrane Central Register of Controlled Trials (CENTRAL) and the World Health Organisation International Clinical Trial Registry Platform (WHO-ICTRP). The research databases only include journals and conference proceedings which have undergone peer review, ensuring the integrity of included research. The initial searches were performed on 25/04/2022 and were most recently repeated on 19/05/2023. The search strategy was composed of three distinct aspects—artificial intelligence, ovarian cancer, and histopathology. For each aspect, multiple relevant terms were combined using the OR operator (e.g. “artificial intelligence” OR “machine learning”), and then these were combined using the AND operator to ensure that retrieved research met all three aspects. The widest possible set of search fields was used for each search engine except for Scopus, where restrictions were imposed to avoid searching within the citation list of each article, which is not an available field in the other search engines. The terms “ML” and “AI” were restricted to specific fields due to the diversity of their possible meanings. To ensure the most rigorous literature search possible, no restrictions were placed on the publication date or article type during searching.
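The way the three aspects were combined can be sketched as follows. This is only an illustration of the OR-within-aspect, AND-across-aspects structure: the helper `build_query` is hypothetical, only the first pair of terms comes from the text above, and the remaining terms are placeholders rather than the actual search terms, which are given in Supplementary Note 1. Database-specific field restrictions are not modelled.

```python
def build_query(aspects):
    """Join terms within each aspect with OR, then join the aspects with AND.

    Hypothetical helper for illustration; real search engines differ in
    their field tags and quoting rules.
    """
    groups = []
    for terms in aspects:
        if not terms:
            raise ValueError("each aspect needs at least one term")
        quoted = ['"' + term + '"' for term in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


aspects = [
    ["artificial intelligence", "machine learning"],  # example terms from the text
    ["ovarian cancer", "ovarian carcinoma"],          # placeholder terms (assumed)
    ["histopathology", "whole slide image"],          # placeholder terms (assumed)
]
query = build_query(aspects)
print(query)
```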

Many AI approaches build on statistical models, such as logistic regression, which can blur the lines between disciplines. When conducting searches, a previously reported methodology was adopted70 whereby typical AI approaches were searched by name (e.g. neural networks), and other methods were searched by whether the authors described their work as artificial intelligence. Full details of the search implementation for each database are provided in Supplementary Note 1. The review protocol was registered with PROSPERO before the search results were screened for inclusion (CRD42022334730).

Literature selection

One researcher (J.B.) manually removed duplicate papers with the assistance of the referencing software EndNote X9. Two researchers (J.B., K.A.) then independently screened articles for inclusion in two stages, the first based on title and abstract, the second based on full text. Disagreements were discussed and arbitrated by a third researcher (N.R. or N.M.O.). Trials in WHO-ICTRP do not have associated abstracts, so for these studies, only titles were available for initial screening.
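The independent double-screening step can be sketched as a simple comparison of two reviewers' decisions, where agreements are accepted and disagreements are set aside for discussion and arbitration. The function name `compare_screening` and the record identifiers are hypothetical and used only for illustration.

```python
def compare_screening(reviewer_a, reviewer_b):
    """Split records into agreed inclusions, agreed exclusions, and disputes.

    Each argument maps a record ID to True (include) or False (exclude).
    Hypothetical helper; disputed records would go to a third researcher.
    """
    if reviewer_a.keys() != reviewer_b.keys():
        raise ValueError("both reviewers must screen the same set of records")
    included, excluded, disputed = [], [], []
    for record_id in sorted(reviewer_a):
        a, b = reviewer_a[record_id], reviewer_b[record_id]
        if a != b:
            disputed.append(record_id)
        elif a:
            included.append(record_id)
        else:
            excluded.append(record_id)
    return included, excluded, disputed


# Toy decisions for four made-up records.
first = {"rec1": True, "rec2": False, "rec3": True, "rec4": False}
second = {"rec1": True, "rec2": False, "rec3": False, "rec4": True}
print(compare_screening(first, second))
```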

The inclusion criteria required that research evaluated the use of at least one AI approach to make diagnostic or prognostic inferences on human histopathology images from suspected or confirmed cases of ovarian cancer. Studies were only included where AI methods were applied directly to the digital pathology images, or to features which were automatically extracted from the images. Fundamental tasks, such as segmentation and cell counting, were included as these could be used by pathologists for computer-aided diagnosis. Only conventional light microscopy images were considered, with other imaging modalities, such as fluorescence and hyperspectral imaging, excluded. Publications which did not include primary research were excluded (such as review papers). Non-English language articles and research where a full version of the manuscript was not accessible were excluded.

A model in an included study was considered to be a model of interest if it met the same inclusion criteria. Where multiple models were compared against the same outcome, the model of interest was taken to be the newly proposed model, with the best performing model during validation taken if this was unclear. If multiple model outcomes were assessed in the same study, a model of interest was taken for each model outcome, regardless of any similarity in modelling approaches. The same model outcome assessed at different levels of precision (e.g. patch-level, slide-level, patient-level) was not considered to constitute different model outcomes. Models did not need to be entirely independent; for example, the output of one model of interest could have been used as the input of another model of interest, on the condition that model performance was separately evaluated for each model.

Risk of bias assessment

The risk of bias was assessed for models of interest using the Prediction model Risk Of Bias ASsessment Tool (PROBAST)71, where risk of bias is the chance of reported results being distorted by limitations within the study design, conduct, and analysis. It includes 20 guiding questions which are categorised into four domains (participants, predictors, outcome, and analysis), which are summarised as either high-risk or low-risk, or unclear in the case that there is insufficient information to make a comprehensive assessment and none of the available information indicates a high risk of bias. As such, an unclear risk of bias does not indicate methodological flaws, but incomplete reporting.
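The summarisation logic described above can be sketched in a few lines. This is a simplified illustration and not the review's actual procedure: the answer labels follow the PROBAST tool's signalling-question options, the rule that any negative answer yields a high-risk rating ignores the assessor judgement that PROBAST permits, and the overall rating across domains is an assumption based on the PROBAST guidance rather than something stated in this article. The names `rate_domain` and `rate_overall` are hypothetical.

```python
POSITIVE = {"yes", "probably yes"}
NEGATIVE = {"no", "probably no"}
MISSING = {"no information"}


def rate_domain(answers):
    """Summarise one domain's signalling-question answers as high/unclear/low."""
    answers = [a.strip().lower() for a in answers]
    if not answers:
        raise ValueError("at least one answer is required")
    unknown = set(answers) - POSITIVE - NEGATIVE - MISSING
    if unknown:
        raise ValueError(f"unrecognised answers: {sorted(unknown)}")
    if any(a in NEGATIVE for a in answers):
        return "high"      # simplified: no assessor override is modelled
    if any(a in MISSING for a in answers):
        return "unclear"   # missing information and nothing indicating high risk
    return "low"


def rate_overall(domain_ratings):
    """Combine the four domain ratings (assumed rule, see the lead-in)."""
    ratings = list(domain_ratings.values())
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"


domains = {
    "participants": rate_domain(["yes", "no information"]),
    "predictors": rate_domain(["yes", "probably yes", "yes"]),
    "outcome": rate_domain(["yes", "yes"]),
    "analysis": rate_domain(["probably no", "yes"]),
}
print(domains, rate_overall(domains))
```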

The participants domain covers the recruitment and selection of participants to ensure the study population is consistent and representative of the target population. Relevant details include the participant recruitment strategy (when and where participants were recruited), the inclusion criteria, and how many participants were recruited.

The predictors domain covers the consistent definition and measurement of predictors, which in this field typically refers to the generation of digital pathology images. This includes methods for fixing, staining, scanning, and digitally processing tissue before modelling.

The outcome domain covers the appropriate definition and consistent determination of ground-truth labels. This includes the criteria used to determine diagnosis/prognosis, the expertise of any persons determining these labels, and whether labels are determined independently of any model outputs.

The analysis domain covers statistical considerations in the evaluation of model performance to ensure valid and not unduly optimistic results. This includes many factors, such as the number of participants in the test set with each outcome, the validation approaches used (cross-validation, external validation, bootstrapping, etc.), the metrics used to assess performance, and methods used to overcome the effects of censoring, competing risks/confounders, and missing data. The risks caused by some of these factors are interrelated, for example, the risk of bias from using a small dataset is somewhat mitigated by cross-validation, which increases the effective size of the test set and can be used to assess variability, reducing optimism in the results. Further, the risk caused by using a small dataset depends on the type of outcome being predicted, for example, more data is required for a robust analysis of 5-class classification than binary classification. There must also be sufficient data within all relevant patient subgroups, for example, if multiple subtypes of ovarian cancer are included, there must not be a subtype that is only represented by a few patients. Due to these interrelated factors, there are no strict criteria to determine the appropriate size of a dataset, though fewer than 50 samples per class or fewer than 100 samples overall is likely to be considered high-risk, and more than 1000 samples overall is likely to be considered low-risk.
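The approximate dataset-size thresholds above can be written as a rule of thumb. The sketch below is only a rough encoding: the function name `dataset_size_risk` is hypothetical, the precedence given to the high-risk conditions when a large dataset contains a very small class is an assumption, and the label "indeterminate" stands in for cases where the text gives no guidance. In practice this judgement also depends on the outcome type, validation approach, and subgroup sizes.

```python
def dataset_size_risk(samples_per_class):
    """Rough size-based risk label from a mapping of class name to sample count.

    Thresholds follow the text: fewer than 50 samples in any class or fewer
    than 100 overall suggests high risk; more than 1000 overall suggests low.
    """
    if not samples_per_class:
        raise ValueError("at least one class is required")
    counts = list(samples_per_class.values())
    total = sum(counts)
    # Assumption: a very small class outweighs a large overall total.
    if min(counts) < 50 or total < 100:
        return "high"
    if total > 1000:
        return "low"
    return "indeterminate"


print(dataset_size_risk({"subtype_a": 900, "subtype_b": 30}))
print(dataset_size_risk({"positive": 600, "negative": 600}))
print(dataset_size_risk({"positive": 200, "negative": 200}))
```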

Risks of bias often arise due to inconsistent methodologies. Inconsistency in the participants and predictors domains may cause heterogeneity in the visual properties of digital pathology slides which may lead to spurious correlations, either through random chance or systematic differences between subgroups in the dataset. Varied data may be beneficial during training to improve model generalisability when using large datasets, though this must be closely controlled to avoid introducing systematic confounding. Inconsistent determination of the outcome can mean that the results of a study are unreliable due to spurious correlations in the ground truth labels, or invalid due to incorrect determination of labels.

While PROBAST provides a framework to assess risks of bias, there is some level of subjectivity in the interpretation of signalling questions. As such, each model was analysed by three independent researchers (any of J.B., K.A., N.R., K.Z., N.M.O.), with at least one computer scientist and one clinician involved in the risk of bias assessment for each model. The PROBAST applicability of research analysis was not implemented as it is unsuitable for such a diverse array of possible research questions.

Data synthesis

Data extraction was performed independently by two researchers (J.B., K.A.) using a form containing 81 fields within the categories Overview, Data, Methods, Results, and Miscellaneous. Several of these fields were added or clarified during data extraction with the agreement of both researchers and retroactively applied to all accepted literature. The final data extraction form is available at www.github.com/scjjb/OvCaReview, and is summarised in Supplementary Table 1.

Information was sought from full-text articles, as well as references and supplementary materials where appropriate. Inferences were made only when both researchers were confident that this gave the correct information, with disagreements resolved through discussion. Fields which could not be confidently completed were labelled as being unclear.

All extracted data were summarised in two tables, one each for study-level and model-level characteristics. Only models of interest were included in these tables. The term model outcome refers to the model output, whether this was a clinical outcome (diagnosis/prognosis), or a diagnostically relevant outcome that could be used for computer-aided diagnosis, such as tumour segmentation. The data synthesis did not include any meta-analysis due to the diversity of included methods and model outcomes. The PRISMA 2020 guidelines for reporting systematic reviews were followed, with checklists provided in Supplementary Tables 2 and 3.

Acknowledgements

There was no direct funding for this research. J.B. is supported by the UKRI Engineering and Physical Sciences Research Council (EPSRC) [EP/S024336/1]. K.A. and P.A. are supported by the Tony Bramall Charitable Trust. A.S. is supported by Innovate UK via the National Consortium of Intelligent Medical Imaging (NCIMI) [104688], Cancer Research UK [C19942/A28832] and Leeds Hospitals Charity [9R01/1403]. The funders had no role in influencing the content of this research. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

Author contributions

J.B. created the study protocol with feedback and contributions from all other authors. J.B., K.A., K.Z., N.M.O., and N.R. performed the risk of bias assessments. J.B. and K.A. performed data extraction. J.B. analysed extracted data and wrote the manuscript, with feedback and contributions from all other authors.

Data availability

The authors declare that the main data supporting the findings of this study are available within the article and its Supplementary Information files. Extra data are available from the corresponding author upon request.

Competing interests

G.H. receives research funding from IQVIA. N.M.O. receives research funding from 4D Path. The other authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors jointly supervised this work: Nicolas M. Orsi, Nishant Ravikumar.

Supplementary information

The online version contains supplementary material available at 10.1038/s41698-023-00432-6.

References

  • 1.Sung H, et al. Global cancer statistics 2020: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021;71:209–249. doi: 10.3322/caac.21660. [DOI] [PubMed] [Google Scholar]
  • 2.Menon U, et al. Ovarian cancer population screening and mortality after long-term follow-up in the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS): a randomised controlled trial. Lancet. 2021;397:2182–2193. doi: 10.1016/S0140-6736(21)00731-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Ebell MH, Culp MB, Radke TJ. A systematic review of symptoms for the diagnosis of ovarian cancer. Am. J. Prev. Med. 2016;50:384–394. doi: 10.1016/j.amepre.2015.09.023. [DOI] [PubMed] [Google Scholar]
  • 4.Berek JS, Renz M, Kehoe S, Kumar L, Friedlander M. Cancer of the ovary, fallopian tube, and peritoneum: 2021 update. Int. J. Gynecol. Obstet. 2021;155:61–85. doi: 10.1002/ijgo.13878. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Köbel M, et al. Ovarian carcinoma subtypes are different diseases: implications for biomarker studies. PLoS Med. 2008;5:e232. doi: 10.1371/journal.pmed.0050232. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Prat J. Staging classification for cancer of the ovary, fallopian tube, and peritoneum. Int. J. Gynecol. Obstet. 2014;124:1–5. doi: 10.1016/j.ijgo.2013.10.001. [DOI] [PubMed] [Google Scholar]
  • 7.Matsuno RK, et al. Agreement for tumor grade of ovarian carcinoma: analysis of archival tissues from the surveillance, epidemiology, and end results residual tissue repository. Cancer Causes Control. 2013;24:749–757. doi: 10.1007/s10552-013-0157-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Köbel M, et al. Ovarian carcinoma histotype determination is highly reproducible, and is improved through the use of immunohistochemistry. Histopathology. 2014;64:1004–1013. doi: 10.1111/his.12349. [DOI] [PubMed] [Google Scholar]
  • 9.Barnard ME, et al. Inter-pathologist and pathology report agreement for ovarian tumor characteristics in the nurses’ health studies. Gynecol. Oncol. 2018;150:521–526. doi: 10.1016/j.ygyno.2018.07.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Wilson ML, et al. Access to pathology and laboratory medicine services: a crucial gap. Lancet. 2018;391:1927–1938. doi: 10.1016/S0140-6736(18)30458-6. [DOI] [PubMed] [Google Scholar]
  • 11.Royal College of Pathologists. Meeting pathology demand: histopathology workforce census. https://www.rcpath.org/static/952a934d-2ec3-48c9-a8e6e00fcdca700f/Meeting-Pathology-Demand-Histopathology-Workforce-Census-2018.pdf (2018).
  • 12.Baidoshvili A, et al. Evaluating the benefits of digital pathology implementation: time savings in laboratory logistics. Histopathology. 2018;73:784–794. doi: 10.1111/his.13691. [DOI] [PubMed] [Google Scholar]
  • 13.Stenzinger A, et al. Artificial intelligence and pathology: from principles to practice and future applications in histomorphology and molecular profiling. Semin. Cancer Biol. 2022;84:129–143. doi: 10.1016/j.semcancer.2021.02.011. [DOI] [PubMed] [Google Scholar]
  • 14.Raciti, P. et al. Clinical validation of artificial intelligence–augmented pathology diagnosis demonstrates significant gains in diagnostic accuracy in prostate cancer detection. Arch. Pathol. Lab. Med.10.5858/arpa.2022-0066-OA (2022). [DOI] [PubMed]
  • 15.Kalra S, et al. Pan-cancer diagnostic consensus through searching archival histopathology images using artificial intelligence. npj Digit. Med. 2020;3:31. doi: 10.1038/s41746-020-0238-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Meng Z, et al. A deep learning-based system trained for gastrointestinal stromal tumor screening can identify multiple types of soft tissue tumors. Am. J. Pathol. 2023;193:899–912. doi: 10.1016/j.ajpath.2023.03.012. [DOI] [PubMed] [Google Scholar]
  • 17.Boehm KM, et al. Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer. Nat. Cancer. 2022;3:723–733. doi: 10.1038/s43018-022-00388-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Kothari, S., Phan, J. H., Osunkoya, A. O. & Wang, M. D. Biological interpretation of morphological patterns in histopathological whole-slide images. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine 218–225 (ACM, 2012). [DOI] [PMC free article] [PubMed]
  • 19.Yu KH, et al. Deciphering serous ovarian carcinoma histopathology and platinum response by convolutional neural networks. BMC Med. 2020;18:1–14. doi: 10.1186/s12916-020-01684-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Liu T, Su R, Sun C, Li X, Wei L. EOCSA: Predicting prognosis of epithelial ovarian cancer with whole slide histopathological images. Expert Syst. Appl. 2022;206:117643. [Google Scholar]
  • 21.Poruthoor, A., Phan, J. H., Kothari, S. & Wang, M. D. Exploration of genomic, proteomic, and histopathological image data integration methods for clinical prediction. In 2013 IEEE China Summit and International Conference on Signal and Information Processing 259–263 (IEEE, 2013). [DOI] [PMC free article] [PubMed]
  • 22.Yaar, A., Asif, A., Raza, S. E. A., Rajpoot, N. & Minhas, F. Cross-domain knowledge transfer for prediction of chemosensitivity in ovarian cancer patients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops 928–929 (IEEE, 2020).
  • 23.Ghoniem RM, Algarni AD, Refky B, Ewees AA. Multi-modal evolutionary deep learning model for ovarian cancer diagnosis. Symmetry. 2021;13:643. [Google Scholar]
  • 24.Zeng H, Chen L, Zhang M, Luo Y, Ma X. Integration of histopathological images and multi-dimensional omics analyses predicts molecular features and prognosis in high-grade serous ovarian cancer. Gynecol. Oncol. 2021;163:171–180. doi: 10.1016/j.ygyno.2021.07.015. [DOI] [PubMed] [Google Scholar]
  • 25.Holback, C. et al. The cancer genome atlas ovarian cancer collection (TCGA-OV) (version 4) [data set]. The Cancer Imaging Archive. https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=7569497 (2016).
  • 26.Levine AB, et al. Synthesis of diagnostic quality cancer pathology images by generative adversarial networks. J. Pathol. 2020;252:178–188. doi: 10.1002/path.5509. [DOI] [PubMed] [Google Scholar]
  • 27.Boschman J, et al. The utility of color normalization for ai-based diagnosis of hematoxylin and eosin-stained pathology images. J. Pathol. 2022;256:15–24. doi: 10.1002/path.5797. [DOI] [PubMed] [Google Scholar]
  • 28.Farahani H, et al. Deep learning-based histotype diagnosis of ovarian carcinoma whole-slide pathology images. Mod. Pathol. 2022;35:1983–1990. doi: 10.1038/s41379-022-01146-z. [DOI] [PubMed] [Google Scholar]
  • 29.BenTaieb, A., Li-Chang, H., Huntsman, D. & Hamarneh, G. Automatic diagnosis of ovarian carcinomas via sparse multiresolution tissue representation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part I 18 629–636 (Springer, 2015).
  • 30.BenTaieb A, Nosrati MS, Li-Chang H, Huntsman D, Hamarneh G. Clinically-inspired automatic classification of ovarian carcinoma subtypes. J. Pathol. Informatics. 2016;7:28. doi: 10.4103/2153-3539.186899. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Jiang J, et al. Digital pathology-based study of cell- and tissue-level morphologic features in serous borderline ovarian tumor and high-grade serous ovarian cancer. J. Pathol. Informatics. 2021;12:24. doi: 10.4103/jpi.jpi_76_20. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Jiang J, et al. Computational tumor stroma reaction evaluation led to novel prognosis-associated fibrosis and molecular signature discoveries in high-grade serous ovarian carcinoma. Front. Med. 2022;9:994467. doi: 10.3389/fmed.2022.994467. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Wang C-W, et al. A weakly supervised deep learning method for guiding ovarian cancer treatment and identifying an effective biomarker. Cancers. 2022;14:1651. doi: 10.3390/cancers14071651. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Wang C-W, et al. Weakly supervised deep learning for prediction of treatment effectiveness on ovarian cancer from histopathology images. Comput. Med. Imaging Graphics. 2022;99:102093. doi: 10.1016/j.compmedimag.2022.102093. [DOI] [PubMed] [Google Scholar]
  • 35.Wang C-W, et al. Interpretable attention-based deep learning ensemble for personalized ovarian cancer treatment without manual annotations. Comput. Med. Imaging Graphics. 2023;107:102233. doi: 10.1016/j.compmedimag.2023.102233. [DOI] [PubMed] [Google Scholar]
  • 36.Ho DJ, et al. Deep interactive learning-based ovarian cancer segmentation of H&E-stained whole slide images to study morphological patterns of BRCA mutation. J. Pathol. Informatics. 2023;14:100160. doi: 10.1016/j.jpi.2022.100160.
  • 37.Paijens ST, et al. Prognostic image-based quantification of CD8CD103 T cell subsets in high-grade serous ovarian cancer patients. Oncoimmunology. 2021;10:1935104. doi: 10.1080/2162402X.2021.1935104.
  • 38.Shin SJ, et al. Style transfer strategy for developing a generalizable deep learning application in digital pathology. Comput. Methods Programs Biomed. 2021;198:105815. doi: 10.1016/j.cmpb.2020.105815.
  • 39.Mayer RS, et al. How to learn with intentional mistakes: NoisyEnsembles to overcome poor tissue quality for deep learning in computational pathology. Front. Med. 2022;9:959068. doi: 10.3389/fmed.2022.959068.
  • 40.Du Y, et al. Classification of tumor epithelium and stroma by exploiting image features learned by deep convolutional neural networks. Ann. Biomed. Eng. 2018;46:1988–1999. doi: 10.1007/s10439-018-2095-6.
  • 41.Hu, Y. et al. Predicting molecular traits from tissue morphology through self-interactive multi-instance learning. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part II 130–139 (Springer, 2022).
  • 42.Lazard T, et al. Deep learning identifies morphological patterns of homologous recombination deficiency in luminal breast cancers from whole slide images. Cell Rep. Med. 2022;3:100872. doi: 10.1016/j.xcrm.2022.100872.
  • 43.Yokomizo R, et al. O3C Glass-Class: a machine-learning framework for prognostic prediction of ovarian clear-cell carcinoma. Bioinformatics Biol. Insights. 2022;16:11779322221134312.
  • 44.Nero C, et al. Deep-learning to predict BRCA mutation and survival from digital H&E slides of epithelial ovarian cancer. Int. J. Mol. Sci. 2022;23:11326. doi: 10.3390/ijms231911326.
  • 45.Wu M, et al. Exploring prognostic indicators in the pathological images of ovarian cancer based on a deep survival network. Front. Genet. 2023;13:1069673. doi: 10.3389/fgene.2022.1069673.
  • 46.Kasture KR, Choudhari D, Matte PN. Prediction and classification of ovarian cancer using enhanced deep convolutional neural network. Int. J. Eng. Trends Technol. 2022;70:310–318.
  • 47.Kowalski, P. A., Błoniarz, J. & Chmura, Ł. Convolutional neural networks in the ovarian cancer detection. In Computational Intelligence and Mathematics for Tackling Complex Problems 2 55–64 (Springer, 2022).
  • 48.BenTaieb A, Li-Chang H, Huntsman D, Hamarneh G. A structured latent model for ovarian carcinoma subtyping from histopathology slides. Med. Image Anal. 2017;39:194–205. doi: 10.1016/j.media.2017.04.008.
  • 49.Dong, J., Li, J., Lu, J. & Fu, A. Automatic segmentation for ovarian cancer immunohistochemical image based on chroma criterion. In 2010 2nd International Conference on Advanced Computer Control, vol. 2 147–150 (IEEE, 2010).
  • 50.Dong, J., Li, J., Fu, A. & Lv, H. Automatic segmentation for ovarian cancer immunohistochemical image based on YUV color space. In 2010 International Conference on Biomedical Engineering and Computer Science 1–4 (IEEE, 2010).
  • 51.Signolle N, Revenu M, Plancoulaine B, Herlin P. Wavelet-based multiscale texture segmentation: application to stromal compartment characterization on virtual slides. Signal Process. 2010;90:2412–2422.
  • 52.Janowczyk, A., Chandran, S., Feldman, M. & Madabhushi, A. Local morphologic scale: application to segmenting tumor infiltrating lymphocytes in ovarian cancer TMAs. In Medical Imaging 2011: Image Processing, vol. 7962 827–840 (SPIE, 2011).
  • 53.Janowczyk, A. et al. High-throughput biomarker segmentation on ovarian cancer tissue microarrays via hierarchical normalized cuts. IEEE Trans. Biomed. Eng. 59, 1250–1252 (2012).
  • 54.Ramasamy S, Kaliyaperumal V. A hybridized channel selection approach with deep convolutional neural network for effective ovarian cancer prediction in periodic acid-Schiff-stained images. Concurrency Comput. Pract. Exp. 2023;35:e7568.
  • 55.Gentles L, et al. Integration of computer-aided automated analysis algorithms in the development and validation of immunohistochemistry biomarkers in ovarian cancer. J. Clin. Pathol. 2021;74:469–474. doi: 10.1136/jclinpath-2020-207081.
  • 56.Laury AR, Blom S, Ropponen T, Virtanen A, Carpén OM. Artificial intelligence-based image analysis can predict outcome in high-grade serous carcinoma via histology alone. Sci. Rep. 2021;11:19165. doi: 10.1038/s41598-021-98480-0.
  • 57.Heindl A, et al. Microenvironmental niche divergence shapes BRCA1-dysregulated ovarian cancer morphological plasticity. Nat. Commun. 2018;9:3917. doi: 10.1038/s41467-018-06130-3.
  • 58.Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In International Conference on Machine Learning 2127–2136 (PMLR, 2018).
  • 59.Lu MY, et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 2021;5:555–570. doi: 10.1038/s41551-020-00682-w.
  • 60.He, K. et al. Transformers in medical image analysis: a review. Intell. Med. 3, 59–78 (2022).
  • 61.Elie N, et al. Impact of automated methods for quantitative evaluation of immunostaining: towards digital pathology. Front. Oncol. 2022;12:931035. doi: 10.3389/fonc.2022.931035.
  • 62.Shrestha, P. et al. A systematic review on the use of artificial intelligence in gynecologic imaging–background, state of the art, and future directions. Gynecol. Oncol. 166, 596–605 (2022).
  • 63.Zhou J, Cao W, Wang L, Pan Z, Fu Y. Application of artificial intelligence in the diagnosis and prognostic prediction of ovarian cancer. Comput. Biol. Med. 2022;146:105608. doi: 10.1016/j.compbiomed.2022.105608.
  • 64.Fiste, O., Liontos, M., Zagouri, F., Stamatakos, G. & Dimopoulos, M. A. Machine learning applications in gynecological cancer: a critical review. Crit. Rev. Oncol. Hematol. 179, 103808 (2022).
  • 65.Xu H-L, et al. Artificial intelligence performance in image-based ovarian cancer identification: a systematic review and meta-analysis. EClinicalMedicine. 2022;53:101662. doi: 10.1016/j.eclinm.2022.101662.
  • 66.Lorsakul, A. et al. Automated wholeslide analysis of multiplex-brightfield IHC images for cancer cells and carcinoma-associated fibroblasts. In Medical Imaging 2017: Digital Pathology, vol. 10140 41–46 (SPIE, 2017).
  • 67.Salguero, J. et al. Selecting training samples for ovarian cancer classification via a semi-supervised clustering approach. In Medical Imaging 2022: Digital and Computational Pathology, vol. 12039 20–24 (SPIE, 2022).
  • 68.Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann. Intern. Med. 2015;162:55–63. doi: 10.7326/M14-0697.
  • 69.Dehkharghanian T, et al. Biased data, biased AI: deep networks predict the acquisition site of TCGA images. Diagn. Pathol. 2023;18:1–12. doi: 10.1186/s13000-023-01355-3.
  • 70.Dhiman P, et al. Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review. BMC Med. Res. Methodol. 2022;22:101. doi: 10.1186/s12874-022-01577-x.
  • 71.Wolff RF, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann. Intern. Med. 2019;170:51–58. doi: 10.7326/M18-1376.
  • 72.Köbel M, et al. Diagnosis of ovarian carcinoma cell type is highly reproducible: a transcanadian study. Am. J. Surg. Pathol. 2010;34:984–993. doi: 10.1097/PAS.0b013e3181e1a3bb.

Associated Data

Supplementary Materials

Supplemental Material (276.8KB, pdf)

Data Availability Statement

The authors declare that the main data supporting the findings of this study are available within the article and its Supplementary Information files. Extra data are available from the corresponding author upon request.


Articles from NPJ Precision Oncology are provided here courtesy of Nature Publishing Group
