The American Journal of Pathology. 2023 Mar;193(3):341–349. doi: 10.1016/j.ajpath.2022.12.004

Deep Learning–Based Objective and Reproducible Osteosarcoma Chemotherapy Response Assessment and Outcome Prediction

David J Ho, Narasimhan P Agaram, Marc-Henri Jean, Stephanie D Suser, Cynthia Chu, Chad M Vanderbilt, Paul A Meyers, Leonard H Wexler, John H Healey, Thomas J Fuchs, Meera R Hameed
PMCID: PMC10013034  PMID: 36563747

Abstract

Osteosarcoma is the most common primary bone cancer, whose standard treatment includes pre-operative chemotherapy followed by resection. Chemotherapy response is used for prognosis and management of patients. Necrosis is routinely assessed after chemotherapy from histology slides on resection specimens, where necrosis ratio is defined as the ratio of necrotic tumor/overall tumor. Patients with necrosis ratio ≥90% are known to have a better outcome. Manual microscopic review of necrosis ratio from multiple glass slides is semiquantitative and can have intraobserver and interobserver variability. In this study, an objective and reproducible deep learning–based approach was proposed to estimate necrosis ratio with outcome prediction from scanned hematoxylin and eosin whole slide images (WSIs). To conduct the study, 103 osteosarcoma cases with 3134 WSIs were collected. Deep Multi-Magnification Network was trained to segment multiple tissue subtypes, including viable tumor and necrotic tumor at a pixel level and to calculate case-level necrosis ratio from multiple WSIs. Necrosis ratio estimated by the segmentation model highly correlates with necrosis ratio from pathology reports manually assessed by experts. Furthermore, patients were successfully stratified to predict overall survival with P = 2.4 × 10⁻⁶ and progression-free survival with P = 0.016. This study indicates that deep learning can support pathologists as an objective tool to analyze osteosarcoma from histology for assessing treatment response and predicting patient outcome.

Osteosarcoma is the most common primary bone cancer, with an incidence of 4 to 5 cases per million worldwide per year.1 Induction chemotherapy before surgery is the standard of care for patients with osteosarcoma.2 Multiple studies have shown that necrosis ratio, defined as ratio of necrotic tumor/overall tumor, from histologic assessments of resected samples, is one of the important prognostic factors that correlates with patient outcome.3, 4, 5, 6, 7, 8 Treatment response to chemotherapy includes necrosis, fibrosis, and/or hyalinization; and necrosis ratio estimation by pathologists includes all three pathologic features. The 5-year overall survival (OS) rate for patients whose necrosis ratio is ≥90% is approximately 80%.6 However, manually assessing tumor necrosis from multiple hematoxylin and eosin (H&E)–stained slides is semiquantitative and is prone to interobserver and intraobserver variability.4 Necrosis ratio estimation of osteosarcoma on an H&E-sectioned slide at different time points has been shown to have an intraclass correlation coefficient of 0.652 among six pathologists.9

Deep learning, a subfield of machine learning, has been widely studied for the analysis of whole slide images (WSIs) because of its objective and reproducible nature.10,11 Multiple groups have developed deep learning models for osteosarcoma that can segment viable tumor and necrotic tumor.12, 13, 14, 15, 16 Although these models achieve acceptable performance, neither comparison with manually assessed necrosis ratio nor correlation with patient outcome data has been performed.

In the current study, a complete pipeline that segments multiple tissue subtypes, including viable tumor and necrotic tumor, at a pixel level from multiple WSIs was proposed to estimate case-level necrosis ratio in an objective and reproducible manner. The estimated necrosis ratio was then correlated with OS and progression-free survival (PFS) outcome data. Figure 1 shows the block diagram of the proposed method. For pixel-wise segmentation, Deep Multi-Magnification Network17 was used to accurately segment multiple tissue subtypes. Case-level necrosis ratio can be calculated from segmentation predictions of multiple WSIs by counting the number of pixels of viable tumor and necrotic tumor on WSIs. These data were used to correlate OS and PFS. In addition, the cutoff threshold was tuned to stratify patients specifically for this segmentation model and this data set. The technical details of the method and proof of concept have been previously published at a machine learning conference.18 In this work, the method was extended to the largest known cohort of digital slide images from patients with osteosarcoma. The main aims of the study were as follows: i) to collect the largest osteosarcoma data set, ii) to develop and release a pixel-wise osteosarcoma segmentation model, iii) to estimate case-level necrosis ratio and compare with manually assessed ratio from pathologists, and iv) to correlate necrosis ratio with the OS and PFS outcome data.

Figure 1.

Block diagram of the proposed method. Top: Currently, an osteosarcoma case with multiple slides is assessed via a microscope to estimate necrosis ratio and to predict outcome. Bottom: Deep learning–based segmentation by Deep Multi-Magnification Network17 was used to segment multiple tissue subtypes, to count the number of pixels for viable tumor (VT) and necrotic tumor (NT), to estimate necrosis ratio, and to predict outcome.

Materials and Methods

Data Set

This study was approved by the Institutional Review Board at Memorial Sloan Kettering Cancer Center (protocol number 18-013). After Institutional Review Board approval, osteosarcoma cases with resection materials available at Memorial Sloan Kettering Cancer Center were selected. The resection cases were selected from 2002 to 2020. All cases had preoperative chemotherapy followed by resection. Detailed treatment information was available on 84 cases; and the patients received combination chemotherapy, including cisplatin, doxorubicin, high-dose methotrexate, and/or etoposide or ifosfamide. The resected specimens are routinely sliced along the long axis, and one to three representative slabs are mapped and labeled as per anatomic orientation. After routine processing, the H&E-stained slides are examined microscopically for necrosis assessment (necrotic tumor divided by overall tumor). The pathology reports were reviewed, and the documented percentages of therapy-related changes were recorded. Whenever available, the follow-up data were retrieved from the clinical database. During our previous study,18 55 cases with 1578 WSIs were collected. To increase the data set, 48 additional cases with 1556 WSIs digitized in ×20 magnification by Aperio AT2 (Leica Biosystems, Buffalo Grove, IL) scanners at Memorial Sloan Kettering Cancer Center were collected. In total, the data set contains 103 cases with 3134 WSIs, where mean and median of the number of WSIs per case are 30.4 and 27, respectively. To train the pixel-wise tissue segmentation model, two pathologists (N.P.A. and M.R.H.) annotated tissue regions with concordance to avoid variability on 75 WSIs from 15 training cases. The training cases were selected on the basis of heterogeneous percentage of necrosis and the distribution of seven classes (viable tumor, necrosis/nonviable bone, necrosis/fibrosis without bone, normal bone, normal tissue, normal cartilage, and blank). 
The two pathologists annotated distinctive morphologic patterns of the seven classes on a subset of WSIs from the training cases, which was sufficient for the model to learn the patterns. The remaining 88 cases were used to test our segmentation model. Because pathologists microscopically review all glass slides to assess necrosis ratio, all WSIs on testing cases were utilized to calculate necrosis ratio. First, 80 cases were used to evaluate necrosis ratio estimation after excluding eight cases missing necrosis ratio in pathology reports. Next, 75 cases were used to predict OS after excluding two cases with overdecalcification and three cases missing OS outcome data. Last, 64 cases were used to predict PFS after excluding one case missing metastasis status and 10 cases who presented with metastases at the time of diagnosis. Figure 2 shows a Consolidated Standards of Reporting Trials (CONSORT) flow diagram of the data set. To the best of our knowledge, this is the largest osteosarcoma data set.

Figure 2.

Osteosarcoma data set containing 103 cases with 3134 whole slide images (WSIs). Fifteen cases were used to train the segmentation model, and the other 88 cases were used to test the model. More specifically, 80 cases were used to evaluate necrosis ratio assessment, 75 cases were used to predict overall survival (OS), and 64 cases were used to predict progression-free survival (PFS). MSKCC, Memorial Sloan Kettering Cancer Center.

Tissue Segmentation

Case-level necrosis ratio consists of the ratio of the area of necrotic tumor/the area of overall tumor on a set of osteosarcoma slides. Therefore, accurate pixel-wise segmentation would be necessary to count the number of pixels for viable tumor and the number of pixels for necrotic tumor on a set of osteosarcoma WSIs and to estimate the case-level necrosis ratio. WSIs are made up of giga-pixels that cannot be processed as one image because of their large size. Instead, they need to be processed in patches, which are cropped square-shaped regions from the WSIs. In this study, the Deep Multi-Magnification Network,17 which processes a set of patches in size of 256 × 256 pixels in ×20, ×10, and ×5 magnifications centered at the same coordinate, was used to accurately generate pixel-wise tissue segmentation predictions of a patch in size of 256 × 256 pixels in ×20 magnification.
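The geometry of one multi-magnification input can be sketched as follows. This is a minimal illustration rather than the published implementation. It assumes that the ×10 and ×5 patches are obtained by cropping larger regions from the ×20 image around the same center and downsampling them to 256 × 256 pixels. The function name `multi_mag_boxes` and the coordinate convention are hypothetical.

```python
from typing import Dict, Tuple

PATCH_SIZE = 256               # pixels per side at every magnification (from the text)
MAGNIFICATIONS = (20, 10, 5)   # magnifications named in the text

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in x20 pixel coordinates


def multi_mag_boxes(cx: int, cy: int, base_mag: int = 20) -> Dict[int, Box]:
    """Return the x20-coordinate region covered by each 256 x 256 patch.

    Assumption: a lower-magnification patch covers a proportionally larger
    field of view and is later downsampled to PATCH_SIZE x PATCH_SIZE.
    """
    boxes: Dict[int, Box] = {}
    for mag in MAGNIFICATIONS:
        scale = base_mag // mag            # 1 for x20, 2 for x10, 4 for x5
        half = PATCH_SIZE * scale // 2     # half-width of the region in x20 pixels
        boxes[mag] = (cx - half, cy - half, cx + half, cy + half)
    return boxes


if __name__ == "__main__":
    for mag, box in multi_mag_boxes(1000, 1000).items():
        print(f"x{mag}: {box}")
```

Under this assumption, the ×5 patch covers a region 16 times larger in area than the ×20 patch, which is how the network sees both cellular detail and surrounding tissue architecture around the same coordinate.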

To train the segmentation model, Deep Interactive Learning18 was used to efficiently annotate a limited set of osteosarcoma training cases. Deep Interactive Learning applies an iterative approach of correcting (or annotating) mislabeled regions from a previous model and fine-tuning the model with the additionally corrected patches to a training set. In this study, the model generated in our previous work18 segmenting seven classes, including viable tumor, necrosis/nonviable bone, necrosis/fibrosis without bone, normal bone, normal tissue, normal cartilage, and blank, was fine-tuned. Specifically, regions with treatment effect with an increased density of inflammatory cells, macrophages, and stromal cells were found to be incorrectly labeled as viable tumor by the previous segmentation model. To fine-tune the model with these morphologic patterns, two additional cases with 26 WSIs containing these patterns were included in the training set. Without any additional manual annotation, these mislabeled regions from the two cases were extracted in patches with the corresponding correct labels (necrosis/fibrosis without bone). For optimization, weighted cross entropy was used as the loss function with stochastic gradient descent, with a learning rate of 5 × 10⁻⁶, a momentum of 0.99, and a weight decay of 10⁻⁴ for 10 epochs. The final model was selected on the basis of the highest mean intersection over union on the validation set, which is a subset of the training set not used for optimization.
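The model-selection criterion, mean intersection over union, can be illustrated with a short sketch. It assumes that labels are integer class indices flattened into sequences and that classes absent from both the ground truth and the prediction are skipped when averaging. Neither detail is stated in the text, and the function name `mean_iou` is hypothetical.

```python
from typing import Sequence


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union across classes for flattened label maps."""
    intersection = [0] * num_classes
    true_count = [0] * num_classes
    pred_count = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        true_count[t] += 1
        pred_count[p] += 1
        if t == p:
            intersection[t] += 1

    ious = []
    for c in range(num_classes):
        union = true_count[c] + pred_count[c] - intersection[c]
        if union > 0:  # assumption: skip classes that never occur
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only, not study data.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

With this metric, the checkpoint kept after fine-tuning would be the one with the highest value on the held-out validation patches.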

Because giga-pixel WSIs are too large to be segmented at once, patches were segmented starting from a window at the top-left corner of each WSI and sliding the window in the horizontal and vertical directions by 256 pixels until the entire WSI was segmented. The Otsu algorithm19 was not used because thresholding can exclude some necrosis regions on the basis of their pixel intensities. All training and inference code was implemented in PyTorch version 1.3.1,20 and all experiments were conducted on a Tesla V100 GPU (Nvidia, Santa Clara, CA). WSIs and their segmentation predictions were visualized by our Memorial Sloan Kettering Cancer Center slide viewer.21
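The sliding-window traversal described above can be sketched as a generator of patch origins. This is a minimal sketch that assumes non-overlapping 256-pixel steps in raster order. How patches that extend past the right or bottom border are handled (for example, by padding) is not specified in the text and is left to the caller. The name `tile_origins` is hypothetical.

```python
from typing import Iterator, Tuple


def tile_origins(width: int, height: int, patch: int = 256) -> Iterator[Tuple[int, int]]:
    """Yield (left, top) origins of patches covering a width x height image.

    The window starts at the top-left corner and moves by `patch` pixels
    horizontally, then vertically, until the whole image is covered.
    """
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            yield left, top


if __name__ == "__main__":
    # A 600 x 300 image needs a 3 x 2 grid of 256-pixel patches.
    for origin in tile_origins(600, 300):
        print(origin)
```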

After all the WSIs in a case are segmented, a case-level necrosis ratio from multiple WSIs, estimated by the deep learning model, rDL, is calculated as follows:

$$ r_{DL} = \frac{p_{NT}}{p_{VT} + p_{NT}} \qquad (1) $$

where pVT and pNT are the numbers of pixels for viable tumor and necrotic tumor, respectively. Necrosis ratios estimated by the deep learning model were compared with necrosis ratios estimated by pathologists from pathology reports to evaluate whether the segmentation model can reproduce the necrosis ratio manually assessed by experts.
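Equation 1 can be applied to per-slide pixel counts as in the sketch below. It assumes that both necrosis classes (necrosis/nonviable bone and necrosis/fibrosis without bone) count toward necrotic tumor and that pixels are pooled across all slides of a case before the ratio is taken. The class keys and the function name `case_necrosis_ratio` are hypothetical.

```python
from typing import Iterable, Mapping

# Assumption: these class names are illustrative keys, not the released model's labels.
VIABLE_CLASSES = ("viable_tumor",)
NECROTIC_CLASSES = ("necrosis_nonviable_bone", "necrosis_fibrosis_without_bone")


def case_necrosis_ratio(slides: Iterable[Mapping[str, int]]) -> float:
    """Case-level necrosis ratio p_NT / (p_VT + p_NT) pooled over all slides."""
    p_vt = 0
    p_nt = 0
    for counts in slides:
        p_vt += sum(counts.get(name, 0) for name in VIABLE_CLASSES)
        p_nt += sum(counts.get(name, 0) for name in NECROTIC_CLASSES)
    total = p_vt + p_nt
    if total == 0:
        raise ValueError("no tumor pixels found in this case")
    return p_nt / total


if __name__ == "__main__":
    # Toy pixel counts for illustration only, not study data.
    case = [
        {"viable_tumor": 100, "necrosis_nonviable_bone": 300, "normal_bone": 999},
        {"viable_tumor": 100, "necrosis_fibrosis_without_bone": 500},
    ]
    print(case_necrosis_ratio(case))
```

Pooling pixels before dividing weights each slide by its tumor area, which differs from averaging per-slide ratios. Non-tumor classes such as normal bone do not enter the ratio.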

Patient Stratification

On the basis of necrosis ratio calculated by the segmentation model, patients were stratified to predict patient outcome. OS and PFS outcome data were collected from patient charts, and Kaplan-Meier curves were plotted. Because reproducible estimation of necrosis ratio without any variability is now possible with the deep learning model, we not only tried the well-known cutoff threshold at 90%6 but also tuned the cutoff threshold with an interval of 10% to objectively investigate various cutoff thresholds specifically for this segmentation model and this data set. The log-rank test was performed to evaluate patient stratification.
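The stratification step can be sketched as a two-group log-rank test evaluated at each candidate cutoff. This is a minimal standard-library sketch and not the authors' analysis code. It assumes that patients with a ratio at or above the cutoff form one group, that events are coded 1 for an observed event and 0 for censoring, and that the P value comes from a chi-square distribution with 1 degree of freedom. The names `logrank_p` and `sweep_cutoffs` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def logrank_p(times: Sequence[float], events: Sequence[int], groups: Sequence[bool]) -> float:
    """Two-sided P value of the two-group log-rank test."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                        # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g)  # at risk, group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)   # events at t
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def sweep_cutoffs(
    ratios: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
    cutoffs: Sequence[float] = tuple(i / 10 for i in range(1, 10)),
) -> List[Tuple[float, float]]:
    """Return (cutoff, P value) pairs, skipping cutoffs that leave a group empty."""
    results = []
    for cutoff in cutoffs:
        groups = [r >= cutoff for r in ratios]
        if all(groups) or not any(groups):
            continue
        results.append((cutoff, logrank_p(times, events, groups)))
    return results


if __name__ == "__main__":
    # Toy data for illustration only, not study data.
    ratios = [0.2, 0.3, 0.4, 0.5, 0.85, 0.9, 0.95, 0.99]
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    events = [1] * 8
    for cutoff, p in sweep_cutoffs(ratios, times, events):
        print(f"cutoff {cutoff:.0%}: P = {p:.4f}")
```

Because the cutoff is chosen after looking at several P values on the same cases, the selected threshold is specific to the model and data set at hand, as the text notes, and would need confirmation on independent data.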

Results

Necrosis Ratio Assessment

Figures 3 and 4 show multiclass segmentation predictions on WSIs and zoom-in images, respectively. By overlaying the multiclass segmentation predictions on testing WSIs using the Memorial Sloan Kettering Cancer Center slide viewer,21 the ability of the segmentation model to accurately segment the seven tissue subtypes was visually validated. The model was not able to accurately segment certain morphologic patterns, such as isolated viable tumor cells, chondroid foci, and densely sclerotic osteosarcoma, as shown in Supplemental Figure S1. The segmentation model and code are publicly available (https://github.com/MSKCC-Computational-Pathology/DMMN-osteosarcoma, last accessed March 15, 2021).

Figure 3.

Multiclass segmentation of two osteosarcoma whole slide images. Whole slide images (A and C) and their segmentation predictions (B and D). Viable tumor is segmented in red, necrosis/nonviable bone is segmented in blue, necrosis/fibrosis without bone is segmented in yellow, normal bone is segmented in green, normal tissue is segmented in orange, normal cartilage is segmented in brown, and blank is segmented in gray. Scale bar = 5 mm (A and C).

Figure 4.

Segmentation of viable tumor (A and B), necrosis/nonviable bone (C and D), and necrosis/fibrosis without bone (E and F). Viable tumor is segmented in red, necrosis/nonviable bone is segmented in blue, and necrosis/fibrosis without bone is segmented in yellow. Scale bars: 100 μm (A); 200 μm (C and E).

To quantitatively evaluate the segmentation model, necrosis ratio manually assessed by experts from pathology reports (denoted as rPR) and necrosis ratio objectively assessed by our deep learning model (denoted as rDL) were compared using absolute difference between them. Our hypothesis was that the necrosis ratio of the deep learning model would be close to the necrosis ratio of manual assessment by experts. Therefore, absolute difference was used as a metric and was defined as |rPR − rDL|. Table 1 shows mean, median, and SD of absolute differences in various ranges of necrosis ratio. Mean and median absolute difference for cases whose necrosis ratio ≥90% were 4.44% and 2.95%, respectively. The scatterplot of the 80 testing cases is shown in Supplemental Figure S2. Three cases showed significant differences in necrosis ratio, where the pathologists' assessment described <50%, whereas the model predicted >90%. On rereview of the three cases, the consensus of pathologists' assessment of necrosis was 60% to 70%, which is still below the model's assessment of >90%. The contributing factors included isolated or small clusters of viable tumor cells, chondroid areas, and, in one case, densely sclerotic osteosarcoma with viable residual tumor cells. The model was further analyzed using outcome data of testing cases to evaluate if the deep learning model can be clinically used, as described in the next section.

Table 1.

Mean, Median, and SD of Absolute Differences on Various Ranges Based on Necrosis Ratio from Pathology Reports, Denoted as rPR

Range              Mean (%)   Median (%)   SD (%)   Cases, n
rPR ≥ 90%          4.44       2.95         4.17     26
rPR < 90%          30.99      28.35        17.37    54
0% ≤ rPR ≤ 100%    22.37      17.9         19.09    80
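
The summary statistics reported in Table 1 can be computed from paired ratios as in the following sketch. It assumes that ratios are expressed in percent and that the SD is the sample standard deviation. The text does not say which SD definition was used, and the function name `abs_diff_summary` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def abs_diff_summary(r_pr: Sequence[float], r_dl: Sequence[float]) -> Dict[str, float]:
    """Mean, median, and sample SD of |rPR - rDL| over paired cases."""
    diffs = [abs(a - b) for a, b in zip(r_pr, r_dl)]
    return {
        "mean": statistics.mean(diffs),
        "median": statistics.median(diffs),
        "sd": statistics.stdev(diffs),  # assumption: sample SD, needs >= 2 cases
        "n": len(diffs),
    }


if __name__ == "__main__":
    # Toy ratios in percent for illustration only, not study data.
    print(abs_diff_summary([95, 90, 100], [93, 96, 99]))
```

Restricting the paired lists to cases with rPR ≥ 90% or rPR < 90% before calling the function would give the per-range rows.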

Outcome Prediction

Kaplan-Meier curves were plotted, and the log-rank P values were calculated to evaluate outcome predictions, as shown in Figure 5. On the basis of manual assessment from pathology reports at the conventional 90% cutoff threshold on 75 testing cases, P = 0.045 was achieved for OS outcome. On the basis of automated assessment from the deep learning model at the 90% cutoff threshold, P = 0.0031 was achieved, showing the deep learning model can successfully stratify patients for OS outcome. Because there is no variability caused by the deep learning model, an objective approach was proposed to investigate various cutoff thresholds, specifically for this segmentation model and this data set. With the interval of 10%, P = 2.4 × 10⁻⁶ was achieved at the 80% cutoff threshold. Furthermore, PFS outcome data were predicted on 64 testing cases using the deep learning model, which achieved P = 0.016 at the 60% cutoff threshold. The P values from various cutoff thresholds are shown in Supplemental Table S1.
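For readers unfamiliar with the curves in Figure 5, the Kaplan-Meier estimate for one patient group can be sketched with the standard library alone. This is an illustrative sketch of the standard product-limit estimator, not the authors' plotting code. Events are assumed to be coded 1 for an observed event and 0 for censoring, and the function name `kaplan_meier` is hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps of the product-limit estimate."""
    curve = []
    survival = 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Toy follow-up data for illustration only, not study data.
    for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(t, s)
```

Computing one such curve per group, for example above and below a necrosis ratio cutoff, gives the pair of step functions that the log-rank test compares.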

Figure 5.

Figure 5

Outcome prediction. A: Patient stratification based on overall survival (OS) outcome at the conventional 90% cutoff threshold from manually assessed pathology reports, achieving P = 0.045. B: Patient stratification based on OS outcome at the same 90% cutoff threshold from our deep learning model, achieving P = 0.0031. The deep learning model performed a better stratification than manual assessment of glass slides. C: Patient stratification based on OS outcome at the 80% cutoff threshold from our deep learning model, achieving P = 2.4 × 10⁻⁶. The cutoff threshold for our deep learning model and our data set can be tuned to have better stratification because of its objective and reproducible manner. D: Patient stratification based on progression-free survival outcome at the 60% cutoff threshold from our deep learning model, achieving P = 0.016.

Discussion

In this study, a deep learning–based approach was developed to estimate case-level necrosis ratio from multiple H&E-stained osteosarcoma WSIs, where necrosis ratio is known to correlate with prognosis.3, 4, 5, 6, 7, 8 Specifically, Deep Multi-Magnification Network17 was trained to objectively and reproducibly segment multiple tissue subtypes, including viable tumor and necrotic tumor, at a pixel level, to calculate necrosis ratio. The accuracy of the necrosis ratio estimated by the deep learning model was verified by comparing it with manually assessed necrosis ratio from pathology reports. Furthermore, patients were stratified by OS and PFS based on necrosis ratio. Because the model's estimate is objective, the cutoff threshold was tuned to stratify patients specifically for our trained model and our data set. In this study, the segmentation model achieved P = 2.4 × 10⁻⁶ at the 80% cutoff threshold for OS and P = 0.016 at the 60% cutoff threshold for PFS. To our knowledge, this is the first study, with the largest osteosarcoma cohort, to compare manually assessed necrosis ratio from pathology reports with objectively assessed necrosis ratio from a deep learning model and to successfully stratify patients to predict OS and PFS based on objectively assessed necrosis ratio.

High intraobserver and interobserver variability of histologic subtypes of in situ and invasive cancer and necrosis percentage by manual microscopic assessment of H&E-stained sections has been observed in various cancer types, such as lung,22,23 breast,24 and colon.25 Although necrosis ratio from histologic slides is well proven as a prognostic factor in osteosarcoma,3, 4, 5, 6, 7, 8 this visual estimation of necrosis remains subjective.9,26 Even with standardization of diagnostic criteria, reducing variability in necrosis ratio from ≥30 glass slides is challenging.

Deep learning with digitized histopathology images can be used as a tool to avoid this variability10,11 because deep learning models can objectively and consistently generate the same output given the same input. In this study, the Deep Multi-Magnification Network17 was used to accurately segment viable tumor and necrotic tumor at the level of a pixel, the most basic element in an image. After segmentation of osteosarcoma WSIs, model performance was evaluated using manually assessed necrosis ratio from pathology reports and patient outcome data. For clinical relevance, necrosis ratios from the segmentation model were compared with necrosis ratios from pathology reports, taking into account the subjective nature of manual estimation of necrosis. Although necrosis ratio estimated by the segmentation model highly correlated with necrosis ratio manually assessed by experts in cases with high necrosis ratio, cases with necrosis ratio <50% generally had a higher absolute difference. This result is not unexpected, because manual assessment of necrosis ratio is known to be highly subjective. For example, in their study of necrosis assessment by pathologists, Kang et al9 showed that necrosis ratio assessed by six expert pathologists demonstrated an intraclass correlation coefficient of 0.652 for 10 cases. In addition, high absolute differences within this range may be related to imprecise subjective estimation of low percentage of necrotic tumor, which is much below the cutoff threshold (90%) used to determine good or poor prognosis.6

For this reason, the necrosis ratio estimated by the model was used to stratify patients to predict OS and PFS outcome data. The log-rank test was used to verify better stratification by the segmentation model than by human experts. The cutoff threshold was tuned specifically for the segmentation model and the current data set because of its objective and reproducible manner. The outcome results validated the objectivity of the model to recognize necrosis patterns and to estimate percentage of treatment response. Previous studies have attempted to find the optimal cutoff threshold of necrosis ratio as a strong indicator of prognosis using manual assessment, but high intraobserver and interobserver variability has precluded effective conclusions.9,27 With a deep learning model, it would be possible to objectively and reproducibly select the optimal cutoff threshold stratifying patients with the lowest log-rank P value.

There are several limitations to this study. The qualitative evaluation of segmentation predictions indicated that the segmentation model missed some viable tumors, such as isolated tumor cells, chondroid foci, and densely sclerotic osteosarcoma, potentially causing overestimation of necrosis ratio. The segmentation model was designed to segment at a tissue level, not at a cellular level. Although the model was able to segment regions with dense areas of viable tumor cells, it missed isolated viable tumor cells because of a lack of training by cell-level annotations. The estimation of necrosis ratio can be further improved by combining with a cell segmentation model28 that can detect isolated viable tumor cells. Chondroid foci and densely sclerotic osteosarcoma were underrepresented in the training set. By including more regions with rare patterns to the training set using Deep Interactive Learning18 or generating synthetic histology images with the rare patterns using generative adversarial networks,29 the model could be fine-tuned to accurately segment them. Artifacts caused during slide preparation (bone dust and stain precipitate) can lead to missegmentation, which is a common challenge in all digital and computational pathology.30,31 Training a more robust segmentation model by including artifacts in the training set would circumvent this issue. Lastly, this study was done with a data set from a single institution. For a more comprehensive study to improve segmentation and to select the optimal cutoff threshold, collecting a multi-institutional data set would be necessary.

In summary, the deep learning–based segmentation model was able to objectively and reproducibly estimate necrosis ratio from multiple osteosarcoma whole slide images. The experimental results demonstrated high correlation between manually assessed necrosis ratio by pathologists and automatically calculated necrosis ratio by the segmentation model. This indicated that the segmentation model could successfully estimate osteosarcoma necrosis ratio from multiple slide images. Patients were stratified to predict overall survival and progression-free survival by additionally tuning the cutoff threshold in an objective manner. As intraobserver and interobserver variability is an intrinsic phenomenon in the manual and semiquantitative estimation of necrosis ratio, adopting deep learning–based models for a more objective assessment of necrosis ratio can pave the way for more prospective studies to assess treatment response and outcome in patients with osteosarcoma.

Acknowledgments

We thank the PRISSMM collaborative for support. Electronic health records were curated, and patient outcomes were defined, using the PRISSMM phenomic data system. PRISSMM is a set of phenomic data standards and tools for characterization and communication of structured information about cancer status and treatment outcomes for patients with solid tumors.

Footnotes

Supported by the Warren Alpert Foundation Center for Digital and Computational Pathology at Memorial Sloan Kettering Cancer Center; and NIH/National Cancer Institute Cancer Center Support grant P30 CA008748.

D.J.H. and N.P.A. contributed equally to this work.

Disclosures: T.J.F. is cofounder, chief scientist, and equity holder of Paige.AI. C.M.V. is a consultant (uncompensated) and equity holder in Paige.AI. D.J.H., N.P.A., C.M.V., T.J.F., and M.R.H. have intellectual property interests related to Paige.AI, which is relevant to the work that is the subject of this article. Memorial Sloan Kettering Cancer Center has institutional financial interests in Paige.AI. The remaining authors declare no competing interests.

Supplemental material for this article can be found at http://doi.org/10.1016/j.ajpath.2022.12.004.

Author Contributions

D.J.H., N.P.A., C.M.V., T.J.F., and M.R.H. conceived the study; D.J.H. developed the deep learning model; N.P.A. and M.R.H. reviewed and annotated whole slide images; M.-H.J. scanned glass slides; S.D.S., C.C., P.A.M., L.H.W., and J.H.H. provided the clinical data set; D.J.H., N.P.A., and M.R.H. performed statistical analysis and wrote the initial manuscript; and all authors read, edited, and approved the final manuscript.

Supplemental Data

Supplemental Figure S1

Mislabeled regions from the segmentation model. A and B: The model was designed to segment tissue components. Although the model can segment dense viable tumor cells, it missed isolated viable tumor cells. C and D: Because of a lack of sufficient examples of their morphologic pattern in our training set, the model mislabeled chondroid foci, which needed to be labeled as viable tumor. E and F: Because of a lack of sufficient examples of their morphologic pattern in our training set, the model mislabeled densely sclerotic osteosarcomas, which needed to be labeled as viable tumor. Regions with red, blue, yellow, green, orange, brown, and gray indicate predicted viable tumor, necrosis/nonviable bone, necrosis/fibrosis without bone, normal bone, normal tissue, cartilage, and blank by our segmentation model, respectively. Scale bars: 100 μm (A and E); 500 μm (C).

Supplemental Figure S2

Scatterplot between necrosis ratio from pathology reports and necrosis ratio from the deep learning model. Dotted line shows perfect matching points between necrosis ratio from pathology reports and necrosis ratio from the deep learning model. Blue and orange dots represent cases with necrosis ratio from pathology reports ≥90% and <90%, respectively.

Supplemental Table S1

References

1. Ottaviani G., Jaffe N. The epidemiology of osteosarcoma. Cancer Treat Res. 2009;152:3–13. doi: 10.1007/978-1-4419-0284-9_1.
2. Provisor A.J., Ettinger L.J., Nachman J.B., Krailo M.D., Makley J.T., Yunis E.J., Huvos A.G., Betcher D.L., Baum E.S., Kisker C.T., Miser J.S. Treatment of nonmetastatic osteosarcoma of the extremity with preoperative and postoperative chemotherapy: a report from the Children's Cancer Group. J Clin Oncol. 1997;15:76–84. doi: 10.1200/JCO.1997.15.1.76.
3. Davis A.M., Bell R.S., Goodwin P.J. Prognostic factors in osteosarcoma: a critical review. J Clin Oncol. 1994;12:423–431. doi: 10.1200/JCO.1994.12.2.423.
4. Glasser D.B., Lane J.M., Huvos A.G., Marcove R.C., Rosen G. Survival, prognosis, and therapeutic response in osteogenic sarcoma: the Memorial Hospital experience. Cancer. 1992;69:698–708. doi: 10.1002/1097-0142(19920201)69:3<698::aid-cncr2820690317>3.0.co;2-g.
5. Huvos A.G., Rosen G., Marcove R.C. Primary osteogenic sarcoma: pathologic aspects in 20 patients after treatment with chemotherapy en bloc resection, and prosthetic bone replacement. Arch Pathol Lab Med. 1977;101:14–18.
6. O'Kane G.M., Cadoo K.A., Walsh E.M., Emerson R., Dervan P., O'Keane C., Hurson B., O'Toole G., Dudeney S., Kavanagh E., Eustace S., Carney D.N. Perioperative chemotherapy in the treatment of osteosarcoma: a 26-year single institution review. Clin Sarcoma Res. 2015;5:17. doi: 10.1186/s13569-015-0032-0.
7. Raymond A.K., Chawla S.P., Carrasco C.H., Ayala A.G., Fanning C.V., Grice B., Armen T., Plager C., Papadopoulos N.E., Edeiken J. Osteosarcoma chemotherapy effect: a prognostic factor. Semin Diagn Pathol. 1987;4:212–236.
8. Rosen G., Caparros B., Huvos A.G., Kosloff C., Nirenberg A., Cacavio A., Marcove R.C., Lane J.M., Mehta B., Urban C. Preoperative chemotherapy for osteogenic sarcoma: selection of postoperative adjuvant chemotherapy based on the response of the primary tumor to preoperative chemotherapy. Cancer. 1982;49:1221–1230. doi: 10.1002/1097-0142(19820315)49:6<1221::aid-cncr2820490625>3.0.co;2-e.
9. Kang J.-W., Shin S.H., Choi J.H., Moon K.C., Koh J.S., kwon Jung C., Park Y.-K., Lee K.B., Chung Y.-G. Inter-and intra-observer reliability in histologic evaluation of necrosis rate induced by neo-adjuvant chemotherapy for osteosarcoma. Int J Clin Exp Pathol. 2017;10:359–367.
10. Srinidhi C.L., Ciga O., Martel A.L. Deep neural network models for computational histopathology: a survey. Med Image Anal. 2021;67:101813. doi: 10.1016/j.media.2020.101813.
11. Van der Laak J., Litjens G., Ciompi F. Deep learning in histopathology: the path to the clinic. Nat Med. 2021;27:775–784. doi: 10.1038/s41591-021-01343-4.
12. Anisuzzaman D., Barzekar H., Tong L., Luo J., Yu Z. A deep learning study on osteosarcoma detection from histological images. Biomed Signal Process Control. 2021;69:102931.
13. Arunachalam H.B., Mishra R., Daescu O., Cederberg K., Rakheja D., Sengupta A., Leonard D., Hallac R., Leavey P. Viable and necrotic tumor assessment from whole slide images of osteosarcoma using machine-learning and deep-learning models. PLoS One. 2019;14:e0210706. doi: 10.1371/journal.pone.0210706.
14. Fu Y., Xue P., Ji H., Cui W., Dong E. Deep model with Siamese network for viable and necrotic tumor regions assessment in osteosarcoma. Med Phys. 2020;47:4895–4905. doi: 10.1002/mp.14397.
15. Mishra R., Daescu O., Leavey P., Rakheja D., Sengupta A. Springer; 2017. Histopathological diagnosis for viable and non-viable tumor prediction for osteosarcoma using convolutional neural network. International Symposium on Bioinformatics Research and Applications; pp. 12–23.
16. Mishra R., Daescu O., Leavey P., Rakheja D., Sengupta A. Convolutional neural network for histopathological analysis of osteosarcoma. J Comput Biol. 2018;25:313–325. doi: 10.1089/cmb.2017.0153.
17. Ho D.J., Yarlagadda D.V.K., D’Alfonso T.M., Hanna M.G., Grabenstetter A., Ntiamoah P., Brogi E., Tan L.K., Fuchs T.J. Deep multi-magnification networks for multi-class breast cancer image segmentation. Comput Med Imaging Graph. 2021;88:101866. doi: 10.1016/j.compmedimag.2021.101866.
18. Ho D.J., Agaram N.P., Schüffler P.J., Vanderbilt C.M., Jean M.-H., Hameed M.R., Fuchs T.J. Springer; 2020. Deep interactive learning: an efficient labeling approach for deep learning-based osteosarcoma treatment response assessment. International Conference on Medical Image Computing and Computer-Assisted Intervention; pp. 540–549.
19. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern. 1979;9:62–66.
20. Paszke A., Gross S., Massa F., Lerer A., Bradbury J., Chanan G., Killeen T., Lin Z., Gimelshein N., Antiga L., Desmaison A., Köpf A., Yang E., DeVito Z., Raison M., Tejani A., Chilamkurthy S., Steiner B., Fang L., Bai J., Chintala S. PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems. 2019:32.
21. Schüffler P.J., Geneslaw L., Yarlagadda D.V.K., Hanna M.G., Samboy J., Stamelos E., Vanderbilt C., Philip J., Jean M.-H., Corsale L., Manzo A., Paramasivam N.H.G., Ziegler J.S., Gao J., Perin J.C., Kim Y.S., Bhanot U.K., Roehrl M.H.A., Ardon O., Chiang S., Giri D.D., Sigel C.S., Tan L.K., Murray M., Virgo C., England C., Yagi Y., Sirintrapun S.J., Klimstra D., Hameed M., Reuter V.E., Fuchs T.J. Integrated digital pathology at scale: a solution for clinical diagnostics and cancer research at a large academic medical center. J Am Med Inform Assoc. 2021;28:1874–1884. doi: 10.1093/jamia/ocab085.
  • 22.Wang C., Durra H.Y., Huang Y., Manucha V. Interobserver reproducibility study of the histological patterns of primary lung adenocarcinoma with emphasis on a more complex glandular pattern distinct from the typical acinar pattern. Int J Surg Pathol. 2014;22:149–155. doi: 10.1177/1066896913519165. [DOI] [PubMed] [Google Scholar]
  • 23.Warth A., Stenzinger A., von Brünneck A.-C., Goeppert B., Cortis J., Petersen I., Hoffmann H., Schnabel P.A., Weichert W. Interobserver variability in the application of the novel IASLC/ATS/ERS classification for pulmonary adenocarcinomas. Eur Respir J. 2012;40:1221–1227. doi: 10.1183/09031936.00219211. [DOI] [PubMed] [Google Scholar]
  • 24.Gomes D.S., Porto S.S., Balabram D., Gobbi H. Inter-observer variability between general pathologists and a specialist in breast pathology in the diagnosis of lobular neoplasia, columnar cell lesions, atypical ductal hyperplasia and ductal carcinoma in situ of the breast. Diagn Pathol. 2014;9:1–9. doi: 10.1186/1746-1596-9-121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Viray H., Li K., Long T.A., Vasalos P., Bridge J.A., Jennings L.J., Halling K.C., Hameed M., Rimm D.L. A prospective, multi-institutional diagnostic trial to determine pathologist accuracy in estimation of percentage of malignant cells. Arch Pathol Lab Med. 2013;137:1545–1549. doi: 10.5858/arpa.2012-0561-CP. [DOI] [PubMed] [Google Scholar]
  • 26.Glasser D.B. Columbia University; New York, NY: 1993. Histologic Response to Pre-Operative Chemotherapy in Osteosarcoma: Appropriate Uses in Clinical Research. [Google Scholar]
  • 27.Li X., Ashana A.O., Moretti V.M., Lackman R.D. The relation of tumour necrosis and survival in patients with osteosarcoma. Int Orthop. 2011;35:1847–1853. doi: 10.1007/s00264-011-1209-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Graham S., Vu Q.D., Raza S.E.A., Azam A., Tsang Y.W., Kwak J.T., Rajpoot N. Hover-net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med Image Anal. 2019;58:101563. doi: 10.1016/j.media.2019.101563. [DOI] [PubMed] [Google Scholar]
  • 29.Deshpande S., Minhas F., Graham S., Rajpoot N. SAFRON: stitching across the Frontier Network for generating colorectal cancer histology images. Med Image Anal. 2022;77:102337. doi: 10.1016/j.media.2021.102337. [DOI] [PubMed] [Google Scholar]
  • 30.Janowczyk A., Zuo R., Gilmore H., Feldman M., Madabhushi A. HistoQC: an open-source quality control tool for digital pathology slides. JCO Clin Cancer Inform. 2019;3:1–7. doi: 10.1200/CCI.18.00157. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Schömig-Markiefka B., Pryalukhin A., Hulla W., Bychkov A., Fukuoka J., Madabhushi A., Achter V., Nieroda L., Büttner R., Quaas A., Tolkach Y. Quality control stress test for deep learning-based diagnostic model in digital pathology. Mod Pathol. 2021;34:2098–2108. doi: 10.1038/s41379-021-00859-x.

Associated Data


Supplementary Materials

Supplemental Figure S1

Mislabeled regions from the segmentation model. A and B: The model was designed to segment tissue components. Although the model can segment dense viable tumor cells, it missed isolated viable tumor cells. C and D: Because of a lack of sufficient examples of their morphologic pattern in our training set, the model mislabeled chondroid foci, which needed to be labeled as viable tumor. E and F: Because of a lack of sufficient examples of their morphologic pattern in our training set, the model mislabeled densely sclerotic osteosarcomas, which needed to be labeled as viable tumor. Regions with red, blue, yellow, green, orange, brown, and gray indicate predicted viable tumor, necrosis/nonviable bone, necrosis/fibrosis without bone, normal bone, normal tissue, cartilage, and blank by our segmentation model, respectively. Scale bars: 100 μm (A and E); 500 μm (C).

mmc1.pdf (1.2MB, pdf)
Supplemental Figure S2

Scatterplot between necrosis ratio from pathology reports and necrosis ratio from the deep learning model. Dotted line shows perfect matching points between necrosis ratio from pathology reports and necrosis ratio from the deep learning model. Blue and orange dots represent cases with necrosis ratio from pathology reports ≥90% and <90%, respectively.

mmc2.pdf (46.6KB, pdf)
Supplemental Table S1
mmc3.docx (13.4KB, docx)

Articles from The American Journal of Pathology are provided here courtesy of American Society for Investigative Pathology
