Author manuscript; available in PMC: 2017 Jan 24.
Published in final edited form as: Ann Surg Oncol. 2012 Apr 20;19(9):2992–3001. doi: 10.1245/s10434-012-2345-z

A Probabilistic Analysis of Completely Excised High-Grade Soft Tissue Sarcomas of the Extremity: An Application of a Bayesian Belief Network

Jonathan Agner Forsberg 1, John H Healey 2,4, Murray F Brennan 3,4
PMCID: PMC5262491  NIHMSID: NIHMS840674  PMID: 22526900

Abstract

Background

It is important to understand the relative importance of prognostic variables in patients with soft tissue sarcomas. The purpose of this study was to describe the hierarchical relationships between features inherent to completely excised, localized high-grade soft tissue sarcomas of the extremity and compare the associations to those previously reported.

Methods

Data were collected from the Memorial Sloan-Kettering Cancer Center Sarcoma Database. All adult patients with high-grade extremity soft tissue sarcomas who underwent complete excision (R0 margins) at our institution between 1982 and 2010 were included in the analysis. Bayesian belief network (BBN) modeling software was used to develop a hierarchical network of features trained to estimate the likelihood of disease-specific survival. Important relationships depicted by the BBN model were compared to those previously reported.

Results

The records of 1318 consecutive patients met the inclusion criteria, and all were included in the analysis. First-degree associates of disease-specific survival were the primary tumor size; presence of and time to distant recurrence; and presence of and time to local recurrence. On cross-validation, the BBN model was sufficiently robust, with an area under the curve of 0.94 (95% CI, 0.93–0.96).

Conclusions

We successfully described the hierarchical relationships between features inherent to patients with completely excised high-grade soft tissue sarcomas of the extremity. The relationships defined by the BBN model were similar to those previously reported. Cross-validation results were encouraging, demonstrating that BBN modeling can be used to graphically illustrate the complex hierarchical relationships between prognostic features in this setting.

SYNOPSIS

We used a Bayesian Belief Network (BBN) to describe the hierarchical relationships between features inherent to patients with completely excised high-grade soft tissue sarcomas of the extremities. The relationships defined by the BBN model were similar to those previously reported, demonstrating that BBN modeling can be used as an adjunct to existing nomograms to graphically illustrate the complex hierarchical relationships between prognostic features.

INTRODUCTION

Accurate, personalized survival estimates are important in the treatment of soft tissue sarcomas (STS). Since the development of our institution’s prospectively collected soft tissue sarcoma database, we have evolved from the descriptive characterization1–4 of prognostic factors for outcome5 to the development of generic,6 disease-specific,7 and therapy-related8 nomograms. These nomograms, however, require that specific outcomes be defined a priori, are limited by the number of specific categories in each variable (e.g., histology), and, more importantly, can be hampered by missing or outlying data points.

Bayesian classification, which includes Bayesian belief network (BBN) modeling, is an inductive statistical method that allows for rational learning from experimental data. For a given set of data, probabilistic relationships between features can be characterized by defining conditional probabilities in terms of joint events: P(A, B) = P(A | B) · P(B), which allows one to estimate the probability of “A” given a frame of knowledge “B.” Repeated application of this formula enables the development of a graphical n-dimensional model that codifies all related features within a single hierarchical network. When used as prognostic models, Bayesian classifiers can also account effectively for data multidimensionality and uncertainty, a quality that enables prognostic BBN models to maintain their robustness in the setting of incomplete or outlying clinical data.9, 10 Importantly, BBN models are capable of codifying even the most complex data into clear, graphical representations of hierarchical relationships, a quality that may interest physicians and surgeons. For instance, this technique has been used to better understand relationships between variables in a variety of oncologic settings.11–14
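As a minimal sketch of how repeated application of the product rule yields a network-style factorization, the Python fragment below builds a joint distribution over three binary features from conditional probability tables. The chain structure (size, then distant recurrence, then death from disease), the variable names, and every number are hypothetical and chosen only for illustration; they are not the structure or estimates of the model described in this study.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative only, not study values).
P_SIZE = {"small": 0.4, "large": 0.6}
P_DR_GIVEN_SIZE = {
    "small": {"yes": 0.2, "no": 0.8},
    "large": {"yes": 0.4, "no": 0.6},
}
P_DOD_GIVEN_DR = {
    "yes": {"yes": 0.7, "no": 0.3},
    "no": {"yes": 0.05, "no": 0.95},
}
STATES = ("yes", "no")


def joint(size, dr, dod):
    """Product rule applied twice: P(size, dr, dod) = P(dod | dr) * P(dr | size) * P(size)."""
    return P_DOD_GIVEN_DR[dr][dod] * P_DR_GIVEN_SIZE[size][dr] * P_SIZE[size]


def prob_dod_given_size(size):
    """P(dod = yes | size), obtained by summing the joint over the unobserved feature dr."""
    numerator = sum(joint(size, dr, "yes") for dr in STATES)
    denominator = sum(joint(size, dr, dod) for dr, dod in product(STATES, repeat=2))
    return numerator / denominator


# A valid joint distribution sums to 1 over all combinations of states.
total = sum(joint(s, dr, dod) for s in P_SIZE for dr in STATES for dod in STATES)
```

In this toy chain, the outcome depends on size only through the intermediate feature, which is the kind of conditional independence statement that a network structure encodes.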

The purpose of this study was to develop a BBN model to describe the hierarchical relationships between features inherent to completely excised, localized high-grade soft tissue sarcomas of the extremity. In doing so, we illustrate the power of BBN modeling as a clinical tool that can aid clinicians in codifying complex hierarchical relationships between prognostic variables into clear graphical representations.

MATERIALS AND METHODS

The Memorial Sloan-Kettering Cancer Center (MSKCC) Sarcoma Database contains prospectively collected clinical, pathologic, and treatment-related variables for all adult patients (age >16 years) with primary and recurrent STS treated at our institution since 1982. After obtaining approval from the MSKCC institutional review board, which issued a waiver of informed consent, we searched the MSKCC Sarcoma Database for all patients with high-grade STS of the extremity who underwent complete resection (R0 margins). This homogeneous patient population was chosen in an effort to control for tumor grade and resection margin status prior to performing the initial probabilistic analysis.

Twenty-seven candidate features were chosen based on their current clinical or historical association with disease-specific outcomes in patients with high-grade STS, as well as their availability within the MSKCC Sarcoma Database (Table 1). These included the following: age at the time of surgical excision; sex; size, depth, and location of the tumor; histology and, if applicable, histologic variant; oncologic procedures prior to referral, if any; whether the sarcoma was thought to be radiation-induced; the patient’s home zip code at the time of referral; the surgeon and surgical service of record; need for tumor bed excision following referral; type of surgical procedure performed; presence of bone, nerve, or vascular invasion by the tumor; bone or nerve resection with the tumor; history of chemotherapy or radiotherapy; timing of chemotherapy or radiotherapy, if applicable; presence of and time to local recurrence (LR); presence of and time to distant recurrence (DR); and death from disease.

TABLE 1.

Candidate features considered for inclusion in the Bayesian belief network model

Candidate feature | Label | Description | States
Age^a | AGE | Patient age, at the time of surgery | CV
Sex | SEX | | Male; Female
Size^a | PRIMARY SIZE CATEGORY | Size category of tumor in maximum dimension | ≤5 cm; 5–10 cm; >10 cm
Depth^a | DEPTH | Depth of primary tumor compared to investing fascia of limb | Superficial; Deep
Site^a | SITE | | Upper extremity; Lower extremity
Subsite | SUBSITE | Extremity tumors distal to the vertical plane made by the axillary fold and horizontal plane made by the inguinal ligament were considered. | Hand; Forearm; Elbow; Arm; Axilla; Shoulder; Groin; Hip; Thigh; Knee; Leg; Ankle; Foot
Histology^a | HISTOLOGY | Final histology following excision, review by 3 pathologists | MFH/HGPS; Synovial; Liposarcoma; Leiomyosarcoma; MPNST; Fibrosarcoma; Other
Variant^a | VARIANT | Histologic variant, if applicable | Monophasic; Biphasic…etc.
Presentation status^a | PRESENTATION STATUS | Oncologic procedures performed prior to referral (if any) | None; Biopsy only; Marginal excision; Wide excision
Radiation-induced | RT INDUCED | Whether the sarcoma is considered radiation induced | Yes; No
Referring zip code | FIRST 3 DIGITS ZIP | First 3 digits of patient’s home zip code at the time of referral | CV
Surgeon^a | SURGEON CODE | 31 surgeons, listed anonymously | A-EE
Service^a | SERVICE CODE | 2 surgical services | GMT; Orthopaedic Surgery
Re-excision^a | RE EXCISION | Whether the patient, upon referral, required a tumor bed excision | Yes; No
Procedure^a | PROCEDURE | Type of procedure performed | Limb sparing; Amputation
Bone invasion^a | BONE INVASION | | Yes; No
Bone resection^a | BONE RESECTED | | Yes; No
Nerve invasion^a | NERVE INVASION | | Yes; No
Nerve resection^a | NERVE RESECTED | | Yes; No
Vascular invasion | VASC INVASION | | Yes; No
Chemotherapy, preop^a | PREOP CHEMO | | Yes; No
Chemotherapy, postop^a | POSTOP CHEMO | | Yes; No
Radiotherapy, preop^a | PREOP RT | | Yes; No
Radiotherapy, postop^a | POSTOP RT | | Yes; No
Time to LR^a | TIME TO LR | In months | CV; None
Time to DR^a | TIME TO DR | In months | CV; None
Death from disease^a | DOD | Whether the patient died of disease, a reflection of disease-specific survival | Yes; No

^a Candidate feature included in the final model.

CV = continuous variable; MFH/HGPS = malignant fibrous histiocytoma/high-grade pleomorphic sarcoma; MPNST = malignant peripheral nerve sheath tumor; GMT = Gastric & Mixed Tumor; LR = local recurrence; DR = distant recurrence.

The following definitions were used in this study. As mentioned above, only completely excised high-grade STS of the extremity were considered in this analysis. Specifically, tumors distal to the vertical plane made by the axillary fold in the upper extremity or distal to the inguinal ligament in the lower extremity were considered. A sarcoma was considered to be radiation-induced if it was histologically dissimilar from the original tumor and occurred within an irradiated field more than 6 months after irradiation. For histology, only histologic diagnoses comprising more than 2% of cases were included. Those comprising less than 2% of cases were combined and categorized as “other.” Patients were considered to have undergone a re-excision if they required, in the opinion of the treating surgeon, a tumor bed excision after referral for local control. Bone adherence/invasion was considered present if, on imaging, the tumor exhibited any effect on an adjacent bone, including periosteal reaction. Nerve and vascular invasion were determined histologically. The presence of and time to first DR were determined based on imaging, and calculated from the date of initial operation. Local recurrences were diagnosed by physical examination and/or imaging, and were confirmed histologically. Time to LR was calculated from the date of initial surgical excision to the date of histologic diagnosis of recurrence. Deaths confirmed to be sarcoma-related were considered disease-related.

Bayesian Statistical Analysis and Model Development

The training data set included all cases identified from the MSKCC Sarcoma Database during the study period. All 27 candidate features were considered for inclusion in the preliminary models. The BBN models were developed using commercially available machine-learning algorithms (FasterAnalytics, DecisionQ Corp., Washington, DC, USA) that automatically learn network structures and priors from the training data; thus, priors were not specified a priori.15, 16 Prior to Bayesian analysis, features containing continuous variables required conversion to categorical variables.17, 18 We used equal-area binning based on prior distributions learned from the training set. In an effort to balance goodness-of-fit against robustness, a parsimony metric was used to reduce the risk of overfitting the final model to the training data.9, 18 Using a step-wise process, unrelated and redundant features were pruned from the preliminary models to produce the final model.9, 18
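The discretization step can be sketched as follows, under the assumption that equal-area binning corresponds to equal-frequency (quantile) binning, in which each bin holds roughly the same share of the observations. The function names and the sample values are hypothetical, and the commercial software's exact procedure is not described in the text, so this is an illustration of the general idea only.

```python
import statistics
from bisect import bisect_left


def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points that split `values` into roughly equal-sized bins."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def assign_bin(value, cuts):
    """Index of the bin for `value`; a value equal to a cut point falls in the lower bin."""
    return bisect_left(cuts, value)


# Hypothetical times to recurrence in months (illustrative only, not study data).
months = [2, 3, 4, 5, 7, 9, 10, 12, 14, 16, 20, 25, 28, 33, 40]

cuts = equal_frequency_cuts(months, n_bins=5)
bins = [assign_bin(m, cuts) for m in months]
counts = [bins.count(i) for i in range(5)]  # number of observations in each bin
```

With five bins, each category carries about one fifth of the observations, so no category is starved of training cases when conditional probabilities are estimated.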

To account for missing data within the training set, we used a passive, truncation-based imputation algorithm.9 We imputed values for those features in which missing data represented less than 30% of the total record count, and for which there was no adequate substitute feature. The imputation algorithm was applied to six features within the training set: bone invasion (missing in 5.4% of records), bone resection (7.9%), nerve invasion (11.6%), nerve resection (12.2%), vascular invasion (12.4%), and re-excision (27.5%). Thus, no features were pruned from the model because of missing data. Most patients had neither LR (85.3%) nor DR (68.3%). Therefore, a “missing” value in each of these features was defined as no LR or DR.
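The eligibility rule for imputation can be illustrated with a short sketch that uses the record count and the missing-record counts reported in Table 2. The imputation algorithm itself is not reproduced here; the helper names are hypothetical, and the recoding function only mirrors the stated convention that a missing time to LR or DR means no recurrence.

```python
N_RECORDS = 1318  # total records in the training set, as reported in the text

# Counts of records with a missing value, per feature, as listed in Table 2.
missing_counts = {
    "BONE INVASION": 71,
    "BONE RESECTED": 104,
    "NERVE INVASION": 154,
    "NERVE RESECTED": 161,
    "VASC INVASION": 164,
    "RE EXCISION": 363,
}

MAX_MISSING_FRACTION = 0.30  # imputation threshold stated in the text


def missing_fraction(count, total=N_RECORDS):
    """Share of records in which the feature is missing."""
    return count / total


# A feature is eligible for imputation when its missing share is below the threshold.
eligible = {
    name: missing_fraction(count) < MAX_MISSING_FRACTION
    for name, count in missing_counts.items()
}


def recode_recurrence(time_to_event):
    """A missing time to LR or DR is read as 'no recurrence' rather than imputed."""
    return "None" if time_to_event is None else time_to_event
```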

We trained the BBN model to specify network structure and prior probability distributions in order to develop classifiers of estimated disease-specific survival (DSS). The network structure was then portrayed graphically to illustrate the conditional interdependence and hierarchy of the features. First-degree associates were defined as those nodes that share edges with the outcome of interest (death from disease, in this case), while second-degree associates were those nodes that share edges with the first-degree associates. Inference tables were calculated depicting posterior estimates of probability for each possible permutation with respect to the outcome.
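The definitions of first- and second-degree associates can be expressed as a neighborhood lookup on the network's edges. In the sketch below, the three edges touching the outcome follow the first-degree associates named in the Results; the text does not state which first-degree node each second-degree feature attaches to, so those pairings are hypothetical placeholders, as are the function names.

```python
from collections import defaultdict

# Edges touching DOD follow the first-degree associates named in the text.
# The remaining pairings are hypothetical placeholders for illustration.
edges = [
    ("DOD", "PRIMARY SIZE CATEGORY"),
    ("DOD", "TIME TO DR"),
    ("DOD", "TIME TO LR"),
    ("PRIMARY SIZE CATEGORY", "DEPTH"),     # hypothetical pairing
    ("PRIMARY SIZE CATEGORY", "SITE"),      # hypothetical pairing
    ("TIME TO DR", "PREOP CHEMO"),          # hypothetical pairing
    ("TIME TO LR", "PRESENTATION STATUS"),  # hypothetical pairing
]


def build_adjacency(edge_list):
    """Map each node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def associates(adjacency, outcome):
    """Return (first-degree, second-degree) associates of `outcome`."""
    first = set(adjacency[outcome])
    second = set()
    for node in first:
        second |= adjacency[node]
    # Exclude the outcome itself and any node that is already first-degree.
    second -= first | {outcome}
    return first, second


first_degree, second_degree = associates(build_adjacency(edges), "DOD")
```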

Cross-validation

Ten-fold cross-validation was performed to assess the robustness of the final BBN model. Data were randomized and divided into 10 matching train-and-test sets. Each train-and-test set consisted of a training set composed of 90% of the patient records and a test set composed of the remaining 10%. Each test set was unique, without overlap. A BBN model was trained on each training set by applying the same parameters as the final BBN model, then tested on the corresponding test set. This process was repeated for each of the 10 unique train-and-test sets. Using the generated predicted probabilities, a receiver operating characteristic (ROC) curve, which is a graphical plot of sensitivity versus 1-specificity at all discrimination threshold levels, was generated considering DSS as the outcome. The area under the ROC curve (AUC) was then calculated to assess the model’s overall accuracy and robustness.
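A minimal sketch of the fold construction and the AUC calculation is given below, using only the Python standard library. The function names, the random seed, and the example labels and scores are hypothetical; the text does not state whether predictions were pooled across folds or averaged per fold, and the BBN training step itself is omitted.

```python
import random


def ten_fold_indices(n_records, n_folds=10, seed=0):
    """Shuffle record indices and split them into non-overlapping test folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def train_test_sets(folds):
    """Yield (train, test) index lists; each fold serves as the test set exactly once."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


folds = ten_fold_indices(1318)

# Hypothetical held-out labels and predicted probabilities (illustrative only).
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.3, 0.1, 0.4, 0.6]
example_auc = roc_auc(example_labels, example_scores)
```

The pairwise definition used here is equivalent to the area under the plotted ROC curve and avoids choosing explicit thresholds.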

RESULTS

We identified a total of 1318 patients who met the inclusion criteria. All records were included in the analysis. The clinical characteristics and demographics of the patient population are shown in Table 2. These data comprised the training set for model development. The median age was 54 years (interquartile range [IQR], 38, 58). Most patients were male (55.2%), and most had lower-extremity lesions (73.1%). The tumor was 5 cm or smaller in 35.5% of patients, 5–10 cm in 32.5%, and larger than 10 cm in 31.6%; size was unknown in six patients (0.5%). The most common histology was malignant fibrous histiocytoma/high-grade pleomorphic sarcoma (39.6%), followed by synovial sarcoma (15.8%), liposarcoma (12.7%), leiomyosarcoma (10.8%), malignant peripheral nerve sheath tumor (3.9%), and fibrosarcoma (2.6%).

TABLE 2.

Clinical characteristics and demographics for all patients with completely excised high-grade soft tissue sarcomas of the extremity (N = 1318)

Feature No. % Median IQR
Age (years) 54 38, 58
Sex
  Male 728 55.2
  Female 590 44.8
Tumor size category
  ≤5 cm 468 35.5
  5–10 cm 428 32.5
  >10 cm 416 31.6
  Unknown 6 0.5
Depth
  Superficial 265 20.1
  Deep 1053 79.9
Site
  Upper extremity 354 26.9
  Lower extremity 964 73.1
Subsite
  Hand 37 2.8
  Forearm 99 7.5
  Elbow 26 2.0
  Arm 91 6.9
  Axilla 33 2.5
  Shoulder 68 5.2
  Groin 43 3.3
  Hip 12 0.9
  Thigh 607 46.0
  Knee 73 5.5
  Leg 149 11.3
  Ankle 22 1.7
  Foot 58 4.4
Histology, variant
  MFH/HGPS 522 39.6
    Pleomorphic 200 38.3
    Myxofibrosarcomatous 171 32.8
    Giant Cell 10 1.9
    Inflammatory 2 0.4
    NOS 139 26.6
  Synovial sarcoma 208 15.8
    Monophasic 136 65.4
    Biphasic 70 33.7
    NOS 2 0.9
  Liposarcoma 168 12.7
    Myxoid/round cell 83 49.4
    Pleomorphic 57 33.9
    Dedifferentiated 21 12.5
    NOS 7 4.2
  Leiomyosarcoma 142 10.8
  MPNST 51 3.9
  Fibrosarcoma 34 2.6
  Other 193 14.6
Presentation Status
  No prior treatment 260 19.7
  Biopsy only 555 42.1
  Marginal excision 371 28.1
  Wide excision 132 10.0
Radiation-induced
  Yes 19 1.4
  No 1299 98.6
Zip code upon referral
  First 3 digits 112 087, 125
Surgeon
  A 318 24.2
  B 245 18.6
  C 146 11.1
  D 123 9.3
  E 116 8.8
  F 103 7.8
  G 102 7.8
  Other 165 12.4
Surgical Service
  GMT 798 60.6
  Orthopaedic Surgery 512 38.8
  Other 8 0.6
Tumor bed excision
  Yes 457 34.7
  No 498 37.8
  Missing 363 27.5
Procedure
  Amputation 106 8.0
  Limb-sparing surgery 1212 92.0
Bone invasion
  Yes 51 3.9
  No 1196 90.7
  Missing 71 5.4
Bone resected
  Yes 183 13.9
  No 1031 78.2
  Missing 104 7.9
Nerve invasion
  Yes 27 2.1
  No 1137 86.3
  Missing 154 11.6
Nerve resected
  Yes 156 11.8
  No 1001 78.0
  Missing 161 12.2
Vascular invasion
  Yes 73 5.5
  No 1081 82.0
  Missing 164 12.4
Chemotherapy
  Preop 164 12.4
  Postop 202 15.3
  None 952 72.2
Radiotherapy
  Preop 51 3.9
  Postop 579 43.9
  None 688 52.2
LR 194 14.7
Time to LR (months) 15 6, 29.5
DR 419 31.8
Time to DR (months) 11 5, 24
Death from disease 349 26.5
Follow-up time (months) 39 14.6, 96.8

IQR = interquartile range; MFH/HGPS = malignant fibrous histiocytoma/high-grade pleomorphic sarcoma; NOS = not otherwise specified; MPNST = malignant peripheral nerve sheath tumor; GMT = Gastric & Mixed Tumor; LR = local recurrence; DR = distant recurrence.

Chemotherapy was administered preoperatively in 12.4% of patients and postoperatively in 15.3%. Radiotherapy was given preoperatively in 3.9% of patients and postoperatively in 43.9%. In 34.7% of patients, re-excision was performed after a previous marginal excision (28.1%) or wide excision (10.0%). LR occurred in 14.7% of patients, with a median time to LR of 15 months (IQR, 6.0, 29.5). DR occurred in 31.8% of patients, with a median time of 11 months (IQR, 5.0, 24.0). Overall survival for the entire group was 54.4%, and DSS was 73.5%, at a median follow-up of 39.9 months (IQR, 14.6, 96.8). Median survival was not reached.

Bayesian analysis revealed strong, hierarchical associations between the various features (Figure 1). As shown, there were three first-degree associates of DSS: primary tumor size, time to DR, and time to LR. Second-degree associates of DSS were: the anatomic site and depth of the tumor, any oncologic treatment prior to referral, and a history of preoperative chemotherapy.

Figure 1.

Bayesian belief network constructed using the records of 1318 patients with completely excised high-grade soft tissue sarcomas of the extremity. As shown, there are three first-degree associates of death from disease (“DOD”): primary tumor size (“PRIMARY SIZE CATEGORY”); time to distant recurrence (“TIME TO DR”); and time to local recurrence (“TIME TO LR”).

On cross-validation, ROC curve analysis demonstrated that the model was robust. With DSS as the predicted outcome, the AUC was 0.94 (95% CI, 0.93–0.96). Inference tables were calculated based on the three first-degree associates, and posterior estimates of probability of death from disease ranged from 0.1% to 97.7%. The 41 most likely inferential cases are presented (Table 3), out of 144 potential permutations for this model.

TABLE 3.

Inference table describing the 41 most likely scenarios, out of a total of 144 potential permutations. The outcome of interest was the predicted probability of death from disease

Probability of case based on training data (%) | Primary size category (cm) | Time to LR (months) | Time to DR (months) | Predicted probability of death from disease (%)
24.5 | ≤ 5 | No LR | No DR | 0.1
18.6 | 5–10 | No LR | No DR | 0.4
14.6 | > 10 | No LR | No DR | 0.7
2.4 | > 10 | No LR | ≤ 4 | 83.3
2.3 | > 10 | No LR | 14–28 | 83.9
2.3 | > 10 | No LR | 9–14 | 85.3
2.2 | > 10 | No LR | > 28 | 75.0
2.1 | > 10 | No LR | 4–9 | 84.1
1.9 | 5–10 | No LR | ≤ 4 | 72.4
1.8 | 5–10 | No LR | > 28 | 61.2
1.8 | 5–10 | No LR | 14–28 | 73.3
1.8 | 5–10 | No LR | 9–14 | 75.4
1.6 | 5–10 | No LR | 4–9 | 73.5
1.4 | ≤ 5 | No LR | > 28 | 35.8
1.3 | ≤ 5 | No LR | ≤ 4 | 48.1
1.2 | ≤ 5 | No LR | 14–28 | 49.3
1.2 | ≤ 5 | No LR | 9–14 | 52.0
1.1 | ≤ 5 | No LR | 4–9 | 49.5
0.8 | ≤ 5 | 18–37 | No DR | 0.2
0.7 | ≤ 5 | > 37 | No DR | 0.2
0.6 | 5–10 | 18–37 | No DR | 0.6
0.6 | 5–10 | > 37 | No DR | 0.6
0.5 | ≤ 5 | ≤ 5 | No DR | 0.6
0.5 | > 10 | 18–37 | No DR | 1.2
0.4 | > 10 | > 37 | No DR | 1.1
0.4 | > 10 | > 37 | No DR | 1.1
0.4 | ≤ 5 | 11–18 | No DR | 0.9
0.4 | 5–10 | ≤ 5 | No DR | 1.7
0.4 | ≤ 5 | 5–11 | No DR | 1.0
0.3 | 5–10 | 11–18 | No DR | 2.5
0.3 | > 10 | ≤ 5 | No DR | 3.1
0.3 | 5–10 | 5–11 | No DR | 2.7
0.3 | > 10 | 11–18 | No DR | 4.7
0.2 | > 10 | 11–18 | ≤ 4 | 97.1
0.2 | > 10 | 11–18 | 9–14 | 97.5
0.2 | > 10 | 11–18 | 14–28 | 97.2
0.2 | > 10 | 5–11 | No DR | 5.0
0.2 | > 10 | 5–11 | ≤ 4 | 97.3
0.2 | > 10 | 5–11 | 9–14 | 97.7
0.2 | > 10 | 5–11 | 14–28 | 97.4
0.2 | > 10 | 5–11 | No DR | 5.0

LR = local recurrence; DR = distant recurrence

For locally recurrent tumors, the model showed a difference in survival based on the original size of the primary tumor. From the BBN model, we generated case-specific examples of LR for the three size categories (Table 4). The predicted probability of disease-related death was 28.6% for tumors 5 cm or smaller, but increased to 52.5% for tumors 5–10 cm and to 67.9% for tumors larger than 10 cm. Importantly, death from disease became the most likely scenario following an LR if the size of the primary tumor was greater than 5 cm, but only in the presence of DR (Table 3).

TABLE 4.

Association between the size category of the primary tumor and disease-specific survival in cases of local recurrence

Local recurrence | Size category of primary tumor (cm) | Predicted probability of death from disease (%) | Change in probability above baseline (%)
N/A | N/A | 26.8 | 0
Yes | ≤ 5 | 28.6 | +1.9
Yes | 5–10 | 52.5 | +25.7
Yes | > 10 | 67.9 | +41.2

N/A = not applicable

An association between DSS and the time to LR was also shown. Again, from the BBN model, we created five case-specific examples in which LR occurred before or after 18 months (Table 5). If LR occurred within 18 months after surgery, the predicted probability of DR was 59.6%–68.2% and that of death from disease was 55.8%–65.9%. However, if LR occurred more than 18 months after surgery, the predicted likelihood of DR and death from disease decreased substantially to 37.4%–39.8% and 29.7%–32.4%, respectively. Thus, time to LR affected both the likelihood of DR and the likelihood of death from disease.

TABLE 5.

Time-dependent association between local recurrence and disease-specific survival

LR | Time to LR (months) | Predicted probability of death from disease (%) | Change in probability above baseline (%)
N/A | N/A | 27.5 | 0
Yes | ≤ 5 | 55.8 | 29.1
Yes | 5–11 | 67.5 | 40.8
Yes | 11–18 | 65.9 | 39.2
Yes | 18–37 | 32.4 | 5.8
Yes | > 37 | 29.7 | 3.0

LR = local recurrence; N/A = not applicable

DISCUSSION

Using readily available clinical data, we successfully described the hierarchical relationships between features inherent to patients with completely excised high-grade STS of the extremity. In doing so, we illustrated the ability of Bayesian statistics to codify the complex relationships that exist between prognostic variables in STS into a simple hierarchical model. In the broader sense, we hope that by incorporating the use of BBN modeling in oncology, clinicians can improve their understanding of the complex relationships between various features and offer personalized estimations of any outcome of interest based on prior information. For example, the BBN model constructed in the present study confirmed several important relationships between prognostic features related to the treatment of STS (Figure 1). First-degree associates, or primary determinants, of DSS in this case, were the size of the primary tumor, time to DR, and time to LR.

Using a series of regression-derived nomograms, we have previously shown that a personalized estimation of outcomes in patients with STS is possible.18 Canter et al. demonstrated that the primary tumor size and the tumor site (upper or lower extremity) were the two variables most strongly associated with DSS in patients with synovial sarcoma, the majority of whom had undergone R0 resections.19 In the present BBN model, both the primary tumor size and site of the tumor (upper vs lower extremity) also emerged as dominant features; the former was identified as a first-degree associate of DSS and the latter as a second-degree associate. Elsewhere, Canter et al. acknowledged that using tumor grade alone to classify patient outcomes was inadequate; in high-grade tumors, other variables such as histology need to be included for accurate outcome prediction.20 They found a statistically significant difference in DSS by discretizing histologic diagnoses into “favorable,” “intermediate,” and “unfavorable” groups. We chose not to group histologic diagnoses in this fashion, but in an effort to improve the generalizability of the BBN model, we represented the dominant histologies comprising more than 2% of cases as individual categories. In the present study, which is based on a relatively homogeneous patient population, the BBN model structure revealed that the specific histologic diagnosis is a fourth-degree associate of DSS, and that knowledge of the size category (first-degree associate) or depth of the primary tumor (second-degree associate) renders DSS independent of the individual histologic diagnosis. Stojadinovic et al. previously described the relationship between LR, including the disease-free interval, and DSS.21 In the present study, by examining the prior distributions within the BBN model, we observed not only an association between LR and a higher probability of death from disease (22.7% vs 27.0%), but also that patients who experienced an LR within 18 months were twice as likely to die from disease as those who experienced an LR after 18 months (63.6% vs 31.6%).

Preoperative chemotherapy was identified as a second-degree associate of DSS in this patient population. This is an important observation considering that only 27.8% of patients received adjuvant chemotherapy and 12.4% received neoadjuvant chemotherapy. Although it varies between treating oncologists, preoperative chemotherapy is typically offered at our institution for larger tumors (>7.5 cm) in which the risk of metastatic disease is believed to be higher. As such, this relationship may represent an institutional selection bias. Evaluation in other independent data sets is thus needed to confirm this observation.

We believe that a Bayesian classifier is well suited for analyzing outcomes in patients with STS for a variety of reasons. First, we have previously shown that there are verifiable relationships between features in the setting of patients with extremity STS5–7, 20, 22 and that the nature of and time to LR influence these relationships.23 Studying the effect(s) of one variable at a time while holding all others constant is generally not possible in clinical research, nor does it adequately represent conditional interdependence. The Bayesian method not only generates a joint distribution function describing these relationships, but it also displays them graphically in an intuitive, transparent, and comprehensive manner. For the purpose of the current study, this is considered a major advantage over other popular machine learning approaches such as Artificial Neural Networks (ANNs), which do not represent data graphically. The BBN method also depicts important relationships within one cohesive model (as opposed to a series of nomograms), which allows the clinician to better understand the hierarchy and relative importance of each feature as it pertains to an individual clinical scenario. Second, Bayesian networks can account for uncertainty within the data9, 10 and can thus be used effectively in the setting of reasonable amounts of incomplete or missing input data. This is a significant advantage over traditional nomograms and ANNs, which require that input data be complete. As we have recommended with nomograms, clinicians may first conceptualize the model structure, then compute personalized estimations of outcomes for a given clinical scenario. As an analytical tool, Bayes’ theorem is inductive, updating beliefs (H) in response to evidence (e), expressed as P(H | e), which is something clinicians do every day. Thus, like other modeling techniques, Bayesian models can be updated (improved) from time to time as new evidence presents itself, be it emerging patterns of disease, new prognostic variables, or more effective treatment modalities. We acknowledge that prospectively collected data are ideally suited for this purpose; however, to establish the basis of this analysis in this patient population and improve the ability for other populations to be treated similarly, we encourage participation in multi-institutional prospective trials.
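The belief-updating idea, P(H | e), can be sketched for a binary hypothesis as follows. The function name and all numbers are hypothetical and serve only to illustrate how a posterior from one piece of evidence becomes the prior for the next; the second update assumes the two findings are conditionally independent given H.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' theorem for a binary hypothesis H and one piece of evidence e.

    P(H | e) = P(e | H) P(H) / [P(e | H) P(H) + P(e | not H) (1 - P(H))]
    """
    numerator = likelihood_if_true * prior
    evidence = numerator + likelihood_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers for illustration only (not estimates from the study).
belief = 0.25                           # prior P(H)
belief = posterior(belief, 0.60, 0.20)  # update on a first finding
after_first = belief
belief = posterior(belief, 0.80, 0.40)  # update on a second finding
after_second = belief
```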

This study has several limitations. First, the BBN model was developed using only patients with localized high-grade STS of the extremity resected with negative margins. Thus, it is not applicable to all patients with STS, particularly those with low-grade lesions or those who have positive margins. Our inclusion of institution-specific features, such as “SURGEON,” “SERVICE,” and “REFERRING ZIP CODE,” further limits the model’s applicability; these features were included simply as a description of our current patient population. Second, like other machine-learning techniques, Bayesian methods have imperfections, especially when trained using censored observations.24 Other methods such as proportional hazard regression25 may better describe the likelihood of disease-specific death in the presence of censored data. However, graphically representing the hierarchical relationships between features (the purpose of this study) is not possible with this method. We recognize this limitation and, in lieu of developing a prognostic model, performed an initial, descriptive Bayesian analysis designed to compare our results to previously reported data. The relationships described by the current model are similar to those previously reported, and cross-validation, which was performed as a general metric of overall robustness, yielded encouraging results. Nevertheless, the effect of censoring on Bayesian analysis is the subject of continued research, and future studies that include broader populations of patients with extremity sarcomas are under way to better define the role of Bayesian modeling in these patients. Next, feature selection was performed prior to cross-validation, which could theoretically result in overstatement of model robustness. As such, adoption of the model structure depends on similarities between it and BBN models developed in other, independent datasets. Finally, this cohort is from a highly selected, relatively homogeneous referral population, and the generalizability of subsequent Bayesian models depends on their performance in a variety of centers with differing institutional biases and treatment philosophies. Other variables that are potentially important, such as compartmentalization,26 were not available; these variables may be as important as other variables included in this model. Thus, development and validation of future models in other, more diverse patient populations is required and has already been planned.

The present analysis demonstrates that Bayesian modeling, an inductive statistical method, can be used to illustrate the complex hierarchical relationships between features and can thus function as an adjunct to existing nomograms. Controlling for tumor grade and resection margin status, as we did in this study, helps to easily visualize the relative importance of time to LR, primary tumor size, histology, and other features with respect to DSS. Perhaps most importantly, this type of modeling can provide insights into the interrelationships between various prognostic features that can then be explored further either by more restrictive Bayesian models or by defined population nomograms.

Acknowledgments

We thank Lionel Santibáñez for his superb editorial assistance, Mithat Gönen, PhD, for his timely statistical advice, and Nicole Moraco for helpful data assembly.

Financial support: U.S. National Cancer Institute (Soft Tissue Sarcoma Program Project Grant No. P01 CA047179) and The Maynard Limb Preservation Fund.

Footnotes

Disclosures: There are no potential conflicts of interest.

REFERENCES

1. Collin C, Hajdu SI, Godbold J, Shiu MH, Hilaris BI, Brennan MF. Localized, operable soft tissue sarcoma of the lower extremity. Arch Surg. 1986;121:1425–1433. doi: 10.1001/archsurg.1986.01400120075013.
2. Collin C, Hajdu SI, Godbold J, Friedrich C, Brennan MF. Localized operable soft tissue sarcoma of the upper extremity. Presentation, management, and factors affecting local recurrence in 108 patients. Ann Surg. 1987;205:331–339. doi: 10.1097/00000658-198704000-00001.
3. Collin CF, Friedrich C, Godbold J, Hajdu S, Brennan MF. Prognostic factors for local recurrence and survival in patients with localized extremity soft-tissue sarcoma. Semin Surg Oncol. 1988;4:30–37. doi: 10.1002/ssu.2980040108.
4. Torosian MH, Friedrich C, Godbold J, Hajdu SI, Brennan MF. Soft-tissue sarcoma: Initial characteristics and prognostic factors in patients with and without metastatic disease. Semin Surg Oncol. 1988;4:13–19. doi: 10.1002/ssu.2980040105.
5. Pisters PW, Leung DH, Woodruff J, Shi W, Brennan MF. Analysis of prognostic factors in 1,041 patients with localized soft tissue sarcomas of the extremities. J Clin Oncol. 1996;14:1679–1689. doi: 10.1200/JCO.1996.14.5.1679.
6. Kattan MW, Leung DH, Brennan MF. Postoperative nomogram for 12-year sarcoma-specific death. J Clin Oncol. 2002;20:791–796. doi: 10.1200/JCO.2002.20.3.791.
7. Dalal KM, Kattan MW, Antonescu CR, Brennan MF, Singer S. Subtype specific prognostic nomogram for patients with primary liposarcoma of the retroperitoneum, extremity, or trunk. Ann Surg. 2006;244:381–391. doi: 10.1097/01.sla.0000234795.98607.00.
8. Canter RJ, Martinez SR, Tamurian RM, Wilton M, Li CS, Ryu J, Mak W, Monsky WL, Borys D. Radiographic and histologic response to neoadjuvant radiotherapy in patients with soft tissue sarcoma. Ann Surg Oncol. 2010;17:2578–2584. doi: 10.1245/s10434-010-1156-3.
9. Pearl J. Bayesian Inference. In: Morgan M, editor. Probabilistic reasoning in intelligent systems: networks of plausible inference. San Mateo, CA: Morgan Kaufmann; 1988. pp. 29–73.
10. Rubin DB, Schenker N. Multiple imputation in health-care databases: An overview and some applications. Stat Med. 1991;10:585–598. doi: 10.1002/sim.4780100410.
11. Burnside ES, Rubin DL, Fine JP, Shachter RD, Sisney GA, Leung WK. Bayesian network to predict breast cancer risk of mammographic microcalcifications and reduce number of benign biopsy results: initial experience. Radiology. 2006;240:666–673. doi: 10.1148/radiol.2403051096.
12. Stojadinovic A, Peoples GE, Libutti SK, Henry LR, Eberhardt J, Howard RS, Gur D, Elster EA, Nissan A. Development of a clinical decision model for thyroid nodules. BMC Surgery. 2009;9:12–23. doi: 10.1186/1471-2482-9-12.
13. Forsberg JA, Eberhardt J, Boland PJ, Wedin… R. Estimating survival in patients with operable skeletal metastases: An application of a bayesian belief network. PLoS ONE. 2011. doi: 10.1371/journal.pone.0019956.
14. Adamina M, Tomlinson G, Guller U. Bayesian statistics in oncology: A guide for the clinical investigator. Cancer. 2009;115:5371–5381. doi: 10.1002/cncr.24628.
15. Maskery SM, Hu H, Hooke J, Shriver C, Liebman MN. A bayesian derived network of breast pathology co-occurrence. J Biomed Inform. 2008;41:242–250. doi: 10.1016/j.jbi.2007.12.005.
16. Peterson LT, Ford EW, Eberhardt J, Huerta TR, Menachemi N. Assessing differences between physicians’ realized and anticipated gains from electronic health record adoption. J Med Syst. 2011;35:151–161. doi: 10.1007/s10916-009-9352-z.
17. Pearl J. Taxonomic hierarchies, continuous variables and uncertain probabilities. In: Morgan M, editor. Probabilistic reasoning in intelligent systems: networks of plausible inference. 1988. pp. 333–375.
18. Jensen FV. An Introduction to Bayesian Networks. New York: Springer-Verlag; 1996. Building models; pp. 33–68.
19. Canter RJ, Qin LX, Maki RG, Brennan MF, Ladanyi M, Singer S. A synovial sarcoma-specific preoperative nomogram supports a survival benefit to ifosfamide-based chemotherapy and improves risk stratification for patients. Clin Cancer Res. 2008;14:8191–8197. doi: 10.1158/1078-0432.CCR-08-0843.
20. Canter RJ, Beal S, Borys D, Martinez SR, Bold RJ, Robbins AS. Interaction of histologic subtype and histologic grade in predicting survival for soft-tissue sarcomas. J Am Coll Surg. 2010;210:191.e2–198.e2. doi: 10.1016/j.jamcollsurg.2009.10.007.
21. Stojadinovic A, Leung DH, Allen P, Lewis JJ, Jaques DP, Brennan MF. Primary adult soft tissue sarcoma: Time-dependent influence of prognostic variables. J Clin Oncol. 2002;20:4344–4352. doi: 10.1200/JCO.2002.07.154.
22. Kattan MW, Heller G, Brennan MF. A competing-risks nomogram for sarcoma-specific death following local recurrence. Stat Med. 2003;22:3515–3525. doi: 10.1002/sim.1574.
23. Eilber FC, Brennan MF, Riedel E, Alektiar KM, Antonescu CR, Singer S. Prognostic factors for survival in patients with locally recurrent extremity soft tissue sarcomas. Ann Surg Oncol. 2005;12:228–236. doi: 10.1245/ASO.2005.03.045.
24. Stajduhar I, Dalbelo-Basic B, Bogunovic N. Impact of censoring on learning bayesian networks in survival modelling. Artif Intell Med. 2009;47:199–217. doi: 10.1016/j.artmed.2009.08.001.
25. Cox DR. Regression models and life-tables. J Roy Stat Soc Ser B Methodological. 1972;34:187–220.
26. Wunder JS, Healey JH, Davis AM, Brennan MF. A comparison of staging systems for localized extremity soft tissue sarcoma. Cancer. 2000;88:2721–2730. doi: 10.1002/1097-0142(20000615)88:12<2721::aid-cncr10>3.0.co;2-d.
