Prediction of Pathological Stage in Patients with Prostate Cancer: A Neuro-Fuzzy Model

Georgina Cosma; Giovanni Acampora; David Brown; Robert C Rees; Masood Khan; A Graham Pockley

PLoS One. 2016 Jun 3;11(6):e0155856. doi:10.1371/journal.pone.0155856


Editor: Daotai Nie

PMCID: PMC4892614 PMID: 27258119

Abstract

The prediction of cancer staging in prostate cancer is a process for estimating the likelihood that the cancer has spread before treatment is given to the patient. Although staging is important for determining the most suitable treatment and optimal management strategy for patients, it continues to present significant challenges to clinicians. Clinical test results such as the pre-treatment Prostate-Specific Antigen (PSA) level, the most common tumor pattern (Primary Gleason pattern) and the second most common tumor pattern (Secondary Gleason pattern) in tissue biopsies, and the clinical T stage can be used by clinicians to predict the pathological stage of cancer. However, not every patient will return abnormal results in all tests, and this significantly limits the capacity to predict the stage of prostate cancer effectively. Herein we have developed a neuro-fuzzy computational intelligence model for classifying and predicting the likelihood of a patient having Organ-Confined Disease (OCD) or Extra-Prostatic Disease (ED) using a prostate cancer patient dataset obtained from The Cancer Genome Atlas (TCGA) Research Network. The system input consisted of the following variables: Primary and Secondary Gleason biopsy patterns, PSA levels, age at diagnosis, and clinical T stage. The performance of the neuro-fuzzy system was compared to other computational intelligence based approaches, namely the Artificial Neural Network, Fuzzy C-Means, Support Vector Machine, and Naive Bayes classifiers, and also to the AJCC pTNM Staging Nomogram which is commonly used by clinicians. A comparison of the optimal Receiver Operating Characteristic (ROC) points identified using these approaches revealed that the neuro-fuzzy system, at its optimal point, returns the largest Area Under the ROC Curve (AUC), with a low false positive rate (FPR = 0.274, TPR = 0.789, AUC = 0.812). The proposed approach is also an improvement over the AJCC pTNM Staging Nomogram (FPR = 0.032, TPR = 0.197, AUC = 0.582).

Introduction

Cancer staging prediction is a process for estimating the likelihood that the disease has spread before treatment is given to the patient. Cancer staging evaluation occurs before (i.e. at the prognosis stage) and after (i.e. at the diagnosis stage) the tumor is removed, giving the clinical and pathological stages respectively. The clinical stage evaluation is based on data from clinical tests that are available prior to treatment or the surgical removal of the tumor. There are three primary clinical stage tests for prostate cancer: the Prostate Specific Antigen (PSA) test, which measures the level of PSA in the bloodstream; a biopsy, which is used to detect the presence of cancer in the prostate and to evaluate the degree of cancer aggressiveness (results are usually given in the form of the Primary and Secondary Gleason patterns); and a physical examination, namely the Digital Rectal Examination (DRE), which can determine the existence of disease and possibly provide sufficient information to predict the stage of the cancer. A limitation of the PSA test is that abnormally high PSA levels do not necessarily indicate the presence of prostate cancer, nor do normal PSA levels necessarily reflect its absence. Pathological staging can be determined following surgery and the examination of the removed tumor tissue, and is likely to be more accurate than clinical staging, as it allows a direct insight into the extent and nature of the disease. More information on the clinical tests is provided in the Medical Background section.

Given the potential prognostic power of the clinical tests, a variety of prostate cancer staging prediction systems have been developed. The ability to predict the pathological stage of a patient with prostate cancer is important, as it enables clinicians to better determine the optimal treatment and management strategies. This is of considerable benefit to the patient, as many of the therapeutic options can be associated with significant short- and long-term side effects. For example, radical prostatectomy (RP), the surgical removal of the prostate gland, offers the best chance of curing the disease when prostate cancer is localised, and the accurate prediction of pathological stage is fundamental to determining which patients would benefit most from this approach [1–3]. Currently, clinicians use nomograms, which are based on statistical methods such as logistic regression, to predict prognostic clinical outcomes for prostate cancer [4]. However, cancer staging continues to present significant challenges to the clinical community.

The prostate cancer staging nomograms used to predict the pathological stage of the cancer are based on results from the clinical tests. However, the accuracy of the nomograms is debatable [5, 6]. Briganti et al. [5] argue that nomograms are accurate tools and that “Personalized medicine recognizes the need for adjustments, according to disease and host characteristics. It is time to embrace the same attitude in other disciplines of medicine. This includes urologic oncology where nomograms, regression-trees, lookup tables and neural networks represent the key tools capable of providing individualized predictions”. In contrast, Dr Joniau [5] argues that the data used for devising the nomograms are subjective and, to a certain extent, biased by the institutional protocols that determine which patients are selected for a given treatment. Dr Joniau states that one drawback is that many different nomograms have been devised for risk estimation, and it is difficult to determine which nomogram will provide the most reliable risk estimate for a particular patient. He emphasises that although nomograms allow more accurate risk assessment, this risk estimation is a “snapshot in a risk continuum”. Although this might allow personalized predictions, it also makes treatment decisions difficult [5].

Cancer prediction systems that consider several variables when predicting an outcome require computational intelligence methods to produce efficient predictions [7]. Although computational intelligence approaches have been used to predict prostate cancer outcomes, very few models for predicting the pathological stage of prostate cancer exist. In essence, classification models based on computational intelligence are used for prediction tasks. Classification is a form of data analysis which extracts models describing data classes and uses these models to predict categorical labels (classes) or numeric values [8]. When the model is used to predict a numeric value, as opposed to a class label, it is referred to as a predictor. Classification and numeric prediction are both types of prediction problems [8], and classification models are widely adopted in the medical setting to analyse patient data and extract prediction models.

Computational intelligence approaches, and in particular fuzzy-based approaches, are based on mathematical models specially developed for dealing with the uncertainty and imprecision typically found in the clinical data used for the prognosis and diagnosis of disease. These characteristics make these algorithms a suitable platform on which to base new strategies for diagnosing and staging prostate cancer. For example, not everyone diagnosed with prostate cancer will exhibit abnormal results in all tests; as a consequence, different combinations of test results can lead to the same outcome.

The capacity of fuzzy, and especially neuro-fuzzy, approaches to predict the pathological stage of prostate cancer has not been as widely evaluated as that of the more commonly used Artificial Neural Network (ANN) and other approaches. However, fuzzy approaches have been applied to other prostate cancer scenarios. Benechi et al. [9] have applied the Co-Active Neuro-Fuzzy Inference System (CANFIS) to predict the presence of prostate cancer; Keles et al. [10] proposed a neuro-fuzzy system for predicting whether an individual has cancer or Benign Prostatic Hyperplasia (BPH, a benign enlargement of the prostate). Çinar [11] designed a classifier-based expert system for the early diagnosis of prostate cancer, thereby aiding the decision-making process and informing the need for a biopsy. Castanho et al. [12] developed a genetic-fuzzy expert system which combines pre-operative serum PSA, clinical stage, and the Gleason grade of a biopsy to predict the pathological stage of prostate cancer (i.e. whether the cancer was organ-confined or not).

Saritas et al. [13] devised an ANN approach for the prognosis of cancer which can be used to assist clinical decisions relating to the necessity for a biopsy. Shariat et al. [14] have performed a critical review of prostate cancer prediction tools and concluded that predictive tools can help during the complex decision-making processes, and that they can provide individualised, evidence-based estimates of disease status in patients with prostate cancer.

Finally, Tsao et al. [15] developed an ANN model to predict the pathological stage of prostate cancer in 299 patients prior to radical prostatectomy, and found that the ANN model predicted Organ-Confined Disease better than a Logistic Regression model. Tsao et al. [15] also compared their ANN model with the Partin Tables, and found that the ANN model predicted the pathological stage of prostate cancer more accurately.

Herein we propose a neuro-fuzzy model for predicting the pathological stage of prostate cancer. The system inputs comprise the following variables: the most common tumor pattern (Primary Gleason pattern), the second most common pattern (Secondary Gleason pattern), PSA levels, age at diagnosis, and clinical T stage. The neuro-fuzzy model automatically constructs fuzzy rules through a training process applied to existing patient records with known status. These rules are then used to predict the prostate cancer stage of patients in a validation set. The model uses the Adaptive Neuro-Fuzzy Inference System (ANFIS), which is also used to optimise predictive performance. The output for each patient record is a numerical prediction of the ‘degree of belongingness’ of the patient to the Organ-Confined Disease and Extra-Prostatic Disease classes.
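
The following minimal Python sketch illustrates how fuzzy rules can turn crisp inputs into a numerical degree of belongingness. It evaluates a first-order Takagi-Sugeno fuzzy model, the rule structure whose parameters ANFIS tunes; it is not the authors' trained model. The two inputs, the Gaussian membership functions, the rule parameters in the hypothetical RULES list and the 0.5 decision threshold are all assumptions, and the ANFIS training step (which adjusts these parameters from patient data) is omitted.

```python
import math


def gauss_mf(x, centre, sigma):
    """Gaussian membership degree of x in a fuzzy set centred at `centre`."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def sugeno_predict(x, rules):
    """Forward pass of a first-order Takagi-Sugeno model (the structure ANFIS tunes).

    x     : list of crisp input values.
    rules : list of (antecedents, coeffs) pairs. `antecedents` holds one
            (centre, sigma) pair per input; `coeffs` is [p_1, ..., p_n, r] for
            the linear consequent f = p_1*x_1 + ... + p_n*x_n + r.
    """
    strengths, outputs = [], []
    for antecedents, coeffs in rules:
        # Layers 1-2: fuzzify each input and combine with the product (AND) operator.
        w = 1.0
        for xi, (centre, sigma) in zip(x, antecedents):
            w *= gauss_mf(xi, centre, sigma)
        # Layer 4: linear consequent evaluated at the inputs.
        f = sum(p * xi for p, xi in zip(coeffs[:-1], x)) + coeffs[-1]
        strengths.append(w)
        outputs.append(f)
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fired for this input")
    # Layers 3 and 5: normalise the firing strengths and take the weighted sum.
    return sum(w * f for w, f in zip(strengths, outputs)) / total


# Hypothetical rules over (primary Gleason pattern, PSA group):
# "low Gleason AND low PSA -> 0 (OCD)", "high Gleason AND high PSA -> 1 (ED)".
RULES = [
    ([(3.0, 0.5), (2.0, 1.0)], [0.0, 0.0, 0.0]),
    ([(5.0, 0.5), (6.0, 1.0)], [0.0, 0.0, 1.0]),
]

score = sugeno_predict([4.0, 4.0], RULES)
label = "ED" if score >= 0.5 else "OCD"  # hypothetical 0.5 threshold
```

An input halfway between the two rule prototypes, such as (4, 4), fires both rules equally and scores 0.5; inputs close to one prototype score near 0 or near 1.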

Medical Background

This section describes the clinical variables used for diagnosis and staging.

Prostate Specific Antigen (PSA)

The Prostate Specific Antigen (PSA) test is a blood test that measures the level of prostate-specific antigen in the bloodstream. Despite its limitations, the PSA test is currently the best method for identifying an increased risk of localised prostate cancer. PSA values tend to rise with age, and the age-specific total PSA levels (ng/ml) recommended as raised by the Prostate Cancer Risk Management Programme are as follows [16]: 50–59 years, PSA ≥ 3.0; 60–69 years, PSA ≥ 4.0; and 70 years and over, PSA > 5.0. A raised PSA level may, but does not necessarily, indicate the presence of prostate cancer. The European Randomized Study of Screening for Prostate Cancer revealed that screening significantly reduces death from prostate cancer, and that a man who undergoes PSA testing has his risk of dying from prostate cancer reduced by 29% [17–19]. However, it should also be noted that a normal PSA test does not necessarily exclude the presence of prostate cancer.
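
The age-specific thresholds quoted above can be written as a small lookup. This sketch transcribes only the values given in the text; the function name is hypothetical, ages below 50 return None because no threshold is given for them, and a raised value is a prompt for further investigation rather than a diagnosis.

```python
def psa_raised(psa_ng_ml, age_years):
    """Return True if PSA is at or above the age-specific level quoted in the text [16].

    Returns None for men under 50, for whom the text gives no threshold.
    """
    if age_years < 50:
        return None
    if age_years < 60:
        return psa_ng_ml >= 3.0   # 50-59 years
    if age_years < 70:
        return psa_ng_ml >= 4.0   # 60-69 years
    return psa_ng_ml > 5.0        # 70 and over: the text uses a strict inequality
```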

Primary and Secondary Gleason Patterns

A tissue sample (biopsy) is used to detect the presence of cancer in the prostate and to evaluate its aggressiveness. The results from a prostate biopsy are usually reported as the Gleason score. For each biopsy sample, pathologists identify the most common tumor pattern (Primary Gleason pattern) and the second most common pattern (Secondary Gleason pattern), and each pattern is given a grade of 3 to 5. These grades are then combined to create the Gleason score (a number ranging from 6 to 10), which describes how abnormal the glandular architecture appears under a microscope. For example, if the most common tumor pattern is grade 3 and the next most common tumor pattern is grade 4, the Gleason score is 3 + 4, or 7. A score of 6 is regarded as low-risk disease, as it poses little danger of becoming aggressive, and a score of 3 + 4 = 7 indicates intermediate risk. Because the first number represents the majority of abnormal tissue in the biopsy sample, a 3 + 4 is considered less aggressive than a 4 + 3. Scores of 4 + 3 = 7, or 8 to 10, indicate increasingly abnormal glandular architecture and are associated with high-risk disease which is likely to be aggressive.
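
The way the two biopsy patterns combine into a score and a risk band, as described above, is sketched below. The function names are hypothetical, and rejecting patterns outside 3 to 5 is a design choice of this sketch.

```python
def gleason_score(primary, secondary):
    """Combine the primary and secondary Gleason patterns (each graded 3-5)."""
    for pattern in (primary, secondary):
        if pattern not in (3, 4, 5):
            raise ValueError("biopsy Gleason patterns are graded 3 to 5")
    return primary + secondary


def gleason_risk(primary, secondary):
    """Risk band following the description in the text."""
    score = gleason_score(primary, secondary)
    if score == 6:
        return "low"
    if score == 7 and primary == 3:
        return "intermediate"  # 3 + 4: the less aggressive pattern predominates
    return "high"              # 4 + 3 = 7, or a score of 8 to 10
```

Note that 3 + 4 and 4 + 3 share the score 7 but fall into different bands, which is why the function takes both patterns rather than the summed score.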

Clinical and Pathological Stages

The clinical stage is an estimate of the prostate cancer stage based on the results of the digital rectal examination (DRE). The pathological stage can only be determined after a patient has had surgery, and is based on the examination of the removed tissue. Pathological staging is likely to be more accurate than clinical staging, as it can provide a direct insight into the extent of the disease. At the clinical stage, there are four categories for describing the local extent of a prostate tumor (T1 to T4). Clinical and pathological staging use the same categories, except that the T1 category is not used for pathological staging. In summary, stages T1 and T2 describe a cancer that is probably organ-confined, T3 describes cancer which is beginning to spread outside the prostate, and T4 describes a cancer that has likely begun to spread to nearby organs. A tumor is classified as T1 when it cannot be felt during the DRE or seen with imaging such as transrectal ultrasound (TRUS). Category T1 has three subcategories: in T1a, cancer is found incidentally during a transurethral resection of the prostate (TURP) performed to treat Benign Prostatic Hyperplasia, and is present in no more than 5% of the tissue removed; in T1b, cancer is found during a TURP but is present in more than 5% of the tissue removed; and in T1c, cancer is found in a needle biopsy performed because of an elevated PSA level. A tumor is classified as T2 when it can be felt during a DRE or seen with imaging, but still appears to be confined to the prostate gland. Category T2 has three subcategories: in T2a, cancer is in one half or less of only one side (left or right) of the prostate; in T2b, cancer is in more than half of only one side; and in T2c, cancer is in both sides of the prostate.
Category T3 has two subcategories: in T3a, cancer extends outside the prostate but not to the seminal vesicles; and in T3b, cancer has spread to the seminal vesicles. Finally, in category T4, cancer has grown into tissues next to the prostate (other than the seminal vesicles), such as the urethral sphincter, the rectum, the bladder, and/or the wall of the pelvis.
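
The T-category descriptions above can be captured in a small parser. This is a hypothetical helper, not part of the authors' system: it accepts an optional "p" prefix for pathological stages, checks the subcategories listed above (a to c for T1 and T2, a or b for T3, none for T4), and rejects pT1 because T1 is not used for pathological staging.

```python
import re

# Extent of disease summarised in the text for each main T category.
T_EXTENT = {
    "T1": "probably organ-confined; not palpable on DRE or visible on imaging",
    "T2": "probably organ-confined; palpable or visible but within the gland",
    "T3": "beginning to spread outside the prostate",
    "T4": "likely to have spread to nearby organs",
}
# Subcategories listed in the text for each main category.
VALID_SUBCATEGORIES = {"T1": "abc", "T2": "abc", "T3": "ab", "T4": ""}


def parse_t_stage(stage):
    """Split e.g. 'T2b' or 'pT3a' into (main category, subcategory or None, extent)."""
    text = stage.strip()
    match = re.fullmatch(r"(p?)(T[1-4])([abc]?)", text)
    if match is None:
        raise ValueError(f"unrecognised T stage: {stage!r}")
    prefix, main, sub = match.groups()
    if sub and sub not in VALID_SUBCATEGORIES[main]:
        raise ValueError(f"{main} has no subcategory {sub!r}")
    if prefix == "p" and main == "T1":
        raise ValueError("T1 is not used for pathological staging")
    return main, sub or None, T_EXTENT[main]
```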

The TNM system is the most widely used system for prostate cancer staging and describes:

the extent of the primary tumor (T stage),
the absence or presence of regional lymph node involvement (N stage), and
the absence or presence of distant metastases (M stage).

The TNM system has been accepted by the Union for International Cancer Control (UICC) and the American Joint Committee on Cancer (AJCC), and most medical facilities use it as their main method for cancer reporting. The clinical and pathological TNM categories are provided in Tables 1 and 2 respectively. Once the T, N, and M stages are determined, the anatomic stage of the cancer can be assigned using the groupings shown in Table 3: a stage of I, II, III, or IV is assigned to the patient, with stage I being early and stage IV being advanced disease [20]. Stages I and II are organ-confined cancer stages, whereas Stages III and IV are extra-prostatic stages. The TNM system has gone through several refinements in order to “improve the uniformity of patient evaluation and to maintain a clinically relevant evaluation” [20]. In the most recent edition of the AJCC staging manual [21], the Gleason score and PSA have been incorporated into the anatomic stage/prognostic groups (Table 3).

Table 1. Definitions of clinical TNM according to AJCC 2010 [21].

Primary tumor (T)
TX	Primary tumor cannot be assessed
T0	No evidence of primary tumor
Clinically inapparent tumor neither palpable nor visible by imaging (T1)
T1a	Tumor incidental histologic finding in ≤ 5% of tissue resected
T1b	Tumor incidental histologic finding in > 5% of tissue resected
T1c	Tumor identified by needle biopsy (e.g. because of elevated PSA)
Tumor confined within prostate (T2)
T2a	Tumor involves one-half of one lobe or less
T2b	Tumor involves more than one-half of one lobe but not both lobes
T2c	Tumor involves both lobes
Tumor extends through the prostate capsule (T3)
T3a	Extracapsular extension (unilateral or bilateral)
T3b	Tumor invades seminal vesicle(s)
T4	Tumor is fixed or invades adjacent structures other than seminal vesicles such as external sphincter, rectum, bladder, levator muscles, and/or pelvic wall
Regional lymph nodes (N)
NX	Regional lymph nodes were not assessed
N0	No regional lymph node metastasis
N1	Metastasis in regional lymph node(s)
Distant metastasis (M)
M0	No distant metastasis
M1	Distant metastasis
M1a	Non-regional lymph node(s)
M1b	Bone(s)
M1c	Other site(s) with or without bone disease

Table 2. Pathological TNM according to AJCC 2010 [21]. There is no pT1 classification.

Organ confined (pT2)
pT2a	Unilateral, one-half of one side or less
pT2b	Unilateral, involving more than one-half of one side, but not both sides
pT2c	Bilateral disease
Extraprostatic extension (pT3)
pT3a	Extraprostatic extension or microscopic bladder neck invasion
pT3b	Seminal vesicle invasion
pT4	Invasion of rectum, levator muscles, and/or pelvic wall
Regional lymph nodes (pN)
pNX	Regional lymph nodes not sampled
pN0	No positive regional lymph nodes
pN1	Metastasis in regional lymph node(s)
Distant metastasis (pM)
pM1	Distant metastasis
pM1a	Non-regional lymph node(s)
pM1b	Bone(s)
pM1c	Other site(s) with or without bone disease

Table 3. Anatomic stage/prognostic groups (from AJCC 2010) [21].

Group	T	N	M	PSA (ng/mL)	Gleason score (GS)
I	T1a–c	N0	M0	PSA < 10	GS ≤ 6
	T2a	N0	M0	PSA < 10	GS ≤ 6
	T1–2a	N0	M0	PSA X	GS X
IIA	T1a–c	N0	M0	PSA < 20	GS 7
	T1a–c	N0	M0	10 ≤ PSA < 20	GS ≤ 6
	T2a	N0	M0	PSA < 20	GS ≤ 7
	T2b	N0	M0	PSA < 20	GS ≤ 7
	T2b	N0	M0	PSA X	GS X
IIB	T2c	N0	M0	Any PSA	Any GS
	T1–2	N0	M0	PSA ≥ 20	Any GS
	T1–2	N0	M0	Any PSA	GS ≥ 8
III	T3a–b	N0	M0	Any PSA	Any GS
IV	T4	N0	M0	Any PSA	Any GS
	Any T	N1	M0	Any PSA	Any GS
	Any T	Any N	M1	Any PSA	Any GS
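
The grouping rules of Table 3 can be expressed as a decision function, sketched below under stated assumptions. It illustrates the table only and is not the AJCC pTNM nomogram evaluated in this study: the string codes, the requirement for a T subcategory, and returning None when only one of PSA and Gleason score is unknown (a case the table does not cover) are choices of this sketch. PSA is in ng/mL.

```python
T1_SUBSTAGES = {"T1a", "T1b", "T1c"}
T1_T2 = T1_SUBSTAGES | {"T2a", "T2b", "T2c"}


def anatomic_stage(t, n, m, psa=None, gs=None):
    """Anatomic stage/prognostic group following the rows of Table 3.

    t, n, m : category strings such as "T2a", "N0", "M0" (M1a-c also accepted).
    psa     : pre-treatment PSA in ng/mL, or None when unknown ("PSA X").
    gs      : Gleason score, or None when unknown ("GS X").
    Returns "I", "IIA", "IIB", "III", "IV", or None if no row of Table 3 applies.
    """
    # Stage IV: T4, any N1, or any M1.
    if m.startswith("M1") or n == "N1" or t == "T4":
        return "IV"
    # Stage III: T3a-b, N0, M0.
    if t in ("T3a", "T3b"):
        return "III"
    if t not in T1_T2:
        return None
    # Stage IIB rows that apply regardless of the other values.
    if t == "T2c" or (psa is not None and psa >= 20) or (gs is not None and gs >= 8):
        return "IIB"
    # Both PSA and GS unknown: T1-2a -> I, T2b -> IIA.
    if psa is None and gs is None:
        return "IIA" if t == "T2b" else "I"
    if psa is None or gs is None:
        return None  # only one of PSA and GS is unknown: not covered by Table 3
    # From here on PSA < 20 and GS <= 7.
    if t == "T2b":
        return "IIA"
    if psa < 10 and gs <= 6:
        return "I"  # T1a-c or T2a with PSA < 10 and GS <= 6
    return "IIA"


def is_organ_confined(stage):
    """Stages I and II are treated as organ-confined, III and IV as extra-prostatic."""
    return stage in ("I", "IIA", "IIB")
```

Checking the Stage IV and III rows first means the PSA and Gleason score only need to be consulted for T1 and T2 tumors, which is where Table 3 actually uses them.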

Table 4. Dataset Statistics.

Statistics of variables before categorisation
	Minimum	Maximum	Mean	Standard deviation
Primary Gleason pattern	3	5	3.54	0.60
Secondary Gleason pattern	3	5	3.74	0.69
PSA level (ng/mL)	0.70	107.00	9.84	11.25
Age at diagnosis (years)	41.10	78.00	59.88	6.92
Clinical T stage (group)	1.00	5.00	2.19	1.45
Pathological T stage (group)	1.00	2.00	1.55	0.50

Table 5. Primary and Secondary Gleason pattern groups.

Primary Gleason pattern groups	Frequency count	Proportion of patients (%)
3	205	51.4
4	173	43.4
5	21	5.3
Total	399	100.0
Secondary Gleason pattern groups	Frequency count	Proportion of patients (%)
3	159	39.8
4	185	46.4
5	55	13.8
Total	399	100.0

Table 6. PSA groups.

PSA group	PSA range	Frequency count	Proportion of patients (%)
1	0–2.5 ng/mL	16	4.01
2	2.6–4.0 ng/mL	33	8.27
3	4.1–6.0 ng/mL	124	31.08
4	6.1–9.9 ng/mL	124	31.08
5	10–19 ng/mL	67	16.79
6	≥ 20 ng/mL	35	8.77
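
PSA values can be assigned to the groups of Table 6 as sketched below. The printed ranges leave small gaps (for example between 2.5 and 2.6 ng/mL, or between 19 and 20 ng/mL); how values in those gaps were grouped is not stated, so the boundaries in this hypothetical helper are assumptions. The last lines recompute the percentages from the frequency counts (n = 399).

```python
def psa_group(psa):
    """Map a PSA value in ng/mL to the group number of Table 6.

    Assumed boundaries: groups 1-3 include their upper bound (2.5, 4.0, 6.0),
    groups 4-5 exclude theirs (10, 20), and group 6 is PSA >= 20.
    """
    if psa < 0:
        raise ValueError("PSA cannot be negative")
    if psa <= 2.5:
        return 1
    if psa <= 4.0:
        return 2
    if psa <= 6.0:
        return 3
    if psa < 10.0:
        return 4
    if psa < 20.0:
        return 5
    return 6


# Frequency counts from Table 6; the percentages follow from n = 399.
TABLE6_COUNTS = {1: 16, 2: 33, 3: 124, 4: 124, 5: 67, 6: 35}
total = sum(TABLE6_COUNTS.values())
proportions = {g: round(100 * c / total, 2) for g, c in TABLE6_COUNTS.items()}
```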

Table 7. Age groups.

Age group	Age range (years)	Frequency count	Proportion of patients (%)
1	< 25	0	0
2	25–29	0	0
3	30–34	0	0
4	35–39	0	0
5	40–44	5	1.25
6	45–49	22	5.51
7	50–54	68	17.04
8	55–59	97	24.31
9	60–64	100	25.06
10	65–69	76	19.05
11	> 70	31	7.77
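
A minimal sketch of the age grouping in Table 7 follows. Group 10 ends at 69 years, so the sketch assumes that group 11 (printed as > 70) starts at 70 and no age is left unassigned; it also assumes that fractional ages such as 69.5 stay in the band of their whole-year age. The function name is hypothetical.

```python
import math


def age_group(age_years):
    """Map age at diagnosis to the 5-year group numbers of Table 7.

    Group 1 is < 25, groups 2-10 cover 25-29 ... 65-69, and group 11 is
    taken to be >= 70 (assumption).
    """
    if age_years < 25:
        return 1
    if age_years >= 70:
        return 11
    return 2 + math.floor((age_years - 25) / 5)
```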

Table 8. Clinical T stage groups.

Clinical T group	Clinical T stage	Frequency count	Proportion of patients (%)
1	T1(a-c)	204	51.13
2	T2a	53	13.28
3	T2b	53	13.28
4	T2c	42	10.53
5	T3a	29	7.27
5	T3b	16	4.01
5	T4	2	0.50
	Total	399	100.00
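
Table 8 merges T3a, T3b and T4 into a single group 5. The hypothetical mapping below reproduces that coding, and the final lines check that the group codes, weighted by the frequency counts, give the mean (2.19) and standard deviation (1.45) reported for Clinical T in Table 4, which suggests that Table 4 summarises these codes rather than raw stage labels.

```python
from statistics import mean, stdev

# Clinical T stage -> group code used in Table 8 (T3a, T3b and T4 share group 5).
CLINICAL_T_GROUP = {
    "T1a": 1, "T1b": 1, "T1c": 1,
    "T2a": 2, "T2b": 3, "T2c": 4,
    "T3a": 5, "T3b": 5, "T4": 5,
}


def clinical_t_group(stage):
    """Return the Table 8 group code for a clinical T stage such as 'T2b'."""
    try:
        return CLINICAL_T_GROUP[stage.strip()]
    except KeyError:
        raise ValueError(f"no group defined for clinical stage {stage!r}") from None


# Frequency counts per group from Table 8 (group 5 = 29 + 16 + 2 patients).
TABLE8_COUNTS = {1: 204, 2: 53, 3: 53, 4: 42, 5: 29 + 16 + 2}
codes = [group for group, count in TABLE8_COUNTS.items() for _ in range(count)]
print(len(codes), round(mean(codes), 2), round(stdev(codes), 2))  # 399 2.19 1.45
```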

Table 9. Pathological T (pT) stage groups.

pT group	Pathological T (pT) stage	Frequency count	Proportion of patients (%)	OCD or ED
1	T2 (unknown if a or b)	1	0.25	OCD
1	T2a	14	3.51	OCD
1	T2b	47	11.78	OCD
1	T2c	117	29.32	OCD
2	T3a	142	35.59	ED
2	T3b	72	18.05	ED
2	T4	6	1.50	ED
	Total	399	100.00
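
The binary target (OCD or ED) follows from the pT stage as in Table 9. The helper below is hypothetical; summing the Table 9 counts per class recovers the group sizes used in Table 13 (179 OCD and 220 ED).

```python
def pt_label(stage):
    """Return 'OCD' for pT2 stages and 'ED' for pT3a, pT3b and pT4 (Table 9)."""
    s = stage.strip()
    if s.startswith("p"):
        s = s[1:]  # accept both 'pT2b' and 'T2b'
    if s in ("T2", "T2a", "T2b", "T2c"):
        return "OCD"
    if s in ("T3a", "T3b", "T4"):
        return "ED"
    raise ValueError(f"unexpected pathological stage {stage!r}")


# Frequency counts from Table 9 ("T2" is the record with unknown substage).
TABLE9_COUNTS = {"T2": 1, "T2a": 14, "T2b": 47, "T2c": 117,
                 "T3a": 142, "T3b": 72, "T4": 6}
class_sizes = {"OCD": 0, "ED": 0}
for stage, count in TABLE9_COUNTS.items():
    class_sizes[pt_label(stage)] += count
```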

Table 10. Before data normalisation.

Case No.	Primary Gleason Pattern	Secondary Gleason Pattern	PSA (ng/mL)	Age (years)	Clinical T stage	Pathological T (pT) stage
1	3	3	1.00	51.6	T2b	pT2a
2	3	3	1.70	77.0	T2b	pT2c
3	3	3	2.05	55.2	T2a	pT2b
4	3	3	2.09	61.1	T1c	pT2b
5	3	3	2.20	57.0	T1c	pT3a
n	…	…	…	…	…	…

Table 12. PSA levels categorised by age group.

Age group	Patient count	Mean PSA group	Standard deviation of PSA group
5	5	4.40	1.14
6	22	4.14	0.89
7	68	3.54	1.23
8	97	3.75	1.20
9	100	3.74	1.14
10	76	3.75	1.21
11	31	3.81	1.56
Total	399	3.75	1.21
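
The PSA columns here and in Table 13 appear to summarise PSA group numbers (1 to 6 in Table 6) rather than ng/mL values, since the raw PSA mean in Table 4 is 9.84 ng/mL. The sketch below checks this reading: using each patient's PSA group from the Table 6 counts as the value gives a mean of 3.75 and a standard deviation of 1.21, matching the Total row, and the count-weighted mean of the per-age-group means also rounds to 3.75.

```python
from statistics import mean, stdev

# Frequency counts of PSA groups 1-6 from Table 6.
TABLE6_COUNTS = {1: 16, 2: 33, 3: 124, 4: 124, 5: 67, 6: 35}
groups = [g for g, c in TABLE6_COUNTS.items() for _ in range(c)]
overall_mean = round(mean(groups), 2)  # compare with the Total row mean
overall_sd = round(stdev(groups), 2)   # compare with the Total row SD

# Age group -> (patient count, mean PSA value) from the table above.
TABLE12 = {5: (5, 4.40), 6: (22, 4.14), 7: (68, 3.54), 8: (97, 3.75),
           9: (100, 3.74), 10: (76, 3.75), 11: (31, 3.81)}
weighted = (sum(n * m for n, m in TABLE12.values())
            / sum(n for n, _ in TABLE12.values()))
```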

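Table 13. Mean and Standard deviation values for Organ-Confined Disease (OCD) and Extra-Prostatic Disease (ED) groups diagnosed at the Pathological stage.

Variables	OCD (n = 179)	ED (n = 220)	p
Primary Gleason pattern	3.45 ± 0.52	3.61 ± 0.64	0.005
Secondary Gleason pattern	3.65 ± 0.64	3.81 ± 0.71	0.016
Pre-treatment PSA level (group)	3.76 ± 1.26	3.74 ± 1.17	0.848
Age at diagnosis (group)	8.49 ± 1.37	8.60 ± 1.40	0.434
Clinical T stage (group)	2.01 ± 1.36	2.33 ± 1.51	0.025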
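
This section does not state which test produced the p column. As a hedged check, the sketch below compares the two groups using only the published means, standard deviations and group sizes, with Welch's standard error and a large-sample normal approximation to the t distribution (a choice of this sketch, reasonable with 399 patients). The resulting p-values are close to the reported ones; small differences are expected from the rounding of the published summaries.

```python
import math


def welch_z_pvalue(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided p-value for a difference in means, from summary statistics only.

    Uses Welch's (unpooled) standard error and a normal approximation to the
    t distribution, which is close for samples of this size.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = abs(mean1 - mean2) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


# Table 13: variable -> (OCD mean, OCD SD, ED mean, ED SD); n = 179 OCD, 220 ED.
TABLE13 = {
    "Primary Gleason pattern": (3.45, 0.52, 3.61, 0.64),
    "Secondary Gleason pattern": (3.65, 0.64, 3.81, 0.71),
    "Pre-treatment PSA level": (3.76, 1.26, 3.74, 1.17),
    "Age at diagnosis": (8.49, 1.37, 8.60, 1.40),
    "Clinical T stage": (2.01, 1.36, 2.33, 1.51),
}
p_values = {
    name: welch_z_pvalue(m1, s1, 179, m2, s2, 220)
    for name, (m1, s1, m2, s2) in TABLE13.items()
}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```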

Table 14. Performance evaluation.

	Performances based on ROC evaluation measurements
Evaluation measure	Neuro-Fuzzy (our approach)	FCM	Quadratic-SVM	ANN	GD-NB	AJCC pTNM Nomogram
Area Under the Curve (AUC)	0.812	0.809	0.738	0.699	0.750	0.582
Optimal ROC point FPR	0.274	0.403	0.242	0.303	0.274	0.032
Optimal ROC point TPR	0.789	0.901	0.718	0.701	0.775	0.197
Asymp. Sig. (McNemar's test)	1.000	0.868	0.499	1.000	1.000	0.000
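
For a classifier that outputs only hard class labels, the ROC curve reduces to the single operating point (FPR, TPR) joined to (0, 0) and (1, 1), and the area under it equals (1 + TPR - FPR) / 2, i.e. the balanced accuracy. The sketch below applies this identity to the optimal points in Table 14 (the Gaussian Naive Bayes column is labelled GB-NB in Table 14 and GD-NB in Table 16). It is an arithmetic check only; how each AUC was computed is not described in this section.

```python
def single_point_auc(fpr, tpr):
    """Area under the ROC polyline (0, 0) -> (fpr, tpr) -> (1, 1).

    Triangle under the first segment plus trapezoid under the second; this
    simplifies to (1 + tpr - fpr) / 2, the balanced accuracy (TPR + TNR) / 2.
    """
    return 0.5 * fpr * tpr + 0.5 * (1 - fpr) * (tpr + 1)


# Table 14: method -> (reported AUC, optimal-point FPR, optimal-point TPR).
TABLE14 = {
    "Neuro-Fuzzy": (0.812, 0.274, 0.789),
    "FCM": (0.809, 0.403, 0.901),
    "Quadratic-SVM": (0.738, 0.242, 0.718),
    "ANN": (0.699, 0.303, 0.701),
    "Naive Bayes (Gaussian)": (0.750, 0.274, 0.775),
    "AJCC pTNM Nomogram": (0.582, 0.032, 0.197),
}
for method, (auc, fpr, tpr) in TABLE14.items():
    print(f"{method}: reported AUC {auc:.3f}, "
          f"single-point AUC {single_point_auc(fpr, tpr):.3f}")
```

For the Quadratic-SVM, ANN, Naive Bayes and nomogram columns, the single-point value agrees with the reported AUC to within rounding (the same holds for the kernel and density variants in Tables 15 and 16), whereas the neuro-fuzzy and FCM AUCs are higher than their single-point values, which is consistent with their graded outputs being evaluated over many thresholds.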

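Table 15. Support Vector Machine (SVM) performance evaluation when applying various kernel functions.

	Kernel Function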
Evaluation Measure	Linear	Quadratic	GRB	MP
Specificity (TNR)	0.758	0.758	0.661	0.597
Sensitivity (TPR)	0.704	0.718	0.747	0.690
Area Under the Curve	0.731	0.738	0.704	0.644
Optimal ROC point FPR	0.242	0.242	0.339	0.403
Optimal ROC point TPR	0.704	0.718	0.747	0.690

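Table 16. Naive Bayes (NB) performance evaluation using the Gaussian distribution and Kernel Density Estimation functions.

	Type of function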
Evaluation Measure	GD-NB	KDE-NB
Specificity (TNR)	0.726	0.645
Sensitivity (TPR)	0.745	0.747
Area Under the Curve	0.750	0.696
Optimal ROC point FPR	0.274	0.355
Optimal ROC point TPR	0.775	0.747


Abstract

Introduction

Medical Background

Prostate Specific Antigen (PSA)

Primary and Secondary Gleason Patterns

Clinical and Pathological Stages

Table 1. Definitions of clinical TNM according to AJCC 2010 [21].

Table 2. Pathological TNM according to AJCC 2010 [21]. There is no pT1 classification.

Table 3. Anatomic stage/prognostic groups (from AJCC 2010) [21].

Methods I—Neuro-Fuzzy Model

Fig 1. Neuro-Fuzzy Prostate Cancer Pathological Stage Predictor.

System Inputs

Data Normalisation

Fuzzy C-Means

Sugeno-Yasukawa Method

Adaptive Neuro-Fuzzy Inference System

Neuro-Fuzzy Predictor

Methods II—Other Computational Intelligence Approaches

Artificial Neural Network Classifier

Naive Bayes Classifier

Support Vector Machine Classifier

Results I: Dataset Analysis

Dataset Description

Table 4. Dataset Statistics.

Table 5. Primary and Secondary Gleason pattern groups.

Table 6. PSA groups.

Table 7. Age groups.

Fig 2. Histogram of grouped PSA values.

Table 8. Clinical T stage groups.

Table 9. Pathological T (pT) stage groups.

Table 10. Before data normalisation.

Table 11. After data normalisation.

Age at Diagnosis and its Association with PSA Values

Fig 3. Histogram of grouped age values.

Table 12. PSA levels categorised by age group.

Analysis of the Clinical T Stage Values

Analysis of the Pathological T (pT) Stage Values

Table 13. Mean and Standard deviation values for Organ-Confined Disease (OCD) and Extra-Prostatic Disease (ED) groups diagnosed at the Pathological stage.

Results II: Pathological Stage Prediction Using the Neuro-Fuzzy Model

Experiment Methodology

System Inputs

Fig 4. Neuro-Fuzzy System Membership Functions: Gleason 1 is Primary Gleason Pattern; Gleason 2 is Secondary Gleason pattern; PSA is Prostate Specific Antigen; Age represents the Age group; and clinical T stage is the result of the Digital Rectal Examination.

Performance Evaluation Measures

Comparison of the Neuro-Fuzzy Model with Other Methods

Table 14. Performance evaluation.

Fig 5. Performance Comparison.

Fig 6. ROC Curves: Performance Comparison.

Table 15. Support Vector Machine (SVM) performance evaluation when applying various kernel functions.

Table 16. Naive Bayes (NB) performance evaluation using the Gaussian distribution and Kernel Density Estimation functions.

Discussion and Conclusion

Data Availability

Funding Statement

References

Associated Data

Data Availability Statement
