Input: $D = \{(x_i, Y_i)\}_{i=1}^{N}$ is the multi-label data set, where $q$ is the total number of labels. NSCLC has four stages, so $q = 4$; the labels to be trained are $L = \{l_1, l_2, l_3, l_4\}$; the number of iterations is $T$; and the size of the training data block for each iteration is $m$. Output: the prediction model $F$.

Step 1: For any label $l_j \in L$, the co-occurrence frequency $C(l_j, l_k)$ of the small-sample pathological stage label $l_j$ with each other label $l_k$ is calculated, as shown in Equation (4). The parameters of the model trained on the large-sample case dataset are then selected and saved according to the label $l_k$ with the maximum co-occurrence value, and they serve as the initialization of the small-sample pathological stage prediction model. Here $\delta(\cdot)$ is a binary function over the case samples of NSCLC patients: its value is 1 if $l_j$ and $l_k$ appear in the same case sample, and 0 otherwise, as shown in Equation (5).
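A minimal sketch of the co-occurrence computation in Step 1 is given below, assuming each case's stage labels are stored as a Python set; the function names `co_occurrence` and `best_source_label` are illustrative, not the paper's.

```python
def co_occurrence(label_sets, label_j, label_k):
    """Count the case samples containing both label_j and label_k.

    label_sets: one set of stage labels per NSCLC case sample; the
    binary indicator of Equation (5) is 1 exactly when both labels
    appear in the same case sample.
    """
    return sum(1 for labels in label_sets
               if label_j in labels and label_k in labels)


def best_source_label(label_sets, target, candidates):
    """Return the label co-occurring most often with `target`; its
    saved large-sample model parameters initialize the small-sample
    pathological stage prediction model."""
    return max(candidates,
               key=lambda lk: co_occurrence(label_sets, target, lk))
```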
Step 2: $D$ is split into multiple binary data sets $D_j$ of NSCLC pathological stages using the One-vs-Rest approach, where $D_j$ is the training set of the pathological stage label $l_j$. Then, select the majority pathological stage label $l_k$ that co-occurs most frequently with the pathological stage label $l_j$. Train the large-sample NSCLC pathological stage prediction model $F_k$ on the training dataset $D_k$ of label $l_k$ (see Equation (6)), and save the parameters of $F_k$.
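The One-vs-Rest binarization of Step 2 might look like the following sketch, reusing the per-case label sets from above; treating $l_j$ as the positive (minority) class follows the paper's setup.

```python
def one_vs_rest(samples, label_sets, label_j):
    """Build the binary data set D_j for stage label l_j:
    y = 1 if the case carries l_j (positive/minority class),
    y = 0 otherwise (negative/majority class)."""
    X = list(samples)
    y = [1 if label_j in labels else 0 for labels in label_sets]
    return X, y
```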
Step 3: The parameters of the pathological stage prediction model $F_k$ trained on the large sample of NSCLC cases are read in as the initialization of the model for the pathological stage label $l_j$. The majority-class sample set of $D_j$ is $D_j^-$ and the minority-class sample set is $D_j^+$, with sample sizes $N^-$ and $N^+$, respectively, and a total sample size of $N = N^+ + N^-$. Initialize the sampling probability $p_i$ of each minority-class and majority-class sample, as shown in Equation (7). Since the sum of the sampling probabilities within each of the positive and negative classes is $m/2$, sampling each positive and negative sample according to the method in Step 4 yields, on average, $m/2$ positive and $m/2$ negative samples, so the sample set constructed by sampling is balanced.
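Under the reading that each class's sampling probabilities sum to $m/2$, Equation (7) can be realized as below; the exact functional form is an assumption consistent with the balance argument in Step 3.

```python
def init_sampling_probs(y, m):
    """Give each positive sample probability m / (2 * N_pos) and each
    negative sample m / (2 * N_neg): each class then sums to m/2, so a
    sampled block of expected size m is class-balanced on average."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    return [m / (2 * n_pos) if yi == 1 else m / (2 * n_neg) for yi in y]
```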
Step 4: The positive and negative sample sets are sampled separately based on the sampling probability $p_i$. For any sample $x_i$ with sampling probability $p_i$, a random value $r$ uniformly distributed between 0 and 1 is generated. If $r < p_i$, the sample is added to the new balanced sample set $D_t$: if $x_i$ belongs to the majority class, it is added to the partial majority-class sample set $D_t^-$; conversely, if $x_i$ belongs to the minority class, it is added to the minority-class sample set $D_t^+$, as shown in Equations (8) and (9). For each sample $x_i$, the probability that the randomly generated $r$ is smaller than $p_i$ is exactly $p_i$, so each sample enters the balanced sample set with probability $p_i$, and it is therefore reasonable to control the sampling by updating $p_i$ in this algorithm.

Finally, $D_t^+$ and $D_t^-$ are combined to form the training set $D_t$, as shown in Equation (10).
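Step 4's per-sample Bernoulli draw is sketched below; `random.random()` supplies the uniform value $r$, and the merge at the end corresponds to forming $D_t$ (presumed Equation (10)).

```python
import random

def sample_balanced_block(X, y, probs):
    """Include sample i iff a uniform draw r < p_i (Equations (8)-(9)),
    collecting the minority subset D_t^+ and the partial majority
    subset D_t^-, then merge them into the training block D_t."""
    d_pos, d_neg = [], []
    for xi, yi, pi in zip(X, y, probs):
        if random.random() < pi:
            (d_pos if yi == 1 else d_neg).append((xi, yi))
    return d_pos + d_neg  # D_t = D_t^+ ∪ D_t^-
```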
Step 5: The model is trained on the sampled data set $D_t$ to generate the new model $F_t$, as shown in Equation (11).
Step 6: The probability $P(x_i)$ that the model $F_t$ assigns to each sample $x_i$ of the overall training set being a positive sample is calculated, with $0 \le P(x_i) \le 1$; $P(x_i)$ is the probability that the classifier predicts the sample to belong to the positive class. A larger $P(x_i)$ is better for positive samples, and a smaller $P(x_i)$ is better for negative samples, so $P(x_i)$ can be used to update the sampling probability $p_i$, as shown in Equation (12).
When the model predicts a training sample incorrectly, or correctly but with low confidence, the sampling probability of that sample is increased, which increases the model's focus on it. Conversely, when the model predicts a sample correctly and with high confidence, its sampling probability is relatively reduced, which reduces the model's attention to it. This increases the model's ability to distinguish positive from negative samples, improving its prediction accuracy and confidence. Therefore, for a positive sample, the closer $P(x_i)$ is to 0, i.e., the prediction is incorrect or correct but with low confidence, the more the updated sampling probability increases. For a negative sample, the closer $P(x_i)$ is to 1, i.e., the prediction is incorrect or correct but with low confidence, the more the updated sampling probability increases. The sampling probabilities of the positive samples are regularized, where $Z^+$ is the sum of all positive-sample sampling probabilities, as shown in Equations (13) and (14).

Similarly, the sampling probabilities of the negative samples are regularized, where $Z^-$ is the sum of the sampling probabilities of all negative samples, as shown in Equations (15) and (16), respectively.
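One plausible realization of Equations (12)-(16), assuming the raw weight scales with $1 - P(x_i)$ for positives and with $P(x_i)$ for negatives before per-class regularization back to a sum of $m/2$; the paper's exact expressions may differ.

```python
def update_sampling_probs(probs, y, P, m):
    """Raise p_i for samples predicted wrongly or with low confidence:
    positives are reweighted by 1 - P(x_i), negatives by P(x_i)
    (Equation (12)); each class is then regularized by its sum Z so
    the class totals return to m/2 (Equations (13)-(16))."""
    raw = [pi * ((1 - Pi) if yi == 1 else Pi)
           for pi, yi, Pi in zip(probs, y, P)]
    z_pos = sum(r for r, yi in zip(raw, y) if yi == 1)  # Z^+
    z_neg = sum(r for r, yi in zip(raw, y) if yi == 0)  # Z^-
    return [(m / 2) * r / (z_pos if yi == 1 else z_neg)
            for r, yi in zip(raw, y)]
```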
Step 7: Determine whether the specified number of iterations $T$ has been reached; if so, return the final classifier. Otherwise, continue with Steps 4 to 7 using the updated sampling probabilities.
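Putting the pieces together, the per-label training loop could be sketched as follows; `init_model_from`, `train`, and `predict_proba` are hypothetical stand-ins for the transfer initialization (Step 3), the model fitting of Equation (11), and the classifier's positive-class probability output.

```python
def train_stage_model(X, y, params_k, m, T):
    """Steps 3-7: warm-start from the most co-occurring large-sample
    model, then alternate balanced sampling, training, and sampling-
    probability updates for T iterations."""
    model = init_model_from(params_k)   # hypothetical: load F_k's parameters
    probs = init_sampling_probs(y, m)   # Equation (7)
    for _ in range(T):                  # Step 7 loop
        block = sample_balanced_block(X, y, probs)      # Step 4
        model = train(model, block)                     # hypothetical fit, Eq. (11)
        P = predict_proba(model, X)                     # hypothetical P(x_i), Step 6
        probs = update_sampling_probs(probs, y, P, m)   # Equations (12)-(16)
    return model
```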