A Grid Search-Based Multilayer Dynamic Ensemble System to Identify DNA N4—Methylcytosine Using Deep Learning Approach

2023 Feb 25;14(3):582. doi: 10.3390/genes14030582

Algorithm 1: Proposed Algorithm

Input:
The DNA sequence dataset.
Output:
Each sequence classified as 4mC or non-4mC.

1. Begin
2. Remove the redundant sequences.
3. If total number of $S_{i}^{+} \neq$ total number of $S_{i}^{-}$ then
5. Randomly select an equal number of $S_{i}^{+}$ and $S_{i}^{-}$
6. Else
7. $S_{i} = S_{i}^{+} \cup S_{i}^{-}$ [where $i = 1, 2, 3, \ldots, 8$]
8. End If
9. Transform each DNA sequence into words of fixed length.
10. Convert each word into an n-dimensional vector.
11. Split the dataset into training ($S_{Train}$) and testing ($S_{Test}$) sets. Where $S_{Train}$ $S_{Test}$. The ratio between $S_{i}^{+}$ and $S_{i}^{-}$ in each set is the same as in the entire dataset.
12. Tune the base classifiers over numerous hyperparameter settings.
13. Fit each classifier on the training set.
14. Integrate the base classifiers and apply grid search to choose the relative weight of each ensemble member.
15. Predict new data.
16. For $i = 2$ to $n$ do
17. If TN and TP $\neq 0$ then
18. $S_{Train} = S_{Train}(i - 1)$ + number of TP, TN in layer $(i - 1)$
19. $S_{Test}$ = number of FP, FN in layer $(i - 1)$ [where $i$ = layer number]
20. Repeat Steps 13 to 15.
21. Else
22. Calculate total accuracy using the following formula:
$$TA = \frac{\sum_{i = 1}^{n} (TP + TN)_{i}}{S_{Test}(S^{+} + S^{-})}$$
[where $i = 1, 2, 3, \ldots, n$]
23. End If
24. End For
25. Stop
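
The preprocessing part of Algorithm 1 (Steps 2 to 11) can be sketched in plain Python. This is a minimal illustration under assumptions the algorithm does not state: redundancy removal is reduced to dropping exact duplicates, words are taken to be overlapping k-mers with a hypothetical k = 3, and the helper names (`deduplicate`, `balance`, `to_words`, `stratified_split`) as well as the toy sequences are illustrative only. Step 10 (the word embedding) is left out.

```python
import random
from collections import defaultdict

def deduplicate(seqs):
    """Step 2 (simplified): drop exact duplicates, keeping first occurrences."""
    return list(dict.fromkeys(seqs))

def balance(pos, neg, rng):
    """Steps 3-5: if class sizes differ, sample both down to the smaller size."""
    if len(pos) != len(neg):
        k = min(len(pos), len(neg))
        pos, neg = rng.sample(pos, k), rng.sample(neg, k)
    return pos, neg

def to_words(seq, k=3):
    """Step 9 (assumed form): overlapping words of fixed length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def stratified_split(samples, labels, test_fraction, rng):
    """Step 11: split so that each class keeps the same share in both sets."""
    by_label = defaultdict(list)
    for s, y in zip(samples, labels):
        by_label[y].append(s)
    train, test = [], []
    for y, items in by_label.items():
        items = items[:]                      # do not mutate the caller's data
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test += [(s, y) for s in items[:n_test]]
        train += [(s, y) for s in items[n_test:]]
    return train, test

# Toy data (made up for illustration, not from the benchmark datasets).
rng = random.Random(0)
pos = deduplicate(["ACGTAC", "ACGTAC", "TTGACA", "GGCATC", "CCATGA"])
neg = deduplicate(["AAAAAA", "CCCCCC"])
pos, neg = balance(pos, neg, rng)

samples = pos + neg                           # Step 7: union of both classes
labels = [1] * len(pos) + [0] * len(neg)
train, test = stratified_split(samples, labels, 0.5, rng)
words = [to_words(s) for s, _ in train]
```

Sampling both classes down to the smaller one is one reading of Step 5; other balancing strategies would fit the wording equally well.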
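
Step 14 can be illustrated with a small grid search over ensemble weights. The sketch below assumes that the ensemble combines the base classifiers by a weighted average of their positive-class probabilities and that accuracy is the selection criterion; neither detail is fixed by the algorithm itself. The function names, the grid resolution, and the probability values are hypothetical.

```python
from itertools import product

def weighted_vote(probs, weights):
    """Weighted average of per-model positive-class probabilities."""
    total = sum(weights)
    n_samples = len(probs[0])
    return [sum(w * p[j] for w, p in zip(weights, probs)) / total
            for j in range(n_samples)]

def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def grid_search_weights(probs, labels, steps=10):
    """Try every weight tuple on a grid of multiples of 1/steps."""
    grid = [i / steps for i in range(steps + 1)]
    best_w, best_acc = None, -1.0
    for w in product(grid, repeat=len(probs)):
        if sum(w) == 0:                       # all-zero weights are undefined
            continue
        acc = accuracy(weighted_vote(probs, w), labels)
        if acc > best_acc:                    # keep the first best tuple found
            best_w, best_acc = w, acc
    return best_w, best_acc

# Made-up validation probabilities from three hypothetical base classifiers.
labels = [1, 0, 1, 0]
probs = [
    [0.9, 0.2, 0.4, 0.6],
    [0.6, 0.4, 0.7, 0.3],
    [0.3, 0.7, 0.8, 0.1],
]
best_w, best_acc = grid_search_weights(probs, labels)
```

The exhaustive grid grows exponentially with the number of ensemble members, so a coarse grid is the practical choice when more than a few classifiers are combined.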
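
The layer loop of Steps 16 to 24 can be sketched as follows. This is one possible reading: correctly classified test samples (TP and TN) are moved into the training set, misclassified samples (FP and FN) form the next layer's test set, and the denominator of TA is taken to be the size of the original test set. The `nearest_neighbour` function is a hypothetical stand-in for the fitted ensemble of Steps 13 to 15, and the data are made up.

```python
def nearest_neighbour(train, test):
    """Stand-in for Steps 13-15: 1-nearest-neighbour on a scalar feature."""
    return [min(train, key=lambda t: abs(t[0] - x))[1] for x, _ in test]

def multilayer_total_accuracy(fit_predict, train, test, n_layers):
    """Return (TA, correct predictions per layer) for up to n_layers layers."""
    n_test = len(test)                        # original test size, S+ plus S-
    correct_per_layer = []
    for _ in range(n_layers):
        if not test:                          # nothing left to classify
            break
        preds = fit_predict(train, test)
        right = [s for s, p in zip(test, preds) if p == s[1]]   # TP and TN
        wrong = [s for s, p in zip(test, preds) if p != s[1]]   # FP and FN
        tp = sum(1 for _, y in right if y == 1)
        tn = len(right) - tp
        correct_per_layer.append(tp + tn)
        if tp == 0 or tn == 0:                # Step 17 fails: stop layering
            break
        train = train + right                 # Step 18
        test = wrong                          # Step 19
    return sum(correct_per_layer) / n_test, correct_per_layer

# Samples are (feature, label) pairs; values are illustrative only.
train = [(0.0, 0), (10.0, 1)]
test = [(4.0, 0), (9.0, 1), (5.5, 0), (4.8, 1)]
ta, per_layer = multilayer_total_accuracy(nearest_neighbour, train, test, 3)
```

In this toy run the first layer classifies two of the four samples correctly, the second layer recovers one more, and the loop then stops because the second layer yields no true positives.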