Genes. 2023 Feb 25;14(3):582. doi: 10.3390/genes14030582
Algorithm 1: Proposed Algorithm
Input:
The DNA sequence dataset.
Output:
Each sequence classified as 4mC or non-4mC.
1. Begin
2. Remove the redundant sequences.
3. If total number of ($S_i^{+}$) $\neq$ total number of ($S_i^{-}$) then
5.   Randomly select an equal number of $S_i^{+}$ and $S_i^{-}$ samples
6. Else
7.   $S_i = S_i^{+} \cup S_i^{-}$   [where $i = 1, 2, \ldots, 8$]
8. End If
9. Transform each DNA sequence into words of a fixed length.
10. Convert each word into an n-dimensional vector.
11. Split the dataset into a training set ($S_{Train}$) and a testing set ($S_{Test}$), where $|S_{Train}| > |S_{Test}|$. The ratio between $S_i^{+}$
    and $S_i^{-}$ in each set is the same as in the entire dataset.
12. Tune the hyper-parameters of each base classifier.
13. Fit each base classifier on the training set.
14. Integrate the base classifiers and apply grid search to choose the relative weight of each ensemble member.
15. Predict the labels of the test data.
16. For i = 2 to n do
17.    If $TP \neq 0$ and $TN \neq 0$ then
18.     $S_{Train}^{(i)} = S_{Train}^{(i-1)} \cup$ {TP and TN samples of layer $(i-1)$}
19.     $S_{Test}^{(i)} =$ {FP and FN samples of layer $(i-1)$}   [where $i$ = layer number]
20.    Repeat Steps 13 to 15.
21.   Else
22.    Calculate the total accuracy (TA) using the following formula:
      $TA = \dfrac{\sum_{i=1}^{n} (TP + TN)_i}{|S_{Test}^{+}| + |S_{Test}^{-}|}$   [where $i = 1, 2, \ldots, n$]
23.   End If
24. End For
25. Stop
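Steps 2 to 8 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the algorithm does not say how redundancy is judged, so exact-duplicate removal is assumed here (similarity-cutoff tools such as CD-HIT are a common alternative), and the helper names `remove_redundant` and `balance`, plus the fixed `seed`, are hypothetical. For simplicity the sketch merges the two classes into one labelled set in both branches of the If.

```python
import random
from typing import List, Tuple

def remove_redundant(seqs: List[str]) -> List[str]:
    """Step 2 (assumed exact-match): drop duplicate sequences, keep first occurrences."""
    seen = set()
    unique = []
    for s in seqs:
        if s not in seen:
            seen.add(s)
            unique.append(s)
    return unique

def balance(pos: List[str], neg: List[str], seed: int = 0) -> List[Tuple[str, int]]:
    """Steps 3-8: if the classes differ in size, randomly undersample both to the
    smaller size, then merge them into one labelled set (1 = 4mC, 0 = non-4mC)."""
    rng = random.Random(seed)
    if len(pos) != len(neg):
        m = min(len(pos), len(neg))
        pos = rng.sample(pos, m)
        neg = rng.sample(neg, m)
    return [(s, 1) for s in pos] + [(s, 0) for s in neg]

if __name__ == "__main__":
    pos = remove_redundant(["ACGTA", "ACGTA", "TTGAC", "CCCAT"])
    neg = remove_redundant(["GGGGC", "AAAAT"])
    data = balance(pos, neg)
    print(len(pos), len(data))  # 3 4
```

Undersampling the larger class is the simplest way to obtain equal class sizes; it discards data, which is why the seed is fixed here so that runs are reproducible.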
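For Steps 9 and 10, one simple reading is to cut each sequence into overlapping k-mers ("words") and map every k-mer to an n-dimensional vector. The sketch below uses a one-hot vector over the 4^k possible k-mers, so n = 4^k. The value k = 3, the overlapping windows, the one-hot encoding and the names `K`, `VOCAB`, `to_words`, `word_to_vector` and `encode` are all assumptions standing in for whatever embedding the original method uses; a learned embedding such as word2vec would produce dense vectors instead.

```python
from itertools import product
from typing import Dict, List

K = 3  # hypothetical word length; the algorithm does not fix k

# Vocabulary of all 4**K possible words, each mapped to a vector index.
VOCAB: Dict[str, int] = {"".join(p): j for j, p in enumerate(product("ACGT", repeat=K))}

def to_words(seq: str) -> List[str]:
    """Step 9 (sketch): overlapping words of fixed length K."""
    seq = seq.upper()
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]

def word_to_vector(word: str) -> List[int]:
    """Step 10 (sketch): one-hot vector of dimension n = 4**K.
    Words containing ambiguous bases (e.g. N) map to the all-zero vector."""
    vec = [0] * len(VOCAB)
    if word in VOCAB:
        vec[VOCAB[word]] = 1
    return vec

def encode(seq: str) -> List[List[int]]:
    """Encode a sequence as the list of its word vectors."""
    return [word_to_vector(w) for w in to_words(seq)]

if __name__ == "__main__":
    print(to_words("ACGTA"))          # ['ACG', 'CGT', 'GTA']
    print(len(encode("ACGTA")[0]))    # 64
```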
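Step 11 describes a stratified split. The sketch below shuffles each class separately and moves the same fraction of each into the test set, so both sets keep the dataset's positive-to-negative ratio. The default `test_fraction = 0.2` (an 80/20 split) and the name `stratified_split` are assumptions; the algorithm only requires the training set to be larger than the test set, which the floor in `int(...)` guarantees for any fraction below 0.5.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (sequence, label), label 1 = 4mC, 0 = non-4mC

def stratified_split(data: List[Sample], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split so that each label keeps the same share in the training and test sets."""
    if not 0.0 < test_fraction < 0.5:
        raise ValueError("test_fraction must be in (0, 0.5) so that S_Train > S_Test")
    rng = random.Random(seed)
    by_label: Dict[int, List[Sample]] = {}
    for item in data:
        by_label.setdefault(item[1], []).append(item)
    train: List[Sample] = []
    test: List[Sample] = []
    for items in by_label.values():
        items = items[:]                          # copy so the caller's list is not shuffled
        rng.shuffle(items)
        n_test = int(len(items) * test_fraction)  # floor keeps each class's test part smaller
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

For example, 50 positive and 50 negative samples yield a training set of 40 + 40 and a test set of 10 + 10.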
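Step 14 can be read as weighted soft voting whose weights are chosen by an exhaustive grid search. In the sketch below each base classifier is represented only by its predicted 4mC probabilities on a set of labelled samples. The grid spacing of 0.1, the 0.5 decision threshold, accuracy as the selection criterion and the function names are assumptions, not details taken from the algorithm.

```python
from itertools import product
from typing import List, Sequence, Tuple

def weighted_vote(probs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of each sample's 4mC probability across classifiers."""
    n_samples = len(probs[0])
    return [sum(w * p[j] for w, p in zip(weights, probs)) for j in range(n_samples)]

def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)

def grid_search_weights(probs: Sequence[Sequence[float]], labels: Sequence[int],
                        steps: int = 10) -> Tuple[List[float], float]:
    """Try every weight vector with entries in {0, 1/steps, ..., 1} that sums to 1
    and return the one with the highest accuracy (the first one wins on ties)."""
    best_w: List[float] = []
    best_acc = -1.0
    for combo in product(range(steps + 1), repeat=len(probs)):
        if sum(combo) != steps:
            continue  # keep only weight vectors whose entries sum to 1
        w = [c / steps for c in combo]
        acc = accuracy(weighted_vote(probs, w), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    probs = [[0.9, 0.8, 0.2, 0.1],   # classifier A
             [0.1, 0.2, 0.9, 0.8],   # classifier B
             [0.6, 0.4, 0.6, 0.4]]   # classifier C
    print(grid_search_weights(probs, labels))
```

The number of candidate weight vectors grows quickly with the number of ensemble members, so a coarse grid is usual. In scikit-learn, a comparable ensemble is `VotingClassifier(voting="soft", weights=...)`, and its `weights` parameter can be searched with `GridSearchCV`.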
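The layered loop in Steps 16 to 24, together with the total-accuracy formula in Step 22, can be sketched as below. Assumptions: samples are toy `(feature, label)` pairs; `fit` is any function that trains a model and returns a predictor, standing in for Steps 13 to 15; the loop stops when either TP or TN of a layer is zero or when no misclassified samples remain; and the correct predictions of the stopping layer are still counted in TA. The names `layered_accuracy` and `fit_threshold` are hypothetical, and `fit_threshold` exists only to make the example runnable.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, int]            # toy (feature, label); label 1 = 4mC, 0 = non-4mC
Predictor = Callable[[float], int]

def layered_accuracy(train: List[Sample], test: List[Sample],
                     fit: Callable[[List[Sample]], Predictor], n_layers: int) -> float:
    """Run up to n_layers layers and return TA = sum_i (TP + TN)_i / |S_Test|."""
    test_size = len(test)                # positive + negative samples in the original S_Test
    correct_per_layer: List[int] = []    # (TP + TN)_i for each layer i
    for _ in range(n_layers):
        if not test:                     # every sample already classified correctly
            break
        model = fit(train)                                       # Step 13
        preds = [model(x) for x, _ in test]                      # Step 15
        right = [s for s, p in zip(test, preds) if p == s[1]]    # TP and TN samples
        wrong = [s for s, p in zip(test, preds) if p != s[1]]    # FP and FN samples
        tp = sum(1 for _, y in right if y == 1)
        tn = len(right) - tp
        correct_per_layer.append(tp + tn)
        if tp == 0 or tn == 0:           # Step 17 condition fails: go to Step 22
            break
        train = train + right            # Step 18: grow the training set
        test = wrong                     # Step 19: next layer sees only the errors
    return sum(correct_per_layer) / test_size                    # Step 22

def fit_threshold(train: List[Sample]) -> Predictor:
    """Hypothetical toy classifier: threshold halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= t else 0

if __name__ == "__main__":
    train = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
    test = [(2.0, 0), (8.0, 1), (6.0, 0), (4.0, 1)]
    print(layered_accuracy(train, test, fit_threshold, n_layers=3))  # 0.5
```

In this run, layer 1 classifies two of the four test samples correctly (one TP, one TN). Layer 2 is retrained with those two added and misclassifies both remaining samples, so TA = (2 + 0) / 4 = 0.5.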