Algorithm 1: Proposed Algorithm
Input: The DNA sequence dataset.
Output: Classification of each sequence as 4mC or non-4mC.
1. Begin
2. Remove the redundant sequences.
3. If total number of … total number of … then
5.     Randomly select the equal number of … and …
6. Else
7.     … [Where …]
8. End If
9. Transform each DNA sequence into words of a fixed length.
10. Convert each word into n-dimensional vector form.
11. Split the dataset into training and testing sets, where … > …. The ratio between … and … in each set is the same as in the entire dataset.
12. Tune the base classifiers over numerous hyper-parameters.
13. Fit each classifier on the training set.
14. Integrate the base classifiers and apply grid search to choose the relative weight for each ensemble member.
15. Predict new data.
16. For … = 2 to … do
17.     If TN and TP then
18.         … = … + Number of TP, TN in layer …
19.         … = Number of FP, FN in layer … [Where … = layer number]
20.         Repeat Steps 13 to 15.
21.     Else
22.         Calculate total accuracy using the following formula: … [Where … = 1, 2, 3, …, n]
23.     End If
24. End For
25. Stop
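
Steps 2 to 14 of Algorithm 1 can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation. Several choices are assumptions made for illustration: overlapping words of length `k=3`, balancing by undersampling the larger class, a 0.5 decision threshold, and the weight grid `(1, 2, 3)`. All function names are hypothetical, and the probability lists stand in for the outputs of already fitted base classifiers. The word-to-vector conversion (step 10) and the layer loop (steps 16 to 24) are not covered.

```python
import itertools
import random

def deduplicate(seqs):
    """Step 2: drop repeated sequences, keeping the first occurrence of each."""
    return list(dict.fromkeys(seqs))

def balance(pos, neg, rng):
    """Steps 3-8 (assumed form): undersample so both classes have equal size."""
    n = min(len(pos), len(neg))
    return rng.sample(pos, n), rng.sample(neg, n)

def to_words(seq, k=3):
    """Step 9: split a sequence into overlapping words of length k (assumed k=3)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def stratified_split(pos, neg, train_frac, rng):
    """Step 11: split each class separately so both sets keep the class ratio."""
    train, test = [], []
    for label, group in ((1, pos), (0, neg)):
        group = group[:]          # copy so the caller's list is not shuffled
        rng.shuffle(group)
        cut = int(len(group) * train_frac)
        train += [(s, label) for s in group[:cut]]
        test += [(s, label) for s in group[cut:]]
    return train, test

def weighted_vote(probs, weights):
    """Soft voting: weighted mean of each classifier's P(4mC), thresholded at 0.5."""
    total = sum(weights)
    n = len(probs[0])
    avg = [sum(w * p[i] for w, p in zip(weights, probs)) / total for i in range(n)]
    return [1 if a >= 0.5 else 0 for a in avg]

def grid_search_weights(probs, labels, grid=(1, 2, 3)):
    """Step 14: try every weight combination; keep the first most accurate one."""
    best_w, best_acc = None, -1.0
    for weights in itertools.product(grid, repeat=len(probs)):
        preds = weighted_vote(probs, weights)
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_w, best_acc = weights, acc
    return best_w, best_acc

# Toy data (made up for illustration only).
rng = random.Random(0)
pos = deduplicate(["ACGTAC", "ACGTAC", "TTGACA", "GGCATC", "CATGCA", "TGCACG"])
neg = deduplicate(["CCGTAA", "ATATAT", "GGGTTT", "CACACA"])
pos, neg = balance(pos, neg, rng)
train, test = stratified_split(pos, neg, train_frac=0.75, rng=rng)
words = to_words("ACGTAC")  # ['ACG', 'CGT', 'GTA', 'TAC']

# Hypothetical P(4mC) outputs of two base classifiers on four validation samples.
probs = [[0.9, 0.2, 0.2, 0.8], [0.6, 0.4, 0.7, 0.3]]
labels = [1, 0, 1, 0]
best_weights, best_accuracy = grid_search_weights(probs, labels)
```

In this toy run, equal weights misclassify two of the four samples, while giving the second classifier twice the weight of the first classifies all four correctly, so the search returns `(1, 2)`. In practice the weights would be chosen on held-out data rather than on the samples used to report accuracy.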