Small and dependent datasets | Data availability | Restricting the number of parameters [27,190] | Neural network architectures for small and sparse datasets | Separating training and test sets by phylogenetic similarity [27] | Methods to evaluate data dependency by protein and sequence similarities
Biological sequence representation | Methodological | NLP with neural network-based modeling [191,192,193,194] | Incorporating amino acid substitution and codon usage matrices into representation frameworks | Incorporating conserved domain databases into the training framework
Incorporation of different data types | Methodological | Integration of multi-omics datasets through existing network topologies
Reproducibility | Acceptance | Documentation and deposition of the processed data [195] | - | Benchmarking of the processing pipeline and optimized parameters [196] | -
Interpretability | Acceptance | Incorporation of established bioinformatic methods and databases into ML and DL frameworks [128,196] | Generation of interpretable DL models [197,198,199]
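
As an illustration of the table entry on separating training and test sets by phylogenetic similarity, the following is a minimal sketch rather than the procedure used in [27]. It assumes that every sequence has already been assigned to a cluster (for example a clade or a sequence-similarity cluster); the function name `split_by_cluster`, the `cluster_of` mapping, and the sequence and clade labels are hypothetical. Whole clusters are placed on one side of the split, so closely related sequences never appear in both sets.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, test_fraction=0.2, seed=0):
    """Split sequence IDs so that no cluster spans both train and test.

    cluster_of: dict mapping sequence ID -> cluster label (assumed to be
    precomputed, e.g. from a phylogeny or similarity clustering).
    """
    # Group sequence IDs by their cluster label.
    clusters = defaultdict(list)
    for seq_id, label in cluster_of.items():
        clusters[label].append(seq_id)

    # Shuffle the cluster labels reproducibly.
    labels = sorted(clusters)
    random.Random(seed).shuffle(labels)

    # Fill the test set with whole clusters until the target size is
    # reached; the test set may slightly overshoot the requested fraction.
    target = test_fraction * len(cluster_of)
    train, test = [], []
    for label in labels:
        if len(test) < target:
            test.extend(clusters[label])
        else:
            train.extend(clusters[label])
    return train, test


# Toy example with hypothetical sequence IDs and clade labels.
cluster_of = {
    "seq1": "cladeA", "seq2": "cladeA",
    "seq3": "cladeB", "seq4": "cladeB",
    "seq5": "cladeC", "seq6": "cladeC",
}
train_ids, test_ids = split_by_cluster(cluster_of, test_fraction=0.3)
```

Splitting at the cluster level instead of the sequence level trades exact control over the test-set size for a cleaner estimate of how a model generalizes to unseen lineages.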