. 2024 Apr 22;13:giae018. doi: 10.1093/gigascience/giae018

Figure 1:

Workflow for extracting the sequence pattern matrix and using a deep learning neural network to predict the taxon. (A) The virus genomes are first divided into 5 subsets, and each subset is then used to simulate 4 groups of contigs with different lengths. (B) Overlapping trinucleotides are used to represent the virus contigs. For example, if the nucleotides of the viral fragment are “ATTCATAACTT”, the trinucleotide set consists of “ATT, TTC, TCA, CAT, ATA, TAA, AAC, ACT, CTT.” The trinucleotide set is then converted to a 64 × 64 sequence pattern matrix using a sequence pattern function. (C) The IPEV tool employs a 2D CNN model as the classifier. The CNN model accepts the sequence pattern matrix as input and outputs a 1 × 2 array giving the likelihoods that the contig comes from a prokaryotic or a eukaryotic virus.
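
Step (B) can be illustrated with a minimal Python sketch. The trinucleotide extraction follows the caption's example directly. The caption does not define the sequence pattern function, so `adjacent_pair_counts` below is only a hypothetical stand-in (a count of which trinucleotide directly follows which) chosen to show how 64 possible trinucleotides give a 64 × 64 matrix; it should not be read as IPEV's actual function. All function and variable names are assumptions made for this illustration.

```python
from itertools import product

# The 4 nucleotides give 4**3 = 64 possible trinucleotides; map each to 0..63.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRINUCLEOTIDES)}


def overlapping_trinucleotides(seq):
    """Return every window of 3 consecutive nucleotides, moving 1 base at a time."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def adjacent_pair_counts(trinucleotides):
    """ASSUMPTION: stand-in for the sequence pattern function, not IPEV's own.

    Entry [i][j] counts how often trinucleotide j directly follows
    trinucleotide i. Windows with characters outside A/C/G/T are skipped.
    """
    matrix = [[0] * 64 for _ in range(64)]
    for first, second in zip(trinucleotides, trinucleotides[1:]):
        if first in INDEX and second in INDEX:
            matrix[INDEX[first]][INDEX[second]] += 1
    return matrix


tris = overlapping_trinucleotides("ATTCATAACTT")
print(tris)  # the 9 trinucleotides listed in the caption
matrix = adjacent_pair_counts(tris)
print(len(matrix), len(matrix[0]))  # 64 64
```

An 11-nucleotide fragment yields 11 − 2 = 9 overlapping trinucleotides, matching the caption's list. Whatever the real sequence pattern function is, its output has the same 64 × 64 shape, which is what the 2D CNN in (C) takes as input.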