2021 Jan 25;118(5):e2010758118. doi: 10.1073/pnas.2010758118

Fig. 1.

k-mer frequency of nrEVEs is similar to that of extant RNA viruses and differentiates them from human sequences. (A) Confusion matrix of the SVM classifier, evaluated using leave-one-out cross-validation. (B) nrEVEs share characteristic sequence similarities. nrEVEs were categorized by the viral gene from which they originated (bornavirus N, M, G, and L genes and filovirus NP, VP35, GP, and L genes). The SVM was trained using the indicated nrEVE categories. The numbers of nrEVEs used for training and testing are shown (Left). (C) Hierarchical clustering of human coding sequences, vertebrate negative-strand RNA virus coding sequences, and nrEVEs. The k-mer frequencies of these sequences were clustered with Ward’s method. The heatmap shows the input k-mer frequencies, and the dendrograms show the clustering process. The dendrogram leaf nodes of the human coding sequences, virus coding sequences, and nrEVEs are shown in gray, red, and blue, respectively. See SI Appendix, Fig. S1 for additional details.
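
As an illustration of the feature the caption refers to, the following is a minimal sketch of how a k-mer frequency vector might be computed for a single nucleotide sequence. The function name `kmer_frequencies`, the default `k=3`, and the choice to skip windows containing ambiguous bases are assumptions made for this example; the caption does not state the value of k or the preprocessing the authors used.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return relative frequencies of all 4**k DNA k-mers in seq.

    Hypothetical helper for illustration only. The output is ordered
    lexicographically over the alphabet A, C, G, T.
    """
    seq = seq.upper()
    # Enumerate every possible k-mer so all sequences share one feature layout.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: windows with ambiguous bases (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    vec = kmer_frequencies("ACGTACGT", k=2)
    print(len(vec), round(sum(vec), 6))
```

Vectors of this kind, one per sequence, could then be passed to a classifier such as scikit-learn's `SVC` with `LeaveOneOut` splitting, or to SciPy's `linkage(..., method="ward")` for hierarchical clustering. Whether the authors used these particular libraries is not stated in the caption.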