2021 May 6;9:e11396. doi: 10.7717/peerj.11396

Table 5. For each search term used to identify putatively important conserved protein domains, we show the number of domain descriptions that contain the term in four categories: (i) the starting set, (ii) the input to the Random Forest model, (iii) the top 50 features and (iv) the top 20 features after model fitting.

Note that column entries do not sum to the n shown at the top of each column because many descriptions contain multiple search terms.

Search term              Starting (n = 371)  Model (n = 206)  Top 50  Top 20
integrase                101                 72               24      13
excisionase              5                   4                2       0
recombinase              72                  52               28      17
transposase              143                 70               14      3
lysogen                  23                  10               2       1
temperate                11                  10               3       0
parA |ParA |parB |ParB   65                  29               7       0
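
The counting behind the table, and the reason the column entries can exceed n, can be illustrated with a minimal sketch. It assumes each search term is applied as a case-sensitive regular expression to the free-text domain description; the source does not state the exact matching rule, so this is an assumption. The four descriptions are invented toy strings, not entries from the study's data set, and `count_descriptions` is a hypothetical helper name.

```python
import re

# Search terms as listed in the table; the last one is treated as a regex
# alternation (assumption: the original search tool is not specified).
SEARCH_TERMS = [
    "integrase",
    "excisionase",
    "recombinase",
    "transposase",
    "lysogen",
    "temperate",
    "parA |ParA |parB |ParB",
]


def count_descriptions(descriptions, terms):
    """Return, for each term, how many descriptions contain at least one match."""
    return {
        term: sum(1 for text in descriptions if re.search(term, text))
        for term in terms
    }


# Hypothetical toy descriptions, invented for illustration only.
toy_descriptions = [
    "site-specific recombinase, phage integrase family",  # matches two terms
    "transposase IS200-like",
    "ParB/RepB/Spo0J family partition protein",
    "excisionase family DNA-binding domain",
]

counts = count_descriptions(toy_descriptions, SEARCH_TERMS)
for term, n_hits in counts.items():
    print(f"{term}: {n_hits}")

# The first description is counted under both "integrase" and "recombinase",
# so the per-term counts add up to more than the number of descriptions.
print(sum(counts.values()), "term hits across", len(toy_descriptions), "descriptions")
```

Because one description can be counted under several terms, the per-term totals here exceed the number of descriptions, as the note to the table describes for the real data.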