Figure 5.
(A) 150 nucleotides are sufficient to cover the junction in most sequences. The number of nucleotides from the start of the junction to the 3′ end was calculated for each sequence in 20 artificial repertoires. Note that this count includes additional base pairs from the constant region between the J region and the primer (see Figure 4). (B) Node sensitivity for different numbers of nucleotides (L) and different k-mer lengths (k). The threshold is tuned to hold the specificity constant at 0.99. The node PPV remains above 0.99 for all values of L and k shown. (C) Performance comparison for three settings: tf-idf applied to the full sequence (full-seq), tf applied to L=150 nucleotides from each sequence (tf), and tf-idf applied to L=150 nucleotides from each sequence (tf-idf).