2019 Dec 17;8:e49750. doi: 10.7554/eLife.49750

Figure 5. Cuscuta HI-sRNAs form superfamilies that co-vary with target sites across eudicots.

(A) sRNA superfamily count and membership for each Cuscuta isolate. Colors indicate general groupings of superfamilies. (B) An example HI-sRNA superfamily aligned to target sites from homologs in 36 eudicot genomes. Nucleotide and amino acid Shannon entropy from the alignments is shown in bits. Vertical red lines indicate the reading frame. Dots indicate the number of possible synonymous nucleotides at each codon. Seventeen additional examples are provided in Supplementary file 7. (C) Average conservation of target sites from homologs. The confirmed target site is shown as a red point; all other possible sites are summarized by their interquartile range (25th–75th percentiles, black line) and median (black point).
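
As an illustration of the entropy tracks in panel B, the following is a minimal sketch of per-column Shannon entropy in bits for a gap-free-comparable alignment. The function names and the toy sequences are hypothetical and are not taken from the paper; skipping gap characters is an assumption made here, not a documented choice of the authors.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column.

    Assumption: gap characters ('-') are skipped rather than counted
    as a symbol of their own.
    """
    symbols = [s for s in column if s != "-"]
    if not symbols:
        return 0.0
    total = len(symbols)
    counts = Counter(symbols)
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy of every column of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Toy alignment (made-up sequences, not data from the paper).
toy_alignment = ["ACGT", "ACGA", "ACCA", "ATCC"]
for position, bits in enumerate(alignment_entropy(toy_alignment)):
    print(position, round(bits, 3))
```

A fully conserved column gives 0 bits, and a nucleotide column with all four bases equally frequent gives the maximum of 2 bits. The same function applies to amino acid columns, where the maximum is log2(20) bits.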

Figure 5—figure supplement 1. Clustering method for forming HI-sRNA superfamilies.

(A) Example demonstrating implementation of the ‘modified Hamming distance’ (mHD) when comparing strings. Levenshtein edit distance tolerates insertions and deletions, whereas the mHD does not allow these operations; strings that contain insertional errors therefore receive a high penalty, while shift errors are penalized the same. (B) Example of clustering seven HI-sRNAs into three superfamilies using mHD. Species are indicated by color; clustering is independent of species. HI-sRNA nodes are connected by edges that are close enough to form a cluster (solid line, red distance number) or inadequate (dashed line, black distance number). The cutoff for clustering is an mHD of five or less. Not every pair of nodes in a cluster must meet this threshold: a node needs only one adequate edge to join a cluster.
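
The contrast in panel A and the one-adequate-edge rule in panel B can be sketched as follows. The exact mHD scoring is not given in this caption, so `gapfree_distance` is a hypothetical stand-in: it slides one string along the other without gaps and counts mismatches plus unpaired overhanging positions. The clustering function is ordinary single-linkage clustering with a union-find structure, which matches the rule that one adequate edge is enough to join a cluster. All sequences are made up for illustration.

```python
def levenshtein(a, b):
    """Classic edit distance: substitution, insertion and deletion each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def gapfree_distance(a, b):
    """Stand-in for mHD (an assumption, not the authors' definition).

    Slide b along a with no internal gaps; the score at each offset is
    mismatches in the overlap plus unpaired overhang; return the minimum.
    """
    best = len(a) + len(b)
    for offset in range(-len(b) + 1, len(a)):
        start, end = max(0, offset), min(len(a), offset + len(b))
        overlap = end - start
        mismatches = sum(a[i] != b[i - offset] for i in range(start, end))
        overhang = (len(a) - overlap) + (len(b) - overlap)
        best = min(best, mismatches + overhang)
    return best


def single_linkage(seqs, cutoff=5, dist=gapfree_distance):
    """Group sequences so that each member has at least one edge <= cutoff."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if dist(seqs[i], seqs[j]) <= cutoff:
                parent[find(i)] = find(j)
    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return list(clusters.values())


# Made-up 21-nt sequences: s2 differs from s1 at four positions, s3 differs
# from s2 at four other positions, and s4 is unrelated.
s1 = "TGGACTCAAGCTTACGGATCA"
s2 = "ACCTCTCAAGCTTACGGATCA"
s3 = "ACCTCTCAAGCTTACGGTAGT"
s4 = "CATGTTGCACCGATTTAGCGT"
with_insertion = "TGGACTCAAGTCTTACGGATCA"  # s1 with one inserted base

print(levenshtein(s1, with_insertion), gapfree_distance(s1, with_insertion))
print(single_linkage([s1, s2, s3, s4], cutoff=5))
```

In this toy case s1 and s3 are farther apart than the cutoff, yet they share a cluster because each has an adequate edge to s2. The single inserted base costs one edit under Levenshtein distance but considerably more under the gap-free stand-in.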

Figure 5—figure supplement 2. Testing distance cutoff parameters for superfamily formation.

(A) Experimental pipeline for testing the cutoff. sRNA libraries are shuffled using UShuffle, maintaining dinucleotide composition. (B) Number of superfamilies formed for real HI-sRNAs and shuffled libraries, by the maximum distance allowed for cluster formation. A smaller count of superfamilies means that more HI-sRNAs are clustering with each other. (C) The same analysis as in B, except showing the cumulative density of superfamilies by the number of sRNAs grouped in them. Larger cutoffs yield larger superfamilies, while shuffled libraries remain unable to form clusters larger than one or two sRNAs.
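
The logic of this control can be sketched on toy data as follows. Several simplifications are assumed and should not be read as the authors' pipeline: the shuffle below is a plain mononucleotide shuffle, not UShuffle's dinucleotide-preserving shuffle; the distance is a hypothetical gap-free stand-in for mHD; and the library is simulated from two made-up founder sequences.

```python
import random


def gapfree_distance(a, b):
    """Hypothetical stand-in for mHD: best gap-free offset, scored as
    mismatches in the overlap plus unpaired overhanging positions."""
    best = len(a) + len(b)
    for offset in range(-len(b) + 1, len(a)):
        start, end = max(0, offset), min(len(a), offset + len(b))
        overlap = end - start
        mismatches = sum(a[i] != b[i - offset] for i in range(start, end))
        best = min(best, mismatches + (len(a) - overlap) + (len(b) - overlap))
    return best


def count_clusters(seqs, cutoff):
    """Number of single-linkage clusters (connected components) at a cutoff."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if gapfree_distance(seqs[i], seqs[j]) <= cutoff:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


def shuffle_library(seqs, seed=0):
    """Mononucleotide shuffle of each sequence (a simplification; it does
    not preserve dinucleotide composition as UShuffle does)."""
    rng = random.Random(seed)
    shuffled = []
    for s in seqs:
        chars = list(s)
        rng.shuffle(chars)
        shuffled.append("".join(chars))
    return shuffled


def mutate(seq, n_subs, rng):
    """Return seq with n_subs substitutions at distinct random positions."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_subs):
        chars[pos] = rng.choice([c for c in "ACGT" if c != chars[pos]])
    return "".join(chars)


# Simulated library: four variants of each of two made-up founders.
rng = random.Random(1)
founders = ["TGGACTCAAGCTTACGGATCA", "CATGTTGCACCGATTTAGCGT"]
library = [mutate(f, 2, rng) for f in founders for _ in range(4)]
shuffled = shuffle_library(library)

print("cutoff real shuffled")
for cutoff in range(9):
    print(cutoff, count_clusters(library, cutoff), count_clusters(shuffled, cutoff))
```

Because single-linkage clusters can only merge as the cutoff grows, the cluster count never increases with the cutoff. A useful cutoff is one at which the real library has collapsed into few clusters while the shuffled library still has not.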