Table 1. A hypothetical short sequence data set and experimental RNA-sequencing data preferentially aligned to the cognate genome.
Data set | Total DB size | Total hits (% rRNA) | Non-rRNA hits | Unique lociᵃ | Genes with a hit (%) |
---|---|---|---|---|---|
Hypothetical | | | | | |
Exact | 3 855 671 | 6518 (45%) | 2923 | 2923 | 109 (2.8%) |
1 miss | 3 855 671 | 14 086 (27%) | 10 283 | 10 283 | 290 (7.6%) |
Actualᵇ | | | | | |
Before mask | 10 943 994 | 1 936 998 (99.9%) | 1525 | 575 | 161 (4.1%) |
After maskᶜ | 542 600 | 646 (0%) | 646 | 340 | 139 (3.6%) |
Actualᵈ | | | | | |
Before mask | 15 151 014 | 2 400 846 (99.9%) | 1280 | 409 | 194 (4.8%) |
After maskᶜ | 427 444 | 380 (0%) | 380 | 204 | 151 (3.8%) |
Abbreviations: DB, database; rRNA, ribosomal RNA.
ᵃ Unique loci: the number of distinct non-ribosomal sequences of T. primitia that had at least one hit.
ᵇ An in silico-generated data set of all possible 37-base-pair sequences in the genome of T. azotonutricium (Hypothetical) and RNA-seq data from a sample of T. azotonutricium (the Actual data set marked b) were mapped to the genome of T. primitia. Total DB size is the number of short sequences in each data set. Total hits is the number of short sequences that mapped to the T. primitia genome; the value in parentheses is the percentage of those hits that aligned to the 16S or 23S rRNA genes.
ᶜ After mask: sequences remaining after the most similar sequences between the two genomes were removed from sampling.
ᵈ RNA-seq data from a sample of T. primitia, mapped to the genome of T. azotonutricium.
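The Hypothetical data set and its Exact and 1 miss rows can be illustrated with a minimal sketch. It assumes that "all possible 37 base-pair sequences" means every overlapping 37-bp window on one strand (step of 1 bp), so a genome of length L yields L − 36 windows, and it reads "Exact" and "1 miss" as hits with zero or at most one substitution against one strand of the reference. The source does not name the aligner or its strand and indel handling; `tile_kmers`, `hamming`, `is_hit` and `count_hits` are hypothetical helpers, and the naive scan is only practical for toy sequences (real genomes call for an indexed short-read aligner).

```python
def tile_kmers(genome: str, k: int = 37) -> list[str]:
    """Every overlapping k-mer of one strand of `genome`, step 1 bp.

    A sequence of length L gives L - k + 1 windows (none if L < k).
    """
    return [genome[i:i + k] for i in range(len(genome) - k + 1)]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_hit(read: str, reference: str, max_mismatches: int = 0) -> bool:
    """True if `read` matches some window of `reference` (one strand only)
    with at most `max_mismatches` substitutions; indels are not modelled."""
    k = len(read)
    return any(
        hamming(read, reference[i:i + k]) <= max_mismatches
        for i in range(len(reference) - k + 1)
    )


def count_hits(reads: list[str], reference: str, max_mismatches: int = 0) -> int:
    """Count reads that hit `reference` under the given mismatch budget."""
    return sum(is_hit(r, reference, max_mismatches) for r in reads)


if __name__ == "__main__":
    source = "ACGTACGGA"     # toy stand-in for the genome the reads come from
    target = "TTACGTACCGA"   # toy stand-in for the reference genome
    reads = tile_kmers(source, k=4)        # 4-mers instead of 37-mers for readability
    print(len(reads))                      # 6 windows
    print(count_hits(reads, target, 0))    # 4 hits, exact matching
    print(count_hits(reads, target, 1))    # 6 hits, at most 1 mismatch
```

In this toy case, allowing one mismatch recovers the two windows (ACGG and CGGA) that differ from the reference at a single base, which mirrors why the 1 miss row has more hits than the Exact row.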
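The percentage in parentheses in the Total hits column can be recomputed from the Total hits and Non-rRNA hits columns, assuming every hit is either an rRNA (16S or 23S) hit or a non-rRNA hit with no third category. The sketch below copies those two columns from the table; `ROWS` and `rrna_percent` are hypothetical names used only for illustration.

```python
# (Total hits, Non-rRNA hits) copied from Table 1.
ROWS = {
    "Hypothetical, 1 miss": (14_086, 10_283),
    "Actual (b), before mask": (1_936_998, 1_525),
    "Actual (b), after mask": (646, 646),
    "Actual (d), before mask": (2_400_846, 1_280),
    "Actual (d), after mask": (380, 380),
}


def rrna_percent(total_hits: int, non_rrna_hits: int) -> float:
    """Percentage of hits assigned to rRNA, assuming each hit is either
    rRNA or non-rRNA."""
    if total_hits <= 0:
        return 0.0
    return 100.0 * (total_hits - non_rrna_hits) / total_hits


if __name__ == "__main__":
    for label, (total, non_rrna) in ROWS.items():
        print(f"{label}: {rrna_percent(total, non_rrna):.1f}% rRNA")
```

This prints 27.0, 99.9, 0.0, 99.9 and 0.0, matching the listed values. For the Exact row (6518 total, 2923 non-rRNA hits) the same arithmetic gives about 55%, whereas the table lists 45%, which equals the non-rRNA share; that entry may be worth checking against the original publication.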