. 2011 Feb 17;5(7):1133–1142. doi: 10.1038/ismej.2011.3

Table 1. A hypothetical short sequence data set and experimental RNA-sequencing data preferentially aligned to the cognate genome.

| Data set | Total DB size | Total hits (% rRNA) | Non-rRNA hits | Unique loci^a | Genes with a hit (%) |
|---|---|---|---|---|---|
| Hypothetical, exact | 3 855 671 | 6518 (45%) | 2923 | 2923 | 109 (2.8%) |
| Hypothetical, 1 mismatch | 3 855 671 | 14 086 (27%) | 10 283 | 10 283 | 290 (7.6%) |
| Actual^b, before mask | 10 943 994 | 1 936 998 (99.9%) | 1525 | 575 | 161 (4.1%) |
| Actual^b, after mask^c | 542 600 | 646 (0%) | 646 | 340 | 139 (3.6%) |
| Actual^d, before mask | 15 151 014 | 2 400 846 (99.9%) | 1280 | 409 | 194 (4.8%) |
| Actual^d, after mask^c | 427 444 | 380 (0%) | 380 | 204 | 151 (3.8%) |

Abbreviation: rRNA, ribosomal RNA.

a Unique loci refers to the number of distinct non-ribosomal sequences of T. primitia that had at least one hit.

An in silico data set of all possible 37-bp sequences in the genome of T. azotonutricium (Hypothetical) and the RNA-seq data from a sample of T. azotonutricium (denoted as Actual^b in the table) were mapped to the genome of T. primitia. Total DB size is the number of short sequences in each data set. Total hits is the number of short sequences that mapped to the T. primitia genome, followed in parentheses by the percentage of those hits that aligned to 16S or 23S rRNA.

c "After mask" sequences are those that remained after the sequences most similar between the two genomes were removed from sampling.

d RNA-seq data from a sample of T. primitia, mapped to the genome of T. azotonutricium.
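To make the column definitions concrete, the following is a minimal Python sketch that computes Table 1-style columns on toy data. It is not the authors' pipeline, and several choices are assumptions: the naive mismatch scan stands in for a real short-read aligner; the in silico set uses forward-strand windows only; a read counts as an rRNA hit if any of its alignment start positions falls inside an rRNA interval; a "locus" is taken to be an alignment start coordinate; and masking is approximated by dropping reads that share any k-mer of a hypothetical length `seed` found in both genomes. All function names (`all_kmers`, `hit_positions`, `mask_reads`, `summarize`) and the toy sequences are hypothetical.

```python
"""Toy re-creation of the Table 1 columns (hypothetical sketch, not the published pipeline)."""
from typing import Dict, List, Sequence, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the reference


def all_kmers(genome: str, k: int = 37) -> List[str]:
    """In silico data set: every length-k window of the genome (forward strand only, an assumption)."""
    return [genome[i:i + k] for i in range(len(genome) - k + 1)]


def hit_positions(read: str, reference: str, max_mismatches: int = 0) -> List[int]:
    """0-based starts where `read` matches `reference` with <= max_mismatches substitutions.

    Naive O(len(reference) * len(read)) scan; a stand-in for a real aligner.
    """
    k = len(read)
    hits = []
    for i in range(len(reference) - k + 1):
        mismatches = 0
        for a, b in zip(read, reference[i:i + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches at this offset
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mask_reads(reads: Sequence[str], genome_a: str, genome_b: str, seed: int = 20) -> List[str]:
    """Hypothetical mask: drop reads containing any seed-length k-mer present in both genomes."""
    shared = kmer_set(genome_a, seed) & kmer_set(genome_b, seed)
    return [r for r in reads if not (kmer_set(r, seed) & shared)]


def summarize(reads: Sequence[str], reference: str, rrna: Sequence[Interval],
              genes: Sequence[Interval], max_mismatches: int = 0) -> Dict[str, float]:
    """Compute Table 1-style columns for one data set mapped to one reference genome."""
    total_hits = rrna_hits = 0
    loci: Set[int] = set()
    for read in reads:
        positions = hit_positions(read, reference, max_mismatches)
        if not positions:
            continue  # read did not map
        total_hits += 1
        if any(s <= p < e for p in positions for s, e in rrna):
            rrna_hits += 1  # assumption: any alignment start inside rRNA makes it an rRNA hit
        else:
            loci.update(positions)  # distinct non-rRNA alignment starts
    genes_hit = sum(1 for s, e in genes if any(s <= p < e for p in loci))
    return {
        "db_size": len(reads),
        "total_hits": total_hits,
        "pct_rrna": 100.0 * rrna_hits / total_hits if total_hits else 0.0,
        "non_rrna_hits": total_hits - rrna_hits,
        "unique_loci": len(loci),
        "genes_with_hit": genes_hit,
        "pct_genes_with_hit": 100.0 * genes_hit / len(genes) if genes else 0.0,
    }


if __name__ == "__main__":
    reference = "ACGTTGCAAAAACCCC"   # toy stand-in for the reference genome
    other = "GGCCCCTT"               # toy stand-in for the second genome
    reads = ["ACGT", "TGCA", "CCCC", "GGGG", "ACGA"]
    rrna = [(12, 16)]                # toy rRNA interval
    genes = [(0, 4), (4, 8), (8, 12)]
    for m in (0, 1):                 # "Exact" vs "1 mismatch"
        print(m, summarize(reads, reference, rrna, genes, max_mismatches=m))
    kept = mask_reads(reads, other, reference, seed=4)
    print("after mask", summarize(kept, reference, rrna, genes))
```

On this toy input, allowing one mismatch raises the total hits from 3 to 4, non-rRNA hits (3) exceed unique loci (2) because two reads land on the same start position, and masking the one sequence shared by both toy genomes (which lies in the rRNA interval) drops the rRNA percentage to 0%. This mirrors, qualitatively, the relationships between the columns above.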