Table 2.
General features of the metagenome datasets
Feature | KB-1 | DonnaII | ANAS |
---|---|---|---|
Type of sequencing | Sanger | 454 | 454 & Sanger |
Total number of bases pre-assembly | 106,515,530 | 930,446,714 | 330,964,688 |
Number of contigs | 6,361 | 47,030 | 10,807 |
Total length of contigs (bp) | 14,988,108 | 24,573,718 | 30,615,713 |
Number of singletons | 18,629 | 105,608 | 15,486 |
Total length of singletons (bp) | 13,487,233 | 57,708,799 | 10,450,264 |
Largest contig (bp) | 155,970 | 121,460 | 921,258 |
Average contig size (bp) | 2,356 | 522 | 2,832 |
Average G + C content (%) | 52.33 | 52.28 | 51.91 |
Protein coding genes | 40,766 | 194,527 | 60,992 |
- with COGs | 21,857 | 116,001 | 39,920 |
- connected to KEGG pathways | 8,077 | 36,685 | 11,878 |
rRNA genes (5S/16S/23S) | 18 (7/5/6) | 185 (11/62/112) | 40 (23/8/9) |
tRNA genes | 330 | 818 | 525 |
CRISPR count | 48 | 7 | 57 |
MG-RAST data | | | |
% Dhc in culture* | 43.7 | 31.3 | 18.2 |
Metagenome size (bp)* | 106,508,248 | 916,191,214 | 330,396,345 |
Average read length* | 958 | 477 | 547 |
Number of sequences* | 111,162 | 1,920,396 | 603,841 |
Number (%) identified for metabolic analysis† | 63,352 (57.0) | 363,424 (18.9) | 222,012 (36.8) |
Number (%) identified for phylogenetic analysis† | 88,888 (80.0) | 540,785 (28.2) | 294,470 (48.8) |
* Values after MG-RAST preprocessing, which removed duplicate and nonsense reads from the datasets.
† Maximum e-value of 1 × 10⁻⁵; minimum alignment length ~100.
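
Two of the derived rows in the table can be checked against the raw counts. The sketch below is a minimal consistency check, not the authors' method: it assumes that average contig size is the total contig length divided by the number of contigs, and that the percentages in the last two rows are taken relative to the post-preprocessing number of sequences. The dictionary layout and the function names are illustrative only.

```python
# Values copied from Table 2. Key names are illustrative, not from the source.
DATASETS = {
    "KB-1": {
        "contigs": 6_361, "contig_bp": 14_988_108, "sequences": 111_162,
        "metabolic": 63_352, "phylogenetic": 88_888,
    },
    "DonnaII": {
        "contigs": 47_030, "contig_bp": 24_573_718, "sequences": 1_920_396,
        "metabolic": 363_424, "phylogenetic": 540_785,
    },
    "ANAS": {
        "contigs": 10_807, "contig_bp": 30_615_713, "sequences": 603_841,
        "metabolic": 222_012, "phylogenetic": 294_470,
    },
}


def average_contig_size(row):
    """Total contig length divided by contig count (assumed definition)."""
    return row["contig_bp"] / row["contigs"]


def percent_identified(row, key):
    """Share of post-preprocessing sequences, rounded to one decimal place."""
    return round(100 * row[key] / row["sequences"], 1)


if __name__ == "__main__":
    for name, row in DATASETS.items():
        print(
            name,
            int(average_contig_size(row)),
            percent_identified(row, "metabolic"),
            percent_identified(row, "phylogenetic"),
        )
```

Under these assumptions the percentages reproduce the tabulated values, and the tabulated average contig sizes equal the integer part of the quotient.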