Table 1.
Assembly | |
---|---|
Raw reads (paired-end) | 43,150,858 |
After clipping and QC | 23,370,860 |
Contigs | 34,438 |
Average length ± SD (bp) | 1,056 ± 1,359 |
Length min-max (bp) | 100-26,403 |
% GC content | 38.88 |
Raw reads mapped to contigs | 13,052,970 (56%) |
ORFs | |
Transcripts with significant BLAST hit (E-value 10e-5) | 13,736 (40%) |
Containing an open reading frame | 20,548 (60%) |
With homologues in: | |
Nematostella vectensis | 12,143 (35%) |
GenBank nr proteins (Cnidaria) | 13,035 (38%) |
Hydra magnipapillata | 11,681 (34%) |
SwissProt | 11,123 (32%) |
UniProt venom and toxins database | 455 (1%) |
Matching CEGMA core eukaryotic proteins | |
% Full length (>90% cover) | 77.02 |
% Partial (<90% cover) | 80.65 |
Interproscan | |
Returning Pfam terms | 10,653 (31%) |
Returning GO terms | 7,208 (21%) |
Total GO terms | 17,203 |
Biological Process | 5,060 |
Cellular Component | 2,745 |
Molecular Function | 9,398 |
Signal sequence and transmembrane domains | |
Predicted proteins with signal sequences | 930 (3%) |
Predicted proteins with ≥ 2 transmembrane domains | 1,332 (4%) |
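
The table does not state which denominators its percentages use. The short check below is a sketch, not part of the original analysis. It assumes that the annotation percentages are fractions of the 34,438 contigs and that the mapping rate is relative to the reads remaining after clipping and QC. Both assumptions reproduce the reported values after rounding. The names `CONTIGS`, `annotation_counts` and `percent` are illustrative only.

```python
# Consistency check for the figures in Table 1.
# Assumption: annotation percentages are relative to the contig count.
CONTIGS = 34_438

# (count, percentage reported in the table)
annotation_counts = {
    "Significant BLAST hit": (13_736, 40),
    "Open reading frame": (20_548, 60),
    "Nematostella vectensis": (12_143, 35),
    "GenBank nr proteins (Cnidaria)": (13_035, 38),
    "Hydra magnipapillata": (11_681, 34),
    "SwissProt": (11_123, 32),
    "UniProt venom and toxins": (455, 1),
    "Pfam terms": (10_653, 31),
    "GO terms": (7_208, 21),
    "Signal sequences": (930, 3),
    ">= 2 transmembrane domains": (1_332, 4),
}


def percent(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


# Rows whose recomputed percentage differs from the reported one.
mismatches = {
    name: (percent(count, CONTIGS), reported)
    for name, (count, reported) in annotation_counts.items()
    if percent(count, CONTIGS) != reported
}

# The three GO categories should add up to the reported total.
go_terms = {
    "Biological Process": 5_060,
    "Cellular Component": 2_745,
    "Molecular Function": 9_398,
}
go_total = sum(go_terms.values())

# Assumption: the 56% mapping rate uses post-QC reads as the denominator.
mapped_pct = percent(13_052_970, 23_370_860)

print("Mismatching rows:", mismatches)
print("GO term total:", go_total)
print("Mapped reads (% of post-QC reads):", mapped_pct)
```

Under these assumptions every percentage in the table is reproduced, and the three GO categories sum to the stated total of 17,203.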