Table 1.
Assembly | Count |
---|---|
Raw reads (paired-end) | 26,991,925 |
After cleaning | 17,319,746 |
Contigs | 30,317 |
Average length ± SD (bp) | 628.70 ± 840.07 |
Length, min to max (bp) | 201 to 31,945 |
GC content | 40.42% |
Raw reads mapped to contigs | 97.69% |
CDS | Count |
Containing a coding region | 23,534 (78%) |
Transcripts with significant BLAST hit (E-value cutoff 1 × 10⁻⁵) | 16,925 (72%) |
With homologues in databases: | |
GenBank non-redundant Cnidarian protein sequences | 15,987 (53%) |
H. vulgaris | 14,261 (47%) |
SwissProt | 13,375 (44%) |
N. vectensis | 12,144 (40%) |
UniProt animal toxin and venom | 549 (2%) |
Sequence analysis | Count |
Returning GO term | 11,586 (49%) |
GO terms returned: | |
Molecular function | 8265 (35%) |
Biological process | 4768 (20%) |
Cellular component | 2173 (9%) |
Predicted proteins with signal sequences * | 1012 (4%) |
Predicted proteins with two or more transmembrane helices | 641 (2%) |
* SignalP on top hit from SwissProt returned 1666 (7%).
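
The percentages in Table 1 do not all appear to share one denominator. As a minimal sketch, the Python below recomputes them from the counts in the table, assuming that some rows are relative to the total number of contigs (30,317) and the others to the contigs containing a coding region (23,534). That grouping is an inference from the arithmetic, not something the table states, and the names `CONTIGS`, `CODING` and `pct` are illustrative only.

```python
# Counts copied from Table 1. Which denominator applies to each row is an
# assumption inferred from the printed percentages, not stated in the table.
CONTIGS = 30_317  # all assembled contigs
CODING = 23_534   # contigs containing a coding region


def pct(count: int, total: int) -> int:
    """Return count as a percentage of total, rounded to a whole number."""
    return round(100 * count / total)


# Rows whose printed percentage is consistent with count / CONTIGS.
of_contigs = {
    "Containing a coding region": 23_534,
    "Cnidarian nr homologues": 15_987,
    "H. vulgaris": 14_261,
    "SwissProt": 13_375,
    "N. vectensis": 12_144,
    "UniProt animal toxin and venom": 549,
    "Two or more transmembrane helices": 641,
}

# Rows whose printed percentage is consistent with count / CODING.
of_coding = {
    "Significant BLAST hit": 16_925,
    "Returning GO term": 11_586,
    "Molecular function": 8_265,
    "Biological process": 4_768,
    "Cellular component": 2_173,
    "Signal sequences": 1_012,
    "SignalP on top SwissProt hit": 1_666,
}

for label, n in of_contigs.items():
    print(f"{label}: {n:,} ({pct(n, CONTIGS)}% of contigs)")
for label, n in of_coding.items():
    print(f"{label}: {n:,} ({pct(n, CODING)}% of coding contigs)")
```

Under these assumptions every recomputed value agrees with the percentage printed in the table, which is why the rows are grouped this way.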