Table 2.
Statistics of the tea plant genome assembly and improved annotation.
Assembly | |
Estimated genome size (Gb) | 3.08 |
Number of scaffolds | 14,051 |
Total length of scaffolds (bp) | 3,141,536,798 |
N50 of scaffolds (bp) | 1,397,810 |
N90 of scaffolds (bp) | 358,724 |
Longest scaffold (bp) | 7,310,916 |
Number of contigs | 94,321 |
Total length of contigs (bp) | 2,893,782,109 |
N50 of contigs (bp) | 67,068 |
N90 of contigs (bp) | 14,057 |
Longest contig (bp) | 538,748 |
Gap sequence (bp) | 247,754,689 |
Predicted coverage of the assembled sequences (%) | 95.07 |
GC content of the genome (%) | 37.84 |
Annotation | |
Number of predicted protein-coding genes | 53,512 |
Average gene length (bp) | 3,747 |
Mean exon length (bp) | 284 |
Average number of exons per gene | 4.5 |
Mean intron length (bp) | 712 |
Annotated to Swiss-Prot | 34,694 (64.83%) |
Annotated to PFAM | 39,889 (74.54%) |
Annotated to TAIR (version 10) | 38,952 (72.79%) |
Annotated to GO | 21,961 (41.04%) |
Annotated to KOG | 14,587 (27.26%) |
tRNAs | 597 |
rRNAs | 2,838 |
snRNAs | 416 |
miRNAs | 355 |
Masked repeat sequence length (bp) | 1,861,774,995 |
Percentage of repeat sequences (%) | 64.42 |
The genome assembly statistics are based on sequences longer than 1 kb. The protein-coding genes were re-predicted using improved ab initio training models and manual filtering. Putative functions of the re-annotated tea plant genes were predicted by aligning them against the Swiss-Prot, InterPro, KEGG and GO databases. The statistics for the genome assembly, noncoding RNAs and repeat content are taken from our previous work (ref. 6).
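For readers who want to reproduce or sanity-check figures of this kind, the following is a minimal Python sketch, not the pipeline used for the assembly. It assumes the usual definition of Nx (the length L such that sequences of length at least L cover at least x% of the total assembled length) and interprets "longer than 1 kb" as strictly greater than 1,000 bp; the function names `nx` and `percent` and the toy length list are hypothetical. The last part only re-derives values already printed in the table.

```python
def nx(lengths, fraction=0.5, min_length=1000):
    """Return Nx for a list of sequence lengths (N50 when fraction=0.5).

    Assumption: only sequences strictly longer than `min_length` bp are
    counted, mirroring the 1 kb cutoff mentioned in the table footnote.
    """
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences pass the length filter")
    threshold = fraction * sum(kept)
    running = 0
    for n in kept:
        running += n
        if running >= threshold:
            return n


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Toy input (hypothetical lengths, not taken from the assembly).
toy_lengths = [5000, 4000, 3000, 2000, 1500, 500]
toy_n50 = nx(toy_lengths, 0.5)
toy_n90 = nx(toy_lengths, 0.9)

# Values copied from the table.
scaffold_total = 3_141_536_798
contig_total = 2_893_782_109
gap_total = 247_754_689
gene_count = 53_512

# Gap sequence equals total scaffold length minus total contig length.
gap_from_totals = scaffold_total - contig_total

# Share of predicted genes with a Swiss-Prot annotation.
swissprot_pct = percent(34_694, gene_count)
```

With the table values, `gap_from_totals` matches the reported gap sequence length, and `percent` reproduces the annotation percentages from the gene counts.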