2019 Jul 15;6:122. doi: 10.1038/s41597-019-0127-1

Table 2.

Statistics of the tea plant genome assembly and improved annotation.

Assembly
Estimated genome size (Gb) 3.08
Number of scaffolds 14,051
Total length of scaffolds (bp) 3,141,536,798
N50 of scaffolds (bp) 1,397,810
N90 of scaffolds (bp) 358,724
Longest scaffold (bp) 7,310,916
Number of contigs 94,321
Total length of contigs (bp) 2,893,782,109
N50 of contigs (bp) 67,068
N90 of contigs (bp) 14,057
Longest contig (bp) 538,748
Gap sequence (bp) 247,754,689
Predicted coverage of the assembled sequences (%) 95.07
GC content of the genome (%) 37.84
Annotation
Number of predicted protein-coding genes 53,512
Average gene length (bp) 3,747
Mean exon length (bp) 284
Average number of exons per gene 4.5
Mean intron length (bp) 712
Annotated to Swiss-Prot 34,694 (64.83%)
Annotated to PFAM 39,889 (74.54%)
Annotated to TAIR (version 10) 38,952 (72.79%)
Annotated to GO 21,961 (41.04%)
Annotated to KOG 14,587 (27.26%)
tRNAs 597
rRNAs 2,838
snRNAs 416
miRNAs 355
Masked repeat sequence length (bp) 1,861,774,995
Percentage of repeat sequences (%) 64.42

The genome assembly statistics are based on sequences longer than 1 kb. The protein-coding genes were re-predicted using improved ab initio training models and manual filtering. Putative functions of the re-annotated tea plant genes were predicted by aligning them against the Swiss-Prot, InterPro, KEGG and GO databases. The statistics for the genome assembly, noncoding RNAs and repeat content were summarized from our previous work (ref. 6).
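As a reading aid for the table, the sketch below shows how an N50 or N90 value is conventionally computed from a list of sequence lengths, and checks two arithmetic relationships among the reported figures: the gap length as the difference between total scaffold and total contig length, and the annotation percentages as fractions of the 53,512 predicted genes. The function names `n_stat` and `annotation_pct` and the toy length list are illustrative assumptions, not part of the original analysis pipeline, and the conventional Nx definition used here is assumed rather than confirmed by the source.

```python
def n_stat(lengths, fraction=0.5):
    """Return the Nx statistic (N50 for fraction=0.5, N90 for fraction=0.9).

    Conventional definition (assumed here): the length L such that sequences
    of length >= L together cover at least `fraction` of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example with made-up lengths (not tea plant data).
toy_lengths = [2, 3, 4, 5, 6, 10]
toy_n50 = n_stat(toy_lengths, 0.5)
toy_n90 = n_stat(toy_lengths, 0.9)

# Values copied from the table.
SCAFFOLD_TOTAL_BP = 3_141_536_798
CONTIG_TOTAL_BP = 2_893_782_109
GAP_BP = 247_754_689
TOTAL_GENES = 53_512

# Gap sequence equals scaffold length minus contig length.
computed_gap = SCAFFOLD_TOTAL_BP - CONTIG_TOTAL_BP


def annotation_pct(annotated_genes, total_genes=TOTAL_GENES):
    """Percentage of predicted genes with a hit in a given database."""
    return round(100 * annotated_genes / total_genes, 2)


if __name__ == "__main__":
    print("toy N50:", toy_n50)
    print("toy N90:", toy_n90)
    print("gap matches table:", computed_gap == GAP_BP)
    for name, count in [("Swiss-Prot", 34_694), ("PFAM", 39_889),
                        ("TAIR10", 38_952), ("GO", 21_961), ("KOG", 14_587)]:
        print(name, annotation_pct(count))
```

Applying `n_stat` to the real scaffold or contig lengths would require the assembly FASTA itself, which is not part of this table; the sketch only illustrates the definition and the internal consistency of the reported totals.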