A high-quality draft genome for Melaleuca alternifolia (tea tree): a new platform for evolutionary genomics of myrtaceous terpene-rich species

2021 Aug 9;2021:gigabyte28. doi: 10.46471/gigabyte.28

Reviewer name and names of any other individuals who aided in the review	Yue Zhang
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers? (If no, please inform the editor that you cannot review this manuscript.)	Yes
Is the language of sufficient quality?	Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper?	Yes
Additional Comments
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples: http://gigadb.org/site/guide	Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound?	Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction?	Yes
Additional Comments
Is there sufficient data validation and statistical analyses of data quality?	Yes
Additional Comments
Is the validation suitable for this type of data?	Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data?	Yes
Additional Comments
Any Additional Overall Comments to the Author	Comments to DRR-202104-02. This manuscript reports an updated genome database for M. alternifolia, used mainly to investigate genome evolution in Myrtaceae. I think this database will be useful for the community. Nevertheless, I have some suggestions that may help to improve the database and the manuscript. My first concern is that the three assemblies were produced with different methods and different datasets. The Canu and Flye assemblies used only the SMRT dataset, whereas the MaSuRCA assembly used a larger dataset that combined the previous Illumina sequencing reads with the SMRT reads. The results showed that the MaSuRCA assembly was better than the other two. With datasets of different sizes, we cannot compare the performance of the genome assembly algorithms: a larger dataset means greater sequencing depth, which can reasonably be expected to give a better assembly. If the paper intends to compare the algorithms, the same dataset should be used for each. Using the Fgenesh++ pipeline, a total of 37,226 gene models with a complete open reading frame (ORF) were predicted in the reference genome. Genes without an ORF or overlapping transposable elements were filtered out; therefore, the number of filtered genes in the main Table 3 makes little sense. In addition, I suggest that the authors map the gene sequences to different databases, such as Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), to complete the functional annotation of the genes. Only the M. alternifolia CDS were mapped to the E. grandis genome to identify changes in gene order. I would like to know how the authors dealt with small scaffolds that might contain only a single gene or a few genes. Also, what are the biological functions of the genes located in the inversions or translocations? The introduction cites a previous study suggesting that the Melaleuceae and Eucalypteae have a diversity of unique mono- and sesquiterpenoid compounds in their leaf oils. Whether the changes in gene order resulted from genome structural variation, and whether these genes, especially those in the TPS gene family, contributed to the diversity of sesquiterpenoid compounds, needs further analysis and discussion.
Small issues: Line 151. Genome coverage is defined differently from sequencing depth; please confirm whether both genome coverage and sequencing depth are around 55x. Line 181. Which colour of dots indicates the contaminated scaffolds of bacterial, viral, or fungal origin? There is no explanation in the figure legend or the manuscript. Line 275. “suggest” should be “suggests”.
Recommendation	Major Revision