| Reviewer name and names of any other individuals who aided in the review | Reuben W Nowell |
| Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers? (If no, please inform the editor that you cannot review this manuscript.) | Yes |
| Is the language of sufficient quality? | Yes |
| Please add additional comments on language quality to clarify if needed | |
| Are all data available and do they match the descriptions in the paper? | No |
| Additional Comments | I wasn't able to access the data with the FTP link provided. |
| Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples: http://gigadb.org/site/guide | Yes |
| Additional Comments | |
| Is the data acquisition clear, complete and methodologically sound? | Yes |
| Additional Comments | |
| Is there sufficient detail in the methods and data-processing steps to allow reproduction? | Yes |
| Additional Comments | |
| Is there sufficient data validation and statistical analyses of data quality? | Yes |
| Additional Comments | |
| Is the validation suitable for this type of data? | Yes |
| Additional Comments | |
| Is there sufficient information for others to reuse this dataset or integrate it with other data? | Yes |
| Additional Comments | |
| Any Additional Overall Comments to the Author | A very nice piece of work; I have only a few minor comments: - Line 140: "with the k-mer length set to 1" - do you mean 21? - Line 164: great that you provide a link to the GenomeScope html, but I recommend adding these k-mer plots as additional supplemental figures, as they are extremely useful. Just a screenshot of the GenomeScope plot would be fine. - Line 164: in relation to the k-mer distributions, in fact both plots look a little bit multimodal to me, especially the Eubasilissa, with peaks at 1n (20x), 2n (40x) and 4n (~80x) coverage. This might indicate tetraploidy, which might explain the large increase in genome span and gene number for this species too. You could run OrthoFinder and look at the distribution of orthogroup (OG) membership size; for diploid assemblies it peaks at 2, but you might find peaks at 2 and 4 for Eubasilissa if it is tetraploid. - Line 167: how many contaminant contigs were identified, and where did they come from? - Line 168: the coverage for both species is roughly the same, but the species with the much larger genome is the more contiguous one - any ideas why this is the case? - Line 184: maybe this is a silly question, but how do you know they are full-length? Based on the B. mori BAC sequence? - Line 192: a unit for molecular weight, Da? - Line 224: would be useful to know how many genes are in the Insecta core BUSCO db (i.e., where the 95% comes from). - Line 233: is there a possibility that RepeatModeler has also classified the repeat-rich fibroin genes as 'repeats', and so these are masked in the assemblies? - Line 243: this is a huge difference in gene number! Why? Is the E. regina assembly actually a diploid assembly? Or ploidy > 2? [See above comment on k-mer plots]. - Line 265: "insects have generally been neglected with respect to genome sequencing efforts" - quite a bold statement and I'm not sure I agree; there has been a huge focus on lepidopteran genomics, and much of the early sequencing from initiatives such as Darwin Tree of Life has been on insects (also i5k). - Line 457: Table 2: any idea why the P. interpunctella HiFi assembly is ~60 Mb shorter than the two Illumina assemblies? - Line 475: Figures 2 and 3: these are nice figures but I don't quite follow what the two coloured panels on the left are showing; specifically, why are there two panels? A bit more clarification in the legend is needed, perhaps. - Line 476: N and C capitalised |
| Recommendation | Minor Revision |
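
As a rough illustration of the orthogroup-size check suggested in the Line 164 comment, the sketch below tabulates how many orthogroups fall into each membership size. It assumes a tab-separated gene-count table in the style of OrthoFinder's `Orthogroups.GeneCount.tsv` (an `Orthogroup` column, one column per species, and a `Total` column); the function name, the species column names and the toy numbers are hypothetical and are not taken from the manuscript.

```python
import csv
import io
from collections import Counter


def orthogroup_size_histogram(gene_count_tsv: str, column: str = "Total") -> Counter:
    """Count how many orthogroups have each membership size in `column`.

    `gene_count_tsv` is the text of a tab-separated table with one row per
    orthogroup (assumed layout: Orthogroup, <species columns>, Total).
    Orthogroups with zero genes in the chosen column are skipped.
    """
    reader = csv.DictReader(io.StringIO(gene_count_tsv), delimiter="\t")
    sizes = Counter()
    for row in reader:
        n = int(row[column])
        if n > 0:
            sizes[n] += 1
    return sizes


# Toy input with made-up numbers (NOT data from the manuscript).
EXAMPLE_TSV = (
    "Orthogroup\tSpeciesA\tSpeciesB\tTotal\n"
    "OG0000000\t1\t1\t2\n"
    "OG0000001\t1\t3\t4\n"
    "OG0000002\t1\t1\t2\n"
    "OG0000003\t0\t2\t2\n"
)

if __name__ == "__main__":
    hist = orthogroup_size_histogram(EXAMPLE_TSV)
    for size in sorted(hist):
        print(f"size {size}: {hist[size]} orthogroups")
```

Passing a species column name instead of `"Total"` gives the per-species copy-number distribution, which may be the more direct way to look for an excess of duplicated genes in one assembly.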