2022 Jun 30;2022:gigabyte64. doi: 10.46471/gigabyte.64
Reviewer name and names of any other individuals who aided in the review Martin Pippel
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper? Yes
Additional Comments
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound? Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Additional Comments
Is there sufficient data validation and statistical analyses of data quality? Yes
Additional Comments
Is the validation suitable for this type of data? Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
Additional Comments (partly): To make the study fully reproducible, the authors need to upload the PacBio HiFi data (e.g. to NCBI). Otherwise the genome assemblies cannot be reproduced from the raw data available in GenBank.
Any Additional Overall Comments to the Author The manuscript entitled “Long-read HiFi sequencing correctly assembles repetitive heavy fibroin silk genes in new moth and caddisfly genomes” by Kawahara et al. describes the de novo assembly and gene annotation of two silk-producing insect species, Plodia interpunctella and Eubasilissa regina. The manuscript is well structured and well written. Sequencing data, assemblies and genome annotations are publicly available and can be reused by the scientific community. Both contig assemblies show very high contiguity and good BUSCO scores. Indeed, several of the 118 P. interpunctella and 53 E. regina contigs show telomere repeat sequence at both ends, indicating that they represent full chromosomes. Furthermore, the authors showed that even long repetitive genes such as the silk fibroin genes were assembled without gaps. I consider the manuscript a valuable contribution to the scientific community and have only some minor comments and suggestions:
line 129: Which CCS version was used?
line 140: k-mer length was set to 1? Not 21?
line 148:
- Typo: obd10 reference endopterygota.
- To make the BUSCO scores more comparable to other recent Lepidoptera assemblies, it would be better to provide the BUSCO scores for P. interpunctella based on the lepidoptera lineage.
line 158: CCS data should be added to GenBank as well. Usually the raw data (subreads.bam) is lossily converted into fastq files at NCBI, which makes it impossible to reproduce the consensus step with pbCCS or even the assembly.
line 159: Both read coverages are quite high, and the heterozygosity rates of 0.7 (Eubasilissa) and 0.36 (Plodia) are high as well. I was wondering whether the alternate assemblies are also of decent quality and whether they are published as well.
line 265: As of today, there are at least 3 other HiFi assemblies available (GCA_917563855.2, GCA_929108145.1, GCA_917880885.1).
line 457: Table 2 states that E. regina was assembled into 53 contigs. However, the assembly available at NCBI, GCA_022840565.1, has 123 contigs!?
Recommendation Minor Revision