Comparison of different sequencing strategies for assembling chromosome-level genomes of extremophiles with variable GC content

Zhidong Zhang; Guilin Liu; Yao Chen; Weizhen Xue; Qianyue Ji; Qiwu Xu; He Zhang; Guangyi Fan; He Huang; Ling Jiang; Jianwei Chen

doi:10.1016/j.isci.2021.102219

2021 Feb 20;24(3):102219. doi: 10.1016/j.isci.2021.102219


PMCID: PMC7961107 PMID: 33748707

Summary

In this study, six bacterial isolates with variable GC content, including Escherichia coli as a mesophilic reference strain, were selected to compare hybrid assembly strategies based on next-generation sequencing (NGS) of short reads, single-tube long-fragment read (stLFR) sequencing, and Oxford Nanopore Technologies (ONT) sequencing platforms. We obtained the complete genomes using the hybrid assembler Unicycler based on the NGS and ONT reads; the other genomes were assembled de novo from NGS, stLFR, and ONT reads using different strategies. The contiguity, accuracy, completeness, sequencing costs, and DNA material requirements of the investigated strategies were compared systematically. Although all sequencing data could be assembled into accurate whole-genome sequences, the stLFR sequencing data yielded scaffolds with greater contiguity and more complete gene function than the NGS assemblies. Our research provides a low-cost chromosome-level genome assembly strategy for large-scale sequencing of extremophile genomes with different GC contents.

Subject areas: Microbiology, Microbial Genomics, Omics

Highlights

• Assembling and evaluating bacterial chromosome-level genomes by multiple strategies
• The stLFR data can assemble chromosome sequences of different GC extremophiles

Introduction

Extremophiles are classified by the severe environments to which they have adapted and include thermophiles, psychrophiles, alkalophiles, acidophiles, barophiles, and radiation-resistant organisms (Mao et al., 2017; Orellana-Saez et al., 2019; Swarup et al., 2014; Urbieta et al., 2015). These microbes thrive in ecological niches such as deep-sea hydrothermal vents, hot springs, geysers, salt flats, deserts, natural lakes, and sulfuric fields (Brito et al., 2006; DeLong, 2000; Kang et al., 2018; Palmieri et al., 2019; Ziko et al., 2019). Half a century ago, extremophiles received little attention, but they have recently been increasingly explored as sources of basic data as well as useful enzymes for molecular biology and the biotech industry (Merino et al., 2019; Rothschild and Mancinelli, 2001). For example, biocatalysts cloned from extremophiles have had a great impact on the global biotechnological market (Brininger et al., 2018; Mokashe et al., 2018; Schiraldi and De Rosa, 2002; Wang et al., 2019a). The enzymes with the widest applications include Taq DNA polymerase (Chien et al., 1976), heat-tolerant cellulase (Adamiak et al., 2015), and alkali-resistant β-D-galactosidase (Wang et al., 2011). Additionally, various CRISPR loci belonging to different CRISPR-Cas systems have been identified in the genomes of extremophiles in recent years, providing a valuable resource for mining efficient gene editing solutions (Makarova et al., 2015). However, the exploitation and utilization of extremophiles remains challenging because the strains are demanding to isolate and purify and their functional genes are difficult to mine. Moreover, there is still a lack of technology for mining biological information from extremophiles efficiently, at low cost, and on a large scale.

Owing to advances in high-throughput sequencing technology, a large amount of gene sequence data can be acquired in a relatively short time (Metzker, 2010; Niedringhaus et al., 2011). Next-generation sequencing (NGS) technologies, such as Illumina, MGI (Shenzhen), and Ion Proton, have enabled widespread bacterial whole-genome sequencing, producing millions of paired-end reads with a low error rate (0.1%). However, the short reads of 100–300 bp make it challenging to fully reconstruct genomic structures of interest (De Maio et al., 2019). Hybrid assembly based on third-generation sequencing technologies, such as the Oxford Nanopore Technologies (ONT) and SMRT Pacific Biosciences (PacBio) sequencing platforms, combined with NGS short-read sequencing can be used to assemble the complete chromosome and recover plasmid genomes. However, these sequencing strategies require library construction and sequencing on two different platforms, need larger amounts of high-quality DNA than NGS sequencing, and are much more costly. When starting a large-scale bacterial whole-genome sequencing (WGS) project, it is a challenge to choose the most cost-effective sequencing strategy and still obtain high-quality genome sequences. Therefore, a new sequencing approach for extremophiles that combines low cost and high accuracy with enhanced efficiency is highly desirable but remains challenging.

In the past decades, numerous methods have been developed to capture long-range information with short-read sequencing, including mate-pair (Korbel et al., 2007; Rubin et al., 2007), clonal barcoding methods (e.g., synthetic long reads [Bankevich and Pevzner, 2016; Peters et al., 2012; Voskoboynik et al., 2013] and linked reads [Wang et al., 2019b; Zhang et al., 2017; Zheng et al., 2016]), and Hi-C (Burton et al., 2013). Among these, clonal barcoding library technologies (Bankevich and Pevzner, 2016; Peters et al., 2012; Voskoboynik et al., 2013; Wang et al., 2019b; Zhang et al., 2017; Zheng et al., 2016) showed the most promising results in terms of bringing routine long-read capability to second-generation platforms. For example, single-tube long-fragment read (stLFR) technology is a novel WGS library preparation approach that enables efficient WGS, haplotyping, and contig scaffolding by adding a single unique clonal barcode sequence to sub-fragments of the original DNA in a single-tube process (Wang et al., 2019b). The use of microbeads as miniaturized virtual compartments allows a practically unlimited number of clonal barcodes to be used per sample at a negligible cost. The stLFR method enables short-read NGS systems to generate highly accurate and economical long-read sequencing information for de novo genome assembly (Chen et al., 2020).

In the present study, we describe for the first time the implementation of stLFR technology for the accurate sequencing of complex extremophile genomes. Different strategies for hybrid bacterial genome assembly were selected and compared, including Illumina, ONT, and PacBio data generated from the same DNA extracts. We selected five radiation-resistant extremophiles isolated from the Xinjiang Uygur Autonomous Region of China (Bacillus cereus 43-1A, Brevibacterium frigoritolerans 44A, Rufibacter sp. LB8, Deinococcus wulumuqiensis R12, Janibacter melonis M714) as well as Escherichia coli K-12 as the reference strain. The GC content of the investigated genomes varied from 30% to 70%. Moreover, extremophilic microbes usually have large genomes of 4.3–6.5 Mb as well as varying numbers of plasmids (Carattoli, 2009). The objective of this work was to evaluate and optimize the accuracy of stLFR technology when sequencing the genomes of extremophiles with different GC content, and the analytical results were compared with those of both NGS and third-generation sequencing. These findings pave the way for rapid, cheap, and accurate generation of completely resolved extremophile genomes.

Results

High GC bacterial stLFR sequencing and assembly

To determine the optimal conditions for the construction of stLFR libraries for bacteria with high genomic GC content, we used five different concentrations of the interrupting enzyme, ranging from 0.4 to 1.2 pmol/10 ng DNA, to construct the libraries of D. wulumuqiensis R12. We generated 2 Gb of raw sequencing data for each concentration (Table S1). After filtering, we found that the number of clean reads and the barcode frequency distribution differed among the five enzyme concentrations. When the enzyme concentration was low, fewer transposon insertions resulted in less fragmentation of the DNA, so that many co-barcoded reads with lower barcode frequency probably had larger insert sizes and contributed to genome assembly and scaffolding (Figure 1A). The clean reads from each of the five concentrations were assembled into draft genomes using Supernova. We found that the estimated molecule lengths and the assembled genome sizes of libraries constructed with different enzyme concentrations were similar, but the scaffold N50 values differed markedly (Table S1, Figure 1B). The scaffold N50 of the assembly from the 0.4 pmol/10 ng DNA library was 2,905 kb, which accounted for 80% of the genome length and was far higher than the scaffold N50 values of about 156–402 kb obtained at the other concentrations. These results indicated that the lowest enzyme concentration, 0.4 pmol/10 ng DNA, gives the most contiguous assembly.

Figure 1. stLFR sequencing of *D. wulumuqiensis* R12 at five different enzyme concentrations (pmol/10 ng DNA)

(A) The barcode frequency distribution of five conditions.

(B) The Supernova-assembled statistics of five conditions.
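
A barcode frequency distribution like the one in panel (A) can be tabulated by counting how many reads share each barcode and then counting how many barcodes occur at each read count. The following is a minimal sketch, not the authors' pipeline. It assumes, hypothetically, that the barcode has already been appended to each read name after a `#` and that a placeholder barcode `0_0_0` marks reads whose barcode could not be decoded; both conventions are assumptions and are not stated in this text.

```python
from collections import Counter

def barcode_frequency(read_names, invalid="0_0_0"):
    """Return {reads_per_barcode: number_of_barcodes} for an iterable of read names.

    Assumption (not from the paper): names look like 'read1#12_34_56/1',
    i.e. the barcode sits between '#' and an optional '/1' or '/2' suffix.
    """
    reads_per_barcode = Counter()
    for name in read_names:
        if "#" not in name:
            continue  # no barcode field at all
        barcode = name.split("#", 1)[1].split("/", 1)[0]
        if barcode == invalid:
            continue  # undecodable barcode, carries no co-barcode information
        reads_per_barcode[barcode] += 1
    # Second-level count: how many barcodes were seen exactly n times.
    return dict(Counter(reads_per_barcode.values()))

# Toy read names (hypothetical): one barcode seen twice, two seen once, one invalid.
names = ["r1#1_2_3/1", "r2#1_2_3/1", "r3#4_5_6/1", "r4#0_0_0/1", "r5#7_8_9/1"]
distribution = barcode_frequency(names)
print(distribution)
```

For paired-end data, passing only the forward read names avoids counting each fragment twice.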

Draft genome assembly using NGS short reads

We used more than 100X NGS clean data for each sample to assemble the draft bacterial genomes using SPAdes (Table S2). The assembled genome sizes of the 6 bacteria ranged from 3.39 Mb (D. wulumuqiensis R12) to 5.52 Mb (B. frigoritolerans 44A), with scaffold N50 values ranging from 34.67 kb (D. wulumuqiensis R12) to 973 kb (J. melonis M714) (Table 1, Figure 2). The CheckM genome quality evaluation showed that the completeness of all genomes was higher than 97% and contamination was lower than 2.5%, reflecting the high quality of each draft genome (Table 1, Figure 2). Accordingly, the estimated genome sizes ranged from 3.18 to 6.15 Mb according to the 17 bp k-mer frequency distribution (Figure S2). Thus the assembled genomes were close to the estimated sizes.
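
One common way to estimate genome size from a k-mer frequency distribution is to divide the total number of k-mers by the depth at the main coverage peak. The sketch below illustrates that general idea only; the tool and parameters the authors used are not given in this text, and the function name, the `min_depth` cutoff for discarding likely error k-mers, and the toy histogram are all illustrative assumptions.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing-error k-mers
    (an illustrative cutoff, not a value from the paper).
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)            # main coverage peak
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth

# Toy histogram: an error peak at depth 1-2 and a coverage peak at depth 50.
toy_histogram = {1: 1000, 2: 50, 48: 100, 49: 300, 50: 1000, 51: 300, 52: 100}
print(estimate_genome_size(toy_histogram))  # 1800.0
```

With real data the histogram would come from a k-mer counter run with k = 17, and the estimate is only as good as the separation between the error peak and the coverage peak.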

Table 1.

Assembly statistics of the genomes produced by the three sequencing strategies

Sample	Method	Scaf (#)	Length (bp)	Gap (bp)	N50 (bp)	N90 (bp)	GC%	ContigN50 (bp)	Mapped rate %	Single base %	Structure %	Completeness %	Contamination %	Gene (#)	Gene AvgL	Gene CheckM	16SrRNA	Repeat L (bp)
LB8	NGS	28	4,730,895	100	356,460	127,204	50.28	306,200	92.26	99.99	99.66	99.97	1.04	4,056	985.10	99.88	1	10,640
LB8	stLFR	2	4,746,533	1,025	4,591,406	4,591,406	50.30	2,795,707	92.26	99.98	99.94	99.97	1.04	4,089	981.57	99.80	3	9,909
LB8	stLFR + ONT	2	4,746,090	0	4,590,963	4,590,963	50.30	4,590,963	92.27	100.00	99.92	99.97	1.04	4,086	982.57	99.80	3	9,909
LB8	Unicycler	2	4,746,099	0	4,590,972	4,590,972	50.30	4,590,972	92.27	100.00	99.94	99.97	1.04	4,087	982.35	99.94	3	9,916
LB8	Canu	2	4,875,130	0	4,654,191	4,654,191	50.25	4,654,191	92.25	99.75	99.68	99.97	3.72	4,299	956.03	99.90	3	10,481
M714	NGS	33	3,483,103	300	973,846	862,807	72.89	973,846	97.8	99.77	99.62	99.82	0.18	3,353	955.71	99.28	1	25,845
M714	stLFR	2	3,480,703	10	3,426,494	3,426,494	72.90	1,978,253	97.74	100.00	99.81	99.82	0	3,360	956.86	99.28	2	25,499
M714	stLFR + ONT	2	3,478,886	0	3,426,533	3,426,533	72.90	3,426,533	97.82	100.00	99.81	99.82	0	3,358	957.18	99.28	2	25,374
M714	Unicycler	2	3,481,073	0	3,426,637	3,426,637	72.99	3,426,637	97.85	100.00	99.85	99.82	0	3,359	956.54	99.28	2	25,531
M714	Canu	1	3,357,952	0	3,357,952	3,357,952	72.99	3,357,952	83.8	97.90	91.61	85.99	0	3,321	891.49	85.99	2	28,332
R12	NGS	207	3,392,156	149	34,666	9,895	66.19	33,013	96.48	99.83	95.29	97.88	0.85	3,218	904.53	96.61	1	24,824
R12	stLFR	5	3,577,039	2,430	2,869,672	286,860	66.01	118,540	97.41	99.35	96.52	98.73	2.75	3,430	903.33	98.73	3	26,062
R12	stLFR + ONT	5	3,610,754	0	2,904,890	286,860	65.98	2,904,890	98.26	99.42	97.20	99.58	2.75	3,474	902.14	99.15	3	24,804
R12	Unicycler	5	3,505,947	0	2,857,585	323,544	66.05	2,857,585	98.41	99.89	98.06	99.58	0.21	3,335	913.47	99.15	3	26,574
R12	Canu	4	3,577,988	0	2,874,385	2,874,385	65.90	3,016,002	95.03	99.62	96.50	97.14	0.21	3,701	819.63	97.35	3	34,079
K-12	NGS	861	4,998,809	350	132,349	5,037	49.88	132,349	97.33	91.85	98.65	100.00	2.25	4,520	900.24	99.97	2	26,612
K-12	stLFR	6	4,578,448	1,040	4,561,935	4,561,935	50.77	281,469	97.35	99.97	99.72	99.97	0.04	4,368	928.29	99.97	7	9,426
K-12	Ref	1	4,502,758	0	4,502,758	4,502,758	50.78	4,502,758	96.04	99.98	99.69	99.37	0.04	4,276	932.16	99.37	7	20,057
44A	NGS	98	5,515,358	210	601,314	110,298	40.53	539,757	98.13	99.49	99.47	98.63	1.84	5,410	827.50	98.63	1	46,210
44A	stLFR	8	5,574,407	91,277	4,239,514	264,146	40.43	264,146	97.85	98.36	99.39	98.63	1.39	5,417	828.10	98.63	3	32,708
43-1A	NGS	62	5,442,287	100	429,469	80,246	35.26	372,952	97.03	97.38	99.64	98.61	0.35	5,600	824.33	98.61	1	42,164
43-1A	stLFR	13	5,577,895	146,220	4,637,631	426,869	35.26	282,204	96.77	99.96	99.54	98.61	0.33	5,621	823.66	98.61	3	24,531

Figure 2. Assembly statistics of the six bacterial strains for the three sequencing strategies

The total assembled genome length (bottom), N50 length (middle), and maximum scaffold length (top) obtained with the different algorithms are shown. The assemblies of each sample from left to right are the NGS draft genome, stLFR chromosome scaffolds, stLFR + ONT complete genome, ONT complete genome assembled by Canu, and hybrid complete genome assembled by Unicycler using ONT reads and NGS reads.
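
The N50 and N90 values reported throughout follow the standard definition: sort the scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches 50% (or 90%) of the assembly. A minimal sketch of that definition, with a hypothetical function name and toy lengths:

```python
def nx(lengths, percent=50):
    """Length L such that scaffolds of length >= L cover at least
    `percent` percent of the total assembly length (N50 by default)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length

toy_lengths = [50, 30, 10, 5, 5]      # toy scaffold lengths, total 100
print(nx(toy_lengths, 50))            # 50
print(nx(toy_lengths, 90))            # 10
print(max(toy_lengths))               # longest scaffold: 50
```

When the N50 equals the longest scaffold, as in the toy example, a single scaffold holds at least half of the assembly.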

Complete genome assembly

For ONT sequencing, we generated 2.22, 1.32, and 2.91 Gb of data, with read N50 values of 20.07, 36.49, and 29.60 kb and a median read quality of 11, for strains Rufibacter sp. LB8, D. wulumuqiensis R12, and J. melonis M714, respectively (Table S4). We used all ONT reads longer than 8 kb, together with the NGS reads, to assemble the bacterial chromosomes using Unicycler. Additionally, the ONT reads were directly assembled into complete genomes using Canu. All the genomes were polished with NGS short reads using Pilon and GATK to fix base errors. For the three ONT Unicycler assemblies, the final corrected genomes had circular chromosome sequences and lengths close to the k-mer estimated genome sizes. Additionally, some had circular plasmid sequences. All the genomes had high accuracy, with genomic completeness >99%, single-base accuracy rates >99.8%, and structural accuracy rates >99.7%. Thus the results were of sufficiently high quality to be regarded as reference genomes of these strains. However, for the high-GC-content genomes, the Canu assemblies had lower genomic completeness than the Unicycler assemblies and also displayed low structural accuracy (91.61% for M714 and 95.89% for R12), which was lower than that of the other genome assemblies, including the NGS assemblies (Table 1). This indicated that the quality of genomes assembled with Canu from ONT reads alone, which have a high error rate, was lower than that of the hybrid genome assemblies obtained using Unicycler.
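
The length filter applied to the ONT reads (keeping reads longer than 8 kb) can be sketched as below. This is an illustration of the filtering step only, not the authors' tooling; it assumes a plain, uncompressed FASTQ stream with exactly four lines per record, and the function name and toy records are hypothetical.

```python
import io

def filter_long_reads(fastq, min_length=8000):
    """Yield (header, sequence, quality) for reads longer than min_length.

    Assumes a plain four-line-per-record FASTQ stream; wrapped or
    compressed input would need a dedicated parser.
    """
    while True:
        header = fastq.readline().rstrip("\n")
        if not header:
            break  # end of stream
        sequence = fastq.readline().rstrip("\n")
        fastq.readline()  # '+' separator line, ignored
        quality = fastq.readline().rstrip("\n")
        if len(sequence) > min_length:
            yield header, sequence, quality

# Toy input: one 4-base read and one 9,000-base read.
toy_fastq = io.StringIO(
    "@short\nACGT\n+\nIIII\n"
    "@long\n" + "A" * 9000 + "\n+\n" + "I" * 9000 + "\n"
)
kept = [header for header, _, _ in filter_long_reads(toy_fastq)]
print(kept)  # ['@long']
```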

Chromosome-level genome assembly using stLFR

We obtained more than 2.5 Gb of clean stLFR data from the six samples (Table S2). To find the best assembly method for bacterial stLFR data, we used SPAdes, cloudSPAdes, Athena, Architect, and Supernova to assemble all clean reads for each genome, and SLR-scaffolder was used to link the scaffolds. Owing to the occurrence of large gaps during scaffolding, the Supernova-assembled genomes of the high-GC strains D. wulumuqiensis R12 and J. melonis M714 were larger than those produced by the other assembly methods. For the other strains, however, the different assembly methods gave the same genome sizes (Figure S1). We found that the scaffold N50 of the Supernova assembly was higher than that of the other algorithms, and that the scaffold N50 length matched the longest scaffold length and accounted for more than 95% of the total assembled genome length (Figure S1), which indicated a chromosome-level scaffold. Among barcoding-based synthetic long-read assembly algorithms, Supernova gave the best assembly results, followed by Athena and cloudSPAdes. We used the best chromosome-level scaffold assemblies for subsequent comparative analysis (Table 1).
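
The chromosome-level criterion described above (the N50 scaffold is the longest scaffold and spans more than 95% of the assembly) reduces to a simple check on scaffold lengths. The sketch below is a hypothetical helper, not part of any assembler. The example uses the LB8 stLFR row of Table 1, where the second scaffold length is derived here as the reported total length minus the reported N50, since that assembly has two scaffolds.

```python
def longest_scaffold_fraction(scaffold_lengths):
    """Fraction of the total assembly length held by the longest scaffold."""
    return max(scaffold_lengths) / sum(scaffold_lengths)

def is_chromosome_level(scaffold_lengths, min_fraction=0.95):
    """True if the longest scaffold spans more than min_fraction of the assembly.

    A scaffold holding more than half of the assembly is necessarily the
    N50 scaffold, so for min_fraction >= 0.5 this also implies that the
    N50 equals the longest scaffold length.
    """
    return longest_scaffold_fraction(scaffold_lengths) > min_fraction

# LB8 stLFR assembly from Table 1: total 4,746,533 bp, N50 4,591,406 bp, 2 scaffolds.
lb8_stlfr = [4_591_406, 4_746_533 - 4_591_406]
print(round(longest_scaffold_fraction(lb8_stlfr), 4))  # 0.9673
print(is_chromosome_level(lb8_stlfr))                  # True
print(is_chromosome_level([40, 30, 30]))               # False (toy fragmented assembly)
```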

Furthermore, in the chromosome-level scaffolds of Rufibacter sp. LB8, D. wulumuqiensis R12, and J. melonis M714, ONT reads were used to close the gaps and obtain the complete genomes. We generated the complete chromosome and plasmid scaffolds for the three samples. The genomes with closed gaps had higher accuracy than the chromosome-level scaffolds, reaching similar genome sizes, completeness, and accuracy as those of the Unicycler assemblies (Table 1).

Comparison of the genome assemblies

We compared the assembly results from the three sequencing strategies using various methods. First, the draft genomes were assembled from NGS data using SPAdes (NGS genomes). Second, the chromosome-level genomes were assembled from stLFR data using Supernova (or Athena) and SLR-scaffolder (stLFR genomes). Third, the hybrid complete genomes were obtained by closing the gaps in the stLFR genomes with TGS-Gapfiller using the ONT reads (stLFR + ONT genomes). Finally, the complete reference genomes were assembled based on the Nanopore and NGS data using Unicycler, except for the downloaded NCBI reference genome CP011124.1, which was assembled from PacBio long sequencing reads and served as the representative complete genome of E. coli K-12 (Table 1) (Tharek et al., 2017). For comparison, we also assembled the complete genomes in Canu, using only ONT reads. For all assembly results, the genome lengths were consistent with the k-mer estimated results (Figure 2, Table 1), and the completeness assessments of the genomes by CheckM were also very close (∼99%), with high mapped rates of NGS reads (∼97%), except for the M714 Canu-assembled genome (Table 1). This indicated that the assembly results of the long-read sequencing strategies (including stLFR, Nanopore, and PacBio) were reasonable and accurate.

In addition, we compared the structural accuracy of the results of the three sequencing strategies for strains E. coli K-12, Rufibacter sp. LB8, D. wulumuqiensis R12, and J. melonis M714. There was no significant difference in the single-base accuracy rate or structural accuracy rate among the stLFR genomes, stLFR + ONT genomes, and third-generation sequencing Unicycler-assembled genomes, whereas these rates were higher than those of the NGS and Canu-assembled genomes (Table 1). The circular synteny analysis of stLFR genomes, stLFR + ONT genomes, Canu assembly genomes, and Unicycler assembly genomes did not reveal any abnormal structural errors or large fragment indels. Moreover, the gene distribution was even and without bias, which also showed that the assembled structures of the genomes were consistent (Figure 3). For the strains B. cereus 43-1A and B. frigoritolerans 44A, the synteny analysis also showed that the NGS assembly could completely match the stLFR-assembled genomes (Figure S5). Furthermore, the NGS-, stLFR-, stLFR + ONT-, and Canu-assembled genomes were compared with the complete Unicycler-assembled genomes using QUAST to show the sequence alignment and to estimate the base accuracy, genome fraction, mismatches, and indels per 100 kb. The stLFR assemblies and stLFR + ONT assemblies showed higher consistency with the Unicycler assemblies and fewer misassembled blocks than the NGS assemblies (Figure 4A). The Canu assemblies also had high consistency but contained more misassembled blocks than the stLFR assemblies and stLFR + ONT assemblies for the high-GC-content strains R12 and M714 (Figure 4A). In all the strains, the genome fractions of the stLFR assemblies and stLFR + ONT assemblies were higher than those of the NGS assemblies, and, except for D. wulumuqiensis R12, the percentages of SNPs and indels were similar among the stLFR assemblies, stLFR + ONT assemblies, and NGS assemblies (Figure 4B). Although the Canu assemblies had somewhat higher genome fractions, the stLFR assemblies and stLFR + ONT assemblies had fewer mismatches and indels for Rufibacter sp. LB8, D. wulumuqiensis R12, and J. melonis M714. This was especially true for the M714 Canu assembly, whose percentage of indels was more than seven times that of the other assemblies (Figure 4B).
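
Mismatch and indel rates "per 100 kb" normalize raw event counts by the number of aligned bases, which makes assemblies of different lengths comparable. The sketch below shows that normalization only; the function name is hypothetical and the counts are toy values, not results from this study.

```python
def per_100kb(event_count, aligned_bases):
    """Normalize a count of mismatches or indels to events per 100 kb aligned."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return event_count * 100_000 / aligned_bases

# Toy values: 35 indels over 3.5 Mb of aligned sequence.
toy_rate = per_100kb(35, 3_500_000)
print(toy_rate)  # 1.0
```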

Figure 3. The longest chromosome sequence comparisons for strains

(A–D) (A) *E. coli* K-12 using stLFR-assembled genome and the third-generation sequencing-assembled genome, (B) *Rufibacter* sp. LB8, (C) *D. wulumuqiensis* R12, and (D) *J. melonis* M714 using stLFR-assembled genomes, stLFR + ONT-assembled genomes, ONT + NGS Unicycler-assembled genomes, and ONT Canu-assembled genomes. The outermost circle is the GC heatmap, the next circle is the histogram of GC (red: G > C; blue: G < C), and the middle circle is gene density in chromosomes. The last two circles are the COG positive/negative annotation heatmaps, and the legend is shown at the bottom.

Figure 4. The longest chromosome sequence comparison

(A and B) (A) Sequence alignment viewer and (B) accuracy evaluation of stLFR genomes, stLFR + ONT genomes, Canu-assembled genomes, and NGS genomes when compared with the complete genomes assembled by Unicycler using NGS and ONT reads for the strains *E. coli* K-12, *Rufibacter* sp. LB8, *D. wulumuqiensis* R12, and *J. melonis* M714.

Finally, we compared the genome components and functional gene annotations of the different assemblies. The gene number and average gene length of the stLFR assemblies and stLFR + ONT assemblies were closer to the Unicycler assembly results than those of the NGS assemblies and Canu assemblies. Furthermore, the stLFR assemblies, stLFR + ONT assemblies, and Canu assemblies detected the same number of 16S rRNA copies as the Unicycler assemblies, whereas the NGS assemblies detected only one 16S rRNA (Table 1). The COG annotation heatmap revealed that the different assembly strategies had the same annotated COG categories for each strain. Compared with the NGS fragment assembly genomes and Canu assemblies, the annotated gene number in each category of the stLFR assemblies and stLFR + ONT assemblies was more similar to that of the Unicycler assemblies (Figures 3 and S3). We also investigated the KEGG annotation results and found the same patterns. For each strain, the different sequencing strategies produced the same annotated pathways, and the annotated gene number of the stLFR assemblies and stLFR + ONT assemblies was similar to that of the complete reference genomes in each pathway (Figure S4).

Discussion

Current research on extremophiles focuses mostly on exploring physiological properties, such as extremozymes, which offer an excellent replacement for the mesophilic enzymes currently used in biotechnology. The development of rapid and low-cost NGS technologies has cleared the way for exploiting natural genetic diversity and identifying the corresponding functional genes. In this study, we used three different sequencing strategies, including NGS, stLFR technology, and third-generation sequencing (Nanopore or PacBio), to construct libraries, sequence, and assemble the genomes of five extremophilic radiation-resistant strains and E. coli K-12. The GC content of the investigated genomes varied from 35% to 72%.

Among the three sequencing strategies, the cost of stLFR is about twice that of NGS but less than one-third that of Nanopore sequencing and one-fourth that of PacBio sequencing (Figure S6). In addition, Illumina or MGI short-read sequencing (NGS) is generally combined with a long-read sequencing technology (Nanopore or PacBio) to obtain complete and accurate assemblies of bacterial genomes, which further increases the sequencing cost. Additionally, the computational resources required for stLFR are comparable to those needed for NGS and far lower than those needed for third-generation sequencing.
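
The relative costs stated above can be put on a common scale by taking the NGS cost as one unit. The following worked arithmetic is only a sketch of what those ratios imply; it treats "less than one-third" and "one-fourth" as bounds, uses no absolute prices (none are given in this text), and all variable names are illustrative.

```python
NGS = 1.0                     # reference unit: cost of NGS alone
STLFR = 2.0 * NGS             # stLFR is about twice the cost of NGS
NANOPORE_MIN = 3.0 * STLFR    # stLFR < 1/3 of Nanopore, so Nanopore > 6 units
PACBIO_MIN = 4.0 * STLFR      # stLFR is at most 1/4 of PacBio, so PacBio >= 8 units

# Hybrid strategies sequence the same sample on two platforms.
hybrid_nanopore_min = NGS + NANOPORE_MIN
hybrid_pacbio_min = NGS + PACBIO_MIN

print(STLFR, hybrid_nanopore_min, hybrid_pacbio_min)  # 2.0 7.0 9.0
```

Under these ratios a hybrid NGS plus long-read strategy costs at least about 3.5 times as much per genome as stLFR alone.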

Furthermore, stLFR requires only 1–10 ng high-quality DNA, which is much lower than the 1,500 ng required for Nanopore or PacBio sequencing, and also lower than the 200 ng required for NGS (Tables S2–S4).

We investigated the optimal stLFR library construction conditions for bacteria with a high GC content and found that a low transposase concentration was favorable for sequencing. For all bacteria, we used a transposon that contains the sample index, and one-tenth of the magnetic beads were used for the library construction. Unlike previous studies, in which stLFR was applied to animal or plant genomics, our method allowed the pooling and parallel sequencing of a large number of samples, which cannot be achieved by 10X Genomics read cloud sequencing technology (Goodwin et al., 2016). We also investigated the impacts of the amount of valid clean data and the assembly software on assembly quality. We found that 2–4 Gb of clean data can yield a high-quality assembly, and that Supernova and Athena were the most suitable for assembling stLFR reads into a complete bacterial genome (Figure S1).

Based on the evaluation of the five genomes assembled using three sequencing strategies, we found that we could obtain chromosome-scale assembly scaffolds from stLFR sequencing data while achieving the same structural and functional accuracy as the assembly results of third-generation sequencing (Table 1, Figure 2). Compared with the NGS assembly results, the stLFR assemblies had higher completeness and fewer mismatches. Additionally, we used stLFR data to assemble the complete plasmid genomes, which previously had been achieved only with third-generation sequencing (Table 1). Furthermore, we used ONT reads to fill the gaps in the stLFR assemblies and obtain the complete genomes. Compared with the ONT Unicycler assemblies, the new assembly method could also generate complete genomes with high accuracy using fewer computational resources.

Many assembly methods have been compared for the construction of complete bacterial genomes using Nanopore or PacBio long-read sequencing data (Chin et al., 2016; Danko et al., 2019; De Maio et al., 2019; Koren et al., 2017). Here, we also compared the accuracy of the assembly results produced using Canu to directly assemble ONT data and using the hybrid assembly software Unicycler to assemble ONT data. Owing to the high sequence error rate of ONT reads, we found that the accuracy (including the ratio of mapped reads, genomic completeness, single-base accuracy, and structural accuracy) of the Unicycler hybrid assembly based on the NGS assembly contigs was much higher than that of Canu, especially for bacteria with abnormally high GC content. Similar studies used 10X Genomics long-read cloud sequencing data to assemble microbial genomes (Bishara et al., 2018a, 2018b; Tolstoganov et al., 2019; Weisenfeld et al., 2017). We optimized the conditions for the stLFR library construction and sequencing of bacterial genomes and constructed an stLFR de novo assembly pipeline to obtain chromosome-scale bacterial genomes. In conclusion, we have shown that assembling high-quality reference-grade bacterial genomes using stLFR sequencing data is a cost-effective option, especially for bacteria that are difficult to culture or do not yield large amounts of DNA due to challenging extraction. Based on the presented stLFR assembly results, we are confident that it will be possible to fill gaps and optimize ONT sequencing data to obtain the complete genome of any industrially important strain in the future.

Limitations of the study

We used stLFR sequencing technology for the first time to assemble chromosome-level bacterial genomes, and this approach currently has some limitations. Owing to the short read length of stLFR sequencing, the bacterial chromosome sequences still contained gaps, and we will try to increase the read length using a 200- to 300-bp paired-end sequencing strategy to fill in the gaps caused by short repeats. There are currently few de novo assembly algorithms for stLFR data. Although the recently released stLFR de novo software is an exception (https://github.com/BGI-biotools/stLFRdenovo), it is based on Supernova and is commonly used in animal or plant genome assembly. It is therefore necessary to develop publicly available tools specifically for the assembly of bacterial stLFR data to obtain better results. In addition, the calling of large variations (such as structural variations, inversions, and copy number variations) in large-scale bacterial sequencing projects based on stLFR data also needs to be addressed further.

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Jianwei Chen (chenjianwei@genomics.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

The sequencing data and assembly genomes that support the findings of this study have been deposited into CNSA (CNGB Sequence Archive) of CNGBdb with accession number CNP0001196 and under NCBI BioProject Accession PRJNA665116.

Methods

All methods can be found in the accompanying Transparent methods supplemental file.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (31922070, 32060004), the National Science and Technology Major Project of China (2017ZX10303406), the Natural Science Foundation of Jiangsu Province (BK20180038), Tianshan Pine Plan (2017XS26), and Basic Scientific R&D Program for Public Welfare Institutes in Xinjiang (KY2019023, KY2019019).

Author contributions

Z.Z., writing – original draft preparation and investigation; G.L., application and calculation analysis; Y.C., data curation and software; H.H., conceptualization; W.X. and Q.J., methodology; Q.X., validation; H.Z. and G.F., formal analysis; L.J. and J.C., writing, editing, and funding acquisition.

Declaration of interests

The authors declare no competing interests.

Inclusion and diversity

We worked to ensure diversity in experimental samples through the selection of the genomic datasets. The author list of this paper includes contributors from the location where the research was conducted who participated in the data collection, design, analysis, and/or interpretation of the work.

Published: March 19, 2021

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2021.102219.

Contributor Information

Ling Jiang, Email: jiangling@njtech.edu.cn.

Jianwei Chen, Email: chenjianwei@genomics.cn.

Supplemental information

Document S1. Transparent methods, Figures S1–S6, and Tables S1–S4

References

Adamiak J., Otlewska A., Gutarowska B. Halophilic microbial communities in deteriorated buildings. World J.Microbiol.Biotechnol. 2015;31:1489–1499. doi: 10.1007/s11274-015-1913-3. [DOI] [PubMed] [Google Scholar]
Bankevich A., Pevzner P.A. TruSPAdes: barcode assembly of TruSeq synthetic long reads. Nat. Methods. 2016;13:248–250. doi: 10.1038/nmeth.3737. [DOI] [PubMed] [Google Scholar]
Bishara A., Moss E.L., Kolmogorov M., Parada A.E., Weng Z., Sidow A., Dekas A.E., Batzoglou S., Bhatt A.S. High-quality genome sequences of uncultured microbes by assembly of read clouds. Nat.Biotechnol. 2018;36:1067–1075. doi: 10.1038/nbt.4266. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bishara A., Moss E.L., Tkachenko E., Kang J.B., Zlitni S., Culver R.N., Andermann T.M., Weng Z., Wood C., Handy C. De novo assembly of microbial genomes from human gut metagenomes using barcoded short read sequences. bioRxiv. 2018:125211. [Google Scholar]
Brininger C., Spradlin S., Cobani L., Evilia C. The more adaptive to change, the more likely you are to survive: protein adaptation in extremophiles. Semin.CellDev. Biol. 2018;84:158–169. doi: 10.1016/j.semcdb.2017.12.016. [DOI] [PubMed] [Google Scholar]
Brito J.A., Bandeiras T.M., Teixeira M., Vonrhein C., Archer M. Crystallisation and preliminary structure determination of a NADH: quinoneoxidoreductase from the extremophile Acidianusambivalens. Biochim.Biophys.Acta. 2006;1764:842–845. doi: 10.1016/j.bbapap.2005.09.015. [DOI] [PubMed] [Google Scholar]
Burton J.N., Adey A., Patwardhan R.P., Qiu R., Kitzman J.O., Shendure J. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat.Biotechnol. 2013;31:1119–1125. doi: 10.1038/nbt.2727. [DOI] [PMC free article] [PubMed] [Google Scholar]
Carattoli A. Resistance plasmid families in Enterobacteriaceae. Antimicrob.Agents Chemother. 2009;53:2227–2238. doi: 10.1128/AAC.01707-08. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen Z., Pham L., Wu T.-C., Mo G., Xia Y., Chang P.L., Porter D., Phan T., Che H., Tran H. Ultralow-input single-tube linked-read library method enables short-read second-generation sequencing systems to routinely generate highly accurate and economical long-range sequencing information. Genome Res. 2020;30:898–909. doi: 10.1101/gr.260380.119. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chien A., Edgar D.B., Trela J.M. Deoxyribonucleic acid polymerase from the extreme thermophile Thermusaquaticus. J.Bacteriol. 1976;127:1550–1557. doi: 10.1128/jb.127.3.1550-1557.1976. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chin C.S., Peluso P., Sedlazeck F.J., Nattestad M., Concepcion G.T., Clum A., Dunn C., O'Malley R., Figueroa-Balderas R., Morales-Cruz A. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods. 2016;13:1050–1054. doi: 10.1038/nmeth.4035. [DOI] [PMC free article] [PubMed] [Google Scholar]
Danko D.C., Meleshko D., Bezdan D., Mason C., Hajirasouliha I. Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics. Genome Res. 2019;29:116–124. doi: 10.1101/gr.235499.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
De Maio N., Shaw L.P., Hubbard A., George S., Sanderson N.D., Swann J., Wick R., AbuOun M., Stubberfield E., Hoosdally S.J. Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes. Microb.Genom. 2019;5:e000294. doi: 10.1099/mgen.0.000294. [DOI] [PMC free article] [PubMed] [Google Scholar]
DeLong E.F. Extreme genomes. Genome Biol. 2000;1 doi: 10.1186/gb-2000-1-6-reviews1029. reviews1029.1021. [DOI] [PMC free article] [PubMed] [Google Scholar]
Goodwin S., McPherson J.D., McCombie W.R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 2016;17:333–351. doi: 10.1038/nrg.2016.49. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kang J.-E., Kim H.-D., Park S.-Y., Pan J.-G., Kim J.H., Yum D.-Y. Dietary supplementation with a Bacillus superoxide dismutase protects against γ-Radiation-induced oxidative stress and ameliorates dextran sulphate sodium-induced ulcerative colitis in mice. J.Crohns Colitis. 2018;12:860–869. doi: 10.1093/ecco-jcc/jjy034. [DOI] [PubMed] [Google Scholar]
Korbel J.O., Urban A.E., Affourtit J.P., Godwin B., Grubert F., Simons J.F., Kim P.M., Palejev D., Carriero N.J., Du L. Paired-end mapping reveals extensive structural variation in the human genome. Science. 2007;318:420–426. doi: 10.1126/science.1149504. [DOI] [PMC free article] [PubMed] [Google Scholar]
Koren S., Walenz B.P., Berlin K., Miller J.R., Bergman N.H., Phillippy A.M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722–736. doi: 10.1101/gr.215087.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
Makarova K.S., Wolf Y.I., Alkhnbashi O.S., Costa F., Shah S.A., Saunders S.J., Barrangou R., Brouns S.J., Charpentier E., Haft D.H. An updated evolutionary classification of CRISPR-Cas systems. Nat. Rev.Microbiol. 2015;13:722–736. doi: 10.1038/nrmicro3569. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mao D., Grogan D.W., de Boer P.A.J. How a genetically stable extremophile evolves: modes of genome diversification in the archaeonsulfolobusacidocaldarius. J.Bacteriol. 2017;199 doi: 10.1128/JB.00177-17. e00177–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
Merino N., Aronson H.S., Bojanova D.P., Feyhl-Buska J., Wong M.L., Zhang S., Giovannelli D. Living at the extremes: extremophiles and the limits of life in a planetary context. Front.Microbiol. 2019;10:780. doi: 10.3389/fmicb.2019.00780. [DOI] [PMC free article] [PubMed] [Google Scholar]
Metzker M.L. Sequencing technologies - the next generation. Nat. Rev. Genet. 2010;11:31–46. doi: 10.1038/nrg2626. [DOI] [PubMed] [Google Scholar]
Mokashe N., Chaudhari B., Patil U. Operative utility of salt-stable proteases of halophilic and halotolerant bacteria in the biotechnology sector. Int. J.Biol.Macromol. 2018;117:493–522. doi: 10.1016/j.ijbiomac.2018.05.217. [DOI] [PubMed] [Google Scholar]
Niedringhaus T.P., Milanova D., Kerby M.B., Snyder M.P., Barron A.E. Landscape of next-generation sequencing technologies. Anal. Chem. 2011;83:4327–4341. doi: 10.1021/ac2010857. [DOI] [PMC free article] [PubMed] [Google Scholar]
Orellana-Saez M., Pacheco N., Costa J.I., Mendez K.N., Miossec M.J., Meneses C., Castro-Nallar E., Marcoleta A.E., Poblete-Castro I. In-depth genomic and phenotypic characterization of the antarcticpsychrotolerantstrain Pseudomonas sp. MPC6reveals unique metabolic features, plasticity, and biotechnological potential. Front.Microbiol. 2019;10:1154. doi: 10.3389/fmicb.2019.01154. [DOI] [PMC free article] [PubMed] [Google Scholar]
Palmieri G., Arciello S., Bimonte M., Carola A., Tito A., Gogliettino M., Cocca E., Fusco C., Balestrieri M., Colucci M.G. The extraordinary resistance to UV radiations of a manganese superoxide dismutase of Deinococcusradiodurans offers promising potentialities in skin care applications. J.Biotechnol. 2019;302:101–111. doi: 10.1016/j.jbiotec.2019.07.002. [DOI] [PubMed] [Google Scholar]
Peters B.A., Kermani B.G., Sparks A.B., Alferov O., Hong P., Alexeev A., Jiang Y., Dahl F., Tang Y.T., Haas J. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature. 2012;487:190–195. doi: 10.1038/nature11236. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rothschild L.J., Mancinelli R.L. Life in extreme environments. Nature. 2001;409:1092–1101. doi: 10.1038/35059215. [DOI] [PubMed] [Google Scholar]
Rubin E.M., Levy S., Sutton G., Ng P.C., Feuk L., Halpern A.L., Walenz B.P., Axelrod N., Huang J., Kirkness E.F. The diploid genome sequence of an individual human. PLoS Biol. 2007;5:e254. doi: 10.1371/journal.pbio.0050254. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schiraldi C., De Rosa M. The production of biocatalysts and biomolecules from extremophiles. Trends Biotechnol. 2002;20:515–521. doi: 10.1016/s0167-7799(02)02073-5. [DOI] [PubMed] [Google Scholar]
Swarup A., Lu J., DeWoody K.C., Antoniewicz M.R. Metabolic network reconstruction, growth characterization and 13C-metabolic flux analysis of the extremophile ThermusthermophilusHB8. Metab. Eng. 2014;24:173–180. doi: 10.1016/j.ymben.2014.05.013. [DOI] [PubMed] [Google Scholar]
Tharek M., Sim K.-S., Khairuddin D., Amir Hamzah G., Najimudin N. Whole-genome sequence of endophyticplant growth-promoting Escherichia coli USML2. Genome Announc. 2017;5 doi: 10.1128/genomeA.00305-17. e00305–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tolstoganov I., Bankevich A., Chen Z., Pevzner P.A. cloudSPAdes: assembly of synthetic long reads using de Bruijn graphs. Bioinformatics. 2019;35:i61–i70. doi: 10.1093/bioinformatics/btz349. [DOI] [PMC free article] [PubMed] [Google Scholar]
Urbieta M.S., Donati E.R., Chan K.-G., Shahar S., Sin L.L., Goh K.M. Thermophiles in the genomic era: biodiversity, science, and applications. Biotechnol. Adv. 2015;33:633–647. doi: 10.1016/j.biotechadv.2015.04.007. [DOI] [PubMed] [Google Scholar]
Voskoboynik A., Neff N.F., Sahoo D., Newman A.M., Pushkarev D., Koh W., Passarelli B., Fan H.C., Mantalas G.L., Palmeri K.J. The genome sequence of the colonial chordate, Botryllusschlosseri. Elife. 2013;2:e00569. doi: 10.7554/eLife.00569. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang H., Gong Y., Xie W., Xiao W., Wang J., Zheng Y., Hu J., Liu Z. Identification and characterization of a novel thermostablegh-57 gene from metagenomicfosmid library of the Juan de Fuca Ridge hydrothemal vent. Appl.Biochem.Biotechnol. 2011;164:1323–1338. doi: 10.1007/s12010-011-9215-1. [DOI] [PubMed] [Google Scholar]
Wang J., Salem D.R., Sani R.K. Extremophilicexopolysaccharides: a review and new perspectives on engineering strategies and applications. Carbohydr.Polym. 2019;205:8–26. doi: 10.1016/j.carbpol.2018.10.011. [DOI] [PubMed] [Google Scholar]
Wang O., Chin R., Cheng X., Wu M.K.Y., Mao Q., Tang J., Sun Y., Anderson E., Lam H.K., Chen D. Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly. Genome Res. 2019;29:798–808. doi: 10.1101/gr.245126.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
Weisenfeld N.I., Kumar V., Shah P., Church D.M., Jaffe D.B. Direct determination of diploid genome sequences. Genome Res. 2017;27:757–767. doi: 10.1101/gr.214874.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang F., Christiansen L., Thomas J., Pokholok D., Jackson R., Morrell N., Zhao Y., Wiley M., Welch E., Jaeger E. Haplotype phasing of whole human genomes using bead-based barcode partitioning in a single tube. Nat.Biotechnol. 2017;35:852–857. doi: 10.1038/nbt.3897. [DOI] [PubMed] [Google Scholar]
Zheng G.X.Y., Lau B.T., Schnall-Levin M., Jarosz M., Bell J.M., Hindson C.M., Kyriazopoulou-Panagiotopoulou S., Masquelier D.A., Merrill L., Terry J.M. Haplotypinggermline and cancer genomes with high-throughput linked-read sequencing. Nat.Biotechnol. 2016;34:303–311. doi: 10.1038/nbt.3432. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ziko L., Adel M., Malash M.N., Siam R. Insights into red sea brine pool specialized metabolism gene clusters encoding potential metabolites for biotechnological applications and extremophile survival. Mar. Drugs. 2019;17:273. doi: 10.3390/md17050273. [DOI] [PMC free article] [PubMed] [Google Scholar]

