Genome comparison reveals that Halobacterium salinarum 63‐R2 is the origin of the twin laboratory strains NRC‐1 and R1

Friedhelm Pfeiffer; Mike Dyall‐Smith

doi:10.1002/mbo3.1365

2023 Jun 13;12(3):e1365. doi: 10.1002/mbo3.1365

PMCID: PMC10264941 PMID: 37379421

Abstract

The genome of Halobacterium strain 63‐R2 was recently reported and provides the opportunity to resolve long‐standing issues regarding the source of two widely used model strains of Halobacterium salinarum, NRC‐1 and R1. Strain 63‐R2 was isolated in 1934 from a salted buffalo hide (epithet “cutirubra”), along with another strain from a salted cow hide (91‐R6^T, epithet “salinaria,” the type strain of Hbt. salinarum). Both strains belong to the same species according to genome‐based taxonomy analysis (TYGS), with chromosome sequences showing 99.64% identity over 1.85 Mb. The chromosome of strain 63‐R2 is 99.99% identical to the two laboratory strains NRC‐1 and R1, with only five indels, excluding the mobilome. The two reported plasmids of strain 63‐R2 share their architecture with plasmids of strain R1 (pHcu43/pHS4, 99.89% identity; pHcu235/pHS3, 100.0% identity). We detected and assembled additional plasmids using PacBio reads deposited at the SRA database, further corroborating that strain differences are minimal. One plasmid, pHcu190 (190,816 bp) corresponds to pHS1 (strain R1) but is even more similar in architecture to pNRC100 (strain NRC‐1). Another plasmid, pHcu229, assembled partially and completed in silico (229,124 bp), shares most of its architecture with pHS2 (strain R1). In deviating regions, it corresponds to pNRC200 (strain NRC‐1). Further architectural differences between the laboratory strain plasmids are not unique, but are present in strain 63‐R2, which contains characteristics from both of them. Based on these observations, it is proposed that the early twentieth‐century isolate 63‐R2 is the immediate ancestor of the twin laboratory strains NRC‐1 and R1.

Keywords: Archaea, comparative genomics, haloarchaea, halobacteria, mobilome, plasmid

The complete genomes of four Halobacterium salinarum strains were compared in detail. Two strains (91‐R6^T and 63‐R2) were isolated in 1934 by Lochhead from cow and buffalo hides. From the results of these comparisons, we conclude that strain 63‐R2 is the immediate ancestor of the two widely used laboratory strains NRC‐1 and R1.

1. INTRODUCTION

The use of salt in the preservation of food (curing) and the tanning of leather are traditional processes dating back hundreds of years. In 1922, searching for the cause of “red discolorations” on salted codfish, which was seen as a threat to the Canadian fishery industry, Harrison and Kennedy isolated a red pigmented microorganism (Harrison & Kennedy, 1922), which they named Pseudomonas salinaria. This was the original isolate and designated the type strain of Halobacterium salinarum (according to the currently approved taxonomy). Later, this strain was lost.

In 1934, Lochhead investigated “red discolorations” (also called “red heat”) of salted hides, which were causing losses for Canadian leather manufacturers (Lochhead, 1934). During that study, he cultivated two more red‐pigmented isolates, one of which (91‐R6), obtained from a cow hide, was given the species epithet “salinaria” because of its high similarity with the organism isolated previously by Harrison and Kennedy. After the 1922 isolate was lost, 91‐R6^T was advanced as the type strain (neotype) of Hbt. salinarum. The other Lochhead isolate (63‐R2), obtained from a buffalo hide, was given the species epithet “cutirubra.” It was considered a distinct species by Lochhead, but this was later revised (Ventosa & Oren, 1996). The two Lochhead isolates from 1934 were deposited in the National Research Council (NRC) of Canada culture collection as NRC 34001 (63‐R2) and NRC 34002 (91‐R6). Although the NRC culture collection closed, the strains are preserved in several other culture collections (Grant et al., 2001), e.g., ATCC 33170, DSM 669 (63‐R2) and ATCC 33171, DSM 3754 (91‐R6).

Taxonomically, strain 91‐R6^T is the type strain of Hbt. salinarum, and strain 63‐R2 is the type strain of Halobacterium cutirubrum, a name that has been validly published but is a younger heterotypic synonym of Hbt. salinarum according to the LPSN (list of prokaryotic names with standing in nomenclature) (Meier‐Kolthoff et al., 2022). The epithet salinarum has priority due to its earlier publication (1922, compared to 1934) according to the international code of nomenclature of prokaryotes (Parker et al., 2019). Hbt. halobium and Hbt. cutirubrum were designated species incertae sedis in 1996 (Ventosa & Oren, 1996), and since then organisms previously referred to by these names have been designated strains of Hbt. salinarum.

While the original Lochhead isolates (63‐R2 and 91‐R6) were only rarely used for experimental analyses, the twin pair of laboratory strains (NRC‐1, R1) was extensively studied. Both were assumed to be derived from Hbt. salinarum DSM 670, a strain obtained from the Stoeckenius lab which was referred to as Hbt. halobium (Stoeckenius & Kunau, 1968; Stoeckenius & Rowen, 1967). DSM 670 is thought to have come from NRC deposited strain NRC 34020 (Gruber et al., 2004). Attempts to retrieve their exact origin were not successful (Grant et al., 2001), but this can now be re‐evaluated using their genome sequences. DSM 671, strain R1, is the gas‐vesicle‐free mutant of DSM 670 (Stoeckenius & Kunau, 1968). It is from the purple membrane of strain R1 that Dieter Oesterhelt isolated bacteriorhodopsin (Oesterhelt & Stoeckenius, 1971), a light‐driven proton pump (Oesterhelt & Stoeckenius, 1973) which enables Halobacterium to grow by a second principle of photosynthesis (Oesterhelt & Krippahl, 1983).

High‐quality genome sequences for all four strains of Hbt. salinarum (91‐R6, 63‐R2, NRC‐1, R1) have now been determined, allowing detailed comparison and analysis. The genome sequence of Hbt. salinarum strain NRC‐1 was the first haloarchaeal genome sequence that became publicly available in 2000 (Ng et al., 2000), and also one of the first archaeal species sequenced. This has become the reference genome for halophilic archaea, with hundreds of literature citations. The complete genome sequence of strain R1 was published in 2008 (Pfeiffer et al., 2008), that of strain 91‐R6 in 2019 (Pfeiffer et al., 2019, 2020), and that of strain 63‐R2 in 2022 (DasSarma et al., 2022). Detailed interstrain comparisons revealed that the chromosomes of R1 and NRC‐1 are completely colinear and virtually identical (Pfeiffer et al., 2008). They are also highly similar (in silico DDH, 95%) to the type strain (91‐R6^T) (Pfeiffer et al., 2020), confirming the taxonomic assignment of strain NRC‐1 to the species Hbt. salinarum (Gruber et al., 2004). The availability of the high‐quality genome sequence for strain 63‐R2 now allows the interstrain genome comparisons of all four strains.

A distinctive feature of Hbt. salinarum is a high rate of spontaneous mutation due to the movement of, and recombination between, mobile genetic elements (MGEs) (ISH elements, transposons, “the mobilome”), and this has been a focus of study from the 1980s onwards (DasSarma et al., 1983; Ng et al., 2000; Pfeifer & Blaseio, 1990; Pfeiffer et al., 2008, 2020). ISH elements are not only associated with insertional inactivation of genes but also genome inversions and other genome rearrangements (Ng et al., 1991; Pfeiffer et al., 2020). Most of the differences between the twin laboratory strains NRC‐1 and R1 could be attributed to this highly active mobilome (Pfeiffer et al., 2008).

In this study, the core genomes for all four strains were compared to assess the relationship between laboratory strains NRC‐1 and R1, and the original Lochhead strains 91‐R6, 63‐R2. In these comparisons, strain‐specific copies of MGEs were removed to reduce the background noise and enhance any evolutionary signals. The genome of strain 63‐R2 (NRC 34001) was found to be exceedingly similar to the laboratory twins NRC‐1 and R1, and the types of changes seen are consistent with strain 63‐R2 being the ancestral strain from which the two laboratory strains were derived. We believe that the origin of these laboratory strains has now been resolved.

2. MATERIALS AND METHODS

Detailed methods are provided in the supplementary material, deposited at Zenodo: https://doi.org/10.5281/zenodo.7780801. For convenience, summaries of these methods are given below.

2.1. Formatting the chromosomal sequences of strains 63‐R2, 91‐R6, NRC‐1, and R1 for comparative analysis

In‐house tagged versions of the genome sequences of strains 63‐R2, 91‐R6, R1, and NRC‐1 were generated in which all unique sequences between MGEs were identified, as well as each MGE and associated target sequence duplication (TSD).

After the removal of comments, a “total” sequence was available for each strain. The concatenation of these sequences resulted in a “total” database for subsequent analyses, especially the determination of positions in the original genome sequences. In this “total” database file, line breaks around MGEs are preserved so that their visual identification is simple, especially when the MGE is enclosed by a TSD. A copy of that file served as the initial version of the “core” database, open for subsequent manual modification, most importantly the removal of strain‐specific copies of MGEs.
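
As an illustration of what removing a strain‐specific MGE copy involves, the following minimal Python sketch excises an element together with one copy of its target sequence duplication (TSD), which restores the presumed pre‐insertion sequence. The function name `excise_mge`, the 1‐based inclusive coordinates, and the toy sequences are assumptions made for this example; the procedure actually used in the study is the manual one described in the Supplementary Methods.

```python
def excise_mge(seq: str, mge_start: int, mge_end: int, tsd_len: int) -> str:
    """Remove an MGE and the TSD copy that follows it.

    mge_start and mge_end are 1-based, inclusive positions of the element
    itself (hypothetical convention for this sketch). The TSD copy that
    precedes the element is kept, the copy that follows it is removed.
    """
    tsd_left = seq[mge_start - 1 - tsd_len:mge_start - 1]
    tsd_right = seq[mge_end:mge_end + tsd_len]
    # Elements that create no TSD (tsd_len == 0) pass this check trivially.
    if tsd_left != tsd_right:
        raise ValueError("flanking sequences are not a direct duplication")
    return seq[:mge_start - 1] + seq[mge_end + tsd_len:]


# Toy example: a 12 bp "element" inserted with a 5 bp TSD.
left_flank, tsd, element, right_flank = "ACGTAC", "GGATC", "TTTTAAAATTTT", "CATGCA"
with_mge = left_flank + tsd + element + tsd + right_flank
core = excise_mge(with_mge, mge_start=12, mge_end=23, tsd_len=5)
print(core == left_flank + tsd + right_flank)
```

Checking that both flanks are identical before excision guards against coordinates that do not correspond to a real TSD.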

2.2. Chromosome comparison strategy and generation of core chromosomes devoid of strain‐specific MGEs for strains 63‐R2, NRC‐1 and R1

Preliminary genomic comparisons (BLASTn, MUMmer) had indicated that the genome sequence of strain 63‐R2 was much more closely related to those of the twin laboratory strains NRC‐1 and R1 than to that of strain 91‐R6^T, and because of this, the initial analyses were restricted to these three strains. Applying an iterative comparison procedure, “core” chromosome sequences devoid of strain‐specific MGEs were generated. The build‐up of this “core” database is described in Supplementary Methods, deposited at Zenodo: https://doi.org/10.5281/zenodo.7780801. All eliminated MGEs, including their position in the original genome sequence, are documented in Table A1.

BLASTn analyses of the “core” sequences resulted in a complete set of HSPs (high‐scoring pairs) that correlated the complete “core” sequence of the chromosome from strain 63‐R2 against the core chromosomes from the twin laboratory strains NRC‐1 and R1.

HSP positions of the interstrain comparison are reported for the “core” database, but to allow easy correlation with biological features, all “core” database positions have been correlated with the corresponding positions in the original sequences of the “total” database (Supplementary Table S4, deposited at Zenodo: https://doi.org/10.5281/zenodo.7780801).
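
The correlation between “core” and “total” positions amounts to adding back the lengths of all segments that were removed upstream of a given position. A minimal Python sketch of that bookkeeping follows; the function name `core_to_total` and the representation of removed segments as 1‐based inclusive intervals in “total” coordinates are assumptions for illustration, not the format of Supplementary Table S4.

```python
def core_to_total(core_pos: int, removed: list[tuple[int, int]]) -> int:
    """Map a 1-based position in the "core" sequence to the "total" sequence.

    removed: non-overlapping (start, end) intervals, 1-based inclusive, given
    in "total" coordinates, that were deleted to build the core sequence.
    """
    total_pos = core_pos
    for start, end in sorted(removed):
        if start <= total_pos:
            # The removed segment lies at or before the current position,
            # so the position shifts downstream by the segment length.
            total_pos += end - start + 1
        else:
            break
    return total_pos


# Toy example: a 12-base sequence from which positions 5-7 and 9-10 were removed.
removed_segments = [(5, 7), (9, 10)]
print([core_to_total(p, removed_segments) for p in range(1, 8)])
```

For the toy input, the seven core positions map to the seven retained positions of the original sequence.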

2.3. Comparison of the chromosome of strain 91‐R6 to the core chromosome from strain 63‐R2

The identification and elimination of MGEs that are specific for strain 91‐R6^T as compared to strains 63‐R2, NRC‐1, and R1 are described in Supplementary Methods: https://doi.org/10.5281/zenodo.7780801. Strain‐specific MGEs detected upon analysis of strain 91‐R6 either occur only in the chromosome of strain 91‐R6 (documented in Table A2), or are present in all three of the other strains (63‐R2, NRC‐1, and R1; documented in Table A1).

The core chromosomes of strains 91‐R6 and 63‐R2 were compared by BLASTn, leading to long HSPs, interrupted by unique sequences, which typically were short. Two long regions were encountered that are considered unique despite having a small number of short HSPs (see the Supplementary Methods for details).

2.4. Comparison of the reported plasmids pHcu235 and pHcu43 from strain 63‐R2 to plasmids pHS3 and pHS4, respectively, from strain R1

Preliminary comparisons (BLASTn) indicated that the sequence of plasmid pHcu235 from strain 63‐R2 is most closely related to plasmid pHS3 from strain R1, so these plasmids were compared in detail using the same procedure as described for chromosomal comparison (see above, Section 2.2). Plasmid pNRC200 from strain NRC‐1 showed a more patchy relationship and was not included in this analysis.

Preliminary comparisons (BLASTn) indicated that the unique, 2.3 kb sequence of plasmid pHS4 from strain R1 is closely related to a region on plasmid pHcu43 from strain 63‐R2. Thus, the sequences of these two plasmids were compared. A plasmid corresponding to pHS4 has not been reported for strain NRC‐1, and thus a plasmid from this strain was not included in the analysis.

The positions of strain‐specific MGEs and their associated TSDs, which were removed upon generation of core plasmid sequences, are listed in Table A3 (pHcu235/pHS3) and Table A4 (pHcu43/pHS4). The final results of this analysis are the HSPs obtained with the “core” sequence of pHcu235 against the “core” sequence of plasmid pHS3 from strain R1 and the HSP obtained for the “core” sequences of pHcu43 against pHS4.

2.5. Validation that a plasmid corresponding to pHS4 from strain R1 is absent from strain NRC‐1

This validation is based on an analysis of Illumina sequence reads obtained upon resequencing of strain NRC‐1 (Kunka et al., 2020) (SRA:SRR9025102) and is described in Supplementary Methods: https://doi.org/10.5281/zenodo.7780801.

2.6. Assembly of strain 63‐R2 plasmids pHcu190 and pHcu229 from deposited PacBio read data

PacBio sequence reads for strain 63‐R2 have recently become available (DasSarma et al., 2022). Reads were downloaded from the SRA database (SRA:SRR16600243). Details of the assembly procedure are described in Supplementary Methods: https://doi.org/10.5281/zenodo.7780801.

When sequence duplications between contigs exceeded the length of even the longest PacBio reads, related plasmids were used to guide assembly at the junctions of these duplications. For plasmid pHcu190, plasmids pNRC100 and pHS1 were used as guide sequences, and a complete plasmid could be assembled. For plasmid pHcu229, plasmids pNRC200 and pHS2 were used as a guide. The assembly remained incomplete at both ends, due to a very long duplication between pHcu229 and pHcu190. No heterogeneities could be detected within this duplication, and thus the sequence of pHcu229 could be completed in silico by transferring the corresponding sequence from pHcu190. The sequences of pHcu190 and both versions of pHcu229 are deposited at Zenodo: https://doi.org/10.5281/zenodo.7288901.

2.7. Subassembly walking

Sequence duplications that exceed the length of PacBio reads cannot be resolved by regular assembly procedures. In this case, we applied a method that we refer to as “subassembly walking” which is described in Supplementary Methods: https://doi.org/10.5281/zenodo.7780801.

For subassembly walking attempts, we selected PacBio reads based on the following sequence features: (a) unique sequences from other strains which were not covered in the set of contigs from strain 63‐R2, (b) sets of PacBio reads selected according to an as yet unexplored junction between a unique sequence and a duplication; this enabled the minimum length of the duplicated sequence which is connected to that junction to be determined, and (c) optional MGEs, where some reads contained the MGE‐free sequence version, while others exemplified the junction between the MGE and the adjacent unique sequence.
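
The read selection in case (b) can be pictured with a small Python sketch: reads that contain a unique anchor sequence ending at the junction are collected, and the longest extension past the anchor gives a lower bound for the length of the duplicated sequence connected to that junction. This is a deliberately simplified illustration and not the procedure from the Supplementary Methods: the names `revcomp` and `reads_spanning_junction` are hypothetical, and exact string matching is assumed, whereas real PacBio reads contain errors and would require error‐tolerant mapping.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def reads_spanning_junction(reads: dict[str, str], anchor: str) -> list[tuple[str, int]]:
    """Return (read name, bases beyond the anchor), longest extension first.

    anchor: unique sequence whose last base is the final base before the
    duplication starts. Both read orientations are searched.
    """
    hits = []
    for name, seq in reads.items():
        for oriented in (seq, revcomp(seq)):
            i = oriented.find(anchor)
            if i != -1:
                hits.append((name, len(oriented) - (i + len(anchor))))
                break
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy reads; "GATTACAGG" plays the role of the unique anchor.
toy_reads = {
    "r1": "CCGATTACAGGTTTTT",
    "r2": revcomp("AGATTACAGGTTTTTTTT"),  # same junction, opposite strand
    "r3": "CCCCCC",                       # unrelated read
}
hits = reads_spanning_junction(toy_reads, "GATTACAGG")
print(hits)
```

The first entry of the result identifies the read that reaches furthest into the duplication and is therefore the natural starting point for the next walking step.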

2.8. Assembly of strain 63‐R2 contigs contigDRAFT1 and contigDRAFT2 which represent the residuals of a plasmid that has integrated into the chromosome

Some sequences in strains R1 and NRC‐1 are strain‐specific and are not represented in the other strain (R1: 210 kb; NRC‐1: 15 kb) (Pfeiffer et al., 2008). Large parts of these strain‐specific sequences occur in strain 63‐R2. Nevertheless, some of the R1‐specific sequences were seemingly absent from this strain, and we attempted to validate their absence. Surprisingly, PacBio reads were identified which contain some of the R1‐specific sequences even though these occur neither in the chromosome nor in any of the assembled plasmids from strain 63‐R2 (case [a] in Section 2.7). Readsets were selected and assembled within Geneious (de novo assembly tool). Reads were also mapped to available contigs, including minor ones (e.g., short; low coverage; atypical connectivities of duplicated sequences). Emerging contigs were validated and/or extended by subassembly walking, resulting in contigDRAFT1 and contigDRAFT2.

2.9. Additional bioinformatics tools

As general tools, MUMmer v4 (Delcher et al., 2003) and the BLAST suite of programs v2.2 (Altschul et al., 1997; Johnson et al., 2008) were used for genome comparisons. All of the reported HSPs were obtained by BLASTn with default parameters except for three (−e 0.001; −F F; −C 0). Thus, the e‐value cutoff was set to 0.001, a slightly more stringent value chosen to reduce chance hits, and low‐complexity filtering and composition‐based statistics were switched off. The TYGS server (Meier‐Kolthoff & Göker, 2019) was used to determine by whole‐genome comparison whether strains represent novel species or belong to known species. Geneious Prime (version 2022.0.2) was used for read mapping and read assembly (Kearse et al., 2012).
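
The three non‐default settings are written in the option syntax of the legacy `blastall` program that was distributed with BLAST 2.2. Assuming that this program was used (the text does not state it explicitly), a call could be assembled as in the Python sketch below. The file and database names are placeholders, and `-m 8` (tabular output) is added here only for convenience; neither is taken from the study.

```python
import shlex


def blastn_command(query: str, database: str, out: str) -> list[str]:
    """Build a legacy blastall call with the three non-default settings.

    Assumption: legacy `blastall` syntax. The database is expected to have
    been formatted beforehand with the legacy `formatdb` tool.
    """
    return [
        "blastall", "-p", "blastn",
        "-i", query,      # query sequence file (placeholder name)
        "-d", database,   # nucleotide database (placeholder name)
        "-o", out,        # output file (placeholder name)
        "-e", "0.001",    # e-value cutoff reported in the text
        "-F", "F",        # low-complexity filtering switched off
        "-C", "0",        # composition-based statistics switched off
        "-m", "8",        # tabular output; an addition for this sketch
    ]


cmd = blastn_command("core_63R2.fasta", "core_db", "hsps.tsv")
print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the settings inspectable; the list could be passed to `subprocess.run` on a system where the legacy BLAST programs are installed.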

3. RESULTS

3.1. Initial comparison of the genome of Lochhead strain 63‐R2 with that of other completely sequenced strains of Hbt. salinarum

Complete genome sequences consisting of both chromosomes and plasmids of the Lochhead strains 91‐R6 and 63‐R2, and the laboratory strains NRC‐1 and R1 were submitted to the TYGS server for taxonomy assignment based on comparison of complete genomes (Meier‐Kolthoff & Göker, 2019; Meier‐Kolthoff et al., 2022). This server accesses its database of genomes from known type strains, including the type strain of Hbt. salinarum (Lochhead strain 91‐R6) as well as Hbt. salinarum DSM 669 (=NRC 34001 = Lochhead strain 63‐R2, previously “Hbt. cutirubrum”). The most relevant data for taxonomic analyses generated by the TYGS server (digital DNA‐DNA hybridization, formula d4, dDDH[d₄]) are given in Table 1. All dDDH(d₄) values were above 90% in comparison to the type strain 91‐R6^T, confirming they are all strains of Hbt. salinarum because they exceed the 70% species delineation threshold (Meier‐Kolthoff & Göker, 2019). The twin laboratory strains NRC‐1 and R1 show an exceedingly close relationship to strain 63‐R2 (>99% dDDH[d₄]) and are slightly less related (93%–94% dDDH[d₄]) to strain 91‐R6.

Table 1.

Genome‐based taxonomy analysis (TYGS) server results for the analyzed strains of Halobacterium salinarum.

Strain analyzed	Accessions	dDDH(d₄) versus Hbt. salinarum type strain (91‐R6)	CI d₄	dDDH(d₄) versus “Hbt. cutirubrum” (63‐R2)	CI d₄
63‐R2	CP085882–CP085884	94.6	92.9–95.9	99.8	99.6–99.9
NRC‐1	AE004437, AE004438, AF016485	92.1	90.0–93.8	99.3	98.9–99.5
R1	AM774415–AM774419	93.1	91.1–94.6	99.5	99.1–99.7
91‐R6	CP038631–CP038633	99.9	99.7–99.9	94.1	92.4–95.5
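
The species assignment from Table 1 is a simple threshold comparison, sketched below in Python. The dDDH(d₄) values are copied from the column “versus Hbt. salinarum type strain (91‐R6)”, and the 70% threshold is the one cited in the text; the names `SPECIES_THRESHOLD` and `same_species` are illustrative only and do not correspond to anything provided by the TYGS server.

```python
SPECIES_THRESHOLD = 70.0  # dDDH species delineation threshold, in percent

# dDDH(d4) versus the type strain 91-R6, taken from Table 1
ddh_vs_type_strain = {
    "63-R2": 94.6,
    "NRC-1": 92.1,
    "R1": 93.1,
    "91-R6": 99.9,
}


def same_species(ddh: float, threshold: float = SPECIES_THRESHOLD) -> bool:
    """True if the dDDH value exceeds the species delineation threshold."""
    return ddh > threshold


for strain, ddh in ddh_vs_type_strain.items():
    print(strain, ddh, same_species(ddh))
```

All four strains exceed the threshold by a wide margin, so the assignment does not depend on the reported confidence intervals.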

Table 2a.

Tag	Position (strain 63‐R2)	Position (strain R1)	Sequence identity (%)	Match bases/total bases	Gap characters	Comment
R_HSP1	1–170,773	1–170,773	99.99	170,772/170,773	0	‐
R_break1	170,774–180,249	8 bp overlap	‐	‐	‐	9476 bp insertion in strain 63‐R2
R_HSP2	180,250–589,134	170,766–579,649	99.99	408,881/408,885	1	‐
R_break2	13 bp overlap	579,650–579,678	‐	‐		14 codon deletion (pos 472–485) in transducer protein htrVI of strain 63‐R2 (LJ422_03260) compared to the orthologs from the laboratory strains (OE_2168R, VNG_0793G)
R_HSP3	589,122–1,095,986	579,679–1,086,558	99.99	506,862/506,880	17	‐
R_break3	Directly adjacent	38 bp overlap	‐	‐	‐	An insert in strain 63‐R2 in an intergenic region between divergently transcribed ORFs (OE_3125R and OE_3126F)
R_HSP4	1,095,987–1,861,458	1,086,521–1,852,001	99.99	765,470/765,481	9	‐
R_break4	1,861,459–1,861,562	29 bp overlap	‐	‐	‐	A 133 bp deletion in strain R1 in the rRNA promoter region
R_HSP5	1,861,563–1,997,337	1,851,973–1,987,747	100.00	135,775/135,775	0	‐

Table 2b.

Tag	Position (strain 63‐R2)	Position (strain NRC‐1)	Sequence identity (%)	Match bases/total bases	Gap characters	Comment
N_HSP1	1–589,134	1–589,132	99.99	589,128/589,134	2	‐
N_break1	13 bp overlap	589,133–589,161	‐	‐		See Table 2a, R_break2
N_HSP2	589,122–1,095,986	589,162–1,096,040	99.99	506,858/506,880	18	‐
N_break2	Directly adjacent	38 bp overlap	‐	‐	‐	See Table 2a, R_break3
N_HSP3	1,095,987–1,613,654	1,096,003–1,613,679	99.99	517,642/517,677	9	‐
N_break3	1,613,655–1,613,818	259 bp overlap	‐	‐	‐	164 extra bases in strain 63‐R2; a 423 bp deletion in strain NRC‐1 compared to strain R1 in the hcpB gene (VNG_2196G)
N_HSP4	1,613,819–1,997,337	1,613,421–1,996,939	99.99	383,517/383,519	0	‐

Tag	Position (pHcu235)	Position (pHS3)	Sequence identity (%)	Match bases/total bases	Gap characters	Comment
pHcu235_HSP1	1–210,501	1–210,501	100.00	210,501/210,501	0
pHcu235_break1	549 bp overlap	210,502–254,717				44,216 bp deletion in pHcu235
pHcu235_HSP2	209,953–212,491	254,718–257,256	100.00	2539/2539	0
pHcu235_break2	Directly adjacent	257,257–264,819				7563 bp deletion in pHcu235
pHcu235_HSP3	212,492–230,601	264,820–282,929	100.00	18,110/18,110	0

Tag	Position (pHcu190)	Position (pHS1)	Sequence identity (%)	Match bases/total bases	Gap characters	Comment
R_pHcu190_HSP1	1–72,155	1–72,155	100.00	72,155/72,155	0
R_pHcu190_break1	72,156–76,681	72,156–91,519	‐			4526 bp region specific to pHcu190/pNRC100; 19,364 bp region specific to pHS1
R_pHcu190_HSP2	76,682–125,134	91,520–139,972	99.95	48,433/48,453	0
R_pHcu190_break2	125,135–183,605	139,973–141,861	‐			1889 bp terminal region specific for pHS1; 58,471 bp region specific for pHcu190/pNRC100, which includes a 16 kb sequence and the long (40 kb) inverted duplication

Tag	Position (pHcu229)	Position (pHS2)	Sequence identity (%)	Match bases/total bases	Gap characters	Comment
Incompleteness	‐	65,602–91,881	‐			No upstream match because pHcu229 has only been partially assembled (in silico extension not considered)
R_pHcu229_HSP1	1–67,160	91,882–159,041	99.99	67,159/67,160	0
R_pHcu229_break1	11 bp overlap	159,042–163,202	‐			A 4161 bp transposon cassette in pHS2
R_pHcu229_HSP2	67,150–95,583	163,203–191,636	100.00	28,434/28,434	0
R_pHcu229_break2	Directly adjacent	191,637–E/1‐10,161	‐			E:194432
R_pHcu229_HSP3	95,584–151,023	10,162–65,601	100.00	55,440/55,440	0
R_pHcu229_break3	151,024–165,531	‐	‐			Present in pHcu229 and pNRC200 but absent from pHS2

Tag	Position (pHcu229)	Position (pNRC200)	Sequence identity (%)	Match bases/total bases	Gap characters	Comment
Incompleteness		1–42,124				No upstream match because pHcu229 has only been partially assembled (in silico extension not considered)
N_pHcu229_HSP1	1–31,107	42,125–73,231	100.00	31,107/31,107	0
N_pHcu229_break1	31,108–50,471	73,232–77,757				19,364 bp region in pHcu229; 4526 bp region in pNRC200
N_pHcu229_HSP2	50,472–55,954	77,758–83,240	100.00	5483/5483	0
N_pHcu229_break2	55,955–95,583	83,241–275,058				39,629 bp in pHcu229; 191,818 bp in pNRC200
N_pHcu229_HSP3	95,584–165,531	275,059–348,884	99.99	61,062/61,063	0
Incompleteness	‐	348,885–361,547				No downstream match because pHcu229 has only been partially assembled (in silico extension not considered)
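
The sequence identity column in the HSP tables appears to be the ratio of matching to total bases, expressed in percent and truncated (not rounded) to two decimals; for example, 48,433/48,453 corresponds to about 99.9587% and is listed as 99.95. This convention is inferred from the table values and is not stated by the authors. A minimal Python sketch, with the hypothetical helper name `identity_percent`:

```python
import math
from fractions import Fraction


def identity_percent(matches: int, total: int, decimals: int = 2) -> float:
    """Percent identity truncated (floored) to the given number of decimals.

    Exact rational arithmetic avoids floating-point artifacts right at the
    truncation boundary.
    """
    scale = 10 ** decimals
    return math.floor(Fraction(matches * 100 * scale, total)) / scale


# Match/total pairs taken from the HSP tables above
print(identity_percent(408_881, 408_885))  # R_HSP2
print(identity_percent(48_433, 48_453))    # R_pHcu190_HSP2
print(identity_percent(135_775, 135_775))  # R_HSP5
```

With rounding instead of truncation, the first two values would come out as 100.00 and 99.96, which would not agree with the tabulated 99.99 and 99.95.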

Table A1.

Serial	Category	Length (bp)	Region (63‐R2)	Region (NRC‐1)	Region (R1)	MGE type	Target sequence duplication (TSD) (bp)	Comment
1	CA	1403	8546–9948	‐	‐	ISH3B	5
2	CA	842	10,071–10,912	‐	‐	ISH2 + ISH8A (partial)	10	Special case A
3	CA	1012	‐	‐	35,961–36,972	ISH4	8
4	CA	1413	‐	‐	56,795–58,207	ISH8B	10	Integration site corresponds to core genome positions 55,782 in strains 63‐R2 and R1 and 55,781 in strain NRC‐1
5	CA	531	58,039–58,569	‐	‐	ISH2	10	Integration site corresponds to core genome positions 55,793 in strains 63‐R2 and R1 and 55,792 in strain NRC‐1
6	CA	1413	‐	56,171–57,583	‐	ISH8B	10	Integration site corresponds to core genome positions 56,174 in strains 63‐R2 and R1 and 56,173 in strain NRC‐1
7	CA	1130	61,642–62,771	‐	‐	ISH1	8
8	CA	1413	90,788–92,200	‐	89,307–90,719	ISH8B	10
9	CA	1403	‐	95,825–97,227	‐	ISH3B	5
10	CA	1394	‐	99,327–100,720	‐	ISH3C	5
11	CA	531	103,081–103,611	‐	‐	ISH2	10
12	CA	531	119,673–120,203	‐	117,661–118,191	ISH2	10
13	CA	531	‐	176,934–177,464	‐	ISH2	10	Special case B
14	CB	1413	267,225–268,637	265,581–266,993	255,729–257,141	ISH8B	10
15	CA	1130	‐	‐	289,699–290,828	ISH1	8
16	CA	1456	533,027–534,482	‐	‐	ISH6	8
17	CA	1012	‐	697,026–698,037	‐	ISH4	8
18	CA	1403	753,135–754,537	‐	‐	ISH8E	None	Special case C
19	CA	1394	‐	‐	741,710–743,103	ISH3C	5
20	CA	1394	‐	‐	744,664–746,057	ISH3C	5
21	CA	531	762,082–762,612	‐	‐	ISH2	10	Integration site corresponds to core genome position 752,839 in strain 63‐R2, 752,882 in strain NRC‐1 and 743,400 in strain R1
22	CA	531	‐	759,508–760,038	‐	ISH2	10	Integration site corresponds to core genome position 753,711 in strain 63‐R2, 753,754 in strain NRC‐1 and 744,272 in strain R1
23	CA	1394	765,891–767,284	‐	‐	ISH3C	5	Integration site corresponds to core genome position 756,117 in strain 63‐R2, 756,160 in strain NRC‐1 and 746,678 in strain R1
24	CA	531	‐	771,563–772,093	‐	ISH2	10	Integration site corresponds to core genome position 765,235 in strain 63‐R2, 765,278 in strain NRC‐1 and 755,796 in strain R1
25	CA	1130	990,664–991,793	‐	978,358–979,487	ISH1
26	CA	1074	‐	1,184,712–1,185,785	‐	ISH11	6
27	CA	1394	‐	1,186,471–1,187,864	‐	ISH3C	5
28	CA	532	‐	‐	1,220,109–1,220,640	ISH2	11	Integration site corresponds to core genome position 1,220,155 in strain 63‐R2, 1,220,173 in strain NRC‐1 and 1,210,691 in strain R1
29	CA	531	‐	1,230,337–1,230,867	‐	ISH2	10	Integration site corresponds to core genome position 1,221,035 in strain 63‐R2, 1,221,053 in strain NRC‐1 and 1,211,571 in strain R1
30	CA	1413	‐	1,231,045–1,232,457	‐	ISH8E	10
31	CA	1413	‐	1,608,077–1,609,489	‐	ISH8B	10
32	CB	1853	1,987,893–1,989,745	1,987,839–1,989,691	1,975,957–1,977,809	ISH34	‐	An IS605‐type element; MGEs of this type never generate a TSD
33	CA	1394	‐	2,004,215–2,005,608	‐	ISH3C	5

Table A2.

Serial	Category	Length (bp)	Region (91‐R6)	MGE type	Target sequence duplication (TSD) (bp)
1	CB	1394	1,027,440–1,028,833	ISHsal1	5
2	CB	1657	1,203,639–1,205,295	ISNpe8	7
3	CB	413	1,408,020–1,408,432	MITEHsal2	8
4	CB	411	1,455,597–1,456,007	MITEHsal2	6
5	CB	1592	1,593,564–1,595,155	ISH10	8

Table A3.

Serial	Length (bp)	Region (pHcu235)	Region (pHS3)	MGE type	Target sequence duplication (TSD) (bp)
1	1394	206,223–207,616	‐	ISH3C	5
2	1403	220,507–221,909	‐	ISH3B	5
3	531	233,778–234,308	‐	ISH2	10
4	1403	‐	140,883–142,285	ISH3B	5
5	1389	175,248–176,641	‐	ISH3D	5

Table A4.

Serial	Length (bp)	Region (pHcu43)	Region (pHS4)	MGE type	Target sequence duplication (TSD) (bp)
1	531	39,231–39,761	‐	ISH2	10
2	1394	7287–8680	‐	ISH3D	5
3	1413	12,184–13,596	‐	ISH8B	10
4	1413	‐	30,920–32,332	ISH8B	10

Strain	Position	Length (bp)	G + C (%)	Protein range
91‐R6	216,886–264,382	47,497	56.4	HBSAL_01115 to HBSAL_01355
63‐R2	12,851–74,116	61,266	56.1	LJ422_00060 to LJ422_00390
NRC‐1	10,606–71,619	61,014	56.1	VNG_0011C to VNG_0080H
R1	10,606–72,635	62,030	56.2	OE_1018F to OE_1136F

Serial	Length (bp)	Region (pHcu190)	Region (pNRC100)	Region (pHS1)	MGE type	Target sequence duplication (TSD) (bp)	Comment
1	1394	‐	‐	22,454–23,847	ISH3C	5
2	1076	36,765–37,840	36,765–37,840	‐	ISH11	8
3	531	‐	‐	40,645–41,175	ISH2	10
4	1413	58,502–59,914	58,502–59,914	‐	ISH8B	10
5	531	70,690–71,220	70,690–71,220	‐	ISH2	10
6	1403	81,981–83,383	81,981–83,383	‐	ISH3B	5
7	1413	‐	‐	104,639–106,051	ISH8B	10
8	1013	‐	‐	106,149–107,161	ISH4	9
9	1394	101,441–102,834	101,441–102,834	‐	ISH3D	5
10	1394	105,570–106,963	105,570–106,963	‐	ISH3C	5
11	1413	‐	‐	134,541–135,953	ISH8B	10
12	531	‐	153,539–154,069	‐	ISH2	10	Region absent from pHS1
