2015 Nov;118:313–321. doi: 10.1016/j.biochi.2015.07.008

Comparative genomics reveals conserved positioning of essential genomic clusters in highly rearranged Thermococcales chromosomes

Matteo Cossu, Violette Da Cunha, Claire Toffano-Nioche, Patrick Forterre, Jacques Oberto
PMCID: PMC4640148  PMID: 26166067

Abstract

The genomes of the 21 completely sequenced Thermococcales display a characteristic high level of rearrangements. As a result, the prediction of their origin and termination of replication on the sole basis of chromosomal DNA composition or skew is inoperative. Using a different approach based on biologically relevant sequences, we were able to determine the oriC position in all 21 genomes. The position of dif, the site where chromosome dimers are resolved before DNA segregation, could be predicted in 19 genomes. Computation of the core genome uncovered a number of essential gene clusters with a remarkably stable chromosomal position across species, in sharp contrast with the scrambled nature of their genomes. The active chromosomal reorganization of numerous genes acquired by horizontal transfer, mainly from mobile elements, could explain this phenomenon.

Keywords: Archaea, Thermococcales, Genome evolution, Mobile elements, Bioinformatics, Chromosomal landmarks

Abbreviations: dif, chromosome dimer resolution site; NCBI, National Center for Biotechnology Information; nt, nucleotide; ORB, origin recognition boxes; oriC, origin of replication; PSSM, position-specific scoring matrix; TCA, tricarboxylic acid; RPKM, reads per kilobase per million mapped reads

Highlights

  • Thermococcales chromosomal landmarks were uncovered using biologically relevant sequences.

  • Core genome procedures predict integration of mobile elements on Thermococcales chromosomes.

  • Thermococcales genomes are highly rearranged but core cluster positions remain invariable.

  • Thermococcales core genes are more expressed and predominantly encoded on the leading strand.

1. Introduction

The discovery of anaerobic hyperthermophilic microbes by Karl Stetter and Wolfram Zillig extended the limits of life beyond environmental barriers commonly considered insuperable. Inhospitable habitats such as saline thermal pools and deep sea hydrothermal vents have been remarkably colonized by these extremophilic life forms. The organisms whose optimal growth temperature approaches or exceeds that of boiling water belong exclusively to the third domain of life: the Archaea. A significant proportion of microorganisms thriving at the fringe of life in terms of temperature belong to the taxonomic order Thermococcales, ranked in the Euryarchaeota phylum [1]. Thermococcales are divided into three principal genera: Pyrococcus, Thermococcus and Palaeococcus, and grow chemoorganoheterotrophically at temperatures ranging from 80 °C to 100 °C [2]. They require a source of protein and present variable amino acid requirements; several species such as Pyrococcus furiosus and Thermococcus kodakarensis are able to use chitin as a carbon source [3]. Thermococcales grow easily in the laboratory in complete or synthetic media under strict anoxia. To produce energy, these Archaea prefer anaerobic respiration using elemental sulfur (S⁰) as terminal electron acceptor to produce hydrogen sulfide. Alternatively, they are able to ferment pyruvate to produce hydrogen [2]. Such unique growth parameters prompted several teams to investigate biosynthetic pathways in Thermococcales. The central metabolism differs quite notably from previously known pathways. The pentose pathway is absent, the TCA cycle is incomplete and glycolysis uses a number of enzymes remarkably different from the canonical view [2]. Even if the net energy balance is still subject to debate, it appears that these Archaea are geared towards an extremely conservative use of energy [2].
Despite their extreme growth conditions, low energetic efficiency and simplified biochemistry, Thermococcales display a very short generation time, as low as 23 min [4]. This doubling interval is remarkably similar to that of the fast-growing model microbe Escherichia coli, grown under the much more favorable conditions of aerobic respiration [5]. The growth efficiency of Thermococcales is in sharp contrast with the apparent disorganization of their chromosome. Indeed, it has been reported that these genomes are subjected to a shuffling-driven evolution [6]. This apparent paradox prompted us to investigate, in this work, the process of fast cell growth and rapid chromosome replication by analyzing genomic organization and replication patterns of the completely sequenced Thermococcales.

2. Material and methods

2.1. Genomic data files retrieval and formatting

GenBank genomic data files corresponding to the 21 Thermococcales species were retrieved locally from the NCBI repository using four sequential commands from NCBI Entrez Programming Utilities (E-Utilities). This redundant procedure was defined in order to guarantee retrieval of the main chromosome of complete genomes exclusively. The first command allows retrieval of the species-specific bioproject:

The second command examines the 'Sequencing_Status' flag for completeness:

The third command retrieves the unique and chromosome-specific GenBank Identification (GI) number:

The fourth command retrieves locally the organism-specific data file in GenBank format:

The Thermococcales protein sequences were extracted in FASTA format from these GenBank files using an in-house C# parsing script retaining only the actual amino acid sequence and the unique genomic identification number (GI). All proteins were merged into a single database which was converted to binary format using the NCBI executable 'makeblastdb'. The same script generated a separate indexed file where each individual protein was represented using the following fields: ORF genomic orientation, ORF starting and ending coordinates, gene name, unique protein GI identifier, protein function and source organism name.

2.2. Thermococcales phylogenetic tree

DNA sequences corresponding to the 16S ribosomal RNA genes were retrieved using the BAGET web service at http://archaea.u-psud.fr/bin/baget.dll [7]. The PhyML phylogeny was computed using the web service at http://phylogeny.lirmm.fr/ [8].

2.3. Thermococcales origin of replication prediction

Replication origin predictions with GC skew or Z-curve methods were performed using the software Ori-Finder 2 available at http://tubic.tju.edu.cn/Ori-Finder2/ [9]. In a second predictive method, we used the mini-ORB sequences identified in Pyrococcus abyssi by Matsunaga et al. [10] as a matrix for oriC prediction using FITBAR, available at http://archaea.u-psud.fr/fitbar [11]. In this case, the search algorithm parameters were a log-odds PSSM with a local Markov model to compute the p-value of each newly predicted ORB site, and the search was restricted to intergenic regions. We considered as putative replication origins the intergenic regions where more than 4 mini-ORBs could be predicted using FITBAR with p-values < 0.005. These results were compared to those obtained with Ori-Finder 2 using, as ORB sequences, the three motifs predicted for Thermococcaceae. These three conserved ORB motifs were obtained by comparing the Thermococcales replication origins recorded in the DoriC database [12]; they were calculated with the MEME tool (Multiple EM for Motif Elicitation), which discovers conserved patterns in related DNA sequences [13].
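The selection rule stated above (more than 4 mini-ORB hits with p < 0.005 in one intergenic region) can be sketched as follows. This is a minimal illustration only: the `(region_id, p_value)` input layout and the region names are hypothetical and do not reproduce the FITBAR output format.

```python
from collections import Counter

def candidate_origins(hits, p_cutoff=0.005, min_sites=4):
    """Return intergenic regions holding more than `min_sites` significant hits.

    `hits` is an iterable of (region_id, p_value) pairs. This layout is a
    hypothetical stand-in for a motif-scanner report, not FITBAR's format.
    """
    counts = Counter(region for region, p in hits if p < p_cutoff)
    return sorted(region for region, n in counts.items() if n > min_sites)

# Invented example data: only "IG_12" has more than 4 significant hits.
example_hits = (
    [("IG_12", 0.001)] * 5    # 5 significant hits: retained
    + [("IG_40", 0.001)] * 4  # exactly 4 hits: not "more than 4"
    + [("IG_77", 0.02)] * 6   # many hits, but none below the p-value cutoff
)
print(candidate_origins(example_hits))
```

The strict inequalities mirror the wording of the text ("more than 4", "p-values < 0.005").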

2.4. Thermococcales dif site prediction

The identification of dif sites on the 21 sequenced Thermococcales chromosomes was performed using a consensus sequence deduced from the alignment of predicted dif sites in P. abyssi, Pyrococcus horikoshii, P. furiosus and Thermococcus kodakarensis [14]. This consensus was then used to perform dif site prediction using FITBAR with the same search algorithm parameters as described above for ORB prediction, but on the whole chromosome. Every newly predicted sequence was progressively added to the consensus to improve detection sensitivity.
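As an illustration of the log-odds PSSM search and of the progressive enrichment of the matrix, the following minimal sketch builds a matrix from the four seed dif sites listed in Table 2 and scans a toy sequence. The uniform background, the pseudocount and the toy sequence are assumptions made for the example; FITBAR uses a local Markov model and computes p-values, which are not reproduced here.

```python
import math

# 28-nt dif sites of the four seed species, copied from Table 2
# (P. abyssi, P. horikoshii, P. furiosus, T. kodakarensis).
SEED_SITES = [
    "ATTGGATATAATCGGCCTTATATCTAAA",
    "TTTAGATATAATCAGCCTTATATCTAAA",
    "TTTAGATATAATCAGCCTTATATCTAAA",
    "TTTTGATATAATGTACCTTATATGACAA",
]

def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Log-odds PSSM (one dict per column) from equal-length aligned sites.

    The uniform background and the pseudocount value are assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(length):
        column = [s[i] for s in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm

def score(pssm, window):
    """Sum of the per-column log-odds for one window."""
    return sum(col[base] for col, base in zip(pssm, window))

def best_hit(pssm, sequence):
    """Return (score, start) of the best-scoring window on the given strand."""
    width = len(pssm)
    return max(
        (score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Toy sequence: the T. gammatolerans dif site from Table 2 between invented flanks.
toy = "GCGCGCGCGC" + "TTTGGATATAATGTACCTTATATCTAAA" + "GCGCGCGCGC"
pssm = build_pssm(SEED_SITES)
hit_score, hit_start = best_hit(pssm, toy)

# Progressive refinement: add the new site and rebuild the matrix.
refined = build_pssm(SEED_SITES + [toy[hit_start:hit_start + 28]])
```

Only one strand is scanned in this sketch; a genome-wide search would also need to score the reverse complement.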

2.5. Homology searches of XerA recombinase

Thermococcales XerA orthologs were searched by BLASTp analysis using the amino acid sequence of P. abyssi XerA (NP_126073.1). A second predictive method was performed using SYNTTAX web service [15] available at http://archaea.u-psud.fr/synttax.

2.6. Core genome procedure

The core genome procedure was conducted as follows. We designed a C# script to construct protein orthologous groups by non-redundant bi-directional BLASTs. Every BLAST score was normalized to the score obtained by aligning the query and hit proteins to themselves. Proteins with normalized bi-directional BLAST scores > 30% were considered orthologous, as recommended by Lerat et al. [16]. A second C# script was designed to query the orthologous groups and define the core genome, which consists of all protein genes present at least once in every genome of the dataset. A 'single core' dataset was derived from this core genome by excluding orthologous classes containing more than a single representative per genome.
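The logic of this procedure can be sketched in a few lines. The sketch below is not the C# script used in this work: dividing each directional score by the query self-score is one plausible reading of the normalization, and the data layouts (`scores`, `groups`) and all identifiers in the example are hypothetical.

```python
from collections import Counter

def is_orthologous(a, b, scores, threshold=0.30):
    """True when both normalized directional scores exceed `threshold`.

    `scores[(q, h)]` is the BLAST score of query q against hit h; self-hits
    (q, q) must be present. Normalizing by the query self-score is an
    assumption about how the normalization was performed.
    """
    forward = scores.get((a, b), 0.0) / scores[(a, a)]
    reverse = scores.get((b, a), 0.0) / scores[(b, b)]
    return forward > threshold and reverse > threshold

def core_groups(groups, genomes):
    """Split orthologous groups into the general core and the single core.

    `groups` maps a group id to a list of (genome, protein) members.
    General core: at least one member in every genome.
    Single core: exactly one member in every genome.
    """
    general, single = [], []
    for gid, members in groups.items():
        counts = Counter(genome for genome, _protein in members)
        if all(counts[g] >= 1 for g in genomes):
            general.append(gid)
            if all(counts[g] == 1 for g in genomes):
                single.append(gid)
    return general, single

# Invented example with three genomes.
genomes = ["gA", "gB", "gC"]
groups = {
    "grp1": [("gA", "p1"), ("gB", "p2"), ("gC", "p3")],                # single core
    "grp2": [("gA", "p4"), ("gA", "p5"), ("gB", "p6"), ("gC", "p7")],  # general core only
    "grp3": [("gA", "p8"), ("gB", "p9")],                              # not core
}
scores = {("p1", "p1"): 200.0, ("p2", "p2"): 180.0,
          ("p1", "p2"): 90.0, ("p2", "p1"): 50.0}
```

With these invented scores, p1 and p2 fail the test because the reverse normalized score (50/180) is below 30%, which shows why both directions are checked.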

2.7. Core genome chromosomal positioning

For each gene composing the single core, we calculated the mean distance to the predicted origin of replication and its standard deviation (SD) using an in-house C# script. The core genes were then successively ranked by mean distance and SD to highlight the presence of clusters.
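A minimal sketch of this ranking is given below. It assumes that the distance is the shortest arc between gene and oriC on the circular chromosome, expressed as a percentage of chromosome length, and that the population SD is used; neither choice is specified above, and the data layout is hypothetical.

```python
from statistics import mean, pstdev

def ori_distance_pct(gene_pos, ori_pos, genome_len):
    """Shortest circular distance from a gene to oriC, in % of genome length."""
    d = abs(gene_pos - ori_pos) % genome_len
    return 100.0 * min(d, genome_len - d) / genome_len

def rank_core_genes(positions, origins, lengths):
    """Rank genes by mean distance to oriC, then by SD.

    positions[gene][genome] -> gene coordinate (hypothetical layout)
    origins[genome]         -> predicted oriC coordinate
    lengths[genome]         -> chromosome length
    """
    rows = []
    for gene, per_genome in positions.items():
        distances = [
            ori_distance_pct(pos, origins[g], lengths[g])
            for g, pos in per_genome.items()
        ]
        rows.append((gene, mean(distances), pstdev(distances)))
    return sorted(rows, key=lambda row: (row[1], row[2]))

# Invented example: two genomes of 2000 units, two core genes.
origins = {"gA": 0, "gB": 1900}
lengths = {"gA": 2000, "gB": 2000}
positions = {
    "geneX": {"gA": 200, "gB": 100},    # 10% from oriC in both genomes
    "geneY": {"gA": 1000, "gB": 1700},  # 50% and 10%: variable position
}
ranking = rank_core_genes(positions, origins, lengths)
```

Genes with a small SD, such as `geneX` in the invented data, are those whose position relative to oriC is conserved across genomes.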

2.8. P. abyssi genome expression

In order to quantify the expression level of every gene in P. abyssi, we used RNA-seq data obtained across several growth phases as described in Ref. [17]. Because the sequencing was strand-specific, the read alignments respect the strand of the DNA molecule. The CompareOverlapping tool from the S-mart toolbox [18] was used (with the -c option to respect the strand constraint) to determine the number of overlapping reads for every CDS feature defined in the NC_000868.1 entry from the NCBI repository. For each gene, the RPKM measurement defined in Ref. [19] was computed based on the number of overlapping reads, a read size of 40 nt, and a total of 5587560 aligned reads. We used the RPKM value of each gene as an estimate of its expression level.
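For reference, the conventional RPKM definition (reads per kilobase of feature per million mapped reads) can be written as a short function. This is a sketch of the standard formula only; how the 40 nt read size entered the computation is not detailed above, so it is not used here, and the example counts are invented.

```python
TOTAL_ALIGNED_READS = 5_587_560  # total number of aligned reads stated in the text

def rpkm(read_count, cds_length_nt, total_reads=TOTAL_ALIGNED_READS):
    """Conventional RPKM: reads per kilobase of CDS per million mapped reads."""
    if cds_length_nt <= 0 or total_reads <= 0:
        raise ValueError("CDS length and total read count must be positive")
    return read_count * 1e9 / (cds_length_nt * total_reads)

# Invented example: 1200 reads overlapping a 900-nt CDS.
example_value = rpkm(1200, 900)
```

Because the count is divided by both feature length and library size, values are comparable between genes of different lengths within the same dataset.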

3. Results

3.1. Thermococcales genomic dataset

At the time of writing, 21 Thermococcales genomes have been completely sequenced and annotated. They are publicly available at the NCBI repository and consist of 13 Thermococcus, 7 Pyrococcus and 1 Palaeococcus (Table 1). Thermococcales carry a single ∼2 Mb chromosome and encode an average of 2100 proteins. Evolutionary relationships among the various species are illustrated by a phylogenetic tree of their 16S ribosomal RNA genes (Fig. 1). Genomic sequences were retrieved as described in Materials and Methods. The comparative genomic analysis presented here is based on this entire dataset. The first step of this analysis consisted in the identification of chromosomal landmarks such as the origin and terminus of DNA replication followed in a second step by the comparison of the protein content at the genomic level.

Table 1.

List of Thermococcales species with a complete genome sequence available.

| Species | Bioproject | GI | Genes | Size (Mb) | GC% | Optimum T (°C) | Habitat | Reference |
|---|---|---|---|---|---|---|---|---|
| Palaeococcus pacificus DY20341 | PRJNA207495 | 664800204 | 2046 | 1.86 | 43.0 | 80 °C | Aquatic | [57] |
| Pyrococcus abyssi GE5 | PRJNA62903 | 14518450 | 1875 | 1.77 | 44.71 | 103 °C/90 °C | Aquatic | [58] |
| Pyrococcus furiosus DSM 3638 | PRJNA57873 | 18976372 | 2225 | 1.90 | 40.77 | 100 °C/90 °C | Aquatic | [59] |
| Pyrococcus furiosus COM1 | PRJNA169620 | 397650687 | 2113 | 1.91 | 40.79 | 100 °C | Aquatic | [60] |
| Pyrococcus horikoshii OT3 | PRJNA57753 | 14589963 | 2000 | 1.73 | 41.88 | 98 °C/95 °C | Aquatic | [61] |
| Pyrococcus sp. NA2 | PRJNA66551 | 332157643 | 2028 | 1.86 | 42.74 | 93 °C | Aquatic | [62] |
| Pyrococcus sp. ST04 | PRJNA167261 | 389851449 | 1839 | 1.73 | 42.30 | 95 °C | Aquatic | [63] |
| Pyrococcus yayanosii CH1 | PRJNA68281 | 337283511 | 1952 | 1.72 | 51.64 | 98 °C | Aquatic | [64] |
| Thermococcus barophilus MP | PRJNA54733 | 315229765 | 2257 | 2.01 | 41.76 | 85 °C | Aquatic | [65] |
| Thermococcus eurythermalis strain A501 | PRJNA251677 | 700302025 | 2183 | 2.12 | 53.47 | 85 °C | Aquatic | [66] |
| Thermococcus gammatolerans EJ3 | PRJNA59389 | 240102057 | 2210 | 2.05 | 53.56 | 88 °C | Aquatic | [67] |
| Thermococcus guaymasensis DSM11113 | PRJNA230529 | 744793172 | 2170 | 1.92 | 52.86 | 88 °C | Aquatic | Zhang, X. et al., 2015 |
| Thermococcus kodakarensis KOD1 | PRJNA58225 | 57639935 | 2358 | 2.09 | 52.00 | 85 °C | Aquatic | [68] |
| Thermococcus litoralis DSM 5473 | PRJNA82997 | 530547444 | 2575 | 2.22 | 43.09 | 83 °C | Aquatic | [69] |
| Thermococcus nautili strain 30-1 | PRJNA237737 | 589908590 | 2288 | 1.97 | 54.84 | 87.5 °C | Aquatic | [70] |
| Thermococcus onnurineus NA1 | PRJNA59043 | 212223144 | 2026 | 1.85 | 51.27 | 80 °C | Terrestrial | [71] |
| Thermococcus sibiricus MM 739 | PRJNA59399 | 242397997 | 2107 | 1.85 | 40.20 | 78 °C | Oil | [72] |
| Thermococcus sp. 4557 | PRJNA70841 | 341581088 | 2181 | 2.01 | 56.08 | ND | Aquatic | [73] |
| Thermococcus sp. AM4 | PRJNA54735 | 350525682 | 2279 | 2.08 | 54.78 | 80 °C | Aquatic | [74] |
| Thermococcus sp. CL1 | PRJNA168259/PRJNA167371 | 390960176 | 2090 | 1.95 | 55.82 | 85 °C | Aquatic | [75] |
| Thermococcus sp. ES1 | PRJNA230233 | 573023865 | 2090 | 1.95 | 40.30 | 82 °C | Aquatic | [76] |

Fig. 1.

Phylogenetic tree of the 21 sequenced Thermococcales. The phylogeny of the Thermococcales dataset was calculated with PhyML using the 16S ribosomal RNA genes as described in Material and Methods.

3.2. Prediction of Thermococcales DNA replication origins

The duplication and transmission of genetic information without loss is of fundamental importance for living cells. Cell division must be accompanied by DNA replication executed with appropriate timing and frequency. In all organisms, replication initiates at specific region(s) of the genome known as the origin of replication (oriC) site(s). Eukaryotic DNA replication is initiated at multiple origins at different times across linear chromosomes. In eukaryotes, the origin recognition complex (ORC) contains six separate polypeptides, Orc1-6. Comparative genomic analysis of whole archaeal genome sequences shows that the archaeal machinery responsible for DNA replication is largely homologous to that of eukaryotes and is clearly distinct from its bacterial counterpart [20], [21]. It has been shown experimentally that the archaeal origin binding protein is homologous to the related eukaryotic Orc1 and Cdc6 proteins [22]. The fine mapping of the three replication origins in Sulfolobus solfataricus led to the identification of origin recognition boxes (ORBs) and mini-ORBs [23]. ORBs are repeated sequences located on both sides of A/T rich regions and were shown to be the binding sites for Cdc6 proteins [23]. ORBs from different species share sequence similarity with a consensus sequence referred to as mini-ORB. It was shown that mini-ORBs are sufficient to bind Cdc6 proteins and that Cdc6 from one organism (Cdc6-1 of S. solfataricus) can bind ORBs from other species in vitro (P. furiosus, Halobacterium NRC1) [23]. ORB sites are well conserved across many archaeal species, and specific binding of ORB sequences by Cdc6 is likely to be a common mechanism for origin recognition in Archaea [22], [24], [25], [26]. Several archaeal species such as S. solfataricus, Sulfolobus acidocaldarius, Haloferax volcanii and Aeropyrum pernix possess multiple oriC per chromosome [23], [27], [28], [29].
Multiple chromosomal replication origins might have arisen by capture of viral or plasmidic replication origins and their respective associated initiator factor [21]. On the other hand, single origins were found in Methanothermobacter thermautotrophicus [24] and mapped precisely in the Thermococcales genus Pyrococcus [22], [30]. In order to compare our genomic dataset, it was fundamental to identify a common and unique genomic feature shared by all 21 Thermococcales genomes under study. Since the origin of replication was shown to be unique in these genomes, we proceeded with a computational prediction of their respective locations. Several bioinformatics techniques have been used to locate origins of replication in prokaryotic genomes; they are based on measuring asymmetric nucleotide composition between leading and lagging strands. Cumulative GC-skew plots are commonly used for this purpose [31], [32], [33], [34]. The oriC of the Thermococcales species P. abyssi, P. horikoshii and P. furiosus has been located using other skewed sequences such as GGTT and GGGT [6], [30]. However, these two particular skews and the remaining 254 tetranucleotide combinations failed to reliably predict Thermococcus origins (data not shown). Alternative scoring methods such as Z-curve calculation have been used successfully for the archaea Methanocaldococcus jannaschii, Methanosarcina mazei, Halobacterium sp. strain NRC-1 and S. solfataricus P2 [9]. Cumulative GC skew and Z-curve methods were tested on Thermococcales genomes using the Ori-Finder 2 web service [9], and the results obtained with four representative genomes are shown in Supplemental Fig. S1. Our results show that the cumulative GC skew method fails to locate replication origins in Thermococcales. The Z-curve approach yields a prediction for a few genomes, such as P. abyssi and T. kodakarensis, but not for the remaining genomes.
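For readers unfamiliar with the skew approach, the quantity involved can be sketched as follows. The windowed skew (G − C)/(G + C) and its cumulative sum are the standard definitions; this is not the Ori-Finder 2 implementation, and the synthetic sequence is constructed only to show the expected behavior when a strand bias exists.

```python
def gc_skew(window):
    """GC skew (G - C) / (G + C) of one window; 0.0 if the window has no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0

def windowed_skew(seq, window=1000, step=500):
    """List of (start, skew) for sliding windows along the sequence."""
    seq = seq.upper()
    last_start = max(len(seq) - window, 0)
    return [(i, gc_skew(seq[i:i + window])) for i in range(0, last_start + 1, step)]

def cumulative_skew(seq, window=1000):
    """Cumulative sum of the skew over consecutive non-overlapping windows."""
    total, curve = 0.0, []
    for start, skew in windowed_skew(seq, window, step=window):
        total += skew
        curve.append((start, total))
    return curve

# Synthetic sequence with a perfect bias switch at position 2000.
synthetic = "G" * 2000 + "C" * 2000
curve = cumulative_skew(synthetic, window=1000)
```

In a genome with a clear strand bias, the extrema of the cumulative curve are candidate positions for the origin and terminus; in highly rearranged genomes no such clear extrema emerge, which is the failure reported here.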
Clearly, methods based on Z-curve and DNA composition bias or skew were inoperative for the robust prediction of replication origins in Thermococcales. Therefore, in order to map the position of the replication origins, we adopted a different approach based on the systematic detection of biological sequences associated with the initiation of DNA synthesis. As described above, these repeated sequences, called ORBs, are clustered at or near the replication origin and are often closely associated with the Cdc6 gene, which encodes a protein involved in the initiation of replication [10]. All Thermococcales encode a unique Cdc6 gene except Thermococcus sp. CL1, which encodes a second putative Cdc6-related protein (gene CL1_0695). Using the published archaeal mini-ORB sequences [10], the web service FITBAR [11] was used to build a consensus sequence and detect its occurrences genome-wide, as described in Materials and Methods. A unique oriC could be detected unambiguously in all Thermococcales from the dataset with a p-value < 0.005 (Table 2 and Suppl. Fig. S2). No putative ORB sequence could be found near the second Cdc6-related gene of Thermococcus sp. CL1, and this observation is in agreement with Ori-Finder 2 predictions (data not shown). The association between oriC and Cdc6 was found in all genomes except Thermococcus litoralis and Thermococcus sibiricus, where the oriC-Cdc6 distance is 453 kb and 349 kb, respectively. Synteny analysis using the SYNTTAX web service [15] indicated that in the Thermococcus and Palaeococcus genera, oriC is located between Cdc6 and the Rad51 ortholog RadA (Suppl. Fig. S3A). Like its bacterial RecA and eukaryal Rad51 orthologs, RadA is involved not only in double-strand break repair but also in DNA replication, by rescuing collapsed replication forks [35]. In the Pyrococcus genus, Cdc6 and oriC are also immediately adjacent whereas RadA is not syntenic (Suppl. Fig. S3B).
In all cases, the origin of replication is located in extended non-translated regions or overlaps small computer-predicted orphan genes (Suppl. Fig. S3A&B). A prediction of clustered ORB sequences obtained with the FITBAR web service [11] was used to localize oriCs as shown in Supplemental Table S1. Our analysis indicates that the most robust oriC predictions are those based solely on mini-ORB clusters. The positions of these clusters were therefore considered as bona fide oriC (Table 2, column 2). Replication origin positioning was then used as the first common reference to align and orient all genomes in the dataset (Suppl. Fig. S2).

Table 2.

Prediction of oriC and dif in Thermococcales.

| Species | oriC: position on chromosome (ORB cluster coord.) | oriC: Cdc6 coord. | dif sequence (28 bp): left arm | dif sequence: spacer | dif sequence: right arm | dif: position on chromosome | dif: intergenic location |
|---|---|---|---|---|---|---|---|
| Palaeococcus pacificus DY20341 | 1858353..0 | 583..1839 | TTTGGATATAA | TCAACA | TTATATCTAAA | 1158048 | Yes |
| Pyrococcus abyssi GE5 | 122701..123499 | 121402..122700 | ATTGGATATAA | TCGGCC | TTATATCTAAA | 1220264 | Yes |
| Pyrococcus furiosus DSM 3638 | 15355..16235 | 16236..17498 | TTTAGATATAA | TCAGCC | TTATATCTAAA | 659548 | Yes |
| Pyrococcus furiosus COM1 | 1479769..1480649 | 1478506..1479768 | TTTAGATATAA | TCAGCC | TTATATCTAAA | 462638 | Yes |
| Pyrococcus horikoshii OT3 | 110790..111561 | 109476..110789 | TTTAGATATAA | TCAGCC | TTATATCTAAA | 736581 | Yes |
| Pyrococcus sp. NA2 | 579324..580109 | 578064..579323 | ND | | | | |
| Pyrococcus sp. ST04 | 227904..228761 | 228762..230021 | ND | | | | |
| Pyrococcus yayanosii CH1 | 1426398..1427171 | 1427172..1428431 | TTTAGATATAA | TGATCC | TTATATCTAAA | 1058381 | Yes |
| Thermococcus barophilus MP | 1672620..1673707 | 1670448..1671713 | TTGTCATATAA | TATGCC | TTATATCTAAA | 880625 | Yes |
| Thermococcus eurythermalis strain A501 | 425720..426421 | 423614..424867 | TTTAGATATAA | TGTACC | TTATATCTAAA | 1862025 | Yes |
| Thermococcus gammatolerans EJ3 | 126739..127591 | 125431..126738 | TTTGGATATAA | TGTACC | TTATATCTAAA | 1457065 | Yes |
| Thermococcus guaymasensis DSM11113 | 813701..814368 | 1594403..1595665 | TTTAGATATAA | TGTGCC | TTATATCTCAA | 100930 | Yes |
| Thermococcus kodakarensis KOD1 | 1711251..1712157 | 1712158..1713405 | TTTTGATATAA | TGTACC | TTATATGACAA | 483614 | Yes |
| Thermococcus litoralis DSM 5473 | 974680..975085 | 1594403..1595665 | TTTGGATATAA | TGTGCC | TTATATGACAA | 1867166 | No |
| Thermococcus nautili strain 30-1 | 1603522..1604207 | 1605068..1606321 | TTGAGATATAA | TGTACC | TTATATCTAAA | 772784 | Yes |
| Thermococcus onnurineus NA1 | 1510250..1510926 | 1508116..1509363 | TTTAGATATAA | TGTGTC | TTATATCTAAA | 854799 | Yes |
| Thermococcus sibiricus MM 739 | 1783451..1784177 | 1434100..1435362 | TTGTCATATAA | TAAGCC | TTATATCTAAA | 689121 | No |
| Thermococcus sp. 4557 | 1373703..1374410 | 1376165..1377412 | TTTTCCTATAA | TGTGCC | TTATATCTAAA | 97343 | Yes |
| Thermococcus sp. AM4 | 1530315..1531266 | 1529070..1530314 | TTTGGATATAA | TGTGCC | TTATATCCAAA | 849102 | Yes |
| Thermococcus sp. CL1 | 1018000..1018309 | 1020367..1021614 | TTTGGATATAA | TGTACC | TTATATCCAAA | 1704316 | Yes |
| Thermococcus sp. ES1 | 1754560..1755481 | 1752377..1753639 | TTTAGATATAA | TGAATC | TTATATGACAA | 1028150 | Yes |
| Thermococcales dif consensus | | | WTKDSMTATAA | TVDDYM | TTATATSHMAA | | |

3.3. Prediction of Thermococcales DNA replication termination sites

As shown above, the cumulative GC skew cannot be used reliably to predict the location of terC, where Thermococcales terminate bidirectional DNA replication. So far, terC sites have received much less attention than oriC. To our knowledge, neither biological nor sequence data are available to define where replication forks meet. In accordance with the bacterial paradigm, archaeal DNA replication forks are believed to terminate in the vicinity of dif sites [14], [36]. These dif sites are present in a single copy per genome and are used by a Xer-like recombinase to resolve chromosome dimers, a critical step before their segregation into daughter cells [37]. The 28-nt dif site is composed of two inverted repeats of 11 base pairs (each one specific for one of the two Xer recombinases) separated by a central hexanucleotide; the XerCD/dif recombination system is widespread in the bacterial domain [38]. The efficiency of the archaeal XerA/dif system has been demonstrated in vitro [14]. By sequence homology search, XerA orthologs were found in single copy in all Thermococcales (data not shown). In order to identify dif sites in our dataset, we followed the same methodology as for oriC, described above. The biological dif sites proposed by Cortes et al. [14] were used to build a consensus for genome-wide searching using FITBAR [11]. Bona fide unique dif sites could be identified for 19 genomes out of 21 (Table 2 and Suppl. Fig. S2). The dif site positions of Pyrococcus sp. NA2 and Pyrococcus sp. ST04 were estimated to be opposite their respective predicted oriC.
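The 11-6-11 architecture of the site and the degenerate consensus reported in Table 2 can be illustrated with a simple pattern search, together with the diametrically opposite position used as an estimate for the two genomes lacking a predicted dif. This is only a sketch: exact matching of the IUPAC consensus is far less sensitive than the PSSM search performed with FITBAR, only one strand is scanned, and the function names, flanking bases and coordinates in the example are invented.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]", "R": "[AG]", "Y": "[CT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Thermococcales dif consensus from Table 2: left arm + spacer + right arm.
DIF_CONSENSUS = "WTKDSMTATAA" + "TVDDYM" + "TTATATSHMAA"

def consensus_regex(consensus):
    """Translate an IUPAC consensus into a regular-expression string."""
    return "".join(IUPAC[code] for code in consensus)

def find_sites(sequence, consensus=DIF_CONSENSUS):
    """Start positions of all (possibly overlapping) matches on the given strand."""
    pattern = re.compile("(?=(" + consensus_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

def opposite_position(ori_pos, genome_len):
    """Coordinate diametrically opposite oriC on a circular chromosome."""
    return (ori_pos + genome_len // 2) % genome_len

# P. abyssi dif site from Table 2, placed between invented flanking bases.
toy = "CCCC" + "ATTGGATATAATCGGCCTTATATCTAAA" + "GGGG"
print(find_sites(toy))
print(opposite_position(1500, 2000))
```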

3.4. Core genome

Early chromosomal alignments demonstrated the high level of recombinations and rearrangements in Thermococcales genomes [6]. These observations indicate that these genomes evolve rapidly, which might suggest that their genetic content is also highly variable among species. In order to quantify this genomic drift, we submitted our dataset to a recursive systematic comparison of the predicted protein sequences they encode. Each Thermococcales genome encodes an average of 2100 proteins. All the corresponding sequences were compared as described in Material and Methods in order to rank them into orthologous groups. These groups could then be queried to extract common proteins, defined as the 'core genome', as well as species-specific or genus-specific proteins and their combinations (Fig. 2). We have used two genetic subsets to define the core: a distinction was made between the 'general core', which contains protein orthologs and paralogs present in every genome, and a more restrictive 'single core', which regroups only single-copy orthologs shared by all genomes. The general core and single core amount to 790 and 668 proteins respectively (Fig. 2 and Suppl. Table S2A&B). A detailed gene list of the 668-protein single core is presented in Supplemental Table S3. The same procedure allowed the identification of genus-specific proteins as well. The Pyrococcus and Palaeococcus genera encoded respectively 19 and 116 specific proteins whereas a single Thermococcus-specific protein was found. As shown in Table 3, these proteins could be ranked into functional groups as defined in the archaeal clusters of orthologous genes (arCOGs) [39]. The core genome comprises proteins of the following classes: information storage and processing (32%), metabolism (30%), poorly characterized (27%) and cellular processes and signaling (11%). This high conservation is in sharp contrast with the very limited chromosomal alignment observed for these organisms [6].
Thus it seemed important to analyze whether this genomic conservation would be clustered to particular chromosomal locations.

Fig. 2.

Venn diagram of core and genus-specific protein counts. Core, genus-specific proteins and their combinations were computed as described in Materials and Methods.

Table 3.

ArCOG assignment of the Thermococcales core genes.

| ArCOG class | Function | 790 core | 668 core |
|---|---|---|---|
| Information storage and processing 32% (34%) | Translation, ribosomal structure and biogenesis | 149 | 140 |
| | RNA processing and modification | 0 | 0 |
| | Transcription | 52 | 43 |
| | Replication, recombination and repair | 51 | 45 |
| | Chromatin structure and dynamics | 0 | 0 |
| Cellular processes and signaling 11% (10%) | Cell cycle control, cell division, chromosome partitioning | 11 | 8 |
| | Nuclear structure | 0 | 0 |
| | Defense mechanisms | 11 | 8 |
| | Signal transduction mechanisms | 5 | 4 |
| | Cell wall/membrane/envelope biogenesis | 14 | 12 |
| | Cell motility | 7 | 5 |
| | Cytoskeleton | 0 | 0 |
| | Extracellular structures | 0 | 0 |
| | Intracellular trafficking, secretion, and vesicular transport | 8 | 8 |
| | Posttranslational modification, protein turnover, chaperones | 31 | 22 |
| | Mobilome: prophages, transposons | 0 | 0 |
| Metabolism 30% (27%) | Energy production and conversion | 52 | 28 |
| | Carbohydrate transport and metabolism | 33 | 30 |
| | Amino acid transport and metabolism | 45 | 36 |
| | Nucleotide transport and metabolism | 28 | 25 |
| | Coenzyme transport and metabolism | 41 | 36 |
| | Lipid transport and metabolism | 12 | 12 |
| | Inorganic ion transport and metabolism | 25 | 11 |
| | Secondary metabolites biosynthesis, transport and catabolism | 5 | 4 |
| Poorly characterized 27% (29%) | General function prediction only | 128 | 115 |
| | Function unknown | 82 | 76 |

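For each ArCOG class, the first percentage and the '790 core' column refer to the 790 core genes; the percentage in parentheses refers to the 668 single core genes.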

3.5. Core genome positioning

In Eukarya, genes involved in related and essential functions often cluster on the chromosome and are co-expressed, which correlates with elevated expression rates [40], [41]. In Archaea and Bacteria, these genes belong to single transcription units or operons, which provide tight co-regulation in addition to expression polarity [42]. Furthermore, bacterial genomes display a non-random gene organization at a higher level such as macrodomains [43] or with multiple scales [44]. Additional chromosomal structuring involves positioning of essential genes preferentially on the leading strand [45] and clustering of transcription and replication genes in the proximity of the bacterial origin of replication [46]. The archaeal chromosome organization has not been investigated in depth, with the exception of a few Crenarchaeota. It was shown that S. solfataricus and S. acidocaldarius are equipped with three origins of replication surrounded by a higher density of core or essential genes; furthermore, these same regions are more highly expressed [36]. These reports prompted us to investigate the genomic architecture of the euryarchaeal order Thermococcales. For each genome in the dataset, we constructed a detailed physical map indicating the position of each gene. We used our oriC and dif site predictions to determine the polarity of each gene relative to the orientation of the replication forks (Fig. 3 and Suppl. Fig. S2). These maps could be used to calculate the proportion of genes whose transcription is collinear with the orientation of DNA replication. Out of the 19 genomes where dif could be predicted, 16 display a higher proportion of genes encoded on the leading strand (Suppl. Table S4). Plotting of 'single core' genes onto the same circular physical maps indicated an even higher proportion of leading strand-encoded genes for 16 genomes (Suppl. Table S4).
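The polarity assignment described above can be sketched as follows, under the assumption that the two forks leave oriC in opposite directions and meet at dif. The coordinate convention (strand given as +1 or −1, gene represented by a single coordinate) and the example values are illustrative and not taken from the maps used in this work.

```python
def on_leading_strand(gene_pos, strand, ori, dif, genome_len):
    """True if a gene is transcribed in the direction of fork progression.

    `strand` is +1 (transcribed towards increasing coordinates) or -1.
    Genes between oriC and dif in the increasing-coordinate direction are
    replicated by the fork moving in the + direction; the remaining genes
    are replicated by the fork moving in the - direction.
    """
    to_gene = (gene_pos - ori) % genome_len
    to_dif = (dif - ori) % genome_len
    fork_direction = 1 if to_gene < to_dif else -1
    return strand == fork_direction

def leading_fraction(genes, ori, dif, genome_len):
    """Fraction of (position, strand) genes collinear with replication."""
    flags = [on_leading_strand(pos, s, ori, dif, genome_len) for pos, s in genes]
    return sum(flags) / len(flags)

# Invented example: 2000-unit chromosome, oriC at 0, dif at 1000.
example_genes = [(500, 1), (1500, 1), (1500, -1), (700, -1)]
fraction = leading_fraction(example_genes, ori=0, dif=1000, genome_len=2000)
```

The modulo arithmetic handles origins located anywhere on the circular map, so genomes do not need to be rotated to start at oriC before classification.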
Since previous studies have shown that essential Sulfolobus genes are clustered near the origins of replication [36], we investigated whether this is the case in Thermococcales as well. We therefore calculated the genomic distance to the respective predicted oriC for each single core ortholog (Suppl. Table S3). Computation of their mean distance and standard deviation allowed the definition of 17 gene clusters whose distance to oriC remains relatively invariable across species (Table 4). The locations of these clusters for each Thermococcales genome are shown in Supplemental Fig. S2; they often correlate with GC-skew variations.

Fig. 3.

Graphical correlation between core-free genomic regions and integration of mobile elements in Thermococcus kodakarensis. The physical map corresponding to Thermococcus kodakarensis was drawn proportionally. The outermost numbered cyan bars indicate the clusters of core genes. Each black bar positions a single gene of the entire genome: the outer bars correspond to genes transcribed in the same polarity as DNA replication; the inner bars refer to the opposite orientation. Similarly, red bars correspond to single 'core genes' with the same orientation convention as above. Bright green bars indicate the location of clusters of species-specific genes (integrated mobile elements). Purple and green bars correspond to GC skew values calculated in windows of 1000 bp, shifted by 500 bp, with the purple and green bars indicating values below and above the average genomic GC skew, respectively. Predicted origins of replication and dif sites are shown as green circles and red squares, respectively. The positions of the four integrated elements (TKV1 to TKV4) as well as the predicted dark matter islands are represented in blue.

Table 4.

Thermococcales conserved clusters characteristics.

Mean expression level reference values: pangenomic, 668.5; single core, 896.7; clusters, 1978.8.

| Cluster | oriC distance: mean (%) | oriC distance: standard deviation (%) | Number of genes | Mean expression level | Relevant encoded protein(s) |
|---|---|---|---|---|---|
| 01 | 0.33 | 0.44 | 3 | 478.9 | Hypothetical |
| 02 | 2.69 | 1.91 | 2 | 221.1 | Molybdopterin converting factor, subunit 2 |
| 03 | 5.17 | 3.42 | 2 | 2551.7 | Hypothetical |
| 04 | 5.39 | 3.23 | 3 | 557.2 | KEOPS complex KAE1 |
| 05 | 7.36 | 4.34 | 7 | 877.6 | V-type ATP synthase, 7 subunits |
| 06 | 8.25 | 3.41 | 3 | 268.2 | Preprotein translocase |
| 07 | 9.14 | 4.67 | 2 | 357.5 | Oligopeptide transporters |
| 08 | 12.94 | 5.18 | 5 | 2926.0 | RNA polymerase |
| 09 | 17.76 | 3.90 | 27 | 3626.6 | Ribosomal proteins |
| 10 | 20.89 | 3.63 | 10 | 2234.8 | Ribosomal proteins - RNA polymerase |
| 11 | 22.40 | 5.77 | 5 | 482.4 | Thymidylate kinase |
| 12 | 23.46 | 4.47 | 3 | 1011.2 | DNA primase |
| 13 | 24.62 | 5.45 | 3 | 234.9 | Mevalonate kinase |
| 14 | 26.50 | 5.92 | 7 | 1535.2 | Ribosomal proteins - RNA polymerase |
| 15 | 33.34 | 6.01 | 2 | 486.7 | Glutamyl-tRNA(Gln) amidotransferase |
| 16 | 34.14 | 5.44 | 2 | 840.6 | Translation initiation factor IF-2 |
| 17 | 38.58 | 5.63 | 2 | 1685.0 | Ribosomal protein |

3.6. Expression of core genes and conserved gene clusters

Recent experiments have shown that core genes are more strongly expressed in the model organism E. coli [47]. It was therefore important to verify this observation in Thermococcales by analyzing the correlation between gene position and level of gene expression. We used the pangenomic gene expression data recently measured in P. abyssi by RNA-seq [17]. As shown in Table 4, the mean expression level of the 17 gene clusters described above indicates that they are more highly transcribed than single core genes, which in turn are more highly expressed than non-core genes. The largest clusters 8, 9 and 10 were found to be the most highly expressed; they contain genes encoding RNA polymerase subunits and ribosomal proteins. Remarkably, these clusters are positioned at one-quarter of the genome length, suggesting that a high selective pressure is acting to constrain them at this particular favorable location.
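As a consistency check on Table 4, the 'clusters' reference value can be recomputed from the per-cluster rows. The sketch below assumes (this is not stated in the text) that it is a mean weighted by the number of genes in each cluster; under that assumption the table values reproduce the reported 1978.8. The variable names are hypothetical.

```python
# (cluster id, number of genes, mean expression level), copied from Table 4
TABLE4 = [
    (1, 3, 478.9), (2, 2, 221.1), (3, 2, 2551.7), (4, 3, 557.2),
    (5, 7, 877.6), (6, 3, 268.2), (7, 2, 357.5), (8, 5, 2926.0),
    (9, 27, 3626.6), (10, 10, 2234.8), (11, 5, 482.4), (12, 3, 1011.2),
    (13, 3, 234.9), (14, 7, 1535.2), (15, 2, 486.7), (16, 2, 840.6),
    (17, 2, 1685.0),
]
SINGLE_CORE_MEAN = 896.7   # reference value from Table 4
PANGENOMIC_MEAN = 668.5    # reference value from Table 4

total_genes = sum(n for _, n, _ in TABLE4)
# Gene-weighted mean: clusters with more genes contribute proportionally more
weighted_mean = sum(n * expr for _, n, expr in TABLE4) / total_genes

print(total_genes)                                    # 88
print(round(weighted_mean, 1))                        # 1978.8
print(round(weighted_mean / SINGLE_CORE_MEAN, 2))     # 2.21
print(round(SINGLE_CORE_MEAN / PANGENOMIC_MEAN, 2))   # 1.34
```

Under this reading, clustered core genes are expressed roughly twice as strongly as single core genes on average, and the 27-gene cluster 9 dominates the weighted mean.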

3.7. Localization of organism-specific genes

The positioning of the 'single core' on the chromosomal maps revealed, for all genomes, a number of large areas devoid of core genes (Fig. 3 and Suppl. Fig. S2). We observed that clusters containing 3 or more species-specific genes could overlap these blank regions. Since species-specific clusters very likely correspond to the integration of mobile elements such as plasmids or viruses, we can extrapolate that these blank regions are integrated mobile elements shared by several genomes. Contrary to what was observed in Sulfolobales [48], the integration of mobile elements in Thermococcales is not confined to a specific location and seems to occur randomly on the chromosome (Suppl. Fig. S2). To confirm this observation, we mapped the four known integrated elements (TKV1 to TKV4) [49] and the predicted dark matter islands [50] onto the T. kodakarensis genomic map; all are located in core-free regions (Fig. 3).
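The overlap test described above can be illustrated with a small sketch. It is not the authors' procedure: it assumes that a core-free region is simply a gap between consecutive core gene positions that exceeds a chosen length (the threshold `min_gap` is hypothetical, as are the function names), and it ignores gene lengths.

```python
def core_free_regions(core_positions, genome_len, min_gap):
    """Gaps of at least min_gap between consecutive core genes on a circular
    chromosome. Returns (start, end) pairs; end exceeds genome_len when the
    gap wraps around the coordinate origin."""
    pos = sorted(core_positions)
    regions = []
    for i, start in enumerate(pos):
        if i + 1 < len(pos):
            end = pos[i + 1]
        else:
            end = pos[0] + genome_len  # gap from the last gene back to the first
        if end - start >= min_gap:
            regions.append((start, end))
    return regions

def overlaps(region, element, genome_len):
    """True if a (start, end) element overlaps a possibly wrapping region."""
    r_start, r_end = region
    e_start, e_end = element
    # Try the element as given and shifted by one genome length (wrap case)
    for shift in (0, genome_len):
        if e_start + shift < r_end and e_end + shift > r_start:
            return True
    return False

# Toy data (made up): core gene positions and candidate integrated elements
core = [100, 150, 200, 600, 650, 950]
regions = core_free_regions(core, genome_len=1000, min_gap=150)
elements = {"elem_1": (300, 350), "elem_2": (20, 60), "elem_3": (610, 640)}
in_core_free = {
    name: any(overlaps(r, e, 1000) for r in regions)
    for name, e in elements.items()
}
print(regions)
print(in_core_free)
```

Here `elem_2` falls in the gap that wraps around the coordinate origin, which is why the region ends are allowed to exceed the genome length.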

4. Discussion

With the exception of three methanogens, all archaeal genomes sequenced to date encode at least one Cdc6/Orc1 protein which initiates chromosomal DNA replication at one or more oriC origins [51].

In most prokaryotes, including several Archaea, chromosomal oriCs can be predicted on the basis of DNA composition using GC-skew [52] or Z-curve algorithms [53]. The comparative genomics analysis presented here confirms the initial observation that Thermococcales chromosomes are highly rearranged. In these genomes, DNA sequence scrambling has reached such a high level that commonly observed prokaryotic chromosomal landmarks such as oriC and terC are no longer readily identifiable by measuring DNA composition biases. It was indeed reported that pure in silico approaches can be unreliable due to frequent genome rearrangements [54]. Nevertheless, the regions corresponding to the origin and termination of replication could be predicted by means of biological sequence sites determined either biochemically or by analogy to bacterial systems. In most Archaea, replication initiates at ORB sites specifically recognized and bound by Cdc6 [22]. Using the well-documented ORB sequences [10], unique origins of replication could be predicted unambiguously for all 21 genomes. They are located in close proximity to RadA, which also corresponds to the genomic context of Cdc6 in 19 genomes out of 21. The chromosomal location of terC was identified by means of the XerC binding site (dif) as defined by Cortez et al. [14]. A unique corresponding site could be identified with high confidence in 19 genomes out of 21. The locations of oriC and dif in each genome define the respective replichores, which appear asymmetrical in most Thermococcales and extremely asymmetrical in Pyrococcus yayanosii. This observation raises the question of whether terC and dif are co-localized. By analogy to bacterial systems, it is commonly accepted that DNA replication termination and dif sites coincide [14], [36].
On the other hand, an extensive computational analysis based on bacterial genomes has shown a lack of correlation between dif position and the degree of GC skew, suggesting that replication termination does not occur strictly at dif sites [55]. However, it is quite difficult to extrapolate replication features between Archaea and Bacteria since they use such different replication proteins. Recent evidence has shown that in the crenarchaeon S. solfataricus, replication termination and dimer resolution are temporally and spatially distinct processes [56]. Since this organism carries three functional oriCs whereas a single one is found in Thermococcales, it is once again difficult to transpose replication features across archaeal phyla. In the absence of experimental data and of a functional cumulative GC skew in Thermococcales, we can neither prove nor disprove that terC and dif positions are distinct.
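The replichore asymmetry discussed above can be quantified with a short sketch. It relies on an assumption that the text explicitly leaves open, namely that dif can be used as a proxy for the replication terminus; the function names and coordinates are hypothetical.

```python
def replichore_lengths(ori, dif, genome_len):
    """Lengths of the two arcs between oriC and dif on a circular chromosome."""
    first = (dif - ori) % genome_len   # arc from oriC to dif, increasing coordinates
    second = genome_len - first        # the remaining arc
    return first, second

def asymmetry(ori, dif, genome_len):
    """Fraction of the genome in the longer replichore (0.5 means symmetrical)."""
    return max(replichore_lengths(ori, dif, genome_len)) / genome_len

# Toy coordinates (made up)
print(replichore_lengths(100_000, 900_000, 2_000_000))  # (800000, 1200000)
print(asymmetry(100_000, 900_000, 2_000_000))           # 0.6
```

A value close to 0.5 would indicate that dif lies roughly opposite oriC, whereas values approaching 1 correspond to the strongly asymmetrical case.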

To assess whether the observed genomic rearrangement could be reflected at the protein level as well, we conducted an extensive ranking of each protein into orthologous groups using a discriminant threshold of 30% similarity. This procedure allowed us to characterize the core genome of Thermococcales as well as genus- and species-specific proteins. The 21 genomes considered here share 790 orthologs, which correspond to ∼40% of their total proteins. From the core genome, we isolated the subset of proteins found only once per genome. The genes encoding these 668 'single core' proteins were plotted onto circular chromosome maps, which revealed several interesting features. First, the 'single core' genes are not evenly distributed along the chromosome: a number of very extensive areas without core genes are readily observable in all 21 genomes. This phenomenon can be interpreted as the result of recent acquisitions of (non-essential) genetic information through horizontal transfer. In a further analysis we were indeed able to show that clusters of strain-specific genes, which presumably correspond to integrated mobile elements, are precisely located within these regions. A second feature is the conservation of clusters of core genes at particular locations of the chromosome across Thermococcales. A series of 17 clusters could be identified with a standard deviation of mean distance to origin ≤6%. Despite a high level of genomic rearrangements, the distance between these clusters and the origin of replication remains remarkably constant. These clusters are not confined to oriC-proximal regions but are scattered along the entire chromosome. It is interesting to note that the individual clusters do not belong to the same replichore in every organism; however, their distance to oriC is maintained in a mirrored fashion. The size of each cluster is variable and ranges from 2 to 27 genes, often expressed in operons.
The largest clusters group essential genes involved in protein translation (cluster 9, 27 genes), gene transcription and protein translation (cluster 10, 10 genes; cluster 14, 7 genes) and energy metabolism (cluster 5, 7 genes). A third feature of the 'single core' is its enrichment in genes encoded on the leading strand. This is particularly true of the largest clusters, for which a net variation in GC skew is also readily apparent and very likely reflects an orientation bias of the genes composing the clusters. Indeed, we computed that in 16 organisms out of 19, the core genome is enriched in genes expressed in the same orientation as DNA replication. We were able to show that most of the large clusters display a significantly higher expression rate, which further correlates conserved gene position with essential biological functions. The positional conservation of essential genomic subregions is found in the three domains of life [40], [41], [42]. This work has shown that this property is particularly relevant in the archaeal order Thermococcales owing to the high level of rearrangements of their chromosomes. These small and heavily scrambled genomes were able to maintain highly expressed key genes in the most favorable chromosomal positions and transcribe them in a polarity compatible with DNA replication. We hypothesize that genome shuffling is instrumental in adapting to challenging extreme environments.
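Two of the computations summarised in this discussion can be sketched in a few lines: deriving the core and 'single core' sets from orthologous groups, and measuring the fraction of genes transcribed in the direction of replication. This is an illustration only. The data layout, the function names and the use of a terminus coordinate (for which dif would have to serve as a proxy) are assumptions, and the clustering into orthologous groups itself is taken as given.

```python
def core_groups(groups, genomes):
    """Ids of orthologous groups with at least one member in every genome."""
    return {gid for gid, members in groups.items()
            if all(len(members.get(g, [])) >= 1 for g in genomes)}

def single_core_groups(groups, genomes):
    """Ids of groups with exactly one member per genome ('single core')."""
    return {gid for gid, members in groups.items()
            if all(len(members.get(g, [])) == 1 for g in genomes)}

def co_oriented(gene_pos, strand, ori, ter, genome_len):
    """True if a gene is transcribed in the direction of fork movement.

    On the arm running from ori to ter with increasing coordinates, '+' strand
    genes are co-oriented; on the other arm, '-' strand genes are."""
    on_forward_arm = (gene_pos - ori) % genome_len < (ter - ori) % genome_len
    return (strand == "+") == on_forward_arm

def co_oriented_fraction(genes, ori, ter, genome_len):
    """Fraction of (position, strand) genes co-oriented with replication."""
    hits = sum(co_oriented(p, s, ori, ter, genome_len) for p, s in genes)
    return hits / len(genes)

# Toy data (made up): group id -> {genome: [gene ids]}
genomes = ["A", "B", "C"]
groups = {
    "og1": {"A": ["a1"], "B": ["b1"], "C": ["c1"]},        # single core
    "og2": {"A": ["a2", "a3"], "B": ["b2"], "C": ["c2"]},  # core, duplicated in A
    "og3": {"A": ["a4"], "B": ["b3"]},                     # absent from C
}
print(sorted(core_groups(groups, genomes)))         # ['og1', 'og2']
print(sorted(single_core_groups(groups, genomes)))  # ['og1']

genes = [(100, "+"), (300, "-"), (700, "-"), (900, "+"), (200, "+")]
print(co_oriented_fraction(genes, ori=0, ter=500, genome_len=1000))  # 0.6
```

Comparing the co-oriented fraction of the 'single core' genes with that of all genes in the same genome would give the kind of enrichment referred to in the text.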

5. Conclusion

5.1. Evolution considerations

All the above observations indicate that a remarkable degree of 'order' has been maintained across Thermococcales even though they display highly scrambled chromosomes. Nevertheless, these organisms display an astonishingly short cell cycle in extreme and resource-deficient environments. This apparent paradox motivated our analysis. The data presented here lead us to propose that Thermococcales chromosome shuffling introduces an increased genome variability which is actively used by natural selection (1) to maintain highly expressed key essential genes in favorable and invariant chromosomal positions, and (2) to continuously adapt and optimize the positioning of the constant flow of new genes acquired by horizontal transfer, in order to allow allopatric speciation. The molecular mechanism by which Thermococcales rearrange their chromosomes is presently being investigated.

Acknowledgements

This work was funded by the European Research Council under the European Union's Seventh Framework Program (FP/2007-2013)/Project EVOMOBIL - ERC Grant Agreement no. 340440.

Footnotes

Appendix A

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.biochi.2015.07.008.

Appendix A. Supplementary data

The following are the supplementary data related to this article:

Supplemental Fig. S1.


Example of replication origin prediction for four Thermococcales genomes. The graphs show the Z-curves (AT, GC, RY and MK disparity curves) obtained with Ori-Finder 2 for four Thermococcales genomes. Short vertical red lines indicate the cdc6 gene location on the genome. The black arrows indicate the oriCs defined by ORB clusters, predicted with Ori-Finder 2. The locations of the putative replication origins (ORBs) and potential termination locations (dif sites) predicted with FITBAR are indicated by purple and orange arrows, respectively.

Supplemental Fig. S2.


Physical chromosomal maps of the 21 Thermococcales. The physical maps corresponding to the 21 chromosomes of the dataset were drawn proportionally for each genome using a color coding identical to Fig. 2.

Supplemental Fig. S3A.


Genomic context of the origin of replication for genera Thermococcus and Palaeococcus. Orthologous genes are represented in consistent colors and the predicted origin of replication is drawn as a green circle.

Supplemental Fig. S3B.


Genomic context of the origin of replication for genus Pyrococcus. Orthologous genes are represented in consistent colors and the predicted origin of replication is drawn as a green circle.

mmc1.xlsx (448.6KB, xlsx)
mmc2.xlsx (150.1KB, xlsx)
mmc3.xlsx (117.2KB, xlsx)
mmc4.pdf (203.8KB, pdf)
mmc5.pdf (329.7KB, pdf)

References

  • 1. Achenbach-Richter L., Gupta R., Zillig W., Woese C.R. Rooting the archaebacterial tree: the pivotal role of Thermococcus celer in archaebacterial evolution. Syst. Appl. Microbiol. 1988;10:231–240. doi: 10.1016/s0723-2020(88)80007-9.
  • 2. Brasen C., Esser D., Rauch B., Siebers B. Carbohydrate metabolism in archaea: current insights into unusual enzymes and pathways and their regulation. Microbiol. Mol. Biol. Rev. MMBR. 2014;78:89–175. doi: 10.1128/MMBR.00041-13.
  • 3. Oku T., Ishikawa K. Analysis of the hyperthermophilic chitinase from Pyrococcus furiosus: activity toward crystalline chitin. Biosci. Biotechnol. Biochem. 2006;70:1696–1701. doi: 10.1271/bbb.60031.
  • 4. Gorlas A., Alain K., Bienvenu N., Geslin C. Thermococcus prieurii sp. nov., a hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent. Int. J. Syst. Evol. Microbiol. 2013;63:2920–2926. doi: 10.1099/ijs.0.026419-0.
  • 5. Sezonov G., Joseleau-Petit D., D'Ari R. Escherichia coli physiology in Luria-Bertani broth. J. Bacteriol. 2007;189:8746–8749. doi: 10.1128/JB.01368-07.
  • 6. Zivanovic Y., Lopez P., Philippe H., Forterre P. Pyrococcus genome comparison evidences chromosome shuffling-driven evolution. Nucleic Acids Res. 2002;30:1902–1910. doi: 10.1093/nar/30.9.1902.
  • 7. Oberto J. BAGET: a web server for the effortless retrieval of prokaryotic gene context and sequence. Bioinformatics. 2008;24:424–425. doi: 10.1093/bioinformatics/btm600.
  • 8. Dereeper A., Guignon V., Blanc G., Audic S., Buffet S., Chevenet F., Dufayard J.F., Guindon S., Lefort V., Lescot M., Claverie J.M., Gascuel O. Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 2008;36:W465–W469. doi: 10.1093/nar/gkn180.
  • 9. Luo H., Zhang C.T., Gao F. Ori-Finder 2, an integrated tool to predict replication origins in the archaeal genomes. Front. Microbiol. 2014;5:482. doi: 10.3389/fmicb.2014.00482.
  • 10. Matsunaga F., Glatigny A., Mucchielli-Giorgi M.H., Agier N., Delacroix H., Marisa L., Durosay P., Ishino Y., Aggerbeck L., Forterre P. Genomewide and biochemical analyses of DNA-binding activity of Cdc6/Orc1 and Mcm proteins in Pyrococcus sp. Nucleic Acids Res. 2007;35:3214–3222. doi: 10.1093/nar/gkm212.
  • 11. Oberto J. FITBAR: a web tool for the robust prediction of prokaryotic regulons. BMC Bioinform. 2010;11:554. doi: 10.1186/1471-2105-11-554.
  • 12. Gao F., Luo H., Zhang C.T. DoriC 5.0: an updated database of oriC regions in both bacterial and archaeal genomes. Nucleic Acids Res. 2013;41:D90–D93. doi: 10.1093/nar/gks990.
  • 13. Bailey T.L., Williams N., Misleh C., Li W.W. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006;34:W369–W373. doi: 10.1093/nar/gkl198.
  • 14. Cortez D., Quevillon-Cheruel S., Gribaldo S., Desnoues N., Sezonov G., Forterre P., Serre M.C. Evidence for a Xer/dif system for chromosome resolution in archaea. PLoS Genet. 2010;6:e1001166. doi: 10.1371/journal.pgen.1001166.
  • 15. Oberto J. SyntTax: a web server linking synteny to prokaryotic taxonomy. BMC Bioinform. 2013;14:4. doi: 10.1186/1471-2105-14-4.
  • 16. Lerat E., Daubin V., Moran N.A. From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria. PLoS Biol. 2003;1:E19. doi: 10.1371/journal.pbio.0000019.
  • 17. Toffano-Nioche C., Ott A., Crozat E., Nguyen A.N., Zytnicki M., Leclerc F., Forterre P., Bouloc P., Gautheret D. RNA at 92 degrees C: the non-coding transcriptome of the hyperthermophilic archaeon Pyrococcus abyssi. RNA Biol. 2013;10:1211–1220. doi: 10.4161/rna.25567.
  • 18. Zytnicki M., Quesneville H. S-MART, a software toolbox to aid RNA-seq data analysis. PLoS One. 2011;6:e25988. doi: 10.1371/journal.pone.0025988.
  • 19. Mortazavi A., Williams B.A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat. Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226.
  • 20. Edgell D.R., Doolittle W.F. Archaea and the origin(s) of DNA replication proteins. Cell. 1997;89:995–998. doi: 10.1016/s0092-8674(00)80285-8.
  • 21. Raymann K., Forterre P., Brochier-Armanet C., Gribaldo S. Global phylogenomic analysis disentangles the complex evolutionary history of DNA replication in archaea. Genome Biol. Evol. 2014;6:192–212. doi: 10.1093/gbe/evu004.
  • 22. Matsunaga F., Forterre P., Ishino Y., Myllykallio H. In vivo interactions of archaeal Cdc6/Orc1 and minichromosome maintenance proteins with the replication origin. Proc. Natl. Acad. Sci. U. S. A. 2001;98:11152–11157. doi: 10.1073/pnas.191387498.
  • 23. Robinson N.P., Dionne I., Lundgren M., Marsh V.L., Bernander R., Bell S.D. Identification of two origins of replication in the single chromosome of the archaeon Sulfolobus solfataricus. Cell. 2004;116:25–38. doi: 10.1016/s0092-8674(03)01034-1.
  • 24. Majernik A.I., Chong J.P. A conserved mechanism for replication origin recognition and binding in archaea. Biochem. J. 2008;409:511–518. doi: 10.1042/BJ20070213.
  • 25. Ojha K.K., Swati D. Mapping of origin of replication in Themococcales. Bioinformation. 2010;5:213–218. doi: 10.6026/97320630005213.
  • 26. Wu Z., Liu H., Liu J., Liu X., Xiang H. Diversity and evolution of multiple orc/cdc6-adjacent replication origins in haloarchaea. BMC Genomics. 2012;13:478. doi: 10.1186/1471-2164-13-478.
  • 27. Lundgren M., Andersson A., Chen L., Nilsson P., Bernander R. Three replication origins in Sulfolobus species: synchronous initiation of chromosome replication and asynchronous termination. Proc. Natl. Acad. Sci. U. S. A. 2004;101:7046–7051. doi: 10.1073/pnas.0400656101.
  • 28. Norais C., Hawkins M., Hartman A.L., Eisen J.A., Myllykallio H., Allers T. Genetic and physical mapping of DNA replication origins in Haloferax volcanii. PLoS Genet. 2007;3:e77. doi: 10.1371/journal.pgen.0030077.
  • 29. Robinson N.P., Bell S.D. Extrachromosomal element capture and the evolution of multiple replication origins in archaeal chromosomes. Proc. Natl. Acad. Sci. U. S. A. 2007;104:5806–5811. doi: 10.1073/pnas.0700206104.
  • 30. Myllykallio H., Lopez P., Lopez-Garcia P., Heilig R., Saurin W., Zivanovic Y., Philippe H., Forterre P. Bacterial mode of replication with eukaryotic-like machinery in a hyperthermophilic archaeon. Science. 2000;288:2212–2215. doi: 10.1126/science.288.5474.2212.
  • 31. Grigoriev A. Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res. 1998;26:2286–2290. doi: 10.1093/nar/26.10.2286.
  • 32. Lopez P., Forterre P., le Guyader H., Philippe H. Origin of replication of Thermotoga maritima. Trends Genet. TIG. 2000;16:59–60. doi: 10.1016/s0168-9525(99)01894-6.
  • 33. Lobry J.R. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 1996;13:660–665. doi: 10.1093/oxfordjournals.molbev.a025626.
  • 34. Lobry J.R. A simple vectorial representation of DNA sequences for the detection of replication origins in bacteria. Biochimie. 1996;78:323–326. doi: 10.1016/0300-9084(96)84764-x.
  • 35. Haldenby S., White M.F., Allers T. RecA family proteins in archaea: RadA and its cousins. Biochem. Soc. Trans. 2009;37:102–107. doi: 10.1042/BST0370102.
  • 36. Andersson A.F., Pelve E.A., Lindeberg S., Lundgren M., Nilsson P., Bernander R. Replication-biased genome organisation in the crenarchaeon Sulfolobus. BMC Genomics. 2010;11:454. doi: 10.1186/1471-2164-11-454.
  • 37. Lindas A.C., Bernander R. The cell cycle of archaea. Nat. Rev. Microbiol. 2013;11:627–638. doi: 10.1038/nrmicro3077.
  • 38. Carnoy C., Roten C.A. The dif/Xer recombination systems in proteobacteria. PLoS One. 2009;4:e6531. doi: 10.1371/journal.pone.0006531.
  • 39. Wolf Y.I., Makarova K.S., Yutin N., Koonin E.V. Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer. Biol. Direct. 2012;7:46. doi: 10.1186/1745-6150-7-46.
  • 40. Lercher M.J., Urrutia A.O., Hurst L.D. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat. Genet. 2002;31:180–183. doi: 10.1038/ng887.
  • 41. Williams E.J., Bowles D.J. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 2004;14:1060–1067. doi: 10.1101/gr.2131104.
  • 42. Pal C., Hurst L.D. Evidence against the selfish operon theory. Trends Genet. TIG. 2004;20:232–234. doi: 10.1016/j.tig.2004.04.001.
  • 43. Esnault E., Valens M., Espeli O., Boccard F. Chromosome structuring limits genome plasticity in Escherichia coli. PLoS Genet. 2007;3:e226. doi: 10.1371/journal.pgen.0030226.
  • 44. Allen T.E., Price N.D., Joyce A.R., Palsson B.O. Long-range periodic patterns in microbial genomes indicate significant multi-scale chromosomal organization. PLoS Comput. Biol. 2006;2:e2. doi: 10.1371/journal.pcbi.0020002.
  • 45. Rocha E.P., Danchin A. Essentiality, not expressiveness, drives gene-strand bias in bacteria. Nat. Genet. 2003;34:377–378. doi: 10.1038/ng1209.
  • 46. Couturier E., Rocha E.P. Replication-associated gene dosage effects shape the genomes of fast-growing bacteria but only for transcription and translation genes. Mol. Microbiol. 2006;59:1506–1518. doi: 10.1111/j.1365-2958.2006.05046.x.
  • 47. Vital M., Chai B.L., Ostman B., Cole J., Konstantinidis K.T., Tiedje J.M. Gene expression analysis of E. coli strains provides insights into the role of gene regulation in diversification. ISME J. 2015;9:1130–1140. doi: 10.1038/ismej.2014.204.
  • 48. Reno M.L., Held N.L., Fields C.J., Burke P.V., Whitaker R.J. Biogeography of the Sulfolobus islandicus pan-genome. Proc. Natl. Acad. Sci. U. S. A. 2009;106:8605–8610. doi: 10.1073/pnas.0808945106.
  • 49. Krupovic M., Bamford D.H. Archaeal proviruses TKV4 and MVV extend the PRD1-adenovirus lineage to the phylum Euryarchaeota. Virology. 2008;375:292–300. doi: 10.1016/j.virol.2008.01.043.
  • 50. Makarova K.S., Wolf Y.I., Forterre P., Prangishvili D., Krupovic M., Koonin E.V. Dark matter in archaeal genomes: a rich source of novel mobile elements, defense systems and secretory complexes. Extremophiles. 2014;18:877–893. doi: 10.1007/s00792-014-0672-7.
  • 51. Barry E.R., Bell S.D. DNA replication in the archaea. Microbiol. Mol. Biol. R. 2006;70:876. doi: 10.1128/MMBR.00029-06.
  • 52. Grigoriev A. Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res. 1998;26:2286–2290. doi: 10.1093/nar/26.10.2286.
  • 53. Zhang C.T., Zhang R., Ou H.Y. The Z curve database: a graphic representation of genome sequences. Bioinformatics. 2003;19:593–599. doi: 10.1093/bioinformatics/btg041.
  • 54. Lopez P., Philippe H., Myllykallio H., Forterre P. Identification of putative chromosomal origins of replication in Archaea. Mol. Microbiol. 1999;32:883–886. doi: 10.1046/j.1365-2958.1999.01370.x.
  • 55. Kono N., Arakawa K., Tomita M. Comprehensive prediction of chromosome dimer resolution sites in bacterial genomes. BMC Genomics. 2011;12:19. doi: 10.1186/1471-2164-12-19.
  • 56. Duggin I.G., Dubarry N., Bell S.D. Replication termination and chromosome dimer resolution in the archaeon Sulfolobus solfataricus. EMBO J. 2011;30:145–153. doi: 10.1038/emboj.2010.301.
  • 57. Zeng X., Zhang X., Jiang L., Alain K., Jebbar M., Shao Z. Palaeococcus pacificus sp. nov., an archaeon from deep-sea hydrothermal sediment. Int. J. Syst. Evol. Microbiol. 2013;63:2155–2159. doi: 10.1099/ijs.0.044487-0.
  • 58. Cohen G.N., Barbe V., Flament D., Galperin M., Heilig R., Lecompte O., Poch O., Prieur D., Querellou J., Ripp R., Thierry J.C., Van der Oost J., Weissenbach J., Zivanovic Y., Forterre P. An integrated analysis of the genome of the hyperthermophilic archaeon Pyrococcus abyssi. Mol. Microbiol. 2003;47:1495–1512. doi: 10.1046/j.1365-2958.2003.03381.x.
  • 59. Robb F.T., Maeder D.L., Brown J.R., DiRuggiero J., Stump M.D., Yeh R.K., Weiss R.B., Dunn D.M. Genomic sequence of hyperthermophile, Pyrococcus furiosus: implications for physiology and enzymology. Methods Enzym. 2001;330:134–157. doi: 10.1016/s0076-6879(01)30372-5.
  • 60. Bridger S.L., Lancaster W.A., Poole F.L., 2nd, Schut G.J., Adams M.W. Genome sequencing of a genetically tractable Pyrococcus furiosus strain reveals a highly dynamic genome. J. Bacteriol. 2012;194:4097–4106. doi: 10.1128/JB.00439-12.
  • 61. Kawarabayasi Y., Sawada M., Horikawa H., Haikawa Y., Hino Y., Yamamoto S., Sekine M., Baba S., Kosugi H., Hosoyama A., Nagai Y., Sakai M., Ogura K., Otsuka R., Nakazawa H., Takamiya M., Ohfuku Y., Funahashi T., Tanaka T., Kudoh Y., Yamazaki J., Kushida N., Oguchi A., Aoki K., Kikuchi H. Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3 (supplement) DNA Res. Int. J. Rapid Publ. Rep. Genes Genomes. 1998;5:147–155. doi: 10.1093/dnares/5.2.147.
  • 62. Lee H.S., Bae S.S., Kim M.S., Kwon K.K., Kang S.G., Lee J.H. Complete genome sequence of hyperthermophilic Pyrococcus sp. strain NA2, isolated from a deep-sea hydrothermal vent area. J. Bacteriol. 2011;193:3666–3667. doi: 10.1128/JB.05150-11.
  • 63. Jung J.H., Lee J.H., Holden J.F., Seo D.H., Shin H., Kim H.Y., Kim W., Ryu S., Park C.S. Complete genome sequence of the hyperthermophilic archaeon Pyrococcus sp. strain ST04, isolated from a deep-sea hydrothermal sulfide chimney on the Juan de Fuca Ridge. J. Bacteriol. 2012;194:4434–4435. doi: 10.1128/JB.00824-12.
  • 64. Jun X., Lupeng L., Minjuan X., Oger P., Fengping W., Jebbar M., Xiang X. Complete genome sequence of the obligate piezophilic hyperthermophilic archaeon Pyrococcus yayanosii CH1. J. Bacteriol. 2011;193:4297–4298. doi: 10.1128/JB.05345-11.
  • 65. Vannier P., Marteinsson V.T., Fridjonsson O.H., Oger P., Jebbar M. Complete genome sequence of the hyperthermophilic, piezophilic, heterotrophic, and carboxydotrophic archaeon Thermococcus barophilus MP. J. Bacteriol. 2011;193:1481–1482. doi: 10.1128/JB.01490-10.
  • 66. Zhao W., Xiao X. Complete genome sequence of Thermococcus eurythermalis A501, a conditional piezophilic hyperthermophilic archaeon with a wide temperature range, isolated from an oil-immersed deep-sea hydrothermal chimney on Guaymas Basin. J. Biotechnol. 2015;193:14–15. doi: 10.1016/j.jbiotec.2014.11.006.
  • 67. Zivanovic Y., Armengaud J., Lagorce A., Leplat C., Guerin P., Dutertre M., Anthouard V., Forterre P., Wincker P., Confalonieri F. Genome analysis and genome-wide proteomics of Thermococcus gammatolerans, the most radioresistant organism known amongst the Archaea. Genome Biol. 2009;10:R70. doi: 10.1186/gb-2009-10-6-r70.
  • 68. Fukui T., Atomi H., Kanai T., Matsumi R., Fujiwara S., Imanaka T. Complete genome sequence of the hyperthermophilic archaeon Thermococcus kodakaraensis KOD1 and comparison with Pyrococcus genomes. Genome Res. 2005;15:352–363. doi: 10.1101/gr.3003105.
  • 69. Gardner A.F., Kumar S., Perler F.B. Genome sequence of the model hyperthermophilic archaeon Thermococcus litoralis NS-C. J. Bacteriol. 2012;194:2375–2376. doi: 10.1128/JB.00123-12.
  • 70. Oberto J., Gaudin M., Cossu M., Gorlas A., Slesarev A., Marguet E., Forterre P. Genome sequence of a hyperthermophilic archaeon, Thermococcus nautili 30–1, that produces viral vesicles. Genome Announc. 2014;2 doi: 10.1128/genomeA.00243-14.
  • 71. Lee H.S., Kang S.G., Bae S.S., Lim J.K., Cho Y., Kim Y.J., Jeon J.H., Cha S.S., Kwon K.K., Kim H.T., Park C.J., Lee H.W., Kim S.I., Chun J., Colwell R.R., Kim S.J., Lee J.H. The complete genome sequence of Thermococcus onnurineus NA1 reveals a mixed heterotrophic and carboxydotrophic metabolism. J. Bacteriol. 2008;190:7491–7499. doi: 10.1128/JB.00746-08.
  • 72. Mardanov A.V., Ravin N.V., Svetlitchnyi V.A., Beletsky A.V., Miroshnichenko M.L., Bonch-Osmolovskaya E.A., Skryabin K.G. Metabolic versatility and indigenous origin of the archaeon Thermococcus sibiricus, isolated from a siberian oil reservoir, as revealed by genome analysis. Appl. Environ. Microbiol. 2009;75:4580–4588. doi: 10.1128/AEM.00718-09.
  • 73. Wang X., Gao Z., Xu X., Ruan L. Complete genome sequence of Thermococcus sp. strain 4557, a hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent area. J. Bacteriol. 2011;193:5544–5545. doi: 10.1128/JB.05851-11.
  • 74. Oger P., Sokolova T.G., Kozhevnikova D.A., Chernyh N.A., Bartlett D.H., Bonch-Osmolovskaya E.A., Lebedinsky A.V. Complete genome sequence of the hyperthermophilic archaeon Thermococcus sp. strain AM4, capable of organotrophic growth and growth at the expense of hydrogenogenic or sulfidogenic oxidation of carbon monoxide. J. Bacteriol. 2011;193:7019–7020. doi: 10.1128/JB.06259-11.
  • 75. Jung J.H., Holden J.F., Seo D.H., Park K.H., Shin H., Ryu S., Lee J.H., Park C.S. Complete genome sequence of the hyperthermophilic archaeon Thermococcus sp. strain CL1, isolated from a Paralvinella sp. polychaete worm collected from a hydrothermal vent. J. Bacteriol. 2012;194:4769–4770. doi: 10.1128/JB.01016-12.
  • 76. Hensley S.A., Jung J.H., Park C.S., Holden J.F. Thermococcus paralvinellae sp. nov. and Thermococcus cleftensis sp. nov. of hyperthermophilic heterotrophs from deep-sea hydrothermal vents. Int. J. Syst. Evol. Microbiol. 2014;64:3655–3659. doi: 10.1099/ijs.0.066100-0.
