Journal of Bacteriology. 2020 Oct 8;202(21):e00229-20. doi: 10.1128/JB.00229-20

Unique Features of Tandem Repeats in Bacteria

Juan A. Subirana, Xavier Messeguer
Editor: Anke Becker
PMCID: PMC7549362  PMID: 32839174

KEYWORDS: Bacillus, Bacillus coagulans, bacterial nucleoid, Leptospira interrogans, satellites, tandem repeats, noncoding DNA, transcription factors

ABSTRACT

DNA tandem repeats, or satellites, are well described in eukaryotic species, but little is known about their prevalence across prokaryotes. Here, we performed the most complete characterization to date of satellites in bacteria. We identified 121,638 satellites from 12,233 fully sequenced and assembled bacterial genomes with a very uneven distribution. We also determined the families of satellites which have a related sequence. There are 85 genomes that are particularly satellite rich and contain several families of satellites of yet unknown function. Interestingly, we only found two main types of noncoding satellites, depending on their repeat sizes, 22/44 or 52 nucleotides (nt). An intriguing feature is the constant size of the repeats in the genomes of different species, whereas their sequences show no conservation. Individual species also have several families of satellites with the same repeat length and different sequences. This result is in marked contrast with previous findings in eukaryotes, where noncoding satellites of many sizes are found in any species investigated. We describe in greater detail these noncoding satellites in the spirochete Leptospira interrogans and in several bacilli. These satellites undoubtedly play a specific role in the species which have acquired them. We discuss the possibility that they represent binding sites for transcription factors not previously described or that they are involved in the stabilization of the nucleoid through interaction with proteins.

IMPORTANCE We found an enigmatic group of noncoding satellites in 85 bacterial genomes with a constant repeat size but variable sequence. This pattern of DNA organization is unique and had not been previously described in bacteria. These findings strongly suggest that satellite size in some bacteria is under strong selective constraints and thus that satellites are very likely to play a fundamental role. We also provide a list and properties of all satellites in 12,233 genomes, which may be used for further genomic analysis.

INTRODUCTION

Bacterial genomes contain many repetitive sequences, some of which have been studied in great detail. An overall review is available in reference 1, which shows that some repeats do not carry any adaptive value, while others allow functional diversification and increased expression. Many of these repeats are identical or inverted sequences of many different sizes interspersed throughout the genome; there are several kinds with different transposition features (1). The main groups of transposons have been recently reviewed (2). Most transposons are associated with insertion sequences, found in most bacterial genomes (3). They consist of transposase genes, often flanked on both sides by short inverted repeats. They have been translocated throughout the genome and participate in genome rearrangements and plasmid insertions. A catalogue of these sequences is available online (https://www-is.biotoul.fr/index.php).

Another large group of repetitive sequences are tandem repeats or satellites. When the repeated sequence is short, comprising a few nucleotides, they are called microsatellites. They have been studied in many genomes; data on their sequence and length are available in the Microsatellite Database (MSDB) (https://data.ccmb.res.in/msdb/). Microsatellites, as well as satellites with longer repeats, have been used for genetic fingerprinting of pathogenic bacterial species. Many of them are polymorphic and present a variable number of tandem repeats (VNTR) in bacterial populations (4). These differences allow the characterization of different strains of a given bacterial species, as reviewed by Lindstedt (5). In particular, VNTRs have been used to characterize different strains of Leptospira interrogans (6, 7), a species which we study in detail in this paper. A database covering tandem repeats suitable for this application was developed by Denoeud and Vergnaud (8); it now covers a complete list of all satellites from over a thousand bacterial species (http://minisatellites-rec.igmors.u-psud.fr) and also allows a comparison of satellites in closely related species. An additional class of repetitive sequences is the clustered regularly interspaced short palindromic repeats (CRISPR motifs), which present consensus sequences separated by variable spacers. They are available for most bacterial species in the CRISPR database (https://crispr.i2bc.paris-saclay.fr), which also allows a determination of CRISPR regions in new bacterial genomes.

In this paper, we provide a catalogue of all the satellites present in 12,233 completely assembled bacterial genomes. We limit our search to satellites with at least four repeats with a length of over 9 nucleotides (nt). We have also determined the families of satellites found in individual and related species. Recently, we published a survey of satellites for a group of reference genome sequences of prokaryotic species. We found unique features in a few species (9), which prompted us to carry out the present general survey of the occurrence and properties of satellites in all the available fully sequenced and assembled bacterial genomes. The survey we present here demonstrates that bacterial genomes in general contain few satellites, as expected from the limited space available between coding sequences. Only roughly 10% of the bacterial genomes contain 20 or more satellites, a satellite density similar to what is found in eukaryotes (10). Among them we found species with a unique distribution of noncoding satellites which are likely to have biological relevance.
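The search criteria stated above (at least four tandem copies of a repeat unit longer than 9 nt) can be illustrated with a minimal sketch. It detects only perfect repeats, whereas real satellites tolerate mismatches, so it should not be read as the search procedure used for this survey. The function name, the `max_unit` limit, and the toy sequence are assumptions introduced for illustration.

```python
def find_exact_tandem_repeats(seq, min_unit=10, max_unit=60, min_copies=4):
    """Return (start, unit_length, copies) for perfect tandem repeats.

    Simplification: only exact copies are counted. Real satellites contain
    mismatches and would need an alignment-based search.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for unit in range(min_unit, max_unit + 1):
            # Extend j while the base at j equals the base one period later.
            j = i
            while j + unit < len(seq) and seq[j] == seq[j + unit]:
                j += 1
            # The periodic stretch covers (j - i) + unit bases.
            copies = (j - i + unit) // unit
            if copies >= min_copies:
                best = (i, unit, copies)  # smallest qualifying period wins
                break
        if best:
            hits.append(best)
            i += best[1] * best[2]  # skip past the reported array
        else:
            i += 1
    return hits


# Toy example: a hypothetical 12-nt unit repeated five times between flanks.
unit = "ACGTTGCAAGCT"
toy = "GGGGG" + unit * 5 + "CCCCC"
hits = find_exact_tandem_repeats(toy)
```

With the default thresholds, an array of only three copies of the same unit is not reported, which mirrors the four-repeat cutoff.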

RESULTS

General features of satellites.

We searched for satellites in the 12,233 genomes which we downloaded from GenBank; we detected 121,638 satellites in them, an average of 10 satellites per genome. An Excel file with the sequences of all of the satellites has been deposited in GitHub (https://github.com/jasubirana/Bacterial-Satellites). A list of satellites from bacteria and other species is also available on our website (http://alggen.lsi.upc.edu). The genomes which contain satellites are listed in Table S1 in the supplemental material, including genome size, number of satellites, and the GC content for each species. A list of all 943 bacterial genomes in which no satellite was found is given in Table S2. For our analysis, we selected the 1,241 genomes which have more than 19 satellites, also included in Table S1. They represent approximately 10% of the bacterial genomes we downloaded. In Table 1, we present their classification in taxonomical groups. Unfortunately, the available genomes do not cover all taxonomic groups uniformly: for some groups there is no complete genome available, whereas there are many genomes for different strains of the most common species. For this reason, we excluded redundant genomes from further analysis, as indicated in Table 1. This lack of uniform coverage is also apparent when we compare the distribution of satellite density (satellites per megabase) in different groups, as shown in Fig. 1. The Clostridium group, for example, shows limited variation because all genomes covered are different strains of the same organism, Clostridioides difficile.
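The summary figures quoted in this paragraph follow from simple ratios, as the short sketch below shows. The totals are taken from the text; the helper name `satellites_per_megabase` and its example arguments are illustrative assumptions, not values from the survey.

```python
def satellites_per_megabase(n_satellites, genome_size_bp):
    """Satellite density as used for comparing groups (satellites per Mb)."""
    return n_satellites / (genome_size_bp / 1_000_000)


# Totals reported in the text.
total_satellites = 121_638
total_genomes = 12_233
rich_genomes = 1_241  # genomes with more than 19 satellites

mean_per_genome = total_satellites / total_genomes    # about 10
percent_rich = 100 * rich_genomes / total_genomes     # about 10%

# Hypothetical genome: 50 satellites in 5 Mb.
example_density = satellites_per_megabase(50, 5_000_000)
```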

TABLE 1.

Data for genomes with more than 19 satellites [a]

| Group | No. of genomes | No. of satellites | Short repeats (%) [j] |
|---|---|---|---|
| Actinobacteria | 399 | 14,915 | 52.5 |
|     Mycobacteria | 173 | 4,200 | 33.6 |
|     Mycobacteria (simpl) [b] | 12 | 384 | 33.9 |
|     Streptomyces | 100 | 4,443 | 54.9 |
|     Miscellanea | 126 | 6,272 | 52.4 |
| Firmicutes | 257 | 8,420 | 19.4 |
|     Bacillus | 158 | 4,510 | ND |
|     Bacillus (simpl) [c] | 48 | 1,838 | 10.3 |
|     Clostridium | 50 | 1,591 | 22.6 |
|     Paenibacillus | 23 | 1,285 | 29.0 |
|     Miscellanea | 26 | 1,034 | 19.1 |
| Proteobacteria | 434 | 17,925 | 55.5 |
|     Alphaproteobacteria | 17 | 576 | 48.6 |
|     Betaproteobacteria | 167 | 8,709 | ND |
|     Betaproteobacteria (simpl) [d] | 50 | 1,372 | 85.2 |
|     Gammaproteobacteria | 220 | 6,222 | ND |
|     Gammaproteobacteria (simpl) [e] | 153 | 4,266 | 57.7 |
|     Deltaproteobacteria-Epsilonproteobacteria | 30 | 2,418 | 39.4 |
| Chloroflexi | 7 | 624 | 46.3 |
| Spirochaetes | 15 | 591 | 2.16 [i] |
| FCB (simpl) [f] | 47 | 1,889 | 33.6 |
| Cyanobacteria | 54 | 5,585 | 51.6 |
| Unassigned | 12 | 375 | 68.5 |
| Total | 1,241 | 50,947 [g] | |
| Total analyzed [h] | 770 | 34,543 | |
[a] Families of satellites were determined for each group in the table. Some groups were simplified to eliminate redundant genomes from the same species (simpl). The last column gives the percentage of satellites which have a repeat length of less than 20 nt. Extreme values are shown in boldface.

[b] Only one representative of each species is included.

[c] Only three genomes left for B. anthracis, B. cereus, and B. thuringiensis.

[d] One hundred seventeen Burkholderia genomes not included.

[e] Only three genomes left for X. oryzae, Yersinia pestis, and Yersinia pseudotuberculosis.

[f] Sixteen Porphyromonas gingivalis genomes not included. FCB, Fibrobacteres, Chlorobi, and Bacteroidetes.

[g] Includes 623 satellites from the 16 eliminated genomes of P. gingivalis.

[h] Satellite families were determined for all these genomes. Redundant genomes were excluded as described in footnotes b to f.

[i] Only Leptospira (12 genomes) included.

[j] ND, not determined.

FIG 1.

Distribution of the number of satellites per megabase in different bacterial groups. The number of genomes in each group is indicated in parentheses. Three species with more than 35 satellites/Mb are not shown in the figure.

In our preliminary study of a few representative bacterial species (9), we found that their satellites could be classified in four different types (groups A to D), depending on the nature of the repeated sequence. Group A consists of short repeats (<20 nt), usually related to microsatellites; in Table 1 we present the average proportion of such satellites in different groups. Group B consists of repeats with lengths that are multiples of three, which in most cases represent sequences coding for amino acid repeats in proteins. Group C consists of unique satellites which have appeared by chance in individual species. They are not expected to play any role in the genome. Group D consists of repeats with lengths of over 20 nt, not multiples of three, which build related noncoding satellites, often transposed throughout the genome; their study is the main objective of this paper. Some of these satellites might have a function in the genome.
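The four groups can be summarized as a simple decision rule on repeat length and family membership. The sketch below is only a heuristic reading of the definitions above: the function name and the `has_related_satellites` flag are assumptions, the order in which the tests are applied is a choice made here, and repeats of exactly 20 nt are treated as long. As discussed later for 21- and 42-nt repeats, a length that is a multiple of three does not prove that a satellite is coding.

```python
def classify_satellite(repeat_length, has_related_satellites=True):
    """Assign a satellite to group A, B, C, or D (heuristic sketch).

    repeat_length: length of the repeat unit in nt.
    has_related_satellites: False if no related satellite exists in the
    genome (hypothetical flag standing in for a family analysis).
    """
    if repeat_length < 20:
        return "A"  # short repeats, usually related to microsatellites
    if repeat_length % 3 == 0:
        return "B"  # multiple of three: candidate amino acid repeat
    if not has_related_satellites:
        return "C"  # unique satellite, presumably arisen by chance
    return "D"      # longer noncoding repeat that belongs to a family


# Repeat lengths mentioned in this paper.
groups = {n: classify_satellite(n) for n in (12, 69, 52, 22)}
```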

For the study of these different types of satellites, we built a table in which the size of all satellites for each individual genome is listed (Table S3). Furthermore, we searched for all the families of related satellite sequences, determined independently for each of the groups given in Table 1. The results obtained have been deposited in GitHub, where all satellite families and their members are given (https://github.com/jasubirana/Bacterial-Satellites). Each species presents a set of unique satellite families not shared by other species; we find related satellites only when the species are very closely related. In most cases only satellites of the A to C groups are found; we detected a significant number of noncoding satellites (group D) in only 85 genomes, which are listed in Tables S4 to S6. A striking result from our study is that noncoding satellites (excluding those with short repeats) belong to only two types, as a function of their repeat lengths (Fig. 2), with the notable exception of some Spirochaetes to be described below. Satellites from the different groups will be analyzed in the following sections. We should note at the start that the distributions of lengths in all these groups of satellites are similar, with average lengths of around 400 nt and rarely exceeding 1,000 nt (Fig. S1).

FIG 2.

Repeat sizes in noncoding bacterial satellites. The areas are proportional to the number of genomes in each group. Repeats of 20 to 23 nt are found in many bacterial groups: Firmicutes, Cyanobacteria, Actinobacteria, Proteobacteria, and Bacteroidetes (FCB). The 52-nt repeat is found in many Bacillus and in some Firmicutes species. Further details are given in the text and in Tables S4 to S8.

Satellites with a short repeat.

This group is formed by satellites with a short repeat (<20 nt), which are usually related to microsatellites (with a repeat of 6 to 10 nt). This relation is clearly apparent in the main satellite families of Betaproteobacteria, which present the largest number of this type of satellite. They probably appear and disappear by either chance or transposition in different regions of the genome. Their total number has been calculated from Table S3, and the average values are given in Table 1. In general, they represent an important proportion of the total number of satellites (33.6 to 68.5%), but there are significant deviations from this common trend, in particular, the practical absence of satellites with short repeats in Leptospira. Firmicutes also show a low content of satellites with short repeats; some species have no satellites of this type. At the other extreme, we find that Betaproteobacteria, in particular Burkholderia, have a very large number of such satellites. It has been suggested that they may play an important role in the adaptation of these bacteria to different mammalian hosts (11).

Satellites in Bacillus.

We will first analyze noncoding satellites in Bacillus. We studied 48 genomes; a complete list with the distribution of the main repeat sizes in each species is given in Table S5. We divided the Bacillus genomes into four groups as a function of the noncoding satellites present in each species, as graphically shown in Fig. 3. The average features of each group are given in Table 2.

FIG 3.

Average number of satellites as a function of repeat length for different groups of Bacillus. Satellites of the B. cereus group are distributed through many repeat lengths, whereas in the other two groups, a single repeat length predominates. The B. circulans group is not represented; it has an intermediate composition (Table 2).

TABLE 2.

Average values for the four groups of Bacillus

The last six columns give the average number of satellites [a] with the indicated repeat size (nt).

| Genome size (Mb) | Bacillus group | No. of genomes | GC content (%) | No. of satellites/Mb | Total | 10–19 | 20–21 | 22–50 | 51–53 | >53 |
|---|---|---|---|---|---|---|---|---|---|---|
| 5.32 | B. cereus | 22 | 35.3 | 5.0 | 26.8 | 6.9 | 2.9 | 14.0 | 0.1 | 2.9 |
| 4.78 | B. simplex | 6 | 40.1 | 8.1 | 38 | 1.7 | 24.5 | 9.3 | 0 | 2.5 |
| 4.67 | B. circulans | 10 | 38.6 | 10.6 | 48.9 | 1.9 | 11.6 | 3.5 | 29.3 | 2.6 |
| 3.95 | B. coagulans | 10 | 43.9 | 13.7 | 53.2 | 0.9 | 0.5 | 1.6 | 46.8 | 3.4 |

[a] The characteristic average number of satellites for each group is given in boldface.

The Bacillus cereus group is very homogeneous, with similar genome sizes (5 to 5.8 Mb) and 34.9 to 35.6% GC content. Practically all repeats have a size which is a multiple of three and correspond to amino acid repeats in proteins, most of them shared by different species of this group; noncoding satellites are practically absent. Satellite families are mixed; they include several species. In fact, the boundaries between members of this group are difficult to define (12), so the genomes in this group might be considered different strains of a single species.

We defined the other three groups of Bacillus by their noncoding satellites. These groups have a limited taxonomical significance, as is apparent by inspection of the genomic similarity trees reported by Hernández-González et al. (13). The B. simplex group is characterized by the total absence of satellites with a 52-nt repeat and the abundance of satellites with a 21- to 42-nt repeat, whereas the B. coagulans group shows the opposite features. Finally, the B. circulans group has an intermediate composition. The different species in each group show great variability, with significant differences in genome size and GC content. Furthermore, each genome has species-specific families of related satellites, with the exception of B. simplex, B. muralis, and B. butanolivorans, which share several families of related satellites, in agreement with their taxonomic relationship (14).

Satellites with a 52-nt repeat.

These satellites are of particular interest, since they have a very constant repeat length, while their composition is very variable. Every genome has several families and unique satellites with the same 52-nt repeat length. An example is given in Table S7, where we present the complete sequences of all satellites for one strain of B. coagulans. Inspection of the table clearly shows that the repeat length is conserved, while the sequence of each repeat may vary significantly. In Table 3 we present the repeat sequence of the main families for all genomes of the B. coagulans group and for those species in the B. circulans group with a higher number of satellites with a 52-nt repeat. We also included a few examples of additional species which contain this type of satellite, although in a much lower proportion. In all cases, the length of the repeat is constant throughout the whole satellite length. An additional feature of these species is the surprisingly small proportion of satellites with different repeat lengths, as is apparent in Fig. 3 and Table 2. The case of B. coagulans is particularly striking, with over 90% of its satellites having a 52-nt repeat. In most cases, when a species acquires satellites with a 52-nt repeat, all other satellites are absent, even those which might code for proteins.

TABLE 3.

Main satellite families with repeat size of 52 nt [a]

The last four columns give the family data. Rows without a species name belong to the species listed above them.

| Species [b] | GC content (%) | No. of sats | % with 52-nt repeat | Family code | Alignment score | Consensus repeat [c] | GC content (%) [d] |
|---|---|---|---|---|---|---|---|
| Bacillus cellulosilyticus | 36.5 | 64 | 73.4 | 37_52_10 | 0.634747 | gTGTaTCATACgaaggCAATGACACgtGAgAAAGtaGaaGaaacgnAATAAa | 37.3 |
| | | | | 61_52_6 | 0.734949 | CAcTCAACGAAGGTcATCATAAGcAAGCAATGCTaCCCCAAAACCAAAcCcn | 47.1 |
| Bacillus coagulans | 46.2–47.3 | 53 (avg) | 90.8 | 1_52_139 | 0.608341 | GTgAAGgAAGgcCnTCnTTTTTcCncGCTTcCTTAACGTAGACGcgCTCTAT | 51.0 |
| | | | | 2_52_35 | 0.736193 | TTTTGTCCTTTTGaCaGcTTCAAAAnGACATTTCGgGCCCgGATgCAgCntG | 48.0 |
| | | | | 8_52_18 | 0.691802 | TGTCCTTCATaagggtGATGAAaGACAAAACaCnGGcCgggaAAcGgCgAAt | 49.0 |
| Bacillus horikoshii | 40.6 | 64 | 71.9 | 22_52_12 | 0.638076 | ATGAAGACaTcAGTGAcGAGaAAtcAGGAGgAGAGAaGTCcTCATcGncGTt | 47.1 |
| Bacillus kochii | 36.8 | 59 | 88.1 | 52_51_7 | 0.658586 | TTCnTTCTGAGTGACtTGCtAatCCcTTTTGCGAAGCgCTCAnCtCtgGtt | 46.9 |
| | | | | 92_54_4 | 0.614087 | CTTTTaTTAGTcGCGaTTcTcacTTTTgCcCTACTtATCttnTcacTnatTCTT | 34.6 |
| Bacillus litoralis | 35.9 | 80 | 73.7 | 42_52_9 | 0.604663 | TGTCGTTCATaAGggtgATGAACGACAAAAGtGnTnaGAAAAGagngnGtag | 41.7 |
| | | | | 77_52_5 | 0.723457 | AATCGGGACAGAAAAaaGAgcgaGCAGtgaAAnTgaGTCTCGATAgTgggng | 48.0 |
| Bacillus oceanisediminis | 40.8 | 38 | 84.2 | 79_52_5 | 0.669182 | nCgCcaAcTTCGGACTCatTCtCtCggTTTTccgcttCTtCTGTCCGAAgtn | 52.0 |
| | | | | 96_54_4 | 0.607071 | cGATTAACTACCATTTTnCctTncaCCnnccTcaTTTTCGtCGTtAatcCaTcc | 42.0 |
| Bacillus thermoamylovorans | 37.5 | 70 | 78.6 | 4_52_21 | 0.685269 | AAAAtGacGACGAGAAnnGGTCTCGTCGCCAAAAAatGGAGTTTTcCGgctc | 48.0 |
| | | | | 5_52_21 | 0.659269 | TTGgCGACGAGaCcnatTCTCGTCaCCaTTTTgaGGtGAAAAAnGCtCnaTT | 44.9 |
| | | | | 30_53_11 | 0.72022 | TGTCCAATAGAACGGcTCTCgTGGACAAAATnGAGgnnTCAATCAGgAAAAAc | 42.0 |
| Virgibacillus halodenitrificans | 37.4 | 48 (avg) | 36.1 | 2_52_19 | 0.604352 | ATGTCTaTCATgAgngtGATGACGGACATTGgGAgcGnGAAAtcgnGnGtGA | 50.0 |
| Paenibacillus borealis | 51.1–51.9 | 88 (avg) | 53.6 | 1_52_46 | 0.656479 | aTGTATGCGAAAAACCGAATACAATGtGCcaccGntAggGTAaacGaGCCnA | 46.0 |
| | | | | 3_51_29 | 0.696522 | TGAaTCCGCcgaAAGtGGGCggnAgGnGGaAATGAGGgGCAgAAGTGCCCC | 63.3 |
| | | | | 4_51_28 | 0.695681 | AAATtAAAGGGATAAATCCCtCTaAttCCGCCGAAAGTGGGCgGtAtGggG | 47.1 |
| Microlunatus phosphovorus | 67.3 | 20 | 65.0 | 168_52_5 | 0.600606 | GGCCCgnGGGCGcgAGAcGTaCGCagttCnnGcctGgctcGcGgTCGTntGT | 72.9 |
| | | | | 457_52_3 | 0.725367 | GCGCCGCCGGGCCACngACgaCCgncGACCGcGCGTGCnnGGCgTACGTcTc | 81.2 |
[a] We present in this table the species with a large number of satellites (sats) with a 52-nt repeat. Only the main families are shown. All species in this table belong to the Firmicutes group, with the exception of M. phosphovorus, which belongs to Actinobacteria.

[b] The values for V. halodenitrificans include three strains of this species, those of B. coagulans include seven strains, and those of P. borealis include five related Paenibacillus. The average number of satellites per species is given for these three cases.

[c] Uppercase indicates a coincidence over 90%, lowercase 50 to 90%, and n below 50%. The consensus sequence is different in each family. We detected a 30-nt homologous region in only two cases (underlined).

[d] The GC content of a consensus repeat is calculated excluding the n's in the sequence.
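The conventions in the footnotes to Table 3 can be expressed in a few lines. The sketch below assumes that the per-position base frequencies are already known; the function names are illustrative and are not taken from the authors' software. The handling of frequencies of exactly 90% is also an assumption, since the footnote does not state it.

```python
def consensus_symbol(base, frequency):
    """Encode one consensus position using the case convention of Table 3.

    frequency: fraction (0 to 1) of repeats that share `base` at this position.
    """
    if frequency > 0.9:
        return base.upper()   # coincidence over 90%
    if frequency >= 0.5:
        return base.lower()   # coincidence of 50 to 90%
    return "n"                # coincidence below 50%


def gc_content_excluding_n(consensus):
    """GC content (%) of a consensus repeat, ignoring the n positions."""
    bases = [b for b in consensus.upper() if b != "N"]
    if not bases:
        raise ValueError("consensus contains no defined bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


# Family 457_52_3 of Microlunatus phosphovorus, copied from Table 3.
family_457 = "GCGCCGCCGGGCCACngACgaCCgncGACCGcGCGTGCnnGGCgTACGTcTc"
gc_457 = gc_content_excluding_n(family_457)
```

For this family, 39 of the 48 defined positions are G or C, which corresponds to the 81.2% listed in the table.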

For an analysis of the distribution of 52-nt repeat satellites throughout the genome, the B. coagulans group is the most suitable, since there are seven strains which have been fully sequenced. This species is considered to be probiotic; different strains have been isolated and sequenced from the soil and from the gut of various animals (pig and chicken). All strains have a similar number of over 50 satellites with the 52-nt repeat, with the exception of strain 2-6, which has a smaller genome (15) and only 21 satellites (Table S5). We cannot fully exclude the possibility that these differences are due to sequencing errors, although sequencing was carried out with appropriate methodology (15, 16). In any case, we present in Fig. 4 a comparison of the genomes of the 2-6 and R11 strains. The difference in the sizes of the two genomes is due to many small interspersed gaps. About half of the satellites of the 2-6 genome are found in corresponding positions in the longer R11 genome, but no correspondence is found for the other half; some satellites are conserved in both strains, whereas other satellites appeared independently. There is no obvious explanation for the conservation of a constant 52-nt repeat in these groups of satellites; it appears that the 52-nt length is required for whatever function these satellites might have.

FIG 4.

Alignment of genomes of two B. coagulans strains. The genome of the 2-6 strain (GenBank accession number NC_015634.1) is given at the top; at the bottom is the R11 strain (GenBank accession number NZ_CP026649.1). They contain, respectively, 21 and 52 satellites with a 52-nt repeat. Each satellite is indicated at its position with a short black line. A cluster of satellites is present in both genomes around the 300 kb position. In the rest, the greater abundance of satellites in the R11 strain is obvious.

Satellites with 20- to 23-nt and 40- to 43-nt repeats.

Satellites with 20- to 23-nt and 40- to 43-nt repeats may be considered functionally related, since their lengths differ by a factor of two. As shown in Fig. 2, they are found in a miscellaneous group of bacterial genomes, but a detailed study of these satellites may lead to confusing results, since they include repeats of 21 or 42 nt which may correspond to either noncoding satellites or to parts of genes coding for amino acid repeats. We included some examples of their repeat sequences in Table S8. We find here again a constant size but unrelated sequences in different species. For example, among the species with the largest number of satellites in this group, Isoptericola variabilis has 48 satellites with a 20-nt repeat, whereas Roseiflexus castenholzii has 40 satellites with a 22-nt repeat. In Nostoc sphaeroides, we found 80 satellites with an exactly 21-nt repeat, which belong to different satellite families. We suspect that these satellites are also noncoding and functionally related to the other satellites in the 20 to 23 class. To find out if there is any known protein with a related repeat of seven amino acids, we compared the satellites in N. sphaeroides with the proteins in this species which have repeats of seven amino acids (Table S9). We found that 48 out of the 80 satellites are located in a region which contains a predicted gene for a protein of this type. However, these proteins are considered hypothetical and do not appear to have any feature in common; we conclude that these satellites are probably noncoding.
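The comparison between satellite positions and predicted genes described above amounts to an interval-overlap count. The following sketch shows one way to do it; the function name and all coordinates are hypothetical and do not come from the N. sphaeroides annotation.

```python
def count_satellites_in_genes(satellites, genes):
    """Count satellites that overlap at least one gene.

    Both arguments are lists of (start, end) tuples, 1-based and inclusive.
    """
    def overlaps(a, b):
        # Two closed intervals overlap if each starts before the other ends.
        return a[0] <= b[1] and b[0] <= a[1]

    return sum(any(overlaps(s, g) for g in genes) for s in satellites)


# Hypothetical coordinates for illustration only.
satellites = [(100, 520), (2000, 2400), (5000, 5300)]
genes = [(50, 900), (5200, 6000)]
n_in_genes = count_satellites_in_genes(satellites, genes)
```

A quadratic scan like this is adequate for a few hundred intervals per genome; sorted intervals or an interval tree would be preferable for larger annotations.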

Given the large number of species in this group, we also analyzed their occurrence in the genomes which have the largest number of satellites (>100) (Table S10). We found that satellites with short repeats predominate in Actinobacteria and in most Cyanobacteria, but a few species of the latter group have a large number of satellites with a 21-nt repeat, as found in the Nostoc genus. A few other Cyanobacteria also contain a few satellites of this type. In all cases, they present a repeat of 21 nt; we never found a significant number of satellites with a 22- or 23-nt repeat. Moorea producens, the bacterium whose genome has the largest number of satellites, 748, also belongs to the Cyanobacteria, but it contains mainly short repeat satellites (598). This species is known for its impressive potential to produce secondary metabolites (17). In Proteobacteria with a large number of satellites, we also found a significant number of satellites with 20- to 23-nt and 40- to 43-nt repeats; in Sandaracinus amylolyticus, they represent 48% of all satellites (Table S4).

A unique case to be considered here is a Tannerella sp. strain (NCBI accession number NZ_CP017038.2). It contains 273 satellites, 239 of which have a 40- to 43-nt repeat. They form 23 families with 4 to 14 satellites each. These 23 families have similar repeat lengths but different repeat sequences. There are also 30 unique satellites with the same size but with no sequence correspondence with other satellites found in this genome. The predominance of repeats of this size is not found in any other bacterial species, with the exception of another Tannerella genome (NCBI accession number NZ_CP028365.1), which was recently published (18). Comparison with the genome of Tannerella forsythia shows only 9% genomic alignment, as determined with M-GCAT (19). Note that this species is not included in our study since its genome has fewer than 20 satellites. It appears that these Tannerella species are only distantly related.

Spirochaetes.

Spirochaetes form a separate group of bacteria with no clear relation to other bacterial species. We found that most Spirochaetes have very few satellites. An exception is Leptospira interrogans, which has a considerable number of satellites, 37 to 68 in different strains. This species has unique features (20, 21) which differentiate it from other Spirochaetes. It is a facultative parasitic species, first sequenced by Ren et al. (21). Strains from different mammalian hosts have since been sequenced. We analyzed in detail the satellite families in seven strains together with two related Leptospira, L. noguchii and L. kirschneri. These two species have been fully sequenced but not assembled; they are closely related to L. interrogans (22). A large satellite family with a consensus repeat of 69 nt corresponds to sequences coding for amino acid repeats of leucine-rich proteins (23). Several noncoding satellite families are also present; their consensus sequences are shown in Fig. 5. The distribution of satellites in these families is given for each strain in Table S6. The repeat sequences are different from those described above for other bacterial species, but a common feature is that satellites with short repeats (<30 nt) are rarely found (Table S6), as is also found in B. coagulans and in other Bacillus spp. described above. An additional difference from the 52-nt constant repeats found in Bacillus is the variable repeat length, which fluctuates in a broad range, 34 to 39 and 44 to 47 nt. We also studied the position of satellites along the genome, as shown in Fig. 6, for two strains which have different numbers of satellites. About half are found in syntenic regions, but in other cases, no correspondence is apparent. Their distribution along the genome is not uniform, with a lower number of satellites at both ends of the published genome sequence and in some other regions. 
We did not detect any relationship with the distribution of transcriptional start sites throughout the genome (24). The unique features of these satellites will be discussed below.

FIG 5.

Consensus repeat sequence of noncoding satellites in L. interrogans. The codes for all families are given. In the case of repeats of 45 to 47 nt, the sequences can be aligned with a coincidence of 66%.

FIG 6.

Comparison of the distribution of satellites in two strains of L. interrogans. In the top frame, we present the length of each satellite as a function of its position in the genome; it is obvious that the Lai strain has fewer satellites than the Hardjo strain. The genomes of both strains are aligned in the bottom frame. Satellites are indicated by short black lines on both genomes. A clear correspondence has been detected for only 13 satellites. An overall comparison of both genomes has been reported by Llanes et al. (43).

DISCUSSION

A resource for bacterial satellites.

The data on satellites and satellite families which we deposited in GitHub may be of interest to researchers studying bacterial genomes (https://github.com/jasubirana/Bacterial-Satellites). A unique feature of our work is the classification of satellites in different families with a related composition. This feature allows a search for a possible function of satellites from a given family. Although our data cover only the fully assembled genome sequences which were available at the time of writing this paper, the methodology we describe may be used to obtain the distribution of satellites in other genomes. One general conclusion of our study is that bacterial genomes usually have a low number of satellites, but there are notable exceptions as we have described. We should note that some of the satellites we have described might be used to distinguish different strains of bacterial species by the analysis of the variable number of tandem repeats (VNTR), although in general, minisatellites are used for this purpose (48).

The biological relevance of noncoding satellites.

An unexpected and enigmatic result of our study is the strict length conservation of noncoding satellites throughout the bacterial kingdom. This is in contrast with the wide range of sizes found in satellites in eukaryotic species (10, 25). We detected noncoding satellites of only two lengths (22/44 and 52 nt) in many unrelated species, with no sequence conservation. As far as we know, there is no precedent for such organization in any genome. The constant size of the repeats suggests a biological role for these satellites, but there is no obvious function for them. A possible explanation is that there is a protein or group of proteins which recognizes some feature of the sequence and requires a constant repeat size to polymerize on the DNA. A tentative model is presented in Fig. 7. A distantly related structure is found in some transcription factors, such as the zinc fingers (26), which recognize short DNA repeats of 3 base pairs through a repeat of 28 amino acids (27). Another related structure is found in eukaryotic centromeres (28), in which an array of satellites with a constant length is bound to nucleosomes which contain a specific CENP-A histone. The repeat length of the satellites varies in different species (29); in human centromeres, it is called alpha satellite and has a length of 171 nt (30). In the case of the noncoding satellite repeats which we are considering, a protein-DNA complex might be involved in the stabilization of the bacterial nucleoid. There is an extensive literature describing proteins which contribute to stabilizing the nucleoid, as reviewed by Dame et al. (31), but no satellite has been reported as a recognition sequence. We should note that the nucleoid-associated proteins also have a transcription factor-related function, since they organize the nucleoid into domains with a highly dynamic structure (32) and different transcription activity (33). 
Among these DNA-associated proteins, H-NS, a protein with 125 amino acid residues, is capable of polymerizing along DNA (34) and also of cross-linking distant strands of DNA (35). An oligomer of a related protein would fulfill the requirements of the model presented in Fig. 7. If this model turns out to be correct, the B. coagulans group and other bacteria which contain the 52-nt repeat may have found a unique mechanism to stabilize their nucleoid. Horizontal gene transfer, very common in bacterial species and particularly in B. coagulans (36), could have contributed to the dissemination of satellites with a 52-nt repeat among many Firmicutes species (Table 3; see Tables S4 and S5 in the supplemental material). The other group of noncoding satellites (20- to 23-nt and 40- to 43-nt repeats) may have a similar or an unrelated function, using a different set of associated proteins.

FIG 7

Model of protein-DNA complexes formed by satellites with a regular 52-nt repeat. Individual proteins or protein aggregates may bind to satellites. DNA in the complex is indicated as a straight line, but it might have any shape (curved, bent, coiled, etc.). Complexes formed by one satellite may aggregate with similar complexes in other regions of the genome in order to stabilize the nucleoid.

The particular case of Leptospira interrogans.

L. interrogans is a unique species which is clearly different from other Leptospira, as judged from the detrended canonical analysis of the proteomes of different spirochetes (20). Furthermore, the genome of L. interrogans surpasses those of other bacteria in the number of encoded proteins with structural similarity to eukaryotic proteins (21), which indicates extensive horizontal gene transfer. Our work adds a new unique feature to this species, the presence of a specific group of noncoding tandem repeats, as shown in Fig. 5. There is no obvious role for these satellites, but we can suggest several possibilities. The large number of noncoding satellites in L. interrogans may give a selective advantage to this species, which has a high genomic plasticity (22) and is able to adapt to different hosts; satellites may facilitate genome rearrangements through recombination between satellites in distant genome positions. Noncoding satellites might also influence the virulence of this species; its pathogenic factors are poorly characterized (37), in spite of the detailed genomic analysis available for this genus (38). Other pathogenic species also have a few related noncoding satellites. For example, Leptospira santarosai has only 24 satellites, 3 of them with a repeat sequence similar to those shown in Fig. 5. Noncoding satellites might also be involved in the stabilization of the nucleoid. In this species, the DNA is folded back and forth and appears as a discontinuous bundle of DNA strands along the whole cell (39). However, this organization is also found in other Leptospira, such as L. biflexa, which have very few satellites and lack noncoding satellites. On the other hand, a unique DNA-binding protein has been described in L. interrogans (40), so this protein may stabilize packing of the nucleoid in L. interrogans by interacting with noncoding satellites.

MATERIALS AND METHODS

Detection of satellites.

We downloaded from the NCBI web site (https://www.ncbi.nlm.nih.gov/genome/browse#!/prokaryotes/#!/prokaryotes/) the sequences of all bacterial genomes which were fully sequenced and assembled as of 14 December 2018. These 12,233 genomes give a heterogeneous view of the bacterial kingdom: many genera are not represented, whereas several genomes are available for the most studied groups. We searched all these genomes for satellites with the SATFIND program, which is available on our website (http://alggen.lsi.upc.edu) and is described in detail elsewhere (10, 25). Its source code has been deposited in Dryad (http://datadryad.org/review?doi=doi:10.5061/dryad.h5s2q). The program locates clusters of any short sequence of a preset size, without internal repetitions, that is repeated a minimum number of times within a region of fixed size. Repeats of 1 to 5 nt are automatically eliminated. The minimum length of a repeat was set to 10 nt in our search. As a result, shorter repeats, for example 7 nt, appear as repeats with a doubled size, 14 nt. Once a satellite is located, the program continues its search along the genome until no further neighboring repeats are detected, with no upper limit on the number of repeats in the satellite. The program allows a precise definition of satellites (repeat size, number of repeats, and internal regularity). We adjusted the parameters in order to capture short satellites with at least four repeats in a genome region of 800 nt. In order to eliminate the most irregular satellites, we only accepted those in which at least 60% of the repeats have an identical length (±1 nt). In this way, most irregular satellites are eliminated, although with these parameters, some satellites with only four repeats may still be irregular. Occasionally, we also changed the parameters of the program to detect additional satellites with a decreased regularity by requiring that only 10% of their repeats have an identical length.
We also used a modified version of SATFIND for the detection of amino acid repeats in proteins.
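The search criteria above can be illustrated with a minimal Python sketch. It is not SATFIND: it only finds exact, uninterrupted tandem repeats of one given period, whereas the program described here tolerates irregular repeats and works within an 800-nt window. The function names and the reading of the 10-nt minimum as "smallest multiple of the period that reaches 10 nt" are assumptions made for illustration.

```python
from math import ceil


def reported_period(period: int, min_len: int = 10) -> int:
    """Smallest multiple of `period` that is at least `min_len` nt.

    Assumed reading of the text: a 7-nt repeat is reported as 14 nt
    because the minimum repeat length was set to 10 nt.
    """
    return period * ceil(min_len / period)


def find_exact_tandem_repeats(seq: str, period: int, min_copies: int = 4):
    """Return (start, copies) for runs of exact tandem repeats.

    A run is extended while each base equals the base one period
    upstream; only runs with at least `min_copies` full copies are kept.
    """
    hits = []
    i, n = 0, len(seq)
    while i + period <= n:
        j = i + period
        while j < n and seq[j] == seq[j - period]:
            j += 1
        copies = (j - i) // period
        if copies >= min_copies:
            hits.append((i, copies))
            i = j  # continue the scan after this satellite
        else:
            i += 1
    return hits


unit = "ACGTTGCAAGGC"  # hypothetical 12-nt repeat unit
genome = "TTTT" + unit * 5 + "TTTT"
print(find_exact_tandem_repeats(genome, period=12))
print(reported_period(7))
```

In this toy sequence the five copies of the 12-nt unit are reported as one satellite starting at position 4 (0-based), and a 7-nt period maps to a reported size of 14 nt.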

Each satellite was also characterized by a similarity score obtained upon alignment of all its repeats which have an identical size, thus excluding all repeats with indels. Each satellite may also be characterized by a homogeneity parameter, which gives the proportion of repeats with the same length in the satellite. This parameter varies between 0.6 (60%) and 1.0 (100%), since satellites with low homogeneity have not been accepted, as mentioned above. The regularity of each satellite is thus characterized by two parameters: Ni gives the number of repeats in the satellite which have an identical length, and an alignment score is calculated for these Ni repeats.
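As an illustration of the two length-based quantities, the following Python sketch computes Ni and the homogeneity from a list of repeat lengths. It assumes that "identical length" is measured against the most common repeat length in the satellite, and that the ±1-nt tolerance of the acceptance filter is applied around that same value; both choices, and the function names, are assumptions rather than a description of SATFIND.

```python
from collections import Counter


def regularity(repeat_lengths, tolerance=0):
    """Return (Ni, homogeneity) for one satellite.

    Ni counts the repeats whose length is within `tolerance` nt of the
    most common repeat length; homogeneity is Ni divided by the total
    number of repeats.
    """
    modal_length, _ = Counter(repeat_lengths).most_common(1)[0]
    n_i = sum(1 for length in repeat_lengths
              if abs(length - modal_length) <= tolerance)
    return n_i, n_i / len(repeat_lengths)


def is_accepted(repeat_lengths, min_homogeneity=0.6):
    """Acceptance filter: at least 60% of repeats within +/-1 nt."""
    _, homogeneity = regularity(repeat_lengths, tolerance=1)
    return homogeneity >= min_homogeneity


# Hypothetical satellite with ten repeats, mostly 52 nt long.
lengths = [52, 52, 52, 51, 52, 55, 52, 52, 40, 52]
print(regularity(lengths))
print(is_accepted(lengths))
```

For this hypothetical satellite, seven of the ten repeats have exactly the modal length of 52 nt, so Ni is 7 and the homogeneity is 0.7; the satellite passes the 60% filter.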

A limitation of our study is the difficulty in accurately determining the complete sequence of tandem repeats due to common sequencing errors (41, 42). However, when sequences of the same genome reported by different authors are compared (41), all reveal satellites in the same positions, although their lengths may differ due to limitations of the sequencing methodologies. For this reason, we studied in more detail the satellites of species for which complete sequences of several strains are available.

Identification of satellite families.

In order to detect related satellites, we used MALIG, a progressive multiple sequence alignment algorithm which we developed to align satellite repeats and identify families of satellites with a related sequence. It is available on our website and has been described in detail elsewhere (10). The program considers reverse sequences as well, normalizes the alignment score to the maximum possible value, and selects the cyclic permutation with the highest score. Then the progressive multialignment is applied to the matrix of pairwise alignment scores. The process finishes when the score is smaller than a similarity threshold (input parameter), which we set to 0.6.
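The pairwise comparison step can be sketched in Python as follows. This is not MALIG: a simple ungapped identity fraction stands in for the normalized alignment score, "reverse sequences" is assumed to mean the reverse complement, and only repeats of equal length are compared. All function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def identity(a: str, b: str) -> float:
    """Fraction of identical positions, a score normalized to [0, 1]."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_cyclic_score(a: str, b: str) -> float:
    """Best score of `a` against every cyclic permutation of `b`
    and of the reverse complement of `b`."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares repeats of equal length")
    best = 0.0
    for candidate in (b, reverse_complement(b)):
        for shift in range(len(candidate)):
            rotated = candidate[shift:] + candidate[:shift]
            best = max(best, identity(a, rotated))
    return best


THRESHOLD = 0.6  # similarity threshold used in this study

a = "ACGTTGCAAGGC"                 # hypothetical repeat
b = a[5:] + a[:5]                  # same repeat, different phase
c = reverse_complement(b)          # same repeat, opposite strand
print(best_cyclic_score(a, b) >= THRESHOLD)
print(best_cyclic_score(a, c) >= THRESHOLD)
```

Scanning all cyclic permutations matters because the starting point of a tandem repeat is arbitrary: two satellites with the same repeat can be reported in different phases or on opposite strands and would otherwise appear unrelated.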

We searched for satellite families separately in different taxonomic groups of species. Each family is characterized by three values, Fam_a_b_c. The order in the list of families is given by a, starting with those families with the largest number of members. The second value, b, gives the size of the repeat; c gives the number of members in the family. The consensus sequence of the repeat in each family is calculated taking into account the circularly permuted sequence of all repeats. Individual families may contain satellites with slightly different repeat lengths (±1 nt) from either one or several species.

Satellites whose repeats are not related in sequence to those of any other satellite are considered unique. In the list of families, they appear as families with a single member, c = 1.
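The Fam_a_b_c naming scheme can be illustrated with a short Python sketch. It assumes that the rank a starts at 1 (the text does not state the starting value) and uses hypothetical satellite identifiers; it is meant only as an aid for reading the deposited family lists.

```python
def name_families(families):
    """Name families as Fam_a_b_c.

    `families` is a list of (repeat_size, members) pairs. Families are
    ranked by decreasing number of members; a is the rank (assumed to
    start at 1), b the repeat size, and c the number of members.
    """
    ranked = sorted(families, key=lambda fam: len(fam[1]), reverse=True)
    return [f"Fam_{rank}_{size}_{len(members)}"
            for rank, (size, members) in enumerate(ranked, start=1)]


def parse_family_name(name):
    """Split a Fam_a_b_c label into its three values."""
    _, a, b, c = name.split("_")
    return {"rank": int(a), "repeat_size": int(b),
            "members": int(c), "unique": int(c) == 1}


# Hypothetical families: (repeat size in nt, satellite identifiers).
families = [(22, ["s1", "s2"]), (52, ["s3", "s4", "s5"]), (44, ["s6"])]
names = name_families(families)
print(names)
print(parse_family_name(names[-1]))
```

The last family in this example has a single member (c = 1), which corresponds to a unique satellite in the sense described above.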

Genome alignments.

In a few cases, we determined the correspondence of pairs of genomes using the M-GCAT method (19). M-GCAT (Multiple Genome Comparison and Alignment Tool) is a multiple genome alignment tool based on the search for maximal unique matches (MUMs) between genomes on both strands. First, a set of anchor MUMs is found, from which those MUMs that are shorter than a specific parameter (minimum anchor length) or that could be found at random (shorter than the base 4 logarithm of the genome length) are discarded. This set of anchor MUMs divides the genome into several short parts, in which a recursive search for MUMs is made. The recursive search continues until the length of the part is shorter than a given parameter (100 nucleotides in our case). Finally, close consecutive MUMs, separated by less than a given parameter (in our case 2,000 nucleotides), are grouped in clusters. The program provides a numerical and a graphical representation of the alignment.
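The anchor filtering and clustering rules can be sketched in Python as follows. This is not the M-GCAT implementation: MUMs are represented here simply as (start, length) pairs on a single genome, the recursive search is omitted, and the minimum anchor length of 20 nt in the example is a hypothetical value that is not given in the text.

```python
import math


def random_match_length(genome_length: int) -> float:
    """Base 4 logarithm of the genome length; shorter matches are
    treated as possibly random."""
    return math.log(genome_length, 4)


def filter_anchors(mums, genome_length, min_anchor_length):
    """Discard MUMs shorter than either length threshold."""
    cutoff = max(min_anchor_length, random_match_length(genome_length))
    return [(start, length) for start, length in mums if length >= cutoff]


def cluster_mums(mums, max_gap=2000):
    """Group consecutive MUMs separated by less than `max_gap` nt."""
    clusters = []
    for start, length in sorted(mums):
        if clusters:
            prev_start, prev_length = clusters[-1][-1]
            if start - (prev_start + prev_length) < max_gap:
                clusters[-1].append((start, length))
                continue
        clusters.append([(start, length)])
    return clusters


# Hypothetical MUMs on a 4-Mb genome: (start, length).
mums = [(100, 30), (500, 25), (5000, 40), (5100, 8)]
anchors = filter_anchors(mums, genome_length=4_000_000, min_anchor_length=20)
print(cluster_mums(anchors))
```

For a genome of 4 Mb the base 4 logarithm is just under 11, so in practice the minimum anchor length is usually the stricter of the two cutoffs. In the example, the 8-nt match is discarded and the remaining MUMs form two clusters, because the third MUM lies more than 2,000 nt downstream of the second.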

Data availability.

An Excel file with the sequences of all satellites and a list of all satellite families and their members have been deposited in GitHub (https://github.com/jasubirana/Bacterial-Satellites).

Supplementary Material

Supplemental file 1
JB.00229-20-s0001.xlsx (1.3MB, xlsx)
Supplemental file 2
JB.00229-20-s0002.xlsx (314KB, xlsx)
Supplemental file 3
JB.00229-20-s0003.pdf (779.8KB, pdf)

ACKNOWLEDGMENTS

We thank J. Lourdes Campos for help with the figures and M. Mar Albà for insightful discussion and suggestions.

This work was supported by Ministerio de Ciencia e Innovación–Agencia Estatal de Investigación, Spain (projects TIN2015-69175-C4-3-R and RTI2018-094403-B-C33), and FEDER.

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Footnotes

Supplemental material is available online only.

REFERENCES

  • 1. Treangen TJ, Abraham A, Touchon M, Rocha EPC. 2009. Genesis, effects and fates of repeats in prokaryotic genomes. FEMS Microbiol Rev 33:539–571. doi: 10.1111/j.1574-6976.2009.00169.x.
  • 2. Babakhani S, Oloomi M. 2018. Transposons: the agents of antibiotic resistance in bacteria. J Basic Microbiol 58:905–917. doi: 10.1002/jobm.201800204.
  • 3. Mahillon J, Chandler M. 1998. Insertion sequences. Microbiol Mol Biol Rev 62:725–774. doi: 10.1128/MMBR.62.3.725-774.1998.
  • 4. van Belkum A, Scherer S, van Alphen L, Verbrugh H. 1998. Short-sequence DNA repeats in prokaryotic genomes. Microbiol Mol Biol Rev 62:275–293. doi: 10.1128/MMBR.62.2.275-293.1998.
  • 5. Lindstedt B. 2005. Multiple-locus variable tandem repeats analysis for genetic fingerprinting of pathogenic bacteria. Electrophoresis 26:2567–2582. doi: 10.1002/elps.200500096.
  • 6. Slack AT, Dohnt MF, Symonds ML, Smythe LD. 2005. Development of a multiple-locus variable number of tandem repeat analysis (MLVA) for Leptospira interrogans and its application to Leptospira interrogans serovar Australia isolates from far north Queensland, Australia. Ann Clin Microbiol Antimicrob 4:10. doi: 10.1186/1476-0711-4-10.
  • 7. Salaün L, Mérien F, Gurianova S, Baranton G, Picardeau M. 2006. Application of multilocus variable-number tandem-repeat analysis for molecular typing of the agent of Leptospirosis. J Clin Microbiol 44:3954–3962. doi: 10.1128/JCM.00336-06.
  • 8. Denoeud F, Vergnaud G. 2004. Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains: a Web-based resource. BMC Bioinformatics 5:4. doi: 10.1186/1471-2105-5-4.
  • 9. Subirana JA, Messeguer X. 2019. Satellites in the prokaryote world. BMC Evol Biol 19:181. doi: 10.1186/s12862-019-1504-2.
  • 10. Subirana JA, Albà MM, Messeguer X. 2015. High evolutionary turnover of satellite families in Caenorhabditis. BMC Evol Biol 15:218. doi: 10.1186/s12862-015-0495-x.
  • 11. Nierman WC, DeShazer D, Kim HS, Tettelin H, Nelson KE, Feldblyum T, Ulrich RL, Ronning CM, Brinkac LM, Daugherty SC, Davidsen TD, Deboy RT, Dimitrov G, Dodson RJ, Durkin AS, Gwinn ML, Haft DH, Khouri H, Kolonay JF, Madupu R, Mohammoud Y, Nelson WC, Radune D, Romero CM, Sarria S, Selengut J, Shamblin C, Sullivan SA, White O, Yu Y, Zafar N, Zhou L, Fraser CM. 2004. Structural flexibility in the Burkholderia mallei genome. Proc Natl Acad Sci U S A 101:14246–14251. doi: 10.1073/pnas.0403306101.
  • 12. Maughan H, Van der Auwera G. 2011. Bacillus taxonomy in the genomic era finds phenotypes to be essential though often misleading. Infection Gen Evol 11:789–797. doi: 10.1016/j.meegid.2011.02.001.
  • 13. Hernández-González IL, Moreno-Hagelsieb G, Olmedo-Alvarez G. 2018. Environmentally-driven gene content convergence and the Bacillus phylogeny. BMC Evol Biol 18:148. doi: 10.1186/s12862-018-1261-7.
  • 14. Kämpfer P, Busse H, McInroy JA, Glaeser SP. 2015. Bacillus gossypii sp. nov., isolated from the stem of Gossypium hirsutum. Int J Syst Evol Microbiol 65:4163–4168. doi: 10.1099/ijsem.0.000555.
  • 15. Su F, Yu B, Sun J, Ou H, Zhao B, Wang L, Qin J, Tang H, Tao F, Jarek M, Scharfe M, Ma C, Ma Y, Xu P. 2011. Genome sequence of the thermophilic strain Bacillus coagulans 2-6, an efficient producer of high-optical-purity L-lactic acid. J Bacteriol 193:4563–4564. doi: 10.1128/JB.05378-11.
  • 16. Qin J, Wang X, Wang L, Zhu B, Zhang X, Yao Q, Xu P. 2015. Comparative transcriptome analysis reveals different molecular mechanisms of Bacillus coagulans 2-6 response to sodium lactate and calcium lactate during lactic acid production. PLoS One 10:e0124316. doi: 10.1371/journal.pone.0124316.
  • 17. Leao T, Castelão G, Korobeynikov A, Monroe EA, Podell S, Glukhov E, Allen EE, Gerwick WH, Gerwick L. 2017. Comparative genomics uncovers the prolific and distinctive metabolic potential of the cyanobacterial genus Moorea. Proc Natl Acad Sci U S A 114:3198–3203. doi: 10.1073/pnas.1618556114.
  • 18. Beall CJ, Campbell AG, Griffen AL, Podar M, Leys EJ. 2018. Genomics of the uncultivated periodontitis-associated bacterium Tannerella sp. BU045 (oral taxon 808). mSystems 3:e00018-18. doi: 10.1128/mSystems.00018-18.
  • 19. Treangen TJ, Messeguer X. 2006. M-GCAT: interactively and efficiently constructing large-scale multiple genome comparison frameworks in closely related species. BMC Bioinformatics 7:433. doi: 10.1186/1471-2105-7-433.
  • 20. Estrada-Peña A, Cabezas-Cruz A. 2019. Phyloproteomic and functional analyses do not support a split in the genus Borrelia (phylum Spirochaetes). BMC Evol Biol 19:54. doi: 10.1186/s12862-019-1379-2.
  • 21. Ren S-X, Fu G, Jiang X-G, Zeng R, Miao Y-G, Xu H, Zhang Y-X, Xiong H, Lu G, Lu L-F, Jiang H-Q, Jia J, Tu Y-F, Jiang J-X, Gu W-Y, Zhang Y-Q, Cai Z, Sheng H-H, Yin H-F, Zhang Y, Zhu G-F, Wan M, Huang H-L, Qian Z, Wang S-Y, Ma W, Yao Z-J, Shen Y, Qiang B-Q, Xia Q-C, Guo X-K, Danchin A, Saint Girons I, Somerville RL, Wen Y-M, Shi M-H, Chen Z, Xu J-G, Zhao G-P. 2003. Unique physiological and pathogenic features of Leptospira interrogans revealed by whole-genome sequencing. Nature 422:888–893. doi: 10.1038/nature01597.
  • 22. Xu Y, Zhu Y, Wang Y, Chang Y, Zhang Y, Jiang X, Zhuang X, Zhu Y, Zhang J, Zeng L, Yang M, Li S, Wang S, Ye Q, Xin X, Zhao G, Zheng H, Guo X, Wang J. 2016. Whole genome sequencing revealed host adaptation-focused genomic plasticity of pathogenic Leptospira. Sci Rep 6:20020. doi: 10.1038/srep20020.
  • 23. Eshghi A, Gaultney RA, England P, Brûlé S, Miras I, Sato H, Coburn J, Bellalou J, Moriarty TJ, Haouz A, Picardeau M. 2018. An extracellular Leptospira interrogans leucine-rich repeat protein binds human E- and VE-cadherins. Cell Microbiol 21:e12949. doi: 10.1111/cmi.12949.
  • 24. Zhukova A, Fernandes LG, Hugon P, Pappas CJ, Sismeiro O, Coppée J, Becavin C, Malabat C, Eshghi A, Zhang J, Yang FX, Picardeau M. 2017. Genome-wide transcriptional start site mapping and sRNA identification in the pathogen Leptospira interrogans. Front Cell Infect Microbiol 7:10. doi: 10.3389/fcimb.2017.00010.
  • 25. Subirana JA, Messeguer X. 2017. Evolution of tandem repeat satellite sequences in two closely related Caenorhabditis species. Diminution of satellites in hermaphrodites. Genes 8:351. doi: 10.3390/genes8120351.
  • 26. Wolfe SA, Nekludova L, Pabo CO. 2000. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct 29:183–212. doi: 10.1146/annurev.biophys.29.1.183.
  • 27. Castresana J, Guigó R, Albà MM. 2004. Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome. J Mol Evol 59:72–79. doi: 10.1007/s00239-004-2605-z.
  • 28. Hartley G, O’Neill RJ. 2019. Centromere repeats: hidden gems of the genome. Genes 10:223. doi: 10.3390/genes10030223.
  • 29. Jiang J, Birchler JA, Parrott WA, Dawe RK. 2003. A molecular view of plant centromeres. Trends Plant Sci 8:570–575. doi: 10.1016/j.tplants.2003.10.011.
  • 30. Miga KH. 2019. Centromeric satellite DNAs: hidden sequence variation in the human population. Genes 10:352. doi: 10.3390/genes10050352.
  • 31. Dame RT, Rashid FM, Grainger DC. 2020. Chromosome organization in bacteria: mechanistic insights into genome structure and function. Nat Rev Genet 21:227–242. doi: 10.1038/s41576-019-0185-4.
  • 32. Wu F, Japaridze A, Zheng X, Wiktor J, Kerssemakers JWJ, Dekker C. 2019. Direct imaging of the circular chromosome in a live bacterium. Nat Commun 10:2194. doi: 10.1038/s41467-019-10221-0.
  • 33. Shen BA, Landick R. 2019. Transcription of bacterial chromatin. J Mol Biol 431:4040–4066. doi: 10.1016/j.jmb.2019.05.041.
  • 34. Ali SS, Whitney JC, Stevenson J, Robinson H, Howell PL, Navarre WW. 2013. Structural insights into the regulation of foreign genes in Salmonella by the Hha/H-NS complex. J Biol Chem 288:13356–13369. doi: 10.1074/jbc.M113.455378.
  • 35. Qin L, Erkelens AM, Bdira FB, Dame RT. 2019. The architects of bacterial DNA bridges: a structurally and functionally conserved family of proteins. Open Biol 9:190223. doi: 10.1098/rsob.190223.
  • 36. Khatri I, Sharma S, Ramya TNC, Subramanian S. 2016. Complete genomes of Bacillus coagulans S-lac and Bacillus subtilis TO-A JPC, two phylogenetically distinct probiotics. PLoS One 11:e0156745. doi: 10.1371/journal.pone.0156745.
  • 37. Picardeau M. 2017. Virulence of the zoonotic agent of leptospirosis: still terra incognita? Nat Rev Microbiol 15:297–307. doi: 10.1038/nrmicro.2017.5.
  • 38. Fouts DE, Matthias MA, Adhikarla H, Adler B, Amorim-Santos L, Berg DE, Bulach D, Buschiazzo A, Chang Y, Galloway RL, Haake DA, Haft DH, Hartskeerl R, Ko AI, Levett PN, Matsunaga J, Mechaly AE, Monk JM, Nascimento ALT, Nelson KE, Palsson B, Peacock SJ, Picardeau M, Ricaldi JN, Thaipandungpanit J, Wunder EA Jr, Yang F, Zhang J, Vinetz JM. 2016. What makes a bacterial species pathogenic? Comparative genomic analysis of the genus Leptospira. PLoS Negl Trop Dis 10:e0004403. doi: 10.1371/journal.pntd.0004403.
  • 39. Raddi G, Morado DR, Yan J, Haake DA, Yang XF, Liu J. 2012. Three-dimensional structures of pathogenic and saprophytic Leptospira species revealed by cryo-electron tomography. J Bacteriol 194:1299–1306. doi: 10.1128/JB.06474-11.
  • 40. Mazouni K, Pehau-Arnaudet G, England P, Bourhy P, Girons IS, Picardeau M. 2006. The Scc spirochetal coiled-coil protein forms helix-like filaments and binds to nucleic acids generating nucleoprotein structures. J Bacteriol 188:469–476. doi: 10.1128/JB.188.2.469-476.2006.
  • 41. Subirana JA, Messeguer X. 2018. How long are long tandem repeats? A challenge for current methods of whole-genome sequence assembly: the case of satellites in Caenorhabditis elegans. Genes 9:500. doi: 10.3390/genes9100500.
  • 42. Tørresen OK, Star B, Mier B, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. 2019. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 47:10994–11006. doi: 10.1093/nar/gkz841.
  • 43. Llanes A, Restrepo CM, Rajeev S. 2016. Whole genome sequencing allows better understanding of the evolutionary history of Leptospira interrogans serovar Hardjo. PLoS One 11:e0159387. doi: 10.1371/journal.pone.0159387.

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)
