Nucleic Acids Research. 2002 Jan 15;30(2):e6. doi: 10.1093/nar/30.2.e6

A new approach to genome mapping and sequencing: slalom libraries

Veronika I Zabarovska 1, Rinat Z Gizatullin 2, Ali N Al-Amin 2, Raf Podowski 2, Alexei I Protopopov 1,2, Sven Löfdahl 3, Claes Wahlestedt 2, Gösta Winberg 1,3, Vladimir I Kashuba 1,2, Ingemar Ernberg 1, Eugene R Zabarovsky 1,2,4,a
PMCID: PMC99845  PMID: 11788732

Abstract

We describe here an efficient strategy for simultaneous genome mapping and sequencing. The approach is based on physically oriented, overlapping restriction fragment libraries called slalom libraries. Slalom libraries combine features of general genomic, jumping and linking libraries. Slalom libraries can be adapted to different applications and two main types of slalom libraries are described in detail. This approach was used to map and sequence (with ∼46% coverage) two human P1-derived artificial chromosome (PAC) clones, each of ∼100 kb. This model experiment demonstrates the feasibility of the approach and shows that the efficiency (cost-effectiveness and speed) of existing mapping/sequencing methods could be improved at least 5–10-fold. Furthermore, since the efficiency of contig assembly in the slalom approach is virtually independent of length of sequence reads, even short sequences produced by rapid, high throughput sequencing techniques would suffice to complete a physical map and a sequence scan of a small genome.

INTRODUCTION

In the past two years, impressive progress has been made in mapping and sequencing whole genomes of various organisms (1–5). A draft sequence of the human genome has recently been published (6,7). Two basic strategies have so far been employed for genome sequencing.

According to one scheme, the whole genome is mapped using different types of markers and a minimal tiling set of large insert clones, such as cosmids, P1 (PAC) or bacterial (BAC) artificial chromosome clones, is established. Subsequently, these large insert clones are sequenced using a shotgun sequencing strategy: small insert libraries, containing randomly sheared fragments of the large insert clones, are constructed and clones are sequenced from the ends.

The second approach, the whole genome shotgun sequencing strategy (WGS), was recently developed by Venter and colleagues (3), and has proved valuable. This method involves end sequencing of large (PACs, BACs or plasmids with a 50 kb insert) and small insert (2 and 10 kb) clones. PAC and BAC clones covering the whole genome should be carefully mapped. DNA fragments in the small insert clones are generated by physical shearing of whole genomic DNA. Some of the most recent achievements using this strategy have been determination of the nucleotide sequence of nearly the entire 120 Mb euchromatic portion of the Drosophila melanogaster genome (3,4) and generation of the draft sequence of the human genome (7). The WGS method requires the generation of sequences covering the whole genome 10–15 times (4). If sequence coverage is lower, the contig assembly process cannot be completed and the assembled sequences will mainly represent disconnected, unordered islands. Therefore, despite impressive technological progress, mapping and sequencing of even small bacterial genomes is still expensive and laborious.

After completion of the genomic sequence from one organism there will be a great demand, in many cases, for comparisons with the genomes of other individuals, related species, etc. The growing field of comparative genomics is highly relevant for our understanding of human and animal health, epidemiology, evolution and ecology.

It should be possible to apply such comparative techniques rapidly and effectively to related bacterial strains and species, in order to identify the genomic basis for their pathogenic and biological differences, e.g. in the challenging task of studying the human intestinal flora or in identifying pathogenicity islands.

The strategies available for mapping and sequencing are not equal to the challenge of high throughput comparative genomics. Alternative and complementary strategies need to be developed and the imperative now is to find cost-effective and convenient methods that allow comparative genomics projects to be undertaken by a wide range of laboratories.

We have previously suggested an approach for large scale mapping of the human genome based on shotgun sequencing of NotI jumping/linking clones (Fig. 1). We have demonstrated the validity of this method and established that jumps over 1.5 Mb can be achieved (8–13). A significant advantage of this procedure is that it can be automated and used to construct accurate physical and genetic maps at 100–300 kb resolution. For such fine mapping other restriction enzymes, which contain a CpG pair in a shorter recognition site, such as XmaIII, XhoI or SalI, would obviously be more relevant.

Figure 1.

General scheme of computer-assisted large scale mapping, using NotI linking and jumping libraries. Arrowheads show the sequenced part of the DNA fragments. In the genomic DNA, the NotI sites are positioned at junctions between the numbered boxes. BamHI sites are indicated by small vertical arrows (8).

The approach to sequence sampling that we propose in the present work will allow the establishment of a physical map with minimal sets of overlapping clones, which will pinpoint differences in genome organization between organisms. At the same time, a considerable sequence coverage of the genome (∼50%) will be achieved. This will make it possible to locate virtually every gene in a genome, for more detailed study.

The concept is based on alternative approaches to the construction of linking and jumping libraries (8,14–16) and involves the construction of ‘slalom libraries’.

The main purpose of this work is to demonstrate the feasibility and efficiency of this technology and to show that it can be exploited in a cost-efficient manner in combination with new high throughput sequencing techniques, such as pyrosequencing or massively parallel signature sequencing (MPSS) (17,18).

MATERIALS AND METHODS

General molecular biology methods

Common molecular biology and microbiology procedures were performed according to standard methods.

DNA from PAC clones was isolated using a Qiagen Large-Construct Kit. Plasmid isolation was done using Biorobot 9600 and R.E.A.L. Prep 96 Biorobot kits (Qiagen) according to the manufacturer’s protocols. Prior to sequencing, the quality of DNA and insert size were evaluated by agarose gel electrophoresis.

Sequencing was performed using an ABI377 sequencer (Perkin Elmer, Foster City, CA) according to the manufacturer’s instructions.

Construction of slalom libraries

A BamHI library (B slalom library) was constructed using the pBluescriptII KS(+) (Stratagene) vector digested with BamHI and dephosphorylated with calf intestinal phosphatase (CIP). PAC DNA (3 µg) was digested with 30 U BamHI for 1 h. Upon completion of digestion, the enzyme was heat-inactivated for 20 min at 65°C. Approximately 0.5 µg digested DNA was ligated overnight in the presence of the cloning vector (0.1 µg) at room temperature. The ligation mixture was transformed into XL2-Blue cells by electroporation and DNA from white colonies was isolated and sequenced, using reverse and universal primers.

An EcoRI library (R slalom library) was constructed in the same way, but the SLΔB vector was used for cloning. This vector was constructed from BluescriptII KS(+), in which the BamHI site was removed without destroying the open reading frame. Therefore, the colonies with non-recombinant SLΔB plasmids are blue and those with recombinant plasmids are white.

To construct the connecting library (RBR slalom library), plasmid DNA was isolated from ∼2 × 10⁴ pooled clones of the R slalom library and completely digested with BamHI (R-jumping DNA). The kanamycin resistance gene was obtained by PCR amplification from the pUC4K plasmid (Amersham Pharmacia Biotech). PCR primers used were: LinkB-for, 5′-GAA GGG ATC CGC TGA GGT CTG CC-3′; LinkB-rev, 5′-GAA GGG ATC CGG GGA AAG CCA CG-3′. PCR was performed in a 100 µl solution containing 67 mM Tris–HCl, pH 9.1, 16.6 mM (NH4)2SO4, 1.0 mM MgCl2, 0.1% Tween-20, 200 µM dNTPs, 5 ng pUC4K DNA, 400 nM each primer and 5 U Taq DNA polymerase. The PCR cycling conditions were 95°C for 2.5 min, followed by 15 cycles of 95°C for 0.5 min, 55°C for 1 min and 72°C for 1.5 min, with a final extension at 72°C for 3 min. PCR amplified DNA was concentrated with ethanol, digested with BamHI and treated with CIP. The sample was then purified using a JETquick PCR Purification Spin Kit (Genomed Inc.) and dissolved in 100 µl of H2O (Kan-B DNA).
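The amplified kanamycin fragment is digested with BamHI before ligation, which is consistent with each LinkB primer carrying a BamHI recognition sequence (GGATCC) near its 5′-end. The short check below is only an illustrative sketch: the primer sequences are copied from the text with spaces removed, and the helper name `site_positions` is hypothetical.

```python
# BamHI recognition sequence.
BAMHI_SITE = "GGATCC"

# Primer sequences copied from the text, with spaces removed.
PRIMERS = {
    "LinkB-for": "GAAGGGATCCGCTGAGGTCTGCC",
    "LinkB-rev": "GAAGGGATCCGGGGAAAGCCACG",
}

def site_positions(seq, site):
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

# Map each primer name to the positions of BamHI sites it contains.
bamhi_hits = {name: site_positions(seq, BAMHI_SITE) for name, seq in PRIMERS.items()}

for name, hits in bamhi_hits.items():
    print(name, "BamHI site start(s):", hits)
```

Both primers contain a single GGATCC, so BamHI digestion of the PCR product should leave BamHI-compatible ends on either side of the marker gene.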

Ligation of 0.5 µg R-jumping DNA with 0.5 µg Kan-B DNA was performed overnight as described above. The ligate was transformed into XL2-Blue cells by electroporation and kanamycin-resistant colonies were selected and sequenced, using kanamycin-specific reverse and universal primers.

Mapping and sequencing of PAC clones 36b12 and 55a10

PAC clone 36b12 was isolated from high density gridded filters with the human genomic PAC library RPCI1 (UK HGMP Resource Centre) using as a probe a cDNA (accession no. AA429319) containing part of the SEMA IV gene.

PAC clone 55a10 was isolated using PCR primers to the 5′- and 3′-ends of the SEMA IV gene and PCR pools from the human PAC library RPCI1 (UK HGMP Resource Centre).

PCR primers used were: SEMA3F, 5′-AGT AGG GAA GCC CAG AGA AGA A-3′; SEMA3R, 5′-GGG GCC TAT TGG TAC TAT CTC C-3′; SEMA5F, 5′-ATT AAA AGG GAC AAG GGC TAG G-3′; SEMA5R, 5′-AAC AAC TTT AAG CAC GTC GTC A-3′.

PCR was performed in a 40 µl solution containing 67 mM Tris–HCl, pH 9.1, 16.6 mM (NH4)2SO4, 2.0 mM MgCl2, 0.1% Tween-20, 200 µM dNTPs, 3 µl PCR pool, 400 nM each primer and 5 U Taq DNA polymerase. The PCR cycling conditions were 95°C for 1.5 min, followed by 25 cycles of 95°C for 1 min, 60°C for 1 min and 72°C for 0.5 min, with a final extension at 72°C for 3 min.

Sequencing gels were run in an ABI377 automated sequencer (Perkin Elmer), according to the manufacturers’ protocols using standard primers. When sequencing from the marker (KanR) fragment, the following primers were used: linkseq-for, 5′-GCT CAT AAC ACC CCT TGT-3′; linkseq-rev, 5′-CAA CCG TGG CTC CCT CAC-3′.

The slalom clones were ordered and arranged based on sequence comparisons between the three libraries.

Sequence analysis

DNA homology searches were performed in a non-redundant (nr) database using the BLASTN (19) program on the NCBI server (http://www.ncbi.nlm.nih.gov:80/BLAST). Sequence assembly was done using DNASIS v.7.00 (Hitachi Pharmacia). In all cases default parameters were used. Repeat sequences would complicate sequence analysis and clones containing repeats at the ends would lead to multiple branches in the map. To evaluate the significance of this problem, we did not use RepeatMasker.

RESULTS AND DISCUSSION

The idea of the slalom library approach and type I slalom libraries

The principle of the slalom libraries is depicted in Figure 2. In our jumping-linking scheme (Fig. 1) we used NotI–BamHI fragments from two neighboring NotI sites as jumping clones and BamHI–NotI fragments surrounding the same NotI site as linking clones.

Figure 2.

The scheme demonstrating the main principle of slalom libraries. R, EcoRI sites; B, BamHI sites. Horizontal arrows show sequenced ends of the clones and vertical arrows designate the position of the restriction sites in the genomic DNA. The end sequences from the R and BR libraries were compared using a suitable computer program (e.g. DNASIS) and assembled by combining the information on the relative positions of slalom clone ends.

Let us assume that EcoRI and BamHI sites are alternating in a genome. Using the same principle that underlies NotI linking/jumping libraries, we construct EcoRI ‘jumping’ and ‘linking’ libraries. It is obvious that these jumping and linking libraries will be nearly identical to the two standard genomic EcoRI (R) and BamHI (B) digested libraries. For instance, a BamHI fragment from B1 to B2 will contain the EcoRI site R2 and will be equivalent to the corresponding R2 linking clone (Fig. 2, bottom). The same is true for the R library: the EcoRI fragment from R2 to R3 will be equivalent to the jumping clone from R2 to R3 (Fig. 2, top). Therefore, instead of making the more complicated jumping and linking libraries, we can simply replace them with standard R and B libraries. The only problem is how to put EcoRI and BamHI fragments in the correct order. If we sequence with standard primers, we will obtain sequences near EcoRI sites in the first case and sequences near BamHI sites in the second case (Fig. 2, R and B libraries). The ordering problem can be solved in different ways. One way is to create a real EcoRI linking library using the enzyme BamHI, instead of the simple BamHI digested library. The EcoRI linking library (BR library) is constructed in exactly the same way as the NotI linking library: genomic DNA is digested with BamHI, circularized, opened with EcoRI and cloned (10; Figs 2 and 3, left).
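The alternating-site argument can be illustrated with a complete in silico digest of a toy sequence. This is a minimal sketch, not part of the original work: the toy sequence, the spacer and all helper names are hypothetical, and only the recognition sequences and cut positions of EcoRI (G^AATTC) and BamHI (G^GATCC) are standard.

```python
ECORI = "GAATTC"  # EcoRI cuts G^AATTC
BAMHI = "GGATCC"  # BamHI cuts G^GATCC

def site_starts(seq, site):
    """0-based start positions of every occurrence of `site` in `seq`."""
    starts, i = [], seq.find(site)
    while i != -1:
        starts.append(i)
        i = seq.find(site, i + 1)
    return starts

def digest(seq, site, cut_offset=1):
    """Complete digestion of a linear sequence into (start, end) fragments."""
    cuts = [s + cut_offset for s in site_starts(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return list(zip(bounds[:-1], bounds[1:]))

def sites_inside(fragment, seq, site):
    """Number of complete `site` occurrences lying within a fragment."""
    start, end = fragment
    return sum(1 for s in site_starts(seq, site)
               if start <= s and s + len(site) <= end)

# Hypothetical toy genome in which BamHI and EcoRI sites alternate.
SPACER = "ACGTTGCA"
toy_genome = (SPACER * 2 + BAMHI + SPACER * 3 + ECORI + SPACER * 2 + BAMHI
              + SPACER * 4 + ECORI + SPACER * 3 + BAMHI + SPACER)

b_fragments = digest(toy_genome, BAMHI)
ecori_per_b_fragment = [sites_inside(f, toy_genome, ECORI) for f in b_fragments]
print(b_fragments)
print(ecori_per_b_fragment)
```

In this toy case every internal BamHI fragment spans exactly one EcoRI site, which is the property that makes a BamHI fragment equivalent to an EcoRI linking clone when the sites alternate.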

Figure 3.

General scheme showing construction of libraries used in the slalom approach. Different types of clones present in the libraries are shown. R and B represent EcoRI and BamHI sites, respectively.

Sequences from the BR and R libraries are produced using standard reverse and forward sequencing primers and overlapping clones can be identified using a computer program (Fig. 2). The identified sequence matches will correlate the ends of EcoRI fragments in the BR library with the ends of EcoRI fragments in the R library, to yield an ordered set of BamHI–EcoRI clones or sequence tagged sites (STS) distributed along the genome. A set of minimally overlapping clones covering the whole genome can be created using this information.
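The matching step described above can be sketched as a walk along a chain of clones that share end sequences. This is only an illustration under simplifying assumptions: end sequences are replaced by abstract tags (hypothetical labels such as `left_of_R2`), matches are exact, and the contig is linear without branches. Real data would need sequence alignment in both orientations and some tolerance for read errors; the function `build_chain` is not the program used in this study.

```python
from collections import defaultdict

def build_chain(clones):
    """Order clones along a linear contig.

    `clones` maps a clone name to the set of end tags it carries.
    Two clones are neighbours when they share at least one tag.
    """
    by_tag = defaultdict(list)
    for name, tags in clones.items():
        for tag in tags:
            by_tag[tag].append(name)

    neighbours = defaultdict(set)
    for names in by_tag.values():
        for a in names:
            for b in names:
                if a != b:
                    neighbours[a].add(b)

    # In a linear contig the two terminal clones have exactly one neighbour.
    ends = sorted(n for n in clones if len(neighbours[n]) == 1)
    if not ends:
        raise ValueError("no terminal clone found (empty or circular contig)")

    chain, seen = [ends[0]], {ends[0]}
    while True:
        candidates = [n for n in neighbours[chain[-1]] if n not in seen]
        if not candidates:
            break
        if len(candidates) > 1:
            raise ValueError("branch in the map at clone " + chain[-1])
        chain.append(candidates[0])
        seen.add(candidates[0])
    return chain

# Hypothetical example: three R clones and the two BR linking clones joining them.
clones = {
    "R1-R2": {"right_of_R1", "left_of_R2"},
    "R2-R3": {"right_of_R2", "left_of_R3"},
    "R3-R4": {"right_of_R3", "left_of_R4"},
    "BR@R2": {"left_of_R2", "right_of_R2"},
    "BR@R3": {"left_of_R3", "right_of_R3"},
}
ordered = build_chain(clones)
print(ordered)
```

A branch error in such a walk would correspond to the situation noted in Materials and Methods, where repeats at clone ends lead to multiple branches in the map.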

Figure 2 shows the scheme when EcoRI and BamHI sites alternate. In reality, EcoRI and BamHI sites do not always alternate throughout genomes, and this will lead to gaps in this set of overlapping clones if several EcoRI or BamHI sites are located together. Therefore, the genome will not be completely covered with clones, because information between some EcoRI and BamHI sites will be lost. However, this strategy will result in a set of clones that will cover the whole genome. Only two libraries are employed in this variant of the slalom approach (type I).

Type II slalom libraries

The problem with the gaps introduced in the type I slalom libraries scheme can be solved using the type II slalom libraries approach, in which three libraries are utilized (Figs 3 and 4): an EcoRI jumping (slalom RBR or connecting) library is constructed in addition to the R and B libraries.

Figure 4.

Simplified slalom library scheme. Small horizontal arrows indicate the sequenced part of the DNA fragments. Red vertical arrows indicate EcoRI sites and blue vertical arrows indicate BamHI sites. Dashed lines designate genomic DNA sequences missing in the RBR library.

To construct R and B slalom libraries, genomic DNA is digested with a restriction enzyme (EcoRI or BamHI, respectively) and cloned in the appropriate vector. The RBR library can be constructed in different ways. For example, DNA isolated en masse from a slalom R library is digested with BamHI, circularized in the presence of the KanR gene and plated on agar with kanamycin (Fig. 3, right). Apart from the absence of EcoRI–EcoRI fragments, the clones isolated in this manner will be practically identical in structure to the clones from an EcoRI jumping library prepared in the classical way (8,16).

By comparing end sequences of the B library clones with the internal BamHI sequences (from the marker fragments) of the slalom RBR library clones, the BamHI clones can be positioned relative to each other. After comparing end sequences in the R and RBR libraries, EcoRI clones can be positioned relative to each other. Finally, EcoRI and BamHI clones can be assembled into a contig representing their physical positions in the complete genome.
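One way to picture the type II comparison is as a lookup from the internal BamHI sequences of RBR clones to the B clones that share them. The sketch below is a simplified illustration, not the authors' software: sequences are represented by hypothetical tags (`t1`, `t2`, ...), each RBR clone is named after the EcoRI fragment it derives from, and matching is exact.

```python
def bridge_r_fragments(rbr_clones, b_clones):
    """Find B clones that connect two different EcoRI fragments.

    rbr_clones: EcoRI fragment name -> (left internal BamHI tag, right internal BamHI tag),
                i.e. the sequences read from the marker fragment.
    b_clones:   B clone name -> (end tag 1, end tag 2).
    Returns a sorted list of (EcoRI fragment, EcoRI fragment, bridging B clone).
    """
    tag_to_fragment = {}
    for fragment, (left_tag, right_tag) in rbr_clones.items():
        tag_to_fragment[left_tag] = fragment
        tag_to_fragment[right_tag] = fragment

    bridges = []
    for b_name, (end1, end2) in b_clones.items():
        f1 = tag_to_fragment.get(end1)
        f2 = tag_to_fragment.get(end2)
        # A bridging B clone matches marker reads from two different RBR clones.
        if f1 is not None and f2 is not None and f1 != f2:
            first, second = sorted((f1, f2))
            bridges.append((first, second, b_name))
    return sorted(bridges)

# Hypothetical data: three EcoRI fragments and three B clones.
rbr_clones = {
    "R1-R2": ("t1", "t2"),
    "R2-R3": ("t3", "t4"),
    "R3-R4": ("t5", "t6"),
}
b_clones = {
    "B-a": ("t2", "t3"),        # spans EcoRI site R2
    "B-b": ("t4", "t5"),        # spans EcoRI site R3
    "B-inner": ("t9", "t10"),   # lies inside one EcoRI fragment, no match
}
bridges = bridge_r_fragments(rbr_clones, b_clones)
print(bridges)
```

B clones that lie entirely within one EcoRI fragment find no partner in this lookup; they correspond to the genomic sequences that are missing from the RBR library.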

It is clear that a type II slalom library can be used in an ‘express’ version, like a type I slalom library. Only two libraries are used: a B library that will be sequenced with the standard reverse and forward primers and a slalom RBR library that will be sequenced using marker-specific primers. This scheme is, in reality, identical to the scheme shown in Figure 2, the only difference being that in the original scheme R and slalom BR libraries are used.

Mapping and sequencing of PAC clones 36b12 and 55a10

To demonstrate the feasibility of the method, two PAC clones (36b12 and 55a10), each containing an ∼100 kb insert of human DNA, were mapped and partially sequenced using the type II slalom library. In total, 180 subclones of PAC clone 36b12 and 204 subclones of PAC clone 55a10 were sequenced (Table 1). The average length of the sequence reads was 680 bp and the accuracy was 99.5% (compared to the human genome sequence available in the EMBL database). A single read sequence from one of several identical clones/sequences was used for the alignment.

Table 1. The number of end sequences used in the study.

PAC clone    Slalom library
             B (a)       R (a)       RBR (a)     Sum
36b12        80 (30)     107 (46)    50 (24)     237 (76)
55a10        98 (32)     109 (38)    59 (36)     266 (70)
Total        178 (62)    216 (84)    109 (60)    503 (146)

(a) The first figure is the total number of generated sequences; the number of unique sequences is given in parentheses.

Initially, all clones were sequenced with the reverse primer and only unique clones were sequenced with the forward primer and, if necessary, with the kanamycin-specific primer.

Altogether, 342 kb of sequence was generated, corresponding to a 1.5-fold coverage of the PAC clones. Five clones were present only once and the others 2–15 times. These sequences were sufficient to order clones and to cover the two PACs completely. A minimal tiling set of the clones was also established (Figs 5 and 6). Because plasmid DNA was always isolated from 96 clones at a time using a Biorobot 9600 and sequencing was done in an ABI377 analyzer, it is quite possible that the same results could have been achieved with less sequencing.
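The coverage figure can be reproduced from numbers given elsewhere in the paper. The short calculation below is a sketch that assumes the read totals from the Sum column of Table 1, the 680 bp average read length and the combined PAC size of 230 kb quoted in the summary; the variable names are illustrative.

```python
# Total reads per PAC clone (Sum column of Table 1).
READS = {"36b12": 237, "55a10": 266}
MEAN_READ_BP = 680      # average read length reported in the text
TOTAL_TARGET_KB = 230   # combined size of the two PACs, as quoted in the summary

total_reads = sum(READS.values())
raw_sequence_kb = total_reads * MEAN_READ_BP / 1000
fold_coverage = raw_sequence_kb / TOTAL_TARGET_KB

print(total_reads, "reads")
print(round(raw_sequence_kb), "kb of raw sequence")
print(round(fold_coverage, 1), "fold coverage")
```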

Figure 5.

Schematic picture of slalom mapping of PAC clone 36b12. (A) Slalom clones in the B library. (B) Restriction map of the PAC clone. (C) The minimal set of overlapping clones established using the slalom type I libraries approach. (D) The minimal set of overlapping clones established using the slalom type II libraries approach. (E) Slalom clones in the RBR or connecting library. (F) Slalom clones in the R library. The vector region in (B) is shown without details. Small blue horizontal arrows indicate sequences generated from BamHI sites and red arrows indicate sequences from EcoRI sites. Vertical arrows indicate BamHI (Bn), EcoRI (Rn), left (End1) and right (End2) ends of the inserts. Vector EcoRI sites are labeled V. Dashed lines designate genomic DNA sequences missing in the connecting RBR library. Fragments printed in yellow were not actually cloned (R2-3 and R11-12).

Figure 6.

Schematic picture of slalom mapping of PAC clone 55a10. All designations are as in Figure 5. Clone B6-7 was not recovered from the libraries.

A BLAST (19) search of the sequences showed that these regions of the human genome had already been sequenced. PAC clone 36b12 represents sequences from 37 455 to 131 808 (GenBank accession no. AC008064) and PAC clone 55a10 represents sequences 12 936 to 110 500 (GenBank accession no. NT_000067). The alignment of our map with the sequenced human genomic fragments showed that some small BamHI and EcoRI fragments were missing in our scheme. Alternatively, they may represent restriction fragment polymorphisms. However, this does not constitute a problem for the final tiling paths that completely cover the entire PACs.

If we used only our sequence information (i.e. as in sequencing a new genome) then a complete contig of overlapping clones could be established and this approach would generate 19.7 kb of sequence for PAC 36b12 (20.9% of all insert sequence) and 22.4 kb (23.0%) for PAC 55a10. These will be ordered sequences, i.e. the distance between sequences will be known because the insert size in each clone was established before sequencing (see Materials and Methods). The largest sequence contig was 4.7 kb (average size ∼1 kb) and BamHI and EcoRI sequences overlapped only eight times. These overlaps were together less than 3.7 kb. If we added mapping information from accession nos AC008064 and NT_000067 (i.e. as in comparing two related genomes) it would generate a total of 47.3 kb of sequence for PAC 36b12 (50.1%) and 41.5 kb for PAC 55a10 (42.5%). It is worthwhile mentioning that the SEMA IV gene was successfully detected with these slalom libraries.
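The percentages above follow from the insert lengths implied by the GenBank coordinates. The sketch below assumes that the insert length is simply the difference between the two coordinates quoted for each PAC; the dictionary and function names are illustrative only.

```python
# GenBank coordinates of each PAC insert, as quoted in the text.
INSERT_COORDS = {"36b12": (37455, 131808), "55a10": (12936, 110500)}

# Sequence obtained (kb) from our reads alone and with added mapping information.
ORDERED_KB = {"36b12": 19.7, "55a10": 22.4}
WITH_MAP_KB = {"36b12": 47.3, "55a10": 41.5}

def insert_kb(clone):
    """Insert length in kb, taken as end minus start coordinate."""
    start, end = INSERT_COORDS[clone]
    return (end - start) / 1000

def percent(kb, clone):
    """Sequence yield as a percentage of the insert, rounded to one decimal."""
    return round(100 * kb / insert_kb(clone), 1)

ordered_pct = {c: percent(kb, c) for c, kb in ORDERED_KB.items()}
with_map_pct = {c: percent(kb, c) for c, kb in WITH_MAP_KB.items()}
combined_pct = round(
    100 * sum(WITH_MAP_KB.values()) / sum(insert_kb(c) for c in INSERT_COORDS), 1
)

print(ordered_pct)
print(with_map_pct)
print(combined_pct)
```

The combined value for both PACs agrees with the ∼46% coverage stated in the Abstract.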

Interestingly, large BamHI and EcoRI fragments (>20 kb) were successfully cloned, but the choice of enzymes was not optimal for obtaining maximum sequence information, because some fragments were too large (the largest fragment was almost 27 kb in size). The optimal size of the fragments for slalom libraries is 3–4 kb, and the optimal combination of enzymes should be established for each genome before construction of slalom libraries.
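As a rough guide to enzyme choice, the expected spacing between recognition sites can be estimated from base composition. The sketch below is a back-of-envelope model and not a method used in this study: it assumes independent bases with a given GC fraction, which ignores CpG depletion, repeats and all other real sequence structure. The function name is hypothetical; the GC values are those reported for the two PACs.

```python
def expected_spacing_bp(site, gc_fraction):
    """Expected distance between occurrences of `site` in a random sequence.

    Assumes independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2.
    """
    p_gc = gc_fraction / 2
    p_at = (1 - gc_fraction) / 2
    probability = 1.0
    for base in site:
        probability *= p_gc if base in "GC" else p_at
    return 1 / probability

ECORI = "GAATTC"
BAMHI = "GGATCC"

# GC contents of the two PACs as reported in the text.
for clone, gc in {"36b12": 0.40, "55a10": 0.54}.items():
    print(clone,
          "EcoRI:", round(expected_spacing_bp(ECORI, gc)), "bp;",
          "BamHI:", round(expected_spacing_bp(BAMHI, gc)), "bp")
```

With equal base frequencies the model gives 4096 bp for any 6 bp site, close to the 3–4 kb range stated above, while an AT-rich region is expected to be cut more often by EcoRI than by BamHI. An in silico digest of a related, already sequenced genome would be a more reliable basis for choosing enzymes.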

Perspectives

The major difference between the slalom library mapping/sequencing approach and the WGS strategy is that the clones are generated according to a specific scheme and using complete digestion. As a result, the number of variants required to cover the whole genome decreases significantly. The preparation of libraries for the slalom approach is remarkably simple: only complete digestion with EcoRI or BamHI is used. There is no need for size separation, agarose gel purification or establishing conditions for partial shearing/digestion. It is important to mention that there is no need to keep all slalom clones because sequencing information can be used to design PCR primers and even large fragments (up to 40–50 kb) can be amplified by long-range PCR.

The slalom library approach differs fundamentally from the shotgun sequencing approach with respect to the efficiency of assembly (EOA). The EOA for the latter method is strictly dependent on the length of sequencing reads. The longer the reads, the higher the EOA. As the slalom approach uses non-random fractionation of the DNA and each start site is tightly linked with the recognition site for the restriction enzyme, even very short sequences will, in principle, be enough to create a contig of the overlapping clones. The EOA of the slalom library approach is, therefore, essentially independent of read length. Even the short sequences generated by pyrosequencing or MPSS (17,18) should suffice, in principle. As one person can generate thousands of sequences a day using a pyrosequencer, the minimal set of overlapping clones covering a 4 Mb genome can be completed in a couple of days.

We decided to test whether shorter flanking sequences can be successfully used for the ordering of clones. After all sequences were shortened to only the 20 bp adjacent to the restriction enzyme recognition sequences (the EcoRI and BamHI sites), it was still possible to reconstruct the same clone contigs. Therefore, pyrosequencing can be combined with this technique to generate a minimal tiling path and only unique clones need be selected to generate sequence information. The use of this high throughput technique (which is not compatible with the shotgun sequencing approach) highlights the advantage of our slalom approach: 4% coverage is enough to construct a complete contig of overlapping clones covering both PACs. As mentioned above, sequencing of unique clones produced >20% of ordered sequences and, for comparative analysis, ∼50% of all sequences.
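The reason short reads suffice is that every read starts at a restriction site, so two reads from the same site agree from their first base onwards. The toy comparison below is a sketch under simplifying assumptions: the reads are hypothetical, error-free and already in the same orientation, and `match_pairs` is an illustrative helper rather than the program used in this study.

```python
def match_pairs(lib_a, lib_b, k):
    """Pairs of read names (one from each library) whose first k bases are identical."""
    pairs = []
    for name_a, seq_a in lib_a.items():
        for name_b, seq_b in lib_b.items():
            if len(seq_a) >= k and len(seq_b) >= k and seq_a[:k] == seq_b[:k]:
                pairs.append((name_a, name_b))
    return sorted(pairs)

def make_read(site, unit):
    """Hypothetical 46 bp read: a restriction site followed by a repeated 4 bp unit."""
    return site + unit * 10

BAMHI = "GGATCC"

# Toy end reads from a B library and marker reads from an RBR library.
b_reads = {"B1": make_read(BAMHI, "ACGT"), "B2": make_read(BAMHI, "TTGA")}
rbr_reads = {"RBR1": make_read(BAMHI, "ACGT"), "RBR2": make_read(BAMHI, "CAGT")}

pairs_40bp = match_pairs(b_reads, rbr_reads, 40)
pairs_20bp = match_pairs(b_reads, rbr_reads, 20)
print(pairs_40bp)
print(pairs_20bp)
```

In this toy case the 20 bp comparison recovers the same pairing as the 40 bp comparison. With real reads, occasional sequencing errors and repeats near a site could require a tolerant comparison rather than exact prefix matching.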

Before sequencing we usually check the quality of plasmid DNA by agarose gel electrophoresis. Therefore, the size of inserts is known and the distance between sequences can be easily determined. This means that even without sequencing a given genome, its size can be precisely established.

We specifically selected human DNA for testing the slalom scheme because human DNA contains a number of repetitive sequences that create serious problems in establishing the complete human genome sequence. As mentioned in Materials and Methods, to evaluate the significance of this problem, we did not use RepeatMasker. The content of repeat sequences in the PAC clones is shown in Table 2. One PAC clone (36b12) contained more repeats than the human genome on average and the other (55a10) fewer. This difference may result from different GC contents: high for 55a10 (54%) and low for 36b12 (40%). Altogether, these two PACs contained a rather large fraction of repeat sequences, constituting a good representation of the human genome, and we did not have any problems with repeats, even when using only 20 bp end sequences. Repeat sequences may in fact be less of a problem for the slalom approach than for the shotgun sequencing approach for a number of reasons. First, restriction enzymes can be selected that do not cut inside the major repeats, e.g. Alu repeats, LINE repeats, etc. Secondly, long repeats will have unique positions in slalom clones as the second recognition site will be outside the repeat. Thirdly, short repeated elements can be sequenced without serious problems because they will be located within one particular clone (one insert), in contrast to the shotgun sequencing approach, where a particular repeat is represented in many different clones and the main problem is to determine whether it is really the same repeat or a closely related one.

Table 2. Fraction (%) of repeat sequences in the human genome and PAC clones 36b12 and 55a10.

Human genome/PAC clone   SINEs    LINEs    LTR elements   DNA elements   Total interspersed repeats
Human genome (a)         13.14    20.42    8.29           2.84           44.83
36b12                    11.93    22.75    10.79          2.30           47.77 (49.44) (b)
55a10                    18.57    4.02     0.18           2.15           24.92 (26.41) (b)
36b12 + 55a10            15.30    13.23    5.40           2.22           36.16 (37.73) (b)

(a) Data from Lander et al. (6).

(b) Interspersed repeats including simple and low complexity repeats.
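The combined row of Table 2 is numerically consistent with a length-weighted average of the two PACs. The check below is a sketch resting on two assumptions that the text does not state explicitly: that the combined values were obtained by weighting each clone by its insert length, and that the insert lengths are the differences between the GenBank coordinates quoted above.

```python
# Insert lengths (bp) taken from the GenBank coordinates quoted in the text.
INSERT_BP = {"36b12": 131808 - 37455, "55a10": 110500 - 12936}

# Repeat fractions (%) for each PAC, from Table 2.
REPEATS = {
    "36b12": {"SINEs": 11.93, "LINEs": 22.75, "LTR": 10.79, "DNA": 2.30, "Total": 47.77},
    "55a10": {"SINEs": 18.57, "LINEs": 4.02, "LTR": 0.18, "DNA": 2.15, "Total": 24.92},
}

# Combined row (36b12 + 55a10) as reported in Table 2.
REPORTED_COMBINED = {"SINEs": 15.30, "LINEs": 13.23, "LTR": 5.40, "DNA": 2.22, "Total": 36.16}

def weighted_percent(repeat_class):
    """Length-weighted average repeat fraction (%) over both PAC inserts."""
    total_bp = sum(INSERT_BP.values())
    repeat_bp = sum(REPEATS[c][repeat_class] * INSERT_BP[c] for c in INSERT_BP)
    return repeat_bp / total_bp

deviation = {k: abs(weighted_percent(k) - v) for k, v in REPORTED_COMBINED.items()}
for repeat_class, diff in deviation.items():
    print(repeat_class, round(weighted_percent(repeat_class), 2), "deviation", round(diff, 3))
```

Under these assumptions every recomputed value lies within 0.02 percentage points of the reported one, which is within rounding error.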

Based on these experiments, we believe that the slalom library approach is suitable for at least two major applications. (i) For mapping and sequencing large genomes (e.g. of mammals). In this case, the method should be applied to the sequencing of clones with large inserts, e.g. BACs or PACs. (ii) For mapping and sequencing small genomes (e.g. bacterial). Here, the method can be applied directly to the whole genome.

It is important to stress that the benefits of the slalom approach are most obvious in comparative sequencing experiments in combination with high throughput techniques like pyrosequencing or MPSS.

For partial genome sequencing (e.g. for comparing bacterial strains), the minimal set of clones covering the genome is established using pyrosequencing and then sequences covering 50% of the whole genome are generated. The efficiency of this approach will be close to 100% (unique sequence information compared to all generated sequences). Sequenced islands will be separated by gaps, but clones covering these gaps will be available and the order of islands/gaps will be known. Therefore, it will be easy to identify an interesting genomic region, e.g. a pathogenicity island, and sequence it.

If the aim is to completely sequence a given genome, then sequence gaps can be filled using any of the standard methods, such as primer walking or transposon-mediated sequencing (20). Of course, closing of the gaps will be done with significantly lower efficiency. However, the finishing stages of the shotgun sequencing approach are also the most expensive and time consuming part of the process. Lander et al. (6) distinguished three types of gaps: gaps within unfinished sequenced clones; gaps between sequenced clone contigs, but within fingerprint clone contigs; gaps between fingerprint clone contigs. The first type is the simplest and the third is the most complicated to close because constructing a contig of overlapping clones is the most difficult procedure. With the slalom approach we already have a contig of overlapping clones and thus we will only suffer from the first, simplest type of gap. It is important to mention another difference in the finishing stages of these two approaches. With the shotgun sequencing approach sequences from different clones must be connected and here highly related repeats, gene families and polymorphisms will represent a major problem. These problems are non-existent in the slalom approach, where a single insert should be sequenced.

In summary, in this study we applied a novel strategy for the mapping and partial sequencing of two human PACs (230 kb) which contained ∼38% different repeats. A 1.5-fold sequence coverage was achieved (342 kb). Without added mapping information and at the same sequence coverage (1.5-fold) the shotgun sequencing approach can generate practically no ordered sequence information/contigs of overlapping clones. The slalom approach generated a complete contig of overlapping clones and >20% of sequences were ordered (in addition, 25–30% of sequences were not ordered, but can be ordered using additional mapping information). Moreover, as calculations showed, in our particular experiment ∼4% coverage would be enough to construct the contig of the overlapping clones and subsequently generate >20% ordered sequences with almost 100% efficiency, i.e. with 0.2-fold coverage. The benefits of the slalom approach will be most obvious in comparative sequencing experiments in combination with new high throughput sequencing technologies which cannot be used in the shotgun sequencing approach.

ACKNOWLEDGEMENTS

The authors are grateful to Dr Michael Lerman for fruitful discussions and valuable advice. This work was supported by research grants from the Swedish Cancer Society, the Swedish Research Council for Engineering Sciences, Ingabritt och Arne Lundbergs Forskningsstiftelse, the Royal Swedish Academy of Sciences, Pharmacia Corporation Center for Genomics Research and the Karolinska Institute.

REFERENCES

1. Dunham,I., Hunt,A.R., Collins,J.E., Bruskiewich,R., Beare,D.M., Clamp,M., Smink,L.J., Ainscough,R., Almeida,J.P., Babbage,A. et al. (1999) The DNA sequence of human chromosome 22. Nature, 402, 489–495.
2. Hattori,M., Fujiyama,A., Taylor,T.D., Watanabe,H., Yada,T., Park,H.-S., Toyoda,A., Ishii,K., Totoki,Y., Choi,D.-K. et al. (2000) The DNA sequence of human chromosome 21. Nature, 405, 311–319.
3. Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F. et al. (2000) The genome sequence of Drosophila melanogaster. Science, 287, 2185–2195.
4. Myers,E.W., Sutton,G.G., Delcher,A.L., Dew,I.M., Fasulo,D.P., Flanigan,M.J., Kravitz,S.A., Mobarry,C.M., Reinert,K.H., Remington,K.A. et al. (2000) A whole-genome assembly of Drosophila. Science, 287, 2196–2204.
5. Broder,S. and Venter,J.C. (2000) Sequencing the entire genomes of free-living organisms: the foundation of pharmacology in the new millennium. Annu. Rev. Pharmacol. Toxicol., 40, 97–132.
6. Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (2001) Initial sequencing and analysis of the human genome. International Human Genome Sequencing Consortium. Nature, 409, 860–921.
7. Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J., Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.
8. Zabarovsky,E.R., Boldog,F., Erlandsson,R., Allikmets,R.L., Kashuba,V.I., Marcsek,Z., Stanbridge,E., Sumegi,J., Klein,G. and Winberg,G. (1991) New strategy for mapping the human genome based on a novel procedure for construction of jumping libraries. Genomics, 11, 1030–1039.
9. Zabarovsky,E.R., Kashuba,V.I., Zakharyev,V.M., Petrov,N., Pettersson,B., Lebedeva,T., Gizatullin,R., Pokrovskaya,E.S., Bannikov,V.M., Zabarovska,V.I. et al. (1994) Shot-gun sequencing strategy for long-range genome mapping: a pilot study. Genomics, 21, 495–500.
10. Zabarovsky,E.R., Kashuba,V.I., Gizatullin,R.Z., Winberg,G., Zabarovska,V.I., Erlandsson,R., Domninsky,D.A., Bannikov,V.M., Pokrovskaya,E., Kholodnyuk,I. et al. (1996) NotI jumping and linking clones as a tool for genome mapping and analysis of chromosome rearrangements in different tumors. Cancer Detect. Prev., 20, 1–10.
11. Kashuba,V.I., Szeles,A., Allikmets,R., Nilsson,A.S., Bergerheim,U.S., Modi,W., Grafodatsky,A., Dean,M., Stanbridge,E.J., Winberg,G. et al. (1995) A group of NotI jumping and linking clones cover 2.5 Mb in the 3p21-p22 region suspected to contain a tumor suppressor gene. Cancer Genet. Cytogenet., 81, 144–150.
12. Kashuba,V.I., Gizatullin,R.G., Protopopov,A.I., Allikmets,R., Korolev,S., Li,J., Boldog,F., Tory,K., Zabarovska,V.I., Marcsek,Z. et al. (1997) NotI linking/jumping clones of human chromosome 3: mapping of the TFRC, RAB7 and HAUSP genes to regions rearranged in leukemia and deleted in solid tumors. FEBS Lett., 419, 181–185.
13. Kashuba,V.I., Gizatullin,R.Z., Protopopov,A.I., Li,J., Vorobieva,N.V., Fedorova,L., Zabarovska,V.I., Muravenko,O.V., Kost-Alimova,M., Domninsky,D.A. et al. (1999) Analysis of NotI linking clones isolated from human chromosome 3 specific libraries. Gene, 239, 259–271.
14. Collins,F.S. and Weissman,S.M. (1984) Directional cloning of DNA fragments at a large distance from an initial probe: a circularisation method. Proc. Natl Acad. Sci. USA, 81, 6812–6816.
15. Poustka,A., Pohl,T.M., Barlow,D.P., Frischauf,A.M. and Lehrach,H. (1987) Construction and use of chromosome jumping libraries from NotI-digested DNA. Nature, 325, 353–355.
16. Zabarovsky,E.R., Winberg,G. and Klein,G. (1993) The SK-diphasmids—vectors for genomic, jumping and cDNA libraries. Gene, 127, 1–14.
17. Ronaghi,M., Uhlen,M. and Nyren,P. (1998) A sequencing method based on real-time pyrophosphate. Science, 281, 363–365.
18. Brenner,S., Johnson,M., Bridgham,J., Golda,G., Lloyd,D.H., Johnson,D., Luo,S., McCurdy,S., Foy,M., Ewan,M. et al. (2000) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol., 18, 630–634.
19. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
20. Haapa,S., Taira,S., Heikkinen,E. and Savilahti,H. (1999) An efficient and accurate integration of mini-Mu transposons in vitro: a general methodology for functional genetic analysis and molecular biology applications. Nucleic Acids Res., 27, 2777–2784.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
