Skip to main content

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

bioRxiv logoLink to bioRxiv
[Preprint]. 2023 Mar 7:2023.03.06.530899. [Version 1] doi: 10.1101/2023.03.06.530899

Tigerfish designs oligonucleotide-based in situ hybridization probes targeting intervals of highly repetitive DNA at the scale of genomes

Robin Aguilar 1, Conor K Camplisson 1, Qiaoyi Lin 1, Karen H Miga 3,4, William S Noble 1,2,*, Brian J Beliveau 1,5,*
PMCID: PMC10028787  PMID: 36945528

Abstract

Fluorescent in situ hybridization (FISH) is a powerful method for the targeted visualization of nucleic acids in their native contexts. Recent technological advances have leveraged computationally designed oligonucleotide (oligo) probes to interrogate >100 distinct targets in the same sample, pushing the boundaries of FISH-based assays. However, even in the most highly multiplexed experiments, repetitive DNA regions are typically not included as targets, as the computational design of specific probes against such regions presents significant technical challenges. Consequently, many open questions remain about the organization and function of highly repetitive sequences. Here, we introduce Tigerfish, a software tool for the genome-scale design of oligo probes against repetitive DNA intervals. We showcase Tigerfish by designing a panel of 24 interval-specific repeat probes specific to each of the 24 human chromosomes and imaging this panel on metaphase spreads and in interphase nuclei. Tigerfish extends the powerful toolkit of oligo-based FISH to highly repetitive DNA.

Introduction

Fluorescent in situ hybridization (FISH) is a powerful technique that can reveal the spatial positioning and abundance of DNA and RNA molecules in fixed samples with subcellular resolution. Since their introduction in 19691, ISH and later FISH24 methods have been refined to improve their detection efficiency and sensitivity5. One important technical development has been the introduction of synthetic DNA oligonucleotides (oligos) as a source of probe material6. Oligo-based probes offer important advantages over more traditional probes deriving from isolated genomic material, as oligo probes can be designed to have specific thermodynamic properties and programmed to contain stretches of exogenous sequences that can serve as ‘readout’ domains via the ‘secondary’ hybridization of a labeled, complementary oligo. These advantages have led to the introduction of a growing set of ‘spatial genomics’ and ‘spatial transcriptomics’ methods that use complex ‘probe sets’ of many distinct oligo species710 in combination with iterative rounds of secondary hybridization to visualize dozens or more genomic regions1114 and thousands or more RNA species1517, respectively, in the same cell or tissue sample.

The rapid adoption of oligo probes as a source of FISH probe material has also catalyzed the parallel development of computational tools for oligo probe design. These tools—which include OligoArray18, PROBER19, Chorus20, mathFISH21, OligoMiner22, iFISH23, ProbeDealer24, Chorus225, and PaintSHOP26—aim to identify short windows of genomic sequence that have suitable thermodynamic and sequence properties to serve as FISH probes. Once identified, ‘candidate’ probes are next screened for specificity to predict whether they will have off-target sites in addition to their intended target. This specificity screening typically relies on using alignment programs such as BLAST27 or Bowtie228 to search for regions with high sequence similarity to the candidate probes, the use of k-mer counting programs such as Jellyfish29 to assess whether the candidate probes contain k-mers (i.e., substrings) with high abundance in the genome of interest, or a combination of both approaches. After this specificity screening, candidate probes with predicted off-target binding are filtered and a final set of target-specific oligo probes is returned.

A key advantage of oligo probes is that they can be designed specifically to avoid targeting repetitive sequences. Repetitive sequences are frequent sources of unwanted background when performing in situ hybridization experiments due to their high copy number, and a set of “suppressive hybridization” methods using unlabeled repetitive DNA from the C0t-1 fraction30 as a blocking agent have been introduced to abrogate this background when using probes derived directly from genomic material3133. Such blocking agents are generally not needed when using oligo probes, however, as computational oligo probe design methods either avoid discovering candidate probes in sequence annotated as being repetitive by tools like RepeatMasker1820,22,26,34 or purposefully filter candidate probes that align many times to the genome18,2026 or contain highly abundant k-mers22,23,25,26. As a result, while computational oligo probe design tools are able to operate at the scale of whole plant and mammalian genomes to produce repositories of tens of millions of oligo probes23,26, a substantial fraction of large and complex genomes remains intentionally uncovered due to the presence of repetitive sequences.

Repetitive DNA accounts for ~50% of the human and mouse genomes and often even higher percentages in the genomes of plants30,35,36. Broadly, repetitive DNA falls into two categories: 1) Interspersed repeats such as SINE, LINE, and ALU elements that often occur as short, spatially isolated intervals within larger blocks of non-repetitive sequence35; 2) long tandem repeat arrays such as alpha satellite, human satellites 1–3, and the 45S ribosomal DNA at which a single monomer is repeated many times to form multi-megabase intervals of repetitive sequence that are frequently located in pericentromeric regions and on the short arms of acrocentric chromosomes36,37. Collectively, repetitive DNA sequences are central to a set of diverse and essential cellular and organismal functions, including the recruitmentment of the chromosome segregation machinery during mitosis, the encoding of essential information such as the 47S rRNA38 and the replication-dependent histone genes39, and the protection of chromosome ends40. Moreover, repetitive sequences are an important source of novel genic and regulatory sequences41 and are hypothesized to be actively involved in potent evolutionary processes such as meiotic drive and speciation42. Thus, more detailed studies of highly repetitive DNA regions and their transcription products through targeted assays such as FISH may help uncover the mechanisms by which these mysterious regions exert their influence on important biological processes.

When desired, repetitive intervals make highly robust and effective FISH targets, as one or a few probe species can bind many times and thus produce a very large, bright signal at low cost. Indeed, all of the initial ISH targets were repetitive1,43, and repetitive targets continue to be used routinely for diagnostic assays such as aneuploidy detection via interphase chromosome enumeration44. However, the deployment of probes against repetitive targets either requires the isolation and experimental validation of cloned genomic material or a priori knowledge of experimentally validated oligo sequences. Computational approaches have been introduced to identify tandem repeat regions in worm45 and plant systems46,47 to select candidate chromosome-specific imaging oligo probes for experimental validation. However, neither these approaches nor computational tools designed to target non-repetitive regions provide a computationally scalable way to assess the predicted in situ behavior of oligo probes targeting repetitive DNA in the background of large and complex genomes.

Here, we introduce Tigerfish, a computational ecosystem tailored for the design and characterization of oligo probes targeting intervals of repetitive DNA at the genome scale. Tigerfish provides all functionality needed for discovering repetitive regions de novo, designing candidate probes, and performing deep in silico profiling of predicted binding activity. Tigerfish is open source, freely available, supported by extensive documentation and tutorials, and ships with a dedicated set of utilities to make it easier for users to visualize the predicted experimental outcomes of their designs. We showcase the utility of Tigerfish by designing and experimentally validating at least one interval-specific repeat probe for all 24 human chromosomes on metaphase spreads and augment these data by performing interphase enumeration of chromosomal copy number in human primary lymphocytes for all 24 human chromosomes. Finally, we provide a comprehensive catalog of probes and their predicted associated binding specificities that have been discovered by Tigerfish in the fully assembled human CHM13 genome released by the Telomere-to-Telomere Consortium36. As our knowledge of the complete sequence of highly repetitive regions and how these regions vary amongst individuals and populations continues to increase from efforts such as the Human Pangenome Project48 and Vertebrate Genomes Project49, we anticipate that Tigerfish will play a key role in a number of applications including genome assembly variation, in situ karyotyping, and biological discovery.

Results

Oligo probe design with Tigerfish

Tigerfish is a computational pipeline composed of a collection of Python scripts embedded in an automated Snakemake workflow50 and is designed to be executed in a command line environment. No direct knowledge of programming is required to run Tigerfish, and this bioinformatic workflow can be deployed on any modern Windows, Macintosh, or Linux system. Tigerfish is open-source, freely available via GitHub (https://github.com/beliveau-lab/TigerFISH), and depends on Bowtie228, NUPACK51, Jellyfish,29 SamTools52, Biopython53, Scikit-learn54, and chromoMap55. Tigerfish is also supported by extensive documentation (https://beliveau-labtigerfish.readthedocs-hosted.com). In order to run Tigerfish, users must include the full sequence of the genome assembly in which probe design is to be performed in FASTA format56 and also provide an accompanying ‘chrom.sizes’ file that details the scaffolds present in the assembly and their lengths in base pairs. Users must also edit a small configuration file in which the locations of relevant files and scripts can be specified and parameter choices for the probe discovery can be specified.

Tigerfish can be run in one of three execution modes (Fig. 1). In the first, termed “Repeat Discovery Mode”, users list genomic scaffolds where de novo repeat discovery and probe design is to be performed in the configuration file. Repeat Discovery Mode uses a k-mer counting strategy to identify repetitive DNA regions de novo by identifying intervals that contain k-mers with high abundance in the genome (Methods). Users can tune the size of the search window and the magnitude of the k-mer count values needed for an interval to be flagged as repetitive, thereby controlling the nature of the repeat regions identified. Tigerfish may also be run in “Probe Design Mode” in instances where the genomic interval(s) a user wants to target for probe design are already known. In this case, the user must provide an additional BED-formatted file57 that specifies the genomic coordinates for interval(s) to perform probe design against. Lastly, “Probe Analysis Mode” generates a new set of in silico binding predictions for probes contained in an existing Tigerfish output file. Tutorials providing a comprehensive walkthrough of these three modes,along with an example of implementing Tigerfish in the human CHM13 genome on a satellite repeat, can be found at https://beliveau-lab-tigerfish.readthedocs-hosted.com.

Fig. 1 |. The Tigerfish workflow.

Fig. 1 |

Schematic overview of the inputs, major processing steps, and outputs of the Tigerfish probe design pipeline.

When using Repeat Discovery Mode or Probe Design Mode, Tigerfish designs candidate oligo probes for each genomic interval passed forward (Repeat Discovery Mode) or specified in the user-provided BED (Probe Design Mode). Candidate probe discovery is performed using a modified version of the ‘blockParse.py’ script from OligoMiner22 that screens the provided sequences for windows with desirable sequence and thermodynamic properties (Methods). To maximize the chance that the optimal probe or set of probes will be identified, Tigerfish mines the entire repeat region for candidate probes, which can result in redundant and even duplicate candidate probe sequences being returned. In order to minimize the amount of downstream computation needed, duplicates are removed and the candidate probes for each region are then rank-ordered to prioritize candidates that contain k-mers with elevated abundance specifically in the target interval from which they were designed (Methods), as such candidates are more likely to have many on-target binding sites while having minimal binding elsewhere in the genome.

In order to return a final probe set, Tigerfish begins with the top-ranked candidate probe for each target interval and performs deep in silico specificity profiling. The selected candidate probe is aligned to the genome with very sensitive settings (Methods) and up to 500,000 alignments are returned. The genomic sequence of each alignment site is then extracted and put into a virtual test tube to simulate how likely binding would be with the input candidate probe in FISH conditions using NUPACK51. Finally, Tigerfish processes the result of these simulations and calculates the number of predicted on- and off-target binding sites for each candidate probe (Methods). Users can specify a number of parameters to tune performance at this step, including the maximum number of allowed off-target binding sites per probe, the minimum number of required on-target binding sites per probe, and the maximum number of probes in the final set (Methods, Supplementary Note 1). If needed, Tigerfish will continue analyzing the predicted binding specificities of candidates from the rank-ordered list until either the user-supplied criteria are met or all possible candidate probes are considered. The final output of Tigerfish includes a text file containing all final probes and their aggregate on- and off-target binding predictions, a summary table that lists all target intervals for which probes were designed and their aggregate on- and off-target binding predictions for the probes that map to each interval, and a set of auxiliary files that provide more detailed information about the predicted binding profiles of the probes. Users can also optionally populate chromoMap ideograms that depict the chromosomal locations of probe binding for the probe or set of probes designed against each target interval (Fig. 1). Example input and output files for full test runs of Tigerfish in Repeat Discovery Mode, Probe Design Mode, and Probe Analysis Mode can be found within Supplementary Software.

Probe discovery at the scale of human genomes

In order to demonstrate the scalability of Tigerfish, we set out to perform genome-wide de novo repeat interval identification and probe design for all 24 chromosomes in the human telomere-to-telomere CHM13v2 + HG002 chrY assembly36 using Repeat Discovery Mode. In order to showcase how users can tune parameters to optimize their design for different types of repeat regions, we performed our genome-scale runs with two sets of parameter groupings: 1) a ‘conservative’ set that prioritizes identifying large intervals of highly repetitive sequence such as those found at pericentromeres; 2) a ‘permissive’ set that aims to discover smaller, interspersed intervals of repetitive DNA (Supplementary Data 1) in addition to larger intervals found in the ‘conservative’ set (Supplementary Data 2). The genome-wide probe design runs designed probe sets for 263 intervals, of which 235 intervals were only identified with the ‘permissive’ parameter settings and 28 intervals were identified by both parameter settings. We found that Tigerfish was able to generate at least one interval-specific probe or probe set for all 24 chromosomes, prominently covering the pericentromeric and subtelomeric regions of most chromosomes. The Tigerfish probes mostly fell into regions not already covered by existing PaintSHOP probes26 designed with non-repetitive intervals in mind (Fig. 2a) and predominantly mapped to annotated satellite and simple repeat regions (Fig. 2b). We found that the repeat intervals identified spanned a broad range of sizes ranging from 411 bp – 34.3 Mb (median: 3.6 kb) for the group identified using the ‘permissive’ settings and from 37.6 kb – 34.2 Mb (median: 2.7 Mb) for the group identified using the ‘conservative’ settings (Fig. 2c). Collectively, these probes and probe sets cover 164.5 Mb of the human T2T CHM13v2 HG002 chrY assembly after accounting for any differences between the size of the interval inputted for design and the effective size of the interval covered by the output probes (Supplementary Fig. 1). Our in silico specificity profiling also revealed a broad distribution of predicted binding activities for the probes or probe sets covering the 263 intervals, ranging from 25–30,972 target sites in the ‘permissive’ group (median: 236.9 target sites) and 500–30,972 targets sites in the ‘conservative group (median: 20,165.2 target sites) (Fig. 2d). When factoring in the size of the target intervals, we observed target site densities of 0.017–798.6 target sites per kb (median: 47.9 target sites per kb) for the ‘permissive’ group and 0.64–475.9 target sites per kb (median: 6.4 target sites per kb) for the ‘conservative’ group (Fig. 2e).

Fig. 2 |. Genome-scale probe design with Tigerfish.

Fig. 2 |

a, Schematic visualization of intervals for which Tigerfish probe sets were identified using conservative (pink) or permissive (teal) parameters and intervals covered by existing PaintSHOP probes designed using parameters suitable for non-repetitive targets (lilac). b, The distribution of RepeatMasker annotations for intervals identified and processed by Tigerfish using conservative and permissive settings. c, Length distributions of the regions identified and targeted by Tigerfish using conservative and permissive parameters. d, The aggregate number on-target binding predictions for probe sets designed by Tigerfish using conservative and permissive parameters. e, The aggregate number on-target binding predictions per kilobase for probe sets designed by Tigerfish using conservative and permissive parameters.

Validating Tigerfish probes in situ

In order to evaluate how effectively the in silico design approach of Tigerfish translates to performance in situ, we designed and conducted a series of FISH experiments. Specifically, we set out to investigate whether Tigerfish was able to generate a panel of FISH probes targeting repetitive DNA intervals specific to each of the 24 human chromosomes, as such a panel would have utility in diagnostic and chromosomal enumeration assays. In order to showcase the versatility of the different Tigerfish run modes, our panel consisted of a mix of probes designed against regions identified using “Repeat Discovery Mode” and regions selected manually based on their RepeatMasker34 annotations using “Probe Design Mode” (Supplementary Data 3 and Supplementary Data 4). The panel spanned a range of target sizes (10 kb – 4.5 Mb, mean = 1.3 Mb) and predicted on-target binding activities (477.5 – 7,228, mean = 2,418.1; Table 1). In order to verify that our Tigerfish probes were binding to their intended genomic targets, we implemented an experimental scheme in which the Tigerfish probe set targeting a given interval was co-hybridized with a set of 1,000 probes designed by PaintSHOP26 that targeted a 200 kb non-repetitive interval on the target chromosome, with the Tigerfish and PaintSHOP probe sets being labeled with spectrally distinct fluorophores (Fig. 3a, Supplementary Data 4). We used this experimental design to perform a series of 24 two-color FISH experiments on 46,XY human primary metaphase chromosome spreads (Fig. 3b). Using this approach, we confirmed that our metaphase FISH produced the predicted staining patterns for all 24 combinations of Tigerfish and PaintSHOP probe sets (Fig. 3c, Supplementary Fig. S2S5). In order to augment our metaphase data, we also performed a series of 24 interphase FISH experiments on 46,XY primary human lymphoblasts using the same Tigerfish and PaintSHOP probe set combinations as a means to visually enumerate chromosomal copy number (Fig. 4a). Specifically, we imaged >40 cells for each experiment and quantified the number of observed Tigerfish and PaintSHOP foci in the 3D volume of the nucleus (Fig. 4b, Supplementary Fig. S6S9). Our analysis of the resulting data revealed a strong agreement between the two types of probe set (78.4% concordance, n = 1,061), with both approaches predominantly displaying 2 foci per nucleus (PaintSHOP: 781/1061, 73.6%; Tigerfish: 922/1061, 86.9%) and identifying a range of foci (1–4) per nucleus consistent with our previous studies using oligo-based probes for enumeration10,22,58 (Fig. 4c). Taken together, our metaphase and interphase FISH experiments demonstrate the specificity and utility of Tigerfish for visualizing the positioning and abundance of highly repetitive DNA intervals in situ.

Table 1 |.

Description of the 24-target Tigerfish probe set panel.

Imaging Coordinates On-target Off-target Imaging Repeat Length (Mb)
chr1:134680000–134800000 7228.4 191.2 0.12
chr2:92330000–94670000 3174.8 243.0 2.34
chr3:91730000–92590000 505.4 161.4 0.86
chr4:52140000–53070000 1074.6 51.0 0.93
chr5:47650000–48150000 1675.0 352.1 0.5
chr6:58540000–61060000 2356.1 73.6 2.52
chr7:60410000–63720000 4977.8 251.6 3.31
chr8:44250000–46320000 1964.4 933.4 2.07
chr9:44960000–47230000 2049.0 261.9 2.27
chr10:39640000–40710000 1637.4 25.3 1.07
chr11:51040000–54420000 3908.9 110.8 3.38
chr12:34780000–37060000 2022.9 108.8 2.28
chr13:111520000–111570000 927.9 644.1 0.05
chr14:99470000–99490000 477.5 1188.7 0.02
chr15:8550000–8680000 4162.8 1608.9 0.13
chr16:48950000–48980000 3812.7 807.1 0.03
chr17:23890000–27420000 3912.9 512.0 3.53
chr18:15970000–20430000 4576.7 3319.3 4.46
chr19:21000000–21060000 1408.6 261.4 0.06
chr20:27580000–27630000 950.1 174.9 0.05
chr21:44760000–44780000 761.1 440.1 0.02
chr22:18540000–18550000 1347.7 295.7 0.01
chrX:58910000–59080000 1518.8 146.5 0.17
chrY:20960000–21230000 1603.9 175.5 0.27

Fig. 3 |. In situ validation of Tigerfish probes.

Fig. 3 |

a, Schematic overview of the experimental design used to validate Tigerfish probe sets on metaphase chromosome spreads also labeled with probe sets targeting non-repetitive DNA designed by PaintSHOP. b, Representative full field of view (left) and zoomed insets (right) showing Tigerfish (magenta) and PaintSHOP (yellow) probe sets targeting chr16. c, Zoomed crops depicting Tigerfish (magenta) and PaintSHOP probes targeting the indicated chromosomes. For the autosomes, each image pair was obtained from the same metaphase spread. The X and Y chromosome images were obtained from separate 46,XY spreads and thus only have one chromosome each. Please see Supplementary Figs. S8S11 for the full spread images. Images are maximum intensity projections in Z. Scale bars, 5 μm (zoomed crops) or 20 μm (fields of view).

Fig. 4 |. Chromosome enumeration in interphase nuclei.

Fig. 4 |

a, Schematic overview of the experimental design used to perform chromosome enumeration using Tigerfish probe sets in 46,XY interphase nuclei also labeled with probe sets targeting non-repetitive DNA designed by PaintSHOP. b, Representative images of nuclei labeled with Tigerfish probe sets (magenta) and PaintSHOP probe sets (yellow) targeting intervals on chr22 (top row) or chrX (bottom row). c, Heatmap displaying the observed distribution of Tigerfish and PaintSHOP puncta per nucleus. Images are maximum intensity projections in Z. Scale bars, 10 μm.

Computational requirements to run Tigerfish

In order to evaluate the computational resources required to run Tigerfish at the scale of mammalian genomes, we collected a series of benchmarking data during our probe design runs on the full human CHM13v2 + HG002 chrY assembly using the ‘permissive’ and ‘conservative’ parameter settings. Our analyses focused on four key usage metrics: 1) he “wall clock” run time, which reflects the overall duration of the run from start to finish; 2) the amount of active CPU processing time needed to complete the run; 3) the maximum amount of virtual memory used, which represents the sum total of physical (RAM) and swap (hard disk) memory allocations; 4) the maximum amount of physical memory used, which reflects the RAM component of the virtual memory pool. As Tigerfish uses Snakemake50 for parallelization, we were able to record data about these four metrics on a per-interval basis for all 263 intervals identified collectively by the ‘permissive’ and ‘conservative’ parameter settings. In line with the broad range of observed target interval sizes and target site numbers of the 263 intervals (Fig. 2fh), we also found a wide distribution of resource usage values. Our analyses revealed that probe design against the majority of target intervals finished quickly, with a median run time of 6.9 hours (range: 1.8 min – 50.2 hr) and a median CPU time of 5.1 hours (range: 0.6 min – 4.9 hr) (Fig. 5a,b). Moreover, Tigerfish generally required only modest amounts of memory for software designed to be run on a computing cluster, with a median max virtual memory allocation of 24.8 GB (range 12.3–44.8 GB) and a median max physical memory allocation of 20.3 GB (range 8.2–40.8 GB) (Fig. 5c,d). Given the observed spread in the resource usage values, we hypothesized that the resource requirements might vary as a function of the size of the target interval. Indeed, stratifying the benchmarking data into three groups based on span of the target interval revealed that the group of intervals less than 100 kb in span had a median run time of 5.8 minutes (range: 1.8 min – 43.8 min, n = 211) and the group of intervals between 100 kb and 1 Mb had run a median run time of 21.6 minutes (range: 6 min – 59.4 min, n = 20), with the group of intervals >1 Mb in span having a considerably longer median run time of 4.6 hours (range: 16.2 min – 50.1 hr, n = 32) (Supplementary Fig. 10). We did not observe a similar trend with virtual memory or physical memory usage, as all three length groups had nearly identical memory requirements (Supplementary Fig. 10). Taken together, our benchmarking results indicate that Tigerfish can readily be deployed on computing clusters or powerful individual computers to identify repetitive intervals and design probes specific to these intervals at the scale of genomes.

Fig. 5 |. Resource requirements for genome-scale Tigerfish probe design.

Fig. 5 |

a, Distribution (left) and empirical cumulative distribution (right) of the wall-clock runtime recorded for running the 263 conservative and permissive intervals. b, Distribution (left) and empirical cumulative distribution (right) of the CPU runtime recorded for running the 263 conservative and permissive intervals. c, Distribution (left) and empirical cumulative distribution (right) of the maximum recorded virtual memory allocation for running the 263 conservative and permissive intervals. d, Distribution (left) and empirical cumulative distribution (right) of the maximum recorded physical memory allocation for running the 263 conservative and permissive intervals. Vertical dashed lines in the cumulative distribution plots correspond to the median values.

Discussion

Tigerfish is a freely available computational platform that facilitates the design of oligo-based FISH probes against intervals of repetitive DNA at the scale of genomes. The Tigerfish pipeline establishes a paradigm for the deep specificity analysis of probes targeting repetitive sequences, which in turn enables users to establish criteria by which to select and empirically evaluate the effectiveness of oligos targeting such regions. Once designed, Tigerfish probes can readily be augmented with any of the powerful toolkits available for oligo-based FISH, including signal amplification approaches such as SABER59, HCR60, and RCA61 and multiplexing approaches such as DNA MERFISH62 and DNA seqFISH14. Moreover, Tigerfish offers users a great number of tunable parameters, providing flexibility to tailor the probe design process for different types of repetitive intervals and different genome compositions and complexities. We have demonstrated the efficacy of Tigerfish by performing genome-scale probe discovery in a fully assembled human genome and provided extensive experimental validation on both spread metaphase chromosomes and in interphase nuclei for the specificity of Tigerfish probes. Tigerfish is supported by extensive documentation and tutorials and can perform complex probe discovery tasks against the most challenging intervals of genomic DNA using only modest computational resources. We anticipate Tigerfish will play a key role in the experimental validation and biological investigation of repetitive DNA intervals as more fully assembled human, vertebrate, plant, and other model organism genomes continue to be introduced.

Methods

Genome sequences used for probe set design

The CHM13 genome assembly versions 1.0, 1.1, and 2.0 were downloaded without repeat masking from the T2T consortium at https://github.com/marbl/CHM13.

Pipeline construction and implementation

Tigerfish is written in Python 3.7.8 with dependencies that include Biopython 1.7753, Bowtie 2.3.5.128, NUPACK 4.051, BEDtools 2.29.263, Numpy 1.18.564, Pandas 1.0.565, pip 20.1.1, pybedtools 0.8.163,66, sam2pairwise 1.0.067, samtools 1.952, scikit-learn 0.23.154, scipy 1.5.068, zip 3.0, matplotlib 3.3.469, seaborn 0.11.170, pytest 6.271, and Jellyfish 2.2.1029. All Tigerfish probe collections were generated using a pipeline implemented with Snakemake 7.1950. Dependencies that implement Python libraries can be found via the tigerfish.yml, snakemake_env.yml, and chromomap_env.yml files that are used to execute Tigerfish as a Snakemake50 pipeline. These scripts and their dependencies are documented on Tigerfish’s GitHub repository. These environments are also available in the Supplementary Software. Scripts were executed locally in an OS X Anaconda Python 3.772 environment or in a CentOS Linux environment on the Department of Genome Science ‘Grid’ Cluster at the University of Washington.

Whole genome probe discovery

Genome assemblies in FASTA format without repeat masking were used when building Jellyfish29 files and Bowtie228 indices, and were used as input files for probe discovery. Jellyfish hash size was set to approximate the size of the genome assembly so that files were generated using the command, “jellyfish count -s 3300M -m 18”.

Identification of k-mer enriched sequences

Tigerfish identifies repeat regions in Repeat Identification mode by using a sliding window of a specified size (window, W) flagging all counts exceeding a user-specified value (threshold, T). The sum of the counts within the sliding window are divided by the length of the window so that as long as the user-specified composition score (composition, C) is exceeded, Tigerfish will identify windows of the genome where k-mer counts which map to abundantly repetitive sequences. Here, users may also specify at what base position they wish to start searching for repeats, which is described as a file_start parameter. Alternatively, if the user provides coordinates of target regions (i.e., defined_coords=True and repeat_discovery=False), then the user must also provide the name of the scaffold. In this case, Tigerfish skips the ‘repeat_ID.py’ script entirely to proceed with oligo probe design. For whole genome mining in CHM13, the sliding window was implemented with parameters described in Supplementary Data 5.

Designing oligo probes

Tigerfish implements logic as described in the OligoMiner22 framework for probe design using the bed file generated during Repeat Identification mode or from a user-provided BED file. Here, a FASTA file containing all regions of interest is used to design valid probe sequences using parameters values for probe length, percent G+C content (GC%) and adjusted melting temperature Tm calculated using nearest neighbor thermodynamics22. The modified blockParse script described in OligoMiner was used to mine probe candidates ranging in length from (min_length, max_length) 25–50 nt and Tm (min_temp, max_temp) between 42–52°C.

Predicting probe specificity

The k-mer binding proportion (enrich_score, Kb) was determined by obtaining the proportion of two computed values, copy_num and total_genome_binding. The aggregate count of all k-mers for any given probe sequence within its respective repeat target is described as copy_num, or Rm . The aggregate count of all k-mers for any given probe within the entire queried genome is described as total_genome_binding, or (Hm). Thus, the k-mer binding proportion was computed as Rm/Hm. Probes with shared k-mer composition similarity above the mer_cutoff proportion are omitted from downstream filtering. Probes are ranked in descending order within each repeat region by Normalized Rank (Nr) = (Rm/(max(Rm)*c1)) + (Kb/(max(Kb)*c2)), where c1 (c1_val) and c2 (c2_val) are user-specified constants.. The mer_cutoff proportion is determined by storing k-mers of ranked probes and profiling all consecutive candidate probes to see if the proportion of their k-mer composition exceeds that of the mer_cutoff. Users may modify enrich_score, copy_num, c1_val, c2_val, and mer_cutoff within the config.yml file. The parameters chosen for the conservative and permissive datasets are reported in Supplementary Data 5.

Computing in silico binding predictions

Bowtie2 was run on each probe sequence against the human genome using the following parameters (--local -N 1 -R 3 -D 20 -i C,4 --score-min G,1,4, -L 15, k 500000). The parameters - L (seed_length) and -k (bt2_alignment) may be modified by users within the Tigerfish config.yml. Probe alignments are returned as a BAM file for each probe sequence, which is then processed from the resulting SAM file using SAMtools52. Using this SAM file, sam2pairwise67 is used to return derived alignment sequences. With these provided pairs of probe sequence and derived alignment sequence, NUPACK 4.051 computes the predicted thermodynamic likelihood that each alignment pair will form duplexes under FISH conditions22. The NUPACK model summarizing these conditions is described as (material=‘dna’, celsius=69.5, sodium=0.39, magnesium=0.0, ensemble=‘stacking’). Candidate probes are only added to the final probe set if they do not share predicted probe binding greater than the value max_pdups_binding.

The on-target alignment score (OnT) is determined by taking the sum of all predicted duplexing scores for derived alignments that are found within the repeat target. Off-target alignment scores are computed by taking the aggregate sum of all predicted duplexing scores from derived alignments that are found outside the repeat target (OffT). The predicted in silico on-target binding proportion (binding_prop) for each oligo is then computed as OnT/(OnT+OffT). Genome bins (genome_windows) are generated using BEDtools makewindows, and BEDtools intersect is applied to all reported sam2pairwise genome alignments to identify potential off-target binding signals. All predicted duplexing scores are aggregated over windows, which are binarized to map binding signals to the repeat target and all other genomic regions where binding events are predicted. Probes with an aggregate OffT over any given non-target genome bin that exceeds the parameter off_bin_thresh are culled from the candidate probe set. Users may modify the parameters seed_length, bt2_alignment, genome_windows, binding_prop, and off_bin_thresh. There are additional parameters that may be used to control permissiveness of filtering in the alignment_filter.py script. Users may control the desired aggregate on-target sum for any set of probes designed against a repeat region (target_sum), the minimum on-target value for any desired candidate probe (min_on_target), and maximum desired candidate probes to be returned in any target repeat region (max_probe_return). Parameters chosen for conservative and permissive datasets may be viewed in Supplementary Data 5.

Visualizing candidate probe in silico binding

Bowtie2 alignments are derived for individual or pools of probes against a repeat region where predicted thermodynamic binding is computed over a given size of genomic bins generated by BEDtools (thresh_windows). These predicted thermodynamic binding events are summarized by scaffolds and are used to determine the size of the imaging target window for bins containing binding events that are greater than the parameter within the repeat region target (align_thresh). The sum of predicted duplexing values are aggregated over computed genomic bins and normalized using the MinMaxScalar function of scikit-learn73, where the range of values is mapped from 0 to 255 to summarize predicted binding over genomic bins. chromoMap74 in R is used to generate summary ideograms of probe target signals as an optional step in Probe Analysis Mode.

Read the Docs

A Read the Docs web page (https://beliveau-lab-tigerfish.readthedocs-hosted.com) was created to provide detailed documentation of our tool. The intention of hosting our work on Read the Docs was to provide sufficient background and resources for individuals from all computational backgrounds to be able to leverage Tigerfish for their own work. Here we provide installation information, simple tutorials for testing the Tigerfish install, a glossary of all parameters that may be modified by users, summaries of our default parameters, and frequently asked questions.

Computational benchmarking

Speed calculations were computed using the Snakemake benchmark feature. Each scaffold in the CHM13v2 + HG002 chrY assembly was run as its own individual cluster job in parallel for the repeat discovery steps, and the resulting intervals identified for probe design were also processed in parallel. Benchmarking was performed on a Dell PowerEdge R840 server node equipped with 4 Intel Xeon Gold 6252 2.1 GHz 24-core CPUs (192 total job threads) and 1.5 TB of DDR4 PC4–23400 2933 Mhz ECC RAM running CentOS 7.9 Linux.

PER concatemerization

100 μl Primer Exchange Reactions were prepared for both Tigerfish probes and PaintSHOP bridge sequences with a final concentration of 1x PBS, 10 mM MgSO4, 400–1,000 U/ml Bst DNA Polymerase (large fragment), 120,000 units/ml (NEB M0275M), 100 nM Clean G hairpin, 50 nM – 1 μM hairpin and water to 90 μl. After incubation for 15 min at 37°C, 10 μM oligo probe(s) were added and the reaction was incubated for another 2 hours with another 20 min at 80°C to heat-inactivate the polymerase. PER extension solutions were directly diluted into FISH solutions. Lengths of the concatemers were evaluated by diluting 6.7 μl of the in vitro reaction with 3.3 μl 6X TriTrack. Samples were then run on a 10% TBE-Urea denaturation gel (ThermoFisher EC68755BOX) for 10 min alongside 1 kb Plus DNA Ladder to estimate length and imaged with SYBR Gold channel and then imaged after a 15 min incubation.

DNA-SABER-FISH on spread metaphase chromosomes

PaintSHOP bridge oligos and Tigerfish primary probes were extended using the PER as previously described59. Dry microscope slides containing human 46,XY metaphase spreads (from AppliedGenetics Laboratories) were immersed in 2× SSCT + 70% (vol/vol) formamide at 70°C and incubated for 90 s in Coplin jars. Slides were then transferred and incubated in ice-cold 70% (vol/vol) ethanol, ice-cold 90% (vol/vol) ethanol, and ice-cold 100% (vol/vol) for 5 minutes each. Slides were then air dried after incubation in 100% ethanol. A hybridization solution consisting of 2X SSCT, 50% formamide, 10% (wt/vol) dextran sulfate, 40 ng/μL RNase A (EN0531; Thermo Fisher), and resuspended PER-extended PaintSHOP bridge oligos (20 pmol total), amplified ssDNA primary probes (25 pmol total), and PaintSHOP bridge library (60 pmol total) which were dried at 60°C for 30 minutes using a SpeedVac concentrator. The solution was sealed using a 22 × 22-mm #1.5 coverslip using rubber cement. Samples hybridized overnight at 45°C in a humidified chamber. Samples were then washed for 15 min in 2X SSCT at 60°C and then twice for 5 mins with room temperature 2X SSCT. Samples were then incubated in a secondary hybridization containing 5X PBST, 10% dextran sulfate, 10 μM fluorescent oligos for 1 hour at 37ºC. Slides were then washed three times with 1X PBST at 37ºC. After air drying slides, samples were mounted with SlowFade Gold + DAPI and sealed beneath a 22 × 30 mm #1.5 coverslip using nail polish.

Microscopy

Microscopy was performed using a Yokogawa CSU-W1 SoRa spinning disc confocal unit attached to a Nikon Eclipse Ti-2 chassis. Excitation light was emitted at 30% of maximal intensity from 405 nm, 488 nm, 561 nm, or 640 nm lasers housed inside of a commercial Nikon LUNF 405/488/561/640NM launch. Laser excitation was delivered via a single-mode optical fiber into the CSU-W1 SoRa unit. Excitation light was then directed through a microlens array disc and a ‘SoRa’ disc containing 50 μm pinholes and directed to the rear aperture of a 100x N.A. 1.49 Apo TIRF oil immersion objective lens by a prism in the base of the Ti2. Emission light was collected by the same objective and passed via a prism in the base of the Ti2 back into the SoRa unit, where it was relayed by a 1x lens through the pinhole disc and directed into the emission path by a quad-band dichroic mirror (Semrock Di01-T405/488/568/647–13×15×0.5). Emission light was then spectrally filtered by one of four single bandpass filters (DAPI: Chroma ET455/50M; ATTO 488: Chroma ET525/36M; ATTO 565: 27 Chroma ET605/50M; Alexa Fluor 647: Chroma ET705/72M) and focused by a 1x relay lens onto an Andor Sona 4.2B-11 camera with a physical pixel size of 11 μm, resulting in an effective pixel size of 110 nm. The Sona was operated in 30 16-bit mode with rolling shutter readout and exposure times of 300 ms. Images were processed in ImageJ and Fiji75,76 and Adobe Photoshop.

Supplementary Material

Supplement 1
media-1.pdf (38MB, pdf)
Supplement 2
media-2.txt (399KB, txt)
Supplement 3
media-3.txt (62.8KB, txt)
Supplement 4
media-4.txt (10.4KB, txt)
Supplement 5
media-5.xlsx (15.7KB, xlsx)
Supplement 6
media-6.xlsx (13.3KB, xlsx)
Supplement 7
media-7.zip (32.2MB, zip)

Acknowledgements

We thank Dr. Evan Eichler for providing CHM13-hTERT cells and Dr. Tamara Potapova and Dr. Jennifer Gerton for their valuable advice for preparing fixed metaphase slides. Additionally, we would like to thank Dr. Ching-Ho Chang and Dr. Amanda Laurracuente for their feedback on the D. melanogaster genome. We thank David Nwizugbo, Caleb Kono, Caleb J. Bower, and Chris Hsu for feedback on the development of Tigerfish and members of the Beliveau and Noble labs for helpful discussions. This work was supported by the National Institutes of Health (under grants 1R35GM137916 to B.J.B., UM1HG011531 to W.S.N., and 1R01HG011274 to K.H.M) and the Brotman Baty Institute for Precision Medicine (under a Catalytic Collaboration award to B.J.B.). R.A. was supported by a National Science Foundation Graduate Research Fellowship Program Award and a Howard Hughes Medical Institute Gilliam Fellowship for Advanced Study. C.K.C. was supported by NIH training grant 5T32HG000035.

Footnotes

Competing Interest Statement

The authors declare no competing interests.The authors declare no competing interests.

Code Availability

The Tigerfish source code is available under a MIT license at https://github.com/beliveau-lab/TigerFISH.

Data Availability

Primary microscopy data will be made available upon request.

References

  • 1.Pardue M. L. & Gall J. G. Molecular hybridization of radioactive DNA to the DNA of cytological preparations. Proc. Natl. Acad. Sci. U. S. A. 64, 600–604 (1969). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Rudkin G. T. & Stollar B. D. High resolution detection of DNA–RNA hybrids in situ by indirect immunofluorescence. Nature 265, 472–473 (1977). [DOI] [PubMed] [Google Scholar]
  • 3.Bauman J. G. J., Wiegant J., Borst P. & van Duijn P. A new method for fluorescence microscopical localization of specific DNA sequences by in situ hybridization of fluorochrome-labelled RNA. Exp. Cell Res. 128, 485–490 (1980). [DOI] [PubMed] [Google Scholar]
  • 4.Langer-Safer P. R., Levine M. & Ward D. C. Immunological method for mapping genes on Drosophila polytene chromosomes. Proc. Natl. Acad. Sci. U. S. A. 79, 4381–4385 (1982). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Lawrence J. B. & Singer R. H. Quantitative analysis of in situ hybridization methods for the detection of actin gene expression. Nucleic Acids Res. 13, 1777–1799 (1985). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Lewis M. E., Sherman T. G. & Watson S. J. In situ hybridization histochemistry with synthetic oligonucleotides: strategies and methods. Peptides 6 Suppl 2, 75–87 (1985). [DOI] [PubMed] [Google Scholar]
  • 7.Raj A., van den Bogaard P., Rifkin S. a., van Oudenaarden A. & Tyagi S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Yamada N. A. et al. Visualization of fine-scale genomic structure by oligonucleotide-based high-resolution FISH. Cytogenet. Genome Res. 132, 248–254 (2011). [DOI] [PubMed] [Google Scholar]
  • 9.Boyle S., Rodesch M. J., Halvensleben H. A., Jeddeloh J. A. & Bickmore W. A. Fluorescence in situ hybridization with high-complexity repeat-free oligonucleotide probes generated by massively parallel synthesis. Chromosome Res. 19, 901–909 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Beliveau B. J. et al. Versatile design and synthesis platform for visualizing genomes with Oligopaint FISH probes. Proceedings of the National Academy of Sciences 109, 21301–21306 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Wang S. et al. Spatial organization of chromatin domains and compartments in single chromosomes. Science 353, 598–602 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Bintu B. et al. Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science 362, (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Mateo L. J. et al. Visualizing DNA folding and RNA in embryos at single-cell resolution. Nature vol. 568 49–54 Preprint at 10.1038/s41586-019-1035-4 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Takei Y. et al. Integrated spatial genomics reveals global architecture of single nuclei. Nature 590, 344–350 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Lubeck E., Coskun A. F., Zhiyentayev T., Ahmad M. & Cai L. Single-cell in situ RNA profiling by sequential hybridization. Nature methods vol. 11 360–361 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Chen K. H., Boettiger A. N., Moffitt J. R., Wang S. & Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Shah S., Lubeck E., Zhou W. & Cai L. In Situ Transcription Profiling of Single Cells Reveals Spatial Organization of Cells in the Mouse Hippocampus. Neuron 92, 342–357 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Rouillard J. M., Zuker M. & Gulari E. OligoArray 2.0: Design of oligonucleotide probes for DNA microarrays using a thermodynamic approach. Nucleic Acids Res. 31, 3057–3062 (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Navin N. et al. PROBER: Oligonucleotide FISH probe design software. Bioinformatics 22, 2437–2438 (2006). [DOI] [PubMed] [Google Scholar]
  • 20.Han Y., Zhang T., Thammapichai P., Weng Y. & Jiang J. Chromosome-Specific Painting in Cucumis Species Using Bulked Oligonucleotides. Genetics 200, 771–779 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Yilmaz L. S., Parnerkar S. & Noguera D. R. mathFISH, a web tool that uses thermodynamics-based mathematical models for in silico evaluation of oligonucleotide probes for fluorescence in situ hybridization. Appl. Environ. Microbiol. 77, 1118–1122 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Beliveau B. J. et al. OligoMiner provides a rapid, flexible environment for the design of genome-scale oligonucleotide in situ hybridization probes. Proc. Natl. Acad. Sci. U. S. A. 115, E2183–E2192 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Gelali E. et al. iFISH is a publically available resource enabling versatile DNA FISH to study genome architecture. Nat. Commun. 10, 1636 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Hu M. et al. ProbeDealer is a convenient tool for designing probes for highly multiplexed fluorescence in situ hybridization. Sci. Rep. 10, 22031 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Zhang T., Liu G., Zhao H. & Braz G. T. Chorus2: design of genome-scale oligonucleotide-based probes for fluorescence in situ hybridization. Plant Biotechnol. (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Hershberg E. A. et al. PaintSHOP enables the interactive design of transcriptome- and genome-scale oligonucleotide FISH experiments. Nat. Methods 18, 937–944 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Altschul S. F., Gish W., Miller W., Myers E. W. & Lipman D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990). [DOI] [PubMed] [Google Scholar]
  • 28.Langmead B. & Salzberg S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Marçais G. & Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Britten RJ and Kohne DE. Repeated Sequences in DNA. Science 161, 529–540 (1968). [DOI] [PubMed] [Google Scholar]
  • 31.Landegent J. E., Jansen in de Wal N., Dirks R. W. & van der Ploeg M. Use of whole cosmid cloned genomic sequences for chromosomal localization by non-radioactive in situ hybridization. Hum. Genet. 77, 366–370 (1987). [DOI] [PubMed] [Google Scholar]
  • 32.Lichter P., Cremer T., Borden J., Manuelidis L. & Ward D. C. Delineation of individual human chromosomes in metaphase and interphase cells by in situ suppression hybridization using recombinant DNA libraries. Hum. Genet. 80, 224–234 (1988). [DOI] [PubMed] [Google Scholar]
  • 33.Pinkel D. et al. Fluorescence in situ hybridization with human chromosome-specific libraries: detection of trisomy 21 and translocations of chromosome 4. Proc. Natl. Acad. Sci. U. S. A. 85, 9138–9142 (1988). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Smit AFA, Hubley R & Green P. RepeatMasker Open-4.0. 2013–2015. http://www.repeatmasker.org.
  • 35.Treangen T. J. & Salzberg S. L. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat. Rev. Genet. 13, 36–46 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Nurk S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Altemose N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Gonzalez I. L. & Sylvester J. E. Complete sequence of the 43-kb human ribosomal DNA repeat: analysis of the intergenic spacer. Genomics 27, 320–328 (1995). [DOI] [PubMed] [Google Scholar]
  • 39.Marzluff W. F., Gongidi P., Woods K. R., Jin J. & Maltais L. J. The human and mouse replication-dependent histone genes. Genomics 80, 487–498 (2002). [PubMed] [Google Scholar]
  • 40.Moyzis R. K. et al. A highly conserved repetitive DNA sequence, (TTAGGG)n, present at the telomeres of human chromosomes. Proc. Natl. Acad. Sci. U. S. A. 85, 6622–6626 (1988). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Franke V. et al. Long terminal repeats power evolution of genes and gene expression programs in mammalian oocytes and zygotes. Genome Res. 27, 1384–1394 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Henikoff S., Ahmad K. & Malik H. S. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293, 1098–1102 (2001). [DOI] [PubMed] [Google Scholar]
  • 43.Jones K. W. Chromosomal and nuclear location of Mouse Satellite DNA in individual cells. Nature 225, 912–915 (1970). [DOI] [PubMed] [Google Scholar]
  • 44.Riegel M. Human molecular cytogenetics: From cells to nucleotides. Genetics and Molecular Biology vol. 37 194–209 Preprint at 10.1590/S1415-47572014000200006 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Adilardi R. S. & Dernburg A. F. Robust, versatile DNA FISH probes for chromosome-specific repeats in Caenorhabditis elegans and Pristionchus pacificus. G3 12, (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Tang S. et al. Developing New Oligo Probes to Distinguish Specific Chromosomal Segments and the A, B, D Genomes of Wheat (Triticum aestivum L.) Using ND-FISH. Front. Plant Sci. 9, 1104 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Lei J. et al. Development of oligonucleotide probes for FISH karyotyping in Haynaldia villosa, a wild relative of common wheat. The Crop Journal 8, 676–681 (2020). [Google Scholar]
  • 48.Wang T. et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Paez S. et al. Reference genomes for conservation. Science 377, 364–366 (2022). [DOI] [PubMed] [Google Scholar]
  • 50.Köster J. & Rahmann S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012). [DOI] [PubMed] [Google Scholar]
  • 51.Zadeh J. N. et al. NUPACK: Analysis and design of nucleic acid systems. J. Comput. Chem. 32, 170–173 (2011). [DOI] [PubMed] [Google Scholar]
  • 52.Li H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Cock P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics vol. 25 1422–1423 Preprint at 10.1093/bioinformatics/btp163 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Pedregosa F. et al. Scikit-learn: Machine learning in Python. the Journal of machine Learning research 12, 2825–2830 (2011). [Google Scholar]
  • 55.Anand L. & Rodriguez Lopez C. M. ChromoMap: an R package for interactive visualization of multi-omics data and annotation of chromosomes. BMC Bioinformatics 23, 33 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Lipman D. J. & Pearson W. R. Rapid and sensitive protein similarity searches. Science 227, 1435–1441 (1985). [DOI] [PubMed] [Google Scholar]
  • 57.Kent W. J. et al. The Human Genome Browser at UCSC. Genome Res. 12, 996–1006 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Beliveau B. J. et al. Single-molecule super-resolution imaging of chromosomes and in situ haplotype visualization using Oligopaint FISH probes. Nat. Commun. 6, 7147 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Kishi J. Y. et al. SABER amplifies FISH: enhanced multiplexed imaging of RNA and DNA in cells and tissues. Nat. Methods (2019) doi: 10.1038/s41592-019-0404-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Choi H. M. T., Beck V. A. & Pierce N. A. Next-generation in situ hybridization chain reaction: Higher gain, lower cost, greater durability. ACS Nano 8, 4284–4294 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Banér J., Nilsson M., Mendel-Hartvig M. & Landegren U. Signal amplification of padlock probes by rolling circle replication. Nucleic Acids Res. 26, 5073–5078 (1998). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Su J.-H., Zheng P., Kinrot S. S., Bintu B. & Zhuang X. Genome-Scale Imaging of the 3D Organization and Transcriptional Activity of Chromatin. Cell 182, 1641–1659.e26 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Quinlan A. R. & Hall I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics vol. 26 841–842 Preprint at 10.1093/bioinformatics/btq033 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Harris C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.McKinney W. Data Structures for Statistical Computing in Python. in Proceedings of the 9th Python in Science Conference (SciPy, 2010). doi: 10.25080/majora-92bf1922-00a. [DOI] [Google Scholar]
  • 66.Dale R. K., Pedersen B. S. & Quinlan A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.LaFave M. C. & Burgess S. M. sam2pairwise version 1.0. 0. 2014. [Google Scholar]
  • 68.Virtanen P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Hunter. Matplotlib: A 2D Graphics Environment. 9, 90–95 (2007). [Google Scholar]
  • 70.Waskom M. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021). [Google Scholar]
  • 71.Krekel H. et al. pytest 5.3. 2. Preprint at (2004). [Google Scholar]
  • 72.Anaconda I. Anaconda Software Distribution. Computer software (2014). [Google Scholar]
  • 73.Garreta R. & Moncecchi G. Learning scikit-learn: Machine Learning in Python. (Packt Publishing Ltd, 2013). [Google Scholar]
  • 74.Anand L. & Rodriguez Lopez C. M. chromoMap: An R package for Interactive Visualization and Annotation of Chromosomes. Preprint at 10.1101/605600. [DOI] [PMC free article] [PubMed]
  • 75.Schneider C. A., Rasband W. S. & Eliceiri K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Schindelin J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement 1
media-1.pdf (38MB, pdf)
Supplement 2
media-2.txt (399KB, txt)
Supplement 3
media-3.txt (62.8KB, txt)
Supplement 4
media-4.txt (10.4KB, txt)
Supplement 5
media-5.xlsx (15.7KB, xlsx)
Supplement 6
media-6.xlsx (13.3KB, xlsx)
Supplement 7
media-7.zip (32.2MB, zip)

Data Availability Statement

Primary microscopy data will be made available upon request.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints

RESOURCES