Author manuscript; available in PMC: 2011 Mar 4.
Published in final edited form as: Open Access Bioinformatics. 2010 Nov 1;2(2010):145–155. doi: 10.2147/OAB.S13741

Microarray oligonucleotide probe designer (MOPeD): A web service

Viren C Patel 1,*, Kajari Mondal 1,*, Amol Carl Shetty 1,*, Vanessa L Horner 1, Jirair K Bedoyan 2, Donna Martin 2,3, Tamara Caspary 1, David J Cutler 1, Michael E Zwick 1
PMCID: PMC3048354  NIHMSID: NIHMS274493  PMID: 21379402

Abstract

Methods of genomic selection that combine high-density oligonucleotide microarrays with next-generation DNA sequencing allow investigators to characterize genomic variation in selected portions of complex eukaryotic genomes. Yet choosing which specific oligonucleotides to use can pose a major technical challenge. To address this issue, we have developed a software package called MOPeD (Microarray Oligonucleotide Probe Designer), which automates the process of designing genomic selection microarrays. This web-based software allows individual investigators to design custom genomic selection microarrays optimized for synthesis with Roche NimbleGen’s maskless photolithography. Design parameters include uniqueness of the probe sequences, melting temperature, hairpin formation, and the presence of single nucleotide polymorphisms. We generated probe databases for the human, mouse, and rhesus macaque genomes and conducted experimental validation of MOPeD-designed microarrays in human samples by sequencing the human X chromosome exome, where relevant sequence metrics indicated superior performance relative to a microarray designed by the Roche NimbleGen proprietary algorithm. We also performed validation in the mouse to identify known mutations contained within a 487-kb region from mouse chromosome 16, the mouse chromosome 16 exome (1.7 Mb), and the mouse chromosome 12 exome (3.3 Mb). Our results suggest that the open source MOPeD software package and website (http://moped.genetics.emory.edu/) will be a valuable resource for investigators in their sequence-based studies of complex eukaryotic genomes.

Keywords: genomic selection, oligonucleotide, microarray, next-generation sequencing, software

Introduction

Next-generation sequencing platforms enable individual investigators to harness enormous raw sequencing power at a dramatically lower cost per sequenced base than traditional Sanger sequencing.1, 2 Although sequencing complete eukaryotic genomes can still be prohibitively expensive for many types of studies, the recent development and validation of methods of isolating target DNA from complex eukaryotic genomes offer a way forward for many investigators.3–10 (see review in 11). These methods have been used recently to perform targeted next-generation sequencing of human exomes in order to identify causative variants underlying monogenic disorders.12, 13 Similarly, targeted next-generation sequencing seems likely to be used increasingly to identify causative mutations induced in forward genetic screens of model organisms. Ultimately, given a reference genome sequence, these improved methods of target DNA isolation combined with next-generation sequencing platforms will allow a more complete and comprehensive ascertainment of DNA sequence variation.

Nevertheless, to fully realize this experimental paradigm an investigator must obtain a specialized, often custom-designed set of reagents. The development of maskless array synthesis allows the custom design and production of high-density oligonucleotide microarrays.14 The central challenge is then the selection of the specific oligonucleotides to be placed on a genomic selection array.4, 9 Although there are a number of algorithms for designing tiling arrays for genome-wide transcriptome or ChIP-seq experiments15–18 (see review in 19), we still lack easily accessible open-source tools for building genomic selection arrays.

To address this issue, we have developed a software package named MOPeD (Microarray Oligonucleotide Probe Designer), which automates the process of designing genomic selection microarrays. This web-based software allows individual investigators to easily design custom genome capture arrays that have been optimized for maskless array synthesis by Roche NimbleGen. Here we experimentally validated the performance of MOPeD-designed genomic selection microarrays by sequencing the human X chromosome exome, a mouse chromosome 16 genomic region, and the mouse chromosome 12 and 16 exomes. Our data show that MOPeD can provide investigators a valuable resource for their sequence-based studies of complex eukaryotic genomes.

Material and methods

The MOPeD software package is implemented in two parts. The first involves the creation of a probe database for a specific reference genome. Operations in this first part are required once for a specific genome. The second part involves obtaining user parameters, querying the previously created probe database, and selecting optimal probes for target regions. This process may be repeated for the design of different microarrays. MOPeD was developed in C and Perl and is licensed under a GPL 3.0. The source code is available from the MOPeD website (http://moped.genetics.emory.edu/) and via SourceForge (http://moped.sourceforge.net).

Construction of the MOPeD probe database

Creation of the probe database for a specific reference genome is implemented in two steps (Figure 1). The first involves the creation of a database that contains the count of every k-mer (k = 10 to 15) in the given genome. The second step involves the computation of attributes for both forward and reverse probes of size 55 to 65. UCSC reference assemblies for human (hg18), mouse (mm9), and rhesus macaque (rheMac2), along with their respective dbSNP tracks, were used for the current implementation of MOPeD.

Figure 1.

Steps required to generate the MOPeD probe database.

Construction of k-mer database

We constructed a database containing the count of all k-mers in a given genome, where k ranges from 10 to 15. Each k-mer is given an index from 0 to 4^k – 1 according to its alphabetical position. In this scheme a 10-mer consisting of all A’s would have index 0, and a 10-mer consisting of all T’s would have index 4^10 – 1. A 15-mer consisting of all A’s would also have index 0; however, a 15-mer consisting of all T’s would have index 4^15 – 1. Distinct files were used for each k. This facilitated searching and locating the count of any particular k-mer. This database was used to compute a weighted score that estimates the uniqueness of probes.
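
The indexing scheme described above amounts to reading a k-mer as a base-4 number. The following is a minimal Python sketch of that idea, not the MOPeD implementation (which is written in C and Perl). The function names `kmer_index` and `count_kmers`, and the choice to skip any window containing a non-ACGT symbol, are assumptions made for illustration.

```python
# Alphabetical rank of each base; A < C < G < T.
BASE_RANK = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_index(kmer: str) -> int:
    """Return the alphabetical index of a k-mer, from 0 to 4**k - 1."""
    index = 0
    for base in kmer.upper():
        # Treat the k-mer as a base-4 number, most significant base first.
        index = index * 4 + BASE_RANK[base]
    return index


def count_kmers(sequence: str, k: int) -> list[int]:
    """Count every k-mer of a sequence into a table with 4**k slots.

    Windows containing N (or any other non-ACGT symbol) are skipped;
    this is an assumption of the sketch.
    """
    sequence = sequence.upper()
    counts = [0] * (4 ** k)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if all(base in BASE_RANK for base in kmer):
            counts[kmer_index(kmer)] += 1
    return counts
```

Because the index is a direct function of the sequence, the count of any k-mer can be read from a flat table by offset, which is consistent with the per-k files described above.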

Computation of probe attributes

The final probe database contained all possible probes of size n (n = 55 to 65) from the genome of interest, excluding any probes that contained N (unknown base). The database stored four attributes of each probe: the potential to form hairpin structures, a weighted uniqueness score, the number and positions of SNPs, and the Roche NimbleGen synthesis cycle length.

Hairpin

Each probe was tested for the potential to form a hairpin structure by computing the cumulative melting temperature (Tm) of Watson-Crick pairings in the pre-loop and post-loop segments for varying sizes of pre-loop, post-loop, and loop segments. If the cumulative Tm exceeded a pre-defined Tm limit (eg, annealing temperature), the probe was considered a candidate for hairpin formation and noted as such. The Tm of the probes was calculated for oligonucleotides bound to a surface using the model and parameters described.20 The Tm limit was set to 40°C.

Uniqueness score

A weighted uniqueness score for each forward and reverse probe was computed. The weighting scheme gave proportionally more weight to larger k-mers (k = 10 to 15), since larger k-mers are more likely to be unique in the genome. For each probe of size n (n = 55 to 65), all possible k-mers present in the probe were extracted and their counts obtained from the k-mer database. The counts were summed and divided by the number of k-mers to provide an average score. The score was further adjusted to account for the larger counts associated with smaller k-mers. In this scheme lower score values indicate higher specificity of the probes.
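
The text does not give the exact weights, so the following Python sketch only illustrates the general shape of such a score. Everything specific in it is an assumption: the function name `uniqueness_score`, the `count_of` lookup callback, and in particular the factor 4^(k – k_min), which is one plausible way to both compensate for the larger counts of smaller k-mers and give more weight to larger k-mers. It should not be read as MOPeD's actual formula.

```python
from typing import Callable, Iterable


def uniqueness_score(
    probe: str,
    count_of: Callable[[str], int],
    k_values: Iterable[int] = range(10, 16),
) -> float:
    """Return an illustrative uniqueness score; lower means more specific.

    count_of(kmer) is assumed to return the genome-wide count of a k-mer,
    for example from a precomputed k-mer table.
    """
    ks = [k for k in k_values if k <= len(probe)]
    if not ks:
        raise ValueError("probe is shorter than every requested k")
    k_min = min(ks)
    scaled_means = []
    for k in ks:
        n_kmers = len(probe) - k + 1
        total = sum(count_of(probe[i:i + k]) for i in range(n_kmers))
        mean_count = total / n_kmers
        # Assumed adjustment: each extra base divides the expected count of a
        # random k-mer by about 4, so rescale to make the k values comparable.
        scaled_means.append(mean_count * 4 ** (k - k_min))
    return sum(scaled_means) / len(scaled_means)
```

With this scaling, a repeated 15-mer raises the score far more than an equally repeated 10-mer, which matches the stated intent that larger k-mers carry more weight.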

Synthesis cycle length

The Roche NimbleGen synthesis cycle length was computed for each probe and added to the final probe database. Roche NimbleGen cycle length was computed using their published algorithm.21 Synthesis cycle computations and limits for other manufacturers may be easily incorporated into the software.

SNP variation

The probes were analyzed for the presence/absence of SNPs, and their positions on the probe were noted. SNPs were determined using UCSC SNPs Track for hg18 (dbSNP build 130) and mm9 (dbSNP build 128). Probes with SNPs have been implicated in lower performance in array comparative genomic applications.22

Design of Microarray-based genomic selection (MGS) microarrays

MOPeD design of microarrays requires user input; the software selects optimal probes and outputs a text file that can be transmitted for the manufacture of a microarray (Figure 2). User inputs include: the genome of interest; minimum and maximum values for probe size, coverage, and Tm; number of chip features; upper bounds on the number of synthesis cycles; and number of SNPs on a probe. Also specifiable is the priority of probe filtering by various parameters. Optionally, a BED file containing regions that should be biased for additional probe coverage may be specified. Finally, a BED file containing target regions from the genome of interest is required.

Figure 2.

Steps required to generate a probe design file with MOPeD.

The format of the BED file submitted by the user is then verified. Duplicate regions are removed, while overlapping regions are merged. Preliminary probe allotment is then computed for all regions taking into account the user-specified parameters, as well as the characteristics of the genomic region under consideration such as size and GC content.
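
The duplicate-removal and merge step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than MOPeD's code: regions are taken to be `(chrom, start, end)` tuples in BED half-open coordinates, the function name `merge_regions` is hypothetical, and regions that merely touch (one ends where the next starts) are left separate because the text only mentions overlapping regions.

```python
def merge_regions(regions):
    """Drop duplicate regions and merge overlapping ones.

    regions: iterable of (chrom, start, end) tuples using BED
    half-open coordinates. Returns a sorted list of merged tuples.
    """
    merged = []
    # set() removes exact duplicates; sorting groups regions by chromosome
    # and orders them by start coordinate.
    for chrom, start, end in sorted(set(regions)):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

For example, `merge_regions([("chrX", 100, 200), ("chrX", 150, 300), ("chrX", 100, 200)])` yields a single region `("chrX", 100, 300)`.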

Dynamic allocation of probes

Previous studies have shown that the performance of oligonucleotide probes can vary as a function of sequence content and context.22, 23 To improve performance, MOPeD uses genomic variation information to aid in the selection and dynamic allocation of probes to targeted genomic regions. Two variables, targeted fragment size and GC content, can alter the performance of a genomic selection array. To achieve more uniform sequence capture, we employ a set of linear models to guide the dynamic allocation of probes (Figure 3). Fragments with high GC content (GCmax) have maximum coverage (Cmax) and correspondingly smaller shift (Smin). Similarly, large fragments (Lmax) have smaller coverage (Cmin) and correspondingly larger shift (Smax). Our protocol attempts to ensure that every base in the region of interest (ROI) has at least the minimum coverage (Cmin) of probes.
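
As a rough illustration of how linear models of this kind can be expressed, the Python sketch below interpolates coverage and shift between their bounds. The lower bounds (`gc_min`, `l_min`), the clamping outside the bounds, and all function names are assumptions; the actual slopes and the way MOPeD combines the GC and length models are defined in Figure 3 and are not reproduced here.

```python
def linear_map(x, x_lo, x_hi, y_lo, y_hi):
    """Linearly interpolate y for x, clamping x to the range [x_lo, x_hi]."""
    if x_hi == x_lo:
        return y_lo
    x = min(max(x, x_lo), x_hi)
    return y_lo + (y_hi - y_lo) * (x - x_lo) / (x_hi - x_lo)


def coverage_from_gc(gc, gc_min, gc_max, c_min, c_max):
    """Higher GC content maps to higher probe coverage (GCmax -> Cmax)."""
    return linear_map(gc, gc_min, gc_max, c_min, c_max)


def coverage_from_length(length, l_min, l_max, c_min, c_max):
    """Longer fragments map to lower probe coverage (Lmax -> Cmin)."""
    return linear_map(length, l_min, l_max, c_max, c_min)


def shift_from_coverage(coverage, c_min, c_max, s_min, s_max):
    """Higher coverage maps to a smaller shift between probes (Cmax -> Smin)."""
    return linear_map(coverage, c_min, c_max, s_max, s_min)
```

With hypothetical bounds of Cmin = 2 and Cmax = 10, a fragment at GCmax receives coverage 10 and the smallest shift, while a fragment at Lmax receives coverage 2 and the largest shift.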

Figure 3.

Linear models MOPeD uses to select and dynamically allocate probes when generating a probe design file.

Selection of probes

The final step of the protocol involves selecting probes for each fragment in the ROI based on parameters such as probe length, Tm, uniqueness score, hairpin potential, Roche NimbleGen cycle length, and SNPs. The first part involves the selection of the best probes to ensure maximum coverage of the target region according to the algorithm outlined below. For each fragment:

  1. Query probe database for all probes that tile over the fragment

  2. Evaluate probes to meet user-specified parameters for hairpin, length, Tm, synthesis cycle length, and SNPs; create viable probe set (VPS)

  3. If (VPS is not empty)

    1. Set V(X) = 0 for every base X in the fragment

    2. Loop until V(X) ≠ 0 for every base X in the fragment

      1. Set B(X,s) to first base X where V(X) == 0

      2. Set B(X,e) to last base X where V(X) == 0 and V(B(X,s) ‥ B(X,e)) == 0

      3. Set M = (B(X,s) + B(X,e)) / 2

      4. Query VPS for all probes that tile over M; create probe set MPS for position M; each probe Pi has uniqueness score Ui

      5. If (MPS is empty)

        (a) Mark V(M) = ‘N’

        Else

        (b) Select probe Pi with lowest uniqueness score Ui

        (c) Set Ps = start coordinate of Pi

        (d) Set Pe = stop coordinate of Pi

        (e) Mark V(Ps ‥ Pe) = 1

        End If

      End loop

    End If

The final part involves replication of the tiled probes to satisfy the fragment coverage allotment computed beforehand.
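
The tiling loop outlined above can be sketched in Python as follows. This is a reading of the published pseudocode, not MOPeD's source: probes are assumed to be `(start, end, uniqueness)` tuples in half-open coordinates that have already passed the viability filters, the function name `tile_fragment` is hypothetical, and the replication step that satisfies the coverage allotment is not shown.

```python
def tile_fragment(frag_start, frag_end, probes):
    """Greedily tile one fragment with viable probes.

    probes: list of (start, end, uniqueness) tuples, half-open coordinates.
    Returns (selected_probes, positions_with_no_probe).
    """
    length = frag_end - frag_start
    # V(X): 0 = not yet covered, 1 = covered, "N" = no probe tiles this base.
    state = [0] * length
    selected = []
    while 0 in state:
        # First contiguous run of uncovered bases: B(X,s) .. B(X,e).
        first = state.index(0)
        last = first
        while last + 1 < length and state[last + 1] == 0:
            last += 1
        mid = (first + last) // 2
        position = frag_start + mid
        # MPS: viable probes that tile over the midpoint M.
        candidates = [p for p in probes if p[0] <= position < p[1]]
        if not candidates:
            state[mid] = "N"
            continue
        # Lowest uniqueness score means highest specificity.
        best = min(candidates, key=lambda p: p[2])
        selected.append(best)
        lo = max(best[0], frag_start) - frag_start
        hi = min(best[1], frag_end) - frag_start
        for i in range(lo, hi):
            state[i] = 1
    uncovered = [frag_start + i for i, s in enumerate(state) if s == "N"]
    return selected, uncovered
```

Each pass either covers at least the midpoint base or marks it as uncoverable, so the loop always terminates. Targeting the midpoint of the longest leading gap tends to centre each chosen probe in the region that still needs coverage.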

MOPeD design files and coverage statistics

The output consists of a text file in fasta format with all of the unique probes selected for the regions specified in the user-supplied BED file. A text file with the complete probe list (385 thousand or 2.1 million oligonucleotides) in a format suitable for providing to Roche NimbleGen is generated. Also provided are a summary of the design statistics and the distribution of the selected probes across the user-defined criteria, along with probe coverage analysis for individual fragments in the targeted region. For each fragment a BED file that can be uploaded to the UCSC Genome Browser is supplied. These files show the overlay of unique probes in the target region.

MOPeD design parameters

Four different microarray-based genomic selection arrays were designed and experimentally validated. For the human X chromosome exome microarray, the target region was preprocessed to remove fragments smaller than 25 bases and repeat regions greater than 25 bases. The MOPeD design was generated using the following criteria: probe size ranged from 55 to 65; the probe Tm range was 65°C to 75°C; number of SNPs per probe was limited to 2; the synthesis cycles limit was 192. The selected probes were further filtered to remove probes with more than 33% repeat content.
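
The human X chromosome exome criteria listed above can be written as a simple viability filter. The Python sketch below is illustrative only: the class and field names are hypothetical and do not correspond to MOPeD's actual input format, "limited to 2" is read as at most 2 SNPs, and excluding hairpin candidates is an assumption, since the design criteria above do not state how the hairpin flag was set.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    length: int        # probe size in bases
    tm: float          # melting temperature in degrees C
    snp_count: int     # number of known SNPs within the probe
    cycles: int        # Roche NimbleGen synthesis cycle length
    hairpin: bool      # flagged as a hairpin candidate


@dataclass
class DesignLimits:
    # Defaults mirror the human X chromosome exome design described above.
    min_length: int = 55
    max_length: int = 65
    min_tm: float = 65.0
    max_tm: float = 75.0
    max_snps: int = 2
    max_cycles: int = 192
    allow_hairpin: bool = False  # assumption, not stated for this design


def is_viable(probe: Probe, limits: DesignLimits) -> bool:
    """Return True if the probe satisfies every design limit."""
    return (
        limits.min_length <= probe.length <= limits.max_length
        and limits.min_tm <= probe.tm <= limits.max_tm
        and probe.snp_count <= limits.max_snps
        and probe.cycles <= limits.max_cycles
        and (limits.allow_hairpin or not probe.hairpin)
    )
```

A probe of length 60 with Tm 70°C, one SNP, and 180 synthesis cycles passes these limits, whereas the same probe with a Tm of 80°C does not.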

The mouse chromosome 16 487-kb microarray and the chromosome 16 and 12 exome microarray designs were generated using MOPeD with the following parameters: probe size ranged from 55 to 65; number of SNPs per probe was limited to 2; the synthesis cycle limit was 192.

Validation of MOPeD using microarray-based genomic selection

Experiments were carried out as outlined previously,4, 9 with the following changes to the microarray-based genomic selection (MGS) protocol. Instead of 20–25 µg of fragmented DNA, 5 µg of fragmented DNA was used while repairing the ends of the DNA library. After purification of the adaptor-ligated product, the samples were run on Invitrogen 2% SizeSelect gels (Invitrogen, Cat #: G6610-02). A 300-bp band was selected and placed in a plastic tube. The entire 300-bp size-selected DNA was then amplified using the following primers: 5' AAT GAT ACG GCG ACC ACC GAG ATC TAC ACT CTT TCC CTA CAC GAC GCT CTT CCG ATC T 3' and 5' CAA GCA GAA GAC GGC ATA CGA GCT CTT CCG ATC T 3' and high-fidelity polymerase. This pre-capture PCR product was purified, and 1 µl of the purified product was run on a Bioanalyzer DNA 7500 chip for DNA quantitation and to ensure that most of the DNA fragments fell between 250 and 350 bp. To 1 µg of the pre-captured PCR-purified sample, a 100-fold amount (in µg) of Human Cot-1 DNA (Invitrogen) was added. The samples were dried down to a pellet in a Speed-Vac at medium heat (75°C). To each pellet, 2.8 µl of water and 1 µl each of two hybridization-enhancing oligos (5' AAT GAT ACG GCG ACC ACC GAG ATC TAC ACT CTT TCC CTA CAC GAC GCT CTT CCG ATC T 3' and 5' CAA GCA GAA GAC GGC ATA CGA GCT CTT CCG ATC T 3') were added. To this we added 8 µl of 2X SC hybridization buffer (Roche NimbleGen) and 3.2 µl of SC Hybridization component A (Roche NimbleGen). The sample pellet was then gently resuspended, and hybridization on a 385K chip was done following Roche NimbleGen’s SeqCap User’s Guide version 3.2. After hybridization, arrays were eluted following the protocol in Roche NimbleGen’s SeqCap User’s Guide version 3.2. Each eluted sample was split into 10 tubes, and post-capture PCR was done using the following primers: 5' AAT GAT ACG GCG ACC ACC GAG A 3' and 5' CAA GCA GAA GAC GGC ATA CGA G 3'.

After PCR the products were pooled from 10 tubes and were purified using the Qiagen QIAquick PCR Purification Kit. We then analyzed 1 µl of the purified products on a Bioanalyzer DNA 7500 chip. More than 1 µg of DNA was obtained for each sample. PCR products were then subjected to quantitative PCR using a KAPA Library Quant Kit (catalog # 4852). Based on the qPCR quantification, each sample was diluted to 10 nM using water. The samples were then denatured using NaOH, and 120 µl of 8 pM of each sample were loaded onto each lane of the flow-cell on the Illumina Cluster Station. Following cluster amplification, the flow-cell was transferred to the Illumina Genome Analyzer (IGA). A 76-cycle step-wise sequencing-by-synthesis using four-color nucleotides was performed according to the manufacturer’s instructions (Illumina, San Diego, CA).

DNA samples analyzed

Human DNA samples included a HapMap sample, NA18503, obtained from the Coriell Cell Repositories. Whole genomic DNA was isolated from blood samples obtained from two additional male anonymous samples, M1 and M2, for use in the X chromosome exome experiments. Consent was obtained and the study approved by The University of Michigan Institutional Review Board. Mouse DNA was isolated from the liver of heterozygous carrier females. Approximately pea-sized fragments of liver were homogenized in 10 mL of DNA extraction buffer (10 mM Tris pH 8.0, 0.1 M EDTA pH 8.0, 0.5% SDS, 20 µg/mL RNAse A). After homogenization, the samples were incubated at 37°C for 1 hour to degrade RNA. Proteinase K was added at a concentration of 100 µg/mL, and samples were incubated at 50°C overnight. DNA was extracted 3 times using an equal volume of phenol equilibrated with 0.5 M Tris pH 8.0. After the final extraction an equal volume of chloroform was added to remove traces of phenol. DNA was precipitated with 0.2 volumes of 3 M sodium acetate and 2 volumes of ethanol. After precipitation, DNA was washed once with 70% ethanol and dissolved in 100–200 µL of water. The MGS protocol was then carried out as described previously.

Results

We performed three distinct targeted sequence capture experiments to validate MOPeD. Sequences targeted for genomic selection and sequencing were derived from the human and mouse genomes. These experiments exemplify potential applications of MOPeD and next-generation sequencing.

Targeted sequencing of the human X chromosome exome

We first used MOPeD to design an MGS array capable of capturing the human X chromosome exome. Targeted sequences included all coding and non-coding (3’ and 5’ untranslated regions) exons. The total reference sequence, consisting of 7,429 fragments with a total size of 2,477,787 bases, was used to design capture microarrays using MOPeD and Roche NimbleGen’s proprietary algorithm (Figure 4). MOPeD successfully selected oligonucleotide probes for 95.1% (7061) of the targeted fragments. As a comparison, the Roche NimbleGen design selected probes for 6% (436) fewer fragments, or 89.1% (6625) of the targeted fragments.

Figure 4.

Comparison of MOPeD and Roche NimbleGen microarray designs for the human X chromosome exome.

Comparing the coverage of the two methods revealed that there was a significant number of exons where only one algorithm successfully chose target probes (Figure 4). A total of 301 exons were covered only in the Roche NimbleGen design. These were not found in the MOPeD design because of high Tm (239), sequence repeats (33) and small fragment size (29). Relaxing the Tm parameter or increasing the size of the region searched would likely allow MOPeD to successfully design probes for these exons. The 737 exons covered only in the MOPeD design were clustered in regions of the genome (telomere, peri-centromeric) expected to contain higher levels of repetitive sequences. These data suggest that the MOPeD algorithm is better at finding unique probes in regions composed of repetitive sequences.

The two designs were then empirically evaluated using the identical experimental protocol. The MOPeD-designed microarray mapped approximately 12% more reads uniquely to the reference target sequence, while at the same time mapping 13% fewer reads outside of the target region (Table 1). The MOPeD-designed microarray also had fewer exons with 0 coverage (Supplemental Figure 1). The Roche NimbleGen-designed microarray had slightly fewer (1.5%) reads that failed to map uniquely to a single location in the target sequence. To assess data accuracy, genotype calls at 1,679 known X chromosome HapMap sites in sample NA18503 showed comparable rates of data completion (97%) and accuracy (98.8%) for both designs. To assess repeatability, we performed MGS with the MOPeD-designed microarray two additional times with non-HapMap samples and obtained comparable results (Supplemental Table 1, Supplemental Figure 2). Thus, the performance differences between the MOPeD- and Roche NimbleGen-designed microarrays are repeatable.

Table 1.

Results of targeted sequencing of human X chromosome exome

Sample ID | NA18503 | NA18503
Design Algorithm | Roche NimbleGen | MOPeD
Size of Target Reference Sequence (bp) | 2,477,787 | 2,477,787
Total Number of Reads | 6,072,205 | 11,006,867
Median Depth (bp) | 107 | 184
Proportion of Reads That Map to Target | 0.436 | 0.551
Proportion of Reads That Fail to Map Uniquely to Target | 0.002 | 0.019
Proportion of Reads Mapping Outside Target Region | 0.562 | 0.431

Design and validation of a mouse chromosome 16 microarray

To assess whether MOPeD can speed the identification of mutations in the mouse, we first asked whether it could design microarrays that could be combined with next-generation sequencing to identify single basepair changes, such as those induced by the alkylating chemical N-ethyl-N-nitrosourea (ENU). We focused on the ENU-induced mouse mutant Hnn, which was identified in a forward genetic screen as a recessive mutation that disrupts normal embryogenesis.24 The Hnn mutation was induced on a C57/BL6 background and mapped using a C3H/HeJ backcross to mouse chromosome 16. The region targeted for genomic selection and sequencing consisted of unique coding and noncoding DNA contained within 729 fragments with a total size of 487,615 bases. The MOPeD design successfully selected oligonucleotide probes for all 729 fragments. Microarray-based genomic selection and next-generation sequencing were performed on a DNA sample from a mouse heterozygous for the known mutation (Table 2). Only two fragments out of 729 had a median depth of zero after mapping (Supplemental Figure 3). After mapping the reads, the causative mutation (a T-to-G mutation) in a splice donor site at position 62830567 (mm9 assembly) was successfully identified as a heterozygote with a total coverage of 480.

Table 2.

Results of targeted sequencing of mouse chromosomes 16 and 12

Sample ID | Mouse Chromosome 16 Region | Mouse Chromosome 16 Exome | Mouse Chromosome 12 Exome
Design Algorithm | MOPeD | MOPeD | MOPeD
Size of Target Reference Sequence (bp) | 487,615 | 1,712,120 | 3,345,769
Total Number of Reads | 11,219,282 | 15,444,662 | 12,933,835
Median Depth (bp) | 331 | 435 | 119
Proportion of Reads That Map to Target | 0.444 | 0.723 | 0.577
Proportion of Reads That Fail to Map Uniquely to Target | 0.035 | 0.011 | 0.050
Proportion of Reads Mapping Outside Target Region | 0.521 | 0.266 | 0.373

Design and validation of a mouse chromosome 16 exome microarray

Mapping a newly induced mutation to a specific chromosome in the mouse can be accomplished inexpensively and rapidly with any number of SNP genotyping arrays. The major bottleneck and cost arise from the need to reduce the size of the region containing the mutation to make it amenable to sequencing. An alternative strategy would be to simply sequence the entire exome of a mouse chromosome suspected to harbor a mutation that results in a visible phenotype when homozygous. To evaluate how MOPeD could make this strategy feasible, we designed a genomic selection microarray targeting the chromosome 16 exome. The targeted sequence consisted of 4,280 unique fragments with a total size of 1,712,120 base pairs. MOPeD was able to successfully design oligonucleotide probes for all chromosome 16 fragments. We then performed genomic selection and targeted sequencing using DNA from a mouse heterozygous for the Hnn mutation (Table 2). Again we successfully identified the Hnn mutation (a T-to-G mutation at position 62830567, mm9 assembly) as a heterozygote with a total coverage of 498. None of the chromosome 16 fragments had a median depth of zero (Supplemental Figure 3).

Design and validation of a mouse chromosome 12 exome microarray

To further validate MOPeD, we designed a mouse chromosome 12 exome microarray to identify an induced mutation. The targeted sequence consisted of 6,200 unique fragments with a total size of 3,345,769 base pairs. The MOPeD design successfully selected oligonucleotide probes for all fragments. Genomic selection and Illumina sequencing were then performed (Table 2), and a putative mutation was identified as a heterozygote with a total sequence depth of 108. Subsequent Sanger sequencing confirmed the variant, and complementation testing demonstrated that it was in fact the causative mutation (data not shown). For the chromosome 12 exome microarray, only 45 out of 6,200 fragments had a median sequence depth of zero (Supplemental Figure 3).

Discussion

Methods of direct genomic selection, especially when combined with next-generation sequencing platforms, offer a number of significant advantages over traditional PCR-based methods of target DNA preparation.3–10, 25, 26 Our software package, MOPeD, enables individual investigators to use a fully open source set of software tools to optimize the design of high-density oligonucleotide microarrays for genomic selection. When integrated with maskless synthesis commercially available from Roche NimbleGen, MOPeD can be especially useful for experiments requiring custom designs, or for those instances when only a limited number of samples need to be characterized.

MOPeD offers a number of advantages over the standard Roche NimbleGen design algorithm. First, MOPeD-designed arrays are able to capture a larger proportion of a targeted reference sequence while at the same time having more reads map to the targeted sequence than the equivalent Roche NimbleGen microarray. Second, because the MOPeD software is fully open source and freely available, the methods used are thoroughly described and can be improved upon by the larger scientific community. Furthermore, synthesis cycle computations and limits for other manufacturers could be easily incorporated into the software. Third, MOPeD allows the user to know the complete sequence of all oligonucleotide probes. This information is not made available to users of Roche NimbleGen-designed microarrays. Finally, the approach we employ is general, thereby enabling analysis of genomes beyond the human and the mouse. Presently, the MOPeD website (http://moped.genetics.emory.edu/) also includes a rhesus macaque probe database, and we intend to support additional reference genomes in the future.

We believe there are a number of potential future directions MOPeD could help pursue. The current implementation uses a dynamic probe allocation scheme that uses linear models to guide probe selection. The software and performance of the genomic selection microarray might be further improved with the development of non-linear models to help guide probe distribution. Recently, methods of genomic selection that use oligonucleotides in solution have become more prevalent and offer some advantages. Regardless of the specific experimental protocol used, the fundamental technical challenge lies in designing oligonucleotides that can uniquely and successfully bind targets from a given genome, and MOPeD offers a fully open method that can be used to address this important technical challenge.

Conclusion

Here we describe an open source software package named MOPeD that efficiently designs high-density oligonucleotide genomic selection microarrays. At present, individual investigators can access the MOPeD website and design oligonucleotide microarrays for the human, mouse, and rhesus macaque genomes (http://moped.genetics.emory.edu/). Experimental validation of four different MOPeD-designed microarrays shows improved performance on a number of standard metrics, as compared with the proprietary Roche NimbleGen design algorithm.

Supplementary Material

Supplemental Files

Acknowledgements

The ELLIPSE Emory High Performance Computing Cluster was used for this project. This work was supported in part by the National Institutes of Health/National Institute of Mental Health and Gift Fund (grant number MH076439) to MEZ, the Simons Foundation Autism Research Initiative (MEZ), and the PHD Grant (UL1 RR025008, KL2 RR025009 or TL1 RR025010) from the Clinical and Translational Science Award program, National Institutes of Health, National Center for Research Resources.

Footnotes

Disclosures

The authors report no financial interest or conflicts of interest in this work.

References

  1. Shendure J, Mitra RD, Varma C, Church GM. Advanced sequencing technologies: methods and goals. Nat Rev Genet. 2004 May;5(5):335–344. doi: 10.1038/nrg1325.
  2. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008 Oct;26(10):1135–1145. doi: 10.1038/nbt1486.
  3. Bashiardes S, Veile R, Helms C, Mardis ER. Direct genomic selection. Nat Meth. 2005 Jan 1; doi: 10.1038/nmeth0105-63.
  4. Okou DT, Steinberg KM, Middle C, Cutler DJ, Albert TJ, Zwick ME. Microarray-based genomic selection for high-throughput resequencing. Nature Methods. 2007 Nov;4(11):907–909. doi: 10.1038/nmeth1109.
  5. Porreca GJ, Zhang K, Li JB, et al. Multiplex amplification of large sets of human exons. Nat Methods. 2007 Nov;4(11):931–936. doi: 10.1038/nmeth1110.
  6. Albert TJ, Molla MN, Muzny DM, et al. Direct selection of human genomic loci by microarray hybridization. Nat Methods. 2007 Nov;4(11):903–905. doi: 10.1038/nmeth1111.
  7. Hodges E, Xuan Z, Balija V, et al. Genome-wide in situ exon capture for selective resequencing. Nat Genet. 2007 Dec;39(12):1522–1527. doi: 10.1038/ng.2007.42.
  8. Krishnakumar S, Zheng J, Wilhelmy J, Faham M, Mindrinos M, Davis R. A comprehensive assay for targeted multiplex amplification of human DNA sequences. Proc Natl Acad Sci USA. 2008 Jul 8;105(27):9296–9301. doi: 10.1073/pnas.0803240105.
  9. Okou DT, Locke AE, Steinberg KM, et al. Combining microarray-based genomic selection (MGS) with the Illumina Genome Analyzer platform to sequence diploid target regions. Annals of Human Genetics. 2009 Sep;73(Pt 5):502–513. doi: 10.1111/j.1469-1809.2009.00530.x.
  10. Gnirke A, Melnikov A, Maguire J, et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature Biotechnology. 2009 Feb 1;27(2):182–189. doi: 10.1038/nbt.1523.
  11. Mamanova L, Coffey AJ, Scott CE, et al. Target-enrichment strategies for next-generation sequencing. Nat Methods. 2010 Feb;7(2):111–118. doi: 10.1038/nmeth.1419.
  12. Ng SB, Turner EH, Robertson PD, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009 Sep 10;461(7261):272–276. doi: 10.1038/nature08250.
  13. Ng SB, Buckingham KJ, Lee C, et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet. 2010 Jan;42(1):30–35. doi: 10.1038/ng.499.
  14. Singh-Gasson S, Green RD, Yue Y, et al. Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nature Biotechnology. 1999 Oct 1;17(10):974–978. doi: 10.1038/13664.
  15. Graf S, Nielsen FG, Kurtz S, et al. Optimized design and assessment of whole genome tiling arrays. Bioinformatics. 2007 Jul 1;23(13):i195–i204. doi: 10.1093/bioinformatics/btm200.
  16. Lipson D, Yakhini Z, Aumann Y. Optimization of probe coverage for high-resolution oligonucleotide aCGH. Bioinformatics. 2007 Jan 15;23(2):e77–e83. doi: 10.1093/bioinformatics/btl316.
  17. Schliep A, Krause R. Efficient algorithms for the computational design of optimal tiling arrays. IEEE/ACM Trans Comput Biol Bioinform. 2008 Oct–Dec;5(4):557–567. doi: 10.1109/TCBB.2008.50.
  18. Hovik H, Chen T. Dynamic probe selection for studying microbial transcriptome with high-density genomic tiling microarrays. BMC Bioinformatics. 2010;11:82. doi: 10.1186/1471-2105-11-82.
  19. Lemoine S, Combes F, Le Crom S. An evaluation of custom microarray applications: the oligonucleotide design challenge. Nucleic Acids Res. 2009 Apr;37(6):1726–1739. doi: 10.1093/nar/gkp053.
  20. Vainrub A, Pettitt BM. Theoretical aspects of genomic variation screening using DNA microarrays. Biopolymers. 2004 Apr 5;73(5):614–620. doi: 10.1002/bip.20008.
  21. Roche Nimblegen Systems I. Technical Note: Roche Nimblegen Probe Design Fundamentals. Part No. TN-ARAY0100.2007.
  22. Mulle JG, Patel VC, Warren ST, Hegde MR, Cutler DJ, Zwick ME. Empirical evaluation of oligonucleotide probe selection for DNA microarrays. PLoS ONE. 2010;5(3):e9921. doi: 10.1371/journal.pone.0009921.
  23. Cutler DJ, Zwick ME, Carrasquillo MM, et al. High-throughput variation detection and genotyping using microarrays. Genome Research. 2001 Nov;11(11):1913–1925. doi: 10.1101/gr.197201.
  24. Caspary T, Larkins CE, Anderson KV. The graded response to Sonic Hedgehog depends on cilia architecture. Dev Cell. 2007 May;12(5):767–778. doi: 10.1016/j.devcel.2007.03.004.
  25. Dahl F, Stenberg J, Fredriksson S, et al. Multigene amplification and massively parallel sequencing for cancer mutation discovery. Proc Natl Acad Sci U S A. 2007 May 29;104(22):9387–9392. doi: 10.1073/pnas.0702165104.
  26. Bau, Schracke N, Kränzle M, Wu H, Stähler P. Targeted next-generation sequencing by specific capture of multiple genomic loci using low-volume …. Anal Bioanal Chem. 2008 Jan 1; doi: 10.1007/s00216-008-2460-7.
