Abstract
Summary
High-throughput sequencing can enhance the analysis of aptamer libraries generated by the Systematic Evolution of Ligands by EXponential enrichment (SELEX). Robust analysis of the resulting sequenced rounds is best implemented by determining a ranked consensus of reads following processing by multiple aptamer detection algorithms. While several such approaches have been developed to this end, their installation and implementation are problematic. We developed AptCompare, a cross-platform program that combines six of the most widely used analytical approaches for the identification of RNA aptamer motifs and uses a simple weighted ranking to order the candidate aptamers, all driven within the same GUI-enabled environment. We demonstrate AptCompare’s performance by identifying the top-ranked candidate aptamers from a previously published selection experiment in our laboratory, with follow-up bench assays demonstrating good correspondence between the sequences’ rankings and their binding affinities.
Availability and implementation
The source code and pre-built virtual machine images are freely available at https://bitbucket.org/shiehk/aptcompare.
Supplementary information
Supplementary data are available at Bioinformatics online.
1 Introduction
Aptamers are oligonucleotides which possess specific binding affinity for their intended targets. To date, aptamers with high affinity and selectivity have been reported for various targets, including small molecules, proteins and cell-surface receptors (Stoltenburg et al., 2007). Compared with antibodies, aptamers have several advantages: they can be synthesized in vitro, are inexpensive to produce and are not immunogenic (Yan and Levy, 2009). The technique by which aptamers are generated is known as in vitro selection or SELEX (Systematic Evolution of Ligands by EXponential enrichment; Ellington and Szostak, 1990; Tuerk and Gold, 1990). In this approach, a randomized sequence pool is subjected to multiple iterations of binding to a molecular target, elution from that target and amplification. Between 5 and 15 rounds of SELEX are typically required to yield detectable function which can be attributed to individual clones following sequence analysis and subsequent assays.
Sequence analysis of selected pools was traditionally accomplished by molecular cloning. Advances in DNA sequencing technology have since increased the sampling resolution of the RNA species and enabled a quantitative analysis of the enriched population in each selection round. As aptamers bind to their targets, individual molecules or families of similar sequences harboring one or more ‘active’ motifs are expected to become enriched throughout the course of selection. When analyzed by high-throughput sequencing (HTS), reads from each round of the selection can be used to identify the over-represented subsets, and the resulting consensus aptamer sequences can be determined. These can then be modified or minimized at the bench for improved efficiency in synthesis and binding. To date, multiple programs have been developed to perform motif discovery from aptamer selections following HTS-SELEX, each with differing modes of implementation, installation and overall performance. Here we present AptCompare, a GUI-driven software environment that offers a first-of-its-kind ability to perform a meta-analysis of multiple aptamer detection algorithms simultaneously on the same dataset. Currently, AptCompare automates the pre-processing and analysis of HTS-SELEX sequencing data using six of the most widely used aptamer motif discovery programs, ranking each program’s performance and using a meta-rank metric to identify the most promising aptamer targets for validation at the bench.
2 Implementation
AptCompare combines several approaches for de novo motif discovery in aptamer selections. The first approach is a frequency or sequence counting script written in Perl, which performs a simple ranking of the most frequent sequences in an aptamer pool; this ranking is often the first step in any such analysis. The other five programs are: AptaCluster (AptaTools; Hoinka et al., 2014), FASTAptamer (Alam et al., 2015), MPBind (Jiang et al., 2014), APTANI (Caroli et al., 2016) and RNAmotifAnalysis (Ditzler et al., 2013). Of these six approaches, the first four are sequence-based, whereas the last two incorporate RNA folding programs to predict secondary structure. All of these approaches, their features and their requirements are summarized in Supplementary Table S1. Designed to be cross-platform, AptCompare was developed primarily in Python 2.7, with the GUI written using the PyForms and PyQt4 libraries. The individual programs, however, have intrinsic software dependencies, including Unix utilities, MySQL and third-party components, several of which require deployment explicitly using Python 2.7 or supporting processing libraries only available on specific Ubuntu releases. Issues involving the operation of Python-based GUIs within containers precluded the development of a Docker port. Therefore, we have also provided a virtual machine image and an Amazon Machine Image to facilitate ‘turn-key’ configuration and deployment on new systems.
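The sequence counting step described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the script shipped with AptCompare; the function name `rank_by_frequency` and the example reads are hypothetical, and ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter


def rank_by_frequency(reads):
    """Return (sequence, count, rank) tuples, most frequent sequence first."""
    # Normalise case and skip empty lines before counting identical reads.
    counts = Counter(read.strip().upper() for read in reads if read.strip())
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(seq, count, rank)
            for rank, (seq, count) in enumerate(ordered, start=1)]


# Hypothetical reads, for illustration only.
example_reads = ["GGAUCA", "GGAUCA", "ACGUAC", "GGAUCA", "ACGUAC", "UUUGCA"]
ranked = rank_by_frequency(example_reads)
for seq, count, rank in ranked:
    print(rank, seq, count)
```

A ranking of this kind needs no parameters, which is one reason it is a common starting point before more elaborate motif discovery.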
In order to make equivalent comparisons, we have designed AptCompare to require minimal user input and configuration in most situations. This ‘basic’ mode executes the analysis with default parameters. Advanced users can configure specific parameters and run the individual programs. Following the completion of the analysis, the results can be exported into a tab-delimited format and saved as a spreadsheet. The results are sorted by the weighted ranking, defined as the mean of the ranks assigned by the six methods. AptCompare also counts the number of programs (out of six) that select a given motif, corresponding to a ‘measure of belief’ in that motif model.
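The weighted ranking and the count of supporting programs can be sketched as follows. This is an illustration under stated assumptions, not AptCompare's code: the text does not say how a sequence missed by some programs is scored, so here the mean is taken only over the programs that report it. The names `meta_rank` and `to_tsv`, the program labels and the sequences are all hypothetical.

```python
import csv
import io
from statistics import mean


def meta_rank(ranks_by_program):
    """Combine per-program ranks ({program: {sequence: rank}}) into sorted rows."""
    sequences = set()
    for ranks in ranks_by_program.values():
        sequences.update(ranks)
    rows = []
    for seq in sequences:
        # Assumption: average only over the programs that reported this sequence.
        found = [ranks[seq] for ranks in ranks_by_program.values() if seq in ranks]
        rows.append({"sequence": seq,
                     "mean_rank": mean(found),
                     "n_programs": len(found)})
    # Best (lowest) mean rank first; more supporting programs wins a tie.
    rows.sort(key=lambda r: (r["mean_rank"], -r["n_programs"], r["sequence"]))
    return rows


def to_tsv(rows):
    """Serialise the rows as tab-delimited text that opens in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer,
                            fieldnames=["sequence", "mean_rank", "n_programs"],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical ranks from three of the programs, for illustration only.
example_ranks = {
    "counting": {"AAA": 1, "CCC": 2, "GGG": 3},
    "program_b": {"AAA": 2, "CCC": 1},
    "program_c": {"AAA": 1, "GGG": 2},
}
print(to_tsv(meta_rank(example_ranks)))
```

Keeping the number of supporting programs alongside the mean rank matters because a sequence ranked first by a single program would otherwise look as strong as one ranked first by all six.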
A pre-processing step converts raw FASTQ, FASTA or sequence files to all necessary formats for each of the individual programs. Because we are primarily interested in aptamer motifs, AptCompare uses an initial hierarchical clustering step in the FASTAptamer package to identify cluster seeds, the representative sequences of each cluster. The remainder of the analysis is conducted using these cluster seeds, significantly enhancing overall performance.
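The idea of reducing a read pool to cluster seeds can be sketched as below. This is a simplified greedy scheme and not FASTAptamer's implementation: it assumes four-line FASTQ records, treats the most abundant unassigned sequence as the next seed, and uses a Levenshtein distance cut-off (`max_distance`) that is an illustrative parameter. All function names and reads are hypothetical.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each record, assuming 4-line FASTQ records."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip().upper()


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cluster_seeds(sequences, max_distance=1):
    """Greedily assign sequences to seeds; return {seed: total read count}."""
    counts = Counter(sequences)
    # Most abundant sequences are considered first and become seeds.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    clusters = {}
    for seq, count in ordered:
        for seed in clusters:
            if edit_distance(seq, seed) <= max_distance:
                clusters[seed] += count
                break
        else:
            clusters[seq] = count  # no seed is close enough: start a new cluster
    return clusters


# Hypothetical FASTQ content, for illustration only.
example_fastq = [
    "@r1", "GATCAATTC", "+", "IIIIIIIII",
    "@r2", "GATCAATTC", "+", "IIIIIIIII",
    "@r3", "GATCATTTC", "+", "IIIIIIIII",
    "@r4", "CCCCGGGGA", "+", "IIIIIIIII",
]
seeds = cluster_seeds(read_fastq_sequences(example_fastq))
print(seeds)
```

Running the downstream programs on seeds rather than on every read shrinks the input, which is the source of the performance gain described above.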
3 Results
We evaluated the performance of AptCompare using the dataset from an aptamer selection against the human transferrin receptor previously conducted in our laboratory (Maier et al., 2016; Wilner et al., 2012). The original selection, which did not use HTS, identified three aptamers, all of which share a core motif, characterized by two asymmetric internal loops and a longer motif in one of the loops (Supplementary Fig. S1a). We sequenced each round of this selection using HTS and ran AptCompare on the resulting reads from each round. A total of 27 unique ranked candidate aptamers were identified (details in Supplementary Material). Multiple sequence alignments (Supplementary Fig. S2) revealed that nearly all shared the common nonamer GATCA[AT]TNC, which was the motif identified in the previous selection (Supplementary Fig. S1b). We synthesized all 27 candidate aptamers and measured their apparent binding affinity via flow cytometry (see the Supplementary Material for methods). The binding affinities were measured by estimating the apparent dissociation constant, Kd. Our results demonstrate strong concordance between measured binding affinity and aggregate aptamer ranking (Supplementary Tables S2 and S3). For the analysis of this particular dataset, all methods demonstrated generally similar detection performance; even the simplest method, sequence counting, proved highly effective. However, AptCompare’s utility lies in the fact that one can not only identify over-represented sequences across several SELEX rounds, but also characterize the structural evolution of any aptamer signal via its structural prediction and statistical modeling components.
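Screening candidate sequences for the shared nonamer can be sketched with a regular expression. The sketch assumes that N in GATCA[AT]TNC denotes any nucleotide (the IUPAC convention) and that reads are compared in the DNA alphabet, so U is converted to T first. The function name `find_motif` and the test sequences are hypothetical and are not taken from the selection data.

```python
import re

# GATCA[AT]TNC with N expanded to any of the four nucleotides (assumption).
MOTIF = re.compile(r"GATCA[AT]T[ACGT]C")


def find_motif(sequence):
    """Return the 0-based start of the first motif match, or None if absent."""
    normalized = sequence.upper().replace("U", "T")
    match = MOTIF.search(normalized)
    return match.start() if match else None


# Hypothetical sequences, for illustration only.
print(find_motif("CCGATCAATGCGG"))  # motif present in DNA notation
print(find_motif("ccgaucauuacgg"))  # same test on an RNA-notation sequence
print(find_motif("ACGTACGTACGT"))   # no motif
```

A check like this only confirms the presence of the primary sequence motif; it says nothing about whether the surrounding sequence supports the internal-loop structure described above.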
We have designed AptCompare with experimentalists in mind, who need to analyze the results from aptamer selections derived from HTS-SELEX and wish to base their validation assays on a comprehensive study of the derived read libraries. AptCompare conveniently performs pre-processing, clusters sequences and compares the results from six of the most widely used approaches encompassing a variety of sequence- and secondary structure-based techniques. It calculates a weighted ranking and the number of appearances, which are analogous to the significance of a motif and a measure of belief in that motif, respectively. In addition, AptCompare’s modular and flexible design allows users to easily upgrade the aptamer discovery suite of programs as new tools become available.
Supplementary Material
Acknowledgement
The authors acknowledge the support and advice of Myles Akabas, MD, PhD.
Funding
This work was supported by National Institutes of Health Medical Scientist Training Program training grant [5T32GM007288].
Conflict of Interest: none declared.
Contributor Information
Kevin R Shieh, Department of Medicine, Maimonides Medical Center, Brooklyn, NY, USA.
Christina Kratschmer, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA.
Keith E Maier, Vitrisa Therapeutics, Durham, NC, USA.
John M Greally, Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA.
Matthew Levy, Vitrisa Therapeutics, Durham, NC, USA.
Aaron Golden, School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Galway, Ireland.
References
- Alam K.K. et al. (2015) FASTAptamer: a bioinformatic toolkit for high-throughput sequence analysis of combinatorial selections. Mol. Ther. Nucleic Acids, 4, e230.
- Caroli J. et al. (2016) APTANI: a computational tool to select aptamers through sequence-structure motif analysis of HT-SELEX data. Bioinformatics, 32, 161–164.
- Ditzler M.A. et al. (2013) High-throughput sequence analysis reveals structural diversity and improved potency among RNA inhibitors of HIV reverse transcriptase. Nucleic Acids Res., 41, 1873–1884.
- Ellington A.D., Szostak J.W. (1990) In vitro selection of RNA molecules that bind specific ligands. Nature, 346, 818–822.
- Hoinka J. et al. (2014) AptaCluster—a method to cluster HT-SELEX aptamer pools and lessons from its applications. Res. Comput. Mol. Biol., 8394, 115–128.
- Jiang P. et al. (2014) MPBind: a meta-motif-based statistical framework and pipeline to predict binding potential of SELEX-derived aptamers. Bioinformatics, 30, 2665–2667.
- Maier K.E. et al. (2016) A new transferrin receptor aptamer inhibits new world hemorrhagic fever mammarenavirus entry. Mol. Ther. Nucleic Acids, 24, e321.
- Stoltenburg R. et al. (2007) SELEX–a r(e)volutionary method to generate high-affinity nucleic acid ligands. Biomol. Eng., 24, 381–403.
- Tuerk C., Gold L. (1990) Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science, 249, 505–510.
- Wilner S.E. et al. (2012) An RNA alternative to human transferrin: a new tool for targeting human cells. Mol. Ther. Nucleic Acids, 1, e21.
- Yan A.C., Levy M. (2009) Aptamers and aptamer targeted delivery. RNA Biol., 6, 316–320.