ACS Omega. 2018 Aug 28;3(8):10002–10007. doi: 10.1021/acsomega.8b01464

Novel Method for Structure–Activity Relationship of Aptamer Sequences for Human Prostate Cancer

Xinliang Yu, Huiqiong Yang, Xianwei Huang
PMCID: PMC6644987  PMID: 31459128

Abstract


Prostate cancer (PCa) is one of the most common malignancies in men and seriously threatens men’s health. Developing aptamer probes for PCa cells is of great significance for the early diagnosis and treatment of PCa. This paper reports a classification model for SELEX-based aptamers, which were obtained with the PCa cell line PCa-3M-1E8 (highly metastatic tumor cells) as target cells and the PCa cell line PCa-3M-2B4 (low metastatic tumor cells) as control cells. On the basis of the SELEX principle, 100 oligonucleotide sequences from the 3rd round of SELEX were defined as low affinity and specificity aptamers, and 100 sequences from the 11th round were defined as high affinity and specificity aptamers. Seven molecular descriptors, calculated from amino acid sequences translated from the DNA aptamer sequences with DNAMAN software, were used for the classification model. The classification model, based on binary logistic regression analysis, has prediction accuracy, sensitivity, and specificity of about 80% for both the training set and the test set. Therefore, it is feasible to calculate molecular descriptors from amino acid sequences translated from DNA aptamer sequences and to develop a classification model for aptamers against the PCa cell line PCa-3M-1E8.

Introduction

Aptamers are short single-stranded DNA (ssDNA) or RNA oligonucleotides. They can fold into distinct two-dimensional (2D) and three-dimensional (3D) structures and are capable of binding targets, such as proteins, cells, and viruses, with high affinity and specificity.1 Compared to natural antibodies, aptamers possess several advantages in synthesis and modification. In addition, aptamers are small in size and display good binding affinity to targets, which allows them to penetrate tissues and tumors easily and to clear rapidly from the blood.2,3

Aptamers are promising in therapeutic applications, such as cancer cell detection and diagnostics, and as molecular tools in the areas of anti-infective, anticoagulation, anti-inflammation, antiangiogenesis, antiproliferation, and immune therapies.4,5 DNA aptamers have been developed for virus-infected cells, mesenchymal stem cells, porcine endothelial precursor cells, and live bacterial cells.6,7 At present, one aptamer (brand name Macugen) is used for the treatment of neovascular age-related macular degeneration. In addition, some aptamers are currently in clinical trials.8

Aptamers can be acquired through an in vitro iterative selection process known as systematic evolution of ligands by exponential enrichment (SELEX).1,3 The process is as follows: the starting oligonucleotide library, with approximately 10¹⁴–10¹⁵ unique sequences, is incubated with the targets of interest by an affinity binding method; the bound aptamer–target complexes are partitioned from the unbound oligonucleotides; the unbound oligonucleotides are discarded; the bound aptamers are eluted from the targets and amplified by polymerase chain reaction (PCR); and the amplified aptamers are incubated with the targets in the next round of SELEX. After several rounds of selection, aptamers with high affinity and specificity are obtained.

However, these selection processes are lengthy and consume relatively large amounts of reagents. The quantitative structure–activity relationship (QSAR) method (e.g., a pattern recognition model or classification model) can reduce aptamer development costs and accelerate the screening of candidate sequences. The main objective of the QSAR methodology is to find a mathematical relationship between chemical structures and the activity of interest. To accomplish this purpose, one usually employs a mathematical model (F)

$$\text{activity} = F(\text{molecular descriptors}) \qquad (1)$$

Equation 1 correlates experimental activity values with a set of molecular descriptors, including physicochemical properties and structural properties derived from the molecular structures. Once a reliable QSAR model is developed, it can be used to predict the activities of new chemicals. Thus, QSAR can be used to understand mechanisms, design new compounds, and screen candidates.9 The most promising candidates may then be synthesized for laboratory testing.

A few groups have carried out QSAR studies for aptamers. Yu et al. introduced a classification model for predicting whether streptavidin-binding aptamers are “low”- or “high”-affinity aptamers. The model has a classification accuracy of 97.98% for the training set. Furthermore, the predicted fractions of winning aptamers from the 1st round to the 10th round of SELEX are consistent with the enrichment characteristics of aptamers in SELEX selection.10 In addition, Yu et al. developed another classification model for the binding affinity of aptamers against the hepatocarcinoma cell line SMMC-7721.11 The classification model has good statistical qualities, with specificity and sensitivity greater than 80%. For these two classification models,10,11 some of the molecular descriptors were calculated from the loop structures of the center sequences. Li et al. analyzed aptamer–target pairs by applying sequence information from DNA (or RNA) aptamers and their target proteins. The predictive accuracies of the model for the training set and the test set are 81.34 and 77.41%, respectively.12 Musafia et al. introduced a novel QSAR model for the binding affinity of anti-influenza aptamers. The model has R² values of 0.702 for the training set and 0.66 for the test set, although its descriptors are based on topological structures, which cannot describe three-dimensional information such as electronic and geometric properties.6

Although aptamers are increasingly widely used, only a few QSAR studies have been reported. There are two reasons for this. One is that experimental data on aptamers are insufficient. The other is that accurately calculating molecular descriptors from aptamers is difficult because of their large molecular weight. In this work, we adopt a novel method to calculate molecular descriptors from aptamers against prostate cancer (PCa) cells that were selected by our group with the cell-SELEX method, and we develop a pattern recognition model for PCa aptamers. PCa is one of the most common cancers in men and, as the second leading cause of cancer death, seriously threatens men’s health.13 A deficiency of early clinical diagnosis of PCa can lead to an estimated 270 thousand men dying from PCa in one year in the United States alone.14 The development of aptamer probes for PCa is urgently required.15–17 This work will help the SELEX selection of aptamer sequences for PCa.

Results and Discussion

Table S1 in the Supporting Information shows the 200 center sequences of the candidate aptamers (5′-AGAAGGAAGGAGAGCGACAC-N40-TATCAGTGGTCGGTCGTCAT-3′), with the PCa cell line PCa-3M-1E8 acting as target cells and the PCa cell line PCa-3M-2B4 as the negative control cells in SELEX selection. Sequence nos. 1–67 and 135–167 were selected from the 3rd round of SELEX, while sequence nos. 68–134 and 168–200 were obtained from the 11th round of selection. These sequences were randomly divided into two sets: a training set and a test set. Sequence nos. 1–134, the training set, were used to build the classification models. The remaining sequences (nos. 135–200), the test set, were used to evaluate the predictive power of the classification models. The candidate aptamer sequences in Table 1 were selected from the 12th round of SELEX selection. The sequences obtained from the 3rd round of SELEX have low affinity and specificity for the PCa-3M-1E8 target, and their class labels were defined as 1. The sequences obtained from the 11th and 12th rounds of SELEX possess high affinity and specificity for the PCa-3M-1E8 target, and their class labels were defined as 2.
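The labeling and split described above can be sketched in Python. This is a minimal illustration that works only with the sequence numbers of Table S1; the function names `selex_round`, `class_label`, and `dataset` are hypothetical and do not come from the paper.

```python
def selex_round(no: int) -> int:
    """Return the SELEX round (3 or 11) for a Table S1 sequence number."""
    if 1 <= no <= 67 or 135 <= no <= 167:
        return 3
    if 68 <= no <= 134 or 168 <= no <= 200:
        return 11
    raise ValueError(f"sequence number out of range: {no}")


def class_label(no: int) -> int:
    """Class 1 = low affinity (3rd round); class 2 = high affinity (11th round)."""
    return 1 if selex_round(no) == 3 else 2


def dataset(no: int) -> str:
    """Sequence nos. 1-134 form the training set; nos. 135-200 form the test set."""
    return "training" if no <= 134 else "test"


# Count how many sequences fall in each (data set, class label) group.
counts = {}
for no in range(1, 201):
    key = (dataset(no), class_label(no))
    counts[key] = counts.get(key, 0) + 1

print(counts)
```

Under these ranges, each data set is balanced between the two classes, which is why the per-class accuracies of a set can be compared directly.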

Table 1. Twenty-Four Candidate Aptamer Sequences from the 12th Round.

no. sequences (5′ → 3′) exp. pre.
1 TGCAGGGTGAGAGGTTGGCTTTAGAGGGTTAGGGGGAATT 2 1
2 GGAGGGCTAGAGTAGGGGGCTGTCAAGGGGTCGGTGGGGA 2 2
3 TGCAGGGTGAGAGGTTGGTTTTAGAGGGTTAGGGGGAATT 2 1
4 GGAAGGGGCGTGGTTGGTAGAAAGGGAAGGGGAAGTTTAG 2 2
5 AGGGGGCAAGAGGGTGGTTTTAGAGGGGCAGGGGGAGTT 2 2
6 GGGGGAGGCGGGCGGGGTGCTGACGGGGGAGTTTAGCCGT 2 2
7 GAGGAGGTCATGGGAGAGGAGGCGGAGACGGGGAGGGATG 2 2
8 GCACGGGATCAGGGTGGGGTGGAGAGGGGAATTTTAGTGG 2 2
9 GGGACACGGTTGGGAGTGGGGTTTGGTCGTCCGGGGGATG 2 2
10 GAGCATGGAGTACGGGCGGGGTGATGACGGGGGAGTTTAG 2 2
11 AGGGGGGGTTGGGTATGCGTCCGGAGAGTTGCTCGAGTTC 2 2
12 TTCGGGCGATGGGTTAGGTTGGCGGAGGTGGGAGGGCGCGG 2 2
13 CTCTCGGGAAGTACGGTAGGAAGTGGTACCACGGGGTTA 2 2
14 TGGGGGCAAGAGGGTGGTTTTAGAGGGGCAGGGGGAGTT 2 2
15 GGGGGCGAGAGGGTGGTTTTAGAGGGGATGGGAGGAGTT 2 2
16 GGGGAGGCGGGCGGGGTGCTGACGGGGGAGTTTAGCCGT 2 2
17 GGGGAGGCGGGCGGGGTGCTGACGGGGGAGTTTAGCCGT 2 2
18 GGAGGAGGCGGGCGGGGTGCTGACGGGGGAGTTTAGCCGT 2 2
19 GTACCCGGAGACCAGTGACGGGGGTGTTTCGGCTGAAGCT 2 1
20 GTACTCTGCGTGCTGGGGGTGTTTGATGTAGTTCAGGCT 2 2
21 GTACTTGCGGAGCAGGGGTGGACCGTGTATAGTCGGGACT 2 2
22 GAGAGAGTGGGGGAGTGATCGGAGCGTGGGGTGTAGGGC 2 1
23 GGATCGGCTCGGGGGGGCAAGGGCCGGCGGGGATGTCATG 2 2
24 TAGGACGGAATGGGGGTGTGGGCTGTAGGGGAGGACAAAG 2 2

After 1000 molecular descriptors had been calculated for each optimized amino acid sequence translated from the DNA aptamer sequences with DNAMAN software, binary logistic regression analysis was performed to select the optimal subset of descriptors, applying the Forward Wald method in IBM SPSS Statistics 19 to the total data set (200 sequences) in Table S1. The classification table, the variables in the classification models, and the definitions of the molecular descriptors are listed in Tables 2, 3, and 4, respectively. Table 2 shows that only classification model 7 has accuracies above 80%. Table 3 shows that all the variables in classification model 7 have sig. values smaller than 0.05, which indicates that these independent variables have a significant effect on the dependent variable. Thus, the seven molecular descriptors in classification model 7 were taken as the optimal descriptor subset and used to develop logistic regression eq 2 from the training set in Table S1

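$$P(Y) = \frac{1}{1 + e^{-z}} \qquad (2)$$

where z is the linear combination of the seven descriptors EED, SVV, CWP, FCO, FOS, FCS, and FNN plus a constant, with coefficients fitted on the training set.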

Here, we take the calculation of the probability value for sequence no. 200 in Table S1 as an example. Its descriptor values for EED, SVV, CWP, FCO, FOS, FCS, and FNN are, respectively, 1.963, −13.483, 0.139, 28, 2, 3, and 8. From eq 2, we obtain P(Y) = 1/(1 + e^(−0.994)) = 0.730. Because P(Y) is above 0.5, the class label of sequence no. 200 is “2”.
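The worked example for sequence no. 200 can be reproduced with a few lines of Python. This is a minimal sketch that starts from the exponent 0.994 quoted in the text rather than from the fitted coefficients; the function names are illustrative.

```python
import math


def logistic_probability(z: float) -> float:
    """Logistic function: map a linear-predictor value z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_class(p: float, threshold: float = 0.5) -> int:
    """Assign class 2 when the probability exceeds the threshold, else class 1."""
    return 2 if p > threshold else 1


z_seq200 = 0.994  # exponent quoted in the text for sequence no. 200
p_seq200 = logistic_probability(z_seq200)
print(round(p_seq200, 3), predicted_class(p_seq200))
```

The 0.5 threshold corresponds to z = 0, so any sequence with a positive linear predictor is assigned to class 2.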

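Table 2. Classification Table Based on the Total Set.

| model | observed | predicted class 1 | predicted class 2 | accuracy (%) |
|---|---|---|---|---|
| 1 | class 1 | 56 | 44 | 56.0 |
| 1 | class 2 | 32 | 68 | 68.0 |
| 1 | overall percentage | | | 62.0 |
| 2 | class 1 | 69 | 31 | 69.0 |
| 2 | class 2 | 32 | 68 | 68.0 |
| 2 | overall percentage | | | 68.5 |
| 3 | class 1 | 72 | 28 | 72.0 |
| 3 | class 2 | 26 | 74 | 74.0 |
| 3 | overall percentage | | | 73.0 |
| 4 | class 1 | 72 | 28 | 72.0 |
| 4 | class 2 | 22 | 78 | 78.0 |
| 4 | overall percentage | | | 75.0 |
| 5 | class 1 | 76 | 24 | 76.0 |
| 5 | class 2 | 23 | 77 | 77.0 |
| 5 | overall percentage | | | 76.5 |
| 6 | class 1 | 76 | 24 | 76.0 |
| 6 | class 2 | 20 | 80 | 80.0 |
| 6 | overall percentage | | | 78.0 |
| 7 | class 1 | 81 | 19 | 81.0 |
| 7 | class 2 | 19 | 81 | 81.0 |
| 7 | overall percentage | | | 81.0 |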
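Each overall percentage in Table 2 is the pooled share of correctly classified sequences, so it can be recomputed from the per-class counts. The sketch below assumes 100 sequences per class, as stated for the total set; the function and dictionary names are illustrative.

```python
def overall_accuracy(correct_class1: int, correct_class2: int,
                     n_class1: int = 100, n_class2: int = 100) -> float:
    """Percentage of all sequences whose predicted class equals the observed class."""
    return 100.0 * (correct_class1 + correct_class2) / (n_class1 + n_class2)


# Correctly classified counts (class 1, class 2) for models 1-7, read from Table 2.
table2_correct = {
    1: (56, 68), 2: (69, 68), 3: (72, 74), 4: (72, 78),
    5: (76, 77), 6: (76, 80), 7: (81, 81),
}

for model, (c1, c2) in table2_correct.items():
    print(model, overall_accuracy(c1, c2))
```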

Table 3. Variables in the Classification Models.

| model | descriptor | B | SE | Wald | df | sig. | exp(B) |
|---|---|---|---|---|---|---|---|
| 1 (a) | FCO | −0.125 | 0.027 | 20.919 | 1 | 0.000 | 0.883 |
| | constant | 4.242 | 0.936 | 20.538 | 1 | 0.000 | 69.544 |
| 2 (b) | SVV | −0.526 | 0.117 | 20.319 | 1 | 0.000 | 0.591 |
| | FCO | −0.221 | 0.038 | 34.229 | 1 | 0.000 | 0.802 |
| | constant | 0.169 | 1.267 | 0.018 | 1 | 0.894 | 1.184 |
| 3 (c) | SVV | −0.555 | 0.123 | 20.356 | 1 | 0.000 | 0.574 |
| | FCO | −0.228 | 0.039 | 34.630 | 1 | 0.000 | 0.796 |
| | FCS | −0.350 | 0.104 | 11.399 | 1 | 0.001 | 0.705 |
| | constant | 0.442 | 1.334 | 0.110 | 1 | 0.740 | 1.556 |
| 4 (d) | EED | 23.734 | 13.191 | 3.237 | 1 | 0.072 | 2.031 × 10¹⁰ |
| | SVV | −0.622 | 0.128 | 23.566 | 1 | 0.000 | 0.537 |
| | FCO | −0.280 | 0.046 | 37.076 | 1 | 0.000 | 0.756 |
| | FCS | −0.426 | 0.115 | 13.750 | 1 | 0.000 | 0.653 |
| | constant | −45.591 | 25.627 | 3.165 | 1 | 0.075 | 0.000 |
| 5 (e) | EED | 42.590 | 15.877 | 7.196 | 1 | 0.007 | 3.139 × 10¹⁸ |
| | SVV | −0.684 | 0.136 | 25.155 | 1 | 0.000 | 0.505 |
| | FCO | −0.326 | 0.052 | 39.040 | 1 | 0.000 | 0.722 |
| | FOS | 1.196 | 0.381 | 9.863 | 1 | 0.002 | 3.306 |
| | FCS | −0.855 | 0.194 | 19.404 | 1 | 0.000 | 0.425 |
| | constant | −82.136 | 30.791 | 7.116 | 1 | 0.008 | 0.000 |
| 6 (f) | EED | 36.783 | 16.405 | 5.027 | 1 | 0.025 | 9.431 × 10¹⁵ |
| | SVV | −0.853 | 0.157 | 29.673 | 1 | 0.000 | 0.426 |
| | FCO | −0.362 | 0.058 | 39.332 | 1 | 0.000 | 0.696 |
| | FOS | 1.245 | 0.396 | 9.894 | 1 | 0.002 | 3.473 |
| | FCS | −0.901 | 0.201 | 20.087 | 1 | 0.000 | 0.406 |
| | FNN | −0.240 | 0.089 | 7.211 | 1 | 0.007 | 0.787 |
| | constant | −70.938 | 31.822 | 4.969 | 1 | 0.026 | 0.000 |
| 7 (g) | EED | 36.306 | 16.785 | 4.679 | 1 | 0.031 | 5.852 × 10¹⁵ |
| | SVV | −0.838 | 0.161 | 27.062 | 1 | 0.000 | 0.433 |
| | CWP | −90.770 | 28.598 | 10.074 | 1 | 0.002 | 0.000 |
| | FCO | −0.404 | 0.064 | 40.162 | 1 | 0.000 | 0.667 |
| | FOS | 1.306 | 0.424 | 9.488 | 1 | 0.002 | 3.691 |
| | FCS | −0.995 | 0.216 | 21.164 | 1 | 0.000 | 0.370 |
| | FNN | −0.329 | 0.098 | 11.388 | 1 | 0.001 | 0.719 |
| | constant | −55.171 | 32.756 | 2.837 | 1 | 0.092 | 0.000 |

a Variable(s) entered on model 1: FCO.
b Variable(s) entered on model 2: SVV.
c Variable(s) entered on model 3: FCS.
d Variable(s) entered on model 4: EED.
e Variable(s) entered on model 5: FOS.
f Variable(s) entered on model 6: FNN.
g Variable(s) entered on model 7: CWP.
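Two columns of Table 3 follow directly from the coefficient B and its standard error SE: for one degree of freedom the Wald statistic is (B/SE)², and exp(B) is the odds ratio for a one-unit increase in the descriptor. The sketch below checks this for the FOS row of model 7; because the tabulated B and SE are rounded, recomputed values can differ slightly from the printed ones. The function names are illustrative.

```python
import math


def wald_statistic(b: float, se: float) -> float:
    """Wald chi-square statistic (1 degree of freedom) for a coefficient."""
    return (b / se) ** 2


def odds_ratio(b: float) -> float:
    """exp(B): multiplicative change in the odds per unit increase of the descriptor."""
    return math.exp(b)


# FOS row of model 7 in Table 3: B = 1.306, SE = 0.424.
b_fos, se_fos = 1.306, 0.424
print(round(wald_statistic(b_fos, se_fos), 3), round(odds_ratio(b_fos), 3))
```

A positive B (odds ratio above 1) means that larger descriptor values raise the predicted probability of class 2, while a negative B lowers it.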

Table 4. Definitions of Molecular Descriptors in Classification Models.

| no. | symbol | definition | class |
|---|---|---|---|
| 1 | EED | eigenvalue n (=7) from edge adjacency matrix weighted by dipole moment | edge adjacency indices |
| 2 | SVV | signal 05/weighted by van der Waals volume | 3D-MoRSE descriptors |
| 3 | CWP | 3rd component symmetry directional WHIM index/weighted by polarizability | WHIM descriptors |
| 4 | FCO | frequency of C–O at topological distance 6 | 2D atom pairs |
| 5 | FOS | frequency of O–S at topological distance 7 | 2D atom pairs |
| 6 | FCS | frequency of C–S at topological distance 9 | 2D atom pairs |
| 7 | FNN | frequency of N–N at topological distance 10 | 2D atom pairs |

Subsequently, we adopted eq 2 to predict the class labels of the sequences in the test set in Table S1. The calculated class labels are listed in Table S1. Besides accuracy (= true predictions/(true predictions + false predictions)), sensitivity (= true positives/(true positives + false negatives)) and specificity (= true negatives/(true negatives + false positives)) were used to characterize the predictions. The statistical results from eq 2 are listed in Table 5, which shows that the accuracy, sensitivity, and specificity of the training set and test set are about 80%. Further, eq 2 was used to predict the class labels of the sequences in Table 1 from the 12th round of SELEX. The calculated class labels are listed in Table 1. The prediction accuracy is 83.3% (20 of 24 sequences). Therefore, the prediction results are accurate.

Table 5. Statistical Results from Logistic Regression Equation 2.

| data set | experimental class | number of sequences | predicted class 1 | predicted class 2 | accuracy (%) |
|---|---|---|---|---|---|
| training set | class 1 | 67 | 53 | 14 | 79.1 |
| training set | class 2 | 67 | 12 | 55 | 82.1 |
| test set | class 1 | 33 | 27 | 6 | 81.8 |
| test set | class 2 | 33 | 6 | 27 | 81.8 |

Training set: sensitivity 81.5%, specificity 79.7%.
Test set: sensitivity 81.8%, specificity 81.8%.
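The three statistics defined in the text can be computed from a 2 × 2 confusion matrix. The sketch below uses the test-set counts from Table 5 and, as an assumption made here for illustration, treats class 2 (high affinity) as the positive class; the function name is hypothetical.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (accuracy, sensitivity, specificity) as fractions."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # share of positives that were recovered
    specificity = tn / (tn + fp)   # share of negatives that were recovered
    return accuracy, sensitivity, specificity


# Test set in Table 5, with class 2 taken as "positive" (assumption):
# 27 class-2 sequences predicted as class 2, 6 predicted as class 1;
# 27 class-1 sequences predicted as class 1, 6 predicted as class 2.
acc, sens, spec = classification_metrics(tp=27, fn=6, tn=27, fp=6)
print(round(100 * acc, 1), round(100 * sens, 1), round(100 * spec, 1))

# 12th-round sequences in Table 1: 20 of 24 were predicted as class 2.
round12_accuracy = 20 / 24
print(round(100 * round12_accuracy, 1))
```

Because all 12th-round sequences are labeled class 2, only accuracy is meaningful for Table 1; specificity is undefined when there are no negatives.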

The first descriptor EED (eigenvalue n (=7) from the edge adjacency matrix weighted by dipole moment) is an edge adjacency index. EED belongs to topological indices in nature and is calculated from the edge adjacency matrix of an H-depleted molecule. The square matrix is symmetric and has a dimension of nBO × nBO (here nBO is the number of bonds of nonhydrogen atom pairs), whose entries equal one if the bonds are adjacent and zero for nonadjacent. EED is calculated by applying the dipole moment for edge weighting schemes. Thus, EED is related to the symmetry of molecules and polarity and possesses good discrimination among isomers.18
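The edge adjacency matrix described above can be built from a bond list. The sketch below constructs only the unweighted matrix for an H-depleted molecular graph; the dipole-moment weighting and the eigenvalue step that Dragon applies to obtain EED are not reproduced, and the function name is illustrative.

```python
def edge_adjacency_matrix(bonds):
    """Build the nBO x nBO edge adjacency matrix of an H-depleted graph.

    `bonds` is a list of (atom_i, atom_j) index pairs. Entry [a][b] is 1
    when bonds a and b share an atom (are adjacent) and 0 otherwise.
    """
    n_bonds = len(bonds)
    matrix = [[0] * n_bonds for _ in range(n_bonds)]
    for a in range(n_bonds):
        for b in range(a + 1, n_bonds):
            if set(bonds[a]) & set(bonds[b]):
                matrix[a][b] = matrix[b][a] = 1
    return matrix


# n-Butane carbon skeleton C0-C1-C2-C3 has three bonds.
butane_bonds = [(0, 1), (1, 2), (2, 3)]
butane_matrix = edge_adjacency_matrix(butane_bonds)
for row in butane_matrix:
    print(row)
```

Only the middle bond is adjacent to both other bonds, so its row contains two ones, while the terminal bonds each have a single neighbor.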

The second descriptor SVV, signal 05/weighted by van der Waals volume, belongs to 3D molecule representation of structures based on electron diffraction (3D-MoRSE) descriptors. They can be calculated with the following expression

$$\mathrm{Mor}(s, w) = \sum_{i=2}^{nAT} \sum_{j=1}^{i-1} w_i \, w_j \, \frac{\sin(s \, r_{ij})}{s \, r_{ij}} \qquad (3)$$

where Mor(s, w) denotes the scattered electron intensity, s is the scattering parameter, which takes values from 0 to 31, r_ij represents the distance between atoms i and j, nAT denotes the number of atoms, and w denotes an atomic property used as the weight (the unweighted scheme, or properties such as van der Waals volume and atomic polarizability). The descriptor SVV describes the scattered electron intensity weighted by the van der Waals volume. It can describe molecular similarity/diversity from three-dimensional structures.18
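A 3D-MoRSE signal can be computed directly from atomic coordinates and weights. The sketch below assumes the standard form of the descriptor, a sum over atom pairs of w_i · w_j · sin(s · r_ij)/(s · r_ij), and uses made-up coordinates and weights rather than real van der Waals volumes; it is not Dragon's implementation, and the function name is illustrative.

```python
import math


def morse_signal(coords, weights, s: float) -> float:
    """3D-MoRSE signal for scattering parameter s.

    coords  : list of (x, y, z) atomic positions
    weights : list of atomic properties w_i (e.g., van der Waals volumes)
    """
    total = 0.0
    for i in range(1, len(coords)):
        for j in range(i):
            x = s * math.dist(coords[i], coords[j])
            # sin(x)/x tends to 1 as x approaches 0 (the s = 0 case).
            term = 1.0 if x == 0 else math.sin(x) / x
            total += weights[i] * weights[j] * term
    return total


# Toy example: three atoms with hypothetical weights.
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.5, 0.0)]
toy_weights = [1.0, 2.0, 3.0]
for s in (0, 1, 2):
    print(s, morse_signal(toy_coords, toy_weights, s))
```

At s = 0 the signal reduces to the sum of pairwise weight products, so it carries no geometric information; larger s values make the signal increasingly sensitive to interatomic distances.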

The third descriptor, CWP, is the 3rd component symmetry directional WHIM index/weighted by polarizability. WHIM descriptors are based on the configuration of atoms in the 3D space, which is defined by the Cartesian axes (x, y, and z). Molecular principal axes are evaluated to obtain a unique reference frame. Besides atomic projections being carried out along their respective principal axes, atomic dispersion and distribution around the geometric center are calculated. Similarly, six different weighting schemes (including unweighted schemes, atomic mass, the van der Waals volume, the Sanderson atomic electronegativity, the atomic polarizability, and the electrotopological state indices of Kier and Hall) are taken into account to obtain various WHIM descriptors. The descriptor CWP not only reflects molecular size and shape, but also describes molecular symmetry and atom distribution with respect to invariant reference frames.18

The remaining topological molecular descriptors, FCO (frequency of C–O with the topological distance being 6), FOS (frequency of O–S with the distance being 7), FCS (frequency of C–S with the distance being 9), and FNN (frequency of N–N with the distance being 10), are binary and frequency atom pairs. In binary atom pairs, the value (1) or (0), respectively, expresses the presence or absence of a certain pair of atoms at a specific bond distance. As for the frequency of atom pairs, each molecular descriptor means the number of atom pairs that satisfy the above conditions of topological distance.
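The frequency and binary atom-pair descriptors can be illustrated with a breadth-first search over the H-depleted molecular graph. The sketch below is a generic implementation rather than Dragon's code, and it uses glycine as a small stand-in for the much larger translated peptides; all names are illustrative.

```python
from collections import deque


def topological_distances(n_atoms, bonds, start):
    """Breadth-first search: bond-count distance from `start` to every reachable atom."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def atom_pair_frequency(elements, bonds, pair, distance):
    """Count atom pairs of the given element types at the given topological distance."""
    wanted = sorted(pair)
    count = 0
    for i in range(len(elements)):
        dist = topological_distances(len(elements), bonds, i)
        for j in range(i + 1, len(elements)):
            if dist.get(j) == distance and sorted((elements[i], elements[j])) == wanted:
                count += 1
    return count


# Glycine without hydrogens: N0-C1-C2(=O3)-O4.
glycine_elements = ["N", "C", "C", "O", "O"]
glycine_bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]

frequency = atom_pair_frequency(glycine_elements, glycine_bonds, ("C", "O"), 2)
binary = int(frequency > 0)  # binary atom pair: presence (1) or absence (0)
print(frequency, binary)
```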

The main aim of QSAR studies is to develop correlations between molecular structures and their properties. Here, aptamer structures are related to the structures of the amino acid sequences translated from the corresponding DNA aptamer sequences, which is why the seven molecular descriptors in eq 2, calculated from amino acid sequences, can describe the aptamer structures. In general, QSAR prediction strategies depend on molecular descriptors calculated directly from molecular structures, whereas our classification model is related to descriptors through an indirect route. Thus, our approach is novel and provides a new route for QSAR investigations of aptamers. In particular, this paper indicates that there is a correlation between the chemical structures of DNA aptamer sequences and those of the amino acid sequences translated from them, although this subject may be complicated. The feasibility of calculating molecular descriptors from amino acid sequences translated from DNA aptamer sequences and of developing a binary logistic regression equation for aptamers against the PCa cell line PCa-3M-1E8 has been demonstrated. It should be noted, however, that our method relies on the protein information encoded by the DNA sequences to make predictions for aptamer sequences. The method therefore cannot be used to predict the specific conformations of aptamers, which are correlated with aptamer binding specificity, because these conformations depend on target properties. That is to say, an aptamer sequence may adopt different conformations when it is applied in different fields.

Conclusions

Although directly calculating molecular descriptors from aptamers is difficult, the seven molecular descriptors EED, SVV, CWP, FCO, FOS, FCS, and FNN, which were calculated from amino acid sequences translated from DNA aptamer sequences with DNAMAN software, can describe the aptamer structures. The classification model can be used to recognize candidate aptamers with high or low affinity and specificity to the PCa cell line PCa-3M-1E8. The prediction accuracy, sensitivity, and specificity obtained for the training and test sets are about 80%. Therefore, the prediction results are accurate and acceptable. The investigation provides a novel approach to calculating descriptors for aptamer molecules.

Materials and Methods

Cell-SELEX Experimental

Candidate aptamer sequences used in this paper were selected in our laboratory with the cell-SELEX technique. The forward primer (FP) labeled with 6-carboxyfluorescein (FAM) and the biotinylated reverse primer (RP) were, respectively, 5′-FAM-AGAAGGAAGGAGAGCGACAC-3′ and 5′-biotin-ATGACGACCGACCACTGATA-3′, which were used for the synthesis of the random ssDNA library, 5′-TTGACTTGCCACTGACTACC-(randomized region with 40 ± 2 nucleotides)-GAAGTCAGTCGGTCGTCATC-3′. In addition, the primers (FP, 5′-AGAAGGAAGGAGAGCGACAC-3′) and (RP, 5′-ATGACGACCGACCACTGATA-3′) were applied to clone and sequence the product sequences. These primer products and the initial ssDNA pool were purchased from Sangon Biotech (Shanghai, China) Co., Ltd. PCa cell lines were bought from Shanghai Institute of Biochemistry and Cell Biology.

The PCa cell line PCa-3M-1E8 (highly metastatic tumor cells) was used as the target for positive selection, and the PCa cell line PCa-3M-2B4 (low metastatic tumor cells) was used as the negative control during cell-SELEX. The SELEX procedure was as follows: the ssDNA pool (10 nmol) was completely dissolved in 900 μL of binding buffer (washing buffer supplemented with 1 mg/mL bovine serum albumin and 0.1 mg/mL yeast tRNA) to form the initial library; the initial library was incubated with PCa-3M-1E8 cells (1 × 10⁶) for 1 h on ice on a rotary shaker; the cells were washed three times with washing buffer (5 mmol/L magnesium chloride and 4.5 g/L glucose in phosphate-buffered saline) to remove unbound sequences; the bound DNAs were eluted in 500 mL of sterile water at 95 °C for 10 min and then amplified by PCR; and the products were isolated with streptavidin-coated sepharose beads and treated with 0.2 mol/L sodium hydroxide. The enriched ssDNA aptamer pool from each round of SELEX was used for the next round. After the fourth round of SELEX selection, counterselection was introduced to increase the specificity of the aptamers so that they can differentiate similar PCa cells. In addition, the incubation and washing conditions were made gradually more stringent to improve aptamer performance and SELEX screening efficiency. Other SELEX procedures and materials can be found in refs 5 and 15. After 12 rounds of SELEX selection, the enrichment of the selection library, which was monitored with a flow cytometer (FACSCalibur, BD Biosciences), reached a plateau. The ssDNA pools from the 3rd and 11th rounds were sequenced with a high-throughput sequencing device. The resulting pool from the 12th round was sequenced with the classic method of cloning and sequencing.

According to the evolutionary principles of SELEX selection, the binding ability of the aptamer pool increases exponentially during the first few rounds and eventually stops increasing noticeably or reaches a saturation point. Although a candidate aptamer that appears in one of the first few rounds is not necessarily a nonbinder, the sequences from the 3rd round and the 11th round (see Table S1) do not overlap in our experiment. Thus, it is reasonable to define the class labels of ssDNA sequences obtained from the first few rounds (e.g., from the 1st to 3rd rounds) of SELEX as 1 and the class labels of sequences from the final rounds (such as the 11th and 12th rounds) as 2 (here, class labels 1 and 2 denote aptamer sequences having low and high affinity, respectively). The experimental affinities of aptamer candidates (nos. 1, 2, 4, and 5 in Table 1) for the PCa cell line PCa-3M-1E8 were tested with a flow cytometer. PCa-3M-1E8 cells (1 × 10⁶) and candidate aptamers at various concentrations were incubated in binding buffer (200 mL) on ice for 30 min. The equilibrium dissociation constant Kd used to describe the affinities was calculated with eq 4

$$B = \frac{B_{\max} \, C}{K_d + C} \qquad (4)$$

where the parameter C denotes the concentration of aptamer sequences, the parameter B denotes the aptamer adsorption, and the parameter Bmax expresses the saturated adsorption. The experimental Kd values are 3.24, 2.0, 46.8, and 7.1 nM, which are in the nanomolar range and acceptable.
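The relationship between Kd and the binding curve can be illustrated numerically. The sketch below assumes the standard one-site saturation binding form B = Bmax · C/(Kd + C), which is consistent with the symbol definitions above but is an assumption here; the function name and the 10 nM test concentration are illustrative, while the Kd values are those reported in the text.

```python
def bound_signal(c_nM: float, kd_nM: float, b_max: float = 1.0) -> float:
    """One-site saturation binding: adsorption B at aptamer concentration C."""
    return b_max * c_nM / (kd_nM + c_nM)


# Reported Kd values (nM) for candidate nos. 1, 2, 4, and 5 in Table 1.
reported_kd = {1: 3.24, 2: 2.0, 4: 46.8, 5: 7.1}

# Fraction of saturation reached at an illustrative 10 nM aptamer concentration.
for no, kd in reported_kd.items():
    print(no, round(bound_signal(10.0, kd), 3))
```

When C equals Kd, half of the saturated adsorption is reached, so a smaller Kd means the candidate approaches saturation at lower concentrations.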

Molecular Descriptor Derivation

An important step in classical QSAR studies is to calculate molecular descriptors. Generally, QSAR models are based on molecular descriptors calculated directly from the respective molecular structures.19–22 However, this approach is appropriate only for small molecules, not for macromolecules or larger molecules. For instance, the commercial software Dragon,18 which has been widely used for molecular descriptor calculation, is not appropriate for aptamers (typically 15–45 nucleotides).

Since the nucleotide sequence of a genome segment encodes the amino acid sequence, the amino acid sequence carries structural information about the corresponding nucleotide sequence. Thus, the structure descriptors of an amino acid sequence are correlated with those of its corresponding nucleotide sequence. In this paper, DNA aptamer sequences were translated into amino acid sequences with DNAMAN software (version 6.0). Then, the molecular structures of the amino acid sequences were constructed with ChemBioDraw Ultra 11.0 in ChemOffice 2008 and subsequently converted to 3D structures with ChemBio3D Ultra 11.0. The 3D structures of the amino acid sequences were optimized with the molecular mechanics (MM2 force field) method, with the convergence criterion being a minimum root-mean-square gradient of 0.42 kJ/(mol Å). Lastly, the optimized amino acid sequences, saved as Sybyl MOL2 files (*.mol2), were used as the input files for Dragon 6.0 software,18 which can calculate 4885 descriptors for each molecule. After removing redundant and uninformative structure descriptors (those that are constant or have a pairwise correlation coefficient greater than 0.95), 1000 variables remained for descriptor selection.
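The translation step can be sketched in Python with the standard genetic code. This is an illustration only: the paper used DNAMAN, and the reading frame, the handling of stop codons (written here as `*`), and the treatment of leftover nucleotides are assumptions not specified in the text; the names below are illustrative.

```python
from itertools import product

# Standard genetic code, with codons ordered by first, second, then third base (T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA sequence codon by codon; trailing bases that do not
    fill a complete codon are ignored, and stop codons appear as '*'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Sequence no. 2 in Table 1 (40 nucleotides give 13 codons plus 1 leftover base).
aptamer_no2 = "GGAGGGCTAGAGTAGGGGGCTGTCAAGGGGTCGGTGGGGA"
peptide_no2 = translate(aptamer_no2)
print(peptide_no2, len(peptide_no2))
```

A 40-nucleotide center sequence yields at most 13 residues, which keeps the resulting molecule small enough for descriptor software designed for small molecules.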

Acknowledgments

The work was financially supported by the National Natural Science Foundation of China (nos. 21190044 and 21190041), Scientific Research Fund of Hunan Provincial Education Department (no. 16A047), and the Open Project Program of State Key Laboratory of Chemo/Biosensing and Chemometrics (Hunan University) (no. 2016013).

Supporting Information Available

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acsomega.8b01464.

  • Candidate aptamer sequences and their class labels (PDF)

The authors declare no competing financial interest.


References

  1. Tan W.; Donovan M. J.; Jiang J.-H. Aptamers from Cell-Based Selection for Bioanalytical Applications. Chem. Rev. 2013, 113, 2842–2862. 10.1021/cr300468w.
  2. Kashefi-Kheyrabadi L.; Mehrgardi M. A.; Wiechec E.; Turner A. P.; Tiwari A. Ultrasensitive detection of human liver hepatocellular carcinoma cells using a label-free aptasensor. Anal. Chem. 2014, 86, 4956–4960. 10.1021/ac500375p.
  3. Shi H.; Cui W.; He X.; Guo Q.; Wang K.; Ye S.; Tang J. Whole cell-SELEX aptamers for highly specific fluorescence molecular imaging of carcinomas in vivo. PLoS One 2013, 8, e70476. 10.1371/journal.pone.0070476.
  4. Hong H.; Goel S.; Zhang Y.; Cai W. Molecular imaging with nucleic acid aptamers. Curr. Med. Chem. 2011, 18, 4195–4205. 10.2174/092986711797189691.
  5. Guo Q.-P.; Liu X.-D.; Tan Y.-Y.; Wang K.-M.; Yang X.-H.; Zhou Y.; Ye L.; Zhao X.-Y. Selection of aptamers for human hepatocellular carcinoma with high specificity. Chin. Sci. Bull. 2013, 58, 2745–2750. 10.1360/972013-360.
  6. Musafia B.; Oren-Banaroya R.; Noiman S. Designing anti-influenza aptamers: novel quantitative structure activity relationship approach gives insights into aptamer–virus interaction. PLoS One 2014, 9, e97696. 10.1371/journal.pone.0097696.
  7. Binning J. M.; Leung D. W.; Amarasinghe G. K. Aptamers in virology: recent advances and challenges. Front. Microbiol. 2012, 3, 29. 10.3389/fmicb.2012.00029.
  8. Bouchard P. R.; Hutabarat R. M.; Thompson K. M. Discovery and development of therapeutic aptamers. Annu. Rev. Pharmacol. Toxicol. 2010, 50, 237–257. 10.1146/annurev.pharmtox.010909.105547.
  9. Yu X.; Deng J.; Guo Q. Computer-aided design of aptamers for SMMC-7721 liver carcinoma cells. Turk. J. Biochem. 2017, 42, 565–570. 10.1515/tjb-2016-0166.
  10. Yu X.; Yu Y.; Zeng Q. Support Vector Machine Classification of Streptavidin-Binding Aptamers. PLoS One 2014, 9, e99964. 10.1371/journal.pone.0099964.
  11. Yu X.; Yu R.; Tang L.; Guo Q.; Zhang Y.; Zhou Y.; Yang Q.; He X.; Yang X.; Wang K. Recognition of candidate aptamer sequences for human hepatocellular carcinoma in SELEX screening using structure–activity relationships. Chemom. Intell. Lab. Syst. 2014, 136, 10–14. 10.1016/j.chemolab.2014.05.002.
  12. Li B.-Q.; Zhang Y.; Huang G.; Cui W.; Zhang N.; Cai Y. D. Prediction of aptamer-target interacting pairs with pseudo-amino acid composition. PLoS One 2014, 9, e86729. 10.1371/journal.pone.0086729.
  13. Thorne H.; Willems A. J.; Niedermayr E.; Hoh I. M.; Li J.; Clouston D.; Mitchell G.; Fox S.; Hopper J. L.; et al. Kathleen Cunningham Consortium for Research in Familial Breast Cancer Consortium, Bolton, D. Decreased prostate cancer-specific survival of men with BRCA2 mutations from multiple breast cancer families. Cancer Prev. Res. 2011, 4, 1002–1010. 10.1158/1940-6207.CAPR-10-0397.
  14. Lian F.; Sharma N. V.; Moran J. D.; Moreno C. S. The biology of castration-resistant prostate cancer. Curr. Probl. Cancer 2015, 39, 17–28. 10.1016/j.currproblcancer.2014.11.004.
  15. Huang Z.-X.; Xie Q.; Guo Q.-P.; Wang K.-M.; Meng X.-X.; Yuan B.-Y.; Wan J.; Chen Y.-Y. DNA aptamer selected for specific recognition of prostate cancer cells and clinical tissues. Chin. Chem. Lett. 2017, 28, 1252–1257. 10.1016/j.cclet.2017.01.002.
  16. Duan M.; Long Y.; Yang C.; Wu X.; Sun Y.; Li J.; Hu X.; Lin W.; Han D.; Zhao Y.; Liu J.; Ye M.; Tan W. Selection and characterization of DNA aptamer for metastatic prostate cancer recognition and tissue imaging. Oncotarget 2016, 7, 36436–36446. 10.18632/oncotarget.9262.
  17. Wang Y.; Luo Y.; Bing T.; Chen Z.; Lu M.; Zhang N.; Shangguan D.; Gao X. DNA Aptamer Evolved by Cell-SELEX for Recognition of Prostate Cancer. PLoS One 2014, 9, e100243. 10.1371/journal.pone.0100243.
  18. Todeschini R.; Consonni V.; Mauri A.; Pavan M. DRAGON-Software for the Calculation of Molecular Descriptors, revision 6.0 for Windows; Talete s.r.l.: Milan, 2010.
  19. Deeb O.; Goodarzi M.; Khadikar P. V. Quantum Chemical QSAR Models to Distinguish Between Inhibitory Activities of Sulfonamides Against Human Carbonic Anhydrases I and II and Bovine IV Isozymes. Chem. Biol. Drug Des. 2012, 79, 514–522. 10.1111/j.1747-0285.2011.01309.x.
  20. Le T.; Epa V. C.; Burden F. R.; Winkler D. A. Quantitative Structure Property Relationship Modeling of Diverse Materials Properties. Chem. Rev. 2012, 112, 2889–2919. 10.1021/cr200066h.
  21. Shahlaei M. Descriptor Selection Methods in Quantitative Structure–Activity Relationship Studies: A Review Study. Chem. Rev. 2013, 113, 8093–8103. 10.1021/cr3004339.
  22. Singh N.; Chaudhury S.; Liu R.; Abdulhameed M. D.; Tawa G.; Wallqvist A. QSAR Classification Model for Antibacterial Compounds and Its Use in Virtual Screening. J. Chem. Inf. Model. 2012, 52, 2559–2569. 10.1021/ci300336v.


Articles from ACS Omega are provided here courtesy of American Chemical Society
