Abstract
In this study, the authors studied the protein structure prediction problem using the two‐dimensional hydrophobic–polar model on the triangular lattice. In particular, non‐compact conformations were modelled by folding the amino acid sequence into a relatively larger triangular lattice, which is more biologically realistic and significant than the compact conformation. The protein structure prediction problem was then abstracted as matching amino acids to lattice points. Mathematically, the problem was formulated as an integer programming problem, transforming the biological problem into an optimisation problem. To solve it, the classical particle swarm optimisation algorithm was extended with a single‐point adjustment strategy. In several benchmark examples, conformations on the triangular lattice proved more flexible than those on the square lattice. The authors further compared their algorithm with a hybrid of hill climbing and genetic algorithm. The results showed that their method was more effective in finding solutions with lower energy and required less running time.
Inspec keywords: proteins, molecular biophysics, molecular configurations, particle swarm optimisation, bioinformatics
Other keywords: extended particle swarm optimisation method, triangular lattice, protein structure prediction problem, two‐dimensional hydrophobic–polar model, noncompact conformation, amino acid sequence, single point adjustment strategy, protein folding
1 Introduction
Protein structure prediction, or the protein folding problem, is one of the fundamental tasks in bioinformatics. Pursuit of this ‘holy grail’ is important for understanding the relationship between protein structure and protein function, for designing drugs with specific therapeutic properties, and ultimately for synthesising biological polymers with specific material properties. Even when protein folding is simplified to determining the three‐dimensional (3D) structure from the 1D amino acid sequence by minimising an energy function, it remains one of the most challenging problems in global optimisation [1].
If the energy function is further simplified in a discrete way, the protein structure prediction problem becomes that of maximising the number of hydrophobic–hydrophobic (HH) contacts between residues that are lattice neighbours but not covalently bonded. This combinatorial optimisation problem represents the hard core of the protein folding problem and is widely used in practice. For example, chemists evaluate new hypotheses about protein structure with the hydrophobic–hydrophilic (HP) model [2]. The simplicity of the model also allows a rigorous analysis of the efficiency of a folding algorithm; in fact, the model has become a standard for testing the efficiency of folding algorithms.
At present, four lattice models are popular: the 2D square and triangular lattice models, the 3D cubic lattice model and the face‐centred cubic (FCC) lattice model. When the protein structures obtained from these four model types were visually compared with reported real protein structures, the 2D triangular lattice model was shown to give better structure modelling and prediction for proteins with short primary amino acid sequences [3, 4].
Traditionally, the protein structure prediction problem for the 2D HP model has been studied on the square lattice, because many benchmarks are associated with it, a large amount of data has accumulated over the years, and results can be compared across different strategies and modelling methods. By contrast, little work has been done on the 2D triangular lattice. Moreover, from the viewpoint of applications of the HP model, it is not natural to restrict the native state to a square or a rectangle. Therefore, in this paper we extend the model and search for the native configuration of a protein on a triangular lattice, where more flexible and different structures can be expected. A significant drawback of the square lattice is that two amino acids in contact in any folding must be an odd distance apart in the protein sequence. On the triangular lattice, by contrast, each lattice point has six neighbours. Since every residue except the first and the last has two covalent neighbours, such a residue can be in topological contact with at most four other residues, and is therefore involved in at most four HH bonds.
With the unit vectors of the triangular lattice, protein conformations can be modelled in 2D without the parity problem. However, the HP model is computationally hard: Crescenzi et al. [5] proved that the decision problem for the 2D HP model is NP‐complete, so the corresponding optimisation problem is NP‐hard. Heuristic search algorithms for a variety of lattice models have therefore been proposed and proven useful for exploring the relationship between the primary amino acid sequence and its native folded structure, particularly in protein folding and protein structure prediction [6–12]. Finding an effective heuristic method is thus a primary research objective for the 2D HP lattice model. In this paper we adopt particle swarm optimisation (PSO), which was proposed by Kennedy and Eberhart in 1995 and is inspired by the social behaviour of bird flocking [13]. PSO has been used successfully to solve the protein structure prediction problem on the 2D square lattice [14, 15]. One challenge is to define the adjustment operators and adjustment sequences needed to extend PSO to predicting conformations of amino acid sequences on the triangular lattice.
This paper is organised as follows. Section 2 presents the 2D HP mathematical model. Section 3 describes the extended PSO. Section 4 reports numerical simulations for finding the ground state. Section 5 concludes the paper with a discussion.
2 Combinatorial optimisation for HP model on triangular lattice
Suppose that the number of amino acids in a protein sequence is n and the number of lattice points is m. If m = n, the conformation of the sequence is called compact; if m > n, it is called non‐compact. The search for the optimal conformation on the square lattice by PSO was studied in [14, 15]. In this paper, we consider finding the optimal conformation on the triangular lattice. In the classical HP model, all amino acids are classified as either hydrophobic (H) or polar (P), and the folding of a protein sequence is defined as a self‐avoiding walk on the lattice. After placing the centre of the triangular lattice at the origin of the coordinate system and numbering the grid points of the lattice, we denote
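$$X = (x_1, x_2, \ldots, x_n), \qquad x_i \in \{1, 2, \ldots, m\}, \tag{1}$$
where $x_i$ is the grid point visited by the $i$th amino acid. The set of neighbour points of $x_i$ is denoted by $N(x_i)$; on the triangular lattice, $|N(x_i)| = 6$ for an interior point (fewer on the boundary of a finite lattice). Denote $h_i = 1$ if the $i$th amino acid is H and $h_i = 0$ if it is P. Then the optimisation model with constraints is established as follows
$$\begin{aligned}
\max_{X} \quad & \sum_{i=1}^{n-2} \sum_{j=i+2}^{n} h_i h_j \, \mathbb{1}\big[x_j \in N(x_i)\big] \\
\text{s.t.} \quad & x_{i+1} \in N(x_i), \quad i = 1, \ldots, n-1, \\
& x_i \neq x_j, \quad 1 \le i < j \le n.
\end{aligned} \tag{2}$$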
In this way, we translate the protein structure prediction problem into a combinatorial optimisation problem, which is known to be NP‐hard. In the next step, we design an efficient heuristic algorithm to solve it.
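To make the objective and constraints concrete, the following Python sketch (ours, not the authors' implementation) evaluates a candidate conformation: it checks the self‐avoiding‐walk constraints (all sites distinct, consecutive residues on neighbouring sites) and counts HH contacts between residues at least two positions apart. The free energy is the negative of that count, so 12 HH pairs correspond to an energy of −12, as for sequence 3 in Section 4. Sites are given in axial coordinates rather than as numbered grid points; that representation and the function names are assumptions for illustration.

```python
# Six neighbour offsets of a triangular lattice in axial coordinates
# (an illustrative choice; the paper numbers grid points instead).
TRI_STEPS = {(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)}


def is_neighbour(a, b):
    """True if lattice sites a and b are adjacent on the triangular lattice."""
    return (b[0] - a[0], b[1] - a[1]) in TRI_STEPS


def is_valid_fold(sites):
    """Constraints: all sites distinct and consecutive residues on adjacent sites."""
    if len(set(sites)) != len(sites):
        return False  # two residues share a lattice point
    return all(is_neighbour(sites[i], sites[i + 1]) for i in range(len(sites) - 1))


def hh_contacts(sequence, sites):
    """Count H-H pairs (i, j), j >= i + 2, that occupy neighbouring sites."""
    if len(sequence) != len(sites):
        raise ValueError("one site per residue is required")
    count = 0
    for i in range(len(sites)):
        for j in range(i + 2, len(sites)):
            if sequence[i] == "H" and sequence[j] == "H" and is_neighbour(sites[i], sites[j]):
                count += 1
    return count


def free_energy(sequence, sites):
    """HP-model energy: minus the number of HH contacts; infeasible folds are rejected."""
    if not is_valid_fold(sites):
        raise ValueError("conformation violates the self-avoiding-walk constraints")
    return -hh_contacts(sequence, sites)


if __name__ == "__main__":
    # Four residues folded into a rhombus made of two triangles:
    # residues 1-3 and 2-4 are non-consecutive lattice neighbours.
    fold = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print(free_energy("HHHH", fold))  # -2
    print(free_energy("HPHP", fold))  # -1 (only the 1-3 contact is H-H)
```

In a search algorithm this function would serve as the fitness; infeasible candidates could be either rejected, as here, or penalised.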
3 Extended PSO (EPSO)
The optimal folding conformations of an amino acid sequence on the square lattice can be predicted by the PSO method [14, 15]. Here we introduce several improvements to make it more efficient on the triangular lattice. Before proceeding, we first introduce some new notation.
Each particle in PSO flies through the search space with an adaptable velocity that is dynamically modified according to its own flying experience and the flying experience of the other particles. The positions of the particles are initialised as n‐dimensional random vectors. The ith particle is denoted as $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, where $x_{ij}$ is the index of the grid point visited by the jth amino acid in the ith particle. In this paper, the adjustment operator T(k, l) places the kth element of particle $x_i$ after its (l−1)th element, while the other elements keep their original order. For example, if $x_i = (3, 11, 24, 6, 10, 81)$ and V = (5, 2), i.e. V = T(5, 2), then $x_i' = x_i + V = (3, 10, 11, 24, 6, 81)$, where ‘+’ indicates that particle $x_i$ is adjusted by the adjustment operator V. When a particle is adjusted by several operators, the ordered list of operators is called an adjustment sequence; in general, different adjustment operators do not commute, so their order matters.
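The adjustment operator and the worked example above can be reproduced in a few lines of Python. This sketch interprets T(k, l) as "remove the kth element and reinsert it so that it becomes the lth element", which matches the description for k > l; for k < l, where the wording is ambiguous, this pop‐then‐insert convention is our assumption. The function names are hypothetical.

```python
def apply_operator(x, op):
    """Apply T(k, l) (1-based): move the kth element so that it becomes the lth.

    The remaining elements keep their original relative order.
    """
    k, l = op
    if not (1 <= k <= len(x) and 1 <= l <= len(x)):
        raise IndexError("operator indices out of range")
    y = list(x)
    item = y.pop(k - 1)
    y.insert(l - 1, item)
    return y


def apply_sequence(x, ops):
    """Apply an adjustment sequence: operators are applied from left to right."""
    for op in ops:
        x = apply_operator(x, op)
    return x


if __name__ == "__main__":
    x = [3, 11, 24, 6, 10, 81]
    print(apply_operator(x, (5, 2)))  # [3, 10, 11, 24, 6, 81], as in the text
    # Operators generally do not commute, so the order within a sequence matters:
    print(apply_sequence([1, 2, 3], [(3, 1), (2, 1)]))  # [1, 3, 2]
    print(apply_sequence([1, 2, 3], [(2, 1), (3, 1)]))  # [3, 2, 1]
```

Returning a new list rather than mutating the particle in place keeps the original position available, which is convenient when a move has to be undone.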
Each particle keeps track of the position associated with the best fitness it has achieved so far. At each iteration, the particle's velocity and position are updated using this personal best position and the global best position. The best position found by particle i is known as $p_{i\text{best}}$, and the best position found by any particle in the population is called $p_{g\text{best}}$.
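The bookkeeping described above can be sketched as follows. We assume that fitness is the HP free energy (lower is better, i.e. more HH contacts) and that the personal bests are initialised before the first call; the function name and data layout are hypothetical.

```python
def update_bests(positions, energies, p_best, p_best_energy):
    """Update personal bests in place; return the global best and its energy.

    Lower free energy (more HH contacts) is treated as better, as in the HP model.
    """
    for i, (x, e) in enumerate(zip(positions, energies)):
        if e < p_best_energy[i]:  # particle i improved on its own record
            p_best[i] = list(x)
            p_best_energy[i] = e
    g = min(range(len(p_best)), key=lambda i: p_best_energy[i])
    return list(p_best[g]), p_best_energy[g]


if __name__ == "__main__":
    p_best = [[1, 2, 3], [4, 5, 6]]
    p_best_energy = [-4, -1]
    g_best, g_energy = update_bests([[3, 2, 1], [6, 5, 4]], [-3, -5], p_best, p_best_energy)
    print(p_best, g_best, g_energy)  # [[1, 2, 3], [6, 5, 4]] [6, 5, 4] -5
```

Particle 0 does not improve (−3 is worse than its record −4), whereas particle 1 does and also becomes the global best.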
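The position of every particle is updated by the formula
$$x_i(t+1) = x_i(t) + v_i(t+1),$$
that is, particle $x_i$ is adjusted by the operators of its velocity $v_i$, which is itself an adjustment sequence. Denote by $p_{i\text{best}} - x_i(t)$ and $p_{g\text{best}} - x_i(t)$ the adjustment sequences that transform $x_i(t)$ into $p_{i\text{best}}$ and $p_{g\text{best}}$, respectively. The new velocity of every particle is then updated by the following equation
$$v_i(t+1) = v_i(t) \oplus r_1\big(p_{i\text{best}} - x_i(t)\big) \oplus r_2\big(p_{g\text{best}} - x_i(t)\big),$$
in which $r_1, r_2 \in (0, 1)$ are random numbers, both $p_{i\text{best}} - x_i(t)$ and $p_{g\text{best}} - x_i(t)$ are adjustment sequences, and $\oplus$ concatenates adjustment sequences in order.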
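A minimal sketch of one discrete update is given below, assuming the velocity rule takes the usual discrete‐PSO form v ← v ⊕ r1(p_ibest − x) ⊕ r2(p_gbest − x). Two details are our assumptions rather than statements from the text: the difference p − x is computed greedily as a list of T(k, l) operators that turns x into p (which requires x and p to be orderings of the same grid points), and r·S is implemented, as in swap‐sequence discrete PSO, by keeping each operator of S independently with probability r. The names `difference`, `scale` and `update_particle` are hypothetical.

```python
import random


def apply_operator(x, op):
    """T(k, l), 1-based: move the kth element so that it becomes the lth."""
    k, l = op
    y = list(x)
    y.insert(l - 1, y.pop(k - 1))
    return y


def apply_sequence(x, ops):
    """Apply the operators of an adjustment sequence from left to right."""
    for op in ops:
        x = apply_operator(x, op)
    return x


def difference(p, x):
    """Adjustment sequence S with apply_sequence(x, S) == p (same elements assumed)."""
    if sorted(x) != sorted(p):
        raise ValueError("x and p must contain the same grid points")
    y, ops = list(x), []
    for l in range(1, len(p) + 1):
        if y[l - 1] != p[l - 1]:
            k = y.index(p[l - 1], l - 1) + 1  # lies after position l: earlier slots are fixed
            ops.append((k, l))
            y = apply_operator(y, (k, l))
    return ops


def scale(r, ops, rng):
    """r * S: keep each operator of S independently with probability r (assumed)."""
    return [op for op in ops if rng.random() < r]


def update_particle(x, v, p_best, g_best, rng):
    """One step: v' = v (+) r1(p_best - x) (+) r2(g_best - x); then x' = x + v'."""
    r1, r2 = rng.random(), rng.random()
    v_new = v + scale(r1, difference(p_best, x), rng) + scale(r2, difference(g_best, x), rng)
    return apply_sequence(x, v_new), v_new


if __name__ == "__main__":
    x = [3, 11, 24, 6, 10, 81]
    p = [10, 3, 11, 81, 24, 6]
    s = difference(p, x)
    print(s, apply_sequence(x, s) == p)  # [(5, 1), (6, 4)] True
    x_new, v_new = update_particle(x, [], p, p, random.Random(1))
    print(x_new, v_new)
```

Nothing here checks that the new ordering is still a self‐avoiding walk, and the velocity is allowed to grow without bound; a practical implementation would presumably need a feasibility check (or repair) and a cap on the velocity length, details the text does not give.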
4 Numerical simulation
To assess its performance, we applied EPSO to instances on the triangular lattice. These instances come from a standard benchmark dataset and are shown in Table 1. Sequences 1–4 were simulated in [14, 15], and sequences 5–8 were computed in [3]; we chose these sequences to facilitate comparison. In the following figures, polar amino acids are depicted as ‘◆’, hydrophobic amino acids as ‘●’, and the black lines show the covalent bonds between adjacent amino acids. For clarity, Fig. 1 shows a 4 × 4 triangular lattice in which all edges have equal length, taken as 1.
Table 1.
Benchmark sequences used in the simulations
| Sequence | Length | Protein sequence |
|---|---|---|
| 1 | 20 | HPHPPHHPHPPHPHHPPHPH |
| 2 | 20 | HHHHHPHHHHHHPHHHHPHH |
| 3 | 17 | HPPHHPPHHPPHHPPHH |
| 4 | 25 | HHPPPPHHPPPPHHPPPPHHPPPPH |
| 5 | 24 | HHPPHPPHPPHPPHPPHPPHPPHH |
| 6 | 25 | PPHPPHHPPPPHHPPPPHHPPPPHH |
| 7 | 36 | PPPHHPPHHPPPPPHHHHHHHPPHHPPPPHHPPHPP |
| 8 | 48 | PPHPPHHPPHHPPPPPHHHHHHHHHHPPPPPPHHPPHHPPHPPHHHHH |
Fig. 1.

Example of a 4 × 4 triangular lattice
4.1 Sequence 1
This sequence is embedded into a square lattice and a triangular lattice, each with 25 lattice points. The lattice points in the first row are labelled 1–5 from the left, and the points in the other rows are labelled sequentially. The 13th point is treated as the origin of the lattice. A non‐compact conformation with 13 HH bonds on the triangular lattice is shown in Fig. 2 a.
Fig. 2.

One non‐compact conformation of sequence 1 on
a Triangular lattice
b Square lattice
For comparison, our model and algorithm find optimal non‐compact conformations with nine HH pairs on the square lattice; the result is shown in Fig. 2 b.
This indicates that our method can obtain non‐compact conformations not only on the triangular lattice but also on the square lattice. Comparing Figs. 2 a and b shows that the optimal structure is more flexible on the triangular lattice. First, the optimal configuration on the triangular lattice has more HH pairs, which is natural since the triangular lattice allows diagonal interactions. Second, the triangular lattice allows a tight core of ten hydrophobic amino acids. The square‐lattice conformation has a similar shape, but its core is not as tight because diagonal interactions are absent.
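The row‐by‐row numbering used for the lattices in this section (first row labelled 1–5 from the left, point 13 at the centre of a 5 × 5 lattice) can be made explicit. The sketch below assumes a rhombic (sheared) layout of an r × c triangular lattice, in which point (row, col) is adjacent to (row, col ± 1), (row ± 1, col), (row − 1, col + 1) and (row + 1, col − 1); the drawing in Fig. 1 may use a different layout, and the function names are hypothetical.

```python
# Offsets of the six triangular-lattice neighbours (assumed sheared layout).
OFFSETS = [(0, 1), (0, -1), (1, 0), (-1, 0), (-1, 1), (1, -1)]


def to_coords(label, cols):
    """Grid-point label (1-based, row by row from the top left) -> (row, col)."""
    return divmod(label - 1, cols)


def to_label(row, col, cols):
    """(row, col) -> 1-based grid-point label."""
    return row * cols + col + 1


def neighbours(label, rows, cols):
    """Sorted labels of the lattice points adjacent to `label` in a rows x cols patch."""
    row, col = to_coords(label, cols)
    result = []
    for dr, dc in OFFSETS:
        r, c = row + dr, col + dc
        if 0 <= r < rows and 0 <= c < cols:
            result.append(to_label(r, c, cols))
    return sorted(result)


if __name__ == "__main__":
    print(neighbours(13, 5, 5))  # [8, 9, 12, 14, 17, 18]: centre point, six neighbours
    print(neighbours(1, 5, 5))   # [2, 6]: corner point
```

Because boundary points have fewer than six neighbours, a larger (non‐compact) patch gives the residues near the edge of a fold more room to form contacts.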
4.2 Sequence 2
This benchmark example has 20 amino acids and is embedded into a 5 × 5 triangular lattice and a 4 × 5 triangular lattice of $\mathbb{R}^2$. The lattice points in the first row are labelled 1–5 from the left, and the points in the other rows are labelled sequentially. We find optimal conformations with 23 HH pairs on the 4 × 5 triangular lattice. One of the optimal compact conformations is shown in Fig. 3 a, and a non‐compact conformation with 23 HH bonds on the 5 × 5 triangular lattice is shown in Fig. 3 b.
Fig. 3.

Compact and non‐compact conformations of sequence 2 on
a 4 × 5 triangular lattice
b 5 × 5 triangular lattice
The results indicate that both compact and non‐compact configurations can be found on the triangular lattice. A close comparison reveals that the compact and non‐compact conformations both have 23 HH bonds, but the non‐compact result on the triangular lattice has a more flexible shape than the compact one. We believe that this shape is closer to the native conformation of the protein.
4.3 Sequence 3
This sequence has 17 amino acids and is embedded into a 5 × 5 triangular lattice of $\mathbb{R}^2$. The lattice points in the first row are labelled 1–5 from the left, and the points in the other rows are labelled sequentially. The minimal free energy of this sequence is −12; one of the optimal conformations, with 12 HH pairs, is shown in Fig. 4 a.
Fig. 4.

One optimal conformation of sequence 3 on
a 5 × 5 triangular lattice
b 5 × 5 square lattice
The length of sequence 3 is 17, which cannot be written as the product of two integers greater than 1, so no compact conformation exists on an r × c triangular lattice; compact folding on the square lattice is not possible either. The computational results show that our method can find the optimal non‐compact conformation of this type of sequence on the triangular lattice. On the square lattice, the optimal conformation with eight HH pairs was found in [15]; one of the non‐compact conformations on the square lattice is shown in Fig. 4 b.
The configurations on the triangular lattice are more flexible than those on the square lattice, which means that our model and method can be applied to fold this type of amino acid sequence.
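The argument about sequence 3 rests on simple arithmetic: a compact conformation on an r × c lattice needs r · c = n. The sketch below is ours and assumes, as the text implicitly does, that compact lattices are rectangular r × c patches with r, c ≥ 2. It lists the admissible shapes for the lengths of sequences 2–4 in Table 1.

```python
def compact_shapes(n):
    """All (rows, cols) with rows * cols == n and rows, cols >= 2."""
    return [(r, n // r) for r in range(2, n // 2 + 1) if n % r == 0]


if __name__ == "__main__":
    for name, length in [("sequence 2", 20), ("sequence 3", 17), ("sequence 4", 25)]:
        print(name, length, compact_shapes(length))
    # sequence 2 20 [(2, 10), (4, 5), (5, 4), (10, 2)]
    # sequence 3 17 []
    # sequence 4 25 [(5, 5)]
```

The empty list for length 17 is exactly why sequence 3 can only be folded non‐compactly, while the 4 × 5 and 5 × 5 shapes match the compact lattices used for sequences 2 and 4.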
4.4 Sequence 4
This example has 25 amino acids and is embedded into both a 5 × 5 and a 5 × 6 triangular lattice of $\mathbb{R}^2$. The lattice points in the first row are labelled 1–5 from the left, and the points in the other rows are labelled sequentially. One of the optimal compact conformations is shown in Fig. 5 a, and one of the optimal non‐compact conformations in Fig. 5 b.
Fig. 5.

Compact and non‐compact configurations of sequence 4 on
a 5 × 5 triangular lattice
b 5 × 6 triangular lattice
c 5 × 6 square lattice
For comparison, this sequence is also embedded into a 5 × 6 square lattice, on which we find optimal non‐compact conformations with eight HH pairs, as shown in Fig. 5 c.
The results demonstrate that both compact and non‐compact configurations can be found on the triangular lattice, and that both have 12 HH bonds. However, the non‐compact result on the triangular lattice has a more flexible shape than the compact one, and we believe it is closer to the native conformation of the protein. Fig. 5 c shows that the non‐compact conformation obtained on the square lattice has only eight HH bonds. This indicates that the conformations folded on the triangular lattice have lower free energy and are more biologically realistic than those folded on the square lattice.
To summarise the advantages of the triangular lattice, the maximal numbers of HH pairs (equivalently, the lowest free energies) found for sequences 1–4 are given in Table 2.
Table 2.
HH pairs of the sequences on triangular lattice and square lattice
| Sequence | Conformation | HH pairs on triangular lattice | HH pairs on square lattice |
|---|---|---|---|
| 1 | non‐compact | 13 | 9 |
| 2 | non‐compact | 23 | – |
| 2 | compact | 23 | – |
| 3 | non‐compact | 13 | 8 |
| 4 | non‐compact | 12 | 8 |
| 4 | compact | 12 | – |
4.5 Sequences 5–8
Moreover, we computed sequences 5–8 from [3, 6], for which non‐compact conformations on the triangular lattice were simulated using a hybrid of hill climbing and genetic algorithm (HHGA) and a simple genetic algorithm (SGA), respectively. The results, given in Table 3, demonstrate that our model and method (EPSO) discover optimal conformations more effectively and achieve better solution quality (lower energy) than SGA, and perform comparably to HHGA.
Table 3.
Minimal free energies by EPSO, SGA and HHGA
In [3], HHGA was run 30 times to measure its average running time (Table 4); we likewise ran EPSO 30 times. As Table 4 shows, EPSO required much less average running time than HHGA on all four sequences.
Table 4.
Average running time of EPSO
| Sequence | Length | Average running time (HHGA) | Average running time (EPSO) |
|---|---|---|---|
| 5 | 24 | 378.99 | 22.43 |
| 6 | 25 | 403.84 | 23.78 |
| 7 | 36 | 713.55 | 61.22 |
| 8 | 48 | 1173.2 | 126.17 |
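The size of the running‐time gap in Table 4 can be read off as a simple ratio. The sketch below only restates the table: by this measure EPSO's average time is roughly 17 times shorter for sequences 5 and 6, and about 12 and 9 times shorter for sequences 7 and 8. The ratios do not account for any differences in the computing environments behind the two columns; the dictionary layout is our own.

```python
# Average running times from Table 4: sequence -> (length, HHGA, EPSO),
# in the units reported there.
TABLE4 = {
    5: (24, 378.99, 22.43),
    6: (25, 403.84, 23.78),
    7: (36, 713.55, 61.22),
    8: (48, 1173.2, 126.17),
}


def speedups(table):
    """Ratio of HHGA to EPSO average running time for each sequence."""
    return {seq: hhga / epso for seq, (_length, hhga, epso) in table.items()}


if __name__ == "__main__":
    for seq, ratio in speedups(TABLE4).items():
        length = TABLE4[seq][0]
        print(f"sequence {seq} (length {length}): HHGA/EPSO = {ratio:.1f}")
    # ratios: 16.9, 17.0, 11.7, 9.3
```

The ratio shrinks for the two longest sequences, so the relative advantage may narrow as the chain length grows, although four data points are too few to establish a trend.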
5 Conclusion
In this paper we studied protein structure prediction on the triangular lattice. We proposed a mathematical model with the following advantages. First, it is highly simplified. Second, it lends itself to exact results. Third, it is easy to simulate. The results indicated that the extended PSO can find new conformations with minimal free energy, and that it was faster than HHGA on the benchmark sequences tested. Because it can find both compact and non‐compact conformations, it can fold sequences of arbitrary length. An empirical study demonstrated the effectiveness of the extended PSO and the optimisation model for solving the 2D HP model on the triangular lattice.
There are many future directions to pursue. Since the free energy in the model is given only by the number of non‐specific hydrophobic contacts, the positions of polar segments are not directly optimised when searching for optimal structures. This may result in unnatural structures if these segments are long or located at the ends of the sequence. The HP model's scoring system therefore needs to be modified to obtain more nature‐like structures. In addition, we intend to develop and study modified PSO algorithms for other types of protein folding problems, such as the 3D HP lattice model. Overall, we believe that the modified PSO method offers considerable potential for the protein structure prediction problem.
6 Acknowledgments
This work was supported by the Foundation of Nanjing University of Aeronautics and Astronautics (grant no. NN2014083), the Natural Science Foundation of China (grant no. 61304178) and the Foundation of Dalian Jiaotong University (grant no. L2011068).
7 References
- 1. Wang F., Song J., Song Y.: ‘Application of BP neural network in protein secondary structure prediction’, Comput. Technol. Dev., 2009, 19, (5), pp. 217–218
- 2. Dill K.A., Bromberg S., Yue K. et al.: ‘Principles of protein folding: a perspective from simple exact models’, Protein Sci., 1995, 4, pp. 561–602 (doi: 10.1002/pro.5560040401)
- 3. Su S.C., Lin C.J., Ting C.K.: ‘An effective hybrid of hill climbing and genetic algorithm for 2D triangular protein structure prediction’, Proteome Sci., 2011, 9, (Suppl 1), S19
- 4. Garcia J.M., Garzon E.M., Cecilia J.M.: ‘An efficient approach for solving the HP protein folding problem based on UEGO’, J. Math. Chem., 2015, 53, pp. 794–806 (doi: 10.1007/s10910-014-0459-1)
- 5. Crescenzi P., Goldman D., Papadimitriou C.H. et al.: ‘On the complexity of protein folding’, J. Comput. Biol., 1998, 5, (3), pp. 423–465 (doi: 10.1089/cmb.1998.5.423)
- 6. Hoque M.T., Chetty M., Dooley L.S.: ‘A hybrid genetic algorithm for 2D FCC hydrophobic‐hydrophilic lattice model to predict protein folding’, Adv. Artif. Intell. Lect. Notes Comput. Sci., 2006, 4304, pp. 867–876 (doi: 10.1007/11941439_91)
- 7. Shaw D.L., Islam A.S., Rahaman M.: ‘Protein folding in HP model on hexagonal lattice with diagonals’, BMC Bioinf., 2014, 15, (Suppl 2), S7
- 8. Albrecht A.A., Skaliotis A., Steinhöfel K.: ‘Stochastic protein folding simulation in the three‐dimensional HP model’, Comput. Biol. Chem., 2008, 32, pp. 248–255 (doi: 10.1016/j.compbiolchem.2008.03.004)
- 9. Islam A.S., Rahman M.S.: ‘On the protein folding problem in 2D triangular lattices’, Algorithms Mol. Biol., 2013, 8, (1), p. 30 (doi: 10.1186/1748-7188-8-30)
- 10. Istrail S., Lam F.: ‘Combinatorial algorithms for protein folding in lattice models: a survey of mathematical results’, Commun. Inf. Syst., 2009, 9, (4), pp. 303–346
- 11. Wang Y.: ‘Research on protein's structure prediction and classification by using neural networks’. PhD thesis, April 2005
- 12. Li Z.P., Zhang X.S., Chen L.N.: ‘Unique optimal folding of proteins on a triangular lattice’, Appl. Bioinf., 2005, 4, (2), pp. 105–116 (doi: 10.2165/00822942-200504020-00004)
- 13. Kennedy J., Eberhart R.C.: ‘Particle swarm optimization’. Proc. of IEEE Int. Conf. on Neural Networks, Piscataway, NJ, USA, 1995, pp. 1942–1948
- 14. Yan W.J., Guo Y.Z.: ‘Modified particle swarm optimization algorithm for protein structure prediction problem’, Comput. Technol. Dev., 2011, 21, (12), pp. 109–112
- 15. Guo Y., Wang Y.: ‘Predicting the non‐compact conformation of amino acid sequence by particle swarm optimization’. The 7th Int. Conf. on Systems Biology (ISB), Huangshan, China, August 2013, pp. 23–25
