Abstract
COVID-19 is a global pandemic that has drawn scientists' attention to preventing it and designing a drug for it. Low-cost intelligent tools for biological data analysis are therefore important for analyzing the biological structure of COVID-19. The global alignment algorithm is one of the important bioinformatics tools; it gives the most accurate measure of similarity between a pair of biological sequences. The main limitation of the standard global alignment algorithm is its long execution time, especially for very long sequences. This work proposes a fast global alignment tool (G-Aligner) based on meta-heuristic algorithms that estimates similarity measurements close to the exact ones in a reasonable time and at low cost. With very long sequences, G-Aligner based on the standard Sine–Cosine optimization algorithm (SCA) becomes trapped in local minima. Therefore, this work presents an improved version of SCA that integrates it with PSO. In addition, mutation and opposition operators are applied to enhance the exploration capability and avoid trapping in local minima. The performance of the improved SCA algorithm (SP-MO) was evaluated on a set of IEEE CEC functions. G-Aligner based on the SP-MO algorithm was also tested on real biological sequences, and it was used to measure the similarity of the COVID-19 virus to 13 other viruses to validate its performance. The tests showed that the SP-MO algorithm outperforms the relevant algorithms in the literature and produces the highest average similarity measurements, 75% of the exact ones.
Keywords: COVID-19, Bioinformatics, Pairwise global alignment, Sine–Cosine optimization algorithm, Particle swarm optimization algorithm
1. Introduction
Coronavirus disease 2019 (COVID-19) is a contagious viral disease that emerged as a result of the evolution of severe acute respiratory syndrome coronavirus (SARS-CoV). The first infected people were detected in December 2019 in Wuhan, China, and the disease became a devastating pandemic when it flared up across most countries. By December 2020, more than 70 million cases had been reported, including 1.5 million deaths, according to the World Health Organization (WHO). COVID-19 is a critical human disease that affects the liver, nervous system, and respiratory system [1], [2].
It was transmitted from bats to humans, and it spreads readily from human to human through the air, close personal contact, touching surfaces that carry viral particles, and, rarely, stool contamination [3].
Therefore, intensive efforts are being made to analyze the virus, design a drug for it, and model the spread of the pandemic to overcome the devastating proliferation of COVID-19 [4], [5]. Studies have modeled the physical transport of COVID-19 in the air [6], [7]. In [8], the stability of a COVID-19 transmission model based on the SEIR model was investigated under different control strategies. In [9], the transmission of airborne COVID-19 germs was described from a physics viewpoint using fluid dynamics analysis. Another study aimed to minimize the indoor airborne transmission of COVID-19 [10]. The effect of weather on the transmission of COVID-19 was also studied [11], [12], and another research direction tests the influence of temperature on its spread [13], [14], [15].
Computational intelligence tools were developed for analyzing the behavior of the virus, such as [16], [17], [18], [19], [20], [21], [22], [23], and intelligent diagnosis tools for COVID-19 were proposed [19], [22], [24], [25], [26], [27]. Docking tools were developed for docking antibodies and peptides against the ligands of COVID-19 proteins [28], [29], and Internet of Things research was carried out to limit the spread of COVID-19 [30], [31], [32]. All these research efforts aim to analyze the pandemic and design a drug for it. One important tool that aids in analyzing COVID-19, for example in constructing its phylogenetic trees and predicting its structure, is measuring the similarity of COVID-19 to other viruses. The pairwise global alignment algorithm proposed by Needleman and Wunsch (NW) [33] is the most accurate technique for measuring the similarity of two biological sequences. Pairwise global alignment aligns the two sequences over their entire length, not only portions of them as local sequence alignment does [34], as shown in Fig. 1. Similar portions of the two sequences are colored blue, while the other portions are aligned with gaps ('-') that shift the similar ones into place.
Fig. 1.
Aligning two biological sequences using global alignment.
The viruses most similar to COVID-19 can be detected by using the NW algorithm to align it against the huge biological databases of viruses.
Although the NW global alignment algorithm provides the most accurate alignment results, its main limitation is its long execution time, especially for very long sequences (COVID-19 has a length exceeding 7000 base pairs).
Hence, a fast global alignment tool (G-Aligner) needs to be developed for a fast preliminary scan of databases to detect the viruses with the highest similarity scores (close to the exact ones found by the NW algorithm). The aim of this preliminary scan is to filter the huge number of sequences in the database down to a few sequences whose similarity scores are as close as possible to the exact score, which decreases the search time. The NW algorithm can then be used to align these filtered sequences to measure the exact similarity score. Fig. 2 shows the role of the accelerated global aligner (G-Aligner) in aligning COVID-19 with another virus to test the similarity between them. The similarity results can be used in other applications such as drug design, protein structure prediction, and constructing the phylogenetic tree of COVID-19.
Fig. 2.
The role of G-Aligner alongside NW alignment and other applications for analyzing COVID-19.
The pairwise global alignment algorithm has been accelerated in the literature using hardware acceleration devices such as Graphical Processing Units (GPUs) [35], [36], [37], [38] and Field Programmable Gate Arrays (FPGAs) [39], [40], [41], [42], [43], [44]. These fast versions of global alignment achieve efficient speedups on massively parallel devices, but the hardware is costly.
Hence, this work develops a low-cost accelerated global aligner tool (G-Aligner) that quickly measures the similarity between a pair of biological sequences, with reasonable results close to the exact ones produced by the NW algorithm. The main innovation is using the stochastic search of meta-heuristic algorithms [45] in designing the G-Aligner tool.
A meta-heuristic algorithm is a search-based algorithm that accelerates the exploration of a problem's search space using random movements to find the best solutions [45]. Meta-heuristic algorithms mimic search methods from nature, physics, or human behavior [45]. An example is the Sine–Cosine Optimization algorithm (SCA) [46], which mines the search space by attracting the search agents toward the best candidates using the sine and cosine operators. Particle Swarm Optimization (PSO) [47] mimics the search strategy of bird flocking in nature. Many other algorithms have been released, such as Ions Motion Optimization (IMO) [48], Lightning Attachment Procedure Optimization [49], Gravitational Search Algorithm (GSA) [50], Electromagnetic Field Optimization (EFO) [51], and Moth-Flame Optimization (MFO) [52], and hundreds more have been developed.
Meta-heuristic algorithms have succeeded in enhancing the performance of many bioinformatics tools, such as protein folding prediction [53], [54], protein structure prediction [55], [56], [57], drug discovery [58], [59], local alignment [60], and other applications reviewed in [61], [62]. This motivates the use of meta-heuristic algorithms for accelerating pairwise global alignment.
In this work, pairwise global alignment is formulated as an optimization problem where the objective is to reduce the execution time while producing a reasonable similarity score close to the exact one. G-Aligner was implemented using SCA [46], and its performance was validated on two sets of experimental biological data. The first was a set of Homo sapiens biological sequences, and the second was the COVID-19 virus protein versus 13 proteins of other viruses. G-Aligner based on SCA reduced the execution time significantly over various lengths of biological sequences but achieved an average similarity score of only 39% of the exact one measured by NW global alignment [33]. Hence, the performance of G-Aligner based on SCA needs to be enhanced.
Previous studies in the literature have enhanced the performance of SCA. ISCA [63] improves SCA by applying opposition to the solutions to increase exploration. In m-SCA [64], SCA was enhanced by applying opposition to the solutions and by adding a self-adaptive parameter to the updating equations of SCA to enhance the exploitation of promising regions of the search space.
SCA has also been merged with other algorithms to enhance its performance, as in SCA-DE [65] and SCA-PSO [66]. In SCA-DE [65], SCA updates the search agents for several iterations, and at the end of each iteration DE is used to update the solutions based on the updating mechanism of DE.
In SCA-PSO [66], SCA updates its search agents for some iterations based on the updating equation of SCA, and the best fitness of each search agent is saved in addition to the best global solution among all search agents. Then, PSO is used to update the search agents based on the updating equation of PSO, toward the best solution achieved by each search agent and the best global solution, and the new solutions are updated using SCA again. SCA-DE and SCA-PSO were developed to combine the benefits of the two algorithms (efficient exploration of the search space using SCA together with the efficient exploitation of DE and PSO).
The chaotic Sine–Cosine algorithm was merged with chaotic firefly (CSCF) [67], where each agent is updated using either chaotic SCA or chaotic firefly depending on the fitness value of its solution. There were 5 versions of embedding the chaos parameters in the two algorithms, and the updating equation used was chosen randomly. SCA-GWO [68] integrates SCA with the Gray Wolf Optimization (GWO) algorithm [69] and was developed to benefit from the advantage of SCA for exploration and of GWO for exploitation of the search space. In SCA-GWO, SCA is executed first for all agents toward the best global solution found and the best solution each agent has visited. Then GWO is used to update the solutions for exploitation.
A different hybrid scheme of SCA and PSO (ASCA-PSO) was also proposed; it performs SCA and PSO in parallel as two layers. The bottom layer is responsible for exploring the search space using SCA, while the upper layer intensifies the best solutions found by the bottom layer using PSO. ASCA-PSO succeeded in accelerating the local alignment algorithm [60].
G-Aligner was implemented using all the enhanced versions of SCA in the literature, and ASCA-PSO produced the highest similarity scores (51.5% of the exact ones). The poor results of G-Aligner based on ASCA-PSO arise because it falls into local minima after some iterations, so the exploration capability of its SCA layer needs to be enhanced.
The main idea of this work is to enhance the performance of ASCA-PSO for G-Aligner by using a mutation operator in the updating equations of SCA, which makes the exploration of the search space more efficient, and by applying opposition to the solutions that fall into local minima so that they escape them.
The main contributions of this paper are summarized as follows:
1- A fast and low-cost pairwise global alignment technique was proposed based on meta-heuristic algorithms.
2- The SCA algorithm was improved by hybridizing it with PSO and using mutation and opposition operators to enhance the exploration capability and avoid trapping in local optima.
3- The performance of the proposed algorithm (SP-MO) was validated on a set of IEEE benchmark mathematical functions.
4- G-Aligner based on the SP-MO algorithm was validated by measuring the similarity of the COVID-19 virus to 13 other viruses, and it achieved similarity scores of 75% of the exact ones measured by the NW global alignment algorithm in a reasonable time.
The article is organized as follows: Section 2 presents the basics of the pairwise global alignment algorithm. Section 3 presents the preliminaries of SCA, PSO, and ASCA-PSO. Section 4 presents the global alignment technique based on meta-heuristics (G-Aligner), and Section 5 presents the proposed SP-MO. Section 6 reports the experimental results of validating SP-MO. Finally, Section 7 concludes the proposed work and results.
2. Pairwise global alignment algorithm
The NW global alignment algorithm [33] is the standard algorithm for performing pairwise global alignment and produces accurate alignment results. It relies on the dynamic programming approach [70], which evaluates all possible alignments. It aligns the similar residues of two biological sequences, so gaps ('-') need to be inserted to shift the similar residues into place.
The algorithm starts by constructing a scoring matrix with (m+1) rows and (n+1) columns, where m and n are the lengths of the two sequences. The first row and column are filled with the negative index of each cell. For example, the first cell of the first row and column has a score of (0), the second cell in the first row has a score of (-1), the third one has a score of (-2), and so on, and the same holds for the first column. The scores of the cells from the second row and column up to the final cell of the scoring matrix are computed according to Eq. (1) based on the corresponding residues of the two sequences.
| (1) |
where the sequences to be aligned are denoted $Seq_1$ and $Seq_2$ and have lengths n and m respectively, and i and j are the indices of the row and column, where 1 ≤ i ≤ m and 1 ≤ j ≤ n. Gaps are scored with a linear gap penalty that penalizes consecutive gaps using an open-gap score and an extended-gap score, where k is the number of inserted gaps [71].
Similarity () is a function that measures the similarity between protein residues. Different scoring schemes exist for measuring the similarity of protein residues, such as the BLOcks SUbstitution Matrix (BLOSUM) and Point Accepted Mutation (PAM) [72]. The simple scoring scheme assigns a score of (+1) if the two residues are identical and (0) otherwise; this scheme was applied in this paper and can be adjusted.
The second stage of alignment is the traceback, which aligns the sequences and is performed after the scoring matrix has been computed. The traceback starts from the final cell at the bottom right of the scoring matrix and finishes at the upper left cell. At each cell there are three possible movements: toward the upper, left, or diagonal cell. The next movement is made toward the cell that has the maximum score. A diagonal movement aligns two corresponding residues (one from each sequence). An upward movement aligns one residue from the sequence in the row with a gap. A leftward movement aligns a residue from the sequence in the column with a gap. The traceback continues until it reaches the start cell at the first row and column.
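The two stages described above (filling the scoring matrix, then tracing back) can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: it assumes the simple scoring scheme of this section (+1 for identical residues, 0 otherwise) and a unit penalty of -1 per gap, which matches the 0, -1, -2, ... initialization of the first row and column. The function name `needleman_wunsch` and its parameters are hypothetical, and the traceback follows the common convention of stepping to the neighbor that produced the current score.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=0, gap=-1):
    """Global alignment sketch with an assumed unit gap penalty."""
    m, n = len(seq_a), len(seq_b)
    # Scoring matrix with (m+1) rows and (n+1) columns.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = i * gap          # first column: 0, -1, -2, ...
    for j in range(1, n + 1):
        H[0][j] = j * gap          # first row: 0, -1, -2, ...
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # align two residues
                          H[i - 1][j] + gap,     # gap in seq_b
                          H[i][j - 1] + gap)     # gap in seq_a
    # Traceback from the bottom-right cell to the top-left cell.
    out_a, out_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        diagonal = False
        if i > 0 and j > 0:
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            diagonal = H[i][j] == H[i - 1][j - 1] + s
        if diagonal:
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), H[m][n]
```

Calling `needleman_wunsch("ACGT", "AGT")` returns two gapped strings of equal length together with the alignment score; the double loop over the matrix is what makes the exact method expensive for long sequences.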
After constructing the alignment, the similarity between the two sequences can be computed according to Eq. (2), where A and B represent the aligned sequences (with inserted gaps) and L denotes the length of A (or B).
| (2) |
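As an illustration of how a similarity value can be read off a finished alignment, the sketch below counts the columns in which A and B hold the same residue and divides by the aligned length L. The normalization by L and the exclusion of gap-gap columns are assumptions made for this example; the helper name `alignment_similarity` is hypothetical and the exact form used in the paper is the one given by Eq. (2).

```python
def alignment_similarity(aligned_a, aligned_b, gap="-"):
    """Fraction of alignment columns holding identical residues (assumed form)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length L")
    if not aligned_a:
        return 0.0
    # A column counts as a match only if both residues are equal and not gaps.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return matches / len(aligned_a)
```

For example, `alignment_similarity("ACGT", "A-GT")` evaluates three matching columns out of four.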
The time complexity of the exact global alignment algorithm (NW) is $O(n^3)$, where (n) denotes the length of the sequences to be aligned (assuming the two sequences have the same length). This complexity makes it clear that the execution time grows rapidly with sequence length and becomes very long for very long sequences. Hence, there is a motivation to decrease the execution time of the NW algorithm by proposing a version of it based on meta-heuristic algorithms.
3. Preliminaries
The following subsections briefly describe the Sine–Cosine optimization algorithm (SCA), the Particle Swarm Optimization (PSO) algorithm, and the hybrid algorithm of SCA and PSO (ASCA-PSO).
3.1. Sine-Cosine optimization algorithm (SCA)
SCA is a population-based optimization algorithm that uses the sine and cosine mathematical operators to update the agents, as in Eqs. (3), (4).

$$X_i^{t+1} = \begin{cases} X_i^{t} + r_1 \sin(r_2)\,\left| r_3 P^{t} - X_i^{t} \right|, & r_4 < 0.5 \\ X_i^{t} + r_1 \cos(r_2)\,\left| r_3 P^{t} - X_i^{t} \right|, & r_4 \geq 0.5 \end{cases} \tag{3}$$

$$r_1 = a - t\,\frac{a}{T} \tag{4}$$

where ($X_i^t$) is the solution of search agent (i), ($P^t$) is the global best solution, (t) is the current iteration number, (T) is the maximum number of iterations, and ($r_1$) is the parameter that determines the next region of search; higher values of it increase the exploration of the search space. (a) is a scaling factor that balances the exploration and exploitation of SCA. Meanwhile, ($r_2$) defines the direction of movement toward or away from ($P^t$), and ($r_3$) controls the effect of the destination on the current movement. ($r_4$) is used to switch between the sine and cosine functions as in Eq. (3).
The steps of SCA are presented in Algorithm (1). The time complexity of SCA is ) where (n) is the size of populations and () is the time cost of updating all populations per one iteration, and (T) is the number of iterations.
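A compact sketch of the SCA loop is given below for a generic minimization problem. It is illustrative only: the sampling ranges for the random parameters (an angle in [0, 2π], a destination weight in [0, 2], and a switch value in [0, 1)) follow the common description of SCA and are assumptions here, as are the function name `sca_minimize`, the default a = 2, and the clipping of agents to the bounds.

```python
import math
import random

def sca_minimize(f, dim, lower, upper, n_agents=20, max_iter=200, a=2.0, seed=0):
    """Sine-Cosine search sketch; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_agents)]
    best = min(X, key=f)[:]
    best_fit = f(best)
    for t in range(max_iter):
        r1 = a - t * a / max_iter              # shrinks linearly from a toward 0
        for x in X:
            for d in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)   # movement direction
                r3 = rng.uniform(0.0, 2.0)             # weight of the destination
                r4 = rng.random()                      # sine/cosine switch
                trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                x[d] += r1 * trig * abs(r3 * best[d] - x[d])
                x[d] = min(max(x[d], lower), upper)    # keep inside the bounds
            fit = f(x)
            if fit < best_fit:                 # keep the global best solution
                best, best_fit = x[:], fit
    return best, best_fit
```

Because the step scale shrinks over the iterations, early iterations explore widely around the best solution and late iterations move in small steps, which is the exploration and exploitation balance described above.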
3.2. Particle swarm optimization (PSO)
PSO is a swarm optimization algorithm that mimics the flocking behavior of birds in flight. It has a stochastic search strategy that depends mainly on global communication between the search agents: all search agents direct their movements toward the search agent that has found the global best solution. It also memorizes the best solution each search agent has passed through, which influences the agent's next update as stated in Eq. (5), and this memory of locations enhances the exploitation phase of PSO.
The updating equations of PSO are Eqs. (5), (6), where ($P^{gbest}$) is the global best position (solution) among all search agents and ($P_i^{best}$) is the best personal position that search agent (i) has found during the previous iterations.

$$v_i^{t+1} = w\,v_i^{t} + c_1\,rand\,\left(P_i^{best} - x_i^{t}\right) + c_2\,rand\,\left(P^{gbest} - x_i^{t}\right) \tag{5}$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{6}$$

where $v_i$ represents the velocity of the $i$th particle, $x_i$ is its position, and $c_1$ and $c_2$ are the local and global best position coefficients, respectively. w is the inertia coefficient that sets the influence of the prior velocity on the new velocity. rand is a uniformly distributed random variable in the range (0–1). The PSO search strategy follows the same steps as SCA in Algorithm (1), except that the updating equations in step 6 are Eqs. (5), (6).
PSO has a time complexity of $O(T \cdot n \cdot C_{PSO})$, where T, n, and $C_{PSO}$ denote the number of iterations, the number of search agents, and the time cost of modifying the position of one search agent, respectively. A main advantage of PSO is the exchange of information between search agents, which makes it more reliable in reaching an approximately optimal solution with acceptable convergence speed and robustness. In addition, each agent moves toward the best location it achieved in previous iterations, which makes the exploitation of the search space more efficient.
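The PSO update can be sketched in the same style. The coefficient values (inertia 0.7, local and global coefficients 1.5), the zero initial velocities, and the name `pso_minimize` are assumptions chosen for this example and are not taken from the paper.

```python
import random

def pso_minimize(f, dim, lower, upper, n_particles=20, max_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Particle swarm sketch; returns (global_best_position, global_best_fitness)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]                  # personal best positions
    pbest_fit = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                # Velocity: inertia + pull to personal best + pull to global best.
                V[i][d] = (w * V[i][d]
                           + c1 * rng.random() * (pbest[i][d] - X[i][d])
                           + c2 * rng.random() * (gbest[d] - X[i][d]))
                X[i][d] = min(max(X[i][d] + V[i][d], lower), upper)
            fit = f(X[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = X[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = X[i][:], fit
    return gbest, gbest_fit
```

The personal-best memory is what gives PSO its exploitation strength: each particle keeps being pulled back toward the best region it has already visited.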
3.3. Two layer hybrid SCA and PSO (ASCA-PSO)
ASCA-PSO consists of two layers. The bottom layer (exploration layer) contains search agents that update their movement according to the updating equations of SCA. Each search agent in the upper layer represents the best solution found by one group in the bottom layer and updates its movement using the PSO algorithm. There is a global solution ($y_{gbest}$) that represents the best solution found among the agents of the upper and bottom layers, and it is the output optimal solution.
The search agents in the bottom layer are divided into (M) groups, where each group contains (N) search agents. Each group has a best agent in the upper layer ($y_k$, k = 1 to M) that represents the best solution found by the search agents of that group, and all best agents in the upper layer move according to the updating equations of PSO. Each search agent in the bottom layer updates its movement according to Eq. (7).

$$x_{ij}^{t+1} = \begin{cases} x_{ij}^{t} + r_1 \sin(r_2)\,\left| r_3\, y_i^{t} - x_{ij}^{t} \right|, & r_4 < 0.5 \\ x_{ij}^{t} + r_1 \cos(r_2)\,\left| r_3\, y_i^{t} - x_{ij}^{t} \right|, & r_4 \geq 0.5 \end{cases} \tag{7}$$

where ($x_{ij}$) represents the solution of a search agent in the bottom layer, ($y_i$) is the best solution found by group (i) in the bottom layer, and (i) and (j) are the indices of solutions in the top and bottom layers, respectively.
Eqs. (8), (9) are the updating equations of the search agents in the upper layer, which move toward ($y_{gbest}$), the best global solution found among all search agents in the upper and bottom layers.

$$v_i^{t+1} = w\,v_i^{t} + c_1\,rand\,\left(y_i^{pbest} - y_i^{t}\right) + c_2\,rand\,\left(y_{gbest} - y_i^{t}\right) \tag{8}$$

$$y_i^{t+1} = y_i^{t} + v_i^{t+1} \tag{9}$$

Here $y_i^{pbest}$ is the best position that upper-layer agent (i) has visited, $r_1$ to $r_4$ are the SCA parameters of Section 3.1, and w, $c_1$, $c_2$, and rand are the PSO parameters of Section 3.2.
SCA and PSO are combined in two-layer form as follows. The first group in the bottom layer is executed; it explores the search space using the SCA updating strategy of Eq. (7), and ($y_1$) in the upper layer and ($y_{gbest}$) are updated if a solution with better fitness is found. Then the agent ($y_1$) is updated with the PSO updating strategy of Eqs. (8), (9) to intensify the best solution found by the exploration in the bottom layer. Then the second group in the bottom layer is executed and ($y_2$) in the upper layer is updated to intensify the search, then the third group in the bottom layer, and so on.
Hence, in this hybridization mechanism, after some exploration of the search space using SCA, the search is intensified using PSO around the best-explored regions produced by the bottom layer. This mechanism increases the diversity of the produced solutions and enhances their quality, as was shown when it enhanced the Fragmented Local Aligner Technique (FLAT) [60]. This advantage motivates the use of ASCA-PSO for optimizing the global sequence alignment algorithm.
ASCA-PSO algorithm has a time complexity of where N and M are the number of search agents in the bottom and top layer in order. and are the time cost for updating each search agent for PSO and SCA in order, and T is the number of iterations.
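The two-layer scheme can be sketched as below for a generic minimization problem. This is a simplified reading of the description in this subsection, not the reference implementation: the parameter values, the bookkeeping of a personal best for each upper-layer agent, the clipping to the bounds, and the name `asca_pso_minimize` are all assumptions.

```python
import math
import random

def asca_pso_minimize(f, dim, lower, upper, n_groups=4, group_size=5,
                      max_iter=100, a=2.0, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Two-layer SCA/PSO sketch; returns (global_best, global_best_fitness)."""
    rng = random.Random(seed)

    def clip(v):
        return min(max(v, lower), upper)

    # Bottom layer: n_groups groups of SCA agents.
    X = [[[rng.uniform(lower, upper) for _ in range(dim)]
          for _ in range(group_size)] for _ in range(n_groups)]
    # Upper layer: one agent per group, started at the group's best solution.
    Y = [min(group, key=f)[:] for group in X]
    Y_fit = [f(y) for y in Y]
    V = [[0.0] * dim for _ in range(n_groups)]
    P, P_fit = [y[:] for y in Y], Y_fit[:]     # personal bests of upper agents
    k = min(range(n_groups), key=lambda i: Y_fit[i])
    gbest, gbest_fit = Y[k][:], Y_fit[k]

    def record(i):
        """Refresh the personal best of agent i and the global best."""
        nonlocal gbest, gbest_fit
        if Y_fit[i] < P_fit[i]:
            P[i], P_fit[i] = Y[i][:], Y_fit[i]
        if P_fit[i] < gbest_fit:
            gbest, gbest_fit = P[i][:], P_fit[i]

    for t in range(max_iter):
        r1 = a - t * a / max_iter
        for i in range(n_groups):
            # Exploration: SCA moves of group i toward its upper-layer agent.
            for x in X[i]:
                for d in range(dim):
                    r2 = rng.uniform(0.0, 2.0 * math.pi)
                    r3 = rng.uniform(0.0, 2.0)
                    trig = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
                    x[d] = clip(x[d] + r1 * trig * abs(r3 * Y[i][d] - x[d]))
                fit = f(x)
                if fit < Y_fit[i]:
                    Y[i], Y_fit[i] = x[:], fit
            record(i)
            # Exploitation: one PSO step of the upper-layer agent of group i.
            for d in range(dim):
                V[i][d] = (w * V[i][d]
                           + c1 * rng.random() * (P[i][d] - Y[i][d])
                           + c2 * rng.random() * (gbest[d] - Y[i][d]))
                Y[i][d] = clip(Y[i][d] + V[i][d])
            Y_fit[i] = f(Y[i])
            record(i)
    return gbest, gbest_fit
```

Groups are processed one after another, so an improvement found by an early group can already steer the PSO step of later groups within the same iteration.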
4. Pairwise global alignment based on meta-heuristic algorithms (G-Aligner)
This section presents the procedure for performing pairwise global alignment with a stochastic search based on meta-heuristic algorithms. Global alignment can be formulated as an optimization problem whose desired output is the best alignment between the two sequences, obtained by matching the similar protein residues in a reasonable time that is shorter than the time consumed by the NW global alignment algorithm.
To match the similar residues, gaps need to be inserted at different positions in the aligned sequences to shift the matching residues into place. Hence, the objective is to insert gaps (which can amount to 30% of the length of the sequences to be aligned) at locations in the aligned sequences that maximize the similarity of the biological sequences.
The solution to this optimization problem is the set of gap locations in each sequence that maximizes the similarity score, and the fitness is the similarity score of the aligned sequences, estimated with Eq. (2).
If the sequences have different lengths, the aligned sequences must first be equalized by adding extra gaps to the shorter sequence, as many as the difference between the lengths of the two sequences. For example, Fig. 3 shows the representation of solutions for performing global alignment based on meta-heuristic algorithms.
Fig. 3.
Representation of the solution of the global alignment based on stochastic algorithms.
In part (a) of Fig. 3, the pair of sequences to be aligned have different lengths. In part (b), the difference between the lengths is filled with blue gaps to equalize the two sequences, and the red gaps are extra gaps inserted as, for example, 30% of the shorter length. In part (c) of Fig. 3, the gaps are inserted at random locations to shift the residues so that similar ones align. As shown, one array keeps the indices of the gaps in each sequence, and the two arrays together represent one solution.
The similarity scores are estimated using Eq. (2) (the fitness function), and each solution moves its gap indices (positions in each sequence) toward the indices of the best solution found (the solution with the maximum similarity score) according to the updating mechanism of the meta-heuristic technique used.
The general procedure for performing global alignment based on a meta-heuristic technique is as follows:
1- Construct the aligned arrays of the two sequences for (N) solutions after equalizing the lengths of the pair of sequences and estimating the extra gaps as a specified percentage of the shorter sequence.
2- Initialize the (N) solutions with gap indices in each sequence by spreading the gaps over the entire length of the sequences at random locations.
3- Find the best of the N solutions, the one that gives the maximum similarity score based on Eq. (2).
4- Update the solutions toward the best solution found according to the updating equation of the meta-heuristic technique.
5- Evaluate the similarity scores of the updated solutions and repeat from step (3) for some iterations.
6- Output the aligned sequences according to the best gap locations, which maximize the similarity score of the two sequences.
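The solution representation behind these steps can be sketched as follows. The example covers only steps 1 to 3 and 6 (building gapped sequences from gap indices, scoring them, and keeping the best random solution); the update of step 4 depends on the chosen meta-heuristic and is left out. The function names, the use of a raw match count as fitness, and the 30% extra-gap ratio are assumptions for illustration.

```python
import random

def insert_gaps(seq, gap_positions, total_len, gap="-"):
    """Build an aligned string of length total_len with gaps at the given indices."""
    gaps = set(gap_positions)
    if len(gaps) + len(seq) != total_len or any(p < 0 or p >= total_len for p in gaps):
        raise ValueError("gap positions do not fit the requested length")
    residues = iter(seq)
    return "".join(gap if k in gaps else next(residues) for k in range(total_len))

def count_matches(aligned_a, aligned_b, gap="-"):
    """Fitness: number of columns with identical residues (gap-gap excluded)."""
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)

def random_solution(seq_a, seq_b, extra_ratio, rng):
    """One solution: a pair of sorted gap-index arrays, one per sequence."""
    shorter = min(len(seq_a), len(seq_b))
    # Equalized length plus extra gaps taken as a share of the shorter sequence.
    total_len = max(len(seq_a), len(seq_b)) + int(extra_ratio * shorter)
    gaps_a = sorted(rng.sample(range(total_len), total_len - len(seq_a)))
    gaps_b = sorted(rng.sample(range(total_len), total_len - len(seq_b)))
    return gaps_a, gaps_b, total_len

def random_search_align(seq_a, seq_b, n_solutions=50, extra_ratio=0.3, seed=0):
    """Keep the best of n_solutions random gap placements (no update step)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_solutions):
        gaps_a, gaps_b, total_len = random_solution(seq_a, seq_b, extra_ratio, rng)
        a = insert_gaps(seq_a, gaps_a, total_len)
        b = insert_gaps(seq_b, gaps_b, total_len)
        score = count_matches(a, b)
        if best is None or score > best[0]:
            best = (score, a, b)
    return best
```

In a full G-Aligner run, the gap-index arrays would be the quantities moved by the updating equations, with indices rounded and kept distinct after every move.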
5. The improved SCA algorithm based on mutation operator and opposition (SP-MO) for G-Aligner
This section presents the improved SCA algorithm, which integrates SCA with PSO and uses mutation and opposition operators. In the proposed algorithm (SP-MO), the agents ($x_{ij}$) are divided into groups that update their movements with the SCA algorithm to explore the search space. Then, before the following group is updated, the best agent of the current group ($y_i$) is determined and updated toward the global best agent of all agents ($y_{gbest}$) using the PSO operators, for exploitation of the search space. The global best solution ($y_{gbest}$) is updated if any solution achieves better fitness.
This integration mechanism between SCA and PSO balances the exploration and exploitation of the search space. As mentioned in Section 4, the locations of the gaps in each sequence represent the solution of the optimization problem, and they need to be moved over the entire length of the sequences to maximize the similarity score. Because the sequences are very long, there is a high possibility of trapping in local minima, which are defined for G-Aligner as follows.
If the gap locations of two solutions are close, the fitness values of the two solutions are approximately the same. Hence, the local minima of G-Aligner are the locations where solutions come close together and become approximately stable (moving within a small range of locations), which leads to approximately equal fitness (similarity score). Fig. 4 represents the local minima of G-Aligner, where the vertical blue arrow is the alignment score scale (fitness of solutions) and the seven circles represent the fitness of the solutions. Circle (2) has the highest similarity score, circles (3), (4), and (6) have fitness larger than the average score (red line), and circles (1), (5), and (7) have fitness smaller than the average line.
Fig. 4.
The local minima of the proposed G-Aligner based on the SP-MO algorithm.
The solutions are attracted toward the global best solution (circle 2). As shown in the figure, solutions (3), (4), and (6) have fitness close to that of the best solution (circle 2), which means that the locations of their inserted gaps are close to those of the best solution. The gaps of solutions (1), (5), and (7) are located far from those of the best solution, so they produce lower similarity scores than the best solution (smaller than the average score). Solutions (3), (4), and (6) therefore become stable and move in small steps, which means they lie in local minima. Hence, opposing solutions (3), (4), and (6) may produce better fitness and improve the best fitness found.
A solution is considered trapped in local minima if the difference between its fitness and the best fitness found is lower than the average fitness of all solutions. As shown in Fig. 4, for circle (6) the distance to the best solution (circle 2) is lower than the average fitness, so it needs to be opposed, while for circle (7) the distance is larger than the average fitness, which means that the gap locations of solution (7) are far from those of the best solution.
Opposition is applied to the search agents of the upper layer of ASCA-PSO because they influence the movement updates of the search agents in the bottom layer. Therefore, the search agents in the bottom layer also explore and enhance their fitness if their best solution in the upper layer is opposed. The condition for a solution being trapped in local minima and the opposition procedure are as follows:
where the two arrays hold the positions of the inserted gaps in the aligned sequences A and B, i is the index of the search agent in the upper layer, ($F_{gbest}$) is the global best alignment score among all search agents of the bottom and upper layers, and ($F_i$) is the alignment score of search agent (i).
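The trapping condition and an opposition step can be illustrated with the sketch below. The condition follows the rule stated in this section (the gap to the best fitness is smaller than the average fitness). The opposition formula is an assumption: it uses the usual opposition-based learning rule, lower bound plus upper bound minus the value, which for a gap index p in an aligned array of length L gives (L - 1) - p. Both function names are hypothetical.

```python
def is_trapped(fitness_i, best_fitness, all_fitness):
    """True if agent i is closer to the best fitness than the average fitness."""
    average = sum(all_fitness) / len(all_fitness)
    return (best_fitness - fitness_i) < average

def oppose_gap_positions(gap_positions, total_len):
    """Mirror every gap index inside [0, total_len - 1] (assumed opposition rule)."""
    return sorted((total_len - 1) - p for p in gap_positions)
```

With fitness values 10, 8, and 2, the agent scoring 8 satisfies the condition and would be opposed, while the agent scoring 2 would not.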
A mutation operator is applied to the updating equations of the bottom layer (the SCA search agents) to help avoid the local minima of the problem, since mutation operators have succeeded in enhancing the exploration capability of many meta-heuristic techniques by increasing the diversity of the generated solutions [73], [74], [75], [76], [77], [78]. Two mutation operators commonly used to enhance meta-heuristic techniques are the Gaussian mutation (GM) and Cauchy mutation (CM) operators.
Previous studies showed that the CM operator has a more efficient search capability than the GM operator [73], [77], [79], [80]. The main reason is that the CM operator has a distribution that is broader in the horizontal direction than in the vertical one, whereas the GM operator has a distribution that is broader in the vertical direction. This is the main motivation for using the CM operator.
The density function of the CM operator is as follows:
| (10) |
where g is the proportion parameter and is assigned the value (1) [77], and rand is a uniform random generator function in the range (0,1).
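A Cauchy-distributed mutation value can be drawn from a uniform random number with the inverse cumulative distribution function of the Cauchy distribution, g·tan(π(u − 0.5)). The sketch below uses this standard sampling identity; whether it matches the exact form of Eq. (10) is an assumption, and the function names are hypothetical.

```python
import math
import random

def cauchy_from_uniform(u, g=1.0):
    """Map a uniform value u in (0, 1) to a Cauchy variate with scale g."""
    return g * math.tan(math.pi * (u - 0.5))

def cauchy_mutation(rng, g=1.0):
    """Draw one Cauchy mutation step using the supplied random generator."""
    return cauchy_from_uniform(rng.random(), g)
```

Compared with Gaussian samples of unit variance, these values land far from zero much more often, which is the broader horizontal spread that favors exploration.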
The updating equations of SP-MO after adding the CM operator are as follows:
| (11) |
| (12) |
| (13) |
where ($x_{ij}$) represents the position of a gap in the sequences, (i) is the index of the group, and (j) is the index of the agent in the group. ($y_i$) is the best solution of the search agents in group (i) and is updated according to Eqs. (12), (13) (the updating equations of PSO) toward the global best solution found among all search agents ($y_{gbest}$). The Cauchy mutation operator is used to increase the exploration of the solutions in the bottom layer and thus the diversity of solutions.
6. Experimental results and discussion
The performance of the proposed SP-MO algorithm was evaluated on a set of unimodal and multimodal benchmark mathematical functions from IEEE CEC [81]. In addition, the optimized global alignment technique (G-Aligner) using SP-MO was tested on real biological protein sequences (Homo sapiens proteins) gathered from NCBI to validate its performance in measuring the similarity between pairs of sequences. The similarity scores found were compared with those found by the exact NW global alignment [33] to validate the quality of the G-Aligner solutions. The similarity of the COVID-19 protein to 13 other viruses was also measured to validate the performance of G-Aligner based on SP-MO. The results of the SP-MO algorithm were compared with other recent developments of SCA in the literature: m-SCA, ISCA, SCA-DE, SCA-PSO, ASCA-PSO, SCA-GWO, and CSCF.
6.1. Evaluation of SP-MO’s performance on mathematical benchmark functions
In this section, the developed SP-MO algorithm was tested on 15 mathematical benchmark functions (unimodal and multimodal) that are described in Table 1. Table 2 shows the average best values found over 30 independent runs by the proposed SP-MO algorithm and by other algorithms from the literature for the functions in Table 1.
Table 1.
Benchmark of mathematical test functions (Dimension 50).
| Function | Bounds | Optimum |
|---|---|---|
| F1 | | 0 |
| F2 | | 0 |
| F3 | | 0 |
| F4 | | 0 |
| F5 | | 0 |
| F6 | | 0 |
| F7 | | 0 |
| F8 | | 0 |
| F9 | | 0 |
| F10 | [−100,100] | 0 |
| F11 | [−10,10] | 0 |
| F12 | [0, ] | 0 |
| F13 | [−100,100] | 0 |
| F14 | [−2,2] | 0 |
| F15 | [−10,10] | 0 |
Table 2.
The average results for all algorithms for 30 independent runs.
| F | SP-MO | m-SCA | SCA | PSO | ISCA | SCA-DE | ASCA-PSO | SCA-PSO | SCA-GWO | CSCF |
|---|---|---|---|---|---|---|---|---|---|---|
| F1 | 3.20E−07 | 0.37 | 2.3 | 1.21 | 1.20 | 1.50E−14 | 2.30E−15 | 0.62 | 2.60E−13 | 0.034 |
| F2 | 2.30E−06 | 0.68 | 82.5 | 11.69 | 3.20 | 0.09 | 0.12 | 2.10 | 0.21 | 3.67 |
| F3 | 8.50E−17 | 0.13 | 0.28 | 0.26 | 0.37 | 2.10E−4 | 0.002 | 0.008 | 4.50E−05 | 0.00243 |
| F4 | 0 | 0.80 | 1.10 | 0.7 | 0.34 | 0.007 | 4.20E−3 | 0.06 | 3.01E−03 | 8.98E−03 |
| F5 | 5.20E−68 | 7.00E−06 | 1.85E−14 | 0.08 | 0.07 | 1.90E−16 | 7.30E−71 | 9.10E−20 | 8.50E−12 | 10.4E−03 |
| F6 | 3.40E−15 | 0.71 | 12.1 | 5.20 | 7.89 | 7.60E−08 | 6.20E−06 | 1.20 | 0.98 | 2.45 |
| F7 | 4.5E−128 | 1.52 | 90 | 120.8 | 3.20 | 2.30E−16 | 4.10E−68 | 5.73 | 6.70E−10 | 3.95 |
| F8 | 4.80E−14 | 4.20E−02 | 2.40 | 18.71 | 0.30 | 3.48E−04 | 8.52E−06 | 1.77 | 6.54E−09 | 1.24 |
| F9 | 2.20E−65 | 0.08 | 5.2 | 3.50 | 0.09 | 2.50E−04 | 4.10E−10 | 0.06 | 1.03 | 2.34 |
| F10 | 1.6E−137 | 0.13 | 2.76 | 2.63 | 0.40 | 4.50E−26 | 0.004 | 0.12 | 3.45E−03 | 0.245 |
| F11 | 2.19E−04 | 3.40 | 5.13 | 4.31 | 1.03 | 1.20 | 3.25 | 1.4 | 0.97 | 1.89 |
| F12 | 0 | 2.30 | 4.93 | 6.20 | 1.42 | 4.60E−03 | 3.60E−02 | 0.0008 | 0.004 | 4.9E−03 |
| F13 | 0 | 1.30 | 4.20 | 3.65 | 0.43 | 3.65E−05 | 2.10E−02 | 0.05 | 8.64E−06 | 1.045 |
| F14 | 2.07E−06 | 0.70 | 6.70 | 3.56 | 0.93 | 1.30E−03 | 0.47 | 1.03 | 0.067 | 2.53 |
| F15 | 3.67E−13 | 3.20 | 7.23 | 4.23 | 2.10 | 2.30E−04 | 0.004 | 0.03 | 3.7E−03 | 0.00078 |
SP-MO outperforms the other algorithms, reaching minimum fitness values near the optimum for all functions. SCA-DE, ASCA-PSO, SCA-PSO, SCA-GWO, and CSCF came near the optimum for some functions, but with lower accuracy than SP-MO. SCA-DE provided poor results for functions (, , , F10, , and ), and ASCA-PSO is poor for functions (, , , F10, , and ). SCA-PSO provided poor results for functions (, , , F6, , , , , and ), and SCA-GWO is poor for all functions except (, , , and ).
The remaining algorithms gave poor results on nearly all functions, which reflects the strength of the proposed method (SP-MO). The mutation operator, together with the opposition operator that is applied when the solutions become close to one another, helps the search avoid local minima and explore the search space efficiently.
In addition, the synergy of exploration and exploitation of the search space using SCA and PSO enhances the quality of the solutions.
Table 3 shows the standard deviation of the results provided by SP-MO in comparison with the other algorithms in the literature. As shown in Table 3, the proposed method SP-MO provided the lowest standard deviation for most functions, while the other algorithms generally produced higher standard deviations. This reflects the robustness of SP-MO and shows the significance of using the mutation and opposition operators to search the space more accurately.
Table 3.
Standard Deviation of SP-MO versus comparative algorithms.
| F | SP-MO | m-SCA | SCA | PSO | ISCA | SCA-DE | ASCA-PSO | SCA-PSO | SCA-GWO | CSCF |
|---|---|---|---|---|---|---|---|---|---|---|
| F1 | 3.20E−07 | 0.757 | 3.41 | 1.21 | 0.97 | 0.62 | 0.65 | 0.37 | 0.068 | 0.236 |
| F2 | 0 | 2.09 | 17.6 | 11.69 | 2.68 | 1.36 | 3.64 | 8.68 | 0.543 | 0.326 |
| F3 | 8.84E−22 | 0.234 | 0.37 | 0.26 | 0.30 | 0.03 | 0 | 0.006 | 0.017 | 0.085 |
| F4 | 1.49E−28 | 0.702 | 0.34 | 0.97 | 0.90 | 0.06 | 0 | 0.12 | 0.039 | 0.466 |
| F5 | 1.85E−14 | 0.624 | 1.27 | 0.08 | 0.80 | 0 | 3.12E−12 | 7.00E−06 | 0.0234 | 0.443 |
| F6 | 0 | 0.554 | 14.4 | 3.42 | 0.71 | 2.34 | 2.63 | 0.71 | 2.06 | 0.026 |
| F7 | 0.003 | 0.016 | 0.48 | 2.14 | 0.02 | 6.10E−09 | 6.20E−06 | 0.02 | 6.8E−10 | 0.019 |
| F8 | 0 | 0.001 | 2.03 | 4.20 | 1.80E−03 | 4.50E−06 | 2.80E−13 | 1.80E−16 | 4.3E−06 | 0 |
| F9 | 0 | 2.668 | 18.49 | 120.8 | 3.42 | 0.08 | 11.07 | 0.08 | 0.018 | 3.056 |
| F10 | 0 | 0 | 5.60 | 18.71 | 3.05E−04 | 1.77 | 0.29 | 3.05E−02 | 1.371 | 0 |
| F11 | 6.38E−08 | 0.062 | 18.18 | 0.88 | 0.08 | 0.06 | 0.18 | 0.08 | 0.020 | 0.034 |
| F12 | 0 | 0 | 1.20 | 4.49E−02 | 4.40E−04 | 7.10E−08 | 9.20E−09 | 4.40E−04 | 7.0E−08 | 0 |
| F13 | 2.65E−07 | 0.881 | 7.78 | 2.63 | 1.13 | 0.12 | 0.48 | 0.73 | 0.021 | 0.075 |
| F14 | 0.0052 | 2.652 | 5.13 | 4.31 | 3.40 | 1.4 | 0.02 | 2.31 | 0.699 | 2.136 |
| F15 | 1.90E−16 | 1.638 | 3.54 | 1.23 | 2.10 | 8.30E−07 | 4.36E−8 | 1.00E−03 | 2.1E−07 | 1.284 |
Overall, the evaluation of SP-MO on the benchmark mathematical functions indicates its superiority over other algorithms in the literature in terms of solution quality and robustness.
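The averages and standard deviations reported for the benchmark functions summarize 30 independent runs per function. The sketch below illustrates that protocol only. The sphere function is used as a common unimodal test function (whether it is one of the functions in Table 1 is an assumption), and `random_search` is a hypothetical placeholder optimizer, not SP-MO.

```python
import random
import statistics


def sphere(x):
    """Sum of squares; its minimum value is 0 at the origin."""
    return sum(v * v for v in x)


def random_search(f, dim, lower, upper, evaluations, rng):
    """Placeholder optimizer: keep the best of a number of uniform random samples."""
    best = float("inf")
    for _ in range(evaluations):
        candidate = [rng.uniform(lower, upper) for _ in range(dim)]
        best = min(best, f(candidate))
    return best


def summarize_runs(f, dim=50, lower=-100.0, upper=100.0,
                   runs=30, evaluations=200, seed=0):
    """Return (mean, sample standard deviation) of the best value over independent runs."""
    results = []
    for r in range(runs):
        rng = random.Random(seed + r)  # a separate seed makes each run independent
        results.append(random_search(f, dim, lower, upper, evaluations, rng))
    return statistics.mean(results), statistics.stdev(results)
```

Replacing `random_search` with any optimizer that returns its best fitness value gives the two summary statistics for that optimizer.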
6.2. Estimating the similarity of biological sequences using G-Aligner based on SP-MO algorithm
In this section, the performance of G-Aligner based on SP-MO was evaluated in measuring the similarity of biological sequences (a set of Homo sapiens proteins) and in finding the similarity of the COVID-19 virus to other viruses. The NW alignment algorithm provides the exact alignment score (similarity); hence, it was used as the reference in the comparison.
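For orientation, the exact reference score can be computed with the textbook Needleman–Wunsch dynamic program [33]. The sketch below returns only the score and keeps two rows of the matrix in memory. The function name and the default weights (+1 for a match, 0 otherwise, following the scoring described for the fitness function) are assumptions of this sketch; other weights can be passed explicitly.

```python
def needleman_wunsch_score(a, b, match=1.0, mismatch=0.0, gap=0.0):
    """Return the optimal global alignment score of sequences a and b.

    Time is proportional to len(a) * len(b), which is why the exact method
    becomes slow for long sequences.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                previous[j - 1] + pair,  # align a[i-1] with b[j-1]
                previous[j] + gap,       # gap inserted in b
                current[j - 1] + gap,    # gap inserted in a
            )
        previous = current
    return previous[len(b)]
```

With the default weights the score equals the largest number of matching residue pairs that any global alignment of the two sequences can contain.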
The execution time of G-Aligner based on different techniques was tested on a set of biological sequence pairs whose product of lengths ranges from 100,000 to 9,000,000. G-Aligner was implemented in MATLAB on a computer with a Core i3 processor (3.14 GHz for each processor) and 4 GB of RAM. The number of iterations of the meta-heuristic techniques is 200, and the number of search agents for the different techniques was assigned as in Table 4 according to the product of the lengths of the aligned sequences. Table 5 shows the parameter settings of the meta-heuristic techniques used for implementing G-Aligner.
Table 4.
The number of search agents used for G-Aligner according to each technique.
| m × n | Search agents |
|---|---|
| 100000 | 10 |
| 150000 | 20 |
| 400000 | 30 |
| 700000 | 50 |
| 900000 | 80 |
| 1200000 | 100 |
| 1700000 | 130 |
| 2000000 | 150 |
| 2500000 | 180 |
| 3000000 | 200 |
| 3500000 | 220 |
| 4500000 | 250 |
| 6000000 | 300 |
| 7000000 | 350 |
| 8000000 | 380 |
| 9000000 | 400 |
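The mapping in Table 4 can be expressed as a small lookup. How products that fall between two tabulated values are handled is not stated, so the sketch below assumes rounding up to the next tabulated product and using the last row for anything larger; the function name `search_agents` is hypothetical.

```python
from bisect import bisect_left

# Product of sequence lengths and the matching number of search agents (Table 4).
PRODUCTS = [100_000, 150_000, 400_000, 700_000, 900_000, 1_200_000,
            1_700_000, 2_000_000, 2_500_000, 3_000_000, 3_500_000,
            4_500_000, 6_000_000, 7_000_000, 8_000_000, 9_000_000]
AGENTS = [10, 20, 30, 50, 80, 100, 130, 150, 180, 200, 220, 250, 300, 350, 380, 400]


def search_agents(m, n):
    """Return the number of search agents for sequences of lengths m and n."""
    if m <= 0 or n <= 0:
        raise ValueError("sequence lengths must be positive")
    # Assumption: round up to the next tabulated product, capped at the last row.
    index = min(bisect_left(PRODUCTS, m * n), len(PRODUCTS) - 1)
    return AGENTS[index]
```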
Table 5.
The setting values for the parameters of G-Aligner based on different techniques.
| Algorithm | | Parameter | Value |
|---|---|---|---|
| NW Alignment | | Match | 1.0 |
| | | | −0.5 |
| | | | −1.0 |
| G-Aligner | SCA, m-SCA, ISCA | a | 20 |
| | PSO | Inertia coefficient | 0.2 |
| | | Local coefficient (C1) | 1.5 |
| | | Global coefficient (C2) | 1.5 |
| | SP-MO, ASCA-PSO, SCA-PSO | Inertia coefficient | 0.2 |
| | | Local coefficient (C1) | 1.5 |
| | | Global coefficient (C2) | 1.5 |
| | | A | 20 |
| | SCA-DE | Beta | 0.3 |
| | | | 0.3 |
| | | A | 20 |
Fig. 5 shows the execution time of NW global alignment against G-Aligner based on meta-heuristic techniques for aligning pairs of biological sequences whose product of lengths ranges from 100,000 to 9,000,000. As shown in the figure, G-Aligner based on the various meta-heuristic techniques consumes a smaller execution time than NW global alignment, especially for longer sequences. This test verifies the significant improvement in computational time of G-Aligner over NW global alignment. However, SP-MO consumes more execution time than the other meta-heuristic algorithms used in the test because of its higher time complexity, which is one of its main limitations.
Fig. 5.
Execution time of NW-Alignment versus G-Aligner based on different metaheuristic techniques.
6.2.1. Measuring the similarity of Homo sapiens proteins using G-Aligner
The performance of G-Aligner based on the various meta-heuristic techniques in measuring the similarity score of a pair of sequences was evaluated using a set of pairs of biological protein sequences (Homo sapiens proteins) gathered from NCBI. The experiments were run with the parameter settings of Table 5, and Eq. (2) was used as the fitness function to score the similarity (+1 for matching residues and 0 otherwise).
G-Aligner was implemented using the proposed SP-MO, and the results were compared with those of standard SCA, PSO, ISCA, m-SCA, SCA-PSO, SCA-DE, ASCA-PSO, SCA-GWO, and CSCF. The results of the NW global alignment algorithm were used as a reference to validate the results of G-Aligner.
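The +1/0 scoring can be sketched as a column-by-column count over two gapped sequences of equal length. The gap symbol `-`, the helper `insert_gaps`, and the encoding of a candidate solution as a list of gap indices in the aligned sequence are assumptions made for illustration and may differ from the encoding used with Eq. (2).

```python
GAP = "-"


def insert_gaps(sequence, gap_positions, length):
    """Build an aligned sequence of the given length with gaps at gap_positions.

    Hypothetical encoding: gap_positions are indices in the aligned sequence.
    """
    gaps = {p for p in gap_positions if 0 <= p < length}
    if len(sequence) + len(gaps) != length:
        raise ValueError("residues plus gaps must fill the aligned length exactly")
    residues = iter(sequence)
    return "".join(GAP if k in gaps else next(residues) for k in range(length))


def similarity_score(aligned_a, aligned_b):
    """Count columns holding the same residue in both sequences (+1 each, 0 otherwise)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for p, q in zip(aligned_a, aligned_b) if p == q and p != GAP)
```

For example, placing one gap at index 1 of `"ACG"` gives `"A-CG"`, which scores 3 against `"ATCG"`.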
Table 6 presents the similarity scores measured by G-Aligner using SP-MO and the other techniques. The Protein ID column lists the protein IDs of the biological sequence pairs used in the test. G-Aligner based on SP-MO provided the highest score for all pairs, with an average score of 75% of that provided by the exact global alignment algorithm (NW), which demonstrates its advantage over all the other algorithms in the comparison. G-Aligner based on SCA and PSO achieved similar results: the average similarity scores are 39% for SCA and 38% for PSO relative to those measured by NW global alignment. ISCA and m-SCA achieved average similarity scores of 42%, which is no significant enhancement over SCA for G-Aligner. However, m-SCA produces a smaller standard deviation of the experimental results than ISCA and SCA, as shown in Table 7.
Table 6.
The average similarity scores using G-Aligner based on meta-heuristic techniques versus exact scores of NW global alignment.
| No. | Protein ID (length) | NW | SCA | PSO | ISCA | m-SCA | SCA-DE | SCA-PSO | ASCA-PSO | SCA-GWO | CSCF | SP-MO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Q08AH3 Q9ULC5 | 94 | 36 | 34 | 40 | 41 | 44 | 46 | 39 | 38 | 35 | 69 |
| 2 | P18089 Q6P093 | 53 | 22 | 25 | 24 | 23 | 25 | 25 | 22 | 21 | 18 | 39 |
| 3 | Q9Y2D8 Q5TYW2 | 96 | 36 | 34 | 39 | 41 | 43 | 46 | 37 | 39 | 36 | 70 |
| 4 | Q9UBJ2 Q8NE71 | 107 | 41 | 38 | 41 | 46 | 50 | 50 | 46 | 47 | 44 | 77 |
| 5 | Q9H172 Q9H222 | 131 | 50 | 47 | 52 | 58 | 59 | 63 | 51 | 53 | 50 | 98 |
| 6 | Q12979 Q96P50 | 129 | 53 | 49 | 57 | 58 | 58 | 65 | 58 | 57 | 54 | 90 |
| 7 | Q12979 Q15027 | 125 | 46 | 41 | 53 | 57 | 59 | 62 | 55 | 52 | 49 | 98 |
| 8 | Q9UG63 O95870 | 72 | 28 | 29 | 32 | 30 | 33 | 34 | 28 | 31 | 28 | 56 |
| 9 | Q8WWZ7 Q96GR2 | 93 | 34 | 33 | 40 | 39 | 42 | 45 | 38 | 38 | 35 | 66 |
| 10 | O95870 Q6H8Q1 | 83 | 33 | 35 | 38 | 34 | 37 | 40 | 34 | 31 | 28 | 60 |
| 11 | O95342 Q96J66 | 192 | 74 | 69 | 78 | 80 | 83 | 95 | 79 | 77 | 74 | 138 |
| 12 | Q8IUA7 Q6H8Q1 | 209 | 75 | 76 | 81 | 91 | 95 | 104 | 92 | 91 | 87 | 165 |
| 13 | P55198 Q96J66 | 126 | 45 | 40 | 59 | 53 | 59 | 60 | 54 | 54 | 50 | 97 |
| 14 | Q8NFM4 Q9BZC7 | 156 | 59 | 57 | 68 | 68 | 68 | 78 | 64 | 64 | 60 | 113 |
| 15 | Q9UKV3 Q07912 | 171 | 63 | 60 | 73 | 70 | 75 | 86 | 71 | 71 | 67 | 124 |
| 16 | A8K2U0 Q8NFM4 | 172 | 60 | 55 | 67 | 72 | 81 | 87 | 79 | 80 | 76 | 128 |
| 17 | O60706 Q9UKV3 | 158 | 57 | 55 | 65 | 69 | 74 | 80 | 70 | 74 | 70 | 115 |
| 18 | O43306 Q6IQ32 | 140 | 53 | 49 | 54 | 59 | 61 | 70 | 56 | 60 | 56 | 106 |
| 19 | Q7Z5R6 Q8N961 | 27 | 13 | 11 | 14 | 12 | 13 | 14 | 8 | 12 | 8 | 20 |
| 20 | A0PJZ0 Q96IX9 | 26 | 39 | 39 | 58 | 50 | 49 | 48 | 46 | 46 | 42 | 80 |
| 21 | Q96IX9 P86434 | 20 | 8 | 8 | 10 | 10 | 10 | 11 | 10 | 7 | 3 | 15 |
| 22 | Q96IU4 Q969K4 | 37 | 16 | 14 | 19 | 19 | 19 | 20 | 14 | 14 | 10 | 27 |
| 23 | P14060 Q7L8J4 | 50 | 20 | 20 | 25 | 26 | 26 | 25 | 22 | 23 | 18 | 38 |
| 24 | J3QRE5 H7C0G5 | 29 | 11 | 12 | 14 | 15 | 15 | 16 | 12 | 15 | 10 | 22 |
| 25 | P04229 P13761 | 245 | 94 | 90 | 115 | 127 | 130 | 121 | 122 | 126 | 121 | 183 |
| 26 | P14060 Q7L8J4 | 157 | 63 | 57 | 76 | 82 | 78 | 79 | 72 | 75 | 70 | 116 |
| 27 | Q8R4X Q8VD53 | 32 | 14 | 12 | 16 | 17 | 16 | 17 | 11 | 12 | 7 | 23 |
| 28 | P68510 P63101 | 148 | 61 | 58 | 68 | 73 | 72 | 79 | 69 | 68 | 63 | 105 |
| 29 | A0PJZ0 Q96IX9 | 26 | 11 | 11 | 12 | 13 | 14 | 14 | 11 | 11 | 9 | 20 |
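The percentages quoted in the text are scores expressed relative to the exact NW score. As a worked illustration, the sketch below computes that ratio for the first three pairs of Table 6 using the NW and SP-MO columns; the variable and function names are hypothetical, and only three rows are used to keep the example short.

```python
# (protein pair, NW score, SP-MO score) for the first three rows of Table 6.
ROWS = [
    ("Q08AH3 Q9ULC5", 94, 69),
    ("P18089 Q6P093", 53, 39),
    ("Q9Y2D8 Q5TYW2", 96, 70),
]


def relative_score(exact, estimate):
    """Estimated score as a percentage of the exact score."""
    return 100.0 * estimate / exact


def average_relative_score(rows):
    """Mean of the per-pair percentages."""
    ratios = [relative_score(exact, estimate) for _, exact, estimate in rows]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for name, exact, estimate in ROWS:
        print(f"{name}: {relative_score(exact, estimate):.1f}% of NW")
    print(f"average: {average_relative_score(ROWS):.1f}% of NW")
```

For these three pairs the average is a little over 73%, in line with the overall figure of about 75% reported for the full set.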
Table 7.
The standard deviation of G-Aligner based on different meta-heuristic techniques for 20 independent runs.
| No. | Protein ID (length) | SCA | PSO | ISCA | m-SCA | SCA-DE | SCA-PSO | ASCA-PSO | SCA-GWO | CSCF | SP-MO |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Q08AH3 Q9ULC5 | 2.33 | 3.57 | 2.5 | 0.9 | 0.91 | 0.3 | 0.57 | 1.11 | 1.01 | 0.61 |
| 2 | P18089 Q6P093 | 1.84 | 2.01 | 1.41 | 0.75 | 0.63 | 0.26 | 1.58 | 0.83 | 0.92 | 0.80 |
| 3 | Q9Y2D8 Q5TYW2 | 2.69 | 1.81 | 1.27 | 1.01 | 0.91 | 0.59 | 0.78 | 1.11 | 1.16 | 0.37 |
| 4 | Q9UBJ2 Q8NE71 | 2.46 | 2.75 | 1.93 | 0.94 | 0.4 | 1.06 | 0.88 | 0.6 | 0.57 | 0.24 |
| 5 | Q9H172 Q9H222 | 3.17 | 4.53 | 3.17 | 1.15 | 0.74 | 0.85 | 2.42 | 0.94 | 1 | 0.8 |
| 6 | Q12979 Q96P50 | 1.64 | 3.97 | 2.78 | 0.69 | 0.48 | 0.07 | 0.7 | 0.68 | 0.83 | 0.44 |
| 7 | Q12979 Q15027 | 2.46 | 2.1 | 1.47 | 0.94 | 0.75 | 1.39 | 1.22 | 1.09 | 1.86 | 0.55 |
| 8 | Q9UG63 O95870 | 2.27 | 2.31 | 1.62 | 0.88 | 0.42 | 1.01 | 1.36 | 0.76 | 1.09 | 0.63 |
| 9 | Q8WWZ7 Q96GR2 | 2.37 | 1.63 | 1.14 | 0.91 | 1.1 | 0.88 | 1.04 | 1.44 | 1.78 | 0.4 |
| 10 | O95870 Q6H8Q1 | 1.96 | 2.95 | 2.07 | 0.79 | 0.47 | 0.8 | 0.89 | 0.81 | 1.12 | 0.74 |
| 11 | O95342 Q96J66 | 4.5 | 9.27 | 6.49 | 1.55 | 0.82 | 2.42 | 2.66 | 1.16 | 1.29 | 0.91 |
| 12 | Q8IUA7 Q6H8Q1 | 2.37 | 3.17 | 2.22 | 0.91 | 1 | 0.72 | 4.05 | 1.34 | 1.22 | 0.23 |
| 13 | P55198 Q96J66 | 2.72 | 1.23 | 0.86 | 1.01 | 0.61 | 1.49 | 0.94 | 0.95 | 1.7 | 0.4 |
| 14 | Q8NFM4 Q9BZC7 | 3.77 | 1.4 | 0.98 | 1.33 | 0.78 | 0.37 | 0.77 | 1.12 | 0.99 | 0.48 |
| 15 | Q9UKV3 Q07912 | 1.07 | 2.4 | 1.68 | 0.52 | 0.43 | 0.02 | 1.58 | 0.77 | 1.4 | 0.72 |
| 16 | A8K2U0 Q8NFM4 | 5.96 | 2.88 | 2.01 | 1.99 | 1.86 | 4.04 | 1.31 | 2.2 | 2.76 | 0.84 |
| 17 | O60706 Q9UKV3 | 1.84 | 1.84 | 1.29 | 0.75 | 0.4 | 0.42 | 0.41 | 0.74 | 0.67 | 0.95 |
| 18 | O43306 Q6IQ32 | 2 | 1.96 | 1.37 | 0.8 | 0.53 | 0.45 | 1.55 | 0.87 | 1.01 | 0.66 |
| 19 | Q7Z5R6 Q8N961 | 1.4 | 1.4 | 0.98 | 0.62 | 0.9 | 0.91 | 0.95 | 1.07 | 1.44 | 0.36 |
| 20 | A0PJZ0 Q96IX9 | 2.25 | 3.44 | 2.41 | 0.88 | 0.71 | 0.99 | 1.85 | 0.88 | 0.86 | 0.61 |
| 21 | Q96IX9 P86434 | 2.52 | 2.64 | 1.78 | 0.43 | 0.21 | 0.85 | 0.97 | 0.38 | 0.3 | 0.28 |
| 22 | Q96IU4 Q969K4 | 2.31 | 2.5 | 1.63 | 0.72 | 0.43 | 0.70 | 0.94 | 0.6 | 0.72 | 0.25 |
| 23 | P14060 Q7L8J4 | 2.58 | 2.41 | 1.51 | 0.73 | 0.38 | 0.88 | 0.84 | 0.55 | 1.16 | 0.16 |
| 24 | J3QRE5 H7C0G5 | 2.48 | 2.28 | 1.64 | 0.88 | 0.25 | 0.73 | 0.96 | 0.42 | 1.23 | 0.45 |
| 25 | P04229 P13761 | 2.33 | 2.25 | 1.81 | 0.75 | 0.33 | 0.90 | 1.2 | 0.5 | 1.43 | 0.28 |
| 26 | P14060 Q7L8J4 | 2.22 | 2.36 | 1.65 | 0.44 | 0.43 | 0.92 | 0.86 | 0.6 | 0.75 | 0.47 |
| 27 | Q8R4X Q8VD53 | 2.24 | 2.34 | 1.44 | 0.6 | 0.43 | 0.93 | 0.78 | 0.6 | 0.63 | 0.36 |
| 28 | P68510 P63101 | 2.55 | 2.38 | 1.71 | 0.67 | 0.41 | 0.84 | 1.30 | 0.58 | 0.73 | 0.19 |
| 29 | A0PJZ0 Q96IX9 | 2.48 | 2.33 | 1.64 | 0.87 | 0.47 | 0.94 | 1.03 | 0.64 | 0.97 | 0.55 |
G-Aligner based on the hybrid techniques SCA-DE and SCA-PSO achieved average similarity scores of 46% and 49%, respectively, and both techniques have a smaller standard deviation than SCA. The hybrid of SCA and PSO (ASCA-PSO) greatly enhances the performance of SCA, achieving 51.5% of the exact similarity scores found by NW global alignment. ASCA-PSO has standard deviations close to those of SCA-DE and SCA-PSO. SCA-GWO provided an average score of 45%, while CSCF had an average score of 44%.
The proposed technique SP-MO for G-Aligner achieved the highest similarity scores, at 75% of the exact similarity scores found by NW global alignment, with a low standard deviation relative to the other techniques, as shown in Table 7. This verifies the superiority of G-Aligner based on SP-MO over the other algorithms in the literature in finding a similarity score as near to the exact one as possible in a short time.
6.2.2. Measuring similarity of COVID-19 versus other viruses using G-Aligner
The performance of G-Aligner was validated by measuring the similarity of the COVID-19 virus to other viruses, where all the viral proteins were gathered from NCBI. The viruses are (1) Middle East respiratory syndrome coronavirus (MERS-CoV), (2) Malaria, (3) Hepatitis C, (4) Hepatitis B, (5) Epstein–Barr virus (HHV-4), (6) Influenza A, (7) Influenza B, (8) Simian immunodeficiency virus, (9) Trachea Infections, (10) Severe acute respiratory syndrome coronavirus (SARS-CoV), (11) Dengue virus, (12) Cowpox virus, and (13) Alveolar proteinosis.
Fig. 6 compares the similarity of the COVID-19 virus to these viruses as measured by G-Aligner based on SP-MO and ASCA-PSO, with NW global alignment as the reference. The horizontal axis represents the index of the virus, while the vertical axis represents the similarity score. As shown in the figure, G-Aligner using SP-MO produces a similarity score higher than that of ASCA-PSO, at 75% of that measured by NW global alignment.
Fig. 6.
Similarity scores of aligning COVID-19 against 13 viruses based on G-Aligner using SP-MO versus ASCA-PSO and NW algorithm [33].
Fig. 7 compares the similarity scores between COVID-19 and the other viruses obtained by G-Aligner based on SP-MO and on the enhanced versions of SCA from the literature. As shown in Fig. 7(a), m-SCA and ISCA achieved similar scores that are slightly better than those of SCA and PSO. In Fig. 7(b), SCA-DE and SCA-PSO achieved similar scores, but ASCA-PSO beat them. In Fig. 7(c), SP-MO beat SCA-GWO and CSCF by a significant margin.
Fig. 7.
Similarity scores of aligning COVID-19 against 13 viruses based on G-Aligner using SP-MO versus various stochastic techniques in the literature.
From Figs. 6 and 7, we can conclude that G-Aligner based on SP-MO gives the highest similarity scores when aligning COVID-19 with the 13 viruses, reaching 75% of the score measured by NW global alignment in a reasonable time. SP-MO beat all the compared algorithms from the literature because of its hybrid mechanism, which balances exploration and exploitation using SCA and PSO, respectively. In addition, the mutation and opposition operators enhance the exploration of the search space, especially for very long sequences. SP-MO also avoids trapping in local optima: when the solutions become close to one another, the opposition operator is applied to diversify them.
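The opposition step described above can be sketched with the usual opposition-based learning rule, in which a coordinate x in [lower, upper] is mapped to lower + upper − x. The closeness test used here (largest coordinate difference from the best solution within a tolerance) and the names `opposite`, `too_close`, and `diversify` are assumptions of this sketch; the exact condition used by SP-MO is not restated in this section.

```python
def opposite(position, lower, upper):
    """Opposition-based learning: reflect each coordinate within [lower, upper]."""
    return [lower + upper - x for x in position]


def too_close(a, b, tolerance):
    """True when no coordinate of a differs from b by more than tolerance (assumed test)."""
    return max(abs(p - q) for p, q in zip(a, b)) <= tolerance


def diversify(agents, best, lower, upper, tolerance=1e-3):
    """Replace agents that have collapsed onto the best solution by their opposites.

    The caller is assumed to keep its own copy of the best solution.
    """
    return [
        opposite(agent, lower, upper) if too_close(agent, best, tolerance) else list(agent)
        for agent in agents
    ]
```

For instance, with bounds 0 and 10, an agent at [1, 1] that coincides with the best solution is moved to [9, 9], while an agent far from the best is left unchanged.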
The main advantages of G-Aligner based on SP-MO are as follows:
1. It measures the similarity score of a pair of biological sequences as a reasonable percentage of the exact score measured by the NW global alignment algorithm, in a very short time and at low cost, especially for very long sequences.
2. It can work offline or online and with any scoring weights for measuring similarity.
3. G-Aligner is easy to extend in the future by replacing the meta-heuristic technique in order to test and improve its performance.
The main limitations of the proposed method (SP-MO) are as follows:
1. It consumes more execution time than the ASCA-PSO algorithm.
2. G-Aligner based on SP-MO provides a similarity score of 75% of the exact result measured by NW global alignment, which needs further enhancement.
3. It was tested on real biological sequences with a product of lengths of up to 9,000,000; longer sequences need to be tested to evaluate and develop its performance further.
7. Conclusion
This work proposed an accelerated global alignment technique (G-Aligner) based on meta-heuristic algorithms to measure the similarity score of a pair of biological sequences in a short time and at low cost. The main benefit of G-Aligner is that it can scan biological databases quickly to filter the sequences most similar to a query sequence, with acceptable similarity measurements near the exact ones. The developed algorithm (SP-MO) was tested on a set of benchmark mathematical functions in comparison with recent related work in the literature. SP-MO outperformed the relevant studies in the literature, finding minimum fitness values near the optimum for all functions with low standard deviation. In addition, G-Aligner based on SP-MO was validated by measuring the similarity of the COVID-19 virus to 13 other viruses. G-Aligner using SP-MO measured the similarity at 75% of the exact value, but in an execution time much smaller than that of the exact global alignment.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1. Chen Y., Liu Q., Guo D. Emerging coronaviruses: genome structure, replication, and pathogenesis. J. Med. Virol. 2020;92(4):418–423. doi: 10.1002/jmv.25681.
- 2. Ge X.-Y., et al. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature. 2013;503(7477):535–538. doi: 10.1038/nature12711.
- 3. Cauchemez S., et al. Vol. 18. 2013. Transmission scenarios for middle east respiratory syndrome coronavirus (MERS-CoV) and how to tell them apart. (Euro surveillance: bulletin Europeen sur les maladies transmissibles= European communicable disease bulletin).
- 4. Tipaldi M.A., et al. How to manage the COVID-19 diffusion in the angiography suite: experiences and results of an Italian interventional radiology unit. SciMedicine J. 2020;2:1–8.
- 5. Hanscom D., et al. Polyvagal and global cytokine theory of safety and threat Covid-19–Plan B. SciMedicine J. 2020;2:9–27.
- 6. Anchordoqui L.A., Dent J.B., Weiler T.J. A physics modeling study of COVID-19 transport in air. SciMedicine J. 2020;2:83–91.
- 7. Sun X., Wandelt S., Zhang A. How did COVID-19 impact air transportation? A first peek through the lens of complex networks. J. Air Transp. Manag. 2020;89 doi: 10.1016/j.jairtraman.2020.101928.
- 8. Intissar A. A mathematical study of a generalized SEIR model of COVID-19. SciMedicine J. 2020;2:30–67.
- 9. Anchordoqui L.A., Chudnovsky E.M. A physicist view of COVID-19 airborne infection through convective airflow in indoor spaces. SciMedicine J. 2020;2:68–72.
- 10. Morawska L., et al. How can airborne transmission of COVID-19 indoors be minimised? Environ. Int. 2020;142 doi: 10.1016/j.envint.2020.105832.
- 11. Sahoo P.K., et al. Is the transmission of novel coronavirus disease (COVID-19) weather dependent? J. Air Waste Manage. Assoc. 2020:1–4. doi: 10.1080/10962247.2020.1823763.
- 12. Sahoo P.K., et al. COVID-19 pandemic: an outlook on its impact on air quality and its association with environmental variables in major cities of Punjab and Chandigarh, India. J. Air Waste Manage. Assoc. 2020:1–12.
- 13. Tobías A., Molina T. Is temperature reducing the transmission of COVID-19? Environ. Res. 2020;186 doi: 10.1016/j.envres.2020.109553.
- 14. Jamil T., et al. No evidence for temperature-dependence of the COVID-19 epidemic. Front. public health. 2020;8:436. doi: 10.3389/fpubh.2020.00436.
- 15. Holtmann M., et al. Environ. Res. 2020. Low ambient temperatures are associated with more rapid spread of COVID-19 in the early phase of the endemic.
- 16. Lalmuanawma S., Hussain J., Chhakchhuak L. Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: A review. Chaos Solitons Fractals. 2020 doi: 10.1016/j.chaos.2020.110059.
- 17. Al-Qaness M.A., et al. Optimization method for forecasting confirmed cases of COVID-19 in China. J. Clin. Med. 2020;9(3):674. doi: 10.3390/jcm9030674.
- 18. Pirouz B., et al. Investigating a serious challenge in the sustainable development process: analysis of confirmed cases of COVID-19 (new type of coronavirus) through a binary classification using artificial intelligence and regression analysis. Sustainability. 2020;12(6):2427.
- 19. Jamshidi M., et al. Artificial intelligence and COVID-19: Deep learning approaches for diagnosis and treatment. IEEE Access. 2020;8 doi: 10.1109/ACCESS.2020.3001973.
- 20. Abdel-Basset M., et al. A hybrid COVID-19 detection model using an improved marine predators algorithm and a ranking-based diversity reduction strategy. IEEE Access. 2020;8:79521–79540.
- 21. Alabool H., et al. 2020. Artificial Intelligence Techniques for Containment COVID-19 Pandemic: A Systematic Review.
- 22. Hamzah F.B., et al. Coronatracker: worldwide COVID-19 outbreak data analysis and prediction. Bull. World Health Organ. 2020;1:32.
- 23. Hazarika B.B., Gupta D. Modelling and forecasting of COVID-19 spread using wavelet-coupled random vector functional link networks. Appl. Soft Comput. 2020 doi: 10.1016/j.asoc.2020.106626.
- 24. Wynants L., et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ. 2020;369 doi: 10.1136/bmj.m1328.
- 25. Monaghan C., et al. 2020. Artificial Intelligence for COVID-19 Risk Classification in Kidney Disease: Can Technology Unmask an Unseen Disease? MedRxiv.
- 26. Nour M., Cömert Z., Polat K. A novel medical diagnosis model for COVID-19 infection detection based on deep features and Bayesian optimization. Appl. Soft Comput. 2020 doi: 10.1016/j.asoc.2020.106580.
- 27. Marques G., Agarwal D., de la Torre Díez I. Automated medical diagnosis of COVID-19 through efficientnet convolutional neural network. Appl. Soft Comput. 2020 doi: 10.1016/j.asoc.2020.106691.
- 28. Sen Gupta P.S., et al. Binding insight of clinically oriented drug famotidine with the identified potential target of SARS-CoV-2. J. Biomol. Struct. Dyn. 2020:1–7. doi: 10.1080/07391102.2020.1784795.
- 29. Kong R., et al. 2020. COVID-19 docking server: An interactive server for docking small molecules, peptides and antibodies against potential targets of COVID-19. arXiv preprint arXiv:2003.00163.
- 30. Lamptey E., Serwaa D. The use of zipline drones technology for COVID-19 samples transportation in Ghana. HighTech Innov. J. 2020;1(2):67–71.
- 31. Angurala M., et al. An internet of things assisted drone based approach to reduce rapid spread of COVID-19. J. Saf. Sci. Resil. 2020;1(1):31–35.
- 32. Kumar A., et al. A drone-based networked system and methods for combating coronavirus disease (COVID-19) pandemic. Future Gener. Comput. Syst. 2020;115:1–19. doi: 10.1016/j.future.2020.08.046.
- 33. Needleman S.B., Wunsch C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970;48(3):443–453. doi: 10.1016/0022-2836(70)90057-4.
- 34. Smith T.F., Waterman M.S. Identification of common molecular subsequences. J. Mol. Biol. 1981;147(1):195–197. doi: 10.1016/0022-2836(81)90087-5.
- 35. Ahmed N., et al. GASAL2: a GPU accelerated sequence alignment library for high-throughput NGS data. BMC Bioinformatics. 2019;20(1):520. doi: 10.1186/s12859-019-3086-9.
- 36. Mohamed Issa A.H., Ibrahim Ziedan, Ahmed Alzohairy Maximizing occupancy of GPU for fast scanning biological database using sequence alignment. J. Appl. Sci. Res. 2017;13(6)
- 37. Alawneh L., et al. A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach. Cluster Comput. 2020:1–12.
- 38. Sundfeld D., et al. Using GPU to accelerate the pairwise structural RNA alignment with base pair probabilities. Concurr. Comput.: Pract. Exper. 2020;32(10)
- 39. Kasap S., Benkrid K., Liu Y. Design and implementation of an FPGA-based core for gapped BLAST sequence alignment with the two-hit method. Eng. Lett. 2008;16(3)
- 40. Liu Y., et al. 2009 NASA/ESA Conference on Adaptive Hardware and Systems. IEEE; 2009. An fpga-based web server for high performance biological sequence alignment.
- 41. Benkrid K., et al. High performance biological pairwise sequence alignment: FPGA versus GPU versus cell BE versus GPP. Int. J. Reconfigurable Comput. 2012;2012
- 42. Benkrid K., Liu Y., Benkrid A. A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2009;17(4):561–570.
- 43. Chamberlain R., et al. Google Patents; 2008. Method and Apparatus for Protein Sequence Alignment using FPGA Devices.
- 44. Ramdas T., Egan G. TENCON 2005 2005 IEEE Region 10. IEEE; 2005. A survey of FPGAs for acceleration of high performance computing and their application to computational molecular biology.
- 45. Talbi E.-G. John Wiley & Sons; 2009. Metaheuristics: From Design To Implementation, Vol. 74.
- 46. Mirjalili S. SCA: a sine cosine algorithm for solving optimization problems. Knowl.-Based Syst. 2016;96:120–133.
- 47. Kennedy Particle swarm optimization. Neural Netw. 1995
- 48. Javidy B., Hatamlou A., Mirjalili S. Ions motion algorithm for solving optimization problems. Appl. Soft Comput. 2015;32:72–79.
- 49. Shareef H., Ibrahim A.A., Mutlag A.H. Lightning search algorithm. Appl. Soft Comput. 2015;36:315–333.
- 50. Rashedi E., Nezamabadi-Pour H., Saryazdi S. GSA: a gravitational search algorithm. Inf. Sci. 2009;179(13):2232–2248.
- 51. Abedinpourshotorban H., et al. Electromagnetic field optimization: A physics-inspired metaheuristic optimization algorithm. Swarm Evol. Comput. 2016;26:8–22.
- 52. Mirjalili S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015;89:228–249.
- 53. Yang C.-H., et al. Protein folding prediction in the HP model using ions motion optimization with a greedy algorithm. BioData min. 2018;11(1):17. doi: 10.1186/s13040-018-0176-6.
- 54. Zhao X. Advances on protein folding simulations based on the lattice HP models with natural computing. Appl. Soft Comput. 2008;8(2):1029–1040.
- 55. Bošković B., Brest J. Genetic algorithm with advanced mechanisms applied to the protein structure prediction in a hydrophobic-polar model and cubic lattice. Appl. Soft Comput. 2016;45:61–70.
- 56. Morshedian A., Razmara J., Lotfi S. A novel approach for protein structure prediction based on an estimation of distribution algorithm. Soft Comput. 2019;23(13):4777–4788.
- 57.Márquez-Chamorro A.E., et al. Soft computing methods for the prediction of protein tertiary structures: A survey. Appl. Soft Comput. 2015;35:398–410. [Google Scholar]
- 58.Pérez-Sánchez H., Cano G., García-Rodríguez J. Improving drug discovery using hybrid softcomputing methods. Appl. Soft Comput. 2014;20:119–126. [Google Scholar]
- 59.Leonhart P.F., et al. A biased random key genetic algorithm for the protein–ligand docking problem. Soft Comput. 2019;23(12):4155–4176. [Google Scholar]
- 60.Issa M., et al. ASCA-PSO: Adaptive sine cosine optimization algorithm integrated with particle swarm for pairwise local sequence alignment. Expert Syst. Appl. 2018;99:56–70. [Google Scholar]
- 61.Muppalaneni M., Ma M., Gurumoorthy S. Springer; 2019. Soft Computing and Medical Bioinformatics. [Google Scholar]
- 62.Ali A.F., Hassanien A.-E. Applications of Intelligent Optimization in Biology and Medicine. Springer; 2016. A survey of metaheuristics methods for bioinformatics applications; pp. 23–46. [Google Scholar]
- 63.Abd Elaziz M., Oliva D., Xiong S. An improved opposition-based sine cosine algorithm for global optimization. Expert Syst. Appl. 2017;90:484–500.
- 64.Gupta S., Deep K. A hybrid self-adaptive sine cosine algorithm with opposition based learning. Expert Syst. Appl. 2019;119:210–230.
- 65.Nenavath H., Jatoth R.K. Hybridizing sine cosine algorithm with differential evolution for global optimization and object tracking. Appl. Soft Comput. 2018;62:1019–1043.
- 66.Nenavath H., Jatoth R.K., Das S. A synergy of the sine-cosine algorithm and particle swarm optimizer for improved global optimization and object tracking. Swarm Evol. Comput. 2018
- 67.Hassan B.A. CSCF: a chaotic sine cosine firefly algorithm for practical application problems. Neural Comput. Appl. 2020:1–20.
- 68.Gupta S., et al. Sine cosine grey wolf optimizer to solve engineering design problems. Eng. Comput. 2020:1–27.
- 69.Mirjalili S., Mirjalili S.M., Lewis A. Grey wolf optimizer. Adv. Eng. Softw. 2014;69:46–61.
- 70.Cormen T.H. MIT Press; 2009. Introduction To Algorithms.
- 71.Xiong J. Cambridge University Press; 2006. Essential Bioinformatics.
- 72.Mount D.W. Comparison of the PAM and BLOSUM amino acid substitution matrices. Cold Spring Harbor Protoc. 2008;2008(6):pdb.ip59. doi: 10.1101/pdb.ip59.
- 73.Salgotra R., Singh U. Application of mutation operators to flower pollination algorithm. Expert Syst. Appl. 2017;79:112–129.
- 74.Zhang Q., et al. Chaos-induced and mutation-driven schemes boosting salp chains-inspired optimizers. IEEE Access. 2019;7:31243–31261.
- 75.Jia H., et al. Dynamic Harris hawks optimization with mutation mechanism for satellite image segmentation. Remote Sens. 2019;11(12):1421.
- 76.Xu Y., et al. Enhanced moth-flame optimizer with mutation strategy for global optimization. Inf. Sci. 2019;492:181–203.
- 77.Wang H., et al. 2007 IEEE Congress on Evolutionary Computation. IEEE; 2007. Opposition-based particle swarm algorithm with Cauchy mutation.
- 78.Zhang X., et al. Gaussian mutational chaotic fruit fly-built optimization and feature selection. Expert Syst. Appl. 2020;141
- 79.Wang G.-G., et al. Opposition-based krill herd algorithm with Cauchy mutation and position clamping. Neurocomputing. 2016;177:147–157.
- 80.Sapre S., Mini S. Opposition-based moth flame optimization with Cauchy mutation and evolutionary boundary constraint handling for global optimization. Soft Comput. 2019;23(15):6023–6041.
- 81.Jamil M., Yang X.-S. A literature survey of benchmark functions for global optimisation problems. Int. J. Math. Model. Numer. Optimis. 2013;4(2):150–194.









