Computational and Mathematical Methods in Medicine. 2015 Oct 7;2015:674296. doi: 10.1155/2015/674296

Survey of Natural Language Processing Techniques in Bioinformatics

Zhiqiang Zeng 1, Hua Shi 1, Yun Wu 1, Zhiling Hong 2,*
PMCID: PMC4615216  PMID: 26525745

Abstract

Informatics methods such as text mining and natural language processing are widely used in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we review how text mining is used to search for biological knowledge, retrieve references, and reconstruct databases; for example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Second, we analyze applications of text mining and natural language processing techniques to bioinformatics problems themselves, including protein structure and function prediction and noncoding RNA detection. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

1. Introduction

Text mining and natural language processing refer to comprehending and analyzing natural language by using computer algorithms and programs. Together they form an important research direction in applied artificial intelligence. Research on natural language processing and text mining dates back to the emergence of computers. With continuous and extensive research on machine learning and data mining algorithms, existing text mining technologies have achieved good results in automatic abstraction, automatic question answering, web relational network analysis, and anaphora resolution [1, 2].

Bioinformatics is an interdisciplinary field that emerged with the progress and accomplishment of the Human Genome Project. It predicts and solves life science problems related to genetics by using computer science and statistical informatics. Data storage, retrieval, and analysis are the key processes in bioinformatics [3–7]. The National Center for Biotechnology Information established various databases for biological data, including sequence databases for storing DNA and protein data (e.g., dbEST and dbSNP) [8, 9], the Online Mendelian Inheritance in Man database for storing disease data, the Gene Expression Omnibus database for storing gene chip data, and the PubMed database for storing biological and medical literature [10].

Text mining and natural language processing techniques are necessary to retrieve the knowledge a user is interested in from these expanding databases. Researchers therefore use computer algorithms and programs to retrieve papers on topics of interest, such as protein-protein interactions, from PubMed. Since the genetic code was deciphered, researchers have also determined that biological sequences, particularly protein sequences, are similar to human language in terms of composition. In addition to using text mining to retrieve bioinformatics articles directly, an increasing number of researchers therefore regard protein sequences as a special “text” and analyze them with existing text mining technologies, for example, to predict the structures and functions of proteins. The relationship between bioinformatics and natural language processing is shown in Figure 1. Based on these two aspects, we summarize the text mining technologies used in bioinformatics research. We aim to present these technologies to more bioinformatics researchers and hope that the number of researchers who can use good text mining technologies in bioinformatics studies will increase.

Figure 1. Problems and methodology relationship between NLP and bioinformatics.

2. Mining Bioinformatics Literature

The development of text mining technology plays an important role in retrieving biological literature, particularly in establishing biological information databases. A special workshop on biological literature retrieval problems was conducted during the Annual Meeting of the Association for Computational Linguistics and the Annual International Conference on Intelligent Systems for Molecular Biology in 2005 to discuss literature mining problems related to bioinformatics. Extracting protein-protein interactions and the relationship between gene functions and diseases are two leading application subjects.

2.1. Extracting Protein-Protein Interactions

Extracting the protein interaction network is an important research topic in bioinformatics and systems biology [11–14]. In previous studies, researchers searched for protein-protein interactions manually. However, with the exponential growth of biological literature, a program that can recognize protein-protein interactions automatically from PubMed abstracts is necessary. No unified naming rule for proteins has been established yet, and many proteins and genes share the same name. Consequently, recognizing protein names in literature abstracts and then determining their interactions are the key problems in applying text mining to the search for protein-protein interactions.

Initially, researchers extracted protein-protein interactions through statistical and counting methods. They manually created dictionaries of protein names and then searched for abstracts in which dictionary entries occur at least twice. Proteins that were associated in this way were presumed to interact with one another [15]. Some researchers also used dynamic programming to extract and compare protein-protein interactions [16].
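The counting approach described above can be illustrated with a minimal Python sketch. It assumes one particular reading of the counting rule, namely that a pair of dictionary entries is reported when it co-occurs in at least `min_count` abstracts. The function name `cooccurring_pairs`, the tokenization rule, and the toy abstracts are hypothetical and are not taken from the cited systems.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurring_pairs(abstracts, protein_names, min_count=2):
    """Return protein pairs that co-occur in at least `min_count` abstracts.

    Assumption: co-occurrence within an abstract is used as a crude proxy
    for interaction; no grammar or negation is considered.
    """
    # Map lowercase dictionary entries back to their canonical spelling.
    names = {name.lower(): name for name in protein_names}
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9-]+", text.lower()))
        found = sorted(names[t] for t in tokens if t in names)
        # Count each unordered pair once per abstract.
        counts.update(combinations(found, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy input, invented for illustration only.
abstracts = [
    "RAD51 binds BRCA2 during homologous recombination.",
    "BRCA2 recruits RAD51 to sites of DNA damage.",
    "TP53 is frequently mutated in tumours.",
]
pairs = cooccurring_pairs(abstracts, ["RAD51", "BRCA2", "TP53"])
```

Such a counter cannot distinguish "A binds B" from "A does not bind B", which is one reason the later methods discussed in this section analyze sentence grammar instead of raw counts.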

Extracting protein-protein interactions has been a research hot spot in bioinformatics for a long time and has attracted an increasing number of researchers in the fields of text mining and natural language processing. In these later studies, the grammar of literature abstracts is analyzed more carefully, rather than simply counting dictionary words. Kim et al. converted a complicated semantic structure analysis into calculating the shortest path in a graph by constructing a kernel [17]. Similar analysis methods of literature abstracts include grammatical analysis [18–21], context-free grammar analysis [22], ontology analysis [23], and other information retrieval methods. Protein-protein interactions are examined using these analysis methods. In addition, many machine learning methods, such as ensemble learning [24] and Bayesian networks [25], are applied to recognize protein names and interactions.

2.2. Extracting the Relationship between Gene Functions and Diseases

Extracting protein-protein interactions involves searching for two proteins in the text and determining whether they interact with each other. Similarly, extracting the relationship between gene functions and diseases also involves searching for gene names and disease names simultaneously in the literature and then determining whether a particular gene is related to a certain disease [26].

In general, this extraction process can be divided into three steps. First, the abstracts of associated papers are searched through comparison with a dictionary. Second, the search scope sometimes has to be expanded forward and backward from the location of the related word or clause to ensure accuracy. Finally, candidate facts are evaluated using grammar analysis methods or machine learning methods. Such extraction methods frequently yield good results for specific genes and diseases. Bui et al. examined the relationship between drugs and HIV mutations in PubMed [27]. Jiang et al. determined the relationship between approximately 3000 microRNAs and different diseases based on the naming rule of microRNA [28]. Cheng et al. developed a text mining system based on the relationships among human diseases, mutations, and drug effects [29]. Iossifov et al. focused on investigating cerebellar malformations in humans and mice [30]. Jensen et al. provided a detailed summary of related document databases, literature mining software, and functions [31].
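The three steps can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the pipeline of any cited system: the dictionaries `GENES` and `DISEASES`, the cue-word list `TRIGGERS`, and the sentence window are all hypothetical, and the final evaluation step is reduced to a cue-word check where a real system would use a parser or a trained classifier.

```python
import re

# Hypothetical dictionaries and cue words, for illustration only.
GENES = {"BRCA1", "TP53"}
DISEASES = {"breast cancer", "glioma"}
TRIGGERS = {"associated", "causes", "linked", "mutations"}


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(text, dictionary):
    """Case-insensitive substring lookup (crude; ignores word boundaries)."""
    lowered = text.lower()
    return {term for term in dictionary if term.lower() in lowered}


def extract_relations(abstract, window=1):
    sentences = split_sentences(abstract)
    relations = set()
    for i, sentence in enumerate(sentences):
        # Step 1: dictionary lookup of gene names.
        genes = find_terms(sentence, GENES)
        if not genes:
            continue
        # Step 2: expand the scope forward and backward by `window` sentences.
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        context = " ".join(sentences[lo:hi])
        diseases = find_terms(context, DISEASES)
        # Step 3: evaluate the candidate; here only a cue-word check.
        if any(cue in context.lower() for cue in TRIGGERS):
            relations.update((g, d) for g in genes for d in diseases)
    return relations


text = ("We sequenced BRCA1 in 200 patients. "
        "Mutations were associated with breast cancer. "
        "No other findings were made.")
relations = extract_relations(text)
```

With `window=0` the gene and the disease fall in different sentences and no relation is found, which shows why the second step is needed.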

2.3. Retrieving References

A considerable amount of bioscience literature has been published. Searching for interacting proteins and examining the relationship between genes and diseases are only two application cases. Text mining technology is required to obtain answers to many other bioscience and bioinformatics problems in various databases, such as PubMed.

Biological literature mining and related problem solving have to cope with two major problems, namely, recognizing named entities and extracting relations. These problems are mainly solved by (1) methods based on linguistic analysis [32], (2) methods based on dictionaries [33], (3) machine learning methods [34, 35], and (4) statistical methods [36].

Several important databases are also built with text mining. STRING [37] and BioGRID [38] collect protein-protein interactions with the help of literature mining. For predicting gene function, PubTator [39] and GeneCards [40] are important resources that use text mining techniques. Related works were recently reviewed in detail by Huang and Lu [41]. With the development of crowdsourcing, manual text searching and mining can also be helpful for biomedical literature collection [42].

Moreover, converting the PubMed database from Extensible Markup Language into a relational database [43] and fuzzy searching of papers and author names by matching short terms [44] are also current research hot spots.

3. Applying Text Mining Technologies to Protein Research

DNA and protein sequences are a meaningful genetic language and are regarded as the sealed book of life. Therefore, an increasing number of natural language processing and text mining algorithms are being applied in bioinformatics. For example, latent semantic analysis was applied to protein remote homology detection [45, 46], and protein spectral analysis originates from word frequency statistics in natural language processing. Furthermore, some grammar rules of protein, DNA, and RNA sequences were discovered, and several web servers were constructed to extract these features and rules [47].
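As a minimal illustration of treating a sequence as text, the following Python sketch computes relative k-mer frequencies, the sequence analogue of word (n-gram) frequency statistics. The function name `kmer_frequencies` and the example sequence are hypothetical, and the sketch is not the feature extraction used in the cited works.

```python
from collections import Counter


def kmer_frequencies(sequence, k=2):
    """Relative frequency of every overlapping k-mer ("word") in a sequence."""
    sequence = sequence.upper()
    total = len(sequence) - k + 1
    if total <= 0:
        # Sequence shorter than k: there are no k-mers to count.
        return {}
    counts = Counter(sequence[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


# Toy amino acid string, invented for illustration.
freqs = kmer_frequencies("MKVLAAGMKV", k=2)
```

The resulting frequency vector can serve as input to the classifiers discussed in the following sections, in the same way that a bag-of-words vector represents a document.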

3.1. Predicting Protein Structure

Protein structure determines function [48]. Hence, it should be analyzed to determine protein function. The structural analysis of a protein mainly focuses on its sequence and classifies regions into α-helix, β-sheet, and disordered regions. Predicting the α-helix and β-sheet regions amounts to predicting the secondary structure of the protein.

If a protein sequence is regarded as a natural language, then determining the structural type of a region is similar to grammatical tagging in natural language processing. Initially, the secondary protein structure was predicted by combining rules and statistics [49–52]. However, faced with the bottleneck of statistical prediction, some researchers have proposed using machine learning prediction methods, including methods based on artificial neural networks (ANN) [53], support vector machines (SVM) [54, 55], random forest [56–58], and maximum entropy [59].

Disordered regions of proteins are also predicted. A disordered region is an area without a stable or unique 3D structure in the spatial structure of the protein. Many text mining and machine learning methods, including ANN [60–62], SVM [63–65], conditional random fields [66], and random forest [67], have been used to predict protein disordered regions. The addresses of commonly used servers are listed in Table 1.

Table 1. Web servers for protein disorder prediction.

Problem | Name | Websites | Input format
Protein disorder prediction | DisProt | http://www.disprot.org/pondr-fit.php; http://www.disprot.org/metapredictor.php; http://www.dabi.temple.edu/disprot/predictor.php | Fasta or EMBL sequence format
Protein disorder prediction | DisEMBL | http://dis.embl.de/ | SwissProt ID
Protein disorder prediction | DRIPPRED | http://www.sbc.su.se/~maccallr/disorder/cgi-bin/submit.cgi | Only plain sequence; one sequence at a time; slow
Protein disorder prediction | FoldIndex | http://bip.weizmann.ac.il/fldbin/findex | Only plain sequence; one sequence at a time
Protein disorder prediction | IUPred | http://iupred.enzim.hu/ | SwissProt ID or plain sequence
Protein disorder prediction | PONDR | http://www.pondr.com/cgi-bin/PONDR/pondr.cgi | Fasta
Protein disorder prediction | PSIPRED | http://bioinf.cs.ucl.ac.uk/psipred/?disopred=1 | Raw sequence or fasta format
Protein disorder prediction | SCRATCH | http://scratch.proteomics.ics.uci.edu/ | Only plain sequence; one sequence at a time; slow
Protein disorder prediction | Spritz | http://distill.ucd.ie/spritz/ | Raw sequence or fasta format
Protein disorder prediction | RONN | http://www.strubi.ox.ac.uk/RONN/ | Fasta, but only one sequence at a time

3.2. Predicting Protein Function

Predicting protein function is one of the most basic research topics in bioinformatics. It involves predicting protein-protein interactions and interaction sites [68, 69], predicting protein subcellular localization [70–78], predicting and classifying transmembrane proteins [79–82], detecting protein remote homology [83, 84], classifying protein functions [85–93], recognizing multifunctional enzymes [94–96], and identifying DNA binding proteins [97, 98].

The protein sequence is easy to determine. Similar to natural language, the protein sequence has many complicated rules. However, summarizing and understanding the rules of protein sequences is difficult. Therefore, it is necessary to analyze and predict the “protein language” expressed by amino acid sequences by using computational linguistics and machine learning methods. Through these procedures, we may be able to understand the functions of protein sequences.

Predicting protein-protein interactions is one of the most basic research topics in protein function. Many researchers are committed to predicting whether two protein sequences interact. To date, many machine learning methods have been applied, including SVM [99], kernel methods [100, 101], decision trees [102, 103], random forest [104], Bayesian networks [105], and the autoregressive model [106]. Several text processing methods, such as ontology annotation and sample weighting [107], are used to detect features and process training data. When predicting protein-protein interactions, researchers also aim to analyze the region in which the interaction occurs, that is, to predict protein-protein interaction sites. Approaches commonly used in grammatical analyses, such as conditional random fields [108] and the hidden Markov model (HMM) [109], have been used to analyze interaction sites and have achieved good results. Moreover, random forest [110], SVM [111], ANN [112], Bayesian networks [113], linear regression [114], and other machine learning methods are used to predict protein-protein interaction sites. Nevertheless, some researchers doubt that the protein sequence alone provides sufficient information for predicting interactions [115]. Text mining and machine learning researchers should develop new features and classification methods to solve this problem. The websites of existing common software used to predict protein-protein interactions and interaction sites are provided in Table 2.

Table 2. Web servers for protein-protein interaction and interaction site prediction.

Problem | Name | Websites | Input format
Protein interaction sites prediction | PPISP | http://pipe.scs.fsu.edu/ppisp.html; http://pipe.scs.fsu.edu/meta-ppisp.html | PDB file
Protein interaction sites prediction | Protemot | http://protemot.csbb.ntu.edu.tw/index.html | PDB ID
Protein interaction sites prediction | SPPIDER | http://sppider.cchmc.org | PDB file or PDB ID
Protein interaction sites prediction | Whiscy | http://nmr.chem.uu.nl/Software/whiscy/index.html | PDB file
Protein-protein interaction prediction | InterPreTS | http://www.russell.embl.de/cgi-bin/tools/interprets.pl | Fasta, 40 sequences at most
Protein-protein interaction prediction | PIE | http://www.ncbi.nlm.nih.gov/CBBresearch/Wilbur/IRET/PIE/ | Gene ID or name
Protein-protein interaction prediction | PPI | http://121.192.180.204:8080/PPI/Home.jsp | Fasta
Protein-protein interaction prediction | PredHS | http://www.predhs.org/ | PDB files, 10 files at most
Protein-protein interaction prediction | Pred-PPI | http://cic.scu.edu.cn/bioinformatics/predict_ppi/default.html | Two fasta sequences
Protein-protein interaction prediction | Prism | http://cosbi.ku.edu.tr/prism/ | Two PDB IDs or PDB files
Protein-protein interaction prediction | Struct2Net | http://groups.csail.mit.edu/cb/struct2net/webserver/ | Gene names or keywords

4. Applying Natural Language Processing Techniques to Noncoding RNA Identification

4.1. Comparative RNA Prediction Methods

Alignment is also an important topic in natural language processing. DNA or RNA sequences can also be viewed as text. Sequence-based multiple sequence alignment methods can be used only at the sequence similarity level. The secondary structures of ncRNAs are usually more conserved than their sequences [116, 117]; for example, miRNA precursors share the common hairpin-like structure and tRNAs form cloverleaf structures [118, 119]. The functions of many ncRNAs are therefore determined by their secondary structure rather than by their sequences. As a result, structure-based multiple sequence alignment methods have been developed to align an input sequence to known ncRNA structures to determine the ncRNA class to which the input sequence belongs.
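To make the notion of sequence-level alignment concrete, the following Python sketch computes a global pairwise alignment score by dynamic programming (the Needleman-Wunsch recurrence). It considers sequence similarity only and is therefore an example of the sequence-based methods whose limits are noted above, not of the structure-based tools. The function name and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b (linear gap penalty)."""
    # prev[j] holds the best score of aligning the processed prefix of a
    # with b[:j]; the first row aligns the empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


score_same = global_alignment_score("GAUUACA", "GAUUACA")
score_gap = global_alignment_score("ACGU", "AGU")
```

Two ncRNAs with the same hairpin structure but diverged sequences would receive a low score from such a function, which motivates the structure-aware alignment tools described next.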

LocARNA [120] can produce fast and high-quality pairwise and multiple alignments of RNA sequences. It uses a complex RNA energy model for simultaneous folding and sequence/structure alignment of the RNAs. LocARNA performs global and local sequence alignments as well as local structural alignment of RNA molecules. An upgraded version of LocARNA, called LocARNA-P, has been developed recently [121]. The new version incorporates a probabilistic model that can compute accurate multiple alignments based on a probabilistic consistency transformation and reliability profiles for assessing local alignment quality and localizing RNA motifs. These features rely on sequence and structure match probabilities computed under the LocARNA alignment model.

Although comparative methods perform well in most cases, they have three intrinsic limitations: (1) they are highly dependent on the availability of homologous sequences or structures and cannot make predictions when no relevant sequence similarity or structure similarity is available; (2) they cannot correctly identify real ncRNAs that have low homology with known ncRNAs; and (3) they can identify only ncRNAs that are homologous with members of known ncRNA classes but cannot identify members of novel ncRNA classes. Most lncRNAs (long noncoding RNAs) cannot be predicted using comparative methods because they do not have specific structures or sequence similarity. These limitations mean that comparative methods display low specificity for identifying ncRNAs. The multiple sequence alignment tools that are currently available are listed in Table 3.

Table 3. Multiple sequence alignment tools.

4.2. Noncomparative RNA Prediction Methods

The noncomparative methods are independent of homologous information and can, therefore, detect nonconserved ncRNAs. Most noncomparative methods employ machine learning techniques to make the predictions [122], which are similar to the text mining techniques.

Because of the importance of RNA structure, several computational RNA folding tools have been developed, such as mfold, RNAfold, vsfold, evofold, and sfold. Generally, these algorithms determine the folded secondary structure from an input sequence by optimizing the intramolecular base pairing to minimize the free energy. Some miRNA identification methods are shown in Table 4, and existing RNA secondary structure prediction tools are listed in Table 5.
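The idea of folding by optimizing base pairing can be sketched with a Nussinov-style dynamic program. Note the substitution: this sketch maximizes the number of nested base pairs instead of minimizing a thermodynamic free energy, so it is a simplified stand-in and not the algorithm implemented by mfold, RNAfold, or the other tools named above. The function name, the allowed pair set (Watson-Crick plus G-U wobble), and the minimum hairpin loop length of 3 are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence (Nussinov-style).

    Assumption: a pair (k, j) is allowed only if at least `min_loop`
    unpaired bases lie between positions k and j.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


hairpin_pairs = max_base_pairs("GGGAAAUCCC")
```

Energy-based tools replace the "+1 per pair" term with experimentally derived stacking and loop energies, but the interval decomposition follows the same pattern.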

Table 4. miRNA identification methods.

Table 5. Secondary structure prediction tools.

5. Conclusion and Future Research

As research on natural language processing and text mining methods develops, different application fields will be the key to future studies. Interdisciplinary fields such as bioinformatics are becoming the focus of an increasing number of information science researchers. The application of text mining technologies and methods in bioinformatics will become a focus of text mining researchers. Meanwhile, bioinformatics researchers have to learn text mining technologies intensively to solve specific bioinformatics problems.

In biological literature retrieval, protein-protein interactions and gene-disease relationships are not the only targets. Many other problems require text mining to search a literature database for related knowledge, particularly those whose retrieval results must be updated continually. Examples include the relationship between adverse drug reactions and molecular composition and the relationships among single nucleotide polymorphism sites, diseases, and adverse drug effects.

In bioinformatics, nearly all studies related to proteomics and to predicting protein structure from amino acid sequences can be conducted using text mining and natural language processing technology. Many mature text mining technologies, such as word frequency statistics, conditional random fields, HMM, and context-free grammar, have been successfully applied to predict secondary protein structures, disordered regions, interactions, and interaction sites. However, the latest research results in text mining and natural language processing should be verified by applying them to protein and DNA languages. No effective computational method is available yet for predicting tertiary and quaternary protein structures, protein remote homology detection, protein disordered region detection, interaction network establishment, and drug target prediction. Information science researchers should develop and provide more effective algorithms. In addition, new machine learning and text mining methods (e.g., semisupervised learning and active learning) have been proposed and will be applied in biological literature retrieval and bioinformatics. At present, feedback-based recommender systems have become a new hot spot in retrieving biological literature. The Hadoop technique for big data is another hot spot for biological sequences [123].

The development of bioinformatics relies on information science. In particular, text mining and natural language processing researchers should provide a more extensive application space. Researchers of text mining algorithms should develop more effective intelligent algorithms based on the characteristics of biological data. This study not only summarizes the text mining methods used in bioinformatics and the corresponding problems but also provides the websites of successful prediction software. Text mining researchers who are involved in bioinformatics can now test and compare different types of software. The authors hope that the number of text mining researchers who can apply their own methods in bioinformatics will increase, which will facilitate the development of bioinformatics and even genetic studies.

Acknowledgments

This work was supported by Natural Science Foundation of China (Grant no. 31200769), the Natural Science Foundation of Fujian Province of China (Grants no. 2013J05103 and no. 2014J01253), Xiamen Science and Technology Planning Project (Grant no. 3502Z20143030), and Scientific Research Plan Project of Fujian Education Department (Grants nos. JB12184 and JB09203).

Conflict of Interests

The authors declare that they have no competing interests.

References

  • 1.Lin C., Huang Z., Yang F., Zou Q. Identify content quality in online social networks. IET Communications. 2012;6(12):1618–1624. doi: 10.1049/iet-com.2011.0202. [DOI] [Google Scholar]
  • 2.Chen L., Chun L., Ziyu L., Quan Z. Hybrid pseudo-relevance feedback for microblog retrieval. Journal of Information Science. 2013;39(6):773–788. doi: 10.1177/0165551513487846. [DOI] [Google Scholar]
  • 3.Li Y., Wang C., Miao Z., et al. ViRBase: a resource for virus-host ncRNA-associated interactions. Nucleic Acids Research. 2015;43(1):D578–D582. doi: 10.1093/nar/gku903. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Wang L., Qian K., Huang Y., et al. SynBioLGDB: a resource for experimentally validated logic gates in synthetic biology. Scientific Reports. 2015;5, article 8090 doi: 10.1038/srep08090. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Wang Y., Chen L., Chen B., et al. Mammalian ncRNA-disease repository: a global view of ncRNA-mediated disease network. Cell Death & Disease. 2013;4(8, article e765) doi: 10.1038/cddis.2013.292. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Zhang X., Wu D., Chen L., et al. RAID: a comprehensive resource for human RNA-associated (RNA-RNA/RNA-protein) interaction. RNA. 2014;20(7):989–993. doi: 10.1261/rna.044776.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Li Y., Zhuang L., Wang Y., et al. Connect the dots: a systems level approach for analyzing the miRNA-mediated cell death network. Autophagy. 2013;9(3):436–439. doi: 10.4161/auto.23096. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Wang J., Zou Q., Guo M. Z. Mining SNPs from EST sequences using filters and ensemble classifiers. Genetics and Molecular Research. 2010;9(2):820–834. doi: 10.4238/vol9-2gmr765. [DOI] [PubMed] [Google Scholar]
  • 9.Wang J., Zhang L., Zou Q., Tan J., Chen X., Wu Y. Association studies on mtDNA and Parkinson’s disease population discrimination using the statistical classification. Current Bioinformatics. 2014;9(5):481–489. doi: 10.2174/15748936113086660014. [DOI] [Google Scholar]
  • 10.Zou Q., Li J., Hong Q., et al. Prediction of microRNA-disease associations based on social network analysis methods. doi: 10.1155/2015/810514. BioMed Research International. In press. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Liu B., Wang X., Lin L., Tang B., Dong Q., Wang X. Prediction of protein binding sites in protein structures using hidden Markov support vector machine. BMC Bioinformatics. 2009;10, article 381 doi: 10.1186/1471-2105-10-381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Guo F., Li S. C., Du P., Wang L. Probabilistic models for capturing more physicochemical properties on protein-protein interface. Journal of Chemical Information and Modeling. 2014;54(6):1798–1809. doi: 10.1021/ci5002372. [DOI] [PubMed] [Google Scholar]
  • 13.Guo F., Li S. C., Wang L., Zhu D. Protein-protein binding site identification by enumerating the configurations. BMC Bioinformatics. 2012;13, article 158 doi: 10.1186/1471-2105-13-158. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Guo F., Li S. C., Wang L. Protein-protein binding sites prediction by 3D structural similarities. Journal of Chemical Information and Modeling. 2011;51(12):3287–3294. doi: 10.1021/ci200206n. [DOI] [PubMed] [Google Scholar]
  • 15.Huang M., Zhu X., Hao Y., Payan D. G., Qu K., Li M. Discovering patterns to extract protein-protein interactions from full texts. Bioinformatics. 2004;20(18):3604–3612. doi: 10.1093/bioinformatics/bth451. [DOI] [PubMed] [Google Scholar]
  • 16.Hao Y., Zhu X., Huang M., Li M. Discovering patterns to extract protein-protein interactions from the literature: part II. Bioinformatics. 2005;21(15):3294–3300. doi: 10.1093/bioinformatics/bti493. [DOI] [PubMed] [Google Scholar]
  • 17.Kim S., Yoon J., Yang J. Kernel approaches for genic interaction extraction. Bioinformatics. 2008;24(1):118–126. doi: 10.1093/bioinformatics/btm544. [DOI] [PubMed] [Google Scholar]
  • 18.Ono T., Hishigaki H., Tanigami A., Takagi T. Automated extraction of information on protein-protein interactions from the biological literature. Bioinformatics. 2001;17(2):155–161. doi: 10.1093/bioinformatics/17.2.155. [DOI] [PubMed] [Google Scholar]
  • 19.Fundel K., Küffner R., Zimmer R. RelEx—relation extraction using dependency parse trees. Bioinformatics. 2007;23(3):365–371. doi: 10.1093/bioinformatics/btl616. [DOI] [PubMed] [Google Scholar]
  • 20.Šarić J., Jensen L. J., Ouzounova R., Rojas I., Bork P. Extraction of regulatory gene/protein networks from Medline. Bioinformatics. 2006;22(6):645–650. doi: 10.1093/bioinformatics/bti597. [DOI] [PubMed] [Google Scholar]
  • 21.Friedman C., Kra P., Yu H., Krauthammer M., Rzhetsky A. GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics. 2001;17(1):S74–S82. doi: 10.1093/bioinformatics/17.suppl_1.s74. [DOI] [PubMed] [Google Scholar]
  • 22.Temkin J. M., Gilder M. R. Extraction of protein interaction information from unstructured text using a context-free grammar. Bioinformatics. 2003;19(16):2046–2053. doi: 10.1093/bioinformatics/btg279. [DOI] [PubMed] [Google Scholar]
  • 23.Skusa A., Rüegg A., Köhler J. Extraction of biological interaction networks from scientific literature. Briefings in Bioinformatics. 2005;6(3):263–276. doi: 10.1093/bib/6.3.263. [DOI] [PubMed] [Google Scholar]
  • 24.Malik R., Franke L., Siebes A. Combination of text-mining algorithms increases the performance. Bioinformatics. 2006;22(17):2151–2157. doi: 10.1093/bioinformatics/btl281. [DOI] [PubMed] [Google Scholar]
  • 25.Chowdhary R., Zhang J., Liu J. S. Bayesian inference of protein–protein interactions from biological literature. Bioinformatics. 2009;25(12):1536–1542. doi: 10.1093/bioinformatics/btp245. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Zou Q., Li J., Wang C., Zeng X. Approaches for recognizing disease genes based on network. BioMed Research International. 2014;2014:10. doi: 10.1155/2014/416323.416323 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Bui Q.-C., Nualláin B. T., Boucher C. A., Sloot P. M. A. Extracting causal relations on HIV drug resistance from literature. BMC Bioinformatics. 2010;11, article 101 doi: 10.1186/1471-2105-11-101. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Jiang Q., Wang Y., Hao Y., et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Research. 2009;37(1):D98–D104. doi: 10.1093/nar/gkn714. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Cheng D., Knox C., Young N., Stothard P., Damaraju S., Wishart D. S. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites. Nucleic Acids Research. 2008;36:W399–W405. doi: 10.1093/nar/gkn296. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Iossifov I., Rodriguez-Esteban R., Mayzus I., Millen K. J., Rzhetsky A. Looking at cerebellar malformations through text-mined interactomes of mice and humans. PLoS Computational Biology. 2009;5(11, article e1000559). doi: 10.1371/journal.pcbi.1000559.
  • 31. Jensen L. J., Saric J., Bork P. Literature mining for the biologist: from information retrieval to biological discovery. Nature Reviews Genetics. 2006;7(2):119–129. doi: 10.1038/nrg1768.
  • 32. Müller H.-M., Kenny E. E., Sternberg P. W. Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biology. 2004;2(11, article e309). doi: 10.1371/journal.pbio.0020309.
  • 33. Uramoto N., Matsuzawa H., Nagano T., Murakami A., Takeuchi H., Takeda K. A text-mining system for knowledge discovery from biomedical documents. IBM Systems Journal. 2004;43(3):516–533. doi: 10.1147/sj.433.0516.
  • 34. Banko M., Cafarella M. J., Soderland S., Broadhead M., Etzioni O. Open information extraction from the web. Proceedings of the International Joint Conference on Artificial Intelligence; 2007; New York, NY, USA. pp. 68–74.
  • 35. Banko M., Etzioni O. The tradeoffs between open and traditional relation extraction. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies; June 2008; Columbus, Ohio, USA. pp. 28–36.
  • 36. Abulaish M., Dey L. Biological relation extraction and query answering from MEDLINE abstracts using ontology-based text mining. Data and Knowledge Engineering. 2007;61(2):228–262. doi: 10.1016/j.datak.2006.06.007.
  • 37. Szklarczyk D., Franceschini A., Wyder S., et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Research. 2015;43(1):D447–D452. doi: 10.1093/nar/gku1003.
  • 38. Chatr-Aryamontri A., Breitkreutz B.-J., Oughtred R., et al. The BioGRID interaction database: 2015 update. Nucleic Acids Research. 2015;43(1):D470–D478. doi: 10.1093/nar/gku1204.
  • 39. Wei C.-H., Kao H.-Y., Lu Z. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Research. 2013;41(1):W518–W522. doi: 10.1093/nar/gkt441.
  • 40. Safran M., Dalah I., Alexander J., et al. GeneCards version 3: the human gene integrator. Database. 2010;2010:16. doi: 10.1093/database/baq020.
  • 41. Huang C. C., Lu Z. Community challenges in biomedical text mining over 10 years: success, failure and the future. Briefings in Bioinformatics. 2015. doi: 10.1093/bib/bbv024.
  • 42. Khare R., Good B. M., Leaman R., Su A. I., Lu Z. Crowdsourcing in biomedicine: challenges and opportunities. Briefings in Bioinformatics. 2015. doi: 10.1093/bib/bbv021.
  • 43. Oliver D. E., Bhalotia G., Schwartz A. S., Altman R. B., Hearst M. A. Tools for loading MEDLINE into a local relational database. BMC Bioinformatics. 2004;5, article 146. doi: 10.1186/1471-2105-5-146.
  • 44. Wang J., Cetindil I., Ji S., et al. Interactive and fuzzy search: a dynamic way to explore MEDLINE. Bioinformatics. 2010;26(18):2321–2327. doi: 10.1093/bioinformatics/btq414.
  • 45. Liu B., Wang X., Lin L., Dong Q., Wang X. A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis. BMC Bioinformatics. 2008;9, article 510. doi: 10.1186/1471-2105-9-510.
  • 46. Liu B., Xu J., Zou Q., Xu R., Wang X., Chen Q. Using distances between Top-n-gram and residue pairs for protein remote homology detection. BMC Bioinformatics. 2014;15(supplement 2, article S3). doi: 10.1186/1471-2105-15-S2-S3.
  • 47. Liu B., Liu F., Fang L., Wang X., Chou K. repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics. 2015;31(8):1307–1309. doi: 10.1093/bioinformatics/btu820.
  • 48. Liu B., Zhang D., Xu R., et al. Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics. 2014;30(4):472–479. doi: 10.1093/bioinformatics/btt709.
  • 49. Chou P. Y., Fasman G. D. Empirical predictions of protein conformation. Annual Review of Biochemistry. 1978;47:251–276. doi: 10.1146/annurev.bi.47.070178.001343.
  • 50. Garnier J., Osguthorpe D. J., Robson B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. Journal of Molecular Biology. 1978;120(1):97–120. doi: 10.1016/0022-2836(78)90297-8.
  • 51. Dong Q., Wang X., Lin L., Wang Y. Analysis and prediction of protein local structure based on structure alphabets. Proteins: Structure, Function and Genetics. 2008;72(1):163–172. doi: 10.1002/prot.21904.
  • 52. Dong Q., Wang X., Lin L. Prediction of protein local structures and folding fragments based on building-block library. Proteins: Structure, Function and Genetics. 2008;72(1):353–366. doi: 10.1002/prot.21931.
  • 53. Rost B., Sander C. Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology. 1993;232(2):584–599. doi: 10.1006/jmbi.1993.1413.
  • 54. Ding H., Lin H., Chen W., et al. Prediction of protein structural classes based on feature selection technique. Interdisciplinary Sciences: Computational Life Sciences. 2014;6(3):235–240. doi: 10.1007/s12539-013-0205-6.
  • 55. Lin H., Ding C., Song Q., et al. The prediction of protein structural class using averaged chemical shifts. Journal of Biomolecular Structure & Dynamics. 2012;29(6):643–649. doi: 10.1080/07391102.2011.672628.
  • 56. Lin C., Zou Y., Qin J., et al. Hierarchical classification of protein folds using a novel ensemble classifier. PLoS ONE. 2013;8(2, article e56499). doi: 10.1371/journal.pone.0056499.
  • 57. Chen W., Liu X., Huang Y., Jiang Y., Zou Q., Lin C. Improved method for predicting protein fold patterns with ensemble classifiers. Genetics and Molecular Research. 2012;11(1):174–181. doi: 10.4238/2012.january.27.4.
  • 58. Zhao X., Zou Q., Liu B., Liu X. Exploratory predicting protein folding model with random forest and hybrid features. Current Proteomics. 2014;11(4):289–299.
  • 59. Liu Y., Carbonell J., Klein-Seetharaman J., Gopalakrishnan V. Comparison of probabilistic combination methods for protein secondary structure prediction. Bioinformatics. 2004;20(17):3099–3107. doi: 10.1093/bioinformatics/bth370.
  • 60. Romero P., Obradovic Z., Li X., Garner E. C., Brown C. J., Dunker A. K. Sequence complexity of disordered protein. Proteins: Structure, Function and Genetics. 2001;42(1):38–48. doi: 10.1002/1097-0134(20010101)42:1<38::aid-prot50>3.0.co;2-3.
  • 61. Su C.-T., Chen C.-Y., Ou Y.-Y. Protein disorder prediction by condensed PSSM considering propensity for order or disorder. BMC Bioinformatics. 2006;7, article 319. doi: 10.1186/1471-2105-7-319.
  • 62. Su C.-T., Chen C.-Y., Hsu C.-M. IPDA: integrated protein disorder analyzer. Nucleic Acids Research. 2007;35(2):W465–W472. doi: 10.1093/nar/gkm353.
  • 63. Ward J. J., Sodhi J. S., McGuffin L. J., Buxton B. F., Jones D. T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. Journal of Molecular Biology. 2004;337(3):635–645. doi: 10.1016/j.jmb.2004.02.002.
  • 64. Shimizu K., Hirose S., Noguchi T. POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix. Bioinformatics. 2007;23(17):2337–2338. doi: 10.1093/bioinformatics/btm330.
  • 65. Hirose S., Shimizu K., Kanai S., Kuroda Y., Noguchi T. POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics. 2007;23(16):2046–2053. doi: 10.1093/bioinformatics/btm302.
  • 66. Wang L., Sauer U. H. OnD-CRF: predicting order and disorder in proteins using conditional random fields. Bioinformatics. 2008;24(11):1401–1402. doi: 10.1093/bioinformatics/btn132.
  • 67. Han P., Zhang X., Norton R. S., Feng Z.-P. Large-scale prediction of long disordered regions in proteins using random forests. BMC Bioinformatics. 2009;10, article 8. doi: 10.1186/1471-2105-10-8.
  • 68. Liu B., Wang X., Lin L., Dong Q., Wang X. Exploiting three kinds of interface propensities to identify protein binding sites. Computational Biology and Chemistry. 2009;33(4):303–311. doi: 10.1016/j.compbiolchem.2009.07.001.
  • 69. Liu B., Liu B., Liu F., Wang X. Protein binding site prediction by combining hidden Markov support vector machine and profile-based propensities. The Scientific World Journal. 2014;2014:6. doi: 10.1155/2014/464093.
  • 70. Wang Z., Zou Q., Jiang Y., Ju Y., Zeng X. Review of protein subcellular localization prediction. Current Bioinformatics. 2014;9(3):331–342. doi: 10.2174/1574893609666140212000304.
  • 71. Lin H., Ding H., Guo F.-B., Zhang A.-Y., Huang J. Predicting subcellular localization of mycobacterial proteins by using Chou's pseudo amino acid composition. Protein & Peptide Letters. 2008;15(7):739–744. doi: 10.2174/092986608785133681.
  • 72. Lin H., Ding H., Guo F.-B., Huang J. Prediction of subcellular location of mycobacterial protein using feature selection techniques. Molecular Diversity. 2010;14(4):667–671. doi: 10.1007/s11030-009-9205-1.
  • 73. Lin H., Wang H., Ding H., Chen Y.-L., Li Q.-Z. Prediction of subcellular localization of apoptosis protein using Chou's pseudo amino acid composition. Acta Biotheoretica. 2009;57(3):321–330. doi: 10.1007/s10441-008-9067-4.
  • 74. Lin H., Chen W., Yuan L.-F., Li Z.-Q., Ding H. Using over-represented tetrapeptides to predict protein submitochondria locations. Acta Biotheoretica. 2013;61(2):259–268. doi: 10.1007/s10441-013-9181-9.
  • 75. Ding H., Guo S.-H., Deng E.-Z., et al. Prediction of Golgi-resident protein types by using feature selection technique. Chemometrics and Intelligent Laboratory Systems. 2013;124:9–13. doi: 10.1016/j.chemolab.2013.03.005.
  • 76. Lin H., Ding C., Yuan L.-F., et al. Predicting subchloroplast locations of proteins based on the general form of Chou's pseudo amino acid composition: approached from optimal tripeptide composition. International Journal of Biomathematics. 2013;6(2, article 1350003). doi: 10.1142/s1793524513500034.
  • 77. Zhu P.-P., Li W.-C., Zhong Z.-J., et al. Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. Molecular BioSystems. 2015;11(2):558–563. doi: 10.1039/c4mb00645c.
  • 78. Ding H., Liu L., Guo F.-B., Huang J., Lin H. Identify Golgi protein types with modified Mahalanobis discriminant algorithm and pseudo amino acid composition. Protein & Peptide Letters. 2011;18(1):58–63. doi: 10.2174/092986611794328708.
  • 79. Zou Q., Li X., Jiang Y., Zhao Y., Wang G. BinMemPredict: a web server and software for predicting membrane protein types. Current Proteomics. 2013;10(1):2–9. doi: 10.2174/1570164611310010002.
  • 80. Lin H. The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition. Journal of Theoretical Biology. 2008;252(2):350–356. doi: 10.1016/j.jtbi.2008.02.004.
  • 81. Ding C., Yuan L.-F., Guo S.-H., Lin H., Chen W. Identification of mycobacterial membrane proteins and their types using over-represented tripeptide compositions. Journal of Proteomics. 2012;77:321–328. doi: 10.1016/j.jprot.2012.09.006.
  • 82. Lin H., Ding H. Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition. Journal of Theoretical Biology. 2011;269:64–69. doi: 10.1016/j.jtbi.2010.10.019.
  • 83. Liu B., Wang X., Zou Q., Dong Q., Chen Q. Protein remote homology detection by combining Chou's pseudo amino acid composition and profile-based protein representation. Molecular Informatics. 2013;32(9-10):775–782. doi: 10.1002/minf.201300084.
  • 84. Liu B., Wang X., Chen Q., Dong Q., Lan X. Using amino acid physicochemical distance transformation for fast protein remote homology detection. PLoS ONE. 2012;7(9, article e46633). doi: 10.1371/journal.pone.0046633.
  • 85. Yu G., Rangwala H., Domeniconi C., Zhang G., Yu Z. Protein function prediction with incomplete annotations. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014;11(3):579–591. doi: 10.1109/tcbb.2013.142.
  • 86. Zou Q., Wang Z., Guan X., Liu B., Wu Y., Lin Z. An approach for identifying cytokines based on a novel ensemble classifier. BioMed Research International. 2013;2013:11. doi: 10.1155/2013/686090.
  • 87. Yu G., Rangwala H., Domeniconi C., Zhang G., Yu Z. Protein function prediction using multi-label ensemble classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2013;10(4):1045–1057. doi: 10.1109/tcbb.2013.111.
  • 88. Ding H., Deng E.-Z., Yuan L.-F., et al. iCTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels. BioMed Research International. 2014;2014:10. doi: 10.1155/2014/286419.
  • 89. Liu W.-X., Deng E.-Z., Chen W., Lin H. Identifying the subfamilies of voltage-gated potassium channels using feature selection technique. International Journal of Molecular Sciences. 2014;15(7):12940–12951. doi: 10.3390/ijms150712940.
  • 90. Ding H., Li D. Identification of mitochondrial proteins of malaria parasite using analysis of variance. Amino Acids. 2015;47(2):329–333. doi: 10.1007/s00726-014-1862-4.
  • 91. Ding H., Feng P.-M., Chen W., Lin H. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. Molecular BioSystems. 2014;10(8):2229–2235. doi: 10.1039/c4mb00316k.
  • 92. Yuan L.-F., Ding C., Guo S.-H., Ding H., Chen W., Lin H. Prediction of the types of ion channel-targeted conotoxins based on radial basis function network. Toxicology in Vitro. 2013;27(2):852–856. doi: 10.1016/j.tiv.2012.12.024.
  • 93. Lin H., Chen W. Prediction of thermophilic proteins using feature selection technique. Journal of Microbiological Methods. 2011;84(1):67–70. doi: 10.1016/j.mimet.2010.10.013.
  • 94. Cheng X.-Y., Huang W.-J., Hu S.-C., et al. A global characterization and identification of multifunctional enzymes. PLoS ONE. 2012;7(6, article e38979). doi: 10.1371/journal.pone.0038979.
  • 95. Lin H., Chen W., Ding H. AcalPred: a sequence-based tool for discriminating between acidic and alkaline enzymes. PLoS ONE. 2013;8(10, article e75726). doi: 10.1371/journal.pone.0075726.
  • 96. Zou Q., Chen W., Huang Y., Liu X., Jiang Y. Identifying multi-functional enzyme by hierarchical multi-label classifier. Journal of Computational and Theoretical Nanoscience. 2013;10(4):1038–1043. doi: 10.1166/jctn.2013.2804.
  • 97. Liu B., Xu J., Fan S., Xu R., Zhou J., Wang X. PseDNA-Pro: DNA-binding protein identification by combining Chou's PseAAC and physicochemical distance transformation. Molecular Informatics. 2015;34(1):8–17. doi: 10.1002/minf.201400025.
  • 98. Liu B., Xu J., Lan X., et al. iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition. PLoS ONE. 2014;9(9, article e106691). doi: 10.1371/journal.pone.0106691.
  • 99. Bock J. R., Gough D. A. Predicting protein-protein interactions from primary structure. Bioinformatics. 2001;17(5):455–460. doi: 10.1093/bioinformatics/17.5.455.
  • 100. Ben-Hur A., Noble W. S. Kernel methods for predicting protein-protein interactions. Bioinformatics. 2005;21(supplement 1):i38–i46. doi: 10.1093/bioinformatics/bti1016.
  • 101. Qi Y., Bar-Joseph Z., Klein-Seetharaman J. Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins: Structure, Function and Genetics. 2006;63(3):490–500. doi: 10.1002/prot.20865.
  • 102. Zhang L. V., Wong S. L., King O. D., Roth F. P. Predicting co-complexed protein pairs using genomic and proteomic data integration. BMC Bioinformatics. 2004;5, article 38. doi: 10.1186/1471-2105-5-38.
  • 103. Darnell S. J., Page D., Mitchell J. C. An automated decision-tree approach to predicting protein interaction hot spots. Proteins: Structure, Function, and Bioinformatics. 2007;68(4):813–823. doi: 10.1002/prot.21474.
  • 104. Chen X.-W., Liu M. Prediction of protein-protein interactions using random decision forest framework. Bioinformatics. 2005;21(24):4394–4400. doi: 10.1093/bioinformatics/bti721.
  • 105. Jansen R., Yu H., Greenbaum D., et al. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003;302(5644):449–453. doi: 10.1126/science.1087361.
  • 106. Gomez S. M., Noble W. S., Rzhetsky A. Learning to predict protein-protein interactions from protein sequences. Bioinformatics. 2003;19(15):1875–1881. doi: 10.1093/bioinformatics/btg352.
  • 107. Li M.-H., Wang X.-L., Lin L., Liu T. Effect of example weights on prediction of protein-protein interactions. Computational Biology and Chemistry. 2006;30(5):386–392. doi: 10.1016/j.compbiolchem.2006.08.005.
  • 108. Li M.-H., Lin L., Wang X.-L., Liu T. Protein-protein interaction site prediction based on conditional random fields. Bioinformatics. 2007;23(5):597–604. doi: 10.1093/bioinformatics/btl660.
  • 109. Friedrich T., Pils B., Dandekar T., Schultz J., Müller T. Modelling interaction sites in protein domains with interaction profile hidden Markov models. Bioinformatics. 2006;22(23):2851–2857. doi: 10.1093/bioinformatics/btl486.
  • 110. Šikić M., Tomić S., Vlahoviček K. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests. PLoS Computational Biology. 2009;5(1, article e1000278). doi: 10.1371/journal.pcbi.1000278.
  • 111. Bradford J. R., Westhead D. R. Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics. 2005;21(8):1487–1494. doi: 10.1093/bioinformatics/bti242.
  • 112. Fariselli P., Pazos F., Valencia A., Casadio R. Prediction of protein-protein interaction sites in heterocomplexes with neural networks. European Journal of Biochemistry. 2002;269(5):1356–1361. doi: 10.1046/j.1432-1033.2002.02767.x.
  • 113. Bradford J. R., Needham C. J., Bulpitt A. J., Westhead D. R. Insights into protein-protein interfaces using a Bayesian network prediction method. Journal of Molecular Biology. 2006;362(2):365–386. doi: 10.1016/j.jmb.2006.07.028.
  • 114. Kufareva I., Budagyan L., Raush E., Totrov M., Abagyan R. PIER: protein interface recognition for structural proteomics. Proteins. 2007;67(2):400–417. doi: 10.1002/prot.21233.
  • 115. Yu J., Guo M., Needham C. J., Huang Y., Cai L., Westhead D. R. Simple sequence-based kernels do not predict protein-protein interactions. Bioinformatics. 2010;26(20):2610–2614. doi: 10.1093/bioinformatics/btq483.
  • 116. Zou Q., Zhao T., Liu Y., Guo M. Predicting RNA secondary structure based on the class information and Hopfield network. Computers in Biology and Medicine. 2009;39(3):206–214. doi: 10.1016/j.compbiomed.2008.12.010.
  • 117. Zou Q., Lin C., Liu X.-Y., Han Y.-P., Li W.-B., Guo M.-Z. Novel representation of RNA secondary structure used to improve prediction algorithms. Genetics and Molecular Research. 2011;10(3):1986–1998. doi: 10.4238/vol10-3gmr1181.
  • 118. Liu B., Fang L., Liu F., et al. Identification of real microRNA precursors with a pseudo structure status composition approach. PLoS ONE. 2015;10(3, article e0121501). doi: 10.1371/journal.pone.0121501.
  • 119. Liu B., Fang L., Chen J., Liu F., Wang X. miRNA-dis: microRNA precursor identification based on distance structure status pairs. Molecular BioSystems. 2015;11(4):1194–1204. doi: 10.1039/c5mb00050e.
  • 120. Will S., Reiche K., Hofacker I. L., Stadler P. F., Backofen R. Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Computational Biology. 2007;3(4, article e65). doi: 10.1371/journal.pcbi.0030065.
  • 121. Will S., Joshi T., Hofacker I. L., Stadler P. F., Backofen R. LocARNA-P: accurate boundary prediction and improved detection of structural RNAs. RNA. 2012;18(5):900–914. doi: 10.1261/rna.029041.111.
  • 122. Wang C., Wei L., Guo M., Zou Q. Computational approaches in detecting non-coding RNA. Current Genomics. 2013;14(6):371–377. doi: 10.2174/13892029113149990005.
  • 123. Zou Q., Li X.-B., Jiang W.-R., Lin Z.-Y., Li G.-L., Chen K. Survey of MapReduce frame operation in bioinformatics. Briefings in Bioinformatics. 2014;15(4):637–647. doi: 10.1093/bib/bbs088.

Articles from Computational and Mathematical Methods in Medicine are provided here courtesy of Wiley
