A Review of Recent Advancement in Integrating Omics Data with Literature Mining towards Biomedical Discoveries

Kalpana Raja; Matthew Patrick; Yilin Gao; Desmond Madu; Yuyang Yang; Lam C Tsoi

doi:10.1155/2017/6213474

2017 Feb 26;2017:6213474. doi: 10.1155/2017/6213474

PMCID: PMC5346376 PMID: 28331849

Abstract

In the past decade, the volume of “omics” data generated by high-throughput technologies has expanded exponentially. Managing, storing, and analyzing these big data have been a great challenge for researchers, especially when moving towards the goal of generating testable data-driven hypotheses, which has been the promise of high-throughput experimental techniques. Different bioinformatics approaches have been developed to streamline the downstream analyses by providing independent information to interpret the results and support biological inference. Text mining (also known as literature mining) is one of the commonly used approaches for automated generation of biological knowledge from the huge number of published articles. In this review paper, we discuss the recent advancements in approaches that integrate results from omics data with information generated by text mining to uncover novel biomedical information.

1. Introduction

Advances in biotechnology have allowed biomedical research to efficiently answer important biological questions at the different omics scales: genetics, genomics, transcriptomics, epigenomics, proteomics, and metabolomics [1–4]. Omics data can characterize the behaviors of cells, tissues, and organs at the molecular level and allow a comprehensive understanding of the etiology of human diseases. Among the various omics studies, genetic and genomic studies are widely adopted in biomedical research to discover new genes or susceptibility loci associated with different human traits or diseases [5, 6]. Proteomic study is concerned with the structure, function, and modification of proteins expressed in a biological system, specifically the posttranslational modifications such as phosphorylation, methylation, and acetylation, which, together with transcription and translation, allow the same genome to give rise to various types of proteomes [7, 8]. Epigenomic study has attracted great attention in the last 5 years. It characterizes the epigenetic modifications of the genome and aims to understand the regulation of gene expression. Transcriptomic study, in turn, enables the genome-wide assessment of gene expression patterns in cells and tissues by studying the complete set of RNA transcripts [9]. Finally, metabolomic study characterizes the metabolites present in cells, tissues, and body fluids and identifies the fluctuation of these metabolites in various disease conditions [10]. The different types of omics studies accumulate a huge volume of data through high-throughput experiments and provide insights into the cellular and metabolic processes related to disease diagnosis, treatment, and prevention.

According to PubMed, over 36,000 research articles published in the past ten years have been annotated with at least one of the above “omics” terms (using the search phrase “(genomics [MeSH] OR proteomics [MeSH] OR metabolomics [MeSH] OR transcriptomics [MeSH]) AND humans [MeSH]”). By comparison, the same search phrase retrieves just over 10,000 research articles published prior to 2006, so the interest in omics studies has clearly not declined. However, the acquired data raise various significant challenges: (i) the interpretation of high-throughput results; (ii) the translation of biological data to clinical application; (iii) the data handling, storage, and sharing issues; and (iv) the reproducibility of results across different experiments [11, 12]. Among these, the last challenge has been a long-lasting issue, most likely due to discrepancies in processing and interpreting the high-throughput data or to a “cherry-picking” approach that subjectively focuses on components that are in fact false positives. The traditional strategy to overcome these challenges is to conduct an extensive literature search, seek opinions from domain experts to decipher the mechanism, and then conduct downstream experiments to verify the findings. However, this has proven to be time consuming and subjective and has not been a common practice when researchers publish their results from high-throughput experiments. On the other hand, automated approaches have gained much interest in recent years to annotate gene functions [13], to identify biomarkers [14], and to explore genetic mutations [15]. Text mining (also known as literature mining) is a technique that has been used to retrieve and process research articles from the PubMed database and can summarize biomedical information present across articles. In molecular biology, text mining is typically used to retrieve relevant documents, prioritize the documents, extract the biomedical concepts (e.g., genes, proteins, cells, tissues, and cell types), and extract the causal relationships between concepts [16, 17]. Text mining can significantly decrease the time and effort required, compared with traditional labor-intensive approaches.
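As an illustration of how the PubMed search phrase quoted above could be issued programmatically, the following minimal sketch builds a request URL for the NCBI E-utilities ESearch service. It only constructs the URL and does not contact the server. The function name and the year range 2007 to 2016 (standing in for "the past ten years") are assumptions made for this example and are not taken from the review.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search phrase quoted in the text.
OMICS_QUERY = (
    "(genomics [MeSH] OR proteomics [MeSH] OR metabolomics [MeSH] "
    "OR transcriptomics [MeSH]) AND humans [MeSH]"
)


def build_count_url(term, min_year, max_year):
    """Return an ESearch URL that asks only for the number of matching records.

    The helper name and the choice of years are hypothetical.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "rettype": "count",   # return the hit count instead of a list of PMIDs
        "retmode": "json",
        "datetype": "pdat",   # restrict by publication date
        "mindate": str(min_year),
        "maxdate": str(max_year),
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Assumed ten-year window; adjust to the period of interest.
url = build_count_url(OMICS_QUERY, 2007, 2016)
```

Fetching the URL (for example with `urllib.request.urlopen`) would return the record count, which will differ from the figures quoted in the text because PubMed indexing changes over time.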

In this review, we first discuss the various omics techniques used in healthcare and summarize the recent advances in utilizing text mining approaches to facilitate the interpretation and translation of these omics data. We then focus on biomedical literature mining and clinical text mining and further describe the challenges involved in integrating the knowledge from different resources to enhance the biomedical research. Finally, we explain the recent methods to integrate omics and biomedical literature mining data in order to uncover novel biomedical information.

2. The Study of “Omics”

Traditionally, “omics” corresponds to the study of four major classes of biomolecules: genes, proteins, transcripts, and metabolites [4]. Since the discovery of DNA [31], much interest has been directed towards understanding the roles of genes and proteins in cellular functions and signal transduction. Healthcare is considered to vary from one individual to another based on the individual's genome, proteome, transcriptome, and metabolome. The digital revolution has paved the way for integrating patient omics data with the findings in the literature for the discovery of novel biomarkers and drug targets [32–34]. Consequently, the study of omics has expanded beyond these four major omics studies, and Table 1 summarizes the various types of omics data applied to biomedical discoveries. The study of omics has introduced the realm of big data to biomedicine [35, 36]. While the first human genome project took more than a decade to complete and cost about $3 billion, an entire genome can now be sequenced and analyzed within hours for ~$1000. Thus, biomedical projects can now generate information at the petabyte (i.e., 10^15 bytes) scale. Nevertheless, the greatest challenge is the large-scale data analysis and its integration with clinical data available in patient electronic health records (EHR) [37].

Table 1.

Omics and biomedical applications.

Field	Omics	Study topic	Biomedical applications^†
Genetics/molecular genetics	Genomics	Genes	Gencode, Entrez Gene
	Epigenomics	Epigenetic modifications	Gene Expression Omnibus
	Exposomics	Disease-causing environmental factors	Comparative Toxicogenomics Database
	Exomics	Exons in a genome	ICE—a human splice sites database
	ORFeomics	Open Reading Frame (ORF)	—
	Phenomics	Phenotypes	Human Phenotype Ontology
	Pharmacogenomics	Impact of genes on individual's response to drugs	PharmGKB
	Pharmacogenetics	SNPs and their impact on pharmacodynamics and pharmacokinetics	PharmGKB
	Toxicogenomics	Gene responses to toxic substances	Comparative Toxicogenomics Database

Molecular biology	Proteomics	Proteins and amino acids	Proteomics Identifications Database (PRIDE)
	Metabolomics	Metabolites	HMDB: Human Metabolome Database
	Transcriptomics	Transcripts (i.e., rRNA, mRNA, tRNA, and microRNA)	Human Transcriptome Map
	Ionomics	Inorganic biomolecules	—
	Kinomics	Protein kinases	KinBase database and KinWeb database
	Metagenomics	Genetic material from multiple organisms	MG-RAST
	Regulomics	Transcription factors and other biomolecules involved in the regulation of gene expression	miRegulome
	Toponomics	Cell and tissue structure	—

Medicine	Trialomics	Human interventional trials from clinical trials	—
	Connectomics	Structural and functional connectivity in brain	—
	Interactomics	Molecular interactions	CREDO

^†The list shows example applications.

Cloud [38] and parallel computing [39] are currently used in omics research to handle the huge volume of data. Cloud computing refers to a network of computers connected through the Internet that share processing and storage resources. It is available remotely through cloud computing providers (e.g., Microsoft, Google, and Amazon), and researchers can make use of it at an affordable cost. Parallel computing shortens processing time by dividing a task among multiple processors that work simultaneously. Combining cloud computing with parallel computing makes it possible to process omics data in a feasible time [40, 41]. Other high performance computing platforms include clusters [42], grid computing [43], and graphics processing units [44]. Processing omics data and applying bioinformatics models to the data require expertise that integrates computational, biological, mathematical, and statistical knowledge.

3. Text Mining

The PubMed database is a main repository for biomedical literature and contains over 26 million articles. The number of articles published and indexed by PubMed is increasing exponentially, and therefore text mining has become an attractive (and standard) approach for mining literature data compared with the traditional labor-intensive strategies. Researchers use the text mining approach to tackle information overload, both in biomedical and in general areas of big data collection, because it automates data retrieval and information extraction from unstructured biomedical texts to reveal novel information [45, 46]. While information extraction examines the relationships between specific kinds of information contained within or between documents, information retrieval focuses on summarizing data from the larger units of documents [47]. Another automated approach to deal with unstructured data is Natural Language Processing (NLP). While text mining concentrates on solving a specific problem in a particular domain, NLP attempts to understand the text as a whole [48]. Recently, text mining and NLP have been used to address different biological questions in omics research [49].

3.1. Biomedical Literature Mining

The era of applying text mining approaches to biology and biomedical fields came into existence in 1999. It was first applied to the biomedical domain for gene expression profiling [50], as well as the extraction and visualization of protein-protein interaction [51]. It emerged as a hybrid discipline from the edges of three major fields, namely, bioinformatics, information science, and computational linguistics. Biomedical literature mining is concerned with the identification and extraction of biomedical concepts (e.g., genes, proteins, DNA/RNA, cells, and cell types) and their functional relationships [17]. The major tasks include (i) document retrieval and prioritization (gathering and prioritizing the relevant documents); (ii) information extraction (extracting information of interest from the retrieved document); (iii) knowledge discovery (discovering new biological event or relationship among the biomedical concepts); and (iv) knowledge summarization (summarizing the knowledge available across the documents). A brief description of the biomedical literature mining tasks is listed as follows.

Biomedical Text Mining Tasks

Document Retrieval. The process of extracting relevant documents from a large collection is called document retrieval or information retrieval [52]. The two basic strategies applied are query-based and document-based retrieval. In query-based retrieval, documents matching with the user specified query are retrieved. In document-based retrieval, a ranked list of documents similar to a document of interest is retrieved.

Document Prioritization. The retrieved documents are usually prioritized to get the most relevant document. Many biomedical document retrieval systems achieve prioritization based on certain parameters including journal-related metrics (e.g., impact factor, citation count) [53] and MeSH index [54, 55] for biomedical articles. The similarity between the documents is estimated with various similarity measurements (e.g., Jaccard similarity, cosine similarity) [56].
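The two similarity measurements named above can be sketched in a few lines. This is a minimal illustration that assumes a naive whitespace tokenizer and raw term counts; practical systems usually add stemming, stop-word removal, and term weighting, none of which is shown here. All function names are chosen for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_doc, candidates, measure=cosine_similarity):
    """Order candidate documents from most to least similar to the query document."""
    return sorted(candidates, key=lambda d: measure(query_doc, d), reverse=True)
```

Calling `rank_by_similarity` with a document of interest and a list of candidate texts gives a simple form of the document-based retrieval described earlier.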

Information Extraction. This task aims to extract and present the information in a structured format. Concept extraction and relation/event extraction are the two major components of information extraction [57, 58]. While concept extraction automatically identifies the biomedical concepts present in the articles, relation/event extraction is used to predict the relationship or biological event (e.g., phosphorylation) between the concepts [59, 60].
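To make the two components concrete, the sketch below performs concept extraction by lexicon lookup and event extraction with a single hand-written rule for phosphorylation. The four-entry protein lexicon, the rule, and the function names are assumptions made for illustration; the systems cited in this review rely on much richer resources and methods.

```python
import re

# Hypothetical mini-lexicon; a real system would use a curated resource.
PROTEINS = ["AKT1", "GSK3B", "MDM2", "TP53"]

NAME = "(" + "|".join(re.escape(p) for p in PROTEINS) + ")"

# One illustrative rule: "<protein> phosphorylates <protein>".
PHOSPHO_RULE = re.compile(NAME + r"\s+phosphorylates\s+" + NAME)


def extract_concepts(sentence):
    """Return the lexicon proteins mentioned in the sentence, in order of appearance."""
    return re.findall(r"\b" + NAME + r"\b", sentence)


def extract_events(sentence):
    """Return phosphorylation events matched by the rule as structured records."""
    return [
        {"type": "phosphorylation", "kinase": m.group(1), "substrate": m.group(2)}
        for m in PHOSPHO_RULE.finditer(sentence)
    ]
```

The records returned by `extract_events` are one possible structured format; the field names are not prescribed by the source.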

Knowledge Discovery. It is a nontrivial process to discover novel and potentially useful biological information from the structured text obtained from information extraction. Knowledge discovery uses techniques from a wide range of disciplines such as artificial intelligence, machine learning, pattern recognition, data mining, and statistics [61]. Both information extraction and knowledge discovery find their application in database curation [62, 63] and pathway construction [64, 65].

Knowledge Summarization. The purpose of knowledge summarization is to generate information for a given topic from one or multiple documents. The approach aims to reduce the source text to its most important key points through content reduction by selection and/or generalization [66]. Although knowledge summarization helps to manage information overload, it remains an open research area, and more sophisticated approaches are needed to increase the likelihood of identifying the relevant information.

Hypothesis Generation. An important task of text mining is hypothesis generation to predict unknown biomedical facts from biomedical articles. These hypotheses are useful in designing experiments or explaining existing experimental results [67].
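One well-known scheme for literature-based hypothesis generation is Swanson's ABC model, which is named here only as an example and is not described in this review. If term A is linked to term B in some articles and B is linked to C in others, but A and C are never linked directly, then an A to C relationship is proposed as a hypothesis. The sketch below assumes that term pairs have already been extracted; the pairs in the usage note are toy data.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected graph from (term, term) pairs found in the literature."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def abc_hypotheses(graph, start):
    """Map each candidate C term to the intermediate B terms that connect it to `start`.

    Only C terms without a direct link to `start` are reported.
    """
    direct = graph[start]
    hypotheses = defaultdict(set)
    for b in direct:
        for c in graph[b]:
            if c != start and c not in direct:
                hypotheses[c].add(b)
    return dict(hypotheses)
```

For example, with the toy pairs ("fish oil", "blood viscosity") and ("blood viscosity", "Raynaud's syndrome"), calling `abc_hypotheses(build_graph(pairs), "fish oil")` proposes Raynaud's syndrome as a candidate, connected through blood viscosity.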

Conventional text mining approaches process PubMed abstracts rather than full-text articles and therefore miss information that does not appear in abstracts. Recently, text mining of full-text articles has been gaining interest [59]. However, it involves many challenges: (1) the availability of full-text articles is limited (4 million full-text articles in PubMed Central versus 26 million abstracts in PubMed); (2) text mining within tables, figures, and equations is complicated; and (3) information is redundant within the articles. An automated text mining system is generally evaluated using a standard corpus (Table 2). However, the availability of standard corpora in the biomedical domain is limited because their generation is expensive and time consuming and requires domain experts. In general, when standard corpora are not available, a gold standard is developed within the research group, but it is mostly not made available to other researchers. Text mining systems are commonly evaluated using precision, recall, and F-score. Precision is the fraction of the items returned by the system that are correct, recall is the fraction of the correct items that the system returns, and F-score is the harmonic mean of precision and recall [56].
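The three evaluation measures can be computed as in the following minimal sketch, which assumes that the system output and the gold standard are given as collections of comparable items (for example, gene names or document identifiers). The function name is chosen for this example.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F-score) for a system output against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of returned items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of correct items that were returned.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, a system that returns four gene names, two of which are among the three names in the gold standard, has a precision of 2/4 and a recall of 2/3.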

Table 2.

Standard corpora for omics domain.

Corpus	Text mining evaluation task	Brief introduction
JNLPBA (Joint Workshop on NLP in Biomedicine and Its Applications) [18]	Gene/protein concept extraction	The corpus consists of 2,000 PubMed abstracts as training data and 404 PubMed abstracts as test data.

BioCreAtivE 2004 Task 1A dataset [19]	Gene/protein concept extraction	The corpus consists of 15,000 PubMed sentences as training data and 5,000 PubMed sentences as test data.

BioCreAtivE 2 Gene Mention (GM) dataset [20]	Gene/protein concept extraction	The corpus consists of 15,000 PubMed sentences as training data and 5,000 PubMed sentences as test data.

AIMED [21]	Protein-protein interaction	The corpus consists of 225 PubMed abstracts that contain 1,987 sentences with 4,075 protein mentions.

HPRD50 (Human Protein Reference Database) [22]	Protein-protein interaction	The corpus consists of sentences with protein-protein interaction from 50 PubMed abstracts.

BioInfer (Bio Information Extraction Resource) [23]	Protein, gene, and RNA relationships	The corpus consists of 1100 sentences annotated with concept names, relationships, and syntactic dependencies.

IEPA (Interaction Extraction Performance Assessment) [24]	Protein-protein interaction	The corpus consists of more than 200 PubMed sentences annotated with protein-protein interaction.

BioCreAtivE 2.5 Elsevier Corpus [25]	Protein-protein interaction	The corpus consists of 61 PubMed articles as training data and 62 PubMed articles as test data.

BC4GO Corpus [26]	Gene ontology	The corpus consists of 1356 distinct GO terms from 200 PubMed articles.

GREC Corpus [27]	Gene regulation and gene expression events	The corpus consists of 240 PubMed abstracts with annotations on gene regulation and gene expression events.

GETM [28]	Gene expression events	The corpus consists of 150 PubMed abstracts with annotation for gene expression events.

AnEM [29]	Tissue, cell, developing anatomical structure, cellular component	The corpus consists of 500 PubMed sentences with annotations on variety of biomedical concepts.

CellFinder Corpus [30]	Anatomical parts, cell lines, cell types, species, and cell components	The corpus consists of annotations from 10 full-text PubMed articles.

3.2. Clinical Text Mining

Electronic health records, discharge summaries, and clinical narratives of patients are rich in information that could be useful for improving healthcare. In addition, the information is also available from the transcription of dictations, direct entry by clinicians/physicians, or speech recognition software. The encoding of structured information from the clinical resources is useful to clinicians and researchers. For example, automated high-throughput clinical applications can be developed to support clinicians' information needs [68]. However, manual encoding is expensive and limited to primary and secondary diagnoses. Clinical text mining, also known as clinical NLP or Medical Language Processing (MLP), has been suggested by the Institute of Medicine as a potential technology for mining clinical resources. The tasks described above for biomedical literature mining are applicable to clinical text mining, which includes additional subtasks [69]: (i) negation recognition (e.g., “patient denies on developing rashes”), (ii) temporal extraction (e.g., “small bumps noticed last year”), and (iii) patient-event relationship (e.g., “patient mother had arthritis”).
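Negation recognition, the first of these subtasks, is often approached with trigger phrases. The sketch below is a simplified illustration in that spirit and not a validated clinical tool: a finding is labeled as negated when a trigger word occurs within a few tokens before it, and words such as "but" end the scope of a negation. The trigger list, the scope terminators, the findings, and the window size are all assumptions made for this example.

```python
import re

# Small illustrative lists; not a validated clinical lexicon.
NEGATION_TRIGGERS = ["denies", "no", "without", "negative for"]
SCOPE_TERMINATORS = {"but", "however"}
FINDINGS = ["rashes", "fever", "joint pain"]  # hypothetical findings of interest


def classify_findings(sentence, window=5):
    """Label each finding in the sentence as 'negated' or 'affirmed'."""
    lowered = sentence.lower()
    results = {}
    for finding in FINDINGS:
        match = re.search(r"\b" + re.escape(finding) + r"\b", lowered)
        if not match:
            continue
        tokens = lowered[:match.start()].split()
        # Discard everything up to the last scope terminator before the finding.
        for i in range(len(tokens) - 1, -1, -1):
            if tokens[i].strip(",.;") in SCOPE_TERMINATORS:
                tokens = tokens[i + 1:]
                break
        preceding = " ".join(tokens[-window:])
        negated = any(
            re.search(r"\b" + re.escape(trigger) + r"\b", preceding)
            for trigger in NEGATION_TRIGGERS
        )
        results[finding] = "negated" if negated else "affirmed"
    return results
```

Only negation cues that precede a finding are handled; cues that follow it (e.g., "fever was ruled out") would need additional rules.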

Modern healthcare relies on big data analytics for integrating, organizing, and utilizing different pharmacological or clinical information. A hybrid approach that combines patient genomic data with electronic health record information is emerging as the future vision of healthcare. Omics data have become an emerging tool for the diagnosis and clinical investigation of common and rare diseases and help in clinical decision making (i.e., selecting the best possible treatments for patients). Genome-Wide Association Study (GWAS), also known as Whole Genome Association Study (WGAS), is a relatively new approach for identifying loci associated with human traits through rapid scanning of markers across the whole genome [70]. GWAS has also been applied to drug repositioning in cancer research [71], prioritizing susceptibility genes in Crohn's disease [72], and analyzing human variants in the area of precision medicine [73]. As an example, the Michigan Genomics Initiative (MGI) at the University of Michigan has developed an institution-based DNA and genetics repository combined with patient phenotypes. The project aims to bring awareness to each patient/participant about disease development and response to treatments for better health and wellness. The current studies at MGI include analgesics outcome study (AOS), understanding opioid use in chronic pain patients, a pivotal study on high-frequency nerve block for postamputation pain, Michigan body map (MBM), and positive piggy bag (https://www.michigangenomics.org/).

Clinical text mining faces the following specific challenges: (1) access to patient EHR requires permission from an Institutional Review Board (IRB); (2) personal details of the patients should be deidentified; (3) mining approaches depend on the types of clinical documents (e.g., EHR, discharge summary, medical billing, and clinical narratives); (4) mining of dosage information, different types of formulations, and temporal information is required; and (5) spelling mistakes and grammatical errors are common in clinical text [69]. Both biomedical literature mining and clinical text mining still face many open challenges and require more sophisticated and robust approaches.
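As a small illustration of challenge (2), the sketch below replaces a few kinds of identifiers with placeholders using regular expressions. The three patterns (a numeric date, a US-style phone number, and a record number written after "MRN") and the placeholder labels are assumptions made for this example. Real deidentification must cover many more identifier types, such as names and addresses, and cannot rely on patterns this simple.

```python
import re

# Illustrative patterns only; the formats are assumed, not taken from any standard.
PATTERNS = [
    ("[DATE]", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("[PHONE]", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("[MRN]", re.compile(r"\bMRN[:\s]*\d+\b")),
]


def deidentify(note):
    """Replace every match of each pattern with its placeholder label."""
    for placeholder, pattern in PATTERNS:
        note = pattern.sub(placeholder, note)
    return note
```

Text that matches none of the patterns is returned unchanged, so anything the patterns miss stays in the note.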

4. Role of Text Mining in Omics Study

A relationship between concepts of the same kind (e.g., gene-gene) or of different kinds (e.g., gene-disease) is commonly known as an “event” [74]. Events are useful for identifying many clinical facts such as disease onset and response to drug treatment. The overwhelming number of biomedical articles from omics research has accumulated an abundance of information, and advanced event extraction systems are required to handle the complexity of this information and to cover a variety of biomedical subdomains [16]. Text mining approaches do not replace the manual curation of biomedical information but can speed up the process severalfold [75, 76]. In this section, we describe the various text mining approaches developed for mining omics-related information.

4.1. Genomics and Text Mining

In the current era of genomics, text mining plays an important role in mining gene-gene interactions [77, 78] and other gene involved interactions (e.g., gene-chemical, gene-disease) [79, 80] to support integrative analysis of gene expression [81, 82], pathway construction [83, 84], ontology development [85], and database annotation [62, 86, 87].

Genes encode proteins, and proteins participate in various biological functions by interacting with other proteins. This encoding process takes place in two steps: transcription (i.e., DNA to RNA) and translation (i.e., RNA to protein). Many cellular processes are regulated by microRNAs, which degrade mRNA and suppress gene expression such that protein synthesis is interrupted. These processes are fundamental to genomics. In genomics, gene function is assessed from the involvement of genes/proteins in biochemical pathways. Functional genomics is a revolutionary area for text mining, in which the gene/protein mentions in biomedical articles and their relationships are considered important. Furthermore, gene and protein names are highly complex, and text mining has contributed to their recognition in unstructured text [57, 58].

Different text mining implementations for exploring the findings of genome research have been developed in the past decade. miRTex is a text mining system developed for mining experimentally validated microRNA gene targets from PubMed articles. The system has been successfully used to identify the triple-negative breast cancer related genes that are regulated by microRNAs [81]. More sophisticated approaches integrate gene expression data from microarray experiments, biomedical data extracted by text mining, and gene interaction data to predict gene-based drug indications [82]. A similar approach [87] attempts to support manual curation of links between biological databases such as Gene Expression Omnibus (GEO) and the PubMed database. Another approach [88] combines text mining data with microarray data to discover disease-gene associations by using unsupervised clustering. The gene-drug interaction information extracted by text mining has been used to predict drug-drug interactions [89]. Above all, researchers have attempted to use text mining for annotating genome function with gene ontology [90]. Thus, text mining and genomics together uncover much biomedical information that was previously unknown.

4.2. Proteomics and Text Mining

Protein-protein interaction is important for exploring the mechanisms involved in biological processes and the onset of diseases [91]. IntAct [92], BIND [93], MINT [94], and DIP [95] are the major databases available for protein-protein interaction. These databases are manually curated by domain experts, but a large portion of the information is still available only in the biomedical literature. Text mining bridges the gap between manual curation and the information hidden in the literature. The approaches to extract protein-protein interaction range from simple rule-based systems and cooccurrence systems to more sophisticated NLP methods [60] and machine learning systems [96]. Apart from protein-protein interaction extraction systems, text mining also provides automated approaches for extracting posttranslational modifications of proteins such as protein phosphorylation [59].
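The simplest of these approaches, cooccurrence, can be sketched as follows: two proteins are treated as candidate interaction partners whenever they are mentioned in the same sentence, and the number of such sentences serves as a crude confidence score. The protein lexicon, the naive sentence splitter, and the function names are assumptions made for this example. Cooccurrence alone cannot tell an interaction from any other kind of joint mention.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical protein lexicon; a real system would use a curated resource.
PROTEINS = {"TP53", "MDM2", "BRCA1", "BARD1"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def cooccurrence_counts(text):
    """Count, for each protein pair, the sentences in which both are mentioned."""
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        mentioned = sorted(tokens & PROTEINS)  # sorted so each pair has one key
        for pair in combinations(mentioned, 2):
            counts[pair] += 1
    return counts
```

Pairs are stored in alphabetical order, so the count for MDM2 and TP53 is found under the key `("MDM2", "TP53")` regardless of the order of mention.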

4.3. Transcriptomics, Metabolomics, and Text Mining

Text mining approaches for transcriptomics and metabolomics are limited. One major reason is that these two areas of omics are comparatively new when compared to genomics and proteomics. A recent study compares the metagenome characteristics of healthy individuals with those of autism patients to analyze the enzymes involved [97]. The computational approach uses text mining for genomics and metabolomics information extraction. A web-based tool called 3Omics is available for integrating, comparing, analyzing, and visualizing data from transcriptomics, metabolomics, and proteomics [98]. Another tool called Babelomics integrates transcriptomics, proteomics, and genomics data to uncover the underlying functional profiles [99]. Thus, a wide variety of hidden biomedical information within the omics data is extracted and predicted through text mining.

5. Conclusion

In this review, we summarized the current state of the art in omics research and the contribution of text mining approaches to uncovering the omics-related biomedical information hidden within the published articles. We discussed the core concepts of omics and the challenges involved in storing and analyzing the huge volume of omics data generated from high-throughput experiments. We also highlighted the use of computing techniques such as parallel processing and cloud computing to manage omics data and elaborated on text mining approaches for biomedical literature and clinical text with emphasis on omics. While the omics approach is becoming a commonly used practice in basic science and clinical diagnosis, it is important to note that data interpretation and translation remain the bottleneck. Advances in text mining can help to resolve the challenges with omics data and further support novel biomedical discoveries.

Acknowledgments

The authors acknowledge the support from the Undergraduate Research Opportunity Program (UROP) from the University of Michigan, the Dermatology Foundation, the Arthritis National Research Foundation, and the National Psoriasis Foundation.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

1.Morra E., Lazzarino M., Alessandrino E. P., et al. Central nervous system (CNS) leukemia: the role of high dose cytarabine (HDAra-C) Bone Marrow Transplantation. 1989;4(supplement 1):101–103. [PubMed] [Google Scholar]
2.Kell D. B. The virtual human: towards a global systems biology of multiscale, distributed biochemical network models. IUBMB Life. 2007;59(11):689–695. doi: 10.1080/15216540701694252. [DOI] [PubMed] [Google Scholar]
3.Westerhoff H. V., Palsson B. O. The evolution of molecular biology into systems biology. Nature Biotechnology. 2004;22(10):1249–1252. doi: 10.1038/nbt1020. [DOI] [PubMed] [Google Scholar]
4.Horgan R. P., Kenny L. C. ‘Omic’ technologies: genomics, transcriptomics, proteomics and metabolomics. The Obstetrician & Gynaecologist. 2011;13(3):189–195. doi: 10.1576/toag.13.3.189.27672. [DOI] [Google Scholar]
5.Tsoi L. C., Spain S. L., Ellinghaus E., et al. Enhanced meta-analysis and replication studies identify five new psoriasis susceptibility loci. Nature Communications. 2015;6 doi: 10.1038/ncomms8001.7001 [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Bertrand D., Chng K. R. E., Sherbaf F. G. H., et al. Patient-specific driver gene prediction and risk assessment through integrated network analysis of cancer omics profiles. Nucleic acids research. 2015;43(7):p. e44. doi: 10.1093/nar/gku1393. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.James P. Protein identification in the post-genome era: the rapid rise of proteomics. Quarterly Reviews of Biophysics. 1997;30(4):279–331. doi: 10.1017/s0033583597003399. [DOI] [PubMed] [Google Scholar]
8.Khoury G. A., Baliban R. C., Floudas C. A. Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Scientific Reports. 2011;1, article 90 doi: 10.1038/srep00090. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Mortazavi A., Williams B. A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008;5(7):621–628. doi: 10.1038/nmeth.1226. [DOI] [PubMed] [Google Scholar]
10.Nicholson J. K., Lindon J. C., Holmes E. “Metabonomics”: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica. 1999;29(11):1181–1189. doi: 10.1080/004982599238047. [DOI] [PubMed] [Google Scholar]
11.Shoenbill K., Fost N., Tachinardi U., Mendonca E. A. Genetic data and electronic health records: a discussion of ethical, logistical and technological considerations. Journal of the American Medical Informatics Association. 2014;21(1):171–180. doi: 10.1136/amiajnl-2013-001694. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Poste G. Bring on the biomarkers. Nature. 2011;469(7329):156–157. doi: 10.1038/469156a. [DOI] [PubMed] [Google Scholar]
13.Lu Q., Powles R. L., Wang Q., He B. J., Zhao H. Integrative tissue-specific functional annotations in the human genome provide novel insights on many complex traits and improve signal prioritization in genome wide association studies. PLoS Genetics. 2016;12(4) doi: 10.1371/journal.pgen.1005947.e1005947 [DOI] [PMC free article] [PubMed] [Google Scholar]
14. Puchades-Carrasco L., Palomino-Schätzlein M., Pérez-Rambla C., Pineda-Lucena A. Bioinformatics tools for the analysis of NMR metabolomics studies focused on the identification of clinically relevant biomarkers. Briefings in Bioinformatics. 2016;17(3):541–552. doi: 10.1093/bib/bbv077.
15. Forbes S. A., Beare D., Gunasekaran P., et al. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic Acids Research. 2015;43:D805–D811. doi: 10.1093/nar/gku1075.
16. Ananiadou S., Thompson P., Nawaz R., McNaught J., Kell D. B. Event-based text mining for biology and functional genomics. Briefings in Functional Genomics. 2015;14(3):213–230. doi: 10.1093/bfgp/elu015.
17. Krallinger M., Valencia A. Text-mining and information-retrieval services for molecular biology. Genome Biology. 2005;6(7, article no. 224). doi: 10.1186/gb-2005-6-7-224.
18. Lee H. K., Hsu A. K., Sajdak J., Qin J., Pavlidis P. Coexpression analysis of human genes across many microarray data sets. Genome Research. 2004;14(6):1085–1094. doi: 10.1101/gr.1910904.
19. Yeh A., Morgan A., Colosimo M., Hirschman L. BioCreAtIvE task 1A: gene mention finding evaluation. BMC Bioinformatics. 2005;6(1, article no. S2). doi: 10.1186/1471-2105-6-s1-s2.
20. Vlachos A. Tackling the BioCreative2 gene mention task with conditional random fields and syntactic parsing. Proceedings of the 2nd BioCreative Challenge Evaluation Workshop; April 2007; Madrid, Spain.
21. Bunescu R., Ge R., Kate R. J., et al. Comparative experiments on learning information extractors for proteins and their interactions. Artificial Intelligence in Medicine. 2005;33(2):139–155. doi: 10.1016/j.artmed.2004.07.016.
22. Fundel K., Küffner R., Zimmer R. RelEx—relation extraction using dependency parse trees. Bioinformatics. 2007;23(3):365–371. doi: 10.1093/bioinformatics/btl616.
23. Pyysalo S., Ginter F., Heimonen J., et al. BioInfer: a corpus for information extraction in the biomedical domain. BMC Bioinformatics. 2007;8, article 50. doi: 10.1186/1471-2105-8-50.
24. Ding J., Berleant D., Nettleton D., Wurtele E. Mining MEDLINE: abstracts, sentences, or phrases? Pacific Symposium on Biocomputing. 2002;7:326–337. doi: 10.1142/9789812799623_0031.
25. Leitner F., Krallinger M., Cesareni G., Valencia A. The FEBS letters SDA corpus: a collection of protein interaction articles with high quality annotations for the BioCreative II.5 online challenge and the text mining community. FEBS Letters. 2010;584(19):4129–4130. doi: 10.1016/j.febslet.2010.08.026.
26. Van Auken K., Schaeffer M. L., McQuilton P., et al. BC4GO: a full-text corpus for the BioCreative IV GO task. Database. 2014;2014:bau074. doi: 10.1093/database/bau074.
27. Thompson P., Iqbal S. A., McNaught J., Ananiadou S. Construction of an annotated corpus to support biomedical information extraction. BMC Bioinformatics. 2009;10, article 349. doi: 10.1186/1471-2105-10-349.
28. Gerner M., Nenadic G., Bergman C. M. An exploration of mining gene expression mentions and their anatomical locations from biomedical text. Proceedings of the Workshop on Biomedical Natural Language Processing; July 2010; Uppsala, Sweden. Association for Computational Linguistics; pp. 72–80.
29. Ohta T., Pyysalo S., Tsujii J., Ananiadou S. Open-domain anatomical entity mention detection. Proceedings of the Workshop on Detecting Structure in Scholarly Discourse; July 2012; Jeju, Korea. Association for Computational Linguistics.
30. Neves M., Damaschun E., Kurtz A., Leser U. Annotating and evaluating text for stem cell research. Proceedings of the 3rd Workshop on Building and Evaluation Resources for Biomedical Text Mining (BioTxtM '12) at Language Resources and Evaluation (LREC); 2012; Istanbul, Turkey.
31. Pray L. A. Discovery of DNA structure and function: Watson and Crick. Nature Education. 2008;1(1, article 100).
32. Issa N. T., Byers S. W., Dakshanamurthy S. Big data: the next frontier for innovation in therapeutics and healthcare. Expert Review of Clinical Pharmacology. 2014;7(3):293–298. doi: 10.1586/17512433.2014.905201.
33. Jiang S., Hinchliffe T. E., Wu T. Biomarkers of an autoimmune skin disease-psoriasis. Genomics, Proteomics and Bioinformatics. 2015;13(4):224–233. doi: 10.1016/j.gpb.2015.04.002.
34. Tebani A., Afonso C., Marret S., Bekri S. Omics-based strategies in precision medicine: toward a paradigm shift in inborn errors of metabolism investigations. International Journal of Molecular Sciences. 2016;17(9):1555. doi: 10.3390/ijms17091555.
35. Rothberg J. M., Hinz W., Rearick T. M., et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011;475(7356):348–352. doi: 10.1038/nature10242.
36. Clarke J., Wu H.-C., Jayasinghe L., Patel A., Reid S., Bayley H. Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnology. 2009;4(4):265–270. doi: 10.1038/nnano.2009.12.
37. Canuel V., Rance B., Avillach P., Degoulet P., Burgun A. Translational research platforms integrating clinical and omics data: a review of publicly available solutions. Briefings in Bioinformatics. 2015;16(2):280–290. doi: 10.1093/bib/bbu006.
38. Griebel L., Prokosch H., Köpcke F., et al. A scoping review of cloud computing in healthcare. BMC Medical Informatics and Decision Making. 2015;15, article 17. doi: 10.1186/s12911-015-0145-7.
39. Ocaña K., De Oliveira D. Parallel computing in genomic research: advances and applications. Advances and Applications in Bioinformatics and Chemistry. 2015;8:23–35. doi: 10.2147/aabc.s64482.
40. Wall D. P., Kudtarkar P., Fusaro V. A., Pivovarov R., Patil P., Tonellato P. J. Cloud computing for comparative genomics. BMC Bioinformatics. 2010;11, article no. 259. doi: 10.1186/1471-2105-11-259.
41. Armbrust M., Fox A., Griffith R., et al. A view of cloud computing. Communications of the ACM. 2010;53(4):50–58. doi: 10.1145/1721654.1721672.
42. Zaharia M., Chowdhury M., Franklin M. J., Shenker S., Stoica I. Spark: cluster computing with working sets. Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing; June 2010; Boston, Mass, USA. p. 10.
43. Baker M., Buyya R., Laforenza D. Grids and grid technologies for wide-area distributed computing. Software—Practice & Experience. 2002;32(15):1437–1466. doi: 10.1002/spe.488.
44. Ufimtsev I. S., Martinez T. J. Quantum chemistry on graphical processing units. 2. Direct self-consistent-field implementation. Journal of Chemical Theory and Computation. 2009;5(4):1004–1015. doi: 10.1021/ct800526s.
45. Hearst M. A. Untangling text data mining. Proceedings of the 37th annual meeting of the Association for Computational Linguistics (ACL '99); June 1999; College Park, Maryland. pp. 3–10.
46. Cohen K. B., Hunter L. Artificial Intelligence Methods and Tools for Systems Biology. Vol. 5. Dordrecht, The Netherlands: Springer; 2004. Natural language processing and systems biology; pp. 147–173. (Computational Biology).
47. Weeber M., Klein H., Aronson A. R., Mork J. G., de Jong-van den Berg L. T., Vos R. Text-based discovery in biomedicine: the architecture of the DAD-system. Proceedings of the AMIA Symposium. 2000:903–907.
48. Yeh A. S., Hirschman L., Morgan A. A. Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup. Bioinformatics. 2003;19, supplement 1:i331–i339. doi: 10.1093/bioinformatics/btg1046.
49. Liu Y., Liang Y., Wishart D. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more. Nucleic Acids Research. 2015;43(1):W535–W542. doi: 10.1093/nar/gkv383.
50. Tanabe L., Scherf U., Smith L. H., Lee J. K., Hunter L., Weinstein J. N. MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling. BioTechniques. 1999;27(6):1210–1217. doi: 10.2144/99276bc03.
51. Blaschke C., Andrade M. A., Ouzounis C., Valencia A. Automatic extraction of biological information from scientific text: protein-protein interactions. Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology; August 1999; Heidelberg, Germany. AAAI; pp. 60–67.
52. Baeza-Yates R., Ribeiro-Neto B. Modern Information Retrieval: The Concepts and Technology behind Search. 2nd ed. ACM Press; 2011.
53. Lin Y., Li W., Chen K., Liu Y. A document clustering and ranking system for exploring MEDLINE citations. Journal of the American Medical Informatics Association. 2007;14(5):651–661. doi: 10.1197/jamia.m2215.
54. Darmoni S. J., Soualmia L. F., Letord C., et al. Improving information retrieval using medical subject headings concepts: a test case on rare and chronic diseases. Journal of the Medical Library Association. 2012;100(3):176–183. doi: 10.3163/1536-5050.100.3.007.
55. Petrova M., Sutcliffe P., Fulford K. W. M., Dale J. Search terms and a validated brief search filter to retrieve publications on health-related values in Medline: a word frequency analysis study. Journal of the American Medical Informatics Association. 2012;19(3):479–488. doi: 10.1136/amiajnl-2011-000243.
56. Manning C. D., Raghavan P., Schuetze H. Introduction to Information Retrieval. Cambridge University Press; 2008.
57. Leaman R., Gonzalez G. BANNER: an executable survey of advances in biomedical named entity recognition. Proceedings of the 13th Pacific Symposium on Biocomputing (PSB '08); January 2008; Kohala Coast, Hawaii, USA. pp. 652–663.
58. Raja K., Subramani S., Natarajan J. A hybrid named entity tagger for tagging human proteins/genes. International Journal of Data Mining and Bioinformatics. 2014;10(3):315–328. doi: 10.1504/IJDMB.2014.064545.
59. Torii M., Arighi C. N., Li G., Wang Q., Wu C. H., Vijay-Shanker K. RLIMS-P 2.0: a generalizable rule-based information extraction system for literature mining of protein phosphorylation information. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015;12(1):17–29. doi: 10.1109/tcbb.2014.2372765.
60. Raja K., Subramani S., Natarajan J. PPInterFinder—a mining tool for extracting causal relations on human proteins from literature. Database (Oxford). 2013;2013:bas052. doi: 10.1093/database/bas052.
61. Natarajan J., Berrar D., Hack C. J., Dubitzky W. Knowledge discovery in biology and biotechnology texts: a review of techniques, evaluation strategies, and applications. Critical Reviews in Biotechnology. 2005;25(1-2):31–52. doi: 10.1080/07388550590935571.
62. Ravikumar K. E., Wagholikar K. B., Li D., Kocher J.-P., Liu H. Text mining facilitates database curation—extraction of mutation-disease associations from Bio-medical literature. BMC Bioinformatics. 2015;16(1, article 185). doi: 10.1186/s12859-015-0609-x.
63. Matos S., Campos D., Pinho R., et al. Mining clinical attributes of genomic variants through assisted literature curation in Egas. Database (Oxford). 2016;2016:baw096. doi: 10.1093/database/baw096.
64. Subramani S., Kalpana R., Monickaraj P. M., Natarajan J. HPIminer: a text mining system for building and visualizing human protein interaction networks and pathways. Journal of Biomedical Informatics. 2015;54:121–131. doi: 10.1016/j.jbi.2015.01.006.
65. Czarnecki J., Nobeli I., Smith A. M., Shepherd A. J. A text-mining system for extracting metabolic reactions from full-text articles. BMC Bioinformatics. 2012;13(1, article 172). doi: 10.1186/1471-2105-13-172.
66. Mishra R., Bian J., Fiszman M., et al. Text summarization in the biomedical domain: a systematic review of recent research. Journal of Biomedical Informatics. 2014;52:457–467. doi: 10.1016/j.jbi.2014.06.009.
67. Zhu F., Patumcharoenpol P., Zhang C., et al. Biomedical text mining and its applications in cancer research. Journal of Biomedical Informatics. 2013;46(2):200–211. doi: 10.1016/j.jbi.2012.10.007.
68. Meystre S. M., Savova G. K., Kipper-Schuler K. C., Hurdle J. F. Extracting information from textual documents in the electronic health record: a review of recent research. Yearbook of Medical Informatics. 2008:128–144.
69. Raja K., Jonnalagadda S. R. Natural language processing and data mining for clinical text. In: Reddy C. K., Aggarwal C. C., editors. Healthcare Data Analytics. CRC Press; 2015. pp. 219–250.
70. Welter D., MacArthur J., Morales J., et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research. 2014;42(1):D1001–D1006. doi: 10.1093/nar/gkt1229.
71. Zhang J., Jiang K., Lv L., et al. Use of genome-wide association studies for cancer research and drug repositioning. PLoS ONE. 2015;10(3):e0116477. doi: 10.1371/journal.pone.0116477.
72. Muraro D., Lauffenburger D. A., Simmons A. Prioritisation and network analysis of Crohn's disease susceptibility genes. PLoS ONE. 2014;9(9):e108624. doi: 10.1371/journal.pone.0108624.
73. Peterson T. A., Doughty E., Kann M. G. Towards precision medicine: advances in computational approaches for the analysis of human variants. Journal of Molecular Biology. 2013;425(21):4047–4063. doi: 10.1016/j.jmb.2013.08.008.
74. Kim J.-D., Nguyen N., Wang Y., Tsujii J., Takagi T., Yonezawa A. The Genia event and protein coreference tasks of the BioNLP shared task 2011. BMC Bioinformatics. 2012;13, supplement 11:S1. doi: 10.1186/1471-2105-13-S11-S1.
75. Wiegers T. C., Davis A. P., Cohen K. B., Hirschman L., Mattingly C. J. Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD). BMC Bioinformatics. 2009;10, article 326. doi: 10.1186/1471-2105-10-326.
76. Hirschman L., Burns G. A. P. C., Krallinger M., et al. Text mining for the biocuration workflow. Database. 2012;2012:bas020. doi: 10.1093/database/bas020.
77. Mallory E. K., Zhang C., Ré C., Altman R. B. Large-scale extraction of gene interactions from full-text literature using DeepDive. Bioinformatics. 2015;32(1):106–113. doi: 10.1093/bioinformatics/btv476.
78. Hur J., Özgür A., Xiang Z., He Y. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions. Journal of Biomedical Semantics. 2015;6(1, article no. 2). doi: 10.1186/2041-1480-6-2.
79. Davis A. P., Grondin C. J., Johnson R. J., et al. The comparative toxicogenomics database: update 2017. Nucleic Acids Research. 2017;45. doi: 10.1093/nar/gkw838.
80. Pletscher-Frankild S., Pallejà A., Tsafou K., Binder J. X., Jensen L. J. DISEASES: text mining and data integration of disease-gene associations. Methods. 2015;74:83–89. doi: 10.1016/j.ymeth.2014.11.020.
81. Li G., Ross K. E., Arighi C. N., Peng Y., Wu C. H., Vijay-Shanker K. miRTex: a text mining system for miRNA-gene relation extraction. PLoS Computational Biology. 2015;11(9):e1004391. doi: 10.1371/journal.pcbi.1004391.
82. Qabaja A., Jarada T., Elsheikh A., Alhajj R. Prediction of gene-based drug indications using compendia of public gene expression data and PubMed abstracts. Journal of Bioinformatics and Computational Biology. 2014;12(3):14500073. doi: 10.1142/s0219720014500073.
83. Donnard E., Barbosa-Silva A., Guedes R. L. M., et al. Preimplantation development regulatory pathway construction through a text-mining approach. BMC Genomics. 2011;12(4, article S3). doi: 10.1186/1471-2164-12-s4-s3.
84. Lehmann R., Childs L., Thomas P., et al. Assembly of a comprehensive regulatory network for the mammalian circadian clock: a bioinformatics approach. PLoS ONE. 2015;10(5):e0126283. doi: 10.1371/journal.pone.0126283.
85. Chen H., Han D., Dai Y., Zhao L. Design of automatic extraction algorithm of knowledge points for MOOCs. Computational Intelligence and Neuroscience. 2015;2015, article 123028, 10 pages. doi: 10.1155/2015/123028.
86. Weikard R., Hadlich F., Kuehn C. Identification of novel transcripts and noncoding RNAs in bovine skin by deep next generation sequencing. BMC Genomics. 2013;14(1, article no. 789). doi: 10.1186/1471-2164-14-789.
87. Neveol A., Wilbur W. J., Lu Z. Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE. Database. 2012;2012:bas026. doi: 10.1093/database/bas026.
88. Faro A., Giordano D., Spampinato C. Combining literature text mining with microarray data: advances for system biology modeling. Briefings in Bioinformatics. 2012;13(1):61–82. doi: 10.1093/bib/bbr018.
89. Percha B., Garten Y., Altman R. B. Discovery and explanation of drug-drug interactions via text mining. Proceedings of the 17th Pacific Symposium on Biocomputing (PSB '12); January 2012; Kohala Coast, Hawaii, USA. pp. 410–421.
90. Daley J. M., Niu H., Miller A. S., Sung P. Biochemical mechanism of DSB end resection and its regulation. DNA Repair. 2015;32:66–74. doi: 10.1016/j.dnarep.2015.04.015.
91. Kann M. G. Protein interactions and disease: computational approaches to uncover the etiology of diseases. Briefings in Bioinformatics. 2007;8(5):333–346. doi: 10.1093/bib/bbm031.
92. Kerrien S., Alam-Faruque Y., Aranda B., et al. IntAct—open source resource for molecular interaction data. Nucleic Acids Research. 2007;35(1):D561–D565. doi: 10.1093/nar/gkl958.
93. Bader G. D., Donaldson I., Wolting C., Ouellette B. F. F., Pawson T., Hogue C. W. V. BIND—The Biomolecular Interaction Network Database. Nucleic Acids Research. 2001;29(1):242–245. doi: 10.1093/nar/29.1.242.
94. Zanzoni A., Montecchi-Palazzi L., Quondam M., Ausiello G., Helmer-Citterich M., Cesareni G. MINT: a molecular INTeraction database. FEBS Letters. 2002;513(1):135–140. doi: 10.1016/s0014-5793(01)03293-8.
95. Salwinski L., Miller C. S., Smith A. J., Pettit F. K., Bowie J. U., Eisenberg D. The database of interacting proteins: 2004 update. Nucleic Acids Research. 2004;32:D449–D451. doi: 10.1093/nar/gkh086.
96. Bui Q.-C., Katrenko S., Sloot P. M. A. A hybrid approach to extract protein-protein interactions. Bioinformatics. 2011;27(2):259–265. doi: 10.1093/bioinformatics/btq620.
97. Heberling C., Dhurjati P. Novel systems modeling methodology in comparative microbial metabolomics: identifying key enzymes and metabolites implicated in autism spectrum disorders. International Journal of Molecular Sciences. 2015;16(4):8949–8967. doi: 10.3390/ijms16048949.
98. Kuo T.-C., Tian T.-F., Tseng Y. J. 3Omics: a web-based systems biology tool for analysis, integration and visualization of human transcriptomic, proteomic and metabolomic data. BMC Systems Biology. 2013;7, article 64. doi: 10.1186/1752-0509-7-64.
99. Medina I., Carbonell J., Pulido L., et al. Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling. Nucleic Acids Research. 2010;38(2):W210–W213. doi: 10.1093/nar/gkq388.

[B8] 8.Khoury G. A., Baliban R. C., Floudas C. A. Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database. Scientific Reports. 2011;1, article 90 doi: 10.1038/srep00090. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9] 9.Mortazavi A., Williams B. A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008;5(7):621–628. doi: 10.1038/nmeth.1226. [DOI] [PubMed] [Google Scholar]

[B10] 10.Nicholson J. K., Lindon J. C., Holmes E. “Metabonomics”: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica. 1999;29(11):1181–1189. doi: 10.1080/004982599238047. [DOI] [PubMed] [Google Scholar]

[B11] 11.Shoenbill K., Fost N., Tachinardi U., Mendonca E. A. Genetic data and electronic health records: a discussion of ethical, logistical and technological considerations. Journal of the American Medical Informatics Association. 2014;21(1):171–180. doi: 10.1136/amiajnl-2013-001694. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B12] 12.Poste G. Bring on the biomarkers. Nature. 2011;469(7329):156–157. doi: 10.1038/469156a. [DOI] [PubMed] [Google Scholar]

[B13] 13.Lu Q., Powles R. L., Wang Q., He B. J., Zhao H. Integrative tissue-specific functional annotations in the human genome provide novel insights on many complex traits and improve signal prioritization in genome wide association studies. PLoS Genetics. 2016;12(4) doi: 10.1371/journal.pgen.1005947.e1005947 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B14] 14.Puchades-Carrasco L., Palomino-Schätzlein M., Pérez-Rambla C., Pineda-Lucena A. Bioinformatics tools for the analysis of NMR metabolomics studies focused on the identification of clinically relevant biomarkers. Briefings in Bioinformatics. 2016;17(3):541–552. doi: 10.1093/bib/bbv077. [DOI] [PubMed] [Google Scholar]

[B15] 15.Forbes S. A., Beare D., Gunasekaran P., et al. COSMIC: exploring the world's knowledge of somatic mutations in human cancer. Nucleic Acids Research. 2015;43:D805–D811. doi: 10.1093/nar/gku1075. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] 16.Ananiadou S., Thompson P., Nawaz R., McNaught J., Kell D. B. Event-based text mining for biology and functional genomics. Briefings in Functional Genomics. 2015;14(3):213–230. doi: 10.1093/bfgp/elu015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B17] 17.Krallinger M., Valencia A. Text-mining and information-retrieval services for molecular biology. Genome Biology. 2005;6(7, article no. 224) doi: 10.1186/gb-2005-6-7-224. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B87] 18.Lee H. K., Hsu A. K., Sajdak J., Qin J., Pavlidis P. Coexpression analysis of human genes across many microarray data sets. Genome Research. 2004;14(6):1085–1094. doi: 10.1101/gr.1910904. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B88] 19.Yeh A., Morgan A., Colosimo M., Hirschman L. BioCreAtIvE task 1A: gene mention finding evaluation. BMC Bioinformatics. 2005;6(1, article no. S2) doi: 10.1186/1471-2105-6-s1-s2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B89] 20.Vlachos A. Tackling the BioCreative2 gene mention task with conditional random fields and syntactic parsing. Proceedings of the 2nd BioCreative Challenge Evaluation Workshop; April 2007; Madrid, Spain. [Google Scholar]

[B90] 21.Bunescu R., Ge R., Kate R. J., et al. Comparative experiments on learning information extractors for proteins and their interactions. Artificial Intelligence in Medicine. 2005;33(2):139–155. doi: 10.1016/j.artmed.2004.07.016. [DOI] [PubMed] [Google Scholar]

[B91] 22.Fundel K., Küffner R., Zimmer R. RelEx—relation extraction using dependency parse trees. Bioinformatics. 2007;23(3):365–371. doi: 10.1093/bioinformatics/btl616. [DOI] [PubMed] [Google Scholar]

[B92] 23.Pyysalo S., Ginter F., Heimonen J., et al. BioInfer: a corpus for information extraction in the biomedical domain. BMC Bioinformatics. 2007;8, article 50 doi: 10.1186/1471-2105-8-50. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B93] 24.Ding J., Berleant D., Nettleton D., Wurtele E. Mining MEDLINE: abstracts, sentences, or phrases? Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing. 2002;7:326–337. doi: 10.1142/9789812799623_0031. [DOI] [PubMed] [Google Scholar]

[B94] 25.Leitner F., Krallinger M., Cesareni G., Valencia A. The FEBS letters SDA corpus: a collection of protein interaction articles with high quality annotations for the BioCreative II.5 online challenge and the text mining community. FEBS Letters. 2010;584(19):4129–4130. doi: 10.1016/j.febslet.2010.08.026. [DOI] [PubMed] [Google Scholar]

[B95] 26.Van Auken K., Schaeffer M. L., McQuilton P., et al. BC4GO: a full-text corpus for the BioCreative IV GO task. Database. 2014;2014 doi: 10.1093/database/bau074.bau074 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B96] 27.Thompson P., Iqbal S. A., McNaught J., Ananiadou S. Construction of an annotated corpus to support biomedical information extraction. BMC Bioinformatics. 2009;10, article 349 doi: 10.1186/1471-2105-10-349. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B97] 28.Gerner M., Nenadic G., Bergman C. M. An exploration of mining gene expression mentions and their anatomical locations from biomedical text. Proceedings of the Workshop on Biomedical Natural Language Processing; July 2010; Uppsala, Sweden. Association for Computational Linguistics; pp. 72–80. [Google Scholar]

[B98] 29.Ohta T., Pyysalo S., Tsujii J., Ananiadou S. Open-domain anatomical entity mention detection. Proceedings of the Workshop on Detecting Structure in Scholarly Discourse; July 2012; Jeju, Korea. Association for Computational Linguistics; [Google Scholar]

[B99] 30.Neves M., Damaschun E., Kurtz A., Leser U. Annotating and evaluating text for stem cell research. Proceedings of the 3rd Workshop on Building and Evaluation Resources for Biomedical Text Mining (BioTxtM '12) at Language Resources and Evaluation (LREC); 2012; Istanbul, Turkey. [Google Scholar]

[B18] 31.Pray L. A. Discovery of DNA structure and function: Watson and Crick. Nature Education. 2008;1(1, article 100) [Google Scholar]

[B19] 32.Issa N. T., Byers S. W., Dakshanamurthy S. Big data: the next frontier for innovation in therapeutics and healthcare. Expert Review of Clinical Pharmacology. 2014;7(3):293–298. doi: 10.1586/17512433.2014.905201. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B20] 33.Jiang S., Hinchliffe T. E., Wu T. Biomarkers of an autoimmune skin disease-psoriasis. Genomics, Proteomics and Bioinformatics. 2015;13(4):224–233. doi: 10.1016/j.gpb.2015.04.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B21] 34.Tebani A., Afonso C., Marret S., Bekri S. Omics-based strategies in precision medicine: toward a paradigm shift in inborn errors of metabolism investigations. International Journal of Molecular Sciences. 2016;17(9):p. 1555. doi: 10.3390/ijms17091555. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22] 35.Rothberg J. M., Hinz W., Rearick T. M., et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011;475(7356):348–352. doi: 10.1038/nature10242. [DOI] [PubMed] [Google Scholar]

[B23] 36.Clarke J., Wu H.-C., Jayasinghe L., Patel A., Reid S., Bayley H. Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnology. 2009;4(4):265–270. doi: 10.1038/nnano.2009.12. [DOI] [PubMed] [Google Scholar]

[B24] 37.Canuel V., Rance B., Avillach P., Degoulet P., Burgun A. Translational research platforms integrating clinical and omics data: a review of publicly available solutions. Briefings in Bioinformatics. 2015;16(2):280–290. doi: 10.1093/bib/bbu006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25] 38.Griebel L., Prokosch H., Köpcke F., et al. A scoping review of cloud computing in healthcare. BMC Medical Informatics and Decision Making. 2015;15, article 17 doi: 10.1186/s12911-015-0145-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] 39.Ocaña K., De Oliveira D. Parallel computing in genomic research: advances and applications. Advances and Applications in Bioinformatics and Chemistry. 2015;8:23–35. doi: 10.2147/aabc.s64482. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B27] 40.Wall D. P., Kudtarkar P., Fusaro V. A., Pivovarov R., Patil P., Tonellato P. J. Cloud computing for comparative genomics. BMC Bioinformatics. 2010;11, article no. 259 doi: 10.1186/1471-2105-11-259. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B28] 41.Armbrust M., Fox A., Griffith R., et al. A view of cloud computing. Communications of the ACM. 2010;53(4):50–58. doi: 10.1145/1721654.1721672. [DOI] [Google Scholar]

[B29] 42.Zaharia M., Chowdhury M., Franklin M. J., Shenker S., Stoica I. Spark: cluster ComSpark: cluster computing with working sets. Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing; June 2010; Boston, Mass, USA. p. p. 10. [Google Scholar]

[B30] 43.Baker M., Buyya R., Laforenza D. Grids and grid technologies for wide-area distributed computing. Software—Practice & Experience. 2002;32(15):1437–1466. doi: 10.1002/spe.488. [DOI] [Google Scholar]

[B31] 44.Ufimtsev I. S., Martinez T. J. Quantum chemistry on graphical processing units. 2. Direct self-consistent-field implementation. Journal of Chemical Theory and Computation. 2009;5(4):1004–1015. doi: 10.1021/ct800526s. [DOI] [PubMed] [Google Scholar]

[B32] 45.Hearst M. A. Untangling text data mining. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL '99); June 1999; College Park, Maryland. pp. 3–10. [DOI] [Google Scholar]

[B33] 46.Cohen K. B., Hunter L. Artificial Intelligence Methods and Tools for Systems Biology. Vol. 5. Dordrecht, The Netherlands: Springer; 2004. Natural language processing and systems biology; pp. 147–173. (Computational Biology). [DOI] [Google Scholar]

[B34] 47.Weeber M., Klein H., Aronson A. R., Mork J. G., de Jong-van den Berg L. T., Vos R. Text-based discovery in biomedicine: the architecture of the DAD-system. Proceedings of the AMIA Symposium. 2000:903–907. [PMC free article] [PubMed] [Google Scholar]

[B35] 48.Yeh A. S., Hirschman L., Morgan A. A. Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup. Bioinformatics. 2003;19, supplement 1:i331–i339. doi: 10.1093/bioinformatics/btg1046. [DOI] [PubMed] [Google Scholar]

[B36] 49.Liu Y., Liang Y., Wishart D. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more. Nucleic Acids Research. 2015;43(1):W535–W542. doi: 10.1093/nar/gkv383. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B37] 50.Tanabe L., Scherf U., Smith L. H., Lee J. K., Hunter L., Weinstein J. N. MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling. BioTechniques. 1999;27(6):1210–1217. doi: 10.2144/99276bc03. [DOI] [PubMed] [Google Scholar]

[B38] 51.Blaschke C., Andrade M. A., Ouzounis C., Valencia A. Automatic extraction of biological information from scientific text: protein-protein interactions. Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology; August 1999; Heidelberg, Germany. AAAI; pp. 60–67. [PubMed] [Google Scholar]

[B77] 52.Baeza-Yates R., Ribeiro-Neto B. Modern Information Retrieval: The Concepts and Technology behind Search. 2nd. ACM Press; 2011. [Google Scholar]

[B78] 53.Lin Y., Li W., Chen K., Liu Y. A document clustering and ranking system for exploring MEDLINE citations. Journal of the American Medical Informatics Association. 2007;14(5):651–661. doi: 10.1197/jamia.m2215. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B79] 54.Darmoni S. J., Soualmia L. F., Letord C., et al. Improving information retrieval using medical subject headings concepts: a test case on rare and chronic diseases. Journal of the Medical Library Association. 2012;100(3):176–183. doi: 10.3163/1536-5050.100.3.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B80] 55.Petrova M., Sutcliffe P., Fulford K. W. M., Dale J. Search terms and a validated brief search filter to retrieve publications on health-related values in Medline: a word frequency analysis study. Journal of the American Medical Informatics Association. 2012;19(3):479–488. doi: 10.1136/amiajnl-2011-000243. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B40] 56.Manning C. D., Raghavan P., Schuetze H. Introduction to Information Retrieval. Cambridge University Press; 2008. [Google Scholar]

[B62] 57.Leaman R., Gonzalez G. BANNER: an executable survey of advances in biomedical named entity recognition. Proceedings of the 13th Pacific Symposium on Biocomputing (PSB '08); January 2008; Kohala Coast, Hawaii, USA. pp. 652–663. [PubMed] [Google Scholar]

[B63] 58.Raja K., Subramani S., Natarajan J. A hybrid named entity tagger for tagging human proteins/genes. International Journal of Data Mining and Bioinformatics. 2014;10(3):315–328. doi: 10.1504/IJDMB.2014.064545. [DOI] [PubMed] [Google Scholar]

[B39] 59.Torii M., Arighi C. N., Li G., Wang Q., Wu C. H., Vijay-Shanker K. RLIMS-P 2.0: a generalizable rule-based information extraction system for literature mining of protein phosphorylation information. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015;12(1):17–29. doi: 10.1109/tcbb.2014.2372765. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B72] 60.Raja K., Subramani S., Natarajan J. PPInterFinder—a mining tool for extracting causal relations on human proteins from literature. Database (Oxford). 2013;2013:bas052. doi: 10.1093/database/bas052. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B81] 61.Natarajan J., Berrar D., Hack C. J., Dubitzky W. Knowledge discovery in biology and biotechnology texts: a review of techniques, evaluation strategies, and applications. Critical Reviews in Biotechnology. 2005;25(1-2):31–52. doi: 10.1080/07388550590935571. [DOI] [PubMed] [Google Scholar]

[B59] 62.Ravikumar K. E., Wagholikar K. B., Li D., Kocher J.-P., Liu H. Text mining facilitates database curation—extraction of mutation-disease associations from Bio-medical literature. BMC Bioinformatics. 2015;16(1, article 185) doi: 10.1186/s12859-015-0609-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B82] 63.Matos S., Campos D., Pinho R., et al. Mining clinical attributes of genomic variants through assisted literature curation in Egas. Database (Oxford). 2016;2016:baw096. doi: 10.1093/database/baw096. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B83] 64.Subramani S., Kalpana R., Monickaraj P. M., Natarajan J. HPIminer: a text mining system for building and visualizing human protein interaction networks and pathways. Journal of Biomedical Informatics. 2015;54:121–131. doi: 10.1016/j.jbi.2015.01.006. [DOI] [PubMed] [Google Scholar]

[B84] 65.Czarnecki J., Nobeli I., Smith A. M., Shepherd A. J. A text-mining system for extracting metabolic reactions from full-text articles. BMC Bioinformatics. 2012;13(1, article 172) doi: 10.1186/1471-2105-13-172. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B85] 66.Mishra R., Bian J., Fiszman M., et al. Text summarization in the biomedical domain: a systematic review of recent research. Journal of Biomedical Informatics. 2014;52:457–467. doi: 10.1016/j.jbi.2014.06.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B86] 67.Zhu F., Patumcharoenpol P., Zhang C., et al. Biomedical text mining and its applications in cancer research. Journal of Biomedical Informatics. 2013;46(2):200–211. doi: 10.1016/j.jbi.2012.10.007. [DOI] [PubMed] [Google Scholar]

[B41] 68.Meystre S. M., Savova G. K., Kipper-Schuler K. C., Hurdle J. F. Extracting information from textual documents in the electronic health record: a review of recent research. Yearbook of Medical Informatics. 2008:128–144. [PubMed] [Google Scholar]

[B42] 69.Raja K., Jonnalagadda S. R. Natural language processing and data mining for clinical text. In: Reddy C. K., Aggarwal C. C., editors. Healthcare Data Analytics. CRC Press; 2015. pp. 219–250. [Google Scholar]

[B43] 70.Welter D., MacArthur J., Morales J., et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research. 2014;42(1):D1001–D1006. doi: 10.1093/nar/gkt1229. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B44] 71.Zhang J., Jiang K., Lv L., et al. Use of genome-wide association studies for cancer research and drug repositioning. PLoS ONE. 2015;10(3):e0116477. doi: 10.1371/journal.pone.0116477. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B45] 72.Muraro D., Lauffenburger D. A., Simmons A. Prioritisation and network analysis of Crohn's disease susceptibility genes. PLoS ONE. 2014;9(9):e108624. doi: 10.1371/journal.pone.0108624. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B46] 73.Peterson T. A., Doughty E., Kann M. G. Towards precision medicine: advances in computational approaches for the analysis of human variants. Journal of Molecular Biology. 2013;425(21):4047–4063. doi: 10.1016/j.jmb.2013.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B47] 74.Kim J.-D., Nguyen N., Wang Y., Tsujii J., Takagi T., Yonezawa A. The Genia event and protein coreference tasks of the BioNLP shared task 2011. BMC Bioinformatics. 2012;13, supplement 11:p. S1. doi: 10.1186/1471-2105-13-S11-S1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B48] 75.Wiegers T. C., Davis A. P., Cohen K. B., Hirschman L., Mattingly C. J. Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD). BMC Bioinformatics. 2009;10, article 326. doi: 10.1186/1471-2105-10-326. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B49] 76.Hirschman L., Burns G. A. P. C., Krallinger M., et al. Text mining for the biocuration workflow. Database. 2012;2012:bas020. doi: 10.1093/database/bas020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B50] 77.Mallory E. K., Zhang C., Ré C., Altman R. B. Large-scale extraction of gene interactions from full-text literature using DeepDive. Bioinformatics. 2015;32(1):106–113. doi: 10.1093/bioinformatics/btv476. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B51] 78.Hur J., Özgür A., Xiang Z., He Y. Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions. Journal of Biomedical Semantics. 2015;6(1, article no. 2) doi: 10.1186/2041-1480-6-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B52] 79.Davis A. P., Grondin C. J., Johnson R. J., et al. The comparative toxicogenomics database: update 2017. Nucleic Acids Research. 2017;45 doi: 10.1093/nar/gkw838. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B53] 80.Pletscher-Frankild S., Pallejà A., Tsafou K., Binder J. X., Jensen L. J. DISEASES: text mining and data integration of disease-gene associations. Methods. 2015;74:83–89. doi: 10.1016/j.ymeth.2014.11.020. [DOI] [PubMed] [Google Scholar]

[B54] 81.Li G., Ross K. E., Arighi C. N., Peng Y., Wu C. H., Vijay-Shanker K. miRTex: a text mining system for miRNA-gene relation extraction. PLoS Computational Biology. 2015;11(9):e1004391. doi: 10.1371/journal.pcbi.1004391. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B55] 82.Qabaja A., Jarada T., Elsheikh A., Alhajj R. Prediction of gene-based drug indications using compendia of public gene expression data and PubMed abstracts. Journal of Bioinformatics and Computational Biology. 2014;12(3). doi: 10.1142/s0219720014500073. [DOI] [PubMed] [Google Scholar]

[B56] 83.Donnard E., Barbosa-Silva A., Guedes R. L. M., et al. Preimplantation development regulatory pathway construction through a text-mining approach. BMC Genomics. 2011;12(4, article S3) doi: 10.1186/1471-2164-12-s4-s3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B57] 84.Lehmann R., Childs L., Thomas P., et al. Assembly of a comprehensive regulatory network for the mammalian circadian clock: a bioinformatics approach. PLoS ONE. 2015;10(5):e0126283. doi: 10.1371/journal.pone.0126283. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B58] 85.Chen H., Han D., Dai Y., Zhao L. Design of automatic extraction algorithm of knowledge points for MOOCs. Computational Intelligence and Neuroscience. 2015;2015, article 123028. doi: 10.1155/2015/123028. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B60] 86.Weikard R., Hadlich F., Kuehn C. Identification of novel transcripts and noncoding RNAs in bovine skin by deep next generation sequencing. BMC Genomics. 2013;14(1, article no. 789) doi: 10.1186/1471-2164-14-789. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B61] 87.Neveol A., Wilbur W. J., Lu Z. Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE. Database. 2012;2012:bas026. doi: 10.1093/database/bas026. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B64] 88.Faro A., Giordano D., Spampinato C. Combining literature text mining with microarray data: advances for system biology modeling. Briefings in Bioinformatics. 2012;13(1):61–82. doi: 10.1093/bib/bbr018. [DOI] [PubMed] [Google Scholar]

[B65] 89.Percha B., Garten Y., Altman R. B. Discovery and explanation of drug-drug interactions via text mining. Proceedings of the 17th Pacific Symposium on Biocomputing (PSB '12); January 2012; Kohala Coast, Hawaii, USA. pp. 410–421. [PMC free article] [PubMed] [Google Scholar]

[B66] 90.Daley J. M., Niu H., Miller A. S., Sung P. Biochemical mechanism of DSB end resection and its regulation. DNA Repair. 2015;32:66–74. doi: 10.1016/j.dnarep.2015.04.015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B67] 91.Kann M. G. Protein interactions and disease: computational approaches to uncover the etiology of diseases. Briefings in Bioinformatics. 2007;8(5):333–346. doi: 10.1093/bib/bbm031. [DOI] [PubMed] [Google Scholar]

[B68] 92.Kerrien S., Alam-Faruque Y., Aranda B., et al. IntAct—open source resource for molecular interaction data. Nucleic Acids Research. 2007;35(1):D561–D565. doi: 10.1093/nar/gkl958. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B69] 93.Bader G. D., Donaldson I., Wolting C., Ouellette B. F. F., Pawson T., Hogue C. W. V. BIND—The Biomolecular Interaction Network Database. Nucleic Acids Research. 2001;29(1):242–245. doi: 10.1093/nar/29.1.242. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B70] 94.Zanzoni A., Montecchi-Palazzi L., Quondam M., Ausiello G., Helmer-Citterich M., Cesareni G. MINT: a molecular INTeraction database. FEBS Letters. 2002;513(1):135–140. doi: 10.1016/s0014-5793(01)03293-8. [DOI] [PubMed] [Google Scholar]

[B71] 95.Salwinski L., Miller C. S., Smith A. J., Pettit F. K., Bowie J. U., Eisenberg D. The database of interacting proteins: 2004 update. Nucleic Acids Research. 2004;32:D449–D451. doi: 10.1093/nar/gkh086. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B73] 96.Bui Q.-C., Katrenko S., Sloot P. M. A. A hybrid approach to extract protein-protein interactions. Bioinformatics. 2011;27(2):259–265. doi: 10.1093/bioinformatics/btq620. [DOI] [PubMed] [Google Scholar]

[B74] 97.Heberling C., Dhurjati P. Novel systems modeling methodology in comparative microbial metabolomics: identifying key enzymes and metabolites implicated in autism spectrum disorders. International Journal of Molecular Sciences. 2015;16(4):8949–8967. doi: 10.3390/ijms16048949. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B75] 98.Kuo T.-C., Tian T.-F., Tseng Y. J. 3Omics: a web-based systems biology tool for analysis, integration and visualization of human transcriptomic, proteomic and metabolomic data. BMC Systems Biology. 2013;7, article 64 doi: 10.1186/1752-0509-7-64. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B76] 99.Medina I., Carbonell J., Pulido L., et al. Babelomics: an integrative platform for the analysis of transcriptomics, proteomics and genomic data with advanced functional profiling. Nucleic Acids Research. 2010;38(2):W210–W213. doi: 10.1093/nar/gkq388. [DOI] [PMC free article] [PubMed] [Google Scholar]
