Cardiology Research and Practice. 2019 Dec 30;2019:4237285. doi: 10.1155/2019/4237285

FibroAtlas: A Database for the Exploration of Fibrotic Diseases and Their Genes

Jinying Liu 1,2, Dezhi Sun 3, Jiale Liu 3, Hao Xu 3, Yuan Liu 3, Yang Li 3, Lihong Diao 3, Xun Wang 3, Dan Wang 3, Lei Tian 2, Huimin Zhang 2, Zhongyang Liu 3, Weiquan Ren 2, Fuchu He 3, Dong Li 3, Shuzhen Guo 2
PMCID: PMC7012261  PMID: 32082621

Abstract

Background

Fibrosis is a highly dynamic process caused by prolonged injury, dysregulation of the normal wound-healing process, and extensive deposition of extracellular matrix (ECM) proteins. During fibrosis, multiple genes interact with environmental factors. Over recent decades, a large number of fibrosis-related genes have been identified, shedding light on the particular clinical manifestations of this complex process. However, genetic information about fibrosis is scattered across an extensive literature.

Methods

We extracted data from PubMed abstracts by text mining, then manually curated the literature and identified evidence sentences.

Results

We present FibroAtlas, which includes 1,439 well-annotated fibrosis-associated genes. FibroAtlas 1.0 is the first attempt to build a nonredundant and comprehensive catalog of fibrosis-related genes with supporting evidence derived from curated published literature, and it provides an overview of human fibrosis-related genes.

1. Introduction

Fibrosis is a chronic and progressive process characterized by excessive deposition of extracellular matrix (ECM), leading to overgrowth, hardening, and/or scarring of various tissues [1]. Fibrotic changes can affect almost all major tissues and organs, including the skin, kidney, lung, and liver, and also occur in various vascular disorders [2]. Failure to control abnormal wound-healing responses can lead to considerable tissue remodeling and organ malfunction, as seen in late-stage idiopathic pulmonary fibrosis and cardiac fibrosis [2, 3]. Aberrant fibrotic tissue remodeling may also be involved in tumor initiation and progression, and it can accelerate chronic graft rejection in organ transplant recipients [4]. Fibrosis is a major cause of morbidity and mortality: approximately 45 percent of all-cause mortality in the United States has been attributed to fibrotic disorders [1].

Identifying effective therapeutic targets and designing antifibrotic treatment strategies will depend on the underlying etiology, severity, and extent of the fibrotic disease. However, the etiology and pathogenesis of fibrosis remain largely unknown, which limits our ability to prevent or treat it optimally. The natural history of fibrosis and the factors associated with its progression are highly variable [5]. Many studies have indicated that both genetic factors and environmental exposures are implicated in the formation and progression of fibrosis. For example, rs35705950, a common polymorphism in the promoter of Mucin 5B (MUC5B), is associated with familial interstitial pneumonia and idiopathic pulmonary fibrosis, which suggests a crucial role for dysregulated MUC5B expression in the pathogenesis of pulmonary fibrosis [6]. Platelet factor 4 (PF4, also known as CXCL4) has been identified as a marker of fibrosis: its levels are elevated in patients with systemic sclerosis and correlate with the presence and progression of pulmonary arterial hypertension [7]. Studies have also suggested that multiple fibrotic diseases are often triggered by similar insults and share a number of common pathways, such as transforming growth factor beta (TGF-β), interleukin-6 (IL-6), and integrin-linked kinase signaling [8, 9].

However, there is still no database dedicated to fibrosis-associated genes. A targeted strategy is therefore needed to collect the large body of information on previously reported fibrosis-associated genes. To address this challenge, we created the FibroAtlas 1.0 database (http://biokb.ncpsb.org/fibroatlas/), which contains 1,439 manually curated fibrosis-related genes identified by literature mining. FibroAtlas is intended to shed light on the pathogenesis of individual cases, novel biomarkers for diagnosis and prognosis, and personalized therapeutic strategies.

2. Materials and Methods

2.1. Literature Mining and Manual Curation

We constructed an ontology-based bioentity recognizer to recognize and extract genes in PubMed abstracts. This system compares favorably with current state-of-the-art biomedical annotation systems such as BeCAS [10]. Evaluated against the CRAFT corpus [11] for gene/protein recognition based on the Protein Ontology (PR) [12], it achieved a precision of 0.959, a recall of 0.802, and an F-measure of 0.874. The same system was previously used to build AllerGAtlas 1.0 [13].
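Because the F-measure (F1) is the harmonic mean of precision and recall, the three reported evaluation values can be cross-checked: taking 0.959 as precision and 0.802 as recall yields 0.874, the remaining reported value. The minimal Python sketch below shows the calculation; the function name `f_measure` is illustrative and is not part of the recognizer.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives F1, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# CRAFT evaluation values from the text, read as precision 0.959 and recall 0.802.
f1 = f_measure(0.959, 0.802)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.874"
```

Reading the values in any other order does not reproduce the harmonic-mean relationship, which is a quick way to sanity-check reported metrics.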

Three steps were taken to compile a comprehensive catalog of candidate human fibrosis-related genes from PubMed abstracts.

First, our bioentity recognizer identified 227,458 sentences in 114,973 PubMed abstracts containing the keywords “fibrosis,” “fibrotic,” “fibrotic action,” or “fibrotic change,” or their lexical variants.

Second, using the Protein Ontology-based bioentity recognizer, we identified 4,079 human genes that co-occurred with the fibrosis-associated keywords at the sentence level, extracted from 62,302 sentences in 10,243 PubMed abstracts (Supplementary material: .xlsx).

Third, our experts manually curated the 4,079 candidate genes, and 1,439 of them were confirmed as human fibrosis-associated genes.
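The first two steps amount to sentence-level co-occurrence filtering. The Python sketch below illustrates the idea under simplifying assumptions: a small hypothetical synonym lexicon (`GENE_LEXICON`) stands in for the Protein Ontology-based recognizer, a naive regular-expression splitter stands in for proper sentence segmentation, and the keyword pattern covers only “fibrosis,” “fibroses,” and “fibrotic.” The third step, expert curation, is manual and is not modeled.

```python
import re
from collections import defaultdict

# Matches "fibrosis", "fibroses" and "fibrotic" (so also "fibrotic change"); an assumed,
# simplified stand-in for the recognizer's lexical variants.
FIBROSIS_PATTERN = re.compile(r"\bfibro(?:sis|ses|tic)\b", re.IGNORECASE)

# Hypothetical synonym -> gene symbol lexicon standing in for the Protein Ontology lookup.
GENE_LEXICON = {
    "MUC5B": "MUC5B",
    "mucin 5B": "MUC5B",
    "PF4": "PF4",
    "CXCL4": "PF4",
    "TGF-beta1": "TGFB1",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_genes(sentence: str) -> set[str]:
    """Return symbols whose synonyms appear as whole tokens in the sentence."""
    found = set()
    for synonym, symbol in GENE_LEXICON.items():
        pattern = rf"(?<!\w){re.escape(synonym)}(?!\w)"
        if re.search(pattern, sentence, re.IGNORECASE):
            found.add(symbol)
    return found


def cooccurrence_candidates(abstracts: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Map gene symbol -> [(PMID, sentence)] for sentences mentioning both fibrosis and the gene."""
    candidates: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for pmid, text in abstracts.items():
        for sentence in split_sentences(text):
            if not FIBROSIS_PATTERN.search(sentence):  # step 1: keep keyword sentences only
                continue
            for symbol in sorted(find_genes(sentence)):  # step 2: genes co-occurring in that sentence
                candidates[symbol].append((pmid, sentence))
    return dict(candidates)
```

Calling `cooccurrence_candidates` on a mapping of PMIDs to abstract text returns, for each gene, the evidence sentences that would then be handed to curators.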

The bioentity recognizer also identified sentence-level co-occurrences in PubMed abstracts between fibrosis-associated genes/proteins and fibrosis-related disease terms from the Human Disease Ontology (DO) [14]. In addition, genes mentioned together with the terms “biomarker,” “biomarkers,” “marker,” “markers,” or “mark” were flagged as potential biomarkers, and these candidates were then manually curated by our experts.
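The biomarker flagging step can be sketched as a whole-word keyword match over evidence sentences that have already been linked to a gene. This is an assumption about how the listed terms were applied; the word boundaries keep words such as “marked” or “trademark” from matching “mark.” The function name and input shape are hypothetical.

```python
import re

# Whole-word, case-insensitive match for the listed terms; "marked" or "trademark" do not match.
BIOMARKER_TERMS = re.compile(r"\b(?:biomarkers?|markers?|mark)\b", re.IGNORECASE)


def flag_biomarker_candidates(evidence: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (gene_symbol, sentence) pairs whose sentence mentions a biomarker term.

    The result lists candidates only; each would still need manual curation.
    """
    flagged: dict[str, list[str]] = {}
    for symbol, sentence in evidence:
        if BIOMARKER_TERMS.search(sentence):
            flagged.setdefault(symbol, []).append(sentence)
    return flagged
```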

2.2. Gene Annotation

We provide detailed annotations for each fibrosis-related gene to support deeper interpretation. NCBI Entrez Gene IDs and gene symbols were used for cross-links and annotations. Basic gene information, including gene symbol, synonyms, gene summary, chromosome, and chromosomal location, is supplied to facilitate alignment with known splicing sites. Gene Ontology (GO) annotations were taken from the AmiGO database [15], and gene-pathway relations were obtained from the Reactome database [16]. SNPs linked to the genes were retrieved from the dbSNP database [17] using the PMIDs (PubMed Unique Identifiers) of the supporting literature. Public databases such as Ensembl [18], Entrez Gene [19], UniProt [20], neXtProt [21], and Antibodypedia [22] were also used for identifier mapping and annotation.
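Joining the annotation sources on the NCBI Entrez Gene ID can be sketched as follows. The record fields and the `merge_annotations` helper are hypothetical, and the sketch assumes that the SNPs retrieved via PMIDs have already been mapped to Entrez Gene IDs upstream.

```python
from dataclasses import dataclass, field
from typing import Iterable, Mapping


@dataclass
class GeneAnnotation:
    """Hypothetical per-gene record keyed by NCBI Entrez Gene ID."""
    entrez_id: int
    symbol: str
    synonyms: list[str] = field(default_factory=list)
    chromosome: str = ""
    go_terms: set[str] = field(default_factory=set)  # GO IDs (e.g. retrieved via AmiGO)
    pathways: set[str] = field(default_factory=set)  # Reactome pathway IDs
    snps: set[str] = field(default_factory=set)      # dbSNP rs IDs


def merge_annotations(
    genes: Mapping[int, tuple[str, list[str], str]],
    go: Mapping[int, Iterable[str]],
    reactome: Mapping[int, Iterable[str]],
    dbsnp: Mapping[int, Iterable[str]],
) -> dict[int, GeneAnnotation]:
    """Attach per-source annotations to the curated gene set by Entrez Gene ID.

    IDs present in a source but absent from `genes` are ignored, so only the
    curated genes receive annotations; duplicate IDs within a source collapse.
    """
    records: dict[int, GeneAnnotation] = {}
    for entrez_id, (symbol, synonyms, chromosome) in genes.items():
        records[entrez_id] = GeneAnnotation(
            entrez_id=entrez_id,
            symbol=symbol,
            synonyms=list(synonyms),
            chromosome=chromosome,
            go_terms=set(go.get(entrez_id, ())),
            pathways=set(reactome.get(entrez_id, ())),
            snps=set(dbsnp.get(entrez_id, ())),
        )
    return records
```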

3. Results

3.1. Database Implementation and Service

All identified fibrosis-related genes/proteins, human disease terms, and biomarkers were loaded into a local MySQL server. The FibroAtlas web interface was implemented in PHP on a Windows server. All FibroAtlas data are accessible without login or registration.
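A minimal relational layout for genes, disease terms, and evidence sentences might look like the following. The table and column names are hypothetical, and Python's built-in `sqlite3` stands in for MySQL only so that the sketch runs without a server; the aggregate query mirrors the per-gene counts of abstracts and sentences shown on the search result page.

```python
import sqlite3

# Hypothetical schema; sqlite3 is used here only so the sketch is self-contained.
SCHEMA = """
CREATE TABLE gene (
    entrez_id INTEGER PRIMARY KEY,
    symbol    TEXT NOT NULL UNIQUE
);
CREATE TABLE disease_term (
    do_id TEXT PRIMARY KEY,      -- Human Disease Ontology ID
    name  TEXT NOT NULL
);
CREATE TABLE evidence (
    id           INTEGER PRIMARY KEY,
    entrez_id    INTEGER NOT NULL REFERENCES gene(entrez_id),
    do_id        TEXT REFERENCES disease_term(do_id),
    pmid         INTEGER NOT NULL,
    sentence     TEXT NOT NULL,
    is_biomarker INTEGER NOT NULL DEFAULT 0  -- 1 if curated as a biomarker
);
"""


def genes_for_disease(conn: sqlite3.Connection, disease_name: str) -> list[tuple[str, int, int]]:
    """Return (symbol, number of abstracts, number of sentences) for genes linked to a disease term."""
    query = """
        SELECT g.symbol,
               COUNT(DISTINCT e.pmid) AS n_abstracts,
               COUNT(*)               AS n_sentences
        FROM evidence e
        JOIN gene g         ON g.entrez_id = e.entrez_id
        JOIN disease_term d ON d.do_id = e.do_id
        WHERE d.name = ?
        GROUP BY g.symbol
        ORDER BY n_abstracts DESC, g.symbol
    """
    return conn.execute(query, (disease_name,)).fetchall()
```

To try it, open an in-memory database with `sqlite3.connect(":memory:")` and load the schema with `conn.executescript(SCHEMA)`; the parameterized `?` placeholder keeps user-supplied search terms out of the SQL text.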

3.2. Database Search and Navigation

FibroAtlas provides a user-friendly web interface for querying the database (http://biokb.ncpsb.org/fibroatlas/), with five components: “Home,” “Browse & Download,” “Feedback,” “FAQ,” and “Contact” (Figure 1). The “Home” page supports three main types of query: gene symbol, nucleotide sequence, and protein sequence. For example, when users type a gene name into the “Gene Symbol” search box, an autocomplete dropdown list shows the matching gene symbols in FibroAtlas; users can select one and click the “Search” button to go to the results page. When users search by nucleotide or protein sequence, the BLAST sequence match scores are listed, and users can choose a matched gene name and click “continue” to open the results page (Figure 1(A)). The results page displays a table containing the queried gene, the supporting literature evidence for related human disease terms, the role of the gene, and the number of supporting evidence items (Figure 1(B)). Clicking the gene hyperlink opens the gene annotation page, which includes a list of SNPs mapped to dbSNP, Gene Ontology (GO) terms derived from GOA, pathway identifiers derived from Reactome, and the gene description from UniProtKB, among other information (Figure 1(C)). Clicking the number of evidence abstracts or sentences opens a table containing the gene symbol, the PubMed ID, and the manually curated evidence. For any individual evidence sentence of interest, users can view the whole abstract with entity names (i.e., gene aliases and disease terms) highlighted (Figure 1(D)). The “Browse & Download” page supports three browsing approaches, and all data can be freely downloaded (Figure 1(E)).
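The autocomplete dropdown can be approximated with binary search over a sorted symbol list, since all symbols sharing a prefix form one contiguous slice. This is a hedged sketch, not the site's PHP implementation; the class name is hypothetical, and normalizing symbols to upper case for both matching and display is an assumption.

```python
import bisect


class SymbolAutocomplete:
    """Prefix lookup over gene symbols for a search-box dropdown (hypothetical helper).

    Symbols are upper-cased and sorted once, so every match for a prefix lies in
    a contiguous slice that starts at the bisect_left position of the prefix.
    """

    def __init__(self, symbols):
        self._symbols = sorted({s.upper() for s in symbols})

    def suggest(self, prefix: str, limit: int = 10) -> list[str]:
        p = prefix.upper()
        if not p:
            return []
        start = bisect.bisect_left(self._symbols, p)
        matches = []
        for symbol in self._symbols[start:]:
            if not symbol.startswith(p) or len(matches) == limit:
                break  # left the matching slice, or the dropdown is full
            matches.append(symbol)
        return matches
```

For example, `SymbolAutocomplete(["STAT3", "STAT1", "SMAD3", "TGFB1"]).suggest("sta")` returns `["STAT1", "STAT3"]`. Each lookup costs one binary search plus the number of suggestions returned, which keeps the dropdown responsive even for a full gene list.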

Figure 1.

(A) The “Home” page supports three main types of query: gene symbol, nucleotide sequence, and protein sequence. Users can enter a gene symbol such as “STAT3” in the query box, or input a nucleotide or protein sequence, in which case the BLAST sequence identity score is displayed. Users then choose the matched gene name and click “continue” to view the search results. (B) The results page lists a table including the queried gene, related disease terms, and supporting evidence. (C) Clicking the gene symbol “STAT3” in the “search results” interface shows detailed information on STAT3 and cross-links to external databases. (D) Clicking the number of PubMed abstracts or sentences in the “search results” interface shows a table containing the gene, associated disease term, PubMed ID, evidence, and manual curation. Clicking an evidence link on this page shows the abstract with highlighted keywords. (E) The “Browse & Download” page offers three browsing approaches, and all data can be downloaded.

3.3. Application Case of the Database

Cardiac fibrosis is an inevitable consequence of chronic myocardial injury and leads to both systolic and diastolic dysfunction in many cardiac pathological conditions [23]. It is common in the end stages of diverse cardiac diseases and is a predictor of sudden cardiac death [24]. There is an urgent need to unravel the intricate mechanisms underlying the development of cardiac fibrosis in order to prevent its long-term sequelae. Searching the database for “cardiac fibrosis” returned 119 expert-curated genes with detailed annotations, and we ran pathway analyses on this gene list. The results show that most of these genes share common pathways, including the MAPK signaling pathway, cytokine-cytokine receptor interaction, the Hippo signaling pathway, the TGF-beta signaling pathway, and the mTOR signaling pathway (Figure 2). These results are consistent with the literature and suggest that fibrosis arises from multiple coactivated pathogenic pathways that affect inflammation and wound repair [25–27]. For example, Yes-associated protein (Yap) acts as a transcriptional cofactor in the Hippo signaling pathway, activating the transcription of target genes, and inactivation of Yap after myocardial infarction (MI) increases myocyte apoptosis and fibrosis [28]. Users can also follow the hyperlink of any cardiac fibrosis-related gene of interest to a page with its detailed functional annotation, such as gene-related SNPs, pathways, and GO terms.
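clusterProfiler is an R package, and its over-representation analysis rests on a one-sided hypergeometric test: with N genes in the universe, K of them in a pathway, and n genes in the query list, the p-value for an overlap of k is P(X ≥ k) = Σ C(K, i) C(N − K, n − i) / C(N, n), summed from i = k to min(K, n). The Python sketch below re-implements that test with Benjamini-Hochberg adjustment purely to show the arithmetic; the pathway dictionary and gene universe are inputs the user must supply, and the sketch does not replace clusterProfiler, which also handles annotation retrieval and gene-set size limits.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N universe genes, K pathway genes, n query genes)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down, enforcing monotonicity
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(gene_list, pathways, universe):
    """Return (pathway, overlap k, pathway size K, p, p_adj) tuples sorted by p."""
    universe = set(universe)
    genes = set(gene_list) & universe
    N, n = len(universe), len(genes)
    rows = []
    for name, members in pathways.items():
        members = set(members) & universe
        k = len(genes & members)
        if k > 0:  # in this sketch, only pathways hit by at least one gene are tested
            rows.append((name, k, len(members), hypergeom_sf(k, N, len(members), n)))
    adjusted = benjamini_hochberg([row[3] for row in rows])
    results = [row + (p_adj,) for row, p_adj in zip(rows, adjusted)]
    return sorted(results, key=lambda r: r[3])
```

Applied to a curated gene list such as the cardiac fibrosis set, `enrich` would rank pathways by how unlikely their overlap is under random sampling from the universe.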

Figure 2.

Bioinformatics pathway analysis for cardiac fibrosis-related gene sets with clusterProfiler [29].

4. Discussion

Identification of key regulators of cell proliferation and quiescence is a significant step toward potential regenerative therapies [3, 30]. FibroAtlas 1.0 is the first comprehensive, up-to-date resource that compiles the literature on fibrosis-related genes and their functions in disease. FibroAtlas 1.0 (http://biokb.ncpsb.org/fibroatlas/) is a time-saving tool with curated content that provides accurate information on, and an overview of, human fibrosis-related genes. Analysis with Reactome (http://www.reactome.org/) [16] shows a strong tendency for these genes to participate in pathways of signal transduction, the immune system, the cell cycle, hemostasis, gene expression (transcription), extracellular matrix organization, metabolism of proteins, developmental biology, the neuronal system, cell-cell communication, transport of small molecules, and muscle contraction, among others (Figure 3(a)). Protein class analysis with DAVID (https://david.ncifcrf.gov) [31] reveals that these genes are predominantly signaling molecules, hydrolases, receptors, enzyme modulators, nucleic acid binding proteins, defense/immunity proteins, transcription factors, and transferases, among others (Figure 3(b)).

Figure 3.

Bioinformatics analysis on the list of human fibrosis-related genes. (a) Biological pathway analysis with Reactome (http://www.reactome.org/). (b) Protein class analysis with PANTHER (http://pantherdb.org/).

FibroAtlas 1.0 supports a feedback loop: signed-in users can click the green “Yes” or red “No” button to accept or reject each evidence sentence (Figure 4). The database will be updated periodically based on this feedback.
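A simple tally of the Yes/No votes per evidence sentence, with a rule for sending contested sentences back to curators, might look like the sketch below. The one-vote-per-user policy, the `min_votes` threshold, and the function names are assumptions for illustration only.

```python
from collections import Counter
from typing import Iterable


def tally_feedback(votes: Iterable[tuple[str, str, bool]]) -> dict[str, tuple[int, int]]:
    """Return {evidence_id: (yes_count, no_count)}.

    `votes` holds (user_id, evidence_id, accepted) in chronological order; a later
    vote by the same user on the same evidence sentence replaces the earlier one.
    """
    latest: dict[tuple[str, str], bool] = {}
    for user_id, evidence_id, accepted in votes:
        latest[(user_id, evidence_id)] = accepted
    yes: Counter = Counter()
    no: Counter = Counter()
    for (_, evidence_id), accepted in latest.items():
        (yes if accepted else no)[evidence_id] += 1
    return {eid: (yes[eid], no[eid]) for eid in set(yes) | set(no)}


def needs_review(summary: dict[str, tuple[int, int]], min_votes: int = 3) -> list[str]:
    """Hypothetical rule: re-curate sentences with at least min_votes votes and more No than Yes."""
    return sorted(eid for eid, (y, n) in summary.items() if y + n >= min_votes and n > y)
```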

Figure 4.

All logged-in users can give their feedback by clicking the “Yes” or “No” button to confirm or reject the evidence phrases.

In the future, we intend to improve the database in several ways. First, we will continue collecting fibrosis-related genes and regularly adding data from genome-wide association studies. Second, we plan to integrate protein-protein interaction (PPI) information from HPRD [32] and BioGRID [33] and then extract the direct interactors of candidate fibrosis-related proteins. Finally, to help users prioritize and select information, we will implement a score for each fibrosis-related gene based on its supporting evidence, considering factors such as the number of supporting publications from text mining-based sources, the number of sources that report the association, the animal models and experimental strategies in which the association has been studied, and the type of curation of each source. In conclusion, we believe that FibroAtlas 1.0 will become a well-established resource with stable releases and will be widely used by the research community and allied fields.
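The planned per-gene score has not yet been defined; the sketch below is one hypothetical way to combine the listed factors (publication counts, number of sources, animal models or experimental strategies, and curation type). All weights, the logarithmic damping, the multi-source bonus, and the saturating transform into [0, 1) are illustrative assumptions, not the FibroAtlas design.

```python
import math
from dataclasses import dataclass

# Hypothetical curation weights; any real scheme would need calibration.
CURATION_WEIGHTS = {"manual": 1.0, "semi-automatic": 0.6, "text-mining": 0.3}


@dataclass
class Evidence:
    source: str          # identifier of the reporting source
    n_publications: int  # supporting publications from this source
    n_models: int        # distinct animal models / experimental strategies
    curation: str        # key into CURATION_WEIGHTS; unknown types get weight 0


def gene_score(evidence: list[Evidence]) -> float:
    """Combine per-source evidence into a score in [0, 1)."""
    raw = 0.0
    for ev in evidence:
        weight = CURATION_WEIGHTS.get(ev.curation, 0.0)
        # log1p damps large publication counts so one heavily studied source cannot dominate
        raw += weight * (math.log1p(ev.n_publications) + 0.5 * math.log1p(ev.n_models))
    n_sources = len({ev.source for ev in evidence})
    raw *= 1 + 0.1 * max(n_sources - 1, 0)  # small bonus for independent sources
    return 1 - math.exp(-raw / 5)
```

A saturating transform keeps scores comparable across genes while still rewarding additional, independent evidence.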

Acknowledgments

This work was funded by the Program of Precision Medicine (2016YFC0901905), National Natural Science Foundation of China (31871341), Major Project (BWS18J008), State Key Laboratory of Proteomics (SKLP-K201702), National Key Research and Development Program (2017YFC1700105), Innovation Project (16CXZ027), National Natural Science Foundation of China (31601064), Fundamental Research Funds for the Central Universities (1000061222884), and Beijing Nova Program (Z171100001117117).

Contributor Information

Dong Li, Email: lidong.bprc@foxmail.com.

Shuzhen Guo, Email: guoshz@bucm.edu.cn.

Data Availability

The data sets generated during the current study are available in the FibroAtlas 1.0 repository (http://biokb.ncpsb.org/fibroatlas/).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors' Contributions

Dong Li and Shuzhen Guo conceived and conducted this work. Data collection, curation, and analysis were performed by Jiale Liu, Yuan Liu, Dezhi Sun, Jinying Liu, Yang Li, Xun Wang, Hao Xu, Lei Tian, and Huimin Zhang. The manuscript was written and revised by Jinying Liu and Dezhi Sun. The website was developed by Lihong Diao and Dezhi Sun. All authors critically revised and edited the manuscript and approved the final version. Jinying Liu, Dezhi Sun, Jiale Liu, and Hao Xu contributed equally to this work.

Supplementary Materials

Supplementary data are available at Cardiology Research and Practice Online.

References

1. Wynn T. Cellular and molecular mechanisms of fibrosis. The Journal of Pathology. 2008;214(2):199–210. doi: 10.1002/path.2277.
2. Wynn T. A. Fibrotic disease and the TH1/TH2 paradigm. Nature Reviews Immunology. 2004;4(8):583–594. doi: 10.1038/nri1412.
3. Wynn T. A., Ramalingam T. R. Mechanisms of fibrosis: therapeutic translation for fibrotic disease. Nature Medicine. 2012;18(7):1028–1040. doi: 10.1038/nm.2807.
4. Wynn T. A. Common and unique mechanisms regulate fibrosis in various fibroproliferative diseases. Journal of Clinical Investigation. 2007;117(3):524–529. doi: 10.1172/jci31487.
5. Poynard T., Bedossa P., Opolon P. Natural history of liver fibrosis progression in patients with chronic hepatitis C. The OBSVIRC, METAVIR, CLINIVIR, and DOSVIRC groups. The Lancet. 1997;349(9055):825–832. doi: 10.1016/s0140-6736(96)07642-8.
6. Seibold M. A., Wise A. L., Speer M. C., et al. A common MUC5B promoter polymorphism and pulmonary fibrosis. The New England Journal of Medicine. 2011;364(16):1503–1512. doi: 10.1056/NEJMoa1013660.
7. van Bon L., Affandi A. J., Broen J., et al. Proteome-wide analysis and CXCL4 as a biomarker in systemic sclerosis. The New England Journal of Medicine. 2014;370(5):433–443. doi: 10.1056/nejmoa1114576.
8. Makarev E., Izumchenko E., Aihara F., et al. Common pathway signature in lung and liver fibrosis. Cell Cycle. 2016;15(13):1667–1673. doi: 10.1080/15384101.2016.1152435.
9. Sangaralingham S. J., Wang B. H., Huang L., et al. Cardiorenal fibrosis and dysfunction in aging: imbalance in mediators and regulators of collagen. Peptides. 2016;76:108–114. doi: 10.1016/j.peptides.2016.01.004.
10. Nunes T., Campos D., Matos S., Oliveira J. L. BeCAS: biomedical concept recognition services and visualization. Bioinformatics. 2013;29(15):1915–1916. doi: 10.1093/bioinformatics/btt317.
11. Bada M., Eckert M., Evans D., et al. Concept annotation in the CRAFT corpus. BMC Bioinformatics. 2012;13:161. doi: 10.1186/1471-2105-13-161.
12. Natale D. A., Arighi C. N., Blake J. A., et al. Protein Ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Research. 2017;45(D1):D339–D346. doi: 10.1093/nar/gkw1075.
13. Liu J., Liu Y., Wang D., et al. AllerGAtlas 1.0: a human allergy-related genes database. Database. 2018;2018:bay010. doi: 10.1093/database/bay010.
14. Schriml L. M., Mitraka E., Munro J., et al. Human disease ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Research. 2019;47(D1):D955–D962. doi: 10.1093/nar/gky1032.
15. Carbon S., Ireland A., Mungall C. J., et al. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009;25(2):288–289. doi: 10.1093/bioinformatics/btn615.
16. Fabregat A., Jupe S., Matthews L., et al. The Reactome pathway knowledgebase. Nucleic Acids Research. 2018;46(D1):D649–D655. doi: 10.1093/nar/gkx1132.
17. Sherry S. T., Ward M., Sirotkin K. dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Research. 1999;9(8):677–679.
18. Yates A., Akanni W., Amode M. R., et al. Ensembl 2016. Nucleic Acids Research. 2016;44(D1):D710–D716. doi: 10.1093/nar/gkv1157.
19. Maglott D., Ostell J., Pruitt K. D., et al. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Research. 2005;33(suppl_1):D54–D58. doi: 10.1093/nar/gki031.
20. Bairoch A., Apweiler R., Wu C. H., et al. The Universal Protein Resource (UniProt). Nucleic Acids Research. 2005;33(suppl_1):D154–D159. doi: 10.1093/nar/gki070.
21. Gaudet P., Argoud-Puy G., Cusin I., et al. neXtProt: organizing protein knowledge in the context of human proteome projects. Journal of Proteome Research. 2012;12(1):293–298. doi: 10.1021/pr300830v.
22. Björling E., Uhlén M. Antibodypedia, a portal for sharing antibody and antigen validation data. Molecular & Cellular Proteomics. 2008;7(10):2028–2037. doi: 10.1074/mcp.m800264-mcp200.
23. Kong P., Christia P., Frangogiannis N. G. The pathogenesis of cardiac fibrosis. Cellular and Molecular Life Sciences. 2014;71(4):549–574. doi: 10.1007/s00018-013-1349-6.
24. van de Schoor F. R., Aengevaeren V. L., Hopman M. T. E., et al. Myocardial fibrosis in athletes. Mayo Clinic Proceedings. 2016;91(11):1617–1631. doi: 10.1016/j.mayocp.2016.07.012.
25. Wilson M. S., Madala S. K., Ramalingam T. R., et al. Bleomycin and IL-1β-mediated pulmonary fibrosis is IL-17A dependent. The Journal of Experimental Medicine. 2010;207(3):535–552. doi: 10.1084/jem.20092121.
26. Chen G., Chen H., Wang C. Rapamycin ameliorates kidney fibrosis by inhibiting the activation of mTOR signaling in interstitial macrophages and myofibroblasts. PLoS One. 2012;7(3):e33626. doi: 10.1371/journal.pone.0033626.
27. Xin M., Kim Y., Sutherland L. B., et al. Hippo pathway effector Yap promotes cardiac regeneration. Proceedings of the National Academy of Sciences. 2013;110(34):13839–13844. doi: 10.1073/pnas.1313192110.
28. Del Re D. P., Yang Y., Nakano N., et al. Yes-associated protein isoform 1 (Yap1) promotes cardiomyocyte survival and growth to protect against myocardial ischemic injury. Journal of Biological Chemistry. 2013;288(6):3977–3988. doi: 10.1074/jbc.m112.436311.
29. Yu G., Wang L.-G., Han Y., He Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012;16(5):284–287. doi: 10.1089/omi.2011.0118.
30. Li X., Zhu L., Wang B., Yuan M., Zhu R. Drugs and targets in fibrosis. Frontiers in Pharmacology. 2017;8:855–884. doi: 10.3389/fphar.2017.00855.
31. Huang D. W., Sherman B. T., Lempicki R. A. Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protocols. 2009;4(1):44–57. doi: 10.1038/nprot.2008.211.
32. Peri S., Navarro J. D., Amanchy R., et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Research. 2003;13(10):2363–2371. doi: 10.1101/gr.1680803.
33. Stark C., Breitkreutz B. J., Reguly T., Boucher L., Breitkreutz A., Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Research. 2006;34(Database issue):D535–D539. doi: 10.1093/nar/gkj109.


Articles from Cardiology Research and Practice are provided here courtesy of Wiley
