Nucleic Acids Research. 2009 Nov 9;38(Database issue):D665–D669. doi: 10.1093/nar/gkp945

HLungDB: an integrated database of human lung cancer research

Lishan Wang 1, Yuanyuan Xiong 1, Yihua Sun 2, Zhaoyuan Fang 2, Li Li 2, Hongbin Ji 2,*, Tieliu Shi 1,3,*
PMCID: PMC2808962  PMID: 19900972

Abstract

The human lung cancer database (HLungDB) integrates lung cancer-related genes, proteins and miRNAs with the corresponding clinical information. The main purpose of this platform is to establish a network of lung cancer-related molecules and to facilitate the mechanistic study of lung carcinogenesis. The entries describing the relationships between molecules and human lung cancer in the current release were extracted manually from the literature. Currently, we have collected 2585 genes and 212 miRNAs with experimental evidence of involvement in the different stages of lung carcinogenesis through text mining. Furthermore, we have incorporated the results from analysis of transcription factor-binding motifs, promoters and SNP sites for each gene. Since epigenetic alterations also play an important role in lung carcinogenesis, genes with epigenetic regulation were also included. We hope HLungDB will enrich our knowledge of lung cancer biology and eventually lead to the development of novel therapeutic strategies. HLungDB can be freely accessed at http://www.megabionet.org/bio/hlung.

INTRODUCTION

Lung cancer, one of the most common causes of cancer-related death in both men and women, is responsible for 1.3 million deaths worldwide every year. Lung cancer can be roughly divided into two groups according to pathology: non-small cell lung cancer (NSCLC) (80.4%) and small cell lung cancer (SCLC) (16.8%) (1). Many factors potentially contribute to lung cancer formation, e.g. tobacco smoke, ionizing radiation and viral infection. However, the mechanisms involved in lung carcinogenesis remain largely unknown.

Similar to many other cancers, lung cancer is initiated by activation of oncogenes or inactivation of tumor suppressor genes (2). Previous studies have revealed various causes of lung cancer at the genomic level. Mutations in the K-ras proto-oncogene are responsible for 10–30% of lung adenocarcinomas (3,4). The epidermal growth factor receptor (EGFR) regulates cell proliferation, apoptosis, angiogenesis and tumor invasion (3). Oncogenic mutations and amplification of EGFR are common in non-small cell lung cancer and thus provide the basis for treatment with EGFR inhibitors. In contrast, Her2/neu oncogenic mutation is less frequently observed (3). Other oncogenes involved include c-MET, NKX2-1, PIK3CA and BRAF (3). Inactivation of tumor suppressor genes also plays an important role in lung carcinogenesis. The p53 tumor suppressor gene, located on chromosome 17p, is affected in 60–75% of lung cancers, including both NSCLC and SCLC, while Rb is more likely to be inactivated in SCLC (5). P16 is also frequently inactivated through methylation of its promoter region. Another important tumor suppressor gene is LKB1, whose loss-of-function mutation or deletion is observed in ∼30% of lung adenocarcinomas and 20% of squamous cell carcinomas (6,7). Genetic polymorphisms have also been implicated in lung carcinogenesis, e.g. in interleukin-1 (8), cytochrome P450 (9), apoptosis promoters such as caspase-8 (10) and DNA repair molecules such as XRCC1 (11). People with these polymorphisms are susceptible to lung cancer development after exposure to carcinogens. Studies also suggest that the MDM2 309G allele is a low-penetrance risk factor for lung cancer development in Asian populations (12).

Although lung cancer research data have accumulated dramatically during the past several years, to our knowledge, no database specifically focusing on lung cancer molecular biology is yet available. OMIM contains information on all known Mendelian disorders and focuses on the relationship between phenotype and genotype (13). MethyCancer was developed to study the interplay of DNA methylation, gene expression and cancer. It contains both highly integrated data on DNA methylation, cancer-related genes, mutations and cancer information from public resources, and the CpG Island (CGI) clones derived from large-scale sequencing projects (14). MiR2Disease aims to provide a comprehensive resource of microRNA misregulation in various human diseases (15). The EGFR Mutation Database offers a convenient compilation of somatic EGFR mutations in NSCLC and associated epidemiological and methodological data, including response to the tyrosine kinase inhibitors Gefitinib and Erlotinib (16). These databases approach cancer pathogenesis from different angles but touch only lightly on lung cancer. Thus, it is beneficial to establish a lung cancer-specific database or platform covering genes, proteins and miRNAs.

High-throughput techniques applied in lung cancer research have generated a large volume of data and provided important resources for exploring the molecular mechanisms and identifying lung cancer-related molecules. Integrating the information generated by small-scale studies with that from high-throughput technology could provide a unique resource to facilitate the systematic study of the lung carcinogenesis process. To this end, we collected lung cancer-related molecules and other detailed information for database construction through text mining in combination with bioinformatics analysis. This repository and maintenance system, specially designed for lung cancer information, is intended to facilitate future lung cancer investigations.

Overall, HLungDB enables the exploration of relevant information on human lung cancer-related molecules from multiple angles. This makes it a unique resource for human lung cancer and a useful platform for those interested in lung cancer biology.

DATA COLLECTION AND CONTENT

As mentioned above, the initial entries describing the relationship between genes and human lung cancer were collected manually. The gene–lung cancer relationships documented in the current release were collected by searching the PubMed database with a list of keywords, such as ‘lung cancer gene’, ‘pulmonary cancer gene’, ‘pulmonary adenocarcinoma gene’, etc. After obtaining the literature with these keywords, we read and interpreted each paper and collected the important information, including the type of gene alteration, the clinical correlation and/or significance of the gene alteration with lung cancer, the lung cancer subtype, the potential mechanism of gene regulation and the experimental methods involved.
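
The keyword search described above could in principle be issued programmatically. The sketch below only builds query URLs for the public NCBI E-utilities ESearch endpoint; the paper does not say which search interface was used, so the endpoint, the `retmax` value and the function name `build_esearch_url` are assumptions made for illustration. Only the three keywords quoted in the text are included.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public API; not mentioned in the paper).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Keywords quoted in the text; the authors' full keyword list is not given.
KEYWORDS = [
    "lung cancer gene",
    "pulmonary cancer gene",
    "pulmonary adenocarcinoma gene",
]


def build_esearch_url(keyword: str, retmax: int = 100) -> str:
    """Return an ESearch URL that lists PubMed IDs matching `keyword`."""
    params = {"db": "pubmed", "term": keyword, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# One URL per keyword; fetching and reading the papers remains a manual step.
urls = [build_esearch_url(keyword) for keyword in KEYWORDS]
for url in urls:
    print(url)
```

The sketch stops at URL construction because the curation itself, as described, relied on reading each returned paper.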

Each entry in the database contains detailed information on a lung cancer–gene relationship, including a basic description of the gene, the expression pattern of the gene (up- or down-regulated) in lung cancer patients, the experimentally validated regulatory information (transcription factors, their binding motifs and the promoter) and the protein–protein interaction (PPI) network.

Gene expression profiling data for lung cancer patient samples were also retrieved from GEO. Genes were considered differentially expressed if the change between lung cancer samples and normal controls was larger than 2-fold. To make the results more reliable, we selected only those genes that were differentially expressed in at least three patients in a dataset and displayed them on our web site.
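
The selection rule above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes linear-scale, positive intensities, one control value per gene, and that a patient counts toward the threshold whether the gene is up- or down-regulated (the text does not specify this). The gene names and values are made up.

```python
from typing import Dict, List

MIN_FOLD = 2.0      # fold-change threshold stated in the text
MIN_PATIENTS = 3    # minimum number of patients stated in the text


def classify(tumor: float, normal: float) -> str:
    """Label one tumor/normal pair as 'up', 'down' or 'none' by fold change."""
    ratio = tumor / normal
    if ratio > MIN_FOLD:
        return "up"
    if ratio < 1.0 / MIN_FOLD:
        return "down"
    return "none"


def select_genes(
    tumor: Dict[str, List[float]], normal: Dict[str, float]
) -> Dict[str, Dict[str, int]]:
    """Keep genes changed more than 2-fold in at least MIN_PATIENTS patients.

    `tumor` maps gene -> per-patient intensities; `normal` maps gene -> control
    intensity. Both layouts are assumptions made for this sketch.
    """
    selected = {}
    for gene, values in tumor.items():
        labels = [classify(value, normal[gene]) for value in values]
        counts = {"up": labels.count("up"), "down": labels.count("down")}
        if counts["up"] + counts["down"] >= MIN_PATIENTS:
            selected[gene] = counts
    return selected


# Hypothetical toy data: four patients, three genes.
tumor_values = {
    "GENE_A": [250.0, 300.0, 90.0, 210.0],
    "GENE_B": [100.0, 110.0, 95.0, 40.0],
    "GENE_C": [20.0, 30.0, 45.0, 100.0],
}
normal_values = {"GENE_A": 100.0, "GENE_B": 100.0, "GENE_C": 100.0}

selected = select_genes(tumor_values, normal_values)
print(selected)
```

The per-gene counts of up- and down-regulated patients correspond to the kind of summary the ‘Expression Alteration’ view is described as showing.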

In the current release of HLungDB, 2585 genes were selected for their relationships with lung carcinogenesis. A total of 271 lung cancer samples from six expression profiling datasets were analyzed to obtain the gene expression patterns (17–22). For the lung cancer-related SNPs, we searched PubMed with the keywords ‘SNP’ and ‘lung cancer’ and then collected, from the returned papers, the SNPs shown to be correlated with lung cancer. In total, 424 SNPs were added to the database, regardless of whether they could be mapped to a gene. Additionally, 360 transcription factors with 1160 binding motifs and 253 lung cancer-related genes with detailed epigenetic information were also placed into the database.

Accumulating evidence has indicated that miRNAs play an important role in lung cancer pathology. Previous experiments, with both high-throughput and small-scale methods, have identified many miRNAs that are differentially expressed in lung cancer and/or confirmed to be related to lung cancer. miRNA data are therefore an important resource for lung cancer research, and we selected lung cancer-related miRNAs with experimental information from the literature. For miRNAs with identified targets, the targets and the experimental methods used are also provided in the platform. Currently, 212 lung cancer-related miRNAs are included in HLungDB.

Next, we built the HLungDB database by integrating the data we collected with information from other resources (Figure 1), which makes our database a one-stop knowledge platform for the lung cancer research community.

Figure 1. The database structure of HLungDB.

DATA ACCESS

HLungDB provides a search engine to query detailed information on each gene–lung cancer relationship documented in the database. Queries may use a gene or protein symbol or any of its synonyms. The information flow is outlined in Figure 2.

Figure 2. The flowchart of query in the HLungDB database.

After the symbol or alias of a gene is submitted, gene-centered information is displayed on a new page, including the symbol, aliases, description, protein–protein interactions, expression alterations based on the microarray data and regulatory information, provided the gene has been confirmed to be related to lung cancer in our database. To see more details about how the gene is related to lung cancer, the user can click on the gene symbol link, and a new page will display the evidence for the gene’s relationship to lung cancer. ‘Clinical Significance’ indicates the clinical effect of the gene alteration on lung cancer, as collected from the literature; ‘Function’ describes the gene’s role in lung cancer as extracted from published papers; ‘Gene Regulation’ presents the regulatory relationship of the gene with other genes; and ‘Expression Alteration’ shows the results of the analysis of lung cancer-related microarray datasets, in which the user can see how many patients show upregulation and/or downregulation of the gene.
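
A query by symbol or alias, as described above, amounts to resolving several names to one gene record. The following is a minimal sketch under stated assumptions: the record fields (`symbol`, `aliases`, `description`), the case-insensitive matching and the function names are illustrative and do not reflect the actual HLungDB schema or search engine.

```python
from typing import Dict, List, Optional

# Toy records; the field names are illustrative, not the HLungDB schema.
GENES: List[Dict] = [
    {"symbol": "EGFR", "aliases": ["ERBB1", "HER1"],
     "description": "epidermal growth factor receptor"},
    {"symbol": "STK11", "aliases": ["LKB1"],
     "description": "serine/threonine kinase 11"},
    {"symbol": "TP53", "aliases": ["P53"],
     "description": "tumor protein p53"},
]


def build_index(records: List[Dict]) -> Dict[str, Dict]:
    """Map every symbol and alias (upper-cased) to its gene record."""
    index: Dict[str, Dict] = {}
    for record in records:
        for name in [record["symbol"], *record["aliases"]]:
            index[name.upper()] = record
    return index


def find_gene(index: Dict[str, Dict], query: str) -> Optional[Dict]:
    """Return the record for a symbol or alias, or None if it is unknown."""
    return index.get(query.strip().upper())


INDEX = build_index(GENES)
hit = find_gene(INDEX, "lkb1")
print(hit["symbol"] if hit else "not found")
```

Indexing aliases up front keeps each lookup to a single dictionary access, at the cost of deciding how to handle an alias shared by two genes, which this sketch does not address.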

The PPI link leads to a new page that shows the proteins interacting with the query protein. The ‘Show PPI Network’ link displays the protein–protein interaction network of the selected protein, based on experimental evidence taken mostly from the HPRD system (23). In the PPI network section, the interface gives access to all the features of HLungDB PPI and provides a direct view for exploring the relationships among the proteins.
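
The two PPI views described above (a partner list and a network around the query protein) can be sketched with an undirected adjacency map. The interaction pairs below are a small hand-picked toy list for illustration, not data exported from HLungDB or HPRD, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Toy interaction pairs for illustration only.
INTERACTIONS = [
    ("EGFR", "GRB2"),
    ("EGFR", "SHC1"),
    ("GRB2", "SOS1"),
    ("GRB2", "SHC1"),
    ("TP53", "MDM2"),
]


def build_network(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from interaction pairs."""
    network: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


def partners(network: Dict[str, Set[str]], protein: str) -> List[str]:
    """List the direct interaction partners of `protein` (the PPI list view)."""
    return sorted(network.get(protein, set()))


def local_network(network: Dict[str, Set[str]], protein: str) -> List[Tuple[str, str]]:
    """Return the edges among `protein` and its direct partners (the network view)."""
    nodes = {protein, *network.get(protein, set())}
    edges = {
        tuple(sorted((a, b)))
        for a in nodes
        for b in network.get(a, set())
        if b in nodes
    }
    return sorted(edges)


NETWORK = build_network(INTERACTIONS)
print(partners(NETWORK, "EGFR"))
print(local_network(NETWORK, "EGFR"))
```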

The ‘Regulatory Information’ links the user to the names of the transcription factors confirmed to regulate the gene. ‘See Details’ links the user to a new page that displays the binding site motifs of those transcription factors with the supporting PubMed ID. The ‘Show Promoter of Gene’ link will display the promoter sequence(s) of the selected gene. A gene with an unknown transcription factor will only show its promoter sequence(s).

Alternatively, the user can query the system with a protein symbol, and a summary page will provide a brief description of the protein, its PPIs, links to other related resources and the PPI network. Users can explore each item in detail by clicking the related links.

Users can also check whether a miRNA is related to lung cancer by querying with the miRNA symbol. The results page displays the manually collected details for the miRNA, including the disease type the miRNA is related to, the alterations in its expression, its mechanism in lung cancer, the experimental methods used to confirm the mechanism, its targets (if any) with PubMed IDs and a description of its involvement in lung cancer.

HLungDB provides two ways to view all lung cancer-related genes. The first is a visual chromosome browser, reached through ‘Chromosome’ on the first page. When the user clicks ‘Chromosome’ at the top of the page, a chromosome map is returned, and lung cancer-related genes can then be viewed by chromosome ID. The second is ‘Browse’ on the first page of HLungDB, which lists all the genes confirmed to be related to lung cancer in alphabetical order. With these two approaches, users can easily retrieve all genes that are related to lung cancer.
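
The two browsing modes above correspond to two simple views over the same gene list: one grouped by chromosome and one sorted alphabetically. The sketch below uses a handful of genes named in the Introduction with their well-known chromosome assignments; the data layout and function names are assumptions, not the HLungDB implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (symbol, chromosome) pairs; a toy subset for illustration.
GENE_CHROMOSOMES: List[Tuple[str, str]] = [
    ("TP53", "17"),
    ("EGFR", "7"),
    ("KRAS", "12"),
    ("STK11", "19"),
    ("BRAF", "7"),
]


def browse_all(genes: List[Tuple[str, str]]) -> List[str]:
    """'Browse' view: every gene symbol in alphabetical order."""
    return sorted(symbol for symbol, _ in genes)


def browse_by_chromosome(genes: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """'Chromosome' view: gene symbols grouped by chromosome ID."""
    by_chromosome: Dict[str, List[str]] = defaultdict(list)
    for symbol, chromosome in genes:
        by_chromosome[chromosome].append(symbol)
    return {chrom: sorted(symbols) for chrom, symbols in by_chromosome.items()}


all_genes = browse_all(GENE_CHROMOSOMES)
by_chrom = browse_by_chromosome(GENE_CHROMOSOMES)
print(all_genes)
print(by_chrom["7"])
```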

Another way to view lung cancer-related genes is the pathway view. In the pathway list, users can see the lung cancer genes in a pathway by clicking on the pathway name and can view the network for that pathway through the ‘Pathway Network’ link. Users can also click on a marginal node to expand the network. For more detailed usage of the network, users can read the annotation on the pathway network page.

Users can also view lung cancer-related information by browsing the SNP, transcription factor and methylation lists. The ‘SNP View’ lists the lung cancer-related SNPs obtained from PubMed by searching for ‘SNP’ and ‘lung cancer’. The ‘TransFactor View’ presents transcription factors related to lung cancer together with other detailed information. The ‘Methylation View’ displays genes with epigenetic alterations observed in lung cancer.

In addition, HLungDB provides cross-links to other relevant external resources. These include the National Center for Biotechnology Information, a repository for published gene information, and PubMed, a service of the US National Library of Medicine that includes over 18 million citations from MEDLINE and other life science journals. HPRD, HUGO, IPI, EBI and KEGG are also linked from HLungDB.

DISCUSSION

In order to provide a central resource for biologists in the lung cancer research community, we developed HLungDB, a database system aimed at providing a comprehensive resource of gene information and their relationships to lung cancer.

The goal of the lung cancer database project was to construct a large-scale platform for lung cancer that would contribute to basic and clinical research in the future. Over the past 2 years, large amounts of data have been collected for this project. Lung cancer data were obtained from the PubMed and GEO databases. Genes, miRNAs, gene promoters, transcription factors, transcription factor-binding sites and SNPs related to lung cancer have been collected and integrated into this system. Clinical information related to the gene expression profile data was also extracted from GEO. We have systematically extracted information from published lung cancer-related studies. The database currently contains 2585 full-text entries describing lung cancer and genes. They have been integrated in such a way that investigators can rapidly query whether a gene or protein has been found in human lung cancer and retrieve other detailed lung cancer-related information about it. The query interfaces give access to all the features of HLungDB.

HLungDB provides a comprehensive resource for human lung cancer research. We believe that HLungDB will be particularly interesting to the life science community and will greatly facilitate cancer biologists’ mission of unraveling the pathogenesis of lung cancer.

FUTURE DIRECTIONS

We are working to increase the quality and quantity of the data and to supply additional database functions. We plan to adopt two strategies to achieve these goals. First, text-mining tools will be adopted to improve our data collection. We will use them to regularly prescreen PubMed abstracts that potentially describe lung cancer–gene relationships. Second, since many proteins in signal transduction pathways are involved in lung cancer development and progression, our next step is to identify those signal transduction pathways that show significant changes and to display their components with identified alterations in lung cancer in a network view. At the same time, we will also collect the downstream genes for each altered signaling pathway in lung cancer and further characterize the relationships between them, with the ultimate goal of identifying new potentially relevant lung cancer genes and new mechanisms.
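
As an illustration of the prescreening idea in the first strategy, the sketch below flags abstracts that mention both a lung cancer term and a known gene symbol. It is a deliberately naive keyword filter written for this example, not the text-mining tool the authors plan to use; the regular expression, the case-sensitive symbol matching and the function name `prescreen` are all assumptions.

```python
import re
from typing import Iterable, List

# Hypothetical pattern for lung cancer mentions; a real tool would be broader.
LUNG_TERMS = re.compile(
    r"\b(lung|pulmonary)\s+(cancer|carcinoma|adenocarcinoma)\b", re.IGNORECASE
)


def prescreen(abstract: str, gene_symbols: Iterable[str]) -> List[str]:
    """Return gene symbols co-mentioned with a lung cancer term.

    An empty list means the abstract would not be passed on to a curator.
    Symbols are matched case-sensitively as whole tokens.
    """
    if not LUNG_TERMS.search(abstract):
        return []
    tokens = set(re.findall(r"[A-Za-z0-9-]+", abstract))
    return sorted(symbol for symbol in gene_symbols if symbol in tokens)


KNOWN_SYMBOLS = {"EGFR", "KRAS", "TP53"}
candidates = prescreen(
    "EGFR mutations are frequent in lung adenocarcinoma.", KNOWN_SYMBOLS
)
print(candidates)
```

A filter of this kind only narrows the reading list; whether an abstract actually reports a gene–lung cancer relationship would still need to be judged by a curator.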

FUNDING

State Key Program of Basic Research of China (Grant 2007CB108800, 2009CB918402, 2010CB912102); National High Technology Research and Development Program of China (863 project) (Grant No. 2006AA02Z313); National Natural Science Foundation of China (Grant 30870575, 30740084 and 30871284); Chinese Academy of Sciences (2008KIP101); Science and Technology Commission of Shanghai Municipality (06DZ22923, 08PJ14105). H.J. is a scholar of the Hundred Talents Program of the Chinese Academy of Sciences. Funding for open access charge: National Natural Science Foundation of China and the State Key Program of Basic Research of China.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank M. Tyler Hougland, Peng Li, Tian Xiao, Chao Zheng, Yan Feng, Rong Fang, Yijun Gao, Yujuan Jin, Zuoyun Wang, Xiankun Han, Junhua Zhang, Xiaolei Ye, Bin Gao, Hongling Huang, Fei Li and Ye Wang for technical support.

REFERENCES

  • 1. Travis WD, Travis LB, Devesa SS. Lung cancer. Cancer. 1995;75:191–202. doi: 10.1002/1097-0142(19950101)75:1+<191::aid-cncr2820751307>3.0.co;2-y.
  • 2. Fong KM, Sekido Y, Gazdar AF, Minna JD. Lung cancer. 9: Molecular biology of lung cancer: clinical implications. Thorax. 2003;58:892–900. doi: 10.1136/thorax.58.10.892.
  • 3. Herbst RS, Heymach JV, Lippman SM. Lung cancer. N. Engl. J. Med. 2008;359:1367–1380. doi: 10.1056/NEJMra0802714.
  • 4. Aviel-Ronen S, Blackhall FH, Shepherd FA, Tsao MS. K-ras mutations in non-small-cell lung carcinoma: a review. Clin. Lung Cancer. 2006;8:30–38. doi: 10.3816/CLC.2006.n.030.
  • 5. Devereux TR, Taylor JA, Barrett JC. Molecular mechanisms of lung cancer. Interaction of environmental and genetic factors. Giles F. Filley Lecture. Chest. 1996;109:14S–19S. doi: 10.1378/chest.109.3_supplement.14s.
  • 6. Ji H, Ramsey MR, Hayes DN, Fan C, McNamara K, Kozlowski P, Torrice C, Wu MC, Shimamura T, Perera SA, et al. LKB1 modulates lung cancer differentiation and metastasis. Nature. 2007;448:807–810. doi: 10.1038/nature06030.
  • 7. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008;455:1069–1075. doi: 10.1038/nature07423.
  • 8. Engels EA, Wu X, Gu J, Dong Q, Liu J, Spitz MR. Systematic evaluation of genetic variants in the inflammation pathway and risk of lung cancer. Cancer Res. 2007;67:6520–6527. doi: 10.1158/0008-5472.CAN-07-0370.
  • 9. Wenzlaff AS, Cote ML, Bock CH, Land SJ, Santer SK, Schwartz DR, Schwartz AG. CYP1A1 and CYP1B1 polymorphisms and risk of lung cancer among never smokers: a population-based study. Carcinogenesis. 2005;26:2207–2212. doi: 10.1093/carcin/bgi191.
  • 10. Son JW, Kang HK, Chae MH, Choi JE, Park JM, Lee WK, Kim CH, Kim DS, Kam S, Kang YM, et al. Polymorphisms in the caspase-8 gene and the risk of lung cancer. Cancer Genet. Cytogenet. 2006;169:121–127. doi: 10.1016/j.cancergencyto.2006.04.001.
  • 11. Yin J, Vogel U, Ma Y, Qi R, Sun Z, Wang H. The DNA repair gene XRCC1 and genetic susceptibility of lung cancer in a northeastern Chinese population. Lung Cancer. 2007;56:153–160. doi: 10.1016/j.lungcan.2006.12.012.
  • 12. Tomoda K, Ohkoshi T, Hirota K, Sonavane GS, Nakajima T, Terada H, Komuro M, Kitazato K, Makino K. Preparation and properties of inhalable nanocomposite particles for treatment of lung cancer. Colloids Surf. B: Biointerfaces. 2009;71:177–182. doi: 10.1016/j.colsurfb.2009.02.001.
  • 13. Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick’s Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 2009;37:D793–D796. doi: 10.1093/nar/gkn665.
  • 14. He X, Chang S, Zhang J, Zhao Q, Xiang H, Kusonmano K, Yang L, Sun ZS, Yang H, Wang J. MethyCancer: the database of human DNA methylation and cancer. Nucleic Acids Res. 2008;36:D836–D841. doi: 10.1093/nar/gkm730.
  • 15. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, Li M, Wang G, Liu Y. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37:D98–D104. doi: 10.1093/nar/gkn714.
  • 16. Gu D, Scaringe WA, Li K, Saldivar JS, Hill KA, Chen Z, Gonzalez KD, Sommer SS. Database of somatic mutations in EGFR with analyses revealing indel hotspots but no smoking-associated signature. Hum. Mutat. 2007;28:760–770. doi: 10.1002/humu.20512.
  • 17. Landi MT, Dracheva T, Rotunno M, Figueroa JD, Liu H, Dasgupta A, Mann FE, Fukuoka J, Hames M, Bergen AW, et al. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS ONE. 2008;3:e1651. doi: 10.1371/journal.pone.0001651.
  • 18. Stearman RS, Dwyer-Nield L, Zerbe L, Blaine SA, Chan Z, Bunn PA, Jr, Johnson GL, Hirsch FR, Merrick DT, Franklin WA, et al. Analysis of orthologous gene expression between human pulmonary adenocarcinoma and a carcinogen-induced murine model. Am. J. Pathol. 2005;167:1763–1775. doi: 10.1016/S0002-9440(10)61257-6.
  • 19. Wachi S, Yoneda K, Wu R. Interactome–transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues. Bioinformatics. 2005;21:4205–4208. doi: 10.1093/bioinformatics/bti688.
  • 20. Rohrbeck A, Neukirchen J, Rosskopf M, Pardillos GG, Geddert H, Schwalen A, Gabbert HE, von Haeseler A, Pitschke G, Schott M, et al. Gene expression profiling for molecular distinction and characterization of laser captured primary lung cancers. J. Transl. Med. 2008;6:69. doi: 10.1186/1479-5876-6-69.
  • 21. Wrage M, Ruosaari S, Eijk PP, Kaifi JT, Hollmen J, Yekebas EF, Izbicki JR, Brakenhoff RH, Streichert T, Riethdorf S, et al. Genomic profiles associated with early micrometastasis in lung cancer: relevance of 4q deletion. Clin. Cancer Res. 2009;15:1566–1574. doi: 10.1158/1078-0432.CCR-08-2188.
  • 22. Spira A, Beane JE, Shah V, Steiling K, Liu G, Schembri F, Gilman S, Dumas YM, Calner P, Sebastiani P, et al. Airway epithelial gene expression in the diagnostic evaluation of smokers with suspect lung cancer. Nat. Med. 2007;13:361–366. doi: 10.1038/nm1556.
  • 23. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, et al. Human Protein Reference Database—2009 update. Nucleic Acids Res. 2009;37:D767–D772. doi: 10.1093/nar/gkn892.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
