Abstract
More than 100 000 human genetic variations have been described in various genes that are associated with a wide variety of diseases. Such data provides invaluable information for both clinical medicine and basic science. A number of locus-specific databases have been developed to exploit this huge amount of data. However, the scope, format and content of these databases differ strongly, and since no standard for variation databases has yet been adopted, the way data is presented varies enormously. This review gives an overview of current public and commercial resources for human variation data.
Background
In recent years, the cloning of genes involved in complex diseases such as cancer and the development of new high-throughput techniques such as single nucleotide polymorphism (SNP) arrays have made enormous progress. As a result, more than 100 000 human genetic variations have been described in various genes associated with a wide variety of diseases (1–3). Somatic variations in cancer are used in clinical studies and molecular pathology to characterize tumor types, to select the best-suited treatment, and to predict response to treatment. Thus, mutation analysis can play an important role in drug discovery and drug development. Identification of genetic variants will yield new drug targets and biomarkers.
Cancer, as a disease of genome alterations, arises through the sporadic acquisition of multiple somatic variations (4). However, not all mutations contribute equally to the cancer type in which they are found. The proportion of mutations causally implicated in cancer is still unknown, mainly because of the high number of variations between different tumors (5–9). Although the number of unique variations for each cancer genome can be very high (10,11), only a few somatic variations will be critical for the development of the tumor. These causative variations, the so-called ‘drivers’, emerge because of selective pressure during tumorigenesis, whereas many other mutations, the so-called ‘passengers’, are only incidental or caused by genome instabilities (12). Distinguishing disease-causing driver mutations from passenger variations is a challenge for mutation analysis (13).
Usefulness of mutation analysis
Analysis of mutations is useful in many ways: the study of cancer-prone DNA repair diseases (Xeroderma pigmentosum, Ataxia telangiectasia, Fanconi’s anemia, Bloom’s syndrome and others) has given valuable insights into the type and function of genes responsible for maintaining DNA integrity (14–18). Mutation analysis can help to predict the risk of developing certain types of cancer, with BRCA1 and BRCA2 (increased breast cancer risk) (19) and APC (increased risk of colon cancer) (20) being among the best-known examples so far.
Mutations can also influence the response of patients to cancer drugs, e.g. the KRAS (21,22) or BRAF (23,24) mutations. The presence of certain mutations can also influence progression free or overall survival rates of patients (22,25).
Germline versus somatic mutations
In general, mutations can be grouped into two different categories: germline and somatic. Germline mutations are variations found in all cells of an organism, including germline cells. They play an important role in evolution by giving every human their genetic individuality (see SNPs) but also give rise to hereditary diseases like sickle-cell anemia or phenylketonuria. Germline mutations can also lead to an increased risk of developing cancer, like BRCA1 and BRCA2 gene mutations, which are associated with an increased risk of breast and ovarian cancer (26–28). Other examples of familial cancer syndromes include von Hippel–Lindau syndrome (caused by mutations in VHL) (29), Peutz–Jeghers syndrome (caused by mutations in LKB1) (30) and Li–Fraumeni syndrome (caused by mutations in TP53) (31).
Detection of germline mutations with current technologies is state of the art but time-consuming. Usually a large amount of genetic material of good quality can be extracted from blood cells. However, in addition to the mutation detection, the differentiation of disease causing and neutral germline mutations having no effect on the phenotype is an important but non-trivial task. Currently, no generally applicable solution for this problem exists and this question often remains unsolved.
Somatic mutations are not inherited but acquired during lifetime in somatic cells of an organism and might cause tissue-specific tumors. An important problem with somatic mutations is the difficulty of their detection. Tumor samples can be very heterogeneous and are very often ‘contaminated’ with normal cells, such as stromal cells. However, since somatic mutations are identified through a comparison of a tumor sample with a normal sample of the same organism, the identification of the mutation is unambiguous. For somatic mutations, too, the differentiation between drivers and passengers is an important but still unsolved problem. In contrast to germline mutations, however, all somatic mutations are tumor-associated. Therefore, all non-silent somatic mutations are potential candidates for biomarker development.
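The tumor-versus-normal comparison described above can be illustrated with a minimal Python sketch. The `Variant` tuple and the function name are hypothetical and chosen only for illustration; the sketch assumes that variant calls for both samples are already available and it ignores tumor heterogeneity and contamination with normal cells.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not taken from any database)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_somatic_germline(
    tumor: Iterable[Variant], normal: Iterable[Variant]
) -> Tuple[Set[Variant], Set[Variant]]:
    """Compare a tumor sample with the matched normal sample.

    Variants seen only in the tumor are treated as somatic; variants seen
    in both samples are treated as germline.
    """
    tumor_set, normal_set = set(tumor), set(normal)
    somatic = tumor_set - normal_set
    germline = tumor_set & normal_set
    return somatic, germline


# Made-up example calls, for illustration only.
shared = Variant("chr17", 1000, "C", "T")
tumor_only = Variant("chr12", 2000, "G", "A")
somatic, germline = split_somatic_germline([shared, tumor_only], [shared])
```

In this made-up example, `somatic` contains only the variant absent from the normal sample, and `germline` contains the shared one.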
Mutation types
Genome alterations are typically classified by the mutation type. The different databases characterize all variations first by the effect on the nucleotide sequence: deletions, insertions and single nucleotide variations. Mutations occurring in the coding region of a gene can also be classified by their effect on the amino acid sequence. A variation of the coding sequence without any change of the amino acid sequence of the protein is called a silent mutation. Single nucleotide mutations causing the substitution of a different amino acid are called missense mutations. A frameshift mutation is an insertion or deletion in the coding sequence which changes the reading frame, resulting in a different translation of the subsequent sequence. Nonsense mutations generate a premature stop codon and often a non-functional truncated protein product.
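As an illustration of these categories, the following Python sketch classifies a single-codon substitution using the standard genetic code and classifies an insertion or deletion by its length. The function names are hypothetical. The sketch assumes that the reference codon is not itself a stop codon, and it uses the label "in-frame" (a term not used above) for insertions or deletions whose length is a multiple of three.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a substitution within one codon by its effect on the protein.

    Assumes ref_codon encodes an amino acid (i.e. is not a stop codon).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def classify_indel(length: int) -> str:
    """Classify an insertion or deletion of `length` nucleotides in coding sequence."""
    if length == 0:
        raise ValueError("an insertion or deletion must have non-zero length")
    return "frameshift" if abs(length) % 3 != 0 else "in-frame"


print(classify_substitution("GAG", "GAA"))  # silent (Glu -> Glu)
print(classify_substitution("GAG", "GTG"))  # missense (Glu -> Val)
print(classify_substitution("GAG", "TAG"))  # nonsense (Glu -> stop)
print(classify_indel(1))                    # frameshift
```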
SNPs versus germline mutations
Single nucleotide germline mutations and SNPs are often used as synonyms, since both describe variations of single nucleotides which are inherited and not tumor-associated per se. However, in the databases presented here, the two terms are used with different meanings: SNPs as presented in public databases like dbSNP (32,33) or HapMap (34) are germline variations for which, at most, population frequencies are known. In the literature it is usually assumed that a variation should be found in more than 1% of the population in order to be called a SNP. Such information is very useful for biomarker development since it describes the prevalence of the mutation in different populations. However, it is normally not possible to get additional information (like gender, age, or disease status) on the individuals having the SNP; only the population a person belongs to is given. Since it is not known whether the information comes from a tumor or a normal sample, a correlation between diseases and SNPs cannot be calculated.
In contrast, germline mutations presented in cancer or disease mutation databases like ‘The Cancer Genome Atlas (TCGA)’ (35) are usually connected with additional sample information like patient gender, age, histology or tissue. Germline mutations are found in the normal as well as the tumor sample. Hence, the sample information allows for further analyses of associations between germline mutations and diseases.
Standardization efforts
A common problem in every field where huge amounts of data are generated is standardization. Without standardization, identifying and integrating the data is complicated, laborious, error-prone and time-consuming. Although databases may have different scopes and aims, it is important to standardize content and annotation. The Human Genome Variation Society (HGVS) has proposed recommendations for the nomenclature of genetic variations and for the content of mutation databases and scientific publications (36). This naming of mutations has now become widely accepted. Some journals (e.g. Human Mutation) already accept only publications whose mutation notation follows the HGVS recommendations. If more publishers followed this trend, it would have a very positive effect on the usability of mutation databases, including an increase in the quality and amount of their content.
HGVS and its members have published a number of recommendations, e.g. for the collection of somatic mutations, sharing data, etc. There are also projects at the European Bioinformatics Institute (EBI) and the National Center for Biotechnology Information (NCBI) to develop reference sequences: Locus Reference Genomic sequences (LRGs) (http://www.lrg-sequence.org) and RefSeqGenes (http://www.ncbi.nlm.nih.gov/projects/RefSeq/RSG/), respectively. In addition, the Gen2Phen (http://www.gen2phen.org) project works on data models and standards for a number of aspects related to variation data description, storage and integration in databases.
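To show why a standardized notation simplifies data integration, the following Python sketch parses one simple form of HGVS-style description: a single-nucleotide substitution at the coding DNA (`c.`) or genomic (`g.`) level, such as `c.35G>A`. The regular expression and the function name are hypothetical and cover only this one pattern; the full HGVS recommendations describe many more variant types (deletions, insertions, intronic positions, protein-level changes) that this sketch does not attempt to handle.

```python
import re
from typing import Optional

# Matches only simple substitutions such as "c.35G>A" or "g.1234C>T".
SUBSTITUTION = re.compile(
    r"^(?P<level>[cg])\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(description: str) -> Optional[dict]:
    """Parse a simple HGVS-style substitution; return None if it does not match."""
    match = SUBSTITUTION.match(description.strip())
    if match is None:
        return None
    return {
        "level": match.group("level"),      # 'c' = coding DNA, 'g' = genomic
        "position": int(match.group("pos")),
        "ref": match.group("ref"),
        "alt": match.group("alt"),
    }


print(parse_substitution("c.35G>A"))
# {'level': 'c', 'position': 35, 'ref': 'G', 'alt': 'A'}
print(parse_substitution("35 G to A"))  # None: free-text descriptions are rejected
```

A description written in free text cannot be parsed this way, which is the practical cost of non-standard notation that the paragraph above describes.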
Except for the already widely accepted naming recommendations of mutations by the HGVS, a promising standardization effort for integrating all cancer genome data is still missing.
Structure and accessibility
Historically, mutations and variations in humans have been reported only in the published literature. Mutation descriptions were often not precise, no standard notation existed, and the sequence of the reference gene under study was almost never indicated. As a result, a sophisticated mutation analysis was mostly unfeasible. However, with the explosion of large-scale cancer genome sequencing (35,37–40), more and more information on genetic variations has been captured in recent years in publicly available databases that can be used by clinicians or scientists as a research tool. These databases are widespread and their scope, format and content can be very different. Current data related to somatic mutations is mostly buried in journals or scattered between several locus-specific databases (LSDBs) and general databases that have no or very limited connections between them.
Only a few large public resources exist that comprehensively compile data on somatic gene alterations in cancer: International Agency for Research on Cancer (IARC) TP53 Database (41), Catalog of Somatic Mutations in Cancer (COSMIC) (42), TCGA (35), Roche Cancer Genome Database (RCGDB) and Human Gene Mutation Database (HGMD®) (43).
The LSDBs often originate as loosely organized compilations of data. Since no standard system similar to the HGVS recommendation for mutation notation has yet been established, the presentation of the data in LSDBs varies enormously. The data is mostly presented in flat files, plain text databases, or Microsoft Excel spreadsheets, making it easy to collect and store the information but nearly impossible to search or retrieve specific data. More ambitious databases use open source database management systems (DBMSs) like MySQL or PostgreSQL, whereas only a minority of curators use specialized software such as the UMD (44), the Mutation Storage and Retrieval Program (MuStaR) (45), or the Leiden Open Source Variation Database (LOVD) (46). The use of such relational DBMSs allows users to specify complex queries and to perform specific analyses of customized subsets of the database.
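The advantage of a relational DBMS over flat files can be sketched with Python's built-in `sqlite3` module. The table layout, column names and rows below are made up for illustration and do not reflect the schema of any of the databases discussed here; SQLite is used only because it needs no server.

```python
import sqlite3

# Made-up example rows: (gene, sample_id, tissue, aa_change).
DEMO_ROWS = [
    ("KRAS", "S1", "colon", "p.G12D"),
    ("KRAS", "S2", "colon", "p.G13D"),
    ("BRAF", "S2", "colon", "p.V600E"),
    ("BRAF", "S3", "skin", "p.V600E"),
]


def build_demo_db(rows):
    """Create an in-memory database with one hypothetical 'mutation' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mutation (gene TEXT, sample_id TEXT, tissue TEXT, aa_change TEXT)"
    )
    conn.executemany("INSERT INTO mutation VALUES (?, ?, ?, ?)", rows)
    return conn


def mutated_samples_per_gene(conn, tissue):
    """Count distinct mutated samples per gene, restricted to one tissue."""
    cursor = conn.execute(
        "SELECT gene, COUNT(DISTINCT sample_id) AS n "
        "FROM mutation WHERE tissue = ? "
        "GROUP BY gene ORDER BY n DESC, gene",
        (tissue,),
    )
    return cursor.fetchall()


conn = build_demo_db(DEMO_ROWS)
print(mutated_samples_per_gene(conn, "colon"))  # [('KRAS', 2), ('BRAF', 1)]
```

The same question asked of a spreadsheet or flat file would require custom filtering and counting code for every new subset of interest.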
Cancer variation databases
Currently, the best-known publicly available primary database on somatic mutations in human cancer is ‘COSMIC’ (42), hosted at the Wellcome Trust Sanger Institute in Cambridge. The data is gathered from scientific publications and genome-wide screens from the Cancer Genome Project (CGP) at the same institute. The project has been continuously updated and improved for over 9 years and currently contains more than 108 773 mutations in >13 500 different genes observed in over 449 676 different tumor samples. The curation process in COSMIC is largely manual, resulting in a very high quality of the data. For each mutation, all details on the sample like patient age, gender, histology and tissue are available. COSMIC uses its own internal classification system to provide tissue and histology consistency within the database and to reduce redundancy. All tissue and histology information from scientific publications is translated using this classification system. In addition, for each study the project records which genes were actually screened, since published studies often focus on mutation hot spots, for example KRAS (47), BRAF (48) or TP53 (49). This information enables frequency data to be calculated for mutations in various genes and different cancer types. COSMIC also offers somatic mutations found in cell-lines including the NCI-60 (50). The website of COSMIC has a clear structure and is easy to use. The interface allows users to browse by gene or search by phenotype. Summary information on mutation counts and frequencies is presented graphically for a better understanding. In addition, all information can be downloaded as txt files, or as an Oracle dump file.
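Why the list of screened genes matters for frequency data can be shown with a short Python sketch. The function name and the sample identifiers are hypothetical; the sketch assumes that, for a given gene and cancer type, the set of samples actually screened for that gene is known, and it uses that set as the denominator.

```python
from typing import Iterable


def mutation_frequency(
    mutated_samples: Iterable[str], screened_samples: Iterable[str]
) -> float:
    """Fraction of screened samples that carry a mutation in the gene of interest.

    Only samples that were actually screened for the gene count, both in the
    numerator and in the denominator.
    """
    screened = set(screened_samples)
    if not screened:
        raise ValueError("no samples were screened for this gene")
    mutated = set(mutated_samples) & screened
    return len(mutated) / len(screened)


# Made-up sample identifiers, for illustration only.
screened = ["S1", "S2", "S3", "S4"]
mutated = ["S1", "S2"]
print(mutation_frequency(mutated, screened))  # 0.5
```

Dividing by all samples in a database instead of the screened samples would underestimate the frequency for any gene that was examined in only part of the studies.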
Another large mutation data source is ‘TCGA’ (35), a project at the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). The main goal of TCGA is to understand the molecular basis of cancer through the application of genome analysis techniques, including large-scale genome sequencing and SNP analysis. For each patient, a whole genome analysis of a normal sample, a tumor sample and control samples (a second normal and tumor sample as control) is performed, enabling researchers to distinguish between somatic and germline mutations. All mutations found are publicly available in a special Mutation Annotation Format (MAF) and can be downloaded via the TCGA Data Portal. This portal contains all TCGA data concerning clinical information associated with tumors and human subjects, genomic characterization, and high-throughput sequencing analysis of the tumor genomes. However, no advanced search interface or graphical visualization of the mutation data is available. In its pilot phase the project focused on only two cancer types: brain cancer (glioblastoma multiforme) and ovarian serous adenocarcinoma. After the pilot phase, which was completed in 2009, TCGA matured into a full project and is now dealing with more than 20 types of cancer.
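Because no search interface is provided, downloaded MAF files have to be processed locally. The following Python sketch counts mutations by status in a tab-delimited MAF-like text. It assumes columns named `Hugo_Symbol`, `Tumor_Sample_Barcode` and `Mutation_Status` and header comment lines starting with `#`; real MAF files contain many more columns and their exact layout depends on the format version, so these names should be checked against the file at hand. The example rows are made up.

```python
import csv
from collections import Counter


def count_by_status(maf_text: str) -> Counter:
    """Count rows per Mutation_Status in tab-delimited MAF-like text."""
    # Skip empty lines and '#' comment lines that may precede the column header.
    lines = [
        line for line in maf_text.splitlines()
        if line and not line.startswith("#")
    ]
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["Mutation_Status"] for row in reader)


# Made-up example content, for illustration only.
EXAMPLE_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tMutation_Status\n"
    "TP53\tSAMPLE-1\tSomatic\n"
    "BRCA1\tSAMPLE-1\tGermline\n"
    "KRAS\tSAMPLE-2\tSomatic\n"
)

print(count_by_status(EXAMPLE_MAF))  # Counter({'Somatic': 2, 'Germline': 1})
```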
Another concept, focusing on the integration of heterogeneous mutation data sources, is pursued by the RCGDB (51), developed at Roche Pharma Research. The freely available warehouse system integrates somatic and germline mutations gathered by manual curation from scientific publications and public cancer mutation databases (COSMIC, TCGA, etc.). In addition, these mutations are enriched with SNP data from the HapMap (34) project. Updates are provided on a regular basis depending on the update frequency of the external data sources (approximately every 3 months). Access to the RCGDB is offered via a publicly available web interface. A major aspect in designing the user interface was that users should be able to search and view mutations in an intuitive and straightforward manner, without having to understand the architectural details of the warehouse system. Therefore, the database offers a Google-like web interface to search for cancer genome information on a single gene, sample or cell-line, and on multiple genes, samples or cell-lines. As a special feature, the search is supported by an auto-suggestion functionality that allows users to search by NCBI GeneIDs, names, or synonyms.
The HGMD® (43) at the Institute of Medical Genetics in Cardiff is a commercial mutation database providing information on somatic and germline mutations. Furthermore, the database offers a less up-to-date public version which is freely available only to registered users from academic institutions or non-profit organizations. The data is gathered from scientific publications and from publicly available LSDBs. The project claims to include all mutations causing or associated with human inherited disease, plus disease-associated/functional polymorphisms reported in the literature. Currently, HGMD provides information on 96 631 mutations in 3611 genes under the professional license and 69 660 mutations in 2572 genes in the public version of the database. The website of HGMD allows users to search by gene, publication or mutation id and presents the results in a table view. A downloadable version of HGMD is only available under the professional license.
In addition to multi-gene LSDBs, various single-gene LSDBs are publicly available. The largest and best-known single-gene LSDB is the TP53 mutation database from the IARC (41), with all TP53 gene variations identified in human populations and tumor samples since 1989. The database contains information on somatic as well as germline mutations of TP53 in patient samples, human cell-lines, and mouse models. This data is compiled from the peer-reviewed literature and from generalist databases. The website offers different sophisticated interfaces for searching and mining the database by multiple criteria. Furthermore, all information can be downloaded as tab-delimited txt files. A large number of other single-gene databases exist, such as the L1CAM mutation database from the University of Groningen (52), which contains somatic mutations of a single gene. Most of these LSDBs are small, typically containing <500 variants.
For a detailed list of cancer mutation databases see Table 1.
Table 1.
Database | URL | Gene(s) | Mutation type | Remark |
---|---|---|---|---|
BLMbase | http://bioinf.uta.fi/BLMbase | BLM | Germline | |
CASP10base | http://bioinf.uta.fi/CASP10base | CASP10 | Germline | |
CASP8base | http://bioinf.uta.fi/CASP8base | CASP8 | Germline | |
Catalog of Somatic Mutations in Cancer (COSMIC) | http://www.sanger.ac.uk/genetics/CGP/cosmic/ | >13 500 | Somatic | |
Fanconi Anemia Mutation Database | http://www.rockefeller.edu/fanconi/mutate/ | FANCA, FANCB, FANCC, FANCD1, FANCD2, FANCE, FANCF, FANCG, FANCI, FANCJ, FANCL, FANCM, FANCN | ||
Genetic Alterations in Cancer DB (GAC) | http://www.niehs.nih.gov/research/resources/databases/gac/ | 32 cancer genes | Somatic, germline | |
HNPCC database | http://www.insight-group.org/ | APC, EPCAM, MUTYH, MSH2, MSH6, MLH1, MLH3, PMS1, PMS2 | ||
Human Genome Variation and Genotype/Phenotype Database (HGVbaseG2P) | http://www.hgvbaseg2p.org/index | | Germline | |
IARC TP53 database | http://www-p53.iarc.fr/ | TP53 | Somatic, germline | |
International HapMap Project | http://www.hapmap.org/ | | SNP | |
KinMutBase | http://bioinf.uta.fi/KinMutBase/ | Protein kinases | Germline | |
LOVD-ATM | http://www.LOVD.nl/ATM | ATM | Germline | Uses Leiden Open Variation Database |
LOVD-B3GALTL | http://www.LOVD.nl/B3GALTL | B3GALTL | Germline | Uses Leiden Open Variation Database |
LOVD-BRCA2 | http://www.LOVD.nl/BRCA2 | BRCA2 | Germline | Uses Leiden Open Variation Database |
LOVD-FANCA | http://www.LOVD.nl/FANCA | FANCA | Germline | Uses Leiden Open Variation Database |
LOVD-FANCB | http://www.LOVD.nl/FANCB | FANCB | Germline | Uses Leiden Open Variation Database |
LOVD-FANCC | http://www.LOVD.nl/FANCC | FANCC | Germline | Uses Leiden Open Variation Database |
LOVD-FANCD2 | http://www.LOVD.nl/FANCD2 | FANCD2 | Germline | Uses Leiden Open Variation Database |
LOVD-FANCE | http://www.LOVD.nl/FANCE | FANCE | Germline | Uses Leiden Open Variation Database |
LOVD-FANCF | http://www.LOVD.nl/FANCF | FANCF | Germline | Uses Leiden Open Variation Database |
LOVD-FANCG | http://www.LOVD.nl/FANCG | FANCG | Germline | Uses Leiden Open Variation Database |
LOVD-FANCL | http://www.LOVD.nl/FANCL | FANCL | Germline | Uses Leiden Open Variation Database |
LOVD-MUTYH | http://www.LOVD.nl/MUTYH | MUTYH | Germline | Uses Leiden Open Variation Database |
LOVD-NOTCH3 | http://www.LOVD.nl/NOTCH3 | NOTCH3 | Germline | Uses Leiden Open Variation Database |
LOVD-NROB1 | http://www.LOVD.nl/NROB1 | NROB1 | Germline | Uses Leiden Open Variation Database |
LOVD-OTC | http://www.LOVD.nl/OTC | OTC | Germline | Uses Leiden Open Variation Database |
LOVD-TSC1 | http://www.LOVD.nl/TSC1 | TSC1 | Germline | Uses Leiden Open Variation Database |
LOVD-TSC2 | http://www.LOVD.nl/TSC2 | TSC2 | Germline | Uses Leiden Open Variation Database |
MDL EGFR Mutation Database | http://www.egfr.org/ | EGFR | Somatic, germline | |
Mismatch Repair Genes Variant Database | http://www.med.mun.ca/mmrvariants/ | MSH2,MSH6,MLH1,PMS2 | Germline | |
NCBI dbSNP | http://www.ncbi.nlm.nih.gov/projects/SNP/ | | SNP | |
Online Mendelian Inheritance in Man (OMIM) | http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim | All genes | Germline | |
PTCH Mutation Database | http://www.cybergene.se/PTCH/ | PTCH | Somatic, germline | |
Roche Cancer Genome Database (RCGDB) | http://rcgdb.bioinf.uni-sb.de/MutomeWeb | >10 000 | Somatic, SNP | |
The Cancer Genome Atlas (TCGA) | http://cancergenome.nih.gov/ | >400 | Somatic, germline | Login for TCGA Data Portal necessary, more disease to come |
The Human Gene Mutation Database (HGMD) | http://www.hgmd.cf.ac.uk/ac/index.php | 2572 | Somatic, germline | Commercial license: 3611 genes |
The TP53 database | http://p53.free.fr/ | TP53 | Somatic, germline | |
TSH Receptor mutation database | http://www.uni-leipzig.de/innere/tsh/ | TSHR | Somatic, germline | |
UMD-APC | http://www.umd.be/APC/ | APC | Germline | |
UMD-BRCA1 | http://www.umd.be/BRCA1/ | BRCA1 | Germline | Restricted access |
UMD-BRCA2 | http://www.umd.be/BRCA2/ | BRCA2 | Germline | Restricted access |
UMD-MEN1 | http://www.umd.be/MEN1/ | MEN1 | Germline | |
UMD-VHL | http://www.umd.be/VHL/ | VHL | Germline | |
UMD-TGFBR2 | http://194.167.35.228/TGFBR2/ | TGFBR2 | Germline | |
University of Groningen L1CAM Mutation DB | http://www.l1cammutationdatabase.info/ | L1CAM | Somatic | |
WASbase | http://bioinf.uta.fi/WASbase | WASP | Germline | |
Werner Syndrome Mutational Database | http://www.pathology.washington.edu/research/werner/database/ | WRN | Germline | |
For each database, the types of mutations and the genes covered are listed.
Disease variation databases
In addition to the cancer variation databases, a large number of publicly available databases focus on disease-specific variations. An overview of such disease variation databases can be found in Table 2.
Table 2.
Database | URL | Gene(s) | Diseases | Remark |
---|---|---|---|---|
ADAbase | http://bioinf.uta.fi/ADAbase | ADA | Adenosine deaminase deficiency | |
AICDAbase | http://bioinf.uta.fi/AICDAbase | AICDA | Non-X-linked hyper-IgM syndrome | |
AIREbase | http://bioinf.uta.fi/AIREbase | AIRE | Autoimmune polyendocrinopathy with candidiasis and ectodermal dystrophy (APECED) | |
Albinism Database (CHS) | http://albinismdb.med.umn.edu/chs1mut.html | LYST | Chediak–Higashi Syndrome | |
Albinism Database (HPS2) | http://albinismdb.med.umn.edu/hps2mut.htm | AP3B1 | Hermansky–Pudlak syndrome 2 | |
ALPSbase (II) | http://research.nhgri.nih.gov/ALPS/alpsII_mut.shtml | CASP10 | Autoimmune lymphoproliferative syndrome, type II | |
ALPSbase (Ia) | http://research.nhgri.nih.gov/ALPS/alpsIa_mut.shtml | FAS | Autoimmune lymphoproliferative syndrome, type Ia | |
AP3B1base | http://bioinf.uta.fi/AP3B1base | AP3B1 | Hermansky–Pudlak syndrome 2 | |
BIRC4base | http://bioinf.uta.fi/BIRC4base | BIRC4 | X-linked lymphoproliferative syndrome | |
BLMbase | http://bioinf.uta.fi/BLMbase | BLM | Bloom syndrome | |
BLNKbase | http://bioinf.uta.fi/BLNKbase | BLNK | BLNK deficiency | |
BTKbase | http://bioinf.uta.fi/BTKbase | BTK | X-linked agammaglobulinemia (XLA) | |
C1QAbase | http://bioinf.uta.fi/C1QAbase | C1QA | C1q α polypeptide deficiency | |
C1QBbase | http://bioinf.uta.fi/C1QBbase | C1QB | C1q β polypeptide deficiency | |
C1QCbase | http://bioinf.uta.fi/C1QCbase | C1QC | C1q γ-polypeptide deficiency | |
C1Sbase | http://bioinf.uta.fi/C1Sbase | C1S | C1s deficiency | |
C2base | http://bioinf.uta.fi/C2base | C2 | C2 deficiency | |
C3base | http://bioinf.uta.fi/C3base | C3 | C3 deficiency | |
C5base | http://bioinf.uta.fi/C5base | C5 | C5 deficiency | |
C6base | http://bioinf.uta.fi/C6base | C6 | C6 deficiency | |
C7base | http://bioinf.uta.fi/C7base | C7 | C7 deficiency | |
C8Bbase | http://bioinf.uta.fi/C8Bbase | C8B | C8B deficiency | |
C9base | http://bioinf.uta.fi/C9base | C9 | C9 deficiency | |
CA2base | http://bioinf.uta.fi/CA2base | CA2 | Osteopetrosis with renal tubular acidosis | |
CASP10base | http://bioinf.uta.fi/CASP10base | CASP10 | Autoimmune lymphoproliferative syndrome, type II | |
CASP8base | http://bioinf.uta.fi/CASP8base | CASP8 | Caspase 8 deficiency | |
Catalogue of Somatic Mutations in Cancer (COSMIC) | http://www.sanger.ac.uk/genetics/CGP/cosmic/ | >13 500 | multiple tissues and histologies | |
CD19base | http://bioinf.uta.fi/CD19base | CD19 | CD19 deficiency | |
CD247base | http://bioinf.uta.fi/CD247base | CD247 | CD3ζ deficiency | |
CD3Dbase | http://bioinf.uta.fi/CD3Dbase | CD3D | CD3δ deficiency | |
CD3Ebase | http://bioinf.uta.fi/CD3Ebase | CD3E | CD3ε deficiency | |
CD3Gbase | http://bioinf.uta.fi/CD3Gbase | CD3G | CD3γ deficiency | |
CD40base | http://bioinf.uta.fi/CD40base | CD40 | CD40 deficiency | |
CD40Lbase | http://bioinf.uta.fi/CD40Lbase | CD40L | X-linked Hyper-IgM syndrome (XHIM) | |
CD55base | http://bioinf.uta.fi/CD55base | CD55 | Decay-accelerating factor (CD55) deficiency | |
CD59base | http://bioinf.uta.fi/CD59base | CD59 | CD59 deficiency | |
CD79Abase | http://bioinf.uta.fi/CD79Abase | CD79A | Igα deficiency | |
CD79Bbase | http://bioinf.uta.fi/CD79Bbase | CD79B | Igβ deficiency | |
CD8Abase | http://bioinf.uta.fi/CD8Abase | CD8A | CD8α deficiency | |
CEBPEbase | http://bioinf.uta.fi/CEBPEbase | CEBPE | Neutrophil-specific granule deficiency | |
CFDbase | http://bioinf.uta.fi/CFDbase | CFD | Factor D deficiency | |
CFHbase | http://bioinf.uta.fi/CFHbase | CFH | Factor H deficiency | |
CFIbase | http://bioinf.uta.fi/CFIbase | CFI | Complement factor I deficiency | |
CFPbase | http://bioinf.uta.fi/CFPbase | CFP | Properdin deficiency | |
CIITAbase | http://bioinf.uta.fi/CIITAbase | CIITA | MHCII transactivating protein deficiency | |
CLCN7base | http://bioinf.uta.fi/CLCN7base | CLCN7 | Autosomal dominant osteopetrosis, type 2 | |
CTSCbase | http://bioinf.uta.fi/CTSCbase | CTSC | Papillon-Lefevre syndrome | |
CXCR4base | http://bioinf.uta.fi/CXCR4base | CXCR4 | WHIM syndrome | |
CYBAbase | http://bioinf.uta.fi/CYBAbase | CYBA | Autosomal recessive p22phox deficiency | |
CYBBbase | http://bioinf.uta.fi/CYBBbase | CYBB | X-linked chronic granulomatous disease (XCGD) | |
DCLRE1Cbase | http://bioinf.uta.fi/DCLRE1Cbase | DCLRE1C | Artemis deficiency | |
DKC1base | http://bioinf.uta.fi/DKC1base | DKC1 | Hoyeraal-Hreidarsson syndrome | |
DNMT3Bbase | http://bioinf.uta.fi/DNMT3Bbase | DNMT3B | ICF syndrome | |
ELA2base | http://bioinf.uta.fi/ELA2base | ELA2 | Cyclic neutropenia; Congenital neutropenia | |
F12base | http://bioinf.uta.fi/F12base | F12 | Hereditary angioedema type III | |
Fanconi Anemia Mutation Database | http://www.rockefeller.edu/fanconi/mutate/jumpa.html | FANCA, FANCB, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCL | Fanconi anemia | |
FASLGbase | http://bioinf.uta.fi/FASLGbase | FASLG | Autoimmune lymphoproliferative syndrome, type 1B (ALPS1B) | |
FCGR1Abase | http://bioinf.uta.fi/FCGR1Abase | FCGR1A | CD64 deficiency | |
FCGR3Abase | http://bioinf.uta.fi/FCGR3Abase | FCGR3A | Natural killer cell deficiency | |
FH aHUS Mutation Database | http://www.fh-hus.org/ | CFH | Hemolytic uraemic syndrome (HUS) | |
FOXN1base | http://bioinf.uta.fi/FOXN1base | FOXN1 | T-cell immunodeficiency, congenital alopecia, and nail dystrophy | |
FOXP3base | http://bioinf.uta.fi/FOXP3base | FOXP3 | Immunodysregulation, polyendocrinopathy, and enteropathy, X-linked; IPEX | |
GFI1base | http://bioinf.uta.fi/GFI1base | GFI1 | Severe congenital neutropenia (SCN); Nonimmune chronic idiopathic neutropenia of adults (NI-CINA) | |
HAEdb | http://hae.enzim.hu/ | SERPING1 | Hereditary angioedema | |
HAX1base | http://bioinf.uta.fi/HAX1base | HAX1 | Severe congenital neutropenia (Kostmann disease) | |
ICOSbase | http://bioinf.uta.fi/ICOSbase | ICOS | ICOS deficiency | |
IFNGR1base | http://bioinf.uta.fi/IFNGR1base | IFNGR1 | IFNγ1-receptor deficiency | |
IFNGR2base | http://bioinf.uta.fi/IFNGR2base | IFNGR2 | IFNγ2-receptor deficiency | |
IGHG2base | http://bioinf.uta.fi/IGHG2base | IGHG2 | IgG2 deficiency | |
IGHMbase | http://bioinf.uta.fi/IGHMbase | IGHM | μ heavy chain deficiency | |
IGLL1base | http://bioinf.uta.fi/IGLL1base | IGLL1 | λ5 surrogate light-chain deficiency | |
IKBKGbase | http://bioinf.uta.fi/IKBKGbase | IKBKG | Nemo deficiency | |
IL12Bbase | http://bioinf.uta.fi/IL12Bbase | IL12B | Interleukin-12 (IL12) p40 deficiency | |
IL12RB1base | http://bioinf.uta.fi/IL12RB1base | IL12RB1 | Interleukin-12 receptor β1 deficiency | |
IL2RAbase | http://bioinf.uta.fi/IL2RAbase | IL2RA | Interleukin-2 receptor α deficiency | |
IL2RGbase | http://research.nhgri.nih.gov/scid/ | IL2RG | X-linked SCID | |
IL7Rbase | http://bioinf.uta.fi/IL7Rbase | IL7R | Interleukin-7 receptor α deficiency | |
Infevers | http://fmf.igh.cnrs.fr/ISSAID/infevers/ | LPIN2 | Majeed syndrome | |
Infevers | http://fmf.igh.cnrs.fr/ISSAID/infevers/ | MEFV | Familial Mediterranean fever | |
Infevers | http://fmf.igh.cnrs.fr/ISSAID/infevers/ | MVK | Hyper IgD Syndrome and periodic fever | |
Infevers | http://fmf.igh.cnrs.fr/ISSAID/infevers/ | NLRP3 | Familial cold autoinflammatory syndrome, Muckle-Wells syndrome and chronic infantile neurological cutaneous and articular syndrome | |
Infevers | http://fmf.igh.cnrs.fr/ISSAID/infevers/ | NLRP7 | Recurrent Hydatidiform moles and reproductive wastage | |
Infevers | http://fmf.igh.cnrs.fr/ISSAID/infevers/ | NOD2 | Blau syndrome, Crohn's disease, early onset sarcoidosis | |
Infevers | http://fmf.igh.cnrs.fr/ISSAID/infevers/ | PSTPIP1 | Pyogenic sterile arthritis, pyoderma gangrenosum and acne syndrome | |
Infevers | http://fmf.igh.cnrs.fr/ISSAID/infevers/ | TNFRSF1A | Tumor necrosis factor receptor-associated periodic syndrome | |
IRAK4base | http://bioinf.uta.fi/IRAK4base | IRAK4 | IRAK4 deficiency | |
ITGB2base | http://bioinf.uta.fi/ITGB2base | ITGB2 | Leukocyte adhesion deficiency I (LAD-I) | |
JAK3base | http://bioinf.uta.fi/JAK3base | JAK3 | Jak3 deficiency | |
LIG1base | http://bioinf.uta.fi/LIG1base | LIG1 | DNA ligase I deficiency | |
LIG4base | http://bioinf.uta.fi/LIG4base | LIG4 | LIG4 syndrome | |
LRRC8Abase | http://bioinf.uta.fi/LRRC8Abase | LRRC8A | Non-Bruton type autosomal dominant agammaglobulinemia | |
LYSTbase | http://bioinf.uta.fi/LYSTbase | LYST | Chediak–Higashi syndrome | |
MAPBPIPbase | http://bioinf.uta.fi/MAPBPIPbase | MAPBPIP | Endosomal adaptor protein p14 deficiency | |
MASP2base | http://bioinf.uta.fi/MASP2base | MASP2 | MASP2 deficiency | |
MLPHbase | http://bioinf.uta.fi/MLPHbase | MLPH | Griscelli syndrome, type 3 (GS3) | |
MPObase | http://bioinf.uta.fi/MPObase | MPO | Myeloperoxidase deficiency | |
MRE11Abase | http://bioinf.uta.fi/MRE11Abase | MRE11A | Ataxia-telangiectasia-like disorder (ATLD) | |
Mutation Database - Papillon Lefevre Syndrome | http://www.genetics.pitt.edu/mutation/pls/ | CTSC | Papillon Lefevre syndrome | |
MYO5Abase | http://bioinf.uta.fi/MYO5Abase | MYO5A | Griscelli syndrome, type 1 (GS1) | |
NCF1base | http://bioinf.uta.fi/NCF1base | NCF1 | Autosomal recessive p47phox deficiency | |
NCF2base | http://bioinf.uta.fi/NCF2base | NCF2 | Autosomal recessive p67phox deficiency | |
NFKBIAbase | http://bioinf.uta.fi/NFKBIAbase | NFKBIA | Autosomal dominant anhidrotic ectodermal dysplasia and T-cell immunodeficiency | |
NHEJ1base | http://bioinf.uta.fi/NHEJ1base | NHEJ1 | Combined immunodeficiency (CID) associated with microcephaly and increased cellular sensitivity to IR | |
NPbase | http://bioinf.uta.fi/Npbase | NP | PNP deficiency | |
NRASbase | http://bioinf.uta.fi/NRASbase | NRAS | Autoimmune lymphoproliferative syndrome type IV | |
ORAI1base | http://bioinf.uta.fi/ORAI1base | ORAI1 | Severe combined immunodeficiency | |
OSTM1base | http://bioinf.uta.fi/OSTM1base | OSTM1 | Autosomal recessive osteopetrosis | |
PIK3R1base | http://bioinf.uta.fi/PIK3R1base | PIK3R1 | Pathogenic mutations in the p85α SH2 domain | |
PRF1base | http://bioinf.uta.fi/PRF1base | PRF1 | Familial haemophagocytic lymphohistiocytosis, type II (FHL2) | |
PTPN11base | http://bioinf.uta.fi/PTPN11base | PTPN11 | Pathogenic mutations in the SHP-2 SH2 domain | |
PTPRCbase | http://bioinf.uta.fi/PTPRCbase | PTPRC | CD45 deficiency | |
RAB27Abase | http://bioinf.uta.fi/RAB27Abase | RAB27A | Griscelli syndrome, type 2 (GS2) | |
RAC2base | http://bioinf.uta.fi/RAC2base | RAC2 | Neutrophil immunodeficiency syndrome | |
RAG1base | http://bioinf.uta.fi/RAG1base | RAG1 | RAG1 deficiency | |
RAG2base | http://bioinf.uta.fi/RAG2base | RAG2 | RAG2 deficiency | |
RASA1base | http://bioinf.uta.fi/RASA1base | RASA1 | Pathogenic mutations in the RasGAP SH2 domain | |
RASGRP2base | http://bioinf.uta.fi/RASGRP2base | RASGRP2 | Leukocyte adhesion deficiency III | |
RFX5base | http://bioinf.uta.fi/RFX5base | RFX5 | MHCII promoter X box regulatory factor 5 deficiency | |
RFXANKbase | http://bioinf.uta.fi/RFXANKbase | RFXANK | Ankyrin repeat containing regulatory factor X-associated protein deficiency | |
RFXAPbase | http://bioinf.uta.fi/RFXAPbase | RFXAP | Regulatory factor X-associated protein deficiency | |
Roche Cancer Genome Database (RCGDB) | http://rcgdb.bioinf.uni-sb.de/MutomeWeb | >10 000 | multiple tissues and histologies | |
SBDSbase | http://bioinf.uta.fi/SBDSbase | SBDS | Shwachman–Diamond syndrome | |
SERPING1base | http://bioinf.uta.fi/SERPING1base | SERPING1 | Hereditary angioedema | |
SH2base | http://bioinf.uta.fi/SH2base | SH2 | Pathogenic SH2 domain mutations | |
SH2D1Abase | http://bioinf.uta.fi/SH2D1Abase | SH2D1A | X-linked lymphoproliferative syndrome (XLP) | |
SLC35C1base | http://bioinf.uta.fi/SLC35C1base | SLC35C1 | Leukocyte adhesion deficiency II (LAD-II) | |
SMARCAL1base | http://bioinf.uta.fi/SMARCAL1base | SMARCAL1 | Schimke immuno-osseous dysplasia | |
SP110base | http://bioinf.uta.fi/SP110base | SP110 | Hepatic veno-occlusive disease with immunodeficiency syndrome (VODI) | |
SPINK5base | http://bioinf.uta.fi/SPINK5base | SPINK5 | Netherton syndrome | |
STAT1base | http://bioinf.uta.fi/STAT1base | STAT1 | STAT1 deficiency | |
STAT3base | http://bioinf.uta.fi/STAT3base | STAT3 | Hyper-IgE syndrome | |
STAT5Bbase | http://bioinf.uta.fi/STAT5Bbase | STAT5B | Growth hormone insensitivity with immunodeficiency | |
STX11base | http://bioinf.uta.fi/STX11base | STX11 | Familial haemophagocytic lymphohistiocytosis 4 | |
TAP1base | http://bioinf.uta.fi/TAP1base | TAP1 | TAP1 deficiency | |
TAP2base | http://bioinf.uta.fi/TAP2base | TAP2 | TAP2 deficiency | |
TAPBPbase | http://bioinf.uta.fi/TAPBPbase | TAPBP | Tapasin deficiency | |
TAZbase | http://bioinf.uta.fi/TAZbase | TAZ | Barth syndrome | |
TCIRG1base | http://bioinf.uta.fi/TCIRG1base | TCIRG1 | Autosomal recessive osteopetrosis (arOP) | |
TCN2base | http://bioinf.uta.fi/TCN2base | TCN2 | Transcobalamin II deficiency | |
The Cancer Genome Atlas (TCGA) | http://cancergenome.nih.gov/ | >400 | Brain (glioblastoma multiforme), ovarian (serous cystadenocarcinoma) | Login required for TCGA Data Portal; more diseases to come |
TLR3base | http://bioinf.uta.fi/TLR3base | TLR3 | Influenza-associated encephalopathy | |
TMC6base | http://bioinf.uta.fi/TMC6base | TMC6 | Epidermodysplasia verruciformis | |
TMC8base | http://bioinf.uta.fi/TMC8base | TMC8 | Epidermodysplasia verruciformis | |
TNFRSF13Bbase | http://bioinf.uta.fi/TNFRSF13Bbase | TNFRSF13B | TACI deficiency | |
TYK2base | http://bioinf.uta.fi/TYK2base | TYK2 | TYK2 deficiency | |
UMD-ATP7B | http://www.umd.be/ATP7B/ | ATPase, Cu++ transporting, beta polypeptide | Wilson disease | |
UMD-COL3A1 | http://www.umd.be/COL3A1/ | COL3A1 | COL3A1 deficiency | Restricted access |
UMD-CSA | http://www.umd.be/CSA/ | ERCC8 | ERCC8 deficiency | |
UMD-CSB | http://www.umd.be/CSB/ | ERCC6 | ERCC6 deficiency | |
UMD-DFNB1-GJB2 | http://www.umd.be/DFNB1-GJB2/ | DFNB1, GJB2 | DFNB1 deficiency | Restricted access |
UMD-DMD | http://www.umd.be/DMD/ | DMD | DMD deficiency | |
UMD-DPYD | http://www.umd.be/DPYD/ | DPYD | Dihydropyrimidine dehydrogenase disease | Restricted access |
UMD-EMD | http://www.umd.be/EMD/ | EMD | EMD deficiency | |
UMD-FBN1 | http://www.umd.be/FBN1/ | FBN1 | Marfan syndrome and related disorders | |
UMD-FBN2 | http://194.167.35.168/FBN2/ | FBN2 | Congenital contractural arachnodactyly | |
UMD-LDLR | http://www.umd.be/LDLR/ | LDLR | Familial hypercholesterolemia (FH) | |
UMD-LMNA | http://www.umd.be/LMNA/ | LMNA | LMNA deficiency | |
UMD-TGFBR1 | http://www.umd.be/LSDB.html | TGFBR1 | TGFBR1 deficiency | Restricted access |
UMD-TGFBR2 | http://www.umd.be/TGFBR2/ | TGFBR2 | Marfan syndrome, Loeys–Dietz syndrome, familial thoracic aortic aneurysms and dissections | |
UMD-USHbases | http://www.umd.be/usher.html | | Usher syndrome | |
UNC13Dbase | http://bioinf.uta.fi/UNC13Dbase | UNC13D | Familial hemophagocytic lymphohistiocytosis 3 | |
UNC93B1base | http://bioinf.uta.fi/UNC93B1base | UNC93B1 | UNC93B deficiency (Herpes simplex encephalitis) | |
UNGbase | http://bioinf.uta.fi/UNGbase | UNG | UNG deficiency | |
WASbase | http://bioinf.uta.fi/WASbase | WAS | Wiskott–Aldrich syndrome (WAS) | |
WASPbase | http://homepage.mac.com/kohsukeimai/wasp/WASPbase.html | WASP | Wiskott–Aldrich syndrome | |
ZAP70base | http://bioinf.uta.fi/ZAP70base | ZAP70 | ZAP70 deficiency | |
For each database, the genes and diseases covered are listed.
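A listing of this kind can be made machine-readable with a few lines of code. The following is a minimal sketch, assuming the rows are available as pipe-delimited text in the column order name, URL, gene(s), disease, remarks. The function names `parse_row` and `group_by_host` and the dictionary keys are illustrative choices, not part of any of the databases listed.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Three rows copied from the table above.
# Assumed column order: name | URL | gene(s) | disease | remarks.
ROWS = """\
JAK3base | http://bioinf.uta.fi/JAK3base | JAK3 | Jak3 deficiency | |
UMD-FBN1 | http://www.umd.be/FBN1/ | FBN1 | Marfan syndrome and related disorders | |
UMD-COL3A1 | http://www.umd.be/COL3A1/ | COL3A1 | COL3A1 deficiency | Restricted access |
"""


def parse_row(line):
    """Split one pipe-delimited row into a dict with fixed (hypothetical) keys."""
    cells = [cell.strip() for cell in line.split("|")]
    cells += [""] * (5 - len(cells))  # pad rows that have fewer than five cells
    name, url, genes, disease, remarks = cells[:5]
    return {
        "name": name,
        "url": url,
        "genes": genes,
        "disease": disease,
        "remarks": remarks,
    }


def group_by_host(records):
    """Group database names by the host part of their URL."""
    by_host = defaultdict(list)
    for record in records:
        by_host[urlparse(record["url"]).netloc].append(record["name"])
    return dict(by_host)


records = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
by_host = group_by_host(records)
```

Grouping by host is only one possible view; it makes visible that most entries in the table are served from a small number of sites, each with its own interface.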
Prominent disease mutation databases are the public IDbases (53), maintained at the Institute of Medical Technology, University of Tampere. The IDbases are LSDBs for immunodeficiency-causing mutations. The project maintains 122 different IDbases, which together contain data for 5359 patients. In addition to gene mutations, IDbases provide information about clinical presentation. All information has been collected from the literature as well as directly from researchers. The databases do not provide a sophisticated search interface, but the data can be downloaded as a text file.
Conclusion
All databases presented are good starting points for retrieving human variation data for certain use cases, depending on the interfaces provided. However, as soon as a query becomes more complicated, an integrative approach is necessary. Unfortunately, the diversity of current mutation information systems and their underlying data models makes it difficult to mine human variation databases in an integrative way. Currently, researchers typically have to browse and search several databases to obtain the required information. No unified access to all the different cancer genome-related data sources exists, resulting in a need for more efficient integrative systems. With COSMIC, which is currently integrating TCGA and IARC TP53 information, and the RCGDB, which already integrates most of the data sources presented in this review, two promising integrative data resources are available. Nevertheless, the standardization and virtual consolidation of existing databases will be a major challenge for future developments. Although these problems have already been discussed in previous publications (54–56), the heterogeneity of mutation databases remains an acute problem because of the exponential growth of data generated by genome sequencing. This review is meant to provide an overview of the current status of mutation data in public resources, to help users know where to find the information they are looking for.
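To illustrate what such a consolidation involves at the level of individual records, the following minimal sketch maps entries from two differently structured sources onto one common record type and merges them by gene and protein change. The source layouts, the field names (`aa_mutation`, `primary_site`, `Gene_Symbol`, `AA_Change`, `Tissue`) and the `Variant` type are hypothetical and do not reflect the actual export formats of COSMIC, RCGDB or any other database discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Common record type; the fields are an illustrative minimum."""
    source: str
    gene: str
    protein_change: str
    tissue: str


def from_source_a(rec):
    # Hypothetical layout A: lower-case keys, protein change already prefixed with "p."
    return Variant("A", rec["gene"].upper(), rec["aa_mutation"], rec["primary_site"].lower())


def from_source_b(rec):
    # Hypothetical layout B: different key names, protein change without the "p." prefix
    return Variant("B", rec["Gene_Symbol"].upper(), "p." + rec["AA_Change"], rec["Tissue"].lower())


def merge(*batches):
    """Collect variants from several sources under a shared (gene, protein change) key."""
    merged = {}
    for batch in batches:
        for variant in batch:
            merged.setdefault((variant.gene, variant.protein_change), []).append(variant)
    return merged


batch_a = [from_source_a({"gene": "BRAF", "aa_mutation": "p.V600E", "primary_site": "Thyroid"})]
batch_b = [from_source_b({"Gene_Symbol": "braf", "AA_Change": "V600E", "Tissue": "Skin"})]
merged = merge(batch_a, batch_b)
```

Even in this toy case, each source needs its own adapter for key names, letter case and mutation notation, which is the kind of per-database effort that a shared standard would remove.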
Funding
This work was supported by the Roche Postdoc Fellowship Program.
Conflict of interest. None declared.
References
- 1. Vogelstein B, Kinzler K. The Genetic Basis of Human Cancer. McGraw-Hill Professional; 2002.
- 2. Thomas RK, Baker AC, Debiasi RM, et al. High-throughput oncogene mutation profiling in human cancer. Nat. Genet. 2007;39:347–351. doi: 10.1038/ng1975.
- 3. Wood L, Parsons D, Jones S, et al. The genomic landscapes of human breast and colorectal cancers. Science. 2007;318:1108–1113. doi: 10.1126/science.1145720.
- 4. Pinkel D, Albertson DG. Comparative genomic hybridization. Annu. Rev. Genomics Hum. Genet. 2005;6:331–354. doi: 10.1146/annurev.genom.6.080604.162140.
- 5. Wang Z, Shen D, Parsons DW, et al. Mutational analysis of the tyrosine phosphatome in colorectal cancers. Science. 2004;304:1164–1166. doi: 10.1126/science.1096096.
- 6. Stephens P, Edkins S, Davies H, et al. A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer. Nat. Genet. 2005;37:590–592. doi: 10.1038/ng1571.
- 7. Davies H, Hunter C, Smith R, et al. Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 2005;65:7591–7595. doi: 10.1158/0008-5472.CAN-05-1855.
- 8. Bignell G, Smith R, Hunter C, et al. Sequence analysis of the protein kinase gene family in human testicular germ-cell tumors of adolescents and adults. Genes Chromosomes Cancer. 2006;45:42–46. doi: 10.1002/gcc.20265.
- 9. Futreal PA, Wooster R, Stratton MR. Somatic mutations in human cancer: insights from resequencing the protein kinase gene family. Cold Spring Harb. Symp. Quant. Biol. 2005;70:43–49. doi: 10.1101/sqb.2005.70.015.
- 10. Haber DA, Settleman J. Cancer: drivers and passengers. Nature. 2007;446:145–146. doi: 10.1038/446145a.
- 11. Futreal P, Coin L, Marshall M, et al. A census of human cancer genes. Nat. Rev. Cancer. 2004;4:177–183. doi: 10.1038/nrc1299.
- 12. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458:719–724. doi: 10.1038/nature07943.
- 13. Greenman C, Stephens P, Smith R, et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446:153–158. doi: 10.1038/nature05610.
- 14. Cleaver JE. Cancer in xeroderma pigmentosum and related disorders of DNA repair. Nat. Rev. Cancer. 2005;5:564–573. doi: 10.1038/nrc1652.
- 15. Hoeijmakers JH. DNA damage, aging, and cancer. N. Engl. J. Med. 2009;361:1475–1485. doi: 10.1056/NEJMra0804615.
- 16. Shen X, Do H, Li Y, et al. Recruitment of fanconi anemia and breast cancer proteins to DNA damage sites is differentially governed by replication. Mol. Cell. 2009;35:716–723. doi: 10.1016/j.molcel.2009.06.034.
- 17. Capell BC, Tlougan BE, Orlow SJ. From the rarest to the most common: insights from progeroid syndromes into skin cancer and aging. J. Invest. Dermatol. 2009;129:2340–2350. doi: 10.1038/jid.2009.103.
- 18. Wu L. Role of the BLM helicase in replication fork management. DNA Repair. 2007;6:936–944. doi: 10.1016/j.dnarep.2007.02.007.
- 19. Osorio A, Milne RL, Pita G, et al. Evaluation of a candidate breast cancer associated SNP in ERCC4 as a risk modifier in BRCA1 and BRCA2 mutation carriers. Results from the Consortium of Investigators of Modifiers of BRCA1/BRCA2 (CIMBA). Br. J. Cancer. 2009;101:2048–2054. doi: 10.1038/sj.bjc.6605416.
- 20. Kwong LN, Dove WF. APC and its modifiers in colon cancer. Adv. Exp. Med. Biol. 2009;656:85–106. doi: 10.1007/978-1-4419-1145-2_8.
- 21. Normanno N, Tejpar S, Morgillo F, et al. Implications for KRAS status and EGFR-targeted therapies in metastatic CRC. Nat. Rev. Clin. Oncol. 2009;6:519–527. doi: 10.1038/nrclinonc.2009.111.
- 22. Walther A, Johnstone E, Swanton C, et al. Genetic prognostic and predictive markers in colorectal cancer. Nat. Rev. Cancer. 2009;9:489–499. doi: 10.1038/nrc2645.
- 23. Nucera C, Goldfarb M, Hodin R, et al. Role of B-Raf(V600E) in differentiated thyroid cancer and preclinical validation of compounds against B-Raf(V600E). Biochim. Biophys. Acta. 2009;1795:152–161. doi: 10.1016/j.bbcan.2009.01.003.
- 24. Halilovic E, Solit DB. Therapeutic strategies for inhibiting oncogenic BRAF signaling. Curr. Opin. Pharmacol. 2008;8:419–426. doi: 10.1016/j.coph.2008.06.014.
- 25. Loriot Y, Mordant P, Deutsch E, et al. Are RAS mutations predictive markers of resistance to standard chemotherapy? Nat. Rev. Clin. Oncol. 2009;6:528–534. doi: 10.1038/nrclinonc.2009.106.
- 26. Ford D, Easton DF, Stratton M, et al. Genetic heterogeneity and penetrance analysis of the BRCA1 and BRCA2 genes in breast cancer families. The Breast Cancer Linkage Consortium. Am. J. Hum. Genet. 1998;62:676–689. doi: 10.1086/301749.
- 27. Kadouri L, Hubert A, Rotenberg Y, et al. Cancer risks in carriers of the BRCA1/2 Ashkenazi founder mutations. J. Med. Genet. 2007;44:467–471. doi: 10.1136/jmg.2006.048173.
- 28. Thompson D, Easton DF. Cancer Incidence in BRCA1 mutation carriers. J. Natl Cancer Inst. 2002;94:1358–1365. doi: 10.1093/jnci/94.18.1358.
- 29. Lonser R, Glenn G, Walther M, et al. von Hippel-Lindau disease. The Lancet. 2003;361:2059–2067. doi: 10.1016/S0140-6736(03)13643-4.
- 30. Hastings ML, Resta N, Traum D, et al. An LKB1 AT-AC intron mutation causes Peutz-Jeghers syndrome via splicing at noncanonical cryptic splice sites. Nat. Struct. Mol. Biol. 2005;12:54–59. doi: 10.1038/nsmb873.
- 31. Tinat J, Bougeard G, Baert-Desurmont S, et al. 2009 version of the Chompret criteria for Li Fraumeni syndrome. J. Clin. Oncol. 2009;27:e108–e109. doi: 10.1200/JCO.2009.22.7967.
- 32. Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311. doi: 10.1093/nar/29.1.308.
- 33. Wheeler DL, Barrett T, Benson DA, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008;36:D13–D21. doi: 10.1093/nar/gkm1000.
- 34. Frazer KA, Ballinger DG, Cox DR, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861. doi: 10.1038/nature06258.
- 35. Mclendon R, Friedman A, Bigner D, et al. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068. doi: 10.1038/nature07385.
- 36. den Dunnen JT, Antonarakis SE. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mutat. 2000;15:7–12. doi: 10.1002/(SICI)1098-1004(200001)15:1<7::AID-HUMU4>3.0.CO;2-N.
- 37. Greenman C, Wooster R, Futreal PA, et al. Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics. 2006;173:2187–2198. doi: 10.1534/genetics.105.044677.
- 38. Sjöblom T, Jones S, Wood LD, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314:268–274. doi: 10.1126/science.1133427.
- 39. Jones S, Zhang X, Parsons DW, et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science. 2008;321:1801–1806. doi: 10.1126/science.1164368.
- 40. Parsons DW, Jones S, Zhang X, et al. An integrated genomic analysis of human glioblastoma multiforme. Science. 2008;321:1807–1812. doi: 10.1126/science.1164382.
- 41. Petitjean A, Mathe E, Kato S, et al. Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database. Hum. Mutat. 2007;28:622–629. doi: 10.1002/humu.20495.
- 42. Forbes SA, Bhamra G, Bamford S, et al. The Catalogue of Somatic Mutations in Cancer (COSMIC). Curr. Protoc. Hum. Genet. 2008;10. doi: 10.1002/0471142905.hg1011s57.
- 43. Stenson PD, Ball EV, Mort M, et al. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 2003;21:577–581. doi: 10.1002/humu.10212.
- 44. Beroud C, Collod-Beroud G, Boileau C, et al. UMD (Universal mutation database): a generic software to build and analyze locus-specific databases. Hum. Mutat. 2000;15:86–94. doi: 10.1002/(SICI)1098-1004(200001)15:1<86::AID-HUMU16>3.0.CO;2-4.
- 45. Brown AF, McKie MA. MuStaR and other software for locus-specific mutation databases. Hum. Mutat. 2000;15:76–85. doi: 10.1002/(SICI)1098-1004(200001)15:1<76::AID-HUMU15>3.0.CO;2-8.
- 46. Fokkema IF, den Dunnen JT, Taschner PE. LOVD: easy creation of a locus-specific sequence variation database using an ‘LSDB-in-a-box’ approach. Hum. Mutat. 2005;26:63–68. doi: 10.1002/humu.20201.
- 47. Edkins S, O'Meara S, Parker A, et al. Recurrent KRAS codon 146 mutations in human colorectal cancer. Cancer Biol. Ther. 2006;5:928–932. doi: 10.4161/cbt.5.8.3251.
- 48. Hostein I, Faur N, Primois C, et al. BRAF mutation status in gastrointestinal stromal tumors. Am. J. Clin. Pathol. 2010;133:141–148. doi: 10.1309/AJCPPCKGA2QGBJ1R.
- 49. Cui W, Kong X, Cao HL, et al. Mutations of p53 gene in 41 cases of human brain gliomas. Ai Zheng. 2008;27:8–11.
- 50. Shoemaker RH. The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer. 2006;6:813–823. doi: 10.1038/nrc1951.
- 51. Kuntzer J, Eggle D, Lenhof HP, et al. The Roche Cancer Genome database (RCGDB). Hum. Mutat. 2010;31:407–413. doi: 10.1002/humu.21207.
- 52. Vos YJ, de Walle HE, Bos KK, et al. Genotype-phenotype correlations in L1 syndrome: a guide for genetic counselling and mutation analysis. J. Med. Genet. 2009;47:169–175. doi: 10.1136/jmg.2009.071688.
- 53. Piirila H, Valiaho J, Vihinen M. Immunodeficiency mutation databases (IDbases). Hum. Mutat. 2006;27:1200–1208. doi: 10.1002/humu.20405.
- 54. Claustres M, Horaitis O, Vanevski M, et al. Time for a unified system of mutation description and reporting: a review of locus-specific mutation databases. Genome Res. 2002;12:680–688. doi: 10.1101/gr.217702.
- 55. Horaitis O, Cotton RG. The challenge of documenting mutation across the genome: the human genome variation society approach. Hum. Mutat. 2004;23:447–452. doi: 10.1002/humu.20038.
- 56. Kaput J, Cotton RG, Hardman L, et al. Planning the human variome project: the Spain report. Hum. Mutat. 2009;30:496–510. doi: 10.1002/humu.20972.