Nucleic Acids Res. 2000 Jan 1;28(1):163–165. doi: 10.1093/nar/28.1.163

MitoNuc and MitoAln: two related databases of nuclear genes coding for mitochondrial proteins

Graziano Pesole 1,4,a, Carmela Gissi 2, Domenico Catalano 3, Giorgio Grillo 3, Flavio Licciulli 3, Sabino Liuni 3,4, Marcella Attimonelli 2, Cecilia Saccone 2,3,4
PMCID: PMC102385  PMID: 10592211

Abstract

Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of the nuclear and organellar genomes. Mitochondrial genomes have been extensively sequenced and analysed, and the data collected in several specialised databases. To collect information on nuclear-encoded mitochondrial proteins, we developed MitoNuc and MitoAln, two related databases containing, respectively, detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa and yeast, and the multiple alignments of the relevant homologous protein coding regions. Retrieval of MitoNuc and MitoAln through SRS at http://bio-www.ba.cnr.it:8000/srs6/ allows easy extraction of sequence data, of subsequences defined by specific features and of nucleotide or amino acid multiple alignments.

INTRODUCTION

Mitochondria, subcellular organelles present in the majority of eukaryotes, contain their own genome and expression machinery. Mitochondria have recently regained the interest of researchers because, besides their fundamental role in cellular energy metabolism, they appear to be involved in a number of central cellular processes such as apoptosis. Moreover, mitochondrial defects contribute to the pathogenesis of many degenerative diseases, to aging and to cancer.

Mitochondrial genomes have been extensively sequenced in many organisms and the data stored in several specialised databases; among these, MitBASE (1) collects all available information from different organisms as well as from intraspecies variants and mutants.

Only a few mitochondrial proteins are encoded by the mitochondrial genome. Most mitochondrial proteins are encoded by nuclear genes, synthesised in the cytoplasm and then imported into the organelle. Thus, the cross-talk between the nuclear and mitochondrial genomes is crucial for mitochondrial biogenesis and function, and the two genomes are probably subject to co-evolutionary processes.

The present paper describes MitoNuc, a specialised database reporting information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa and on their homologues in the yeast Saccharomyces cerevisiae. Yeast has been included because it is a model system for genetic and biochemical studies of mitochondrial biogenesis, and much information on mitochondrion-related proteins is available for this organism. We have also developed the complementary database MitoAln, which contains multiple alignments of the homologous genes represented in MitoNuc.

DATABASE STRUCTURE DESIGN

Each MitoNuc entry describes a nuclear gene coding for a mitochondrion-related protein in a given species. Each entry reports a defined set of information: gene description; species name and taxonomic classification; gene name; EC (Enzyme Commission) number for genes coding for enzymes; gene product name, synonyms and functional classification as defined in the KEYnet database (2); metabolic pathways in which the protein product is involved; cellular and sub-mitochondrial localisation of the encoded protein; and possible presence of tissue-specific isoforms. Cross-references to EMBL, SWISS-PROT/TrEMBL and the complementary database MitoAln are also present.

More than one EMBL nucleotide sequence can be linked to the same MitoNuc entry because the primary databases are highly redundant: the same gene is frequently reported in several different entries. MitoNuc is cross-referenced to each of the redundant entries so as not to lose important information associated with such redundancy (e.g. polymorphisms, tissue specificity).
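To make this one-to-many relationship concrete, the Python sketch below models a MitoNuc entry and groups redundant EMBL records under a single entry per gene and species. The class, its field names and the grouping key (gene name plus species) are assumptions for illustration; they are not the actual MitoNuc schema or the procedure used to build the database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MitoNucEntry:
    """Hypothetical model of one MitoNuc entry: one nuclear gene in one species.

    Field names are illustrative only; they are not the real flat-file line codes.
    """
    gene_name: str
    species: str
    description: str = ""
    ec_number: Optional[str] = None  # set only for genes coding for enzymes
    product_name: str = ""
    synonyms: list[str] = field(default_factory=list)
    pathways: list[str] = field(default_factory=list)
    localisation: str = ""
    embl_accessions: list[str] = field(default_factory=list)  # all redundant EMBL entries
    swissprot_ids: list[str] = field(default_factory=list)


def group_redundant(records: list[tuple[str, str, str]]) -> list[MitoNucEntry]:
    """Collapse (gene, species, EMBL accession) records into one entry per gene and species.

    Every distinct accession is kept as a cross-reference, so information carried by
    redundant EMBL entries (e.g. polymorphisms, tissue-specific transcripts) is not lost.
    """
    entries: dict[tuple[str, str], MitoNucEntry] = {}
    for gene, species, accession in records:
        key = (gene, species)
        if key not in entries:
            entries[key] = MitoNucEntry(gene_name=gene, species=species)
        if accession not in entries[key].embl_accessions:
            entries[key].embl_accessions.append(accession)
    return list(entries.values())  # dicts preserve insertion order
```

For example, three records for a hypothetical gene "geneA" in human (accessions "ACC_1", "ACC_2", "ACC_1") and one in mouse collapse into two entries, the human one carrying both distinct accessions.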

The location and description of sequence regions with specific functions in each of the EMBL entries linked to a given MitoNuc entry are given in a detailed feature table. In particular, this information covers the protein coding region (CDS), the signal peptide (sig_peptide), the mature peptide (mat_peptide), and the 5′ and 3′ untranslated regions (5′UTR and 3′UTR) of the relevant mRNAs. Each CDS of a given MitoNuc entry defines a sub-entry (identified by the key ‘aln_name’), which is included in the multiple alignments collected in the MitoAln database.
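Feature locations in EMBL entries use 1-based, inclusive coordinates. The sketch below shows how a region such as a CDS or 5′UTR could be cut out of an entry sequence once its location is known. It handles only the simple forms start..end, join(...) and complement(...); fuzzy positions such as <1..200 and references to other entries are deliberately not supported. The function name and the sample sequence are hypothetical.

```python
import re

# Complement table for unambiguous bases; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def _reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extract_feature(seq: str, location: str) -> str:
    """Return the subsequence described by a simplified EMBL-style location.

    Supports 'start..end', 'join(a..b,c..d,...)' and 'complement(...)' wrapping
    either form. Coordinates are 1-based and inclusive.
    """
    location = location.replace(" ", "")
    if location.startswith("complement(") and location.endswith(")"):
        inner = location[len("complement("):-1]
        return _reverse_complement(extract_feature(seq, inner))
    if location.startswith("join(") and location.endswith(")"):
        parts = location[len("join("):-1].split(",")
        return "".join(extract_feature(seq, part) for part in parts)
    match = re.fullmatch(r"(\d+)\.\.(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    return seq[start - 1:end]  # convert 1-based inclusive to a Python slice
```

With seq = "AAACCCGGGTTT", the location "join(1..3,10..12)" yields "AAATTT" and "complement(1..3)" yields "TTT".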

The MitoAln database collects and describes the multiple alignments, at both the nucleotide and amino acid level, of the homologous coding regions (CDS only) present in MitoNuc. Each MitoAln entry contains a multiple alignment including CDS sequences from different species and, in the case of redundancy or tissue specificity, also several sequences from the same species. Each MitoAln entry reports the names of the multiple alignment files, the names and number of the MitoNuc sub-entries contained in the alignment, and the species to which they belong.

Figure 1, which shows the MitoNuc (a) and MitoAln (b) flat-file structures, illustrates the organisation of both databases.

Figure 1. MitoNuc (a) and MitoAln (b) entry flat-file structure. Underlined words indicate active cross-links which, in the case of multiple alignment files, allow the user to visualise, manipulate and extract nucleotide and amino acid multiple alignments.

For standardisation purposes, the entries of both MitoNuc and MitoAln databases have been formatted according to a modified EMBL database format.
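Because both databases use a modified EMBL format, a generic reader for EMBL-style flat files can split a file into entries keyed by line code. The sketch below assumes the standard EMBL layout (a two-character line code, data starting at column 6, and '//' ending each entry). The codes in the usage example (ID, DE, OS, DR, XX) are standard EMBL line codes; the actual MitoNuc and MitoAln codes are those shown in Figure 1 and may differ.

```python
from collections import defaultdict


def parse_embl_like(text: str) -> list[dict[str, list[str]]]:
    """Split an EMBL-style flat file into entries.

    Each line starts with a two-character line code; data start at column 6.
    A line beginning with '//' terminates an entry. Values of a repeated line
    code are collected in file order. 'XX' spacer lines are skipped.
    """
    entries: list[dict[str, list[str]]] = []
    current: defaultdict[str, list[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("//"):
            if current:  # ignore empty entries (e.g. consecutive '//')
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        code, value = line[:2], line[5:].strip()
        if code == "XX":  # EMBL spacer line, carries no data
            continue
        current[code].append(value)
    if current:  # tolerate a final entry without a closing '//'
        entries.append(dict(current))
    return entries
```

Applied to a hypothetical two-entry file, every 'DR' cross-reference line of the first entry ends up in one list under the key 'DR', which mirrors how a single MitoNuc entry can point to several EMBL entries.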

DATABASE DATA SOURCE

The gene sequences were first collected from the EMBL primary database (Release 55) with the ACNUC (3) and SRS (4) retrieval systems, using keyword- or reference-based selection criteria. Further metazoan genes homologous to yeast genes listed in MitBASE Pilot (5; http://www3.ebi.ac.uk/Research/Mitbase/mitbiog.pl ) or to other previously selected genes were found with BLAST/FASTA similarity searches. Additional information on the gene and gene product of MitoNuc entries was obtained from the literature or extracted from specialised databases. In particular, the exact definition of the encoded proteins and their function was obtained from the SWISS-PROT, ENZYME (6; http://www.expasy.ch/enzyme/ ) and KEYnet (2; http://www.ba.cnr.it/keynet.html ) databases. The metabolic pathway in which the protein product is involved was obtained from the LIGAND database (7; http://www.genome.ad.jp/dbget/ligand.html ). Data on the cellular and sub-mitochondrial localisation and on the tissue specificity of the encoded proteins were obtained from the scientific literature and the SWISS-PROT database. The presence of cytosolic isoforms of mitochondrial proteins (e.g. aspartate aminotransferase) was inferred from literature surveys. The MitoNuc feature table was automatically extracted from EMBL and from the specialised UTRdb (8) database, whereas the locations of features such as sig_peptide and mat_peptide were obtained by comparing the information reported in EMBL and SWISS-PROT entries and from literature surveys. Gene details in MitoNuc entries that were inferred from the literature, such as product location or tissue specificity, are linked to the relevant MEDLINE references.

MitoAln ALIGNMENTS

Sequence alignments have been performed with the PILEUP program and manually optimised using the LINEUP program of the GCG package (9). Nucleotide alignments have been guided by the corresponding amino acid alignments.
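The procedure used to transfer the amino acid alignments to the nucleotide level is not detailed here. One common approach, sketched below as an assumption, threads each coding sequence onto its aligned protein row: every residue is replaced by its codon and every gap by three gap characters, so codon boundaries stay in register across all rows. This illustrates the generic technique; it is not a reimplementation of PILEUP or LINEUP, and the function name is hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map one aligned amino acid row back onto its coding sequence.

    Each residue becomes the next codon of the CDS and each gap becomes three
    gap characters. The CDS must provide at least three nucleotides per residue;
    any trailing nucleotides (e.g. a stop codon) are dropped.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is too short for the aligned protein")
    pieces, pos = [], 0
    for aa in aligned_protein:
        if aa == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(cds[pos:pos + 3])
            pos += 3
    return "".join(pieces)
```

For example, the aligned row "M-K" with CDS "ATGAAA" becomes "ATG---AAA". Deriving the nucleotide alignment from the protein alignment in this way avoids gaps that would break the reading frame.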

AVAILABILITY AND RETRIEVAL OF THE DATABASES

MitoNuc and MitoAln databases are available on the Web through SRS retrieval at the http://bio-www.ba.cnr.it:8000/srs6/ site.

Specific sequence regions defined in the MitoNuc feature table can be easily selected and extracted, which greatly facilitates further sequence analyses. MitoAln multiple alignment files are available in MSF format and can be handled and retrieved through the Web browser interface with different application programs, such as Genedoc (http://www.cris.com/~ketchup/genedoc.shtml ), SeaView (http://pbil.univ-lyon1.fr/software/seaview.html ) and BoxShade (http://helix.nih.gov/science/boxshade.html ). Complete multiple alignments or user-defined subsets of sequences can be extracted.
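MSF files, as written by GCG PILEUP, declare each sequence in a 'Name:' header line and list the aligned sequences in interleaved blocks after a '//' separator. The sketch below reads such a file and extracts a user-chosen subset of sequences, dropping columns that contain only gaps within the subset. It assumes well-formed input, does not verify MSF checksums, and its function names and sample data are hypothetical.

```python
def read_msf(text: str) -> dict[str, str]:
    """Read aligned sequences from an MSF file.

    Names come from the 'Name:' header lines. After the '//' separator, each
    block line starts with a sequence name followed by space-separated chunks;
    lines whose first token is not a declared name (e.g. position rulers) are
    skipped. MSF gap characters '.' and '~' are normalised to '-'.
    """
    names: list[str] = []
    chunks: dict[str, list[str]] = {}
    in_body = False
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if not in_body:
            if tokens[0] == "Name:" and len(tokens) > 1:
                names.append(tokens[1])
                chunks[tokens[1]] = []
            elif tokens[0] == "//":
                in_body = True
            continue
        if tokens[0] in chunks:
            chunks[tokens[0]].append("".join(tokens[1:]))
    return {n: "".join(chunks[n]).replace(".", "-").replace("~", "-") for n in names}


def subset(alignment: dict[str, str], wanted: list[str]) -> dict[str, str]:
    """Keep only the requested sequences and drop columns that become all-gap."""
    if not wanted:
        return {}
    rows = [alignment[name] for name in wanted]
    keep = [i for i in range(len(rows[0])) if any(row[i] != "-" for row in rows)]
    return {name: "".join(alignment[name][i] for i in keep) for name in wanted}
```

Removing all-gap columns keeps an extracted subset compact, since positions that were gapped only because of sequences left out of the selection carry no information for the remaining ones.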

News on database updates and on-line tutorials describing the retrieval and extraction of MitoNuc and MitoAln entries are available at http://bio-www.ba.cnr.it:8000/EmbIT/Tutorials/Mitonuc/

DATABASE CONTENT

The MitoNuc database currently contains 599 entries, cross-referenced to a total of 1301 EMBL/GenBank entries. These entries refer to 27 different metazoan species, most of them mammals, plus the yeast S. cerevisiae.

The MitoAln database contains 150 entries: 62 of them (41%) are alignments of four or more species, and 58 (39%) contain yeast sequences.
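The percentages above are shares of the 150 MitoAln entries (62/150 ≈ 41.3%, 58/150 ≈ 38.7%). The sketch below shows how such figures could be tallied from per-alignment species lists. It counts distinct species, since an alignment may contain several sequences from the same species; the record layout (one list of species names per alignment) is a hypothetical simplification.

```python
YEAST = "Saccharomyces cerevisiae"


def summarise(alignments: list[list[str]]) -> tuple[int, int, int]:
    """Return (total alignments, alignments with >= 4 distinct species, alignments with yeast).

    Each alignment is given as the list of species of its sequences; repeated
    species (redundant or tissue-specific sequences) are counted once.
    """
    total = len(alignments)
    multi_species = sum(1 for species in alignments if len(set(species)) >= 4)
    with_yeast = sum(1 for species in alignments if YEAST in species)
    return total, multi_species, with_yeast


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer (Python's round, ties to even)."""
    return round(100 * part / whole)
```

With the counts reported above, percent(62, 150) gives 41 and percent(58, 150) gives 39.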

CONCLUSIONS AND PERSPECTIVES

The MitoNuc and MitoAln databases can greatly help the study of mitochondrial functions that depend on nuclear genes, and of the co-evolution and interactions of the nuclear and mitochondrial genetic systems. The collected data can be used for evolutionary studies as well as for laboratory research.

The MitoNuc database contains detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa and yeast, whereas MitoAln contains the alignments of homologous genes, thus classifying these genes into homologous groups. The MITODAT (http://www-lecb.ncifcrf.gov/mitoDat/ ) and MITOP (10; http://websvr.mips.biochem.mpg.de/proj/medgen/mitop/ ) data collections contain similar information on mitochondrion-related proteins, but they cover a more limited species sample (only human for MITODAT and five species for MITOP) and do not allow sequence extraction or easy information retrieval. Our databases have been structured to allow specific selections combining various criteria, and easy extraction of sequence data, including regions of specific interest.


ACKNOWLEDGEMENTS

We wish to thank A. T. Dimaggio for her contribution in the database construction. We are grateful to N. Altamura for many helpful discussions and for providing MitBASE Pilot data. This work was supported by Programma Biotecnologie legge 95/95 (MURST 5%) and by the EU grant ERB-BIO4-CT96-0030.

REFERENCES

  • 1. Attimonelli,M., Cooper,J.M., D’Elia,D., de Montalvo,A., De Robertis,M., Lehvaslaiho,H., Malladi,S.B., Memeo,F., Stevens,K., Schapira,A.H. and Saccone,C. (1999) Nucleic Acids Res., 27, 143–146. Updated article in this issue: Nucleic Acids Res. (2000), 28, 148–152.
  • 2. Licciulli,F., Catalano,D., D’Elia,D., Lorusso,V. and Attimonelli,M. (1999) Nucleic Acids Res., 27, 365–367. Updated article in this issue: Nucleic Acids Res. (2000), 28, 372–373.
  • 3. Gouy,M., Gautier,C., Attimonelli,M., Lanave,C. and Di Paola,G. (1985) Comput. Appl. Biosci., 1, 167–172.
  • 4. Etzold,T., Ulyanov,A. and Argos,P. (1996) Methods Enzymol., 266, 114–128.
  • 5. de Pinto,B., Malladi,S.B. and Altamura,N. (1999) Nucleic Acids Res., 27, 147–149.
  • 6. Bairoch,A. (1999) Nucleic Acids Res., 27, 310–311. Updated article in this issue: Nucleic Acids Res. (2000), 28, 304–305.
  • 7. Goto,S., Nishioka,T. and Kanehisa,M. (1999) Nucleic Acids Res., 27, 377–379. Updated article in this issue: Nucleic Acids Res. (2000), 28, 380–382.
  • 8. Pesole,G., Liuni,S., Grillo,G., Ippedico,M., Larizza,A., Makalowski,W. and Saccone,C. (1999) Nucleic Acids Res., 27, 188–191. Updated article in this issue: Nucleic Acids Res. (2000), 28, 193–196.
  • 9. Devereux,J., Haeberli,P. and Smithies,O. (1984) Nucleic Acids Res., 12, 387–395.
  • 10. Scharfe,C., Zaccaria,P., Hoertnagel,K., Jaksch,M., Klopstock,T., Lill,R., Prokisch,H., Gerbitz,K.D., Mewes,H.W. and Meitinger,T. (1999) Nucleic Acids Res., 27, 153–155. Updated article in this issue: Nucleic Acids Res. (2000), 28, 155–158.
