Comparative and Functional Genomics. 2001 Aug;2(4):226–235. doi: 10.1002/cfg.93

SAND, a New Protein Family: From Nucleic Acid to Protein Structure and Function Prediction

Amanda Cottage, Yvonne J K Edwards, Greg Elgar
PMCID: PMC2447211  PMID: 18628914

Abstract

As a result of genome, EST and cDNA sequencing projects, there are huge numbers of predicted and/or partially characterised protein sequences compared with a relatively small number of proteins with experimentally determined function and structure. Thus, considerable attention is focused on the accurate prediction of gene function and structure from sequence using bioinformatics. In the course of our analysis of genomic sequence from Fugu rubripes, we identified a novel gene, SAND, with significant sequence identity to hypothetical proteins predicted in Saccharomyces cerevisiae, Schizosaccharomyces pombe and Caenorhabditis elegans, to a Drosophila melanogaster gene, and to mouse and human cDNAs. Here we identify a further SAND homologue in human and Arabidopsis thaliana by use of standard computational tools. We describe the genomic organisation of SAND in these evolutionarily divergent species and identify sequence homologues from EST database searches, confirming the expression of SAND in over 20 different eukaryotes. We confirm the expression of two different SAND paralogues in mammals and determine expression of one SAND in other vertebrates and eukaryotes. Furthermore, we predict structural properties of SAND and characterise conserved sequence motifs in this protein family.

Articles from Comparative and Functional Genomics are provided here courtesy of Wiley
