Data Files

The following data file is a tab-delimited text table. To view this file, we suggest that you first download it by either right-clicking the link (on a PC) or option-clicking the link (on a Mac). You may then open the saved file in any tab-capable word processor or in MS Excel.

User Manual for SAGE data analysis

Data analysis methods

The Microsoft Excel 5 file "SAGEDATA" contains the data from a SAGE analysis performed on oleate-grown yeast cells (Kal et al., 1998). Initial data analysis was performed using the SAGE Software package version 1.0 (Velculescu et al., 1995). The tag lists from wild type cells and pip2/oaf1 cells contained 10,943 and 3,847 tags, respectively, of which 577 and 234, respectively, were derived from linker sequences. These linker-derived tags were excluded from the analysis. The resulting tag lists contained 10,366 total tags from wild type cells and 3,613 tags from pip2/oaf1 cells. We compiled a database of all potential tags in the complete yeast genome (over 69,000 10-bp sequences) and linked each tag to the gene annotations in the MIPS database (as of 9th December 1998). Next, we merged this dataset with the tags found with SAGE. Tag numbers can be converted to the number of mRNA transcripts per cell, assuming a total of 15,000 mRNA molecules per cell (see below). Classification in Functional Categories was done according to the yeast protein functional catalogue (Goffeau, 1997; Mewes et al., 1997; also available via the World Wide Web at http://websvr.mips.biochem.mpg.de/proj/yeast).

Explanation of SAGE data in file "SAGEDATA"

A: Tag sequence       10-bp tag sequence
B: wt ole             the number of times a certain tag was found in wild type oleate-grown cells
C: pip2/oaf1          the number of times a certain tag was found in pip2/oaf1 mutant oleate-grown cells
D: Hits in genome     the number of times a tag was found in the yeast genome sequence
E: Chromosome         chromosome number
F: Position           tag position on the chromosome. "(C)" indicates that the ORF is encoded on the "Crick" strand
G: Distance to ORF    the number of basepairs between the 3' end of the ORF and the tag. "0" indicates that the tag lies within the ORF.
H: ORF coordinates    ORF coordinates
I: URL                URL for ORF info in MIPS database
J: Systematic name    systematic name
K: Gene name          gene name
L: Description        description of the gene (-product)
M: Funcat(s)          MIPS functional catalog numbers
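
For readers who prefer a script to a spreadsheet, the column layout above can be mapped onto named fields. The following is a minimal sketch, assuming the file has been saved as tab-delimited text with one header row; the field names in COLUMNS, the function name read_sage_table and the sample row are illustrative assumptions and are not part of the dataset.

```python
import csv
import io

# Field names follow the A-M column layout described above.
# The identifiers themselves are hypothetical; they do not appear in the file.
COLUMNS = [
    "tag", "wt_ole", "pip2_oaf1", "genome_hits", "chromosome", "position",
    "distance_to_orf", "orf_coordinates", "url", "systematic_name",
    "gene_name", "description", "funcats",
]


def read_sage_table(handle, has_header=True):
    """Parse a tab-delimited export into a list of dicts keyed by COLUMNS."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        # Pad short rows so that every column name is present.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


# Made-up sample content, not taken from the real dataset.
sample = "Tag sequence\twt ole\tpip2/oaf1\nAAAAAAAAAA\t3\t1\t1\tXIII\t100\t0\n"
rows = read_sage_table(io.StringIO(sample))
print(rows[0]["tag"], rows[0]["wt_ole"], rows[0]["pip2_oaf1"])
```

With a real export, the io.StringIO object would be replaced by an opened file, for example open(path, newline=""), where path is wherever the table was saved.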

Usage of the "SAGEDATA" file

Determine expression levels
To determine the expression level of a certain gene, follow the guidelines below.

1. Use the systematic name of the gene, e.g. YMR303c is the systematic name for the ADH2 gene. For searches, always use the systematic names. Not all synonyms are included in the descriptions, and sometimes the same acronym is used for different genes (e.g. CTP1 = citrate transport protein or copper transport protein).
2. Select all data in the file by clicking the upper left corner.
3. Sort the Excel file by systematic name.
4. Find the systematic name of your gene in column J by selecting column J and using the "Find" function.
5. Only consider tags that lie within the ORF or within the 500 bp 3' of the ORF. If multiple tags match a gene within the ORF or within the 500 bp 3' of the ORF AND these tags have only one genome hit (column D), the tags can be considered to originate from the same gene and their counts can be added. If a tag matches the genome at multiple places, all places should be checked. Sometimes the 11th bp of the tag can be identified using the SAGE software; this can resolve ambiguities.
6. Calculate the expression level (mRNA copies per cell) by dividing the number of tags by the total number of tags for that condition, and multiplying the resulting number by 15,000 (total number of mRNA molecules per cell). E.g. 100 tags from a gene in the wild type oleate library equals an expression level of 100/10,366*15,000=145 mRNA copies per cell.
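
Steps 5 and 6 can also be sketched in a few lines of code. This is an illustration under stated assumptions rather than part of the SAGE software: the row dictionaries, their field names and the tag counts below are made up for the example, while the library totals (10,366 and 3,613 tags) and the figure of 15,000 mRNA molecules per cell are taken from the text above.

```python
# Totals after removal of linker-derived tags, as stated in this manual.
TOTAL_TAGS = {"wt_ole": 10366, "pip2_oaf1": 3613}
MRNA_PER_CELL = 15000  # assumed total number of mRNA molecules per cell


def summed_tag_count(rows, systematic_name, library, max_distance=500):
    """Step 5: add up tags for one gene that lie within the ORF (distance 0)
    or within max_distance bp 3' of it, and that have a single genome hit."""
    total = 0
    for row in rows:
        if row["systematic_name"].upper() != systematic_name.upper():
            continue
        if int(row["genome_hits"]) != 1:
            continue  # tags with multiple genome hits must be checked by hand
        if not 0 <= int(row["distance_to_orf"]) <= max_distance:
            continue
        total += int(row[library])
    return total


def copies_per_cell(tag_count, library):
    """Step 6: convert a tag count into estimated mRNA copies per cell."""
    return tag_count / TOTAL_TAGS[library] * MRNA_PER_CELL


# Hypothetical rows with made-up counts, for illustration only.
example_rows = [
    {"systematic_name": "YMR303c", "genome_hits": "1", "distance_to_orf": "0", "wt_ole": "60"},
    {"systematic_name": "YMR303c", "genome_hits": "1", "distance_to_orf": "120", "wt_ole": "40"},
    {"systematic_name": "YMR303c", "genome_hits": "2", "distance_to_orf": "30", "wt_ole": "25"},
    {"systematic_name": "YMR303c", "genome_hits": "1", "distance_to_orf": "800", "wt_ole": "10"},
]

tags = summed_tag_count(example_rows, "YMR303c", "wt_ole")
print(tags, round(copies_per_cell(tags, "wt_ole")))  # 100 tags -> 145 copies per cell
```

The third and fourth example rows are skipped because they have two genome hits and lie more than 500 bp from the ORF, respectively; such tags need manual inspection rather than automatic summing.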

Note that a single tag can originate from multiple genes (e.g. tag GGTGAAAACG can originate from the ADH1, ADH2 or DYN1 genes), that a single gene can give multiple tags, and that tags that map to chromosomal locations far away (>500 bp) from an annotated ORF can originate from NORFs (Non-annotated ORFs).
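
Ambiguous tags of the kind described in this note can be flagged automatically before any counts are interpreted. The sketch below groups rows by tag sequence and reports tags that map to more than one gene; the function names and row layout are assumptions made for this example, the GGTGAAAACG entry mirrors the case mentioned above, and the second tag is made up.

```python
from collections import defaultdict


def genes_per_tag(rows):
    """Collect the set of gene names that each tag sequence maps to."""
    mapping = defaultdict(set)
    for row in rows:
        mapping[row["tag"]].add(row["gene_name"])
    return mapping


def ambiguous_tags(rows):
    """Return only those tags that can originate from more than one gene."""
    return {
        tag: sorted(genes)
        for tag, genes in genes_per_tag(rows).items()
        if len(genes) > 1
    }


# One row per genome match of a tag (hypothetical layout).
example_rows = [
    {"tag": "GGTGAAAACG", "gene_name": "ADH1"},
    {"tag": "GGTGAAAACG", "gene_name": "ADH2"},
    {"tag": "GGTGAAAACG", "gene_name": "DYN1"},
    {"tag": "AAAAAAAAAA", "gene_name": "GENE1"},  # made-up unambiguous tag
]

ambiguous = ambiguous_tags(example_rows)
print(ambiguous)
```

Tags reported here are the ones for which all genome positions should be checked, as advised in step 5.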

Additional data and software

Data for yeast cells grown on glucose and SAGE software for initial data analysis are available on request (Velculescu et al., 1995, 1997). Data from glucose-grown cells are also available via the Saccharomyces Genome Database (http://genome-www.stanford.edu/Saccharomyces/), which provides a web page (http://genome-www.stanford.edu/cgi-bin/SGD/SAGE/querySAGE) where tag sequences can be entered and searched against the yeast genome. However, tags that are found only in the oleate datasets cannot yet be found via SGD; they will become available there upon publication of the oleate data.

The SAGEstat software, which performs statistical calculations for planning and evaluation of SAGE projects, is available on request (send an e-mail to J.M.Ruijter@amc.uva.nl, subject: SAGEstat).

References

Kal, A.J., van Zonneveld, A.J., Benes, V., van den Berg, M., Groot Koerkamp, M., Albermann, K., Strack, N., Ruijter, J.M., Richter, A., Dujon, B., Ansorge, W., and Tabak, H.F. (1998). Dynamics of gene expression revealed by comparison of SAGE transcript profiles from yeast grown on different carbon sources. Submitted.

Velculescu, V. E., Zhang, L., Vogelstein, B., and Kinzler, K. W. (1995). Serial analysis of gene expression. Science 270, 484-487.

Velculescu, V. E., Zhang, L., Zhou, W., Vogelstein, J., Basrai, M. A., Bassett, D. E., Jr., Hieter, P., Vogelstein, B., and Kinzler, K. W. (1997). Characterization of the yeast transcriptome. Cell 88, 243-251.

For questions and further information about the SAGE datasets, contact:
Dr. A.J. Kal
ICRF, Gene Expression Control Lab, Rm 506
44 Lincoln's Inn Fields
London WC2A 3PX
UK
+44-171-2693229 tel
+44-171-2693581 fax
A.Kal@icrf.icnet.uk