Table 1. All the attributes ranked by PartsList.
A | ||||
Category | Symbol | Definition of symbol | Attributes in category | Reference |
Genome Occurrence | G(x) | Number of times a particular PART occurs in genome x. (These are based on PSI-BLAST comparisons between PDB and the genomes with an e-value cutoff in these comparisons of 0.0001.) | 20 | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) |
Expression | L(e) | Average expression level of a particular PART. This is the average expression level over all genes that contain this PART. | 8 | (46) |
C(e) | PART composition of the yeast transcriptome in expression level experiment e. This refers to the fraction of the mRNA population with this PART as opposed to all other parts. (This is only applicable to expression experiments, such as SAGE and GeneChips, that measure absolute mRNA levels in copies per cell.) | 8 | (46) | |
E(e) | Transcriptome enrichment compared to genome in experiment e. [Transcriptome enrichment is defined as the percentage difference between the PART composition of the transcriptome and that of the genome. In symbols: E(e) = [C(e) – G(scer)] / G(scer).] | 8 | (46) | |
F(r) | Expression level fluctuation in experiment r. [This is the standard deviation in the expression ratio measurement R(i,t) over a timecourse, for example, <(R(i,t) – <R(i,t)>)^2>, where one averages over all times t and genes i that have a particular PART.] | 7 | (67) | |
Alignments | V(f) | The number of aligned pairs in pair-set f. | 2 | (39) |
U(f) | RMS deviation in Cα atoms averaged over all alignments in pair-set f | 2 | (39) | |
R(f) | Similar to U(f) for pair-set f but only the best fitting half of the atoms are included in the calculation | 2 | (39) | |
S(f) | Average percentage identity between pairs of aligned proteins in pair-set f | 2 | (39) | |
P(f) | Average sequence P value for pair-set f | 2 | (39) | |
Q(f) | Average structural P value for pair-set f | 2 | (39) | |
Compositions | N(p) | The number of structures associated with a particular PART in dataset p. | 2 | |
B(a,p) | Composition of amino acid a in a particular PART where one averages over all structures in dataset p associated with the PART | 40 | ||
Motion | M(s,d) | The maximum value of statistic s derived from surveying set of motions d in the Macromolecular Motions Database for a particular PART, where s is only calculated from the entries in the database that are associated with the PART. | 7 | (56,57) |
A(s,d) | Similar to M(s,d) but now we take the average instead of the maximum. | 7 | (56,57) | |
Interaction | I(y,c) | For a given PART, the number of types of protein–protein interactions in interaction dataset y subject to the restriction c regarding whether or not the proteins are on the same chain. The number of interaction types is the number of distinctly different PARTs that interact with a given PART. | 24 | (51,68) |
J(y,c) | For a given PART, the total number of types of interactions in interaction dataset y subject to the restriction c regarding whether or not the proteins are on the same chain. Here we show all interactions observed, not just the number of distinct PART–PART interactions tabulated in I(y,c). | 24 | (52,68) | |
Transposon | T(b) | The sensitivity of the cell to a transposon inserted into genes containing a particular PART under growth condition b. The sensitivity is indicated by the negative logarithm of a P value, which measures the degree to which the observations for one particular gene could have resulted from wild-type cells that randomly change their phenotype. | 20 | (58) |
Miscellaneous | X(q) | Various miscellaneous ranks. | 5 | |
Total | | | 182 | |
B | ||||
Attributes | Value | Description | Reference |
Genome x = | aful | Archaeoglobus fulgidus | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | |
mjan | Methanococcus jannaschii | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
mthe | Methanobacterium thermoautotrophicum | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
phor | Pyrococcus horikoshii | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
scer | Saccharomyces cerevisiae | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
cele | Caenorhabditis elegans | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
aaeo | Aquifex aeolicus | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
syne | Synechocystis sp. | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
ecol | Escherichia coli | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
bsub | Bacillus subtilis | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
mtub | Mycobacterium tuberculosis | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
hinf | Haemophilus influenzae Rd | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
hpyl | Helicobacter pylori | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
mgen | Mycoplasma genitalium | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
mpne | Mycoplasma pneumoniae | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
bbur | Borrelia burgdorferi | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
tpal | Treponema pallidum | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
ctra | Chlamydia trachomatis | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
cpne | Chlamydia pneumoniae | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
rpro | Rickettsia prowazekii | (H.Hegyi, J.Lin and M.Gerstein, manuscript submitted; 19,35) | ||
Absolute Expression Experiment e = | vegsam | GeneChip mRNA expression analysis of 6200 yeast ORFs under vegetative growth conditions. | (48) | |
vegyou | GeneChip mRNA expression analysis of 5455 yeast ORFs under vegetative growth conditions. | (49) | ||
sage | mRNA expression analysis of 3788 yeast ORFs determined by SAGE. | (43) | ||
matea | GeneChip mRNA expression analysis of yeast mating type a strain grown on glucose. | (50) | ||
mateal | GeneChip mRNA expression analysis of yeast mating type α strain grown on glucose. | (50) | ||
gal | GeneChip mRNA expression analysis of yeast mating type a strain grown on galactose. | (50) | ||
heat | GeneChip mRNA analysis of yeast mating type a strain grown on glucose at 30°C before a 39°C heat shock. | (50) | ||
ref | Reference transcriptome. This is a scaling and merging of the above experiments. | (46) | ||
Microarray Experiment r = | cdc28 | cDNA microarray genome-wide characterization of mRNA transcript levels for CDC28 synchronized yeast cells during the cell cycle. | (69) | |
cdc15 | cDNA microarray genome-wide characterization of mRNA transcript levels for CDC15 synchronized yeast cells during the cell cycle. | (69) | ||
alpha | Analysis using cDNA microarrays of yeast mRNA levels after synchronization of cell cycle via α arrest factor. | (69) | ||
diaux | Genome-wide cDNA microarray analysis of the temporal program of yeast mRNA expression accompanying the metabolic shift from fermentation to respiration. | (70) | ||
spor | cDNA microarray genome-wide analysis to assay changes in gene expression during sporulation. | (71) | ||
heatec | cDNA microarray experiment and analysis on 4290 E.coli ORFs after exposure of the bacteria to heat shock. | (72) | ||
deve | Analysis of genome wide changes during successive larval stages using cDNA microarrays of ∼12 000 C.elegans ORFs. | (73) | ||
Pair-set f = | all | All pairs within a PART included in the calculations in Wilson et al. (For example, for fold rankings this would be the total number of pairs within a fold.) | (39) | |
foldonly | A subset of the pair-set ‘all’ that only includes pairs between structures that are in the same PART but different sub-PART. (If PART is fold, then sub-PART is superfamily; if PART is superfamily, then sub-PART is family.) | (39) | ||
Amino acid a = | Ala, Cys, Asp, Glu, Phe, Gly, His, Ile, Lys, Leu, Met, Asn, Pro, Gln, Arg, Ser, Thr, Val, Trp, Tyr. | (31) | ||
Dataset p = | pdb100 | All structures within the fold (as defined by SCOP pdb100d). | (31) | |
pdb40 | Similar to pdb100 but now using a version of the PDB clustered at 40% similarity (as defined by SCOP pdb40d) | (31) | ||
Interaction type y = | pdball | Interactions for a PART are computed with all other PARTS in the PDB databank based on the distances between atoms in the coordinate files. Five or more contacts between atoms separated by <5 Å were considered a valid PART–PART contact. | (9,51,55) | |
pdba | A subset of ‘pdball’. Interactions for a PART are computed just with all-α proteins (SCOP class 1) in the PDB. | (9,51,55) | ||
pdbb | Similar to ‘pdba’ but now just with all-β proteins (SCOP class 2). | (9,51,55) | ||
pdbab | Similar to ‘pdba’ but now just with mixed helix-sheet proteins (SCOP class 3 and 4) | (9,51,55) | ||
scerall | Interactions for a PART are computed with all other PARTS based on the yeast two-hybrid experimental data. In particular, interactions between structural domains in the yeast genome were obtained by assigning protein structures to the yeast proteins. Structural domains contained within the same ORF that were within 30 amino acids were assumed to interact in an intramolecular fashion. To derive intermolecular interactions, we combined three sets of protein–protein interactions: (i) the MIPS web pages on complexes and pairwise interactions (February 2000) (9), (ii) the global yeast two-hybrid experiments by Uetz et al. (51) and (iii) large-scale yeast two-hybrid experiments by Ito et al. (52). Out of all these pairwise interactions known for yeast ORFs, there is a limited set in which both partners are completely covered by one structural domain (to within 100 residues). | (9,51,55) | ||
scera | A subset of ‘scerall’. Interactions for a PART are computed just with all-α proteins (SCOP class 1) in the yeast experiment. | (9,51,55) | ||
scerb | Similar to ‘scera’ but now just with all-β proteins (SCOP class 2). | (9,51,55) | ||
scerab | Similar to ‘scera’ but now just with mixed helix-sheet proteins (SCOP class 3 and 4). | (9,51,55) | ||
Interaction restriction c = | inter | The interaction must occur between PARTS in different chains. | (9,51,55) | |
intra | The interaction must occur between PARTS in the same chain. | (9,51,55) | ||
none | The union of ‘inter’ and ‘intra’. Interactions can occur in PARTS on the same or different chains. | (9,51,55) | ||
Motion statistic s = | nresidue | Number of residues. | (56,57) | |
maxcadev | Maximal displacement of a Cα atom, in Å, of any residue during the motion (after fitting on the first core). | (56,57) | ||
rmsoverall | Overall RMS of two structures after they are superimposed by a sieve-fit technique. Note that these values are larger than the traditionally used RMS. | (56,57) | ||
nhinges | Number of hinges involved in the motion. | (56,57) | ||
kappa | The rotation (in degrees) around the screw axis necessary to superimpose two domains of motion. | (56,57) | ||
transe | Transition energy of the motion (maximum energy less minimum energy over the motion) (in kcal/mol). | (56,57) | ||
deltae | Absolute value of energy difference between the ‘starting’ and ‘ending’ conformations of a motion (in kcal/mol). | (56,57) | ||
Motion dataset d = | goldstd | List of approximately 220 ‘gold-standard’ manually curated motions. | (56,57) | |
auto | List of approximately 4000 conformationally different proteins based on analyzing the SCOP database for similar proteins with large conformational differences (as measured by RMS) but close sequence similarity. | (56,57) | ||
Transposon condition b = | caff | YPD + 8 mM caffeine. | (58) | |
cyss | Cycloheximide hypersensitivity: YPD + 0.08 µg/ml cycloheximide at 30°C. | (58) | ||
wr | White/red colour on YPD. | (58) | ||
ypg | YPGlycerol. | (58) | ||
calcs | Calcofluor hypersensitivity: YPD + 12 µg/ml calcofluor at 30°C. | (58) | ||
hyg | YPD + 46 µg/ml hygromycin at 30°C. | (58) | ||
sds | YPD + 0.003% SDS. | (58) | ||
bens | Benomyl hypersensitivity: YPD + 10 µg/ml benomyl. | (58) | ||
bcip | YPD + 5-bromo-4-chloro-3-indolyl phosphate at 37°C | (58) | ||
mb | YPD + 0.001% methylene blue at 30°C. | (58) | ||
benr | Benomyl resistance: YPD + 20 µg/ml benomyl. | (58) | ||
ypd37 | YPD at 37°C. | (58) | ||
egta | YPD + 2 mM EGTA. | (58) | ||
mms | YPD + 0.008% MMS. | (58) | ||
hu | YPD + 75 mM hydroxyurea. | (58) | ||
ypd11 | YPD at 11°C. | (58) | ||
calcr | Calcofluor resistance: YPD + 0.3 µg/ml calcofluor at 30°C. | (58) | ||
cycr | Cycloheximide resistance: YPD + 0.3 µg/ml cycloheximide. | (58) | ||
hhig | Hyperhaploid invasive growth mutants. | (58) | ||
nacl | YPD + 0.9 M NaCl. | (58) | ||
Misc. quantities q = | pseu | Number of pseudogenes in worm genome matching a particular PART. | (59) | |
func | Total number of functions associated with this PART. (In this survey all non-enzyme functions were lumped into a single category.) | (60) | ||
enz | Total number of enzymatic functions associated with this PART. | (60) | ||
size | Average length of a PART in the pdb40d clustering of the PDB. | |||
age | The year in which the first structure belonging to the PART was determined. |
The formalism for specifying an attribute has two parts: an overall category, denoted by a single uppercase symbol, and some parameter choices, which are denoted by lower-case arguments to the first symbol. Some examples for folds will suffice to make this clear: G(aful) is genome occurrence of a particular fold in A.fulgidus; M(nhinges,goldstd) is the maximum value of the number of hinges statistic from surveying a set of motions in the gold-standard subset of the Macromolecular Motions Database, where this statistic is only calculated for the entries in the motions database that are associated with a particular fold; and I(pdball,inter) is the number of distinct types of protein-protein interactions found in a survey of the PDB, subject to the restriction that the interactions must be between folds on different chains.
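As an illustration of this notation, the following minimal Python sketch splits an attribute string such as M(nhinges,goldstd) into its category symbol and parameter choices and checks the parameters against the values listed in part B of the table. The function name parse_attribute and the dictionary CATEGORY_PARAMS are hypothetical names introduced here for illustration only; they are not part of PartsList, and only three categories with an abbreviated genome list are included.

```python
import re

# Hypothetical lookup of allowed parameter values, copied from Table 1B.
# Only three categories are shown and the genome list is abbreviated.
CATEGORY_PARAMS = {
    "G": [{"aful", "mjan", "scer", "cele", "ecol"}],
    "M": [
        {"nresidue", "maxcadev", "rmsoverall", "nhinges",
         "kappa", "transe", "deltae"},
        {"goldstd", "auto"},
    ],
    "I": [
        {"pdball", "pdba", "pdbb", "pdbab",
         "scerall", "scera", "scerb", "scerab"},
        {"inter", "intra", "none"},
    ],
}

# One uppercase category symbol followed by parenthesized arguments.
ATTRIBUTE_RE = re.compile(r"^([A-Z])\(([^()]*)\)$")


def parse_attribute(spec):
    """Return (category, parameters) for a string such as 'G(aful)'."""
    match = ATTRIBUTE_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"not an attribute specification: {spec!r}")
    category = match.group(1)
    args = tuple(arg.strip() for arg in match.group(2).split(","))
    allowed = CATEGORY_PARAMS.get(category)
    if allowed is None:
        raise ValueError(f"category {category!r} is not in this sketch")
    if len(args) != len(allowed):
        raise ValueError(f"{category} takes {len(allowed)} parameter(s)")
    for arg, choices in zip(args, allowed):
        if arg not in choices:
            raise ValueError(f"unknown parameter {arg!r} for {category}")
    return category, args


for example in ("G(aful)", "M(nhinges,goldstd)", "I(pdball,inter)"):
    print(example, "->", parse_attribute(example))
```

Keeping the allowed values in one position-ordered list of sets per category mirrors the table layout, where each lower-case argument has its own block of rows in part B.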