Supporting information for Virtanen et al. (2002) Proc. Natl. Acad. Sci. USA, 10.1073/pnas.192240599

 

Supporting Materials and Methods

Binary Search Algorithm.

We devised an algorithm based on the comparison of binary strings to determine which genes were over- or underexpressed in tumors relative to the cell lines. First, for each gene, the mean expression value was calculated across all experiments. Then, for each individual experiment, the expression value was compared with this mean and flagged as 1 if greater than the mean or 0 if less. This generated a string of ones and zeros that served as a binary representation of the gene across the experiments. The string was then compared with a binary "ideal state" string in which all tumors were flagged with ones and all cell lines with zeros. It was also compared with the opposite of this ideal state string (cell lines with ones and tumors with zeros). If the gene's string matched either ideal string at more than 66% of the positions, the gene was flagged as over- or underexpressed, respectively, in tumors relative to cell lines. We also tried two variations of this technique: one in which the binary representation was taken relative to an expression value of 1 rather than the gene's mean, and one in which t tests were used to determine the significance of over- and underexpression (i.e., a flag of 0 or 1 was given only if the over- or underexpression was significant). The binary search algorithm performed best in its initial formulation (data not shown). We also used this algorithm to search for other expression patterns, for example, genes that were overexpressed in small cell carcinomas (both fresh tumors and cell lines) relative to other carcinomas.
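
The procedure described above can be illustrated with a short Python sketch. This is not the original implementation, only a minimal reconstruction from the text. The function names (binary_string, match_fraction, classify_gene) and the is_tumor flag list are hypothetical. Two points are assumptions because the text does not settle them: an expression value exactly equal to the mean is flagged as 0, and "greater than 66%" is applied as a strict comparison on the fraction of matching positions.

```python
from typing import List, Optional, Sequence


def binary_string(values: Sequence[float]) -> List[int]:
    """Flag each experiment as 1 if above the gene's mean, else 0.

    Assumption: a value exactly equal to the mean is flagged 0;
    the source text does not specify how ties are handled.
    """
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]


def match_fraction(bits: Sequence[int], ideal: Sequence[int]) -> float:
    """Fraction of positions at which the two binary strings agree."""
    matches = sum(1 for b, i in zip(bits, ideal) if b == i)
    return matches / len(ideal)


def classify_gene(
    values: Sequence[float],
    is_tumor: Sequence[bool],
    threshold: float = 0.66,  # the 66% cutoff from the text, as a fraction
) -> Optional[str]:
    """Return "over", "under", or None for one gene.

    values   : expression values, one per experiment
    is_tumor : True for tumor experiments, False for cell lines
               (hypothetical encoding of the sample labels)
    """
    bits = binary_string(values)
    # Ideal state: tumors flagged with ones, cell lines with zeros.
    ideal_over = [1 if t else 0 for t in is_tumor]
    # Opposite ideal state: cell lines with ones, tumors with zeros.
    ideal_under = [1 - x for x in ideal_over]

    if match_fraction(bits, ideal_over) > threshold:
        return "over"
    if match_fraction(bits, ideal_under) > threshold:
        return "under"
    return None


# Three tumors followed by three cell lines.
labels = [True, True, True, False, False, False]
print(classify_gene([5, 6, 7, 1, 2, 1], labels))  # over
print(classify_gene([1, 1, 2, 6, 7, 5], labels))  # under
```

Because the two ideal strings are exact complements, the two match fractions always sum to 1, so a gene can never exceed the threshold for both over- and underexpression at once.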