The diagram shows the steps of the procedure used to evaluate mutation probabilities and the data flow leading to the identification of candidate recessive cancer genes. Molecular data were extracted from public databases (dbEST and GEO at NCBI, and the Stanford Microarray Database). BLAST analysis of 3 Gbases of EST sequences yielded over 4.5 million alignments covering more than 24,000 human genes. The alignments were parsed to extract mismatches, which were stored in the local Cancer Mutome SQL database. The mismatches were then evaluated by dedicated procedures to assign mutational p-values to each human gene. In parallel, almost 20,000 human genes were assayed in 744 array CGH experiments to define their propensity to deletion in cancer. The specific mutational p-values were combined to produce a recessive cancer p-value. A subset of 154 genes, including TP53, PTEN, CDKN2A and CDKN2B, was selected (cancer p-value < 1.5×10⁻⁷).
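The caption does not state how mismatches were scored or how the per-gene p-values were combined, so the following is only a minimal sketch of the general shape of such a pipeline, under stated assumptions. It shows (a) extracting substitutions from a gapped pairwise alignment, as one might when parsing BLAST output, and (b) combining several independent p-values per gene with Fisher's method, then keeping genes below the 1.5×10⁻⁷ cutoff quoted above. Fisher's method is a swapped-in illustrative technique, not necessarily the authors' procedure, and all function names (`extract_mismatches`, `fisher_combine`, `select_candidates`) and gene labels are hypothetical.

```python
import math

# Cutoff quoted in the caption; everything else in this sketch is hypothetical.
CANCER_P_THRESHOLD = 1.5e-7


def extract_mismatches(ref_aln, est_aln, ref_start=1):
    """Return (ref_position, ref_base, est_base) for each substitution in a
    gapped pairwise alignment. Columns containing a gap ('-') are skipped,
    and the reference coordinate advances only on non-gap reference bases."""
    if len(ref_aln) != len(est_aln):
        raise ValueError("aligned strings must have equal length")
    mismatches = []
    pos = ref_start
    for r, e in zip(ref_aln.upper(), est_aln.upper()):
        if r != "-" and e != "-" and r != e:
            mismatches.append((pos, r, e))
        if r != "-":
            pos += 1
    return mismatches


def fisher_combine(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method
    (an assumption, used here only for illustration).
    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom; for even degrees of freedom its survival function is
    exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    half_x = -sum(math.log(p) for p in p_values)  # this is X/2
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_x / j  # (X/2)^j / j!, built incrementally
        total += term
    return math.exp(-half_x) * total


def select_candidates(gene_pvalues, threshold=CANCER_P_THRESHOLD):
    """Return genes whose combined p-value is below the threshold,
    ordered from most to least significant."""
    combined = {g: fisher_combine(ps) for g, ps in gene_pvalues.items()}
    return sorted((g for g, p in combined.items() if p < threshold),
                  key=lambda g: combined[g])


if __name__ == "__main__":
    # Toy, made-up inputs purely to show the call pattern.
    print(extract_mismatches("ACGT-A", "ACTTGA"))
    print(select_candidates({"GENE_A": [1e-5, 1e-4], "GENE_B": [0.2, 0.6]}))
```

With these toy values, `extract_mismatches` reports the single G→T substitution at reference position 3, and only the hypothetical `GENE_A` (combined p ≈ 2.2×10⁻⁸) passes the cutoff. The closed-form chi-square tail avoids a SciPy dependency; for very small p-values a log-space computation would be more robust against floating-point underflow.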