Table 1. Relationships among top-ranking Bi-cliques from Simulated Dataset D2.
Rank | G01 = 0.783 | G03 = 0.0782 | G05 = 0.8388 | G08 = 0.8821 | Fisher's Exact Test P-value |
1 | T | T | 5.59×10⁻⁷ | ||
2 | S | S | S | 7.55×10⁻⁷ | |
3 | T | T | T | 1.07×10⁻⁶ | |
4 | S | S | S | S | 1.72×10⁻⁶ |
5 | T | T | 2.11×10⁻⁶ | ||
6 | T | 2.59×10⁻⁶ | |||
7 | S | S | S | 2.62×10⁻⁶ | |
8 | R | R | 3.16×10⁻⁶ |
Genes are labeled with the frequencies used to simulate the dataset. The designated high-risk pattern, marked R, is ranked 8th. Some specializations of R, marked S, are also high risk: bi-cliques ranked 2, 4, and 7 are specific instances of bi-clique 8 and include 78%, 69%, and 88%, respectively, of the individuals in bi-clique 8. All confer an approximately two-fold enhanced risk of disease. These patterns all contain the rare allele (7.8%) of G03, plus common alleles of G01, G05, and G08. Thus, the chance that an individual with G03 = 0.0782 has the designated genotype pattern is 84%, regardless of the genotypes at the other loci; this figure matches the simulated frequency of the G05 allele (0.8388). Stated differently, 84% of the individuals in bi-cliques 1, 3, 5, and 6 have the simulated combination of risk-conferring alleles. G03 is the single gene selected by our set covering algorithm as the most parsimonious description of all the significant risky patterns. Note that patterns containing G03 but not G05, marked T, combine G03 with very common genes. This makes the population at risk from these patterns a large subset of the population described by G03 alone. Similar effects are seen in datasets D3 and D4.
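The percentages quoted above can be checked with a short calculation. The following Python sketch is illustrative only and rests on two assumptions that the table note does not state outright: that the loci were simulated independently, and that the designated pattern R consists of the G03 and G05 alleles. The function names `pattern_frequency` and `overlap` are hypothetical helpers introduced for this example.

```python
from math import prod

# Allele frequencies copied from the column headers of Table 1.
FREQ = {"G01": 0.783, "G03": 0.0782, "G05": 0.8388, "G08": 0.8821}


def pattern_frequency(genes):
    """Expected fraction of individuals carrying every allele in `genes`.

    Assumes the loci are independent (an assumption of this sketch).
    """
    return prod(FREQ[g] for g in set(genes))


def overlap(general, specific):
    """Expected fraction of carriers of `general` who also carry `specific`.

    `specific` must contain every gene in `general`.
    """
    general, specific = set(general), set(specific)
    if not general <= specific:
        raise ValueError("specific pattern must contain the general pattern")
    return pattern_frequency(specific) / pattern_frequency(general)


# Assumed composition of the designated high-risk pattern R.
R = ["G03", "G05"]

# Chance of carrying R given the rare G03 allele.
print(round(overlap(["G03"], R), 2))             # 0.84

# Share of R carriers retained by each specialization of R.
print(round(overlap(R, R + ["G01"]), 2))         # 0.78
print(round(overlap(R, R + ["G01", "G08"]), 2))  # 0.69
print(round(overlap(R, R + ["G08"]), 2))         # 0.88
```

Under these assumptions the sketch gives 0.84 for the chance of carrying R given G03, and 0.78, 0.69, and 0.88 for the three specializations, which agrees with the 84%, 78%, 69%, and 88% reported in the note.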