Table 2. Summary of Results of Set Covering Algorithm for Simulated Datasets.
Dataset | Designated Risk Pattern | Covering Pattern | Coverage | OR | P |
D2 | G03 = 0.0782 & G05 = 0.8388 | G03 = 0.0782 | 30/33 (91%) | 2.33 | 2.59E-06 |
D2 | | None | 3/33 (9%) | | |
D3 | G01 = 0.783 & G02 = 0.2784 & G04 = 0.4529 & G06 = 0.7125 | G02 = 0.2784 & G04 = 0.4529 & G06 = 0.7125 | 9/96 (9%) | 1.98 | 1.06E-04 |
D3 | | G04 = 0.4529 | 56/96 (58%) | 1.39 | 3.98E-03 |
D3 | | G02 = 0.1919 & G07 = 0.3285 | 8/96 (8%) | 1.78 | 8.42E-03 |
D3 | | G02 = 0.2784 & G06 = 0.7125 | 14/96 (15%) | 1.38 | 1.37E-02 |
D3 | | G08 = 0.8821 | 7/96 (7%) | 1.56 | 3.84E-02 |
D3 | | None | 2/96 (2%) | | |
D4 | G03 = 0.0782 & G05 = 0.8388 | G03 = 0.0782 | 9/38 (24%) | 1.76 | 2.37E-03 |
D4 | G01 = 0.783 & G02 = 0.2784 & G04 = 0.4529 & G06 = 0.7125 | G02 = 0.2784 & G04 = 0.4529 | 24/38 (63%) | 1.44 | 1.29E-02 |
D4 | | None | 5/38 (13%) | | |
The set covering algorithm was run on the bi-cliques found in the three simulated datasets. The fraction of input patterns covered by each covering pattern is shown. In dataset D2, 30 of the 33 input patterns could be covered by the single pattern G03 = 0.0782. This is consistent with the data in Table 1, where the common thread of G03 was seen in all eight top patterns. The number of interesting patterns in D2 has thus been reduced from 30 to 1. Dataset D3 has a more complex risk pattern (four genes), and five covering patterns were needed to cover 94 of the 96 bi-cliques found in D3. Note that the first cover (three genes, P≈0.0001) could itself be covered by the second cover (one gene, P≈0.0040) or the fourth cover (two genes, P≈0.0137). However, the cost model (Appendix S1, Step 5) determined that the difference in P values was too large to justify generalizing the three-gene cover to a more parsimonious, but less significant, one- or two-gene cover. Dataset D4, which combines risk from both the D2 and D3 patterns in the same population, is covered by two simpler patterns. The first D4 cover is the same as the D2 cover; the second is a simpler version of the top D3 cover. This slight difference is not unexpected since, for reasons discussed in the text and Appendix S3, the odds ratios and P values in the heterogeneous population D4 differ from those in the homogeneous populations D2 and D3.
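The covering relation described in this caption can be illustrated with a minimal Python sketch. It assumes that one pattern covers another when every gene-value term of the covering pattern also appears in the covered pattern (a subset test). This assumption is consistent with the D3 example above, but it is not the cost model of Appendix S1, which is not reproduced here. The helper names `parse_pattern` and `covers` are hypothetical and introduced only for illustration.

```python
# Illustrative sketch only: the subset rule below is an assumption,
# not the authors' cost model (Appendix S1, Step 5).

def parse_pattern(text):
    """Parse 'G02 = 0.2784 & G04 = 0.4529' into a frozenset of (gene, value) pairs."""
    terms = []
    for term in text.split("&"):
        gene, value = term.split("=")
        # Values are kept as strings so that comparisons are exact.
        terms.append((gene.strip(), value.strip()))
    return frozenset(terms)

def covers(general, specific):
    """Assumed rule: 'general' covers 'specific' if all of its terms occur in 'specific'."""
    return general <= specific

# The five D3 covering patterns and their coverage counts, copied from Table 2.
d3_covers = [
    "G02 = 0.2784 & G04 = 0.4529 & G06 = 0.7125",
    "G04 = 0.4529",
    "G02 = 0.1919 & G07 = 0.3285",
    "G02 = 0.2784 & G06 = 0.7125",
    "G08 = 0.8821",
]
d3_counts = [9, 56, 8, 14, 7]

patterns = [parse_pattern(p) for p in d3_covers]
first = patterns[0]

# Which other covers (numbered from 1) could cover the first, three-gene cover?
covering_first = [
    i + 1 for i, p in enumerate(patterns) if p != first and covers(p, first)
]
print(covering_first)   # [2, 4]
print(sum(d3_counts))   # 94 of the 96 D3 bi-cliques are covered
```

Under this assumed rule, the second and fourth covers are the only ones that generalize the first, which matches the caption. Deciding whether such a generalization is accepted still requires the P value comparison performed by the cost model.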