2017 Dec;187(12):2895–2911. doi: 10.1016/j.ajpath.2017.08.006

Table 5.

Over-Representation Analysis of Gene Lists

Comparison	Set 1	Set 2	Overlap	Total	Representation factor	P value
SAMARA vs CATHGEN/PREDICT^∗	618	655	43	29,281^†	3.1	6.70 × 10⁻¹¹
CXCL5 vs CAD^‡	1041	681	253	18,425^§	6.6	6.01 × 10⁻¹⁴⁵

Two sets of genes and their overlap were compared, and the representation factor and the probability of finding an overlap of genes were calculated. The representation factor is the number of overlapping genes divided by the expected number of overlapping genes for two independent groups. A representation factor >1 indicates more overlap than expected of two independent groups, a representation factor <1 indicates less overlap than expected, and a representation factor of 1 indicates that the two groups overlap by the number of genes expected for independent groups of genes. The probability (P) of finding more than the number of identified overlapping genes was calculated using the hypergeometric probability formula.
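The calculation described in the legend can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the expected overlap is assumed to be (Set 1 × Set 2)/Total, and the tail is assumed to be inclusive, P(X ≥ overlap). The legend does not state which tail convention was used; a strict tail would start the sum at overlap + 1. Exact integer arithmetic is used so that very small probabilities do not underflow.

```python
from fractions import Fraction
from math import comb


def representation_factor(set1, set2, overlap, total):
    """Observed overlap divided by the overlap expected for independent sets."""
    expected = set1 * set2 / total  # assumed definition of the expected overlap
    return overlap / expected


def hypergeom_upper_tail(set1, set2, overlap, total):
    """P(X >= overlap) when set2 items are drawn from total, of which set1 are marked.

    Assumption: inclusive upper tail. Start the range at overlap + 1 for a strict tail.
    """
    upper = min(set1, set2)
    numerator = sum(
        comb(set1, k) * comb(total - set1, set2 - k)
        for k in range(overlap, upper + 1)
    )
    # Divide as exact integers first, then convert the ratio to a float.
    return float(Fraction(numerator, comb(total, set2)))


# Rows of Table 5: (Set 1, Set 2, Overlap, Total)
rows = {
    "SAMARA vs CATHGEN/PREDICT": (618, 655, 43, 29281),
    "CXCL5 vs CAD": (1041, 681, 253, 18425),
}

for name, (s1, s2, ov, tot) in rows.items():
    rf = representation_factor(s1, s2, ov, tot)
    p = hypergeom_upper_tail(s1, s2, ov, tot)
    print(f"{name}: representation factor = {rf:.1f}, P = {p:.2e}")
```

With the table's inputs, the representation factors round to 3.1 and 6.6, matching the tabulated values. The P values depend on the tail convention, so small differences from the table are possible.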

CAD, coronary artery disease; CATHGEN, CATHeterization GENetics; PREDICT, Personalized Risk Evaluation and Diagnosis in the Coronary Tree; SAMARA, Supporting A Multidisciplinary Approach to Researching Atherosclerosis.

^∗ Indicates gene-level comparison (unique genes only).

^† Indicates the number of unique genes (GenBank accessions) represented on the Agilent arrays used in the SAMARA, CATHGEN, and PREDICT studies.

^‡ Indicates probe-level comparison.

^§ Indicates the number of probes that passed quality control in the SAMARA microarray data set used to identify CXCL5- and CAD-associated gene expression.