Table 3. Results of the linear model log₂OR ~ GC content + TFBS frequency + GC content × TFBS frequency + ε.
| Dataset | Region | Variable | Estimate | Standard error | P-value | SSR |
|---|---|---|---|---|---|---|
| Human | Enhancer (n = 4321, R² = 0.127) | GC content | 0.56 | 0.024 | < 1×10⁻¹⁵ | 0.11 |
| | | TFBS frequency | 0.44 | 0.11 | 0.00012 | 0.003 |
| | | GC × TFBS | NS | | | |
| | Promoter (n = 1342, R² = 0.360) | GC content | 6.40 | 0.30 | < 1×10⁻¹⁵ | 0.21 |
| | | TFBS frequency | 22.65 | 2.74 | < 1×10⁻¹⁵ | 0.03 |
| | | GC × TFBS | −41.84 | 4.61 | < 1×10⁻¹⁵ | 0.04 |
| Mouse | Enhancer (n = 4423, R² = 0.0287) | GC content | 0.25 | 0.026 | < 1×10⁻¹⁵ | 0.020 |
| | | TFBS frequency | 0.87 | 0.097 | < 1×10⁻¹⁵ | 0.018 |
| | | GC × TFBS | NS | | | |
| | Promoter (n = 1615, R² = 0.372) | GC content | 5.23 | 0.23 | < 1×10⁻¹⁵ | 0.21 |
| | | TFBS frequency | 18.79 | 1.89 | < 1×10⁻¹⁵ | 0.038 |
| | | GC × TFBS | −37.82 | 3.09 | < 1×10⁻¹⁵ | 0.059 |
We used LASSO-selected species sequence determinants for these analyses. NS indicates that the interaction term was not statistically significant at P = 0.05; in those cases we fitted the reduced model log₂OR ~ GC content + TFBS frequency + ε instead of the full model. The number of sequence determinants (n), the R² of each model, and the Type III partial sum of squares in regression (SSR) for each variable are also provided.
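The fitting procedure described in the note can be illustrated with a minimal sketch. It is not the authors' code: the function names (`fit_ols`, `fit_table_model`), the simulated data, and the use of a large-sample normal approximation for the P-values are all assumptions made for illustration. The partial sum of squares is computed as the increase in residual sum of squares when a single term is removed from the fitted model, which is how a Type III partial sum of squares is usually defined; any scaling applied to the SSR column of the table is not reproduced here.

```python
import math
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares: coefficients, standard errors, residual SS."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    dof = X.shape[0] - X.shape[1]
    cov = rss / dof * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov)), rss


def normal_p(t):
    """Two-sided P-value from a normal approximation (assumption: large n)."""
    return math.erfc(abs(t) / math.sqrt(2))


def fit_table_model(gc, tfbs, log2or, alpha=0.05):
    """Fit the full model; refit without the interaction if it is not significant."""
    names = ["Intercept", "GC content", "TFBS frequency", "GC x TFBS"]
    X = np.column_stack([np.ones_like(gc), gc, tfbs, gc * tfbs])
    beta, se, rss = fit_ols(X, log2or)
    if normal_p(beta[3] / se[3]) >= alpha:
        # Interaction is "NS": fall back to the additive model.
        names, X = names[:3], X[:, :3]
        beta, se, rss = fit_ols(X, log2or)

    rows = {}
    for j in range(1, len(names)):
        # Partial SS: increase in residual SS when only this term is dropped.
        _, _, rss_without = fit_ols(np.delete(X, j, axis=1), log2or)
        rows[names[j]] = (beta[j], se[j], normal_p(beta[j] / se[j]),
                          rss_without - rss)

    tss = float(((log2or - log2or.mean()) ** 2).sum())
    return rows, 1.0 - rss / tss


# Simulated data (hypothetical, not from the study) with a strong interaction.
rng = np.random.default_rng(0)
gc = rng.uniform(0.3, 0.7, 2000)
tfbs = rng.uniform(0.0, 0.2, 2000)
log2or = 6 * gc + 20 * tfbs - 40 * gc * tfbs + rng.normal(0.0, 0.1, 2000)

rows, r2 = fit_table_model(gc, tfbs, log2or)
for name, (est, se, p, ss) in rows.items():
    print(f"{name}: estimate={est:.2f}, SE={se:.3f}, P={p:.2g}, partial SS={ss:.3f}")
print(f"R2 = {r2:.3f}")
```

With several thousand observations per model, as in the table, the normal approximation is close to the exact t-test; for small samples an exact t or F distribution would be the safer choice.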