Figure 1.
Illustration of the ChIP-seq protocols in use and the generation of spurious sites. (A) A ChIP-seq protocol can consist of IP, DNA input and mock IP experiments. For simplicity, the three open chromatin regions are assumed to be equally sensitive to sonication. In the IP, the peak of reads at region 1 arises mainly from specific interactions between antibody 1 and the antigen (triangle) of the target TF (in red). The peaks at regions 2 and 3 arise from nonspecific interactions between the antibody and regulatory proteins at those regions. In the mock IP, we avoid specific antigen-antibody reactions by using a different antibody (2) that does not bind specifically to any DNA-binding protein in the sample. The resulting three peaks of reads are therefore due to nonspecific interactions between antibody 2 and other DNA-binding proteins. In this hypothetical example, a peak caller that compares the three IP peaks with those from the DNA input reports binding peaks at regions 1 and 2. Because the target TF does not bind at region 2, the peak detected there is spurious, caused by strong nonspecific interactions at that region. Using the mock IP as the control, the peak caller identifies only the genuine binding peak at region 1. (B) For worm and fly samples, because a GFP tag is used, we can remove the antigen to avoid antibody-antigen reactions. The mock IP for a worm or fly sample therefore uses the same GFP antibody as its IP. Because no antigen is present in the mock IP sample, the observed peaks of reads are also due to nonspecific interactions. A DNA input control is also generated for the worm or fly sample. Peaks identified from the mock IP using the DNA input as the control are all spurious, owing to the lack of specific interactions.
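The effect of the control choice described in the caption can be sketched with a toy fold-enrichment rule. This is not the peak caller used in the study. The read counts, the 3-fold threshold, the pseudocount and the function name `call_peaks` are all hypothetical and were chosen only to reproduce the qualitative outcome of the figure. The DNA input is flat because all three regions are assumed to be equally sensitive to sonication.

```python
from typing import Dict, List

# Hypothetical read counts per region (illustrative only, not real data).
# Flat DNA input: the three regions are assumed equally sensitive to sonication.
IP = {"region1": 60, "region2": 40, "region3": 15}       # specific + nonspecific signal
INPUT = {"region1": 10, "region2": 10, "region3": 10}    # background only
MOCK_IP = {"region1": 15, "region2": 35, "region3": 12}  # nonspecific signal only


def call_peaks(treatment: Dict[str, int], control: Dict[str, int],
               min_fold: float = 3.0, pseudocount: float = 1.0) -> List[str]:
    """Hypothetical toy caller: report regions whose treatment/control
    fold enrichment (with a pseudocount to avoid division by zero)
    reaches min_fold."""
    peaks = []
    for region in sorted(treatment):
        fold = (treatment[region] + pseudocount) / (control[region] + pseudocount)
        if fold >= min_fold:
            peaks.append(region)
    return peaks


if __name__ == "__main__":
    # IP vs DNA input: region 2's strong nonspecific signal yields a spurious peak.
    print("IP vs DNA input:", call_peaks(IP, INPUT))
    # IP vs mock IP: nonspecific signal cancels, leaving only region 1.
    print("IP vs mock IP:", call_peaks(IP, MOCK_IP))
    # Mock IP vs DNA input (panel B): any peak called here is spurious.
    print("Mock IP vs DNA input:", call_peaks(MOCK_IP, INPUT))
```

With these made-up numbers, the input control yields peaks at regions 1 and 2, the mock IP control yields only region 1, and calling peaks on the mock IP against the input flags region 2 as a spurious site.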