Author manuscript; available in PMC: 2015 Apr 24.
Published in final edited form as: Curr Protoc Hum Genet. 2014 Apr 24;81:7.23.1–7.23.21. doi: 10.1002/0471142905.hg0723s81

Figure 6. Principal component analysis (PCA) normalization.

This plot compares each principal component (PC) to known sample and target features (sample features can be added in step 2E of Basic Protocol 2). The dotted line (at PC = 15) indicates that XHMM automatically removed the first 15 components based on their significant relative variance. XHMM did not use the known sample and target features shown here when deciding which components to remove. The first 15 PCs tend to correlate with various target features (colored circles), such as GC content and the mean depth of sequencing coverage at that target, and with various sample features (colored diamonds), such as gender and the mean depth of sequencing for that sample. After roughly the first 15 PCs, by contrast, the quality of the components changes markedly: correlation with the genome-wide and batch effects expected to strongly bias the read depth of coverage drops off suddenly.
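The comparison shown in this plot can be approximated outside XHMM with a few lines of NumPy. The sketch below is not XHMM's implementation. It assumes a samples-by-targets read-depth matrix, takes the PCs from an SVD of the target-centered matrix, and reports the absolute Pearson correlation of each PC with one target feature and one sample feature. The function names, the synthetic data, and the cutoff rule in `count_high_variance_components` (variance above 0.7 times the mean variance) are assumptions made for illustration; consult the XHMM documentation for the criterion it actually applies.

```python
import numpy as np


def pc_feature_correlations(depth, sample_feature, target_feature, n_components):
    """Correlate the leading PCs of a samples x targets depth matrix with features.

    Hypothetical helper for illustration; not part of XHMM.
    """
    depth = np.asarray(depth, dtype=float)
    # Center each target (column) at its mean depth across samples.
    centered = depth - depth.mean(axis=0, keepdims=True)
    # Rows of vt are the PCs in target space; u * s gives per-sample scores.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (depth.shape[0] - 1)
    scores = u * s
    # Absolute Pearson correlation of each PC with the target feature ...
    target_corr = np.array(
        [abs(np.corrcoef(vt[k], target_feature)[0, 1]) for k in range(n_components)]
    )
    # ... and of each PC's per-sample scores with the sample feature.
    sample_corr = np.array(
        [abs(np.corrcoef(scores[:, k], sample_feature)[0, 1]) for k in range(n_components)]
    )
    return variances, target_corr, sample_corr


def count_high_variance_components(variances, factor=0.7):
    """Count components whose variance exceeds factor * mean variance.

    ASSUMPTION: both the rule and the default factor are illustrative only.
    """
    variances = np.asarray(variances, dtype=float)
    return int(np.sum(variances > factor * variances.mean()))


# Synthetic example: a per-sample batch effect whose strength scales with GC content.
rng = np.random.default_rng(0)
n_samples, n_targets = 40, 200
gc = rng.uniform(0.3, 0.7, n_targets)   # hypothetical target feature
batch = rng.normal(size=n_samples)      # hypothetical sample feature
depth = 100 + 50 * np.outer(batch, gc) + rng.normal(scale=1.0, size=(n_samples, n_targets))

variances, target_corr, sample_corr = pc_feature_correlations(depth, batch, gc, n_components=5)
n_flagged = count_high_variance_components(variances)
print("variance of first 5 PCs:", np.round(variances[:5], 2))
print("|corr| with GC content:", np.round(target_corr, 2))
print("|corr| with batch:", np.round(sample_corr, 2))
print("components above cutoff:", n_flagged)
```

In this synthetic setup the first component absorbs the simulated GC-dependent batch effect, so it correlates strongly with both features, mirroring the pattern the figure shows for the first 15 PCs of real data.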