Toy example of how RF detects important SNPs. (A) Toy data set of six individuals from six populations with different mean annual temperatures, genotyped at two loci. (B) At the first node, RF splits the individuals according to their genotype at SNP A, which yields two child nodes that are fairly homogeneous in temperature. Splitting on SNP B at the first node would have produced more heterogeneous child nodes, so RF favors SNP A there. Although SNP B has poor splitting power at the first node, it has good splitting power at the second node. Permuting the values of either SNP A or SNP B would increase heterogeneity (variance) in the child nodes, which means that both SNPs are important in the model. In this example, SNP A would thus have a main effect, whereas the effect of SNP B would depend on the genotype at SNP A, suggesting an interaction. A regression analysis would identify SNP A, but not SNP B, as associated with temperature, even though SNP B might still be important in adaptation to temperature through an interaction with another SNP. Note that in the actual RF analysis, which uses many more individuals, nodes containing five or fewer individuals are not split and are thus considered terminal.
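The splitting logic described in this caption can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the genotypes and temperatures below are hypothetical values chosen to mimic the pattern described (they are not the values shown in the figure), the helper names `sse` and `split_gain` are invented for this sketch, and a single greedy split on the decrease in within-node sum of squares stands in for a full RF, which would also use bootstrap samples and random subsets of SNPs.

```python
# Hypothetical toy data (NOT the values from the figure):
# each tuple is (genotype at SNP A, genotype at SNP B, mean annual temperature).
individuals = [
    (0, 0, 5.0),
    (0, 0, 6.0),
    (0, 2, 9.0),
    (2, 2, 15.0),
    (2, 0, 19.0),
    (2, 0, 20.0),
]
SNP_INDEX = {"A": 0, "B": 1}


def sse(rows):
    """Sum of squared deviations of temperature from the node mean."""
    if not rows:
        return 0.0
    temps = [r[2] for r in rows]
    mean = sum(temps) / len(temps)
    return sum((t - mean) ** 2 for t in temps)


def split_gain(rows, snp):
    """Decrease in heterogeneity when a node is split on one SNP (genotype 0 vs. other)."""
    i = SNP_INDEX[snp]
    left = [r for r in rows if r[i] == 0]
    right = [r for r in rows if r[i] != 0]
    return sse(rows) - (sse(left) + sse(right))


# First node: compare both SNPs on all six individuals.
root_gain = {snp: split_gain(individuals, snp) for snp in SNP_INDEX}

# Second node: the child containing the individuals with genotype 0 at SNP A.
left_child = [r for r in individuals if r[0] == 0]
child_gain = {snp: split_gain(left_child, snp) for snp in SNP_INDEX}

print("gain at first node:", root_gain)
print("gain at second node:", child_gain)
```

With these assumed values, SNP A gives by far the larger gain at the first node, while SNP B only becomes informative once the individuals have been separated by their genotype at SNP A, which is the interaction pattern the caption describes.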