Figure 1.
Graphical visualization of the combined HaploBlock and backcross approach presented in the current study. The figure illustrates the main steps of the process with an example from the real data of five families, each represented by seven individuals, two HaploBlocks (HBs) and one individual SNP on linkage group 1. Genotype codes presented here follow the format of JoinMap v3 and later versions for the cross-pollinated (CP) segregation types (Segr), where <lmxll> refers to a maternal marker with genotypes lm and ll, <nnxnp> to a paternal marker with genotypes nn and np, and <hkxhk> to a bi-parental marker with genotypes hh, hk and kk (see https://www.kyazma.nl/docs/JM4manual.pdf, Table 4). These three segregation types are highlighted with different colors: red for markers segregating only in the mother, blue for markers segregating only in the father, and green for those segregating in both parents; missing data (−−) and initially non-informative codes (hk) are not highlighted. (a) The use of the HB strategy allowed the identification of stable sets of SNP markers, such as those composing HB_1430 and HB_902, which consist of 6 and 10 SNPs, respectively. These SNPs do not segregate in all families (the only exception is F_0420898_L1_PA), thus leading to a considerable amount of missing information (62% of data points). (b) The genotypic information of the co-segregating SNPs is aggregated to form a single HB marker across families, and the bi-parental allelic contribution is split to form two distinct single-parent data sets, where the phase of the new ‘single parent’ HB markers is adjusted accordingly. (c) The two complete single-parent data sets are subsequently converted into a backcross (BC) design and combined to form a single population with twice as many individuals as the initial CP populations.
The presented strategy permits almost complete exploitation of the available segregation information (losing only some information from the rare recombination events within a HB) while considerably reducing the amount of missing information: in this example, from 76% for the initial CP data sets of the two HBs to 28% in the final single BC population. For the single SNP, the amount of missing data by definition did not change throughout the process and remained 66%. This approach of data aggregation and mating-type conversion was implemented in the software Haplotype Aggregator (HapAg, http://www.wageningenur.nl/en/show/HaploblockAggregator.htm), whose manual describes the process in more detail.
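The recoding behind steps (b) and (c) can be illustrated for a single marker with the minimal Python sketch below. It is not the HapAg implementation: the function names, the backcross-style codes `a` and `h`, the missing-data symbol `-`, and the arbitrary phase choice for <hkxhk> markers (h allele coded as `a`, k allele as `h`) are all assumptions made for illustration. The sketch covers neither the aggregation of SNPs into an HB marker nor the phase adjustment across families, which is where the reduction in missing data reported above comes from.

```python
MISSING = "-"  # assumed symbol for missing / non-informative data


def split_cp_genotype(segr, genotype):
    """Split one CP genotype into (maternal_code, paternal_code).

    Codes 'a' and 'h' are illustrative backcross-style codes, not
    necessarily those written by HapAg.
    """
    if segr == "<lmxll>":
        # Only the mother segregates; the father is uninformative.
        table = {"ll": "a", "lm": "h"}
        return table.get(genotype, MISSING), MISSING
    if segr == "<nnxnp>":
        # Only the father segregates; the mother is uninformative.
        table = {"nn": "a", "np": "h"}
        return MISSING, table.get(genotype, MISSING)
    if segr == "<hkxhk>":
        # Homozygotes reveal which allele each parent transmitted.
        # Phase is assumed here: h allele -> 'a', k allele -> 'h'.
        if genotype == "hh":
            return "a", "a"
        if genotype == "kk":
            return "h", "h"
        # 'hk' cannot be assigned to a parent from this marker alone.
        return MISSING, MISSING
    raise ValueError(f"unsupported segregation type: {segr}")


def to_backcross_population(segr, genotypes):
    """Stack the maternal and paternal data sets into one population.

    The result has twice as many entries as the input: first the
    maternal codes of all individuals, then the paternal codes.
    """
    maternal, paternal = [], []
    for g in genotypes:
        m, p = split_cp_genotype(segr, g)
        maternal.append(m)
        paternal.append(p)
    return maternal + paternal


def missing_fraction(codes):
    """Fraction of entries that are missing."""
    return sum(c == MISSING for c in codes) / len(codes)
```

For example, `to_backcross_population("<hkxhk>", ["hh", "hk", "kk"])` yields six entries, three for the maternal and three for the paternal data set, with the `hk` individual left uninformative in both.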