Table 2. List of datasets 23.
ID | Description | n (individuals) | p (SNP markers) |
---|---|---|---|
simdata | Simulated dataset used to explore GP characteristics of trait genetic complexity, population properties and dimensionality. | See Methods section 2.1.1 for details. | |
Wheat | Real wheat dataset from Norman, Taylor 24 containing 13 traits of varying genetic complexity. Trait abbreviations: BM: Biomass, PH: Plant Height, NDVI: Normalised Difference Vegetation Index, LL: Leaf Loss, LW: Leaf Width, GY: Grain Yield, GL: Glaucousness, GP: Grain Protein, Y: Physiological Yellows, TW: Test Weight of grains, TKW: Thousand Kernel Weight, GH: Growth Habit, GR: Greenness | 10,375 | 17,181 |
STRUCT-simdata | Real structured RegMap panel genotype data of Arabidopsis thaliana with simulated phenotype data, used to analyse the effect of population structure | 1,307 | 15,662 |
STRUCT-realdata | A subset of the real Arabidopsis thaliana structured RegMap panel genotype data with real phenotype data for the sodium accumulation trait, used to analyse the effect of population structure | 300 | 169,881 |
LD-simdata | An unstructured set of accessions from the core set of the Arabidopsis thaliana HapMap population, with known genotype data and simulated phenotype data, used to study the impact of LD | 344 | 48,343 |
LD-soy | Real soybean dataset with real phenotypes (R8, HT: height, YLD: yield) for studying the impact of low SNP-QTN LD 32 | 5,014 | 4,235 |
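The n and p columns together set the dimensionality of each genomic prediction problem. As an illustrative sketch only, and not part of the original analysis, the snippet computes the markers-per-individual ratio p/n for each dataset with fixed dimensions, using the values listed in Table 2. The `DATASETS` dictionary and the `markers_per_individual` helper are hypothetical names. simdata is omitted because its dimensions are set by the simulation design (see Methods section 2.1.1).

```python
# Hypothetical summary of Table 2: dataset ID -> (n individuals, p SNP markers).
# simdata is left out because its n and p vary by design (Methods section 2.1.1).
DATASETS = {
    "Wheat": (10_375, 17_181),
    "STRUCT-simdata": (1_307, 15_662),
    "STRUCT-realdata": (300, 169_881),
    "LD-simdata": (344, 48_343),
    "LD-soy": (5_014, 4_235),
}


def markers_per_individual(n: int, p: int) -> float:
    """Return p/n; a value above 1 means there are more markers than individuals."""
    return p / n


if __name__ == "__main__":
    for name, (n, p) in DATASETS.items():
        ratio = markers_per_individual(n, p)
        regime = "p > n" if p > n else "p <= n"
        print(f"{name:16s} n={n:>7,} p={p:>8,} p/n={ratio:8.2f} ({regime})")
```

With these figures, every dataset except LD-soy has more markers than individuals, and STRUCT-realdata has the largest p/n ratio.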