Large-scale genotyping and phenotyping of a worldwide winter wheat genebank for its use in pre-breeding

Albert W Schulthess; Sandip M Kale; Yusheng Zhao; Abhishek Gogna; Maximilian Rembe; Norman Philipp; Fang Liu; Ulrike Beukert; Albrecht Serfling; Axel Himmelbach; Markus Oppermann; Stephan Weise; Philipp H G Boeven; Johannes Schacht; C Friedrich H Longin; Sonja Kollers; Nina Pfeiffer; Viktor Korzun; Anne Fiebig; Danuta Schüler; Matthias Lange; Uwe Scholz; Nils Stein; Martin Mascher; Jochen C Reif

doi:10.1038/s41597-022-01891-5

2022 Dec 26;9:784. doi: 10.1038/s41597-022-01891-5

PMCID: PMC9792552 PMID: 36572688

Abstract

Plant genetic resources (PGR) stored at genebanks are humanity’s crop diversity savings for the future. Information on PGR contrasted with modern cultivars is key to selecting PGR parents for pre-breeding. Genotyping-by-sequencing was performed for 7,745 winter wheat PGR samples from the German Federal ex situ genebank at IPK Gatersleben and for 325 modern cultivars. Whole-genome shotgun sequencing was carried out for 446 diverse PGR samples and 322 modern cultivars and lines. In 19 field trials, 7,683 PGR and 232 elite cultivars were characterized for resistance to yellow rust - one of the major threats to wheat worldwide. Yield breeding values of 707 PGR were estimated using hybrid crosses with 36 cultivars - an approach that reduces the lack of agronomic adaptation of PGR and provides better estimates of their contribution to yield breeding. Cross-validations support the interoperability between genomic and phenotypic data. The data presented here are a stepping stone to unlock the functional variation of PGR for European pre-breeding and are the basis for future breeding and research activities.

Subject terms: Plant breeding, Phylogenomics, Agricultural genetics

Measurement(s)	genome sequence • wheat stripe rust disease resistance • grain yield trait
Technology Type(s)	genotyping-by-sequencing • whole genome shotgun sequencing • Manual scoring in natural and artificially infected fields • Combine harvester
Factor Type(s)	genotype • location
Sample Characteristic - Organism	Triticum aestivum
Sample Characteristic - Environment	agricultural field
Sample Characteristic - Location	Germany

Background & Summary

Common wheat (Triticum aestivum L.) is one of the ‘big three’ crops supplying most of the calories for the world population¹. Urban expansion at the expense of agricultural areas^2–4, climate change^4–6, environmental pollution and agroecosystem degradation^3,4,6 threaten future food security. Furthermore, wheat yield improvements have stagnated significantly during the last decade⁶, which can be attributed to eroded diversity in elite breeding^7–9. Historically, we have learned the ‘hard way’ that a narrow genetic basis in cultivated plants increases the risk of crop pandemics^10–12. For instance, Puccinia striiformis f. sp. tritici, the causal agent of yellow rust (YR), causes severe yield losses¹³, and its recent pandemics broke down several resistance genes (Yr) that are widely deployed in modern wheat cultivars^10,14–16. Pandemic races ‘PstS7’ and ‘PstS8’, a.k.a. ‘Warrior’ and ‘Kranich’, respectively, originated in the near-Himalayan region^10,15 and spread worldwide during the last decade^10,15,16. During 2020, ‘Warrior’ and ‘Kranich’ lost importance in Europe, where ‘PstS10’ was the most prevalent race group¹⁶, illustrating how fast YR populations can evolve in the field. Paving the way towards a more sustainable agriculture, the management and diversification of resistance mechanisms¹⁷ will be the ‘main weapon’ to confront the increased risk of YR infection expected for Europe and other temperate regions under climate change¹⁸.

The decreased genetic diversity of cultivated wheat could be boosted by rescuing the abandoned or unexplored diversity treasure hidden in the ~800 K wheat plant genetic resources (PGR) stored at genebanks worldwide^19,20. In this respect, the Nagoya Protocol of the Convention on Biological Diversity was conceived to promote the use and equitable sharing of benefits from PGR for sustainable development (https://www.cbd.int/abs/about/). In addition, genebank genomics has already demonstrated how ‘molecular passports’ create value in genebank management by providing precise knowledge that goes beyond the boundaries of classical descriptors^20,21. Nevertheless, due to the scarcity of agriculture-relevant information available for PGR, breeders often end up randomly choosing PGR as parents in crosses: an untargeted approach with low return on investment²². Furthermore, the value of PGR for agriculture and breeding is always relative to what can be found in modern cultivars. On the one hand, PGR characterizations should be put into this context by doing side-by-side comparisons between PGR and the modern gene pool^23,24. On the other hand, PGR-versus-cultivar comparisons are biased for complex traits such as yield, where PGR are at a disadvantage owing to their general lack of agronomic adaptation. This lack of adaptation is corrected by evaluating hybrid crosses between PGR and modern cultivars^14,25,26 – an intermediate background that allows the estimation of the yield breeding value of PGR^14,27.

Our study relies on the winter wheat collection hosted at the German Federal ex situ Genebank of Agricultural and Horticultural Crops in Gatersleben (IPK genebank). With ~27 K Triticum sp. PGR, the IPK collection is one of the largest among EU-27 countries¹⁹. Genotyping-by-sequencing (GBS) was carried out for 7,745 PGR samples of the IPK genebank and 325 modern cultivars. Whole-genome shotgun sequencing (WGS, 3-fold coverage) was performed for 446 diverse PGR samples, 191 modern cultivars and 131 advanced breeding lines. YR resistance was scored across 19 field experiments for 7,683 PGR and 232 elite cultivars. A total of 26 yield experiments allowed the evaluation of the contribution of 707 diverse PGR to yield improvement using ‘Elite × PGR’ bridging crosses. To the best of our knowledge, no large-scale datasets have been made publicly available so far that contain interoperable genomic and agriculture-relevant information on wheat PGR. Raw and processed data as well as phenotypic- and genomic-based approaches to prove data quality and interoperability are made available here following the FAIR principles of data publication²⁸. In our main companion study¹⁴, genomic data were used to study crop genetic diversity as well as to detect duplicates, mislabeling of PGR, gaps between European genebank collections, selective sweeps and alien introgressions introduced by breeding. Mining YR and genomic data identified potential PGR donors of new sources of resistance not yet used in breeding. Yield breeding values guided early pre-breeding activities and allowed the development of PGR-derived lines with competitive yield levels in field experiments. We expect that these data will incentivize additional data science and breeding activities that can further valorize PGR.

Methods

Plant material

Overview

Across datasets, experiments and crosses, wheat genotypes trace back to 9,145 PGR from the IPK genebank, 340 European elite cultivars, and 131 German advanced breeding lines. Passport information regarding scientific name, growth habit, geographical origin, as well as acquisition (‘TRI’ PGR), release (‘B’ PGR/elite cultivars), and obtention (elite breeding lines) dates of the studied material was collected in our companion publication¹⁴. In more detail, information for PGR of the ‘TRI’ collection was accessed through the Genebank Information System (GBIS)²⁹ in extended MCPD format. Passport data of PGR from the ‘Boris 96’ panel³⁰, i.e. the ‘B’ collection, as well as information for elite cultivars and German breeding lines were compiled from different publications, online databases and website sources¹⁴. Not only for the IPK genebank but for genebanks worldwide in general, passport data are dynamic and prone to change over time due to, for instance, genebank curation activities^20,21. To deal with different versions of identifiers in passport data, the IPK genebank uses unique digital object identifiers (DOIs) that are fixed and can trace plant material back to the formerly requested IPK accessions and their information.

Taxonomically, almost all PGR were either explicitly declared/classified as Triticum aestivum L. (67.2%) or in general as Triticum sp. (32.3%), while the very small proportion of remaining PGR have either not been classified yet (0.4%), corresponded to tetraploid wheat species (0.08%) or were wheat interspecific hybrids (0.02%). All European elite genotypes (cultivars plus breeding lines) were classified as Triticum aestivum L. Growth habits recorded in databases were compared with our own field observations and updated accordingly in our companion work¹⁴. PGR are almost entirely composed of strictly winter types (98.8%) with a small proportion of facultative types (1.2%). Similarly, most European elite genotypes are of winter type (96.8%) plus some facultative (2.1%) and spring (1.1%) type wheats.

Among the 55 different geographical origins reported for PGR (Fig. 1a), 52 have official ISO 3166 Alpha-3 country codes (https://www.iso.org/obp/ui). Current states/countries of the former Soviet Union (SUN), Yugoslavia (YUG), and Czechoslovakia (CSK) were pooled together according to each of these three origins to homogenize different nomenclatures that arise for historical reasons. In addition, the origins of 1,506 PGR (16.5%) are unknown, while only one European cultivar (0.3%) is missing its exact country of origin. European nations were the most common PGR origins (60.6%), followed by Asian (15.1%) and American countries (7.4%) (Fig. 1a). Most European PGR originated in Germany (23.4% of European origins), Italy (13.3%), and the former Soviet Union (10.7%). The majority of European elite genotypes originate from Germany (61.8%), Great Britain (11%), France (10.6%), and Poland (7%) (Fig. 1b).
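The pooling of present-day country codes into the three historical origins can be pictured as a simple lookup. The sketch below is only an illustration: the function name `pool_origin` and the member lists are assumptions (the source does not list which successor-state codes were pooled), while the ISO 3166 alpha-3 codes themselves are standard.

```python
# Illustrative lookup: present-day ISO 3166 alpha-3 codes -> pooled historical origin.
# The member lists are assumptions for this sketch, not the study's actual table.
SUCCESSOR_STATES = {
    "SUN": ["ARM", "AZE", "BLR", "EST", "GEO", "KAZ", "KGZ", "LVA",
            "LTU", "MDA", "RUS", "TJK", "TKM", "UKR", "UZB"],
    "YUG": ["BIH", "HRV", "MKD", "MNE", "SRB", "SVN"],
    "CSK": ["CZE", "SVK"],
}

# Invert the table once so that each current code maps to its pooled origin.
POOLED_ORIGIN = {
    current: pooled
    for pooled, members in SUCCESSOR_STATES.items()
    for current in members
}


def pool_origin(code):
    """Return the pooled origin for a country code; other codes pass through unchanged."""
    return POOLED_ORIGIN.get(code, code)


print(pool_origin("UKR"), pool_origin("SVK"), pool_origin("DEU"))
```

Codes outside the three pooled groups are returned unchanged, so the same function can be applied to a whole passport column.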

Regarding acquisition/release dates, these are unknown for 44 PGR (0.5%) and six elite cultivars (1.8%), respectively. PGR span an 80-year time window (1927–2007), with most of them (97.8%) tracing back to the last century (Fig. 2). Release or obtention dates of European elite genotypes range from 1975 to 2020, with the majority of them (87.5%) released/obtained from 2000 onwards. Among European cultivars, ‘Monopol’ is the oldest one (released in 1975) and is still grown today in Germany for its high milling and baking quality³¹, while ‘LGCharacter’ and ‘RGTRitter’ were the newest varieties (both released in 2020).

Fig. 2. Distribution of the number of PGR (orange columns), elite cultivars (green) and breeding lines (blue) considered in this study according to their years of acquisition, release or obtention. When present, the exact number of genotypes per year is included within brackets [].

Seed sources

PGR from the ‘TRI’ catalog were directly obtained from the IPK genebank through GBIS, while seeds of PGR from the ‘B’ collection were kindly provided by Dr. Andreas Börner. The IPK genebank can normally provide ~5 g of seeds per PGR. Thus, to meet the seed requirements of large-scale research activities, seeds of 9,135 PGR were multiplied in a first step using single-row propagation plots. In parallel, 173 of these PGR plus 10 additional ones were multiplied under greenhouse conditions within the framework of the GenDiv project³². For elite cultivars, seeds were obtained from the local market either recently or in previous projects^33,34. Seeds of the advanced breeding lines were provided by four breeding companies based in Germany^35,36.

For field-propagated PGR, one (two) representative ear(s) was (were) bagged in case of homogeneous (clearly non-homogenous) PGR. Following a single-seed descent (SSD) method, seeds from isolated ears were harvested separately from the rest of the plot and further propagated using an ear-to-row method. Hereafter, we refer to these PGR samples as SSD-PGR. Greenhouse-propagated PGR were fixed by applying the SSD method for two consecutive generations (2 × SSD) for each PGR³².

Large-scale genotyping

Genomic data production

GBS was performed for 7,745 SSD-PGR plus 325 elite cultivars, while WGS was conducted for 263 SSD-PGR, 191 elite cultivars, 131 advanced breeding lines, and the 183 greenhouse-propagated PGR. GBS-characterized SSD-PGR trace back to 7,651 accessions and are identified with a unique sequential number combined with the PGR name from the ‘TRI’ and ‘B’ collections. In more detail, SSD-PGR originate from single representative plants of 7,557 homogenous PGR and 94 double sampled non-homogenous PGR. For 171 modern cultivars, three plants per genotype were sequenced in a previous work⁷, while only a single plant per genotype was sequenced for the remaining samples. Regarding WGS-characterized SSD-PGR, only two of them trace back to a double sampled non-homogenous PGR. For each genotype, DNA was extracted from a single 10-day-old greenhouse-grown seedling using silica-membrane technology (NucleoSpin® 96 Plant II) according to the manufacturer’s instructions (Macherey-Nagel).

GBS was conducted following a two-enzyme digestion protocol^37,38. For this, DNA samples were simultaneously digested with PstI and MspI (New England Biolabs) and sticky ends were ligated with adapters containing sample-specific barcode sequences. This step made it possible to track each barcoded sample after DNA pooling. DNA was pooled in equimolar amounts into batches of 540 genotypes to form a GBS library. Single-end sequencing (1 × 107 cycles) was conducted following Illumina protocols on HiSeq 2000 (171 × 3 = 513 samples), HiSeq 2500 (6,262 samples) or NovaSeq 6000 (1,637 samples) devices and using custom sequencing primers.
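The sample counts quoted in this subsection can be reconciled with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts quoted in the text (variable names are illustrative).
gbs_ssd_pgr = 7_745
gbs_cultivars = 325
triplicated_cultivars = 171          # three plants sequenced per cultivar

# GBS samples per sequencing device.
hiseq_2000 = triplicated_cultivars * 3
hiseq_2500 = 6_262
novaseq_6000 = 1_637
sequenced_samples = hiseq_2000 + hiseq_2500 + novaseq_6000

# Each triplicated cultivar adds two samples beyond its single genotype.
gbs_genotypes = gbs_ssd_pgr + gbs_cultivars
assert sequenced_samples == gbs_genotypes + 2 * triplicated_cultivars == 8_412

# SSD-PGR samples versus accessions: non-homogenous PGR were sampled twice.
homogenous, double_sampled = 7_557, 94
gbs_accessions = homogenous + double_sampled
assert gbs_accessions == 7_651
assert homogenous + 2 * double_sampled == gbs_ssd_pgr

# WGS panel: SSD-PGR plus greenhouse-propagated PGR, cultivars and breeding lines.
wgs_pgr = 263 + 183
wgs_total = wgs_pgr + 191 + 131
assert wgs_pgr == 446

print(sequenced_samples, gbs_genotypes, gbs_accessions, wgs_total)
```

The check shows that the 8,412 sequenced GBS samples collapse to 8,070 genotypes once the triplicated cultivars are pooled, and that the WGS panel comprises 768 samples.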

For WGS, libraries were prepared using the Nextera DNA Flex Library Prep following Illumina protocols and pooled afterwards in an equimolar manner. The multiplexed pool was quantified by qPCR and paired-end sequenced (2 × 151 cycles and 10 bp for the index reads) on a NovaSeq 6000 system to an average genome coverage of 3-fold.

Genomic data processing

In a first step, reads of each of the 171 modern cultivars sequenced in triplicate were pooled by original genotype. Accordingly, unless stated otherwise, the terms samples and genotypes are used interchangeably when referring to genomic data throughout the manuscript. Low-quality bases and adapter sequences were trimmed from GBS raw reads using cutadapt (v1.16)³⁹, with a minimum read length of 30 bp. The outcome of this step was subsequently checked with FastQC (v0.11.7)⁴⁰. After this, BWA-MEM (v0.7.17)⁴¹ was applied at default settings to align the retained high-quality reads against the hexaploid wheat reference genome assembly of Chinese Spring (RefSeq v1.0)⁴². For WGS, reads were aligned with minimap2 (v2.17)⁴³, with the genome index size set to 50 Gb and all other parameters left at their defaults. Alignments were converted into binary alignment map (BAM) format using SAMtools (v1.9)⁴⁴ and sorted afterwards with NovoSort® (v3.06.05). Variant calling was done using the mpileup and call functions from SAMtools/BCFtools (v1.8)⁴⁵ with parameters -DV and a minimum read quality (q) cutoff of 20. Biallelic variants were then retained in the generated variant call format (VCF) files using a custom awk script if QUAL was at least 40 and read depth was at least 2 for homozygous calls and at least 4 for heterozygous calls in the case of GBS; for WGS, these thresholds were set to 40, 1, and 2, respectively. From here onwards, we refer to these VCF files for GBS and WGS polymorphic variants as “unfiltered” SNP data. For the assessment of molecular diversity, linkage disequilibrium and genomic-phenotypic data interoperability (see last two sections of Methods), VCF files were further filtered using BCFtools and base and data.table (v1.12.8) functions in the R environment (v3.6.1)⁴⁶. Files were imported into R using the vcfR package (v1.12.0)⁴⁷. Here, only markers having a minimum QUAL value of 40, a maximum of 10% missing values, ≥10 genotypes carrying either of the two alleles in homozygous state, and ≤1% heterozygosity were retained.
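The two filtering stages described above can be sketched in a few functions. This is a minimal illustration in Python, not the authors' awk or R code. Several points are assumptions of the sketch: calls below the depth threshold are set to missing, the homozygote count of at least 10 is required for each allele separately, and heterozygosity is computed among non-missing calls. All function and class names are hypothetical.

```python
from dataclasses import dataclass

MISSING = "./."


@dataclass(frozen=True)
class CallThresholds:
    min_qual: float   # minimum site QUAL
    min_dp_hom: int   # minimum read depth for a homozygous call
    min_dp_het: int   # minimum read depth for a heterozygous call


# Thresholds quoted in the text for the "unfiltered" SNP data.
GBS = CallThresholds(min_qual=40, min_dp_hom=2, min_dp_het=4)
WGS = CallThresholds(min_qual=40, min_dp_hom=1, min_dp_het=2)


def keep_variant(qual, alt, thr):
    """Keep biallelic sites (exactly one ALT allele) with QUAL at or above the cutoff."""
    return alt != "." and "," not in alt and qual >= thr.min_qual


def mask_call(gt, depth, thr):
    """Return the call, or MISSING if its depth is too low (assumed handling)."""
    if gt == MISSING:
        return MISSING
    allele_a, allele_b = gt.split("/")
    min_dp = thr.min_dp_hom if allele_a == allele_b else thr.min_dp_het
    return gt if depth >= min_dp else MISSING


def keep_marker(calls, max_missing=0.10, min_hom=10, max_het=0.01):
    """Marker-level filter applied to one row of biallelic genotype calls."""
    n_total = len(calls)
    n_missing = sum(c == MISSING for c in calls)
    n_called = n_total - n_missing
    if n_called == 0:
        return False
    n_hom_ref = sum(c == "0/0" for c in calls)
    n_hom_alt = sum(c == "1/1" for c in calls)
    n_het = n_called - n_hom_ref - n_hom_alt
    return (
        n_missing / n_total <= max_missing
        and n_hom_ref >= min_hom      # assumption: threshold applies to each allele
        and n_hom_alt >= min_hom
        and n_het / n_called <= max_het   # assumption: share among called genotypes
    )


print(mask_call("0/1", 3, GBS), mask_call("0/1", 3, WGS))
print(keep_marker(["0/0"] * 50 + ["1/1"] * 50))
```

With these thresholds, a heterozygous call supported by three reads is masked under the GBS settings but kept under the WGS settings, which reflects the lower expected depth of 3-fold shotgun data.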

Field evaluations of yellow rust resistance

Two groups of field experiments, summing up to 19 in total, were conducted to evaluate the resistance against YR (Table 1):

Table 1.

Summary of the 12 unbalanced large-scale screening experiments and seven balanced precision experiments conducted in the field to evaluate yellow rust (YR, Puccinia striiformis f. sp. tritici) severity of plant genetic resources and elite cultivars.

Group	Experiment	Location^a	Year^b	Times^c	Infection^d	Agronomy^e	Design^f	Plot size (m²)	Replicates^g	Entries^h
Large-scale screening	GAT_2014_1	Gatersleben	2014–2015	1x	Natural	H	α lattice	0.4	2	1537
	SST_2014_1	Schackstedt	2014–2015	1x	Natural	H	α lattice	0.4	2	1537
	GAT_2014_2	Gatersleben	2014–2015	1x	Natural	H	α lattice	0.4	2	1560
	GAT_2015_2	Gatersleben	2015–2016	1x	Natural	H	α lattice	0.4	2	1561
	SST_2015_2	Schackstedt	2015–2016	1x	Natural	H	α lattice	0.4	2	1509
	GAT_2015_3	Gatersleben	2015–2016	1x	Natural	H	α lattice	0.4	2	1600
	GAT_2016_3	Gatersleben	2016–2017	1x	Natural	H	α lattice	0.4	2	1588
	GAT_2017_3	Gatersleben	2017–2018	1x	Natural	H	α lattice	0.4	2	1583
	GAT_2017_5	Gatersleben	2017–2018	1x	Natural	H	α lattice	0.4	2	1447
	SST_2018_5	Schackstedt	2018–2019	1x	Natural	H	α lattice	0.4	2	1428
	GAT_2019_6	Gatersleben	2019–2020	M:2x	Natural	H	α lattice	0.4	2	1697
	SST_2019_6	Schackstedt	2019–2020	1x	Natural	H	α lattice	0.4	1⁽²⁾	1696
Precision	GAT_YR_2018	Gatersleben	2018–2019	2x	Natural	H	α lattice	0.4	2	793
	ROS_YR_2018	Rosenthal	2018–2019	1x	Artificial “F”	H/F	α lattice	0.3	2	800
	WTZ_YR_2018	Wetze	2018–2019	2x	Artificial “S”	H/F	α lattice	0.5	2	800
	GAT_YR_2019	Gatersleben	2019–2020	2x	Natural	H	α lattice	0.4	2	801
	QLB_YR_2019	Quedlinburg	2019–2020	2x	Artificial “S”	F	IB	0.2	1	800
	ROS_YR_2019	Rosenthal	2019–2020	1x	Artificial “F”	H/F	α lattice	0.3	2	798
	WTZ_YR_2019	Wetze	2019–2020	1x	Artificial “S”	H/F	α lattice	0.5	2	799

^aLocation specificities are as follows: Gatersleben (latitude 51° 49’ 19.74” N, longitude 11° 17’ 11.80” E, 111 m.a.s.l., black soil of clayey loam-texture, 9 °C average annual temperature, 490 mm average annual rainfall), Schackstedt (latitude 51° 43’ 0” N, longitude 11° 37’ 0” E, 122 m.a.s.l., black soil of clayey loam-texture, 8.9 °C average annual temperature, 483 mm average annual rainfall), Quedlinburg (latitude 51° 46’ 22.22” N, longitude 11° 9’ 12.82” E, 140 m.a.s.l., black soil of clayey loam-texture, 8.9 °C average annual temperature, 497 mm average annual rainfall), Rosenthal (latitude 52° 18’ 10.242” N, longitude 10° 10’ 26.2668” E, 72 m.a.s.l., brown soil of slightly sandy loam-texture, 9.5 °C average annual temperature, 637 mm average annual rainfall), Wetze (latitude 51° 44’ 22.686” N, longitude 9° 54’ 36.1224” E, 136 m.a.s.l., brown soil of slightly clayey loam-texture, 8.6 °C average annual temperature, 664 mm average annual rainfall).

^bSowing - harvest years.

^cDisease symptoms were scored either once (1x) or twice (2x) after the onset of YR infection. M:2x means that only the maximum infection value was recorded.

^dMaterial was tested based either on natural infections (Natural), artificial inoculations using spreader plots (Artificial “S”) or full experiment artificial inoculations (Artificial “F”).

^eCrop management considered chemical control against weeds (H) and/or use of fertilizers (F).

^fSpatial variation was corrected using an alpha (α lattice) design with blocks subdivided into incomplete blocks or only considering incomplete blocks (IB).

^gIn case of 1⁽²⁾ only one of two replicates was measured.

^hNumber of entries according to the original field plan.

Large-scale resistance screening in unbalanced experiments

Twelve experiments were performed for the large-scale evaluation of the YR resistance of 7,684 PGR and 80 European elite cultivars based on naturally occurring infections (Table 1). An additional wheat genotype denoted as ‘Filler’ was included for technical reasons during sowing, but it was not part of the tested entries and has no passport data. Given the large number of entries to be screened, the plant material was tested in an unbalanced fashion by considering 1,428–1,697 entries per experiment (Table 1). Experiments were conducted in harvest years 2015–2020 at the locations Gatersleben and Schackstedt. Wheat plants were cultivated in all experiments under the rainfed continental conditions predominant at both German locations. In all experiments, chemical crop protection comprised only the use of herbicides, while no fertilizers were applied. In each experiment, the experimental unit corresponded to a 0.4 m² plot. An alpha lattice design with two complete replications divided into incomplete blocks was used to account for uncontrolled spatial variation. Except for experiment SST_2019_6, in which infection severity was scored in only one replication, both replications were measured in each experiment. In addition, GAT_2019_6 is the only experiment of this group in which YR infections were scored at two (early and late) dates. For this experiment, only the maximum of the two dates, as the most differentiating value, was retained for each plot. All other experiments considered only a single scoring date after the clear onset of YR infections. Infection severity was expressed on a scoring scale from 1 (no symptoms) to 9 (severe infection) following the official protocols of the German Federal Plant Variety Office⁴⁸.
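The rule used for GAT_2019_6, keeping only the maximum of the early and late scores of each plot, amounts to the following. The function name and the range check are illustrative additions for this sketch.

```python
def retained_score(early, late):
    """Return the score kept for a plot scored at two dates: the maximum of both.

    Scores are expected on the 1 (no symptoms) to 9 (severe infection) scale.
    """
    for score in (early, late):
        if not 1 <= score <= 9:
            raise ValueError(f"score {score} is outside the 1-9 scale")
    return max(early, late)


print(retained_score(3, 6))
```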

Precision balanced experiments

Seven experiments were conducted to test 200 European elite cultivars and 600 SSD-PGR (Table 1). Elite cultivars were pre-selected to maximize their genomic diversity. The 600 SSD-PGR are not only a highly diverse sample but also harbor an increased proportion of resistant genotypes, which are in general at low frequencies in genebanks¹⁴. Among the 600 SSD-PGR, only two trace back to a double sampled non-homogenous PGR. Three wheat entries that do not belong to the 200 cultivars plus 600 SSD-PGR but were used to estimate experiment effects lack passport data and were thus denoted as Filler_1–3. Experiments were conducted during harvest years 2019 and 2020 at the German locations Gatersleben, Quedlinburg, Wetze, and Rosenthal (Table 1). Wheat plants were cultivated in all experiments under the rainfed continental conditions predominant at all considered locations. Experiments GAT_YR_2018 and -_2019 were based on natural infections, while the other five experiments were artificially inoculated. Experiments ROS_YR_2018 and -_2019 relied on inoculations directly applied to the tested material, whereas surrounding susceptible spreader plots served as the initial inoculum source for the tested entries in the other three inoculated experiments. Artificial inoculations were based on spore mixtures of race isolates from genetic groups ‘PstS7’ and ‘PstS10’ collected during past crop seasons. As reported by the Global Rust Reference Center¹⁶, these two aggressive race groups are virulent against resistance genes Yr1, Yr2, Yr3, Yr4, Yr6, Yr7, Yr9, Yr17, Yr25, Yr32 and also against resistance specificities of genotypes ‘Spalding Prolific’ and ‘Avocet S’. Both race groups are avirulent against resistance genes Yr5, Yr8, Yr10, Yr15, Yr24 and Yr27. In particular, ‘PstS7’ is virulent against the resistance specificity of the genotype ‘Ambition’, whereas ‘PstS10’ is avirulent. Experiments conducted in Gatersleben considered chemical weed control without use of fertilizers, while fertilizers but no herbicides were applied in the QLB_YR_2019 experiment (Table 1). In the experiments conducted in Wetze and Rosenthal, both herbicides and fertilizers were applied. Except for the QLB_YR_2019 experiment, where plant material was tested using a single replication in incomplete blocks, all experiments considered two complete replications and an alpha lattice design. The experimental unit was a 0.2–0.5 m² plot, with the size fixed for each test location. Disease symptoms were scored at a single timepoint after the onset of YR infection in the WTZ_YR_2019, ROS_YR_2018 and -_2019 experiments, while early and late infections were recorded in the other four experiments. QLB_YR_2019 was the only experiment where infection was originally recorded as a percentage instead of on a 1–9 scoring scale. Percentage scorings were transformed into a 1–9 scale using the scale conversion key of the German Federal Plant Variety Office⁴⁸.
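The virulence spectrum shared by the two inoculated race groups can be encoded as two sets, which makes it easy to ask whether a known Yr gene combination is expected to hold against the inoculum. The sketch below is a simplified gene-for-gene reading with a hypothetical function name. It ignores quantitative and adult-plant resistance, and it covers only the Yr genes listed in the text, so any other gene yields "unknown".

```python
# Yr genes overcome by both 'PstS7' and 'PstS10', as listed in the text.
VIRULENT_BOTH = {"Yr1", "Yr2", "Yr3", "Yr4", "Yr6", "Yr7", "Yr9",
                 "Yr17", "Yr25", "Yr32"}
# Yr genes against which both race groups are avirulent, as listed in the text.
AVIRULENT_BOTH = {"Yr5", "Yr8", "Yr10", "Yr15", "Yr24", "Yr27"}


def expected_reaction(yr_genes):
    """Simplified expectation for a genotype carrying the given Yr genes."""
    genes = set(yr_genes)
    if genes & AVIRULENT_BOTH:
        return "resistant"      # at least one gene is effective against both groups
    if genes and genes <= VIRULENT_BOTH:
        return "susceptible"    # every listed gene is overcome by both groups
    return "unknown"            # no genes given, or genes not covered in the text


print(expected_reaction({"Yr1", "Yr9"}))
print(expected_reaction({"Yr7", "Yr15"}))
```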

Yield evaluations for informed pre-breeding

Two groups of field experiments, summing up to 26 in total, were conducted to evaluate the contribution of PGR to yield improvement using ‘Elite × PGR’ crosses (Table 2):

Table 2.

Summary of the experimental setup for the 22 estimation and four validation experiments of yield breeding values of plant genetic resources using bridging ‘Elite × PGR’ crosses.

Group	Series	Experiment	Location^a	Year^b	Trials^c	Trial Design	Test^d	Block size^e	Entries^f
Estimation	S1	BOH_2015	Böhnhausen	2015–2016	2	α lattice	U	9	611
	S1	GAT_2015	Gatersleben	2015–2016	2	α lattice	U	9	611
	S1	HOH_2015	Hohenheim	2015–2016	2	α lattice	U	9	611
	S1	RNG_2015	Renningen	2015–2016	2	α lattice	U	9	611
	S1	SST_2015	Schackstedt	2015–2016	2	α lattice	U	9	611
	S2	ASD_2016	Asendorf	2016–2017	2	α lattice	U	10	615
	S2	GAT_2016	Gatersleben	2016–2017	2	α lattice	U	10	614
	S2	HOH_2016	Hohenheim	2016–2017	2	α lattice	U	10	614
	S2	RNG_2016	Renningen	2016–2017	2	α lattice	U	10	617
	S2	SST_2016	Schackstedt	2016–2017	2	α lattice	U	10	617
	S3	GAT_2017	Gatersleben	2017–2018	1	α lattice	PR	10	433
	S3	HDM_2017	Hadmersleben	2017–2018	1	α lattice	U	10	500
	S3	HOH_2017	Hohenheim	2017–2018	1	α lattice	U	10	500
	S3	RNG_2017	Renningen	2017–2018	1	α lattice	U	10	500
	S3	SST_2017	Schackstedt	2017–2018	1	α lattice	PR	10	389
	S4	GAT_2018	Gatersleben	2018–2019	1	α lattice	U	12	495
	S4	HDM_2018	Hadmersleben	2018–2019	1	α lattice	U	12	488
	S4	HOH_2018	Hohenheim	2018–2019	1	α lattice	U	12	502
	S4	RNG_2018	Renningen	2018–2019	1	α lattice	U	12	502
	S4	SST_2018	Schackstedt	2018–2019	1	α lattice	U	12	502
	S5	GAT_2019	Gatersleben	2019–2020	1	α lattice	CR	10	510
	S5	SST_2019	Schackstedt	2019–2020	1	α lattice	CR	10	510
Validation	—	GAT_PB_2019	Gatersleben	2019–2020	1	Incomplete block	PR	4–5	95
	—	SST_PB_2019	Schackstedt	2019–2020	1	Incomplete block	PR	4–5	59
	—	GAT_PB_2020	Gatersleben	2020–2021	1	Incomplete block	PR	5	118
	—	SST_PB_2020	Schackstedt	2020–2021	1	Incomplete block	PR	5	108

^aLocation specificities are as follows: Böhnhausen (latitude 51° 51’ 30.95” N, longitude 10° 57’ 44.32” E, 173 m.a.s.l.), Gatersleben (latitude 51° 49’ 19.74” N, longitude 11° 17’ 11.80” E, 111 m.a.s.l.), Hohenheim (latitude 49° 43’ 2.65” N, longitude 9° 11’ 12.70” E, 406 m.a.s.l.), Renningen (latitude 48° 44’ 29.58” N, longitude 8° 55’ 15.35” E, 484 m.a.s.l.), Schackstedt (latitude 51° 43’ 0” N, longitude 11° 37’ 0” E, 122 m.a.s.l.), Asendorf (latitude 52° 44’ 17.93” N, longitude 9° 0’ 24.11” E, 45 m.a.s.l.), Hadmersleben (latitude 51° 59’ 29.79” N, longitude 11° 18’ 12.79” E, 91 m.a.s.l.).

^bSowing - harvest years.

^cNumber of trials per experiment.

^dGenotypes were tested either in an unreplicated (U), partially (>20%) replicated (PR) or completely replicated (CR) fashion.

^eSize of the incomplete blocks (number of plots) used to account for and correct uncontrolled spatial variation. Plot sizes ranged from 6 to 9 m².

^fNumber of entries according to the original field plan.

Yield breeding value estimation experiments

A total of 751 PGR (234 PGR plus 527 SSD-PGR denoted with the suffix “_ISO”) and four elite cultivars were crossed as male parents with up to 16 out of 42 elite cultivars using chemical hybridization agents in the field. Specifically, 1,569 of the 1,577 resulting hybrids corresponded to ‘Elite × PGR’ factorial crosses, while the remaining eight hybrids were ‘Elite₁ × Elite₂’ crosses. PGR serving as pollen donors comprise a diverse sample from the IPK genebank¹⁴ and were pre-selected for their pronounced anther extrusion based on large-scale screenings of genebank material. This pre-selection ensured a sufficient quantity of field-produced hybrid seed to conduct multiple field experiments. Hybrid seed of ‘Elite × PGR’ and ‘Elite₁ × Elite₂’ crosses was produced at the IPK facilities. Sterility of the female parents was checked by bagging at least one plant per crossing block⁴⁹. In addition, during the season following seed production, the uniformity and hybridity (a clear morphological differentiation from the female parent) of F₁ seeds were checked by growing each hybrid and both of its parents side-by-side in 0.2 m² plots. In parallel, the 1,577 IPK hybrids were tested for grain yield together with 347 hybrids from the State Plant Breeding Institute of the University of Hohenheim (Landessaatzuchtanstalt, LSA), 518 parent genotypes, and a set of 40 checks. Yield testing was conducted in a staggered fashion over five consecutive years by using partially overlapping entry groups (series), each composed of 501 to 617 genotypes (Table 2). Except for series 5, which was tested in only two locations, each series was tested in five locations. Across series, a total of 22 estimation experiments spanned harvest years 2016–2020 and seven different German locations: Hohenheim, Renningen, Gatersleben, Schackstedt, Böhnhausen, Asendorf, and Hadmersleben. All experiments were conducted following an alpha lattice design. Experiments were performed either in an unreplicated (series 1, 2 and 4, plus three experiments of series 3), partially replicated (the remaining two experiments of series 3) or completely replicated (series 5) fashion. For experiments of series 1 and 2, trialing and blocking were used to account for and correct uncontrolled spatial variation, while complete and/or incomplete blocks were considered for this purpose in series 3 to 5. The experimental unit corresponded for all series to a plot, with sizes ranging from 6 to 9 m². Wheat plants were cultivated in all experiments under the rainfed continental conditions predominant at all considered locations. In all breeding value estimation experiments, plots were treated with fertilizers, herbicides, and pesticides following conventional local practices. Plots were harvested with a combine harvester, and plot yield was adjusted to a 140 g H₂O kg⁻¹ moisture basis and expressed in Mg ha⁻¹.
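The adjustment of plot yield to a common moisture basis can be illustrated with the usual dry-matter-preserving conversion. The text does not spell out the formula used, so the sketch below is an assumption: grain dry matter is held constant and rescaled to 140 g H₂O kg⁻¹ (14% moisture), then converted from kg per plot to Mg ha⁻¹. The function and parameter names are hypothetical.

```python
TARGET_MOISTURE = 0.14  # 140 g H2O per kg of grain


def adjusted_yield_mg_ha(plot_grain_kg, moisture_fraction, plot_area_m2):
    """Convert harvested plot weight to Mg/ha at the target moisture content.

    plot_grain_kg     -- fresh grain weight harvested from the plot (kg)
    moisture_fraction -- measured grain moisture at harvest (0-1, fresh-weight basis)
    plot_area_m2      -- harvested plot area (m2)
    """
    dry_matter_kg = plot_grain_kg * (1.0 - moisture_fraction)
    grain_at_target_kg = dry_matter_kg / (1.0 - TARGET_MOISTURE)
    # kg per m2 -> Mg per ha: multiply by 10,000 m2/ha and divide by 1,000 kg/Mg.
    return grain_at_target_kg / plot_area_m2 * 10.0


# A 6 m2 plot yielding 6.0 kg of grain harvested at 16% moisture.
print(round(adjusted_yield_mg_ha(6.0, 0.16, 6.0), 2))
```

Grain harvested wetter than the target is scaled down and drier grain is scaled up, so plots harvested on different days remain comparable.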

Yield breeding value validation experiments

The feasibility to develop high yielding pre-breeding material using breeding values as a tool for PGR parent selection was evaluated in early yield testing experiments (Table 2). Preliminary breeding values obtained from estimation experiments of harvest year 2016 were used to select 13 PGR with high yield breeding value estimates. These PGR served as pollen donors in 14 and 18 initial crosses during 2016 and 2017, respectively. Two additional PGR lacking of breeding value estimates were also considered as male parents in crosses during 2017. A set of 12 locally adapted European elite cultivars released between years 2004 and 2015 were used as pollen receptors in two- (Elite₁ × PGR) and three-way crosses (Elite₂ × [Elite₁ × PGR]) involving PGR. Seeds of segregating progenies from each of the eight and 27 performed two- and three-way crosses, respectively, were increased and genetically fixed by two generations of selfing in Gatersleben. Besides roughing of off-types plus fixing true types, two-stage selection based on visual assessment of single plants, followed by rows focusing on plant height and leaf health in 0.5 m² plots, was applied. Other than herbicides, no additional chemical treatments (i.e. fungicides, nitrogen fertilizers, etc.) were used for crop management during plant material depuration. After these selection steps, at least one genotype per initial cross could enter early yield testing experiments, summing up to a total of 189 advanced F_3:4 families across 35 initial crosses. In the breeding value validation experiments (Table 2), candidate families were evaluated for their yield per se performance together with 15 elite checks under conventional local agricultural practices. Experiments were conducted during harvest years 2020 and 2021; with each year considering two locations: Gatersleben and Schackstedt. Wheat plants were cultivated in all experiments under rainfed continental conditions predominant at both considered locations. 
Elite checks corresponded to winter wheat cultivars approved for commercial use in Germany, with the French cultivar ‘Arezzo’ (released in 2007) being the oldest, while the German ‘LGCharacter’ and French ‘RGTRitter’ varieties (both released in 2020) were the newest. Check varieties ‘RGTReform’, ‘Benchmark’, and ‘LGInformer’ were commercially released in 2014, 2015, and 2018, respectively, and connected the four validation experiments, thus allowing an integrated analysis. Seven additional genotypes (coded as LSA_1–7) present in early yield experiments were lines from the LSA breeding program. Although LSA lines lack passport data, they were kept in the datasets so as not to disrupt the estimation of field design effects. In all experiments, the experimental unit corresponded to a 6 m² plot. Entries were tested in a partially replicated fashion and an incomplete block design was used to correct for uncontrolled spatial variation. Plots were harvested using a combine harvester, and grain yield was adjusted to a 140 g H₂O kg⁻¹ moisture basis and expressed in Mg ha⁻¹.

Phenotypic data analyses

A multiple-step strategy based on mixed models⁵⁰ was implemented for data curation and parameter estimation:

Data curation and preparation

With the exception of yield breeding value validation experiments as well as the YR evaluations of SST_2019_6 and QLB_YR_2019 experiments, phenotypic data were outlier-corrected first by using the following general mixed model:

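$$\mathrm{Trait} \sim \mu + \mathrm{Genotypes} + \mathrm{Experiments} + \mathrm{Genotypes} \times \mathrm{Experiments} + \mathrm{Trials} + \mathrm{Replicates}(\mathrm{Trials}) + \mathrm{Blocks}(\mathrm{Replicates}:\mathrm{Trials}) + \mathrm{Error}, \qquad (1)$$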

where the common mean (μ) and genotype effects were treated as fixed factors, whereas experiments and their multiplicative interactions with genotypes, trials nested within experiments, replicates nested within trials, incomplete blocks nested within replicates and trials, as well as the model error nested within experiments were assumed as random and normally distributed.

In case of YR evaluations, Eq. (1) was fitted experiment-wise for each scoring timepoint (single, early or late). Therefore, effects of trials, experiments, and their interactions with genotypes were dropped from Eq. (1) according to each specific experimental design (Table 1). Normalized residuals of this model were obtained by subtracting their average and dividing them by their standard deviation. After this, residuals were tested experiment-wise for their significance as outliers following Anscombe and Tukey⁵¹ and assuming a nominal alpha level of 0.01. Accordingly, datapoints flagged as outliers were discarded from final datasets.
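The residual standardization and flagging step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R workflow used in the study, and it makes one loud assumption: a plain two-sided normal cutoff at alpha = 0.01 stands in for the Anscombe and Tukey rule, whose exact form is not reproduced here. The function names and the toy residuals are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def standardize(residuals):
    """Subtract the average and divide by the standard deviation."""
    m = mean(residuals)
    s = stdev(residuals)
    return [(r - m) / s for r in residuals]


def flag_outliers(residuals, alpha=0.01):
    """Flag residuals whose standardized value exceeds a normal cutoff.

    Assumption: a two-sided normal quantile is used as a simple stand-in
    for the Anscombe and Tukey criterion cited in the text.
    """
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    return [abs(z) > cutoff for z in standardize(residuals)]


# Toy residuals of one experiment (hypothetical values); the last one is extreme
residuals = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1, -0.15, 0.0, 0.05, 6.0]
flags = flag_outliers(residuals)
kept = [r for r, is_out in zip(residuals, flags) if not is_out]
```

In this toy case only the last datapoint is flagged and dropped; in practice the residuals would come from the fitted mixed model, experiment by experiment.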

For breeding value estimation experiments, outlier correction of yield data was performed series-wise, and trial and/or replicate effects in Eq. (1) were included or omitted according to the experimental design(s) used in each series (Table 2). Yield data were then screened series-wise for significant outliers using the method M4 “Bonferroni-Holm with rescaled median absolute deviation standardized residuals” as suggested by Bernal-Vasquez et al.⁵². Datapoints detected as significant outliers were discarded. Afterwards, yield records of series 1 to 4 were adjusted series-wise for the effects of trials, replicates and/or incomplete blocks using Eq. (1) according to the experimental design(s) specific to each series (Table 2), while this adjustment was performed experiment-wise for series 5. In a next step, 161 hybrids (144 IPK plus 17 LSA hybrids) with low homogeneity and/or hybridity were discarded from the integrated dataset, and IPK hybrids plus line parent genotypes were extracted for parameter estimation.
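To illustrate the idea behind a Bonferroni-Holm test on residuals standardized by the rescaled median absolute deviation (MAD), the following Python sketch may help. It is not the authors' implementation: the rescaling constant 1.4826, the use of normal two-sided p-values, the default alpha of 0.05 and all names are assumptions made for this example.

```python
from statistics import NormalDist, median


def mad_standardize(residuals):
    """Divide residuals by the rescaled MAD (assumed constant: 1.4826)."""
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad
    return [r / scale for r in residuals]


def holm_outliers(residuals, alpha=0.05):
    """Holm step-down test on two-sided normal p-values (alpha is hypothetical)."""
    nd = NormalDist()
    pvals = [2 * (1 - nd.cdf(abs(z))) for z in mad_standardize(residuals)]
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    flagged = [False] * len(pvals)
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (n - k)
        if pvals[idx] <= alpha / (len(pvals) - rank):
            flagged[idx] = True
        else:
            break  # step-down procedure stops at the first non-rejection
    return flagged


# Toy residuals of one series (hypothetical values)
residuals = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.0, 0.5, -0.3, 4.0]
flagged = holm_outliers(residuals)
```

Because the MAD is robust to extreme values, the single large residual does not inflate the scale used to judge it, which is the main reason for preferring it over the ordinary standard deviation in this context.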

Parameter estimations within experiments

Following data preparation, parameter estimation was first carried out experiment-wise for YR-scores and yield breeding value validation experiments. Best linear unbiased estimations (BLUEs) of genotypes for YR-scores were computed for each scoring timepoint (“single”, “early” or “late”) of replicated experiments as well as for yield performance in each yield breeding value validation experiment. For this, effects of trials, experiments, and their interactions with genotypes were ignored in Eq. (1) and design effects were included or omitted according to each specific experiment (Tables 1 and 2). Due to the absence of replications in QLB_YR_2019 and SST_2019_6, YR-scores in these particular experiments were adjusted outside the mixed-model framework using the means of the corresponding incomplete blocks. In parallel, variance components of single replicated experiments were estimated for each scoring timepoint in the same way as BLUEs but treating genotypes as random. Variance estimates were used to compute experiment-specific heritabilities as:

$$h_{Experiment}^{2} = \frac{\hat{\sigma}_{g}^{2}}{\hat{\sigma}_{g}^{2} + \frac{\hat{\sigma}_{Error}^{2}}{\bar{N.Rep.}}}, \qquad (2)$$

where $\hat{\sigma}_{g}^{2}$ and $\hat{\sigma}_{Error}^{2}$ are the genotypic and error variance estimates, respectively, while $\bar{N.Rep.}$ is the average number of effective replicates after considering missing plots and/or outlier-correction.
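As a worked instance of Eq. (2), the short Python sketch below plugs variance estimates into the formula. The function name and the numeric inputs are hypothetical and serve only to show how the average number of effective replicates enters the denominator.

```python
def heritability_single_experiment(var_g, var_error, n_rep):
    """Eq. (2): genotypic variance over genotypic plus error variance per replicate."""
    return var_g / (var_g + var_error / n_rep)


# Hypothetical variance estimates: sigma2_g = 2.0, sigma2_error = 1.0, 2 effective replicates
h2_experiment = heritability_single_experiment(var_g=2.0, var_error=1.0, n_rep=2)
```

With these inputs the error contribution is 1.0 / 2 = 0.5, so the heritability is 2.0 / 2.5 = 0.8.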

Parameter estimations across experiments

Large-scale YR screening experiments: firstly, a correlation test for BLUEs and/or experimental design corrected data was performed between experiments. Later, BLUEs of genotypes and variance components of YR-scores were obtained from the outlier-corrected data integrated across 12 experiments. For this, the trial effect was dropped from Eq. (1). The heritability across experiments was then computed as:

$$h_{Across}^{2} = \frac{\hat{\sigma}_{g}^{2}}{\hat{\sigma}_{g}^{2} + \frac{\hat{\sigma}_{g \times Exp.}^{2}}{\bar{N.Exp.}} + \frac{\hat{\sigma}_{\bar{Error}}^{2}}{\bar{N.Exp.} \times \bar{N.Rep.}}}, \qquad (3)$$

where $\hat{\sigma}_{g \times Exp.}^{2}$ is the variance of the interaction between genotypes and experiments, $\hat{\sigma}_{\bar{Error}}^{2}$ is the average error variance nested within experiments, $\bar{N.Exp.}$ is the average number of effective experiments used to test a genotype, while all other components in Eq. (3) retain the definitions from Eq. (2).
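Eq. (3) extends the single-experiment formula with a genotype × experiment term. A minimal Python sketch under hypothetical inputs (the function name and all numbers are illustrative, not values from the study):

```python
def heritability_across(var_g, var_gxe, var_error, n_exp, n_rep):
    """Eq. (3): heritability of genotype means across experiments."""
    denominator = var_g + var_gxe / n_exp + var_error / (n_exp * n_rep)
    return var_g / denominator


# Hypothetical estimates: 4 effective experiments with 2 effective replicates each
h2_across = heritability_across(var_g=2.0, var_gxe=1.0, var_error=2.0, n_exp=4, n_rep=2)
```

Here the interaction term contributes 1.0 / 4 = 0.25 and the error term 2.0 / 8 = 0.25, giving 2.0 / 2.5 = 0.8. Testing a genotype in more experiments shrinks both terms, which is why unbalanced designs are summarized through average numbers of experiments and replicates.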

Precision balanced YR experiments: data of one genotype (‘PilgrimPZO’) were discarded from these integrated analyses due to material mislabeling. In a first step, correlations of BLUEs and/or experimental design corrected data were computed between experiments. In addition, the maximum value among early and late scorings, or the single timepoint scoring, was selected experiment-wise for each genotype based on single-experiment BLUEs or, in the case of QLB_YR_2019, on data corrected for incomplete-block effects. Using this integrated dataset, BLUEs were computed across experiments by fitting Eq. (1) but considering only μ as well as genotype, experiment and error effects. By assuming μ as a fixed factor and the remaining model effects as random, $\hat{\sigma}_{g}^{2}$ was obtained, but the error term and the genotype × experiment interaction were confounded in this model. Assuming that the average of single-experiment error variance estimates equals $\hat{\sigma}_{\bar{Error}}^{2}$, the interaction variance was derived as $\hat{\sigma}_{g \times Exp.}^{2} = \hat{\sigma}_{Error}^{2} - \frac{\hat{\sigma}_{\bar{Error}}^{2}}{\bar{N.Rep.}}$, where $\hat{\sigma}_{Error}^{2}$ is the variance estimate of the confounded error and interaction terms of the model. After this, Eq. (3) was used to estimate the heritability of YR-scores across precision experiments.
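The separation of the confounded residual into interaction and error parts is simple arithmetic, sketched below in Python. The function name and numbers are hypothetical. Note that the difference can in principle become negative with real estimates; how such cases were handled is not stated in the text, so the sketch does not truncate the result.

```python
def interaction_variance(var_confounded, mean_error_var, n_rep):
    """Genotype x experiment variance recovered from a confounded residual.

    var_confounded: residual variance of the across-experiment model on BLUEs
    mean_error_var: average of the single-experiment error variance estimates
    n_rep: average number of effective replicates
    """
    return var_confounded - mean_error_var / n_rep


# Hypothetical estimates
var_gxe = interaction_variance(var_confounded=1.5, mean_error_var=1.0, n_rep=2)
```

With these inputs, 1.5 - 1.0 / 2 = 1.0 is attributed to the genotype × experiment interaction, and this value would then enter Eq. (3).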

Yield breeding value estimation experiments: in a first step, correlations of BLUEs and/or experimental design corrected data were computed between experiments. Later, the following mixed model was fitted to the outlier-and-design corrected yield data from 22 estimation experiments:

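$$\mathrm{Yield} \sim \mathrm{Type} + \mathrm{Experiments} + \mathrm{Lines} + \mathrm{Hybrids} + \mathrm{Lines} \times \mathrm{Experiments} + \mathrm{Hybrids} \times \mathrm{Experiments} + \mathrm{Error}, \qquad (4)$$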

where Type includes the specific group mean of either lines (μ_Lines) or hybrids (μ_Hybrids) and was assumed as fixed, while hybrids and lines, experiments and their interactions with genotypes as well as the error nested within experiments were assumed as random factors. For lines, variance estimates $\hat{\sigma}_{g_{Lines}}^{2}$ and $\hat{\sigma}_{g_{Lines} \times Exp.}^{2}$ of yield per se performance are directly obtained from Eq. (4), while the total variance of hybrid yield can be further decomposed as follows:

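$$\mathrm{Yield}_{Hybrids} \sim \mu_{Hybrids} + \mathrm{Experiments} + \mathrm{GCA}_{Females} + \mathrm{GCA}_{Males} + \mathrm{SCA} + \mathrm{GCA}_{Females} \times \mathrm{Experiments} + \mathrm{GCA}_{Males} \times \mathrm{Experiments} + \mathrm{Error}, \qquad (5)$$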

where GCA_Females and GCA_Males are the general combining abilities (GCA) of female and male parents, respectively, SCA is the specific combining ability between parents, while the error term is confounded with the SCA × Experiments interaction. From Eq. (5), $\hat{\sigma}_{g_{Hybrids}}^{2} = \hat{\sigma}_{GCA_{Females}}^{2} + \hat{\sigma}_{GCA_{Males}}^{2} + \hat{\sigma}_{SCA}^{2}$ and $\hat{\sigma}_{g_{Hybrids} \times Exp.}^{2} = \hat{\sigma}_{GCA_{Females} \times Exp.}^{2} + \hat{\sigma}_{GCA_{Males} \times Exp.}^{2}$ are derived. These estimates, in addition to the variance components of lines and $\hat{\sigma}_{\bar{Error}}^{2}$, were used to compute yield heritabilities ($h_{Lines}^{2}$ and $h_{Hybrids}^{2}$) across experiments according to Eq. (3). In parallel, the breeding value of the i-th male parent genotype was defined as $\hat{\mu}_{Hybrids} + \hat{GCA}_{Male(i)}$, where $\hat{GCA}_{Male(i)}$ is the best linear unbiased predictor (BLUP) of the corresponding male parent component. In case of PGR tested in hybrids as both PGR and SSD-PGR, the respective breeding values were averaged into a single estimate. The heritability of breeding values estimated in the hybrid context was defined as:
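The definitions above translate into a few additions. The Python sketch below sums GCA and SCA variances into the hybrid genotypic variance, builds breeding values as hybrid mean plus male GCA BLUP, and averages the two estimates of a PGR tested both as PGR and as SSD-PGR. All names, identifiers and numbers are hypothetical; the BLUPs themselves would come from the fitted mixed model, which is not reproduced here.

```python
def hybrid_genotypic_variance(var_gca_females, var_gca_males, var_sca):
    """Total genotypic variance of hybrids as the sum of GCA and SCA variances."""
    return var_gca_females + var_gca_males + var_sca


def breeding_values(mu_hybrids, gca_male_blups):
    """Breeding value of each male parent: hybrid mean plus its GCA BLUP."""
    return {name: mu_hybrids + gca for name, gca in gca_male_blups.items()}


def average_pairs(values, pairs):
    """Average the estimates of accessions tested as both PGR and SSD-PGR."""
    merged = dict(values)
    for pgr, ssd in pairs:
        merged[pgr] = (merged[pgr] + merged.pop(ssd)) / 2
    return merged


# Hypothetical inputs: hybrid mean in Mg/ha and GCA BLUPs of three male entries
gca = {"PGR_A": 0.25, "SSD_PGR_A": 0.75, "PGR_B": -0.5}
bv = breeding_values(mu_hybrids=8.0, gca_male_blups=gca)
bv_final = average_pairs(bv, pairs=[("PGR_A", "SSD_PGR_A")])
var_hybrids = hybrid_genotypic_variance(0.5, 1.0, 0.25)
```

In this toy case the two entries of accession A (8.25 and 8.75) collapse into a single breeding value of 8.5, while accession B keeps its value of 7.5.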

$$h^{2} = \frac{\hat{\sigma}_{GCA_{Males}}^{2}}{\hat{\sigma}_{GCA_{Males}}^{2} + \frac{\hat{\sigma}_{GCA_{Males} \times Exp.}^{2}}{\bar{N.Exp.}} + \frac{\hat{\sigma}_{SCA}^{2}}{\bar{N.Fem.}} + \frac{\hat{\sigma}_{\bar{Error}}^{2}}{\bar{N.Exp.} \times \bar{N.Rep.}}}, \qquad (6)$$

where $\bar{N.Fem.}$ is the average number of crossing females used to test male parents, while the remaining terms retain all previous definitions.
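Eq. (6) can be evaluated in the same way as the earlier heritabilities, with one extra term for the SCA variance averaged over female testers. The Python sketch below uses hypothetical inputs and a hypothetical function name.

```python
def breeding_value_heritability(var_gca_m, var_gca_m_x_exp, var_sca, var_error,
                                n_exp, n_fem, n_rep):
    """Eq. (6): heritability of male GCA-based breeding values."""
    denominator = (var_gca_m
                   + var_gca_m_x_exp / n_exp
                   + var_sca / n_fem
                   + var_error / (n_exp * n_rep))
    return var_gca_m / denominator


# Hypothetical estimates: 4 experiments, 2 female testers, 2 replicates on average
h2_bv = breeding_value_heritability(var_gca_m=3.0, var_gca_m_x_exp=1.0, var_sca=1.0,
                                    var_error=2.0, n_exp=4, n_fem=2, n_rep=2)
```

The denominator is 3.0 + 0.25 + 0.5 + 0.25 = 4.0, so the heritability is 0.75. The SCA term shows why crossing each PGR with several elite females improves the precision of its breeding value.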

Yield breeding value validation experiments: BLUEs of genotypes and variance components of the yield performance across breeding value validation experiments were obtained using Eq. (1) but skipping replicate and trial effects and assuming a common error variance for all experiments. The heritability of yield performance across experiments was then computed as specified in Eq. (3).

Linear mixed models of phenotypic data analyses were fitted using the average information matrix algorithm for restricted maximum likelihood (REML) computation implemented in ASReml-R (v3.0 and 4.0)^53,54.

Molecular diversity and linkage disequilibrium as captured by genotyping platforms

Considering the total number of genotypes for each genotyping platform (GBS and WGS) as n and the total number of filtered markers as p, SNP-matrices can be represented as M = (m_si), with 1 ≤ i ≤ n and 1 ≤ s ≤ p. Given 1 ≤ j ≤ n, the Rogers’ distance⁵⁵ between the i-th and j-th genotypes is calculated as $\frac{1}{2p} \sum_{s=1}^{p} \left| m_{si} - m_{sj} \right|$. For each row of M, homozygous states for the reference and alternative alleles according to RefSeq v1.0 were coded as 0 and 2, respectively, while 1 represented the heterozygous state. A principal coordinate analysis (PCoA) was conducted on Rogers’ distance matrices using the cmdscale() function in R. The first two principal coordinates (PCo1 and PCo2) were retained to portray the molecular diversity captured by GBS and WGS, respectively, in biplots. To investigate the level of concordance between GBS and WGS, a Mantel correlation test⁵⁶ as implemented in the vegan R package (v2.5–7)⁵⁷ was performed on the Rogers’ distance matrices for 454 overlapping genotypes between both platforms.
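The Rogers' distance formula above is easy to verify by hand. The following Python sketch computes it for marker profiles coded 0, 1 and 2; it is an illustration only, not the custom R code deposited by the authors, and it assumes complete profiles without missing values. The PCoA step (cmdscale() in R) is not reproduced.

```python
def rogers_distance(geno_i, geno_j):
    """Rogers' distance between two genotypes coded 0/1/2 at p markers."""
    if len(geno_i) != len(geno_j):
        raise ValueError("profiles must cover the same markers")
    p = len(geno_i)
    # Assumption: no missing values; real data would need them handled first
    return sum(abs(a - b) for a, b in zip(geno_i, geno_j)) / (2 * p)


def distance_matrix(genotypes):
    """Symmetric n x n matrix of pairwise Rogers' distances."""
    n = len(genotypes)
    return [[rogers_distance(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


# Three toy genotypes scored at four markers (hypothetical data)
genotypes = [[0, 2, 2, 0], [2, 2, 0, 1], [0, 2, 2, 0]]
dist = distance_matrix(genotypes)
```

For the first two genotypes the absolute differences are 2, 0, 2 and 1, so the distance is 5 / (2 × 4) = 0.625. Two genotypes that are homozygous for opposite alleles at every marker reach the maximum distance of 1.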

SNP filtering resulted in 29,846 GBS and 1,452,806 WGS markers having a minimum QUAL score of 40, a maximum missing value rate of 10%, ≥10 genotypes carrying each of the two alleles in homozygous state, and up to 1% heterozygosity. This implied 24,091,446 and more than 67 billion intra-chromosomal marker combinations to be assessed for the GBS and WGS platforms, respectively. To reduce the computational burden for WGS SNP markers, variants were downsampled chromosome-wise to an expected value of 10 markers per Mb, resulting in 145,307 markers across 21 chromosomes and the unanchored sequences. Intra-chromosomal linkage disequilibrium between marker (column) pairs of t(M), the transpose of M, was estimated as the squared correlation coefficient (r²)⁵⁸, while physical distances were computed as the pairwise Euclidean distance between SNP marker positions on RefSeq v1.0 of Chinese Spring⁴¹. Efficient correlation computation was performed using the bigcor() function implemented in the propagate R package (v1.0–6)⁵⁹. After this, cubic splines were fitted in the R environment using smooth.spline() to estimate the r² decay as a function of the distances between marker pairs in different genetic pools: PGR samples, European elite cultivars and German elite breeding lines.
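For readers who want to see what the linkage disequilibrium computation amounts to, the Python sketch below pairs markers on one chromosome and returns their physical distance together with r². It is a naive pure-Python illustration with hypothetical names and toy data, not a substitute for bigcor(), and the spline fitting step is not shown.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two marker profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Monomorphic markers would give a zero denominator; filtering removes them
    return (sxy * sxy) / (sxx * syy)


def ld_pairs(markers):
    """markers: list of (position_bp, allele_codes) on a single chromosome."""
    return [(abs(pos_a - pos_b), r_squared(geno_a, geno_b))
            for (pos_a, geno_a), (pos_b, geno_b) in combinations(markers, 2)]


# Three toy markers scored in six genotypes (hypothetical data)
markers = [
    (1_000, [0, 0, 2, 2, 0, 2]),
    (5_000, [0, 0, 2, 2, 0, 2]),
    (900_000, [0, 2, 0, 2, 2, 0]),
]
pairs = ld_pairs(markers)
```

The first two markers are in complete linkage disequilibrium (r² = 1) at a distance of 4,000 bp, whereas the distant third marker shows r² = 1/9 with both. The number of pairs grows quadratically with the number of markers, which is the computational burden that motivated downsampling the WGS markers.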

Genomic-phenotypic data interoperability

The accuracy of the genomic best linear unbiased prediction (GBLUP)⁶⁰ was used as a quality measure for data interoperability of overlapping phenotypic and genomic datasets. Using matrix nomenclature, the mixed model⁵⁰ for GBLUP can be described as follows:

$$Y = 1_{n} \mu + Z g + e, \qquad (7)$$

where Y denotes an n-dimensional vector of trait values for n genotypes, 1_n is a unit vector of length n, μ indicates the fixed common population mean, Z corresponds to a design matrix connecting the elements of g to Y, g represents an n-dimensional vector of random genotypic values and e is the random residual term. Traits corresponded to either ready-to-use BLUEs of YR-scores across large-scale screening or precision experiments, or ready-to-use yield breeding values of PGR estimated across experiments using ‘Elite × PGR’ F₁ crosses. In Eq. (7), $g \sim N(0, \sigma_{g}^{2} G)$ and $e \sim N(0, \sigma_{e}^{2} I)$, where G is an additive genomic relationship matrix computed from markers based on the first method of VanRaden⁶⁰, I indicates an identity matrix, while $\sigma_{g}^{2}$ and $\sigma_{e}^{2}$ correspond to the genotypic and error variance components of the model, respectively. For G matrix computation, profiles in M were coded as −1 and 1 for the homozygous reference and alternative alleles, respectively, and as 0 for the heterozygous state, while missing values were imputed using the average of observed values for each particular locus. Prediction accuracies of GBLUP were estimated by means of five-fold cross-validations. For this, datasets containing both phenotypic and genomic data were randomly subdivided into five groups. The first four groups served together as the training set, whereas the fifth group corresponded to the prediction set. During prediction, the phenotypes of the prediction set were masked, while markers that were monomorphic across training and prediction sets were discarded for G matrix computation. After prediction, the accuracy was computed for genotypes in the prediction set as the Pearson correlation coefficient between predicted and observed values divided by $\sqrt{h^{2}}$, with h² being the heritability of the corresponding predicted trait.
The five groups were permuted so that each of them served four times as part of the training set and once as the prediction set. The random subdivision into five groups was repeated 20 times, giving a total of 5 × 20 = 100 cross-validation runs.
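The cross-validation layout and the accuracy measure can be sketched independently of the GBLUP model itself. The Python example below generates 20 random five-fold partitions (100 runs) and computes accuracy as the Pearson correlation divided by the square root of the heritability. The model fit (done with rrBLUP in the study) is deliberately left out, and the function names, seed and toy values are assumptions made for illustration.

```python
import random
from math import sqrt


def cross_validation_runs(ids, n_folds=5, n_repeats=20, seed=1):
    """Return (training_ids, prediction_ids) for every fold of every repeat."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    runs = []
    for _ in range(n_repeats):
        shuffled = list(ids)
        rng.shuffle(shuffled)
        folds = [shuffled[k::n_folds] for k in range(n_folds)]
        for k in range(n_folds):
            training = [g for j, fold in enumerate(folds) if j != k for g in fold]
            runs.append((training, folds[k]))
    return runs


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def prediction_accuracy(predicted, observed, h2):
    """Correlation between predicted and observed values, scaled by sqrt(h2)."""
    return pearson(predicted, observed) / sqrt(h2)


runs = cross_validation_runs([f"G{i}" for i in range(50)])
acc = prediction_accuracy([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], h2=0.64)
```

Each genotype appears in exactly one prediction set per repeat, so the 100 runs cover every genotype 20 times. Dividing by the square root of the heritability rescales the correlation with noisy phenotypes toward a correlation with true genotypic values, which is why the toy accuracy here exceeds 1 (1 / 0.8 = 1.25) for perfectly correlated inputs.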

Mixed model equations for genomic prediction were computed using REML as implemented in the rrBLUP R package (v4.6.1)⁶¹. All computational methods related to phenotypic analyses and genomic prediction were implemented within the R statistical environment⁴⁶ (v3.4.4, v3.6.1).

Data Records

Raw sequencing reads: FASTQ files containing raw reads for 8,070 (GBS) and 768 (WGS) genotypes were deposited at the European Nucleotide Archive⁶²: GBS⁶³ and WGS^64,65. Sequenced genotypes are findable through their ‘SAMEA’ IDs on EMBL-EBI BioSamples⁶⁶: a full list of integrated GBS and WGS ‘SAMEA’ BioSample IDs connected with plant material passports, passport data sources, SSD- and IPK genebank DOIs was deposited at e!DAL-PGP⁶⁷ and can be accessed here⁶⁸.

SNP markers: variant calling results based on read mapping against the reference sequence of Chinese Spring (RefSeq v1.0)⁴² were stored as VCF files. Unfiltered VCF files for GBS (‘090222_8070_sample_unfiltered_snps_biosample.vcf.gz’) and WGS (‘070222_768_samples_wgs_no_filter_biosample.vcf.gz’) data are located at the European Nucleotide Archive and can be accessed here⁶⁹. These files contain information on 1,628,276 and 213,804,916 SNP markers with a minimum QUAL = 40 and polymorphic among 8,070 GBS and 768 WGS samples, respectively. Genotype names are coded using the respective ‘SAMEA’ BioSample IDs⁶⁸. VCF files contain relevant information for each SNP regarding its chromosome, physical position on chromosome in bp, reference and alternative alleles, as well as QUAL. R objects containing reduced numbers of SNP markers used for technical validations were deposited into the e!DAL-PGP repository and can be accessed here⁷⁰.

Phenotypic records were also deposited into e!DAL-PGP:

YR-scorings: infection severity of Puccinia striiformis f. sp. tritici on wheat plants was recorded in the field on plots and expressed on a scoring scale from 1 (no symptoms) to 9 (severe infection) following the official protocols of the German Federal Plant Variety Office⁴⁸. Text files containing YR-scores were stored in ISA-Tab format. After outlier correction, the effective number of records from large-scale screening⁷¹ and precision experiments⁷² amounts to 35,043 and 15,353, respectively. Besides phenotypic records, each ISA-Tab file includes information that connects records with the corresponding plant material, incomplete block, replication, year, location, and experiment, in addition to plant material passports, passport data sources, SSD- and IPK genebank DOIs, as well as ‘SAMEA’ IDs. Ready-to-use BLUEs computed across large-scale screening experiments⁷¹ for 7,682 PGR and 80 elite cultivars as well as BLUEs across precision experiments⁷² for 199 elite cultivars and 600 SSD-PGR are available in the ‘BLUEs_and_heritabilities’ folder associated with each corresponding dataset.

Yield records: wheat grain yield was recorded in the field on plots and expressed in Mg ha⁻¹ on a 140 g H₂O kg⁻¹ moisture basis. Text files containing grain yield records were stored in ISA-Tab format. After correction for outliers and design effects, the effective number of phenotypic records in breeding value estimation experiments amounts to 7,407⁷³. The ISA-Tab file also contains information to connect phenotypic records with the corresponding tested material, material type (hybrid or line), parents, year, location, experiment, and series, in addition to plant material passports of parent lines, their passport data sources, SSD- and IPK genebank DOIs, as well as ‘SAMEA’ IDs. Ready-to-use breeding values computed across estimation experiments for 707 PGR can be found in the respective ‘Breeding_values_and_heritabilities’ folder⁷³. For breeding value validation experiments, the number of yield records is 739⁷⁴. The corresponding ISA-Tab file also includes the information needed to connect phenotypic records to the respective plant material, FAMILY-DOI, incomplete block, year, location, and experiment, in addition to plant material passports of check cultivars and parent lines, their passport data sources, SSD- and IPK genebank DOIs, as well as ‘SAMEA’ IDs. Ready-to-use BLUEs of grain yield computed across validation experiments are also available for 189 advanced F_3:4 progenies and 15 elite cultivar checks in the corresponding ‘BLUEs_and_heritabilities’ folder⁷⁴.

For more details on genomic and phenotypic data production, preparation, and processing, please refer to the Methods section. Machine-readable details are also included in an ‘i_investigation.txt’ file associated with each phenotypic dataset^71–74.

Technical Validation

Genotyping-by-sequencing is a precise and cost-efficient platform to study molecular diversity in genebanks

For GBS, the average total read count per genotype (after trimming) was ~2.64 million, while a WGS sample had on average ~354.8 million reads (Table 3). These numbers decreased slightly to ~2.59 million (98.3% of total) and ~349.1 million (98.4%) after read mapping against Chinese Spring (RefSeq v1.0)⁴², with 62.7% (GBS) and 47.7% (WGS) of the average mapped read count per genotype having a mapping quality >q20. These reads (>q20) were retained during variant calling and yielded VCF files for 8,070 GBS and 768 WGS samples, respectively. SNP markers with QUAL ≥40, a missing value rate ≤10%, at least 10 genotypes carrying each allele in homozygous state, and a maximum of 1% heterozygosity were used to assess the molecular diversity, linkage disequilibrium and genomic-phenotypic data interoperability. The latter is presented after the quality assessment sections for the phenotypic data. After filtering, a total of 29,846 SNP markers across 8,070 genotypes were available for GBS, while the SNP-matrix for WGS contained 1,452,806 markers across 768 genotypes⁷⁰.
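The percentages quoted above can be recomputed from the averages in Table 3. The short Python check below suggests that the >q20 shares are expressed relative to the average number of mapped reads rather than to all trimmed reads; the dictionary keys and the helper name are hypothetical.

```python
def percent(part, whole):
    """Share of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Average read counts per genotype taken from Table 3
gbs = {"all": 2_637_070.2, "mapped": 2_591_115.5, "q20": 1_623_899.0}
wgs = {"all": 354_818_926.3, "mapped": 349_082_726.1, "q20": 166_505_569.7}

mapped_share = (percent(gbs["mapped"], gbs["all"]), percent(wgs["mapped"], wgs["all"]))
q20_share = (percent(gbs["q20"], gbs["mapped"]), percent(wgs["q20"], wgs["mapped"]))
```

This reproduces the mapped shares of 98.3% (GBS) and 98.4% (WGS) as well as the >q20 shares of 62.7% and 47.7%.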

Table 3.

Minimum, maximum, median and average sequencing read counts per genotype characterized with genotyping-by-sequencing (GBS) or whole-genome sequencing (WGS, 3-fold coverage).

Reads	Minimum	Median	Average	Maximum
GBS (8,070 genotypes)
All^a	778,809	2,406,622	2,637,070.2	27,663,505
Mapped, total	757,460	2,369,798.5	2,591,115.5	27,296,923
Mapped, >q1	576,566	1,798,418.5	1,962,639.1	20,679,706
Mapped, >q20	481,039	1,486,933	1,623,899	17,034,192
Mapped, >q30	415,903	1,284,438.5	1,402,438	14,765,665
WGS (768 genotypes)
All	56,358,130	346,951,848	354,818,926.3	1,015,707,996
Mapped, total	55,439,721	341,737,943	349,082,726.1	999,714,679
Mapped, >q10	27,616,390	178,233,289.5	178,555,659.1	473,928,110
Mapped, >q20	25,815,724	166,330,159.5	166,505,569.7	439,321,203
Mapped, >q30	24,084,200	154,655,202	154,777,065.9	406,186,889

Read counts are presented according to different minimum read mapping quality (q) levels.

^aAfter trimming.

A PCoA was conducted on the Rogers’ distances to assess the molecular diversity among the 8,070 GBS and the 768 WGS samples (Fig. 3a,b). Rogers’ distance matrices and custom R code for their computation are also available here⁷⁰. PCo1 and PCo2 explained 9.5% and 5.6% of the molecular variation portrayed by GBS-SNP markers, respectively, while the reduced number of samples for WGS slightly increased the percentage of explained variation to 13.1% (PCo1) and 7.1% (PCo2). As expected, PGR samples expand the molecular diversity of the elite pool, whose genotypes cluster very close to each other in the left corner of the biplots. Coordinates of the German elite breeding lines formed a slightly more contracted group than those of elite cultivars (Fig. 3b). Nevertheless, the good overlap between these two groups reflects the continuous material exchange that takes place between European wheat breeders^75,76.

The intra-chromosomal decay of linkage disequilibrium (r²) as a function of physical distance was estimated by fitting cubic spline curves for each genotypic group and genotyping platform (Fig. 3c,d). Independent of the genotyping platform and as already observed in past studies^77,78, r² values decay faster in genebank samples than in modern elite genotypes. European elite cultivars in turn presented a slightly faster decay of linkage disequilibrium than the German elite breeding lines (Fig. 3d). Since differences in r² decay between GBS (Fig. 3c) and WGS (Fig. 3d) platforms were less pronounced for elite cultivars, we attribute the faster r² decay in PGR samples portrayed by GBS to the large difference in population size (7,745 GBS vs 446 WGS samples) rather than to the genotyping platforms themselves.

As also reported in our companion study¹⁴, the correlation between GBS- and WGS-based Rogers’ distances among 454 genotypes characterized with both platforms amounted to 0.88 (Mantel correlation test p-value = 0.001). According to correlation estimates from past works comparing GBS with other mainstream genotyping platforms to assess crop plant genebank diversity^7,79, the correspondence between GBS and WGS observed in our study is very high. This is noteworthy, considering that WGS reads deliver the least biased genome representation possible. GBS is a cost-effective and simple method that reduces the complexity of genomes. On the one hand, complexity reduction obviously limits the depth of analysis for large and complex genomes of species such as hexaploid wheat. On the other hand, if the primary objective of characterizations is the macro-assessment of molecular diversity and linkage-disequilibrium in wheat populations, GBS is the method of choice^7,79.

Large-scale screening and precision experiments revealed yellow rust resistance as a rare phenotype in the German Federal ex situ genebank

After outlier correction, heritability estimates for replicated experiments were in general higher than 0.7, with the only exception being SST_2018_5 (h² = 0.54) (Table 4). The highest heritabilities (h² = 0.92) were estimated in the inoculated precision experiments ROS_YR_2019 and WTZ_YR_2019. Due to material mislabeling during the conduct of the experiments, the cultivar ‘PilgrimPZO’ was completely discarded from further phenotypic analyses. The effective number of entries with YR scores (either BLUEs or single point values) ranged from 1,395 to 1,669 per individual screening experiment and between 722 and 797 for precision experiments. Because of the unbalanced structure of large-scale screening experiments, the pairwise entry overlap ranged between four and 1,641 common genotypes. In contrast, these numbers were higher in balanced precision experiments, ranging from 697 to 797 common entries between experiments. In our companion study¹⁴, PGR tested in the first five large-scale screening experiments constituted the base population for trait-customized core selection of the 600 SSD-PGR tested in precision experiments. For this reason, the pairwise entry overlap between large-scale screenings and precision experiments drops drastically from 259–339 to 13–26 in later experiments. All significant pairwise correlations (p-value < 0.05) between design-corrected YR scores from different experiments were positive and ranged between 0.29 and 0.92. In analyses across experiments, the heritability of YR-scores was 0.82 for the large-scale screening and 0.89 for precision experiments. Presumably due to unbalanced phenotyping, a very small proportion (0.4%) of BLUEs computed across large-scale screening experiments for 7,682 PGR plus 80 elite cultivars lay outside the 1–9 parametric space (Fig. 4). This bias was not observed in the BLUEs computed across precision experiments for 199 elite cultivars and 600 SSD-PGR.
Nevertheless, such a bias is negligible considering the strong correlation [r = 0.77, -log₁₀(p-value) = 128.4] of BLUEs across experiments for the overlapping material between precision and large-scale screening experiments. Large-scale screening experiments revealed that only a small PGR fraction (8.4%) has fewer infection symptoms than an average elite cultivar. We showed in our companion study¹⁴ that this resistant PGR fraction is enriched with material of European origin that entered the IPK genebank during recent decades. The implemented trait-customized core selection approach¹⁴ more than tripled the proportion of YR-resistant PGR in precision experiments (to 27.5%) while increasing exotic molecular diversity and reducing the association between population structure and trait variation. This provided the basis to identify genetically diverse PGR donors of YR resistance sources not yet used in elite breeding¹⁴, which are currently being validated using classical and functional genetics approaches. Heritabilities, BLUEs and custom R code for their computation are also available in the respective ‘BLUEs_and_heritabilities’ folders^71,72.

Table 4.

Heritabilities (h²) and matrix with the effective number of entries (underlined diagonal values) for outlier-and-experiment-design-corrected yellow rust (Puccinia striiformis f. sp. tritici) infections scored in 12 experiments of a large-scale screening and seven precision experiments as well as the significant correlations (p-value < 0.05, above diagonal) and number of overlapping entries (below diagonal) among them.

	Experiment^a	Type^b		Large-scale screening												Precision experiments
			Experiment	GAT_2014_1	SST_2014_1	GAT_2014_2	GAT_2015_2	SST_2015_2	GAT_2015_3	GAT_2016_3	GAT_2017_3	GAT_2017_5	SST_2018_5	GAT_2019_6	SST_2019_6	GAT_YR_2018			ROS_YR_2018	WTZ_YR_2018			GAT_YR_2019			QLB_YR_2019			ROS_YR_2019	WTZ_YR_2019
			Type	1x	1x	1x	1x	1x	1x	1x	1x	1x	1x	M:2x	1x	E	L	M:2x	1x	E	L	M:2x	E	L	M:2x	E	L	M:2x	1x	1x
			h²	1x	1x	1x	1x	1x	1x	1x	1x	1x	1x	M:2x	1x	E	L	M:2x	1x	E	L	M:2x	E	L	M:2x	E	L	M:2x	1x	1x
Large-scale screening	GAT_2014_1	1x	0.86	1514	0.76	0.85	0.79	0.71	0.87	0.51	—	—	—	—	—	0.65	0.63	0.67	0.74	0.68	0.64	0.64	0.53	0.55	0.54	0.57	0.54	0.58	0.69	0.70
	SST_2014_1	1x	0.84	1491	1514	0.79	0.54	0.60	0.73	0.53	0.64	—	—	—	—	0.62	0.61	0.65	0.62	0.72	0.68	0.68	0.50	0.54	0.54	0.51	0.50	0.55	0.72	0.69
	GAT_2014_2	1x	0.88	14	13	1544	0.60	0.70	0.84	0.78	0.65	—	—	—	—	0.55	0.49	0.51	0.59	0.59	0.60	0.60	0.51	0.49	0.50	0.34	0.42	0.44	0.64	0.63
	GAT_2015_2	1x	0.79	15	14	1502	1516	0.69	0.67	0.92	0.72	—	—	—	—	0.54	0.47	0.48	0.51	0.47	0.54	0.54	0.48	0.49	0.50	0.29	0.40	0.40	0.53	0.53
	SST_2015_2	1x	0.83	15	14	1453	1444	1467	0.76	0.51	—	—	—	—	—	0.50	0.46	0.47	0.50	0.64	0.57	0.57	0.42	0.44	0.44	0.34	0.38	0.40	0.62	0.64
	GAT_2015_3	1x	0.90	14	14	12	13	13	1579	0.76	0.75	—	—	—	0.55	—	0.52	0.52	—	—	—	—	—	0.54	0.56	0.61	—	0.59	—	0.54
	GAT_2016_3	1x	0.90	17	16	18	19	19	1525	1560	—	0.39	—	0.55	0.68	0.47	—	—	—	—	—	—	—	—	—	—	—	—	—	—
	GAT_2017_3	1x	0.86	16	15	18	19	19	1504	1512	1539	0.58	—	—	0.49	—	—	—	0.40	—	—	—	—	—	—	0.50	0.55	0.52	0.40	0.60
	GAT_2017_5	1x	0.76	7	7	5	6	6	24	26	24	1395	0.34	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	0.50	—
	SST_2018_5	1x	0.54	7	7	5	6	6	23	25	23	1363	1415	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—
	GAT_2019_6	M:2x	0.74	8	8	4	5	5	20	22	20	23	23	1667	0.55	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—
	SST_2019_6	1x	—	8	8	4	5	5	21	23	21	24	24	1641	1669	—	—	—	—	—	—	—	—	—	—	—	—	—	—	—
Precision experiments	GAT_YR_2018	E	0.91	282	282	330	331	319	19	26	26	16	15	14	14	779	0.80	0.88	0.67	0.50	0.59	0.59	0.70	0.76	0.76	0.52	0.53	0.55	0.61	0.61
		L	0.81	286	286	331	332	320	19	26	26	16	15	14	14	773	784	0.97	0.65	0.48	0.56	0.56	0.62	0.69	0.69	0.48	0.48	0.50	0.59	0.60
		M:2x	—	280	280	327	328	316	19	26	26	16	15	14	14	773	773	773	0.67	0.51	0.58	0.58	0.66	0.73	0.73	0.52	0.53	0.55	0.61	0.63
	ROS_YR_2018	1x	0.91	259	259	302	303	292	19	26	26	16	15	14	14	704	709	698	722	0.61	0.63	0.63	0.48	0.53	0.52	0.53	0.56	0.58	0.71	0.70
	WTZ_YR_2018	E	0.89	287	287	334	335	323	19	26	26	16	15	14	14	771	776	765	715	789	0.69	0.69	0.40	0.45	0.45	0.48	0.51	0.54	0.72	0.77
		L	0.88	285	286	328	329	317	19	25	25	16	15	14	14	756	761	750	703	768	775	1.00	0.48	0.54	0.54	0.44	0.47	0.50	0.72	0.69
		M:2x	—	282	283	324	325	313	19	25	25	16	15	14	14	750	755	744	697	768	768	768	0.48	0.54	0.54	0.44	0.47	0.50	0.72	0.69
	GAT_YR_2019	E	0.77	286	284	334	335	323	18	25	25	14	13	13	13	764	769	758	709	774	759	753	782	0.73	0.74	0.44	0.49	0.51	0.51	0.48
		L	0.85	285	285	332	333	321	19	26	26	15	14	13	13	766	771	760	708	776	761	755	771	784	1.00	0.47	0.52	0.53	0.54	0.56
		M:2x	—	281	279	328	329	317	18	25	25	14	13	13	13	753	758	747	698	763	748	742	771	771	771	0.47	0.51	0.52	0.53	0.55
	QLB_YR_2019	E	—	289	289	338	339	327	19	26	26	16	15	14	14	777	782	771	720	788	773	767	780	782	769	797	0.70	0.79	0.55	0.54
		L	—	289	289	338	339	327	19	26	26	16	15	14	14	777	782	771	720	788	773	767	780	782	769	797	797	0.97	0.58	0.57
		M:2x	—	289	289	338	339	327	19	26	26	16	15	14	14	777	782	771	720	788	773	767	780	782	769	797	797	797	0.62	0.61
	ROS_YR_2019	1x	0.92	283	283	335	336	324	19	26	26	16	15	14	14	766	771	760	713	777	764	757	769	771	758	784	784	784	787	0.84
	WTZ_YR_2019	1x	0.92	281	281	332	333	322	19	26	26	16	15	14	14	760	765	754	706	771	757	750	763	765	752	778	778	778	772	780


^aAll comparisons were performed using best linear unbiased estimations, except for SST_2019_6 and QLB_YR_2019, for which block-wise corrected data were used.

^bIn experiments, either a single measurement (1x), both early (E) and late (L) measurements and/or the maximum value among them (M:2x) were used in comparisons.

Fig. 4 — Distribution of the best linear unbiased estimations (BLUEs) across experiments for outlier-corrected yellow rust (YR, *Puccinia striiformis* f. sp. *tritici*) infections of plant genetic resources (PGR or SSD-PGR) and elite cultivars (Elite) tested in precision experiments (boxplot, upper left corner), large-scale screenings (boxplot, lower right) or both types of field experiments (scatter plot, upper right). YR infections were scored on an ordinal rating scale from 1 to 9, where 1 means complete absence of YR leaf symptoms and 9 denotes fully infected leaves. BLUEs that lie outside the 1–9 parametric space result from the non-orthogonal structure of unbalanced experiments. In total, 19 field experiments were conducted between harvest years 2015 and 2020 at five German locations. Large-scale screenings relied fully on natural YR infections, while five out of seven precision experiments were artificially inoculated. The exact number of genotypes in each category is given in square brackets []. In boxplots, boxes enclose the central 50% of the data, including the median (black bold line) and mean (black diamond), while whiskers extend to ± 1.5 × the interquartile range and dots represent extreme values. In the scatter plot, ** denotes the significance [-log₁₀(p-value) = 128.4] of the correlation between YR scores from precision and large-scale screening experiments.

Yield breeding values of plant genetic resources to inform breeders and initiate pre-breeding programs

After outlier-and-design correction, the effective number of entries with yield records ranged between 238 and 500 for individual breeding value estimation experiments. Because plant material was tested series-wise, the highest numbers of overlapping entries with yield records were observed between experiments conducted within the same year (Table 5). In more detail, the number of pairwise overlapping entries among the 22 breeding value estimation experiments ranged between 0 and 500. All significant pairwise correlations (p-value < 0.05) between yield records of different experiments were positive, with magnitudes ranging between 0.11 and 0.96. Across experiments, 37 elite cultivars, 227 PGR, 1,429 ‘Elite × PGR’ F₁ hybrids and four ‘Elite₁ × Elite₂’ F₁ hybrids have yield records in the outlier-and-design-corrected dataset. The heritability of per se yield performance amounted to 0.89 for elite cultivars and PGR together, while the heritability of hybrid performance was 0.50, as also reported in our companion work¹⁴. The 1,429 ‘Elite × PGR’ hybrids originate from crossing 36 elite cultivars with 205 PGR and 510 SSD-PGR which, put together, trace back to 707 PGR tested in 1,427 merged ‘Elite × PGR’ hybrid crosses. Yield breeding values of PGR computed across the 22 estimation experiments ranged between 6.11 and 7.11 Mg/ha, with a mean of 6.79 Mg/ha (Fig. 5a). As reported in our companion work¹⁴, the heritability of breeding values was 0.32, which reflects the complexity of handling less than half of the genetic variation underlying yield in hybrids. Custom R code for breeding value estimation of PGR and heritability computations, together with its expected outputs, is available in the ‘Breeding_values_and_heritabilities’ folder⁷³.
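The layout of Table 5 (entries per experiment on the diagonal, overlapping entries below it, significant correlations above it) can be illustrated with a minimal Python sketch. The experiment labels are borrowed from Table 5, but the genotype identifiers and yield values are invented toy data. The significance test uses a Fisher z approximation chosen here for brevity, and the function names are hypothetical. This is not the authors' R code, which is deposited in the cited repository.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long lists (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z (needs n > 3)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def pairwise_summary(trials, alpha=0.05):
    """For every pair of experiments, count shared entries and keep r only if significant."""
    names = list(trials)
    table = {}
    for i, a in enumerate(names):
        for b in names[i:]:
            shared = sorted(set(trials[a]) & set(trials[b]))
            n = len(shared)
            r = None
            if a != b and n > 3:
                x = [trials[a][g] for g in shared]
                y = [trials[b][g] for g in shared]
                r_ab = pearson(x, y)
                if fisher_p_value(r_ab, n) < alpha:
                    r = round(r_ab, 2)
            table[(a, b)] = {"n_shared": n, "r": r}
    return table

# Toy yield records (Mg/ha) per experiment, keyed by made-up genotype identifiers.
trials = {
    "GAT_2016": {"g1": 6.9, "g2": 7.2, "g3": 6.4, "g4": 7.0, "g5": 6.6, "g6": 7.4},
    "SST_2016": {"g1": 6.7, "g2": 7.1, "g3": 6.5, "g4": 6.9, "g5": 6.4, "g6": 7.3},
    "GAT_2020": {"g7": 6.8, "g8": 7.0},
}
summary = pairwise_summary(trials)
for pair, cell in summary.items():
    print(pair, cell)
```

Pairs with too few shared entries, or with a non-significant correlation, keep `r = None`, which corresponds to the "—" cells of the table.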

Table 5.

Matrix with the effective number of entries per experiment (underlined diagonal values), significant pairwise correlations (p-value < 0.05, above diagonal) and number of overlapping entries (below diagonal) among the 22 outlier-and-design-corrected experiments used to estimate the yield breeding values of 707 PGR using ‘Elite × PGR’ F₁ hybrid crosses.

Experiment	BOH_2016	GAT_2016	HOH_2016	RNG_2016	SST_2016	ASD_2017	GAT_2017	HOH_2017	RNG_2017	SST_2017	GAT_2018	HDM_2018	HOH_2018	RNG_2018	SST_2018	GAT_2019	HDM_2019	HOH_2019	RNG_2019	SST_2019	GAT_2020	SST_2020
BOH_2016	240	0.91	0.85	0.83	0.90	0.35	0.47	0.32	0.26	0.26	—	—	0.72	0.93	—	—	—	—	—	—	—	—
GAT_2016	240	240	0.82	0.84	0.89	0.36	0.65	0.29	—	—	0.72	—	0.72	0.96	0.81	—	—	—	—	—	—	—
HOH_2016	240	240	240	0.79	0.83	0.29	0.59	0.39	0.28	0.27	0.67	—	—	0.92	—	—	—	—	—	—	—	—
RNG_2016	238	238	238	238	0.77	—	—	—	—	0.29	0.72	—	—	0.95	0.78	—	—	—	—	—	—	—
SST_2016	239	239	239	237	239	0.46	0.68	0.35	—	—	—	—	0.89	0.89	—	—	—	—	—	—	—	—
ASD_2017	94	94	94	94	94	353	0.62	0.58	0.65	0.30	—	—	—	—	—	—	—	—	—	—	—	—
GAT_2017	95	95	95	95	95	353	354	0.57	0.69	0.45	—	—	—	0.89	—	—	—	—	—	—	—	—
HOH_2017	93	93	93	93	93	338	339	340	0.58	0.17	—	—	0.79	—	—	—	—	—	—	—	—	—
RNG_2017	94	94	94	94	94	348	349	334	349	0.40	—	—	—	—	—	—	—	—	—	—	—	—
SST_2017	95	95	95	95	95	347	348	334	343	348	—	—	—	—	0.81	—	—	—	—	—	—	—
GAT_2018	9	9	9	9	9	8	8	7	7	8	292	—	0.54	0.56	0.49	—	—	—	—	—	—	—
HDM_2018	9	9	9	9	9	7	7	6	6	7	279	316	0.18	0.11	0.17	—	—	—	—	—	—	—
HOH_2018	9	9	9	9	9	8	8	7	7	8	289	313	328	0.80	0.53	—	—	—	—	—	—	—
RNG_2018	9	9	9	9	9	8	8	7	7	8	288	312	324	328	0.49	—	—	—	—	—	—	—
SST_2018	7	7	7	7	7	7	7	6	6	7	256	244	253	252	257	—	—	—	—	—	—	—
GAT_2019	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	374	0.32	—	0.51	0.53	—	—
HDM_2019	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	374	377	—	—	0.54	—	—
HOH_2019	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	320	322	322	—	0.17	—	—
RNG_2019	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	363	366	312	366	0.29	—	—
SST_2019	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	372	375	320	364	375	—	—
GAT_2020	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	500	0.55
SST_2020	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	500	500


Fig. 5 — Using yield breeding value estimates of plant genetic resources (PGR) to initiate pre-breeding programs in wheat. (a) Kernel density distribution of yield breeding values (Mg/ha) for 707 PGR. Breeding values were estimated using yield data of ‘Elite × PGR’ F₁ hybrids from 22 field experiments conducted between harvest years 2016 and 2020. Based on preliminary data from 2016, 13 PGR with superior breeding values were used as male parents in two-way (Elite₁ × PGR) and three-way (Elite₂ × [Elite₁ × PGR]) crosses involving 11 adapted elite cultivars. Vertical dashed lines indicate the breeding values of the selected PGR estimated across the full set of 22 experiments. (b) After two-stage selection for high leaf health and reduced plant height, 173 advanced F_3:4 PGR-derived progenies tracing back to 32 initial crosses were tested together with 15 elite cultivar checks (black dots) and 16 additional IPK pre-breeding lines (gray dots) in yield validation experiments conducted at two locations during harvest years 2020 and 2021. The best linear unbiased estimations of yield (Mg/ha) computed across validation experiments are shown for the tested material, grouped by initial cross. The color legend of PGR-derived populations matches that of the selected PGR parents used in the initial crosses. Horizontal dotted and dashed lines indicate the yield performance of the best among the newest cultivars (‘LGCharacter’) and of the most widely grown cultivar of the last decade in Germany (‘RGTReform’), respectively.

Based on preliminary results from estimation experiments conducted during harvest year 2016, 13 PGR were selected for their superior breeding value to initiate a small pre-breeding program using 11 European elite cultivars as breeding value receptors (Fig. 5). Computed across the 22 estimation experiments, breeding values of the selected PGR ranged from 6.80 to 7.05 Mg/ha and were superior to the general mean of breeding value estimates (Fig. 5a). Progenies from seven initial Elite₁ × PGR crosses and 25 initial Elite₂ × [Elite₁ × PGR] crosses were advanced to F_3:4 and pre-selected for good visual performance. In total, 173 advanced F_3:4 progenies, with at least one progeny per initial cross, were tested for grain yield together with 15 elite cultivar checks and 16 additional pre-breeding lines in four validation experiments conducted at two locations during harvest years 2020 and 2021 (Fig. 5b). Custom R code for BLUEs and heritability computation, together with its expected output files, is available in the ‘BLUEs_and_heritabilities’ folder⁷⁴. Yield performances were highly repeatable, as reflected by the heritabilities within (h² ≥ 0.76, Table 6) and across (h² = 0.76) validation experiments as well as by the significant positive correlations among them (r ≥ 0.68, Table 6). In general, the highest yields were observed for elite check cultivars, with grain yield values ranging between 5.94 and 7.48 Mg/ha (Fig. 5b). For these checks, the year of cultivar release was significantly and positively correlated with yield performance (r = 0.79, p-value < 0.001), reflecting the advances in yield breeding achieved between 2007 and 2020. Among the 173 + 16 = 189 advanced F_3:4 progenies, at most two progenies, with an average of only 1.1, per initial Elite₁ × PGR cross made it through the first selection stages based on visual performance. 
Adding a second elite cultivar to pedigrees as Elite₂ × [Elite₁ × PGR] generally increased the number of progenies with good visual performance per initial cross to an average of 6.7, with the least and most prolific crosses having one and 21 F_3:4 progenies, respectively. As a group, two-way crosses were also significantly less competitive than three-way crosses ($\hat{\mu}_{\text{Three-way}} - \hat{\mu}_{\text{Two-way}}$ = 0.72 Mg/ha, p-value < 0.001). The lower output of two-way crosses could be attributed to the larger proportion of deleterious PGR background still present in them, which also indicates that a realistic use of PGR variation for pre-breeding is achieved through three-way crosses. In fact, three-way crossing schemes are already the main strategy to introduce PGR variation into large-scale global pre-breeding programs such as Seeds of Discovery²² (https://seedsofdiscovery.org/). Although none of the PGR-derived progenies was as competitive as the best check cultivar ‘Informer’ (released in 2018), among the 173 F_3:4 progenies whose PGR parents were selected based on breeding values, three (2%) and 15 (9%) had better yield performance than the best among the newest cultivars (‘LGCharacter’, released in 2020) and the locally most widely grown cultivar (‘RGTReform’, released in 2014), respectively. None of the additional 16 pre-breeding lines, whose PGR parents lack breeding value estimates, reached these yield levels. Global efforts of the Seeds of Discovery initiative led to the development of 2,867 pre-breeding lines that trace back to 366 exotic wheat PGR²². Multi-environment yield trials conducted in Central America and South Asia revealed that locally, up to ~2% of these pre-breeding lines have better yield performance than the best adapted varieties. Moreover, pre-breeding lines with beneficial traits trace back to 62 of the 366 exotic founders (17%) used as PGR parents by Seeds of Discovery²². 
In our small pre-breeding program, established using breeding values as a parent selection tool for PGR (Fig. 5), competitive pre-breeding lines trace back to eight of the 13 selected PGR parents (62%). All in all, at least with regard to visual performance and yield, our validation experiments suggest that using breeding values as a selection tool could boost the input-to-output ratio of pre-breeding programs.
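The proportions quoted in this comparison follow directly from the counts given in the text, as the short Python check below shows. The variable and function names are illustrative only; the counts themselves are taken from the paragraph above.

```python
def percent(part, total):
    """Share of `part` in `total`, rounded to the nearest whole percent."""
    return round(100 * part / total)

# Counts taken from the text above.
progenies_tested = 173            # F_3:4 progenies from PGR selected by breeding value
better_than_lg_character = 3      # progenies out-yielding 'LGCharacter'
better_than_rgt_reform = 15       # progenies out-yielding 'RGTReform'
selected_pgr = 13                 # PGR parents chosen by breeding value
pgr_with_competitive_lines = 8    # of those, parents of competitive lines
sod_founders = 366                # exotic founders used by Seeds of Discovery
sod_founders_with_lines = 62      # founders of lines with beneficial traits

shares = {
    "progenies above LGCharacter": percent(better_than_lg_character, progenies_tested),
    "progenies above RGTReform": percent(better_than_rgt_reform, progenies_tested),
    "selected PGR with competitive lines": percent(pgr_with_competitive_lines, selected_pgr),
    "Seeds of Discovery founders with beneficial lines": percent(sod_founders_with_lines, sod_founders),
}
for label, value in shares.items():
    print(f"{label}: {value}%")
```

The last two shares are the input-to-output figures contrasted in the text (62% versus 17%). The two programs applied different success criteria ("competitive" lines versus lines "with beneficial traits"), so the comparison is indicative rather than like-for-like.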

Table 6.

Heritabilities (h²) and matrix containing the effective numbers of entries (underlined diagonal values) for best linear unbiased estimates of yield in four pre-breeding validation experiments as well as the significant correlations (p-value < 0.05, above diagonal) and number of overlapping entries (below diagonal) among them.

Experiment	h²	GAT_PB_2019	SST_PB_2019	GAT_PB_2020	SST_PB_2020
GAT_PB_2019	0.91	95	0.72	—	—
SST_PB_2019	0.85	56	57	—	—
GAT_PB_2020	0.87	3	3	118	0.68
SST_PB_2020	0.76	3	3	108	108


High genomic prediction accuracies support the suitability of data for integrated phenotypic-genomic analyses

Seed mixtures and sample mislabeling, among other sources of systematic error, can occur in large-scale characterizations. Such errors disrupt the connectivity between genotype and phenotype and, in turn, decrease the value of the data for integrated analyses. To rule out such genotype-phenotype mismatches, we used the cross-validated accuracy of genomic prediction as a quality measure for genomic-phenotypic data interoperability (Fig. 6).

Integrating YR scorings and GBS data resulted in 6,371 and 799 genotyped samples having records in large-scale screenings and precision experiments, respectively (Fig. 6). In cross-validations, the genomic-phenotypic data interoperability between YR scorings and GBS was generally high, with prediction accuracies of 0.72 ± 0.02 and 0.69 ± 0.05 for large-scale screenings and precision experiments, respectively. As expected from past simulation studies^80,81, the slightly higher accuracy for large-scale screenings than for precision experiments can be attributed to the ~8 times larger population size of the former group, which in turn also provided a small increase (~6%) in the number of polymorphic markers used for prediction. Shifting to WGS decreased population sizes for large-scale screenings and precision experiments to 309 and 454 samples, but accuracies dropped only slightly, to 0.69 ± 0.09 and 0.66 ± 0.08, respectively. Moreover, the ~55-fold increase in the number of polymorphic markers from WGS provided practically no improvement in accuracy over GBS when the same population size was considered for both genotyping platforms (see GBS* in Fig. 6). This last observation is not surprising considering that GBLUP⁶⁰ mostly relies on relatedness for prediction^80,81 and that, as mentioned before, GBS- and WGS-based Rogers’ distances were highly correlated. Although beyond the scope of this work, we anticipate that alternative genomic prediction methods less dependent on relatedness^80,81 would benefit more from the increased marker densities provided by WGS.

GBS samples of PGR that also have breeding value estimates amount to 597 (Fig. 6). Prediction accuracies of 0.68 ± 0.1 also support the high genomic-phenotypic data interoperability for these samples and come very close to the estimates presented in our companion study, which used 1,000 instead of 100 cross-validations¹⁴. Finally, only 24 PGR samples have both WGS data and breeding value estimates, too few genotypes to meaningfully assess the genomic-phenotypic data interoperability. Although integrated analyses of WGS data and yield breeding values are currently not advisable, PGR are available upon request, and we thus encourage future activities that increase the connectivity between these two types of data. Custom R code to assess the genomic-phenotypic data interoperability, together with its required inputs and expected outputs, is available here⁷⁰.
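The idea of using cross-validated GBLUP accuracy as a data quality measure can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' R implementation deposited in the repository. The markers and phenotypes are simulated, the genomic relationship matrix follows VanRaden's first method, and the variance ratio is fixed through an assumed heritability instead of being estimated. The five-fold scheme with ten repetitions and the use of the Pearson correlation between predicted and observed values as "accuracy" are likewise choices made here for illustration. All function names are hypothetical.

```python
import numpy as np

def vanraden_g(markers):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 marker codes."""
    p = markers.mean(axis=0) / 2.0           # allele frequency per marker
    z = markers - 2.0 * p                    # centre every marker column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_cv_accuracy(g, y, h2=0.5, folds=5, runs=10, seed=0):
    """Mean and SD of cross-validated prediction accuracy of a simple GBLUP.

    The residual-to-genetic variance ratio is fixed at (1 - h2) / h2, which is
    an assumption of this sketch; a full analysis would estimate it from data.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    lam = (1.0 - h2) / h2
    accuracies = []
    for _ in range(runs):
        order = rng.permutation(n)
        for test in np.array_split(order, folds):
            train = np.setdiff1d(order, test)
            mu = y[train].mean()
            lhs = g[np.ix_(train, train)] + lam * np.eye(len(train))
            alpha = np.linalg.solve(lhs, y[train] - mu)
            pred = mu + g[np.ix_(test, train)] @ alpha
            accuracies.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accuracies)), float(np.std(accuracies))

# Simulated toy data: 300 genotypes, 200 independent markers, heritability 0.7.
rng = np.random.default_rng(1)
n_genotypes, n_markers, h2_sim = 300, 200, 0.7
freq = rng.uniform(0.1, 0.9, size=n_markers)
markers = rng.binomial(2, freq, size=(n_genotypes, n_markers)).astype(float)
g_true = markers @ rng.normal(size=n_markers)
g_true = (g_true - g_true.mean()) / g_true.std()
y = g_true + rng.normal(scale=np.sqrt((1.0 - h2_sim) / h2_sim), size=n_genotypes)

g = vanraden_g(markers)
mean_acc, sd_acc = gblup_cv_accuracy(g, y, h2=h2_sim)
print(f"accuracy: {mean_acc:.2f} ± {sd_acc:.2f}")

# Mislabeling check: shuffling phenotypes breaks the genotype-phenotype link.
shuffled_acc, _ = gblup_cv_accuracy(g, rng.permutation(y), h2=h2_sim)
print(f"accuracy after shuffling phenotypes: {shuffled_acc:.2f}")
```

The shuffled run mimics a dataset in which samples were mislabeled. Its accuracy should fall towards zero, which is why a clearly positive cross-validated accuracy is taken as evidence that genotypes and phenotypes are correctly connected.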

Usage Notes

We expect that these FAIR data support and encourage future research and breeding initiatives that further valorize crop plant genebanks. The genebank material of the ‘TRI’ catalogue is available upon request using IPK genebank DOIs and can be accessed through GBIS (gbis.ipk-gatersleben.de) under the conditions of a standard material transfer agreement (SMTA). Seeds of field isolated accessions and pre-breeding material can be requested upon availability through their SSD- and FAMILY-DOIs, respectively, by following SMTA conditions as well (contact e-mail: reif@ipk-gatersleben.de).

Acknowledgements

This research work was primarily funded by the German Federal Ministry of Education and Research through the Project GeneBank2.0 (grant no. FKZ031B0184B and FKZ031B0184A, J.C.R.). Further financial support was provided by the German Federal Ministry of Food and Agriculture via the GenDiv-Project (grant no. 2814603813, N.S.). Costs for open access publishing were partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, grant 491250510). Authors are grateful to Andreas Börner who provided seeds of the ‘B’ collection. A special mention goes to Christoph Martin, Jelena Perovic, Johannes Schneider, Sonja Gentz, Andrea Kunze, Martina Kühne, Lena Gaczensky and Martin Koch for their great technical support in field trials and seed logistics, as well as Susanne König, Jacqueline Pohl, Ines Walde and Manuela Knauft for their technical assistance in producing genomic data. Authors are additionally thankful to Jens Bauernfeind, Thomas Münch and Heiko Miehe for administration of the IT infrastructure.

Author contributions

J.C.R., M.M., N.S. and V.K. developed the scientific concept. M.O. and S.W. provided passport information of the ‘TRI’ catalogue as well as DOIs for their derived progenies. A.W.S. collected passports for the remaining plant material. S.K. provided DNA for GBS and A.H. and N.S. produced sequencing raw reads. A.H. and N.S. obtained high quality DNA samples and generated WGS raw reads. A.W.S., U.B., N.Philipp, A.S., N.Pfeiffer, P.H.G.B. and J.S. performed YR resistance characterizations. N.Philipp, P.H.G.B. and C.F.H.L. conducted yield breeding value estimation experiments. N.Philipp, M.R. and J.C.R. performed yield breeding value validation experiments. S.M.K. processed sequencing reads and generated SNP matrices. A.W.S., S.M.K., Y.Z., N.Philipp, M.R., F.L., D.S. and A.F. performed data curation. A.W.S. and A.G. transformed phenotypic data into ISA-Tab format. D.S., A.F., M.L. and U.S. facilitated the data management, as well as the sequence and variation data submission to public repositories. A.W.S. performed downstream analyses for the current work with the support of Y.Z., N.Philipp and M.R. A.W.S. wrote the manuscript with the input of all other co-authors.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Code availability

The custom awk script for filtering VCF files is available at e!DAL-PGP and can be accessed here⁸². Custom R codes for phenotypic parameter estimations are included within the respective ‘R_code’ subfolder of each dataset^71–74 deposited into e!DAL-PGP. In addition, custom R codes to assess the genomic-phenotypic data interoperability and to compute Rogers’ distances, as well as their needed inputs and expected outputs, were also deposited into e!DAL-PGP and can be accessed here⁷⁰.

Competing interests

All authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1.FAO. Crop prospects and food situation - Quarterly global report No. 4, December 2021, Rome. https://www.fao.org/3/cb7877en/cb7877en.pdf (2021).
2.d’Amour CB, et al. Future urban land expansion and implications for global croplands. Proc. Natl. Acad. Sci. USA. 2017;114:8939–8944. doi: 10.1073/pnas.1606036114. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Cassman KG. Ecological intensification of cereal production systems: Yield potential, soil quality, and precision agriculture. Proc. Natl. Acad. Sci. USA. 1999;96:5952–5959. doi: 10.1073/pnas.96.11.5952. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Tilman D, et al. Forecasting agriculturally driven global environmental change. Science. 2001;292:281–284. doi: 10.1126/science.1057544. [DOI] [PubMed] [Google Scholar]
5.Zhao C, et al. Temperature increase reduces global yields of major crops in four independent estimates. Proc. Natl. Acad. Sci. USA. 2017;114:9326–9331. doi: 10.1073/pnas.1701762114. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Ray DK, Ramankutty N, Mueller ND, West PC, Foley JA. Recent patterns of crop yield growth and stagnation. Nat. Commun. 2012;3:1293. doi: 10.1038/ncomms2296. [DOI] [PubMed] [Google Scholar]
7.Chu J, et al. Suitability of single-nucleotide polymorphism arrays versus genotyping-by-sequencing for genebank genomics in Wheat. Front. Plant. Sci. 2020;11:42. doi: 10.3389/fpls.2020.00042. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Hammer K, Knüpffer H, Xhuveli L, Perrino P. Estimating genetic erosion in landraces - two case studies. Genet. Resour. Crop. Evol. 1996;43:329–336. doi: 10.1007/BF00132952. [DOI] [Google Scholar]
9.Roussel V, Leisova L, Exbrayat F, Stehno Z, Balfourier F. SSR allelic diversity changes in 480 European bread wheat varieties released from 1840 to 2000. Theor. Appl. Genet. 2005;111:162–170. doi: 10.1007/s00122-005-2014-8. [DOI] [PubMed] [Google Scholar]
10.Ali S, et al. Yellow rust epidemics worldwide were caused by pathogen races from divergent genetic lineages. Front. Plant. Sci. 2017;8:1057. doi: 10.3389/fpls.2017.01057. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Yuen J. Pathogens which threaten food security: Phytophthora infestans, the potato late blight pathogen. Food Secur. 2021;13:247–253. doi: 10.1007/s12571-021-01141-3. [DOI] [Google Scholar]
12.Wise RP, Bronson CR, Schnable PS, Horner HT. The genetics, pathology, and molecular biology of T-cytoplasm male sterility in maize. Adv. Agron. 1999;65:79–130. doi: 10.1016/S0065-2113(08)60911-6. [DOI] [Google Scholar]
13.Singh RP, et al. Disease impact on wheat yield potential and prospects of genetic control. Annu. Rev. Phytopathol. 2016;54:303–322. doi: 10.1146/annurev-phyto-080615-095835. [DOI] [PubMed] [Google Scholar]
14.Schulthess AW, et al. Genomics-informed prebreeding unlocks the diversity in genebanks for wheat improvement. Nat Genet. 2022;54:1544–1552. doi: 10.1038/s41588-022-01189-7. [DOI] [PubMed] [Google Scholar]
15.Hovmøller MS, et al. Replacement of the European wheat yellow rust population by new races from the centre of diversity in the near-Himalayan region. Plant. Pathol. 2016;65:402–411. doi: 10.1111/ppa.12433. [DOI] [Google Scholar]
16.Hovmøller, M. S. et al. GRRC report of yellow and stem rust genotyping and race analyses 2021. Global Rust Reference Center (GRRC), Aarhus University.https://agro.au.dk/fileadmin/www.grcc.au.dk/International_Services/Pathotype_YR_results/GRRC_Annual_Report2021.pdf (2022).
17.Hafeez AN, et al. Creation and Judicious Application of a Wheat Resistance Gene Atlas. Mol. Plant. 2021;14:1053–1070. doi: 10.1016/j.molp.2021.05.014. [DOI] [PubMed] [Google Scholar]
18.Miedaner T, Juroszek P. Climate change will influence disease resistance breeding in wheat in Northwestern Europe. Theor. Appl. Genet. 2021;134:1771–1785. doi: 10.1007/s00122-021-03807-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Sharma S, et al. Introducing beneficial alleles from plant genetic resources into the wheat germplasm. Biology. 2021;10:982. doi: 10.3390/biology10100982. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Wambugu PW, Ndjiondjop MN, Henry RJ. Role of genomics in promoting the utilization of plant genetic resources in genebanks. Brief. Funct. Genomics. 2018;17:198–206. doi: 10.1093/bfgp/ely014. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Mascher M, et al. Genebank genomics bridges the gap between the conservation of crop diversity and plant breeding. Nat. Genet. 2019;51:1076–1081. doi: 10.1038/s41588-019-0443-6. [DOI] [PubMed] [Google Scholar]
22.Singh S, et al. Direct introgression of untapped diversity into elite wheat lines. Nat Food. 2021;2:819–827. doi: 10.1038/s43016-021-00380-z. [DOI] [PubMed] [Google Scholar]
23.Philipp N, et al. Grain number and grain yield distribution along the spike remain stable despite breeding for high yield in winter wheat. PLoS One. 2018;13:e0205452. doi: 10.1371/journal.pone.0205452. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Royo C, Ammar K, Villegas D, Soriano JM. Agronomic, physiological and genetic changes associated with evolution, migration and modern breeding in durum wheat. Front. Plant. Sci. 2021;12:674470. doi: 10.3389/fpls.2021.674470. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Schneider J, Berkner MO, Philipp N, Schulthess AW, Reif JC. Assessing the suitability of elite lines for hybrid seed production and as testers in wide crosses with wheat genetic resources. Front. Plant Sci. 2021;12:1128. doi: 10.3389/fpls.2021.689825. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Sommer L, et al. Proof of concept to unmask the breeding value of genetic resources of barley (Hordeum vulgare) with a hybrid strategy. Plant Breed. 2020;139:536–549. doi: 10.1111/pbr.12795. [DOI] [Google Scholar]
27.Longin CFH, Reif JC. Redesigning the exploitation of wheat genetic resources. Trends Plant Sci. 2014;19:631–636. doi: 10.1016/j.tplants.2014.06.012. [DOI] [PubMed] [Google Scholar]
28.Wilkinson MD, et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data. 2016;3:16001. doi: 10.1038/sdata.2016.18. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Oppermann M, Weise S, Dittmann C, Knüpffer H. GBIS: the information system of the German Genebank. Database. 2015;2015:bav021. doi: 10.1093/database/bav021. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Neumann K, Kobiljski B, Denčić SS, Varshney RK, Börner A. Genome-wide association mapping: A case study in bread wheat (Triticum aestivum L.) Mol. Breed. 2011;27:37–58. doi: 10.1007/s11032-010-9411-7. [DOI] [Google Scholar]
31.Janßen, G. Mittelalte ähren-Eisen: 25 Jahre alte Getreidesorten muss man mit der Lupe suchen. IG Nachbau. Available at: https://www.ig-nachbau.de/spezialseiten/ig-nachbau-artikel/details/?tx_ttnews%5Btt_news%5D=4110&cHash=2f65433d95124157b9d7547f3239d95a (Accessed: 9th August 2022) (2021).
32.Keilwagen J, et al. Detecting large chromosomal modifications using short read data from genotyping-by-sequencing. Front. Plant Sci. 2019;10:1133. doi: 10.3389/fpls.2019.01133. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Kollers S, et al. Whole genome association mapping of Fusarium head blight resistance in European winter wheat (Triticum aestivum L.) PLoS ONE. 2013;8:e57500. doi: 10.1371/journal.pone.0057500. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Jiang Y, et al. Validating the prediction accuracies of marker-assisted and genomic selection of Fusarium head blight resistance in wheat using an independent sample. Theor. Appl. Genet. 2017;130:471–482. doi: 10.1007/s00122-016-2827-7. [DOI] [PubMed] [Google Scholar]
35.Würschum T, et al. Population structure, genetic diversity and linkage disequilibrium in elite winter wheat assessed with SNP and SSR markers. Theor. Appl. Genet. 2013;126:1477–1486. doi: 10.1007/s00122-013-2065-1. [DOI] [PubMed] [Google Scholar]
36.Longin CFH, et al. Hybrid wheat: quantitative genetic parameters and consequences for the design of breeding programs. Theor. Appl. Genet. 2013;126:2791–2801. doi: 10.1007/s00122-013-2172-z. [DOI] [PubMed] [Google Scholar]
37.Poland JA, Brown PJ, Sorrells ME, Jannink J-L. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE. 2012;7:e32253. doi: 10.1371/journal.pone.0032253. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Wendler N, et al. Unlocking the secondary gene-pool of barley with next-generation sequencing. Plant Biotechnol. J. 2014;12:1122–1131. doi: 10.1111/pbi.12219. [DOI] [PubMed] [Google Scholar]
39.Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. journal. 2011;17:10–12. doi: 10.14806/ej.17.1.200. [DOI] [Google Scholar]
40.Bittencourt, S. A. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc (2010).
41.Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
42.The International Wheat Genome Sequencing Consortium Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science. 2018;361:eaar7191. doi: 10.1126/science.aar7191. [DOI] [PubMed] [Google Scholar]
43.Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Li H, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.R Development Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, Vienna, 2020).
47.Knaus BJ, Grünwald NJ. vcfr: a package to manipulate and visualize variant call format data in R. Mol. Ecol. Resour. 2017;17:44–53. doi: 10.1111/1755-0998.12549. [DOI] [PubMed] [Google Scholar]
48.Bundessortenamt. Richtlinien für die Durchführung von landwirtschaftlichen Wertprüfungen und Sortenversuchen. https://www.bundessortenamt.de/bsa/media/Files/Richtlinie_LW2000.pdf (2000).
49.Boeven PH, et al. Negative dominance and dominance-by-dominance epistatic effects reduce grain-yield heterosis in wide crosses in wheat. Sci. Adv. 2020;6:eaay4897. doi: 10.1126/sciadv.aay4897. [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31:423–447. doi: 10.2307/2529430. [DOI] [PubMed] [Google Scholar]
51.Anscombe FJ, Tukey JW. The examination and analysis of residuals. Technometrics. 1963;5:141–160. doi: 10.1080/00401706.1963.10490071. [DOI] [Google Scholar]
52.Bernal-Vasquez A-M, Utz H-F, Piepho H-P. Outlier detection methods for generalized lattices: a case study on the transition from ANOVA to REML. Theor. Appl. Genet. 2016;129:787–804. doi: 10.1007/s00122-016-2666-6. [DOI] [PubMed] [Google Scholar]
53.Butler, D. G., Cullis, B. R., Gilmour, A. R. & Gogel, B. J. ASReml–R Reference manual. Brisbane: The State of Queensland, Department of Primary Industries and Fisheries. https://asreml.kb.vsni.co.uk/wp-content/uploads/sites/3/ASReml-R-3-Reference-Manual.pdf (2009).
54.Butler, D. G. et al. ASReml–R Reference manual version 4. VSN International Ltd., Hemel Hempstead: University of Wollongong. https://asreml.kb.vsni.co.uk/wp-content/uploads/sites/3/ASReml-R-Reference-Manual-4.pdf (2017).
55.Rogers JS. Measures of genetic similarity and genetic distance. Studies in Genetics. 1972;7:145–153. [Google Scholar]
56.Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Res. 1967;27:209–220. [PubMed] [Google Scholar]
57.Oksanen J, et al. vegan: Community Ecology Package. R version. 2020;2:5–7. [Google Scholar]
58. Hill WG, Robertson A. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 1968;38:226–231. doi: 10.1007/BF01245622.
59. Spiess, A.-N. Package ‘propagate’, https://CRAN.R-project.org/package=propagate. 1–44 (2014).
60. VanRaden PM. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008;91:4414–4423. doi: 10.3168/jds.2007-0980.
61. Endelman JB. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome. 2011;4:250–255. doi: 10.3835/plantgenome2011.08.0024.
62. Li W, et al. The EMBL-EBI bioinformatics web and programmatic tools framework. Nucleic Acids Res. 2015;43:W580–W584. doi: 10.1093/nar/gkv279.
63. ENA European Nucleotide Archive. https://identifiers.org/ena.embl:PRJEB41976 (2022).
64. ENA European Nucleotide Archive. https://identifiers.org/ena.embl:PRJEB48988 (2022).
65. 2015. ENA European Nucleotide Archive. PRJEB48738
66. Courtot M, et al. BioSamples database: an updated sample metadata hub. Nucleic Acids Res. 2019;47:D1172–D1178. doi: 10.1093/nar/gky1061.
67. Arend D, et al. PGP repository: a plant phenomics and genomics data publication infrastructure. Database. 2016;2016:baw033. doi: 10.1093/database/baw033.
68. Schulthess AW. 2022. Passport information for 8,838 genotyped wheat samples. e!DAL - Plant Genomics and Phenomics Research Data Repository.
69. 2022. ENA European Nucleotide Archive. PRJEB52759
70. Schulthess AW. 2022. Genomic-phenotypic data interoperability between 8,838 genotyped wheat samples, grain yield breeding value estimates and yellow rust infection scores from multiple-environment field trials. e!DAL - Plant Genomics and Phenomics Research Data Repository.
71. Schulthess AW. 2022. Evaluating the yellow rust resistance of 7,682 winter wheat IPK genebank accessions and 80 modern European cultivars based on natural infections in multiple-environments. e!DAL - Plant Genomics and Phenomics Research Data Repository.
72. Schulthess AW. 2022. Evaluating the yellow rust resistance of 600 winter wheat IPK genebank accession samples and 199 modern European cultivars based on natural and artificial inoculations in multiple-environments. e!DAL - Plant Genomics and Phenomics Research Data Repository.
73. Zhao Y. 2022. Estimating the breeding values of 707 winter wheat IPK genebank accessions using yields of ‘Elite×PGR’ F1 hybrids tested across multiple-environment experiments. e!DAL - Plant Genomics and Phenomics Research Data Repository.
74. Schulthess AW. 2022. Multiple-environment yield evaluation of 173 advanced wheat pre-breeding lines from crosses involving IPK genebank accessions with high yield breeding values. e!DAL - Plant Genomics and Phenomics Research Data Repository.
75. Lüttringhaus S, Gornott C, Wittkop B, Noleppa S, Lotze-Campen H. The economic impact of exchanging breeding material: assessing winter wheat production in Germany. Front. Plant Sci. 2020;11:601013. doi: 10.3389/fpls.2020.601013.
76. Fradgley N, et al. A large-scale pedigree resource of wheat reveals evidence for adaptation and selection by breeders. PLoS Biol. 2019;17:e3000071. doi: 10.1371/journal.pbio.3000071.
77. Cavanagh CR, et al. Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars. Proc. Natl Acad. Sci. USA. 2013;110:8057–8062. doi: 10.1073/pnas.1217133110.
78. Roncallo PF, et al. Linkage disequilibrium patterns, population structure and diversity analysis in a worldwide durum wheat collection including Argentinian genotypes. BMC Genomics. 2021;22:233. doi: 10.1186/s12864-021-07519-z.
79. Darrier B, et al. A comparison of mainstream genotyping platforms for the evaluation and use of barley genetic resources. Front. Plant. Sci. 2019;10:544. doi: 10.3389/fpls.2019.00544.
80. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–1829. doi: 10.1093/genetics/157.4.1819.
81. Habier D, Fernando R, Dekkers JC. The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007;177:2389–2397. doi: 10.1534/genetics.107.081190.
82. Mascher M. 2022. Filtration Script for Genetic Variant Matrices in Variant Call Format (VCF). e!DAL - Plant Genomics and Phenomics Research Data Repository.


