Abstract
Objectives
This report describes the public release of the 2018–2019 datasets of the Maize G × E project of the Genomes to Fields (G2F) Initiative. G2F is an umbrella initiative that evaluates maize hybrids and inbred lines across multiple environments and makes phenotypic, genotypic, environmental, and metadata information publicly available. The initiative recognizes the need to characterize and deploy public sources of genetic diversity to meet the challenges of more sustainable agriculture under variable environmental conditions.
Data description
Datasets include phenotypic, climatic, and soil measurements, metadata information, and inbred genotypic information for each combination of location and year. Collaborators in the G2F initiative collected data for each location and year; members of the group responsible for coordination and data processing combined all the collected information and removed obviously erroneous data. The collaborators received the data before the DOI release so that they could verify and declare that the data generated at their own locations were accurate. ReadMe and description files are available for each dataset. Previous years of evaluation are already publicly available, with common hybrids present to connect across all locations and years evaluated since this project’s inception.
Keywords: Maize, Genotype by environment, Phenotype, Variable environments, Grain yield
Objective
Maize (Zea mays subsp. mays L.) plays an important role in the global economy. As a crop, it has a variety of uses, such as food, feed, and fuel. Because of its versatility and relevance, maize has also been widely studied. Genomes to Fields (G2F) is a collaborative initiative involving public-sector scientists who support growers, consumers, and society. G2F researchers generate phenotypic, genotypic, environmental, and metadata datasets to facilitate the understanding of the potential and challenges of maize production in different environments.
Individual genotype performances differ across environments, and the magnitude of this difference dictates the importance of the Genotype by Environment (G × E) interaction. Understanding and harnessing G × E interactions improves efficiency in the use and allocation of resources. It also facilitates the identification of genotypes with higher stability across a range of locations, the identification of locations where the effect of G × E is minimized, and the identification of mechanisms affecting the differential response of phenotypes to variable environments. Furthermore, advances in our understanding of the fundamental components contributing to the differential response of plants to environmental cues will also improve genomic and phenotypic predictabilities for traits of interest. Therefore, this data release provides a unique resource of combined agronomic, phenological, and morphological information to dissect G × E interaction.
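As an illustration of what "G × E interaction" means numerically, the sketch below double-centers a complete two-way table of genotype-by-environment means: the interaction term is what remains after removing the grand mean, the genotype main effect, and the environment main effect. This is a textbook decomposition, not an analysis taken from the G2F project; the hybrid names, environment names, and yield values are hypothetical.

```python
from statistics import mean


def interaction_effects(table):
    """Return G x E interaction effects for a complete two-way table.

    `table` maps genotype -> {environment -> mean phenotype}. Every
    genotype is assumed to be observed in every environment.
    """
    genotypes = sorted(table)
    environments = sorted(next(iter(table.values())))
    grand = mean(table[g][e] for g in genotypes for e in environments)
    # Main effects are deviations of row/column means from the grand mean.
    g_eff = {g: mean(table[g][e] for e in environments) - grand for g in genotypes}
    e_eff = {e: mean(table[g][e] for g in genotypes) - grand for e in environments}
    # Interaction = observed value minus the additive prediction.
    return {
        (g, e): table[g][e] - grand - g_eff[g] - e_eff[e]
        for g in genotypes
        for e in environments
    }


# Hypothetical yields: the two hybrids change rank between environments.
yields = {
    "hybrid_A": {"env_1": 10.0, "env_2": 12.0},
    "hybrid_B": {"env_1": 11.0, "env_2": 9.0},
}
ge = interaction_effects(yields)
for key in sorted(ge):
    print(key, ge[key])
```

In this toy table both environment main effects are zero, so all of the between-environment variation in hybrid performance is interaction. Real multi-environment data are unbalanced and replicated, and would typically call for a mixed model rather than simple double-centering.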
In the 2018 and 2019 experiments, 1153 publicly available hybrids were evaluated through a network of collaborators in 32 different locations. The main group of hybrids was produced by crossing doubled-haploid (DH) inbred lines from three biparental populations, which share a common parent (PHW65) and have PHN11, Mo44, or MoG as the alternative parent, to two ex-PVP inbred testers: LH195 in Midwest to Southern locations and PHT69 in Northern locations.
Data description
The 2018 and 2019 datasets are publicly available via CyVerse/iPlant and structured as described in Table 1. Briefly, the datasets included here are:
Phenotypic dataset: Phenotypic measurements that follow a standard set of instructions, available on the G2F webpage [1]. Standard traits include days to anthesis, days to silking, ear height, plant height, stand count, stalk lodging, root lodging, grain moisture, test weight, plot weight, and estimated grain yield. Both raw and quality-controlled data are reported. Out-of-range observations were set to missing following the rules described in the readMe and data description files.
Genotypic dataset: Inbred parents of the tested hybrids were genotyped using the Practical Haplotype Graph (PHG) [2, 3]. The data is minimally filtered, allowing the public to perform their own quality control steps prior to using it. The raw sequencing reads are available under BioProject ID PRJNA530187 [4]. The code used to create the genotypic data is also available at https://bitbucket.org/bucklerlab/g2f_2018_phg_genotyping/src/master/.
Environmental dataset: WatchDog 2700 weather stations (Spectrum Technologies) were placed at each field site. Data were collected at 30-min intervals from planting through harvest at each location. The geographic locations of the experiments are not identical across years due to crop rotation management practices; thus, the locations of the weather stations vary across years. Each station measured wind speed, direction, and gust; air temperature, dew point, and relative humidity; soil temperature and moisture; rainfall; and solar radiation. Additional measurements taken at selected sites included soil electrical conductivity, ultraviolet light, carbon dioxide, and photosynthetically active radiation. Instructions for weather station maintenance activities, including pre-season tasks, field setup, maintenance throughout the growing season, and removal, are available on the G2F webpage [5].
Soil dataset: Soil samples representative of the experimental field were collected at each field location. Collaborators followed the sample collection instructions available on the G2F webpage [5].
Supplemental dataset: Supplemental information consists of metadata (any field-level data collected at planting, in season, and/or at harvest), agronomic information (list of pesticides, nutrients, and irrigation applied), and cooperator list (collaborators responsible for the field locations in 2018 and 2019).
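Many analyses work with daily rather than 30-min weather values, so the environmental records described above are often aggregated before use. The following is a minimal sketch of that step using only the Python standard library. The column names (`timestamp`, `temperature_c`, `rainfall_mm`), the timestamp format, and the sample values are hypothetical; the actual column names and units are documented in the weather data description files.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def daily_summary(rows, time_key="timestamp", temp_key="temperature_c",
                  rain_key="rainfall_mm"):
    """Aggregate sub-daily weather records into per-day summaries.

    Empty strings are treated as missing and skipped. All column names
    are assumptions made for this sketch.
    """
    by_day = defaultdict(lambda: {"temps": [], "rain": 0.0})
    for row in rows:
        day = datetime.strptime(row[time_key], "%Y-%m-%d %H:%M").date()
        if row[temp_key] != "":
            by_day[day]["temps"].append(float(row[temp_key]))
        if row[rain_key] != "":
            by_day[day]["rain"] += float(row[rain_key])
    summary = {}
    for day, values in sorted(by_day.items()):
        temps = values["temps"]
        summary[day.isoformat()] = {
            "t_min": min(temps) if temps else None,
            "t_max": max(temps) if temps else None,
            "rain_total": values["rain"],
        }
    return summary


# Hypothetical 30-min records, including one missing temperature reading.
sample = """timestamp,temperature_c,rainfall_mm
2018-06-01 00:00,15.0,0.0
2018-06-01 00:30,14.5,1.5
2018-06-01 01:00,,0.5
2018-06-02 00:00,18.0,0.0
"""
daily = daily_summary(csv.DictReader(io.StringIO(sample)))
for day, stats in daily.items():
    print(day, stats)
```

Temperature is summarized by its daily extremes and rainfall by its daily total because the two variables accumulate differently; other variables, such as solar radiation, would need their own aggregation rule.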
Table 1.
Label | Name of data file/data set | File types (Extension) | Data repository and identifier (DOI or accession number) |
---|---|---|---|
Data set 1 | Evaluation of genetic diversity across the inbreds used by G2F project (WGS skim sequencing) | fastq files (.fq.gz) | NCBI BioProject (https://identifiers.org/ncbi/bioproject:PRJNA530187) [4] |
Data file 1 | README.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 2 | README.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 3 | _g2f_2018_hybrid_data_description.pdf | Portable Document Format (.pdf) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 4 | g2f_2018_hybrid_data_clean.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 5 | g2f_2018_hybrid_data_raw.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 6 | raw_cleaning_readme.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 7 | README_weather.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 8 | _g2f_2018_weather_data_description.pdf | Portable Document Format (.pdf) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 9 | g2f_2018_weather_clean.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 10 | g2f_2018_weather_raw.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 11 | weather_cleaning_readme.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 12 | Indigo_2018_Soil_Data.pdf | Portable Document Format (.pdf) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 13 | _g2f_2018_soil_description.pdf | Portable Document Format (.pdf) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 14 | g2f_2018_soil_data.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 15 | G2F_PHG_minreads1_Mo44_PHW65_MoG_assemblies_14112019_filtered_plusParents.vcf | variant call format file (.vcf) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 16 | G2F_PHG_minreads1_Mo44_PHW65_MoG_assemblies_14112019_filtered_plusParents_description.pdf | Portable Document Format (.pdf) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 17 | G2F_PHG_minreads1_Mo44_PHW65_MoG_assemblies_14112019_filtered_plusParents_sampleDecoder.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 18 | README_G2F_2020-03-13.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 19 | README_Genotypic.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 20 | g2f_2018_agronomic information.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 21 | g2f_2018_cooperators_list.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 22 | g2f_2018_field_metadata.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 23 | g2f_2018_supplemental_information.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/anqq-sg86) [6] |
Data file 24 | g2f_planting_season_2019_readMe.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 25 | g2f_2019_phenotypic_clean_data.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 26 | g2f_2019_phenotypic_data_description.pdf | Portable Document Format (.pdf) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 27 | g2f_2019_phenotypic_data_read_me.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 28 | g2f_2019_phenotypic_raw_data.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 29 | 2019_weather_cleaned.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 30 | 2019_weather_raw.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 31 | g2f_2019_weather_data_description.pdf | Portable Document Format (.pdf) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 32 | g2f_2019_weather_readMe.txt | Text file (.txt) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 33 | g2f_2019_soil_data.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 34 | g2f_2019_soil_data_description.pdf | Portable Document Format (.pdf) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 35 | g2f_2019_agronomic_information.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 36 | g2f_2019_cooperators_list.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 37 | g2f_2019_field_metadata.csv | comma-separated values file (.csv) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
Data file 38 | g2f_2019_supplemental_information.pdf | Portable Document Format (.pdf) | CyVerse (https://doi.org/10.25739/t651-yy97) [7] |
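Because the genotypic data (Data file 15) are only minimally filtered, users are expected to apply their own quality control. One common first step is to compute the fraction of samples with a missing genotype call at each site. The sketch below does this for plain-text VCF lines using only the standard library. It is an illustration, not the project's pipeline: the sample names and records are hypothetical, and a dedicated VCF library would be more robust for the real file.

```python
def site_missing_rates(vcf_lines):
    """Return {(chrom, pos): fraction of samples with a missing GT call}.

    Assumes tab-delimited VCF records with a FORMAT column that contains
    a GT key. A call is counted as missing if any allele is '.'.
    """
    rates = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, fmt, samples = fields[0], fields[1], fields[8], fields[9:]
        if not samples:
            continue  # sites-only record: nothing to count
        gt_index = fmt.split(":").index("GT")  # raises ValueError if GT is absent
        missing = sum(1 for s in samples if "." in s.split(":")[gt_index])
        rates[(chrom, int(pos))] = missing / len(samples)
    return rates


# Hypothetical records with four inbred lines.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "line_1", "line_2", "line_3", "line_4"]
example = [
    "##fileformat=VCFv4.2",
    "\t".join(header),
    "\t".join(["1", "100", ".", "A", "G", ".", "PASS", ".", "GT",
               "0/0", "1/1", "./.", "0/0"]),
    "\t".join(["1", "250", ".", "C", "T", ".", "PASS", ".", "GT",
               "1/1", "1/1", "0/0", "0/0"]),
]
rates = site_missing_rates(example)
for site, rate in sorted(rates.items()):
    print(site, rate)
```

Sites whose missing rate exceeds a user-chosen threshold can then be dropped; the appropriate threshold depends on the downstream analysis and is not specified by the data release.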
Limitations
These datasets contain missing data. Missing data includes data not reported by collaborators or erroneous data as determined by data description files. In 2019, some locations had pedigree information set to missing due to packaging problems and only plot number was reported in the phenotypic dataset to reduce misinterpretation.
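The way erroneous values become missing values can be sketched as a simple range check: any observation outside a plausible interval for its trait is replaced by a missing marker, and missing values are then counted per trait. The trait names and thresholds below are hypothetical placeholders chosen for illustration; the actual rules are the ones given in the readMe and data description files.

```python
import csv
import io

# Hypothetical plausible ranges (low, high); not the project's actual rules.
PLAUSIBLE_RANGES = {
    "plant_height_cm": (50.0, 400.0),
    "grain_moisture_pct": (5.0, 40.0),
}


def mask_out_of_range(rows, ranges=PLAUSIBLE_RANGES):
    """Return copies of `rows` with out-of-range or empty trait values set to None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for trait, (low, high) in ranges.items():
            raw = new_row.get(trait, "")
            if raw is None or raw == "":
                new_row[trait] = None  # not reported
                continue
            value = float(raw)  # non-numeric entries would raise ValueError
            new_row[trait] = value if low <= value <= high else None
        cleaned.append(new_row)
    return cleaned


def count_missing(rows, traits):
    """Count rows in which each trait is missing (None)."""
    return {t: sum(1 for r in rows if r.get(t) is None) for t in traits}


# Hypothetical raw plot records.
raw = """plot,plant_height_cm,grain_moisture_pct
1,250,18.2
2,2500,17.9
3,,55.0
"""
cleaned = mask_out_of_range(csv.DictReader(io.StringIO(raw)))
missing = count_missing(cleaned, PLAUSIBLE_RANGES)
print(missing)
```

Keeping the raw rows untouched and returning cleaned copies mirrors the release itself, which reports both raw and quality-controlled files.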
Acknowledgements
We gratefully acknowledge contributions from many field managers and data collectors including: Dustin Eilert, Marina Borsecnik, Rachel Perry, Renata Barcelos and Ben Fischer (de Leon/Kaeppler labs, University of Wisconsin—Madison). Amanda Gilbert (Hirsch lab, University of Minnesota). Christine Smith, Brandi Sigmon, Connor Pedersen, Nathaniel Pester, Isaac Stevens (Schnable Lab, University of Nebraska-Lincoln). Trevor Perla, Amy Deariso, Paige Coffee, Steven Hughes, and C.J. Dudley (USDA Tifton). Naomi Rodman, Spencer Caro, Coleman Grindle, Allison McCabe, Samuel Morris, and Bamidele Sangoyomi (Wallace lab—University of Georgia). William Widdicombe and Linsey Newton (Singh and Thompson Labs, Michigan State University). Susan Melia-Hancock and Jim Elder (Flint-Garcia Lab, USDA-ARS, Columbia MO). Emmalea Ernest and Victor Green (University of Delaware). Colby Bass and Regan Lindsey (Texas A&M University System). Kyle Evans, Kirsten Hein, Anne Howard, Jack Mullen, Patrick Woods (McKay Lab, Colorado State University). Dietrich Kaufmann (Beissinger Lab). Christina Poudyal, Kevin Silverstein, Anna Rogers, Luis Samayoa, Tyson Swetnam. In addition, we gratefully acknowledge contributions from numerous staff indirectly involved in the project, graduate students, and student workers at many locations.
Abbreviations
- G2F: Genomes to Fields
- DOI: Digital Object Identifier
- DH: Doubled-haploid
- G × E: Genotype by environment
Authors’ contributions
Data management team: DCL, ACA, RTA, BAM, MCR, JLG, JH, JE, DE, JDW. Data contributors: DCL, ACA, RTA, NdL, SK, MCR, JLG, JH, TB, MB, EB, SFG, JE, CNH, EH, DCH, JEK, JMK, SL, JM, RM, DEM, SCM, RN, JCS, RSS, MPS, PT, AT, MT, JW, JDW, TW, RJW, WX. Communication: NdL, DE, SK. The data management team aggregated, curated, and made available data resources. Data contributors advised on data collection methods, collected the data, and reviewed data collection and curation methods as well as datasets. Communicating authors guided data collection, curation, and distribution. All authors read, reviewed, and approved the final manuscript.
Funding
We gratefully acknowledge support from: National Corn Growers Association, Iowa Corn Promotion Board, Georgia Corn Commission, Nebraska Corn Board, Ohio Corn Marketing Program, Corn Marketing Program of Michigan, Texas Corn Producers Board, University of Göttingen startup funds, USDA-ARS, and USDA Germplasm Enhancement of Maize program.
Availability of data and materials
The data described in this Data note can be freely and openly accessed on CyVerse at https://doi.org/10.25739/anqq-sg86 (2018 Field Season) and https://doi.org/10.25739/t651-yy97 (2019 Field Season). Please see Table 1 and references list for details and links to the data.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Dayane Cristina Lima, Email: dclima@wisc.edu
Alejandro Castro Aviles, Email: alejandrocastro88@hotmail.com
Ryan Timothy Alpers, Email: ralpers@wisc.edu
Bridget A. McFarland, Email: Bridget.McFarland@usda.gov
Shawn Kaeppler, Email: smkaeppl@wisc.edu
David Ertl, Email: dertl@iowacorn.org
Maria Cinta Romay, Email: mcr72@cornell.edu
Joseph L. Gage, Email: jlgage@ncsu.edu
James Holland, Email: Jim.Holland@usda.gov
Timothy Beissinger, Email: beissinger@gwdg.de
Martin Bohn, Email: mbohn@illinois.edu
Edward Buckler, Email: esb33@cornell.edu
Jode Edwards, Email: Jode.edwards@usda.gov
Sherry Flint-Garcia, Email: sherry.flint-garcia@usda.gov
Candice N. Hirsch, Email: cnhirsch@umn.edu
Elizabeth Hood, Email: ehood@astate.edu
David C. Hooker, Email: dhooker@uoguelph.ca
Joseph E. Knoll, Email: Joe.Knoll@usda.gov
Judith M. Kolkman, Email: jmkolkman@gmail.com
Sanzhen Liu, Email: liu3zhen@ksu.edu
John McKay, Email: john.mckay@colostate.edu
Richard Minyo, Email: minyo.1@osu.edu
Danilo E. Moreta, Email: dem324@cornell.edu
Seth C. Murray, Email: sethmurray@tamu.edu
Rebecca Nelson, Email: rjn7@cornell.edu
James C. Schnable, Email: schnable@unl.edu
Rajandeep S. Sekhon, Email: sekhon@clemson.edu
Maninder P. Singh, Email: msingh@msu.edu
Peter Thomison, Email: thomison.1@osu.edu
Addie Thompson, Email: thom1718@msu.edu
Mitchell Tuinstra, Email: mtuinstr@purdue.edu
Jason Wallace, Email: jason.wallace@uga.edu
Jacob D. Washburn, Email: jacob.washburn@usda.gov
Teclemariam Weldekidan, Email: tecle@udel.edu
Randall J. Wisser, Email: randall.wisser@inrae.fr
Wenwei Xu, Email: wxu@ag.tamu.edu
Natalia de Leon, Email: ndeleongatti@wisc.edu
References
- 1. Genomes to Fields. 2022. https://www.genomes2fields.org. Accessed 10 Oct 2022.
- 2. Bradbury PJ, Casstevens T, Jensen SE, Johnson LC, Miller ZR, Monier B, et al. The Practical Haplotype Graph, a platform for storing and using pangenomes for imputation. Bioinformatics. 2022;38(15):3698–3702. doi: 10.1093/bioinformatics/btac410.
- 3. Franco JAV, Gage JL, Bradbury PJ, Johnson LC, Miller ZR, Buckler ES, et al. A Maize Practical Haplotype Graph Leverages Diverse NAM Assemblies. bioRxiv. 2020. doi: 10.1101/2020.08.31.268425.
- 4. Evaluation of genetic diversity across the inbreds used by G2F project (WGS skim sequencing). BioProject. 2022. https://identifiers.org/ncbi/bioproject:PRJNA530187.
- 5. Genomes to Fields resources. 2022. https://www.genomes2fields.org/resources/. Accessed 10 Oct 2022.
- 6. G2F Consortium. Genomes to Fields 2018 Data Set. CyVerse Data Commons. 2018. doi: 10.25739/anqq-sg86.
- 7. G2F Consortium. Genomes to Fields 2019 dataset. CyVerse Data Commons. 2019. doi: 10.25739/t651-yy97.