Figure - PMC

Bioinformatics. 2023 May 23;39(6):btad336. doi: 10.1093/bioinformatics/btad336

© The Author(s) 2023. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Generalization performance on a new environment and utilization for crop selection. (a) Comparison of yield prediction performance of a linear baseline (Lasso), two nonlinear baselines (Random forest and FCN), and our model (PheGeMIL) on a new, unseen environment using genotypic or phenotypic data. Multiple scenarios are evaluated. In all cases, models are trained on data from environment A (2018 YT, see Table 1) and tested on data from environment B (2018 EYT). In a first set of experiments, models are trained and evaluated on both multispectral images and genotypic data (first three rows). In a second set of experiments, models are evaluated on genotypes alone (last three rows), to mimic prediction before sowing in breeding program scenarios. Baselines must be trained and tested on the same data types, so in this second setting they can be trained on genotypes only. PheGeMIL, in contrast, is also trained with phenotypic data while still being evaluated on genotypes alone, thanks to the MIL framework. Distributions show the Pearson correlation coefficient obtained by models trained on the 5 different splits of the training set. Ensembled performance for genotype-only predictions is the performance obtained when the predicted values for a given sample are averaged across the 5 trained models. (b) Average yield obtained from prediction-driven line selections of varying sizes (binned), using rankings derived from the values predicted by the different ensembled methods. Lines are selected based on their predicted yield; their effective yield is then averaged across the set of selected lines and reported for increasing selection sizes, ranging from 5% to 40% of the lines in the test set. MIL, multiple instance learning; FCN, fully connected network.
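The two quantities described in the caption, ensembled prediction performance and the average yield of a prediction-driven selection, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`pearson`, `ensemble_predictions`, `mean_yield_of_selection`) are hypothetical, NumPy is an assumed dependency, and the rounding rule used to turn a selection fraction into a number of lines is an assumption.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def ensemble_predictions(per_model_preds):
    """Average the predictions for each sample across models.

    per_model_preds has shape (n_models, n_samples), e.g. one row per
    model trained on a different split of the training set.
    """
    return np.mean(np.asarray(per_model_preds, dtype=float), axis=0)


def mean_yield_of_selection(predicted, observed, fraction):
    """Mean observed yield of the lines ranked highest by predicted yield.

    fraction is the share of test lines to select (e.g. 0.05 to 0.40).
    Assumption: the selection size is rounded to the nearest integer,
    with at least one line selected.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    k = max(1, int(round(fraction * len(predicted))))
    top = np.argsort(-predicted)[:k]  # indices of the k highest predictions
    return float(observed[top].mean())
```

With per-model predictions stacked row-wise, `pearson(ensemble_predictions(preds), observed)` gives an ensembled score of the kind reported in panel (a), and calling `mean_yield_of_selection` over a range of fractions traces a curve analogous to panel (b).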