Genome‐enable prediction for health traits using high‐density SNP panel in US Holstein cattle

F Lopes; G Rosa; P Pinedo; J E P Santos; R C Chebel; K N Galvao; G M Schuenemann; R C Bicalho; R O Gilbert; S Rodrigez‐Zas; C M Seabury; W Thatcher

doi:10.1111/age.12892

. 2020 Jan 7;51(2):192–199. doi: 10.1111/age.12892

Genome‐enable prediction for health traits using high‐density SNP panel in US Holstein cattle

F Lopes ¹, G Rosa ¹, P Pinedo ^2,^✉, J E P Santos ³, R C Chebel ⁴, K N Galvao ⁴, G M Schuenemann ⁵, R C Bicalho ⁶, R O Gilbert ⁷, S Rodrigez‐Zas ⁸, C M Seabury ⁹, W Thatcher ³

PMCID: PMC7065151 PMID: 31909828

Summary

The objective of this study was to compare accuracies of different Bayesian regression models in predicting molecular breeding values for health traits in Holstein cattle. The dataset was composed of 2505 records reporting the occurrence of retained fetal membranes (RFM), metritis (MET), mastitis (MAST), displaced abomasum (DA), lameness (LS), clinical endometritis (CE), respiratory disease (RD), dystocia (DYST) and subclinical ketosis (SCK) in Holstein cows, collected between 2012 and 2014 in 16 dairies located across the US. Cows were genotyped with the Illumina BovineHD (HD, 777K). The quality controls for SNP genotypes were HWE P‐value of at least 1 × 10⁻¹⁰; MAF greater than 0.01 and call rate greater than 0.95. The fimpute program was used for imputation of missing SNP markers. The effect of each SNP was estimated using the Bayesian Ridge Regression (BRR), Bayes A, Bayes B and Bayes Cπ methods. The prediction quality was assessed by the area under the curve, the prediction mean square error and the correlation between genomic breeding value and the observed phenotype, using a leave‐one‐out cross‐validation technique that avoids iterative cross‐validation. The highest accuracies of predictions achieved were: RFM [Bayes B (0.34)], MET [BRR (0.36)], MAST [Bayes B (0.55), DA [Bayes Cπ (0.26)], LS [Bayes A (0.12)], CE [Bayes A (0.32)], RD [Bayes Cπ (0.23)], DYST [Bayes A (0.35)] and SCK [Bayes Cπ (0.38)] models. Except for DA, LS and RD, the predictive abilities were similar between the methods. A strong relationship between the predictive ability and the heritability of the trait was observed, where traits with higher heritability achieved higher accuracy and lower bias when compared with those with low heritability. Overall, it has been shown that a high‐density SNP panel can be used successfully to predict genomic breeding values of health traits in Holstein cattle and that the model of choice will depend mostly on the genetic architecture of the trait.

Keywords: Bayesian regression models, dairy, genomic prediction, reproduction

Introduction

Maintaining cow health and productivity around parturition and during early lactation is critical for the overall welfare and profitability of dairy herds. Transition from the non‐lactating pregnant state to non‐pregnant lactating state requires a dairy cow to drastically adjust her metabolism, so that nutrients can be partitioned to support milk synthesis, a process referred to as homeorhesis (Bauman & Currie 1980). Endocrine changes at calving and drastic metabolic adjustments to support milk synthesis result in negative energy balance and immune suppression (Goff 2004; Hammon et al. 2006). Despite orchestrated homeostatic controls and homeorhetic adjustments to cope with the changes in metabolism caused by milk production, 40–70% of dairy cows across different levels of milk production, breeds and management systems develop metabolic or infectious diseases in the first month of lactation (Santos et al. 2010; Ribeiro et al. 2013). These health problems not only cause reductions in milk yield and increases in treatment costs, but also affect the reproductive performance of dairy cows, which consequently impairs the sustainability of dairy herds.

This suboptimal status in the health and wellness of dairy cattle has led to a growing interest in the use of genetic improvement as part of a comprehensive health management strategy for dairy cows. Consequently, several new methodologies for genomic selection (GS) have been proposed (Weller et al. 2017).

The prediction quality of various Bayesian models has been computed and compared in multiple genomic selection studies (Meuwissen et al. 2001; Pérez et al. 2010; Magnabosco et al. 2016). However, the performance of each model depends on the trait evaluated and its genetic architecture. For example, models that perform some sort of variable selection (e.g. Bayes B and C) might have a predictive ability advantage for traits controlled by smaller numbers of major effect QTL. On the other hand, for polygenic traits that are affected by several QTL of small effects, most of the methods have similar predictive ability, including simpler models, such as ridge regression (Daetwyler et al. 2010). As such, each application of GS requires a step of model comparison. However, computing measures of predictive ability, such as prediction accuracy, require the use of cross‐validation (CV) methods, which might be time‐consuming, depending the number of individuals and or markers analyzed.

Efficient strategies for leave‐one‐out cross‐validation (LOOCV) have been proposed and their theory fully described for GS using analytical solutions (Gianola & Schön 2016), including the importance of sampling weights in the context of Bayesian inference (Gianola & Schön 2016). Such approaches permit the estimation of CV results by running the model only once, with no need for the fully iterative implementation of LOOCV. These methods have the advantage of being computationally cost‐efficient, reducing the total amount of time to run the analysis, and avoiding the use of parallel computing. Thus, this study was carried out to assess the potential of genomic selection in Holstein cattle by investigating the accuracy of molecular breeding values for health traits. Four Bayesian specifications of genomic regression models, namely Bayesian Ridge Regression (BRR), Bayes A, Bayes B, and Bayes Cπ, were compared in terms of prediction accuracy using the “LOOCV without CV” approach.

Materials and methods

Study population, health assessment and outcomes of interest

A total of 11 733 cows calving between 2012 and 2014 on 16 dairy farms located in four US regions [northeast (four herds), midwest (six herds), southeast (one herd) and southwest (five herds)] were enrolled at parturition and monitored for health and reproductive events assessed by the research team using the same standardized protocols.

Monitoring of disease was performed on weekly visits to the study farms. Procedures included: (i) evaluation of vaginal discharge at 7 ± 3 days in milk (DIM) and 28 ± 7 DIM using a 0–5 score system – 0 = no mucus, 1 = crystalline, 2 = flecks of pus, 3 = mucopurulent <50% pus, 4 = purulent, >50% pus and 5 = watery, reddish/brownish fetid discharge; (ii) collection of blood samples at 7 ± 3 DIM for determination of serum BHBA; and (iii) lameness scoring at 35 ± 3 DIM using a 1–5 score system – 1 = normal, 2 = mildly lame (stands with flat back, but arches when walks; gait slightly abnormal), 3 = moderately lame (stands and walk with arched back), 4 = lame (arched back standing and walking, one or more limbs favored) and 5 = severely lame (arched back, refuses to bear weight in one limb, great difficulty moving from lying position).

Diseases included retained fetal membranes (RFM; membranes not expelled within 24 h after calving), metritis (MET; 7 ± 3 DIM; mucus score 5), clinical mastitis (MAST; obtained from farm records, definition consisting of udder inflammation/milk altered), left displaced abomasum (DA; farm records), lameness (LS; score >2), clinical endometritis (CE; 28 ± 3 DIM; mucus score >2), respiratory disease (RD; farm records), dystocia (DYS; calving requiring significant intervention) and subclinical ketosis (SCK; 7 ± 3 DIM; serum BHBA >1.0 mmol/l). In addition, records included milk yield tested monthly, up to 305 days after calving.

Based on phenotypic information from individual cows, a reproductive index (RI) was developed, representing a calculated predicted probability that a cow will become pregnant as a function of the explanatory variables used in a logistic model. Potential significant effects were initially tested by univariable analysis. Effects with P < 0.05 were offered to the multivariable analysis and the final model was determined through backward elimination, considering potentially significant interactions.

The final model for RI included a complement of significant fixed effects and interactions as explanatory variables influencing a pregnancy outcome: parity number; body condition score at 40 DIM; incidence of RFM; MET; resumption of ovulation by 50 DIM; region; SCK; MAST; CE; milk yield at the first milk test after calving; interaction effect of resumption of ovulation by 50 DIM × region; interaction MAST × region; and interaction milk yield at the first milk test after calving × parity number.

The RI was developed as a continuous variable, originating from the probability equation of the logistic regression model, ranging from 0 to 1, which is directly related to the probability of pregnancy: $P (pregnancy | α, β) = e^{Σ β_{i} Z_{i} + μ σ} / 1 + e^{Σ β_{i} Z_{i} + μ σ}$ , where P(pregnancy|α,β) is the probability that a cow will be pregnant given a set of fixed factors Z_i, interactions, the set of multiplicative slopes β_i and a scale parameter σ.

The selection of cows for genotyping was performed by including two groups within each farm and calving season: (i) high‐fertility individuals = diagnosed pregnant at 60 days after the first post‐partum artificial insemination and within the highest 15% RI (n = 1750); and (ii) low‐fertility individuals = cows diagnosed not pregnant on day 60 after two post‐partum artificial insemination and within the lowest 7.25% RI (n = 850).

Animals and genotyping

The initial population included 11 733 Holstein cows calving between 2012 and 2014 on 16 US farms. The dataset in analysis was composed of a subpopulation of 2505 records in cows that were determined with high and low reproductive performance based on the calculated RI. Cow records included information on the presentation of RFM, MET, MAST, DA, LS, CE, RD, DYST and SCK. Except for LS that was treated as a multicategorical variable, all other traits are dichotomous (yes or no). An exploratory analysis was performed to investigate data consistency and to evaluate the significance of environmental effects on traits. The environmental factors studied included farm (16 farms), year (2012, 2013 and 2014) and season of the calving (summer or winter), and parity (primiparous or multiparous).

A total of 2505 cows were selected based on a reproductive index (IndexP60) and genotyped using Illumina bovine hd snp chip (777 962 SNPs; Illumina, Inc.). The data‐cleaning steps removed: (i) SNPs located on sex and mitochondrial chromosomes; (ii) SNPs without map position information; (iii) SNPs with call rate less than 0.95; (iv) SNPs with MAF less than 0.01; and (v) SNPs with deviation from HWE (P < 10⁻¹⁰). After data cleaning, 603 986 SNPs were retained for analyses.

Imputation

The fimpute program version 2.2 (Sargolzaei et al. 2014) was used for imputation of missing SNP genotypes in the HD chip. This program uses deterministic methods to combine family and population information. The imputation is based on overlapping sliding windows and assumes that individuals are related to some degree. Overlapping of windows allows for consistency of haplotype phases across windows. As pedigree information was available, fimpute was run using both family‐ and population‐based algorithms, with its own default parameters.

Pedigree‐based analyses

Pedigree‐based analyses were performed using a univariate threshold animal model (logit). Such models applied belong to the class of generalized linear mixed models, which can be used to analyze data with different distributions from the exponential family (e.g. binomial). This model uses a link function relating the expected value E(y_ijkl) = μ_ijkl to the linear predictor η_ijkl. The following linear predictor was used:

η_{ijkl} = φ + {YS}_{i} + {Parity}_{j} + H_{k} + u_{l},

where η_ijkl is a function of the expected phenotype, φ is an intercept, YS_i is the fixed effect of year‐season of calving (j = 1, 2, …, 5); Parity_j is the fixed effect of parity (primiparous or multiparous), H_k is a random effect for the herd of calving (k = 1, 2, …, 16) and u_l is a random effect animal (l = 1, 2, …, 11 733). The pedigree file contained the relationships of 18 610 animals.

The threshold model postulates an underlying continuous variable, liability, such that the observed binary response takes value 1 if liability exceeds a fixed threshold and 0 otherwise. The threshold and the residual variance are not identifiable, so these parameters were set equal to 0 and 1 respectively. Variance components were estimated using a generalized linear mixed model using the ai‐reml algorithm in the dmu package (Madsen & Jensen 2008).

Genomic prediction

Genomic prediction models were fit using four Bayesian specifications: BRR, Bayes A, Bayes B and Bayes Cπ. For these methods, the general statistical model is:

y = X β + \sum_{j = 1}^{k} m_{j} a_{j} + e

where y is an n × 1 vector of phenotypic information, β is a p × 1 vector of fixed effects, X is the incidence matrix for fixed effects, m _j is an n × 1 vector representing the genotypes of the jth SNP for all of the animals (j = 1, 2,…, k), a_j is the genetic effect of the jth SNP and $e \approx N (0, I σ_{e}^{2})$ is a vector of residuals.

In the BRR method, an independent Gaussian prior with common variance is assigned to each regression coefficient, that is, $p (a_{1}, a_{2}, \dots, a_{k} | σ_{a}^{2}) = \prod_{j = 1}^{k} N (a_{j} | 0, σ_{a}^{2})$ . The variance parameter $σ_{a}^{2}$ is treated as an unknown and assigned a scaled inverse chi‐squared (χ ⁻²) prior density, $p (σ_{a}^{2}) = χ^{- 2} (σ_{a}^{2} | v_{a}, S_{a}^{2})$ , with degrees of freedom, $v_{a} = 2$ and scale $S_{a}^{2}$ . Similarly, a χ ⁻² prior density is assumed also for the residual variance $σ_{e}^{2}$ , i.e. $p (σ_{e}^{2}) = χ^{- 2} (σ_{e}^{2} | v_{e}, S_{e}^{2})$ .

In the Bayes A model (Meuwissen et al. 2001), independent Gaussian distributions are assumed a priori for the SNP effects a_j, $a_{j} \approx N (0, σ_{a_{j}}^{2})$ , with zero mean and SNP‐specific dispersion parameter $σ_{a_{j}}^{2}$ . The variance associated with the effect of each SNP is assigned an independent and identically distributed scaled inverse chi‐square prior to distribution, $p (σ_{a_{j}}^{2}) = χ^{- 2} (σ_{a_{j}}^{2} | v_{a}, S_{a}^{2})$ , where $v_{a}$ and $S_{a}^{2}$ are known degrees of freedom and scale parameters respectively. With these specifications, the marginal prior distribution of each marker effect follows a t‐distribution, $p (a_{j} | v_{a}, S_{a}^{2}) = \int_{0}^{\infty} N (a_{j} | 0, σ_{a_{j}}^{2}) χ^{- 2} (σ_{a_{j}}^{2} | v_{a}, S_{a}^{2}) d σ_{a_{j}}^{2}$ , i.e. $p (a_{j} | v_{a}, S_{a}^{2}) = t (0, v_{a}, S_{a}^{2})$ (Rosa et al. 2003).

The Bayes B model (Meuwissen et al. 2001) assumes that, a priori, an SNP has no effect at all with a probability π, or it follows a normal distribution with zero mean and an SNP‐specific variance $σ_{j}^{2}$ with a probability 1 − π. That is,

p (a_{j} | σ_{a_{j}}^{2}, π) = \{\begin{matrix} 0 & with probability π \\ N (0, σ_{a_{j}}^{2}) & with probability (1 - π) \end{matrix}

Again, the prior distribution of SNP variances is a scaled inverse chi‐square distribution. Thus, the marginal prior distribution of a_j, after integrating $σ_{a_{j}}^{2}$ out, is a mixture distribution, as follows:

p (a_{j} | π) = \{\begin{matrix} 0 & with probability π \\ t (0, v_{a}, S_{a}^{2}) & with probability (1 - π) \end{matrix}

The Bayes Cπ method (Habier et al. 2011) is similar to the Bayes B approach, except that a prior distribution is also assumed for the proportion π of null‐effect markers, and that a Gaussian prior with a common variance is assumed for each of the non‐null marker effects. The inclusion (or exclusion) of each marker in the model is modeled by an indicator variable δ_j, which is equal to 1 if the marker j is fitted in the model or it is zero otherwise.

Predictive ability

The predictive ability was assessed by employing the strategy called “cross‐validation without doing cross‐validation” (Gianola & Schön 2016). The model performance was assessed using the area under the curve, the mean square error of the prediction and accuracy, defined as the correlation between the phenotypic information and its predicted genomic breeding value. The LOOCV without CV was performed for the Bayesian models by weighing the marker effects by importance sampling weights (Gelfand 1992; Gianola & Schön 2016). The following steps were used to assess the predictive ability:

For the GBLUP method, the algorithm used to predict the genomic breeding value û can be described as follows:

Compute the GEBV using the whole dataset:

\hat{u} = C^{- 1} y .

where $C^{- 1} = {(I + G^{- 1} λ)}^{- 1}$ and $λ = σ_{e}^{2} / σ_{g}^{2}$ .

For $i = 1, 2, \dots, n$

{\tilde{u}}_{i} = \frac{1}{1 - c^{i}} ({\hat{u}}_{i} - c^{i} y_{i}),

where G is the genomic relationship matrix, λ is the ratio between the residual and genetic variances and cⁱ is the diagonal element of the ith individual obtained from the matrix C ⁻¹.

The LOOCV without CV can be performed for the Bayesian alphabet methods by weighting the marker effects by importance sampling weights (Gelfand 1992; Gelfand & Sahu 1999; Gianola & Schön 2016; Vehtari et al. 2017). The following steps can be used to access the prediction accuracy:

Once the Markov chain Monte Carlo (MCMC) iterations are completed, the importance sampling weights for the marker effects can be computed as follows:

w_{i, s} = \frac{\exp [\frac{{(y_{i} - x_{i} a_{s})}^{2}}{2 σ_{e}^{2}}]}{\sum_{s = 1}^{s} \exp [\frac{{(y_{i} - x_{i} a_{s})}^{2}}{2 σ_{e}^{2}}]}

For $i = 1, 2, \dots, n$ ,

{\tilde{u}}_{i} = m_{i} \sum_{s = 1}^{s} w_{i, s} a_{i}^{(s)}

where w is the importance sampling weight, a is the marker effect, y is the phenotype and m _i is a vector representing the genotypes of the ith animal.

Genomic heritability

The narrow‐sense genomic heritabilities (h²) were estimated using the BRR, Bayes A, Bayes B, and Bayes Cπ models as:

h^{2} = \frac{σ_{g}^{2}}{σ_{g}^{2} + σ_{e}^{2}},

where $σ_{g}^{2}$ and $σ_{e}^{2}$ are the genomic and residual variances respectively. The genomic variance was calculated as

σ_{g}^{2} = \sum_{i = 1}^{j} 2 p_{j} (1 - p_{j}) a_{j}^{2} .

Analyses and computer resources

The bglr package (de los Campos & Pérez‐Rodríguez 2014) was used to fit the BRR, Bayes A, Bayes B and Bayes Cπ models using an MCMC method. For each model, the MCMC sampling length was 100 000 iterations, with the first 80 000 excluded as the burn‐in and a thinning interval of 10. Convergence was checked by visual inspection of trace plots and the convergence tests of Brooks, Gelman, and Rubin (Gelman & Rubin 1992) and Geweke (1992). The analyses were performed using the UW–Madison Center for High Throughput Computing in the Department of Computer Sciences.

Results and discussion

One of the most significant advances in dairy cattle health and welfare in recent decades has been the shift from treatment of clinical diseases to the implementation of preventive strategies to maintain health (LeBlanc et al. 2006). Moreover, significant research efforts are now centered on exploring genetic variation associated with the ability of animals to control disease (König & May 2019).

As reported in the literature (Gagneur et al. 2011; Miar et al. 2014), with the continuous reduction of genotyping costs over time, phenotyping has become the most important component in the calibration of GS models. In addition, the identification of a model that provides the best predictive ability and captures most of the genetic variance is one of the most important steps in genomic prediction.

Unlike the Bayesian models that use a probit link function, the pedigree‐based analyses performed to estimate the heritability used a threshold model with logit as a link function. In addition to the 2505 cows used in the genomic analyses, there were more 9229 cows with valid records. The pedigree‐based heritability ranged from 0.0418 (0.0168) to 0.3849 (0.1127) for CE and RD respectively (Table 1). The SNP‐heritability ranges among models for the health traits were (Table 2): RFM, 0.13 (Bayesian Ridge Regression) and 0.19 (Bayes B); MET, 0.12 (Bayes A) and 0.16 (Bayes Cπ); MAST, 0.09 (Bayes A) and 0.14 (Bayes B); DA, 0.06 (Bayesian Ridge Regression) and 0.09 (Bayes Cπ); lameness, 0.15 (Bayes B) and 0.20 (Bayes A); CE, 0.10 (Bayesian Ridge Regression) and 0.21 (Bayes A); RD, 0.09 (Bayesian Ridge Regression) and 0.19 (Bayes Cπ); DYST, 0.10 (Bayesian Ridge Regression) and 0.22 (Bayes Cπ); and SCK, 0.12 (Bayesian Ridge Regression) and 0.19 (Bayes A). Figures S1–S9 show Manhattan plots visualizing the genetic architecture of the health disorders considered in this study. Although the models usually deliver similar predictive abilities when properly tuned, their prior assumptions, distributions and the variable selection performed by some models (i.e. Bayes B and Bayes Cπ) result in different marker effect estimates, which could explain the difference in the genomic heritabilities obtained within traits.

Table 1.

Genetic variance and heritability estimated using pedigree‐based logit mixed model.

Trait	Genetic variance (SD)	Herd variance (SD)	Heritability¹
Retained placenta	0.6338 (0.2555)	0.1675 (0.0010)	0.1516 (0.0536)
Metritis	0.1920 (0.0748)	0.0589 (0.0007)	0.0538 (0.0200)
Mastitis	0.6077 (0.2329)	0.1946 (0.0005)	0.1457 (0.0490)
Displaced abomasum	1.3013 (0.5794)	0.1927 (0.0028)	0.2608 (0.0944)
Lameness	1.5849 (0.5800)	0.2045 (0.0000)	0.3027 (0.0831)
Clinical endometritis	0.1477 (0.0614)	0.0685 (0.0001)	0.0418 (0.0168)
Respiratory disease	2.3445 (0.9729)	0.1824 (0.0009)	0.3849 (0.1127)
Dystocia	1.1502 (0.4278)	0.0803 (0.0002)	0.2476 (0.0730)
Subclinical ketosis	0.7009 (0.2615)	0.3287 (0.0000)	0.1592 (0.0514)

Open in a new tab

Logit model: $h^{2} = σ_{u}^{2} + σ_{h}^{2} + π^{2} / 3$ , where $σ_{u}^{2}$ and $σ_{h}^{2}$ are the estimated additive genetic and herd variances respectively.

Table 2.

Genomic heritability for retained placenta, metritis, mastitis, displaced abomasum, lameness, clinical endometritis, respiratory disease, dystocia and subclinical ketosis using the Bayesian alphabet.

Trait	Bayesian Ridge Regression	Bayes A	Bayes B	Bayes C
Retained placenta	0.1342 (0.0017)	0.1576 (0.0011)	0.1935 (0.0039)	0.1729 (0.0027)
Metritis	0.1501 (0.0077)	0.1202 (0.0006)	0.1404 (0.0036)	0.1687 (0.0029)
Mastitis	0.1299 (0.0025)	0.0995 (0.0013)	0.1428 (0.0055)	0.1016 (0.0019)
Displaced abomasum	0.0609 (0.0014)	0.0671 (0.0005)	0.0799 (0.0004)	0.0965 (0.0014)
Lameness	0.1621 (0.0042)	0.2076 (0.0018)	0.1356 (0.0027)	0.1687 (0.0105)
Clinical endometritis	0.1071 (0.0014)	0.2129 (0.0014)	0.1327 (0.0016)	0.1803 (0.0042)
Respiratory disease	0.0908 (0.0007)	0.1263 (0.0014)	0.1311 (0.0008)	0.1921 (0.0028)
Dystocia	0.1043 (0.0018)	0.2064 (0.0019)	0.1783 (0.0026)	0.2246 (0.0075)
Subclinical ketosis	0.1249 (0.0039)	0.1922 (0.0019)	0.1338 (0.0039)	0.1613 (0.0095)

Open in a new tab

Table 3 shows the results of CV analysis for the predictive ability, in which a measure of the overall fit achieved with each method was assessed by the mean square error of prediction. The Bayesian Ridge Regression had the smallest mean square error for RFM; Bayes A for DA and CE; Bayes B for MET and MAST; and Bayes Cπ for lameness, RD, DYST and SCK.

Table 3.

Predictive ability (accuracy), area under the ROC curve (AUC) and mean squared error of prediction for retained fetal membranes, metritis, mastitis, displaced abomasum, lameness, clinical endometritis, respiratory disease, dystocia and subclinical ketosis using GBLUP and four Bayesian model specifications: Bayesian Ridge Regression, Bayes A, Bayes B and Bayes C.

Trait	GBLUP	Bayesian Ridge Regression	Bayes A	Bayes B	Bayes C
	Accuracy
Retained placenta	0.2600	0.2009	0.3000	0.3401	0.3219
Metritis	0.3360	0.3629	0.2097	0.2932	0.2418
Mastitis	0.3170	0.2352	0.1984	0.5546	0.2044
Displaced abomasum	0.1710	0.0615	0.0258	0.1275	0.2572
Lameness	0.1040	0.0858	0.1160	0.0931	0.1227
Clinical endometritis	0.3100	0.2530	0.3178	0.2780	0.2671
Respiratory disease	0.2510	0.0280	0.2105	0.1122	0.2350
Dystocia	0.2590	0.1307	0.3481	0.1443	0.3156
Subclinical ketosis	0.3480	0.2316	0.3023	0.2178	0.3761
	AUC
Retained placenta	0.7750	0.6819	0.7670	0.8108	0.7881
Metritis	0.7170	0.7332	0.6320	0.6805	0.6528
Mastitis	0.7810	0.6930	0.6593	0.8870	0.6707
Displaced abomasum	0.8630	0.6111	0.5515	0.7209	0.8829
Lameness	0.5809	0.6467	0.6364	0.5656	0.6572
Clinical endometritis	0.6920	0.6488	0.6896	0.6636	0.6624
Respiratory disease	0.8140	0.5626	0.8440	0.6799	0.8957
Dystocia	0.6970	0.5947	0.7301	0.6109	0.7133
Subclinical ketosis	0.7520	0.6531	0.7016	0.6447	0.7598
	Mean square error
Retained placenta	0.8970	0.2446	0.5299	0.3414	0.3077
Metritis	0.3700	0.3996	0.2481	0.1988	0.4018
Mastitis	0.7430	0.5279	0.3446	0.0879	0.5666
Displaced abomasum	0.3130	0.1940	0.1453	0.3283	0.3299
Lameness	1.4090	1.7929	1.3596	1.1740	1.0702
Clinical endometritis	0.4490	0.2634	0.2187	0.2330	0.2630
Respiratory disease	1.7760	0.5032	0.4340	0.6731	0.0202
Dystocia	0.1530	0.1370	0.1372	0.3051	0.1279
Subclinical ketosis	0.5380	0.4337	0.3405	0.4614	0.2971

Open in a new tab

Except for DA, lameness and RD, the predictive abilities were similar across models for all other traits. Overall, these results are in accordance with results in the literature (de los Campos et al. 2013; Gianola 2013), which indicate the similarity of the Bayesian models in terms of predictive ability. The models for best predictive abilities were: RFM, Bayes B (r = 0.34 and AUC = 0.81); MET, Bayes B (r = 0.34 and AUC = 0.81); MAST, Bayes B (r = 0.55 and AUC = 0.89); DA, Bayes Cπ (r = 0.26 and AUC = 0.88); lameness, Bayes Cπ (r = 0.26 and AUC = 0.88); CE, Bayes A (r = 0.32 and AUC = 0.69); RD, Bayes Cπ (r = 0.23 and AUC = 0.91); DYST, Bayes Cπ (r = 0.23 and AUC = 0.91); and SCK, Bayes Cπ (r = 0.23 and AUC = 0.91) respectively (Table 3).

The relatively low estimates of heritability might have limited the prediction accuracies of genomic breeding values. The correlation (Fig. 1) between heritability and accuracy (across methods and traits) ranges from 0.21 (Bayes C) to 0.64 (Bayes A) and from 0.30 (MET) to 0.93 (RFM). These positive and overall high estimates of correlation show that the heritability has indeed limited the prediction accuracies, among other factors. In fact, some authors have reported superior accuracies of genomic breeding values in scenarios involving higher heritable traits as well as larger numbers of phenotypic records and genotyped animals (Bolormaa et al. 2013a,b; Neves et al. 2014; Fernandes Júnior et al. 2016).

The correlation between heritability and accuracy across methods and traits.

Although four different modeling approaches were used for the genomic prediction, the correlations between phenotypes and their genomic breeding values were similar among them. However, it is important to stress that results from any specific genomic selection application cannot always be easily applied to other populations. This means that each application should involve its own model comparison exercise to maximize the predictive ability for each trait.

Conclusion

In this study, we investigated different methods for genomic prediction of breeding values of RFM, MET, MAST, DA, lameness, CE, RD, DYST and SCK. Except for DA, lameness and RD, the predictive abilities were similar between methods. Although the relatively low estimates of heritability might have limited the prediction accuracies of genomic breeding values, there was observed a strong relationship between the predictive ability and the heritability of the trait, where traits with higher heritability achieved higher accuracy and lower bias when compared with those with lower heritability. Overall, the information from high‐density SNP panel can be successfully used to predict genomic breeding values of health traits in Holstein cattle, but the model choice will most likely depend on the genetic architecture of the trait.

Conflict of interest

The authors declare that there is no conflict of interest.

Supporting information

Appendix S1 Genome‐wide association study for health traits.

Figures S1–S9 show Manhattan plots visualizing the genetic architecture of retained fetal membranes, metritis, mastitis, displaced abomasum, lameness, clinical endometritis, respiratory disease, dystocia and subclinical ketosis.

Click here for additional data file.^{(1.1MB, docx)}

Acknowledgements

This research was performed using the computer resources and assistance of the UW‐Madison Center for High Throughput Computing in the Department of Computer Sciences. The authors acknowledge USDA (NIFA AFRI Translational Genomics for Improved Fertility of Animals grant #2013‐68004) for financial support to perform the phenotyping and genotyping of the study animals.

Data availability

The phenotypes and SNP datasets of this Holstein population will be made accessible upon request for reproduction of results.

References

Bauman D.E. & Currie W.B. (1980) Partitioning of nutrients during pregnancy and lactation: a review of mechanisms involving homeostasis and homeorhesis. Journal of Dairy Science 63, 1514–29. [DOI] [PubMed] [Google Scholar]
Bolormaa S., Pryce J.E., Kemper K.E., Hayes B.J., Zhang Y., Tier B., Barendse W., Reverter A. & Goddard M.E. (2013a) Detection of quantitative trait loci in Bos indicus and Bos taurus cattle using genome‐wide association studies. Genetetics Selection Evolution 45, 43. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bolormaa S., Pryce J.E., Kemper K. et al (2013b) Accuracy of prediction of genomic breeding values for residual feed intake and carcass and meat quality traits in Bos taurus, Bos indicus, and composite beef cattle. Journal of Animal Science 91, 3088–104. [DOI] [PubMed] [Google Scholar]
de los Campos G., Hickey J.M., Pong-Wong R., Daetwyler H.D. & Calus M.P.L. (2013) Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193, 327–45. [DOI] [PMC free article] [PubMed] [Google Scholar]
de los Campos G. & Pérez-Rodríguez P. (2014) Bayesian Generalized Linear Regression. R package version 1.0.4. Available at: http://CRAN.R-project.org/package=BGLR (accessed August 1, 2019).
Daetwyler H.D., Pong‐Wong R., Villanueva B. & Woolliams J.A. (2010) The impact of genetic architecture on genome‐wide evaluation methods. Genetics 185, 1021–31. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fernandes Júnior G.A., Rosa G.J.M., Valente B.D. et al (2016) Genomic prediction of breeding values for carcass traits in Nellore cattle. Genetetics Selection Evolution 48, 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gagneur J.M., Elze C. & Tresch A. (2011) Selective phenotyping, entropy reduction, and the mastermind game. BMC Bioinformatics 12, 406. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gelfand A.E. & Sahu S.K. (1999) Identifiability, improper priors and Gibbs sampling for generalized linear models. Journal of the American Statistical Association 94, 247–53. [Google Scholar]
Gelfand M.S. (1992) Statistical analysis and prediction of the exonic structure of human genes. Journal of Molecular Evolution 35, 239–52. [DOI] [PubMed] [Google Scholar]
Gelman A. & Rubin D.B. (1992) Inference from iterative simulation using multiple sequences. Statistical Science 7, 457–72. [Google Scholar]
Geweke J. (1992) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments In: Bayesian Statistics 4 (Ed. by Bernardo J.M., Berger J.O., Dawid A.P. & Smith A.F.M.), pp 169–93. Oxford University Press, Oxford, UK. [Google Scholar]
Gianola D. (2013) Priors in whole‐genome regression: the Bayesian alphabet returns. Genetics 194, 573–96. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gianola D. & Schön C.C. (2016) Cross‐validation without doing cross‐validation in genome‐enabled prediction. G3: Genes|Genomes|Genetics 6, 3107–28. [DOI] [PMC free article] [PubMed] [Google Scholar]
Goff J.P. (2004) Macromineral disorders of the transition cow. Veterinary Clinics of North America: Food Animal Practice 20, 471–94. [DOI] [PubMed] [Google Scholar]
Habier D., Fernando R.L., Kizilkaya K. & Garrick D.J. (2011) Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics 12, 186. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hammon D.S., Evjen I.M., Dhiman T.R., Goff J.P. & Walters J.L. (2006) Neutrophil function and energy status in Holstein cows with uterine health disorders. Veterinary Immunology and Immunopathology 113, 21–9. [DOI] [PubMed] [Google Scholar]
König S. & May K. (2019) Invited review: Phenotyping strategies and quantitative‐genetic background of resistance, tolerance and resilience associated traits in dairy cattle. Animal 13, 897–908. [DOI] [PubMed] [Google Scholar]
LeBlanc S.J., Lissemore K.D., Kelton D.F., Duffield T.F. & Leslie K.E. (2006) Major advances in disease prevention in dairy cattle. Journal of Dairy Science 89, 1267–79. [DOI] [PubMed] [Google Scholar]
Madsen P. & Jensen J. (2008) A User's Guide to DMU. A Package for Analysing Multivariate Mixed Models. Version 6. Release 4.7. Danish Institute of Agricultural Sciences, Tjele. [Google Scholar]
Magnabosco C.U., Lopes F.B., Fragoso R.C., Eifert E.C., Valente B.D., Rosa G.J. & Sainz R.D. (2016) Accuracy of genomic breeding values for meat tenderness in Polled Nellore cattle. Journal of Animal Science 94, 2752–60. [DOI] [PubMed] [Google Scholar]
Meuwissen T.H.E., Hayes B.J. & Goddard M.E. (2001) Prediction of total genetic value using genome‐wide dense marker maps. Genetics 157, 1819–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
Miar Y., Plastow G.S., Bruce H.L., Moore S.S., Durunna O.N., Nkrumah J.D. & Wang Z. (2014) Estimation of genetic and phenotypic parameters for ultrasound and carcass merit traits in crossbred beef cattle. Canadian Journal of Animal Science 94, 273–80. [Google Scholar]
Neves H.H., Carvalheiro R., Brien A.M.O. et al (2014) Accuracy of genomic predictions in Bos indicus (Nellore) cattle. Genetics Selection Evolution 46, 17. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pérez P., de los Campos G., Crossa J. & Gianola D. (2010) Genomic‐enabled prediction based on molecular markers and pedigree using the Bayesian linear regression package in R. Plant Genome Journal 3, 106. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ribeiro E.S., Lima F.S., Greco L.F. et al (2013) Prevalence of periparturient diseases and effects on fertility of seasonally calving grazing dairy cows supplemented with concentrates. Journal of Dairy Science 96, 5682–97. [DOI] [PubMed] [Google Scholar]
Rosa G.J.M., Padovani C.R. & Gianola D. (2003) Robust linear mixed models with normal/independent distributions and Bayesian MCMC implementation. Biometrical Journal 45, 573–90. [Google Scholar]
Santos J.E.P., Bisinotto R.S., Ribeiro E.S., Lima F.S., Greco L.F., Staples C.R. & Thatcher W.W. (2010) Applying nutrition and physiology to improve reproduction in dairy cattle. Reproduction and Fertility Supplement 67, 387–403. [DOI] [PubMed] [Google Scholar]
Sargolzaei M., Chesnais J. & Schenkel F.S. (2014) FImpute Users’ Guide. 17. Available at: http://animalbiosciences.uoguelph.ca/~msargol/fimpute/ (accessed June 1, 2019). [Google Scholar]
Vehtari A., Gelman A. & Gabry J. (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 27, 1413–32. [Google Scholar]
Weller J.I., Ezra E. & Ron M. (2017) Invited review: A perspective on the future of genomic selection in dairy cattle. Journal of Dairy Science 100, 8633–44. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Appendix S1 Genome‐wide association study for health traits.

Click here for additional data file.^{(1.1MB, docx)}

Data Availability Statement

The phenotypes and SNP datasets of this Holstein population will be made accessible upon request for reproduction of results.

[age12892-bib-0001] Bauman D.E. & Currie W.B. (1980) Partitioning of nutrients during pregnancy and lactation: a review of mechanisms involving homeostasis and homeorhesis. Journal of Dairy Science 63, 1514–29. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0002] Bolormaa S., Pryce J.E., Kemper K.E., Hayes B.J., Zhang Y., Tier B., Barendse W., Reverter A. & Goddard M.E. (2013a) Detection of quantitative trait loci in Bos indicus and Bos taurus cattle using genome‐wide association studies. Genetetics Selection Evolution 45, 43. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0003] Bolormaa S., Pryce J.E., Kemper K. et al (2013b) Accuracy of prediction of genomic breeding values for residual feed intake and carcass and meat quality traits in Bos taurus, Bos indicus, and composite beef cattle. Journal of Animal Science 91, 3088–104. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0004] de los Campos G., Hickey J.M., Pong-Wong R., Daetwyler H.D. & Calus M.P.L. (2013) Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193, 327–45. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0005] de los Campos G. & Pérez-Rodríguez P. (2014) Bayesian Generalized Linear Regression. R package version 1.0.4. Available at: http://CRAN.R-project.org/package=BGLR (accessed August 1, 2019).

[age12892-bib-0006] Daetwyler H.D., Pong‐Wong R., Villanueva B. & Woolliams J.A. (2010) The impact of genetic architecture on genome‐wide evaluation methods. Genetics 185, 1021–31. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0007] Fernandes Júnior G.A., Rosa G.J.M., Valente B.D. et al (2016) Genomic prediction of breeding values for carcass traits in Nellore cattle. Genetetics Selection Evolution 48, 7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0008] Gagneur J.M., Elze C. & Tresch A. (2011) Selective phenotyping, entropy reduction, and the mastermind game. BMC Bioinformatics 12, 406. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0009] Gelfand A.E. & Sahu S.K. (1999) Identifiability, improper priors and Gibbs sampling for generalized linear models. Journal of the American Statistical Association 94, 247–53. [Google Scholar]

[age12892-bib-0010] Gelfand M.S. (1992) Statistical analysis and prediction of the exonic structure of human genes. Journal of Molecular Evolution 35, 239–52. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0011] Gelman A. & Rubin D.B. (1992) Inference from iterative simulation using multiple sequences. Statistical Science 7, 457–72. [Google Scholar]

[age12892-bib-0012] Geweke J. (1992) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments In: Bayesian Statistics 4 (Ed. by Bernardo J.M., Berger J.O., Dawid A.P. & Smith A.F.M.), pp 169–93. Oxford University Press, Oxford, UK. [Google Scholar]

[age12892-bib-0013] Gianola D. (2013) Priors in whole‐genome regression: the Bayesian alphabet returns. Genetics 194, 573–96. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0014] Gianola D. & Schön C.C. (2016) Cross‐validation without doing cross‐validation in genome‐enabled prediction. G3: Genes|Genomes|Genetics 6, 3107–28. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0015] Goff J.P. (2004) Macromineral disorders of the transition cow. Veterinary Clinics of North America: Food Animal Practice 20, 471–94. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0016] Habier D., Fernando R.L., Kizilkaya K. & Garrick D.J. (2011) Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics 12, 186. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0017] Hammon D.S., Evjen I.M., Dhiman T.R., Goff J.P. & Walters J.L. (2006) Neutrophil function and energy status in Holstein cows with uterine health disorders. Veterinary Immunology and Immunopathology 113, 21–9. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0018] König S. & May K. (2019) Invited review: Phenotyping strategies and quantitative‐genetic background of resistance, tolerance and resilience associated traits in dairy cattle. Animal 13, 897–908. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0019] LeBlanc S.J., Lissemore K.D., Kelton D.F., Duffield T.F. & Leslie K.E. (2006) Major advances in disease prevention in dairy cattle. Journal of Dairy Science 89, 1267–79. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0020] Madsen P. & Jensen J. (2008) A User's Guide to DMU. A Package for Analysing Multivariate Mixed Models. Version 6. Release 4.7. Danish Institute of Agricultural Sciences, Tjele. [Google Scholar]

[age12892-bib-0021] Magnabosco C.U., Lopes F.B., Fragoso R.C., Eifert E.C., Valente B.D., Rosa G.J. & Sainz R.D. (2016) Accuracy of genomic breeding values for meat tenderness in Polled Nellore cattle. Journal of Animal Science 94, 2752–60. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0022] Meuwissen T.H.E., Hayes B.J. & Goddard M.E. (2001) Prediction of total genetic value using genome‐wide dense marker maps. Genetics 157, 1819–29. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0023] Miar Y., Plastow G.S., Bruce H.L., Moore S.S., Durunna O.N., Nkrumah J.D. & Wang Z. (2014) Estimation of genetic and phenotypic parameters for ultrasound and carcass merit traits in crossbred beef cattle. Canadian Journal of Animal Science 94, 273–80. [Google Scholar]

[age12892-bib-0024] Neves H.H., Carvalheiro R., Brien A.M.O. et al (2014) Accuracy of genomic predictions in Bos indicus (Nellore) cattle. Genetics Selection Evolution 46, 17. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0025] Pérez P., de los Campos G., Crossa J. & Gianola D. (2010) Genomic‐enabled prediction based on molecular markers and pedigree using the Bayesian linear regression package in R. Plant Genome Journal 3, 106. [DOI] [PMC free article] [PubMed] [Google Scholar]

[age12892-bib-0026] Ribeiro E.S., Lima F.S., Greco L.F. et al (2013) Prevalence of periparturient diseases and effects on fertility of seasonally calving grazing dairy cows supplemented with concentrates. Journal of Dairy Science 96, 5682–97. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0027] Rosa G.J.M., Padovani C.R. & Gianola D. (2003) Robust linear mixed models with normal/independent distributions and Bayesian MCMC implementation. Biometrical Journal 45, 573–90. [Google Scholar]

[age12892-bib-0028] Santos J.E.P., Bisinotto R.S., Ribeiro E.S., Lima F.S., Greco L.F., Staples C.R. & Thatcher W.W. (2010) Applying nutrition and physiology to improve reproduction in dairy cattle. Reproduction and Fertility Supplement 67, 387–403. [DOI] [PubMed] [Google Scholar]

[age12892-bib-0029] Sargolzaei M., Chesnais J. & Schenkel F.S. (2014) FImpute Users’ Guide. 17. Available at: http://animalbiosciences.uoguelph.ca/~msargol/fimpute/ (accessed June 1, 2019). [Google Scholar]

[age12892-bib-0030] Vehtari A., Gelman A. & Gabry J. (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 27, 1413–32. [Google Scholar]

[age12892-bib-0031] Weller J.I., Ezra E. & Ron M. (2017) Invited review: A perspective on the future of genomic selection in dairy cattle. Journal of Dairy Science 100, 8633–44. [DOI] [PubMed] [Google Scholar]

PERMALINK

Genome‐enable prediction for health traits using high‐density SNP panel in US Holstein cattle

F Lopes

G Rosa

P Pinedo

J E P Santos

R C Chebel

K N Galvao

G M Schuenemann

R C Bicalho

R O Gilbert

S Rodrigez‐Zas

C M Seabury

W Thatcher

Summary

Introduction

Materials and methods

Study population, health assessment and outcomes of interest

Animals and genotyping

Imputation

Pedigree‐based analyses

Genomic prediction

Predictive ability

Genomic heritability

Analyses and computer resources

Results and discussion

Table 1.

Table 2.

Table 3.

Figure 1.

Conclusion

Conflict of interest

Supporting information

Acknowledgements

Data availability

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases