JDS Communications. 2021 Aug 26;2(6):366–370. doi: 10.3168/jdsc.2021-0092

Genomic evaluations using data recorded on smallholder dairy farms in low- to middle-income countries

Owen Powell 1,*, Raphael Mrode 2,3, R Chris Gaynor 1, Martin Johnsson 1,4, Gregor Gorjanc 1, John M Hickey 1
PMCID: PMC9623656  PMID: 36337118

Graphical Abstract

Summary: This study quantified the power of genomic information to enable genetic evaluation for smallholder dairy farmers in low- and middle-income countries (LMICs). Stochastic simulations were used to generate pedigree, genotype, and trait information for populations with weak genetic connectedness and small herd sizes, reflecting the structure of LMIC dairy cattle datasets. Genomic and pedigree-based evaluations were compared on the accuracy of estimated breeding values for cows with own phenotypes.

Highlights

  • Genomic evaluations outperformed pedigree-based genetic evaluations.

  • Shared haplotypes captured "hidden" genetic relationships to strengthen connectedness in genomic evaluations.

  • Genomic evaluations were possible using LMIC smallholder records from herds with ≤4 cows.

  • Modeling herd as a random effect produced EBVs with the highest accuracies.

Abstract

Breeding has increased genetic gain for dairy cattle in advanced economies but has had limited success in improving dairy cattle in low- to middle-income countries (LMIC). Genetic evaluations are a central component of delivering genetic gain, because they separate the genetic and environmental effects of animals' phenotypes. Genetic evaluations have been successful in advanced economies because of large data sets and strong genetic connectedness, provided by the widespread use of artificial insemination (AI) and accurate recording of pedigree information. In smallholder dairy production systems of many LMICs, the limited use of AI and small herd sizes result in a data structure with insufficient genetic connectedness between herds to facilitate genetic evaluations based on pedigree. Genomic information keeps track of shared haplotypes rather than shared relatives captured by pedigree records. Therefore, genomic information could capture “hidden” genetic relationships that are not captured by pedigree information to strengthen genetic connectedness in LMIC smallholder dairy data sets. This study's objective was to use simulation to quantify the power of genomic information to enable genetic evaluation using LMIC smallholder dairy data sets. The results from this study show that (1) genetic evaluations using genomic information were more accurate than those using pedigree information in populations with a high effective population size and weak genetic connectedness; and (2) genetic evaluations modeling herd as a random effect had accuracy equal to or higher than those modeling herd as a fixed effect. This demonstrates the potential of genomic information to be an enabling technology in LMIC smallholder dairy production systems by facilitating genetic evaluations with in situ records collected from herds of ≤4 cows.
The establishment of routine genomic evaluations could allow the development of LMIC breeding programs comprising an informal set of nucleus animals distributed across many small herds within the target environment. These nucleus animals could be used for genetic evaluation, and the best animals could be disseminated to participating smallholder dairy farms. Together, this could increase the productivity, profitability, and sustainability of LMIC smallholder dairy production systems.


The large increase in milk yield of dairy cattle in advanced economies over the past century is an example of the powerful effect that selective breeding can have on improving livestock productivity (Cole et al., 2020). For example, in the US dairy industry, milk production per cow approximately doubled between 1964 and 2004 (Ma et al., 2019). However, breeding practices have had poor efficacy and adoption in smallholder dairy production systems in many low- to middle-income countries (LMIC), despite the potential benefits. Recent estimates from Kenyan smallholder farms suggest that average milk production per cow is approximately 5 L/d, and there is little evidence of significant genetic improvement in recent decades (Kahi et al., 2004; Muriuki, 2011; Ojango et al., 2016). The low levels of productivity and its economic importance have renewed efforts to improve dairy cow productivity in LMIC smallholder dairy farms (East African Dairy Development Program, 2013; Rothschild and Plastow, 2014; Ducrocq et al., 2018).

Genetic evaluation is a central component of delivering genetic gain. The properties of an ideal data set that enables an accurate genetic evaluation include (1) genetic connectedness between herds or management groups (Kennedy and Trus, 1993); (2) large numbers of animals; and (3) large herd sizes. Such data enable the genetic and environmental effects of an individual animal's phenotype to be accurately separated. These features are not present in many LMIC smallholder dairy production systems. For example, smallholder dairy farmers in Kenya and other East African countries with ≤5 cows account for more than 70% of milk production (East African Dairy Development Program, 2013; Abdulsamad and Gereffi, 2016). Simultaneously, there is a low prevalence of AI use (5–10%; Ojango et al., 2014). Traditionally, this has prevented the establishment of effective pedigree-based genetic evaluation systems in these settings.

Genomic evaluations use a genomic relationship matrix to capture the realized, rather than expected, pedigree-derived relationships between animals (Nejati-Javaremi et al., 1997). The use of genomic information has enhanced many genetic evaluation systems in advanced economies. For example, the accuracy (the square root of reliability) of prediction for milk yield of young candidate bulls increased from 0.62 using pedigree-based BLUP (PBLUP) to 0.85 for genomic-based BLUP (GBLUP; Wiggans et al., 2017). In the context of LMIC smallholder dairy production systems, genomic data could be even more important than it has been in advanced economies. In such a setting, genomic data could capture and utilize information pertaining to haplotypes shared by animals in different herds. This information could reveal genetic connectedness that is unseen by pedigree information, which would, in turn, enable more accurate partitioning of the genetic and environmental effects on an animal's performance in small herds. Therefore, the use of genomic data could establish effective genetic evaluation systems based on data sets with relatively low levels of genetic connectedness (according to pedigree information).
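
To make the idea of realized relationships concrete, the following is a minimal Python sketch of one common way to build a genomic relationship matrix from SNP genotypes (the first method of VanRaden, 2008). The toy genotypes, the function name `genomic_relationship_matrix`, and the use of allele frequencies estimated from the sample itself are assumptions made for illustration; they are not taken from the study's pipeline.

```python
def genomic_relationship_matrix(genotypes):
    """Return G = ZZ' / (2 * sum(p * (1 - p))) for genotypes coded 0/1/2.

    genotypes: list of rows (animals), each a list of allele counts per SNP.
    Allele frequencies are estimated from the sample (an assumption).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals)
             for j in range(n_snps)]
    # Centre each genotype by twice the allele frequency
    centred = [[row[j] - 2 * freqs[j] for j in range(n_snps)]
               for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(centred[i], centred[k])) / scale
             for k in range(n_animals)]
            for i in range(n_animals)]


# Hypothetical genotypes: 4 animals x 5 SNP (not data from the study)
toy_genotypes = [
    [0, 1, 2, 1, 0],
    [0, 1, 2, 1, 1],
    [2, 1, 0, 0, 2],
    [1, 0, 1, 2, 1],
]
G = genomic_relationship_matrix(toy_genotypes)
for row in G:
    print(" ".join(f"{value:6.2f}" for value in row))
```

In this toy example the first two animals carry the same genotype at four of the five SNP, so their off-diagonal element is positive even though no pedigree link between them was supplied.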

Herd or management groups are usually included in the genetic evaluation model to enhance the separation of the genetic and systematic environmental effects of an animal's performance. Herds can be modeled as fixed or random effects (Schaeffer, 2009). Most genetic evaluations in advanced economies model herd as a fixed effect because herd sizes are typically large, which leads to fixed and random effect models giving almost equal solutions (Ugarte et al., 1992; Visscher and Goddard, 1993). When herd sizes are very small, such as in many LMIC smallholder dairy production systems, modeling herd as a fixed effect leads to an over-parameterized system of equations or inaccurate solutions (Oikawa and Sato, 1997). Modeling small herds as random effects may reduce this inaccuracy, yielding EBV with higher accuracies.
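
The effect of herd size on the two treatments of herd can be illustrated with a deliberately simplified one-way model in which the overall mean is known, cows are unrelated, and each record is a herd effect plus an independent within-herd deviation. Under those assumptions, the BLUP of a random herd effect is the herd's mean deviation multiplied by n/(n + λ), where n is the herd size and λ is the ratio of within-herd variance to herd variance, whereas a fixed herd effect uses the unshrunken mean deviation (a weight of 1). The variance values below are placeholders, and the function name is hypothetical.

```python
def herd_shrinkage(n_cows, var_within, var_herd):
    """Weight on the herd mean deviation when herd is a random effect.

    Simplified one-way model with known mean and unrelated cows (an
    assumption): weight = n / (n + var_within / var_herd).
    """
    return n_cows / (n_cows + var_within / var_herd)


# Placeholder variances chosen for illustration only
VAR_WITHIN = 1.2
VAR_HERD = 0.8

for n in (1, 2, 4, 8, 16):
    weight = herd_shrinkage(n, VAR_WITHIN, VAR_HERD)
    print(f"herd size {n:2d}: random-herd weight = {weight:.2f} (fixed herd = 1.00)")
```

The weight moves toward 1 as herds grow, which is consistent with fixed and random treatments giving almost equal solutions in large herds and differing most in very small ones.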

This study used simulation to quantify, first, the power of genomic information to enable genetic evaluation based on phenotypes recorded on smallholder dairy farms and, second, under such conditions, the impact of modeling herds as fixed or random effects. The simulations were performed using AlphaSimR (Gaynor et al., 2020) and were designed to (1) generate whole-genome sequence data, SNP, and QTL; (2) mate 1,000 sires per generation and vary the average herd size to generate pedigree structures with weak genetic connectedness to resemble LMIC smallholder dairy populations; and (3) run genetic evaluations modeling herd as either fixed or random effects. Ten independent replicates of the complete pipeline (that is, the simulation scheme and the genetic evaluations) were completed. Code for the pipeline can be accessed at https://github.com/powellow/lmic_gblup. Conceptually, the simulation scheme was divided into historical and evaluation phases (Figure 1).

Figure 1. An overview of the simulation pipeline. Genome sequences were simulated for founder individuals and 10,000 segregating sites were assigned additive genetic effects for a low heritability trait representing total milk yield. Eleven generations of breeding were undertaken to generate a population with pedigree, genotype, and trait information. In the final generation, populations of 8,000 cows were assigned across herds of different sizes to generate the training sets for genetic evaluations.

A genome consisting of 10 chromosome pairs was simulated. The Markovian Coalescent Simulator (MaCS; Chen et al., 2009) and AlphaSimR were used to generate sequence data for 2,000 founder animals, with an effective population size (Ne) of 1,035 in the final generation, to reflect the high genetic diversity found in cattle populations in Africa (Kim et al., 2017). The 2,000 founder animals served as the initial parents of the simulation. Segregating sites were randomly selected to serve as 5,000 SNP markers per chromosome (50,000 genome-wide in total) and 1,000 QTL per chromosome (10,000 genome-wide in total). A single record for the trait representing total milk yield from a single lactation was simulated for all animals. Therefore, no missing values were present in the data. The true breeding values (TBV) were calculated by summing the average effects of the animal's genotype at each QTL. The QTL additive effects were sampled from a standard normal distribution, $N(0, 1)$, and linearly scaled to produce TBV in the founder population with a variance ($\sigma_a^2$) of 0.2. Herd and random error effects were sampled from normal distributions, resulting in a trait with a narrow-sense heritability of 0.1 and herd effect variance ratio of 0.4, chosen based upon previous literature (Ojango et al., 2019). The TBV, herd effects, and random error effects were summed to create the phenotype of each animal.
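
The variance components implied by these parameters can be worked through directly. The sketch below assumes that heritability is the additive variance divided by the total phenotypic variance, and that the herd effect variance ratio is the herd variance divided by the same total. Neither definition is spelled out in the text, so the derived herd and residual variances should be read as an interpretation rather than the values used in the pipeline. The sketch then simulates phenotypes as TBV + herd effect + residual for a hypothetical set of cows.

```python
import random

VAR_A = 0.2          # additive genetic variance (from the text)
H2 = 0.1             # narrow-sense heritability (from the text)
HERD_RATIO = 0.4     # herd effect variance ratio (from the text)

# Assumed definitions: h2 = var_a / var_p and herd ratio = var_h / var_p
var_p = VAR_A / H2
var_h = HERD_RATIO * var_p
var_e = var_p - VAR_A - var_h
print(f"var_p={var_p:.2f} var_h={var_h:.2f} var_e={var_e:.2f}")


def simulate_phenotypes(herd_of_cow, rng):
    """Phenotype = TBV + herd effect + residual, all normally distributed.

    TBV are drawn directly from N(0, VAR_A) here; the study instead sums
    QTL effects, so this is a simplification.
    """
    herd_effect = {h: rng.gauss(0.0, var_h ** 0.5) for h in set(herd_of_cow)}
    records = []
    for herd in herd_of_cow:
        tbv = rng.gauss(0.0, VAR_A ** 0.5)
        residual = rng.gauss(0.0, var_e ** 0.5)
        records.append((tbv, tbv + herd_effect[herd] + residual))
    return records


rng = random.Random(1)
# Hypothetical layout: 6 cows in 3 herds of 2
records = simulate_phenotypes([0, 0, 1, 1, 2, 2], rng)
```

Under these assumed definitions the total, herd, and residual variances come out at 2.0, 0.8, and 1.0, respectively.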

Recent (burn-in) breeding for milk yield was simulated over 5 discrete generations of selective breeding on phenotype. The features of this breeding stage were 225 sires per generation, 1,000 dams per generation, and 2,000 offspring per generation. These numbers were chosen to match the base population (Ne) of 1,035.

The evaluation phase of the simulation then modeled breeding with weak genetic connectedness for an additional 6 generations, following the common recent breeding burn-in phase. The common features across the evaluation phase were 1,000 sires mated per generation, offspring generated with an equal sex ratio, and, for simplicity, the selection of sires based on phenotype. Genetic connectedness was varied between different training populations by changing the average herd size. Herd sizes were sampled in 2 steps. In the first step, 8,000 samples were taken from a Poisson distribution with a lambda, the mean of the distribution, equal to 1, 2, 4, 8, and 16. However, stochasticity during the sampling process can result in the sum of the herd sizes differing from the size of the training population of cows (8,000). Therefore, a second step randomly sampled herds to correct the difference between the sampled herd sizes and the required herd sizes, leaving exactly 8,000 “slots” across all of the herds. This resulted in training populations with final average herd sizes of 1.58, 2.32, 4.06, 8, and 16. At the end of the simulation's evaluation phase, 8,000 phenotyped cows were randomly selected and randomly assigned to herds to serve as the training populations for the genetic evaluations. This number reflected the number of genotyped animals in the Africa Dairy Genetic Gains project at the time.
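
A two-step procedure of this kind can be sketched as follows. This is not the authors' code: the Poisson sampler (Knuth's multiplication method), the choice to discard empty herds, and the rule for trimming surplus slots are all assumptions, and the function names are hypothetical. The sketch also computes the mean of a Poisson distribution conditional on a herd being non-empty, λ/(1 − e^(−λ)), which is one possible explanation for why the realized averages at λ = 1, 2, and 4 exceed λ.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_herd_sizes(n_cows, lam, rng):
    """Return non-empty herd sizes that sum to exactly n_cows."""
    sizes, total = [], 0
    # Step 1: draw herd sizes, skipping empty herds, until all cows fit
    while total < n_cows:
        size = poisson_draw(lam, rng)
        if size > 0:
            sizes.append(size)
            total += size
    # Step 2: remove surplus slots from randomly chosen herds,
    # never emptying a herd
    surplus = total - n_cows
    while surplus > 0:
        index = rng.randrange(len(sizes))
        if sizes[index] > 1:
            sizes[index] -= 1
            surplus -= 1
    return sizes


def truncated_poisson_mean(lam):
    """Mean of a Poisson(lam) variable given that it is at least 1."""
    return lam / (1.0 - math.exp(-lam))


rng = random.Random(2021)
for lam in (1, 2, 4, 8, 16):
    sizes = sample_herd_sizes(8000, lam, rng)
    print(f"lambda={lam:2d} herds={len(sizes):4d} "
          f"mean={sum(sizes) / len(sizes):.2f} "
          f"expected={truncated_poisson_mean(lam):.2f}")
```

For λ = 1 the conditional mean is about 1.58, and for λ = 8 and 16 it is practically equal to λ because empty herds are then very rare.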

Breeding values were estimated using the following basic model:

y = Xb + Zu + e, [1]

where y is a vector of phenotype records measured on cows; b is a vector of fixed effects; u is a vector of breeding values for which we assumed that $u \sim N(0, A\sigma_a^2)$ with PBLUP and $u \sim N(0, G\sigma_a^2)$ with GBLUP, where A is the pedigree numerator relationship matrix (Henderson, 1975) based on the last 5 generations of error-free pedigrees, and G is the genomic numerator relationship matrix of individuals from the final generation of the evaluation phase (based on 50k SNP chip; VanRaden, 2008); e is a vector of residuals for which we assumed $e \sim N(0, I\sigma_e^2)$; X and Z are the incidence matrices linking phenotype records respectively to b and u; and I is the identity matrix. We adapted the basic model to create 3 genetic evaluation models in relation to a herd effect: (1) we excluded it, which gave us the basic model with intercept as the only fixed effect; (2) we modeled herd as a fixed effect; and (3) we modeled herd as a random effect, for which we assumed $h \sim N(0, I\sigma_h^2)$. All models included an overall intercept. We assumed that the variances of herd effects ($\sigma_h^2$), breeding values ($\sigma_a^2$), and residuals ($\sigma_e^2$) were known, and we set them to the simulated values. Only phenotype data from generation 6 of the evaluation phase were used in genetic evaluations to mimic the recent introduction of phenotype, pedigree, and genomic data recording.
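
For reference, the random-herd variant of model [1] and its mixed model equations can be written out as below. This is the standard Henderson form rather than anything specific to the study's software setup. The symbols W (incidence matrix linking records to herds), K (standing for A under PBLUP or G under GBLUP), and the variance ratios λ_h and λ_a are notation introduced here for illustration.

```latex
\[
\mathbf{y} = \mathbf{1}\mu + \mathbf{W}\mathbf{h} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\mathbf{h} \sim N(\mathbf{0}, \mathbf{I}\sigma_h^2),\;
\mathbf{u} \sim N(\mathbf{0}, \mathbf{K}\sigma_a^2),\;
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)
\]
\[
\begin{bmatrix}
\mathbf{1}'\mathbf{1} & \mathbf{1}'\mathbf{W} & \mathbf{1}'\mathbf{Z} \\
\mathbf{W}'\mathbf{1} & \mathbf{W}'\mathbf{W} + \mathbf{I}\lambda_h & \mathbf{W}'\mathbf{Z} \\
\mathbf{Z}'\mathbf{1} & \mathbf{Z}'\mathbf{W} & \mathbf{Z}'\mathbf{Z} + \mathbf{K}^{-1}\lambda_a
\end{bmatrix}
\begin{bmatrix} \hat{\mu} \\ \hat{\mathbf{h}} \\ \hat{\mathbf{u}} \end{bmatrix}
=
\begin{bmatrix} \mathbf{1}'\mathbf{y} \\ \mathbf{W}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{bmatrix},
\qquad
\lambda_h = \frac{\sigma_e^2}{\sigma_h^2},\;
\lambda_a = \frac{\sigma_e^2}{\sigma_a^2}
\]
```

Treating herd as a fixed effect corresponds to dropping the Iλ_h term, so herd solutions are no longer shrunk toward zero. In the extreme case of one cow per herd, each herd solution can then absorb that cow's entire record, leaving no information for the breeding value.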

The PBLUP and GBLUP models were run using the Wombat software (Meyer, 2007). The different training populations and genetic evaluation models were compared based upon the accuracy of the EBV. We report mean and 95% interval of estimates over replicates. Accuracy was measured as the Pearson correlation coefficient between EBV and TBV.
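
The accuracy measure can be sketched in a few lines of Python. The EBV and TBV values below are made up for illustration, and `pearson_correlation` is a hypothetical helper rather than a function from Wombat or AlphaSimR.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up true and estimated breeding values for five cows
tbv = [0.30, -0.10, 0.55, -0.40, 0.05]
ebv = [0.10, -0.05, 0.20, -0.15, 0.10]

accuracy = pearson_correlation(ebv, tbv)
reliability = accuracy ** 2  # reliability is the squared accuracy
print(f"accuracy={accuracy:.2f} reliability={reliability:.2f}")
```

In a simulation this comparison is possible because the TBV of every animal is known; with field data, accuracy has to be approximated by other means.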

The results from this study showed that genomic information enabled accurate genetic evaluation of phenotyped cows using data sets that comprised small herds with weak genetic connections. The main trends observed were that (1) genetic evaluations using genomic information had higher accuracy than those using pedigree information across all breeding designs; and (2) genetic evaluations with genomic information and modeling herd as a random effect had higher or equal accuracy compared with modeling herd as a fixed effect. The superiority of genetic evaluations using genomic information over pedigree information was consistent across trait heritabilities ($h^2$ = 0.1, 0.3, and 0.5), but this superiority declined as heritability increased (data not shown). The superiority of modeling herd as a random effect was consistent across trait heritabilities (data not shown).

The genetic evaluation of phenotyped cows using genomic information had higher accuracy than that using pedigree information across breeding designs. Table 1 reports the accuracy of EBV of pedigree versus genomic evaluations as average herd size changed. The accuracies reported correspond to models in which herd was modeled as a random effect. At an average herd size of 1.58, phenotyped cows had an accuracy of EBV of 0.40 with PBLUP and 0.52 with GBLUP (an increase of 0.12). At all other average herd sizes, the increase in accuracy of GBLUP compared with PBLUP was between 0.12 and 0.13. Therefore, comparisons of different genetic evaluation models are only presented for GBLUP.

Table 1. The impact of genetic evaluation method on EBV accuracy¹

Method   Average herd size
         1.58   2.32   4      8      16
PBLUP    0.40   0.40   0.43   0.44   0.45
GBLUP    0.52   0.52   0.55   0.56   0.58

¹Comparison of the accuracy of genetic evaluation methods for training populations with different average herd sizes and using the pedigree (PBLUP) or genomic (GBLUP) method. Herd was modeled as a random effect. Standard errors were ≤0.01.

Genomic evaluations with herd modeled as a random effect had higher accuracy than modeling herd as a fixed effect at small average herd sizes. However, the accuracies of the 2 modeling approaches converged once a herd size of 8 was reached. Figure 2 plots the average herd size against accuracy for each of the 3 evaluation models. Figure 2 shows that excluding a herd effect gave an accuracy of 0.48, averaged across all herd sizes. At average herd sizes of 1.58 and 2.32, modeling herd as a random effect increased the accuracy by 0.10 and 0.05, respectively, compared with modeling herd as a fixed effect. At an average herd size of 8, the accuracies from the 2 modeling approaches practically converged.

Figure 2. A comparison of the statistical modeling of herd effects with genomic BLUP (GBLUP), showing the accuracy of EBVs for training populations with different average herd size (1–16) when the herd effect was (1) excluded from the model (No Herd), (2) modeled as a fixed effect (Herd Fixed), and (3) modeled as a random effect (Herd Random).

Our results demonstrate that genomic evaluations could be effective for dairy cattle populations with weak genetic connectedness, small herd sizes, and low heritability traits. Both the improvement in the accuracy of EBV from genomic evaluations compared with pedigree-based evaluations (Wiggans et al., 2017) and the benefit of modeling small herds as random effects (Ugarte et al., 1992; Visscher and Goddard, 1993; Schaeffer, 2009) have been shown separately in advanced economies. We confirm these previous findings and extend them by demonstrating that genomic evaluations provide accurate EBV when using data structures representative of LMIC smallholder dairy production systems. Such smallholder dairy production systems will further benefit from using genomic evaluations compared with pedigree-based evaluations because of implicit increases in genetic connectedness between very small herds as a result of tracking shared haplotypes rather than shared relatives. The increases in genetic connectedness result in lower confounding between genetic and nongenetic effect estimates. However, our simulations did not model the full complexity of practical genetic evaluations for LMIC smallholder dairy production systems. A limitation of the current study is the assumed quality of phenotypes. We partially reflected the low quality of phenotypes by simulating a trait heritability of 0.1, based on empirical data (Ojango et al., 2019). However, we ignored the impact of missing data. Therefore, this study's results, dependent on the accurate estimation of variance components, should be validated with empirical data. Projects such as Africa Dairy Genetic Gains, with ongoing data collection efforts, are in a prime position to do so.

The establishment of effective genomic evaluations could enable in situ data recorded on smallholder farms to be used to drive in situ genetic improvement for LMIC target production environments. Such LMIC breeding programs could comprise an informal set of nucleus animals distributed across a subset of small herds within the target environment. These nucleus animals could be used for the genetic evaluation and the best animals disseminated to participating smallholder dairy farms. Together, this would increase the productivity, profitability, and sustainability of LMIC smallholder dairy production systems.

However, the infrastructure required for such breeding programs and the associated technologies is expensive, potentially creating a new cost barrier to animal breeding success in LMIC smallholder dairy production systems. New business models are needed to overcome this barrier in a self-sustaining way. These business models could bundle technology, data recording, extension services, and a marketplace for LMIC smallholder farmers. This type of self-sustaining platform would maximize the benefits and cost-efficiency of any component (e.g., the genotyping and phenotyping of animals). The Africa Dairy Genetic Gains and Public Private Partnership for AI Dissemination (PAID, 2017) projects, emerging social enterprises (e.g., One Acre Fund; https://oneacrefund.org/), and electronic marketplaces for agricultural products in LMICs (e.g., Livestock 247; https://livestock247.com) show that many components of such a model are already in place.

Notes

The authors acknowledge financial support from the Biotechnology and Biological Sciences Research Council (BBSRC) Institute Strategic Programme Grant to The Roslin Institute (Edinburgh, UK; BBS/E/D/30002275), and the Centre for Tropical Livestock Genetics and Health (CTLGH, Edinburgh, UK) dairy genetics program.

The authors thank Georgios Banos (The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh and Scotland's Rural College, Edinburgh, UK) for valuable comments on a previous draft of the paper. This work made use of the resources provided by the Edinburgh Compute and Data Facility (ECDF; http://www.ecdf.ed.ac.uk).

The authors have not stated any conflicts of interest.

References

  1. Abdulsamad A., Gereffi G. East Africa dairy value chains: Firm capabilities to expand regional trade. Africa Dairy Genetic Gains (ADGG) 2016. https://www.theigc.org/wp-content/uploads/2017/03/Dairy-chain-brief.pdf
  2. Chen G.K., Marjoram P., Wall J.D. Fast and flexible simulation of DNA sequence data. Genome Res. 2009;19:136–142. doi: 10.1101/gr.083634.108. PMID: 19029539.
  3. Cole J.B., Eaglen S.A.E., Maltecca C., Mulder H.A., Pryce J.E. The future of phenomics in dairy cattle breeding. Anim. Front. 2020;10:37–44. doi: 10.1093/af/vfaa007. PMID: 32257602.
  4. Ducrocq V., Laloe D., Swaminathan M., Rognon X., Tixier-Boichard M., Zerjal T. Genomics for ruminants in developing countries: From principles to practice. Front. Genet. 2018;9:251. doi: 10.3389/fgene.2018.00251. PMID: 30057590.
  5. East African Dairy Development Program. Final Report. 2013. https://cgspace.cgiar.org/bitstream/handle/10568/79437/EADD%20FINAL%20REPORT.pdf
  6. Gaynor R.C., Gorjanc G., Wilson D.L., Money D., Hickey J.M. AlphaSimR: An R Package for Breeding Program Simulations. 2020. https://CRAN.R-project.org/package=AlphaSimR
  7. Henderson C.R. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31:423–447. doi: 10.2307/2529430. PMID: 1174616.
  8. Kahi A.K., Nitter G., Gall C.F. Developing breeding schemes for pasture based dairy production systems in Kenya: II. Evaluation of alternative objectives and schemes using a two-tier open nucleus and young bull system. Livest. Prod. Sci. 2004;88:179–192. doi: 10.1016/j.livprodsci.2003.07.015.
  9. Kennedy B.W., Trus D. Considerations on genetic connectedness between management units under an animal model. J. Anim. Sci. 1993;71:2341–2352. doi: 10.2527/1993.7192341x. PMID: 8407646.
  10. Kim J., Hanotte O., Mwai O.A., Dessie T., Bashir S., Diallo B., Agaba M., Kim K., Kwak W., Sung S., Seo M., Jeong H., Kwon T., Taye M., Song K.D., Lim D., Cho S., Lee H.J., Yoon D., Oh S.J., Kemp S., Lee H.K., Kim H. The genome landscape of indigenous African cattle. Genome Biol. 2017;18:34. doi: 10.1186/s13059-017-1153-y. PMID: 28219390.
  11. Ma L., Sonstegard T.S., Cole J.B., VanTassell C.P., Wiggans G.R., Crooker B.A., Tan C., Prakapenka D., Liu G.E., Da Y. Genome changes due to artificial selection in U.S. Holstein cattle. BMC Genomics. 2019;20:128. doi: 10.1186/s12864-019-5459-x. PMID: 30744549.
  12. Meyer K. Wombat: A tool for mixed model analyses in quantitative genetics by REML. J. Zhejiang Univ. Sci. B. 2007;8:815–821. doi: 10.1631/jzus.2007.B0815.
  13. Muriuki H.G. Dairy development in Kenya. Food and Agriculture Organization of the United Nations; 2011. http://www.fao.org/3/al745e/al745e.pdf
  14. Nejati-Javaremi A., Smith C., Gibson J.P. Effect of total allelic relationship on accuracy of evaluation and response to selection. J. Anim. Sci. 1997;75:1738–1745. doi: 10.2527/1997.7571738x. PMID: 9222829.
  15. Oikawa T., Sato K. Treating small herds as fixed or random in an animal model. J. Anim. Breed. Genet. 1997;114:177–183. doi: 10.1111/j.1439-0388.1997.tb00503.x. PMID: 21395813.
  16. Ojango J.M.K., Marete A., Mujibi D., Rao J., Pool J., Rege J.E.O., Gondro C., Weerasinghe W.M.S.P., Gibson J.P., Okeyo A.M. A novel use of high density SNP assays to optimize choice of different crossbred dairy cattle genotypes in small-holder systems in East Africa. Proc. 10th World Congress of Genetics Applied to Livestock Production 2–4. 2014.
  17. Ojango J.M.K., Mrode R., Rege J.E.O., Mujibi D., Strucken E.M., Gibson J., Mwai O. Genetic evaluation of test-day milk yields from smallholder dairy production systems in Kenya using genomic relationships. J. Dairy Sci. 2019;102:5266–5278. doi: 10.3168/jds.2018-15807.
  18. Ojango J.M.K., Wasike C.B., Enahoro D.K., Okeyo A.M. Dairy production systems and the adoption of genetic and breeding technologies in Tanzania, Kenya, India and Nicaragua. Anim. Genet. Resour. 2016;59:81–95. doi: 10.1017/S2078633616000096.
  19. PAID. Public Private Partnership for Artificial Insemination Delivery (PAID). 2017. https://www.slideshare.net/ILRI/adgg-paid2-feb2017
  20. Rothschild M.F., Plastow G.S. Applications of genomics to improve livestock in the developing world. Livest. Sci. 2014;166:76–83. doi: 10.1016/j.livsci.2014.03.020.
  21. Schaeffer L.R. Contemporary groups are always random. 2009. http://www.aps.uoguelph.ca/~lrs/LRSsite/ranfix.pdf
  22. Ugarte E., Alenda R., Carabaño M.J. Fixed or random contemporary groups in genetic evaluations. J. Dairy Sci. 1992;75:269–278. doi: 10.3168/jds.S0022-0302(92)77762-5.
  23. VanRaden P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008;91:4414–4423. doi: 10.3168/jds.2007-0980. PMID: 18946147.
  24. Visscher P.M., Goddard M.E. Fixed and random contemporary groups. J. Dairy Sci. 1993;76:1444–1454. doi: 10.3168/jds.S0022-0302(93)77475-5.
  25. Wiggans G.R., Cole J.B., Hubbard S.M., Sonstegard T.S. Genomic selection in dairy cattle: The USDA experience. Annu. Rev. Anim. Biosci. 2017;5:309–327. doi: 10.1146/annurev-animal-021815-111422. PMID: 27860491.

Articles from JDS Communications are provided here courtesy of Elsevier