Abstract
Sheep (Ovis aries) are a major source of meat, milk and fiber in the form of wool, and represent a distinct class of animals that have a specialized digestive organ, the rumen, which carries out the initial digestion of plant material. We have developed and analyzed a high quality reference sheep genome and transcriptomes from 40 different tissues. We identified highly expressed genes encoding keratin cross-linking proteins associated with rumen evolution. We also identified genes involved in lipid metabolism that had been amplified and/or had altered tissue expression patterns. This may be in response to changes in the barrier lipids of the skin, an interaction between lipid metabolism and wool synthesis, and an increased role of volatile fatty acids in ruminants, compared to non-ruminant animals.
Sheep and goats are thought to be the first domesticated livestock species and thus integral to animal husbandry. Sheep are ruminants, digesting plant material in a four-chambered stomach (1). The largest compartment, the rumen, uses microbial flora to ferment the feed, facilitating the conversion of ligno-cellulose rich plant materials, of low value in the human diet, to animal protein (2). The rumen is thought to have evolved around 35-40 million years ago (3), coinciding with the emergence of grasslands, in a cooler climate and an atmosphere containing less CO2 than today (4, 5). Ruminants are now the dominant terrestrial herbivores. The rumen microbial flora also generate volatile fatty acids (VFAs) (6), requiring specialized energy and lipid metabolism in ruminants, and produce the greenhouse gas methane, which may be relevant to climate change (7). Another feature of sheep is wool, a significant proportion of whose weight is lanolin, formed primarily from wax esters (8, 9). Thus, synthesis of wool may be linked to fatty acid metabolism. Given these unusual evolutionary traits, sheep provide a novel area for exploration of the genetic underpinnings of digestion and fatty acid metabolism.
We assembled the reference genome sequence of the sheep (Fig. 1) from two Texel individuals totaling ~150 fold sequence coverage (table S1), using linkage and radiation hybrid maps (tables S2, S3) to order and orientate the super-scaffolds (10). The final sheep genome assembly, Oar v3.1, has a contig N50 length of ~40 kb and a total assembled length of 2.61 Gb, with ~99% anchored onto the 26 autosomes and the X chromosome (table S4) (10). About 0.2% of each Texel’s genome consists of heterozygous loci, i.e. single nucleotide polymorphisms (SNPs) (Figs. 1C, D), a quarter of which are heterozygous in both animals. Due to selection for a beneficial muscle hypertrophy phenotype in the Texel breed (11) (OMIA 001426-9940), both sheep share a long run of homozygosity spanning the MSTN gene (Figs. 1C, D). The SNPs also enabled us to identify allele-specific gene expression (Fig. 1E) (supplementary online text) (10).
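For readers unfamiliar with the contig N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: contigs are sorted from longest to shortest, and the N50 is the length of the contig at which the running total first reaches half of the summed length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the Oar v3.1 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths in kb (not real assembly data).
toy_contigs_kb = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs_kb)
print(toy_n50)
```

With these toy values the total is 300 kb, so the N50 is the length of the contig at which the cumulative sum first reaches 150 kb.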
We identified segmental duplications in sheep (Fig. 1A, table S5) (supplementary online material) (10) and compared the genome assemblies of sheep, goats and cattle (fig. S1), identifying 141 breakpoints between sheep and cattle (Fig. 1B, table S6) (supplementary online text). We compared the sequences of sheep, goat, cattle, yak, pig, camel, horse, dog, mouse, opossum and human proteins and identified 4,850 single-copy orthologous genes from which we constructed a phylogenetic tree (Figs. 2, S2) (10). The separation of sheep from goats and the diversification of the bovids occurred contemporaneously with the expansion of the C4 grasses in the late Neogene (4). RNA-Seq transcriptome data were generated from 94 tissue samples (from 40 tissues), including 83 from four additional Texel individuals (table S7) (supplementary online text) (10). A protein clustering analysis among the 11 mammalian species identified 321 expanded sub-families in the ruminant branch, of which 73 were ruminant specific (tables S8, S9) (10). We identified sheep genes exhibiting changes in copy number (e.g. lysozyme C-related proteins, prolactin-related proteins, pregnancy-associated glycoproteins, RNASE1, ASIP, MOGAT2, MOGAT3) and changes in tissue specificity of gene expression (e.g. MOGAT2, MOGAT3, FABP9) (Figs. 1, 2) (supplementary online text) (10).
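The selection of single-copy orthologous genes can be illustrated with a small sketch. It assumes that proteins have already been grouped into clusters by some upstream method (the clustering procedure itself is described in the supplementary material and is not reproduced here) and simply keeps clusters in which every species contributes exactly one gene. The function name, data layout and gene identifiers are hypothetical.

```python
def single_copy_orthologs(clusters, species):
    """Keep clusters with exactly one gene from each listed species.

    `clusters` is a list of clusters; each cluster is a list of
    (species, gene_id) pairs. This layout is an assumption made
    for illustration only.
    """
    required = set(species)
    selected = []
    for cluster in clusters:
        counts = {}
        for sp, _gene in cluster:
            counts[sp] = counts.get(sp, 0) + 1
        # Every required species present, none extra, one copy each.
        if set(counts) == required and all(c == 1 for c in counts.values()):
            selected.append(cluster)
    return selected


# Hypothetical toy clusters for three species (identifiers are invented).
toy_species = ["sheep", "goat", "cattle"]
toy_clusters = [
    [("sheep", "s1"), ("goat", "g1"), ("cattle", "c1")],                  # single copy
    [("sheep", "s2"), ("sheep", "s3"), ("goat", "g2"), ("cattle", "c2")],  # expanded in sheep
    [("sheep", "s4"), ("goat", "g3")],                                     # missing in cattle
]
kept = single_copy_orthologs(toy_clusters, toy_species)
print(len(kept))
```

Clusters with more than one copy in a lineage are excluded here, although such clusters are exactly the ones of interest when looking for expanded gene families.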
The mammalian epidermal development complex (EDC) region contains up to 70 genes encoding proteins involved in keratinized epidermal structure development, including the rumen (1), skin and wool (12). The sheep EDC region (Figs. 1, 3) included several genes that were previously unidentified, or poorly annotated, in any mammalian genome (table S10). One such gene, in the top 0.1% of all genes expressed in the rumen but not expressed in the skin (Fig. 3A), is predicted to encode a large S100 fused-type protein (12). This protein has homology to trichohyalin (TCHH) (12), which is highly expressed in the skin, and we designated it Trichohyalin-like 2 (TCHHL2) (figs. S3, S4). Expressed sequence tag (EST) data indicate that TCHHL2 is also expressed in the rumen of cattle (supplementary online text). Although not previously annotated, syntenically conserved orthologous genes to TCHHL2 were detected in many mammalian genomes, including a marsupial, the Tasmanian devil, and a monotreme, the platypus (fig. S3), but not in the chicken, suggesting that TCHHL2 may be specific to mammals. All TCHHL2 orthologs encode a protein containing up to 70 tandem copies of a highly variable 15 amino acid repeat, rich in arginine, glutamic acid, aspartic acid, and proline, which does not appear to be rumen specific (fig. S3). A short array of 7 copies of the 15 amino acid repeat unit has been duplicated in the common ancestor of ruminants (Figs. 3B, S4). S100 fused-type proteins are substrates for transglutaminase-mediated cross-linking to themselves, and to other proteins including keratins, during epidermal cornification and hair/wool development (12), suggesting TCHHL2 may play a role in cross-linking the keratins at the rumen surface. TCHHL2 was expressed in other sheep epidermal tissues, but at levels >1000-fold lower than in the rumen; thus, it may have a similar, but less extensive, role in these tissues.
PRD-SPRRII (13) is also in the EDC region and the top 0.1% of genes expressed in the rumen, but not in any other tissue sampled including skin (Fig. 3A). PRD-SPRRII is homologous to the SPRR2 gene family (12, 14), but encodes a protein with a distinctive proline and histidine rich sequence which disrupts the glutamine rich amino-terminus present in SPRR2 proteins, potentially affecting its transglutamination sites (Fig. 3C) (15). We identified four additional genes related to PRD-SPRRII in the sheep EDC region, two of which were also highly expressed in the rumen, but not in any other tissue sampled including the skin (Fig. 3A), and eight related genes in the cattle EDC region also expressed in the rumen, but none in non-ruminants (Fig. 3C) (supplementary online text). Thus it appears that the ruminant specific PRD-SPRRII family genes, resulting from the amplification and sequence divergence of an SPRR2 gene, have gained a new expression pattern, altered amino-terminal sequence, and modified function during rumen evolution. By analogy with SPRR2 (12, 16) the PRD-SPRRII family proteins are predicted to be major structural proteins and may function in the cornification of the keratin-rich surface of the rumen.
The primary role of the skin and wool, an important economic product of sheep, is to form a barrier between the organism and the external environment, reducing water and heat loss and pathogen entry (17). The sheep EDC gene, LOC101122906, which we designated LCE7A, represents a previously unrecognized subfamily of the late cornified envelope (LCE) genes (12, 18) (fig. S5). We identified LCE7A coding sequence in the genomes of most mammals, although it has not been previously annotated (fig. S5). LCE7A is expressed in sheep (Fig. 3A, table S10), cattle and goat skin (supplementary online text), but not in the rumen, or any other tissue examined (Fig. 3A). In situ hybridization showed LCE7A expression in Merino wool follicles including the inner root sheath (fig. S6). LCE7A contains the transglutaminase target site present in most LCE proteins (fig. S5) and is likely to be a substrate for transglutaminase mediated cross-linking of proteins in the epidermis, inner root sheath or wool (18). LCE7A also appears to be under positive selection in the sheep lineage, with a sheep vs. cattle ka/ks ratio of 2.5 (P < 0.005), possibly reflecting an association of LCE7A with wool development.
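As background to the ka/ks ratio quoted above, the sketch below shows one simple way such a ratio can be obtained once synonymous and non-synonymous sites and differences have been counted between two aligned coding sequences. It uses a Jukes-Cantor correction in the style of counting methods such as Nei-Gojobori; this is a swapped-in textbook approach chosen for illustration, not necessarily the method used for LCE7A, and the input counts are hypothetical. A ratio above 1 is conventionally read as a signal of positive selection, a ratio near 1 as neutral evolution, and a ratio below 1 as purifying selection.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates.

    Inputs are counts from a pairwise codon alignment; how they are
    counted is outside the scope of this sketch.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ZeroDivisionError("Ks is zero; the ratio is undefined")
    return ka / ks


# Hypothetical counts (not derived from the LCE7A alignment).
ratio = ka_ks(nonsyn_diffs=20, nonsyn_sites=100, syn_diffs=5, syn_sites=100)
print(ratio > 1.0)
```

A ratio on its own does not establish significance; a statistical test, such as the one behind the P value reported above, is still required.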
Wool grease (lanolin), secreted from the sebaceous glands attached to the wool follicles, constitutes 10-25% of the wool weight (9). The wool follicles are located in the dermal layer between the surface keratinocytes, which synthesize surface lipids (19), and the subcutaneous adipose tissue. We identified the genes encoding lipid metabolic enzymes expressed in the skin (table S11) and positioned them on known and putative lipid metabolic pathways likely to be involved in the storage and mobilization of long chain fatty acid components of the sebum and epidermal lipids (Fig. 4A). Unexpectedly, the skin transcriptome revealed high expression of MOGAT2 and MOGAT3, members of the acylglycerol O-acyltransferase (DGAT2/MOGAT) gene family, which are involved in the synthesis of diacylglyceride (DAG) and triacylglyceride (TAG) from monoacylglyceride (MAG) (Fig. 4A). In humans, MOGAT3 catalyzes an essential and rate-limiting step in the absorption of dietary fat in the small intestine (20), and is an important liver enzyme (21). MOGAT2 and MOGAT3 are single copy genes in almost all mammals with available data. However, in ruminants both genes have undergone tandem gene expansions, indicative of evolving functionality (Figs. 4B, S7, S8). MOGAT2 has more than five tandemly duplicated copies in sheep, with the first copy expressed in the duodenum and the last copy expressed in the skin, and with no expression of any copy detected in the liver (fig. S7). Three nearly identical MOGAT3 copies were highly expressed in sheep skin, and at a much lower level in white adipose and omentum (Fig. 4B). In contrast to humans, we detected no expression of functional MOGAT3 in sheep duodenum or liver. The skin had two MOGAT3 splice variants (fig. S9): the most common transcript encodes a predicted protein orthologous to the typical mammalian MOGAT3; the second contains a predicted alternate start codon and amino-terminal sequence that is missing the probable membrane anchor (fig. S10) (supplementary online text), and is predicted to be uncoupled from the endoplasmic reticulum membrane bound TAG synthesis pathway (Fig. 4A).
The presence of MOGAT2 and MOGAT3 in sheep skin indicates that there may be an alternative pathway for DAG synthesis, either from recycling MAG generated from the mobilization of TAG stored within a cell to generate fatty acids for incorporation into other products, or from external sources of MAG (Fig. 4A). The MOGAT pathway does not generate glycerol, which requires phosphorylation in the liver prior to reuse for TAG synthesis in the skin via the glycerol-3-phosphate (G3P) pathway (Fig. 4A), potentially increasing the efficiency of recycling of the glycerol backbone in sheep skin. The MOGAT pathway also bypasses 1-acyl-lysophosphatidic acid (LPA) and phosphatidic acid (PA) synthesis (Fig. 4A). Skin produces a lipase (LIPH) to cleave PA into 2-acyl LPA, which has a role in controlling hair follicle development (22). LIPH mutations in several mammalian species result in wool-like hair due to changes in follicle shape (23). Thus the MOGAT pathway in sheep skin may also reduce the coupling between TAG and PA synthesis, skin barrier lipid synthesis and follicle development signaling, facilitating wool production.
In ruminants, the liver is primarily a gluconeogenic organ utilizing propionate (a VFA) as the source of carbon. It contributes little to the synthesis of lipids, or to the uptake of lipids from circulation, and is inefficient at exporting TAG (24, 25). The apparent loss of MOGAT3 expression in the intestines and both MOGAT2 and MOGAT3 in the liver may reflect the greater importance of VFAs and reduced importance of the liver in long chain fatty acid metabolism in ruminants compared to non-ruminants.
In conclusion, we identified major genomic signatures associated with interactions between diet, the digestive system and metabolism in ruminants. These include two extensions of their biochemical capabilities that have been extensively exploited by humans: the production of wool by sheep and the evolution of an organ that houses a diverse community of microorganisms which enable efficient digestion of plants.
Supplementary Material
Acknowledgements
The International Sheep Genomics Sequencing Consortium is grateful to the following for funding support for the sheep genome sequencing project: two 973 Programs (No. 2013CB835200, No. 2007CB815700) to Kunming Institute of Zoology, China; BGI-Shenzhen (ZYC200903240077A, ZYC200903240078A); China National GeneBank-Shenzhen for support for the storage of samples and data; Inner Mongolia Agricultural University (30960246, 31260538); The Roslin Institute, University of Edinburgh and Biotechnology and Biological Sciences Research Council, U.K. (BBSRC): BB/1025360/1, BB/I025328/1, Institute Strategic Programme and National Capability Grants; The Roslin Foundation; The Scottish Government, U.K.; Defra/HEFC/SHEFC Veterinary Training and Research Initiative, U.K.; USDA-ARS, USA, USDA-NRICGP, USA (grant numbers 2008-03923 and 2009-03305); Wellcome Trust (grant numbers WT095908 and WT098051), BBSRC (grant number BB/I025506/1) and EMBL; USDA-NRSP-8, USA; USDA-ARS grant 5348-32000-031-00D; Meat and Livestock Australia and Australian Wool Innovation Limited through sheepGENOMICS, Australia; Australian Government International Science Linkages Grant (CG090143), Australia; University of Sydney, Australia; CSIRO, Australia; AgResearch, NZ, Beef + Lamb NZ through Ovita, New Zealand; INRA and ANR project SheepSNPQTL, France; European Union through the Seventh Framework Programme Quantomics (KBBE222664) and 3SR (KBBE245140) projects; the Ole Rømer grant from Danish Natural Science Research Council, BGI-Shenzhen, China; Australian Department of Agriculture Food and Fisheries, Filling the Research Gap project, “Host control of methane emissions”. We thank Brad Freking (USDA-ARS-USMARC) for provision of Texel ram tissue samples for DNA extraction and sequencing. We thank the sequencing teams and other contributors; full details are given in the acknowledgements section of the supplementary material. We thank SheepGENOMICS and Utah State University for access to the genotyping data for the SheepGENOMICS and LSU flocks respectively.
Footnotes
The authors declare no competing financial interests.
The genome assemblies have been deposited in GenBank, Oar v1.0 (GCA_000005525.1), and Oar v3.1 (GCA_000298735.1) and in GigaDB, Oar v2.0 (http://dx.doi.org/10.5524/100023). The Ensembl annotation of the Oar v3.1 assembly is available at http://www.ensembl.org/Ovis_aries/Info/Annotation#assembly and the gene builds are available from ftp://ftp.ensembl.org/pub/release-74/bam/ovis_aries/genebuild/. The RNA-Seq datasets have been deposited in public databases; 83 samples from The Roslin Institute (ENA study accession PRJEB6169), seven tissues from the genome sequenced Texel ewe and Gansu alpine fine wool sheep skin (GenBank accession GSE56643), 3 blood samples (GenBank BioProject accession PRJNA245615). The MeDIP-seq raw reads from the Texel ewe liver have been deposited in GenBank (GSE56644). The BAC sequence assembly has been deposited in GenBank (KJ735098). The raw reads of the genome sequencing projects have been deposited in public nucleotide databases; Texel ewe (GenBank accession SRA059406), Texel ram (ENA study accession PRJEB6251, GenBank accession SRP015759), 0.5 fold coverage 454 sequencing of six animals (GenBank accessions SRP000982, SRP003883, SRP006794).
References and Notes
- 1. Hofmann RR. Oecologia. 1989;78:443–457. doi: 10.1007/BF00378733.
- 2. Wolin MJ. Science. 1981;213:1463–1468. doi: 10.1126/science.7280665.
- 3. Hackmann TJ, Spain JN. J. Dairy Sci. 2010;93:1320–1334. doi: 10.3168/jds.2009-2071.
- 4. Stromberg CAE. Annu. Rev. Earth Planet. Sci. 2011;39:517–544.
- 5. Edwards EJ, et al. Science. 2010;328:587–591. doi: 10.1126/science.1177216.
- 6. Bergman EN. Physiol. Rev. 1990;70:567–590. doi: 10.1152/physrev.1990.70.2.567.
- 7. Johnson KA, Johnson DE. J. Anim. Sci. 1995;73:2483–2492. doi: 10.2527/1995.7382483x.
- 8. Stewart ME. chap. 43. In: Bereiter-Hahn J, Matoltsy AG, Richards KS, editors. Biology of the integument. Vol. 2. Springer; Berlin Heidelberg: 1986. pp. 824–832.
- 9. Schlossman M, McCarthy J. J. Am. Oil Chem. Soc. 1978;55:447–450.
- 10. Material and methods are available as supplementary material on Science Online.
- 11. Clop A, et al. Nat. Genet. 2006;38:813–818. doi: 10.1038/ng1810.
- 12. Kypriotou M, Huber M, Hohl D. Exp. Dermatol. 2012;21:643–649. doi: 10.1111/j.1600-0625.2012.01472.x.
- 13. Wang L, Baldwin RL 6th, Jesse BW. Biochem. J. 1996;317(Pt 1):225–233. doi: 10.1042/bj3170225.
- 14. Song H, et al. Genomics. 1999;55:28–42. doi: 10.1006/geno.1998.5607.
- 15. Deng J, Pan R, Wu R. J. Biol. Chem. 2000;275:5739–5747. doi: 10.1074/jbc.275.8.5739.
- 16. Steinert P, Candi E, Kartasova T, Marekov L. J. Struct. Biol. 1998;122:76–85. doi: 10.1006/jsbi.1998.3957.
- 17. Feingold K. J. Lipid Res. 2007;48:2531–2546. doi: 10.1194/jlr.R700013-JLR200.
- 18. Marshall D, Hardman M, Nield K, Byrne C. Proc. Natl. Acad. Sci. U.S.A. 2001;98:13031–13036. doi: 10.1073/pnas.231489198.
- 19. Radner F, Grond S, Haemmerle G, Laser A, Zechner R. Dermatoendocrinol. 2011;3:77–83. doi: 10.4161/derm.3.2.15472.
- 20. Cheng D, et al. J. Biol. Chem. 2003;278:13611–13614. doi: 10.1074/jbc.C300042200.
- 21. Hall AM, et al. J. Lipid Res. 2012;53:990–999. doi: 10.1194/jlr.P025536.
- 22. Kazantseva A, et al. Science. 2006;314:982–985. doi: 10.1126/science.1133276.
- 23. Inoue A, et al. EMBO J. 2011;30:4248–4260. doi: 10.1038/emboj.2011.296.
- 24. Bobe G, Young J, Beitz D. J. Dairy Sci. 2004;87:3105–3124. doi: 10.3168/jds.S0022-0302(04)73446-3.
- 25. Ingle D, Bauman D, Garrigus U. J. Nutr. 1972;102:617–623. doi: 10.1093/jn/102.5.617.