Abstract
Although there have been many studies of native Korean cattle, Hanwoo, there have been no selective sweep studies in these animals. This study was performed to characterize genetic variation and identify selective signatures. We sequenced the genomes of 12 cattle, and identified 15125420 SNPs, 1768114 INDELs, and 3445 CNVs. The SNPs, INDELs, and CNVs were similarly distributed throughout the genome, and highly variable regions were shown to contain the BoLA family and GPR180, which are related to adaptive immunity. We also identified the domestication footprints of the Hanwoo population by searching for selective sweep signatures, which revealed the RCN2 gene related to BPV resistance. The results of this study may contribute to genetic improvement of the Hanwoo population in Korea. [BMB Reports 2013; 46(7):346-351]
Keywords: Adaptation, Genetic variants, Hanwoo, Korean native cattle, Selective sweep
INTRODUCTION
Identification of the genetic differences responsible for variations in phenotypic traits is one of the goals of livestock genomic research. Accordingly, it is important to characterize the genetic variation in livestock species. Domestication processes and breeding development have been imprinted in the genomes and provide detectable clues of selection within the cattle genome (1,2). Selective sweep signatures may have contributed to the domestication process (3,4). Despite various Hanwoo (Korean native cattle) mapping and diversity studies (5,6), the identification of mutations affecting some quantitative phenotypes (7,8), and several expression studies (9-11), selective sweep signatures in these animals have not been identified. There have been several attempts to conduct such characterization at the population level in other cattle by detecting selective sweep signatures (1,2). Hanwoo were used as draft animals until about 30 years ago, whereas dairy cattle have been artificially selected for milk production for a long time. Hanwoo have therefore been bred for meat for only a short period, and it is important to evaluate their genetic features to improve their performance as beef cattle. Accordingly, the present study was performed to characterize the entire genome sequence of Hanwoo Korean native cattle at the population level.
Genetic variations, including SNPs, insertions and deletions (INDELs), and large structural variations, such as CNV, shape the genome architecture and provide signatures for identification of variations contributing to adaptation. Large structural variations can be attributed to inversions and translocations, duplications, and deletions. Although existing at lower rates than SNPs, CNVs and INDELs comprise another important type of genomic variation (11) that can be used to better understand genome features.
Adaptation from standing genetic variation implies that there are neutral or weakly deleterious variations that are maintained with a long history in populations and that they become advantageous with changes in the environment (12). Experimental evolution, which tests hypotheses or theories of evolution using an experimentally controlled population, has provided evidence that adaptation from standing variation is more repeatable than that from new mutations (12). Standing variation could contribute to rapid adaptation after a sudden environmental change (13); for example, Barrett et al. showed experimentally that sticklebacks have sufficient standing genetic variation to adapt to a significant change in climate in only three generations (14). Such adaptations are influenced by the type and quantity of the available genetic variation (15). Therefore, a comprehensive description of variations in a population will lead to a deeper understanding of its biology.
To investigate the features of the Hanwoo genome, a genomic scan of the Korean cattle population was performed for selective sweep signatures and highly variable regions were identified.
RESULTS
Identification of genetic variants
To construct high-quality Hanwoo re-sequencing data, we generated paired-end reads using an Illumina HiSeq2000. After removing possible PCR duplicates, 97.12% of all reads, corresponding to 37603284033 mapped bp for each individual, were successfully aligned against the bovine reference genome UMD3.1. On average, a depth of 14.2× was achieved and the mapped reads covered an average of 98.96% of the reference genome. The depth was calculated for each individual using the DepthOfCoverage tool of GATK (16), and ranged from 12.61× to 15.53× (Table S1); coverage was calculated using BEDTools (17).
For the UMD3.1 reference assembly, a total of 15125420 SNPs were identified, and the change rate was one SNP per 166 base pairs (Table S2). Of these SNPs, 27% were located in genic regions, while 73% were located in intergenic regions (Supplement Fig. 2). A total of 1768114 INDELs were identified, and the change rate was one INDEL per 1420 base pairs (Table S2). Of these INDELs, 30% were located in genic regions and 70% were located in intergenic regions (Supplement Fig. 2). The minimum quality of SNPs and INDELs was 30. The quality of variants and INDEL length are shown in Supplement Fig. 6 and 7.
In this study, CNVs were defined based only on deletion type using Genome STRiP (18). A total of 3445 CNVs (10.2 Mb), with an average length of 2.97 kb (range 0.23-902 kb), were identified, and the change rate was one CNV per 729196 base pairs (Table S2). Overall, 17% of the CNVs were located in genic regions, overlapping at least part of one gene, while 83% were located in intergenic regions without overlapping any gene (Supplement Fig. 2). Evaluation of the length distribution of CNVs showed that the number of CNVs decreased with increasing size, except for the 1-3 kb range (Supplement Fig. 8). This exception corresponds to LINEs of around 2,000 bp. The same effect has been observed in humans using the same method as in this study and in cattle using other methods (18,19). These findings indicate that the exception is not due to the analysis method or sample population.
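The change rates reported above can be cross-checked with simple arithmetic. The sketch below assumes that a change rate is an effective genome length divided by the number of variants (the text does not state the definition); the function names are illustrative. Multiplying each reported count by its change rate gives nearly the same implied length, about 2.51 Gb, for all three variant types.

```python
# Variant counts and change rates (bp per variant) as reported in the text.
REPORTED = {
    "SNP": (15_125_420, 166),
    "INDEL": (1_768_114, 1_420),
    "CNV": (3_445, 729_196),
}


def implied_genome_length(count, change_rate_bp):
    """Effective length implied by `count` variants, one per `change_rate_bp`.

    Assumption: change rate = effective genome length / number of variants.
    """
    return count * change_rate_bp


def change_rate(genome_length_bp, count):
    """Average number of base pairs per variant (truncated to an integer)."""
    return genome_length_bp // count


if __name__ == "__main__":
    for name, (count, rate) in REPORTED.items():
        length = implied_genome_length(count, rate)
        print(f"{name}: implied effective length = {length / 1e9:.3f} Gb")
```

The agreement between the three implied lengths suggests that a single effective genome length was used for all variant types.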
Correlation among variants and repeat elements
We found that SNPs, INDELs, and CNVs had similar distributions throughout the genome, and that the number of variants was related to the length of the chromosome. We calculated the proportions of the three types of variants on each chromosome and found that chromosomal proportions of each type of variant were similar to those of chromosomal length (Supplement Fig. 1a) and that the number of variants had a strong linear relationship with chromosomal length (Supplement Fig. 1b-d).
The number of variants at each 1 Mb bin and 10 Mb bin were similarly distributed throughout the genome (Fig. 1, Supplement Fig. 3). CNVs were relatively less similar to the other variants. As there was a small number of CNVs when using the large 10 Mb bin compared to the 1 Mb bin, the genomic distribution of the CNVs became similar to that of the other types of variants (Supplement Fig. 3).
Highly variable region
There are common regions in which many types of variants exist (Fig. 1). For example, the BoLA family is present in the first polymorphic common region, chr23:23-31 Mb, which is located near the centromere. The BoLA family is a type of cattle MHC, and has sufficient variation for immune defense. There are also many contigs around the second-highest peak among the common regions, chr12:69-77 Mb, which is near the centromere. Specifically, GPC6, BIRC3, DCT, TGDS, GPR180, CLDN10, DZIP1, DNAJC3, HS6ST3, MBNL2, and RAP2A are located in this region. The average nucleotide diversity was as high as 12.95 (Table S3), calculated by VCFtools (20). BoLA and GPR180 are located in a highly variable region, containing sufficient variants, including synonymous SNPs, non-synonymous SNPs, and intronic SNPs (Supplement Fig. 4A, B). However, RCN2 has only a small number of variants (Supplement Fig. 4C). SNPs and CNVs are enriched in the MHC and G protein-coupled receptor gene regions, which has also been reported in other species (21).
A significant genome-wide correlation between the number of SNPs and INDELs was identified (Pearson’s correlation r = 0.87, P < 0.05) (Supplement Fig. 5). SNPs and INDELs were also highly correlated with CNV, while low complexity and simple repeats were positively correlated with CNV, and the GC content was negatively correlated with CNV.
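The correlation described above rests on counting variants in fixed-size bins and comparing the counts between variant types. The following is a minimal sketch of that idea, not the authors' pipeline: the function names and the input layout (pairs of chromosome name and 1-based position) are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

BIN_SIZE = 1_000_000  # 1 Mb bins, as used in the text


def count_per_bin(positions, bin_size=BIN_SIZE):
    """Count variants per bin from (chromosome, 1-based position) pairs."""
    return Counter((chrom, (pos - 1) // bin_size) for chrom, pos in positions)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def bin_correlation(counts_a, counts_b):
    """Correlate two per-bin count tables over the union of their bins."""
    bins = sorted(set(counts_a) | set(counts_b))
    return pearson_r([counts_a.get(b, 0) for b in bins],
                     [counts_b.get(b, 0) for b in bins])
```

Bins that contain one variant type but not the other are counted as zero, so that sparse types such as CNVs are compared over the same set of bins as SNPs and INDELs.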
Selective sweep region
We estimated a folded SFS because no ancestral allele information was available. The SFS of the SNPs and INDELs were normal, while those of CNV showed balancing selection (Supplement Fig. 11). Linkage disequilibrium (LD)-based ω statistics and SFS-based Λ statistics available within the population were used to identify selective sweep regions (Fig. 2). As expected, SweepFinder and OmegaPlus detected different regions and showed strong signals. The highest significance regions of Λ statistics were chr2:72-72.5 Mb and chr10:35.5-36 Mb. The chr21:32-33 Mb, chr18:5.5-10.5 kb, and 15.9 Mb regions showed clear ω statistics signatures. The first significant Λ statistics region (chr2:72-72.5 Mb) included the genes PTPN4, EPB41L5, TMEM185B, RALB, and INHBB, while the genes FSIP1, GPR176, SRP14, BMF, and PAK6 were located in the second significant Λ statistics region (chr10:35.5-36 Mb) (Table 1). The first significant ω statistics region (chr21:32-33 Mb) included the genes ETFA, ISL2, RCN2, PSTPIP1, and TSPAN3, while the ITFG1 gene was located in the second significant ω statistics region (chr18:15.9 Mb) (Table 1).
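A folded SFS, as used above, counts sites by minor allele count, so it does not need to know which allele is ancestral. The sketch below illustrates the folding step under the assumption that each site is biallelic and summarized by its alternate-allele count among 2N chromosomes (24 for 12 diploid animals); the function name is illustrative.

```python
def folded_sfs(alt_counts, n_chromosomes):
    """Return the folded site frequency spectrum as a list.

    Entry i is the number of sites whose minor allele is seen i times.
    Entry 0 holds monomorphic sites. Assumes biallelic sites.
    """
    half = n_chromosomes // 2
    sfs = [0] * (half + 1)
    for k in alt_counts:
        if not 0 <= k <= n_chromosomes:
            raise ValueError(f"allele count {k} outside 0..{n_chromosomes}")
        # Folding: an allele seen k times and one seen n - k times are
        # indistinguishable when the ancestral state is unknown.
        sfs[min(k, n_chromosomes - k)] += 1
    return sfs
```

For example, with 24 chromosomes, sites with alternate-allele counts of 1 and 23 both fall in the singleton class of the folded spectrum.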
Table 1.
Symbol | Location (chr:Mb) | CLR ratio |
---|---|---|
PTPN4 | 2:72.05-72.15 | 1106.470 |
EPB41L5 | 2:72.17-72.33 | 1190.173 |
RALB | 2:72.36-72.43 | 709.583 |
INHBB | 2:72.52 | 540.436 |
FSIP1 | 10:35.35-35.54 | 551.626 |
GPR176 | 10:35.56-35.69 | 851.561 |
SRP14 | 10:35.80 | 456.310 |
BMF | 10:35.85-35.87 | 442.939 |
PAK6 | 10:35.98-36.01 | 537.912 |

Symbol | Location (chr:Mb) | Omega |
---|---|---|
ITFG1 | 18:15.6-15.9 | 4.519 |
ETFA | 21:31.9-32 | 9.607 |
ISL2 | 21:32-32.1 | 5.7917 |
RCN2 | 21:32.57-32.59 | 5.291 |
PSTPIP1 | 21:32.64-32.69 | 8.454 |
TSPAN3 | 21:32.7-32.72 | 12.280 |
Combination statistics were used to identify common signals. The RCN2 gene was located in the region with the highest combination statistic (chr21:32.5 Mb), where the ω and Λ statistics regions overlap. This gene produces a calcium-binding protein located in the lumen of the ER that contains six conserved regions with similarity to a high-affinity Ca2+-binding motif, the EF-hand.
DISCUSSION
Highly variable region
The three types of variants were similarly distributed throughout the genome. Even if a type of variant was different, the quantity of the variant was affected by common effects, such as the mutation and recombination rates. These findings were supported by the results of a previous study in which SNP detection accuracy was shown to be affected by CNVs (19,21). A significant genome-wide correlation between SNP and INDEL density was identified in a separate study (19).
We identified a hypervariable region near the centromere and showed that the mutations were related to adaptive immunity. Specifically, the bovine MHC class II region lies near the centromere of BTA23 (Fig. 1). Variation in the high recombination rate region may be due to the existence of a polymorphic recombination hotspot (22-24), and regions with high recombination rates were significantly closer to the centromere (25). Standing genetic variants facilitate the emergence of adaptation. The MHC region contains a diverse array of genes that are crucial for the initiation of adaptive immune responses. All types of variants were enriched in the BoLA (bovine MHC) family in the highest peak region and in GPR genes involved in the interaction with extracellular molecules in the second highest peak region (Fig. 1). Similar findings have been reported in other animals, e.g., the three-spined stickleback has many SNP and CNV variants in the MHC region (21), and genetic variations at the MHC loci are known to be involved in pathogen resistance (26). These regions are comprised of many contigs (Supplement Fig. 10) and the region of BTA23 has been confirmed experimentally (22-24). The cattle genome build UMD 3.1 has a low amount of erroneous duplication and error (27,28); therefore, this variability is unlikely to be an artifact of the UMD 3.1 genome assembly.
Selective sweep genes
Generally, cattle have been marked by selection during domestication, breed formation, and ongoing selection to enhance performance and productivity. We utilized two methods to detect genomic selection in cattle: (i) the ω statistic (29,30), which detects specific LD patterns caused by genetic hitchhiking, and (ii) the composite likelihood ratio (CLR) Λ (31) using the SFS, which describes the frequency of allelic variants and shifts from neutral expectation toward rare and high frequency derived variants.
We found evidence of selective sweeps on chromosomes 2, 10, 18, and 21 (Table 1 and Fig. 2). Among these regions, we identified commonly selected regions near RCN2 (E6BP) by considering the ω and Λ values together, and this region had the highest correlation (r=0.9983). The product of this gene interacts with cancer-associated HPV E6 and with BPV-1 E6. The transforming activity of BPV-1 E6 mutants was correlated with their E6BP-binding ability (32). Calcium is required for entry into mitosis, and E6 may play a role in this stage of cell growth, indicating that RCN2 is also important. The RCN2 gene has undergone a selective sweep, which may suggest that Korean cattle are resistant to BPV (33). In support of this suggestion, experimental evidence has been reported indicating that Korean native cattle have greater resistance to BPV than Holsteins (34).
Using the ω and Λ values, several genes were detected, and we investigated whether these regions overlapped known cattle quantitative trait loci (QTLs), excluding dairy cattle QTL information. Among the QTLs identified in the selective sweep regions, the chr10:35.5-36 Mb region is related to the longissimus muscle area and the chr18:15.9 Mb region is related to carcass weight, longissimus muscle area, and social separation vocalization. Hanwoo were used as draft cattle before 1980, but are now used for beef production; therefore, the QTLs found in the selective sweep regions are plausible.
The results of the present study indicated that the distributions of SNPs, INDELs, and CNVs have a correlated pattern. We found that the selective sweep signatures of the Hanwoo genome and the highly variable region were related to adaptive immunity. We hope that these characteristics will contribute to genetic improvement and breeding of this strain. Future studies should include larger samples and various breeds and phenotypes to obtain better understanding of cattle genomes.
MATERIALS AND METHODS
Sample preparation and re-sequencing
Whole-blood samples were collected from 12 Hanwoo bulls from Kyungpook National University, Korea. Blood (10 ml) was drawn from the carotid artery and treated with heparin to prevent clotting. DNA was isolated from whole blood using a G-DEX™ IIb Genomic DNA Extraction Kit (iNtRoN Biotechnology, Seoul, Korea) according to the manufacturer’s protocol. We randomly sheared 3 μg of genomic DNA using the Covaris System to generate inserts of ∼300 bp. The fragments of sheared DNA were end-repaired, A-tailed, adaptor ligated, and amplified using a TruSeq DNA Sample Prep. Kit (Illumina, San Diego, CA). Paired-end sequencing was conducted by NICEM (National Instrumentation Center for Environmental Management, Seoul, Korea) using the Illumina HiSeq2000 platform with TruSeq SBS Kit v3-HS (Illumina). Finally, sequence data were generated using the Illumina HiSeq system.
Sequence alignment and genotype calling
Paired-end sequence reads were mapped to the reference bovine genome (UMD3.1) using Bowtie 2 with the default settings (35). Four open-source packages were used for downstream processing and variant calling for SNPs and INDELs: Picard tools (http://picard.sourceforge.net), SAMtools 0.1.18 (36), VCFtools 4.0 (20), and the Genome Analysis Toolkit 1.4 (16). Read groups were added and duplicate reads were removed using MarkDuplicates of Picard tools. SAMtools was used to index the resulting bam format files and to calculate the mapped read length with the flagstat option (36). Realignment and variant calling were performed using GATK (16) with CountCovariates, RealignerTargetCreator, IndelRealigner, SelectVariants, VariantFiltration, and UnifiedGenotyper. VCFtools was used for handling the vcf file format (20).
Substitution calls were made with GATK UnifiedGenotyper (16). SNPs and INDELs called with a Phred-scaled quality score of less than 30 were filtered out. Variants were removed based on MQ0 (number of reads with mapping quality zero) >4, (MQ0/read depth) >10%, quality by depth (QD) <5, and FS (Phred-scaled P value using Fisher’s exact test to detect strand bias) >200. After filtering, the SNPs were filtered again by removing those within 10 bp of INDELs. As ω and Λ statistics require haplotype information for each chromosome, we used BEAGLE (37) to infer the haplotype phase and impute missing alleles for the entire set of cattle simultaneously.
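The hard filters listed above can be expressed as a small predicate. This is a sketch under stated assumptions rather than the GATK expressions actually used: the class and function names are hypothetical, each criterion is applied independently (the text does not say whether the two MQ0 conditions were combined), the quality-by-depth criterion is read as removing variants with QD below 5, and each INDEL is reduced to a single coordinate.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class VariantAnnotations:
    qual: float   # Phred-scaled variant quality
    mq0: int      # number of reads with mapping quality zero
    depth: int    # read depth at the site
    qd: float     # quality by depth
    fs: float     # Phred-scaled strand-bias P value (Fisher's exact test)


def passes_filters(v):
    """Return True if the variant survives all thresholds from the text."""
    if v.qual < 30:
        return False
    if v.mq0 > 4:
        return False
    if v.depth > 0 and v.mq0 / v.depth > 0.10:
        return False
    if v.qd < 5:   # assumption: low QD is removed
        return False
    if v.fs > 200:
        return False
    return True


def remove_snps_near_indels(snp_positions, indel_positions, window=10):
    """Drop SNPs lying within `window` bp of any INDEL on one chromosome."""
    indels = sorted(indel_positions)
    kept = []
    for pos in snp_positions:
        # First INDEL at or beyond the left edge of the window around pos.
        i = bisect_left(indels, pos - window)
        if i < len(indels) and indels[i] <= pos + window:
            continue
        kept.append(pos)
    return kept
```

Sorting the INDEL coordinates once and using binary search keeps the proximity check cheap even for millions of SNPs per chromosome.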
Genome STRiP (18) was used for deletion discovery and genotyping of structural variants in the population using the repeat masked genome. Initial genotype likelihoods were derived with a Bayesian model. Annotation information was obtained from Ensembl 68 (UMD_3.1) (38) and SnpEff 3.0 (39).
Statistical analysis
Cattle genomes were divided into bins of 1 Mb, and footprints of positive selection were calculated for each bin with a grid size of 1000 using LD-based statistics with OmegaPlus (29,30) and SFS-based statistics with SweepFinder (31). The cutoff for ω statistics from OmegaPlus and Λ statistics from SweepFinder was the 99.9% quantile of the empirical distribution of each statistic. Combination statistics using Λ statistics and ω statistics were calculated. To check the correlation between the distribution of variants and that of repeat elements, repeat element information was obtained from UCSC with RepeatMasker Open-3.0 (40) using bovine genome UMD 3.1. The number of variants and each of the repeat elements were counted in each 1 Mb bin region. The correlations among them were calculated and plotted using the corrplot R package (41).
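The empirical cutoff described above can be illustrated as follows. This is a minimal sketch, not the authors' code: it assumes one summary value of ω or Λ per bin, uses a linear-interpolation quantile (the text does not specify the estimator), and the function names are hypothetical.

```python
def empirical_quantile(values, q):
    """Quantile of `values` at level q (0..1) by linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    h = (len(ordered) - 1) * q
    lower = int(h)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])


def outlier_bins(stat_by_bin, q=0.999):
    """Return the bins whose statistic exceeds the empirical q-quantile."""
    cutoff = empirical_quantile(stat_by_bin.values(), q)
    return {b: v for b, v in stat_by_bin.items() if v > cutoff}
```

With roughly 2,500 bins of 1 Mb, a 99.9% cutoff retains only the top two or three bins per statistic, which is consistent with the small number of regions reported in Table 1.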
Acknowledgments
This work was supported by a grant from the BioGreen 21 Program (No. PJ0081912013), Rural Development Administration and Kyungpook National University Research Fund (2013) Republic of Korea.
References
- 1. Qanbari S., Gianola D., Hayes B., Schenkel F., Miller S., Moore S., Thaller G., Simianer H. Application of site and haplotype-frequency based approaches for detecting selection signatures in cattle. BMC Genomics. (2011);12:318. doi: 10.1186/1471-2164-12-318.
- 2. Larkin D. M., Daetwyler H. D., Hernandez A. G., Wright C. L., Hetrick L. A., Boucek L., Bachman S. L., Band M. R., Akraiko T. V., Cohen-Zinder M. Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle. Proc. Natl. Acad. Sci. U.S.A. (2012);109:7693–7698. doi: 10.1073/pnas.1114546109.
- 3. Xia Q., Guo Y., Zhang Z., Li D., Xuan Z., Li Z., Dai F., Li Y., Cheng D., Li R. Complete resequencing of 40 genomes reveals domestication events and genes in silkworm (Bombyx). Science. (2009);326:433–436. doi: 10.1126/science.1176620.
- 4. Rubin C. J., Zody M. C., Eriksson J., Meadows J. R. S., Sherwood E., Webster M. T., Jiang L., Ingman M., Sharpe T., Ka S. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature. (2010);464:587–591. doi: 10.1038/nature08832.
- 5. Kim J., Park S., Yeo J. Linkage mapping and QTL on chromosome 6 in Hanwoo (Korean Cattle). Asian-Aust. J. Anim. Sci. (2003);16:1402–1405.
- 6. Lee Y., Lee J., Lee J., Kim J., Park H., Yeo J. for growth and carcass traits related to QTL on chromosome 6 in Hanwoo (Korean cattle). Asian-Aust. J. Anim. Sci. (2008);21:1703–1709.
- 7. Lee S. H., Gondro C., van der Werf J., Kim N. K., Lim D., Park E. W., Oh S. J., Gibson J., Thompson J. Use of a bovine genome array to identify new biological pathways for beef marbling in Hanwoo (Korean Cattle). BMC Genomics. (2010);11:623. doi: 10.1186/1471-2164-11-623.
- 8. Lee S., Van Der Werf J., Park E., Oh S., Gibson J., Thompson J. Genetic polymorphisms of the bovine Fatty acid binding protein 4 gene are significantly associated with marbling and carcass weight in Hanwoo (Korean Cattle). Anim. Genet. (2010);41:442–444. doi: 10.1111/j.1365-2052.2010.02024.x.
- 9. Lim D., Lee S. H., Cho Y. M., Yoon D., Shin Y., Kim K. W., Park H. S., Kim H. Transcript profiling of expressed sequence tags from intramuscular fat, longissimus dorsi muscle and liver in Korean cattle (Hanwoo). BMB Rep. (2010);43:115–121. doi: 10.5483/BMBRep.2010.43.2.115.
- 10. Lee S. H., Park E. W., Cho Y. M., Kim S. K., Lee J. H., Jeon J. T., Lee C. S., Im S. K., Oh S. J., Thompson J. Identification of differentially expressed genes related to intramuscular fat development in the early and late fattening stages of hanwoo steers. J. Biochem. Mol. Biol. (2007);40:757–764. doi: 10.5483/BMBRep.2007.40.5.757.
- 11. Yang B. C., Hwang S. S., Im G. S., Lee D. K., Jeon I. S., Park S. B. Phenotypic characterization of Hanwoo (native Korean cattle) cloned from somatic cells of a single adult. BMB Rep. (2012);45:38–43. doi: 10.5483/bmbrep.2012.45.1.38.
- 12. Barrett R. D. H., Schluter D. Adaptation from standing genetic variation. Trends Ecol. Evol. (2008);23:38–44. doi: 10.1016/j.tree.2007.09.008.
- 13. Eizaguirre C., Lenz T. L., Kalbe M., Milinski M. Rapid and adaptive evolution of MHC genes under parasite selection in experimental vertebrate populations. Nat. Commun. (2012);3:621. doi: 10.1038/ncomms1632.
- 14. Barrett R. D. H., Paccard A., Healy T. M., Bergek S., Schulte P. M., Schluter D., Rogers S. M. Rapid evolution of cold tolerance in stickleback. P. Roy. Soc. B-Biol Sci. (2011);278:233–238. doi: 10.1098/rspb.2010.0923.
- 15. Hermisson J., Pennings P. S. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics. (2005);169:2335–2352. doi: 10.1534/genetics.104.036947.
- 16. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M. A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. (2010);20:1297–1303. doi: 10.1101/gr.107524.110.
- 17. Quinlan A. R., Hall I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. (2010);26:841–842. doi: 10.1093/bioinformatics/btq033.
- 18. Handsaker R. E., Korn J. M., Nemesh J., McCarroll S. A. Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat. Genet. (2011);43:269–276. doi: 10.1038/ng.768.
- 19. Zhan B., Fadista J., Thomsen B., Hedegaard J., Panitz F., Bendixen C. Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping. BMC Genomics. (2011);12:557. doi: 10.1186/1471-2164-12-557.
- 20. Danecek P., Auton A., Abecasis G., Albers C. A., Banks E., DePristo M. A., Handsaker R. E., Lunter G., Marth G. T., Sherry S. T. The variant call format and VCFtools. Bioinformatics. (2011);27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- 21. Feulner P. G. D., Chain F. J. J., Panchal M., Eizaguirre C., Kalbe M., Lenz T. L., Mundry M., Samonte I. E., Stoll M., Milinski M. Genome‐wide patterns of standing genetic variation in a marine population of three‐spined sticklebacks. Mol. Ecol. (2012);22:635–649. doi: 10.1111/j.1365-294X.2012.05680.x.
- 22. Jarrell V. L., Lewin H. A., Da Y., Wheeler M. B. Gene-centromere mapping of bovine DYA, DRB3, and PRL using secondary oocytes and first polar bodies: evidence for four-strand double crossovers between DYA and DRB3. Genomics. (1995);27:33–39. doi: 10.1006/geno.1995.1005.
- 23. Andersson L., Lunden A., Sigurdardottir S., Davies C. J., Rask L. Linkage relationships in the bovine MHC region. Immunogenetics. (1988);27:273–280. doi: 10.1007/BF00376122.
- 24. Park C., Russ I., Da Y., Lewin H. A. Genetic mapping of F13A to BTA23 by sperm typing: difference in recombination rate between bulls in the DYA-PRL interval. Genomics. (1995);27:113–118. doi: 10.1006/geno.1995.1012.
- 25. Paape T., Zhou P., Branca A., Briskine R., Young N., Tiffin P. Fine-scale population recombination rates, hotspots, and correlates of recombination in the Medicago truncatula genome. Genome Biol. Evol. (2012);4:726–737. doi: 10.1093/gbe/evs046.
- 26. Hedrick P. W. Pathogen resistance and genetic variation at MHC loci. Evolution. (2002);56:1902–1908. doi: 10.1111/j.0014-3820.2002.tb00116.x.
- 27. Zimin A. V., Kelley D. R., Roberts M., Marcais G., Salzberg S. L., Yorke J. A. Mis-assembled “segmental duplications” in two versions of the Bos taurus genome. PloS One. (2012);7:e42680. doi: 10.1371/journal.pone.0042680.
- 28. Partipilo G., D P., Lacalandra G. M., Liu G. E., Rocchi M. Refinement of Bos taurus sequence assembly based on BAC-FISH experiments. BMC Genomics. (2011);12:639. doi: 10.1186/1471-2164-12-639.
- 29. Alachiotis N., Stamatakis A., Pavlidis P. OmegaPlus: a scalable tool for rapid detection of selective sweeps in whole-genome datasets. Bioinformatics. (2012);28:2274–2275. doi: 10.1093/bioinformatics/bts419.
- 30. Kim Y., Nielsen R. Linkage disequilibrium as a signature of selective sweeps. Genetics. (2004);167:1513–1524. doi: 10.1534/genetics.103.025387.
- 31. Nielsen R., Williamson S., Kim Y., Hubisz M. J., Clark A. G., Bustamante C. Genomic scans for selective sweeps using SNP data. Genome Res. (2005);15:1566–1575. doi: 10.1101/gr.4252305.
- 32. Chen J. J., Reid C. E., Band V., Androphy E. J. Interaction of papillomavirus E6 oncoproteins with a putative calcium-binding protein. Science. (1995);269:529–531. doi: 10.1126/science.7624774.
- 33. De Groot N. G., Otting N., Doxiadis G. G. M., Balla-Jhagjhoorsingh S. S., Heeney J. L., Van Rood J. J., Gagneux P., Bontrop R. E. Evidence for an ancient selective sweep in the MHC class I gene repertoire of chimpanzees. Proc. Natl. Acad. Sci. U.S.A. (2002);99:11748–11753. doi: 10.1073/pnas.182420799.
- 34. Bae Y., Lee C., Kang M., Yoon S., Park J., Jean Y. Bovine papillomavirus detection from bovine teats using immunohistochemistry and electronmicroscopy. Korean J. Vet. Res. (2005);45:233–238.
- 35. Langmead B., Salzberg S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. (2012);9:357–359. doi: 10.1038/nmeth.1923.
- 36. 1000 Genome Project Data Processing Subgroup. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. (2009);25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 37. Browning S. R., Browning B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Gen. (2007);81:1084–1097. doi: 10.1086/521987.
- 38. Flicek P., Amode M. R., Barrell D., Beal K., Brent S., Carvalho-Silva D., Clapham P., Coates G., Fairley S., Fitzgerald S. Ensembl 2012. Nucl Acids Res. (2012);40:D84–D90. doi: 10.1093/nar/gkr991.
- 39. Cingolani P., Platts A., Coon M., Nguyen T., Wang L., Land S. J., Lu X., Ruden D. M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. (2012);6:80–92. doi: 10.4161/fly.19695.
- 40. Smit A., Hubley R., Green P. RepeatMasker Open-3.0. (1996). http://www.repeatmasker.org.
- 41. Friendly M. Corrgrams. Am. Stat. (2002);56:316–324. doi: 10.1198/000313002533.