Linkage disequilibrium maps for European and African populations constructed from whole genome sequence data

Alejandra Vergara-Lope; M Reza Jabalameli; Clare Horscroft; Sarah Ennis; Andrew Collins; Reuben J Pengelly

Sci Data. 2019 Oct 17;6:208. doi: 10.1038/s41597-019-0227-y

PMCID: PMC6797713 PMID: 31624256

Abstract

Quantification of linkage disequilibrium (LD) patterns in the human genome is essential for genome-wide association studies, selection signature mapping and studies of recombination. Whole genome sequence (WGS) data provide optimal source data for this quantification as they are free from biases introduced by the design of array genotyping platforms. The Malécot-Morton model of LD allows the creation of a cumulative map for each chromosome, analogous to an LD form of a linkage map. Here we report LD maps generated from WGS data for a large population of European ancestry, as well as populations of Baganda, Ethiopian and Zulu ancestry. We achieve high average genetic marker densities of 2.3–4.6/kb. These maps show good agreement with prior, low resolution maps and are consistent between populations. Files are provided in BED format to allow researchers to readily utilise this resource.

Subject terms: Genomics, Anthropology, Population genetics

Measurement(s)	Linkage Disequilibrium
Technology Type(s)	whole genome sequencing
Factor Type(s)	ethnic group
Sample Characteristic - Organism	Homo sapiens
Sample Characteristic - Location	Europe • Africa


Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.9901838

Background & Summary

Mapping of linkage disequilibrium (LD) is invaluable for many endeavours including identifying signatures of selection, refinement of signals in genome-wide association studies and studies into recombination^1–3.

One approach to the quantification of LD is the generation of LD maps by applying the Malécot-Morton model^4,5. The products of this model are maps in cumulative linkage disequilibrium units (LDU), which are broadly analogous to an LD-based form of centimorgans. Previous studies have reported maps generated from array-based genotyping data in multiple populations (e.g.⁶), allowing for cross-population comparisons.

The mathematical basis of LDMAP has been previously described^4,5. In brief, LDMAP generates a cumulative map of LD distances between markers, based upon the Malécot-Morton model of association by distance:

$$\hat{ρ} = (1 - L) M e^{-ϵ d} + L$$

where $\hat{ρ}$ is the association between two markers in a population, L is the component of $\hat{ρ}$ not due to LD, but due to confounding factors such as recent founder effects, M is the association at 0 distance (approximately 1 for monophyletic haplotypes), $ϵ$ is the rate of decline in the association between the markers and d is the physical distance between the markers⁵. The final LDU map is built by cumulative addition of $ϵ d$ for each inter-marker span.
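As a rough illustration of the model above, the following Python sketch evaluates $\hat{ρ}$ for a given distance and accumulates $ϵ d$ over inter-marker spans. It is not LDMAP's implementation: the function names and the use of kb as the unit of d are assumptions made for this example, and the per-span $ϵ$ values are taken as given rather than estimated from data.

```python
import math
from itertools import accumulate


def malecot_rho(d, epsilon, L=0.0, M=1.0):
    """Expected association between two markers separated by distance d.

    Implements rho = (1 - L) * M * exp(-epsilon * d) + L.
    """
    return (1.0 - L) * M * math.exp(-epsilon * d) + L


def cumulative_ldu(positions_kb, epsilons):
    """Cumulative LDU position of each marker.

    positions_kb: sorted marker positions (kb; the unit is an assumption).
    epsilons: one decline rate per inter-marker span, so
              len(epsilons) == len(positions_kb) - 1.
    """
    if len(epsilons) != len(positions_kb) - 1:
        raise ValueError("need exactly one epsilon per inter-marker span")
    spans = [b - a for a, b in zip(positions_kb, positions_kb[1:])]
    # Each span contributes epsilon * d LDU; the first marker sits at 0.
    return [0.0] + list(accumulate(e * d for e, d in zip(epsilons, spans)))
```

With L = 0 and M = 1, one LDU is the map distance over which the expected association falls by a factor of e, so `malecot_rho(1.0, 1.0)` is about 0.37.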

The increasing availability of whole genome sequencing (WGS) data allows the investigation of LD patterns at the highest resolution, without the impact of issues such as ascertainment bias in the selection of single nucleotide polymorphism (SNP) markers. We have previously shown that WGS-based maps provide tangible benefits in their practical application. Arrays have been designed to give reasonable coverage of LD information from a reduced set of SNPs; as such, they have limited resolution, and population-specific biases are introduced during SNP selection. Given that WGS variant identification is ‘hypothesis free’ (i.e. SNPs are not required to be pre-defined as in array genotyping), these data, and thus these maps, represent a maximally informative resource⁷.

The lack of ascertainment bias in SNP data collection is particularly important for African populations, which have the greatest population diversity yet are often under-represented in genomic studies. These populations are particularly informative for many studies, given the extended time since a population bottleneck^7,8. Higher resolution maps allow patterns of LD to be analysed on a finer scale, such as structure within genes⁹.

Here, we report our generation of WGS-based LD maps for four populations, one of European and three of African descent. These maps are a valuable population genetic resource, offering a maximal resolution dataset, free of selection bias, for studies which require the incorporation of LD statistics.

Methods

Autosomal WGS data from two cohort sequencing studies were utilised. African populations were sequenced within the African Genome Variation Project^8,10, utilising Illumina short read sequencing to an average depth of 4x. European ancestry individuals were sequenced by the Wellderly Study¹¹, utilising Complete Genomics high depth sequencing. Multidimensional scaling as implemented in PLINK¹² was applied to ensure genetic homogeneity within the sub-cohorts.

SNPs were subject to quality control prior to map generation. Specifically, they were required to have a minor allele frequency ≥1%, <5% genotype missingness and not to significantly deviate from Hardy-Weinberg equilibrium (at α = 10⁻³). All analyses were undertaken using the reference genome GRCh37 (hg19).
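The three filters above can be sketched per SNP as follows. This is an illustration only, not the pipeline used for the maps: the function names are hypothetical, and a 1-d.f. chi-square test is assumed for Hardy-Weinberg equilibrium because the text does not state which test was applied.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """P-value of a 1-d.f. chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_maf=0.01, max_missing=0.05, hwe_alpha=1e-3):
    """Apply the MAF, missingness and HWE thresholds to one SNP."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    if n_missing / (called + n_missing) >= max_missing:
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False
    # Keep the SNP only if the deviation is not significant at hwe_alpha.
    return hwe_p_value(n_aa, n_ab, n_bb) >= hwe_alpha
```

For example, `passes_qc(25, 50, 25, 0)` keeps a SNP in exact Hardy-Weinberg proportions, whereas `passes_qc(50, 0, 50, 0)` rejects one with no heterozygotes.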

LD maps were made using LDMAP with default parameters. Owing to the computational intensity of LD map generation, this was performed for 12,000 marker overlapping segments, which were then concatenated into full chromosome maps, removing the 25 terminal markers of each segment to avoid end effects.
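One way to plan such a segmentation is sketched below. The size of the overlap is not stated above, so this example assumes that consecutive windows overlap by twice the trim (50 markers), which makes the trimmed segments tile the chromosome without gaps; it also assumes the outermost chromosome ends are not trimmed. The function name and return layout are hypothetical.

```python
def plan_segments(n_markers, seg_size=12000, trim=25):
    """Return (start, end, keep_start, keep_end) marker-index tuples.

    [start, end) is the window passed to map construction and
    [keep_start, keep_end) is the part retained after trimming.
    All ranges are half-open.
    """
    if seg_size <= 2 * trim:
        raise ValueError("seg_size must exceed 2 * trim")
    step = seg_size - 2 * trim  # assumed overlap of 2 * trim markers
    plan = []
    start = 0
    while True:
        end = min(start + seg_size, n_markers)
        # Chromosome ends have no neighbouring segment, so keep them whole.
        keep_start = start if start == 0 else start + trim
        keep_end = end if end == n_markers else end - trim
        plan.append((start, end, keep_start, keep_end))
        if end == n_markers:
            break
        start += step
    return plan
```

Stitching the per-segment cumulative maps into one chromosome map (by carrying the running LDU total across segment boundaries) is left out of this sketch.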

Data Records

LD maps reported here are freely available at 10.6084/m9.figshare.7850882 ¹³. These data are in Browser Extensible Data (BED) format, including the cumulative LDU position of every SNP marker within the generated maps. Additionally, these data are also made available as the kb/LDU ratio for each inter-SNP span providing a view of the regional ‘intensity’ of LD.
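The relationship between the two representations can be sketched as follows. The column layout is an assumption for this example (tab-separated chrom, start, end, cumulative LDU, sorted by position); the deposited files should be checked before relying on it.

```python
def kb_per_ldu(bed_lines):
    """Yield (chrom, span_start, span_end, kb_per_ldu) for each inter-SNP span.

    Assumes tab-separated lines of chrom, start, end, cumulative LDU,
    sorted by position within each chromosome.
    """
    prev = None
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, _start, end, ldu = line.rstrip("\n").split("\t")[:4]
        cur = (chrom, int(end), float(ldu))
        if prev is not None and prev[0] == chrom:
            kb = (cur[1] - prev[1]) / 1000.0
            d_ldu = cur[2] - prev[2]
            # A span with no LDU increase has an undefined (infinite) ratio.
            ratio = kb / d_ldu if d_ldu > 0 else float("inf")
            yield chrom, prev[1], cur[1], ratio
        prev = cur
```

A large kb/LDU value indicates that LD extends over a long physical distance, while a small value marks a region where LD breaks down quickly.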

For the African populations⁸, 95–100 individuals were utilised for each sub-population, yielding approximately 14 million SNP markers (Table 1). The European map utilised 454 individuals¹¹, yielding approximately 7 million markers. The increased population diversity of the African compared to the European populations can be seen in the increased common SNP density, as well as the longer LDU length, which corresponds to the greater total haplotypic diversity within a population.

Table 1.

Key statistics for generated LD maps.

Population	Individuals	Marker count	Density^a	LDU
Baganda	100	13,439,201	4.35	129,640
Ethiopian	95	13,892,209	4.48	107,001
European	454	7,062,420	2.28	63,427
Zulu	100	14,205,839	4.59	130,156


^aAverage SNP markers per kb.

Technical Validation

These data are robust in that they are consistent with prior, lower resolution maps and between the populations assessed (Figs 1 and 2). As patterns of recombination, and thus LD, are known to be broadly consistent between populations, this meets our prior expectations; furthermore, the total map lengths are proportional to time since an effective population bottleneck (being longer in African populations, reflecting the additional diversity present)^6,7,14.

Fig. 1. Comparison of the four maps for chromosome 22. The raw cumulative maps are shown (left), as well as maps normalised to have the same total length (right). It can be seen that the contour profiles of the maps are highly similar, though there is variation in the total map length.

Fig. 2. Comparison of the four maps for all autosomes. The raw cumulative maps are shown. It can be seen that the contour profiles of the maps are highly similar, with a consistent trend in LDU lengths for the populations, with European being consistently the shortest and Baganda/Zulu the longest.
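The normalisation used for the right-hand panel of Fig. 1 can be approximated by rescaling each cumulative map to a common total length. The sketch below rescales to unit length, which is an assumption: the text states only that the maps were normalised to the same total length, and the function name is hypothetical.

```python
def normalise_map(cumulative_ldu):
    """Rescale a cumulative LDU map so that it runs from 0 to 1."""
    first, last = cumulative_ldu[0], cumulative_ldu[-1]
    total = last - first
    if total <= 0:
        raise ValueError("map has zero total length")
    return [(x - first) / total for x in cumulative_ldu]
```

Two maps with the same contour but different total lengths, such as `[0.0, 1.0, 4.0]` and `[0.0, 2.0, 8.0]`, become identical after this rescaling.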

Usage Notes

Maps can be readily incorporated into genomic analyses using tools such as BEDTools¹⁵, allowing annotation of regions with LD information for subsequent analysis such as determining whether a genomic feature has higher LD than background on average.
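As one possible form of such an annotation, the sketch below estimates the LDU length of a genomic region by linear interpolation between flanking markers and expresses it per kb. It uses plain Python rather than BEDTools, and the function names and the choice of linear interpolation are assumptions made for illustration.

```python
from bisect import bisect_right


def ldu_at(position, marker_pos, marker_ldu):
    """Linearly interpolate the cumulative LDU at a base-pair position.

    marker_pos must be sorted ascending; positions outside the map are
    clamped to the first or last marker.
    """
    if position <= marker_pos[0]:
        return marker_ldu[0]
    if position >= marker_pos[-1]:
        return marker_ldu[-1]
    # After this call, marker_pos[i - 1] <= position < marker_pos[i].
    i = bisect_right(marker_pos, position)
    x0, x1 = marker_pos[i - 1], marker_pos[i]
    y0, y1 = marker_ldu[i - 1], marker_ldu[i]
    return y0 + (y1 - y0) * (position - x0) / (x1 - x0)


def region_ldu_per_kb(start, end, marker_pos, marker_ldu):
    """LDU length of [start, end] divided by its physical length in kb."""
    ldu_len = ldu_at(end, marker_pos, marker_ldu) - ldu_at(start, marker_pos, marker_ldu)
    return ldu_len / ((end - start) / 1000.0)
```

A feature with fewer LDU per kb than the chromosome-wide average lies in a region of stronger than average LD.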

Genome wide association studies using a composite likelihood model can be undertaken with LD information as provided here, allowing for additional power for signal detection and refinement^2,16.

Acknowledgements

The authors acknowledge the use of the IRIDIS High Performance Computing Facility, and associated support services at the University of Southampton, in the completion of this work.

Author contributions

A.V.-L. undertook data analysis. M.R.J. undertook data analysis. C.H. undertook data analysis. S.E. contributed to study design and supervision. A.C. contributed to study design and supervision. R.J.P. contributed to study design, data analysis, supervision and wrote the manuscript.

Code Availability

The core LDMAP software is written in C, and made available at www.soton.ac.uk/genomicinformatics/research/ld.page.

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Alejandra Vergara-Lope and M. Reza Jabalameli.

References

1. Horscroft C, Ennis S, Pengelly RJ, Sluckin TJ, Collins A. Sequencing era methods for identifying signatures of selection in the genome. Briefings in Bioinformatics. 2018;bby064.
2. Elding H, Lau W, Swallow DM, Maniatis N. Refinement in localization and identification of gene regions associated with Crohn disease. American Journal of Human Genetics. 2013;92:107–113. doi: 10.1016/j.ajhg.2012.11.004.
3. Auton A, McVean G. Recombination rate estimation in the presence of hotspots. Genome Research. 2007;17:1219–1227. doi: 10.1101/gr.6386707.
4. Kuo T-Y, Lau W, Collins AR. LDMAP: the construction of high-resolution linkage disequilibrium maps of the human genome. In: Collins AR, editor. Linkage Disequilibrium and Association Mapping. Methods in Molecular Biology, vol. 376. Humana Press; 2007. p. 47–57.
5. Tapper W, et al. A map of the human genome in linkage disequilibrium units. Proc Natl Acad Sci USA. 2005;102:11835–9. doi: 10.1073/pnas.0505262102.
6. Service S, et al. Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies. Nature Genetics. 2006;38:556. doi: 10.1038/ng1770.
7. Pengelly RJ, et al. Whole genome sequences are required to fully resolve the linkage disequilibrium structure of human populations. BMC Genomics. 2015;16:666. doi: 10.1186/s12864-015-1854-0.
8. Gurdasani D, et al. The African Genome Variation Project shapes medical genetics in Africa. Nature. 2015;517:327. doi: 10.1038/nature13997.
9. Vergara-Lope A, Ennis S, Vorechovsky I, Pengelly RJ, Collins A. Heterogeneity in the extent of linkage disequilibrium among exonic, intronic, non-coding RNA and intergenic chromosome regions. European Journal of Human Genetics. 2019;27:1436–1444.
10. 2015. European Genome-phenome Archive. EGAD00001001663
11. Erikson GA, et al. Whole-genome sequencing of a healthy aging cohort. Cell. 2016;165:1002–1011. doi: 10.1016/j.cell.2016.03.022.
12. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics. 2007;81:559–75. doi: 10.1086/519795.
13. Jabalameli MR, 2019. Whole-genome Linkage Disequilibrium Maps for European and African Populations. Figshare. doi: 10.6084/m9.figshare.7850882.
14. Bhérer C, Campbell CL, Auton A. Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales. Nature Communications. 2017;8:14994. doi: 10.1038/ncomms14994.
15. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
16. Collins A, Lau W. CHROMSCAN: genome-wide association using a linkage disequilibrium map. Journal of Human Genetics. 2008;53:121–126. doi: 10.1007/s10038-007-0226-2.


