Author manuscript; available in PMC: 2021 Apr 1.
Published in final edited form as: Nat Rev Genet. 2020 Feb 7;21(4):243–254. doi: 10.1038/s41576-020-0210-7

Table 1 | Reported novel sequences from efforts to sequence and analyse structural variation in large cohorts of human individuals

| Population and consortium (if applicable) | Number of individuals | Data type | Total novel sequence reported | Average per individual | Additional requirements | Publication year | Refs |
|---|---|---|---|---|---|---|---|
| Swedish, SweGen | 1,000 (subset of 2) | Short read (long read) | 46 Mb (17.3 Mb) | 0.6 Mb (12.1 Mb) | Over 300 bp (over 100 bp) | 2019 (2018) | 18,79 |
| Han Chinese | 275 | Short read | 29.5 Mb | ~5 Mb fully unaligned + ~6 Mb partially unaligned to reference | Over 500 bp | 2019 | 69 |
| Mixed, TOPMed | 53,831 | Short read | 2.2 Mb | 0.2–0.5 Mb | Must align to a hominid genome | 2019 | 65 |
| Mixed | 154 | BioNano maps, linked reads (10X Genomics) | 60 Mb | 14.2 Mb | >2 kb | 2019 | 71 |
| Mixed | 15 | Long read | 21.3 Mb | 6.4 Mb | Not in peri-centromeric regions, over 50 bp | 2019 | 68 |
| African ancestry, Consortium on Allergy in African-Ancestry Populations | 910 | Short read | 296.5 Mb | 2.5 Mb | >1 kb | 2019 | 28 |
| Mixed | 17 | Linked reads (10X Genomics) | 2.1 Mb | 0.71 Mb | Breakpoint resolved, over 50 bp of non-repetitive content per sequence | 2018 | 73 |
| Icelandic | 15,219 | Short read | 0.33 Mb | 0.16 Mb | Non-repetitive, breakpoint resolved | 2017 | 15 |
| Danish, Danish Genome Project | 150 | Short read | >15,000 insertionsᵃ,ᵇ | Not reported | >50 bp | 2017 | 17 |
| Dutch, Genome of the Netherlands | 769 | Short read | 4.3 Mb | Not reported | >150 bp | 2016 | 70 |
| Mixed | 10,545 | Short read | 3.26 Mb | 0.7 Mb | Non-repetitive, >200 bp | 2016 | 25 |
| Mixed, data from 1KGP | 45 | Short read | 61.6 Mb | 17,700–20,500 insertionsᵃ,ᶜ | No size or other restrictions reported | 2016 | 74 |
| Mixed, The Simons Genome Diversity Project | 300 | Short read | 5.8 Mb (13 Mb with repetitive elements) | Not reported | Non-repetitive, >500 bp | 2016 | 24 |
| Japanese, Tohoku Medical Megabank Organization | 1,070 | Short read | 9,354 insertionsᵃ | 45 insertionsᵃ | >1 kb | 2015 | 72 |

1KGP, 1000 Genomes Project; TOPMed, Trans-Omics Precision Medicine.

ᵃ Did not report number of bases.

ᵇ Estimated on the basis of figure 2b from ref. 17.

ᶜ Estimates separated into the average number of contiguous sequences per population with at least a partial match. The 61.6 Mb reported was based on 30,879 insertions.