2023 Dec 13;624(7992):602–610. doi: 10.1038/s41586-023-06842-7

Extended Data Fig. 8. Genomic variation distribution and sampling dynamics across the cohort.

(a) Proportion of SV types among NCIG-only variants classified as private, community-specific, widespread or shared. Types are non-repetitive (teal), tandem repeat (STR, red; TR, blue) and mobile element (fragment, light purple; complete, dark purple). (b) Log regression models predicting the number of non-redundant SVs identified, given the number of individuals sampled. The models are broken down by community (left panel), by geographical distribution (centre panel) and by SV type (NCIG individuals; right panel). (c) Discovery curve shown as a bar chart: starting with a single NCIG individual, the number of new non-redundant large indels is counted by iteratively adding the unique calls from each additional NCIG individual. Indels shared among all previously added samples are shown as the green portion of each bar. The growth rate of the non-redundant set declines as the number of samples increases. (d) Log regression model showing the predicted number of non-redundant large indels identified, given the number of individuals sampled. The model is broken down by variant type (non-repetitive, teal; tandem repeat, red). (e) Proportion of private, community-specific, widespread and shared NCIG-only variants among individuals, grouped by community. A total of n = 141 individuals (NCIGP1 = 41, NCIGP2 = 32, NCIGP3 = 9, NCIGP4 = 39 and non-NCIG = 20) from independent sequencing experiments were examined in panel e. In the boxplot, the middle line is the median, the box represents the interquartile range (IQR), the whiskers extend up to 1.5 times the IQR from the hinges, and any data points beyond the whiskers are shown individually.
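
The discovery curve in panel c and the log models in panels b and d can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that each individual's calls are available as a set of variant identifiers, that "shared among all previously added samples" means the intersection of all call sets added so far, and that the log model has the form y = a + b ln(n). The function names `discovery_curve` and `fit_log_model` and the toy identifiers are hypothetical.

```python
import math


def discovery_curve(calls_per_individual):
    """Return one (non_redundant, shared) pair per added individual.

    non_redundant: size of the union of all call sets added so far.
    shared: size of the intersection of all call sets added so far
            (an assumed reading of the green bar portions).
    """
    seen = set()
    shared = None
    curve = []
    for calls in calls_per_individual:
        calls = set(calls)
        seen |= calls
        shared = calls if shared is None else shared & calls
        curve.append((len(seen), len(shared)))
    return curve


def fit_log_model(counts):
    """Least-squares fit of y = a + b * ln(n) for n = 1..len(counts).

    The model form is an assumption; at least two points are required.
    """
    xs = [math.log(n) for n in range(1, len(counts) + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(counts) / len(counts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Toy call sets (hypothetical identifiers, not data from the study).
toy_calls = [
    {"sv1", "sv2", "sv3"},
    {"sv1", "sv2", "sv4"},
    {"sv1", "sv5"},
    {"sv1", "sv2"},
]

curve = discovery_curve(toy_calls)
a, b = fit_log_model([total for total, _ in curve])
print(curve)
print(f"fitted model: y = {a:.2f} + {b:.2f} * ln(n)")
```

In this toy example the union stops growing once the fourth individual adds no new calls, which mirrors the declining growth rate described for panel c. With real data the order in which individuals are added changes the shape of the curve, so it would normally be fixed or averaged over permutations.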