. 2024 Dec 18;17(12):e70058. doi: 10.1111/eva.70058

TABLE 2.

Filtration steps applied to the ddRADseq and lcWGS datasets for samples and single nucleotide polymorphisms (SNPs). Each row is a filtration step that selects or removes either samples or SNPs, and reports the counts retained after that step and all steps in preceding rows. For the lcWGS dataset, prior to pruning for linked sites, mean SNP depth was 4.45× and site missingness was 2.6% (Figure S12).

| Filtration steps | Target | ddRADseq SNPs | ddRADseq loci | ddRADseq N | lcWGS SNPs | lcWGS N |
|---|---|---|---|---|---|---|
| Initial samples | | | | 755 | | 351 |
| Reads mapping ≥ 96% | Sample | | | 746 | 71,851,401 | 344 |
| Mean depth coverage ≥ 5× (post gstacks) | Sample | | 2,138,443 | 680 | | |
| Located outside repetitive elements | SNP | | | | 41,966,793 | 344 |
| Biallelic | SNP | | | | 41,232,252 | 344 |
| Located further than 5 bp of an indel | SNP | | | | 40,867,199 | 344 |
| Not an indel (SNPs only) | SNP | | | | 40,312,568 | 344 |
| Minor Allele Frequency (MAF) > 1% or SNP detected in > 25% samples | SNP | 136,771 | 88,544 | 680 | | |
| Read depth > 15× and < 29× | SNP | 106,757 | 69,936 | 680 | | |
| SNP < 10% missing data and sample < 30% (ddRADseq) or < 10% (lcWGS) missing loci | SNP, sample | 92,115 | 62,195 | 678 | 39,291,750 | 341 |
| Mean read depth > 5× | Sample | 92,115 | 62,195 | 677 | | |
| MAF > 5% | SNP | | | | 3,000,827 | 341 |
| Not linked to sex ᵃ | SNP | | | | 2,946,667 | 341 |
| Observed heterozygosity < 60% | SNP | 91,967 | 62,069 | 677 | 2,922,691 | 341 |
| Sequencing plates effect | SNP | 90,117 | 61,159 | 677 | | |
| Sex‐linked and located within repetitive elements | SNP | 88,433 | 60,102 | 677 | | |
| Relatedness (Φ < 0.25) | Sample | 88,433 | 60,102 | 638 | 2,922,691 | 341 |
| One SNP per locus | SNP | 60,102 | 60,102 | 638 | | |
| MAF > 5% and < 5% missing data | | 26,019 | 26,019 | 638 | | |
| Unlinked loci (r² < 0.25), 50 kbp sliding window | SNP | | | | 845,731 | 340 ᵇ |
| Final dataset complete | | 26,019 | 26,019 | 638 | 845,731 | 340 ᵇ |
| Final dataset without outliers | | 24,709 | 24,709 | 638 | | |
ᵃ The Y chromosome was not compiled into the original VCF file and sites on the X chromosome were removed.

ᵇ One individual was removed from the final dataset due to conflicting genetic signal in replicate samples.
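The cumulative logic of the table, where each row's count reflects that filter plus all preceding ones, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `snp_stats` and `filter_snps`, the genotype encoding (0, 1 or 2 copies of the alternate allele, `None` for a missing call) and the toy data are assumptions made for illustration. Only the three thresholds (missing data < 10%, MAF > 5%, observed heterozygosity < 60%) are taken from the table.

```python
from typing import Optional, Sequence

# Assumed encoding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def snp_stats(calls: Sequence[Genotype]) -> tuple[float, float, float]:
    """Return (missing rate, minor allele frequency, observed heterozygosity) for one SNP."""
    called = [g for g in calls if g is not None]
    missing = 1 - len(called) / len(calls)
    if not called:
        return missing, 0.0, 0.0
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(1 for g in called if g == 1) / len(called)
    return missing, maf, het


def filter_snps(snps, max_missing=0.10, min_maf=0.05, max_het=0.60):
    """Apply three SNP filters in order.

    Returns the indices of retained SNPs and, for each step, the number of
    SNPs remaining after that step and every step before it.
    """
    stats = [snp_stats(calls) for calls in snps]
    steps = [
        (f"missing data < {max_missing:.0%}", lambda s: s[0] < max_missing),
        (f"MAF > {min_maf:.0%}", lambda s: s[1] > min_maf),
        (f"observed heterozygosity < {max_het:.0%}", lambda s: s[2] < max_het),
    ]
    kept = list(range(len(snps)))
    counts = []
    for label, passes in steps:
        kept = [i for i in kept if passes(stats[i])]
        counts.append((label, len(kept)))
    return kept, counts


# Toy data: 4 SNPs genotyped in 10 samples (hypothetical values).
toy = [
    [0, 1, 1, 0, 2, 0, 1, 0, 0, 1],        # passes every filter
    [0, 0, None, None, 0, 1, 0, 0, 0, 0],  # 20% missing calls
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, MAF = 0
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],        # 90% heterozygous calls
]
kept, counts = filter_snps(toy)
for label, n in counts:
    print(f"{label}: {n} SNPs retained")
```

Because each step only examines SNPs that survived the earlier steps, the printed counts can only stay equal or decrease from one step to the next, as they do within each dataset's SNP column in the table.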