Letter
2021 May 5;38(8):3478–3485. doi: 10.1093/molbev/msab113

Table 1.

Comprehensive List of Functions Developed into PPP, Including Filters, File-converters, Data Analyses, and Other Utilities.

| Function Type | Script Name | Capabilities |
|---|---|---|
| Core (VCF-based) | vcf_filter | Include/exclude variant sites by: allele count (e.g., biallelic, multiallelic, invariant), genomic position, missing-data count and percentage, minor allele frequency (MAF), minor allele count (MAC), presence of indels, SNP IDs, and association with a specific FILTER flag (e.g., PASS). |
| Core (VCF-based) | informative_loci_filter | Include/exclude loci by: variant site count, missing-data count, and locus length. Control how variants are counted by ignoring indels, multiallelic variants, or variants within CpG sites. |
| Core (VCF-based) | vcf_calc | Computes summary statistics from a variant file, including Tajima’s D and Weir and Cockerham’s FST. |
| Core (VCF-based) | vcf_split | Uses a BED file of coordinates or summary statistics to generate a separate variant file for each locus or individual. |
| Core (VCF-based) | vcf_phase | Phases variant files by invoking either BEAGLE or SHAPEIT. |
| Core (VCF-based) | vcf_four_gamete | Outputs regions with no detected recombination, based on a four-gamete test between pairs of variants. Given phased input with individual variants over a region of the genome, this function identifies an interval within those variants that passes the four-gamete filtering criteria, then returns either that interval or an output file containing the variants in that interval. |
| PPP utilities | stat_sampler | Computes distributions of summary statistics and pseudorandomly subsamples variants/loci, either with a uniform sampling scheme or by randomly sampling within bins of a statistic. |
| PPP utilities | bed_utilities | Automates various utilities for BED-formatted files. These currently include: 1) sampling a BED file; 2) subtracting from one BED file the features that overlap with a second BED file; 3) extending BED features upstream, downstream, or both; 4) sorting a single BED file; 5) merging features within one or more BED files; 6) creating a BED file of complementary features. |
| PPP utilities | vcf_utilities | Implements various utilities for manipulating VCF files, including listing the chromosomes within a VCF-based file, listing the samples within a VCF-based file, concatenating multiple VCF-based files, merging multiple VCF-based files, and sorting a VCF-based file. |
| PPP utilities | vcf_bed_to_seq | Obtains sequences given a BED coordinates file and a VCF file. |
| PPP input file generators | vcf_to_ima, vcf_to_gphocs, vcf_format_conversions, vcf_to_fastsimcoal, vcf_to_treemix, vcf_to_dadi | Conversion scripts that take a variant call format (VCF) file as input and convert it to the formats used by IMa3, G-PhoCS, dadi, TREEMIX, and fastsimcoal2. |
| PPP analyses | eigenstrat_fstats | Contains functions that automate the calculation of multiple admixture statistics, including Patterson’s D, the F4 statistic, the F4-ratio statistic, and the F3 statistic. |
| PPP analyses | admixture | Automates the estimation of individual ancestries using ADMIXTURE. The function accepts input as either 1) binary PED files or 2) PED 12-formatted files, and can also configure the optional arguments of ADMIXTURE. |
| PPP analyses | ima3_wrapper | Automates the estimation of evolutionary history using IMa3. |
| PPP analyses | plink_linkage_disequilibrium | Automates the calculation of multiple LD statistics using PLINK. |
| PPP analyses | vcf_to_sfs | Automates generating the site frequency spectrum (SFS) for a population model from a VCF file. |
| Model creation | model_creator | Produces model files either 1) from manually entered information or 2) from files containing the relevant information. It is also possible to create multiple models simultaneously and to assign a population to more than one model. |
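The MAF, MAC, and missing-data criteria listed for `vcf_filter` reduce to simple allele counting at each site. The following is a minimal sketch, not PPP's code, assuming a biallelic site whose VCF `GT` strings contain only the alleles `0` (REF), `1` (ALT), or `.` (missing); `site_allele_stats` and `keep_site` are hypothetical names.

```python
def site_allele_stats(genotypes: list[str]) -> tuple[int, float, float]:
    """Return (MAC, MAF, missing fraction) for one biallelic site.

    `genotypes` are VCF GT strings such as "0/1", "1|1" or "./.";
    only alleles "0" (REF) and "1" (ALT) are assumed to occur.
    """
    # Split phased ("|") and unphased ("/") genotypes into single alleles.
    alleles = [a for gt in genotypes for a in gt.replace("|", "/").split("/")]
    called = [a for a in alleles if a != "."]
    if not called:
        return 0, 0.0, 1.0
    alt = called.count("1")
    ref = len(called) - alt
    mac = min(ref, alt)  # minor allele count
    # MAF is taken over non-missing alleles only.
    return mac, mac / len(called), (len(alleles) - len(called)) / len(alleles)


def keep_site(genotypes, min_maf=0.0, min_mac=0, max_missing=1.0) -> bool:
    """Hypothetical filter: keep the site only if every threshold is met."""
    mac, maf, missing = site_allele_stats(genotypes)
    return maf >= min_maf and mac >= min_mac and missing <= max_missing
```

For example, `keep_site(["0/0", "0|1", "0/0"], min_mac=2)` rejects a site whose minor allele is a singleton. Multiallelic sites would need a different definition of the minor allele and are deliberately left out of this sketch.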
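Tajima’s D, one of the statistics listed for `vcf_calc`, compares nucleotide diversity (π) with Watterson’s estimator (S/a1) and scales the difference by its expected standard deviation. The sketch below uses the standard constants from Tajima (1989). It assumes aligned haplotype strings of `0`/`1` alleles with no missing data; `tajimas_d` is a hypothetical name, not PPP's implementation.

```python
import math


def tajimas_d(haplotypes: list[str]) -> float:
    """Tajima's D for aligned 0/1 haplotype strings (no missing data assumed)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0
    for column in zip(*haplotypes):  # one tuple of alleles per site
        k = column.count("1")  # copies of the "1" allele at this site
        if 0 < k < n:
            seg_sites += 1
            pi += k * (n - k) / pairs  # mean pairwise differences at this site
    if seg_sites == 0:
        return float("nan")  # D is undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)
```

An excess of rare variants pulls π below S/a1 and gives a negative D; for instance, two singleton sites among four haplotypes, `tajimas_d(["00", "10", "01", "00"])`, returns a negative value.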
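The `vcf_four_gamete` row describes finding an interval of phased variants that passes a four-gamete test. A pair of biallelic sites fails the test when all four haplotypes (00, 01, 10, 11) are observed, which implies recombination or recurrent mutation between them. The sketch below assumes one possible criterion: return the longest run of consecutive sites that contains no failing pair. PPP's actual selection rule may differ. The input is assumed to be phased `0`/`1` haplotype strings without missing data, and both function names are hypothetical.

```python
def fails_four_gamete(site_a: str, site_b: str) -> bool:
    """True if all four gametes (00, 01, 10, 11) occur between two biallelic sites."""
    return len(set(zip(site_a, site_b))) == 4


def longest_compatible_interval(haplotypes: list[str]):
    """Return (first, last) 0-based indices of the longest run of consecutive
    sites in which no pair of sites fails the four-gamete test."""
    sites = ["".join(column) for column in zip(*haplotypes)]  # one string per site
    if not sites:
        return None
    left = 0
    best = (0, 0)
    for right in range(len(sites)):
        # Move the left edge just past the closest site incompatible with `right`.
        for i in range(right - 1, left - 1, -1):
            if fails_four_gamete(sites[i], sites[right]):
                left = i + 1
                break
        if right - left > best[1] - best[0]:
            best = (left, right)
    return best
```

Because compatibility is a pairwise property, an interval is valid exactly when no failing pair lies inside it. The left edge therefore only ever moves forward, and a single scan over the sites is enough.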
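The two sampling schemes named for `stat_sampler` can be illustrated with Python's seeded `random.Random`. This is a sketch under assumptions: bins are taken to be equal-width over the observed range of the statistic, and each bin contributes up to a fixed number of items. PPP's own binning rules and parameters are not described here, so `uniform_sample`, `binned_sample`, `n_bins`, and `per_bin` are all hypothetical.

```python
import random
from collections import defaultdict


def uniform_sample(items, k, seed=None):
    """Draw k items without replacement, each equally likely."""
    return random.Random(seed).sample(items, k)


def binned_sample(items, stat, n_bins, per_bin, seed=None):
    """Split items into n_bins equal-width bins of stat(item) and draw up to
    per_bin items from each bin without replacement."""
    rng = random.Random(seed)
    values = [stat(item) for item in items]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    bins = defaultdict(list)
    for item, value in zip(items, values):
        index = min(int((value - lo) / width), n_bins - 1)  # maximum goes in last bin
        bins[index].append(item)
    sample = []
    for index in sorted(bins):
        members = bins[index]
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample
```

Sampling within bins keeps the subsample's distribution of the statistic close to that of the full data set, which a uniform draw of a few loci does not guarantee. Passing a fixed `seed` makes the pseudorandom draw reproducible.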
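The merge (5) and complement (6) operations listed for `bed_utilities` can be sketched on in-memory intervals. This assumes BED's 0-based, half-open coordinates and merges book-ended features, as `bedtools merge` does by default. Chromosome lengths come from a plain dict standing in for a genome file, and chromosomes are visited in sorted name order. `merge_bed` and `complement_bed` are hypothetical names, not PPP functions.

```python
from collections import defaultdict


def merge_bed(intervals):
    """Merge overlapping or book-ended (chrom, start, end) features per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        spans = sorted(by_chrom[chrom])
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or touching in half-open coordinates
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def complement_bed(intervals, chrom_sizes):
    """Return the regions of each chromosome not covered by any feature."""
    covered = defaultdict(list)
    for chrom, start, end in merge_bed(intervals):
        covered[chrom].append((start, end))
    gaps = []
    for chrom in sorted(chrom_sizes):
        pos = 0
        for start, end in covered.get(chrom, []):
            if start > pos:
                gaps.append((chrom, pos, start))
            pos = max(pos, end)
        if pos < chrom_sizes[chrom]:
            gaps.append((chrom, pos, chrom_sizes[chrom]))
    return gaps
```

Building the complement on top of the merged features means overlapping input never produces negative or duplicated gaps.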
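The admixture statistics automated by `eigenstrat_fstats` can be written directly in terms of per-site allele frequencies in each population, following Patterson et al. (2012). The sketch below makes several simplifying assumptions. It computes point estimates only, with no block-jackknife standard errors. F3 is computed without the sample-size bias correction. D uses the frequency form of the ABBA/BABA counts. All function names are hypothetical and are not PPP's API.

```python
def f4(a, b, c, d):
    """F4(A,B;C,D): mean over sites of (a - b)(c - d)."""
    terms = [(ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)]
    return sum(terms) / len(terms)


def f3(c, a, b):
    """F3(C;A,B): mean over sites of (c - a)(c - b), with no bias correction.
    A negative value suggests C is admixed between relatives of A and B."""
    terms = [(ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)]
    return sum(terms) / len(terms)


def f4_ratio(a, o, x, b, c):
    """Admixture proportion alpha = F4(A,O;X,C) / F4(A,O;B,C)."""
    return f4(a, o, x, c) / f4(a, o, b, c)


def patterson_d(p1, p2, p3, p4):
    """D(P1,P2,P3,P4) = (ABBA - BABA) / (ABBA + BABA) from allele frequencies."""
    sites = list(zip(p1, p2, p3, p4))
    abba = sum((1 - x1) * x2 * x3 * (1 - x4) for x1, x2, x3, x4 in sites)
    baba = sum(x1 * (1 - x2) * x3 * (1 - x4) for x1, x2, x3, x4 in sites)
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA signal")
    return (abba - baba) / (abba + baba)
```

If X's frequencies are an exact mixture `alpha * B + (1 - alpha) * C`, then `f4_ratio` recovers `alpha`. This follows because F4(A,O;X,C) is then alpha times F4(A,O;B,C).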
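The site frequency spectrum produced by `vcf_to_sfs` tallies how many sites carry each possible number of derived-allele copies. For a single population of n haploid samples, a minimal sketch looks like the following. It assumes the derived (or, for folding, minor) allele counts have already been extracted from the VCF. It does not write the dadi or fastsimcoal2 file formats, and the function names are hypothetical.

```python
def unfolded_sfs(derived_counts, n):
    """Entry k is the number of sites with k derived copies among n samples."""
    sfs = [0] * (n + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"derived count {k} outside 0..{n}")
        sfs[k] += 1
    return sfs


def fold_sfs(sfs):
    """Fold an unfolded SFS by minor-allele count when ancestral states are unknown."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for k, count in enumerate(sfs):
        folded[min(k, n - k)] += count  # k and n - k copies share the same minor count
    return folded
```

Folding is the usual choice when no outgroup is available to polarize alleles. For example, with n = 4, sites with one and three derived copies are pooled into the same class.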