2021 Jul 21;9(7):10.1002/aps3.11441. doi: 10.1002/aps3.11441

TABLE 1.

HybPhaser workflow overview containing all steps with a short description, the script/software used, required input, and generated output for each step.

| Part/Step | Description | Script/software | Input | Output |
|---|---|---|---|---|
| 1. Assessment of SNPs | | | | |
| Consensus sequence generation | Reads are mapped to the de novo–assembled contigs to generate consensus sequences with SNPs where reads differ. | Bash 1 (BWA, bcftools) | De novo contigs (HybPiper), reads mapped to each locus (HybPiper) | Consensus sequences |
| Consensus sequence assessment | Proportion of SNPs and length of consensus sequences for each locus are collected. | R1a | Consensus sequences | Tables with SNPs/locus and sequence length |
| Data set optimization | Missing data can be reduced and putative paralogs removed. | R1b | Tables with SNPs/locus and sequence length | Lists with samples and loci to be removed |
| Assessment of heterozygosity and allele divergence | Summary tables are generated. | R1c | Tables with SNPs/locus and sequence length (cleaned) | Summary table and graphs for the assessment of heterozygosity and the detection of hybrids |
| Sequence lists generation | Sequences from HybPiper and HybPhaser folders are collated into sequence lists. | R1d | Contigs and consensus sequences and list with samples/loci to be removed | Sequence lists for loci or samples with contigs or consensus sequences, raw or cleaned (optimized) |
| 2. Clade association | | | | |
| Phylogenetic analysis | Alignments and phylogenetic analysis | e.g., MAFFT*, IQ‐TREE* | Sequence lists | Phylogenetic tree |
| Selection of clade references | Taxa that represent major clades are selected by the user. | | Information from phylogeny and summary table | Table (csv) with names of clade references |
| Extraction of mapped reads | Generation of read files that contain only reads that mapped on the target sequences | Bash 2 (samtools) | Bam file from HybPiper | Read files (mapped only) |
| BBSplit script preparation/execution | Generate and run BBSplit script to match reads (mapped only) to clade references | R2a | Table (csv) with names of clade references | BBSplit stats files with proportions of reads mapped to each reference |
| Collation of BBSplit results | Generation of summary table for clade association | R2b | BBSplit stats files, summary table | Clade association summary table |
| 3. Phasing | | | | |
| Selection of accessions for phasing | | | Clade association summary table | Table (csv) with names of accessions to phase with respective references |
| BBSplit phasing script preparation and execution | Generate and run BBSplit script to map and phase read files | R3a | Table (csv) with names of accessions to phase with respective references, sequence read files | Read files of phased accessions, BBSplit stats files |
| Collation of BBSplit phasing results | Generation of summary table for phasing | R3b | BBSplit stats files | Summary table for phasing stats |
| 4. Data set merging | | | | |
| Assembly of phased accessions | Phased accessions are assembled using HybPiper and HybPhaser (part 1) | HybPiper, HybPhaser | Read files of phased accessions, target sequence list | Sequence lists of phased accessions |
| Merging of data sets | Sequences of phased accessions are merged with sequences of non‐phased accessions | R4 | Sequence lists of phased and non‐phased accessions | Merged sequence lists of phased and non‐phased accessions |
| Phylogenetic analysis | Alignments and phylogenetic analysis | e.g., MAFFT*, IQ‐TREE* | Merged sequence lists | Phylogenetic tree including phased and non‐phased accessions |
* Software marked with an asterisk is not part of the workflow.
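
To make the consensus sequence assessment and data set optimization steps of Table 1 more concrete, the following Python sketch shows one way the proportion of SNPs in a consensus sequence could be computed and how loci with unusually high SNP proportions might be flagged as putative paralogs. It is not the HybPhaser R code. The function names, the decision to treat IUPAC ambiguity codes as SNPs while ignoring N and gap characters, and the Q3 + 1.5 × IQR outlier cutoff are all assumptions made for illustration.

```python
from statistics import quantiles

# IUPAC codes that stand for more than one base. N is left out on the
# assumption that it marks missing data rather than a SNP.
AMBIGUITY_CODES = set("RYSWKMBDHV")


def snp_proportion(sequence: str) -> float:
    """Return the fraction of called sites that carry an ambiguity code."""
    # Drop gaps and N so that missing data does not dilute the proportion.
    seq = sequence.upper().replace("-", "").replace("N", "")
    if not seq:
        return 0.0
    return sum(base in AMBIGUITY_CODES for base in seq) / len(seq)


def flag_outlier_loci(mean_snp_by_locus: dict[str, float]) -> list[str]:
    """Flag loci whose mean SNP proportion exceeds Q3 + 1.5 * IQR.

    The cutoff is a generic outlier rule chosen for this sketch; needs at
    least two loci for the quartiles to be defined.
    """
    q1, _, q3 = quantiles(mean_snp_by_locus.values(), n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    return sorted(locus for locus, p in mean_snp_by_locus.items() if p > cutoff)
```

In practice the per-locus values passed to `flag_outlier_loci` would be averages of `snp_proportion` across all samples, and the resulting list would correspond to the "loci to be removed" output of the optimization step.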