TABLE 1.
Part/Step | Description | Script/software | Input | Output |
---|---|---|---|---|
1. Assessment of SNPs | ||||
Consensus sequence generation | Reads are mapped to the de novo–assembled contigs to generate consensus sequences with SNPs where reads differ. | Bash 1 (BWA, bcftools) | de novo contigs (HybPiper), reads mapped to each locus (HybPiper) | Consensus sequences |
Consensus sequence assessment | Proportion of SNPs and length of consensus sequences for each locus are collected. | R1a | Consensus sequences | Tables with SNPs/locus and sequence length |
Data set optimization | Missing data can be reduced and putative paralogs removed. | R1b | Tables with SNPs/locus and sequence length | Lists of samples and loci to be removed |
Assessment of heterozygosity and allele divergence | Summary tables are generated. | R1c | Tables with SNPs/locus and sequence length (cleaned) | Summary table and graphs for the assessment of heterozygosity and the detection of hybrids |
Sequence list generation | Sequences from HybPiper and HybPhaser folders are collated into sequence lists. | R1d | Contigs, consensus sequences, and list of samples/loci to be removed | Sequence lists per locus or per sample, containing contigs or consensus sequences, raw or cleaned (optimized) |
2. Clade association | ||||
Phylogenetic analysis | Alignments and phylogenetic analysis | e.g., MAFFT*, IQ‐TREE* | Sequence lists | Phylogenetic tree |
Selection of clade references | Taxa that represent major clades are selected by the user. | | Information from phylogeny and summary table | Table (csv) with names of clade references |
Extraction of mapped reads | Generation of read files containing only the reads that mapped to the target sequences | Bash 2 (samtools) | BAM file from HybPiper | Read files (mapped only) |
BBSplit script preparation/execution | Generate and run BBSplit script to match reads (mapped only) to clade references | R2a | Table (csv) with names of clade references | BBSplit stats files with proportions of reads mapped to each reference |
Collation of BBSplit results | Generation of summary table for clade association | R2b | BBSplit stats files, summary table | Clade association summary table |
3. Phasing | ||||
Selection of accessions for phasing | Accessions to be phased, and their respective clade references, are selected by the user. | | Clade association summary table | Table (csv) with names of accessions to phase and their respective references |
BBSplit phasing script preparation and execution | Generate and run BBSplit script to map and phase read files | R3a | Table (csv) with names of accessions to phase with respective references, sequence read files | Read files of phased accessions, BBSplit stats files |
Collation of BBSplit phasing results | Generation of summary table for phasing | R3b | BBSplit stats files | Summary table for phasing stats |
4. Data set merging | ||||
Assembly of phased accessions | Phased accessions are assembled using HybPiper and HybPhaser (part 1) | HybPiper, HybPhaser | Read files of phased accessions, target sequence list | Sequence lists of phased accessions |
Merging of data sets | Sequences of phased accessions are merged with sequences of non‐phased accessions | R4 | Sequence lists of phased and non‐phased accessions | Merged sequence lists of phased and non‐phased accessions |
Phylogenetic analysis | Alignments and phylogenetic analysis | e.g., MAFFT*, IQ‐TREE* | Merged sequence lists | Phylogenetic tree including phased and non‐phased accessions |
Software marked with an asterisk (*) is not part of the workflow.
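The table lists two steps as simple summary statistics. "Consensus sequence assessment" (R1a) records the proportion of SNPs and the length of each consensus sequence. "Collation of BBSplit results" (R2b) records the proportion of reads mapped to each clade reference. The Python sketch below illustrates both calculations. It is not the workflow's R code. The function names are hypothetical, and it makes two assumptions. First, SNPs are encoded in the consensus as IUPAC ambiguity codes, and N, '-' and '?' count as missing data. Second, the read counts per reference have already been extracted from the BBSplit stats files, so that file format is not parsed here.

```python
from typing import Dict, Tuple

# Assumption: heterozygous sites (SNPs) appear in the consensus as IUPAC
# ambiguity codes; 'N', '-' and '?' are treated as missing, not as SNPs.
AMBIGUITY_CODES = set("RYSWKMBDHV")
MISSING = set("N-?")


def assess_consensus(seq: str) -> Tuple[int, float]:
    """Return (called length, proportion of SNP sites) for one consensus.

    Hypothetical helper: the length counts only called positions, which is
    an assumption about how missing data are handled.
    """
    called = [base for base in seq.upper() if base not in MISSING]
    if not called:
        return 0, 0.0
    snps = sum(1 for base in called if base in AMBIGUITY_CODES)
    return len(called), snps / len(called)


def clade_association(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Turn reads assigned per clade reference into proportions.

    `read_counts` maps reference name -> number of reads assigned to it
    (hypothetical input, e.g. collected from BBSplit stats files).
    """
    total = sum(read_counts.values())
    if total == 0:
        return {ref: 0.0 for ref in read_counts}
    return {ref: count / total for ref, count in read_counts.items()}


if __name__ == "__main__":
    # One ambiguity code (R) among nine called bases; the trailing N is missing.
    print(assess_consensus("ACGTRACGTN"))  # (9, 0.1111111111111111)
    # An accession whose reads split 3:1 between two clade references.
    print(clade_association({"clade_A": 750, "clade_B": 250}))
```

Under these assumptions, a locus with an unusually high SNP proportion across many samples is a candidate paralog for removal in "Data set optimization". An accession whose reads are split substantially between two clade references is a candidate for phasing.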