Table 1.
Steps | Description | Primary program or custom script |
Probe design | | |
Match | Find genome and transcriptome sequences with 99% identity. | BLAT^a |
Filter | Retain single hits of substantial length. | Part of Building_exon_probes.sh^b |
Cluster | Remove isoforms and loci sharing >90% identity. | CD-HIT-EST^c, grab_singleton_clusters.py^b |
Filter | Retain loci with long exons summing to desired length. | blat_block_analyzer.py^b |
Cluster | Remove exons sharing >90% identity. | CD-HIT-EST^c, grab_singleton_clusters.py^b |
Short read processing and data analysis | | |
Read processing | Adapter trimming and quality filtering. | Trimmomatic^d |
Exon assembly | Reconstruct a sequence for each sample, for each exon. | YASRA^e, Alignreads^f |
Identify assembled contigs | If contig identity is unknown, identify which targeted exon(s) it corresponds to. | BLAT^a |
Sequence alignment I: Collate exons | Cluster orthologous exons across samples. | assembled_exons_to_fasta.py^b |
Sequence alignment II: Perform alignment | Align homologous bases within each exon. | MAFFT^g |
Concatenate exons | For each locus, concatenate the aligned exons. | catfasta2phyml.pl^h |
Gene tree construction | For each locus, estimate the maximum likelihood gene tree. | RAxML^i |
Species tree construction | Estimate the species tree from independent gene trees in a coalescent framework. | MP-EST^j |
New scripts written for this protocol, an example data set, and any future updates are available at https://github.com/listonlab/.
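
To illustrate the two "Cluster" steps in the probe design stage, the sketch below shows one way singleton clusters could be pulled from a CD-HIT-EST `.clstr` report, so that only sequences with no close relative (no isoform or paralog above the identity threshold) are retained. It is not the authors' grab_singleton_clusters.py, whose implementation is not shown here; the function name `singleton_cluster_ids` and the example locus names are hypothetical, and the input is assumed to follow the standard `.clstr` layout (a `>Cluster N` header followed by one line per member).

```python
import re
from typing import Iterable, List

# Captures the sequence name from a .clstr member line such as
# "0<TAB>1500nt, >locus_A... *" (assumed standard CD-HIT layout).
_MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def singleton_cluster_ids(clstr_lines: Iterable[str]) -> List[str]:
    """Return the names of sequences that are the sole member of their cluster."""
    singletons: List[str] = []
    members: List[str] = []

    for line in clstr_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # A new cluster begins: keep the previous one only if it had one member.
            if len(members) == 1:
                singletons.append(members[0])
            members = []
        else:
            match = _MEMBER_RE.search(line)
            if match:
                members.append(match.group(1))

    # Handle the final cluster, which has no following header line.
    if len(members) == 1:
        singletons.append(members[0])
    return singletons


# Hypothetical report: locus_A clusters with one of its isoforms, so it is dropped.
example = """>Cluster 0
0\t1500nt, >locus_A... *
1\t1400nt, >locus_A_isoform2... at +/95.00%
>Cluster 1
0\t900nt, >locus_B... *
>Cluster 2
0\t800nt, >locus_C... *
""".splitlines()

print(singleton_cluster_ids(example))  # ['locus_B', 'locus_C']
```

Discarding every multi-member cluster, rather than keeping one representative per cluster, matches the table's description of removing isoforms and loci that share >90% identity.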