Table 3. Data dependencies required to successfully run each component of the McClintock pipeline.
| Data dependency | ngs_te_mapper | RelocaTE | TEMP | RetroSeq | PoPoolationTE | TE-locate |
|---|---|---|---|---|---|---|
| Reference genome (fasta) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Canonical TE sequences (fasta) | ✓ | ✓a | ✓ | ✓b | ✓ | |
| Annotation of reference TEs (GFF) | | | | | ✓ | ✓ |
| Annotation of reference TEs (BED) | | | ✓ | ✓c | | |
| Annotation of reference TEs (custom format) | | ✓ | | | | |
| Unaligned reads (single-end fastq) | ✓ | ✓ | | | | |
| Unaligned reads (paired-end fastq) | | | | | ✓ | |
| Aligned reads (BAM) | | | ✓ | ✓ | | |
| Aligned reads (lexically sorted SAM) | | | | | | ✓ |
| TE hierarchy (custom format) | | | | | ✓ | ✓ |

a. Each TE entry must include "TSD=…" on its fasta header line, where "…" is the TSD sequence if known, or a string of periods equal in length to the TSD if only its length is known. If neither the sequence nor the length of the TSD is known, "TSD=UNK" can be supplied.

b. Must be supplied as one fasta file per TE family, plus a file of files listing their locations.

c. Must be supplied as one BED file for each entry in the reference TE annotation, plus a file of files listing their locations.
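
The input-format rules above (the "TSD=" header entry, per-family fasta files, and a file of files listing them) can be prepared with a short script. The Python sketch below is a hypothetical illustration, not code from McClintock or from any component method. It assumes that the "TSD=" entry follows the TE name after a single space on the header line, and that a TE family name is the first whitespace-delimited token of each fasta header. The names `tsd_header`, `split_fasta_by_family`, and `write_file_of_files` are illustrative.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Optional


def tsd_header(te_name: str, tsd_seq: Optional[str] = None,
               tsd_len: Optional[int] = None) -> str:
    """Return a fasta header line carrying a TSD entry.

    Assumption: the TSD entry follows the TE name after one space.
    """
    if tsd_seq:
        tsd = tsd_seq                 # TSD sequence is known
    elif tsd_len:
        tsd = "." * tsd_len           # only the TSD length is known
    else:
        tsd = "UNK"                   # neither sequence nor length is known
    return f">{te_name} TSD={tsd}"


def split_fasta_by_family(fasta: Path, outdir: Path) -> list[Path]:
    """Write one fasta file per TE family and return their paths.

    Assumption: the family name is the first whitespace-delimited
    token of each header; records sharing a name go in one file.
    """
    records: dict[str, list[str]] = {}
    family: Optional[str] = None
    for line in fasta.read_text().splitlines(keepends=True):
        if line.startswith(">"):
            family = line[1:].split()[0]
            records.setdefault(family, [])
        if family is not None:        # skip any text before the first header
            records[family].append(line)

    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, lines in records.items():
        path = outdir / f"{name}.fasta"
        path.write_text("".join(lines))
        paths.append(path)
    return paths


def write_file_of_files(paths: Iterable[Path], fof: Path) -> None:
    """Write a 'file of files': one input path per line."""
    fof.write_text("".join(f"{p}\n" for p in paths))
```

The same `write_file_of_files` helper could also list per-entry BED files once they have been written, since both requirements share the file-of-files layout.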