2017 Jun 21;7(8):2763–2778. doi: 10.1534/g3.117.043893

Table 3. Data dependencies required to successfully run each component of the McClintock pipeline.

Components (columns): ngs_te_mapper, RelocaTE, TEMP, RetroSeq, PoPoolationTE, TE-locate

Data dependencies (rows):
Reference genome (fasta)
Canonical TE sequences (fasta) [footnotes a, b]
Annotation of reference TEs (GFF)
Annotation of reference TEs (BED) [footnote c]
Annotation of reference TEs (custom format)
Unaligned reads (single-end fastq)
Unaligned reads (paired-end fastq)
Aligned reads (BAM)
Aligned reads (lexically sorted SAM)
TE hierarchy (custom format)
a. Must include an entry in the format “TSD=…” for each TE in the file, on the same line as the header, where “…” is the TSD sequence if known, or a string of periods equal in length to the TSD if the TSD sequence is unknown. If neither the length nor the sequence of the TSD is known, “TSD=UNK” can be supplied.

b. Must be formatted as one fasta file per TE family and a file of files listing their locations.

c. Must be one BED file for each entry in the reference TE annotation and a file of files listing their locations.
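
The formatting rules in footnotes a and b can be illustrated with a minimal Python sketch. It is not taken from the McClintock pipeline. The function names (`tsd_tag`, `tagged_header`, `write_per_family`), the TE names and the sequences are hypothetical. Two details are assumptions: that the “TSD=” entry follows the TE name after a single space, and that the file of files holds one path per line. The footnotes do not specify either layout.

```python
import os
import tempfile


def tsd_tag(tsd_seq=None, tsd_len=None):
    """Build the 'TSD=' entry described in footnote a."""
    if tsd_seq:
        # TSD sequence is known: use it directly.
        return "TSD=" + tsd_seq
    if tsd_len:
        # Only the length is known: one period per TSD base.
        return "TSD=" + "." * tsd_len
    # Neither the sequence nor the length is known.
    return "TSD=UNK"


def tagged_header(te_name, tsd_seq=None, tsd_len=None):
    """Return a fasta header line carrying the TSD entry.

    Assumption: the entry is placed after the TE name, separated by a space.
    """
    return ">%s %s" % (te_name, tsd_tag(tsd_seq, tsd_len))


def write_per_family(records, out_dir, fofn_path):
    """Write one fasta file per TE family and a file of files (footnote b).

    records: iterable of (family_name, sequence) pairs.
    Assumption: the file of files lists one path per line.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for family, seq in records:
        path = os.path.join(out_dir, family + ".fasta")
        with open(path, "w") as fh:
            fh.write(">%s\n%s\n" % (family, seq))
        paths.append(path)
    with open(fofn_path, "w") as fh:
        fh.write("\n".join(paths) + "\n")
    return paths


if __name__ == "__main__":
    # Hypothetical TE names and sequences, for illustration only.
    print(tagged_header("TE_A", tsd_seq="ATAT"))
    print(tagged_header("TE_B", tsd_len=5))
    print(tagged_header("TE_C"))
    work = tempfile.mkdtemp()
    print(write_per_family([("TE_A", "ACGT"), ("TE_B", "GGCC")],
                           os.path.join(work, "families"),
                           os.path.join(work, "te_families.fofn")))
```

The same one-file-per-entry pattern, with a file of files recording the locations, would apply to the BED requirement in footnote c, with each annotation entry written to its own BED file instead of a fasta file.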