cutadapt

cutadapt trims and filters reads to expected length and quality
152
-l shortens each read to the specified length (bp)
--max-n discards reads containing more than the specified number of N bases
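The effect of these two options on a single read can be sketched in Python. This is an illustration only, not cutadapt's implementation: the function name `trim_and_filter` and its parameters are hypothetical, and the sketch assumes that a positive length keeps the leading bases and that the N filter is applied after trimming.

```python
from typing import Optional

def trim_and_filter(seq: str, length: int, max_n: int) -> Optional[str]:
    """Return the trimmed read, or None if the read should be discarded."""
    trimmed = seq[:length]                  # like -l: keep at most `length` leading bases
    n_count = trimmed.upper().count("N")    # ambiguous bases left after trimming
    if n_count > max_n:                     # like --max-n: too many N bases
        return None
    return trimmed

reads = ["ACGTACGTAC", "ACNNNGTACG", "ACG"]
kept = []
for read in reads:
    result = trim_and_filter(read, length=6, max_n=1)
    if result is not None:
        kept.append(result)
print(kept)
```

Reads shorter than the requested length pass through unchanged in this sketch.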
bwa |
index |
bwa index builds the index of reference sequences |
153, 156 |
-p prefix is used for output files |
mem |
bwa mem maps reads to the indexed linker or reference genome |
154, 156 |
-p specifies interleaved paired-end mode (the two mates are adjacent records in a single input file)
-k specifies minimum seed length; matches shorter than this will be missed
-w specifies band width; gaps longer than this will not be found
-T specifies minimum score; alignments scoring below this will not be output
-L specifies clipping penalty; used for the score calculation |
-B specifies mismatch penalty; used for the score calculation |
-O specifies gap open penalty; used for the score calculation |
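To show how the penalty options enter the score, the sketch below computes a simplified alignment score and applies the -T threshold. It assumes the affine gap cost documented for bwa mem (a gap of length k costs O + k × E) and uses the documented default values only as placeholders (match score 1, -B 4, -O 6, gap extension 1, -T 30). The function names are hypothetical, and the clipping penalty (-L) is left out because it influences whether read ends are clipped rather than being a term in this sum.

```python
def alignment_score(matches, mismatches, gap_lengths,
                    match_score=1, mismatch_penalty=4,
                    gap_open=6, gap_extend=1):
    """Simplified score: reward matches, penalise mismatches and gaps."""
    score = matches * match_score - mismatches * mismatch_penalty
    for k in gap_lengths:
        # Affine gap cost: opening penalty plus one extension penalty per gapped base
        score -= gap_open + k * gap_extend
    return score

def passes_min_score(score, min_score=30):
    """Mimic -T: alignments scoring below min_score are not output."""
    return score >= min_score

good = alignment_score(matches=48, mismatches=2, gap_lengths=[3])
poor = alignment_score(matches=30, mismatches=3, gap_lengths=[])
print(good, passes_min_score(good))
print(poor, passes_min_score(poor))
```

Raising -B or -O lowers the score of imperfect alignments, so fewer of them clear the -T threshold.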
samtools |
view |
samtools view reads, filters and transforms SAM/BAM files |
154, 156 |
-u specifies output of uncompressed BAM files |
-b specifies output in the BAM format |
-f specifies a bitwise FLAG; only alignments with all of the specified bits set will be output
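The -f test is a bitwise comparison against the SAM FLAG field. A minimal sketch follows; the constant and function names are hypothetical, while the bit values are those defined in the SAM specification.

```python
# SAM FLAG bits (values from the SAM specification)
PAIRED = 0x1         # read is paired in sequencing
PROPER_PAIR = 0x2    # read is mapped in a proper pair
UNMAPPED = 0x4       # read itself is unmapped

def keep_with_required_bits(flag: int, required: int) -> bool:
    """Mimic -f: keep the alignment only if every required bit is set."""
    return (flag & required) == required

required = PAIRED | PROPER_PAIR          # equivalent to -f 3
for flag in (99, 73, 4):
    print(flag, keep_with_required_bits(flag, required))
```

Here FLAG 99 (paired, proper pair, mate reverse, first in pair) is kept, whereas 73 and 4 lack the proper-pair bit and are dropped.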
sort |
samtools sort sorts alignments by coordinates or read names |
154, 156 |
-n specifies sort by read name rather than by chromosomal coordinates |
-m specifies maximum memory (in KB, MB or GB) per thread |
fixmate |
samtools fixmate fills in mate coordinates and related flags from a name-sorted alignment
156
-p disables FR (forward–reverse orientation) proper pair check |
-m adds ms (mate score) tags |
markdup |
samtools markdup marks duplicate alignments from a coordinate-sorted file
156 |
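The samtools subcommands above depend on each other's sort order: fixmate expects name-sorted input, and markdup expects coordinate-sorted input carrying the ms tags added by fixmate -m. The sketch below only assembles the corresponding command lines in that order and does not run them. The function name and all file names are hypothetical, and any further options used in the protocol steps are omitted.

```python
import shlex

def markdup_commands(in_bam, prefix):
    """Build the samtools calls for duplicate marking, in dependency order."""
    name_sorted = f"{prefix}.namesort.bam"
    fixed = f"{prefix}.fixmate.bam"
    coord_sorted = f"{prefix}.possort.bam"
    marked = f"{prefix}.markdup.bam"
    return [
        ["samtools", "sort", "-n", "-o", name_sorted, in_bam],  # sort by read name
        ["samtools", "fixmate", "-m", name_sorted, fixed],      # add mate info and ms tags
        ["samtools", "sort", "-o", coord_sorted, fixed],        # sort by coordinate
        ["samtools", "markdup", coord_sorted, marked],          # mark duplicates
    ]

for cmd in markdup_commands("mapped.bam", "sample"):
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the input file exists.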
GridTools.py |
matefq |
GridTools.py matefq parses the reads mapped to the GRID-seq linker in the BAM file into RNA–DNA mates in interleaved FASTQ format
155
-l specifies minimum length; RNA or DNA with length less than specified will be omitted |
-n renames the prefix of each read |
-o outputs to file in HDF5 format |
evaluate |
GridTools.py evaluate calculates quality and quantity of each pair of RNA–DNA mates from the BAM file mapped to the genome
157
-g specifies gene annotation in GTF format |
-k specifies bin size (kb) of the genome (default: 10 kb) |
-m specifies moving window for smoothing in bins (default: 10) |
-o outputs mapping information to the HDF5 file |
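The bin size (-k) and smoothing window (-m) can be pictured with the sketch below, which counts read positions in fixed-size genomic bins and then applies a moving average across neighbouring bins. This is not GridTools.py code: the function names, the centred-window convention and the toy coordinates are all assumptions made for illustration.

```python
def bin_counts(positions, bin_kb, n_bins):
    """Count read positions (bp) falling into consecutive bins of bin_kb kilobases."""
    bin_size = bin_kb * 1000
    counts = [0] * n_bins
    for pos in positions:
        idx = pos // bin_size
        if idx < n_bins:
            counts[idx] += 1
    return counts

def moving_average(values, window):
    """Average each bin with its neighbours, using up to `window` bins in total."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - (window - 1) // 2)
        hi = min(len(values), i + window // 2 + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

positions = [500, 9999, 10000, 25000, 25001, 25002]   # toy coordinates on one chromosome
counts = bin_counts(positions, bin_kb=10, n_bins=4)
smoothed = moving_average(counts, window=3)
print(counts)
print(smoothed)
```

Larger bins or wider windows trade resolution for smoother, less noisy signal.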
stats |
GridTools.py stats calculates statistics of GRID-seq data |
158 |
-p specifies prefix of output file names |
-b outputs the summary of base-position information for RNA, Linker and DNA |
-c outputs the summary of mapping information in read counts |
-l outputs the distribution of sequence length for RNA, Linker and DNA |
-r outputs the resolution information of the library |
RNA |
GridTools.py RNA identifies chromatin-enriched RNAs and evaluates the gene expression levels as well as interaction scopes
159
-e specifies output file for the gene expression |
-s specifies output file for the RNA interaction scope |
DNA |
GridTools.py DNA identifies RNA-enriched chromatin regions in background (trans) and foreground (cis) |
160 |
matrix |
GridTools.py matrix evaluates the RNA–chromatin interaction matrix |
161 |
-k specifies cutoff of RNA reads per kilobase in the gene body |
-x specifies cutoff of DNA reads per kilobase at the maximum bin |
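The two cutoffs are both densities expressed in reads per kilobase. The sketch below shows one way such cutoffs could be applied. It is an assumption-laden illustration rather than the GridTools.py implementation: the function names, the "greater than or equal" comparison, the requirement that both cutoffs pass, and the numeric values are all hypothetical.

```python
def reads_per_kb(read_count, length_bp):
    """Read density normalised to 1 kb of sequence."""
    return read_count / (length_bp / 1000)

def passes_cutoffs(rna_reads, gene_length_bp, dna_bin_counts, bin_kb,
                   rna_cutoff, dna_cutoff):
    """Apply a -k style cutoff on the gene body and a -x style cutoff on the densest bin."""
    rna_density = reads_per_kb(rna_reads, gene_length_bp)
    dna_density = reads_per_kb(max(dna_bin_counts), bin_kb * 1000)
    return rna_density >= rna_cutoff and dna_density >= dna_cutoff

# Toy example: 500 RNA reads over a 20-kb gene, DNA reads counted in 10-kb bins
print(reads_per_kb(500, 20000))
print(passes_cutoffs(500, 20000, [5, 120, 40], bin_kb=10, rna_cutoff=10, dna_cutoff=10))
print(passes_cutoffs(30, 15000, [5, 120, 40], bin_kb=10, rna_cutoff=10, dna_cutoff=10))
```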
model |
GridTools.py model builds a network model to deduce the enhancer–promoter proximity |
Box 3 |
-e specifies BED file of regulatory elements (e.g., enhancers and promoters) |
-k specifies cutoff of RNA reads per kilobase in the gene body |
-x specifies cutoff of DNA reads per kilobase at the maximum bin
-z specifies z score used to filter for significant proximity |
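Filtering by z score can be sketched as follows. The proximity values, the pair labels and the function name are hypothetical, and the sketch assumes that z scores are computed against the mean and standard deviation of all candidate pairs, which may differ from how GridTools.py model derives them.

```python
from statistics import mean, pstdev

def significant_pairs(scores, z_cutoff):
    """Return pairs whose z score is at least z_cutoff, mapped to that z score."""
    values = list(scores.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:                      # no variation, so nothing can stand out
        return {}
    z_scores = {pair: (value - mu) / sd for pair, value in scores.items()}
    return {pair: z for pair, z in z_scores.items() if z >= z_cutoff}

# Toy proximity scores for enhancer-promoter pairs
scores = {"e1-p1": 1.0, "e2-p1": 1.2, "e3-p2": 0.9, "e4-p2": 1.1, "e5-p3": 5.0}
hits = significant_pairs(scores, z_cutoff=1.5)
print(sorted(hits))
```

A higher cutoff keeps fewer, more extreme pairs.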
bgzip |
|
Block compression/decompression utility |
157 |
tabix |
|
Generic indexer for TAB-delimited genome position files |
157 |
-p specifies the input format preset (gff for files in GFF or GTF format)
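bgzip and tabix are typically used as a pair: the position-sorted file is first block-compressed, then indexed. The sketch below assembles the two command lines without executing them. The function name and the file name are hypothetical, and the input is assumed to be sorted by chromosome and position already.

```python
import shlex

def index_commands(path, preset="gff"):
    """Build the bgzip and tabix calls for a position-sorted annotation file."""
    compressed = path + ".gz"        # bgzip writes <path>.gz by default
    return [
        ["bgzip", path],                          # block-compress the file
        ["tabix", "-p", preset, compressed],      # index using the chosen preset
    ]

for cmd in index_commands("annotation.sorted.gtf"):
    print(shlex.join(cmd))
```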