[-1 <file>] |
Input file 1, required. Note: For both Input 1 and Input 2 (see next row), the user can provide either of two kinds of input. One is the combined methylation level data (eg, “-1 MCF7.CG.combine”); the other is the “acgt-count” output files, which contain uncombined methylation levels. Uncombined means that the methylation levels on the forward and reverse strands are in two separate files, which should be separated by a comma (,) when given as input, eg, “-1 MCF7.CG.forward, MCF7.CG.reverse”. |
[-2 <file>] |
Input file 2, optional. If specified, the pipeline processes both inputs and compares their final results. By default, only input file 1 is processed and no comparison is made. Note: For both Input 1 and Input 2, the user can provide either of the two kinds of input explained in the row above. |
[-o <dir>] |
The output directory to which all output files are written. Default is “<current_dir>/final.results/”. |
[-c <int>] |
The coverage cutoff B (Default: B = 0). Sites are selected when the methylation coverage is greater than B. On each strand there must be at least B reads covering a specific CpG site in order for HMPL to check whether it is hemimethylated. Changing the “-c” value from a smaller value (eg, -c 5) to a larger value (eg, -c 10) gives a shorter list of hemimethylated sites with a smaller false discovery rate. |
[-l <real>] |
The cutoff value for selecting a low methylation level. (Default: 0.2, range: [0.05, 0.4]). This value corresponds to the “L0” mentioned in Step 4 of the pipeline. If the methylation level is less than this “-l” value, the site is called unmethylated. Changing the “-l” value from a smaller value (eg, -l 0.1) to a larger value (eg, -l 0.2) may give a longer list of hemimethylated sites, but possibly with a larger false discovery rate. |
[-h <real>] |
The cutoff value for selecting a high methylation level. (Default: 0.8, range: [0.6, 1]). This value corresponds to the “H0” mentioned in Step 4 of the pipeline. If the methylation level is greater than this “-h” value, the site is called methylated. Changing the “-h” value from a smaller value (eg, -h 0.7) to a larger value (eg, -h 0.9) may give a shorter list of hemimethylated sites, possibly with a smaller false discovery rate. |
[-d <int>] |
The maximum distance between two CpG sites for them to be grouped into the same cluster (Default: 50). If the maximum distance is changed from a smaller value (eg, -d 50) to a larger value (eg, -d 100), the number of CpG sites in a cluster will be larger, but the total number of hemimethylation clusters will be smaller. |
[-r <file>] |
The reference gene file, not the genome reference sequence files. This file is used to provide genetic annotation (ie, gene names) to the hemimethylation sites. For example, we set it as “-r /home/reference/hg19/refGene.txt”. This “refGene.txt” file contains the gene names and gene information downloaded from the UCSC genome browser. |
[-D <int>] |
The length of the promoter region (Default: D = 1000). That is, if the transcript start position is located at X = 5,000 bp on a chromosome, the promoter region of this gene is defined as the interval from X - D = 4,000 to X = 5,000. |
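
To make the interplay of the “-c”, “-l”, “-h”, “-d”, and “-D” options concrete, the following Python sketch shows one way such thresholds could be applied. It is not HMPL's actual code: the function names, argument names, and data layout are hypothetical, and several details are assumptions. In particular, it assumes that a site is hemimethylated when one strand is above the high cutoff and the other is below the low cutoff, that coverage must be strictly greater than B, that a cluster needs at least two sites, and that consecutive sites in a cluster are at most “-d” bp apart.

```python
from typing import List, Tuple


def is_hemimethylated(fwd_level: float, rev_level: float,
                      fwd_cov: int, rev_cov: int,
                      min_cov: int = 0,      # -c (B)
                      low: float = 0.2,      # -l (L0)
                      high: float = 0.8      # -h (H0)
                      ) -> bool:
    """Return True if one strand is methylated and the other unmethylated."""
    # Assumption: coverage on each strand must be strictly greater than B.
    if fwd_cov <= min_cov or rev_cov <= min_cov:
        return False
    fwd_high_rev_low = fwd_level > high and rev_level < low
    rev_high_fwd_low = rev_level > high and fwd_level < low
    return fwd_high_rev_low or rev_high_fwd_low


def cluster_sites(positions: List[int], max_dist: int = 50) -> List[List[int]]:
    """Group sorted positions whose consecutive gaps are at most max_dist (-d)."""
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_dist:
            # Gap too large: close the current group.
            # Assumption: a cluster needs at least two sites.
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def promoter_region(tss: int, promoter_len: int = 1000) -> Tuple[int, int]:
    """Promoter interval from X - D to X, as described for the -D option."""
    return (tss - promoter_len, tss)


if __name__ == "__main__":
    # A site with 10 reads per strand passes -c 5 but not -c 10.
    print(is_hemimethylated(0.9, 0.1, 10, 10, min_cov=5))
    print(is_hemimethylated(0.9, 0.1, 10, 10, min_cov=10))
    # A larger -d merges neighboring clusters into fewer, larger ones.
    sites = [100, 130, 200, 230]
    print(cluster_sites(sites, max_dist=50))
    print(cluster_sites(sites, max_dist=100))
    print(promoter_region(5000, 1000))
```

In this sketch, raising `min_cov` or `high`, or lowering `low`, can only remove sites from the hemimethylated list, which matches the direction of the effects described in the table. Raising `max_dist` from 50 to 100 merges the two two-site clusters in the example into a single four-site cluster.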