Table 2.

A summary of computational tools available for TIS analysis

Tools/details	ESSENTIALS	Transit	Con-Artist	TraDIS	Tn-seq Explorer	TnseqDiff	MAGenTA	Aerobio
Raw read processing^a	Yes	Through separate tool (TPP)	No, needs separate tools	Yes	No, but enables read mapping with compatible tools	No	Yes	Yes
Overall read count normalization^b	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Genomic location read normalization^c	LOESS correction	No	No	No	No	Yes	No	LOESS correction
Data readout^d	Log2 counts^j and ratios	Read counts	Read counts	Log2 counts^j and ratios	Log2 counts and ratios	Log2 counts and ratios	Read counts and relative fitness representing the growth rate^k	Read counts and relative fitness representing the growth rate^k
Core model/approach^e	Negative binomial distribution	Bayesian or hidden Markov model (HMM)	Mann-Whitney U and HMM	Log-fold changes	Log-fold changes	Construction of confidence distribution function	Fits data to exponential growth model, incorporates population expansion	Fits data to exponential growth model, incorporates population expansion
Essential gene/loci identification^f	Gene	Gene or loci	Gene or loci	Gene	Gene or loci	Gene or defined loci	Gene or loci	Gene or defined loci
Conditionally essential gene/loci identification^f	Gene	Gene or loci	Gene or loci	Gene	Gene or loci	Gene	Gene or loci	Gene or defined loci
Conditionally important/quantitative readouts	Semi	Semi	Semi	Semi	Semi	Semi	Yes, fitness is growth rate	Yes, fitness is growth rate
Bottleneck calculation & correction^g	No	No^l	Yes	No	No	No	Yes, size calculation and correction	Yes, size calculation and correction
Quantitative comparisons across experiments^h	No	Yes, permutation test	Yes, multinomial distribution simulation^m	Yes	No	No	Yes, fitness is relative for each condition and experiment	Yes, fitness is relative for each condition and experiment
Operation^i	Web-based or CL	GUI	CL	CL	GUI	R-based	CL or Galaxy	CL and server-based
Visualization and notes	Several visualization options	Several visualization options	Visualization with Artemis^n	Visualization with Artemis^n	Several visualization options	-	R-based visualization options	Many visualization options; also performs RNA-seq and whole-genome sequencing analysis
Reference(s)	110	17	66	5	82	107	59, 91	2

^a These tools have the integrated capability to perform processes such as barcode clipping, read-quality filtering, and mapping reads to a reference genome.

^b Are read counts from different samples or sequencing runs normalized?

^c Can the tool account for differences in read counts that depend on genomic location? For example, in bacterial genomes, regions around the origin of replication may sometimes have higher read counts because of increased DNA replication at these locations.

^d This is the main type of output provided by the tool.

^e This is the main approach or model the tool uses to analyze the data and to identify essential or conditionally essential/important genes or loci.

^f Each approach identifies essential genes, which are those needed for growth under any condition, and conditionally essential genes, which are those required for growth only in a specific condition. Some tools use annotation information and are gene centered (gene). Others use a sliding window, are annotation independent, and can theoretically identify any essential region, such as an intergenic region or even a domain within a gene (loci). With some tools, loci other than genes can be explored if they are specifically defined, for instance in the annotation (GenBank) file (defined loci).

^g Some experiments are affected by bottlenecks, which some tools can address bioinformatically.

^h Some tools enable comparisons across experiments and conditions, making it easier to determine whether loci have significant phenotypes in one or more conditions.

^i The accessibility or user-friendliness of each tool is partially determined by the manner in which it is run: Web-based tools can be run directly in the browser; CL represents command-line running of scripts in languages such as Perl or Python; GUI is a graphical user interface and is often easy to run; R-based tools are run in the R environment; Galaxy tools are operated in the Galaxy environment; server-based tools require extensive expertise to install, while operation is through CL.

^j These data are generated with the RNA-seq analysis package edgeR (72).

^k Fitness in these packages is calculated as the growth rate, as described in detail in References 91, 92, and 94. By incorporating read counts from two time points and the expansion or contraction of the population during the experiment into an exponential growth model, the fitness effect of a single insertion, or of a group of insertions in a locus, is represented as a growth rate. The measurement thereby becomes independent of time, relative to one (the wt growth rate), and truly quantitative, making cross-experiment, cross-condition, or cross-strain comparisons possible.

^l The developers of Transit recently explored the zero-inflated negative binomial model for bottleneck corrections (85). This approach has been used particularly for single-cell RNA-seq analysis, and it could be incorporated into Transit.

^m The Artist developers recently developed CompTIS, a principal component analysis-based approach to analyze multiple data sets (39).

^n This tool relies on Artemis, a previously developed visualization tool (11).
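
The growth-rate fitness listed for MAGenTA and Aerobio in the table can be illustrated with a minimal sketch. The table and its notes describe the calculation only in words, so the formula below is an assumption: it is one commonly cited form, W = ln(f2 × d / f1) / ln((1 − f2) × d / (1 − f1)), where f1 and f2 are the frequencies of an insertion at the two time points and d is the population expansion factor. The function name and parameters are hypothetical and do not correspond to the API of any tool in the table; the actual implementations may differ.

```python
import math


def relative_fitness(count_t1, total_t1, count_t2, total_t2, expansion):
    """Sketch of a growth-rate-based fitness for one insertion (or locus).

    count_t1, count_t2: reads for the insertion at time points 1 and 2.
    total_t1, total_t2: total mapped reads in each sample.
    expansion: fold expansion of the whole population between the time
               points (hypothetical parameter, must be > 0).
    """
    freq_t1 = count_t1 / total_t1
    freq_t2 = count_t2 / total_t2
    if not (0 < freq_t1 < 1 and 0 < freq_t2 < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    if expansion <= 0:
        raise ValueError("expansion must be positive")

    # Growth of the mutant versus growth of the rest of the population,
    # both under an exponential growth model.
    mutant_growth = math.log(freq_t2 * expansion / freq_t1)
    rest_growth = math.log((1 - freq_t2) * expansion / (1 - freq_t1))
    if rest_growth == 0:
        raise ValueError("rest of population did not grow; fitness undefined")
    return mutant_growth / rest_growth


# An insertion whose frequency is unchanged grows like the rest of the
# population, so its fitness is approximately one.
neutral = relative_fitness(100, 100_000, 100, 100_000, expansion=100.0)
# An insertion whose frequency halves has a fitness below one.
defective = relative_fitness(100, 100_000, 50, 100_000, expansion=100.0)
```

Because the result is a ratio of growth rates, it does not depend on the duration of the experiment, which is the property that makes comparisons across experiments and conditions possible.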