Table 2. A summary of computational tools available for TIS analysis
| Tools/details | ESSENTIALS | Transit | Con-Artist | TraDIS | Tn-seq Explorer | TnseqDiff | MAGenTA | Aerobio |
|---|---|---|---|---|---|---|---|---|
| Raw read processing<sup>a</sup> | Yes | Through separate tool (TPP) | No, needs separate tools | Yes | No, but enables read mapping with compatible tools | No | Yes | Yes |
| Overall read count normalization<sup>b</sup> | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Genomic location read normalization<sup>c</sup> | LOESS correction | No | No | No | No | Yes | No | LOESS correction |
| Data readout<sup>d</sup> | Log2 counts<sup>j</sup> and ratios | Read counts | Read counts | Log2 counts<sup>j</sup> and ratios | Log2 counts and ratios | Log2 counts and ratios | Read counts and relative fitness representing the growth rate<sup>k</sup> | Read counts and relative fitness representing the growth rate<sup>k</sup> |
| Core model/approach<sup>e</sup> | Negative binomial distribution | Bayesian or hidden Markov model (HMM) | Mann-Whitney U test and HMM | Log-fold changes | Log-fold changes | Construction of a confidence distribution function | Fits data to an exponential growth model that incorporates population expansion | Fits data to an exponential growth model that incorporates population expansion |
| Essential gene/loci identification<sup>f</sup> | Gene | Gene or loci | Gene or loci | Gene | Gene or loci | Gene or defined loci | Gene or loci | Gene or defined loci |
| Conditionally essential gene/loci identification<sup>f</sup> | Gene | Gene or loci | Gene or loci | Gene | Gene or loci | Gene | Gene or loci | Gene or defined loci |
| Conditionally important/quantitative readouts | Semi | Semi | Semi | Semi | Semi | Semi | Yes, fitness is growth rate | Yes, fitness is growth rate |
| Bottleneck calculation & correction<sup>g</sup> | No | No<sup>l</sup> | Yes | No | No | No | Yes, size calculation and correction | Yes, size calculation and correction |
| Quantitative comparisons across experiments<sup>h</sup> | No | Yes, permutation test | Yes, multinomial distribution simulation<sup>m</sup> | Yes | No | No | Yes, fitness is relative for each condition and experiment | Yes, fitness is relative for each condition and experiment |
| Operation<sup>i</sup> | Web-based or CL | GUI | CL | CL | GUI | R-based | CL or Galaxy | CL and server-based |
| Visualization and notes | Several visualization options | Several visualization options | Visualization with Artemis<sup>n</sup> | Visualization with Artemis<sup>n</sup> | Several visualization options | - | R-based visualization options | Many visualization options; also performs RNA-seq and whole-genome sequencing analysis |
| Reference(s) | 110 | 17 | 66 | 5 | 82 | 107 | 59, 91 | 2 |
<sup>a</sup> These tools have the integrated capability to perform processes such as barcode clipping, read-quality filtering, and mapping reads to a reference genome.
<sup>b</sup> Are read counts from different samples or sequencing runs normalized to one another?
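As an illustration of what overall read count normalization means, the following is a minimal sketch of one simple scheme, total-count scaling, in which every sample is rescaled so that its total number of reads equals the mean total across samples. It is not the method of any particular tool in the table (the tools use their own, often more robust, schemes), and the function name `normalize_total_counts` is hypothetical.

```python
def normalize_total_counts(samples):
    """Scale every sample so that its total read count equals the mean total of all samples.

    samples: list of per-sample lists of read counts, with the same sites/genes
    in the same order in every sample (hypothetical input layout).
    """
    totals = [sum(sample) for sample in samples]
    if any(total == 0 for total in totals):
        raise ValueError("each sample needs at least one read")
    target = sum(totals) / len(totals)  # common library size to scale to
    return [[count * target / total for count in sample]
            for sample, total in zip(samples, totals)]


# Two libraries sequenced to different depths (100 vs. 50 reads) become comparable.
print(normalize_total_counts([[10, 20, 70], [5, 10, 35]]))
```

After scaling, a site that received the same fraction of reads in both libraries has the same normalized count, so differences between samples reflect biology rather than sequencing depth.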
<sup>c</sup> Can the tool account for differences in read numbers that depend on genomic location? For example, regions around the origin of replication in bacterial genomes sometimes yield more reads because of increased DNA replication at these locations.
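One way to picture a LOESS correction for position-dependent read bias is sketched below: a degree-1 LOESS curve (tricube weights, no robustness iterations) is fitted to read counts as a function of genomic position, and each count is divided by the fitted trend and rescaled to the mean trend. This is an assumption-laden sketch, not the procedure implemented in ESSENTIALS or Aerobio: it treats positions as linear coordinates (ignoring chromosome circularity), models the bias as multiplicative, and the names `lowess` and `correct_position_bias` are hypothetical.

```python
import math


def lowess(x, y, frac=0.3):
    """Degree-1 LOESS: locally weighted linear regression with tricube weights.

    For each x[i], the bandwidth is the distance to its k-th nearest point,
    where k = ceil(frac * n). No robustness iterations are performed.
    """
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))
    fitted = []
    for xi in x:
        # Bandwidth h: distance from xi to its k-th nearest neighbour (xi itself included).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        # Tricube weights; points at distance >= h get zero weight.
        w = [(1 - (abs(xj - xi) / h) ** 3) ** 3 if abs(xj - xi) < h else 0.0 for xj in x]
        sw = sum(w)  # > 0 because xi itself always has weight 1
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a local mean
        fitted.append(my + slope * (xi - mx))
    return fitted


def correct_position_bias(positions, counts, frac=0.3):
    """Divide out a smooth positional trend, keeping the overall scale of the data."""
    trend = lowess(positions, counts, frac)
    mean_trend = sum(trend) / len(trend)
    return [c * mean_trend / t if t > 0 else c for c, t in zip(counts, trend)]


# Counts that rise steadily along the chromosome are flattened to their mean (19).
positions = list(range(10))
counts = [10 + 2 * p for p in positions]
print(correct_position_bias(positions, counts))
```

The quadratic loop keeps the sketch short; a real implementation would use a sorted neighbour search and would typically fit on binned or log-transformed counts.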
<sup>d</sup> The main type of output provided by the tool.
<sup>e</sup> The major approach or model used by the tool to analyze the data and identify essential or conditionally essential/important genes or loci.
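For the tools whose core approach is listed as log-fold changes, the basic quantity can be sketched as below: the log2 ratio of (normalized) read counts per gene between a test condition and a control, with a pseudocount so that genes with zero reads do not produce infinite values. This is a generic sketch, not the implementation used by TraDIS or Tn-seq Explorer; the pseudocount of 1 and the function name `log2_fold_changes` are assumptions.

```python
import math


def log2_fold_changes(control, condition, pseudocount=1.0):
    """Return {gene: log2((condition + pc) / (control + pc))} for genes present in both inputs.

    control, condition: dicts mapping gene IDs to already normalized read counts.
    """
    shared = control.keys() & condition.keys()
    return {gene: math.log2((condition[gene] + pseudocount) / (control[gene] + pseudocount))
            for gene in sorted(shared)}


# Hypothetical genes: gA loses reads in the test condition, gB gains reads.
print(log2_fold_changes({"gA": 15, "gB": 7}, {"gA": 3, "gB": 31}))
```

Negative values indicate insertions that are depleted in the test condition (a candidate fitness defect); positive values indicate enrichment.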
<sup>f</sup> Each approach identifies essential genes (those needed for growth under any condition) and conditionally essential genes (those required for growth only in a specific condition). Some tools use annotation information and are gene-centered (gene); others use a sliding window, are annotation independent, and can in principle identify any essential region, such as an intergenic region or even a domain within a gene (loci). With some tools, loci other than genes can be explored if they are specifically defined, for instance, in the annotation (GenBank) file (defined loci).
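The annotation-independent, sliding-window idea can be sketched as follows: count the distinct insertion sites that fall in overlapping windows along the genome and report windows that contain none as candidate essential regions. Real tools add statistics (expected insertion density, TA-site content, HMM states, significance testing); this sketch only shows the windowing step, and the window and step sizes as well as the function names `sliding_window_counts` and `insertion_free_windows` are hypothetical.

```python
from bisect import bisect_left


def sliding_window_counts(insertion_sites, genome_length, window=1000, step=500):
    """Return (start, end, n_sites) for overlapping windows along the genome.

    insertion_sites: 0-based positions where at least one insertion was observed.
    Each window covers the half-open interval [start, end).
    """
    sites = sorted(set(insertion_sites))
    windows = []
    start = 0
    while start < genome_length:
        end = min(start + window, genome_length)
        # Number of distinct sites in [start, end), found by binary search.
        n_sites = bisect_left(sites, end) - bisect_left(sites, start)
        windows.append((start, end, n_sites))
        if end == genome_length:
            break
        start += step
    return windows


def insertion_free_windows(windows):
    """Windows without any insertion site: candidate essential regions."""
    return [(start, end) for start, end, n_sites in windows if n_sites == 0]


windows = sliding_window_counts([100, 200, 2600, 2700], 3000)
print(insertion_free_windows(windows))
```

Because the windows ignore gene boundaries, an insertion-free stretch can be reported whether it lies in a gene, in part of a gene, or between genes.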
<sup>g</sup> Some experiments are affected by population bottlenecks, which some tools can address bioinformatically.
<sup>h</sup> Some tools enable comparisons across experiments and conditions, making it easier to determine whether loci have significant phenotypes in one or more conditions.
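To show how a permutation test (the approach listed for Transit) can decide whether a locus differs between two conditions, here is a generic two-sample permutation test on per-site read counts: the observed difference in mean counts is compared with differences obtained after randomly reshuffling the condition labels. This is a sketch of the general technique, not Transit's code; the statistic (absolute difference of means), the number of permutations, and the function name `permutation_test` are assumptions.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(counts_a, counts_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in mean read counts.

    counts_a, counts_b: normalized read counts at the insertion sites of one
    locus in condition A and condition B. Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    observed = abs(_mean(counts_b) - _mean(counts_a))
    pooled = list(counts_a) + list(counts_b)
    n_a = len(counts_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign counts to conditions at random
        diff = abs(_mean(pooled[n_a:]) - _mean(pooled[:n_a]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


print(permutation_test([0] * 10, [100] * 10, n_perm=2000))
```

Because the null distribution is built from the data themselves, no parametric assumption about the read count distribution is needed, at the cost of computation time.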
<sup>i</sup> The accessibility or user-friendliness of each tool is partly determined by how it is run: web-based tools run directly in the browser; CL denotes running scripts from the command line in languages such as Perl or Python; GUI denotes a graphical user interface, which is often easy to use; R-based tools run in the R environment; Galaxy tools are operated in the Galaxy environment; server-based tools require extensive expertise to install and are then operated through the CL.
<sup>j</sup> These data are generated with the RNA-seq analysis package edgeR (72).
<sup>k</sup> Fitness in these packages is calculated as the growth rate, as described in detail in References 91, 92, and 94. By incorporating read counts from two time points, together with the expansion or contraction of the population during the experiment, into an exponential growth model, the fitness effect of a single insertion, or of a group of insertions in a locus, is expressed as a growth rate. The measurement thereby becomes independent of time, relative to 1 (the wild-type growth rate), and truly quantitative, which makes cross-experiment, cross-condition, and cross-strain comparisons possible.
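One widely used form of this calculation compares how much a mutant expanded with how much the rest of the population expanded over the same interval. With $N_i(t_1)$ and $N_i(t_2)$ the frequency of mutant $i$ (its reads divided by all reads) at the two time points and $d$ the expansion factor of the whole population, the relative fitness is

$$
W_i = \frac{\ln\left(N_i(t_2)\, d \,/\, N_i(t_1)\right)}{\ln\left(\left(1 - N_i(t_2)\right) d \,/\, \left(1 - N_i(t_1)\right)\right)}
$$

The Python sketch below evaluates this expression. It assumes frequencies strictly between 0 and 1 and an expansion factor $d > 1$; the exact bookkeeping in MAGenTA and Aerobio (pseudocounts, bottleneck correction, aggregation over a locus) is not reproduced here, and the function name `tnseq_fitness` is hypothetical.

```python
import math


def tnseq_fitness(freq_t1, freq_t2, expansion):
    """Relative fitness W of one mutant (1.0 means wild-type-like growth).

    freq_t1, freq_t2: mutant frequency (mutant reads / total reads) at t1 and t2.
    expansion: population expansion factor d between t1 and t2 (must be > 1).
    """
    if not (0 < freq_t1 < 1 and 0 < freq_t2 < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    if expansion <= 1:
        raise ValueError("expansion factor must exceed 1")
    mutant_expansion = math.log(freq_t2 * expansion / freq_t1)
    rest_expansion = math.log((1 - freq_t2) * expansion / (1 - freq_t1))
    return mutant_expansion / rest_expansion


# A mutant whose frequency drops tenfold while the culture expands 100-fold
# grows at roughly half the rate of the rest of the population.
print(tnseq_fitness(0.001, 0.0001, 100))
```

Because the result is a ratio of growth rates rather than a ratio of read counts, values obtained from experiments of different length or with different expansion factors can be compared directly.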
<sup>l</sup> The developers of Transit recently explored a zero-inflated negative binomial (ZINB) model for bottleneck correction (85). This approach has been used particularly for single-cell RNA-seq analysis and could be incorporated into Transit.
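The zero-inflated negative binomial adds a separate "structural zero" component to an ordinary negative binomial: with probability $\pi$ a site yields zero reads regardless of the count model (for example, an insertion lost through a bottleneck), and otherwise its count follows a negative binomial with mean $\mu$ and size (dispersion) $r$. The sketch below only evaluates that probability mass function; how Transit estimates the parameters and uses them for testing is described in Reference 85 and is not reproduced here. The function names and the mean/size parameterization are assumptions.

```python
import math


def nb_pmf(k, mean, size):
    """Negative binomial P(Y = k) with mean mu > 0 and size r > 0 (variance mu + mu**2 / r)."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mean))
             + k * math.log(mean / (size + mean)))
    return math.exp(log_p)


def zinb_pmf(k, mean, size, pi_zero):
    """Zero-inflated NB: extra probability pi_zero of an excess (structural) zero."""
    nb = nb_pmf(k, mean, size)
    if k == 0:
        return pi_zero + (1 - pi_zero) * nb
    return (1 - pi_zero) * nb


# With 30% structural zeros, P(0) is far higher than under the plain NB.
print(nb_pmf(0, 20.0, 2.0), zinb_pmf(0, 20.0, 2.0, 0.3))
```

Separating excess zeros from low counts in this way keeps sites that lost their insertions by chance from being read as evidence of essentiality.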
<sup>m</sup> The Artist developers recently developed CompTIS, a principal component analysis-based approach for analyzing multiple data sets (39).
<sup>n</sup> These tools rely on Artemis, a previously developed visualization tool (11).