Table 2.
File types processed during the testing phase of the aws-s3-integrity-check tool.
File type | Description |
---|---|
Bam | Compressed binary version of a SAM file that represents aligned sequences up to 128 Mb. |
Bed | Browser Extensible Data format. This file format is used to store genomic regions as coordinates. |
Csv | Comma-Separated Values. |
Docx | File format for Microsoft Word documents. |
Fa | FASTA format. Text-based file containing DNA sequences, each preceded by a descriptive header line. |
Fastq | Text-based format for storing genome sequencing data and quality scores. |
Gct | Gene Cluster Text. This is a tab-delimited text format file that contains gene expression data. |
Gff | General Feature Format. File format used for describing genes and other features of DNA, RNA, and protein sequences. |
Gz | A file compressed by the standard GNU zip (gzip). |
Html | HyperText Markup Language file. |
Ibd | Pre-processed mass spectrometry imaging (MSI) data. |
imzML | Imaging Mass Spectrometry Markup Language. Contains raw MSI data. |
Ipynb | Computational notebooks that can be opened with Jupyter Notebook. |
Jpg | Compressed image format for storing digital images. |
JSON | JavaScript Object Notation. Text-based format to represent structured data based on JavaScript object syntax. |
md5 | Checksum file. |
Msa | Multiple sequence alignment file. It generally contains the alignment of three or more biological sequences of similar length. |
Mtx | Sparse matrix format. This contains genes in the rows and cells in the columns. It is produced as output by Cell Ranger. |
Npy | Standard binary file format in NumPy [27] for saving NumPy arrays. |
Nwk | Newick tree file format to represent graph-theoretical trees with edge lengths using parentheses and commas. |
Pdf | Portable Document Format. |
Py | Python file. |
Pyc | Compiled bytecode file generated by the Python interpreter after a Python script is imported or executed. |
R | R language script format. |
Svg | Scalable Vector Graphics. This is a vector file format. |
Tab | Tab-delimited text or data files. |
Tif | Tag Image File Format. Used to store raster graphics and image information. |
Tsv | Tab-Separated Values. Text-based format for storing tabular data. |
Txt | Text document file. |
Vcf | Variant Call Format. Text file for storing gene sequence variations. |
Xls | Microsoft Excel Binary File format. |
Zip | A file containing one or more compressed files. |