Transcriptomics
|
Transcriptomics is defined as the measurement of large-scale (“global”) gene ($g$) expression in a biological sample, including all genes or a representative subset of genes in the genome of a species (denoted as $G$). Each gene is transcribed into one or more messenger RNA (mRNA) molecules. |
Transcriptomic technologies
|
Transcriptomic technologies use different approaches to measure global gene expression by quantifying individual mRNA molecules. Most technologies synthesize complementary DNA (cDNA) from mRNA and use complementary oligonucleotide probes to specifically detect cDNA by hybridization. Examples of transcriptomic technologies include microarrays, L1000 and RNA-Seq. |
Transcriptomic data
|
Transcriptomic data (or gene expression data) are generated by different technologies from biological samples under different experimental conditions, such as normal vs. diseased or control vs. treated. Two frequent types of treatment are chemical perturbations and genetic perturbations, the latter involving the knock-out or over-expression of specific genes. Transcriptomic data can be represented at no fewer than four main levels, where higher levels are derived from lower levels: raw data specific to the assay technology (L0); unnormalized mRNA data derived from L0 data using assay-specific processing (L1); normalized mRNA data that capture the absolute expression levels of genes and are comparable across the study (L2); and differential expression data that capture the change in mRNA levels from the control, may have associated statistical significance scores, and are comparable across studies (L3). |
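The step from normalized data (L2) to differential expression data (L3) can be illustrated with a minimal sketch. It assumes the simplest possible case, one treated and one control sample with no replicates or statistics; the function name `log2_fold_changes`, the `pseudocount` parameter and the gene names and values are all hypothetical and serve only as illustration.

```python
import math


def log2_fold_changes(treated, control, pseudocount=1.0):
    """Return per-gene log2 fold-changes (L2FC) of treated vs. control.

    treated, control: dicts mapping gene name -> normalized (L2) expression.
    pseudocount: hypothetical constant added to both values to avoid log(0).
    Only genes measured in both samples are kept.
    """
    shared = sorted(set(treated) & set(control))
    return {
        g: math.log2((treated[g] + pseudocount) / (control[g] + pseudocount))
        for g in shared
    }


# Made-up normalized expression values for three hypothetical genes.
control = {"GENE_A": 7.0, "GENE_B": 15.0, "GENE_C": 3.0}
treated = {"GENE_A": 31.0, "GENE_B": 3.0, "GENE_C": 3.0}

l2fc = log2_fold_changes(treated, control)
print(l2fc)  # {'GENE_A': 2.0, 'GENE_B': -2.0, 'GENE_C': 0.0}
```

Real L3 data would normally come from a statistical model over replicates, which is also where the associated significance scores come from; this sketch only shows the direction of the derivation.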
Transcriptomic profile
|
We define the transcriptomic profile ($T$) as the L3 transcriptomic data that captures differential expression values, e.g. log2 fold-changes (L2FC), Z-scores, p-values, q-values, etc., written as $T = (t_1, t_2, \ldots, t_n)$, where $t_i$ is the differential expression of one gene $g_i$, there are $n$ genes in the profile and $n \le |G|$ for the set $G$ of genes in the genome. If available, the significance scores associated with differential expression values for genes can also be represented as a vector denoted as $P$, where $P = (p_1, p_2, \ldots, p_n)$ and $p_i$ is the p-value associated with the change in expression for gene $g_i$. |
Extreme transcriptomic profile
|
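We define the extreme transcriptomic profile ($T^{x}$) as one that contains the most differentially expressed genes in the transcriptomic profile $T$. The most up-regulated genes (differential expression $t_i > 0$) and most down-regulated genes (differential expression $t_i < 0$) are denoted as $T^{x}_{up}$ and $T^{x}_{dn}$, respectively. Then $T^{x} = T^{x}_{up} \cup T^{x}_{dn}$. |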
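As a minimal sketch of this definition, the snippet below keeps the `k` most up-regulated and `k` most down-regulated genes of a profile and combines them. The cutoff `k`, the function name `extreme_profile` and the example values are assumptions made for illustration; the definition above does not fix how many genes count as "most differentially expressed".

```python
def extreme_profile(profile, k):
    """Return (up, down): the k most up- and k most down-regulated genes.

    profile: dict mapping gene name -> differential expression (e.g. L2FC).
    k: hypothetical cutoff on the number of genes kept per direction.
    Genes are only counted as up (down) if their value is positive (negative).
    """
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    up = {g: v for g, v in ranked[:k] if v > 0}
    down = {g: v for g, v in ranked[::-1][:k] if v < 0}
    return up, down


# Made-up differential expression values for six hypothetical genes.
profile = {"G1": 2.5, "G2": -3.1, "G3": 0.2, "G4": -0.4, "G5": 1.7, "G6": -1.9}

up, down = extreme_profile(profile, k=2)
extreme = {**up, **down}  # union of the up- and down-regulated parts

print(sorted(up))    # ['G1', 'G5']
print(sorted(down))  # ['G2', 'G6']
```

Because the two parts are filtered by sign, they never share a gene, so the merged dictionary is a true union.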
Directional gene signature
|
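A directional gene signature ($S^{d}$) can be a list of the most differentially expressed genes in a transcriptomic profile $T$ (in other words, the genes in the extreme transcriptomic profile $T^{x}$) associated with a biological state. Therefore, it is defined as: $S^{d} = \{S_{up}, S_{dn}\}$, where $S_{up}$ and $S_{dn}$ are the lists of up- and down-regulated genes. Directional gene signatures can also be defined using other approaches and may not contain equal numbers of up- and down-regulated genes. |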
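To illustrate how a directional signature can end up with unequal numbers of up- and down-regulated genes, here is a minimal sketch of one possible "other approach": selecting genes by an absolute fold-change threshold instead of a fixed count. The threshold rule, the function name `signature_by_threshold` and the example values are assumptions for illustration, not a method prescribed by the definition above.

```python
def signature_by_threshold(profile, threshold):
    """Build a directional signature as two gene lists.

    profile: dict mapping gene name -> differential expression (e.g. L2FC).
    threshold: hypothetical positive cutoff on the absolute value.
    Expression values are dropped; only gene names and direction are kept.
    """
    up = sorted(g for g, v in profile.items() if v >= threshold)
    down = sorted(g for g, v in profile.items() if v <= -threshold)
    return {"up": up, "down": down}


# Made-up differential expression values for six hypothetical genes.
profile = {"G1": 2.5, "G2": -3.1, "G3": 0.2, "G4": -0.4, "G5": 1.7, "G6": -0.9}

signature = signature_by_threshold(profile, threshold=1.0)
print(signature)  # {'up': ['G1', 'G5'], 'down': ['G2']}
```

Here two genes pass the cutoff in the up direction and only one in the down direction, so the two lists differ in length.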
Gene signature
|
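A gene signature ($S$) is a list of genes whose activity defines a biological state: $S \subseteq G$, where $G$ is the set of genes in the genome. Therefore, $S$ can be defined as a list of genes/proteins in a canonical pathway or by the genes in an extreme transcriptomic profile $T^{x}$ or a directional gene signature $S^{d}$. |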
Gene set object
|
We define a gene set object ($Q$) as the most general concept for a transcriptomic profile ($T$), an extreme transcriptomic profile ($T^{x}$), a directional gene signature ($S^{d}$) or a gene signature ($S$), i.e. $Q \in \{T, T^{x}, S^{d}, S\}$. Although we have introduced the signatures as lists and the profiles as vectors, all can be represented as vectors. For example, a hypothetical non-directional pathway signature $S$ only contains a subset of the genes in $G$ and can be represented as a binary vector (that is, $s_i \in \{0, 1\}$ for each gene $g_i$). Similarly, a hypothetical directional signature $S^{d}$ can be represented as a vector with signed entries (that is, $s^{d}_i \in \{-1, 0, 1\}$). |
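The vector representation can be sketched as follows, assuming a fixed gene ordering shared by all objects. The encoding of a directional signature as +1 (up), -1 (down) and 0 (absent) is an assumption chosen for illustration, as are the function names and gene names.

```python
def binary_vector(gene_order, signature):
    """Encode a non-directional signature as 1 (member) or 0 (non-member)."""
    members = set(signature)
    return [1 if g in members else 0 for g in gene_order]


def directional_vector(gene_order, up, down):
    """Encode a directional signature as +1 (up), -1 (down) or 0 (absent).

    The +1/-1/0 coding is an illustrative assumption, not a fixed standard.
    """
    up, down = set(up), set(down)
    if up & down:
        raise ValueError("a gene cannot be both up- and down-regulated")
    return [1 if g in up else -1 if g in down else 0 for g in gene_order]


# Hypothetical gene ordering shared by all vectors.
gene_order = ["G1", "G2", "G3", "G4", "G5"]

pathway = ["G2", "G5"]  # made-up non-directional pathway signature
print(binary_vector(gene_order, pathway))  # [0, 1, 0, 0, 1]

print(directional_vector(gene_order, up=["G1"], down=["G4", "G5"]))
# [1, 0, 0, -1, -1]
```

With every object expressed over the same gene ordering, profiles and signatures can be compared position by position.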