2016 Feb 17;99(3):285–297. doi: 10.1002/cpt.318

Table 1.

Common data types for drug discovery

| Data type | Description | Common techniques | Public availability (a) |
|---|---|---|---|
| SNP | A single nucleotide variation in a genetic sequence | SNP array: most widely used | **** |
| | | Whole genome sequencing | |
| CNV | Variation of the number of copies of a particular gene in the genetic sequence | SNP array: most widely used; less sample DNA required; high probe density and coverage | **** |
| | | Comparative genome hybridization: high sensitivity and specificity; low spatial resolution | |
| | | Whole genome sequencing: can detect smaller CNVs and novel types (e.g., inversions) | |
| Mutation | A permanent change of the nucleotide sequence of the DNA; mostly somatic mutation that occurs in any of the cells except the germ cells | Whole exome sequencing: most widely used | **** |
| | | Whole genome sequencing: more expensive and more coverage | |
| Gene expression | Mostly expression of mRNA but also includes expression of other transcripts | Microarray: most widely used | ***** |
| | | RNA‐Seq: can detect novel transcripts, low abundant transcripts and isoforms | |
| | | Fluorescent in situ hybridization: can detect transcript abundance and spatial location in cells for a small number of genes | |
| | | RT‐PCR: frequently used to confirm expression for a small number of genes | |
| Protein expression | Can be expression of multiple isoforms or variations due to posttranslational modifications | Western blot: widely used to quantify protein expression for a small number of proteins | *** |
| | | ELISA: widely used to detect and quantitatively measure a protein in samples | |
| | | Immunohistochemistry: can detect intracellular localization for a small number of proteins | |
| | | Reverse phase protein array: can detect expression for a few hundred proteins | |
| | | Mass spectrometry: can detect expression for a wide range of proteins | |
| Protein‐protein interaction | Physical interactions between two or more proteins | Two‐hybrid screening: low‐tech; high false‐positive rate | **** |
| | | Mass spectrometry | |
| Protein‐DNA interaction | Binding of a protein to a molecule of DNA | ChIP‐seq: combines chromatin immunoprecipitation with massively parallel DNA sequencing to identify the binding sites of DNA‐associated proteins | *** |
| Gene silencing | Effect of loss of gene function | RNAi: established method; knocks gene down at mRNA or non‐coding RNA level; can have transient effect (siRNA) or long‐term effect (shRNA) | ** |
| | | CRISPR‐Cas9: new method; modifies gene (via knockout/knockin) at the DNA level; causes permanent and heritable changes in the genome | |
| Gene overexpression | Effect of gain of gene function | cDNAs/ORFs: provide clones of sequence | * |
| Drug efficacy | Effect of drug treatment; primarily represented as IC50/EC50/GI50 in vitro | HTS: rapidly assess the activity of a large number of compounds in biochemical assays or cell‐based assays | *** |
| | | MTT assay: often used to confirm activity for a small number of compounds | |
| Drug‐target interaction | Physical interaction between a drug and a protein target | Affinity chromatography with mass spectrometry: most sensitive and unbiased method | *** |
| | | SPR | |
| EMR/EHR | Patient response upon interventions | Digitalization | * |

CNV, copy number variation; CRISPR, clustered regularly interspaced short palindromic repeats; ELISA, enzyme‐linked immunosorbent assay; EMR/EHR, electronic medical/health records; HTS, high throughput screening; MTT, methylthiazol tetrazolium; RT‐PCR, real‐time polymerase chain reaction; SNP, single‐nucleotide polymorphism; SPR, surface plasmon resonance.

(a) Indicates the degree of public availability. For example, ***** indicates that researchers can easily access this type of data via public portals.