a, Empirical MPRAs enable targeted functional characterization of how hundreds of thousands of CREs affect transcription in episomal reporters, and they can quantify the impact of programmable 200-bp oligonucleotide sequences. MPRAs performed across multiple cell types enable identification of cell-type-specific CRE activity. b, Malinois is a deep convolutional neural network (CNN) that predicts cell-type-specific CRE effects directly from nucleotide sequence in K562 (teal), HepG2 (yellow) and SK-N-SH (red) cells. Contribution scores extracted from the model indicate how subsequences drive the predicted function in each cell type. c, Malinois predictions are highly correlated with empirically measured MPRA activity in K562 (teal), HepG2 (yellow) and SK-N-SH (red) cells. Performance for each cell type was quantified by the Pearson correlation (r) on a test set of sequences withheld from training (n = 62,562 oligos, P < 10⁻³⁰⁰). Each point corresponds to the empirical and predicted activity of a single CRE in the corresponding cell type, and the contour lines indicate point density (16.7%, 33.3%, 50%, 66.7% and 83.3%) in the scatter plots. Train–test splits were defined by chromosome. d, Malinois predictions recapitulate an MPRA screen of overlapping fragments derived from a 2.1-Mb window centred on the GATA1 gene (Pearson's r = 0.91, n = 51,242 oligos, P < 10⁻³⁰⁰; Supplementary Fig. 3). In the displayed window (chromosome X:48,000,000–49,000,000), purple signal indicates overlap between measurement and prediction, and blue and red signals indicate where activity is higher in the MPRA measurement or the Malinois prediction, respectively. e, Malinois activity predictions for sequences centred on candidate CREs (cCREs) on chromosome 13, demarcated by DHS peaks in K562 cells (n = 2,413 peaks). This predicted activation pattern is concordant with quantitative signals measured by STARR-seq, DHS-seq and H3K27ac ChIP-seq.
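The evaluation described in panel c (score a test set held out by chromosome using Pearson's r) can be sketched as below. This is a minimal illustration, not the code used for Malinois: it assumes each oligo record carries a chromosome label plus measured and predicted activity, and the held-out chromosomes in `TEST_CHROMS`, the record fields and the function names are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical held-out chromosomes; the actual split used for Malinois is not given here.
TEST_CHROMS: Set[str] = {"chr7", "chr13"}


def split_by_chromosome(
    records: List[Dict], test_chroms: Set[str]
) -> Tuple[List[Dict], List[Dict]]:
    """Assign every record on a held-out chromosome to the test set and the rest to training,
    so no chromosome contributes oligos to both sets."""
    train, test = [], []
    for rec in records:
        (test if rec["chrom"] in test_chroms else train).append(rec)
    return train, test


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy records for illustration only; these are not MPRA measurements.
    toy = [
        {"chrom": "chr1", "measured": 0.5, "predicted": 0.4},
        {"chrom": "chr7", "measured": 1.0, "predicted": 1.1},
        {"chrom": "chr13", "measured": 2.0, "predicted": 1.9},
        {"chrom": "chr13", "measured": 3.0, "predicted": 3.2},
    ]
    train, test = split_by_chromosome(toy, TEST_CHROMS)
    r = pearson_r([rec["measured"] for rec in test], [rec["predicted"] for rec in test])
    print(f"{len(train)} training oligos, {len(test)} test oligos, held-out r = {r:.3f}")
```

Splitting by chromosome rather than by random oligo keeps overlapping or neighbouring fragments (such as the tiled GATA1 fragments in panel d) from landing on both sides of the split, which would otherwise inflate the held-out correlation. In a per-cell-type setting, the same `pearson_r` call would simply be repeated on each cell type's measured and predicted columns.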