Analysis of mRNA expression variation among DFT1 and DFT2 cell lines and primary tissues. a, b Gene expression was analysed in duplicate samples of DFT1 and DFT2 cell lines by RNA sequencing. a Continuous bar plot of the logFC of 12,632 genes in DFT2 cell lines relative to DFT1 cell lines. Genes are ranked along the x-axis from lowest to highest logFC. Vertical dashed lines mark a fourfold difference in gene expression between the two tumours. b Boxplots of pairwise Euclidean distances between DFT1 and DFT2 cell lines and biopsies, and between human tumours originating from the same and from different tissue types. The mean, interquartile range and outliers are indicated for each distribution. Statistical significance: n.s. p > 0.05, *p < 0.05, **p < 0.01, ***p < 0.001. c, d Gene expression was analysed in primary DFT1 and DFT2 tumour samples and in testis, brain, spleen, heart and peripheral nerve by RNA sequencing. Two biological replicates were sequenced individually for each tissue type except peripheral nerve (PN), for which the two replicates had to be pooled before sequencing to generate sufficient template for analysis. c Hierarchical clustering of samples (columns) was performed on the union of differentially expressed genes (logFC > 2.0, p < 0.05); approximately unbiased p values (AU; red) and bootstrap probabilities (BP; green) were estimated from 1000 bootstrap iterations using pvclust. Genes (rows) were clustered based on Pearson’s correlation coefficient. Heat map colour represents mean gene expression standardised across tissues (z score). d Mean pairwise distance between samples, calculated from log-transformed, RPKM-normalised mRNA read counts based on Pearson’s correlation coefficient. Each value represents the average of the two biological replicates.
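Panel a ranks genes by logFC and marks a fourfold threshold. The sketch below illustrates how such a ranking and cut-off relate, assuming logFC is the base-2 logarithm of mean DFT2 over mean DFT1 expression with a pseudocount of 1 (the legend states neither the base nor the pseudocount). On a log2 scale a fourfold difference corresponds to |logFC| = 2. All function names and toy values are hypothetical, not the authors' pipeline.

```python
import math

# Hypothetical helpers. The log base (2) and the pseudocount (1.0) are
# assumptions; the figure legend does not state how logFC was computed.


def log2_fold_change(dft1_reps, dft2_reps, pseudocount=1.0):
    """Return log2 of mean DFT2 expression over mean DFT1 expression for one gene."""
    mean1 = sum(dft1_reps) / len(dft1_reps)
    mean2 = sum(dft2_reps) / len(dft2_reps)
    return math.log2((mean2 + pseudocount) / (mean1 + pseudocount))


def rank_by_logfc(expression):
    """Order genes from lowest to highest logFC, as along the x-axis of panel a.

    `expression` maps gene name -> (DFT1 replicate values, DFT2 replicate values).
    """
    logfc = {gene: log2_fold_change(d1, d2) for gene, (d1, d2) in expression.items()}
    return sorted(logfc.items(), key=lambda item: item[1])


def beyond_fourfold(ranked):
    """Return genes, in rank order, whose expression differs at least fourfold."""
    threshold = math.log2(4)  # a fourfold change is |log2FC| = 2
    return [gene for gene, fc in ranked if abs(fc) >= threshold]


# Toy duplicate measurements (invented values, not data from this study).
toy = {
    "geneA": ([10.0, 12.0], [99.0, 101.0]),  # higher in DFT2
    "geneB": ([50.0, 50.0], [55.0, 45.0]),   # similar in both
    "geneC": ([80.0, 82.0], [9.0, 11.0]),    # lower in DFT2
}
ranked = rank_by_logfc(toy)
print([gene for gene, _ in ranked])  # ['geneC', 'geneB', 'geneA']
print(beyond_fourfold(ranked))       # ['geneC', 'geneA']
```

Averaging replicates before taking the ratio keeps one logFC per gene, which is what a single ranked bar plot needs.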
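Each boxplot in panel b summarises a set of pairwise Euclidean distances between sample expression profiles. A minimal sketch, assuming every sample is a vector of expression values over the same ordered genes (the legend does not say which transformation, if any, was applied first). Distances between two groups use every cross-group pairing; distances within a group use every unordered pair. Names and values are hypothetical.

```python
import math
from itertools import combinations, product

# Hypothetical helpers. Assumes each sample is a vector of expression values
# over the same ordered set of genes; the transformation applied before
# computing distances is not stated in the legend.


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same genes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def between_group_distances(group1, group2):
    """Distances for every pairing of a sample in group1 with one in group2."""
    return [euclidean(a, b) for a, b in product(group1, group2)]


def within_group_distances(group):
    """Distances for every unordered pair of samples inside one group."""
    return [euclidean(a, b) for a, b in combinations(group, 2)]


# Toy profiles over three genes (invented values).
dft1_lines = [[1.0, 2.0, 3.0], [1.0, 2.0, 4.0]]
dft2_lines = [[4.0, 6.0, 3.0]]
print(between_group_distances(dft1_lines, dft2_lines))
print(within_group_distances(dft1_lines))  # [1.0]
```

Each returned list would correspond to one distribution drawn as a box.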
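The heat map in panel c shows mean gene expression standardised across tissues as a z score. The sketch below assumes replicates are averaged per tissue first and each gene is then centred on its across-tissue mean and scaled by the sample standard deviation (n - 1 denominator). These details are assumptions; the legend states only that mean expression was standardised across tissues. The helper name and values are hypothetical.

```python
import statistics

# Hypothetical helper. Assumes replicate values are first averaged per tissue
# and then standardised per gene across tissues with the sample standard
# deviation (n - 1 denominator). The legend states only that mean expression
# was standardised across tissues.


def gene_z_scores(replicates_by_tissue):
    """Map each tissue to the z score of its mean expression for one gene."""
    means = {t: statistics.mean(vals) for t, vals in replicates_by_tissue.items()}
    values = list(means.values())
    centre = statistics.mean(values)
    spread = statistics.stdev(values)  # needs at least two tissues
    if spread == 0:
        # Identical expression in every tissue: no defined z score, so return 0.
        return {t: 0.0 for t in means}
    return {t: (m - centre) / spread for t, m in means.items()}


# Toy values for one gene (invented). PN has a single pooled measurement,
# mirroring the pooled peripheral nerve sample described in the legend.
gene = {"DFT1": [1.0, 3.0], "brain": [4.0, 4.0], "PN": [6.0]}
print(gene_z_scores(gene))  # {'DFT1': -1.0, 'brain': 0.0, 'PN': 1.0}
```

Standardising per gene puts genes with very different absolute expression on a common colour scale, so the heat map highlights relative differences between tissues.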
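Panel d derives sample distances from log-transformed RPKM values via Pearson's correlation. A minimal sketch under several labelled assumptions: library size is taken as the sum of counts over the listed genes, the transform is log2(RPKM + 1), distance is 1 - r, and "average of the two biological replicates" is read as the mean over all replicate pairings between two tissues. None of these choices is confirmed by the legend, and every name here is hypothetical.

```python
import math
from itertools import product

# Hypothetical helpers. Assumptions not stated in the legend: the library size
# is taken as the sum of counts over the listed genes, the log transform is
# log2(RPKM + 1), and distance is 1 - Pearson's r.


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads, per gene."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def log_rpkm(counts, lengths_bp, pseudocount=1.0):
    """log2-transform RPKM values; the pseudocount keeps zero counts finite."""
    return [math.log2(v + pseudocount) for v in rpkm(counts, lengths_bp)]


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """1 - r: 0 for perfectly correlated profiles, 2 for perfectly anticorrelated."""
    return 1.0 - pearson(x, y)


def mean_tissue_distance(reps1, reps2):
    """Average correlation distance over all replicate pairings of two tissues."""
    dists = [correlation_distance(a, b) for a, b in product(reps1, reps2)]
    return sum(dists) / len(dists)


# Toy counts for four genes (invented values).
lengths = [1000, 2000, 1500, 500]
tumour = [log_rpkm([100, 400, 50, 20], lengths), log_rpkm([120, 380, 60, 25], lengths)]
nerve = [log_rpkm([10, 50, 300, 200], lengths)]  # pooled, so one replicate
print(round(mean_tissue_distance(tumour, nerve), 3))
```

A correlation-based distance ignores overall scale and responds to the shape of the expression profile, which is why it suits comparisons across tissues with different sequencing depths.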