a, Comparison of the numbers of genes and isoforms found in the microglia long-read samples (isoMiGA), in GENCODE, and in an augmented reference combining GENCODE-annotated isoforms with the novel isoforms found in isoMiGA (GENCODE+Novel). b, Numbers of splicing events identified in the three references by SUPPA. c, Distribution and inferred properties of each class of novel isoform, grouped by coding status. The number of each isoform type is plotted, with the percentage of isoforms predicted to be protein-coding in parentheses. NMD: nonsense-mediated decay. d, Correlations between long-read and short-read expression of genes (upper panels) and isoforms (lower panels), split by whether the isoform is annotated and seen in the long-read data (annotated), found only in GENCODE (GENCODE-only) or found only in the long-read reference (novel). Expression is summarized as median TPM in the long-read samples (n=30) and in the largest short-read microglia cohort (n=185). n denotes the number of genes or isoforms with median TPM > 0.1 in both sequencing modalities; r is the Pearson correlation coefficient. e, Shotgun proteomics-derived peptides support a novel downstream translation start site in exon 6 of HNRNPK. Numbers indicate how many mass spectrometry samples each peptide was detected in. f, The TREM2/TREML1 locus contains multiple novel isoforms, including fusion isoforms connecting the two genes, most of which are predicted not to be translated (NMD-sensitive). The structure of each isoform is shown, with wider boxes denoting the predicted coding sequence and narrower boxes denoting non-coding sequence. All isoforms are transcribed in the negative direction, as indicated by the arrows. Introns are shortened to better display exon structure. The locations of the RT-PCR primers used to confirm the gene fusion are shown at the bottom of the plot. g, Expression of each isoform in short-read RNA-seq of microglia (n=185).
Boxplots show the first quartile, the median and the third quartile of the values, with whiskers extending to 1.5 times the interquartile range. Overlaid violins show the range and distribution of the values.