Red lines indicate the abundance of TCRα and TCRβ sequences, without (top) and with (bottom) cleaning of overlap with memory. Blue dashed lines show the model predictions for the abundance of α and β chains (error bars indicate the standard deviation over 10 simulations; note that the predictions differ between volunteers because of different sample sizes). Sequences represented by 10 or more UMIs are grouped ('10+'). Because multiple RNAs can be sampled from the same cell (Section 'Subsampling to exclude inflated abundance through multiple RNA contributions by single cells'), the model underestimates the number of doublets, especially for β chains. Apart from this, the model fits the ‘uncleaned’ data well in most cases, whereas the cleaned data under-represents abundant clones, suggesting that several clones shared between memory and naive represent genuine abundant naive clones. The only exception is CD4+ TCRβ, where we observe far fewer large clones than the model predicts.