2020 Mar 18;9:e49900. doi: 10.7554/eLife.49900

Figure 4. Predictions of the neutral, power-law and two-population models compared with HTS data.

(A) Number of TCRα and TCRβ sequences predicted to occur in 1 (red), 2 (blue) and 3 (green) subsamples as a function of the thymic output rate θ for the neutral model. (B) As A., but as a function of the slope of the power-law distribution. (C) The median generation probability 𝒫(σ) of TCRα and TCRβ sequences predicted by the neutral model. Dashed lines depict the mean of 10 repeated model predictions, the shaded area indicates the standard deviation, and solid lines show the results observed in the HTS data. (D) As C., but as a function of the slope of the power-law distribution. (E) Graphical representation of parameter sweep results for prediction of CD4+ and CD8+ repertoires from αβ clone-size distributions following a mixture model consisting of singleton clones and a small fraction of large clones. The color represents goodness of fit, with dark green indicating better predictions of the number of sequences per incidence in the samples. Empty circles indicate parameter combinations that predict 𝒫(σ) qualitatively correctly, that is, 3 > 2 > 1 for TCRα, and 2 > 1 and 2 > 3 for TCRβ. Filled circles indicate parameter combinations with the smallest distance to the incidence data and a correct 𝒫(σ) prediction.
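The qualitative criterion used for the circles in panel E can be illustrated with a minimal Python sketch. It assumes that "3 > 2 > 1" refers to the ordering of the median 𝒫(σ) of sequences found in 3, 2 and 1 subsamples; the function names and the input layout (a dictionary from incidence to a list of generation probabilities) are hypothetical and are not taken from the authors' code.

```python
from statistics import median


def median_pgen_by_incidence(pgen_by_incidence):
    """Return the median generation probability for each incidence class.

    pgen_by_incidence: dict mapping incidence (1, 2 or 3 subsamples) to a
    list of generation probabilities (hypothetical input layout).
    """
    return {k: median(v) for k, v in pgen_by_incidence.items()}


def ordering_is_correct(medians, chain):
    """Check the qualitative ordering stated in the caption.

    Assumed reading: for TCR-alpha the median rises with incidence
    (3 > 2 > 1); for TCR-beta incidence 2 has the highest median
    (2 > 1 and 2 > 3).
    """
    m1, m2, m3 = medians[1], medians[2], medians[3]
    if chain == "alpha":
        return m3 > m2 > m1
    if chain == "beta":
        return m2 > m1 and m2 > m3
    raise ValueError("chain must be 'alpha' or 'beta'")


# Toy values, not data from the paper.
toy = {
    1: [1e-9, 2e-9, 3e-9],
    2: [1e-8, 2e-8, 3e-8],
    3: [1e-7, 2e-7, 3e-7],
}
toy_medians = median_pgen_by_incidence(toy)
alpha_ok = ordering_is_correct(toy_medians, "alpha")
beta_ok = ordering_is_correct(toy_medians, "beta")
```

With these toy values the medians increase monotonically with incidence, so the α criterion holds and the β criterion does not.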

Figure 4—figure supplement 1. Prediction of power-law model (exponent 2.3) for single sample data.

Red lines indicate the abundance of TCRα and TCRβ sequences, both without (top) and with (bottom) removal of sequences that overlap with memory. Blue dashed lines represent the model prediction for the abundance of α and β chains (error bars indicate the standard deviation over 10 simulations; note that the predictions differ between volunteers because of different sample sizes). Sequences represented by 10 or more UMIs are grouped ('10+'). Because multiple RNAs can be sampled from the same cell (Section 'Subsampling to exclude inflated abundance through multiple RNA contributions by single cells'), the model under-predicts the number of doublets, especially for β chains. Apart from this, the model fits the ‘uncleaned’ data well in most cases, whereas the cleaned data under-represent abundant clones, suggesting that several clones shared between memory and naive samples are genuine abundant naive clones. The only exception is CD4+ TCRβ, where we observe far fewer large clones than the model predicts.
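The grouping of abundant sequences into a '10+' class can be sketched as follows. This is an illustration only: the function name, the `cap` parameter and the input layout (a dictionary from sequence to UMI count) are assumptions, not the authors' pipeline.

```python
from collections import Counter


def abundance_histogram(umi_counts, cap=10):
    """Count sequences per UMI abundance, pooling abundances >= cap.

    umi_counts: dict mapping a sequence identifier to its UMI count
    (hypothetical input layout). Returns a dict from abundance label
    ("1", "2", ..., "10+") to the number of sequences with that abundance.
    """
    hist = Counter()
    for n_umis in umi_counts.values():
        if n_umis < 1:
            raise ValueError("UMI counts must be positive")
        label = f"{cap}+" if n_umis >= cap else str(n_umis)
        hist[label] += 1
    return dict(hist)


# Toy counts, not data from the paper.
toy_counts = {"seq_a": 1, "seq_b": 1, "seq_c": 2, "seq_d": 10, "seq_e": 37}
toy_hist = abundance_histogram(toy_counts)
```

Here the two sequences with 10 and 37 UMIs fall into the same '10+' class, while singletons and doublets keep their own classes.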
Figure 4—figure supplement 2. Similar to Figure 4, but for HTS data from which TCRα and TCRβ sequences were removed that also occurred in the corresponding memory samples.

(A) Number of α and β chains predicted to occur in 1 (red), 2 (blue) and 3 (green) subsamples as a function of the thymic output rate θ for the neutral model. (B) As A., but as a function of the slope of the power-law distribution. (C) The median generation probability 𝒫(σ) of TCRα and TCRβ sequences predicted by the neutral model. Dashed lines depict the mean of 10 repeated model predictions, the shaded area indicates the standard deviation, and solid lines show the results observed in the HTS data. (D) As C., but as a function of the slope of the power-law distribution. (E) Graphical representation of parameter sweep results for prediction of CD4+ and CD8+ repertoires from αβ clone-size distributions consisting of singleton clones and a small fraction of large clones. The color represents goodness of fit, with dark green indicating better predictions of the number of sequences per incidence in the samples. Symbols indicate parameter combinations that predict 𝒫(σ) qualitatively correctly, that is, 3 > 2 > 1 for TCRα, and 2 > 1 and 2 > 3 for TCRβ. Filled circles indicate parameter combinations with the smallest distance to the incidence data and a correct 𝒫(σ) prediction.
Figure 4—figure supplement 3. Similar to Figure 4, but for HTS data processed with RTCR.

(A) Number of TCRα and TCRβ sequences predicted to occur in 1 (red), 2 (blue) and 3 (green) subsamples as a function of the thymic output rate θ for the neutral model. (B) As A., but as a function of the slope of the power-law distribution. (C) The median generation probability 𝒫(σ) of TCRα and TCRβ sequences predicted by the neutral model. Dashed lines depict the mean of 10 repeated model predictions, the shaded area indicates the standard deviation, and solid lines show the results observed in the HTS data. (D) As C., but as a function of the slope of the power-law distribution. (E) Graphical representation of parameter sweep results for prediction of CD4+ and CD8+ repertoires from αβ clone-size distributions following a mixture model consisting of singleton clones and a small fraction of large clones. The color represents goodness of fit, with dark green indicating better predictions of the number of sequences per incidence in the samples. Empty circles indicate parameter combinations that predict 𝒫(σ) qualitatively correctly, that is, 3 > 2 > 1 for TCRα, and 2 > 1 and 2 > 3 for TCRβ. Filled circles indicate parameter combinations with the smallest distance to the incidence data and a correct 𝒫(σ) prediction.