{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"toc": true
},
"source": [
"
Table of Contents
\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Calculation of multiplet frequency from cell-type mixing in `R`\n",
"Here we implement the simple function to calculate the multiplet frequency from single-cell RNA sequencing experiments where we have mixed cells of two types (e.g., human and mouse), and know the number of observed droplets that contain cells of each type."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Function to compute multiplet frequency\n",
"The multiplet frequency is computed in terms of the following three experimental observables:\n",
" - the number of droplets that contain at least one cell of type 1, which we denote as $N_1$\n",
" - the number of droplets that contain at least one of type 2, which we denote as $N_2$\n",
" - the number of droplets containing cells of both type 1 and type 2, which we denote as $N_{1,2}$\n",
"\n",
"The multiplet frequency $M$ is\n",
"$$M = 1 - \\frac{\\left(\\mu_1 + \\mu_2\\right)e^{-\\mu_1 - \\mu_2}}{1 - e^{-\\mu_1 - \\mu_2}}$$\n",
"where\n",
"$$\\mu_1 = -\\ln\\left(\\frac{N - N_1}{N}\\right)$$\n",
"and\n",
"$$\\mu_2 = -\\ln\\left(\\frac{N - N_2}{N}\\right),$$\n",
"and where\n",
"$$N = \\frac{N_1 N_2}{N_{12}}.$$\n",
"\n",
"## `R` function\n",
"Here is the calculation implemented as `R` function:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#' Multiplet frequency from cell-type mixing experiment.\n",
"#'\n",
"#' @param n1 Number of droplets with at least one cell of type 1\n",
"#' @param n2 Number of droplets with at least one cell of type 2\n",
"#' @param n12 Number of droplets with cells of both types\n",
"#' @return The estimated multiplet frequency\n",
"multiplet_freq <- function(n1, n2, n12) {\n",
" n <- n1 * n2 / n12\n",
" mu1 <- -log((n - n1) / n)\n",
" mu2 <- -log((n - n2) / n)\n",
" mu <- mu1 + mu2\n",
" return (1 - mu * exp(-mu) / (1 - exp(-mu)))\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example when cell types mixed in equal proportion\n",
"First we demonstrate the calculations in the case when the cell types are mixed in equal proportions.\n",
"\n",
"Let's create some hypothetical data.\n",
"We'll imagine that these data come from the [10X cellranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) software analysis of a multi-species experiment that mixed mouse and human cells equally.\n",
"The current version (2.1.1) of the [10X cellranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) pipeline for this type of study would return the *human Estimated Number of Cell Partitions*, the *mouse Estimated Number of Cell Partitions*, and the number of *GEMs with > 0 Cells* (this last quantity is also reported as the *Estimated Number of Cells*).\n",
"These statistics give the numbers of non-empty GEMs with cells of each type (GEMs is the term that 10X uses to refer to their droplets).\n",
"\n",
"Here is a data frame of some hypothetical data from three different experiments, each with 4000 cells total but with different numbers of cross-celltype droplets:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" | human_droplets | mouse_droplets | nonempty_droplets |
\n",
"\n",
"\texperiment 1 | 2005 | 2005 | 4000 |
\n",
"\texperiment 2 | 2050 | 2050 | 4000 |
\n",
"\texperiment 3 | 2500 | 2500 | 4000 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" & human\\_droplets & mouse\\_droplets & nonempty\\_droplets\\\\\n",
"\\hline\n",
"\texperiment 1 & 2005 & 2005 & 4000\\\\\n",
"\texperiment 2 & 2050 & 2050 & 4000\\\\\n",
"\texperiment 3 & 2500 & 2500 & 4000\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| | human_droplets | mouse_droplets | nonempty_droplets | \n",
"|---|---|---|\n",
"| experiment 1 | 2005 | 2005 | 4000 | \n",
"| experiment 2 | 2050 | 2050 | 4000 | \n",
"| experiment 3 | 2500 | 2500 | 4000 | \n",
"\n",
"\n"
],
"text/plain": [
" human_droplets mouse_droplets nonempty_droplets\n",
"experiment 1 2005 2005 4000 \n",
"experiment 2 2050 2050 4000 \n",
"experiment 3 2500 2500 4000 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df_equal <- data.frame(human_droplets=c(2005, 2050, 2500), \n",
" mouse_droplets=c(2005, 2050, 2500), \n",
" nonempty_droplets=c(4000, 4000, 4000), \n",
" row.names=c(\"experiment 1\", \"experiment 2\", \"experiment 3\"))\n",
"\n",
"df_equal"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We calculate the number of droplets with cells of **both** types (human and mouse) simply as the sum of the human and mouse droplets minus the total number of non-empty droplets, since these cross-celltype droplets are double counted in the tally of human and mouse droplets.\n",
"As is apparent after this calculation, in this hypothetical example the cross-celltype droplets represent 0.25%, 2.5%, and 25% of the total non-empty droplets in the three examples.\n",
"\n",
"Note that we need to import `dplyr` to run the next cell:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"Attaching package: ‘dplyr’\n",
"\n",
"The following objects are masked from ‘package:stats’:\n",
"\n",
" filter, lag\n",
"\n",
"The following objects are masked from ‘package:base’:\n",
"\n",
" intersect, setdiff, setequal, union\n",
"\n"
]
},
{
"data": {
"text/html": [
"\n",
"human_droplets | mouse_droplets | nonempty_droplets | human_and_mouse_droplets |
\n",
"\n",
"\t2005 | 2005 | 4000 | 10 |
\n",
"\t2050 | 2050 | 4000 | 100 |
\n",
"\t2500 | 2500 | 4000 | 1000 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|llll}\n",
" human\\_droplets & mouse\\_droplets & nonempty\\_droplets & human\\_and\\_mouse\\_droplets\\\\\n",
"\\hline\n",
"\t 2005 & 2005 & 4000 & 10\\\\\n",
"\t 2050 & 2050 & 4000 & 100\\\\\n",
"\t 2500 & 2500 & 4000 & 1000\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"human_droplets | mouse_droplets | nonempty_droplets | human_and_mouse_droplets | \n",
"|---|---|---|\n",
"| 2005 | 2005 | 4000 | 10 | \n",
"| 2050 | 2050 | 4000 | 100 | \n",
"| 2500 | 2500 | 4000 | 1000 | \n",
"\n",
"\n"
],
"text/plain": [
" human_droplets mouse_droplets nonempty_droplets human_and_mouse_droplets\n",
"1 2005 2005 4000 10 \n",
"2 2050 2050 4000 100 \n",
"3 2500 2500 4000 1000 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"library(dplyr)\n",
"\n",
"df_equal <- df_equal %>%\n",
" mutate(human_and_mouse_droplets=\n",
" human_droplets + mouse_droplets - nonempty_droplets)\n",
"\n",
"df_equal"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now calculate the multiplet frequency in two ways.\n",
"The first way is to precisely calculate the multiplet frequency using the exact Poisson derivation as implemented in the `multiplet_freq` function above.\n",
"\n",
"The second way is to do the simple calculation that has commonly been used in paper that do equal-proportion mixing.\n",
"This method is to simply estimate the multiplet frequency as twice the frequency of cross-cell-type droplets among all non-empty droplets (that is, as $\\frac{2 \\times N_{1,2}}{N_1 + N_2 + N_{12}}$)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"human_droplets | mouse_droplets | nonempty_droplets | human_and_mouse_droplets | multiplet_freq | twice_cross_celltype_freq |
\n",
"\n",
"\t2005 | 2005 | 4000 | 10 | 0.005 | 0.005 |
\n",
"\t2050 | 2050 | 4000 | 100 | 0.049 | 0.050 |
\n",
"\t2500 | 2500 | 4000 | 1000 | 0.425 | 0.500 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|llllll}\n",
" human\\_droplets & mouse\\_droplets & nonempty\\_droplets & human\\_and\\_mouse\\_droplets & multiplet\\_freq & twice\\_cross\\_celltype\\_freq\\\\\n",
"\\hline\n",
"\t 2005 & 2005 & 4000 & 10 & 0.005 & 0.005\\\\\n",
"\t 2050 & 2050 & 4000 & 100 & 0.049 & 0.050\\\\\n",
"\t 2500 & 2500 & 4000 & 1000 & 0.425 & 0.500\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"human_droplets | mouse_droplets | nonempty_droplets | human_and_mouse_droplets | multiplet_freq | twice_cross_celltype_freq | \n",
"|---|---|---|\n",
"| 2005 | 2005 | 4000 | 10 | 0.005 | 0.005 | \n",
"| 2050 | 2050 | 4000 | 100 | 0.049 | 0.050 | \n",
"| 2500 | 2500 | 4000 | 1000 | 0.425 | 0.500 | \n",
"\n",
"\n"
],
"text/plain": [
" human_droplets mouse_droplets nonempty_droplets human_and_mouse_droplets\n",
"1 2005 2005 4000 10 \n",
"2 2050 2050 4000 100 \n",
"3 2500 2500 4000 1000 \n",
" multiplet_freq twice_cross_celltype_freq\n",
"1 0.005 0.005 \n",
"2 0.049 0.050 \n",
"3 0.425 0.500 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df_equal <- df_equal %>%\n",
" mutate(\n",
" multiplet_freq=multiplet_freq(\n",
" human_droplets, mouse_droplets, human_and_mouse_droplets),\n",
" twice_cross_celltype_freq=\n",
" 2 * human_and_mouse_droplets / nonempty_droplets\n",
" )\n",
"\n",
"df_equal %>% format(digits=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As can be seen above, the two methods give virtually identical results as long as the number of cross-celltype droplets is small relative to the total number of droplets.\n",
"The reason that the two methods aren't identical as simply calculating the multiplet frequency as twice the cross-celltype frequency neglects to account for droplets that have more than two cells.\n",
"However, this difference only becomes appreciable when the multiplet frequency is high.\n",
"So for the examples above, we can see that it only really matters in the case when the true multiplet frequency is $\\approx$0.425; in that case, simply taking twice the cross-celltype frequency slightly overestimates the true multiplet frequency"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example when cell types are mixed unequally\n",
"Now we repeat the example above, but for experiments where the cell types are mixed unequally.\n",
"\n",
"Below we give some example calculations for various numbers. \n",
"An interesting (and initially non-intuitve aspect) of the results is that when the cells are mixed highly unequally and multiplets are common, the multiplet frequency is actually substantially **less** than the fraction of droplets with the rare cell type that are multiplets. \n",
"The reason is that multiplets are more likely than singlets to have a cell of the rarer type, and become progressively more likely to have a cell of the rare type as the number of cells in the multiplet increases."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"human_droplets | mouse_droplets | nonempty_droplets | human_and_mouse_droplets | multiplet_freq |
\n",
"\n",
"\t2050 | 2050 | 4000 | 100 | 0.049 |
\n",
"\t3050 | 1050 | 4000 | 100 | 0.065 |
\n",
"\t3550 | 550 | 4000 | 100 | 0.110 |
\n",
"\t3850 | 250 | 4000 | 100 | 0.245 |
\n",
"\t3950 | 150 | 4000 | 100 | 0.459 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" human\\_droplets & mouse\\_droplets & nonempty\\_droplets & human\\_and\\_mouse\\_droplets & multiplet\\_freq\\\\\n",
"\\hline\n",
"\t 2050 & 2050 & 4000 & 100 & 0.049\\\\\n",
"\t 3050 & 1050 & 4000 & 100 & 0.065\\\\\n",
"\t 3550 & 550 & 4000 & 100 & 0.110\\\\\n",
"\t 3850 & 250 & 4000 & 100 & 0.245\\\\\n",
"\t 3950 & 150 & 4000 & 100 & 0.459\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"human_droplets | mouse_droplets | nonempty_droplets | human_and_mouse_droplets | multiplet_freq | \n",
"|---|---|---|---|---|\n",
"| 2050 | 2050 | 4000 | 100 | 0.049 | \n",
"| 3050 | 1050 | 4000 | 100 | 0.065 | \n",
"| 3550 | 550 | 4000 | 100 | 0.110 | \n",
"| 3850 | 250 | 4000 | 100 | 0.245 | \n",
"| 3950 | 150 | 4000 | 100 | 0.459 | \n",
"\n",
"\n"
],
"text/plain": [
" human_droplets mouse_droplets nonempty_droplets human_and_mouse_droplets\n",
"1 2050 2050 4000 100 \n",
"2 3050 1050 4000 100 \n",
"3 3550 550 4000 100 \n",
"4 3850 250 4000 100 \n",
"5 3950 150 4000 100 \n",
" multiplet_freq\n",
"1 0.049 \n",
"2 0.065 \n",
"3 0.110 \n",
"4 0.245 \n",
"5 0.459 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data.frame(\n",
" human_droplets=c(2050, 3050, 3550, 3850, 3950),\n",
" mouse_droplets=c(2050, 1050, 550, 250, 150),\n",
" nonempty_droplets=c(4000, 4000, 4000, 4000, 4000)\n",
" ) %>%\n",
" mutate(\n",
" human_and_mouse_droplets=\n",
" human_droplets + mouse_droplets - nonempty_droplets,\n",
" multiplet_freq=multiplet_freq(\n",
" human_droplets, mouse_droplets, human_and_mouse_droplets)\n",
" ) %>%\n",
" format(digits=2)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.4.3"
},
"toc": {
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"toc_cell": true,
"toc_position": {
"height": "628.5px",
"left": "0px",
"right": "869px",
"top": "110.5px",
"width": "212px"
},
"toc_section_display": "none",
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 2
}