Author manuscript; available in PMC: 2022 Oct 1.
Published in final edited form as: Electrophoresis. 2021 Sep 6;42(20):2070–2080. doi: 10.1002/elps.202100144

Segmentation-based analysis of single-cell immunoblots

Anjali Gopal 1,2, Amy E Herr 1,2,3
PMCID: PMC8526408  NIHMSID: NIHMS1744505  PMID: 34357587

Abstract

From genomics to transcriptomics to proteomics, microfluidic tools underpin recent advances in single-cell biology. Detection of specific proteoforms—with single-cell resolution—presents challenges in detection specificity and sensitivity. Miniaturization of protein immunoblots to single-cell resolution mitigates these challenges. For example, in microfluidic western blotting, protein targets are separated by electrophoresis and subsequently detected using fluorescently labeled antibody probes. To quantify the expression level of each protein target, the fluorescent protein bands are fit to Gaussians; yet, this method is difficult to use with noisy, low-abundance, or low-SNR protein bands, and with significant band skew or dispersion. In this study, we investigate segmentation-based approaches to robustly quantify protein bands from single-cell protein immunoblots. As compared to a Gaussian fitting pipeline, the segmentation pipeline detects >1.5× more protein bands for downstream quantification as well as more of the low-abundance protein bands (i.e., with SNR ~3). Utilizing deep learning-based segmentation approaches increases the recovery of low-SNR protein bands by an additional 50%. However, we find that segmentation-based approaches are less robust at quantifying poorly resolved protein bands (separation resolution, Rs < 0.6). With burgeoning needs for more single-cell protein analysis tools, we see microfluidic separations as benefitting substantially from segmentation-based analysis approaches.

Keywords: Immunoassay, Proteoform, Single cell, Western blotting

1. Introduction

Single-cell protein analysis is critically important for understanding cellular heterogeneity in a range of processes including cell development, differentiation, and cancer progression [1–3]. More specifically, detecting and quantifying protein isoform expression at the single-cell level greatly increases our ability to interrogate the human proteome [4–6]. While estimates suggest that there are upwards of 19,000 human coding genes, each associated protein is estimated to have up to 100 unique proteoforms that can be generated by processes such as alternative splicing, post-translational modifications, or single-nucleotide polymorphisms [7,8]. As a result, single-cell proteoform measurements are key to unlocking critical unknowns in the human protein landscape.

Miniaturization of analytical tools has provided new avenues for interrogating a range of key biological questions. Separation-based microscale assays enable new insights into fundamental biological phenomena including analysis of DNA damage [9,10], enzyme upregulation [11], metabolites [12–14], and binding interactions [15,16]. For proteoform analysis, microscale assays, such as single-cell immunoblots (SCI), analyze dozens of protein targets (including isoforms) in arrays of tens to hundreds of single-cell separations [5,17,18]. In SCI, a single-cell suspension (e.g., from a dissociated tumor biopsy) is seeded onto a 40–50 μm thick polyacrylamide (PA) gel (Fig. 1A). To isolate each single cell, the PA gel itself is stippled with 30–40 μm diameter microwells, into which single cells are settled via gravitational sedimentation. A dual lysis/electrophoresis buffer facilitates in situ cell lysis, followed by ultra-rapid SDS-PAGE within each single-cell lysate. After separation, the resultant protein bands are immobilized within the hydrogel matrix via a UV-activatable chemical crosslinker (benzophenone), and subsequent protein detection occurs by application of primary, followed by fluorescently labeled secondary, antibodies (immunoblotting). The entire SCI gel array is imaged via a laser microarray scanner. By dividing the array into individual separation lanes, protein and isoform expression profiles are surveyed within each single cell [17].

Figure 1.

Single-cell immunoblotting and downstream quantification interrogates isoform expression levels in single cells. (A) Single-cell suspensions (e.g., consisting of dissociated cells from a tumor biopsy) are seeded onto a thin (45 μm height) PA gel fabricated on a microscope slide. The gel is stippled with microwells (between 30 and 40 μm in diameter), which are typically spaced 400 × 1000 μm apart. Single cells settle into microwells via gravitational sedimentation. The entire microfluidic chip is submerged in a bath of dual lysis/electrophoresis buffer, resulting in in situ cell lysis, followed by SDS-PAGE. A UV-activated crosslinker chemistry (benzophenone) allows covalent immobilization of proteins to the hydrogel. To visualize protein bands, hydrogels are serially incubated with primary and fluorescently labeled secondary antibodies, washed, dried under an N2 stream, and scanned with a laser microarray scanner. The resulting array scan can be separated into individual separation lanes consisting of protein bands from single cells. (B) One-dimensional Gaussian fitting is performed by first averaging the fluorescence signal across the lateral axis of an individual separation lane. The resulting trace (black) can be fit to multiple Gaussian distributions (red), each corresponding to a protein band of interest. Gaussian fitting enables extraction of key characteristics of the electrophoretically separated protein bands including peak center (migration distance, which is correlated with protein molecular mass in denaturing PAGE), peak width, and protein quantity (area under the curve). (C) Segmentation involves thresholding and binarizing the separation lane pixels into background and foreground classes via Otsu’s method. 
The resulting image is subject to morphological image processing (morphological open, close, and dilation steps with disk-shaped elements of user-specified sizes), followed by application of the Watershed transform to separate the foreground into distinct protein bands. Use of the Watershed transform also allows for identification of other segmented objects in the separation lane, which often occur due to nonspecific binding of antibodies to other protein targets. With segmentation, the protein quantity (spot volume [SV]) is calculated by directly summing the intensities of the pixels corresponding to each individual protein band. The immunoblots shown in A-C are a representative example, and are also reproduced in Fig. 4.

Substantial progress has been made in expanding the types of single-cell proteoforms detected [5,18]. However, to ensure robust interrogation of key biological characteristics, including the underlying isoform expression level distribution, careful quantification of SCI is needed. Currently, 1D Gaussian fitting via nonlinear least squares regression is the standard approach for SCI quantification (Fig. 1B) [19,20]. Within a given separation lane, protein bands in SCI take on a 2D Gaussian distribution, due to diffusion during PAGE [17]. Similar to CZE [21,22], collapsing the immunoblot profile into a 1D electropherogram and fitting the resulting separation profiles to a Gaussian distribution extracts key analytical parameters, including location of the protein band center (mean of the Gaussian distribution), total protein quantity (area-under-the-curve [AUC]), and separation resolution (RS), or the degree of separation between two protein bands [19,20].

However, Gaussian fitting also has several disadvantages. Specifically, because a 2D separation lane is compressed into a 1D electropherogram, noise in the vicinity of the protein band can affect quantification accuracy [20]. Furthermore, quantification of protein bands with non-Gaussian profiles, including bands with significant skew or kurtosis, may also suffer in conventional Gaussian fitting pipelines unless several additional fit parameters are tuned.

By contrast, segmentation is the state-of-the-art approach in identifying protein expression levels in single-cell immunofluorescence measurements [23–25], in bulk 2D electrophoresis [26–29], and in other single-cell separations such as comet assays [30]. Segmentation-based approaches can increase the accuracy of protein band quantification by utilizing the spatial information present in 2D separation profiles. However, to our knowledge, segmentation-based methods have not yet been applied to SCI.

In this study, we investigate the use of segmentation-based approaches to robustly quantify SCI (Fig. 1C). We first develop and benchmark the quantification accuracy of a segmentation pipeline against the Gaussian fitting pipeline for SCI [20]. Next, we explore the ability of segmentation-based methods to identify multiple protein isoforms from the same cell. Finally, we investigate automation of the segmentation process via deep learning. We use these results to uncover the principles of segmentation that may make it a more, or less, suitable method than Gaussian fitting for analyzing SCI.

2. Theory and data analysis

In this section, we introduce the mathematical principles underpinning analysis of SCI readouts for (1) protein target quantification, by comparing 1D Gaussian fitting and segmentation and (2) separation resolution of the protein targets of interest.

2.1. Gaussian fitting pipeline

To quantify protein bands identified by SCI via Gaussian fitting, we used a previously published analysis pipeline [20]. Briefly, individual separation lanes from a scanned SCI array undergo background subtraction by subtracting the average of 10 pixels from gutter regions (5 pixels from the leftmost and rightmost edge of the separation lane; Supporting Information Fig. S1A). Separation lanes are then transformed into a 1D electropherogram by averaging across the lateral axis of the separation lane (Supporting Information Fig. S1B). Finally, Gaussian fitting is performed on each individual trace via nonlinear least squares regression. A series of quality control steps are used to ensure that identified protein bands are suitable for downstream analysis (see Supporting Information Note S1). Critically, separation lanes with protein bands whose estimated SNR < 3 (as defined by the Gaussian fit) are discarded. Although an SNR ≥ 3 quality control metric is essential for separating true signal from background noise, we discuss the pitfalls of calculating the SNR of a 2D protein band from a 1D electropherogram in Section 2.4.
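The steps of this pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published pipeline [20]: the lane is a synthetic array, the function names (`gaussian`, `fit_lane`), the single-band model, the initial guess, and all numeric values are hypothetical, and SciPy's `curve_fit` stands in for the nonlinear least squares step.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, center, sigma):
    """Single 1D Gaussian peak."""
    return amplitude * np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def fit_lane(lane, gutter=5):
    """Fit one Gaussian to a lane (rows = separation axis, columns = lateral axis)."""
    # Gutter regions: the outermost `gutter` columns on each side of the lane.
    gutters = np.concatenate([lane[:, :gutter], lane[:, -gutter:]], axis=1)
    corrected = lane - gutters.mean()          # background subtraction
    trace = corrected.mean(axis=1)             # collapse to a 1D electropherogram
    x = np.arange(trace.size)
    # Rough initial guess (an assumption of this sketch).
    p0 = [trace.max(), float(np.argmax(trace)), 5.0]
    (amplitude, center, sigma), _ = curve_fit(gaussian, x, trace, p0=p0)
    snr = amplitude / gutters.std()            # fitted amplitude over gutter noise
    return amplitude, center, abs(sigma), snr

# Synthetic lane with hypothetical values: one 2D Gaussian band on a noisy offset.
rng = np.random.default_rng(0)
rows, cols = 200, 80
yy, xx = np.mgrid[0:rows, 0:cols]
band = 2000.0 * np.exp(-((yy - 90) ** 2) / (2 * 6.0 ** 2)
                       - ((xx - 40) ** 2) / (2 * 8.0 ** 2))
lane = 300.0 + band + rng.normal(0.0, 20.0, size=(rows, cols))

amplitude, center, sigma, snr = fit_lane(lane)
keep_lane = snr >= 3                           # SNR-based quality control
```

Note that the fitted amplitude is well below the 2D peak intensity of the band, because the lateral average includes many background pixels.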

2.2. Segmentation pipeline

As in the Gaussian fitting pipeline, fluorescence micrographs of the individual separation lanes from a scanned SCI array undergo background subtraction, using the average of the 5-pixel gutter regions on either edge of the separation lane. All separation lanes are then overlaid into a single image, and the user is prompted to draw a rectangle encompassing the region containing the approximate centroids of the protein band(s) (see Supporting Information Fig. S2A). Next, each individual separation lane undergoes thresholding and image binarization via Otsu’s method. To reduce the presence of punctate noise, separation lanes are subject to a series of image erosion and dilation steps (morphological opening and closing). Finally, the resulting image undergoes a distance transform, which is then used as the seed for the Watershed transform (Fig. 1C).
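A sketch of the thresholding, morphological cleanup, distance transform, and Watershed steps is given here, assuming scikit-image and SciPy as stand-in libraries (the text does not state which software was used). The function name `segment_lane`, the structuring element radii, the `min_distance` seed spacing, and the synthetic two-band lane are all hypothetical choices for illustration.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.filters import threshold_otsu
from skimage.morphology import disk
from skimage.segmentation import watershed

def segment_lane(lane, open_radius=2, close_radius=2, min_distance=7):
    """Return an integer label image (0 = background) for one lane."""
    mask = lane > threshold_otsu(lane)                        # binarize
    mask = ndi.binary_opening(mask, structure=disk(open_radius))   # drop punctate noise
    mask = ndi.binary_closing(mask, structure=disk(close_radius))  # fill small gaps
    distance = ndi.distance_transform_edt(mask)
    # Local maxima of the distance transform seed the watershed.
    peaks = peak_local_max(distance, min_distance=min_distance,
                           labels=mask.astype(int))
    markers = np.zeros(lane.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    return watershed(-distance, markers, mask=mask)

# Synthetic lane: two partially overlapping disk-shaped "bands" on noise.
rng = np.random.default_rng(0)
lane = rng.normal(0.0, 20.0, size=(140, 80))
yy, xx = np.mgrid[0:140, 0:80]
bands = ((yy - 60) ** 2 + (xx - 40) ** 2 <= 12 ** 2) | \
        ((yy - 80) ** 2 + (xx - 40) ** 2 <= 12 ** 2)
lane[bands] += 1000.0

labels = segment_lane(lane)
```

In this toy example the two touching bands form one connected foreground region after thresholding, and the watershed step is what assigns them distinct labels.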

Otsu’s method suffers in cases where the amount of background pixels is substantially greater than the amount of foreground pixels, and when attempting to threshold noisy images [31]. Noise and uneven background in SCI can result from various sources, including debris on the surface of the hydrogel, spatial variation from nonuniform immunoreagent distribution, and nonspecific interaction of immunoreagents with the hydrogel substrate including benzopinacol groups [32,33]. In the case of hard-to-segment separation lanes with significant noise, we allowed the user to specify threshold values that were fractionally smaller than the initial threshold found by Otsu’s method, to increase segmentation accuracy.
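To make the thresholding step concrete, Otsu's method can be written directly with NumPy, as in the sketch below. The helper name `otsu_threshold`, the bin count, the 0.5 scaling fraction, and the synthetic lane are assumptions for illustration; the scaled threshold mimics the user-specified fractional threshold described above.

```python
import numpy as np

def otsu_threshold(image, nbins=256):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    counts, edges = np.histogram(image.ravel(), bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weight_bg = np.cumsum(counts)                   # pixels in bins 0..i
    weight_fg = np.cumsum(counts[::-1])[::-1]       # pixels in bins i..end
    mean_bg = np.cumsum(counts * centers) / np.maximum(weight_bg, 1)
    mean_fg = (np.cumsum((counts * centers)[::-1])
               / np.maximum(weight_fg[::-1], 1))[::-1]
    # Between-class variance for a split between bin i and bin i + 1.
    variance = weight_bg[:-1] * weight_fg[1:] * (mean_bg[:-1] - mean_fg[1:]) ** 2
    return centers[np.argmax(variance)]

# Synthetic lane (hypothetical values): mostly background, one small bright band.
rng = np.random.default_rng(1)
lane = rng.normal(0.0, 20.0, size=(200, 80))
lane[80:100, 25:55] += 1000.0

threshold = otsu_threshold(lane)
mask = lane > threshold                 # default binarization
loose_mask = lane > 0.5 * threshold     # fractionally smaller, user-chosen threshold
```

Lowering the threshold can only add pixels to the foreground, so it trades a higher chance of recovering faint bands against more noise reaching the morphological cleanup.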

Once the segmentation map is produced via the Watershed transform, additional quality control steps filter out separation lanes with low-quality protein bands, or poorly segmented protein bands. First, all segmented regions outside of the user-defined rectangular boundaries are discarded. Next, separation lanes where the segmented regions-of-interest exceed a user-specified maximum circularity (default = 1.5) or fall below a user-specified minimum area (default = 125) are discarded. Of the resulting separation lanes, any lanes where the SNR of an identified protein band is <3 are discarded. Finally, a manual quality control step is performed, where users are tasked with discarding separation lanes where (i) segmentation was performed on noise or artifacts, or (ii) the presence of artifacts within the protein band would obscure downstream quantification (see Supporting Information Fig. S2B).

2.3. Calculating spot volume

To assess how well either pipeline evaluates key protein band metrics, we first consider the total protein quantity that is present within a protein band. This parameter is termed the “spot volume” (SV) in the separation science literature [28]. In immunoblotting, SV is a proxy for protein target expression, and therefore, has key implications for assessing biological variability across single cells [19].

With segmentation, SV is the sum of pixel intensities for the pixels identified as belonging to a protein band (Supporting Information Fig. S1C). Thus, assuming that there are n pixels in a protein band, each having Ii fluorescence intensity after background subtraction, SVSegmentation can be expressed as:

$$\mathrm{SV}_{\mathrm{Segmentation}} = \sum_{i=1}^{n} I_i \qquad (1)$$

In contrast, the Gaussian fitting pipeline determines SVGaussian by calculating the AUC of the protein band, within ±2σ from the peak center, corresponding to approximately 95% of the Gaussian distribution (Supporting Information Fig. S1B). Equation (2) describes this relationship, where μ corresponds to the peak center, σ corresponds to one standard deviation of the Gaussian distribution, and A corresponds to the maximum amplitude of the Gaussian distribution.

$$\mathrm{SV}_{\mathrm{Gaussian}} = \sum_{x=\mu-2\sigma}^{x=\mu+2\sigma} A\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (2)$$

The averaging operation that collapses each protein band into a 1D electropherogram will create an offset factor between SVSegmentation and SVGaussian. The offset factor is expected to be proportional to the width of the separation lane, w. However, since this averaging is primarily occurring across background-subtracted background pixels, the background pixel values are expected to sum to zero. As a result, we expect strong correlation between SVSegmentation and SVGaussian. Importantly, SV is considered a semi-quantitative metric, which is dependent on antibody probe affinity to the protein band, fluorescence characteristics of the antibody probe label, and the imaging modality [34,35]. As such, the absolute SV values produced by each pipeline are less critical than maintaining the relative SV values between protein bands in a given dataset. Consequently, we posit that SV is a suitable benchmarking metric, allowing direct comparison of performance between the Gaussian fitting and segmentation pipelines.
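The expected offset between the two SV definitions can be illustrated on a noise-free synthetic band. The sketch below assumes an ideal 2D Gaussian band, a simple intensity mask in place of a real segmentation, and known Gaussian parameters in place of a fit; the function names and all numeric values are hypothetical.

```python
import numpy as np

def sv_segmentation(lane, band_mask):
    """Eq. (1): sum of background-subtracted intensities inside the band."""
    return float(lane[band_mask].sum())

def sv_gaussian(amplitude, center, sigma):
    """Eq. (2): sum the 1D Gaussian over integer positions within +/- 2 sigma."""
    x = np.arange(np.ceil(center - 2 * sigma), np.floor(center + 2 * sigma) + 1)
    return float(np.sum(amplitude * np.exp(-((x - center) ** 2) / (2 * sigma ** 2))))

# Noise-free lane, already background-subtracted (hypothetical dimensions).
rows, width = 200, 80
center, sigma_axial, sigma_lateral = 100.0, 6.0, 8.0
yy, xx = np.mgrid[0:rows, 0:width]
lane = 1000.0 * np.exp(-((yy - center) ** 2) / (2 * sigma_axial ** 2)
                       - ((xx - width / 2) ** 2) / (2 * sigma_lateral ** 2))

band_mask = lane > 0.01 * lane.max()     # stand-in for a segmentation mask
trace = lane.mean(axis=1)                # lateral average gives the 1D trace

sv_seg = sv_segmentation(lane, band_mask)
sv_gau = sv_gaussian(trace.max(), center, sigma_axial)
offset = sv_seg / sv_gau                 # expected to be close to the lane width
```

In this idealized case the offset lands near the lane width, and scaling the band intensity scales both SV values equally, so relative SV values are preserved. With real data, the exact offset also depends on how much of the band each method captures.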

2.4. Calculating signal-to-noise ratio

The SNR provides information about whether the measured protein quantity surpasses the lower limit of detection (LLOD) of an assay, and thus, whether the measured protein band is detectable. To calculate SNR according to the widely used definition in analytical chemistry (SNRAC), we use Eq. (3), where μsignal corresponds to the mean signal intensity of the protein band, μbackground corresponds to the mean signal intensity of the background region, and σbackground corresponds to the background noise.

$$\mathrm{SNR}_{\mathrm{AC}} = \frac{\mu_{\mathrm{signal}} - \mu_{\mathrm{background}}}{\sigma_{\mathrm{background}}} \qquad (3)$$

Specifically, the LLOD for an assay is defined as the minimum analyte (protein) concentration that produces a readout signal with SNR ≥ 3. For SCI, if a protein band does not satisfy SNR ≥ 3, the protein band is indistinguishable from noise, and is discarded from downstream analysis.

An alternate definition of SNR, used in the separation science literature (e.g., CE, chromatography), is the ratio between a Gaussian peak maximum (Asignal, where A refers to the maximum amplitude of a Gaussian band) and the root-mean-square noise (NRMS = 2 × σtrace) of a region of the chromatogram or electropherogram that does not contain protein signal (Supporting Information Fig. S1B) [22]. We refer to this formulation as SNR Electrophoresis/Chromatography (SNRE/C), as given by Eq. (4):

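$$\mathrm{SNR}_{\mathrm{E/C}} = \frac{A_{\mathrm{signal}}}{N_{\mathrm{RMS}}} = \frac{A_{\mathrm{signal}}}{2\sigma_{\mathrm{trace}}} \qquad (4)$$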

SNRAC and SNRE/C are not equivalent for a given protein band. Calculating the mean signal intensity of a protein band involves calculating the mean signal of the pixels in the protein band (Supporting Information Fig. S1D; for a representative protein band, μband =432.9), which typically has a smaller mean value than the maximum amplitude of the protein band (Supporting Information Fig. S1B; μ1D Gaussian = 1020.1). In this manner, we can see that within either the 2D or the 1D case, Asignal > μsignal.

The noise term between Eqs. (3) and (4) also differs. In SNRAC, noise is the standard deviation in a region of the measurement that does not contain any signal. In previous discussions of SCI, the noise term, σbackground, is calculated by evaluating the standard deviation of two thin (~5 pixel) gutter regions at the edge of the separation lane, where protein is not expected to be present (Supporting Information Fig. S1A; σgutter = 107.7). We assume that the pixels in the gutter region are independent and identically distributed draws from the true background distribution, whose standard deviation we denote as σX. Thus, we approximate σgutter ≈ σX.

However, when evaluating the RMS noise in SNRE/C, we are not drawing from the same background distribution. Instead, we are calculating the standard deviation of a series of averaged background pixels. Specifically, if X is an independent and identically distributed background pixel, we can expect a single averaged data point to be equivalent to ΣX/w, where w is the width of the lateral axis of the separation lane. From this, we can show that the RMS noise is NRMS = 2σtrace = 2√Var(ΣX/w) = 2σX/√w, which understates the true noise of the micrograph (σX) by a factor of √w/2. Finally, unlike chromatography and electrophoresis methods, it should also be noted that calculating σtrace directly from the SCI 1D electropherogram presents more challenges, due to the presence of other protein isoforms and nonspecific bands along the length of the separation lane (see Supporting Information Fig. S1B; NRMS = 61.2) [20]. Therefore, previous studies have typically used σgutter to evaluate noise [20].
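The effect of lateral averaging on the noise term can be checked numerically. The following sketch assumes independent, identically distributed Gaussian background pixels; the lane width, pixel noise level, and number of rows are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
w = 80              # lateral width of the lane in pixels (hypothetical)
sigma_x = 100.0     # true per-pixel background noise (hypothetical)

# Background-only lane: each row is averaged across w pixels to form the trace.
background = rng.normal(0.0, sigma_x, size=(5000, w))
trace = background.mean(axis=1)

sigma_trace = trace.std()
n_rms = 2.0 * sigma_trace                 # RMS noise as used in SNR_E/C
predicted = 2.0 * sigma_x / np.sqrt(w)    # 2 * sigma_X / sqrt(w)
understatement = sigma_x / n_rms          # close to sqrt(w) / 2
```

For w = 80, the averaged trace reports a noise level roughly 4.5 times smaller than the per-pixel noise in the micrograph.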

Additionally, the SNR calculations also produce diverging results for 2D versus 1D calculations. Calculating the signal term in SNRAC for the 2D case would involve calculating the average pixel intensity of the 2D protein band (see Supporting Information Fig. S1E; μband = 2215.04). In the 1D Gaussian case, this would involve calculating the average pixel intensity of an averaged 1D electropherogram (see Supporting Information Fig. S1D; μband = 432.9). As a result, the mean signal intensity value calculated in the 1D Gaussian case would be smaller than the average calculated in the 2D case.

A similar result occurs for SNRE/C. In the 1D case, Asignal refers to the maximum amplitude of the 1D Gaussian (Supporting Information Fig. S1B; μ1D Gaussian = 1020.1). In the 2D case, we can similarly calculate the maximum amplitude of the 2D multivariate Gaussian that defines a protein band injection profile (Supporting Information Fig. S1F; μ2D Gaussian = 5514.9). Similar to SNRAC, the maximum amplitude of the 1D Gaussian would be smaller than the maximum amplitude of the 2D multivariate Gaussian, due to the averaging operation required to construct the 1D electropherogram.

In this manner, we can see that direct comparisons of SNR with the same method (SNRE/C or SNRAC) across the 2D versus 1D case would produce a factor offset between the two dimensions. However, when comparing between SNRE/C or SNRAC across the two different dimensions, the relationship in SNR may not be as clear.

In accordance with standard practice in previous SCI work, we adopt a definition of SNRGaussian = A1D signal/σgutter when evaluating the SNR with the Gaussian fitting pipeline [20]. However, for the segmentation pipeline, we adopt a definition of SNRSegmentation = SNRAC, where σbackground = σgutter. Due to the differences in quantification methodology between SNRGaussian and SNRSegmentation, we expect the correlation between SNR values to be lower than the SV correlation.
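The two SNR definitions adopted here can be written as small helper functions. The function names and the toy pixel values below are hypothetical and chosen only so that the arithmetic is easy to follow; they are not measurements from the study.

```python
import numpy as np

def snr_segmentation(band_pixels, gutter_pixels):
    """SNR_AC with sigma_background = sigma_gutter (segmentation pipeline)."""
    return (np.mean(band_pixels) - np.mean(gutter_pixels)) / np.std(gutter_pixels)

def snr_gaussian(amplitude_1d, gutter_pixels):
    """Fitted 1D Gaussian amplitude over sigma_gutter (Gaussian fitting pipeline)."""
    return amplitude_1d / np.std(gutter_pixels)

def passes_qc(snr, minimum=3.0):
    """Quality control: keep bands with SNR >= 3."""
    return snr >= minimum

# Toy values (hypothetical): gutter noise with mean 0 and standard deviation 10.
gutter_pixels = np.array([-10.0, 10.0, -10.0, 10.0])
band_pixels = np.array([40.0, 60.0, 50.0])   # pixels inside the segmented band
amplitude_1d = 25.0                           # amplitude of the averaged 1D trace

snr_seg = snr_segmentation(band_pixels, gutter_pixels)
snr_gau = snr_gaussian(amplitude_1d, gutter_pixels)
```

In this toy case the same band passes quality control under the segmentation definition and fails under the Gaussian definition, because lateral averaging lowers the 1D amplitude while the noise term stays the same.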

2.5. Calculating separation resolution

Separation resolution (Rs) defines how well-resolved neighboring protein bands are along a separation lane [36]. Rs is defined in Eq. (5), where μ1 and μ2 correspond to the Gaussian peak centers of the two protein bands under consideration, and 4σ1 and 4σ2 correspond to each respective peak width [36].

$$R_s = \frac{|\mu_1 - \mu_2|}{\frac{1}{2}\left(4\sigma_1 + 4\sigma_2\right)} \qquad (5)$$

Importantly, Rs is typically calculated after neighboring protein bands are fit to a 1D Gaussian distribution. Baseline resolution (Rs ≥ 1.5) separates two neighboring protein bands with <1% overlap in 1D Gaussian distributions. However, achieving this stringent baseline resolution can be infeasible (e.g., when resolving multiple protein targets in a fixed-length separation lane). At the other extreme, distinguishing neighboring protein bands can be infeasible when Rs < 0.5 [36].
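Eq. (5) and the two thresholds mentioned above translate directly into code. In this sketch the function names, the category labels, and the example peak parameters are hypothetical.

```python
def separation_resolution(mu1, sigma1, mu2, sigma2):
    """Eq. (5): peak-center distance over the mean 4-sigma peak width."""
    return abs(mu1 - mu2) / (0.5 * (4.0 * sigma1 + 4.0 * sigma2))

def classify_resolution(rs):
    """Label a band pair using the thresholds quoted in the text."""
    if rs >= 1.5:
        return "baseline resolved"
    if rs < 0.5:
        return "not distinguishable"
    return "partially resolved"

# Example peaks (hypothetical values, in pixels).
rs_far = separation_resolution(100.0, 5.0, 130.0, 5.0)    # 30 / 20 = 1.5
rs_near = separation_resolution(100.0, 5.0, 108.0, 5.0)   # 8 / 20 = 0.4
```

For two bands of equal width, Rs is simply the center-to-center distance divided by 4σ.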

3. Materials and methods

SCIs were run as previously described [17,19]. Briefly, each microfluidic chip completes concurrent analysis of hundreds to thousands of individual cells, owing to an array of approximately 1100 individual separation lanes each headed by a 30–32 μm diameter microwell for single-cell isolation and lysis [17,19]. SDS-PAGE separates protein targets based on differences in molecular mass. Much like DNA or protein microarrays, the resulting SCI readout is an array of fluorescence micrographs, with each micrograph reporting an immunoblot for one single cell (Fig. 1A). By dividing the scanned array into individual separation lanes, we calculate protein expression profiles from each individual cell.

For benchmarking analysis, SCIs were performed on a breast cancer cell line (MCF7 cells), where each cell has been transfected with GFP. We analyzed a total of five chips for this MCF7 dataset. An additional three chips were analyzed for the SNR fold change analysis with PA gels of varying pore sizes. For isoform separations, SCI data were obtained from Kim et al., which assessed protein markers in an MCF7 cell line treated with tamoxifen [46]. For SNR fold-change data using protein molecular mass, SCI data were obtained from Kang et al., which assessed protein markers in a BT474 cell line [6]. Additional details about reagents, cell culture, and assay parameters are given in Supporting Information Note S2.

4. Results

4.1. Benchmarking results

To evaluate the robustness of segmentation-based methods versus conventional Gaussian fitting, we first benchmarked against SV. We analyzed protein bands from MCF7 SCIs. From five chips, we analyzed the subset of separation lanes (117 lanes from a total of 6868 separation lanes) that were identified as containing a protein band (“inferred positive”) from both the segmentation and Gaussian fitting pipelines. Separation lanes that were inferred positive by only one of the two pipelines (104 lanes total, 14 from the Gaussian fitting pipeline, and 90 from the segmentation pipeline) were not included in this analysis, since the pipeline that did not identify a protein band in the resulting separation lane would produce an SV value of zero, thus confounding benchmarking.

In testing the supposition of strong correlation between SVSegmentation and SVGaussian, we observe an R² value >0.99 (Fig. 2A). Furthermore, in testing the supposition that an offset factor in SV would exist between the Gaussian fitting pipeline and the segmentation pipeline, we evaluate the slope of a linear regression model between the two datasets, and, indeed, observe an offset of 62.5. As a corollary supposition, we expected that the offset factor would approximate the width of the separation lane (80 pixels), which also appears to be supported here. Thus, we conclude that the segmentation pipeline is analyzing protein bands as accurately as the Gaussian fitting pipeline, for similar datasets.
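The slope and R² described above can be computed with an ordinary least squares fit, as in the sketch below. The paired SV values here are synthetic, generated around the reported slope of 62.5 with an assumed 2% multiplicative scatter; they are placeholders for illustration, not the study's data.

```python
import numpy as np

# Synthetic paired SV values for 117 lanes (hypothetical, for illustration only).
rng = np.random.default_rng(0)
sv_gaussian = rng.uniform(1e3, 5e4, size=117)
sv_segmentation = 62.5 * sv_gaussian * rng.normal(1.0, 0.02, size=117)

# Linear regression: the slope is the offset factor between the two pipelines.
slope, intercept = np.polyfit(sv_gaussian, sv_segmentation, deg=1)

# Coefficient of determination, as the squared Pearson correlation.
r_squared = np.corrcoef(sv_gaussian, sv_segmentation)[0, 1] ** 2
```

Regressing SVSegmentation on SVGaussian is the assumed direction here, so that the slope reads as the number of SVGaussian units per unit of SVSegmentation.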

Figure 2.

Segmentation correlates well with Gaussian fitting for SV and SNR. (A) For the subset of separation lanes that were inferred positive by both Gaussian fitting and segmentation, the SV identified in each lane shows good agreement between the two pipelines (R² = 0.995). (B) For the subset of separation lanes that were inferred positive by both Gaussian fitting and segmentation, the SNR of the separation lanes also shows good agreement (R² = 0.941), though weaker agreement than that of SV. (C) When comparing the number of inferred positive separation lanes across n = 5 chips, segmentation identifies 1.5× more separation lanes with protein bands than Gaussian fitting. The Wilcoxon signed-rank test was used to identify statistical significance (p = 0.042). (D) Of the 207 separation lanes that were inferred positive by segmentation and 131 separation lanes that were inferred positive by Gaussian fitting, 117 were inferred positive by both. (E) The separation lanes that were inferred positive by the segmentation pipeline had a greater number of low SV protein bands, compared to those that were inferred positive by both pipelines (p = 0.0017 for two-tailed Mann–Whitney U-test). (F) The separation lanes inferred positive by the segmentation pipeline also had a greater proportion of low-SNR protein bands, compared to the separation lanes inferred positive by both pipelines (p = 1.21 × 10⁻⁵ for two-tailed Mann–Whitney U-test). SV and SNR were evaluated using the segmentation pipeline.

We next assessed the correlation of estimated SNR values between the two pipelines. We expected the SNR correlation to be lower than the SV correlation, due to the differences in quantification methodology when evaluating SNR in each pipeline. Indeed, we observe a lower, but nevertheless strong, correlation (R2 = 0.941) for SNR values between the segmentation and Gaussian fitting pipelines (Fig. 2B).

Finally, we investigated whether one pipeline identified substantially more analyzable protein bands than another. Our results indicate that the segmentation pipeline identified 1.5× more separation lanes with protein bands as compared to the Gaussian fitting pipeline (Fig. 2C; p = 0.042; segmentation identified a total of 207 separation lanes with protein bands, as compared to 131 identified by Gaussian fitting). We explore this increase in protein band identification in the next section.

4.2. Comparison of inferred positive separation lanes across methods

To understand why segmentation identified more separation lanes with protein bands than Gaussian fitting, we compared the distributions of SV and SNRs of inferred positive separation lanes between both pipelines.

Given that we observed a factor offset between SVSegmentation and SVGaussian, we cannot directly compare distributions of SV values between the two pipelines. However, the majority of inferred positive separation lanes identified by Gaussian fitting (117 of n = 131) were also inferred positive by segmentation, which identified n = 207 inferred positive separation lanes in total (Fig. 2D). Thus, we assumed that the 117 inferred positive separation lanes identified by both methods would sufficiently describe the full distribution of separation lanes inferred positive by Gaussian fitting.

We compared the overlapping subset of inferred positive separation lanes to those inferred positive by segmentation alone. Through this comparison, we observed that the distribution of separation lanes inferred positive with segmentation included significantly more low-SV and low-SNR protein bands (SV, p = 0.0017, Mann–Whitney U-test; SNR, p = 1.21 × 10⁻⁵, Mann–Whitney U-test) (Fig. 2E, Fig. 2F). In the converse comparison, wherein we compared the subset of separation lanes inferred positive by both methods against the full set of separation lanes identified by Gaussian fitting, we found no substantial differences between the two distributions (see Supporting Information Fig. S3; p > 0.05 for SV and SNR). Therefore, we conclude that the Gaussian pipeline is not identifying substantially different protein bands compared to the segmentation pipeline. We further conclude that the segmentation pipeline is, indeed, identifying more low-abundance protein bands than the Gaussian fitting pipeline.

4.3. Assessing SNR fold change with varying peak width

We next wished to assess how SNR values for a single protein band would compare between the two pipelines. Although we determined that the segmentation pipeline identified more low-abundance protein bands, we hypothesized that due to the SNR attenuation in the Gaussian fitting pipeline (1) the SNR value for each protein band in the segmentation pipeline would be greater than the SNR value for each protein band in the Gaussian fitting pipeline; and (2) this “SNR fold change” value (the ratio of SNRSegmentation/SNRGaussian) would increase as the lateral protein band width decreased. Specifically, we hypothesized that with decreasing protein band width, there would be increasing mean signal truncation with the Gaussian fitting pipeline (due to the lateral averaging, wherein narrower protein bands will have a greater number of abutting background pixels, summing to zero-pixel intensities). In tandem, we hypothesized that there would be increasing mean signal intensity in the segmentation pipeline due to the smaller protein segments (a smaller number of low-intensity pixels, with high intensity pixels concentrated near the protein band center).
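This hypothesis can be explored with a small simulation. The sketch below assumes an ideal 2D Gaussian band of fixed peak intensity, a half-maximum mask standing in for the segmentation, and the maximum of the averaged trace standing in for the fitted 1D amplitude; the function name `snr_fold_change` and all parameter values are hypothetical.

```python
import numpy as np

def snr_fold_change(sigma_lateral, width=80, rows=200, sigma_axial=6.0,
                    peak=1000.0, noise_sd=20.0, gutter=5, seed=0):
    """Return SNR_Segmentation / SNR_Gaussian for one synthetic lane."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:rows, 0:width]
    band = peak * np.exp(-((yy - rows / 2) ** 2) / (2 * sigma_axial ** 2)
                         - ((xx - width / 2) ** 2) / (2 * sigma_lateral ** 2))
    lane = band + rng.normal(0.0, noise_sd, size=band.shape)
    gutters = np.concatenate([lane[:, :gutter], lane[:, -gutter:]], axis=1)
    lane = lane - gutters.mean()                 # background subtraction
    band_mask = band > 0.5 * peak                # stand-in segmentation mask
    snr_seg = lane[band_mask].mean() / gutters.std()
    snr_gau = lane.mean(axis=1).max() / gutters.std()   # trace maximum as amplitude
    return snr_seg / snr_gau

# Sweep the lateral band width (in pixels) from wide to narrow.
fold_changes = {s: snr_fold_change(s) for s in (12.0, 8.0, 4.0)}
```

Under these assumptions the fold change grows as the band narrows: the mean intensity inside the mask stays roughly constant, while the laterally averaged amplitude falls in proportion to the band width.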

We sought to test our hypothesis by systematically modulating protein band widths in our separations. Protein band width is determined by several factors, including electrophoresis duration (diffusional band broadening), PA gel pore size, and protein molecular mass [36–38].

We performed SCI of MCF7-GFP cells on SCI chips with varying gel densities (6%T, 8%T, and 10%T, where %T represents total acrylamide concentration). A greater gel density results in smaller pores in the PA gel sieving matrix, thus, inducing less band broadening during a given electrophoresis run time, and therefore, a smaller peak width (Fig. 3A) [38]. Thus, we hypothesized that the SNR fold change would be greater for proteins electrophoresed in the denser gels.

Figure 3.

Decreasing protein band width leads to greater SNR truncation with Gaussian fitting. (A) Greater gel densities (denoted by %T, total acrylamide concentration) lead to smaller pore sizes, which results in smaller band broadening (and a smaller migration distance) as proteins travel along the length of the separation lane. The immunoblots displayed are representative examples, and the 6%T immunoblot is reproduced in Supporting Information Fig. S1A and C. (B) The ratio of SNRs identified with the segmentation pipeline versus the Gaussian fitting pipeline (SNR fold change) increases as gel density increases, demonstrating that the Gaussian fitting pipeline truncates the SNR of smaller protein bands to a greater extent with increasing gel density.

Our results demonstrate that the SNR fold change does, indeed, increase as gel density increases (Fig. 3B). In addition, we also see an increase in the ratio of inferred positive separation lanes from segmentation compared to Gaussian fitting (see Supporting Information Note S3 and Table S1). A similar test on proteins with varying molecular masses also demonstrated that as protein band width decreases, the SNR fold change increases (see Supporting Information Note S4 and Fig. S4).
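The lateral-averaging mechanism hypothesized in this section can be illustrated with a small, noise-free sketch. It is not the pipeline used in this study: the isotropic 2D Gaussian band, the lane dimensions, the half-maximum segmentation threshold, and all function names are assumptions chosen for illustration, and the same noise estimate is assumed for both approaches so that the ratio of mean signals stands in for the SNR fold change.

```python
import numpy as np

def band_image(sigma, length=120, width=40, amplitude=1.0):
    """Noise-free separation lane holding one isotropic 2D Gaussian band (assumed shape)."""
    y, x = np.mgrid[0:length, 0:width]
    cy, cx = (length - 1) / 2, (width - 1) / 2
    return amplitude * np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))

def lateral_average_peak(lane):
    """Peak of the 1D intensity profile obtained by averaging across the lane width."""
    return lane.mean(axis=1).max()

def segment_mean(lane, fraction=0.5):
    """Mean intensity of the pixels at or above `fraction` of the maximum (assumed threshold)."""
    mask = lane >= fraction * lane.max()
    return lane[mask].mean()

def signal_fold_change(sigma):
    """Ratio of segmented mean signal to laterally averaged peak signal."""
    lane = band_image(sigma)
    return segment_mean(lane) / lateral_average_peak(lane)

# Narrower bands leave more background pixels in each lateral average.
for sigma in (9.0, 6.0, 3.0):
    print(f"sigma = {sigma:.0f} px, fold change = {signal_fold_change(sigma):.2f}")
```

In this toy model, the segmented mean is nearly independent of band width, whereas the laterally averaged peak scales with the fraction of the lane width occupied by the band, so the ratio grows as the band narrows.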

4.4. Assessing isoform quantification with segmentation

We next turned to the question of whether segmentation can accurately quantify multiple isoforms in the same separation lane. As mentioned, Gaussian fitting has difficulty detecting isoforms with Rs < 0.5. Consequently, we sought to determine if segmentation requires a larger Rs for accurate quantification of isoforms, since the Gaussian fitting pipeline can account for partial fluorescence intensities from overlapping protein bands that are not baseline resolved, whereas the current instantiation of the segmentation pipeline cannot.

To assess the ability of the segmentation pipeline to distinguish between isoforms, we used a canonical SCI dataset that aimed to resolve two estrogen receptor (ER) isoforms: ER-α66, a 66 kDa isoform, and ER-α46, a 46 kDa isoform in the breast cancer MCF7 cell line (Fig. 4A) [46]. ER-α46 lacks a transactivation domain present in the larger isoform, and has been hypothesized to reduce proliferation of certain types of breast cancer [39]. Importantly, since both isoforms are detected with the same fluorescently labeled antibody probe, PAGE separation and accurate quantification is critical in assessing the stoichiometry of the isoform expression.

Figure 4.

Segmentation-based detection of ERα isoforms shows good agreement when Rs > 0.6. (A) ER-α66 (66 kDa) and ER-α46 (46 kDa) are separated on a PA gel by SCI. The displayed immunoblot is a representative example, and is also reproduced in Fig. 1A–C. (B) Gaussian fitting and segmentation demonstrate good agreement of SVs of the more abundant isoform, ER-α66 (R2 = 0.984). (C) Gaussian fitting and segmentation do not demonstrate good agreement of SVs of ER-α46 (R2 = 0.189). (D) When assessing SVs of ER-α46 from separation lanes that have Rs > 0.6, we find better agreement between the segmentation and Gaussian fitting pipelines (R2 = 0.826).

To assess agreement between the segmentation and Gaussian fitting pipelines, we once again compared SV values for separation lanes that were inferred positive by both pipelines. The SV values obtained by the two pipelines demonstrated good agreement for the first isoform (Fig. 4B; R2 = 0.985), whereas the SV values for the second isoform initially demonstrated poor agreement (R2 = 0.189; Fig. 4C).

We sought to further understand the source of discrepancy for the SV values of the second isoform. When evaluating the Rs of the inferred positive separation lanes, we observed an average of Rs = 0.63 ± 0.16 (see Supporting Information Fig. S5A for an example of a separation lane with an above-average Rs). We hypothesized that when peaks had low separation resolution (Supporting Information Fig. S5B), the segmentation pipeline could not account for areas of peak overlap and, therefore, would produce more quantification errors. Indeed, when we restricted the subset of separation lanes analyzed to those that had an above-average separation resolution (Rs ≥ 0.63), we achieved good agreement between SV values for the second isoform (Fig. 4D; R2 = 0.826). We, thus, conclude that the segmentation pipeline requires a higher Rs between neighboring protein bands to appropriately distinguish multiple bands along a single separation lane. We further conclude that in cases where the separation resolution is insufficient (Rs < 0.6), Gaussian fitting may be a more appropriate approach for isoform quantification.
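To make the role of separation resolution concrete, the following sketch computes Rs for two Gaussian bands and counts how many segments a simple intensity threshold recovers from their summed 1D profile. It assumes the conventional definition of Rs (peak-to-peak distance divided by the mean band width, with band width taken as 4σ), equal band amplitudes and widths, and a half-maximum threshold; none of these choices is taken from the segmentation pipeline itself, and the function names are hypothetical.

```python
import numpy as np

def separation_resolution(x1, sigma1, x2, sigma2):
    """Rs = peak-to-peak distance over the mean band width, with width = 4 * sigma."""
    return abs(x1 - x2) / (0.5 * (4 * sigma1 + 4 * sigma2))

def two_band_profile(rs, sigma=5.0, n=400):
    """Sum of two equal Gaussian bands placed so that their resolution equals `rs`."""
    x = np.arange(n, dtype=float)
    delta = rs * 4 * sigma  # peak-to-peak distance implied by rs
    x1, x2 = n / 2 - delta / 2, n / 2 + delta / 2
    profile = (np.exp(-(x - x1) ** 2 / (2 * sigma ** 2))
               + np.exp(-(x - x2) ** 2 / (2 * sigma ** 2)))
    return profile, separation_resolution(x1, sigma, x2, sigma)

def count_segments(profile, fraction=0.5):
    """Number of contiguous runs at or above `fraction` of the profile maximum."""
    mask = profile >= fraction * profile.max()
    starts = mask[1:] & ~mask[:-1]  # False -> True transitions open a new segment
    return int(mask[0]) + int(starts.sum())

for rs in (0.3, 1.0):
    profile, check = two_band_profile(rs)
    print(f"Rs = {check:.2f}: {count_segments(profile)} segment(s)")
```

When the two bands merge into a single thresholded region, any intensity in the overlap is assigned to one segment, which is one way a threshold-based approach can misquantify poorly resolved isoforms.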

4.5. Evaluating automated classification and segmentation with deep learning

Finally, we explored the use of convolutional neural networks, which have had success in classifying and segmenting complex images, to automatically classify and segment protein bands [40–43]. Specifically, we hypothesized that neural networks could both (1) reduce several of the manual parameter tuning and quality control processes required by the classical segmentation pipeline for every new chip and (2) classify and segment additional separation lanes that were missed by classical segmentation, due to the parameter complexity of the neural network offering increased sensitivity in detecting protein bands. In this manner, we anticipated that convolutional neural networks could further build upon the throughput and accuracy improvements offered by the classical segmentation pipeline.

To assess the efficacy of convolutional neural networks in classifying and segmenting protein bands, we developed a deep learning pipeline that used a classification model in conjunction with a segmentation model to produce a final output consisting of a segmented protein band (Fig. 5A). For classification, we developed a model based on the AlexNet framework, and used separation lanes from our previous immunoblots of MCF7-GFP cells as inputs (Fig. 5A) [40]. We used the inferred positive separation lanes classified by this model as inputs into our second model to segment protein bands, using the U-Net framework, which has had wide success in biomedical image segmentation [41]. After segmentation, a final quality control step is performed to ensure that identified protein bands have an SNR ≥ 3. More detail on the methods used to formulate and train the AlexNet and U-Net models is available in Supporting Information Note S5.
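The three-stage structure described above (classification, then segmentation, then an SNR ≥ 3 quality control check) can be sketched as follows. The `classify` and `segment` arguments are hypothetical stand-ins for the trained AlexNet-based and U-Net-based models, and the SNR definition used here (mean intensity inside the mask divided by the standard deviation of the pixels outside it) is an assumption made for illustration rather than the definition used in this study.

```python
import numpy as np

def band_snr(lane, mask):
    """Assumed SNR: mean intensity inside the mask over the std of pixels outside it."""
    return lane[mask].mean() / lane[~mask].std()

def deep_learning_pipeline(lanes, classify, segment, threshold, snr_min=3.0):
    """Return (lane index, mask, SNR) for lanes that pass all three stages.

    classify: callable giving a positive-class score for a lane (hypothetical stand-in).
    segment:  callable giving a boolean band mask for a lane (hypothetical stand-in).
    """
    kept = []
    for index, lane in enumerate(lanes):
        if classify(lane) < threshold:
            continue  # inferred negative by the classifier
        mask = segment(lane)
        if not mask.any() or mask.all():
            continue  # no band, or no background left to estimate noise
        snr = band_snr(lane, mask)
        if snr >= snr_min:  # final quality control step
            kept.append((index, mask, snr))
    return kept

# Toy usage with simple intensity rules standing in for the trained models.
rng = np.random.default_rng(0)
empty_lane = rng.normal(0.0, 1.0, size=(60, 20))
band_lane = rng.normal(0.0, 1.0, size=(60, 20))
band_lane[20:30, :] += 10.0
results = deep_learning_pipeline(
    [empty_lane, band_lane],
    classify=lambda lane: float(lane.max() > 5.0),
    segment=lambda lane: lane > 5.0,
    threshold=0.5,
)
print([index for index, _, _ in results])
```

Keeping the quality control step separate from the two models means that lanes passed by the classifier can still be rejected on the basis of the segmented band itself.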

Figure 5.

Deep learning accurately classifies and segments SCI protein bands. (A) A classification model, based on the AlexNet architecture, initially classifies separation lanes into inferred positive or inferred negative separation lanes. The inferred positive separation lanes are then used as inputs into the U-Net model, which produces a segmentation mask for a given separation lane. The outputs of the U-Net model are then checked to ensure that the SNR ≥ 3 in the segmented region. Key protein band parameters can be extracted from the resulting separation lanes. (B) The area under the receiver operating characteristic curve of the classification model is >0.99, indicating high classification accuracy. (C) The SVs of the 14 separation lanes that were inferred positive by both the classical segmentation and the deep learning pipeline demonstrate high correlation, indicating that the deep learning pipeline can accurately quantify and segment protein bands (R2 = 0.993). (D) The SNRs of the 14 separation lanes that were inferred positive by the classical segmentation and the deep learning pipelines also demonstrate high correlation (R2 = 0.966).

For the test set of the AlexNet model, we observed an area under the receiver operating characteristic (AUROC) value of 0.9913 (Fig. 5B). From the softmax outputs of the AlexNet model, we identified our optimal threshold by maximizing the value of Youden’s J statistic on the validation set [44]. With this optimal threshold (0.043), we observed a classification accuracy on our test set of 98.1%. We scrutinized the precision and recall of our two classes (see Supporting Information Table S2). Interestingly, we observed 100% recall for the positive class (all 14 separation lanes that contained protein bands were accurately identified), but only 52% precision (an additional 13 separation lanes were classified as false positives). However, we significantly increased the precision using the segmentation model (U-Net).
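Youden’s J statistic is the difference between the true positive rate and the false positive rate at a given threshold (equivalently, sensitivity + specificity − 1). A minimal sketch of threshold selection by maximizing J is shown below; the function name and the toy scores are illustrative assumptions, not the validation data from this study, and both classes are assumed to be present.

```python
import numpy as np

def youden_threshold(scores, labels):
    """Return the (threshold, J) pair that maximizes Youden's J = TPR - FPR.

    scores: positive-class scores (e.g. softmax outputs), one per separation lane.
    labels: 1 for lanes that contain a protein band, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_threshold, best_j = None, -np.inf
    for threshold in np.unique(scores):  # candidate thresholds in ascending order
        predicted = scores >= threshold
        tpr = (predicted & labels).sum() / labels.sum()
        fpr = (predicted & ~labels).sum() / (~labels).sum()
        j = tpr - fpr
        if j > best_j:  # keep the lowest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# Toy validation scores (hypothetical values).
threshold, j = youden_threshold([0.01, 0.02, 0.03, 0.2, 0.6, 0.9], [0, 0, 0, 1, 1, 1])
print(threshold, j)
```

Because J weights sensitivity and specificity equally, a threshold chosen this way can sit well below 0.5 when positive lanes are rare, which is compatible with a low optimal threshold paired with high recall and modest precision.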

We used all separation lanes that were inferred positive by the AlexNet model as inputs into the U-Net model, and we observed an overall model accuracy of 95.8%. We also evaluated the correlation of SV and SNR for separation lanes that were inferred positive by both the classical segmentation pipeline and the deep learning pipeline. After implementing a quality control metric to select only separation lanes with protein bands that had an SNR ≥ 3, we observed R2 values of 0.993 for SV (Fig. 5C) and 0.966 for SNR (Fig. 5D), indicating good agreement between the two pipelines for both metrics. Furthermore, after the SNR ≥ 3 cutoff, we once again observed 100% recall for the positive class, and the precision of classification improved to 64%. However, upon closer examination of the “false positive” separation lanes identified by the deep learning pipeline, we noticed that seven of the eight separation lanes did, in fact, contain protein bands that were missed by the classical segmentation pipeline (see Supporting Information Note S6 and Table S3). Relative to the 14 separation lanes inferred positive by classical segmentation, these seven additional lanes indicate that combining segmentation with deep learning allows for the identification of 50% more protein bands than classical segmentation approaches.
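The precision, recall, and band recovery figures reported in this section follow from the lane counts given in the text. The short calculation below reproduces that arithmetic; the variable names are illustrative.

```python
def precision(true_positives, false_positives):
    """Fraction of inferred positive lanes that are true positives."""
    return true_positives / (true_positives + false_positives)

def recall(true_positives, false_negatives):
    """Fraction of true positive lanes that were inferred positive."""
    return true_positives / (true_positives + false_negatives)

true_positives = 14             # lanes with bands found by classical segmentation
false_positives_classifier = 13  # extra lanes flagged by the classification model alone
false_positives_after_qc = 8     # extra lanes remaining after segmentation and SNR >= 3
recovered_bands = 7              # "false positives" that did contain protein bands

print(f"classifier precision: {precision(true_positives, false_positives_classifier):.1%}")
print(f"precision after QC:   {precision(true_positives, false_positives_after_qc):.1%}")
print(f"recall:               {recall(true_positives, 0):.0%}")
print(f"additional bands:     {recovered_bands / true_positives:.0%}")
```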

Our results suggest that the deep learning pipeline identifies additional protein bands that may be missed due to the stringent quality control steps needed in the segmentation pipeline. We further hypothesize that we can decrease the misclassification of the deep learning pipeline by improving the classification model, especially by increasing the number of training micrographs that contain artifacts similar to protein bands.

5. Discussion

In this work, we demonstrate that segmentation-based approaches can quantify SCI as successfully as Gaussian fitting for singular protein bands. We additionally demonstrate that segmentation-based approaches can identify a greater number of low-abundance protein bands than Gaussian fitting approaches, because Gaussian fitting approaches truncate a protein band’s estimated SNR. Furthermore, segmentation-based approaches become increasingly useful as the peak widths of protein bands decrease, because SNR truncation from Gaussian fitting increases as protein peak width decreases. From these results, we conclude that when analyzing protein bands with small peak widths, segmentation-based approaches may increase the total number of quantifiable protein bands for downstream analysis.

However, we find that segmentation-based approaches struggle to quantify isoforms in single-cell immunoblots when the separation resolution between isoforms is insufficient (Rs < 0.6). Thus, in such cases, we anticipate that Gaussian fitting may be more suitable for accurately quantifying isoforms.

Finally, we discover that deep learning can further increase the number of protein bands recovered by segmentation-based approaches, owing to the larger parameter complexity used in deep learning models to classify, and segment, protein bands in each individual separation lane.

We anticipate that the results of this study will aid in achieving greater accuracy for protein immunoblot quantification, including quantification of other microfluidic (bulk or few-cell) immunoblots [32]. Looking forward, we envision that utilizing more sophisticated segmentation algorithms, such as 2D Gaussian fitting, may increase the accuracy of segmenting isoforms with Rs < 0.6, while also increasing the throughput of identifying low-abundance or low-peak-width protein bands. Additionally, we surmise that segmentation-based approaches may open inquiries into quantifying protein bands with significant injection dispersion, which may provide valuable insight into microfluidic assay design properties and technical variability [45].

Supplementary Material

Acknowledgments

The authors are grateful for helpful discussions with Julea Vlassakis, PhD and Emery Goossens, PhD from the Doerge lab. We would also like to acknowledge Hector Neira, PhD, Samantha Grist, PhD, John J. Kim, PhD, and Alison Su for assistance in obtaining SCI datasets. Finally, we are grateful for Prof. Jonathan Shewchuk and the teaching assistants of the Spring 2019 offering of UC Berkeley’s CS289A course for helpful discussions and code.

This work has been supported by the Siebel Foundation (AG), the John and Elizabeth Lewis Scholarship (AG), the Natural Sciences and Engineering Research Council of Canada (NSERC) (AG, T32GM008155), a National Institutes of Health (NIH) training grant under award #T32GM008155 (AG), NIH R01CA203018 (AEH), and the Chan Zuckerberg Biohub Investigator Program (AEH).

AEH is an inventor on University of California SCI intellectual property and may benefit from licensing royalties.

Abbreviations:

%T: total acrylamide percentage
Asignal: maximum amplitude of a Gaussian band
AUC: area-under-the-curve
AUROC: area under the receiver operating characteristic
ER: estrogen receptor
LLOD: lower limit of detection
NRMS: root-mean-square noise
Rs: separation resolution
SCI: single-cell immunoblotting
SNR: signal-to-noise ratio
SV: spot volume

Data Availability Statement

The data and code that support the findings of this study are openly available on GitHub at https://github.com/anjaligopal/scisegmentation

References

  • [1] Yu C, Woods A, Levison D, Histochemistry 1992, 24, 121–131.
  • [2] Labib M, Kelley SO, Nat. Rev. Chem 2020, 4, 143–158.
  • [3] Finotello F, Eduati F, Front. Oncol 2018, 8, 430.
  • [4] Zhang Y, Sohn C, Lee S, Ahn H, Seo J, Cao J, Cai L, Commun. Biol 2020, 3, 420.
  • [5] Tentori AM, Yamauchi KA, Herr AE, Angew. Chemie 2016, 12431–12435.
  • [6] Kang C-C, Ward TM, Bockhorn J, Schiffman C, Huang H, Pegram MD, Herr AE, NPJ Precis. Oncol 2018, 2, 10.
  • [7] O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O’Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy TD, Pruitt KD, Nucleic Acids Res. 2016, 44, D733–D745.
  • [8] Ponomarenko EA, Poverennaya EV, Ilgisonis EV, Pyatnitskiy MA, Kopylov AT, Zgoda VG, Lisitsa AV, Archakov AI, Int. J. Anal. Chem 2016, 2016, 7436849.
  • [9] Li Y, Feng X, Du W, Li Y, Liu BF, Anal. Chem 2013, 85, 4066–4073.
  • [10] Shaposhnikov SA, Salenko VB, Brunborg G, Nygren J, Collins AR, Electrophoresis 2008, 29, 3005–3012.
  • [11] Dickinson AJ, Armistead PM, Allbritton NL, Anal. Chem 2013, 85, 4797–4804.
  • [12] Nemes P, Knolhoff AM, Rubakhin SS, Sweedler JV, Anal. Chem 2011, 83, 6810–6817.
  • [13] Liao HW, Rubakhin SS, Philip MC, Sweedler JV, Anal. Chim. Acta 2020, 1118, 36–43.
  • [14] Chen Y, Xiong G, Arriaga EA, Electrophoresis 2007, 28, 2406–2415.
  • [15] Tao L, Kennedy RT, Electrophoresis 1997, 18, 112–117.
  • [16] Yang P, Mao Y, Lee AWM, Kennedy RT, Electrophoresis 2009, 30, 457–464.
  • [17] Hughes AJ, Spelke DP, Xu Z, Kang C-C, Schaffer DV, Herr AE, Nat. Methods 2014, 11, 749–55.
  • [18] Yamauchi KA, Herr AE, Microsystems Nanoeng. 2017, 3, 16079.
  • [19] Kang C, Yamauchi KA, Vlassakis J, Sinkala E, Duncombe TA, Herr AE, Nat. Protoc 2016, 11, 1508–1530.
  • [20] Vlassakis J, Yamauchi KA, Herr AE, SLAS Technol. 2021, in press.
  • [21] Wehr T, Capillary Zone Electrophoresis, in: Encyclopedia of Physical Science and Technology, Elsevier, San Diego: 2001, pp. 355–368.
  • [22] Bharadwaj R, Santiago JG, Mohammadi B, Electrophoresis 2002, 23, 2729–2744.
  • [23] Hamilton N, Traffic 2009, 10, 951–961.
  • [24] McQuin C, Goodman A, Chernyshev V, Kamentsky L, Cimini BA, Karhohs KW, Doan M, Ding L, Rafelski SM, Thirstrup D, Wiegraebe W, Singh S, Becker T, Caicedo JC, Carpenter AE, PLOS Biol. 2018, 16, e2005970.
  • [25] Wiesmann V, Franz D, Held C, Müzenmayer C, Palmisano R, Wittenberg T, J. Microsc 2015, 257, 39–53.
  • [26] Kostopoulou E, Zacharia E, Maroulis D, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO) 2012, pp. 2258–2262.
  • [27] Natale M, Caiazzo A, Ficarra E, in: Marengo E, Robotti E (Eds.), 2-D PAGE Map Analysis, Methods in Molecular Biology, Humana Press Inc., New York: 2016, pp. 203–211.
  • [28] dos Anjos A, Møller ALB, Ersbøll BK, Finnie C, Shahbazkia HR, Bioinformatics 2011, 27, 368–375.
  • [29] Berth M, Moser FM, Kolbe M, Bernhardt J, Appl. Microbiol. Biotechnol 2007, 76, 1223–1243.
  • [30] Hong Y, Han HJ, Lee H, Lee D, Ko J, Hong ZY, Lee JY, Seok JH, Lim HS, Son WC, Sohn I, Sci. Rep 2020, 10, 18915.
  • [31] Liu J, Li W, Tian Y, 1991 International Conference on Circuits and Systems, Shenzhen, China, 16–17 June 1991, pp. 325–327. 10.1109/CICCAS.1991.184351.
  • [32] Hughes AJ, Herr AE, Proc. Natl. Acad. Sci. U.S.A 2012, 109, 21450–21455.
  • [33] Geldert A, Huang H, Herr AE, Sci. Rep 2020, 10, 8768.
  • [34] Pillai-Kastoori L, Schutz-Geschwender AR, Harford JA, Anal. Biochem 2020, 593, 113608.
  • [35] Vlassakis J, Herr AE, Anal. Chem 2015, 87, 11030–11038.
  • [36] Giddings JC, Unified Separation Science, Wiley, New York: 1991.
  • [37] Ferguson KA, Metabolism 1964, 13, 985–1002.
  • [38] Tong J, Anderson JL, Biophys. J 1996, 70, 1505–1513.
  • [39] Penot G, Le Péron C, Mérot Y, Grimaud-Fanouillére E, Ferriére F, Boujrad N, Kah O, Saligaut C, Ducouret B, Métivier R, Flouriot G, Endocrinology 2005, 146, 5474–5484.
  • [40] Krizhevsky A, Sutskever I, Hinton GE, in: Pereira F, Burges CJC, Bottou L, Weinberger KQ (Eds.), Advances in Neural Information Processing Systems, Curran Associates, Inc., New York: 2012, pp. 1097–1105.
  • [41] Ronneberger O, Fischer P, Brox T, Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 2015, 9351, 234–241, 10.1007/978-3-319-24574-4_28.
  • [42] Lecun Y, Bengio Y, Hinton G, Nature 2015, 521, 436–444.
  • [43] Van Valen DA, Kudo T, Lane KM, Macklin DN, Quach NT, DeFelice MM, Maayan I, Tanouchi Y, Ashley EA, Covert MW, PLOS Comput. Biol 2016, 12, e1005177.
  • [44] Youden WJ, Cancer 1950, 3, 32–35.
  • [45] Pan Q, Herr AE, Anal. Chim. Acta 2018, 1000, 214–222.
  • [46] Kim JJ, Liang W, Kang C-C, Pegram MD, Herr AE, Single-cell immunoblotting resolves estrogen receptor-α isoforms in breast cancer, PLOS ONE 2021, 16(7), e0254783.
