PLOS Biol. 2026 Jan 20;24(1):e3003591. doi: 10.1371/journal.pbio.3003591

VASCilia is an open-source, deep learning-based tool for 3D analysis of cochlear hair cell stereocilia bundles

Yasmin M Kassim 1,*, David B Rosenberg 1, Samprita Das 1, Xiaobo Wang 1, Zhuoling Huang 1, Samia Rahman 1, Ibraheem M Al Shammaa 2, Samer Salim 1, Kevin Huang 1, Alma Renero 1, Yuzuru Ninoyu 3,4, Rick A Friedman 3, Artur A Indzhykulian 5, Uri Manor 1,3,6,*
Editor: Alan G Cheng
PMCID: PMC12829968  PMID: 41557732

Abstract

Cochlear hair cells are essential for hearing, and their stereocilia bundles are critical for mechanotransduction. However, analyzing the 3D morphology of these bundles can be challenging due to their complex organization and the presence of other cellular structures in the tissue. To address this, we developed VASCilia (Vision Analysis StereoCilia), a Napari plugin suite that automates the analysis of 3D confocal microscopy datasets of phalloidin-stained cochlear hair cell bundles. VASCilia includes five deep learning-based models trained on mouse cochlear datasets that streamline the analysis process, including: (1) Z-Focus Tracker (ZFT) for selecting relevant slices in a 3D image stack; (2) PCPAlignNet (Planar Cell Polarity Alignment Network) for automated orientation of image stacks; (3) a segmentation model for identifying and delineating stereocilia bundles; (4) a tonotopic position prediction tool; and (5) a classification tool for identifying hair cell subtypes. In addition, VASCilia provides automated computational tools and measurement capabilities. Using VASCilia, we demonstrate its utility on challenging datasets, including wild-type and Eps8 KO mice at postnatal day 5. We further showcase its power by quantifying complex bundle disorganization in Cdh23−/− cochleae via texture analysis, which revealed systematically more heterogeneous and less regular bundles than littermate controls. These case studies demonstrate the power of VASCilia in facilitating detailed quantitative analysis of stereocilia. VASCilia also provides a user-friendly interface that allows researchers to easily navigate and use the tool, with the added capability to reload all their analyses for review or sharing purposes. We believe that VASCilia will be a valuable resource for researchers studying cochlear hair cell development and function, addressing a longstanding need in the hair cell research community for specialized deep learning-based tools capable of high-throughput image quantitation. We have released our code along with a manually annotated dataset that includes approximately 55 3D stacks featuring instance segmentation (https://github.com/ucsdmanorlab/Napari-VASCilia). This dataset comprises a total of 502 inner and 1,703 outer hair cell bundles annotated in 3D. As the first open-source dataset of its kind, we aim to establish a foundational resource for constructing a comprehensive atlas of cochlear hair cell images. Ultimately, this initiative will support the development of foundational models adaptable to various species, markers, and imaging scales to accelerate advances within the hearing research community.


Cochlear hair cell stereocilia bundles are vital for hearing, but their 3D morphology remains unclear due to their complex organization. This study develops an open-source, deep learning-based tool called VASCilia that automates the high-throughput analysis of 3D confocal microscopy datasets of phalloidin-stained cochlear hair cell bundles.

1 Introduction

The last decade has seen phenomenal advances in Artificial Intelligence (AI), giving birth to countless architectures, backbones, and optimization techniques that are being continually refined [1,2]. Despite the potential to revolutionize computer vision tasks, many researchers in the biological sciences still use computationally primitive and labor-intensive approaches to analyze their imaging and microscopy datasets. One major area where AI-based computer vision could have maximal impact is pre-clinical auditory research. In our lab, cutting-edge imaging tools and approaches are developed and used to gain insights into the structure-function relationship of subcellular structures in sensory hair cells with molecular resolution. These rich imaging datasets present an opportunity to understand how sensory hair cells function in health and disease, including cases of congenital and age-related hearing loss [3–9]. The structural attributes of the cochlea are quite uniform across species. For example, the famously snail-shaped cochlea is governed by a so-called "tonotopic" organization with a gradient of morphological features that change from one end of the cochlea to the other. As a result of this organization, each position along the length of the cochlea dictates the characteristic frequency of sound to which the tissue responds, directly correlating position with function. In other words, the sensory hair cell organelles called "stereocilia" or "bundles" follow a pattern of increasing lengths as a function of position along the length of the cochlea [10,11]. The cartoon in Fig 1, in which the cochlea is unrolled for illustrative purposes, shows how hair cell bundle length corresponds to specific frequencies. Hair cells are organized tonotopically: hair cells at the base of the cochlea respond best to high-frequency sounds, while those at the apex respond best to low-frequency sounds. The physical and functional properties of these structures vary systematically along the tonotopic axis of the auditory system. For example, hair bundles might be longer or shorter, or synapses might be more or fewer or have different properties, depending on whether they are associated with high-frequency or low-frequency regions [12,13]. The association between these attributes and their purpose makes the cochlea an ideal system for investigating the potential of AI in biological image analysis.

Fig 1. The anatomy of the cochlea [35].

Fig 1

Sensory hair cell stereocilia sit along a tonotopic axis that follows the length of the spiral-shaped cochlea with a striking pattern of increasing stereocilia lengths from the base to the apex of the cochlea. The morphological and spatial features of cochlear hair cell stereocilia follow extremely predictable tonotopic patterns between individuals and species: Stereocilia lengths increase as a function of position along the tonotopic axis of the cochlea, which in turn is reflective of the frequency of sound they are tuned to detect. Thus, the relationship between the morphological and spatial features of these cells and their function are relatively well-defined compared to many other biological systems. Due to their highly patterned organization, cochlear tissues thus present a particularly striking opportunity for automated computer vision tasks.

In this study, we leverage Napari, an open-source Python tool [14], as the foundation for developing our plugin, chosen for its robust viewer capabilities. The Napari platform has seen a growing number of plugins [15–28] designed to address various biomedical challenges. However, our plugin represents the first such tool tailored specifically for the ear research community, enabling high-precision AI-based 3D instance segmentation and measurement quantification of stereocilia bundles.

Previous studies have explored manual segmentation of stereocilia images to quantify measurements for specific research objectives [9,29]. Others have attempted to automate this process using traditional intensity-based methods [30]. However, these traditional approaches typically lack the capabilities provided by AI technologies, which can autonomously extract features and identify regions without relying exclusively on intensity cues. This gap underscores the critical need for advanced AI-driven segmentation techniques that utilize a broader array of features, thereby enhancing both the accuracy and the detail of the segmentation. We identified a few machine learning and deep learning publications. Urata et al. [31] rely on template matching and machine learning-based pattern recognition to detect and analyze hair cell positions across the entire longitudinal axis of the organ of Corti. Buswinka et al. [32] developed a software tool that utilizes AI capabilities; however, this tool is limited to bounding box detection and does not offer 3D instance segmentation of the bundles, which significantly limits its ability to provide shape-level descriptors and accurate measurements. Cortada et al. [33] use the StarDist deep learning architecture [34] to segment hair cells; however, they do not provide 3D instance segmentation, only 2D detection based on maximum projections.

VASCilia (Fig 18) provides a comprehensive AI-assisted workflow for automated 3D hair cell image analysis:

1. Reads and pre-processes 3D image z-stacks.
2. Removes out-of-focus frames (AI-assisted).
3. Aligns stacks to the planar cell polarity (PCP) axis (AI-assisted).
4. Generates instance-labeled 3D segmentation masks for downstream analysis (AI-assisted).
5. Computes 2D/3D measurements (e.g., volume, centroid, surface area) and additional derived metrics.
6. Measures tip-to-base height for each hair cell bundle.
7. Classifies bundles into four rows (IHC, OHC1, OHC2, OHC3) (AI-assisted).
8. Quantifies protein intensity to assess expression levels.
9. Estimates bundle orientation relative to the PCP axis.
10. Identifies the cochlear region of origin (base, middle, apex) using a pre-trained model (AI-assisted).
11. Supports model fine-tuning on user data to accommodate staining protocols and experimental conditions (AI-assisted).
12. Enables full export/import of the analysis state (data and intermediates) via a single pickle file for reproducibility.

Fig 18. VASCilia enables the ear research community to process their cochlea samples through an end-to-end workflow, all within a user-friendly interface.

Fig 18

To demonstrate the robustness and practical utility of VASCilia, we apply it to challenging preclinical datasets. These include datasets from neonatal (P5) mice, whose shorter stereocilia are historically difficult to analyze, spanning a range of phenotypes from wild type (WT) to knockout (Eps8 KO) and AAV-rescued genotypes. Furthermore, we showcase VASCilia's ability to move beyond simple measurements by quantifying complex bundle disorganization in a genetic model of deafness (Cdh23 mutants), demonstrating its capability to analyze intermediate and abnormal phenotypes that are otherwise intractable to quantify at scale. We anticipate that our AI-enhanced plugin will streamline the laborious process of manual quantification of hair cell bundles in 3D image stacks, offering high-throughput and reproducible analysis with clear biological interpretation.

2 Results

2.1 Z-Focus tracker for 3D images of cochlear hair cells

In this section, we present the performance and evaluation of our custom network, ZFT-Net, along with comparisons to other commonly used architectures such as ResNet10, ResNet18, DenseNet121, and EfficientNet. Our goal was to classify image frames within each 3D z-stack into three zones: the pre-cellular zone (PCZ), the cellular clarity zone (CCZ), and the noise saturation zone (NSZ), to focus segmentation and analysis on the portion of the z-stack that contains clear cellular structures. The results shown in Table 1 and Fig 2 report accuracy, precision, recall, specificity, F1 score, error rate, and prediction stability. We found that ZFT-Net produced the best results. All metrics were averaged across the three classes and calculated using true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) derived from the confusion matrix.

Table 1. Comparison of architectures based on performance metrics for Z-focus tracker.

The results highlight that ZFT-Net outperforms all other architectures across all evaluation metrics. Notably, ZFT-Net achieved a switch count of 0, meaning its predictions consistently transitioned from class 0 (pre-cellular zone) to class 1 (cellular clarity zone), and finally to class 2 (noise saturation zone), without fluctuations or misclassifications. This indicates the model's robustness in maintaining a stable and accurate prediction sequence. Source data are provided in Supplementary Information S1 Data.xlsx.

Architecture Accuracy Precision Recall Specificity F1 Score Error Rate Confusion Matrix Switches
ZFT-Net 97.66 94.14 96.01 98.48 94.97 0.03 [23710672703159] 0
ResNet18 94.36 89.59 87.46 95.81 88.16 0.08 [236506652316143] 9
ResNet10 94.77 92.18 88.65 95.89 89.52 0.07 [2313012722201144] 6
DenseNet121 95.87 92.43 90.67 97.05 91.21 0.06 [241512692002145] 3
EfficientNet 95.46 90.01 89.98 96.68 89.99 0.06 [236907601007156] 3

Fig 2. The confusion matrix for ZFT-Net is presented alongside a corresponding bar plot, illustrating the overlapping predictions for the CCZ class, represented by the digit 1.

Fig 2

Errors are color-coded and oriented to reflect the different cases in the confusion matrix. GT0, GT1, and GT2 correspond to the Pre-Cellular Zone (PCZ), Cellular Clarity Zone (CCZ), and Noise Saturation Zone (NSZ), respectively. At the bottom of the figure, we provide an example of the first subject's prediction, visualized as a bar plot. It shows a smooth transition in prediction from 0 (PCZ), to 1 (CCZ), to 2 (NSZ) without any fluctuation errors, except for two minor boundary errors that are likely due to annotator subjectivity. For ZFT-Net, Conv represents convolution layers, BN stands for batch normalization, ReLU is the rectified linear unit activation function, Pool refers to pooling layers, and FC denotes fully connected layers.

One notable feature of ZFT-Net is that it does not produce prediction fluctuation errors, which we term "switches". This consistency in its predictions offers a high degree of confidence, making ZFT-Net our primary choice for VASCilia. As described in Sect 4.3, the preparation of the data and the configuration of the CNN played a crucial role in achieving these results. We speculate that ZFT-Net outperforms the other well-known networks because our images are downsampled to 256×256 resolution; deeper networks tend to lose spatial characteristics in the process, which negatively impacts their performance. The architecture of ZFT-Net, on the other hand, is better suited to preserve these critical spatial features, resulting in better performance across all metrics.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Error Rate} = \frac{\sum_{i \neq j} \text{Confusion Matrix}_{i,j}}{\sum_{i,j} \text{Confusion Matrix}_{i,j}}$$
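To make these definitions concrete, the macro-averaged metrics and the switch count can be computed from a confusion matrix and an ordered per-slice prediction sequence roughly as follows. This is a minimal sketch, not the exact VASCilia implementation; in particular, the definition of a "switch" as any transition that moves backwards or skips a zone is our assumption of one plausible reading of "fluctuation error":

```python
import numpy as np

def averaged_metrics(cm):
    # cm: (K, K) confusion matrix, rows = ground truth, columns = predictions
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = total - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": ((tp + tn) / total).mean(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "specificity": (tn / (tn + fp)).mean(),
        "f1": (2 * precision * recall / (precision + recall)).mean(),
        "error_rate": (total - tp.sum()) / total,  # off-diagonal mass
    }

def count_switches(zone_preds):
    # zone_preds: per-slice labels ordered by z (0 = PCZ, 1 = CCZ, 2 = NSZ);
    # count transitions that move backwards or skip a zone (one plausible
    # definition of a prediction "fluctuation"; an assumption on our part)
    return sum(cur not in (prev, prev + 1)
               for prev, cur in zip(zone_preds, zone_preds[1:]))
```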

2.2 Rotation correction and alignment of 3D image stacks with PCPAlignNet

The orientation of 3D cochlear image stacks is highly variable due to inconsistencies in imaging parameters, tissue handling, and the complex spiral structure of the cochlea. Automated orientation of cochlear hair cell image stacks can significantly accelerate batch processing, which is essential for efficiently handling large datasets. To address this challenge, we generated training data for PCPAlignNet, which aligns each 3D stack to orient the hair cell bundles along the planar polarity axis.

After data preparation and augmentation (Sect 4.4), our training dataset comprised 72 folders (classes, one per 5° rotation increment), each containing 1,831 images. We experimented with several networks, and DenseNet121 yielded the best performance; see Table 2.

Table 2. Performance comparison of different architectures using various error metrics reveals that DenseNet121 has optimal performance across all the metrics.

Source data are provided in Supplementary Information S1 Data.xlsx.

Architecture MAE RMSE Max Error Median Absolute Error R-squared
ResNet50 10.53 41.36 180 0 0.86
ResNet18 10 41.33 180 0 0.86
MobileNet 10 41.33 180 0 0.86
DenseNet121 0 0 0 0 1

During inference, we corrected the orientation of the stereocilia bundles by predicting the rotation angle for each image in the tested stack using the pre-trained model. For each frame, the model produced class scores corresponding to possible angles of rotation. The predicted scores were averaged across all frames in the stack to ensure that the final decision accounted for the model's predictions for every frame. The angle corresponding to the highest average score (class × 5°) was selected as the final predicted orientation. To achieve proper alignment, the images were rotated by 360° minus the predicted angle.
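A minimal sketch of this inference step, assuming per-frame class scores from the model (72 classes at 5° increments) and using scipy for the rotation; the sign convention of the correction angle depends on the rotation routine and is an assumption here:

```python
import numpy as np
from scipy.ndimage import rotate

def align_stack(stack, frame_scores, degrees_per_class=5):
    # stack: (Z, H, W) image array
    # frame_scores: (Z, 72) per-frame class scores; class k maps to k*5 degrees
    avg_scores = np.asarray(frame_scores).mean(axis=0)   # average over frames
    predicted = int(avg_scores.argmax()) * degrees_per_class
    correction = 360 - predicted                          # undo the rotation
    aligned = np.stack([rotate(frame, correction, reshape=False)
                        for frame in stack])
    return aligned, predicted
```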

2.3 A 3D instance segmentation pipeline for stereocilia bundles

Our method performs 2D detection on individual frames, followed by reconstruction of each 3D object through a multi-object assignment algorithm. For detailed methodology, refer to Sect 4.6.

We partitioned the dataset into training, validation, and testing groups at the stack level to avoid data leakage from mixing frames of different stacks. The training set includes 30 stacks, the validation set contains 5 stacks, and 10 stacks are withheld for testing and evaluation. The test stacks are categorized into two types: typical and complex cases. Six stacks fall into the typical category, featuring images that are easier to segment due to less overlap among the bundles and well-separated bundle rows. The remaining four stacks are classified as complex cases; these contain structures that pose significant segmentation challenges due to various factors. The objective of this partitioning is to evaluate the decrease in performance when processing challenging cases and to check the robustness of the detection algorithm. Fig 3A–3E illustrates the model's proficiency in accurately detecting the cells and demonstrates its effectiveness in overcoming the inherent challenges associated with this task. As a representative single-stack example, Fig 4 shows 13 frames from one stack segmented in 2D; a multi-object assignment algorithm then reconstructs each 3D object. Details are provided in Sect 4.6.

Fig 3. Panels A, B, and C show instances where inter-row stereocilia bundles are in close proximity, yet the algorithm successfully separates them, demonstrating robust performance.

Fig 3

Panel D presents a scenario where intra-row bundles are tightly clustered; however, the algorithm efficiently distinguishes each bundle, highlighting its ability to resolve complex spatial relationships within the samples. In Panel E, despite the raw image being notably dark, the algorithm remains effective in detecting and segmenting individual bundles.

Fig 4. The algorithm successfully detects stereocilia bundles in 2D across successive frames, assigning unique IDs to each detected object within the 2D plane.

Fig 4

The variation in IDs across frames reflects the independent detection process in each 2D frame. Subsequently, a multi-object assignment algorithm intervenes to reconcile these IDs, effectively re-assigning them to maintain consistency across frames. This step is crucial for reconstructing accurate 3D objects, ensuring that each bundle retains a consistent ID throughout all frames.
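For illustration, a greedy frame-to-frame linker in the spirit of this ID-reconciliation step might look as follows; this is a simplified sketch, not the exact multi-object assignment algorithm of Sect 4.6, and the IoU threshold is a placeholder assumption:

```python
import numpy as np

def link_frames(frames, iou_thresh=0.3):
    # frames: list of 2D integer label images (0 = background), ordered by z;
    # returns a (Z, H, W) volume with IDs made consistent across frames
    linked = [frames[0].copy()]
    next_id = int(frames[0].max()) + 1
    for cur in frames[1:]:
        prev, out = linked[-1], np.zeros_like(cur)
        for lab in np.unique(cur):
            if lab == 0:
                continue
            mask = cur == lab
            overlap, counts = np.unique(prev[mask], return_counts=True)
            counts, overlap = counts[overlap != 0], overlap[overlap != 0]
            if overlap.size:
                best = overlap[counts.argmax()]   # largest-overlap candidate
                inter = counts.max()
                union = mask.sum() + (prev == best).sum() - inter
                if inter / union >= iou_thresh:
                    out[mask] = best              # continue the 3D object
                    continue
            out[mask] = next_id                   # no match: new 3D object
            next_id += 1
        linked.append(out)
    return np.stack(linked)
```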

After completing the training process, we employed the trained model to segment each bundle in the images from the testing set. To evaluate performance, we calculate the Intersection over Union (IoU) for 3D volumes by comparing predicted and ground truth segmentation masks. It is defined as $\text{IoU} = \frac{|A \cap B|}{|A \cup B|}$, where $|A \cap B|$ is the volume of the intersection of the two volumes and $|A \cup B|$ is the volume of their union. In addition, we calculate the F1 measure, accuracy, precision, and recall for the predictions against the ground truth. The evaluation process iterates through unique labels found in both ground-truth and predicted volumes, excluding the background label, and computes the IoU for the corresponding labels. This method tracks TP, FP, and FN across varying IoU thresholds. For instance, if one 3D bundle in the ground truth overlaps with two bundles in the predictions, the algorithm considers the one with the largest overlap a TP and the unmatched one a FP. We perform evaluations across a range of IoU thresholds, from 0.1 to 1 in steps of 0.05. In object detection, we do not consider true negatives (TN), because a TN corresponds to correctly labeling background or non-object pixels as such. Since the background or non-object areas can be vast, counting TNs would skew the performance metric towards the most common class (background), which does not provide useful information about the model's ability to detect objects of interest.
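A compact sketch of this matching procedure at a single IoU threshold, using greedy largest-overlap matching; `gt_vol` and `pred_vol` in the usage comment are hypothetical variable names:

```python
import numpy as np

def match_at_threshold(gt, pred, thr):
    # gt, pred: 3D integer label volumes (0 = background)
    gt_labels = [l for l in np.unique(gt) if l != 0]
    pred_labels = [l for l in np.unique(pred) if l != 0]
    matched, tp = set(), 0
    for g in gt_labels:
        a = gt == g
        best, best_iou = None, 0.0
        for p in pred_labels:
            if p in matched:
                continue
            b = pred == p
            iou = np.logical_and(a, b).sum() / np.logical_or(a, b).sum()
            if iou > best_iou:
                best, best_iou = p, iou
        if best is not None and best_iou >= thr:
            matched.add(best)
            tp += 1
    fn = len(gt_labels) - tp              # ground-truth bundles with no match
    fp = len(pred_labels) - len(matched)  # unmatched predictions
    return tp, fp, fn

# Example sweep over thresholds 0.1 to 1.0 in steps of 0.05:
# for thr in np.arange(0.1, 1.0001, 0.05):
#     tp, fp, fn = match_at_threshold(gt_vol, pred_vol, thr)
#     f1 = 2 * tp / (2 * tp + fp + fn)
```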

Fig 5A indicates that the average F1-measure and accuracy are 99.4% and 98.8%, respectively, at an IoU threshold of 0.5 for the typical test set. In contrast, Fig 5C reveals that at the same IoU level, the average F1-measure and accuracy decrease to 95.7% and 91.9% for the complex test set. Maintaining the F1 measure and accuracy above 90% demonstrates that the algorithm effectively handles the significant challenges present in the images. We observe from Fig 5B and 5D that the total number of TPs remains high until the IoU threshold reaches 0.8 and 0.72, respectively, indicating that the algorithm correctly identifies and matches relevant objects up to these points; beyond them, the increase in FPs and FNs shows that the algorithm struggles to balance precision and recall as the matching criterion becomes stricter. Fig 6 showcases ten crops of stereocilia bundles: the first row displays the raw crops, the second and third rows feature ground truth manual annotations made by two human annotators in the open-source Computer Vision Annotation Tool (CVAT [36]), and the fourth row presents our predicted 3D volumes for each bundle. It is evident that most bundles achieve IoU scores between 0.7 and 0.8, which visually suggests near-perfect alignment.

Fig 5. Detailed performance indicators are presented.

Fig 5

The metrics are evaluated across two categories: typical cases that do not present extreme challenges and complex cases affected by factors such as contrast variations, noise, overlapping bundles, and close proximity between them. Source data are provided in Supplementary Information S1 Data.xlsx.

Fig 6. Comparison of raw crops (first row), GT masks from the first human annotator (second row), GT masks from the second human annotator (third row), and predicted masks (fourth row) for 10 different crops from the same stack; the IoU scores beneath the crops are computed between the 3D GT masks and the 3D predicted masks.

Fig 6

The dataset was annotated by five different human annotators. To put the accuracy of our predictions in a meaningful context, we examined the margin of error between two annotators on the same stack (47 instances). The average IoU between the two annotators was 0.70, whereas the model achieved a higher average IoU of 0.76 with the first annotator and 0.74 with the second. We speculate this is because the model effectively averages the experience of multiple annotators (five in our case). These results also highlight the difficulty of achieving accuracies beyond a certain level, as even two human expert annotators agreed with only 70% overlap on pixel-based 3D annotations. In addition, paired t-tests revealed that the model's predictions are significantly better aligned with both Annotator 1 (t = 3.7399, p = 0.0002) and Annotator 2 (t = 7.8134, p < 0.0001) than the two annotators are with each other. No significant difference was found between the model's alignment with Annotator 1 and Annotator 2 (t = −1.7431, p = 0.0879), indicating that the model generalizes equally well across both sets of annotations (Fig 7A and 7B). These results underscore the robustness of the model in producing reliable and consistent predictions, which in some cases surpass the level of agreement between human expert annotators.

Fig 7. Comparison of the IoU distribution (left) and density plot (right) between model predictions and annotators. Source data are provided in Supplementary Information S1 Data.xlsx.

Fig 7

[A] Box plot comparing the IoU alignment of model predictions with Human 1 and Human 2, and the agreement between Human 1 and Human 2. Statistical tests indicate that the model predictions are significantly better aligned with both Human 1 (p = 0.0002) and Human 2 (p < 0.0001) compared to the agreement between the two annotators. Notably, there is no significant difference between the model's alignment with Human 1 and Human 2 (p = 0.0879). [B] A density plot (KDE) estimates the distribution of IoU values by smoothing the data with small curves (kernels) at each point. The y-axis shows the density, representing the relative likelihood of observing the IoU values at different points on the x-axis. The density curves for Model vs. Human 1 and Model vs. Human 2 show considerable overlap, indicating similar IoU distributions between the model and both annotators. In contrast, Human 1 versus Human 2 shows less overlap, suggesting greater variability between the annotators' manual segmentation.

2.4 VASCilia computational tools and measurements

The Napari plugin equips users with essential measurements and deep learning-based tools tailored for analyzing cochlear hair cell stereocilia bundles. Users can accurately obtain up to 15 different 2D and 3D measurements, including volume, surface area, and centroids of segmented regions. Beyond these fundamental measurements, the plugin includes specialized metrics critical for hair cell research, such as calculating stereocilia bundle heights, predicting the region of the cochlea from which the stack is taken (Base, Middle, Apex), classifying hair cells (e.g. inner vs. outer hair cell subtypes), measuring fluorescence signal within each bundle, and determining bundle orientation with respect to the planar polarity axis of the cochlea. These features are designed to support the most common goals of the hair cell imaging research community.

2.4.1 Validation of stereocilia bundle height measurement.

VASCilia significantly reduces the time required for bundle height measurements. The manual and computational approaches used for these measurements are detailed in Sect 4.7. The plugin provides users with a CSV file that details the height measurements for each bundle, each tagged with a corresponding ID. Fig 8 illustrates the computation, and Table 3 and Fig 9 present bundle height validation between VASCilia and two human expert annotators for 18 cells from four different datasets (two WT and two Eps8 KO with AAV injection [37]). We found it takes an average of 5.5 minutes to manually annotate each bundle height using the most commonly used open-source microscopy image analysis tool, Fiji [38]. In contrast, VASCilia reduces this to a single button press that computes the heights of all bundles in a 3D stack (as many as 55) in about one second, with at most five minutes needed for refining the base and top points of bundles if necessary. Thus, measuring the lengths for 10 stacks, each containing 50 cells, would take 2,750 minutes (approximately 46 hours) with Fiji, compared to only 50 minutes with VASCilia. We conducted both paired t-tests and Pearson correlation analyses to evaluate the agreement between VASCilia and human expert annotators. The Pearson correlation analysis showed very strong positive relationships: 0.942 (p < 0.001) between VASCilia and annotator 1, 0.802 (p < 0.001) between VASCilia and annotator 2, and 0.830 (p < 0.001) between annotators 1 and 2. The paired t-test results indicate no statistically significant differences between the measurements obtained by VASCilia and annotator 1 (t = 1.868, p = 0.078) or annotator 2 (t = 1.191, p = 0.249). No significant difference (t = 0.180, p = 0.860) was observed between the measurements from observers 1 and 2. Together, our findings suggest VASCilia performs comparably to human expert annotators, and that the annotators themselves are consistent in their measurements; see Table 4.

Fig 8. An illustration of the plugin’s automated method for measuring the distance from the tip to the bottom of stereocilia bundles.

Fig 8

Users can adjust the positions of the upper and lower points. Upon making these adjustments, the plugin’s listeners automatically detect the changes, redraw the connecting line, and recalculate the distance accordingly. Left: shows the 3D visualization using Napari. Right: demonstrates how the distance is computed from x, y, and z, considering the physical resolution in microns.
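A minimal sketch of this tip-to-base computation; the voxel sizes are placeholder assumptions and should be replaced with the acquisition's actual physical resolution:

```python
import numpy as np

def bundle_height_um(top_zyx, base_zyx, voxel_um=(0.3, 0.05, 0.05)):
    # top_zyx, base_zyx: (z, y, x) voxel coordinates of the bundle tip and base
    # voxel_um: physical voxel size in microns (placeholder values; replace
    # with the acquisition's actual resolution)
    delta = (np.asarray(top_zyx, float) - np.asarray(base_zyx, float)) * voxel_um
    return float(np.linalg.norm(delta))

# Example: a tip 10 z-planes above a base offset by (40, 25) pixels in-plane
print(bundle_height_um((22, 140, 260), (12, 180, 235)))
```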

Table 3. Bundle height measurements (in μm) from VASCilia and two human observers, with descriptive statistics.
Dataset Sample ID VASCilia Observer 1 Observer 2
DataSet1 2 3.520 3.513 3.431
19 2.786 2.675 2.883
25 4.129 3.590 3.947
32 3.728 3.367 3.613
DataSet2 10 3.816 3.474 3.466
32 3.028 2.633 2.454
24 3.748 3.534 3.575
31 3.572 3.389 3.159
DataSet3 3 2.006 2.189 2.646
39 1.678 1.897 2.426
34 2.099 2.202 2.006
46 4.314 4.543 4.313
33 3.671 3.624 2.944
48 3.304 3.519 3.194
DataSet4 3 2.738 2.446 2.916
47 2.797 2.706 3.206
17 2.531 2.628 2.044
54 3.004 2.559 1.965
Statistics Mean Median Std. Dev.
VASCilia 3.137 3.166 0.725
Observer 1 3.027 3.037 0.663
Observer 2 3.010 3.052 0.649
Fig 9. Dataset crops associated with the computations in Table 3; see S2 Fig A–D, where all four complete datasets are visualized for reference.

Fig 9

Table 4. Pearson correlation coefficients, p-values, and paired t-test results.
Comparison Correlation Coefficient Correlation p-value t-test Statistic t-test p-value
VASCilia vs Observer 1 0.942 <0.001 1.868 0.078
VASCilia vs Observer 2 0.802 <0.001 1.191 0.249
Observer 1 vs Observer 2 0.830 <0.001 0.180 0.860

To further validate stereocilia length quantification, we compared VASCilia against manual Fiji measurements on n = 15 Eps8 KO cells sampled from six stacks (two Base, two Middle, two Apex) drawn from the dataset used in Fig 10A. We also evaluated n = 15 cells lacking functional Cdh23 [39] (Cdh23v-6J/v-6J, referred to as Cdh23−/−), chosen to represent significantly disrupted bundles to assess generalization. Across both cohorts, the mean lengths were highly comparable and the paired differences were not statistically significant (paired t-test and Wilcoxon signed-rank); see S7 and S8 Tables.
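These paired comparisons are straightforward to reproduce with scipy; the sketch below uses illustrative values taken from the VASCilia and Observer 1 columns of Table 3 rather than the actual S7/S8 source data:

```python
import numpy as np
from scipy import stats

# Paired per-bundle height measurements in microns (values from Table 3,
# DataSet1 and DataSet2, VASCilia vs. Observer 1)
vascilia = np.array([3.520, 2.786, 4.129, 3.728, 3.816, 3.028, 3.748, 3.572])
observer = np.array([3.513, 2.675, 3.590, 3.367, 3.474, 2.633, 3.534, 3.389])

t_stat, p_t = stats.ttest_rel(vascilia, observer)   # paired t-test
w_stat, p_w = stats.wilcoxon(vascilia, observer)    # Wilcoxon signed-rank
r, p_r = stats.pearsonr(vascilia, observer)         # Pearson correlation
print(f"t={t_stat:.3f} (p={p_t:.3f}), W={w_stat:.1f} (p={p_w:.3f}), r={r:.3f}")
```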

Fig 10. This figure consists of two panels.

Fig 10

Panel (A) on the left shows bundle heights and tonotopic variations of phalloidin fluorescence intensity in P5 mouse cochlear hair cells across all cochlear regions. The tonotopic gradient observed here may not be representative of other developmental ages, which were not examined in this study. Panel (B) presents the anti-EPS8 analysis, comparing anti-EPS8 fluorescence intensity between wild type and knockout stereocilia bundles from the mid-cochlear region at P30. Source data are provided in Supplementary Information S1 Data.xlsx.

2.4.2 Fluorescence intensity measurements.

Understanding signal levels, whether from one or several proteins or a particular stain, is crucial in various research fields, including cochlear function. The plugin enables users to obtain a precise quantification of the signal, allowing researchers to gain deeper insights into its function and potential involvement in hearing loss.

The plugin has a mechanism to calculate both the mean and total fluorescence intensity, subtracting the background intensity (optional) to correct for ambient noise. This is achieved by superimposing the 3D segmented masks onto the corresponding fluorescent image slices and aggregating the fluorescent intensities across the z-stack.

The resulting intensities are normalized using max normalization and plotted, allowing for comparative analysis across different cell types such as inner hair cells (IHCs) and outer hair cells (OHCs) in various categories (OHC1, OHC2, and OHC3). Data for each cell type are stored and visualized through histograms that display the mean and total intensity distributions, providing insights into protein or signal expression across the cochlear architecture. To enable further downstream analysis, users obtain plots for each cell type, resulting in a total of 20 outputs per stack (10 plots and 10 CSV files). See Fig 11 for the total intensity histogram bars for all cells in three modes, illustrated in Sect 3, from the Napari plugin. The plugin can also be used to analyze and generate plots for other proteins in different channels as needed by the study.
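A minimal sketch of this mask-based aggregation, assuming a 3D instance-label volume and a matching fluorescence stack; this is illustrative rather than the exact plugin code:

```python
import numpy as np

def bundle_intensities(labels, fluor, subtract_bg=True):
    # labels: 3D integer instance masks (0 = background)
    # fluor: 3D fluorescence stack of the channel to quantify
    bg = fluor[labels == 0].mean() if subtract_bg else 0.0
    results = {}
    for lab in np.unique(labels):
        if lab == 0:
            continue
        vox = fluor[labels == lab].astype(float) - bg
        results[lab] = {"total": vox.sum(), "mean": vox.mean()}
    peak = max(v["total"] for v in results.values())
    for v in results.values():
        v["total_norm"] = v["total"] / peak   # max normalization
    return results
```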

Fig 11. The plot demonstrates the utility and ease with which users can compare data between bar plots and actual cellular images within the plugin.

Fig 11

Top: Segmented bundles; class legend (IHC WT = yellow, OHC WT = red). Green numbers mark each bundle's ID. Middle: Per-bundle fluorescence (phalloidin) for three depth-aggregation modes: Mode 1: All layers (uses each bundle's full depth); Mode 2: Common depth only (truncates to n = min depth across bundles); Mode 3: Pad to max depth (pads shallower bundles to n = max depth by repeating the weakest-layer value, then sums). Bottom: Galleries from four example cells (sharing the same depth). Across cells, IHC bundles show consistently higher fluorescence than OHC bundles, reflecting greater F-actin signal. Source data are provided in Supplementary Information S1 Data.xlsx.
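The three depth-aggregation modes described in the caption can be summarized in a short sketch; which end of the z-range is truncated or padded is not specified in the text and is an assumption here:

```python
import numpy as np

def aggregate_depth(per_layer, mode):
    # per_layer: dict {bundle_id: 1D array of per-z-layer intensity sums}
    if mode == 1:                      # Mode 1: all layers (full depth)
        return {k: v.sum() for k, v in per_layer.items()}
    if mode == 2:                      # Mode 2: truncate to common (min) depth
        n = min(len(v) for v in per_layer.values())
        return {k: v[:n].sum() for k, v in per_layer.items()}
    if mode == 3:                      # Mode 3: pad to max depth by repeating
        n = max(len(v) for v in per_layer.values())   # the weakest-layer value
        return {k: np.pad(v, (0, n - len(v)), constant_values=v.min()).sum()
                for k, v in per_layer.items()}
    raise ValueError("mode must be 1, 2, or 3")
```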

2.4.3 Case Study 1: Differences in stereocilia bundle heights and fluorescence signals (phalloidin and anti-EPS8, WT vs. Eps8 KO).

Analysis of bundle heights and phalloidin signal across cochlear regions and hair cell types in neonatal P5 WT and Eps8 KO mice (Fig 10A):

To challenge our approach and demonstrate its utility, we conducted a study involving 18 cochlear stacks from neonatal P5 mice, with samples from three wild type (WT) animals and three Eps8 KO mice, which have previously been shown to have abnormally short stereocilia [37]. Each animal’s cochlea was analyzed across three regions (Base, Middle, and Apex), and within each region, we studied two cell types (IHCs and OHCs). See Fig 10A (top row) for a detailed analysis of stereocilia bundle height variations between WT and KO. The violin plots were generated using data processed through the VASCilia plugin. Consistent with prior studies, VASCilia showed a gradient of increasing bundle heights along the base-mid-apex tonotopic axis for both WT and Eps8 KO mice, with Eps8 KO mice showing significantly shorter bundles than WT. S1 and S2 Tables summarize bundle height distributions (mean, SD, median, N) and the pairwise tonotopic differences. They show a robust Base→Middle→Apex increase in WT (IHC/OHC) and KO OHCs, with the only non-significant step being Base vs. Middle in KO IHC (Holm-adjusted tests).

See Fig 10A (bottom row) for the results of our challenge case study on the accumulated phalloidin signal between WT and Eps8 KO mice in three regions (Base, Middle, and Apex) for two cell types (IHC and OHC). Intensities were computed as the 3D sum within each bundle and globally min–max normalized across all WT and KO samples to place values on a scale of 0 to 1. In this analysis, we applied mode 2, discussed in Sect 3, separately to each dataset. For a given dataset D, we set n to the largest depth shared by all bundles within D; equivalently, n = min{bundle depths in D}. This dataset-specific choice of n ensures that, within each dataset, all bundles contribute the same number of layers. The IHC WT groups showed the highest normalized intensities in the Base region (median 0.84), with a tonotopic gradient of decreasing intensity from Base to Apex across all groups. KO groups exhibited a significant reduction in intensity compared to WT, with the OHC KO Apex group showing the lowest median intensity of 0.11. See S3 and S4 Tables, which report descriptive statistics (mean, SD, median, N) and pairwise contrasts (Δ, 95% CI, Hedges' g, Holm-adjusted p-values); significant comparisons are indicated.

Analysis of anti-EPS8 signal in mid-cochlear inner hair cells of mature P30 WT and Eps8 KO mice (Fig 10B): Quantitative analysis of stereocilia bundle morphology and anti-EPS8 signal in postnatal day 30 (P30) inner hair cells (IHCs) revealed pronounced structural and molecular differences between wild type (WT) and Eps8 knockout (KO) mice. WT bundles (n = 14) exhibited a mean length of 3.59 ± 0.29 μm (SD; range: 3.12–4.28 μm), whereas Eps8 KO bundles (n = 15) were significantly shorter, measuring 1.22 ± 0.18 μm (SD; range: 0.90–1.53 μm; Welch’s t = 25.89, df = 21.29, p = 1.4 × 10−17). To evaluate bundle-specific anti-EPS8 labeling intensity, total fluorescence intensity was quantified within the same segmented IHC bundles. WT bundles exhibited a mean total intensity of 0.616 ± 0.059 (SD), while Eps8 KO bundles showed markedly lower values of 0.094 ± 0.012. Normalization was performed by dividing each bundle’s total intensity by the maximum total intensity observed across both WT and KO groups. This reduction was highly significant (Welch’s t = 8.63, df = 14.16, p = 5.1 × 10−7). These findings demonstrate that the loss of Eps8 leads to a substantial decrease in both the length of the stereocilia and the anti-EPS8 signal, consistent with its essential role in the development and maintenance of the stereocilia F-actin architecture.

2.4.4 Case Study 2: Fluorescence-based bundle texture analysis and classification in Cdh23−/− and Cdh23+/− hair cells.

Quantifying bundle disorganization (Cdh23−/− vs. Cdh23+/−) (Fig 12A): Here we present an illustrative case study demonstrating VASCilia’s utility beyond bundle-height and intensity measurements. We analyzed 3D image stacks from Cdh23v-6J/v-6J mice, referred to here as Cdh23−/− [39], to quantify texture differences between genotypes using gray-level co-occurrence matrix (GLCM) and local binary pattern (LBP) features. We compared heterozygous controls (Cdh23+/−; n = 187 crops) with homozygous knockouts (Cdh23−/−; n = 176 crops). Cdh23+/− bundles were well organized, whereas Cdh23−/− bundles were visibly disorganized, consistent with the expected phenotype.

Fig 12. This figure presents the texture analysis displayed at the top as panel (A), and the anti-CDH23 fluorescence quantification analysis included at the bottom as panel (B).

Fig 12

Source data are provided in Supplementary Information S1 Data.xlsx. (Panel A) [Top row]: t-SNE of texture features [I to IV] and PCA projection show clear genotype separation in feature space, with crops from the two genotypes providing visual representation. (Panel A) [I to V]: GLCM energy by genotype (Cdh23−/− bundles have lower energy than Cdh23+/−, reflecting less uniform, more complex textures; differences are statistically significant); GLCM correlation by genotype (Cdh23−/− bundles exhibit lower correlation than Cdh23+/−, indicating weaker spatial regularity; differences are statistically significant); GLCM contrast by genotype (Cdh23−/− bundles show higher contrast than Cdh23+/−, consistent with greater local intensity variation; differences are statistically significant); GLCM homogeneity by genotype (Cdh23−/− bundles show reduced homogeneity relative to Cdh23+/−, consistent with more uneven local texture; differences are statistically significant); and Random forest feature importance highlights key GLCM/LBP drivers. (Panel B): Anti-CDH23 immunofluorescence quantification analysis for P7 IHC and OHC stereocilia bundles from a mid-cochlear location. Each data point represents a value obtained from an individual bundle, normalized to the maximum value within the dataset. Exemplar images on the right show stereocilia bundles stained with phalloidin (white) and anti-CDH23 antibody (magenta), with the automatically generated stereocilia bundle mask overlaid.

To quantify these differences, we extracted texture features from segmented bundle crops using VASCilia. Specifically, we computed GLCM features—contrast, correlation, energy, and homogeneity—and LBP [40] histograms for each crop. Cdh23−/− samples had higher median contrast (136.88 vs. 74.76), consistent with more heterogeneous texture; lower correlation (0.98 vs. 0.99), reflecting weaker spatial regularity; lower energy (0.0147 vs. 0.0157), indicating less uniformity; and lower homogeneity (0.1368 vs. 0.1782), consistent with greater local contrast. Two-sided Mann–Whitney U tests showed highly significant differences across all GLCM features (p<0.0001). Several LBP bins (0, 1, 2, 5, 6, 7, 9) also differed between genotypes (p<0.01), reflecting shifts in local texture patterns. Dimensionality reduction (PCA and t-SNE) visually revealed a clear separation between genotypes in feature space. To quantitatively validate this observation, we trained a random forest on the extracted features (training n = 290; test n = 73), which achieved 94.5% accuracy on held-out crops (4/73 misclassified). The most important features aligned with those that were statistically significant (Fig 12A); representative examples are shown in S5 Fig.
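A minimal sketch of this feature extraction using scikit-image (recent versions spell these functions `graycomatrix`/`graycoprops`); the distances, angles, and LBP parameters shown are assumptions, not necessarily those used in VASCilia:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def texture_features(crop):
    # crop: 2D uint8 grayscale crop of one segmented bundle
    glcm = graycomatrix(crop, distances=[1],
                        angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                        levels=256, symmetric=True, normed=True)
    feats = {p: graycoprops(glcm, p).mean()
             for p in ("contrast", "correlation", "energy", "homogeneity")}
    # "uniform" LBP with P=8 yields 10 pattern bins (0-9)
    lbp = local_binary_pattern(crop, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return feats, hist
```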

Anti-CDH23 fluorescence quantification analysis (Cdh23−/− vs. Cdh23+/−) (Fig 12B): To examine how loss of cadherin-23 affects anti-CDH23 staining, we quantified total fluorescence intensity in IHC and OHC bundles from Cdh23−/− and Cdh23+/− mice. Data were obtained from 7 Cdh23−/− and 6 Cdh23+/− datasets, all from the mid-cochlear location. Values were min–max scaled using the global range across all cells of both genotypes and cell types. As shown in Fig 12B, across all bundles, Cdh23−/− cells (n = 112) measured 0.202 ± 0.012 SEM, whereas Cdh23+/− cells (n = 76) had a mean normalized intensity of 0.482 ± 0.021 SEM. By cell type, IHC bundles were highest: Cdh23−/− (n = 34) 0.352 ± 0.021 SEM vs. Cdh23+/− (n = 24) 0.621 ± 0.040 SEM. OHC bundles showed the same trend: Cdh23−/− (n = 78) 0.137 ± 0.007 SEM vs. Cdh23+/− (n = 52) 0.417 ± 0.019 SEM. These results demonstrate a robust, genotype-dependent reduction in anti-CDH23 bundle fluorescence in Cdh23−/− hair cells across both IHCs and OHCs, consistent with cadherin-23 being required to maintain normal stereocilia organization and F-actin integrity in the mid-cochlea.

2.4.5 Tonotopic position prediction (BASE, MIDDLE, APEX).

One of the fascinating features of cochlear hair cell bundles is the so-called “tonotopic” organization, wherein the bundle heights follow a pattern of increasing lengths as a function of position along the length of the cochlea; see Fig 1. Since the frequency of sound detected follows a similar pattern along the length of the cochlea, we were motivated to generate a model that can accurately assess the tonotopic position of the imaged region of the cochlea. For this task, we trained a classification CNN; for more details about the training and architecture, see Sect 4.8. This method has demonstrated robust performance, achieving a subject-based accuracy of 97%; 28 out of 29 stacks were correctly identified. The single observed misclassification in our testing data involved a stack from the MIDDLE being incorrectly predicted as BASE, which we speculate was due to the close proximity and overlapping characteristics of these two cochlear regions. The confusion matrix for the regions BASE, MIDDLE, and APEX is shown in Fig 13.

Fig 13. Left: confusion matrix for model predictions; remaining panels: Grad-CAM visualization for Base, Middle, and Apex, showing the resized and overlaid response from the last convolutional layer, which highlights the model's focus on bundles during decision making.

Fig 13

Source data are provided in Supplementary Information S1 Data.xlsx.

To gain insights into the focus areas of our model during prediction, we employed Gradient-weighted Class Activation Mapping (Grad-CAM), a powerful visualization tool used to explain deep learning-based predictions. Grad-CAM helps visually identify which parts of an image are pivotal for a model’s decision-making process. Grad-CAM works by capturing the gradients of the target label, which is the correct class in this scenario, as they propagate back to the final convolutional layer just before the softmax. It then uses these gradients to weigh the convolutional feature maps from this layer, generating a heatmap that visually emphasizes the image regions most critical for predicting the class label. Grad-CAM visualization is shown in Fig 13.
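A generic Grad-CAM sketch in PyTorch illustrating the gradient-weighted feature-map averaging described above; this is not the authors' implementation, and `conv_layer` must be set to the model's last convolutional module:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, conv_layer):
    # image: (1, C, H, W) tensor; conv_layer: the model's last conv module
    feats, grads = [], []
    h1 = conv_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = conv_layer.register_full_backward_hook(
        lambda m, gin, gout: grads.append(gout[0]))
    model.zero_grad()
    score = model(image)[0, target_class]   # logit of the correct class
    score.backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # pool the gradients
    cam = F.relu((weights * feats[0]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze().detach()  # heatmap in [0, 1]
```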

2.4.6 High-throughput computation of bundle orientation.

Understanding the precise orientation of stereocilia bundles in the cochlea is critical for deciphering the intricate mechanics of hearing. These hair-like structures are arranged in a “V” shape due to a phenomenon called planar cell polarity (PCP). PCP ensures that neighboring hair cells and their stereocilia bundles are aligned in a specific direction, which is fundamental for both the directional sensitivity and efficient mechanotransduction necessary for our sense of hearing. Studying and potentially manipulating PCP holds immense potential for developing better hearing aids, diagnosing hearing loss with greater accuracy, and understanding the mechanisms behind various auditory disorders. In addition, it provides information on the effects of genetic mutations and environmental factors on hearing and facilitates comparative studies that explore evolutionary adaptations in hearing.

We developed two automated mechanisms, “Height only” and “Height and Distance”, in our plugin to automatically obtain the orientation of the stereocilia bundles based on our 3D segmentation masks:

For the ‘Height only’ approach, our method pinpoints and records the lowest points on both the left and right sides of the centroid within the 2D projection. The ‘Height and Distance’ method not only locates these points but also measures distances from a specific peak point to identify the most remote and lowest points on each side. While the ‘Height only’ approach works well for cells that exhibit a clear V-shape, as shown in Fig 14, it is less effective for cells with atypical shapes, commonly encountered in the apex region. This limitation led to the development of the ‘Height and Distance’ method, which reliably accommodates cells with irregular shapes; see Fig 15.

Fig 14. Automated Computation of Stereocilia Bundle Orientation Using a Height-Only Method.

Fig 14

Top Left: illustrates the bundle orientations superimposed on the raw data, Top right: displays the 3D segmentation masks with bundle orientation highlighted. Bottom Right and Left are cropped regions for a closer look. Source data are provided in Supplementary Information S1 Data.xlsx.

Fig 15. The limitations of the Height-Only orientation computation.

Fig 15

While the Height-Only method excels with cells exhibiting a clear V-shape (Fig 14), it struggles with some apical region hair cells that lack this distinct structure. To address these challenges, we developed the Height and Distance approach, which effectively handles a wider variety of cell shapes. The visual comparison includes the frames from which the crops are taken (first column), Height-Only results superimposed on the 3D segmented labels (second column) and the original images (third column), alongside the Height and Distance results superimposed on the 3D segmented labels (fourth column) and the original images (fifth column). We observe that the Height and Distance method overcomes the limitations of the Height-Only computation. Source data are provided in Supplementary Information S1 Data.xlsx.

Following the orientation calculations, the script proceeds to generate lines and angles for each region. It connects the corresponding left and right points and calculates angles using the arctan2 function for deltaY and deltaX, providing a precise angular measurement. These orientation points, lines, and computed angles are then visualized in the Napari viewer through distinct layers, specifically designated for points, lines, and text annotations. This visualization is enhanced with carefully chosen colors and properties to ensure clarity and optimal visibility. To ensure dynamic interactivity within our visualization tool, event listeners are embedded to reflect any adjustments in orientation points directly on the orientation lines and measurements.
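A minimal sketch of the angle computation from the paired left and right points; mapping the result into [0, 180) for reporting is our assumption:

```python
import numpy as np

def bundle_orientation_deg(left_yx, right_yx):
    # left_yx, right_yx: (y, x) coordinates of the lowest point found on each
    # side of the bundle; the angle of the connecting line is taken via arctan2
    dy = right_yx[0] - left_yx[0]
    dx = right_yx[1] - left_yx[1]
    return float(np.degrees(np.arctan2(dy, dx)) % 180)  # report in [0, 180)

print(bundle_orientation_deg((80.0, 60.0), (150.0, 58.0)))  # ~91.6 degrees
```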

For Fig 14, the Fiji measurements had a mean of 94.04°, median of 92.32°, and standard deviation of 8.83°, while VASCilia reported a mean of 93.36°, median of 92.47°, and standard deviation of 8.44°. The Wilcoxon signed-rank test gave a p-value of 0.0652, and the paired t-test yielded a p-value of 0.0834, indicating no statistically significant difference. Similarly, for Fig 15, Fiji results showed a mean of 87.94°, median of 87.73°, and standard deviation of 7.80°, while VASCilia produced a mean of 86.88°, median of 85.75°, and standard deviation of 5.90°. The Wilcoxon test returned a p-value of 0.1901, and the paired t-test gave a p-value of 0.2738, again confirming no statistically significant difference. These results validate that VASCilia produces automated angle measurements that are consistent with manual Fiji annotations. See S5 and S6 Tables that record all comparison values.

We also quantified bundle orientation angles in PCP-deficient cochleae [41] to assess generalization. Early postnatal Pcdh15-R245X mouse cochleae were acutely dissected, fixed with 4% PFA, stained with phalloidin to visualize the stereocilia bundles, then mounted and imaged on a Leica SP8 confocal microscope using a 63×/1.3 NA objective. Validation against manual Fiji measurements showed comparable means without statistically significant differences (S9 Table; S4 Fig).

2.4.7 Hair cell classification (IHC, OHC1, OHC2, OHC3).

The categorization of hair cells into IHC, OHC1, OHC2, and OHC3 in cochlear studies is not merely a morphological distinction but is deeply tied to their physiological roles, susceptibility to damage, and their distinct roles in the auditory system. Higher throughput classification will enable more nuanced research exploration, diagnostics, and treatments in audiology and related biomedical fields. Identifying these rows manually is a time-consuming and laborious process. For this reason, we have enhanced our plugin with mechanisms to identify the four rows using three strategies: KMeans, Gaussian Mixture Model (GMM), and deep learning. As a result, this can significantly automate the process, providing high-throughput results for all cells in the current stack in just one second.

1. KMeans: We have implemented KMeans clustering due to its simplicity and efficiency in partitioning data into distinct clusters based on their features. Specifically, we use KMeans to categorize hair cells into four clusters, leveraging attributes such as proximity and density for grouping. This method calculates the centroids of the data points and iteratively minimizes the distance between these centroids and their assigned points, making it highly suitable for fast, general clustering tasks. KMeans proves particularly effective when IHCs and OHCs are distinctly separated and organized in linear rows; it performs best in the absence of outliers or significant overlap between rows.

2. Gaussian Mixture Model: The Gaussian Mixture Model (GMM) is a probabilistic model that assumes all the data points are generated from a mixture of several Gaussian distributions with unknown parameters. By fitting the GMM to the 'y' coordinates of hair cell endpoints (see the sketch after this list), we can predict the cluster for each cell, which aids in categorizing them into distinct groups based on their vertical spatial alignment. This approach is particularly useful in scenarios where the clusters might have different variances, which is often the case in biological tissues. The ability of GMM to accommodate mixed distribution models helps in accurately classifying cells into four predetermined clusters. The GMM offers flexibility over KMeans by accommodating clusters of varying shapes, sizes, and densities, which is crucial in biological data where such variations are prevalent. Unlike KMeans, which assumes spherical clusters and hard partitions, GMM models the probability of each point's membership in potential clusters, allowing for soft clustering.

3. Deep Learning: While KMeans and GMM provide robust and instant results for many samples (see the first row of Fig 16), these statistics-based methods can struggle with complex configurations, particularly in samples with outliers, overlapping rows, or non-linear, curvy arrangements. For the challenging scenarios shown in the second and third rows of Fig 16, we developed a deep learning approach that uses multi-class classification to accurately identify and categorize all hair cell bundles, effectively handling the spatial and structural complexities inherent in such data. See Sect 4.9 for the architecture and model training details.
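As an illustration of the two statistics-based strategies, the sketch below clusters bundle y-coordinates with scikit-learn and re-orders the cluster indices by mean y so that labels 0-3 map onto the IHC, OHC1, OHC2, and OHC3 rows; the single-coordinate input follows the GMM description above and is otherwise an assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def cluster_rows(y_coords, method="gmm"):
    # y_coords: per-bundle y positions (e.g., of hair cell endpoints)
    X = np.asarray(y_coords, dtype=float).reshape(-1, 1)
    model = (GaussianMixture(n_components=4, random_state=0) if method == "gmm"
             else KMeans(n_clusters=4, n_init=10, random_state=0))
    labels = model.fit_predict(X)
    # re-order cluster indices by mean y so 0..3 map to IHC, OHC1, OHC2, OHC3
    order = np.argsort([X[labels == k].mean() for k in range(4)])
    remap = {int(old): new for new, old in enumerate(order)}
    return np.array([remap[int(l)] for l in labels])
```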

Fig 16. First row: Successful cases for all methods—KMeans, GMM, and Deep Learning—in accurately clustering the four rows into their respective categories are clearly demonstrated.

Fig 16

This scenario represents an ideal case where each row is well-separated, linearly aligned, and free from outliers, which simplifies the task of accurate clustering. Second and third rows: Failure cases for KMeans and GMM in accurately clustering the four rows into their respective categories are evident: IHC in yellow, OHC1 in cyan, OHC2 in green, and OHC3 in magenta. These traditional methods often struggle to precisely segregate the rows due to their inherent limitations in handling complex data distributions, outliers, and overlapping clusters. In contrast, deep learning significantly outperforms both KMeans and GMM, providing accurate and reliable clustering for all cell types. Errors are marked by red bounding boxes. For the sample in the second row, there are five errors with KMeans, three with GMM, and none with deep learning; for the sample in the third row, there are sixteen errors with KMeans, twenty-eight with GMM, and none with deep learning.

For inference, we focused exclusively on stacks from the Apex region, where the cells frequently display non-linear growth, are closely packed, and often overlap, leading to higher error rates with KMeans and GMM methods. Conversely, cells in the Base and Middle regions are typically well-aligned. We conducted two experiments: In the first, we tested all Apex data, including 41 datasets from both training and testing phases. In the second experiment, we evaluated 25 stack datasets that were not included in the training set to verify the model’s performance on both seen and unseen data.

Fig 17A and 17B illustrate the error rates by method and dataset to assess performance per sample. Fig 17C and 17D present heatmaps of the errors by method and sample, offering a more visual representation of performance with a color map transitioning from black at the bottom to yellow at the top. These figures highlight the pronounced error frequency of the KMeans and GMM methods compared to our deep learning approach. Fig 17E and 17F display the cumulative errors, providing insight into how errors accumulate per dataset and method. Notably, the errors with KMeans and GMM, represented in blue and brown, show a steep increase, whereas the deep learning method maintains a consistently low error rate. In Experiment 1, across a total of 1,824 cells, the error counts were 202 for KMeans, 257 for GMM, and 12 for deep learning. In Experiment 2, involving 1,102 cells, the error counts were 136 for KMeans, 169 for GMM, and 11 for deep learning. We also conducted a one-way ANOVA (analysis of variance), a statistical test used to determine whether there are statistically significant differences between the means of three or more independent groups. We found significant differences between the methods (F = 10.45, p = 0.0001), rejecting the ANOVA null hypothesis that all group means are equal; a higher F-value indicates a greater degree of variation among the groups.

Fig 17. Comprehensive Examination of Error Rates, Heatmaps, and Cumulative Errors for Cell Type Identification in the Apex Region.


Subplots (A, B, E, and F) illustrate error rates and cumulative errors, with blue representing KMeans, brown GMM, and green Deep Learning. Subplots (C and D) utilize the 'inferno' colormap to depict error rates, transitioning from black (low errors) to yellow (high errors), providing a visual gradient of error severity. This color-coded representation aids in distinguishing the methodologies applied across different datasets and highlights the specific error dynamics associated with each method. Source data are provided in Supplementary Information S1 Data.xlsx.

3 Discussion

VASCilia, displayed in Fig 18, stands out in several unique ways: 1) Purpose-Built for Inner Ear Hair Cell Bundle Research: VASCilia is the first Napari plugin and software analysis tool specifically designed for 3D segmentation of stereocilia bundles. It is equipped with comprehensive analysis capabilities and machine learning models tailored for the inner ear imaging community. 2) User-Friendly Interface: VASCilia features an intuitive, easy-to-install interface that allows users to navigate the tool with minimal effort or expertise. 3) Flexible Application and Fine-Tuning: The flexibility of VASCilia allows users to directly apply it to their own datasets, ranging from mice to humans. Additionally, users can fine-tune their own models using the tool’s built-in training module. 4) Batch Processing Capability: VASCilia provides a robust batch processing feature, enabling users to process multiple samples automatically. With automated filtering and rotation of each image stack, all subsequent steps in the pipeline are streamlined. Users simply input the names and paths of their sample files, and VASCilia generates results for each sample, storing them in separate folders with all intermediate results saved for easy access. 5) Review and Collaboration: VASCilia offers a distinctive feature for reloading analysis results from a pickle file, allowing multiple users to review results at a later time, facilitating collaboration and simplifying the review process. All workflow steps and buttons are documented with step-by-step screenshots in https://ucsdmanorlab.github.io/Napari-VASCilia/overview.html.

VASCilia delivers end-to-end 3D segmentation and quantitative analysis of stereocilia bundles; here we synthesize robustness evidence and key biological insights from each Results section.

In Sects 2.1 and 2.2, two preprocessing steps most strongly stabilize downstream analysis. First, ZFT-Net yields near switch-free PCZ→CCZ→NSZ focus labels along the z-axis, so segmentation and quantification operate on cellular-clarity planes rather than out-of-focus frames. Second, PCPAlignNet places each stack in a PCP-aligned reference frame, reducing rotational variance and standardizing height, intensity, orientation, and tonotopic measurements.

In Sect 2.3, accurate 3D instance segmentation of cochlear stereocilia is intrinsically challenging due to nonspecific staining; close apposition within and across rows; noise, intra-cell signal dropout, and intensity variation; and tonotopic shape heterogeneity (Fig 3). Despite these factors, our approach consistently separated closely apposed bundles and maintained detection in low-signal frames, indicating robustness to common experimental artifacts. To highlight variability in segmentation depth as a function of bundle orientation (upright vs. flat) and phalloidin-signal continuity, and to show that the segmentation excludes the cuticular plate, S1A and S1B Fig present two bundles from a representative apical P21 cochlear stack, including all z-planes with and without overlays; orthogonal slice views (XY, XZ, YZ); and 3D volume renderings.

In Sect 2.4.1, VASCilia height measurements show human-level reliability, matching expert annotations with strong agreement, supporting its use for large quantitative studies. Beyond agreement with experts, a key methodological advantage is that our length metric is rotation-invariant because it traces bundles through the z-stack rather than relying on projected height. As shown in S3A–S3C Fig, rotated OHC1 bundles traverse more layers (depth = 10) than the straighter OHC2/3 bundles (depth = 5), yet the computed lengths are comparable. For example, OHC1 cell bundles with IDs 17 and 18 measure 1.65–1.69 μm and cell bundle 16 is 1.75 μm, comparable to OHC2 cell bundle 38 at 1.71 μm. The apparent visual foreshortening effect of rotated bundles reflects viewing angle, not true length; the increased z-plane depth afforded by the 3D imaging data accounts for these angle differences. In ambiguous cases, specifically when a bundle does not exhibit a typical V-shape, we performed optional manual refinement. The plugin includes event handlers that allow users to manually adjust the highest and lowest points of each bundle. These adjustments are easily made through the intuitive 3D viewer in the Napari interface, enabling researchers to refine measurements as needed.

In Sect 2.4.2, one potential limitation when measuring total intensity between bundles is the variability in segmentation depth across cells, which can arise from differences in fluorescence signal strength (phalloidin here), imaging depth, or biological heterogeneity. This variability may cause some bundles to appear shallower, potentially affecting their total intensity measurements. Although such differences tend to average out when large numbers of cells are analyzed, minimizing their impact on overall trends and group comparisons, we also provide a dedicated option for researchers who wish to control for this factor. Specifically, VASCilia outputs a CSV file that records per-layer intensity values for each bundle. We extended the analysis code to offer three alternative aggregation modes for per-bundle fluorescence intensity (shown in Fig 11): (i) the sum of raw intensities across all detected layers for each bundle; (ii) the sum restricted to the top n layers, where n is the largest depth shared by all bundles being compared; and (iii) the sum obtained by padding shallower bundles to n_max layers by repeating the bundle’s weakest-layer value, where n_max = max_i d_i is the maximum depth across bundles and d_i is the detected depth of bundle i.
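A minimal sketch of these three modes, assuming each bundle's per-layer intensity sums have been read from the exported CSV into a list of lists (the names and the interpretation of "top n layers" as the first n z-planes are assumptions, not VASCilia's actual API):

```python
def aggregate_intensities(per_layer):
    """Three aggregation modes for per-bundle intensity.
    per_layer[i] holds the per-z-plane intensity sums for bundle i."""
    n = min(len(layers) for layers in per_layer)      # (ii): largest shared depth
    n_max = max(len(layers) for layers in per_layer)  # (iii): maximum depth

    mode_i = [sum(layers) for layers in per_layer]                 # all detected layers
    mode_ii = [sum(layers[:n]) for layers in per_layer]            # top n layers only
    mode_iii = [sum(layers) + (n_max - len(layers)) * min(layers)  # pad shallow bundles
                for layers in per_layer]                           # with weakest layer
    return mode_i, mode_ii, mode_iii
```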

To illustrate depth-controlled comparisons, Fig 11 presents four depth-matched examples—two IHCs (Cell #3 and Cell #7; yellow) and two OHCs (Cell #26 and Cell #18; red). With depth held constant, the total 3D intensity is primarily driven by morphology (bundle size/footprint) and F-actin density. Consistent with this, Cell #18 shows the lowest total intensity because it is the smallest and finest, whereas Cell #3 shows the highest total intensity, reflecting a larger bundle and denser phalloidin signal.

In Sect 2.4.3, which examines the relationship between bundle height and phalloidin intensity, we observed that height increases tonotopically from base to apex in both WT and Eps8 KO, while Eps8 loss yields consistently shorter bundles. Phalloidin intensity shows the opposite trend—highest at the base and decreasing toward the apex across groups—with KO animals reduced relative to WT (e.g., lowest in OHC KO at the apex). Although basal bundles are shorter than apical bundles, the higher basal intensity is consistent with a higher packing density of F-actin rather than length. Prior work also reports that IHC stereocilia are thicker than OHC stereocilia [4244]. See S11 Fig for a depth-matched visual comparison of base vs. apex in WT; the same qualitative pattern holds in KO. In summary, phalloidin-measured actin content does not necessarily increase with bundle height across WT and Eps8 KO P5 animals.

In Sect 2.4.4, texture analysis captured the expected phenotype: homozygous Cdh23−/− bundles exhibited more heterogeneous and less regular stereocilia organization than heterozygous controls, consistent with visible disorganization. Across metrics, Cdh23−/− crops showed systematic shifts (e.g., lower energy, homogeneity, and correlation with higher local contrast), and these differences were separable in low-dimensional embeddings; a simple classifier trained on the same features generalized well to held-out data.
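The metrics named above (energy, homogeneity, correlation, contrast) are standard gray-level co-occurrence matrix (GLCM) properties. A minimal sketch of how such features can be computed with scikit-image, assuming 8-bit grayscale bundle crops (the distances and angles are illustrative choices, not necessarily those used in the study):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(crop_u8):
    """GLCM texture features for one 8-bit grayscale crop, averaged
    over four directions at a pixel offset of 1."""
    glcm = graycomatrix(crop_u8,
                        distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    return {prop: float(graycoprops(glcm, prop).mean())
            for prop in ("energy", "homogeneity", "correlation", "contrast")}
```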

In Sect 2.4.5, we noticed how Grad-CAM (Fig 13) serves as a sanity check to confirm the model’s reliance on accurate features for its predictions. Interestingly, Grad-CAM revealed that the model predominantly focuses on the hair cell bundles to distinguish between the BASE, MIDDLE, or APEX regions. Future directions include exploring the possibility of developing a foundation model capable of accurately predicting cochlear regions across various datasets with different staining and imaging protocols. This effort would undoubtedly require extensive data collection. We believe this work demonstrates that this task is feasible. The Napari-based graphical user interface and interactive visualization tools presented here should help make these methodologies and trained models accessible to the average researcher. We hope these results help motivate collaborative efforts to gather and share more comprehensive imaging data within the inner ear hair cell research community.
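For readers who want to run the same kind of sanity check on their own classifiers, a minimal hook-based Grad-CAM sketch in PyTorch follows (the choice of target layer assumes a ResNet-style backbone; this is not VASCilia's internal code):

```python
import torch

def grad_cam(model, image, target_layer, class_idx=None):
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the chosen class score, then ReLU."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    try:
        scores = model(image.unsqueeze(0))            # image: (C, H, W) tensor
        idx = int(scores.argmax()) if class_idx is None else class_idx
        model.zero_grad()
        scores[0, idx].backward()
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)
        cam = torch.relu((weights * acts["a"]).sum(dim=1)).squeeze(0)
        return cam / (cam.max() + 1e-8)               # normalized (H, W) heatmap
    finally:
        h1.remove()
        h2.remove()

# e.g., for a torchvision ResNet50: heatmap = grad_cam(model, img, model.layer4[-1])
```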

In Sect 2.4.6, bundle orientation is biologically meaningful because planar cell polarity (PCP) aligns stereocilia and underlies directional sensitivity and mechanotransduction. Quantifying orientation therefore provides an objective readout of PCP integrity. The time-consuming nature of manual PCP measurement methods highlights the need for more automated and efficient techniques to further advance our understanding of this critical process. We identified an automated Fiji plugin, PCP Auto Count, for quantifying planar cell polarity and cell counting [45]. This plugin relies on identifying “chunks” (the hair cell apical surface) and “caves” (the fonticulus) to measure the orientation between their centers. However, this approach was not effective with our images due to the absence of clear caves, especially in low-contrast images. The reliance on distinct caves for orientation measurement poses a limitation for datasets where such features are not consistently visible or distinguishable. Automated angles closely matched manual Fiji measurements on both our datasets and an independent PCP-deficit dataset [41], indicating human-level reliability. See S5, S6, and S9 Tables for all comparison values.

In Sect 2.4.7, automated row identification (IHC, OHC1–3) proved reliable even in challenging apical stacks where rows curve and overlap. Compared with unsupervised clustering (KMeans/GMM), the learned classifier handled outliers and nonlinear row geometry substantially better, reducing row-assignment ambiguity in the apex. In Fig 17, errors for KMeans and GMM (blue and brown) increase steeply, whereas the deep-learning model shows a more stable error rate.

As demonstrated in the case studies and generalizable beyond them, VASCilia enables high-throughput, standardized quantification. Once 3D instance segmentation is available, the pipeline automatically extracts height, depth-matched phalloidin intensity, orientation, row identity, tonotopic position, 2D and 3D morphology metrics, and texture features across thousands of bundles. This reduces analyses that would typically require months of manual curation to hours, yielding reproducible, statistically powered findings.

Comparison with commercial solutions: While commercial software packages such as Imaris and Arivis offer advanced 3D visualization and generic segmentation capabilities, these platforms are proprietary, costly, and often operate as “black boxes” with limited transparency regarding model architecture or training data. Furthermore, they are not optimized for the specific challenges of cochlear hair bundles, such as the dense, anisotropic packing of stereocilia and non-specific phalloidin staining of all actin filaments in whole mount tissues. In contrast, VASCilia is fully open-source and specifically tailored for hair-cell research. It allows users to transparently inspect and fine-tune segmentation models, ensuring reproducibility and providing a standardized, high-throughput workflow that fills a critical gap not addressed by general-purpose commercial tools.

Limitations and future work: All downstream features in VASCilia are only as accurate as the underlying 3D instance segmentation. While the model generalized across multiple datasets analyzed here, large distribution shifts (e.g., vestibular hair cells, different stains/microscopes), strong preprocessing changes (e.g., deconvolution/denoising), or very low SNR can degrade segmentation quality and thus bias derived measurements. In such cases, domain adaptation via fine-tuning on a small, labeled subset of the target dataset is advisable. We maintain and continuously update the segmentation models and training recipes in our public repository to facilitate such adaptation.

Extending segmentation from bundle level to single stereocilium resolution would enable higher-detail analyses, including per-bundle stereocilia counts and row-specific length distributions across the three rows within each bundle. Such capabilities would strengthen quantitative comparisons across conditions but will likely require much higher-resolution, higher-SNR datasets (e.g., STED or expansion microscopy) and specialized training data with reliable instance-level labels. Developing such a model is beyond the scope of the present work and represents an important direction for future studies.

4 Methods

4.1 Description of our 3D microscopy datasets

C57BL/6J mouse P5 and P21 cochleae were harvested and post-fixed overnight at 4°C in 4% PFA. After fixation, cochleae were dissected and tissues were permeabilized in PBS containing 0.3% Triton-X (PBST) for 30 minutes at room temperature. Alexa Fluor 568-conjugated phalloidin was applied in 0.03% PBST containing 3% NGS and incubated for 30 minutes at 23°C. Samples were washed three times with 0.03% PBST for 10 minutes each, mounted in ProLong Glass Antifade (ThermoFisher Scientific, Cat#P36980, Carlsbad, CA, USA) with a #1.5 coverslip, and imaged with a 2.5 μW 561 nm laser and a 63x 1.4NA DIC M27 objective on an 880 Airyscan confocal microscope with a 43 x 43 nm xy pixel size, 0.110 μm z-step size, and a pixel dwell time of 2.3 μs per pixel, then processed with default Airyscan processing settings. All experiments were conducted in strict accordance with the recommendations and approval of the Institutional Animal Care and Use Committee (IACUC) at the University of California, San Diego (UCSD) (protocol number S23058).

4.2 Testing other lab datasets

4.2.1 Lab A’s datasets.

We tested our software without any fine-tuning of the training model on another lab’s (Lab A) datasets [4648], which included samples with varying mouse ages, illumination conditions, cell counts per sample, image dimensions, and pixel sizes. The software successfully performed 3D segmentation and all subsequent analysis steps on these diverse datasets. See S6 and S7 Figs.

For this dataset, all procedures were conducted in compliance with ethical regulations approved by the Institutional Animal Care and Use Committee of Mass Eye and Ear, and in agreement with ARRIVE guidelines. Cochleae were dissected in L-15 medium, fixed in 4% formaldehyde (EMS, #15713) in HBSS for 1 hour, and washed with Ca2+, Mg2+-free HBSS. For mice older than P6, fixed cochleae were decalcified in 0.12 M EDTA (pH = 7.0) for 2 days, washed, then micro-dissected, and permeabilized in 0.5% Triton X-100 (Thermo Scientific, #85111) in Ca2+, Mg2+-free HBSS for 30 minutes. Samples were blocked with 10% goat serum (Jackson ImmunoResearch, #005-000-121) in 0.5% Triton X-100 in Ca2+, Mg2+-free HBSS for 1 hour, and incubated in a 1:200 dilution of rabbit anti-PKHD1L1 (Novus Bio #NBP2-13765) overnight at 4°C. Samples were washed with Ca2+, Mg2+-free HBSS and stained for 2 hours in a 1:500 dilution of Goat anti-Rabbit CF568 (Biotium #20099) and phalloidin CF488 (1:20, Biotium #00042) in the blocking solution. Following washes, samples were mounted on slides with the ProLong Diamond Antifade Kit (ThermoFisher Scientific #P36961) and imaged on a Leica SP8 confocal microscope (63x, 1.3 NA objective lens), or an upright Olympus FluoView FV1000 confocal laser scanning microscope (60x, 1.42 NA objective lens).

4.2.2 Lab B’s datasets.

We also tested the software on yet another lab’s (Lab B) datasets; see S8 and S9 Figs. C57BL/6J mice were exposed at 5 weeks of age to noise at 118 dB SPL for 2 hrs, then intravascularly perfused at 8 weeks with 4% PFA. Cochleae were extracted and decalcified in 0.12M EDTA, cryoprotected in 30% sucrose, and stored for 2-3 months at -80°C. After thawing, they were dissected and blocked for one hour at room temperature in PBS with 5% normal horse serum and 0.3% Triton X-100. The tissue was then incubated overnight at 37°C with rabbit anti-ESPN (Sigma #HPA028674, 1:100) to label stereocilia, in PBS with 1% normal horse serum and 0.3% Triton X-100. Primary incubation was followed the next day by two sequential 60-minute incubations in an anti-rabbit secondary antibody coupled to Alexa Fluor 647 in PBS with 1% normal horse serum and 0.3% Triton X-100. After immunostaining, pieces were slide-mounted in Vectashield, coverslipped, and imaged on a Leica SP8 confocal with a 63x glycerol-immersion objective (N.A. = 1.3) at 38 nm per pixel in x and y and 250 nm in z.

In addition, S10 Fig shows VASCilia results on human data, despite the fact that no human data stacks were included in our training set. The stacks were imaged on a Leica SP8 confocal using a 63x glycerol immersion objective (Planapo NA 1.3) and the Lightning deconvolution package. The pixel resolution was 0.038 microns in x and y and 0.217 microns in z.

4.3 Z-Focus tracker for cochlear image data preparation and architecture

Our 3D image stacks consist of numerous frames, which we categorize into three distinct zones to better describe the progression of image quality and content throughout the stack:

Pre-Cellular Zone (PCZ): This refers to the early frames where no cellular structures are visible. These frames likely correspond to regions outside the tissue boundaries or the initial imaging volume that has not yet captured the cellular regions.

Cellular Clarity Zone (CCZ): This middle portion of the stack contains well-resolved, clearly visible cells, representing the optimal imaging conditions with a high signal-to-noise ratio. Here, the microscope achieves the clearest visualization of stereocilia bundles.

Noise Saturation Zone (NSZ): The later frames where image quality degrades and noise increases significantly, likely due to reduced laser penetration, light scattering, and other optical limitations, leading to fading and distortion of cellular structures.

We define these three zones as distinct classes for training a classifier. The objective is for the classifier to automatically exclude both the Pre-Cellular Zone (PCZ) and the Noise Saturation Zone (NSZ), retaining only the Cellular Clarity Zone (CCZ). This approach will significantly reduce processing time while enabling the segmentation model to concentrate on regions containing clear cellular structures.

We have compiled a dataset of 135 3D image stacks of P5 mouse cochlea, using 125 stacks for training and validation, and reserving 10 stacks for testing. Each frame was manually annotated into one of the three defined classes, resulting in 3,355 frames for the PCZ, 1,111 for the NSZ, and 2,088 for the CCZ. After applying data augmentation through rotation, we obtained a balanced dataset with 6,710 frames for each class.

To further enhance the generalizability of the model, we incorporated 13 additional stacks from P21 mice and 22 stacks from Liberman Lab (Lab B’s datasets). This expanded the training set to a total of 7,722 frames for each class, improving the model’s ability to generalize across different ages, species, and microscopes.

We resized our images to 256 × 256 and tested several networks, including ResNet10, ResNet18, DenseNet121, and EfficientNet. However, our custom network, Z-Focus Tracker Net (ZFT-Net), produced the best results. ZFT-Net consists of five blocks, each containing a convolutional layer, batch normalization, ReLU activation, and a pooling layer. This is followed by an additional ReLU activation and dropout layer, and finally, two fully connected layers.
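Based on this description, a PyTorch sketch of such an architecture might look as follows (the channel widths, kernel size, and dropout rate are assumptions, not the published hyperparameters):

```python
import torch.nn as nn

class ZFTNetSketch(nn.Module):
    """Five blocks of conv -> batch norm -> ReLU -> max-pool, followed by
    ReLU, dropout, and two fully connected layers (illustrative sizes)."""
    def __init__(self, n_classes=3):  # PCZ, CCZ, NSZ
        super().__init__()
        blocks, in_ch = [], 1         # assumes single-channel input
        for out_ch in (16, 32, 64, 128, 256):
            blocks += [nn.Conv2d(in_ch, out_ch, 3, padding=1),
                       nn.BatchNorm2d(out_ch), nn.ReLU(), nn.MaxPool2d(2)]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        # A 256x256 input halved five times leaves an 8x8 feature map
        self.head = nn.Sequential(nn.ReLU(), nn.Dropout(0.5), nn.Flatten(),
                                  nn.Linear(256 * 8 * 8, 128),
                                  nn.Linear(128, n_classes))

    def forward(self, x):
        return self.head(self.features(x))
```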

4.4 PCPAlignNet data preparation and architecture

Our P5 and P21 mouse datasets consist of 3D stacks with orientations ranging from 0° to 360° due to variations in how the cochlear tissue is handled or sliced during sample preparation. These small discrepancies in positioning can lead to notable shifts in orientation during imaging. Additionally, the cochlea’s spiral and complex three-dimensional structure contributes to the variability in the orientation of the stacks.

This variability in orientation presents a significant challenge during analysis, as the angle must be corrected so that the stereocilia bundle rows align horizontally with respect to the tissue’s planar polarity axis. This is not an easy task because both the images and the bundles exhibit high variability. Even when we manually rotate the images with great precision to appear horizontal to the human eye, some rows may still differ in orientation from others, and bundles can also vary among themselves. As a result, making manual decisions to establish ground truth is extremely challenging. Therefore, we opted to define 72 classes instead of 360. We selected 70% of the P5 and P21 stacks for training and 20% for validation, leaving 10% for testing. However, these stacks are insufficient to train a robust network capable of predicting across 72 classes, as not all possible angles are represented. To address this, we augmented our dataset by rotating each frame in every stack to all 72 possible classes (i.e., ranging from 0 to 355 degrees in 5-degree increments); see Fig 20. For training, we experimented with several models, including ResNet50, ResNet18, MobileNet, and DenseNet121. We trained each model for 50 epochs, utilizing early stopping with a patience of 3. The Adam optimizer, along with a cross-entropy loss function, was employed to optimize the model.

Fig 20. To expand our dataset and represent all possible angles, we augmented each frame by rotating it and cropping the largest area without empty pixels from padded regions.


This process results in frames with varying perspectives and scales. In the example shown, the original slide is highlighted in orange, while all others are augmented versions.

A key challenge was avoiding the incorporation of empty pixel values from padded regions during image rotation. To address this, we applied a rotation correction in Python, followed by cropping the largest area that excluded the empty regions. Since each frame begins at a different orientation prior to augmentation, the corrected images, after rotation and removal of empty regions, vary in scale. These variations are often beneficial for CNN training, as they improve the network’s ability to generalize across different spatial representations of the data. See Figs 19 and 20. A minimal sketch of this rotate-then-crop step follows.
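The sketch below rotates a frame and crops the largest axis-aligned rectangle free of padded pixels, using the standard closed-form rectangle size for a rotated image (this illustrates the approach; it is not the plugin's exact code):

```python
import math
import numpy as np
from scipy import ndimage

def rotate_and_crop(frame, angle_deg):
    """Rotate a 2D frame, then crop the largest axis-aligned rectangle
    that contains no padded (empty) pixels."""
    h, w = frame.shape
    rotated = ndimage.rotate(frame, angle_deg, reshape=True, order=1)

    a = math.radians(angle_deg)
    sin_a, cos_a = abs(math.sin(a)), abs(math.cos(a))
    long_side, short_side = max(w, h), min(w, h)
    if short_side <= 2.0 * sin_a * cos_a * long_side or abs(sin_a - cos_a) < 1e-10:
        # Half-constrained case: two crop corners touch the longer side
        x = 0.5 * short_side
        wr, hr = (x / sin_a, x / cos_a) if w >= h else (x / cos_a, x / sin_a)
    else:
        cos_2a = cos_a * cos_a - sin_a * sin_a
        wr = (w * cos_a - h * sin_a) / cos_2a
        hr = (h * cos_a - w * sin_a) / cos_2a

    cy, cx = rotated.shape[0] // 2, rotated.shape[1] // 2
    return rotated[cy - int(hr) // 2: cy + int(hr) // 2,
                   cx - int(wr) // 2: cx + int(wr) // 2]
```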

Fig 19. Determining the precise orientation of the frame is challenging, as not all stereocilia bundle rows share the same horizontal alignment.


Achieving an exact degree out of 360 is practically impossible, even manually. To address this, we opted for 72 angular classes in 5-degree increments. We adapted the augmentation process, shown in Fig 20, to increase the number of image samples.

4.5 3D manual annotation and dataset partitioning for segmentation task

Training a 3D supervised model that efficiently segments each stereocilia bundle requires manual 3D annotation for many bundles, a process that is both cumbersome and slow. We utilized the Computer Vision Annotation Tool (CVAT) [36] to annotate our 3D samples. CVAT facilitates the drawing of manual polygons with an effective tracking feature, which annotators can use to accelerate the annotation process. We manually annotated 45 stacks using the CVAT cloud application [36], assigning each 3D bundle a unique ID for precise identification. The annotated data were thoroughly inspected and refined by both the author and the biologists responsible for imaging the data.

To maintain the integrity of the data split, we divided the dataset into training, testing, and validation sets at the stack level, thus preventing the mingling of frames from different stacks during partitioning. The training set comprises 30 stacks, the validation set includes 5 stacks, and the testing set consists of 10 stacks, which are further classified into 6 typical complexity cases and 4 complex cases. The details on how many training, validation, and testing 3D instances exist in this dataset can be found in Table 5.

Table 5. Summary of 3D stereocilia bundle instances across different data sets.

Data Set IHC OHC Total
Training (30 stacks) 270 940 1210
Validation (5 stacks) 47 170 217
Inference (10 stacks) 93 350 443
Total (45 stacks) 410 1460 1870

4.6 3D segmentation of stereocilia bundles using 2D detection and multi-object assignment algorithm

Our approach to the 3D segmentation task involves applying 2D detection to each frame, followed by the reconstruction of the 3D object using a multi-object assignment algorithm. We employ the Detectron2 library from Facebook Research [49], using the Mask R-CNN architecture [50] combined with a ResNet50 backbone [51] and a Feature Pyramid Network [52]. This setup leverages transfer learning from a model trained on the COCO dataset [53]. The algorithm is executed over 50,000 iterations with a learning rate of 0.00025. We focus on a single class, specifically the stereocilia bundle, with a head threshold score set at 0.5.
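The corresponding Detectron2 configuration would look roughly like the sketch below, mirroring the stated settings; dataset registration and output paths are omitted, and this is not the exact training script:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(      # transfer learning from COCO
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1                    # single class: stereocilia bundle
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5            # head threshold score
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 50000
# cfg.DATASETS.TRAIN must point to a registered COCO-format dataset

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```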

After obtaining all the 2D frame segmentation masks across the stack, the multi-object assignment algorithm involves the following steps (see Fig 4; a minimal code sketch follows the list):

  1. Initialization:
    • Set frame_count to zero, marking the start of the frame sequence.
    • Create an empty list tracks to maintain records of active object tracks.
    • Load the initial frame of the stack to start the tracking process.
  2. Processing the First Frame:
    • Detect all objects within the first frame and assign each a unique track ID.
    • Store the position and ID of each detected object in tracks.
    • Increment frame_count.
  3. Tracking in Subsequent Frames:
    • Load the next frame to continue tracking.
    • Detect all visible objects in the current frame.
    • For each detected object:
      – Calculate the overlap area with each object in tracks.
      – Determine the appropriate track based on overlap:
        * If no significant overlap is found, initialize a new track.
        * If an overlap exists, assign the object to the track with the largest overlap and update the track’s position.
      – Handle ambiguous cases:
        * If multiple objects overlap significantly with a single track, choose the object with the largest overlap for the track.
        * Consider initiating new tracks for other overlapping objects.
    • Increment frame_count.
  4. Loop Through Remaining Frames:
    • Repeat the process in Step 3 for each new frame until the end of the stack sequence.
  5. Finalization:
    • Assemble and output all completed tracks for further analysis.
    • Conclude the tracking algorithm. At this stage, each cell or bundle is assigned a unique ID based on the tracks to enable the user to visualize them in Napari.
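A condensed sketch of this greedy overlap-based assignment is shown below (the real implementation handles the ambiguous cases above in more detail; the overlap threshold and data layout are illustrative):

```python
import numpy as np

def assign_tracks(frames_masks, min_overlap=1):
    """Link per-frame 2D masks into 3D tracks: each mask joins the track
    whose most recent mask overlaps it most, else starts a new track.
    `frames_masks` is a list (per frame) of lists of boolean masks."""
    tracks = {}        # track_id -> most recent boolean mask
    labels = []        # per frame: track ids parallel to the mask list
    next_id = 1
    for masks in frames_masks:
        frame_labels = []
        for mask in masks:
            overlaps = {tid: int(np.logical_and(mask, prev).sum())
                        for tid, prev in tracks.items()}
            best = max(overlaps, key=overlaps.get, default=None)
            if best is None or overlaps[best] < min_overlap:
                best = next_id            # no significant overlap: new track
                next_id += 1
            tracks[best] = mask           # update the track's position
            frame_labels.append(best)
        labels.append(frame_labels)
    return labels
```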

4.7 Stereocilia bundle height measurement

Hair cell stereocilia bundles are essential for hearing, as they convert acoustic vibrations into electrical signals that the brain detects as sound. Many deafness mutations cause bundle defects, including improper elongation. Accurately and consistently measuring stereocilia bundle heights in 3D images is therefore critical, but unfortunately laborious and costly, in particular for shorter bundles in developing and/or mutant (e.g., Eps8 KO mouse [37,54]) cochlear tissues. Here we leverage our hair cell bundle segmentations to automate the measurement of bundle heights.

Automated segmentation-derived method. The steps for accurate and automated bundle height measurement involve calculating the distance from the tip to the base of the tallest row of stereocilia in hair cell bundles; a code sketch follows the list.

  1. Iterate through each connected component in the labeled volume. Skip the background and any filtered components.

  2. For each relevant component, identify all voxel coordinates that belong to the bundle.

  3. Create a binary sub-volume for the current component where the component’s voxels are marked as one, and all others are zeros.

  4. Project the binary sub-volume along the z-axis to reduce it to a 2D projection by summing along the z-dimension and then applying a threshold to ensure the projection remains binary (values greater than 1 are set to 1).

  5. Locate the highest (tip) and lowest (base) xy-coordinates in the 2D projection that have nonzero values:
    • Find the highest point by identifying the minimum xy-coordinate value that corresponds to 1 in the projection.
    • Find the bottom-most point by tracing downward from the centroid of the projection until reaching an xy-coordinate with a value of 0, then stepping back to the last non-zero coordinate.
  6. Determine the z-coordinates of these points in the original 3D volume:
    • For the highest point, find the z-coordinates where the voxel at the identified (x, y) location is 1.
    • For the lowest point, set zbase to the deepest slice index across the stack where the component is present.
  7. Store the coordinates of the highest and lowest points. These are used for calculating the 3D Euclidean distance between the tip and the base of the bundle.

  8. Calculate the distance using the Euclidean distance formula between the stored highest and lowest points.
    Distance = √((x2 − x1)² + (y2 − y1)² + (z2 − z1)²)
    where (x1, y1, z1) and (x2, y2, z2) are the coordinates of the first and second points in these dimensions. Note that a scaling factor should be applied to each dimension so that the calculated distance accurately reflects the true spatial separation between the two points in real physical units.
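A minimal sketch of this final distance computation, with per-axis scaling to physical units (the coordinate layout and the example voxel sizes are illustrative):

```python
import numpy as np

def bundle_height_um(tip, base, voxel_size_um):
    """3D Euclidean distance between tip and base voxel coordinates,
    scaled per axis. `tip` and `base` are (z, y, x) voxel indices;
    `voxel_size_um` is the (z, y, x) voxel size in microns."""
    delta = (np.asarray(tip, float) - np.asarray(base, float)) * np.asarray(voxel_size_um)
    return float(np.linalg.norm(delta))

# e.g., with 0.110 um z-steps and 0.043 um xy pixels:
# height = bundle_height_um((12, 40, 88), (2, 120, 91), (0.110, 0.043, 0.043))
```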

Manual observer method (two observers): Stereocilia bundle heights were measured manually in Fiji (ImageJ). For each bundle, an observer (i) identified the tallest stereocilium, (ii) placed a marker point at its base (x,y,z), (iii) scrolled through the z-stack to the tip of the same stereocilium to place a second marker (x,y,z), and (iv) calculated the 3D Euclidean distance between the base and tip using the known voxel dimensions (the same formula as above). Each bundle was independently measured by two observers.

4.8 Utilizing pre-trained ResNet50 for targeted classification of cochlear tonotopic regions

For training and validation, our dataset comprises 36 3D stacks from the BASE region containing 535 images, 35 3D stacks from the MIDDLE with 710 images, and 38 3D stacks from the APEX with 651 images. For testing the model, we used ten stacks from BASE, nine from MIDDLE, and ten from APEX that were withheld from the training data. The classification of each stack as BASE, MIDDLE, or APEX is determined through a majority voting mechanism applied across all frames of the associated stack, starting from the median and extending over a length of 13 frames.

In this study, we employed a modified ResNet50 model [51] using the PyTorch framework [55] to classify images of cochlear regions into three categories: BASE, MIDDLE, and APEX, which correspond to the high-, middle-, and low-frequency tonotopic positions of the cochlear spiral. The model, initialized with pre-trained weights, was adapted to our specific task by altering the final fully connected layer to output three classes. A pre-trained ResNet50 has weights learned on ImageNet [56], which contains over 14 million images categorized into over 20,000 classes, so the model has already learned rich feature representations for a wide variety of images. This pre-training makes the model a strong starting point for most visual recognition tasks. To enhance model robustness and adaptability, training involved dynamic augmentation techniques including random resizing, cropping, flipping, color adjustments, and rotations, followed by normalization tailored to the ResNet50 architecture. This approach utilized both frozen and trainable layers, allowing for effective feature extraction adapted from pre-trained domain knowledge while refining the model to the specific needs of our dataset. Training was conducted over 100 epochs with real-time monitoring via TensorBoard, optimizing for accuracy through stochastic gradient descent with momentum. The best-performing model was systematically saved to achieve marked improvements in classification accuracy.
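In torchvision, the final-layer adaptation amounts to swapping the classifier head; combined with the per-stack majority vote described above, a sketch might look like this (the exact placement of the 13-frame voting window is an assumption):

```python
import torch
import torchvision.models as models

# ImageNet-pretrained ResNet50 with a 3-way head: BASE / MIDDLE / APEX
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Linear(model.fc.in_features, 3)

def classify_stack(model, frames, window=13):
    """Majority vote over `window` frames around the stack median.
    `frames` is a list of preprocessed (C, H, W) tensors."""
    model.eval()
    mid, half = len(frames) // 2, window // 2
    votes = []
    with torch.no_grad():
        for frame in frames[max(0, mid - half): mid + half + 1]:
            votes.append(int(model(frame.unsqueeze(0)).argmax(dim=1)))
    return max(set(votes), key=votes.count)
```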

4.9 IHC and OHC row classification

We implemented a deep learning strategy employing multi-class classification to precisely identify and categorize each hair cell bundle as IHC (first row from the bottom), OHC1 (second row from the bottom), OHC2 (third row from the bottom), or OHC3 (fourth row from the bottom).

For the training and validation process, we utilized the same dataset used for the 3D segmentation. Each dataset was manually annotated into four rows: IHC, OHC1, OHC2, and OHC3. We employed a multi-class segmentation approach using a U-Net architecture with a ResNet-50 backbone and a feature pyramid network. Transfer learning was utilized, leveraging pre-trained ImageNet weights to enhance model robustness. Given the existing 3D segmentation, our objective was simply to identify which cell corresponds to which row. To this end, we applied maximum projection to all frames in the stack to simplify and standardize the results.

Data augmentation played a crucial role in enhancing the model’s ability to generalize across different laboratory datasets; see Fig 21. Specifically, we rotated the images by 10 and 20 degrees to both the right and left. Additionally, we employed segmentation masks to mask out the raw images, effectively eliminating the background. This step creates additional training samples that focus on the foreground, helping the model generalize better to other datasets where stains may not highlight the background, ensuring more accurate segmentation in diverse experimental conditions. Finally, the CSV file that records distances is updated to include the new classifications, ensuring that each cell’s identity (IHC, OHC1, OHC2, or OHC3) is documented.
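A sketch of this augmentation on a maximum-projection image, assuming the projection and its segmentation mask are available as 2D arrays (the function shape is illustrative, not the plugin's code):

```python
import numpy as np
from scipy import ndimage

def augment_projection(max_proj, seg_mask):
    """Augmented training samples from one maximum projection: small
    rotations (plus/minus 10 and 20 degrees) and a background-masked
    variant that keeps only segmented foreground."""
    samples = [max_proj]
    for angle in (-20, -10, 10, 20):
        samples.append(ndimage.rotate(max_proj, angle, reshape=False, order=1))
    samples.append(np.where(seg_mask > 0, max_proj, 0))  # foreground-only sample
    return samples
```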

Fig 21. Augmentation of maximum projection images from 3D confocal stacks for training a classification model that discriminates between IHC, OHC1, OHC2, and OHC3.


4.10 Toward developing a foundational model for cochlea research: Integrating diverse data sources

For the final model available to the ear research community, we trained the architecture using a comprehensive dataset. This dataset included all the data from P5 (young mice), which consists of 45 stacks, as well as 10 stacks from P21 (adult mice), and 22 stacks provided by the Liberman lab. This amounted to a total of 901 2D images and 29,963 instances.

Given that this model is trained on young and adult mice of the same strain, as well as data from a different laboratory with varying settings, stains, and microscopy techniques, we consider this model to be the first trial in developing a foundational model. It is hoped that this model will work with data from other laboratories without requiring fine-tuning.

We are committed to maintaining our GitHub repository and actively encourage collaborators from other labs to share their data. By doing so, we aim to broaden the model’s applicability and enhance its robustness, ultimately benefiting the entire ear research community.

4.11 Computational resources for all the experiments

All experiments were conducted on a local desktop computer equipped with a 13th Gen Intel(R) Core(TM) i9-13900K CPU, 128GB of RAM, and an NVIDIA GeForce RTX 4080 GPU with 16.0 GB of dedicated memory, running Microsoft Windows 11 Pro. We utilized the PyTorch framework [55] to implement all machine learning models to develop the Napari plugin. For training using the Facebook Research Detectron2 library [49], we utilized the Windows Subsystem for Linux (WSL), as the library is not supported natively on Windows. To integrate the algorithm into the Napari plugin [14], we built an executable that runs through one of the plugin’s buttons via WSL. Consequently, users will need to set up WSL to utilize this plugin, details of which are thoroughly described in our GitHub documentation.

Supporting information

S1 File. Describes the VASCilia workflow and features, user-guided adjustment of automated measurements, and the training procedures for stereocilia bundle segmentation.

(PDF)

S1 Fig. Representative P21 apex stereocilia segmentations (upright vs. flattened) shown across all z planes with overlays and 3D/YZ views; masks follow stereocilia signal and exclude cuticular plate unless signal is present.

(TIFF)

S2 Fig. Full z-stacks (DS1–DS4) used to derive measurement crops; red boxes mark IHC/OHC positions.

(TIFF)

S3 Fig. Straight vs. rotated bundles (P5): raw stack, automatic segmentation, and 3D length readouts; 3D measurement handles both morphologies.

(TIFF)

S4 Fig. Angle computation on a PCP-deficit cochlear dataset; VASCilia closely matches manual Fiji measurements (paired t-test p = 0.783, Wilcoxon p = 0.965).

(TIFF)

S5 Fig. Representative images for Cdh23 +/− and Cdh23−/− mice.

(TIFF)

S6 Fig. VASCilia screenshots with multiple samples (Lab A), set 1.

(TIFF)

S7 Fig. VASCilia screenshots with multiple samples (Lab A), set 2.

(TIFF)

S8 Fig. VASCilia screenshots with Lab B mouse cochlea data, set 1.

(TIFF)

S9 Fig. VASCilia screenshots with Lab B mouse cochlea data, set 2.

(TIFF)

S10 Fig. VASCilia screenshots with Lab B human cochlea data.

(TIFF)

S11 Fig. Base vs. apex in WT animals: base shows wider/stiffer bundles; apex finer/tighter bundles with reduced phalloidin intensity.

(TIFF)

S1 Table. Bundle height summary by genotype (WT/KO), tonotopic region (Base/Middle/Apex), and cell type (IHC/OHC); mean, SD, median, N.

(PDF)

S2 Table. Pairwise tonotopic contrasts in bundle height (Welch t-tests; Hedges’ g; Holm-adjusted p); all significant except KO IHC Base–Middle.

(PDF)

S3 Table. Normalized fluorescence intensity by genotype/tonotopic region/cell type; mean, SD, median, N.

(PDF)

S4 Table. Pairwise tonotopic contrasts for normalized intensity with effect sizes and Holm-adjusted p-values.

(PDF)

S5 Table. Per-cell orientation angles (Manual vs. VASCilia), dataset 1; summary statistics.

(PDF)

S6 Table. Per-cell orientation angles (Manual vs. VASCilia), dataset 2; summary statistics.

(PDF)

S7 Table. Per-cell stereocilia lengths for Eps8 KO (Manual vs. VASCilia; n = 15); no significant differences (paired t p = 0.609; Wilcoxon p = 0.720).

(PDF)

S8 Table. Per-cell stereocilia lengths for Cdh23 −/− (Manual vs. VASCilia; n = 15); no significant differences (paired t p = 0.851; Wilcoxon p = 0.639).

(PDF)

S9 Table. Per-bundle orientation values for a PCP-deficit mouse dataset (Manual vs. VASCilia); tests indicate no significant difference.

(PDF)

S1 Data. Excel file containing all data related to Table 1, Table 2, Figs 5A–5D, Figs 7A–7B, Figs 10A–10B, Figs 11–15, and Figs 17A–17F. The name of each worksheet corresponds to the related figure or table.

(XLSX)


Acknowledgments

We would like to extend our sincere gratitude to the Liberman lab for generously sharing their data, to I-Hsun Wu (Research Development, Office of Research and Innovation, UC San Diego) for creating the illustrations in Fig 1 and Wendy Groves (Strategic Programs Team in the School of Biological Sciences at UC San Diego) for supervising I-Hsun Wu, and to Leo Andrade for providing the two SEM images of the stereocilia bundles of cochlear hair cells shown in Fig 1, to Cayla Miller for providing helpful feedback on the manuscript, and to Akanksha Biju (a student intern) for her assistance in annotating a subset of stereocilia bundles.

Data Availability

In support of open science and to enhance reproducibility, we are pleased to publicly release all annotated datasets used for training and testing in our experiments. This dataset consists of 1,870 stereocilia bundles (410 IHCs and 1,460 OHCs) from P5 mice, and 335 bundles (92 IHCs and 243 OHCs) from P21 mice. It is freely available to the research community, facilitating further studies, benchmarking, and advances in the field. Additionally, the complete source code for the Napari plugin has been made publicly accessible. The code can be accessed through our dedicated GitHub repository at the following link: https://github.com/ucsdmanorlab/Napari-VASCilia. The corresponding archived release is available on Zenodo at https://doi.org/10.5281/zenodo.17871652. The full documentation is accessible through this link: https://ucsdmanorlab.github.io/Napari-VASCilia/. The raw data and manual annotations used to train the final model provided with VASCilia are available at the following link: https://doi.org/10.5281/zenodo.17905676. We encourage the community to utilize these resources in their research and to contribute improvements or variations to the methodologies presented.

Funding Statement

This work was supported by the Chan Zuckerberg Initiative DAF (CZI Imaging Scientist Award DOI: 10.37921/694870itnyzk), the National Science Foundation (NSF NeuroNex Award 2014862), the David F. and Margaret T. Grohne Family Foundation, the G. Harold & Leila Y. Mathers Foundation, the National Institute on Deafness and Other Communication Disorders (NIDCD R01 DC021075-01), and the Dr. David V. Goeddel Chancellor’s Endowed Chair in Biological Sciences (to U.M.). This work was also supported by NIDCD grant R018566-03S1 to U.M. and R.A.F. Microscopy in this work was supported by the Waitt Advanced Biophotonics Core of the Salk Institute with funding from the Waitt Foundation, the NIH National Cancer Institute (NCI CCSG P30 014195), and the San Diego Nathan Shock Center (P30AG068635). This work was also supported by NIH NIDCD R01DC020190, R01DC017166 and R01DC021795 to A.A.I. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Minaee S, Boykov Y, Porikli F, Plaza A, Kehtarnavaz N, Terzopoulos D. Image segmentation using deep learning: a survey. IEEE Trans Pattern Anal Mach Intell. 2022;44(7):3523–42. doi: 10.1109/TPAMI.2021.3059968 [DOI] [PubMed] [Google Scholar]
  • 2.Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021;8(1):53. doi: 10.1186/s40537-021-00444-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Swain SK. Age related hearing loss and cognitive impairment–a current perspective. Int J Res Med Sci. 2021;9(1):317–21. [Google Scholar]
  • 4.Liu K, Sun W, Zhou X, Bao S, Gong S, He DZ. Hearing loss and cognitive disorders. Frontiers in Neuroscience. 2022;16:902405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Bisogno A, Scarpa A, Di Girolamo S, De Luca P, Cassandro C, Viola P, et al. Hearing loss and cognitive impairment: epidemiology, common pathophysiological findings, and treatment considerations. Life (Basel). 2021;11(10):1102. doi: 10.3390/life11101102 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Schwander M, Kachar B, Müller U. Review series: the cell biology of hearing. J Cell Biol. 2010;190(1):9–20. doi: 10.1083/jcb.201001138 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Hahn R, Avraham KB. Gene therapy for inherited hearing loss: updates and remaining challenges. Audiol Res. 2023;13(6):952–66. doi: 10.3390/audiolres13060083 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Miller KK, Wang P, Grillet N. High-resolution immunofluorescence imaging of mouse cochlear hair bundles. STAR Protoc. 2022;3(2):101431. doi: 10.1016/j.xpro.2022.101431 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Krey JF, Chatterjee P, Halford J, Cunningham CL, Perrin BJ, Barr-Gillespie PG. Control of stereocilia length during development of hair bundles. PLoS Biol. 2023;21(4):e3001964. doi: 10.1371/journal.pbio.3001964 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Mao B, Balasubramanian T, Kelley MW. Cochlear development, cellular patterning and tonotopy. Current Opinion in Physiology. 2020;18:116–22. doi: 10.1016/j.cophys.2020.09.010 [DOI] [Google Scholar]
  • 11.Fettiplace R. Hair cell transduction, tuning, and synaptic transmission in the Mammalian Cochlea. Compr Physiol. 2017;7(4):1197–227. doi: 10.1002/cphy.c160049 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Johnson SL, Safieddine S, Mustapha M, Marcotti W. Hair cell afferent synapses: function and dysfunction. Cold Spring Harb Perspect Med. 2019;9(12):a033175. doi: 10.1101/cshperspect.a033175 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Ji L, Borges BC, Martel DT, Wu C, Liberman MC, Shore SE, et al. From hidden hearing loss to supranormal auditory processing by neurotrophin 3-mediated modulation of inner hair cell synapse density. PLoS Biol. 2024;22(6):e3002665. doi: 10.1371/journal.pbio.3002665 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.napari contributors. napari: a multidimensional image viewer for Python. 2020. https://github.com/napari/napari
  • 15.Tyson AL, Rousseau CV, Niedworok CJ, Keshavarzi S, Tsitoura C, Cossell L, et al. A deep learning algorithm for 3D cell detection in whole mouse brain image datasets. PLoS Comput Biol. 2021;17(5):e1009074. doi: 10.1371/journal.pcbi.1009074 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Bishop KW, Erion Barner LA, Han Q, Baraznenok E, Lan L, Poudel C, et al. An end-to-end workflow for nondestructive 3D pathology. Nat Protoc. 2024;19(4):1122–48. doi: 10.1038/s41596-023-00934-4 [DOI] [PubMed] [Google Scholar]
  • 17.Windhager J, Zanotelli VRT, Schulz D, Meyer L, Daniel M, Bodenmiller B, et al. An end-to-end workflow for multiplexed image processing and analysis. Nat Protoc. 2023;18(11):3565–613. doi: 10.1038/s41596-023-00881-0 [DOI] [PubMed] [Google Scholar]
  • 18.Prigent S, Valades-Cruz CA, Leconte L, Maury L, Salamero J, Kervrann C. BioImageIT: Open-source framework for integration of image data management with analysis. Nat Methods. 2022;19(11):1328–30. doi: 10.1038/s41592-022-01642-9 [DOI] [PubMed] [Google Scholar]
  • 19.Rombaut B, Roels J, Saeys Y. BioSegment: Active Learning segmentation for 3D electron microscopy imaging. In: IAL@ PKDD/ECML. 2022. p. 7–26.
  • 20.Krentzel D, Elphick M, Domart MC, Peddie CJ, Laine RF, Henriques R. CLEM-Reg: an automated point cloud based registration algorithm for correlative light and volume electron microscopy. BioRxiv. 2023:2023-05. [DOI] [PMC free article] [PubMed]
  • 21.Lalit M, Tomancak P, Jug F. EmbedSeg: embedding-based instance segmentation for biomedical microscopy data. Med Image Anal. 2022;81:102523. doi: 10.1016/j.media.2022.102523 [DOI] [PubMed] [Google Scholar]
  • 22.Conrad R, Narayan K. Instance segmentation of mitochondria in electron microscopy images with a generalist deep learning model trained on a diverse dataset. Cell Syst. 2023;14(1):58-71.e5. doi: 10.1016/j.cels.2022.12.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Arzt M, Deschamps J, Schmied C, Pietzsch T, Schmidt D, Tomancak P, et al. LABKIT: labeling and segmentation toolkit for big image data. Front Comput Sci. 2022;4. doi: 10.3389/fcomp.2022.777728 [DOI] [Google Scholar]
  • 24.Müller A, Schmidt D, Albrecht JP, Rieckert L, Otto M, Galicia Garcia LE, et al. Modular segmentation, spatial analysis and visualization of volume electron microscopy datasets. Nat Protoc. 2024;19(5):1436–66. doi: 10.1038/s41596-024-00957-5 [DOI] [PubMed] [Google Scholar]
  • 25.Perdigão LMA, Ho EML, Cheng ZC, Yee NB-Y, Glen T, Wu L, et al. Okapi-EM: a napari plugin for processing and analyzing cryogenic serial focused ion beam/scanning electron microscopy images. Biol Imaging. 2023;3:e9. doi: 10.1017/S2633903X23000119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Beter M, Abdollahzadeh A, Pulkkinen HH, Huang H, Orsenigo F, Magnusson PU, et al. SproutAngio: an open-source bioimage informatics tool for quantitative analysis of sprouting angiogenesis and lumen space. Sci Rep. 2023;13(1):7279. doi: 10.1038/s41598-023-33090-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Pennington A, King ONF, Tun WM, Ho EML, Luengo I, Darrow MC, et al. SuRVoS 2: accelerating annotation and segmentation for large volumetric bioimage workflows across modalities and scales. Front Cell Dev Biol. 2022;10:842342. doi: 10.3389/fcell.2022.842342 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.D’Antuono R, Pisignano G. ZELDA: a 3D image segmentation and parent-child relation plugin for microscopy image analysis in napari. Frontiers in Computer Science. 2022;3:796117. [Google Scholar]
  • 29.Ivanchenko MV, Cicconet M, Jandal HA, Wu X, Corey DP, Indzhykulian AA. Serial scanning electron microscopy of anti-PKHD1L1 immuno-gold labeled mouse hair cell stereocilia bundles. Sci Data. 2020;7(1):182. doi: 10.1038/s41597-020-0509-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Bernardis E, Yu SX. Robust segmentation by cutting across a stack of gamma transformed images. In: International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition. Springer; 2009. p. 249–60.
  • 31.Urata S, Iida T, Yamamoto M, Mizushima Y, Fujimoto C, Matsumoto Y, et al. Cellular cartography of the organ of Corti based on optical tissue clearing and machine learning. Elife. 2019;8:e40946. doi: 10.7554/eLife.40946 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Buswinka CJ, Osgood RT, Simikyan RG, Rosenberg DB, Indzhykulian AA. The hair cell analysis toolbox is a precise and fully automated pipeline for whole cochlea hair cell quantification. PLoS Biol. 2023;21(3):e3002041. doi: 10.1371/journal.pbio.3002041 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Cortada M, Sauteur L, Lanz M, Levano S, Bodmer D. A deep learning approach to quantify auditory hair cells. Hear Res. 2021;409:108317. doi: 10.1016/j.heares.2021.108317 [DOI] [PubMed] [Google Scholar]
  • 34.Schmidt U, Weigert M, Broaddus C, Myers G. Cell detection with star-convex polygons. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2018 : 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II. 2018. p. 265–73.
  • 35.Cortada M, Levano S, Bodmer D. mTOR signaling in the inner ear as potential target to treat hearing loss. Int J Mol Sci. 2021;22(12):6368. doi: 10.3390/ijms22126368 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.CVAT-Team. CVAT: Computer Vision Annotation Tool. GitHub Repository. 2018. https://github.com/openvinotoolkit/cvat
  • 37.Jeng JY, Carlton AJ, Goodyear RJ, Chinowsky C, Ceriani F, Johnson SL, et al. AAV-mediated rescue of Eps8 expression in vivo restores hair-cell function in a mouse model of recessive deafness. Molecular Therapy Methods & Clinical Development. 2022;26:355–70. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012;9(7):676–82. doi: 10.1038/nmeth.2019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Indzhykulian AA, Stepanyan R, Nelina A, Spinelli KJ, Ahmed ZM, Belyantseva IA, et al. Molecular remodeling of tip links underlies mechanosensory regeneration in auditory hair cells. PLoS Biol. 2013;11(6):e1001583. doi: 10.1371/journal.pbio.1001583 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Pietikäinen M, Hadid A, Zhao G, Ahonen T. Computer vision using local binary patterns. Springer; 2011.
  • 41.Ivanchenko MV, Hathaway DM, Klein AJ, Pan B, Strelkova O, De-la-Torre P, et al. Mini-PCDH15 gene therapy rescues hearing in a mouse model of Usher syndrome type 1F. Nat Commun. 2023;14(1):2400. doi: 10.1038/s41467-023-38038-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Sekerková G, Richter C-P, Bartles JR. Roles of the espin actin-bundling proteins in the morphogenesis and stabilization of hair cell stereocilia revealed in CBA/CaJ congenic jerker mice. PLoS Genet. 2011;7(3):e1002032. doi: 10.1371/journal.pgen.1002032 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Zhou L-Y, Jin C-X, Wang W-X, Song L, Shin J-B, Du T-T, et al. Differential regulation of hair cell actin cytoskeleton mediated by SRF and MRTFB. Elife. 2023;12:e90155. doi: 10.7554/eLife.90155 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Pacentine IV, Barr-Gillespie PG. Cy3-ATP labeling of unfixed, permeabilized mouse hair cells. Sci Rep. 2021;11(1):23855. doi: 10.1038/s41598-021-03365-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Stansak KL, Baum LD, Ghosh S, Thapa P, Vanga V, Walters BJ. PCP Auto Count: a novel Fiji/ImageJ plug-in for automated quantification of planar cell polarity and cell counting. bioRxiv. 2024:2024.01. [DOI] [PMC free article] [PubMed]
  • 46.Strelkova OS, Osgood RT, Tian C, Zhang X, Hale E, De-la-Torre P, et al. PKHD1L1 is required for stereocilia bundle maintenance, durable hearing function and resilience to noise exposure. Commun Biol. 2024;7(1):1423. doi: 10.1038/s42003-024-07121-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Wu X, Ivanchenko MV, Al Jandal H, Cicconet M, Indzhykulian AA, Corey DP. PKHD1L1 is a coat protein of hair-cell stereocilia and is required for normal hearing. Nat Commun. 2019;10(1):3801. doi: 10.1038/s41467-019-11712-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Hayashi Y, Chiang H, Tian C, Indzhykulian AA, Edge ASB. Norrie disease protein is essential for cochlear hair cell maturation. Proc Natl Acad Sci U S A. 2021;118(39):e2106369118. doi: 10.1073/pnas.2106369118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Wu Y, Kirillov A, Massa F, Lo WY, Girshick R. Detectron2. 2019. https://github.com/facebookresearch/detectron2
  • 50.He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. p. 2961–9.
  • 51.He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 770–8. 10.1109/cvpr.2016.90 [DOI]
  • 52.Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 2117–25.
  • 53.Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: Common Objects in Context. In: European conference on computer vision. Springer; 2014. p. 740–55.
  • 54.Manor U, Disanza A, Grati M, Andrade L, Lin H, Di Fiore PP, et al. Regulation of stereocilia length by myosin XVa and whirlin depends on the actin-regulatory protein Eps8. Curr Biol. 2011;21(2):167–72. doi: 10.1016/j.cub.2010.12.046 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G. PyTorch: an imperative style, high-performance deep learning library. 2019.
  • 56.Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015;115(3):211–52. doi: 10.1007/s11263-015-0816-y [DOI] [Google Scholar]

Decision Letter 0

Richard Hodge

22 Apr 2025

Dear Dr Manor,

Thank you for submitting your manuscript entitled "VASCilia (Vision Analysis StereoCilia): A Napari Plugin for Deep Learning-Based 3D Analysis of Cochlear Hair Cell Stereocilia Bundles" for consideration as a Research Article by PLOS Biology. Please accept my sincere apologies for the delay in getting back to you last week as we consulted with an academic editor about your submission.

Your manuscript has now been evaluated by the PLOS Biology editorial staff, as well as by an academic editor with relevant expertise, and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. After your manuscript has passed the checks it will be sent out for review. To provide the metadata for your submission, please log in to Editorial Manager (https://www.editorialmanager.com/pbiology) within two working days, i.e., by Apr 24 2025 11:59PM.

If your manuscript has been previously peer-reviewed at another journal, PLOS Biology is willing to work with those reviews in order to avoid re-starting the process. Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.

If you would like us to consider previous reviewer reports, please edit your cover letter to let us know and include the name of the journal where the work was previously considered and the manuscript ID it was given. In addition, please upload a response to the reviews as a 'Prior Peer Review' file type, which should include the reports in full and a point-by-point reply detailing how you have or plan to address the reviewers' concerns.

During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

PLOS

Empowering researchers to transform science

Carlyle House, Carlyle Road, Cambridge, CB4 3DN, United Kingdom

California (U.S.) corporation #C2354500, based in San Francisco

Decision Letter 1

Richard Hodge

25 Jun 2025

Dear Dr Manor,

Thank you for your continued patience while your manuscript "VASCilia (Vision Analysis StereoCilia): A Napari Plugin for Deep Learning-Based 3D Analysis of Cochlear Hair Cell Stereocilia Bundles" was peer-reviewed at PLOS Biology. Please accept my sincere apologies for the delays that you have experienced during the peer review process. Your manuscript has now been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by three independent reviewers.

In light of the reviews, which you will find at the end of this email, we would like to invite you to revise the work to thoroughly address the reviewers' reports.

As you can see, the reviewers think that the tool will be useful for the hair cell field, but they raise some important concerns that should be addressed in the revised version. Reviewer #1 notes that they had problems installing the plugin and requests that more comprehensive installation instructions/tutorials be provided to ensure its user-friendliness and accessibility. The reviewer also raises concerns about the presentation of the paper and suggests that it be restructured to serve as a proof of principle for the different features. In addition, the reviewers request several experimental revisions to fully demonstrate the utility and versatility of the approach. Reviewer #3 asks whether the tool can be adapted to study vestibular hair cells to broaden its impact and notes that additional validation data should be provided for the quantification of stereocilia length. Finally, Reviewer #1 raises several technical concerns regarding the variability in the reported measurements.

Given the extent of revision needed, we cannot make a decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is likely to be sent for further evaluation by all or a subset of the reviewers.

In addition to these revisions, you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly.

We expect to receive your revised manuscript within 3 months. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may withdraw it.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Best regards,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

------------------------------------

REVIEWS:

Reviewer #1: This would certainly be a useful tool for hair cell biologists, and the paper gives a thorough account of its development and features. However, the paper is long and very dense, and I think a lot of the text could be moved onto their website as files to read when installing or using the program. I think the paper should serve more as a proof of principle for the different features; however, some of that kind of data is lacking, especially for the measurement features (Figs 8–13).

What I think could be improved is as follows:

1) I would have loved to test out this plugin but was not able to get it installed following the instructions in the ReadMe document that they direct you to in the paper. I also could not access the other documents on the GitHub site. For this to be more user-friendly, better installation instructions have to be provided in the paper, including the system requirements needed on the computer. I know of one other person who eventually got it installed on their PC but was unable to run the example files they provide, and the software was unable to open their confocal images without crashing. I think that without clear, user-friendly installation instructions, this will not be very usable for most labs. Perhaps a detailed tutorial written for one of the demo files would be useful for guiding the use of the plug-in.

2) In Figure 4 (and the other figures), it is unclear to me how the bundle segmentations appear in 3D or how far they extend into the z plane. It seems that most of the images are fairly shallow stacks of flattened hair bundles, and judging from Figure 4 (and from Figure 28), the bundle segmentation is fairly shallow and uniform within the z-axis. What happens with images where the bundle is not flattened and varies in shape more throughout the z-axis? Also, does the segmentation extend into the cuticular plate at all, or is there cell-to-cell variability in how much cuticular plate is included? The phalloidin intensities in the cuticular plate and the stereocilia can be quite similar at certain timepoints, which may make segmentation of only the bundle more difficult. The segmentations in Figure 6 appear to be for more upright bundles. It would be good to show one of these close up at different rotations to see the segmentation in the xz and yz planes, and to keep the raw actin signal separate to see how the segmentation aligns with the bundle in all planes.

3) For the height measurements in Figure 8, how are the two observers performing the measurements? I'm assuming it's by marking the top and base of the tallest stereocilia within the z-stack? This approach to height measurement, especially given the varying orientation of these bundles, will likely add quite a lot of variability. Can you compare these measurements to another approach?

4) What are the four datasets in Figure 8? Are they all from the same age and tonotopic region? The heights are averaged in Table 3, so I am assuming they are, but the values for each dataset are quite different. If they are from the same area, then it appears the orientation of the stereocilia has a huge effect on the height measurement. This would make sense if the amount of stereocilia length in the z-axis rather than xy varies from cell to cell. The approach seems most viable only for the more flattened bundles.

5) The comments in the text about the time needed for height measurements in Fiji versus in the program do not seem accurate. It is likely that the user will have to go through and refine the base and top points that the program provides, or at least would have to double-check that they were accurate. This would significantly reduce the time difference.

6) While the measurements between the program and observers are not different (Table 3), they are all highly variable judging by the standard deviations in the table. This result likely derives from the variability in the bundle positions in the images and the way in which their calculation method intersects with this variability, which would apply to both program and observer calculations. The SD is about 25% of the mean length, making it quite challenging to find significant length changes between conditions. For the data in Figure 9a, one can see that the variability is quite large across all conditions, and I am not sure what level of significance could be measured. I would like to see a table of these values that includes mean and standard deviation as well as any statistical tests (see the illustrative sketch after this list).

7) The total intensity measurements in Figures 9 and 10 are also incredibly variable. For Figure 9, I would also like to see a table of these values that includes mean and standard deviation as well as any statistical tests. I do not think the authors can make any claims about the correlation between height and actin intensity without showing the statistics.

8) In figure 10, one can see some cell-to-cell variations in bundle segmentation that could contribute to intensity variation in the plot. Could you show the segmentation from another angle? It seems likely that there could be variation in the amount of cuticular plate actin that would be included in each segmentation which would affect these intensity measurements. How do these values compare to another approach for actin intensity measurement?

9) For Figure 11, I am not convinced as to how useful this tool will be for other datasets from other ages. It seems like you would have to retrain it using your own images, but if you have already marked the tonotopic position of each image beforehand, then what value does this feature provide?

10) For Figures 12 and 13, could you compare your measurements to those collected using a different method or program?
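
To illustrate the kind of statistical summary requested in points 6 and 7 (the revised paper ultimately reports Welch t-tests with Hedges' g effect sizes and Holm-adjusted p-values; see S2 and S4 Tables), here is a minimal Python sketch; the data arrays are hypothetical placeholders, not values from the paper.

    import numpy as np
    from scipy import stats
    from statsmodels.stats.multitest import multipletests

    def hedges_g(a, b):
        # Pooled-SD effect size with the small-sample (Hedges) bias correction.
        n1, n2 = len(a), len(b)
        sp = np.sqrt(((n1 - 1) * np.var(a, ddof=1) + (n2 - 1) * np.var(b, ddof=1))
                     / (n1 + n2 - 2))
        j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
        return j * (np.mean(a) - np.mean(b)) / sp

    # Hypothetical per-cell bundle heights (um) for two conditions.
    cond_a = np.array([4.1, 4.5, 3.9, 4.8, 4.2])
    cond_b = np.array([5.6, 5.9, 6.1, 5.2, 5.8])
    print(f"mean +/- SD: {cond_a.mean():.2f} +/- {cond_a.std(ddof=1):.2f}")

    t, p = stats.ttest_ind(cond_a, cond_b, equal_var=False)  # Welch t-test
    print(f"Welch t = {t:.2f}, p = {p:.4g}, Hedges g = {hedges_g(cond_a, cond_b):.2f}")

    # With several pairwise contrasts, control family-wise error with Holm:
    reject, p_adj, _, _ = multipletests([p, 0.03, 0.20], method="holm")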

Reviewer #2 (identifies themselves as Bo Zhao): The study "VASCilia (Vision Analysis StereoCilia): A Napari Plugin for Deep Learning-Based 3D Analysis of Cochlear Hair Cell Stereocilia Bundles" by Kassim and colleagues developed a fantastic deep learning-based software tool capable of automatically and efficiently measuring many key cellular properties of hair cells, including stereocilia length, actin content, and cell polarity. These features are critical for maintaining normal hearing function, and defects in them are known to contribute to hearing loss. Compared with manual measurements using tools such as Fiji, which are often time-consuming and labor-intensive, this automated approach offers a significant advancement. The reviewer greatly appreciates the valuable tools the team has brought to the field. To further strengthen this manuscript and potentially enhance the software's performance, I have the following suggestions.

1. The sentence in the abstract stating that "Using VASCilia, we found that the total actin content of stereocilia bundles (as measured by phalloidin staining) does not necessarily increase with bundle height, which is likely due to differences in stereocilia thickness and number" could be clarified further, as it currently lacks essential information such as the genotypes and developmental stages of the hair cells being referenced.

2. The fluorescence intensity measurements in Fig. 10 appear to show edge effects. Specifically, the intensity of OHCs in the upper left corner is much lower compared with that of the OHCs on the right side.

3. Would it be feasible to measure the height of the lower rows of stereocilia and quantify the height differences between different rows? Since different rows of stereocilia frequently overlap at the basal region, is there a way to identify the base of each stereocilium based on 3D images, such as through counterstaining with a stereocilia basal marker?

Several minors:

1. Please consider labeling all three OHCs in Fig. 1 using the same color.

2. There is a typo in the last sentence of page 3.

3. The unit of length is missing in the Y-axis in Fig. 9a.

4. To improve clarity, please specify that it is the length of the tallest row of stereocilia in Fig. 9a.

Reviewer #3: This manuscript presented a very useful tool for studying the morphology of auditory hair cell stereocilia. I tried this plugin, and its overall quality is solid, with tutorials and functions well documented. It has the potential to be adopted by other labs to streamline hair bundle analysis. That said, there are still some questions about VASCilia that could be addressed to improve its clarity and utility. Please check the comments below:

Versatility of VASCilia: Considering that vestibular hair cells are widely used for studying PCP phenotypes, can VASCilia be easily adapted to study those hair cells by retraining the model?

3D contour of hair bundle: From the representative images provided in Figure 6, most of the reconstructed 3D contours seem to have a "plateau" at the top of the hair bundle, which may not accurately represent the true staircase-like structure of a hair bundle. I wonder if this could be improved by retraining the model, or whether the tip regions are excluded by design, i.e., classified into the pre-cellular zone.

Stereocilia length quantification: The automatic quantification of stereocilia length is very powerful. However, it is important to note that VASCilia only quantifies the longest stereocilia within each bundle. Since VASCilia identifies the 3D contour of the hair bundle, I am curious whether its function could be extended to infer the height of the second row of stereocilia by using the contour size of each z-section (see the sketch below). This function could be very helpful for studying actin remodeling after noise damage or MET deficits, but I understand this could be very challenging to implement.
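
As a rough illustration of the contour-size idea described above, the per-slice footprint of a segmented bundle can be profiled from a 3D instance mask; a step increase moving from the bundle's top toward its base would hint at the z-level where a shorter row begins to contribute. This is a generic sketch under our own assumptions (the mask array and label value are hypothetical placeholders), not VASCilia's implementation.

    import numpy as np

    def area_profile(labels_3d, bundle_id):
        # labels_3d: (z, y, x) integer instance mask, ordered top to base.
        # Returns the number of voxels per z-slice belonging to one bundle.
        return (labels_3d == bundle_id).sum(axis=(1, 2))

    labels = np.zeros((12, 64, 64), dtype=int)  # placeholder mask
    profile = area_profile(labels, bundle_id=1)
    # The largest jump between consecutive slices is a crude candidate for
    # where the second row begins:
    step_z = int(np.argmax(np.diff(profile))) if profile.any() else None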

Validation of the stereocilia length quantification: It would be helpful to include human annotator ground truth for Eps8 KO mice in Figure 9a. Also, I wonder whether VASCilia can provide accurate quantification for hair bundles with severe developmental deficits, such as MYO7A or CDH23 KO hair cells.

Fluorescence intensity measurements: Could the authors elaborate on the method used to normalize fluorescence signal intensity? For example, what are phalloidin intensities normalized to in Figures 9b and 10? Are intensities calculated as total signal, or volume-normalized as signal/voxel? More detail on the normalization strategy would be helpful to minimize batch effects and replicate these analyses.

High-throughput computation of bundle orientation: The ability to quantify the angle of stereocilia bundles is very useful. I wonder if the authors could provide some validation on a cochlea with a PCP deficit. Meanwhile, is it easy to quantify the V-shape angle of the hair bundle (see the sketch below)?
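
On the V-shape question, one generic way to express such an angle (not a documented VASCilia feature; the points below are hypothetical) is the angle between the two arm vectors drawn from the bundle's vertex to its wing tips:

    import numpy as np

    def v_angle(vertex, left_tip, right_tip):
        # Angle in degrees between the two arms of a V-shaped bundle.
        a = np.asarray(left_tip, float) - np.asarray(vertex, float)
        b = np.asarray(right_tip, float) - np.asarray(vertex, float)
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    print(v_angle((0, 0), (-3, 4), (3, 4)))  # about 73.7 degrees for this toy geometry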

Other comments: There are some missing definitions for abbreviations, such as "FC" in the convolutional neural network shown in Figure 2. Please include these in the manuscript to improve readability.

Decision Letter 2

Richard Hodge

21 Nov 2025

Dear Dr Manor,

Thank you for your patience while we considered your revised manuscript "VASCilia (Vision Analysis StereoCilia): A Napari Plugin for Deep Learning-Based 3D Analysis of Cochlear Hair Cell Stereocilia Bundles" for publication as a Methods and Resources Article at PLOS Biology. Please accept my sincere apologies for the delays that you have experienced during this round of the peer review process. This revised version of your manuscript has been evaluated by the PLOS Biology editors, the Academic Editor and the original reviewers.

Based on the reviews, we are likely to accept this manuscript for publication, provided you satisfactorily address the remaining points raised by the reviewers. After discussions with the academic editor, we will not make the additional validation of the intensity quantification methods essential for the revision, but we do ask that you make sure that the installation instructions are correct and that the program can run successfully, as this is essential to ensure that the tool will be accessible and useful for the field.

IMPORTANT

In addition, please make sure to address the following editorial and data-related requests that I have provided below (A-G):

(A) We routinely suggest changes to titles to ensure maximum accessibility for a broad, non-specialist readership. In this case, we would suggest a minor edit to the title, as follows. Please ensure you change both the manuscript file and the online submission system, as they need to match for final acceptance:

“VASCilia is an open-source, deep learning-based tool for 3D analysis of cochlear hair cell stereocilia bundles”

(B) You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

-Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

-Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it:

Figure 7A-B, 9A-B, 10, 15A-F

NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

(C) Please also ensure that each of the relevant figure legends in your manuscript include information on *WHERE THE UNDERLYING DATA CAN BE FOUND*, and ensure your supplemental data file/s has a legend.

(D) Please note that we cannot accept sole deposition of code in GitHub, as this could be changed after publication. However, you can archive this version of your publicly available GitHub code to Zenodo. Once you do this, it will generate a DOI number, which you will need to provide in the Data Accessibility Statement (you are welcome to also provide the GitHub access information). See the process for doing this here: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content

(E) Please ensure that your Data Statement in the submission system accurately describes where your data can be found and is in final format, as it will be published as written there.

(F) Please ensure that you are using best practice for statistical reporting and data presentation. These are our guidelines https://journals.plos.org/plosbiology/s/best-practices-in-research-reporting#loc-statistical-reporting and a useful resource on data presentation https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128

- If you are reporting experiments where n ≤ 5, please plot each individual data point.

(G) Please note that per journal policy, the model system/species studied should be clearly stated in the abstract of your manuscript.

------------------------------------------------------------------------

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

In addition to these revisions, you may need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly. If you do not receive a separate email within a few days, please assume that checks have been completed, and no additional changes are required.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable, if not applicable please do not delete your existing 'Response to Reviewers' file.)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://plos.org/published-peer-review-history/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Best regards,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org

----------------------------------------------------------------------

Reviewer remarks:

Reviewer #1: The authors have made some significant improvements to the manuscript in response to the reviewers' comments, although the length of the responses made the document difficult to read. We do appreciate that the authors provided a summary of the responses and then followed that with more details. Nevertheless, the manuscript is not an easy read.

Overall, we think the paper presents what could be a useful tool for some hair cell biologists and does a decent job of demonstrating its fidelity by comparing to other measurement techniques. Some of the specific use cases, like measuring hair bundle angles, should be very useful, as the VASCilia approach enables analysis of many samples very quickly, allowing much larger datasets that should have higher statistical power.

That said, this paper is in essence a methods paper and does not provide any new biological information, but rather uses case studies on some selected datasets to demonstrate each feature of the VASCilia software. As discussed below, the one case study where they claim a new biological result (reported in Fig. 9b) lacks validation and is thus difficult to accept. It is also not clear whether this tool will prove more useful than commercially available solutions (which admittedly will also require a lot of work to train and configure).

Thus this manuscript is an interesting presentation of a novel tool that may be useful to a subset of hair cell biologists. The tool is unlikely to be the only one used by those in the field, however, and there remain lingering questions about how easy it is to use for anyone trying to set it up in their laboratory.

Concerns and comments

1. A concern from the original review was over challenges in installing and using the program due to insufficient installation instructions and description of system requirements. The authors addressed these concerns and their new installation guide that includes system requirements seems much more thorough and easier to follow. Moreover, the text is certainly more streamlined now due to removing the user guides. Nevertheless, we were unable to get the program running ourselves because it apparently does not recognize the GPU we have in the computer. We recognize that in order to properly test the program we need to match the specifications laid out precisely, but it certainly limits the utility of the program if a new dedicated system must be purchased.

2. The authors are surely aware that there are commercial software programs (Arivis, Imaris, etc.) that, while not specifically designed for hair cells, already allow for 3D segmentation of hair bundles and include the ability to create deep-learning instance-based segmentation models (e.g., Arivis). However, these programs are quite expensive and having an open source program already trained for bundle segmentation will certainly be useful to hair cell biologists for certain analysis tasks. Nevertheless, we would appreciate some discussion of alternatives.

3. In the original review, we had concerns over the depth of segmentations in 3D, the possible inclusion of cuticular plate, and possible variance in its performance on flattened versus upright bundles. The authors addressed these concerns by providing a new supplementary figure that shows the segmentation with per-slice overlays and orthogonal views. This figure is very helpful in providing a more thorough representation of what the segmentations look like in 3D relative to the actin signal, as well as what they look like for a wider variety of bundles (in terms of both age and orientation, flat vs. upright). The changes to Fig. 10 as well as the addition of the Cdh23−/− bundle case study were also helpful in showing the 3D segmentation performance more thoroughly for a wider variety of bundles.

4. For the height measurements, the authors have answered our questions as to how the manual measurements were made, provided more details on the datasets used, and provided statistical comparisons for the results. I appreciate the new supplemental figures showing the z-stacks and the Excel output for each cell, as well as the new validation of the KO models provided in Tables S7 and S8. Overall, our concerns over the height measurements have been satisfied.

5. The use cases for height measurement, bundle disorganization, and bundle orientation are the most successful, because the authors include appropriate case studies using KO lines with known changes in the relevant measurement parameters and thoroughly compare their results to other measurement techniques (for height and orientation). However, the section on intensity measurements is the least successful in this regard. The authors should have used a case study for intensity measurements that could be more easily validated and interpreted; without validation of their result (that basal hair cell bundles have more phalloidin intensity than apical hair bundles), it is impossible to know whether the results are correct. They certainly do not seem correct; we have examined thousands of cochlear hair cells labeled with phalloidin, and we are very surprised by the authors' claimed result. The only way the data in Fig. 9b should be included is if the result is corroborated by similar experiments with different markers (e.g., an anti-actin antibody or an antibody against a stereocilia cytoskeleton protein). As is, the data on total actin intensities in Fig. 9b are very hard to interpret and do not provide a good example of how these intensity measurements could be used more generally to give accurate and more high-throughput results than current measurement tools.

6. A different approach to intensity measurements might be to use the bundle segmentations to measure the intensity of another protein expressed in hair bundles that they know would be missing or altered in expression in KO bundles (for instance, measuring EPS8 expression in WT and Eps8 KO bundles). The resulting data would be more interpretable and also easier to verify using standard measurement techniques.

Reviewer #2 (Bo Zhao, identifies himself): Thanks for developing this tool for the field.

Reviewer #3 (Sihan Li, identifies himself): I have some minor concerns about the new phalloidin quantification methods provided by the authors. 1. "Summing restricted to the top (n) layers" may lead to a loss of information for hair bundles that have larger Z-dimensions. 2. "Padding shallower bundles by repeating the bundle's lowest value" might introduce false values that do not represent real phalloidin signal. However, it is nice to include these methods in VASCilia, as they provide more options to meet different experimental needs.
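
To make the two options concrete, here is a minimal sketch of our reading of them (the function name and inputs are illustrative placeholders, not VASCilia's code): per-slice sums are truncated to the top n z-layers, and bundles shallower than n layers are padded by repeating the bundle's lowest value.

    import numpy as np

    def top_n_intensity(per_slice_sums, n):
        # per_slice_sums: summed phalloidin signal per z-slice, ordered from
        # the bundle's top to its base.
        vals = np.asarray(per_slice_sums, dtype=float)
        if len(vals) >= n:
            return vals[:n].sum()                 # truncate deeper bundles
        pad = np.full(n - len(vals), vals[-1])    # repeat the lowest value
        return np.concatenate([vals, pad]).sum()  # pad shallower bundles

    print(top_n_intensity([10.0, 8.0, 5.0], n=5))  # 3 real + 2 padded slices -> 33.0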

Overall, the revised manuscript has addressed most of my previous questions. It also provides sufficient experimental validation and detailed descriptions of the analysis. I find the revisions satisfactory and would endorse the publication of this manuscript.

Decision Letter 3

Richard Hodge

18 Dec 2025

Dear Uri,

On behalf of my colleagues and the Academic Editor, Alan Cheng, I am pleased to say that we can accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study.

Best wishes,

Richard

Richard Hodge, PhD

Senior Editor, PLOS Biology

rhodge@plos.org


Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Describes the VASCilia workflow and features, user-guided adjustment of automated measurements, and the training procedures for stereocilia bundle segmentation.

    (PDF)

    pbio.3003591.s001.pdf (119.7KB, pdf)
    S1 Fig. Representative P21 apex stereocilia segmentations (upright vs. flattened) shown across all z planes with overlays and 3D/YZ views; masks follow stereocilia signal and exclude cuticular plate unless signal is present.

    (TIFF)

    S2 Fig. Full z-stacks (DS1–DS4) used to derive measurement crops; red boxes mark IHC/OHC positions.

    (TIFF)

    pbio.3003591.s003.tif (914.5KB, tif)
    S3 Fig. Straight vs. rotated bundles (P5): raw stack, automatic segmentation, and 3D length readouts; 3D measurement handles both morphologies.

    (TIFF)

    pbio.3003591.s004.tif (709.6KB, tif)
    S4 Fig. Angle computation on a PCP-deficit cochlear dataset; VASCilia closely matches manual Fiji measurements (paired t-test p = 0.783, Wilcoxon p = 0.965).

    (TIFF)

    pbio.3003591.s005.tif (1.5MB, tif)
S5 Fig. Representative images for Cdh23+/− and Cdh23−/− mice.

    (TIFF)

    pbio.3003591.s006.tif (567.4KB, tif)
    S6 Fig. VASCilia screenshots with multiple samples (Lab A), set 1.

    (TIFF)

    pbio.3003591.s007.tif (1.2MB, tif)
    S7 Fig. VASCilia screenshots with multiple samples (Lab A), set 2.

    (TIFF)

    pbio.3003591.s008.tif (1.3MB, tif)
    S8 Fig. VASCilia screenshots with Lab B mouse cochlea data, set 1.

    (TIFF)

    pbio.3003591.s009.tif (181.8KB, tif)
    S9 Fig. VASCilia screenshots with Lab B mouse cochlea data, set 2.

    (TIFF)

    pbio.3003591.s010.tif (258.7KB, tif)
    S10 Fig. VASCilia screenshots with Lab B human cochlea data.

    (TIFF)

    pbio.3003591.s011.tif (225.4KB, tif)
    S11 Fig. Base vs. apex in WT animals: base shows wider/stiffer bundles; apex finer/tighter bundles with reduced phalloidin intensity.

    (TIFF)

    pbio.3003591.s012.tif (2.9MB, tif)
    S1 Table. Bundle height summary by genotype (WT/KO), tonotopic region (Base/Middle/Apex), and cell type (IHC/OHC); mean, SD, median, N.

    (PDF)

    pbio.3003591.s013.pdf (72.6KB, pdf)
    S2 Table. Pairwise tonotopic contrasts in bundle height (Welch t-tests; Hedges’ g; Holm-adjusted p); all significant except KO IHC Base–Middle.

    (PDF)

    pbio.3003591.s014.pdf (92.8KB, pdf)
    S3 Table. Normalized fluorescence intensity by genotype/tonotopic region/cell type; mean, SD, median, N.

    (PDF)

    pbio.3003591.s015.pdf (68.5KB, pdf)
    S4 Table. Pairwise tonotopic contrasts for normalized intensity with effect sizes and Holm-adjusted p-values.

    (PDF)

    pbio.3003591.s016.pdf (88.7KB, pdf)
    S5 Table. Per-cell orientation angles (Manual vs. VASCilia), dataset 1; summary statistics.

    (PDF)

    pbio.3003591.s017.pdf (66.7KB, pdf)
    S6 Table. Per-cell orientation angles (Manual vs. VASCilia), dataset 2; summary statistics.

    (PDF)

    pbio.3003591.s018.pdf (66.4KB, pdf)
    S7 Table. Per-cell stereocilia lengths for Eps8 KO (Manual vs. VASCilia; n = 15); no significant differences (paired t p = 0.609; Wilcoxon p = 0.720).

    (PDF)

    pbio.3003591.s019.pdf (60.7KB, pdf)
    S8 Table. Per-cell stereocilia lengths for Cdh23 −/− (Manual vs. VASCilia; n = 15); no significant differences (paired t p = 0.851; Wilcoxon p = 0.639).

    (PDF)

    pbio.3003591.s020.pdf (77.1KB, pdf)
    S9 Table. Per-bundle orientation values for a PCP-deficit mouse dataset (Manual vs. VASCilia); tests indicate no significant difference.

    (PDF)

    pbio.3003591.s021.pdf (69.8KB, pdf)
S1 Data. Excel file containing all data related to Table 1, Table 2, Figs 5A–5D, Figs 7A–7B, Figs 10A–10B, Figs 11–15, and Figs 17A–17F. The name of each worksheet corresponds to the related figure or table.

    (XLSX)

    pbio.3003591.s022.xlsx (1.7MB, xlsx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pbio.3003591.s023.docx (9.3MB, docx)
    Attachment

    Submitted filename: PLOS Biology response to reviewers.docx

    pbio.3003591.s024.docx (3.8MB, docx)

    Data Availability Statement

In support of open science and to enhance reproducibility, we are pleased to publicly release all annotated datasets used for training and testing in our experiments. This dataset consists of 1,870 stereocilia bundles (410 IHCs and 1,460 OHCs) from P5 mice, and 335 bundles (92 IHCs and 243 OHCs) from P21 mice. It is freely available to the research community, facilitating further studies, benchmarking, and advances in the field. Additionally, the complete source code for the Napari plugin has been made publicly accessible. The code can be accessed through our dedicated GitHub repository at the following link: https://github.com/ucsdmanorlab/Napari-VASCilia. The corresponding archived release is available on Zenodo at https://doi.org/10.5281/zenodo.17871652. The full documentation is accessible through this link: https://ucsdmanorlab.github.io/Napari-VASCilia/. The raw data and manual annotations used to train the final model provided with VASCilia are available at the following link: https://doi.org/10.5281/zenodo.17905676. We encourage the community to utilize these resources in their research and to contribute improvements or variations to the methodologies presented.
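
For orientation, a minimal sketch of launching the plugin host after installation (installation steps are in the repository's README and documentation linked above; the comment about the menu entry is our assumption based on the repository name):

    import napari  # the plugin runs inside the napari viewer

    viewer = napari.Viewer()  # VASCilia's widgets appear under the Plugins menu
    napari.run()              # start the GUI event loop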

