eLife. 2021 Feb 24;10:e60321. doi: 10.7554/eLife.60321

Graphical-model framework for automated annotation of cell identities in dense cellular images

Shivesh Chaudhary 1, Sol Ah Lee 1, Yueyi Li 1, Dhaval S Patel 1, Hang Lu 1,2
Editor: Ronald L Calabrese
PMCID: PMC8032398  PMID: 33625357

Abstract

Although identifying cell names in dense image stacks is critical for analyzing functional whole-brain data and enabling comparison across experiments, unbiased identification is very difficult and relies heavily on researchers’ experience. Here, we present a probabilistic-graphical-model framework, CRF_ID, based on Conditional Random Fields, for unbiased and automated cell identification. CRF_ID focuses on maximizing intrinsic similarity between shapes. Compared to existing methods, CRF_ID achieves higher accuracy on simulated and ground-truth experimental datasets and better robustness against challenging noise conditions common in experimental data. CRF_ID can further boost accuracy by building atlases from annotated data in a highly computationally efficient manner, and by easily adding new features (e.g. from new strains). We demonstrate cell annotation in Caenorhabditis elegans images across strains, animal orientations, and tasks, including gene-expression localization, multi-cell, and whole-brain functional imaging experiments. Together, these successes demonstrate that unbiased cell annotation can facilitate biological discovery, and this approach may be valuable for annotation tasks in other systems.

Research organism: C. elegans

Introduction

Annotation of anatomical structures at cellular resolution in large image sets is a common data-analysis step in many studies in Caenorhabditis elegans, such as gene-expression pattern analysis (Long et al., 2009; Murray, 2008), lineage tracing (Bao et al., 2006), and multi-cell calcium imaging and whole-brain imaging (Schrödel et al., 2013; Kato et al., 2015; Venkatachalam et al., 2016; Nguyen et al., 2016). It is necessary for cellular-resolution comparison of data across animals, trials, and experimental conditions. Particularly in whole-brain functional imaging, meaningful interpretation of population activity critically depends on cell identities, as they facilitate the incorporation of existing knowledge about the system (Kato et al., 2015). Cell identities are also needed for applying common statistical data-analysis methods such as Principal Component Analysis, Tensor Component Analysis, and demixed Principal Component Analysis (Williams et al., 2018; Kobak et al., 2016), as data across experiments need to be indexed and pooled by cell identities before applying these methods.

While accurate annotation of cell identities in images is critical, the task is difficult. Typically, the use of cell-specific markers as landmarks delivers good accuracy, but at the cost of having to engineer cell-specific reagents that do not interfere with phenotypes of interest, which is not guaranteed. Further, even with markers such as the impressive recently developed reagents in the NeuroPAL collection (Yemini et al., 2021), there is still a need to automate the cell identification process. In the absence of markers, cells are identified by comparing images to a reference atlas such as the WormAtlas (Altun and Hall, 2009) or the OpenWorm (Szigeti et al., 2014) atlas. However, there are severe limitations arising both from the use of reference atlases and from the presence of noise in data. Reference atlases assume a static, and often single, view of the anatomy; in contrast, anatomical features vary across individuals. Moreover, due to variations in experimental conditions during acquisition, such as the exact resolution and orientation of animals, image data often do not match the static atlases, making manual cell identification extremely difficult if not infeasible. Separately, two kinds of noise are prevalent in data: first, individual-to-individual variability in cell positions compared to positions in the atlas (position noise); second, a mismatch between the number of cells in the image and in the atlas (count noise). Count noise is primarily caused by variability in the expression levels of the reporter used to label cells across animals (i.e. mosaicism), incomplete coverage of the promoter used to label desired cells, and limits of the computational methods used to detect cells. In each of these cases, fewer cells are detected in the images than exist in the atlas. Empirical data have shown that, in normalized coordinates, a cell’s position can deviate from the atlas position by more than the cell’s distance to its tenth nearest neighbor in the image (Yemini et al., 2021; Toyoshima et al., 2016). Further, our data, as well as data from other labs, have shown that 30–50% of cells in atlases may be missing from images (Kato et al., 2015; Venkatachalam et al., 2016; Nguyen et al., 2016). As a result of the large position and count noise common in data, identifying densely packed cells in head ganglion images of C. elegans by manually comparing images to the atlas is extremely difficult, even for experienced researchers. Further, manual annotation is labor-intensive. Therefore, there is a critical need for automated methods for cell identification.

Previous computational methods for cell identification in C. elegans images (Long et al., 2009; Long et al., 2008; Qu et al., 2011; Aerni et al., 2013) focused on identifying sparsely distributed cells with stereotypical positions in young larval animals. Tools for identification of cells in whole-brain datasets, that is, in the dense head ganglion, do not exist. Further, previous methods (Long et al., 2009; Yemini et al., 2021; Qu et al., 2011; Aerni et al., 2013; Toyoshima, 2019; Scholz, 2018) do not explicitly address the challenges imposed by the presence of position and count noise in the data. All previous methods are either registration-based or formulate a linear assignment problem; objective functions in these methods minimize a first-order constraint such as the distances between cell-specific features in images and atlases. Thus, these methods maximize only extrinsic similarity (Bronstein et al., 2007) between images and atlas, which is highly sensitive to count noise, position noise, and the pre-alignment of the spaces in which the image and the atlas exist (i.e. the orientations of animals in images and atlases). With the amount of position and count noise commonly observed in experimental data, registration-based methods produce large matching errors.

An alternative criterion proposed for topology-invariant matching of shapes is to maximize intrinsic similarity (Bronstein et al., 2007; Bronstein et al., 2009), which is orthogonal to extrinsic similarity. This approach has advantages because noise that affects extrinsic similarity does not necessarily imply worse intrinsic similarity. For instance, although cell positions in an image may deviate from their positions in the atlas (large extrinsic noise), the geometrical relationships among them are largely maintained (low intrinsic noise). As a specific example, although the absolute positions of the cell bodies of AIBL and RIML in an image may deviate greatly from their atlas positions, the AIBL soma stays anterior to the RIML soma. Therefore, intrinsic similarity is more robust against noise, independent of the pre-alignment of spaces, and inherently captures dependencies between cell-label assignments that registration methods do not consider.

To directly optimize for intrinsic similarity and dependencies between label assignments, we cast the cell annotation problem as a Structured Prediction Problem (Bakir, 2007; Nowozin, 2010; Caelli and Caetano, 2005; Kappes et al., 2015) and build a Conditional Random Fields (CRF) model (Lafferty et al., 2001) to solve it. The model directly optimizes cell-label dependencies by maximizing intrinsic and extrinsic similarities between images and atlases. One major advantage, as shown using both synthetic data with realistic properties (e.g. statistics from real data) and manually annotated experimental ground-truth datasets, is that CRF_ID achieves higher accuracy compared to existing methods. Further, CRF_ID outperforms existing methods in handling both position noise and count noise common in experimental data across all challenging noise levels.

To further improve accuracy, we took two approaches. First, we took advantage of spatially distributed (fluorescently labeled) landmark cells. These landmark cells act as additional constraints on the model, thus aiding optimization and helping in pre- as well as post-prediction analysis. Second, we developed a methodology to build data-driven atlases that capture the statistics of experimentally observed data for better prediction. We provide a set of computational tools for automatic and unbiased annotation of cell identities in fluorescence image data, and for efficient building of data-driven atlases using fully or partially annotated image sets. We show the utility of our approach in several contexts: determining gene-expression patterns with no prior expectations, tracking the activities of multiple cells during calcium imaging, and identifying cells in whole-brain imaging videos. For the whole-brain imaging experiments, our annotation framework enabled us to analyze the simultaneously recorded responses of the C. elegans head ganglion to a food stimulus and to identify two distinct groups of cells whose activities correlated with distinct variables – food sensation and locomotion.

Results

Cell annotation formulation using structured prediction framework

Our automated cell annotation algorithm is formulated using Conditional Random Fields (CRF). CRF is a graphical-model-based framework widely used for structured/relational learning tasks in the Natural Language Processing and Computer Vision communities (Bakir, 2007; Nowozin, 2010; Lafferty et al., 2001; Sutton and McCallum, 2010). The goal of structured learning tasks is to predict labels for structured objects such as graphs. In our neuron annotation problem, we assume that our starting point is a 3D image stack of the C. elegans head ganglion (Figure 1A(i)) in which neurons have already been detected (Figure 1A(ii)), either manually or by automated segmentation, and we want to match each neuronal cell body or nucleus to an identity label (a biological name). Hence, we have $N$ detected neuronal cell bodies $x_1, \ldots, x_N$ that form the set of observed variables $\mathbf{x} = \{x_i\}_{1}^{N}$, and their 3D coordinates $p_i \in \mathbb{R}^3$, $i \in \{1, \ldots, N\}$. We also have a neuron atlas that provides a set of labels $\mathcal{L} = \{l_1, \ldots, l_K\}$ (biological names) of the neurons and positional relationships among them. Note that the number of neurons in the atlas is greater than the number of neurons detected in the image stack in all datasets, that is, $K > N$. The goal is to assign a label $y_i$ from $\mathcal{L}$ to each neuron in the image stack. The problem is similar to structured labeling (Nowozin, 2010) since the labels to be assigned to neurons are dependent on each other. For example, if a certain neuron is assigned the label AVAL, then the neurons that can be assigned the label RMEL become restricted, since only the cells anterior to AVAL can be assigned the RMEL label.

Figure 1. CRF_ID annotation framework automatically predicts cell identities in image stacks.

(A) Steps in the CRF_ID framework applied to neuron imaging in C. elegans. (i) Max-projection of a 3D image stack showing head ganglion neurons whose biological names (identities) are to be determined. (ii) Automatically detected cells (Materials and methods) shown as colored regions overlaid on the raw image. (iii) Coordinate axes are generated automatically (Note S1). (iv) Identities of landmark cells, if available, are specified. (v) Unary and pairwise positional relationship features are calculated in the data. These features are compared against the same features in the atlas. (vi) Atlases can be easily built from fully or partially annotated datasets from various sources using the tools provided with the framework. (vii) An example of unary potentials, showing the affinity of each cell for taking the label RMGL. (viii) An example of dependencies encoded by pairwise potentials, showing the affinity of each cell for taking the label ALA given that the arrow-pointed cell is assigned the label RMEL. (ix) Identities are predicted by simultaneous optimization of all potentials such that the assigned labels maximally preserve the empirical knowledge available from atlases. (x) Predicted identities. (xi) Duplicate assignment of labels is handled using a label-consistency score calculated for each cell (Appendix 1–Extended methods S1). (xii) The process is repeated with different combinations of missing cells to marginalize over missing cells (Note S1). Finally, a top-candidate label list is generated for each cell. (B) An example of automatically predicted identities (top picks) for each cell.

Figure 1.

Figure 1—figure supplement 1. Schematic description of various features in the CRF model that relate to intrinsic similarity and extrinsic similarity.

Figure 1—figure supplement 1.

(A) An example of the binary positional relationship feature (Appendix 1–Extended methods S1.2.2), illustrated for positional relationships along the AP axis. The table lists the feature value for some exemplary assignments of labels ‘A’, ‘B’, and ‘C’ from the atlas to cells ‘1’ and ‘2’ in the image data. For example, since cell ‘1’ is anterior to cell ‘2’ in the image, if the labels assigned to these cells are consistent with the anterior-posterior positional relationship (e.g. ‘A-B’, ‘A-C’, ‘B-C’), then the feature value is high (1); else low (0). The CRF_ID model assigns identities to cells in the image by maximizing the feature values for each pair of cells over all possible label assignments. The table also illustrates the difference between using a static atlas (or single data source) and a data-driven atlas built using available annotated data. In the case of a static atlas, the CRF model assumes that cell ‘A’ is anterior to cell ‘B’ with 100% probability. In contrast, in experimental data cell ‘A’ may be anterior to cell ‘B’ with 80% probability (8 out of 10 datasets) and cell ‘B’ may be anterior to cell ‘C’ with 50% probability (5 out of 10 datasets). Thus, data-driven atlases relax the hard constraint and use statistics from experimental data. The feature values are changed accordingly. Note that unlike registration-based methods for building data-driven atlases, in the CRF model data-driven atlases record only probabilistic positional relationships among cells and not probabilistic positions of cells. Thus CRF_ID does not build a spatial atlas of cells. (B) An example of the angular relationship feature (Appendix 1–Extended methods S1.2.4). The table lists the feature value for some exemplary assignments of labels. For example, the feature value is highest for assigning labels ‘A’ and ‘C’ to cells ‘1’ and ‘2’ because the vector joining cells ‘A’ and ‘C’ in the atlas ($v_{AC}$) is most directionally similar to the vector joining cells ‘1’ and ‘2’ in the image ($u_{12}$), as measured by the dot product of the vectors. For data-driven atlases, average vectors in the atlas are used. (C) An example of the proximity relationship feature (Appendix 1–Extended methods S1.2.3). The table lists the feature value for some exemplary assignments of labels. For example, the feature value is low for assigning labels ‘B’ and ‘C’ to cells ‘1’ and ‘2’ because the distance between cells ‘B’ and ‘C’ in the atlas ($d_{BC}$) is least similar to the distance between cells ‘1’ and ‘2’ in the image ($d_{12}$). The distance metric can be Euclidean distance or geodesic distance. For data-driven atlases, average distances in the atlas are used. (D) An example illustrating cell annotation performed by maximizing extrinsic similarity, in contrast to intrinsic similarity. Registration-based methods maximize extrinsic similarity by minimizing a registration cost function $\mathcal{L}_{\mathrm{reg}}$. Here, a transformation $\mathcal{T}$ is applied to the atlas, and labels are annotated to cells in the image by minimizing the assignment cost, that is, the sum of distances between cell coordinates in the image and the transformed coordinates of cells in the atlas. For data-driven atlases, a spatial atlas is built using annotated data, which is then used for registration. Note that, in contrast, the CRF_ID method does not build any spatial atlas of cells because it uses intrinsic similarity features. CRF_ID only builds atlases of the intrinsic similarity features shown in panels (A–C).
Figure 1—figure supplement 2. Additional examples of unary and pairwise potentials and label consistency scores calculated for each cell.

Figure 1—figure supplement 2.

(A) Unary potentials encode the affinities of each cell to take specific labels in the atlas. Here, the affinities of all cells to take the label specified in the top right corner of each image are shown. Randomly selected examples are shown; in practice, unary potentials are calculated for all cells for every label. (B) Pairwise potentials encode the affinities of a pair of cells in the head ganglion to take two labels from the atlas. Here, we show the affinity of all cells to take the label specified in the top right corner of each image, given that the cell marked by the arrow is assigned the given label. Randomly selected examples are shown; in practice, pairwise potentials are calculated for all pairs of cells for all pairs of labels. (C) Examples of the label-consistency score of cells that were assigned duplicate labels (specified in the top right corner of each image) in an intermediate step of the framework. To remove duplicate assignments, only the cell with the highest consistency score is assigned the label. Optimization is then run again to assign labels to all unlabeled cells while keeping the identities of labeled cells fixed. (D) Comparison of label-consistency scores for accurately and incorrectly predicted cells. Correctly predicted cells have a higher binary positional relationship consistency score, an angular relationship consistency score close to one (smaller angular deviation between labels in image and atlas), and a proximity consistency score close to 0 (smaller Gromov-Wasserstein discrepancy). Scores are shown for all 130 predicted cells in synthetic data across ~1100 runs; thus, n ≈ 150,000. *** denotes p<0.001, Bonferroni paired comparison test. Each run differed from the others in the random position and count noise applied to the synthetic data to mimic real images. Top, middle, and bottom lines in box plots indicate the 75th percentile, median, and 25th percentile of data, respectively.
Figure 1—video 1. Identities predicted automatically by the CRF_ID framework in head ganglion stack.
Download video file (531.4KB, mp4)
The top five predicted identities are shown, sorted by consistency score. Scale bar: 5 µm.

We use the CRF-based formulation to directly optimize for such dependencies and automatically assign names to each cell. Briefly, a node $v_i$ is associated with each observed variable $x_i$ (i.e. a segmented neuron in the image data), forming the set of variables $V = \{v_i\}_{1}^{N}$ in the model. Then, CRF models a conditional joint probability distribution $P(y \mid x)$ over the product space $\mathcal{Y} = \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_N$ of labels assigned to $V$ given observations $\mathbf{x}$, where each $\mathcal{Y}_i = \mathcal{L}$, $i \in \{1, \ldots, N\}$, and $y \in \mathcal{Y}$ is a particular assignment of labels to $V$. $\mathcal{Y}$ contains non-optimal and optimal assignments. In CRF, label dependencies among the nodes are encoded by the structure of an underlying undirected graph $G(V, \mathcal{E})$ defined over nodes $V$, where $\mathcal{C}$ denotes the set of cliques in graph $G$. With the underlying graph structure, the conditional joint probability distribution $P(y \mid x)$ over the full label space $\mathcal{Y}$ factorizes over cliques in $G$, making the model tractable:

$P(y \mid x) = \frac{1}{Z} \prod_{c \in \mathcal{C}} \Phi_c(y_c; x)$  (1)

Here, $Z$ is the normalization constant, with $Z = \sum_{y \in \mathcal{Y}} \prod_{c \in \mathcal{C}} \Phi_c(y_c; x)$, and $\Phi_c$ denotes the clique potential if the nodes in clique $c$ are assigned labels $y_c \in \mathcal{Y}_c = \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_{|c|}$. In our model, the fully connected graph structure considers only pairwise dependencies between every pair of neurons. Thus, the graph structure of our model becomes $G(V, \mathcal{E})$, with the node set $V$ containing nodes $v_i$ associated with each segmented neuron, and pairwise edges between all nodes forming the edge set $\mathcal{E}$. The potential functions in our model are node potentials $\Phi_i$ and edge potentials $\Phi_{ij}$. These functions are parameterized with unary feature functions $f_a : \mathbf{x} \times \mathcal{Y}_i \to \mathbb{R}$ and pairwise feature functions $f_b : \mathbf{x} \times \mathcal{Y}_i \times \mathcal{Y}_j \to \mathbb{R}$, respectively:

$\Phi_i(m; i, x) = \exp\left(\sum_{a} \lambda_a f_a(m; i, x)\right)$  (2)
$\Phi_{ij}(m, n; i, j, x) = \exp\left(\sum_{b} \lambda_b f_b(m, n; i, j, x)\right)$  (3)

where $m$ and $n$ are labels in the atlas. Note that there are $a$ unary features and $b$ pairwise features, with weights $\lambda_a$ and $\lambda_b$ respectively, that define the node and edge potentials.

While unary features account for extrinsic similarity and cell-specific features, pairwise features account for intrinsic similarity. To maximize accuracy, we encode pairwise dependencies between all pairs of cells in the form of several geometrical relationship features (Figure 1—figure supplement 1). The optimal identities of all neurons, $y^{*} \in \mathcal{Y}$, are obtained by maximizing the joint distribution $P(y \mid x)$. This is equivalent to maximizing the following energy function:

$y^{*} = \underset{y \in \mathcal{Y}}{\mathrm{argmax}} \; \sum_{v_i \in V} \sum_{a} \lambda_a f_a(m; i, x) + \sum_{e_{ij} \in \mathcal{E}} \sum_{b} \lambda_b f_b(m, n; i, j, x)$  (4)

Optimizing this energy function over fully connected graphs (more specifically, graphs with loops) is known to be an NP-hard problem (Kohli et al., 2009). However, approximate inference algorithms are widely used in the CRF community as they provide reasonable solutions. We implemented a popular method, Loopy Belief Propagation (Murphy et al., 1999), to infer the most probable labeling over all cells, as well as the marginal distributions of label assignments for each cell.
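To make the inference step concrete, below is a minimal max-product loopy belief propagation sketch for a fully connected pairwise model, written in Python/NumPy for exposition. This is not the released CRF_ID implementation (available in the software repository referenced later in this section); the log-potential tables node_pot and edge_pot are assumed to be precomputed from the feature functions in Equations 2–3.

```python
import numpy as np

def loopy_max_product(node_pot, edge_pot, n_iters=50):
    """Max-product loopy belief propagation on a fully connected pairwise CRF.

    node_pot : (N, K) log node potentials, i.e. sum_a lambda_a * f_a(m; i, x).
    edge_pot : (N, N, K, K) log edge potentials for each ordered cell pair.
    Returns the most probable atlas label index for each of the N cells.
    """
    N, K = node_pot.shape
    msg = np.zeros((N, N, K))          # msg[i, j]: message from cell i to cell j
    for _ in range(n_iters):
        new_msg = np.zeros_like(msg)
        for i in range(N):
            incoming = node_pot[i] + msg[:, i, :].sum(axis=0)  # all messages into i
            for j in range(N):
                if i == j:
                    continue
                belief_i = incoming - msg[j, i, :]             # exclude j's own message
                # maximize over labels of i for every candidate label of j
                new_msg[i, j] = (edge_pot[i, j] + belief_i[:, None]).max(axis=0)
        msg = new_msg - new_msg.max(axis=2, keepdims=True)     # stabilize log messages
    beliefs = node_pot + msg.sum(axis=0)                       # final node beliefs
    return beliefs.argmax(axis=1)
```

Note that the full (N, N, K, K) edge-potential table is large for whole-brain problems (N ≈ 130 cells, K ≈ 200 labels), so a practical implementation would compute edge potentials on the fly; the sketch conveys only the message-passing logic.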

The features used in the base version of the model are geometrical relationship features, which ensure that the identities assigned to cells in the image are consistent with the atlas in terms of satisfying pairwise geometrical relationships. These features include the binary positional relationship feature, the proximity relationship feature, and the angular relationship feature (Figure 1—figure supplement 1, Appendix 1–Extended methods S1.2). All these features are variants of the quadratic Gromov-Wasserstein distance used in matching metric spaces (Bronstein et al., 2009; Peyre et al., 2016) and shapes (Solomon et al., 2016; Mémoli, 2011). Briefly, binary positional relationship features encode, for example, that if cell $i$ is anterior, dorsal, and to the right of cell $j$ in the image stack, then the identities assigned to these cells should satisfy the same relationships in the atlas. Proximity relationship features ensure that if cell $i$ is spatially near cell $j$ in the image stack, then identities of spatially distant cells in the atlas will not be assigned to these cells. Finally, angular relationship features ensure that the identities assigned to cells $i$ and $j$ satisfy fine-scale directional relationships as well, not just simple binary relationships. We show that the CRF model can be easily updated to include additional features such as cells with known identities (landmark cells) and fluorescent spectral codes of cells. We demonstrate this by incorporating landmark cells and spectral information of cells in the model and showing the resulting improvement in accuracy.
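To illustrate the three feature families, the sketch below computes toy versions of them for a single candidate assignment of atlas labels (m, n) to image cells (i, j). The exact functional forms used by CRF_ID are given in Appendix 1–Extended methods S1.2; the Gaussian width in the proximity score here is an arbitrary illustrative choice.

```python
import numpy as np

def pairwise_feature_scores(p_i, p_j, q_m, q_n, sigma=5.0):
    """Illustrative pairwise feature values for assigning atlas labels (m, n)
    to image cells (i, j); higher values indicate a more consistent assignment.

    p_i, p_j : 3D coordinates of two cells in the image, expressed in the
               worm coordinate frame (AP, LR, DV); q_m, q_n : atlas coordinates.
    sigma    : width of the proximity kernel (illustrative, not CRF_ID's value).
    """
    d_img, d_atlas = p_j - p_i, q_n - q_m
    # Binary positional relationships: one feature per axis; 1 if the
    # AP (LR, DV) ordering of the two cells matches the atlas, else 0.
    binary = (np.sign(d_img) == np.sign(d_atlas)).astype(float)
    # Proximity relationship: spatially near cells in the image should not
    # receive labels of spatially distant cells in the atlas.
    proximity = np.exp(-(np.linalg.norm(d_img) - np.linalg.norm(d_atlas)) ** 2
                       / (2 * sigma ** 2))
    # Angular relationship: dot product of the unit vectors joining the two
    # cells in image and atlas, rewarding fine-scale directional agreement.
    angular = float(np.dot(d_img / np.linalg.norm(d_img),
                           d_atlas / np.linalg.norm(d_atlas)))
    return binary, proximity, angular
```

Weighted sums of such feature values over all cell pairs and candidate labelings form the pairwise term of Equation 4.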

A critical component for the success of automated cell identification methods is a data-driven atlas. Static atlases such as the OpenWorm atlas provide a single observation of the positional relationships among cells. For instance, if cell RMEL is to the left of cell AVAL in the OpenWorm atlas, then the model assumes that RMEL is to the left of AVAL with 100% probability. In contrast, in observed experimental data RMEL may be to the left of AVAL with 80% probability (e.g. in 8 out of 10 annotated experimental datasets). Thus, data-driven atlases relax the hard-coded constraint of 100% probability imposed by a static atlas and account for the statistics observed experimentally for all positional relationship features (Figure 1—figure supplement 1). Note that the data-driven atlases built in our framework are considerably different from those built by registration-based methods. While the latter store probabilistic positions of cells, atlases built by our framework store only probabilistic pairwise positional relationship features among cells, and are thus more generalizable. We show that building such data-driven atlases is easy for our CRF model (Appendix 1–Extended methods S1.7). We demonstrate this by building several data-driven atlases from different data sources containing various features, showing considerable improvement in accuracy. Further, building data-driven atlases is computationally cheap in CRF_ID, requiring only simple averaging operations; it is thus scalable to building atlases from the large-scale annotated data that may become available in the future.
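Because the atlas stores only pairwise relationship statistics, building one reduces to counting and averaging over annotated datasets. The sketch below illustrates this for the anterior-posterior binary relationship; the input format (a dict from cell label to position, AP coordinate first) is a hypothetical convenience for this example, not the file format used by the released tools.

```python
import numpy as np

def build_ap_relationship_atlas(annotated_datasets, labels):
    """Data-driven atlas of anterior-posterior relationships (sketch).

    annotated_datasets : list of dicts mapping a cell label to its 3D
                         position, with the AP coordinate first.
    Returns a K x K matrix whose (a, b) entry is the fraction of datasets in
    which label a was observed anterior to label b -- a probabilistic
    positional relationship, not a probabilistic position.
    """
    K = len(labels)
    counts = np.zeros((K, K))   # times label a was anterior to label b
    seen = np.zeros((K, K))     # times labels a and b were co-annotated
    for data in annotated_datasets:
        for a, la in enumerate(labels):
            for b, lb in enumerate(labels):
                if a != b and la in data and lb in data:
                    seen[a, b] += 1
                    counts[a, b] += float(data[la][0] < data[lb][0])
    with np.errstate(invalid="ignore"):
        return counts / seen    # NaN where a pair was never co-annotated
```

Updating such an atlas with newly annotated data only requires incrementing the same counts, which is why atlas building remains cheap as annotated collections grow.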

Computational workflow for automatic cell identification

Our annotation framework consists of four major steps (Figure 1A; Appendix 1–Extended methods S1). First, cells are automatically detected in the input image channels using a Gaussian-mixture-based segmentation method (see Materials and methods – Whole-brain data analysis). Cells with known identities (landmark cells) are also detected in this step and their identities are specified. We designed the framework to be flexible on several fronts: (1) easily using manual segmentations of image channels or segmenting on the run; (2) integrating landmark information from any number of image channels; (3) specifying identities of landmark cells on the run or from existing fully or partially annotated files generated with other tools such as Vaa3D (Peng et al., 2010). In the second step, a head coordinate system is generated by solving an optimization problem with considerations of the directional consistency of axes (see Appendix 1–Extended methods S1.3). With this coordinate system, we next define cell-specific features (unary potentials) and co-dependent features (pairwise potentials) in the data (Figure 1—figure supplement 2A,B). The base version of the model uses only pairwise relationship features for all pairs of cells, including binary positional relationships, angular relationships, and proximity relationships between cells in images (Figure 1—figure supplement 1). However, additional unary features such as landmarks and color information can easily be added to the model. By encoding these features among all pairs of cells, our fully connected CRF model accounts for label dependencies between each cell pair to maximize accuracy. The atlas used for prediction may be a standard atlas such as the OpenWorm (Szigeti et al., 2014) atlas, or it can easily be built from fully or partially annotated datasets from various sources using the tools provided with our framework (see Appendix 1–Extended methods S1.7). In the third step, identities are automatically predicted for all cells by optimizing the CRF energy function consisting of unary and pairwise potentials, which in our formulation is equivalent to maximizing the intrinsic similarity between the data and the atlas (see Appendix 1–Extended methods S1.4). Duplicate assignments are resolved by calculating a label-consistency score for each neuron, removing duplicate assignments with low scores (Figure 1—figure supplement 2C,D; see Appendix 1–Extended methods S1.5), and re-running the optimization. After the third step, the code outputs the top predicted label for each cell. Next, an optional fourth step can be performed to account for missing neurons in the image stack. In this step, the full atlas is subsampled to remove a fixed number of labels, sampled either uniformly or based on prior confidence values for the missing rate of labels in images (see Appendix 1–Extended methods S1.5). The subsampled atlas assumes that the removed labels are missing from the image, and thus ensures that those labels cannot be assigned to any cell in the image. The sampling procedure is repeated, and identities are predicted in each run. We perform these runs in parallel on computing clusters. Lastly, the identities predicted across runs are pooled to generate top candidate identities for each cell (Figure 1B; Figure 1—video 1; see Appendix 1–Extended methods S1.6). Thus, there are two modes of running the framework: a single-run mode that outputs only the top label for each cell, and a parallel-run mode that outputs multiple top candidate labels for each cell.
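The logic of the optional fourth step can be summarized as follows: hold out a random subset of atlas labels in each run, re-run the optimization with the reduced atlas, and pool votes across runs into a ranked candidate list. The predict_fn below is a hypothetical wrapper around the single-run CRF optimization; in the actual framework these runs are parallelized on computing clusters.

```python
import numpy as np
from collections import Counter

def pooled_candidate_labels(predict_fn, cells, atlas_labels,
                            n_missing, n_runs=200, top_k=5):
    """Marginalize over unknown missing cells by pooling parallel runs (sketch).

    predict_fn : hypothetical wrapper around the single-run CRF optimization;
                 maps (cells, reduced atlas) to one predicted label per cell.
    In each run, n_missing atlas labels are held out at random (treated as
    missing from the image, so they cannot be assigned to any cell); votes
    are pooled across runs into a ranked top-k candidate list per cell.
    """
    rng = np.random.default_rng(0)
    votes = [Counter() for _ in cells]
    for _ in range(n_runs):
        keep = rng.choice(len(atlas_labels), len(atlas_labels) - n_missing,
                          replace=False)
        sub_atlas = [atlas_labels[k] for k in keep]
        for cell_idx, label in enumerate(predict_fn(cells, sub_atlas)):
            votes[cell_idx][label] += 1
    return [[lbl for lbl, _ in v.most_common(top_k)] for v in votes]
```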
We make the software suite freely available at https://github.com/shiveshc/CRF_Cell_ID (Chaudhary, 2021; copy archived at swh:1:rev:aeeeb3f98039f4b9100c72d63de25f73354ec526).

Prediction accuracy is bound by position and count noise in data

Given the broad utility of image annotation, we envision our workflow applying to a variety of problems where experimental constraints and algorithm performance requirements may be diverse. For example, experimental data across different tasks inherently contain noise contributed by various sources in varying amounts, which can affect annotation accuracy. These sources of noise include: (1) deviation between cell positions in images and positions in atlases (position noise); (2) a smaller count of cells in images than in the atlas due to missing cells (count noise); and (3) the absence of cells with known identities, that is, known landmarks. We set out to determine general principles of how these noise sources may affect cell identification accuracy across various tasks. We used two different kinds of data: synthetic data generated from the OpenWorm 3D atlas (Szigeti et al., 2014; Figure 2—figure supplement 1A,B and Figure 2—figure supplement 2) and experimental data generated using NeuroPAL strains (Yemini et al., 2021), consisting of annotated ground truth from nine animals with ~100 uniquely identified neurons each (Figure 2—figure supplement 3). While experimental data enable the assessment of prediction accuracy in real scenarios, synthetic data enable us to tune the amount of noise contributed by various sources and dissect their effects on accuracy independently.

To assess the effects of position noise and count noise on prediction accuracy, we simulated four scenarios using the synthetic data (Figure 2—figure supplement 1C). In the absence of any noise, relative positional relationship features predicted neuron identities with perfect accuracy (scenario 1 in Figure 2—figure supplement 1C), demonstrating the suitability of co-dependent features and the CRF_ID framework for the annotation task. We found that both position noise and count noise affect accuracy significantly (Figure 2—figure supplement 1C,D), with position noise having a larger effect (compare scenarios 1–2 with 3–4 in Figure 2—figure supplement 1C). As mentioned before, count noise is primarily caused by inefficiencies of the reporter used to label cells or inaccuracies of the cell detection algorithm, leading to fewer cells detected in the images than exist in atlases. Results on both synthetic and real data show that a 10–15% improvement in prediction accuracy can be attained simply by improving reagents and eliminating count noise (Figure 2—figure supplement 1D). Next, we tested the effect of landmarks (cells with known identities) on annotation accuracy (Figure 2—figure supplement 1E). We hypothesized that landmarks would improve accuracy by acting as additional constraints on the optimization while the algorithm searches for the optimal arrangement of labels for non-landmark cells. Indeed, we found in both experimental and synthetic data that randomly chosen landmarks increased prediction accuracy by ~10–15%. It is possible that strategic choices of landmarks could further improve accuracy.

Another advantage of simulations using synthetic data is that, by quantifying accuracy under extreme cases of empirically observed noise, they can be used to obtain expected accuracy bounds for real scenarios. We obtained such bounds (shown as gray regions in Figure 2—figure supplement 1F) based on the position noise observed in experimental data (Figure 2—figure supplement 2). Notably, the prediction results for experimental data lay close to the bounds estimated using synthetic data (Figure 2—figure supplement 1F). Together, the good agreement between results obtained on synthetic and experimental data suggests that the general trends uncovered using synthetic data, of how various noise sources affect accuracy, are applicable to experimental data.
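As a sketch of how such synthetic test cases can be generated from atlas positions, the function below applies zero-mean Gaussian position noise scaled by the median inter-cell distance and drops a random fraction of cells as count noise. This is a simplified proxy for the empirically scaled perturbation procedure described in Figure 2—figure supplement 2, which scales perturbations using measured position variability rather than this shortcut.

```python
import numpy as np

def perturb_atlas(positions, noise_scale, missing_frac, seed=None):
    """Generate a synthetic test case from atlas positions (simplified sketch).

    Position noise: zero-mean Gaussian displacements with standard deviation
    equal to noise_scale times the median inter-cell distance, so the
    perturbation is scaled to the atlas's own spatial scale.
    Count noise: a random fraction of cells is dropped to mimic missing cells.
    """
    rng = np.random.default_rng(seed)
    n = len(positions)
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    med_dist = np.median(dists[np.triu_indices(n, 1)])
    noisy = positions + rng.normal(0.0, noise_scale * med_dist, positions.shape)
    keep = rng.random(n) >= missing_frac      # True for cells kept in the image
    return noisy[keep], keep
```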

Next, with this knowledge, we tuned the features in the model and compared prediction accuracy for several combinations of positional relationship features. Among all co-dependent positional relationship features, the angular relationship feature, by itself or combined with the PA, LR, and DV binary positional relationship features, performed best (Figure 2—figure supplement 4A). To account for missing cells, we developed a method that considers missing neurons as a latent state in the model (similar to hidden-state CRFs; Quattoni et al., 2007) and predicts identities by marginalizing over latent states (see Appendix 1–Extended methods S1.6). Compared to the base case, which assumes all cells are present in the data, simulating missing neurons significantly increased prediction accuracy on experimental data (Figure 2—figure supplement 4B).

Identity assignment using intrinsic features in CRF_ID outperforms other methods

We next characterized the performance of our CRF_ID framework by predicting the identities of cells in manually annotated ground-truth datasets (Figure 2A). To specify prior knowledge, we built data-driven atlases combining positional information of cells from annotated ground-truth datasets and the OpenWorm atlas (Appendix 1–Extended methods S1.7). To predict cell identities in each ground-truth dataset, a separate leave-one-out atlas was built with the test dataset held out. Building such data-driven atlases for our framework is extremely computationally efficient, requiring only simple averaging operations; thus, new atlases can be built from thousands of annotated images very quickly. With data-driven atlases (each built from only eight annotated datasets, one atlas per test dataset), 74% of cells were correctly identified by the top label prediction in the ground-truth datasets, which exceeds the state of the art. Further, 88% and 94% of cells had their true identities within the top 3 and top 5 predicted labels, respectively (Figure 2A). Note that because only positional relationship features were used in the data-driven atlas, this case is equivalent to predicting identities in experimental whole-brain datasets without color information. More importantly, automated annotation is unbiased because, in principle, the framework can combine manual annotations of cell identities from several users (possibly across labs) in the form of data-driven atlases and can predict identities such that positional relationships in the atlas are maximally preserved. Thus, automated annotation removes individual biases in annotating cells. Further, it greatly supports researchers with no prior experience.
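The leave-one-out protocol used here is straightforward to express in code. The sketch below assumes hypothetical build_atlas and predict_fn helpers (e.g. the averaging and optimization steps sketched earlier) and datasets carrying ground-truth annotations.

```python
def leave_one_out_accuracy(datasets, build_atlas, predict_fn):
    """Leave-one-out evaluation protocol (sketch with hypothetical helpers).

    datasets : list of annotated datasets, each with cell positions and
               ground-truth labels. For each test dataset, an atlas is built
               from the remaining datasets only, then top-label prediction
               accuracy is scored against the held-out annotations.
    """
    accuracies = []
    for i, test in enumerate(datasets):
        train = datasets[:i] + datasets[i + 1:]
        atlas = build_atlas(train)                    # cheap averaging step
        predicted = predict_fn(test["positions"], atlas)
        truth = test["labels"]
        correct = sum(p == t for p, t in zip(predicted, truth))
        accuracies.append(correct / len(truth))
    return accuracies
```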

Figure 2. CRF_ID annotation framework outperforms other approaches.

(A) CRF_ID framework achieves high prediction accuracy (average 73.5% for top labels) using data-driven atlases without using color information. Results shown for whole-brain experimental ground-truth data (n = 9 animals). Prediction was performed using separate leave-one-out data-driven atlases built for each animal dataset, with the test dataset held out. Gray regions indicate bounds on prediction accuracy obtained using simulations on synthetic data (see Figure 2—figure supplement 1F). Experimental data come from strain OH15495. Top, middle, and bottom lines in box plots indicate the 75th percentile, median, and 25th percentile of data, respectively. (B) Schematic highlighting the key difference between registration-based methods and our CRF_ID framework. (C) Prediction accuracy comparison across methods for ground-truth experimental data (n = 9, *p<0.05, Bonferroni paired comparison test) and synthetic data (n = 190–200 runs for each method, ***p<0.001, Bonferroni paired comparison test). The OpenWorm atlas was used for predictions. Accuracy results are shown for top predicted labels. Experimental data come from strain OH15495. For synthetic data, random but realistic levels of position and count noise were applied in each run. Gray regions indicate bounds on prediction accuracy obtained using simulations on synthetic data (see Figure 2—figure supplement 1F). (D) Comparison of methods across count noise levels (defined as the percentage of cells in the atlas that are missing from the data) using synthetic data (n = 150–200 runs for Rel. Position for each noise level, n ≈ 1000 runs for Registration for each noise level, ***p<0.001, Bonferroni paired comparison test). The OpenWorm atlas was used for prediction. Accuracy results are shown for top predicted labels. For a fixed count noise level, random cells were set as missing in each run. Markers and error bars indicate mean ± standard deviation. (E) Comparison of methods across position noise levels using synthetic data (n = 190–200 runs for each method for each noise level, ***p<0.001, Bonferroni paired comparison test). The OpenWorm atlas was used for prediction. Accuracy results are shown for top predicted labels. For a fixed position noise level, random position noise was applied to cells in each run. Different noise levels correspond to different variances of the zero-mean Gaussian noise added to cell positions (see Materials and methods – Generating synthetic data for framework tuning and comparison against other methods). Noise levels 3 and 6 correspond to the lower-bound and upper-bound noise levels shown in Figure 2—figure supplement 1F. Markers and error bars indicate mean ± standard deviation. (F) Pairwise positional relationships among cells are more consistent with the OpenWorm atlas even though the absolute positions of cells vary across worms. (Left) Average deviation of the angular relationships measured in ground-truth data (n = 9) from the angular relationships in the static atlas. (Right) The distribution of all deviations in the left panel (8516 relationships in total) is sparse and centered around 0, indicating that angular relationships are consistent with the atlas.

Figure 2.

Figure 2—figure supplement 1. Performance characterization using synthetic data.

Figure 2—figure supplement 1.

(A) The freely available open-source 3D atlas (OpenWorm atlas) was used to generate synthetic data. (B) Four scenarios were simulated using the atlas, and prediction accuracies were quantified. These scenarios include different perturbations observed in experimental data, such as count noise (discrepancy between the cells in images and the atlas) and position noise (variability in cell positions compared to the atlas). (C) Prediction accuracy of the framework for the four simulated scenarios. For scenario 1, in which no position or count noise perturbation was applied to generate simulated data, the CRF_ID framework predicted identities with 99.9% accuracy, highlighting that identities can be predicted using relative positional relationships only (without any information about the absolute positions of cells). With the application of position and count noise, prediction accuracy decreased (scenarios 2–4). The two levels of box plots for scenarios 3 and 4 show prediction results for the lower-bound and upper-bound levels of position noise applied to cells (see Figure 2—figure supplement 2). n = 200–203 runs for each scenario and each number-of-landmarks condition. Each run differed from the others with respect to (1) the random perturbations applied to cell positions, (2) the random combination of landmarks selected, and (3) the random combination of cells set as missing. (D) Scenarios 3 and 4 were simulated for experimental data by applying count noise to the data, and results were compared to the results on synthetic data shown in panel C. Results show a good match between synthetic data and experimental ground-truth data. For the synthetic data simulations, results are the same as in panel C for the 0-landmarks condition and the lower-bound level of perturbation. Experimental data come from the OH15495 strain. (E) The effect of the number of landmarks on prediction accuracy, using both experimental and synthetic data, shows similar accuracy trends. Images of nine animals were used as experimental ground-truth datasets (strain OH15495). For the no-landmarks condition, n = 9. For non-zero-landmarks conditions, n = 450 runs across nine datasets were performed for each condition with randomly selected landmarks in each run. For synthetic data, n = 200 runs were performed for each condition with randomly selected landmarks in each run. For the no-landmarks condition in synthetic data, each run differed from the others only with respect to the random perturbations applied in each run. (F) Accuracy bounds for Top, Top 3, and Top 5 labels were defined as the average prediction accuracies achieved by applying lower-bound and upper-bound levels of position noise in synthetic data. These accuracy bounds are shown as gray regions in Figure 2. n = 50 runs for both lower- and upper-bound levels of noise. Accuracies obtained for Top, Top 3, and Top 5 labels in synthetic data are similar to accuracies in experimental data (n = 9). Experimental data come from the OH15495 strain. Top, middle, and bottom lines in box plots indicate the 75th percentile, median, and 25th percentile of data, respectively.
Figure 2—figure supplement 2. Method of applying position noise to the atlas to generate synthetic data.

Figure 2—figure supplement 2.

(A, C, E) Variability in the positions of cells was quantified in experimental data using landmark strains GT290 and GT298. Panels show the variability of landmark cells along the AP, LR, and DV axes (n = 31 animals). Top and bottom lines in box plots indicate the 75th and 25th percentiles of data, the middle (red) line indicates the median, and whiskers indicate the range. (B, D, F) Position variability of cells was compared to the inter-cell distances among all cells in the head ganglion. Panels show distributions of inter-cell distances between all pairs of cells in the head along the AP, LR, and DV axes. The 25th and 75th percentiles of cell position variability were compared to inter-cell distances to define lower-bound and upper-bound levels of variability in cell positions. (G, H) To calculate the perturbations to be applied to cells in the atlas, inter-cell distances in the atlas were calculated as well. Subsequently, the relation between position variability and inter-cell distance calculated in experimental data was used to calculate appropriately scaled perturbations based on inter-cell distances in the atlas. This was done to remove the effect of the different spatial scales of cell positions in experimental data versus the atlas. Scaled noise perturbations were applied to each cell position. Panels show the original positions of cells (red) and positions after applying perturbations (blue) in the atlas for the lower-bound and upper-bound levels of perturbation. Two views of the atlas are shown.
Figure 2—figure supplement 3. Details of manually annotated experimental ground-truth datasets.

Figure 2—figure supplement 3.

(A) Number of cells manually annotated in each of the anterior (anterior ganglion), middle (lateral, dorsal, and ventral ganglia), and posterior (retrovesicular ganglion) regions of the head ganglion in two kinds of experimental ground-truth datasets: those where the animal is imaged in a lateral orientation, that is, the LR axis is perpendicular to the image plane (left panel), and those where the animal is non-rigidly rotated about the AP axis (right panel). See Figure 4—figure supplement 2 for details on the grouping of the head ganglion into anterior, middle, and posterior regions. Data were collected using NeuroPAL strains (OH15495 and OH15500).
Figure 2—figure supplement 4. Model tuning/characterization – features selection and simulating missing cells.

Figure 2—figure supplement 4.

(A) Feature selection in the model was performed by keeping various feature combinations in the model and assessing prediction accuracy. Left panel – experimental data without using color information (n = 9 animals for each condition); middle panel – experimental data using color information (n = 9 worms for each condition); right panel – synthetic data generated from the atlas (n = 189–200 runs for each condition, with random position and count noise perturbations applied in each run; see Materials and methods). Prediction accuracy across these datasets follows a similar trend for different feature combinations. Overall, the angular relationship feature, by itself or combined with binary positional relationship features, performs best. Experimental data come from the OH15495 strain. Top, middle, and bottom lines in box plots indicate the 75th percentile, median, and 25th percentile of data, respectively. (B) Accounting for missing neurons improves prediction accuracy (n = 9 animals, ***p<0.001, Bonferroni paired comparison test).
Figure 2—figure supplement 5. CRF_ID framework with relative positional features outperforms registration method.

Figure 2—figure supplement 5.

(A) Prediction accuracies achieved by Top, Top 3, and Top 5 labels predicted by three methods: Registration, the CRF_ID framework with relative-position features, and the CRF_ID framework with combined features (see Appendix 1–Extended methods S2.1–2.3). Relative-position features outperform the registration method in both experimental data (left panel) and synthetic data (right panel). For experimental data, n = 9 worm datasets, strain OH15495. For synthetic data, n = 200 runs for the registration method and n = 48–50 runs for the Rel. Position and combined methods. Each run differed from the others in the position and count noise applied. *** denotes p<0.001, Bonferroni paired comparison test. Part of the data is re-plotted in Figure 2C. Top, middle, and bottom lines in box plots indicate the 75th percentile, median, and 25th percentile of data, respectively. (B) Accuracy comparison across count noise levels (defined as the percentage of cells in the atlas that are missing from the data) for various fixed position noise levels (n = 180–200 runs for Rel. Position for each count noise level for each condition, n ≈ 500 runs for Registration for each count noise level for each condition, ***p<0.001, Bonferroni paired comparison test). Results for position noise level 2 are shown in Figure 2D. For a fixed count noise level, random cells were removed in each run. Markers and error bars indicate mean ± standard deviation. (C) Accuracy comparison across position noise levels for various fixed count noise levels (n = 180–200 runs for Rel. Position for each position noise level for each condition, n ≈ 500 runs for Registration for each position noise level for each condition, ***p<0.001, Bonferroni paired comparison test). Different noise levels correspond to different variances of the zero-mean Gaussian noise added to cell positions (see Materials and methods – Generating synthetic data for framework tuning and comparison against other methods). For a fixed position noise level, random noise was applied to cell positions in each run. Noise level 3 corresponds to the lower-bound noise level calculated in Figure 2—figure supplement 2. Markers and error bars indicate mean ± standard deviation.
Figure 2—figure supplement 6. Variability in absolute positions of cells and relative positional features in experimental data compared to the static atlas.

Figure 2—figure supplement 6.

(A) DV view (top) and LR view (bottom) showing positions of cells across ground-truth data (n = 9 worms, strain OH15495). Each point cloud of one color represents the positions of a specific cell across datasets. The ellipsoid around each point cloud shows the 50% confidence interval. A total of 98 point clouds for 98 uniquely identified cells across nine datasets are shown. (B) Distributions of deviations of the absolute positions of cells in experimental data (n = 9 animals, strain OH15495) compared to the corresponding positions in the static OpenWorm atlas. Top, middle, and bottom panels show deviations along the anterior-posterior (AP), left-right (LR), and dorsal-ventral (DV) axes. To put the deviations into perspective, dotted red lines show the median distance of each cell to its 10th nearest neighbor. Thus, in experimental data, cell positions can deviate from their atlas positions by much more than their distances to their 10th nearest neighbors. As a result, the registration method often creates mismatches. (C) Deviation of relative positional relationships among cells compared to the relationships in the static atlas. The top panel shows a schematic of how these deviations were calculated. Each positional relationship is represented as a matrix, and each element in the matrix records the positional relationship between a pair of cells. For example, in the case of the static atlas, if a column cell such as RMEL is anterior to a row cell such as AIZR, then the corresponding element is 1, otherwise 0. Similarly, for experimental data (n = 9 worms, same as panel A, strain OH15495), each element denotes the average number of times RMEL is observed anterior to AIZR in the data; hence the value can be between 0 and 1. Matrices are calculated in the same manner for the other features. A simple subtraction of the two matrices records the deviation of the positional relationship feature. The bottom panels show deviations of the AP, LR, DV, and angular relationships between the static atlas and the data. Values closer to 0 for the PA and angular relationships indicate that these relationships are most similar to the static atlas. Gray cells in the matrices denote relationships that could not be measured across the nine datasets (when both cells were never annotated in the same dataset). (D) Distributions of the deviation of relative positional relationship features between the static atlas and the data. A total of 8546 positional relationship features were compared for 98 uniquely identified cells across datasets. The sparseness of the distributions indicates that relative positional relationships are consistent with the atlas, even though the absolute positions of cells vary greatly (panels A, B).
Figure 2—figure supplement 7. Comparison of optimization runtimes of CRF_ID framework with a registration method CPD (Myronenko and Song, 2010).

Figure 2—figure supplement 7.

(A) Optimization runtimes of the CRF method using Loopy Belief Propagation (LBP) and of the registration method CPD across different numbers of cells to be annotated. Synthetic data were used for the simulations, and annotation was performed using the OpenWorm atlas with 206 head ganglion cells (n = 10 runs). Top, middle, and bottom lines in box plots indicate the 75th percentile, median, and 25th percentile of data, respectively.

We next compared our method against the registration-based methods popular for automatic cell annotation (Long et al., 2009; Long et al., 2008; Aerni et al., 2013; Toyoshima, 2019; Scholz, 2018) (see Appendix 1–S1.9 ‘Registration methods do not consider intrinsic similarity features such as relative positional relationships’ and S2.1 ‘Registration’). For a fair comparison, all methods used the OpenWorm atlas as the reference for prediction. The major difference between our framework and previous methods is the use of intrinsic similarity in the annotation task, as opposed to the extrinsic similarity used by previous methods (Figure 2B, Figure 1—figure supplement 1). Remarkably, for both experimental and synthetic data, CRF_ID using relative positional features performs best (Figure 2C; Figure 2—figure supplement 5A). Note that the decrease in accuracy compared to Figure 2A is due to using the static OpenWorm atlas, further highlighting the importance of building data-driven atlases. Notably, CRF_ID outperforms the registration-based method across all levels of count noise and position noise in the data (Figure 2D,E; Figure 2—figure supplement 5B,C). The accuracy of registration-based methods falls rapidly with increasing count noise, whereas CRF_ID is highly robust, maintaining higher accuracy even when up to 75% of the cells in the atlas were missing from the data. This has important practical implications, as the amount of count noise observed in experimental data may vary significantly across reagents, imaging conditions, etc. Further, neuron positions have been shown to be highly variable across individual animals (Yemini et al., 2021), which is confirmed by our datasets as well (Figure 2—figure supplement 6A). Because cell positions on average can deviate from their atlas positions by more than the distance to their tenth nearest neighbor (Figure 2—figure supplement 6B), we expect this variability to introduce large matching errors in registration-based methods. In contrast, most pairwise relationships are preserved despite the variability of absolute positions (Figure 2F; Figure 2—figure supplement 6C,D). Interestingly, a hybrid objective function that combines registration using absolute positions with relative position features in the CRF_ID framework corrupts annotation performance (Figure 2—figure supplement 5A), likely due to competing effects in the objective function. This again highlights the fact that the higher accuracy is achieved by the positional relationship features in the CRF_ID method.

Next, to compare the computational efficiency of the CRF_ID framework with that of registration-based methods, we compared the optimization-step runtimes of the single-run mode of CRF_ID with those of a popular registration method, Coherent Point Drift (CPD; Myronenko and Song, 2010; Figure 2—figure supplement 7A). The computational speed of both methods scales with the number of cells to be annotated in images and the number of cells in the atlas. As expected, the CRF_ID framework is computationally more expensive than CPD because it optimizes both unary and pairwise potentials. Nonetheless, the optimization runtime of the CRF_ID framework for the multi-cell calcium imaging use case (10–50 cells in an image) is on the order of 0.1–10 s on a desktop computer (see Materials and methods – Runtime comparison) when the full head ganglion atlas (206 cells) is used for annotation. We emphasize that using the full head ganglion atlas for cell identity annotation in whole-brain imaging is important because, without prior knowledge of which cells are missing from images, the full atlas gives cells in images an unbiased opportunity to take any label from the atlas. In contrast, if only a partial atlas or a partially annotated dataset is used as the atlas, the labels absent from the atlas will never be assigned to any cell in images, potentially biasing the annotation. In practice, faster runtimes can be achieved in the multi-cell calcium imaging and whole-brain imaging cases by using smaller atlases based on prior knowledge of the cells expected in the strains. Further, the multiple-run mode of the CRF_ID framework can be parallelized using multiple CPU workers. Thus, higher accuracy compared to registration-based methods, combined with reasonable speeds, makes CRF_ID favorable for cell annotation tasks.

Cell annotation in gene-expression pattern analysis

We next demonstrate the utility of our framework for gene-expression localization analyses, which is important for many problems, for example, mapping the cellular atlas of neurotransmitters (Gendrel et al., 2016; Pereira et al., 2015), receptors (Vidal et al., 2018), and neuropeptides (Bentley et al., 2016). Conventional methods, for example, screening a list of cell-specific marker lines that overlap in expression with the reporter, are laborious and scale poorly with the number of cells expressing the genes of interest and the number of new genes for which expression patterns are to be determined. Our cell annotation framework can considerably reduce manual effort by generating a small list of candidate identities for each cell expressing the reporter. Subsequently, researchers can easily verify or prune the candidate list. To demonstrate this use case, we imaged a strain with multiple cells labeled with GFP and predicted candidate identities for each cell (Figure 3A). Determining cell identities in this case is difficult due to large count noise along with position noise: the full list of labels in the atlas is much larger than the few cells labeled in the reporter strain (equivalent to scenario 4 in Figure 2—figure supplement 1C). Thus, several degenerate (equally probable) solutions are possible. To avoid a decrease in accuracy in such cases, we directly predicted candidate identities for all cells marked with the pan-neuronal red fluorescent protein (RFP) using the full whole-brain atlas and subsequently assessed the accuracy only for the cells of interest, that is, those marked with GFP. Our framework accurately generated candidate lists for cells across all datasets (n = 21 animals): 85% of cells had their true identities within the top five labels chosen by the framework. In comparison, the candidate list generated by the registration method achieved only 61% accuracy (Figure 3B).

Figure 3. CRF_ID framework predicts identities for gene expression pattern analyses.

(A) (Top) Schematic showing a fluorescent reporter strain with GFP expressed in cells whose names need to be determined. Since no candidate labels are known a priori, neuron labels are predicted for all cells marked with pan-neuronally expressed RFP using the full whole-brain atlas. (Bottom) A proxy strain AML5 [rab-3p(prom1)::2xNLS::TagRFP; odr-2b::GFP] with pan-neuronal RFP and 19 cells labeled with GFP was used to assess prediction accuracy. (B) The CRF_ID framework with relative position features outperforms the registration method (n = 21 animals) (***p<0.001, Bonferroni paired comparison test). Accuracy shown for top five labels predicted by both methods. Experimental data comes from strain AML5. Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively.

Cell annotation in multi-cell functional imaging experiments

We next demonstrate the utility of our algorithm in another important application - annotating cell identities in multi-cell calcium functional imaging in vivo (Figure 4). Automation in this case dramatically reduces the labor associated with cell annotation across time points, trials, animals, and experiments. For illustration purposes, we used a strain carrying GFP in multiple cells as a proxy for GCaMP-labeled strains (Figure 4A). Given the known candidate list of labels that can be assigned (i.e. no count noise), the configurational space is small, which makes the task easy (similar to scenario three in Figure 2—figure supplement 1C). Indeed, our annotation framework identified neurons with high accuracy (98%, n = 35 animals). In comparison, the registration method predicted identities with lower accuracy (88%) even with the small label assignment space (Figure 4B). In reality, some neurons may go undetected in the data due to expression mosaicism or low calcium transients, adding count noise to the data (equivalent to scenario four in Figure 2—figure supplement 1C). We therefore simulated this case by randomly removing up to a third of the total neurons from the images and predicting identities of the remaining cells using the full label list (Figure 4C; Figure 4—figure supplement 1A). Even under these conditions, the accuracy of our method remained high (88%), significantly outperforming the registration method (81%) (Figure 4—video 1). In practice, performance can be further improved by using multiple frames from each video, which we did not do here in the mock experiment.

Figure 4. Cell identity prediction in mock multi-cell calcium imaging experiments and landmark strain.

(A) (Top) Schematic showing automatic identification of cells in multi-cell calcium imaging videos for high-throughput analysis. (Bottom) A mock strain with GFP-labeled cells was used as an illustration of GCaMP imaging. Only the green channel of the AML5 strain was used for this purpose. (B) CRF_ID framework outperforms registration method (n = 35 animals, ***p<0.001, Bonferroni paired comparison test). OpenWorm atlas was used for prediction. Accuracy results shown for top predicted labels. Experimental data comes from strain AML5 (only green channel used). Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively. (C) Prediction accuracy comparison for the case of missing cells in images (count noise). ***p<0.001, Bonferroni paired comparison test. A total of n = 700 runs were performed across 35 animals for each method, with 3 out of 16 randomly selected cells removed in each run. For fair comparison, the cells removed were the same across methods. OpenWorm atlas was used for prediction. Accuracy results shown for top predicted labels. Experimental data comes from strain AML5 (only green channel used). Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively. (D) Max-projection of 3D image stacks showing CyOFP-labeled landmark cells in head ganglion (pseudo-colored as cyan): animals carrying [unc-47p::NLS::CyOFP1::egl-13NLS] (GT296 strain) with nine landmarks (top), and animals carrying [unc-47p::NLS::CyOFP1::egl-13NLS; gcy-32p::NLS::CyOFP1::egl-13NLS] with 12 landmarks (bottom). (E) (Left) Max-projection of a 3D image stack from a whole-brain activity recording showing head ganglion cells and identities predicted by the CRF_ID framework (top labels). The animal is immobilized in a microfluidic device channel and IAA stimulus is applied to the nose tip. (Right) GCaMP6s activity traces extracted by tracking cells over time in the same 108 s recording and their corresponding identities. Blue shaded region shows the IAA stimulation period. Experimental data comes from strain GT296.

Figure 4—figure supplement 1. Relative position features perform better than registration in handling missing cells in images.

(A) Comparison of prediction accuracies across three methods for different numbers of missing cells (out of 16 total cells) simulated in experimental data. Experimental data comes from the AML5 strain (only GFP channel used). Results for 3 missing cells are shown in Figure 4C. ***p<0.001, **p<0.01, all comparisons done with Bonferroni paired comparison test. n = 700 runs were performed across data from 35 animals for each number of missing cells and each method. For fair comparison across methods, the same cells were considered missing for all methods for a fixed number of missing cells. Runs differed from one another in the random subset of cells considered missing. Part of the data is re-plotted in Figure 4C. Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively.
Figure 4—figure supplement 2. Spatially distributed landmarks or landmarks in lateral ganglion perform best in supporting CRF_ID framework for predicting identities.

(A) Top panel – Region-wise prediction accuracy achieved by the CRF_ID framework when landmarks were constrained to lie in specific regions of the head. n = 200 runs when landmarks were constrained to the anterior, middle, or posterior regions; n = 1000 runs when landmarks were spatially distributed throughout the head. All data are synthetic. A random combination of 15 landmarks was selected in each run. Landmarks constrained to the anterior region perform badly in predicting identities in the posterior region; similarly, landmarks constrained to the posterior region perform badly in predicting identities in the anterior region. Landmarks in the middle region or spatially distributed throughout the head show balanced accuracy across all regions. Bottom panel – grouping of head regions into anterior, middle, and posterior regions. Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively.
Figure 4—figure supplement 3. Microfluidic device used in chemical stimulation experiments and characterization.

(A) Schematic of the microfluidic device (Cho et al., 2020) used in chemical stimulation experiments. The position of the nematode in the imaging channel is shown. A temporally varying stimulus is applied to the nose tip of the nematode by switching between food/IAA and buffer streams. (B) Stimulus characterization was performed using FITC as the stimulus and S-basal as the buffer. Figures show a zoom-in of the T-junction in the device (where the nose tip of the nematode would be). Colored boxes show the regions used to characterize the stimulus profile. (C) Applied stimulus profile and measured stimulus profile for a 5 s on, 5 s off stimulus.
Figure 4—video 1. Comparison between the CRF_ID framework and the registration method for predicting identities in case of missing cells.
Identities shown in red are incorrect predictions. Scale bar 5 μm.

To further improve annotation accuracy, we explored the utility of landmarks with known identities. Landmarks can also help establish a coordinate system in images and guide post-prediction correction. Because the combinatorial space of potential landmarks is very large (~10^14 for 10 landmarks out of ~200 cells in the head), we asked what properties landmarks should have. We found that landmarks distributed throughout the head or in the lateral ganglion perform better in predicting identities of neurons in all regions of the brain (Figure 4—figure supplement 2; Materials and methods). As a test case, we developed strains with spatially distributed, sparse neuronal landmarks using CyOFP (see Materials and methods - Construction of landmark strains), which by itself can assist researchers in manual cell identification tasks. When crossed with pan-neuronally expressing GCaMP/RFP reagents, the strains can be used for whole-brain imaging (Figure 4D) using only two channels. This has two advantages: CyOFP can be imaged 'for free' while imaging GCaMP and RFP simultaneously, so the landmarks provide a concurrent reference in all frames; this strategy also leaves other channels open for optogenetic manipulations and voltage imaging (Piatkevich et al., 2019; Piatkevich et al., 2018).

We next tested this strategy in a simple whole-brain imaging experiment. Isoamyl alcohol (IAA) is a well-known component of the bacterial metabolites that C. elegans senses and responds to (Chalasani et al., 2007; L'Etoile and Bargmann, 2000; Bargmann et al., 1993). We recorded neuronal responses to a step-change in IAA concentration using a microfluidic system (Cho et al., 2020; Figure 4—figure supplement 3). We observed both odor-specific responses and spontaneous activities (Figure 4E). More importantly, neurons with algorithm-assigned identities demonstrated expected behavior. For instance, we identified the sensory neuron AWC and detected an off-response to IAA, consistent with known AWC behavior. In addition, the predicted interneurons (e.g. AVA, RIB, and AIB) also demonstrated previously known activity patterns (Kato et al., 2015).

We also tested worms’ responses to periodic stimuli of a more complex and naturalistic input – supernatant of bacterial culture (Figure 5, Figure 5—video 1). A periodic input (5 s on and 5 s off for eight cycles) entrains many neurons as expected, allowing us to better separate the odor-elicited responses from spontaneous activities (Figure 5A). We generated candidate identities for all recorded neurons (Figure 5—figure supplement 1A). Notably, several highly entrained neurons were identified as sensory neurons known to respond to food stimuli (Liu et al., 2019; Wakabayashi et al., 2009; Zaslaver et al., 2015; Figure 5C), some of which responded to the onset of the stimuli and some to the withdrawal of the stimuli (Figure 5D). The power spectrum of these neurons showed a strong frequency component at 0.1 Hz, as expected for a 10 s stimulus period (Figure 5B).
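
Detecting stimulus-entrained neurons from a power spectrum, as in Figure 5B, can be sketched as follows. This is a minimal sketch assuming activity traces in a NumPy array sampled at the recording's volume rate (1.1 volumes/s in our recordings); the z-score threshold is an illustrative choice, not the paper's exact criterion.

```python
# Sketch: flag neurons entrained by a 0.1 Hz periodic stimulus via the power
# spectrum of their activity during the stimulation window. `traces` is an
# assumed (n_neurons, n_timepoints) array sampled at `fs` volumes/s.
import numpy as np

def entrained_neurons(traces, fs=1.1, f_stim=0.1, z_thresh=3.0):
    traces = traces - traces.mean(axis=1, keepdims=True)   # remove DC offset
    freqs = np.fft.rfftfreq(traces.shape[1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(traces, axis=1)) ** 2
    k = np.argmin(np.abs(freqs - f_stim))                  # bin nearest 0.1 Hz
    # z-score the stimulus-frequency power against the rest of each spectrum
    z = (power[:, k] - power.mean(axis=1)) / power.std(axis=1)
    return np.flatnonzero(z > z_thresh)                    # entrained cell indices
```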

Figure 5. CRF_ID framework identifies neurons representing sensory and motor activities in whole-brain recording.

(A) GCaMP6s activity traces of 73 cells automatically tracked throughout a 278 s long whole-brain recording and the corresponding predicted identities (top labels). Periodic stimulus (5 s on – 5 s off) of bacteria (E. coli OP50) supernatant was applied starting at 100 s (shaded blue regions). Experimental data comes from strain GT296. (B) Power spectrum of neuron activity traces during the stimulation period for all cells. Cells entrained by the 0.1 Hz periodic stimulus show significant amplitude for the 0.1 Hz frequency component (green). (C) Activity traces of cells entrained by the periodic stimulus, shown for the stimulation period. Blue shaded regions indicate stimulus ON; unshaded regions indicate stimulus OFF. Identities predicted by the framework are labeled. (D) Average ON and OFF responses of cells entrained by the periodic stimulus across trials. The black line indicates the mean and gray shading indicates ± s.e.m. (E) Average activities of neurons with significant non-zero weights in the first three sparse principal components (SPCs). Activities within each component are stereotypical, and different components show distinct temporal dynamics. Cells with positive weights (blue) and negative weights (red) in SPC2 and SPC3 showed anti-correlated activity. Out of the 67 non-stimulus-tuned cells, 19 had non-zero weights in SPC1, 16 in SPC2, and 5 in SPC3. SPC1, SPC2, and SPC3 weights of cells are shown in Figure 5—figure supplement 1. Shading indicates mean ± s.e.m of activity. (F) Velocity (motion/second) traces of cells along the anterior-posterior (AP) axis (blue to red) show a phase shift in velocity, indicating that motion in the device shows signatures of wave propagation. (G) Cells with non-zero weights in SPC2 show high mutual information with worm velocity compared to cells grouped in other SPCs (*** denotes p<0.001, Bonferroni paired comparison test). Median (red line), 25th and 75th percentiles (box), and range (whiskers). The dashed line indicates the entropy of velocity (the upper limit on mutual information between velocity and any random variable). The velocity of the cell indicated by the black arrow in panel H (right) was used for the mutual information analysis. (H) Activity traces of 16 cells (with significant non-zero weights) in SPC2 and corresponding identities predicted by the framework. Red traces for cells with negative weights in SPC2, blue traces for cells with positive weights. Worm motion/second is shown on top. (Right) Max projection of a 3D image stack showing head ganglion neurons and cells with positive weights (blue) and negative weights (red) in SPC2. Motion/second of the cell indicated with the arrow is shown in the left panel. (I) Cross-correlation analysis between velocity and cells with non-zero weights in SPC2 shows a strong correlation between neuron activities and velocity. In comparison, other cells show low correlation. The velocity of the cell indicated by the arrow in panel H (right) was used for the cross-correlation analysis.

Figure 5—figure supplement 1. Further analysis of data in periodic food stimulation and whole-brain imaging experiment.

(A) Identities (top labels) predicted by the CRF_ID framework overlaid on the image (max-projection of image stack shown). Data comes from strain GT296. (B) Cumulative variance captured by traditional principal components (PCs) and sparse PCs. Sparse PCs capture lower variance as a tradeoff for minimizing the mixing of different temporal dynamics across components, improving the interpretability of each PC. (C) The weights of cells across the first three sparse principal components (SPCs). Blue and red bars in SPC2 and SPC3 denote cells with significant non-zero weights in SPC2 and SPC3. Activities of these cells are shown in Figure 5E. (D) Left panels, top and bottom – Y and X displacement of randomly selected cells in the head ganglion, blue to red ordered from anterior to posterior. Right panels, top and bottom – corresponding velocities along the Y and X directions. Smooth displacement of cells and the phase shift in peak velocities of cells along the AP axis show signatures of wave propagation in the partially immobilized worm in the microfluidic device. (E) Mutual information between worm velocity and lagged activities of cells grouped as SPC2 positive (blue) and SPC2 negative (red). The positive lag of neuron activities at which mutual information is maximal indicates that neuron activities precede velocity. Shading indicates mean ± s.e.m. The horizontal dotted line indicates the entropy of velocity, that is, the maximum mutual information any random variable can have with velocity.
Figure 5—video 1. Whole-brain functional imaging with bacteria supernatant stimulation.
Circles indicate the tracking of two cells that show ON and OFF response to food stimulus. Scale bar 5 µm.
Figure 5—video 2. Wave propagation in animal and correlation of neuron activities to worm motion.
Top-left panel shows the tracking of a cell along the anterior-posterior axis used to calculate the motion of the worm. Scale bar 5 μm. Bottom-left panel shows the velocity (px/s) of the cells. Top-right panel shows the velocity of one of the cells. Bottom-right panels show the activity of two cells in SPC2 with negative (red) and positive (blue) weights.

Next, to examine the latent dynamics in the whole-brain activities during the entire experiment, we used traditional Principal Component Analysis (PCA) and Sparse Principal Component Analysis (sPCA) (Zou et al., 2006). The overall dynamics are low-dimensional, with the top three traditional PCs capturing 70% of the variance (Figure 5—figure supplement 1B). In comparison, while the top three sparse PCs (SPCs) explain 43% of the variance in the data, they enable meaningful interpretation of the latent dynamics by eliminating the mixing of activity profiles across PCs (Figure 5E). SPC1 shows a systematic decline of the signals, presumably related to photobleaching of the fluorophores; both SPC2 and SPC3 illustrate spontaneous activities with different temporal dynamics. With automatic annotation, we were able to identify the cell classes belonging to each SPC (Figure 5—figure supplement 1C). We then analyzed the relationship between motion and neuron activities. In our microfluidic device, the animals are not fully immobilized. By tracking landmarks on the body, we observed propagating waves along the body (Figure 5F; Figure 5—figure supplement 1D, Figure 5—video 2). Interestingly, cells participating in SPC2 showed significantly higher mutual information with motion than cells in any other component (Figure 5G). Examining the connection between the activities of neurons that drive SPC2 and animal motion demonstrates that these neurons are indeed correlated or anti-correlated with the motion we detected (Figure 5H); notably, these neurons included several command interneurons such as AVA and RIM, and motor neurons such as VA and DA (Kato et al., 2015; Figure 5H). Cross-correlation analysis between motion and neuron activities showed that neurons are activated ahead of motion (Figure 5I); when a lag is added to the neuron activities, the mutual information of SPC2 neurons with motion is maximal at the same delay observed in the cross-correlation analysis (Figure 5—figure supplement 1E). Together, these experiments demonstrate the power of the approach, which enabled previously difficult simultaneous analyses of the activities of several sensory, inter-, and motor neurons in response to a natural food stimulus. Thus, automatic identity prediction enabled meaningful interpretation of the whole-brain data.
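
A hedged sketch of this analysis follows: sparse PCA over the activity traces (here via scikit-learn's SparsePCA, a stand-in for the sPCA variant of Zou et al., 2006 used in the paper) and a lagged correlation between SPC2-weighted neurons and velocity. The placeholder arrays and the alpha parameter are assumptions, not the paper's data or settings.

```python
# Sketch: sparse PCA on whole-brain traces, then lagged correlation between
# neurons with non-zero SPC2 weights and worm velocity.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
traces = rng.standard_normal((300, 73))    # placeholder: (timepoints, neurons)
velocity = rng.standard_normal(300)        # placeholder worm velocity trace

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(traces)
spc2_cells = np.flatnonzero(spca.components_[1])   # non-zero SPC2 weights

def lagged_corr(x, y, lag):
    """Correlation of x(t) with y(t + lag); positive lag means x leads y."""
    if lag > 0:
        return np.corrcoef(x[:-lag], y[lag:])[0, 1]
    if lag < 0:
        return np.corrcoef(x[-lag:], y[:lag])[0, 1]
    return np.corrcoef(x, y)[0, 1]

# Lag (in frames) at which each SPC2 neuron best correlates with velocity;
# a consistent positive lag indicates neural activity precedes motion.
best_lags = [max(range(-20, 21),
                 key=lambda l: abs(lagged_corr(traces[:, i], velocity, l)))
             for i in spc2_cells]
```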

CRF framework is broadly applicable to wider conditions

Another important advantage of the CRF_ID framework is its flexibility to incorporate additional information to improve identification accuracy, by simply adding new terms to the objective function without disturbing the weights of existing features. Here we demonstrate this idea using the recently developed NeuroPAL (Yemini et al., 2021), which provides a unique chromatic code for each neuron (Figure 6A). The chromatic code was included as a unary feature in the model (see Appendix 1–Extended methods S2.6). Using manually curated ground-truth data, we compared different methods. These methods included different orthogonal feature combinations, as used by previous approaches, thus providing insights into which features perform best in predicting cell identities (Figure 6B, see Appendix 1–Extended methods S2). For fair comparison across methods, the static OpenWorm atlas was used for all methods. For methods that use color information, we built data-driven color atlases (Appendix 1–Extended methods S2.4) using all datasets except the test dataset: leave-one-out color atlases. Unsurprisingly, registration performs poorly (with or without color information); color alone is not sufficient, and color combined with spatial features improves accuracy (whether registration or relative position is used). Notably, the best performing model uses relative position features in combination with color and without a registration term (Figure 6B; Figure 6—figure supplement 1A), achieving 67.5% accuracy for the top-label prediction. Further, for 85.3% of the neurons, the true identity is within the top three labels.
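
The chromatic-code feature itself is specified in Appendix 1–Extended methods S2.6, which we do not reproduce here. Purely as an illustration, one natural form of a color unary potential scores each cell's RGB value against each label's color distribution in a data-driven atlas, for example via a Mahalanobis distance; the sketch below is an assumed stand-in, not the paper's implementation.

```python
# Illustrative color unary potential: how well each segmented cell's RGB value
# matches each label's color distribution in a data-driven atlas.
import numpy as np

def color_unary(cell_rgb, atlas_mu, atlas_cov):
    """cell_rgb: (n_cells, 3); atlas_mu: (n_labels, 3);
    atlas_cov: (n_labels, 3, 3). Returns (n_cells, n_labels) potentials."""
    n_cells, n_labels = cell_rgb.shape[0], atlas_mu.shape[0]
    U = np.empty((n_cells, n_labels))
    for k in range(n_labels):
        d = cell_rgb - atlas_mu[k]                        # (n_cells, 3)
        VI = np.linalg.inv(atlas_cov[k] + 1e-6 * np.eye(3))
        m2 = np.einsum('ij,jk,ik->i', d, VI, d)           # squared Mahalanobis
        U[:, k] = np.exp(-0.5 * m2)   # higher potential = better color match
    return U
```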

Figure 6. Annotation framework is generalizable and compatible with different strains and imaging scenarios.

(A) A representative image (max-projection of 3D stack) of head ganglion neurons in the NeuroPAL strain OH15495. (B) (Left) Comparison of prediction accuracy for various methods that use different information. The CRF_ID framework that combines relative position features with color information performs best (n = 9 animals, *p<0.05, **p<0.01, ***p<0.001, Bonferroni paired comparison test). (Right) The best performing method predicts cell identities with high accuracy. The OpenWorm static atlas was used for all methods. The color atlas was built using experimental data with the test data held out. An ensemble of color atlases combining two different color matching methods was used for prediction. Accuracy results shown for top predicted labels. Experimental data comes from strain OH15495. (C) (Left) The annotation framework can easily incorporate information from annotated data in the form of a data-driven atlas, which improves prediction accuracy (***p<0.001, Bonferroni paired comparison test). Prediction was performed using leave-one-out data-driven atlases for both positional relationship features and color. Accuracy shown for top predicted labels. An ensemble of color atlases combining two different color matching methods was used for prediction. (Right) Accuracy achieved by top, top 3, and top 5 labels. Experimental data comes from strain OH15495. Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively. (D) An example image of head ganglion neurons in the NeuroPAL strain for a rotated animal (nematode lying on the DV axis). In contrast, an animal lying on the LR axis is shown below. The locations of the RMDVL/R and AVEL/R cells in the two images are highlighted for contrast. Dashed ellipses indicate positions of cells in the retrovesicular ganglion, showing that the rotated animal is not rigidly rotated. Experimental data comes from strain OH15495. (E) Top-label prediction accuracies for non-rigidly rotated animals. n = 7 animals. Experimental data comes from strains OH15495 and OH15500. Prediction was performed using leave-one-out data-driven atlases for both positional relationship features and color. Accuracy shown for top predicted labels. An ensemble of color atlases combining two different color matching methods was used for prediction.

Figure 6—figure supplement 1. Additional results on prediction performance of CRF_ID method on NeuroPAL data: comparison against registration method and utility of ensemble of color atlases.

(A) Comparing accuracy of the top 3 and top 5 identities predicted by different methods shows that the CRF_ID framework with pairwise positional relationship features performs better than the registration method (top three identities case). All methods used the same leave-one-out color atlas (see Appendix 1–Extended methods S2.3). Experimental data comes from the OH15495 strain. n = 9 animals. All comparisons performed with Bonferroni paired comparison test. Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively. (B) Prediction accuracy of the CRF_ID framework on experimental datasets across different kinds of data-driven atlases. 'All' atlas includes positional relationships and color information from all datasets including the test dataset. For the 'All color, leave-one-out pos.' atlas, the test dataset is held out from the positional relationships atlas only. For the 'All pos., leave-one-out color' atlas, the test dataset is held out from the color atlas only. For the 'Both leave-one-out' atlas, the test dataset is held out from both positional relationship and color atlases. Experimental data comes from the OH15495 strain. n = 9 animals. Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively. (C) Effect of different color distribution alignment methods on prediction accuracy. 'Both leave-one-out' is the baseline case that uses leave-one-out atlases for both positional relationships and color, with color atlases built by simple aggregation of RGB values across datasets. 'norm' indicates normalization of color channels; 'histm' indicates histogram matching of training datasets (images used to build the atlas) to the test dataset; 'colconst' indicates a color-invariant transformation applied to images; 'norm + histm' indicates normalization of color channels followed by histogram matching of training images to the test image; 'colconst + histm' indicates a color-invariant transformation followed by histogram matching; 'colconst + norm + histm' indicates color normalization, then a color-invariant transformation, and finally histogram matching; 'hsv + histm' indicates using HSV color space instead of RGB color space with histogram matching; 'contrast and gamma' indicates contrast and gamma adjustment of image channels; 'contrast and gamma + histm' indicates contrast and gamma adjustment of image channels followed by histogram matching; 'contrast and gamma + colconst + histm' indicates contrast and gamma adjustment of image channels, then a color-invariant transformation, and finally histogram matching. See Appendix 1–Extended methods S2.4 for more details. Experimental data comes from the OH15495 strain. n = 9 animals. Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively. (D) Comparison of prediction accuracy of the CRF_ID framework across different kinds of data-driven atlases used for prediction. The test dataset was held out from data-driven atlases. 'Pos. atlas only' uses only the positional relationship features atlas for prediction (these results are the same as Figure 2A). 'Pos. and Col. atlas' uses the positional relationship features atlas and the baseline color atlas (Appendix 1–Extended methods S2.4) built by simple aggregation of RGB values of cells in the training data. 'Pos. and Col. ensemble atlas' uses an ensemble of two color atlases for prediction along with the positional relationship features atlas. In this case, color distributions in training images were aligned to the test data using color invariants and histogram matching prior to building the atlas (Appendix 1–Extended methods S2.4). (n = 9 animals, Bonferroni paired comparison test). Experimental data comes from the OH15495 and OH15500 strains. Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively. (E) Similar to panel B for animals non-rigidly rotated about the AP axis (n = 7 animals). The test dataset was held out from data-driven atlases. In this case, positional relationship and color data-driven atlases were built using data from animals imaged in the lateral orientation as well as rotated animals. Comparisons performed with Bonferroni paired comparison test. Experimental data comes from the OH15495 and OH15500 strains. Top, middle, and bottom lines in box plot indicate 75th percentile, median, and 25th percentile of data, respectively.
Figure 6—figure supplement 2. Example annotations predicted by the CRF_ID framework for animals imaged lying on the LR axis.

Data comes from OH15495 strain.
Figure 6—figure supplement 3. Example annotations predicted by the CRF_ID framework for animals twisted about the anterior-posterior axis (note that the anterior and lateral ganglia show clear left-right separation, whereas the retrovesicular ganglion, instead of being in the middle, is shifted toward either the left or the right side).

Data comes from OH15495 and OH15500 strains.

Next, to assess the true potential of the CRF_ID framework, we replaced the OpenWorm atlas with data-driven positional relationship atlases, so that predictions were performed with data-driven atlases for both positional relationships and color. To test the generalizability of the method on unseen datasets, we compared the accuracy of the CRF_ID framework across several kinds of data-driven atlases (Figure 6—figure supplement 1B): (1) positional relationship and color atlases that include information from all datasets, including the test dataset; (2) color atlases built from all datasets and leave-one-out positional relationship atlases built with the test dataset held out; (3) positional relationship atlases built from all datasets and leave-one-out color atlases; and (4) leave-one-out atlases for both positional relationships and color. The analysis revealed that accuracy falls more sharply with leave-one-out color atlases than with leave-one-out positional relationship atlases. This implies that, in the datasets used, positional relationship features are more consistent than color features; thus, leave-one-out positional relationship atlases can represent the positional relationships among cells in test datasets. Further, to assess the contribution of color information to prediction accuracy, we compared the case using both positional relationship and color leave-one-out atlases (Figure 6—figure supplement 1B, last column) to the case where predictions were performed using only leave-one-out positional relationship atlases, shown earlier in Figure 2A. We found that color contributed little to improving accuracy. This is because, in the datasets used, the color variability in raw RGB values across animals is far greater than the position variability across animals; hence, the distribution of color features in the training data does not match the distribution of features in the test data. This could be due to inherent variations in fluorophore expression across animals, or variations in imaging settings (i.e. exposure time of each channel, laser power etc.) across sessions. Thus, a naïve approach of building the color atlas by directly aggregating RGB values from training images contributed little to improving accuracy. The problem of mismatched feature distributions between test and training data is commonly solved by domain adaptation methods in the machine-learning community. We adopted a simple domain adaptation strategy for dealing with color discrepancies and developed a two-step method (Appendix 1–Extended methods S2.4). First, we aligned the distributions of RGB values in training datasets to the test dataset by several methods such as simple normalization of color channels, histogram matching of color channels in training images to the test dataset, contrast and gamma adjustment of image channels, and transforming the color space of all images with color invariants (Finlayson et al., 1998; Figure 6—figure supplement 1C). Note that this alignment does not rely on cell identity information at all. These color alignment methods, by themselves or in combination with other methods, improved accuracy for some datasets but not all. Second, we used an ensemble of leave-one-out color atlases for prediction, that is, predictions were performed using multiple leave-one-out color atlases, each built with a different color alignment technique.
The ensemble, in comparison to single atlases, provides a combination of color features from aligned color distributions, thus improving accuracy; a minimal sketch of the histogram-matching alignment step is shown below. The two-step method improves accuracy by 6% over the naïve approach (Figure 6—figure supplement 1D). Overall, a significant improvement in model accuracy was achieved by using data-driven atlases to account for biological variability in both the positional relationships and color (Figure 6C; Figure 6—figure supplement 2). Using the data-driven atlas, top-label accuracy improved to 81%; more than 93% of the neurons have their true identities in the top three labels chosen by the model. We expect that more datasets for the atlas will continue to improve accuracy.
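
The histogram-matching alignment can be sketched with scikit-image; the arrays below are placeholders for real image stacks, and the channel_axis argument requires scikit-image ≥ 0.19. As stated above, this alignment uses no cell-identity information.

```python
# Sketch of one color-alignment technique in the ensemble: match the color
# histogram of a training image to the held-out test image before building
# the color atlas from the training image's cell RGB values.
import numpy as np
from skimage.exposure import match_histograms

rng = np.random.default_rng(0)
# Placeholder RGB stacks (z, y, x, channel) standing in for a training image
# and the test image; real data would be loaded from disk.
train = rng.random((10, 64, 64, 3))
test = rng.random((10, 64, 64, 3))

# Per-channel histogram matching of the training image to the test image.
aligned = match_histograms(train, test, channel_axis=-1)
```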

Lastly, we show that our framework is equipped to handle realistic complex scenarios of animals imaged in different orientations, which are often not rigid rotations (Figure 6D). Identifying cells in these cases is challenging: manual annotation using the 2D atlas (Altun and Hall, 2009) is not possible since it lacks left-right information; further, due to low z-sampling of image stacks, segmented positions of cells along the z-axis are noisier. These challenges can be addressed by using the data-driven atlas. We manually annotated data collected from animals imaged with varying degrees of non-rigid rotation and built data-driven atlases for positional relationships and color. Further, we combined the rotated animals' atlas with the previous atlas built from animals imaged in the lateral orientation to build a super atlas. With the test data held out of the atlases, the prediction accuracy was 48.8% for the top labels and 68.7% for the top three labels (Figure 6E), which is reasonable for practical purposes. In this case too, aligning the color distributions in the atlas to the test dataset and using an ensemble of color atlases with different alignment techniques significantly improved accuracy over the naïve approach to building color atlases (Figure 6—figure supplement 1E).

Discussion

Annotating anatomical features and cellular identities in biological images are important tasks in many applications. Here, we demonstrated that our CRF_ID framework is suitable for annotating fluorescently labeled cells in 3D images for many applications. Using both ground-truth experimental data of whole-brain image stacks and synthetic data generated from an atlas, we showed that our framework is more accurate than existing approaches. We demonstrated with real examples how the pipeline can be used for analysis of gene expression patterns, and for neuron identification in dense multi-cell or whole-brain imaging experiments. Further, the CRF_ID framework significantly speeds up cell identification compared to manual labeling while reducing bias.

With the pipeline, we address several challenges. There is ample evidence that anatomy varies from individual to individual and from condition to condition. This variability, or position noise, is a major roadblock in effectively applying previous methods to annotate whole-brain recording data. Because our framework leverages intrinsic similarity (Bronstein et al., 2007), it handles position noise better than registration methods (Figure 2E; Figure 2—figure supplement 5C). Further, the CRF_ID formulation is more accurate in handling count noise, that is, cases of missing or undetectable cells in images (Figure 2D; Figure 4C; Figure 2—figure supplement 5B; Figure 4—figure supplement 1A), because missing neurons do not upset the relationships among the detectable neurons in the CRF_ID formulation, whereas they introduce large uncertainty in registration methods. Lastly, the CRF_ID method predicts identities with sufficient accuracy for the different postural orientations of worms often seen in our microfluidic experiments. We expect that this superiority is maintained for any data in which relational information is preserved; this is the case in virtually all biological samples where tissues are connected by matrix materials, such as in other whole-brain recordings or registration of fixed tissues.

Building and using data-driven atlases in the pipeline is simple and yet highly effective. We expect that data from more animals, different orientations, ages, and imaging techniques will further improve generalizability. Since building such a data-driven atlas for our framework requires only cheap mathematical operations (Appendix 1–Extended methods S1.7), incorporating more data is simple and easily scalable. In contrast, previous registration-based methods may require simultaneous or batch-wise registration of multiple images to one reference image; this would require solving multiple constrained regression problems on increasingly large datasets, rendering them computationally unscalable. Further, without a systematic methodology for choosing the reference image, the atlas becomes biased toward the chosen reference image or the order in which blocks of images are registered to it. Tackling this challenge is an active field of research (Wang et al., 2008; Evangelidis et al., 2014). In comparison, in the CRF method, atlas building is unbiased toward any image set because there is no concept of a reference image. Additionally, the atlas can be built from all images simultaneously because the mathematical operations are inexpensive.

Another major advantage of data-driven atlases in our framework is that the atlases can be built incrementally using partially or fully annotated datasets, for example using lines that label partial and distinct subsets of cells. In comparison, registration-based methods cannot build an atlas from lines that label distinct subsets of cells. This is because registration-based methods build probabilistic spatial atlases by first establishing correspondence among cells in images and subsequently registering images; this is not possible if the cells in different images have few or no overlapping cells. In comparison, atlases built in the CRF_ID framework store probabilistic positional relationship features among the cells observed within each image, so correspondence between images is not required (a minimal sketch of this counting procedure is shown below). Thus, in principle, the CRF_ID framework can combine manually annotated data across different lines, generated by different researchers (and across labs), in the form of data-driven atlases. Automated annotation using such atlases removes individual biases in annotating cells. Further, it greatly supports researchers with no prior experience in cell identification. We expect that, using our framework, large-scale atlases can be built in the future with community-contributed data.
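
A minimal sketch of this counting procedure, under assumed data structures (label-indexed count matrices and an anterior-posterior x-axis convention), illustrates why only cheap operations and no cross-image correspondence are needed:

```python
# Sketch of data-driven atlas building by counting pairwise positional
# relationships: for each annotated image, increment, for each ordered pair
# of labels (a, b) present, the count that a lies anterior to b.
import numpy as np

def update_atlas(counts, totals, labels, positions):
    """labels: label indices of the cells annotated in one image;
    positions: (n, 3) cell positions with x as the AP axis. counts/totals
    are (n_labels, n_labels) arrays accumulated across images."""
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            if a == b:
                continue
            totals[a, b] += 1
            if positions[i, 0] < positions[j, 0]:   # a anterior to b
                counts[a, b] += 1
    return counts, totals

# P(a anterior to b) = counts / totals. Label pairs never co-observed simply
# stay undefined, which is why strains labeling non-overlapping subsets of
# cells can still contribute to one atlas.
```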

Finally, a distinction of the CRF_ID framework is its ability to build and annotate with complete atlases covering all cells. This is made possible by the efficient utilization of data, even from strains with non-overlapping cells. Annotating against a complete atlas is crucial because, in practice, no prior information is commonly available on exactly which cells are missing from the images before annotation. Registration-based or unary potential-based methods are limited in atlas building by the availability of overlapping strains. Thus, in these methods, cells that are missing from the atlas can never be assigned to cells in images; hence these methods do not perform completely unbiased annotation. In comparison, the CRF_ID framework uses a complete atlas so that any possible label in the atlas can be assigned to cells in the images, performing unbiased annotation and better handling count noise in images.

The CRF framework offers the flexibility of combining arbitrary unary features with arbitrary pairwise features for the cell annotation task. We demonstrated the utility of this flexibility by combining the color information of cells in the NeuroPAL strain with positional relationship features, showing higher accuracy than other methods. Our experiments show that, to utilize the color information of cells in NeuroPAL for automatic annotation of cell identities, color consistency across animals needs to be maintained, either experimentally or by post hoc corrections. Experimentally, a consistent protocol and imaging settings should be maintained across sessions as much as possible. Even with a consistent protocol, color variation may exist due to inherent differences across animals in the relative expression of the fluorophores that define the chromatic code of cells. This can be tackled by (1) collecting a large volume of data to capture each cell's full RGB variation and (2) using computational domain adaptation techniques. Further advances in image color transfer and domain adaptation techniques will improve accuracy in the future.

While we considered only pairwise features in the current formulation, feature functions with arbitrary dependencies can be included in the model and may further improve prediction accuracy (Kohli et al., 2009; Najafi et al., 2014). Advances in the structured energy minimization field (Kohli et al., 2009; Komodakis and Paragios, 2009; Krähenbühl and Koltun, 2011) will enable tackling the increased complexity of combinatorial optimization in these cases. Our workflow borrows techniques developed in the metric object/shape matching literature for annotation in biological images. The log-linear parameterization in our framework makes the model a member of the exponential families (Wainwright and Jordan, 2007); thus, the objective function in our framework has interesting connections with max-entropy models and with entropy-regularized optimal transport objective functions (Solomon et al., 2016; Nitzan et al., 2019). Therefore, improvements in computational speed can be achieved by borrowing fast optimization techniques for quadratic assignment problems developed in the optimal transport literature. Advances in these fields will continue to improve method development in image analysis.
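
For reference, the log-linear form discussed here can be written schematically as below; the notation is assumed for illustration, and the exact feature definitions are given in Appendix 1–Extended methods S1. Maximizing this probability over label assignments y is the combinatorial optimization the framework solves.

```latex
% Schematic log-linear CRF objective; y_i is the label of cell i, x the image,
% f_u and f_p the unary and pairwise feature functions, lambda their weights,
% and Z(x) the partition function that normalizes the distribution.
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\!\Big( \sum_{i} \boldsymbol{\lambda}_{u}^{\top} \mathbf{f}_{u}(y_i, \mathbf{x})
            + \sum_{i<j} \boldsymbol{\lambda}_{p}^{\top} \mathbf{f}_{p}(y_i, y_j, \mathbf{x}) \Big)
```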

We anticipate that, using our generalizable formulation, similar pipelines can be set up to annotate image sets in other organisms and to build atlases. Data in many anatomical annotation problems (e.g. brain atlas construction, registering images from different modalities, comparing animals or related species to one another for developmental studies) share a similar property, in that the anatomical features of interest maintain cohesion from sample to sample. This underlying cohesion lends itself to the CRF framework. As we have shown, the pipeline is extremely flexible in incorporating new information. Thus, the framework should be easily modifiable to cater to the data demands in other organisms, including features beyond landmarks and spectral information, such as cellular morphology and expected cellular activities (e.g. calcium transients). Because the only inputs to our framework are segmented anatomical regions in images and the positional relationships among them - information already available in data across organisms (Robie et al., 2017; Kim et al., 2015; Chen et al., 2019; Ronneberger et al., 2012) - the framework proposed here should be generally useful for many problems in model organisms such as Drosophila (Robie et al., 2017; Vaadia et al., 2019), zebrafish (Ronneberger et al., 2012), and mammalian brains (Kim et al., 2015; Chen et al., 2019). Besides fluorescence, the pipeline should also work with data from other modalities, including EM, live imaging, and fluorescence imaging of cleared tissues.

Materials and methods

Reagents

For all experiments, animals were cultured using standard techniques (Stiernagle, 2006). A detailed list of strains used is provided below.

Name Genotype Experiments Reference
GT290 aEx18[unc-47p::NLS::CyOFP1::egl-13NLS] Strain with nine neuronal landmarks in head This work
GT298 aEx22[unc-47p::NLS::CyOFP1::egl-13NLS + gcy-32p::NLS::CyOFP1::egl-13NLS] Strain with 12 neuronal landmarks in head This work
AML32 wtfIs5 [rab-3p::NLS::GCaMP6s + rab-3p::NLS::tagRFP] Strain used to make whole-brain imaging strain with CyOFP labeled landmarks GT296 Nguyen et al., 2016
AML70 wtfIs5 [rab-3p::NLS::GCaMP6s + rab-3p::NLS::tagRFP]; lite-1(ce314) X Strain used to make whole-brain imaging strain with CyOFP labeled landmarks GT293 Scholz, 2018
KG1180 lite-1(ce314) X Strain used to make whole-brain imaging strain with CyOFP labeled landmarks GT296 CGC
GT296 wtfIs5 [rab-3p::NLS::GCaMP6s + rab-3p::NLS::tagRFP]; aEx18[unc-47p::NLS::CyOFP1::egl-13NLS]; lite-1(ce314) X Strain used for whole-brain functional imaging experiments (Figures 4 and 5) and quantifying cell position variability. This work
GT293 wtfIs5 [rab-3p::NLS::GCaMP6s + rab-3p::NLS::tagRFP]; lite-1(ce314) X; aEx22[unc-47p::NLS::CyOFP1::egl-13NLS + gcy-32p::NLS::CyOFP1::egl-13NLS] Strain used for quantifying cell position variability. This work
AML5 otIs355 [rab-3p(prom1)::2xNLS::TagRFP] IV. kyIs51 [odr-2p::GFP + lin-15(+)] Strain used for mock gene-expression pattern analysis and mock multi-cell calcium imaging experiments Nguyen et al., 2016
OH15495 otIs696 [UPN::NLS::TagRFP-T + acr-5::NLS::mTagBFP2::H2B + flp-1::NLS::mTagBFP2::H2B + flp-6::NLS::mTagBFP2::H2B + flp-18::NLS::mTagBFP2::H2B + flp-19::NLS::mTagBFP2::H2B + flp-26::NLS::mTagBFP2::H2B + gcy-18::NLS::mTagBFP2::H2B + ggr-3::NLS::mTagBFP2::H2B + lim-4::NLS::mTagBFP2::H2B + pdfr-1::NLS::mTagBFP2::H2B + srab-20::NLS::mTagBFP2::H2B + unc-25::NLS::mTagBFP2::H2B + cho-1::NLS::CyOFP1::H2B + flp-13::NLS::CyOFP1::H2B + flp-20::NLS::CyOFP1::H2B + gcy-36::NLS::CyOFP1::H2B + gpa-1::NLS::CyOFP1::H2B + nlp-12::NLS::CyOFP1::H2B +nmr-1::NLS::CyOFP1::H2B + ocr-1::NLS::CyOFP1::H2B + osm-9::NLS::CyOFP1::H2B + srh-79::NLS::CyOFP1::H2B + sri-1::NLS::CyOFP1::H2B + srsx-3::NLS::CyOFP1::H2B + unc-8::NLS::CyOFP1::H2B + acr-2::NLS::mNeptune2.5 + ceh-2::NLS::mNeptune2.5 + dat-1::NLS::mNeptune2.5 + dhc-3::NLS::mNeptune2.5 + eat-4::NLS::mNeptune2.5 + flp-3::NLS::mNeptune2.5 + gcy-35::NLS::mNeptune2.5 + glr-1::NLS::mNeptune2.5 + gcy-21::NLS::CyOFP1::H2B::T2A::NLS::mTagBFP2::H2B + klp-6::NLS::mNeptune2.5::T2A::NLS::CyOFP1::H2B + lim-6::NLS::mNeptune2.5::T2A::NLS::CyOFP1::H2B + mbr-1::NLS::mNeptune2.5::T2A::NLS::mTagBFP2::H2B + mec-3::NLS::CyOFP1::H2B::T2A::NLS::mTagBFP2::H2B + odr-1::NLS::mNeptune2.5::T2A::NLS::mTagBFP2::H2B + srab-20::NLS::mNeptune2.5::T2A::NLS::mTagBFP2::H2B] NeuroPAL strain demonstrating the ease of incorporating color information, and thus demonstrating generalizability Yemini et al., 2021
OH15500 otIs672 [rab-3::NLS::GCaMP6s + arrd-4:NLS:::GCaMP6s]. otIs669 [UPN::NLS::TagRFP-T + acr-5::NLS::mTagBFP2::H2B + flp-1::NLS::mTagBFP2::H2B + flp-6::NLS::mTagBFP2::H2B + flp-18::NLS::mTagBFP2::H2B + flp-19::NLS::mTagBFP2::H2B + flp-26::NLS::mTagBFP2::H2B + gcy-18::NLS::mTagBFP2::H2B + ggr-3::NLS::mTagBFP2::H2B + lim-4::NLS::mTagBFP2::H2B + pdfr-1::NLS::mTagBFP2::H2B + srab-20::NLS::mTagBFP2::H2B + unc-25::NLS::mTagBFP2::H2B + cho-1::NLS::CyOFP1::H2B + flp-13::NLS::CyOFP1::H2B + flp-20::NLS::CyOFP1::H2B + gcy-36::NLS::CyOFP1::H2B + gpa-1::NLS::CyOFP1::H2B + nlp-12::NLS::CyOFP1::H2B + nmr-1::NLS::CyOFP1::H2B + ocr-1::NLS::CyOFP1::H2B + osm-9::NLS::CyOFP1::H2B + srh-79::NLS::CyOFP1::H2B + sri-1::NLS::CyOFP1::H2B + srsx-3::NLS::CyOFP1::H2B + unc-8::NLS::CyOFP1::H2B + acr-2::NLS::mNeptune2.5 + ceh-2::NLS::mNeptune2.5 + dat-1::NLS::mNeptune2.5 + dhc-3::NLS::mNeptune2.5 + eat-4::NLS::mNeptune2.5 + flp-3::NLS::mNeptune2.5 + gcy-35::NLS::mNeptune2.5 + glr-1::NLS::mNeptune2.5 + gcy-21::NLS::CyOFP1::H2B::T2A::NLS::mTagBFP2::H2B + klp-6::NLS::mNeptune2.5::T2A::NLS::CyOFP1::H2B + lim-6::NLS::mNeptune2.5::T2A::NLS::CyOFP1::H2B + mbr-1::NLS::mNeptune2.5::T2A::NLS::mTagBFP2::H2B + mec-3::NLS::CyOFP1::H2B::T2A::NLS::mTagBFP2::H2B + odr-1::NLS::mNeptune2.5::T2A::NLS::mTagBFP2::H2B + srab-20::NLS::mNeptune2.5::T2A::NLS::mTagBFP2::H2B] V NeuroPAL strain demonstrating the ease of incorporating color information, and thus demonstrating generalizability Yemini et al., 2021

Imaging

All imaging was performed using either a Perkin Elmer spinning disk confocal microscope (1.3 NA, 40x, oil objective) or a Bruker Opterra II swept-field confocal microscope (0.75 NA, 40x, Plan Fluor air objective), with an EMCCD camera.

To acquire the data used for framework validation and comparison against other methods (Figure 2), gene expression pattern analysis (Figure 3), multi-cell calcium imaging (Figure 4), landmark strain imaging (Figure 4), and NeuroPAL imaging (Figure 6), animals were synchronized to the L4 stage and imaged in an array microfluidic device (Lee et al., 2014). A single 3D stack was acquired with either 0.5 µm or 1 μm spacing between z-planes and 10 ms exposure time (except for the NeuroPAL strain, where the exposure times of different channels were chosen based on the guidelines provided in the NeuroPAL manual; Yemini et al., 2021).

Whole-brain functional recording data with chemical stimulation were acquired using a microfluidic device designed for applying chemical stimuli (Cho et al., 2020) to the nose tip of the animal. Here, image stacks were acquired with 1 μm spacing between z-planes and 10 ms exposure for each z-plane. This enabled recording videos at 1.1 volumes/s while imaging two channels (GCaMP and RFP) simultaneously. Animals were synchronized to the Day-1 adult stage.

Generating synthetic data for framework tuning and comparison against other methods

Synthetic data was generated using the freely available 3D atlas from OpenWorm (Szigeti et al., 2014). The atlas available at WormAtlas (Altun and Hall, 2009) was not used as it provides only a 2D view. To mimic the conditions encountered in experimental data, two noise perturbations were applied to the 3D atlas (Figure 2—figure supplement 2). First, due to inherent biological variability, the positions of cells observed in images do not exactly match the positions in the atlas. Thus, position noise was applied to each cell in the atlas. This noise was sampled from a normal distribution with zero mean and fixed variance $\Sigma = \mathrm{diag}([\sigma_x, \sigma_y, \sigma_z])$, where $\sigma_x$, $\sigma_y$, and $\sigma_z$ denote the variances along the x, y, and z image dimensions and $\mathrm{diag}(x)$ denotes the diagonal matrix formed from vector $x$. Hence, the position of the $i$th cell $p_i \in \mathbb{R}^3$ in synthetic data was defined as $p_i = p_{i,\mathrm{atlas}} + \epsilon$, $\epsilon \sim \mathcal{N}(0, \Sigma)$, where $p_{i,\mathrm{atlas}}$ is the position of the $i$th cell in the atlas. To determine the variance $\Sigma$, we quantified the variance of cell positions observed in experimental data (Figure 2—figure supplement 2A, C, E) using the landmark strains GT293 and GT296. We calculated the 25th percentile and 75th percentile of the variance across all cells across all animals (n = 31) to define the lower-bound and upper-bound position noise levels observed in experimental data. However, this variability cannot be directly applied to the atlas due to the different spatial scales of images and atlas. Thus, we first normalized the observed variance of cell positions in images by the inter-cell distances in images and then scaled it according to the inter-cell distances in the atlas (Figure 2—figure supplement 2B,D,F,G,H) to define the lower-bound and upper-bound noise applied to the atlas. Additional position noise levels, such as those in Figure 2E and Figure 2—figure supplement 5, were set by varying $\Sigma$ between zero and the upper-bound noise level.

Second, although there are 195–200 neurons in the head ganglion of C. elegans, only 100–130 cells were detected in most image stacks. The remaining cells are not detected due to low expression levels of fluorophores, variability in the expression levels of the genetic marker used to label cells (mosaicism, incomplete coverage etc.), or the inability of segmentation methods to resolve densely packed cells. This increases the complexity of determining the labels of cells. To illustrate: matching 195 cells in an image to 195 cells in the atlas is easier, as only one or very few possible configurations of label assignments exist that maximally preserve the positional relationships among cells. In contrast, when matching 100 cells in an image to 195 cells in the atlas, many possible labeling arrangements may exist that equally preserve the positional relationships among cells. Thus, to simulate count noise in synthetic data, randomly selected cells in the atlas were marked as missing, synthetic data was generated from the atlas with the remaining cells, and identities were predicted for the remaining cells using the full atlas. The number of cells marked as missing was set by the count noise level parameter, defined as the fraction of total cells in the atlas set as missing. Since no prior information was available on which regions of the head ganglion had more missing cells, we selected the missing cells uniformly across brain regions.
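
Both perturbations can be sketched in a few lines. In this minimal sketch, the sigma values are per-axis standard deviations (the square roots of the variances discussed above) standing in for the experimentally estimated bounds, and the count noise level matches the fraction-of-missing-cells definition above.

```python
# Sketch of the two synthetic perturbations applied to the 3D atlas:
# count noise (randomly dropped cells) and additive Gaussian position noise.
import numpy as np

rng = np.random.default_rng(0)

def synthesize(atlas_xyz, sigma=(1.0, 1.0, 1.0), count_noise=0.35):
    """atlas_xyz: (n_cells, 3) atlas positions. Returns noisy positions of the
    retained cells and their atlas indices (the ground-truth identities)."""
    n = atlas_xyz.shape[0]
    n_keep = int(round(n * (1.0 - count_noise)))
    keep = np.sort(rng.choice(n, size=n_keep, replace=False))     # count noise
    noise = rng.normal(0.0, np.asarray(sigma), size=(n_keep, 3))  # position noise
    return atlas_xyz[keep] + noise, keep
```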

Finally, bounds on prediction accuracy (shown as gray regions in Figure 2, Figure 2—figure supplement 1) were obtained as the average prediction accuracy across runs on synthetic data generated with the lower-bound and upper-bound position noise.

Generating ground-truth data for framework tuning and comparison against other methods

NeuroPAL reagents OH15495 and OH15500 were used to generate ground-truth data. 3D image stacks were acquired following the guidelines provided in the NeuroPAL manual (Yemini et al., 2021), and identities were annotated in image stacks using the example annotations provided there. Individual channel image stacks were read in MATLAB, and gamma and contrast were adjusted for each channel individually so that the colors of cells in the RGB image formed by combining the individual channels matched, as closely as possible perceptually, the colors of cells in the NeuroPAL manual. Identities were annotated in the 3D stacks using the Vaa3D software (Peng et al., 2010).

Model comparison against previous methods

A detailed description of the methodology used for each method against which our CRF_ID framework was compared is provided in Appendix 1–Extended methods S2. Note that, for fair comparison, the standard 3D OpenWorm atlas was used by all methods as the reference: either for defining positions of cells (used by registration methods) or for defining positional relationships among cells (used by the CRF_ID framework).

Simulations for choosing landmark locations

Landmarks (cells with known identities) improve prediction accuracy by constraining the optimization problem: the CRF_ID framework must choose optimal labels for all cells such that they preserve their positional relationships with the cells whose identities are fixed. However, choosing an optimal set of landmarks is difficult because the combinatorial space of landmark choices is huge (~10^14 for 10 landmark cells out of 195 in the head ganglion); simulating each combination and predicting identities is not computationally tractable. Thus, we instead asked in which regions of the brain landmark cells should lie. We divided the head ganglion into three groups: an anterior group consisting of the anterior ganglion, a middle group consisting of the lateral, dorsal, and ventral ganglia, and a posterior group consisting of the retrovesicular ganglion. Two hundred runs were performed for each group, with 15 randomly selected landmarks in each run. We constrained the landmark cells to lie in a specific group and assessed how well the landmarks in that group predicted the identities of cells in other regions. Overall, landmarks in the anterior and posterior groups performed badly in predicting identities of cells in the posterior and anterior groups, respectively. Landmarks in the middle group and landmarks spatially distributed throughout the head performed equally well (Figure 4—figure supplement 2). We chose landmarks spatially distributed throughout the head for practical reasons: spatially distributed landmarks can be easily identified manually in image stacks and thus used as inputs to the CRF_ID framework, whereas cells in the middle group are densely packed and may not be identified easily. We tested this using several reporter strains with GFP-labeled cells. Further, landmarks should be reliably expressed across animals, should have known and verified expression patterns, and should label neither too few cells (not useful) nor too many cells (difficult identification). We therefore chose unc-47 and gcy-32 reporters for labeling landmarks.
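
The sampling procedure can be sketched as follows; predict_identities is a hypothetical stand-in for running the CRF_ID framework with the sampled landmarks held fixed, and region_cells and true_labels are assumed inputs.

```python
# Sketch of the landmark-selection simulation: sample a random combination of
# 15 landmarks from one head region, run identity prediction with those
# landmarks fixed, and score accuracy on the remaining (non-landmark) cells.
import numpy as np

rng = np.random.default_rng(1)

def landmark_runs(region_cells, n_cells, true_labels, n_runs=200, k=15):
    accuracies = []
    for _ in range(n_runs):
        landmarks = rng.choice(region_cells, size=k, replace=False)
        pred = predict_identities(fixed=landmarks)   # hypothetical CRF_ID call
        free = np.setdiff1d(np.arange(n_cells), landmarks)
        accuracies.append(np.mean(pred[free] == true_labels[free]))
    return np.asarray(accuracies)
```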

Construction of landmark strains

We constructed two transgenic strains in which nine (GT290) and twelve (GT298) neurons, respectively, were labeled with the fluorescent protein CyOFP1 (Chu et al., 2016). Due to its long Stokes shift, CyOFP1 can be excited by the same laser line as GCaMP while emitting red-shifted photons. This conveniently allows us to perform three-color imaging on our two-channel confocal microscope. We designed an optimized version of the CyOFP1 gene using the C. elegans Codon Adapter (Redemann et al., 2011), which was then synthesized (Integrated DNA Technologies) and sub-cloned into a kanamycin-resistant vector to yield the pDSP11 plasmid. Our CyOFP1 construct contains two different nuclear localization sequences (NLS), an SV40 NLS at the N-terminus and an EGL-13 NLS at the C-terminus, a strategy which has been shown to be more effective at trafficking recombinant proteins to the nucleus of worm cells (Lyssenko et al., 2007). The nuclear localization of the CyOFP1 protein allows us to unambiguously identify labeled cells in the densely packed head ganglion of C. elegans.

We tested two different labeling strategies in our study. The first used the promoter of the unc-47 gene to drive expression of CyOFP1 in all 26 GABAergic neurons of the worm (McIntire et al., 1997); as our study focused on the head ganglion, only the RMEL, RMER, RMEV, RMED, AVL, RIS, DD1, VD1, and VD2 neurons are labeled by this promoter in this region (strain GT296, Figure 4D top panel). Our second strategy used the unc-47 CyOFP1 construct in combination with a second construct driven by the promoter of the gcy-32 gene, which is expressed in the AQR, PQR, and URX neurons (Yu et al., 1997), to label twelve cells in the head ganglion (strain GT293, Figure 4D bottom panel). The DNA sequence of each promoter was amplified from N2 (wild-type) genomic DNA and incorporated into a NotI-digested linear pDSP11 vector backbone using the NEBuilder HiFi DNA Assembly master mix (New England Biolabs) to yield the vectors pSC1 and pSC2. The following primers were used to amplify the promoters: unc-47 Forward 5’- cagttacgctggagtctgaggctcgtcctgaatgatatgcCTGCCAATTTGTCCTGTGAATCGT-3’ and Reverse 5’- gcctctcccttggaaaccatCTGTAATGAAATAAATGTGACGCTGTCGT-3’; gcy-32 Forward 5’- cagttacgctggagtctgaggctcgtcctgaatgatatgcTTGTATAGTGGGAAATACTGAAATATGAAACAAAAAATATAGCTG-3’ and Reverse 5’- gcctctcccttggaaaccatTCTATAATACAATCGTGATCTTCGCTTCGG-3’.

To make the landmark strains, pSC1 and pSC2 were injected into the N2 strain to generate GT290 and GT298, respectively. The GT290 and GT298 strains can be crossed with any strain in which cells need to be identified; the landmarks in these strains help define a coordinate system in the head and also improve the accuracy of the automatic annotation framework by constraining the optimization problem. To make strain GT293 for whole-brain imaging experiments, AML70 was crossed with GT298, and lite-1(ce314) was confirmed by sequencing. To make strain GT296 for whole-brain imaging experiments, AML32 was crossed with GT290 and subsequently crossed with KG1180; lite-1(ce314) was confirmed by sequencing.

Whole-brain data analysis

All videos were processed using custom software in MATLAB for automatic segmentation and tracking of nuclei in whole-brain image stacks. Tracks for nuclei with minor tracking errors were corrected in post-processing steps. Tracks with large tracking errors were dropped from the data.

Segmentation

Neurons were automatically segmented in image stacks using a Gaussian mixture model-based segmentation technique. Briefly, intensity local maxima are detected in the images to initialize the mixture components, and a 3D Gaussian mixture model is subsequently fitted to the intensity profiles in the image stacks using the Expectation-Maximization (EM) algorithm. The number of components in the model and the ellipsoidal shape of each component determine the number of segmented nuclei and their shapes.
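For orientation, a minimal MATLAB sketch of this segmentation step is given below (assuming the Image Processing and the Statistics and Machine Learning Toolboxes; the smoothing scale, brightness cutoffs, and variable names are illustrative assumptions, not the values used in the actual pipeline):

    img = double(imgaussfilt3(stack, 1));                 % smoothed 3D image stack
    bw  = imregionalmax(img);                             % detect intensity local maxima
    [yi, xi, zi] = ind2sub(size(img), find(bw & img > 0.5*max(img(:))));
    seeds = [xi yi zi];                                   % initialize one component per maximum
    k = size(seeds, 1);
    [X, Y, Z] = meshgrid(1:size(img,2), 1:size(img,1), 1:size(img,3));
    vox = [X(:) Y(:) Z(:)];
    bright = img(:) > quantile(img(:), 0.95);             % fit only the bright voxels
    start = struct('mu', seeds, 'Sigma', repmat(5*eye(3), 1, 1, k), ...
                   'ComponentProportion', ones(1,k)/k);
    gm = fitgmdist(vox(bright,:), k, 'Start', start, ...
                   'RegularizationValue', 1e-3);          % EM fit; each ellipsoid is one nucleus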

Tracking

Custom software was used for tracking cells. Briefly, segmented nuclei at each timepoint in the image stacks are registered to a common reference frame and to temporally nearby frames to produce globally and locally consistent matchings. Based on these matchings, consistency constraints such as transitivity of matching were imposed in a post-processing step to further improve tracking accuracy. A custom MATLAB GUI was used to quickly inspect tracking accuracy manually, and tracks of cells with minor tracking errors were resolved using a semi-automated method.

Cell identification

Identities were predicted using the CRF_ID framework with positional features (Appendix 1–Extended methods S1) and the data-driven atlas. Landmark cells with known identities were identified in the CyOFP channel and provided as input to the framework to achieve higher accuracy.

Identification of stimulus tuned neurons

To identify stimulus-tuned neurons, the power spectrum of the activities of all cells within the stimulus application window (100 s – 180 s) was calculated using the ‘fft’ function in MATLAB. Cells that showed significant power (>0.08) at 0.1 Hz (corresponding to the 5 s on, 5 s off stimulus, Figure 5) were selected. This criterion identified all responding cells except two with low response amplitudes, whose responses could nonetheless be seen manually in the video; these two cells were therefore selected manually.
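A hedged sketch of this criterion in MATLAB (the window, threshold, and frequency come from the text; the activity matrix act and the volume rate fs are assumptions):

    fs  = 2;                                        % assumed acquisition rate (volumes/s)
    win = act(:, round(100*fs)+1:round(180*fs));    % cells x time, 100-180 s window
    n   = size(win, 2);
    P   = abs(fft(win, [], 2)/n).^2;                % per-cell power spectrum
    f   = (0:n-1)*fs/n;                             % frequency axis
    [~, i01] = min(abs(f - 0.1));                   % bin nearest to 0.1 Hz
    tuned = find(P(:, i01) > 0.08);                 % stimulus-tuned cells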

PCA and Sparse PCA

Principal component analysis (PCA) of neuron activity time-series data was performed using built-in functions in MATLAB. Sparse principal component analysis (SPCA) was performed using a freely available MATLAB toolbox (Sjöstrand et al., 2018).

Neuron activities correlation to animal motion

To ascertain that the motion of the worm in the device carries the signatures of wave propagation seen in freely moving animals, we looked for phase shifts in the velocity of different regions of the animal in the device (similar to the phase shifts in curvature of body parts seen in freely moving animals; Stephens et al., 2008). To calculate velocity, the displacement of randomly selected cells along the anterior-posterior axis of the animal was calculated (Figure 5—video 2) based on cell tracking. Cell displacements were smoothed using a Savitzky-Golay filter, and the velocity of each cell was then calculated by differentiating its displacement.
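A brief sketch of the velocity computation (dispAP is a hypothetical time x cells matrix of AP displacements; the filter order and window length are assumptions; sgolayfilt requires the Signal Processing Toolbox):

    smoothDisp = sgolayfilt(dispAP, 3, 11);   % Savitzky-Golay smoothing, per column
    vel = [zeros(1, size(smoothDisp,2)); diff(smoothDisp)] * fs;   % displacement/frame -> units/s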

Mutual information (MI) between the obtained velocity signal and (1) stimulus-tuned neurons, (2) neurons with significant weights in sparse principal components 1–3, and (3) the remaining cells was calculated. MI analysis requires estimating the joint probability density of velocity and neuron activity; we did so using the kernel density estimation (KDE) method with a Gaussian kernel, with the bandwidth parameters (which specify the variance of the Gaussian kernel) set to [0.05, 0.05]. Cells grouped in SPC2 always had the largest mutual information with velocity regardless of the choice of the bandwidth parameters.
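A sketch of the KDE-based MI estimate for one velocity/activity pair (only the [0.05, 0.05] bandwidths are from the text; the 50-point grid and the use of mvksdensity from the Statistics Toolbox are assumptions):

    g1 = linspace(min(v), max(v), 50); g2 = linspace(min(a), max(a), 50);
    [G1, G2] = ndgrid(g1, g2);
    pxy = mvksdensity([v(:) a(:)], [G1(:) G2(:)], 'Bandwidth', [0.05 0.05]);
    pxy = reshape(pxy, 50, 50); pxy = pxy / sum(pxy(:));   % joint density on the grid
    px = sum(pxy, 2); py = sum(pxy, 1);                    % marginal densities
    pxpy = px * py;                                        % product of marginals
    MI = sum(pxy(:) .* log((pxy(:) + eps) ./ (pxpy(:) + eps)));   % MI in nats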

Runtime comparison

To compare the optimization runtimes of CRF and the registration-based method CPD (Myronenko and Song, 2010), synthetic data was generated using the OpenWorm atlas as described previously, with 10, 20, 50, and 130 randomly selected cells. Annotation was performed using only positional relationship features, with the full head ganglion OpenWorm atlas (206 cells) used for annotation. Simulations were run on a standard desktop computer (Intel Xeon CPU E5-1620 v4 @ 3.5 GHz, 32 GB RAM).

Statistical analysis

Standard statistical tests were performed using the Paired Comparisons App in OriginPro 2020. Details regarding the tests (sample size, significance, method) are reported in the figure legends. The following asterisk symbols denote significance levels throughout the manuscript: * (p<0.05), ** (p<0.01), *** (p<0.001). Where no significance level is indicated in a figure, the difference is not significant (n.s.).

Code and data availability

Code and data used in this study can be accessed at https://github.com/shiveshc/CRF_Cell_ID.git. This repository contains: (1) all code and individual components necessary for using the CRF_ID framework to annotate cells in new data, visualize results, and build new atlases from annotated data; (2) code to reproduce the comparisons against other methods shown in this study; and (3) all raw datasets used in this study, as well as the human annotations created for them, except the whole-brain imaging datasets.

Acknowledgements

The authors thank the Hobert Lab for providing NeuroPAL strains. The authors acknowledge the funding support of the U.S. NIH (R21DC015652, R01NS096581, R01GM108962, and R01GM088333) and the U.S. NSF (1764406 and 1707401) to HL. Some nematode strains used in this work were provided by the Caenorhabditis Genetics Center (CGC), which is funded by the NIH (P40 OD010440) and the National Center for Research Resources, and by the International C. elegans Knockout Consortium. SC and HL designed the algorithm, experiments, and methods. SC collected ground-truth validation data. SC and SL collected whole-brain imaging data. SC, SL, YL, and DSP developed the strains with neuronal landmarks; SC, SL, YL, and HL analyzed the data. SC and HL wrote the paper. The authors declare no competing interests. All raw datasets (except whole-brain imaging datasets) used in this study, as well as the ground-truth human annotations created for those datasets, are available at https://github.com/shiveshc/CRF_Cell_ID/tree/master/Datasets.

Appendix 1

Extended methods

S1: Detailed description of CRF_ID model for cell annotation

S1.1 Advantages of CRF_ID framework

There are several advantages of CRF-based formulation for automatic cell annotation as discussed below.

  1. The major advantage of the CRF framework is that arbitrary feature relationships f_k among non-independent observations x can be specified (Sutton and McCallum, 2010). This enables optimization of user-defined, arbitrary features. Depending on the scope of the potential functions Φ_c, these features may be cell-specific, that is unary features (Φ_c = Φ_i), pairwise features (Φ_c = Φ_ij), or higher-order features, for example the position of a cell within a ganglion, its position relative to a landmark cell of known identity, or its color.

  2. Since we are modeling a probability distribution over the labels assigned to neurons, a candidate list of names can be generated for each neuron. A straightforward way to do this is to use the marginal probabilities of the labels assigned to each cell: once the normalization constant Z has been calculated, the marginal probability of cell i taking label y_i can be obtained as P(y_i \mid x) = \sum_{\{y_j : j \in V, j \neq i\}} P(y \mid x). Other computationally efficient methods for estimating the marginal probabilities, using eigenvectors of the potential functions (Leordeanu and Hebert, 2005; Leordeanu and Hebert, 2009), have been proposed as well. We propose an alternative method to generate such a list (see Appendix 1–Extended methods S1.5) that takes into account the neurons that go undetected in the image stack.

  3. With the log-linear parameterization of feature functions as in (Equation 2), CRF models belong to the exponential family (Wainwright and Jordan, 2007), also known as maximum-entropy models. Thus, the joint probability distribution over labels assigned to neurons that we infer is maximally unbiased (maximum entropy) subject to empirical constraints (the sufficient statistics defining the probability distribution); in our case, these constraints are the geometrical relationships among cell positions. Interestingly, the maximum-entropy nature of the objective function also makes our model very similar to entropy-regularized optimal transport problems (Solomon et al., 2016; Nitzan et al., 2019).

  4. The CRF framework is a trainable algorithm (Taskar et al., 2004; Taskar et al., 2003). Thus, if annotated image stacks are available, the weights λ_k of the feature functions can be optimized directly such that the predicted labels match the cell labels in the annotated stacks. Further, we show that annotated experimental data can be utilized by building a data-driven atlas, and the feature functions can be updated based on that atlas (S1.7). For the pairwise feature functions in our model, building such a data-driven atlas requires only cheap mathematical operations (averaging).

S1.2 Features in CRF_ID-based annotation framework

During manual annotation of neuron identities in images, researchers use intuitive features such as the positions of neurons in the image and atlas, positional relationships among neurons, the proximity of neurons to one another, and neurons with known identities (e.g. neurons expressing specific markers). In this section, we describe how such intuitive features are encoded in our model.

S1.2.1 Unary potentials – positions along AP axis

In empirical data, we observed that anterior-posterior (AP) positional relationships among cells are the most stereotypical (Figure 2—figure supplement 6C,D) and consistent with the 3D atlas. Further, since the x-y sampling of image stacks is much finer than the z sampling, the AP axis is always sampled more finely than the LR and DV axes, and positions along the AP axis detected by the segmentation method are therefore less noisy. We thus included a unary feature based on cell positions along the AP axis.

f_a(m;i,x) = \exp\left(-\frac{\lVert x_{i,AP} - y_{m,AP} \rVert^2}{\sigma_u^2}\right) (5)

Here, x_{i,AP} is the position of cell i along the AP axis in the data and y_{m,AP} is the position of the cell with label m in the atlas; σ_u is a bandwidth parameter. To account for scale differences between image and atlas, positions along the AP axis are first normalized in both. A low value of σ_u strongly penalizes deviations of a cell's AP position in the image from its atlas position, favoring labelings that preserve AP positions, whereas a large value of σ_u weakens this penalty. This feature thus restricts the assignment of a particular label to certain cells along the AP axis, making the labeling AP-consistent. Low values of σ_u also increase the influence (weight) of this unary potential relative to the pairwise potentials; we set this parameter to 1 to give unary and pairwise potentials equal influence.
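In code, this unary potential reduces to one vectorized expression (a sketch; xAP and yAP are assumed to hold the normalized AP coordinates of image cells and atlas labels, respectively):

    sigma_u = 1;
    Fa = exp(-(xAP(:) - yAP(:)').^2 / sigma_u^2);   % num_cells x num_labels matrix of f_a values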

S1.2.2 Pairwise potentials – binary positional relationships

These features encode that the labels assigned to neurons in an image should preserve the neurons' relative positional relationships in the atlas. For example, if neuron i is to the left of neuron j in the image stack, and these neurons are assigned labels m and n respectively, then the neuron with label m in the atlas should be to the left of the neuron with label n. Similar constraints can also be applied to the anterior-posterior and dorsal-ventral relationships of neurons in the image stack. Let x_{i,AP}, x_{i,LR}, x_{i,DV} denote the coordinates of neuron i in the image along the anterior-posterior (AP), left-right (LR), and dorsal-ventral (DV) axes, respectively; similarly, let y_{m,AP}, y_{m,LR}, y_{m,DV} be the coordinates of the neuron with label m in the atlas. The feature function f_AP is defined as

f_{AP}(m,n;i,j,x) = \lambda_{AP} \begin{cases} 1, & (x_{i,AP} - x_{j,AP})(y_{m,AP} - y_{n,AP}) > 0 \\ 0, & (x_{i,AP} - x_{j,AP})(y_{m,AP} - y_{n,AP}) < 0 \end{cases} (6)

Thus, if the atlas labels m and n assigned to neurons i and j in the image are consistent with the AP positional relationship of cells i and j, the feature value is 1, and otherwise 0. Note that the feature takes only the values 1 or 0, irrespective of the specific labels m and n assigned to the cells; this holds only when annotation is performed using a static atlas or an atlas built from a single data source. We expand on this in section S1.7 (Building data-driven atlases) and explain how label-dependent feature functions are formed when empirical hand-annotated datasets are available.

Similarly, features are defined for left-right and dorsal-ventral relationships, fLR and fDV, respectively.

f_{LR}(m,n;i,j,x) = \lambda_{LR} \begin{cases} 1, & (x_{i,LR} - x_{j,LR})(y_{m,LR} - y_{n,LR}) > 0 \\ 0, & (x_{i,LR} - x_{j,LR})(y_{m,LR} - y_{n,LR}) < 0 \end{cases} (7)
f_{DV}(m,n;i,j,x) = \lambda_{DV} \begin{cases} 1, & (x_{i,DV} - x_{j,DV})(y_{m,DV} - y_{n,DV}) > 0 \\ 0, & (x_{i,DV} - x_{j,DV})(y_{m,DV} - y_{n,DV}) < 0 \end{cases} (8)

Here, λ_AP, λ_LR, and λ_DV are hyperparameters that weigh the positional relationship features against the other pairwise features in the model. We set these parameters to 1 to give all features equal weight.
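For a fixed cell pair (i, j), the AP feature over all label pairs (m, n) can be evaluated in one vectorized step (a sketch with λ_AP = 1; xAP and yAP are the AP coordinates of image cells and atlas labels; the LR and DV features are analogous):

    sgn = sign(xAP(i) - xAP(j));                  % AP order of cells i and j in the image
    fAP = double(sgn * (yAP(:) - yAP(:)') > 0);   % num_labels x num_labels; 1 where consistent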

S1.2.3 Pairwise potentials – proximity relationships

While manually annotating images by comparing neuron positions to the atlas, researchers often use proximity relationships among neurons: if neuron i is anatomically far from neuron j, then the identities assigned to these neurons from the atlas should not belong to neighboring or nearby neurons. To encode this intuition in the model, we include a proximity feature similar to the Gromov-Wasserstein discrepancy used in shape matching (Solomon et al., 2016; Mémoli, 2011)

f_{proximity}(m,n;i,j,x) = -\lambda_{proximity} \, \lVert d(x_i,x_j) - d(y_m,y_n) \rVert^2 (9)

Here, d(x_i, x_j) is a distance measure between neurons i and j in the image stack and d(y_m, y_n) is the same distance measure between the neurons with labels m and n in the atlas. We use geodesic distances between cells; to calculate them, graphs were constructed by connecting each neuron to its six nearest neighbors. λ_proximity is a hyperparameter that weighs the proximity feature against the other features in the model; we set λ_proximity to 1.

We used geodesic rather than Euclidean distances because geodesic distances are invariant to the spatial scale of the image and atlas. This is critical since the spatial scale of the neuron distribution in the atlas is very different from (much smaller than) that in the images. The spatial scale of the cell distribution may also vary across images depending on the size of the worm; geodesic relationships among neurons, however, should be preserved.
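A sketch of the geodesic-distance computation (assuming knnsearch from the Statistics Toolbox; pos is the N x 3 matrix of detected cell coordinates):

    [idx, dst] = knnsearch(pos, pos, 'K', 7);              % each cell plus its six nearest neighbors
    src = repelem((1:size(pos,1))', 6);
    G = graph(src, reshape(idx(:,2:7)', [], 1), reshape(dst(:,2:7)', [], 1));
    G = simplify(G);                                       % drop duplicated symmetric edges
    Dgeo = distances(G);                                   % all-pairs geodesic distances d(x_i, x_j)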

S1.2.4 Pairwise potentials – angular positional relationships

Relative positional relationship features described in S1.2.2 encode positional information along each axis independently; that is, each feature contains information about the positional relationship along one axis only. For example, f_AP encodes whether neuron i is anterior or posterior to neuron j and how labels should be assigned accordingly. A feature that simultaneously accounts for positional relationships along all axes may additionally help in determining the identities of neurons. Such a feature could be formed by multiplying the AP, LR, and DV positional relationship features; however, the product would still contain only binary information about whether neuron i is anterior, dorsal, and to the right of neuron j, and nothing about fine-scale directional relationships. We therefore formulated an angular relationship feature. Let p'_i and p'_j be the 3D position vectors of neurons i and j in the image stack, and let p''_m and p''_n be the 3D position vectors of the neurons with labels m and n in the atlas. Then the feature is defined as

f_{angle}(m,n;i,j,x) = \lambda_{angle} \, \frac{1}{2} \left( 1 + \frac{p'_i - p'_j}{\lVert p'_i - p'_j \rVert_2} \cdot \frac{p''_m - p''_n}{\lVert p''_m - p''_n \rVert_2} \right) (10)

Thus, if the vector p'_i − p'_j aligns perfectly with the vector p''_m − p''_n, then f_angle = 1, and f_angle = 0 if the vectors point in opposite directions. This feature encodes the directional agreement of the labels m and n assigned to neurons i and j. Here λ_angle is a hyperparameter that weighs the angular relationship feature against the other features in the model; we set λ_angle to 1.
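For one cell pair and one label pair, this feature is a rescaled cosine similarity (a sketch with λ_angle = 1; pos and atlasPos are assumed N x 3 coordinate matrices, and i, j, m, n are loop indices):

    u = pos(i,:) - pos(j,:);           u = u / norm(u);    % direction in the image
    w = atlasPos(m,:) - atlasPos(n,:); w = w / norm(w);    % direction in the atlas
    fAngle = (1 + dot(u, w)) / 2;                          % 1 if aligned, 0 if opposite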

S1.3 Defining AP, LR, and DV axes

To compare positional relationships among neurons in image and atlas, it is necessary to define the anterior-posterior (AP), left-right (LR), and dorsal-ventral (DV) axes in the image as well as in the atlas. The 3D coordinates of neurons along these axes are then used to define the features described above. We use two methods to define these axes. In method 1, we use principal component analysis (PCA). Let p = [p_1, ..., p_N] ∈ R^{3×N} be the centered (zero-mean) coordinates of the N neurons detected in the image stack or atlas; the principal components are the eigenvectors of the matrix pp^T. Since the spatial spread of neurons, in both image and atlas, is largest along the AP axis, the first principal component (the eigenvector with the largest eigenvalue) always corresponds to the AP axis. The second and third PCs can be assigned to the LR and DV axes depending on the orientation of the worm in the image, as described below; due to rotation of the worm about the AP axis, the second and third eigenvectors may not always correspond to the LR and DV axes. Thus, we designed two methods for these different scenarios –

  1. Worm lying on its LR side – In this case, the LR axis is assigned to the third eigenvector. This is because the z-sampling of image stacks is much coarser than the x-y sampling, so the spread of neurons along the LR axis is smallest. Appendix 1—figure 1 shows the axes obtained using PCA.

  2. Worm rotated about the AP axis – In this case we developed an alternative method. First, we define the LR axis. To do so, we use a left-right pair of neurons that is easily detectable in the image stack: we used RMEL-RMER for the landmark strain (Figure 4) and RMDVL-RMDVR for the NeuroPAL strain (Figure 6E,F). Using the coordinates of this neuron pair, an LR vector \vec{lr} was defined as \frac{p_r - p_l}{\lVert p_r - p_l \rVert_2}. Next, the AP vector \vec{ap} was determined by solving the constrained optimization problem
    \vec{ap} = \arg\max_{\vec{a}} \ (\vec{a} \cdot \vec{v}_1) \quad \text{s.t.} \quad \vec{a} \cdot \vec{lr} = 0 \ \text{and} \ \lVert \vec{a} \rVert_2 = 1 (11)

That is, \vec{ap} is the unit vector orthogonal to the LR vector that is closest in direction to the first principal component \vec{v}_1. Next, the \vec{dv} vector is obtained by defining a vector orthogonal to both the \vec{ap} and \vec{lr} vectors.

Finally, we check the consistency of the \vec{ap}, \vec{lr}, and \vec{dv} vectors: they should point to the anterior, right, and ventral sides of the worm, respectively, and should satisfy the cross-product rule to constitute a valid coordinate system. This is necessary because PCA axes are determined only up to a factor of −1; that is, the coordinate system specified by the principal components (PC1, PC2, PC3) is the same as the coordinate system specified by (−PC1, −PC2, −PC3). Thus, the framework takes a user input while defining the axes: users can simply click on a neuron in the anterior portion and one in the posterior portion of the worm image, when asked to do so, to specify the PC1 direction.
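A sketch of method 2 follows; the closed-form solution of Equation 11 is the normalized projection of PC1 onto the plane orthogonal to the LR vector (pos is an N x 3 matrix of centered coordinates, pl and pr are the coordinates of the left/right landmark pair; pca requires the Statistics Toolbox):

    coeff = pca(pos);                     % columns are the principal axes
    v1 = coeff(:, 1);                     % PC1, ~AP direction up to sign
    lr = (pr - pl)'; lr = lr / norm(lr);  % left-to-right unit vector
    ap = v1 - (v1' * lr) * lr;            % remove the LR component (solves Eq. 11)
    ap = ap / norm(ap);
    dv = cross(ap, lr);                   % completes the coordinate system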

Appendix 1—figure 1. Examples of AP (blue), LR (green), and DV (black) axes generated automatically in a whole-brain image stack. Red dots correspond to the segmented nuclei; shown are the 3D view (a) and the XY (b), YZ (c), and XZ (d) views of the image stack.

S1.4 Inferring neuron identities

To infer the most probable neuron identities, the energy function in (Equation 4) must be maximized. Exact inference techniques for maximizing energy functions over arbitrary graph structures, such as the fully connected graph structure of our model, are not available (Kohli et al., 2009). We therefore use a commonly used approximate inference method, Loopy Belief Propagation (LBP) (Murphy et al., 1999; Ikeda et al., 2004), which has been applied successfully in the past for several applications, to infer the optimal labeling, that is the maximum of the joint probability distribution p(y|x), as well as the marginal probabilities of the labels assigned to each node. We implement our model in MATLAB using an open-source package for undirected graphical models (Schmidt, 2007).
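For orientation, decoding with Schmidt's UGM package follows the general pattern below (a sketch that assumes the node and edge potentials have already been assembled from the features above; the exact argument conventions should be checked against the UGM documentation):

    adj = ones(nCells) - eye(nCells);                % fully connected graph over detected cells
    edgeStruct = UGM_makeEdgeStruct(adj, nLabels);   % edge list and state counts
    % nodePot: nCells x nLabels; edgePot: nLabels x nLabels x nEdges
    labels  = UGM_Decode_LBP(nodePot, edgePot, edgeStruct);   % approximate MAP labeling
    nodeBel = UGM_Infer_LBP(nodePot, edgePot, edgeStruct);    % per-cell marginals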

S1.5 Resolving duplicate assignments with label consistency score

In general, we define the pairwise feature functions in the model such that duplicate assignment of any label m to two cells i and j in the image is penalized, as follows

f_{angle}(m,m;i,j,x) = -\infty (12)

Even with this measure, we still see some duplicate assignments when the LBP optimization converges. To resolve them, we first mark all cells that were not assigned duplicate labels as assigned. Next, we calculate a label consistency score for each cell that was assigned a duplicate label (examples shown in Figure 1—figure supplement 1C). This score measures how consistent the cell's current label is in terms of preserving its positional relationships with the cells that were not assigned duplicate labels (i.e. the cells marked as assigned). Among all cells sharing the same duplicate label, the cell with the highest consistency score keeps the label and is marked as assigned; the remaining cells are marked as unassigned. After resolving all such duplicate assignments for all labels, the optimization is run again only for the unassigned cells, keeping the identities of the other cells fixed. We calculate the label consistency score for each cell as follows

\mathrm{bin.\,pos.\,rel.}(i) = \sum_{j \in V'} f_{AP}(m,n;i,j,x) + f_{LR}(m,n;i,j,x) + f_{DV}(m,n;i,j,x) (13)
\mathrm{ang.\,rel.}(i) = \sum_{j \in V'} f_{angle}(m,n;i,j,x) (14)
\mathrm{proximity\,rel.}(i) = \sum_{j \in V'} f_{proximity}(m,n;i,j,x) (15)
\mathrm{consistency\ score}(i) = \mathrm{bin.\,pos.\,rel.}(i) + \mathrm{ang.\,rel.}(i) + \mathrm{proximity\,rel.}(i) (16)

Here, i is a cell that was assigned a duplicate label and V' denotes the set of all other cells that were not assigned duplicate labels. Intuitively, correctly predicted cells should have a higher consistency score: correctly predicted cells preserve their relationships with other correctly predicted cells while not preserving relationships with incorrectly predicted cells, whereas incorrectly predicted cells preserve neither, giving them a lower score. This was observed in simulations as well (Figure 1—figure supplement 1D). The label consistency score therefore also serves as a good criterion for sorting the candidate list of labels predicted by the framework.

The label consistency score is a combination of the features that define the probability distribution p(y|x) in our model. Another way to view the score is that it approximately maximizes the pseudo-log-likelihood (Sutton and McCallum, 2007; Hyvärinen, 2006) of the labels assigned to all cells with duplicate labels, conditioned on the labels of the cells without duplicate assignments, that is \sum_{i \in dup} \log p(y_i \mid y_{V'}, x), where dup denotes the set of all cells that were assigned duplicate labels.

S1.6 Simulating missing cells and generating candidate name list for each cell

The 3D atlas of C. elegans neurons we used is freely available (Szigeti et al., 2014). There are 195 cells in the head ganglion in this atlas; that is, the label list from which identities are assigned to cells in the data has 195 elements. However, we empirically detect only ~120–130 neurons in whole-brain image stacks; the remaining neurons go undetected due to either no or low expression of the fluorophore in these cells or false negatives of the automated detection algorithm. Approximately similar numbers of neurons were detected by other labs as well (Kato et al., 2015). Further, which cells are undetected is not known a priori. Thus, to account for missing cells while annotating identities with the model, we define a hidden variable h ∈ {0,1}^N that specifies the cells missing in the image; that is, h_k = 1 if the cell with label k in the atlas is missing from the image. Here N is the number of labels (cells) in the atlas. With the hidden variable, we model the joint probability distribution P(y, h | x) as

P(y,h \mid x) = \frac{1}{Z} \prod_{i \in V} \Phi_i(y_i;i,x) \, e^{\sum_{ij} \Phi_{ij}(y_i,y_j;i,j,x)} \prod_{i \in V} \Phi_i(y_i,h) \, \Phi(h,x) (17)

Here Z = \sum_{y \in \mathcal{Y}, h} \prod_{i \in V} \Phi_i(y_i;i,x) \, e^{\sum_{ij} \Phi_{ij}(y_i,y_j;i,j,x)} \prod_{i \in V} \Phi_i(y_i,h) \, \Phi(h,x). The unary potential functions Φ_i and pairwise potential functions Φ_ij are the same as described in S1 and S1.2. The potential function Φ_i(m,h) captures the dependency between the label m assigned to cell i and the hidden variable h. We set this potential as

\Phi_i(m,h) = \begin{cases} 0 & \text{if } h_m = 1 \\ 1 & \text{if } h_m = 0 \end{cases} (18)

Thus, the potential specifies that if the m-th label in the atlas is missing from the image, that label cannot be assigned to any cell in the image; if it is not missing, every cell has equal potential to take that label, to be updated based on the positional relationship potentials. Further, the potential function Φ(h,x) captures the dependence between the observed image x and the hidden variable h. Defining this potential function is not trivial for whole-brain images, since predicting which cells are missing based only on the observed 3D whole-brain image stack is difficult. In specific cases, however, confidence values β_k (0 ≤ β_k ≤ 1) for each label being missing in the image may be available. Such confidence values may be specified from prior knowledge; for example, based on the expression pattern of the fluorophore labeling cells in the strain, cells with low detection probability may be given a higher β_k. Using these confidence values, we define Φ(h,x) as

\Phi(h,x) = \prod_{k=1}^{N} \beta_k^{h_k} (1 - \beta_k)^{1 - h_k} (19)

Our goal is to calculate P(y | x), which can be obtained by marginalizing (Equation 17) over the hidden states h. However, since the space of h is huge, marginalizing, as well as calculating the normalization constant Z, is not tractable. For specific cases such as the one described below, however, the calculation can be simplified.

In the absence of any prior information about which labels are missing in images, as in the whole-brain imaging case, we assign the same confidence value to every label, β_k = β. Thus, for a fixed number P of missing cells in the image, Φ(h,x) = β^P (1−β)^{N−P}. A consequence of defining the potential function this way is that, for a given number of missing cells, each combination of missing labels in the atlas is equally probable as long as the labels of missing cells are not assigned to any cell in the image:

P(h \mid y, x) = \frac{P(y,h \mid x)}{\sum_h P(y,h \mid x)} (20)
= \frac{\prod_{i \in V} \Phi_i(y_i;i,x) \, e^{\sum_{ij} \Phi_{ij}(y_i,y_j;i,j,x)} \prod_{i \in V} \Phi_i(y_i,h) \, \Phi(h,x)}{\sum_h \prod_{i \in V} \Phi_i(y_i;i,x) \, e^{\sum_{ij} \Phi_{ij}(y_i,y_j;i,j,x)} \prod_{i \in V} \Phi_i(y_i,h) \, \Phi(h,x)} (21)
= \frac{\prod_{i \in V} \Phi_i(y_i,h) \, \Phi(h,x)}{\sum_h \prod_{i \in V} \Phi_i(y_i,h) \, \Phi(h,x)} = \frac{1}{|h|} (22)

Here, the sums run over h ∈ {0,1}^N such that h_k = 0 for every label k assigned to cells in y and \sum_k h_k = P, and |h| is the number of such valid configurations. Thus, we can sample h uniformly at random and keep it fixed while predicting P(y | x). We do so as

\arg\max_{y \in \mathcal{Y}} P(y \mid x) = \bigcup_h \arg\max_{y \in \mathcal{Y}} P(y, h \mid x) (23)

Therefore, we randomly select P cells in the atlas that are considered missing from the image data; following the discussion above, these P cells are selected uniformly across the different regions of the head ganglion. The labels of these P cells are removed from the atlas, that is from the list of possible labels that can be assigned to neurons, and cell identities are predicted using the remaining atlas. This process is repeated ~1000 times to sample multiple possible combinations of h. Finally, a candidate list of names is generated using (Equation 23), that is by compiling the optimal label predicted for each cell in each run (maximizing P(y,h|x) in each run) and choosing the most frequent labels for each cell across all runs, as sketched below.
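A sketch of this sampling loop (crf_decode is a hypothetical stand-in for one run of the optimization in S1.4 on the reduced atlas; the vote tally is likewise illustrative):

    nRuns = 1000; votes = zeros(nCells, nLabels);
    for r = 1:nRuns
        keep = sort(randperm(nLabels, nLabels - P));   % atlas with P labels removed
        pred = crf_decode(pos, atlas(keep));           % hypothetical solver; indices into keep
        lbl  = keep(pred);                             % predicted atlas label per cell
        ind  = sub2ind(size(votes), (1:nCells)', lbl(:));
        votes(ind) = votes(ind) + 1;                   % tally the optimal label of each cell
    end
    [~, order] = sort(votes, 2, 'descend');            % top-frequent labels form the candidate list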

S1.7 Building data-driven consensus atlas

Here we describe the procedure for building a data-driven consensus atlas in our framework and the intuition behind it. We also describe why this is computationally more efficient than building data-driven atlases for registration methods.

First, we describe how the features in the model are defined using a static atlas such as the OpenWorm atlas, and then extend the formulation to building and using a data-driven atlas. Positional relationships among cells based on the OpenWorm atlas are stored as matrices of size N×N, where N is the number of cells in the atlas. Each entry of a matrix records the positional relationship between a pair of cells observed in the atlas. For example, in the AP positional relationship matrix, if a column cell such as RMEL is anterior to a row cell such as AIZR in the atlas, the corresponding entry is 1, and otherwise 0. Here, 1 implies that, according to the prior information available from OpenWorm, cell RMEL is observed to be anterior to cell AIZR with 100% probability. Let F_AP ∈ R^{N×N} be the AP positional relationship matrix and m, n be labels in the atlas. Then,

F_{AP}(m,n) = \begin{cases} 1, & n > m \ \text{along AP} \\ 0, & n < m \ \text{along AP} \end{cases} (24)

Note that F_AP(n,m) = 1 − F_AP(m,n). Using the matrix F_AP, the AP positional relationship feature function f_AP(m,n;i,j,x) for cells i and j (as described in S1.2.2) can be defined as

f_{AP}(m,n;i,j,x) = \begin{cases} F_{AP}(m,n), & j > i \ \text{along AP} \\ F_{AP}(n,m), & j < i \ \text{along AP} \end{cases} (25)

Here, f_AP(m,n;i,j,x) denotes the AP positional relationship feature for assigning the atlas labels m and n to cells i and j in the image. Note that Equation 25 is consistent with Equation 6, as shown below by expanding the F_AP(m,n) and F_AP(n,m) terms using Equation 24 in Equation 25.

f_{AP}(m,n;i,j,x) = \begin{cases} 1, & n > m \ \text{and} \ j > i \ \text{along AP} \\ 0, & n < m \ \text{and} \ j > i \ \text{along AP} \\ 1, & m > n \ \text{and} \ j < i \ \text{along AP} \\ 0, & m < n \ \text{and} \ j < i \ \text{along AP} \end{cases} (26)

Thus, the AP positional relationship matrix F_AP stores the prior knowledge available on anterior-posterior relationships among cells. Since F_AP is built using only a single data source, the OpenWorm atlas, all elements of the matrix are either 1 or 0. This implies that for every pair of cells m and n in the atlas, the F_AP matrix asserts that cell n is anterior to cell m with either 100% probability or 0% probability. Also, since F_AP consists of only 1’s and 0’s, the AP positional relationship feature f_AP(m,n;i,j,x) also takes only the values 1 and 0, and is thus independent of the specific labels m and n assigned to cells i and j.

In contrast to using a single data source, if additional information is available in the form of annotated experimental datasets, the prior knowledge on anterior-posterior relationships can be updated. For example, cell n may be observed anterior to cell m with 80% probability in annotated datasets (e.g. in 8 out of 10 experimental datasets) and posterior to cell m with 20% probability. These empirically observed AP positional relationships among cells are recorded in the matrix F_AP as

F_{AP,\,data\text{-}driven}(m,n) = \begin{cases} w_{mn}, & n > m \ \text{along AP} \\ 1 - w_{mn}, & n < m \ \text{along AP} \end{cases} (27)

Here, w_mn (0 ≤ w_mn ≤ 1) is the fraction of annotated datasets in which label m is annotated anterior to label n. Thus, instead of specifying a hard constraint based on a single-source static atlas (as in Equation 24), w_mn specifies a soft constraint based on the positional relationships observed in the annotated experimental datasets. Formally, w_mn is defined as

w_{mn} = \frac{1}{D} \sum_{d=1}^{D} I(m,n;x,d) (28)
I(m,n;x,d) = \begin{cases} 1, & x_n^d > x_m^d \\ 0, & x_n^d < x_m^d \end{cases} (29)

Here, d ∈ {d_i}_1^D denotes an annotated dataset, D is the total number of annotated datasets used to build the atlas, and x_m^d and x_n^d are the coordinates along the AP axis of the cells annotated with labels m and n in dataset d. Equation 27 can be seen as a generalization of Equation 24. Further, the AP positional relationship feature function f_AP(m,n;i,j,x) can be updated based on the data-driven atlas using Equation 25. The above discussion extends similarly to the LR and DV relationships, the angular relationship, and the proximity relationship, with a sketch of the averaging given below.
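A sketch of this averaging for the AP matrix (apPos{d} is a hypothetical N x 1 vector giving the AP coordinate of each atlas label in annotated dataset d, with NaN where the cell was not found):

    N = numel(atlasLabels); W = zeros(N); Cnt = zeros(N);
    for d = 1:D
        xd = apPos{d}(:);
        seen = ~isnan(xd);
        pairOK = seen & seen';                     % pairs annotated in this dataset
        W   = W + double(xd' > xd) .* pairOK;      % I(m,n; x, d) from Equation 29
        Cnt = Cnt + pairOK;
    end
    FAPdd = W ./ max(Cnt, 1);                      % w_mn averaged over datasets (Equation 28)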

Similarly, for the angular relationship feature, instead of using the fixed vector between cells m and n provided by the static atlas (refer to Equation 10), we use an average vector obtained from the annotated experimental datasets.

\overrightarrow{pq}_{data\text{-}driven} = \frac{1}{D} \sum_{d=1}^{D} \frac{p_{m,d} - p_{n,d}}{\lVert p_{m,d} - p_{n,d} \rVert_2} (30)

Here, p_{m,d} and p_{n,d} are the position coordinates of the cells with labels m and n in dataset d. Similarly, for the proximity relationship feature (refer to Equation 9), instead of using the fixed distance between the cells with labels m and n in the atlas, we use an average distance obtained from the annotated experimental datasets.

d(y_m, y_n)_{data\text{-}driven} = \frac{1}{D} \sum_{d=1}^{D} d(y_{m,d}, y_{n,d}) (31)

Here, d(y_{m,d}, y_{n,d}) is the distance between the cells with labels m and n in dataset d.

The key difference between registration-based methods and our framework in building data-driven atlases lies in the underlying methodology used to annotate cells. Registration methods annotate cells by maximizing extrinsic similarity between images and atlas, which requires pre-aligning the spaces in which the image and the atlas exist. Thus, building a data-driven atlas for registration-based methods also requires pre-aligning all annotated images to the same space. This is typically done by simultaneously registering, or block-wise registering, all annotated images, which requires solving several constrained regression problems. In contrast, our framework annotates cells by maximizing intrinsic similarity between images and atlas, which is independent of any pre-alignment of images to a common space. Thus, in our framework, positional relationship features can be calculated for each annotated dataset in its own space, and the features are subsequently aggregated by simple averaging operations, which is computationally efficient.

S1.8 Discussion on previous methods using registration to annotate cells

Qu et al., 2011 and Long et al., 2009 annotate cells in C. elegans images by spatially deforming an atlas with known cell identities and registering it to the image data. A similar method was proposed by Aerni et al., 2013, in which several other features, such as cell shape, cell size, and fluorophore expression level, were used along with the location feature. In Long et al., 2008, C. elegans images were annotated by comparing each cell's location in the image to cell locations in multiple template images to generate initial matches; the matches were then pruned by checking relative position consistency, such as anterior-posterior relationships among cells. Thus, although pairwise positions were used, they were not systematically optimized to predict cell labels and served only to enforce consistency in post-registration matching. In Scholz, 2018, cell identities were determined by registering image data to the 2D atlas (Altun and Hall, 2009); registering 3D data to a 2D atlas makes it difficult to disambiguate identities along the LR axis, since the left-right neurons are not exactly symmetrical along that axis. Registration-based cell annotation was also proposed in Toyoshima, 2019; in this case, the authors generated their own atlases by registering several partial atlases, each with a subset of cells labeled. Registration-based cell identification was likewise proposed in Yemini et al., 2021, where color information was integrated as a feature for registration along with the spatial locations of cells.

S1.9 Registration methods do not consider intrinsic similarity features such as relative positional relationships

One of the major reasons for the higher cell annotation accuracy achieved by our framework (Figure 2C,D,E, Figure 3B, Figure 4B, Figure 6B, Figure 2—figure supplement 5) is that our model systematically includes and optimizes pairwise positional relationships between cells to determine cell labels. In comparison, registration-based methods that predict cell identities by registering the image stack to the atlas maximize the extrinsic similarity between images and atlas (Bronstein et al., 2007). Due to inherent biological variability, the spatial distribution of cells in worms differs significantly from the positions of cells in the atlas (Figure 2—figure supplement 6A,B), while the relative positional relationships are still preserved (Figure 2—figure supplement 6C,D). Below, we provide a mathematical argument for why registration-based methods do not include pairwise positional relationships, and we show how registration information can be included in our model.

Following the description in S2.1, the objective function to be optimized in registration methods (Myronenko and Song, 2009; Panaganti and Aravind, 2015; Ge et al., 2014; Ma, 2015; Chui and Rangarajan, 2003) is similar to

E = \sum_{i,k} c_{ik} \, \lVert x_i - \mathcal{T}(y_k) \rVert^2_{\Sigma_k^{-1}} + R(\mathcal{T}) (32)

Here, x_i ∈ R^3 is the coordinate of the i-th cell in the image, 𝒯(y_k) ∈ R^3 is the registered coordinate of the cell with label k in the atlas, c_ik is the posterior probability that the i-th cell in the image matches the k-th cell in the atlas, and R(𝒯) is a regularization term on the transformation applied to the cells, ensuring smooth transformations for non-rigid registration. If only the first term of the energy function (Equation 4) is kept in our model, the energy function can be written as

E(y,x) = \sum_{i=1}^{|V|} \sum_a \lambda_a f_a(k;i,x) (33)

This model considers only unary (cell-specific) features for predicting cell names. Further, if we consider a single unary feature function based on the spatial locations of cells in the atlas, defined as

f_a(k;i,x) = \lVert x_i - \mathcal{T}(y_k) \rVert^2_{\Sigma_k^{-1}} (34)

and substitute it back into (Equation 33), then the energy function is equivalent to the objective function of registration algorithms.

E(y,x) = \sum_{i=1}^{|V|} \sum_a \lambda_a \lVert x_i - \mathcal{T}(y_k) \rVert^2_{\Sigma_k^{-1}} (35)

Thus, the objective function of registration algorithms can be specified using only unary features in the CRF_ID framework. This shows that registration methods do not account for relative positional relationship features. It also highlights that the CRF_ID framework can easily integrate registration information, as a unary feature term, with relative positional relationship features for predicting cell identities.

S2: Description of different methods that were compared

Below, we provide a brief description of the different methods that we compared. Each model uses a different, specific combination of information for predicting cell labels, which helps dissect what information is most useful for predicting the biological names of cells. Some of these are methods previously proposed for cell annotation in C. elegans. Note, for fair comparison, the static 3D OpenWorm (Szigeti et al., 2014) atlas or the 2D atlas available on wormatlas.org was used as reference for all methods: registration methods used the absolute positions of cells from these atlases, while the CRF_ID framework used positional relationships derived from the same atlases.

S2.1 Registration

In this case, the neurons detected in image stacks, or the neurons in synthetic data, were registered to the atlas using Coherent Point Drift (CPD), a widely used non-rigid registration algorithm (Myronenko and Song, 2009). For synthetic data, the process was repeated ~200 times with new synthetic data generated each time. Here, we provide a brief description of the registration algorithm. Two point-sets are registered by iteratively applying a smooth deformation to one of them. Let the two point-sets be X = [x_1, x_2, ..., x_m] ∈ R^{3×m} and Y = [y_1, y_2, ..., y_n] ∈ R^{3×n}, consisting of m and n points, respectively, where x_i and y_i are point coordinates. In our case, the 3D coordinates of the nuclei detected in the image form point-set X and the cell positions in the atlas form point-set Y. In CPD, each point in point-set X is considered to be a random sample drawn from a mixture of Gaussian distributions whose component centroids are specified by point-set Y. Thus, the probability of observing a point x_i is given by

P(x_i) = \frac{\omega}{m} + \frac{1 - \omega}{n} \sum_{k=1}^{n} \mathcal{N}(x_i; y_k, \Sigma_k) (36)

Here, ω is the outlier ratio. A smooth transformation 𝒯(y_i, W): R^3 → R^3, given by 𝒯(y_i, W) = y_i + \sum_{k=1}^{n} G(i,k) W_k, is applied to the points of point-set Y so as to maximize the likelihood of point-set X. Here G ∈ R^{n×n} is the Gaussian kernel matrix defined as G(i,k) = \exp\left(-\frac{\lVert y_i - y_k \rVert^2}{2\beta^2}\right), and W_k is the k-th column of the parameter matrix W ∈ R^{3×n}. Thus, the transformation 𝒯(y_i, W) lies in a reproducing kernel Hilbert space (RKHS) with a Gaussian kernel; parameterizing the transformation this way ensures its smoothness.

The parameters of the transformation matrix W are estimated by maximizing the joint likelihood of the data X and the n latent variables corresponding to the mixture components, using the Expectation-Maximization algorithm. The E-step calculates the posterior probability that point x_i was generated by component k, keeping the current parameters W and Σ_k fixed.

P(k \mid x_i) = \alpha_{ik} = \frac{\frac{1 - \omega}{n} \exp\left(-\frac{1}{2} \lVert x_i - \mathcal{T}(y_k, W) \rVert^2_{\Sigma_k^{-1}}\right)}{\frac{\omega}{m} + \frac{1 - \omega}{n} \sum_{k'=1}^{n} \exp\left(-\frac{1}{2} \lVert x_i - \mathcal{T}(y_{k'}, W) \rVert^2_{\Sigma_{k'}^{-1}}\right)} (37)

Here, \lVert x_i - y_k \rVert^2_{\Sigma_k^{-1}} = (x_i - y_k)^T \Sigma_k^{-1} (x_i - y_k). In the M-step, the component parameters are determined by maximizing the expected value of the complete-data log-likelihood (the objective function). An additional regularization term is added to the objective to minimize the norm of 𝒯 in the RKHS, controlling the complexity (smoothness) of the transformation.

\mathcal{L} = -\sum_{i,k}^{m,n} \alpha_{ik} \lVert x_i - \mathcal{T}(y_k, W) \rVert^2_{\Sigma_k^{-1}} - \frac{\lambda}{2} \mathrm{tr}(W G W^T) (38)

Prior to registration, cells in the image stack and in the atlas were aligned by defining the position coordinates of cells in the AP-LR-DV coordinate system (as described in S1.3) to improve registration accuracy. Cell identities were predicted based on the correspondences generated by the registration algorithm. Note that this case does not consider co-dependent features among cells (as discussed in S1.9), since registration algorithms utilize only the absolute positions of cells.

Additionally, we modified the registration method described above to account for missing cells in images, in a manner similar to the CRF_ID framework, for fair comparison with the CRF_ID annotation method. Identities were predicted iteratively, with uniformly and randomly selected cells across the head ganglion considered missing; the labels of those cells were removed from the atlas list, and cells in images were registered to the remaining atlas cells. This process was repeated ~1000 times to sample multiple combinations of missing cells, and identities were predicted by pooling results across iterations. We saw an improvement in prediction accuracy with this modification compared to the base method that does not account for missing cells; hence all comparisons were performed with, and all results shown are for, the modified method.

For registration-based matching we used a popular registration method (Myronenko and Song, 2010). The parameter settings of the method are listed below.

Parameter Value Description
 opt.method ‘nonrigid’ Non-rigid or rigid registration
 opt.beta 1 The width of gaussian kernel (smoothness)
 opt.lambda 3 Regularization weight
 opt.viz 0 Don’t show any iteration
 opt.outliers 0.3 Noise weight
 opt.fgt 0 Do not use FGT (fast gaussian transform)
 opt.normalize 1 Normalize to unit variance and zero mean before registering
 opt.corresp 1 Compute correspondence vector at the end of registration
 opt.max_it 100 Max number of iterations
 opt.tol 1e-10 tolerance

S2.2 Relative position (Rel. Pos)

In this case, the full CRF-based annotation framework described in Appendix 1–Extended methods S1 was used, considering only co-dependent features (i.e. the pairwise relative positional features described in S1.2) between all cells. Optimal labels of cells were predicted using the optimization procedure described in S1.4. Note that the information used by the model here differs from that used by the Registration method, as no information about the absolute positions of cells is used. Comparing prediction accuracy across these cases thus helps verify that co-dependent features are more useful for predicting neuron identities.

S2.3 Registration + Rel position

In this case, we used both cell-specific features (i.e. unary features) and co-dependent features (i.e. pairwise features) to predict cell labels. Pairwise feature terms were the same as described in S1.2. Unary feature term was modified in this case as

f_{reg}(k;i,x) = \exp\left(-\frac{\lVert x_i - \mathcal{T}(y_k) \rVert^2}{\sigma_{reg}^2}\right) (39)

Here, x_i ∈ R^3 and y_k ∈ R^3 are the coordinates of cell i in the image stack and cell k in the atlas, respectively (in the AP-LR-DV coordinate system). Thus, cell i in the image has a higher potential of taking the label of cell k in the atlas if the distance between the registered cell i and cell k is small. The optimal transformation 𝒯(y_k) was inferred using the non-rigid registration method described in S2.1. The parameter σ_reg controls the weight between the registration term and the CRF-based matching terms: if σ_reg is small, the registered cell i strongly prefers matching the nearest cell in the atlas, so the relative position features have little influence on altering the label based on the relative-position consistency criteria; if σ_reg is large, the registered cell i has a nearly uniform preference for matching any atlas cell, giving the CRF_ID method more flexibility to pick optimal labels that enforce the consistency criteria. We set σ_reg = 1 to give the registration term and the other features equal weight. Optimal labels of cells were predicted using the optimization procedure described in S1.4.

S2.4 Color

In this case, only color information was used to identify the names of cells, without using any position information. This helps gauge the prediction accuracy attainable from color information alone; further, comparing this case with the cases that combine color information with position information (described below) helps gauge the contribution of cell position information to predicting cell names. We describe below both the baseline (naive) method for building the color atlas and the construction of an ensemble of color atlases in which the color distributions of the training images are aligned to the test image.

Baseline color atlas

In this case, the color atlas was built by aggregating the raw RGB values of cells across the images used to build the atlas, leaving the test image out. The feature function was defined as a kernelized Mahalanobis distance between the colors of cells in the image stack and the colors of cells in the atlas. Note that this is a unary feature (as it is cell-specific):

f_{col}(k;i,x) = \exp\left(-(C_i - C_k)^T \Sigma_k^{-1} (C_i - C_k)\right) (40)

Here, C_i ∈ R^3 is the mean RGB value of the i-th cell in the test image, C_k ∈ R^3 is the mean RGB value of the cell with label k in the atlas, and Σ_k is the covariance matrix of the RGB values of the cell with label k in the atlas. Let r_k = [r_1, ..., r_N] ∈ R^{3×N} be the RGB values of the cell with label k observed in the N training images used to build the color atlas; then C_k = (1/N)(r_k 1) and Σ_k = (r_k − C_k 1^T)(r_k − C_k 1^T)^T, where 1 ∈ R^{N×1} is a vector of ones. Thus, this feature specifies that cell i in the image has a high potential of taking label k in the atlas if the Mahalanobis distance between the colors of cell i and cell k is small.

Ensemble atlas with aligned color distributions

We found that when the baseline color data-driven atlas was combined with the positional relationship data-driven atlas to annotate cells, performance increased only marginally over using positional relationship features alone (Figure 6—figure supplement 1B), implying that the color information did not contribute to the annotation task. We reasoned that this is because the distributions of color values vary considerably across images, so the color distributions in the training data did not reflect the color distribution in the test data. This may be due to inherent differences in fluorophore expression across animals, differences in imaging settings across sessions (exposure time, laser power), etc. The problem of feature distributions differing between test and training data is addressed by domain-transfer techniques in machine learning; here we developed a simple two-step domain-transfer method.

First, we align the color distributions of the training images used to build the atlas to the color distribution of the test image, using several methods (a sketch of methods 2 and 3 follows the list) –

  1. Normalization of each color channel in all images so that pixel values lie between 0 and 1. Let I_R(x) denote the raw intensity value of the red color channel (mNeptune channel) at voxel x in the image. Then the normalized values are calculated as
    I_R^{norm}(x) = \frac{I_R(x) - \min(I_R)}{\max(I_R) - \min(I_R)} (41)

    Similar normalization is performed for CyOFP and BFP channels.

  2. Histogram matching of each color channel in a training image to the corresponding channel in the test image. Histogram matching transforms the color distribution of an image so that its color histogram matches that of a reference image. This was implemented using the ‘imhistmatchn’ function in MATLAB.

  3. Color-invariant transformation of the training-set images and the test image, followed by histogram matching of the training color-invariant images to the test color-invariant image (see the sketch after this list). The color-invariant transformation transforms the color space of an image to remove its dependence on lighting geometry and illumination (Finlayson et al., 1998). It is performed as follows: let C_i = [r(x), g(x), b(x)] ∈ R^{p×3} be the matrix storing the RGB values of all voxels in the i-th image of the training set, where p is the number of voxels. The color-invariant image is obtained by sequentially and repeatedly normalizing the rows and columns of C_i for a fixed number of iterations.
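A sketch of methods 2 and 3 (assuming the training and test stacks are stored as X x Y x Z x 3 arrays; the five normalization iterations and the exact column scaling are assumptions):

    for c = 1:3                                % method 2: per-channel histogram matching
        trainImg(:,:,:,c) = imhistmatchn(trainImg(:,:,:,c), testImg(:,:,:,c));
    end
    C = reshape(double(trainImg), [], 3);      % method 3: p x 3 matrix of voxel RGB values
    for it = 1:5                               % alternate row and column normalization
        C = C ./ (sum(C, 2) + eps);            % each voxel's RGB values sum to 1
        C = C ./ (sum(C, 1) + eps) * size(C,1)/3;   % balance the three channels
    end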

Note that aligning the color distribution of a training image to the test image does not require any cell identity information; it is performed using all RGB voxel values in the image. After aligning the color distributions, the color atlas is built as in the baseline case, by aggregating the RGB values of cells across images, except that the RGB values now come from the distributions aligned by one of the methods mentioned above.

Second, an ensemble of data-driven color atlases is used for predicting cell identities; that is, multiple data-driven color atlases are used, each built with a different color-alignment technique. The feature function in this case was defined as

f_{col}(k;i,x) = \sum_l \lambda_l \exp\left(-(C_i - C_{k,l})^T \Sigma_{k,l}^{-1} (C_i - C_{k,l})\right) (42)

Here, C_i ∈ R^3 is the mean RGB value of the i-th cell in the test image, C_{k,l} ∈ R^3 is the mean RGB value of the cell with label k in the l-th atlas, Σ_{k,l} is the covariance matrix of the RGB values of the cell with label k in the l-th atlas, and λ_l is the mixing weight of each atlas. Note that each atlas is built using the RGB values from all training images.

In practice we used two atlases in the ensemble. The first atlas used method 2, that is histogram matching of the raw RGB distributions of the training images to the test image. The second atlas used method 3, that is the color-invariant transformation was applied to all images (including the test image), and the color histograms of the training images were subsequently matched to the test image. Mixing weights of 0.2 and 0.8 were selected by cross-validation.

Note that in both color-atlas cases we do not consider any co-dependent features among cells; predicting cell names is thus equivalent to minimizing the following energy function

y^* = \arg\min_{y \in \mathcal{Y}} \sum_i -f_{col}(y_i; i, x) (43)

This is equivalent to a maximum-weight bipartite graph matching problem, and we thus used the Hungarian algorithm to find the optimal solution (Kuhn, 1955).
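For example, with the unary potentials assembled into a matrix, the assignment can be solved with MATLAB's matchpairs (R2019a and later), which solves the same linear assignment problem as the Hungarian algorithm (a sketch; Fcol is the num_cells x num_labels matrix of f_col values):

    cost = -log(Fcol + eps);     % turn similarities into assignment costs
    M = matchpairs(cost, 1e6);   % rows of M are [cell, label] assignments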

S2.5 Color + registration

In this case, we used both the color information and the position information of cells to predict cell identities. The features used were f_col(k;i,x) and f_reg(k;i,x) as described in S2.4 and S2.1, respectively. Note that in this case, too, we do not use co-dependent cell features. Cell labels were predicted by minimizing the following energy function

y^* = \arg\min_{y \in \mathcal{Y}} \sum_i -f_{col}(y_i; i, x) \, f_{reg}(y_i; i, x) (44)

Here again, we used the Hungarian algorithm (Kuhn, 1955) to find the optimal solution. By comparing prediction accuracy in this case to the Color-only case (S2.4), we can gauge the contribution of cell position information to prediction accuracy. Here again, we accounted for missing cells in images by predicting identities iteratively (similar to S2.1); since this modification again improved prediction accuracy, all comparisons are shown for the modified case.

Below, we briefly show how the objective function in (Equation 44) arises naturally in registration algorithms that combine multiple features, such as color, with spatial registration (Danelljan et al., 2016). First, as defined in registration methods (Myronenko and Song, 2009; Jian and Vemuri, 2005) that use only the spatial location feature x_i of each cell, the probability of observing each cell is (as described in S2.1)

P(x_i) = \sum_{k=1}^{n} P(x_i \mid z_k) P(z_k) = \sum_{k=1}^{n} \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k^s) (45)

Here, z_k denotes a spatial mixture component with mean μ_k and covariance Σ_k^s. If color is also available, then to maximize the complete-data log-likelihood using the EM method, we need to define the joint probability of observing both the spatial location x_i and the color feature y_i of each cell, p(x_i, y_i). Next, we discuss two possible cases, following Danelljan et al., 2016, that differ in the way the joint probability p(x_i, y_i) is defined.

In the first case, if the color information is considered to be independent of the spatial information, the joint probability factorizes as

P(x_i, y_i, z_i = k, c_i = l) = P(x_i \mid z_i = k) \, P(y_i \mid c_i = l) \, P(z_i = k) \, P(c_i = l) = \mathcal{N}(x_i; \mu_k, \Sigma_k^s) \, F(y_i; \theta_l) \, \pi_k \, \pi_l (46)

Here, z_i and c_i denote the spatial and color mixture components (latent variables) from which x_i and y_i are drawn, and π_l, θ_l, and F(y_i; θ_l) denote the mixture weight, the parameters, and the density of observing y_i from the color mixture, respectively. The independence assumption is akin to assuming that the distribution of colors observed in different cells does not depend on the spatial locations of the cells. This assumption is not valid here, since in NeuroPAL each cell is color-coded, so the observed color depends on the spatial location of the cell.

In the second case, the dependence of color on the spatial location of the cell is accounted for. Further, it is modeled that, for each spatial component (cell), color is drawn from a location-specific mixture density. Thus, the joint probability factorizes as

P(x_i, y_i, z_i = k, c_i = l) = P(x_i \mid z_i = k) \, P(y_i \mid c_i = l, z_i = k) \, P(c_i = l \mid z_i = k) \, P(z_i = k) (47)

Next, we need to define P(y_i | c_i, z_i) to specify the complete-data likelihood. For NeuroPAL this is easy, since each cell is assigned a unique color (Yemini et al., 2021); that is, for each spatial component (cell), the color mixture density has only one component. Thus, P(y_i | c_i, z_i) = P(y_i | z_i) and P(c_i | z_i) = 1.

With the updated definitions, the complete-data likelihood is defined as

P(X,Y,Z,C) = \prod_i P(x_i, y_i, z_i, c_i) = \prod_i P(x_i \mid z_i) P(y_i \mid z_i) P(z_i) = \prod_i \prod_k \pi_k^{z_{ik}} \, \mathcal{N}(x_i; \mu_k, \Sigma_k^s)^{z_{ik}} \, F(y_i; \theta_k)^{z_{ik}} (48)

Now, if F(y_i; θ_k) is defined as \exp\left(-(y_i - C_k)^T (\Sigma_k^c)^{-1} (y_i - C_k)\right) (as described in S2.4) and the expected complete-data log-likelihood is maximized using the EM method, then the M-step is equivalent to maximizing

\mathcal{L} = -\sum_{i,k} \alpha_{ik} \left( \lVert x_i - \mathcal{T}(\mu_k, W) \rVert^2_{(\Sigma_k^s)^{-1}} + (y_i - C_k)^T (\Sigma_k^c)^{-1} (y_i - C_k) \right) - \frac{\lambda}{2} \mathrm{tr}(W G W^T) (49)

Thus, it can be seen that the first term in Equation 49 is equivalent to Equation 44.

S2.6 Color + Rel. position

In this case we combined the color information of cells with their relative position information. Note that in contrast to S2.5, here we use co-dependent features (that is, pairwise relative position features) in combination with color information, whereas in S2.5 the features depended only on cell-specific information (absolute positions of cells). The unary feature in this case is the same as in the Color-only method (S2.4).

$$f_{\mathrm{col}}(k; i, x) = \exp\left(-(C_i - C_k)^T \Sigma_k^{-1} (C_i - C_k)\right) \quad (50)$$

Pairwise features in this case were the same as described in S1.2. The objective function was optimized using the Loopy Belief Propagation algorithm, as described in S1.4. Comparing prediction accuracy in this case to the Color + Registration method (S2.5) verifies that the higher accuracy achieved here stems from co-dependent cell features, highlighting the importance of such features.
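As a rough illustration of the inference step just described, below is a toy max-product loopy belief propagation routine over unary and pairwise log-potentials. It is a minimal sketch for intuition only and stands in for, rather than reproduces, the S1.4 implementation; all names are ours.

```python
import numpy as np

def loopy_bp_map(logU, logP, n_iters=30):
    """Max-product loopy BP on a fully connected graph.
    logU: (N,K) log unary potentials (e.g. log of Eq. 50 values);
    logP: (N,N,K,K) log pairwise potentials, logP[i,j][a,b] scoring
    label a on cell i together with label b on cell j."""
    N, K = logU.shape
    msg = np.zeros((N, N, K))  # msg[i, j]: message from cell i to cell j
    for _ in range(n_iters):
        belief = logU + msg.sum(axis=0)  # incoming messages per cell
        new = np.zeros_like(msg)
        for i in range(N):
            for j in range(N):
                if i == j:
                    continue
                # exclude j's own previous message into i
                m = (belief[i] - msg[j, i])[:, None] + logP[i, j]
                mij = m.max(axis=0)           # maximize over labels of i
                new[i, j] = mij - mij.max()   # normalize for stability
        msg = new
    belief = logU + msg.sum(axis=0)
    return belief.argmax(axis=1)  # approximate MAP label per cell
```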

S2.7 Registration + color + Rel. position

In this case, we combined all the cell-independent and co-dependent features in one model. Thus, the unary features were $f_{\mathrm{col}}(k; i, x)$ and $f_{\mathrm{reg}}(k; i, x)$ as described in S2.4 and S2.1, and the co-dependent features were the same as those described in S1.2. Here again, the objective function was optimized using the Loopy Belief Propagation algorithm, as described in S1.4. We simulated this condition to test whether prediction accuracy could be boosted by combining the co-dependent position features with the registration algorithm. However, in most cases we saw a decrease in prediction accuracy, because competition between the registration term and the relative-position-features term in the objective function reduces accuracy.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Hang Lu, Email: hang.lu@gatech.edu.

Ronald L Calabrese, Emory University, United States.

Ronald L Calabrese, Emory University, United States.

Funding Information

This paper was supported by the following grants:

  • National Institutes of Health R21DC015652 to Hang Lu.

  • National Institutes of Health R01NS096581 to Hang Lu.

  • National Institutes of Health R01GM088333 to Hang Lu.

  • National Science Foundation 1764406 to Hang Lu.

  • National Science Foundation 1707401 to Hang Lu.

  • National Institutes of Health R01GM108962 to Hang Lu.

  • National Institutes of Health P40OD010440 to Hang Lu.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Investigation.

Investigation.

Methodology.

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing.

Additional files

Transparent reporting form

Data availability

All data generated or analysed during this study are included in the manuscript and supporting files. Source data files are provided at https://github.com/shiveshc/CRF_Cell_ID.git (copy archived at https://archive.softwareheritage.org/swh:1:rev:aeeeb3f98039f4b9100c72d63de25f73354ec526/).

References

  1. Aerni SJ, Liu X, Do CB, Gross SS, Nguyen A, Guo SD, Long F, Peng H, Kim SS, Batzoglou S. Automated cellular annotation for high-resolution images of adult Caenorhabditis elegans. Bioinformatics. 2013;29:i18–i26. doi: 10.1093/bioinformatics/btt223. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Altun ZF, Hall DH. Worm atlas. Wormatlas. 2009;1:1–14. doi: 10.3908/wormatlas.1.14. [DOI] [Google Scholar]
  3. Bakir GH. Predicting Structured Data (Neural Information Processing). MIT Press; 2007. [Google Scholar]
  4. Bao Z, Murray JI, Boyle T, Ooi SL, Sandel MJ, Waterston RH. Automated cell lineage tracing in Caenorhabditis elegans. PNAS. 2006;103:2707–2712. doi: 10.1073/pnas.0511111103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bargmann CI, Hartwieg E, Horvitz HR. Odorant-selective genes and neurons mediate olfaction in C. elegans. Cell. 1993;74:515–527. doi: 10.1016/0092-8674(93)80053-H. [DOI] [PubMed] [Google Scholar]
  6. Bentley B, Branicky R, Barnes CL, Chew YL, Yemini E, Bullmore ET, Vértes PE, Schafer WR. The multilayer connectome of Caenorhabditis elegans. PLOS Computational Biology. 2016;12:e1005283. doi: 10.1371/journal.pcbi.1005283. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Bronstein AM, Bronstein MM, Kimmel R. Rock, paper, and scissors: extrinsic vs. intrinsic similarity of non-rigid shapes. Proceedings of the IEEE International Conference on Computer Vision; 2007. [DOI] [Google Scholar]
  8. Bronstein AM, Bronstein MM, Kimmel R. Topology-Invariant similarity of nonrigid shapes. International Journal of Computer Vision. 2009;81:281–301. doi: 10.1007/s11263-008-0172-2. [DOI] [Google Scholar]
  9. Caelli T, Caetano T. Graphical models for graph matching: approximate models and optimal algorithms. Pattern Recognition Letters. 2005;26:339–346. doi: 10.1016/j.patrec.2004.10.022. [DOI] [Google Scholar]
  10. Chalasani SH, Chronis N, Tsunozaki M, Gray JM, Ramot D, Goodman MB, Bargmann CI. Dissecting a circuit for olfactory behaviour in Caenorhabditis elegans. Nature. 2007;450:63–70. doi: 10.1038/nature06292. [DOI] [PubMed] [Google Scholar]
  11. Chaudhary S. CRF_Cell_ID. swh:1:rev:aeeeb3f98039f4b9100c72d63de25f73354ec526. Software Heritage. 2021 https://archive.softwareheritage.org/swh:1:dir:f852385572fef1c1df5dd410e7a54fa07378e3af;origin=https://github.com/shiveshc/CRF_Cell_ID.git;visit=swh:1:snp:4cc3097dd2ace051a4a2404108355b9abbc551ed;anchor=swh:1:rev:aeeeb3f98039f4b9100c72d63de25f73354ec526/
  12. Chen Y, McElvain LE, Tolpygo AS, Ferrante D, Friedman B, Mitra PP, Karten HJ, Freund Y, Kleinfeld D. An active texture-based digital atlas enables automated mapping of structures and markers across brains. Nature Methods. 2019;16:341–350. doi: 10.1038/s41592-019-0328-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Cho Y, Lee SA, Chew YL, Broderick K, Schafer WR, Lu H. Multimodal Stimulation in a Microfluidic Device Facilitates Studies of Interneurons in Sensory Integration in C. elegans. Small. 2020;16:1905852. doi: 10.1002/smll.201905852. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Chu J, Oh Y, Sens A, Ataie N, Dana H, Macklin JJ, Laviv T, Welf ES, Dean KM, Zhang F, Kim BB, Tang CT, Hu M, Baird MA, Davidson MW, Kay MA, Fiolka R, Yasuda R, Kim DS, Ng HL, Lin MZ. A bright cyan-excitable orange fluorescent protein facilitates dual-emission microscopy and enhances bioluminescence imaging in vivo. Nature Biotechnology. 2016;34:760–767. doi: 10.1038/nbt.3550. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Chui H, Rangarajan A. A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding. 2003;89:114–141. doi: 10.1016/S1077-3142(03)00009-2. [DOI] [Google Scholar]
  16. Danelljan M, Meneghetti G, Khan FS, Felsberg M. A probabilistic framework for color-based point set registration. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. pp. 1818–1826. [DOI] [Google Scholar]
  17. Evangelidis GD, Kounades-Bastian D, Horaud R, Psarakis EZ. A generative model for the joint registration of multiple point sets. In: Fleet D, editor. Lecture Notes in Computer Science. Springer; 2014. pp. 109–122. [DOI] [Google Scholar]
  18. Finlayson GD, Schiele B, Crowley JL. Comprehensive colour image normalization. In: Burkhardt H, editor. Lecture Notes in Computer Science. Springer; 1998. pp. 475–490. [DOI] [Google Scholar]
  19. Ge S, Fan G, Ding M. Non-rigid point set registration with global-local topology preservation. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; 2014. pp. 245–251. [DOI] [Google Scholar]
  20. Gendrel M, Atlas EG, Hobert O. A cellular and regulatory map of the GABAergic nervous system of C. elegans. eLife. 2016;5:e17686. doi: 10.7554/eLife.17686. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Hyvärinen A. Consistency of pseudolikelihood estimation of fully visible boltzmann machines. Neural Computation. 2006;18:2283–2292. doi: 10.1162/neco.2006.18.10.2283. [DOI] [PubMed] [Google Scholar]
  22. Ikeda S, Tanaka T, Amari S. Stochastic reasoning, free energy, and information geometry. Neural Computation. 2004;16:1779–1810. doi: 10.1162/0899766041336477. [DOI] [PubMed] [Google Scholar]
  23. Jian B, Vemuri BC. A robust algorithm for point set registration using mixture of gaussians. Proceedings IEEE International Conference on Computer Vision; 2005. pp. 1246–1251. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Kappes JH, Andres B, Hamprecht FA, Schnörr C, Nowozin S, Batra D, Kim S, Kausler BX, Kröger T, Lellmann J, Komodakis N, Savchynskyy B, Rother C. A comparative study of modern inference techniques for structured discrete energy minimization problems. International Journal of Computer Vision. 2015;115:155–184. doi: 10.1007/s11263-015-0809-x. [DOI] [Google Scholar]
  25. Kato S, Kaplan HS, Schrödel T, Skora S, Lindsay TH, Yemini E, Lockery S, Zimmer M. Global brain dynamics embed the motor command sequence of Caenorhabditis elegans. Cell. 2015;163:656–669. doi: 10.1016/j.cell.2015.09.034. [DOI] [PubMed] [Google Scholar]
  26. Kim Y, Venkataraju KU, Pradhan K, Mende C, Taranda J, Turaga SC, Arganda-Carreras I, Ng L, Hawrylycz MJ, Rockland KS, Seung HS, Osten P. Mapping social behavior-induced brain activation at cellular resolution in the mouse. Cell Reports. 2015;10:292–305. doi: 10.1016/j.celrep.2014.12.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Kobak D, Brendel W, Constantinidis C, Feierstein CE, Kepecs A, Mainen ZF, Qi XL, Romo R, Uchida N, Machens CK. Demixed principal component analysis of neural population data. eLife. 2016;5:e10989. doi: 10.7554/eLife.10989. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Kohli P, Ladický L, Torr PHS. Robust higher order potentials for enforcing label consistency. International Journal of Computer Vision. 2009;82:302–324. doi: 10.1007/s11263-008-0202-0. [DOI] [Google Scholar]
  29. Komodakis N, Paragios N. Beyond pairwise energies: efficient optimization for higher-order mrfs. 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009; 2009. pp. 2985–2992. [DOI] [Google Scholar]
  30. Krähenbühl P, Koltun V. Efficient inference in fully connected CRFs with Gaussian edge potentials. Advances in Neural Information Processing Systems. Curran Associates Inc; 2011. [Google Scholar]
  31. Kuhn HW. The hungarian method for the assignment problem. Naval Research Logistics Quarterly. 1955;2:83–97. doi: 10.1002/nav.3800020109. [DOI] [Google Scholar]
  32. L'Etoile ND, Bargmann CI. Olfaction and odor discrimination are mediated by the C. elegans guanylyl cyclase ODR-1. Neuron. 2000;25:575–586. doi: 10.1016/S0896-6273(00)81061-2. [DOI] [PubMed] [Google Scholar]
  33. Lafferty J, McCallum A, Pereira FCN. Conditional random fields: probabilistic models for segmenting and labeling sequence data. ICML’ 01 Proc. Eighteenth Int. Conf. Mach. Learn; 2001. pp. 282–289. [Google Scholar]
  34. Lee H, Kim SA, Coakley S, Mugno P, Hammarlund M, Hilliard MA, Lu H. A multi-channel device for high-density target-selective stimulation and long-term monitoring of cells and subcellular features in C. elegans. Lab Chip. 2014;14:4513–4522. doi: 10.1039/C4LC00789A. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Leordeanu M, Hebert M. A spectral technique for correspondence problems using pairwise constraints. Proceedings of the IEEE International Conference on Computer Vision II; 2005. pp. 1482–1489. [DOI] [Google Scholar]
  36. Leordeanu M, Hebert M. Unsupervised learning for graph matching. IEEE Conf. Comput. Vis. Pattern Recognit; 2009. pp. 864–871. [DOI] [Google Scholar]
  37. Liu H, Qin LW, Li R, Zhang C, Al-Sheikh U, Wu ZX. Reciprocal modulation of 5-HT and octopamine regulates pumping via feedforward and feedback circuits in C. elegans. PNAS. 2019;116:7107–7112. doi: 10.1073/pnas.1819261116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Long F, Peng H, Liu X, Kim S, Myers G. Automatic Recognition of Cells (ARC) for 3D images of C. elegans. In: Vingron M, Wong L, editors. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Springer; 2008. pp. 128–139. [DOI] [Google Scholar]
  39. Long F, Peng H, Liu X, Kim SK, Myers E. A 3D digital atlas of C. elegans and its application to single-cell analyses. Nature Methods. 2009;6:667–672. doi: 10.1038/nmeth.1366. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Lyssenko NN, Hanna-Rose W, Schlegel RA. Cognate putative nuclear localization signal effects strong nuclear localization of a GFP reporter and facilitates gene expression studies in Caenorhabditis elegans. BioTechniques. 2007;43:596–600. doi: 10.2144/000112615. [DOI] [PubMed] [Google Scholar]
  41. Ma J. Robust L2E estimation of transformation for non-rigid registration. IEEE Transactions on Signal Processing; 2015. pp. 1115–1129. [DOI] [Google Scholar]
  42. McIntire SL, Reimer RJ, Schuske K, Edwards RH, Jorgensen EM. Identification and characterization of the vesicular GABA transporter. Nature. 1997;389:870–876. doi: 10.1038/39908. [DOI] [PubMed] [Google Scholar]
  43. Mémoli F. Gromov–Wasserstein Distances and the Metric Approach to Object Matching. Foundations of Computational Mathematics. 2011;11:417–487. doi: 10.1007/s10208-011-9093-5. [DOI] [Google Scholar]
  44. Murphy KP, Weiss Y, Jordan MI. Loopy belief propagation for approximate inference: an empirical study. UAI'99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence; 1999. pp. 467–475. [Google Scholar]
  45. Murray JI. Automated analysis of embryonic gene expression with cellular resolution in C. elegans. Nat Methods. 2008;5:703–709. doi: 10.1038/nmeth.1228. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Myronenko A, Song X. Point-Set registration: coherent point drift. Advances in Neural Information Processing Systems; 2009. pp. 1009–1016. [Google Scholar]
  47. Myronenko A, Song X. Point set registration: coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2010;32:2262–2275. doi: 10.1109/TPAMI.2010.46. [DOI] [PubMed] [Google Scholar]
  48. Najafi M, Taghavi Namin S, Salzmann M, Petersson L. Non-associative higher-order markov networks for point cloud classification. In: Fleet D, editor. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Springer; 2014. pp. 500–515. [DOI] [Google Scholar]
  49. Nguyen JP, Shipley FB, Linder AN, Plummer GS, Liu M, Setru SU, Shaevitz JW, Leifer AM. Whole-brain calcium imaging with cellular resolution in freely behaving Caenorhabditis elegans. PNAS. 2016;113:E1074–E1081. doi: 10.1073/pnas.1507110112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Nitzan M, Karaiskos N, Friedman N, Rajewsky N. Gene expression cartography. Nature. 2019;576:132–137. doi: 10.1038/s41586-019-1773-3. [DOI] [PubMed] [Google Scholar]
  51. Nowozin S. Structured learning and prediction in computer vision. Foundations and Trends in Computer Graphics and Vision. 2010;6:185–365. doi: 10.1561/0600000033. [DOI] [Google Scholar]
  52. Panaganti V, Aravind R. Robust nonrigid point set registration using graph-laplacian regularization. Proceedings - 2015 IEEE Winter Conference on Applications of Computer Vision, WACV 2015; 2015. pp. 1137–1144. [DOI] [Google Scholar]
  53. Peng H, Ruan Z, Long F, Simpson JH, Myers EW. V3D enables real-time 3D visualization and quantitative analysis of large-scale biological image data sets. Nature Biotechnology. 2010;28:348–353. doi: 10.1038/nbt.1612. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Pereira L, Kratsios P, Serrano-Saiz E, Sheftel H, Mayo AE, Hall DH, White JG, LeBoeuf B, Garcia LR, Alon U, Hobert O. A cellular and regulatory map of the cholinergic nervous system of C. elegans. eLife. 2015;4:e12432. doi: 10.7554/eLife.12432. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Peyre G, Cuturi M, Solomon J. Gromov-wasserstein averaging of kernel and distance matrices. 33rd International Conference on Machine Learning, ICML; 2016. pp. 3927–3935. [Google Scholar]
  56. Piatkevich KD, Jung EE, Straub C, Linghu C, Park D, Suk HJ, Hochbaum DR, Goodwin D, Pnevmatikakis E, Pak N, Kawashima T, Yang CT, Rhoades JL, Shemesh O, Asano S, Yoon YG, Freifeld L, Saulnier JL, Riegler C, Engert F, Hughes T, Drobizhev M, Szabo B, Ahrens MB, Flavell SW, Sabatini BL, Boyden ES. A robotic multidimensional directed evolution approach applied to fluorescent voltage reporters. Nature Chemical Biology. 2018;14:352–360. doi: 10.1038/s41589-018-0004-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Piatkevich KD, Bensussen S, Tseng HA, Shroff SN, Lopez-Huerta VG, Park D, Jung EE, Shemesh OA, Straub C, Gritton HJ, Romano MF, Costa E, Sabatini BL, Fu Z, Boyden ES, Han X. Population imaging of neural activity in awake behaving mice. Nature. 2019;574:413–417. doi: 10.1038/s41586-019-1641-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Qu L, Long F, Liu X, Kim S, Myers E, Peng H. Simultaneous recognition and segmentation of cells: application in C. elegans. Bioinformatics. 2011;27:2895–2902. doi: 10.1093/bioinformatics/btr480. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Quattoni A, Wang S, Morency LP, Collins M, Darrell T. Hidden conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007;29:1848–1852. doi: 10.1109/TPAMI.2007.1124. [DOI] [PubMed] [Google Scholar]
  60. Redemann S, Schloissnig S, Ernst S, Pozniakowsky A, Ayloo S, Hyman AA, Bringmann H. Codon adaptation-based control of protein expression in C. elegans. Nature Methods. 2011;8:250–252. doi: 10.1038/nmeth.1565. [DOI] [PubMed] [Google Scholar]
  61. Robie AA, Hirokawa J, Edwards AW, Umayam LA, Lee A, Phillips ML, Card GM, Korff W, Rubin GM, Simpson JH, Reiser MB, Branson K. Mapping the neural substrates of behavior. Cell. 2017;170:393–406. doi: 10.1016/j.cell.2017.06.032. [DOI] [PubMed] [Google Scholar]
  62. Ronneberger O, Liu K, Rath M, Rueβ D, Mueller T, Skibbe H, Drayer B, Schmidt T, Filippi A, Nitschke R, Brox T, Burkhardt H, Driever W. ViBE-Z: a framework for 3D virtual colocalization analysis in zebrafish larval brains. Nature Methods. 2012;9:735–742. doi: 10.1038/nmeth.2076. [DOI] [PubMed] [Google Scholar]
  63. Schmidt M. UGM: A Matlab Toolbox for Probabilistic Undirected Graphical Models. 2007 http://www.cs.ubc.ca/~schmidtm/Software/UGM.html
  64. Scholz M. Predicting natural behavior from whole-brain neural dynamics. bioRxiv. 2018 doi: 10.1101/445643. [DOI]
  65. Schrödel T, Prevedel R, Aumayr K, Zimmer M, Vaziri A. Brain-wide 3D imaging of neuronal activity in Caenorhabditis elegans with sculpted light. Nature Methods. 2013;10:1013–1020. doi: 10.1038/nmeth.2637. [DOI] [PubMed] [Google Scholar]
  66. Sjöstrand K, Clemmensen LH, Einarsson G, Larsen R, Ersbøll B. SpaSM: a MATLAB toolbox for sparse statistical modeling. Journal of Statistical Software. 2018;84:10. doi: 10.18637/jss.v084.i10. [DOI] [Google Scholar]
  67. Solomon J, Peyré G, Kim VG, Sra S. Entropic metric alignment for correspondence problems. ACM Transactions on Graphics. 2016;35:1–13. doi: 10.1145/2897824.2925903. [DOI] [Google Scholar]
  68. Stephens GJ, Johnson-Kerner B, Bialek W, Ryu WS. Dimensionality and dynamics in the behavior of C. elegans. PLOS Computational Biology. 2008;4:e1000028. doi: 10.1371/journal.pcbi.1000028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Stiernagle T. Maintenance of C. elegans. WormBook: The Online Review of C. elegans Biology. 2006;1:1–11. doi: 10.1895/wormbook.1.101.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Sutton C, McCallum A. Piecewise pseudolikelihood for efficient training of conditional random fields. ACM International Conference Proceeding Series; 2007. pp. 863–870. [Google Scholar]
  71. Sutton C, McCallum A. An introduction to conditional random fields. Machine Learning. 2010;4:267–373. [Google Scholar]
  72. Szigeti B, Gleeson P, Vella M, Khayrulin S, Palyanov A, Hokanson J, Currie M, Cantarelli M, Idili G, Larson S. OpenWorm: an open-science approach to modeling Caenorhabditis elegans. Frontiers in Computational Neuroscience. 2014;8:137. doi: 10.3389/fncom.2014.00137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Taskar B, Guestrin C, Koller D. Max-margin markov networks. Advances in Neural Information Processing Systems; 2003. pp. 25–32. [Google Scholar]
  74. Taskar B, Chatalbashev V, Koller D. Learning associative markov networks. Twenty-First International Conference on Machine Learning - ICML; 2004. pp. 100–102. [Google Scholar]
  75. Toyoshima Y, Tokunaga T, Hirose O, Kanamori M, Teramoto T, Jang MS, Kuge S, Ishihara T, Yoshida R, Iino Y. Accurate automatic detection of densely distributed cell nuclei in 3D space. PLOS Computational Biology. 2016;12:e1004970. doi: 10.1371/journal.pcbi.1004970. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Toyoshima Y. An annotation dataset facilitates automatic annotation of whole-brain activity imaging of C. elegans. bioRxiv. 2019 doi: 10.1101/698241. [DOI] [PMC free article] [PubMed]
  77. Vaadia RD, Li W, Voleti V, Singhania A, Hillman EMC, Grueber WB. Characterization of proprioceptive system dynamics in behaving Drosophila larvae using High-Speed volumetric microscopy. Current Biology. 2019;29:935–944. doi: 10.1016/j.cub.2019.01.060. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Venkatachalam V, Ji N, Wang X, Clark C, Mitchell JK, Klein M, Tabone CJ, Florman J, Ji H, Greenwood J, Chisholm AD, Srinivasan J, Alkema M, Zhen M, Samuel AD. Pan-neuronal imaging in roaming Caenorhabditis elegans. PNAS. 2016;113:E1082–E1088. doi: 10.1073/pnas.1507109113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Vidal B, Aghayeva U, Sun H, Wang C, Glenwinkel L, Bayer EA, Hobert O. An atlas of Caenorhabditis elegans chemoreceptor expression. PLOS Biology. 2018;16:e2004218. doi: 10.1371/journal.pbio.2004218. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Wainwright MJ, Jordan MI. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends; 2007. [Google Scholar]
  81. Wakabayashi T, Kimura Y, Ohba Y, Adachi R, Satoh Y, Shingai R. In vivo calcium imaging of OFF-responding ASK chemosensory neurons in C. elegans. Biochimica Et Biophysica Acta (BBA) - General Subjects. 2009;1790:765–769. doi: 10.1016/j.bbagen.2009.03.032. [DOI] [PubMed] [Google Scholar]
  82. Wang F, Vemuri BC, Rangarajan A, Eisenschenk SJ. Simultaneous nonrigid registration of multiple point sets and atlas construction. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2008;30:2011–2022. doi: 10.1109/TPAMI.2007.70829. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Williams AH, Kim TH, Wang F, Vyas S, Ryu SI, Shenoy KV, Schnitzer M, Kolda TG, Ganguli S. Unsupervised discovery of demixed, Low-Dimensional neural dynamics across multiple timescales through tensor component analysis. Neuron. 2018;98:1099–1115. doi: 10.1016/j.neuron.2018.05.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Yemini E, Lin A, Nejatbakhsh A, Varol E, Sun R, Mena GE, Samuel ADT, Paninski L, Venkatachalam V, Hobert O. NeuroPAL: a multicolor atlas for Whole-Brain neuronal identification in C. elegans. Cell. 2021;184:272–288. doi: 10.1016/j.cell.2020.12.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Yu S, Avery L, Baude E, Garbers DL. Guanylyl cyclase expression in specific sensory neurons: a new family of chemosensory receptors. PNAS. 1997;94:3384–3387. doi: 10.1073/pnas.94.7.3384. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Zaslaver A, Liani I, Shtangel O, Ginzburg S, Yee L, Sternberg PW. Hierarchical sparse coding in the sensory system of Caenorhabditis elegans. PNAS. 2015;112:1185–1189. doi: 10.1073/pnas.1423656112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Zou H, Hastie T, Tibshirani R. Sparse principal component analysis. Journal of Computational and Graphical Statistics. 2006;15:265–286. doi: 10.1198/106186006X113430. [DOI] [Google Scholar]

Decision letter

Editor: Ronald L Calabrese

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Linking individual neurons to their anatomical names remains a vexing problem that limits efforts to determine patterns of gene expression and neuronal activity at the single-neuron level in C. elegans. This paper presents a computational method for solving this problem and, as such, has the potential to advance neurobiology in C. elegans. The approach could, in principle, be adapted to any nervous system from which a computational atlas could be derived following the pipeline laid out in this manuscript. The generality of the approach, and its ability to incorporate additional datasets in composing the computational atlas, may lend itself to performance improvements by crowd-sourcing datasets.

Decision letter after peer review:

Thank you for submitting your article "Graphical Model Framework for Automated Annotation of Cell Identities in Dense Cellular Images" for consideration by eLife. Your article has been reviewed by Ronald Calabrese as the Senior Editor, a Reviewing Editor, and three reviewers. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As the editors have judged that your manuscript is of interest, but as described below that additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to the labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

Summary:

Whole brain imaging in C. elegans allows monitoring the activities of a large fraction of worm neurons during behaviors. However, the identity of each cell in the nerve ring remains largely unknown, prohibiting further analysis that integrates structural and functional information of the worm circuit. This problem has been partially amended by manual annotation from experts. However, this is a time-consuming and error-prone solution when one is dealing with a large dataset. Thus, the reviewers were enthusiastic about this manuscript because the authors take one important step towards automated annotation of neuronal identities. Methods in this work are innovative and attractive. By minimizing energy functions imposed by absolute coordinate differences (unary potential) between images and atlases, as well as differences of the pairwise relationship between neurons, the authors show that their CRF_ID model outperforms SOTA registration model in accuracy, speed, and robustness.

Essential revisions:

There were several concerns that necessitate serious revision and clarification. Ultimate acceptance by eLife will depend upon how the concerns can be addressed and whether the methodology will be practically useful to others in the field after further clarification. The original reviews are appended, and all concerns should be addressed. Chief among the concerns that emerged on the reviewer consultation were:

1) The authors were not clear about whether any data were held out in developing the data-driven atlas and how this choice will impact the ultimate utility of the method. Moreover, they were not clear about how this atlas was built – the motivation for building it.

2) The authors must be clear when synthetic data is being used and which strain is being used for the biological data.

3) The reviewers found the text inadequate and essentially reviewed the Appendix. For ultimate publication the authors must make the methodology clear to general readers in the main text.

4) The prediction accuracies of the model vary from case to case and were quite low in the scenario of missing neurons. The authors must state clearly the prediction accuracy of their methods under the worst scenario, such as whole-brain imaging without colors. The prediction accuracies of the CRF model, contingent upon different datasets and different noise levels, should be in a self-contained main figure, not in the supplements.

5) A key element of these reagents/strains and the annotation is that, strictly speaking, it annotates nuclei (not neurons). The authors need to explicitly tackle the question of whether the method would work with cytoplasmic markers and to properly annotate the strains they did create as using nuclear localization signals.

6) Important elements of the metrics presented to evaluate the annotation method are poorly described in the manuscript, and it is unclear from the manuscript or from the materials in the code repo what form the results take. By this we mean not the file format, but rather: does the algorithm return a single assignment, a ranked list for each cell, or something else?

7) Concerning the spatial location of the neurons used in the ground-truth datasets. From inspecting the authors' GitHub repo, it seems that the neurons in the hand-annotated ground-truth NeuroPAL datasets are not evenly sampled throughout the head. Rather, they are enriched in easy-to-label locations, like the anterior of the head, with less coverage in the more difficult locations (e.g. few neurons are labeled in the lateral ganglia or retrovesicular ganglion). The authors should explicitly recognize this bias in the hand-annotated ground-truth dataset.

Reviewer #1:

The application of conditional random fields to neuron registration in C. elegans represents a significant advance that will impact the field. I have major concerns about the interpretation of the method's performance: some technical, for example test sets were seemingly not held out from the data-driven atlas; and others more related to confusing or potentially misleading presentation. It was often unclear which dataset, method, reagent or performance metric (for example Top 1 vs Top 5) was being discussed. I also have concerns about giving appropriate credit for work and reagents from other groups.

1) Test sets used to assess performance do not seem to have been held out from the data-driven atlas, which raises doubts about the interpretation of Figure 2A and Figure 6E and C. This could artificially inflate performance compared to tests with a held-out animal, which would reduce the utility of the method. This could also explain the dramatic difference in performance observed from synthetic data with experimentally derived noise (35% accuracy; Figure 2—figure supplement 4A) and real experimental data matched to the data-driven atlas (>75% accuracy; Figure 2A).

2) Lack of clarity could falsely give readers the impression of higher performance:

a) It is often ambiguous as to whether Top 1 or Top 5 performance is being plotted. Example: in Figure 3B the caption makes no mention, so one would assume Top 1 accuracy; but subsection “Cell annotation in gene expression pattern analysis” suggests Top 5. This is a crucial distinction and needs clarity in every subpanel.

b) It is also unclear when synthetic data is being used. Example: subsection “Identity assignment using intrinsic features in CRF_ID outperforms other methods” states that the algorithm "maintains high accuracy even when up to 75 % of cells in atlas were missing for data" but from the figures I gather this is synthetic data with unnaturally low position noise, so this could give the reader the wrong impression.

c) Similarly, it is often unclear from the figures which strain is being used. For example the strain in Figure 4A-C is not labeled, but it appears to be the same as Figure 3A,B. But if that's the case its unclear why performance would be so much higher in Figure 4B than Figure 3B. Please label which strain is used in which figure subpanel throughout the manuscript, or provide a table.

d) Related: Figure 4A prominently displays "calcium imaging," when this is in fact a GFP strain. This misleads the reader.

3) The work critically relies on reagents from other groups but could do a better job acknowledging those contributions:

a) The Neuropal strain is crucial to the manuscript and provides the majority of the ground-truth data starting with the Results and Figure 2, but Neuropal is first mentioned only towards the very end of the Results section and even then only in the context of Figure 6.

b) The strain list should reference publications for strains and alleles used.

c) Subsection “Cell annotation in gene expression pattern analysis” misleadingly states that "a reporter strain was crossed with pan neuronal red fluorescent protein" but this cross was not performed by the authors, it was performed by another group, published and is publicly available on CGC. This should be fixed.

4) More details are needed about the ground truth datasets, including the number of neurons hand-annotated in each of the 9 individuals and their spatial location. (Manual annotation of Neuropal is easiest at the periphery and hardest in center of the ganglia, and so it would be important to know coverage).

Reviewer #2:

Whole brain imaging in C. elegans allows us to monitor the activities of a large fraction of worm neurons during behaviors. However, the identity of each cell in the nerve ring remains largely unknown, prohibiting further analysis that integrates structural and functional information of the worm circuit. This problem has been partially amended by manual annotation from experts. However, this is a time-consuming and error-prone solution when one is dealing with a large dataset. We are enthusiastic about this manuscript because the authors take one important step towards automated annotation of neuronal identities. Methods in this work are innovative and attractive. By minimizing energy functions imposed by absolute coordinate differences (unary potential) between images and atlases, as well as differences of the pairwise relationship between neurons, the authors show that their CRF_ID model outperforms SOTA registration model in accuracy, speed, and robustness.

We would like to raise several main issues of this manuscript.

1) Writing. The main text of the manuscript is brief, dense, technical, and very difficult to understand. I skipped the main text and went directly to the Appendix. This arrangement is not wise for a general audience, especially biologists. In future revisions, please embed the figures in the text, which would save us a lot of time. Here are some specific suggestions:

a) The CRF framework can be better explained by building upon physical intuition. The mathematical notations sometimes only cause confusion. For example, is it necessary to introduce cliques? I thought that every neuron in their graph model is adjacent to every other neuron. If this is not the case, please describe how the graph model is constructed. I find the note on the maximum entropy model out of the blue (subsection "Structured prediction framework"). If you want to make a deep connection with statistical physics, you had better explain it well in the text!!

b) The motivation for building the data-driven atlases, and how to build them, is treated dismissively in the main text. It is briefly mentioned in subsection "Identity assignment using intrinsic features in CRF_ID outperforms other methods", but those sentences are extremely difficult to understand without reading the Appendix. The Appendix – extended methods S1.7, however, needs to be revised and better explained. For example, the equations are not quite consistent, such as in Equations 6 and 25, and the notations are quite confusing. Linking some examples like Figure 2B to S1.7 would help in understanding the concept.

c) The prediction accuracies of the model vary from case to case. It was quite low in the scenario of missing neurons. The authors need to state clearly in their paper, under the worst scenario such as whole brain imaging without colors, the prediction accuracy of their methods. This information is now buried somewhere that is hard to find. The prediction accuracies of the CRF model, contingent upon different datasets and different noise levels, would better be a self-contained main figure, not in the supplements. To follow the logic flow of the text, the figure panels that compared CRF with other registration methods would better be introduced later.

2) Notations. As mentioned above, symbol consistency is a problem. In the formulation of the model, the λ's, trainable parameters in the original CRF, have been abused. For the λ appearing in Equations 2/3/4/6/7/8/9/10/32/34/37/46, the authors might consider making a clear statement of which of them are trainable and which are hyperparameters. If they are hyperparameters, the setting and tuning strategies need to be disclosed. Also, the keywords "identification" and "annotation" should be unified.

3) Loopy belief propagation, speed, and stability. The authors should provide details on the computational complexity and the stability of their algorithms and compare them to other registration methods. In their current graph model, does each trial converge, and do all trials converge to the same solution starting from different initial conditions? If not, how do the authors deal with the discrepancies? Likewise, the authors may want to comment on the memory usage for constructing the data-driven atlases, to see if it can be another innovation point compared to the registration methods.

4) Other missing information and explanations:

a) The authors should state the exact names and settings of the registration models used for comparison.

b) In Appendix S1.2.1 Unary potentials – Positions along AP axis, as x-y is a 2D plane, we do not see why AP's is always better than LR's and DV's.

c) The explanation of generating synthetic data (subsection "Generating synthetic data for framework tuning and comparison against other methods") is a bit hard to follow. The reason for changing the lower bound to zero might need further illustration. Could you please comment on why the prediction accuracy is much lower in synthetic data (Figure 2A)?

d) For subsection "Whole-brain data analysis", the authors might consider linking the paragraph to Figure 5, making 0.1 Hz a reasonable choice.

e) For subsection “Whole-brain data analysis”, "[0.05, 0.05]" seems to be a mistake.

f) Subscripts in Equation 13 are wrong.

g) Providing more annotations and explanations on the meaning of the colors on some worm images (e.g., Figure 4D) would be helpful.

5) Codes on GitHub may need to be slimmed down.

Reviewer #3:

Linking individual neurons to their anatomical names is a vexing problem that limits efforts to determine patterns of gene expression and neuronal activity at the single-neuron level. Given the landmark Mind of the Worm paper (White et al., 1986) and subsequent analyses of its connectome, many imagine that annotating individual neurons in C. elegans is a solved problem. Unfortunately, it is not. This paper presents a computational method for solving this problem and, as such, has the potential to advance neurobiology in C. elegans. The approach could, in principle, be adapted to any nervous system from which a computational atlas could be derived following the pipeline laid out in this manuscript. There is much to recommend this study and many opportunities for improvement regarding the presentation of the computational method, its limitations and expectations for application, and for considerations of generalization to other nervous systems.

1) Explanation of the computational method

The computational strategy is difficult to follow in the main text, unless one reviews the appendices. We suggest elaborating the process using more generic terms in the main text. The following specific elements need clarification or additional support

- The mixed approach of considering intrinsic and extrinsic similarities between image and atlas is described in a manner that assumes readers are already familiar with this computational strategy. Given the breadth of the eLife readership, the authors would increase the accessibility and impact of the paper by providing readers with examples of what features are intrinsic and which are extrinsic in this case.

- The difference between this approach and previous strategies based on image registration is communicated clearly in the Appendix, but not in the main text.

- Much is made of the computational efficiency of the CRF_ID approach, but little is provided in the way of data or analysis supporting this claim. The authors could, for example, compare the computational energy/time needed for this approach compared to prior image matching approaches. They could discuss the nature of this computational framework compared to others with respect to computational efficiency.

Generality of the method

To better support the authors' claim that this method is easily adaptable to any nervous system, please apply the framework to estimate the optimal number (and/or density) of landmarks as a fraction of the total number of nuclei or density in 3D. This would assist others in applying the framework to other simple nervous systems (e.g. Aplysia, leech, hydra). Additionally, it is not clear if the image stacks used to derive data-driven atlases are distinct from the data analyzed for prediction, or whether the same data are used iteratively to build atlases and improve prediction accuracy.

Application of the method to C. elegans

Specifying the AP/DV/LR axes from raw data is not completely automatic but rather appears to require some user input based on prior knowledge. This is not stated clearly in the text. In addition, it is also not clear how the algorithm handles data from partially twisted worms (non-rigid rotation around the AP-axis). This problem is more likely to be encountered if the images capture the whole worm. Is it possible to allow rotation of the DV and LR axes at different points along the AP axis?

Specific suggestions for improvement are collected below.

(1) The method depends on tidy segmentation of nuclei, but the segmentation strategy is not discussed, nor is whether nuclear localization of landmarks or unknown markers is essential for accurate annotation.

(2) The authors need to explicitly inform readers that the markers in the NeuroPAL strains (OH15495, OH15500) and the GT strains are localized to neuronal nuclei. Additionally, the authors need to (1) address whether nuclear localization is required for the annotation framework to succeed, (2) discuss whether a cytoplasmic marker would be sufficient, and (3) provide more information on the segmentation algorithm used and recommended for use with the computational framework.

(3) The authors use the AML5 strain as a proxy for GCaMP-labeled strains (subsection "Cell annotation in multi-cell functional imaging experiments") to demonstrate the utility of the annotation tool for identifying cells used for calcium imaging. This strain differs from the NeuroPAL strain expressing GCaMP in several respects that are not discussed by the authors, leaving readers to wonder about the utility of AML5 as a proxy and to question whether the success of cell identification in this strain is truly generalizable. Two main differences exist: (1) In AML5, GFP is expressed in both the nucleus and cytoplasm, but in OH15500 GCaMP6s is tagged with an NLS; (2) The basal fluorescence of GCaMP6s, but not GFP, depends on basal calcium concentrations; this fact may introduce additional detection noise/variance into the annotation problem. The authors need to discuss how these differences affect (or don't) the ability to generalize from success with the proxy strain to success with NLS::GCaMP6s expression.

(4) Please clarify the nature of the output from the code that implements the framework. In other words, a reader should learn from the paper what kind of output file they might receive if they attempted to apply the tool to their own images (annotated image with top prediction candidate, data structure with all predictions and respective probabilities, etc).

(5) Terminology – There are a number of general computational terms and specific terms relevant to this problem that are not defined for the reader (listed below). Providing readers with implicit or explicit definitions of these terms will improve clarity.

5(a) "count noises" – used to represent variation in the number of nuclei detected in a given image stack, but not defined at first use. Please provide conditions under which the number of nuclei detected might vary along with defining this term. Additionally, this is a singular concept and should be written as "count noise" not "count noises" (by analogy to "shot noise").

5(b) There several other instances in which "noises" is used where "noise" is meant.

5(c) "intrinsic similarity" – which seems to represent pairwise positions in 3D in the image and the atlas.

5(d) "extrinsic similarity" – not clear.

(6) (Subsection “Identity assignment using intrinsic features in CRF_ID outperforms other methods”) "More importantly, automated annotation is unbiased and several orders of magnitude faster than manual annotation by researchers with no prior experience."

(a) Unbiased – It is a common misconception that computational strategies are unbiased. For this claim to be rigorously accurate, the authors would need to demonstrate that each and every nucleus was equally likely to be identified correctly by the automated annotation algorithm. For instance, it is easy to imagine that some nuclei, such as those closer to a landmark, are more likely to be identified correctly and with less uncertainty than those farther away. If this were true, then automated annotation has a systematic bias in favor of nuclei closer to landmarks and a systematic bias against nuclei farther from landmarks, and would therefore be biased, even though no human supervision is involved. The authors should back this claim up with data or clarify that what they mean is that automated annotation is an improvement over (biased and subjective) human annotation.

(b) Faster – The authors are free to speculate about the relative speed of automated vs. manual annotation by inexperienced researchers in the Discussion. By writing this as a result, however, data are needed to back it up.

(7) (Materials and methods) From the description of how GT290 and GT298 were made, it seems that both constructs include 2x NLS fragments, but the genotype description under reagents lacks that annotation, and most, if not all, other references to this strain neglect to mention the NLS.

Presentation concerns that need attention: There are three instances of undisclosed data re-use among the figures that need to be addressed. (1) The image in Figure 4A is identical to the green channel of the image shown in Figure 3A; (2) Some of the data shown in Figure 2—figure supplement 4A – Part of the data shown here is also shown in Figure 2C; (3) Figure 4C – Please indicate if the same number of cells were removed in each run. Also, it appears that this data also may have been used in Figure 4—figure supplement 1. If so, please disclose data re-use.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Graphical-Model Framework for Automated Annotation of Cell Identities in Dense Cellular Images" for consideration by eLife. Your article has been reviewed by Ronald Calabrese as the Senior Editor, a Reviewing Editor, and two reviewers. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential Revisions:

(1) The authors performed additional computational experiments by holding out some of the data used to create atlases – an important and general strategy for computational solutions that depend on real-world data. The authors present a comparison of this approach for reviewers, but not explicitly for readers. These analyses and plots must be included in the main text of the final version.

(2) The authors should state clearly the computational speed of their algorithms and compare it with the registration methods. The inference and subsampling procedures (~ 1000 times?) on the data-driven atlas are time-consuming and computationally expensive.

Reviewer #2:

Main Text

(1) Thanks for sending us a much better written manuscript. The authors should consider replotting some of their figures, which can be difficult to read.

(a) The neuron names in Figure 2F and Figure 2—figure supplement 6C are difficult to read. Is there a new way to plot them? When I wanted to zoom in to get a better understanding, they caused my Preview to crash multiple times…

(b) The meaning of the error bars throughout the manuscript is not well explained.

(2) The authors should state clearly the computational speed of their algorithms and compare it with the registration methods. The inference and subsampling procedures (~ 1000 times?) on the data-driven atlas are time-consuming and computationally expensive.

(3) Is there a way to make the data-driven atlas publicly available in addition to the raw annotated datasets (n=9)?

Rebuttal

(1) Pipeline (Major Point#6):

I do not quite understand the necessity of single or multi-run, which can be combined into one inference process. Consider the following scenario in a freely behaving animal. At one time, the region to be annotated consists of N neurons. At a different time, a new neuron (now N+1) is squeezed into the region. It is probably always true that the number of neurons in the data is different from that in the atlas.

(2) Color and Robustness and Evaluation Metrics (Essential revisions #1, #4, #6):

It seems that the leverage of color information for annotation is quite limited in the revised manuscript. The procedures for properly normalizing the color distribution are also complicated. I also found the following sentence to be incorrect: "Further, when leave-one-out atlases are used for both positional relationship and color (Author response image 2), the accuracy achieved by top labels is marginally greater than the accuracy achieved by using leave-one-out positional relationship only (Author response image 1)". It was actually significantly lower, if I read the figures correctly. This leaves the question of how to correctly and efficiently assemble the data-driven atlas.

Reviewer #3:

The revision is very responsive to the summary and specific reviewers' critiques.

There remain some awkward phrasing and missing details to address. These are noted in order to make the text accessible to a broader audience and to increase the impact of the study both within and beyond the community of C. elegans researchers. These include:

(1) The use of the plural "noises" when the authors probably mean "noise" in the singular, e.g. "Prediction accuracy is bound by position and count noise in data"; replace "amount of various noises" with "amount of noise contributed from various sources". This latter example combines both the concept of the magnitude of noise and the diverse types of noise.

(2) Shorthand terms or jargon. For example, when the text reads "the position of cells AIBL" what is meant is "the position of the cell bodies of AIBL". Similarly, replace "neurons" with "neuronal cell body" or "neuronal somata".

(3) C. elegans strains – in response to critiques of the initial submission, the authors have clarified the origin of the transgenic strains used in the study. Thank you! Please also correct the reference to AML70 which is not included in the table of strains. Based on context, it looks like the authors are referring to AML32. Please ensure that this is corrected.

(4) Figure 5, panel E. Please refer to Figure 5—figure supplement 1 panel C noting the cells with non-zero weights in SPC1, SPC2, SPC3

(5) Figure 5 panel H: Indicate the origin of the motion trace shown. It appears to belong to one of the posterior cells shown in Panel F. Please provide a rationale for this choice – since the traces in Panel F certainly indicate that motion signals differ among cells.

eLife. 2021 Feb 24;10:e60321. doi: 10.7554/eLife.60321.sa2

Author response


Essential revisions:

There were several concerns that necessitate serious revision and clarification. Ultimate acceptance by eLife will depend upon how the concerns can be addressed and whether the methodology will be practically useful to others in the field after further clarification. The original reviews are appended, and all concerns should be addressed. Chief among the concerns that emerged on the reviewer consultation were:

1) The authors were not clear about whether any data were held out in developing the data-driven atlas and how this choice will impact the ultimate utility of the method. Moreover, they were not clear about how this atlas was built – the motivation for building it.

We appreciate the reviewers for pointing these out. These points are indeed very important and were an oversight in the initial submission. We have since tested the model with held-out data (detailed below) and added more text to emphasize the motivation for building the atlas in the main text and in Appendix – extended methods S1.7.

We present new cell identity prediction results with held out test data. In all new results presented below, we built the atlases with all but one data set (one worm image stack) and used the held out set for testing the accuracy; thus, the atlas each time is a leave-one-out atlas. We then report the average and spread of these predictions. In the text below, we explicitly compare what was reported in the initial manuscript and what is reported in the revision. It is important to note that with the new results, all conclusions stand.

First, for Figure 2A, where we build only positional-relationship atlases, we predict identities in experimental whole-brain datasets using data-driven atlases. Note that in this case no color information is used for prediction. With leave-one-out atlases, we see no difference in performance compared to previous results (Author response image 1). This implies that leave-one-out atlases are able to represent positional relationships among cells in test data.

Author response image 1. Accuracy comparison on experimental datasets when prediction was done using an atlas built with all data vs leave-one-out atlases.


Prediction was done using positional relationship information only. This figure is an update of Figure 2A. Experimental datasets come from NeuroPAL strains.

Second, for Figure 6C, in contrast to Figure 2A, we build both positional-relationship atlases and color atlases, i.e. we use color information along with positional-relationship information to predict identities. We tested accuracy for several kinds of atlases: (1) leave-one-out-positional and all-color atlas – a leave-one-out atlas for positional relationships only, with color information coming from all datasets including test data; (2) all-positional and leave-one-out-color atlas – a leave-one-out atlas for color only, with positional-relationship information coming from all datasets including test data; (3) leave-one-out-positional-and-color atlas – leave-one-out atlases for both positional relationships and color (Author response image 2).

Author response image 2. Accuracy achieved on experimental datasets when prediction was done using different combinations of atlases for positional relationships and color.


In these cases, a leave-one-out atlas was used for positional relationships, color, or both. Experimental datasets come from NeuroPAL strains.

We found that using color leave-one-out atlases decreases accuracy more sharply than using the positional-relationship leave-one-out atlas. Further, when leave-one-out atlases are used for both positional relationships and color (last bar in Author response image 2), the accuracy achieved by top labels is marginally greater than the accuracy achieved by using the leave-one-out positional-relationship atlas only (second pair of bars in Author response image 1, and Figure 2A). With the datasets at hand, using color does not contribute much to improving accuracy; this implies that the color variability among animals is far greater than the positional variability.

We see that the raw RGB intensity distribution in test images differs from the RGB intensity distribution in the atlas. This could be due to several reasons, including inherent differences in NeuroPAL fluorophore expression levels across animals and differences in image-acquisition settings across sessions (exposure times, laser power, etc.). As a consequence, the raw RGB values that represent the colors of cells vary across animals. The NeuroPAL manuals also suggest manual contrast and γ adjustment of the color channels in images before performing manual annotation of cell identities. Thus, to use color information for automatic annotation, we first need to align the distribution of features (RGB values) in the atlas to the distribution of features in the test data – commonly known as the domain adaptation problem in machine learning.

To increase the utility of the color information, either a much larger dataset would need to be acquired to account for this variability, or the atlas would need to be built from images acquired under conditions as close as possible to those of the data to be analyzed (which, in our estimation, is not a trivial task). We have elaborated on these points in the manuscript.

To test whether we can improve the utility of the color information, we explored several simple methods to align the color values of the images used for building the leave-one-out color atlas to the color values in the test data. Note that color matching is performed using the full RGB distributions of the images, without using any cell-identity information. The methods we tried included simple normalization of color channels (norm), color constancy/invariant transformation [1] (colconst), histogram matching (histm), contrast and γ adjustment (imadj), and combinations of these methods (Author response image 3; a brief sketch of these alignment steps is given after the image caption). Each method improved accuracy on some datasets but not all.

Author response image 3. Accuracy on experimental datasets when prediction was done using different leave-one-out atlases for color.


In each method, a different technique was used to match the colors of the images used to build the atlas to the colors of the test image. A leave-one-out atlas for both positional relationships and color was used for prediction. Experimental datasets come from NeuroPAL strains.
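As a rough illustration of these alignment steps, the sketch below applies three of the manipulations named above to toy RGB arrays; it assumes a recent scikit-image (with the `channel_axis` argument) and is not the pipeline actually used, where full image stacks and tuned parameters are involved.

```python
import numpy as np
from skimage.exposure import adjust_gamma, match_histograms

def normalize_channels(img, eps=1e-8):
    """Standardize each color channel to zero mean, unit variance ('norm')."""
    out = img.astype(float).copy()
    for c in range(out.shape[-1]):
        ch = out[..., c]
        out[..., c] = (ch - ch.mean()) / (ch.std() + eps)
    return out

rng = np.random.default_rng(1)
atlas_img = rng.beta(2.0, 5.0, size=(64, 64, 3))   # toy atlas-side RGB image
test_img = rng.beta(5.0, 2.0, size=(64, 64, 3))    # toy test RGB image

# 'histm': align the atlas image's per-channel distributions to the test image
histm = match_histograms(atlas_img, test_img, channel_axis=-1)
# 'imadj'-style gamma adjustment (the gamma value here is illustrative)
imadj = adjust_gamma(atlas_img, gamma=0.8)
# 'norm' followed by 'histm', one of the combinations explored above
norm_histm = match_histograms(normalize_channels(atlas_img), test_img,
                              channel_axis=-1)
```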

Finally, we found that using an ensemble of two methods for the color atlas gave the best accuracy (Author response image 4), improving top-label accuracy by six percentage points (to an average of 80.5%) compared to the baseline case of the color leave-one-out atlas (average of 74.5%). Here, an ensemble of two methods means using a combination of two different leave-one-out color atlases, each with a different color-alignment manipulation, for predicting cell identities in the test data; weights of 0.2 and 0.8 were assigned to the two atlases in the ensemble (determined by cross-validation). The methods included in the best ensemble were (1) histogram matching (histm), and (2) color-invariant manipulation, normalization and subsequent histogram matching (colconst + norm + histm). We have updated the ensemble-method results in Figure 6C and Figure 6—figure supplement 2 and updated the text (a sketch of the weighted combination is given after Author response image 4).

Author response image 4. Accuracy comparison between base case (when prediction was done using leave-one-out color atlas without any color matching) and ensemble of two leave-one-out color atlases (with different color matching techniques).


In both cases, the same leave-one-out atlases for positional relationships were used. Experimental datasets come from the NeuroPAL strain (n = 9). Results in this figure are updated in Figure 6C.
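A minimal sketch of the weighted combination follows. How exactly the two color atlases enter the CRF potentials is described in the Appendix; here we simply assume each atlas yields a per-cell, per-label color score matrix, which is a simplification, and the scores are toy values.

```python
import numpy as np

def ensemble_scores(scores_a, scores_b, weights=(0.2, 0.8)):
    """Weighted combination of per-cell x per-label color scores."""
    return weights[0] * scores_a + weights[1] * scores_b

rng = np.random.default_rng(2)
scores_histm = rng.random((10, 10))   # toy scores from the 'histm' atlas
scores_combo = rng.random((10, 10))   # toy scores from 'colconst+norm+histm'
top_label = ensemble_scores(scores_histm, scores_combo).argmax(axis=1)
```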

Using the ensemble of leave-one-out color atlases also improved accuracy on datasets of non-rigidly rotated animals (related to Figure 6E), raising top-label accuracy from 39.1% (base case) to 48.8% (ensemble atlases with color matching) (Author response image 5). We have updated these results in Figure 6E and Figure 6—figure supplement 3 and updated the text.

Author response image 5. Accuracy comparison between base case (when prediction was done using leave-one-out color atlas without any color matching) and ensemble of two leave-one-out color atlases (with different color matching techniques).


In both cases, the same leave-one-out atlases for positional relationships were used. Experimental data come from NeuroPAL strains with non-rigidly rotated animals (n = 7). Results in this figure are updated in Figure 6E.

We also updated the results in Figure 6B, where we compare the accuracies of several methods on experimental whole-brain datasets. In this figure, for fair comparison across methods, the OpenWorm atlas was used for positional relationships. For methods that use color information (i.e. the ‘Color’, ‘Reg. + Color’, ‘Color + Rel. Position’, and ‘Reg. + Color + Rel. Position’ methods), a data-driven color atlas was built. We now perform predictions using the ensemble of leave-one-out color atlases for methods that use color information. Note that the conclusion remains the same – the CRF model with positional-relationship features along with color outperforms other methods.

We have updated Appendix – extended methods section S2.4 with descriptions of building the baseline color atlas, building the color atlas after matching the color distributions in the training data to the test data, and using the ensemble of color atlases for prediction.

Thus, we conclude that in order to utilize the color information of cells in NeuroPAL for automatic annotation of cell identities, color consistency across animals needs to be maintained, either experimentally or by post hoc corrections. Experimentally, a consistent protocol and consistent imaging settings should be maintained across sessions. Even with a consistent protocol, color variation may exist. This can be tackled by (1) collecting a large volume of data to capture each cell’s full RGB variation, and (2) using computational domain-adaptation techniques. Further advances in image color transfer and domain adaptation will improve accuracy in the future. We have updated these discussions in the text.

[1] Finlayson, Schiele and Crowley, 1998.

To clarify the motivation and methodology of building data-driven atlas, we have now rewritten several sections and included an additional schematic figure –

1) We have text in the Introduction that discusses the variability of cell positions in the data, which is the main motivation for building such a data-driven atlas.

2) We have re-written the subsection “Cell annotation using structured prediction framework”. Here we describe the motivation and intuition behind building the data-driven atlas.

3) We have re-written the section “Appendix – extended methods S1.8 Building data-driven atlas”. This section further expands on the intuition and methodology of building the data-driven atlas for positional-relationship features.

4) We have added new details in the section “Appendix – extended methods S2.4” to describe the methodology of building data-driven atlases for color features.

5) We have added a new figure as Figure 1—figure supplement 1 that schematically explains how the positional-relationship features in the model are modified when a data-driven atlas is used compared to a static atlas.

2) The authors must be clear when synthetic data is being used and which strain is being used for the biological data.

We have now indicated synthetic or experimental data in each figure subpanel. The strain used is mentioned in the figure legends. The following figures and figure legends have been changed: Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 2—figure supplement 1, Figure 2—figure supplement 2, Figure 2—figure supplement 3, Figure 2—figure supplement 4, Figure 2—figure supplement 5, Figure 2—figure supplement 6, Figure 4—figure supplement 1, Figure 5—figure supplement 1, Figure 6—figure supplement 1, Figure 6—figure supplement 2, Figure 6—figure supplement 3.

3) The reviewers found the text inadequate and essentially reviewed the Appendix. For ultimate publication the authors must make the methodology clear to general readers in the main text.

We have rewritten the subsection “Cell annotation formulation using structured prediction framework” in main text, mainly by moving some of the prior text from the appendix and streamlining the description.

1) We describe the methodology to make it more accessible to readers.

2) We describe the motivation and intuition behind building data-driven atlases (subsection “Identity assignment using intrinsic features in CRF_ID outperforms other methods”, Appendix – extended methods S1.7)

3) We have added a new schematic figure as Figure 1—figure supplement 1. This figure clearly explains (1) all features in the CRF model, (2) difference between intrinsic similarity and extrinsic similarity features, and (3) how features are updated with data driven atlas.

We realize that this beginning portion of the text may be math-heavy, but we have left it as such for the sake of clarity, as the reviewers proposed.

4) The prediction accuracies of the model vary from case to case and were quite low in the scenario of missing neurons. The authors must state clearly, under the worst scenario such as whole brain imaging without colors, the prediction accuracy of their methods. The prediction accuracies of the CRF model, contingent upon different datasets and different noise levels, should be in a self-contained main figure, not in the supplements.

We would like to reiterate a few important points that were perhaps obscured in the original manuscript. First, the models we built are data-driven. This means that the accuracy of each model depends on the quality (e.g. signal-to-noise ratio), information content (e.g. availability of positional and color information), and quantity of the data. We do not build a single model because the use cases of these models depend strongly on the biological questions asked, where different reagents are used (e.g. different markers). Second, the accuracy of the models will strongly depend on the information contained in the data used to build/train them. In other words, a model built from higher-quality data, with more information content and a larger dataset, will perform better. Another way to look at this is that models cannot pull information out of thin air (bad data with little information).

In general, missing neurons in the data strongly erode the performance of the models, regardless of the method used. How exactly missing neurons influence model performance depends on the information content of the missing neurons. To give an example, neurons that can anchor the AP, DV, and LR axes are highly “valuable”; if they are randomly missing in the simulations, they will have a larger impact than other neurons. For each scenario that we tested, we ran 180–200 random trials to assess the variability.

As we show in the sections “Prediction accuracy is bound by position and count noise” and “Identity assignment using intrinsic features in CRF_ID outperforms other methods”, the accuracy of all automatic annotation methods is degraded by noise in the data. We formalize the types of noise encountered in the automatic cell-annotation task, i.e. position noise and count noise, and study the effect of both on accuracy across a range of noise levels.

Our motivation for testing accuracy across a range of noise levels comes from the fact that the amount of noise encountered in practice will depend on the use case, and thus affect the accuracy. For example, in gene-expression pattern analysis the count noise will be large, since no candidate label list is available for annotation. In comparison, in the multi-cell calcium imaging case the count noise will be much lower, because candidate labels for the few cells in the image are already available. Similarly, the effect of position noise on accuracy in whole-brain imaging (densely packed cells) will be more pronounced than in multi-cell calcium imaging (sparsely labeled and spatially distributed cells). Thus, by systematically varying the noise levels using synthetic data, we show that the CRF method outperforms popular registration-based methods. The synthetic data illustrate the point about the information content of the data. Again, we emphasize that noise in the data, such as missing neurons, will affect any method; no method is immune to the problems caused by missing information, and no algorithm can make up information that is not in the data. The important point is whether an algorithm can outperform other algorithms and extract the maximum information from the available data.

Figure 2A is “the worst scenario such as whole brain imaging without colors (of) the prediction accuracy of (the) methods”. New results with the leave-one-out atlas are added to the updated Figure 2A.

We have moved the figures and figure supplements closer to the text. We compare the accuracies of methods across different noise levels using synthetic data in Figure 2—figure supplement 5. Since these comparisons are performed using synthetic data, we felt that they should remain a supplement. Figure 2C-E contains the major take-away message of the comparisons, i.e. the CRF method performs better than registration-based methods in handling both position and count noise, and further demonstrates superior performance on experimental datasets.

5) A key element of these reagents/strains and the annotation is that, strictly speaking, it annotates nuclei (not neurons). The authors need to explicitly tackle the question of whether the method would work with cytoplasmic markers and to properly annotate the strains they did create as using nuclear localization signals.

We agree with the reviewers that the algorithm currently annotates the nuclei of neurons for whole-brain data (because those datasets all label nuclei). The algorithm should work with cytoplasmic markers, as long as cells can be demarcated as separate entities (e.g. the results shown in Figure 3B, Figure 4B, 4C use cytoplasmic markers). For whole-brain imaging, cytosolic markers will make many densely packed cells appear as one large blob, unless cell membranes are labeled, in which case our algorithm should work.

We also note that if cells cannot be distinguished as separate entities (by eye or by segmentation algorithms), this will contribute to the count noise that we have characterized in the paper; we have shown that CRF performs better than registration methods in handling count noise.

To clarify, our landmark strains contain nuclear localized landmarks for whole-brain imaging application. All nuclei are annotated as shown in Figure 4D.

6) Important elements of the metrics presented to evaluate the annotation method are poorly described in the manuscript, and it is unclear from the manuscript or from the materials in the code repo the form of the results. By this we mean not the file format, but rather does the algorithm return a single assignment or a ranked list for each cell or something else?

We appreciate this comment on clarity. There are two modes of running the algorithm: (1) a single run returns the top label for each cell; this works when the number of segmented cells/nuclei matches the number of labels available to assign. (2) Accounting for missing neurons returns top candidates for each cell; this requires multiple runs, which we performed in parallel on computing clusters. Since which neurons are missing depends on knowing which neurons may not show up with the particular reagents (and experimental conditions, for that matter), we also report additional top labels (e.g. top 3 and top 5) so that users can take advantage of heuristics or prior knowledge that are not hard-coded into the algorithm. We have now described the code output in subsection “Computational workflow for cell identification”. We have also indicated, in each of the corresponding figure legends, which metric is shown in the figure.

7) Concerning the spatial location of the neurons used in the ground truth datasets. From inspecting the authors' GitHub repo, it seems that the neurons in the hand-annotated ground-truth Neuropal datasets are not evenly sampled throughout the head. Rather they are enriched in easy-to-label locations, like the anterior of the head, with less coverage in the more difficult locations (e.g. few neurons are labeled in the lateral ganglia or retrovisicular ganglia). The authors should explicitly recognize this bias in the hand-annotated ground-truth dataset.

We agree with this comment. NeuroPAL is at the moment the best available source of neuron-identity ground truth, since it offers more labeled neurons than other available strains. Since we are not experts in NeuroPAL, we tried our best at annotating the ground truth and may be biased in the cells identified. Details of the number of annotated cells in each of the anterior, middle, and posterior regions of the head ganglia in the datasets are provided below. We have added these details in the manuscript as Figure 2—figure supplement 3. As can be seen, we have good coverage in the anterior and lateral ganglia. In our experience, annotating cells in the retrovesicular ganglion (included in the posterior region in our analysis) and the ventral ganglion (included in the middle region in our analysis) is the hardest. In animals with rotated orientations we have lower coverage, because manually identifying neurons is more difficult in that case. As the reviewers said, the more data available, the better the eventual model for prediction.

Reviewer #1:

The application of conditional random fields to neuron registration in C. elegans represents a significant advance that will impact the field. I have major concerns about the interpretation of the method's performance: some technical, for example test sets were seemingly not held out from the data-driven atlas; and others more related to confusing or potentially misleading presentation. It was often unclear which dataset, method, reagent or performance metric (for example Top 1 vs Top 5) was being discussed. I also have concerns about giving appropriate credit for work and reagents from other groups.

1) Test sets used to assess performance do not seem to have been held out from the data-driven atlas, which raises doubts about the interpretation of Figure 2A and Figure 6E and C. This could artificially inflate performance compared to tests with a held-out animal, which would reduce the utility of the method. This could also explain the dramatic difference in performance observed from synthetic data with experimentally derived noise (35% accuracy, Figure 2—figure supplement 4A) and real experimental data matched to the data-driven atlas (>75% accuracy, Figure 2A).

Please see our explanation above (#1 in the overall response).

2) Lack of clarity could falsely give readers the impression of higher performance:

a) It is often ambiguous as to whether Top 1 or Top 5 performance is being plotted. Example: in Figure 3B the caption makes no mention, so one would assume Top 1 accuracy; but subsection “Cell annotation in gene expression pattern analysis” suggests Top 5. This is a crucial distinction and needs clarity in every subpanel.

Please see our explanation above (#6 in the overall response).

b) It is also unclear when synthetic data is being used. Example: subsection “Identity assignment using intrinsic features in CRF_ID outperforms other methods” states that the algorithm "maintains high accuracy even when up to 75 % of cells in atlas were missing for data" but from the figures I gather this is synthetic data with unnaturally low position noise, so this could give the reader the wrong impression.

Please see our explanation above (#2 in the overall response).

The point about the synthetic-data accuracy is not to claim a particular absolute accuracy for the algorithm, but rather to compare against existing methods (particularly registration): for a fixed amount of position noise, CRF maintains higher accuracy even at a high count-noise level (75%). Further, we show that this holds across all position-noise levels (Figure 2—figure supplement 5). We changed the language to say “higher accuracy” rather than “high accuracy”.

To demonstrate the CRF_ID method’s advantage over registration methods on experimental data, we show accuracy in several cases (Figure 2A, Figure 3B, Figure 4B, 4C, Figure 6B, 6C, 6E).

c) Similarly, it is often unclear from the figures which strain is being used. For example the strain in Figure 4A-C is not labeled, but it appears to be the same as Figure 3A,B. But if that's the case its unclear why performance would be so much higher in Figure 4B than Figure 3B. Please label which strain is used in which figure subpanel throughout the manuscript, or provide a table.

Figure 4B is a demonstration of cell annotation in the calcium-imaging case. Here, the candidate list of cell identities is known a priori in practical experiments (based on the promoters used to express GCaMP). The task is to assign labels from the candidate list to the cells in images, hence a smaller atlas is used for automatic annotation; a smaller atlas of candidate identities also implies smaller count noise. In comparison, Figure 3B is a demonstration of gene-expression pattern analysis. In this case, no candidate list of cell identities is known a priori, so the full brain atlas is used for automatic annotation. Assigning identities to a few GFP-labeled cells using the whole-brain atlas implies large count noise. Thus, the annotation task in the gene-expression case is more difficult, hence the lower accuracy.

Labeling methods and strains directly in the figures made them look very busy, so we have moved this information into the legends instead; strain information is now given in the figure legends. Changes were made in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 2—figure supplement 1, Figure 2—figure supplement 2, Figure 2—figure supplement 3, Figure 2—figure supplement 4, Figure 2—figure supplement 5, Figure 2—figure supplement 6, Figure 4—figure supplement 1, Figure 5—figure supplement 1, Figure 6—figure supplement 1, Figure 6—figure supplement 2, Figure 6—figure supplement 3.

d) Related: Figure 4A prominently displays "calcium imaging," when this is in fact a GFP strain. This misleads the reader.

We used the GFP strain as a proxy for a GCaMP strain to demonstrate cell identification (mentioned in subsection “Cell annotation in multi-cell functional imaging experiments”). To clarify, we have now changed the Figure 4A title to “Mock multi-cell calcium imaging”.

3) The work critically relies on reagents from other groups but could do a better job acknowledging those contributions:

a) The Neuropal strain is crucial to the manuscript and provides the majority of the ground-truth data starting with the Results and Figure 2, but Neuropal is first mentioned only towards the very end of the Results section and even then only in the context of Figure 6.

We have cited the NeuroPAL strain earlier (Introduction).

b) The strain list should reference publications for strains and alleles used.

We have made the changes (Material and Methods section).

c) Subsection “Cell annotation in gene expression pattern analysis” misleadingly states that "a reporter strain was crossed with pan neuronal red fluorescent protein" but this cross was not performed by the authors, it was performed by another group, published and is publicly available on CGC. This should be fixed.

We have made the changes.

4) More details are needed about the ground truth datasets, including the number of neurons hand-annotated in each of the 9 individuals and their spatial location. (Manual annotation of Neuropal is easiest at the periphery and hardest in center of the ganglia, and so it would be important to know coverage).

Please see our explanation above (#7 in the overall response).

Reviewer #2:

Whole brain imaging in C. elegans allows us to monitor the activities of a large fraction of worm neurons during behaviors. However, the identity of each cell in the nerve ring remains largely unknown, prohibiting further analysis that integrates structural and functional information of the worm circuit. This problem has been partially amended by manual annotation from experts. However, this is a time-consuming and error-prone solution when one is dealing with a large dataset. We are enthusiastic about this manuscript because the authors take one important step towards automated annotation of neuronal identities. Methods in this work are innovative and attractive. By minimizing energy functions imposed by absolute coordinate differences (unary potential) between images and atlases, as well as differences of the pairwise relationship between neurons, the authors show that their CRF_ID model outperforms SOTA registration model in accuracy, speed, and robustness.

We would like to raise several main issues of this manuscript.

1) Writing. The main text of the manuscript is brief, dense, technical, and very difficult to understand. I skipped the main text and went directly to the Appendix. This arrangement is not wise for a general audience, especially biologists. In future revisions, please embed the figures in the text, which would save us a lot of time. Here are some specific suggestions:

a) The CRF framework can be better explained by building upon physical intuition. The mathematical notations sometimes only cause confusion. For example, is it necessary to introduce cliques? I thought that every neuron in their graph model is adjacent to every other neuron. If this is not the case, please describe how the graph model is constructed. We find the note on the maximum entropy model out of the blue (subsection “Structured prediction framework”). If you want to make a deep connection with statistical physics, you better explain it well in the text!!

We appreciate this comment from the reviewer. We have re-written several sections to make the methodology clear (please see our response to #3 in the overall comments). We have moved the figures into the text to facilitate readability, we have moved some supplemental text into the main manuscript, and, perhaps most importantly, we have added cartoons to facilitate the explanation of what is done. We now emphasize intuition-building for potential users while maintaining some terminology often used in the graphical-model literature, to be consistent with that body of literature.

To some of the specific points from the reviewer:

- It is true that every neuron is adjacent to every other neuron in the graph because the underlying graph structure in our model is a fully connected graph. We have clarified this in the text.

- In mentioning the maximum entropy model, we did not intend to make a deep connection with statistical physics. Our aim was to draw an analogy between our data-driven atlases and sufficient statistics in maximum-entropy models (exponential-family models in general). We realized that this was confusing and did not add much to the understanding of our model or its use, so we have removed this text. We now emphasize in the text, with intuitive explanation, how the statistics can be updated from available data using simple averaging operations, which makes this approach computationally advantageous over methods where updating is costly (a minimal sketch of such an averaging update is given below).
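For illustration only, the following sketch shows how an averaging-style atlas update might look. The anterior/posterior indicator is a simplified stand-in for the positional-relationship features actually used in CRF_ID, and all data are toy values.

```python
import numpy as np

def anterior_indicator(positions):
    """1.0 where cell i is anterior to cell j along AP (here, x), else 0.0."""
    x = positions[:, 0]
    return (x[:, None] < x[None, :]).astype(float)

class PairwiseAtlas:
    """Running average of a pairwise positional-relationship statistic."""
    def __init__(self, n_cells):
        self.sum = np.zeros((n_cells, n_cells))
        self.n = 0

    def update(self, positions):
        """Fold one annotated dataset into the atlas; a cheap averaging step."""
        self.sum += anterior_indicator(positions)
        self.n += 1

    @property
    def freq(self):
        """Estimated frequency with which cell i is anterior to cell j."""
        return self.sum / max(self.n, 1)

rng = np.random.default_rng(3)
atlas = PairwiseAtlas(n_cells=20)
for _ in range(5):                      # five annotated worms (toy data)
    atlas.update(rng.random((20, 3)))
```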

b) The motivation to build and how to build the data-driven atlases are dismissive in the main text. It is briefly mentioned in subsection “Identity assignment using intrinsic features in CRF_ID outperforms other methods”. But these sentences are extremely difficult to understand without reading the Appendix. The Appendix -- extended methods S1.7, however, needs to be revised and better explained. For example, the equations are not quite consistent, such as in Equations 6 and 25, and the notations are quite confusing. Linking some examples like Figure 2B to S1.7 will help understand the concept.

We appreciate the reviewer’s sentiment here. We have changed the text to clarify these points (Results; Appendix – extended methods S1.7). Briefly, we explain that the current atlases (including the OpenWorm atlas) are too idealistic; this is substantiated by our data as well as other published work. For example, in the OpenWorm atlas, left-right cell pairs are exactly symmetric about the AP axis, whereas this is rarely true in experimental data. Because data-driven atlases can capture the actual observed positional relationships among cells in biological data, and because they can be updated (or specialized) as new data become available, they should prove more useful.

Our paper is motivated by the following important point: data-driven atlases achieve higher accuracy. One example is to compare the accuracy achieved on the same datasets using a data-driven atlas with test data held out (Figure 2A) versus the OpenWorm atlas (Figure 2C, left panel). Data-driven atlases can be built easily using the provided code, as per the experimental need. For example, for multi-cell calcium imaging, an atlas can be built only for the cells of interest; for whole-brain imaging experiments, a separate atlas can be built. We echo these points in the Discussion.

We have revised several sections in the Appendix – extended methods:

1) The motivation and intuition behind building the data-driven atlas are explained clearly in Appendix – extended methods S1.7 and in the main text.

2) Equations and notations are now consistent. In particular, for Equations 6 and 25 we explain how these equations are consistent using Equation 26.

3) We have added an additional figure, Figure 1—figure supplement 1, that schematically explains how the features in the model are updated when a data-driven atlas is used compared to a static atlas.

c) The prediction accuracies of the model vary from case to case. It was quite low in the scenario of missing neurons. The authors need to state clearly in their paper, under the worst scenario such as whole brain imaging without colors, the prediction accuracy of their methods. This information is now buried somewhere that is hard to find. The prediction accuracies of the CRF model, contingent upon different datasets and different noise levels, would better be a self-contained main figure, not in the supplements. To follow the logic flow of the text, the figure panels that compared CRF with other registration methods would better be introduced later.

Please see our response to overall point #4. Again, we stress that prediction accuracy depends upon the position noise and count noise present in the data relative to the atlas. The levels of these noise types depend on the application, e.g. different amounts of count and position noise in multi-cell calcium imaging experiments compared to whole-brain imaging. We have characterized several position- and count-noise cases using synthetic data and shown that the CRF model performs better than the registration method. In fact, the CRF model is very robust in handling count noise compared to registration-based methods (Figure 2D, Figure 2—figure supplement 5B).

Also, Figure 2A shows the accuracy on experimental whole-brain imaging datasets without color.

2) Notations. As mentioned above, the symbol consistency is a problem. In the formulation of the model, λ's, trainable parameters in original CRF, have been abused. For the λ appearing in Equations 2/3/4/6/7/8/9/10/32/34/37/46, the author might consider making a clear statement as to which of them are trainable and which are hyperparameters. If they are hyperparameters, the setting and tuning strategies need to be disclosed. Also, the keywords "identification" and "annotation" should be considered unified.

We have now clearly described in Appendix – extended methods S1.2 which terms in Equations 6/7/8/9/10 are hyperparameters, along with the methodology for setting them. The explanation for the λs in Equations 2/3/4 follows from this; briefly, all λs in these equations are hyperparameters. We also note that old Equations 37/46 (new Equations 38/49) are not our model; the λs in those equations only demonstrate previous methods.

In our model, the trainable parameters are either the feature-function values, e.g. the positional-relationship feature values in Equations 6/7/8, or the inputs to the feature functions, e.g. the distance between cells in Equation 9 and the angle vector between cells in Equation 10. These parameters are updated while building the data-driven atlas, as described in Appendix – extended methods S1.7.

3) Loopy belief propagation, speed, and stability. The authors should provide details on the computational complexity and the stability of their algorithms and compare it to other registration methods. In their current graph model, does each trial converge, and do they all converge to the same solution starting from different initial conditions? If not, how do the authors deal with the discrepancies? Likewise, the authors may want to comment on the memory usage for constructing the data-driven atlases to see if it can be another innovation point compared to the registration methods.

We thank the reviewer for pointing out potentially confusing points. Each trial with a defined initial condition yields a converged solution; in other words, the solution to the optimization problem is deterministic. If, however, different missing neurons (typically randomly assigned) are used, then the solutions may differ. This is the origin of the ranked list of assigned names for each cell.

We do want to acknowledge that the CRF method for each round of solution is computationally intensive (because it is a quadratic method, as opposed to linear methods), but updating the model with new data is cheap (while other methods require re-optimization using all data simultaneously).

Separately, as the reviewer pointed out, atlas building compared to registration methods is indeed a big point. For registration methods, the issue is that not only are the computational requirements large, but there is also no systematic method of building the atlas. First, not all images can be registered simultaneously, due to memory constraints. Second, atlases are built iteratively by blockwise registration of groups of images to a reference image. Without a systematic methodology for choosing the reference image, the atlas is biased towards the reference image, and the choice of the reference image is often not “principled”. The atlas can also be biased by the order in which blocks of images are registered to the reference frame. Tackling this challenge is an active field of research (Wang et al., 2008; Evangelidis et al., 2014). In comparison, in the CRF method, atlas building is unbiased towards any specific image set because there is no concept of a reference image, and the atlas can be built from all images simultaneously because the underlying mathematical operations are cheap. This is a major take-away from this work. We echo these points in the Discussion.

4) Other missing information and explanations:

a) The authors should post the exact name and its settings of the registration models used for comparison.

This information has been added in Appendix-extended methods S2.1.

b) In Appendix S1.2.1 Unary potentials – Positions along AP axis, as x-y is a 2D plane, we do not see why AP's is always better than LR's and DV's.

Here, we wanted to point out that AP is always better because the AP axis always lies in the xy plane, whereas one of the LR or DV axes (depending on the orientation of the worm) lies along the z-direction. Since z-sampling in images is lower in resolution than xy-sampling, measurements of neuron positions along z contribute noise to the positional relationships.

c) Following the explanation of generating synthetic data (subsection “Generating synthetic data for framework tuning and comparison against other methods”) is a bit hard. The reason for changing the low bound to zero might need further illustration. Could you please comment on why the prediction accuracy is much lower in synthetic data (Figure 2A)?

To generate synthetic data, we start with the OpenWorm atlas and add two types of noise to mimic the noise present in experimental data: position noise (random Gaussian noise added to the spatial positions of cells) and count noise (a random combination of cells removed from the data). For both position and count noise, we swept through a range of noise levels for comparison against registration methods. In particular, for position noise we swept from zero (zero noise variance) to an upper-bound noise (the seventy-fifth percentile of the position variance of neurons observed experimentally). That is, we did not change the low bound to zero; we simply expanded the range of parameters swept for comparison against other methods. Results for the different levels of position noise are shown in Figure 2—figure supplement 5C.
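A minimal sketch of this two-noise generation procedure is below; the noise parameters and the random atlas are illustrative toy values, not the values used in the paper.

```python
import numpy as np

def make_synthetic(atlas_xyz, pos_sigma, count_noise_frac, rng):
    """Add Gaussian position noise, then drop a random subset of cells."""
    noisy = atlas_xyz + rng.normal(0.0, pos_sigma, atlas_xyz.shape)
    n_keep = int(round(len(atlas_xyz) * (1.0 - count_noise_frac)))
    kept = np.sort(rng.choice(len(atlas_xyz), size=n_keep, replace=False))
    return noisy[kept], kept            # noisy positions + their true labels

rng = np.random.default_rng(4)
atlas = rng.uniform(0.0, 1.0, size=(200, 3))   # stand-in for a 200-cell atlas
cells, truth = make_synthetic(atlas, pos_sigma=0.02,
                              count_noise_frac=0.5, rng=rng)  # 50% count noise
```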

Figure 2A is not synthetic data. It is experimental data, and predictions were performed using a data-driven atlas built from experimental data with the test dataset held out.

In comparison, Figure 2C (left panel) is experimental data for which prediction is performed using the OpenWorm atlas. The lower accuracy in Figure 2C (left panel) compared to Figure 2A highlights the fact that the positional relationships of cells in the OpenWorm atlas are not representative of experimental data. This is also the main motivation for building data-driven atlases and a major point of the work.

d) For subsection “Whole-brain data analysis”, the author might consider linking the paragraph to Figure 5, making 0.1Hz a reasonable choice.

We have made the changes.

e) For subsection “Whole-brain data analysis”, "[0.05, 0.05]" seems to be a mistake.

This is the bandwidth parameter used in the kernel density estimation method for estimating joint probability distributions of neuron activity and motion speed.

f) Subscripts in Equation 13 are wrong.

Thank you. We have made changes.

g) Providing more annotations and explanations on the meaning of the colors on some worm images (e.g., Figure 4D) would be helpful.

Thank you. We have added color annotations in Figure 4D.

5) Codes on GitHub may need to be slimmed down.

We have moved extra files (e.g. examples and extra functions) to a different folder on GitHub.

Reviewer #3:

Linking individual neurons to their anatomical names is a vexing problem that limits efforts to determine patterns of gene expression and neuronal activity at the single-neuron level. Given the landmark Mind of the Worm paper (White et al., 1986) and subsequent analyses of its connectome, many imagine that annotating individual neurons in C. elegans is a solved problem. Unfortunately, it is not. This paper presents a computational method for solving this problem and, as such, has the potential to advance neurobiology in C. elegans. The approach could, in principle, be adapted to any nervous system from which a computational atlas could be derived following the pipeline laid out in this manuscript. There is much to recommend this study and many opportunities for improvement regarding the presentation of the computational method, its limitations and expectations for application, and for considerations of generalization to other nervous systems.

1) Explanation of the computational method

The computational strategy is difficult to follow in the main text, unless one reviews the appendices. We suggest elaborating the process using more generic terms in the main text. The following specific elements need clarification or additional support

- The mixed approach of considering intrinsic and extrinsic similarities between image and atlas is described in a manner that assumes readers are already familiar with this computational strategy. Given the breadth of the eLife readership, the authors would increase the accessibility and impact of the paper by providing readers with examples of what features are intrinsic and which are extrinsic in this case.

We thank the reviewer for this comment. We agree that it is a balance between detail and confusion. We have made changes in many parts of the text to address this issue, e.g. we have re-written the subsection “Cell annotation formulation using structured prediction framework” to make the methodology clear. To clarify the point about intrinsic and extrinsic similarities, we have also included a supplemental figure illustrating these relationships (Figure 1—figure supplement 1).

- The difference between this approach and previous strategies based on image registration is communicated clearly in the supplement (lines 906-910), but not in the main text.

With the new additions, we now describe the conceptual differences between our approach and registration-based methods in several places, including the Introduction and Discussion, Figure 1—figure supplement 1, and Figure 2B. Technical details on the differences in the objective functions used by these methods are provided in Appendix – extended methods S1.9.

- Much is made of the computational efficiency of the CRF_ID approach, but little is provided in the way of data or analysis supporting this claim. The authors could, for example, compare the computational energy/time needed for this approach compared to prior image matching approaches. They could discuss the nature of this computational framework compared to others with respect to computational efficiency.

We thank the reviewer for this comment. This can indeed be confusing. We claim computational efficiency relative to registration methods only for building data-driven atlases. See the Discussion.

In terms of the run time for optimization, CRF is more computationally expensive because it is a quadratic method as opposed to a linear method. We argue, however, that this is ok because of the gain in accuracy, reduction of bias, and the elimination of the need of choosing references. Please also see responses to related comments (#3) from reviewer 2.

Generality of the method

To better support the authors claim that this method is easily adaptable for any nervous system, please apply the framework to estimate the optimal number (and/or density) of landmarks as a fraction of the total number of nuclei or density in 3D. This would assist others in applying the framework to other simple nervous systems (e.g. Aplysia, leech, hydra).

Our simulation results show that accuracy increases with more landmarks, i.e. there is no saturation point. Thus, there is no optimal number of landmarks from the computational point of view. Practically, however, there are constraints, such as the availability of promoters for genetic control of expression and whether landmarks can be easily distinguished. For instance, if too many densely packed cells are labeled as landmarks such that their identities cannot be easily determined manually, then the automated method cannot use these cells as landmarks. The distinguishability of landmarks depends not only on density in 3D but also on the geometry of the nervous system. Further, genetic labeling strategies may be too cumbersome, or appropriate promoters may not be available, to label all desired landmarks. Therefore, it is not possible to provide a general estimate of the optimal number of landmarks.

Importantly, we want to note that even without landmarks, CRF model can achieve high accuracy (Figure 2A) by using data-driven atlases. Thus, in our opinion, data-driven atlases are more critical for achieving high accuracy compared to landmarks.

Additionally, it is not clear if the image stacks used to derive data-driven atlases are distinct from the data analyzed for prediction or whether the same data is used iteratively to build atlases and improve prediction accuracy.

As discussed in the overall response (#1), we performed the hold-out exercise – holding out a test dataset for prediction and building positional relationship and color atlases using the remaining datasets. We do not iteratively build atlases using the same data on which prediction is to be tested.

Application of the method to C. elegans

Specifying the AP/DV/LR axes from raw data is not completely automatic but rather appears to require some user input based on prior knowledge. This is not stated clearly in the text. In addition, it is also not clear how the algorithm handles data from partially twisted worms (non-rigid rotation around the AP-axis). This problem is more likely to be encountered if the images capture the whole worm. Is it possible to allow rotation of the DV and LR axes at different points along the AP axis?

The AP/DV/LR axes are defined using PCA on the positions of cells in images, and are thus determined automatically. The only user input is to determine the polarity – which end of the worm is anterior (dorsal/left); this is because the PCA algorithm does not distinguish the AP from the PA axis (the two are −1 multiples of each other), and the same is true for the DV and LR axes. In our framework, this ambiguity is easily resolved by the user clicking on any neuron in the anterior (dorsal/left) part of the worm. No other prior knowledge is needed from users (a minimal sketch of this step is given below). The Materials and methods section has a description of this.
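To illustrate, the sketch below extracts body axes via PCA and resolves the polarity with a single "clicked" cell; the sign convention and the toy point cloud are illustrative assumptions, not the actual CRF_ID implementation.

```python
import numpy as np

def body_axes(positions, anterior_cell):
    """PCA on cell positions; orient the first axis using a user-clicked cell."""
    centered = positions - positions.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows = PCA axes
    ap = vt[0]                                    # longest axis ~ AP direction
    # sign convention (illustrative): the clicked anterior cell should project
    # onto the negative side of the AP axis
    if centered[anterior_cell] @ ap > 0:
        ap = -ap
    return ap, vt[1], vt[2]                       # AP axis plus the two minor axes

rng = np.random.default_rng(5)
pts = rng.normal(0.0, [3.0, 1.0, 0.5], size=(50, 3))  # elongated toy point cloud
ap, minor1, minor2 = body_axes(pts, anterior_cell=0)
```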

Our strategy for handling non-rigid rotations about the AP axis is to appropriately rotate the LR and DV axes about the AP axis (as described in Appendix – extended methods S1.3). This is done by rotating the LR axis using easily identifiable LR neuron pairs in the image. In the current framework (dealing with head-ganglion cells), the same rotation is applied at all points along the AP axis. However, we agree with the reviewer that different rotations could be applied at different points along the AP axis; this is not implemented in the current framework (a sketch of the single-angle correction is given below).
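A minimal sketch of the single-angle axis correction, using the standard Rodrigues rotation formula; the axes and angle here are toy values, and how the angle is estimated from an LR neuron pair is omitted.

```python
import numpy as np

def rotate_about_ap(vec, ap_axis, theta):
    """Rodrigues rotation of `vec` by theta about the (normalized) AP axis."""
    k = ap_axis / np.linalg.norm(ap_axis)
    return (vec * np.cos(theta)
            + np.cross(k, vec) * np.sin(theta)
            + k * (k @ vec) * (1.0 - np.cos(theta)))

ap = np.array([1.0, 0.0, 0.0])                  # AP axis from PCA (toy value)
theta = np.deg2rad(30.0)                        # angle from an LR neuron pair
lr_corrected = rotate_about_ap(np.array([0.0, 1.0, 0.0]), ap, theta)
dv_corrected = rotate_about_ap(np.array([0.0, 0.0, 1.0]), ap, theta)
# A per-segment extension would simply make theta a function of the AP coordinate.
```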

Specific suggestions for improvement are collected below.

1) The method depends on the tidy segmentation of nuclei, but the segmentation strategy is not discussed nor is whether or not nuclear localization of landmarks or unknown markers is essential for accurate annotation.

We have added the description of the GMM method in subsection “Whole-brain data analysis”.

Nuclear localization is not necessary as long as cells can be distinguished (e.g. results in Figure 3B, Figure 4B, 4C for strains without nuclear localization). If cells are not detected accurately, this leads to count noise. We have characterized various count-noise levels (Figure 2—figure supplement 5).

2) The authors need to explicitly inform readers that the markers in the NeuroPal strains (OH15495, OH15500) and the GT strains are localized to neuronal nuclei. Additionally, the authors need to address (1) whether or not nuclear localization is required for the annotation framework to succeed, and (2) discuss whether or not a cytoplasmic marker would be sufficient, and (3) provide more information on the segmentation algorithm used and recommended for use with the computational framework discuss.

Nuclear localization is not necessary as long as cells can be distinguished (e.g. results in Figure 3B, Figure 4B, 4C for strains without nuclear localization). If cells are not detected accurately, this leads to count noise. We have characterized count noise (Figure 2—figure supplement 5).

We use a Gaussian Mixture based segmentation algorithm. We have made changes in the text to clarify this point.

3) The authors use the AML5 strain as a proxy for GCaMP-labeled strains (subsection “Computational workflow for automatic cell identification”) to demonstrate the utility of the annotation tool for identifying cells used for calcium imaging. This strain differs from the NeuroPal strain expressing GCaMP in several respects that are not discussed by the authors, leaving readers to wonder about the utility of AML5 as a proxy and to question whether or not the success of cell identification in this strain is truly generalizable. Two main differences exist: 1) In AML5, GFP is expressed in both the nucleus and cytoplasm, but in OH15500 GCaMP6s is tagged with an NLS; 2) The basal fluorescence of GCaMP6s, but not GFP, depends on basal calcium concentrations – this fact may introduce additional detection noise/variance into the annotation problem. The authors need to discuss how these differences affect (or don't) the ability to generalize from success with the proxy strain to success with NLS::GCaMP6s expression.

To clarify, according to Nguyen et al., 2016:

1) Nuclear localization: As discussed earlier as well (response to #5 overall comments), CRF_ID model works regardless of marker localization, as long as the cells can be separated/distinguished. If the cells cannot be separated, this leads to count noise. We have demonstrated superior performance of CRF_ID in handling count noise.

2) Cells with only basal fluorescence: basal fluorescence alone could lead to cell-detection errors, thus contributing count noise. We have characterized various count-noise levels for both experimental data (Figure 4C) and synthetic data (Figure 2D, Figure 2—figure supplement 5B), and shown that the CRF model performs better than registration methods. In practice, for GCaMP videos, data from multiple frames can be used to obtain a superset of cells (thus removing count noise) before annotation with CRF_ID. This would directly address the basal-fluorescence issue.

3) The AML5 example (Figure 4B, 4C) demonstrates a use case different from whole-brain imaging (fast annotation of cells in multi-cell calcium imaging videos). To demonstrate success in whole-brain imaging, we show predictions for whole-brain datasets (Figure 2A, Figure 6C).

4) Please clarify the nature of the output from the code that implements the framework. In other words, a reader should learn from the paper what kind of output file they might receive if they attempted to apply the tool to their own images (annotated image with top prediction candidate, data structure with all predictions and respective probabilities, etc).

We run the code in two modes:

Mode 1 – a single run gives a deterministic prediction for each cell.

Mode 2 – multiple runs that take missing cells into account. Each run generates predicted identities for the cells in that run; the results from the various runs are then used in a subsequent step to generate a top-candidate list for each cell, with a label-consistency score for each candidate (a minimal sketch of this aggregation is given below).
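For illustration, the aggregation step could be sketched as follows; here we simply assume each run returns a mapping from cell index to predicted label, and take the consistency score to be the fraction of runs assigning that label, which is a simplification of the actual output.

```python
from collections import Counter

def rank_candidates(runs, cell, top_k=3):
    """Ranked (label, consistency) pairs for one cell across all runs."""
    votes = Counter(run[cell] for run in runs)
    return [(label, count / len(runs))
            for label, count in votes.most_common(top_k)]

# toy example: three runs, each with a different random subset of cells removed
runs = [{0: "AVAL", 1: "RMDL"},
        {0: "AVAL", 1: "RMDDL"},
        {0: "AVAR", 1: "RMDL"}]
print(rank_candidates(runs, cell=0))   # AVAL wins with consistency 2/3
```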

We have clarified the text about these modes of running the algorithm (subsection “Computational workflow for automatic cell identification”).

(5) Terminology – There are a number of general computational terms and specific terms relevant to this problem that are not defined for the reader (listed below). Providing readers with implicit or explicit definitions of these terms will improve clarity.

5(a) "count noises" – used to represent variation in the number of nuclei detected in a given image stack, but not defined at first use. Please provide conditions under which the number of nuclei detected might vary along with defining this term. Additionally, this is a singular concept and should be written as "count noise" not "count noises" (by analogy to "shot noise").

5(b) There several other instances in which "noises" is used where "noise" is meant.

Count noise (defined in the Introduction and onwards) is not variation in the number of nuclei detected. It is the difference between the number of nuclei detected in the image and the number of cells present in the atlas to which the image is to be matched. For example, if 100 cells are detected in an image whereas 200 cells are present in the atlas (i.e. 200 labels), the count noise would be (200 − 100)/200 = 50%.

The practical reasons for count noise are mosaicism in the expression of the genetically encoded marker/reporter, incomplete coverage of the marker/reporter (e.g. a “pan-neuronal” promoter not covering every neuron), and errors in the cell segmentation method due to dim or tightly clustered cells. These reasons are given in the same paragraph.

We have changed all mentions of “count noises” to “count noise”.

5(c) "intrinsic similarity" – which seems to represent pairwise positions in 3D in the image and the atlas.

5(d) "extrinsic similarity" – not clear.

We have included Figure 1—figure supplement 1 to clarify these points.

(6) (Subsection “Identity assignment using intrinsic features in CRF_ID outperforms other methods”) "More importantly, automated annotation is unbiased and several orders of magnitude faster than manual annotation by researchers with no prior experience."

(a) Unbiased – It is a common misconception that computational strategies are unbiased. For this claim to be rigorously accurate, the authors would need to demonstrate that each and every nucleus was equally likely to be identified correctly by the automated annotation algorithm. For instance, it is easy to imagine that some nuclei, such as those closer to a landmark, are more likely to be identified correctly and with less uncertainty than those further away. If this were true, then automated annotation has a systematic bias in favor of nuclei closer to landmarks and a systematic bias against nuclei further away from landmarks. And would, therefore, be biased – even though no human supervision is involved. The authors should back this claim up with data or clarify that what they mean is that automated annotation is an improvement over (biased and subjective) human annotation.

We agree with the reviewer that if there are biases in making the atlas, then those biases will be reflected in the predictions. For example, if no information is present for some cells in the atlas, then it may be expected that prediction accuracy of those cells will be lower, compared to cells for which a lot of information is present in the atlas. However, this is true for any machine learning method, as all machine learning methods are as (un)biased as the training data. (The same perhaps can be said about human curation.)

Our use of the terminology is with respect to maximum-entropy models. With a log-linear parameterization of features, the CRF model is similar to maximum-entropy models. Further, maximum-entropy models produce a maximally unbiased (maximum-entropy) probability distribution over labels subject to constraints, which in our case are the observed positional relationships in the atlas.

In addition, the model is unbiased in the sense that it predicts identities using an atlas (which may be built from fully or partially annotated datasets contributed by various researchers). The CRF_ID method thus combines the knowledge of many researchers in the form of the positional-relationship atlas and then predicts identities that maximally satisfy the atlas, removing individual biases in cell-identity annotation. We have clarified this point in the text.

(b) Faster – The authors are free to speculate about the relative speed of automated vs. manual annotation by inexperienced researchers in the discussion. By writing this as a result, however, data are needed to back it up.

We only included discussions on this point in the Discussion section. Our point is that CRF_ID can be run automatically without direct input after the models are trained, so the use is simple.

(7) (Materials and methods) From the description of how GT290 and GT298 were made, it seems that both constructs include 2x NLS fragments, but the genotype description under reagents lacks that annotation and most, if not all other references to this strain neglect to mention the NLS.

We apologize for the confusion; we have clarified the genotypes. The strains are nuclear localized.

Presentation concerns that need attention: There are three instances of undisclosed data re-use among the figures that need to be addressed. (1) The image in Figure 4A is identical to the green channel of the image shown in Figure 3A; (2) Some of the data shown in Figure 2—figure supplement 4A – Part of the data shown here is also shown in Figure 2C; (3) Figure 4C – Please indicate if the same number of cells were removed in each run. Also, it appears that this data also may have been used in Figure 4—figure supplement 1. If so, please disclose data re-use.

The same strain, AML5, was used to demonstrate cell annotation in both the gene-expression and multi-cell calcium imaging cases. For the gene-expression case (Figure 3A), both the red and green channels were used for cell-identity prediction, whereas for the multi-cell calcium imaging case (Figure 4A) only the green channel was used. We have replaced Figure 4A with a new image.

Data in Figure 2C (right panel) and Figure 2—figure supplement 5A come from the same experiment, i.e. predicting cell identities in experimental datasets using the OpenWorm atlas and comparing accuracy across methods. Figure 2C (right panel) shows the comparison for only the top labels predicted by each method, whereas Figure 2—figure supplement 5A shows the comparison for the top 3 and top 5 labels as well.

Figure 2C is indeed related to Figure 2—figure supplement 5A; Figure 2C highlights the most important comparisons. For the sake of easy side-by-side comparison, we left Figure 2—figure supplement 5A as is and noted in its legend that part of the data is replotted in Figure 2C.

For Figure 4C, the same fraction (3 out of 16) but a different combination of randomly selected cells was removed in each run. We have now noted this in the figure panel. Figure 4—figure supplement 1 shows additional results for different numbers of cells removed (2, 4, and 5). We have now noted in the legend of Figure 4—figure supplement 1 that part of the data is replotted in Figure 4C.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Essential Revisions:

(1) The authors performed additional computational experiments by holding out some of the data used to create atlases – an important and general strategy for computational solutions that depend on real-world data. The authors present a comparison of this approach for reviewers, but not explicitly for readers. These analyses and plots must be included in the main text of the final version.

We have included the new data in the revised manuscript in the last round (Figure 2A; Figure 6B, C, E; Figure 6—figure supplement 1; Figure 6—figure supplement 2; Figure 6—figure supplement 3).

(2) The authors should state clearly the computational speed of their algorithms and compare it with the registration methods. The inference and subsampling procedures (~ 1000 times?) on the data-driven atlas are time-consuming and computationally expensive.

We have now included a new figure (Figure 2—figure supplement 7) comparing the optimization runtimes of CRF and registration methods, and have added a discussion.

Reviewer #2:

Main Text

(1) Thanks for sending us a much better written manuscript. The authors should consider replotting some of their figures, which can be difficult to read.

(a) The neuron names in Figure 2F and Figure 2—figure supplement 6C are difficult to read. Is there a new way to plot them? When I tried to zoom in for a better look, they caused my Preview to crash multiple times…

We have increased the size of Figure 2F and Figure 2—figure supplement 6C; they are now available on GitHub. We enlarged them to page width, but details may still be hard to see; in these figures we mainly want to show the general shape of the data. Readers interested in the exact details may download the figures from GitHub.

(b) The meaning of the error bars throughout the manuscript is not well explained.

A description of the boxplots and error bars is now included in all figure legends.

(2) The authors should state clearly the computational speed of their algorithms and compare it with the registration methods. The inference and subsampling procedures (~ 1000 times?) on the data-driven atlas are time-consuming and computationally expensive.

We have now compared the runtimes of the CRF_ID framework with registration methods. The discussion has been added in the subsection “Identity assignment using intrinsic features in CRF_ID outperforms other methods”, and the results are added as Figure 2—figure supplement 7.

(3) Is there a way to make the data-driven atlas publicly available in addition to the raw annotated datasets (n=9)?

The datasets are all on our GitHub. Positional-relationship atlases are available at https://github.com/shiveshc/CRF_Cell_ID/tree/master/Runs/Data_driven_atlases/PositionalRelationship_atlases, and color atlases are available at https://github.com/shiveshc/CRF_Cell_ID/tree/master/Runs/Data_driven_atlases/Color_atlases.

Rebuttal

(1) Pipeline (Major Point #6):

I do not quite understand the necessity of single or multi-run, which can be combined into one inference process. Consider the following scenario in a freely behaving animal. At one time, the region to be annotated consists of N neurons. At a different time, a new neuron (now N+1) is squeezed into the region. It is probably always true that the number of neurons in the data is different from that in the atlas.

We agree with the reviewer that in many experimental conditions the number of neurons in the dataset is likely to differ from that in the atlas. There are, however, use cases in which the count noise is zero (i.e. the number of cells is exactly the same as the number of available labels), for instance the multi-cell imaging scenario (Figure 4A/B), where multiple cells are labeled and the identity set of the cells is well defined (e.g. by well-known, well-behaved promoters).
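For the general case with nonzero count noise, one way to realize the multi-run idea is to repeatedly subsample the candidate label set, run single-run inference on each hypothesis, and aggregate predictions by vote. The sketch below is illustrative only; `run_inference` is a hypothetical stand-in for one CRF optimization and is not the repository's actual interface:

```python
import random
from collections import Counter

def multi_run_annotation(cells, atlas_labels, run_inference,
                         n_runs=100, seed=0):
    """Aggregate single-run predictions over random label subsets.

    cells: detected cells (fewer than atlas_labels under count noise).
    atlas_labels: list of candidate identities from the atlas.
    run_inference: callable (cells, label_subset) -> {cell: label},
    standing in for one CRF optimization (hypothetical signature).
    Returns, per cell, candidate labels ranked by vote count.
    """
    rng = random.Random(seed)
    votes = {c: Counter() for c in cells}
    for _ in range(n_runs):
        # One hypothesis about which atlas cells are present in the image
        subset = rng.sample(atlas_labels, len(cells))
        for cell, label in run_inference(cells, subset).items():
            votes[cell][label] += 1
    return {c: [lab for lab, _ in votes[c].most_common()] for c in cells}
```

In the zero-count-noise case discussed above, a single run with the full, known label set suffices, which is why we distinguish the two modes.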

(2) Color and Robustness and Evaluation Metrics (Essential revisions #1, #4, #6):

It seems that the leverage of color information for annotation is quite limited in the revised manuscript. The procedures for properly normalizing the color distribution are also complicated. I also found the following sentence to be incorrect: "Further, when leave-one-out atlases are used for both positional relationship and color (Figure 6—figure supplement 1), the accuracy achieved by top labels is marginally greater than the accuracy achieved by using leave-one-out positional relationship only (Author response image 1)". It was actually significantly lower, if I read the figures correctly. This leaves the question of how to correctly and efficiently assemble the data-driven atlas.

We apologize for the confusion. The statement compares the yellow bar in Figure 6—figure supplement 1 (leave-one-out for both position and color, at 74.5%) with the first-column orange bar in Author response image 1 (leave-one-out for position only, at 73.4%); that is why we say that including color is only marginally better. The reviewer makes a good point, with which we agree: not all information is additive. Either position or color alone carries quite a bit of information, but in this case combining them did not provide much additional information. With color normalization and an ensemble of color atlases, the leave-one-out accuracy for top labels is 80.7%.
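For readers interested in what such normalization can look like, the sketch below shows a simple, robust channel-wise percentile scaling. This is an assumed, simplified example, not the exact procedure in the manuscript:

```python
import numpy as np

def normalize_colors(rgb, upper_percentile=99.0):
    """Scale each color channel by a robust upper percentile so that
    per-cell intensities are comparable across animals and sessions
    (simplified illustration; not the manuscript's exact procedure).

    rgb: (n_cells, n_channels) array of per-cell channel intensities.
    """
    rgb = np.asarray(rgb, dtype=float)
    scale = np.percentile(rgb, upper_percentile, axis=0)
    scale[scale == 0] = 1.0  # guard against empty channels
    return np.clip(rgb / scale, 0.0, 1.0)
```

The ensemble of color atlases then helps average out residual animal-to-animal color variation that a single normalization does not remove.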

Reviewer #3:

The revision is very responsive to the summary and specific reviewers' critiques.

There remain some instances of awkward phrasing and missing details to address. These are noted in order to make the text accessible to a broader audience and to increase the impact of the study both within and beyond the community of C. elegans researchers. These include:

(1) The use of the plural "noises" where the authors probably mean the singular "noise", e.g. in “Prediction accuracy is bound by position and count noise in data”. Also, replace "amount of various noises" with "amount of noise contributed from various sources"; this latter example combines the concept of the magnitude of noise with the diverse types of noise.

The changes have been made.

(2) Shorthand terms or jargon. For example, when the text reads "the position of cells AIBL" what is meant is "the position of the cell bodies of AIBL". Similarly, replace "neurons" with "neuronal cell body" or "neuronal somata".

The changes have been made.

(3) C. elegans strains – in response to critiques of the initial submission, the authors have clarified the origin of the transgenic strains used in the study. Thank you! Please also correct the reference to AML70, which is not included in the table of strains. Based on context, it appears the authors are referring to AML32. Please ensure that this is corrected.

We have updated the strains reference table and description of making strains in the subsection “Construction of landmark strains”.

(4) Figure 5, panel E. Please refer to Figure 5—figure supplement 1, panel C, noting the cells with non-zero weights in SPC1, SPC2, and SPC3.

The changes have been made.

(5) Figure 5 panel H: Indicate the origin of the motion trace shown. It appears to belong to one of the posterior cells shown in Panel F. Please provide a rationale for this choice – since the traces in Panel F certainly indicate that motion signals differ among cells.

We have now indicated in Figure 5H the cell whose motion trace was used for the mutual-information (Figure 5G) and cross-correlation analyses (Figure 5H, I). We chose one of the posterior cells because the velocity traces of those cells were in phase with the neuron activity. We agree with the reviewer that motion signals differ among cells in phase shift and magnitude; this could be because the animal's motion is restricted in the device.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Transparent reporting form

Data Availability Statement

Code and data used in this study can be accessed at https://github.com/shiveshc/CRF_Cell_ID.git. This repository contains the following: (1) all code and individual components necessary for using the CRF_ID framework to annotate cells in new data, visualize results, and build new atlases from annotated data; (2) code to reproduce the comparisons against other methods shown in this study; and (3) all raw datasets used in this study, as well as the human annotations created for those datasets, except the whole-brain imaging datasets.

All data generated or analysed during this study are included in the manuscript and supporting files. Source data files are provided at https://github.com/shiveshc/CRF_Cell_ID.git (copy archived at https://archive.softwareheritage.org/swh:1:rev:aeeeb3f98039f4b9100c72d63de25f73354ec526/).

