PLOS Computational Biology. 2021 Apr 27;17(4):e1008914. doi: 10.1371/journal.pcbi.1008914

WormPose: Image synthesis and convolutional networks for pose estimation in C. elegans

Laetitia Hebert 1, Tosif Ahamed 1,2, Antonio C Costa 3, Liam O’Shaughnessy 3, Greg J Stephens 1,3,*
Editor: Dina Schneidman-Duhovny
PMCID: PMC8078761  PMID: 33905413

Abstract

An important model system for understanding genes, neurons and behavior, the nematode worm C. elegans naturally moves through a variety of complex postures, for which estimation from video data is challenging. We introduce an open-source Python package, WormPose, for 2D pose estimation in C. elegans, including self-occluded, coiled shapes. We leverage advances in machine vision afforded from convolutional neural networks and introduce a synthetic yet realistic generative model for images of worm posture, thus avoiding the need for human-labeled training. WormPose is effective and adaptable for imaging conditions across worm tracking efforts. We quantify pose estimation using synthetic data as well as N2 and mutant worms in on-food conditions. We further demonstrate WormPose by analyzing long (∼ 8 hour), fast-sampled (∼ 30 Hz) recordings of on-food N2 worms to provide a posture-scale analysis of roaming/dwelling behaviors.

Author summary

Recent advances in machine learning have enabled the high-resolution estimation of bodypoint positions of freely behaving animals, but manual labeling can render these methods imprecise and impractical, especially in highly deformable animals such as the nematode C. elegans. Such animals also frequently coil, resulting in complicated shapes whose ambiguity presents difficulties for standard pose estimation methods. Efficiently solving coiled shapes in C. elegans, exhibited in a variety of important natural contexts, is the primary limiting factor for fully automated high-throughput behavior analysis. WormPose provides pose estimation that works across imaging conditions, naturally complements existing worm trackers, and harnesses the power of deep convolutional networks but with an image generator to automatically provide precise image-centerline pairings for training. We apply WormPose to on-food recordings, finding a near absence of deep δ-turns. We also show that incoherent body motions in the dwell state, which do not translate the worm, have been misidentified as an increase in reversal rate by previous, centroid-based methods. We expect that the combination of a body model and image synthesis demonstrated in WormPose will be both of general interest and important for future progress in precise pose estimation in other slender-bodied and deformable organisms.

Introduction

All animals, including humans, reveal important and subtle information about their internal dynamics in their outward configurations of body posture, whether these internal dynamics originate from gene expression [1], neural activity [2], or motor control strategies [3]. Estimating and analyzing posture and posture sequences from high-resolution video data is thus a general and important problem, and the basis of a new quantitative approach to movement behavior (for reviews see e.g. [4, 5]).

The roundworm C. elegans, important on its own as a model system (see e.g. [6]), provides an illustrative example, where “pose” can be identified as the geometry of the centerline extracted from worm images [7]. Even with a relatively simple body plan, identifying the centerline can be challenging due to coiling and other self-occluded shapes, Fig 1. These shapes occur in important behaviors such as an escape response [8, 9], among mutants [10] and are a yet unanalyzed component in increasingly copious and quantitative recordings such as the Open Worm Movement Database [11].

Fig 1. The nematode C. elegans naturally exhibits a variety of coiled shapes which challenge the determination of the centerline posture, a fundamental component for quantitative behavioral understanding.


(A) An exemplar collection of images displaying coiled shapes. (B) Instantaneous worm pose encoded as the centerline curve parameterized by tangent angles θ = (θ1, … θi, … θN) ordered from head to tail. (C) Standard image processing techniques extract the centerline by morphological operations or image feature analysis and have not been able to differentiate solutions with very different centerlines (red, grey) that occur with coiled postures. The correct centerline (red) can be determined by close visual inspection (A); however, high-throughput analysis necessitates a pose estimation algorithm which is robust to fluctuations in brightness, blur, noise, and occlusion.

Classical image skeletonization methods can be used to identify the worm centerline for non-overlapping shapes [7] and are employed in widely-used worm trackers because of their simplicity and speed. For coiled or self-overlapping postures, more advanced statistical models combine image features such as edges with a model of the worm’s centerline [10, 12–15]. However, such image features are not always visible and are not robust to changes in noise or brightness, often requiring data-specific engineering which reduces portability. Another recent technique uses an optimization algorithm that searches for image matches in the “eigenworm” posture space [9], but its efficacy is limited by the slow nature of multi-dimensional image search and by the low resolving power of a comparison metric that uses only a binary version of the raw image.

With the ability to extract complex visual information about articulated objects, methods built from convolutional neural networks (CNNs) offer a new, promising direction. CNNs are the foundation for recent, remarkable progress in markerless body point tracking [16–18], including worm posture [19]. However, intensive labeling requirements by human annotators, even if assisted by technology [20], as well as the ambiguity of which or exactly how many points to label, pose a barrier to the usefulness of CNNs in posture tracking and beyond. Body point marking is challenging in the case of worm images, where the annotation task is to label enough points along the worm body to reconstruct the posture. While human annotators can quickly pinpoint the extremities of the worm body, other landmarks are less obvious. In some recordings, it is even difficult to distinguish the worm head from the tail, which makes the labeling error-prone and imprecise. Furthermore, the labeling is specific to the recording conditions and is hard to generalize across changes in resolution, organism size, background, illumination, and to rare posture configurations not specifically isolated.

We describe an algorithm, WormPose, for pose estimation in C. elegans containing two principal advances: (1) We create a model of worm shape probabilities (a generative model) which we combine with a new technique for producing synthetic but realistic worm images. These images are used for network training, thus circumventing the difficulty and ambiguity of human labeling, and can be easily adapted to different imaging conditions. (2) We develop a CNN to reliably transform worm images to a centerline curve. We demonstrate our approach using on-food behavior of N2 and mutant worms and use our results to provide a new posture-scale analysis of roaming and dwelling behavioral states. Compared to prior work, the CNN at the core of our algorithm results in dramatically increased computational speed, providing new opportunities for scientific exploration.

Design & implementation

Data requirements

Our focus is on resolving coiled, overlapping, blurred, or other challenging images of a single worm. We assume that the input data consists of videos of a single moving worm and that most of the non-coiled frames are analyzed beforehand, for example by Tierpsy tracker [21]. For each (non-coiled) frame, we require the coordinates of equidistant points along the worm centerline, ordered from head to tail, and the worm width for the head, midbody, and tail (defined in [22]). We also use the recording frame rate. WormPose 1.0 does not detect the head of the worm, so we also expect that the labeled frames provide the head-tail position at regular intervals throughout the video. Head-tail positions are an included output of the Tierpsy tracker.

Processing worm images

From a dataset described as above, we process worm images to focus on the worm object of interest. Broadly, we first segment the worm in the image and set all non-worm pixels to a uniform color. Then we either crop or extend to create a square image of uniform size with the worm in the center, cleaned of noise and non-worm objects.

The specific process of segmenting a single worm in an image can be adapted to each recording condition. For concreteness, we provide a simple OpenCV [23] implementation that is sufficient for most videos of the Open Worm Movement Database [11]. Raw images from the video are first processed by a Gaussian blur filter with a window size of 5 pixels, and then thresholded with an automatic Otsu threshold to separate the background and the foreground. The morphological operation “close” is applied to fill in the holes in the foreground image. We use a connected components function to identify the objects belonging to the foreground. To focus on objects located at the center of the image, we crop the thresholded image on each side by 15% of the image size. We isolate the largest blob in this cropped image as the worm object of interest. We calculate the background color as the average of the background pixels of the original image, and assign this background value to all pixels that do not belong to the worm object. All processed images are then either cropped or extended to be the same width and height, with the worm object set in the center. We set the default processed image size to the average length of the biggest worm in the dataset, a size large enough to encompass all examples. Alternatively, the image size can be set by the user, and the images will be resized with linear interpolation, which is useful to speed up computation on large images. The minimum image size is 32 × 32 pixels.
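As a concrete illustration of these steps, below is a minimal OpenCV sketch, assuming a grayscale frame with a dark worm on a lighter background; the function name and exact parameter choices are illustrative rather than the packaged FramePreprocessing implementation.

import cv2
import numpy as np

def segment_worm(frame):
    """Sketch of the preprocessing above, assuming a grayscale frame (uint8)
    with a dark worm on a lighter background."""
    # Smooth, then separate foreground from background with an Otsu threshold
    blurred = cv2.GaussianBlur(frame, (5, 5), 0)
    _, mask = cv2.threshold(blurred, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Fill small holes in the foreground with a morphological "close"
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))

    # Keep only objects near the image center (crop 15% on each side)
    h, w = mask.shape
    mh, mw = int(0.15 * h), int(0.15 * w)
    central = np.zeros_like(mask)
    central[mh:h - mh, mw:w - mw] = mask[mh:h - mh, mw:w - mw]

    # The largest connected component is taken as the worm object
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(central)
    if n_labels < 2:
        return None
    worm_label = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    worm_mask = labels == worm_label

    # Set every non-worm pixel to the average background color
    background_color = int(frame[mask == 0].mean())
    cleaned = np.full_like(frame, background_color)
    cleaned[worm_mask] = frame[worm_mask]
    return cleaned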

Generating worm shapes

We generate realistic worm shapes through a Gaussian Mixture Model (GMM), Fig 2A, which we fit to a collection of resolved body postures obtained from previous analyses [7, 9]. We use the GMM as a simple generative model of body shapes that enables the sampling of an arbitrarily large training set for the network, with shapes that respect the overall correlations between body parts while generalizing to more complex postures. We parameterize worm shape by a 100-dim vector of angles θ, formed by measuring the angle between 101 points equally spaced along the body’s centerline. Uncoiled shapes were obtained using classical image tracking to extract θ directly from images [7]. Coiled shapes were obtained in [9] by searching the lower-dimensional space of eigenworm projections (d = 5, obtained through Principal Component Analysis of the space of {θ}) to find the combination of eigenworm coefficients that best matches a given image, and projecting these back into the θ space. Using the classical image analysis results from [7] allows us to expand the space of possible θ beyond the one captured by the first 5 eigenworms used in [9]. We use an equal population of coiled and uncoiled postures from N2 worms foraging off-food and sample uniformly according to the body curvature as measured by the third eigenworm projection, a3. This yields a training set of ∼ 15000 θ vectors. We fit the GMM through an Expectation-Maximization algorithm which finds the set of N Gaussian components that maximizes the likelihood (see e.g. [24]). The full model is parameterized by the mean and covariance of each Gaussian, and the weight associated with each Gaussian component. We assess the trade-off between model complexity and accuracy with Akaike’s information criterion, which indicates that N ∼ 250 − 275 components would be an appropriate choice, S1 Fig. We set N = 270 components for this manuscript, but N can be tuned by the user according to the desired degree of variability in the generated worm shapes: larger N decreases the variability (the GMM is closer to the underlying training set), while lower N increases the variability in the obtained worm shapes. We train the GMM using sklearn.mixture.GaussianMixture from scikit-learn in Python [25].
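A minimal sketch of fitting and sampling such a shape model with scikit-learn is shown below; the variable thetas is a placeholder for the (∼ 15000 × 100) array of resolved centerline angles described above.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_shape_model(thetas, n_components=270):
    """Fit the GMM shape generator to an (n_shapes, 100) array of resolved
    centerline angles; each Gaussian component has a full covariance matrix."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(thetas)
    return gmm

# Example usage (thetas is a placeholder for the resolved postures above):
# gmm = fit_shape_model(thetas)
# new_thetas, _ = gmm.sample(n_samples=10000)  # draw synthetic postures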

Fig 2. We combine a generative model of worm posture with textures extracted from real video to create realistic yet synthetic images for a wide variety of naturalistic postures, including coils, thus avoiding the need for manually-annotated training data.


(A) We model the high-dimensional space of worm posture (left) by Gaussian mixtures (middle) constructed from a core set of previously analyzed worm shapes [9]. To each generated posture we add a global orientation (chosen uniformly between 0 and 2π), and we randomly assign the head to one end of the centerline. (right) We use the resulting centerline (angle coordinates) to construct the posture skeleton (pixel coordinates). (B) We warp small rectangular pixel patches along the body of a real template image (left) to the target centerline (middle), producing a synthetic worm image (right). Overlapping pixels are alpha-blended to connect the patches seamlessly. Unwanted pixels protruding from the target worm body are masked and the background pixels are set to a uniform color. Finally, the image is cleaned of artifacts through a median blur filter.

Generating synthetic images

We build a synthetic image generator to produce a worm image with a specific posture and with the same appearance as a reference image, Fig 2B. Such synthetic images have a similar appearance to real images processed as described above.

We exclusively use classical image processing techniques, including image warping and alpha blending, to effectively bend a known worm centerline from a reference image into a different posture. The reference image is typically of a non-overlapping worm, with its associated labeled features: (1) the skeleton as a list of NS coordinates (Sx, Sy) equidistant along the centerline, and (2) the worm width at three body points: head, midbody and tail. To create a new synthetic image we first draw a centerline θ of size 100 from the GMM worm shape generator. We produce target skeleton coordinates {Sx, Sy} through the transformation

Sx(i + 1) = Sx(i) + dS cos(θi)        (1)
Sy(i + 1) = Sy(i) + dS sin(θi)

for i = 1, 2…NS. The length element dS is determined by dividing the worm length of the reference image by NS − 1 and we set the origin by centering the skeleton in the middle of the target image, Fig 2A(right). If needed, the target skeleton is resampled to have the same number of points NS as the reference skeleton. We use the labeled width for the head, midbody and tail to calculate the worm width (in pixels) at all skeleton points ww(i):

 ww[0:head]=head_width

 ww[head:midbody]=interp(head_width, midbody_width)

 ww[midbody:tail]=interp(midbody_width, tail_width)

 ww[tail:Ns-1]=tail_width
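
Below is a minimal NumPy sketch of Eq 1 together with the width interpolation above; the head/midbody/tail index boundaries used here are illustrative fractions of the body, not values from the package.

import numpy as np

def skeleton_from_angles(theta, worm_length, image_size):
    """Integrate the tangent angles (Eq 1) into skeleton pixel coordinates,
    centered in the target image."""
    ds = worm_length / len(theta)
    x = np.concatenate([[0.0], np.cumsum(ds * np.cos(theta))])
    y = np.concatenate([[0.0], np.cumsum(ds * np.sin(theta))])
    x += image_size / 2 - x.mean()
    y += image_size / 2 - y.mean()
    return np.stack([x, y], axis=1)

def width_profile(n_s, head_width, midbody_width, tail_width):
    """Piecewise worm width along the skeleton, interpolated between the three
    labeled widths; the section boundaries here are fixed fractions of the
    body chosen for illustration."""
    head, mid, tail = int(0.1 * n_s), int(0.5 * n_s), int(0.9 * n_s)
    ww = np.empty(n_s)
    ww[:head] = head_width
    ww[head:mid] = np.linspace(head_width, midbody_width, mid - head)
    ww[mid:tail] = np.linspace(midbody_width, tail_width, tail - mid)
    ww[tail:] = tail_width
    return ww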

In a “reverse skeletonization”, we take small rectangular image patches of size (l, w) from the reference image and add them along the target skeleton. Along each skeleton we create rectangles of length l oriented along the direction formed by the skeleton points i and i + step, and width w(i) = wmultiplier × ww(i). The parameter step determines the length l of the rectangle. For each pair of rectangles, we find the affine transformation that maps a rectangle in the reference image to a rectangle in the target image using the function getAffineTransform from OpenCV [23]. If step is too small (equal to 1), the patches will not overlap, which creates discontinuities in the synthetic image; if step is too large, the patches become too long to follow the curvature of the worm. In practice, we set step = 1/16 × NS. We set wmultiplier = 1.2, which means the rectangle width will be larger than the actual worm width to include background pixels around the worm body.

For each pair of source-target rectangles, we use the function warpAffine from OpenCV [23] to project the pixels from the rectangle in the source image to the coordinates of the target rectangle in the target image. We combine the transformed patches into a single cohesive worm image by iteratively updating a mask image created from the overlapping regions. For each transform, we add the values of the new transformed image containing one patch to the current full image. We then multiply by the mask image set to 1 for non-overlapping areas and 0.5 for overlapping areas. We draw the rectangles from the worm tail so that the last rectangles will be of the worm head, as this configuration is more likely to occur naturally.
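
A simplified sketch of the per-patch warping and blending step follows; the rectangle corner convention and the single-pass blending are illustrative simplifications of the iterative mask update described above.

import cv2
import numpy as np

def warp_patch(reference_image, src_corners, dst_corners, target_shape):
    """Map one rectangular patch of the reference image onto the target
    skeleton. src_corners and dst_corners are (3, 2) float32 arrays holding
    three corners of the source and target rectangles."""
    transform = cv2.getAffineTransform(src_corners, dst_corners)
    height, width = target_shape
    return cv2.warpAffine(reference_image, transform, (width, height))

def blend_patches(warped_patches, overlap_mask):
    """Combine warped patches: pixels covered by two patches are averaged
    (alpha blending), a single-pass simplification of the iterative update."""
    total = np.zeros_like(warped_patches[0], dtype=np.float32)
    for patch in warped_patches:
        total += patch.astype(np.float32)
    total[overlap_mask] *= 0.5
    return np.clip(total, 0, 255).astype(np.uint8)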

The overlapping areas combine seamlessly because of the blending, but some protrusions are still visible, especially when the target pose is very coiled. We eliminate these artifacts by masking the image with a generated image representing the expected worm outline. This mask image is created by drawing convex polygons along the target centerline of the desired worm width, complete with filled circles at the extremities. We apply a median filter with a window size of 3 to smooth the remaining noise due to the joining of the patches. Finally, all non-worm pixels are set to a uniform color: the average of the background pixels in the reference image.

To add diversity to the synthetic images, we include a set of optional augmentations. We translate the target skeleton coordinates by a uniform value between 0 and 5% of the image size. We vary the worm length uniformly between 90% and 110%, and the worm thickness multiplier between 1.1 and 1.3. We randomly switch the drawing order from head to tail or the contrary, so that each is equally probable. Finally, we add an extra Gaussian blur filter at the end of the process 25% of the time, with a blur kernel varying between 3% and 10% of the image size or 13 pixels, whichever is smaller.

In WormPose, the Python implementation of the image generator is optimized for speed and memory allocation. Generating a large synthetic image is slower than generating a small one. It is also faster to limit the number of reference images, as some calculations are cached. The generation is usually split into several processes, and we use a maximum of 1000 reference images per process, chosen randomly. The number of skeleton points NS from the reference image is flexible and depends on the dataset. If NS is too small (NS ≲ 20), the synthetic worm image will be too simplistic compared to the real images. On the other hand, increasing NS too much decreases performance without gaining detail in the resulting synthetic image. We routinely use 50 ≲ NS ≲ 100.

Network architecture and training

For reasons ranging from motion blur to self-obscured postures, it is often difficult to discern the worm’s head from the tail, such as in Fig 3A. Images with similar worm shape but opposite head-tail locations have quantitatively different centerlines, thus providing a challenge to network training. To handle this ambiguity, we design a loss function that minimizes the difference between the network prediction θ^ and the closest of two labels: θa and θb = flip(θa) + π, representing the same overall pose but with swapped locations of the head and tail, Fig 3B. The output training error is the minimum of the root mean square error of the angle difference d(θ1, θ2) between the output centerline θ^ and the two training labels {θa, θb} (Fig 3C) with

ϵ(α, β) = atan2(sin(α − β), cos(α − β))
d(θ1, θ2) = sqrt( (1/N) Σ_{i=1…N} ϵ(θ1,i, θ2,i)² )        (2)

The learned function is therefore a mapping between the input image and a worm pose without regard to head-tail location, which we determine later with the aid of temporal information.
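
A minimal sketch of this symmetric loss is shown below, written here with TensorFlow; the convention of packing the two equivalent labels along the last axis of y_true is an assumption of this sketch, not necessarily the package's implementation.

import tensorflow as tf

def angle_rmse(theta_pred, theta_label):
    """Root mean squared angular difference d(theta1, theta2) from Eq 2."""
    diff = tf.math.atan2(tf.sin(theta_pred - theta_label),
                         tf.cos(theta_pred - theta_label))
    return tf.sqrt(tf.reduce_mean(tf.square(diff), axis=-1))

def symmetric_angle_loss(y_true, y_pred):
    """Compare the predicted centerline to both head-tail orientations and
    keep the smaller error; y_true is assumed to stack the two equivalent
    labels (theta_a, theta_b) along its last axis."""
    theta_a, theta_b = y_true[..., 0], y_true[..., 1]
    return tf.minimum(angle_rmse(y_pred, theta_a),
                      angle_rmse(y_pred, theta_b))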

Fig 3. We train a convolutional network to associate worm images with an unoriented centerline to overcome head-tail ambiguities which are common due to worm behaviors and imaging environments.


(A) An example image with a seemingly symmetrical worm body. (B) We associate each training image to two possible centerline geometries, resulting in two equivalent labels: θa and θb = flip(θa) + π, corresponding to a reversed head/tail orientation. (C) We compare the output centerline θ^ to each training centerline through the root mean squared error of the angle difference d(θ1, θ2) (Eq 2) and assign the overall error as loss=min(d(θ^,θa),d(θ^,θb)).

Our lightweight neural network architecture is heavily inspired by the Residual Network [26], as applied to the CIFAR-10 dataset. Our worm images are larger than the 32 × 32 pixels of CIFAR-10: we routinely pick a linear dimension of 128 pixels, and below 90 pixels the posture becomes difficult to see. The first layer of our network is therefore a 7 × 7 pixel convolution layer with 32 filters and a stride of 2 pixels, followed by a max-pooling layer with a pool size of 2 × 2 pixels and a stride of 2 pixels, which reduces the input image size early in the network. We then use a stack of 3 residual blocks, each composed of 3 basic blocks, with 32, 64, and 128 filters, respectively. We follow the ResnetV2 architecture [27], in which a batch normalization and an activation layer precede the convolution layers. We choose LeakyReLU as the activation. The last layers are a global average pooling followed by a densely connected layer of size 100.

For each dataset, we generate 500k synthetic images for training, and randomly select 10k real preprocessed images for evaluation. When training, we use a batch size of 128. We train for 100 epochs and save the model with the smallest error on the evaluation set. We use the Adam optimizer [28] with a learning rate of 0.001.
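
The sketch below illustrates such an architecture with tf.keras; the exact strides, projection shortcuts, and layer options are illustrative choices consistent with the description above, not a copy of the package's code.

import tensorflow as tf
from tensorflow.keras import layers

def basic_block(x, filters, stride=1):
    """Pre-activation (ResNetV2-style) basic block with LeakyReLU."""
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.LeakyReLU()(y)
    if stride != 1 or x.shape[-1] != filters:
        # Project the shortcut when the spatial size or channel count changes
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(y)
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.LeakyReLU()(y)
    y = layers.Conv2D(filters, 3, strides=1, padding="same")(y)
    return layers.Add()([shortcut, y])

def build_model(image_size=128, n_angles=100):
    inputs = layers.Input((image_size, image_size, 1))
    # Early reduction: 7x7 stride-2 convolution then 2x2 stride-2 max pooling
    x = layers.Conv2D(32, 7, strides=2, padding="same")(inputs)
    x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    # Three stacks of three basic blocks with 32, 64 and 128 filters
    for i, filters in enumerate([32, 64, 128]):
        for j in range(3):
            stride = 2 if (i > 0 and j == 0) else 1
            x = basic_block(x, filters, stride)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(n_angles)(x)
    return tf.keras.Model(inputs, outputs)

model = build_model()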

Post-prediction

Image error and outlier detection

For real data, the lack of labeled data for coiled worm images means that we cannot directly evaluate the accuracy of the network predictions. Instead, we leverage our ability to generate synthetic images and apply an image error measure between the input image and the two synthetic images generated from the two possible predicted centerlines. We generate synthetic worm images representing the two predictions, using the nearest labeled frame in time as a reference image. We crop the synthetic images to the bounding box of the synthetic worm shape plus a padding of 2 pixels on each side, and apply a template matching function between this synthetic image representing the prediction and the original image. We use the matchTemplate function from OpenCV [23] with the normalized correlation coefficient method, which translates a template image across a source image and computes the normalized correlation c at each location. The result is a correlation map of size Size(source) − Size(template) + 1, with values ranging between c = −1 (perfect anti-correlation, as would occur in a pair of reversed-intensity black and white images) and c = 1 (perfect correlation). We use the maximum value |c|max to define the image error 1 − |c|max and the location of |c|max to estimate the predicted skeleton coordinates. Frames with an image error above a threshold value are discarded.
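
A minimal sketch of this scoring step with OpenCV is shown below; it assumes the synthetic image has already been cropped so that it is no larger than the real frame.

import cv2
import numpy as np

def image_error(real_image, synthetic_image):
    """Score a predicted centerline by comparing its synthetic reconstruction
    to the real frame; synthetic_image is assumed to be cropped to the worm
    bounding box (plus padding) so it fits inside real_image."""
    corr_map = cv2.matchTemplate(real_image, synthetic_image,
                                 cv2.TM_CCOEFF_NORMED)
    _, max_abs_corr, _, best_location = cv2.minMaxLoc(np.abs(corr_map))
    # Small errors mean the prediction reproduces the image well; the
    # location of the best match estimates where the skeleton sits
    return 1.0 - max_abs_corr, best_location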

To select the threshold (potentially different for each dataset), we plot the image error distribution on a selection of labeled frames. Comparing images with their reconstructed synthetic image based on their (trusted) labels shows a distribution of low error values, S2 Fig. We select an image error threshold with a default value of 0.3, which retains the majority of the predictions while removing obviously incorrect reconstructions.

Head-tail assignment

Once the network is trained, we can predict the centerline in full video sequences, but the resulting postures have a random head-tail assignment. For each image, we augment the predicted centerline θ^ with the head-tail switched centerline θ^flipped=flip(θ^)+π. We use temporal information and the labeled frames to determine the final worm pose as either one of these two centerlines, or we discard the frame entirely in low-confidence cases.

We first create segments with near-continuous poses by using an angle distance function between adjacent frames, distance(θ1, θ2) = (1/N) Σ_{n=0…N−1} |ϵ(θ1,n, θ2,n)|, with ϵ from Eq 2. We start with the first frame and assign its head position randomly. We then calculate the angle distance between this centerline and the two possible options in the next frame. If the distance is higher than a threshold (we use 30°), we cannot reliably assign the head position by comparing to this adjacent frame. We calculate the distance on the following frames (maximum 0.2 s in the future) until we cannot find any frame that is close enough to the last aligned frame; we then start a new time segment with a random head-tail orientation. After this first process, we obtain temporal segments with a consistent head-tail position, possibly with small gaps containing outlier results to be discarded. To increase confidence in the results, we discard segments that are too short (less than 0.2 s).
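
A minimal sketch of the per-frame orientation choice used when building these segments is shown below; the function names are illustrative.

import numpy as np

def angle_distance(theta1, theta2):
    """Mean absolute angular difference between two centerlines (radians)."""
    eps = np.arctan2(np.sin(theta1 - theta2), np.cos(theta1 - theta2))
    return np.mean(np.abs(eps))

def pick_orientation(prev_theta, theta, threshold=np.deg2rad(30)):
    """Return the orientation of `theta` (as predicted, or head-tail flipped)
    closest to the previously aligned frame, or None when neither is within
    the threshold and a new segment must be started."""
    theta_flipped = np.flip(theta) + np.pi
    d_same = angle_distance(prev_theta, theta)
    d_flip = angle_distance(prev_theta, theta_flipped)
    if min(d_same, d_flip) > threshold:
        return None
    return theta if d_same <= d_flip else theta_flipped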

While the pose of the worm is consistent within these segments, there are still two possible head-tail orientations per segment: we use the labeled data from the non-coiled frames to pick the correct solution. We align whole segments with the labeled data by calculating a cosine similarity between the head-to-tail vector coordinates of the prediction and the available labels. Finally, we align the remaining segments with no labels by comparing them to neighboring segments that have already been aligned: we calculate the cosine similarity between the head-to-tail vectors of the two closest frames of the aligned and unaligned segments.

Interpolation

For an optional post-prediction step, we interpolate small gaps (max_gap = 4 frames) with a third-order spline, using the scipy.interpolate.interp1d function from SciPy [29].

Smoothing

For an optional post-prediction step, we smooth the angle time series using a Savitzky-Golay filter with third-order polynomials in 8-frame windows, using the scipy.signal.savgol_filter function from SciPy [29].
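
A minimal sketch of these two optional steps with SciPy follows; note that an odd smoothing window is used here because some SciPy versions require it, whereas the analysis above quotes 8-frame windows.

import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter

def fill_and_smooth(frame_indices, thetas, window=9, polyorder=3):
    """Interpolate angle predictions over small gaps with a third-order
    spline, then smooth with a Savitzky-Golay filter. frame_indices are the
    frames with valid predictions, thetas the corresponding (n, 100) angles."""
    all_frames = np.arange(frame_indices[0], frame_indices[-1] + 1)
    interpolated = interp1d(frame_indices, thetas, kind="cubic",
                            axis=0)(all_frames)
    smoothed = savgol_filter(interpolated, window_length=window,
                             polyorder=polyorder, axis=0)
    return all_frames, smoothed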

Implementation

In Fig 4 we show a schematic of the full computational process, which we implement in a Python package “WormPose”, with source code: https://github.com/iteal/wormpose, and documentation: https://iteal.github.io/wormpose/. We optimize for speed via intensive use of multiprocessing, and we support large video files that do not fit into memory. We provide default dataset loaders: for the Tierpsy tracker [21] and for a simple folder of images. Users can add their own dataset loader by implementing a simple API: FramesDataset reads the images of the dataset into memory, FeaturesDataset contains the worm features for the labeled frames, and FramePreprocessing contains the image processing logic to segment the worm in images and to calculate the average value of the background pixels. A custom dataset loader is typically a Python module exposing these three objects, which can then be loaded into WormPose by the use of Python entry points. A simplified example of adding a custom dataset is available in the source code repository: https://github.com/iteal/wormpose/tree/master/examples/toy_dataset. We provide a tutorial notebook with sample data and an associated trained model, which can be tested in Google Colaboratory. We also include an optional interface to export results in a custom format: for the Tierpsy tracker dataset, we can export the results to the Worm tracker Commons Object Notation (WCON) format.

Fig 4. The WormPose pipeline.


(0) We use classical image processing methods to extract partial labels of simple, non-coiled postures, and then apply a CNN-based approach to complete the missing frames which result from complex images. We analyze each video recording with a three-step pipeline. (1) We generate synthetic data with the visual appearance of the target images but containing a wider range of postures, Fig 2. We use this synthetic data to train a deep neural network to produce the centerline angles from a single image. During training, we periodically evaluate the network on real labeled images and keep the model that best generalizes. (2) We predict the entire set of target images. The images are first cropped and processed to look more visually similar to the synthetic images: background and any non-worm pixels are set to a uniform color. For each such processed image, the trained network predicts the centerline angles for both possible head-tail orientations. (3) Our algorithm produces a full image as output and we discard inaccurate results using a pixel-based comparison with the input image. Finally, we resolve the head-tail orientation by comparing adjacent frames. Once trained, the WormPose pipeline is rapid and robust across videos from a wide variety of recording conditions.

We also tested WormPose on a high-performance laptop with an Intel i7-10875H CPU and an NVIDIA GeForce RTX 2070 SUPER GPU, on the same N2 dataset and parameters of the paper. Dataset generation and network training took 6.5 hours, and predicting the full dataset of 600k+ frames took less than 2 hours. We see no barrier to performing pre-analysis, such as with Tierpsy, on a laptop, thus making the full pipeline accessible.

Roaming/dwelling analysis

To connect to previous analyses of roaming/dwelling behavior, we compute the worm’s speed and angular speed from the centroid position c = (x, y) as a function of time. To simplify the comparison with [30], we downsample the time series to 3 Hz and compute the centroid velocity as the finite difference between subsequent time points, v(t) = (c(t + Δt) − c(t))/Δt, where Δt = 1/3 s after downsampling. The speed is obtained by taking the norm of the velocity vector, s(t) = |v(t)|, where |.| represents the 2-norm. The angular speed is computed by estimating the angle between the two vectors defined from three subsequent points, which gives the change in the tangential component of the velocity. From these estimates, we obtain roaming and dwelling states by fitting a two-state Hidden Markov Model (HMM) to the speed and angular speed time series averaged in 10 s windows (as in [30]). The model is composed of two hidden states, with stationary distribution π, Markov transition matrix P, and Gaussian emission probabilities conditioned on the current state. Fitting is performed through an Expectation-Maximization algorithm (Baum-Welch), with the emission probabilities being Gaussian distributions with a diagonal covariance matrix. The sequence of hidden states is obtained through a Viterbi algorithm. We use an open-source Python HMM package, hmmlearn, obtained from: https://github.com/hmmlearn/hmmlearn. For more on HMMs, see [31].
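
A minimal sketch of this segmentation with hmmlearn is shown below; the windowing and state-labeling conventions are illustrative.

import numpy as np
from hmmlearn import hmm

def roaming_dwelling(speed, angular_speed, window=30):
    """Average the 3 Hz speed and angular speed in 10 s (30-sample) windows,
    fit a two-state diagonal-covariance Gaussian HMM (Baum-Welch), and decode
    the state sequence with Viterbi; returns True where the worm is roaming."""
    n = len(speed) // window
    features = np.column_stack([
        speed[:n * window].reshape(n, window).mean(axis=1),
        angular_speed[:n * window].reshape(n, window).mean(axis=1),
    ])
    model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=100)
    model.fit(features)
    states = model.predict(features)
    roaming_state = np.argmax(model.means_[:, 0])  # the higher-speed state
    return states == roaming_state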

To determine the directionality of the worm’s movement, we estimate the direction of the velocity vector as ψ(t) = tan−1(vy(t)/vx(t)), and the overall tail-to-head angle by averaging the centerline angles, Ψ(t) = ⟨θ(t)⟩. The worm’s orientation at each time point is obtained by subtracting these two quantities, Δψ = ψ − Ψ, and normalizing into the interval [−π/2, 3π/2] [32].
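
A minimal NumPy sketch of this directionality estimate:

import numpy as np

def movement_direction(vx, vy, thetas):
    """Angle between the centroid velocity and the worm's mean tail-to-head
    orientation; values near 0 indicate forward motion, near pi reversals."""
    psi = np.arctan2(vy, vx)        # direction of centroid motion
    Psi = thetas.mean(axis=1)       # mean centerline angle per frame
    delta_psi = psi - Psi
    # Wrap into the interval [-pi/2, 3*pi/2)
    return np.mod(delta_psi + np.pi / 2, 2 * np.pi) - np.pi / 2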

We perform posture analysis by projecting the centerline angle time series θ(t) into a canonical lower-dimensional space of “eigenworms” [7], resulting in a mode time series a(t). In this space, the first two eigenmodes capture the propagation of the body wave along the body. The angle between them, ϕ(t) = tan−1(a2(t)/a1(t)), defines the phase of the wave, while its derivative ω(t) = ϕ̇(t) is the phase velocity. Estimates of the phase velocity ω are obtained by fitting a cubic spline to ϕ, using SciPy’s interpolate.CubicSpline [29]. We estimate the frequency of complete body waves by finding segments in which the body wave phase velocity ω does not change sign and there is a recurrence in cos(ϕ(t)). We make a conservative estimate of the body wave frequency by counting peaks in the time series of cos(ϕ(t)), using the scipy.signal.find_peaks function of SciPy [29], with a prominence of 1.95 and a minimum time between peaks of 8 frames (∼ 0.27 s).
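
A minimal SciPy sketch of the phase, phase velocity, and body wave counting described above; the sign convention of the phase is illustrative.

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import find_peaks

def body_wave(a1, a2, times):
    """Phase of the body wave from the first two eigenworm modes, its
    spline-estimated velocity, and a conservative count of full body waves."""
    phi = np.unwrap(np.arctan2(a2, a1))                    # body wave phase
    omega = CubicSpline(times, phi).derivative()(times)    # phase velocity
    # Count complete body waves as prominent recurrences of cos(phi),
    # using the prominence and minimum peak spacing quoted above
    peaks, _ = find_peaks(np.cos(phi), prominence=1.95, distance=8)
    return phi, omega, peaks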

For comparison to off-food behavior we used a previously-analyzed dataset [9, 33], in which N2-strain C. elegans were imaged at f = 32 Hz with a video tracking microscope on a food-free plate. Worms were grown at 20°C under standard conditions [34]. Before imaging, worms were removed from bacteria-strewn agar plates using a platinum worm pick, and rinsed from E. coli by letting them swim for 1 min in NGM buffer. They were then transferred to an assay plate (9 cm Petri dish) that contained a copper ring (5.1 cm inner diameter) pressed into the agar surface, preventing the worm from reaching the side of the plate. Recording started approximately 5 min after the transfer, and lasted for 2100 s. In our analysis we used data downsampled to f = 16 Hz [9], yielding 33600 frames per recording.

Results

Pose estimation from wild-type and mutant worm recordings

We quantify WormPose using synthetic data as well as (N = 24) wild-type N2 worm recordings and (N = 24) AQ2934 mutants from the Open Worm Movement Database. The synthetic data analyzed here was not used for training and consists of 600k images. We choose N2 for general interest and AQ2934 (with gene mutations nca-2 and nRHO-1) for the prevalence of coiled shapes. For the AQ2934 dataset, we used all of the available videos. For the N2 dataset, we selected 24 videos randomly from the large selection in the Open Worm Movement Database, but with a criterion of a high ratio of successfully analyzed frames from the Tierpsy tracker (in practice ranging from 79% to 94%). Videos with very few analyzed frames may signal that the worm goes out of frame, or that the image quality is so low that no further analysis is possible. Images are sampled at a rate fs ∼ 30 Hz for ∼ 15 min in duration, resulting in 600k frames from each dataset. We set the image size to 128 × 128 pixels. We train distinct models for each dataset and then predict all images from each dataset. We show the cumulative distribution of the image error in Fig 5A, including typical (input and output) worm images for various error values. For additional context, we also show the image error calculated on synthetic image data not used in training. Errors in the synthetic data are larger than those for N2 worms because the synthetic data generator contains more (and more complicated) coiled postures. In Fig 5B we use our image generator to show the error in mode values for synthetic data, the only data for which we have ground truth for the centerlines. The “error worms” (worm shapes representing mode values with δai = 1.0) are essentially flat and we report even smaller median mode errors δa_median = (0.30, 0.29, 0.29, 0.29). In Fig 5C and 5D we show that the errors remain small when separated into uncoiled and coiled shapes (see [9] for a comparison).

Fig 5. Quantifying the error in pose estimation.


(A) We show the cumulative image error of predicted images for different datasets. We predict 24 videos totaling over 600k frames from the N2 wild-type and AQ2934 mutant datasets, calculate the image error between the original image and the two possible predictions, and keep the lower value of the two. For the error calculations here we bypass the postprocessing step so no result is discarded. For interpretability we also draw representative worm image pairs for different error values and note that predictions overwhelmingly result in barely discernible image errors. On average, the N2 predictions have a lower image error than the mutant, which exhibits many more challenging coiled postures. We also generate new synthetic images (using N2 as templates, 600k values) not seen during the training and predict them in the same way. The image error for the synthetic images (which generally include a higher fraction of complex, coiled shapes) is on average worse than for the N2 type, but better than for the mutant. (B) Our synthetic training approach also allows for a direct comparison between input and output centerlines, here quantified through the difference in eigenworm mode values. As with the images, the differences are small, so that even in the large-error tail of the distribution the “error worms” (worm shapes representing mode values with δai = 1.0) are essentially flat. The median mode errors are δa_median = (0.30, 0.29, 0.29, 0.29). (C, D) We additionally show the mode errors for synthetic images separated into (C) uncoiled (a3 ≤ 15) and (D) coiled (a3 > 15) shapes. Dashed lines denote median error values of δa_uncoiled = (0.26, 0.25, 0.23, 0.23) and δa_coiled = (0.43, 0.41, 0.47, 0.46). The errors are small in all cases.

Comparison with previous approaches

The only comparable open-source, coiled-shape solution is detailed in previous work from some of the current authors [9] (hereafter noted as RCS from an abbreviation of the title). RCS was designed before the widespread application of CNNs and was evaluated entirely on postures from N2 worms. For coiled frames, RCS employs a computationally expensive pattern search in the space of binarized down-scaled worm images, thus ignoring texture and other greyscale information. A temporal algorithm then matches several solutions across frames to resolve ambiguities. We apply RCS to the N2 and mutant AQ2934 datasets analyzed above. The mutant dataset is especially challenging as a large proportion of coiled frames require the slow pattern search algorithm. We split each video into segments of approximately 500 frames to parallelize the computation on the OIST HPC cluster and obtain results in approximately one week while running 100 cluster jobs simultaneously. For comparison, WormPose applied to the mutant data completed in approximately a day while running only one job on a GPU node with an Nvidia Tesla V100 16GB, with the majority of the time spent on network training. Ultimately we obtained posture estimates for 98% of the frames of the mutant dataset and 99.8% of the N2 dataset.

Unfortunately, a lack of ground truth posture sequences means that we cannot directly compare the posture estimates of RCS and WormPose. Posture sequences are fundamental to RCS and this information is not contained in the image generator of WormPose. However, we can leverage the image error between the original image and the predicted posture (without head information), S3 Fig. While WormPose is dramatically faster and uses no temporal information (a possible route for future improvement), we obtain very similar image reconstruction errors for both methods (A). For a closer examination, we also show cumulative distributions of the difference in turning mode values (B). One source of these discrepancies is coiled, loop-like postures, where both methods struggle to recover the correct pose. Another discrepancy (C) results from crossings such as illustrated in Fig 1, where RCS’s temporal matching algorithm picks the wrong solution, perhaps a reflection of the loss of information upon binarization.

Posture-scale analysis of roaming/dwelling behavior

We further demonstrate WormPose by exploring previously unanalyzed N = 8 long (T ∼ 8 h, fs ∼ 30 Hz) recordings of on-food N2 worms. The length of these recordings (O(10^6) frames) renders previous coiled-shape solutions [9, 10] impractical, which has prevented fine-scale posture analysis of roaming/dwelling behavior.

In food-rich environments, worms typically switch between two long-lasting behaviors: a roaming state, in which worms cover large distances on the plate at higher speeds and along relatively straight paths; and a dwelling state, in which worms stay on a local patch with lower speeds and higher angular speeds [30, 35]. Roaming and dwelling states can last for tens of minutes, so long recordings are essential, and we leverage our ability to obtain high-resolution posture tracking to explore their fine-scale behavioral details.

To identify roaming and dwelling states consistent with previous work [30], we fit a Hidden Markov Model to the centroid speed and angular speed averaged in 10 s windows, which yields a high speed, low angular speed state (roaming) and a low speed, high angular speed state (dwelling), Fig 6A. We estimate the frame-by-frame directionality of the worm’s movement by subtracting the overall tail-to-head worm angle on the plate Ψ, obtained by averaging the centerline angle along the body, from the angle of the velocity vector ψ = tan−1(vy/vx) (where vx and vy are the x and y components of the centroid velocity), Δψ = ψ − Ψ. The distribution of Δψ is bimodal, indicative of switching between forward (Δψ ≈ 0 rad) and reversal (Δψ ≈ π rad) movement, Fig 6B. As in previous observations [30, 35], worms mostly move forward in the roaming state, while dwelling exhibits a larger fraction of backward locomotion.

Fig 6. Posture-scale analysis of roaming/dwelling behavior from long (T ∼ 8 h) recordings reveals that the centroid-derived increase in the dwelling reversal rate results from incoherent body motions that do not translate the worm, and that deep ventral turns are less common in on-food vs off-food conditions.


(A) We align with previous definitions by identifying roaming/dwelling behavior through a Hidden Markov Model of the linear and angular speed, averaged in 10 s windows, thus splitting each trajectory into two states: a low speed, high angular speed state (dwelling, blue), and a high speed, low angular speed state (roaming, orange). (inset) Example 5 minute centroid trajectories for each state. (B) In centroid-based analysis, the dwelling state exhibits a larger fraction of reversals vs roaming. We identify forward and backward motion using the angle between the centroid velocity vector and the tail-to-head angle obtained by averaging centerline angles: Δψ < π/2 for forward locomotion, Δψ > π/2 for backwards. (C-F) Posture analysis reveals that the centroid characterization of roaming and dwelling behavior is incomplete. (C, left) Roaming worms exhibit a larger fraction of higher body wave phase velocities, ω, in both reversal and forward motion. (C, right) Probability of reversals longer than τrev, P(t > τrev) = 1 − P(t ≤ τrev), in the dwelling and roaming states. The roaming state generally exhibits longer reversals than dwelling, for which reversal bouts are extremely short. Thick lines indicate the CDF for the ensemble of worms, while lighter lines are for each individual. (D) The rate of reversal events with complete body waves is an order of magnitude higher in the roaming state compared to dwelling. For comparison we also show the reversal rate for worms foraging off-food [7, 9] (gray), which also exhibit an increased reversal rate. For such body wave analysis we identify forward and reversal events according to the sign of the phase velocity ω. (E) Body curvature θ as a function of time for example dwelling and roaming states. The dwelling state (left) exhibits incoherent body waves that do not propagate through the entire body, whereas coherent full body waves are commonly observed in roaming (right). (F) Worms on-food exhibit a lower fraction of deep ventral turns. We show the probability distribution function (PDF) of the turning mode a3 for roaming, dwelling and foraging (gray) worms. Roaming worms exhibit a larger fraction of Ω-turns than dwelling worms, and δ-turns are rare in on-food data: they are not observed in the full ∼ 66 hours of recordings.

Our high-resolution posture measurements provide a unique opportunity to dissect the fine-scale details of these long time scale behaviors; WormPose allows us to substantially reduce noise in the estimate of the body wave phase velocity resulting from blurry frames in the long recordings, and to obtain the full body posture through coiling events. We leverage the interpretability of the eigenworm decomposition of the centerline angles [7] to assess the properties of the body wave. The first two eigenworms (a1 and a2) capture the undulatory motion of the worm: the angle between these two modes, ϕ = −tan−1(a2/a1), is the overall phase of the body wave, while its derivative, ϕ̇ = ω, is the body wave phase velocity. The third eigenworm, a3, captures the overall turning amplitude of the worm: |a3| ≳ 10 corresponds to Ω-like turns [7, 9]. In Fig 6C we show the distribution of phase velocities ω in the roaming and dwelling states identified previously. Roaming worms typically exhibit higher body wave phase velocities in both forward (ω > 0) and backward (ω < 0) locomotion, Fig 6C(left), contrary to the centroid characterization, which indicates that dwelling worms increase their rate of reversals and reorientation events [30, 35]. Notably, most reversals in the dwelling state are very short (90% of them are shorter than ∼ 0.25 s) when compared to the typical reversal length in the roaming state, Fig 6C(right). The low phase velocities in dwelling also indicate that such reversals result in an insignificant translation of the worm’s body. This suggests that most of the reversals measured through a centroid-based analysis in fact correspond to incoherent body motions, such as head oscillations or short retractions. Indeed, we count the frequency of body waves that travel all the way across the body, Fig 6D, and find that the frequency of full-body waves is extremely small in dwelling when compared to roaming, for which coherent body movements are much more frequent. Comparison to off-food behavior indicates that foraging worms exhibit an even higher rate of full body waves in the forward and reversal states, contradicting the centroid-derived picture that an increased rate of reversals results in local exploration.

While dwelling states at the centroid level exhibit larger reversal rates, the nature of these reversals is very different from the coherent body wave reversals found during roaming. In Fig 6E we show examples of 30s segments of the body angles as a function of time, illustrating how apparent reversals in the dwelling state result from incoherent body motions. To further dissect the nature of roaming and dwelling states, we leverage WormPose to compute the distribution of turning amplitudes a3 across states, Fig 6F. Roaming states exhibit a slightly larger fraction of Ω-turns (10 ≲ a3 < 20) when compared to dwelling worms, which contradicts the centroid-derived picture (prevalent in previous literature) that dwelling results from increased reversal and turning rates. Remarkably, in a total of ∼ 66 hours of analyzed data in on-food conditions, we find no occurrence of deep δ-turns, a behavior commonly observed when foraging off food.

Availability & future directions

WormPose is open-source and free with a permissive 3-Clause BSD License. The source code is available: https://github.com/iteal/wormpose, and can be installed from the Python package index: https://pypi.org/project/wormpose. The GitHub README also includes a link to the data used in our analyses. The GitHub material includes scripts to download on-food datasets from Zenodo, as well as trained models.

Discussion

WormPose enables 2D pose estimation of C. elegans by combining a CNN with a synthetic worm image generator for training without manually labeled data. Our approach is especially applicable to complex, coiled shapes, which have received less attention in quantitative analyses even as they occur during important turning behaviors and in a variety of mutants. We also introduce an image similarity measure, which leverages the synthetic worm generator to assess the quality of the predicted pose without manual centerline annotation. Once trained, the convolution computation is fast and could enable real-time, coiled-pose estimation and feedback [36]. The computational pipeline is optimized to analyze large datasets efficiently and is packaged as an open-source Python package that is easy to use, install, and extend.

With common imaging resolutions, the determination of the worm’s head-tail orientation is surprisingly subtle. Our approach uses the presence of labeled, trusted frames from traditional tracking methods, which rely on brightness changes or velocity. An appealing alternative would be to estimate the head location directly. For example, [37] uses a network to regress the coordinates of the C. elegans head and tail. In addition, CNNs that estimate keypoint positions [16–18] are now widely available. However, such current general techniques applied to ambiguous worm images result in low-confidence head-tail location probabilities, especially for blurry, low-resolution or self-occluded images. Training for this task is noisy and slow to converge, suggesting that there is simply not enough visual information in a single image.

Our posture model requires a library of examples, which we obtained from N2 worms. Some strains, however, have different postures: lon-2 and dpy mutants, for example, are longer and shorter than N2, respectively. In particular, lon-2 can make more coils due to its longer body, and our posture model does not represent this wider variety of possible postures. Of course, we can always augment the posture library, but a more general solution is to create a physical model of the worm [38].

Our approach follows advances in human eye gaze and hand pose estimation where it is difficult to obtain accurate labeled data. 3D Computer Graphics are often employed to create synthetic images [39] with increasing realism [40]. Synthetic images for human pose estimation have also been created by combining and blending small images corresponding to the body limbs of a labeled image, to form new realistic images [41]. To bridge the similarity gap between the real and the synthetic domain, Generative Adversarial Networks (GAN) techniques alter such computer-generated images [42] or directly generate synthetic images from a source image and a target pose [43]. Models of the deformable source object (e.g. human limbs) are often encoded into such generative networks to avoid unrealistic results. Some of these ideas have been recently applied to laboratory organisms [44], including C. elegans, but have avoided the fundamental complexity of self-occluding shapes. Outside of the laboratory, [45] proposes an end-to-end approach to estimate zebra pose using a synthetic dataset and jointly estimating a model of the animal pose with a texture map. Another approach is to adversarially train a feature discriminator until the features from the synthetic and real domain are indistinguishable [46, 47]. In both humans and animals, we expect that the combination of physical body models and image synthesis will be important for future progress in precise pose estimation.

Supporting information

S1 Fig. Model selection assessment in the Gaussian Mixture Model of worm shapes.

(A) Akaike Information Criterion for GMMs with different numbers of Gaussian components. The minimum is attained with N = 270 Gaussian components. Error bars represent 95% confidence intervals over 100 different training sets of ∼ 15000 worm shapes sampled uniformly according to the body curvature as measured by the third eigenworm coefficient, a3. (B) Covariance matrix of the space of mean-subtracted tangent angles θ for the data used in training (left) and an equal number of simulated angles (right).

(TIF)

S2 Fig. The cumulative distribution of the image error for all available labeled (and thus uncoiled) frames in the N2 wild-type and AQ2934 mutant datasets.

(TIF)

S3 Fig. Comparing WormPose to a reference method (RCS) [9].

(A) We show the cumulative image error of predicted images, similarly to Fig 5A. While the image error is similar, WormPose is faster and does not make use of temporal information (a possible route for future improvement). (B) Cumulative distributions of the difference in a3 mode values, δ = |a3^WP| − |a3^RCS|, restricted to coiled shapes (|a3| > 15) and image error ≤ 0.3 as determined from the output of WormPose. We plot separate distributions for the wild-type and mutant strains. Large deviations between the methods occur primarily in the coiled mutants, and we manually examine a subset of 100 images with δ > 10 (a difference chosen to facilitate comparisons by eye), where we find 72% correctly tracked by WormPose, 6% correctly tracked by RCS, and 22% in which the better tracked centerline was unclear. A video of this inspection process is available with the data. (C) Qualitative results for a selection of frames where the image error does not fully describe the discrepancies between the two methods. Very tight loops (top) are challenging for both methods, and RCS typically misidentifies crossings where greyscale information would help (middle and bottom).

(TIF)

Acknowledgments

We thank Mathijs Rozemuller (AMOLF) for code testing and for providing a tutorial dataset, as well as Jarlath Rodgers (University of Toronto) and Kelimar Diaz Cruz (Georgia Tech) for code testing. We are also grateful for the help and support provided by the Scientific Computing section of Research Support Division at OIST.

Data Availability

The data is available here: https://wormpose.unit.oist.jp.

Funding Statement

We acknowledge funding from the Vrije Universiteit Amsterdam and OIST Graduate University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Niepoth N, Bendesky A. How Natural Genetic Variation Shapes Behavior. Annual Review of Genomics and Human Genetics. 2020;21(1). 10.1146/annurev-genom-111219-080427 [DOI] [PubMed] [Google Scholar]
  • 2. Musall S, Kaufman MT, Juavinett AL, Gluf S, Churchland AK. Single-trial neural dynamics are dominated by richly varied movements. Nature Neuroscience. 2019;22(10):1677–1686. 10.1038/s41593-019-0502-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Ahamed T, Costa AC, Stephens GJ. Capturing the continuous complexity of behaviour in Caenorhabditis elegans. Nature Physics. 2021;17(2):275–283. 10.1038/s41567-020-01036-8 [DOI] [Google Scholar]
  • 4. Berman GJ. Measuring behavior across scales. BMC biology. 2018;16:23. 10.1186/s12915-018-0494-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Brown AEX, de Bivort B. Ethology as a physical science. Nature Physics. 2018;14(7):653–657. 10.1038/s41567-018-0093-0 [DOI] [Google Scholar]
  • 6. Gray JM, Hill JJ, Bargmann CI. A circuit for navigation in Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(9):3184–91. 10.1073/pnas.0409009101 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Stephens GJ, Johnson-Kerner B, Bialek W, Ryu WS. Dimensionality and Dynamics in the Behavior of C. elegans. PLOS Computational Biology. 2008;4(4):1–10. 10.1371/journal.pcbi.1000028 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Donnelly JL, Clark CM, Leifer AM, Pirri JK, Haburcak M, Francis MM, et al. Monoaminergic orchestration of motor programs in a complex C. elegans behavior. PLoS Biology. 2013;11(4):e1001529. 10.1371/journal.pbio.1001529 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Broekmans OD, Rodgers JB, Ryu WS, Stephens GJ. Resolving coiled shapes reveals new reorientation behaviors in C. elegans. eLife. 2016;5:e17227. 10.7554/eLife.17227 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Nagy S, Goessling M, Amit Y, Biron D. A Generative Statistical Algorithm for Automatic Detection of Complex Postures. PLOS Computational Biology. 2015;11(10):e1004517. 10.1371/journal.pcbi.1004517 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Javer A, Currie M, Lee CW, Hokanson J, Li K, Martineau CN, et al. An open-source platform for analyzing and sharing worm-behavior data. Nature Methods. 2018;15(9):645–646. 10.1038/s41592-018-0112-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Fontaine E, Burdick J, Barr A. Automated Tracking of Multiple C. Elegans. In: 2006 International Conference of the IEEE Engineering in Medicine and Biology Society; 2006. p. 3716–3719. 10.1109/IEMBS.2006.260657 [DOI] [PubMed]
  • 13. Huang Kuang-Man, Cosman Pamela, Schafer William R. Machine vision based detection of omega bends and reversals in C. elegans. Journal of neuroscience methods. 2006;158(2):323–336. 10.1016/j.jneumeth.2006.06.007 [DOI] [PubMed] [Google Scholar]
  • 14. Roussel N, Sprenger J, Hendricks Tappan S, Glaser J. Robust tracking and quantification of C. elegans body shape and locomotion through coiling, entanglement, and omega bends. Worm. 2015;3:00–00. 10.4161/21624054.2014.982437 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Guo Y, Govindarajan LN, Kimia B, Serre T. Robust pose tracking with a joint model of appearance and shape; 2018. https://arxiv.org/abs/1806.11011 [Google Scholar]
  • 16. Mathis Alexander, Mamidanna Pranav, Cury Kevin M, Abe Taiga, Murthy Venkatesh N, Mathis Mackenzie Weygandt, et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature Neuroscience. 2018;21(9):1281–1289. 10.1038/s41593-018-0209-y [DOI] [PubMed] [Google Scholar]
  • 17. Pereira Talmo D, Aldarondo Diego E, Willmore Lindsay, Kislin Mikhail, Wang Samuel S H, Murthy Mala, et al. Fast animal pose estimation using deep neural networks. Nature Methods. 2019;16(1):117–125. 10.1038/s41592-018-0234-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Graving JM, Chae D, Naik H, Li L, Koger B, Costelloe BR, et al. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife. 2019;8. 10.7554/eLife.47994 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Wang L, Kong S, Pincus Z, Fowlkes C. Celeganser: Automated Analysis of Nematode Morphology and Age. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops; 2020. p. 968–969.
  • 20. Bates K, Jiang S, Chaudhary S, Jackson-Holmes E, Jue ML, McCaskey E, et al. Fast, versatile and quantitative annotation of complex images. BioTechniques. 2019;66(6):269–275. 10.2144/btn-2019-0010 [DOI] [PubMed] [Google Scholar]
  • 21. Javer A, Currie M, Lee CW, Hokanson J, Li K, Martineau CN, et al. An open-source platform for analyzing and sharing worm-behavior data. Nature Methods. 2018;15(9):645–646. 10.1038/s41592-018-0112-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Javer A, Ripoll-Sánchez L, Brown AEX. Powerful and interpretable behavioural features for quantitative phenotyping of Caenorhabditis elegans. Philosophical Transactions of the Royal Society B: Biological Sciences. 2018;373(1758):20170375. 10.1098/rstb.2017.0375 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Bradski G. The OpenCV Library. Dr Dobb’s Journal of Software Tools. 2000;. [Google Scholar]
  • 24. Bishop CM. Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag; 2006. [Google Scholar]
  • 25. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830. [Google Scholar]
  • 26.He K, Zhang X, Ren S, Sun J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV); 2015. p. 1026–1034.
  • 27. He K, Zhang X, Ren S, Sun J. Identity mappings in deep residual networks. In: Computer Vision—ECCV 2016. ECCV; 2016; 2016. p. 630–645. [Google Scholar]
  • 28.Kingma D, Ba J. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations. 2014;.
  • 29. Jones E, Oliphant T, Peterson P, et al. SciPy: Open source scientific tools for Python; 2001–. Available from: http://www.scipy.org/. [Google Scholar]
  • 30. Flavell SW, Pokala N, Macosko EZ, Albrecht DR, Larsch J, Bargmann CI. Serotonin and the neuropeptide PDF initiate and extend opposing behavioral states in C. elegans. Cell. 2013;154(5):1023–1035. 10.1016/j.cell.2013.08.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Rabiner LR. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE. 1989;77(2):257–286. doi: 10.1109/5.18626
  • 32. Helms SJ, Rozemuller WM, Costa AC, Avery L, Stephens GJ, Shimizu TS. Modelling the ballistic-to-diffusive transition in nematode motility reveals variation in exploratory behaviour across species. Journal of the Royal Society Interface. 2019;16(157):20190174. doi: 10.1098/rsif.2019.0174
  • 33. Stephens GJ, Bueno de Mesquita M, Ryu WS, Bialek W. Emergence of long timescales and stereotyped behaviors in Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(18):7286–7289. doi: 10.1073/pnas.1007868108
  • 34. Sulston JE, Brenner S. The DNA of Caenorhabditis elegans. Genetics. 1974;77(1):95–104. doi: 10.1093/genetics/77.1.95
  • 35. Ben Arous J, Laffont S, Chatenay D. Molecular and sensory basis of a food related two-state behavior in C. elegans. PLoS ONE. 2009;4(10):1–8. doi: 10.1371/journal.pone.0007584
  • 36. Lee JB, Yonar A, Hallacy T, Shen CH, Milloz J, Srinivasan J, et al. A compressed sensing framework for efficient dissection of neural circuits. Nature Methods. 2019;16(1):126–133. doi: 10.1038/s41592-018-0233-6
  • 37. Mane MR, Deshmukh AA, Iliff AJ. Head and Tail Localization of C. elegans; 2020. arXiv preprint: https://arxiv.org/abs/2001.03981
  • 38. Cohen N, Ranner T. A new computational method for a model of C. elegans biomechanics: Insights into elasticity and locomotion performance; 2017. arXiv preprint: https://arxiv.org/abs/1702.04988
  • 39. Kearney S, Li W, Parsons M, Kim KI, Cosker D. RGBD-Dog: Predicting Canine Pose from RGBD Sensors. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 8336–8345.
  • 40. Mu J, Qiu W, Hager GD, Yuille AL. Learning From Synthetic Animals. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 12386–12395.
  • 41. Rogez G, Schmid C. Image-Based Synthesis for Deep 3D Human Pose Estimation. International Journal of Computer Vision. 2018;126(9):993–1008. doi: 10.1007/s11263-018-1071-9
  • 42. Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W, Webb R. Learning From Simulated and Unsupervised Images Through Adversarial Training. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017. p. 2107–2116.
  • 43. Balakrishnan G, Zhao A, Dalca AV, Durand F, Guttag J. Synthesizing Images of Humans in Unseen Poses. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2018. p. 8340–8348. doi: 10.1109/cvpr.2018.00870
  • 44. Li S, Günel S, Ostrek M, Ramdya P, Fua P, Rhodin H. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2020. p. 13158–13168.
  • 45. Zuffi S, Kanazawa A, Berger-Wolf T, Black M. Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture From Images “In the Wild”. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019. p. 5358–5367.
  • 46. Lahiri A, Agarwalla A, Biswas PK. Unsupervised Domain Adaptation for Learning Eye Gaze from a Million Synthetic Images: An Adversarial Approach. In: Proceedings of the 11th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 2018). New York: Association for Computing Machinery; 2018. Article 69, p. 1–9. doi: 10.1145/3293353.3293423
  • 47. Kuhnke F, Ostermann J. Deep Head Pose Estimation Using Synthetic Images and Partial Adversarial Domain Adaption for Continuous Label Spaces. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019. p. 10163–10172.
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008914.r001

Decision Letter 0

Dina Schneidman-Duhovny

12 Dec 2020

Dear Prof. Stephens,

Thank you very much for submitting your manuscript "WormPose: Image synthesis and convolutional networks for pose estimation in C. elegans" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. 

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Dina Schneidman

Software Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: "WormPose: Image synthesis and convolutional networks for pose estimation in C. elegans" by Hebert et al. introduces a new algorithm for resolving coiled postures in the nematode worm C. elegans. Progress in tracking and analysing worm behaviour has been foundational in the study of the physics of behaviour and has contributed to behavioural genetics and phenotypic drug screens. Coiled postures are an important element of C. elegans behaviour because worms adopt them for the sharp reorientations used in taxis and escape. However, because the pose of coiled worms is difficult to estimate, coiled shapes have contributed less to the analysis of worm behaviour than they should. The improved method presented here is thus an important contribution to C. elegans behaviour research.

The method can also be seen in the context of deep learning methods applied to animal behaviour, which have seen a recent explosion of interest. Much of this interest has been in the fact that complex animals in complex scenes can finally be tracked at all, but there has been less attention to the accuracy of the tracking. I appreciate the lengths the authors go to in this paper to quantify the quality of the tracking results.

The final important aspect of the method is that it works without large sets of manually labelled training images which is an advantage compared to most other methods. Even in cases where manually annotated data are available, I can imagine their image synthesis method being used for augmentation.

The paper is well written and clear, the figures are well-presented, and the reported software is accessible and well-documented. In my opinion the paper is publishable as-is. My comments below should be seen only as suggestions for possible improvement.

Specific comments:

-when discussing previous methods of solving the coiling problem Huang et al. (2006) J. Neurosci. Meth. should be cited as well.

-"CNNs are the foundation for recent, remark- able progress in markerless body point tracking [15–17], including worm posture [18, 19]. However, intensive la- beling requirements by human annotators, even if as- sisted by technology [20], as well as the ambiguity of which or exactly how many points to label, offer a bar- rier to the usefulness of CNNs in posture tracking and beyond." If I've understood correctly, the method in reference 18 also works without manually annotated images so the discussion should be modified to reflect that.

-in the network architecture and training section more detail on the CNN should be provided (type of CNN (ResNet), how many layers, neurons per layer, etc.)

-typo: "we set set the weights"

-C. elegans not italicised in second paragraph of discussion.

Reviewer #2: Summary

The authors develop a neural network-based algorithm and open source package in python for reconstructing postures of the model organism C. elegans. Previous work successfully reconstructs postures for simple and complex body shapes, and this work is tailored at reconstructing shapes difficult to resolve. In addition, the computational time of this algorithm is orders of magnitude faster than the most similar related work, making the approach largely feasible now. This algorithm is trained on pixels directly, and includes a generative model that can simulate large quantities of images with a ground truth posture. This step allows small quantities of previously labeled images to be expanded to arbitrary amounts of training data, allowing automation of the entire pipeline. Finally, the authors apply their algorithm to analyze a large posture dataset, producing new insights about the organization of actions within roaming-dwelling two-state switching behavior.

Detailed Review

Overall this manuscript describes an appropriate application of an exciting technology (neural networks) to an outstanding problem of biological interest (posture analysis), and overcomes some key limitations of previous work (extremely long computational time). The main issue that we see concerns the error quantifications, in particular their direct comparability with previous work. This should be straightforward to fix. The following is organized into several sections:

1) Various error quantifications

Specifically, the errors reported, e.g. in Fig. 5, are not the most relevant to the broader usefulness of the algorithm to experimentalists and to the interpretability of its output. We would like to see a quantification of errors against ground-truth centerlines in real data.

The use of pixel error is less interpretable and does not address the difficult problems this algorithm aims to solve. This is particularly important for cases like that shown in Fig. 1C, where the pixel error may be very small, but the posture error may be very large. Indeed, how often does this algorithm correctly distinguish between the two possibilities in Fig. 1C? We wonder whether there could be a direct and intuitive metric used for evaluation, for example the fraction of correctly annotated centerline crossings.

Related, the underlying data are effectively split into easy postures (straight motion) and very difficult postures (coiled and self-occluding). Thus, it seems important to the claims of the paper to characterize the algorithm error separately for these qualitatively different clusters of postures, as in the similar panel of Fig. 2F in related previous work [1].

Similarly, the comparison to previous work is incomplete. Figure S4 compares the pixel-wise errors between this work and a previous algorithm, but given that the scientific output of this algorithm is centerlines, a direct comparison of centerline error seems necessary. In the second results section the authors state “Unfortunately, a lack of ground truth posture sequences means that we cannot directly compare the posture estimates of RCS and WormPose.” However, a smaller set of manual annotations can be generated to produce this comparison.
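For concreteness, a minimal sketch of such a centerline comparison (assuming predicted and manually annotated centerlines are available as (N, 2) coordinate arrays; the resampling length and head-tail flip handling are illustrative choices, not the WormPose API):

import numpy as np

def resample_centerline(centerline, n_points=100):
    """Resample an (M, 2) polyline to n_points equally spaced in arc length."""
    segment_lengths = np.linalg.norm(np.diff(centerline, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    arc_new = np.linspace(0.0, arc[-1], n_points)
    x = np.interp(arc_new, arc, centerline[:, 0])
    y = np.interp(arc_new, arc, centerline[:, 1])
    return np.stack([x, y], axis=1)

def centerline_error(predicted, ground_truth, n_points=100):
    """Mean per-point distance (pixels), taking the better of the two head-tail orderings."""
    p = resample_centerline(predicted, n_points)
    g = resample_centerline(ground_truth, n_points)
    direct = np.linalg.norm(p - g, axis=1).mean()
    flipped = np.linalg.norm(p[::-1] - g, axis=1).mean()
    return min(direct, flipped)

Averaging this quantity separately over coiled and uncoiled frames, or reporting the fraction of frames below a tolerance, would give the split error report and the crossing-level accuracy suggested above.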

A different step of the algorithm also appears to require additional error reporting: the head-tail discrimination module. While the methodology is well explained, the overall fraction of correctly annotated heads/tails is not reported.

2) Minor data notes

The authors state "we also expect that the labeled frames provide the head-tail position at regular intervals throughout the video". Is there a GUI or expected format for this input? Although this requirement leaves the following statement in the abstract technically correct, "thus avoiding the need for human-labeled training", it should be noted more prominently that analyzing a new dataset is not completely automated.

Related, the authors assume "that the input data consists of videos of a single moving worm and that most of the non-coiled frames are analyzed beforehand", but Section Results/"Comparison with previous approaches" states that WormPose can be performed on a laptop. A sentence about whether the entire pipeline (i.e. including the required pre-analysis) can also be done on a laptop should be included.

3) Minor algorithm notes

The authors use a GMM to build their generative model. It is unclear why this choice was made. We assume this method was chosen because it is simple, fast, common, and generative. However, this should be explained.

In Fig. S2A, the authors use AIC to determine the number of components of their GMM. However, two things are unclear: Why was AIC chosen and not cross-validation? What are the error bars on the plot? Although it does not seem to affect the results at all, a sentence or two of explanation would be appreciated.
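For reference, a minimal sketch of AIC-based model selection with scikit-learn (the input array of tangent-angle shapes, the file name, and the candidate component counts are illustrative assumptions, not the WormPose code):

import numpy as np
from sklearn.mixture import GaussianMixture

# shapes: (n_samples, n_angles) array of mean-subtracted tangent angles (illustrative input)
shapes = np.load("worm_tangent_angles.npy")  # hypothetical file name

candidate_components = [50, 100, 200, 270, 400]
aic_scores = []
for n in candidate_components:
    gmm = GaussianMixture(n_components=n, covariance_type="full", random_state=0)
    gmm.fit(shapes)
    aic_scores.append(gmm.aic(shapes))

best_n = candidate_components[int(np.argmin(aic_scores))]
best_gmm = GaussianMixture(n_components=best_n, covariance_type="full",
                           random_state=0).fit(shapes)

# The fitted mixture is generative: draw synthetic postures for image synthesis
synthetic_shapes, _ = best_gmm.sample(n_samples=10000)

Cross-validated held-out log-likelihood could be substituted for gmm.aic(shapes) in the loop if a cross-validation criterion is preferred.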

4) Minor other notes and typos

The videos on the GitHub page are nice; some similar videos and definitely more of them should be included with the supplementary information.

In the second paragraph of section Methods/”Processing Natural Images” the quotes are wrong in the phrase:

The morphological operation ”close”

The caption of Fig. S1 should be “generate” not “generated”

5) Other computational notes

We commend the authors for a well presented open-source package. In particular, it is excellent that the package is installable via pip and the requirements are included, with versions, in the repository.

We believe that the computational speed of this package is a very strong asset, and could be emphasized more as an important scientific contribution.

References

[1] Broekmans, O.D., Rodgers, J.B., Ryu, W.S. and Stephens, G.J., 2016. Resolving coiled shapes reveals new reorientation behaviors in C. elegans. Elife, 5, p.e17227.

Reviewer #3: Summary:

This paper addresses problems with resolving coiled/self-occluding shapes in worms. Conventional image skeletonization methods work for uncoiled shapes but fail when worms self-intersect. For these coiled shapes, this group has previously implemented a method which resolves centerlines by searching the space of eigenworms for a reasonable match; however, this method is time consuming and computationally expensive. Other groups have used convolutional neural networks to track organisms (e.g. DeepLabCut) without labels and have even applied CNNs to tracking worm centerlines. The authors claim that these CNN-based methods are limited because they require human annotation as an input, so they have devised a scheme for creating synthetic data based on traditional image skeletonization (uncoiled shapes) and the eigenmode projection method in ref 9 (for coiled shapes) as a training data set for the CNN. This is the primary new contribution in the paper. The paper will certainly be of interest to researchers studying nematode movement, but it would be nice to know if the technique is restricted to "roaming and dwelling" behaviors or can be applied to movement in more complex environments (and other organisms).

Here are some broad comments which would be valuable for the authors to address at least in response letter, and could potentially make it into the manuscript to increase biological relevance:

Overall, I think the method is sound and useful; however, I wonder about its domain of applicability. Since it's based on eigenmode projections, it's not clear to me that it will work efficiently in cases where worms are no longer well captured by the eigenworms derived from observations of agar crawling. This might include worms in more complex environments. If they could show that, for instance, thrashing worms could still be reasonably well resolved, or that mutants displaying radically different mechanical properties, rolling mutants etc., could be well captured with this eigenworm-driven training set, that would be valuable. I'm not sure if that would just reflect the robustness of the CNN or if that would suggest that the eigenworms from agar are generic enough to be pushed into new territory. In terms of domain of application, what about other problems where linear shapes are bent into self-occluding shapes (e.g. snakes or plant roots)?

If the use is limited to worms on agar, that's still a pretty wide community, and essentially the tool is good for identifying nuances of turning behavior for worms on agar at scale. This suggests a question: why not just use the centroid information to perform behavioral phenotyping and study long-time behavior? Or perhaps the centroid plus fitting the worm to an ellipsoid to get overall angle information? What do you gain by having postural information at this scale? They attempt to address this in the last section of the results (posture-scale analysis of roaming/dwelling behavior).
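To make that coarser alternative concrete, a minimal sketch using OpenCV to reduce a single worm frame to a centroid and an overall body angle (the Otsu threshold and the dark-worm-on-light-background assumption are illustrative, not part of the WormPose pipeline):

import cv2

def centroid_and_angle(frame_gray):
    """Reduce a grayscale worm image to (centroid_x, centroid_y, orientation_degrees)."""
    # Segment a dark worm on a light background (illustrative threshold choice)
    _, mask = cv2.threshold(frame_gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    worm = max(contours, key=cv2.contourArea)
    (cx, cy), axes, angle = cv2.fitEllipse(worm)
    return cx, cy, angle

Such a reduction captures translation and coarse orientation but discards the body-wave phase and all coiling information that a posture-scale analysis relies on.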

It is certainly impressive and a little tantalizing to have posture-level resolution in a 10 hour, 30 Hz experiment; however, I think they could have done a little more digging to say what is gained with this extremely fine-grained analysis. They assess some subtleties of the difference between roaming and dwelling states, which have previously been identified by simply looking at raw motion; however, I don't know how salient these subtleties are, and moreover the basic identification of these behavioral states is not upended by the details. Can the postural analysis make the definition of these states more robust, rather than just using centroid information to define the states and then commenting on subtle postural differences in the centroid-determined states? The most striking detail they uncover is in Fig. 6E, but this doesn't really engage the self-occluded shapes, since these are forward and reverse travelling body waves? I don't dispute that these details are interesting, but they could be pushed further, though that may be beyond the scope of this paper.

My last question is this: in previous work (ref 9), resolution of self-occluded shapes revealed a distinction between the delta turn and the omega turn. Since this technique allows a dramatic scale-up in the number of resolvable self-intersecting states, does the larger data set shed any light on novel details of turning? Can turning behaviors be even more subtly delineated with the higher statistical resolution the technique allows?

Here are some specific (nitpicking) comments: I would like to see terms like "generative" defined. And I would hardly call the images presented in the manuscript "Natural" images!

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008914.r003

Decision Letter 1

Dina Schneidman-Duhovny

25 Mar 2021

Dear Prof. Stephens,

We are pleased to inform you that your manuscript 'WormPose: Image synthesis and convolutional networks for pose estimation in C. elegans' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Dina Schneidman

Software Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1008914.r004

Acceptance letter

Dina Schneidman-Duhovny

8 Apr 2021

PCOMPBIOL-D-20-01856R1

WormPose: Image synthesis and convolutional networks for pose estimation in C. elegans

Dear Dr Stephens,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Andrea Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Model selection assessment in the Gaussian Mixture Model of worm shapes.

    (A) Akaike Information Criterion for GMMs with different numbers of Gaussian components. The minimum is attained with N = 270 Gaussian components. Error bars represent 95% confidence intervals over 100 different training sets of ∼ 15000 worm shapes sampled uniformly according to the body curvature as measured by the third eigenworm coefficient, a3. (B) Covariance matrix of the space of mean-subtracted tangent angles θ for the data used in training (left) and an equal number of simulated angles (right).

    (TIF)

    S2 Fig. The cumulative distribution of the image error for all available labeled (and thus uncoiled) frames in the N2 wild-type and AQ2934 mutant datasets.

    (TIF)

    S3 Fig. Comparing WormPose to a reference method (RCS) [9].

    (A) We show the cumulative image error of predicted images, similarly to Fig 5A. While the image error is similar, WormPose is faster and does not make use of temporal information (a possible route for future improvement). (B) Cumulative distributions of the difference in a3 mode values, δ = |a3^WP| − |a3^RCS|, restricted to coiled shapes (|a3| > 15) and image error ≤ 0.3 as determined from the output of WormPose. We plot separate distributions for the wild-type and mutant strains. Large deviations between the methods occur primarily in the coiled mutants and we manually examine a subset of 100 images with δ > 10 (a difference chosen to facilitate comparisons by eye) where we find 72% correctly tracked by WormPose, 6% correctly tracked by RCS, and 22% in which the better tracked centerline was unclear. A video of this inspection process is available with the data. (C) Qualitative results for a selection of frames where the image error doesn’t fully describe the discrepancies between the two methods. Very tight loops (top) are challenging for both methods and RCS typically misidentifies crossings where greyscale information would help (middle and bottom).

    (TIF)
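    A minimal sketch of the comparison metric in panel B, assuming per-frame arrays of a3 coefficients from both methods and the WormPose image error are available (the array and function names are illustrative, not the WormPose API):

import numpy as np

def coiled_mode_difference(a3_wormpose, a3_rcs, image_error_wormpose,
                           coil_threshold=15.0, max_image_error=0.3):
    """delta = |a3_WP| - |a3_RCS| over coiled, well-reconstructed frames."""
    keep = (np.abs(a3_wormpose) > coil_threshold) & (image_error_wormpose <= max_image_error)
    return np.abs(a3_wormpose[keep]) - np.abs(a3_rcs[keep])

# Cumulative distribution of delta, as plotted in panel B:
# delta = np.sort(coiled_mode_difference(a3_wp, a3_rcs, err_wp))
# cdf = np.arange(1, delta.size + 1) / delta.size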

    Attachment

    Submitted filename: WormPose_rebuttal.pdf

    Data Availability Statement

    The data is available here: https://wormpose.unit.oist.jp.

