PLOS ONE. 2020 Mar 3;15(3):e0229583. doi: 10.1371/journal.pone.0229583

Optimized criteria for locomotion-based healthspan evaluation in C. elegans using the WorMotel system

Areta Jushaj 1, Matthew Churgin 2, Bowen Yao 2, Miguel De La Torre 2, Christopher Fang-Yen 2, Liesbet Temmerman 1,*
Editor: Sean P Curran 3
PMCID: PMC7053758  PMID: 32126105

Abstract

Getting a grip on how we may age healthily is a central interest of biogerontological research. To this end, a number of academic teams have developed platforms for life- and healthspan assessment in Caenorhabditis elegans. These are very appealing for medium- to high-throughput screens, but broader implementation is lacking because many systems rely on custom data-analysis scripts that others struggle to adopt. User-friendly recommendations would therefore help to translate raw data into interpretable results. The aim of this communication is to streamline the analysis of data obtained with the WorMotel, an economically and practically appealing screening platform, to facilitate the use of this system by interested researchers. We detail recommendations for the stepwise conversion of raw image data into activity values and explain criteria for the assessment of health in C. elegans based on locomotion. Our analysis protocol can easily be adopted by researchers, and all required scripts and a tutorial are available in S1 and S2 Files.

Introduction

While a blessing for individuals in good health, the continued increase in human life expectancy is also associated with a rising prevalence of age-related diseases, urging our societies to tackle this socio-economic challenge. Improving the quality of life in aged populations will therefore be an important task in the years to come.

It is well understood in the research field that the concept of being healthy is much more ambiguous than the concept of being alive, and different individuals perceive ‘being healthy’ differently [1]. Healthspan is generally described as the period of life during which an organism is in good health and free from disease [2]. This fluid definition reflects a similar lack of consensus among researchers, which translates into a variety of proposed parameters for healthspan evaluation. In human clinical settings, grip strength, gait analysis and the ability to perform daily tasks (e.g. bathing) are often used as criteria for good health [3–5].

To find interventions that affect aging, considerations of cost and time efficiency have led to the use of various model organisms. The nematode C. elegans is a well-established model for aging, with the advantages of a short lifespan and ease of cultivation. Work in this model indicates that a longer lifespan does not always correlate with a proportional increase in healthy life [6–9], reaffirming that understanding how organisms can age healthily is important.

As summarized by [6], several physiological and functional parameters that change with age can be studied in C. elegans, such as lipofuscin accumulation or pharyngeal pumping. Among these, movement seems to be the most powerful predictor of longevity [8,10–13]. As in humans, the ability of C. elegans to move diminishes with age [14]: animals decline toward a frail state, characteristic of late life, in which they can only move their head. Research into C. elegans aging is often challenging because experiments require labor-intensive follow-up and longitudinal data are typically collected at the population level rather than at the level of the individual. To address these impediments, several groups developed semi-automated systems that bypass (some of) these issues and assess movement longitudinally in aging animals [7,15–18]. All these systems rely on longitudinal imaging of either individual [7,12,13,16,19] or populations of [17,18,20,21] worms, after which image processing and analysis are used to determine lifespan and activity decline. While relying on similar principles, these systems differ in high-throughput potential, detail of image acquisition, the way worms are stimulated to move, and whether populations or individuals are studied.

The WorMotel [22] allows longitudinal measurement of activity in 240 individual animals simultaneously. Time-lapse images of the aging worms are used to quantify their movement for life- and healthspan determination. Because 240-well plates are typically imaged for only 20 minutes per day, one imaging station can collect data for thousands of individuals each day. Given the intended throughput of this system, a clear-cut analysis protocol that can distinguish phenotypes is crucial. For lifespan this is straightforward, but because healthspan is ambiguously defined, criteria to determine whether an animal is healthy are currently lacking. While basic programming tools to calculate movement from images taken with the WorMotel system have been developed and reported [7], users face several choices of image processing parameters during data analysis. These include choosing a time interval for activity calculation, condensing the data of each intermittent monitoring period into a single data point along a longitudinal activity curve, setting a reasonable threshold for health, and considering whether the chosen settings apply to both long- and short-lived populations. A systematic study of how these choices affect outcomes such as individual life- and healthspan has not been carried out. We therefore aimed to determine the most robust choices for straightforward selection of interventions that may affect healthy aging.

We performed a proof-of-concept study on data from wild-type, long-lived daf-2 RNAi-treated and short-lived daf-16 RNAi-treated animals. This work provides insight into recommended standard settings and can serve as a basis for WorMotel users to tune their own data processing choices and to highlight specific behaviors of interest.

Materials and methods

Strains, maintenance and worm synchronization

In this study, wild-type N2 animals fed on E. coli OP50 were used. Strain maintenance and experiments were performed at 20°C. Mixed cultures were bleached and eggs were collected by standard procedures [23].

WorMotel plate preparation and RNAi

WorMotel plates were prepared as described by [7]. We used E. coli strain HT115 transformed with the L4440 vector containing no insert (= control) or daf-2 or daf-16 RNAi constructs derived from the Ahringer library [24]. Bacterial strains were grown at 37°C for a minimum of 12 hours, after which expression was induced with 1 mM IPTG (2 hours, 37°C). To minimize dietary effects, equal amounts of bacterial solution were seeded into WorMotel wells, and potential positional effects were minimized by seeding according to a quadrant design (60 wells per strain; one control and three test strains per plate). Hatched N2 L1-stage worms were grown on Nematode Growth Medium (NGM) plates containing carbenicillin (50 μg/ml) seeded with E. coli HT115 containing the (empty) L4440 vector and reared at 20°C for 48 hours. At the late L4 stage, worms were sorted onto WorMotel plates using a COPAS Biosort (Union Biometrica).

Image processing and parameter extraction

Each plate was monitored daily for 20 minutes with an Imaging Source DMK 23GP031 camera (2592 x 1944 pixels) equipped with a Fujinon lens (HF12.5SA-1, 1:1.4/12.5 mm, Fujifilm Corp., Japan) as previously described [7]. We used IC Capture (Imaging Source) to acquire time-lapse images through a gigabit Ethernet connection over a period of approximately 40 days. Images were taken every five seconds, and a five-second blue-light stimulation was applied at minute 10. For blue-light stimulation, three high-power LEDs (Luminus PT-121, Sunnyvale, CA; driven at a current of 20 A; irradiance at the plate 1.2 mW/mm²) were used. Image subtraction was performed with a custom MATLAB script: for every captured image (~120 images after blue-light stimulation), changes in pixel intensity were calculated relative to an image preceding it by a defined interval (not necessarily the immediately preceding image in the series). For this study, pixel differences were calculated for intervals of 5, 20, 60, 80, 100, 150, 200, 250, 300 and 540 seconds (S1 File). Calculations were executed according to [7]. Briefly, for each pair of compared images, a difference image was calculated and divided by the average pixel intensity of the two images to generate a normalized map of pixel intensity change. Noise was reduced by consecutively applying (i) a Gaussian smoothing filter (standard deviation of one pixel) and (ii) a binary threshold of 0.25 to the difference image [7]. The total number of changed pixels in the resulting image was then used as the measure of activity. We always worked with post-stimulation pixel difference data (collected during minutes 10–20), since spontaneous activity has been shown to be a confounded readout [7]. Moreover, we observed that stimulated activity leads to a more reliable assessment of lifespan in aged animals, as they tend to show less spontaneous movement (Fig 1).
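The activity measure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB script from S1 File: all function names are hypothetical, the normalization divides by the mean intensity of the two frames (one reading of "the average pixel intensity of the two images"), and the Gaussian filter is a hand-rolled separable kernel truncated at three standard deviations, with zero padding at the image borders.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float = 1.0) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at three standard deviations."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_smooth(image: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian smoothing; pixels outside the image count as zero."""
    kernel = gaussian_kernel_1d(sigma)

    def blur(line):
        return np.convolve(line, kernel, mode="same")

    rows_blurred = np.apply_along_axis(blur, 1, image)
    return np.apply_along_axis(blur, 0, rows_blurred)


def pixel_difference_activity(earlier, later, sigma=1.0, threshold=0.25) -> int:
    """Count changed pixels between two frames separated by the chosen interval."""
    a = np.asarray(earlier, dtype=float)
    b = np.asarray(later, dtype=float)
    # Assumption: normalize by the mean intensity of the two frames (one scalar).
    mean_intensity = 0.5 * (a.mean() + b.mean())
    normalized_change = np.abs(b - a) / mean_intensity
    # (i) Gaussian smoothing with sigma = 1 pixel, then (ii) binary threshold.
    changed = gaussian_smooth(normalized_change, sigma) > threshold
    return int(np.count_nonzero(changed))


def activity_trace(frames, frame_period_s=5.0, interval_s=100.0):
    """Activity for every frame that has a partner `interval_s` seconds earlier."""
    lag = int(round(interval_s / frame_period_s))
    return [pixel_difference_activity(frames[i - lag], frames[i])
            for i in range(lag, len(frames))]
```

With 5-second frames, a 100-second interval pairs each image with the one 20 frames earlier; the interval is therefore a lag in frames, not a choice of consecutive images.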

Fig 1. Data collected after blue light stimulation are most suitable for quantification of lifespan.


Data are from one representative animal. Blue light stimulation (at “time 0” of the imaging interval) is crucial to ensure accurate lifespan determination, especially in older animals, which typically display little or no spontaneous movement within the 20 min imaging interval.

To convert the imaging data into a single value per worm per day, we considered several options that represent different ways of looking at the animal’s ability to move (Fig 2). For each daily pixel difference series, we calculated the median, the 99th percentile (‘maximal activity’), the average of all values between the 95th and 99th percentiles (hereafter ‘peak activity’) and the integral (the area under the curve in Fig 2A).
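The four daily summaries can be computed from one post-stimulation pixel-difference series roughly as follows. This sketch assumes NumPy's default linear-interpolation percentiles, treats the ‘peak activity’ band as inclusive at both ends, and computes the integral with the trapezoidal rule using the 5-second frame period as the time step; the authors' scripts may differ in these details, and the function name is hypothetical.

```python
import numpy as np


def daily_summaries(trace, frame_period_s=5.0):
    """Condense one day's post-stimulation pixel-difference series into four values."""
    x = np.asarray(trace, dtype=float)
    p95, p99 = np.percentile(x, [95, 99])
    in_peak_band = (x >= p95) & (x <= p99)
    if not in_peak_band.any():  # very short series: fall back to values >= p95
        in_peak_band = x >= p95
    return {
        "median": float(np.median(x)),
        "maximal": float(p99),                  # 99th percentile
        "peak": float(x[in_peak_band].mean()),  # mean of values in [p95, p99]
        # Trapezoidal area under the trace, samples assumed frame_period_s apart.
        "integral": float(np.sum((x[1:] + x[:-1]) / 2.0) * frame_period_s),
    }
```

Averaging the band between the 95th and 99th percentiles, rather than taking the 99th percentile alone, is what makes ‘peak activity’ less sensitive to a single outlying frame.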

Fig 2. Overview of data analysis.


The activity of a single worm during one monitoring period (one day) can be summarized in different ways, relying on (A) the median, the 99th percentile (also: ‘maximal activity’), the average of all values between the 95th and 99th percentiles (also: ‘peak activity’, red box) or the integral (purple shading) of the pixel difference data. (B) This process can be repeated over different days for the same worm, revealing how activity (here: peak activity) changes over a lifetime. (C) This analysis is performed for all worms of the population, from which (D) the average survivor activity of worms belonging to the same population can be calculated. Worms showing an activity above the green threshold (panel C) are considered healthy; see main text for details.

Overall variation

Overall variation was calculated as:

$$\text{overall variation} = \frac{\sum_{i=1}^{N} \left( \frac{\lvert a_{i+1} - a_i \rvert}{\langle a_{1\to N} \rangle} \right)}{N}$$

where $N$ is the total number of days the worm was monitored alive, $a_i$ is the activity value of that worm on day $i$, and $\langle a_{1\to N} \rangle$ is the average activity of the animal over its lifetime. An individual’s overall variation thus reflects its average change in activity per day. In short, for each worm, the absolute differences in activity between all pairs of consecutive days are summed. This sum is normalized by dividing it by the average activity of the worm over its lifetime, a necessary step for comparing metrics of different magnitudes (e.g. median vs integral). For ease of comparison, a daily value is obtained by dividing by the number of observations (note that this is not essential for interpretation). For each worm, overall variation values were calculated based on median, maximal activity, peak activity and integral (Fig 2) input values.
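A small Python sketch of the overall-variation calculation follows. Because the printed sum runs to N and therefore includes a term for day N+1, the sketch assumes that only the consecutive-day differences among the N recorded days are summed; a caller who wants the extra term can append the first post-death value to the input. The function name is hypothetical.

```python
def overall_variation(activity):
    """Average day-to-day change in activity, normalized by lifetime mean activity.

    `activity` holds one daily value per monitored day while the worm was alive.
    Assumption: only the N - 1 consecutive-day differences within these N days
    are summed; append a post-death value to include a term for day N + 1.
    """
    n = len(activity)
    mean_activity = sum(activity) / n
    if mean_activity == 0:
        raise ValueError("lifetime mean activity is zero")
    summed_change = sum(abs(activity[i + 1] - activity[i]) for i in range(n - 1))
    # Normalize by lifetime mean, then express per observation.
    return summed_change / mean_activity / n
```

A smooth monotonic decline yields a small value, whereas a trace that jumps up and down from day to day yields a larger one, which is why lower overall variation is read as a smoother activity trace.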

Variance of movement Z-score

Each individual’s movement Z-score as a function of time is defined as:

$$Z_{\text{score}} = \frac{a_i - \mu_{a_i}}{\sigma_{a_i}}$$

where $a_i$ is the activity value of the worm on day $i$, $\mu_{a_i}$ is the average activity of the population on day $i$ and $\sigma_{a_i}$ is the standard deviation of the population on that day. A Z-score therefore reflects how different an individual worm is from the population, on each day of its life. Calculating the variance of this Z-score for each individual worm yields a value that reflects the extent to which the longitudinal activity profile of that worm deviates from that of the (simultaneously alive part of the) population.
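The Z-score variance can be sketched with NumPy by storing the population as a worms × days array in which days after death are NaN, so that each day's mean and standard deviation are computed over surviving worms only. The text does not state whether the population or the sample standard deviation was used; this sketch assumes the population form (ddof=0), and the function name is hypothetical.

```python
import numpy as np


def zscore_variance(population):
    """Variance over time of each worm's daily movement Z-score.

    `population` is a worms x days array of activity values with np.nan on days
    after a worm's death, so each day's mean and SD use surviving worms only.
    """
    a = np.asarray(population, dtype=float)
    mu = np.nanmean(a, axis=0)      # population mean per day
    sigma = np.nanstd(a, axis=0)    # population SD per day (ddof=0, an assumption)
    with np.errstate(invalid="ignore", divide="ignore"):
        z = (a - mu) / sigma        # a day with a single survivor gives nan
    return np.nanvar(z, axis=1)     # nan entries (dead days) are ignored
```

A worm that keeps the same rank relative to the population every day has a near-zero variance, even if its absolute activity declines with age; a worm that swaps between the top and bottom of the population has a large variance.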

Lifespan and health determination

The lifespan of each worm was determined as the last day on which the worm showed a daily peak activity above 5 changed pixels, as described in [7].
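The lifespan rule reduces to finding the last day above the 5-pixel threshold. In this hypothetical helper, day 1 is taken to be the first monitored day, and None is returned for a worm that never exceeds the threshold.

```python
def lifespan(daily_peak_activity, threshold=5.0):
    """Last day (1-based) on which daily peak activity exceeds `threshold` pixels."""
    alive_days = [day for day, value in enumerate(daily_peak_activity, start=1)
                  if value > threshold]
    return max(alive_days) if alive_days else None
```

Taking the last such day, rather than the first day below threshold, means a single quiet day followed by renewed movement does not end the lifespan prematurely.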

To determine whether an animal was healthy, we correlated blinded manual assessments of health with calculated pixel differences for a representative set of animals. Locomotive health was empirically evaluated by three independent scientists from blinded activity movies of 24 randomly chosen worms, intended to represent two worms per genotype per plate (randomly selected from each population). Evaluating these 24 animals at each monitoring time led to a total of 715 manual assessments, each performed in triplicate. Each scientist assigned the quality of movement upon blue-light stimulation to one of five categories: (1) very fast: the animal moved multiple (>2) body lengths; (2) fast: the animal moved 1 to 2 body lengths; (3) medium fast: the animal moved continuously, but within one body length; (4) slow: the animal moved within one body length, but then became inactive; and (5) inactive: the animal did not move. One worm had to be excluded from the analysis because it did not belong to a genotype discussed in this study, leaving a total of 23 studied worms.

For our analyses, we distinguish ‘total days of health’ from healthspan. Total days of health (TDH) refers to the total number of days (not necessarily consecutive) on which an individual displays an activity greater than the 160-pixel difference threshold (see Results). Healthspan (HS) is defined as the very last day on which an individual’s activity is above this threshold. By construction, TDH can never exceed HS. We further define the health ratio (HR) of an individual worm as its total days of health divided by its lifespan in days. Alternatively, the healthspan ratio (HSR) can be calculated as the ratio of healthspan to lifespan.
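Given a worm's daily peak activity and its lifespan, the four individual health readouts can be computed as in the following sketch. The function name and the day numbering (day 1 = first monitored day) are assumptions made for illustration; only days up to the lifespan are counted.

```python
def health_metrics(daily_peak_activity, lifespan_days, threshold=160.0):
    """TDH, HS, HR and HSR for one worm; day 1 is the first monitored day."""
    healthy_days = [day for day, value in enumerate(daily_peak_activity, start=1)
                    if day <= lifespan_days and value > threshold]
    tdh = len(healthy_days)                        # total days of health
    hs = max(healthy_days) if healthy_days else 0  # last healthy day
    return {"TDH": tdh,
            "HS": hs,
            "HR": tdh / lifespan_days,             # health ratio
            "HSR": hs / lifespan_days}             # healthspan ratio
```

For a trace in which one day dips below the threshold between two healthy days, HS still extends to the later healthy day, while TDH skips the dip; this is the difference between HSR and HR discussed in the Results.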

As a final metric for health interpretation, we calculated the definite integral of the average activity of the population, approximated with the trapezoidal rule. This value approximates the area under the average activity curve of the studied population, whose shape depends on the genotype [7,13].
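The integrated activity can be sketched as the mean activity of surviving worms on each day, integrated with the trapezoidal rule. The input layout (worms × days, NaN after death), a spacing of one day between monitoring sessions, the choice to set days without survivors to zero, and the function names are all assumptions of this example.

```python
import numpy as np


def survivor_activity(population):
    """Mean activity of the worms still alive on each day (0 if none are)."""
    a = np.asarray(population, dtype=float)
    alive = ~np.isnan(a)
    n_alive = alive.sum(axis=0)
    total = np.where(alive, a, 0.0).sum(axis=0)
    return np.divide(total, n_alive, out=np.zeros_like(total), where=n_alive > 0)


def integrated_activity(population, day_spacing=1.0):
    """Trapezoidal area under the survivor activity curve."""
    y = survivor_activity(population)
    return float(np.sum((y[1:] + y[:-1]) / 2.0) * day_spacing)
```

Because the curve averages only survivors, late days rest on few worms; the integral nonetheless grows with both how long a population lives and how active its survivors remain.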

Statistical analysis and graphical representations

Graphical representations and statistical tests (Shapiro-Wilk for normality, Kruskal-Wallis or ANOVA for the significance of population differences, and Tukey-Kramer correction for multiple testing) were run using MATLAB®. Linear correlation between different metrics for the same worm at the same time point was assessed by least-squares fitting, also in MATLAB®.

Results

In recent years, diverse research teams have worked toward alleviating the labor-intensive aspects of C. elegans-based studies of longevity and aging. Among the semi-automated platforms developed [17,18,20], the WorMotel [7] stands out for its capacity to collect data on thousands of individuals daily, adding to its appeal as a medium- to high-throughput screening solution for studies of aging. To facilitate adoption of this system for fast evaluation of large numbers of interventions, we here evaluate the data analysis workflow and discuss the analytical decisions made during life- and healthspan analysis.

We collected data from wild-type animals reared under control conditions (marked ‘empty vector’ (EV)) or treated with daf-2 or daf-16 RNAi. The exact effects of genetic interventions on C. elegans lifespan vary somewhat in high-throughput screens [17] and between labs [25], but daf-2 RNAi consistently extends lifespan, while daf-16 RNAi consistently shortens it. To reflect the variation expected when different labs use the WorMotel platform, we used data from four completely independent experiments, run over a period of 4 months, that represent large plate-to-plate variability. For one of the four experiments (Fig 3), the study period was terminated before all deaths of daf-2 RNAi-treated animals were captured, as can happen for long-lived interventions evaluated in high-throughput screens.

Fig 3. Independent experiments comparing control, daf-2 and daf-16 RNAi-treated populations show inter-experiment differences yet adhere to expected relative survival changes.


To capture expected variation between possible end-users, four entirely independent experiments were performed, in which many factors, including the robotic setup, could have contributed to differences in absolute effect size. Experiments I–IV are shown in panels A–D.

Data were obtained and processed as described in ‘Materials and methods’. During this procedure, the analyst faces several choices that may influence the final results. We evaluated these choices at each step to propose a workflow that minimizes variation while remaining applicable to control, short-lived and long-lived phenotypes.

Daily peak activity is the more robust activity parameter

For calculations of life- and healthspan, any worm’s daily activity trace needs to be converted to a single value per worm per day (Fig 2). There are different ways to do this, representing slightly different biological perspectives.

One way to define the worm’s activity is by taking the median of the worm’s response activity values (Figs 2A and 4A). This value is less sensitive to noise-driven fluctuations than the more commonly used arithmetic mean and is therefore expected to be a more robust way to define the worm’s “average” response over the studied time interval. Alternatively, one can look at the peak response, representative of the animal’s maximal response to the stimulus. To extract this information, we relied on the 99th percentile (maximal activity) (Figs 2A and 4B), but also on the average of all values between the 95th and 99th percentiles (peak activity) (Figs 2A and 4C), as the latter should be slightly less prone to outliers or noise than the 99th percentile. Finally, the integral, i.e. the area under the curve, is indicative of the worm’s overall capability to move and to sustain that activity (Figs 2A and 4D). For example, one could expect an older worm to still respond quickly to the stimulus but also return quickly to very low activity levels, whereas a younger worm might sustain elevated movement for longer after stimulation. Median and integral values would both capture such a difference better than maximal or peak values would.

Fig 4. All metrics used to produce an activity trace reflect the inherent day-to-day variation.


Traces of a representative daf-2 RNAi-treated worm, constructed based on (A) the median, (B) maximal activity, (C) peak activity or (D) integral—see main text.

Possible interdependency of these parameters can easily be assessed by a simple correlation analysis. Linear regression analysis suggests that peak and maximal values correlate strongly with each other, as do median and integral values (S1 and S2 Figs, and S1 Table). All other correlations are much weaker, suggesting that these four parameters reflect two interpretations of the daily activity profiles: peak/maximal vs median/integral (S1 and S2 Figs, and S1 Table).
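The correlation analysis between two summary metrics can be reproduced with an ordinary least-squares line fit and its coefficient of determination. This sketch uses NumPy's `polyfit` in place of the MATLAB routines used in the study; the helper name is hypothetical, and the inputs are assumed to be paired values for the same worm on the same day.

```python
import numpy as np


def least_squares_r2(x, y):
    """Slope, intercept and R^2 of an ordinary least-squares line y ~ x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)        # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return slope, intercept, 1.0 - ss_res / ss_tot
```

An R² close to 1 for peak vs maximal activity, and a clearly lower value for peak vs integral, would correspond to the two groupings described above.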

The worm’s activity trace over its lifetime is ultimately used to determine the life- and healthspan of the animal. Due to sparse sampling in high-throughput settings (such as monitoring once or twice per day for a short period of time), however, activity traces are discontinuous and their fluctuations result from a combination of biological and technical influences [7]. For each worm, building longitudinal activity curves based on each of the studied parameters, i.e. median, peak, maximal or integral activity (example shown in Fig 4), will therefore reveal small differences in day-to-day variation that reflect how well each parameter captures biological and/or technical sources of variation. The ideal parameter minimizes technical variation while correctly reflecting biological variation.

We tested the effect of parameter choice based on two assumptions: (i) aging is accompanied by a gradual activity decline on a slow timescale [14], and (ii) variation of the population average may reflect true biological variation. Accordingly, we sought to minimize (i) ‘overall variation’ and (ii) the variance of the movement Z-score, as defined in Methods. Briefly, for each worm, the overall variation measures its average day-to-day variation over its lifetime, while the variance of its Z-score represents how different this individual’s activity was from that of the entire population. Using the peak or maximal activity to define the worm’s daily activity yielded significantly lower overall variation (Fig 5 and S2 Table). This was consistent across genotypes and time intervals, except for the 540 s interval, where all parameters gave similar results (S3 Fig). The variance of the Z-score did not differ between the tested parameters at any time interval (Fig 6 and S3 Table). Together, this indicates that although the choice of parameter does not significantly affect population spread (i.e. worms do not deviate differently from populations depending on the chosen parameter), the peak or maximal values do create smoother activity traces (lower overall variation) than median or integral values.

Fig 5. Data based on peak or maximal values lead to lower overall variation than those based on median or integral values.


Data for activity values based on daily pixel differences of 100 second intervals (other intervals: S3 Fig) for control (EV, black), daf-2 RNAi (pink) and daf-16 RNAi (yellow) show similar trends. Box plots based on individual worm data from all worms of the same genotype across all experiments. Analysis on an individual experiment basis leads to the same conclusion (S4 Fig).

Fig 6. Choice of activity parameters does not affect variance of Z-score.


Data for activity values based on daily pixel differences of 100 second intervals (other intervals: S5 Fig) for control (EV, black), daf-2 RNAi (pink) and daf-16 RNAi (yellow) show similar trends. Box plots based on individual worm data from all worms of the same genotype across all experiments. Analysis on an individual experiment basis leads to the same conclusion (S6 Fig).

Based on these considerations, we opted for peak activity as the activity value of choice, preferring it over maximal activity because it better buffers possible outliers and is therefore expected to represent the true biology more accurately. Peak activity is used for the remainder of this study.

Longer time intervals are better suited to determine lifespan

The time interval used for daily pixel difference calculations (i.e. image subtractions) could affect ultimate decisions on lifespan. To test this, we first averaged the daily activities of surviving worms of the same genotype for each day of the population’s lifespan (Fig 7 for experiment I, S7 Fig for experiments II–IV). Longer intervals lead to higher activity values, which may be especially important for animals with daf-2-like longevity, which show lowered but consistent movement during later phases of life [13]. As can be expected, longer time intervals can saturate pixel difference values such as those recorded in early life, with the highest average activities for most populations peaking around 400 activity units. Note that fewer worms are alive at later time points, so the recorded survivor activity relies on fewer data points as time progresses (Fig 7).

Fig 7. Average survivor activity is always higher for longer time intervals.


(A) Control populations display an activity decline in line with [7]. (B) The typical ‘twilight tail’ [13] or ‘gerospan’ [26] is observed in daf-2 RNAi-treated populations, where animals maintain low-level activity for the majority of their extended life. (C) In contrast, the activity of daf-16 RNAi-treated populations decreases slightly faster than that of controls.

The choice of time interval does not influence the determination of lifespan for control or short-lived (daf-16 RNAi-treated) populations (Fig 8 and S4 Table). For daf-2 RNAi-treated animals, however, the time interval does affect the lifespan call (S4 Table): the calculated average lifespan reaches a plateau for intervals of 60 seconds and longer (Fig 8). Similar trends hold for individual experiments (S8 Fig). Overall, time intervals ≥ 60 seconds are acceptable for robust lifespan determination.

Fig 8. Lifespan of the long-lived condition is most sensitive to the choice of time interval.


Population averages as calculated for control (black), daf-2 (pink) and daf-16 (yellow) RNAi-treated populations (error bars: standard error of the mean), when different time intervals are used for determination of lifespan. Time intervals of ≥ 60 seconds are advisable.

Defining individual health

The ultimate goal of this analysis is to facilitate the search for conditions that affect life- and healthspan via medium- to high-throughput screens. Whereas lifespan is based on a binary measurement (the worm is either alive or not), healthspan is a nebulous concept. To identify interventions that affect health in large screens, a simple ‘health threshold’ that allows a similar binary decision would nevertheless be helpful.

To test whether such a threshold can be found, we determined which observed pixel difference values correspond to which qualitative assessments of health. For this, we defined five categories describing an animal’s movement (very fast, fast, medium fast, slow, inactive; see Methods). Blinded evaluations of 694 activity movies collected from 23 animals over their entire lifespans then allowed us to assign each movie to one of these categories. We considered animals in the slow or inactive categories, i.e. barely moving or not moving at all (see Methods), to be unhealthy. Linking the WorMotel-calculated pixel difference to the categorical value of each data point (Fig 9A) showed that, despite some overlap between qualitative categories, decreasing pixel differences correspond to decreasing locomotive health. The most suitable threshold value should therefore maximize the number of truly healthy worms in the healthy categories (very fast, fast, medium fast), while maximizing the number of truly unhealthy worms in the unhealthy categories (slow, inactive; excluding data for dead animals). A first analysis including all data showed that such a threshold lies around 177 changed pixels (S9 Fig). Refining the analysis to best separate ‘medium fast’ from ‘slow’ animals placed the threshold at approximately 160 changed pixels (Fig 9B), an observation that holds true when analyzing individual plates (S10 and S11 Figs). We defined a worm as healthy when it showed an activity value above 160 changed pixels, which is on the lenient side of the pixel difference options for a threshold-based, binary (yes/no) assessment of health.
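The threshold search summarized in Fig 9B can be sketched as a scan over candidate thresholds that tracks, for each candidate, the fraction of ‘slow’ observations below it and the fraction of ‘medium fast’ observations above it. Picking the candidate that maximizes the smaller of the two fractions (roughly where the two curves cross) is an assumption of this example, not necessarily the authors' exact criterion; the function name is hypothetical.

```python
def crossing_threshold(slow, medium_fast, candidates):
    """Candidate threshold that best separates 'slow' from 'medium fast' scores.

    Assumption: the best threshold maximizes min(fraction of slow values below t,
    fraction of medium-fast values above t); ties keep the first candidate.
    """
    best_t, best_score = None, -1.0
    for t in candidates:
        slow_below = sum(v < t for v in slow) / len(slow)
        medium_fast_above = sum(v > t for v in medium_fast) / len(medium_fast)
        score = min(slow_below, medium_fast_above)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

Maximizing the smaller fraction balances the two misclassification rates rather than favoring one category, which matches the reading of Fig 9B as the point where the two curves meet.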

Fig 9. Decreasing pixel differences caused by worm movement correspond to decreasing locomotive health.


(A) Each dot in the figure represents the pixel difference (calculated by WorMotel analysis) for a single observation that was assigned to one of five qualitative categories (x-axis). (B) Above a pixel difference of 160, most animals are scored as healthy (~medium fast movement) by operators, whereas the majority of animals below this threshold are considered less healthy (~slow movement). Black line: fraction of animals in the 'slow' category with a pixel difference value < x-axis value; red line: fraction of animals in the 'medium fast' category with a pixel difference value >x-axis value.

Building on this threshold, we define ‘total days of health’ (TDH) as the total number of days on which the animal showed an activity higher than 160 pixels, and healthspan as the last day of its life on which this was true. Health(span) ratios can then be calculated as TDH/lifespan or healthspan/lifespan to reflect the proportion of its life during which an animal can be considered healthy.

Longer time intervals also suit the assessment of health status

As for lifespan, the choice of time interval between analyzed images (Fig 8, S4 Table and Methods) affects the ability to quantify an animal’s activity and, hence, its health. We tested the effect of time interval on the quantification of TDH and HS, using 160 changed pixels as the health threshold. As time intervals below 80 seconds are inadvisable for lifespan calculations, we examined the intervals from 80 to 540 seconds included in this study. Within this range, the choice of time interval does not affect the quantification of TDH and HS for any of the genotypes (Fig 10 and S4 Table). For health(span) ratios, effects of the time interval are likewise small, as the interval influences TDH and HS on the one hand and LS on the other in similar ways.

Fig 10. Total days of health and healthspan quantifications are not sensitive to the choice of time interval within the 80–540 seconds range.


Population averages for control (black), daf-2 (pink) and daf-16 (yellow) RNAi-treated populations (error bars: standard error of the mean) when different time intervals are used for determination of (A) total days of health (TDH) and (B) healthspan (HS). Time intervals of >60 seconds are advisable.

Taken together (Figs 8 and 10), our data show that time intervals of >60 seconds are suitable for analyzing WorMotel data of diverse conditions. We propose 100 seconds as the ideal compromise between improved activity detection and the number of data points collected per monitoring period, which decreases as the time interval increases.

TDH, HS and integrated activity of the population together aid in interpreting health

Whereas the concept of healthspan (HS) fits the hypothesis of a gradual activity decline during aging very well, it is prone to severe misinterpretation of longitudinal health in a number of instances, e.g. a single bout of activity right before death after a long period of sickness. In such situations, total days of health (TDH) better represents longitudinal health, but it is also more susceptible to day-to-day variation, e.g. classifying a single day of low activity between several days of obvious health as unhealthy. Our data show that most HSR values are close to 1; that is, the calculated HS often nearly equals LS (S5 Table). Visual inspection of activity traces from individual worms reveals that HR values generally better approximate the observed fraction of life spent in a healthy state (S12 Fig). In addition, when comparing the long- and short-lived conditions with controls, HR distinguishes daf-2 RNAi-treated populations from internal controls in each experiment, whereas HSR failed to do so in one experiment (S6 Table). As expected, neither could distinguish daf-16 RNAi-treated animals from controls (S6 Table). Based on all these observations, we suggest HR as the primary health readout in experiments where throughput demands fast and simple indicators of potentially interesting conditions.

While TDH and HS are valuable readouts at the level of individual worms, comparing health at the population level adds complementary value. Additional health information is contained in the shape of the population-level activity curve (Fig 2D), and a metric reflecting this shape adds value beyond the threshold-based TDH and HS. To incorporate this, we also use the area under the population-based survivor activity curve (Fig 11) as a health metric, which we refer to as integrated activity (IA). In our data, this value is consistently higher for daf-2 and lower for daf-16 RNAi-treated populations, as expected. Long-lived populations will have larger IA values, even though they may also accumulate more unhealthy time. Between populations of similar longevity, however, higher IA values reflect interventions that lead to more responsive animals. Unlike the binary TDH or HS call, IA is measured on a continuous scale, which adds information on the size of the effect at the population level. In combination with TDH or HS, this integrative population metric helps distinguish interventions with disproportionate effects on health versus longevity.
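A minimal Python sketch of integrated activity follows. It assumes that the survivor activity curve is the mean daily activity of the worms still alive on each day and that its area is taken with the trapezoidal rule at one-day spacing; both choices are assumptions for illustration, not necessarily what the S2 File scripts do.

```python
from statistics import fmean
from typing import List, Sequence


def survivor_activity_curve(population: Sequence[Sequence[float]]) -> List[float]:
    """Mean activity on each day over the worms still alive that day.

    Each worm is a list of daily activity values, one per day alive.
    """
    max_days = max(len(worm) for worm in population)
    return [fmean(worm[day] for worm in population if len(worm) > day)
            for day in range(max_days)]


def integrated_activity(population: Sequence[Sequence[float]], day_spacing: float = 1.0) -> float:
    """Area under the survivor activity curve (trapezoidal rule)."""
    curve = survivor_activity_curve(population)
    return sum((a + b) / 2 * day_spacing for a, b in zip(curve, curve[1:]))


# Two hypothetical worms: one lives 3 days, the other 2.
population = [[200, 200, 100], [100, 100]]
print(survivor_activity_curve(population), integrated_activity(population))
```

Because later days are averaged only over the remaining long-lived survivors, those individuals carry more weight in IA.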

Fig 11. Normalized IA helps distinguish treated populations from controls.

Fig 11

(A) Mean normalized lifespan (x-axis) vs total days of health (y-axis) and (B) normalized IA (x-axis) vs normalized HR (y-axis), shown relative to EV control populations (black, coordinates 1:1:1; all values normalized to internal controls). Average normalized values for populations treated with daf-2 (pink) or daf-16 (yellow) RNAi from Experiment I (‘◊’), Experiment II (‘Δ’), Experiment III (‘x’) or Experiment IV (‘o’) are used as coordinates.

The analysis described here aimed to select parameters that easily distinguish controls from long- and short-lived interventions. For this, we relied on the well-described daf-16 and daf-2 extremes, but also aimed to account for long-lived populations that escape the ‘extended twilight’ phenotype [13]. We conclude that, for screening purposes, WorMotel data can be collected at 5s intervals and analyzed at 100s intervals. For fast selection of interesting interventions in settings of considerable throughput, we propose to describe each intervention by a combination of three parameters: lifespan, health ratio (~total days of health) and population-based integrated activity. The WorMotel’s ease of use and large throughput potential offset its larger chip-to-chip variation (compared with manual assays), as has been observed for other solutions aimed at increasing throughput [17]. For this reason, relative assessment, i.e. normalizing individual data to the values of the respective control populations, facilitates direct comparison across experiments (Fig 11). As expected, daf-16 RNAi-treated populations are fairly similar to EV populations but can be distinguished based on integrated activity (Fig 11). daf-2 RNAi-treated populations show lower normalized health ratios, with a ~20% decrease in mean HR (Fig 11), and are easily distinguished by a significantly increased LS, decreased HR and increased integrated activity. The conclusions drawn from the proposed analysis workflow are supported by each individual experiment and are therefore robust to the inter-experiment variation observed here.
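Relative assessment can be sketched as dividing each treated individual's value by the mean of the control population on the same plate, and dividing population-level IA by the control IA, so that controls sit at 1:1:1. This Python sketch is one plausible reading of "normalization to internal controls"; the function names and example numbers are hypothetical.

```python
from statistics import fmean
from typing import Dict, Sequence


def normalize_to_control(treated: Sequence[float], control: Sequence[float]) -> float:
    """Mean of treated individual values, each divided by the control mean from the same plate."""
    control_mean = fmean(control)
    if control_mean == 0:
        raise ZeroDivisionError("control mean is zero; cannot normalize")
    return fmean(value / control_mean for value in treated)


def plate_coordinates(treated: Dict[str, Sequence[float]], control: Dict[str, Sequence[float]],
                      ia_treated: float, ia_control: float) -> Dict[str, float]:
    """Normalized (LS, HR, IA) coordinates of one treatment on one plate."""
    coords = {key: normalize_to_control(treated[key], control[key]) for key in ("ls", "hr")}
    # IA is a single population-level value, so it is normalized as a plain ratio.
    coords["ia"] = ia_treated / ia_control
    return coords


# Hypothetical per-worm lifespans (days) and health ratios on one plate.
print(plate_coordinates({"ls": [30, 34], "hr": [0.5, 0.6]},
                        {"ls": [20, 20], "hr": [0.7, 0.7]},
                        ia_treated=330.0, ia_control=300.0))
```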

Discussion

This study aimed to develop a straightforward and user-friendly protocol for the rapid identification of interventions affecting aging in C. elegans. Our proof-of-concept study relies on data obtained with the WorMotel [7] and concludes that a variety of interventions can be tested when collecting data at 5s intervals, from which daily peak activity can be calculated using a time interval of 100s. These form the basis for candidate evaluation using three parameters: (1) lifespan, (2) total days of health, and (3) integrated activity, which allow straightforward discrimination of differently aging populations and remain informative despite the inter-experimental variation typical of these longitudinal experiments [17,25]. Relevant scripts can be found in S2 File.

Collective efforts in the field have revealed pathways that determine the lifespan of C. elegans. However, there is no consensus on the concept of being ‘healthy’, as it is an umbrella term covering many facets of quality of life. Several groups have used different physiological parameters to describe health in C. elegans, such as oxidative and heat stress resistance, pharyngeal pumping and autofluorescence [6], vulval integrity [27], intestinal atrophy [28], and muscle integrity and yolk production [10]. Here, we selected daily stimulated peak activity, rather than daily stimulated average or median activity, as a readout for overall health, because peak values were the most robust and least prone to day-to-day activity variation (Fig 5 and S2 Table).

The WorMotel setup uses blue light to stimulate the animals [22,29]; peak activity therefore reflects the animal’s intrinsic maximal ability to react to blue light, evaluated over its lifetime. This organismal response integrates the health of the animal’s perceptive abilities with its neuromuscular health. Although our approach cannot separate interventions affecting sensory perception from those affecting overall neuronal or muscular health, previous studies showed no significant difference between the survival curves of animals grown in WorMotels (stimulated by blue light) and animals grown on standard plates (stimulated by touch) [7], indicating that the WorMotel provides a good readout of general health in aging populations.

Our analysis revealed that condensing a worm’s activity trace from one monitoring period into a single daily activity value in different ways delivers slightly different information (S1 and S2 Figs, and S1 Table). The strong correlation of peak with maximal values on the one hand, and of median with integral values on the other, indicates that these four parameters reflect only two interpretations of the stimulated activity trace. Indeed, integral values (~median) take into account the ability to sustain locomotion upon stimulation, whereas peak (~maximal) values are more indicative of the initial ability to respond to the stimulus.
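The ways of condensing one monitoring period's trace into a daily value can be sketched in Python as follows. Median, 99th percentile ("maximal") and integral follow the parameter names used in this study, but their exact definitions are given in the Methods and S2 File scripts; here the integral is assumed to be the sum of activity values times the time interval, and "peak" is a hypothetical stand-in taken as the single largest value.

```python
from statistics import median, quantiles
from typing import Dict, Sequence


def daily_summaries(trace: Sequence[float], interval_s: float = 100) -> Dict[str, float]:
    """Condense one monitoring period's activity trace into candidate daily values."""
    if len(trace) < 2:
        raise ValueError("need at least two activity values")
    return {
        "median": median(trace),
        # 99th percentile, linearly interpolated between order statistics
        "p99": quantiles(trace, n=100, method="inclusive")[98],
        # assumed definition: sum of activity values times the time interval
        "integral": sum(trace) * interval_s,
        # hypothetical stand-in for 'peak': the single largest value
        "peak": max(trace),
    }


# A hypothetical trace with one strong response to the blue-light stimulus.
print(daily_summaries([0, 10, 20, 30, 400]))
```

A brief response dominates the peak and 99th percentile but barely moves the median, which is consistent with the two groups of correlated parameters described above.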

Our definition of health requires individuals to stay above a threshold chosen based on a qualitative analysis of locomotive health in ageing worms. We found that, independent of phenotype, largely non-responding (i.e. slow or inactive) worms can be roughly discriminated from healthy animals by applying a pixel difference threshold of 160 (Fig 9, S9 and S10 Figs). While this proved the optimal choice, especially where considerable analysis throughput is expected, our data also clearly show overlap between categories due to the large spread within categories (Fig 9 and S11 Fig). This indicates that, also for the WorMotel, health remains a noisy concept, and a binary (yes or no) call should not be the sole basis for decisions.
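The threshold criterion described for S9 and S10 Figs, the point where the fraction of unhealthy animals below x meets the fraction of healthy animals above x, can be located with a short scan over candidate values. This Python sketch assumes pixel-difference values have already been grouped into healthy (very fast, fast, medium fast) and unhealthy (slow, inactive) categories; the example values are made up.

```python
from typing import Sequence


def crossing_threshold(healthy: Sequence[float], unhealthy: Sequence[float]) -> float:
    """Smallest observed value x at which the fraction of unhealthy worms below x
    reaches the fraction of healthy worms above x (the crossing of the two curves)."""
    if not healthy or not unhealthy:
        raise ValueError("both groups need at least one value")
    for x in sorted(set(healthy) | set(unhealthy)):
        unhealthy_below = sum(v < x for v in unhealthy) / len(unhealthy)
        healthy_above = sum(v > x for v in healthy) / len(healthy)
        if unhealthy_below >= healthy_above:
            return x
    # Unreachable: at the largest candidate no healthy value lies above x.
    raise RuntimeError("curves did not cross")


# Hypothetical pixel differences for healthy and unhealthy worms.
print(crossing_threshold([150, 200, 250, 300], [50, 100, 120, 170]))  # prints 150
```

Because the "unhealthy below" curve only rises and the "healthy above" curve only falls as x increases, the first candidate where they meet is their crossing point.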

By the same reasoning, HR is limited in that it does not capture the extent to which an animal is healthy. This is why ‘integrated activity’ is a useful additional metric, even though it can only be used at the population level and disproportionately weights the activity of longer- vs shorter-lived individuals. For populations with similar lifespans and similar HR, IA allows selection of the healthier ones. This is especially interesting when looking for interventions that increase health more than they increase longevity, a combination of high biological and medical interest.

Several methods have been reported for automated life- and healthspan evaluation in C. elegans [7,16–18,20,21]. While their relevance is evident, widespread use is hampered because they are often the product of in-house optimization. Hence, there are no plug-and-play solutions, and the expertise to use such systems is typically held by only a few individuals worldwide. With this work, we offer the community an analysis pipeline that makes it easy to adopt the WorMotel system as described by Churgin et al. [22]. We hope that this will facilitate the implementation of automated life- and healthspan evaluation in other labs, thereby contributing to progress in the field.

Supporting information

S1 Fig. Peak and maximal activity on one hand, vs median and integral activity on the other hand, form two separate groups of correlative parameters.

Activity calculated from peak values correlates perfectly with maximal activity values (99th percentile) at any time interval (A), whereas its correlation with integral (B) and median (C) values depends on the time interval. (D) Integral values, in contrast, correlate well with median values, but neither of these (E integral, F median) escapes the weaker, interval-dependent correlation with maximal activity. These data suggest that determining peak/maximal and median/integral daily activities yields only two interpretations of the activity profile. The time-interval dependence of the correlations in B, C, E and F is readily explained by the higher sensitivity of median/integral values to the time interval between analyzed images.

(TIF)

S2 Fig. Correlation of individual worm data calculated at a 100-second interval visually shows the two separate groups of correlative parameters.

Activity was calculated based on the different parameters for each worm on each day, independent of genotype. (A) Activity calculated from peak values correlates perfectly with maximal activity values (99th percentile), whereas its correlation with (B) integral and (C) median values is less pronounced. (D) Integral values, in contrast, correlate well with median values, but both (E integral, F median) show a weaker correlation with the maximal value.

(TIF)

S3 Fig. Peak activity values result in activity curves with the lowest variation, as is clear from the distributions of overall variation based on daily median, integrated, peak or maximal activity for control (black), daf-2 (pink) or daf-16 (yellow) RNAi-treated populations across all experiments, with increasing time interval.

For each individual, day-to-day variation was calculated as stated in the main text. Boxes: Q1, median (Q2) and Q3; whiskers: ±2.7σ.

(TIF)
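The ±2.7σ whisker convention in these box plots is consistent with the common rule of extending whiskers 1.5 × IQR beyond the quartiles (the default in, for example, matplotlib), evaluated for normally distributed data. Assuming that rule and a normal distribution with mean μ and standard deviation σ, the correspondence works out as:

$$
\begin{aligned}
Q_1 &= \mu - 0.6745\,\sigma, \qquad Q_3 = \mu + 0.6745\,\sigma,\\
\mathrm{IQR} &= Q_3 - Q_1 = 1.349\,\sigma,\\
Q_3 + 1.5\,\mathrm{IQR} &= \mu + (0.6745 + 2.0235)\,\sigma = \mu + 2.698\,\sigma \approx \mu + 2.7\,\sigma,\\
Q_1 - 1.5\,\mathrm{IQR} &= \mu - 2.698\,\sigma \approx \mu - 2.7\,\sigma.
\end{aligned}
$$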

S4 Fig. Peak and maximal activity values also result in activity curves with the lowest variation for individual experimental plates.

Distributions of overall variation based on daily median, integrated, peak or maximal activity for control (black), daf-2 (pink) or daf-16 (yellow) RNAi-treated populations for each experiment (A Exp I; B Exp II; C Exp III; D Exp IV) follow the same trends as the pooled data (Fig 5). For each individual, day-to-day variation was calculated as stated in the main text. Boxes: Q1, median (Q2) and Q3; whiskers: ±2.7σ.

(TIF)

S5 Fig. Variance in Z-score is similar for all activity parameters at all time intervals, as is clear from the distributions based on daily median, peak, maximal or integrated activity for pooled control (black), daf-2 (pink) or daf-16 (yellow) RNAi-treated populations across all experiments.

For each individual, day-to-day variation was calculated as stated in the main text. Boxes: Q1, median (Q2) and Q3; whiskers: ±2.7σ.

(TIF)

S6 Fig

Variance in Z-score based on individual plates is similar for all activity parameters at all time intervals, as is clear from the distributions based on daily median, peak, maximal or integrated activity for control (black), daf-2 (pink) or daf-16 (yellow) RNAi-treated populations for each experiment (A Exp I; B Exp II; C Exp III; D Exp IV). For each individual, day-to-day variation was calculated as stated in the main text. Boxes: Q1, median (Q2) and Q3; whiskers: ±2.7σ.

(TIF)

S7 Fig. Average survivor activity is higher for longer time intervals across all experiments.

Average survivor activity for control, daf-2 and daf-16 RNAi-treated populations for (A-C) Exp II, (D-F) Exp III and (G-I) Exp IV. Longer time intervals (≥60s) provide more accurate measurements, which is especially important in late phases of life.

(TIF)

S8 Fig. The choice of time interval for activity evaluation affects the determination of LS in a genotype-dependent manner across all experiments.

Mean lifespan (error bars: standard error of mean) was calculated for different time intervals for (A) Exp I, (B) Exp II, (C) Exp III and (D) Exp IV. The choice of time interval does not affect the calculation of lifespan of control (black) and daf-16 RNAi-treated (yellow) populations but does affect lifespan decisions made for the long-lived daf-2 RNAi-treated (pink) populations.

(TIF)

S9 Fig. Most pixel-difference values observed for healthy worms lie above 177, whereas most values observed for unhealthy worms lie below this value.

Determining a threshold that maximizes the number of truly healthy worms in the healthy categories (very fast, fast, medium fast) while also maximizing the number of truly unhealthy worms in the unhealthy categories (slow, inactive) led to a threshold value of 177 pixels changed. Black line: fraction of animals in the ‘slow’ category with a pixel difference value < x-axis value; red line: fraction of animals in the ‘very fast’, ‘fast’ and ‘medium fast’ categories with a pixel difference value > x-axis value.

(TIF)

S10 Fig

Threshold determination at the level of individual plates is very similar to that for pooled data, with the cumulative curves of medium fast and slow worms intersecting at approximately 160 pixel differences (A Exp I; B Exp II; C Exp III; D Exp IV).

(TIF)

S11 Fig. The decrease in pixel differences with locomotive health follows a similar trend for individual plates, with lower pixel differences assigned to categories of lower locomotive health.

In general, pixel differences below 160 belong to categories 4 and 5 (A Exp I; B Exp II; C Exp III; D Exp IV).

(TIF)

S12 Fig. Three examples illustrate visually that TDH is a better approximation of health observed in our data when compared to HS.

We plotted the activity profiles of three individual wild-type worms whose HSR was 1, but whose TDH deviated (A) strongly, (B) moderately or (C) slightly from HS. The activity profile of (A) shows a flare of activity on the last day of life, resulting in a misleadingly high HS. The TDH of (B) is closer to HS than in (A); however, fluctuations in this worm’s activity profile indicate that TDH better represents the animal’s health. In (C), TDH deviates from the quantified HS by only two days but nevertheless gives a better approximation of health. All three worms visually indicate that HR (~TDH) quantifies observed health more accurately than HSR (~HS).

(TIF)

S1 Table. Correlation of tested activity parameters depends on the parameter and time interval.

Correlation of the activity values based on either median, peak, maximal (99th percentile) or integral values (see Methods) was tested with a linear regression model for each strain (column D) and time interval (column E). R² values (column C) show that peak activity correlates well with the 99th percentile value, while median and integral values correlate well with each other. Correlation becomes stronger with increasing time interval.

(XLSX)
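For a simple linear regression of one activity parameter on another (with intercept), R² equals the squared Pearson correlation coefficient. A standard-library-only Python sketch, assuming paired per-worm, per-day values such as those plotted in S2 Fig; it is not necessarily the exact regression setup used for S1 Table.

```python
from statistics import fmean
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares fit y = a + b*x (equals the squared Pearson r)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant sequence")
    return sxy * sxy / (sxx * syy)


# Hypothetical paired daily values for a few worms: peak vs 99th percentile.
print(r_squared([400, 320, 150, 90], [385, 310, 149, 88]))
```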

S2 Table. Data based on peak or maximal activity show the lowest overall variation.

Overall variation was calculated for pooled (per genotype) activity data based on either median, peak, maximal (99th p = 99th percentile) or integral values (see Methods), as calculated for each time interval (column D). Differences (columns A vs B) were compared via Kruskal-Wallis testing (column C: multiple testing-corrected p-values). Bold red: p-values indicative of statistically significant differences.

(XLSX)

S3 Table. Use of different activity parameters does not influence variance in Z-score.

Variance in Z-score for pooled (per genotype) activity data based on either median, peak, maximal (99th p = 99th percentile) or integral values (see Methods), as calculated for each time interval (column D). Differences (columns A vs B) were compared via Kruskal-Wallis testing (column C: multiple testing-corrected p-values). Bold red: p-values indicative of statistically significant differences.

(XLSX)

S4 Table. Time intervals affect the determination of lifespan in a genotype-dependent manner.

Lifespan was calculated for pooled (per genotype) activity data for each time interval under consideration. Differences (columns A vs B) were compared via Kruskal-Wallis testing (column C: multiple testing-corrected p-values). Bold red: p-values indicative of statistically significant differences.

(XLSX)

S5 Table. HSR values for individual worms often equal 1, independent of genotype or experiment.

LS, HR and HSR values for individual worms are shown for each population and experiment. HSR values often equal 1 for different genotypes, meaning that healthspan and lifespan are the same. HR values, in contrast, more closely reflect what has been reported in the literature, often ranging from 0.5 to 0.8.

(XLSX)

S6 Table. Health(span) ratio of daf-2 RNAi-treated animals is significantly different from that of internal controls.

H(S)R of controls, daf-2 and daf-16 RNAi-treated animals was calculated. H(S)R distributions of daf-2 and daf-16 RNAi-treated animals were tested for significant differences from controls at the same threshold (Kruskal-Wallis test, multiple testing-corrected p < 0.05). Significant p-values are marked in red.

(XLSX)
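For two groups (a treatment vs its internal control), the Kruskal-Wallis H statistic approximately follows a chi-square distribution with one degree of freedom, so its p-value can be computed with the standard library alone via `math.erfc`. This Python sketch includes the usual tie correction. The source does not state which multiple-testing correction was applied, so the Bonferroni step here is only a placeholder assumption.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_two_groups(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square (1 df) p-value for two groups."""
    if not a or not b:
        raise ValueError("both groups need at least one value")
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a, rank_sum_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b)) - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h = max(h / correction, 0.0)  # guard against tiny negative rounding errors
    # Survival function of chi-square with 1 df: P(Z^2 > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2))


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Placeholder multiple-testing correction (the source does not name its method)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


h, p = kruskal_two_groups([1, 2, 3], [4, 5, 6])
print(round(h, 4), round(p, 4))
```

For three or more groups, `scipy.stats.kruskal` covers the general case.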

S1 File. Pixel difference datafiles for all experiments.

Pixel differences per worm per day were calculated (see Methods) and stored in daily pdata files. The number in each pdata file name increases with time. Each column in a pdata file represents an individual worm (1 to 240), and each row lists a value corresponding to an entry of the time vector.

(ZIP)
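A sketch of reading these files in Python, assuming each pdata file is a plain-text, comma-separated matrix without a header (one column per worm, one row per time point) and that the day index is the last number in the file name. The actual file format and naming should be checked against S1 File and the S2 File tutorial; the file names below are hypothetical.

```python
import csv
import io
import re
from typing import Dict, List


def parse_pdata(text: str) -> Dict[int, List[float]]:
    """Map worm number (1-based column index) to its pixel-difference values over time."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return {}
    n_worms = len(rows[0])
    return {worm + 1: [float(row[worm]) for row in rows] for worm in range(n_worms)}


def in_day_order(filenames: List[str]) -> List[str]:
    """Sort pdata file names by the last number in each name (assumed to increase with time)."""
    def day_index(name: str) -> int:
        numbers = re.findall(r"\d+", name)
        if not numbers:
            raise ValueError(f"no number in file name: {name}")
        return int(numbers[-1])
    return sorted(filenames, key=day_index)


print(in_day_order(["pdata10.csv", "pdata2.csv", "pdata1.csv"]))
print(parse_pdata("1,2\n3,4\n"))
```

Sorting numerically rather than alphabetically keeps day 10 after day 2.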

S2 File. Tutorial for data analysis.

Includes the relevant scripts (see Tutorial).

(ZIP)

Acknowledgments

The authors are grateful to Dr. Wouter De Haes for advice regarding statistics, to Ing. Erind Jushaj for assistance with VBA programming, to Bram Cockx for tutorial testing and to Wahab Al-Aani, Elke Vandewyer and Amanda Kieswetter for quality assessment of movement. Strains used in this study were provided by Caenorhabditis Genetics Center (CGC).

Data Availability

All relevant data are within the paper and its Supporting Information files. The series of images accompanying the tutorial for data analysis have been made available at https://www.ebi.ac.uk/biostudies/studies/S-BSST313#.

Funding Statement

All authors received project funding from the European Union’s Horizon 2020 research and innovation programme (633589). This work was supported by the KU Leuven Research Council (C14/15/049) to LT. AJ received travel support from the FWO Flanders (V426816N and K218918N) and the Junior Mobility Program at KU Leuven (JUMO/16/021 and JUMO/18/009). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Strains used in this study were provided by the Caenorhabditis Genetics Center (CGC), which is funded by the National Institutes of Health (Office of Research Infrastructure Programs Grant P40 OD010440). European Union’s Horizon 2020 research and innovation programme: https://ec.europa.eu/programmes/horizon2020/en. KU Leuven Research Council: https://admin.kuleuven.be/raden/en/research-council. FWO: https://www.fwo.be/. Junior Mobility Program at KU Leuven: https://www.kuleuven.be/personeel/careercenter/youreca-career-center/yourecaENG/youreca-internationalmobility.

References

  • 1. Kaeberlein M. How healthy is the healthspan concept? GeroScience 2018;40:361–4. 10.1007/s11357-018-0036-9
  • 2. Luyten W, Antal P, Braeckman BP, Bundy J, Cirulli F, Fang-Yen C, et al. Ageing with elegans: a research proposal to map healthspan pathways. Biogerontology 2016;17:771–82. 10.1007/s10522-016-9644-x
  • 3. Ross JL, Yudin J, Galluzzi K. The geriatric assessment team: A case report. Fam Syst Med 1992;10:213–8. 10.1037/h0089169
  • 4. Lemmink KAPM, Han K, De Greef MHG, Rispens P, Stevens M. Reliability of the Groningen Fitness Test for the Elderly. J Aging Phys Act 2001;9:194–212. 10.1123/japa.9.2.194
  • 5. Ito T. Comprehensive Physical Function Assessment in Elderly People. Clin. Phys. Ther., InTech; 2017. 10.5772/67528
  • 6. Bansal A, Zhu LJ, Yen K, Tissenbaum HA. Uncoupling lifespan and healthspan in Caenorhabditis elegans longevity mutants. Proc Natl Acad Sci U S A 2015;112:E277–86. 10.1073/pnas.1412192112
  • 7. Churgin MA, Jung SK, Yu CC, Chen X, Raizen DM, Fang-Yen C. Longitudinal imaging of Caenorhabditis elegans in a microfabricated device reveals variation in behavioral decline during aging. Elife 2017;6:e26652. 10.7554/eLife.26652
  • 8. Hahm JH, Kim S, Diloreto R, Shi C, Lee SJV, Murphy CT, et al. C. elegans maximum velocity correlates with healthspan and is maintained in worms with an insulin receptor mutation. Nat Commun 2015;6:8919. 10.1038/ncomms9919
  • 9. Newell Stamper BL, Cypser JR, Kechris K, Kitzenberg DA, Tedesco PM, Johnson TE. Movement decline across lifespan of Caenorhabditis elegans mutants in the insulin/insulin-like signaling pathway. Aging Cell 2018;17:1–14. 10.1111/acel.12704
  • 10. Herndon LA, Schmeissner PJ, Dudaronek JM, Brown PA, Listner KM, Sakano Y, et al. Stochastic and genetic factors influence tissue-specific decline in ageing C. elegans. Nature 2002;419:808–14. 10.1038/nature01135
  • 11. Huang C, Xiong C, Kornfeld K. Measurements of age-related changes of physiological processes that predict lifespan of Caenorhabditis elegans. Proc Natl Acad Sci U S A 2004;101:8084–9. 10.1073/pnas.0400848101
  • 12. Pincus Z, Smith-Vikos T, Slack FJ. MicroRNA predictors of longevity in Caenorhabditis elegans. PLoS Genet 2011;7:e1002306. 10.1371/journal.pgen.1002306
  • 13. Zhang WB, Sinha DB, Pittman WE, Hvatum E, Stroustrup N, Pincus Z. Extended Twilight among Isogenic C. elegans Causes a Disproportionate Scaling between Lifespan and Health. Cell Syst 2016;3:333–345.e4. 10.1016/j.cels.2016.09.003
  • 14. Marck A, Berthelot G, Foulonneau V, Marc A, Antero-Jacquemin J, Noirez P, et al. Age-related changes in locomotor performance reveal a similar pattern for Caenorhabditis elegans, Mus domesticus, Canis familiaris, Equus caballus, and Homo sapiens. J Gerontol A Biol Sci Med Sci 2017;72:455–63. 10.1093/gerona/glw136
  • 15. Mathew MD, Mathew ND, Ebert PR. WormScan: A Technique for High-Throughput Phenotypic Analysis of Caenorhabditis elegans. n.d. 10.1371/journal.pone.0033483
  • 16. Rahman M, Hewitt JE, Van-Bussel F, Edwards H, Blawzdziewicz J, Szewczyk NJ, et al. NemaFlex: A microfluidics-based technology for standardized measurement of muscular strength of C. elegans. Lab Chip 2018;18:2187–201. 10.1039/c8lc00103k
  • 17. Stroustrup N, Ulmschneider BE, Nash ZM, López-Moyado IF, Apfeld J, Fontana W. The Caenorhabditis elegans lifespan machine. Nat Methods 2013;10:665–70. 10.1038/nmeth.2475
  • 18. Swierczek NA, Giles AC, Rankin CH, Kerr RA. High-throughput behavioral analysis in C. elegans. Nat Methods 2011;8:592–602. 10.1038/nmeth.1625
  • 19. Hulme SE, Shevkoplyas SS, McGuigan AP, Apfeld J, Fontana W, Whitesides GM. Lifespan-on-a-chip: microfluidic chambers for performing lifelong observation of C. elegans. Lab Chip 2010;10:589–97. 10.1039/b919265d
  • 20. Mathew MD, Mathew ND, Ebert PR. WormScan: A technique for high-throughput phenotypic analysis of Caenorhabditis elegans. PLoS One 2012;7:e33483. 10.1371/journal.pone.0033483
  • 21. Pitt JN, Strait NL, Vayndorf EM, Blue BW, Tran CH, Davis BEM, et al. WormBot, an open-source robotics platform for survival and behavior analysis in C. elegans. GeroScience 2019. 10.1007/s11357-019-00124-9
  • 22. Churgin MA, Fang-Yen C. An imaging system for C. elegans behavior. Methods Mol Biol 2015;1327:199–207. 10.1007/978-1-4939-2842-2_14
  • 23. Stiernagle T. Maintenance of C. elegans. WormBook; 2006:1–11. 10.1895/wormbook.1.101.1
  • 24. Kamath RS, Ahringer J. Genome-wide RNAi screening in Caenorhabditis elegans. Methods 2003;30:313–21. 10.1016/s1046-2023(03)00050-1
  • 25. Lucanic M, Plummer WT, Chen E, Harke J, Foulger AC, Onken B, et al. Impact of genetic background and experimental reproducibility on identifying chemical compounds with robust longevity effects. Nat Commun 2017;8:14256. 10.1038/ncomms14256
  • 26. Bansal A, Zhu LJ, Yen K, Tissenbaum HA. Uncoupling lifespan and healthspan in Caenorhabditis elegans longevity mutants. Proc Natl Acad Sci U S A 2015;112:E277–86. 10.1073/pnas.1412192112
  • 27. Leiser SF, Jafari G, Primitivo M, Sutphin GL, Dong J, Leonard A, et al. Age-associated vulval integrity is an important marker of nematode healthspan. Age (Omaha) 2016;38:419–31. 10.1007/s11357-016-9936-8
  • 28. Gelino S, Chang JT, Kumsta C, She X, Davis A, Nguyen C, et al. Intestinal Autophagy Improves Healthspan and Longevity in C. elegans during Dietary Restriction. PLoS Genet 2016;12:e1006135. 10.1371/journal.pgen.1006135
  • 29. Lee KH, Aschner M. A Simple Light Stimulation of Caenorhabditis elegans. Curr Protoc Toxicol 2016;67:11.21.1–5. 10.1002/0471140856.tx1121s67

Decision Letter 0

Sean P Curran

16 Sep 2019

PONE-D-19-22873

Optimized criteria for locomotion-based healthspan evaluation in C. elegans using the WorMotel system

PLOS ONE

Dear Ms Jushaj,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a significantly revised version of the manuscript that addresses all of the points raised during the review process.

==============================

  • The reviewers were split between recommending rejection and major revision.  I believe that the study has merit and should be published, but only after significant revision that addresses each of the reviewers concerns.

==============================

We would appreciate receiving your revised manuscript by Oct 31 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Sean P. Curran

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Additional Editor Comments (if provided):

As you will note, the reviewers identified several major issues with the manuscript. Although the decision to allow a major revision has been made, I want to emphasize that a revised manuscript will need to address all of the reviewers concerns.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Summary: The paper focuses on optimizing the use of movement-based data from the WorMotel system to better evaluate healthspan. They do this by first evaluating four different statistics of single worm movement before choosing the one they claim is the least variable over independent experiments (peak activity). They then measure “total days of health” and healthspans of daf-2 and daf-16 RNAi animals, both of which are based on varying thresholds of peak activity compared to wildtype. They further analyze these data by using various ratios and integrals to develop further differences between the phenotypes of the animals and to most robustly capture differences in their healthspans.

The principle of these experiments, developing and evaluating statistics to measure worm movement in high-throughput automated systems, is worthwhile. Much of the approach is warranted and well-described. However, some of the tools and many of the quantitative “cutoffs” and measurements are not properly justified. Also, the text does not take confounding issues into account, relying solely upon quantitative movement measurements with little regard to what they might mean and what they might really be measuring. As a whole, the manuscript takes what could be interesting data and makes many assertions that are not well-justified toward the main subject (healthspan/healthy aging) and are not put into a context that makes sense beyond statistical analysis.

Major concerns:

-While pixels changed is a nice quantifiable readout, it is unclear what this represents. It would not be difficult to use the recordings to produce a qualitative corollary (fast/slow movement across plate, partial paralysis, movement in place etc.) and to then quantify these movements as more reliable readouts of movement with pixels changed. The use of an arbitrary quantitative cutoff when qualitative data can be added to it seems like a lost opportunity.

-Calling daily peak activity more robust ignores what these measurements are supposed to take into account. Health is definitely not judged by peak activity (e.g. sprinting), and just because a measurement creates quantitative spread, it doesn’t make it the best measurement. None of these things conclusively measure health.

-Why is it necessary to convert activity to a single value per day? Why not more frequently? Less frequently? Why not multiple values? It seems like they are just trying to make it simple for simplicity and not for a well-justified reason.

-All of the choices presented for single value readouts seem to have more random variation than one would like, reflecting higher day-to-day oscillations and thus challenges in the readout. This is a function of the complexity of complex organisms.

-Is peak response really the maximum ability to move? Or does it also have to do with sensitivity to the blue light changing with age and a number of other unknown factors? None of the other potential factors are mentioned or accounted for.

-“Smoother curves lead to more robust assessment.” They may lead to more separation, but that doesn’t make it necessarily correct. It is looking at different things, some of which have different variation, and none of which are looked at for why they vary and thus why they might be better or worse indicators of health. It is impossible to say what is better or worse at measuring health, only that some are statistically more separable.

-What is a “non-optimal activity?” How can one assess this? What does this mean? Without qualitative measurements this is a completely arbitrary cutoff with no real meaning.

-What exactly does “total days of health” tell you? They define the parameter, but how does it contribute to a better description of the health of the animal? It seems they use it primarily because it is less than the healthspan (in numerical quantity) resulting in better resolution of the health ratios they utilize later in the paper. If the healthy days don’t have to be consecutive, what does it mean when a worm doesn’t have a “healthy” day, but resumes being healthy the following day? HR “outperforms” HSR as a distinguishing health metric? What does this mean? Arbitrarily creating increased differences is not outperforming. It is not justified why TDH is a useful metric, what it really means, or whether it has any place in measurements.
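The distinction between total days of health (TDH) and a consecutive-day healthspan (HS) can be made concrete with a single toy trace. The sketch below makes several assumptions that are not the authors' definitions. The threshold is fixed, a day counts as healthy when activity is at or above it, HS runs until the first unhealthy day, lifespan is the trace length, and HSR = HS/LS. Only HR = TDH/LS is stated in the reviews. `health_metrics` is a hypothetical helper.

```python
import numpy as np

def health_metrics(daily_activity, threshold, lifespan=None):
    """Toy TDH, HS, HR and HSR for one worm (all definitions assumed for illustration).

    daily_activity: one activity value per day, starting at day 1.
    threshold: hypothetical cutoff; a day is 'healthy' if activity >= threshold.
    lifespan: days alive; defaults to the length of the trace.
    """
    activity = np.asarray(daily_activity, dtype=float)
    healthy = activity >= threshold
    lifespan = activity.size if lifespan is None else lifespan
    # TDH counts healthy days anywhere in life, consecutive or not.
    tdh = int(np.count_nonzero(healthy))
    # HS is taken here as the run of healthy days before the first unhealthy day.
    unhealthy_idx = np.flatnonzero(~healthy)
    hs = int(unhealthy_idx[0]) if unhealthy_idx.size else int(healthy.size)
    return {"TDH": tdh, "HS": hs, "HR": tdh / lifespan, "HSR": hs / lifespan}

# A worm that drops below the cutoff on day 3, then 'recovers' on days 4-5.
print(health_metrics([10, 9, 4, 8, 7, 3, 2, 1], threshold=5))
# {'TDH': 4, 'HS': 2, 'HR': 0.5, 'HSR': 0.25}
```

The recovery on days 4-5 counts toward TDH but not toward HS. This is exactly the case the reviewer asks the authors to interpret.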

-“The WorMotel setup uses blue light to stimulate the animals therefore peak activity reflects the maximal ability of an animal to move, and it is evaluated over its lifetime” This may be true in young worms, but is not known in older individuals who lose not only the ability to move, but to perceive things such as blue light.

Minor concerns:

Figure 3: Peak activity and 99th percentile appear to have the lowest variation, but no statistics are given showing that they are significantly different from median or integrated activity.

Figure 6B, D: B has no error bars (or maybe they’re just really small) while D has huge error bars. In general, dividing both TDH and healthspan by the same value (lifespan) is plotting the same information on a different scale. This seems unnecessarily redundant since the trends are going to be the same.

Line 315: They propose 100 seconds as the “ideal compromise” for the time intervals, but if they identified 60 – 300 seconds as the suitable range, a finer gradient would be necessary to find a more optimal time instead of just taking the middle time point between the two.

Figure 7: They never actually define the acronym IA in the paper. A) The variation in their individual daf-2 RNAi experiments makes it hard to determine the significance of the points compared to WT. B) It would seem that the daf-16 RNAi animals are healthier but move slightly less than WT over their lifespans, while the daf-2 RNAi animals are less healthy but move more than WT over their lifespans.

The daf-2 RNAi animals are healthier (i.e. they move more) at later stages in life (the “twilight” period) than WT, but since their activity is below the threshold chosen by the authors, daf-2 RNAi animals are considered to be less healthy than WT. This illustrates the problem with arbitrary thresholds: depending where they are set you can get either answer.

Figure S6: It appears the plots are reversed compared to what they describe in the caption.

Discussion:

Line 355: “…a wide variety of interventions can be tested…” Only one type of intervention was tested here. This seems like an overreach.

Reviewer #2: SUMMARY

The authors use data from the previously described "WorMotel" system to evaluate metrics for assaying health and healthspan. In particular, the authors compare N2, daf-2(RNAi) and daf-16(RNAi) for health effects under a variety of metrics to identify an "optimal" scheme for measuring individual and population health (according to several metrics the authors have chosen). The authors find that peak activity across measurement intervals of 60-300 sec. provides the least variable measurements of movement among those considered. They further argue that two metrics provide useful and complementary information about health: individual health ratios (a ratio of a healthspan-like metric to lifespan) and population-level integrated activity.

IMPRESSION

There is great experimental and conceptual inconsistency within the field regarding how to evaluate "healthspan" in C. elegans. The current study provides a useful starting point for a much-needed discussion about experimental design and data analysis.

Before that, however, there are several points within the study where the authors need to clarify methodological details, and explicate some of the logic by which they draw their conclusions. In particular, the criteria by which the authors evaluate "optimal" metrics are rather ad hoc and deployed with little justification or statistical backing. More explanation here will be critical.

In related matters, the authors' reasoning in justifying their choice of health measures is somewhat circular in places. The main logic appears to be that "a good healthspan metric should distinguish daf-2, daf-16, and control conditions" -- which presumes that healthspan is actually different in those cases. What if the "true" healthspan (or normalized healthspan ratio) is actually the same in daf-16 vs. N2? There's no a priori reason to assume that the "right" metric is one that separates those conditions. At a minimum these caveats need to be carefully discussed.

The writing is also somewhat loose and informal throughout, and in places overly vague regarding specific experimental details. Nevertheless, with a little tightening of both the writing and the logic of the data analysis -- and perhaps dialing some of the conclusions back to what is supported by the data -- this will be a useful contribution to the literature.

MAJOR CONCERNS

1) In the abstract, the authors refer to RNAi against daf-2 and daf-16 (vs. empty-vector control) as a "wide range of conditions". Indeed, throughout the work the authors simply assume that any findings based on analysis of these three conditions will inherently generalize to other (non-IIS-perturbing) conditions. This is a bit of a stretch, and the authors need to be much more careful with claims of generality after examining only IIS-pathway perturbations.

2) Many of the methods are under-described, especially for a manuscript that aims to propose a canonical analysis scheme.

2a) The authors need to describe precisely how image pixels are turned into numerical scores. I assume the image immediately before stimulation is compared to images at different intervals after stimulation. (Or are the differences calculated not for t=0 vs. t=n, for all n, but for t=n vs. t=n+1? This matters, obviously.) Next, I assume a pixel-wise difference image is calculated for whichever pair of images is under consideration. But how is that image then converted into a single difference score? The median absolute pixel difference? The root mean squared pixel difference? The sum of absolute differences? The count of pixels that are different by a certain threshold? (If the latter, how is the threshold chosen? And how does the threshold choice influence subsequent conclusions regarding healthspan?)

If a sum-of-differences type of metric is used (rather than e.g. a count-of-above-threshold-differences), the authors should discuss the caveat that conditions that change the pixel intensity distribution of worms (i.e. produce individuals that are either more clear or darker or more mottled than WT) will naturally generate different pixel differences for the same amount of total movement. This is also a problem for thresholded counts, but may be less severe depending on the threshold employed. Overall, these details matter, and they need to be explicitly described in the methods (rather than be left implicit in a matlab script) and the choices / trade-offs should ideally be justified if possible.
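The sketch below shows why the reviewer's two questions matter: which image pair is compared, and how a difference image becomes a single number. It computes four candidate scores and two pairing schemes for a toy 4×4 "well". It does not reproduce the WorMotel MATLAB scripts. The function names and the intensity `threshold` are hypothetical.

```python
import numpy as np

def difference_scores(frame_a, frame_b, threshold=10):
    """Collapse a pixel-wise difference image into several candidate scores.

    threshold is a hypothetical intensity cutoff for the count-based score.
    """
    # Signed arithmetic avoids uint8 wrap-around when subtracting images.
    diff = np.abs(frame_a.astype(np.int32) - frame_b.astype(np.int32))
    return {
        "sum_abs": int(diff.sum()),
        "median_abs": float(np.median(diff)),
        "rms": float(np.sqrt(np.mean(diff.astype(np.float64) ** 2))),
        "count_above": int(np.count_nonzero(diff > threshold)),
    }

def activity_trace(frames, mode="baseline", score="count_above", threshold=10):
    """Score a stimulus-aligned image series under two possible pairing schemes."""
    if mode == "baseline":        # pre-stimulus frame vs. each later frame (t=0 vs t=n)
        pairs = [(frames[0], f) for f in frames[1:]]
    elif mode == "consecutive":   # successive frames (t=n vs t=n+1)
        pairs = list(zip(frames[:-1], frames[1:]))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [difference_scores(a, b, threshold)[score] for a, b in pairs]

before = np.zeros((4, 4), dtype=np.uint8)
moved = before.copy()
moved[1, 1] = 50   # one strongly changed pixel
moved[2, 2] = 5    # one faintly changed pixel
print(difference_scores(before, moved))
print(activity_trace([before, moved, before], mode="baseline"))     # [1, 0]
print(activity_trace([before, moved, before], mode="consecutive"))  # [1, 1]
```

Take an animal that moves and then returns to its starting posture. It scores 0 on the last frame under baseline pairing but 1 under consecutive pairing. The faint pixel affects `sum_abs` and `rms` but not `count_above`, which illustrates the intensity-distribution caveat above.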

2b) Given a pixel-difference score as a function of time post-stimulation, the authors next propose several ways to summarize that as a single number for each individual. Of these, the "peak activity" score needs to be described and justified better. What does "average of the 95th to 99th percentiles" mean? Is it the mean of all values within the range defined by the 95th to 99th percentiles? Or just (95th-percentile-value + 99th-percentile-value)/2? More generally, what is the point of using both a percentile and an arithmetic mean? The choice is justified as saying it "should be slightly less prone to outliers or noise", but I'm not sure on what statistical basis one might conclude that a mean (which is outlier prone) of a percentile (which is more robust to outliers) would be better than just e.g. using the 97.5th percentile or whatever. Especially given that this mathematically-weird metric is the one the authors recommend later, a little more description / justification is warranted.
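The phrase "average of the 95th to 99th percentiles" admits at least two readings, and they give different numbers. The sketch below computes both, plus the single-percentile alternative the reviewer mentions, on a toy trace. Neither reading is confirmed as the authors' implementation. The function names are hypothetical, and numpy's default linear percentile interpolation is assumed.

```python
import numpy as np

def peak_mean_within_range(trace, lo=95, hi=99):
    """Reading 1: mean of every observation between the lo-th and hi-th percentiles."""
    trace = np.asarray(trace, dtype=float)
    p_lo, p_hi = np.percentile(trace, [lo, hi])
    in_range = trace[(trace >= p_lo) & (trace <= p_hi)]
    if in_range.size == 0:
        raise ValueError("no observations between the two percentiles; trace too short")
    return float(in_range.mean())

def peak_mean_of_endpoints(trace, lo=95, hi=99):
    """Reading 2: average of the two percentile values themselves."""
    p_lo, p_hi = np.percentile(np.asarray(trace, dtype=float), [lo, hi])
    return float((p_lo + p_hi) / 2)

trace = np.arange(1, 101)  # toy activity trace: values 1..100
print(peak_mean_within_range(trace))      # 97.5
print(peak_mean_of_endpoints(trace))      # about 97.03
print(float(np.percentile(trace, 97.5)))  # single-percentile alternative, about 97.525
```

Reading 1 fails outright on very short traces, because no observation may fall between the two percentiles. That is one more reason to state the definition explicitly.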

2c) The overall variation score is not clearly defined. Presumably the brackets in the denominator of the numerator of the overall formula represent the expectation (mean) of the day-to-day absolute changes. But what is the mean over -- is it a mean of all individuals at that day? A mean of all days for that individual?

Next, in the results, this is described as a sum of changes "in percent", but it's not clear that the formula really is calculating percentage changes in any meaningful way. A percent (or fractional) change would generally be calculated something like |a_i - a_{i+1}| / |a_i|, or similar. (I.e. change / baseline, rather than the current formula, which is individual-change / population-mean-change.)

More generally, this score is rather ad hoc, and the specific choice isn't particularly well justified. In particular, given that some degree of activity-score changes over time is expected (as aging happens, scores decline), clearly an optimal health score isn't just one that minimizes all changes over time!

The authors are correct in their desire for a metric that changes smoothly, rather than one that is noise-prone and gives jagged curves over time. But the authors' proposed variation score doesn't just penalize jagged curves -- it penalizes any curve that isn't completely flat. That is, the only way to minimize the proposed total variation score is for a health measure to give a completely flat, zero-slope line. So, broadly, the key metric for this whole manuscript isn't really measuring what the authors want it to measure (or claim it is measuring). Intuition: jaggedness is a second-derivative (and higher derivatives) property, but the score is just examining first derivatives.
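The reviewer's intuition can be checked on two toy curves. The two functions below are simplified stand-ins, not the manuscript's variation score, whose population normalization is omitted here. The first is a mean absolute first difference and the second a mean absolute second difference. Both names are hypothetical.

```python
import numpy as np

def mean_abs_first_diff(a):
    """Average day-to-day change: penalises any slope, smooth or not."""
    return float(np.mean(np.abs(np.diff(a))))

def mean_abs_second_diff(a):
    """Average change in the day-to-day change: zero for any straight line."""
    return float(np.mean(np.abs(np.diff(a, n=2))))

days = np.arange(10)
smooth_decline = 100.0 - 10.0 * days               # steady, perfectly smooth decline
jagged_flat = np.where(days % 2 == 0, 52.0, 48.0)  # no trend, but oscillates every day

for name, curve in [("smooth_decline", smooth_decline), ("jagged_flat", jagged_flat)]:
    print(name, mean_abs_first_diff(curve), mean_abs_second_diff(curve))
# smooth_decline 10.0 0.0
# jagged_flat 4.0 8.0
```

The first-difference score ranks the oscillating curve as "smoother" than the clean decline, while the second-difference score does the opposite.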

A better approach might be to minimize day-to-day changes with reference to the overall population trend. Something more like |a_i/mean(a_i) - a_{i+1}/mean(a_{i+1})| (where the mean is across all individuals at that timepoint), for example. A good health measure could produce zero on this variation score and still allow for smooth changes in health over time.
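The population-relative score suggested above can be written in a few lines. In the sketch, rows are individuals and columns are days. Averaging the per-day terms into one score per individual is an assumption, and `population_relative_change` is a hypothetical name.

```python
import numpy as np

def population_relative_change(activity):
    """Per-individual mean of |a_i/mean(a_i) - a_{i+1}/mean(a_{i+1})|.

    activity: array of shape (n_individuals, n_days); means are taken across
    individuals on each day, so every day needs a non-zero population mean.
    """
    activity = np.asarray(activity, dtype=float)
    relative = activity / activity.mean(axis=0, keepdims=True)
    return np.mean(np.abs(np.diff(relative, axis=1)), axis=1)

# Both worms decline by the same proportion as the population.
activity = np.array([[100.0, 80.0, 60.0],
                     [ 50.0, 40.0, 30.0]])
print(population_relative_change(activity))                # [0. 0.]
print(np.mean(np.abs(np.diff(activity, axis=1)), axis=1))  # raw change: [20. 10.]
```

Both animals decline steadily, yet the relative score is zero. A raw day-to-day change score would penalize them.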

Even that's pretty ad hoc, however, and would be confounded by changes in the population variance in activity scores over time. So the real right answer is for the authors to look to the existing statistical toolkit, rather than reinventing their own measures of variation. In particular, if the authors wish to measure variation, then statistical variance would be an obvious choice. But one wouldn't want to measure the variance of the a_i since, as above, those scores are expected to change over time. A more principled approach would be to calculate each individual's movement z-score as a function of time i.e. z_i = (a_i - mean(a_i)) / std(a_i) (where the mean and std are across all individuals at that timepoint), and then calculate the variance of the z_i scores across all timepoints for that particular individual. The "smoothest" activity measure would be the one with the least variance in z-scores over time.
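A minimal sketch of the z-score approach follows. Each day's activity is standardized across individuals, and the variance of each individual's z-scores is then taken over days. The use of sample statistics (`ddof=1`) is an assumption, and `zscore_variance` is a hypothetical name.

```python
import numpy as np

def zscore_variance(activity):
    """Variance over days of each individual's daily z-score.

    activity: array of shape (n_individuals, n_days). Mean and standard deviation
    are taken across individuals on each day (needs >= 2 individuals and a
    non-zero spread every day). Sample (ddof=1) statistics are an assumption.
    """
    activity = np.asarray(activity, dtype=float)
    mu = activity.mean(axis=0, keepdims=True)
    sd = activity.std(axis=0, ddof=1, keepdims=True)
    z = (activity - mu) / sd
    return z.var(axis=1, ddof=1)

# Population declines every day but every worm keeps its rank.
stable = np.array([[100.0, 80.0, 60.0],
                   [ 90.0, 70.0, 50.0],
                   [ 80.0, 60.0, 40.0]])
# Same daily population means, but worms 0 and 2 swap ranks on the middle day.
swapping = np.array([[100.0, 60.0, 60.0],
                     [ 90.0, 70.0, 50.0],
                     [ 80.0, 80.0, 40.0]])
print(zscore_variance(stable))
print(zscore_variance(swapping))
```

The population trend drops out entirely. Only changes in an individual's standing relative to its cohort contribute: zero variance for `stable`, 4/3 for the two rank-swapping worms in `swapping`.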

Note also that if the "peak activity" metric is truly the best and most robust, one might hope that it would perform best across a panel of different metrics, such as all those suggested above. But if each different variation score yields a different choice of "best metric", one might become skeptical of the whole project.

2d) A "5 pixels changed" threshold is used to define lifespan (line 146-7 and 265-6), but there is no discussion of (i) how this threshold was chosen, (ii) over what time interval it is computed (since the authors note that longer time intervals increase sensitivity to movement), (iii) how sensitive lifespan estimates are to that specific threshold.
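The sensitivity analysis requested in point (iii) can be sketched as a threshold sweep on one toy trace. The rule that converts a threshold into a lifespan is an assumption, not the authors' documented procedure: days alive equals the last day, counted from 1, whose activity exceeds the threshold. The default of 5 mirrors the "5 pixels changed" value. The time interval over which activity is computed (point ii) is taken as fixed.

```python
import numpy as np

def lifespan_from_threshold(daily_activity, threshold=5):
    """Hypothetical rule: days alive = last day (counted from 1) whose activity exceeds threshold."""
    above = np.flatnonzero(np.asarray(daily_activity, dtype=float) > threshold)
    return int(above[-1]) + 1 if above.size else 0

trace = [40, 35, 20, 12, 8, 6, 4, 3, 1, 0]  # toy daily activity of one worm
for threshold in (2, 5, 10):
    print(threshold, lifespan_from_threshold(trace, threshold))
# 2 8
# 5 6
# 10 4
```

On a slowly fading trace like this, the estimate shifts by several days across plausible thresholds. Reporting such a sweep would answer the reviewer's question directly.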

3) Regarding the different activity metrics (peak, median, etc.), how much do these distinctions actually matter? Do any of the conclusions from the later sections actually depend on using peak/99th percentile, or are the qualitative results the same if median or integral was employed? For that matter, how well correlated are all of these measures with one another? Perhaps this is all a bit of a distinction without a difference, if all the measures are highly correlated...
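The correlation question can be checked directly once each individual has one value per metric. The sketch below uses simulated traces, not WorMotel data. In it, "peak" uses the mean-of-percentile-endpoints reading, and "integral" is a simple Riemann sum. Both are assumptions. A rank-based alternative to Pearson's r would be `scipy.stats.spearmanr`.

```python
import numpy as np

def summary_metrics(trace, interval_s=1.0):
    """Four per-trace summaries; 'peak' uses the mean-of-percentile-endpoints reading (an assumption)."""
    trace = np.asarray(trace, dtype=float)
    p95, p99 = np.percentile(trace, [95, 99])
    return [
        (p95 + p99) / 2,                  # peak (one possible reading)
        p99,                              # 99th percentile
        float(np.median(trace)),          # median
        float(trace.sum() * interval_s),  # integral as a simple Riemann sum
    ]

rng = np.random.default_rng(0)
# Toy data: 30 worms whose activity levels differ by a per-worm scale factor.
scales = rng.uniform(1.0, 10.0, size=30)
traces = [rng.exponential(scale=s, size=200) for s in scales]
metrics = np.array([summary_metrics(t) for t in traces])  # shape (30, 4)
corr = np.corrcoef(metrics, rowvar=False)                 # 4 x 4 Pearson matrix
print(np.round(corr, 2))
```

The toy worms differ only by a scale factor, so high correlations here are built in. With real data, the informative case is whether the correlations stay high across days and time intervals.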

4) Regarding the determination of an appropriate healthspan descriptor (HR vs. HSR) + threshold (lines 273-299), the logic/language in this section is pretty unclear throughout.

4a) Consider making these paragraphs into a new section altogether (e.g. “determining ideal metrics for health”).

4b) It would be useful to clearly state ahead of time that the decision of interest involves choosing HR vs. HSR (as opposed to TDH vs. HS, and separate from just determining the threshold).

4c) lines 273-289 and lines 290-299: the authors appear to be making the assumption that the ideal descriptor and threshold should (i) distinguish among daf-2, daf-16, and WT, and (ii) be consistent across experiments. The authors should state these criteria explicitly and justify them. In particular, as above, this introduces some degree of circular reasoning. If the authors chose their health metric based on how well it can distinguish daf-2 and daf-16 from WT, then some reader of this paper would be on very shaky ground to then use that metric to try to learn anything about the health effects of IIS by using such a metric.

4d) lines 292-295: "at very high thresholds, nTDH reflects the intrinsic activity of the young worm populations (vs control levels), whereas at low thresholds, this value approaches longevity (Fig S4 and S5). The optimal threshold region can be found in between these two, where HR (=TDH/LS) reflects health". The authors' argument seems to be that because the extremes of threshold ranges are bad health scores, the middle of the range must be a good score. This is not a priori true: the middle of the range could be bad too! Absent some external criterion, there's no way for the authors to really figure out what "optimal" might be.

5) The authors don't mention anywhere the real take-home message of Figure 6, which is that the HSR metric is hugely more variable than all the rest (look at those standard errors!) and is probably useless as a result.

6) The discussion of the integrated activity (Figure 7 and lines 320-327) is extremely vague and unclear: what does the integral of the population curve tell us that the population averages of the individual statistics don't? The figure legend claims that the normalized IA better distinguishes the daf-16 replicates from WT (and/or daf-2) than the other measures, but that's not visually obvious from Figure 7, nor is there any quantification.
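One way to make "integrated activity" concrete is to take the area under the population-mean activity curve and express it relative to wild type. The sketch below makes three assumptions. Dead animals enter as 0, the trapezoidal rule is used with a one-day step, and normalization is to the WT value. The function name and data are hypothetical.

```python
import numpy as np

def integrated_activity(activity, day_step=1.0):
    """Area under the population-mean activity curve (trapezoidal rule).

    activity: array (n_individuals, n_days); dead animals are assumed to be
    entered as 0 so that the population curve falls as worms die.
    """
    curve = np.asarray(activity, dtype=float).mean(axis=0)
    return float(np.sum((curve[1:] + curve[:-1]) / 2.0) * day_step)

wt = np.array([[10, 8, 6, 4, 2, 0, 0],
               [10, 8, 6, 4, 2, 0, 0]])
treated = np.array([[12, 10, 8, 6, 4, 2, 0],
                    [ 8,  8, 8, 6, 4, 2, 0]])
ia_wt = integrated_activity(wt)
ia_treated = integrated_activity(treated)
print(ia_wt, ia_treated, ia_treated / ia_wt)
```

A single number like this combines how active the population is with how long it survives. That is why it can differ from population averages of per-individual statistics, and why it needs its own interpretation.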

7) Lines 343-345: "This is why relative assessment, by normalization of individual data to the values of respective control populations, facilitates direct comparison over experiments (Fig 7)." This is an assertion, but isn't really backed up by the data. Figure 7 still shows a ton of replicate-to-replicate variability. How much smaller is the normalized inter-replicate variability compared to un-normalized? (At high enough levels of inter-plate variability, normalization can't really help anymore because you can't even count on the experiment-control pairs to be comparable within a replicate.)
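The comparison the reviewer asks for can be computed as the coefficient of variation across replicates, before and after dividing by each replicate's control. The toy numbers below assume a purely multiplicative plate effect. That is the best case for normalization, not evidence that the real data behave this way.

```python
import numpy as np

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=1) / values.mean())

# Toy replicates: plate-level conditions scale control and treated worms alike.
control = np.array([100.0, 60.0, 80.0])
treated = np.array([150.0, 90.0, 120.0])
print(coefficient_of_variation(treated))            # raw: 0.25
print(coefficient_of_variation(treated / control))  # normalized: 0.0
```

If replicate effects are additive or interact with treatment, the normalized CV stays above zero. Reporting both values for the real replicates would show how much normalization actually helps.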

8) Lines 377-379: "In line with literature, this classifies the ‘gerospan’ or ‘extended twiglight’ of daf-2-like interventions as unhealthy". This isn't really consistent with the authors' data, which show that daf-2 has a larger TDH, HS and IA than WT. The HR is smaller for daf-2, and the HSR is so variable as to be useless (as above). Thus across many of the authors' own measures daf-2 is perfectly healthy. The "deficiency" of daf-2 is that while it gains many more days of good health vs. WT, it also gains even more days of poor health. Whether or not this is a "good" tradeoff is a value judgement rather than an empirical fact. Moreover, as per Hahm 2015, under other metrics of health daf-2 behaves quite differently.

Related to the above, lines 384-385 are also an assertion / value judgement masquerading as fact: "interventions that increase health more than they increase longevity [are] a combination of high biological and medical interest". Take for example a condition in which daf-2 individuals are euthanized the second that their activity falls below whatever threshold is defined for "good health". This condition would produce a substantial increase in health with a much more modest increase in lifespan (i.e. this would truncate the "extended twilight" tail of daf-2). Would such an intervention really be better or more medically interesting than a daf-2-like intervention that doesn't involve euthanasia? The fact that lifespan-limiting "interventions" can trivially produce health ratios of 1 suggests that the HR may not really be the most biologically informative measure of "good health". Some careful discussion of what health ratios actually mean (if anything) is in order.
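The euthanasia thought experiment reduces to simple arithmetic on HR = TDH/LS. The sketch uses a single toy trace and takes lifespan as the number of recorded days. A fixed threshold of 5 is assumed, and `health_ratio` is a hypothetical helper.

```python
import numpy as np

def health_ratio(daily_activity, threshold):
    """HR = TDH / LS, taking lifespan as the number of recorded days (an assumption)."""
    activity = np.asarray(daily_activity, dtype=float)
    return np.count_nonzero(activity >= threshold) / activity.size

long_lived = [10, 9, 8, 7, 6, 4, 3, 2, 1, 1]  # toy daf-2-like trace with a long 'twilight'
first_bad = next(i for i, a in enumerate(long_lived) if a < 5)
truncated = long_lived[:first_bad]  # same worm, life ended at the first sub-threshold day
print(health_ratio(long_lived, 5), health_ratio(truncated, 5))  # 0.5 1.0
```

TDH is identical (5 days) in both cases. HR doubles only because lifespan was cut short.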

MINOR CONCERNS

1) Typos or vague / unclear language :

lines 71-75: run-on / complex / overly informal sentence.

line 139: “day-tot day”

lines 218-221: what is actually being said here?

lines 278-281: likewise unclear.

line 378: “twiglight”

2) More care should be taken in referencing:

2a) ref 12, the Lifespan Machine paper, is not particularly germane to the claim on lines 171-173 that "The exact effects of these genetic interventions on the lifespan of C. elegans varies somewhat in high-throughput screens and between different labs".

2b) The following aren't in the correct format / style:

line 50: Bansal et al.

line 89: Wormbook

2c) Probably should also cite Herndon 2002, Huang 2004, Pincus 2011, and Zhang 2016 regarding movement decline as a predictor of lifespan (line 52). Likewise, citations for longitudinal analysis of individuals / populations are a bit spotty. Hulme 2010, Pincus 2011, Zhang 2016 would be good to include for the individual case, and both of the Stroustrup papers for populations.

2d) line 113-115: Citing Hahm 2015 here is deceptive. They showed that spontaneous movement on food is confounded by genotype, compared to maximum velocity off food. But the Hahm results have nothing at all to say about whether stimulated movement (on food) is more or less confounded than spontaneous movement (on food), which is the point that the present authors are attempting to make. If anything, the original Churgin paper may better support this statement. If this is an important point to emphasize, the authors should explain their rationale more carefully.

3) Lines 232-234: "Based on these considerations, we opted for the peak activity as the activity value of choice, preferring it over the maximal activity solely based on the fact that the first value is based on more observation points and might therefore be a more accurate representation of true biology." This is not really a mathematically or statistically cogent statement. Calculating a percentile uses all of the input data in the same way that calculating a mean does. It doesn't somehow use "more" data to calculate a percentile and then calculate a mean based off of a subset of the data defined by the percentile.

4) Figure 2: (a) Consider labeling titles/axes for each respective graph to identify which statistic is used to generate the data. (b) Panels B & C for 99th and peak activity appear identical; check whether these graphs have been correctly generated.

5) Figure 5: Why do daf-2 lifespan estimates decrease with the longest time intervals? Are the animals somehow returning back to where they started such that the difference scores decrease and they register as "dead" incorrectly? This is more than a little odd...

6) Figure 7: It would be helpful to explicitly mention that (a) everything in this figure is normalized to the mean of WT, and (b) IA stands for integrated activity. (Including in the relevant methods section.)

7) Though mentioned in Methods, consider reintroducing the concept of HR/HSR in the Results in greater detail than lines 255-6. These terms are heavily used in the Results and it can be a bit difficult for the reader absent a refresher.

8) Table S4: Consider making the second column title for Table S4 “p-value for interaction with experiment”

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Sean P Curran

11 Feb 2020

Optimized criteria for locomotion-based healthspan evaluation in C. elegans using the WorMotel system

PONE-D-19-22873R1

Dear Dr. Jushaj,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Sean P. Curran

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This revised manuscript focuses on optimizing the use of movement-based data from the WorMotel system to better evaluate healthspan. They do this by first evaluating four different statistics of single worm movement before choosing the one that is the least variable over independent experiments (peak activity). They then measure “total days of health” and healthspans of daf-2 and daf-16 RNAi animals, both of which are based on varying thresholds of peak activity compared to wildtype. They further analyze these data by using various ratios and integrals to develop further differences between the phenotypes of the animals and to most robustly capture differences in their healthspans.

I would like to thank the authors for their revised manuscript. In the revised version, the authors have done a tremendous job in responding to the reviewer comments, and as such, the paper is much improved. Based on the improved explanations and additional data, I now see this manuscript as an extremely useful text for scientists interested in C. elegans aging and healthspan to refer to and to move the healthspan conversation forward. My only note to the authors, based on this revised manuscript, is that some figures (e.g. Fig 6, 7, 9) lack clear labeling where it would be helpful in the figure, and some images are quite blurry. This likely can be fixed during production.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Acceptance letter

Sean P Curran

24 Feb 2020

PONE-D-19-22873R1

Optimized criteria for locomotion-based healthspan evaluation in C. elegans using the WorMotel system

Dear Dr. Jushaj:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sean P. Curran

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Peak and maximal activity on the one hand, and median and integral activity on the other, form two separate groups of correlated parameters.

    Activity calculated from peak values correlates perfectly with maximal activity values (99th percentile) at every time interval (A), whereas its correlation with integral (B) and median (C) values depends on the time interval. (D) Integral values, by contrast, correlate well with median values, but neither of these (E integral, F median) escapes the weaker, interval-dependent correlation with maximal activity. These data suggest that determining peak/maximal and median/integral daily activities yields only two distinct interpretations of the activity profile. The time-interval dependence of the correlations in B, C, E and F is readily explained by the higher sensitivity of median/integral values to the time interval between analyzed images.

    (TIF)

    S2 Fig. Correlation of individual worm data calculated at a 100-second interval visually shows the two separate groups of correlated parameters.

    Activity was calculated based on the different parameters for each worm on each day, independent of genotype. (A) Activity calculated by using the peak values correlates perfectly with maximal activity values (99th percentile), whereas correlation with (B) integral and (C) median values is less pronounced. (D) Integral values, on the contrary, correlate well with median values, but both (E integral, F median) show a weaker correlation with the maximal value.

    (TIF)

    S3 Fig. Peak activity values result in activity curves with the lowest variation, as is clear from the distributions of overall variation based on daily median, integrated, peak or maximal activity for control (black), daf-2 (pink) or daf-16 (yellow) RNAi-treated populations across all experiments, shown for increasing time intervals.

    For each individual, day-to-day variation was calculated as stated in the main text. Box values: Q1-2-3, whiskers: +/–2.7σ.

    (TIF)
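The whisker limit of ±2.7σ quoted in these captions is what the conventional box-plot rule (whiskers reaching at most 1.5 × IQR beyond the quartiles) gives for normally distributed data. The short check below assumes that rule; it only illustrates where the number comes from and is not the authors' plotting code.

```python
from statistics import NormalDist


def whisker_reach_in_sigma(whisker_factor: float = 1.5) -> float:
    """Distance (in standard deviations) from the centre of a normal distribution
    to the upper whisker end, Q3 + whisker_factor * IQR."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    q1 = std_normal.inv_cdf(0.25)  # first quartile, about -0.674 sigma
    q3 = std_normal.inv_cdf(0.75)  # third quartile, about +0.674 sigma
    iqr = q3 - q1                  # interquartile range, about 1.349 sigma
    return q3 + whisker_factor * iqr


reach = whisker_reach_in_sigma()
# Fraction of normally distributed data falling between the two whisker ends.
coverage = NormalDist().cdf(reach) - NormalDist().cdf(-reach)
print(f"whiskers at +/-{reach:.3f} sigma, covering {coverage:.1%} of the data")
```

Raising `whisker_factor` widens the whiskers; points beyond them are the ones usually drawn as outliers.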

    S4 Fig. For individual experimental plates as well, peak and maximal activity values result in activity curves with the lowest variation.

    Distributions of overall variation based on daily median, integrated, peak or maximal activity for control (black), daf-2 (pink) or daf-16 (yellow) RNAi-treated populations across all experiments (A Exp I; B Exp II; C Exp III; D Exp IV) follow the same trends as pooled data (Fig 5). For each individual, day-to-day variation was calculated as stated in the main text. Box values: Q1-2-3, whiskers: +/–2.7σ.

    (TIF)

    S5 Fig. Variance in Z-score is similar for all activity parameters at all time intervals, as is clear from the distributions based on daily median, peak, maximal or integrated activity for pooled control (black), daf-2 (pink) or daf-16 (yellow) RNAi-treated populations across all experiments.

    For each individual, day-to-day variation was calculated as stated in the main text. Box values: Q1-2-3, whiskers: +/–2.7σ.

    (TIF)

    S6 Fig

    Variance in Z-score based on individual plates is similar for all activity parameters at all time intervals, as is clear from the distributions based on daily median, peak, maximal or integrated activity for control (black), daf-2 (pink) or daf-16 (yellow) RNAi-treated populations across all experiments (A Exp I; B Exp II; C Exp III; D Exp IV). For each individual, day-to-day variation was calculated as stated in the main text. Box values: Q1-2-3, whiskers: +/–2.7σ.

    (TIF)

    S7 Fig. Average survivor activity is higher for longer time intervals across all experiments.

    Average survivor activity for control, daf-2 and daf-16 RNAi-treated populations for (A-C) Exp II, (D-F) Exp III and (G-I) Exp IV. Longer time intervals (≥60 s) provide more accurate measurements; this is especially important in late phases of life.

    (TIF)

    S8 Fig. The choice of time interval for activity evaluation affects the determination of lifespan (LS) in a genotype-dependent manner across all experiments.

    Mean lifespan (error bars: standard error of mean) was calculated for different time intervals for (A) Exp I, (B) Exp II, (C) Exp III and (D) Exp IV. The choice of time interval does not affect the calculation of lifespan of control (black) and daf-16 RNAi-treated (yellow) populations but does affect lifespan decisions made for the long-lived daf-2 RNAi-treated (pink) populations.

    (TIF)

    S9 Fig. Most pixel-difference values observed for healthy worms lie above 177 pixels, whereas most values observed for unhealthy worms lie below this value.

    Determining a threshold that maximizes the number of truly healthy worms in the healthy categories (very fast, fast, medium fast), while also maximizing the number of truly unhealthy worms in the unhealthy categories (slow, inactive), yielded a threshold of 177 changed pixels. Black line: fraction of animals in the ‘slow’ category with a pixel difference value < x-axis value; red line: fraction of animals in the ‘very fast’, ‘fast’ and ‘medium fast’ categories with a pixel difference value > x-axis value.

    (TIF)
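The threshold search described for S9 and S10 Figs can be sketched as follows, assuming each animal contributes pixel-difference values labelled either ‘slow’ or ‘fast’ (here pooling very fast, fast and medium fast), and taking the threshold where the two cumulative curves meet. The helper names and toy values are hypothetical; the authors' own scripts are provided in S2 File.

```python
from typing import Sequence


def fraction_below(values: Sequence[float], x: float) -> float:
    """Fraction of values strictly below x (the black, 'slow' curve)."""
    return sum(v < x for v in values) / len(values)


def fraction_above(values: Sequence[float], x: float) -> float:
    """Fraction of values strictly above x (the red, 'fast' curve)."""
    return sum(v > x for v in values) / len(values)


def crossing_threshold(slow: Sequence[float], fast: Sequence[float]) -> float:
    """Return the candidate threshold that maximizes the smaller of the two
    fractions, i.e. where the rising 'slow' curve meets the falling 'fast'
    curve. Candidates are midpoints between consecutive observed values."""
    observed = sorted(set(slow) | set(fast))
    if len(observed) < 2:
        raise ValueError("need at least two distinct pixel-difference values")
    candidates = [(a + b) / 2 for a, b in zip(observed, observed[1:])]
    best_x, best_score = candidates[0], -1.0
    for x in candidates:
        score = min(fraction_below(slow, x), fraction_above(fast, x))
        if score > best_score:  # keep the first candidate on ties
            best_x, best_score = x, score
    return best_x


# Made-up pixel-difference values, for illustration only.
slow_worms = [50, 80, 120, 150, 170, 200]
fast_worms = [140, 190, 230, 260, 300, 350]
print(crossing_threshold(slow_worms, fast_worms))
```

Maximizing the smaller of the two fractions balances both goals named in the caption instead of favouring one category.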

    S10 Fig

    Threshold determination at the level of individual plates gives results very similar to the pooled data, with the cumulative curves of medium fast and slow worms intersecting at approximately 160 pixel differences (A Exp I; B Exp II; C Exp III; D Exp IV).

    (TIF)

    S11 Fig. Decrease in pixel differences with locomotive health follows similar trend for individual plates, with lower pixel differences being assigned to categories of lower locomotive health.

    In general, pixel differences below 160 belong to categories 4 and 5 for A Exp I; B Exp II; C Exp III; D Exp IV.

    (TIF)

    S12 Fig. Three examples illustrate visually that TDH is a better approximation of health observed in our data when compared to HS.

    We plotted the activity profiles of three individual wild-type worms whose HS was 1, but whose TDH deviated (A) strongly, (B) moderately or (C) slightly from HS. The activity profile of (A) shows a flare of activity on the last day of life, resulting in a misleadingly high HS. The TDH of (B) is closer to HS than in (A); however, fluctuations in this worm's activity profile indicate that TDH better represents the animal's health. In (C), TDH deviates from the quantified HS by only two days but nevertheless gives a better approximation of health. All three worms visually indicate that HR (~TDH) is a more accurate quantification of observed health than HSR (~HS).

    (TIF)
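The contrast between TDH and HS in S12 Fig (and the HR and HSR values in S5 Table) can be made concrete with a small sketch. It assumes, consistent with the last-day flare in panel A, that HS runs up to the last day on which activity reaches the threshold, that TDH counts every such day, and that HR = TDH/LS and HSR = HS/LS. The exact definitions are given in the main text; the profile below is made up, and the threshold is borrowed from S9 Fig for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HealthSummary:
    ls: int     # lifespan: number of days with activity data (assumed)
    tdh: int    # total days of health: days at or above the threshold
    hs: int     # healthspan: day number of the last day at or above threshold (assumed)
    hr: float   # health ratio, TDH / LS
    hsr: float  # healthspan ratio, HS / LS


def summarize_health(daily_activity: Sequence[float], threshold: float) -> HealthSummary:
    """Summarize one worm's daily activity profile (day 1 first, last day alive last)."""
    if not daily_activity:
        raise ValueError("activity profile is empty")
    ls = len(daily_activity)
    healthy_days = [day for day, a in enumerate(daily_activity, start=1) if a >= threshold]
    tdh = len(healthy_days)
    hs = healthy_days[-1] if healthy_days else 0
    return HealthSummary(ls=ls, tdh=tdh, hs=hs, hr=tdh / ls, hsr=hs / ls)


# Hypothetical profile: decline from day 4 on, then a flare of activity on the last day.
profile = [400, 350, 300, 120, 90, 150, 100, 200]
print(summarize_health(profile, threshold=177))
```

Under these assumptions the single flare on day 8 pushes HSR to 1, while HR stays at 0.5, mirroring panel A.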

    S1 Table. Correlation of tested activity parameters depends on the parameter and time interval.

    Correlation of the activity values based on median, peak, maximal (99th percentile) or integral values (see Methods) was tested with a linear regression model for each strain (column D) and time interval (column E). R² values (column C) show that peak activity correlates well with the 99th percentile value, while median and integral values correlate well with each other. Correlation becomes stronger with increasing time interval.

    (XLSX)

    S2 Table. Data based on peak or maximal activity show the lowest overall variation.

    Overall variation was calculated for pooled (per genotype) activity data based on median, peak, maximal (99th p = 99th percentile) or integral values (see Methods), for each time interval (column D). Differences (columns A vs B) were compared via Kruskal-Wallis testing (column C: multiple testing-corrected p-values). Bold red: p-values indicative of statistically significant differences.

    (XLSX)
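S2, S3, S4 and S6 Tables compare distributions pairwise (columns A vs B) with Kruskal-Wallis tests and report multiple testing-corrected p-values. With only two groups, the H statistic is compared to a chi-square distribution with one degree of freedom. The standard-library sketch below illustrates this. Which multiple-testing correction the authors applied is not stated in this section, so the Bonferroni step is an assumption; in practice, scipy.stats.kruskal returns the same tie-corrected statistic.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its p-value for two groups (1 degree of freedom)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a, rank_sum_b = sum(ranks[: len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b)) - 3 * (n + 1)
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0.0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    # Survival function of a chi-square with 1 df: P(Z**2 > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2.0))


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni adjustment (an assumed choice; other corrections are possible)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


h, p = kruskal_wallis_two_groups([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(h, p, bonferroni([p, 0.2, 0.6]))
```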

    S3 Table. Use of different activity parameters does not influence variance in Z-score.

    Variance in Z-score was calculated for pooled (per genotype) activity data based on median, peak, maximal (99th p = 99th percentile) or integral values (see Methods), for each time interval (column D). Differences (columns A vs B) were compared via Kruskal-Wallis testing (column C: multiple testing-corrected p-values). Bold red: p-values indicative of statistically significant differences.

    (XLSX)

    S4 Table. Time intervals affect the determination of lifespan in a genotype-dependent manner.

    Lifespan was calculated for pooled (per genotype) activity data for each time interval under consideration. Differences (columns A vs B) were compared via Kruskal-Wallis testing (column C: multiple testing-corrected p-values). Bold red: p-values indicative of statistically significant differences.

    (XLSX)

    S5 Table. HSR values for individual worms often equal 1, independent of genotype or experiment.

    LS, HR and HSR values for individual worms are shown for each population and experiment. HSR values often equal 1 for different genotypes, meaning that healthspan and lifespan are the same. HR values, on the contrary, better reflect what has been observed in the literature, with values that often range from 0.5 to 0.8.

    (XLSX)

    S6 Table. Health(span) ratio of daf-2 RNAi-treated animals is significantly different from that of internal controls.

    H(S)R values of control, daf-2 and daf-16 RNAi-treated animals were calculated. H(S)R distributions of daf-2 and daf-16 RNAi-treated animals were tested for significant differences from controls at the same threshold (Kruskal-Wallis test, multiple testing-corrected p < 0.05). Significant p-values are marked in red.

    (XLSX)

    S1 File. Pixel difference datafiles for all experiments.

    Pixel differences per worm per day were calculated (see Methods) and stored in daily pdata files. The number in each pdata file name increases with time. Each column in a pdata file represents an individual worm (1 to 240), and each row lists a value according to the time vector.

    (ZIP)
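A minimal sketch of turning one daily pdata table into per-worm summaries of the kind compared in S1 Fig and S1 Table (median, 99th-percentile "maximal", and integral activity). It assumes the table is plain whitespace-delimited text and that the time vector is supplied separately; the function names are hypothetical, the integral is taken as a trapezoidal area (an assumption), and peak activity is omitted because its definition is given in the main-text Methods rather than here. The scripts actually used are in S2 File.

```python
import statistics
from typing import Dict, List, Sequence


def parse_pdata(text: str) -> List[List[float]]:
    """Parse a pdata-like table: one row per time point, one column per worm.
    Assumes whitespace-delimited numbers; the real file format may differ."""
    return [[float(tok) for tok in line.split()] for line in text.splitlines() if line.strip()]


def daily_summaries(rows: List[List[float]], times: Sequence[float]) -> List[Dict[str, float]]:
    """Per-worm daily summaries from one pdata table and its time vector."""
    if len(rows) != len(times) or len(rows) < 2:
        raise ValueError("need one row per time point and at least two time points")
    summaries = []
    for worm in range(len(rows[0])):
        series = [row[worm] for row in rows]
        # Trapezoidal area under the pixel-difference trace (an assumed definition).
        integral = sum((t1 - t0) * (y0 + y1) / 2
                       for t0, t1, y0, y1 in zip(times, times[1:], series, series[1:]))
        summaries.append({
            "median": statistics.median(series),
            # 99th percentile with linear interpolation between order statistics.
            "p99": statistics.quantiles(series, n=100, method="inclusive")[98],
            "max": max(series),
            "integral": integral,
        })
    return summaries


# Toy table: three time points, two worms (real files have 240 columns).
toy = "0 10\n2 20\n4 30\n"
for worm_id, s in enumerate(daily_summaries(parse_pdata(toy), [0, 100, 200]), start=1):
    print(worm_id, s)
```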

    S2 File. Tutorial for data analysis.

    This includes the relevant scripts (see Tutorial).

    (ZIP)

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files. The series of images accompanying the tutorial for data analysis have been made available at https://www.ebi.ac.uk/biostudies/studies/S-BSST313#.


    Articles from PLoS ONE are provided here courtesy of PLOS
