PLOS Biology. 2025 Jul 10;23(7):e3003217. doi: 10.1371/journal.pbio.3003217

Wakefulness can be distinguished from general anesthesia and sleep in flies using a massive library of univariate time series analyses

Angus Leung 1,*,#, Ahmed Mahmoud 1,#, Travis Jeans 2, Ben D Fulcher 3, Bruno van Swinderen 2, Naotsugu Tsuchiya 1,4,5,*
Editors: Christopher Chambers 6, Simon van Gaal 6
PMCID: PMC12244653  PMID: 40638578

Abstract

The neural mechanisms of consciousness remain elusive. Previous studies on both human and non-human animals, through manipulation of level of conscious arousal, have reported that specific time-series features correlate with level of consciousness, such as spectral power in certain frequency bands. However, such features often lack principled, theoretical justifications as to why they should be related to level of consciousness. This raises two significant issues: firstly, many other types of time-series features which could also reflect conscious level have been ignored due to researcher biases toward specific analyses; and secondly, it is unclear how to interpret identified features to understand the neural activity underlying consciousness, especially when they are identified from recordings which summate activity across large areas such as electroencephalographic recordings. To address the first issue, here we propose a new approach: in the absence of any theoretical priors, we should be maximally agnostic and treat as many known features as feasible as equally promising candidates. To apply this approach, we use highly comparative time-series analysis (hctsa), a toolbox which provides over 7,700 different univariate time-series features originating from different research fields. To address the second issue, we apply hctsa to high-quality neural recordings from a relatively simple brain, the fly brain (Drosophila melanogaster), extracting features from local field potentials during wakefulness, general anesthesia, and sleep. At Stage 1 of this registered report, we constructed a classifier for each feature, for discriminating wakefulness and anesthesia in a discovery group of flies (N = 13). At Stage 2, we assessed their performances on four independent groups of evaluation flies, from which recordings were made during anesthesia and sleep, and which were originally blinded to the data analysis team (N = 49).
We found only 47 time-series features, applied to recordings obtained from the center of the fly brain, to also significantly classify wake from anesthesia or sleep in all 4 of these evaluation datasets. Most of these were related to autocorrelation, and they indicated that signals during wakefulness remained correlated to their past for a longer timescale than during anesthesia and sleep. Meanwhile, time-series features related to well-known potential markers of consciousness, such as those related to complexity or spectral power, failed to generalize across all the flies. However, many of these complexity and spectral features have a consistent direction of effect due to anesthesia or sleep across flies, suggesting that even slight variations in experiment setup can reduce generalizability of classifiers. These results caution against the current state of frequent discoveries of new potential consciousness markers, which may not generalize across datasets, and point to autocorrelation as a class of dynamical properties which does.


Varying levels of consciousness correlate with multiple time-series features. This pre-registered study uses a data-driven approach to search for markers that can determine the individual performance of these features in distinguishing levels of consciousness.

Introduction

How physical mechanisms generate conscious arousal is a longstanding question in neuroscience. Understanding the mechanisms that support consciousness will have significant impacts in clinical assessment of loss of consciousness [1]. Historically, researchers have approached this question through identifying electrophysiological differences in brain recordings between differing levels of consciousness, such as wakefulness and general anesthesia. This approach has resulted in the discovery of multiple time-series features as markers of level of consciousness, including spectral power in different frequency bands [2–8] and measures of signal complexity in spontaneous recordings [9–12]. Despite these developments, performance in distinguishing levels of consciousness using such markers remains limited [13,14].

The limited performance of previously identified markers in distinguishing levels of consciousness, and failure to extend to new conditions, may be due to a lack of theoretical expectations as to what they should be. Historically, candidate markers were often found through visually contrasting electrophysiological recordings, such as electroencephalograms, obtained at varying levels of consciousness [15,16]. Though this approach has led to well-known markers of depth of anesthesia, it is limited by biases toward groups of time-series features for which differences in level of consciousness are visually clear (for a related issue in sleep research, see [17]). While newer markers have moved away from features which are visually clear, a similar problem applies, wherein researchers investigate features selected based on their own expertise of particular facets of time-series structure. Consequently, a vast range of other time-series features have been ignored as potential markers for conscious level.

One way of removing the bias inherent to selecting individual features to investigate as markers of conscious level is to be maximally agnostic about the types of time-series properties which can map to consciousness. Then, we can systematically test and compare as many time-series features as feasible. This approach consists of two main components.

Firstly, “all” potential features of some given neurophysiological time series should be investigated. While comparison of multiple features has been done previously on well-established features [13,14], “all” features should be compared, not only those determined by or related to visual inspection or individual expertise in particular time-series features. While at first this may seem like a daunting task, it is feasible using highly comparative time series analysis (hctsa; [18]). hctsa is a computational framework which extracts from a given time series a massive number (>7,000) of univariate time-series features. These features are taken from a multitude of research fields, and include measures such as basic statistics of the distribution of time samples, linear correlations among timepoints, stationarity, and entropy measures, among others. This library has been applied previously to find meaningful time-series features for applications ranging from detecting falls [19] to identifying physiological dynamics underlying neurological disorders [20].

Secondly, to avoid overfitting to a particular dataset, features should be validated on datasets independent from the dataset from which the features were originally identified [21,22]. In particular, validating features on blinded datasets while utilizing a registered report approach can help mitigate biases toward certain features [23]. While standards are shifting toward testing proposed features in independent samples [14,24,25], “cross-validation”, a method which splits data into training and testing sets, is still widely utilized, likely due to the cost and clinical problems of obtaining independent datasets. This is especially true for data from human participants which involve manipulations of level of consciousness through general anesthesia [26]. Ethics further limits recruitment of healthy participants for which there is no medical reason for inducing anesthesia or obtaining recordings. Due to these considerations, the use of independent, blinded data is particularly rare in consciousness research (but see [27]).

The issue of data availability in human anesthesia recordings can be circumvented by first applying our approach to simpler brains, such as fly brains. Recordings from flies can be obtained relatively cheaply with no clinical concerns, and, due to the relatively small brain (~10^5 neurons compared to 10^11 for human brains; [28,29]), neural activity can be obtained simultaneously throughout the whole brain. Consequently, we can obtain high-quality recordings from many healthy flies. Using high-quality recordings from a relatively simple system also offers an advantage. That is, the identified time-series features can be more directly interpreted to understand underlying neural phenomena (compared to features identified from, e.g., recordings from the human scalp). Despite seemingly different neural architecture compared to mammals, flies seem to experience varying states of arousal, regulated in a similar way to mammals, such as sleep [30–33] and anesthesia [34,35]. Given these similarities and advantages described above, the fly serves as a useful model to begin to apply new data-driven approaches to discriminating consciousness levels from univariate neural time series (see also [12,36,37]).

In this registered report, we aim to evaluate a massive, comprehensive set of individual time-series features, coming from multiple research fields, as potential markers of level of consciousness. Which univariate time-series features accurately and reliably distinguish between conscious levels? And do they correspond to previously proposed univariate measures of conscious levels? Or are there some conceptually unexplored time-series features which perform better? If no features reliably distinguish conscious levels, this would highlight the need for bivariate or multivariate features. These would include features such as coherence, Granger causality [35,38], transfer entropy [39], Lempel-Ziv complexity (which can be applied both at the individual channel level as well as across multiple channels) [40], perturbational complexity index [41], etc. Alternatively, new measures derived from theories, such as integrated information, may be necessary [37,42]. Indeed, many theories of consciousness rely on interactions among parts, and would predict univariate features to be uninformative of conscious level.

Here, we compare the most comprehensive available set of scientific features, made available in the hctsa toolkit [18], searching for features that may warrant further exploration in the future as potential markers of consciousness. First, we search for features which reliably distinguish wakefulness from isoflurane anesthesia in a discovery dataset (N = 13 discovery flies) and which generalize to a blinded, independent dataset (N = 12 flies with graded levels of anesthesia; N = 18 flies with single level of anesthesia; and N = 19 flies during sleep). Second, we search for features for which the direction of the effect of anesthesia (i.e., yield consistently higher or lower values in anesthesia versus wakefulness) is consistent across datasets. These directionally consistent measures could be useful in assessing level of consciousness when a subject’s baseline is known. For these purposes, we apply and compare the hctsa features systematically. Critically, we validate them on recordings obtained from an independent set of flies which were blinded to the analysis team at the time of submitting the Stage 1 manuscript for this registered analysis. At the time of submitting the Stage 1 manuscript, our pilot results (on 2 of the evaluation flies) indicated that the performances of many features which had statistically significant performance in classifying wakefulness and anesthesia in the discovery dataset would not generalize to a second, independent, evaluation dataset, highlighting the importance of evaluating measures on independent datasets. Despite this, across the datasets, many features maintained their direction of the effect of anesthesia across the flies.

Summary table

Research question

  • What univariate time-series features (from hctsa) can serve as markers of level of consciousness ACROSS individuals?

  • What univariate time-series features (from hctsa) can serve as markers of level of consciousness WITHIN individuals?

Hypotheses

1 hypothesis for each hctsa feature at each channel:

  • Feature X classifies wake/anesthesia above chance

1 hypothesis for each hctsa feature at each channel:

  • Direction of effect of anesthesia for feature X is more consistent than chance

Sampling plan

Use existing data:

  • 13 discovery flies (Canton S wild-type) × 8 2.25s epochs each of wake/isoflurane (published previously);

  • 2 pilot evaluation flies (isoCJ1; previously unpublished) × 112 2.25s epochs each of wake/isoflurane;

  • 10 multi-dosage evaluation flies (isoCJ1; previously unpublished), epochs from wake/isoflurane at same concentration as pilot evaluation flies, plus epochs from all 12 flies (including previous 2 pilot evaluation flies) at a second isoflurane concentration (number of epochs undisclosed to data analysis team, but expecting same/similar to pilot evaluation flies) and during recovery after isoflurane;

  • 18 single-dosage evaluation flies (Canton S wild-type; previously unpublished), epochs from wake/isoflurane/post-isoflurane (number of epochs undisclosed to data analysis team);

  • 19 sleep evaluation flies (Canton S wild-type; unpublished), epochs from wake/sleep (number of epochs undisclosed to data analysis team)

Statistical analyses

Classification analysis, using a nearest-median classifier trained on the discovery flies.

  • Obtain classifier accuracy on discovery flies (leave-one-fly-out validation) and evaluation flies

  • Obtain significance by comparing classifier performance to random classification distribution (α = 0.05)

  • FDR correction at each channel (q = 0.05)

Consistency of wakeful epochs being greater/less than anesthesia epochs at each fly, based on direction of anesthesia effect in the discovery flies (see Methods section “Within-fly effect direction consistency”)

  • Obtain significance by comparing consistency to random consistency distribution (α = 0.05)

  • FDR correction at each channel (q = 0.05)

Pre-specified outcomes

The performance of feature X in discriminating wakefulness/anesthesia shows significant generalization across individuals and the feature is worth future investigation as a marker of conscious level if:

  • It performs significantly in the discovery flies AND

  • It performs significantly in the evaluation flies

The within-individual effect of anesthesia for feature X shows significant generalization across individuals, and the feature is worth future investigation as a marker of conscious level if:

  • Consistency of the direction of the effect of anesthesia is significantly above chance in the discovery flies AND

  • Consistency is significantly above chance in the evaluation flies, for the same direction as the discovery flies
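
The within-individual analysis in the table above compares wake and anesthesia epochs from the same fly. As a minimal sketch of that idea (not the analysis code of this study), the snippet below computes, for one fly and one feature, the proportion of wake/anesthesia epoch pairs that follow the direction of effect found in the discovery flies. The function name, the toy values, and the handling of ties are assumptions made for illustration.

```python
from typing import Sequence


def direction_consistency(wake: Sequence[float],
                          anest: Sequence[float],
                          wake_is_greater: bool) -> float:
    """Proportion of (wake, anesthesia) epoch pairs from one fly that
    follow the expected direction of effect.

    `wake_is_greater` is the direction observed in the discovery flies
    (True if wake values are expected to exceed anesthesia values).
    Ties are counted as inconsistent, which is an assumption of this sketch.
    """
    if len(wake) == 0 or len(anest) == 0:
        raise ValueError("need at least one epoch per condition")
    hits = 0
    for w in wake:
        for a in anest:
            hits += (w > a) if wake_is_greater else (w < a)
    return hits / (len(wake) * len(anest))


# Toy feature values for a single fly (hypothetical numbers).
wake_epochs = [0.9, 0.8, 0.7]
anest_epochs = [0.75, 0.2, 0.1]
print(direction_consistency(wake_epochs, anest_epochs, wake_is_greater=True))
```

A value near 1 means that nearly every wake epoch differs from nearly every anesthesia epoch in the expected direction, whereas a value near 0.5 is what unrelated values would give.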

Methods

Data and preprocessing

We use already-collected local field potentials (LFPs) from fruit fly brains during wakefulness and during isoflurane anesthesia. We use four independent datasets: (i) a discovery dataset for initially identifying features which perform well at discriminating wakefulness from anesthesia; and (ii) three blinded evaluation datasets for assessing the generalizability of these features, in which level of consciousness was manipulated through (a) multi-dose anesthesia (N = 12), (b) single-dose anesthesia (N = 18), and (c) sleep (N = 19). Fig 1 illustrates our data analysis pipeline for the two sets of flies. As our discovery dataset, we use previously published data from 13 flies [12,35,37]. As our blinded evaluation data, we use recordings from an additional 49 flies collected by TJ and BvS which were provided to AL and NT for analysis only after in-principle acceptance of the Stage 1 manuscript (initials refer to authors of this registered report). At the time of submission of the Stage 1 manuscript, 2 of the evaluation flies from the multi-dose anesthesia set were provided and used for pilot analysis, with the remaining flies being withheld for final evaluation until the analysis methods were fixed.

Fig 1. Analysis pipeline for individual features in hctsa.

(a) Flies were dorsally fixed to a rod and placed on an air-supported ball. Isoflurane was administered through a rubber hose. We use a discovery dataset of 13 flies to identify time-series features which discriminate wakefulness from anesthesia. We assess how the performances of these features generalize to an independent evaluation dataset consisting of 49 flies. We use 2 of these flies to obtain pilot generalization performance for registering this analysis. (b) Local field potentials (LFPs) are obtained during wakefulness and anesthesia using linear multi-electrode arrays inserted laterally into the fly brain. (c) At a given channel and time-series feature (here we show the feature StatAvl250), we compute feature values for every epoch from each fly (each entry in the image plot corresponds to a scaled feature value from one epoch). We train a nearest-median classifier using the discovery (D) flies, where the threshold for classifying wakefulness (red) versus anesthesia (blue) is the middle point (black vertical line) between the median values of the two conditions (red and blue vertical lines). We assess the feature’s across-fly classification performance on the discovery flies using a leave-one-fly-out cross-validation procedure. We assess the generalization of the feature’s performance by classifying epochs from the evaluation (E) flies using its threshold as obtained from all the discovery flies. (d) As a weaker form of generalization, we also assess within-fly effect direction consistency by finding, for each wake epoch, the proportion of anesthesia epochs which have greater or lesser (depending on the direction of the effect of anesthesia for the feature as illustrated in (c)) feature values. We visualize this here for a second feature by showing the within-fly differences between scaled feature values. Each entry in the rightmost image plot gives the difference between every combination of one wake epoch and one anesthesia epoch from the same fly.
Images in (a) and (b) obtained from [36]. Source data available on OSF (https://osf.io/8wvsq/?view_only=8a056d1c573b4f23a6cf6cea8b976ddb).
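
The nearest-median classifier and leave-one-fly-out procedure described in panel (c) can be sketched as follows for a single feature at a single channel. This is a minimal illustration under stated assumptions rather than the code used in this study: the data layout, the function names, the toy values, pooling of epochs across training flies, and averaging accuracy over held-out flies are all choices made for the example.

```python
from statistics import median
from typing import Dict, List, Tuple

# fly id -> (wake feature values, anesthesia feature values); hypothetical layout
FlyData = Dict[str, Tuple[List[float], List[float]]]


def fit_threshold(flies: FlyData) -> Tuple[float, bool]:
    """Return (threshold, wake_is_greater) from the pooled training epochs.

    The threshold is the midpoint between the two condition medians.
    """
    wake = [v for w, _ in flies.values() for v in w]
    anest = [v for _, a in flies.values() for v in a]
    m_wake, m_anest = median(wake), median(anest)
    return (m_wake + m_anest) / 2, m_wake > m_anest


def accuracy(flies: FlyData, threshold: float, wake_is_greater: bool) -> float:
    """Fraction of epochs assigned to their true condition."""
    correct = total = 0
    for wake, anest in flies.values():
        for v in wake:  # predicted "wake" when on the wake side of the threshold
            correct += (v > threshold) == wake_is_greater
            total += 1
        for v in anest:
            correct += (v > threshold) != wake_is_greater
            total += 1
    return correct / total


def leave_one_fly_out(flies: FlyData) -> float:
    """Mean accuracy over held-out flies, each time training on the others."""
    scores = []
    for held_out in flies:
        train = {k: v for k, v in flies.items() if k != held_out}
        thr, direction = fit_threshold(train)
        scores.append(accuracy({held_out: flies[held_out]}, thr, direction))
    return sum(scores) / len(scores)


# Toy values (hypothetical): three discovery flies, one evaluation fly.
discovery = {"D1": ([0.8, 0.9], [0.2, 0.3]),
             "D2": ([0.7, 0.6], [0.1, 0.4]),
             "D3": ([0.9, 0.55], [0.3, 0.2])}
evaluation = {"E1": ([0.65, 0.45], [0.35, 0.25])}

thr, direction = fit_threshold(discovery)
print("cross-validated accuracy:", leave_one_fly_out(discovery))
print("evaluation accuracy:", accuracy(evaluation, thr, direction))
```

Because the threshold depends only on two medians, a few extreme epochs (including infinite feature values) do not move it, which is one reason a median-based rule suits a library of thousands of heterogeneous features.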

Discovery flies

For this dataset, we provide details relevant to this registered report (for full details see [35]). Thirteen laboratory-reared female Drosophila melanogaster (Canton S wild type, 3–7 days post-eclosion) were collected under cold anesthesia and glued dorsally to a tungsten rod. Linear silicon probes (Neuronexus Technologies) were inserted laterally into the fly’s eye [43]. Each linear probe consisted of 16 electrodes with a site separation of 25 µm, and covered approximately half of the fly brain. Recordings were made with a sampling rate of 25 kHz using a Tucker-Davis Technologies multichannel data acquisition system and downsampled to 1,000 Hz.

Recordings for each fly were obtained from two blocks: one block with 0 vol% isoflurane at the fly body (wake condition), followed by a block with 0.6 vol% isoflurane (anesthesia condition). Isoflurane was delivered from an evaporator to the fly through a rubber hose. Each block followed a series of air puffs, and consisted of 18 s of rest, 248 s of visual stimuli, another 18 s of rest, and a second series of air puffs. Isoflurane was administered following the last air puff of the first block, and flies were left to adjust to the new concentration for 180 s before beginning the second block. Flies in the wake condition responded to air puffs by moving their legs and abdomen, but were inert during the anesthesia condition. Flies regained responsiveness after isoflurane was removed, ensuring that flies were alive during the anesthesia recordings [36]. We use the data obtained in the 18 s period of each block corresponding to the rest period preceding the visual stimuli.

We bipolar re-referenced the LFPs by subtracting adjacent electrodes to acquire 15 signals which we refer to as “channels”. Channel 1 refers to the channel positioned furthest into the fly brain. Finally, we segmented the 18 s period into 2.25 s segments, giving 8 epochs per fly and condition.
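
As an illustration of the re-referencing step, the sketch below subtracts adjacent electrodes sample by sample, so that 16 electrodes yield 15 bipolar channels. The function name and data layout are hypothetical, and the sign convention (which neighbour is subtracted from which) is an assumption that the text does not specify.

```python
from typing import List


def bipolar_rereference(electrodes: List[List[float]]) -> List[List[float]]:
    """Subtract each electrode's neighbour from it, sample by sample.

    `electrodes` holds one list of samples per electrode, ordered along the
    probe. N electrodes yield N - 1 bipolar channels.
    """
    if len(electrodes) < 2:
        raise ValueError("need at least two electrodes")
    n_samples = len(electrodes[0])
    if any(len(e) != n_samples for e in electrodes):
        raise ValueError("all electrodes must have the same number of samples")
    return [[a - b for a, b in zip(first, second)]
            for first, second in zip(electrodes, electrodes[1:])]


# Toy recording: 16 electrodes, 4 samples each (hypothetical values).
recording = [[float(i)] * 4 for i in range(16)]
channels = bipolar_rereference(recording)
print(len(channels))  # number of bipolar channels
```

Subtracting neighbouring electrodes removes signal components common to both sites, so each resulting channel reflects activity local to a pair of adjacent electrodes.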

Pilot evaluation flies

On 14/06/2019, the data-analysis team (AM, AL, and NT) was provided with 56 segments of 20 second spontaneous activity recordings from the data collection team (TJ and BvS). The 56 segments were known to the data analysis team as coming from 2 flies and from varied levels of anesthesia. The analysis team was initially blinded to the labeling of the segments, such that the source condition and fly of each segment was unknown. Further, the analysis team was blinded as to the distribution of segments coming from each fly or anesthesia condition (e.g., whether the 56 segments had an equal number of wake and anesthetized segments, or an equal number of segments from each fly), and to the specific variant of fly and the context in which the data had originally been collected.

However, these labels and information were made available (in June 2019) after early analyses using 18 s segments (corresponding to the original length of the segments from the discovery flies, instead of 2.25 s segments). We later deemed the 18 s approach inappropriate as we would be generalizing across-fly classification performance to within-fly classification performance (applying classifiers trained on a single epoch each of wakefulness and anesthesia from each discovery fly to multiple epochs from an individual pilot evaluation fly; see Section “Classification of conscious level”), before finalizing the full methods and parameters. With the labels, it was revealed that the 56 segments were equally divided into 14 segments of wakefulness or anesthesia for each of the two flies. It was also revealed that the flies were of a w2202 background (also called isoCJ1), which has a similar isoflurane sensitivity profile to the Canton-S wild-type fly (CS; [44]).

The following technical details of the recordings were available to the data analysis team, to enable equal pre-processing of signals. Electrophysiological data were recorded at 25 kHz and downsampled to 1,000 Hz. Next, LFPs were bipolar re-referenced by subtracting adjacent unipolar channels (n = 16) to acquire 15 channels.

The exact details originally provided to the analysis team are available at https://osf.io/bq5ry/?view_only=3789097395c1419db2a9eb615bc1effe.

Final evaluation flies

Final evaluation data were provided to the data analysis team after in-principle acceptance of the Stage 1 manuscript. We describe the information disclosed by the data collection team (TJ and BvS) to the data analysis team at the time of writing Stage 1 of this registered analysis, followed by additional details which were provided after in-principle acceptance. The evaluation data consists of three datasets as follows. The electrode preparation for each dataset followed the same protocol as described in [45]. At time of submission of the Stage 1 manuscript, the analysis strategy was fixed, and final evaluation data had not been provided to the analysis team. The teams agreed that disclosing the following information would not affect the outcome of the results.

Multi-dosage evaluation flies

The multi-dosage evaluation dataset consists of 12 female isoCJ1 flies, which were administered isoflurane at two concentrations. The two pilot evaluation flies described previously and analyzed in Stage 1 of this registered analysis were taken from this dataset (using epochs from wakefulness and one concentration of isoflurane anesthesia). Epochs from this dataset were obtained after an air puff stimulus and consist of five possible conditions: (1) wakefulness; (2) isoflurane concentration A; (3) isoflurane concentration B; (4) post-isoflurane (after isoflurane administration, but before flies are fully awake); and (5) recovery (when flies are fully awake and responsive after isoflurane). At the time of submission of the Stage 1 manuscript, the isoflurane concentrations A and B were not disclosed to the data analysis team.

After in-principle acceptance of the Stage 1 manuscript, the recordings from these flies were provided, and the following details were revealed. Recording setup and anesthetic delivery were similar to the discovery flies, with 16-electrode linear silicon probes with site separation of 25 µm inserted laterally into the fly’s eye, and recordings being made at 25 kHz and downsampled to 1,000 Hz. Probes were inserted such that the outermost electrode was within the fly’s eye (this was also the procedure for the single-dosage and sleep evaluation flies), unlike in the discovery flies where the outermost electrode was positioned just outside the eye. For each fly, 14 chunks of 20 s recordings, bipolar re-referenced in the same manner as the discovery flies, were provided for each condition.

Similar to the discovery flies, recordings were obtained in sequential blocks corresponding to the conditions outlined previously. Each block consisted of a 5 min period of darkness, followed by an air puff and 15 min of red ambient lighting and visual flickering stimuli. Blocks were separated by 1 min, with an air puff 30 s after the end of each block. The provided recordings corresponded to the 5 min period of darkness at the beginning of each block. We divided each 20 s chunk into eight 2.25 s epochs, discarding the last 2 s of each chunk, to match the epoch duration of the discovery flies. It was revealed that the isoflurane concentrations A and B at the fly body were 0.6 and 1.2 vol%, respectively, as estimated through gas chromatography. However, due to the increased concentration of isoflurane and longer duration of exposure, an increased vacuum flow compared to the discovery flies (16 L/min, compared to 9 L/min in the discovery and single-dosage evaluation flies) was used to clear the room of anesthesia. As such, actual anesthetic concentration is likely to have been lower. Unlike the discovery flies, recordings were obtained in the absence of ambient lighting.
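
The epoching arithmetic above can be checked with a short sketch: at 1,000 Hz a 2.25 s epoch is 2,250 samples, so a 20 s chunk (20,000 samples) holds eight whole epochs (18,000 samples) and the remaining 2 s are dropped. The function name and default arguments are hypothetical.

```python
from typing import List, Sequence


def split_into_epochs(samples: Sequence[float],
                      fs_hz: int = 1000,
                      epoch_s: float = 2.25) -> List[Sequence[float]]:
    """Cut a recording into consecutive, non-overlapping epochs.

    Any samples left over after the last whole epoch are discarded.
    """
    epoch_len = round(fs_hz * epoch_s)  # 2250 samples at 1,000 Hz
    n_epochs = len(samples) // epoch_len
    return [samples[i * epoch_len:(i + 1) * epoch_len] for i in range(n_epochs)]


chunk = list(range(20_000))  # stand-in for one 20 s chunk at 1,000 Hz
epochs = split_into_epochs(chunk)
print(len(epochs), len(epochs[0]))
```

The same function applied to an 18 s recording returns eight epochs with nothing discarded, which matches the segmentation of the discovery flies.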

Upon conducting our analyses, we found that the distribution of feature values varied greatly between the first eight and last four flies. Upon further investigation, the analysis team discovered that the flies had been recorded in two batches: eight flies were recorded originally, and an additional four flies were recorded after a delay of a year. To our knowledge, the recording and experimental setups were unchanged between the two batches. However, from inspecting power spectra, we suspected that a low-pass filter may have been inadvertently in use during the recording of the last four flies (Fig B in S1 Text). Thus, we separate out these two subsets of flies. We refer to the first eight flies as MD8 and the last four flies as MD4.

As electrode probes were inserted deeper into the evaluation flies than for the discovery flies, we considered offsetting the channels of the evaluation flies (i.e., all the flies from the multi-dosage, single-dosage, and sleep evaluation flies) to better align them with the channel locations of the discovery flies. However, we did not find evidence to suggest that such an offset would better align the channels (S1 Text).

Single-dosage evaluation flies

The single-dosage evaluation dataset consists of 18 female Canton S wild-type flies, which were administered isoflurane at a single concentration. Epochs from this dataset were obtained after an air puff stimulus and consist of four possible conditions: (1) wakefulness; (2) isoflurane concentration C (which may or may not be equal to isoflurane concentrations A and B in the multi-dosage evaluation flies); (3) post-isoflurane; and (4) recovery.

After in-principle acceptance of the Stage 1 manuscript, the recordings from these flies were provided, and the following details were revealed. Flies were 3–10 days post-eclosion. Fly preparation and anesthesia delivery were again similar to the discovery flies, with 16-electrode linear silicon probes with site separation of 25 µm inserted laterally into the fly’s eye, and recordings being made at 25 kHz and downsampled to 1,000 Hz. For each fly, 18 s from each condition was provided.

Again, similar to the discovery flies, recordings were obtained in sequential blocks corresponding to each condition. Each block consisted of an airpuff, followed 30 s later by visual stimuli lasting 18 min and 18 s, followed by another 30 s of rest. Isoflurane was then administered and flies were left for 180 s to adjust to the new concentration before starting the next block. The provided 18 s recordings correspond to 18 s preceding the visual stimuli. We bipolar re-referenced the LFPs in the same manner as the discovery flies and segmented the 18 s recordings into 2.25 s epochs to match the epoch duration of the discovery flies.

Sleep evaluation flies

The sleep evaluation dataset consists of 19 female Canton S wild-type flies. Recordings were obtained during wakefulness and periods of sleep. Hence, epochs from this dataset consist of two possible conditions: (1) wakefulness, and (2) sleep. While the bodies of the flies were tethered, their limbs were free to move. We obtained video recordings of the flies to detect and quantify their movement [36]. With this movement detection, we defined sleep using the commonly accepted criterion of periods of immobility lasting 5 min or longer [30].
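
To make the 5 min immobility criterion concrete, the sketch below finds runs of immobility of at least that length in a per-frame movement record. It assumes that movement has already been reduced to one boolean per video frame; that representation, the frame rate, and the function name are hypothetical and are not taken from the procedure used in this study.

```python
import math
from typing import List, Sequence, Tuple


def find_sleep_bouts(moving: Sequence[bool],
                     fps: float,
                     min_immobile_s: float = 300.0) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of every run of
    immobility lasting at least `min_immobile_s` seconds."""
    min_frames = math.ceil(min_immobile_s * fps)
    bouts = []
    run_start = None  # first frame of the current immobile run, if any
    for i, is_moving in enumerate(moving):
        if not is_moving and run_start is None:
            run_start = i
        elif is_moving and run_start is not None:
            if i - run_start >= min_frames:
                bouts.append((run_start, i))
            run_start = None
    # Close a run that lasts until the end of the recording.
    if run_start is not None and len(moving) - run_start >= min_frames:
        bouts.append((run_start, len(moving)))
    return bouts


# Toy record at 1 frame per second: one qualifying bout and one that is too short.
record = [True] * 10 + [False] * 300 + [True] * 5 + [False] * 299 + [True]
print(find_sleep_bouts(record, fps=1.0))
```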

After in-principle acceptance of the Stage 1 manuscript, the recordings from these flies were provided, and the following details were revealed. Flies were 1–2 days post-eclosion to maximize survivability across 12 h. Recordings for different flies started at different times spread throughout the 24-h day to capture bouts of sleep during both the day and night. As for the previous datasets, fly preparation was similar to the discovery flies. Linear 16-electrode silicon probes with site separation of 25 µm were inserted laterally into the fly’s eye, and recordings were made at a sampling rate of 25 kHz and downsampled to 1,000 Hz.

Along with the electrophysiological recording, video recordings of the flies in profile view were captured to identify sleep bouts. Recordings were made using a Scopetek DCM130E with a Navitar zoom lens (coupler 1-6010, adapter tube 1-6020) under infrared light. The resulting videos were analyzed using OpenCV to identify continuous periods of immobility, using the procedure described in [46]. For each fly, one recording block containing at least 5 min of continuous immobility was provided for analysis.

With one block of sleep per fly, we extracted a period of 18 s (wake) immediately before the onset of the sleep bout, and 18 s ending 2 min after the onset of the sleep bout (sleep). Fig C in S1 Text illustrates the extracted temporal locations relative to the recording block provided for each fly. We selected these temporal locations due to varied sleep and recording lengths for each fly in the provided recordings. In the same manner as for the other flies, we bipolar re-referenced the LFPs and segmented the 18 s period into 2.25 s segments.
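
The two extraction windows can be written out explicitly. In the sketch below, the wake window covers the 18 s immediately before sleep onset, and the sleep window covers 102 s to 120 s after onset, so that it ends 2 min into the bout. Sample indices assume the 1,000 Hz downsampled signal; the function name and the example onset are hypothetical.

```python
from typing import Tuple

Window = Tuple[int, int]  # (start, end) sample indices, end exclusive


def sleep_windows(onset_sample: int,
                  fs_hz: int = 1000,
                  window_s: int = 18,
                  sleep_end_offset_s: int = 120) -> Tuple[Window, Window]:
    """Return the (wake, sleep) sample windows around one sleep onset."""
    win = window_s * fs_hz
    wake = (onset_sample - win, onset_sample)
    sleep_end = onset_sample + sleep_end_offset_s * fs_hz
    sleep = (sleep_end - win, sleep_end)
    if wake[0] < 0:
        raise ValueError("not enough recording before sleep onset")
    return wake, sleep


# Hypothetical sleep onset 10 min into a recording block.
wake_win, sleep_win = sleep_windows(onset_sample=600_000)
print(wake_win, sleep_win)
```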

Local field potential pre-processing

The data analysis team subtracted the mean voltage from each epoch of the discovery and pilot evaluation flies, and then removed line noise from each epoch using the rmlinesc.m function of the Chronux toolbox (http://chronux.org/; [47]) with 9 tapers, a time-bandwidth product of 5, and zero-padding factor 2. As a sanity check, we performed visual inspection of power spectrum plots after pre-processing to confirm the removal of line noise. These same pre-processing steps were also applied to the final evaluation flies.

Feature-based time-series analysis using hctsa

We extracted 7,702 time-series features from each epoch and bipolar re-referenced channel of the discovery and pilot evaluation flies using hctsa (v1.03; [18]) on MATLAB 2017b. For a given time series, hctsa extracts univariate time-series features from analysis methods developed in a wide range of scientific disciplines, including nonlinear physics, biomedicine, economics, and neuroscience. hctsa broadly groups these features into several general themes, such as distribution, correlation, information theory, and stationarity. Within these themes, features are further grouped into “master operations”, which implement computations shared by groups of time-series features. Individual features are thus obtained by running master operations with a range of parameter settings. For example, the features SP_Summaries_welch_wmax_<P> compute spectral edge frequency at P, with P = 5%, 10%, 25%, and 95%.

Not all of the available time-series features could be extracted successfully from our datasets. For example, the class of features derived from the hctsa function DN_CompareKSFit includes fits of the data to a beta distribution, which assumes values between 0 and 1, an assumption that is not fulfilled by our data and consequently returns missing (NaN) values. To filter out these cases, we excluded any feature which returned NaN across all time series for a given channel in the discovery flies. This reduced the set of features down to an average of 6,860 features across the 15 channels (ranging from 6,657 to 7,004). We further excluded features which returned a constant value across all time series for a given channel in the discovery flies because they are uninformative for classification, reducing the set of features again to on average 6,764 features across the 15 channels (ranging from 6,560 to 6,908).
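The two exclusion rules can be sketched in Python as follows. This is a minimal illustration, not the original MATLAB code: the function name is hypothetical, and ignoring NaN entries when testing for a constant value is an assumption, since the text does not say how partially missing features were handled.

```python
import math


def valid_feature_indices(matrix):
    """Return indices of feature columns that are neither all-NaN nor constant.

    matrix: list of rows (time series), each a list of feature values.
    """
    keep = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        if all(math.isnan(v) for v in column):
            continue  # rule 1: NaN across all time series
        observed = [v for v in column if not math.isnan(v)]
        if len(set(observed)) <= 1:
            continue  # rule 2: constant value (NaNs ignored; an assumption)
        keep.append(j)
    return keep


nan, inf = float("nan"), float("inf")
toy = [
    [1.0, nan, 3.0, 0.5],
    [2.0, nan, 3.0, inf],  # infinite values are kept
    [0.0, nan, 3.0, 0.1],
]
kept = valid_feature_indices(toy)
```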

While we analyze raw hctsa features, the range of values varies greatly across features, and some features include infinity values (which we keep, as they can be used in our classification analysis, see Section: “Classification of conscious level”). Where specified, we visualize scaled feature values using an outlier-robust sigmoidal transformation, which maps values of all epochs for a given feature to the unit interval [48]. We scale feature values in the evaluation flies based on the scaling parameters from the discovery flies.
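One possible form of such a scaling is sketched below in Python. It assumes a sigmoid centered on the median with a scale of IQR/1.35, which may differ in detail from the transformation in [48]. The function names are hypothetical, and any later rescaling step is omitted. Parameters are estimated on discovery values and then reused for evaluation values.

```python
import math
import statistics


def fit_robust_sigmoid(discovery_values):
    """Estimate the median and an IQR-based scale from finite discovery values."""
    finite = [v for v in discovery_values if math.isfinite(v)]
    q1, _, q3 = statistics.quantiles(finite, n=4)
    scale = (q3 - q1) / 1.35
    if scale == 0:
        raise ValueError("IQR is zero; this sketch does not cover that case")
    return statistics.median(finite), scale


def apply_robust_sigmoid(value, median, scale):
    """Map one value into [0, 1]; infinite values saturate at 0 or 1."""
    if math.isnan(value):
        return value
    z = (value - median) / scale
    # Both branches only call exp() on non-positive arguments, avoiding overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


median, scale = fit_robust_sigmoid([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
scaled = [apply_robust_sigmoid(v, median, scale)
          for v in [3.5, 100.0, math.inf, -math.inf]]
```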

Classification of conscious level

We use single-feature classification analysis at each channel to compare the performance of each individual time-series feature in distinguishing wakefulness from deep anesthesia (i.e., highest dosage in the multi-dose data) and sleep. If a feature distinguishes conscious level, it should have high classification performance in the discovery flies which generalizes to the evaluation flies. To account for features which can return infinity values, we employ nearest-centroid classifiers, with class medians as centers.

We first trained and cross-validated each feature’s classifier on the discovery flies. For a given channel and feature, we employed a leave-one-fly-out cross-validation procedure on the discovery flies. Specifically, at each cross-validation iteration, we trained a classifier on all 8 epochs of wake and anesthesia from 12 flies, and tested on 8 epochs of wake and anesthesia on the remaining fly. Each classifier consists of: (1) a threshold, the middle point between the median feature value for wakefulness and anesthesia as obtained from the training set; and (2) a direction indicating whether points above the threshold should be classified as wakeful or anesthetized (and vice versa).
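The threshold-and-direction classifier and the leave-one-fly-out procedure can be sketched in Python as below. The function names, data layout, and toy values are hypothetical, and assigning values exactly at the threshold to the "below" class is an assumption.

```python
import statistics


def train(wake, anest):
    """Return (threshold, wake_above) from training feature values."""
    m_wake, m_anest = statistics.median(wake), statistics.median(anest)
    return (m_wake + m_anest) / 2, m_wake > m_anest


def predict(value, threshold, wake_above):
    """Classify one epoch; values at the threshold fall in the 'below' class."""
    return "wake" if (value > threshold) == wake_above else "anest"


def leave_one_fly_out(data):
    """data: {fly: {"wake": [...], "anest": [...]}} -> accuracy per held-out fly."""
    accuracies = []
    for held_out in data:
        train_wake = [v for f, d in data.items() if f != held_out for v in d["wake"]]
        train_anest = [v for f, d in data.items() if f != held_out for v in d["anest"]]
        threshold, wake_above = train(train_wake, train_anest)
        test = [(v, "wake") for v in data[held_out]["wake"]]
        test += [(v, "anest") for v in data[held_out]["anest"]]
        correct = sum(predict(v, threshold, wake_above) == y for v, y in test)
        accuracies.append(correct / len(test))
    return accuracies


# Toy values for one feature at one channel (three flies, four epochs each).
toy = {
    "A": {"wake": [5, 6, 7, 8], "anest": [1, 2, 3, 4]},
    "B": {"wake": [6, 7, 8, 9], "anest": [2, 3, 4, 5]},
    "C": {"wake": [4, 5, 6, 7], "anest": [0, 1, 2, 3]},
}
fold_accuracies = leave_one_fly_out(toy)
```

In this toy example, the differing baselines of flies B and C push some of their epochs across the threshold learned from the other flies, which is the kind of across-fly offset that lowers cross-validated accuracy.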

After obtaining cross-validation accuracies on the discovery flies, we finally obtained classifiers for each feature and channel by training on all epochs from all flies in the discovery dataset. We validate and report the performance of these classifiers on the pilot evaluation flies (N = 2, in the Stage 1 manuscript) and final evaluation datasets (total N = 49 evaluation flies, after in-principle acceptance).

In both experimental and applied settings, feature values can drift across datasets, due to uncontrollable changes in experimental settings or unknown sources of group differences. Some features may be sensitive to such uninteresting drift, which can be corrected automatically. To account for such drift, we report classifier performances both before and after normalizing feature values in the evaluation flies to match the mean and standard deviation of values in the discovery flies. We normalized each pilot evaluation fly by transforming feature values into z-scores, ignoring infinity values (which remained as infinity values after normalization), using the mean and standard deviation across all epochs from the other pilot evaluation fly. We then back-transformed the resulting z-scores using the mean and standard deviation across all epochs from all the discovery flies. We normalized the final evaluation datasets in a similar manner, by transforming feature values into z-scores using the mean and standard deviation across all epochs from each set of the evaluation flies and back-transforming using the mean and standard deviation across all epochs from the discovery flies.
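The z-score and back-transform steps amount to the following Python sketch. The function name is hypothetical, and the use of the sample (N − 1) standard deviation is an assumption, as the estimator is not specified in the text. Non-finite values are passed through unchanged.

```python
import math
import statistics


def match_distribution(eval_values, discovery_values):
    """Z-score evaluation values, then rescale to the discovery mean and SD."""
    finite_eval = [v for v in eval_values if math.isfinite(v)]
    finite_disc = [v for v in discovery_values if math.isfinite(v)]
    mu_e, sd_e = statistics.fmean(finite_eval), statistics.stdev(finite_eval)
    mu_d, sd_d = statistics.fmean(finite_disc), statistics.stdev(finite_disc)
    return [
        v if not math.isfinite(v) else (v - mu_e) / sd_e * sd_d + mu_d
        for v in eval_values
    ]


normalized = match_distribution([10.0, 20.0, 30.0, math.inf], [1.0, 2.0, 3.0])
```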

We determined if a feature’s classifier discriminates wake and anesthesia significantly better than chance by comparing each feature to a random classification distribution at the α = 0.05 level. We corrected for multiple comparisons at each channel using the false discovery rate (FDR) correction [49] to account for potential positive dependency among the tests, which is likely to be the case as there are features in hctsa which are expected to give similar results (such as CO_AutoCorr features which compute autocorrelation at different lags) [50,51]. We obtained random-classification distributions for the discovery and pilot evaluation flies by repeatedly classifying discovery or evaluation epochs randomly, with equal probability (as there are 7,702 potentially available features in hctsa, we repeated this random classification N = 7,702 times to estimate the null distribution). We expected that features which reflect some process underlying change in conscious level will have significant classification performance which persists through cross-validation on the discovery flies to the final evaluation flies.
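A minimal Python sketch of this testing scheme follows. It assumes that a feature's p-value is the fraction of random-classification accuracies at or above its observed accuracy, and that the FDR correction is the Benjamini-Hochberg step-up procedure. Both are assumptions about details not spelled out here, and the function names are hypothetical.

```python
import random


def null_accuracies(n_epochs, n_repeats, rng):
    """Accuracies obtained by guessing each epoch's label with probability 0.5."""
    return [
        sum(rng.random() < 0.5 for _ in range(n_epochs)) / n_epochs
        for _ in range(n_repeats)
    ]


def p_value(observed, null):
    """One-sided p-value: fraction of null accuracies >= the observed accuracy."""
    return sum(a >= observed for a in null) / len(null)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a significance flag per test under the step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank meeting its threshold
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= cutoff_rank
    return significant


rng = random.Random(0)
null = null_accuracies(n_epochs=208, n_repeats=200, rng=rng)  # toy sizes
null_mean = sum(null) / len(null)
flags = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```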

Upon receiving the datasets for Stage 2 analysis, we made additional methodological decisions as follows. As the evaluation datasets consisted of multiple wake and/or multiple unconscious conditions, we evaluated classification performance by, for a given dataset, pairing each wake state with each unconscious state. For the multi-dosage and single-dosage evaluation flies, we considered the isoflurane and sleep conditions as unconscious, and the others as wake. This gave a total of 6, 2, and 1 possible condition pairs for multi-dosage, single-dosage, and sleep flies, respectively (totaling 16 condition pairs, after splitting the multi-dosage flies into two groups and including the wake-anesthesia pair in the discovery flies). To obtain the performance of a given feature at a given condition pair, we used all the predictions from the wake condition and the unconscious condition. The number of epochs among all conditions for a given dataset was always equal, hence chance performance was always 50% for all condition pairs. As we found no features to generalize across all the condition pairs (i.e., achieve significant classification performance or consistency), we proceeded to limit our analyses to the condition pairs which we deemed as the most similar to the discovery flies. Given the experimental details provided for each dataset, we considered the pre-anesthesia and 1.2 vol% isoflurane conditions in the multi-dosage evaluation flies to be most similar to the wake and anesthesia conditions in the discovery flies. For the single-dosage evaluation flies, we considered the pre-anesthesia and 0.6 vol% isoflurane conditions to be the most similar to the discovery fly conditions.

Within-fly effect direction consistency

In assessing generalizability, it is possible that the effect of anesthesia (relative to wake) is highly consistent within individuals, even when features do not classify well across subjects. This is relevant in scenarios where, such as in this registered report, there may be variability in the exact placement of electrodes among individuals, an effect which cannot be corrected even with our group-level batch normalization. Values may further vary among individuals due to factors such as exact experimental setups and baseline arousal states. To address this, we assessed a weaker form of generalization—whether a feature is predictive of the relative difference between conscious levels within an individual fly—and report the consistency of the direction of the effect of anesthesia (after receiving the correct wake/anesthesia/sleep labels in the case of the final evaluation flies).

Specifically, at a given feature, fly, and channel, we obtained for each wakeful data point the proportion of anesthetized data points which lie below it. Because the direction of the effect of anesthesia is not necessarily the same across features and channels, we first assigned directionality labels based on the median wakeful and anesthesia values in the 13 discovery flies (i.e., based on the direction component of the classifiers described above). For a given feature and channel, we gave a label of 1 if the median wakeful value was greater than for anesthesia, and −1 otherwise. We then multiplied feature values by these labels, flipping the direction of the effect of anesthesia when the median wakeful value is less than the median anesthesia value and making the analysis uniform across features and channels. Finally, we report the average proportion across all wakeful epochs and flies.
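The consistency measure can be sketched in Python as follows, for one feature at one channel. The function names, data layout, and toy values are hypothetical, and pooling all wakeful epochs from all flies before averaging is our reading of the final step.

```python
import statistics


def direction_label(discovery_wake, discovery_anest):
    """1 if the discovery median is greater for wake than anesthesia, else -1."""
    return 1 if statistics.median(discovery_wake) > statistics.median(discovery_anest) else -1


def consistency(flies, label):
    """Average, over all wake epochs of all flies, of the proportion of
    same-fly anesthesia epochs lying below the wake epoch (after sign flip)."""
    proportions = []
    for wake, anest in flies:
        wake = [label * v for v in wake]
        anest = [label * v for v in anest]
        for w in wake:
            proportions.append(sum(a < w for a in anest) / len(anest))
    return sum(proportions) / len(proportions)


# Toy example: two flies, each given as (wake epochs, anesthesia epochs).
label = direction_label([3, 4, 5], [1, 2, 3])
value = consistency([([3, 4], [1, 2]), ([2, 5], [1, 3])], label)

# A feature that decreases during wake gets label -1 and is flipped.
flipped = consistency([([1, 2], [3, 4])], direction_label([1, 2], [3, 4]))
```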

As for the significance testing of classification performance, we used permutation testing to determine if the within-fly effect direction consistency of a feature was significantly better than chance. We obtained reference chance distributions for the discovery flies and pilot evaluation flies by repeatedly (N = 7,702) randomly assigning the proportion of anesthesia epochs which are below each wakeful epoch, with equal probability, and averaging across wakeful epochs and flies. We compare each feature to the distribution at the α = 0.05 level, correcting for multiple comparisons at each channel using FDR correction. We again limited our analysis to the conditions in the evaluation flies which were most similar to the discovery flies, for the same reasons as previously described for the classification analysis.

Pilot results

We investigate if any of the time-series features in hctsa individually serve as a potential measure of level of conscious arousal in independently obtained recordings from fly brains. We first assessed the performance of hctsa features which we applied to a discovery dataset of previously published fly brain recordings (N = 13) [12,35–37]. Then, to assess generalizability, we apply classifiers trained on the discovery flies to recordings obtained from an independent set of pilot evaluation flies (N = 2). Upon in-principle acceptance of this registered analysis, we repeated the analyses conducted on the pilot evaluation flies on a final set of evaluation flies (N = 47), reporting the features which consistently perform well in distinguishing wakefulness from anesthesia and sleep at all recording locations across all the flies.

Classification of conscious level

We first extracted 7,702 time-series features from the initial discovery flies using hctsa, yielding 6,560–6,908 valid features across the 15 channels (M = 6,764). Fig 2A shows a matrix of feature values extracted from Channel 6 in the discovery dataset. We visually inspected this feature matrix for trends across features and flies. To facilitate interpretation, we sorted the features according to hierarchical clustering using correlation distances between features, across time series. This revealed two clear clusters of features, one with values which are generally greater during wakefulness (columns roughly 500–1,500), and one with values which are generally greater during anesthesia (columns roughly 4,500–5,500). Features in each of these clusters would likely achieve similar classification accuracies.

Fig 2. Classification performance of hctsa features. (a) Values of hctsa features in the discovery flies, at Channel 6. Each row corresponds to an individual 2.25 s epoch, from 13 flies (F) during wakefulness (W) and anesthesia (A). Each row displays scaled values for all valid features for the channel. Features (columns) are ordered based on hierarchical clustering using correlation (across time series) distance between normalized features. This ordering places features with highly correlated values across the dataset close to each other. Arrows indicate the features which attained the highest classification performance in the discovery (green) and pilot evaluation (red) flies. (b) Number of features which achieved statistically significant classification performance at each channel, in the discovery flies (blue line), pilot evaluation flies (orange line), and in both the discovery flies and pilot evaluation flies (broken black line). (c) Correlation of classification performances between the discovery (x-axis) and pilot evaluation flies (y-axis). Each dot represents the classification performance of one of the 6,800 features shown in a). Solid horizontal and vertical lines indicate chance classification performance (= 0.5). Dashed horizontal and vertical lines indicate the thresholds for statistically significant across-fly classification performance in each set of flies (see Methods). Dots located in the top right quadrant are the features which successfully classified wake from anesthesia across both the discovery and pilot evaluation flies. Circled are the features which attained the highest performance in the discovery (green) and pilot evaluation (red) flies (corresponding to the features pointed to in a). Colored x’s indicate performance of features related to previously described measures of conscious level (features related to spectral power across frequency bands and spectral edge frequency, SP_Summaries; sample entropy, EN_SampEn; permutation entropy, EN_PermEn; Lempel-Ziv complexity, EN_MS_LZcomplexity; approximate entropy, ApEn, as indicated by the color bar). (d) Number of features which achieved statistically significant classification performance at each channel, as in b), but after normalizing the pilot evaluation flies. (e) Correlation of classification performances between the discovery and pilot evaluation flies, as in c), but after normalizing the pilot evaluation flies.

Having reordered the features, groupings across rows corresponding to epochs from individual flies became apparent. This indicated strong within-fly correlations of feature values but weak correlations across flies, suggesting that few features, if any, would generalize across all the flies. Overall, our visual inspection of the similarities across features and similarities across flies suggested that many features could individually achieve better-than-chance classification accuracy. However, there appeared to be no clear cluster of features which would perfectly discriminate wakefulness from anesthesia in all of the flies.

While Fig 2A set up our global expectations visually, there may have been features outside the visually clear clusters which also distinguish wakefulness from anesthesia extremely well. To reveal such features, we next quantified the across-fly classification performance (within the discovery flies). For each feature, we classified wakeful from anesthetized epochs using a nearest-median classification rule. We assessed the statistical significance of the cross-validation accuracy of each feature by comparing it to a distribution of accuracies resulting from random classification (see Methods). For Channel 6, this yielded 2,948 features which performed significantly better than chance (p < .018). The best-performing feature, an index of mean stationarity (hctsa feature: StatAvl250; [52]), achieved a mean classification accuracy of 76% (SD = 12% across 13 cross-validations; Fig 2B). Upon performing the classification analysis for each of the remaining 14 channels, we found features to perform heterogeneously across the channels. Overall, the average classification accuracy achieved across channels tended to be much lower than that achieved by individual channels. For example, the across-channel average of the mean cross-validated accuracy of StatAvl250 was 63% (SD = 6% across 15 channels).

Indeed, the number of significant features varied greatly across the channels, ranging from 14 to 2,948, with channels closer to the periphery tending to have fewer significantly performing features (Fig 2B). We found the greatest number of significant features, along with the most accurately classifying features, to occur at Channels 5 and 6, corresponding roughly to the protocerebrum. This is consistent with our previous analyses on this dataset, which reported better discrimination between wakefulness and anesthesia in some but not all channels [12,35].

We next sought to determine how well the performance of features would generalize to an independent evaluation set of flies. While the overall recording procedure was known by the data analysis team to be similar to that of the discovery flies, the exact experimental methods were not revealed at the time of submitting this registered analysis (see Methods). We finalized the training of classifiers by obtaining thresholds based on all 13 discovery flies. As a pilot for this registered analysis, we applied these classifiers to recordings from 2 flies (out of a total of 12 evaluation flies). Across the 15 channels, we found an additional 48–416 (M = 180) features to either output a NaN or have a constant value across epochs in the pilot evaluation flies.

Fig 2C shows how classification performance at Channel 6 in the discovery flies generalizes into the pilot evaluation flies. The best performing feature at Channel 6 in the discovery flies, StatAvl250, attained a much-reduced accuracy of 63% (green circle). Meanwhile, several features attained higher performance than in the discovery flies. The feature with the best performance at Channel 6 in the pilot evaluation flies, which quantifies the relative low-frequency power in the Fourier power spectrum (hctsa feature: SP_Summaries_welch_rect_logarea_2_1), attained 76% accuracy, despite attaining 62% (SD = 14% across cross-validations) in the discovery flies (red circle). The best-performing features across all the channels in the pilot evaluation flies were related to signal variance at Channel 5 (including root-mean-square, hctsa feature rms, and standard deviation, standard_deviation), and also had greater performance than in the discovery flies, attaining 91% accuracy, compared to 71% (SD = 21% across 13 cross-validations, for both features).

For comparison, Fig 2C also shows the performance of hctsa features which are related to previously described indicators of conscious level in human electroencephalographic (EEG) recordings. These include features related to spectral power across frequency bands, spectral edge frequencies, and spectral entropy (SP_Summaries), sample entropy (EN_SampEn), permutation entropy (EN_PermEn), single-channel Lempel-Ziv complexity (EN_MS_LZcomplexity), and approximate entropy (ApEn) [13,14,53]. Surprisingly, the vast majority of these features did not classify wakefulness from anesthesia significantly for both the discovery and pilot evaluation flies. While we leave the interpretation of high-performing features until after the final analysis, it is notable that most of these features are related to the variance of the voltage fluctuations, which is consistent with previous literature on the effects of anesthesia on fly LFPs [35].

Generally, however, across the channels, we found a drastic drop in the number of features with statistically significant classification accuracy (Fig 2B). This suggested that features overall performed worse in the evaluation flies, and that their performance was again heterogeneous across channels. Across the channels, the number of significantly performing features was substantially lower in the pilot evaluation flies, ranging now from 9 to 867. Further, after restricting to the set of features which yielded significant cross-validation accuracies in the discovery flies, the number of significant features dropped even further, ranging across the channels from 0 to 561. Fig 2C illustrates this drop also for the majority of previously described indicators of conscious level in EEG, particularly for spectral features which performed well in the discovery flies but at chance level in the pilot evaluation flies. This result alerts us to the danger of interpreting cross-validation accuracies of the discovery flies as an estimate of the true generalization accuracies, which can only be evaluated using an independent dataset. We will discuss the implication of this finding in Discussion in the Stage 2 manuscript. Upon final data analysis, we provide the classification performance of each significant feature, at each channel in https://osf.io/8wvsq/?view_only=8a056d1c573b4f23a6cf6cea8b976ddb.

Given that the performance of so few features generalized to the pilot evaluation flies, we next repeated the above analysis, but after normalizing feature values in the pilot evaluation flies to match the distribution of values in the discovery flies (see Methods). This normalization ensures that the ranges of feature values in the pilot evaluation flies match those in the discovery flies. With normalization, many more features performed significantly in the pilot evaluation flies, ranging from 38 to 2,892 features across the 15 channels (which reduced to 0–2,223 features after restricting to features which also performed significantly in the discovery flies; Fig 2D), with many features which previously performed at chance level in the pilot evaluation flies performing significantly above chance after normalization (Fig 2E). The highest performing feature in the discovery flies, StatAvl250, achieved an improved accuracy of 71%. With normalization, the best performing feature across all the channels in the pilot evaluation flies was also a statistical moment of the signal, this time kurtosis (feature DN_Moments_raw_4 at Channel 5, achieving 92% accuracy). Upon final data analysis, we also provide the classification performance of each significant feature after normalization, at each channel in https://osf.io/8wvsq/?view_only=8a056d1c573b4f23a6cf6cea8b976ddb.

Within-fly effect direction consistency

Given that the across-flies classification performance of many features in the discovery flies did not generalize to the pilot evaluation flies, we next assessed a weaker form of generalization. Even though features may not classify well across subjects, features for which the effect of anesthesia is highly consistent within individuals may still be useful for clinical assessment of conscious level. This is especially true for individual subjects whose baseline neural activity is available (e.g., before anesthetic induction). So, for each feature, we assessed the within-fly effect direction consistency of anesthesia. For an individual fly, consistency would be 1 if the direction of change of feature values from wake to anesthesia (i.e., either an increase or decrease) is conserved for every pairing of wake and anesthesia epochs (see Methods). In other words, for a given feature in a given channel, if the value for wake minus anesthesia is always above 0 for any pair of one wake and one anesthesia epoch (or vice versa), then we consider such a feature as a perfect measure of consciousness when a baseline measurement is available.

Fig 3A illustrates the within-fly effect direction consistency for each feature for the discovery flies, again at Channel 6, by showing the differences in feature values between wake and anesthesia epochs. Overall, the direction of the effect of anesthesia for each feature appeared to be reliable within individual flies, as we expected. However, strikingly, the direction of the effect of anesthesia seemed to be consistent even across flies, despite mediocre classification performance (e.g., due to differing baseline values at each fly). Visually, there appeared to be two clusters of features (from column 500–1,500 and from 4,500 to 5,000) with high consistency.

Fig 3. Within-fly effect direction (wake – anesthesia) consistency of hctsa features. (a) Differences in scaled hctsa values between wakefulness and anesthesia in the 13 discovery flies (F), at Channel 6. Each row displays the difference between a wakeful and anesthesia epoch from the same fly, for all valid features for the channel. Features (columns) have the same ordering as in Fig 2A. Arrows indicate the features which attained the highest classification performance in the discovery (green) and pilot evaluation (red) flies. (b) Number of features which achieved statistically significant consistency at each channel, in the discovery flies, pilot evaluation flies, and in both the discovery flies and pilot evaluation flies. (c) Correlation of within-fly effect direction consistencies between the discovery (x-axis) and pilot evaluation flies (y-axis). Each dot represents the consistency of one of the 6,800 features shown in a). Solid horizontal and vertical lines indicate chance consistency (= 0.5). Dashed horizontal and vertical lines indicate the thresholds for statistically significant consistency in each set of flies (see Methods). Dots located in the top right quadrant are the features which were significantly consistent across both the discovery and pilot evaluation flies. Circled are the features which attained the highest consistency in the discovery (green) and pilot evaluation (red) flies. Colored x’s indicate performances of features related to previously described measures of conscious level, as described in Fig 2C and indicated by the color bar.

We assessed the statistical significance of each feature, this time by comparing its consistency to a distribution of consistencies for randomly labeled epochs (see Methods). For Channel 6, this gave 3,882 features which were more consistent than chance (p < .027). The feature with the highest consistency at Channel 6, as well as on average across all the channels (a measure of variability along the identity line when plotting, in our data, time samples against their immediate future values, hctsa feature: MD_rawHRVmeas_SD2; [54]) had a consistency of 0.94 (i.e., on average, each wakeful epoch from an individual fly had a greater value than 94% of anesthesia epochs from the same fly), which previously attained an across-flies cross-validation accuracy of 68% (SD = 20%). Across the 15 channels, in general we found many more features to have significant within-fly consistency (2,474–3,882; Fig 3B), compared to across-flies classification.

We next assessed how the within-fly effect direction consistencies generalized to the pilot evaluation flies. We computed consistencies in the pilot evaluation flies, taking into account the direction of the effect of anesthesia observed in the discovery flies. Hence, if wakeful and anesthesia epochs were perfectly separable in the same direction as the discovery flies, consistency would be 1. However, if they were perfectly separable in the opposite direction to the discovery flies, consistency in pilot evaluation flies would be 0.

Fig 3C shows how within-fly effect direction consistency at Channel 6 in the discovery flies generalizes to the pilot evaluation flies. Unlike for across-flies classification, within-fly consistencies between the discovery and pilot evaluation flies seemed to be strongly positively correlated, including features related to previously explored measures of conscious level. This indicates that the within-fly consistency of many more features generalized to the pilot evaluation flies. The feature with the highest consistency in the evaluation flies, root-mean-square (hctsa feature: rms), also achieved high consistency in the pilot evaluation flies (0.91, red circle and arrow). Across the 15 channels, the number of significantly consistent features seemed to vary more in the pilot evaluation flies, ranging from 463 to 3,899 (Fig 3B). This range reduced to 176–3,049 after restricting to the set of features which also had significant consistency in the discovery flies.

Notably, the decrease in the number of significant features for within-fly consistency in the pilot evaluation flies was less pronounced than for across-fly classification performance. Like across-fly classification, there were more significant features for within-fly consistency at the central than peripheral channels. Overall, these results indicate that many features could be informative and consistent in terms of changes within a single fly due to anesthesia without being strong, absolute measures of conscious level across flies. We will revisit the implication of this finding in Discussion after the final data analysis. Upon final data analysis, we provide the consistencies of each significant feature, at each channel in https://osf.io/8wvsq/?view_only=8a056d1c573b4f23a6cf6cea8b976ddb.

Results

As described in the Stage 1 Manuscript, we repeated the analysis as described in Methods and Pilot results, extending and updating the analyses done on the pilot evaluation flies to the full set of 47 evaluation flies. We investigated each of the time-series features in hctsa as potential measures of level of conscious arousal. The number of time-series features extracted using hctsa which we considered to be valid varied across the datasets. The multi-dosage evaluation flies yielded 7,401–7,420 valid features across the 15 channels (M = 7,413), the single-dosage evaluation flies yielded 6,570–6,923 valid features (M = 6,632), and the sleep evaluation flies yielded 6,492–6,963 valid features (M = 6,629). Considering only features which were valid across all the evaluation flies and the discovery flies yielded 6,367–6,742 (M = 6,480) valid features altogether across the 15 channels. To evaluate the generalizability of features identified in the Stage 1 Manuscript, we applied the classifiers previously trained on the discovery flies during the Stage 1 analysis to recordings from the independent set of evaluation flies (N = 49) which were previously inaccessible to the data analysis team (AL, BF, and NT). We report features which achieved significant classification performance or consistency across all the evaluation flies and discovery flies.

Poor classification generalization due to heterogeneous feature value ranges among flies and datasets

We first checked whether core assumptions required to evaluate classification generalization were met. Specifically, to achieve good classification performance in the evaluation flies, we required feature value ranges in the evaluation flies to be similar to those of the discovery flies. Fig 4A shows the distributions of channel-averaged feature values across each of the flies from all the datasets, for the main wake and anesthesia/sleep conditions (see Methods). Visual inspection indicated high inter-fly variability in feature values, as well as heterogeneity in feature ranges across datasets. Specifically, flies from part of the multi-dosage and sleep datasets seemed to have feature values whose ranges differed greatly from those in the discovery flies (horizontal bands in Fig 4A).

Fig 4. Classification performance of hctsa features across all the flies.

(a) Scaled values of hctsa features in all flies, averaged across epochs and channels. Each row corresponds to one fly from each of the datasets (N = 13 discovery, N = 8 MD8, N = 4 MD4, N = 18 single-dosage, and N = 19 sleep flies, respectively). Features (columns) are ordered based on hierarchical clustering using correlation (across time-series) distance between features. Arrows indicate features which attained the highest average classification performances across the evaluation flies. (b) The number of features which achieved statistically significant performance at each channel in 1, 2, 3, 4, and 5 datasets (including the discovery flies) is indicated by line color: brown, orange, green, light blue, and dark blue, respectively. Thin lines show individual combinations of datasets, while thick lines show the average across all combinations. (c) Correlation of classification performances in the 13 discovery flies (x-axis) and average classification performances in the 49 evaluation flies (y-axis), at Channel 1. Each dot represents the classification performance of each of the features shown in (a). Dots outlined with small circles indicate features which consistently achieved significant classification performance in all the flies. Solid horizontal and vertical lines indicate chance performance (= 0.5). Dashed vertical line indicates threshold for statistically significant across-fly classification performance in the discovery flies. Colored x’s indicate performance of features related to previously described measures of conscious level (features related to spectral power across frequency bands and spectral edge frequency, SP_Summaries (orange); sample entropy, EN_SampEn (purple); permutation entropy, EN_PermEn (green); Lempel-Ziv complexity, EN_MS_LZcomplexity (blue); approximate entropy, ApEn (red)). (d–e) Same as (b) and (c), but for classification after normalizing feature values in each set of evaluation flies to match the distribution of values in the discovery flies.

Given the high variability across flies and datasets, we suspected that performance of the classifiers trained on the discovery flies would likely be low in the evaluation flies. The thin brown lines in Fig 4B show the number of significantly classifying features at each channel for each of the datasets. Across the evaluation datasets, the number of significantly performing features varied substantially (thin brown lines in Fig 4B). In general, the number of significant features was greater in the multi-dosage evaluation flies (for MD8, 1,255–4,014 significantly performing features, M = 2,912, SD = 780 across 15 channels; and for MD4, 698–2,934, M = 1,667, SD = 630), slightly lower in the single-dosage evaluation flies (0–2,304 significantly performing features, M = 745, SD = 763) and much lower in the sleep evaluation flies (0–1,552 significantly performing features, M = 260, SD = 397). The other thin lines illustrate how the number of significantly performing features drops as the requirement extends from classifying significantly in one dataset to two datasets, etc. Overall, consistent with our expectation, most features did not generalize across all the datasets. Indeed, only 47 features achieved significant classification performance across all the datasets, all of them only at the deepest channel, Channel 1 (thick, dark blue line in Fig 4B).

Given that we only found significant features at Channel 1, it follows that the high-performing features we previously highlighted in the Stage 1 Pilot results, the stationarity-related feature StatAvl250 and low-frequency power-related feature SP_Summaries_welch_rect_logarea_2_1 at Channel 6, failed to generalize to all the evaluation flies. Across all the datasets, they achieved average classification accuracies, weighted by number of flies in each set, of 55% and 51%, respectively, at Channel 6, failing to achieve significant performance in the sleep flies (in the case of StatAvl250) or in the sleep and single-dosage flies (in the case of SP_Summaries_welch_rect_logarea_2_1). Further, the variance-related features rms and standard_deviation at Channel 5, which we previously highlighted as the best-performing features across all channels, both failed to generalize and did not achieve significant performance across all the datasets at any channel, including Channel 1. At Channel 5, they only achieved significant performance in four of the multi-dosage evaluation flies (MD4; see Methods). The failure of these features to generalize across all the evaluation flies illustrates the issue of focusing on the highest-performing features in a given dataset, which can fail to generalize to new datasets due to inter-dataset variability.
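The weighted averages quoted above weight each dataset's accuracy by its number of flies. A minimal sketch of that arithmetic follows. The dataset sizes are those listed in the Fig 4 caption, whereas the per-dataset accuracies, the function name, and the choice to average over the evaluation datasets only are assumptions made purely for illustration.

```python
def weighted_mean_accuracy(accuracies, n_flies):
    """Average per-dataset accuracies, weighting each by its number of flies."""
    if len(accuracies) != len(n_flies):
        raise ValueError("need exactly one fly count per dataset")
    total = sum(n_flies)
    return sum(a * n for a, n in zip(accuracies, n_flies)) / total


# Dataset sizes from the Fig 4 caption: MD8, MD4, single-dosage, sleep.
n_flies = [8, 4, 18, 19]
# Hypothetical accuracies for illustration only (not values from this study).
accuracies = [0.70, 0.60, 0.55, 0.50]

print(round(weighted_mean_accuracy(accuracies, n_flies), 3))
```

Because the larger datasets dominate the weights, a feature that performs well only in the small multi-dosage sets gains little in the weighted average.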

Meanwhile, the features which did achieve significant classification performance for all datasets achieved lower accuracies than those we highlighted in the Pilot results. Average classification performance, weighted by the number of flies in each dataset, ranged from roughly 60% to 64%. To illustrate why these features achieve significant classification performance across all the flies while others (such as those highlighted in the Pilot results) do not, we show the distribution of raw values for two features in Fig 5. Fig 5A–5C show the distributions of values for one significantly-performing feature, along with one feature which failed to generalize across all the datasets, the low-frequency power feature SP_Summaries_fft_area_5_1. While the low-frequency power feature shows an apparent difference between wake and anesthesia/sleep values, its range of values differs from that in the discovery flies. As a result, across-fly classifiers trained on the discovery flies fail to classify the level of consciousness for these flies.
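The failure mode described above can be sketched with a toy example. The snippet assumes a single-threshold classifier placed midway between the class means of the training data and scored by balanced accuracy; both choices stand in for the classifier specified in Methods and are not taken from it. All feature values are hypothetical.

```python
from statistics import mean


def fit_threshold(wake, unconscious):
    """Place a threshold midway between the two class means of the training data.

    Returns the threshold and a direction: +1 if wake values are larger,
    -1 if wake values are smaller.
    """
    mu_w, mu_u = mean(wake), mean(unconscious)
    direction = 1 if mu_w >= mu_u else -1
    return (mu_w + mu_u) / 2, direction


def balanced_accuracy(wake, unconscious, threshold, direction):
    """Mean of the per-class accuracies of the fixed threshold classifier."""
    wake_hits = sum(direction * (x - threshold) > 0 for x in wake)
    unc_hits = sum(direction * (x - threshold) <= 0 for x in unconscious)
    return 0.5 * (wake_hits / len(wake) + unc_hits / len(unconscious))


# Hypothetical "discovery" values (not data from this study).
disc_wake = [1.0, 1.2, 0.9, 1.1]
disc_unc = [0.2, 0.4, 0.1, 0.3]
threshold, direction = fit_threshold(disc_wake, disc_unc)

# Hypothetical "evaluation" values: same separation, but the whole range is shifted.
shift = 5.0
eval_wake = [x + shift for x in disc_wake]
eval_unc = [x + shift for x in disc_unc]

acc_disc = balanced_accuracy(disc_wake, disc_unc, threshold, direction)
acc_eval = balanced_accuracy(eval_wake, eval_unc, threshold, direction)
print(acc_disc, acc_eval)
```

In this toy case the conditions remain perfectly separable within the shifted dataset, yet every shifted value falls on the wake side of the fixed threshold, so the classifier performs at chance.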

Fig 5. Distributions of feature values in generalizing and non-generalizing features.

(a) Distributions of values for a feature which achieved significant classification accuracy across all datasets (f1, NL_BoxCorrDim_50_ac_5_minr13, related to the concept of a fractal dimension, with the greatest weighted average performance across datasets, 64%) against one which achieved significant classification accuracy in the discovery flies, but not subsequently in all the evaluation flies (f2, SP_Summaries_fft_area_5_1, related to low-frequency power, with weighted average performance of 53%), both at Channel 1. Solid lines indicate discrimination thresholds obtained from the discovery flies. (b, c) Distributions for f1 and f2, grouped by individual flies. Open, bolded circles indicate median values for each fly. (d) Difference in autocorrelation features for each dataset (wake minus anesthesia or sleep, and averaged across flies, excluding 3 single-dosage evaluation flies, see Fig D of S1 Text). Vertical dotted line indicates the greatest time-delay for which autocorrelation was evaluated (in the hctsa feature set). Shaded areas indicate standard error across flies for each dataset. (e) Example time series for the epochs with the 10 greatest feature values across all epochs and flies for f1, autocorrelation at 30 ms (AC_30), and f2 during wakefulness (red), and the 10 smallest feature values during anesthesia/sleep (blue).

Fig 6A and 6B show each of the features which achieved significant performance in all datasets, their values for each of the flies and conditions, and how they cluster together based on correlating their values across all flies and epochs. There appeared to be several themes of features which significantly generalized, such as autocorrelation features (e.g., AC_29), fractal dimension related features (e.g., NL_BoxCorrDim_50_ac_5_minr13), and features related to outlier detection (e.g., ST_LocalExtrema_n50_stdmax). Despite the seeming variety in the themes of the features, feature values were highly correlated across epochs. As such, these features are likely capturing some similar common aspect of the time series in order to distinguish wake from anesthesia and sleep. In particular, we highlight the autocorrelation features (AC_29 through AC_34), which are the simplest to understand. Specifically, these features indicate that there is higher autocorrelation in the time series during wakefulness at a timescale of roughly 30 ms (Fig 5D).
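To make the autocorrelation features concrete, the following sketch computes the sample autocorrelation of a signal at a given lag. It assumes one sample per millisecond, which is what the correspondence between AC_30 and 30 ms implies, and uses a common normalization that may differ in detail from the hctsa implementation. The signal is synthetic.

```python
import math


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (in samples)."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(x) - 1")
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return cov / var


# Synthetic signal: a sinusoid with a 30-sample period (30 ms at 1 sample/ms).
period = 30
x = [math.sin(2 * math.pi * t / period) for t in range(3000)]

print(autocorr(x, 30))  # lag of one full period: strongly positive
print(autocorr(x, 15))  # lag of half a period: strongly negative
```

A higher value at a lag of about 30 samples, as reported for wakefulness, means that the signal resembles itself more closely when shifted by roughly 30 ms.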

Fig 6. Empirical grouping of significantly classifying features with and without normalizing evaluation flies.

(a) Average feature values (across epochs and channels) from each fly, for each feature which consistently achieved significant classification performance across all the flies. Note that axes are swapped, compared to Fig 4A. (b) Agglomerative hierarchical cluster tree grouping significant features, constructed using Spearman correlation distances as linkage distances between every pair of features (computed using all epochs from all flies). Dendrogram colors indicate groupings of features, using a threshold absolute Spearman correlation distance of 0.7. Individual feature names are listed to the right, along with their average classification performance in the evaluation flies. Colors indicate the broad theoretical category to which individual features belong (descriptions of each category can be found in the documentation for hctsa, at https://time-series-features.gitbook.io/hctsa-manual/information-about-hctsa/list-of-included-code-files). (c, d) Same as (a) and (b), but showing the top 50 features with the greatest average classification performance, across all channels, after normalizing feature values in each set of evaluation flies to match the distribution of values in the discovery flies.

Normalizing feature values only marginally improves generalization

Most features likely failed to generalize due to the different ranges of feature values across the datasets, as seen previously in Figs 4 and 5. Specifically, the ranges of values for many features in the MD4 and sleep flies did not overlap with ranges of values for the discovery, MD8, or single-dosage flies, across both wake and unconscious conditions. In such cases, classifiers would achieve only chance accuracy even if the wake and unconscious conditions are separable within the specific dataset. To account for this inter-dataset variation, we repeated the classification analysis after normalizing feature values in the evaluation flies such that the means and standard deviations across epochs matched those of the discovery flies (see Methods). In the context of a potential future marker in clinical settings, this normalization can be considered as corresponding to calibration of measurement devices to particular environments.
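The normalization described above amounts to an affine rescaling of one feature's values. A minimal sketch follows, assuming that wake and unconscious epochs are pooled before computing the mean and standard deviation; the grouping actually used (per dataset, channel, and feature) is specified in Methods and is not reproduced here. The function name and all values are hypothetical.

```python
from statistics import mean, pstdev


def match_mean_sd(values, reference):
    """Rescale values so that their mean and SD equal those of reference."""
    mu_v, sd_v = mean(values), pstdev(values)
    mu_r, sd_r = mean(reference), pstdev(reference)
    if sd_v == 0:
        raise ValueError("values are constant and cannot be rescaled")
    return [(v - mu_v) / sd_v * sd_r + mu_r for v in values]


# Hypothetical discovery values for one feature (wake and unconscious epochs pooled).
discovery = [1.0, 1.2, 0.9, 1.1, 0.2, 0.4, 0.1, 0.3]
# Hypothetical evaluation values: the same pattern, but shifted and stretched.
evaluation = [2.0 * v + 5.0 for v in discovery]

normalized = match_mean_sd(evaluation, discovery)
print(max(abs(a - b) for a, b in zip(normalized, discovery)))
```

Because the rescaling uses pooled statistics, it presupposes that both conditions are represented in the evaluation data. It removes a dataset-wide shift or stretch but leaves differences among individual flies within a dataset untouched.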

Normalizing feature values in this manner marginally reduced the number of significantly classifying features in the MD8 flies (1,101–4,111 significantly performing features, M = 2,823, SD = 823 across 15 channels). Meanwhile, the number of significantly performing features greatly increased in the MD4 and sleep flies (1,066–3,507, M = 2,811, SD = 727 in the MD4 flies; and 0–2,095, M = 892, SD = 587 in the sleep flies), and marginally so in the single-dosage flies (0–2,319, M = 754, SD = 775). As such, normalizing feature values before performing classification most benefited the MD4 and sleep flies, consistent with the distributions of values shown in Figs 4A and 5B.

Surprisingly, when considering only features which achieved significant classification performance for all datasets, this normalization resulted in only a slightly higher number of significantly performing features. Fig 4E shows that, while classification performance for many features increased as a result of the normalization, most still failed to generalize. In total across the 15 channels, 263 features achieved significant classification accuracy across all the datasets. Channel 1 showed the greatest number of significantly performing features, with almost half of these features (126), followed by Channel 6 with around a third (92; Fig 4D). Similar to the previous results without normalization, the highest performing feature which we highlighted in the pilot evaluation flies, kurtosis (DN_Moments_raw_4) previously in Channel 5, failed to generalize across all the datasets, at any channel.

Considering the 50 features that achieved the greatest classification performance at any of the channels (treating the same feature at different channels as separate), we found three general clusters of features (Fig 6D). One of these clusters (colored blue in Fig 6D) included only features at Channel 1 and corresponded to the autocorrelation-related features which previously generalized when performing classification without normalization (cf. Fig 6A and 6B). The other two clusters included features from different channels (specifically Channels 3, 6, and 9). The first of these (colored green in Fig 6D) included features similar to the features highlighted previously for Channel 1, again related to fractal dimension and variation in extreme values in short time windows, but at mainly Channel 6. The separation of this cluster from the previous suggests some small difference in the autocorrelation of signals from the center of the fly brain with those obtained elsewhere. The second of these additional clusters (colored red in Fig 6D) included features related to variability in time samples, similar to those highlighted in our consistency analysis of the Pilot results, such as MD_rawHRVmeas_SD2. We provide classification performances for all features and channels in https://osf.io/8wvsq/?view_only=8a056d1c573b4f23a6cf6cea8b976ddb.

Consistent effect of loss of consciousness even in features that did not exhibit significant classification ability

While the above normalization matched the distributions of feature values across datasets, it did not address variability among individual flies. Specifically, while the ranges of feature values may have varied greatly among flies, the direction of the effect of anesthesia or sleep may have been consistent across flies. So, to address the high inter-fly variability in feature values, we next ignored individual differences in raw feature values and evaluated the degree to which within-fly effect direction consistencies generalized to the full set of evaluation flies. That is, we aimed to investigate whether some features reliably increase (or decrease) from wake to anesthesia and sleep, making them a reliable relative indicator of a change in conscious arousal within a given fly.
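One way to quantify within-fly effect direction is sketched below. It assumes that consistency is the fraction of all wake and unconscious epoch pairs within a fly for which the wake value is the larger one. This definition is an illustrative stand-in for the measure defined in Methods, and the fly names and values are hypothetical.

```python
def within_fly_consistency(wake, unconscious):
    """Fraction of (wake, unconscious) epoch pairs in one fly with wake > unconscious."""
    n_pairs = len(wake) * len(unconscious)
    if n_pairs == 0:
        raise ValueError("need at least one epoch per condition")
    return sum(w > u for w in wake for u in unconscious) / n_pairs


# Two hypothetical flies whose raw value ranges do not overlap at all.
flies = {
    "fly_a": ([1.0, 1.2, 0.9], [0.2, 0.4, 0.1]),
    "fly_b": ([101.0, 100.8, 101.3], [100.1, 100.4, 100.2]),
}

scores = {name: within_fly_consistency(w, u) for name, (w, u) in flies.items()}
print(scores)
```

Both flies score 1.0 even though no single threshold could separate wake from unconscious epochs across the two of them, which is why this measure factors out differences in raw value ranges.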

Removing individual differences in this manner yielded slightly fewer significant features than for classification in the MD8 flies (208–3,788, M = 2,252, SD = 975 across 15 channels), but many more for the other flies (1,763–3,741, M = 2,857, SD = 480 in the MD4 flies; 70–3,063, M = 1,846, SD = 1,009 in the single-dosage flies; and 649–2,028, M = 1,728, SD = 577 in the sleep flies). Features may achieve significant classification accuracy but not consistency in cases where there is greater variability in feature values during, e.g., wakefulness, such that they surround the values for anesthesia. In such a scenario, depending on the trained classifier threshold, a large majority of anesthesia data points may be classified correctly, along with a minority of wakeful data points, leading to significant classification performance but not consistency.
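The scenario just described can be reproduced with a toy example in which wake values are widely spread around tightly grouped anesthesia values. The values, the fixed threshold, the use of balanced accuracy, and the pairwise definition of consistency are all assumptions chosen for illustration.

```python
def consistency(wake, unconscious):
    """Fraction of (wake, unconscious) epoch pairs with wake > unconscious."""
    n_pairs = len(wake) * len(unconscious)
    return sum(w > u for w in wake for u in unconscious) / n_pairs


def accuracy(wake, unconscious, threshold):
    """Balanced accuracy of the rule 'label as wake if value > threshold'."""
    wake_rate = sum(w > threshold for w in wake) / len(wake)
    unc_rate = sum(u <= threshold for u in unconscious) / len(unconscious)
    return 0.5 * (wake_rate + unc_rate)


# Hypothetical values: wake epochs are highly variable and surround the
# tightly grouped anesthesia epochs.
wake = [-3.0, -2.0, 2.0, 3.0]
unconscious = [-0.1, 0.0, 0.05, 0.1]
threshold = 1.0  # hypothetical threshold from a previously trained classifier

acc = accuracy(wake, unconscious, threshold)
cons = consistency(wake, unconscious)
print(acc, cons)
```

Here every anesthesia epoch and half of the wake epochs are classified correctly, giving an accuracy above chance, while the direction of the wake versus anesthesia difference is no better than a coin flip.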

In focusing on the direction of the effect of anesthesia and sleep within flies, many features appeared to be able to discriminate between wake and anesthesia or sleep. Fig 7A shows differences in feature values, at Channel 1, between wake and unconscious epochs to illustrate within-fly effect direction consistency for each feature across all the flies. In contrast to raw values (Fig 4A), which showed greater variability in feature values among flies than among conscious levels, the direction of the effect of anesthesia or sleep appeared to be reliable across flies for many features. This is consistent with what we previously reported in the Pilot results, where the direction of the effect of anesthesia seemed consistent across flies despite mediocre classification performance.

Fig 7. Within-fly effect direction (wake – anesthesia) consistency of hctsa features across all flies.

(a) Differences in scaled hctsa values between wakefulness and anesthesia/sleep in all the flies, averaged across channels and epochs for each feature. Each row corresponds to one fly from each of the datasets (N = 13 discovery, N = 8 MD8, N = 4 MD4, N = 18 single-dosage, and N = 19 sleep flies respectively). Features (columns) are ordered as in Fig 4A. (b) The number of features which achieved statistically significant within-fly direction effect consistency at each channel in 1, 2, 3, 4, and 5 datasets (including the discovery flies) is indicated by line color: brown, orange, green, light blue, and dark blue, respectively. Thin lines show individual combinations of datasets, while thick lines show the average across all combinations. (c) Correlation of classification performances in the 13 discovery flies (x-axis) and average classification performances in the 49 evaluation flies (y-axis). Each dot represents the classification performance of each of the features shown in (a). Dots outlined with small circles indicate features which achieved significant consistency in all the datasets. Solid horizontal and vertical lines indicate chance performance (= 0.5). Dashed vertical line indicates threshold for statistically significant consistency in the discovery flies. Colored x’s indicate performances of features related to previously described measures of conscious level, as described in Fig 4A and indicated by the color bar. Thicker x’s indicate those features which achieved significant consistency in all the datasets. (d) Average feature values (across epochs and channels) from each fly, for the 50 features with the greatest consistencies across all the flies, across all 15 channels. (e) Agglomerative hierarchical cluster tree grouping significant features, constructed using Spearman correlation distances as linkage distances between every pair of features (computed using all epochs from all flies). Dendrogram colors indicate groupings of features, using a threshold absolute Spearman correlation distance of 0.7. Individual feature names are listed to the right, along with their average classification performance in the evaluation flies. Colors indicate the broad theoretical category to which individual features belong.

As in the Pilot results, there again appeared to be clusters of features (such as in columns ~2,900 to ~3,000, or columns ~3,600 to ~3,800) for which consistency would be high. However, the number of features in these columns was an order of magnitude smaller than what we previously reported in the Pilot results. Specifically, when considering only features for which the direction of the effect of anesthesia or sleep was significantly consistent, the number of significant features ranged from 13 to 902 across the 15 channels, with the greatest number of significant features occurring at Channel 3 (Fig 7B). This is likely a consequence of variability across datasets from factors such as differences in exact experimental setup (e.g., electrode location), severely limiting inter-fly generalization.

Fig 7C shows how within-fly effect direction consistency at Channel 1 generalized to the evaluation flies. In contrast to our classification analysis, features related to well-known markers of consciousness, i.e., those related to spectral power and complexity, showed significant within-fly effect direction consistency (thicker x’s in Fig 7C). However, these did not achieve the greatest averaged consistencies across the datasets. Instead, features related to signal variance generally achieved the greatest consistencies. The feature which achieved the greatest consistency was MD_rawHRVmeas_SD1 at Channel 3, a measure of variability between consecutive time samples (conceptually related to MD_rawHRVmeas_SD2 which we previously highlighted in the Pilot results, which also achieved a significant consistency of 0.73 at Channel 3; both features also achieved significance in Channel 6, 0.69 and 0.68, respectively, for SD1 and SD2). The top 50 features, taken from any of the 15 channels, were all related to descriptions of the distribution of time samples (such as standard deviation, standard_deviation, and distributional moments, DN_Moments_raw), and clustering of these indicated that their patterns of values across epochs were all indeed highly correlated, even across channels (Fig 7E). These results suggest that relatively simple measures related to variability and distributional shape may be more reliable in distinguishing conscious levels in individuals than, e.g., low-frequency power or complexity measures, when baseline measurements are available. We provide within-fly effect direction consistencies for all features and channels in https://osf.io/8wvsq/?view_only=8a056d1c573b4f23a6cf6cea8b976ddb.

Discussion

In this registered report, we used the hctsa univariate time-series feature library, the most comprehensive set of available univariate time-series features, to search for features which may serve as potential markers of consciousness. In our Stage 1 manuscript, we first searched for features which discriminate wakefulness from isoflurane anesthesia in a discovery set of flies (N = 13), training and fixing classifiers for each feature using only these flies. To account for the potential for poor generalization, we also evaluated the degree to which the direction of the effect of anesthesia or sleep on feature values was consistent among these flies.

In Stage 2, we evaluated the degree to which the classifiers trained in Stage 1 discriminated wakefulness from loss of consciousness in independent sets of evaluation flies, including conditions of anesthesia (N = 28) and sleep (N = 19), which were previously blinded to the data analysis team. Here, we found only a small set of features to achieve significant performance across all the datasets, at the center of the fly brain (Channel 1). We also evaluated the within-fly effect direction consistency for all the flies, finding many more features to achieve significant consistency than for classification. These features included those related to well known, previously reported markers of consciousness, such as spectral power or complexity measures, though features with the highest consistencies were related to measures of variance.

Across-subject classification using single, specific features (which is ideal for a consciousness measure) is rarely evaluated across independent datasets involving new unseen levels of consciousness, especially in the context of “discovering” and proposing new potential consciousness markers. Studies which do evaluate already-proposed markers across independent datasets consisting of the same conscious conditions, such as [14], still report limited generalization performance, likely because of the strong focus on traditional spectral power or complexity measures which may be more suited for within-subject comparisons. We elaborate on these points below.

Within-dataset generalization versus blinded generalization to new datasets

We originally focused on Channel 6 in our Stage 1 Pilot results as it showed the greatest number of significantly performing features in the discovery flies. Further, many of these features related to prominent consciousness markers, such as those related to low-frequency spectral power, achieved above chance cross-validated classification performance in these flies. While many of these already failed to generalize to the pilot evaluation flies, one feature related to low-frequency spectral power did achieve significant generalization, in fact achieving the greatest classification performance in the pilot evaluation flies. We also highlighted better-performing features, related to variance of voltage fluctuations and stationarity. Normalizing feature values in the pilot evaluation flies to match the means and standard deviations in the discovery flies allowed many of these features to achieve significant classification accuracy.

However, none of these previously highlighted features generalized across all four sets of evaluation flies, even when normalizing feature values to match the means and standard deviations of those in the discovery flies. Instead, we identified features which previously were overshadowed in terms of classification performance. The failure of previously highlighted features to generalize to the evaluation flies illustrates potential drawbacks of evaluating individual candidate consciousness markers in single datasets, even when carrying out cross-validation. Specifically, focusing on single individual datasets inflates the seeming performance of particular features (whereas across datasets, there may be high variation in ranges of feature values, especially in cases where inter-experimental variability may be substantial), while focusing on particular candidate markers greatly limits the potential for discovering other candidate markers. These drawbacks are problematic as they can lead to frequent supposed discoveries of new consciousness markers which ultimately do not generalize. More generally, selection of specific analysis methods by different research groups, even on the same dataset, can lead to inconsistent conclusions [55]. These issues apply equally to our own previous publication [17], which requires further validation studies with larger, unseen datasets in order to better evaluate generalization.

One concrete factor which contributed to the lack of generalization for many features was the difference in feature value ranges across datasets and individual flies. In particular, values in the MD4 and sleep flies tended to be overall much greater or much smaller than in the other flies. For example, for a feature related to low frequency power, almost all the values in these flies were above the threshold for classifying wake from anesthesia trained in the discovery flies. If this were the case for just the sleep dataset, this might be interpreted as sleeping flies having a higher level of consciousness compared to anesthetized flies. However, the same difference in value ranges was also present in the MD4 flies. The difference in feature value ranges might also be attributed to variations in recording setup. However, all the datasets analyzed here were recorded from the same laboratory (BvS), and all the evaluation datasets were recorded by one PhD student (TJ) using the same recording setups. As such, while there is likely variation in exact channel location across all the flies (Fig 1F from [43] illustrates this variability for a whole-brain preparation), systematic differences in channel location across the datasets are unlikely (see S1 Text regarding difference in depth of electrode probe insertion between the discovery and all the evaluation flies). This is especially the case across the evaluation datasets due to a method for consistent electrode placement ([46]; which was developed after recordings from the discovery flies were made). Other potential sources of variation might lie in seemingly small experimental differences, such as the age of the flies (see Methods).

Why do so many features fail to generalize despite consistent direction of effect of loss of wakefulness?

We evaluated two forms of generalization. As discussed above, we first evaluated the degree to which the classifiers trained on the discovery flies could predict wakefulness from loss of consciousness in the evaluation flies. We next evaluated the degree to which the direction of the effect of anesthesia or sleep in the evaluation flies was consistent with that in the discovery flies. In principle, evaluating within-fly effect direction consistency factors out inter-individual and inter-dataset differences in feature value ranges. While we found few features to achieve significant classification performance, we found many more features to have a consistent direction of effect of anesthesia or sleep. As such, much of the low generalizability in classification performance was likely to have been due to differences in feature value ranges, rather than, e.g., inability to separate wake from anesthesia or sleep in some datasets. This suggests that a potential marker of consciousness may globally shift across datasets. As potential reasons for this global shift, we consider two possible explanations.

The first explanation relates to variation in experimental and recording setup as we raised earlier. In addition to variation in exact electrode placement and age of the flies, other differences which may have led to shifts in values include obtaining recordings in a dark versus lit-up room, presentation of stimuli relevant to the original experiments in between periods which we analyze, such as flickering lights or air puffs, and so on. However, we did not expect these factors to greatly affect a potential marker of conscious level, especially if we assumed flies to have the same level of consciousness in all wake conditions and again in all the anesthetized and sleep conditions. If this is the case, then features which showed significant consistency do not generalize enough to be considered ideal consciousness markers which can be applied across many contexts. Rather, they require some baseline observation made with the same setup and in the same environment to be able to inform as to an individual’s consciousness level. Or conversely, conditions during a test observation should be manipulated to be comparable to those during a baseline observation. Which specific environmental factors are more important for this may be clarified through experiments explicitly manipulating the background environment.

The second explanation is sensitivity to inter-individual differences. The sleep evaluation flies consisted of flies which were several days younger than in the other datasets. However, we consider age an unlikely explanation, as the MD4 flies, which also appeared to have shifted value ranges, share the same age range as the other multi-dosage evaluation flies. Alternatively, baseline levels of consciousness may have varied among flies due to individual sensitivity to, and recovery from, electrode insertion and the accompanying cold anesthesia during preparation, or sleepiness due to time of the recording. The problem of uncertain baseline levels of consciousness introduces a level of circularity in evaluating potential consciousness markers, but is somewhat mitigated through the registered report format where authors and reviewers agree on and fix a definition of conscious/non-conscious (and its implied variability across individuals), before evaluating candidate markers. In this vein, we are currently also conducting another registered report study to evaluate consistency of candidate markers between human and monkey neurophysiology data [56].

Lastly, we acknowledge an important assumption in this study: that the neural process(es) underlying consciousness are affected in the same way during loss of consciousness for both anesthesia and sleep. However, there are clear differences between anesthesia and sleep, such as the possible presence of dreams during sleep in humans and maybe even flies [46], or the capability of human subjects to follow verbal commands during various sleep stages [57]. As such, the general lack of generalization across most features may also be interpreted in several ways. One simple interpretation is that sleep does not induce a loss of consciousness to the same degree as anesthesia, such that it presents as an intermediate level of consciousness between anesthesia and wakefulness (e.g., in the case of dreams). Another is that neither anesthesia nor sleep entails the complete loss of consciousness, and each follows a different path of breaking down of the process(es) underlying consciousness. A third interpretation is that non-conscious neural processes during sleep such as those related to memory consolidation, which are suppressed under anesthesia, may be masking the breakdown of consciousness-related processes. Given this interpretation, many of the univariate features in hctsa lack the sensitivity to distinguish these processes from consciousness-related ones. In this vein, more complex bi- or multi-variate features which incorporate information across the brain might be better able to distinguish such processes.

The importance of investigating individual features

Here, we evaluated and compared the performance of individual features available in hctsa in distinguishing between wake and anesthesia or sleep. In doing so, it is possible to identify specific time-series features or overall themes which similar features capture. This may inspire new hypotheses about how neural activity leads to consciousness. Meanwhile, it is also possible to combine multiple features together with multivariate classification. In particular, combining features in this manner may achieve greater classification performance in distinguishing conscious levels. There is already work following this approach, combining already proposed markers of conscious level in the literature [13,14] to achieve better classification of disorders of consciousness. This avenue is particularly appealing due to the wide range of analysis types available in hctsa. However, this kind of analysis has drawbacks regarding interpretation of features, and should be carried out carefully.

Generally speaking, multivariate classifiers require fitting large numbers of parameters. However, when the number of parameters to fit exceeds the available training data, these classifiers can suffer severely from overfitting. Most published studies claim to overcome this difficulty using the process of “cross-validation”, which repeatedly uses different “folds” of a given dataset for training and testing (which we conducted in the discovery flies). However, cross-validation only addresses this problem if the data is sufficiently varied to represent the true variability in the full, global population. While even an overfitted model can provide good generalization to new data which falls within the “region” of the training data [58], classification can suffer greatly in cases of data outside those regions, e.g., the addition of a sleep condition in our evaluation flies. Meanwhile, univariate analysis requires much less parameter fitting, drastically reducing the chance of overfitting. Even so, we already observe a related effect in our univariate classification, where classification suffers greatly from the inclusion of a sleep dataset when the classifiers were trained on an anesthesia dataset.
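The gap between within-dataset cross-validation and generalization to an independent dataset can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic Gaussian feature values, the classifier is a hypothetical midpoint-threshold rule (not the classifier used in this study), and the evaluation set is given an arbitrary offset to mimic a shifted value range.

```python
import numpy as np

def fit_threshold(values, labels):
    """Hypothetical univariate classifier: midpoint between the two class means."""
    mean_wake = values[labels == 1].mean()
    mean_unconscious = values[labels == 0].mean()
    threshold = (mean_wake + mean_unconscious) / 2
    direction = 1 if mean_wake >= mean_unconscious else -1
    return threshold, direction

def predict(values, threshold, direction):
    """Label 1 (wake) on the side of the threshold where the wake mean lay."""
    return (direction * (values - threshold) > 0).astype(int)

def loo_accuracy(values, labels):
    """Leave-one-out cross-validation within a single dataset."""
    hits = 0
    for i in range(len(values)):
        train = np.arange(len(values)) != i
        threshold, direction = fit_threshold(values[train], labels[train])
        hits += int(predict(values[i:i + 1], threshold, direction)[0] == labels[i])
    return hits / len(values)

rng = np.random.default_rng(1)
n = 200  # synthetic epochs per condition (arbitrary)
labels = np.repeat([0, 1], n)

# Synthetic "discovery" feature values: two well-separated conditions.
discovery = np.concatenate([rng.normal(0, 1, n), rng.normal(3, 1, n)])
# Synthetic "evaluation" values: same separation, but the whole range is offset.
evaluation = np.concatenate([rng.normal(0, 1, n), rng.normal(3, 1, n)]) + 3.0

cv_acc = loo_accuracy(discovery, labels)
threshold, direction = fit_threshold(discovery, labels)
eval_acc = np.mean(predict(evaluation, threshold, direction) == labels)

print(f"cross-validated accuracy (discovery): {cv_acc:.2f}")
print(f"accuracy on shifted evaluation set:   {eval_acc:.2f}")
```

In this toy setting the cross-validated accuracy stays high, while the fixed threshold performs close to chance on the offset data, even though the two conditions remain equally separable within the evaluation set.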

A more important issue regards the interpretability of multivariate classifiers. While combining multiple features into a single classifier can greatly improve classification performance (ignoring the issue of overfitting), how to interpret a bundle of wildly varying candidate consciousness markers beyond evaluating their contribution in improving classification performance (such as in, e.g., [14]) is unclear. This is the case in recent, successful artificial neural networks such as large language models, where huge numbers of parameters are fit to extremely large and varied datasets. Such models are able to provide seemingly good predictions, but in a completely black-box manner, where the mechanism by which these predictions are generated is abstracted out. Thus, from an application viewpoint, better classification performance using multivariate classifiers can be appealing. However, from a scientific viewpoint, pursuit of better classification performance may not necessarily deepen our understanding of a given phenomenon. To pursue the latter, we now turn to interpreting the univariate features we highlighted in our results.

Autocorrelation as a marker of conscious level

In focusing on individual features, we identify a theme of analysis—autocorrelation—which to our knowledge has not received much attention in consciousness research. Interestingly, features related to the concept of spectral power did not successfully generalize across all the datasets, despite autocorrelation and the power spectrum being directly related through a Fourier transform. This may, however, be due to lack of granularity of the available features in hctsa covering the frequency domain, compared to the available autocorrelation features.
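The relationship between autocorrelation and the power spectrum mentioned above (the Wiener-Khinchin relation) can be checked numerically. The following is a minimal sketch on a synthetic AR(1) signal; the function names and the signal are illustrative assumptions and are not taken from hctsa.

```python
import numpy as np

def autocorr_direct(x, max_lag):
    """Autocorrelation computed directly in the time domain."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    acov = np.array([np.dot(x[:n - k], x[k:]) for k in range(max_lag + 1)])
    return acov / acov[0]

def autocorr_via_spectrum(x, max_lag):
    """Autocorrelation obtained as the inverse Fourier transform of the power spectrum."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Zero-pad to length 2n so the circular correlation equals the linear one.
    power = np.abs(np.fft.rfft(x, n=2 * n)) ** 2
    acov = np.fft.irfft(power, n=2 * n)[:max_lag + 1]
    return acov / acov[0]

# Synthetic AR(1) signal (coefficient chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
signal = np.zeros_like(noise)
for t in range(1, len(noise)):
    signal[t] = 0.9 * signal[t - 1] + noise[t]

ac_time = autocorr_direct(signal, 50)
ac_freq = autocorr_via_spectrum(signal, 50)
print(np.allclose(ac_time, ac_freq))
```

Because the two representations carry the same second-order information, any difference in how well autocorrelation-based and spectral features generalize would have to come from how each feature summarizes that information, not from the information itself.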

Many of the features which significantly discriminate wake from anesthesia and sleep are related to the notion of fractal dimension. However, these features (and in general, features which include “_ac” in their name) include early processing steps which depend on properties of autocorrelation. For example, NL_BoxCorrDim_50_ac_5_minr13 first generates a time-delay embedding based on when the autocorrelation function first reaches zero. Despite including more complicated later processing steps, these fractal dimension features achieved comparable discrimination performance to the simpler autocorrelation features. Given these points, and the generally strong correlations among significantly classifying features, it is likely that the performances of such fractal dimension features are in fact being driven by autocorrelation properties.
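To make the dependence on autocorrelation concrete, the early processing step described above can be sketched as follows. This is not the hctsa implementation: the function names, the embedding dimension of 3, and the synthetic sinusoid are assumptions chosen only to show how the first zero-crossing of the autocorrelation function can set the delay of a time-delay embedding.

```python
import numpy as np

def first_zero_crossing(x):
    """Smallest lag at which the autocorrelation function is at or below zero."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    power = np.abs(np.fft.rfft(x, n=2 * n)) ** 2
    acov = np.fft.irfft(power, n=2 * n)[:n]
    ac = acov / acov[0]
    crossings = np.nonzero(ac <= 0)[0]
    if crossings.size == 0:
        raise ValueError("autocorrelation never reaches zero within the signal length")
    return int(crossings[0])

def delay_embed(x, dim, tau):
    """Rows are delay vectors [x[t], x[t + tau], ..., x[t + (dim - 1) * tau]]."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for this embedding")
    return np.column_stack([x[i * tau: i * tau + n_vectors] for i in range(dim)])

# Synthetic sinusoid with a period of 40 samples (illustrative only).
t = np.arange(800)
x = np.sin(2 * np.pi * t / 40)

tau = first_zero_crossing(x)          # roughly a quarter of the period
embedded = delay_embed(x, dim=3, tau=tau)
print(tau, embedded.shape)
```

Since the delay is fixed by the autocorrelation function, any later quantity computed on the embedded points, such as a fractal dimension estimate, inherits a dependence on the signal's autocorrelation structure.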

The concept of fractal dimensions has been somewhat explored in the context of detecting drowsiness and discriminating sleep stages in EEG recordings. Specifically, measures of fractal dimension have been reported to be reduced during drowsiness [59,60] and in deeper sleep stages [61,62], and during anesthesia [63,64], consistent with our own results. However, the theoretical motivation thus far of using such a method has been limited to using a complex nonlinear analysis in the hopes of capturing so-called complex nonlinear dynamics in the brain. Given our findings, and lack of a priori theoretical motivation for investigating fractal dimension related measures, results from these studies may simply be reflecting differences in autocorrelation.

Directions for future investigation

We have evaluated a vast library of time-series features as candidate markers of conscious level, and identified features which generalize across multiple independent fly datasets. In doing so, we are taking steps similar to an “iterative natural kind” approach to finding consciousness markers [65]. In this approach, we assume consciousness to be a “natural kind”, that is, that conscious systems share an underlying nature which is identifiable through iterative procedures. Specifically, we make the assumption that a relatively simple system such as the fly brain is conscious to identify potential pre-theoretical markers of consciousness which should then be gradually tested in more complex systems and developed into a theory whose explanatory power and simplicity can be evaluated. This is in contrast to starting from pre-established or popular markers (such as applying power spectral measures which seem to discriminate conscious levels in human recordings in order to test if they are conscious), or to starting from some kind of theory and testing its predictions.

While there are several popular theories of consciousness [66,67], one stands out in providing an operationalized measure which can be directly applied to neural recordings in general. Integrated information theory [68–71] attempts to start from first principles, identifying universal aspects of consciousness and then deriving a multivariate measure, integrated information, which reflects the extent to which a system supports these aspects. We previously applied one version of integrated information [70] to the discovery flies, finding classification performance similar to that of the high-performing features in the Pilot results (i.e., higher performance than what we reported in the Stage 2 Results). However, we did not evaluate generalization to new datasets as we did in the present study. Further, as integrated information is a multivariate measure computed across multiple channels, it is not clear whether a direct, fair comparison can be made between it and the univariate features evaluated here.

A more fair comparison can be made, however, equating the number of channels used, with other bi- or multi-variate measures. In this regard, there is a more recent toolbox, pyspi [72], which provides over 250 bivariate analysis methods, such as correlation, coherence, and Granger causality, and includes the theory-driven integrated information. Given that a common idea is that it is the interactions among neurons or brain regions which matter for consciousness, evaluating whether these multivariate features perform better and generalize more successfully than the univariate features in hctsa—or whether combining multivariate and univariate features improves generalization beyond using either alone [73]—is a clear next step to take in finding and evaluating potential consciousness markers.

Overall, this work highlights the limitations of the standard cross-validation approach in discovering and validating potential measures of consciousness. In particular, we show that while standard cross-validation within a dataset can find potential markers, they may not generalize to independently obtained datasets. Further exploration of properties relating to autocorrelation may lead to a reliable across-subject consciousness marker utilizing some regularized combination of fractal dimension-related features. Understanding if and how such features are related to existing consciousness theories will help distinguish among them. Meanwhile, extending the exploration of features to include bi- or multi-variate measures characterizing interactions among neural populations, especially any which are proposed from theories of consciousness, will also be a fruitful avenue toward better identifying conscious level across subjects.

Supporting information

S1 Text. Correlations in feature values between discovery and evaluation flies with and without offsetting evaluation fly channels.

Supplementary figures.

(PDF)

pbio.3003217.s001.pdf (716.3KB, pdf)

Abbreviations

EEG, electroencephalographic

FDR, false discovery rate

hctsa, highly comparative time series analysis

LFPs, local field potentials

Data Availability

Pre-processed data from the discovery flies are available on Figshare: https://doi.org/10.26180/5ebe420ae8d89

Pre-processed data from the pilot evaluation flies and blinding procedure are available on OSF: https://osf.io/bq5ry/?view_only=3789097395c1419db2a9eb615bc1effe

Pre-processed data, hctsa values, and classification performances and within-fly effect direction consistencies for all the flies are also available on OSF: https://osf.io/8wvsq/?view_only=8a056d1c573b4f23a6cf6cea8b976ddb

Analysis codes are available on Zenodo: https://doi.org/10.5281/zenodo.15370576

Funding Statement

AL and TJ were supported by Australian Government Research Training Program (RTP) Scholarships. TJ and BvS were funded by the National Health and Medical Research Council Project Grant GNT1065715. BvS was funded by the National Health and Medical Research Council Grant GNT1164879. AL, BF, and NT were funded by the National Health and Medical Research Council Ideas Grant GNT1183280. NT was funded by TWCF0199 from Templeton World Charity Foundation, Inc., Australian Research Council Discovery Projects DP180104128 and DP180100396, and the National Health and Medical Research Council APP1183280. AL and NT were funded by Japan Society for the Promotion of Science, Grant-in-Aid for Transformative Research Areas (A) (23H04829, 23H04830). BF acknowledges support from the Australian Research Council (FT240100418). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Laureys S, Gosseries O, Tononi G. The neurology of consciousness: cognitive neuroscience and neuropathology. Academic Press; 2015.
  • 2. Panayiotopoulos CP, Obeid T, Waheed G. Differentiation of typical absence seizures in epileptic syndromes. A video EEG study of 224 seizures in 20 patients. Brain. 1989;112 (Pt 4):1039–56. doi: 10.1093/brain/112.4.1039
  • 3. Thomsen CE, Rosenfalck A, Nørregaard Christensen K. Assessment of anaesthetic depth by clustering analysis and autoregressive modelling of electroencephalograms. Comput Methods Programs Biomed. 1991;34(2–3):125–38. doi: 10.1016/0169-2607(91)90038-u
  • 4. Panayiotopoulos CP, Chroni E, Daskalopoulos C, Baker A, Rowlinson S, Walsh P. Typical absence seizures in adults: clinical, EEG, video-EEG findings and diagnostic/syndromic considerations. J Neurol Neurosurg Psychiatry. 1992;55(11):1002–8. doi: 10.1136/jnnp.55.11.1002
  • 5. Vuilleumier P, Assal F, Blanke O, Jallon P. Distinct behavioral and EEG topographic correlates of loss of consciousness in absences. Epilepsia. 2000;41(6):687–93. doi: 10.1111/j.1528-1157.2000.tb00229.x
  • 6. Goldfine AM, Victor JD, Conte MM, Bardin JC, Schiff ND. Determination of awareness in patients with severe brain injury using EEG power spectral analysis. Clin Neurophysiol. 2011;122(11):2157–68. doi: 10.1016/j.clinph.2011.03.022
  • 7. Murphy M, Bruno M-A, Riedner BA, Boveroux P, Noirhomme Q, Landsness EC, et al. Propofol anesthesia and sleep: a high-density EEG study. Sleep. 2011;34(3):283-91A. doi: 10.1093/sleep/34.3.283
  • 8. Colombo MA, Napolitani M, Boly M, Gosseries O, Casarotto S, Rosanova M, et al. The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. Neuroimage. 2019;189:631–44. doi: 10.1016/j.neuroimage.2019.01.024
  • 9. Bruhn J, Röpcke H, Rehberg B, Bouillon T, Hoeft A. Electroencephalogram approximate entropy correctly classifies the occurrence of burst suppression pattern as increasing anesthetic drug effect. Anesthesiology. 2000;93(4):981–5. doi: 10.1097/00000542-200010000-00018
  • 10. Liang Z, Wang Y, Ouyang G, Voss LJ, Sleigh JW, Li X. Permutation auto-mutual information of electroencephalogram in anesthesia. J Neural Eng. 2013;10(2):026004. doi: 10.1088/1741-2560/10/2/026004
  • 11. Liang Z, Wang Y, Sun X, Li D, Voss LJ, Sleigh JW, et al. EEG entropy measures in anesthesia. Front Comput Neurosci. 2015;9:16. doi: 10.3389/fncom.2015.00016
  • 12. Muñoz RN, Leung A, Zecevik A, Pollock FA, Cohen D, van Swinderen B, et al. General anesthesia reduces complexity and temporal asymmetry of the informational structures derived from neural recordings in Drosophila. Phys Rev Res. 2020;2(2). doi: 10.1103/physrevresearch.2.023219
  • 13. Sitt JD, King J-R, El Karoui I, Rohaut B, Faugeras F, Gramfort A, et al. Large scale screening of neural signatures of consciousness in patients in a vegetative or minimally conscious state. Brain. 2014;137(Pt 8):2258–70. doi: 10.1093/brain/awu141
  • 14. Engemann DA, Raimondo F, King J-R, Rohaut B, Louppe G, Faugeras F, et al. Robust EEG-based cross-site and cross-protocol classification of states of consciousness. Brain. 2018;141(11):3179–92. doi: 10.1093/brain/awy251
  • 15. Forrest FC, Tooley MA, Saunders PR, Prys-Roberts C. Propofol infusion and the suppression of consciousness: the EEG and dose requirements. Br J Anaesth. 1994;72(1):35–41. doi: 10.1093/bja/72.1.35
  • 16. Schwilden H, Stoeckel H, Schüttler J. Closed-loop feedback control of propofol anaesthesia by quantitative EEG analysis in humans. Br J Anaesth. 1989;62(3):290–6. doi: 10.1093/bja/62.3.290
  • 17. Decat N, Walter J, Koh ZH, Sribanditmongkol P, Fulcher BD, Windt JM, et al. Beyond traditional sleep scoring: massive feature extraction and data-driven clustering of sleep time series. Sleep Med. 2022;98:39–52. doi: 10.1016/j.sleep.2022.06.013
  • 18. Fulcher BD, Jones NS. hctsa: a computational framework for automated time-series phenotyping using massive feature extraction. Cell Syst. 2017;5(5):527-531.e3. doi: 10.1016/j.cels.2017.10.001
  • 19. Nahian MdJA, Ghosh T, Banna MdHA, Aseeri MA, Uddin MN, Ahmed MR, et al. Towards an accelerometer-based elderly fall detection system using cross-disciplinary time series features. IEEE Access. 2021;9:39413–31. doi: 10.1109/access.2021.3056441
  • 20. Schreglmann SR, Wang D, Peach RL, Li J, Zhang X, Latorre A, et al. Non-invasive suppression of essential tremor via phase-locked disruption of its temporal coherence. Nat Commun. 2021;12(1):363. doi: 10.1038/s41467-020-20581-7
  • 21. Kriegeskorte N, Simmons WK, Bellgowan PSF, Baker CI. Circular analysis in systems neuroscience: the dangers of double dipping. Nat Neurosci. 2009;12(5):535–40. doi: 10.1038/nn.2303
  • 22. Ferdinandy B, Gerencsér L, Corrieri L, Perez P, Újváry D, Csizmadia G, et al. Challenges of machine learning model validation using correlated behaviour data: evaluation of cross-validation strategies and accuracy measures. PLoS ONE. 2020;15(7):e0236092. doi: 10.1371/journal.pone.0236092
  • 23. Soderberg CK, Errington TM, Schiavone SR, Bottesini J, Thorn FS, Vazire S, et al. Initial evidence of research quality of registered reports compared with the standard publishing model. Nat Hum Behav. 2021;5(8):990–7. doi: 10.1038/s41562-021-01142-4
  • 24. Demertzi A, Antonopoulos G, Heine L, Voss HU, Crone JS, de Los Angeles C, et al. Intrinsic functional connectivity differentiates minimally conscious from unresponsive patients. Brain. 2015;138(Pt 9):2619–31. doi: 10.1093/brain/awv169
  • 25. Demertzi A, Tagliazucchi E, Dehaene S, Deco G, Barttfeld P, Raimondo F, et al. Human consciousness is supported by dynamic complex patterns of brain signal coordination. Sci Adv. 2019;5(2):eaat7603. doi: 10.1126/sciadv.aat7603
  • 26. Mashour GA, Lydic R. Neuroscientific foundations of anesthesiology. USA: Oxford University Press; 2011.
  • 27. Wong W, Noreika V, Móró L, Revonsuo A, Windt J, Valli K, et al. The dream catcher experiment: blinded analyses failed to detect markers of dreaming consciousness in EEG spectral power. Neurosci Conscious. 2020;2020(1):niaa006. doi: 10.1093/nc/niaa006
  • 28. Alivisatos AP, Chun M, Church GM, Greenspan RJ, Roukes ML, Yuste R. The brain activity map project and the challenge of functional connectomics. Neuron. 2012;74(6):970–4. doi: 10.1016/j.neuron.2012.06.006
  • 29. Herculano-Houzel S. The human brain in numbers: a linearly scaled-up primate brain. Front Hum Neurosci. 2009;3:31. doi: 10.3389/neuro.09.031.2009
  • 30. Shaw PJ, Cirelli C, Greenspan RJ, Tononi G. Correlates of sleep and waking in Drosophila melanogaster. Science. 2000;287(5459):1834–7. doi: 10.1126/science.287.5459.1834
  • 31. Cirelli C, Bushey D. Sleep and wakefulness in Drosophila melanogaster. Ann N Y Acad Sci. 2008;1129:323–9. doi: 10.1196/annals.1417.017
  • 32. Kirszenblat L, van Swinderen B. The yin and yang of sleep and attention. Trends Neurosci. 2015;38(12):776–86. doi: 10.1016/j.tins.2015.10.001
  • 33. Geissmann Q, Beckwith EJ, Gilestro GF. Most sleep does not serve a vital function: evidence from Drosophila melanogaster. Sci Adv. 2019;5(2):eaau9253. doi: 10.1126/sciadv.aau9253
  • 34. Zalucki O, van Swinderen B. What is unconsciousness in a fly or a worm? A review of general anesthesia in different animal models. Conscious Cogn. 2016;44:72–88. doi: 10.1016/j.concog.2016.06.017
  • 35. Cohen D, van Swinderen B, Tsuchiya N. Isoflurane impairs low-frequency feedback but leaves high-frequency feedforward connectivity intact in the fly brain. eNeuro. 2018;5(1). doi: 10.1523/ENEURO.0329-17.2018
  • 36. Cohen D, Zalucki OH, van Swinderen B, Tsuchiya N. Local versus global effects of isoflurane anesthesia on visual processing in the fly brain. eNeuro. 2016;3(4):ENEURO.0116-16.2016. doi: 10.1523/ENEURO.0116-16.2016
  • 37. Leung A, Cohen D, van Swinderen B, Tsuchiya N. Integrated information structure collapses with anesthetic loss of conscious arousal in Drosophila melanogaster. PLoS Comput Biol. 2021;17(2):e1008722. doi: 10.1371/journal.pcbi.1008722
  • 38. Boly M, Moran R, Murphy M, Boveroux P, Bruno M-A, Noirhomme Q, et al. Connectivity changes underlying spectral EEG changes during propofol-induced loss of consciousness. J Neurosci. 2012;32(20):7082–90. doi: 10.1523/JNEUROSCI.3769-11.2012
  • 39. Lee U, Blain-Moraes S, Mashour GA. Assessing levels of consciousness with symbolic analysis. Philos Trans A Math Phys Eng Sci. 2015;373(2034):20140117. doi: 10.1098/rsta.2014.0117
  • 40. Schartner M, Seth A, Noirhomme Q, Boly M, Bruno M-A, Laureys S, et al. Complexity of multi-dimensional spontaneous EEG decreases during propofol induced general anaesthesia. PLoS One. 2015;10(8):e0133532. doi: 10.1371/journal.pone.0133532
  • 41. Casarotto S, Comanducci A, Rosanova M, Sarasso S, Fecchio M, Napolitani M, et al. Stratification of unresponsive patients by an independently validated index of brain complexity. Ann Neurol. 2016;80(5):718–29. doi: 10.1002/ana.24779
  • 42. Lee U, Kim S, Noh G-J, Choi B-M, Hwang E, Mashour GA. The directionality and functional organization of frontoparietal connectivity during consciousness and anesthesia in humans. Conscious Cogn. 2009;18(4):1069–78. doi: 10.1016/j.concog.2009.04.004
  • 43. Paulk AC, Zhou Y, Stratton P, Liu L, van Swinderen B. Multichannel brain recordings in behaving Drosophila reveal oscillatory activity and local coherence in response to sensory stimulation and circuit activation. J Neurophysiol. 2013;110(7):1703–21. doi: 10.1152/jn.00414.2013
  • 44. Zalucki O, Day R, Kottler B, Karunanithi S, van Swinderen B. Behavioral and electrophysiological analysis of general anesthesia in 3 background strains of Drosophila melanogaster. Fly (Austin). 2015;9(1):7–15. doi: 10.1080/19336934.2015.1072663
  • 45. Van De Poll M, van Swinderen B. Whole-brain electrophysiology in Drosophila during sleep and wake. Cold Spring Harb Protoc. 2024;2024(9):pdb.prot108418. doi: 10.1101/pdb.prot108418
  • 46. Jagannathan SR, Jeans T, Van De Poll MN, van Swinderen B. Multivariate classification of multichannel long-term electrophysiology data identifies different sleep stages in fruit flies. Sci Adv. 2024;10(8):eadj4399. doi: 10.1126/sciadv.adj4399
  • 47. Mitra PP, Bokil HS. Observed brain dynamics. Oxford University Press; 2007.
  • 48. Fulcher BD, Little MA, Jones NS. Highly comparative time-series analysis: the empirical structure of time series and their methods. J R Soc Interface. 2013;10(83):20130048. doi: 10.1098/rsif.2013.0048
  • 49. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B: Methodol. 1995;57(1):289–300. doi: 10.1111/j.2517-6161.1995.tb02031.x
  • 50. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001;29:1165–88.
  • 51. Genovese CR, Lazar NA, Nichols T. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage. 2002;15(4):870–8. doi: 10.1006/nimg.2001.1037
  • 52. Pincus SM, Cummins TR, Haddad GG. Heart rate control in normal and aborted-SIDS infants. Am J Physiol. 1993;264(3 Pt 2):R638-46. doi: 10.1152/ajpregu.1993.264.3.R638
  • 53. Nilsen AS, Juel B, Thürer B, Storm JF. Proposed EEG measures of consciousness: a systematic, comparative review. PsyArXiv; 2020. doi: 10.31234/osf.io/sjm4a
  • 54. Brennan M, Palaniswami M, Kamen P. Do existing measures of Poincaré plot geometry reflect nonlinear features of heart rate variability? IEEE Trans Biomed Eng. 2001;48(11):1342–7. doi: 10.1109/10.959330
  • 55. Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature. 2020;582(7810):84–8. doi: 10.1038/s41586-020-2314-9
  • 56. Shimaoka D, Leung A, Price N, Banks M, Nourski K, Tsuchiya N. Registered report: common signatures of loss of consciousness in human and macaque electrocorticogram. OSF. 2024. doi: 10.31234/osf.io/7gync
  • 57. Türker B, Manasova D, Béranger B, Naccache L, Sergent C, Sitt JD. Distinct dynamic connectivity profiles promote enhanced conscious perception of auditory stimuli. Commun Biol. 2024;7(1):856. doi: 10.1038/s42003-024-06533-7
  • 58. Hasson U, Nastase SA, Goldstein A. Direct fit to nature: an evolutionary perspective on biological and artificial neural networks. Neuron. 2020;105(3):416–34. doi: 10.1016/j.neuron.2019.12.002
  • 59. Inouye T, Ukai S, Shinosaki K, Iyama A, Matsumoto Y, Toi S. Changes in the fractal dimension of alpha envelope from wakefulness to drowsiness in the human electroencephalogram. Neurosci Lett. 1994;174(1):105–8. doi: 10.1016/0304-3940(94)90130-9
  • 60. Mohd Radzi SS, Asirvadam VS, Yusoff MZ. Fractal dimension and power spectrum of electroencephalography signals of sleep inertia state. IEEE Access. 2019;7:185879–92. doi: 10.1109/access.2019.2960852
  • 61. Susmáková K, Krakovská A. Discrimination ability of individual measures used in sleep stages classification. Artif Intell Med. 2008;44(3):261–77. doi: 10.1016/j.artmed.2008.07.005
  • 62. Chouvarda I, Rosso V, Mendez MO, Bianchi AM, Parrino L, Grassi A, et al. Assessment of the EEG complexity during activations from sleep. Comput Methods Programs Biomed. 2011;104(3):e16-28. doi: 10.1016/j.cmpb.2010.11.005
  • 63. Ferenets R, Lipping T, Anier A, Jäntti V, Melto S, Hovilehto S. Comparison of entropy and complexity measures for the assessment of depth of sedation. IEEE Trans Biomed Eng. 2006;53(6):1067–77. doi: 10.1109/TBME.2006.873543
  • 64. Ruiz de Miras J, Soler F, Iglesias-Parro S, Ibáñez-Molina AJ, Casali AG, Laureys S, et al. Fractal dimension analysis of states of consciousness and unconsciousness using transcranial magnetic stimulation. Comput Methods Programs Biomed. 2019;175:129–37. doi: 10.1016/j.cmpb.2019.04.017
  • 65. Bayne T, Seth AK, Massimini M, Shepherd J, Cleeremans A, Fleming SM, et al. Tests for consciousness in humans and beyond. Trends Cogn Sci. 2024;28(5):454–66. doi: 10.1016/j.tics.2024.01.010
  • 66. Seth AK, Bayne T. Theories of consciousness. Nat Rev Neurosci. 2022;23(7):439–52. doi: 10.1038/s41583-022-00587-4
  • 67. Doerig A, Schurger A, Herzog MH. Hard criteria for empirical theories of consciousness. Cogn Neurosci. 2021;12(2):41–62. doi: 10.1080/17588928.2020.1772214
  • 68. Tononi G. An information integration theory of consciousness. BMC Neurosci. 2004;5:42. doi: 10.1186/1471-2202-5-42
  • 69. Tononi G. Consciousness as integrated information: a provisional manifesto. Biol Bull. 2008;215(3):216–42. doi: 10.2307/25470707
  • 70. Oizumi M, Albantakis L, Tononi G. From the phenomenology to the mechanisms of consciousness: integrated information theory 3.0. PLoS Comput Biol. 2014;10(5):e1003588. doi: 10.1371/journal.pcbi.1003588
  • 71. Albantakis L, Barbosa L, Findlay G, Grasso M, Haun AM, Marshall W, et al. Integrated information theory (IIT) 4.0: formulating the properties of phenomenal existence in physical terms. PLoS Comput Biol. 2023;19(10):e1011465. doi: 10.1371/journal.pcbi.1011465
  • 72. Cliff OM, Bryant AG, Lizier JT, Tsuchiya N, Fulcher BD. Unifying pairwise interactions in complex dynamics. Nat Comput Sci. 2023;3(10):883–93. doi: 10.1038/s43588-023-00519-x
  • 73. Bryant AG, Aquino K, Parkes L, Fornito A, Fulcher BD. Extracting interpretable signatures of whole-brain dynamics through systematic comparison. bioRxiv. 2024:2024.01.10.573372. doi: 10.1101/2024.01.10.573372

Decision Letter 0

Gabriel Gasque

Dear Dr Leung,

Thank you for submitting your manuscript entitled "Towards blinded classification of levels of consciousness: distinguishing wakefulness from general anesthesia in flies using a massive library of univariate time series analyses" for consideration as a Preregistered Research Article by PLOS Biology. Please accept my apologies for the delay in sending the decision below to you.

Your manuscript has now been evaluated by the PLOS Biology editorial staff. We have also discussed your proposal with two academic editors, one with expertise in the biological question you are addressing and another one with expertise in Pre-registered Reports. I am writing to let you know that we are interested in peer-reviewing your proposal, but before we can commit to that, we would like you to revise your submission by including a summary table that aligns each research question with the hypothesis/es used to answer the question, the sampling plan for each hypothesis (e.g. power analysis, where applicable), the specific statistical analysis/es that will be used to test the hypothesis, and a pre-specification of which outcomes will confirm or disconfirm the hypothesis (to varying degrees of strength where multiple analyses with different possible outcomes are used to interrogate one hypothesis). You can read our guidelines for Pre-registered Reports here: https://plos-marketing.s3.amazonaws.com/Marketing/Biology+Preregistered+Articles+Guidelines+for+Authors.pdf

In addition, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

Please re-submit your manuscript within two working days, i.e. by Oct 28 2021 11:59PM. Do let me know if you need more time or would like to discuss our decision.

Login to Editorial Manager here: https://www.editorialmanager.com/pbiology

During resubmission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF when you re-submit.

Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Gabriel

Gabriel Gasque

Senior Editor

PLOS Biology

ggasque@plos.org

Decision Letter 1

Gabriel Gasque

Dear Dr Leung,

Thank you for submitting your manuscript "Towards blinded classification of levels of consciousness: distinguishing wakefulness from general anesthesia in flies using a massive library of univariate time series analyses" for consideration as a Preregistered Research Article at PLOS Biology. Your manuscript has been evaluated by the PLOS Biology editors, by two academic editors with relevant expertise, and by four independent reviewers. Please accept my apologies for the long delay in sending the decision below to you.

In light of the reviews (below), we will not be able to accept the current version of the manuscript, but we would welcome re-submission of a much-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent for further evaluation by the reviewers. We expect your revision to gather enough support from the reviewers for us to consider eventual acceptance.

One academic editor also provided specific feedback (below), which you should also address.

We expect to receive your revised manuscript within 3 months.

Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please be thorough when addressing reviewer 1's comments, as the concerns raised about conceptual advance are particularly important. Please submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually, point by point.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.

*Re-submission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this re-submission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Gabriel Gasque

Senior Editor

PLOS Biology

ggasque@plos.org

*****************************************************

REVIEWS:

Academic editor: I would invite the authors to specify better how they will evaluate different effects on different electrodes. What would it mean if a feature significantly predicts the difference between wakefulness and anesthesia in one channel (either within or across flies), but that same feature does not for another channel? How are the authors going to arbitrate between whether these differences are due to methodological/statistical factors (e.g. different signal-to-noise ratios in the different channels), which may be less interesting, or whether they relate to meaningful differences in the brain structures that are sampled?

If, as anticipated, many neural measures may predict the difference between wakefulness and anesthesia (even for one channel), the authors have to think about an appropriate way to summarize those results in an intuitive manner for the reader. How are the authors planning to do so? Are there specific “types” of measures that are specifically informative? Is there a way to cluster time-series features (see a similar comment by reviewer 4) and link those features back to the existing literature on the known neural processes underlying the change from wakefulness to anesthesia?
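One simple way to picture the kind of summary the academic editor asks for is to group features whose values rise and fall together across recordings. The sketch below is illustrative only: the feature names, the toy values, the greedy grouping rule and the 0.9 correlation threshold are all assumptions, not the authors' method or part of hctsa.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def group_features(features, threshold=0.9):
    """Greedily group features by absolute correlation with a group's first member.

    `features` maps a feature name to its values across the same recordings.
    The threshold is an arbitrary illustrative choice.
    """
    groups = []
    for name, values in features.items():
        for group in groups:
            representative = features[group[0]]
            if abs(pearson(values, representative)) >= threshold:
                group.append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([name])
    return groups

# Hypothetical feature values over four recordings.
toy_features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [2.0, 4.0, 6.0, 8.5],   # nearly proportional to feature_a
    "feature_c": [4.0, 1.0, 3.0, 2.0],   # weakly related to feature_a
}
groups = group_features(toy_features)
```

Each resulting group could then be described by what its members measure, which is one possible route back to the existing literature.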

Reviewer #1: The neural mechanism of anesthesia action remains to be fully elucidated. Using previously recorded LFP signals of flies with/without anesthesia (Cohen et al. 2018 eNeuro), in addition to a newly collected dataset, Leung et al. aim to develop a protocol for classifying the depth of anesthesia based on LFP characteristics. Identifying quantitative alterations in LFP signals in the context of anesthesia is quite valuable to a broader neuroscience community beyond the immediate field of anesthetics research. However, based on the authors' study design, there are several conceptual and technical gaps that seem to preclude its potential utility to the broader scientific community. At the current stage of this report, this rather specialized study with a debatable conceptual leap in the definition of consciousness seems to be more suitable for a journal dedicated to computational neuroscience.

Major points:

1. It is not clear what (if any) broad biological insight this study by itself will provide. Previous studies by the van Swinderen group establish changes in LFP signals associated with a particular type of anesthetics, isoflurane. However, in this report, the researchers do not propose to compare the time series measures against established differences, nor do they seem to make an attempt to compare a certain subset of interesting features against established measures. Furthermore, as discussed in the manuscript, it is apparent that there will be hundreds of time series features that may be successfully extracted and generalized. What would be the essence of finding these many features? More importantly, it is not clear what identification of a particular set of features extracted from data obtained under a single experimental condition will tell us about either biology of anesthesia or a deeper meaning of "consciousness".

2. Dose-response effects. The authors propose to compare a single dose (0.6 % iso) against the no anesthesia condition. However, it is not clear how this dose was selected. Conceivably, a subset of LFP features could emerge at different "levels" of anesthesia. Using this study design, the authors could miss key features related to anesthetic induction, or may generalize effects which are specific to the selected dose. It would be important to explore/analyze the LFP during induction and recovery.

3. Variety of anesthetics. It is of the utmost importance for the authors to demonstrate that the time series features of LFP attributable to "consciousness" (or lack of it) are independent of the type of anesthetic. Otherwise, there will be no grounds for the proposal to equate the altered time-series features during anesthesia with an LFP component due to loss of consciousness.

4. The hctsa framework is not well-known to the general biology audience of the Journal. It would be helpful for the authors to better describe the library and to give specific examples of some univariate time-series features and spell out more concrete meaning and implications.

5. According to Fulcher et al., 2013, several questions can be raised regarding implementation.

1. Apart from line noise removal, was there any pre-processing/normalization of LFP signals? For several of the hctsa analyses, it would seem that pre-processing was necessary. For example, one would expect that the ST_propsimp function, which measures the proportion of positive vs. negative vs. zero values, would be highly sensitive to the DC value of the signal. This measure could be very useful for DC-subtracted data, but meaningless if the DC offset is included.

2. Many measures include hard-coded parameters, e.g. CO_autocorr uses tau = 1 … 40. Are these parameter values appropriate? One would expect the interpretation of lag to depend on sampling rate. A 40-point lag on a 20 kHz signal is very different from that on a 1000 Hz signal.

3. How many of these measures require a signal to be stationary for proper interpretation? Is there evidence that the LFP signals during the anesthesia epochs are stationary?

6. Study power & sample size. The authors should justify why a sample size of 13 flies is sufficient for this analysis. Similarly, the test sample size of 12 should be justified. (Both seem somewhat low)

7. False discovery rate. A broader discussion of the Benjamini-Hochberg procedure is warranted as many of the tests in hctsa are clearly not independent of one another. For example, CO_autocorr, t = 1 and CO_autocorr, t = 2 would be expected to yield similar results.
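The two implementation concerns in point 5 (sensitivity to DC offset, and the meaning of a fixed lag at different sampling rates) can be made concrete with a minimal sketch. The helper names and the toy sine signal below are hypothetical and are not hctsa functions; the sketch only illustrates the arithmetic behind the reviewer's remarks.

```python
import math

def proportion_positive(signal):
    """Fraction of samples strictly greater than zero."""
    return sum(1 for x in signal if x > 0) / len(signal)

def lag_in_ms(lag_samples, sampling_rate_hz):
    """Convert a lag expressed in samples into milliseconds."""
    return 1000.0 * lag_samples / sampling_rate_hz

# Toy signal (assumption): one full sine cycle sampled at 100 points,
# offset by half a sample so that no value is exactly zero.
n = 100
sine = [math.sin(2 * math.pi * (i + 0.5) / n) for i in range(n)]
shifted = [x + 2.0 for x in sine]  # same waveform with a DC offset of +2

prop_centered = proportion_positive(sine)     # half of the samples are positive
prop_shifted = proportion_positive(shifted)   # every sample is positive

# The same 40-sample lag spans very different durations.
lag_fast = lag_in_ms(40, 20000)  # 20 kHz recording
lag_slow = lag_in_ms(40, 1000)   # 1000 Hz recording
```

In this toy case the proportion of positive values moves from 0.5 to 1.0 purely because of the offset, and a 40-sample lag corresponds to 2 ms at 20 kHz but 40 ms at 1000 Hz.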

Reviewer #2: Stage 1 review

The importance of the research question(s).

There are various theories for the neural basis of consciousness which suggest various signatures and mechanisms, however, there is little agreement. I believe the research question posed by the authors, which in short is trying to understand the fundamental mechanisms of consciousness, is important, both from the perspective of scientific understanding but also because it has clinical implications.

The specific research questions, 'What univariate time-series features (from hctsa) can serve as markers of level of consciousness ACROSS individuals?' (and WITHIN), are both important questions which would help us both predict (assess clinical patients for example) or understand mechanisms.

The logic, rationale, and plausibility of the proposed hypotheses (does the manuscript provide a valid rationale for the proposed study, with clearly identified and justified research questions?)

The proposed hypotheses (testing each feature at each channel) are logical and rational. The authors present a data-driven approach to identify time-series features that are predictive above chance of consciousness. Until now, most research has been focussed on pre-selected features (either measures of spectral power, or within their own realm of expertise). Taking an indiscriminate approach removes the feature selection bias. The hypotheses are entirely plausible given that predetermined features have shown some efficacy previously.

The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis where appropriate). Is the protocol technically sound and planned in a manner that will lead to a meaningful outcome and allow testing of the stated hypotheses?

As made clear by the authors, their methodology introduces a multiple hypothesis problem which the authors will correct for with (i) FDR, (ii) leave-one-out testing, and (iii) independent evaluation. The protocol would lead to results that are statistically robust. Feasibility is clear with the existing hctsa package, pilot results and existing data.

Whether the clarity and degree of methodological detail is sufficient to exactly replicate the proposed experimental procedures and analysis pipeline.

Regarding experimental methodology - it seems clear to me, but I am not an experimentalist and therefore would rely upon other reviewers to check the clarity and detail of the experimental methodology.

The analysis methodology was detailed and clear, and could easily be replicated.

Whether the authors have pre-specified sufficient outcome-neutral tests for ensuring that the results obtained are able to test the stated hypotheses, including positive controls and quality checks.

The authors have clearly stated tests that provide outcomes. The authors use an independent classification set, whose importance becomes evident in the pilot results section.

Additional review notes.

The WITHIN fly analysis is quite an interesting approach. Out of curiosity, would WITHIN fly normalisation or WITHIN fly engineering of features not allow the authors to perform a classification task like that used with the ACROSS individual analysis? e.g. normalise values within a fly, instead of across the cohort. A couple of sentences on this might just clear this up for readers.

There are various methods for batch correction which could improve the generalisability of the ACROSS top features to the independent data set. Is this something the authors have looked into? It would be good to add a sentence or two mentioning this and why it is (or isn't) applicable.

Overall, I believe the results will be of interest to the community. Moreover, I think the methodological approach is also useful, both for extending to other analyses/scientific problems and for highlighting the importance of an independent evaluation dataset.

Reviewer #3, Jacobo Diego Sitt: I congratulate the authors for this very interesting work. The approach proposed is very interesting and will add an enormous exploration of time-series features for the study of consciousness.

My only methodological question refers to the FDR correction for multiple comparisons. Do the authors propose to apply multiple FDR corrections (across channels) for each feature independently or a unique FDR correction across features x channels? In my view given the number of comparisons, the latter is here the correct approach.
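The difference between the two options raised here can be sketched with a plain implementation of the Benjamini-Hochberg step-up procedure. The p-values and channel names below are invented for illustration, and the outcome shown is specific to this toy example; it is not a statement about the manuscript's data.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= q * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for four features at each of two channels.
p_by_channel = {
    "ch1": [0.001, 0.02, 0.03, 0.5],
    "ch2": [0.04, 0.2, 0.6, 0.9],
}

# Option 1: a separate correction within each channel.
per_channel_hits = sum(sum(benjamini_hochberg(p)) for p in p_by_channel.values())

# Option 2: a single correction across all features x channels.
pooled = [p for channel in p_by_channel.values() for p in channel]
pooled_hits = sum(benjamini_hochberg(pooled))
```

With these toy values the per-channel correction rejects three hypotheses while the pooled correction rejects one, which illustrates why the choice between the two should be stated explicitly.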

I would like to mention that the authors indicate that prediction generalization using independent datasets is limited in human data. I have to disagree with this statement. It is true that the first studies of biomarkers of consciousness relied on unique datasets to make predictions, but the current standard in the field is to use independent datasets to validate the proposed biomarkers, for example, see reference 14 in your manuscript for EEG or Demertzi Brain 2015 and Sci Adv 2019 for fMRI biomarkers.

Regarding the interpretation of the potential results, I would appreciate it if in the next stage the authors could address in more detail the potential limitations of the weaker form of generalization. In addition, it would be interesting to read the view of the authors of how the proposed approach fits the current discussion of the need for interpretability and explainability for machine learning / biomarkers for putative medical applications.

Reviewer #4, Tristan Bekinschtein: The aim to separate, in a data-driven manner, wakefulness (the wake state, either active or passive) from anaesthesia (the anaesthetised state) with neural data is, in principle, useful to understand the underlying signatures of each state and ultimately to further our understanding of consciousness from a neuroscientific perspective. I agree on the strength of using a completely new dataset for testing once trained on the original dataset, as the independence would allow us to trust the features obtained and imply generalization. I am more worried about the low amount of data compared to the number of features and how that is conceptualised. Finally I am asking for further interpretation at the neuroscience level once the data analysis is performed, to conceptually validate the results, as I believe that data science alone cannot define or interpret the result. I comment in more detail in the following paragraphs.

I am assuming that the statistical framework, including the permutation and relationship between epochs, electrodes, conditions and flies (test and training sets), is state of the art. Unfortunately I am not the right expert for that, although I can comment and hope for an explanation as to whether it is correct and valid to evaluate ~7000 features in a small set of data (12 flies, 15 electrodes and eight epochs). This is not a critique but a plea for an explanation on how to trust the results with such limited data resources. In particular I don't understand how FDR is applied in each channel for those ~7000 features and whether it gives you a specific cut-off that you trust.

If I understand correctly for the second analysis you give a direction of effect based on the median

"Because the direction of the effect of anesthesia is not necessarily the same across features and channels, we first assigned directionality labels based on the median wakeful and anesthesia values in the 13 discovery flies. For a given feature and channel, we gave a label of 1 if the median wakeful value was greater than for anesthesia, and -1 otherwise. We then multiplied feature values by these labels, flipping the direction of the effect of anesthesia when the median wakeful value is lesser than the median anesthesia value and making the analysis uniform across features and channels. Finally, we report the average proportion across all wakeful epochs and flies."

I wonder if removing the strictly binary label of 1 and -1 and leaving the normalized difference would give you a direction and also a normalized strength of effect that would yield results less jerky than the binarization and add more nuance and strength to the classifiers.
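The quoted labelling step and the suggested alternative can be sketched side by side. The binary label follows the quoted description (1 if the median wakeful value exceeds the median anesthesia value, -1 otherwise). The normalized difference is only one possible reading of the suggestion: dividing the median difference by the pooled standard deviation is an assumption made here, as are the function names and toy values.

```python
from statistics import median, pstdev

def direction_label(wake_values, anest_values):
    """Binary label as described in the quoted passage: 1 if median wake > median anesthesia, else -1."""
    return 1 if median(wake_values) > median(anest_values) else -1

def normalized_difference(wake_values, anest_values):
    """Signed effect size: median difference scaled by the pooled standard deviation (an assumed normalization)."""
    spread = pstdev(list(wake_values) + list(anest_values))
    if spread == 0:
        return 0.0  # no variation at all, so no measurable effect
    return (median(wake_values) - median(anest_values)) / spread

# Hypothetical values of one feature at one channel.
wake = [1.0, 2.0, 3.0]
anest = [4.0, 5.0, 6.0]

label = direction_label(wake, anest)

# Multiplying by the label makes "wake greater than anesthesia" the uniform direction.
flipped_wake = [label * v for v in wake]
flipped_anest = [label * v for v in anest]

effect = normalized_difference(wake, anest)
```

The sign of `effect` carries the same direction as `label`, while its magnitude additionally indicates how strongly the two conditions are separated.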

Further to this, it would be good from the beginning to describe some of the features, or families of features, that separate between conscious states in the original data sets and generalised to the 2 flies in the pilot analyses. With this number of features it is possible to colour or cluster features according to their characteristics or common mathematical effects, roughly saying what they measure. I see in the manuscript the examples of specific features, but I would like a paragraph on whether there is clustering of features (for figs 2c and 3c) and which of them, as I see in the colourful patterns in the other subfigures (2a and 3a).

Can we trust these 2 features singled out as maximal in figure 3c? Isn't it statistically fishy to have ~95% classification? Depending on the model used there are sometimes penalizing criteria to avoid overfitting classification. I wonder how the authors protect themselves from this, or if they have a different interpretation. This specific result also helps me ask for clusters, and how in that sea of blue dots some specific neural feature families are together and how that helps interpret the results beyond the simple data-driven aspect. This plea is for us to understand whether, once the results are given (for the discovery and the pilot), we can go back to neural science and away from data science and start to trust the data output as meaningful for the data we have (LFP) and its conditions (awake and anaesthetised).

This aim is clearly stated in the current RR and justified but I am unsure, while reading the introduction, how this is an RR since the preliminary results are already informing. I think there is the need to state clearly which questions and analyses belong to the RR early on in the manuscript by mentioning that these are different flies and how the analyses will be blinded.

A detail in the intro: it might be good to define what the authors mean by a univariate feature as, for example, they ascribed non-univariate status to Lempel-Ziv Complexity and I would imagine that since it can be calculated on a single electrode, it is not a multivariate or bivariate feature. This might seem a detail but it would be good to hear the classification of the features the authors have in mind and what the inclusion or exclusion criteria for them are.

In Page 17 "We obtained random-classification distributions for the discovery and pilot evaluation flies by repeatedly classifying discovery or evaluation epochs randomly, with equal probability (as there are 7702 potentially available features in hctsa, we repeated this random classification N = 7702 times to build the distribution)." The numbers of features can be corrected to the expected number taking into consideration the limitations described in "Not all of the available time-series features could be extracted successfully from our datasets. For example, the class of features derived from the hctsa function DN_CompareKSFit includes fits of the data to a beta distribution, which assumes values between 0 and 1, an assumption that is not fulfilled by our data and consequently returns missing (NaN) values. To filter out these cases, we excluded any feature which returned NaN across all time series for a given channel in the discovery flies. This reduced the set of features down to an average of 6860 features across the 15 channels (ranging from 6657 to 7004). We further excluded features which returned a constant value across all time series for a given channel in the discovery flies because they are uninformative for classification, reducing the set of features again to on average 6764 features across the 15 channels (ranging from 6560 to 6908)."
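The two exclusion rules quoted above (drop features that return NaN across all time series, then drop features that return a constant value) can be sketched as a small filter. The feature names and values are hypothetical, and how features with only some NaN values are treated is not stated in the quoted text; in this sketch they are kept, which is an assumption.

```python
import math

def filter_features(feature_table):
    """Keep features that are neither all-NaN nor constant across time series.

    `feature_table` maps a feature name to its values over all time series
    for one channel. Partially missing features are kept (assumption).
    """
    kept = {}
    for name, values in feature_table.items():
        if all(math.isnan(v) for v in values):
            continue  # returned NaN for every time series
        if all(v == values[0] for v in values):
            continue  # constant, so uninformative for classification
        kept[name] = values
    return kept

nan = float("nan")
# Hypothetical feature values over three time series at one channel.
table = {
    "feature_all_nan": [nan, nan, nan],
    "feature_constant": [0.0, 0.0, 0.0],
    "feature_informative": [0.1, 0.7, 0.3],
}
kept = filter_features(table)
n_valid = len(kept)
```

A count such as `n_valid`, computed per channel, is the kind of number the reviewer suggests using in place of 7702 when building the random-classification distribution.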

Decision Letter 2

Kris Dickson, PhD

Dear Dr Leung,

Thank you for your patience while we considered your revised manuscript "Towards blinded classification of levels of consciousness: distinguishing wakefulness from general anesthesia and sleep in flies using a massive library of univariate time series analyses" for consideration as a Preregistered Research Article at PLOS Biology. Your revised study has now been evaluated by the PLOS Biology editors, the two Academic Editors and the original reviewers.

In light of the reviews, which you will find at the end of this email, we are pleased to offer you the opportunity to address the remaining concerns raised by Reviewer 1 in a revision that we anticipate should not take you very long. We will then assess your revised manuscript and your response to the reviewers' comments and consult with Reviewer 1 to determine whether they are satisfied with these final revisions.

We expect to receive your revised manuscript within 1 month. Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension.

At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we withdraw the manuscript.

**IMPORTANT - SUBMITTING YOUR REVISION**

Your revisions should address the specific points made by each reviewer. Please also submit the following files along with your revised manuscript:

1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.

*NOTE: In your point-by-point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually.

You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Revised Article with Changes Highlighted" file type.

*Resubmission Checklist*

When you are ready to resubmit your revised manuscript, please refer to this resubmission checklist: https://plos.io/Biology_Checklist

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.).

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in any graphs/heat maps as they are essential for reviewers and readers to assess your work.

NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

Please also ensure that FIGURE LEGENDS in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend. (This step is often forgotten!!!)

Finally, please ensure that your Data Statement in the submission system accurately describes where your data can be found.

For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Kris

Kris Dickson, Ph.D. (she/her)

Neurosciences Senior Editor/Section Manager

PLOS Biology

kdickson@plos.org

----------------------------------------------------------------

REVIEWS:

Reviewer's Responses to Questions

Do you want your identity to be public for this peer review?

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Jacobo Diego SITT

Reviewer #1: We appreciate the authors' genuine effort and the substantial improvements in the manuscript that have been made since the last review. It is recognized that collecting a robust data set within various readily available sources would enable the authors to draw more convincing conclusions. The inclusion of a wake/sleep data set thus could potentially improve the impact of the paper, provided that the experimental conditions are well defined and the sample size is sufficient for strong inferences. We have a few concerns and suggestions to further strengthen this study:

I. How is sleep defined in the data set?

Since the recording was performed on mounted flies, sleep indicators based on conventional locomotion activity criteria will be very limited and need additional detectable indicators. This should be spelled out explicitly.

Recent work by the Van Swinderen and Allada Labs (van Alphen et al., 2021, Science Advances) has indicated different sleep states in flies correlated with proboscis pumping. Does the data set have enough flies to resolve differences in sleep state? A clear-cut result would be a desirable contribution to the field.

II. Depth of anesthesia.

Adding the new data set for a different concentration may reveal the commonality of anesthetic effect and enable a comparison to see the overlap and distinction between sleep (states) and depth of anesthesia. Drawing conclusions on "anesthesia" effects on LFP features from a single drug remains a severe limitation for scientific inference. If appropriately treated, this study may become a significant contribution to the field. The authors should strive for clearly accessible results based on high-quality data sets from more than one anesthetic compound.

III. Are the findings of HCTSA parameters sex-specific?

Given the widespread recognition of sex-based differences, the authors should report the sex of the recorded flies, and if possible describe any sex-based differences found in recordings from males and females under wake/sleep/anesthetic states. It is well known that male and female flies display different circadian sleep-wake cycle patterns. An analysis of sex differences would be practically useful to others in the field, and could help identify issues/artifacts originating from electrode positioning (female fly heads are a bit larger and have more facets in their compound eyes).

IV. What is meant by "conscious levels"?

A clear operational definition of "consciousness" is required. What is meant by "conscious arousal"? It can be argued that the work is more an investigation into different sorts of "un-consciousness", and "brain states" or "arousal states" would seem to be more appropriate terms for describing how sleep and anesthesia induce changes in LFP features associated with different categories of "un-consciousness". The title of the manuscript needs to be modified to reflect this point.

V. Number of independent samples, feature robustness, reproducibility, calibration of analytical tools

We are still concerned that the number of independent LFP recordings from different flies is too low to draw the intended conclusions. It is strongly encouraged to include additional data sets (new recordings or previously published) to enhance the validity of the conclusions. This is well within the reach of this established group with recognized expertise in this area.

This will help the authors fulfill a major goal: to examine and estimate how robust each (or at least some) of the various features extracted from the signals (time series) in the data set is, in order to identify the most useful indicators:

which ones are more tolerant of (or more sensitive to) biological or experimental variability, i.e. which ones stay invariant across different biological samples and/or experimental conditions (such as electrode track precision; biological variation among individual organisms).

We were surprised to see that the original description of the multi-electrode approach was not cited (Paulk et al., 2013, J Neurophysiol). The authors should also consider using raw data of recordings from this work to improve the robustness of the analysis. Additionally, Paulk et al have a description of the variability of the electrode insertion tracks. The authors should discuss their HCTSA findings with respect to the variable electrode insertion described in Paulk.

Instead of a single epoch, analyzing additional epochs or time segments from each individual can better demonstrate certain time-invariant stationary features, uncovering useful time series features and robust parameters.
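The multi-epoch suggestion above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the helper names (`split_epochs`, `feature_per_epoch`), the epoch length, and the use of variance as the feature are assumptions for illustration, and the signal is synthetic noise rather than an LFP recording.

```python
import numpy as np

def split_epochs(x, epoch_len):
    """Split a 1-D series into non-overlapping epochs, dropping any remainder."""
    x = np.asarray(x, dtype=float)
    n_epochs = len(x) // epoch_len
    return x[: n_epochs * epoch_len].reshape(n_epochs, epoch_len)

def feature_per_epoch(x, epoch_len, feature):
    """Evaluate one scalar feature on every epoch of the series."""
    return np.array([feature(epoch) for epoch in split_epochs(x, epoch_len)])

# Synthetic stand-in for one recording (NOT real data).
rng = np.random.default_rng(0)
signal = rng.standard_normal(10_500)

# Hypothetical choice: variance as the feature, 1000-sample epochs.
per_epoch_var = feature_per_epoch(signal, 1000, np.var)

# Relative spread across epochs: small values suggest the feature is stable over time.
spread = per_epoch_var.std() / per_epoch_var.mean()
```

A feature whose relative spread across epochs is small compared with its difference between conditions would be a candidate for the kind of time-invariant indicator the reviewer describes.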

Some simple treatments that are more accessible to the general readership of PLOS Biology may be presented and examined first, before attempting to interpret the overall picture from a large number of time series features. For example, in Fig R1.3, it could readily be seen whether autocorrelation features can distinguish sleep, awake and anesthesia states if each state shows its strongest autocorrelation values at a distinct range of time lags.
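As a rough illustration of the autocorrelation comparison suggested above, the following Python sketch computes the sample autocorrelation at a few lags for two synthetic signals. The function names, the chosen lags, and the signals themselves are assumptions made for illustration; they stand in for recordings from different states and are not taken from the study.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a non-negative integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if lag == 0:
        return 1.0
    if denom == 0 or lag >= len(x):
        return 0.0
    return float(np.dot(x[:-lag], x[lag:]) / denom)

def autocorr_profile(x, lags):
    """Autocorrelation at each requested lag, keyed by lag."""
    return {lag: autocorr(x, lag) for lag in lags}

# Synthetic stand-ins for two conditions (NOT real LFP data).
rng = np.random.default_rng(0)
white = rng.standard_normal(2000)                             # little temporal structure
smooth = np.convolve(white, np.ones(20) / 20, mode="same")    # strong short-lag structure

lags = [1, 5, 10]
profile_white = autocorr_profile(white, lags)
profile_smooth = autocorr_profile(smooth, lags)
```

Plotting such profiles per state would show directly whether the states separate at particular lags, before moving to the full feature library.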

Many of these points can be empirically determined by the authors with suitable analysis and additional data sets. Comparing the current with the new data sets can also demonstrate the reproducibility and strengthen the conclusions.

Reviewer #2:

The addition of more evaluation flies + a more detailed description of the spatial effects has boosted the proposed analysis.

Personally, the capability to accurately predict/distinguish is what interests me. However, I agree with Reviewer #1, in that whilst I can see what biological insights might come from the analyses, e.g. spatial locations associated with consciousness, I do find it difficult to see how any mechanism could be inferred from some identified time-series feature. For example, if we find a particular time-series feature that is highly predictive, how can this be linked to an underlying neurological mechanism? Of course, this extends more broadly to even traditional neuroscientific features (such as band power), and as such, I don't think it's a reason to hold this analysis back. Interpretation of the top predictive hctsa features and how they could be generated would lead to interesting insights into underlying mechanisms.

I have no further revision requests.

Reviewer #3: All my comments are properly addressed.

Decision Letter 3

Kris Dickson, PhD

Dear Dr Leung,

Thank you for your patience while your Pre-Registered Research Article "Towards blinded classification of loss of consciousness: distinguishing wakefulness from general anesthesia and sleep in flies using a massive library of univariate time series analyses" was re-reviewed for PLOS Biology. Your Stage 1 manuscript has been evaluated by the PLOS Biology editors, two Academic Editors with relevant expertise, and by Reviewer 1 in this last round.

Reviewer 1 is now also happy with your Stage 1 Protocol. All of the reviewers are now satisfied that your Stage 1 Protocol meets our criteria for importance of research question and technical soundness of the study proposal. We are thus happy to issue a Stage 1 'in-principle acceptance' decision, with a commitment to publish the final Stage 2 Preregistered Research Article (after revision, if needed), pending successful completion of the study.

**Please carefully read all the following information. There are steps below that are required to complete to begin Stage 2.**

The study should now be completed according to the Stage 1 approved methods and analytic procedures, and the final manuscript should include an evidence-based interpretation of the results. Please see the review criteria for Stage 2 manuscripts here:

https://journals.plos.org/plosbiology/s/reviewer-guidelines#loc-reviewing-preregistered-research-articles

Subsequent editorial decisions for this study will not be based on the perceived importance or novelty of the results obtained during the data gathering and analysis phase of the work. It is critical however that you adhere to the approved Stage 1 study design when performing the study. Any deviation from these experimental procedures would need to be justified and approved by the editors (and potentially the reviewers), as otherwise it could lead to rejection of the manuscript at Stage 2. Please consult the editors immediately for advice if you need to alter this approved study plan.

**IMPORTANT**: Please follow the link below for important information regarding the Stage 2 manuscript template and review criteria. Please carefully read the guidelines on Stage 2 data collection BEFORE performing your study and completing your Stage 2 manuscript.

AUTHOR GUIDELINES: https://genweb.plos.org/Marketing/Biology%20Preregistered%20Articles%20Guidelines%20for%20Authors.pdf

*Depositing this Stage 1 Protocol*

PLOS Biology does not publish Stage 1 Protocols immediately following an in-principle acceptance. Instead they are held and integrated into a single, completed 'Preregistered Research Article' following review and acceptance of the final Stage 2 manuscript. You are however required to register this approved Stage 1 Protocol with the Center for Open Science (https://cos.io/prereg/) or another recognised repository. This may be done publicly or under private embargo until submission of the Stage 2 manuscript. Stage 1 Protocols can be quickly and easily registered using a tailored mechanism for Registered Reports (https://osf.io/rr/). Please do this now. You will need to include the URL to this deposited protocol in your Stage 2 manuscript.

*Timeline*

We understand that carrying out the study will require a significant length of time and are willing to allow you [***1 year - EDITOR TO EDIT BASED ON THE PROPOSED TIMELINE PROVIDED IN THE STAGE 1 MS. MAKE SURE TO UPDATE THE EM REVISION DATE BEFORE SENDING THIS LETTER**] to perform the study. Please email us at plosbiology@plos.org to discuss this if you have any questions or concerns, or to discuss an alternate timeline.

At this stage, your manuscript remains formally under active consideration at our journal. Please notify us by email if you do not wish to submit a Stage 2 manuscript or wish to pursue publication elsewhere, so that we may withdraw your manuscript.

*Resubmission Checklist*

Before submitting the Stage 2 manuscript, please review the following resubmission checklist: https://plos.io/Biology_Checklist

Please note that for PRA stage 2, the response to reviewers file does not follow the standard format, but should rather be a document for the reviewers detailing the changes made to the manuscript since the stage 1 accept.

To submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Please make sure to read the following important policies and guidelines while preparing your revision:

*Published Peer Review*

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*PLOS Data Policy*

Please note that as a condition of publication, PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. Please note that for this article type, the raw data itself should be archived and made freely available in a public repository rather than submitted as supplementary material. Please make sure to read the Stage 2 submission guidelines online regarding how this data should be annotated and appropriately time stamped to show that data was collected after this Stage 1 in-principle acceptance and not before.

*Blot and Gel Data Policy*

We require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that, if applicable, you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods

Thank you again for your submission to PLOS Biology. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Kris

Kris Dickson, Ph.D. (she/her)

Neurosciences Senior Editor/Section Manager

PLOS Biology

kdickson@plos.org

--------------------------------------

REVIEWS:

Reviewer #1: The authors have carefully and thoughtfully responded to the comments, and the issues raised in the previous review are either satisfactorily resolved or will be treated in follow-up studies. The critical information on experimental methods, data analysis, and interpretation of results has now been properly revised or explicitly qualified, which has greatly improved the precision and rigor of the presentation. As proposed, a comprehensive analysis can potentially fill in some important gaps in our understanding of the functional manifestation of different "conscious levels", and a thorough treatment may identify novel time series parameters and become a valuable contribution to the literature.

Decision Letter 4

Christian Schnell, PhD

Dear Angus,

Thank you for your patience while we considered your revised manuscript "Towards blinded classification of loss of consciousness: distinguishing wakefulness from general anesthesia and sleep in flies using a massive library of univariate time series analyses" for publication as a Preregistered Research Article at PLOS Biology. Apologies for the long delay in getting back to you. As I mentioned previously, we had unexpected difficulties in finding suitable reviewers as not all of the original reviewers of your Stage 1 submission were available to re-review the revised manuscript. In any case, your revised manuscript has been evaluated by the PLOS Biology editors, the academic editors and two of the original reviewers.

Based on the reviews and on our academic editors' assessment of your revision, we are likely to accept this manuscript for publication, provided you satisfactorily address the remaining points raised by the reviewers and one comment from one of the academic editors. Please also make sure to address the following data and other policy-related requests:

* We would like to suggest a different title to improve its accessibility for our broad audience:

Wakefulness can be distinguished from general anesthesia and sleep in flies using a massive library of univariate time series analyses

* Please add the links to the funding agencies in the Financial Disclosure statement in the manuscript details.

* DATA POLICY:

You may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797

Note that we do not require all raw data. Rather, we ask that all individual quantitative observations that underlie the data summarized in the figures and results of your paper be made available in one of the following forms:

1) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).

2) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.

Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels as they are essential for readers to assess your analysis and to reproduce it: 1CD and S3ABC.

NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).

Please also ensure that figure legends in your manuscript include information on where the underlying data can be found, and ensure your supplemental data file/s has a legend.

Please ensure that your Data Statement in the submission system accurately describes where your data can be found.

* CODE POLICY

Per journal policy, if you have generated any custom code during the course of this investigation, please make it available without restrictions. Please ensure that the code is sufficiently well documented and reusable, and that your Data Statement in the Editorial Manager submission system accurately describes where your code can be found.

Please note that we cannot accept sole deposition of code in GitHub, as this could be changed after publication. However, you can archive this version of your publicly available GitHub code to Zenodo. Once you do this, it will generate a DOI number, which you will need to provide in the Data Accessibility Statement (you are welcome to also provide the GitHub access information). See the process for doing this here: https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content

* Because you looked for generalization across different states (anesthesia/sleep), the underlying assumption is that anesthesia and sleep are caused by the same changes in the neural circuit, as compared to wakefulness. This is a very strong assumption and probably causes a lack of generalization of many univariate time series measures. Some measures may generalize from the discovery set to the test set for anesthesia but not for sleep, for example. This issue should be addressed more in the discussion section of the manuscript.

As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.

In addition to these revisions, you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests shortly.

We expect to receive your revised manuscript within two weeks.

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include the following:

- a cover letter that should detail your responses to any editorial requests, if applicable, and whether changes have been made to the reference list

- a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable; if not applicable, please do not delete your existing 'Response to Reviewers' file)

- a track-changes file indicating any changes that you have made to the manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://plos.org/published-peer-review-history/

*Press*

Should you, your institution's press office or the journal office choose to press release your paper, please ensure you have opted out of Early Article Posting on the submission form. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please do not hesitate to contact me should you have any questions.

Sincerely,

Christian

Christian Schnell, PhD

Senior Editor

cschnell@plos.org

PLOS Biology

------------------------------------------------------------------------

Reviewer remarks:

Reviewer #3: In this Stage 2 version of the manuscript, the authors present their final version of the study.

Overall, the new version follows the Stage 1 rationale, and the authors adhered precisely to the approved Stage 1 experimental procedure. The modifications highlighted in red indicate areas where the authors have updated their methods, but these changes do not fundamentally alter the original hypotheses or approach. The newly analyzed data (additional flies) robustly test the original hypotheses. However, some potential experimental differences (such as acquisition conditions or preprocessing procedures) exist between the Stage 1 and Stage 2 datasets, leading to shifts in data distribution that the authors report but have not fully understood.

In general, the discussion is supported by the results provided in the paper. However, I have two important comments that the authors need to address:

1) In line 959, the authors state: "Across-subject classification using single, specific features—which is ideal for a consciousness measure—is rarely performed, especially in the human neuroimaging literature. In the rare case where it is carried out, significant classification performance is not obtained, likely because only traditional spectral power or complexity measures are used."

I strongly disagree with this argument. Cross-validation and independent validation of neuroimaging biomarkers across subjects using single (and multivariate) features is a well-established standard in the field, particularly in clinical studies. For example, reference 14 in the manuscript employs a series of univariate features, which are cross-validated in one dataset and generalized to two independent validation datasets. This methodological framework has become a standard approach in the field. The authors should either revise their claim or provide substantial evidence supporting their position.

2) The authors do not sufficiently discuss the limitations of using sleep as a model for unconsciousness. Recent research demonstrates that human subjects can respond to commands across different sleep stages, which challenges the validity of sleep as a complete model of unconsciousness (Türker et al., Nature Neuroscience, 2024). In light of this, the reported lack of generalization across all sets of evaluation flies should be discussed within the framework of using sleep as an equally valid or limited model of unconsciousness. The authors should expand their discussion to address this nuance.
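The across-subject validation of a single feature discussed in comment 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the method used in the manuscript or in reference 14: the midpoint-threshold classifier, the function names, and the synthetic data (5 hypothetical subjects, 10 epochs per class) are all chosen for illustration only.

```python
import numpy as np

def fit_threshold(values, labels):
    """Return the midpoint between class means and the direction of the effect."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    mean0 = values[labels == 0].mean()
    mean1 = values[labels == 1].mean()
    direction = 1 if mean1 >= mean0 else -1
    return (mean0 + mean1) / 2.0, direction

def predict(values, threshold, direction):
    """Assign class 1 to values on the class-1 side of the threshold."""
    values = np.asarray(values, dtype=float)
    return (direction * (values - threshold) > 0).astype(int)

def leave_one_subject_out(values, labels, subjects):
    """Train on all subjects but one, test on the held-out subject, average accuracy."""
    values, labels, subjects = map(np.asarray, (values, labels, subjects))
    accuracies = []
    for s in np.unique(subjects):
        test = subjects == s
        thr, direction = fit_threshold(values[~test], labels[~test])
        correct = predict(values[test], thr, direction) == labels[test]
        accuracies.append(float(correct.mean()))
    return float(np.mean(accuracies))

# Synthetic single-feature data (NOT real recordings): class 1 is shifted by 3 SD.
rng = np.random.default_rng(1)
subjects = np.repeat(np.arange(5), 20)
labels = np.tile(np.repeat([0, 1], 10), 5)
values = rng.standard_normal(100) + 3.0 * labels

accuracy = leave_one_subject_out(values, labels, subjects)
```

Because the threshold is always fitted without the held-out subject, the resulting accuracy reflects generalization to unseen individuals rather than within-subject fit, which is the distinction at issue in the comment.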

Finally, the data availability statement complies with PLOS' requirements. The authors have made preprocessed data publicly accessible in repositories such as OSF and Figshare, with appropriate timestamps confirming data collection after Stage 1 approval.

Addressing these concerns will improve the manuscript's clarity and alignment with current standards in the field.

Reviewer #4 (Tristan Bekinschtein): After the changes to the pre-registered report and the further changes and clarifications during the analysis submission process, I think this is ready for the editor's consideration. As a reviewer, I was happy throughout the process and engaged in scientific conversation at every stage. Thanks.

Decision Letter 5

Christian Schnell, PhD

Dear Angus,

Thank you for the submission of your revised Preregistered Research Article "Wakefulness can be distinguished from general anesthesia and sleep in flies using a massive library of univariate time series analyses" for publication in PLOS Biology. On behalf of my colleagues and the academic editors, Christopher Chambers and Simon van Gaal, I am pleased to say that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

While you attend to those requests to come, please also make sure to mention in the legend of Figure 1 where the source data can be found.

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study. 

Sincerely, 

Christian

Christian Schnell, PhD

Senior Editor

PLOS Biology

cschnell@plos.org

Associated Data


    Supplementary Materials

    S1 Text. Correlations in feature values between discovery and evaluation flies with and without offsetting evaluation fly channels.

    Supplementary figures.

    (PDF)

    pbio.3003217.s001.pdf (716.3KB, pdf)
    Attachment

    Submitted filename: Response to Reviewer Comments.pdf

    pbio.3003217.s004.pdf (3.1MB, pdf)
    Attachment

    Submitted filename: Response to reviewer comments revision2.pdf

    pbio.3003217.s005.pdf (63.7KB, pdf)

    Data Availability Statement

    Pre-processed data from the discovery flies are available on Figshare: https://doi.org/10.26180/5ebe420ae8d89

    Pre-processed data from the pilot evaluation flies and blinding procedure are available on OSF: https://osf.io/bq5ry/?view_only=3789097395c1419db2a9eb615bc1effe

    Pre-processed data, hctsa values, and classification performances and within-fly effect direction consistencies for all the flies are also available on OSF: https://osf.io/8wvsq/?view_only=8a056d1c573b4f23a6cf6cea8b976ddb

    Analysis codes are available on Zenodo: https://doi.org/10.5281/zenodo.15370576


    Articles from PLOS Biology are provided here courtesy of PLOS
