Scientific Reports
2024 Dec 30;14:32041. doi: 10.1038/s41598-024-83706-8

Selective learning for sensing using shift-invariant spectrally stable undersampled networks

Ankur Verma 1, Ayush Goyal 2, Sanjay Sarma 3, Soundar Kumara 1
PMCID: PMC11686352  PMID: 39738646

Abstract

The amount of data collected for sensing tasks in scientific computing is based on the Shannon-Nyquist sampling theorem, proposed in the 1940s. Sensor data generation will surpass 73 trillion GB by 2025 as we increase the high-fidelity digitization of the physical world, and skyrocketing data infrastructure costs, along with the time needed to maintain and compute on all this data, are increasingly common concerns. To address this, we introduce a selective learning approach, in which the amount of data collected is problem dependent. We develop novel shift-invariant and spectrally stable neural networks to solve real-time sensing problems formulated as classification or regression problems. We demonstrate that (i) less data can be collected while preserving information, and (ii) test accuracy improves with data augmentation (the size of the training data) rather than with collecting more than a certain fraction of the raw data, unlike information-theoretic approaches. While sampling occurs at Nyquist rates, not every data point has to be resolved at Nyquist, and the network learns the amount of data to be collected. This has significant implications (orders-of-magnitude reductions) for the amount of data collected and the computation, power, time, bandwidth, and latency required for embedded applications ranging from the low-Earth-orbit economy to unmanned underwater vehicles.

Subject terms: Aerospace engineering, Electrical and electronic engineering, Mechanical engineering

Introduction

The Shannon-Nyquist sampling theorem, developed in the 1940s, is foundational to contemporary scientific computing techniques. It states that the sampling rate must be at least twice the bandwidth of the signal to avoid loss of information. Overlaid with the fact that sensor data generation will surpass 73 trillion GB by 2025, this implies significant data infrastructure costs and time involved in processing all these data [1,2]. New units of measurement, namely ronna (10^27) and quetta (10^30), were recently introduced to account for the impending data deluge [3]. This presents both opportunities and challenges for the future of data-driven decision making in the physical world [4-9]. Researchers have previously developed methods for collecting less data while ensuring signal reconstruction. They have also established that real-world time-series and image sensor data are redundant, and that signal information can be preserved while collecting less data upfront [10-16]. In biology, recent research shows that low-dimensional embeddings learnt from low-dimensional data can be used to solve a variety of problems in omics and therapeutics [17-20]. Other approaches include photonic integrated circuits, sparsifying algorithms, and hardware accelerators for neural networks [21-23]. Recently, attention-based mechanisms have been proposed to map spatio-temporal relationships between data at different scales [24-26]. To deal with vast amounts of sensor data, we draw inspiration from the way the human perception system works. With very limited storage and processing power, the human brain performs very well on different perception tasks such as vision, hearing, touch, taste, and smell. This raises the question of whether machine sensing and human perception operate differently, and whether machine sensing can be improved by studying human perception. We mention some prior seminal literature to motivate the importance of our research question.
Active focus in human perception suggests problem-dependent sensing: gathering only the necessary information to conserve energy, memory, and processing resources in the brain [27-29]. This phenomenon has been described as a selective attention mechanism that governs our ability to remember what we have seen [27]. Simons et al. stated that without attention, we may not even perceive objects (known as inattentional blindness) [30]. Furthermore, power laws are observed in the human visual cortex: most neurons respond to the same stimuli, and additional stimuli activate exponentially fewer cells [31]. The question we ask here is: can we formulate the 'human approach' into a learnable problem? More rigorously, can we preserve the frequency and temporal information by preserving the relative positions of data points? This is possible in techniques like compressed sensing because they exploit the sparsity of data to model sparse approximation problems using convex polytopes, resulting in precise undersampling theorems. Compressed sensing is a model-based approach that relies on sparse representation in a predefined basis such as the Fourier basis. Solving an iterative recovery problem makes it computationally expensive and slow, and limits its applicability to specific types of data with an underlying sparse structure in a predefined basis. Can we learn through a data-driven approach instead of a model-driven one (compressed sensing)? We address the problem purely from a data-driven perspective. LeCun et al., in their 1998 Convolutional Neural Networks (CNN) paper, underscored the importance of preserving relative rather than precise positions of each extracted feature, while highlighting that using precise locations is potentially harmful due to positional variations across instances [32]. Fukushima et al., in their 1982 paper, proposed the Neocognitron, a shift- and distortion-invariant technique that leveraged the idea that only maximum-output cells have their input interconnections reinforced [33]. Recently, neural scaling or power laws have been used to describe the improvement in test accuracy with larger training data sets [34,35].

We motivate our approach using a simple example posed as an exercise for the reader. Consider Fig. 1A, B, C, and D sequentially. Suppose we ask: what is Fig. 1A? What is Fig. 1B? After observing Fig. 1C, do you find some new insight that was previously absent in Fig. 1A and B? Fig. 1C helps us infer that the data fit a sine wave, which Fig. 1D confirms. Our motivation is illustrated in Fig. 1A-C: the human vision system requires a certain amount of information to make sense of a signal. Figure 1D shows a periodic sine wave with a frequency of 10 Hz. This discussion forms our motivation.

Fig. 1. Visual motivation. A, B, C, and D are all undersampled sine waves with different amounts of undersampling.

Problem definition

To address the increasing data infrastructure costs and time incurred in processing sensor data, we propose a selective (undersampled) learning approach to sensing using Shift-Invariant and spectrally stable Undersampled Network (SIUN) to solve classification and regression problems with much less data, compared to the state-of-the-art techniques. SIUN is based on the foundational idea that relative positions of real-world sensor data are sufficient to train learnable models by exploiting the inherent redundancy in these data. Collecting more than a certain fraction of raw data does not significantly improve performance.

To develop the SIUN architecture, we introduce novel architectural ideas, analogous to those of CNNs. CNNs combine three key architectural ideas to achieve shift, scale, and distortion invariance: (i) local receptive fields, (ii) shared weights, and (iii) spatial or temporal subsampling [32]. For 1-D sensor data analysis, we need to ensure (i) shift invariance and (ii) spectral stability. Analogously, SIUN introduces the following architectural ideas to preserve these, respectively: (i) windowing to preserve local information, and (ii) random seed-based sampling on each window, with the data sampled at the Nyquist frequency. SIUN converts real-time sensing into learning problems such as classification or regression, making it applicable to most sensor data analysis tasks. We use random seed-based sampling at the Nyquist rate to preserve signal amplitudes at specific instants of time, as shown in Fig. 2. This forms the basis for separating one function from another and one signature from another. Our approach is inspired by the human vision system's ability to infer a periodic structure from an undersampled monotone sine wave with far less data (20-30% of the data, as shown in Fig. 1B,C) than contemporary approaches require (100% of the data). We aim to improve upon this ability using a learnable function, as shown in Fig. 2, and to capture signatures of physical phenomena more effectively. We use neural networks for modelling because they (i) give a general, differentiable, learnable mapping that can be fine-tuned for different tasks (such as classification and regression) by changing the loss function and hyperparameters, (ii) keep improving with larger datasets, and (iii) rest on architectural ideas that can be extended to other sensing modalities, such as images and videos, which also contain redundant data.
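The two architectural ideas above, windowing and seed-based random undersampling of each window, can be sketched as follows. This is a minimal illustration under our own assumptions (the function name, the fixed keep-fraction, and the per-window resampling scheme are ours), not the authors' released implementation.

```python
import numpy as np

def undersample_windows(signal, window_len, frac, seed=0):
    """Split a Nyquist-sampled 1-D signal into non-overlapping windows and
    keep a random subset of points in each window, retaining their relative
    positions (the information SIUN learns from)."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible sampling
    n_windows = len(signal) // window_len
    n_keep = int(frac * window_len)
    samples, positions = [], []
    for w in range(n_windows):
        # Sorted indices preserve the relative order of the kept points.
        idx = np.sort(rng.choice(window_len, size=n_keep, replace=False))
        window = signal[w * window_len:(w + 1) * window_len]
        samples.append(window[idx])
        positions.append(idx)  # relative positions inside the window
    return np.array(samples), np.array(positions)
```

The amplitude-position pairs returned per window would then feed the learnable model; the keep-fraction `frac` is the quantity the selective-learning experiments sweep.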

Fig. 2. Architecture of the shift-invariant and spectrally stable undersampled neural network (SIUN).

The architecture of the SIUN is shown in Fig. 2, highlighting the implementation of undersampling, positional encoding, and the feed-forward block on different problems such as classification, anomaly detection, transfer learning, and regression. The weights and biases are learned by backpropagating errors through appropriate loss functions: cross-entropy loss for classification, and an L1-based loss (L1)^(1/x), with x = 2^n for a positive integer n, for regression. The Adam optimizer was used, with a Tanh activation function for classification and a Leaky ReLU activation function for regression.
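As a minimal sketch (our own NumPy simplification, not the authors' code), the classification variant of the feed-forward block, Tanh hidden layers followed by a softmax over class logits, might look like:

```python
import numpy as np

def siun_forward(x, weights, biases):
    """Forward pass of the feed-forward block: Tanh hidden layers
    (classification variant), softmax over the class logits."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)                     # hidden layers
    logits = a @ weights[-1] + biases[-1]          # linear output layer
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)       # class probabilities
```

Training would backpropagate a cross-entropy loss through these layers with Adam, as described above; the regression variant would swap Tanh for Leaky ReLU and the softmax for a linear output.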

Experimental setup

We use the SIUN to solve classification and regression problems on sensor data, with results categorized into these two areas. Extensive experiments are performed on five datasets: four for classification and one for regression. The first set of experiments is conducted on the Case Western Reserve University (CWRU) bearing fault dataset [36]. The 12 kHz sampling frequency, drive-end dataset is used in this study, in line with previous literature [37,38]. Three different faults (ball defect, inner race defect, and outer race defect centered at 6:00) at 3 fault diameters (0.007 in, 0.014 in, 0.021 in) are considered as 9 faulty classes. This results in a 10-class classification problem with 1 normal class and 9 faulty classes. The first 90,000 data points of each of the 10 classes are used for training and the next 30,000 data points for testing, giving a 75-25 train-test split. A window length of 2000 is used, with overlapping windows creating 1x, 10x, 20x, and 40x data augmentation in the training set. The experiments are repeated four times. The second set of experiments is performed on the UCI HAR dataset [39]. This dataset comprises human activity recognition data collected from 30 subjects performing daily activities of living, including walking, walking upstairs, walking downstairs, sitting, standing, and laying. The UCI HAR dataset consists of separate train and test folders of sizes 192 and 77 MB, respectively. The sampling frequency of the channels in this dataset is 50 Hz, and windows of length 2.4 s are used, creating windows of 120 data points each. Overlapping windows are used to create 1x, 2x, 5x, and 10x data augmentation in the train set. The experiments are repeated ten times. The third set of experiments is conducted on the PAMAP2 (Physical Activity Monitoring) dataset [40], as previously done in the literature [41]. It consists of 18 classes of activities (including 6 optional activities) performed by 9 subjects.
The dataset has 54 channels of data, of which 51 are used in this work to evaluate SIUN performance. The sampling frequency of the channels in this dataset is 100 Hz, and windows of length 1 s are used, creating windows of 100 data points each. In this dataset, windowing is done prior to the 70-30 split, and no overlapping windows are used. This method is used for this dataset and not the others because it captures the activities of each individual sequentially: if we split the dataset 70-30 before windowing, some classes would be shown only to the train set and others only to the test set. Overlapping windows are not used even in the train set, to avoid data leakage into the test set. The experiments are repeated three times. The fourth set of experiments is performed on a machinery fault simulator dataset, for which the experimentation was performed and the data collected by the first author. A shaft unbalance was created by adding a weight of 11 g on the machinery fault simulator to study the normal and unbalanced fault signatures. Each experiment comprises 20 s of data sampled at 4096 Hz for 2 classes (normal and shaft unbalance), amounting to a total of 4 MB of data for training the network per experiment. A window length of 400 is used, with overlapping windows creating 1x, 5x, 10x, and 20x data augmentation in the train set. The experiments are repeated five times. A 75-25 train-test split is used in this study. For the regression problem, we conducted 12 experiments on composite frequency signals having 2-5 frequencies in a broadband range of 20 Hz-20 kHz. SIUN was trained on simulated data comprising 1-5 frequencies plus noise in the range of 20 Hz-20 kHz at a sampling frequency of 44.1 kHz. It was tested on (i) the test set of the simulated data (Fig. 3A), and (ii) real-world sound and vibration calibration signals (Fig. 3B,C). These experiments allow us to test the SIUN across a classification-regression spectrum for temporal data.
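The overlapping-window augmentation used throughout these experiments can be sketched as follows. This is a hedged illustration, assuming a stride of `window_len // factor` to produce roughly `factor`-times more training windows; the paper does not publish its exact stride scheme.

```python
import numpy as np

def augment_windows(train_signal, window_len, factor):
    """Create ~`factor`-times more training windows by sliding the window
    with stride window_len // factor. Applied only to the train split, so
    no overlapped data leaks into the test set."""
    stride = max(1, window_len // factor)
    windows = [train_signal[s:s + window_len]
               for s in range(0, len(train_signal) - window_len + 1, stride)]
    return np.stack(windows)
```

For example, with the CWRU settings (90,000 training points per class, window length 2000), `factor=1` yields non-overlapping windows, while `factor=10` yields roughly ten times as many overlapping ones.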

Fig. 3. Undersampled learning-based regression using SIUN. (A) Variation of test accuracy with % undersampling for regression (12 experiments); here, test accuracy is the fraction of test data having < 15% prediction error. (B) Inference using SIUN on a sound calibration signal (945 Hz) with 10% undersampling. (C) Inference using SIUN on a vibration calibration signal (159 Hz) with 10% undersampling.

Results

The key findings are: (i) less data can be collected while still preserving information, (ii) test accuracy improves with data augmentation (increasing the size of the training data) rather than with collecting more than a certain fraction of the raw data, and (iii) collecting more than a certain fraction of the raw data leads only to marginal improvements in SIUN performance. We perform data augmentation (increasing the training data size) by using overlapping sliding windows rather than by collecting more data points. Data augmentation is done only on the train set to avoid data leakage. The detailed results for the classification and regression problems are presented below.

Classification

SIUN was evaluated on four different classification datasets: (i) the Case Western Reserve University (CWRU) bearing fault dataset, (ii) the UCI HAR (Human Activity Recognition) dataset, (iii) the PAMAP2 (Physical Activity Monitoring) dataset, and (iv) a shaft unbalance dataset from a machinery fault simulator (MFS; experiments performed by the first author). Figure 4A-C shows the analysis performed on the CWRU dataset, Fig. 4D on the MFS dataset, Fig. 4E on the UCI HAR dataset, and Fig. 4F on the PAMAP2 dataset. The variation of test accuracy with % data used for the CWRU dataset is shown in Fig. 4A. Figure 4B shows the confusion matrix for the entire CWRU dataset, showing that there is no class imbalance in the results. Figure 4C shows the results of transfer learning on the CWRU dataset, where we change the last layer of the 10-class classification problem to a 4-class classification problem (normal, ball defect, inner race defect, and outer race defect). Figure 4D, E, and F show the variation of test accuracy with % data used for the MFS, UCI HAR, and PAMAP2 datasets, respectively. As can be seen across these four datasets: (i) SIUN reaches 90% test accuracy or more with just 10-20% of the data, and (ii) performance increases with data augmentation (the size of the training dataset) rather than with the fraction of raw data collected. These results show that SIUN is effective in preserving the necessary information from undersampled data for solving classification tasks.

Fig. 4. Undersampled learning-based classification using SIUN. (A) Variation of test accuracy with % data used for the CWRU bearing fault dataset. (B) Confusion matrix for the CWRU dataset. (C) Architecture used for transfer learning on a 4-class classification problem. (D) Variation of test accuracy with % data used for the MFS shaft unbalance dataset. (E) Variation of test accuracy with % data used for the UCI HAR dataset. (F) Variation of test accuracy with % data used for the PAMAP2 dataset. The spread in the figures shows the range across several experimental runs.

Regression

Figure 3A shows the results of 12 experiments performed by varying the number of frequencies (2, 3, 4, and 5) and the SIUN architecture (100 × 100, 200 × 200, 400 × 400). Two observations are significant: (i) SIUN reaches 80%+ accuracy in some experiments with just 20% undersampling, and (ii) there are only minor performance improvements with the amount of raw data collected. SIUN trained on simulated data predicts the frequency of actual undersampled sound and vibration calibration signals, as shown in Fig. 3B and C respectively, indicating that the requisite information is preserved. The sound calibration signal has a frequency of 945 Hz and the vibration calibration signal a frequency of 159 Hz. These results show that SIUN is effective in preserving the necessary information from undersampled data for solving regression tasks.

Performance comparison

We compare the performance (test accuracy) and model complexity of SIUN with state-of-the-art Convolutional Neural Networks (CNNs). For model complexity, we calculate the number of parameters and floating-point operations (FLOPs) for each model. A detailed comparison is provided in Table 1. Test accuracy quantifies the number of correct class predictions for classification and, for regression, the fraction of test data having < 15% prediction error, to account for noise in the signals. Confusion matrices are used to check for class imbalance in the model predictions. Different configurations of a feed-forward network are used for the different classification and regression problems. For the CWRU dataset, the SIUN achieves 96.0% test accuracy on just 30% of the raw data. A LeNet5-based CNN achieves 99.77% accuracy on 100% of the raw data, as reported in the literature [38]. The SIUN achieves a 435.01x reduction in required FLOPs while being only 3.77% lower in accuracy. For the UCI HAR dataset, the SIUN achieves 90.67% accuracy on just 20% of the raw data. A CNN achieves 92.71% accuracy on 100% of the raw data [41]. The SIUN uses 26.84x fewer FLOPs than the CNN while being only 2.04% lower in accuracy. For the PAMAP2 dataset, the SIUN achieves 91.1% test accuracy on just 10% of the raw data. A CNN achieves 91.0% accuracy on 100% of the raw data [41]. The SIUN uses 7.97x fewer FLOPs while being 0.1% more accurate than the CNN. For the MFS dataset, the SIUN achieves 100.0% test accuracy on just 20% of the raw data. A CNN benchmark is not available for the MFS dataset, as the experiments were conducted by the first author of this article. For the classification problems, we used 64 × 32 × 16, 64 × 16, 50 × 40, and 30 × 30 architectures for the CWRU, UCI HAR, PAMAP2, and MFS datasets respectively. For the regression problems, we used 100 × 100 × 2-5, 200 × 200 × 2-5, and 400 × 400 × 2-5 architectures to predict 2-5 frequencies in a broad range of 20 Hz-20 kHz.
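The SIUN columns of Table 1 can be reproduced from the layer widths. The sketch below is our own, assuming the table counts 2 FLOPs per parameter (one multiply plus one add); under that convention, 30% of a 2000-point CWRU window gives 600 inputs into the 64 × 32 × 16 network with 10 output classes, matching the reported 41.24k parameters and 82.48k FLOPs.

```python
def mlp_params_flops(layer_sizes):
    """Count parameters (weights + biases) of a fully connected network,
    and FLOPs under the assumed convention of 2 FLOPs per parameter."""
    params = sum(n_in * n_out + n_out
                 for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    return params, 2 * params

# CWRU: 30% of a 2000-point window -> 600 inputs; 64x32x16 hidden; 10 classes
params, flops = mlp_params_flops([600, 64, 32, 16, 10])
```

The same function applied to the other architectures listed above recovers the corresponding SIUN rows of Table 1.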

We design our architecture to preserve shift invariance and spectral stability [42,43], two primary requirements for time- and frequency-domain analysis of temporal signals. Preserving shift invariance is important because different windows containing the same information should give the same output when passed through SIUN, i.e., we want position-independent sampling. To achieve this, we first choose the window length based on the data characteristics, to maintain the stationarity of the data within the window, and then undersample each window using a random sampling scheme. Preserving spectral stability is important to make sure the network is selective to neither low- nor high-frequency signals. We generate both low- and high-frequency signals (between 20 Hz and 20 kHz) to train SIUN on individual and composite signals with up to 5 frequencies plus noise. Differential data augmentation when training on varying numbers of frequencies ensures that the network is equally sensitive to low- and high-frequency signals. Spectral stability is demonstrated in Fig. 3A, indicating that SIUN is not selective for lower or higher frequencies and can preserve spectral information in temporal or spatial signals.
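The simulated training signals described above, sums of sinusoids in the 20 Hz to 20 kHz band plus noise at a 44.1 kHz sampling rate, can be generated along these lines. This is a hedged sketch: the unit amplitudes, signal duration, and noise level are our assumptions, not values reported in the paper.

```python
import numpy as np

def composite_signal(freqs, duration=0.1, fs=44100, noise_std=0.05, seed=0):
    """Simulated regression training signal: a sum of unit-amplitude
    sinusoids plus Gaussian noise, sampled at fs (44.1 kHz here)."""
    rng = np.random.default_rng(seed)
    n = int(duration * fs)
    t = np.arange(n) / fs
    x = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    return t, x + rng.normal(0.0, noise_std, n)
```

Drawing the component frequencies uniformly across 20 Hz-20 kHz when building the training set is one way to keep the network equally exposed to low and high frequencies.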

Table 1.

Performance and model complexity comparison.

Dataset | Model    | Test accuracy (%) | Parameters (in 1000s) | FLOPs (in 1000s) | Reduction in SIUN FLOPs vs. CNN
CWRU    | SIUN     | 96.0              | 41.24                 | 82.48            | 435.01x
CWRU    | CNN [38] | 99.77             | 3040                  | 35,880           |
UCI HAR | SIUN     | 90.67             | 15.03                 | 30.06            | 26.84x
UCI HAR | CNN [41] | 92.71             | 89.734                | 806.884          |
PAMAP2  | SIUN     | 91.1              | 27.58                 | 55.16            | 7.97x
PAMAP2  | CNN [41] | 91.0              | 79.308                | 439.496          |
MFS     | SIUN     | 97.0              | 3.42                  | 6.84             | -
MFS     | CNN      | -                 | -                     | -                |

Discussion

To analyze large amounts of sensor data, we introduce a selective learning approach to sensing, where the amount of data collected is problem dependent. The amount of sensor data to be collected for a class of problems is a learnable property of the proposed SIUN architecture. The SIUN gives a general trainable model that preserves shift invariance and spectral stability for temporal sensor data. This architecture successfully solves problems formulated as classification and regression. Reducing the amount of data collected significantly reduces the number of network nodes needed for data representation and computation, as reported in the text, leading to a lighter-weight network. Performance improvements are observed with data augmentation (larger training datasets) rather than with collecting more than a certain fraction of the raw data. Intuitively, our results are consistent with the power law observed in the human visual cortex: most neurons respond to the same stimuli, while additional stimuli activate exponentially fewer cells. The approach can be applied to multiple sensing problems in various domains, such as satellite data analysis, drones and Unmanned Aerial Vehicles (UAVs), manufacturing condition monitoring, underwater monitoring, and other embedded AI applications. SIUN significantly reduces the computation (FLOPs), power, storage, memory (RAM), and bandwidth required for different sensor applications. Considerations in actual deployment include implementation on bare-metal devices, ensuring microsecond- to millisecond-level prediction latencies, and reducing the number of FLOPs to lower power consumption.

Outlook

The anticipated data deluge demands novel approaches for efficient and economical data analysis. We propose SIUN, a selective learning-based neural network architecture, giving significant improvements in power, compute, storage, transmission, and latency for data-driven decision making. We evaluate SIUN's performance on various sensing problems by formulating them as classification and regression problems using a comprehensive experimental design. Our approach highlights the significance of problem-dependent sampling for efficient sensing, rather than collecting all the raw data. This fundamental insight of problem-dependent sampling is driven by a learning approach and is hardware agnostic, making it equally beneficial for electronic, photonic, or other compute hardware modalities.

Acknowledgements

Portions of this research were conducted with Advanced CyberInfrastructure computational resources provided by the Institute for Computational and Data Sciences at The Pennsylvania State University (https://www.icds.psu.edu/). Part of the experimentation presented in this work was carried out during the primary author’s internship at the Acoustics and Condition Monitoring lab, Indian Institute of Technology, Kharagpur, India. The authors would like to thank Dr. Thomas Gabrielson from Pennsylvania State University for the sound calibration data. These sources were used only for example data and not for developing the methodologies and claims presented.

Author contributions

AV conceived the SIUN architecture. AV, AG, and SK developed the methodology. AV and AG performed the investigation and visualization. AV and SK were involved in the funding acquisition and project administration. SK, SS supervised the project. AV wrote the initial draft, with review & editing done by AV, AG, SK, SS.

Funding

This material is based upon work supported by the National Science Foundation under I-Corps National Teams Award (2235121). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Data availability

Data for most classification experiments are available in the main text, while data for one classification experiment and the regression experiments are available on reasonable request from the corresponding author (SK).

Declarations

Competing interests

AV and SK have filed two provisional patent applications on some implementations of the above technology, on April 13th, 2022 (U.S. Prov. Pat. App. No. 63330669) and on September 22nd, 2022 (U.S. Prov. Pat. App. No. 63408961). AV, AG, and SK have filed provisional patent applications on some implementations of the above technology, on January 4th, 2024 (U.S. Prov. Pat. App. No. 63/617,463) and on December 17th, 2024 (PCT/US24/60489). AV, AG, and SK have financial interests in Lightscline. SS has no competing interests to declare. As of May and June 2024, respectively, AV and AG are employed by Lightscline Inc.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. What is IoT? The Future of Business | SAP (2023). https://www.sap.com/products/artificial-intelligence/what-is-iot.html (Accessed 29 December 2023).
2. Future of Industry Ecosystems: Shared Data and Insights (IDC, 2021) (Accessed 29 December 2023).
3. Expansion to the SI prefix range. NPL (2023). https://www.npl.co.uk/si-prefix (Accessed 29 December 2023).
4. Meiser, L. C. et al. Synthetic DNA applications in information technology. Nat. Commun. 13, 352 (2022).
5. Verma, A., Goyal, A., Kumara, S. & Kurfess, T. Edge-cloud computing performance benchmarking for IoT based machinery vibration monitoring. Manuf. Lett. 27, 39-41 (2021).
6. Verma, A., Goyal, A. & Kumara, S. Machine learning-assisted collection of reduced sensor data for improved analytics pipeline. Procedia CIRP 121, 150-155 (2024).
7. Gao, R. X., Wang, L., Helu, M. & Teti, R. Big data analytics for smart factories of the future. CIRP Ann. 69, 668-692 (2020).
8. Verma, A., Oh, S. C., Arinez, J. & Kumara, S. Hierarchical energy signatures using machine learning for operational visibility and diagnostics in automotive manufacturing. Manuf. Lett. 40, 81-84 (2024).
9. Mohanty, A. R. Machinery Condition Monitoring: Principles and Practices (CRC, 2017).
10. Baraniuk, R. G. More is less: signal processing and the data deluge. Science 331, 717-719 (2011).
11. Candès, E. J., Romberg, J. & Tao, T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52, 489-509 (2006).
12. Baraniuk, R. Compressive sensing [lecture notes]. IEEE Signal Process. Mag. 24, 118-121 (2007).
13. Donoho, D. L. & Tanner, J. Precise undersampling theorems. Proc. IEEE 98, 913-924 (2010).
14. Yan, W., Rosca, M. & Lillicrap, T. Deep compressed sensing. In International Conference on Machine Learning (2019).
15. Brunton, S. L. & Kutz, J. N. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control (Cambridge University Press, 2022).
16. Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A. & Vandergheynst, P. Geometric deep learning: going beyond Euclidean data. IEEE Signal Process. Mag. 34, 18-42 (2017).
17. Cleary, B. & Regev, A. The necessity and power of random, under-sampled experiments in biology. arXiv:2012.12961 (2020).
18. Cleary, B. et al. Compressed sensing for highly efficient imaging transcriptomics. Nat. Biotechnol. 39, 936-942 (2021).
19. Cleary, B., Cong, L., Cheung, A., Lander, E. S. & Regev, A. Efficient generation of transcriptomic profiles by random composite measurements. Cell 171, 1424-1436 (2017).
20. Jain, P. & Sarma, S. E. Measuring light transport properties using speckle patterns as structured illumination. Sci. Rep. 9, 11157 (2019).
21. Sludds, A. et al. Delocalized photonic deep learning on the internet's edge. Science 378, 270-276 (2022).
22. Wetzstein, G. et al. Inference in artificial intelligence with deep optics and photonics. Nature 588, 39-47 (2020).
23. Bogaerts, W. et al. Programmable photonic circuits. Nature 586, 207-216 (2020).
24. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (2017).
25. Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929 (2020).
26. He, K. et al. Masked autoencoders are scalable vision learners. arXiv:2111.06377 (2021).
27. Bays, P. M. & Husain, M. Dynamic shifts of limited working memory resources in human vision. Science 321, 851-854 (2008).
28. Padamsey, Z., Katsanevaki, D., Dupuy, N. & Rochefort, N. L. Neocortex saves energy by reducing coding precision during food scarcity. Neuron 110, 280-296 (2022).
29. Bajcsy, R. Active perception. Proc. IEEE 76, 966-1005 (1988).
30. Simons, D. J. & Chabris, C. F. Gorillas in our midst: sustained inattentional blindness for dynamic events. Perception 28, 1059-1074 (1999).
31. Stringer, C., Pachitariu, M., Steinmetz, N., Carandini, M. & Harris, K. D. High-dimensional geometry of population responses in visual cortex. Nature 571, 361-365 (2019).
32. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278-2324 (1998).
33. Fukushima, K. & Miyake, S. Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recogn. 15, 455-469 (1982).
34. Meir, Y. et al. Power-law scaling to assist with key challenges in artificial intelligence. Sci. Rep. 10, 19628 (2020).
35. Bahri, Y., Dyer, E., Kaplan, J., Lee, J. & Sharma, U. Explaining neural scaling laws. Preprint at https://arxiv.org/abs/2102.06701 (2021).
36. Case Western Reserve University Bearing Data Center. https://engineering.case.edu/bearingdatacenter (Accessed 22 December 2019).
37. Smith, W. A. & Randall, R. B. Rolling element bearing diagnostics using the Case Western Reserve University data: a benchmark study. Mech. Syst. Signal Process. 64-65, 100-131 (2015).
38. Wen, L., Li, X., Gao, L. & Zhang, Y. A new convolutional neural network-based data-driven fault diagnosis method. IEEE Trans. Ind. Electron. 65(7), 5990-5998 (2018).
39. Reyes-Ortiz, J., Anguita, D., Ghio, A., Oneto, L. & Parra, X. Human Activity Recognition Using Smartphones [Dataset]. UCI Machine Learning Repository. doi:10.24432/C54S4K (2013).
40. Reiss, A. PAMAP2 Physical Activity Monitoring [Dataset]. UCI Machine Learning Repository. doi:10.24432/C5NW2H (2012).
41. Wan, S., Qi, L., Xu, X., Tong, C. & Gu, Z. Deep learning models for real-time human activity recognition with smartphones. Mob. Netw. Appl. 25(2), 743-755 (2019).
42. Paul, S. & Chen, P.-Y. Vision transformers are robust learners. Preprint at https://arxiv.org/abs/2105.07581 (2021).
43. Yin, D. et al. A Fourier perspective on model robustness in computer vision. arXiv:1906.08988 (2019).
