Abstract
Inverse-designed nanophotonic devices offer promising solutions for analog optical computation, where high-density photonic integration is critical for scaling computational complexity. Here, we present an inverse-designed photonic neural network (PNN) accelerator, enabling ultra-compact and energy-efficient optical computing. Using a wave-based inverse-design method based on three-dimensional finite-difference time-domain simulations, we exploit the linearity of Maxwell’s equations to reconstruct arbitrary spatial fields through optical coherence. Each subwavelength voxel serves as a trainable degree of freedom, yielding a computational density of approximately 400 million parameters per mm². By decoupling the forward-pass process into linearly separable simulations, our approach is highly amenable to computational parallelism. We experimentally demonstrate two inverse-designed PNN accelerators, achieving on-chip MNIST and MedNIST classification accuracies of 89% and 90% respectively, within footprints of 20 × 20 µm² and 30 × 20 µm². Our results establish a scalable, energy-efficient platform for photonic computing, bridging inverse nanophotonic design with high-performance optical information processing.
Subject terms: Integrated optics, Micro-optics
Scaling of optical computing requires computationally dense and efficient hardware. Here, the authors inverse-design and experimentally demonstrate ultracompact nanophotonic neural network accelerators with high computational density, enabling scalable and energy-efficient analog photonic computing.
Introduction
Machine learning has transcended conventional signal analysis techniques in numerous research fields ranging from computer vision to natural language processing1,2. Central to this progress is the ability of neural networks to learn complex data representations, enabling robust and adaptive inference. However, as neural models continue to grow in complexity, conventional electronic hardware faces increasing challenges in meeting the demands of computing speed and energy consumption3–6. In response, photonic neural networks (PNNs) have emerged as a promising platform for analog neural computation, offering key advantages such as ultrafast processing and low-power consumption4–10. PNN accelerator hardware based on programmable photonic integrated circuits such as Mach-Zehnder interferometer (MZI) meshes7, microrings8, and waveguide attenuators9, enables in situ training10 and dynamic weight adjustment through optical backpropagation10,11. By contrast, task-optimized architectures are trained offline12 and then realized in hardware for stable and efficiency-critical inference6. This paradigm, analogous to application-specific integrated circuits (ASICs) in electronics, eliminates static tuning power and runtime weight calibration13, leading to compact footprints with low-latency, stable inference14. Moreover, physically embedded weights enable in-memory optical computing4, significantly alleviating memory-bandwidth constraints present at run-time in conventional computing architectures at scale15.
To unlock the full potential of PNNs, it is essential to achieve high-density integration capable of supporting complex models and large datasets on-chip3,6,16. Our approach emphasizes in-memory optical single-shot run-time operation, with highly efficient, stable and low-latency inference. Traditional photonic design approaches, while effective for standard components, often fall short when optimizing for compactness, multifunctionality, and performance simultaneously. In this context, topology optimization-based inverse-design methods have emerged as a powerful tool17–20. By computationally exploring a vast design space unconstrained by human intuition, these methods enable the discovery of non-intuitive geometries that maximize light-matter interaction within a minimal footprint21. Unlike diffractive models4–6,22, which rely on sequential, layer-by-layer architectures mimicking feedforward deep neural networks (DNNs), topology optimization allows arbitrary control over scattering, interference, and phase evolution throughout the device’s entire volume, enabling more compact and functionally dense implementations23–25.
However, the practical deployment of topology-optimized PNNs to accommodate increased dataset sizes remains challenging due to the absence of a scalable, physics-accurate training framework. Moreover, increasing the refractive index contrast enhances signal expressivity, i.e., the capacity for a rich set of field transformations within a fixed geometry21, by strengthening confinement, supporting internal resonances26, and enabling stronger interference and engineered scattering effects27,28, all of which are critical for high-fidelity optical computation.
Consequently, three-dimensional solvers such as the three-dimensional finite-difference time-domain (3D-FDTD) method are required, since approximation methods such as the effective index method often fail to accurately capture the resulting out-of-plane scattering dynamics28–30. Moreover, as neural architectures scale to support increasingly complex tasks and larger datasets, computational efficiency and scalability become essential. Thus, achieving ultra-compact PNNs suitable for dense on-chip integration necessitates a design approach that captures the full-wave physics of high-index-contrast nanophotonic structures, while remaining computationally scalable and fabrication-feasible.
Here, we demonstrate ultra-compact, inverse-designed PNN accelerators fabricated on a high-index-contrast silicon-on-insulator (SOI) platform operating at a single wavelength, where higher index contrast strengthens light confinement and supports rich interference and mode-mixing dynamics27,28, thereby expanding the space of attainable optical responses per unit volume. To enhance the inverse-design efficiency, we develop a wave-based inverse-design approach grounded in 3D-FDTD simulations, which exploits the linearity of Maxwell’s equations to reconstruct arbitrary spatial fields through optical coherence. By decoupling the forward-pass process into linearly independent simulations, our approach enables efficient parallelization and scales naturally across high-throughput computing hardware, such as graphics processing units (GPUs). Each subwavelength voxel acts as a trainable degree of freedom, yielding an ultra-compact representation of optical transformations with a computational density of approximately 400 million parameters per mm². The fabricated devices, with footprints as small as 20 × 20 µm² and 30 × 20 µm², perform on-chip image classification, demonstrating their efficacy on benchmarking datasets (MNIST31 and MedNIST32–34), and achieving experimental accuracies of 89% and 90%, respectively. Our inverse-designed PNN demonstrates the practical feasibility of ultra-compact nanophotonic devices for accelerating on-chip data processing. Owing to the method’s scalability and inherent compatibility with parallel computing architectures, this work is extensible to higher-dimensional processing tasks, offering a scalable pathway to efficiently handle increasingly demanding computational workloads.
Results
PNN accelerator configuration and inverse-design
Figure 1a presents a schematic overview of the on-chip PNN accelerator, wherein compressed and flattened input features from the dataset are encoded onto coherent optical amplitudes at a single wavelength (λ). Within the topology-optimized scattering region, the encoded optical fields undergo complex interference and scattering interactions throughout the device volume. The scattering region itself is designed to support non-uniform optical transport21,28, with routing behavior that adapts to the spatial distribution of optical inputs, efficiently redistributing optical power from the input encoding ports toward the class output ports. We denote C as the number of classes in the dataset, which equals the output dimension of the PNN in the one-hot setting. The PNN outputs represent class-evidence scores. A probability distribution across classification categories is obtained after photodetection, analogous to the output layer of a conventional multi-layer perceptron network (Fig. 1b).
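As a concrete illustration of the amplitude-encoding step described above, the sketch below compresses a flattened grayscale image to N feature amplitudes. The segment-averaging compression and the `encode_amplitudes` helper are hypothetical stand-ins, not the authors' exact pipeline; only the idea of mapping compressed features onto coherent optical amplitudes is from the text.

```python
import numpy as np

def encode_amplitudes(image, n_features):
    """Map a flattened grayscale image to n_features optical amplitudes.

    Hypothetical sketch: compress by averaging equal-length segments, then
    min-max normalize so the amplitudes lie in [0, 1] for single-wavelength
    coherent amplitude encoding.
    """
    x = np.asarray(image, dtype=float).ravel()
    feats = np.array([s.mean() for s in np.array_split(x, n_features)])
    span = feats.max() - feats.min()
    return (feats - feats.min()) / span if span > 0 else np.zeros(n_features)

# 10 input encoding ports, as in the MNIST device (N = 10)
amps = encode_amplitudes(np.arange(64).reshape(8, 8), 10)
```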
Fig. 1. Inverse-designed nanophotonic neural network accelerator.
a Schematic diagram of the proposed on-chip PNN accelerator with N input feature encoding ports, C class output ports, and the scattering region. b Corresponding schematic of an analogous digital multi-layer perceptron network. c Flow chart detailing the inverse-design and training procedure for the on-chip PNN. A latent design feature space is parametrized into a real permittivity matrix of dimension nx × ny × nz, representing spatial discretization in the x, y and z directions. Subsequently, complex E-fields are generated, and a physics-based gradient is computed via the AVM for backpropagation. This gradient is derived from the N forward mode fields and the reverse mode fields (Eadj). The inset summarizes the inverse-design forward-pass and backpropagation procedures. Columns 1-2 illustrate the real and imaginary components of the N coherent forward mode fields (superimposed), with values bounded between [Emin, Emax]. Column 3 shows the field magnitude |E| for exemplary training samples in classes ‘CXR’, ‘Hand’ and ‘AbdomenCT’, obtained from matrix multiplication with the dataset feature matrix. Importantly, this operation reconstructs a full set of simulated fields from only a limited number of forward simulations. Column 4 captures the corresponding reverse-mode-induced gradient spatial fields over the T training samples, which are concatenated (⨁) and used to optimize the parameter space with respect to the cross-entropy loss function, with values bounded between [∇min, ∇max].
To optimize the device material distribution across the scattering region, we directly perform the optimization within the binarized regime, while incorporating fabrication process-specific design constraints every epoch (see Methods). This approach removes the necessity for hyperparameter scheduling and post-processing discretization35. As illustrated in Fig. 1c, the latent design feature space, which varies continuously between 0 and 1, undergoes low-pass filtering to smooth parameter spatial distributions and is subsequently binarized36 into fabrication-compatible materials: silicon (Si) and silicon dioxide (SiO2) as representative examples. A custom material distribution, characterized by its complex permittivity matrix ε = ε′ + iε″, where ε′ and ε″ are its real and imaginary components respectively, can model material loss and dispersive effects. We apply morphological operations to remove small features37, and B-spline contour approximations38–40 to further refine and shape the final geometry under minimal feature size and radius of curvature constraints, ensuring manufacturability41.
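The filter-then-binarize projection above can be sketched as follows. This is a minimal illustration assuming a simple box-blur as the low-pass filter and a 0.5 threshold mapping onto SiO2 (ε ≈ 2.1) and Si (ε ≈ 12.1); the actual filter kernel, threshold, and permittivity values used in the paper may differ.

```python
import numpy as np

def lowpass(rho, k=5):
    # Box-blur low-pass filter: smooths the latent design field so that
    # features smaller than ~k voxels are suppressed (illustrative choice).
    pad = k // 2
    padded = np.pad(rho, pad, mode="edge")
    out = np.empty_like(rho)
    for i in range(rho.shape[0]):
        for j in range(rho.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def binarize(rho, eps_lo=2.1, eps_hi=12.1):
    # Hard projection onto the two fabrication materials (SiO2 / Si);
    # the permittivity values here are illustrative assumptions.
    return np.where(rho >= 0.5, eps_hi, eps_lo)

rng = np.random.default_rng(0)
rho = rng.random((32, 32))        # latent design variables in [0, 1]
eps = binarize(lowpass(rho))      # fabrication-compatible permittivity map
```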
To accurately capture the performance of our design applied to multi-class image classification, it is suitable to minimize the cross-entropy (CE)1 loss between the one-hot encoded ground truth labels and the output optical power vector. This objective function indicates degraded performance when the optical power at the target output port is low. The corresponding output probability distribution across classes is computed by taking the surface integral of the optical power density at each port normalized by their total sum24. As a result of this normalization, the CE loss function implicitly penalizes higher distributed power at non-target ports, which is important for limiting channel crosstalk. The gradient of the loss function with respect to the full design space is computed using the adjoint variable method (AVM)42,43 and is proportional to the overlap of the forward mode fields and the adjoint fields. For each classification category, an adjoint source is constructed from the one-hot encoded target vector, which is used to derive the reverse-mode field (Eadj).
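The objective above reduces to a standard cross-entropy over normalized port powers. A minimal sketch follows, with the surface integral that yields each port's power abstracted into a plain vector of power readings (function names are illustrative):

```python
import numpy as np

def port_probabilities(port_powers):
    # Normalize the detected optical power across class output ports so
    # the outputs form a probability distribution over classes.
    p = np.asarray(port_powers, dtype=float)
    return p / p.sum()

def cross_entropy_loss(port_powers, target_class):
    # Low power at the target port -> high loss; because of the
    # normalization, power routed to non-target ports (crosstalk) is
    # implicitly penalized as well.
    return -np.log(port_probabilities(port_powers)[target_class])

# Power concentrated at the target port yields a lower loss than a
# uniform power distribution across ports.
focused = cross_entropy_loss([0.05, 0.8, 0.05, 0.1], target_class=1)
uniform = cross_entropy_loss([0.25, 0.25, 0.25, 0.25], target_class=1)
```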
To simulate the forward-pass optical fields for a dataset containing L samples, each comprising N features, we exploit the linearity of Maxwell’s equations to significantly reduce computational complexity. Rather than performing a separate 3D-FDTD simulation for every sample, we execute one independent simulation for each of the N input encoding ports, as illustrated in columns 1 and 2 of the inset in Fig. 1c. Leveraging the coherent nature of optical fields44, one can derive the field of sample l as a linear combination of precomputed mode source fields, E(l) = Σi ai(l) Ei, where ai(l) are the coefficients mapped to the dataset features of sample l and Ei is the full 3D field from the i-th mode source. More generally, the entire dataset of reconstructed optical fields can be compactly expressed as the matrix product of the L × N dataset feature matrix, encoding all input samples, with the tensor containing the corresponding forward mode fields. The information is reshaped into a complex tensor with dimensions L × nx × ny × nz, where nx, ny and nz denote the matrix dimensions in the x, y and z axes respectively. From this tensor, a subset T ⊂ L is selected as training fields, which directly inform the iterative updates during inverse-design optimization. Representative training fields corresponding to sample images from the MedNIST dataset are shown in Fig. 1c, inset column 3. The exact gradient update, as illustrated in Fig. 1c inset, column 4, is obtained by solving the AVM equation across the T training samples. This gradient acquisition differs from straight-through estimators, which approximate the physical gradient through binary activation functions using an identity operator, a process commonly used in training quantized DNNs45. Using stochastic gradient descent with momentum (Fig. 1c), we iteratively update the latent parameter distribution to minimize the CE loss, thus progressively increasing the classification accuracy during optimization (see Methods).
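The coherent-superposition step above can be written as a single tensor contraction. Toy dimensions and random stand-in fields are used here; in the actual workflow the per-port fields come from the N forward FDTD simulations.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 15, 4                # input ports and samples (toy sizes)
nx, ny, nz = 8, 8, 3        # toy spatial grid

# One precomputed complex field per input port (stand-ins for the fields
# produced by the N forward-mode FDTD simulations).
E_modes = (rng.standard_normal((N, nx, ny, nz))
           + 1j * rng.standard_normal((N, nx, ny, nz)))
A = rng.random((L, N))      # dataset feature matrix (amplitude encoding)

# Linearity of Maxwell's equations: each sample's field is a coherent,
# feature-weighted sum of the per-port mode fields.
E_dataset = np.tensordot(A, E_modes, axes=1)    # shape (L, nx, ny, nz)
```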
The total number of required 3D-FDTD simulations per epoch is only N + C, comprising N forward-mode simulations to precompute input-related fields plus C reverse-mode simulations to compute gradients for each class category. This approach substantially lowers computational costs compared to directly simulating the entire dataset, and is especially beneficial when dealing with large datasets, where the total dataset size L is significantly greater than N + C2,46. Furthermore, by virtue of the linear separability of the forward mode spatial fields, this inverse-design approach is highly amenable to computational parallelism, making it particularly well suited for acceleration on GPU and other parallel computing architectures, thereby enhancing scalability across large problem domains.
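The per-epoch simulation budget is therefore fixed by the architecture rather than the dataset size; as a quick check using the port counts of the two demonstrated devices:

```python
def fdtd_sims_per_epoch(n_inputs, n_classes):
    # N forward-mode simulations (one per input port) plus C reverse-mode
    # (adjoint) simulations (one per class) - independent of dataset size L.
    return n_inputs + n_classes

mnist_sims = fdtd_sims_per_epoch(10, 10)    # MNIST: N = 10, C = 10
mednist_sims = fdtd_sims_per_epoch(15, 6)   # MedNIST: N = 15, C = 6
```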
On-chip image classification tasks
To demonstrate the scalability of our inverse-design scheme applied to image classification, we perform numerical experiments using two benchmark datasets with progressively increased complexity: the MNIST dataset, containing 70,000 grayscale digit images31 with 10 input features (N = 10), 10 classes (C = 10) and a design region footprint of 20 × 20 µm2, and the MedNIST dataset, containing 58,954 medical images32–34 with 15 features (N = 15) across six classes (C = 6) and a design region footprint of 30 × 20 µm2. For both datasets, the input data features are spatially encoded onto the amplitude of light at a wavelength of 1550 nm. The MNIST dataset is partitioned into 50,000 training images and 20,000 test images, while the MedNIST dataset includes 48,000 training images and 10,954 test images. Given that these datasets are in grayscale, it suffices to restrict the problem domain to the time-harmonic regime in numerical simulations21. The MNIST classification results are shown in Fig. 2. Figure 2a illustrates the normalized power distribution across all output ports and the corresponding electric field (E-field) propagation at epochs 0, 60, and 120 for a randomly selected example from Class ‘5’.
Fig. 2. MNIST 10-class classification results.
a Normalized optical power distribution with respect to a maximum value Pmax and minimum value Pmin across output ports at epochs 0, 60 and 120, shown alongside the corresponding E-field magnitude maps at the center-line, with maximum and minimum values denoted as |E|max and |E|min respectively, for a randomly selected test sample from class ‘5’. Discrete index contours highlight the evolving hot-spot concentration at the correct output port. b Test loss and accuracy curves as a function of epochs with range overlaid across three random data permutations. Green: Test CE loss. Blue: Test classification accuracy. c Confusion matrix for 10-class image classification. d Energy density matrix at final epoch. e Blue: Mean test accuracy as a function of training data, with range overlaid across three random data permutations. Red: Max-normalized time taken for a single training epoch as a function of training data.
Initially, the randomized high-contrast material topology does not yield meaningful classification outcomes. However, as topology optimization progresses, optical energy becomes increasingly concentrated at the correct output port corresponding to Class ‘5’, while power at neighboring ports is effectively suppressed. The corresponding continuous wave (CW) spatial field response of the design at the final epoch is presented in Supplementary Movie 1, showing the evolution of the optical field distribution over multiple optical cycles. Figure 2b depicts test loss and accuracy as a function of training epochs, with range intervals capturing variability over three separate training sequences in which the datasets are randomly permuted. The final confusion matrix (Fig. 2c) demonstrates a high overall classification accuracy of 97.8%. In addition, the energy density matrix (Fig. 2d), as a measure of the output power distribution averaged across the test set, confirms effective energy localization, showing that the optical energy is well concentrated at the correct output ports corresponding to each class, indicating minimal cross-class optical crosstalk. Moreover, Fig. 2e illustrates the classification accuracy with respect to training data size, reinforcing the need for high volumes of training data to perform accurate signal classification. Notably, despite the increase in training data volume, our method maintains a constant number (N + C = 20) of required 3D-FDTD simulations per epoch in the inverse-design. As a result, the increase in epoch runtime is just 6.7% when using the full training dataset compared to only 10% of it.
Next, we perform classification on the MedNIST dataset to evaluate the scalability of the proposed method to a higher-dimensional dataset with a larger volume of data. Figure 3a shows the inference fields for a single randomly selected sample from each of the six MedNIST classes, along with their corresponding normalized power distributions, demonstrating correct signal localization at the respective class output port. The CW spatial field response for the ‘BreastMRI’ sample at a wavelength of 1550 nm is presented in Supplementary Movie 2, illustrating how the optimized PNN structure effectively concentrates optical power at the correct output port. As shown in Fig. 3b, the optimizer reached an equilibrium after 150 epochs, yielding a peak numerical accuracy of 99.1%. The optimized material topology at the final training epoch is shown in the 3D-rendered schematic (Fig. 3b Inset), highlighting the 220 nm fully-etched Si waveguide designed to enable efficient wave propagation through its high refractive-index contrast. The confusion matrix (Fig. 3c) quantitatively confirms this high classification accuracy, reaching 100% accuracies for classes ‘ChestCT’, ‘AbdomenCT’, ‘HeadCT’, and ‘BreastMRI’.
Fig. 3. MedNIST 6-class classification result.
a Normalized power distribution with respect to a maximum value Pmax and minimum value Pmin across 6 random test class samples as measured at output ports, and corresponding E-field magnitude distributions at the center-line, with maximum and minimum values denoted as |E|max and |E|min respectively, at the 150th epoch. b Test loss and accuracy curves as a function of epochs with range overlaid across three random data permutations. Green: Test CE loss. Blue: Test classification accuracy. Inset 1: 3D-rendered schematic with B-spline interpolated topology at final epoch. c Confusion matrix for 6-class classification. d Energy density matrix at final epoch. e Blue: Mean test accuracy as a function of training data, with range overlaid across three random data permutations. Red: Max-normalized time taken for a single training epoch as a function of training data.
Furthermore, the energy density matrix (Fig. 3d) demonstrates effective energy localization, with approximately half of the total output power concentrated at the correct output port for five classes. Although slight coupling to neighboring ports is observed, the device maintains high performance even within its ultra-compact footprint. Figure 3e further validates the performance of our 3D-based optimization approach when the volume of training data increases. The observed classification accuracy improves consistently with increasing dataset size, while the computational overhead remains minimal due to the fixed number of FDTD simulations per epoch. The MNIST and MedNIST PNNs comprise approximately 1.6 × 105 and 2.4 × 105 trainable parameters, respectively (see Supplementary Note 1). During each training epoch, the corresponding multiplication operations are parallelized on GPUs, enabling fast and scalable gradient updates. The computational density reaches approximately 400 million trainable parameters per mm² (Supplementary Note 1), highlighting our PNNs’ capability to achieve high computational density within an ultra-compact footprint.
Hardware performance and inverse-design computational scalability
Figure 4 shows the benchmark results for a single simulation job, including key FDTD simulation parameters: mesh time, FDTD time and total wall-clock time for the MNIST and MedNIST datasets. The inverse-design process is conducted across three GPU platforms: Nvidia RTX 5090, RTX 4090, and V100. The FDTD time on the V100 is approximately 64 seconds for a single MNIST simulation, compared to 48 seconds on the RTX 4090 and 30 seconds on the RTX 5090. The wall-clock time for a single simulation also accounts for factors such as mesh time, which is typically CPU-bound. Moreover, since the MedNIST chip is 1.5 times larger than the MNIST chip, this increased footprint is reflected in the wall-clock time, which is 1.46 times longer than for MNIST.
Fig. 4. Benchmark timing results for a single simulation job and total simulation time.
Results are with respect to three computing nodes: Nvidia V100 with Xeon E5-2686 v4 CPU, RTX 4090 with Intel Core i9-10980XE CPU and RTX 5090 GPU with Intel Core i7-14700KF CPU. The bar chart compares single simulation mesh time, 3D-FDTD time, and wall-clock time for MNIST and MedNIST PNN designs. The inset compares the total simulation time under 3 distributed computing conditions: Single RTX 5090 GPU node, RTX 5090 + RTX 4090 GPU nodes, RTX 5090 + RTX 4090 + V100, demonstrating the simulation scalability via parallel processing.
The cumulative wall-clock time for the inverse-designed PNN is defined as the product of the single-simulation wall-clock time, the fixed number (N + C) of 3D-FDTD simulations per epoch, and the total number of epochs. The total simulation time accounts for both the cumulative wall-clock time and the time required in each epoch to derive spatial fields across the entire dataset. Our approach enables the linear separability of each simulation, allowing individual simulations to be independently executed on a clustered computing network47. This capability supports scheduling via makespan minimization, a strategy that optimally balances computational loads across available resources48, significantly reducing overall compute time through parallelization. The inset of Fig. 4 illustrates the total simulation time under three distributed computing schemes. On a single GPU Linux node equipped with an Intel Core i7 processor, 128 GB of RAM, and an Nvidia RTX 5090 (32 GB) graphics card, the inverse-design optimization for the MNIST PNN runs for approximately 29.7 hours. Using the same hardware configuration, the MedNIST PNN optimization requires approximately 56.3 hours. Combining RTX 4090 and RTX 5090 nodes reduces these times to 19.7 hours (MNIST) and 37.9 hours (MedNIST). Incorporating an additional V100 node further reduces the total simulation time to 17.1 hours for MNIST and 33.3 hours for MedNIST. These benchmarking results clearly demonstrate the effectiveness of our approach in leveraging parallel computing.
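Because the N + C simulations per epoch are independent, distributing them across nodes reduces to a classic makespan-minimization problem. The sketch below uses greedy list scheduling with per-simulation FDTD times roughly matching the single-GPU figures reported above; the scheduler used in practice may differ.

```python
import heapq

def makespan(num_sims, per_sim_times):
    # Greedy list scheduling: each independent simulation is assigned to
    # the node that becomes free earliest. per_sim_times holds the
    # seconds-per-simulation cost of each node.
    nodes = [(0.0, t) for t in per_sim_times]   # (finish_time, per-sim cost)
    heapq.heapify(nodes)
    for _ in range(num_sims):
        finish, cost = heapq.heappop(nodes)
        heapq.heappush(nodes, (finish + cost, cost))
    return max(finish for finish, _ in nodes)

# 20 simulations per MNIST epoch; approximate per-simulation FDTD times
# (seconds) taken from the Fig. 4 benchmarks.
single = makespan(20, [30.0])                # RTX 5090 only
triple = makespan(20, [30.0, 48.0, 64.0])    # 5090 + 4090 + V100
```

Adding slower nodes still shortens the epoch because the scheduler keeps every node busy, mirroring the reported reduction in total simulation time as nodes are added.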
Fabrication and measurement
To experimentally demonstrate the feasibility of our designs, we fabricate PNN accelerators for MNIST and MedNIST datasets on the SOI platform (see Methods). Figures 5a and b show the scanning electron microscope (SEM) images of the fabricated 20 × 20 µm2 MNIST and 30 × 20 µm2 MedNIST PNN accelerators, respectively, highlighting their compact footprints. Both input and output waveguides are designed to support the fundamental transverse-electric (TE) mode at 1550 nm, with a width of 500 nm and height of 220 nm. The minimal feature size across the device is 80 nm.
Fig. 5. PNN accelerators fabricated on the SOI platform.
SEM images of a 20 × 20 µm2 MNIST PNN with 10 input and 10 output waveguides and b 30 × 20 µm2 MedNIST PNN with 15 input and 6 output waveguides. c Packaged SOI chip wire-bonded to PCB. d Microscope image of the MedNIST PNN chip illustrating the amplitude and phase control components for input data preparation, with optical input, output and monitoring ports. Gold traces and bond pads facilitate the interface between the photonic chip and the MCU. Insets: Microscope images of (1) VGC, (2) 1 × 15 optical power splitter, (3) MZI-attenuator with optical monitoring ports, (4) optical phase combiners, and (5) MedNIST PNN accelerator.
Optical signals from a CW laser source at 1550 nm are vertically coupled into the chip via vertical grating couplers (VGCs), distributed through a power splitter, and encoded with amplitude and phase information for input data preparation via thermo-optic tuning (see Methods), and then processed via the inverse-designed PNN. To illustrate full device integration, Fig. 5c shows the MedNIST PNN chip wire-bonded to a printed circuit board (PCB) (see Methods) as the electrical interface through which optical amplitude and phase are adjusted for input data preparation. A detailed microscope image of the MedNIST chip is shown in Fig. 5d, clearly illustrating the VGC (Inset 1) for light coupling in and out of the chip, the multimode-interferometer (MMI) power splitter (Inset 2), MZIs (Inset 3), and phase-combiners (Inset 4) for input data preparation, along with the compact PNN accelerator region (Inset 5) and monitoring ports. These monitoring ports enable direct measurement of optical power at critical locations within the device, facilitating real-time calibration (see Supplementary Fig. 1). Finally, the optical power distribution from six output ports of the PNN accelerator is simultaneously measured using a multiport optical power meter to obtain classification results (see Supplementary Fig. 2). The MNIST PNN chip follows a similar layout and packaging (see Supplementary Fig. 3).
For experimental data preparation, we measure analog optical input power to the PNN accelerators across 100 MNIST and 60 MedNIST test samples, where 10 samples are randomly selected from each class (see Supplementary Fig. 4). To verify the accuracy of these inputs, we measure a global mean absolute error (MAE) of 0.0277 for MNIST and 0.0310 for MedNIST with respect to the simulated inputs (see Supplementary Fig. 5). These low MAE values confirm a strong correlation with the simulated amplitude-encoded features, validating the accuracy of our optical data preparation process. To evaluate how well the experimental dataset represents the test distribution, we compute several distributional benchmarks. A 3D principal component analysis shows that the experimental and test set features largely overlap within their 90% confidence ellipses (see Supplementary Fig. 6a, b). The mean feature distributions yield low per-feature MAE values of 0.0273 for MNIST and 0.0336 for MedNIST, and small average 1D Wasserstein distances49 of 0.0212 and 0.0279, respectively, indicating high distributional agreement (Supplementary Fig. 6c, d). Bootstrap resampling50 over 10,000 iterations (Supplementary Fig. 7a, b) produces global root mean square error precisions of ±2.02% for MNIST and ±3.63% for MedNIST relative to the test set mean at 90% confidence (Supplementary Fig. 7c, d). These results confirm that the stratified subset adequately represents the overall test distribution and that our experimental data remain statistically robust under resampling.
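For equal-size samples, the 1D Wasserstein distance used in the comparison above has a simple closed form: the mean absolute difference between the two sorted value lists. A minimal sketch:

```python
import numpy as np

def wasserstein_1d(a, b):
    # 1D Wasserstein (earth mover's) distance between two samples of
    # equal size: average absolute difference of their sorted values.
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    return float(np.mean(np.abs(a - b)))
```

Identically distributed samples give a distance of 0, while shifting every value of one sample by a constant δ gives a distance of exactly δ, which makes the small reported distances (0.0212 and 0.0279) easy to interpret as average feature-value displacements.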
For MNIST, an overall classification accuracy of 89% is achieved, with Class ‘1’ exhibiting the highest classification accuracy of 100%. For the MedNIST dataset, we achieve an overall classification accuracy of 90%, with the ‘BreastMRI’ and ‘CXR’ classes reaching the highest classification accuracy of 100%. Moreover, the MedNIST energy density matrix reveals an average of 50.5% of total output power concentrated at the correct output port. Among the correctly classified output ports, our results indicate only a 16.1% variation in maximum output power, demonstrating consistent classification performance across a higher-dimensional sequence of data. The classification score can be further improved by minimizing the power nonuniformities among the VGCs through enhanced fiber-to-chip alignment tolerance51. The deviation between simulated and experimental results mainly arises from fabrication-induced geometry errors. When the fabrication deviation (see Supplementary Fig. 8a) caused by lithographic over- or under-exposure41,52 reaches approximately ±20 nm, changes in classification accuracy become evident for both MNIST and MedNIST devices (see Supplementary Fig. 8b, c). The tolerance to fabrication deviations can be enhanced by independently simulating geometric perturbations53 and updating designs informed by these results, while training with augmented datasets6 can further improve device robustness.
Robustness verification
To evaluate the robustness of the PNN devices experimentally, we explore the effect of input data phase deviations on classification performance (see Methods). Figures 6e and f show scatterplots of the maximum phase variation under which correct classification is still maintained for the MNIST and MedNIST datasets, respectively. Despite large input phase swings, the classification accuracy remains robust and consistent across both datasets. This robustness is primarily attributed to the amplitude-dominated encoding scheme, in which classification decisions are governed predominantly by the strongest optical intensity features, reducing sensitivity to phase fluctuations. Figure 6g shows representative examples from the MNIST and MedNIST datasets under mean absolute phase deviation conditions of 0.89 and 1.18 radians, respectively, clearly illustrating the dominant energy distribution at the correct output ports despite these phase deviations.
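The amplitude-dominated robustness can be illustrated with a toy coherent model. The diagonal-dominant transfer matrix below is a hypothetical stand-in for a trained PNN, not the fabricated device's measured response; it only demonstrates that when one input amplitude dominates, per-channel phase errors of order one radian leave the argmax of the output port powers unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)
C = 6
# Toy diagonal-dominant complex transfer matrix (hypothetical stand-in
# for a trained PNN's input-to-output field mapping).
T = np.eye(C) + 0.05 * (rng.standard_normal((C, C))
                        + 1j * rng.standard_normal((C, C)))

# Amplitude-encoded input with one dominant feature (channel 2).
amps = np.array([0.05, 0.05, 1.0, 0.05, 0.05, 0.05])
clean = int(np.argmax(np.abs(T @ amps) ** 2))

# Random per-channel phase errors of up to ~1 radian.
noisy_in = amps * np.exp(1j * rng.uniform(-1.0, 1.0, C))
noisy = int(np.argmax(np.abs(T @ noisy_in) ** 2))
```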
Fig. 6. Experimental classification performance of inverse-designed PNN accelerator.
Confusion matrix for a (MNIST) and b (MedNIST) datasets. Energy density matrix for c (MNIST) and d (MedNIST) datasets. 3D plots of correctly classified data samples under a randomly induced phase variation (in radians) for e 10-input MNIST and f 15-input MedNIST. g Comparison of transmission under mean absolute phase-error conditions of 0.89 radians (MNIST) and 1.18 radians (MedNIST) with respect to the reference phase ϕr for a random test-sample image from the MNIST and MedNIST datasets, respectively. The mean absolute phase error is defined as the mean absolute deviation of the input phases from their ideal reference values across all channels.
PNN model scalability to larger neural networks
We scale the model to larger neural networks by stacking PNNs with interleaved photodetector nonlinearity54, where each PNN is inverse-designed for task-optimized optical-field transformations. Figure 7 illustrates the schematic architecture of the PNN model scaled to larger neural networks. Input images are segmented into patches, encoded as feature vectors, and processed by pairs of consecutive PNN cores. The first PNN maps input features to an optical-field distribution across class channels (Fig. 7 Inset a), which is converted after photodetection into a series of patch class-activation vectors55. The second PNN re-encodes this vector into refined patch features (Fig. 7 Inset b), which become increasingly aligned with class prototypes as they propagate to the next stage. Each pair is analogous to a layer block in multilayer DNNs56, where blocks are stacked to extract and refine feature representations57. Successive PNN blocks progressively refine these class-level embeddings, filtering out patches with low confidence (e.g., background) and sharpening those with high class correlation, generating richer interpretability of feature-to-class associations58. This depth-wise refinement expands the effective receptive field59, while a sliding window with a fixed stride integrates broader spatial context. At the network output, patch-level predictions are confidence-weighted via monitored optical power and aggregated to yield the final image-level classification result (Fig. 7 Inset c). Different output encoding schemes, such as one-hot, binary, or error-correcting output codes (ECOC)60, are viable for classification, where multi-class decisions are recovered by code-word decoding at inference54. Weight-sharing across the network width enables patches to be evaluated in parallel61. Moreover, optical multiplexing62, such as wavelength or polarization multiplexing, increases inference throughput and enables multi-task operation on the same chip (see Supplementary Note 2).
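The stacked-block forward pass described above can be sketched as follows. This is a toy numerical model, not the authors' simulator: each inverse-designed PNN core is stood in for by a fixed complex matrix (exploiting the linearity of Maxwell's equations), photodetection is the |E|² nonlinearity, and the patch count, dimensions, and confidence weighting are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, B = 16, 10   # patch feature dimension, number of class channels
P = 36          # patches per image from a sliding window (assumed)

# Stand-ins for inverse-designed PNN cores: fixed complex linear operators
W1 = rng.normal(size=(B, N)) + 1j * rng.normal(size=(B, N))  # features -> class channels
W2 = rng.normal(size=(N, B)) + 1j * rng.normal(size=(N, B))  # re-encode to features

def pnn_block(x):
    """One layer block: PNN core -> photodetector nonlinearity -> re-encode."""
    acts = np.abs(W1 @ x) ** 2          # photodetection |E|^2: class activations
    feats = np.abs(W2 @ acts)           # second core refines patch features
    return feats, acts

patches = rng.normal(size=(P, N))       # encoded patch feature vectors
feats, acts = zip(*(pnn_block(p) for p in patches))
acts = np.array(acts)                   # shape (P, B)

# Confidence-weighted pooling: weight each patch by its peak-power fraction
conf = acts.max(axis=1) / acts.sum(axis=1)
image_logits = (conf[:, None] * acts).sum(axis=0)
print("predicted class:", image_logits.argmax())
```

In a deeper network, `feats` would be re-encoded onto the optical inputs of the next block, mirroring the depth-wise refinement in Fig. 7.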
By using dual wavelengths, we demonstrate a 30 × 20 µm² single-chip design which simultaneously classifies MNIST and MedNIST datasets, achieving 95.1% and 98.0% test accuracy, respectively (see Supplementary Note 2).
Fig. 7. Schematic diagram of PNN model scalable to larger neural networks.
Images are segmented into patches by a sliding window and encoded as feature vectors. Patches are processed by a sequence of inverse-designed PNN cores with interleaved photodetector nonlinearities (NL). Width scalability is achieved by weight sharing across patches, while depth scalability arises from stacking cores with re-extraction of overlapping patches. Within each layer block, the first PNN core maps a feature vector to an optical-field distribution over channels for class evidence (Inset a) and forms the patch class-activation vector after photodetection. This vector is re-encoded onto the optical input ports of the second PNN (Inset b). These inputs are used to refine and embed class-prototype information in patch features before propagating to the next PNN core. Repeating this process sharpens per-patch class attributions, which are aggregated by confidence-weighted pooling into image-level predictions (Inset c). The same forward operations occur during training. In addition, pooling is applied at intermediate layers for auxiliary supervision.
A patch-efficient adjoint method (see Supplementary Note 3) is incorporated into the inverse design of each PNN core, ensuring that the training cost is independent of the number of image patches. For a PNN core with input dimension N and output dimension B, the number of FDTD solves per epoch is fixed at N + B under single-wavelength operation (see Methods). This scaling keeps training tractable even for large-class datasets. For instance, with a 4 × 4 patch kernel (N = 16) on ImageNet (1000 classes), where B = 16 is determined by ECOC coding54 with a 10-bit minimum code length plus 6 redundancy bits for error correction (see Methods), each PNN core design update requires only 32 FDTD solves.
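The solve-count arithmetic above can be made explicit with a short sketch. The function names are illustrative, but the formulas follow directly from the text: one forward solve per input port plus one adjoint solve per output port, with the ECOC code length given by the minimum binary code for the class count plus redundancy bits.

```python
import math

def fdtd_solves_per_epoch(n_inputs, n_outputs):
    """Per-epoch adjoint cost for one PNN core: N + B FDTD solves,
    independent of the number of image patches."""
    return n_inputs + n_outputs

def ecoc_code_length(n_classes, redundancy_bits):
    """Minimum binary code length for n_classes plus error-correction bits."""
    return math.ceil(math.log2(n_classes)) + redundancy_bits

# ImageNet example from the text: 1000 classes -> 10-bit code + 6 redundancy bits
B = ecoc_code_length(1000, 6)          # B = 16 output channels
print(fdtd_solves_per_epoch(16, B))    # 4x4 patch kernel gives N = 16 -> 32 solves
```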
Discussion
In this paper, we demonstrate ultra-compact inverse-designed PNN accelerators for on-chip image classification, based on a scalable 3D-FDTD inverse-design methodology. By harnessing the inherent linearity of Maxwell’s equations, our approach achieves computationally efficient and highly parallelizable optimization, enabling effective handling of increased dataset sizes. Experimental results on two SOI chips reveal classification accuracies of 89% (MNIST) and 90% (MedNIST) within small footprints of 20 × 20 µm² and 30 × 20 µm², respectively. These results highlight the promise of inverse-designed PNNs for emerging analog optical computing. Specifically, such designs can act as surrogate pre-processing stages or replace a digital fully connected output layer before classification, helping to alleviate the computational burden on electronic hardware. Looking ahead, our proposed framework readily scales to accommodate increased numbers of input channels, benefiting from its intrinsic parallelizability and computational efficiency. This scalability opens opportunities for processing higher-dimensional and larger-scale datasets. Multiplexing methods in the time63–65 or frequency domain65 can be seamlessly integrated into our inverse-design process to encode additional degrees of freedom, enhancing model capacity within ultra-compact footprints. Moreover, incorporating high-speed modulation schemes, such as those based on the plasma dispersion effect66, electro-absorption67, and the Pockels effect68,69, along with fast photodetectors and analog-to-digital converters (ADCs)70, can facilitate high-bandwidth data interfacing for the PNNs. Finally, our approach can be integrated into existing reconfigurable photonic architectures, offering a unified framework that bridges the gap between in situ training capabilities and passive mathematical operations in the photonic domain.
Methods
Fabrication-informed inverse-design parametrization
At each inverse-design optimization step, we consider the parametrization of a latent feature space into a fabrication-feasible, binarized material index distribution. These latent parameters vary continuously between 0 and 1, corresponding to SiO2 (ϵmin) and Si (ϵmax), respectively, for the SOI waveguide. The feature space is randomly sampled from a uniform distribution and then low-pass filtered using a Gaussian blur kernel71. To embed fabrication constraints into the design, we further low-pass filter via a conic kernel41. This filtered latent space is projected onto a binary distribution using a modified hyperbolic tangent function36 and is fully binarized to enforce a real material permittivity matrix. The material is biased toward Si (η < 0.5), where η is the material biasing parameter. A B-spline approximation is utilized to further enforce minimum feature-size and radius-of-curvature constraints in the graphic data system (GDS) design file, while facilitating a spatially adaptive resolution and anisotropic topology, subject to considered placement of control points38. A discrete contour defining the boundary between material and void regions is first constructed using the Suzuki-Abe algorithm72, followed by small-feature removal using the skimage.morphology package in Python. The B-spline curve itself is a linear combination of 2D basis functions, expressed using the Cox-de Boor recursion formula39,40. By tailoring both the degree k of the curve and the number of knots, the density of the approximated curves is effectively relaxed, thereby enforcing minimum-feature and curvature constraints. In our case, a cubic (k = 3) B-spline approximation maps the discrete material voxels to a topology with a minimum feature size of 80 nm. The open-source Python library gdspy is used to translate these control points into a GDS file at each design epoch.
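The filter-and-project stage of the pipeline above can be sketched numerically. This is an illustrative NumPy-only implementation, not the authors' code: the conic-kernel radius, projection steepness β, and permittivity values (Si ε ≈ 12.1, SiO2 ε ≈ 2.1 near 1550 nm) are assumptions, and the projection uses the standard smoothed-Heaviside form from topology optimization36.

```python
import numpy as np

def conic_kernel(radius):
    """Cone-shaped low-pass kernel enforcing a minimum length scale."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = np.maximum(0.0, radius - np.hypot(x, y))
    return k / k.sum()

def filter2d(rho, kernel):
    """Zero-padded 2D convolution (plain NumPy, no SciPy dependency)."""
    r = kernel.shape[0] // 2
    padded = np.pad(rho, r)
    out = np.zeros_like(rho)
    for i in range(rho.shape[0]):
        for j in range(rho.shape[1]):
            out[i, j] = np.sum(padded[i:i + 2 * r + 1, j:j + 2 * r + 1] * kernel)
    return out

def project(rho, beta=8.0, eta=0.4):
    """Smoothed-Heaviside (tanh) projection toward binary Si/SiO2;
    eta < 0.5 biases the design toward Si, as in the text."""
    return (np.tanh(beta * eta) + np.tanh(beta * (rho - eta))) / (
        np.tanh(beta * eta) + np.tanh(beta * (1.0 - eta)))

rng = np.random.default_rng(0)
latent = rng.uniform(size=(64, 64))         # latent design variables in [0, 1]
rho_f = filter2d(latent, conic_kernel(4))   # fabrication-aware low-pass filter
rho_p = project(rho_f)                      # near-binary material distribution
eps = 2.1 + (12.1 - 2.1) * np.round(rho_p)  # full binarization: SiO2 vs Si
```

In the actual flow, the binarized distribution would then be contour-traced and B-spline-smoothed before GDS export.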
Numerical simulation setup
To simulate the optical field distribution, a full-wave 3D-FDTD method (ANSYS Lumerical) is used. Within the design region, a uniform mesh with a step size of less than λ/30 is chosen to match the dimensions of the parameter space, and a non-uniform mesh is used outside it. A lightweight, semi-supervised autoencoder is used to learn a low-dimensional latent representation of the input data, which is then normalized and encoded onto the amplitude of light at 1550 nm. The open-source Python framework cupy is used to compute and store the 128-bit complex fields, each of size [nx, ny, nz], in the GPU’s temporary memory. For both designs, a mini-batch size of 250 training samples is chosen to smooth the gradient distribution and mitigate oscillatory behavior in optimization updates. The latent weights are randomly initialized, and Adam73 is used as the stochastic gradient optimizer, with a learning rate of 0.005 and exponential decay rates of 0.667 and 0.9 for the first- and second-moment estimates, respectively.
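For concreteness, a single Adam update with the stated hyperparameters (learning rate 0.005, decay rates β1 = 0.667 and β2 = 0.9) can be sketched as below. This is the standard Adam rule73 written out in NumPy, not the authors' optimizer code; the gradient values are illustrative.

```python
import numpy as np

# Adam hyperparameters stated in the text
lr, beta1, beta2, eps = 0.005, 0.667, 0.9, 1e-8

def adam_step(w, grad, m, v, t):
    """One Adam update with bias-corrected moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)          # first-moment bias correction
    v_hat = v / (1 - beta2**t)          # second-moment bias correction
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w = np.zeros(4); m = np.zeros(4); v = np.zeros(4)
grad = np.array([0.1, -0.2, 0.3, 0.0])  # illustrative mini-batch gradient
w, m, v = adam_step(w, grad, m, v, t=1)
```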
Device fabrication and prototyping
Both the MNIST and MedNIST PNN accelerators are fabricated via electron-beam lithography with a foundry-compatible minimum feature size (80 nm)74 on a standard SOI wafer with a 220 nm-thick top silicon layer on a 2 μm-thick buried oxide layer, above a 725 μm-thick silicon substrate. VGCs43,75 are fabricated with an etch depth of 70 nm. A plasma-enhanced chemical vapor deposition process is used to deposit a 1 μm layer of SiO2 onto the fabricated devices as an insulation layer between the electrodes and the device, which is sufficiently thick to minimize excess absorption due to the metal electrodes76. To facilitate thermo-optic tuning, thin-film titanium microheaters and gold traces are fabricated atop the SiO2 cladding layer via a maskless lithography system (Heidelberg MLA100). For packaging and electrical interfacing, the devices are mounted onto a custom-designed PCB, which utilizes electroless nickel electroless palladium immersion gold contacts to provide the wire-bonding material interface and electrical connectivity to a central MCU. The photonic chip is affixed to a thermoelectric cooler to ensure thermal stabilization. Temperature regulation is achieved via a thermistor embedded near the chip and connected through plated vias to the PCB.
Data preparation, calibration, and phase robustness testing
Input light at 1550 nm from a tunable laser (Keysight N7778C) with a relative intensity noise of less than −150 dB/Hz77 is fiber-coupled into the chip. Sets of 10 and 15 MZIs are used for the MNIST and MedNIST PNN devices, respectively, to facilitate amplitude control during data preparation. Each MZI is composed of a pair of 50:50 MMI splitters, integrated tapers, and S-bends with a 30 µm arc radius. Downstream of the MZIs, asymmetric 10:90 MMIs are employed to tap a small portion of optical power for monitoring the attenuation in each channel, measured using multiport optical power meters (Keysight N7744C) with a dynamic range of −65 dB (exceeding 80 dB with range switching)78 that incorporate built-in averaging to suppress random thermal and shot noise at the photodiode readout79. To monitor optical phase, adjacent channels are similarly tapped using 10:90 MMIs, and the extracted signals are recombined through 50:50 MMI combiners. The phase difference between channels is inferred from the combined optical power, where maximized combined power corresponds to phase equalization. For both MNIST and MedNIST chips, a stratified random sample of 10 examples per class is taken from the simulated test set to form the experimental validation set, demonstrating diverse feature representation (see Supplementary Fig. 6). The corresponding input channels are attenuated using MZI-based amplitude control, with feedback from the optical power meter, while simultaneously monitoring and equalizing the relative phase between channels. For both amplitude and phase calibration, the voltage generated by a digital-to-analog converter is swept between 1.1 V and 5 V in increments of 3 mV. The optimal voltage is identified by minimizing the error between the ground-truth and experimental optical transmission at 1550 nm.
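The voltage-sweep calibration loop can be sketched as follows. This is an illustrative stand-in, not the instrument-control code: the sweep bounds and step match the text, but the MZI transmission model (heater power, and hence phase, proportional to V²) and all function names are assumptions.

```python
import numpy as np

def calibrate_channel(target_T, transmission, v_min=1.1, v_max=5.0, step=3e-3):
    """Sweep the DAC voltage in 3 mV steps and pick the setting whose
    measured transmission best matches the ground-truth target."""
    voltages = np.arange(v_min, v_max + step, step)
    errors = np.abs(transmission(voltages) - target_T)
    return voltages[np.argmin(errors)]

# Hypothetical thermo-optic MZI transmission vs. heater voltage
# (phase shift scales with dissipated power, i.e. with V^2)
def mzi_T(v):
    return np.cos(0.35 * v**2) ** 2

v_opt = calibrate_channel(0.5, mzi_T)
print(f"optimal voltage: {v_opt:.3f} V")
```

In the experiment, `transmission` would be replaced by a live readout from the multiport power meter rather than an analytic model.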
For phase robustness testing, the phase monitoring ports are used to identify the optimal phase corresponding to the highest classification result; this phase then serves as the reference. The applied phase variation relative to this reference is inferred from the electrical power delivered to the thermo-optic phase shifters, based on their known tuning characteristics80.
Training framework and scalability
Each PNN core is trained to implement task-specific optical-field transformations by minimizing a cross-entropy (CE) loss1. Class information is embedded through one-hot, binary, or ECOC coding60, with ECOC advantageous for large-class datasets because it reduces the number of output channels required54. A patch-efficient adjoint method (see Supplementary Note 3) is incorporated into the optimization, whereby patch contributions are aggregated algebraically before gradient computation. This ensures that the total number of FDTD solves required per epoch per PNN core depends only on the input and output dimensions (N + B) and is independent of the number of image patches (see Supplementary Note 3). The optimization incorporates nonlinear photodetection, which converts optical logits into class-activation vectors and provides interpretable measures of patch-to-class association55. To stabilize training and ensure global consistency, auxiliary supervision can be introduced by aggregating patch-level predictions with confidence weighting at intermediate layers, enforcing agreement with image-level classification. Training proceeds block by block, where each consecutive pair of PNN cores is optimized jointly and then frozen after convergence. Overlapping patches are re-extracted and passed to the next stage, enabling depth scalability through successive refinement. Reuse of pre-trained shallow blocks as priors can further improve efficiency by reducing the number of gradient updates needed for deeper layers81. The final PNN core maps refined patch features into class activations, and confidence-weighted pooling across patches produces the image-level prediction. All optimizations are performed in simulation, and the resulting geometries are used directly for inference in the fabricated device.
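The ECOC decoding step mentioned above can be sketched with a toy example. This is an illustrative minimum-Hamming-distance decoder60, not the authors' code: the 4-class, 4-bit codebook and the power threshold are assumptions chosen for clarity.

```python
import numpy as np

# Toy ECOC codebook: one binary code word per class (rows)
codebook = np.array([[0, 0, 1, 1],
                     [0, 1, 0, 1],
                     [1, 0, 1, 0],
                     [1, 1, 0, 0]])

def ecoc_decode(bits):
    """Assign the class whose code word has minimum Hamming distance."""
    return int(np.argmin(np.abs(codebook - bits).sum(axis=1)))

# Photodetected output-channel powers thresholded into bits
powers = np.array([0.9, 0.2, 0.7, 0.1])
bits = (powers > 0.5).astype(int)   # -> [1, 0, 1, 0]
print(ecoc_decode(bits))            # class 2
```

With redundancy bits, the decoder tolerates single-channel read errors, which is what makes ECOC attractive for large-class optical readout.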
Supplementary information
Description of Additional Supplementary Files
Acknowledgements
We thank S. Desai for discussions and assistance on device simulation, A. See for assistance with wire-bonding, L. Farar for assistance with device characterization, and M. Kwok for assistance with device testing. The authors acknowledge the facilities as well as the scientific and technical assistance of the Research and Prototype Foundry Core Research Facility at the University of Sydney, a part of the Australian National Fabrication Facility. We also acknowledge the Sydney Informatics Hub for providing cloud computing resources, and the Royal Society of New South Wales for the Bicentennial Postgraduate Scholarship. J.S., G.L., and D.M. acknowledge support from Research Training Program Scholarships from the University of Sydney. This work was supported by the Sydney Research Accelerator Fellowship.
Author contributions
X.Y., L.L., S.S., and J.S. conceived the experiment. L.L. and S.S. fabricated the devices. J.S. and D.M. performed device modeling and simulations. G.L. and J.S. performed HPC cluster acceleration for device simulation. J.S., S.S., and L.L carried out the device measurement and characterization. J.S., S.S., and X.Y. wrote the manuscript with contributions from all authors. All coauthors contributed to discussions of the protocol and results. X.Y. supervised the project.
Peer review
Peer review information
Nature Communications thanks Charis Mesaritakis and the other, anonymous, reviewer for their contribution to the peer review of this work. A peer review file is available.
Data availability
The data that support the findings of this study are available from the corresponding authors on request.
Code availability
The 3D-FDTD simulations were performed using Lumerical FDTD Solutions (Ansys Inc.). The codes generated during this study are available from the corresponding author upon request.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-026-68648-1.
References
- 1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
- 2. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
- 3. Fu, T. et al. Optical neural networks: progress and challenges. Light Sci. Appl. 13, 263 (2024).
- 4. Liu, W. et al. Ultra-compact multi-task processor based on in-memory optical computing. Light Sci. Appl. 14, 134 (2025).
- 5. Cheng, J. et al. Multimodal deep learning using on-chip diffractive optics with in situ training capability. Nat. Commun. 15, 6189 (2024).
- 6. Fu, T. et al. Photonic machine learning with on-chip diffractive optics. Nat. Commun. 14, 70 (2023).
- 7. Shen, Y. et al. Deep learning with coherent nanophotonic circuits. Nat. Photon. 11, 441–446 (2017).
- 8. Filipovich, M. J. et al. Silicon photonic architecture for training deep neural networks with direct feedback alignment. Optica 9, 1323–1332 (2022).
- 9. Ashtiani, F., Geers, A. J. & Aflatouni, F. An on-chip photonic deep neural network for image classification. Nature 606, 501–506 (2022).
- 10. Pai, S. et al. Experimentally realized in situ backpropagation for deep learning in photonic neural networks. Science 380, 398–404 (2023).
- 11. Xu, D., Ma, Y., Jin, G. & Cao, L. Intelligent photonics: a disruptive technology to shape the present and redefine the future. Engineering 46, 186–213 (2025).
- 12. Momeni, A. et al. Training of physical neural networks. Nature 645, 53–61 (2025).
- 13. Wetzstein, G. et al. Inference in artificial intelligence with deep optics and photonics. Nature 588, 39–47 (2020).
- 14. McMahon, P. L. The physics of optical computing. Nat. Rev. Phys. 5, 717–734 (2023).
- 15. Gholami, A. et al. AI and memory wall. IEEE Micro 44, 33–39 (2024).
- 16. Zhu, H. H. et al. Space-efficient optical computing with an integrated chip diffractive neural network. Nat. Commun. 13, 1044 (2022).
- 17. Li, Z. et al. Inverse design enables large-scale high-performance meta-optics reshaping virtual reality. Nat. Commun. 13, 2409 (2022).
- 18. Yang, K. Y. et al. Multi-dimensional data transmission using inverse-designed silicon photonics and microcombs. Nat. Commun. 13, 7862 (2022).
- 19. Goel, S. et al. Inverse design of high-dimensional quantum optical circuits in a complex medium. Nat. Phys. 20, 232–239 (2024).
- 20. Sun, A. et al. Edge-guided inverse design of digital metamaterial-based mode multiplexers for high-capacity multi-dimensional optical interconnect. Nat. Commun. 16, 2372 (2025).
- 21. Molesky, S. et al. Inverse design in nanophotonics. Nat. Photon. 12, 659–670 (2018).
- 22. Wu, Z., Zhou, M., Khoram, E., Liu, B. & Yu, Z. Neuromorphic metasurface. Photon. Res. 8, 46–50 (2020).
- 23. Hughes, T. W., Williamson, I. A. D., Minkov, M. & Fan, S. Wave physics as an analog recurrent neural network. Sci. Adv. 5, eaay6946 (2019).
- 24. Khoram, E. et al. Nanophotonic media for artificial neural inference. Photon. Res. 7, 823–827 (2019).
- 25. Feng, F. et al. Symbiotic evolution of photonics and artificial intelligence: a comprehensive review. Adv. Photon. 7, 024001 (2025).
- 26. Miller, D. A. B. All linear optical devices are mode converters. Opt. Express 20, 23985–23993 (2012).
- 27. Meng, Y. et al. Optical meta-waveguides for integrated photonics and beyond. Light Sci. Appl. 10, 235 (2021).
- 28. Nikkhah, V. et al. Inverse-designed low-index-contrast structures on a silicon photonics platform for vector-matrix multiplication. Nat. Photon. 18, 501–508 (2024).
- 29. Jiang, A., Shi, S., Jin, G. & Prather, D. W. Performance analysis of three-dimensional high-index contrast dielectric waveguides. Opt. Express 12, 633–643 (2004).
- 30. Hammer, M. & Ivanova, O. V. Effective index approximations of photonic crystal slabs: a 2-to-1-D assessment. Opt. Quant. Electron. 41, 267–283 (2009).
- 31. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
- 32. Halabi, S. S. et al. The RSNA pediatric bone age machine learning challenge. Radiology 290, 498–503 (2019).
- 33. Wang, X. et al. ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2097–2106 (IEEE, 2017).
- 34. Clark, K. et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26, 1045–1057 (2013).
- 35. Schubert, M. F., Cheung, A. K., Williamson, I. A. D., Spyra, A. & Alexander, D. H. Inverse design of photonic devices with strict foundry fabrication constraints. ACS Photon. 9, 2327–2336 (2022).
- 36. Wang, F., Lazarov, B. S. & Sigmund, O. On projection methods, convergence and robust formulations in topology optimization. Struct. Multidisc. Optim. 43, 767–784 (2011).
- 37. Soille, P. Morphological Image Analysis: Principles and Applications 2nd edn, 316 (Springer-Verlag New York, 2003).
- 38. Khoram, E., Qian, X., Yuan, M. & Yu, Z. Controlling the minimal feature sizes in adjoint optimization of nanophotonic devices using B-spline surfaces. Opt. Express 28, 7060–7069 (2020).
- 39. De Boor, C. On calculating with B-splines. J. Approx. Theory 6, 50–62 (1972).
- 40. Cox, M. G. The numerical evaluation of B-splines. IMA J. Appl. Math. 10, 134–149 (1972).
- 41. Hammond, A. M., Oskooi, A., Johnson, S. G. & Ralph, S. E. Photonic topology optimization with semiconductor-foundry design-rule constraints. Opt. Express 29, 23916–23938 (2021).
- 42. Hughes, T. W., Minkov, M., Williamson, I. A. D. & Fan, S. Adjoint method and inverse design for nonlinear nanophotonic devices. ACS Photon. 5, 4781–4787 (2018).
- 43. Chen, Y. et al. Reflective microring-resonator-based microwave photonic sensor incorporating a self-attention assisted convolutional neural network. Appl. Opt. 63, D59–D66 (2024).
- 44. Dong, B. et al. Partial coherence enhances parallelized photonic computing. Nature 632, 55–62 (2024).
- 45. Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R. & Bengio, Y. Binarized neural networks. In Proc. 30th International Conference on Neural Information Processing Systems 4114–4122 (NIPS, 2016).
- 46. Lin, T.-Y. et al. Microsoft COCO: common objects in context. In Proc. European Conference on Computer Vision 740–755 (Springer, 2014).
- 47. Hammond, A. M. et al. High-performance hybrid time/frequency-domain topology optimization for large-scale photonics inverse design. Opt. Express 30, 4467–4491 (2022).
- 48. Kang, C. et al. Large-scale photonic inverse design: computational challenges and breakthroughs. Nanophotonics 13, 3765–3792 (2024).
- 49. Villani, C. Optimal Transport: Old and New Vol. 338 (Springer, 2009).
- 50. Efron, B. & Tibshirani, R. J. An Introduction to the Bootstrap (CRC Press, 1994).
- 51. Zhang, Z. et al. High-efficiency two-dimensional perfectly vertical grating coupler with ultra-low polarization-dependent loss and large fibre misalignment tolerance. IEEE J. Quantum Electron. 57, 8400407 (2021).
- 52. Probst, M. J., Khurana, A., Slaby, J. B., Hammond, A. M. & Ralph, S. E. Fabrication tolerant multi-layer integrated photonic topology optimization. Opt. Express 32, 31448–31462 (2024).
- 53. Wang, K. et al. Robust inverse design of digital photonic devices for photonic integrated circuits. Opt. Express 33, 130–143 (2025).
- 54. Xu, Z. et al. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science 384, 202–209 (2024).
- 55. Brendel, W. & Bethge, M. Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In Proc. 7th International Conference on Learning Representations (ICLR, 2019).
- 56. Alzubaidi, L. et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J. Big Data 8, 53 (2021).
- 57. Hou, L. et al. Patch-based convolutional neural network for whole slide tissue image classification. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2424–2433 (IEEE, 2016).
- 58. Choromanski, K. et al. Rethinking attention with performers. In Proc. 9th International Conference on Learning Representations (ICLR, 2021).
- 59. Luo, W., Li, Y., Urtasun, R. & Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. In Proc. 30th Conference on Neural Information Processing Systems 4905–4913 (NIPS, 2016).
- 60. Dietterich, T. G. & Bakiri, G. Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995).
- 61. Wang, X. et al. Integrated photonic encoder for low-power and high-speed image processing. Nat. Commun. 15, 4510 (2024).
- 62. Zhang, Y., Zhang, K., Hu, P., Li, D. & Feng, S. Multi-wavelength diffractive optical neural network integrated with 2D photonic crystals for joint optical classification. Nanophotonics 14, 2891–2899 (2025).
- 63. Lin, Z. et al. 120 GOPS photonic tensor core in thin-film lithium niobate for inference and in situ training. Nat. Commun. 15, 9081 (2024).
- 64. Song, Y. et al. Integrated electro-optic digital-to-analogue link for efficient computing and arbitrary waveform generation. Nat. Photon. 19, 1107–1115 (2025).
- 65. Ou, S. et al. Hypermultiplexed integrated photonics-based optical tensor processor. Sci. Adv. 11, eadu0228 (2025).
- 66. De Marinis, L., Andriolli, N. & Contestabile, G. Analysis of integration technologies for high-speed analog neuromorphic photonics. IEEE J. Select. Top. Quant. Electron. 29 (2023).
- 67. Zhou, X., Yi, D., Chan, D. W. U. & Tsang, H. K. Silicon photonics for high-speed communications and photonic signal processing. npj Nanophotonics 1, 27 (2024).
- 68. Hu, Y. et al. Integrated lithium niobate photonic computing circuit based on efficient and high-speed electro-optic conversion. Nat. Commun. 16, 8178 (2025).
- 69. Zhang, S. et al. Thin-film lithium niobate photonic circuit for ray tracing acceleration. Nat. Commun. 16, 5938 (2025).
- 70. Shekhar, S. et al. Roadmapping the next generation of silicon photonics. Nat. Commun. 15, 751 (2024).
- 71. Gonzalez, R. C. & Woods, R. E. Digital Image Processing 3rd edn (Prentice-Hall, 2007).
- 72. Suzuki, S. & Abe, K. Topological structural analysis of digitized binary images by border following. Comput. Vis. Graph. Image Process. 30, 32–46 (1985).
- 73. Kingma, D. P. & Ba, J. L. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (ICLR, 2015).
- 74. Rakowski, M. et al. 45 nm CMOS - silicon photonics monolithic technology (45CLO) for next-generation, low power and high speed optical interconnects. In Optical Fiber Communications Conference and Exhibition (IEEE, 2020).
- 75. Taillaert, D., Bienstman, P. & Baets, R. Compact efficient broadband grating coupler for silicon-on-insulator waveguides. Opt. Lett. 29, 2749–2751 (2004).
- 76. Powell, K. et al. Integrated silicon carbide electro-optic modulator. Nat. Commun. 13, 1851 (2022).
- 77. Keysight Technologies. Keysight N777-C family of tunable laser sources. Available at https://www.keysight.com/it/en/assets/3119-1067/data-sheets/5992-4217.pdf (2024).
- 78. Keysight Technologies. N7744C and N7745C optical multiport power meters. Available at https://www.keysight.com/it/en/assets/3119-1066/data-sheets/5992-4218.pdf (2023).
- 79. Keysight Technologies. Tips for optimizing power meter / sensor measurement speed. Available at https://www.keysight.com/it/en/assets/7018-03043/application-notes/5990-8471.pdf (2017).
- 80. Liu, S. et al. Thermo-optic phase shifters based on silicon-on-insulator platform: state-of-the-art and a review. Front. Optoelectron. 15, 9 (2022).
- 81. Chen, T., Goodfellow, I. & Shlens, J. Net2Net: accelerating learning via knowledge transfer. In Proc. International Conference on Learning Representations (ICLR, 2016).