ACS Photonics. 2025 Jul 31;12(8):4279–4288. doi: 10.1021/acsphotonics.5c00552

Physics-Guided Hierarchical Neural Networks for Maxwell’s Equations in Plasmonic Metamaterials

Sean Lynch , Jacob LaMountain ‡,*, Bo Fan , Jie Bu §, Amogh Raju , Daniel Wasserman , Anuj Karpatne §, Viktor A Podolskiy
PMCID: PMC12372168  PMID: 40861265

Abstract

While machine learning (ML) has found multiple applications in photonics, traditional “black box” ML models typically require prohibitively large training data sets. Generation of such data, as well as the training processes themselves, consume significant resources, often limiting practical applications of ML. Here, we demonstrate that embedding Maxwell’s equations into ML design and training significantly reduces the required amount of data and improves the physics-consistency and generalizability of ML models, opening the road to practical ML tools that do not need extremely large training sets. The proposed physics-guided machine learning (PGML) approach is illustrated on the example of predicting complex field distributions within hyperbolic meta­material photonic funnels, based on multilayered plasmonic–dielectric composites. The hierarchical network design used in this study enables knowledge transfer and points to the emergence of effective medium theories within neural networks.

Keywords: machine learning, metamaterials, plasmonics, physics-guided machine learning



1. Introduction

Composite materials with engineered optical properties, metamaterials and metasurfaces, are rapidly advancing as platforms for optical communications, sensing, imaging, and computing. The complexity of typical metamaterials makes it almost impossible to understand and optimize their interaction with light based on experimental or analytical theory approaches alone, leaving the problem of light interaction with metamaterials to computational sciences. Currently, finite-difference time domain (FDTD) and finite element methods (FEM) represent industry-standard approaches to understanding the optics of nonperiodic composite media.

Machine learning (ML) techniques, particularly neural networks (NNs), have recently been incorporated into the design, evaluation, and measurement of nanophotonic structures. Properly trained ML tools can be used as surrogate models that predict the spectral response of composites or, more rarely, field distributions within metamaterials. Since ML does not solve the underlying electromagnetic problem, these predictions are significantly faster than brute-force simulations. However, extensive training sets, often featuring ∼10³ to 10⁵ configurations, are required in order to develop high-quality ML models. The time and computational resources needed to generate these data sets, as well as the time and resources needed for the ML training process, are significant and often serve as the main limitation to ML use in computational photonics.

Embedding physics-based constraints (physics-consistency) into the ML training process may be beneficial for the resulting models. ML methods for general solutions of partial differential equations (PDEs) are being developed. However, as of now, these techniques are illustrated on convenient “toy” models and cannot be straightforwardly applied to practical electromagnetic problems. Physics-guided machine learning (PGML) is emerging as a promising platform that can combine data- and physics-driven learning. Notably, previous PGML attempts have focused on dielectric or relatively simple plasmonic composites. Here, we develop PGML models that are capable of predicting electromagnetic fields within plasmonic metamaterials. We illustrate our technique by analyzing the optical response of metamaterial-based photonic funnels: conical structures with strongly anisotropic composite cores that are capable of concentrating light to deep subwavelength areas. We show that physics-based constraints enable training on unlabeled data and significantly improve the accuracy and generalizability of the models. We also attempt to understand the inner workings of the NNs by analyzing the performance of hierarchical models with different data resolutions.

2. Hyperbolic Metamaterial-Based Photonic Funnels

An electromagnetic composite comprising sufficiently thin alternating layers of nonmagnetic materials with permittivities ϵ1, ϵ2 and thicknesses d 1, d 2 (see Figure 1) behaves as a uniaxial medium whose optical axis is perpendicular to the layers (the z direction in this work) and whose diagonal permittivity tensor has components ϵ xx = ϵ yy = ϵ ⊥ = (d 1ϵ1 + d 2ϵ2)/(d 1 + d 2) and ϵ zz = (d 1 + d 2)ϵ1ϵ2/(d 1ϵ2 + d 2ϵ1). Such a material supports the propagation of two types of plane waves that differ in their polarization and have fundamentally different dispersions.
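The two effective-medium formulas above can be evaluated directly; the following minimal NumPy sketch (function name and example layer values are illustrative, with 80 nm layer thicknesses taken from the FEM model described below) computes both tensor components:

```python
import numpy as np

def emt_permittivity(eps1, eps2, d1, d2):
    """Effective permittivity tensor components of a thin two-layer stack
    (layers normal to z), per the effective medium formulas above."""
    eps_perp = (d1 * eps1 + d2 * eps2) / (d1 + d2)            # eps_xx = eps_yy
    eps_zz = (d1 + d2) * eps1 * eps2 / (d1 * eps2 + d2 * eps1)
    return eps_perp, eps_zz

# 50/50 dielectric/plasmonic mixture with 80 nm layers:
# eps_perp reduces to the arithmetic mean of the two permittivities
ep, ez = emt_permittivity(12.15 + 0j, -5.0 + 1.0j, 80e-9, 80e-9)
```

For the 50/50 composition used later in the paper, ϵ ⊥ is simply the average of the layer permittivities and ϵ zz reduces to 2ϵ1ϵ2/(ϵ1 + ϵ2), so opposite signs of ϵ ⊥ and ϵ zz (hyperbolicity) can occur even though each layer is isotropic.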

Figure 1.
(a) Schematic of the photonic funnel with cut-out demonstrating the composite structure of the core; inset shows scanning electron microscopy (SEM) image of the as-fabricated array of funnels. (b) Simulation setup used in FEM-based solutions of Maxwell’s equations; NNs are trained only on a subregion of the FEM data, which contains the funnel; (c,d) wavelength dependence of (c) the permittivity of the highly doped plasmonic components of the funnels and (d) the components of the resulting effective permittivity tensor.

The ordinary waves (which have E⃗ perpendicular to the optical axis) satisfy the dispersion relation k² = ϵ ⊥ω²/c², with k⃗, ω, and c representing the wavevector of the wave, operating angular frequency, and speed of light in vacuum, respectively. This dispersion is identical to that of plane waves propagating in a homogeneous isotropic material with permittivity ϵ ⊥. On the other hand, the extraordinary, or transverse-magnetic (TM), waves (with H⃗ perpendicular to the optical axis) have dispersion (k x² + k y²)/ϵ zz + k z²/ϵ ⊥ = ω²/c². Notably, for anisotropic materials, the dispersion of extraordinary waves is either elliptical or hyperbolic. The topology of the iso-frequency contours strongly depends on the combination of signs of the components of the effective permittivity tensor.

When the components of the permittivity tensor are of opposite signs, the iso-frequency surfaces are hyperboloids. This hyperbolic dispersion has been identified as the enabling mechanism for such unique optical phenomena as negative refraction, strong enhancement of light–matter interaction, and light manipulation in deep subwavelength areas. Hyperbolicity can be achieved by alternating dielectric (ϵ1 = ϵ d > 0) and plasmonic (ϵ2 = ϵ m < 0) layers. In the semiclassical regime (typically, when the layer thickness is ≳ 10 nm), the permittivity of the plasmonic layers as a function of the angular frequency of light is well described by the Drude model

ϵ m(ω) = ϵ ∞(1 − ω p²/(ω² + iγω))   (1)

with ϵ ∞, ω p, and γ being the background permittivity, plasma frequency, and scattering rate, respectively. Here, we use ϵ ∞ = 12.15 and γ = 10¹³ s⁻¹ and parameterize the plasma frequency using the plasma wavelength, λ p, via ω p = 2πc/λ p.

The most common implementation of these metamaterials leverages a 50/50 composition (d 1 = d 2 = d). For such systems, topological transitions occur when the permittivity of the plasmonic layers (ϵ m) and the weighted permittivity of the mixture (ϵ ⊥) change signs. The dispersion of TM waves inside the metamaterial is elliptic for shorter wavelengths, λ < λ p. It changes to type-I hyperbolicity (ϵ ⊥ > 0, ϵ zz < 0) for λ p < λ < λ̃ p, with the renormalized plasma wavelength, λ̃ p, defined by Re(ϵ ⊥(λ̃ p)) = 0. Finally, the dispersion of TM waves in the composite becomes type-II hyperbolic (ϵ ⊥ < 0, ϵ zz > 0) for λ > λ̃ p.
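The sequence of topological transitions described above can be reproduced numerically by combining the Drude model (eq 1) with the 50/50 effective medium formulas. The sketch below is illustrative, not the authors' code; it assumes the dielectric layers have permittivity equal to the background value ϵ ∞ = 12.15, which is consistent with the all-semiconductor platform but not explicitly stated here:

```python
import numpy as np

C = 2.998e8  # speed of light, m/s

def eps_drude(lam, lam_p, eps_inf=12.15, gamma=1e13):
    """Drude permittivity (eq 1) at free-space wavelength lam, with the
    plasma frequency parameterized by the plasma wavelength lam_p."""
    omega = 2 * np.pi * C / lam
    omega_p = 2 * np.pi * C / lam_p
    return eps_inf * (1 - omega_p**2 / (omega**2 + 1j * gamma * omega))

def dispersion_type(lam, lam_p, eps_d=12.15):
    """Classify the TM-wave dispersion of a 50/50 stack from the signs of
    Re(eps_perp) and Re(eps_zz): 'elliptic', 'type-I', or 'type-II'."""
    eps_m = eps_drude(lam, lam_p)
    eps_perp = 0.5 * (eps_d + eps_m)                   # d1 = d2
    eps_zz = 2 * eps_d * eps_m / (eps_d + eps_m)
    if eps_perp.real > 0 and eps_zz.real > 0:
        return "elliptic"
    return "type-I" if eps_perp.real > 0 else "type-II"
```

For λ p = 6 μm this reproduces the expected progression: elliptic below λ p, type-I between λ p and λ̃ p ≈ √2 λ p (where Re(ϵ ⊥) crosses zero), and type-II beyond.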

Photonic funnels, conical waveguides with hyperbolic metamaterial cores, shown in Figure 1, represent excellent examples of structures capable of manipulating light at a deep subwavelength scale. Recent experimental results demonstrate efficient concentration of mid-infrared light with a vacuum wavelength of ∼10 μm to spatial areas as small as ∼300 nm, 1/30th of the operating wavelength, within an all-semiconductor “designer metal” material platform. Further analysis relates the field concentration near the funnels’ tips to the absence of the diffraction limit within the hyperbolic material and to the anomalous internal reflection of light from the funnel sidewall, which forms an interface oblique to the optical axis. Importantly, the optical response of realistic funnels can be engineered at the time of fabrication by controlling the doping of the designer metal layers and thereby adjusting the plasma frequency of these layers. The unusual electromagnetic response, strong field confinement, and significant field inhomogeneities make photonic funnels an ideal platform for testing the performance of ML-driven surrogate solvers of Maxwell’s equations.

3. Methods

3.1. Data Set Description and Generation

To construct a sufficiently diverse set of labeled configurations, we used FEM to solve for electromagnetic field distributions in photonic funnels with plasmonic layers of different doping concentrations corresponding to plasma wavelengths of 6 μm, 7 μm, 8.5 μm, 10 μm, and 11 μm. Figure 1c,d illustrates the wavelength-dependent permittivity of plasmonic layers with various doping concentrations as well as the corresponding effective medium response of the layered metamaterials. Note the drastic changes of effective medium response as a function of both wavelength and doping.

For each doping level, wavelength-dependent permittivity and electromagnetic field distributions have been calculated with a commercial FEM-based solver (which takes into account that all fields are proportional to exp(−iϕ), with ϕ being the angular coordinate of the cylindrical reference frame) for free-space wavelengths from 8 to 12 μm with increments of 62.5 nm. The FEM model setup is shown schematically in Figure 1b. Electromagnetic waves that are normally incident on the funnel base are generated by the port boundary condition. Perfectly matched layers and scattering boundary conditions are used to make the outside boundaries of the simulation region completely transparent to electromagnetic waves, thereby mimicking the surrounding infinite space. The model, which explicitly incorporates 80 nm-thick layers in the funnel cores, is meshed with a resolution of at most 40 nm inside the funnel and 200 nm outside it, with the mesh growth factor set to 1.1 to avoid artifacts related to abrupt changes in mesh size.

For every plasma wavelength and operating frequency, the distribution of electromagnetic fields, along with the distributions of permittivities within a small (5 × 12 μm) region of space containing the funnel (see Figure 1b), is interpolated onto a rectangular mesh with resolutions of 12.5 nm and 10 nm along the r and z directions, respectively, forming the basis for the data sets used in the study. Note that selecting this internal region of space from the FEM simulations allows us to (i) implicitly incorporate the proper boundary conditions for both incident and scattered electromagnetic fields and (ii) avoid the implementation of perfectly matched layers, ports, and scattering conditions within the physics-based constraints used in training our NNs.

The original FEM-generated data was then resampled into three separate data sets:

  • low-resolution data set, 20 × 60 pixels with resolution 250 nm × 200 nm in the r and z directions, respectively

  • medium-resolution data set, 100 × 300 pixels with resolution 50 nm × 40 nm

  • high-resolution data set, 200 × 600 pixels with resolution of 25 nm × 20 nm
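One simple way to produce such multi-resolution data sets from the 12.5 nm × 10 nm interpolated grid (which spans 400 × 1200 pixels over the 5 × 12 μm region) is integer block averaging. The sketch below is a hypothetical stand-in for the actual resampling procedure, which the paper does not specify:

```python
import numpy as np

def block_average(field, fr, fz):
    """Downsample a 2D field on a regular (r, z) grid by integer factors
    (fr, fz) via block averaging; trailing rows/columns that do not fill
    a complete block are dropped."""
    nr, nz = field.shape
    f = field[: nr - nr % fr, : nz - nz % fz]
    return f.reshape(nr // fr, fr, nz // fz, fz).mean(axis=(1, 3))

# 400 x 1200 interpolated FEM grid (12.5 nm x 10 nm) -> the three sets
fem = np.random.default_rng(0).standard_normal((400, 1200))
high = block_average(fem, 2, 2)    # 200 x 600, 25 nm x 20 nm
med = block_average(fem, 4, 4)     # 100 x 300, 50 nm x 40 nm
low = block_average(fem, 20, 20)   # 20 x 60,  250 nm x 200 nm
```

The three pixel counts and resolutions then match the low-, medium-, and high-resolution data sets listed above.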

4. Neural Network Architecture

On a fundamental level, approximating solutions of Maxwell’s equations within metamaterials with ML requires a neural network that maps the operating frequency and the distribution of permittivity across the composite to the distribution of electromagnetic fields, a problem that is similar to image transformation. Previous analysis has demonstrated that convolutional neural networks (CNNs) excel in image transformation. Specifically, encoder-decoder, CNN, and U-net architectures have shown success in electromagnetic problems, presumably due to the cores of the networks learning some low-dimensional representation of the solutions. Note, however, that the vast majority of previous ML-driven solvers of Maxwell’s equations have analyzed dielectric composites (where electromagnetic fields are relatively smooth) and were trained on relatively large data sets.

We follow the general approach of constructing U-nets. The design of our networks is summarized in Figure 2. Starting with the pixel resolution of the data set, the proposed CNNs reduce the dimensionality of the problem to 20 × 10 pixels and then expand the resulting distributions to their original size.

Figure 2.
Setup of the CNN used in the study; the three rows represent low-, medium-, and high-resolution networks; boxes represent the size of data as it propagates through the network; arrows represent CNN data operations: each thick solid arrow represents the combination of a (transposed) convolutional layer and a tanh activation layer; thin black and orange arrows represent skip connections; thin red arrows represent input and output.

The linear parts of the network employ standard convolutional and transposed convolutional layers with stride = 1 for those parts of the network that preserve pixel size and with stride >1 for those that perform encoding/downsampling and decoding/upsampling. Hyperbolic tangent activation layers are used to add nonlinearities to the CNN. Combinations of convolutional and tanh layers are marked as thick arrows in Figure 2. In addition, custom layers are introduced to implement skip connections that propagate the vacuum wavelength and permittivity distributions into the depth of the network, both for stability of the resulting NN and to enable evaluation of the physics-consistency of the resulting predictions. These layers operate by directly appending several layers of pixels to the output of a given convolutional layer (thin black arrows in Figure 2) or by first downsampling to the core resolution and then concatenating (thin orange arrows in Figure 2).

The base part of the NN (blue layers in Figure 2) is designed to learn the distribution of the ϕ components of the electric and magnetic fields. Note that in our hierarchical setup, the core of the networks remains the same, independent of the resolution of the data set, with the outer structure producing encoding/decoding from/to the higher resolution. The inner structure of the network (layer dimensionality and filter size) was optimized using the low-resolution data set. The medium- and high-resolution networks build upon this geometry by adding “hierarchical” downsampling and upsampling layers, implemented via convolutional and transposed convolutional layers in our NNs. Our analysis suggests that it is important to initialize the downsampling layers with unit weights, thereby setting the network up for brute-force averaging of permittivity during the initial training iterations.
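The hierarchical wrapping of a fixed-size core can be sketched in PyTorch as follows. This is an illustrative toy, not the authors' implementation: the channel counts, filter sizes, the 100 × 50 input, and the single downsampling factor are invented for the example; only the ideas of a resolution-independent core and averaging-type initialization of the downsampling layer come from the text:

```python
import torch
import torch.nn as nn

class HierarchicalNet(nn.Module):
    """Toy hierarchical CNN: a fixed-size core wrapped by resolution-
    dependent downsampling/upsampling layers (hyperparameters are
    illustrative, not those of the paper)."""
    def __init__(self, factor=5):          # factor=5: 100 x 50 -> 20 x 10
        super().__init__()
        self.down = nn.Conv2d(3, 3, kernel_size=factor, stride=factor)
        # initialize so the layer starts out as plain block averaging
        # (summed over the 3 input channels) of the inputs
        nn.init.constant_(self.down.weight, 1.0 / factor**2)
        nn.init.zeros_(self.down.bias)
        self.core = nn.Sequential(          # resolution-independent core
            nn.Conv2d(3, 16, 3, padding=1), nn.Tanh(),
            nn.Conv2d(16, 4, 3, padding=1), nn.Tanh(),
        )
        self.up = nn.ConvTranspose2d(4, 4, kernel_size=factor, stride=factor)

    def forward(self, x):                   # x: (batch, channels, H, W)
        return self.up(self.core(self.down(x)))
```

With matched kernel size and stride, the transposed convolution restores exactly the original pixel dimensions, and a higher-resolution network can be built by stacking another down/up pair around the same core.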

The physics-agnostic portion of the CNN, which is trained to produce the ϕ components of the magnetic and electric fields, is followed by a physics-informed layer (gray layers in Figure 2), which calculates distributions of the r and z components of the electric and magnetic fields based on analytical expressions derived from Maxwell’s equations

E⃗ rz = i/(ϵω²/c² − 1/r²) [(1/r) D⃗ rz E ϕ − (ω/c) ϕ̂ × D⃗ rz H ϕ]   (2)
H⃗ rz = i/(ϵω²/c² − 1/r²) [(1/r) D⃗ rz H ϕ + ϵ(ω/c) ϕ̂ × D⃗ rz E ϕ]   (3)

where we have introduced the vector differential operator D⃗ rz f = r̂(1/r)∂(rf)/∂r + ẑ∂f/∂z.

Because the fields are discretized on a regular rectangular grid, all derivatives are approximated with finite difference schemes. Forward and backward differences are used at the edges of the computational domain, while central differences are used within it. Our implementation of the CNNs used in this work and the data sets used in training are available on GitHub and Figshare, respectively.
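The finite-difference scheme just described (central differences in the interior, one-sided differences at the edges) is exactly what `numpy.gradient` implements, so the operator D⃗ rz can be sketched in a few lines. This is an illustrative NumPy version, not the paper's code; the grid is assumed to avoid r = 0:

```python
import numpy as np

def d_rz(f, r, z):
    """Finite-difference version of the operator D_rz f introduced above:
    (1/r) d(r f)/dr along r and df/dz along z. np.gradient uses central
    differences in the interior and one-sided differences at the edges,
    matching the scheme described in the text. r must exclude r = 0."""
    rf = r[:, None] * f                          # f sampled on an (r, z) grid
    d_r = np.gradient(rf, r, axis=0) / r[:, None]
    d_z = np.gradient(f, z, axis=1)
    return d_r, d_z
```

As a sanity check, for f = r (independent of z) the r-component is (1/r) d(r²)/dr = 2 at interior points, and the z-component vanishes identically.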

As seen from eq 3, predictions for H r and H z may diverge when ϵr²ω²/c² ≈ 1. This instability is a direct consequence of applying differential operators in a cylindrical geometry. Here, we address this issue by introducing a regularizing function (see below and the Supporting Information). Our approach, illustrated here on the example of a cylindrical geometry, may be generalized to other curvilinear coordinates.

4.1. Knowledge Transfer between Different NNs

As mentioned above, in the limit of ultrathin layers, the optics of multilayer metamaterials can be adequately described by effective medium theory. In a related context, U-shaped NNs are hypothesized to learn low-dimensional representations of the underlying phenomena. These considerations motivate the hierarchical design of the NNs used in this work.

To explore whether the learning outcomes of the NNs are consistent with the effective medium description, we performed a series of experiments in which pretrained lower-resolution networks served as frozen cores of higher-resolution transfer-learning (TL) networks. In these studies, the learning parameters of the pretrained “core” layers were frozen, with only the averaging and transposed-convolution peripheral layers of the higher-resolution NN being trained.

At the implementation level, we drew inspiration from the ResNet architecture’s approach of organizing layers into “residual blocks.” Specifically, we grouped the frozen layers into a single block, with the internal layer weights corresponding to those of the selected pretrained network. The forward function performs the usual computation within the layers of the block; during back-propagation, however, the weight updates bypass the internal layers of the block, passing directly to the previous layer.
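In PyTorch-style frameworks, this freezing behavior is commonly obtained by clearing `requires_grad` on the core parameters: gradients still flow through the frozen block to earlier layers, but its own weights receive no updates. The helper below is a generic sketch of that idea (the function name and key-matching scheme are ours, not the paper's):

```python
import torch
import torch.nn as nn

def freeze_core(model: nn.Module, core_keys):
    """Freeze the pretrained 'core' layers of a network for transfer
    learning: gradients still flow *through* the frozen block, but its
    parameters are excluded from optimization (requires_grad=False)."""
    for name, param in model.named_parameters():
        if any(name.startswith(k) for k in core_keys):
            param.requires_grad = False
    # return only the unfrozen (peripheral) parameters for the optimizer
    return [p for p in model.parameters() if p.requires_grad]
```

A usage example: for `net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 2))`, calling `freeze_core(net, ["1."])` freezes the middle layer and returns the four weight/bias tensors of the outer layers, which would then be handed to the optimizer.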

We explored knowledge transfer from low- to medium-resolution networks as well as from medium- to high-resolution networks.

4.2. Training Protocols

To assess the benefits of the physics-based constraints, three different regimes of training the CNN are explored. In the base-case black-box (BB) scenario, the model minimizes only the radially weighted mean-squared error of the ϕ components of the electric and magnetic fields (directly produced by the physics-agnostic part of the network), L ϕ = ⟨w(r)[|H ϕ Y − H ϕ T|² + |E ϕ Y − E ϕ T|²]⟩. Here, the superscripts Y and T correspond to the predicted and ground-truth fields, respectively, the angled brackets, ⟨···⟩, represent an arithmetic mean over the simulation region, and the radial weight function, w(r), is used to emphasize the region of small radii where the funnel is located.
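The BB loss L ϕ is a few lines of NumPy. The weight function w(r) below, decaying with radius to emphasize the funnel region, is a hypothetical choice; the paper does not specify its exact form here:

```python
import numpy as np

def loss_phi(H_pred, H_true, E_pred, E_true, r, alpha=2.0):
    """Radially weighted MSE of the complex phi-components (the BB loss
    L_phi above). w(r) is an assumed form that emphasizes small radii."""
    w = 1.0 / (1.0 + (r[:, None] / r.max()) ** alpha)   # assumed w(r)
    err = np.abs(H_pred - H_true) ** 2 + np.abs(E_pred - E_true) ** 2
    return np.mean(w * err)                             # <...> over the region
```

Note that the squared modulus is taken before weighting, so the loss handles complex phasor fields directly and vanishes only for a perfect prediction.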

The second, field-enhanced (FE) model utilizes a hybrid loss that combines the above-described L ϕ with its analog for the remaining components of the magnetic field

L FE = L ϕ + L rz   (4)

with L rz = ⟨w(r)|R|²[|H r Y − H r T|² + |H z Y − H z T|²]⟩ and the r and z components of the magnetic field being produced by the physics layer of the CNN.

In order to prevent the instability of eq 3 from dominating the overall loss, we introduce the regularization function, R(r, z), such that R(r, z) → 0 when r → c/(√ϵ(r,z) ω) (see the Supporting Information for details). Because calculation of the r and z field components requires differentiating the ϕ components, the addition of L rz allows the CNN to learn the relationships between the spatial field distributions and the distributions of their derivatives. Importantly, evaluation of both the L ϕ and L rz terms requires the training set to contain solutions of Maxwell’s equations (labeled data).

Finally, physics-guided (PG) training combines the above labeled-data-dependent terms, L ϕ and L rz , with the physics loss

L ph = (1/max|H ϕ Y|)⟨|∂(H z Y R²)/∂r − ∂(H r Y R²)/∂z + i(ω/c)ϵE ϕ Y R² − 2(H z Y R ∂R/∂r − H r Y R ∂R/∂z)|⟩   (5)

which represents the (regularized) residual of Maxwell’s equations for the H ϕ component of the field (see the Supporting Information). Therefore, PG training aims to enforce consistency of the solutions that are generated by the NN with Maxwell’s equations. Notably, an evaluation of the physics loss does not require labeled data. As a result, unlabeled-trained (UL) networks can utilize a combination of labeled and unlabeled data, with the former inherently incorporating the boundary conditions and the latter allowing the expansion of the training set without computing additional PDE solutions. This UL loss was also used in training the TL networks described in the preceding section.
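Because the physics loss needs only the predicted fields, it can be evaluated on unlabeled configurations with the same finite differences used elsewhere. The sketch below follows the regularized residual of eq 5 but, for brevity, omits the normalization by max|H ϕ Y| (noted in the comment); it is an illustration, not the authors' implementation:

```python
import numpy as np

def loss_ph(Hr, Hz, Ephi, eps, R, r, z, omega, c=2.998e8):
    """Regularized residual of the H_phi Maxwell equation (eq 5),
    evaluated with np.gradient finite differences. Needs only predicted
    fields, so it can be computed on unlabeled configurations.
    Normalization by max|H_phi| is omitted in this sketch."""
    dr_HzR2 = np.gradient(Hz * R**2, r, axis=0)
    dz_HrR2 = np.gradient(Hr * R**2, z, axis=1)
    Rr = np.gradient(R, r, axis=0)
    Rz = np.gradient(R, z, axis=1)
    res = (dr_HzR2 - dz_HrR2 + 1j * (omega / c) * eps * Ephi * R**2
           - 2 * (Hz * R * Rr - Hr * R * Rz))
    return np.mean(np.abs(res))
```

Consistent with the discussion of multiple solutions below, the trivial field distribution E⃗ = H⃗ = 0 yields a vanishing physics loss, which is why labeled data (carrying the boundary conditions) remains necessary.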

Previous analysis demonstrated that BB- and PG-loss often compete with each other. Here, this competition reflects the different differentiation schemes used by FEM and the PG-loss as well as the existence of multiple solutions to Maxwell’s equations (for example, the trivial solution E⃗=H⃗=0 ) that do not necessarily satisfy the boundary conditions that are implicitly enforced by labeled data. To guide the network toward the correct implementation of boundary conditions, the weight of the physics-loss, w ph, is dynamically adjusted during training, resulting in the dynamic PG loss

L PG = L ϕ + L rz + w ph L ph   (6)
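Combining the terms of eq 6 with a dynamically adjusted physics weight might look like the following. The linear ramp schedule is a hypothetical stand-in; the paper's actual adjustment rule is described in its Supporting Information:

```python
def pg_loss(l_phi, l_rz, l_ph, epoch, w_max=1.0, ramp=100):
    """Combine the three loss terms (eq 6) with a dynamically adjusted
    physics weight w_ph. The linear ramp over `ramp` epochs is an assumed
    schedule, letting labeled data fix the boundary conditions before the
    physics constraint gains full weight."""
    w_ph = w_max * min(epoch / ramp, 1.0)
    return l_phi + l_rz + w_ph * l_ph
```

Starting with a small w ph and growing it lets the data terms steer the network toward the correct boundary conditions before the physics residual, which the trivial solution also minimizes, dominates the gradient.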

In order to assess the ability of the networks to interpolate and extrapolate between data sets having plasmonic layers with different plasma wavelengths, we train the networks on 50% of the data with plasma wavelengths of 6 and 11 μm or with plasma wavelengths of 7 and 10 μm and add up to 10% of the labeled data from other data sets to the training. The UL models are also provided configurations from the remaining data sets as unlabeled data. The training scenarios are summarized in Table 1, which gives the percentage of each data set that was used as labeled and unlabeled data for each network type. Each training scenario has been used to train at least 10 different networks of each resolution and loss type, with the dynamics of their training and validation loss presented in the Supporting Information and their averaged performance summarized below.

Table 1. Labeled and Unlabeled Training Data Composition of Each Network.

network              labeled %, λ p (μm)          unlabeled %, λ p (μm)
                     6    7    8.5   10   11      6    7    8.5   10   11
BB i, FE i, PG i     50   0    10    0    50      none
TL i, UL i           50   0    10    0    50      0    0    40    0    0
BB e, FE e, PG e     0    50   10    50   0       none
UL e                 0    50   10    50   0       25   0    40    0    25
BB x, FE x, PG x     10   50   10    50   10      none
UL x                 10   50   10    50   10      25   0    40    0    25

5. Results

To demonstrate the impact of physics-based constraints on the accuracy and consistency of NN-predicted fields, we analyze the dependence of the three average losses introduced above (L ϕ, L rz , and L ph) both on the enforcement of physics-consistency and on the presence of unlabeled data during training. Sample field distributions are presented to illustrate the models’ performance. Finally, we analyze the generalizability of the models by evaluating their performance across the plasma wavelengths of the plasmonic layers.

The three components of the loss, L ϕ, L rz, and L ph, are arranged in order of increasing physics-consistency and, simultaneously, decreasing reliance on data. Indeed, L ϕ, which analyzes only the physics-agnostic output of the networks, relies exclusively on data. L rz, which primarily relies on the output of the physics layer, enforces the relationships between the fields at neighboring points [see eqs 2 and 3]. Lastly, L ph exclusively analyzes physics-consistency and pays no regard to data consistency. Our analysis (see below) illustrates that training with L ph not only improves the consistency with eq 5 but also improves other metrics that are related to Maxwell’s equations, such as energy conservation, as analyzed through the Poynting theorem (see the Supporting Information).

5.1. Impact of Physics Information on Accuracy

The performance of the different models is summarized in Figure 3. With the comparatively simple low-resolution model, adding the physics-based layer to the network and adding the L rz component to the loss function provides enough additional information to adequately represent the coarsely sampled data. Providing the network additional physics-based information (by implementing the PG loss) does not quantitatively boost the performance of the model, due to a combination of the model’s simplicity and the mesh being too coarse to resolve the composite structure.

Figure 3.
Performance of NNs with different architectures and training protocols, evaluated on the data that was not used in training; panels (a–d) represent low-resolution (a), medium-resolution (b,c), and high-resolution (d) networks (see Table 1 for network labels); loss metrics of individual predictions are represented as filled semitransparent circles, resulting in the color-coded distributions; solid white markers and black bars represent the mean and standard deviation of these distributions; the purple horizontal lines show the average L ph of all interpolated FEM solutions.

As the resolution and complexity of the model grow, increasing physics-based constraints and adding unlabeled data yield measurable improvements in model performance. Interestingly, the extra physics-consistency (as demonstrated by the improving L ph metric) sometimes comes at the cost of a small increase of L ϕ. This apparent contradiction results from the fact that the data used in training was generated by reinterpolating FEM solutions from a triangular mesh to a rectangular mesh. As a result, the “ground truth” does not yield vanishing L ph. As seen in Figure 3, predictions of the neural net tend to be closer to solutions of Maxwell’s equations on the rectangular mesh than the FEM-sourced data.

A more granular look at the NN predictions is shown in Figure 4, where representative examples of model predictions are compared with FEM solutions. Note that, in contrast to their BB counterparts, PG networks predict smoother fields and resolve individual layers of the structure.

Figure 4.
Representative predictions of the NNs with (a–e) low, (g–k) medium, and (m–r) high resolution; input permittivity is shown in panels (f,l); panels (a,g,m) represent ground truth; panels (b,h,n) show predictions of BB i NNs, panels (c,i,o) predictions of FE i networks, panels (d,j,p) predictions of PG i networks, and panels (e,k,q) predictions of UL i networks; panel (r) illustrates the performance of the TL i network. Note that higher-performing networks resolve field oscillations on the scale of individual layers within the composite and field concentration near the tip of the funnel.

Our results are in agreement with previous studies that focused on predictions of field distributions in dielectric structures trained on relatively large (∼10⁴ configurations) data sets. Incorporation of physics loss in these NNs resulted in substantial (but limited) improvements in physics-consistency (by a factor of ≲2). Here, we see similar dynamics for low-resolution networks that require few labeled-data training inputs to achieve their top performance. At the same time, the physics-consistency of our medium- and high-resolution networks, which are trained in the data-poor regime, is improved by an order of magnitude as a result of the incorporation of physics-based constraints.

5.2. Knowledge Transfer

As mentioned above, we have attempted knowledge transfer from a pretrained low-resolution network to a medium-resolution network and from a pretrained medium-resolution network to its high-resolution counterpart. In both cases, a single average-performing lower-resolution UL network was chosen as the source of the frozen core of the higher-resolution TL networks. Notably, the low-resolution network poorly resolves the individual layers within the composite. Consistent with this, implementation of the PG loss does not substantially improve network performance (see above), and using a pretrained low-resolution network as a learning-free core of its medium-resolution counterpart does not yield adequate performance of the resulting NN.

In contrast, using a pretrained medium-resolution network as a (fixed) core of a high-resolution NN provided reasonable performance. As seen in Figures 3 and 4, the accuracy of the TL i networks falls between that of the fully trained high-resolution FE i and PG i NNs.

The physics of finely stratified composites is analytically described by effective medium theories (EMTs). In the EMT formalism, the spatial distribution of the homogenized (averaged over the scale of the inclusion, ∼d) electromagnetic fields is governed by the effective parameters (here, ϵ ⊥ and ϵ zz). These homogenized fields, along with equations that relate the effective medium parameters to microscopic distributions of permittivity, can then be used to recover fine-scale field distributions.

The analytical procedure described above is somewhat similar to the operation of the hierarchical TL CNN reported in this work. Indeed, CNN-based U-nets are known to learn a low-dimensional representation of the underlying phenomena. From this standpoint, while we do not analyze the neural operation of the CNN in detail, the medium-resolution network is likely to learn some form of materials averaging/field recovery by analyzing the transition between the scale of individual layers (resolved at the entrance and exit of the network) and compact representations in its core. The wraparound parts of the high-resolution TL network likely learn the averaging and upscaling procedures. We reserve the analysis of the relationship between analytical EMT and the operation of TL-based hierarchical CNNs for future work.

By freezing the inner core of the CNN within knowledge transfer networks, we significantly reduce the number of training parameters. Therefore, we expect smaller variability and faster learning in the TL i networks as compared with their fully trained high-resolution PG i counterparts. However, in our implementation, the time required to calculate one training epoch of a TL i network is almost identical to the time required for one epoch of a PG i network, indicating that the time spent updating the learnable NN parameters is significantly less than the time spent executing forward and backward propagation steps. Different implementation and optimization settings may affect this result.

At the same time, further analysis (Supporting Information) suggests that TL i networks converge over a smaller number of epochs than their PG i counterparts. In addition, in our studies, variation between the performance of the best and the worst TL i networks was significantly smaller than the variation between the best and the worst PG i networks.

5.3. Interpolation vs Extrapolation within the Models

As described above (Table 1), the NNs have been trained on multiple subsets of the data derived from FEM solutions, aiming to assess both the correctness and generalizability of the proposed PGML networks. Here, we are particularly interested in the ability of the NNs to generalize the results between different plasma wavelengths of the doped components of the funnels’ cores.

In the “interpolating” models (subscripted i), 50% of the data from the sets representing the lowest and the highest plasma frequencies and an additional 10% from the data set representing the central plasma wavelength were used as labeled training data. The unlabeled networks further included 40% of the central-plasma-wavelength data set as unlabeled data. Therefore, the CNN has to deduce the behavior of the composites with λ p = 7 and 10 μm. For the “extrapolating” (subscripted e) and “extended extrapolating” (subscripted x) networks, a similar approach was used, except with the bulk of the labeled training data coming from the 7 and 10 μm plasma wavelengths, leaving the CNN to deduce the behavior of the metamaterials with λ p = 6 and 11 μm.

Typically, data interpolation is a much simpler problem than data extrapolation. However, this general rule does not hold for our analysis. As seen in Figure 3b,c, the average performance of the two classes of medium-resolution networks is almost identical, indicating that both interpolation and extrapolation tasks (in terms of λ p) in our study present similar difficulty to the NNs.

Figure provides a more in-depth look at this behavior. In general, as characterized by the L_ϕ loss, the networks perform best in predicting the fields within the metamaterials for the plasma wavelength that comprises the majority of their labeled training set. Indeed, L_ϕ is ∼2 times lower for the data whose plasma wavelength is well represented in the training set than for the configurations whose plasma wavelengths contribute few or no instances to the labeled training data. Incorporating physics-based constraints improves the physics-consistency of the results for all values of λ_p by an order of magnitude, indicating that the CNNs learn the general properties of the field distributions but miss the particular boundary conditions that are encoded in the labeled data.

Figure 5. Performance of the medium-resolution networks for predicting the field distribution of composites with given plasma wavelengths; panels (a,c,e) and (b,d,f) represent L_ϕ and L_ph, respectively, for (a,b) interpolating, (c,d) extrapolating, and (e,f) extended extrapolating networks; colors represent training protocols; individual predictions are represented as filled semitransparent circles, resulting in the color-coded distributions; solid white circle markers and black bars represent the means and standard deviations of these distributions.

By comparing the performance of the extrapolating networks to their “extended” counterparts [Figure c,e], it is seen that adding even a small amount of labeled data can partially address this issue of underspecified boundary conditions: introducing ∼20 labeled distributions (total) for λ_p = 6 and 11 μm reduces the λ_p-specific L_ϕ by ∼20%, with almost no effect on L_ph.

Interestingly, in all scenarios, L_ph decreases as a function of λ_p. This behavior traces the strength of the resonance in ϵ_zz, which weakens and moves out of the spectral range of the study as λ_p increases (see Figure d).
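The connection between λ_p and the position of the ϵ_zz resonance can be illustrated with a simple effective-medium estimate. The sketch below is purely illustrative: the Drude background permittivity, dielectric permittivity, and fill fraction are assumed round numbers, not the parameters of the funnels studied here. In this lossless EMT limit, ϵ_zz is the harmonic mean of the layer permittivities and diverges where f/ϵ_m + (1 − f)/ϵ_d = 0, and the pole moves to longer wavelengths as λ_p grows.

```python
import numpy as np

# Illustrative (lossless) effective-medium estimate of the eps_zz resonance
# in a layered plasmonic-dielectric composite. All material parameters below
# are assumed round numbers, not those of the funnels in this work.
eps_inf, eps_d, f = 12.0, 10.0, 0.5  # Drude background, dielectric, fill fraction

def eps_metal(lam_um, lam_p_um):
    """Lossless Drude permittivity; lam_p is the wavelength where eps = 0."""
    return eps_inf * (1.0 - (lam_um / lam_p_um) ** 2)

def eps_zz(lam_um, lam_p_um):
    """EMT permittivity normal to the layers (harmonic mean of the layers)."""
    em = eps_metal(lam_um, lam_p_um)
    return 1.0 / (f / em + (1.0 - f) / eps_d)

lam = np.linspace(5.0, 15.0, 2000)  # spectral window, in um
poles = {}
for lam_p in (6.0, 11.0):
    e = eps_zz(lam, lam_p)
    # eps_zz diverges where f/eps_m + (1 - f)/eps_d = 0, i.e. at
    # lam = lam_p * sqrt(1 + eps_d * f / (eps_inf * (1 - f))) here.
    poles[lam_p] = lam[np.argmax(np.abs(e))]
# As lam_p grows, the pole moves to longer wavelengths and eventually
# leaves a fixed spectral window, consistent with the trend in L_ph.
```

With these assumed parameters the pole sits at λ ≈ 1.35 λ_p, so increasing λ_p from 6 to 11 μm pushes the resonance toward (and past) the long-wavelength edge of the window.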

6. Conclusions

We have presented a hierarchical design of PG neural network surrogate solvers of Maxwell’s equations and demonstrated the proposed formalism by predicting the field distributions in hyperbolic metamaterial-based photonic funnels. We have demonstrated that embedding physics information into the ML process, by enforcing the physics-based constraints and by adding unlabeled training configurations, improves the quality of ML predictions in the regime of limited training data. In particular, physics-guided ML predictions are almost 2 orders of magnitude more physics-consistent than their BB–ML counterparts, even near wavelengths where the layered composite undergoes topological transitions. Separately, we have demonstrated that a hierarchical network architecture enables knowledge transfer from existing pretrained models to higher-resolution NN implementations.

The approach presented can be directly applied to the analysis of complex rotationally symmetric electromagnetic systems. The technique can be straightforwardly extended to quasi-2D geometries with inclusions of various sizes and shapes by using the appropriate coordinate-representations of Maxwell’s equations. The formalism can be further extended to 3D geometries, although we anticipate that such extensions will require significantly larger computational resources.

Supplementary Material

ph5c00552_si_001.pdf (1.1MB, pdf)

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsphotonics.5c00552.

  • Analytical derivation of the regularized physics loss L ph, elaboration on our chosen regularization and radial weight functions, demonstration of the effect of physics consistency on energy conservation, and analysis of the training and validation loss dynamics of our networks (PDF)

The work has been supported by the National Science Foundation (awards# 2004298, 2423215, 2004422, 2026710, and 2239328).

The authors declare no competing financial interest.

A preprint of this work is available on arXiv, DOI: 10.48550/arXiv.2502.17644.

References

  1. Noginov, M.; Podolskiy, V. A. Tutorials in Metamaterials; CRC Press, 2012.
  2. Kildishev, A. V.; Boltasseva, A.; Shalaev, V. M. Planar Photonics with Metasurfaces. Science 2013, 339, 1232009. doi: 10.1126/science.1232009.
  3. Lin, D.; Fan, P.; Hasman, E.; Brongersma, M. L. Dielectric gradient metasurface optical elements. Science 2014, 345, 298–302. doi: 10.1126/science.1253213.
  4. Engheta, N.; Ziolkowski, R. Metamaterials: Physics and Engineering Explorations; Wiley-IEEE Press, 2006.
  5. Epstein, A.; Wong, J. P. S.; Eleftheriades, G. V. Cavity-excited Huygens’ metasurface antennas for near-unity aperture illumination efficiency from arbitrarily large apertures. Nat. Commun. 2016, 7, 10360. doi: 10.1038/ncomms10360.
  6. Elhamod, M.; Bu, J.; Singh, C.; Redell, M.; Ghosh, A.; Podolskiy, V.; Lee, W.-C.; Karpatne, A. CoPhy-PGNN: Learning Physics-Guided Neural Networks with Competing Loss Functions for Solving Eigenvalue Problems. ACM Trans. Intell. Syst. Technol. 2022, 13, 1–23. doi: 10.1145/3530911.
  7. Ghosh, A.; Elhamod, M.; Bu, J.; Lee, W.-C.; Karpatne, A.; Podolskiy, V. A. Physics-Informed Machine Learning for Optical Modes in Composites. Adv. Photonics Res. 2022, 3, 2200073. doi: 10.1002/adpr.202200073.
  8. Yu, N.; Genevet, P.; Kats, M. A.; Aieta, F.; Tetienne, J.-P.; Capasso, F.; Gaburro, Z. Light Propagation with Phase Discontinuities: Generalized Laws of Reflection and Refraction. Science 2011, 334, 333–337. doi: 10.1126/science.1210713.
  9. Banks, J. W.; Henshaw, W. D.; Kildishev, A. V.; Kovačič, G.; Prokopeva, L. J.; Schwendeman, D. W. Solving Maxwell’s Equations with a Generalized Dispersive Material Model on Overset Grids. 2019 International Applied Computational Electromagnetics Society Symposium (ACES); IEEE, 2019; pp 1–2.
  10. Taflove, A.; Hagness, S. Computational Electrodynamics: The Finite-Difference Time-Domain Method; Artech House, 2005.
  11. Jin, J.-M. The Finite Element Method in Electromagnetics; John Wiley & Sons, 2014.
  12. COMSOL, Inc. COMSOL v. 6.3. https://www.comsol.com/ (accessed July 08, 2025).
  13. Malkiel, I.; Mrejen, M.; Nagler, A.; Arieli, U.; Wolf, L.; Suchowski, H. Plasmonic nanostructure design and characterization via Deep Learning. Light: Sci. Appl. 2018, 7, 60. doi: 10.1038/s41377-018-0060-7.
  14. Khatib, O.; Ren, S.; Malof, J.; Padilla, W. J. Deep Learning the Electromagnetic Properties of Metamaterials - A Comprehensive Review. Adv. Funct. Mater. 2021, 31, 2101748. doi: 10.1002/adfm.202101748.
  15. Jiang, J.; Chen, M.; Fan, J. A. Deep neural networks for the evaluation and design of photonic devices. Nat. Rev. Mater. 2021, 6, 679–700. doi: 10.1038/s41578-020-00260-1.
  16. Lin, Z.; Roques-Carmes, C.; Pestourie, R.; Soljačić, M.; Majumdar, A.; Johnson, S. G. End-to-end nanophotonic inverse design for imaging and polarimetry. Nanophotonics 2021, 10, 1177–1187. doi: 10.1515/nanoph-2020-0579.
  17. Kudyshev, Z. A.; Sychev, D.; Martin, Z.; Yesilyurt, O.; Bogdanov, S. I.; Xu, X.; Chen, P.-G.; Kildishev, A. V.; Boltasseva, A.; Shalaev, V. M. Machine learning assisted quantum super-resolution microscopy. Nat. Commun. 2023, 14, 4828. doi: 10.1038/s41467-023-40506-4.
  18. Kudyshev, Z. A.; Kildishev, A. V.; Shalaev, V. M.; Boltasseva, A. Optimizing startshot lightsail design: A generative network-based approach. ACS Photonics 2022, 9, 190–196. doi: 10.1021/acsphotonics.1c01352.
  19. Li, W.; Sedeh, H. B.; Tsvetkov, D.; Padilla, W. J.; Ren, S.; Malof, J.; Litchinitser, N. M. Machine Learning for Engineering Meta-Atoms with Tailored Multipolar Resonances. Laser Photon. Rev. 2024, 18, 2300855. doi: 10.1002/lpor.202300855.
  20. Pestourie, R.; Mroueh, Y.; Nguyen, T. V.; Das, P.; Johnson, S. G. Active learning of deep surrogates for PDEs: application to metasurface design. npj Comput. Mater. 2020, 6, 164. doi: 10.1038/s41524-020-00431-2.
  21. Zhelyeznyakov, M.; Fröch, J.; Wirth-Singh, A.; Noh, J.; Rho, J.; Brunton, S.; Majumdar, A. Large area optimization of meta-lens via data-free machine learning. Commun. Eng. 2023, 2, 60. doi: 10.1038/s44172-023-00107-x.
  22. Ma, W.; Cheng, F.; Xu, Y.; Wen, Q.; Liu, Y. Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy. Adv. Mater. 2019, 31, 1901111. doi: 10.1002/adma.201901111.
  23. Ma, W.; Cheng, F.; Liu, Y. Deep-Learning-Enabled On-Demand Design of Chiral Metamaterials. ACS Nano 2018, 12, 6326–6334. doi: 10.1021/acsnano.8b03569.
  24. Lu, L.; Pestourie, R.; Yao, W.; Wang, Z.; Verdugo, F.; Johnson, S. G. Physics-informed neural networks with hard constraints for inverse design. SIAM J. Sci. Comput. 2021, 43, B1105–B1132. doi: 10.1137/21M1397908.
  25. Hestness, J.; Ardalani, N.; Diamos, G. Beyond human-level accuracy: computational challenges in deep learning. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming; ACM: New York, NY, USA, 2019; pp 1–14.
  26. Zhao, Y.; Chen, J.; Oymak, S. On the Role of Dataset Quality and Heterogeneity in Model Confidence. 2020, https://arxiv.org/abs/2002.09831 (accessed July 09, 2025).
  27. Jiang, J.; Fan, J. A. Global optimization of dielectric metasurfaces using a physics-driven neural network. Nano Lett. 2019, 19, 5366–5372. doi: 10.1021/acs.nanolett.9b01857.
  28. Mao, C.; Lupoiu, R.; Dai, T.; Chen, M.; Fan, J. A. Towards General Neural Surrogate Solvers with Specialized Neural Accelerators. 2024, https://arxiv.org/abs/2405.02351 (accessed July 09, 2025).
  29. Karniadakis, G.; Kevrekidis, I.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440. doi: 10.1038/s42254-021-00314-5.
  30. Huang, L.; Chen, H.; Liu, T.; Ozcan, A. Self-supervised learning of hologram reconstruction using physics consistency. Nat. Mach. Intell. 2023, 5, 895–907. doi: 10.1038/s42256-023-00704-7.
  31. Bar-Sinai, Y.; Hoyer, S.; Hickey, J.; Brenner, M. P. Learning data-driven discretizations for partial differential equations. Proc. Natl. Acad. Sci. U.S.A. 2019, 116, 15344–15349. doi: 10.1073/pnas.1814058116.
  32. Wang, S.; Wang, H.; Perdikaris, P. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Sci. Adv. 2021, 7, eabi8605. doi: 10.1126/sciadv.abi8605.
  33. Lu, L.; Jin, P.; Pang, G.; Zhang, Z.; Karniadakis, G. E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 2021, 3, 218–229. doi: 10.1038/s42256-021-00302-5.
  34. Seidman, J.; Kissas, G.; Perdikaris, P.; Pappas, G. J. NOMAD: Nonlinear manifold decoders for operator learning. Adv. Neural Inf. Process. Syst. 2022, 35, 5601–5613.
  35. Goswami, S.; Bora, A.; Yu, Y.; Karniadakis, G. E. In Machine Learning in Modeling and Simulation: Methods and Applications; Rabczuk, T., Bathe, K.-J., Eds.; Springer International Publishing: Cham, 2023; pp 219–254.
  36. Li, Z.; Kovachki, N.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; Anandkumar, A. Fourier Neural Operator for Parametric Partial Differential Equations. 2021, https://arxiv.org/abs/2010.08895 (accessed July 09, 2025).
  37. Yao, H. M.; Jiang, L.; Sha, W. E. I. Enhanced Deep Learning Approach Based on the Deep Convolutional Encoder–Decoder Architecture for Electromagnetic Inverse Scattering Problems. IEEE Antenn. Wireless Propag. Lett. 2020, 19, 1211–1215. doi: 10.1109/LAWP.2020.2995455.
  38. Raissi, M.; Perdikaris, P.; Karniadakis, G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. doi: 10.1016/j.jcp.2018.10.045.
  39. Chen, M.; Lupoiu, R.; Mao, C.; Huang, D.-H.; Jiang, J.; Lalanne, P.; Fan, J. A. High speed simulation and freeform optimization of nanophotonic devices with physics-augmented deep learning. ACS Photonics 2022, 9, 3110–3123. doi: 10.1021/acsphotonics.2c00876.
  40. Li, K.; Simmons, E.; Briggs, A.; Nordin, L.; Xu, J.; Podolskiy, V.; Wasserman, D. Sub-diffraction limited photonic funneling of light. Adv. Opt. Mater. 2020, 8, 2001321. doi: 10.1002/adom.202001321.
  41. Govyadinov, A. A.; Podolskiy, V. A. Metamaterial photonic funnels for subdiffraction light compression and propagation. Phys. Rev. B: Condens. Matter Mater. Phys. 2006, 73, 155108. doi: 10.1103/PhysRevB.73.155108.
  42. LaMountain, J.; Raju, A.; Wasserman, D.; Podolskiy, V. A. Anomalous reflection for highly efficient subwavelength light concentration and extraction with photonic funnels. Nanophotonics 2024, 13, 4625–4637. doi: 10.1515/nanoph-2024-0213.
  43. Drachev, V. P.; Podolskiy, V. A.; Kildishev, A. V. Hyperbolic metamaterials: new physics behind a classical problem. Opt. Express 2013, 21, 15048–15064. doi: 10.1364/OE.21.015048.
  44. Prokes, S.; Glembocki, O. J.; Livenere, J. E.; Tumkur, T. U.; Kitur, J. K.; Zhu, G.; Wells, B.; Podolskiy, V. A.; Noginov, M. A. Hyperbolic and plasmonic properties of Silicon/Ag aligned nanowire arrays. Opt. Express 2013, 21, 14962–14974. doi: 10.1364/OE.21.014962.
  45. Vasilantonakis, N.; Wurtz, G. A.; Podolskiy, V. A.; Zayats, A. V. Refractive index sensing with hyperbolic metamaterials: strategies for biosensing and nonlinearity enhancement. Opt. Express 2015, 23, 14329–14343. doi: 10.1364/OE.23.014329.
  46. Galfsky, T.; Gu, J.; Narimanov, E. E.; Menon, V. M. Photonic hypercrystals for control of light–matter interactions. Proc. Natl. Acad. Sci. U.S.A. 2017, 114, 5125–5129. doi: 10.1073/pnas.1702683114.
  47. Jackson, J. Classical Electrodynamics; Wiley, 1999.
  48. Poddubny, A.; Iorsh, I.; Belov, P.; Kivshar, Y. Hyperbolic metamaterials. Nat. Photonics 2013, 7, 948–957. doi: 10.1038/nphoton.2013.243.
  49. Shekhar, P.; Atkinson, J.; Jacob, Z. Hyperbolic metamaterials: fundamentals and applications. Nano Convergence 2014, 1, 14–17. doi: 10.1186/s40580-014-0014-6.
  50. Law, S.; Adams, D.; Taylor, A.; Wasserman, D. Mid-infrared designer metals. Opt. Express 2012, 20, 12155–12165. doi: 10.1364/OE.20.012155.
  51. Hou, X.; Duan, J.; Qiu, G. Deep Feature Consistent Deep Image Transformations: Downscaling, Decolorization and HDR Tone Mapping. 2017, https://arxiv.org/abs/1707.09482 (accessed June 11, 2025).
  52. Hou, X.; Gong, Y.; Liu, B.; Sun, K.; Liu, J.; Xu, B.; Duan, J.; Qiu, G. Learning Based Image Transformation Using Convolutional Neural Networks. IEEE Access 2018, 6, 49779–49792. doi: 10.1109/ACCESS.2018.2868733.
  53. Li, H.; Chen, L.; Qiu, J. Convolutional Neural Networks for Multifrequency Electromagnetic Inverse Problems. IEEE Antenn. Wireless Propag. Lett. 2021, 20, 1424–1428. doi: 10.1109/LAWP.2021.3085033.
  54. Huang, G.; Liu, K.; Liang, J.; Cai, C.; Gu, Z. H.; Qi, F.; Li, Y.; Yu, Z. L.; Wu, W. Electromagnetic Source Imaging via a Data-Synthesis-Based Convolutional Encoder–Decoder Network. IEEE Trans. Neural Networks Learn. Syst. 2024, 35, 6423–6437. doi: 10.1109/TNNLS.2022.3209925.
  55. Kudyshev, Z. A.; Kildishev, A. V.; Shalaev, V. M.; Boltasseva, A. Machine learning–assisted global optimization of photonic devices. Nanophotonics 2020, 10, 371–383. doi: 10.1515/nanoph-2020-0376.
  56. Gong, R.; Tang, Z. Investigation of convolutional neural network U-net under small datasets in transformer magneto-thermal coupled analysis. COMPEL 2020, 39, 959–970. doi: 10.1108/COMPEL-12-2019-0491.
  57. Lim, J.; Psaltis, D. MaxwellNet: Physics-driven deep neural network training based on Maxwell’s equations. APL Photon. 2022, 7, 011301. doi: 10.1063/5.0071616.
  58. Zhang, P.; Hu, Y.; Jin, Y.; Deng, S.; Wu, X.; Chen, J. A Maxwell’s Equations Based Deep Learning Method for Time Domain Electromagnetic Simulations. 2020 IEEE Texas Symposium on Wireless and Microwave Circuits and Systems (WMCS); IEEE, 2020; pp 1–4.
  59. Lynch, S.; LaMountain, J.; Fan, B.; Bu, J.; Raju, A.; Wasserman, D.; Karpatne, A.; Podolskiy, V. A. Physics informed machine learning for photonic funnels. 2025, https://github.com/viktor-podolskiy/physics-informed-machine-learning-for-photonic-funnels (accessed June 11, 2025).
  60. Lynch, S.; LaMountain, J.; Fan, B.; Bu, J.; Raju, A.; Wasserman, D.; Karpatne, A.; Podolskiy, V. A. PGML Dataset. 2025, https://figshare.com/articles/dataset/PGML_Dataset/28439978 (accessed Feb 21, 2025).
  61. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE, 2016; pp 770–778.


Articles from ACS Photonics are provided here courtesy of American Chemical Society