Abstract
Metamaterials and their related research have had a profound impact on many fields, including optics, but designing metamaterial structures on demand is still a challenging task. In recent years, deep learning has been widely used to guide the design of metamaterials and has achieved outstanding performance. In this work, a reverse multiple prediction method for metamaterial structures based on semisupervised learning is proposed, named the partially Conditional Generative Adversarial Network (pCGAN). Given a required target spectrum as input, it can reversely predict multiple sets of metamaterial structures that meet the requirement. The model reached a mean absolute error (MAE) of 0.03 and showed good generality. Compared with previous metamaterial design methods, this method realizes reverse design and multiple design at the same time, which opens up a new route for the design of new metamaterials.
Keywords: metamaterials, deep learning, reverse design, multiple design
1. Introduction
As special human-made materials, metamaterials have attracted the attention of a large number of experts and scholars since they were proposed [1,2]. They allow the design of specific microstructures to change the transmission characteristics of waves to achieve special purposes, including perfect absorption [3,4], hyperbolic materials [5], circular polarization imaging [6,7], and hyperlenses [8]. In addition, based on the chiral phenomenon found in metamaterials, multiple functions have been realized [9,10,11]. The terahertz wave is in the transitional region between electronics and photonics research, and has many unique properties, such as low photon energy and strong penetrability. The emergence of metamaterials makes up for the lack of electromagnetic materials in the terahertz frequency band, and provides an effective way to realize functional devices in the terahertz frequency band. It is also a hot research topic in the field of metamaterials [12,13,14].
Although research on metamaterials is in full swing, designing metamaterial structures on demand is still a challenging task. Researchers must spend a lot of time and computing resources on calculating the effect that a metamaterial with a specific structure can achieve, and then reoptimize the structure of the metamaterial. Even highly experienced researchers cannot easily design a metamaterial structure on demand through traditional methods in a short time.
As an interdisciplinary field covering life sciences, mathematics, psychology, and many other subjects, deep learning has attracted much attention since it was proposed [15]. It allows a model to bypass the basic theory and obtain useful parts from a large amount of data to construct the mapping relationship between input and output. Presently, it has made great achievements in many computer-related fields, including image classification [16], natural language processing [17], and feature recognition [18]. At the same time, there have been many successes in other non-computer-related fields, including many basic disciplines such as chemistry [19], physics [20], and biology [21].
After simple machine-learning algorithms such as a genetic algorithm [22], linear regression [23], and a Bayesian algorithm [24] had been applied to the design of metamaterials, deep learning was gradually introduced into this field [25,26,27]. Some researchers have achieved reverse design for different structures by using different deep-learning networks. However, even for a single structure type, more than one set of structural parameters may achieve the target effect. Therefore, by introducing generative models, including the Variational Autoencoder (VAE), other researchers have been able to start from a designed structure and calculate other structures that achieve the same purpose [28,29,30]. However, a method that directly performs reverse multistructure design for a given target effect may be more practical.
In this article, we innovatively proposed a pCGAN based on the idea of the Conditional Generative Adversarial Network (CGAN) [31]. Compared with the method of reverse design using CGAN alone, we obtained a faster training speed, used fewer network parameters, and realized higher accuracy of reverse prediction. The results showed that the use of this model could achieve the purpose of the reverse multistructure design of metamaterials, and the final trained model could reach an MAE of 0.028 (12% lower than for CGANs), and the model showed good robustness and generalization. By using the trained model to guide the design of metamaterials, the design cycle of metamaterials can be effectively shortened. In addition, this model also has high scalability. By only replacing the dataset with the dataset of another structure, the reverse on-demand design of different structures and different target curves can be realized.
2. Materials and Methods
2.1. COMSOL Simulation Model
We built a simulation model to verify that our network could realize the multiple reverse design of the metamaterial structure. We chose high-resistance silicon (ρ > 5000 Ω∙m), which is commonly used in the terahertz region, as the material to establish a square resonant ring model with four gaps, and observed its electromagnetic response to incident light.
As shown in Figure 1, the entire metasurface was an array of meta-atoms with identical structures. The height, square outer side length, square line width, and gap size were chosen as the four structural parameters. Left-handed circularly polarized (LCP) light was incident on this structure perpendicularly, and the frequency-transmittance curves of different structures were obtained by changing the four structural parameters. The size of the super unit substrate was 150 μm × 150 μm, the thickness of the substrate was 50 μm, and the gap position was centered.
The above model was constructed using COMSOL Multiphysics 5.5 [32], and the required dataset was obtained by scanning the four structural parameters. Two of the parameters were scanned over the range (10, 40) with a step size of 10, and the other two over the range (30, 110) with a step size of 20. This gives 4 × 4 × 5 × 5 = 400 parameter combinations, so 400 sets of simulation data were obtained.
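The parameter sweep described above can be enumerated in a few lines. The following is a minimal Python sketch, not the authors' COMSOL workflow: the variable names are hypothetical, and which physical parameter takes which range is deliberately left open; only the ranges, step sizes, and the total of 400 combinations come from the text.

```python
from itertools import product

# Hypothetical names: the text gives two ranges but this sketch does not
# assume which structural parameter uses which one.
fine_values = list(range(10, 41, 10))     # 10, 20, 30, 40
coarse_values = list(range(30, 111, 20))  # 30, 50, 70, 90, 110

# Full factorial sweep: two parameters on the fine grid, two on the coarse grid.
grid = list(product(fine_values, fine_values, coarse_values, coarse_values))

print(len(grid))  # 400
```

Each tuple in `grid` corresponds to one simulation run, which matches the 400 sets of simulation data reported.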
2.2. The pCGAN Model
When a traditional CGAN is used for reverse design, neither the generator nor the discriminator can output the spectrum that a structure produces, so the output of the generator must be evaluated repeatedly by simulation during the training process.
In addition, the CGAN shares the training difficulty of other Generative Adversarial Networks (GANs). As shown in Formula (1), because the loss function is adversarial, the optimal solution requires that a Nash equilibrium be reached. Seeking this optimum requires the training of the generator and the discriminator to be well synchronized; otherwise the model is difficult to fit. Moreover, when the CGAN method is used directly for reverse design, the training results cannot be verified intuitively through the network itself and still rely on traditional simulation methods, which makes the training even more difficult.
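$$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x \mid y)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z \mid y))\right)\right] \tag{1}$$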
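To make the adversarial objective concrete, the sketch below evaluates an empirical version of the standard GAN value function, that is, the mean of log D on real samples plus the mean of log(1 − D) on generated samples, from lists of discriminator output probabilities. The function name and the sample values are illustrative assumptions. When the discriminator can no longer tell real from generated samples (all outputs equal 0.5), the value equals −log 4, the well-known equilibrium value of this objective.

```python
import math

def gan_value(d_real, d_fake):
    """Empirical value function: mean log D(real) + mean log(1 - D(fake)).

    d_real and d_fake are discriminator output probabilities in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# At equilibrium the discriminator outputs 0.5 for every sample.
v_equilibrium = gan_value([0.5] * 4, [0.5] * 4)

# A discriminator that separates the two classes well yields a larger value.
v_confident = gan_value([0.9] * 4, [0.1] * 4)

print(v_equilibrium, -math.log(4.0))
print(v_confident)
```

The gap between the two values illustrates why the generator and the discriminator must stay balanced during training: a discriminator that wins too easily moves the objective away from the equilibrium.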
Therefore, we designed a semisupervised-learning deep neural network based on the idea of CGAN, and named it pCGAN. This network included a generator for reverse design and a discriminator for predicting the frequency-transmittance curve that a structure would produce. The generator was the main body that realized multiple backward predictions, and its training relied on an accurate discriminator; the discriminator can be regarded as the loss function of the generator. Therefore, before training the generator, we needed to train the discriminator in a supervised manner. Once the discriminator was well trained, the two networks were connected and trained in an unsupervised manner.
As shown in Figure 2, the discriminator could output the corresponding frequency-transmittance curve by inputting different structural parameters. The well-trained discriminator could obtain the corresponding relationship between the structure parameter and the frequency-transmittance curve. The process of obtaining the frequency-transmittance curve through the trained discriminator took a very short time, and it could be used alone to replace the time-consuming simulation process. Using COMSOL Multiphysics to simulate the single set of structural parameters of the above-mentioned square split resonator ring took about 1 h, while the time for using this network to predict the transmittance curve was only tens of milliseconds. This also allowed the trained discriminator to intuitively reflect the training results of the generator, which had a significant effect on subsequent network optimization and verification.
As shown in Figure 3, the generator could output structural parameters that could achieve the required functions by inputting a set of random noise and the required frequency-transmittance curve. The well-trained generator could reversely design different structural parameters that could reach the target curve according to the required frequency-transmittance curve and different random noises. Similarly, the time efficiency of using a trained generator for backward prediction was extremely high.
2.3. Neural Network Method
As shown in Figure 4, the working principle of the neural network is based on the way the human brain works and learns, and a large number of neural network nodes are constructed to simulate neurons. Each node is connected to adjacent nodes, and the output of the node is adjusted by adjusting the link weight. The output of a single node can be expressed as:
$$y_j = f\left(\sum_{i=1}^{n} w_{ij} x_i + b_j\right) \tag{2}$$
where $f$ is the activation function, $w_{ij}$ is the connection weight between the i-th node of the previous layer and this node, $x_i$ is the output of the i-th node of the previous layer, $b_j$ is the bias term of the node, and $n$ is the number of nodes in the previous layer connected to the j-th node.
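The output of a single node, a weighted sum of the previous layer's outputs plus a bias passed through an activation function, can be written directly in code. This is a minimal sketch with hypothetical function and argument names; the numbers are chosen only so that the arithmetic is easy to follow.

```python
def node_output(inputs, weights, bias, activation):
    """Output of one node: activation(sum_i w_i * x_i + b)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def identity(z):
    """Trivial activation used here so the weighted sum itself is visible."""
    return z

# Two inputs from the previous layer: 0.5*1.0 + 0.25*2.0 + 0.5 = 1.5
y = node_output(inputs=[1.0, 2.0], weights=[0.5, 0.25], bias=0.5,
                activation=identity)
print(y)  # 1.5
```

Any other activation function, such as the ELU used in this work, can be passed in place of `identity`.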
2.4. Data Preprocessing
Because the input quantities have different units and scales, it is difficult to keep them in the same range. During neural network training, larger input values usually cause larger errors. To prevent small and large input values from influencing the network unequally, the input data usually need to be preprocessed.
In this work, the transmittance at every point of the spectrum data had the same range (0 to 1). However, the value ranges of the structural parameters differed from one another, so the structural parameters had to be preprocessed. Because the generator was trained on preprocessed data, the structural parameters it predicted were also expressed on the preprocessed scale, so a preprocessing method from which the original values could easily be restored was needed. Here, we chose the Min–Max Normalization method to scale all the original input data to the range (0, 1):
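$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{3}$$
This operation not only could avoid the impact on network training caused by different dimensions, but it also helped us to restore the reverse prediction data. The restoration formula is:
$$x = x'\left(x_{\max} - x_{\min}\right) + x_{\min} \tag{4}$$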
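Min–Max normalization and its inverse can be sketched as a pair of functions. The function names are hypothetical, and the sample values reuse the (30, 110) scan range with a step of 20 purely as an illustration.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant sequence")
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, lo, hi

def min_max_restore(scaled, lo, hi):
    """Invert the normalization using the stored minimum and maximum."""
    return [s * (hi - lo) + lo for s in scaled]

raw = [30, 50, 70, 90, 110]
scaled, lo, hi = min_max_normalize(raw)
restored = min_max_restore(scaled, lo, hi)

print(scaled)    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(restored)  # [30.0, 50.0, 70.0, 90.0, 110.0]
```

The minimum and maximum must be kept from the training data so that parameters predicted by the generator can be mapped back to physical values.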
2.5. Activation Function
To meet the high nonlinearity of the reverse design problem, the Exponential Linear Units (ELU) function that combines the advantages of the Sigmoid and Rectified Linear Unit (ReLU) functions was used as the activation function [33]. The output of the ELU function can be expressed as:
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha\left(e^{x} - 1\right), & x \le 0 \end{cases} \tag{5}$$
where $x$ is the original input, and the parameter α ranges from 0 to 1.
As shown in Figure 5, when $x \le 0$, it had better soft saturation, which could make the network more robust to the input; and when $x > 0$, the gradient was always 1, which was beneficial in alleviating the gradient disappearance of the neural network and making the network easier to converge.
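The two regimes of the ELU function can be checked numerically. The sketch below implements the standard ELU definition and its derivative; the default α = 1 is an assumption made for illustration, since the text only states that α lies between 0 and 1.

```python
import math

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha*(exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)

def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for positive inputs, alpha*exp(x) otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)

for x in (-50.0, -1.0, 0.0, 1.0, 5.0):
    print(x, elu(x), elu_grad(x))
```

For large negative inputs the output saturates softly at −α while the gradient decays toward zero, and for positive inputs the gradient stays exactly 1, which is the behavior described above.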
2.6. Overfitting Solution
Using a small dataset to train the network often causes the network to perform worse on data outside the dataset because of overfitting. In order to improve the generalization of the network, L2 regularization was applied to the weights $w$, which could make the network learn more features. The regularized loss can be expressed as:
$$L = L_0 + \frac{\lambda}{2n}\sum_{w} w^{2} \tag{6}$$
where $L_0$ represents the original loss function, and the regularization term is added on this basis, where $\lambda$ represents the regularization coefficient, $n$ represents the data throughput, and $w$ is the weight. After the regularization term is added, the values of the weights tend to decrease and excessively large values are avoided, so the method is also called weight decay. L2 regularization shrinks the weights to avoid large slopes in the fitted curve, thereby effectively alleviating overfitting and helping the network converge.
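The effect of the L2 term can be illustrated with a small calculation. This sketch assumes the common form of the penalty, λ/(2n) times the sum of squared weights added to the original loss; the function name and the numerical values are hypothetical and chosen only for readability.

```python
def l2_regularized_loss(base_loss, weights, lam, n):
    """Original loss plus the L2 penalty lam / (2n) * sum(w^2)."""
    penalty = lam / (2.0 * n) * sum(w * w for w in weights)
    return base_loss + penalty

def l2_penalty_gradient(weight, lam, n):
    """Gradient of the penalty with respect to one weight: (lam / n) * w."""
    return lam / n * weight

# Penalty: 0.5 / (2 * 5) * (1^2 + 2^2) = 0.25
total = l2_regularized_loss(base_loss=1.0, weights=[1.0, 2.0], lam=0.5, n=5)
print(total)

# The gradient is proportional to the weight itself, so large weights are
# pushed toward zero more strongly than small ones.
print(l2_penalty_gradient(2.0, lam=0.5, n=5), l2_penalty_gradient(0.2, lam=0.5, n=5))
```

Because the penalty gradient grows with the weight, each update shrinks large weights the most, which is why the technique is also known as weight decay.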
3. Results
Training of pCGAN
First, we conducted supervised training on the above-mentioned discriminator, using the simulation data obtained with the COMSOL Multiphysics simulation software. After the dataset had been normalized, its order was shuffled; 10% of it was extracted as the validation set, and the remaining 90% was used as the training set.
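The shuffle-and-split step can be sketched with the standard library. The function name and the fixed seed are assumptions added for reproducibility; the 400 samples and the 90/10 ratio follow the description above.

```python
import random

def shuffle_split(samples, val_fraction=0.1, seed=0):
    """Shuffle a copy of the samples and split off a validation subset."""
    rng = random.Random(seed)  # seed is an assumption, not from the paper
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Integer indices stand in for the 400 simulated (parameters, curve) pairs.
train_set, val_set = shuffle_split(range(400))
print(len(train_set), len(val_set))  # 360 40
```

Splitting after shuffling keeps the validation samples spread across the whole parameter grid instead of concentrating them at one end of the sweep.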
Among the network structures that reached the lowest error, we chose the one with the shortest training time. As shown in Figure 6a, the discriminator was composed of one input layer, one output layer, and three hidden layers. The number of nodes in each layer increased with the depth of the network. The hidden layers used ELU and the output layer used Sigmoid as the activation function, and L2 regularization was applied to the weights of each layer to suppress overfitting. The number of network parameters was 45,524, and the average training time was 1.2 s/epoch.
When the discriminator training had been completed, we combined the generator and discriminator as shown in Figure 7, and set the discriminator to be untrainable. Then we inputted different random noise and frequency-transmittance curves for unsupervised training. It should be noted that the input frequency-transmittance curve also served as the target for the output frequency-transmittance curve, so no additional labels were required. The noise was a random array of size 4 drawn from a Gaussian distribution with a mean of 0 and a variance of 1.
As shown in Figure 6b, the generator consisted of one input layer, one embedding layer, one output layer, and four hidden layers whose node counts decreased with the depth of the network. Similarly, the hidden layers used ELU and the output layer used Sigmoid (to limit the range of output values) as the activation function. The embedding layer was responsible for embedding the input noise into the target curve, so that the data entering the hidden layers not only retained the characteristics of the target curve, but also carried the characteristics of the random noise. Even if the same target curve was inputted, the generator could still generate different structural parameters according to the different noise. The number of network parameters was 187,432, and the average training time was 244 ms/epoch.
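The data flow of the unsupervised stage, in which noise and a target curve enter the generator, the frozen discriminator predicts the curve of the generated design, and the error against the same target curve is measured, can be sketched with toy stand-ins. Everything marked as toy below is hypothetical and is not the trained networks from this work: the toy forward model is a deliberately many-to-one map, so that different noise yields different designs with the same curve.

```python
import math
import random

def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def sample_noise(rng, size=4):
    """Gaussian noise with mean 0 and variance 1, size 4 as in the text."""
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

def generator_loss(generator, frozen_forward, target_curve, noise):
    """Error between the target curve and the curve of the generated design."""
    params = generator(noise, target_curve)
    return mae(frozen_forward(params), target_curve)

# --- Hypothetical toy stand-ins, NOT the networks from the paper ---
def toy_forward(params):
    # Plays the role of the frozen discriminator: parameters -> curve.
    # Many-to-one: each curve point is the mean of a pair of parameters.
    return [0.5 * (params[0] + params[1]), 0.5 * (params[2] + params[3])]

def toy_generator(noise, target_curve):
    # The noise selects one of many parameter sets with the same curve.
    d0 = 0.1 * math.tanh(noise[0])
    d1 = 0.1 * math.tanh(noise[1])
    return [target_curve[0] + d0, target_curve[0] - d0,
            target_curve[1] + d1, target_curve[1] - d1]

rng = random.Random(0)  # fixed seed is an assumption, for reproducibility
target = [0.4, 0.8]
noise_a, noise_b = sample_noise(rng), sample_noise(rng)
design_a = toy_generator(noise_a, target)
design_b = toy_generator(noise_b, target)
loss_a = generator_loss(toy_generator, toy_forward, target, noise_a)
loss_b = generator_loss(toy_generator, toy_forward, target, noise_b)

print(design_a, loss_a)
print(design_b, loss_b)
```

In an actual training loop, the loss would be backpropagated through the frozen forward network to update only the generator's weights; the toy generator here is already an exact inverse, so both losses are essentially zero while the two designs differ.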
It is worth noting that the networks mentioned above were all running on computers equipped with Intel i7-10750H processors and NVIDIA GeForce RTX 2060 graphics cards.
4. Discussion
When the discriminator was first trained, there was obvious overfitting: the error on the validation set was far larger than the error on the training set. This reduced the generalization performance of the discriminator and prevented it from being trained further. After the regularization coefficient λ had been set to 1 × 10⁻⁴, the overfitting was effectively alleviated. In the end, the discriminator reached its optimum after 1500 epochs of training, showing close errors on the training set and the validation set (MAE = 0.033), as shown in Figure 8.
As shown in Figure 9, when we entered the structural parameters shown in the lower left corner to the discriminator, the network output frequency-transmittance curve basically coincided with the simulation result. With this as proof, our discriminator was able to accurately predict the frequency-transmittance curve that the input structural parameters could represent, and the positions and sizes of the peaks and valleys were highly consistent, which achieved the expected purpose. Although there was still a subtle error in the details, it was already within an acceptable range. Using this trained discriminator could effectively guide the training of the generator.
The training of the generator was non-data-driven unsupervised learning, so there was no need to consider the impact of overfitting. The generator reached its best performance after 2000 epochs of training (MAE = 0.028), as shown in Figure 10.
Considering that it was difficult to manually draw a set of frequency-transmittance curves that this structure could actually achieve, we selected a set of frequency-transmittance curves from the validation set as the target curve during verification. It is worth noting that this set of curves was not included in the training set, so it was not used when training the network. As shown in Figure 11, when the target curve was input into the generator with two different sets of noise, the generator predicted two different sets of structural parameters, and both could reach the optical response described by the target curve.
In order to verify the reliability of the generator’s predicted results, we entered the two sets of structural parameters outputted by the generator and the structural parameters that met the target curve from the simulation results into the COMSOL simulation software, and verified them through simulation. As shown in Figure 12, the verification result showed that the effect that our discriminator and generator achieved was remarkable, and achieved the purpose of a one-to-many reverse design.
The above-mentioned pCGAN reverse multiple design method had good scalability. Only the data used to train the discriminator need to be replaced for the method to be applied to other structures, and the target curve is not limited to the frequency-transmittance curve; the method can also be applied to the corresponding relationship between phase, incident angle, and absorptance and reflectance. We attempted to verify its performance on different structures, and all the results showed good adaptability. In theory, if a fixed-format dataset can be used to characterize different structures and their parameters, this method can be applied to the reverse design of all metamaterial structures. However, because it is not easy to build such a huge dataset, this work is still ongoing.
5. Conclusions
In this article, inspired by the ideas of CGAN, we designed a metamaterial reverse design method based on pCGAN. This method not only realized the reverse design of the metamaterial structure, but also realized one-to-many structure design according to the goal. The generator made predictions according to the input target curve and outputted a variety of structures or structural parameters that met the requirements, so that the on-demand design problem could be solved in this way. Moreover, this method is highly scalable and can be widely used in metamaterial design research. The experiments showed that this method is feasible and effective, and that its time efficiency is far higher than that of traditional design methods.
Author Contributions
Conceptualization, Z.H. and J.S.; methodology, Z.H.; software, C.L.; validation, P.Z., M.G. and J.L.; investigation, T.T.; resources, C.L.; data curation, T.T.; writing—original draft preparation, Z.H.; writing—review and editing, Z.H.; visualization, Z.H.; supervision, J.S.; project administration, J.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the Hainan Provincial Natural Science Foundation of China (2019RC054), and the Finance Science and Technology Project of Hainan Province (ZDKJ2020009).
Conflicts of Interest
The authors declare no conflict of interest.
Footnotes
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Schurig D., Mock J.J., Justice B.J., Cummer S.A., Pendry J.B., Starr A.F., Smith D.R. Metamaterial Electromagnetic Cloak at Microwave Frequencies. Science. 2006;314:977–980. doi: 10.1126/science.1133628.
- 2. Shelby R.A., Smith D.R., Schultz S. Experimental Verification of a Negative Index of Refraction. Science. 2001;292:77–79. doi: 10.1126/science.1058847.
- 3. Rhee J.Y., Yoo Y.J., Kim K.W., Kim Y.J., Lee Y.P. Metamaterial-Based Perfect Absorbers. J. Electromagn. Waves Appl. 2014;28:1541–1580. doi: 10.1080/09205071.2014.944273.
- 4. Cai Y., Zhu J., Liu Q.H. Tunable enhanced optical absorption of graphene using plasmonic perfect absorbers. Appl. Phys. Lett. 2015;106:043105. doi: 10.1063/1.4906996.
- 5. Poddubny A., Iorsh I., Belov P., Kivshar Y. Hyperbolic Metamaterials. Nat. Photonics. 2013;7:948–957.
- 6. Zheng C., Li J., Wang S., Li J., Li M., Zhao H., Hao X., Zang H., Zhang Y., Yao J. Optically tunable all-silicon chiral metasurface in terahertz band. Appl. Phys. Lett. 2021;118:051101. doi: 10.1063/5.0039992.
- 7. Li W., Coppens Z.J., Besteiro L.V., Wang W., Govorov A.O., Valentine J. Circularly polarized light detection with hot electrons in chiral plasmonic metamaterials. Nat. Commun. 2015;6:8379. doi: 10.1038/ncomms9379.
- 8. Lu D., Liu Z. Hyperlenses and metalenses for far-field super-resolution imaging. Nat. Commun. 2012;3:1205. doi: 10.1038/ncomms2176.
- 9. Wang Z., Jia H., Yao K., Cai W., Chen H., Liu Y. Circular Dichroism Metamirrors with near-Perfect Extinction. ACS Photonics. 2016;3:2096–2101. doi: 10.1021/acsphotonics.6b00533.
- 10. Hooper D.C., Mark A.G., Kuppe C., Collins J.T., Fischer P., Valev V.K. Strong Rotational Anisotropies Affect Nonlinear Chiral Metamaterials. Adv. Mater. 2017;29:1605110. doi: 10.1002/adma.201605110.
- 11. Neubrech F., Hentschel M., Liu N. Reconfigurable Plasmonic Chirality: Fundamentals and Applications. Adv. Mater. 2020;32:e1905640. doi: 10.1002/adma.201905640.
- 12. Li J., Zheng C., Wang G., Li J., Zhao H., Yang Y., Zhang Z., Yang M., Wu L., Li J., et al. Circular dichroism-like response of terahertz wave caused by phase manipulation via all-silicon metasurface. Photonics Res. 2021;9:567. doi: 10.1364/PRJ.415547.
- 13. Xiao S., Wang T., Liu T., Yan X., Li Z., Xu C. Active modulation of electromagnetically induced transparency analogue in terahertz hybrid metal-graphene metamaterials. Carbon. 2018;126:271–278. doi: 10.1016/j.carbon.2017.10.035.
- 14. Hamouleh-Alipour A., Mir A., Farmani A. Analytical Modeling and Design of a Graphene Metasurface Sensor for Thermo-Optical Detection of Terahertz Plasmons. IEEE Sens. J. 2020;21:4525–4532. doi: 10.1109/JSEN.2020.3035577.
- 15. Schmidhuber J. Deep learning in neural networks: An overview. Neural Netw. 2015;61:85–117. doi: 10.1016/j.neunet.2014.09.003.
- 16. He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition; Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Las Vegas, NV, USA. 26 June–1 July 2016; pp. 770–778.
- 17. Young T., Hazarika D., Poria S., Cambria E. Recent Trends in Deep Learning Based Natural Language Processing [Review Article] IEEE Comput. Intell. Mag. 2018;13:55–75. doi: 10.1109/MCI.2018.2840738.
- 18. Chen L.C., Papandreou G., Kokkinos I., Murphy K., Yuille A.L. Deeplab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected Crfs. IEEE Trans. Pattern. Anal. Mach. Intell. 2018;40:834–848. doi: 10.1109/TPAMI.2017.2699184.
- 19. Goh G.B., Hodas N.O., Vishnu A. Deep learning for computational chemistry. J. Comput. Chem. 2017;38:1291–1307. doi: 10.1002/jcc.24764.
- 20. Baldi P., Sadowski P., Whiteson D. Searching for exotic particles in high-energy physics with deep learning. Nat. Commun. 2014;5:4308. doi: 10.1038/ncomms5308.
- 21. Angermueller C., Pärnamaa T., Parts L., Stegle O. Deep learning for computational biology. Mol. Syst. Biol. 2016;12:878. doi: 10.15252/msb.20156651.
- 22. Chen P.Y., Chen C.H., Wang H., Tsai J.H., Ni W.X. Synthesis Design of Artificial Magnetic Metamaterials Using a Genetic Algorithm. Opt. Express. 2008;16:12806. doi: 10.1364/OE.16.012806.
- 23. Qiu T., Shi X., Wang J., Li Y., Qu S., Cheng Q., Cui T., Sui S. Deep Learning: A Rapid and Efficient Route to Automatic Metasurface Design. Adv. Sci. 2019;6:1900128. doi: 10.1002/advs.201900128.
- 24. Bessa M.A., Glowacki P., Houlder M. Bayesian Machine Learning in Metamaterial Design: Fragile Becomes Supercompressible. Adv. Mater. 2019;31:e1904845. doi: 10.1002/adma.201904845.
- 25. Ma W., Cheng F., Liu Y. Deep-Learning-Enabled on-Demand Design of Chiral Metamaterials. ACS Nano. 2018;12:6326–6334. doi: 10.1021/acsnano.8b03569.
- 26. Malkiel I., Mrejen M., Nagler A., Arieli U., Wolf L., Suchowski H. Plasmonic Nanostructure Design and Characterization Via Deep Learning. Light Sci. Appl. 2018;7:60. doi: 10.1038/s41377-018-0060-7.
- 27. So S., Mun J., Rho J. Simultaneous Inverse Design of Materials and Structures Via Deep Learning: Demonstration of Dipole Resonance Engineering Using Core-Shell Nanoparticles. ACS Appl. Mater. Interfaces. 2019;11:24264–24268. doi: 10.1021/acsami.9b05857.
- 28. Ma W., Cheng F., Xu Y., Wen Q., Liu Y. Probabilistic Representation and Inverse Design of Metamaterials Based on a Deep Generative Model with Semi-Supervised Learning Strategy. Adv. Mater. 2019;31:e1901111. doi: 10.1002/adma.201901111.
- 29. Mall A., Patil A., Sethi A., Kumar A. A Cyclical Deep Learning Based Framework for Simultaneous Inverse and Forward Design of Nanophotonic Metasurfaces. Sci. Rep. 2020;10:19427. doi: 10.1038/s41598-020-76400-y.
- 30. So S., Rho J. Designing Nanophotonic Structures Using Conditional Deep Convolutional Generative Adversarial Networks. Nanophotonics. 2019;8:1255–1261. doi: 10.1515/nanoph-2019-0117.
- 31. Mirza M., Osindero S. Conditional Generative Adversarial Nets. [(accessed on 30 September 2021)]. Available online: https://arxiv.org/abs/1411.1784.
- 32. COMSOL Multiphysics® Version v. 5.5. [(accessed on 30 September 2021)]. Available online: https://cn.comsol.com.
- 33. Clevert D., Unterthiner T., Hochreiter S. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). [(accessed on 30 September 2021)]. Available online: https://arxiv.org/abs/1511.07289v2.