A small microring array that performs large complex-valued matrix-vector multiplication

Junwei Cheng; Yuhe Zhao; Wenkai Zhang; Hailong Zhou; Dongmei Huang; Qing Zhu; Yuhao Guo; Bo Xu; Jianji Dong; Xinliang Zhang

doi:10.1007/s12200-022-00009-4

2022 Apr 28;15(1):15.

PMCID: PMC9756268 PMID: 36637556

Abstract

As an important computing operation, photonic matrix–vector multiplication is widely used in photonic neural networks and signal processing. However, conventional incoherent matrix–vector multiplication focuses on real-valued operations, which cannot work well in complex-valued neural networks and the discrete Fourier transform. In this paper, we propose a systematic solution to extend the matrix computation of microring arrays from the real-valued field to the complex-valued field, and from small-scale (i.e., 4 × 4) to large-scale matrix computation (i.e., 16 × 16). Combining matrix decomposition and matrix partition, our photonic complex matrix–vector multiplier chip can support arbitrary large-scale and complex-valued matrix computation. We further demonstrate the Walsh-Hadamard transform, discrete cosine transform, discrete Fourier transform, and image convolutional processing. Our scheme provides a path towards breaking the limits of complex-valued computing accelerators in conventional incoherent optical architectures. More importantly, our results reveal that an integrated photonic platform has huge potential for large-scale, complex-valued, artificial intelligence computing and signal processing.

Keywords: Photonic matrix–vector multiplication, Complex-valued computing, Microring array, Signal/image processing

Introduction

With the rapid advancement of technology in recent decades, there is a growing demand for computing with larger capacity and higher speed than traditional computing can offer. This is especially true of convolutional processing, a computationally intensive operation in electronics that occupies over 80% of the total processing time in image processing [1–3]. Optical computing offers intrinsically high speed and low power consumption and supports parallel processing through wavelength division multiplexing (WDM); it has therefore been proposed as a promising candidate for mass data processing [4]. Matrix multiplication is the kernel and most common operation in artificial intelligence (AI). It is widely used in artificial neural networks (ANNs), which have been universally applied in signal processing, image recognition, voice recognition, real-time video analysis, and autonomous driving [5, 6]. Optical neural networks (ONNs) can improve the computation speed by several orders of magnitude. For example, a photonic convolutional accelerator comprised of soliton microcombs could carry out up to 10 trillion operations per second [7]. In addition, phase-change material (PCM) has been employed for non-volatile memory storage in optical computing to reduce the energy consumption of optical-electrical conversion during weight data refreshing [8–11]. Recently, an integrated photonic hardware accelerator has successfully executed $10^{12}$ multiply-accumulate operations per second by combining phase-change-material memory and soliton microcombs [9].

A copious amount of research has been conducted in optical matrix computing using spatial light modulators [12, 13], electro-optic modulations [14–16], direct driven LED arrays [17], acousto-optic Bragg cells [18–20], and photorefractive media [21–23]. Although spatial light modulators and other spatial elements are easily programmable, these methods are in general bulky, complex, and power-consuming. With the advancement of integrated photonics technology and hardware implementation of nanophotonic processors, integrated photonic platforms have shown huge potential for high-performance computing. At present, most existing neural networks are based solely on real-valued algorithms, but complex-valued algorithms may provide a significant advantage when performing tasks such as the symmetry or XOR problem [24]. A great deal of research on integrated optical computing networks has been done using a cascaded Mach–Zehnder interferometer (MZI) mesh [25–28]. MZI meshes have been widely used in linear optical circuits [25, 29], quantum information processing [30], universal multiport interferometers [27], optical mode descramblers [31, 32], and polarization processors [33]. For the linear section of optical neural networks, impressive works, such as vowel recognition, have been demonstrated [34]. This method allows for good reconfigurability and independent control of both the amplitude and phase. However, the loading of the transmission matrix relies on iterative algorithms, which are quite slow and unsuitable for flexible matrix computations. Moreover, MZIs consume more power than resonant devices such as microring resonators (MRRs), which are compact (radius of several microns), more energy-efficient, highly integrated, and easily scalable [35, 36]. MRRs are resonant devices and their transmission coefficients are wavelength-sensitive.
Parallel incoherent matrix computing can be achieved by controlling the resonant states of MRRs, which is commonly used in optical tensor computing and ONNs [11, 37]. The problem of MRR arrays is that the computation is incoherent, which means MRR arrays can only perform amplitude modulation without phase information. Thus, MRR arrays can only compute non-negative numbers, or real numbers when assisted by differential detection. In addition, ultra-large-scale MRR arrays are difficult to implement because of heavy thermal crosstalk and the packaging of electronic circuits. Hence, it is believed that MRRs cannot be implemented in a large-scale matrix multiplication to compute complex numbers.

In this paper, we present a systematic solution to extend the matrix computation of MRR arrays from the real-valued field to the complex-valued field, and from small-scale (i.e., 4 × 4) to large-scale matrix computation (i.e., 16 × 16). We experimentally demonstrate typical matrix–vector multiplication (MVM) applications of MRR arrays in the Walsh-Hadamard transform (WHT), discrete cosine transform (DCT), discrete Fourier transform (DFT), and image convolutional processing. These applications significantly expand the fields of optical computation based on MRR arrays. Our work shows huge potential for high-speed and universal matrix computations, such as applications in photonic accelerators and optical artificial intelligence.

Principle

The structure of the proposed on-chip MRR array (i.e., photonic complex-MVM core) is schematically illustrated in Fig. 1. The on-chip photonic complex-MVM core consists of a tunable silicon MRR array that includes 16 add-drop MRRs arranged in 4 rows and 4 columns. The entire architecture is based on wavelength-division multiplexing (WDM) and on-chip reconfigurable MRR array. The MRR array forms a complete network of a 4 × 4 transmission matrix, whose configuration can be realized by tuning the heater of each MRR.

When the transmission loss is neglected, each add-drop MRR in a row of the array sets a through-port transmittance coefficient of $1 - a_{ij}$ and a drop-port transmittance coefficient of $a_{ij}$ [38]. The difference between the outputs of these two ports is then given by

$$
O = XI =
\begin{bmatrix}
1 - 2a_{11} & 1 - 2a_{12} & 1 - 2a_{13} & 1 - 2a_{14} \\
1 - 2a_{21} & 1 - 2a_{22} & 1 - 2a_{23} & 1 - 2a_{24} \\
1 - 2a_{31} & 1 - 2a_{32} & 1 - 2a_{33} & 1 - 2a_{34} \\
1 - 2a_{41} & 1 - 2a_{42} & 1 - 2a_{43} & 1 - 2a_{44}
\end{bmatrix}
\begin{bmatrix} i_{1} \\ i_{2} \\ i_{3} \\ i_{4} \end{bmatrix},
$$

where the 4 × 1 vector $O = [o_{1}, o_{2}, o_{3}, o_{4}]^{T}$ represents the output vector, the 4 × 1 vector $I = [i_{1}, i_{2}, i_{3}, i_{4}]^{T}$ represents the input vector, and the 4 × 4 matrix $X$ stands for the transmission matrix. When the transmission loss is ignored, the drop port coefficient $a_{ij}$ falls in the range $[0, 1]$ and the corresponding coefficient in the transmission matrix, defined by $1 - 2a_{ij}$, falls in the range $[-1, 1]$. Thus, in the MVM operation, the input vector $I$ is non-negative, while the transmission matrix $X$ and the output vector $O$ can cover the real number field.
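The differential detection described above can be sketched in a few lines of plain Python. This is only an illustration under the lossless assumption of the text; the drop coefficients and input powers are hypothetical values chosen for the example, and the function name `differential_mvm` is not from the paper.

```python
# Sketch of the differential-detection MVM: each MRR sends a fraction a_ij of
# the light to the drop port and 1 - a_ij to the through port (loss ignored).

def differential_mvm(drop, inputs):
    """Return the through-minus-drop power of each row for non-negative inputs."""
    outputs = []
    for row in drop:
        through_power = sum((1 - a) * i for a, i in zip(row, inputs))
        drop_power = sum(a * i for a, i in zip(row, inputs))
        outputs.append(through_power - drop_power)
    return outputs

# Hypothetical drop coefficients a_ij in [0, 1] and input powers in [0, 1].
drop = [
    [0.0, 0.5, 1.0, 0.25],
    [1.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.75, 0.0, 0.25, 1.0],
]
inputs = [1.0, 0.5, 0.25, 0.0]

# Equivalent transmission matrix X with entries 1 - 2*a_ij in [-1, 1].
X = [[1 - 2 * a for a in row] for row in drop]
expected = [sum(x * i for x, i in zip(row, inputs)) for row in X]
outputs = differential_mvm(drop, inputs)
```

With these values, `outputs` and `expected` agree row by row, which shows how a signed matrix is obtained from two non-negative port powers.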

Figure 1 also shows the working principle to extend the matrix computation of the MRR array from the real-valued field to the complex-valued field, and from small-scale (i.e., 4 × 4) to large-scale matrix computation. Combining matrix decomposition and matrix partition, our photonic complex-MVM chip can support arbitrary large-scale and complex-valued matrix computation.

Without loss of generality, consider an MVM that consists of an 8 × 1 complex input matrix $I$, an 8 × 8 complex transmission matrix $X$, and an output matrix $O$. To process a large MVM, the size of the matrices is reduced through matrix partition: matrix $I$ can be broken into two 4 × 1 matrices, while matrix $X$ can be divided into four 4 × 4 matrices. To process the MVM in the full complex number field, matrix $I$ is divided into $I_{1}$, $I_{2}$, $I_{3}$, and $I_{4}$, defined as the positive real, positive imaginary, negative real, and negative imaginary parts of matrix $I$, respectively. Matrix $X$ is likewise divided into $X_{1}$ and $X_{2}$, representing the real and imaginary parts of $X$. The elements of the input submatrix, $I_{n} = [i_{1}, i_{2}, i_{3}, i_{4}]_{n}^{T}$ $(n = 1, 2, 3, 4)$, are loaded onto beams with different wavelengths $λ_{1}$, $λ_{2}$, $λ_{3}$, and $λ_{4}$ by optical intensity modulators (IMs). After mixing by a wavelength multiplexer (MUX), the input is equally divided into four branches, each of which consists of four independent MRRs aligned to resonate at the $λ_{1}$, $λ_{2}$, $λ_{3}$, and $λ_{4}$ wavelengths, respectively. Matrix $X_{n}$ $(n = 1, 2)$ is loaded onto the photonic complex-MVM core with the 4 × 4 MRR array, where the coefficients are determined by the voltages applied to each MRR. The output matrix $O$ is detected by balanced photodetectors (PDs).

If the input vectors $I_{1}$, $I_{2}$, …, $I_{m}$ are loaded in series, the input can be expanded into an n × m matrix $I = [I_{1}, I_{2}, \dots, I_{m}]$. Similarly, the corresponding output powers $O_{1}$, $O_{2}$, …, $O_{m}$ are measured in series, so that the n × m output matrix can be written as $O = [O_{1}, O_{2}, \dots, O_{m}]$. Hence, the MVM can be expanded into matrix–matrix multiplication, denoted by the following equation:

$$
[O_{1}, O_{2}, \dots, O_{m}] = X [I_{1}, I_{2}, \dots, I_{m}].
$$
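A minimal sketch of this serial loading, in plain Python, is given below. The transmission matrix is a hypothetical example, and the identity columns are used only because they make the result easy to check: output k is then column k of $X$.

```python
def mvm(X, vec):
    """One optical pass: multiply matrix X by a single input vector."""
    return [sum(x * v for x, v in zip(row, vec)) for row in X]

def serial_matmul(X, columns):
    """Load the m input vectors one after another and collect the m outputs."""
    return [mvm(X, col) for col in columns]

# Hypothetical 4 x 4 transmission matrix with entries in [-1, 1].
X = [
    [1.0, -0.5, 0.0, 0.25],
    [0.5, 0.5, -1.0, 0.0],
    [0.0, 1.0, 0.75, -0.25],
    [-1.0, 0.0, 0.5, 1.0],
]

# Columns of the 4 x 4 identity matrix, loaded in series (m = 4).
identity_columns = [[1.0 if r == c else 0.0 for r in range(4)] for c in range(4)]
outputs = serial_matmul(X, identity_columns)

# Output k is column k of X; rearranging the columns recovers X itself.
recovered = [[outputs[c][r] for c in range(4)] for r in range(4)]
```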

Results

Fabrication and experimental setup

The proposed device was fabricated on a silicon-on-insulator (SOI) platform. A 725 μm SOI wafer with 220 nm of top silicon and 2 μm of buried oxide (BOX) was used. The layout was transferred onto photoresist using electron beam lithography (EBL) and the top silicon was etched by inductively coupled plasma (ICP). The grating coupler was shallowly etched by 70 nm, while the silicon waveguide was fully etched by 220 nm. Between the waveguide and metal electrodes, 1 μm of silicon dioxide was deposited using plasma enhanced chemical vapor deposition (PECVD). The metal for the heaters and pads was deposited by electron beam evaporator (EBE). The heaters were made of 150 nm thick and 1 μm wide Ti. The electrical wires and pads were made of 20/250 μm thick Ti/Au.

The microscope image of the fabricated chip is illustrated in Fig. 2a. The input signal is injected through a grating coupler on the left and subsequently divided into four identical branches with a 4 × 4 MRR array. There are eight output gratings, representing the bus through waveguides and bus drop waveguides for each row of MRRs. The eight output gratings are placed in equal distances of 127 μm, the exact distance of the fiber array (FA) coupler. Figure 2b shows the packaged chip, where the metal pads are connected to the printed circuit board (PCB) by wire-bonding and the PCB is controlled by a custom 120-channel voltage source via a flexible flat cable. The input optical grating is coupled to an optical fiber that is vertically glued to the SOI chip. The output optical gratings on the chip are coupled to an optical FA that is attached to the PCB and equally distributed in 127 μm spacing V-groove, so that vertical output light from the chip is reflected 45° by the FA.

The experimental setup is shown in Fig. 2c. A continuous-wave (CW) laser was used as the stable optical source for the IMs. The electrical input data was encoded by a programmable voltage source and used as the driving signal that was temporally fed into the IMs. Since the output of the modulator is polarization-dependent, polarization controllers (PCs) were placed before and after the IMs to control the polarizations. A dense wavelength division multiplexing (DWDM) component was employed to combine the four wavelengths into a bus waveguide coupled to the packaged SOI chip. The optical power meter both detects and displays the power values of the optical signals, which allowed us to obtain and record the results directly.

To verify the MVM function, IMs were used to configure the input vector $I$ , while the transmission matrix $X$ was loaded by tuning the voltages applied on the MRR array. The output power values were then obtained from the balanced PDs. After calibration and normalization, the output vector $O$ was obtained. When the input is the identity matrix, the output matrix $O$ is equivalent to the transmission matrix $X$ , allowing the transmission matrix $X$ to be directly read at the output ports. In practical situations, the variation ranges of through transmittance coefficient and drop transmittance coefficient are different due to MRR loss. In this case, the coefficients will require recalibration for actual optical matrix computation (See Appendix A).

To statistically describe the performance of this multiplier, over 500 sets of input vectors and matrices $X$ were loaded onto the IMs and the MRR array, respectively. Experimental results showed that the majority of the absolute values of the errors fall within the range of 0 to 0.1, which suggests rather accurate computing. See Appendix B for more details.

Matrix–vector multiplication extending to the full real number field

Since the input vector $I$ is determined by the optical powers modulated by the IMs, its elements must be non-negative. The transmission matrix $X$ and output vector $O$ already cover the real number field, so only the input restricts the computation. Our proposed scheme removes this restriction by allowing negative input elements, extending the MVM to the full real number field.

Figure 3 illustrates the proposed scheme. First, the input vector (real numbers) was divided into $I_{+}$, containing all the positive elements and zeros, and $I_{-}$, containing the absolute values of all the negative elements. The relationships between $I$, $I_{+}$, and $I_{-}$ are given by

$$
\begin{cases}
I_{+} = \dfrac{|I| + I}{2}, \\
I_{-} = \dfrac{|I| - I}{2}, \\
I = I_{+} - I_{-}.
\end{cases}
$$

Fig. 3 — Matrix computation extending to the full real number field. The 4 × 1 block array represents the input or output vectors and the 4 × 4 block array represents the transmission matrix. The bar graph shows the results from one operation, where the inputs or experimental outputs are represented by the colored bars and the theoretical outputs or transmission matrix are represented by the gray bars

The resulting two non-negative vectors, $I_{+}$ and $I_{-}$, are subsequently used in place of the original input vector. The transmission matrix $X$ was then loaded and the input vectors were configured as $I_{+}$ and $I_{-}$, respectively, to obtain the two output vectors, $P$ and $Q$. The targeted output vector $O$ was obtained by a subsequent subtraction. The relationships between $P$, $Q$, and $O$ are expressed as

$$
\begin{cases}
P = X I_{+}, \\
Q = X I_{-}, \\
O = P - Q.
\end{cases}
$$

Using the method described above, we were able to successfully split a real-valued optical MVM operation into two non-negative optical MVMs and one subtraction in the electrical domain. Figure 3 shows an experimental example of a real-valued MVM. The theoretical and experimental results are shown in three-dimensional bar graphs next to the corresponding matrices or vectors.
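The split into two non-negative MVMs and one subtraction can be sketched as follows in plain Python. The matrix and input values are hypothetical, and `mvm` is only a numerical stand-in for one optical pass.

```python
def mvm(X, vec):
    """Numerical stand-in for one optical MVM pass."""
    return [sum(x * v for x, v in zip(row, vec)) for row in X]

def split_signed(vec):
    """Split a real vector into non-negative parts so that vec = pos - neg."""
    pos = [(abs(v) + v) / 2 for v in vec]
    neg = [(abs(v) - v) / 2 for v in vec]
    return pos, neg

# Hypothetical real-valued transmission matrix and signed input vector.
X = [
    [1.0, -0.5, 0.0, 0.25],
    [0.5, 0.5, -1.0, 0.0],
    [0.0, 1.0, 0.75, -0.25],
    [-1.0, 0.0, 0.5, 1.0],
]
I = [0.5, -1.0, 0.0, 0.25]

I_pos, I_neg = split_signed(I)          # both parts are non-negative
P = mvm(X, I_pos)                       # first optical MVM
Q = mvm(X, I_neg)                       # second optical MVM
O = [p - q for p, q in zip(P, Q)]       # subtraction in the electrical domain

direct = mvm(X, I)                      # reference result with the signed input
```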

Matrix–vector multiplication extending to the full complex number field

To further extend our matrix computation into the complex number field, the input vector $I$ and transmission matrix $X$ were both separated into a real part and imaginary part. The output vector can be expressed as

$$
O = XI = \left(\mathrm{real}(X) + i\,\mathrm{imag}(X)\right)\left(\mathrm{real}(I) + i\,\mathrm{imag}(I)\right),
$$

where $i$ is the square root of minus one, $\mathrm{real}(M)$ represents the real part of matrix $M$, and $\mathrm{imag}(M)$ represents the imaginary part of matrix $M$ (here, $M$ can be $X$, $I$, or $O$).

The matrix multiplication can then be divided into

$$
\begin{cases}
\mathrm{real}(O) = \mathrm{real}(X)\,\mathrm{real}(I) - \mathrm{imag}(X)\,\mathrm{imag}(I), \\
\mathrm{imag}(O) = \mathrm{real}(X)\,\mathrm{imag}(I) + \mathrm{imag}(X)\,\mathrm{real}(I).
\end{cases}
$$

As seen in Fig. 4a, the complex-valued matrix multiplication was divided into four optical MVMs, specifically $\mathrm{real}(X)\,\mathrm{real}(I)$, $\mathrm{imag}(X)\,\mathrm{imag}(I)$, $\mathrm{real}(X)\,\mathrm{imag}(I)$, and $\mathrm{imag}(X)\,\mathrm{real}(I)$, as well as two electrical operations (one subtraction and one addition). Figure 4a also shows an experimental demonstration of complex MVM. The two-dimensional coordinate diagrams in blue dots represent the corresponding input vectors or output vectors, and the three-dimensional gray bar graphs represent the transmission matrix. The experimental results are consistent with the theoretical results. In addition, the experimental results presented in Fig. 4b of the output of complex-valued matrix multiplication are also consistent with the predicted results.
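The decomposition into four real-valued MVMs can be checked numerically with a short plain-Python sketch. The 2 × 2 size and all values are hypothetical and chosen for brevity; on the chip, each real-valued product with signed inputs would additionally need the positive/negative input split described in the previous subsection.

```python
def mvm(X, vec):
    """Numerical stand-in for one real-valued optical MVM pass."""
    return [sum(x * v for x, v in zip(row, vec)) for row in X]

def complex_mvm(X, vec):
    """Complex MVM built from four real MVMs plus two electrical operations."""
    X_re = [[x.real for x in row] for row in X]
    X_im = [[x.imag for x in row] for row in X]
    I_re = [v.real for v in vec]
    I_im = [v.imag for v in vec]

    re_re = mvm(X_re, I_re)   # real(X) real(I)
    im_im = mvm(X_im, I_im)   # imag(X) imag(I)
    re_im = mvm(X_re, I_im)   # real(X) imag(I)
    im_re = mvm(X_im, I_re)   # imag(X) real(I)

    # real(O) = re_re - im_im, imag(O) = re_im + im_re
    return [complex(a - b, c + d) for a, b, c, d in zip(re_re, im_im, re_im, im_re)]

# Hypothetical complex transmission matrix and input vector.
X = [
    [1 + 0j, 0.5 - 0.5j],
    [-0.25j, -1 + 0.5j],
]
I = [0.5 + 1j, -0.25 + 0.5j]

O = complex_mvm(X, I)
direct = [sum(x * v for x, v in zip(row, I)) for row in X]  # native complex product
```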

Matrix–vector multiplication extending to higher dimensions

Because matrix partition allows a large matrix to be processed block by block, a high-dimensional MVM can be implemented with a low-dimensional MRR array. Figure 5 illustrates the basic principle of matrix partition. The input and output data are 8 × 1 vectors and the transmission matrix $X$ is an 8 × 8 matrix. To execute the 8 × 8 matrix computation using our 4 × 4 processor, the input and output vectors have to be split into two 4 × 1 vectors. Meanwhile, the transmission matrix is broken into four 4 × 4 matrices. Therefore, the equation can be written as

$$
O = \begin{pmatrix} O_{1} \\ O_{2} \end{pmatrix}
= \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}
\begin{pmatrix} I_{1} \\ I_{2} \end{pmatrix}
= \begin{pmatrix} X_{11} I_{1} + X_{12} I_{2} \\ X_{21} I_{1} + X_{22} I_{2} \end{pmatrix}.
$$

Fig. 5 — Example of the partition of an 8 × 8 MVM. The 4 × 1 block array represents the input or output vectors, and the 4 × 4 block array represents the transmission matrix. The bar graph shows the results from one operation, where the inputs or experimental outputs are represented by colored bars and the theoretical outputs are represented by gray bars

Therefore, the partition of matrix can be realized by four rounds of optical MVMs and two rounds of electrical additions. Figure 5 shows an experimental demonstration of a partition of MVM, where the theoretical or experimental results are given in the three-dimensional bar graphs. It can be also seen from Fig. 5 that the experimental results are in agreement with the theoretical predictions.
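The partition can be sketched in plain Python as below. `core_mvm` is a hypothetical stand-in for one pass through the 4 × 4 core, and the 8 × 8 matrix and 8 × 1 vector are synthetic test patterns rather than data from the experiment. The signed inputs are used for brevity; optically they would again be split into non-negative parts.

```python
CORE = 4  # size of the physical MRR array

def core_mvm(X4, I4):
    """Stand-in for one pass through the 4 x 4 core."""
    assert len(X4) == CORE and all(len(row) == CORE for row in X4)
    assert len(I4) == CORE
    return [sum(x * v for x, v in zip(row, I4)) for row in X4]

def block(X, r, c):
    """Extract the 4 x 4 sub-matrix X_rc (block row r, block column c)."""
    return [row[c * CORE:(c + 1) * CORE] for row in X[r * CORE:(r + 1) * CORE]]

def partitioned_mvm(X, I):
    """Large MVM assembled from 4 x 4 passes and electrical accumulation."""
    blocks = len(I) // CORE
    out = []
    for r in range(blocks):
        acc = [0.0] * CORE
        for c in range(blocks):
            partial = core_mvm(block(X, r, c), I[c * CORE:(c + 1) * CORE])
            acc = [a + p for a, p in zip(acc, partial)]  # electrical addition
        out.extend(acc)
    return out

# Synthetic 8 x 8 matrix with entries in [-1, 1] and an 8 x 1 input vector.
X = [[((r * 8 + c) % 5 - 2) / 2 for c in range(8)] for r in range(8)]
I = [((k * 3) % 4 - 1) / 2 for k in range(8)]

O = partitioned_mvm(X, I)   # four 4 x 4 passes for the 8 x 8 case
direct = [sum(x * v for x, v in zip(row, I)) for row in X]
```

The same function handles a 16 × 16 matrix with sixteen passes, since the number of core passes grows with the square of the number of blocks per side.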

Applications in signal transformation and image processing

Modern signal and image processing are two fields where algorithms based on large complex MVMs are widely utilized. This paper demonstrates three typical signal transformations, specifically, discrete WHT, DCT, and DFT [39]. WHT is an orthogonal transformation that is widely used in imaging and code division multiple access [40]. The Hadamard matrix elements are equal to 1 or − 1, so that there are only addition and subtraction operations in the calculation, making it much simpler than DFT and DCT. Energy concentration is a characteristic of WHT, meaning the more uniform the numbers in the original data are, the more concentrated the transformed data are on one side. This property makes WHT advantageous for image compression [41]. Figure 6a shows the input signal and Fig. 6e shows the transformed signal after the matrix size was extended to 16 × 16. One can see that WHT can compress information in the low frequency region if the input signal has a uniform amplitude distribution; the high frequency region can then be ignored since it has a very low amplitude. DCT plays an important role in signal processing, signal modulation, and demodulation [42]. A periodic sequence was input into a 16 × 16 network and the output was calculated, as shown in Fig. 6b and f. The first half of the former sequence was loaded into an 8 × 8 network as the input, depicted in Fig. 6c. The resulting output vector, shown in Fig. 6g, is quite similar to that presented in Fig. 6f. These results reflect the symmetry of DCT and provide supporting evidence that our system can correctly perform DCT. In addition, DFT converts a signal sampled in the time domain into the frequency domain and is one of the most frequently used operations in signal transformation [43]. Here, we used an input signal in the form of a square wave. Since DFT is a complex transformation, the output sequence is shown in the form of its absolute value, which has the shape of a sinc function, as shown in Fig. 6d and h.
The results show that not only can DFT be performed by our system, but the calculation errors are also very small.
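As a rough numerical illustration of two of these transforms, the plain-Python sketch below builds a 16 × 16 Hadamard matrix and a 16-point DFT expressed as two real-valued MVMs. Several choices are assumptions made for the example and are not taken from the experiment: the Hadamard matrix uses the Sylvester construction in natural ordering, both transforms are left unnormalized, and the input signals are synthetic.

```python
import math

def mvm(X, vec):
    """Numerical stand-in for one real-valued MVM pass."""
    return [sum(x * v for x, v in zip(row, vec)) for row in X]

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def dft_real_input(signal):
    """DFT of a real signal as two real MVMs (cosine part and sine part)."""
    n = len(signal)
    cos_m = [[math.cos(2 * math.pi * k * m / n) for m in range(n)] for k in range(n)]
    sin_m = [[-math.sin(2 * math.pi * k * m / n) for m in range(n)] for k in range(n)]
    real_part = mvm(cos_m, signal)   # real(X) real(I)
    imag_part = mvm(sin_m, signal)   # imag(X) real(I)
    return [complex(a, b) for a, b in zip(real_part, imag_part)]

N = 16

# A uniform input concentrates all WHT energy in the first coefficient.
uniform = [1.0] * N
wht = mvm(hadamard(N), uniform)

# A rectangular pulse gives a sinc-like DFT magnitude envelope.
square = [1.0] * 4 + [0.0] * 12
magnitude = [abs(z) for z in dft_real_input(square)]
```

All matrix entries used here lie in $[-1, 1]$, which is the range the differential MRR scheme can represent.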

Fig. 6 — Input and output signal sequence of three signal transformation. The input sequence of a WHT, b and c, DCT, and d DFT. e Experimental results (blue bars) and ideal (red bars) output of WHT. f and g Experimental results (blue bars) and ideal (red bars) output of DCT. h Experimental results (blue bars) and ideal (red bars) output of DFT

Image convolution is of paramount importance to convolutional neural networks and image processing, and it can be performed in the optical domain to achieve convolutional acceleration. To experimentally verify image convolution with our MVM, we chose the logo of Wuhan National Laboratory for Optoelectronics (WNLO) as an example, as well as seven different 3 × 3 kernels. The kernels are designed to perform different image processing functions or highlight different edges of the original image. The pixel values of the input image are loaded into the IMs by the electrical waveform and the on-chip MRR array is loaded with the transmission matrix representing the kernel. Figure 7 shows the experimental results, including the recovered feature maps and corresponding transmission matrices of the kernels.

Compared with the original image, the edge features of the processed image are clearly visible in Fig. 7e−h, demonstrating the effectiveness of the optical convolution operation. The kernels in Fig. 7b–d correctly performed different image processing functions, including blur, motion blur, and sharpen. The kernels in Fig. 7e−h highlighted the edges of the original image in different directions. Using the theoretical results as reference, we determined that the calculation errors of the optical convolution operation were mainly concentrated in the bright part (i.e., high pixel value area) of the image, which indicates that these errors are largely caused by thermal crosstalk, rather than noise. Real-time calibration algorithms and external temperature control devices are implemented for system stability.
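One common way to express image convolution as an MVM is to flatten every 3 × 3 window of the image into an input vector and the kernel into a row of weights. The plain-Python sketch below follows that approach; it is an assumption about the mapping, not a description of how the experiment loaded the chip, and the tiny image and vertical-edge kernel are hypothetical. As is usual for neural networks, the kernel is applied without flipping. A 9-element window exceeds the four wavelengths of the core, so on the chip it would presumably be handled with matrix partition.

```python
def mvm(X, vec):
    """Numerical stand-in for one MVM pass."""
    return [sum(x * v for x, v in zip(row, vec)) for row in X]

def patches_3x3(image):
    """Flatten every 3 x 3 window (valid positions only) into a 9-element vector."""
    rows, cols = len(image), len(image[0])
    return [
        [image[r + dr][c + dc] for dr in range(3) for dc in range(3)]
        for r in range(rows - 2)
        for c in range(cols - 2)
    ]

def convolve(image, kernel):
    """Slide the kernel over the image, one MVM per window."""
    weights = [[w for row in kernel for w in row]]   # 1 x 9 weight matrix
    out_cols = len(image[0]) - 2
    flat = [mvm(weights, patch)[0] for patch in patches_3x3(image)]
    return [flat[i:i + out_cols] for i in range(0, len(flat), out_cols)]

# Hypothetical 4 x 5 image with a dark left part and a bright right part.
image = [[0, 0, 1, 1, 1] for _ in range(4)]

# Hypothetical vertical-edge kernel with weights in [-1, 1].
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = convolve(image, kernel)   # large values mark the vertical edge
```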

Discussion and future perspective

The experimental results of both signal and image processing clearly demonstrate that our proposed system is able to extend matrix computation to (1) real numbers, (2) full complex numbers, (3) higher processing dimensions, and (4) convolution. Thus, the processor can serve as a universal matrix arithmetic processor for complex tasks in various application scenarios.

However, the processor can be further improved in several ways. First, the computational efficiency can be multiplied by making full use of parallel computation, that is, by increasing the number of input wavelengths. Note that the transmission spectrum of an MRR repeats with a period of about 6 nm, which is the free spectral range (FSR) of the MRR. Therefore, multiple sets of input vectors with a wavelength interval equal to the FSR can be operated on simultaneously, as shown in Fig. 8. Suppose that there are m sets of different input vectors and the wavelengths of the input matrices are set as $(λ_{1}, λ_{2}, λ_{3}, λ_{4}) + p\,\mathrm{FSR}$, where $p = 0, 1, \dots, m - 1$. To obtain the output data, the output powers of each row are separated by a wavelength-division demultiplexer and separately detected by m sets of corresponding balanced PDs. In this process, the state of the transmission matrix is fixed (i.e., the state of the MRR array is fixed), while the m sets of input and output vectors are processed independently in parallel. This means that m sets of MVMs can be executed simultaneously, demonstrating the possibility of parallel optical computation. Second, full integration is crucial to improve the competitiveness of optical computing compared to electrical matrix processing. As shown in Fig. 8, an optical comb is integrated into the chip, providing a series of comb lines that are modulated by the IMs of the input module. With this, the experimental setup is greatly simplified. The thermally tuned MRRs can be replaced by electrically tuned ones, which might improve the response rate by several orders of magnitude. As for electrical control, the electrical controller/receiver, together with a microcontroller, random access memory (RAM), and external ports, is applied to improve the system response rate.
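The wavelength plan for this parallel scheme can be sketched in plain Python. The base wavelengths are hypothetical, the FSR is the approximate 6 nm value quoted above, and a perfectly constant FSR is an idealization, since in a real ring it varies slightly with wavelength.

```python
FSR_NM = 6.0                                  # approximate FSR quoted in the text
BASE_NM = [1550.0, 1551.5, 1553.0, 1554.5]    # hypothetical lambda_1..lambda_4

def wavelength_sets(m):
    """Return m wavelength sets; set p is (lambda_1..lambda_4) + p * FSR."""
    return [[lam + p * FSR_NM for lam in BASE_NM] for p in range(m)]

def ring_index(wavelength_nm):
    """Index of the MRR whose periodic resonance comb contains this wavelength."""
    offset = (wavelength_nm - BASE_NM[0]) % FSR_NM
    return min(range(4), key=lambda j: abs(offset - (BASE_NM[j] - BASE_NM[0])))

# Three parallel input vectors share the same MRR array state.
sets = wavelength_sets(3)

# Every channel j of every set p falls on a resonance of the same ring j.
assignment = [[ring_index(lam) for lam in wl_set] for wl_set in sets]
```

Because each set lands on the same rings, one fixed configuration of the array multiplies all m input vectors at once, and the demultiplexer separates the results by set.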

Fig. 8 — Highly integrated on-chip scheme for optical parallel computation. There are m sets of different input vectors provided by multiwavelength light source (e.g., on-chip optical comb). Input signals are modulated in different wavelengths by the IMs, then multiplexed as the input of MRR array via wavelength division multiplexers (MUXs). The output powers of each row in MRR array are divided by the wavelength division demultiplexers (DEMUXs) and separately detected by m sets of corresponding photodiodes. Each set of wavelengths is used for one input vector. The electrical controller/receiver are driven by microcontroller equipped with RAM and external ports

Conclusion

In conclusion, we have demonstrated a small MRR array that performs large complex MVM. Through matrix decomposition and partition, we have also optimized the photonic complex-MVM core so that it can perform larger complex MVM and extended its matrix computation to (1) real number, (2) complex number, and (3) higher processing dimensions. We have fabricated the integrated photonic complex-MVM core on an SOI platform, which is compact and compatible with CMOS technology. With a small MRR array, the 4 × 4 matrix computation system can be scaled up to 8 × 8, 16 × 16, or even larger operation dimensions in complex field with traditional incoherent computing. The processor was then applied for WHT, DCT and DFT signal transformations. Image processing with 7 types of convolutional kernels is also experimentally demonstrated. Our proposed system shows adequate performance in various applications. The processing capacity of this matrix–vector multiplier can be further enhanced by enabling parallel WDM computation and full integration with on-chip laser sources and electrical microcontrollers in the future.

Acknowledgements

This work was partially supported by the National Key Research and Development Project of China (No. 2018YFB2201901), the National Natural Science Foundation of China (Grant Nos. 61805090 and 62075075), Shenzhen Science and Technology Innovation Commission (No. SGDX2019081623060558), and Research Grants Council of Hong Kong SAR (No. PolyU152241/18E).

Biographies

Junwei Cheng

is currently a Ph.D. candidate in Huazhong University of Science and Technology (HUST), China. He received his B.Eng. degree from HUST in 2019. His current research interests include silicon photonics and photonic neuromorphic computing.

Yuhe Zhao

received her Ph.D. degree in Optical Engineering from Huazhong University of Science and Technology, China in 2021. Her research interests are optoelectronic devices and integration, microwave photonics and arbitrary waveform generation.

Wenkai Zhang

received his Bachelor’s degree from Huazhong University of Science and Technology (HUST), China in 2021. Then he joined Wuhan National Laboratory for Optoelectronics at HUST as a Ph.D. candidate. His research interests include photonic integrated circuits and neuromorphic photonics.

Hailong Zhou

received his B.S. degree from Huazhong University of Science and Technology (HUST), China in June 2012. From September 2012 to June 2017, he studied in Wuhan National Laboratory for Optoelectronics at HUST as a doctoral candidate and received his Ph.D. degree in June 2017. Currently, he is a postdoctoral researcher in Wuhan National Laboratory for Optoelectronics at HUST, China. His research interests are silicon photonics and photonic accelerators for artificial intelligence.

Dongmei Huang

received her B.S. degree in 2014 from Huazhong University of Science and Technology, China, obtained her M.S. degree in 2017 from Chongqing University, China, and obtained her Ph.D. degree in 2020 from the Hong Kong Polytechnic University, China. She is currently a Research Assistant Professor at Photonics Research Centre of The Hong Kong Polytechnic University. Her research interests include wavelength swept lasers and their applications in optical coherence tomography and optical sensing systems, and nonlinear microresonators.

Qing Zhu

is a Senior Engineer at Huawei Technologies Co., LTD. She received her Ph.D. degree in Condensed Matter Physics from Institute of Physics, University of the Chinese Academy of Sciences, China, in 2020, and her bachelor’s degree in Physics from Jilin University, China, in 2014. Since finishing her Ph.D., she has been working in Institute of Strategic Research of Huawei Technologies Co., LTD. Her recent research interests include optical computing, silicon modulator and integrated photonics.

Yuhao Guo

received the Ph.D. degree from Tianjin University, China, in 2020. Since 2021, he has been a senior engineer with Institute of Strategic Research in Huawei technologies Co. LTD. His research interests include integrated nanophotonics, chip-scale optical interconnects, metasurface, and optical computing. He has authored or co-authored 27 peer-reviewed articles and he has 3 patents issued.

Bo Xu

is a Senior Engineer at Huawei Technologies Co., LTD. She received her Ph.D. degree in Optics from Shanghai Institute of Optics and Fine Mechanics, University of the Chinese Academy of Sciences, China, in 2020, and her bachelor’s degree in Optoelectronic Technology and Science from Nankai University, Tianjin, China, in 2015. Since finishing her Ph.D., she has worked in Institute of Strategic Research of Huawei Technologies Co., LTD. Her recent research interests include photonic chip and optical computing.

Jianji Dong

is a Professor at Wuhan National Laboratory for Optoelectronics (WNLO), Huazhong University of Science and Technology (HUST), China. He received his Ph.D. degree in Optical Engineering from HUST in 2008. He then worked as a postdoctoral researcher at Cambridge University, UK, until 2010. He returned to HUST in March 2010 and was promoted to full professor in 2013. His research interests include integrated microwave photonics, silicon photonics, and photonic computing. He has published more than 100 journal papers, including papers in Nature Communications, Light: Science & Applications, and Physical Review Letters. He has made contributions to energy-efficient graphene silicon microheaters, programmable temporal cloaks, and complex spectrum analyzers of orbital angular momentum modes. He received the First Award of Natural Science of Hubei Province. He is an editorial board member of Scientific Reports, an associate editor of IET Optoelectronics, and the executive editor-in-chief of Frontiers of Optoelectronics. He is an IEEE Senior Member and an OSA member.

Xinliang Zhang

received his Ph.D. degree in Physical Electronics from Huazhong University of Science and Technology (HUST), China, in 2001. He is currently a Professor with Wuhan National Laboratory for Optoelectronics and the School of Optical and Electronic Information, HUST. He is the author or coauthor of more than 300 journal and conference papers. His current research interests include InP-based and Si-based devices and integration for optical networks, high-performance computing, and ultrafast optical measurements. In 2016, he was elected an OSA Fellow.

Appendix

A. Calibration of MRR array

Since the MRR is a resonant device, the transmittance of the through and drop ports depends on the detuning between the laser wavelength and the resonance wavelength of the MRR. Therefore, the four laser wavelengths need to be calibrated to the resonance peaks of the corresponding MRRs prior to experimentation. Figure 9a shows the state in which the laser wavelengths are not aligned with the resonant peaks of the MRRs. In this case, the transmission coefficient of each MRR is $x_{ij} = 1$. As shown in Fig. 9b, the voltages applied to the four MRRs were changed so that the four laser wavelengths coincide with the resonant peaks of the MRRs, where the transmission coefficients were all $x_{ij} = -1$. The ring array is calibrated between these two states. Figure 9c shows the normalized power detected at the through port when the MRR is fixed in the all-pass state (i.e., the transmission coefficient of the MRR is 1) and the voltage applied to the intensity modulator (IM) is changed. The voltage-input relational table was obtained by choosing a fixed step length of 20 mV, applying 300 voltage steps to the IM, and measuring the corresponding output power. When a particular input needs to be loaded, the computer looks up the table and loads the corresponding voltage into the IM. The same table look-up method is used in MRR calibration. First, the corresponding input is set to its maximum value of 1, and the voltages are selected with a fixed step size so that the transmission coefficient sweeps between $x_{ij} = -1$ and $x_{ij} = 1$. Then, the voltages are applied to the MRR array and the output powers of the MRRs are measured. The resulting voltage-transmission relational table is shown in Fig. 9d and e. When a particular transmission coefficient needs to be loaded, the computer looks up the nearest value in the table and loads the corresponding voltage onto the MRR electrodes.

Fig. 9 Calibration of MRR array. a, b Spectra before and after laser wavelength calibration, respectively; the four laser wavelengths are represented by four colors (red, green, blue, purple). c Voltage-input relational table. d Normalized optical power of the through port (red line) and drop port (blue line) of a certain MRR. e Normalized optical power curve after difference calculation
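
The table look-up step of the calibration can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' control software: the Lorentzian drop-port response, the 0–2 V sweep range, and the helper names `build_table` and `lookup_voltage` are hypothetical. The coefficient is taken as the normalized through power minus the normalized drop power, which is our reading of the "difference calculation" in Fig. 9e.

```python
import numpy as np

def build_table(voltages, through, drop):
    """Pair each swept voltage with a coefficient in [-1, 1].

    Assumption: coefficient = normalized through power - normalized drop power.
    """
    voltages = np.asarray(voltages, dtype=float)
    coeffs = np.asarray(through, dtype=float) - np.asarray(drop, dtype=float)
    return voltages, coeffs

def lookup_voltage(table, target):
    """Return the swept voltage whose coefficient is nearest to target."""
    voltages, coeffs = table
    nearest = int(np.argmin(np.abs(coeffs - target)))
    return float(voltages[nearest])

# Synthetic stand-in for a measured sweep (assumed, not experimental data):
# a lossless Lorentzian drop response that reaches resonance at 2.0 V.
sweep = np.linspace(0.0, 2.0, 101)               # 20 mV steps
drop = 1.0 / (1.0 + ((sweep - 2.0) / 0.3) ** 2)  # drop-port power
through = 1.0 - drop                             # through-port power
table = build_table(sweep, through, drop)

for target in (1.0, 0.0, -1.0):
    print(f"x_ij = {target:+.1f} -> {lookup_voltage(table, target):.2f} V")
```

Because the table is sampled at a fixed step, the loaded coefficient is only as accurate as the nearest tabulated entry; a finer sweep or interpolation between entries would reduce this quantization error.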

B. Experimental verification of matrix–vector multiplier

Figure 10a presents a sample experimental transmission function of $X$, Fig. 10b lists the corresponding theoretical results, and Fig. 10c summarizes the vector data results. Each data point represents a dot product of one of the row vectors of $X$ and the input vector. The blue line represents the experimental results and the red line represents the deviation of each experimental point. The error statistics are calculated and shown in Fig. 10d, where most of the absolute values of the errors fall within the range of 0–0.1.
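
The comparison described above (row-by-row dot products, followed by error statistics) can be sketched as follows. The matrix, input vector, and deviations are made-up illustrative values, not data from Fig. 10, and `error_statistics` is a hypothetical helper; the 0.1 tolerance mirrors the error range quoted in the text.

```python
import numpy as np

def error_statistics(measured, expected, tolerance=0.1):
    """Summarize deviations between measured and theoretical outputs."""
    errors = np.asarray(measured, dtype=float) - np.asarray(expected, dtype=float)
    abs_errors = np.abs(errors)
    return {
        "errors": errors,
        "max_abs": float(abs_errors.max()),
        # Share of points whose absolute error lies within the tolerance.
        "fraction_within": float(np.mean(abs_errors <= tolerance)),
    }

# Illustrative values only (assumed for this sketch).
X = np.array([[ 1.0,  -0.5,  0.0,   0.25],
              [ 0.5,   0.5, -1.0,   0.0 ],
              [-0.25,  1.0,  0.5,  -0.5 ],
              [ 0.0,  -1.0,  0.25,  1.0 ]])
a = np.array([1.0, 0.5, 0.0, 0.2])

# Each output is the dot product of one row of X with the input vector.
expected = X @ a
measured = expected + np.array([0.02, -0.05, 0.12, 0.0])  # made-up deviations

stats = error_statistics(measured, expected)
print(stats["max_abs"], stats["fraction_within"])
```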

Authors’ contributions

The authors read and approved the final manuscript.

Declarations

Competing interests

The authors declare that they have no competing interests.

Footnotes

Junwei Cheng and Yuhe Zhao contributed equally to this work.

References

1.Shi, W., Caballero, J., Huszar, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proc. CVPR 1874–1883 (2016)
2.Li, X., Zhang, G., Huang, H.H., Wang, Z., Zheng, W.: Performance analysis of GPU-based convolutional neural networks. Proc. ICPP 67–76 (2016)
3.Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. Proc. CVPR 5325–5334 (2015)
4.Kitayama KI, Notomi M, Naruse M, Inoue K, Kawakami S, Uchida A. Novel frontier of photonics for data processing—photonic accelerator. APL Photonics. 2019;4(9):090901. doi: 10.1063/1.5108912.
5.Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun. ACM. 2017;60(6):84–90. doi: 10.1145/3065386.
6.LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444. doi: 10.1038/nature14539.
7.Xu X, Tan M, Corcoran B, Wu J, Boes A, Nguyen TG, Chu ST, Little BE, Hicks DG, Morandotti R, Mitchell A, Moss DJ. 11 TOPS photonic convolutional accelerator for optical neural networks. Nature. 2021;589(7840):44–51. doi: 10.1038/s41586-020-03063-0.
8.Wu C, Yu H, Lee S, Peng R, Takeuchi I, Li M. Programmable phase-change metasurfaces on waveguides for multimode photonic convolutional neural network. Nat. Commun. 2021;12(1):96. doi: 10.1038/s41467-020-20365-z.
9.Feldmann J, Youngblood N, Karpov M, Gehring H, Li X, Stappers M, Le Gallo M, Fu X, Lukashchuk A, Raja AS, Liu J, Wright CD, Sebastian A, Kippenberg TJ, Pernice WHP, Bhaskaran H. Parallel convolutional processing using an integrated photonic tensor core. Nature. 2021;589(7840):52–58. doi: 10.1038/s41586-020-03070-1.
10.Ríos C, Youngblood N, Cheng Z, Le Gallo M, Pernice WHP, Wright CD, Sebastian A, Bhaskaran H. In-memory computing on a photonic platform. Sci. Adv. 2019;5(2):5759. doi: 10.1126/sciadv.aau5759.
11.Feldmann J, Youngblood N, Wright CD, Bhaskaran H, Pernice WHP. All-optical spiking neurosynaptic networks with self-learning capabilities. Nature. 2019;569(7755):208–214. doi: 10.1038/s41586-019-1157-8.
12.Lin X, Rivenson Y, Yardimci NT, Veli M, Luo Y, Jarrahi M, Ozcan A. All-optical machine learning using diffractive deep neural networks. Science. 2018;361(6406):1004–1008. doi: 10.1126/science.aat8084.
13.Zhou T, Lin X, Wu J, Chen Y, Xie H, Li Y, Fan J, Wu H, Fang L, Dai Q. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit. Nat. Photonics. 2021;15(5):367–373. doi: 10.1038/s41566-021-00796-w.
14.Zhu W, Zhang L, Lu Y, Zhou P, Yang L. Design and experimental verification for optical module of optical vector-matrix multiplier. Appl. Opt. 2013;52(18):4412–4418. doi: 10.1364/AO.52.004412.
15.Habiby SF, Collins Jr SA. Implementation of a fast digital optical matrix-vector multiplier using a holographic look-up table and residue arithmetic. Appl. Opt. 1987;26(21):4639–4652. doi: 10.1364/AO.26.004639.
16.Bocker RP, Clayton SR, Bromley K. Electrooptical matrix multiplication using the twos complement arithmetic for improved accuracy. Appl. Opt. 1983;22(13):2019. doi: 10.1364/AO.22.002019.
17.Goodman JW, Dias AR, Woody LM. Fully parallel, high-speed incoherent optical method for performing discrete Fourier transforms. Opt. Lett. 1978;2(1):1–3. doi: 10.1364/OL.2.000001.
18.Hong J, Yeh P. Photorefractive parallel matrix-matrix multiplier. Opt. Lett. 1991;16(17):1343–1345. doi: 10.1364/OL.16.001343.
19.Cartwright S. New optical matrix-vector multiplier. Appl. Opt. 1984;23(11):1683–1684. doi: 10.1364/AO.23.001683.
20.Athale RA, Collins WC. Optical matrix-matrix multiplier based on outer product decomposition. Appl. Opt. 1982;21(12):2089–2090. doi: 10.1364/AO.21.002089.
21.Mukhopadhyay S, Das DN, Das PP, Ghosh P. Implementation of all-optical digital matrix multiplication scheme with nonlinear material. Opt. Eng. (Redondo Beach, Calif.) 2001;40(9):1998–2002.
22.Liu B, Liu LR, Shao L, Chen HQ. Matrix–vector multiplication in a photorefractive crystal. Opt. Commun. 1998;146(1–6):34–38. doi: 10.1016/S0030-4018(97)00512-9.
23.Gu C, Campbell S, Yeh P. Matrix–matrix multiplication by using grating degeneracy in photorefractive media. Opt. Lett. 1993;18(2):146–148. doi: 10.1364/OL.18.000146.
24.Nitta T. Orthogonality of decision boundaries in complex-valued neural networks. Neural Comput. 2004;16(1):73–97. doi: 10.1162/08997660460734001.
25.Zhou H, Zhao Y, Xu G, Wang X, Tan Z, Dong J, Zhang X. Chip-scale optical matrix computation for pagerank algorithm. IEEE J. Sel. Top. Quantum Electron. 2020;26(2):1–10.
26.Bogaerts W, Pérez D, Capmany J, Miller DAB, Poon J, Englund D, Morichetti F, Melloni A. Programmable photonic circuits. Nature. 2020;586(7828):207–216. doi: 10.1038/s41586-020-2764-0.
27.Clements WR, Humphreys PC, Metcalf BJ, Kolthammer WS, Walmsley IA. Optimal design for universal multiport interferometers. Optica. 2016;3(12):1460–1465. doi: 10.1364/OPTICA.3.001460.
28.Miller DAB. Self-configuring universal linear optical component. Photonics Res. 2013;1(1):1–15. doi: 10.1364/PRJ.1.000001.
29.Mennea PL, Clements WR, Smith DH, Gates JC, Metcalf BJ, Bannerman RHS, Burgwal R, Renema JJ, Kolthammer WS, Walmsley IA, Smith PGR. Modular linear optical circuits. Optica. 2018;5(9):1087–1090. doi: 10.1364/OPTICA.5.001087.
30.Carolan J, Harrold C, Sparrow C, Martín-López E, Russell NJ, Silverstone JW, Shadbolt PJ, Matsuda N, Oguma M, Itoh M, Marshall GD, Thompson MG, Matthews JCF, Hashimoto T, O’Brien JL, Laing A. Universal linear optics. Science. 2015;349(6249):711–716. doi: 10.1126/science.aab3642.
31.Zhou H, Zhao Y, Wang X, Gao D, Dong J, Zhang X. Self-configuring and reconfigurable silicon photonic signal processor. ACS Photonics. 2020;7(3):792–799. doi: 10.1021/acsphotonics.9b01673.
32.Annoni A, Guglielmi E, Carminati M, Ferrari G, Sampietro M, Miller DAB, Melloni A, Morichetti F. Unscrambling light-automatically undoing strong mixing between modes. Light Sci Appl. 2017;6(12):e17110. doi: 10.1038/lsa.2017.110.
33.Zhou H, Zhao Y, Wei Y, Li F, Dong J, Zhang X. All-in-one silicon photonic polarization processor. Nanophotonics. 2019;8(12):2257–2267. doi: 10.1515/nanoph-2019-0310.
34.Shen Y, Harris NC, Skirlo S, Prabhu M, Baehr-Jones T, Hochberg M, Sun X, Zhao S, Larochelle H, Englund D, Soljačić M. Deep learning with coherent nanophotonic circuits. Nat. Photonics. 2017;11(7):441–446. doi: 10.1038/nphoton.2017.93.
35.Tait AN, de Lima TF, Zhou E, Wu AX, Nahmias MA, Shastri BJ, Prucnal PR. Neuromorphic photonic networks using silicon photonic weight banks. Sci. Rep. 2017;7(1):7430. doi: 10.1038/s41598-017-07754-z.
36.Yang, L., Zhang, L., Ji, R.: On-chip optical matrix-vector multiplier. Optics and Photonics for Information Processing VII (2013)
37.Miscuglio M, Sorger VJ. Photonic tensor cores for machine learning. Appl. Phys. Rev. 2020;7(3):031404. doi: 10.1063/5.0001942.
38.Zhao Y, Wang X, Gao D, Dong J, Zhang X. On-chip programmable pulse processor employing cascaded MZI-MRR structure. Front. Optoelectron. 2019;12(2):148–156. doi: 10.1007/s12200-018-0846-5.
39.Roy, A.B., Dey, D., Mohanty, B., Banerjee, D.: Comparison of FFT, DCT, DWT, WHT compression techniques on electrocardiogram and photoplethysmography signals. IJCA Special Issue on International Conference on Computing, Communication and Sensor Network CCSN, 2012. 6–11
40.Rahardja S, Ser W, Lin ZN. UCHT-based complex sequences for asynchronous CDMA system. IEEE Trans. Commun. 2003;51(4):618–626. doi: 10.1109/TCOMM.2003.810798.
41.Andrushia AD, Thangarjan R. Saliency-based image compression using Walsh–Hadamard transform (WHT). In: Biologically rationalized computing techniques for image processing applications. Springer; 2018. pp. 21–42.
42.Strang G. The discrete cosine transform. SIAM Rev. 1999;41(1):135–147. doi: 10.1137/S0036144598336745.
43.Oppenheim, A.V., Schafer, R.W., Buck, J.R.: Discrete-Time Signal Processing. Norwood: Pearson Education India (1999)
