CNN-based noise reduction for multi-channel speech enhancement system with discrete wavelet transform (DWT) preprocessing

2024 Feb 28;10:e1901. doi: 10.7717/peerj-cs.1901

Algorithm 2: Processing DWT output signals through CNN-BLSTM Algorithm

Input: speech signals; deep learning parameters (batch size, feature dimension, number of classes, train/test ratio).

Output: enhanced speech signal and its recognition-rate performance.

Step 1: Capture speech signals using the DMA microphone array.

Step 2: Apply an analogue-to-digital converter to convert the analogue signal into a digital signal.

Step 3: Apply the wavelet transform, with scale a and translation b:

X(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} ψ\left(\frac{t - b}{a}\right) x(t) \, dt

• Decompose the signal into LL, HL, LH, and HH bands by computing the wavelet coefficients as

c_{jk} = [W_{ψ} f] (2^{- j}, k 2^{- j})
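The band split in Step 3 can be illustrated with a minimal sketch. The source does not name the mother wavelet or the decomposition depth, so the Haar wavelet and the reading of LL, LH, HL, and HH as a two-level packet split of a 1-D signal are assumptions made here for illustration; `haar_step` and `four_band_split` are hypothetical helper names, not functions from the paper.

```python
import math


def haar_step(x):
    """One Haar analysis step: return the (low-pass, high-pass) halves of x.

    Assumption: the Haar wavelet stands in for the unspecified wavelet.
    """
    if len(x) % 2:
        raise ValueError("signal length must be even")
    r = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return low, high


def four_band_split(x):
    """Split a 1-D signal into four bands by filtering twice.

    The first letter of each key is the first-level filter (L or H),
    the second letter is the second-level filter.
    """
    low, high = haar_step(x)
    ll, lh = haar_step(low)
    hl, hh = haar_step(high)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}
```

Because the Haar filters are orthonormal, the four bands together carry the same energy as the input frame, which gives a quick sanity check before the coefficients are passed on. The input length must be a multiple of four for both levels to succeed.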

Step 4: Feed these coefficients to the deep learning network

• Process through the convolutional layers, whose output size is

n_{out} = \left\lfloor \frac{n_{in} + 2 p - k}{s} \right\rfloor + 1

where n_in denotes the number of input features, n_out the number of output features, k the convolution kernel size, p the padding size, and s the stride

• Process the convolved data through the pooling layer

h_{xy}^{l} = \max_{i = 0, \dots, s, \; j = 0, \dots, s} h_{(x + i)(y + j)}^{l - 1}

• Perform linearization by applying a linear layer

• Apply the BiLSTM layer

• Process the memory unit data through the fully connected layer, z^{l} = W^{l} h^{l-1}

• Apply the softmax layer

softmax (z_{i}) = \frac{e^{z_{i}}}{\sum_{j} e^{z_{j}}}
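The layer formulas in Step 4 can be checked with a small dependency-free sketch. It covers only the output-size rule, max pooling, the bias-free fully connected product, and softmax; the convolution kernels and the BiLSTM layer are omitted. All function names are hypothetical, and tying the pooling stride to the window size by default is an assumption, since the source does not state the pooling stride.

```python
import math


def conv_output_size(n_in, k, p=0, s=1):
    """n_out = floor((n_in + 2p - k) / s) + 1."""
    if k < 1 or s < 1 or p < 0:
        raise ValueError("k and s must be >= 1 and p must be >= 0")
    if n_in + 2 * p < k:
        raise ValueError("kernel is larger than the padded input")
    return (n_in + 2 * p - k) // s + 1


def max_pool_2d(h, size, stride=None):
    """Max over size x size windows of a 2-D list of numbers.

    Assumption: stride defaults to the window size (non-overlapping windows).
    """
    stride = stride or size
    rows = conv_output_size(len(h), size, 0, stride)
    cols = conv_output_size(len(h[0]), size, 0, stride)
    return [
        [
            max(
                h[x * stride + i][y * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for y in range(cols)
        ]
        for x in range(rows)
    ]


def fully_connected(W, h):
    """z = W h, without a bias term, matching the formula in the text."""
    return [sum(w * v for w, v in zip(row, h)) for row in W]


def softmax(z):
    """Softmax with the maximum subtracted first to avoid overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

Subtracting the maximum before exponentiating leaves the softmax result unchanged mathematically but keeps `math.exp` from overflowing on large logits. The output-size helper is reused inside the pooling function because pooling follows the same size rule with zero padding.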

Step 5: Obtain the final output speech data and measure the performance.
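The performance measurement in Step 5 could be sketched as follows. The algorithm does not define the recognition rate, so this sketch assumes it is the fraction of items whose predicted class matches the reference label; `predict_class` and `recognition_rate` are hypothetical names introduced for illustration.

```python
def predict_class(probabilities):
    """Index of the largest softmax probability (first index on ties)."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def recognition_rate(predicted, reference):
    """Fraction of predictions that equal the reference labels.

    Assumption: recognition rate means plain classification accuracy.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one labelled item is required")
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return correct / len(reference)
```

A typical use would map each softmax output to a class with `predict_class` and then compare the resulting list against the held-out labels from the train/test split.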