Sensors (Basel, Switzerland)
2026 Jan 16;26(2):620. doi: 10.3390/s26020620

Radio Frequency Signal Recognition of Unmanned Aerial Vehicle Based on Complex-Valued Convolutional Neural Network

Yibo Xin 1, Junsheng Mu 1,*, Xiaojun Jing 1, Wei Liu 2
Editor: Gia Khanh Tran
PMCID: PMC12846147  PMID: 41600412

Abstract

The rapid development of unmanned aerial vehicle (UAV) technology necessitates reliable recognition methods. Radio frequency (RF)-based recognition is promising, but conventional real-valued CNNs (RV-CNNs) typically discard phase information from RF spectrograms, leading to degraded performance under low-signal-to-noise ratio (SNR) conditions. To address this, this paper proposes a complex-valued CNN (CV-CNN) that operates on a constructed complex representation, where the real part is the logarithmic power spectral density (PSD) and the imaginary part is derived from Sobel edge detection. This enables genuine complex convolutions that fuse magnitude and structural cues, enhancing noise resilience. As complex-valued networks are known to be sensitive to architectural choices, we conduct comprehensive ablation experiments to investigate the impact of key hyperparameters on model performance, revealing critical stability constraints (e.g., performance collapse beyond a network depth of 4–5 layers). Evaluated on the 25-class DroneRFa dataset, the proposed model achieves 100.00% accuracy under noise-free conditions. Crucially, it demonstrates significantly superior robustness in low-SNR regimes: at −20 dB SNR, it attains 15.58% accuracy, over seven times higher than a dual-channel RV-CNN (2.20%) with identical inputs; at −15 dB, it reaches 45.86% versus 14.03%. These results demonstrate that the CV-CNN exhibits potentially superior robustness and interference resistance in comparison to its real-valued counterpart, maintaining high recognition accuracy even under low-SNR conditions.

Keywords: unmanned aerial vehicle, radio frequency, complex-valued convolutional neural network, Sobel edge detection, ablation experiments

1. Introduction

Reliable detection and recognition of UAVs are critical for ensuring safety and security in low-altitude airspace [1]. Existing detection technologies face significant limitations: acoustic sensors suffer from short range and ambient noise interference [2], visual sensors are affected by weather and lighting conditions [3], and radar systems, while robust, are costly and prone to clutter from obstacles [4]. Recent work has also explored novel sensing modalities such as electrostatic charge accumulation on the UAV surface during high-speed flight [5] and vibration signatures induced by propeller rotation, which can be measured via radar for fine-grained identification and tamper detection [6]. Furthermore, system-level solutions leveraging secure drone-to-drone communication and blockchain-based traffic management have been proposed to address safety and privacy concerns from a different angle [7]. RF-based detection has thus emerged as a promising alternative due to its long range, all-weather capability, and ability to identify RF-emitting UAVs even when visually obscured [8].

The efficacy of deep learning in UAV RF signal recognition has been widely demonstrated [9,10,11,12], with CNNs becoming the mainstream approach [13,14,15,16,17]. A study by O’Shea et al. [18] demonstrated the effectiveness of CNNs for radio modulation recognition, laying the groundwork for applying deep learning to RF signal analysis. However, most RV-CNNs process only the magnitude component of the RF spectrogram, discarding phase information that encodes critical hardware-specific features such as oscillator nonlinearity and clock offsets [19]. This omission compromises signal integrity and degrades recognition performance, particularly under low-SNR conditions where phase cues are essential for discrimination [20].

Given the statistical correlation that typically exists between the real and imaginary parts of complex-valued signals, complex-valued modeling is well motivated. Complex-valued systems impose stronger mathematical constraints than real-valued systems, thereby facilitating the extraction of more intrinsic feature representations [21]. A comprehensive survey on complex-valued neural networks can be found in [22], which details their theoretical advantages and diverse applications. Although some studies decompose complex values into two real components for separate processing, the supposition that a CV-CNN is equivalent to a two-channel RV-CNN is erroneous. Research [23] demonstrated that the computational constraints introduced by complex multiplication limit the freedom of CV-CNNs in synaptic weighting, resulting in a fundamental difference in representational capability compared to RV-CNNs.

By preserving the algebraic structure inherent in complex data, a CV-CNN can learn more sophisticated feature representations. Extensive research indicates that CV-CNNs [24] outperform traditional RV-CNNs [25,26,27] across multiple tasks. However, their adoption remains limited, partly because complex-valued convolutional techniques are less mature than real-valued methods. In recent years, the feasibility of constructing and training CV-CNNs has been significantly enhanced by mainstream frameworks such as PyTorch, which have begun to support complex-valued convolutional operations [28]. On this basis, this study proposes a CV-CNN model for recognizing UAV RF signals.

The primary contributions of this work are structured into three core aspects:

  • (1) Empirical validation of complex-valued deep learning for UAV RF signal recognition: We demonstrate that a CV-CNN, operating on a carefully constructed complex representation, achieves significantly superior robustness under low-SNR conditions compared to a real-valued counterpart with identical architecture and input. This provides compelling evidence for the practical advantage of complex-domain modeling in RF-based UAV recognition, a domain where phase-like structural cues are critical for noise resilience.

  • (2) A hybrid complex feature engineering strategy: Recognizing that the raw RF phase is lost during PSD computation, we propose a pragmatic surrogate for the imaginary component by applying Sobel edge detection to the logarithmic PSD. This design fuses magnitude and structural information into a complex-valued input, enabling genuine complex convolutions that prove highly effective for classification in noisy environments, as validated by our experiments.

  • (3) Comprehensive ablation and comparative analysis: We conduct a systematic investigation into the sensitivity of CV-CNN performance to key hyperparameters (network depth, kernel size, and dropout strategy), revealing critical stability constraints (e.g., performance collapse beyond 4–5 layers). Furthermore, we perform a fair, controlled comparison against a dual-channel real-valued CNN (dual-RV-CNN) using the exact same input features, conclusively attributing the low-SNR advantage to the complex arithmetic itself rather than the input representation.

Compared to existing studies, our work presents distinct advancements. First, unlike conventional RV-CNN approaches [13,14,15] that operate solely on magnitude spectrograms and discard phase-related information, our CV-CNN explicitly models the coupling between magnitude and structural cues via complex arithmetic, yielding significantly improved low-SNR robustness. Second, while prior complex-valued models in RF classification (e.g., [27]) typically require access to the full complex time–frequency representation (e.g., raw I/Q data or complex STFT) to exploit phase information, our method is uniquely designed for practical scenarios where only the power spectral density (PSD) is available. By ingeniously substituting the missing phase with Sobel-derived structural features, we retain the benefits of complex-domain processing without relying on raw signal access. Third, in contrast to generic deep architectures like ResNet18 or EfficientNet-B0 that treat RF spectrograms as ordinary images [8], our CV-CNN is purpose-built for the algebraic structure of RF signals, as evidenced by its superior performance under noise.

The remainder of this paper is organized as follows: Section 2 introduces the system model. Section 3 provides a detailed description of the proposed CV-CNN model. In Section 4, the experimental methodology, configuration, and ablation results on the DroneRFa dataset are presented. Section 5 offers a dedicated discussion that contextualizes the findings, compares them with the relevant literature, and critically analyzes the limitations of the proposed approach. Finally, we conclude our work in Section 6.

2. System Model

The proposed system model, shown in Figure 1, addresses the classification of UAV RF signals through a three-stage processing framework. Section 2.1 describes signal preprocessing and time–frequency analysis. Section 2.2 introduces complex-valued feature extraction. Section 2.3 details the classification and decision process.

Figure 1. The proposed system model.

2.1. Signal Preprocessing and Time–Frequency Analysis

The RF signal data used in this study are drawn from the publicly available DroneRFa dataset [29], which contains raw RF recordings from several distinct categories of commercially available UAVs. The dataset encompasses a diverse range of UAV models from various manufacturers, including but not limited to DJI (e.g., Mini 2, Mavic Pro, Phantom 4 Pro, Inspire 2) and mid-sized platforms such as the MATRICE series. The data are collected using an NI (National Instruments, Austin, TX, USA) USRP-2955 RF transceiver, VERT2450 omnidirectional antennas, and a host computer. The USRP-2955, which features four RF channels and supports a sampling rate of up to 100 MS/s, is used to capture the complex I/Q data. The VERT2450 antennas, with a gain of 3 dBi, are tuned to common UAV control and video transmission bands (2.4–2.48 GHz and 4.9–5.9 GHz). The host computer communicates with the USRP via a PCIe interface, ensuring high-throughput data transfer.

The input data comprises dual-channel raw RF data stored in .mat files. The present dataset comprises the in-phase components (RF0I, RF1I) and the quadrature components (RF0Q, RF1Q), with the RF0 channel data originating from the first channel of the RF receiver and the RF1 channel data from the second channel. The letter I is used to denote the in-phase component value of the baseband signal, while Q is used to denote the quadrature component value. The I/Q components are then combined to form the analyzed signal, which can be written as

RF0=RF0I + j·RF0Q (1)
RF1=RF1I + j·RF1Q (2)

Then, the continuous IQ data stream undergoes fixed-length segmentation, with each segment comprising 5 million sample points. This segment length was chosen based on two practical considerations. First, the raw RF files in the DroneRFa dataset typically contain over 150 million samples (corresponding to more than 1.5 s of signal), which is too long for direct neural network processing and contains multiple flight states. Second, prior studies have shown that critical RF fingerprinting features of UAVs, such as frequency hopping, transient bursts, and modulation transitions, often occur within time scales shorter than 5 ms [14,20]. A 50 ms segment (5 million samples at 100 MHz) thus provides a sufficient temporal window to capture multiple such transient events while remaining computationally tractable. The 100 MHz sampling rate, fixed during the original data acquisition of the DroneRFa dataset, ensures sufficient bandwidth coverage for the UAV RF bands of interest (2.4–5.9 GHz). Non-overlapping segmentation is adopted to prevent data leakage between training and test samples.
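As a concrete illustration, the non-overlapping segmentation step can be sketched in a few lines of numpy. The segment length here is scaled down from the paper's 5 × 10⁶ samples for brevity, and the helper name is ours:

```python
import numpy as np

# Hypothetical sketch of fixed-length, non-overlapping segmentation of a 1-D
# complex I/Q stream. SEG_LEN stands in for the paper's 5_000_000 samples.
SEG_LEN = 1_000

def segment_iq(iq: np.ndarray, seg_len: int = SEG_LEN) -> np.ndarray:
    """Split a complex I/Q stream into non-overlapping segments, discarding
    the incomplete tail. Non-overlapping windows prevent train/test leakage
    from shared samples."""
    n_seg = len(iq) // seg_len
    return iq[: n_seg * seg_len].reshape(n_seg, seg_len)

stream = np.random.randn(3_500) + 1j * np.random.randn(3_500)
segments = segment_iq(stream)
print(segments.shape)  # (3, 1000); the trailing 500 samples are dropped
```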

Time–frequency analysis is implemented using the Short-Time Fourier Transform (STFT). For each channel’s data, the STFT results are computed separately using the following equation:

S0(t, f)=STFT(RF0seg) (3)
S1(t, f)=STFT(RF1seg) (4)

where the STFT results for channels 0 and 1 are denoted by S0(t, f) and S1(t, f). The segmented IQ data for channels 0 and 1 is represented by RF0seg and RF1seg, respectively. Subsequently, the complex STFT results are converted to logarithmic power spectral density P0 and P1, which can be written as

$P_0 = 10\log_{10}(|S_0|^2)$ (5)
$P_1 = 10\log_{10}(|S_1|^2)$ (6)

and the time–frequency transformation operation T(RFseg) can be expressed as

$T(\mathrm{RF}_{\mathrm{seg}}) = 10\log_{10}\bigl(|\mathrm{STFT}(\mathrm{RF}_{\mathrm{seg}})|^2\bigr)$ (7)

This operation computes the logarithmic power spectral density (log-PSD) of the segmented RF signal. Since the STFT itself is a canonical time–frequency representation that captures how the frequency content of a signal evolves over time, the log-PSD inherits this joint time–frequency structure. The logarithmic scaling is a standard practice in RF and audio signal processing, as it enhances the visibility of low-energy components and approximates the human perception of signal power. Thus, T(·) yields a time–frequency representation suitable for visual and machine analysis.
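The T(·) operation of Eq. (7) can be sketched with a plain numpy STFT. The frame length, hop size, and Hann window below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Minimal sketch of Eq. (7): a windowed STFT followed by logarithmic power
# scaling, yielding the log-PSD time-frequency representation.
def log_psd(x: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Return 10*log10(|STFT(x)|^2) with a Hann window, one row per frame."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    spec = np.fft.fft(frames, axis=-1)        # complex STFT
    power = np.abs(spec) ** 2
    return 10.0 * np.log10(power + 1e-12)     # small floor avoids log(0)

x = np.exp(2j * np.pi * 0.1 * np.arange(4096))  # complex tone at 0.1 cycles/sample
P = log_psd(x)
print(P.shape)  # (31, 256): 31 frames, 256 frequency bins
```

The tone concentrates its energy near FFT bin 0.1 × 256 ≈ 26, as expected of a time–frequency representation.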

2.2. Complex-Valued Feature Construction and Extraction

During the feature construction phase, the raw logarithmic power spectrum P0/P1 is initially treated as the real part, while the Sobel edge detection results from the spectral image are computed as the imaginary part. In the specific implementation, the imaginary part feature is generated by

$G(P) = \sqrt{(P * S_x)^2 + (P * S_y)^2}$ (8)

where $S_x$ and $S_y$ denote the horizontal and vertical convolutional kernels of the Sobel operator, respectively, and $*$ represents the convolutional operation. The Sobel operator is a widely used image processing technique for edge detection. It computes the image gradient in two orthogonal directions: $S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$ highlights vertical edges by detecting horizontal intensity changes, while $S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$ highlights horizontal edges by detecting vertical intensity changes. The resulting magnitude map $G(P)$ emphasizes regions of high spatial frequency (i.e., edges and texture), which serve as a phase-like structural surrogate in our complex representation. We clarify that the imaginary component $G(P)$ is not the original RF phase but a phase-like surrogate encoding spatial structure via edge gradients. This design reflects our hypothesis that structural features, rather than raw phase, are more discriminative for classification under strong-noise conditions.

The complex-valued feature construction operation, designated as Ψ(P), can be defined as

Ψ(P)=P+j·G(P) (9)
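Eqs. (8) and (9) together can be sketched as follows; this is a minimal numpy implementation with zero padding, and the helper names are ours:

```python
import numpy as np

# Sketch of Eqs. (8)-(9): Sobel gradient magnitude on the log-PSD image as a
# phase-like surrogate, then complex stacking Psi(P) = P + j*G(P).
SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SY = SX.T  # the vertical Sobel kernel is the transpose of the horizontal one

def conv2_same(img, k):
    """'Same'-size 2-D correlation with zero padding. The sign flip between
    correlation and true convolution is discarded by the magnitude in G(P)."""
    pad = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * k)
    return out

def complex_feature(P):
    G = np.sqrt(conv2_same(P, SX) ** 2 + conv2_same(P, SY) ** 2)  # Eq. (8)
    return P + 1j * G                                             # Eq. (9)

P = np.tile(np.arange(8.0), (8, 1))  # toy "spectrogram" with a horizontal ramp
Psi = complex_feature(P)
print(Psi.dtype, Psi.shape)
```

On the interior of the ramp, the horizontal gradient response is 8 per pixel while the vertical response is zero, so the imaginary channel cleanly encodes the edge structure of the real channel.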

The feature extraction section comprises N cascaded complex convolutional blocks, where N is an integer no less than 3. Each block comprises a complex convolutional layer, a CBN layer, and a CReLU activation layer. The number of channels doubles per block (64 → 128 → 256, i.e., 32 × 2^n, capped at 256), ultimately yielding output feature maps via adaptive average pooling.

The CV-CNN performs feature extraction on the complex feature Ψ(P), with the complex convolutional operation adhering to the following mathematical definition:

$\mathrm{Conv}(X) = (W_r + jW_i) * (X_r + jX_i) = (W_r * X_r - W_i * X_i) + j\,(W_r * X_i + W_i * X_r)$ (10)

where the real and imaginary parts of the convolutional kernel weights are denoted by Wr and Wi, respectively, while Xr and Xi represent the real and imaginary parts of the input features. CBN of the input features can be defined as

$\mathrm{CBN}(X) = \gamma\,\dfrac{X_r - \mu_r}{\sqrt{\sigma_r^2 + \epsilon}} + j\,\gamma\,\dfrac{X_i - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}}$ (11)

where $\mu_r$ and $\mu_i$ represent the means, $\sigma_r^2$ and $\sigma_i^2$ denote the variances of the real and imaginary parts, and $\gamma$ is the scaling factor.

The activation function adopts the CReLU form, which involves the application of a ReLU nonlinear transformation to the magnitude of complex signals while preserving phase information. This can be expressed as

$\mathrm{CReLU}(X) = \mathrm{ReLU}(|X|)\cdot e^{j\angle X}$ (12)

where $\angle X$ denotes the phase angle of the complex signal.

The complete complex feature extraction network may be expressed as $F_{\mathbb{C}}$ over the complex domain $\mathbb{C}$:

$F_{\mathbb{C}} = \mathrm{AP} \circ \Bigl[\bigotimes_{k=1}^{N} \bigl(\mathrm{CReLU} \circ \mathrm{CBN} \circ \mathrm{Conv}\bigr)\Bigr]$ (13)

where the symbol ∘ is used to emphasize the directionality of data flow, ⊗ denotes the stacking of convolutional blocks, and AP represents the pooling method. In the case of k<N, AP denotes maximum pooling; in the case of k=N, AP denotes adaptive average pooling.

2.3. Classification and Output

The classifier employs a fully connected (FC) layer architecture with 1024 → 512 → m neurons, with dropout regularization applied at the 512-dimensional hidden layer with probability p = 0.5. The output layer generates an m-dimensional logit vector Z, corresponding to m distinct UAV types. The classification decision formula may be expressed as

$Z_m = \mathrm{FC}_m \circ \mathrm{Dropout}_{0.5} \circ \mathrm{FC}_{512} \circ \mathrm{FC}_{1024}$ (14)

Softmax is selected as the output function, and the final predicted category formula y^ can be defined as

$\hat{y} = \arg\max\bigl(\mathrm{softmax}(Z_m)\bigr)$ (15)
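A minimal numpy sketch of the classifier head of Eqs. (14)–(15), with random stand-in weights and dropout disabled as at inference:

```python
import numpy as np

# Sketch of Eqs. (14)-(15): the flattened feature vector passes through
# 1024 -> 512 -> m fully connected layers, then softmax + argmax.
# Weights are random stand-ins; m = 25 classes as in DroneRFa.
rng = np.random.default_rng(0)
m = 25

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

W1 = rng.normal(size=(1024, 9216))
W2 = rng.normal(size=(512, 1024))
W3 = rng.normal(size=(m, 512))
feat = rng.normal(size=9216)  # stand-in for the flattened CV-CNN features

# Logits Z_m; ReLU between FC layers, dropout omitted at inference time.
z = W3 @ np.maximum(0, W2 @ np.maximum(0, W1 @ feat))
probs = softmax(z)
y_hat = int(np.argmax(probs))  # Eq. (15)
print(y_hat, probs.sum())
```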

The complete processing chain may be summarized as follows: The raw IQ signal is subjected to STFT analysis, after which complex features are constructed and extracted. These are then fed into a CV-CNN for classification.

The complete signal processing and classification workflow described by this system may be abstracted as a multi-stage composite function transformation. The mathematical essence of this process is to achieve an end-to-end mapping from raw IQ signals to UAV type identification through a cascade of operations across four stages: time–frequency analysis, complex feature construction, feature extraction, and classification decision-making. The system under discussion can be summarized as formula K, which can be written as

$K = Z_m \circ F_{\mathbb{C}} \circ \Psi \circ T(\mathrm{RF}_{\mathrm{seg}})$ (16)

The system model’s core innovations are threefold. First, phase-related structural information is retained throughout the processing chain via complex-domain operations. Second, time–frequency analysis and deep learning are jointly optimized. Third, a hybrid feature engineering strategy combines traditional signal processing (Sobel edge detection) with data-driven methods. This structured representation not only reveals the mathematical essence of the system but also highlights its theoretical advantage in achieving high-precision RF signal recognition in complex electromagnetic environments.

3. The Proposed CV-CNN Scheme

CV-CNN is a deep learning architecture specifically designed for processing complex-valued data. It finds extensive applications in fields requiring phase information preservation, such as radar signal processing, medical imaging (e.g., MRI), and wireless communications. The proposed CV-CNN principally comprises a feature extraction module and a classifier module. In comparison to a conventional RV-CNN, the CV-CNN adheres to complex arithmetic in its pivotal operations, thereby ensuring the model’s capacity to efficiently discern the magnitude and phase characteristics of complex data.

3.1. Architecture

The architecture of CV-CNN proposed in this paper represents an enhancement of the model introduced in [30], with its overall structure illustrated in Figure 2. The network under consideration adopts a modular design, primarily consisting of two major modules: a feature extractor and a classifier. The system has been engineered to facilitate efficient complex signal processing through the implementation of meticulously designed complex operation layers.

Within the architecture of the input layer, the network is configured to receive complex tensors of size [batch size, 2, 384, 288]. The first channel is responsible for storing the real-part information of the original spectrogram, while the second channel is responsible for storing the imaginary-part information processed by Sobel edge detection. This dual-channel representation is capable of preserving the complex nature of the signal, thereby establishing the mathematical foundation for subsequent processing.

Figure 2. Architecture of the proposed CV-CNN.

The feature extractor module consists of N (set to 3–6) stacked complex convolutional blocks, each comprising four key components. The complex convolutional layer employs four parallel real-valued convolution branches to implement complex multiplication: the rule (a+bi)(c+di) = (ac − bd) + (ad + bc)i is realized rigorously through the branches conv(rr), conv(ri), conv(ir), and conv(ii). This layer preserves the spatial dimensions of the feature map through padding, modifying only the channel dimension, which is capped at 256 to prevent overfitting and computational explosion. The CBN layer, a key component of the model, jointly normalizes the concatenated real and imaginary parts, thereby preserving the consistency of complex feature distributions. The CReLU activation function implements a nonlinear transformation in the complex domain: it computes the magnitude of the complex number, applies the ReLU operation, and then reconstructs the complex output from the original phase information. The pooling layer employs a hybrid strategy, whereby the first N−1 layers use 2 × 2 max pooling to halve the feature map size sequentially, while the final layer switches to 6 × 6 adaptive average pooling so that the network can process inputs of varying resolutions.

We devised two dropout modes (Standard and Reduced) as regularization strategies. The “Standard” mode employs a strategy of progressively increasing dropout rate, raising the rate by 0.1 for each additional convolutional layer (initialized at 0.3). The “Reduced” mode employs the same increment logic for the first three layers (initialized at 0.2); however, it forcibly disables dropout from the fourth layer onwards. Ablation experiments demonstrated that this design effectively balanced the model’s generalization capability and representational power.
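The two schedules, as described in the prose above, can be sketched as follows; the behaviour beyond the stated layers is our assumption:

```python
# Hypothetical sketch of the "Standard" and "Reduced" dropout schedules:
# Standard starts at 0.3 and adds 0.1 per block; Reduced starts at 0.2,
# follows the same increment for the first three blocks, then forcibly
# disables dropout from the fourth block onwards.
def dropout_schedule(n_layers: int, mode: str):
    rates = []
    for k in range(n_layers):
        if mode == "Standard":
            rates.append(round(0.3 + 0.1 * k, 1))
        elif mode == "Reduced":
            rates.append(round(0.2 + 0.1 * k, 1) if k < 3 else 0.0)
    return rates

print(dropout_schedule(4, "Standard"))  # [0.3, 0.4, 0.5, 0.6]
print(dropout_schedule(4, "Reduced"))   # [0.2, 0.3, 0.4, 0.0]
```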

The classifier module consists of three FC layers, the function of which is to flatten the four-dimensional feature tensor [batch, C, H, W] into a two-dimensional matrix [batch, C × H × W]. The initial FC layer compresses the high-dimensional features (e.g., 18,432 dimensions) to 1024 dimensions, while the subsequent layer reduces this further to 512 dimensions, ultimately producing 25-dimensional category logits. The implementation of ReLU activation functions and dropout with a 0.5 probability between each FC layer serves to effectively prevent overfitting.

For a model with four convolutional blocks, a kernel size of three, and the “Reduced” dropout pattern (4L, 3K, Reduced), the architecture is detailed in Table 1. The complete forward-propagation process demonstrates the dimensional changes in the data: the input complex signal [batch, 2, 384, 288] is processed by four convolutional blocks into [batch, 256, 6, 6]. After flattening, this yields a feature vector of [batch, 9216], which is progressively compressed through FC layers to an output of [batch, 25]. This architecture achieves efficient feature extraction and classification of complex signals while maintaining the mathematical completeness of complex operations.

Table 1.

Architecture configuration of the (4L, 3K, Reduced) CV-CNN model.

Layer Type Layer Parameter Activation
Input 1 × 2 × 384 × 288
CvConv1 + CBN k = 3 × 3, s = 1, p = 1 → 1 × 64 × 384 × 288 CReLU
MaxP MaxPool 2 × 2 → 1 × 64 × 192 × 144
Dropout1 p = 0.3
CvConv2 + CBN k = 3 × 3, s = 1, p = 1 → 1 × 128 × 192 × 144 CReLU
MaxP MaxPool 2 × 2 → 1 × 128 × 96 × 72
Dropout2 p = 0.4
CvConv3 + CBN k = 3 × 3, s = 1, p = 1 → 1 × 256 × 96 × 72 CReLU
MaxP MaxPool 2 × 2 → 1 × 256 × 48 × 36
Dropout3 p = 0.5
CvConv4 + CBN k = 3 × 3, s = 1, p = 1 → 1 × 256 × 48 × 36 CReLU
AvgP AdaptiveAvgPool 6 × 6 → 1 × 256 × 6 × 6
Flatten 1 × 9216
FC1 9216 → 1024 ReLU
DropoutFC p = 0.5
FC2 1024 → 512 ReLU
DropoutFC p = 0.5
FC3/Output 512 → 25 Softmax
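The dimensional flow of Table 1 can be verified with a short shape walk-through, assuming 'same'-padded convolutions as listed in the table:

```python
# Shape walk-through of the (4L, 3K, Reduced) configuration in Table 1:
# k=3, s=1, p=1 convolutions keep H x W; 2x2 max pooling follows the first
# three blocks; the final block ends in a 6x6 adaptive average pool.
H, W, C = 384, 288, 2
channels = [64, 128, 256, 256]
for i, c in enumerate(channels):
    C = c                      # conv changes only the channel dimension
    if i < 3:
        H, W = H // 2, W // 2  # 2x2 max pooling halves both spatial dims
    else:
        H, W = 6, 6            # adaptive average pooling to 6 x 6
    print(f"block {i + 1}: [{C}, {H}, {W}]")

flat = C * H * W
print("flattened:", flat)  # 256 * 6 * 6 = 9216, matching the FC1 input
```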

3.2. Feature Extractor

The core components of the feature extractor of the present study comprise complex convolutional layers, CBN layers, CReLU activation functions, and pooling layers, thereby achieving efficient complex signal processing through a modular design.

3.2.1. Complex Convolutional Layer

The core operation of CV-CNN is complex-valued convolution, which is designed to process complex input data. Its mathematical foundation is the decomposition of complex multiplication into linear combinations of real-valued operations, a fundamental distinction from real-valued convolution, as shown in Table 2.

Table 2.

Distinction between complex-valued and real-valued convolution.

Aspect Complex-Valued Convolution Real-Valued Convolution
Input/output Complex number Real number
Number of parameters 4 times real-valued Relatively few
Phase processing Preserving phase Phase loss
Frequency domain Correlation modeling Magnitude-only
Computational complexity Higher Lower

Consider an input complex tensor $x \in \mathbb{R}^{B \times 2 \times H \times W}$, where B denotes the batch size, channel 0 holds the real part, channel 1 holds the imaginary part, and H and W denote the input height and width, respectively. For the complex input $z = a + bi$ and the convolutional kernel $W = W_r + jW_i$, the output $z'$ can be expressed as

$z' = (a * W_r - b * W_i) + (a * W_i + b * W_r)\,i$ (17)

In this context, the symbol $*$ denotes the standard two-dimensional convolutional operation, with $W_r$ and $W_i$ denoting the convolutional kernel weights for the real and imaginary parts, respectively. In practical implementation, complex-valued convolution is realized exactly through parallel computation using four real-valued convolutional branches: conv_rr(a) denotes the convolution of the real part with the real kernel, conv_ri(a) the convolution of the real part with the imaginary kernel, conv_ir(b) the convolution of the imaginary part with the real kernel, and conv_ii(b) the convolution of the imaginary part with the imaginary kernel. The final output is obtained through linear combination:

$\mathrm{out\_real} = \mathrm{conv}_{rr}(a) - \mathrm{conv}_{ii}(b)$ (18)
$\mathrm{out\_imag} = \mathrm{conv}_{ri}(a) + \mathrm{conv}_{ir}(b)$ (19)

This design explicitly models the cross-coupling between the real and imaginary parts, rendering it suitable for time–frequency analysis (e.g., STFT features) and coherent signal processing.
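The four-branch realization of Eqs. (17)–(19) can be checked numerically with a 1-D sketch. The network uses 2-D kernels, but the arithmetic is identical; the helper names are ours:

```python
import numpy as np

# Numpy sketch of Eqs. (17)-(19): complex convolution realized as four real
# convolutions, cross-checked against native complex arithmetic.
def conv1d_valid(x, w):
    """1-D 'valid' sliding-window correlation (kernel not flipped; the
    identity below holds either way since both sides use the same op)."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

def complex_conv(a, b, Wr, Wi):
    """a, b: real/imaginary input parts; Wr, Wi: real/imaginary kernels."""
    out_real = conv1d_valid(a, Wr) - conv1d_valid(b, Wi)  # conv_rr - conv_ii
    out_imag = conv1d_valid(a, Wi) + conv1d_valid(b, Wr)  # conv_ri + conv_ir
    return out_real, out_imag

rng = np.random.default_rng(1)
a, b = rng.normal(size=16), rng.normal(size=16)
Wr, Wi = rng.normal(size=3), rng.normal(size=3)

re, im = complex_conv(a, b, Wr, Wi)

# Reference: convolve (a + jb) with (Wr + jWi) using complex numbers directly.
ref = conv1d_valid(a + 1j * b, Wr + 1j * Wi)
print(np.allclose(re + 1j * im, ref))  # True
```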

Complex-valued convolution offers several advantages over its real-valued counterpart. Firstly, it preserves phase information, which, in UAV RF signals, carries target motion data, enabling effective modeling of phase variations. Secondly, it exhibits frequency-domain filtering characteristics, with complex kernels functioning as frequency-domain filters that better suit time–frequency analysis tasks (such as STFT feature processing) than real-valued convolution. Thirdly, it offers physical interpretability, as complex operations align with signal processing theory (e.g., analytic signals and Hilbert transform).

3.2.2. CBN Layer

The CBN layer performs independent normalization on the real and imaginary parts to ensure training stability. For complex data z=a+bi, the normalization method is defined as

$\mathrm{BN}(z) = \mathrm{BN}(a) + j\,\mathrm{BN}(b)$ (20)

where BN denotes a standard BatchNorm2d, which can be expressed as

$\mathrm{BN}(X) = \gamma\,\dfrac{X - \mu}{\sqrt{\sigma^2 + \epsilon}}$ (21)

This specific implementation method involves the calculation of μ and σ separately for the real and imaginary parts. CBN is a method that has been developed to address the issue of internal covariate shifts in complex data. This process prevents optimization difficulties arising from scale differences between real and imaginary components. The primary benefit of this approach is that it stabilizes the distribution of complex features, thereby accelerating the convergence of training. Additionally, it preserves the independence of the real and imaginary components.
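A minimal numpy sketch of Eqs. (20)–(21), with γ = 1 and no learned bias, matching the formulas as given:

```python
import numpy as np

# Sketch of Eqs. (20)-(21): complex batch normalization as two independent
# real BatchNorms applied to the real and imaginary parts separately.
def batch_norm(x, eps=1e-5):
    """Standard BN with gamma = 1: zero-mean, unit-variance normalization."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def cbn(z):
    return batch_norm(z.real) + 1j * batch_norm(z.imag)  # Eq. (20)

rng = np.random.default_rng(2)
# Real and imaginary parts with very different scales and offsets:
z = (3.0 + 5.0 * rng.normal(size=1024)) + 1j * (-2.0 + 0.5 * rng.normal(size=1024))
zn = cbn(z)
print(round(zn.real.mean(), 6), round(zn.real.std(), 3))  # both parts -> ~0 mean, ~1 std
```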

3.2.3. CReLU Layer

CReLU is a nonlinear activation function that defines a magnitude activation while preserving the phase in the complex domain. For a complex number z=a+bi, the magnitude |z| and phase ϕ(z) can be written as

$|z| = \sqrt{a^2 + b^2 + \varepsilon}, \quad \varepsilon = 10^{-6}$ (22)
$\phi(z) = \mathrm{atan2}(b, a)$ (23)

The application of a standard ReLU to the magnitude yields $|z|'$, which is given by

$|z|' = \mathrm{ReLU}(|z|) = \max(0, |z|)$ (24)

Finally, z must be reconstructed in order to derive the CReLU formula:

$\mathrm{CReLU}(z) = |z|'\cos\phi + j\,\bigl(|z|'\sin\phi\bigr)$ (25)

CReLU circumvents the phase distortion issues that arise from conventional real–imaginary separation activation mechanisms by introducing a nonlinear transformation in the vicinity of the origin whilst strictly preserving complex phase information.
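Eqs. (22)–(25) can be sketched directly in numpy. Note that since the magnitude is non-negative, the ReLU leaves it unchanged away from the origin; the ε term mainly regularizes behaviour near zero:

```python
import numpy as np

# Sketch of Eqs. (22)-(25): ReLU on the complex magnitude with the phase
# strictly preserved, then reconstruction in Cartesian form.
EPS = 1e-6

def crelu(z):
    mag = np.sqrt(z.real ** 2 + z.imag ** 2 + EPS)  # Eq. (22)
    phase = np.arctan2(z.imag, z.real)              # Eq. (23)
    mag = np.maximum(0.0, mag)                      # Eq. (24): ReLU on |z|
    return mag * np.cos(phase) + 1j * mag * np.sin(phase)  # Eq. (25)

z = np.array([3 + 4j, -1 - 1j, 0.5j])
out = crelu(z)
print(np.round(out, 3))
```

Away from the origin the output matches the input up to the tiny ε perturbation, confirming that CReLU acts on the magnitude while the phase passes through untouched.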

3.2.4. Complex Pooling Layer

In CV-CNN, the complex pooling operation is of pivotal importance as a module for the reduction of feature dimensionality, primarily employed to decrease the spatial resolution of feature maps while enhancing translation invariance. In light of the properties inherent to complex data, it is imperative that pooling operations process the real part and the imaginary part separately, thereby ensuring the maintenance of mathematical consistency in complex arithmetic. The specific implementations comprise max pooling and adaptive average pooling.

For the set of complex samples within the sliding window (win) {zk=ak+bki|kwin}, max pooling independently selects the maximum value for both the real and imaginary parts, whereas average pooling calculates the arithmetic mean. Max pooling and average pooling can be defined as

$\mathrm{MaxPool}(z_k) = \max_{k \in \mathrm{win}} z_k = \max(a_k) + j\cdot\max(b_k)$ (26)
$\mathrm{AveragePool}(z_k) = \dfrac{1}{|\mathrm{win}|}\sum_{k \in \mathrm{win}} z_k = \dfrac{1}{|\mathrm{win}|}\sum_{k \in \mathrm{win}} a_k + j\cdot\dfrac{1}{|\mathrm{win}|}\sum_{k \in \mathrm{win}} b_k$ (27)

In the domain of signal processing, the max pooling operation, as a feature dimension reduction method, derives its physical significance primarily from its ability to enhance locally salient features. This approach effectively accentuates locally prominent features exhibiting strong responses, a characteristic of considerable value in UAV RF signal recognition applications. Specifically, within the domain of UAV RF signal processing, this method preserves both the amplitude and phase information of the RF signal from the target.

The primary benefit of average pooling is attributable to its capacity for spatial smoothing, a property that is particularly advantageous for the processing of complex signals. The process of averaging serves to effectively suppress random noise in local regions while preserving the overall energy distribution characteristics of the signal. From an implementation perspective, complex average pooling is equivalent to performing standard real-valued average pooling operations separately on the real and imaginary components of the feature map. This approach not only is computationally efficient but, more importantly, maintains the continuity of phase information in the output. In particular, the phase value of the output feature is defined as the weighted average of all complex sample phases within the specified window. This property makes it particularly suitable for applications that require strict phase consistency, such as coherent signal processing and RF data analysis of UAVs.

3.3. Classifier

The classifier module is composed of an FC layer and an output layer, a design tailored to processing complex features. The input to the proposed CV-CNN model is a flattened complex feature vector $z \in \mathbb{R}^{2N}$, composed of the real part $a \in \mathbb{R}^{N}$ and the imaginary part $b \in \mathbb{R}^{N}$. For instance, if the real and imaginary parts each have 128 dimensions, the input z has 256 dimensions. The parameter configuration for this layer is defined by the real-valued weight matrix $W \in \mathbb{R}^{M \times 2N}$ and the bias vector $b \in \mathbb{R}^{M}$, where M denotes the output dimension.

In the actual implementation process, the real part a and the imaginary part b undergo independent matrix multiplication operations (i.e., Wa and Wb), with the final results being merged. This design approach of separating computations effectively preserves the physical significance of complex features, including both magnitude and phase information. It is particularly well-suited for processing frequency-domain signals or the output features of complex convolutional layers.
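One plausible reading of this split computation is sketched below; the shared-weight choice and the concatenation-based merge are assumptions, since the text does not pin down how Wa and Wb are fused:

```python
import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    """Sketch of the split-computation FC layer: the same real weight
    matrix multiplies the real and imaginary parts independently (Wa, Wb),
    and the two results are concatenated before a final real projection."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.W = nn.Linear(n_in, n_out, bias=False)   # shared real weights
        self.merge = nn.Linear(2 * n_out, n_out)       # fuse Wa and Wb

    def forward(self, a, b):
        return self.merge(torch.cat([self.W(a), self.W(b)], dim=-1))

layer = ComplexLinear(128, 25)
a, b = torch.randn(4, 128), torch.randn(4, 128)   # real and imaginary parts
logits = layer(a, b)                               # one score per UAV class
```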

The principal function of the FC layer is to map high-dimensional complex features (such as the output of complex convolutional layers) onto the target dimensional space (such as the number of categories in a classification task), and the separation of the real and imaginary components for computation ensures that the physical meaning of the complex features is preserved, thereby providing a suitable feature representation for subsequent classification tasks.

The output layer generates raw logits, which are converted into a probability distribution over the 25 UAV classes via the standard Softmax function. For numerical stability, the implementation employs the log-Softmax variant during training, consistent with common practice in deep learning frameworks.

3.4. Loss Function and Backpropagation

We adopt the standard cross-entropy loss to measure the discrepancy between predicted probabilities and ground-truth labels.

The model is trained using the Adam optimizer with a cross-entropy loss function, following standard practice for multi-class classification tasks. The initial learning rate is set to 0.001, with β1=0.9 and β2=0.999, as commonly used in the literature.

This optimization scheme mitigates gradient oscillation by incorporating first- and second-order moment estimation, corrects the significant bias of these moment estimates during the initial training phase through a bias-correction mechanism, and achieves robust convergence across parameter dimensions by employing an adaptive per-parameter learning rate.
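In PyTorch, this training setup reduces to a few lines; the linear head below is a toy stand-in for the full CV-CNN classifier:

```python
import torch
import torch.nn as nn

model = nn.Linear(256, 25)                     # stand-in for the classifier head
criterion = nn.CrossEntropyLoss()              # cross-entropy over 25 classes
optimizer = torch.optim.Adam(model.parameters(),
                             lr=0.001, betas=(0.9, 0.999))

x = torch.randn(8, 256)                        # one mini-batch (batch size 8)
y = torch.randint(0, 25, (8,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```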

Gradient computation for complex-valued parameters is performed using Wirtinger calculus, which is natively supported in modern deep learning frameworks such as PyTorch (version 2.5.1).

This decoupled computational approach ensures that complex parameters retain their physical significance throughout the optimization process, while effectively capturing the coupling relationship between the real and imaginary parts.
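As a concrete illustration of the Wirtinger convention: for a real-valued loss L and a complex parameter w, PyTorch's complex autograd reports the steepest-ascent direction 2·∂L/∂w̄. A minimal check:

```python
import torch

# Complex scalar parameter tracked by autograd.
w = torch.tensor(1.0 + 2.0j, requires_grad=True)
loss = (w * w.conj()).real        # L = |w|^2, a real scalar
loss.backward()
# With L = |w|^2, dL/dw-bar = w, so PyTorch reports w.grad = 2w = 2 + 4j.
```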

4. Experiments and Results

This section presents ablation experiments on the proposed CV-CNN model under ideal (noise-free) conditions, using a spectrogram dataset generated from the raw IQ data in the DroneRFa dataset [29]. The impact of various parameters on UAV RF signal classification performance was evaluated in terms of accuracy. To enable a more comprehensive comparison between the CV-CNN and traditional RV-CNN models, the same ablation studies were also conducted on the RV-CNN models. Notably, the RV-CNN and CV-CNN employed identical model structures and training logic throughout these ablation experiments.

Given the pervasive presence of noise interference in practical application scenarios, Section 4.4 analyzes the performance differences between the two models under varying SNR conditions in greater depth, enabling a more comprehensive evaluation of the applicability of CV-CNN and RV-CNN in real-world settings.

4.1. Experimental Setup

The present study devised a comprehensive experimental framework for CV-CNN-based recognition of UAV RF signals, as shown in Figure 3. For dataset construction and preprocessing, the DroneRFa dataset was used, which comprises IQ data from 25 categories of UAV RF signals. The raw IQ data underwent fixed-length segmentation, with each segment containing 5 × 10⁶ samples, corresponding to a 50 ms duration at a 100 MHz sampling rate; no two segments overlapped. Each segmented IQ sequence was then transformed via STFT using a 1024-point Hanning window with 50% overlap and a 1024-point FFT, at the consistent 100 MHz sampling rate. This process yielded a dataset comprising 38,480 spectrograms with a resolution of 1250 × 938.
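The STFT step above can be sketched with SciPy (the segment is shortened here for brevity; real segments hold 5 × 10⁶ samples):

```python
import numpy as np
from scipy.signal import stft

fs = 100e6                      # 100 MHz sampling rate
n_demo = 1_000_000              # real segments hold 5e6 samples (50 ms)
rng = np.random.default_rng(0)
iq = (rng.standard_normal(n_demo)
      + 1j * rng.standard_normal(n_demo)).astype(np.complex64)

# 1024-point Hann(ing) window, 50% overlap (hop = 512), 1024-point FFT.
# return_onesided=False: complex IQ input needs a two-sided spectrum.
f, t, Z = stft(iq, fs=fs, window='hann', nperseg=1024,
               noverlap=512, nfft=1024, return_onesided=False)

log_psd = 10 * np.log10(np.abs(Z) ** 2 + 1e-12)  # logarithmic PSD in dB
```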

The preprocessing workflow encompasses four critical stages. First, the raw spectrogram image (938 × 1250) undergoes effective region cropping. Second, for visualization, a 60 dB dynamic range is set (referenced to the maximum power) with a Jet color palette, and the frequency axis display range is set from −50 MHz to 50 MHz for a centered display. Third, a logarithmic transformation enhances the discernibility of low-intensity signal features, and the result is exported as a 384 × 288-pixel PNG image, which serves as the input to the RV-CNN and as the real-part channel of the CV-CNN. Finally, the Sobel edge detection algorithm is applied to this PNG image to generate the imaginary-part channel, completing the complex representation for the CV-CNN. The two models are then compared under both noise-free and noisy conditions.
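The dual-channel construction can be sketched as follows; the grayscale normalization and the use of Sobel gradient magnitude are assumptions based on the described pipeline, standing in for the rendered PNG:

```python
import numpy as np
from scipy.ndimage import sobel

def build_complex_input(log_psd):
    # Clip to a 60 dB dynamic range referenced to the maximum power.
    clipped = np.clip(log_psd, log_psd.max() - 60.0, None)
    # Normalize to [0, 1] as a grayscale stand-in for the rendered PNG.
    real_part = (clipped - clipped.min()) / (np.ptp(clipped) + 1e-12)
    # Sobel gradient magnitude supplies the imaginary-part channel.
    gx = sobel(real_part, axis=1)
    gy = sobel(real_part, axis=0)
    imag_part = np.hypot(gx, gy)
    return real_part, imag_part

psd = np.random.randn(288, 384) * 10.0      # toy log-PSD image
re_ch, im_ch = build_complex_input(psd)
```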

Figure 3. Ablation experiment flowchart.

In terms of network architecture design, the proposed CV-CNN model comprises three core components: the complex convolutional layer strictly adheres to complex multiplication rules to ensure mathematical completeness of complex operations; the CBN layer performs independent normalization on the real and imaginary parts to maintain feature numerical stability; and the CReLU activation function implements nonlinear transformation based on complex magnitude while preserving phase information. Specifically, the network employs a dynamic feature extractor design that allows flexible adjustment of network depth based on the parameters of the ablation experiment.
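The complex convolutional layer can be sketched with two real-valued convolutions per branch, following the complex multiplication rule (layer sizes here are illustrative):

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution via the complex multiplication rule:
    (a + ib) * (W_r + i W_i) = (a*W_r - b*W_i) + i(a*W_i + b*W_r),
    implemented with two real-valued convolutions."""
    def __init__(self, in_ch, out_ch, kernel_size, padding=0):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, a, b):
        real = self.conv_r(a) - self.conv_i(b)
        imag = self.conv_i(a) + self.conv_r(b)
        return real, imag

layer = ComplexConv2d(1, 8, kernel_size=5, padding=2)
a = torch.randn(2, 1, 288, 384)   # real channel (log-PSD)
b = torch.randn(2, 1, 288, 384)   # imaginary channel (Sobel edges)
re, im = layer(a, b)
```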

The ablation experiment employs a full-combination scheme over three factors to systematically evaluate the impact of different network configurations on performance. The first factor examines the influence of the number of convolutional layers (3, 4, 5, or 6) on feature extraction capability; the second compares the local feature capture ability of different convolutional kernel sizes (3 × 3 and 5 × 5); and the third contrasts two dropout regularization strategies: the "Standard" mode employs a progressively increasing dropout rate (0.3–0.8) across layers, whilst the "Reduced" mode applies an increasing dropout rate (0.2–0.4) to the first three layers only. This multi-factor experimental design provides exhaustive empirical evidence for the optimization of model performance.
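The full-combination grid can be enumerated directly; the exact per-layer dropout schedules below are assumptions that match the stated ranges:

```python
from itertools import product

depths = [3, 4, 5, 6]          # number of convolutional layers
kernels = [3, 5]               # 3x3 and 5x5 kernels
dropout_modes = {
    # "Standard": dropout rising from 0.3 to 0.8 across all layers;
    # "Reduced": dropout 0.2-0.4 on the first three layers only
    # (the exact per-layer schedules are assumptions).
    "Standard": lambda d: [0.3 + i * 0.5 / (d - 1) for i in range(d)],
    "Reduced":  lambda d: [0.2, 0.3, 0.4] + [0.0] * (d - 3),
}

configs = list(product(depths, kernels, dropout_modes))
# 4 depths x 2 kernels x 2 modes = 16 configurations, as in Table 3.
```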

For the 25 categories of UAV RF signals, the data were randomly partitioned into training, validation, and test sets at a ratio of 0.7:0.15:0.15. The neural network models were implemented with the PyTorch framework (version 2.5.1, Meta Platforms, Inc., Menlo Park, CA, USA) on an Ubuntu 22.04.5 platform (Canonical Ltd., London, United Kingdom). Training, validation, and testing were conducted on an NVIDIA RTX 3090 (NVIDIA Corporation, Santa Clara, CA, USA), using the Adam optimizer and the cross-entropy loss function. The initial learning rate, weight decay, batch size, and number of training epochs were set to 0.001, 1 × 10⁻⁵, 8, and 50, respectively.
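The 0.7:0.15:0.15 split can be reproduced with PyTorch's `random_split` (a toy dataset stands in for the 38,480 spectrograms):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in: the full dataset holds 38,480 spectrograms.
dataset = TensorDataset(torch.randn(100, 2, 8, 8),
                        torch.randint(0, 25, (100,)))
n = len(dataset)
n_train, n_val = int(0.7 * n), int(0.15 * n)
n_test = n - n_train - n_val          # remainder goes to the test set
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(0))   # reproducible split
```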

4.2. Ablation Study on CV-CNN Architecture

In this section, the impact of network depth, kernel size, and dropout mode on the classification accuracy of the CV-CNN model was evaluated based on the results of the ablation experiments shown in Table 3.

Table 3.

Accuracy of CV-CNN model under different parameter configurations.

Network Depth Kernel Size Dropout Mode Accuracy (%)
3 3 Standard 96.08
3 3 Reduced 98.35
3 5 Standard 95.88
3 5 Reduced 96.53
4 3 Standard 99.74
4 3 Reduced 99.90
4 5 Standard 99.98
4 5 Reduced 100.00
5 3 Standard 99.98
5 3 Reduced 8.52
5 5 Standard 99.79
5 5 Reduced 100.00
6 3 Standard 76.33
6 3 Reduced 9.23
6 5 Standard 7.52
6 5 Reduced 7.52

4.2.1. Impact of Network Depth

The experimental results indicated that network depth is the most substantial factor influencing the performance of the CV-CNN model. As shown in Figure 4, the performance of the model notably improved with an increase in network depth from 3 to 4 layers. However, when the depth was increased to five layers, the marginal benefit diminished, with performance even showing slight deterioration. In the (5, 3, Reduced) configuration, the accuracy of the model decreased to 8.52%, which demonstrates the model’s extreme sensitivity to parameter combinations within this range and its susceptibility to training instability.

This instability is primarily attributed to the combination of excessive network depth and insufficient regularization. Specifically, for deeper networks (5 or 6 layers), the “Reduced” dropout mode completely removes dropout from the final convolutional layers, which can lead to severe overfitting and gradient instability during training. This is further evidenced by the training loss curves for these configurations, which often plateau at a high value without converging, as opposed to stable models like (4L, 3K, Reduced) that show continuous improvement. A comparison of training behavior between two configurations is illustrated in Figure 5. The loss curve for the (5L, 3K, Reduced) model plateaus early, indicating a failure to learn, whereas the (4L, 3K, Reduced) model exhibits a steady decline in loss, confirming its stability.

As the depth was increased to six layers, the model demonstrated a significant deterioration in performance, with accuracy falling below 10% in the majority of configurations. This finding suggests that the network has become excessively deep, which has resulted in substantial gradient-related issues or the overfitting of the model. Consequently, subsequent analyses in this section exclude models with a depth of 6 and the (5, 3, Reduced) configuration.

Figure 4. Impact of network depth on accuracy.

Figure 5. Training loss curves for (a) the unstable (5L, 3K, Reduced) configuration and (b) the stable (4L, 3K, Reduced) configuration.

4.2.2. Impact of Kernel Size

The impact of kernel size on CV-CNN performance is relatively minor, exhibiting no discernible pattern, as illustrated in Figure 6. When the network depth was set at 3, models utilizing 3 × 3 convolutional kernels exhibited marginally superior performance in comparison to those employing 5 × 5 kernels. At a depth of 4, 5 × 5 kernel-based models demonstrated slightly superior performance compared to their 3 × 3 counterparts. However, at a depth of 5, the 3 × 3 kernel-based models once again exhibited slight superiority over the 5 × 5 kernel-based models, suggesting a transition in the relative effectiveness of the two kernels as the network depth increases. This phenomenon suggests that the influence of kernel size may be modulated by interactions with other parameters, rather than being an independent dominant factor.

Figure 6. Impact of kernel size on accuracy.

4.2.3. Impact of Dropout Mode

The impact of dropout mode on model performance is highly correlated with network depth, as shown in Figure 7. When model complexity was moderate (network depth = 3, 4), the "Reduced" mode often yielded a slight performance improvement, such as an increase from 96.08% for (3, 3, Standard) to 98.35% for (3, 3, Reduced). However, as the complexity of the model increased, the "Reduced" mode was shown to significantly exacerbate overfitting due to insufficient regularization strength, leading to a performance collapse. For instance, the configuration (5, 3, Reduced) attained an accuracy of merely 8.52%.

Figure 7. Impact of dropout mode on accuracy.

4.3. Performance Comparison Under Noise-Free Conditions

This section compares the performance of the CV-CNN and RV-CNN models under identical experimental and training protocols in the absence of noise, as illustrated in Table 4.

Comparison of the performance of the two model types reveals that both can achieve near-100% accuracy under optimal configurations (e.g., network depth = 4 or 5 in most cases), indicating comparable theoretical upper limits. However, RV-CNN requires lower parameter tuning costs to achieve equivalent performance and demonstrates significant advantages in terms of model stability. As shown in Figure 8, CV-CNN demonstrated variability in performance when the network depth ranged from 3 to 5, while RV-CNN exhibited consistent and superior performance. For instance, under the configuration (5L, 3K, Reduced), RV-CNN achieved 100% accuracy, whereas CV-CNN registered only 8.52%. Furthermore, RV-CNN demonstrated greater fault tolerance to increased network depth, as shown in Figure 8b. When the network depth was designated as 6, the performance of CV-CNN was significantly compromised (often falling below 10% accuracy), while RV-CNN exhibited notable resilience with certain configurations. For instance, under (6L, 3K, Standard) and (6L, 5K, Standard) configurations, RV-CNN achieved accuracies of 99.98% and 84.13%, respectively.

Table 4.

Comparison of accuracy between CV-CNN and RV-CNN models.

Network Depth Kernel Size Dropout Mode CV-CNN Accuracy (%) RV-CNN Accuracy (%)
3 3 Standard 96.08 99.88
3 3 Reduced 98.35 99.98
3 5 Standard 95.88 99.97
3 5 Reduced 96.53 100.00
4 3 Standard 99.74 98.35
4 3 Reduced 99.90 100.00
4 5 Standard 99.98 100.00
4 5 Reduced 100.00 100.00
5 3 Standard 99.98 99.62
5 3 Reduced 8.52 100.00
5 5 Standard 99.79 100.00
5 5 Reduced 100.00 100.00
6 3 Standard 76.33 99.98
6 3 Reduced 9.23 54.12
6 5 Standard 7.52 84.13
6 5 Reduced 7.52 10.93

Figure 8. Performance comparison of (a) CV-CNN and (b) RV-CNN at different network depths.

In conclusion, the RV-CNN models offer enhanced stability and resistance to overfitting, while exhibiting comparable peak performance to the CV-CNN models. This renders the RV-CNN models a more reliable choice.

However, it should be noted that the aforementioned conclusions are obtained under ideal experimental conditions, devoid of noise. In view of the pervasive presence of noise interference in practical application scenarios, we will further analyze the performance differences between the two models under varying SNRs in the subsequent section. This extensive investigation will provide a more comprehensive assessment of the suitability of CV-CNN and RV-CNN for real-world applications.

4.4. Robustness Evaluation Under Varying SNR Conditions

To rigorously evaluate the robustness of the proposed CV-CNN under realistic electromagnetic interference, we conduct a series of controlled experiments across a wide SNR range from −20 dB to 20 dB. These experiments are designed to answer three key questions: (1) How does CV-CNN compare against conventional real-valued CNNs that only use magnitude spectrograms? (2) Is the observed low-SNR advantage attributable to the complex-domain arithmetic itself, rather than the input representation? (3) How does CV-CNN perform relative to representative deep learning and traditional machine learning baselines? The following subsections address these questions through progressively refined comparisons.

4.4.1. Performance Comparison Between CV-CNN and RV-CNN Under Varying SNR

A systematic experimental comparison was made between the recognition accuracy of the CV-CNN model and the traditional RV-CNN model under varying SNRs. The performance of both models was comprehensively assessed across an SNR range from −20 dB to 20 dB. As indicated by the findings in Section 4.3, both models degraded substantially at a network depth of six layers; consequently, models with a depth of six layers were excluded from this analysis.
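The noisy test conditions can be emulated by injecting complex additive white Gaussian noise at a target SNR; a minimal sketch (the exact point in the paper's pipeline at which noise is injected is an assumption):

```python
import numpy as np

def add_awgn(iq, snr_db, rng=None):
    """Add complex AWGN to an IQ sequence at a target SNR in dB."""
    rng = rng or np.random.default_rng(0)
    sig_power = np.mean(np.abs(iq) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    # Split noise power evenly between the I and Q components.
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(iq.shape)
                                        + 1j * rng.standard_normal(iq.shape))
    return iq + noise

iq = np.exp(2j * np.pi * 0.1 * np.arange(10_000))   # unit-power complex tone
noisy = add_awgn(iq, snr_db=-20)
measured = 10 * np.log10(np.mean(np.abs(iq) ** 2)
                         / np.mean(np.abs(noisy - iq) ** 2))
# measured SNR comes out close to the -20 dB target
```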

The experimental results revealed a conspicuous performance gap between the two models, particularly in the low-SNR region, as shown in Figure 9. At extremely low SNR levels (−20 dB to −10 dB), the two models behaved fundamentally differently. As shown in Figure 9a, all CV-CNN configurations performed poorly at −20 dB; however, certain models, including (5L, 5K, Standard) and (5L, 5K, Reduced), retained basic classification capability, attaining accuracy rates significantly above random guessing. When the SNR improved to −10 dB, the CV-CNN models achieved substantial performance gains, with most configurations exceeding 80% accuracy. Under identical low-SNR conditions, the RV-CNN models failed completely, as shown in Figure 9b: at both −20 dB and −10 dB, their accuracy remained below 10%, indicating an inability to extract effective features for classification in the presence of strong noise.

Within the SNR range of −10 dB to 0 dB, the CV-CNN demonstrated its most pronounced performance advantage. At an SNR of 0 dB, the CV-CNN approached near-complete convergence. The majority of model configurations achieved accuracy rates in excess of 95%, with configurations such as (5L, 5K, Standard) and (4L, 5K, Standard) attaining or approaching 100% accuracy. In contrast, the performance of RV-CNN in this range exhibited a considerable delay. Even at 0 dB, the accuracy of all RV-CNN models remained below 10%, indicating a substantial discrepancy compared to CV-CNN models.

Figure 9. Accuracy comparison of (a) CV-CNN and (b) RV-CNN at different SNRs.

As the SNR continued to increase, the performance of the RV-CNN models improved rapidly. Above 10 dB, the accuracy of all CV-CNN and RV-CNN models approached saturation, reaching or nearing the performance ceiling of 100%. This indicates that, under high-SNR conditions, both models can fully leverage the distinctive signal characteristics.

4.4.2. Controlled Comparison Between CV-CNN and Dual-Channel RV-CNN with Identical Inputs

Section 4.4.1 demonstrated that the CV-CNN outperforms the standard magnitude-only RV-CNN. However, the two models use different input representations: the RV-CNN processes only the logarithmic power spectral density, whereas the CV-CNN incorporates an additional Sobel edge map as the imaginary component. Consequently, it remains unclear whether the observed performance gain is attributable to complex-valued processing itself or simply to the richer input representation. To isolate the effect of complex arithmetic, we design a controlled experiment in which both models receive identical dual-channel inputs (the logarithmic PSD and the Sobel edge map) and share the same architectural hyperparameters. This setup allows us to attribute any performance difference directly to the underlying arithmetic domain (complex vs. real).

We implemented a dual-RV-CNN that uses the exact same input representation as our CV-CNN: logarithmic PSD as channel 1 and Sobel edge map as channel 2. Both models share identical architecture hyperparameters (4L, 5K, Reduced), and were trained and evaluated under the same protocol.

To provide a complete picture, we also include the full-SNR-range comparison (Figure 10) for this representative configuration. This clearly illustrates that while the two models perform comparably in high-SNR regimes, the CV-CNN exhibits significantly superior robustness under severe noise conditions.

Figure 10. Accuracy comparison of CV-CNN and dual-RV-CNN across an SNR range from (a) −20 dB to 20 dB (in 5 dB steps), and (b) −20 dB to −10 dB (in 2 dB steps).

As shown in Figure 10a, both models achieve near-perfect accuracy above −10 dB SNR. However, in the critical low-SNR regime (−20 dB to −10 dB), the CV-CNN demonstrates a clear advantage. To investigate this further, we refined the evaluation granularity from 5 dB to 2 dB steps in this range. The results in Figure 10b confirm that the CV-CNN consistently outperforms the dual-RV-CNN, with 15.58% vs. 2.20% at −20 dB, and 45.86% vs. 14.03% at −15 dB, despite identical inputs. This indicates that the benefit arises not from the input representation alone, but from the intrinsic noise resilience of complex-domain arithmetic during feature fusion.

4.4.3. Comparison with State-of-the-Art Deep and Traditional Machine Learning Models

To provide a more comprehensive evaluation of the proposed CV-CNN’s performance, we extended our baseline comparisons beyond RV-CNN to include several widely used deep learning architectures for image classification (ResNet18 and EfficientNet-B0) and traditional machine learning methods (SVM, Random Forest, XGBoost, KNN). We also implemented LSTM and GRU models, which are commonly applied to time-series data. All models were trained and evaluated on the same preprocessed DroneRFa dataset using identical experimental protocols. The results under noise-free conditions are summarized in Table 5.

Table 5.

Accuracy of various models under noise-free conditions on the DroneRFa dataset.

Model Accuracy (%)
CV-CNN (4L, 5K, Reduced) 100.00
ResNet18 100.00
EfficientNet-B0 100.00
LSTM 50.33
GRU 51.20
SVM 45.48
Random Forest 69.63
XGBoost 73.20
KNN 49.74

As shown in Table 5, while ResNet18 and EfficientNet-B0 achieve perfect accuracy under ideal conditions, they exhibit significantly lower robustness to noise compared to CV-CNN, as demonstrated in Figure 11. In contrast, LSTM and GRU, despite their suitability for sequential data, perform poorly on our spectrogram inputs, likely due to the loss of temporal structure during STFT preprocessing. Traditional machine learning methods also show subpar performance, primarily because they rely on handcrafted features that fail to capture the complex, high-dimensional patterns present in RF spectrograms. This highlights the advantage of end-to-end deep learning models, particularly those designed to process the full complex-domain structure of the signal.

In summary, under the experimental conditions and dataset employed in this study, the CV-CNN performed notably better than the RV-CNN under low-SNR conditions. The CV-CNN maintained effective feature extraction and classification capabilities even in extremely low-SNR environments where the RV-CNN failed, suggesting robust noise resilience. The CV-CNN also exhibited a higher performance threshold, achieving near-perfect classification at 0 dB SNR, whereas the RV-CNN remained virtually inoperable at that level. While these results highlight the promise of complex-domain modeling for RF signal recognition under noise, it should be acknowledged that they are based on a single dataset and preprocessing pipeline; generalization to other scenarios warrants further investigation.

Figure 11. Accuracy comparison of CV-CNN (4L, 5K, Reduced), ResNet18, and EfficientNet-B0 across an SNR range from −20 dB to 20 dB.

In practical applications, when the anticipated SNR in the working environment is below 10 dB, the CV-CNN architecture should be prioritized. Conversely, in high-SNR environments (more than 10 dB), the performance discrepancy between the two models is significantly reduced, enabling selection based on computational efficiency and other factors. The anomalous performance observed at five layers and the comprehensive degradation at six layers provide crucial avenues for future research. This finding necessitates further investigation into the optimal depth constraints for CV-CNN and the underlying mechanisms governing their interaction with regularization strategies.

4.5. Real-Time Inference Performance Evaluation

The practical deployment of UAV recognition systems in real-world scenarios demands not only high accuracy but also low-latency inference and efficient resource utilization. To assess the operational viability of the proposed CV-CNN model (configured as 4L, 5K, Reduced), we conduct a dedicated evaluation of its inference characteristics under conditions that simulate real-time monitoring. This section investigates two critical aspects: (1) whether the model meets the time-sensitive requirements of real-time UAV detection, specifically, in terms of latency and throughput for processing continuous RF data streams, and (2) its computational footprint, which informs the potential for future deployment on resource-constrained, field-deployable platforms such as embedded devices.

Specifically, we measure three key metrics under identical experimental conditions: (1) average inference time per 50 ms RF signal segment (corresponding to one spectrogram input); (2) throughput in terms of samples processed per second; and (3) GPU memory consumption during inference. All tests are conducted on an NVIDIA RTX 3090 GPU using the PyTorch framework, with input data preprocessed as described in Section 4.1.
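A simplified version of this measurement is sketched below. The CPU fallback is shown; on the RTX 3090, `torch.cuda.synchronize()` would bracket the timed region, and the model here is a toy stand-in for the CV-CNN:

```python
import time
import torch
import torch.nn as nn

# Toy stand-in network and input; not the paper's CV-CNN.
model = nn.Sequential(nn.Conv2d(2, 8, 5, padding=2),
                      nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 25)).eval()
x = torch.randn(1, 2, 96, 128)     # one spectrogram-like sample

with torch.no_grad():
    for _ in range(3):             # warm-up runs (exclude from timing)
        model(x)
    n_runs = 20
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    elapsed = time.perf_counter() - start

latency_ms = 1000 * elapsed / n_runs   # average inference time per sample
throughput = n_runs / elapsed          # samples processed per second
```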

The results, summarized in Figure 12, indicate that the model achieves an average inference latency of 9.82 ms per sample, enabling a throughput of 101.9 samples per second. The peak GPU memory footprint during inference is 0.1 GB. These preliminary results demonstrate that the current implementation operates with low latency and moderate resource usage, suggesting its potential suitability for real-time scenarios. The ability to process over 100 samples per second on a single GPU is a strong indicator of feasibility for many operational contexts where rapid detection is required.

While this study focuses on algorithmic validation, we recognize that actual deployment in operational UAV monitoring systems requires further engineering optimizations, including model quantization, hardware-aware acceleration, and integration with continuous RF data streaming pipelines. These aspects will be thoroughly investigated in our future work to realize a fully functional real-time CV-CNN-based UAV recognition system.

Regarding the suitability for deployment on embedded devices, we acknowledge that our current evaluation is conducted on a high-performance NVIDIA RTX 3090 GPU, which represents a desktop/server-grade platform rather than a field-deployable embedded system. While the measured metrics (9.82 ms latency, 0.1 GB memory) are promising and suggest that the model’s computational footprint is not prohibitively large, direct deployment on resource-constrained embedded hardware (e.g., ARM-based SoCs or FPGAs) requires further optimization. These optimizations may include model quantization, pruning, knowledge distillation, and the use of specialized inference engines (e.g., TensorRT, OpenVINO). The low latency and moderate memory usage observed in our preliminary tests provide a strong foundation for such future work, indicating that the CV-CNN architecture is not inherently incompatible with embedded deployment. However, a definitive assessment of its feasibility on specific embedded platforms will require targeted benchmarking and hardware-aware implementation, which constitutes a critical next step beyond the scope of this algorithmic validation study.

Figure 12. Inference performance of the (4L, 5K, Reduced) CV-CNN model. (a) Average inference time per sample; (b) throughput in samples per second; (c) GPU memory consumption during inference.

5. Discussions

The experimental results presented in this study demonstrate that the proposed CV-CNN achieves significantly enhanced robustness under low-SNR conditions compared to its real-valued counterpart, despite both models receiving identical dual-channel inputs (log-PSD and Sobel edge map). This advantage aligns with theoretical expectations from the complex-valued deep learning literature [21,24], which posits that complex-domain operations impose stronger algebraic constraints and better preserve the intrinsic structure of signals originally defined in the complex plane. Our findings corroborate this hypothesis in the specific context of UAV RF signal recognition, where phase-like structural cues, encoded via edge gradients, complement magnitude information to sustain discriminability even when SNR drops below −10 dB. It is worth noting that while other recent works have explored complex-valued models for RF signal classification [31], our approach uniquely focuses on leveraging a hybrid magnitude–edge representation to achieve superior noise resilience, rather than relying on raw data.

Nevertheless, several important limitations must be acknowledged. First, the “phase” component in our complex representation is not the true RF phase but a surrogate derived from Sobel edge detection on the power spectrogram. Although this design empirically improves low-SNR performance, it discards the original hardware-specific phase signatures (e.g., oscillator drift). This loss may limit the model’s ability to discriminate between devices with subtle RF fingerprint differences. Second, the CV-CNN exhibits high sensitivity to architectural hyperparameters, particularly network depth. As shown in Table 3, configurations beyond four layers often suffer from severe training instability, leading to catastrophic accuracy drops (e.g., 8.52% for “5L, 3K, Reduced”). This suggests that existing regularization strategies (e.g., dropout) may be insufficient for deeper complex networks, warranting future investigation into complex-specific normalization or initialization schemes. Potential directions include adopting orthogonal or unitary initialization for complex weights [24], exploring complex BatchNorm variants that jointly normalize magnitude and phase [22], or integrating spectral regularization to stabilize training dynamics in deep CV-CNNs.

Third, all experiments are conducted on a single dataset (DroneRFa) with a fixed preprocessing pipeline (STFT to log-PSD and to Sobel). The generalizability of our approach to other RF datasets, modulation types, or time–frequency representations (e.g., CWT) remains unverified. Finally, the current framework is not fully end-to-end; it relies on handcrafted feature construction rather than learning the optimal complex representation directly from raw IQ data. While this hybrid approach balances interpretability and performance, an end-to-end complex model may unlock further gains if training stability challenges can be overcome.

Despite these limitations, this work provides compelling evidence that complex-valued modeling, when carefully designed, can substantially improve the noise resilience of RF-based UAV recognition systems, offering a promising direction for future research in complex-domain signal intelligence. Looking ahead, the insights from collaborative multi-agent systems [32] suggest that robust RF signal recognition could serve as a foundational component for future cooperative UAV traffic management and swarm intelligence frameworks.

6. Conclusions

This study implements a CV-CNN for recognizing RF signals from UAVs. To leverage phase-like structural information, a CV-CNN incorporating complex convolutional layers, CBN, CReLU, complex pooling, and complex backpropagation is proposed. A series of ablation experiments was conducted on the DroneRFa dataset to compare the performance of the proposed CV-CNN with that of an RV-CNN sharing the same architecture. Evaluated on the 25-class DroneRFa dataset, the proposed CV-CNN achieves 100.00% accuracy under noise-free conditions. Crucially, it demonstrates significantly superior robustness in low-SNR regimes: at −20 dB SNR, it attains 15.58% accuracy, over seven times higher than a dual-channel RV-CNN (2.20%) with identical inputs; at −15 dB, it reaches 45.86% versus 14.03%. This research provides a foundation for the broad application of CV-CNN in the domain of UAV RF signal recognition.

Importantly, the advantage of the CV-CNN does not stem from preserving the original RF phase, which is lost during PSD computation. Instead, it arises from the model’s ability to process a magnitude–edge complex representation through genuine complex arithmetic, which yields superior low-SNR robustness compared to real-valued alternatives. The experimental results suggest that, within the scope of the DroneRFa dataset and the adopted preprocessing strategy, the CV-CNN exhibits enhanced robustness and interference resistance in comparison to an RV-CNN of identical architecture, maintaining high recognition accuracy even under low-SNR conditions.
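The low-SNR evaluations quoted above presuppose noise injection at a calibrated level. A common way to impose a target SNR is to scale complex additive white Gaussian noise against the measured signal power; the sketch below illustrates this, with the function name and seed as assumptions rather than the paper's exact procedure.

```python
# Impose a target SNR (in dB) by adding calibrated complex AWGN.
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add complex white Gaussian noise so that 10*log10(Ps/Pn) = snr_db."""
    rng = rng or np.random.default_rng()
    p_sig = np.mean(np.abs(signal) ** 2)
    p_noise = p_sig / (10 ** (snr_db / 10))
    # Split noise power equally between real and imaginary components
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(signal.shape)
                                    + 1j * rng.standard_normal(signal.shape))
    return signal + noise

s = np.exp(1j * 2 * np.pi * 0.1 * np.arange(10000))   # unit-power complex tone
y = add_awgn(s, snr_db=-20, rng=np.random.default_rng(1))
```

At −20 dB the noise power is 100 times the signal power, which is why accuracies in this regime are so sensitive to how magnitude and structural cues are fused.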

Furthermore, comparative experiments with state-of-the-art image classification networks (ResNet18, EfficientNet-B0) and traditional machine learning methods confirm that the proposed CV-CNN offers not only high accuracy but also exceptional robustness under challenging low-SNR conditions.

While these results are promising, it should be noted that practical deployment may be constrained by the need for GPU-supported complex arithmetic and dependence on controlled RF data acquisition conditions (e.g., fixed sampling rate and known frequency bands), which could affect adaptability in highly dynamic electromagnetic environments.

Abbreviations

The following abbreviations are used in this manuscript:

UAV Unmanned aerial vehicle
RF Radio frequency
PSD Power spectral density
CNN Convolutional neural network
CV-CNN Complex-valued convolutional neural network
RV-CNN Real-valued convolutional neural network
SNR Signal-to-noise ratio
CBN Complex batch normalization
CReLU Complex rectified linear unit
STFT Short-Time Fourier Transform
FC Fully connected layer
SVM Support Vector Machine
KNN K Nearest Neighbors
LSTM Long Short-Term Memory
GRU Gated Recurrent Unit

Author Contributions

Conceptualization, Y.X. and J.M.; methodology, J.M. and X.J.; software, Y.X.; validation, Y.X. and J.M.; formal analysis, Y.X. and W.L.; investigation, Y.X. and J.M.; resources, Y.X. and W.L.; data curation, Y.X.; writing—original draft preparation, Y.X.; writing—review and editing, Y.X. and J.M.; visualization, Y.X.; supervision, X.J. and W.L.; project administration, Y.X. and J.M. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding Statement

This research received no external funding.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Cui Q., You X., Wei N., Nan G., Zhang X., Zhang J., Lyu X., Ai M., Tao X., Feng Z., et al. Overview of AI and communication for 6G network: Fundamentals, challenges, and future research opportunities. Sci. China Inf. Sci. 2025;68:171301. doi: 10.1007/s11432-024-4337-1. [DOI] [Google Scholar]
  • 2.Nijim M., Mantrawadi N. Drone classification and identification system by phenome analysis using data mining techniques; Proceedings of the 2016 IEEE Symposium on Technologies for Homeland Security (HST); Waltham, MA, USA. 10–11 May 2016. [Google Scholar]
  • 3.Gökçe F., Üçoluk G., Şahin E., Kalkan S. Vision-Based Detection and Distance Estimation of Micro Unmanned Aerial Vehicles. Sensors. 2015;15:23805–23846. doi: 10.3390/s150923805. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Guvenc I., Koohifar F., Singh S., Sichitiu M.L., Matolak D. Detection, Tracking, and Interdiction for Amateur Drones. IEEE Commun. Mag. 2018;56:75–81. doi: 10.1109/MCOM.2018.1700455. [DOI] [Google Scholar]
  • 5.Xia R., Shi H., Ma S., Li F., Yang Y., Zhang H. Modeling and Validation of Electrostatic Sensing for UAV Targets in High-Dynamic Encounter Scenarios. Sensors. 2025;25:5107. doi: 10.3390/s25165107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Bertocco M., Brighente A., Ciattaglia G., Gambi E., Peruzzi G., Pozzebon A., Spinsante S. Malicious Drone Identification by Vibration Signature Measurement: A Radar-Based Approach. IEEE Trans. Instrum. Meas. 2025;74:8004415. doi: 10.1109/TIM.2025.3571136. [DOI] [Google Scholar]
  • 7.Khor J.H., Sidorov M., Chong M.J.Y. Secure LoRa Drone-to-Drone Communication for Public Blockchain-Based UAV Traffic Management. Sensors. 2025;25:5087. doi: 10.3390/s25165087. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Al-lQubaydhi N., Alenezi A., Alanazi T., Senyor A., Alanezi N., Alotaibi B., Alotaibi M., Razaque A., Hariri S. Deep learning for unmanned aerial vehicles detection: A review. Comput. Sci. Rev. 2024;51:100614. doi: 10.1016/j.cosrev.2023.100614. [DOI] [Google Scholar]
  • 9.Hudson S., Balaji B. Application of Machine Learning for Drone Classification using Radars; Proceedings of the Conference on Signal Processing, Sensor/Information Fusion, and Target Recognition; Online. 12–17 April 2021. [Google Scholar]
  • 10.Dale H., Jahangir M., Baker C.J., Antoniou M., Harman S., Ahmad B.I. Convolutional Neural Networks for Robust Classification of Drones; Proceedings of the 2022 IEEE Radar Conference (RadarConf22); New York City, NY, USA. 21–25 March 2022; pp. 1–6. [DOI] [Google Scholar]
  • 11.Mu J., Cui Y., Ouyang W., Yang Z., Yuan W., Jing X. Federated Learning in 6G Non-Terrestrial Network for IoT Services: From the Perspective of Perceptive Mobile Network. IEEE Netw. 2024;38:72–79. doi: 10.1109/MNET.2024.3380647. [DOI] [Google Scholar]
  • 12.Mu J., Zhang R., Cui Y., Gao N., Jing X. UAV Meets Integrated Sensing and Communication: Challenges and Future Directions. IEEE Commun. Mag. 2023;61:62–67. doi: 10.1109/MCOM.008.2200510. [DOI] [Google Scholar]
  • 13.Al-Emadi S., Al-Senaid F. Drone Detection Approach Based on Radio-Frequency Using Convolutional Neural Network; Proceedings of the 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT); Doha, Qatar. 2–5 February 2020; pp. 29–34. [DOI] [Google Scholar]
  • 14.Al-Sa’d M.F., Al-Ali A., Mohamed A., Khattab T., Erbad A. RF-based drone detection and identification using deep learning approaches: An initiative towards a large open source drone database. Future Gener. Comput. Syst. 2019;100:86–97. doi: 10.1016/j.future.2019.05.007. [DOI] [Google Scholar]
  • 15.Basak S., Rajendran S., Pollin S., Scheers B. Drone classification from RF fingerprints using deep residual nets; Proceedings of the 2021 International Conference on COMmunication Systems & NETworkS (COMSNETS); Bangalore, India. 5–9 January 2021; pp. 548–555. [DOI] [Google Scholar]
  • 16.Teoh Y.J.J., Seow C.K. RF and Network Signature-based Machine Learning on Detection of Wireless Controlled Drone; Proceedings of the 2019 PhotonIcs & Electromagnetics Research Symposium—Spring (PIERS-Spring); Rome, Italy. 17–20 June 2019. [Google Scholar]
  • 17.Medaiyese O., Ezuma M., Lauf A.P., Guvenc I. Wavelet transform analytics for RF-based UAV detection and identification system using machine learning. Pervasive Mob. Comput. 2022;82:101569. doi: 10.1016/j.pmcj.2022.101569. [DOI] [Google Scholar]
  • 18.O’Shea T.J., Corgan J., Clancy T.C. Engineering Applications of Neural Networks. Springer; Cham, Switzerland: 2016. Convolutional radio modulation recognition networks; pp. 213–226. [Google Scholar]
  • 19.Jacques L., Feuillen T. The Importance of Phase in Complex Compressive Sensing. IEEE Trans. Inf. Theory. 2021;67:4150–4161. doi: 10.1109/TIT.2021.3073566. [DOI] [Google Scholar]
  • 20.Mandal S., Satija U. Time–Frequency Multiscale Convolutional Neural Network for RF-Based Drone Detection and Identification. IEEE Sens. Lett. 2023;7:7003304. doi: 10.1109/LSENS.2023.3289145. [DOI] [Google Scholar]
  • 21.Hirose A., Yoshida S. Neural Information Processing. Springer; Berlin/Heidelberg, Germany: 2011. Comparison of complex- and real-valued feedforward neural networks in their generalization ability. [Google Scholar]
  • 22.Lee C., Hasegawa H., Gao S. Complex-valued neural networks: A comprehensive survey. IEEE/CAA J. Autom. Sin. 2022;9:1406–1426. doi: 10.1109/JAS.2022.105743. [DOI] [Google Scholar]
  • 23.Hirose A., Yoshida S. Generalization Characteristics of Complex-Valued Feedforward Neural Networks in Relation to Signal Coherence. IEEE Trans. Neural Netw. Learn. Syst. 2012;23:541–551. doi: 10.1109/TNNLS.2012.2183613. [DOI] [PubMed] [Google Scholar]
  • 24.Trabelsi C., Bilaniuk O., Zhang Y., Serdyuk D., Subramanian S., Santos J.F., Mehri S., Rostamzadeh N., Bengio Y., Pal C.J. Deep Complex Networks. arXiv. 2018 doi: 10.48550/arXiv.1705.09792.1705.09792 [DOI] [Google Scholar]
  • 25.Choi H.S., Kim J., Huh J., Kim A., Ha J.W., Lee K. Phase-Aware Speech Enhancement with Deep Complex U-Net; Proceedings of the International Conference on Learning Representations 2019; New Orleans, LA, USA. 6–9 May 2019. [Google Scholar]
  • 26.Zhang Y., Wang J., Sun J., Adebisi B., Gacanin H., Gui G., Adachi F. CV-3DCNN: Complex-Valued Deep Learning for CSI Prediction in FDD Massive MIMO Systems. IEEE Wirel. Commun. Lett. 2021;10:266–270. doi: 10.1109/LWC.2020.3027774. [DOI] [Google Scholar]
  • 27.Mu H., Zhang Y., Jiang Y., Ding C. CV-GMTINet: GMTI Using a Deep Complex-Valued Convolutional Neural Network for Multichannel SAR-GMTI System. IEEE Trans. Geosci. Remote Sens. 2021;60:5201115. doi: 10.1109/TGRS.2020.3047112. [DOI] [Google Scholar]
  • 28.Paszke A., Lerer A., Killeen T., Antiga L., Yang E., Tejani A., Fang L., Gross S., Bradbury J., Lin Z. PyTorch: An Imperative Style, High-Performance Deep Learning Library; Proceedings of the 33rd International Conference on Neural Information Processing Systems; Virtual. 6–12 December 2020. [Google Scholar]
  • 29.Yu N., Mao S., Zhou C., Sun G., Shi Z., Chen J. DroneRFa: A Large-scale Dataset of Drone Radio Frequency Signals for Detecting Low-altitude Drones. J. Electron. Inf. Technol. 2024;46:1147–1156. doi: 10.11999/JEIT230570. [DOI] [Google Scholar]
  • 30.Zhang Z., Wang H., Xu F., Jin Y.Q. Complex-Valued Convolutional Neural Network and Its Application in Polarimetric SAR Image Classification. IEEE Trans. Geosci. Remote Sens. 2017;55:7177–7188. doi: 10.1109/TGRS.2017.2743222. [DOI] [Google Scholar]
  • 31.Ouamna H., Kharbouche A., El-Haryqy N., Madini Z., Zouine Y. Performance Analysis of a Hybrid Complex-Valued CNN-TCN Model for Automatic Modulation Recognition in Wireless Communication Systems. Appl. Syst. Innov. 2025;8:90. doi: 10.3390/asi8040090. [DOI] [Google Scholar]
  • 32.Hao R., Yu H., Zhong J., Wang C., Wang J., Kan Y., Yang W., Fan S., Yin H., Qiu J., et al. Research challenges and progress in the end-to-end v2x cooperative autonomous driving competition; Proceedings of the IEEE/CVF International Conference on Computer Vision; Honolulu, HI, USA. 19–23 October 2025; pp. 1828–1839. [Google Scholar]
