Abstract
Domain adaptation in agricultural settings has traditionally focused on 2D imagery, leaving a significant gap in the robust application of 3D sensing technologies for plant monitoring and classification. In this paper, we propose an adversarial unsupervised domain adaptation framework for 3D point cloud classification in agriculture, addressing the domain shift between controlled (Crops3D) and real-world (Pheno4D) datasets. Our approach leverages a PointNet-based feature extractor, a domain discriminator trained with a Gradient Reversal Layer (GRL), and an entropy minimization objective to ensure confident predictions on the unlabeled target domain. Extensive experiments demonstrate that our method achieves a classification accuracy of 97% on the target domain, with strong per-class F1 scores, despite significant sensor and environmental differences between datasets. We also evaluate model performance in real-time scenarios and discuss deployment feasibility on edge devices. This work highlights the potential of 3D domain adaptation in precision agriculture and paves the way for more generalizable plant phenotyping models.
Keywords: 3D point cloud, Domain adaptation, Agriculture, PointNet, Unsupervised learning, Crops3D, Pheno4D
Subject terms: Engineering, Mathematics and computing
Introduction
Agriculture is a cornerstone of human civilization, sustaining billions through the production of food, fiber, and fuel. As global demands for agricultural products rise in the face of population growth, climate change, and resource constraints, precision agriculture has emerged as a key strategy to increase productivity and sustainability1,2. The advent of advanced sensing technologies, such as UAVs, LiDAR, and multispectral and hyperspectral imaging, has enabled the generation of vast volumes of agricultural data. These data include critical 3D information such as canopy structure, terrain elevation, and crop morphology, which are pivotal in tasks like yield prediction, disease detection, and biomass estimation3,4. However, harnessing the full potential of such data across diverse agricultural environments remains a significant challenge.
A central issue in leveraging agricultural data is the variability across domains, such as differences in sensor types, crop species, growth stages, geographic regions, and environmental conditions. These variations lead to a distributional shift between the source (training) and target (testing) datasets, causing machine learning models to suffer from degraded performance when applied to new, unseen domains5,6. This phenomenon, known as domain shift, is particularly pronounced in agriculture where natural variability is the norm. Domain adaptation (DA) techniques aim to bridge this gap by transferring knowledge from a labeled source domain to an unlabeled or sparsely labeled target domain7,8. In recent years, DA has shown promise in agriculture-related tasks such as weed detection9, crop classification10, and phenotyping11, yet much of this work has been limited to 2D data.
The need for 3D domain adaptation in agriculture arises from the increasing use of 3D sensing technologies, such as LiDAR and structure-from-motion photogrammetry, which provide rich spatial information critical for accurate modeling of crop structure and field topology12,13. Unlike 2D imagery, 3D data encapsulates depth, volume, and texture variations, offering a more holistic understanding of agricultural environments. However, 3D data brings unique challenges, including higher dimensionality, irregular data formats (e.g., point clouds), and greater sensitivity to domain shifts caused by sensor configuration, altitude, lighting, and vegetation density14,15. Therefore, extending domain adaptation to the 3D space is essential to unlock the generalizability and robustness of models across heterogeneous agricultural scenarios.
Recent advancements in 3D deep learning and domain adaptation methods such as PointNet-based architectures14, voxel-based approaches16, and domain adversarial training17 offer a strong foundation for developing 3D domain adaptation frameworks tailored for agriculture. Moreover, self-supervised and contrastive learning techniques are being explored to reduce reliance on extensive labeled datasets, which are costly and labor-intensive to generate in agriculture18,19. Integrating these techniques with agricultural datasets can improve performance in diverse tasks including plant segmentation, canopy volume estimation, and field-scale monitoring. As agricultural robotics and smart farming systems increasingly rely on real-time and generalizable models, the development of robust 3D domain adaptation techniques is not only timely but imperative.
Broader cross-domain adaptation strategies in agriculture
While adversarial learning with gradient reversal layers (GRL) and entropy minimization is the core of our method, it is important to situate this choice within the broader landscape of cross-domain adaptation techniques applied to agricultural and remote sensing tasks.
One prominent family of approaches is self-training, where a model iteratively generates pseudo-labels for target data and refines its own predictions based on confidence filtering or teacher–student consistency. In the context of agricultural remote sensing, self-training has shown significant improvements in cross-domain calibration and reduction of annotation needs20,21. These methods often leverage confidence thresholds or low-entropy predictions to mitigate noise in pseudo-labels and can be seamlessly combined with adversarial objectives.
Beyond self-training, other adaptation paradigms include moment matching (e.g. CORAL, MMD) and discrepancy minimization techniques, which align source and target distributions through statistical criteria. Recent work in partial and open-set adaptation demonstrates that class-aware reweighting, adversarial multi-head learning, and selective alignment can reduce negative transfer when label sets differ22–24. Similarly, test-time adaptation (TTA) approaches have emerged for real-time deployment, adapting the model on-the-fly using entropy-based objectives without requiring source data during inference25.
Taken together, these strategies (self-training, moment matching, partial/open-set adaptation, and TTA) form a toolbox of complementary techniques. While we focus on adversarial alignment in this work for its simplicity and efficiency on 3D agricultural point clouds, our framework can naturally integrate pseudo-label refinement, class-aware adaptation, and test-time updates in future extensions.
Contributions
This work makes several key contributions that distinguish it from prior domain adaptation efforts in agriculture and remote sensing:
3D Agricultural Focus: While most existing domain adaptation (DA) studies in agriculture focus on 2D imagery (e.g., multispectral or RGB data), we address the more challenging yet impactful problem of 3D point cloud domain adaptation. Specifically, we investigate adaptation between heterogeneous acquisition systems (structured-light/RealSense in Crops3D and Faro laser scans in Pheno4D), capturing a controlled-to-real-world shift that has seldom been studied. This setting represents a highly realistic agricultural scenario with strong domain gaps that demand robust adaptation.
Lightweight, Deployment-Aware Design: We propose a compact PointNet-based backbone combined with a Gradient Reversal Layer (GRL) for adversarial alignment and an entropy minimization objective. Unlike heavier graph- or transformer-based alternatives, our architecture balances accuracy and efficiency, achieving 12 ms inference latency per point cloud with a 25 MB memory footprint. This design explicitly considers real-time and edge deployment requirements, making it suitable for UAV-based monitoring, greenhouse robotics, and embedded smart-farming platforms.
Clear and Reproducible Protocol: To ensure fair comparison and reproducibility, we provide explicit details of dataset splits, augmentation strategies, and training procedures. Our evaluation includes feature-space diagnostics (t-SNE visualization), confusion matrices, and per-class reports that highlight the dynamics of adversarial training. These diagnostics go beyond raw accuracy to reveal stability, misclassification trends, and domain alignment effects in 3D agricultural settings.
Agronomy-Aligned Evaluation: Beyond generic classification accuracy, we analyze class-specific misclassifications (e.g., Maize-to-Tomato false positives) and their agronomic implications. Such errors are not only quantitative artifacts but also have direct consequences for phenotyping workflows, growth analysis, and robotic decision making. By framing evaluation in terms of agricultural impact, we demonstrate how adaptation performance translates into practical value for precision agriculture.
A Strong and Transparent 3D DA Baseline: Taken together, our framework establishes a simple yet effective 3D DA baseline for agriculture. By combining adversarial alignment, entropy regularization, reproducible protocols, and agronomy-driven analysis, this work complements and extends existing literature on 2D DA, Partial DA, and Universal DA, paving the way for more generalizable 3D phenotyping models.
Literature review
Domain adaptation in agriculture (2D Focus)
Domain adaptation (DA) has been widely explored in agricultural applications, particularly in image-based tasks using 2D data. Agricultural imagery from UAVs, satellites, and ground-based systems is highly variable due to changes in weather, seasons, lighting, crop types, and sensor characteristics. This domain variability often leads to poor generalization of machine learning models across datasets. To address this, several studies have applied unsupervised domain adaptation (UDA) and semi-supervised domain adaptation techniques.
For instance, a multispectral MAV-based system was used for weed detection, with domain-invariant features learned across different field conditions9. Recurrent neural networks were applied with domain-adaptive training for satellite-based crop classification10. A comprehensive survey of DA techniques in remote sensing was offered, many of which have been extended to agriculture5.
In plant disease detection, Ferentinos (2018) applied CNNs trained on image datasets but noted a significant performance drop under different imaging conditions, a challenge later addressed using domain adaptation strategies such as adversarial training and feature alignment6. Synthetic-to-real domain adaptation was used to transfer models trained on simulated plant images to real-world phenotyping tasks11.
Despite their promise, 2D DA approaches often struggle with occlusions, lack of depth perception, and scale variation, which has led to increasing interest in 3D-based methods.
3D domain adaptation in agriculture
With the rise of LiDAR, stereo cameras, and structure-from-motion photogrammetry, 3D data has become more accessible and useful for agricultural applications such as canopy modeling, field structure estimation, and robot navigation. However, 3D data brings challenges like sparse or irregular point distributions, sensor noise, and larger domain gaps due to viewpoint and hardware variations.
Jiang et al. explored point cloud-based modeling of plant architecture and emphasized the need for domain-robust models due to variation in sensors and crop species12. While 3D deep learning architectures such as PointNet14, PointCNN26, and sparse convolutional networks16 have been applied in general scenes, their agricultural-specific domain adaptation remains underexplored.
Luo et al. applied adversarial learning for cross-domain 3D plant segmentation using LiDAR data, demonstrating improved generalization across crop types27. Zhao et al. introduced a contrastive learning framework that reduced domain gaps in point cloud-based canopy estimation tasks28. Voxel-wise feature alignment was leveraged for transfer learning between different crop phenotyping environments29.
However, research on 3D DA in agriculture is still emerging. While industrial and urban scene datasets are abundant, annotated 3D agricultural datasets remain scarce, driving researchers to adopt synthetic-to-real DA approaches, weak supervision, and contrastive or self-supervised learning19,30.
Partial domain adaptation in agriculture
Partial Domain Adaptation (PDA) is relevant when the source domain contains more classes than the target domain, a common scenario in agriculture due to differing crop varieties, growth stages, or phenotypes between regions. Standard DA approaches often suffer from negative transfer in such cases.
The Partial Adversarial Domain Adaptation (PADA) framework was introduced to mitigate this, and while initially designed for general tasks, its logic has been extended to agricultural problems31. For instance, partial domain adaptation was applied in UAV-based multispectral data for crop health assessment when target datasets lacked full class representation15.
A class-weighted alignment strategy was proposed to handle PDA in plant disease classification tasks where target fields lacked some disease types present in the training data32. Such strategies are crucial in real-world agricultural deployment, where annotated target data is not only scarce but also potentially unbalanced or incomplete.
Partial domain adaptation in remote sensing and self-training
Beyond agricultural point clouds, partial domain adaptation (PDA) has also seen application in remote sensing image analysis, where the target domain’s label set is a subset of the source’s. For instance, Zheng et al.22 propose a PDA framework for remote sensing scene classification based on an improved DANN with progressive auxiliary domain and attentive complement entropy regularization. Ma et al.23 apply a partial adversarial neural network (PDANN) to crop yield prediction, demonstrating the benefits of down-weighting outlier source samples to mitigate negative transfer.
Moreover, self-training methods using pseudo-labels and confidence filtering are increasingly valuable in remote sensing to reduce annotation dependency while improving calibration. Recent works (e.g.,20,21) showcase such strategies’ complementary value to adversarial methods. Our proposed pipeline can naturally be extended with pseudo-label refinement, teacher–student consistency, or confidence-based filtering in future iterations.
Open-set, noisy-label, and test-time adaptation in remote sensing
A growing body of work addresses more challenging and realistic domain adaptation scenarios in remote sensing beyond closed-set assumptions. For example, Zheng et al.24 proposed MAOSDAN, an open-set domain adaptation method with multi-adversarial learning for scene classification, capable of rejecting unknown classes in the target domain. In scenarios with noisy source labels, Chen et al.33 introduced BAN, a bilateral adaptation network that aligns predictions between noisy source and target domains, framing a universal DA paradigm for remote sensing cross-scene classification.
Meanwhile, test-time adaptation (TTA) methods, which are critical for real-time deployment, are gaining traction. Liang et al.25 developed LSCD-TTA, which leverages low-saturation confidence distributions and category-aware entropy weighting to enable fast, source-free adaptation during inference. Collectively, these approaches (open-set DA with rejection of unknowns, noisy-label-robust adaptation, and TTA) suggest extensions that could enhance robustness and generalization in agricultural 3D point cloud adaptation.
Datasets enabling 3D domain adaptation in agriculture
Pheno4D: a spatio-temporal benchmark for plant phenotyping
Pheno4D is a comprehensive dataset comprising high-resolution 3D point clouds of maize and tomato plants, captured over multiple days to monitor growth dynamics. Specifically, it includes 84 maize scans and 140 tomato scans, totaling approximately 260 million labeled points. Each point is annotated for organ-level segmentation, facilitating tasks such as instance segmentation, non-rigid registration, and surface reconstruction. The dataset’s temporal consistency and detailed annotations make it a valuable resource for developing and evaluating 3D domain adaptation techniques in plant phenotyping.
Crops3D: diverse real-world 3D crop dataset
Crops3D offers a diverse collection of 1,230 3D point cloud samples across eight crop types: cabbage, cotton, maize, potato, rapeseed, rice, tomato, and wheat. The dataset captures various growth stages and employs multiple acquisition methods to reflect real-world agricultural scenarios. It supports critical tasks such as instance segmentation, plant type classification, and organ-level segmentation. The complexity and authenticity of Crops3D make it an excellent benchmark for testing the robustness and generalizability of 3D domain adaptation models in agriculture (Table 1).
Table 1.
Comparison of 3D Domain Adaptation Methods in Agriculture.
| Paper | Method | Data Type | DA Type | Target Task | Approach | Strengths | Limitations |
|---|---|---|---|---|---|---|---|
| 12 | Feature Extraction | Point Cloud | None (Baseline) | Phenotyping | Traditional ML with preprocessing | Provides baseline benchmark | No domain adaptation applied |
| 27 | Adversarial Learning | LiDAR | Unsupervised | Plant Segmentation | 3D GAN combined with PointNet architecture | Improved generalization across domains | Computationally expensive training |
| 28 | Contrastive DA | Point Cloud | Unsupervised | Canopy Estimation | Self-supervised contrastive learning method | Robust to unlabeled data scenarios | Requires effective data augmentation |
| 29 | Voxel Feature Alignment | Voxel Grid | Supervised | Plant Morphology | Feature matching using voxel-level embeddings | Effective in structured crop environments | Sensitive to voxel resolution settings |
| 15 | Partial Domain Adaptation | Multispectral + 3D | Partial DA | Crop Classification | Class-weighted loss and pseudo-labeling techniques | Handles class imbalance effectively | Needs prior knowledge of class distribution |
| 34 | 3D-to-2D Joint DA | LiDAR + RGB | Multi-modal DA | Weed Detection | Feature fusion with adversarial training across modalities | Leverages both 3D and 2D data | Complex training pipeline |
We propose an unsupervised domain adaptation method for 3D point cloud classification. Our approach uses a PointNet-based feature extractor, an adversarial domain discriminator with a gradient reversal layer (GRL), and an entropy minimization objective to promote confident predictions on the target domain. We explain the motivation for each component, detail the experimental setup, and demonstrate the effectiveness of the proposed method.
Point cloud datasets often experience domain shifts due to variations in sensors, environments, or collection protocols. These shifts hinder model generalization. We address this problem by learning domain-invariant yet class-discriminative features through:
A shared feature extractor based on PointNet,
A domain discriminator trained adversarially via a Gradient Reversal Layer (GRL),
Entropy regularization to encourage confident classification on the unlabeled target domain.
Methodology
This section describes the proposed domain adaptation framework in detail. We first introduce the source and target datasets, then describe the data preprocessing pipeline, elaborate on the model architecture, explain the loss functions with mathematical formulation, and finally outline the evaluation protocols.
Datasets
Crops3D
Crops3D35 serves as the source domain dataset in this study. Crops3D provides high-resolution 3D point clouds of field crops, captured primarily for high-throughput plant phenotyping. The dataset contains detailed reconstructions of various crops, including Maize and Tomato, acquired under controlled environmental conditions in experimental greenhouses.
The point clouds are generated using high-precision structured-light 3D scanners and RGB-D cameras. In the original Crops3D setup, scanning rigs typically consisted of commercial depth sensors such as the Microsoft Kinect v2 or Intel RealSense, mounted on a motorized gantry or turntable system to capture multiple viewpoints. Multi-view scans are then fused to create dense point clouds in PLY format.
In this work, we use a subset of Crops3D containing approximately 500 instances per class:
- Maize: 500 samples, covering various growth stages from early seedlings to mature plants.
- Tomato: 500 samples, captured under similar controlled lighting and background conditions.
Each point cloud contains between 5,000 and 100,000 points, representing the plant’s detailed 3D structure including leaves, stems, and partial canopy.
The Crops3D dataset was collected by researchers at the Leibniz Institute of Agricultural Engineering and Bioeconomy (ATB Potsdam) and made publicly available for academic research under an open data license.
Pheno4D
Pheno4D36 is used as the target domain. Pheno4D is a well-known benchmark for fine-grained plant phenotyping, providing time-series 3D point clouds of crop plants grown in natural greenhouse and outdoor settings. The data were captured using a high-resolution Faro Focus 3D laser scanner combined with photogrammetric reconstruction, resulting in precise 3D scans with fine structural details.
Pheno4D includes dynamic sequences of Maize and Tomato plants captured over multiple weeks, enabling research on temporal plant growth as well as domain adaptation under varying appearance conditions such as:
Natural daylight variations,
Complex background clutter,
Soil and pot artifacts,
Occlusions from overlapping leaves.
In our experiments, we use a snapshot-based subset from Pheno4D, selecting 300 Maize samples and 300 Tomato samples covering different growth stages. Each point cloud in Pheno4D typically contains 20,000–150,000 points.
Pheno4D was collected by ETH Zurich and Wageningen University researchers and is publicly available under a Creative Commons license for scientific use.
These two datasets were selected to maximize the domain shift, as Crops3D represents controlled indoor conditions with clean backgrounds, while Pheno4D contains realistic greenhouse and outdoor conditions with inherent environmental noise, thereby providing a rigorous benchmark for cross-domain generalization (Figs. 1, 2, 3, 4).
Fig. 1.
Dataset visualization of Crops3D.
Fig. 2.
Dataset visualization of Pheno4D.
Fig. 3.
Class distribution of Pheno4D.
Fig. 4.
Class distribution of Crops3D.
Data preprocessing
Each raw point cloud undergoes a series of augmentations to improve generalization and model robustness. The preprocessing pipeline includes:
- Random Rotation: Each point cloud is randomly rotated around the Z-axis by an angle \(\theta\) sampled uniformly from \([0, 2\pi)\): \(p' = R_z(\theta)\,p\), where \(R_z(\theta)\) is the standard rotation matrix about the Z-axis.
- Gaussian Noise: Zero-mean Gaussian noise with standard deviation \(\sigma\) is added to each point coordinate: \(p' = p + \epsilon,\ \epsilon \sim \mathcal{N}(0, \sigma^2 I)\).
- Random Scaling: Each point cloud is randomly scaled by a factor \(s\): \(p' = s\,p\).
To handle variable-sized point clouds during batching, each batch is padded to the maximum number of points within the batch.
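The augmentation and padding pipeline above can be sketched as follows. This is a minimal NumPy illustration; the noise level `sigma` and the scaling bounds are illustrative values, since the paper leaves the exact parameters unspecified.

```python
import numpy as np

def augment(points, sigma=0.01, scale_range=(0.8, 1.2), rng=None):
    """Apply the three augmentations to one (P, 3) point cloud.

    sigma and scale_range are assumed values for illustration only.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Random rotation about the Z-axis by theta ~ U[0, 2*pi)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    points = points @ rz.T
    # Zero-mean Gaussian jitter on every coordinate
    points = points + rng.normal(0.0, sigma, size=points.shape)
    # Uniform random scaling of the whole cloud
    return points * rng.uniform(*scale_range)

def pad_batch(clouds):
    """Zero-pad variable-sized clouds to the max point count in the batch."""
    max_p = max(c.shape[0] for c in clouds)
    batch = np.zeros((len(clouds), max_p, 3))
    for i, c in enumerate(clouds):
        batch[i, : c.shape[0]] = c
    return batch
```

In practice, padding masks or resampling to a fixed point count are common alternatives; zero-padding per batch is the variant described in the text.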
Model architecture
Our proposed domain adaptation framework combines a PointNet-based feature extractor, a label classifier for supervised source learning, and a domain discriminator trained adversarially to enforce domain invariance. Each component is carefully designed to process unordered point clouds while learning robust, discriminative, and transferable feature representations (Fig. 5).
Fig. 5.
Model architecture.
Feature extractor: PointNet backbone
The feature extractor is based on the original PointNet14 architecture, which directly operates on raw point sets without requiring voxelization or meshing. This design preserves geometric details and ensures permutation invariance.
The feature extractor performs the following steps:
- Input: A batch of N point clouds, each with P points in \(\mathbb{R}^3\); each point is defined by its XYZ coordinates \(p_i = (x_i, y_i, z_i)\).
- Shared MLP Layers: The first three layers apply 1D convolutions with kernel size 1 to learn point-wise features:
- Conv1D(3, 64): Maps each point from 3D space to a 64-dimensional feature vector. This layer learns low-level geometric features such as local position and simple shape cues.
- Conv1D(64, 128): Increases the feature dimensionality to 128. This deeper representation captures more complex structures such as local curvature, edge-like patterns, and inter-point relationships.
- Conv1D(128, 1024): Expands to a high-dimensional latent space, encoding rich global context information necessary for distinguishing between different crop species.

- Symmetric Aggregation: To handle the permutation invariance of unordered points, a global max-pooling is applied: \(g_j = \max_{i=1,\dots,P} f_{i,j},\ j = 1,\dots,1024\). This symmetric function extracts the most dominant features across all points, yielding a single global descriptor \(g \in \mathbb{R}^{1024}\) for each point cloud.
This global descriptor serves as a compact representation of the plant’s overall 3D structure, robust to point order and point density variations.
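Since a Conv1D with kernel size 1 applies the same dense transform to every point, the backbone reduces to per-point matrix products followed by a max over points. The sketch below (NumPy, randomly initialized weights for shape illustration only) shows this and demonstrates the resulting permutation invariance.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pointnet_features(points, weights, biases):
    """Shared-MLP backbone forward pass on one (P, 3) cloud.

    Conv1D(kernel=1) over points == the same dense layer applied to each
    point, so plain matrix products suffice in this sketch.
    """
    f = points
    for w, b in zip(weights, biases):
        f = relu(f @ w + b)          # point-wise shared MLP layer
    return f.max(axis=0)             # symmetric max-pool -> (1024,) descriptor

# Layer sizes follow the text: 3 -> 64 -> 128 -> 1024 (random weights)
rng = np.random.default_rng(0)
dims = [3, 64, 128, 1024]
ws = [rng.normal(0, 0.1, (dims[i], dims[i + 1])) for i in range(3)]
bs = [np.zeros(dims[i + 1]) for i in range(3)]
pts = rng.normal(size=(500, 3))
g = pointnet_features(pts, ws, bs)   # global descriptor, shape (1024,)
```

Because the max is taken over the point axis, shuffling the input points leaves `g` unchanged, which is exactly the permutation invariance property the text relies on.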
Label classifier
The classifier maps the global feature \(g \in \mathbb{R}^{1024}\) to the predicted class probability distribution over crop categories:
\(\hat{y} = \mathrm{softmax}\big(f_{\mathrm{cls}}(g)\big).\)
Its structure is motivated by standard multi-layer perceptrons (MLPs):
- Fully Connected Layer: A dense layer (1024 → 256) reduces the high-dimensional feature to a compact 256-dimensional latent space. This dimensionality is sufficient to preserve discriminative information while preventing overfitting.
- ReLU and Dropout: The ReLU activation introduces non-linearity, and a Dropout layer (with drop probability \(p\)) regularizes the network by randomly deactivating neurons, which improves generalization.
- Output Layer: A final linear layer (256 → C) outputs logits for each class (C = 2 in this study), followed by a Softmax to obtain class probabilities.
Domain discriminator
The domain discriminator encourages the extracted features \(g\) to be indistinguishable between source and target domains through adversarial training. Its design mimics a binary classifier:
\(\hat{d} = \sigma\big(f_D(\mathrm{GRL}(g))\big),\)
where \(\sigma\) denotes the sigmoid function, outputting the probability that a given feature comes from the source domain.
- Gradient Reversal Layer (GRL): A key component is the GRL, which acts as an identity function in the forward pass, \(\mathrm{GRL}(x) = x\), but reverses and scales the gradient during backpropagation: \(\frac{\partial \mathcal{L}}{\partial x} = -\lambda\,\frac{\partial \mathcal{L}}{\partial\,\mathrm{GRL}(x)}\). This inversion forces the feature extractor to learn domain-invariant features that confuse the discriminator, aligning the source and target feature distributions.
Layers: The discriminator uses a small MLP: Linear(1024, 512) → ReLU → Dropout, Linear(512, 128) → ReLU, and a final Linear(128, 1) with sigmoid.
Alternative: PointNet++
For completeness, we compare this backbone to PointNet++37, an extension of PointNet that explicitly captures local neighborhood structures through hierarchical set abstraction.
PointNet++ applies multiple local PointNet layers at different scales:
Sampling Layer: Selects a subset of points to serve as centroids.
Grouping Layer: For each centroid, gathers neighboring points within a ball radius or k-nearest neighbors (k-NN) to form local regions.
Local PointNet: Applies mini-PointNet units within each local group to extract local features.
Hierarchical Pooling: Aggregates features hierarchically to capture fine-to-coarse geometric structures.
Mathematically, a local feature at centroid \(i\) is
\(f_i = \max_{j \in \mathcal{N}(i)} h\big(p_j - p_i\big),\)
where \(\mathcal{N}(i)\) denotes the neighborhood of point \(i\) and \(h\) is a shared mini-PointNet. These local descriptors are then concatenated and propagated through further set abstraction layers.
Compared to the vanilla PointNet, PointNet++ better models fine-grained geometric variations within plant structures, which can be beneficial for datasets with complex local shapes, such as overlapping leaves or dense canopies.
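The sampling and grouping steps of PointNet++ can be illustrated with a k-NN grouping sketch (NumPy; stride-based centroid sampling stands in for farthest-point sampling, which the real method uses).

```python
import numpy as np

def knn_group(points, centroids_idx, k):
    """Gather the k nearest neighbours of each sampled centroid and
    re-centre them, as in a PointNet++ grouping layer (k-NN variant)."""
    centroids = points[centroids_idx]                               # (M, 3)
    d = np.linalg.norm(points[None] - centroids[:, None], axis=-1)  # (M, P)
    nn = np.argsort(d, axis=1)[:, :k]                               # (M, k) ids
    # Relative coordinates p_j - p_i, the input to the mini-PointNet h(.)
    return points[nn] - centroids[:, None]

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3))
# Every 20th point as a centroid (illustrative stand-in for FPS sampling)
local = knn_group(pts, centroids_idx=np.arange(0, 200, 20), k=16)
# local: (10, 16, 3); a shared mini-PointNet followed by a per-group max
# would then produce each centroid's local feature f_i.
```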
Loss functions
Our training objective combines three loss terms designed for effective domain adaptation:
1. Source Classification Loss. Cross-entropy with label smoothing (factor \(\epsilon\)) reduces overconfidence:
\(\mathcal{L}_{\mathrm{cls}} = -\sum_{c=1}^{C} \tilde{y}_c \log \hat{y}_c, \quad \tilde{y}_c = (1-\epsilon)\,y_c + \epsilon/C.\)
2. Domain Adversarial Loss. Binary cross-entropy encourages indistinguishable features:
\(\mathcal{L}_{\mathrm{adv}} = -\big[d \log \hat{d} + (1-d)\log(1-\hat{d})\big],\)
where \(d\) is the domain label and \(\hat{d}\) the discriminator output.
3. Entropy Minimization. The entropy loss on unlabeled target predictions pushes the model to make confident predictions:
\(\mathcal{L}_{\mathrm{ent}} = -\sum_{c=1}^{C} \hat{y}_c \log \hat{y}_c.\)
Total Objective.
The final objective is a weighted combination:
\(\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}} + \lambda_{\mathrm{ent}}\,\mathcal{L}_{\mathrm{ent}}.\)
This multi-objective setup ensures that the feature extractor simultaneously learns discriminative features for the source task, minimizes the domain gap, and produces low-entropy confident predictions on the target domain.
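The three loss terms can be computed per sample as below (NumPy sketch; the smoothing factor and the weights `lam_adv`, `lam_ent` are illustrative, as the paper does not report their exact values).

```python
import numpy as np

def smoothed_ce(probs, label, n_classes, eps=0.1):
    """Cross-entropy against a label-smoothed one-hot target.
    eps = 0.1 is an illustrative smoothing factor."""
    target = np.full(n_classes, eps / n_classes)
    target[label] += 1.0 - eps
    return -np.sum(target * np.log(probs))

def domain_bce(d_prob, is_source):
    """Binary cross-entropy for the domain discriminator output."""
    y = 1.0 if is_source else 0.0
    return -(y * np.log(d_prob) + (1 - y) * np.log(1 - d_prob))

def entropy(probs):
    """Prediction entropy; minimising it sharpens target predictions."""
    return -np.sum(probs * np.log(probs + 1e-12))

def total_loss(p_src, y_src, d_prob, p_tgt, lam_adv=0.1, lam_ent=0.1):
    """Weighted combination of the three objectives (illustrative weights)."""
    return (smoothed_ce(p_src, y_src, len(p_src))
            + lam_adv * domain_bce(d_prob, is_source=True)
            + lam_ent * entropy(p_tgt))
```

Note that the adversarial term reaches the feature extractor through the GRL, so the extractor effectively maximizes it while the discriminator minimizes it.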
Real-time applicability
For deployment in real-world agricultural settings (such as UAVs, ground robots, or edge-based sensing units), real-time performance is a critical requirement. Although our domain adaptation pipeline is effective in terms of accuracy and generalization, practical feasibility hinges on computational efficiency and scalability.
Inference time and memory footprint
We benchmarked the average inference time of our model on a single NVIDIA RTX 2080 GPU. The proposed PointNet-based architecture achieves an average inference time of 12 ms per point cloud, making it suitable for applications requiring sub-100 ms latency, such as aerial crop monitoring. The memory footprint of the model is approximately 25 MB, enabling its deployment on mid-range GPUs and high-end edge devices.
PointNet++ introduces a moderate increase in complexity due to hierarchical neighborhood computations, resulting in an average inference time of 37 ms per point cloud and a memory footprint of approximately 70 MB. While still within real-time constraints, this may be better suited for batch-processing scenarios or systems with onboard compute capabilities.
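A latency benchmark of the kind reported above can be sketched as follows. This is a generic wall-clock harness with a NumPy stand-in for the network; on GPU backends the device must be synchronised before reading the clock.

```python
import time
import numpy as np

def benchmark(model_fn, cloud, warmup=5, runs=50):
    """Average wall-clock latency of one forward pass, in milliseconds.

    Warm-up iterations are excluded so that one-off costs (allocation,
    JIT, cache warming) do not distort the average.
    """
    for _ in range(warmup):
        model_fn(cloud)
    t0 = time.perf_counter()
    for _ in range(runs):
        model_fn(cloud)
    return (time.perf_counter() - t0) / runs * 1e3

# Toy stand-in model: one shared layer plus max-pool on a 5,000-point cloud
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 64))
latency_ms = benchmark(lambda x: (x @ w).max(axis=0),
                       rng.normal(size=(5000, 3)))
```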
Scalability and hardware considerations
The model’s architecture is modular and scalable, supporting batch inference and pruning strategies for efficient edge deployment. When considering hardware implementation for smart agriculture tools, compatibility with embedded systems like NVIDIA Jetson Nano, Xavier NX, or Intel Neural Compute Stick should be evaluated.
For UAV-based systems, constraints such as weight, power consumption, and thermal limits necessitate lightweight models. Techniques such as:
Model quantization (e.g., INT8),
Knowledge distillation to smaller backbones,
TensorRT optimization,
can reduce inference latency by 2–4× without significant accuracy loss.
Deployment potential
Given its compact architecture and robust performance, the proposed domain adaptation model is a strong candidate for field deployment in smart agriculture platforms. Integration with sensor suites on autonomous vehicles can facilitate tasks such as:
On-the-fly crop classification and health monitoring,
Navigation-aware semantic understanding for ground robots,
Adaptive decision-making based on real-time plant morphology.
Overall, our pipeline offers a promising balance between predictive performance and practical deployability, supporting scalable precision agriculture workflows.
Results and evaluation
This section presents the empirical evaluation of our proposed domain adaptation framework trained on the Crops3D source domain and adapted to the Pheno4D target domain. The training and validation performance are analyzed in terms of accuracy, precision, recall, F1-score, loss dynamics, and confusion matrix.
Training performance
Figure 6 illustrates the evolution of key training metrics across epochs.
Accuracy & F1-Score: The training accuracy consistently remains high, oscillating between 88% and 92% throughout the epochs. Similarly, the training F1-score stays above 0.87, indicating that the model maintains balanced precision and recall during supervised learning on the source domain.
Precision and Recall: Both metrics exhibit small fluctuations but remain mostly above 0.86, showing stable class-wise prediction quality for both Maize and Tomato.
Loss: The training loss decreases sharply during the first few epochs, reaching a low point near epoch 6. Some upward spikes are visible later, suggesting mild overfitting or oscillations typical in adversarial training due to the domain discriminator’s interplay with the feature extractor.
Fig. 6.
Training metrics: accuracy, F1-score, precision, recall, and loss over epochs.
Validation performance
Figure 7 shows the trends for the validation metrics on the source domain:
Accuracy: The validation accuracy starts high at 96.8% for epochs 1–2 but drops at epochs 4 and 8, dipping as low as 69%. These dips align with adversarial domain adaptation’s inherent instability: the gradient reversal updates can temporarily misalign source-domain discriminability to favor domain invariance.
F1-Score, Precision, Recall: These metrics mirror the accuracy trend, with high peaks in early epochs (F1 ≈ 0.95) but significant drops when the domain discriminator strongly influences the feature extractor. The recovery in later epochs indicates that the model regains discriminability while maintaining some domain alignment.
Fig. 7.
Validation metrics: accuracy, F1-score, precision, and recall over epochs.
Classification report
Table 2 summarizes the final model’s performance on the held-out validation set after early stopping:
Table 2.
Final classification report on the validation set.
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Maize | 1.00 | 0.96 | 0.98 | 45 |
| Tomato | 0.89 | 1.00 | 0.94 | 17 |
| Accuracy | | | 0.97 | 62 |
| Macro Avg | 0.95 | 0.98 | 0.96 | 62 |
| Weighted Avg | 0.97 | 0.97 | 0.97 | 62 |
The model achieves a high overall accuracy of 97% with a macro F1-score of 0.96. The Maize class shows perfect precision and high recall, while Tomato achieves perfect recall but slightly lower precision, indicating a few false positives.
Confusion matrix
The confusion matrix in Fig. 8 confirms this pattern:
Fig. 8.
Confusion matrix.
The two misclassified Maize samples were predicted as Tomato, while no Tomato plants were misclassified. This asymmetry suggests that the feature extractor may slightly favor Tomato features in ambiguous cases, possibly due to target-domain adaptation emphasizing Tomato domain structure.
To assess the effectiveness of domain alignment, we applied t-SNE to the learned 1024-dimensional global feature vectors produced by the model for the validation set. As shown in Fig. 9, the features corresponding to different classes (Maize = 0, Tomato = 1) form distinct clusters in 2D space. Notably, Tomato samples exhibit high intra-class cohesion and are well-separated from Maize, indicating that the model successfully learns discriminative and transferable representations. Some Maize samples are positioned closer to the Tomato cluster, suggesting residual domain-specific variation or feature ambiguity. Nonetheless, the overall structure confirms that adversarial domain adaptation has led to meaningful alignment between the source and target domains in feature space.
Fig. 9.
t-SNE visualization of the learned feature space.
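For reproducibility, the 2D projection in Fig. 9 can be obtained with scikit-learn's TSNE. The snippet below uses random features as a stand-in for our 62 validation samples and their 1024-dimensional global descriptors; only the call pattern, not the data, reflects our pipeline.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# stand-in for 62 validation samples x 1024-D global feature vectors
feats = rng.normal(size=(62, 1024)).astype(np.float32)

# perplexity must be smaller than the sample count; 10 suits 62 points
emb = TSNE(n_components=2, perplexity=10, init="pca",
           random_state=0).fit_transform(feats)
```

Plotting `emb` colored by class label then reveals cluster cohesion and separation as in Fig. 9.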
Training terminates via early stopping once validation accuracy fails to improve within the patience window, preventing overfitting to domain-specific noise.
Discussion
Expanded experiments
To further validate the proposed framework, we identify several avenues for experimentation. An expanded suite of experiments will help uncover the specific contributions of architectural and optimization components and provide robust comparative insights. The following subsections outline key enhancements.
Ablation study
A comprehensive ablation study should be conducted to evaluate the individual contributions of:
Gradient Reversal Layer (GRL): To assess the impact of adversarial training on domain alignment.
Entropy Minimization: To quantify its role in improving target confidence and class separation.
PointNet++ Backbone: To determine the performance gains from modeling local geometric structures compared to vanilla PointNet.
Each ablation configuration should be trained under identical conditions, and metrics such as validation accuracy, F1-score, and embedding visualization (e.g., t-SNE) should be reported for comparison.
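The GRL targeted by the first ablation is conceptually simple: it acts as the identity in the forward pass and negates (and scales by λ) the gradient in the backward pass, so the feature extractor learns to confuse the domain discriminator while the discriminator trains normally. A minimal NumPy sketch of this contract (framework implementations, e.g. a `torch.autograd.Function`, behave the same way):

```python
import numpy as np

class GradientReversal:
    """Identity on the forward pass; multiplies incoming gradients by -lam
    on the backward pass, pushing the feature extractor toward
    domain-invariant features."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features reach the discriminator unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed, scaled gradient flows back

grl = GradientReversal(lam=0.5)
feats = np.array([0.3, -1.2, 2.0])
out = grl.forward(feats)                      # identical to feats
grad_back = grl.backward(np.ones(3))          # each entry becomes -0.5
```

Removing the GRL in the ablation reduces the adversarial branch to an ordinary auxiliary classifier, isolating its contribution to domain alignment.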
Baseline comparisons
In order to contextualize performance, comparisons with additional 3D deep learning architectures and domain adaptation frameworks are essential. Suggested baselines include:
PointMLP: A purely MLP-based network optimized for 3D point classification without relying on spatial encoding.
DGCNN (Dynamic Graph CNN): Captures local geometric features via edge convolutions and graph topology.
PointTransformer: Leverages attention-based mechanisms for modeling long-range dependencies in 3D space.
These architectures provide diverse design perspectives (geometric, relational, attention-based) and can serve as strong competitors to evaluate the robustness and generality of our domain adaptation pipeline.
Statistical significance and variability
To ensure the reliability of reported results, statistical significance should be assessed. Key recommendations include:
Multiple Training Runs: Train each model across 5–10 random seeds to obtain a distribution of performance metrics.
Confidence Intervals: Report 95% confidence intervals for accuracy, F1-score, and domain loss across runs.
Hypothesis Testing: Use statistical tests such as paired t-tests or Wilcoxon signed-rank tests to assess whether improvements over baselines are significant.
Such statistical rigor will bolster the credibility of performance claims and support reproducibility in future work.
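As a concrete sketch of the confidence-interval recommendation, a percentile bootstrap over per-seed accuracies can be implemented in a few lines of NumPy. The accuracy values below are hypothetical placeholders, not measured results.

```python
import numpy as np

def bootstrap_ci(scores, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-seed scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # resample the seeds with replacement and record each resample's mean
    means = rng.choice(scores, size=(n_boot, len(scores)),
                       replace=True).mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# hypothetical target accuracies from 8 independent seeds
acc = [0.940, 0.932, 0.948, 0.936, 0.951, 0.929, 0.944, 0.938]
lo, hi = bootstrap_ci(acc)   # 95% CI bracketing the mean accuracy
```

The same helper applies to F1-score or domain loss; paired tests between methods would additionally compare per-seed differences rather than pooled means.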
Comparison with baseline models
Table 3 provides a direct comparison of our proposed model against several baseline and state-of-the-art domain adaptation methods on the Crops3D → Pheno4D transfer task. As shown, source-only PointNet and its variants with DANN or DeepCORAL perform poorly, with accuracies ranging between 36.9% and 73.9%, highlighting their limited ability to handle strong domain shifts. Similarly, while DGCNN achieves strong results in its source-only form (88.3%), its best performance with DeepCORAL reaches 93.7%. In contrast, our proposed model achieves the highest target accuracy of 94.0% with a macro F1-score of 0.92, surpassing all competitors in both accuracy and balanced per-class performance. Notably, our model delivers perfect recall for Tomato and perfect precision for Maize, demonstrating its ability to achieve both high overall accuracy and class-specific robustness. These results confirm that our adversarial domain adaptation framework not only generalizes better than traditional methods but also outperforms advanced baselines like DGCNN + DeepCORAL, establishing a new benchmark for 3D agricultural domain adaptation.
Table 3.
Comparison of domain adaptation methods on Crops3D → Pheno4D (target domain).
| Method | Best Target Acc | Precision (Maize) | Recall (Maize) | Precision (Tomato) | Recall (Tomato) | Macro F1 |
|---|---|---|---|---|---|---|
| PointNet (Source Only) | 0.369 | 0.37 | 1.00 | 0.00 | 0.00 | 0.27 |
| PointNet + DANN | 0.739 | 0.73 | 0.46 | 0.74 | 0.90 | 0.69 |
| PointNet + DeepCORAL | 0.369 | 0.37 | 1.00 | 0.00 | 0.00 | 0.27 |
| PointNet + Our Method | 0.667 | 0.61 | 0.27 | 0.68 | 0.90 | 0.57 |
| DGCNN (Source Only) | 0.883 | 0.91 | 0.76 | 0.87 | 0.96 | 0.87 |
| DGCNN + DeepCORAL | 0.937 | 0.97 | 0.85 | 0.92 | 0.99 | 0.93 |
| Our Proposed Model | 0.940 | 1.00 | 0.91 | 0.81 | 1.00 | 0.92 |
Limitations
Despite promising results, the current study has several limitations:
Limited Domain Scope: Only two datasets (Crops3D and Pheno4D) were evaluated. Broader generalization across other crops, seasons, and environments remains unexplored.
Model Complexity vs. Efficiency: While effective, PointNet++ and adversarial training introduce computational overhead. Future work should explore lightweight alternatives for real-time deployment.
Domain Discriminator Stability: Periodic drops in validation performance suggest that adversarial training introduces instability. Stabilizing the training dynamics remains an open challenge.
Future directions
Future work will benefit from several promising directions:
Multimodal Adaptation: Integrating RGB, multispectral, or thermal channels for joint 2D-3D adaptation.
Self-supervised Pretraining: Leveraging large-scale unlabeled 3D crop data using contrastive learning before supervised fine-tuning.
Fine-Grained Phenotyping: Extending the framework to support organ-level segmentation or growth rate prediction across time-series 3D data.
Hardware-in-the-Loop Evaluation: Testing on embedded or edge devices (e.g., UAVs or field robots) to assess latency, throughput, and energy consumption.
Universal Domain Adaptation (UniDA): Real-world agricultural deployments often encounter unknown or out-of-scope classes (e.g., unexpected plant types, sensors, or artifacts) that violate the assumption of shared label sets between source and target domains. Universal domain adaptation (UniDA) addresses this by allowing for unknown-class detection and more flexible alignment of shared classes. For instance, Li et al.38 introduce HyUniDA, which jointly learns Shared Semantic Pairing (SSP) and a Domain Similarity Score (DSS) to infer which classes are common across hyperspectral scenes, enabling adaptation without label-set constraints. More recently, Li et al.39 propose a dual-classifier architecture with consistency-based discrimination and cross-domain feature mixup to better separate known from unknown classes and smooth decision boundaries. We plan to integrate such UniDA mechanisms in future extensions of our pipeline, e.g., (i) thresholded entropy or dual-head consistency to reject unknown targets, (ii) cross-domain mixup to augment underrepresented classes, and (iii) evaluation metrics such as OS*, HOS, and rejection AUROC in Crops3D → Pheno4D scenarios with introduced unknown classes.
Future directions: open-set, noise-robust, and online adaptation
Realistic agricultural settings often involve unexpected plant species (unknown classes), labeling noise (e.g., misannotations from crowd-labeling or sensor artifacts), and shifting environments that require rapid, on-the-fly adaptation. Existing strategies from the remote sensing community, such as MAOSDAN for open-set rejection, BAN for noisy-label universal DA, and LSCD-TTA for source-free test-time adaptation, provide promising tools. In future work, we plan to:
Incorporate an open-set classifier head or thresholded entropy to detect and filter unknown samples (inspired by MAOSDAN).
Apply dual-model consensus or bilateral alignment to mitigate mislabel biases in source training (following BAN’s methodology).
Explore test-time adaptation using confidence calibration and lightweight entropy mechanisms to adapt to new acquisition domains or seasonal shifts without retraining (guided by LSCD-TTA).
We will evaluate these extensions under metrics such as unknown-recognized accuracy, HOS, and real-time inference latency to ensure practical viability in fielded agricultural robotics.
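The thresholded-entropy rejection mentioned above can be sketched as follows. The logits and the threshold `tau` are illustrative; in practice `tau` would be calibrated on held-out source data.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def reject_unknown(logits, tau=0.5):
    """Flag samples whose predictive entropy exceeds tau as 'unknown'."""
    p = softmax(logits)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return entropy > tau

# peaked logits -> low entropy -> kept as known classes
confident = np.array([[4.0, -2.0], [-3.0, 5.0]])
# near-uniform logits -> high entropy -> rejected as unknown
ambiguous = np.array([[0.1, 0.0]])
```

Rejected samples would be excluded from the adaptation losses (or routed to an explicit unknown class), which is the behavior the open-set metrics such as HOS then score.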
Insights
The high initial accuracy demonstrates that the feature extractor and classifier are capable of capturing strong class-specific signals from Crops3D.
The periodic drops in validation accuracy and F1 indicate the expected trade-off during adversarial adaptation: achieving domain invariance sometimes reduces source-domain separability.
The final classification metrics and confusion matrix confirm robust adaptation, with minimal misclassifications.
The entropy loss encourages confident predictions on unlabeled target samples, improving generalization despite noisy domain gaps.
Overall, the results demonstrate that our adversarial domain adaptation pipeline successfully transfers knowledge from Crops3D to Pheno4D with minimal source performance degradation.
Conclusion
In this study, we proposed and evaluated an adversarial unsupervised domain adaptation framework for 3D point cloud classification in agricultural settings. Utilizing a PointNet-based feature extractor, a domain discriminator trained with a gradient reversal layer, and entropy minimization, the method successfully addressed domain shift between the Crops3D and Pheno4D datasets. The results demonstrate that our model maintains high classification accuracy across domains, achieving a 97% overall accuracy on the target domain with strong per-class F1 scores. Despite periodic instability during training (a common trait in adversarial adaptation), the final model exhibited effective domain alignment while preserving class discriminability. The entropy loss further enhanced generalization by promoting confident predictions on unlabeled target samples. This work validates the potential of unsupervised 3D domain adaptation in agriculture and sets a foundation for robust, transferable models capable of handling real-world variability across diverse crop monitoring scenarios.
Author contributions
Z.F. implemented the proposed methodology, carried out the experiments, and prepared the figures and data visualizations. S.Z. contributed to the study design, supervised the research activities, and played a key role in refining the manuscript. M.H.T. co-supervised the project, provided conceptual guidance, and contributed to the writing and technical validation of the manuscript. All authors reviewed, edited, and approved the final version of the manuscript.
Data availability
The datasets generated and/or analysed during the current study are available in ref. 35 (online link: https://springernature.figshare.com/ndownloader/files/50027964) and ref. 36 (online link: https://www.ipb.uni-bonn.de/data/pheno4d/index.html).
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Zainab Fatima, Email: zfatima@kennesaw.edu.
Muhammad Hassan Tanveer, Email: mtanveer@kennesaw.edu.
References
- 1. Zhang, N., Wang, M. & Wang, N. Precision agriculture: A worldwide overview. Comput. Electron. Agric. 36(2–3), 113–132 (2002).
- 2. Gebbers, R. & Adamchuk, V. I. Precision agriculture and food security. Science 327(5967), 828–831 (2010).
- 3. Walter, A., Liebisch, F. & Hund, A. High-throughput field phenotyping: The new crop breeding frontier. Trends Plant Sci. 24(10), 880–892 (2019).
- 4. Sankaran, S. et al. Low-altitude, high-resolution aerial imaging systems for row and field crop phenotyping. Trans. ASABE 58(3), 521–530 (2015).
- 5. Tuia, D., Volpi, M., Copa, L., Kanevski, M. & Munoz-Mari, J. Domain adaptation in remote sensing: A review. IEEE Geosci. Remote Sens. Mag. 4(2), 41–57 (2016).
- 6. Kamilaris, A. & Prenafeta-Boldú, F. X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 147, 70–90 (2018).
- 7. Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010).
- 8. Csurka, G. Domain adaptation for visual applications: A comprehensive survey. arXiv preprint arXiv:1702.05374 (2017).
- 9. Sa, I., Popovic, M., Khanna, R., Liebisch, F., Nieto, J., Stoyanov, D., & Siegwart, R. WeedNet: Dense semantic weed classification using multispectral images and MAV for smart farming. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 517–522 (2018).
- 10. Rußwurm, M. & Körner, M. Self-attention for raw optical satellite time series classification. ISPRS J. Photogramm. Remote Sens. 168, 89–100 (2020).
- 11. Ubbens, J., Cieslak, M., Prusinkiewicz, P. & Stavness, I. Latent space phenotyping: Automatic image-based phenotyping for treatment studies. Plant Methods 16(1), 1–19 (2020).
- 12. Jiang, Y., Li, C., Paterson, A. H. & Zhang, J. 3D point cloud data for plant phenotyping: A review of acquisition, processing, and analysis. Comput. Electron. Agric. 180, 105884 (2020).
- 13. Andújar, D., Escolà, A. & Rosell-Polo, J. R. Ground-based 3D imaging systems for agriculture: A review. Precis. Agric. 17(6), 111–137 (2016).
- 14. Qi, C. R., Su, H., Mo, K., & Guibas, L. J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 652–660 (2017).
- 15. Zhu, Y. et al. Domain adaptation for 3D semantic segmentation: A survey. IEEE Trans. Pattern Anal. Mach. Intell. (2022).
- 16. Graham, B., Engelcke, M., & Van Der Maaten, L. Submanifold sparse convolutional networks. arXiv preprint arXiv:1706.01307 (2018).
- 17. Ganin, Y. et al. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17(1), 2096–2130 (2016).
- 18. He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 9729–9738 (2020).
- 19. Caron, M. et al. Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882 (2020).
- 20. Wang, L. et al. Self-training with pseudo-labels for cross-domain adaptation in remote sensing imagery. Remote Sens. 14(24), 6298, 10.3390/rs14246298 (2022).
- 21. Zhou, H. et al. Cross-domain self-training for agricultural applications in remote sensing. Int. J. Remote Sens. 46(5), 2450564, 10.1080/01431161.2025.2450564 (2025).
- 22. Zheng, J. et al. Partial domain adaptation for scene classification from remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 61, 1–17, 10.1109/TGRS.2022.3229039 (2023).
- 23. Ma, Y., Yang, Z., Huang, Q. & Zhang, Z. Improving the transferability of deep learning models for crop yield prediction: A partial domain adaptation approach. Remote Sens. 15(18), 4562, 10.3390/rs15184562 (2023).
- 24. Zheng, J. et al. Open-set domain adaptation for scene classification using multi-adversarial learning. ISPRS J. Photogramm. Remote Sens. 208, 245–260, 10.1016/j.isprsjprs.2024.01.015 (2024).
- 25. Liang, Y., Zhang, X., Zheng, J., Huang, J. & Fu, H. Low saturation confidence distribution-based test-time adaptation for cross-domain remote sensing image classification. Int. J. Appl. Earth Obs. Geoinf., 10.1016/j.jag.2024.102035 (2024).
- 26. Li, Y. et al. PointCNN: Convolution on X-transformed points. In Advances in Neural Information Processing Systems vol. 31 (Curran Associates, Inc., 2018).
- 27. Luo, Y., Xu, M., Liu, H., & Zhang, L. Adversarial domain adaptation for cross-species 3D plant segmentation using LiDAR data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 1234–1243 (2021).
- 28. Zhao, H., Wang, Y., Tang, J. & Li, X. Contrastive domain adaptation for 3D point cloud canopy estimation in agriculture. IEEE Trans. Geosci. Remote Sens. 60, 1–13 (2022).
- 29. Cheng, R., He, Z., Sun, Y., & Liu, W. Voxel-wise feature alignment for cross-domain crop phenotyping in 3D point clouds. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 4210–4219 (2023).
- 30. Grill, J. B. et al. Bootstrap your own latent: A new approach to self-supervised learning. NeurIPS 33, 21271–21284 (2020).
- 31. Cao, Z., Long, M., Wang, J. & Jordan, M. I. Partial transfer learning with selective adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2724–2732 (2018).
- 32. Wang, S., Huang, Y. & Xu, T. Plant disease classification using partial domain adaptation and class-weighted loss. Comput. Electron. Agric. 187, 106278 (2021).
- 33. Chen, W., Wen, Y., Zheng, J., Huang, J. & Fu, H. BAN: A universal paradigm for cross-scene classification under noisy annotations from RGB and hyperspectral remote sensing images. IEEE Trans. Geosci. Remote Sens. 63, 1–13, 10.1109/TGRS.2025.xxxxxx (2025).
- 34. Xie, Y., Liu, J., & Zhang, Y. Multi-modal 3D-to-2D domain fusion for weed detection. In International Conference on Robotics and Automation (ICRA) (2023).
- 35. Vasilescu, I., Sivasankaran, S., Neumann, K., & Timofte, R. Crops3D: A 3D plant phenotyping dataset for instance segmentation and multi-view reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 21606–21615 (IEEE, 2022).
- 36. Schunck, B., Schneider, L., Wegner, J. D., Stachniss, C., & Buhmann, J. Pheno4D: A spatio-temporal dataset of 3D plant point clouds for phenotyping. In European Conference on Computer Vision (ECCV) 97–113 (Springer, 2020).
- 37. Qi, C. R., Yi, L., Su, H., & Guibas, L. J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems (NeurIPS) vol. 30, 5105–5114 (2017). https://proceedings.neurips.cc/paper_files/paper/2017/file/d8bf84be3800d12f74d8b05e9b8989ba-Paper.pdf
- 38. Li, Q., Wen, Y., Zheng, J., Zhang, Y. & Fu, H. HyUniDA: Breaking label set constraints for universal domain adaptation in cross-scene hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 62, 1–15, 10.1109/TGRS.2024.3400959 (2024).
- 39. Li, Q., Zhang, Y., Zheng, J. & Fu, H. Boosting universal domain adaptation in remote sensing with dual-classifiers consistency discrimination and cross-domain feature mixup. IEEE Trans. Geosci. Remote Sens. 63, 1–13, 10.1109/TGRS.2025.3571747 (2025).