PLoS One. 2025 Jan 6;20(1):e0314086. doi: 10.1371/journal.pone.0314086

LBNP: Learning features between neighboring points for point cloud classification

Lei Wang 1,2, Ming Huang 2,*, Zhenqing Yang 3, Rui Wu 1,2, Dashi Qiu 2, Xingxing Xiao 1,2, Dong Li 1,2, Cai Chen 1,2
Editor: Ayesha Maqbool
PMCID: PMC11703072  PMID: 39761310

Abstract

In classical works on constructing local relationships in point clouds, the local structure is typically described geometrically in terms of a central point and its neighboring points. However, this basic geometric representation of the central point and its neighborhood is insufficient. Drawing inspiration from local binary pattern algorithms used in image processing, we propose a novel method for representing point cloud neighborhoods, which we call the Point Cloud Local Auxiliary Block (PLAB). This module explores useful neighborhood features by learning the relationships between neighboring points, thereby enhancing the learning capability of the model. In addition, we propose a pure Transformer structure that takes into account both local and global features, called the Dual Attention Layer (DAL), which enables the network to learn valuable global features as well as local features in the aggregated feature space. Experimental results show that our method performs well on both coarse- and fine-grained point cloud datasets. We will publish the code and all experimental training logs on GitHub.

Introduction

Point cloud classification is one of the core tasks in the field of computer vision and plays a vital role in various applications, such as urban information modeling, 3D map creation, and autonomous driving.

For irregular point cloud data, PointNet [1] pioneered point-based deep neural network technology using a point-wise Multi-Layer Perceptron (MLP) and a symmetric function. Subsequently, PointNet++ [2] introduced local Set Abstraction to realize local information aggregation. The development of these techniques has inspired subsequent research, and many excellent studies have developed novel and complex local aggregation modules to enhance the local information representation ability of point clouds. However, these methods mostly establish the relationship between neighborhood points and the center point to form an implicit or explicit local geometric description, thereby enhancing the aggregation and representation capabilities of the local point cloud. We believe that strengthening the correlation between neighborhood points, rather than only the correlation between neighborhood points and the center point, is more helpful for local feature extraction from point clouds and will benefit downstream tasks.

In this paper, we propose a plug-and-play Local Auxiliary Block called the Point Cloud Local Auxiliary Block (PLAB), which focuses on enhancing the correlation between neighboring points to improve the local feature representation capabilities of point clouds. We draw inspiration from the extraction of image texture features, particularly the Local Binary Pattern (LBP) [3], which emphasizes subtle structural differences between regions of an image and is mainly used to describe the surface properties of an image or image region, such as the thickness and density of the texture, thereby aiding image discrimination. Inspired by this, we seek to use neural networks to identify potential "texture features" in point clouds to improve the expression of local features. Mimicking LBP, we propose a local feature aggregation module that uses relationship mapping between neighboring points relative to the central point to obtain sufficient neighborhood feature information through feature aggregation instead of an explicit encoding process. To improve the sensitivity to neighborhood points, we transform the Cartesian coordinate system into a spherical coordinate system as a supplementary feature, and we use the spherical coordinates to sort the adjacent points, clarifying the point–pair relationships. To ensure order independence, we employ a symmetric function to aggregate the information of neighboring points.

Furthermore, we note that the over-subdivision of the PLAB structure may introduce additional noise. To address this problem, we propose a novel local and global fusion pure Transformer layer called the Dual-Attention Layer (DAL) as our backbone network. It is well known that both local and global features play important roles in point cloud tasks. Global features capture the shape and structural information of the point cloud, while local features provide set and detail information. Recently, various models have emerged for point cloud classification. The Transformer has received extensive attention owing to its strong long-range dependence modeling ability, and many outstanding works have built on it. For example, the local attention mechanism of Point Transformer [4] and the global attention mechanism of Point Cloud Transformer [5] have inspired related research. Other 3D Transformer variants have also been successfully explored in terms of speed, accuracy, and efficiency; however, few pure Transformer models are based on both local and global information. In our Transformer module, offset-attention is used as the global Transformer module, and the calculated global attention score is introduced into the local attention module through the indices of high-attention positions, forming the DAL.

We propose a complementary module for local neighborhoods and a pure Transformer layer model designed for local and global modeling in point cloud classification tasks.

Our main contributions are as follows:

·A novel local aggregation module, PLAB, is proposed, which can improve all types of point cloud networks with local aggregation properties to different degrees.

·The PLAB module is plug-and-play. Without changing the original network structure, it can be inserted directly into the model to improve its effectiveness.

·A new pure Transformer structure that integrates global and local features, DAL, is characterized by low parameter count and superior performance, showing promising results on classification datasets.

Related work

Deep learning for point clouds

Owing to the disordered and unstructured nature of point clouds, early deep learning methods were usually based on multi-view or voxel representations. Multi-view methods [6-8] usually use multiple viewpoints of the point cloud to capture richer information and improve task performance. However, this projection approach cannot adapt to surface density changes and is affected by occlusion. Voxel methods [9-13] map 3D space to a regular voxel grid, so 3D convolution can be applied naturally, but such methods are prone to losing fine-grained geometric features, and the voxelization process results in heavy memory and computational consumption. PointNet pioneered a point-based approach that uses a shared MLP to encode each individual point and global pooling to aggregate the encoded point features. PointNet++ is an extension of PointNet that uses multi-scale local PointNets to enhance the geometric representation. Numerous other studies have built upon these foundational works.

Point cloud neighborhood feature aggregation

Generally, local aggregation operators can be divided into two categories: convolution- and graph-based methods. Among the convolution-based methods, PointCNN [14] uses an X-transformation to learn the weights and order of neighborhood points so that the point cloud can support convolution operations. KPConv [15] associates the weight matrix with predefined kernel points in 3D space to adapt to the uneven distribution of local points. PAConv [16] constructs a position-adaptive convolution operator with a dynamic kernel, which combines weight matrices from a weight bank and utilizes MLPs to learn weight coefficients based on relative point positions. Among graph-based methods, DGCNN [17], as a pioneering method, aggregates neighboring points in the feature space on a graph that is dynamically updated at each layer. CurveNet [18] connects segments of points into curves in the point cloud, expanding the receptive field and exploring useful geometric information. Most of these methods adhere to the basic paradigm of PointNet++, wherein the relationships between adjacent points and center points are constructed from absolute or relative coordinates, and other simple or complex operations are then used to obtain more explicit local geometric features.

In previous approaches, a widely used encoding strategy is to construct the coordinates of neighboring points relative to the center point, which benefits the model in terms of position invariance and preservation of local structure [14-17]. However, we believe that such an encoding approach ignores the associations between neighboring points, because the local geometry exists not only in the relative relationship between neighboring points and the centroid, but as a whole in all pairs of points. To enhance the associations between point pairs, we strengthen the representation of geometric information in the neighborhood by exploring the relationships between neighborhood points when aggregating features, hence the proposed PLAB module.

Point cloud transformer

Transformers have made significant progress in the field of computer vision, and a series of explorations have been conducted in the 3D point cloud field. PCT is a classical global Transformer structure that first employs neighborhood embedding to aggregate local information and then feeds this information into four stacked global Transformer blocks. Finally, global max pooling and average pooling are applied to obtain the global classification information. 3CROSSNet [19] utilizes cross-level, cross-scale, and cross-attention strategies to capture point cloud features. The network first obtains point cloud subset features at different resolutions through FPS and MLPs and then stacks and fuses Transformer blocks to capture complex structures and details in the point cloud. Referring to the Canny edge detection operator, APES [20] explored a point cloud downsampling method combined with an attention mechanism to capture salient points on the point cloud contour. Specifically, this method calculates the correlation between points and selects the points with high correlation, namely, the salient points on the contour, for sampling, so that the network can learn and extract edge features from the data more effectively. The idea of the Stratified Transformer [21] is similar to that of the Swin Transformer [22], which divides the point cloud into different windows to reduce the computation of the Transformer; to achieve information interaction between different windows and enlarge the receptive field, a window-shifting operation and a stratified key-sampling strategy are adopted. PT [4] is another classic local Transformer structure composed of five layers of continuously downsampled local Transformer blocks. For each block, it uses KNN to obtain nearby points, and for each neighborhood point set, it uses a vector attention mechanism to capture local features. PTv2 [23] introduced group vector attention, position encoding multipliers, and partition-based pooling to enhance local geometric capture while remaining sensitive to global context. The subsequent PTv3 [24] introduced serialized encoding to achieve faster and better results. PointConT [25] uses the locality of points in the feature space to cluster sampling points with similar features into the same class and calculates self-attention for the points in each class. In the feature aggregation of PointConT, the feature vector is divided into low- and high-frequency branches through max and average pooling, respectively, which are used to extract local information more fully. In the above studies, the global attention mechanism fails to provide fine local geometric capture [5, 19, 20], the local attention mechanism loses long-range dependency [4], and hybrid approaches incorporate other complex modules or structures [21, 23-26]. However, it is easily overlooked that the global attention matrix is itself a neighborhood representation of points in the feature space; our proposed DAL uses the neighborhood features indexed by the attention map to preserve the long-range dependence of the Transformer while accomplishing local aggregation.

Methodology

Overview

Fig 1 shows our network architecture, in which the PLAB structure is used in the Embedding layer to assist local feature extraction, and the main framework is constructed by stacking the DAL, which is composed of Offset-Attention and External-Attention [27]. Specifically, in the PLAB layer, point cloud coordinates are used as the main features, and the initial features are projected into a higher-dimensional space using a local MLP and a symmetric function. PLAB is used as an auxiliary feature, extended to the same dimension as the main feature by an MLP and the symmetric function, and then fused with it. The k-nearest-neighbor point clusters are organized and sorted according to their spherical coordinates to establish the point–pair relationships among the neighborhood points and compute the relative coordinates. The transformed spherical coordinates are added to the neighborhood point pairs as supplementary features. We want the features of the neighborhood pairs to be selective; therefore, we employ Attentive Pooling [28] as the aggregation unit, which assigns different weights to the feature channels and finally aggregates them with a summation function.

Fig 1. Model structure.


The encoder part consists of a layer of PLAB to expand the dimension, four layers of DAL with MLP to mine features, MA Pooling to connect Max Pool and Average Pool, and LBRD consisting of Linear, BatchNorm, ReLU, and Dropout.

The global module of DAL uses the dot product of the query and key vectors followed by softmax to compute the attention score map, which serves as a feature selector. It has been demonstrated that most pixels are closely related to only a few other pixels [27], and the same applies to point clouds. Therefore, the attention scores formed by the dot product can select the Top-K points most relevant to the current point, which are sufficient to describe its feature information. The selected Top-K feature vectors are then sent to the local Transformer module for local attention calculation. However, the quadratic complexity of dot-product attention incurs a large computational overhead; if the local attention calculation also used dot-product attention, the DAL would become prohibitively expensive. We therefore use the lightweight External-Attention mechanism for the local attention calculation, which not only has the advantage of linear complexity but also allows its memory unit to attend to potential relationships between different local features.
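As an illustration of this selection step, the following PyTorch fragment (our own hypothetical helper, not the authors' released code) gathers the Top-K most relevant value vectors for each point from a dot-product attention map:

```python
import torch

def select_topk_neighbors(attn, v, k=20):
    """attn: (B, N, N) attention score map; v: (B, N, d) value features.
    For each point, gather the k value vectors with the highest scores."""
    idx = attn.topk(k, dim=-1).indices                       # (B, N, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, -1, v.size(-1))   # (B, N, k, d)
    v_all = v.unsqueeze(1).expand(-1, attn.size(1), -1, -1)  # (B, N, N, d)
    return torch.gather(v_all, 2, idx)                       # (B, N, k, d)
```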

Point cloud local auxiliary block

Our PLAB module is based on the texture feature extraction algorithm LBP, an operator designed to describe the local texture features of an image that can be employed for image feature analysis. Here, we briefly introduce the classical LBP operator. The classical LBP operator is defined over a 3×3 square window: the center pixel of the window is taken as the threshold, and the gray values of its eight neighboring pixels are compared with it. If a neighboring pixel value is less than that of the center point, its value is set to 0; otherwise, it is set to 1. In this manner, the eight pixels in the neighborhood of a 3×3 window are compared with the center pixel, producing an 8-bit binary number, which yields 256 possible LBP codes. The LBP codes calculated in this way reflect the regional texture information of the window.
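For reference, a minimal NumPy sketch of the classical 3×3 LBP code described above (function and variable names are ours, purely illustrative):

```python
import numpy as np

def lbp_3x3(img):
    """Classical 3x3 LBP: compare the 8 neighbors of each pixel with the
    center pixel and pack the comparison bits into an 8-bit code (0-255)."""
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # neighbor offsets, enumerated clockwise from the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((neighbor >= center).astype(np.uint8) << bit)
    return codes
```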

We note that the size comparison of neighborhood points relative to the center point within the local window can be interpreted as a way to establish the relationship between the neighborhood and the center point, such as their relative position in the point cloud, whereas the binary encoding process can be considered a way to establish the relationship between neighborhood points. As shown in Fig 2, we describe this process as a simple deep learning task for point clouds:

Fig 2. PLAB structure, which concatenates the two coordinate representations into the local module after coordinate transformation, subtracts neighboring coordinates, and uses attentive pooling to weight and aggregate features.


  1. obtain the nearest neighbors of the point cloud;

  2. sort the neighboring points to establish the relationship between the neighboring point pairs;

  3. learn feature information about the neighboring points.

In local feature sorting and encoding, for the $k$ nearest points of each central point, $P_i = \{P_i^1, \ldots, P_i^k\}$, the Cartesian coordinate system is converted into a spherical coordinate system, and the adjacent points are sorted according to their polar angles to obtain the sorted adjacent points $\hat{P}_i = \{\hat{P}_i^1, \ldots, \hat{P}_i^k\}$. The features of the adjacent point pairs are calculated using the sorted adjacent points, and the dimension is extended. This is defined as follows:

$F_i^k = \mathrm{MLP}\left(\{\hat{F}_{ci}^{k};\,\hat{F}_{si}^{k}\}\right), \quad \hat{F}_{ci}^{k} = F_{ci}^{k} - F_{ci}^{k-1}, \quad \hat{F}_{si}^{k} = F_{si}^{k} - F_{si}^{k-1}$ (1)

where $F_{ci}$ and $F_{si}$ represent the Cartesian-coordinate features of $P_i$ and the spherical-coordinate features of $\hat{P}_i$ for the sorted neighboring points, respectively. Adopting these two coordinate representations mitigates the feature ambiguity caused by neighboring points that lie too close together [29].

The attention score is then calculated. After obtaining the features of the local point pairs $F_i = \{F_i^1, \ldots, F_i^k\}$, a set of shared functions $g(\cdot)$ is used to obtain the weighted attention scores of the features for each feature channel. Finally, a sum function is used to aggregate the adjacent points. This is defined as follows:

$F_i = \sum_{k=1}^{K}\left(F_i^{k} \cdot s_i^{k}\right), \quad s_i^{k} = g\left(F_i^{k}, W\right)$ (2)

where $W$ is the learnable weight of the shared MLP, and $g(\cdot)$ combines the shared MLP with softmax. Aggregating the neighboring points in this way not only provides order independence but also lets the local features of neighboring points act jointly, similar to the encoding process of LBP.
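The two steps above can be summarized in a condensed PyTorch sketch; the module name, MLP width, and the cyclic pairing of consecutive sorted neighbors are our assumptions for illustration, not the authors' released implementation:

```python
import torch
import torch.nn as nn

def to_spherical(xyz):
    """Cartesian (x, y, z) -> spherical (r, theta, phi)."""
    r = xyz.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    theta = torch.acos((xyz[..., 2:3] / r).clamp(-1.0, 1.0))  # polar angle
    phi = torch.atan2(xyz[..., 1:2], xyz[..., 0:1])           # azimuth
    return torch.cat([r, theta, phi], dim=-1)

class PLAB(nn.Module):
    def __init__(self, out_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(6, out_dim), nn.ReLU())
        self.score = nn.Sequential(nn.Linear(out_dim, out_dim), nn.Softmax(dim=-2))

    def forward(self, knn_xyz):
        """knn_xyz: (B, N, K, 3) neighbor coordinates relative to each center point."""
        sph = to_spherical(knn_xyz)
        order = sph[..., 1].argsort(dim=-1)                    # sort neighbors by polar angle
        order = order.unsqueeze(-1).expand(-1, -1, -1, 3)
        xyz_s = torch.gather(knn_xyz, 2, order)
        sph_s = torch.gather(sph, 2, order)
        # Eq (1): differences between consecutive sorted neighbors (cyclic pairing),
        # in Cartesian and spherical coordinates, fed through a shared MLP
        pair = torch.cat([xyz_s - xyz_s.roll(1, dims=2),
                          sph_s - sph_s.roll(1, dims=2)], dim=-1)
        feat = self.mlp(pair)                                  # (B, N, K, out_dim)
        # Eq (2): per-channel attentive pooling scores, then sum over the K neighbors
        return (feat * self.score(feat)).sum(dim=2)            # (B, N, out_dim)
```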

Dual attention layer

As shown in Fig 3, the DAL contains two types of attention mechanisms: Offset-Attention and External-Attention. Although both attention mechanisms are based on existing models, they are combined so that the layer attends to global and local relations simultaneously. Inspired by DGCNN, the feature vector for local aggregation is usually formed by computing pairwise distances in the feature space and selecting the k nearest neighbors. The main role of self-attention is to determine the relevance of each vector to the other vectors, with W placing an importance weight on each element. Instead of using pairwise feature-space distances, we use the dot-product score matrix to select the feature vectors corresponding to the Top-K scores, thereby forming the feature vector for local aggregation. The local Transformer module employs External-Attention. In addition, its memory unit implicitly considers the relationships between different feature maps; when the input is local information, the memory unit considers the relationships among all local features.

Fig 3. DAL structure, which consists of offset-attention and external-attention.


N points with d-dimensional features are input from the PLAB layer. The query, key, and value matrices Q, K, and V are obtained from $F_s \in \mathbb{R}^{N \times d}$ by linear transformations:

$(Q, K, V) = F_s \cdot (W_q, W_k, W_v), \quad Q, K \in \mathbb{R}^{N \times d/4}, \; V \in \mathbb{R}^{N \times d}$ (3)

where $W_q$, $W_k$, and $W_v$ are shared learnable linear transformations, and the dimensions of the query and key are reduced to $d/4$ for efficient computation.

We compute the attention weights by matrix dot product using query and key:

$\tilde{A} = (\tilde{a}_{i,j}) = Q\,K^{T}$ (4)

where $\tilde{A}$ is the attention score matrix obtained from the dot product of Q and K; that is, a weight distribution $\tilde{A}$ is obtained by computing the similarity of all query vectors Q with all key vectors K.

We normalize the weights and assign them to the score matrix A˜ as follows:

$\bar{a}_{i,j} = \mathrm{softmax}(\tilde{a}_{i,j}) = \dfrac{\exp(\tilde{a}_{i,j})}{\sum_{k}\exp(\tilde{a}_{k,j}) + C}, \quad A_g = (a_{i,j}) = \dfrac{\bar{a}_{i,j}}{\sum_{k}\bar{a}_{i,k}}$ (5)

where softmax amplifies the differences between values through the exponential operation, so that larger attention scores receive higher weights in the probability distribution while smaller attention scores are suppressed; an l1-norm is then applied for further normalization to reduce the effect of noise.

Next, we select the Top-K attention scores from matrix Ag to obtain Vk, which corresponds to the selected V values. Vk is then used as input for the local module:

$\tilde{V}_k = V - V_k$ (6)

where $V_k$ contains the nearest-neighbor features in the feature space obtained by indexing the score matrix, and the local features $\tilde{V}_k$ are obtained by establishing this relative relationship.

$A_l = \mathrm{Norm}\left(\tilde{V}_k\,M_k^{T}\right)$ (7)
$\bar{V}_k = A_l\,M_v$ (8)

where $M_k$ and $M_v$ are learnable linear layers. Since $\tilde{V}_k$ corresponds to a neighborhood in the feature space and every feature vector affects $M_k$ and $M_v$, a memory unit is formed that takes all local features into account. After computing $A_l$, the normalization of Eq (5) is applied again. Finally, $\bar{V}_k$ is the output of the local External-Attention. We obtain the local attention feature matrix, aggregate the features, and compute the global attention as follows:

$\bar{V} = \max\left(\bar{V}_k\right)$ (9)

where the local features $\bar{V}_k$ are aggregated into the feature $\bar{V}$ by max pooling.

$F_{sa} = A_g\,\bar{V}$ (10)

where the locally aggregated features are weighted by the attention score matrix $A_g$ to obtain the global self-attention $F_{sa}$.

The final output is obtained through the bias module:

$F_{out} = \mathrm{BR}\left(\mathrm{LBR}\left(F_s - F_{sa}\right) + F_s\right)$ (11)

where LBR denotes Linear, BatchNorm, and ReLU. This part enhances the model's feature learning.
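To make the data flow of Eqs (3)-(11) easier to follow, the following is a condensed, non-optimized PyTorch sketch of one DAL; the layer names, dimensions, and exact placement of the normalization are our assumptions rather than the released code:

```python
import torch
import torch.nn as nn

class DAL(nn.Module):
    def __init__(self, d=128, k=20):
        super().__init__()
        self.k = k
        self.wq = nn.Linear(d, d // 4, bias=False)   # Eq (3): reduced query dimension
        self.wk = nn.Linear(d, d // 4, bias=False)
        self.wv = nn.Linear(d, d, bias=False)
        self.mk = nn.Linear(d, d, bias=False)        # External-Attention memory units
        self.mv = nn.Linear(d, d, bias=False)
        self.lbr = nn.Sequential(nn.Linear(d, d), nn.BatchNorm1d(d), nn.ReLU())
        self.br = nn.Sequential(nn.BatchNorm1d(d), nn.ReLU())

    @staticmethod
    def _normalize(a):
        # Eq (5): softmax over one axis followed by l1 normalization over the other
        a = torch.softmax(a, dim=-2)
        return a / (a.sum(dim=-1, keepdim=True) + 1e-9)

    def forward(self, fs):                            # fs: (B, N, d)
        q, k, v = self.wq(fs), self.wk(fs), self.wv(fs)
        ag = self._normalize(q @ k.transpose(1, 2))   # Eqs (4)-(5): (B, N, N)
        idx = ag.topk(self.k, dim=-1).indices         # Top-K neighbors in feature space
        idx = idx.unsqueeze(-1).expand(-1, -1, -1, v.size(-1))
        vk = torch.gather(v.unsqueeze(1).expand(-1, v.size(1), -1, -1), 2, idx)
        vk = v.unsqueeze(2) - vk                      # Eq (6): relative local features
        al = self._normalize(self.mk(vk))             # Eq (7): attention over memory keys
        vk = self.mv(al)                              # Eq (8)
        v_bar = vk.max(dim=2).values                  # Eq (9): aggregate the K neighbors
        fsa = ag @ v_bar                              # Eq (10): global self-attention
        delta = (fs - fsa).reshape(-1, fs.size(-1))   # Eq (11): offset, then LBR + residual
        out = self.br(self.lbr(delta) + fs.reshape(-1, fs.size(-1)))
        return out.view_as(fs)
```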

Experiments

After describing the methodology, we test the accuracy of the proposed method on the ModelNet40 and FG3D datasets for coarse- and fine-grained classification tasks, respectively, and compare it with a range of classical and state-of-the-art models to demonstrate the advantages of our method. In addition, we insert the PLAB module into other models to illustrate its effectiveness. We also provide implementation details, including the hardware configuration and hyperparameter settings.

Experimental details

The classification network was implemented in PyTorch and trained on an NVIDIA GeForce RTX 4060 Ti GPU. We used the SGD optimizer with a momentum of 0.9, a weight decay of 0.0001, an initial learning rate of 0.01, and a cosine annealing schedule to adjust the learning rate at each epoch. The classification network was trained for 250 epochs with a batch size of 24.
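These optimizer settings correspond roughly to the following PyTorch configuration (a sketch; `model`, `train_loader`, and `criterion` are placeholders, not names from the paper):

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=250)

for epoch in range(250):
    for points, labels in train_loader:      # batch size 24, 1024 points per cloud
        optimizer.zero_grad()
        loss = criterion(model(points), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```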

Classification on ModelNet40 dataset

The ModelNet40 dataset contains 12311 CAD models in 40 classes, split into 9843 training samples and 2468 test samples. As in most networks, we downsampled each input point cloud to 1024 points and used only the 3D coordinates as input. Overall accuracy (OA) across all classes was used for accuracy evaluation, and the total number of parameters and floating-point operations (FLOPs) were used for model-size evaluation.
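As an illustration, a common way to prepare such input (a hedged sketch of typical preprocessing, not necessarily the authors' exact pipeline) is to randomly subsample each cloud to 1024 points and normalize it to the unit sphere:

```python
import numpy as np

def prepare_cloud(points, n=1024):
    """points: (M, 3) array of xyz coordinates; returns an (n, 3) normalized sample."""
    idx = np.random.choice(points.shape[0], n, replace=points.shape[0] < n)
    pts = points[idx]
    pts = pts - pts.mean(axis=0)                      # center at the origin
    pts = pts / np.max(np.linalg.norm(pts, axis=1))   # scale into the unit sphere
    return pts.astype(np.float32)
```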

Table 1 lists the results for the various networks reproduced on our device. Our Transformer model and 3DGTN achieved the best results among all compared networks; however, our network uses only 3D coordinates, and its parameter count and FLOPs are relatively low.

Table 1. Classification result on the ModelNet40 dataset.

Methods Input OA (%) Parameters FLOPs
Other Learning-based Methods
SO-Net [30] 2048×3 90.9 - -
Point2Sequence [31] 2048×3 92.6 - -
PointCNN [14] 1024×3 91.7 - -
PointNet [1] 1024×3 89.2 3.47 M 0.45 G
PointNet++(SSG) [2] 1024×3 92.4 1.48 M 0.87 G
PointNet++(MSG) [2] 1024×3 92.7 1.75 M 4.07 G
DGCNN [17] 1024×3 92.6 1.82 M 2.43 G
DGCNN+Pnp-3D [32] 1024×3 92.5 1.93 M 3.57 G
PointMLP [33] 1024×3 92.8 13.23 M 15.73 G
DualMLP [34] 1024×3 93.1 14.32 M -
G-PointNet++ [35] 1024×3 92.7 - -
Transformer-based Methods
GBNet [36] 1024×3 92.7 8.79 M 9.86 G
Point Transformer [37] 1024×3 91.1 9.85 M 18.40 G
PCT [5] 1024×3 92.4 2.88 M 2.32 G
3DGTN [38] 1024×3,N 93.3 5.12 M 3.09 G
DCNet [39] 1024×3 92.4 2.21 M 7.80 G
PointConT [40] 1024×3 92.9 - -
Ours 1024×3 93.3 2.43 M 5.60 G

† represents results reproduced from open-source code on an NVIDIA GeForce RTX 4060 Ti GPU. N represents the normal vector.

To verify the effectiveness of PLAB, it was incorporated into several classical models. To make the experiment more comprehensive, we fixed the random seed and ensured that other hyperparameters and the model network structure remained unchanged.

Table 2 shows that our method improved point-, graph-, and Transformer-based networks, and Fig 4 illustrates why PLAB improves feature recognition. In PointNet++ and PCT, PLAB was inserted into the downsampling stage of the network, and the neighborhood size was set to that of the original network. In DGCNN, because EdgeConv has a non-local aggregation property, we inserted PLAB only after the first EdgeConv and left the other layers unchanged. Owing to the large scale of the MSG network, we set its batch size to 16, which caused MSG to perform below SSG. Additionally, fixing the random seed reduced the overall accuracy of each network by approximately 0.3 percentage points.

Table 2. PLAB validity verification.

Based Methods OA (%) +PLAB (%)
MLP PointNet++(SSG) [2] 92.3 92.6(0.3↑)
PointNet++(MSG) [2] 92.2 92.5(0.3↑)
Transformer PCT [5] 92.1 92.7(0.6↑)
Graph DGCNN [17] 92.4 92.7(0.3↑)

† represents open-source replicated network experiments and

‡ represents a fixed random seed of 1207.

Fig 4. Visualization of the normalized PLAB-encoded values.


It can be seen that different coding values are generated in the edge parts and different surfaces, which is key to assisting the classification task.

Classification on FG3D dataset

The FG3D dataset is a fine-grained point cloud dataset that contains three major categories: aircraft, car, and chair, with 66 fine-grained subcategories and 25552 3D models. We used the same parameters as ModelNet40 for testing this dataset. The experimental results are shown in Table 3.

Table 3. Classification results on the FG3D dataset.

Methods Input FG3D
Airplane Car Chair
MVCNN [8] View 12×2242 91.11 76.12 82.9
FG3D-Net [41] View 12×2242 93.99 79.47 83.94
SO-Net [30] Point 2048×3 82.92 59.32 70.05
Point2Sequence [31] Point 2048×3 92.76 73.54 79.12
PointCNN [14] Point 1024×3 90.3 68.37 74.87
PointNet [1] Point 1024×3 89.34 73 75.44
PointNet++(MSG) [2] Point 1024×3 95.96 77.87 81.23
MSP-Net [42] Point 1024×3 93.03 74.25 68.69
PointAtrousGraph [43] Point 1024×3 95.22 74.77 79.2
Point2SpatialCapsule [44] Point 1024×3 95.19 75.92 79.53
DGCNN [17] Point 1024×3 93.6 72.1 79.53
DGCNN+Pnp-3D [32] Point 1024×3 94.26 74.98 78.39
GBNet [36] Point 1024×3 95.21 75.36 80
Point Transformer [37] Point 1024×3 91.53 67.88 71.73
PCT [5] Point 1024×3 95.16 78.89 81.37
PointMLP [33] Point 1024×3 95.76 76.35 81.81
DCNet [39] Point 1024×3 97.31 79.15 83.67
Ours Point 1024×3 96.02 76.73 81.33

† represents reproduction of open-source code experiments on an NVIDIA Tesla V100 GPU

Table 3 shows the classification accuracies for airplane, car, and chair using our method. DCNet performs better than our method on the fine-grained dataset because its model focuses exclusively on local details: FG3D has greater intraclass sample variance, which requires the model to attend closely to sample details, whereas ModelNet40 requires more attention to global features to distinguish between classes, which also makes DCNet perform poorly on the coarse-grained dataset. Our method balances these two requirements and achieves good results on both coarse- and fine-grained datasets, although it does not match DCNet on the fine-grained dataset.

Ablation study

To evaluate the effectiveness of each part of our method, we performed ablation experiments on the ModelNet40 dataset. In Table 4, PCT is used as the baseline network, DAL-G represents the global part of DAL, and DAL-L represents the local part.

Table 4. Ablation study on model structure.

Baseline PLAB DAL-G DAL-L OA(%) mAcc(%)
92.4 89.4
92.6 89.9
92.7 90.1
92.6 89.9
93.1 90.5
93.3 91.1

In addition, we evaluated the effect of different network depths and neighborhood sizes on accuracy, as shown in Table 5. The experimental results are consistent with the characteristics of the Transformer architecture [40].

Table 5. Ablation study on different channels & stages and the number of neighboring points.

Number of Neighbors(k) OA(%) mAcc(%)
12 92.0 88.6
16 92.8 90.2
20 93.3 91.1
24 93.1 90.7
40 93.0 90.3
Network depth (stage & channel) OA(%) mAcc(%)
64-64-64 92.7 89.8
64-64-128-256 93.0 90.1
64-64-128-128-256 93.3 91.1
64-64-128-64-128-256 93.1 90.5

Conclusion

In this paper, we proposed a simple and effective local complement module, PLAB, which is derived from the local binary pattern image texture extraction algorithm, to establish relationships between neighborhood points and obtain effective features. Experiments showed that this module is suitable for different types of point cloud networks. In addition, we proposed a dual local and global attention mechanism, DAL, which not only retains the long-range dependence property of the Transformer structure but also attends to consistency in the feature space. The attention score map was used to obtain consistent features in the feature space for each point as the input of the local attention, and, considering the computational cost of the Transformer, the lightweight External-Attention mechanism was used to calculate the local feature attention. Finally, we demonstrated the effectiveness of the proposed model on the ModelNet40 and FG3D datasets.

Data Availability

The ModelNet40 dataset is available from the Kaggle database (https://www.kaggle.com/datasets/balraj98/modelnet40-princeton-3d-object-dataset). The FG3D dataset is available from the Kaggle database (https://www.kaggle.com/datasets/yue123y/fine-grained-3d).

Funding Statement

This study is sponsored by the BUCEA Doctor Graduate Scientific Research Ability Improvement Project(DG2024034) and National Natural Science Foundation of China (42171416). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Charles R.Q., et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
  • 2. Ruizhongtai Qi C., et al., PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. 2017.
  • 3. Ojala T., Pietikainen M., and Maenpaa T., Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002. 24(7): p. 971–987.
  • 4. Zhao H., et al. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
  • 5. Guo M.-H., et al., PCT: Point cloud transformer. Computational Visual Media, 2021. 7(2): p. 187–199.
  • 6. Boulch A., Saux B.L., and Audebert N., Unstructured point cloud semantic labeling using deep segmentation networks, in Proceedings of the Workshop on 3D Object Retrieval. 2017, Eurographics Association: Lyon, France. p. 17–24.
  • 7. Lawin F.J., et al. Deep Projective 3D Semantic Segmentation. In Computer Analysis of Images and Patterns. 2017. Cham: Springer International Publishing.
  • 8. Su H., et al. Multi-view Convolutional Neural Networks for 3D Shape Recognition. In 2015 IEEE International Conference on Computer Vision (ICCV). 2015.
  • 9. Maturana D. and Scherer S. VoxNet: A 3D Convolutional Neural Network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2015.
  • 10. Shi S., et al., PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection. 2020. p. 10526–10535.
  • 11. Riegler G., Ulusoy A.O., and Geiger A., OctNet: Learning Deep 3D Representations at High Resolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: p. 6620–6629.
  • 12. Ben-Shabat Y., Lindenbaum M., and Fischer A., 3DmFV: Three-Dimensional Point Cloud Classification in Real-Time Using Convolutional Neural Networks. IEEE Robotics and Automation Letters, 2018. 3(4): p. 3145–3152.
  • 13. Meng H.-Y., et al., VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation. 2019. p. 8499–8507.
  • 14. Li Y., et al. PointCNN: Convolution On X-Transformed Points. In Neural Information Processing Systems. 2018.
  • 15. Thomas H., et al. KPConv: Flexible and Deformable Convolution for Point Clouds. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 2019.
  • 16. Xu M., et al. PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021.
  • 17. Wang Y., et al., Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph., 2019. 38(5): Article 146.
  • 18. Xiang T., et al. Walk in the Cloud: Learning Curves for Point Clouds Shape Analysis. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 2021.
  • 19. Han X.F., et al., 3CROSSNet: Cross-Level Cross-Scale Cross-Attention Network for Point Cloud Representation. IEEE Robotics and Automation Letters, 2022. 7(2): p. 3718–3725.
  • 20. Wu C., et al. Attention-Based Point Cloud Edge Sampling. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023.
  • 21. Lai X., et al. Stratified Transformer for 3D Point Cloud Segmentation. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022.
  • 22. Liu Z., et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 2021.
  • 23. Wu X., et al., Point Transformer V2: Grouped Vector Attention and Partition-based Pooling. ArXiv, 2022. abs/2210.05666.
  • 24. Wu X., et al. Point Transformer V3: Simpler, Faster, Stronger. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024.
  • 25. Liu Y., et al., Point Cloud Classification Using Content-Based Transformer via Clustering in Feature Space. IEEE/CAA Journal of Automatica Sinica, 2024. 11(1): p. 231–239.
  • 26. Lu D., et al., 3DGTN: 3-D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation. IEEE Transactions on Geoscience and Remote Sensing, 2022. 62: p. 1–13.
  • 27. Guo M.H., et al., Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. 45(5): p. 5436–5447. doi: 10.1109/TPAMI.2022.3211006
  • 28. Hu Q., et al. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020.
  • 29. Ran H., Liu J., and Wang C. Surface Representation for Point Clouds. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022.
  • 30. Li J., Chen B.M., and Lee G.H. SO-Net: Self-Organizing Network for Point Cloud Analysis. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018.
  • 31. Liu X., et al. Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-based Sequence to Sequence Network. In AAAI Conference on Artificial Intelligence. 2018.
  • 32. Qiu S., Anwar S., and Barnes N., PnP-3D: A Plug-and-Play for 3D Point Clouds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021. 45: p. 1312–1319.
  • 33. Ma X., et al., Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework. ArXiv, 2022. abs/2202.07123.
  • 34. Paul S., Patterson Z., and Bouguila N., DualMLP: a two-stream fusion model for 3D point cloud classification. The Visual Computer, 2023.
  • 35. Liu H. and Tian S., Deep 3D point cloud classification and segmentation network based on GateNet. The Visual Computer, 2024. 40(2): p. 971–981.
  • 36. Qiu S., Anwar S., and Barnes N., Geometric Back-Projection Network for Point Cloud Classification. IEEE Transactions on Multimedia, 2022. 24: p. 1943–1955.
  • 37. Zhao H., et al. Point Transformer. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 2021.
  • 38. Lu D., et al., 3DPCT: 3D Point Cloud Transformer with Dual Self-attention. ArXiv, 2022. abs/2209.11255.
  • 39. Wu R., et al., DCNet: exploring fine-grained vision classification for 3D point clouds. Vis. Comput., 2023. 40(2): p. 781–797.
  • 40. Liu Y., et al., Point Cloud Classification Using Content-based Transformer via Clustering in Feature Space. 2023.
  • 41. Liu X., et al., Fine-Grained 3D Shape Classification With Hierarchical Part-View Attention. IEEE Transactions on Image Processing, 2021. 30: p. 1744–1758. doi: 10.1109/TIP.2020.3048623
  • 42. Bai J. and Xu H., MSP-Net: Multi-Scale Point Cloud Classification Network. Journal of Computer-Aided Design & Computer Graphics, 2019. 31(11): p. 1917–1924.
  • 43. Pan L., Chew C.M., and Lee G.H. PointAtrousGraph: Deep Hierarchical Encoder-Decoder with Point Atrous Convolution for Unorganized 3D Points. In 2020 IEEE International Conference on Robotics and Automation (ICRA). 2020.
  • 44. Wen X., et al., Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on Point Clouds Using Spatial-Aware Capsules. IEEE Transactions on Image Processing, 2020. 29: p. 8855–8869. doi: 10.1109/TIP.2020.3019925

Decision Letter 0

Ayesha Maqbool

16 Jun 2024

PONE-D-24-19727: Learning features between neighboring points for point cloud classification, PLOS ONE

Dear Dr. Wang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The paper presents the Point Cloud Local Auxiliary Block (PLAB) and the Dual Attention Layer (DAL), enhancing feature representation in point clouds. While the experimental results are promising, the manuscript requires significant revisions. The introduction should better connect existing methods to the authors' work, highlighting innovations. Additionally, the clarity of images and detailed explanations of formulas need improvement. Finally, more recent comparison methods should be included to strengthen the research context and relevance.

Please submit your revised manuscript by Jul 31 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ayesha Maqbool, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure: 

"This study is sponsored by the BUCEA Doctor Graduate Scientific Research Ability Improvement Project(DG2024034) and National Natural Science Foundation of China (42171416)."

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." 

If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. "Thank you for stating the following in the Acknowledgments Section of your manuscript: 

"This study is sponsored by the BUCEA Doctor Graduate Scientific Research Ability Improvement Project(DG2024034) and National Natural Science Foundation of China (42171416)."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

"This study is sponsored by the BUCEA Doctor Graduate Scientific Research Ability Improvement Project(DG2024034) and National Natural Science Foundation of China (42171416)."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. Thank you for stating the following in your Competing Interests section:  

"NO authors have competing interests"

Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist.", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now 

 This information should be included in your cover letter; we will change the online submission form on your behalf.

6. Please amend the manuscript submission data (via Edit Submission) to include authors Dr. Ming Huang, Dr. Rui Wu, Dr. Dashi Qiu, Dr. Xingxing Xiao, Dr. Dong Li, and Dr. Cai Chen.

7. Please amend your list of authors on the manuscript to ensure that each author is linked to an affiliation. Authors’ affiliations should reflect the institution where the work was done (if authors moved subsequently, you can also list the new affiliation stating “current affiliation:….” as necessary).

8. We note that Figure [1] includes an image of a [patient / participant / in the study]. 

As per the PLOS ONE policy (http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research) on papers that include identifying, or potentially identifying, information, the individual(s) or parent(s)/guardian(s) must be informed of the terms of the PLOS open-access (CC-BY) license and provide specific permission for publication of these details under the terms of this license. Please download the Consent Form for Publication in a PLOS Journal (http://journals.plos.org/plosone/s/file?id=8ce6/plos-consent-form-english.pdf). The signed consent form should not be submitted with the manuscript, but should be securely filed in the individual's case notes. Please amend the methods section and ethics statement of the manuscript to explicitly state that the patient/participant has provided consent for publication: “The individual in this manuscript has given written informed consent (as outlined in PLOS consent form) to publish these case details”. 

If you are unable to obtain consent from the subject of the photograph, you will need to remove the figure and any other textual identifying information or case descriptions for this individual.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper introduces the Point Cloud Local Auxiliary Block (PLAB), inspired by local binary pattern algorithms, to improve neighborhood feature representation in point clouds. Additionally, the Dual Attention Layer (DAL), a Transformer structure, captures both local and global features using attention score maps. Experimental results show enhanced learning of features on various point cloud datasets. The authors will publish their code and experimental logs on GitHub, promoting transparency and reproducibility.

Most of the comparison methods are too outdated, especially other learning-based methods.

Why modelnet40 is conduct on 4060ti, fg3d is on 3060ti? Why not both using 4060ti?

In fg3d, why dcnet is better than the method proposed in this paper?

The references and comparison methods are too outdated. It is now 2024, so at least try to compare the methods from 2023 to the present.

Which benchmark network is this paper primarily based on?

There are many typos and grammar issues in the paper. For example, “Relu”; line 226 “… and …”;

There are some issues with the formatting of the paper. Eq. (4), (5) (8); The title and body of Table 1 are separated. I suggest the author use the latex template of PLOS ONE.

In table 1, the “point” in the table is redundant.

Most of the comparison methods are too outdated.

Reviewer #2: In this paper, the author proposed a simple and effective local complement module, PLAB, which is derived from the local binary pattern image texture extraction algorithm, to establish the relationship between neighborhood points and obtain effective features. The text performs well in the chart and experimental sections, but there are some minor issues in the introduction of the work.

Here are a few suggestions:

1.The percentage symbols in Equations 4, 5, 7, and 8 are misaligned. It is recommended to adjust the size of the percentage symbols to make them look more coordinated.

2.It is suggested that the author introduces more about the connection between different methods and their own work in the related work section. Explain how your method finds innovation among many methods, rather than merely listing the contents of others' work extensively.

3.It is recommended to provide more detailed explanations for the formulas in the text.

4.The images inserted in the text are mostly of insufficient clarity; it is recommended to replace them with higher-resolution images.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jan 6;20(1):e0314086. doi: 10.1371/journal.pone.0314086.r002

Author response to Decision Letter 0


30 Aug 2024

response to editor:

point 1:Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

response 1:We have carefully revised the manuscript according to the template, and if it still does not meet the requirements, we will continue to revise it.

point 2:Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work.

response 2:Our code is on other devices, and we hope to release it after the article is accepted.

point 3:Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

response 3:We stated "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." in our cover letter.

point 4:We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement.

response 4:We have removed the mention of fund support in the acknowledgements, so please update the fund support "BUCEA Doctor Graduate Scientific Research Ability Improvement Project(DG2024034) and National Natural Science Foundation of China (42171416).".

point 5:Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state "The authors have declared that no competing interests exist."

response 5:We have already noted "The authors have declared that no competing interests exist." in the cover letter.

Point 6: Please amend the manuscript submission data (via Edit Submission) to include authors Dr. Ming Huang, Dr. Rui Wu, Dr. Dashi Qiu, Dr. Xingxing Xiao, Dr. Dong Li, and Dr. Cai Chen.

Response 6: We will amend the manuscript submission data.

Point 7: Please amend your list of authors on the manuscript to ensure that each author is linked to an affiliation. Authors’ affiliations should reflect the institution where the work was done (if authors moved subsequently, you can also list the new affiliation stating “current affiliation:….” as necessary).

Response 7:We have amend list of authors on the manuscript.

Point 8: We note that Figure [1] includes an image of a [patient / participant / in the study]. If you are unable to obtain consent from the subject of the photograph, you will need to remove the figure and any other textual identifying information or case descriptions for this individual.

Response 8: We were unable to obtain consent, so we removed Figure 1.

Response to Reviewer#1:

Point 1. Most of the comparison methods are too outdated, especially other learning-based methods.

Response 1: Thank you for your suggestion. In fact, the writing of this manuscript was completed at the end of 2023, and the methods we compared included some classic papers as well as newer papers from 2022 and 2023, such as 3DPCT and PointConT. However, we agree with your suggestion and add some of the latest methods for comparison. Specifically refer to lines 159-165 and Table 1 of the Revised Manuscript with Track change.

Point 2. Why was ModelNet40 conducted on a 4060 Ti while FG3D was on a 3060 Ti? Why not use the 4060 Ti for both?

Response 2: Thank you for your careful review. Since the experiments were conducted on different devices, we have repeated the experiments on the 4060Ti device. Specifically refer to Table 3 of the Revised Manuscript with Track change.

Point 3. In FG3D, why is DCNet better than the method proposed in this paper?

Response 3: Your question is well-taken. After extensive experiments, we found that many existing models perform differently on different datasets. For example, while DCNet outperformed our proposed method on the FG3D dataset, it only achieved a 92.4% overall accuracy on the ModelNet40 dataset, significantly lower than our 93.3% overall accuracy. Furthermore, while our proposed method's performance on the FG3D dataset was not the best, its results were still competitive, indicating that our method is adaptable to different datasets.

Point 4. The references and comparison methods are too outdated. It is now 2024, so at least try to compare the methods from 2023 to the present.

Response 4: Thank you for your suggestion. As mentioned in Point 1, the methods we compared include both classic and newer ones, and we have added new methods to the manuscript.

Point 5. Which benchmark network is this paper primarily based on?

Response 5: Thank you for your question. Our benchmark network is PCT; we have now stated this explicitly in the manuscript. Specifically refer to line 403 of the Revised Manuscript with Track change.

Point 6. There are many typos and grammar issues in the paper. For example, “Relu”; line 226 “… and …”;

Response 6: Thank you for your suggestion. We carefully rechecked the manuscript and made corrections to spelling and grammar. Specifically refer to the Revised Manuscript with Track change.

Point 7. There are some issues with the formatting of the paper. Eq. (4), (5) (8); The title and body of Table 1 are separated. I suggest the author use the latex template of PLOS ONE.

Response 7: We have modified the paper according to the formatting requirements of PLOS ONE and reorganized the equations. Specifically refer to the Revised Manuscript with Track change.

Point 8. In Table 1, the “point” in the table is redundant.

Response 8:"point" was redundant, and we have made the correction in the table 1.

Point 9. Most of the comparison methods are too outdated.

Response 9: Thank you for your suggestion. To compare against newer methods, we have added more recent methods to the Related Work section and conducted an experimental comparison with the latest methods, as shown in Table 1.

Response to Reviewer #2:

Point 1. The percentage symbols in Equations 4, 5, 7, and 8 are misaligned. It is recommended to adjust the size of the percentage symbols to make them look more coordinated.

Response 1: We revised the formula expression to make it clearer and more aesthetically pleasing. Specifically refer to the Revised Manuscript.

Point 2. It is suggested that the author introduces more about the connection between different methods and their own work in the related work section. Explain how your method finds innovation among many methods, rather than merely listing the contents of others' work extensively.

Response 2: Thank you for your suggestion. We have added content to the Related Work section to better connect the cited studies to our own work and to highlight the innovations of our method. Specifically refer to lines 127-133, 171-174, and 181-183 of the Revised Manuscript with Track change.

Point 3. It is recommended to provide more detailed explanations for the formulas in the text.

Response 3: We have provided a more detailed explanation of the formulas in the Manuscript. Specifically refer to lines 289-330 of the Revised Manuscript with Track change.

Point 4. The images inserted in the text are mostly of insufficient clarity; it is recommended to replace them with higher-resolution images.

Response 4: Thank you for your suggestion. We have modified some unclear images and tried to make them more visually appealing, as shown in Fig 4. We have also removed Fig 1 due to copyright issues. Specifically refer to line 383 of the Revised Manuscript with Track change.

Attachment

Submitted filename: Response to Reviewers#2.docx

pone.0314086.s001.docx (264.2KB, docx)

Decision Letter 1

Ayesha Maqbool

3 Oct 2024

PONE-D-24-19727R1

Learning features between neighboring points for point cloud classification

PLOS ONE

Dear Dr. Wang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The article "Learning Features Between Neighboring Points for Point Cloud Classification" presents a valuable contribution; however, to enhance its quality, I recommend improving the analysis section and presenting the findings in greater depth. The objectives of the work should be further elaborated, with a clearer explanation of how these objectives are achieved. Providing a more structured presentation of the alignment between the research goals and results will significantly strengthen the paper.

Please submit your revised manuscript by Nov 17 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ayesha Maqbool, PhD

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Most of the comments have been addressed by the authors. However, all the figures in this paper look very rough and the images look very blurry. It is recommended to use vector graphics instead.

Another concern is that most of the network figures look very similar to the figures in the published papers, so the novelty of this paper also needs to be considered.

Reviewer #2: 1. There are several spelling and grammar errors in the text, such as the phrase "performed we this module" in the abstract. These details need to be meticulously proofread to prevent unnecessary mistakes that could impact the quality of the paper.

2. It has been noted that the images in the paper, particularly Figure 4, lack clarity. It is recommended to ensure that all images are of high resolution to facilitate easier viewing for readers.

3. In Table 1, the title and body are misaligned in several areas, which affects readability. Furthermore, the term "point" is redundant and should be eliminated from the table.

4. The explanations for formulas 4, 5, 7, and 8 lack sufficient detail. It is recommended to provide more comprehensive explanations of the derivation and underlying principles of these formulas. This will help ensure that readers can clearly grasp the derivation process and the physical significance of each step.

5. Although the "Related Work" section references relevant studies, the analysis primarily consists of a basic enumeration of their contents. It is advisable to emphasize the distinctions and innovations of your method in comparison to existing approaches to further strengthen the paper's persuasiveness.

6. Although the experimental results indicate that the method performs well on certain datasets, a more in-depth discussion is necessary regarding specific cases (e.g., the reasons why the method underperforms compared to DCNet on the FG3D dataset). Clarifying these differences will enhance readers' understanding of the method's strengths and weaknesses.

7. Some paragraphs transition rather abruptly, particularly between the introduction of the methods and the experiments section. Incorporating transitional sentences can enhance the overall flow and coherence of the paper's structure.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Reviewer Attachments for Manuscript.docx

pone.0314086.s002.docx (12.6KB, docx)
PLoS One. 2025 Jan 6;20(1):e0314086. doi: 10.1371/journal.pone.0314086.r004

Author response to Decision Letter 1


1 Nov 2024

Response to Reviewer #1:

Thank you for your suggestions. We have replaced all the images with higher-resolution versions, and Figure 4 has been redrawn with new tools to make it look better. You can view it below or in the Revised Manuscript with Track change. Regarding the results where the numbers are very close: because the classification task is close to saturation, even small gains in overall accuracy still require quite complex improvements. Judging from published networks, each percentage point of overall accuracy gain on this task has typically required roughly one to two years of experimentation by researchers around the world.

Response to Reviewer #2: Detailed responses have been put into the “Response to Reviewer” file.

Response 1: Thanks to your suggestion, we have carefully checked the whole text and corrected several grammatical and spelling errors. You can view the changes in the Revised Manuscript with Track change.

Response 2: Thanks to your suggestions, we have replaced all the images with higher-resolution images, and Figure 4 has been redrawn with new tools to make it look better. You can view it in the Revised Manuscript with Track change.

Response 3: Thanks to your careful scrutiny, we have reworked the problems with Table 1. You can check it out in the Revised Manuscript.

Response 4: Thanks for your suggestion; we have provided more explanations for formulas 4, 5, 7, and 8. You can check it out in the Revised Manuscript.

Response 5: Thanks to your suggestion, we have added more differentiating descriptions to our related work to highlight the distinctiveness and innovation of our approach. You can check it out in the Revised Manuscript.

Response 6: Thank you for your insightful suggestions. Indeed, networks designed for fine-grained point cloud classification differ in their design goals from those for traditional point cloud classification. For the traditional point cloud dataset (ModelNet40), the network mainly needs to learn the global structure of the point cloud samples in order to classify efficiently. However, fine-grained point cloud datasets (FG3D) have relatively small inter-class differences and relatively large intra-class differences, which forces the network to pay more attention to local detailed features in order to achieve accurate classification. Fine-grained point cloud classification therefore differs significantly from traditional point cloud classification in the focus of feature learning. Because our proposed network targets the ModelNet40 dataset and concentrates on learning the global structure of the point cloud samples, its accuracy on the FG3D dataset is lower than that of DCNet. However, it is worth noting that our network still outperforms existing networks such as PCT on the fine-grained FG3D dataset. This is mainly due to the innovative design of the dual attention layer for local feature capture, which enables the network to extract fine features more effectively and thus improves both coarse- and fine-grained classification accuracy. We also analyze this problem in the manuscript; you can check it out in the Revised Manuscript.

Response 7: The transitions between these two parts were indeed abrupt; we have adjusted the transitional sentences to make the manuscript read more smoothly. You can check it out in the Revised Manuscript.

Attachment

Submitted filename: Response to Reviewer.docx

pone.0314086.s003.docx (8.5MB, docx)

Decision Letter 2

Ayesha Maqbool

5 Nov 2024

Learning features between neighboring points for point cloud classification

PONE-D-24-19727R2

Dear Dr. Wang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ayesha Maqbool, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Ayesha Maqbool

10 Dec 2024

PONE-D-24-19727R2

PLOS ONE

Dear Dr. Wang,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ayesha Maqbool

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to Reviewers#2.docx

    pone.0314086.s001.docx (264.2KB, docx)
    Attachment

    Submitted filename: Reviewer Attachments for Manuscript.docx

    pone.0314086.s002.docx (12.6KB, docx)
    Attachment

    Submitted filename: Response to Reviewer.docx

    pone.0314086.s003.docx (8.5MB, docx)

    Data Availability Statement

    The ModelNet40 dataset is available from the Kaggle database (https://www.kaggle.com/datasets/balraj98/modelnet40-princeton-3d-object-dataset). The FG3D dataset is available from the Kaggle database (https://www.kaggle.com/datasets/yue123y/fine-grained-3d).

