MethodsX. 2024 Dec 9;14:103090. doi: 10.1016/j.mex.2024.103090

Enhancing land feature classification with the BTR Extractor: A novel software package for high-accuracy analysis of aerial laser scan data

Jamshid Talebi, Zahra Azizi
PMCID: PMC11699430  PMID: 39758432

Abstract

The semi-automatic and automatic extraction of land features such as buildings, trees, and roads from aerial laser scan data is crucial in land use change studies and urban management. This research introduces the "BTR" extractor, a novel software package designed to improve the classification accuracy of features identified in point clouds obtained from aerial laser scanners. Our method focuses on:

  • Comparing classification methods using airborne laser scanning data.

  • Implementing supervised algorithms for high-accuracy classification.

  • Evaluating the performance against existing software like TerraSolid.

The user-friendly interface allows data entry, training data collection, and selection of classification methods. We employed five methods (Bayesian classification, support vector machine, K-nearest neighbor, C-Tree, and discriminant analysis) to classify land features. Comparative results show that the BTR extractor outperforms TerraSolid, particularly in supervised classification, demonstrating high accuracy and reliable implementation in the studied area. Our findings advocate for the use of supervised algorithms in classifying point cloud data for enhanced accuracy and efficiency in remote sensing applications.

Keywords: Point cloud, Supervised algorithms, SVM

Method name: BTR Extractor


Specifications table

Subject area: Environmental Science
More specific subject area: Remote Sensing
Name of your method: BTR Extractor
Name and reference of original method: No
Resource availability: No

Background

The classification of land features, such as buildings, trees, and roads, has long been a focal point in remote sensing and photogrammetry. Traditionally, researchers have concentrated on analyzing aerial or satellite imagery [1,2]. However, with advances in aerial laser scanning technologies such as LiDAR (Light Detection and Ranging), especially in Iran, there is an emerging need to explore and develop new algorithms for classifying land features using point cloud data. This research aims to address this need by introducing the "BTR" extractor, a software package designed to enhance the accuracy of land feature classification using point clouds from aerial laser scans.

Accurate classification of land features is essential for creating urban and agricultural maps, providing critical information for urban planning, infrastructure development, and 3D modeling of urban landscapes. It is crucial to have precise data on the location of land features for effective urban management and future planning. Over the past few years, the extraction of features such as tree height, trunk diameter, biomass, and changes in urban infrastructure using aerial laser scan data has become a key research area. Early studies focused on creating digital elevation models (DEMs) and canopy height models (CHMs) using LiDAR data [3–7].

In this context, we used aerial laser scan point clouds as inputs to our software package, which outputs classified point clouds in LAS format. Classification in machine learning involves assigning new observations to predefined groups based on their characteristics [8–11]. Supervised classification, where the user selects sample points for training, is widely used in this field. Various algorithms, including nearest neighbors, decision trees, and support vector machines (SVM), have been developed for classification tasks. This study explores the optimization of these algorithms for classifying land features using aerial laser scan data [12–19].

Challenges in working with aerial laser scan data include handling complex calculations, data discontinuities, density variations, and the lack of effective descriptors. Unlike image data, point clouds are discrete and irregular, requiring specialized search algorithms to define neighborhoods [9]. Research has shown a strong correlation between point cloud density and classification accuracy. This study not only compares different classification algorithms but also evaluates their performance using the newly developed software, demonstrating the advantages of supervised classification in accurately classifying land features.

Method details

The study was conducted in two distinct urban areas of Qom city, Iran, to evaluate the performance of the software package in classifying laser point cloud data. These areas were chosen due to their diverse urban landscapes, which include a mix of built-up structures, natural terrain, and vegetation, making them ideal for testing the classification algorithms.

Area 1

Central Urban District

The first study site is located in the central urban district of Qom city, characterized by dense infrastructure, including residential and commercial buildings, roads, and parks. This area is heavily built-up, with a complex distribution of vegetation and concrete surfaces. The central district was selected to test the software's ability to classify built-up features, such as roads, buildings, and trees, within a dense urban environment.

Area 2

Peripheral Suburban Area

The second study site is a suburban area located on the outskirts of the city, where urban development is less dense, and the landscape transitions into open fields and agricultural land. This area offers a mix of urban features and natural vegetation, including trees, agricultural plots, and roads. The suburban area was selected to assess the software's performance in less densely built environments, where distinguishing between different types of ground surfaces and vegetation may

The primary data used in this research come from aerial laser scanning, which uses active sensors that emit laser pulses towards objects and measure the return signals. These return signals provide essential information about the objects, in particular distance and intensity. Laser sensors can capture multiple return signals, but most commonly they record the first and last returns. The first return reflects from the surface that the pulse first encounters, such as the canopy of trees, providing data on the uppermost layers of the environment. The last return, by contrast, can penetrate through vegetation to reach the ground, offering information about the lower layers, including the terrain surface.
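The contrast between first and last returns can be illustrated with a small sketch. It is not taken from the BTR Extractor: the function names and the 2 m gap threshold are hypothetical, chosen only to show how a large first-to-last height difference hints at a penetrable surface such as a tree canopy, while a near-zero difference hints at a solid surface such as a roof or road.

```python
def return_height_difference(first_z, last_z):
    """Height gap (m) between the first and last return of each pulse."""
    return [f - l for f, l in zip(first_z, last_z)]


def likely_penetrable(first_z, last_z, min_gap=2.0):
    """Flag pulses whose first/last gap is at least min_gap metres.

    min_gap is an illustrative assumption, not a value from the paper.
    """
    return [gap >= min_gap for gap in return_height_difference(first_z, last_z)]


# Three pulses: over a tree, over a roof, over bare ground.
first = [12.4, 8.0, 0.3]
last = [0.5, 7.9, 0.3]
flags = likely_penetrable(first, last)
```

In this toy input only the first pulse is flagged, since its last return reached a surface almost 12 m below the first.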

For this study, the raw input data consisted of First Pulse Return (FPR), Last Pulse Return (LPR), First Pulse Intensity (FPI), and Last Pulse Intensity (LPI). The dataset was collected in an urban area of Qom city using a laser scanner mounted on an aircraft flying at an altitude of 1500 m. The point density of the data is 1.2 points per square meter, with an average point spacing of 0.98 m. A detailed view of the dataset can be found in Fig. 1. This figure displays the laser data captured from two districts in Qom city. Panel (a) shows data from the first district, while panel (b) depicts the second district. Panel (c) illustrates the data within the software environment, where the point cloud is processed and classified into different land features. The figure provides an overview of the raw laser data and its transformation into classified information within the software, which is crucial for urban feature analysis, such as tree, building, and road detection.

  • Laser sensors, as active sensors, work by emitting laser pulses and recording the reflected signals. The primary data captured for this research, including FPR, LPR, FPI, and LPI, provides key insights into both the vertical structure of the environment and the intensity of the returns, which are critical for classifying and analyzing the scanned area.

Fig. 1. Aerial laser scan data from urban areas of Qom city: (a) the first district; (b) the second district; (c) data in the software environment and point cloud classification.

Software description

  • This study involved developing a custom software package designed to maximize classification accuracy. The software processes laser point cloud data to classify land features with high precision. It incorporates several key functionalities that ensure effective data handling and classification.

Software workflow

Input Data: The user imports raw aerial laser scan data into the software, which then processes the data for further steps.

Descriptor Generation: The software generates descriptors, which are essential for the classification process. These descriptors can be spectral, textural, or structural, and they serve to differentiate between different land features (such as trees, roads, and buildings). In this study, descriptors such as Digital Terrain Model (DTM), slope, aspect, profile curvature, plan curvature, variance, and Laplacian filter were generated.
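As a rough illustration of how two of these descriptors can be derived, the sketch below computes slope and aspect from a gridded DTM using central differences. This is one common formulation and is only an assumption here; the paper does not state how the software computes them. The function name `slope_aspect`, the grid layout (row 0 is the northern edge), and the cell size are hypothetical.

```python
import math


def slope_aspect(dtm, cell_size):
    """Return (slope, aspect) grids in degrees for the interior cells of a DTM.

    dtm is a list of rows of elevations in metres; row 0 is the northern edge.
    Border cells stay None because central differences need both neighbours.
    """
    rows, cols = len(dtm), len(dtm[0])
    slope = [[None] * cols for _ in range(rows)]
    aspect = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation change per metre towards the east and towards the north.
            dz_dx = (dtm[r][c + 1] - dtm[r][c - 1]) / (2 * cell_size)
            dz_dy = (dtm[r - 1][c] - dtm[r + 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
            if dz_dx == 0 and dz_dy == 0:
                aspect[r][c] = None  # flat cell: aspect is undefined
            else:
                # Downslope direction as a compass bearing, clockwise from north.
                aspect[r][c] = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect


# A plane that rises 1 m per 1 m cell towards the east.
dtm = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
slope, aspect = slope_aspect(dtm, 1.0)
```

For this toy plane the centre cell has a 45 degree slope and faces west (aspect 270 degrees), because the terrain rises towards the east.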

Feature Space Generation: Once descriptors are generated, the software creates a feature space, which helps enhance classification accuracy. Users can select the appropriate feature space from 23 available types, depending on the region type (e.g., dense urban, rural). Normalization tools are incorporated to adjust feature spaces to a desired range, ensuring compatibility with the training data.
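The normalization step can be sketched as a min-max rescaling of one feature to a target range. The paper does not say which normalization the software applies, so min-max scaling and the function name `normalize` are assumptions made for illustration.

```python
def normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no contrast; map every value to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


heights = [2.0, 4.0, 6.0]
unit_range = normalize(heights)            # [0.0, 0.5, 1.0]
byte_range = normalize([10, 20], 0, 255)   # [0.0, 255.0]
```

Bringing descriptors with very different units (for example elevation in metres and aspect in degrees) onto a common range keeps any single descriptor from dominating distance-based classifiers.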

Training Data Preparation: Training data is critical for achieving accurate classification. The software provides default classes (e.g., Buildings, Roads, Trees, Artificial Ground, Natural Ground, Cars) and allows users to select and define new classes. Drawing tools are provided to help users manually select regions for training. Fig. 2 shows an example of the feature spaces produced in the software. The visual representation in this figure helps demonstrate the capability of the software to organize and classify raw laser point cloud data into meaningful categories based on the selected feature space.

  • Classification of Data

Fig. 2. Feature space generated in the software.

After preparing the training data, the software proceeds with classification using various algorithms. Classification in photogrammetry is crucial for separating data into predefined classes, based on training data provided by the user.

  • Classification Algorithms

The software supports several classification algorithms, each suitable for different types of data and research objectives:

  • 1. SVM Classification

    SVM is a supervised, non-parametric classification method known for its high accuracy and modest training data requirements. It seeks the hyperplane that separates the classes with the largest margin, which keeps classification errors low. The algorithm is available in the software's Classification menu, where users can select SVM to classify point cloud data.

  • 2. Bayesian Classification

    The Bayesian method assigns each observation to the class with the highest probability given its features. It is a simple, supervised classification algorithm with acceptable accuracy. The method can be enhanced by using kernel density estimation for improved results. The software allows users to apply this method for classification tasks.

  • 3. Classification Tree (C-Tree)

    The C-Tree is a non-parametric decision tree method. It can be used to classify both ordinal and continuous variables, making it versatile for various applications. It is commonly used in approaches like audit analysis and logistic regression.

  • 4. Discriminant Analysis

    Discriminant analysis seeks to express a dependent variable (here, the class) as a linear combination of other characteristics. This method is closely related to Principal Component Analysis (PCA) and is effective for distinguishing between different classes based on predefined variables.

  • 5. K-Nearest Neighbor Classification

    The KNN algorithm is widely used in data mining, machine learning, and pattern recognition. This non-parametric method is simple yet effective, particularly for classifying data into discrete categories based on proximity to other data points.
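To make the proximity idea behind the last method concrete, here is a minimal K-nearest-neighbor sketch in plain Python. It is not the software's implementation; the function name `knn_classify`, the two-dimensional feature vectors (a height-like value and a normalized intensity), and the sample values are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs. On a tied vote, the
    label that appears first among the nearest neighbours wins.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((0.1, 0.2), "road"), ((0.2, 0.3), "road"),
    ((8.0, 0.5), "building"), ((9.0, 0.6), "building"),
    ((6.0, 0.9), "tree"), ((5.5, 0.8), "tree"),
]
low_point = knn_classify(train, (0.15, 0.25))   # "road"
high_point = knn_classify(train, (8.5, 0.55))   # "building"
```

Because the vote depends on Euclidean distance, features on very different scales should be normalized first; otherwise the feature with the largest numeric range dominates the result.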

Error matrix and output generation

Once the classification is completed, the software provides users with the option to estimate error matrices for validation purposes. It also allows users to verify the results of classification through various metrics and generate outputs in standard formats, such as classified point clouds and error analysis reports.
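The validation metrics mentioned here can be sketched as follows. The example builds an error (confusion) matrix from reference and predicted labels, then derives overall accuracy and Cohen's Kappa from it using their standard definitions. The function names and the ten sample labels are illustrative assumptions, not output of the software.

```python
def error_matrix(reference, predicted, classes):
    """Rows are reference classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of points on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Cohen's Kappa: agreement beyond what chance alone would give.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    # Chance agreement from the row (reference) and column (predicted) totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


classes = ["building", "tree", "road"]
reference = ["building"] * 4 + ["tree"] * 4 + ["road"] * 2
predicted = ["building"] * 3 + ["tree"] * 4 + ["road"] * 3
matrix = error_matrix(reference, predicted, classes)
accuracy = overall_accuracy(matrix)   # 0.8
agreement = kappa(matrix)             # about 0.697
```

Kappa is lower than overall accuracy in this example because part of the observed agreement (0.34 here) would be expected by chance given the class totals.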

  • Additional Notes

The software package developed for this study allows users to easily input aerial laser scan data, generate descriptors, prepare training data, and apply classification algorithms with high accuracy. It integrates several advanced algorithms, including SVM, Bayesian, C-Tree, Discriminant Analysis, and KNN, to perform classification tasks efficiently. The software's user-friendly interface and comprehensive output options make it a powerful tool for land feature classification using laser point clouds.

Accuracy and Consistency: The software is designed with a focus on reproducibility, ensuring that other researchers can apply the same methods and achieve similar results with their laser data.

Modular Design: The software is modular, allowing users to select the most appropriate algorithm and configuration based on the type of data and research objectives.

Method validation

To validate the method, classification was performed with the software package using five algorithms: SVM, decision tree (C-Tree), discriminant analysis, K-nearest neighbor, and naive Bayes. First, the appropriate feature spaces were generated for each dataset. The user then selected which feature spaces to use, and classification was run on them.

Initial results showed that using all of the generated feature spaces does not necessarily lead to the best results. Fig. 3 compares the Aspect and nDSM-GD feature spaces; as the figure shows, the Aspect feature space does not help to increase the accuracy of the classification output.

Fig. 3. Comparison of obtainable information for two feature spaces, Aspect and nDSM-GD (Qom data).

After the training data were selected, supervised classification of the building, tree, and road classes was carried out in the new software environment as well as in Terrasolid, and the classification results (Fig. 4) and error matrices (Fig. 5) were compared for each classification. Fig. 4 illustrates the results of the different classification methods applied to the laser point cloud data. Fig. 5 demonstrates the evaluation of classification accuracy using an error matrix. Panel (a) shows the error matrix for the classification algorithms, and panel (b) presents the overall accuracy (%) and Kappa coefficient values. The error matrix provides insight into the performance of the classifier by comparing the classified data against the ground truth data. The overall accuracy and Kappa coefficient values reflect the reliability and quality of the classification results.

Fig. 4. Comparison of classification methods: (a) C-Trees; (b) Bayesian; (c) SVM; (d) K-NN; (e) D-analysis.

Fig. 5. Overall accuracy values (%) and Kappa coefficient for five programmed classification methods (Qom data).

The overall accuracy of the Bayesian classification was 95.23 % with the nDSM-GD feature space and 86.80 % with the Aspect feature space. The corresponding Kappa coefficient values were 92.18 and 84.15. This result shows the importance of choosing the right feature space for collecting training data and performing the classification.

Another result points to the importance of choosing the correct classification method. The results for the individual methods differ clearly. Comparing the overall accuracy values of the five algorithms programmed in this software package on the studied data shows that Bayesian classification, discriminant analysis, and SVM have the highest overall accuracy, while the classification tree algorithm has the lowest (Fig. 6, Fig. 7).

Fig. 6. (a) D-analysis classification in Terrasolid software; (b) class combination issues.

Fig. 7. Comparison of Kappa coefficient values and overall accuracy (%) between Terrasolid software and the proposed software package.

Terrasolid is a suite of software tools that works in conjunction with MicroStation and AutoCAD. It provides a comprehensive set of tools for point cloud processing, including feature extraction, surface modeling, and terrain analysis. It is specifically designed to handle aerial laser scan data and images, offering various classification algorithms and tools to process and classify land features from airborne laser data. While Terrasolid is a robust software for point cloud processing, we have used it as a comparison tool to benchmark the performance of our custom software.

Our custom software, developed for this research, focuses specifically on enhancing classification accuracy and providing a user-friendly interface for data entry, feature space selection, and classification execution. The main advantage of our software lies in its advanced classification algorithms and the ability to fine-tune parameters for optimal feature extraction and classification of urban and natural land features.

Both software packages were used in parallel to compare their performance and evaluate their ability to classify land features accurately, particularly in urban environments.

Fig. 6 shows the results of Discriminant Analysis (D-analysis) classification applied to laser point cloud data using the Terrasolid software. Panel (a) presents the classification results obtained from Terrasolid, where the point cloud data is categorized into predefined classes such as trees, buildings, and roads. Panel (b) illustrates the issue of class combination in the software, where different classes, particularly vegetation and buildings, are misclassified or merged due to the limitations of the classification algorithm. This figure highlights the challenges faced when applying D-analysis in automated classification tasks, particularly when class boundaries are not distinct enough for accurate separation.

Fig. 7 compares the Kappa coefficient and overall accuracy values of the classification results obtained from Terrasolid and the proposed software package. Panel (a) shows the Kappa coefficient values for both, which indicate the level of agreement between the classified results and the reference data. A higher Kappa coefficient signifies better classification performance, that is, agreement well beyond what random chance would produce. Panel (b) presents the overall accuracy values for both software systems, reflecting the percentage of correctly classified data points out of the total dataset. The comparison highlights the effectiveness of the proposed software package in achieving higher accuracy and reliability in classifying laser data compared to Terrasolid, demonstrating its potential for more precise land feature detection.

The highest overall accuracy values were obtained with Bayesian classification (98.25 %), discriminant analysis (98.23 %), and SVM (97.09 %).

In the next step, the results of the programmed software were compared with those of Terrasolid in the automatic classification of three classes: trees, buildings, and roads. Since each of the three classes is classified separately in Terrasolid, the outputs must be superimposed to produce the final classification map. For this comparison, the second region of the Qom data was used in addition to the first region of the Qom data, which was used in the previous stages.

Visual comparison showed that, where point cloud density is low, the supervised classification methods of the software package are more efficient and accurate than the method used in Terrasolid, as seen in the second region of the Qom data. The software package identified the tree class particularly well.

The point cloud data were also classified into three classes (buildings, trees, and roads) in Terrasolid, using the parameters suggested in the Terrasolid software guide for the TerraScan module. Because Terrasolid does not classify several classes at once, interference between classes is visible in its output (Fig. 6).

When the separate outputs are integrated, some classes may be placed under another class according to the operator's judgment, which confuses the results. The software package classifies and separates all classes requested by the user simultaneously, so the inter-class interference that arises in the Terrasolid integration phase does not occur.
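The integration problem can be illustrated with a small sketch. It does not use any Terrasolid interface: it only assumes that each single-class run yields one yes/no flag per point, and it shows how superimposing those independent results forces a priority choice wherever two runs claim the same point. The function name, the priority order, and the sample flags are hypothetical.

```python
def merge_single_class_results(per_class_flags, priority):
    """Superimpose independent single-class results into one label per point.

    per_class_flags maps a class name to a list of booleans (one per point).
    priority lists class names from highest to lowest precedence; it stands in
    for the operator's judgment when several classes claim the same point.
    Returns the merged labels and the indices of conflicting points.
    """
    n_points = len(next(iter(per_class_flags.values())))
    labels, conflicts = [], []
    for i in range(n_points):
        claimed = [c for c in priority if per_class_flags[c][i]]
        labels.append(claimed[0] if claimed else "unclassified")
        if len(claimed) > 1:
            conflicts.append(i)
    return labels, conflicts


flags = {
    "building": [True, False, True, False],
    "tree": [False, True, True, False],
    "road": [False, False, False, False],
}
labels, conflicts = merge_single_class_results(flags, ["building", "tree", "road"])
```

Point 2 is claimed by both the building and tree runs, so its final label depends entirely on the chosen priority. A classifier that assigns every point to exactly one class in a single pass cannot produce such conflicts.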

Limitations

Not applicable.

Ethics statements

This research did not involve human participants, animal experiments, or data collected from social media platforms. All data utilized in this study were collected by researchers adhering to the respective ethical guidelines and without violating privacy rights. No additional ethical approval was required for the use of these datasets in our study.

CRediT authorship contribution statement

Jamshid Talebi: Methodology, Validation, Formal analysis, Writing – original draft. Zahra Azizi: Methodology, Validation, Investigation, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Footnotes

Related research article: None.

For a published article: None.

Data availability

Data will be made available on request.

References

  • 1. Secord J., Zakhor A. Tree detection in urban regions using aerial LiDAR and image data. IEEE Geosci. Remote Sens. Lett. 2007;4(2):192–200.
  • 2. Jiang M., Lin Y. Individual deciduous tree recognition in leaf-off aerial ultrahigh spatial resolution remotely sensed imagery. IEEE Geosci. Remote Sens. Lett. 2013;10(1):38–42.
  • 3. Magnussen S., Boudewyn P. Derivations of stand heights from airborne laser scanner data with canopy-based quantile estimators. Can. J. For. Res. 2024;28(7):1016–1031.
  • 4. Antkowiak M. Artificial neural networks vs. support vector machines for skin diseases recognition. Master's thesis, Department of Computing Science, Umea University, Sweden; 2006.
  • 5. Miraki M., Sohrabi H., Immitzer M. Estimating biomass and carbon storage of mangrove forests using UAV-image-derived variables. J. Geomatics Sci. Technol. 2024;13(3):1–11.
  • 6. Iovan C., Boldo D., Cord M. Detection, characterization, and modeling vegetation in urban areas from high-resolution aerial imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2008;1(3):206–213.
  • 7. Kim S.R., Kwak D.A., Lee W.K., Son Y., Bae S.W., Kim C., Yoo S. Estimation of carbon storage based on individual tree detection in Pinus Densiflora stands using a fusion of aerial photography and LiDAR data. Sci. China Life Sci. 2010;53(7):885–897. doi: 10.1007/s11427-010-4017-1.
  • 8. Li J., Hu B., Noland T.L. Classification of tree species based on structural features derived from high density LiDAR data. Agric. For. Meteorol. 2024;171:104–114.18.
  • 9. Zhao Y., Gui W., Chen Z. Edge detection based on multi-structure elements morphology. Proceedings of the Sixth World Congress on Intelligent Control and Automation WCICA 2006. Vol. 2. IEEE; 2006. pp. 9795–9798.
  • 10. Jian S., Jiang J., Lu K., Zhang Y. SEU-tolerant restricted Boltzmann Machine learning on DSP-based fault detection. Proceedings of the 2014 12th International Conference on Signal Processing (ICSP); 19-23 Oct. 2014. pp. 1503–1506.
  • 11. Azizi Z., Najafi A., Sadeghian S. Forest road detection using LiDAR data. J. For. Res. (Harbin) 2014;25:975–980. doi: 10.1007/s11676-014-0544-0.
  • 12. Ogutu J.O., Piepho H.P., Schulz-Streeck T. A comparison of random forests, boosting and support vector machines for genomic selection. BMC Proc. 2011;5(Suppl 3):S11. doi: 10.1186/1753-6561-5-S3-S11. BioMed Central Ltd.
  • 13. Tang Y., Krasser S., He Y., Yang W., Alperovitch D. Support vector machines and random forests modeling for spam sender's behavior analysis. Proceedings of the Global Telecommunications Conference, IEEE GLOBECOM 2008. IEEE; 2008. pp. 1–5.
  • 14. Arora S., Bhattacharjee D., Nasipuri M., Malik L., Kundu M., Basu D.K. Performance comparison of SVM and ANN for handwritten devnagari character recognition. arXiv preprint arXiv:1006.5902; 2010.
  • 15. Yavari S.M., Azizi Z., Kiadaliri H., Aghamohamadi H. Reducing the effect of the forest canopy to measure distances between trees using UAV image. Smart Agricultural Technol. 2023;6.
  • 16. Jahromi A.B., Zoej M.J.V., Mohammadzadeh A., Sadeghian S. A novel filtering algorithm for bare-earth extraction from airborne laser scanning data using an artificial neural network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024;4(4):836–843.
  • 17. Azizi Z., Najafi A., Sadeghian S. Forest road detection using LiDAR Data. J. For. Res. (Harbin) 2013;25(4):975–980. doi: 10.1007/s11676-014-0544-0.
  • 18. Azizi Z., Sadeghiyan S. Forest canopy modeling with LiDAR data and digital aerial imagery. Proceedings of the 2nd International Conference on Sensors and Models in Photogrammetry and Remote Sensing (SMPR'13); Tehran, Iran; 2011.
  • 19. Azizi Z., Miraki M. Individual urban trees detection based on point clouds derived from UAV-RGB imagery and local maxima algorithm, a case study of Fateh Garden, Iran. Environ. Dev. Sustain. 2024;26(1):2331–2344.

Articles from MethodsX are provided here courtesy of Elsevier
