Royal Society Open Science. 2022 Nov 23;9(11):220597. doi: 10.1098/rsos.220597

An STP-HSI index method for urban built-up area extraction based on multi-source remote sensing data

Lijing Bu 1, Dong Dai 1, Liying Tu 2, Zhengpeng Zhang 1, Mingjun Deng 1, Xinyu Xie 1
PMCID: PMC9682302  PMID: 36425520

Abstract

A change in an urban built-up area can reflect the process of urbanization and the development of a city. At present, extraction of built-up areas from multi-source remote sensing data based on the human settlement index (HSI) achieves relatively good results, but noise such as light spillover in night-time light remote sensing data seriously affects the accuracy of the HSI. In this paper, a high-precision human settlement index (STP-HSI) method based on spatio-temporal remote sensing and point-of-interest (POI) data is presented to improve the classification accuracy of urban built-up area extraction. First, to correct light spillover, a new night-time light index, the fuzzy c-means spatio-temporal point (FCM-STP) index, is proposed; it is based on fuzzy c-means clustering, integrates spatio-temporal characteristics, and uses night light video imaging data from Luojia-1 together with POI data. Then, based on the FCM-STP index, the HSI is updated to the STP-HSI. Finally, a random forest algorithm is used to extract the urban built-up areas; the random forest feature database is composed of the normalized difference vegetation index (NDVI), the normalized difference built-up index (NDBI), the STP-HSI and texture features. To develop and evaluate the accuracy of the new method for built-up area extraction with multi-source data, three test sites in China (Guangzhou, Xiamen and Nanjing) are used. The experimental results show that our method outperforms extraction from single-source multi-spectral (Landsat 8) data: the overall accuracy is improved by up to 7.52%, and the kappa coefficient is improved by up to 14%. Compared with the HSI, the maximum contribution rate of the STP-HSI increased by 25.74%. These experimental results show that the method in this paper is feasible.

Keywords: extraction of built-up area, night-time light remote sensing image, Landsat, human settlement index, random forest, multi-source

1. Introduction

Urban built-up areas are often used to study the urbanization process of a region, and they are an important indicator of the level of urban development and expansion [1–6]. The data sources for urban built-up area extraction can be divided into optical remote sensing data [7–9] and night-time light remote sensing data [10–13]. In the earliest studies, Shiping et al. analysed image targets in high-resolution remote sensing images using a grey-level co-occurrence matrix, and the extracted areas were measured with an accuracy of 93.00% [7]. Jun et al. extracted urban built-up areas from Landsat thematic mapper/enhanced thematic mapper plus (TM/ETM+) images by combining spectral information and multiple textures, which performed better than the traditional grey-level co-occurrence matrix [8]. Bhatti et al. proposed a built-up area extraction method (BAEM) using the newer Landsat 8 Operational Land Imager (OLI) data, and the omission and commission errors were reduced by 75.96% and 33.36%, respectively [9]. Owing to the mixed-pixel problem in multi-spectral optical remote sensing images, it is difficult to distinguish between built-up and non-built-up areas at limited resolution. Night-time light images directly reflect economic activity, which is conducive to the extraction of urban built-up areas, so night-time light remote sensing images have become widely used for this purpose. Duque et al. delineated urban extent using Defense Meteorological Satellite Program Operational Linescan System (DMSP/OLS) night-time light data [10]. Shi et al. compared the results of the Suomi National Polar-orbiting Partnership visible infrared imaging radiometer suite (NPP-VIIRS) with those of DMSP for urban built-up areas extraction and concluded that NPP-VIIRS has a higher extraction accuracy [11]. Liu et al. used a night-time light remote sensing image of Luojia-1 to extract built-up areas and found that it had a high spatial resolution and could capture rich internal information of the built-up areas [12]. Zhang et al. used a vegetation index to adjust urban night-time lights and found that the method could attenuate light spillover and improve accuracy [13].

The above single-source urban built-up areas extraction methods still have many shortcomings, and many scholars have started to adopt multi-source methods. Pandey et al. monitored urbanization areas in India by integrating DMSP/OLS night-time lights and Systeme Probatoire d’Observation de la Terre/Vegetation (SPOT-VGT) data; a support vector machine (SVM) was used to extract urban built-up areas from the mutually calibrated datasets of 1998 and 2008 and from the SPOT-VGT dataset [14]. Zhuo et al. proposed an extraction method that incorporates night-time lighting data, the normalized vegetation index and demographic data, with different models for urban and rural areas, but there was a problem of negative computational heterogeneity [15]. Tan et al. proposed a method that fuses night-time lighting data, digital elevation models, demographic data, road network density and land cover data; by considering more comprehensive factors it avoided overfitting and tolerated outliers and noise well, but its simulation accuracy was poor in areas with low or high population densities [16]. Sharma et al. proposed an improved methodology for urban built-up areas extraction by combining MODIS multi-spectral data with VIIRS night-time light data, using a region-specific threshold approach to extract the urban built-up areas. They concluded that the resulting map captured detailed information on urban built-up areas at a global scale that was missed by the existing maps [17]. Guo et al. proposed a method that fuses night-time light data, demographic data and building data; it took the building distribution into account and was easy to calculate, but the model was relatively coarse and of low precision [18]. Wu et al. proposed a method that fuses night-time light data, demographic data, a digital elevation model (DEM), land use data and river and road network data; it considered a richer set of factors, but its accuracy varied greatly among cities [19]. Yang et al. proposed a method that combines night-time light data, the normalized difference vegetation index (NDVI), the enhanced vegetation index, DEM and demographic data, and used different models in urban and rural areas, which weakened the effects of light saturation and light spillover as well as the model's applicability [20,21]. The above methods use multi-source data for urban built-up areas extraction, which can improve the accuracy of urban extraction, but some shortcomings remain.

At present, the methods for extracting urban built-up areas can be broadly divided into two types: threshold methods and machine learning methods. Sirmacek and Unsalan first extracted the edges and corners of buildings in different orientations using Gabor filters and then used these local feature points to vote for candidate urban areas [22]. Li et al. constructed the point of interest (POI) and land surface temperature (LST) adjusted NTL urban index (PLANUI) based on the threshold method, fusing POI and surface temperature, and the results showed that the index helps to attenuate the blooming effect of lights [23]. Li et al. constructed an index fusing multi-source remote sensing data based on the dichotomous method, and the experimental results showed that the index achieves high recall, F1 score and precision [24]. However, threshold methods for extracting built-up areas are affected by human factors and are labour intensive. Machine learning methods can achieve automatic segmentation and are suitable for the extraction of built-up areas. Bramhe et al. applied transfer learning with convolutional neural networks to urban built-up areas extraction [25]. Pal and Mather proposed using SVM for land cover classification and showed experimentally that SVM has a higher classification accuracy than the maximum likelihood (ML) or artificial neural network (ANN) classifiers [26]. Subsequently, SVM was widely used for built-up areas extraction. Pelizari et al. fused multi-sensor features for built-up areas extraction based on a random forest algorithm, and the results showed that the method has a high classification accuracy [27]. Among the machine learning methods, the random forest algorithm combines many decision trees and offers fast computation, high accuracy and less overfitting.

In summary, there are currently problems of light spillover and low spatial resolution in studies that extract built-up areas based on multi-spectral and hyperspectral remote sensing data such as Landsat. Owing to the interference of buildings, vegetation and water bodies, the extraction accuracy of a classification method that uses the HSI threshold method alone is low. Therefore, a random forest classification method from the perspective of multi-source remote sensing fusion is presented to extract urban built-up areas, with the following research contributions: (i) To reduce light spillover and boundary omission of built-up areas, a comprehensive fuzzy c-means spatio-temporal point (FCM-STP) index is presented that incorporates night-time spatial and temporal information and POI information; night-time lighting data fused with spatial information and time-series information can reduce light outliers and noise, and POI data have accurate location and attribute information, allowing low-light and no-light urban built-up areas to be extracted. (ii) The traditional human settlement index is affected by the low spatial resolution and light spillover of night-time light data, which reduces the extraction accuracy of the built-up areas. We propose a high-precision human settlement index (STP-HSI) based on the FCM-STP index. (iii) Based on night-time light remote sensing data, Landsat 8 OLI data, POI data and other auxiliary data, a multi-source remote sensing data extraction method for urban built-up areas is presented to improve the extraction accuracy.

2. Material and methods

The built-up areas extraction method for multi-source data uses a random forest algorithm and is divided into two stages. In the first stage, the FCM-STP index is calculated from preliminary classification results that combine the spatio-temporal information of the Luojia-1 night light video imaging data with the POI data. In the second stage, the texture features, the normalized building index, the normalized vegetation index and the high-precision human settlement index (STP-HSI) proposed in this paper are combined into a multi-source remote sensing data feature database. The feature database is then used together with sample points collected from Google Earth data to construct a multi-source random forest model for built-up areas extraction. The specific process is shown in figure 1.

Figure 1. Technology guideline.

2.1. FCM-STP night-time light index

Currently, scholars have started to fuse the POI data and night-time light data as a method to extract the urban built-up areas [28], but night-time light data has light spillovers and noise, which leads to fragmented boundaries and scattered patches of built-up areas; and the built-up areas in the low-light and no-light areas cannot be effectively extracted. Starting from the fusion of multi-source heterogeneous spatio-temporal data, to improve the extraction accuracy of urban built-up areas, an FCM-STP comprehensive index that integrates the spatio-temporal features of night-time lights and POIs is proposed, which can effectively use the spatio-temporal features of the night-time lights and POI data.

FCM is a classical fuzzy clustering algorithm, and the index uses it for the initial automatic classification to avoid errors introduced by manual intervention. Spatial neighbourhood information and time-series information can reduce noise and anomalous data. POI data have accurate location information and attribute information, which can be used to extract the low-light and no-light urban built-up areas. Finally, the comprehensive FCM-STP index is obtained by fusing the night-time spatial and temporal information with the POI data. The POI kernel density formula, the FCM objective function [20] and the FCM-STP composite index are given in formulae (2.1)–(2.3). The flow of the FCM-STP index calculation is shown in figure 2.

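$$ P_i = \frac{1}{n\pi R^2} \sum_{j=1}^{n} K_j \left(1 - \frac{D_{ij}^2}{R^2}\right)^2, \tag{2.1} $$

$$ J = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^m \, \|x_k - v_i\|^2 \tag{2.2} $$

and

$$ \text{FCM-STP} = \frac{\sum_{y=1}^{T} W_y \, (ZR_i / ZR)}{\sum_{j=1}^{c} \sum_{y=1}^{T} W_y \, (ZR_j / ZR)} \times P_i. \tag{2.3} $$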

Figure 2. FCM-STP index flow chart.

In formula (2.1), $K_j$ is the weight of data point $j$; $D_{ij}$ is the Euclidean distance between spatial point $i$ and data point $j$; $R$ is the bandwidth of the calculation rule area ($D_{ij} < R$); and $n$ is the number of data points $j$ in the calculation rule area.
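Formula (2.1) can be illustrated with a minimal sketch. The function name `poi_kernel_density`, the planar coordinate tuples and the default weight of 1.0 per POI are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
import math


def poi_kernel_density(point, pois, bandwidth, weights=None):
    """Kernel density P_i at `point`, following formula (2.1).

    point     -- (x, y) of the spatial point i
    pois      -- list of (x, y) POI coordinates (data points j)
    bandwidth -- R; only POIs with D_ij < R contribute
    weights   -- optional K_j per POI (assumed 1.0 when omitted)
    """
    if weights is None:
        weights = [1.0] * len(pois)
    r2 = bandwidth ** 2
    terms = []
    for (x, y), k in zip(pois, weights):
        d2 = (x - point[0]) ** 2 + (y - point[1]) ** 2  # D_ij squared
        if d2 < r2:
            terms.append(k * (1.0 - d2 / r2) ** 2)
    n = len(terms)  # number of data points inside the rule area
    if n == 0:
        return 0.0
    return sum(terms) / (n * math.pi * r2)
```

POIs outside the bandwidth are ignored, so a pixel with no nearby POI receives a density of zero under this sketch.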

In formula (2.2), $u_{ik}$ is the membership degree, which satisfies the constraints $\sum_{i=1}^{c} u_{ik} = 1$ $(1 \le k \le n)$ and $u_{ik} \ge 0$ $(1 \le i \le c,\ 1 \le k \le n)$; $\|x_k - v_i\|^2$ is the squared Euclidean distance between the pixel $x_k$ and the $i$th cluster centre $v_i$; $n$ is the total number of pixels in the image to be classified; $c$ is the number of cluster categories; and $m$ is the fuzzification parameter.
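As a rough illustration of how the objective in formula (2.2) is minimized, the sketch below runs the standard alternating FCM updates on one-dimensional pixel brightness values. The function name, the evenly spaced initial centres and the stopping tolerance are assumptions for this example; the paper does not state these details.

```python
def fuzzy_c_means(values, c=2, m=2.0, iters=100, tol=1e-6):
    """1-D fuzzy c-means: minimises J = sum_i sum_k u_ik^m * (x_k - v_i)^2."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: centres spread evenly over the data range.
    centres = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        u = []
        for x in values:
            d = [abs(x - v) for v in centres]
            if 0.0 in d:
                # Pixel coincides with a centre: share membership among hits.
                hits = d.count(0.0)
                row = [1.0 / hits if di == 0.0 else 0.0 for di in d]
            else:
                row = [1.0 / sum((di / dj) ** power for dj in d) for di in d]
            u.append(row)
        # Centre update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m.
        new_centres = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(len(values))]
            new_centres.append(sum(wk * x for wk, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centres, centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, u
```

Each row of the returned membership matrix sums to 1, which matches the constraint stated for $u_{ik}$.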

In formula (2.3), $ZR_i$ is the number of pixels belonging to the $i$th type in the spatial neighbourhood of the initial urban built-up areas classification result $J$ obtained from formula (2.2); $ZR$ is the number of pixels in the spatial neighbourhood; $W_y$ is the time weight of the $y$th frame in the $T$-frame night-time light data; and $P_i$ is the POI kernel density calculated from formula (2.1).
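A minimal sketch of formula (2.3) for a single pixel follows. It assumes that the neighbourhood counts $ZR_i$ are taken separately for each frame $y$, which the text does not state explicitly; the function and argument names are likewise hypothetical.

```python
def fcm_stp(neigh_labels_per_frame, time_weights, target_class, n_classes, poi_density):
    """FCM-STP value of one pixel, following formula (2.3).

    neigh_labels_per_frame -- for each frame y, the FCM class labels found in
                              the pixel's spatial neighbourhood (assumption:
                              counts are taken per frame)
    time_weights           -- W_y for each of the T frames
    target_class           -- class index i
    n_classes              -- c, the number of cluster categories
    poi_density            -- P_i from formula (2.1)
    """
    def weighted_share(cls):
        total = 0.0
        for labels, w in zip(neigh_labels_per_frame, time_weights):
            zr = len(labels)            # ZR: pixels in the neighbourhood
            zr_cls = labels.count(cls)  # ZR_cls: pixels of this class
            total += w * zr_cls / zr
        return total

    numerator = weighted_share(target_class)
    denominator = sum(weighted_share(j) for j in range(n_classes))
    return numerator / denominator * poi_density
```

Under this reading, the fraction is the time-weighted share of neighbouring pixels assigned to class $i$, which is then scaled by the POI kernel density.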

2.2. High precision human settlement index (STP-HSI)

The traditional human settlement index (HSI) helps in urban built-up areas extraction but has certain drawbacks, such as difficulty in distinguishing urban areas from bare soil and water bodies, and poor recognition of areas with little or no light [29,30]. Therefore, we made further improvements based on the HSI principle. The HSI is calculated from the night-time light remote sensing image and the NDVI together [31,32], and the original night-time light remote sensing image suffers from problems such as light spillover and noise, which affect the accuracy of the HSI to a certain extent. This paper proposes the STP-HSI, which better handles pixels that contain both built-up and non-built-up urban areas by fusing the spatial neighbourhood and time-series information of each pixel in the night-time light remote sensing image with the POI data, thus improving on the accuracy of the traditional human settlement index. The STP-HSI accounts for the limited greyscale information and poor noise immunity of individual pixels and for the effect of anomalous data on the extraction results. The original HSI, the NDVI and the improved high-precision human settlement index are given in formulae (2.4)–(2.6).

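$$ HSI = \frac{(1 - NDVI) + OLS_{nor}}{(1 - OLS_{nor}) + NDVI + OLS_{nor} \times NDVI}, \tag{2.4} $$

$$ NDVI = \frac{NIR - Red}{NIR + Red} \tag{2.5} $$

and

$$ \text{STP-HSI} = \frac{(1 - NDVI) + \dfrac{E - E_{min}}{E_{max} - E_{min}}}{\left(1 - \dfrac{E - E_{min}}{E_{max} - E_{min}}\right) + NDVI + \dfrac{E - E_{min}}{E_{max} - E_{min}} \times NDVI}, \tag{2.6} $$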

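where $OLS_{nor}$ is the night-time light index, $NIR$ and $Red$ are the near-infrared and red band values, $E$ is the FCM-STP index, and $E_{min}$ and $E_{max}$ are its minimum and maximum values.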
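The relationship between formulae (2.4)–(2.6) can be sketched per pixel as below: the STP-HSI has the same form as the HSI, with the night-time light term replaced by the min-max normalized FCM-STP index. The function names are illustrative, and the inputs are assumed to be scalar values for one pixel.

```python
def ndvi(nir, red):
    """Formula (2.5): normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def hsi(ndvi_value, light_norm):
    """Formula (2.4): human settlement index from NDVI and OLS_nor."""
    numerator = (1.0 - ndvi_value) + light_norm
    denominator = (1.0 - light_norm) + ndvi_value + light_norm * ndvi_value
    return numerator / denominator


def stp_hsi(ndvi_value, e, e_min, e_max):
    """Formula (2.6): HSI with the min-max normalized FCM-STP index E."""
    e_norm = (e - e_min) / (e_max - e_min)
    return hsi(ndvi_value, e_norm)
```

For example, a bright pixel with little vegetation (`hsi(0.1, 0.9)`) scores far higher than a dark, vegetated pixel (`hsi(0.8, 0.1)`), which is the contrast the index is designed to produce.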

2.3. Random forest feature database

In the random forest algorithm [33], the features in the feature database have a large impact on the classification accuracy. In this paper, 59 feature values were extracted, taking into account geographical, architectural and economic factors. They comprise one high-precision human settlement index (STP-HSI) extracted from multi-source remote sensing data, plus the 58 feature values of the feature database used for random forest built-up areas extraction from single-source remote sensing data: 56 texture features, 1 normalized vegetation index (NDVI) and 1 normalized building index (NDBI), as shown in table 1.

Table 1. Feature parameters of the built-up areas extracted from random forest based on multi-source remote sensing data.

| feature classification | name | indicator | number of feature values |
|---|---|---|---|
| texture features | mean | mean_b1,2,3,4,5,6,7 | 7 |
| texture features | variance | varian_b1,2,3,4,5,6,7 | 7 |
| texture features | contrast | contrast_b1,2,3,4,5,6,7 | 7 |
| texture features | homogeneity | homogeneity_b1,2,3,4,5,6,7 | 7 |
| texture features | dissimilarity | dissimilarity_b1,2,3,4,5,6,7 | 7 |
| texture features | correlation | correlation_b1,2,3,4,5,6,7 | 7 |
| texture features | entropy | entropy_b1,2,3,4,5,6,7 | 7 |
| texture features | second-moment | second-moment_b1,2,3,4,5,6,7 | 7 |
| feature index | normalized vegetation index | NDVI | 1 |
| feature index | normalized building index | NDBI | 1 |
| feature index | high precision human settlement index | STP-HSI | 1 |
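The composition of the 59-feature database in table 1 can be checked with a short sketch that enumerates the feature names: eight texture statistics over seven bands, plus the three indices. The helper name `build_feature_names` is hypothetical; the identifiers follow the table.

```python
# Texture statistics as named in table 1 (identifiers kept as given there).
TEXTURE_STATS = [
    "mean", "varian", "contrast", "homogeneity",
    "dissimilarity", "correlation", "entropy", "second-moment",
]
BANDS = range(1, 8)  # bands b1..b7


def build_feature_names():
    """Return the 56 texture feature names followed by the three indices."""
    texture = [f"{stat}_b{band}" for stat in TEXTURE_STATS for band in BANDS]
    return texture + ["NDVI", "NDBI", "STP-HSI"]
```

Dropping the final entry gives the 58-feature database used for the single-source (Landsat 8) experiments.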

3. Experiment and discussion

To broadly evaluate the method proposed in this paper, it was applied to three Chinese cities: Guangzhou, Xiamen and Nanjing. The cities were selected to cover different geographical locations and climatic conditions across China. The overall accuracy (calculated from the confusion matrix) and the kappa coefficient were used to evaluate the model accuracy. In general, when the overall accuracy is close to 1 and the kappa coefficient lies in [0.6, 1], the classification results are highly consistent with the actual results and the classification accuracy is high. When the kappa coefficient lies in [0.4, 0.6], the classification result is moderately consistent with the actual result and the classification accuracy is average.
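Both evaluation measures can be derived from a confusion matrix as in the sketch below, which uses the standard definitions of overall accuracy and Cohen's kappa. The function names and the example matrix are illustrative, and assigning a kappa of exactly 0.6 to the "high" band is an assumption, since the two ranges quoted above share that endpoint.

```python
def overall_accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    cm[i][j] is the number of samples of reference class i predicted as j.
    """
    size = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(size)) / total
    # Chance agreement from the row (reference) and column (predicted) totals.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(size)
    ) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


def agreement_level(kappa):
    """Map kappa onto the consistency bands described in the text."""
    if kappa >= 0.6:
        return "high"
    if kappa >= 0.4:
        return "moderate"
    return "below the ranges discussed in the text"
```

For a hypothetical two-class matrix `[[45, 5], [10, 40]]`, the overall accuracy is 0.85 and the kappa coefficient is 0.7.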

3.1. Datasets and data preprocessing

The datasets used in this experiment were the night-time light remote sensing data, Landsat 8 OLI remote sensing data and POI data, and the datasets were preprocessed accordingly.

3.1.1. Night-time light remote sensing data

Two kinds of night-time light remote sensing data, Luojia-1 and NPP-VIIRS, were selected to evaluate the extraction accuracy of the different night-time light data in the built-up areas.

The ‘Luojia-1’ satellite is widely used in urban built-up areas extraction [34]. Its imaging mode is a frame pushing and sweeping imaging mode, and the sampling interval is 5 s per frame; therefore, there will be multiple night light remote sensing images taken continuously in the same areas on the same day. The main parameters of ‘Luojia-1’ are as shown in table 2.

Table 2. Luojia-1 night-time light remote sensing satellite parameters.

| parameter items | indicators |
|---|---|
| time series | June 2018–present |
| resolution (m) | 100–150 |
| nominal height of track (km) | 500–600 |
| positioning accuracy (m) | better than 700 |
| number of image elements | 2048 × 2048 |
| SNR (dB) | better than 35 |
| imaging spectrum (nm) | full colour 480–800 |
| revisit time (days) | 15 |

The NPP satellite is a satellite launched by the United States to detect the Earth's environment; it carries a VIIRS sensor with high sensitivity and captures images in both day and night modes. Day/Night Band (DNB) is one of the bands in NPP-VIIRS and is mainly used to detect night light information.

Night-time light remote sensing data preprocessing mainly includes reprojection, resampling and image cropping. The Luojia-1 night-time light remote sensing image is resampled to 50 m, which is consistent with the spatial resolution of the POI kernel density map. Image cropping is performed using Chinese administrative boundary vector data to obtain the two types of night-time light remote sensing images for the desired city. The Luojia-1 night-time light remote sensing data are additionally radiometrically corrected using the following expression:

$$ NTL_{\text{Luojia1-01}} = DN^{3/2} \times 10^{-10}, \tag{3.1} $$

where $DN$ and $NTL_{\text{Luojia1-01}}$ are the image brightness values before and after radiometric correction, respectively.
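Formula (3.1) is a per-pixel power-law conversion, sketched below for a single digital number. The function name is hypothetical, and the exponent and scale factor follow the commonly published Luojia-1 conversion; no units are attached because the text does not state them.

```python
def luojia1_radiance(dn):
    """Radiometric correction of a Luojia-1 digital number, formula (3.1)."""
    return dn ** 1.5 * 1e-10
```

Because the exponent is greater than 1, bright pixels are stretched more strongly than dim ones; a digital number of 0 stays at 0.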

3.1.2. Landsat 8 Operational Land Imager remote sensing data

Landsat 8 belongs to one of the longest-running Earth observation satellite programmes, which provides a large amount of Earth observation data, and its products can usually be applied directly without requiring geometric corrections. To use the rich spectral information of Landsat 8 remote sensing data and to best retain the information in the original spectrum, radiometric calibration and image cropping must be performed [35–37].

3.1.3. Point-of-interest data

POI data exists in point form, and each point-of-interest datum represents a geographical object. It has high data accuracy, timeliness and rich information, so many scholars also apply POI data to the field of urban planning [38]. In this paper, the POI data of Guangzhou city (2018), Nanjing city (2018) and Xiamen city (2018) were used in our experiments, and the POI kernel density map of Guangzhou is shown in figure 3.

Figure 3. POI kernel density of Guangzhou.

3.2. FCM-STP night-time light index results and analysis

To demonstrate the superiority of the FCM-STP index, night-time light remote sensing data from Luojia-1 and NPP-VIIRS were applied. The following experimental results use Luojia-1 as an example. The FCM index method and the FCM-STP index method are compared with the reference built-up areas in figures 4–6 (red represents the real data areas, and white represents the index extraction areas). The extraction results show that the proposed FCM-STP index is, as a whole, the closest match to the reference built-up areas: the light spillover problem is reduced, and so is the influence of lighting data noise on the extraction of built-up areas. The FCM-STP index can therefore improve the accuracy of built-up areas extraction, and the consistent behaviour across the three cities suggests that it is widely applicable.

Figure 4. Comparison of the two algorithms based on the referenced built-up areas for Guangzhou city. (a) FCM index. (b) FCM-STP index.

Figure 5. Comparison of the two algorithms based on the referenced built-up areas for Nanjing city. (a) FCM index. (b) FCM-STP index.

Figure 6. Comparison of the two algorithms based on the referenced built-up areas for Xiamen city. (a) FCM index. (b) FCM-STP index.

3.3. Comparison analysis of built-up areas extracted from single-source and multi-source remote sensing data

In this paper, experiments in three regions are given to compare the accuracy of the built-up areas extraction models based on single-source and multi-source remote sensing data, as shown in figures 7–9, in which the yellow areas indicate the model-extracted urban built-up areas and the green areas indicate the model-extracted non-urban built-up areas. The results show that the built-up areas map extracted from single-source remote sensing data is generally more patchy and more fragmented, and has larger internal gaps, than the map extracted from multi-source remote sensing data, and that much information is lost in the single-source extraction process.

Figure 7. Extraction results of built-up areas based on single-source remote sensing data from Landsat 8. (a) Guangzhou. (b) Nanjing. (c) Xiamen.

Figure 8. Extraction results of built-up areas based on multi-source remote sensing data from Luojia-1, POI and Landsat 8. (a) Guangzhou. (b) Nanjing. (c) Xiamen.

Figure 9. Extraction results of built-up areas based on the multi-source remote sensing data of NPP-VIIRS, POI and Landsat 8. (a) Guangzhou. (b) Nanjing. (c) Xiamen.

The kappa coefficients and the overall accuracy (calculated from the confusion matrix) of the built-up areas obtained for the three test cities are shown in tables 3 and 4.

Table 3. Evaluation indices of the built-up areas extraction model based on single-source remote sensing data (Landsat 8).

| city | overall accuracy (%) | kappa coefficient |
|---|---|---|
| Guangzhou | 96.93 | 0.94 |
| Nanjing | 94.62 | 0.89 |
| Xiamen | 94.37 | 0.89 |

Table 4. Evaluation indices of the built-up areas extraction model based on multi-source remote sensing data.

| city | overall accuracy (%): Luojia-1, POI, Landsat 8 | overall accuracy (%): NPP-VIIRS, POI, Landsat 8 | kappa coefficient: Luojia-1, POI, Landsat 8 | kappa coefficient: NPP-VIIRS, POI, Landsat 8 |
|---|---|---|---|---|
| Guangzhou | 97.77 | 98.40 | 0.96 | 0.97 |
| Nanjing | 95.07 | 93.72 | 0.90 | 0.87 |
| Xiamen | 96.54 | 96.00 | 0.93 | 0.92 |

Comparing the results of the built-up areas extraction model with the single-source and multi-source remote sensing data, it is clearly seen from the accurate evaluation of each group in tables 3 and 4 that the accuracy of the built-up areas model extraction for multi-source remote sensing data is higher than that of the built-up areas model extraction for single-source remote sensing data. The results of the built-up areas extracted by the two methods are used to calculate two evaluation indices, the overall accuracy and the kappa coefficient. The overall accuracy in Guangzhou was improved by up to 1.47%, and the kappa coefficient is improved by up to 3%. The overall accuracy in Nanjing was improved by up to 0.45%, and the kappa coefficient was improved by up to 0.91%. The overall accuracy in Xiamen was improved by up to 2.17%, and the kappa coefficient was improved by up to 4%. In the four groups of model training, the overall accuracy of the results of the built-up areas extraction model training based on multi-source remote sensing data was a minimum of 93.72% and a maximum of 98.40%, the kappa coefficient was a minimum of 0.87 and a maximum of 0.97, the kappa coefficients were all greater than 0.8, and the classification results were almost identical to the actual results. However, the overall accuracy of the training results of the built-up areas extraction model based on single-source remote sensing data was 94.37% at the minimum and 96.93% at the maximum, and the kappa coefficient was 0.89 at the minimum and 0.94 at the maximum. Both indices were lower than the training results of the built-up areas extraction model based on multi-source remote sensing data.
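The "up to" improvements quoted for each city can be reproduced from the published values as in the sketch below, which takes the larger of the two multi-source gains over the Landsat 8 baseline. The dictionaries simply transcribe tables 3 and 4; the names `SINGLE`, `MULTI` and `best_gain` are illustrative. Because the tables are rounded to two decimals, small deviations from figures quoted in the text are possible.

```python
# (overall accuracy in %, kappa) transcribed from tables 3 and 4.
SINGLE = {
    "Guangzhou": (96.93, 0.94),
    "Nanjing": (94.62, 0.89),
    "Xiamen": (94.37, 0.89),
}
MULTI = {
    "Guangzhou": {"Luojia-1": (97.77, 0.96), "NPP-VIIRS": (98.40, 0.97)},
    "Nanjing": {"Luojia-1": (95.07, 0.90), "NPP-VIIRS": (93.72, 0.87)},
    "Xiamen": {"Luojia-1": (96.54, 0.93), "NPP-VIIRS": (96.00, 0.92)},
}


def best_gain(city):
    """Largest gain of any multi-source result over the single-source one.

    Returns (overall accuracy gain in percentage points, kappa gain).
    """
    base_oa, base_kappa = SINGLE[city]
    oa_gain = max(oa - base_oa for oa, _ in MULTI[city].values())
    kappa_gain = max(k - base_kappa for _, k in MULTI[city].values())
    return round(oa_gain, 2), round(kappa_gain, 2)
```

The overall accuracy gains are differences between percentages, so they are best read as percentage points.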

3.4. A comparison of random forest-based multi-source extraction methods and support vector machine algorithms

To evaluate the superiority of the random forest-based multi-source extraction algorithm and obtain more accurate simulation results of urban construction land, three cities in China (Guangzhou, Xiamen and Nanjing) are sampled, and two kinds of night light remote sensing data, Luojia-1 and NPP-VIIRS, are used as data sources for 12 sets of comparison experiments. Simultaneously, the urban built-up areas model based on the random forest method and support vector machine (SVM) simulates 14 sets of experiments in three cities, and after sample validation, the different results produced by applying different lighting data to the same method can be seen in figure 10. Whether random forest extraction or support vector machine extraction is applied, the patches from the Luojia-1 data are relatively fragmented, while in the NPP-VIIRS results the main urban area forms patches with fragmented boundaries and some urban boundary information is missing.

Figure 10. Comparison results of built-up areas extractions based on support vector machine and random forest. (a) Random forest algorithm. (b) SVM algorithm.

Additionally, to evaluate the accuracy of both algorithms, the kappa coefficients and the overall accuracy (calculated from the confusion matrix) of the built-up areas obtained for the three test cities are shown in table 5.

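Table 5. Evaluation indices of the built-up areas extraction models based on random forest (RF) and SVM. Luojia-1 denotes the Luojia-1, POI and Landsat 8 dataset; NPP-VIIRS denotes the NPP-VIIRS, POI and Landsat 8 dataset.

| city | overall accuracy (%): Luojia-1, RF | overall accuracy (%): Luojia-1, SVM | overall accuracy (%): NPP-VIIRS, RF | overall accuracy (%): NPP-VIIRS, SVM | kappa: Luojia-1, RF | kappa: Luojia-1, SVM | kappa: NPP-VIIRS, RF | kappa: NPP-VIIRS, SVM |
|---|---|---|---|---|---|---|---|---|
| Guangzhou | 97.77 | 89.48 | 98.40 | 93.35 | 0.96 | 0.78 | 0.97 | 0.86 |
| Nanjing | 95.07 | 84.72 | 93.72 | 86.57 | 0.90 | 0.69 | 0.87 | 0.73 |
| Xiamen | 96.54 | 91.77 | 96.00 | 91.77 | 0.93 | 0.83 | 0.92 | 0.83 |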

Comparing the results of the training of the urban built-up areas extraction model of the random forest algorithm and SVM algorithm, it is clearly seen from the accuracy evaluation of each group that the accuracy of the urban built-up areas extracted by the random forest model is higher than that of the SVM model. The overall accuracy and kappa coefficient were calculated for the results of the built-up areas extracted by the two methods. In 14 groups of model training, the overall accuracy of the SVM algorithm-based urban built-up areas extraction model was 84.72% at the minimum and 93.35% at the maximum, and the kappa coefficient was 0.69 at the minimum and 0.86 at the maximum. The overall accuracy of the results of the random forest algorithm used in this paper is 93.72% at the minimum and 98.40% at the maximum; the kappa coefficient was 0.87 at the minimum and 0.97 at the maximum. Compared with the urban built-up areas extraction model of the SVM algorithm, the results of the random forest algorithm show that the overall accuracy in Guangzhou city was improved by up to 8.29%, and the kappa coefficients were improved by up to 0.18. The overall accuracy in Nanjing was improved by up to 10.35%, and the kappa coefficient was improved by up to 0.21. The overall accuracy in Xiamen was improved by up to 4.77%, and the kappa coefficient was improved by up to 0.10. It can be considered that the method proposed in this paper is better in terms of effect.

3.5. Contribution rates analysis of different features

A contribution rate analysis selects some of the feature values, in a dataset with many feature values, that have a large impact on the results by building the model. In this paper, random forest was used to calculate the contribution of the two indices to urban built-up areas extraction. Also, in this paper, the STP-HSI index was presented for urban built-up areas extraction using multi-source remote sensing data. This index has a high contribution rate for the classification of built-up areas under different environmental conditions.

3.5.1. Comparison of the HSI and STP-HSI contributions

To compare the contributions of the HSI and STP-HSI based on the random forest algorithm, both indices were selected for the construction of the database in this experiment, and the final contribution rates obtained are shown in table 6. The STP-HSI of Guangzhou, Nanjing and Xiamen is compared with the traditional HSI in figures 11–13.

Figure 11. Comparison of HSI and STP-HSI in Guangzhou based on different night-time light data. (a) Luojia-1. (b) NPP-VIIRS.

Figure 12. Comparison of HSI and STP-HSI in Nanjing on different night-time light data. (a) Luojia-1. (b) NPP-VIIRS.

Figure 13. Comparison of HSI and STP-HSI in Xiamen on different night-time light data. (a) Luojia-1. (b) NPP-VIIRS.

Table 6. Comparison of the contribution rates between HSI and STP-HSI.

| city | HSI (Luojia-1, POI, Landsat 8) | STP-HSI (Luojia-1, POI, Landsat 8) | HSI (NPP-VIIRS, POI, Landsat 8) | STP-HSI (NPP-VIIRS, POI, Landsat 8) |
|---|---|---|---|---|
| Guangzhou | 0.4817 | 0.5184 | 0.4813 | 0.5187 |
| Nanjing | 0.3713 | 0.6287 | 0.4304 | 0.5696 |
| Xiamen | 0.4126 | 0.5874 | 0.4618 | 0.5382 |

As seen from the comparative plots obtained from the four groups, the range of HSI obtained from the raw night-time light remote sensing data is larger, and the contours of the STP-HSI range are more pronounced. The contribution of the STP-HSI index is generally higher than the contribution of the HSI, as shown by the contribution indicators in table 6. The STP-HSI index contribution was a minimum of 0.5184 and a maximum of 0.6287, while the HSI index contribution was a minimum of 0.3713 and a maximum of 0.4817. In Guangzhou, it improved by up to 3.74%, in Nanjing by up to 25.74% and in Xiamen by up to 17.48%. This reflects the universality of the STP-HSI index and further shows that the FCM-STP index proposed in this paper is useful in increasing the boundary between built and unbuilt areas and reducing light spillover.

3.5.2. Contribution of feature values

To extract the urban built-up areas more accurately with the random forest algorithm, this paper treats the features whose contribution rate is greater than 0.01 as the important features of the feature database for the built-up area extraction model. The feature contribution rates are shown in figure 14.
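The selection rule described above amounts to a simple threshold filter followed by a ranking. A minimal sketch is given below; the function name `select_important` and the rates in `example` are placeholders chosen for illustration and are not the values behind figure 14 (`texture_a` and `texture_b` are hypothetical feature names).

```python
IMPORTANCE_THRESHOLD = 0.01  # cut-off stated in the text


def select_important(contributions, threshold=IMPORTANCE_THRESHOLD):
    """Return (name, rate) pairs with rate above the threshold, largest first."""
    kept = [(name, rate) for name, rate in contributions.items() if rate > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Placeholder contribution rates, NOT results from the paper.
example = {
    "NDVI": 0.20,
    "STP-HSI": 0.30,
    "texture_a": 0.008,
    "NDBI": 0.15,
    "Mean_b1": 0.05,
    "texture_b": 0.002,
}

for name, rate in select_important(example):
    print(name, rate)
```

Features at or below the threshold are dropped from the ranking, so only the retained features would appear in a plot such as figure 14.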

Figure 14.

Contribution rates of different features. (a) Guangzhou. (b) Nanjing. (c) Xiamen.

In the feature contribution graph, each panel contains nine features: STP-HSI, NDVI, NDBI, Mean_b1, Mean_b2, Mean_b4, Mean_b5, Mean_b6 and Mean_b7. This indicates that STP-HSI, the normalized difference vegetation index, the normalized difference built-up index and the mean values of the image band texture features have the greatest contribution rates to the extraction of urban built-up areas. The STP-HSI ranks first in contribution among the 59 features, indicating that the human settlement index proposed in this paper is closely linked with built-up areas and is an important factor in their extraction.

4. Conclusion

Remote sensing satellite data offer many new methods for urban built-up area extraction. The main objective of this paper was to develop a new method that improves the accuracy of built-up area extraction. To this end, a new STP-HSI index method for urban built-up area extraction based on multi-source remote sensing data was designed. Considering the shortcomings of the existing methods, an FCM-STP index was first proposed, which effectively fuses the spatial and temporal information of night-time light with POI attribute information to reduce the light spillover of single-source night-time light data. An STP-HSI was then calculated from the FCM-STP index and added to the random forest feature database to optimize the extraction results for urban built-up areas. The experiments show that the contribution of the STP-HSI is improved by up to 25.74% compared with the HSI. With the random forest method, extracting built-up areas from multi-source remote sensing data was better than extracting them from single-source remote sensing data, with better overall results and higher model accuracy. Therefore, the application of this method to the extraction of urban built-up areas is of practical significance.

Data accessibility

Data are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.qz612jmhr [39].

Authors' contributions

L.B.: conceptualization, data curation, investigation, methodology, project administration; D.D.: methodology, software, writing—review and editing; L.T.: methodology, writing—original draft; Z.Z.: formal analysis, funding acquisition; M.D.: data curation, resources; X.X.: investigation.

All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

Conflict of interest declaration

We declare we have no competing interests.

Funding

This research was funded by the National Key R&D Program of China (grant no. 2020YFA0713503), the Natural Science Foundation of Hunan Province (grant no. 2022JJ30561), the Project of Hunan Provincial Natural Resources Department (grant no. 2022-15), the Natural Science Foundation of Liaoning Province (grant no. 2020-BS-259) and the Hunan Provincial Innovation Foundation for Postgraduate (grant nos. QL20220161; XDCX2022L024).

References

  1. He C, Shi P, Xie D, Zhao Y. 2010. Improving the normalized difference built-up index to map urban built-up areas using a semiautomatic segmentation approach. Remote Sens. Lett. 1, 213-221. (doi:10.1080/01431161.2010.481681)
  2. Wang L, Zhu J, Xu Y, Wang Z. 2018. Urban built-up area boundary extraction and spatial-temporal characteristics based on land surface temperature retrieval. Remote Sens. 10, 473. (doi:10.3390/rs10030473)
  3. Saravanabavan V, et al. 2020. Urban disease ecology and its spatial variation of Chikungunya in Madurai City, Tamilnadu, India: a geo-medical study. GeoJournal 86, 2335-2350. (doi:10.1007/s10708-020-10192-6)
  4. Chaudhuri D, Kushwaha NK, Samal A, Agarwal RC. 2016. Automatic building detection from high-resolution satellite images based on morphology and internal gray variance. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9, 1767-1779. (doi:10.1109/JSTARS.2015.2425655)
  5. Pesaresi M, et al. 2013. A global human settlement layer from optical HR/VHR RS data: concept and first results. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 6, 2102-2131. (doi:10.1109/JSTARS.2013.2271445)
  6. Goldblatt R, et al. 2018. Using Landsat and nighttime lights for supervised pixel-based image classification of urban land cover. Remote Sens. Environ. 205, 253-275. (doi:10.1016/j.rse.2017.11.026)
  7. Shiping Y, Yuzhang W, Xiaocan Z. 2007. An automated method to built-up area extraction from quickbird imagery. In Proc. of 2007 Int. Symp. on Computer Science and Technology (ISCST'2007), pp. 377-380. Atlanta, GA: The American Scholars Press.
  8. Jun Z, Peijun L, Jinfei W. 2014. Urban built-up area extraction from landsat TM/ETM+ images using spectral information and multivariate texture. Remote Sens. 6, 7339-7359. (doi:10.3390/rs6087339)
  9. Bhatti SS, Tripathi NK. 2014. Built-up area extraction using Landsat 8 OLI imagery. GIScience Remote Sens. 51, 445-467. (doi:10.1080/15481603.2014.939539)
  10. Duque JC, Lozano-Gracia N, Patino JE, Restrepo P, Velasquez WA. 2019. Spatiotemporal dynamics of urban growth in Latin American cities: an analysis using night-time light imagery. Landsc. Urban Plan. 191, 103640. (doi:10.1016/j.landurbplan.2019.103640)
  11. Shi K, Huang C, Yu B, Yin B, Huang Y, Wu J. 2014. Evaluation of NPP-VIIRS night-time light composite data for extracting built-up urban areas. Remote Sens. Lett. 5, 358-366. (doi:10.1080/2150704X.2014.905728)
  12. Liu Q, Zhan Q, Li J, Yang C, Liu W. 2021. Extracting built-up areas using Luojia-1A nighttime light imageries in Wuhan, China. Geomat. Inf. Sci. Wuhan Univ. 46, 30-39. (doi:10.13203/j.whugis20190376)
  13. Zhang Y, Li X, Song Y, Li C. 2020. Urban spatial form analysis of GBA based on ‘LJ1-01’ night-time light remote sensing images. J. Appl. Sci. 38, 466-477. (doi:10.3969/j.issn.0255-8297.2020.03.012)
  14. Pandey B, Joshi PK, Seto KC. 2013. Monitoring urbanization dynamics in India using DMSP/OLS night time lights and SPOT-VGT data. Int. J. Appl. Earth Obs. Geoinf. 23, 49-61. (doi:10.1016/j.jag.2012.11.005)
  15. Zhuo L, Ichinose T, Zheng J, Chen J, Shi PJ, Li X. 2009. Modelling the population density of China at the pixel level based on DMSP/OLS non-radiance-calibrated night-time light images. Int. J. Remote Sens. 30, 1003-1018. (doi:10.1080/01431160802430693)
  16. Tan M, Liu K, Liu L, Zhu Y, Wang D. 2017. Spatialization of population in the Pearl River Delta in 30 m grids using random forest model. Prog. Geogr. 36, 1304-1312. (doi:10.18306/dlkxjz.2017.10.012)
  17. Sharma R, Tateishi R, Hara K, Gharechelou S, Iizuka K. 2016. Global mapping of urban built-up areas of year 2014 by combining MODIS multispectral data with VIIRS night-time light data. Int. J. Digit. Earth 9, 1-17. (doi:10.1080/17538947.2016.1168879)
  18. Guo SS, Gong J, Yin JF. 2016. Study on grid refinement for population distribution based on DMSP/OLS. J. Seismol. Res. 39, 321-326. (doi:10.3969/j.issn.1000-0666.2016.02.020)
  19. Wu J, Xu D, Xie W, Peng J. 2015. Spatialization of demographic data at medium scale based on remote sensing images; regarding Beijing-Tianjin-Hebei as an example. Acta Sci. Nat. Univ. Pekin. 51, 707-717. (doi:10.13209/j.0479-8023.2015.100)
  20. Yang XC, Yue WZ, Gao DW. 2013. Spatial improvement of human population distribution based on multi-sensor remote-sensing data: an input for exposure assessment. Int. J. Remote Sens. 34, 5569-5583. (doi:10.1080/01431161.2013.792970)
  21. Sun W, Zhang X, Wang N, Cen Y. 2017. Estimating population density using DMSP-OLS night-time imagery and land cover data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 10, 2674-2684. (doi:10.1109/JSTARS.2017.2703878)
  22. Sirmacek B, Unsalan C. 2010. Urban area detection using local feature points and spatial voting. IEEE Geosci. Remote Sens. Lett. 7, 146-150. (doi:10.1109/LGRS.2009.2028744)
  23. Li F, Yan Q, Bian Z, Liu B, Wu Z. 2020. A POI and LST adjusted NTL urban index for urban built-up area extraction. Sensors 20, 2918. (doi:10.3390/s20102918)
  24. Li C, Wang X, Wu Z, Dai Z, Yin J, Zhang C. 2021. An improved method for urban built-up area extraction supported by multi-source data. Sustainability 13, 5042-5042. (doi:10.3390/SU13095042)
  25. Bramhe VS, Ghosh SK, Garg PK. 2018. Extraction of built-up areas using convolutional neural networks and transfer learning from Sentinel-2 satellite images. In Proc. of the ISPRS Technical Commission III Midterm Symp. on Developments, Technologies and Applications in Remote Sensing. ISPRS Technical Commission III on Remote Sensing. (doi:10.3390/rs10030473)
  26. Pal M, Mather PM. 2005. Support vector machines for classification in remote sensing. Int. J. Remote Sens. 26, 1007-1011. (doi:10.1080/01431160512331314083)
  27. Pelizari PA, Spröhnle K, Geiß C, Schoepfer E, Plank S, Taubenböck H. 2018. Multi-sensor feature fusion for very high spatial resolution built-up area extraction in temporary settlements. Remote Sens. Environ. 209, 793-807. (doi:10.1016/J.RSE.2018.02.025)
  28. He X, Zhang Z, Yang Z. 2021. Extraction of urban built-up area based on the fusion of night-time light data and point of interest data. R. Soc. Open Sci. 8, 210838. (doi:10.5194/ISPRS-ARCHIVES-XLII-3-79-2018)
  29. Bitam A, Ameur S. 2013. A local-spectral fuzzy segmentation for MSG multispectral images. Int. J. Remote Sens. 34, 8360-8372. (doi:10.1080/01431161.2013.838707)
  30. Lu D, Tian H, Zhou G, Ge H. 2015. Regional mapping of human settlements in southeastern China with multisensor remotely sensed data. Remote Sens. Environ. 112, 3668-3679. (doi:10.1016/j.rse.2008.05.009)
  31. Matsushita B, Fan Y, Fukushima T. 2014. Impervious surface area as an indicator for evaluating drainage basins. In Integrative observations and assessments (eds Nakano S, Yahara T, Nakashizuka T), pp. 239-252. Ecological Research Monographs. Tokyo, Japan: Springer. (doi:10.1007/978-4-431-54783-9_12)
  32. Breiman L. 2001. Random forests. Mach. Learn. 45, 5-32. (doi:10.1023/A:1010933404324)
  33. Tucker CJ. 1979. Red and photographic infrared linear combinations for monitoring vegetation. Rem. Sens. Environ. 8, 127-150. (doi:10.1016/0034-4257(79)90013-0)
  34. Zhu Z, et al. 2013. Global data sets of vegetation leaf area index (LAI) 3g and fraction of photosynthetically active radiation (FPAR) 3g derived from global inventory modeling and mapping studies (GIMMS) normalized difference vegetation index (NDVI3g) for the period 1981 to 2011. Rem. Sens. 5, 927-948. (doi:10.3390/rs5020927)
  35. Zhai W, Han B, Cheng C. 2020. Evaluation of Luojia 1–01 nighttime light imagery for built-up urban area extraction: a case study of 16 cities in China. IEEE Geosci. Remote Sens. Lett. 17, 1802-1806. (doi:10.1109/LGRS.2019.2955496)
  36. Zhong P, Wang R. 2007. A multiple conditional random fields ensemble model for urban area detection in remote sensing optical images. IEEE Trans. Geosci. Remote Sens. 45, 3978-3988. (doi:10.1109/TGRS.2007.907109)
  37. Schaaf CB, et al. 2002. First operational BRDF, albedo nadir reflectance products from MODIS. Remote Sens. Environ. 83, 135-148. (doi:10.1016/S0034-4257(02)00091-3)
  38. Wu H, Lin A, Xing X, Song D, Li Y. 2021. Identifying core driving factors of urban land use change from global land cover products and POI data using the random forest method. Int. J. Appl. Earth Obs. Geoinf. 103, 102475. (doi:10.1016/j.jag.2021.102475)
  39. Bu L, Dai D, Tu L, Zhang Z, Deng M, Xie X. 2022. Data from: An STP-HSI index method for urban built-up area extraction based on multi-source remote sensing data. Dryad Digital Repository. (doi:10.5061/dryad.qz612jmhr)


Articles from Royal Society Open Science are provided here courtesy of The Royal Society
