A 30-meter resolution national urban land-cover dataset of China, 2000–2015

Accurate urban land-cover datasets are essential for mapping urban environments. However, a series of national urban land-cover data covering more than 15 years that characterizes urban environments is relatively rare. Here we propose 10 a hierarchical principle on remotely sensed urban land-use/cover classification for mapping intra-urban structure/component dynamics. China’s Land Use/cover Dataset (CLUD) is updated, delineating the imperviousness, green surface, waterbody and bare land conditions in cities. A new data subset called CLUD-Urban is created from 2000 to 2015 at five-year intervals with a medium spatial resolution (30m). The first step is a prerequisite to extract the vector boundaries covered with urban areas from CLUD. A new method is then proposed using logistic regression between urban impervious surface area (ISA) 15 and the annual maximum Normalized Difference Vegetation Index (NDVI) value retrieved from Landsat images based on a big-data platform with Google Earth Engine. National ISA and urban green space (UGS) fraction datasets for China are generated at 30-meter resolution with five-year intervals from 2000 to 2015. The overall classification accuracy of national urban areas is 92%. The root mean square error values of ISA and UGS fractions are 0.10 and 0.14, respectively. The datasets indicate that the total urban area of China was 6.28×10 km in 2015, with average fractions of 70.70% and 26.54% 20 for ISA and UGS, respectively. The ISA and UGS increased between 2000 and 2015 with unprecedented annual rates of 1,311.13 km/yr and 405.30 km/yr, respectively. CLUD-Urban can be used to enhance our understanding of urbanization impacts on ecological and regional climatic conditions and urban dwellers’ environments. CLUD-Urban can be applied in future researches on urban environmental research and practices in the future. The datasets can be downloaded from https://doi.org/10.5281/zenodo.2644932 (Kuang et al., 2019). 25

water flows within the city (Kuang et al., 2016). Urban bare land indicates land not covered by vegetation, water, buildings, or roads in cities (Li et al., 2016). UGS and waterbody provide significant, good influences on urban environments (Hamdi and Schayes, 2008), but most urban classification products tend to exclude these two components.
First, we extracted urban areas according to the boundaries established in CLUD. In medium-high resolution images, urban areas are reflected by the composite characteristics of ISA and UGS, and the spectral characteristics of different urban 5 land uses are not consistent (Fig. 1). For example, the image characteristics of Suzhou old city and new city are different in Landsat 8 OLI images. Because buildings in the old area of the city are distributed compactly, their color in remote sensing images is relatively dark. The new area of the city is dominated by industrial land and was planned well. The city is relatively sprawled out and its features are distinct in the images. With prior knowledge of image classification and humancomputer visual interpretation, we extracted urban land in Suzhou by detecting the city's boundaries. 10

Pixel-based relationship between ISA and NDVI
A negative correlation between normalized difference vegetation index (NDVI) and ISA fraction was found at the pixel level (Kuang et al., 2016), and this relationship has since been applied in remotely sensed ISA estimation. In arid regions, however, percentage of vegetation cover is seasonally dependent (Lu et al., 2008). If NDVI is calculated by using 15 only one image, some UGS might be interpreted as bare soil. The fusion of NDVI data in one year may help to improve the accuracy of vegetation characterization. Therefore, multi-interval NDVI data were employed to generate an annual NDVI maximum value. The ISA fraction was estimated through the corresponding empirical model. We calculated the relationship between ISA percentage and NDVI value and plotted an ISA percentage as a function of the NDVI map (Fig. 2). According to the statistical results, the negative correlation between ISA percentage and NDVI 20 value does not fit well in a linear regression relationship. Under the linear regression assumption, there is an overestimation of ISA in the low-value range and underestimation of ISA in the high-value range. The logistic regression model (LRM) was selected for fitting (Walker and Duncan, 1967). Using this model, the ISA percentage and the maximum annual NDVI value fit better in high-value and low-value accumulation areas. In addition, the input parameters required by logistic regression-ISA classification data and NDVI maximum data-can be obtained through existing methods and datasets (Weng, 2012;25 Gorelick et al., 2017). This relationship can be performed for data analysis. We conducted data analysis as follows: First, the annual NDVI maximum value and ISA classification data were retrieved from Landsat images. Second, the parameters of the logistic regression model were estimated. Third, the annual NDVI maximum value was used as input data to estimate the ISA fraction at the pixel level using the developed LRM as follows: where formula (1) refers to LRM, and refer to the parameters of LRM and is the annual NDVI maximum value.

Calculation of NDVI with Google Earth Engine
The annual NDVI maximum value is calculated from all NDVI images within one year using the maximum algorithm. 5 During the process of combination, the impact of seasonal changes on classification is removed (Lu et al., 2008). Therefore, the vegetation percentage can be accurately extracted. The formula to calculate the annual NDVI maximum value is as follows: where is the annual NDVI maximum value and is the NDVI value of the i th image. 10 We selected 30-m resolution Landsat images for calculating NDVI. All images were collected in Google Earth Engine (GEE) (Gorelick et al., 2017). GEE is a big-data processing platform for classification of multitemporal satellite imagery and geospatial data analysis. It integrates common remote sensing data such as Landsat, MODIS, Sentinel, DMSP-OLS, as well as thematic data such as population distribution and climate data. GEE performs by visual operation interface and analyzes big data through JavaScript codes and functions for big data analyzes. In this study, all Landsat 5/7 images in 2000, 2005 and15 2010 in urban areas and all Landsat 8 images in 2015 were selected to calculate the maximum value of NDVI in a given year.
NDVI data were mainly distributed in the interval between 0 and 1 and stored in float format. Since this format is not conducive to data storage, processing and analysis, NDVI data were amplified by 1,000 and converted into integer format.

Extraction of ISA type for calibration
Urban ISA type was retrieved by linear spectral unmixing methods (Wu and Murray. 2003;Lu et al. 2014). The method 20 was mainly divided into the following steps: First, based on the first three components using the minimum noise fraction (MNF) transformation, four endmembers-high-albedo surfaces (such as plazas, roofs), low-albedo surfaces (such as water, shadow), vegetation and bare soil-were selected. The spectral unmixing method was employed to unmix the Landsat multispectral bands into the four endmembers. A decision tree was built to classify the high-albedo surfaces, low-albedo surfaces, water, vegetation and bare soil based on the fractions after unmixing and the calculation of indexes. The high-25 albedo and low-albedo surfaces were combined to generate the ISA type. We calculated NDVI and modified normalized difference water index (MNDWI) to remove the vegetation and water. The ISA classification of sample cities was extracted for calibration of parameters.

Retrieval of intra-urban structure/component
Due to regional differences in climate and geographical environments, huge discrepancies were found in the ISA and UGS components of cities in different regions in China. We chose sampling cities for the selection of sample points to calibrate the model. According to China's geographical zoning and urban development level, 28 large cities were selected for calibrating the LRM's parameters (Fig. 3). LRM was calculated in each city with 1,000 random samples located in ISA and 5 1,000 random samples located in UGS or waterbody. These samples were merged to extract their corresponding annual NDVI maximum value. They were used as the input for LRM to calibrate its parameters (Fig. 4).
The mean values of parameters were calculated to obtain the LRM for each region (Table 2) October were chosen. The modified normalized difference water index (MNDWI) was employed to delineate water distribution (Xu, 2006). Similar to the extraction of urban waterbody, the bare land data of 2000, 2005, 2010 and 2015 were derived from Landsat images as well. We used a simple index, enhanced built-up and bareness index (EBBI) for urban bare land extraction (As-syakur et al., 2012). Because the spectral features of some green space are similar to bare land in non-15 growth season, it may influence the result of classification (Lu et al., 2008). In this study, the annual greenest-pixel top-ofatmosphere (TOA) reflectance composite products from GEE were used to EBBI calculation. The greenest-pixel means the pixel with the highest value of the NDVI, which remove seasonal turbulence on the classification of bare land. The distribution of water and bare land were extracted by threshold division, which was set by manual selection. Then we retrieved the UGS fraction data. We assumed that in the urban area, the areas without water and bare land were mosaics of 20 ISA and UGS. Among non-water and non-bare areas, the proportion of ISA was deducted, which generated the UGS fraction datasets.

Time-series data refinement
The spectral characteristics of unchanged urban areas as seen from multiple Landsat images varied, resulting in inconsistency of ISA and UGS products. It was necessary, therefore, to refine the products before further analysis. The 25 refinement was based on the following assumptions: (1) The generally increasing trend of impervious density inside Chinese cities is shown beside of urban greening area. (2) There is some unchanged area whose impervious density remained constant from year to year. (3) The UGS density may increase as a result of greening in local areas.
Because of its relatively high accuracy, the dataset of 2015 was chosen as a criterion for refinement of other period datasets. The refinement process was conducted in this way: First, if the proportion of ISA in a certain pixel of a given year 30 was lower than that in the previous year, it was modified to the proportion of ISA in that year as a precondition for those areas which did not implement greening projects.
The ISA fraction data was modified by the following formula: where −1 is the percentage of ISA in the former year and is the percentage of ISA in the later year.
Second, we generated random points inside an urban area for the year 2000. The sampling points where ISA or UGS fractions had changed were excluded in order to generate the unchanged points inside the city. For the second assumption, the proportion of ISA of some unchanged points was kept constant, because the proportion of ISA at these points should be 5 consistent in all the periods.
Third, the proportion of ISA in unchanged points in each year was calculated. The proportion derived from the 2015 images was used to obtain the calibrating parameters for 2000, 2005 and 2010, which improved the initial ISA fraction data for those years. High spatial resolution images from Google Earth have been commonly used for validation (Liu et al., 2014;Zhang et al., 2014;Kuang et al., 2016), and we adopted that method for this study. At least 2,200 random points for each interval were generated throughout China. Evaluating the results of comprehensive data quality in each interval, the overall accuracy of urban land or built-up area was more than 91.98% (Table 3).

Validation of ISA and UGS fractions 20
Google Earth images with higher spatial resolution than Landsat images were employed for the validation of ISA and UGS fractions. The 30 m × 30 m ISAs were first rectified with Google Earth images. A total of 1,111 validation samples with a window size of 3 × 3 pixels (90 m × 90 m grids) for each sample plot were acquired randomly from 44 cities in different regions in China and chosen for validation. We calculated the mean ISA and UGS density in each grid (Fig. 6). The actual value in the same area was obtained by visual interpretation from Google Earth images. Accuracy assessment of ISA 25 and UGS was performed by root mean square error (RMSE) and R-square (Fig. 7). The RMSEs of ISA and UGS fractions were 0.10 and 0.14, respectively. The R-squares were 0.82 and 0.82, respectively.

Comparison of the CLUD-Urban product with other land-use products
We compared the vector boundaries of urban areas with the existing land-use products. There were obvious discrepancies among these products in data production, data source, resolution and definition of urban land-use types. The spatial resolutions of MODIS and ESA land-cover products range from 300 to 1000 m, and their classification systems are based on IGBP and FAO frameworks, respectively (Belward, 1996;FAO, 1997). These products were selected for data 5 comparison (Fig. 8).
The MODIS land-cover data uses the classification system defined by IGBP. In this system, the urban area is defined as built-up area, and is characterized by at least 30% ISA, including building materials and asphalt (Friedl et al., 2010). The ESA land-cover data considers urban areas to be artificial surfaces and associated areas (Bontemps et al., 2011). CLUD-Urban distinguishes between urban and rural areas and can better characterize the urban boundaries (Liu et al., 2005). 10 We take Beijing as an example to demonstrate the different urban land-use products. The Landsat images acquired in 2000,2005,2010 and 2015 were used as reference (Fig. 8a), with a red-green-blue composition. The images of 2000,2005 and 2010 were from Landsat 5 TM, and 2015 images were from Landsat 8 OLI. The resolution of MODIS land-cover products is relatively low (500 m) (Fig. 8c). Due to the coarse definition, the urban areas are relatively large. Moreover, there was little change in urban coverage between different years, which did not allow characterization of urban expansion (Fig.  15 8c). Because both city and countryside are included in the MODIS definition of built-up area, this product cannot effectively distinguish between urban and rural lands. Although the ESA land-cover data had a better resolution of 300 m, urban and rural land still cannot be distinguished (Fig. 8d). In our dataset, the urban and rural areas are well distinguished because of a good definition of urban area (Fig. 8b).

Patterns of urban area change 20
To illustrate the general pattern of national urban land-use/cover change, we analyzed the process of urban expansion since 2000, together with ISA and UGS, which account large of urban area and carry ecological function in cities dynamics in China (Fig. 9, Fig. 10). The growth of ISA and UGS are obvious in main urban areas, like Beijing-Tianjin, Yangtze River Delta and Guangdong-Hong Kong-Macao Great Bay Area (Fig. 9, Fig. 10). Both ISA and UGS showed an increasing trend associated with urban expansion. Higher proportions of ISA and UGS were located in eastern China, where economic 25 conditions were better. Detailed fractions of ISA, UGS, waterbody and bare land in Shenyang are shown in Fig. 11, which illustrates urban internal composition. High proportional areas of ISA represent buildings, roads and plazas, whereas low proportional areas represent parks and greenbelts with ecological functions. This dataset can characterize differences among the selected cities. Some cities, like Beijing and Nanjing, with better-planned urban construction had smaller proportions of ISA (64.60% and 68.19%, respectively) and higher proportions of UGS (33.81% and 30.33%, respectively) within their 30 urban areas in 2015 (Fig. 12).

Dynamics and patterns of ISA density
ISA, UGS, waterbody and bare land are main land-use types within cities. As a key component of urban structure, ISA growth was monitored during urban expansion. China's remotely sensed ISA was 2.22×10 4 km 2 , 2.66×10 4 km 2 , 4. The spatiotemporal distribution of ISA in provinces across China was analyzed, and we found that it is mainly 20 distributed in the coastal and the central regions (Fig. 14d). The distribution is relatively low in the western region. From 2000 to 2015, the ISA of each province showed an increasing trend. The three provinces with the fastest growing trend were Shandong (0.45×10 4 km 2 ), Jiangsu (0.37×10 4 km 2 ) and Guangdong (0.23×10 4 km 2 ). Although the ISA in different provinces showed an increasing trend, the pattern of "high in east and low in west" remained unchanged in each year.
We further analyzed the urban ISA density in different provinces (Fig. 15). The ISA density in 2015 was employed for 25 spatial characteristic delineation. Among all provinces, the three provinces with the lowest ISA density were Chongqing (56.54%), Taiwan (59.89%) and Heilongjiang (60.70%), which are located in southwestern, southeastern and northeastern China, respectively. This result showed that a more dispersed distribution of ISA may be due to reasons like climate, urban policy, etc. On the time scale, from 2000 to 2015, the ISA density showed an increasing trend in most provinces. However, the rate of increase was relatively small. The overall increase in urban ISA density from 2000 to 2015 was 2.36% (from 30 68.34% to 70.70%).

Changes in UGS
China's UGS monitored by remote sensing shows an increasing trend, similar to the trend of urban land area and urban ISA. The total UGS increased from 1.00×10 4 km 2 in 2000 to 1.15×10 4 km 2 in 2005 1.90×10 4 km 2 in 2010 to 1.99×10 4 km 2 in 2015 (Fig. 16, Table 6). Looking at both ISA and UGS in urban areas, our results indicate a slight decrease in percentage of UGS, while ISA density has increased. The UGS fraction was 30.77%, 29.86%, 30. 31% and 27.70%, in 2000, 2005, 2010 5 and 2015, respectively (Fig. 17, Table 6). The results show that the proportion of China's UGS decreased slightly; however, it has stayed around 30% at the national scale.
In order to analyze the regional differences in UGS distribution in China, the provincial administrative units were employed. As seen from the spatial distribution (Fig. 16)

Data availability 20
All data presented in this paper are available in https://doi.org /10.5281/zenodo.2644932 (Kuang et al., 2019). This datasets cover 2000, 2005, 2010 and 2015 with a spatial resolution of 30m. Detailed metadata are associated in the introduction, including contact information.

Conclusion
In this study, CLUD-Urban, a 30-m resolution national urban land-cover change data series for China, was generated 25 using a satellite big-data platform. CLUD-Urban was able to distinguish detailed urban land classification and urban internal time series of urban land-cover change, including ISA, UGS, which can be used to accurately characterize the urban internal structure. The RMSEs of ISA and UGS fractions are 0.10 and 0.14, respectively. Results from the analysis of urban areas, including ISA and UGS and their proportions for each province, show large regional differences in China.

Author contribution
KW, ZS and LX designed the research; ZS and LX implemented the research; KW and ZS wrote the paper. 5

Competing interests
The authors declare that they have no conflict of interest.