A 30-meter resolution dataset of China’s urban impervious surface area and green space fractions, 2000–2018

Accurate and timely maps of urban underlying land properties at the national scale are important for improving the habitat environment and achieving sustainable development goals. Urban impervious surface (UIS) and urban green space (UGS) are two core components for characterizing urban underlying environments. However, UIS and UGS are often mosaicked in the urban landscape with complex structures and composites. ‘Hard classification’ into a single binary type cannot effectively delineate spatially explicit urban land surface properties. Although six mainstream global or national urban land use/cover products with 30-m spatial resolution have been developed, they only provide the binary pattern or dynamics of a single urban land type and cannot effectively delineate the quantitative components or structure of intra-urban land cover. Here we propose a new mapping strategy to acquire multitemporal and fractional information on the essential urban land cover types at the national scale by combining the advantages of big data processing and human interpretation aided by geoknowledge. Firstly, the vector polygons of urban boundaries in 2000, 2005, 2010, 2015 and 2018 were extracted from China’s Land Use/cover Dataset (CLUD) derived from Landsat images. Secondly, the national settlement and vegetation percentages were retrieved using a sub-pixel decomposition method through a random forest algorithm on the Google Earth Engine (GEE) platform. Finally, the products of China’s UIS and UGS fractions (CLUD-Urban) at 30-meter resolution were developed for 2000, 2005, 2010, 2015 and 2018. We also compared our products with the six existing mainstream datasets in quality and accuracy. The assessment results showed that the CLUD-Urban product has higher accuracy in urban boundary and urban expansion detection than the other products, and that accurate UIS and UGS fractions were developed for each period.
The overall accuracy of urban boundaries in 2000-2018 is over 92.65%, and the correlation coefficient (R) and root mean square error (RMSE) are 0.91 and 0.10 for the UIS fraction and 0.89 and 0.11 for the UGS fraction, respectively. Our results indicate that 71% of urban land pixels were mosaics of UIS and UGS within cities in 2018, so a single UIS classification may greatly increase the mapping uncertainty. Urban underlying covers exhibited high spatial heterogeneity, with average fractions of 68.21% for UIS and 22.30% for UGS in 2018 at the national scale. The

effectively extract substrate, dark and vegetation (Small, 2013), the UISA cannot be accurately and directly extracted from multispectral images without post-processing, considering its wide spectral variation and the different meanings of UISA and substrate (Lu et al., 2014). Because of the high correlation between UISA and vegetation indices in the urban landscape (Weng et al., 2004), a fractional UISA dataset can be estimated from vegetation indices using a regression-based approach (Sexton et al., 2013; Wang et al., 2017).
In this study, we developed the UISA and UGS fraction dataset with 30-m spatial resolution at the national scale at five-year intervals between 2000 and 2018. This dataset provides a foundation for assessing urban dwellers' environments, enhances our understanding of the impacts of urbanization on ecological services and functions, and is helpful for future research and practice in urban planning and urban environmental sustainability.
Urban areas, as a composite of UISA and UGS, have different spectral characteristics in Landsat imagery, as shown in Fig. 1, which compares the old and new cities in Suzhou. Because buildings in the old city are compactly distributed, they appear relatively dark in Landsat images, whereas the new city, dominated by industrial land with well-designed urban landscapes, appears bright. With prior knowledge of image classification and human-computer visual interpretation, we extracted China's urban land by detecting city boundaries from CLUD: the interpretation symbols of cities in Landsat images were first established (Fig. 1), and polygons were then created in GIS to delineate urban boundaries and labelled as urban area.

Retrieval of UISA fraction
The UISA and UGS were characterized as the percentage of UISA or UGS in a pixel. In arid and semiarid regions, however, the percentage of vegetation cover is seasonally dependent (Lu et al., 2008); therefore, we used the multitemporal normalized difference vegetation index (NDVI) data within a year to generate an annual NDVI maximum image to improve the accuracy of vegetation characterization. As a negative correlation between NDVI and UISA fraction was found at the pixel level (Kuang et al., 2016), a regression model based on the relationship between NDVI and UISA fraction was established to estimate the UISA fraction.
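The annual NDVI maximum compositing described above can be illustrated with a minimal NumPy sketch. It assumes the red and near-infrared reflectance of each scene are already co-registered arrays and that cloud-masked pixels are stored as NaN; the function names `ndvi` and `annual_ndvi_max` are hypothetical and are not taken from the authors' processing chain.

```python
import numpy as np

def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def annual_ndvi_max(ndvi_stack):
    """Per-pixel maximum over the time axis (axis 0), skipping NaN gaps.

    ndvi_stack has shape (n_images, ...); assumed layout, one slice per scene.
    """
    return np.nanmax(np.asarray(ndvi_stack, dtype=float), axis=0)
```

Taking the per-pixel maximum keeps the greenest observation of the year, which is what reduces the seasonal dependence noted for arid and semiarid regions.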
According to the statistical results, the negative correlation between UISA fraction and NDVI value is not well described by a linear regression. Under the linear assumption, UISA fraction is overestimated in the low-value range and underestimated in the high-value range (Zhang et al., 2009). We found that the logistic regression model (LRM) reduces these shortcomings of the linear regression model; thus, the LRM was selected for UISA fraction estimation (Walker and Duncan, 1967). In addition, the input data required by logistic regression, namely binary UISA classification data and NDVI maximum data, can be obtained from existing datasets. The major steps were as follows: (1) the annual NDVI maximum value and UISA classification data were retrieved from Landsat images; (2) the parameters of the logistic regression model were estimated; and (3) the annual NDVI maximum value was used as input data to estimate the UISA fraction at the pixel level using the developed LRM, which expresses the UISA fraction as a logistic function of the annual NDVI maximum value. The annual NDVI maximum value is
$$\mathrm{NDVI}_{max} = \max_{i}\left(\mathrm{NDVI}_i\right)$$
where $\mathrm{NDVI}_i$ is the NDVI value of the i-th image. Individual NDVI values were calculated from Landsat images, and all images were […]
Large discrepancies in the UISA and UGS components of different cities were found because of different climatic and geographical conditions. The UISA is often related to urban economic and geographic conditions, and cities in the same economic region can be assumed to have similar UISA density. According to the Chinese economic and geographic zones, we selected 28 typical cities to calibrate UISA data using the LRM. For each city, 1,000 samples for UISA and UGS were randomly selected and used as the input for the LRM to calibrate its parameters (Fig. 2, Fig. 3). The average value of the parameters in each economic and geographic zone was taken as the regression parameter for all cities in the same zone (Table 2).
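The three LRM steps and the zone-level averaging can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard logistic form 1 / (1 + exp(-(a + b·NDVI_max))), where `a` and `b` are placeholder parameter names rather than the paper's notation, it fits the parameters with a plain Newton-Raphson loop instead of any particular statistics package, and the sample data are synthetic.

```python
import numpy as np

def predict_uisa_fraction(ndvi_max, a, b):
    """Logistic curve 1 / (1 + exp(-(a + b * NDVI_max))); a, b are placeholder names."""
    z = a + b * np.asarray(ndvi_max, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def fit_lrm(ndvi_max, is_uisa, n_iter=25):
    """Estimate (a, b) from binary UISA labels by Newton-Raphson on the log-likelihood."""
    x = np.asarray(ndvi_max, dtype=float)
    y = np.asarray(is_uisa, dtype=float)
    X = np.column_stack([np.ones_like(x), x])        # intercept column + NDVI_max
    w = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)                          # gradient of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # negative Hessian
        w = w + np.linalg.solve(hess + 1e-8 * np.eye(2), grad)
    return float(w[0]), float(w[1])

def zone_parameters(city_params):
    """Average the (a, b) pairs of the calibration cities in one zone."""
    mean_a, mean_b = np.mean(np.asarray(city_params, dtype=float), axis=0)
    return float(mean_a), float(mean_b)

# Synthetic demonstration only: 1,000 samples drawn from an assumed curve.
rng = np.random.default_rng(0)
ndvi_samples = rng.uniform(-0.1, 0.9, size=1000)
labels = rng.random(1000) < predict_uisa_fraction(ndvi_samples, 3.0, -8.0)
a_hat, b_hat = fit_lrm(ndvi_samples, labels)
```

A negative fitted slope reproduces the negative NDVI-UISA relationship, and the S-shaped curve saturates near 0 and 1 instead of over- or underestimating at the extremes as a straight line would.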

Retrieval of UGS fraction
According to sample plots collected from typical cities based on Chinese economic and geographic zones, the UGS fractions were calibrated from the vegetation cover in urban landscapes with the following equations:
$$\mathrm{VC} = \frac{\mathrm{NDVI} - \mathrm{NDVI}_{soil}}{\mathrm{NDVI}_{veg} - \mathrm{NDVI}_{soil}}$$
$$\mathrm{UGS} = \alpha + \beta \times \mathrm{VC}$$
where VC is the vegetation cover in the urban landscape; $\mathrm{NDVI}_{veg}$ and $\mathrm{NDVI}_{soil}$ are the NDVI values (from the annual NDVI maximum image, see equation (3)) at pure vegetation and pure bare soil, respectively; and α and β are the constant and slope of the linear regression.
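The UGS retrieval can be sketched in a few lines. The sketch assumes the commonly used linear NDVI scaling between bare soil and full vegetation for VC, followed by the linear regression UGS = α + β·VC; clipping both results to the range 0-1 is an added assumption, and the function names and endmember values in the usage note are illustrative only.

```python
import numpy as np

def vegetation_cover(ndvi_max, ndvi_soil, ndvi_veg):
    """Scale NDVI linearly between the bare-soil and pure-vegetation endmembers."""
    vc = (np.asarray(ndvi_max, dtype=float) - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return np.clip(vc, 0.0, 1.0)  # clipping is an assumption, not from the paper

def ugs_fraction(vc, alpha, beta):
    """Linear calibration UGS = alpha + beta * VC, limited to the 0-1 range."""
    return np.clip(alpha + beta * np.asarray(vc, dtype=float), 0.0, 1.0)
```

For example, with hypothetical endmembers `ndvi_soil=0.05` and `ndvi_veg=0.85`, a pixel with an annual NDVI maximum of 0.45 has a vegetation cover of 0.5.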

Validation of CLUDs and of UISA and UGS fractions
Unified quality checks and data integration were performed for 2000, 2005, 2010, 2015 and 2018 to ensure the quality and consistency of the interpretation results. In the process of land-use/cover interpretation, field investigations were mainly carried out in autumn in the northern part of the country and in spring in the southern part. High spatial resolution images from Google Earth were used for validation (Liu et al., 2014; Zhang et al., 2014; Kuang et al., 2016; Ning et al., 2019). At least 2,200 points for each interval were randomly generated throughout China. Based on the validation results, the overall accuracy of urban land or built-up area was 92-99% for each given year (Table 3) and the overall accuracy for urban land change was 95-97% for each period (Table 4).
Google Earth images, which have higher spatial resolution than Landsat images, were employed for the validation of UISA and UGS fractions. Firstly, the 30 m × 30 m UISAs were rectified with Google Earth images. A total of 1,111 validation samples, each with a window size of 3 × 3 pixels (90 m × 90 m grids), were randomly acquired from 44 cities in different regions of China (Fig. 4). Mean UISA and UGS densities in each grid were calculated. The actual value in the same area was obtained by visual interpretation of Google Earth images. The accuracy of UISA and UGS was assessed by root mean square error (RMSE) and correlation coefficient (R). The validation of UISA and UGS fractions in each period shows that the RMSEs were 0.09-0.12 and 0.12-0.17, respectively, and the R values were 0.89-0.93 and 0.85-0.89, respectively (Table 3). To validate the change detection results for different periods, we chose 741 samples (90 m × 90 m) within urban areas. We used the medium relative error (MRE) and R to examine the accuracy. The MRE values of UISA and UGS fractions for each period were 5.2-6.8% and 5.9-7.1%, respectively (Table 4).
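The fraction validation can be sketched as follows: average the mapped fraction over a 3 × 3 pixel window and compare it with the visually interpreted reference using RMSE and the correlation coefficient R. The helper names are hypothetical, and the window function assumes the sample lies at least one pixel away from the raster edge.

```python
import numpy as np

def window_mean(fraction, row, col, size=3):
    """Mean fraction in a size x size window centred on (row, col).

    Assumes the window lies fully inside the raster (no edge handling).
    """
    half = size // 2
    block = np.asarray(fraction, dtype=float)[row - half: row + half + 1,
                                              col - half: col + half + 1]
    return float(block.mean())

def rmse(pred, ref):
    """Root mean square error between mapped and reference fractions."""
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def pearson_r(pred, ref):
    """Pearson correlation coefficient between mapped and reference fractions."""
    return float(np.corrcoef(np.asarray(pred, dtype=float),
                             np.asarray(ref, dtype=float))[0, 1])
```

Aggregating to 90 m × 90 m grids before comparison reduces the effect of small co-registration offsets between the 30-m product and the reference imagery.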

Results
We compared the vector boundaries of urban areas with the existing land-use products and found obvious discrepancies arising from differences in data production, data source, resolution and definition of urban land-use types.
The spatial resolutions of land-cover products range from 30 m to 1000 m, and their classification systems are based on the IGBP or FAO frameworks (Belward, 1996; FAO, 1997). Figure 5 compares a list of urban land datasets (see Table 5 for these datasets), showing that our product performs better in delineating the detailed spatial patterns of intra-urban land cover (note: the GHS Built and GlobeLand30 products each have only two years). Intra-urban land cover is more complex than that of rural areas. However, most urban land products cannot effectively distinguish urban from rural land using an automatic classification method (Fig. 5b, c, d, e). In our dataset, urban area refers to the area where a county or town government is located, usually with a sufficiently large population. Because other products cannot effectively distinguish urban and rural land, their urban areas were overestimated considerably (Fig. 5). CLUD-Urban can delineate intra-urban land cover at the pixel level, providing more detail than other products.
China's UISA shows an increasing trend, from $2.22 \times 10^{4}\ \mathrm{km}^2$ in 2000 to $5.20 \times 10^{4}\ \mathrm{km}^2$ in 2018 (Fig. 6).
To illustrate the pattern of national urban land change, we analysed the process of urban expansion since 2000, together with UISA and UGS dynamics (Fig. 6, Fig. 7 and Fig. 8). The growth of UISA and UGS was evident in major urban areas such as Beijing-Tianjin, the Yangtze River Delta and the Guangdong-Hong Kong-Macao Greater Bay Area. Both UISA and UGS showed an increasing trend associated with urban expansion. High proportions of UISA and UGS were located in eastern China because of its good economic conditions. High-proportion UISA represents buildings, roads and plazas, whereas low-proportion UISA represents parks and greenbelts with ecological functions. This dataset can characterize differences among the selected cities. Some cities with well-planned urban landscapes, such as Beijing and Nanjing, had relatively small proportions of UISA (59.35% and 68.19%, respectively) and high proportions of UGS (38.61% and 30.33%, respectively) in their urban landscapes in 2018.

Conclusion
CLUD-Urban, China's UISA and UGS fraction dataset with 30-m spatial resolution, was generated using multiple data sources. CLUD-Urban provides detailed delineation of UISA and UGS components for 2000, 2005, 2010, 2015 and 2018 in China. The novelty of this dataset, compared with other products, is that it treats cities as heterogeneous units at the pixel level, consisting of UISA, UGS and other components. The accuracy of the CLUD-Urban dataset is 91.98% using the integrated approach of visual interpretation and prior knowledge. The RMSEs of UISA and UGS fractions are 0.10 and 0.14, respectively. Results from the analysis of urban areas, including UISA and UGS, show large regional differences in China.
CLUD-Urban provides fundamental data sources for examining urban environment issues and for delineating intra-urban structure or urban landscape at the national scale.

Author contribution
KW, ZS and LX designed the research; ZS and LX implemented the research; KW, ZS and LD wrote the paper.

Competing interests
The authors declare no conflict of interest.