Time series of Inland Surface Water Dataset in China (ISWDC) for 2000-2016 derived from MODIS archives

Chinese Academy of Sciences, Beijing 100094, China; College of Information Science and Engineering, Shandong Agricultural University, Tai’an 271018, China; School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, China; College of Earth Science, Chengdu University of Technology, Chengdu 610059, China; State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower


Introduction
Surface water is the most important of the planet's water resources for the survival of both human and ecological systems (Lu and He, 2006). It is a key component of the hydrological cycle and a key factor affecting the sustainable development of human society and ecosystems. Both climate change and human activities affect the availability of surface water at a given place and time. In order to locate inland surface water and examine its dynamics, regional and global datasets have already been produced from remotely sensed data by various researchers (Carroll et al., 2009; Verpoorter et al., 2014; Feng et al., 2015; Klein et al., 2014; Tulbure et al., 2016), but these studies were limited in their ability to measure long-term changes at high spatial and temporal resolution. Pekel et al. (2016) quantified the changes in global surface water (GSW) over the past 32 years at 30 m resolution by using Landsat imagery. Klein et al. (2017) generated a 250 m daily global dataset of inland water bodies based on a combination of MODIS Terra and Aqua daily classifications. However, the temporal resolution of the former is nearly monthly, and the latter has so far produced datasets only for 2013 to 2015, while processing of the entire MODIS archive back to July 2002 is still ongoing (Klein et al., 2017).
In China, numerous regional case studies have produced surface water datasets, but only in a fragmentary way (Du et al., 2012; Lai et al., 2013; Luo et al., 2017). This research has mainly focused on lakes in the Tibetan Plateau (Lu et al., 2017). Several research groups are studying lake water changes in this region and have produced decadal lake surface water datasets since the 1960s (Song et al., 2014; G. Zhang et al., 2014, 2017; Wan et al., 2014, 2016). At the national scale, the national wetland remote sensing datasets for 1978, 1990, 2000, and 2008 (Niu et al., 2012), the national land cover datasets for 1990, 2000, 2010, and 2015 (Wu et al., 2017), and the national land use datasets for 1990, 1995, 2000, 2005, 2010, and 2015 (Liu et al., 2018) contain water surface data at interdecadal or 5-year timescales (Table 1). However, these datasets have limited temporal resolution and are not freely and fully shared.
The most commonly used method of water extraction is based on water indices, such as the normalized difference water index (NDWI) (Gao, 1996; McFeeters, 1996; Rogers and Kearney, 2004), the modified normalized difference water index (MNDWI) (Xu, 2006), the automated water extraction index (AWEI) (Feyisa et al., 2014), and the enhanced water index (EWI) (Wang et al., 2015). Furthermore, the single-band threshold segmentation method (Li et al., 2012; Lu et al., 2017) and the multiband transformation method (Pekel et al., 2014) are also in use. The key step in extracting the water boundary with these methods is to determine the threshold value for segmentation. The existing threshold determination methods include human visual judgment (Huang et al., 2008; Li et al., 2012) and sample statistical analysis (Feyisa et al., 2014; Pekel et al., 2014, 2016). The former relies on subjective experience, which makes the extraction results unstable and thus difficult to apply at larger scales and to large volumes of data. Although the latter can give more accurate results through extensive sampling statistics, the use of a single threshold for the whole image or whole region may produce large local errors. To overcome these problems, various comprehensive classification methods are widely used. Verpoorter et al. (2014) combined principal component analysis (PCA) and the modified brightness index (MBI) to generate supervised classes and to divide these into water and non-water regions by using the decision tree method. Pekel et al. (2016) proposed an expert system that combines a visual analytical spectral library, the NDVI index, HSV transformation results, and the decision tree method. Khandelwal et al. (2017) introduced a global approach based on supervised classification by defining the initial spatial extent of each water body, using global sample datasets, and incorporating all the spectral reflectance bands of the MODIS imagery. Supervised classification or the decision tree method may improve the accuracy of water surface boundary extraction; however, they also increase the difficulty and reduce the efficiency of the method. T. Zhang et al. (2017) proposed an automatic threshold determination method based on the LBV (L, the general radiance level; B, the visible-infrared radiation balance; V, the radiance variation vector between bands) transformation of Landsat 8 OLI surface reflectance images. It was verified as an accurate, simple, and robust method for surface water extraction. However, cloud pixels and atmospheric correction influences were not considered.
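As an illustration of the index-based approach described above, the following minimal sketch computes the NDWI in the form of McFeeters (1996), (green - NIR) / (green + NIR), and the MNDWI of Xu (2006), (green - SWIR) / (green + SWIR), for single pixels. The reflectance values and the threshold of 0 are hypothetical and chosen only for illustration; choosing the threshold is exactly the difficult step discussed in this section.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndwi(green, nir):
    """NDWI in the McFeeters (1996) form: (green - NIR) / (green + NIR)."""
    return normalized_difference(green, nir)


def mndwi(green, swir):
    """MNDWI (Xu, 2006): (green - SWIR) / (green + SWIR)."""
    return normalized_difference(green, swir)


def is_water(index_value, threshold=0.0):
    """Classify a pixel as water when the index exceeds the threshold.

    The default threshold of 0.0 is an illustrative assumption only.
    """
    return index_value > threshold


# Hypothetical surface reflectances for one water pixel and one land pixel.
water_pixel = {"green": 0.06, "nir": 0.02, "swir": 0.01}
land_pixel = {"green": 0.08, "nir": 0.30, "swir": 0.25}

for name, px in (("water", water_pixel), ("land", land_pixel)):
    print(name, ndwi(px["green"], px["nir"]), mndwi(px["green"], px["swir"]))
```

Both indices are positive for the water-like pixel and negative for the land-like pixel in this example, but a fixed threshold of this kind is what produces the local errors noted above.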
China has one of the highest densities of rivers and lakes in the world. There are more than 1500 rivers with an area exceeding 1000 km² and 2928 lakes with an area larger than 1 km², which form a total surface water area of 91 020 km² (Ma et al., 2011). However, owing to the influence of the climate, geography, and landscape of the country, these surface water resources are unevenly distributed. They are more abundant in the south than in the north, and in the east than in the west. With the development of the economy, the increase in the demand for industrial, agricultural, and domestic water has placed great pressure on these surface water systems, especially during the irrigation and drought seasons (Gong et al., 2011; Barnett et al., 2015). Therefore, there is an urgent need for spatio-temporally continuous surface water datasets to support the efficient and robust management of water resources and to investigate the relationship between the national surface water and the global climate and human activities. However, fully public data products with moderate spatial resolution and near-daily temporal resolution are still lacking in China.
In order to address these limitations and to meet the need for a comprehensive spatio-temporal dataset, this paper presents the Inland Surface Water Dataset in China (ISWDC) for the period 2000-2016 (to be updated continuously for subsequent years on the Zenodo platform), which is derived from the 8 d, 250 m spatial resolution MODIS MOD09Q1 product. After recalling the methodology used for surface water mapping from MODIS MOD09Q1 as described by Lu et al. (2017), the precision and accuracy of the dataset are reported, including a cross comparison with existing national and global datasets.

Methods
The threshold segmentation method proposed by Lu et al. (2017), which employs a single band and segments water bodies one by one, is used to extract the surface water boundary. It includes four steps: interference removal, preliminary water surface mapping, annual water surface mask acquisition, and water surface boundary extraction (Fig. 1). In this study the last two steps of the method are updated and improved as shown in Sect. 3.1 and 3.2.
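To make the idea of automatic threshold segmentation on a single band concrete, the sketch below implements the standard Otsu criterion (maximizing the between-class variance of a histogram). It is not the modified Otsu procedure of Lu et al. (2017), whose details are not reproduced here; the function name, the bin count, and the sample near-infrared reflectances are assumptions for illustration. Pixels darker than the threshold are treated as water because water reflects little in the near-infrared band.

```python
def otsu_threshold(values, bins=256):
    """Standard Otsu threshold for a list of reflectance values.

    Returns the upper edge of the histogram bin that maximizes the
    between-class variance; values below it form the darker class.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # flat input: no meaningful split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for i, h in enumerate(hist):
        w0 += h                      # pixels in the darker class
        if w0 == 0:
            continue
        w1 = total - w0              # pixels in the brighter class
        if w1 == 0:
            break
        sum0 += i * h
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    return lo + (best_bin + 1) * width


# Hypothetical NIR reflectances: dark water pixels and bright land pixels.
nir = [0.02] * 50 + [0.03] * 50 + [0.30] * 50 + [0.35] * 50
t = otsu_threshold(nir)
water = [v for v in nir if v < t]
print(t, len(water))
```

Applying such a threshold separately within the neighbourhood of each water body, rather than once for a whole scene, is the one-by-one idea referred to above.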

Annual water surface mask acquisition
The water surface mask is a key input for excluding land disturbance factors that affect the extraction of the water surface boundary. It is generated from the preliminary water surface mapping results by applying the modified Otsu threshold method to selected images with little cloud cover and good quality in each year (Lu et al., 2017). In order to eliminate errors in water area information caused by cloud and cloud shadow in this process, the determination probability (p) parameter is used, based on the fact that a cloud and its shadow will not appear in the same position for several days. In the corresponding equation, n is the number of preliminary water surface mapping images, d_i is the pixel value of image i, D is the pixel value of the annual water surface mask, and p is the determination probability for identifying water pixels. In this study the reference images from 2013 to 2016 were selected and the determination probability (p) was determined by the same rule as in Lu et al. (2017). Furthermore, the annual reference images and determination probability (p) of 2000-2012 are used here directly because they were originally obtained from images covering the whole of China (Table 2).
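The role of the determination probability can be sketched as follows. The rule used here is an assumption inferred from the symbol definitions above, not a reproduction of the published equation: a pixel of the annual mask is set to D = 1 when the fraction of preliminary images in which it is water, (d_1 + ... + d_n) / n, is at least p, and to D = 0 otherwise. Whether the original comparison is strict or non-strict is not stated in this text, and the function name and the toy images are hypothetical.

```python
def annual_water_mask(images, p):
    """Combine n binary water maps into one annual mask.

    images: list of 2-D lists with values 1 (water) or 0 (background).
    p: determination probability, a fraction between 0 and 1.
    Assumed rule: D = 1 if sum(d_i) / n >= p, else D = 0.
    """
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            water_fraction = sum(img[r][c] for img in images) / n
            mask[r][c] = 1 if water_fraction >= p else 0
    return mask


# Three hypothetical 2 x 2 preliminary water maps from the same year.
images = [
    [[1, 0], [1, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
]
print(annual_water_mask(images, p=0.5))  # keeps only the stable water pixel
print(annual_water_mask(images, p=0.3))  # also keeps pixels seen once in three
```

A pixel flagged as water on only one date, as a cloud shadow might be, is dropped at the higher value of p, which is the behaviour the paragraph above describes.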

Comparison with the national land cover dataset
Based on the 30 m resolution national land cover datasets of 2000, 2005, and 2010, 511 samples from lakes and rivers across the country are selected as ground truth data (Fig. 2). These include 11 very large water bodies with areas larger than 1000 km², 12 large water bodies with areas between 500 and 1000 km², 29 medium-sized water bodies with areas between 100 and 500 km², and 459 smaller water bodies with areas less than 100 km². They were compared with the maximum ISWDC in the corresponding years.
The results show that the ISWDC is highly consistent with the reference land-cover-derived surface water data. The coefficients of determination (R²) in 2000, 2005, and 2010 are 0.9974, 0.992, and 0.9932, respectively, as shown in Fig. 3. The confusion matrix analysis shows that the average user accuracy is 91.13 %, the average producer accuracy is 88.95 %, and the average kappa coefficient is 0.88 over the 3 years (Table 3).
Earth Syst. Sci. Data, 11, 1099-1108, 2019, www.earth-syst-sci-data.net/11/1099/2019/

As the national land cover data in 2000, 2005, and 2010 are based on 30 m Landsat images that were mainly obtained in the summer season, the water surface in these datasets can be equated with the annual maximum water surface. We therefore compared them with our maximum ISWDC from the corresponding years. The calculated R² is based on the areas of water bodies of different sizes. The larger the R², the better the consistency and the smaller the area error between the two datasets. Furthermore, the confusion matrix results are equivalent to a pixel-scale analysis, although they are not as intuitive as a visual comparison.
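For readers who wish to reproduce this kind of pixel-scale assessment, the sketch below computes the user accuracy, producer accuracy, and Cohen's kappa coefficient for the water class from a two-class confusion matrix, using the standard definitions. The counts are hypothetical and are not taken from Table 3; the function name is likewise an assumption.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Accuracy metrics for the water class of a 2 x 2 confusion matrix.

    tp: mapped water, reference water      fp: mapped water, reference land
    fn: mapped land,  reference water      tn: mapped land,  reference land
    """
    total = tp + fp + fn + tn
    user = tp / (tp + fp)          # 1 - commission error
    producer = tp / (tp + fn)      # 1 - omission error
    observed = (tp + tn) / total   # overall agreement
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return {"user": user, "producer": producer, "kappa": kappa}


# Hypothetical pixel counts for one validation sample.
m = accuracy_metrics(tp=40, fp=10, fn=5, tn=45)
print(m)
```

User accuracy penalizes over-extracted water and producer accuracy penalizes missed water, so the two values together indicate in which direction a dataset errs.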

Assessment against the global surface water dataset
The time series of annual ISWDC and GSW permanent water bodies with an area larger than 0.0625 km² for the whole of China from 2000 to 2015 were also compared. The results show that the two datasets possess very good consistency (R² = 0.6532) (Fig. 4a) and very similar change dynamics (Fig. 4b). The annual ISWDC and GSW permanent water bodies in 2015 also show similar spatial patterns in different regions (Fig. 5). For the lake groups in the central Tibetan Plateau, the MODIS-derived ISWDC and the Landsat-derived GSW show closely matching patterns (Fig. 5a). For the interlaced rivers and lakes of the Poyang Lake region, apart from narrow rivers and some small water bodies, the agreement between the two datasets is also very high (Fig. 5b). In the ISWDC dataset, the over-extracted water (red regions in Fig. 5) on the margins of large water bodies such as Siling Co, Nam Co, Poyang Lake, and some of the wide rivers, and the under-extracted slender rivers and small water bodies (green regions in Fig. 5), are mainly caused by mixed-pixel effects due to the relatively coarse spatial resolution of the MODIS images.
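A comparison of two annual area series of this kind can be summarized by a coefficient of determination. The sketch below assumes that R² is the squared Pearson correlation between the two series, which equals the R² of a simple linear regression of one on the other; the text does not state which definition was used, and the numbers are hypothetical rather than ISWDC or GSW values.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical annual water areas (km^2) from two datasets.
areas_a = [1.0, 2.0, 3.0]
areas_b = [1.0, 3.0, 2.0]
print(r_squared(areas_a, areas_b))
```

Because this statistic is insensitive to a constant offset or scale factor between the series, it measures agreement in dynamics rather than agreement in absolute area.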

Time series of surface water dataset applications
Time series of the surface water dataset can be used to analyze the inter-annual and seasonal variation characteristics of surface water area, including the inter-annual variation trend, abrupt change time, intra-annual hydrological process monitoring, etc. (Huang et al., 2018; Xing et al., 2018). Similarly, it can also be used as cross-validation reference data for global surface water datasets with a similar spatial resolution (Klein et al., 2017) and as a key input parameter for regional and global hydro-climatic model calibration and evaluation (Khan et al., 2011; Stacke and Hagemann, 2012).
For example, based on the ISWDC from 2000 to 2016, the annual variation of surface water in China can be obtained by superimposing the 8 d time series of water surface area for each year. Figure 6 shows that the surface water area begins to increase in early March and increases gradually in spring and summer. After reaching its peak in autumn, it begins to decrease gradually. The annual variation of surface water area in different regions can also be portrayed by calculating the multi-year average of each 8 d period. Figure 7 shows that the surface water area of Southwest China (SW) and Northwest China (NW) is very large and varies greatly between seasons compared with that of other regions. The surface water area in Northeast China (NE) begins to increase rapidly in spring. It reaches a peak in May and decreases slightly in June-July. After reaching its maximum in August-September, it begins to decline again in October. In North China (NC) the surface water area is relatively small, but the change still shows some seasonality. There is a significant increase in summer and autumn, but the range of increase and decrease is relatively small. Surface water area in Central China (CC) and East China (EC) varies steadily during the year. It reaches its maximum in summer and begins to decrease gradually in late summer and early autumn. Surface water area in South China (SC) is relatively stable throughout the year.
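The multi-year average of each 8 d period can be computed by grouping the area values by their position in the year. The sketch below is a minimal illustration that assumes each record carries the year, the day of year on which the 8 d period starts, and the water area; the record layout, function name, and numbers are hypothetical.

```python
from collections import defaultdict


def multi_year_mean_by_period(records):
    """Average the water area of each 8 d period over all available years.

    records: iterable of (year, day_of_year, area_km2) tuples.
    Returns a dict mapping day_of_year to the multi-year mean area.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _year, day_of_year, area in records:
        sums[day_of_year] += area
        counts[day_of_year] += 1
    return {doy: sums[doy] / counts[doy] for doy in sorted(sums)}


# Hypothetical regional water areas for two 8 d periods.
records = [
    (2000, 57, 100.0),
    (2001, 57, 120.0),
    (2001, 65, 130.0),
]
print(multi_year_mean_by_period(records))
```

Periods missing in some years are averaged over the years that do contain them, which matters for 2000 because the record starts in late February.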
Furthermore, the spatial distribution of surface water can be depicted clearly by means of multi-year average analysis. The results in Table 4 show that the surface water of inland China is mainly distributed in western China, which accounts for 49.13 % of the total surface water area, with 29.88 % in Southwest China (SW) and 19.25 % in Northwest China (NW), followed by Central China (CC) and East China (EC), which account for 8.13 % and 24.78 % of the total surface water area, respectively. North China (NC), Northeast China (NE), and South China (SC) account for the remaining 17.96 % of the national surface water area.

Data availability
The ISWDC dataset is distributed under a Creative Commons Attribution 4.0 License. The data may be downloaded from the data repository Zenodo at https://doi.org/10.5281/zenodo.2616035 (Lu et al., 2019). In each 8 d surface water image, the pixel values of 1 and 0 represent water and background, respectively.

The average user accuracy is 91.13 %, the average producer accuracy is 88.95 %, and the average kappa coefficient is 0.88 for these 3 years. Furthermore, a comparison with the GSW dataset underlines the reliability of the temporal processes and spatial distribution. In terms of temporal variation, the ISWDC and the GSW possess excellent consistency and very similar change dynamics during the whole time period, which shows that the two datasets are highly correlated. In terms of spatial distribution, the ISWDC in 2015 has spatial patterns similar to those of the GSW dataset in different regions, especially for larger water bodies such as lakes, reservoirs, and wide rivers.
The advantage of the ISWDC dataset is its ability to reveal the spatio-temporal variability of inland surface water. Based on this dataset, the spatial-distribution characteristics and temporal-variation processes of surface water can be described through multi-year average spatial statistics and annual data overlapping analysis. In addition, the dataset can also be used as cross-validation reference data for other global surface water datasets and as a key input parameter for regional and global hydro-climatic models.
However, owing to the algorithm design and the data sources used, the results have certain limitations. First of all, as with other surface water datasets derived from multispectral sensors, the ISWDC only includes open water surfaces, while water bodies that are covered by vegetation are not captured. Secondly, as the ISWDC uses only the MODIS MOD09Q1 near-infrared band for water surface extraction, the accuracy of the dataset depends mainly on the quality of the original 8 d composite images. When clouds cover a water region in the composite image at a certain time, the cloud-covered water surface will not be extracted, which causes water bodies to be underestimated. In addition, the reference images used to produce the annual water surface mask also affect the accuracy of the final results. For example, if the selected images do not capture the actual maximum water surface extent in that year, the water pixels that lie outside the mask may be excluded. Finally, because of the small difference in reflectance across the ice-water mixing boundary in autumn and spring, the accuracy of water surface area extraction is limited in these two seasons.
Although the water surface extraction method designed in this study is aimed at extracting water surface information from MODIS MOD09Q1 images, its core process is the automatic thresholding of water bodies one by one. Therefore, this method is also applicable to traditional water body indices, such as NDWI, MNDWI, and AWEI, or to other water surface information based on enhanced thematic data. In the future, while continuing to extend the existing datasets from 2017 to present by using this method, the 30 m GSW dataset in China will be extended. At the same time, a national 10 m spatial-resolution water surface dataset based on Sentinel-2 imagery will be produced. After the national-scale datasets are completed, the corresponding global-scale datasets are also expected.

Review statement. This paper was edited by Ge Peng and reviewed by three anonymous referees.

Figure 2. The boundary of China, the accuracy assessment and the upper-limit threshold calculation samples for surface water extraction. NW: Northwest China, SW: Southwest China, SC: South China, CC: Central China, NC: North China, NE: Northeast China, EC: East China.

Figure 4. Comparison of the time series of annual ISWDC and GSW permanent water bodies of the whole of China from 2000 to 2015. (a) is the correlation analysis result, and (b) is the change trend comparison result.

Figure 5. Comparison of permanent water bodies derived from ISWDC and GSW over the sites of the central Tibetan Plateau (a) and Poyang Lake region (b).

Figure 6. Annual change of total water area during the period of 2000-2016.

Figure 7. The 8 d surface water area in different regions of China from 2000 to 2016. NE: Northeast China, NC: North China, EC: East China, SC: South China, CC: Central China, NW: Northwest China, SW: Southwest China.

Discussion and conclusions
In this study, the 8 d time series 250 m resolution surface water dataset of inland China (ISWDC) from 2000 to 2016 has been introduced. It is a publicly available data product with the prominent features of a long time series, moderate spatial resolution, and high temporal resolution. The ISWDC is a valuable basic data source for the analysis of dynamic changes of surface water in China over the past 20 years. The results have been validated against the 2000, 2005, and 2010 national land-cover-derived surface water data and show high accuracy.

Financial support. This research has been supported by the National Key Research and Development Program of China (grant nos. 2017YFC0405802, 2016YFC0503507-03), the Key Program of the National Natural Science Foundation of China (grant no. 91637209), and the Strategic Priority Research Program of the Chinese Academy of Sciences (grant no. XDA19070201).

Table 1. National and regional surface-water-related datasets of China.
Lake water surface of Tibetan Plateau, Lu et al. (2017), 8 d, 2000-2012, 250 m
Lake surface area of Tibetan Plateau , 2015 30 m

... coverage from 24 February 2000 to 26 December 2016, a total of 16 698 images were used. The SRTM (Shuttle Radar Topography Mission) DEM (Digital Elevation Model) data with 90 m spatial resolution, from a mission jointly operated by NASA's JPL (Jet Propulsion Laboratory) and NIMA (National Imagery and Mapping Agency), are used as ancillary data for surface water extraction.

Table 2. The images used for annual water surface mask generation and the determination probability each year.

Table 3. Accuracy analysis samples in different regions and the accuracy evaluation results.

Table 4. The average distribution of surface water area in inland China from 2000 to 2016.

The 8 d data in each month can be used to calculate the monthly water occurrence, and all the 8 d data in each year can be used to calculate the yearly water occurrence, by summing all the surface water images in the corresponding time period. The vector datasets of the 8 d surface water boundaries extracted from the raster data products can also be obtained through the same link.
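The occurrence calculation by summation can be sketched as follows for a small stack of binary images (1 for water, 0 for background). The function names and toy images are hypothetical, and the pixel area of 0.0625 km² is the nominal area of a 250 m by 250 m cell, which is an approximation that ignores projection effects. Reading the actual raster files is outside the scope of this sketch.

```python
PIXEL_AREA_KM2 = 0.25 * 0.25  # nominal 250 m x 250 m cell, an approximation


def water_occurrence(images):
    """Sum binary water images and convert the counts to frequencies.

    images: list of 2-D lists with values 1 (water) or 0 (background),
    all covering the same grid within one month or one year.
    Returns (counts, frequency) as two 2-D lists.
    """
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    counts = [[sum(img[r][c] for img in images) for c in range(cols)]
              for r in range(rows)]
    frequency = [[value / n for value in row] for row in counts]
    return counts, frequency


def water_area_km2(image):
    """Water area of one binary image under the nominal pixel size."""
    return sum(sum(row) for row in image) * PIXEL_AREA_KM2


# Three hypothetical 8 d images on a 2 x 2 grid.
stack = [
    [[1, 0], [1, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
]
counts, frequency = water_occurrence(stack)
print(counts)
print(water_area_km2(stack[0]))
```

Pixels with a frequency of 1 are water in every image of the period, while lower values indicate seasonal or intermittent water, or water that was missed on cloudy dates.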