the Creative Commons Attribution 4.0 License.
1km Monthly Precipitation and Temperatures Dataset for China from 1952 to 2019 based on a Brand-New and High-Quality Baseline Climatology Surface
Abstract. Long-term climate data and a high-quality, high-resolution baseline climatology surface are essential to multiple fields, including climatological, ecological, and environmental sciences. Here, we created a brand-new baseline climatology surface (ChinaClim_baseline) and developed a 1km monthly precipitation and temperatures dataset for China during 1952–2019 (ChinaClim_time-series). A thin plate spline (TPS) algorithm, with a different model formulation for each month that accounted for satellite-driven products and Climatic Research Unit (CRU) datasets, was used to generate ChinaClim_baseline and the monthly climate anomaly surfaces. Climatologically aided interpolation (CAI) was used to superimpose the monthly anomaly surfaces on ChinaClim_baseline to generate ChinaClim_time-series. Our results showed that ChinaClim_baseline exhibited very high performance in four climatic regions, with RMSEs for the estimated precipitation and temperature elements of 1.276–28.439 mm and 0.310–2.040 °C, respectively. The correlations among ChinaClim_baseline, WorldClim2, and CHELSA were high, but our results also clearly captured spatial differences among them. WorldClim2 and CHELSA might overestimate (or underestimate) climate events such as warming and drought in the temperate continental region and the high, cold Tibetan Plateau, where weather stations are sparse. For ChinaClim_time-series, precipitation and temperature elements had average RMSEs of 7.502–52.307 mm and 0.461–0.939 °C across all months, respectively. Compared with Peng’s climate surface and CHELSAcruts, R² increased by ~7 % and RMSE and MAE decreased by ~17 % for precipitation; for temperature elements, R² hardly increased, but RMSE and MAE decreased by ~50 %.
Our results showed ChinaClim_baseline obviously improved the accuracy of time-series climatic elements estimation, and the satellite-driven data can greatly improve the accuracy of time-series precipitation estimation, but not the accuracy of time-series temperatures estimation. Overall, ChinaClim_baseline, an excellent baseline climatology surface, can be used for obtaining high-quality and long-term climate datasets from past to future. In the meantime, ChinaClim_time-series of 1km spatial resolution based on ChinaClim_baseline, is suitable for investigating the spatial-temporal patterns of climate changes and their impacts on eco-environmental systems in China.
Here, ChinaClim_baseline is available at 10.5281/zenodo.5900743 (Gong, 2020a), ChinaClim_time-series of precipitation is available at 10.5281/zenodo.5919442 (Gong, 2020b), ChinaClim_time-series of maximum temperature is available at 10.5281/zenodo.5919448 (Gong, 2020c), ChinaClim_time-series of minimum temperature is available at 10.5281/zenodo.5919423 (Gong, 2020d) and ChinaClim_time-series of average temperature is available at 10.5281/zenodo.5919450 (Gong, 2020e).
Status: closed
- RC1: 'Comment on essd-2022-45', Anonymous Referee #1, 18 Mar 2022
Review
This manuscript reports on a 1km Monthly Precipitation and Temperatures Dataset for China from 1952 to 2019. Basically, a temperature and precipitation dataset at monthly scale is not believed to be able to make any new contribution to the weather and climate community. The China Meteorological Administration has compiled and archived a wide range of temperature and precipitation datasets at finer scales, in which a series of strict and physically based quality-control measures have been implemented. Technically, a variety of data have been used to compile the dataset. Nevertheless, my big concern lies in the ambiguity of the data sources, which needs further clarification. Importantly, the selection of the optimization model simply relied on sorting combinations of various data sources, while the rationale or physical basis behind the different combinations is not explained at all. In addition, there are still a lot of descriptive errors in this manuscript, and most of the figures are not presented in a scientific way, among other issues. I am sorry that I cannot be more positive. Therefore, I have to recommend rejection of the work. My specific concerns are listed as follows:
- Section 3.1, line 273 "After removing duplicate and invalid weather stations", based on what methods and standards are invalid stations defined, and which station data have been removed?
- line286, Table 1 introduces the data used for the interpolation. The precipitation data used the TRMM3B43 data from 1998 to 2019 as a covariate. Was the data from 1952 to 1997 a default covariate? Whether the two pieces of data (1952-1997, 1998-2019) were continuous? For the same reason, temperature series (TMEAN, TMAX, TMIN), the time range of LST data was 2001-2019, then how to deal with the data from 1952-2000, Were the two pieces of data (1952 -2000, 2001-2019) continuous?
- We learned that different optimization models (Table1, TableS7, TableS8) were used in different months, which means that data from different sources was applied to the entire sequence. Has the 1952-2019 change been assessed, how to consider the homogeneity of the climate series (ChinaClim_baseline)? Was this scheme reasonable?
- WorldClim data were downscaled from CRU-TS-4.03.The temperature data in this paper also used CRU as a covariate, and finally the WorldClim was compared with the sequence (ChinaClim_baseline) in this paper. What was the significance of comparing the same data source?
- There were only 613 observation stations in this paper. The satellite data was inversion data and non-observation data, and the accuracy was 25km*25km (0.25°). Other data (CRU) were the grid data with 613 station interpolation applied in China, and the accuracy was 25km*25km (0.25°). The 30-year average climate (1981-2010) data belonged to a 30-year average of 366 data for each station. Precipitation events are small and medium-scale local events of the weather system. Was it reasonable to interpolate these data to 1km with this method of multiple statistical downscaling?
- Section 2.1, lines 164-165: "Dataset of 30-year average climate (1981-2010) was obtained from two sources, 2438 weather stations from CMD and 25 weather stations from Central Weather Bureau ." It was inconsistent with the description in the legend of Figure 1.
- Section 2.1 Lines 167-168: "from the China Meteorological Data Service Center (CMD: http://cdc.nmic.cn)." the website cannot be accessible.
- Figure 2 and Figure 4-7 were not the standardized maps of China, and the location of the nine-dash line and the Diaoyu Islands and Huangyan Islands were not marked.
- The DEM data of the application and the method of how to make it into 1KM*1KM were not explained.
Citation: https://doi.org/10.5194/essd-2022-45-RC1
- AC1: 'Reply on RC1', Gong Haibo, 30 Apr 2022
We have answered all of the referee's comments point by point. The answers are provided below each comment. We would like to thank the reviewer for the thorough and detailed review.
- Referee Comment 1:
This manuscript reports on a 1km Monthly Precipitation and Temperatures Dataset for China from 1952 to 2019. Basically, a temperature and precipitation dataset at monthly scale is not believed to be able to make any new contribution to the weather and climate community. The China Meteorological Administration has compiled and archived a wide range of temperature and precipitation datasets at finer scales, in which a series of strict and physically based quality-control measures have been implemented. Technically, a variety of data have been used to compile the dataset. Nevertheless, my big concern lies in the ambiguity of the data sources, which needs further clarification. Importantly, the selection of the optimization model simply relied on sorting combinations of various data sources, while the rationale or physical basis behind the different combinations is not explained at all. In addition, there are still a lot of descriptive errors in this manuscript, and most of the figures are not presented in a scientific way, among other issues. I am sorry that I cannot be more positive. Therefore, I have to recommend rejection of the work. My specific concerns are listed as follows:
We sincerely thank the reviewer for the very insightful and helpful comments and suggestions. Although climate datasets at a monthly scale may be of limited use to the weather and climate community, climate datasets at monthly/annual scales are essential in vegetation ecology, for example for studying the effects of climate change on species distribution and the response of vegetation productivity/phenology to climate change. To our knowledge, the China Meteorological Administration has compiled a wide range of datasets; however, there is presently no publicly and freely available baseline climatology surface or long-term time-series dataset at high spatial resolution (1km). Previous studies suggested that the release of public climate data over China is insufficient to construct a baseline climatology dataset better than that available from WorldClim2 (Peng et al., 2019). ChinaClim_baseline is a better baseline climatology surface than WorldClim/WorldClim2 because it takes into account dense weather stations (over 2000) and the useful satellite-driven TRMM3B43. We acknowledge the ambiguity of our data sources and have explained the source and processing of the data in a supplement called station_informations in supplement2.zip. Notably, WorldClim/WorldClim2, one of the most popular climate datasets, was also generated by an optimization model relying on combinations of various variables. Additionally, variations in local context and season have a significant influence on climate processes, so the model for fitting the baseline climatology surface should vary among climatic regions and months. ChinaClim_baseline was created by using the optimal TPS model for each climatic region and month. This adaptive method allowed for better model fits in remote regions and specific months. Hence, we consider our method of generating ChinaClim to be feasible.
Indeed, we must also acknowledge that the physical basis behind the different combinations is weak and that current interpolation models hardly reflect this physical mechanism. We carefully addressed the manuscript's shortcomings in terms of descriptive errors and erroneous figures, and we rewrote several phrases with expression issues. We apologize for the numerous errors in this manuscript. We have carefully analyzed each problem and made every effort to improve it here. We truly hope to present a rigorous, meaningful, and attractive manuscript to readers.
- Section 3.1, line 273 "After removing duplicate and invalid weather stations", based on what methods and standards are invalid stations defined, and which station data have been removed?
Thanks for your suggestion!
We provided a supplement called station_informations in supplement2.zip containing weather station information, as well as a list of the weather stations that were utilized and removed in this manuscript.
Specifically, for the 30-year average climate (1981-2010), we excluded 47 weather stations with no records longer than ten years from a dataset of 2160 weather stations. For the monthly climate observations (756 weather stations), we excluded 143 weather stations that had no records from 1981 to 2010 and fewer than 30 years of records.
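The exclusion counts above can be cross-checked against the totals quoted elsewhere in this discussion (2113 CMD stations plus 25 Central Weather Bureau stations for the baseline, and 613 stations for the time series). The following is only a consistency sketch: the variable names are illustrative, and it assumes that the 2160-station dataset refers to the CMD stations.

```python
# Station counts as quoted in the authors' replies.
# Variable names are illustrative; treating the 2160 stations as the
# CMD subset is an assumption made for this check.
baseline_candidates = 2160  # stations with 1981-2010 normals
baseline_excluded = 47      # no records longer than ten years
cwb_stations = 25           # Central Weather Bureau stations

monthly_candidates = 756    # stations with monthly observations
monthly_excluded = 143      # excluded from the monthly series

cmd_used = baseline_candidates - baseline_excluded
baseline_used = cmd_used + cwb_stations
monthly_used = monthly_candidates - monthly_excluded

print(cmd_used, baseline_used, monthly_used)  # 2113 2138 613
```

Under that assumption the numbers agree with the 2138 stations reported for ChinaClim_baseline and the 613 stations reported for ChinaClim_time-series.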
- Line286, Table 1 introduces the data used for the interpolation. The precipitation data used the TRMM3B43 data from 1998 to 2019 as a covariate. Was the data from 1952 to 1997 a default covariate? Whether the two pieces of data (1952-1997, 1998-2019) were continuous? For the same reason, temperature series (TMEAN, TMAX, TMIN), the time range of LST data was 2001-2019, then how to deal with the data from 1952-2000, Were the two pieces of data (1952 -2000, 2001-2019) continuous?
Indeed, continuity is a serious issue since it may greatly affect the quality of time-series datasets. We tried to preserve the continuity of the time-series dataset (ChinaClim time-series) as much as feasible in both data input and model selection. For data input, the continuous CRU dataset was always included as one of the variables used to construct the TPS model across the whole period. For model selection, we used a segmented strategy. We first determined the basic monthly-scale TPS model, one of 7 model formulations (Table S6) constructed from different combinations of variables (Longitude, Latitude, Elevation, Distance to the nearest coast, CRU anomaly (ratio), and the 30-Year normals). This basic model was used as the TPS model for the period without a satellite-driven dataset (e.g., 1952-1997 for precipitation). For the rest of the period (e.g., 1998-2019 for precipitation), the satellite data variable was added to the basic model as either a covariate or a spline variable, and the optimal TPS model was selected from these variants. This segmented strategy allowed us to preserve a degree of similarity between the models over the two periods, giving the best possible continuity in the final dataset.
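The segmented strategy can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the "optimal" model is the one with the smallest validation RMSE, and the formulation names and RMSE values are hypothetical placeholders.

```python
def pick_best(rmse_by_model):
    """Return the model name with the smallest RMSE."""
    return min(rmse_by_model, key=rmse_by_model.get)


def segmented_selection(base_rmse, satellite_rmse, satellite_start):
    """Choose a base model, then a satellite-extended variant of that same base.

    base_rmse: {formulation name: RMSE} for models without satellite data.
    satellite_rmse: {(parent formulation, variant name): RMSE}.
    satellite_start: first year in which the satellite product exists.
    """
    base = pick_best(base_rmse)
    # Only variants built on the chosen base model are candidates, which
    # keeps the two periods' models structurally similar.
    candidates = {
        variant: err
        for (parent, variant), err in satellite_rmse.items()
        if parent == base
    }
    extended = pick_best(candidates)

    def model_for(year):
        return extended if year >= satellite_start else base

    return base, extended, model_for


# Hypothetical RMSE values, for illustration only.
base_rmse = {"F1": 9.1, "F2": 8.4, "F3": 8.9}
satellite_rmse = {
    ("F2", "F2+sat_as_covariate"): 7.9,
    ("F2", "F2+sat_as_spline_variable"): 8.1,
    ("F1", "F1+sat_as_covariate"): 7.5,  # ignored: not built on the chosen base
}
base, extended, model_for = segmented_selection(base_rmse, satellite_rmse, 1998)
print(base, extended, model_for(1997), model_for(1998))
```

Note that the variant built on "F1" has the lowest RMSE overall in this toy input but is not selected, because the strategy as described only extends the base model already chosen for the pre-satellite period.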
Additionally, we calculated the temporal trends of the ChinaClim_time-series datasets at a monthly scale and checked for possible abrupt shifts in the relevant years (e.g., 1998 and 2001). Our dataset displayed good continuity, according to the results in a supplement called trend in supplement2.zip.
- We learned that different optimization models (Table1, TableS7, TableS8) were used in different months, which means that data from different sources was applied to the entire sequence. Has the 1952-2019 change been assessed, how to consider the homogeneity of the climate series (ChinaClim_baseline)? Was this scheme reasonable?
Thanks!
As shown in a supplement called trend in supplement2.zip, we assessed the temporal variation of ChinaClim_time-series.
Additionally, I'm not sure what you mean about the homogeneity of the climate series; do you believe the spatial patterns of the various baseline climatology surfaces (or time-series climate data) should be assessed?
- WorldClim data were downscaled from CRU-TS-4.03. The temperature data in this paper also used CRU as a covariate, and finally the WorldClim was compared with the sequence (ChinaClim_baseline) in this paper. What was the significance of comparing the same data source?
Thanks!
As far as we know, both WorldClim and WorldClim2 were generated by interpolation with the thin-plate smoothing spline algorithm rather than by downscaling CRU-TS-4.0. This was also described in section 2.6 of the manuscript. The specific articles are:
- Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A. 2005. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. 25: 1965–1978.
- Fick, S.E. and R.J. Hijmans, 2017. WorldClim 2: new 1km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37 (12): 4302-4315.
We did not use CRU data as a covariate in the generation of the baseline climatology surface, as shown in Table 1.
Generally, the accuracy of diverse data sets should be evaluated using their own independent samples. We were unable to find independent samples to test the accuracy of WorldClim2 and CHELSA in this manuscript, thus we just analyzed the differences in estimated temperature and precipitation values for different baseline climatology surfaces.
- There were only 613 observation stations in this paper. The satellite data was inversion data and non-observation data, and the accuracy was 25km*25km (0.25°). Other data (CRU) were the grid data with 613 station interpolation applied in China, and the accuracy was 25km*25km (0.25°). The 30-year average climate (1981-2010) data belonged to a 30-year average of 366 data for each station. Precipitation events are small and medium-scale local events of the weather system. Was it reasonable to interpolate these data to 1km with this method of multiple statistical downscaling?
Thanks!
In this work, 613 observation sites, all of the freely available weather stations, were used to construct the ChinaClim time-series. Given this limited set of observation stations, attempting to interpolate directly to ~1km spatial resolution from these stations may lead to significant errors. We therefore employed the climatologically aided interpolation (CAI) approach, which obtains the final climatic surface by superimposing the monthly anomaly surface on the baseline climatology surface. First, 12 baseline climatology surfaces were generated using the multi-year monthly average observation data from more than 2,000 weather stations. These baseline climatology surfaces represent the local average monthly and annual conditions. Then, based on the anomaly time series from the 613 observation stations, the monthly anomalies were interpolated to create monthly anomaly surfaces.
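The superimposing step of CAI can be illustrated with a minimal sketch. It assumes the common CAI convention of additive anomalies for temperature and ratio anomalies for precipitation; the discussion mentions a "CRU anomaly (ratio)" but does not spell out the exact formulas, so the function names, grid values, and the additive/multiplicative split are assumptions made for illustration.

```python
def cai_temperature(baseline, anomaly):
    """Additive CAI: monthly value = climatological normal + anomaly (deg C)."""
    return [
        [b + a for b, a in zip(b_row, a_row)]
        for b_row, a_row in zip(baseline, anomaly)
    ]


def cai_precipitation(baseline, ratio):
    """Multiplicative CAI: monthly value = climatological normal * ratio (mm)."""
    return [
        [b * r for b, r in zip(b_row, r_row)]
        for b_row, r_row in zip(baseline, ratio)
    ]


# Tiny 2x2 grids with made-up values; real surfaces are 1km rasters.
temp_normal = [[10.0, 12.0], [8.0, 9.5]]    # baseline for one calendar month
temp_anomaly = [[0.5, -1.0], [0.0, 1.5]]    # interpolated anomaly for one year
prec_normal = [[100.0, 50.0], [20.0, 0.0]]
prec_ratio = [[1.5, 0.5], [1.0, 2.0]]       # fraction of normal precipitation

print(cai_temperature(temp_normal, temp_anomaly))
print(cai_precipitation(prec_normal, prec_ratio))
```

A ratio form keeps precipitation non-negative and leaves dry cells dry, which is one reason it is often preferred over an additive anomaly for precipitation.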
Notably, we downscaled the TRMM3B43 of 0.25° by applying the Cubist algorithm, which was believed to be effective for precipitation downscaling. The related references were as follows:
- Improving TMPA 3B43 V7 Data Sets Using Land-Surface Characteristics and Ground Observations on the Qinghai–Tibet Plateau.
- A spatial data mining algorithm for downscaling TMPA 3B43 V7 data over the Qinghai–Tibet Plateau with the effects of systematic anomalies removed.
We used TPS interpolation to bring the CRU data to 1km spatial resolution, following the processing used for datasets such as CHELSA and KGclim. We recognize that this method increases the uncertainty of our dataset, and future work should further improve the algorithm to improve accuracy.
CHELSA: Climatologies at high resolution for the earth's land surface areas
KGclim: A 1 km global dataset of historical (1979–2013) and future (2020–2100) Köppen–Geiger climate classification and bioclimatic variables
- Section 2.1, lines 164-165: "Dataset of 30-year average climate (1981-2010) was obtained from two sources, 2438 weather stations from CMD and 25 weather stations from Central Weather Bureau." It was inconsistent with the description in the legend of Figure 1.
Thanks!
We revised the description in the legend and attached specific information about weather stations used in this manuscript. We provided new Figure 1 in a supplement called figures in supplement2.zip.
For the generation of ChinaClim_baseline, the number of weather stations used in this study was 2138, containing 2113 weather stations from CMD and 25 weather stations from Central Weather Bureau.
- Section 2.1 Lines 167-168: "from the China Meteorological Data Service Center (CMD: http://cdc.nmic.cn)." the website cannot be accessible.
Thanks!
We replaced the inaccessible link (http://cdc.nmic.cn) with the correct one (http://data.cma.cn).
- Figure 2 and Figure 4-7 were not the standardized maps of China, and the location of the nine-dash line and the Diaoyu Islands and Huangyan Islands were not marked.
Thanks!
We revised the maps of China based on the standardized maps and provided them in a supplement called figures in supplement2.zip.
Given the drawing space limitations, as shown in Figure 2, we only marked the location of the nine-dash line and the Diaoyu Islands and Huangyan Islands for the bottom two maps.
- The DEM data of the application and the method of how to make it into 1KM*1KM were not explained.
Thanks!
The DEM data were aggregated to 1km spatial resolution using the nearest-neighbor interpolation method.
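For readers unfamiliar with nearest-neighbor resampling, the idea is sketched below in plain Python. The source resolution of the DEM is not stated in this reply, so the integer coarsening factor and the toy elevation grid are assumptions, and the function name is hypothetical; a production workflow would normally use a GIS library instead.

```python
def nearest_neighbor_coarsen(grid, factor):
    """Coarsen a 2-D grid by an integer factor without averaging.

    Each coarse cell takes the value of the fine cell closest to its centre.
    For even factors the centre falls between cells, so this picks the
    lower-right cell of the tied candidates.
    """
    rows, cols = len(grid), len(grid[0])
    offset = factor // 2
    return [
        [grid[r + offset][c + offset] for c in range(0, cols - factor + 1, factor)]
        for r in range(0, rows - factor + 1, factor)
    ]


# Toy 4x4 elevation grid (metres), coarsened by a factor of 2.
dem = [
    [100, 110, 120, 130],
    [105, 115, 125, 135],
    [200, 210, 220, 230],
    [205, 215, 225, 235],
]
coarse = nearest_neighbor_coarsen(dem, 2)
print(coarse)  # [[115, 135], [215, 235]]
```

Because nearest-neighbor sampling keeps a single fine-cell value per coarse cell, it preserves original elevations but ignores sub-cell variability, unlike mean aggregation.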
- RC2: 'Comment on essd-2022-45', Simon Tett, 30 May 2022
Review of 1km Monthly Precipitation and Temperature Dataset for China from 1952 to 2019 based on a Brand-New and High-Quality Baseline Climatology Surface.
By Gong et al
This paper aims to produce a 1km resolution monthly dataset from 1952 to present for temperature and precipitation. It does this by first generating a 1km climatology for 1981 to 2010. This is done by interpolating in situ station data using various covariates including, when available, satellite estimates of precipitation and land surface temperature. The authors show that the data is an improvement on existing datasets. I think the paper and dataset are a useful contribution to knowledge of Chinese climate change and should eventually be published.
Major points
- Though the paper is not badly written, it would benefit from an edit to improve some of the English and clarify some points. In particular the discussion around model selection is difficult to follow.
- I am concerned about the lack of quality control on the data, though the station data used in the dataset might have been quality controlled by CMA. Some discussion of this is needed in the final paper. The impression that my Chinese collaborators give is that the instrumental data prior to 1961 is not very reliable. Some discussion of this is needed and could form part of the quality discussion on the data.
- I do not understand why satellite data was interpolated to 1km to then be used in the interpolation of the station data. It would be better to keep that on its original resolution (or have the same values for every 1km pixel within the satellite information foot print) rather than add false spatial information to it. Further, as this data is only present for some of the climatology period I worry about the quality of the analysis prior to that data being present.
- The section that describes the selection of the various interpolation models is unclear. My sense is that multiple models are tried and per calendar month the best one (smallest RMSE) is used. I am concerned that this will lead to overconfidence. I appreciate the authors have kept 10% of stations, selected at random, back for testing. I think this data is used for testing the 11 different models and determining, for each month, which is the best model. In the absence of physical insight it would be better to have a single interpolation model for the whole period, with data withheld from the model generation and selection and then used to test the final model. This could also be used to test the error model used in the interpolation.
- The climatological analysis lacks a comprehensive uncertainty analysis. At the station level the authors are, I assume, using a white noise error. However, I think the interpolation will generate correlated error. The dataset would be an advance on existing approaches if it generated an ensemble of datasets that included the various error sources. That would allow users to determine how uncertain the results are, I suspect this will vary considerably depending on station density and other features of the data.
- The title is too long and over claims. It will not be “brand new” for long and the authors do not demonstrate its quality. I suggest excising such text from the title and a revised paper.
- Figures and associated text are rather small. I recommend that the authors create the figures at the size they expect them to be in the paper. Doing that will mean that text elements are of the appropriate size.
- As climatological precipitation has very large spatial gradients I think it would be better to compute RMSE statistics as fractions of estimated precipitation rather than in mm.
- I have similar concerns about the monthly-resolved model as I do about the climatology. That is, the model selection, and given that the satellite data only covers a relatively small part of the entire dataset, I wonder why it is being used at all. Further, there is a lack of uncertainty analysis.
- For the monthly resolved dataset I suspect that interpolating fraction of normal precipitation, rather than total precipitation, would be more accurate.
- The authors should consider making available an anomaly dataset (difference from normal), whose effective resolution is likely low.
- The authors make the data available in a variety of formats (NC for the climatology and GeoTIFF for the monthly resolved dataset). They also describe the scaling used in the NC file. I have not looked at the data in detail. Given my concerns about the work I will wait for a revised paper before doing this. I do not think they need to say in the revised paper what the scaling is – that should be in the NetCDF file. Unless they are reducing the precision it is not necessary to introduce a scaling factor either. I looked at the precipitation climatology dataset using xarray. I suggest that sensible names and a correct CRS code (or name) are used. So change “variable” to “pr” or “tas” as appropriate, and rather than “z” set it to “month”. I also recommend adding some metadata to the dataset pointing to the paper, the authors, and information on the period covered and possibly the statistical model used.
Citation: https://doi.org/10.5194/essd-2022-45-RC2
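The suggestion in RC2 to report precipitation errors as fractions rather than in mm can be illustrated with a short sketch. It shows one possible reading, the root mean square of errors divided by the estimated value; another reading would divide the ordinary RMSE by the mean estimate. The function names and sample values are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Ordinary RMSE in the units of the data (mm for precipitation)."""
    squared = [(e - o) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def fractional_rmse(observed, estimated):
    """RMS of errors expressed as fractions of the estimated value.

    Pairs with a zero estimate are skipped to avoid division by zero.
    """
    fractions = [(e - o) / e for o, e in zip(observed, estimated) if e != 0]
    if not fractions:
        raise ValueError("no non-zero estimates to normalise by")
    return math.sqrt(sum(f * f for f in fractions) / len(fractions))


# Made-up values: a dry site and a wet site, each with a 2 mm error.
observed = [10.0, 1000.0]
estimated = [12.0, 1002.0]

print(rmse(observed, estimated))             # identical absolute error at both sites
print(fractional_rmse(observed, estimated))  # dominated by the dry site
```

The absolute RMSE treats both sites alike, while the fractional form shows that a 2 mm error is large relative to 12 mm and negligible relative to 1002 mm, which is the contrast the referee's comment is concerned with.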
Status: closed
-
RC1: 'Comment on essd-2022-45', Anonymous Referee #1, 18 Mar 2022
Review
This manuscript reported on one dataset of 1km Monthly Precipitation and Temperatures Dataset for China from 1952 to 2019. Basically, the dataset of temperature and precipitation at monthly scale are believed to be not able make any new contributions to the field of weather and climate community. China Meteorological Administration has compiled and archived a wide range of temperature and precipitation dataset at finer scales, in which a series of strict and physically- based quality-control measures have been implemented. Technically, a variety of data have been used to compile the dataset. Nevertheless, my big concern lies at the ambiguousness of data sources, which needs further clarification. Importantly, the selection of the optimization model was simply relying on the sorting combination of various data sources, while the rationality or physical basis behind the different combinations is not explained at all. In addition, there were a lot of still descriptive errors in this manuscript, and most of the figures were not presented in scientific ways, among others. Sorry for that I cannot be more positive. Therefore, I have to recommend the rejection of the work. The specific concerns of mine are listed as follows:
- Section 3.1, line 273 "After removing duplicate and invalid weather stations", based on what methods and standards are invalid stations defined, and which station data have been removed?
- line286, Table 1 introduces the data used for the interpolation. The precipitation data used the TRMM3B43 data from 1998 to 2019 as a covariate. Was the data from 1952 to 1997 a default covariate? Whether the two pieces of data (1952-1997, 1998-2019) were continuous? For the same reason, temperature series (TMEAN, TMAX, TMIN), the time range of LST data was 2001-2019, then how to deal with the data from 1952-2000, Were the two pieces of data (1952 -2000, 2001-2019) continuous?
- We learned that different optimization models (Table1, TableS7, TableS8) were used in different months, which means that data from different sources was applied to the entire sequence. Has the 1952-2019 change been assessed, how to consider the homogeneity of the climate series (ChinaClim_baseline)? Was this scheme reasonable?
- WorldClim data were downscaled from CRU-TS-4.03.The temperature data in this paper also used CRU as a covariate, and finally the WorldClim was compared with the sequence (ChinaClim_baseline) in this paper. What was the significance of comparing the same data source?
- There were only 613 observation stations in this paper. The satellite data was inversion data and non-observation data, and the accuracy was 25km*25km (0.25°). Other data (CRU) were the grid data with 613 station interpolation applied in China, and the accuracy was 25km*25km (0.25°). The 30-year average climate (1981-2010) data belonged to a 30-year average of 366 data for each station. Precipitation events are small and medium-sized local events of the weather system. Precipitation events are small and medium-scale local events of the weather system. Was it reasonable to interpolate these data to 1km with this method of multiple statistical downscaling?
- Section 2.1, lines 164-165: "Dataset of 30-year average climate (1981-2010) was obtained from two sources, 2438 weather stations from CMD and 25 weather stations from Central Weather Bureau ." It was inconsistent with the description in the legend of Figure 1.
- Section 2.1 Lines 167-168: "from the China Meteorological Data Service Center (CMD: http://cdc.nmic.cn)." the website cannot be accessible.
- Figure 2 and Figure 4-7 were not the standardized maps of China, and the location of the nine-dash line and the Diaoyu Islands and Huangyan Islands were not marked.
- The DEM data of the application and the method of how to make it into 1KM*1KM were not explained.
Citation: https://doi.org/10.5194/essd-2022-45-RC1 -
AC1: 'Reply on RC1', Gong Haibo, 30 Apr 2022
We have answered point-by-point to all the comments raised by the referee. The answers are provided below each comment. We would like to thank the reviewer for the quality and detailed reviewing provided.
- Referee Comment 1:
This manuscript reported on one dataset of 1km Monthly Precipitation and Temperatures Dataset for China from 1952 to 2019. Basically, the dataset of temperature and precipitation at monthly scale are believed to be not able make any new contributions to the field of weather and climate community. China Meteorological Administration has compiled and archived a wide range of temperature and precipitation dataset at finer scales, in which a series of strict and physically- based quality-control measures have been implemented. Technically, a variety of data have been used to compile the dataset. Nevertheless, my big concern lies at the ambiguousness of data sources, which needs further clarification. Importantly, the selection of the optimization model was simply relying on the sorting combination of various data sources, while the rationality or physical basis behind the different combinations is not explained at all. In addition, there were a lot of still descriptive errors in this manuscript, and most of the figures were not presented in scientific ways, among others. Sorry for that I cannot be more positive. Therefore, I have to recommend the rejection of the work. The specific concerns of mine are listed as follows:
We sincerely thank the reviewer for his very insightful and effective comments and suggestions. Although the usefulness of climate datasets at a monthly scale may be weak in the field of weather and climate community, the climate datasets at monthly/annual scales are essential in the field of vegetation ecology, such as the effects of climate change on species distribution and the response of vegetation productivity/phenology to climate change. To our knowledge, the China Meteorological Administration has compiled a wide range of datasets, however, there were presently no publicly and freely available baseline climatology surface and the long-term time series datasets at high-quality spatial resolution (1km). Previous studies suggested that the release of public climate data over China is insufficient to construct a baseline climatology dataset better than that available from WorldClim2 (Peng et al., 2019). In comparison to WorldClim/WorldClim2, ChinaClim_baseline, was a better baseline climatology surface than WorldClim/WorldClim2, taking into account the dense weather stations (over 2000) and the useful satellite-driven TRMM3B43. We acknowledged the ambiguousness of our data sources and explained the source and processing of the data in a supplement called station_informations in supplement2.zip. Notably, WorldClim/WorldClim2, which was one of the most popular climate datasets, was generated by the optimization model relying on the combination of various variables. Additiionally, variations in local context and seasons have significant influence on climate processes, thus the model for fitting baseline climatology surface should vary from various climatic regions and different months. ChinaClim_baseline was created by using the optimal TPS model for each climatic region and different months. This adaptive method allowed for better model fits in remote regions and specific months. Hence, our method of generation ChinaClim was feasible. 
Indeed, we must also acknowledge that the physical basis behind the different combinations is weak and that current interpolation models hardly reflect this physical mechanism. We carefully addressed the manuscript's shortcomings in terms of descriptive errors and erroneous figures, and we rewrote several phrases with expression issues. We apologize for the numerous errors in this manuscript. We carefully analyzed each problem and made every effort to improve it here. We truly hope to present a rigorous, meaningful, and attractive manuscript to readers.
- Section 3.1, line 273 "After removing duplicate and invalid weather stations", based on
what methods and standards are invalid stations defined, and which station data have
been removed?
Thanks for your suggestion!
We provided a supplement called station_informations in supplement2.zip containing weather station information, as well as a list of the weather stations that were used and removed in this manuscript.
Specifically, for the 30-year average climate (1981-2010), we excluded 47 weather stations with no records longer than ten years from a dataset of 2160 weather stations. For the monthly climate observations (756 weather stations), we excluded 143 weather stations with no records from 1981 to 2010 and fewer than 30 years of records, leaving the 613 stations used for ChinaClim_time-series.
- Line 286, Table 1 introduces the data used for the interpolation. The precipitation data
used the TRMM3B43 data from 1998 to 2019 as a covariate. Was the data from 1952 to 1997 a default covariate? Were the two pieces of data (1952-1997, 1998-2019) continuous? For the same reason, for the temperature series (TMEAN, TMAX, TMIN), the time range of the LST data was 2001-2019; how were the data from 1952-2000 dealt with, and were the two pieces of data (1952-2000, 2001-2019) continuous?
Indeed, continuity is a serious issue since it may greatly affect the quality of time-series datasets. We made attempts in both data input and model selection to preserve the continuity of the time-series dataset (ChinaClim_time-series) as much as feasible. For data input, the continuous CRU dataset was always included as one of the variables used to construct the TPS model across the whole period. For model selection, we used a segmented strategy. We first determined the basic monthly-scale TPS model, one of 7 model formulations (Table S6) constructed from different combinations of variables (longitude, latitude, elevation, distance to the nearest coast, CRU anomaly (ratio), and the 30-year normals). This model was used as the TPS model for the period without a satellite-driven dataset (e.g., 1952-1997 for precipitation). Then, starting from the basic monthly-scale TPS model, the satellite data variable was added as either a covariate or a spline variable to select the optimal TPS model for the rest of the period (e.g., 1998-2019 for precipitation). This segmented strategy allowed us to preserve a degree of similarity between the models over the two periods, ensuring the best possible continuity of the final dataset.
Additionally, we calculated the temporal trends of the ChinaClim_time-series datasets at a monthly scale and checked for possible abrupt changes in the relevant years (e.g., 1998 and 2001). Our dataset displayed good continuity, according to the results in a supplement called trend in supplement2.zip.
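The segmented model selection described in this response can be illustrated with a minimal sketch. It is not the authors' code: the function name `select_segmented`, the variable names, and the score values are hypothetical, and the two scoring callables stand in for whatever cross-validated RMSE the authors computed for each period.

```python
from typing import Callable, List, Tuple

# A formulation is (spline variables, linear covariates). Names are illustrative.
Formulation = Tuple[Tuple[str, ...], Tuple[str, ...]]


def select_segmented(
    base_candidates: List[Formulation],
    rmse_early: Callable[[Formulation], float],
    rmse_late: Callable[[Formulation], float],
    satellite: str = "satellite",
) -> Tuple[Formulation, Formulation]:
    """Pick a basic model, then extend it with the satellite variable."""
    # Step 1: basic model for the period without satellite data.
    base = min(base_candidates, key=rmse_early)
    splines, covariates = base
    # Step 2: add the satellite variable either as a linear covariate or as
    # a spline variable, and keep whichever scores better in the late period.
    extended = [
        (splines, covariates + (satellite,)),
        (splines + (satellite,), covariates),
    ]
    late = min(extended, key=rmse_late)
    return base, late


# Made-up scores standing in for cross-validated RMSE values.
early_scores = {
    (("lon", "lat"), ("elev",)): 1.2,
    (("lon", "lat", "elev"), ()): 0.9,
    (("lon", "lat", "elev"), ("cru_anomaly",)): 0.7,
}
late_scores = {
    (("lon", "lat", "elev"), ("cru_anomaly", "satellite")): 0.60,
    (("lon", "lat", "elev", "satellite"), ("cru_anomaly",)): 0.65,
}

base_model, late_model = select_segmented(
    list(early_scores), early_scores.__getitem__, late_scores.__getitem__
)
```

Because the late-period model is always an extension of the basic model, the two periods share the same core variables, which is the property the response relies on for continuity.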
- We learned that different optimization models (Table 1, Table S7, Table S8) were used in
different months, which means that data from different sources were applied to the entire sequence. Has the 1952-2019 change been assessed, and how was the homogeneity of the climate series (ChinaClim_baseline) considered? Was this scheme reasonable?
Thanks!
As shown in a supplement called trend in supplement2.zip, we assessed the temporal variation of ChinaClim_time-series.
Additionally, we are not sure what you mean by the homogeneity of the climate series; do you mean that the spatial patterns of the various baseline climatology surfaces (or time-series climate data) should be assessed?
- WorldClim data were downscaled from CRU-TS-4.03. The temperature data in this paper also used CRU as a covariate, and finally WorldClim was compared with the sequence (ChinaClim_baseline) in this paper. What was the significance of comparing datasets built from the same data source?
Thanks!
As far as we know, both WorldClim and WorldClim2 were generated by interpolation with the thin-plate smoothing spline algorithm rather than by downscaling from CRU-TS-4.0. This is also described in section 2.6 of this manuscript. The specific articles are:
- Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G. and Jarvis, A., 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978.
- Fick, S.E. and Hijmans, R.J., 2017. WorldClim 2: new 1km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37 (12): 4302-4315.
We did not use CRU data as a covariate in the generation of the baseline climatology surface, as shown in Table 1.
Generally, the accuracy of different datasets should be evaluated using their own independent samples. We were unable to find independent samples to test the accuracy of WorldClim2 and CHELSA in this manuscript, so we only analyzed the differences in estimated temperature and precipitation values among the baseline climatology surfaces.
- There were only 613 observation stations in this paper. The satellite data were inversion data rather than observation data, with a resolution of 25km*25km (0.25°). Other data (CRU) were gridded data with 613-station interpolation applied in China, also with a resolution of 25km*25km (0.25°). The 30-year average climate (1981-2010) data were a 30-year average of 366 data for each station. Precipitation events are small and medium-scale local events of the weather system. Was it reasonable to interpolate these data to 1km with this method of multiple statistical downscaling?
Thanks!
In this work, 613 observation sites were used to construct ChinaClim_time-series, which was the full set of freely available weather stations. Given the limited number of observation stations, interpolating directly from them to ~1km spatial resolution may lead to significant errors. Here, we employed the climatologically aided interpolation (CAI) approach to obtain the final climatic surface by superimposing the baseline climatology surface and the monthly anomaly surface. To begin, 12 baseline climatology surfaces were generated using the multi-year monthly average observation data from more than 2,000 weather stations. These baseline climatology surfaces represent the local average monthly and annual conditions. Then, based on the anomaly time series from the 613 observation stations, the monthly anomalies were interpolated to create monthly anomaly surfaces.
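The CAI steps in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: inverse-distance weighting (`idw`) is swapped in for the thin plate spline interpolation used in the manuscript so that the example needs no external libraries, the grid and station values are made up, and the `ratio` option (multiplying the baseline by a ratio anomaly, as is common for precipitation) is an assumption rather than a confirmed detail of the authors' workflow.

```python
import math


def idw(stations, values, x, y, power=2.0):
    """Inverse-distance weighted estimate at (x, y); a stand-in for TPS."""
    num = 0.0
    den = 0.0
    for (sx, sy), v in zip(stations, values):
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return v  # exactly on a station: use its value
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def cai_surface(baseline, stations, normals, observed, ratio=False):
    """Superimpose an interpolated anomaly surface on a baseline grid.

    baseline: 2D list of climatological normals, indexed [row][col]
    stations: list of (x, y) positions in grid coordinates (x = col, y = row)
    normals:  station climatological normals for this calendar month
    observed: station observations for the target month
    ratio:    False -> additive anomaly; True -> ratio anomaly (assumed)
    """
    if ratio:
        anomalies = [o / n for o, n in zip(observed, normals)]
    else:
        anomalies = [o - n for o, n in zip(observed, normals)]
    surface = []
    for r, row in enumerate(baseline):
        out_row = []
        for c, base in enumerate(row):
            a = idw(stations, anomalies, x=c, y=r)
            out_row.append(base * a if ratio else base + a)
        surface.append(out_row)
    return surface


# Toy 2x2 baseline with two stations, both 1 degree warmer than normal.
baseline = [[10.0, 11.0], [12.0, 13.0]]
stations = [(0, 0), (1, 1)]
warm_month = cai_surface(baseline, stations, [10.0, 13.0], [11.0, 14.0])
wet_month = cai_surface(baseline, stations, [50.0, 100.0], [100.0, 200.0], ratio=True)
```

The fine spatial detail comes from the baseline, which was built from the denser station network, while the sparse 613-station network only has to describe the smoother month-to-month anomaly field.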
Notably, we downscaled the 0.25° TRMM3B43 by applying the Cubist algorithm, which is believed to be effective for precipitation downscaling. The related references are as follows:
- Improving TMPA 3B43 V7 Data Sets Using Land-Surface Characteristics and Ground Observations on the Qinghai–Tibet Plateau.
- A spatial data mining algorithm for downscaling TMPA 3B43 V7 data over the Qinghai–Tibet Plateau with the effects of systematic anomalies removed.
We used TPS interpolation to interpolate the CRU data to 1km spatial resolution, following the processing used for datasets such as CHELSA and KGclim. We realize that this method would increase the uncertainty of our dataset, and future work should further improve the algorithm to improve accuracy.
CHELSA: Climatologies at high resolution for the earth's land surface areas
KGclim: A 1 km global dataset of historical (1979–2013) and future (2020–2100) Köppen–Geiger climate classification and bioclimatic variables
- Section 2.1, lines 164-165: "Dataset of 30-year average climate (1981-2010) was
obtained from two sources, 2438 weather stations from CMD and 25 weather stations
from Central Weather Bureau." It was inconsistent with the description in the legend of
Figure 1.
Thanks!
We revised the description in the legend and attached specific information about the weather stations used in this manuscript. We provided a new Figure 1 in a supplement called figures in supplement2.zip.
For the generation of ChinaClim_baseline, the number of weather stations used in this study was 2138, containing 2113 weather stations from CMD and 25 weather stations from Central Weather Bureau.
- Section 2.1 Lines 167-168: "from the China Meteorological Data Service Center (CMD: http://cdc.nmic.cn)." the website is not accessible.
Thanks!
We replaced the inaccessible website (http://cdc.nmic.cn) with the correct link (http://data.cma.cn).
- Figure 2 and Figure 4-7 were not the standardized maps of China, and the location of the nine-dash line and the Diaoyu Islands and Huangyan Islands were not marked.
Thanks!
We revised the maps of China based on the standardized maps and provided them in a supplement called figures in supplement2.zip.
Given the drawing space limitations, as shown in Figure 2, we only marked the location of the nine-dash line and the Diaoyu Islands and Huangyan Islands for the bottom two maps.
- The DEM data of the application and the method of how to make it into 1KM*1KM were
not explained.
Thanks!
The DEM data were aggregated to 1km spatial resolution using the nearest-neighbor interpolation method.
RC2: 'Comment on essd-2022-45', Simon Tett, 30 May 2022
Review of 1km Monthly Precipitation and Temperature Dataset for China from 1952 to 2019 based on a Brand-New and High-Quality Baseline Climatology Surface.
By Gong et al.
This paper aims to produce a 1km resolution monthly dataset from 1952 to present for temperature and precipitation. It does this by first generating a 1km climatology for 1981 to 2010. This is done by interpolating in situ station data using various covariates including, when available, satellite estimates of precipitation and land surface temperature. The authors show that the data is an improvement on existing datasets. I think the paper and dataset are a useful contribution to knowledge of Chinese climate change and should eventually be published.
Major points
- Though the paper is not badly written, it would benefit from an edit to improve some of the English and clarify some points. In particular the discussion around model selection is difficult to follow.
- I am concerned about the lack of quality control on the data, though the station data used in the dataset might have been quality controlled by CMA. Some discussion of this is needed in the final paper. The impression that my Chinese collaborators give is that the instrumental data prior to 1961 are not very reliable. Some discussion of this is needed and could form part of the quality discussion on the data.
- I do not understand why satellite data was interpolated to 1km to then be used in the interpolation of the station data. It would be better to keep that on its original resolution (or have the same values for every 1km pixel within the satellite information foot print) rather than add false spatial information to it. Further, as this data is only present for some of the climatology period I worry about the quality of the analysis prior to that data being present.
- The section that describes the selection of the various interpolation models is unclear. My sense is that multiple models are tried and, per calendar month, the best one (smallest RMSE) is used. I am concerned that this will lead to overconfidence. I appreciate the authors have kept 10% of stations, selected at random, back for testing. I *think* this data is used for testing the 11 different models and determining, for each month, which is the best model. In the absence of physical insight it would be better to have a single interpolation model for the whole period, with data withheld from the model generation & selection and then used to test the final model. This could also be used to test the error model used in the interpolation.
- The climatological analysis lacks a comprehensive uncertainty analysis. At the station level the authors are, I assume, using a white noise error. However, I think the interpolation will generate correlated error. The dataset would be an advance on existing approaches if it generated an ensemble of datasets that included the various error sources. That would allow users to determine how uncertain the results are, I suspect this will vary considerably depending on station density and other features of the data.
- The title is too long and overclaims. It will not be “brand new” for long and the authors do not demonstrate its quality. I suggest excising such text from the title and from a revised paper.
- Figures and associated text are rather small. I recommend that the authors create the figures at the size they expect them to be in the paper. Doing that will mean that text elements are of the appropriate size.
- As climatological precipitation has very large spatial gradients I think it would be better to compute RMSE statistics as fractions of estimated precipitation rather than in mm.
- I have similar concerns about the monthly-resolved model as I do about the climatology, namely the model selection. Given that the satellite data cover only a relatively small part of the entire dataset, I wonder why they are being used at all. Further, there is a lack of uncertainty analysis.
- For the monthly resolved dataset I suspect that interpolating fraction of normal precipitation, rather than total precipitation, would be more accurate.
- The authors should consider making available an anomaly dataset (difference from normal), whose effective resolution is likely low.
- The authors make the data available in a variety of formats (NC for the climatology and geotiff for the monthly resolved dataset). They also describe the scaling used in the NC file. I have not looked at the data in detail. Given my concerns about the work I will wait for a revised paper before doing this. I do not think they need to say in the revised paper what the scaling is; that should be in the NetCDF file. Unless they are reducing the precision it is not necessary to introduce a scaling factor either. I looked at the precipitation climatology dataset using xarray. I suggest that sensible names are used and a correct crs code (or name). So change “variable” to “pr” or “tas” as appropriate, and rather than z set it to month. I also recommend adding some metadata to the dataset pointing to the paper, the authors, and information on the period covered and possibly the statistical model used.
Citation: https://doi.org/10.5194/essd-2022-45-RC2
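The reviewer's suggestion above to report precipitation RMSE as a fraction of estimated precipitation, rather than in mm, can be illustrated with a short sketch. The function names and the two-station values are hypothetical and chosen only to show how the two statistics differ between a dry and a wet site.

```python
import math


def rmse(observed, estimated):
    """Root-mean-square error in the units of the data (e.g. mm)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def fractional_rmse(observed, estimated):
    """RMSE of errors expressed as a fraction of the estimated value.

    Each error is divided by the estimate at that station, so the
    estimates must be strictly positive.
    """
    n = len(observed)
    return math.sqrt(
        sum(((o - e) / e) ** 2 for o, e in zip(observed, estimated)) / n
    )


# Made-up monthly precipitation (mm) at a dry and a wet station.
observed = [10.0, 210.0]
estimated = [8.0, 200.0]

absolute = rmse(observed, estimated)             # dominated by the wet station
relative = fractional_rmse(observed, estimated)  # dominated by the dry station
```

In this toy case the 10 mm miss at the wet station drives the absolute RMSE, whereas the 2 mm miss at the dry station is the larger error in relative terms, which is why the fractional form is more informative across strong precipitation gradients.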
Data sets
1 km Monthly Minimum Temperature Dataset for China from 1952 to 2019 (ChinaClim_time-series) Gonghaibo https://doi.org/10.5281/zenodo.5919423
1 km Monthly Maximum Temperature Dataset for China from 1952 to 2019 (ChinaClim_time-series) Gonghaibo https://doi.org/10.5281/zenodo.5919448
1 km Monthly Average Temperature Dataset for China from 1952 to 2019 (ChinaClim_time-series) Gonghaibo https://doi.org/10.5281/zenodo.5919450
1 km Monthly Precipitation Dataset for China from 1952 to 2019 (ChinaClim_time-series) Gonghaibo https://doi.org/10.5281/zenodo.5919442
A Brand-New and High-Quality Baseline Climatology Surface for China (ChinaClim_baseline) Gonghaibo https://doi.org/10.5281/zenodo.5900743