the Creative Commons Attribution 4.0 License.
ChinaWheatYield30m: A 30-m annual winter wheat yield dataset from 2016 to 2021 in China
Yu Zhao
Shaoyu Han
Jie Zheng
Hanyu Xue
Zhenhai Li
Yang Meng
XuGang Li
Xiaodong Yang
Zhenhong Li
Shuhong Cai
Guijun Yang
Abstract. Generating spatial crop yield information is of great significance for academic research and guiding agricultural policy. Most existing public yield datasets have a coarse spatial resolution. Although these datasets are useful for analyzing regional temporal and spatial change, they cannot deal with spatial heterogeneity, which happens to be the most significant characteristic of the Chinese small-scale farmers' economy. Hence, we generated a 30-m Chinese winter wheat yield dataset (ChinaWheatYield30m) for major winter wheat-producing provinces in China for the period 2016–2021 with a semi-mechanistic model (hierarchical linear model, HLM). The yield prediction model was built by considering the wheat growth status and climatic factors. It can estimate wheat yield with excellent accuracy and low cost using a combination of satellite observations and regional meteorological information (i.e., Landsat 8, Sentinel-2 and ERA5 data from the Google Earth Engine (GEE) platform). The results were validated by using in situ measurements and census statistics and indicated a stable performance of the HLM model based on calibration datasets across China, with r of 0.81** and nRMSE of 12.59 %. With regard to validation, the ChinaWheatYield30m dataset was highly consistent with in situ measurement data and census data, indicated by r (nRMSE) of 0.72** (15.34 %) and 0.73** (19.41 %). With its high spatial resolution and accuracy, the ChinaWheatYield30m is a valuable dataset that can support numerous applications, including crop production modeling and regional climate evaluation.
Yu Zhao et al.
Status: final response (author comments only)
-
RC1: 'Comment on essd-2022-417', Anonymous Referee #1, 03 Feb 2023
The study aims to generate a 30-m Chinese winter wheat yield dataset (ChinaWheatYield30m) using hierarchical linear model with various input datasets. Results show that the ChinaWheatYield30m dataset was consistent with in-situ measurement data and statistical data, indicated by r (nRMSE) of 0.72** (15.34%) and 0.73** (19.41%), respectively. Overall, there are some aspects needed to be further improved, and the comments are listed below.
- Line 161-168. How many field-scale yields per year? Where are they distributed? A map is needed for this. And are these fields in the same location each year? If not, a table can be listed to show the number of fields for each year and province.
- Line 185-186. Why train the model separately for each province, considering that some provinces have a large geographical span and thus may have internal heterogeneity, I think it may be more reasonable to divide them according to agricultural cultivation subdivisions.
- Line 218. Please check, is “relative root mean square error” shorted as “nRMSE” instead of “rRMSE”?
- Section 3.1 and 3.2, the field-level yield dataset can be further divided into different sets, and then a cross-validation result will better indicate the reliability of the model.
- The province-level validation for this dataset seems to be less meaningful because they are too different in scale and there are too many uncertainties, e.g., crop classification. County or municipal level validation may be more valuable.
- Section 3 is results and discussion, and Section 4 is discussion. I think the title is inappropriate.
- It would be better to present the exact location of the in situ measurement.
- Discussion section is insufficient, especially the uncertainty analysis. For example, how do author deal with these datasets with different spatial resolution, does this bring uncertainties to the final results? Is there any deficiency in HLM model? Etc.
- Some minor formatting issues: Line 163, “1 m2 per point”, Line 197-198, “βmj”, “γm0”,…, some subscripts are not displayed correctly.
Citation: https://doi.org/10.5194/essd-2022-417-RC1
AC1: 'Reply on RC1', Yu Zhao, 06 May 2023
The study aims to generate a 30-m Chinese winter wheat yield dataset (ChinaWheatYield30m) using hierarchical linear model with various input datasets. Results show that the ChinaWheatYield30m dataset was consistent with in-situ measurement data and statistical data, indicated by r (nRMSE) of 0.72** (15.34%) and 0.73** (19.41%), respectively. Overall, there are some aspects needed to be further improved, and the comments are listed below.
1.Line 161-168. How many field-scale yields per year? Where are they distributed? A map is needed for this. And are these fields in the same location each year? If not, a table can be listed to show the number of fields for each year and province.
[Response]: Thank you very much for your suggestion. The sample points may be in the same location each year, or the sampling locations may have been added to or modified. To display the sampling points and data more accurately, Figure 1 in the article has been modified and the sample points have been added to it. In addition, a table has been added that records the number of sampled fields in each year and province. We therefore rewrote the paragraph at Lines 126-128 and added Table 2:
Table 2 Detailed statistics on the sample numbers in this study.
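Year | Anhui | Gansu | Hebei | Henan | Hubei | Jiangsu | Shaanxi | Shandong | Shanxi | Sichuan | Tianjin | Xinjiang | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2016 | 12 | 8 | 26 | 45 | | 33 | | 10 | 3 | 11 | 1 | | 149 |
2017 | 53 | 4 | 35 | 72 | 16 | 46 | 25 | 59 | 11 | 9 | 1 | 2 | 333 |
2018 | 85 | 3 | 63 | 126 | 18 | 47 | 21 | 56 | 14 | 13 | 1 | 3 | 450 |
2019 | 85 | 3 | 48 | 130 | 13 | 53 | 17 | 62 | 14 | 10 | | 2 | 437 |
2020 | 82 | 10 | 26 | 121 | 11 | 60 | 19 | 52 | 14 | 0 | | | 395 |
2021 | 81 | 7 | 25 | 125 | 10 | 26 | 18 | 64 | 8 | 7 | 2 | 3 | 376 |
Total | 398 | 35 | 223 | 619 | 68 | 265 | 100 | 303 | 64 | 50 | 5 | 10 | 2140 |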
2.Line 185-186. Why train the model separately for each province, considering that some provinces have a large geographical span and thus may have internal heterogeneity, I think it may be more reasonable to divide them according to agricultural cultivation subdivisions.
[Response]: Thank you very much for your suggestion. China's agricultural regions divide the country into areas based on factors such as climate, land use, and crop types, so that agricultural production can be organized and developed scientifically and reasonably. The main winter wheat production areas in China are distributed across parts of North China, the Huang-Huai-Hai Plain, the middle and lower reaches of the Yangtze River, and the southwestern region. Each main production area includes multiple provinces, and the main crop varieties, crop growth and development, and management practices differ significantly among these regions. The distribution of China's agricultural regions and provinces is shown in Figure S1. Figure S1(a) and (b) show that one agricultural region spans multiple provinces; for example, the Huang-Huai-Hai Plain includes Henan, Hebei and Shandong. Each province is therefore a smaller unit within an agricultural region. For this reason, the article trains the yield model at the province scale to maximize the accuracy of the yield predictions.
Of course, we also agree with your suggestion to consider agricultural regions in yield prediction. We therefore used the different agricultural regions as the basis for cross-validating the yield results, as shown in Figure 7, Lines 236-239 and Lines 281-293. In addition, we added a discussion of this aspect.
3.Line 218. Please check, is “relative root mean square error” shorted as “nRMSE” instead of “rRMSE”?
[Response]: Thank you very much for your suggestion. We replaced rRMSE with nRMSE. The revision covers all relevant text, figures, and tables throughout the article.
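For readers who want to reproduce the two accuracy measures quoted throughout this discussion, the following is a minimal sketch of Pearson's r and nRMSE. It assumes that nRMSE is the RMSE divided by the mean of the observed yields and expressed in percent; the manuscript's exact normalization is not restated on this page, and the function names and sample values are hypothetical.

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted yields."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def nrmse_percent(observed, predicted):
    """RMSE normalized by the mean observed yield, in percent (assumed definition)."""
    n = len(observed)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)

# Hypothetical yields in t/ha, for illustration only.
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 6.5, 8.5, 8.5]
print(round(pearson_r(observed, predicted), 4))
print(round(nrmse_percent(observed, predicted), 2))
```

Other normalizations (for example by the observed range) are also in use, so the definition should be checked against the manuscript before comparing numbers.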
4.Section 3.1 and 3.2, the field-level yield dataset can be further divided into different sets, and then a cross-validation result will better indicate the reliability of the model.
[Response]: Thank you very much for your suggestion. Following your advice, in addition to validating with independent samples, we also cross-validated the model deviation in the different agricultural regions. We used the common 5-fold cross-validation in this study. The modification is incorporated in Lines 236-239, Lines 281-293 and Figure 7.
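The 5-fold procedure mentioned in this reply can be sketched as follows. This is only an illustration under stated assumptions: a one-variable least-squares line stands in for the HLM, the per-region grouping described by the authors is not reproduced, and all function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept (placeholder model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cross_validate(x, y, k=5, seed=0):
    """Return the held-out RMSE of each fold."""
    fold_rmse = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        squared = [(slope * x[i] + intercept - y[i]) ** 2 for i in held_out]
        fold_rmse.append((sum(squared) / len(squared)) ** 0.5)
    return fold_rmse
```

Each sample is held out exactly once, so the spread of the per-fold errors gives a rough picture of how stable the model is across subsets of the field data.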
5.The province-level validation for this dataset seems to be less meaningful because they are too different in scale and there are too many uncertainties, e.g., crop classification. County or municipal level validation may be more valuable.
[Response]: Thank you very much for your suggestion. The modification has been made according to your opinion, mainly including the acquisition of statistical data, the purpose of comparison, and the comparison results. Modification is incorporated in Line 175-179 and Figure 6.
6.Section 3 is results and discussion, and Section 4 is discussion. I think the title is inappropriate.
[Response]: Thank you very much for your suggestion. We apologize for the mistake. We have changed the title of Section 3 to "Results" and revised the content of the article accordingly.
7.It would be better to present the exact location of the in situ measurement.
[Response]: Thank you very much for your suggestion. We fully understand the purpose of presenting the in situ measurement points, so we have added Table 2 and modified Figure 1 to better display the spatial distribution of the data and sample number.
8.Discussion section is insufficient, especially the uncertainty analysis. For example, how do author deal with these datasets with different spatial resolution, does this bring uncertainties to the final results? Is there any deficiency in HLM model? Etc.
[Response]: Thank you very much for your suggestion. To further discuss the potential uncertainties in the dataset input variables, model, and results, the article analyzed various aspects including dataset resolution, classification dataset, survey samples, and model structure. Modification is incorporated in Line 378-418, as follows:
“1) The remote sensing and meteorological data used in this study still carry uncertainties. This study generated the ChinaWheatYield30m dataset at 30-m resolution primarily because we adopted the winter wheat classification map from Yuan et al. (ESSD 2020), which provides wheat pixels at the highest resolution of 30 m. The ChinaWheatYield30m input data consist of meteorological variables and remote sensing data, and all datasets were resampled to 30-m resolution to ensure data uniformity. For the remote sensing data, resampling Sentinel-2 data to 30 m may lose some surface information, so the differences between pixels in the image may not be captured accurately. The resulting increase in the number of mixed pixels can lead to uncertainties in the yield estimates. Moreover, maximum EVI2 is reached at the heading or flowering period (Luo et al., 2020), but because usable Sentinel-2 and Landsat 8 observations are irregularly available, the maximum EVI2 may correspond to different phenological periods across the country. Meteorological data are another important component of the yield dataset. To obtain spatially and temporally continuous meteorological driving data, this study uses a dataset generated by ECMWF, which is updated in a timely manner and meets our spatio-temporal requirements. However, meteorological data such as precipitation, temperature, and radiation exhibit highly nonlinear and chaotic characteristics (Lorenz, 1993), leading to ongoing debates about the reliability of interpolation methods. The coarse resolution of the meteorological data, combined with their high spatial homogeneity over larger areas, weakens their ability, as the second-level correction in the HLM, to capture the relationship between remote sensing data and yield variations.
2) Uncertainties in winter wheat classifications are transferred to the yield predictions. The wheat classification is based on optical remote sensing data and may be affected by meteorological factors such as clouds and rain (Dong et al., 2020).
…
4) The uncertainties of HLM application scenarios need further analysis. There is a nested issue between vegetation indices and yield relationships, as well as between meteorological data and yield relationships (Li et al., 2020; Xu et al., 2020). HLM has advantages in addressing this problem. Under similar meteorological conditions, the yield estimation of the model mainly depends on the differences in vegetation indices. In the major wheat production area, variations in crop types, soil types, climate factors, and other factors have an impact on the model's estimation results (Li et al., 2021). The current model only considers the effect of meteorological data on remote sensing yield estimation, and future analyses will incorporate additional factors such as soil to generate more accurate yield datasets. The current model is primarily constructed based on normal production conditions, and estimating winter wheat yield under abnormal climatic conditions introduces significant uncertainties. Therefore, it is necessary to consider stress factors and further improve the framework of remote sensing estimation models for winter wheat in the future.”
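For orientation, the nested structure described in the quoted passage can be written in the generic textbook form of a two-level hierarchical linear model. The notation below is an assumption: it borrows the symbols β_mj and γ_m0 mentioned in the referee's formatting comment, treats a vegetation index (such as maximum EVI2) as the first-level predictor and Q meteorological variables as second-level predictors, and is not a reproduction of the manuscript's own equations.

```latex
% Level 1: yield of sample i in group j as a function of the vegetation index
Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{VI}_{ij} + r_{ij}

% Level 2: each first-level coefficient depends on meteorological variables W_{qj}
\beta_{mj} = \gamma_{m0} + \sum_{q=1}^{Q} \gamma_{mq}\,W_{qj} + u_{mj}, \qquad m \in \{0, 1\}
```

Here r_ij and u_mj are the first-level and second-level residuals. In this form, samples sharing the same meteorological conditions share the same coefficients, which matches the statement that the estimate then depends mainly on differences in the vegetation index.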
9.Some minor formatting issues: Line 163, “1 m2 per point”, Line 197-198, “βmj”, “γm0”,…, some subscripts are not displayed correctly.
[Response]: Thank you very much for your suggestion. The modification is incorporated in the manuscript.
-
RC2: 'Comment on essd-2022-417', Anonymous Referee #2, 24 Mar 2023
This article generated a 30m Chinese winter wheat yield from 2016 to 2021 based on the HLM model, called ChinaWheatYield30m. The semi-mechanical model was constructed in a combination of RS observations and regional meteorological data for major wheat-producing regions in China. The ChinaWheatYield30m dataset is validated and has a potential to be applied in some related academic researches.
The paper was basically well organized and written. However, to further improve the paper, some issues need to be dealt with. Below are some specific comments:
1. The detailed description is needed to address how to compare the ChinaWheatYield30m dataset with the province-level statistical data;
2. The strength of the ChinaWheatYield30m dataset needs to be emphasized comparing with some existed remote sensing yield estimation datasets;
3. The Table 1 needs to be reformatted;
4. A line needs to be inserted between the Table1 and the below paragraph;
5. Line135:“2.1 The winter wheat land cover data” should be “2.2.1”;
6. There is something wrong with the format of the 2.3.1 section, needs to be adjusted.
Citation: https://doi.org/10.5194/essd-2022-417-RC2
AC2: 'Reply on RC2', Yu Zhao, 06 May 2023
This article generated a 30m Chinese winter wheat yield from 2016 to 2021 based on the HLM model, called ChinaWheatYield30m. The semi-mechanical model was constructed in a combination of RS observations and regional meteorological data for major wheat-producing regions in China. The ChinaWheatYield30m dataset is validated and has a potential to be applied in some related academic researches.
The paper was basically well organized and written. However, to further improve the paper, some issues need to be dealt with. Below are some specific comments:
1.The detailed description is needed to address how to compare the ChinaWheatYield30m dataset with the province-level statistical data;
[Response]: Thank you very much for your suggestion. The primary purpose of the statistical data is to verify the accuracy of the dataset when it is aggregated at different scales, so that it can better serve different institutions. The modification is incorporated in Lines 223-225, as follows:
“This study compared and analyzed national statistical data at different scales, focusing mainly on the provincial and municipal levels, to validate the accuracy of the ChinaWheatYield30m dataset. This study compared the difference between statistical yield per unit area from 2016 to and the average yield using ChinaWheatYield30m extracted from both province and municipal vector data.”.
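The comparison described in the quoted passage, averaging the gridded yield inside each administrative unit and comparing it with the reported statistic, can be sketched as below. This is a simplified illustration, not the authors' workflow: it assumes the province or municipal polygons have already been rasterized to a grid of zone labels aligned with the yield grid, uses `None` for non-wheat or missing pixels, and all names and numbers are hypothetical.

```python
def zonal_mean_yield(yield_grid, zone_grid):
    """Mean of valid yield pixels per zone label; None marks missing pixels."""
    sums, counts = {}, {}
    for yield_row, zone_row in zip(yield_grid, zone_grid):
        for value, zone in zip(yield_row, zone_row):
            if value is None or zone is None:
                continue  # skip non-wheat or unlabeled pixels
            sums[zone] = sums.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: sums[zone] / counts[zone] for zone in sums}

def compare_with_statistics(zonal_means, statistics):
    """Difference (dataset mean minus statistical yield) for zones present in both."""
    return {z: zonal_means[z] - statistics[z] for z in zonal_means if z in statistics}

# Hypothetical 2 x 3 grids: yields in t/ha and administrative-unit labels.
yield_grid = [[6.0, 7.0, None], [8.0, 5.0, 5.5]]
zone_grid = [["A", "A", "B"], ["A", "B", "B"]]
means = zonal_mean_yield(yield_grid, zone_grid)
diffs = compare_with_statistics(means, {"A": 6.8, "B": 5.5})
```

With real data the zone grid would come from rasterizing the vector boundaries at 30 m, and the resulting per-unit pairs could then be summarized with r and nRMSE.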
2.The strength of the ChinaWheatYield30m dataset needs to be emphasized comparing with some existed remote sensing yield estimation datasets;
[Response]: Thank you very much for your suggestion. Following your advice, the article now elaborates further on the advantages of ChinaWheatYield30m. The modification is incorporated in Lines 344-375, as follows:
“…
1) This study generated the ChinaWheatYield30m dataset at 30-m resolution (Fig. 10) primarily because we adopted the winter wheat classification map from Yuan et al. (ESSD 2020), which provides wheat pixels at the highest resolution of 30 m. Such a resolution not only makes the results more credible but also balances computational efficiency. High-resolution yield datasets can provide more accurate spatial information about crop production, improving agricultural productivity and enabling rapid monitoring and analysis of large agricultural areas. This allows for timely detection and resolution of issues that arise during crop growth, ultimately enhancing both the efficiency and effectiveness of agricultural production.
2) Stable accuracy at both the field scale and the large regional scale will contribute greatly to field management, agricultural systems modelling, and the drafting of agricultural policies. This study combined remote sensing and meteorological data to construct a spatiotemporally expandable HLM method for predicting winter wheat yield in the main producing areas. The relationship between vegetation index and crop yield varies across different years and regions (Li et al., 2020). Meteorological data have an important impact on crop yield (Moschini and Hennessy, 2001; Lee et al., 2013). Li et al. (2021) showed that environmental data for wheat in China explained more than 60% of the variation in wheat yield. In this study, we generated ChinaWheatYield30m with stable results, fully exploiting the advantages of HLM to solve the nested problem of yield prediction being influenced by both remote sensing and meteorological data.
3) The product is highly timely and can be used to forecast production early in the year. The EVI2max and meteorological data used in this paper can be obtained before May, while wheat in China's main winter wheat production areas is generally harvested in June. Therefore, the proposed method can accurately predict winter wheat yield in real time. The strength of the HLM model is that it overcomes inter-annual and regional variations (Li et al., 2020; Xu et al., 2021; Zhao et al., 2022). The results based on field investigation and statistical data show that the method can accurately predict winter wheat yield in the main production areas. ChinaWheatYield30m is expected to be used mostly at the municipal or county level, and its resolution is suitable for these scales.”
3.The Table 1 needs to be reformatted;
[Response]: Thank you very much for your suggestion. Table 1 is now reformatted.
4.A line needs to be inserted between the Table1 and the below paragraph;
[Response]: Thank you very much for your suggestion. A row has been inserted below Table 1 in the article.
5.Line135:“2.1 The winter wheat land cover data” should be “2.2.1”;
[Response]: Thank you very much for your suggestion. Based on your feedback, 2.1 has been changed to 2.2.1. Modification is incorporated in Line 138.
6.There is something wrong with the format of the 2.3.1 section, needs to be adjusted.
[Response]: Thank you very much for your suggestion. The format of Section 2.3.1 has been corrected, and the formatting throughout the entire document has been checked. The modification is incorporated in the manuscript.
Data sets
ChinaWheatYield30m: A 30-m annual winter wheat yield dataset from 2016 to 2021 in China Yu Zhao, Shaoyu Han, Jie Zheng, Hanyu Xue, Zhenhai Li, Yang Meng, Xuguang Li, Xiaodong Yang, Zhenhong Li, Shuhong Cai, Guijun Yang https://doi.org/10.5281/zenodo.7360753
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
415 | 133 | 17 | 565 | 4 | 6 |