AsiaRiceYield4km: Seasonal Rice Yield in Asia from 1995 to 2015
- 1Key Laboratory of Environmental Change and Natural Disasters, Ministry of Education Beijing Normal University, Beijing 100875, People’s Republic of China
- 2School of National Safety and Emergency Management, Beijing Normal University, Beijing 100875 / Zhuhai 519087, People’s Republic of China
- 3Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, People’s Republic of China
- 4College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
- These authors contributed to the work equally and should be regarded as co-first authors.
Abstract. Rice is the most important staple food in Asia. However, high-spatiotemporal-resolution rice yield datasets covering large regions are very limited. The lack of such products greatly hinders studies that aim to accurately assess the impacts of climate change and to simulate agricultural production. Based on dynamic rice maps of Asia, we incorporated four predictor categories into three machine learning (ML) models to generate a high-spatial-resolution (4 km) rice yield dataset (AsiaRiceYield4km) for the main rice seasons from 1995 to 2015. The four predictor categories were chosen to cover rice growing conditions comprehensively, and the optimal ML model was determined for each rice season based on an inverse proportional weight method. The results showed that AsiaRiceYield4km achieves good accuracy for seasonal rice yield prediction (single rice: R² = 0.88, RMSE = 920 kg/ha; double rice: R² = 0.91, RMSE = 554 kg/ha; triple rice: R² = 0.93, RMSE = 588 kg/ha). Compared with the Spatial Production Allocation Model (SPAM), the R² of gridded rice yields was improved by 0.20 and the RMSE was reduced by 618 kg/ha on average for single rice. In particular, constant environmental conditions, including longitude, latitude, elevation, and soil properties, contributed the most (~45 %) to rice yield prediction. Among the growing periods of rice, predictors in the reproductive period had more impact on rice yield prediction than those in the vegetative period and the whole growing period. AsiaRiceYield4km is a novel high-spatial-resolution gridded rice yield dataset that can fill the gap in seasonal yield products across major rice production areas and promote further studies on agricultural sustainability worldwide. AsiaRiceYield4km can be downloaded from an open-data repository (DOI: https://doi.org/10.5281/zenodo.6901968; Wu et al., 2022).
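The accuracy figures in the abstract are R² and RMSE between estimated and observed yields. As a minimal sketch of how such scores can be computed, the snippet below uses made-up yield values in kg/ha and treats R² as the coefficient of determination. That choice is an assumption: the manuscript may instead report the squared Pearson correlation.

```python
import numpy as np

def rmse(observed, estimated):
    """Root-mean-square error, in the same unit as the inputs."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((estimated - observed) ** 2)))

def r_squared(observed, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    ss_res = np.sum((observed - estimated) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical yields (kg/ha) for four administrative units.
observed = np.array([4000.0, 5000.0, 6000.0, 7000.0])
estimated = np.array([4100.0, 4900.0, 6100.0, 6900.0])

print(f"RMSE = {rmse(observed, estimated):.1f} kg/ha")
print(f"R2   = {r_squared(observed, estimated):.3f}")
```

Because RMSE keeps the unit of the yields, it can be compared directly across the single, double, and triple rice seasons, whereas R² is relative to the spread of the observed yields in each season.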
Huaqing Wu et al.
Status: closed
-
RC1: 'Comment on essd-2022-273', Anonymous Referee #1, 13 Sep 2022
This manuscript developed a high spatial resolution (4km) rice yield dataset from 1995 to 2015, covering major rice growing seasons and regions in Asia. Overall, this dataset would be a good complement to current rice yield products due to its high spatiotemporal resolution. I have the following questions or suggestions, which may help improve the manuscript clarity.
Major concerns:
- The authors used the GLASS AVHRR LAI data to extract key crop phenological indicators for training, including planting, heading, and harvesting dates. However, since rice fields in Asia are very fragmented and the spatial resolution of the GLASS LAI data (i.e., 0.05 deg) is not fine enough to capture pure rice LAI information, there should be mixed-pixel problems. How did the authors deal with these problems? In addition, I would say the extracted planting and harvesting dates are indicators of the early rapid growth and senescence stages rather than the real planting and harvesting dates. The authors should clarify these conceptual differences to avoid possible confusion.
- The authors used Pearson correlation analysis to identify those predictors with a significant correlation with rice yield at each administrative unit for training (Line 218-220). I'm curious whether the authors trained the model in each administrative unit and then combined all the training results to get the rice yields for the entire Asian region. More explanation of the experimental implementation should be given. Meanwhile, how do the authors deal with the multicollinearity of these predictors? There is a significant correlation between the different predictors in Table S3. In addition, I found very limited information on hyper-parameters in the supplementary material; the authors may want to provide detailed information on those parameters in each optimal model (e.g., the number of hidden layers, node numbers, and max-depth). Furthermore, in Line 295, detailed information on the 27 trained optimal models should also be given (maybe in the supplementary material).
- The authors compared their dataset with observations via scatter plots (Figure 5). This is good. However, it would be better if the authors can additionally provide comparisons of the interannual variations in rice yield for each rice system (e.g., single, double early and later) in each country (there should be some survey data). The performance of your dataset in capturing interannual variations in rice yield is important.
- The authors used cumulative values of predictors (e.g., LAI and PDSI) in different phenological periods (e.g., vegetative and reproductive) to train models. However, this cumulative information has no actual physiological significance. Meanwhile, considering that crop phenological dates (e.g., planting and harvesting) vary from year to year, it would be better to use the average value of these predictors over each phenological period for training (i.e., more comparable across years).
- I would suggest that the authors get editing help from someone with full professional proficiency in English, as the current manuscript has substantial language issues. I pointed out some, but not all.
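The fourth concern above contrasts cumulative and averaged predictors. A minimal sketch of the difference, assuming a hypothetical constant LAI series and made-up phenological windows of 40 and 50 time steps (none of these values come from the manuscript):

```python
import numpy as np

def period_stats(series, start, end):
    """Return (sum, mean) of a predictor over the window [start, end)."""
    window = np.asarray(series[start:end], dtype=float)
    return float(window.sum()), float(window.mean())

# Hypothetical LAI time series with identical canopy conditions throughout.
lai = np.full(120, 3.0)

# Same planting step, but the period ends 10 steps later in the second year.
sum_a, mean_a = period_stats(lai, 10, 50)  # 40 time steps
sum_b, mean_b = period_stats(lai, 10, 60)  # 50 time steps

print("sums:", sum_a, sum_b)
print("means:", mean_a, mean_b)
```

With identical canopy conditions, the sum changes with the window length while the mean does not, which is the comparability argument the referee makes.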
Other concerns:
Line 72: When you say prediction, it is more of a future period than a historical period.
Line 112: Change “i.e., ” to “e.g., ”
Line 113: Change “Philippines” to “China”: the season number of 12 and 13 should belong to China.
Line 117: Change “are” to “were”.
Line 275: Have you tried any other proportions (e.g., 0.6/0.2/0.2) to examine the robustness of your datasets, trained models and evaluation results?
Figure 3: What does the legend mean? I didn’t see any difference in the color of these dots.
Section 3.2: I would suggest moving this section to the end of “3 Results”. Meanwhile, you should add additional analysis of temporal variations.
Line 417: Add using: by “using” multi-source
Table S1: names of the local administrative unit presents the specific… -> names of the local administrative unit represent the specific…
Table S2: Provide the full names of these abbreviations in the footnotes.
Table S3: What do you mean in these rows:
The sum of for whole growing period
The sum of for vegetative stage
The sum of for reproductive stage
The maximum for whole growing period
-
AC1: 'Reply on RC1', Zhao Zhang, 25 Nov 2022
Dear Editors and Reviewers:
Sincere thanks for the evaluation of this work and your valuable comments and suggestions for improving this manuscript. We carefully considered the concerning points and made efforts to improve the rigor, logic, and clarity of our manuscript (essd-2022-273) titled “AsiaRiceYield4km: Seasonal Rice Yield in Asia from 1995 to 2015”. Here we submit the revised version, which has been modified according to the comments from the reviewers. The major changes that we made in the revised manuscript are summarized as follows:
(1) To make the manuscript more readable, some words and phrases were clarified including case, administrative unit, and the three key phenological dates.
(2) To further illustrate the dataset division rules and gridded yield estimation, we revised (1) Dataset division rules and added one paragraph named (5) Gridded rice yield generation in Sect. 2.3.3. Figure 2 step3 was also revised correspondingly.
(3) As suggested, we added the temporal validation of AsiaRiceYield4km compared with observed yields (Sect. 3.2) and the temporal variation analysis of AsiaRiceYield4km from 1995 to 2015 (Sect. 3.4).
(4) Uncertainties arising from the GLASS LAI product and the crop intensity maps were added to Sect. 4.3 (Uncertainty analysis).
(5) More information on the data sources and model parameters was listed in the revised supplement.
We attach the detailed item-by-item response to all comments and suggestions for the evaluation.
Yours sincerely,
Zhao Zhang and co-authors
-
RC2: 'Comment on essd-2022-273', Anonymous Referee #2, 03 Oct 2022
High-spatial and high-temporal resolution rice yield datasets are lacking, especially over large regions. The manuscript employed machine learning algorithms to generate long-term high-resolution rice yield over South Asia, Southeast Asia, and East Asia. Undertaking a study at continental scale like this is a huge project. The 5km rice yield map over the major rice producing countries in Asia from 1995 to 2015 fills the data gap for assessing the impacts of climate change and sustainable development. However, I have a few major concerns to be addressed so that the manuscript could be more solid.
(1) The rice cultivated area is the fundamental information for rice yield estimation. The manuscript used rice maps for each year from 2000 to 2020, while the yield model was developed and used to estimate the spatial distribution of rice yield from 1995 to 2015. Since most input datasets used for the rice yield model in the study are available for the years 2000 to 2020, why not generate rice yield for 2000 to 2020 so that the map and the rice yield coincide with each other for the same year?
(2) Another concern is the way predictors were selected. The authors selected the predictors based on the correlation analysis between indicators and the yield at each administrative unit. While this is generally logical, it might be a problem when great differences exist in cropping patterns and rice management within an administrative unit. The correlations may fail to reach a significant level when an improper unit is targeted. This needs more clarification. Please also specify the administrative unit. Is it a national-level or sub-national-level administrative unit?
(3) The authors only used one vegetation indicator, LAI, as input. Several studies have found that LAI products are highly uncertain, even the improved GLASS LAI product. The product still has some abnormal values and unrealistic seasonality, especially in winter. From my understanding, using LAI products might introduce high uncertainty into the yield model that cannot be resolved.
(4) According to the importance of the indicators, the importance of static indicators (Year, Lat, Long, Ele) is much higher than that of other indicators. For some countries, the proportional importance of CEC+TI indicators could be higher than 90%. And for the whole study area, the CEC+TI indicators are the most important. How can this be explained? Does this mean there is no need to add other indicators for yield mapping?
(5) When the model is applied for yield estimation during different growing seasons, is the pixel-level cropping intensity map used, or is the estimation mainly based on the majority rice cropping pattern in each administrative unit? The uncertainty of seasonal rice yield might exceed the uncertainty of the model due to the biased seasonal rice map.
(6) Any possibility to use some in-situ collected actual yield data to validate the yield map?
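Concern (2) above refers to screening predictors by the significance of their Pearson correlation with yield. A minimal sketch of such a screening step follows. The predictor names and values are hypothetical, and the permutation test is a stand-in chosen to keep the example dependency-free; it is not claimed to be the significance test used in the manuscript.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = np.random.default_rng(seed)
    r_obs = abs(pearson_r(x, y))
    hits = sum(abs(pearson_r(rng.permutation(x), y)) >= r_obs
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

def screen_predictors(predictors, yields, alpha=0.05):
    """Keep the predictors whose correlation with yield is significant at alpha."""
    return [name for name, values in predictors.items()
            if permutation_p_value(values, yields) < alpha]

# Hypothetical yields (t/ha) for one administrative unit over eight years.
yields = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
predictors = {
    "lai_mean": 2.0 * yields + 1.0,  # perfectly correlated with yield
    "unrelated": np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]),  # r = 0
}

selected = screen_predictors(predictors, yields)
print(selected)
```

With only a handful of years per unit, such a test has little power, which is one way to read the referee's worry that correlations may fail to reach significance when the unit is poorly chosen.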
Specific comments:
(1) Page 4 Line 106, what do you mean by 27 seasons?
(2) The authors collected rice yield data from many different sources. Please add more detailed information on the yield data, including the spatial units, temporal extent, etc.
(3) Page 10, Line 229 – 234, the dataset was first divided into two parts according to the administrative units. 80% of the administrative units were randomly selected for training and validation, among which 70% of samples were used for training and 30% were used as validation sets. In this case, the training samples were not 56% of the whole dataset. The same applies to validation and testing. Please make this clearer for readers.
(4) Add more testing results for other years. The authors estimated rice yield for Asia for 1995-2015, but the estimates were insufficiently validated and tested for different years. Also, the temporal changes of rice yield should be added to the results and discussion sections.
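Specific comment (3) turns on simple arithmetic: an 80/20 split followed by a 70/30 split gives nominal shares of 0.8 × 0.7 = 56 %, 0.8 × 0.3 = 24 %, and 20 %, but only if both splits act on the same items. A minimal sketch, assuming hypothetical administrative units with unequal numbers of yield records, shows how the first split can be made on units and the second on records:

```python
import numpy as np

def split_by_unit(unit_ids, unit_frac=0.8, train_frac=0.7, seed=0):
    """Hold out whole units for testing, then split the remaining records."""
    rng = np.random.default_rng(seed)
    unit_ids = np.asarray(unit_ids)
    units = rng.permutation(np.unique(unit_ids))
    n_dev = int(round(unit_frac * len(units)))
    is_dev = np.isin(unit_ids, units[:n_dev])          # records in non-test units
    dev_idx = rng.permutation(np.flatnonzero(is_dev))  # shuffle those records
    n_train = int(round(train_frac * len(dev_idx)))
    return dev_idx[:n_train], dev_idx[n_train:], np.flatnonzero(~is_dev)

# Hypothetical data: eight units with 5 records each and two units with 30 each.
unit_ids = np.repeat(np.arange(10), [5, 5, 5, 5, 5, 5, 5, 5, 30, 30])
train, val, test = split_by_unit(unit_ids)

total = len(unit_ids)
for name, idx in (("train", train), ("validation", val), ("test", test)):
    print(f"{name}: {len(idx)} records ({len(idx) / total:.0%})")
```

Because the test set is defined by units, its share of records depends on which units happen to be held out, so the record-level shares generally differ from the nominal 56/24/20.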
-
AC2: 'Reply on RC2', Zhao Zhang, 25 Nov 2022
Dear Editors and Reviewers:
Sincere thanks for the evaluation of this work and your valuable comments and suggestions for improving this manuscript. We carefully considered the concerning points and made efforts to improve the rigor, logic, and clarity of our manuscript (essd-2022-273) titled “AsiaRiceYield4km: Seasonal Rice Yield in Asia from 1995 to 2015”. Here we submit the revised version, which has been modified according to the comments from the reviewers. The major changes that we made in the revised manuscript are summarized as follows:
(1) To make the manuscript more readable, some words and phrases were clarified including case, administrative unit, and the three key phenological dates.
(2) To further illustrate the dataset division rules and gridded yield estimation, we revised (1) Dataset division rules and added one paragraph named (5) Gridded rice yield generation in Sect. 2.3.3. Figure 2 step3 was also revised correspondingly.
(3) As suggested, we added the temporal validation of AsiaRiceYield4km compared with observed yields (Sect. 3.2) and the temporal variation analysis of AsiaRiceYield4km from 1995 to 2015 (Sect. 3.4).
(4) Uncertainties arising from the GLASS LAI product and the crop intensity maps were added to Sect. 4.3 (Uncertainty analysis).
(5) More information on the data sources and model parameters was listed in the revised supplement.
We attach the detailed item-by-item response to all comments and suggestions for the evaluation.
Yours sincerely,
Zhao Zhang and co-authors
Data sets
AsiaRiceYield4km: Seasonal Rice Yield in Asia from 1995 to 2015. Huaqing Wu, Jing Zhang, Zhao Zhang, Jichong Han, Juan Cao, Liangliang Zhang, Yuchuan Luo, Qinghang Mei, Jialu Xu, Fulu Tao. https://doi.org/10.5281/zenodo.6901968
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
470 | 153 | 16 | 639 | 45 | 4 | 3 |