the Creative Commons Attribution 4.0 License.
NortheastChinaSoybeanYield20m: an annual soybean yield dataset at 20 m in Northeast China from 2019 to 2023
Abstract. Accurate monitoring of crop yield is important for ensuring food security. However, existing yield datasets have a coarse spatial resolution and are inadequate for capturing small-scale spatial heterogeneity. Current yield estimation methods, such as machine learning models or the assimilation of remotely sensed biophysical variables into crop growth models, depend heavily on ground observations and involve significant computational costs. To address these problems, a hybrid framework coupling the World Food Studies Simulation Model (WOFOST) and the Gated Recurrent Unit (GRU) model was proposed to generate a 20 m soybean yield dataset for Northeast China from 2019 to 2023 (NortheastChinaSoybeanYield20m). A soybean growth dataset was first generated with WOFOST by simulating various production scenarios (climates, crop varieties, soil types and agro-managements). The GRU model was then trained to characterize the relationship between model-simulated LAI and soybean yield. The trained model was applied to soybean yield estimation in Northeast China using time series of LAI at different growth stages derived from Sentinel-2. The accuracy of the dataset was evaluated against in-situ measurements and statistical data. The root mean squared error (RMSE) was 287.44 kg ha⁻¹ at the field scale and 272.36 kg ha⁻¹ at the regional scale. Results were stable across years, with a mean relative error (MRE) averaging 11.46 % at the municipal scale and 7.94 % at the provincial scale. NortheastChinaSoybeanYield20m was able to capture the spatial-temporal variation of soybean yield and can be applied to optimizing soybean production distribution and guiding agricultural decision-making. The NortheastChinaSoybeanYield20m dataset can be downloaded from https://doi.org/10.5281/zenodo.14263103 (Xu et al., 2024).
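The two accuracy metrics quoted in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: RMSE as the square root of the mean squared difference, and MRE as the mean of |estimated - observed| / observed, expressed in percent. The function names and the sample yields are hypothetical and are not taken from the dataset.

```python
import math


def rmse(estimated, observed):
    """Root mean squared error, in the units of the inputs (e.g. kg ha-1)."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mre_percent(estimated, observed):
    """Mean relative error in percent (assumed definition, see lead-in)."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o <= 0 for o in observed):
        raise ValueError("observed yields must be positive")
    relative = [abs(e - o) / o for e, o in zip(estimated, observed)]
    return 100.0 * sum(relative) / len(relative)


# Hypothetical yields in kg ha-1, for illustration only.
observed_yield = [2000.0, 2500.0, 1800.0]
estimated_yield = [2200.0, 2400.0, 1890.0]

print(f"RMSE: {rmse(estimated_yield, observed_yield):.2f} kg ha-1")
print(f"MRE:  {mre_percent(estimated_yield, observed_yield):.2f} %")
```

RMSE keeps the units of yield and weights large errors more heavily, while MRE is unitless, which makes it easier to compare across municipal and provincial scales with different yield levels.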
Status: final response (author comments only)
-
RC1: 'Comment on essd-2024-586', Anonymous Referee #1, 23 Jan 2025
-
AC1: 'Reply on RC1', Jingyuan Xu, 01 Mar 2025
We sincerely thank you for your valuable comments and insightful suggestions. The feedback has greatly helped improve the quality of our manuscript. We have carefully addressed each comment and have provided detailed responses in the attached file.
-
AC2: 'Reply on RC1', Jingyuan Xu, 25 Apr 2025
Dear Referee #1,
Thank you again for your great efforts on our manuscript. We recently received comments from other reviewers and have updated our responses accordingly. Please refer to the supplement for our updated responses.
Sincerely yours,
Jingyuan Xu, on behalf of the co-authors
-
RC2: 'Comment on essd-2024-586', Anonymous Referee #2, 17 Mar 2025
The overall structure of the article is clear and logically organized. The research demonstrates innovation by integrating crop growth models with deep learning algorithms for soybean yield estimation, representing a promising direction in agricultural remote sensing. The research objectives are well-defined, aiming to address existing limitations in soybean yield data (insufficient spatial resolution and reliance on ground observations), thereby supporting optimized soybean production distribution and agricultural decision-making.
Specific Comments:
1 Introduction: The section comprehensively highlights soybean's global food security significance and limitations of current yield estimation methods, establishing a solid research rationale. However, the comparative discussion of data-driven and knowledge-driven methods could be more concise to better emphasize core issues and proposed solutions. Additionally, enhancing explanations of environmental factors' mechanisms (e.g., how climatic conditions affect growth cycles and photosynthesis, or how soil properties constrain nutrient uptake and water retention) would provide a more systematic understanding of key yield determinants and their interactions.
2 Data Collection: The dataset (field measurements, meteorological/soil data, satellite imagery, crop distribution maps, and statistics) is comprehensive and representative. However, data processing steps (e.g., meteorological data interpolation, satellite image preprocessing) require more detailed technical descriptions to improve reproducibility. Furthermore, explicit clarification is needed regarding spatial alignment and scale conversion methods employed for integrating multi-resolution datasets.
3 Results: Results are effectively visualized through figures/tables demonstrating WOFOST model simulations, multi-scale estimation accuracy, and spatial yield patterns. The analysis appropriately discusses model accuracy, stability, and spatiotemporal pattern recognition capabilities. However, deeper interpretation of anomalies (e.g., regional/yearly estimation errors) is needed. Notably, the systematic overestimation in field-scale validation suggests potential model biases (e.g., systematic errors or overfitting), warranting further investigation.
4 Discussion: When discussing MODIS-Sentinel-2 complementarity, quantitative comparisons of their performance under varying conditions (weather/vegetation coverage) would strengthen data selection guidance. Future research directions could be expanded by aligning with emerging trends (e.g., integration with IoT/blockchain technologies, precision agriculture applications), thereby enhancing both theoretical depth and practical relevance for agricultural challenges.
Citation: https://doi.org/10.5194/essd-2024-586-RC2
-
AC3: 'Reply on RC2', Jingyuan Xu, 25 Apr 2025
Dear Referee #2,
Thank you very much for your great efforts on our manuscript. Inspired by your valuable comments, we have made a major revision to our manuscript. Please refer to the supplement for our point-by-point responses to your comments.
Sincerely yours,
Jingyuan Xu, on behalf of the co-authors
-
RC3: 'Comment on essd-2024-586', Anonymous Referee #3, 23 Mar 2025
This study presents a well-structured and logically organized framework for high-resolution soybean yield estimation. The combination of process-based modeling with deep learning offers a novel perspective for enhancing agricultural monitoring capabilities. The objectives are clearly articulated, with a strong focus on improving soybean yield data accuracy to support agricultural decision-making and production optimization. The methodological approach is rigorous, leveraging diverse production scenarios to train the GRU model and applying time-series Sentinel-2 data for large-scale yield estimation. The evaluation using in-situ measurements and government statistical data provides strong validation, and the reported accuracy metrics indicate reliable model performance across spatial and temporal scales. There are some suggestions as follows, which can be considered for further improvement of the manuscript.
The research is well-founded and presents significant innovations. However, the abstract and introduction sections could benefit from more professional and polished language to enhance readability and better highlight the study’s contributions. Refining the writing style would improve clarity, strengthen the articulation of the research objectives, and more effectively emphasize the novelty of the proposed hybrid framework.
Figure 1: where is the soybean classification map from? What is the accuracy?
Figure 5 appears blurry, which affects the clarity and readability of the presented data. I suggest organizing box plots and histograms as subfigures.
The discussion on the advancements of the proposed method is embedded within the “Limitations and future developments” section. To better highlight the strengths of this study, I recommend extracting this content into a standalone subsection. This would allow for a clearer and more structured presentation of the method’s advantages, making it easier for readers to appreciate its contributions in comparison to existing approaches.
The conclusion effectively summarizes the study but could be further refined to better highlight the innovation in dataset construction and its practical applications in agricultural management.
Citation: https://doi.org/10.5194/essd-2024-586-RC3
-
AC4: 'Reply on RC3', Jingyuan Xu, 25 Apr 2025
Dear Referee #3,
Thank you very much for your great efforts on our manuscript. Inspired by your valuable comments, we have made a major revision to our manuscript. Please refer to the supplement for our point-by-point responses to your comments.
Sincerely yours,
Jingyuan Xu, on behalf of the co-authors
-
EC1: 'Comment on essd-2024-586', Peng Zhu, 12 Jul 2025
The authors developed a deep learning model using a GRU architecture to predict crop yield, utilizing only two predictors: LAImean1 and LAImean2. Given the simplicity of these two predictors, it raises questions about how they can achieve high prediction accuracy. The authors should provide a more detailed explanation of the underlying reasons or mechanisms that enable such effective performance with just these two variables.
Citation: https://doi.org/10.5194/essd-2024-586-EC1
-
AC5: 'Reply on EC1', Jingyuan Xu, 21 Aug 2025
Dear Editor,
Thank you very much for your great efforts on our manuscript. Inspired by your valuable comments, we have made a major revision to our manuscript. Please refer to the supplement for our response to your comments.
Sincerely yours,
Jingyuan Xu, on behalf of the co-authors
Data sets
NortheastChinaSoybeanYield20m: an annual soybean yield dataset at 20 m in Northeast China from 2019 to 2023
Jingyuan Xu, Xin Du, Taifeng Dong, Qiangzi Li, Yuan Zhang, Hongyan Wang, Jing Xiao, Jiashu Zhang, Yunqi Shen, and Yong Dong
https://doi.org/10.5281/zenodo.14263103
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
770 | 195 | 32 | 997 | 24 | 39 |
I am very familiar with the WOFOST model and the dataset used by the author. It is not a good simulation project, not only because the simulation accuracy did not meet industry standards, but also because the author withheld many critical details and settings of the WOFOST in the manuscript, which makes it difficult for me to assess the rationality and scientific validity of the simulation. Earth System Science Data, as the name suggests, focuses on the application of datasets, but the author's professionalism in describing and processing the dataset is not good. Moreover, the description of CRU is severely inadequate. After reading the entire manuscript, I still do not understand the role of the CRU used by the author in this study.