the Creative Commons Attribution 4.0 License.
ChinaSoyArea10m: a dataset of soybean planting areas with a spatial resolution of 10 m across China from 2017 to 2021
Abstract. Soybean, an essential food crop, has witnessed a steady rise in demand in recent years. There is a lack of high-resolution annual maps depicting soybean planting areas in China, despite China being the world’s largest consumer and fourth largest producer of soybeans. To address this gap, we developed a novel method called phenological- and pixel-based soybean area mapping (PPS) based on Sentinel-2 remote sensing images from the Google Earth Engine (GEE) platform. We utilized various auxiliary data (e.g., cropland layer, detailed phenology observations) to select the distinct features that differentiate soybeans most effectively from other crops across various regions. These features were then used as input for an unsupervised classifier (K-means), and the most likely crop type was determined by a post-classification method based on dynamic time warping (DTW). For the first time, we generated a dataset of soybean planting areas across China, with a high spatial resolution of 10 meters, spanning from 2017 to 2021 (ChinaSoyArea10m). The R² values between the mapping results and the census data at both the county and prefecture levels were consistently around 0.85 in 2017–2020. Moreover, the overall accuracies of the mapping results at the field level in 2017, 2018, and 2019 were 77 %, 84 % and 88 %, respectively. Compared with the existing 10-m crop-type maps in Northeast China (Cropland Data Layer, CDL) based on field samples and supervised classification methods, the mapping accuracy is significantly improved by 31 % (R² increases from 0.53 to 0.84) according to their consistency with census data, particularly at the county level. ChinaSoyArea10m is also spatially consistent with the two existing datasets (CDL and GLAD maize-soybean map). ChinaSoyArea10m provides important information for sustainable soybean production and management, as well as agricultural system modeling and optimization.
ChinaSoyArea10m can be downloaded from an open-data repository (DOI: https://zenodo.org/doi/10.5281/zenodo.10071426, Mei et al., 2023).
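The post-classification step described in the abstract (assigning a crop type to K-means clusters by DTW distance to known soybean curves) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-feature curves, and all numeric values below are hypothetical, and the real workflow runs on Sentinel-2 features in GEE.

```python
from math import inf

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def pick_soybean_cluster(cluster_means, reference):
    """Return the id of the cluster whose mean curve is closest to the reference."""
    return min(cluster_means, key=lambda k: dtw_distance(cluster_means[k], reference))

# Hypothetical index curves (made-up values, one feature only).
reference = [0.2, 0.4, 0.7, 0.8, 0.5, 0.3]   # assumed mean curve of soybean samples
cluster_means = {
    0: [0.2, 0.2, 0.4, 0.7, 0.8, 0.5],       # same shape, shifted by one time step
    1: [0.6, 0.6, 0.6, 0.6, 0.6, 0.6],       # flat curve
}
best = pick_soybean_cluster(cluster_means, reference)
```

In this toy case the shifted curve (cluster 0) is selected, because DTW tolerates the one-step phenological shift that a point-by-point distance would penalize.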
Status: final response (author comments only)
-
RC1: 'Comment on essd-2023-467', Anonymous Referee #1, 11 Jan 2024
The article proposed an unsupervised method for identifying soybean crops within the defined croplands across China. The topic is interesting, and also important for sustainable agricultural development due to its large spatial and long-term coverage. The data is well collected and processed, and the results are properly presented. I would suggest some minor revisions as listed below.
L43: Not sure what are the “shortcomings of domestic supply”?
L46: Please add references to previous studies.
L59-62: I would suggest revising the statements as “the previous studies made laudable efforts to craft a comprehensive national maize-soybean map for China in 2019 by combining field data and regression estimators (Li et al., 2023). Nonetheless, these studies were confined to specific regions or a single year, despite prior attempts to accurately map soybean cultivation areas.”
L64-70: to me, this is not the “general” way of categorizing remote sensing classification methods. Supervised and unsupervised are the widely accepted categories. I would suggest the authors revise the paragraph, link the specific classification methods mentioned in L71-78 to each category, and discuss the pros and cons.
L121: Please justify the impact of using TOA reflectance, rather than surface reflectance, on classification results.
L123: Depending on the platform/sensor used, red edge bands are also typical “traditional bands” in vegetation-related studies.
L135: Please specify what the “gaps” are. If they are related to crop growth, how was the “average” procedure conducted?
L189: it seems the purpose of this paragraph is to provide an overview of the method. The details regarding the “soybean mapping” can be merged with the sections below.
L198-L200: I reckon this is also the step that deals with the data gaps due to cloud? Please add more details regarding the method incorporated (e.g. moving window size etc.) if possible.
L203: no-cropland --> non-cropland
L204: you might need to define the “starting and ending dates of the growing season” first.
L206: Please provide the full name for EVI first. And, revise the sentence slightly, “… we masked out the pixels with maximum EVI less than 0.4 during the growing seasons”. Please also justify how the threshold (0.4) for fallow land was determined.
L296- : it is good that the authors noticed the large estimation uncertainties in small-planting regions (figure 4, and figures 5). It would help to justify why this happened by looking into several regions and checking the reasons.
Also, given the great similarities between the maize and soybean index profiles (Figure 3), it is important to check whether the overestimated regions are actually maize. Since the classifiers are trained for individual regions, the authors might consider increasing the number of clusters for sparsely planted regions if maize is being mixed with soybean due to their similarities. One potential way to check is to compare ChinaSoyArea10m with the GLAD layer, especially in the overestimated regions.
L315: to me, “became higher and higher” is not a scientific way to describe the trend here. Please consider “increased” or similar terms for the statement if it is a critical finding.
L353: It is good to see the authors outline the limitations of the proposed method in regard to its sensitivity to data availability and applicability in sparsely planted regions. It would be good to have some insights into the advantages of the method compared to the mentioned GLAD and CDL products and promote its applications in some suggested circumstances.
Citation: https://doi.org/10.5194/essd-2023-467-RC1
CC1: 'Reply on RC1', Qinghang Mei, 03 Mar 2024
Thank you for your positive and constructive comments, which surely encourage us to further enhance our research quality. We carefully revised our manuscript and provided a point-by-point response in the supplement. Moreover, we have positively addressed all points in the revised edition, which will be updated after responding to all referees’ comments.
-
AC1: 'Reply on RC1', zhao zhang, 04 Mar 2024
Dear Reviewer,
Thank you for your positive and constructive comments, which surely encourage us to further enhance our research quality. We carefully revised our manuscript and provided a point-by-point response in the supplement. Moreover, we have positively addressed all points in the revised edition, which will be updated after responding to all referees’ comments.
Thank you again for your review and valuable comments.
Sincerely,
Qinghang Mei and Zhao Zhang on behalf of all co-authors
-
RC2: 'Comment on essd-2023-467', Anonymous Referee #2, 06 Feb 2024
The manuscript employed a two-step method to map soybean at a large scale in China for 2017-2021. While the topic and the generated dataset have great potential to benefit the agriculture community in both research and operational monitoring, there are some major flaws that need to be addressed to enhance the scientific soundness of the paper and the reliability of the data.
The authors listed three objectives. The new data product of soybean maps was generated and openly shared to address the third objective. However, the first two objectives have not been thoroughly investigated. Further examination is required to test the method's robustness in extracting soybean fields across different regions. Although the nationwide validation using ground samples shows generally acceptable accuracy, the variations in accuracy among regions need to be illustrated. This can be easily done, as the classification was applied at the prefecture level. Additionally, the accuracy in low soybean growing regions should be specified. The proposed method appears to be ineffective in accurately extracting soybean fields in non-soybean-producing provinces. In this case, it may not be meaningful to generate a soybean map at a national scale while most non-producing provinces present unreliable results. Additionally, the validation process is questionable, since the data used to determine soybean clusters was also used in the validation.
Specific comments:
- Line 29, Cropland Data Layer or Crop Data Layer? The existing maps are described as crop type maps not cropland maps.
- In second paragraph of introduction section, it is recommended to specify the research study areas for each citation when highlighting their work. For example, line 52 to 55. I thought the research generated 20-years maize-soybean maps for whole China but it is not.
- Line 145-147, does the National Bureau of Statistics of China provide county and prefecture level data? How do you use national and provincial data to validate county and prefecture level information?
- Line 215-217, how high is the uncertainty resulting from cloud cover or missing values during the proposed period?
- Line 255-257 not clearly stated. Is there any quantitative information to determine whether crops are major or minor ones? The statistical area is problematic for some crops in a double cropping pattern, for example double rice.
- Ground samples were only collected from 2017 to 2019 in five provinces. How do you determine whether the clusters are closest to the soybean samples in the other 9 provinces and in 2020-2021 when DTW is applied? Even during 2017 to 2019, you don’t have soybean samples collected during the ground survey.
- Those ground samples were used in both cluster assignment and validation. Scientifically, independent validation should be applied.
- According to the validation in 3.1, it seems that the mapping accuracy is much lower in counties with less soybean area at both the county and prefecture levels. This does not surprise me, given the combined resolution bias and algorithm uncertainties. This raises another question: is it meaningful to generate soybean maps at almost the whole national scale?
- The discussion needs significant improvements. The author discussed the limitations of the research while ignoring the strong points of the research. Also, the uncertainty of the classification at small-scale soybean cultivation areas shall be addressed from a more theoretical way.
- The ms does not consider the soybean-maize intercropping systems in part of China.
Citation: https://doi.org/10.5194/essd-2023-467-RC2
AC2: 'Reply on RC2', zhao zhang, 22 Mar 2024
Dear Reviewer,
Many thanks for your thoughtful and valuable comments and suggestions, which are very helpful in improving our manuscript. We have conducted substantial new experiments and analyses to ensure that the study is more comprehensive and rigorous, and our maps are more reliable. We carefully revised our manuscript and provided a point-by-point response in the supplement. Moreover, we have positively addressed all points in the revised edition, which will be updated after responding to all referees’ comments.
Thank you again for your review and valuable comments.
Sincerely,
Qinghang Mei and Zhao Zhang, on behalf of all co-authors
-
RC3: 'Comment on essd-2023-467', Anonymous Referee #3, 26 Feb 2024
This manuscript developed a phenological- and pixel-based soybean area mapping (PPS) method to identify soybean on a large scale and generated a dataset of soybean planting areas across China. The topic is significant for sustainable soybean production and management. However, the proposed methodology lacks notable innovation when compared to prior studies. Given the intricate spectral variations within soybeans and the fragmented nature of agricultural landscapes across China, the presented method fails to demonstrate its robustness across diverse regions and time periods, therefore raising concerns about the reliability of the resulting soybean map. Furthermore, certain descriptions of the proposed method lack essential details, and specific contents are not easy to follow. Below, I have provided several detailed comments:
1. Line83-93: The expression and logic are not clear. I suggest that the authors reorganize “method (5)” to emphasize its key theory, advantages and disadvantages. Additionally, Line93-98 should be revised to describe the fundamental theory and performance of the methods proposed by prior researchers. Furthermore, in the Introduction section, the authors neither introduced the fundamental concept behind the proposed method nor highlighted the current issues faced by previous efforts in large-scale soybean mapping.
2. Fig.1 shows that there are more soybean agrometeorological observation stations in Jiangxi Province than in Sichuan Province. So, why does the study area not include regions in South China, especially prefectures in the Jiangxi Province?
3. As stated in lines 149-151, the regions chosen to validate the classification results didn’t include samples from fragmented planting regions with small soybean cultivation areas. Could this validation approach potentially lead to an overestimation of the overall validation accuracy? Additionally, there is a lack of a spatial distribution map for these field samples.
4. L206-207: References are needed to support these statements.
5. The main crop types and cropping intensity vary across regions with different climate conditions. However, Fig.3 (a-i) only presents spectral curves for soybean planting in Northern China. Are the phenological characteristics described in “(2) Feature selection” also applicable to soybeans planted in Southwestern China? I suggest that the authors also provide spectral curves of soybean and main crops planted in South China.
6. The authors need to provide example figures illustrating the result of the “time window from 15 days before the podding date (DOYpodding) to 15 days after the full-seed date (DOYseed)”.
7. L241-242: these contents are confusing, is there any typo?
8. L255-256: How did you determine the number of K-means clusters based on statistics? Further explanation is needed for clarity.
9. The DTW step is not clearly described:
(1) I wonder whether the length and time coverage of S2 time series used for calculating DTW distance vary across different AEZs?
(2) Did the authors use averaged time series for 100 random points and those for all field samples around the whole China to calculate DTW distances? If so, it is important to note that the spectral differences between crops in North and South China may affect the validity of DTW calculation results. Have you considered the impact of intra-class spectral differences in soybean samples from different regions on the DTW calculation results and the final classification results?
(3) Line 219-221: Did the authors use all of the above 8 features to calculate DTW distances? How did you integrate the 8 DTW distances into the final DTW value used for classification?
(4) “The cluster closest to the samples was identified as the soybean cluster.” How did you determine the threshold?
10. Fig.8 (a1-3) depict false-color composite images composed of bands 4, 3, and 2. Distinguishing between soybeans and non-soybeans in these images is visually difficult. It is recommended to present images composited with other bands. The authors can refer to the following article, which uses the shortwave infrared band for false-color compositing.
Song X-P, Potapov P V, Krylov A, King L, Di Bella C M, Hudson A, Khan A, Adusei B, Stehman S V, Hansen M C. National-scale soybean mapping and area estimation in the United States using medium resolution satellite imagery and field survey. Remote Sens. Environ., 2017, 190: 383-395
You N, Dong J. Examining earliest identifiable timing of crops using all available Sentinel 1/2 imagery and Google Earth Engine. ISPRS-J. Photogramm. Remote Sens., 2020, 161: 109-123
11. Fig. 9 indicates that there is a notably low frequency of clear observations in Sichuan Province, with the majority of areas showing zero clear observations per month. How can it be ensured that a complete 10-day composited time series is generated for DTW calculations in this region?
12. A considerable number of pixels corresponding to field ridges were inaccurately classified as soybeans in the 2020 map, particularly evident in East Heilongjiang, North Shandong and Henan Province. Can the authors consider the use of post-processing methods to eliminate this issue?
Citation: https://doi.org/10.5194/essd-2023-467-RC3
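The concern in comment 11 (building a complete 10-day composite series where clear observations are sparse) can be made concrete with a small sketch. It assumes a median composite per 10-day window and linear interpolation across empty windows; neither choice is confirmed by the manuscript, and the function names and values are hypothetical.

```python
from statistics import median

def ten_day_composite(observations, start_doy, end_doy):
    """Median of clear observations in each 10-day window; None where a window is empty."""
    composites = []
    for win_start in range(start_doy, end_doy, 10):
        vals = [v for doy, v in observations if win_start <= doy < win_start + 10]
        composites.append(median(vals) if vals else None)
    return composites

def fill_gaps(series):
    """Linearly interpolate None gaps; gaps at the edges take the nearest valid value."""
    valid = [i for i, v in enumerate(series) if v is not None]
    if not valid:
        # With zero clear observations there is nothing to interpolate from.
        raise ValueError("no clear observations in the whole series")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in valid if k < i), default=None)
        right = min((k for k in valid if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled

# Hypothetical clear-sky observations as (day of year, index value) pairs.
obs = [(182, 0.30), (185, 0.34), (213, 0.60), (240, 0.50)]
composite = ten_day_composite(obs, 180, 250)   # 7 windows, 4 of them empty
filled = fill_gaps(composite)
```

Here four of the seven windows are empty and are reconstructed purely by interpolation, which is the kind of uncertainty the comment asks the authors to quantify.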
AC3: 'Reply on RC3', zhao zhang, 24 Mar 2024
Dear Reviewer,
Many thanks for your thoughtful and valuable comments and suggestions, which are very helpful in improving our manuscript. We have conducted substantial new experiments and analyses to ensure that the study is more comprehensive and rigorous, and our maps are more reliable. We carefully revised our manuscript and provided a point-by-point response in the supplement. Moreover, we have positively addressed all points in the revised edition.
Thank you again for your review and valuable comments.
Sincerely,
Qinghang Mei and Zhao Zhang, on behalf of all co-authors
-
RC4: 'Comment on essd-2023-467', Anonymous Referee #4, 12 Mar 2024
Mei et al.'s work mapped the soybean planting areas across China at a high spatial resolution of 10 meters from 2017 to 2021, providing important information for sustainable soybean production and management, as well as agricultural system modeling and optimization. In this work, the authors summarized five methods of mapping crops by remote sensing. The advantages and uncertainties of each method were compared, and a method combining unsupervised classification and post-classification, which is highly effective for accurately mapping crops over larger regions, was applied in this paper. They accomplished this using Sentinel-2 remote sensing images from the GEE platform together with a cropland layer and detailed phenology observations. They validated the results against the census data at both the county and prefecture levels, and against the two existing datasets (CDL and GLAD maize-soybean map).
Overall, I find this work to be valuable. However, I have some concerns regarding the robustness from the sparse number of AMSs in SW Zonal IV and uncertainty in quality of satellite imagery. I hope the authors will consider these points and provide further clarification in their responses and/or revisions. Please find my major comments and minor for clarification below.
Major comments:
- The text mentions the need for 10-day time series composite images per month, but in certain areas, the average monthly count of clear observations is insufficient to meet this requirement. Can the existing time series composite methods be optimized to accommodate the inadequacy of observational data?
- There are fewer satellite observations per month in SW Zonal IV, and this zone also has only two AMSs. Is it possible to add observational data or remote-sensing-derived phenological data to test the robustness?
- To determine the potential cropping areas, the authors kept only the pixels whose maximum EVI during the growing season was greater than 0.4, in order to remove fallow land. Given the spatial variation across the four zones, a constant threshold would bring some uncertainty. I expect to see more evidence for selecting 0.4, or a sensitivity analysis of the threshold could be implemented.
Minor comments:
Line 58: Does “same areas” mean North China?
Line 180, Figure2: The labels on the left in Figure2 (i.e. ‘Data processing’ and ‘Accuracy assessment’) should be rotated 180° to match reading habits.
Line 180, Figure2: In step2, part (2) of the dashed box is confusing. What do the colors represent? If I understand correctly, they represent different index layers. It is recommended to put the abbreviations to the right of the color layers.
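The threshold sensitivity analysis requested in the third major comment could look like the following sketch. It uses the standard EVI formula with reflectances scaled to 0-1; the helper names, the candidate thresholds other than 0.4, and the pixel values are hypothetical and only illustrate the idea.

```python
def evi(nir, red, blue):
    """Enhanced Vegetation Index from surface or TOA reflectances scaled to 0-1."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def cropped_fraction(pixel_series, threshold):
    """Share of pixels whose growing-season maximum EVI reaches the threshold."""
    kept = sum(1 for series in pixel_series if max(series) >= threshold)
    return kept / len(pixel_series)

# Hypothetical growing-season EVI series for four cropland pixels.
pixels = [
    [0.15, 0.25, 0.30],   # likely fallow
    [0.20, 0.45, 0.38],
    [0.25, 0.60, 0.70],
    [0.10, 0.35, 0.42],
]

# Sweep the threshold around the 0.4 used in the manuscript.
sensitivity = {t: cropped_fraction(pixels, t) for t in (0.3, 0.4, 0.5)}
```

Repeating such a sweep per zone, and comparing the retained area with census cropland statistics, would show how strongly the final soybean area depends on the fixed 0.4 value.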
-
AC4: 'Reply on RC4', zhao zhang, 24 Mar 2024
Dear Reviewer,
Many thanks for your thoughtful and valuable comments and suggestions, which are very helpful in improving our manuscript. We have conducted substantial new experiments and analyses to ensure that the study is more comprehensive and rigorous, and our maps are more reliable. We carefully revised our manuscript and provided a point-by-point response in the supplement. Moreover, we have positively addressed all points in the revised edition.
Thank you again for your review and valuable comments.
Sincerely,
Qinghang Mei and Zhao Zhang, on behalf of all co-authors
Data sets
ChinaSoyArea10m: a dataset of soybean planting areas with a spatial resolution of 10 m across China from 2017 to 2021 Qinghang Mei, Zhao Zhang, Jichong Han, Jie Song, Jinwei Dong, Huaqing Wu, Jialu Xu, Fulu Tao https://zenodo.org/doi/10.5281/zenodo.10071426
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
643 | 144 | 37 | 824 | 25 | 29 |