This work is distributed under the Creative Commons Attribution 4.0 License.
Long history paddy rice mapping across Northeast China with deep learning and annual result enhancement method
Abstract. Northeast China, a major production base for paddy rice, has received considerable attention in crop mapping. However, understanding of the spatiotemporal dynamics of paddy rice expansion in this region remains limited, making it difficult to track changes in paddy rice planting over time. For the first time, this study utilized multi-sensor Landsat data and a deep learning model, the full resolution network (FR-Net), to map paddy rice annually across Northeast China from 1985 to 2023 (available at https://doi.org/10.6084/m9.figshare.27604839.v1, Zhang et al., 2024). First, a cross-sensor paddy rice training dataset comprising 155 Landsat images was created. Then, we developed the annual result enhancement (ARE) method, which exploits the differences in the category probabilities output by FR-Net at different phenological stages to diminish the impact of limited training samples in large-scale, cross-sensor paddy rice mapping. The accuracy of the paddy rice dataset was evaluated using 107,954 ground truth samples. Compared with traditional rice mapping methods, the ARE method increased the F1 score by 6 %. The overall mapping result obtained from the FR-Net model and the ARE method achieved high user accuracy (UA), producer accuracy (PA), F1 score, and Matthews correlation coefficient (MCC) values of 0.92, 0.95, 0.93, and 0.81, respectively. The study revealed that the paddy rice cultivation area in Northeast China increased from 1.11×10⁴ km² to 6.45×10⁴ km². Between 1985 and 2023, the paddy rice cultivation area expanded by 5.34×10⁴ km² overall, with the largest growth (4.33×10⁴ km²) occurring in Heilongjiang province. This study shows that long-history crop mapping can be achieved with deep learning, and the resulting paddy rice dataset will be beneficial for making timely adjustments to cultivation patterns and ensuring food security.
Status: final response (author comments only)
RC1: 'Comment on essd-2024-516', Anonymous Referee #1, 13 Feb 2025
This paper presents an effort to map long-term paddy rice cultivation in Northeast China using Landsat time series and deep learning. The proposed long-term maps could be useful for understanding historical crop dynamics in the study area. However, I have the following major concerns:
- Field data is critical for model training and map validation. However, the manuscript lacks clarity and detail regarding how the authors collected the field samples and how they chose the field targets for visual interpretation through GEE. The sampling strategy for field data collection is unknown, especially for the validation data. Using points to validate 30-m maps is inappropriate, especially when mixed pixels occur. The reported accuracies could be largely impacted by 30-m mixed pixels when only point data are employed for map validation.
- For the map evaluation, the distribution of the validation dataset is not reported in the manuscript. Is it derived from a probability sample? Otherwise, the validation is not valid. Constructing a confusion matrix based on pixel counting is not recommended; the population error matrix should be expressed with cell entries in terms of the proportion of area. Besides, the uncertainty of these accuracy metrics should also be reported. Refer to Olofsson et al. (2014) for a guideline on how to conduct map accuracy evaluation solidly (a sketch of the workflow follows the reference below).
Reference:
Olofsson, P., Foody, G.M., Herold, M., Stehman, S.V., Woodcock, C.E. and Wulder, M.A., 2014. Good practices for estimating area and assessing accuracy of land change. Remote Sensing of Environment, 148, pp. 42–57.
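To illustrate what is being recommended, here is a minimal Python sketch of the Olofsson et al. (2014) good-practice estimators: converting a sample-count confusion matrix into an area-proportion error matrix and attaching standard errors to the accuracies. All counts and area weights below are hypothetical, and a stratified random sample with the map classes as strata is assumed.

```python
import numpy as np

# Hypothetical sample counts: rows = map class (paddy, non-paddy),
# columns = reference class.
n = np.array([[400, 30],
              [20, 550]])
W = np.array([0.06, 0.94])   # W_i: mapped area proportion per stratum (made up)

n_i = n.sum(axis=1, keepdims=True)   # samples per map class
p = W[:, None] * n / n_i             # area proportions: p_ij = W_i * n_ij / n_i.

oa = np.trace(p)                     # overall accuracy
ua = np.diag(p) / p.sum(axis=1)      # user's accuracy per class
pa = np.diag(p) / p.sum(axis=0)      # producer's accuracy per class

# Standard errors for OA and UA (PA omitted here, as its variance
# estimator needs terms from all strata and is longer to write out).
se_ua = np.sqrt(ua * (1 - ua) / (n_i.ravel() - 1))
se_oa = np.sqrt(np.sum(W**2 * ua * (1 - ua) / (n_i.ravel() - 1)))

print(f"OA = {oa:.3f} +/- {1.96 * se_oa:.3f}")
for c, name in enumerate(["paddy", "non-paddy"]):
    print(f"{name}: UA = {ua[c]:.3f} +/- {1.96 * se_ua[c]:.3f}, PA = {pa[c]:.3f}")
```

The key point is that accuracies and areas are estimated from the area-weighted proportions p, not from raw pixel counts, and each reported metric carries a confidence interval.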
Other comments:
L22: Are these average numbers from 1985 to 2023? Make it clearer.
L44-45: What is the justification for the statement that it is challenging to produce long-term maps using phenology-based methods? Any references?
L64: I think you are saying that the final map for a specific year is derived from multiple intermediate maps within that year. But 'multiple annual results' means multiple yearly maps, i.e., a map for each year. This could be confusing.
L116: You mentioned Result_pre in the text but there is no Result_pre in Eq.1.
L132: From Eq. 2, t corresponds to the image with the highest absolute difference between the category probability and 0.5, not the highest Pi directly. Why not use max(Pi) instead? For example, if P1=0.1 and P2=0.6, then t=1 and Pt=P1=0.1. Would you determine the final result as non-paddy since Pt < 0.5?
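To make this concrete, a minimal Python sketch of the selection rule as I read Eq. 2, reproducing the example above (all probabilities hypothetical):

```python
import numpy as np

def are_label(probs, threshold=0.5):
    """Pick the within-year result whose paddy probability lies farthest
    from 0.5, then threshold that single probability (Eq. 2 as I read it)."""
    probs = np.asarray(probs, dtype=float)
    t = np.argmax(np.abs(probs - threshold))  # most "confident" result
    return ("paddy" if probs[t] > threshold else "non-paddy"), t

label, t = are_label([0.1, 0.6])
print(label, t)  # -> non-paddy 0: P1 = 0.1 outcompetes P2 = 0.6
```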
L136: Are you using exactly the same parameters (models) in these different phenological stages? Otherwise, how could you ensure that the category probability outputs among m images are comparable?
L142: In which months did you download the data and from where? Specify the date range for each year, or the same range across years if they are consistent.
L145: Please provide the specific band names instead of numeric names.
L156: There are some issues with the validation dataset. The sampling strategy is unknown: how did you select the field sites to visit? If the distribution of the validation dataset is biased (not randomly selected), then the map accuracy based on it is not valid. Did you collect field data as point observations? Using points to validate 30-m maps is inappropriate, especially when mixed pixels occur. What are the spatial and temporal distributions of the training and validation data?
L170: 29906 + 9968 + 50956 + 16985 = 107815, which is less than the total size (68865 + 39098 = 107963, L154). Did you remove any ground samples, and why? What are the criteria for dividing the entire ground dataset into these training/validation sets with these specific numbers for Landsat 5 and Landsat 8/9, respectively?
L205: Please use explicit band names. In Fig. 3, did you use the same probability threshold of 0.5 in both the overlay maps and the ARE maps? For the red circled area, e.g., in E5, a non-paddy pixel in the overlay map means all category probability outputs are less than 0.5. According to Eq. 2, for the ARE method, a paddy pixel must have a probability greater than 0.5. How could a non-paddy pixel in the overlay map become a paddy pixel in the ARE map? (A quick numerical check of this point follows below.)
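A brute-force check of this point, assuming the overlay rule is "paddy if any within-year probability exceeds 0.5" as stated above (probabilities drawn at random):

```python
import numpy as np

rng = np.random.default_rng(0)

def overlay(probs):   # paddy if any within-year result exceeds 0.5
    return np.any(probs > 0.5)

def are(probs):       # paddy if the result farthest from 0.5 exceeds it
    t = np.argmax(np.abs(probs - 0.5))
    return probs[t] > 0.5

# If all probabilities are below 0.5 (non-paddy in the overlay map),
# ARE can never output paddy, so the flip in E5 should be impossible
# under a shared 0.5 threshold.
flips = sum(1 for _ in range(100_000)
            if (p := rng.random(4)) is not None
            and not overlay(p) and are(p))
print(flips)  # -> 0
```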
L212: The distribution of the validation points is totally unknown. Are they derived from a probability sample? Otherwise, the validation is not valid. A confusion matrix based on pixel counting is not recommended; the population error matrix should be expressed with cell entries in terms of the proportion of area. Besides, the uncertainty of these accuracy metrics should also be reported. Refer to Olofsson et al. (2014) for a guideline on how to conduct map accuracy evaluation in a solid manner.
Reference:
Olofsson, P., Foody, G.M., Herold, M., Stehman, S.V., Woodcock, C.E. and Wulder, M.A., 2014. Good practices for estimating area and assessing accuracy of land change. Remote Sensing of Environment, 148, pp. 42–57.
L236: In Fig. 4, what is the scale of this comparison? I assume this is the total area over the entire study area. What about comparisons at the district, municipal, and provincial levels, since you collected the agricultural statistics?
L263: What if there are no clear-sky observations available in the preceding and subsequent years? Did you leave it as no data? In Fig. 7, what is "interpolated paddy"? Did you interpolate your classification directly from the classification in the previous/next year if there is no cloud-free satellite data in the current year? This has to be clarified.
I assume the interpolated paddy pixels are derived from satellite data that are interpolated from previous/next Landsat observations. The term 'interpolated paddy' implies that the classified pixel itself is somehow interpolated, which makes no sense.
L275-276: There is no figure showing the administrative boundaries and labels. It is hard for readers unfamiliar with China to link your descriptive context to the spatial locations in the map.
L301: This contradicts the table. Instead, #1 and #4 show that using data from only one sensor achieved the best accuracy and results (when the models are trained and applied on the same sensor), compared with the other scenarios using multiple sensors.
L304-305: It is not clear how you conducted 'transfer learning'. For #8, training on L5 and 20 % of L8, then applying to L8? This needs to be clarified.
L306: An enhanced accuracy compared to what?
Citation: https://doi.org/10.5194/essd-2024-516-RC1
RC2: 'Comment on essd-2024-516', Anonymous Referee #2, 04 Mar 2025
This study presents a spatially explicit rice mapping dataset for Northeast China spanning 1985–2023. The dataset holds significant value for land use and agricultural management research. However, the current manuscript's structure and expression do not position it as a major advancement but rather as an extension of Xia et al. (2022). Below are my suggestions for the authors' consideration.
Specific comments:
Abstract: Lines 13–14 provide a good overview of the methodology, including the data and model used. However, the workflow of ARE is unclear to readers unfamiliar with it. Adding one or two sentences explaining its mechanism would be beneficial. The explanation in lines 17–18 is too general. Readers may struggle to grasp the concept just from the abstract. Line 22 presents rice expansion values but lacks a specified duration. I recommend revising this sentence and the following one for better readability and clarity.
Introduction: This section is well-structured and easy to follow. In line 63, the phrase “multiple annual results” is unclear. Using plain language would improve readability.
Methods: This section requires substantial improvement for better clarity. Adding a workflow diagram would significantly enhance readability and coherence. The sections are loosely connected. The authors should clearly specify: Which datasets were used to feed the model? Which results were used for ARE? Which datasets were used for validation and how they relate to the modeling outputs or ARE results?
2.2.1: As this is the most critical section for mapping, more details on FR-Net are necessary. Although Xia et al. (2022) describes the model, a concise explanation of its working principles, strengths, and weaknesses is still needed. This will provide readers with a foundational understanding, allowing them to refer to the cited work for further details.
2.2.2: In line 113, it would be helpful to first explain why multiple mapping results exist and how they are produced. The ARE method is not clearly explained, which raises concerns. Equation 2 suggests that a good map (Pt > 0.5) wins only when its probability has a greater distance from 0.5. However, this approach may not be fair in all situations. For instance, probability values at the start and end of the growing season may be less reliable than those in mid-season, potentially leading to misclassification of rice pixels as non-rice (a toy illustration follows below). More details on this method and additional case studies under different conditions would be beneficial.
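As a toy illustration of this concern (all probabilities made up), an unreliable early-season value far from 0.5 decides the label under Eq. 2 even when the mid-season evidence points to rice:

```python
import numpy as np

def are(probs):
    """Eq. 2 as described above: the probability farthest from 0.5 decides."""
    probs = np.asarray(probs, dtype=float)
    t = np.argmax(np.abs(probs - 0.5))
    return "rice" if probs[t] > 0.5 else "non-rice"

season = [0.05, 0.80, 0.70]   # early (noisy), mid, late season
print(are(season))            # -> non-rice: the 0.05 outlier wins
```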
2.3.1: How is the growing season defined? How is the model trained using these bands? The band meanings and numbers vary across years and satellite products; how are they handled? I suggest merging 2.3.3 into this section.
2.3.2: This dataset is crucial for validating the study's product and holds significant value for broader research communities in the study area. It is necessary to publish the relevant dataset for validation checking and broader use.
2.3.3: This section is very confusing and needs to be reconstructed. First, what is the connection of XGBoost to the DL model? Given that it can generate the paddy and non-paddy maps, what are the differences between its results and the DL model's? Second, the ROIs in Fig. 1(c) and (d) are very large. From my understanding, they indicate paddy and non-paddy. Does it mean that within an ROI, all pixels are either paddy or non-paddy? Third, how was the manual correction conducted? Fourth, what does the mask mean in line 167?
2.5: There are no clear criteria for model constraints, such as loss functions. This should be explicitly mentioned.
Technical comments:
The authors need to update the figure captions. What is the scale of the dots in Fig. 4: district, county, or province? In (a), are the dots for one year or multiple years? In Fig. 7, what is the difference between paddy and interpolated paddy? In Fig. 8, what does the trend mean? Are the values on the map the change rate?
Citation: https://doi.org/10.5194/essd-2024-516-RC2
Data sets
Long history paddy rice mapping across Northeast China with deep learning and annual result enhancement method Zihui Zhang, Lang Xia, Fen Zhao, Yue Gu, Jing Yang, Yan Zha, Shangrong Wu, and Peng Yang https://doi.org/10.6084/m9.figshare.27604839.v1
Model code and software
Paddy Lang Xia https://github.com/xialang2012/Paddy