This work is distributed under the Creative Commons Attribution 4.0 License.
Long history paddy rice mapping across Northeast China with deep learning and annual result enhancement method
Abstract. Northeast China, a major production base for paddy rice, has received considerable attention in crop mapping. However, understanding of the spatiotemporal dynamics of paddy rice expansion in this region remains limited, making it difficult to track changes in paddy rice planting over time. For the first time, this study utilized multi-sensor Landsat data and a deep learning model, the full resolution network (FR-Net), to map paddy rice annually across Northeast China from 1985 to 2023 (available at https://doi.org/10.6084/m9.figshare.27604839.v1, Zhang et al., 2024). First, a cross-sensor paddy training dataset comprising 155 Landsat images was created to map paddy rice. Then, we developed the annual result enhancement (ARE) method, which considers the differences in the category probabilities output by FR-Net at different stages to diminish the impact of limited training samples in large-scale, cross-sensor paddy rice mapping. The accuracy of the paddy rice dataset was evaluated using 107954 ground truth samples. In comparison to traditional rice mapping methods, the results obtained using the ARE method showed a 6 % increase in the F1 score. The overall mapping result obtained from the FR-Net model and the ARE method achieved high user accuracy (UA), producer accuracy (PA), F1 score, and Matthews correlation coefficient (MCC) values of 0.92, 0.95, 0.93, and 0.81, respectively. The study revealed that the area used for paddy rice cultivation in Northeast China increased from 1.11×10⁴ km² to 6.45×10⁴ km². Between 1985 and 2023, the paddy rice cultivation area expanded by 5.34×10⁴ km² overall, with the largest growth (4.33×10⁴ km²) occurring in Heilongjiang province. This study shows that long-history crop mapping can be achieved with deep learning, and the resulting paddy rice dataset will be beneficial for making timely adjustments to cultivation patterns and ensuring food security.
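For reference, UA, PA, F1, and MCC as reported above follow the standard binary-classification definitions. A minimal sketch with a made-up confusion matrix (the counts are illustrative, not the paper's validation data):

```python
# Standard accuracy metrics (UA, PA, F1, MCC) from a binary confusion
# matrix. The counts below are hypothetical, NOT the paper's data.
import math

tp, fp, fn, tn = 900, 80, 50, 1000   # hypothetical pixel counts

ua = tp / (tp + fp)                  # user's accuracy (precision)
pa = tp / (tp + fn)                  # producer's accuracy (recall)
f1 = 2 * ua * pa / (ua + pa)         # harmonic mean of UA and PA
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)                                    # Matthews correlation coefficient

print(f"UA={ua:.2f} PA={pa:.2f} F1={f1:.2f} MCC={mcc:.2f}")
```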
- Preprint (2663 KB)
- Metadata XML
- Supplement (24 KB)
- BibTeX
- EndNote

Status: open (until 07 Mar 2025)
RC1: 'Comment on essd-2024-516', Anonymous Referee #1, 13 Feb 2025
This paper presents an effort to map long-term paddy rice cultivation in Northeast China using Landsat time series and deep learning. The proposed long-term maps could be useful for understanding the historical crop dynamics in the study area. However, I have major concerns, as outlined below:
- Field data are critical for model training and map validation. However, the manuscript lacks clarity and detail regarding how the authors collected the field samples and how they chose the field targets for visual interpretation through GEE. The sampling strategy for field data collection is unknown, especially for validation data collection. Using points to validate 30-m maps is inappropriate, especially when mixed pixels occur. The reported accuracies could be strongly affected by 30-m mixed pixels when only point data are employed for map validation.
- For the map evaluation, the distribution of the validation dataset is not reported in the manuscript. Is it derived from a probability sample? Otherwise, the validation would not be valid. Constructing a confusion matrix based on pixel counting is not recommended; the population error matrix should be expressed with cell entries in terms of the proportion of area. Besides, the uncertainty of these accuracy metrics should also be reported. Refer to Olofsson et al. (2014) for a guideline on how to conduct map accuracy evaluation solidly; see the sketch after the reference below.
Reference:
Olofsson, P., Foody, G. M., Herold, M., Stehman, S. V., Woodcock, C. E., and Wulder, M. A., 2014. Good practices for estimating area and assessing accuracy of land change. Remote Sensing of Environment, 148, pp. 42-57.
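To make the recommended procedure concrete, here is a minimal sketch of the area-weighted (population) error matrix and its standard errors in the spirit of Olofsson et al. (2014); all sample counts and mapped-area weights below are made up for illustration and do not come from the manuscript:

```python
import numpy as np

# Made-up sample-count error matrix n[i, j]: rows = map class,
# columns = reference class (0 = paddy, 1 = non-paddy).
n = np.array([[420, 30],
              [25, 525]], dtype=float)

# Made-up proportions of the mapped area in each map class.
W = np.array([0.10, 0.90])

n_i = n.sum(axis=1)                      # samples per map class (row totals)
p = (W[:, None] * n) / n_i[:, None]      # population error matrix in area proportions

oa = np.trace(p)                         # overall accuracy
ua = np.diag(p) / p.sum(axis=1)          # user's accuracy per map class
pa = np.diag(p) / p.sum(axis=0)          # producer's accuracy per reference class

# Stratified estimator of each reference class's area proportion and its
# standard error, following Olofsson et al. (2014).
p_ref = p.sum(axis=0)
se_ref = np.sqrt(((W[:, None] ** 2)
                  * (n / n_i[:, None])
                  * (1 - n / n_i[:, None])
                  / (n_i[:, None] - 1)).sum(axis=0))

print("OA:", oa, "UA:", ua, "PA:", pa)
print("Reference-class area proportions:", p_ref, "+/-", 1.96 * se_ref)
```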
Other comments:
L22: Are there average numbers from 1985 to 2023? Make it clearer.
L44-45: What is the justification for such a statement that it's challenging to produce long-term maps using phenology-based methods? Any references?
L64: I think you are saying that the final map for a specific year is derived from multiple intermediate maps within the year. But 'multiple annual results' means multiple yearly maps, i.e., a map for each year. This could be confusing.
L116: You mentioned Result_pre in the text but there is no Result_pre in Eq.1.
L132: From Eq. 2, t represents the image corresponding to the highest absolute difference between the category probability and 0.5, not simply the highest Pi. Why not use max(Pi) instead? For example, if P1=0.1 and P2=0.6, then t=1 and Pt=P1=0.1. Would you determine the final result as non-paddy since Pt < 0.5? (See the sketch below.)
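To make this example concrete, a minimal sketch of the Eq. 2 selection rule as the reviewer reads it; the function name are_select is illustrative, not the authors':

```python
# Selection rule in Eq. 2 as the reviewer reads it: pick the image whose
# paddy probability is farthest from 0.5, then threshold that probability.

def are_select(probs, thresh=0.5):
    """Return (selected image index, paddy/non-paddy label)."""
    t = max(range(len(probs)), key=lambda i: abs(probs[i] - thresh))
    return t, probs[t] > thresh

# The reviewer's example: P1 = 0.1, P2 = 0.6.
t, is_paddy = are_select([0.1, 0.6])
print(t, is_paddy)  # -> 0, False: |0.1-0.5| > |0.6-0.5|, so the pixel is
                    #    labeled non-paddy even though P2 > 0.5.
```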
L136: Are you using exactly the same parameters (models) in these different phenological stages? Otherwise, how could you ensure that the category probability outputs among m images are comparable?
L142: In which months did you download the data and from where? Specify the date range for each year, or the same range across years if they are consistent.
L145: Please provide the specific band names instead of numeric names.
L156: There are some issues with the validation dataset. The sampling strategy is unknown: how did you select the field sites to visit? If the distribution of the validation dataset is biased (not randomly selected), then the map accuracy based on it is not valid. Did you collect field data as point observations? Using points to validate 30-m maps is inappropriate, especially when mixed pixels occur. What are the spatial and temporal distributions of the training and validation data?
L170: 29906 + 9968 + 50956 + 16985 = 107815, which is less than the total size (68865 + 39098 = 107963, L154). Did you remove any ground samples, and why? What are the criteria for dividing the entire ground dataset into these training/validation sets with these specific numbers for Landsat 5 and Landsat 8/9, respectively?
L205: Please use explicit band names. In Fig. 3, did you use the same probability threshold of 0.5 in both the overlay maps and the ARE maps? For the red-circled area, e.g., in E5, a non-paddy pixel in the overlay map means all category probability outputs are less than 0.5. According to Eq. 2, for the ARE method, a paddy pixel must have a probability greater than 0.5. How could a non-paddy pixel in the overlay map become a paddy pixel in the ARE map? (See the sketch below.)
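The reviewer's logic can be checked directly: with a shared 0.5 threshold, if every per-image probability is below 0.5, both the overlay rule and the Eq. 2 selection rule must return non-paddy (again with illustrative names and numbers):

```python
def overlay_select(probs, thresh=0.5):
    # Overlay rule as the reviewer describes it: paddy if any image's
    # category probability exceeds the threshold.
    return any(p > thresh for p in probs)

probs = [0.2, 0.45, 0.3]        # all below 0.5 -> non-paddy overlay pixel
t = max(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
print(overlay_select(probs))    # False
print(probs[t] > 0.5)           # also False: Eq. 2 cannot flip it to paddy
```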
L212: The distribution of the validation points is totally unknown. Are they derived from a probability sample? Otherwise, the validation would not be valid. A confusion matrix based on pixel counting is not recommended; the population error matrix should be expressed with cell entries in terms of the proportion of area. Besides, the uncertainty of these accuracy metrics should also be reported. Refer to Olofsson et al. (2014), cited above, for a guideline on how to conduct map accuracy evaluation in a solid manner.
L236: In Fig. 4, what is the scale of this comparison? I assume it is the total area over the entire study area. What about comparisons at the district, municipal, and provincial levels, since you collected the agricultural statistics?
L263: What if there are no clear-sky observations available in the preceding and subsequent years? Did you leave it as no data? In Fig. 7, what is "interpolated paddy"? Did you interpolate your classification directly from the classification in the previous/next year if there is no cloud-free satellite data in the current year? This has to be clarified.
I assume the interpolated paddy pixels are derived from satellite data that are interpolated from previous/next Landsat observations. The term 'interpolated paddy' implies that the classified pixel itself is somehow interpolated, which makes no sense.
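One plausible reading of the gap-filling being questioned here, sketched with hypothetical per-year arrays; this is an assumption about the workflow, not the authors' documented method:

```python
import numpy as np

def fill_gap_years(maps):
    """maps: dict year -> 2-D paddy/non-paddy array, or None when no
    cloud-free observation exists. Gap years are copied from the nearest
    mapped year (a guess at what 'interpolated paddy' might mean)."""
    years = sorted(maps)
    valid = [y for y in years if maps[y] is not None]
    filled = {}
    for y in years:
        if maps[y] is not None:
            filled[y] = maps[y]
        elif valid:
            nearest = min(valid, key=lambda v: abs(v - y))
            filled[y] = maps[nearest].copy()   # would be flagged "interpolated"
        else:
            filled[y] = None                   # no data at all
    return filled

# Hypothetical usage: 1991 has no cloud-free map, so it is copied from the
# nearest mapped year (1990 here, since ties resolve to the earlier year).
maps = {1990: np.zeros((2, 2)), 1991: None, 1992: np.ones((2, 2))}
filled = fill_gap_years(maps)
```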
L275-276: There is no figure showing the administrative boundaries and labels. It is hard for readers unfamiliar with China to link your descriptive context to the spatial locations on the map.
L301: This contradicts the table. Instead, #1 and #4 show that using data from only one sensor achieved the best accuracy and the best results (when the models are trained and applied on the same sensor), compared with the other scenarios using multiple sensors.
L304-305: It is not clear how you conducted 'transfer learning'. For #8, did you train on Landsat 5 plus 20 % of the Landsat 8 samples and then apply the model to Landsat 8? This needs to be clarified. (See the sketch below.)
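A minimal sketch of that guessed reading of scenario #8, i.e., pretraining on Landsat 5 samples and then fine-tuning on a 20 % Landsat 8 subset before applying the model to Landsat 8; every name here is illustrative, and the manuscript does not document this procedure:

```python
import torch

def finetune(model, l8_subset_loader, epochs=10, lr=1e-4):
    """Fine-tune a pretrained network (e.g., trained on Landsat 5 samples)
    on a small subset of Landsat 8 samples. Hypothetical procedure."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    model.train()
    for _ in range(epochs):
        for x, y in l8_subset_loader:   # e.g., 20 % of Landsat 8 samples
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model
```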
L306: An enhanced accuracy compared to what?
Citation: https://doi.org/10.5194/essd-2024-516-RC1
Data sets
Long history paddy rice mapping across Northeast China with deep learning and annual result enhancement method Zihui Zhang, Lang Xia, Fen Zhao, Yue Gu, Jing Yang, Yan Zha, Shangrong Wu, and Peng Yang https://doi.org/10.6084/m9.figshare.27604839.v1
Model code and software
Paddy Lang Xia https://github.com/xialang2012/Paddy
Viewed
- HTML: 119
- PDF: 28
- XML: 4
- Total: 151
- Supplement: 24
- BibTeX: 2
- EndNote: 3