the Creative Commons Attribution 4.0 License.
GlobalRice20: A 20 m resolution global paddy rice dataset for 2015 and 2024 derived from multi-source remote sensing
Abstract. Accurate, high-resolution spatial data of paddy rice are indispensable for assessing global food security and tracking progress toward Sustainable Development Goal 2 (Zero Hunger). However, a consistent global rice map at medium-to-high resolution has been lacking due to the challenges of cloud contamination and the temporal irregularity of multi-source satellite archives. Here, we present GlobalRice20, the first global 20 m resolution paddy rice dataset for the years 2015 and 2024. We developed a "Time-Series-to-Vision" framework (T2VRCM) that transforms heterogeneous optical and SAR time-series into standardized 2D visual representations, specifically designed to handle irregular sampling and missing modalities. The dataset was produced using Sentinel-1/2 and Landsat imagery and rigorously validated against 164,000 reference samples, achieving an overall accuracy of 92.33 %. Cross-comparison with national agricultural statistics reveals a high coefficient of determination (R² = 0.91 for 2024), confirming the dataset's reliability for national-scale accounting. Spatiotemporal analysis during the first decade of SDGs (2015–2024) indicates a 6.6 % expansion in global rice area, with Africa exhibiting the most significant growth (15.7 %). This dataset fills a critical gap in global agricultural monitoring, providing a baseline for analyzing food production trends and climate impacts. The dataset is available at https://doi.org/10.5281/zenodo.18168302 (Zhang et al., 2026).
Status: open (extended)
- CC1: 'Comment on essd-2026-24', Ran Huang, 10 Mar 2026
- CC2: 'Comment on essd-2026-24', Dailiang Peng, 22 Mar 2026
The authors innovatively propose the T2VRCM model, which effectively resolves critical technical bottlenecks such as asynchronous sensor observations and missing data modalities. Based on this framework, they have successfully constructed the world's first 20-meter resolution global paddy rice dataset (GlobalRice20), demonstrating strong methodological innovation and practical application value. Overall, the experimental design of this study is comprehensive and rigorous, the figures are beautifully and intuitively crafted, and the public release of the related dataset provides invaluable data support for global food security assessment and the monitoring of UN Sustainable Development Goal 2 (SDG 2). To further enhance the overall quality and readability of the manuscript, I offer the following revision suggestions:
- The manuscript currently mentions the use of 164,000 global samples for the experiments, but the specific partitioning method for these samples is not clearly described. It is recommended to further clarify in the text exactly how these 164,000 global samples were divided into training and test sets.
- In the results analysis of Section 4.2, it is recommended to supplement the model's accuracy performance when utilizing only optical data (e.g., Sentinel-2 in 2024 or Landsat-8 in 2015). Furthermore, please quantitatively analyze the specific magnitude of accuracy improvement brought by the addition of Sentinel-1 radar data, thereby better highlighting the necessity of multi-source data fusion.
- To improve the readability of the tables, it is recommended to highlight the best-performing model results in bold in both Table 1 and Table 2.
- In Table 2, which presents the ablation study results, adding an extra column or using a clearer notation (e.g., adding "+X.XX%" in parentheses) to show the specific performance improvement relative to the baseline would help to more intuitively reflect the actual contributions of each module.
- In Figure 1, the font size within the legends appears slightly uncoordinated compared to the typography of the rest of the map. It is recommended to appropriately adjust and standardize the font sizes in these legends to further enhance the overall aesthetic appeal of the figure.
- There is a certain degree of content overlap between Figure 3 and Figure 4. It is recommended to appropriately simplify the model visualization of the T2VRCM component in Figure 3 to avoid excessive redundancy with Figure 4.
- The core concept "Time-Series-to-Vision" appears multiple times in the text. However, the capitalization forms (e.g., "Time-series-to-Vision" vs. "Time-Series-to-Vision") are currently used interchangeably and are not fully consistent. It is recommended to conduct a full-text search and unify them into a standard format.
- In Section 3.4, the metrics are denoted as UArice and PArice, while the subsequent Table 2 and Figure 6 use UA and PA. It is recommended to standardize the names of these relevant indicators throughout the manuscript to avoid any ambiguity.
- The capitalization of "KM" and "km" in the spatial distribution maps is inconsistent. It is recommended to unify their representation (preferably using the standard lowercase "km").
- In the in-text labels of Figure 8, the coefficient of determination is currently written as "R2". It is recommended to format this using the standard mathematical superscript as R² to comply with rigorous academic publishing standards.
Citation: https://doi.org/10.5194/essd-2026-24-CC2
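The first suggestion above asks how the 164,000 samples were divided. A minimal sketch of one way such a partition could be specified is given here, assuming a spatially blocked split in which all samples inside one grid cell go to the same subset. The manuscript's actual protocol is not described; the function name `spatial_block_split`, the 1° block size, and the 20 % test fraction are illustrative assumptions, not values from the paper.

```python
import math
import random


def spatial_block_split(points, block_deg=1.0, test_fraction=0.2, seed=0):
    """Assign whole lon/lat grid blocks to either the training or the test set.

    points: list of (lon, lat) tuples in decimal degrees.
    Returns (is_test, block_of): a list of booleans and the block key of each point.
    """
    # Block key of each sample: integer grid cell that contains it.
    block_of = [
        (math.floor(lon / block_deg), math.floor(lat / block_deg))
        for lon, lat in points
    ]
    # Sort the unique blocks so the random draw is reproducible for a given seed.
    blocks = sorted(set(block_of))
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(blocks)))
    test_blocks = set(rng.sample(blocks, n_test))
    # A sample is a test sample if and only if its block was drawn for testing.
    is_test = [b in test_blocks for b in block_of]
    return is_test, block_of


if __name__ == "__main__":
    # Hypothetical sample locations, for illustration only.
    demo_rng = random.Random(1)
    demo = [(demo_rng.uniform(100, 110), demo_rng.uniform(10, 20)) for _ in range(1000)]
    flags, _ = spatial_block_split(demo)
    print(sum(flags), "test samples out of", len(flags))
```

Reporting the block size, the seed, and the resulting per-subset sample counts would let readers reproduce such a split.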
- RC1: 'Comment on essd-2026-24', Anonymous Referee #1, 19 Apr 2026
This paper presents GlobalRice20, claimed to be the first global 20 m resolution paddy rice dataset covering 98 countries for 2015 and 2024. The core methodological contribution is T2VRCM, a framework that converts irregular, multi-source remote sensing time series (Sentinel-1 VV/VH, and optical EVI/LSWI from Sentinel-2 or Landsat-8) into 2D line-graph images, which are then classified by a custom CNN with gated convolution blocks and a full-stage feature interaction module. The model achieves 92.33% overall accuracy on a 164,000-sample test set and shows R² = 0.91 agreement with national statistics for 2024. Decadal analysis reveals a 6.6% global rice area increase, with Africa showing the strongest growth at 15.7%.
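To make the conversion of a time series into a 2D image concrete, the following is a minimal sketch, not the authors' implementation. It assumes a fixed value range per band, a 224-pixel canvas (the size quoted in point 7 below), and it plots isolated points rather than connected, colored lines. The helper names `value_to_row`, `row_to_value`, and `rasterize` are hypothetical. Missing observations are simply left blank, and the round trip through pixel rows shows the quantization step that the encoding implies.

```python
def value_to_row(value, vmin, vmax, height=224):
    """Map a value to a pixel row (row 0 is the top), clamped to the canvas."""
    frac = (value - vmin) / (vmax - vmin)
    frac = min(1.0, max(0.0, frac))
    return round((1.0 - frac) * (height - 1))


def row_to_value(row, vmin, vmax, height=224):
    """Inverse mapping: the value represented by the centre of a pixel row."""
    frac = 1.0 - row / (height - 1)
    return vmin + frac * (vmax - vmin)


def rasterize(series, vmin, vmax, height=224, width=224):
    """Draw one series as isolated points in a binary image.

    series: list of floats, with None marking a missing observation.
    Missing observations leave their column empty instead of being filled.
    """
    image = [[0] * width for _ in range(height)]
    n = len(series)
    for i, value in enumerate(series):
        if value is None:
            continue
        # Spread the time steps evenly across the image width.
        col = round(i * (width - 1) / (n - 1)) if n > 1 else 0
        image[value_to_row(value, vmin, vmax, height)][col] = 1
    return image


if __name__ == "__main__":
    # Assumed display range for SAR backscatter in dB (illustrative only).
    vmin, vmax = -25.0, 0.0
    step = (vmax - vmin) / (224 - 1)
    value = -17.3
    recovered = row_to_value(value_to_row(value, vmin, vmax), vmin, vmax)
    print("quantization step:", step)
    print("round-trip error:", abs(recovered - value))
```

Under these assumptions the round-trip error is bounded by half the quantization step, so the precision lost depends directly on the value range mapped onto the vertical axis.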
The topic is timely and the ambition to produce a global 20 m rice map is commendable. Overall, this dataset would be a valuable resource to the community. However, before this paper can be accepted, I have the following concerns for authors to address.
1. The “stable sample” strategy introduces selection bias. Validation samples are restricted to locations that maintained consistent land cover across both 2015 and 2024. This systematically excludes the most challenging pixels: areas where land use changed, where rice fields were abandoned or newly established, or where confusion with other crops is most likely. The reported 92.33% OA therefore likely overestimates real-world performance. Accuracy on dynamic areas, which are precisely where the dataset would be most useful for SDG 2 monitoring, remains unknown. At minimum, additional validation on independently sampled points (including changed areas) is needed.
2. The 2015 optical data come from Landsat-8 at 30 m, and Sentinel-1 GRD has a pixel spacing of 10 m but an effective spatial resolution closer to 20 m in range. Resampling 30 m Landsat pixels to 20 m does not create information at that finer scale. The paper should clearly state that the 2015 product has a coarser effective resolution than the 2024 product and discuss the implications for change detection between the two years. Calling both “20 m resolution” without this caveat is misleading.
3. The comparison with native time-series models is unfair. For RF, LSTM, and Transformer baselines, missing temporal phases were filled by linear interpolation and completely absent modalities were zero-filled. Zero-filling is a known poor strategy since it conflates “missing data” with “zero signal”, which is especially harmful for SAR backscatter where zero is not a physically meaningful value. More appropriate strategies exist (e.g., learnable masking tokens, attention masks, multi-task imputation). The paper should either apply state-of-the-art missing-data handling for the time-series baselines or acknowledge this limitation explicitly. Without this, the claim that the time-series-to-vision strategy is inherently superior is not well supported.
4. Train/test splitting protocol is insufficiently described. The paper states 164,000 samples across 98 countries but does not specify: (1) the ratio of training to testing, (2) whether the split is spatially stratified to prevent spatial autocorrelation leakage, and (3) whether the same samples are used for both 2015 and 2024 evaluation (which they appear to be, given the "stable sample" constraint). If nearby pixels appear in both training and test sets, accuracy will be inflated. Readers who wish to reuse or benchmark against this dataset need these details.
5. Country-level R² conflates spatial accuracy with area estimation. A map with spatially compensating errors (overestimation in one region, underestimation in another) can still achieve a high R² against national statistics. This metric does not validate spatial accuracy at the pixel or field level. The paper should acknowledge this limitation. Subnational (e.g., province-level) comparisons, at least for countries where such statistics are available (China, India, USA), would strengthen the validation considerably.
6. The paper jumps from Results (Section 4) directly to Conclusion (Section 5). A dedicated Discussion section is expected in ESSD to address: limitations of the method and dataset, potential error sources (e.g., confusion with other flooded crops, aquaculture, seasonal wetlands), sensitivity to SAR coverage gaps, and the impact of using different optical sensors across the two years. The absence of any honest discussion of failure cases or systematic errors is a significant gap.
7. The time-series-to-vision transformation discards numerical precision. Encoding floating-point time-series values as colored lines on a 224×224 pixel image introduces quantization error. The model must learn to "read" a graph visually rather than operating on the underlying numbers. While the results suggest this works in practice, the paper should acknowledge and discuss this trade-off. A simple experiment comparing the T2VRCM backbone fed with raw numerical input (appropriately padded/masked) versus the image-based input would clarify how much accuracy is gained or lost by the visual encoding itself.
8. Reference sample construction lacks detail. The phrase “rigorous visual interpretation process integrating high-resolution optical imagery and existing regional rice maps” is vague. How many interpreters were involved? What was the interpretation protocol? Was inter-annotator agreement measured? Which high-resolution imagery was used, and at what date? These details are important for assessing label quality.
9. Cropping system not distinguished in the output. The paper discusses single-season, double-season, and mixed-season rice systems (Section 2.1) but the final product is a binary rice/non-rice map. For a 20 m product aimed at SDG 2 monitoring, distinguishing cropping intensity would add substantial value. This should at least be discussed as a limitation or future direction.
10. Computational cost is not reported. Global-scale inference at 20 m resolution is a massive undertaking. The paper does not report total processing time, number of GEE computation hours, or the cost of running T2VRCM inference globally. This information is relevant for reproducibility and for other groups wishing to update or extend the dataset.
11. SAR data compositing at 12-day intervals is stated but not justified. Why 12 days? Sentinel-1A has a 12-day repeat cycle, but in regions with ascending and descending passes, the effective revisit is 6 days. Was orbit direction handled? Were ascending and descending passes composited together or separately? Mixing orbit directions without accounting for viewing geometry differences can introduce noise in backscatter values.
12. The ablation study is limited in scope. The ablation in Table 2 tests InceptionDWC and FSFI independently but does not ablate the time-series-to-vision transformation itself. A comparison of the T2VRCM backbone operating on raw time-series input (maybe with proper missing-data handling) versus image input would isolate the contribution of the visual encoding strategy from the contribution of the network architecture.
13. Some figures are somewhat blurry. I recommend that the authors use vector graphics instead of raster graphics.
14. Duplicate reference. Ni et al. (2021) appears twice in the reference list (lines 581–586), with identical content.
Citation: https://doi.org/10.5194/essd-2026-24-RC1
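Point 5 above can be illustrated with a small numerical sketch. All areas below are made-up values for three hypothetical countries with two regions each, and R² is assumed here to be the squared Pearson correlation; the manuscript may define it differently. Regional over- and underestimates are chosen to cancel within each country, so the national comparison looks perfect while the regional one does not.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical (reference area, mapped area) per region, arbitrary units.
countries = {
    "A": [(100, 140), (100, 60)],   # +40 and -40 cancel at national level
    "B": [(300, 360), (200, 140)],  # +60 and -60 cancel
    "C": [(50, 80), (50, 20)],      # +30 and -30 cancel
}

# National totals: regional errors compensate exactly.
national_ref = [sum(ref for ref, _ in regions) for regions in countries.values()]
national_map = [sum(mapped for _, mapped in regions) for regions in countries.values()]

# Regional values: the same errors are now visible.
regional_ref = [ref for regions in countries.values() for ref, _ in regions]
regional_map = [mapped for regions in countries.values() for _, mapped in regions]

r2_national = r_squared(national_ref, national_map)
r2_regional = r_squared(regional_ref, regional_map)

if __name__ == "__main__":
    print("national R^2:", r2_national)
    print("regional R^2:", r2_regional)
```

The gap between the two values is the reason a subnational comparison is more informative than a country-level one.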
Data sets
GlobalRice20: A 20 m resolution global paddy rice dataset for 2015 and 2024 derived from multi-source remote sensing Hong Zhang et al. https://doi.org/10.5281/zenodo.18168302
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 547 | 289 | 51 | 887 | 52 | 63 |
This study developed the GlobalRice20 dataset at 20 m resolution for 2015 and 2024, constructed a Time-Series-to-Vision framework (T2VRCM) to address cloud contamination and irregular time series in satellite data, and produced a high-accuracy global paddy rice distribution product. The work is important and practically significant for global food security assessment and agricultural monitoring. The detailed comments are as follows.