the Creative Commons Attribution 4.0 License.
CropLayer: A high-accuracy 2-meter resolution cropland mapping dataset for China in 2020 derived from Mapbox and Google satellite imagery using data-driven approaches
Abstract. Accurate and detailed cropland maps are essential for agricultural planning, resource management, and food security, particularly in countries like China, where agricultural productivity is high but resources are limited. Despite the availability of several medium-to-high-resolution satellite-based cropland maps, significant discrepancies in area estimates and spatial distribution persist, limiting their utility. This study proposes a data-driven framework for cropland mapping that leverages 2 m high-resolution (HR) imagery from Mapbox and Google. The framework consists of three main stages. First, national imagery is partitioned into 0.05°×0.05° blocks for efficient parallel computation, and an Image Quality Assessment (IQA) using ResNet models is performed on both sources to address the challenge of missing image acquisition metadata. Second, a robust cropland identification model integrates Mask2Former for precise segmentation and XGBoost for error evaluation, facilitating iterative improvements through active learning. Finally, a novel integration strategy combines four feature groups (Geography, IQA, Region Property, and Consistency) using XGBoost to merge the Mapbox-derived and Google-derived results into a unified cropland layer, named CropLayer. The CropLayer dataset achieves an overall mapping accuracy of 88.73 %, with 30 out of 32 provincial units reporting area estimates within ±10 % of official statistics. In contrast, only 1 to 9 provinces from seven other existing datasets meet the same accuracy standard. The results highlight CropLayer's potential for applications such as crop yield estimation and agricultural structure analysis, offering a reliable tool for addressing agricultural and food security challenges.
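The first stage described in the abstract, partitioning imagery into 0.05°×0.05° blocks, can be illustrated with a minimal sketch. The function name `iter_blocks`, the alignment of blocks to a global grid anchored at 0° longitude and latitude, and the example bounding box are all assumptions made for illustration; the abstract does not specify how the authors index or align their blocks.

```python
import math
from typing import Iterator, Tuple

BLOCK_DEG = 0.05  # block size stated in the abstract

Block = Tuple[float, float, float, float]  # (west, south, east, north)


def iter_blocks(west: float, south: float, east: float, north: float,
                size: float = BLOCK_DEG) -> Iterator[Block]:
    """Yield every grid-aligned block that intersects the bounding box.

    Assumption: blocks are aligned to a global grid anchored at (0, 0).
    """
    # Work in integer block indices so float error does not accumulate.
    # The inner round() guards against results such as 2259.9999999999995.
    col0 = math.floor(round(west / size, 9))
    col1 = math.ceil(round(east / size, 9))
    row0 = math.floor(round(south / size, 9))
    row1 = math.ceil(round(north / size, 9))
    for row in range(row0, row1):
        for col in range(col0, col1):
            yield (round(col * size, 6), round(row * size, 6),
                   round((col + 1) * size, 6), round((row + 1) * size, 6))


# Hypothetical 0.1° x 0.1° area of interest, split into independent work units.
blocks = list(iter_blocks(113.0, 23.0, 113.1, 23.1))
```

Because each block is an independent tuple of coordinates, the resulting list could be handed to any parallel worker pool, which is the motivation the abstract gives for the partitioning.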
Status: open (extended)
RC1: 'Comment on essd-2025-44', Anonymous Referee #1, 05 Aug 2025
This manuscript presents a valuable contribution to high-resolution cropland mapping in China through the development of the CropLayer dataset, leveraging data-driven approaches with Mapbox and Google satellite imagery. The integration of deep learning models and active learning strategies to address limitations in existing datasets is methodologically sound. The comprehensive validation against seven existing datasets and official statistics strengthens the credibility of the findings. However, several scientific issues require clarification to enhance the robustness and reproducibility of the work.
1. The image quality assessment (IQA) using ResNet for cover type classification is innovative, but a comparative analysis against other state-of-the-art IQA models would strengthen the justification for this choice.
2. The active learning framework for sample selection mentions "stopping criteria" based on the absence of significant artifacts or underestimation errors, but the quantitative thresholds for termination are not clear. For example, what objective metrics guided the decision to stop sampling?
3. The integration strategy using XGBoost to fuse Mapbox and Google results relies on four feature groups (geographic, IQA, regional attributes, consistency). However, the relative importance of each feature group in improving integration accuracy is not analyzed. A permutation importance analysis would clarify which features drive the model’s decisions.
4. The comparison with seven existing datasets shows that CropLayer outperforms the others in provincial area estimation, but the reasons for discrepancies in specific regions are not fully explored. Could topographic complexity or cropland fragmentation explain these biases?
5. The Mask2Former model is selected for cropland segmentation because it achieves the highest IoU (88.73%), but the computational efficiency trade-offs (e.g., a training time of 11h56m vs. 5h41m for Segformer) are not discussed. For large-scale applications, model speed and resource requirements are critical.
6. The limitation regarding the "inability to capture temporal dynamics" (reliance on 2020 data) is noted, but no feasible path for multi-temporal extension is proposed. For instance, could seasonal imagery from Mapbox/Google (e.g., 2021-2024) be integrated using the same framework?
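To make comment 3 concrete, the following is a minimal sketch of a grouped permutation importance, in which all columns of one feature group are shuffled together and the resulting drop in accuracy is recorded. It is an illustration only and not the authors' or the referee's code: the function names, the toy model, the toy data, and the assignment of columns to the "consistency" and "geography" groups are all hypothetical, and a plain Python callable stands in for the XGBoost model.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def group_permutation_importance(predict: Callable[[Row], int],
                                 X: List[Row], y: List[int],
                                 groups: Dict[str, Sequence[int]],
                                 n_repeats: int = 10,
                                 seed: int = 0) -> Dict[str, float]:
    """Mean accuracy drop when each group's columns are shuffled jointly."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            # Copy the rows, then fill the group's columns from shuffled rows
            # so that correlations inside the group are preserved.
            X_perm = [row[:] for row in X]
            for i, j in enumerate(order):
                for c in cols:
                    X_perm[i][c] = X[j][c]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances[name] = sum(drops) / n_repeats
    return importances


# Toy setup (hypothetical): the stand-in model only looks at column 0.
def toy_predict(row: Row) -> int:
    return int(row[0] > 0.5)


data_rng = random.Random(1)
X = [[i / 199, data_rng.random()] for i in range(200)]
y = [int(row[0] > 0.5) for row in X]
scores = group_permutation_importance(
    toy_predict, X, y, {"consistency": [0], "geography": [1]})
```

In this toy setup the group the model ignores receives an importance of zero, while the group it depends on shows a large accuracy drop. Shuffling a whole group at once, rather than one column at a time, is one way to answer the question at the level of the four feature groups named in the comment.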
Citation: https://doi.org/10.5194/essd-2025-44-RC1
Data sets
CropLayer: 2-meter resolution cropland mapping dataset for China in 2020 Hao Jiang, Xia Zhou, and Mengjun Ku https://doi.org/10.5281/zenodo.14726428
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
758 | 165 | 22 | 945 | 24 | 34 |