This work is distributed under the Creative Commons Attribution 4.0 License.
CropLayer: A high-accuracy 2-meter resolution cropland mapping dataset for China in 2020 derived from Mapbox and Google satellite imagery using data-driven approaches
Abstract. Accurate and detailed cropland maps are essential for agricultural planning, resource management, and food security, particularly in countries like China, where agricultural productivity is high but resources are limited. Despite the availability of several medium-to-high-resolution satellite-based cropland maps, significant discrepancies in area estimates and spatial distribution persist, limiting their utility. This study proposes a data-driven framework for cropland mapping that leverages 2 m High Resolution (HR) imagery from Mapbox and Google. The framework consists of three main stages: First, national imagery is partitioned into 0.05°×0.05° blocks for efficient parallel computation. An Image Quality Assessment (IQA) using ResNet models is performed on both sources to address the challenge of missing image acquisition metadata. Second, a robust cropland identification model integrates Mask2Former for precise segmentation and XGBoost for error evaluation, facilitating iterative improvements through active learning. Finally, a novel integration strategy combines four feature groups—Geography, IQA, Region Property, and Consistency—using XGBoost to merge the datasets into a unified cropland layer, named CropLayer. The CropLayer dataset achieves an overall mapping accuracy of 88.73 %, with 30 out of 32 provincial units reporting area estimates within ±10 % of official statistics. In contrast, only 1 to 9 provinces from seven other existing datasets meet the same accuracy standard. The results highlight CropLayer's potential for applications such as crop yield estimation and agricultural structure analysis, offering a reliable tool for addressing agricultural and food security challenges.
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-44', Anonymous Referee #1, 05 Aug 2025
- AC1: 'Reply on RC1', Hao Jiang, 11 Oct 2025
Response to Review Comments
Ref.: essd-2025-44
Title: CropLayer: A high-accuracy 2-meter resolution cropland mapping dataset for China in 2020 derived from Mapbox and Google satellite imagery using data-driven approaches
Reviewer #1: This manuscript presents a valuable contribution to high-resolution cropland mapping in China through the development of the CropLayer dataset, leveraging data-driven approaches with Mapbox and Google satellite imagery. The integration of deep learning models and active learning strategies to address limitations in existing datasets is methodologically sound. The comprehensive validation against seven existing datasets and official statistics strengthens the credibility of the findings. However, several scientific issues require clarification to enhance the robustness and reproducibility of the work.
Review Comments:
- The image quality assessment (IQA) using ResNet for cover type classification is innovative; a comparative analysis of model performance against other state-of-the-art models for IQA would strengthen this choice.
Response:
Thank you for this valuable suggestion. We adopted ResNet for cover type classification primarily because the task involves only five visually distinctive categories (Planting, Non-Planting, Cloudy, Snow/Ice, and Nodata). While simple thresholding in HSL or Lab color space could distinguish most categories, the Cloudy and Snow/Ice types required additional texture information, for which ResNet provides a robust solution. The model achieved a top-1 accuracy of 95.6%, indicating sufficient performance for this specific IQA task.
We note that more advanced models (e.g., CLIP or Transformer-based frameworks) could potentially offer better semantic understanding and reduce certain residual errors, such as false edges caused by stitching multiple images within a single block. However, such models are typically advantageous in tasks involving hundreds of ambiguous categories and large-scale annotated datasets, which is beyond the scope of this study.
Importantly, our IQA is used to quantify image quality and identify lower-quality blocks for analysis, rather than to “correct” or replace low-quality data. As discussed in Section 5.1, the complementary distribution of low-quality images across Mapbox and Google imagery enables multi-source integration to improve overall mapping accuracy, even when some low-quality images still contribute valuable cropland information.
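For illustration, a minimal sketch of how such a five-class cover-type classifier could be set up on a ResNet backbone (PyTorch/torchvision); the backbone variant, input size, and function names below are illustrative assumptions rather than the authors' implementation:

```python
# Minimal sketch of a five-class cover-type classifier for block-level IQA.
# Class names follow the response; the backbone choice, input size, and
# usage shown here are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
from torchvision import models

COVER_TYPES = ["Planting", "Non-Planting", "Cloudy", "Snow/Ice", "Nodata"]

def build_cover_classifier(num_classes: int = len(COVER_TYPES)) -> nn.Module:
    # Start from an ImageNet-pretrained ResNet-50 and replace the final
    # fully connected layer with a 5-way classification head.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_cover_classifier()
dummy_block = torch.randn(1, 3, 224, 224)   # one RGB image tile
logits = model(dummy_block)
print(COVER_TYPES[logits.argmax(dim=1).item()])
```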
- The active learning framework for sample selection mentions "stopping criteria" based on the absence of significant artifacts or underestimation errors, but the quantitative thresholds for termination are not clear; for example, what objective metrics guided the decision to stop sampling?
Response:
Thank you for this comment. The active learning (AL) framework employed explicit multi-level stopping criteria to determine when sufficient sampling had been achieved. Specifically:
1) Pixel-level: IoU of predicted cropland vs. validation samples must exceed 85%.
2) Block-level: Semantic correctness, evaluated using an XGBoost classifier on validation blocks, must exceed 85%.
3) Provincial-level: Extracted cropland area must reach at least 80% of official provincial statistics.
If any province failed to meet these criteria, additional sampling and retraining were performed iteratively (up to 3 rounds). Provinces where performance could not be improved due to persistent low-quality imagery were excluded. This three-level thresholding ensures that sampling is sufficient for both local accuracy and regional representativeness. We have clarified this in Section 3.2.1 of the revised manuscript.
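A compact sketch of how the three stopping thresholds above could be checked per province and round (threshold values are taken from the response; the data structure and function names are hypothetical):

```python
# Sketch of the three-level stopping check described above.
# Threshold values come from the response; the data structure and
# function names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ProvinceMetrics:
    pixel_iou: float          # IoU against validation samples (0-1)
    block_correctness: float  # XGBoost semantic correctness on validation blocks (0-1)
    area_ratio: float         # extracted area / official provincial statistic

def sampling_sufficient(m: ProvinceMetrics) -> bool:
    """True if all three stopping criteria are met for one province."""
    return (m.pixel_iou > 0.85
            and m.block_correctness > 0.85
            and m.area_ratio >= 0.80)

def needs_more_sampling(provinces: dict[str, ProvinceMetrics],
                        round_idx: int, max_rounds: int = 3) -> list[str]:
    """Provinces to re-sample in the next active-learning round, if any."""
    if round_idx >= max_rounds:
        return []
    return [name for name, m in provinces.items() if not sampling_sufficient(m)]
```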
- The integration strategy using XGBoost to fuse Mapbox and Google results relies on four feature groups (geographic, IQA, regional attributes, consistency). However, the relative importance of each feature group in improving integration accuracy is not analyzed. A permutation importance analysis would clarify which features drive the model’s decisions.
Response:
Thank you for the insightful comment. We conducted a permutation feature importance analysis for the two XGBoost models used in our workflow:
1) Semantic Correctness: 16 features, comprising 4 DEM features and 12 others.
2) Results Integration: 28 features, comprising the 4 shared DEM features plus 24 others (12 each for Mapbox and Google).
The analysis quantifies the relative contributions of different feature groups, including geographic, regional, topographic, image quality assessment (IQA), and consistency features. The results demonstrate that:
1) Area and AF (Area Fraction) are the most influential features;
2) Regional properties such as Solidity and EN (Euler Number) are critical for identifying correct semantic predictions;
3) Topographic factors such as Slope and Ruggedness play key roles in the integration process;
4) IQA-related features (e.g., Sharpness, Hue, Lightness) show moderate importance in the Results Integration stage, suggesting that imagery quality significantly influences the final decision.
These findings have been incorporated into the revised manuscript (Sections 3.3, 4.3, and 5.2) and summarized in a new figure (Figure 10).
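For reference, a minimal sketch of a permutation importance computation for an XGBoost classifier such as the Results Integration model; synthetic data stands in for the real block-level features, the feature count follows the response, and everything else is an assumption:

```python
# Sketch of permutation feature importance for an XGBoost classifier,
# as used to rank integration features. Synthetic data is a placeholder;
# only the workflow is illustrated.
import numpy as np
from xgboost import XGBClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 28))            # 28 block-level features, as in Results Integration
y = (X[:, 0] + 0.5 * X[:, 5] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
clf.fit(X_tr, y_tr)

# Shuffle each feature in turn and measure the drop in test accuracy.
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking[:5]:
    print(f"feature {idx}: {result.importances_mean[idx]:.4f}")
```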
- The comparison with seven existing datasets shows that CropLayer outperforms others in provincial area estimation, but the reasons for discrepancies in specific regions are not fully explored. Could topographic complexity or cropland fragmentation explain these biases?
Response:
Thank you for the comment. We further analyzed the discrepancies between CropLayer and the eight existing datasets using two complementary metrics: Area Fraction (AF) and Edge Density (ED). By constructing two-dimensional histograms of these differences, we identified the regions with the largest deviations, which are predominantly located in southern China. These areas correspond to blocks with median slopes of 10-25°, indicating complex topography and highly fragmented cropland patterns. The analysis confirms that regional biases are largely driven by terrain complexity and small, irregular fields, which are challenging to capture even with high-resolution imagery. These results and the accompanying discussion have been added to Section 4.4 and visualized in Figures 12 and 13 of the revised manuscript.
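As a rough sketch, the two block-level metrics, Area Fraction (AF) and Edge Density (ED), could be computed from a binary cropland mask per block as follows; the exact definitions in the manuscript may differ, and this is only an illustration:

```python
# Sketch of block-level Area Fraction (AF) and Edge Density (ED) for a
# binary cropland mask. The exact definitions used in the manuscript may
# differ; this only illustrates the idea behind the two metrics.
import numpy as np

def area_fraction(mask: np.ndarray) -> float:
    """Share of cropland pixels within the block."""
    return float(mask.mean())

def edge_density(mask: np.ndarray, pixel_size_m: float = 2.0) -> float:
    """Cropland edge length per unit block area (m per m^2)."""
    # Count transitions between cropland and non-cropland along rows and columns.
    m = mask.astype(np.int8)
    horiz = np.abs(np.diff(m, axis=1)).sum()
    vert = np.abs(np.diff(m, axis=0)).sum()
    edge_length = (horiz + vert) * pixel_size_m
    block_area = mask.size * pixel_size_m ** 2
    return edge_length / block_area

block = (np.random.default_rng(1).random((512, 512)) > 0.6).astype(np.uint8)
print(area_fraction(block), edge_density(block))
```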
- The Mask2Former model is selected for cropland segmentation based on its highest IoU (88.73%), but the computational efficiency trade-offs (e.g., training time: 11h56m vs. 5h41m (Segformer)) are not discussed. For large-scale applications, model speed and resource requirements are critical.
Response:
Thank you for raising this point. Although Mask2Former requires longer training time (11h56m vs. 5h41m for Segformer), the inference speed is comparable for nationwide predictions, requiring approximately one week for both models. We prioritized accuracy: Mask2Former achieved 88.73% IoU versus 85.10% for Segformer, a substantial difference at the national scale, especially in smallholder and mountainous regions where Segformer consistently underestimates cropland. Therefore, the slightly longer training time was considered acceptable in favor of higher overall mapping accuracy. This clarification has been added to Section 4.2.2.
- The limitation regarding "inability to capture temporal dynamics" (reliance on 2020 data) is noted, but no feasible path for multi-temporal extension is proposed. For instance, could seasonal imagery from Mapbox/Google (e.g., 2021-2024) be integrated using the same framework?
Response:
Thank you for the comment. The main limitation is that Mapbox and Google imagery lack acquisition metadata and have uneven update frequencies, making nationwide temporal mapping challenging. Urban areas are updated relatively frequently, whereas rural regions may remain outdated for over five years. Nevertheless, once a reliable baseline (2020) map is established, future updates can be implemented incrementally. Specifically, future work could:
1) Apply IQA to newly available imagery (e.g., 2025);
2) Use the existing segmentation and XGBoost integration pipeline to update cropland maps;
3) Incorporate newly released high-resolution datasets such as JLS-5M and ESA PhiSat-2 to improve coverage and accuracy.
This discussion has been added to Section 5.4.
Citation: https://doi.org/10.5194/essd-2025-44-AC1
- RC2: 'Comment on essd-2025-44', Anonymous Referee #2, 02 Sep 2025
Overall, the study presents a new 2m resolution cropland dataset (CropLayer) which is a valuable contribution given the fine spatial resolution. However, several major concerns should be addressed regarding the definition of cropland, sampling design, methodological innovation, validation approach, and substantive discussion of advantages brought by the high resolution.
Specific Comments:
- The author mentions significant discrepancies among existing cropland datasets and between datasets and statistical data but overlooks the fact that differences can arise from both varying definitions of "cropland" and classification errors. These two aspects should not be conflated. When developing the 2m CropLayer, the paper should explicitly state which definition of cropland is adopted (e.g., FAO, Ministry of Natural Resources, or the GEOGLAM definition). Furthermore, the comparability between the area calculated from CropLayer and statistical area is questionable if their definitions are inconsistent.
- The reliability of visual interpretation for identifying non-planting coverage (e.g., during off-season in mixed cropland-grassland areas) is concerning. It is necessary to clarify how this challenge was addressed to ensure accuracy.
- The design of the sample selection process is not documented. The spatial distribution of samples appears uneven and potentially unrepresentative, with many cropland samples concentrated around a few major cities. A clear sampling framework (e.g., stratified random sampling) should be described to ensure sample representativeness.
- It is claimed that "independent sample interpretation was conducted." Please explain the specific procedures implemented to guarantee the independence of these samples (e.g., separation of training and validation sets, interpreter blinding protocols, etc).
- Avoid duplication: Lines 223–226 contain redundant information that should be streamlined.
- When introducing the seven existing datasets, it would be reasonable to also document their respective definitions of cropland and discuss the similarities and differences among them. This context is crucial for understanding the discrepancies mentioned.
- The use of provincial statistical area ratio (>80%) as a conditional check during the extraction process, and subsequent comparison with statistical data for validation, introduces circularity. Since the output is conditioned on the statistics, the resulting high correlation is expected and does not constitute independent validation. Validation against statistical data is therefore of limited reference value.
- The authors appear to apply the existing Mask2Former model directly for cropland extraction without significant modifications or improvements. Please clarify the specific innovation(s) of this study compared to simply applying an existing model.
- Given the 2m resolution is sufficient for extracting field boundaries, why did the authors not employ recent parcel-based boundary extraction models to create a vector-based cropland parcel dataset instead of a raster thematic map? A vector dataset at the field level would be far more valuable for applications like crop classification and field-level yield prediction.
- The meaning of "Areas where neither imagery source was utilized" is unclear. Why were large parts of western China not covered by either imagery source? How were the True Negative (TN) areas generated in these regions?
- The Discussion section is relatively weak. The paper should elaborate more on the advantages and quantitative improvements enabled by the 2m resolution in different regions (e.g., fragmented landscapes, smallholder fields), rather than simply stating that 2m is finer than 10-30m. A more substantive analysis of where and how much the resolution enhances accuracy would strengthen the paper.
Citation: https://doi.org/10.5194/essd-2025-44-RC2
- AC2: 'Reply on RC2', Hao Jiang, 12 Oct 2025
Response to Review Comments
Ref.: essd-2025-44
Title: CropLayer: A high-accuracy 2-meter resolution cropland mapping dataset for China in 2020 derived from Mapbox and Google satellite imagery using data-driven approaches
Reviewer #2: Overall, the study presents a new 2m resolution cropland dataset (CropLayer) which is a valuable contribution given the fine spatial resolution. However, several major concerns should be addressed regarding the definition of cropland, sampling design, methodological innovation, validation approach, and substantive discussion of advantages brought by the high resolution.
Review Comments:
- The author mentions significant discrepancies among existing cropland datasets and between datasets and statistical data but overlooks the fact that differences can arise from both varying definitions of "cropland" and classification errors. These two aspects should not be conflated. When developing the 2m CropLayer, the paper should explicitly state which definition of cropland is adopted (e.g., FAO, Ministry of Natural Resources, or the GEOGLAM definition). Furthermore, the comparability between the area calculated from CropLayer and statistical area is questionable if their definitions are inconsistent.
Response:
Thank you for this insightful comment. We have clarified the definition of cropland used in CropLayer to ensure conceptual consistency with the official statistical data. In this study, CropLayer strictly follows the criteria established in China’s Third National Land Survey (TNLS). According to the TNLS, cropland includes cultivated land used for growing crops such as paddy fields, irrigated land (including greenhouses used for planting), and dry land, as well as land used for temporary crops including medicinal plants, grass, flowers, and trees. It also encompasses newly developed or reclaimed land, fallow land, and areas dominated by crop cultivation, even when interspersed with occasional fruit trees or other vegetation. However, it explicitly excludes orchards, which are separately classified in the TNLS as Plantation land. This definition ensures full conceptual consistency between CropLayer and national statistical data.
In addition to definitional differences, the eight existing publicly available cropland datasets for China exhibit substantial inaccuracies when compared with official statistics. In some provinces, reported cropland areas fall below 50% or exceed 200% of the TNLS values, reflecting severe limitations in coverage, methodology, and data quality. These pronounced discrepancies indicate that the challenges are not solely due to varying definitions of cropland, but also stem from inherent errors and limitations in the input data and extraction methods. By adhering to TNLS standards and leveraging high-resolution imagery, CropLayer addresses these deficiencies, providing a more accurate and nationally consistent representation of cropland.
This clarification has been added to Section 2.2.6 (Cropland definition and statistical area) of the revised manuscript.
- The reliability of visual interpretation for identifying non-planting coverage (e.g., during off-season in mixed cropland-grassland areas) is concerning. It is necessary to clarify how this challenge was addressed to ensure accuracy.
Response:
Thank you for the comment. We acknowledge that the lack of acquisition metadata in Mapbox and Google imagery complicates distinguishing planting from non-planting coverage. To address this, a ResNet-based classifier was used to identify coverage types (Planting, Non-Planting, Cloudy, Snow/Ice, Nodata), rather than relying solely on manual interpretation. High-resolution off-season imagery often provides additional texture cues, helping to delineate field boundaries even in fallow conditions. Furthermore, coverage type is only one of 14 features used in our XGBoost-based integration of Mapbox and Google results, mitigating the impact of seasonal uncertainty on cropland identification.
- The design of the sample selection process is not documented. The spatial distribution of samples appears uneven and potentially unrepresentative, with many cropland samples concentrated around a few major cities. A clear sampling framework (e.g., stratified random sampling) should be described to ensure sample representativeness.
Response:
Thank you for the comments. We adopted an active learning strategy for sample selection, which is widely used in semantic segmentation to reduce the high cost of manual labeling. Unlike stratified random sampling, which ensures representativeness only in geographic space, active learning prioritizes regions of high model uncertainty in feature space, thereby improving decision boundaries with fewer samples.
This approach is widely recognized: Settles (2009) provides a comprehensive overview of AL as a core sampling efficiency technique, and more recent studies (e.g., Safonova et al. 2023) demonstrate that AL can reduce annotation costs by 60-99% while maintaining comparable accuracy. Compared with heuristic “trial-and-error” selection or reinforcement learning-based dynamic sampling, AL is particularly suitable for high annotation-cost, human-in-the-loop tasks such as large-scale cropland mapping from high-resolution imagery.
The apparent concentration of samples around large cities (e.g., Beijing, Shanghai, Guangzhou) reflects their complex land-use patterns, where cropland is easily confused with features such as golf courses, parks, stadiums or urban vacant land. These areas generated many early misclassifications, so additional samples were added iteratively.
In addition, Guangdong Province (including Guangzhou) contains relatively more samples because it was the first region we completed before expanding to other provinces. Under active learning, the starting area typically requires more samples, as the model rapidly learns cropland patterns during initial iterations. Many generic field patterns captured in Guangdong did not need to be re-collected in other provinces, which makes the overall distribution appear less balanced, though it remained effective for model training.
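For concreteness, a minimal sketch of uncertainty-based block selection, the core idea behind the active-learning sampling described above (entropy of per-pixel cropland probabilities); all names and values are illustrative assumptions rather than the exact procedure used:

```python
# Sketch of uncertainty-based block selection for active learning:
# blocks whose mean per-pixel prediction entropy is highest are sent
# for manual annotation first. Names and values are illustrative only.
import numpy as np

def block_uncertainty(prob_cropland: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary entropy of the per-pixel cropland probability map."""
    p = np.clip(prob_cropland, eps, 1 - eps)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return float(entropy.mean())

def select_blocks_for_labeling(prob_maps: dict[str, np.ndarray], k: int = 10) -> list[str]:
    """Return the k block IDs with the highest model uncertainty."""
    scores = {block_id: block_uncertainty(p) for block_id, p in prob_maps.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

rng = np.random.default_rng(2)
maps = {f"block_{i}": rng.random((128, 128)) for i in range(50)}
print(select_blocks_for_labeling(maps, k=5))
```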
We have updated the manuscript in two locations to clarify the use of Active Learning for sample selection:
Introduction: added a subsection "Optimizing Sample Selection through Active Learning" to introduce the rationale and advantages of the method.
Methods, Section 3.2: updated to include detailed three-level validation schemes, stopping criteria, and automated semantic correctness assessment.
References:
- Settles, B. (2009). Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison.
- Safonova, A., G. Ghazaryan, S. Stiller, M. Main-Knorn, C. Nendel and M. Ryo (2023). "Ten deep learning techniques to address small data problems with remote sensing." International Journal of Applied Earth Observation and Geoinformation 125: 103569.
- It is claimed that "independent sample interpretation was conducted." Please explain the specific procedures implemented to guarantee the independence of these samples (e.g., separation of training and validation sets, interpreter blinding protocols, etc).
Response:
Thank you for the comment. We apologize for the unclear wording. The phrase “independent sample interpretation was conducted” refers to the fact that we first selected 3,891 points using systematic sampling, and then separately interpreted these points for Google and Mapbox imagery. In other words, the same set of sampling points was used to generate two independent interpretations corresponding to the two imagery sources, rather than indicating statistical or procedural independence between training and validation sets.
This is also updated in Section 2.2.5.
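A small sketch of systematic point sampling over a bounding box, the general idea behind the 3,891-point design; the grid spacing and bounding box below are placeholders, not the values used in the study:

```python
# Sketch of systematic point sampling: validation points are placed on a
# regular lon/lat grid and then interpreted separately against each imagery
# source. Spacing and bounding box are illustrative assumptions.
import numpy as np

def systematic_sample(lon_min, lon_max, lat_min, lat_max, step_deg):
    lons = np.arange(lon_min, lon_max, step_deg)
    lats = np.arange(lat_min, lat_max, step_deg)
    return np.array([(lon, lat) for lat in lats for lon in lons])

# Rough bounding box of mainland China with ~0.5 deg spacing (illustrative only).
points = systematic_sample(73.5, 135.0, 18.0, 53.5, 0.5)
print(points.shape)  # (n_points, 2) lon/lat pairs
```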
- Avoid duplication: Lines 223–226 contain redundant information that should be streamlined.
Response:
Thank you for the comment. The redundant information in Lines 223–226 has been removed.
- When introducing the seven existing datasets, it would be reasonable to also document their respective definitions of cropland and discuss the similarities and differences among them. This context is crucial for understanding the discrepancies mentioned.
Response:
Thank you for the comment. We have updated Table 3 and Section 2.2.7 to document the cropland definitions for all eight datasets (CACD, CLCD, ESA, ESRI, FCS30, FROM, GL30, SinoLC), specifying the inclusion/exclusion of perennial woody crops and distinctions among paddy, irrigated, and rainfed croplands. Figure 4 now summarizes similarities and differences among these datasets and CropLayer, providing context for interpreting observed discrepancies in provincial cropland estimates.
- The use of provincial statistical area ratio (>80%) as a conditional check during the extraction process, and subsequent comparison with statistical data for validation, introduces circularity. Since the output is conditioned on the statistics, the resulting high correlation is expected and does not constitute independent validation. Validation against statistical data is therefore of limited reference value.
Response:
Thank you for the comment. We acknowledge the potential for circularity. However, the validation procedure is multi-modal, including pixel-level IoU, block-level semantic correctness, and provincial area consistency. Meeting all three criteria ensures both structural and regional reliability, reducing the risk that agreement with provincial area statistics alone drives the results. Provincial statistics are used as a stopping criterion rather than as the sole validation metric, and the combination of the three validation levels provides a robust assessment of CropLayer accuracy beyond simple area agreement.
- The authors appear to apply the existing Mask2Former model directly for cropland extraction without significant modifications or improvements. Please clarify the specific innovation(s) of this study compared to simply applying an existing model.
Response:
Thank you for the comment. The innovation lies in how Mask2Former is applied rather than modifying its architecture. CropLayer is the first nationwide 2m-resolution cropland map for China using high-quality, carefully annotated training data. Key methodological contributions include multi-source integration (Mapbox + Google), active learning for efficient sample selection, and multi-level validation. Together, these steps ensure high mapping accuracy and reliability, which is unattainable by merely applying the pre-trained Mask2Former model to low-resolution or poorly annotated imagery.
- Given the 2m resolution is sufficient for extracting field boundaries, why did the authors not employ recent parcel-based boundary extraction models to create a vector-based cropland parcel dataset instead of a raster thematic map? A vector dataset at the field level would be far more valuable for applications like crop classification and field-level yield prediction.
Response:
Thank you for the comment. While 2m imagery allows identification of cropland presence, field-level vectorization remains challenging in regions with very small or steep fields (e.g., Chongqing), which would require sub-meter imagery for reliable delineation. The primary goal of CropLayer was to provide a nationally consistent 2m raster map, balancing coverage, data availability, and computational feasibility. CropLayer establishes a foundation for future parcel-level vector mapping, enabling field-level analyses once higher-resolution imagery or instance segmentation techniques are integrated.
- The meaning of "Areas where neither imagery source was utilized" is unclear. Why were large parts of western China not covered by either imagery source? How were the True Negative (TN) areas generated in these regions?
Response:
Thank you for the comment. “Areas where neither imagery source was utilized” refer to regions where no cropland could be extracted from either Mapbox or Google imagery. These areas, largely in western China, are predominantly non-agricultural (mountains, deserts, sparse vegetation), and were designated as True Negative (TN) regions, reflecting correctly identified non-cropland blocks. This explanation has been clarified in Section 4.2.4.
- The Discussion section is relatively weak. The paper should elaborate more on the advantages and quantitative improvements enabled by the 2m resolution in different regions (e.g., fragmented landscapes, smallholder fields), rather than simply stating that 2m is finer than 10-30m. A more substantive analysis of where and how much the resolution enhances accuracy would strengthen the paper.
Response:
Thank you for the comment. The rewritten Discussion emphasizes the quantitative improvements provided by 2m resolution:
Provincial-level accuracy: CropLayer achieves ±10% agreement with official statistics in 30 of 32 provinces, whereas the eight existing 10-30 m datasets achieve this in only 1-9 provinces.
Fragmented landscapes: Block-level metrics (AF and ED) show that coarse-resolution datasets underestimate edge complexity and over- or under-estimate cropland in hilly and mountainous regions (e.g., Yunnan-Guizhou Plateau, Sichuan margins). We also added feature importance results for the 16 features used.
Slope-specific analysis: Underestimation of edge density is most pronounced in 10-25° slope ranges, highlighting the importance of 2m imagery for terraced and hilly fields.
These analyses are now summarized in Sections 4.3.2, 4.3.3, 5.2, and 5.3 to provide concrete evidence of the advantages of high-resolution mapping.
Citation: https://doi.org/10.5194/essd-2025-44-AC2
- AC3: 'Summary of Major Revisions to the Manuscript: essd-2025-44', Hao Jiang, 13 Oct 2025
We sincerely thank the editor and reviewers for their valuable feedback. In this revision, approximately 70-80% of the manuscript has been substantially revised while maintaining the original research objectives. The main updates are summarized as follows:
- Methodological restructuring: The workflow has been reorganized to clearly describe the process through a three-level (pixel, block, regional) validation scheme with explicit iteration and stopping criteria.
- Dataset expansion: The CACD dataset has been newly incorporated, extending the comparison to a total of eight cropland datasets. In addition, the cropland statistics of FCS30 have been corrected to ensure consistency across all datasets.
- Definition harmonization: Differences in cropland definitions among datasets are systematically summarized in a comparative table to improve clarity and consistency.
- Block-level enhancement: Additional indicators and terrain-related analyses have been introduced to better interpret spatial and topographic discrepancies.
- Comprehensive revisions: Major updates have been made to the Introduction, Methods, Results, and Discussion, as well as to the Title, Abstract, Conclusion, and most figures. A national overview map of CropLayer has also been added.
We once again thank the editor and reviewers for their thoughtful comments and constructive suggestions, which have greatly improved the quality of this work.
Citation: https://doi.org/10.5194/essd-2025-44-AC3
Data sets
CropLayer: 2-meter resolution cropland mapping dataset for China in 2020, Hao Jiang, Xia Zhou, and Mengjun Ku, https://doi.org/10.5281/zenodo.14726428