the Creative Commons Attribution 4.0 License.
ChinaAI-FSC: A Comprehensive AI-Ready MODIS Fractional Snow Cover Dataset for China (2000–2022)
Abstract. We present ChinaAI-FSC, the first large-scale, standardized, AI-ready fractional snow cover (FSC) sample collection for mainland China, spanning 22 snow seasons from 2000 to 2022 and addressing a critical gap in long-term snow monitoring. The dataset consists of 47,728 samples (each a tile of 128 × 128 MODIS pixels), for which high-resolution Landsat-5/7/8/9 and Sentinel-2 imagery provides consistent FSC reference labels. A total of 20 feature variables, including MODIS surface reflectance (bands 1–7), topographic attributes, forest and land cover information, and geolocation factors, were extracted to enable both point-scale and tile-scale spatially contextualized AI modelling. A structured and transparent workflow, encompassing systematic sample preparation, rigorous quality control, spatiotemporal sample partitioning, and standardized metadata, ensures reproducibility, physical consistency, and interoperability across machine learning and deep learning applications. Dataset reliability and AI-readiness were systematically evaluated using a novel “Four Layers-Four Domains-Fifteen Attributes (4L-4D-15A)” assessment protocol, covering data, information, system, and application dimensions. The quality, reliability, and usability of ChinaAI-FSC were demonstrated through three representative use cases: (1) benchmarking of six ML/DL models (ANN, SVR, RF, CNN, UNet, and ResNet), (2) validation of the standard MODIS FSC product, and (3) nationwide seamless FSC mapping. By providing harmonized, validated, and well-documented samples, ChinaAI-FSC establishes a unified foundation for AI-driven snow cover mapping, long-term monitoring, and cryosphere–hydrological modelling, promoting reproducible, interoperable, and next-generation research in cryospheric science.
The dataset is publicly available from the National Tibetan Plateau Data Center (TPDC) at https://doi.org/10.11888/Cryos.tpdc.303034 (also accessible via https://cstr.cn/18406.11.Cryos.tpdc.303034) and from Zenodo at https://doi.org/10.5281/zenodo.17707386.
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-662', Anonymous Referee #1, 28 Dec 2025
  - AC1: 'Reply on RC1', Jinliang Hou, 20 Jan 2026
- RC2: 'Comment on essd-2025-662', Anonymous Referee #2, 30 Dec 2025
The authors present the development and evaluation of the ChinaAI-FSC dataset, a comprehensive, AI-ready MODIS-based fractional snow cover (FSC) sample collection for China covering 2000–2022. The work aims to establish a standardized, large-scale, and high-quality benchmark for AI-driven snow cover mapping. Considerable effort is evident in data integration, quality control, and validation, and the introduction of the novel “4L-4D-15A” evaluation framework is a clear strength. Overall, the study represents a meaningful contribution to FSC retrieval from MODIS. However, several issues in the current manuscript need to be addressed to better align the presentation with the scientific contribution and to meet the expectations of a high-quality journal.
Major Comments
- While the manuscript provides extensive detail on the “AI-ready” nature of the dataset, it repeatedly frames the work more as a project report than a scientific contribution to snow remote sensing. This emphasis, particularly in structure and narrative, risks misleading readers into viewing the paper as technical documentation of an AI platform rather than a methodological advance in FSC retrieval. To better highlight your unique scientific contribution, I recommend significantly reducing discussion of the AI project framework and refocusing the manuscript on FSC data processing, algorithmic choices, and evaluation rigor. For example, Section 3.3 reads more like a description of an evaluation protocol than an explanation of how it advances FSC validation. Similarly, parts of the text give the impression that your team developed the evaluation methodology itself; please clarify what is novel versus what is applied (for instance, does the protocol simply follow the NOAA evaluation framework?).
- Section 5 is currently dominated by forward-looking statements about the dataset’s future applications, which detracts from the core scientific message. Most readers, including myself, are primarily interested in the FSC dataset itself: how it was produced, its limitations, and how it improves upon existing products. The current discussion is confusing and lacks focus, particularly Section 5.1, which reads like a project roadmap rather than a scientific discussion. I suggest removing Section 5.1 entirely and redirecting the discussion toward substantive issues in FSC retrieval, such as: 1) Training sample selection and representativeness, 2) Impact of sample size and spatial/temporal distribution, 3) Challenges in complex terrain and forested regions, 4) How your approach handles subpixel snow in heterogeneous landscapes. These would strengthen the paper’s relevance to the snow remote sensing community.
- The arguments in Section 5.2 currently read as personal opinions rather than evidence-based discussion. Please support your claims with relevant literature. Without citations, the section lacks scientific credibility and appears speculative.
- I would like to see how the dataset performs in forested areas. If possible, please include the relevant analysis results and an accompanying discussion.
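As an illustration of the forest-specific analysis requested in the last major comment, the sketch below reports RMSE and mean bias separately for forested and non-forested pixels. It is only a minimal example under stated assumptions: the function name, the input layout (parallel lists of predicted FSC, reference FSC, and forest fraction), and the 0.5 forest cut-off are hypothetical and are not taken from the manuscript.

```python
import math


def stratified_errors(predicted, reference, forest_fraction, forest_threshold=0.5):
    """Return sample count, RMSE and mean bias of FSC per forest stratum.

    forest_threshold is a hypothetical cut-off chosen for illustration,
    not a value reported in the manuscript.
    """
    groups = {"forest": [], "non_forest": []}
    for pred, ref, forest in zip(predicted, reference, forest_fraction):
        key = "forest" if forest >= forest_threshold else "non_forest"
        groups[key].append(pred - ref)  # signed error: prediction minus reference

    summary = {}
    for key, errors in groups.items():
        if not errors:
            summary[key] = None  # no pixels fell into this stratum
            continue
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        bias = sum(errors) / len(errors)
        summary[key] = {"n": len(errors), "rmse": rmse, "bias": bias}
    return summary


if __name__ == "__main__":
    # Toy values only; they do not come from ChinaAI-FSC.
    result = stratified_errors(
        predicted=[0.5, 0.7, 0.2, 0.9],
        reference=[0.6, 0.9, 0.2, 0.8],
        forest_fraction=[0.8, 0.6, 0.1, 0.0],
    )
    for stratum, stats in result.items():
        print(stratum, stats)
```

A negative bias in the forest stratum would be consistent with canopy obscuring snow on the ground, but whether that holds for this dataset is for the authors' own analysis to show.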
Minor Comments
L12: Remove “mainland”.
L54–65: Please add a brief review of prior FSC retrieval studies in challenging environments (e.g., mountainous or forested regions), such as Xiao et al. (2022, JAG).
Xiao et al. 2022. Estimating fractional snow cover in vegetated environments using MODIS surface reflectance data
L74: Consider removing “AI-ready” here to frame the research gap more broadly.
L74–80: The two stated objectives appear redundant. Clarify whether they represent distinct goals or rephrase to avoid repetition. Given that the primary output is an FSC dataset, focus the motivation on its scientific value—not its compatibility with AI workflows.
L101: Suggest revising to: “Standardized AI-ready metadata and unified evaluation protocols.”
L106–123: Avoid restating the abstract. Provide a concise overview of the study’s scope and structure instead.
Section 3.1.1: Clarify the acquisition and processing specifics of the two satellite datasets (Landsat and Sentinel-2).
L156–159: Were surface reflectance data for Landsat and Sentinel-2 processed by your team, or were standard products used?
L160–163: Clarify whether cloud masking was performed using your own implementation of CFMask (Landsat) and SCL (Sentinel-2), or if you relied solely on the native QA layers.
L164–165: Was the interpolation of Landsat-7 ETM+ SLC-off gaps performed by your team, or did you use an existing gap-filled product? Please specify.
Section 3.1.2:
1) Replace “MODIS data” with “MODIS series products” or similar for precision.
2) Briefly describe the seamless surface reflectance processing algorithm to help readers understand that this product—rather than standard MOD09GA—is the foundation of your FSC retrieval.
Section 3.2.2: Why were all input variables retained without feature selection? In many FSC applications, not all predictors contribute meaningfully, and including redundant variables can reduce model efficiency and interpretability (e.g., Xiao et al., 2022, JAG). Please justify your approach.
Section 3.2.4: Support your threshold choices (e.g., 0.2 for Ref4, 0.4 for Ref2 and Ref6) with references or sensitivity analyses.
L283: Replace “violate” with “fail to meet.” Please check the entire manuscript for similar phrasing.
Equation 2: Provide a clearer physical or empirical rationale for the formulation.
Figures 7, 9, 10: Include spatial scales and clearly label the regions of analysis.
Section 4 heading: Consider renaming to “Demonstration of Applications Using the AI-Ready FSC Dataset” or similar.
L447–448: The current statement is too vague. Elaborate on the specific factors influencing FSC accuracy (e.g., illumination, forest structure, grain size).
L602–603: The claim that “expanded feature space enables AI models to better characterize complex snow–terrain–climate interactions” is speculative without evidence. Rephrase to clarify what you mean, e.g., which features improve representation of which physical processes?
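Regarding the Section 3.2.2 comment on redundant predictors, one simple first check is a pairwise correlation screen over the input variables. The sketch below is illustrative only: the feature names, the toy values, and the 0.95 threshold are assumptions, and a correlation screen is a stand-in for whatever feature-selection analysis the authors choose, not a description of their method.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(features, threshold=0.95):
    """List feature pairs whose absolute correlation is at or above the threshold.

    `features` maps a feature name to its list of values; the threshold is a
    hypothetical choice for illustration.
    """
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


if __name__ == "__main__":
    # Toy columns with hypothetical names; not values from the dataset.
    toy = {
        "band1": [1.0, 2.0, 3.0, 4.0],
        "band2": [2.0, 4.0, 6.0, 8.0],
        "elevation": [4.0, 1.0, 3.0, 2.0],
    }
    print(redundant_pairs(toy))
```

Pairs flagged this way are only candidates for removal; highly correlated reflectance bands can still carry distinct information for snow retrieval, so any pruning would need to be checked against model accuracy.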
Citation: https://doi.org/10.5194/essd-2025-662-RC2
- AC2: 'Reply on RC2', Jinliang Hou, 20 Jan 2026
Data sets
ChinaAI-FSC: A Comprehensive AI-Ready MODIS Fractional Snow Cover Dataset for China (2000-2022) Jinliang Hou et al. https://doi.org/10.5281/zenodo.17707386
Model code and software
AI-Ready-China-FSC Jinliang Hou et al. https://github.com/houjin0503/AI-Ready-China-FSC
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 376 | 112 | 32 | 520 | 14 | 26 |
This manuscript presents the development and evaluation of the ChinaAI-FSC dataset, a comprehensive AI-ready MODIS fractional snow cover sample collection for China spanning 2000–2022. The objective of this work is to provide a standardized, large-scale, and high-quality benchmark for AI-driven snow cover mapping. The authors have undertaken a substantial effort in data integration, quality control, and validation, and the introduction of a novel "4L-4D-15A" evaluation framework is a notable strength. However, the manuscript in its current form has several issues that need to be addressed. The most critical concerns are the potential imbalance of samples across snow conditions and geographic regions, and the insufficient discussion of the sources of uncertainty and how they are mitigated. These aspects affect the perceived robustness and broad applicability of the dataset and must be thoroughly addressed before publication.
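The sample-imbalance concern could be quantified with a simple count of samples per FSC interval. The following is a minimal sketch under stated assumptions: FSC is taken to lie in [0, 1], the five equal-width bins are an arbitrary choice, and the function names are hypothetical. The same counting could be repeated per region or per snow season.

```python
from collections import Counter


def fsc_bin_counts(fsc_values, n_bins=5):
    """Count samples per equal-width FSC bin on [0, 1]; FSC = 1.0 goes in the last bin."""
    counts = Counter()
    for value in fsc_values:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"FSC value out of range: {value}")
        index = min(int(value * n_bins), n_bins - 1)
        counts[index] += 1
    return [counts.get(i, 0) for i in range(n_bins)]


def imbalance_ratio(counts):
    """Largest bin count divided by the smallest non-empty bin count."""
    non_empty = [c for c in counts if c > 0]
    if not non_empty:
        raise ValueError("no samples to summarise")
    return max(non_empty) / min(non_empty)


if __name__ == "__main__":
    # Toy values only; they do not come from ChinaAI-FSC.
    counts = fsc_bin_counts([0.0, 0.05, 0.1, 0.5, 0.95, 1.0])
    print(counts, imbalance_ratio(counts))
```

Empty bins are excluded from the ratio, so they should be reported separately; an FSC interval with no samples at all is itself a representativeness gap.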
Major Comments:
The statements regarding the dataset's utility appear overstated or misaligned with its actual characteristics as presented. The authors should modify these claims to accurately reflect the dataset's demonstrated strengths and limitations.
Additionally, the results of the AI models trained on the authors' dataset show a marked visual discrepancy from the reference values. Why does this occur?
Minor Comments:
Line 324 is missing a period at the end. Please carefully review the entire text to avoid similar issues.