the Creative Commons Attribution 4.0 License.
An Accurate 10 m Annual Crop Map Product of Maize and Soybean Across the United States
Abstract. High-resolution crop maps over large spatial extents are fundamental to many agricultural applications; however, generating high-quality crop maps consistently across space and time remains a challenge. In this study, we improved a workflow for crop mapping and developed the first openly available, annual, 10-m spatial resolution maize and soybean maps over the Contiguous United States (CONUS) from 2019 to 2022, available at the Global Land Analysis and Discovery at the University of Maryland (https://glad.umd.edu/projects/mapping-crops-10-m-resolution-united-states). We obtained all available Sentinel-2 surface reflectance data between May and October for every year, applied quality assurance, corrected the bidirectional reflectance distribution function (BRDF) effects, and generated 10-day analysis ready data (ARD) composites. We then derived multi-temporal metrics from the 10-day ARD as training features for the national-scale wall-to-wall mapping. We implemented a stratified, two-stage cluster sampling, and then conducted annual field surveys and collected ground data. Utilizing the training data with Sentinel-2 multi-temporal metrics and topographic factors, we trained random forest models generalized for annual maize and soybean classification separately. Validated using field data from the two-stage cluster sample, our annual maps achieved consistent overall accuracies (OA) greater than 95 % with standard errors of less than 1 %. User’s accuracies (UAs) and producer’s accuracies (PAs) for maize were higher than 91 % and 84 % across the years, and UAs and PAs for soybean were greater than 88 % and 82 %, respectively. To illustrate the substantial improvement of the 10-m map over existing datasets, e.g., the 30-m Cropland Data Layer (CDL), we aggregated the 10-m maps to 30-m spatial resolution and quantified the amount of 30-m mixed pixels that can be reduced at field, regional, and national levels. 
The counties with the most maize and soybean production in Iowa, Illinois and Nebraska had the lowest reduction in mixed pixels, ranging from 1 % to 10 %, whereas southern counties had a higher reduction in mixed pixels. Overall, the median reductions in mixed maize and mixed soybean pixels across all counties were 14 % and 16 %, respectively. With more Sentinel-2-like data available from continuous observations and incoming satellite missions, we anticipate that 10-m crop maps will greatly benefit long-term monitoring of agricultural practices from the field to global scales. The dataset is also available at https://doi.org/10.6084/m9.figshare.28934993.v1 (Li et al., 2025).
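The mixed-pixel comparison described in the abstract can be illustrated with a minimal sketch. It assumes a binary 10 m crop mask held in a NumPy array and treats a 30 m cell as "mixed" when it is partly, but not wholly, covered by the crop. That definition, the function name `mixed_pixel_stats`, and the toy mask are assumptions made for illustration; the manuscript's exact procedure may differ.

```python
import numpy as np

def mixed_pixel_stats(crop_10m, factor=3):
    """Aggregate a binary 10 m crop mask into factor x factor cells.

    Returns the crop fraction per coarse cell, a boolean mask of mixed
    cells (0 < fraction < 1), and the share of crop-containing coarse
    cells that are mixed. The definition of "mixed" is an assumption.
    """
    rows, cols = crop_10m.shape
    # Trim edges so the array divides evenly into factor x factor blocks.
    rows -= rows % factor
    cols -= cols % factor
    blocks = crop_10m[:rows, :cols].reshape(
        rows // factor, factor, cols // factor, factor
    )
    fraction = blocks.mean(axis=(1, 3))      # crop fraction per coarse cell
    mixed = (fraction > 0) & (fraction < 1)  # partly crop, partly not
    crop_cells = fraction > 0                # coarse cells containing any crop
    share = float(mixed.sum() / crop_cells.sum()) if crop_cells.any() else 0.0
    return fraction, mixed, share

# Toy example: a 60 m x 60 m tile where the crop covers the four western columns.
mask = np.zeros((6, 6), dtype=np.uint8)
mask[:, :4] = 1
fraction, mixed, share = mixed_pixel_stats(mask)
print(fraction)
print(f"share of crop-containing 30 m cells that are mixed: {share:.2f}")
```

In this toy tile the field boundary cuts through the eastern 30 m cells, so those cells are mixed at 30 m even though every 10 m pixel is pure.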
Status: open (until 04 Nov 2025)
EC1: 'Comment on essd-2025-361', Peng Zhu, 18 Jul 2025
AC1: 'Reply on EC1', Haijun Li, 18 Jul 2025
The website was undergoing an upgrade. We have resolved the issue, and the link now works: https://glad.umd.edu/projects/mapping-crops-10-m-resolution-united-states
Citation: https://doi.org/10.5194/essd-2025-361-AC1
RC1: 'Comment on essd-2025-361', Anonymous Referee #1, 12 Aug 2025
I have reviewed the paper, and my primary concern is whether the data make sense. National-scale crop type mapping is essentially an operational task, and conducting it at this scale poses significant challenges for a scientific team. I cannot recommend publication in ESSD before all my concerns are thoroughly addressed.
1. Over the past two decades, numerous studies have been published on maize and soybean mapping in the United States, and CDL data (including all major crop types) have been available for more than 15 years. In contrast, the dataset presented here includes only maize and soybean. Given that cropland in the U.S. is characterized by large field sizes, the higher spatial resolution offered by this dataset does not appear to meaningfully improve national-level crop area statistics. It is unclear who the intended users of this dataset are. Do the authors envision USDA adopting it to replace the current CDL for NASS statistics? Are there specific states or counties that would use it for their own statistical work? For maize/soybean yield forecasting, which national projects would benefit from this dataset? For climate change analysis, does this 10 m maize/soybean map outperform the existing CDL products? From my perspective, the dataset would be more compelling if it were a 10 m crop type map covering all crops in the U.S., or if it provided 10 m major crop type maps for underrepresented regions such as Africa.
2. Lines 122–142: Sentinel-2 Level-2A data are surface reflectance (SR) data. I wonder whether the authors know the difference between "TOA reflectance" and "SR", and whether they understand the Sentinel-2 ARD pre-processing workflow and what its input is.
3. Section 2.3: The samples are used to identify maize and soybean in the PSUs, and the resulting maize/soybean maps within the PSUs are then used to identify these crops at the national level. I find this approach questionable, particularly given that the study focuses on only two crop types. It is unclear how well this method would perform if all crop types in the U.S. were included. Moreover, any misclassification in the PSU-level mapping is likely to be amplified when scaling up to the national level. A potentially more robust solution would be to use high-resolution drone imagery in place of Sentinel data for maize/soybean mapping within the PSUs.
4. Regarding the selection of training data, please demonstrate its representativeness by providing examples of typical vegetation index (VI) time series and corresponding meteorological data.
5. Validation data collection (Table S2): For each year, only 90 PSUs are used, and validation samples are collected solely from these PSUs. This sample size and distribution are not sufficiently representative. I recommend selecting at least 10 samples per PSU and including a minimum of 300 PSUs per year. The same PSUs could be used across multiple years, and the authors could consider contacting and collaborating with farmers to facilitate this process.
6. Figure 9: The comparison with CDL data does not make sense, because the CDL result is the maize/soybean class taken from an all-crop-type map, whereas the 10 m map is only a maize/soybean map. Please conduct mapping of all crop types to make the two comparable.
7. The mixed-pixel analysis (Figures 12 and 13) does not make sense either. If more crop types were considered, the number of mixed maize pixels would likely increase. Therefore, I recommend first producing a 10 m resolution map for all crop types. Additionally, for this scale factor analysis, please include CDL data in the comparison to provide a more complete evaluation.
Citation: https://doi.org/10.5194/essd-2025-361-RC1
RC2: 'Comment on essd-2025-361', Anonymous Referee #2, 06 Oct 2025
Review summary
This manuscript presents an approach to mapping maize and soybean at 10 m resolution using Sentinel-2 data and a two-step classification framework. While the topic is relevant and of potential interest to the remote sensing and agricultural monitoring communities, several aspects of the paper require clarification and refinement before it can be considered for publication. The study is, however, a refreshing example of work that combines a sound sampling design with expert-driven processing of remote sensing data, emphasizing data quality and the use of relatively simple, interpretable models. This stands in contrast to the growing tendency in the field to rely on deep learning models that ingest large volumes of raw data without adequate consideration of underlying data quality or sampling consistency. Overall, the study could become a valuable contribution if the authors strengthen the framing of their research objective, provide clear justifications for methodological choices, and align the discussion with recent developments in 10 m crop mapping.
Major comments
- The scope of the paper is not clearly defined. The described classification methodology builds on well-established works by the same research group and therefore, by itself, lacks substantial novelty. Moreover, the claim that the Cropland Data Layer (CDL) is available only at 30 m resolution is no longer valid (see: nass.usda.gov/Research_and_Science/Cropland/SARS1a.php), as CDL products are now also distributed at 10 m. The title suggests that the study’s primary aim is to produce accurate 10 m maize and soybean maps. However, a considerable portion of the analysis instead focuses on the differences between 10 m and 30 m classification maps. The authors should therefore clarify the main objective of the paper (L104-105 hints at this, but it should get more focus). If the focus is on generating accurate 10 m maps, the added value of these maps compared to the now-available 10 m CDL must be articulated more convincingly. Alternatively, if the emphasis lies on analyzing the effect of spatial resolution on crop type mapping, this should be clearly stated and consistently reflected throughout the manuscript. Or, if this workflow is merely a first step in potential upscaling to more challenging regions where there are fewer competing, well-established products, this should be stated as well.
- The manuscript uses data covering the period 2019–2022, but the rationale for selecting this specific time frame is not explained. It would be helpful if the authors clarified why these years were chosen, whether due to data availability or another reason. Providing this context would strengthen the transparency and reproducibility of the study design.
- The main motivation for the two-step classification approach of first producing decision-tree-based PSU maps and then using those for training random forest algorithms for full-scale production is not clearly stated. Did the authors quantify the added value of this compared to a one-step approach where the training data is immediately used for training the random forest models? Also, why - in case of a two-step approach - do the inputs (all bands + ratios of any two bands, vs the temporal stats) and algorithms (decision tree vs random forest) have to be different? I did not find any justification for that and it is quite confusing for the reader why these classification steps are so different.
Specific comments
L29: The abstract should be self-explanatory to the extent possible. This line is very difficult to understand without context ("pixels that can be reduced at field, regional, and national levels"). I suggest rephrasing it to make the message clear.
L95: see major comment: as of the 2024 version, CDL is now also offered at 10m resolution (source: https://www.nass.usda.gov/Research_and_Science/Cropland/SARS1a.php). Please add a note on this; the authors could also consider taking this up in their outlook, based on the 10m vs 30m analysis results.
L98: note the recent release of European continental 10m crop type maps which should be referenced here as well (source: https://sdi.eea.europa.eu/catalogue/srv/api/records/9db29b07-5968-4ce0-8351-1e356b3d7d47?language=all)
L101: WorldCereal released global maps while the sentence reads as if these are only European maps.
L131-132: Two cloud masks are combined, which results in very aggressive masking. At least one source (Sen2Cor) is known to suffer from false positives over bright surfaces. Can the authors comment on the impact of potentially numerous false positives in their masking procedure?
L155-L156: If no interpolation was done, how were missing values in the time series treated?
L159-L160: What is the motivation of a reprojection to lat/lon, also for the final output products, especially since that leads to variable pixel sizes? The authors should more carefully clarify their motivation for this approach.
L200-L201: which previous year's crop map? Your own final map of the preceding year? If so, how did you produce the first year?
L217-L218: it would be helpful if a table was added with total sample sizes per label (both SSU and windshield survey numbers)
L224: cfr main comment on the two-step approach
L229-L230: cfr main comment on the two-step approach
L243: what's the rationale behind these seemingly arbitrary numbers? (0.2 vs 0.8 %)
L252: How can the authors be confident that the probability outputs of the independently trained maize vs soybean models can be used like this? It's a common observation that some models can be overly confident while others are not. It's perfectly possible that either the maize or soybean models are - in case of doubt - consistently outputting higher probabilities than the other model. In competition, that model would always win in this kind of aggregation rule. Could the authors comment on this potential pitfall promoting one model over the other purely based on the level of confidence?
L305: Sect. 4.1 focuses on the difference between 10m and 30m based on the same original 10m map. This is insightful to the community. Focusing on the map outputs themselves, it would be helpful though if a comparison was made between the 10m map, the aggregated 30m map and the corresponding 30m CDL map. This demonstrates the resolution effect, but also compares map quality between the map produced here and the CDL map.
L393: add the usage of the 0 and 255 values in the data as well
L394: as is always the case for this group, the field data is unfortunately not made public. The claim "Data used in this study are openly accessible online" is therefore not fully correct. Please clarify that only the external data being used is openly accessible.
L405-L406: given the already published 10m CDL (see source in major comments section), the authors may reconsider the use of the word "first" here.
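The comment on L159-L160 above notes that reprojecting to latitude/longitude yields variable pixel sizes. A minimal sketch of that effect follows, assuming a spherical Earth and a hypothetical angular pixel size of 0.0001°; neither the radius approximation nor the pixel size is taken from the manuscript, and the function name `pixel_size_m` is illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)

def pixel_size_m(lat_deg, pixel_deg):
    """Approximate (east-west, north-south) ground size in metres
    of a lat/lon grid cell of pixel_deg degrees centred at lat_deg."""
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = pixel_deg * metres_per_degree
    # Meridians converge toward the poles, so east-west width scales with cos(lat).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south

PIXEL_DEG = 0.0001  # hypothetical angular pixel size, not from the manuscript

# Latitudes roughly spanning the southern and northern edges of CONUS.
for lat in (25.0, 37.0, 49.0):
    ew, ns = pixel_size_m(lat, PIXEL_DEG)
    print(f"lat {lat:4.1f}: {ew:5.2f} m x {ns:5.2f} m, area {ew * ns:6.1f} m^2")
```

Under these assumptions the north-south extent of a pixel stays constant while its east-west extent shrinks with latitude, so per-pixel areas must be computed before crop area is summed from such a grid.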
Data
- The TIFF files provided are not cloud-optimized. I would recommend offering huge files of this kind as COGs, because this greatly simplifies visualization in GIS applications.
- Some extra metadata in the TIFF files such as the legend would be useful for users.
Supplementary material
- Table S1: does this table miss a horizontal line?
Citation: https://doi.org/10.5194/essd-2025-361-RC2
Data sets
2019-2022 10-m maize and soybean maps over the United States Haijun Li, Xiao-peng Song, Bernard Adusei, Jeffrey Pickering, Andre de Lima, Andrew Poulson, Antoine Baggett, Peter Potapov, Ahmad Khan, Viviana Zalles, Andres Hernandez-Serna, Samuel M. Jantz, Amy H. Pickens, Carolina Ortiz-Dominguez, Xinyuan Li, Theodore Kerr, Zhen Song, Svetlana Turubanova, Eddy Bongwele, Heritier Koy Kondjo, Anna Komarova, Stephen V. Stehman, Matthew C. Hansen https://glad.umd.edu/projects/mapping-crops-10-m-resolution-united-states
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
1,379 | 76 | 33 | 1,488 | 29 | 26 | 27 |
Comments: This link does not work: https://glad.umd.edu/projects/mapping-crops-10-m-resolution-united-states. I thought it would be important to address this issue.
Please resolve it.