An Accurate 10 m Annual Crop Map Product of Maize and Soybean Across the United States

Li, Haijun; Song, Xiao-Peng; Adusei, Bernard; Pickering, Jeffrey; Lima, Andre; Poulson, Andrew; Baggett, Antoine; Potapov, Peter; Khan, Ahmad; Zalles, Viviana; Hernandez-Serna, Andres; Jantz, Samuel M.; Pickens, Amy H.; Ortiz-Dominguez, Carolina; Li, Xinyuan; Kerr, Theodore; Song, Zhen; Turubanova, Svetlana; Bongwele, Eddy; Kondjo, Heritier Koy; Komarova, Anna; Stehman, Stephen V.; Hansen, Matthew C.

doi:10.5194/essd-2025-361

Preprints

https://doi.org/10.5194/essd-2025-361

14 Jul 2025

Status: a revised version of this preprint is currently under review for the journal ESSD.

Abstract. High-resolution crop maps over large spatial extents are fundamental to many agricultural applications; however, generating high-quality crop maps consistently across space and time remains a challenge. In this study, we improved a workflow for crop mapping and developed the first openly available, annual, 10-m spatial resolution maize and soybean maps over the Contiguous United States (CONUS) from 2019 to 2022, available at the Global Land Analysis and Discovery at the University of Maryland (https://glad.umd.edu/projects/mapping-crops-10-m-resolution-united-states). We obtained all available Sentinel-2 surface reflectance data between May and October for every year, applied quality assurance, corrected the bidirectional reflectance distribution function (BRDF) effects, and generated 10-day analysis ready data (ARD) composites. We then derived multi-temporal metrics from the 10-day ARD as training features for the national-scale wall-to-wall mapping. We implemented a stratified, two-stage cluster sampling, and then conducted annual field surveys and collected ground data. Utilizing the training data with Sentinel-2 multi-temporal metrics and topographic factors, we trained random forest models generalized for annual maize and soybean classification separately. Validated using field data from the two-stage cluster sample, our annual maps achieved consistent overall accuracies (OA) greater than 95 % with standard errors of less than 1 %. User’s accuracies (UAs) and producer’s accuracies (PAs) for maize were higher than 91 % and 84 % across the years, and UAs and PAs for soybean were greater than 88 % and 82 %, respectively. To illustrate the substantial improvement of the 10-m map over existing datasets, e.g., the 30-m Cropland Data Layer (CDL), we aggregated the 10-m maps to 30-m spatial resolution and quantified the amount of 30-m mixed pixels that can be reduced at field, regional, and national levels. 
The counties with the most maize and soybean production in Iowa, Illinois and Nebraska had the lowest reduction in mixed pixels, ranging from 1 % to 10 %, whereas southern counties had a higher reduction in mixed pixels. Overall, the median percentages of mixed maize and soybean pixels reduction across all counties were 14 % and 16 %, respectively. With more Sentinel-2-like data available from continuous observations and incoming satellite missions, we anticipate that 10-m crop maps will greatly benefit long-term monitoring for agricultural practices from the field to global scales. The dataset is also available at https://doi.org/10.6084/m9.figshare.28934993.v1 (Li et al., 2025).
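The 10-m to 30-m aggregation described in the abstract can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the authors' procedure: the function name `mixed_pixel_share`, the 6 x 6 mask, and the definition of a "mixed" cell (a 30-m cell only partly covered by the crop) are all hypothetical choices made for illustration.

```python
def mixed_pixel_share(mask10, factor=3):
    """Aggregate a binary 10 m crop mask to coarser cells and report mixed cells.

    mask10: list of equal-length rows of 0/1 (1 = crop). Both dimensions are
    assumed to be multiples of `factor` (3 x 3 blocks of 10 m pixels form one
    30 m cell). Returns (fractions, share_mixed), where share_mixed is the
    proportion of crop-containing coarse cells that are only partly crop.
    """
    rows, cols = len(mask10), len(mask10[0])
    fractions = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [mask10[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        fractions.append(out_row)
    flat = [f for row in fractions for f in row]
    with_crop = [f for f in flat if f > 0]
    mixed = [f for f in with_crop if f < 1]
    share_mixed = len(mixed) / len(with_crop) if with_crop else 0.0
    return fractions, share_mixed


# Hypothetical 6 x 6 mask: a field edge cuts through the upper-right 30 m cell.
toy = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
fractions, share_mixed = mixed_pixel_share(toy)
print(fractions, share_mixed)
```

In this toy case one of the three crop-containing 30-m cells is only one-third crop, which is the kind of cell that a 10-m map can resolve into pure pixels.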

How to cite. Li, H., Song, X.-P., Adusei, B., Pickering, J., Lima, A., Poulson, A., Baggett, A., Potapov, P., Khan, A., Zalles, V., Hernandez-Serna, A., Jantz, S. M., Pickens, A. H., Ortiz-Dominguez, C., Li, X., Kerr, T., Song, Z., Turubanova, S., Bongwele, E., Kondjo, H. K., Komarova, A., Stehman, S. V., and Hansen, M. C.: An Accurate 10 m Annual Crop Map Product of Maize and Soybean Across the United States, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2025-361, in review, 2025.

Received: 16 Jun 2025 – Discussion started: 14 Jul 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 5144 KB)

Supplement (1506 KB)

Status: final response (author comments only)

EC1: 'Comment on essd-2025-361', Peng Zhu, 18 Jul 2025

Comments: This link does not work: https://glad.umd.edu/projects/mapping-crops-10-m-resolution-united-states. I think it is important to address this issue.

Please resolve it.

Citation: https://doi.org/10.5194/essd-2025-361-EC1
- AC1: 'Reply on EC1', Haijun Li, 18 Jul 2025
  
  The website was undergoing an upgrade. We have resolved the issue, and the link now works: https://glad.umd.edu/projects/mapping-crops-10-m-resolution-united-states
  
  Citation: https://doi.org/10.5194/essd-2025-361-AC1
RC1: 'Comment on essd-2025-361', Anonymous Referee #1, 12 Aug 2025

I have reviewed the paper, and my primary concern is whether the data make sense. National-scale crop type mapping is essentially an operational task, and conducting it at this scale poses significant challenges for a scientific team. I cannot recommend publication in ESSD before all my concerns are thoroughly addressed.
1. Over the past two decades, numerous studies have been published on maize and soybean mapping in the United States, and CDL data (including all major crop types) have been available for more than 15 years. In contrast, the dataset presented here includes only maize and soybean. Given that cropland in the U.S. is characterized by large field sizes, the higher spatial resolution offered by this dataset does not appear to meaningfully improve national-level crop area statistics. It is unclear who the intended users of this dataset are. Do the authors envision USDA adopting it to replace the current CDL for NASS statistics? Are there specific states or counties that would use it for their own statistical work? For maize/soybean yield forecasting, which national projects would benefit from this dataset? For climate change analysis, does this 10 m maize/soybean map outperform the existing CDL products? From my perspective, the dataset would be more compelling if it were a 10 m crop type map covering all crops in the U.S., or if it provided 10 m major crop type maps for underrepresented regions such as Africa.
2. Lines 122-142: Sentinel-2 Level-2A data are surface reflectance (SR) data. I wonder whether the authors know the difference between "TOA reflectance" and "SR", and whether they understand the Sentinel-2 ARD pre-processing workflow and what its input is.
3. Section 2.3: The samples are used to identify maize and soybean in the PSUs, and the resulting maize/soybean maps within the PSUs are then used to identify these crops at the national level. I find this approach questionable, particularly given that the study focuses on only two crop types. It is unclear how well this method would perform if all crop types in the U.S. were included. Moreover, any misclassification in the PSU-level mapping is likely to be amplified when scaling up to the national level. A potentially more robust solution would be to use high-resolution drone imagery in place of Sentinel data for maize/soybean mapping within the PSUs.
4. Regarding the selection of training data, please demonstrate its representativeness by providing examples of typical vegetation index (VI) time series and corresponding meteorological data.
5. Validation data collection (Table S2): For each year, only 90 PSUs are used, and validation samples are collected solely from these PSUs. This sample size and distribution are not sufficiently representative. I recommend selecting at least 10 samples per PSU and including a minimum of 300 PSUs per year. The same PSUs could be used across multiple years, and the authors could consider contacting and collaborating with farmers to facilitate this process.
6. Figure 9: The comparison with CDL data does not make sense because the CDL result is maize/soybean extracted from an all-crop-type map, whereas the 10 m map is a maize/soybean-only map. Please map all crop types to make the two comparable.
7. The mixed-pixel analysis (Figures 12 and 13) does not make sense either. If more crop types were considered, the number of mixed maize pixels would likely increase. Therefore, I recommend first producing a 10 m resolution map for all crop types. Additionally, for this scale factor analysis, please include CDL data in the comparison to provide a more complete evaluation.

Citation: https://doi.org/10.5194/essd-2025-361-RC1
RC2: 'Comment on essd-2025-361', Anonymous Referee #2, 06 Oct 2025

Review summary
This manuscript presents an approach to mapping maize and soybean at 10 m resolution using Sentinel-2 data and a two-step classification framework. While the topic is relevant and of potential interest to the remote sensing and agricultural monitoring communities, several aspects of the paper require clarification and refinement before it can be considered for publication. The study is, however, a refreshing example of work that combines a sound sampling design with expert-driven processing of remote sensing data, emphasizing data quality and the use of relatively simple, interpretable models. This stands in contrast to the growing tendency in the field to rely on deep learning models that ingest large volumes of raw data without adequate consideration of underlying data quality or sampling consistency. Overall, the study could become a valuable contribution if the authors strengthen the framing of their research objective, provide clear justifications for methodological choices, and align the discussion with recent developments in 10 m crop mapping.
Major comments
- The scope of the paper is not clearly defined. The described classification methodology builds on well-established works by the same research group and therefore - by itself - lacks substantial novelty. Moreover, the claim that the Cropland Data Layer (CDL) is available only at 30 m resolution is no longer valid (see: nass.usda.gov/Research_and_Science/Cropland/SARS1a.php), as CDL products are now also distributed at 10 m. The title suggests that the study’s primary aim is to produce accurate 10 m maize and soybean maps. However, a considerable portion of the analysis instead focuses on the differences between 10 m and 30 m classification maps. The authors should therefore clarify the main objective of the paper (L104-105 hints at this but it should get more focus). If the focus is on generating accurate 10 m maps, the added value of these maps compared to the now-available 10 m CDL must be articulated more convincingly. Alternatively, if the emphasis lies on analyzing the effect of spatial resolution on crop type mapping, this should be clearly stated and consistently reflected throughout the manuscript. Or, if this workflow is merely a first step towards upscaling to more challenging regions where there are fewer competing well-established products, this should be stated as well.
- The manuscript uses data covering the period 2019–2022, but the rationale for selecting this specific time frame is not explained. It would be helpful if the authors clarified why these years were chosen, whether due to data availability or another reason. Providing this context would strengthen the transparency and reproducibility of the study design.
- The main motivation for the two-step classification approach of first producing decision-tree-based PSU maps and then using those for training random forest algorithms for full-scale production is not clearly stated. Did the authors quantify the added value of this compared to a one-step approach where the training data is immediately used for training the random forest models? Also, why - in case of a two-step approach - do the inputs (all bands + ratios of any two bands, vs the temporal stats) and algorithms (decision tree vs random forest) have to be different? I did not find any justification for that and it is quite confusing for the reader why these classification steps are so different.
Specific comments
L29: the abstract should be self-explanatory to the extent possible. This line is very difficult to understand without context ("pixels that can be reduced at field, regional, and national levels"). I suggest rephrasing it to make the message clear.
L95: see major comment: as of the 2024 version, CDL is now also offered at 10m resolution (source: https://www.nass.usda.gov/Research_and_Science/Cropland/SARS1a.php). Please add a note on this and the authors could also consider taking this up in their outlook based on the 10m vs 30m analysis results.
L98: note the recent release of European continental 10m crop type maps which should be referenced here as well (source: https://sdi.eea.europa.eu/catalogue/srv/api/records/9db29b07-5968-4ce0-8351-1e356b3d7d47?language=all)
L101: WorldCereal released global maps while the sentence reads as if these are only European maps.
L131-132: Two cloud masks are combined, which results in very aggressive masking. At least one source (Sen2Cor) is known to suffer from false positives over bright surfaces. Can the authors comment on the impact of potentially numerous false positives in their masking procedure?
L155-L156: in case no interpolation was done, how was the missing value in the time series treated?
L159-L160: What is the motivation of a reprojection to lat/lon, also for the final output products, especially since that leads to variable pixel sizes? The authors should more carefully clarify their motivation for this approach.
L200-L201: which previous year's crop map? Your own final map of the preceding year? If so, how did you produce the first year?
L217-L218: it would be helpful if a table was added with total sample sizes per label (both SSU and windshield survey numbers)
L224: cf. main comment on the two-step approach
L229-L230: cf. main comment on the two-step approach
L243: what's the rationale behind these seemingly arbitrary numbers? (0.2 vs 0.8 %)
L252: How can the authors be confident that the probability outputs of the independently trained maize vs soybean models can be used like this? It's a common observation that some models can be overly confident while others are not. It's perfectly possible that either the maize or soybean models are - in case of doubt - consistently outputting higher probabilities than the other model. In competition, that model would always win in this kind of aggregation rule. Could the authors comment on this potential pitfall promoting one model over the other purely based on the level of confidence?
L305: Sect. 4.1 focuses on the difference between 10m and 30m based on the same original 10m map. This is insightful to the community. Focusing on the map outputs themselves, it would be helpful though if a comparison was made between the 10m map, the aggregated 30m map and the corresponding 30m CDL map. This demonstrates the resolution effect, but also compares map quality between the map produced here and the CDL map.
L393: add the usage of the 0 and 255 values in the data as well
L394: as is always the case for this group, the field data is unfortunately not made public. The claim "Data used in this study are openly accessible online" is therefore not fully correct. Please clarify that only the external data being used is openly accessible.
L405-L406: given the already published 10m CDL (see source in major comments section), the authors may reconsider the use of the word "first" here.
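The pitfall raised in the comment on L252 above can be made concrete with a minimal Python sketch. Everything in it is an assumption for illustration: the aggregation rule in `combine`, the `sharpen` helper that mimics an over-confident model by scaling the logit, and the probability values are hypothetical and are not taken from the manuscript.

```python
import math


def sharpen(p, gain):
    """Simulate an over-confident model by scaling the logit of p by `gain`."""
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-gain * logit))


def combine(p_maize, p_soy, threshold=0.5):
    """Hypothetical aggregation rule: the more confident binary model wins."""
    if max(p_maize, p_soy) < threshold:
        return "other"
    return "maize" if p_maize >= p_soy else "soybean"


# Hypothetical pixels where calibrated evidence slightly favours maize.
pixels = [(0.70, 0.65), (0.62, 0.60), (0.80, 0.72)]

calibrated = [combine(pm, ps) for pm, ps in pixels]
# Same pixels, but the soybean model is over-confident (logit doubled).
skewed = [combine(pm, sharpen(ps, 2.0)) for pm, ps in pixels]

print(calibrated)
print(skewed)
```

With calibrated outputs all three pixels are labelled maize; once the soybean probabilities are sharpened, all three flip to soybean even though the underlying evidence is unchanged. Comparing reliability diagrams of the two models, or calibrating both on held-out reference data before combining them, would be one way to check for this effect.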
Data
- The TIFF files provided are not cloud-optimized. I would recommend offering these kinds of huge files as COGs (Cloud Optimized GeoTIFFs) because they greatly simplify visualization in GIS applications.
- Some extra metadata in the TIFF files such as the legend would be useful for users.
Supplementary material
- Table S1: does this table miss a horizontal line?

Citation: https://doi.org/10.5194/essd-2025-361-RC2
RC3: 'Comment on essd-2025-361', Anonymous Referee #3, 29 Oct 2025
This is an excellent paper that introduces an extremely valuable data product: the 2019-2022 annual 10-m resolution maps of maize and soybean for the Contiguous United States (CONUS). Given that the journal Earth System Science Data (ESSD) aims to publish high-quality, thoroughly validated, and well-described datasets, this paper excels in its methodological rigor, validation thoroughness, and data accessibility, making it an excellent fit for the journal.
Overall, this is impressive work, and the data product is of great significance for agricultural monitoring, food security assessment, and ecological modeling. The paper is clearly written and well-structured. While I hold this work in high regard, I also propose the following modification suggestions, primarily focused on clarifying key technical details and enhancing the analytical depth, which I believe will further improve the quality of the paper and the usability of the data product.
Major Technical Question

The authors trained two binary classifiers separately for maize (RF-Maize) and soybean (RF-Soybean). In practice, a pixel might be predicted as "present" by both models (with high probability), or predicted as "absent" by both. The authors have not explained how this "maize vs. soybean" conflict or overlap is handled.
The "area-matching approach" is typically used to guide post-classification processing (e.g., adjusting classification thresholds) to match the total area with a reference value (like official statistics). In this study, was this method applied before or after the morphological operations (opening and closing)? How was it combined with the two independent probability layers (maize, soybean)?
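As a rough illustration of what an area-matching step can look like, the Python sketch below picks the probability threshold at which the mapped area equals a reference area. This is one common reading of the term, assumed here for illustration; the function `area_matching_threshold`, the toy probabilities, and the reference area are hypothetical, and the manuscript's actual procedure may differ.

```python
def area_matching_threshold(probabilities, target_pixels):
    """Return the probability cut-off that maps about `target_pixels` pixels.

    Pixels are ranked by probability and the value of the last pixel needed
    to reach the target is used as the threshold. Ties at the threshold can
    push the mapped count slightly above the target.
    """
    if target_pixels <= 0:
        return float("inf")
    ranked = sorted(probabilities, reverse=True)
    k = min(target_pixels, len(ranked))
    return ranked[k - 1]


# Hypothetical per-pixel crop probabilities for a tiny tile.
probs = [0.95, 0.10, 0.80, 0.55, 0.30, 0.60, 0.05, 0.45]
pixel_area_ha = 0.01        # one 10 m x 10 m pixel
reference_area_ha = 0.04    # hypothetical reference crop area for the tile
target = round(reference_area_ha / pixel_area_ha)

threshold = area_matching_threshold(probs, target)
mapped = [p >= threshold for p in probs]
print(threshold, sum(mapped))
```

Whether such a threshold is applied before or after morphological filtering matters, because filtering changes the mapped area again; that ordering is exactly what the question above asks the authors to specify.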
Technical Details

The authors state that the 12-m TanDEM-X elevation data was resampled to 10-m using "nearest neighbor" resampling. For continuous surface data like elevation, "bilinear interpolation" or "cubic convolution" is generally recommended to obtain a smoother, more realistic topographic transition.
While this may have a negligible impact on the final classification results, the authors could briefly explain the rationale for choosing "nearest neighbor," or, if possible, switch to bilinear interpolation. If "nearest neighbor" is retained, it is recommended to confirm that this choice does not introduce significant topographic artifacts at the 10-m scale.
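The difference between the two resampling choices can be seen in a one-dimensional Python sketch, offered as an illustration only. It resamples a uniform slope from 12-m to 10-m spacing; linear interpolation stands in for bilinear interpolation in 1-D, and the function names and elevation values are hypothetical.

```python
def resample_nearest(values, src_step, dst_step, n_out):
    """Nearest-neighbour resampling of a 1-D profile."""
    out = []
    for i in range(n_out):
        x = i * dst_step
        idx = min(int(x / src_step + 0.5), len(values) - 1)
        out.append(values[idx])
    return out


def resample_linear(values, src_step, dst_step, n_out):
    """Linear interpolation, the 1-D analogue of bilinear resampling."""
    out = []
    for i in range(n_out):
        pos = min(i * dst_step / src_step, len(values) - 1)
        lo = min(int(pos), len(values) - 2)
        w = pos - lo
        out.append((1 - w) * values[lo] + w * values[lo + 1])
    return out


# A uniform 10 % slope sampled every 12 m (elevation in metres, x = 0..60 m).
dem12 = [1.2 * k for k in range(6)]
nearest = resample_nearest(dem12, 12, 10, 7)
linear = resample_linear(dem12, 12, 10, 7)

# Elevation change between neighbouring 10 m cells.
steps_nearest = [round(b - a, 6) for a, b in zip(nearest, nearest[1:])]
steps_linear = [round(b - a, 6) for a, b in zip(linear, linear[1:])]
print(steps_nearest)
print(steps_linear)
```

On this uniform slope the linear result rises by the same amount in every 10-m cell, whereas the nearest-neighbour result repeats one source value and so contains a flat step between steeper ones. Slope-type features derived from such a surface would inherit that pattern.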
In Section 2.3.2, the authors mention applying a "5x5 pixel kernel opening" followed by a "10x10 pixel kernel closing" after combining the probability layers. This is a key post-processing step to eliminate scattered pixels and fill holes.
However, the chosen kernel sizes (especially the 10x10 closing, i.e., 100m x 100m) are relatively large. The 5x5 opening will remove isolated patches smaller than 50x50m, while the 10x10 closing will fill non-crop holes (like ponds or buildings) within large fields if they are smaller than 100x100m.
While this improves the visual "purity" of the map product, it may also (especially in highly fragmented agricultural landscapes) eliminate fine-scale features that the 10-m data was intended to capture.
Please add a sentence in Section 2.3.2 to briefly explain the justification for choosing these specific kernel sizes (5x5 and 10x10). For example, was this based on considerations of average field size in the US, or were tests conducted to balance noise removal with detail preservation?
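The effect of the two kernels can be reproduced on a toy mask with the dependency-free Python sketch below. It is an illustration under assumptions, not the authors' implementation: square kernels are assumed, pixels outside the grid are treated as non-crop, and the 30 x 30 mask (a field with a small hole plus an isolated speck) is hypothetical.

```python
def _morph(mask, k, reducer, sign):
    """Apply a k x k square kernel; `all` gives erosion, `any` gives dilation."""
    rows, cols = len(mask), len(mask[0])
    # For even k the kernel cannot be centred; dilation uses the mirrored
    # offsets so that opening and closing stay consistent.
    offs = [sign * d for d in range(-(k // 2), k - k // 2)]

    def value(r, c):
        # Pixels outside the grid count as non-crop.
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    return [[int(reducer(value(r + dr, c + dc) for dr in offs for dc in offs))
             for c in range(cols)] for r in range(rows)]


def erode(mask, k):
    return _morph(mask, k, all, +1)


def dilate(mask, k):
    return _morph(mask, k, any, -1)


def opening(mask, k):
    """Erosion then dilation: removes patches smaller than the kernel."""
    return dilate(erode(mask, k), k)


def closing(mask, k):
    """Dilation then erosion: fills holes smaller than the kernel."""
    return erode(dilate(mask, k), k)


size = 30
raw = [[0] * size for _ in range(size)]
for r in range(8, 22):
    for c in range(8, 22):
        raw[r][c] = 1      # a 140 m x 140 m field
for r in range(14, 16):
    for c in range(14, 16):
        raw[r][c] = 0      # a 20 m x 20 m non-crop hole inside the field
for r in range(1, 4):
    for c in range(1, 4):
        raw[r][c] = 1      # an isolated 30 m x 30 m speck

cleaned = closing(opening(raw, 5), 10)
print(sum(map(sum, raw)), sum(map(sum, cleaned)))
```

In this toy case the speck disappears and the hole is filled, so the cleaned mask is a solid field. Both changes are desirable if they are noise and undesirable if they are real 10-m features, which is the trade-off the comment above asks the authors to justify.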
Citation: https://doi.org/10.5194/essd-2025-361-RC3

Supplement

https://doi.org/10.5194/essd-2025-361-supplement

Data sets

2019-2022 10-m maize and soybean maps over the United States
Haijun Li, Xiao-peng Song, Bernard Adusei, Jeffrey Pickering, Andre de Lima, Andrew Poulson, Antoine Baggett, Peter Potapov, Ahmad Khan, Viviana Zalles, Andres Hernandez-Serna, Samuel M. Jantz, Amy H. Pickens, Carolina Ortiz-Dominguez, Xinyuan Li, Theodore Kerr, Zhen Song, Svetlana Turubanova, Eddy Bongwele, Heritier Koy Kondjo, Anna Komarova, Stephen V. Stehman, Matthew C. Hansen
https://glad.umd.edu/projects/mapping-crops-10-m-resolution-united-states

Viewed

Total article views: 2,199 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
1,751	394	54	2,199	104	45	87

Views and downloads (calculated since 14 Jul 2025)

Month	HTML	PDF	XML	Total
Jul 2025	253	42	18	313
Aug 2025	239	12	5	256
Sep 2025	832	15	9	856
Oct 2025	118	29	4	151
Nov 2025	68	79	1	148
Dec 2025	85	86	8	179
Jan 2026	81	59	7	147
Feb 2026	47	57	2	106
Mar 2026	28	15	0	43

Latest update: 08 Mar 2026

Short summary

We developed the first annual, 10-m spatial resolution maize and soybean maps over the US from 2019 to 2022. Evaluated by ground data collected over a stratified random sample, our maps achieved > 95 % overall accuracy consistently. Our analysis suggested that mixed pixels could be substantially reduced by the increased spatial resolution from 30 m to 10 m. Our maps can support research subjects such as forecasting crop yield, analyzing agricultural-related greenhouse gas emissions, etc.
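For readers unfamiliar with the accuracy terms used in the summary, the Python sketch below computes overall, user's, and producer's accuracy from a confusion matrix. It is a simplified illustration: the counts are hypothetical, and it uses plain sample counts, whereas estimates from a stratified two-stage cluster sample such as the one described here would normally weight samples by their inclusion probabilities.

```python
def accuracy_metrics(matrix, labels):
    """Compute overall, user's and producer's accuracy from sample counts.

    matrix[i][j] is the number of samples mapped as labels[i] whose reference
    (ground) label is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(len(labels))) / total
    users, producers = {}, {}
    for i, name in enumerate(labels):
        mapped = sum(matrix[i])                      # row total
        reference = sum(row[i] for row in matrix)    # column total
        users[name] = matrix[i][i] / mapped          # 1 - commission error
        producers[name] = matrix[i][i] / reference   # 1 - omission error
    return overall, users, producers


# Hypothetical counts: rows = map class, columns = reference class.
labels = ["maize", "other"]
counts = [[90, 10],
          [15, 885]]
oa, ua, pa = accuracy_metrics(counts, labels)
print(oa, ua["maize"], pa["maize"])
```

Because the non-crop class dominates, overall accuracy can stay high even when a noticeable share of reference maize is missed, which is why user's and producer's accuracies are reported alongside it.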

