This work is distributed under the Creative Commons Attribution 4.0 License.
CropSight-US: An Object-based Crop Type Ground Truth Dataset Using Street View and Sentinel-2 Satellite Imagery across the Contiguous United States, 2013–2023
Abstract. Accurate and scalable crop type maps are vital for supporting food security, as they provide critical information on the specific crops cultivated in a given area to inform agricultural decision-making and enhance crop productivity. The generation of these maps depends on high-quality crop type ground truth data, which are essential for developing remote sensing–based crop type classification models applicable across varying spatial and temporal contexts. Yet existing crop type ground truth datasets often focus on specific crop types within limited spatial and temporal ranges, constrained by the high cost and labor intensity of traditional field surveys. This limitation hinders their applicability to large-scale and multi-year applications, such as nationwide crop monitoring and long-term yield forecasting. Additionally, most existing crop type ground truth datasets contain only pixel-level labels without explicit field boundaries, impeding the extraction of field-level texture and structure information needed for accurate crop type mapping in heterogeneous agricultural landscapes. Collectively, these limitations hinder the development of scalable crop type mapping workflows and reduce the precision and reliability of resulting crop type maps for agricultural monitoring and decision support. In this study, we introduce CropSight-US, the first national-scale, object-based crop type ground truth dataset for the contiguous United States (CONUS). The dataset spans the years 2013 to 2023 and includes over 100,000 crop type ground truth objects across 17 major crops and 294 Agricultural Statistics Districts, offering broad spatial and temporal coverage and high representativeness at the field level. Each crop type ground truth object is accompanied by an uncertainty score that quantifies the confidence in its crop type identification, enabling users to filter or weight samples according to their specific reliability requirements. The crop type ground truthing framework of CropSight-US innovatively integrates crop labels derived from Google Street View imagery with field boundaries delineated from Sentinel-2 imagery to produce object-based crop type ground truth data. This scalable framework offers a valuable alternative to traditional field surveys by replacing in-person observations with virtual audits, significantly improving the efficiency, scalability, and cost-effectiveness of ground truth data collection. The framework achieves 97.2 % overall accuracy in crop type identification and a 98.0 % F1 score in cropland field boundary delineation when evaluated against the reference dataset. By delivering high-resolution, standardized, and reproducible reference data, CropSight-US establishes a new benchmark for crop type ground truthing and supports more informed agricultural research, monitoring, and decision-making. CropSight-US is available at https://doi.org/10.5281/zenodo.15702415 (Zhou et al., 2025).
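As a minimal sketch of the filtering workflow described in the abstract, the snippet below loads the published ground truth objects with geopandas and keeps only low-uncertainty samples. The file name and the attribute names (`uncertainty`, `crop_type`) are illustrative assumptions, not the dataset's documented schema; consult the Zenodo record for the actual layout.

```python
import geopandas as gpd

# Load the object-based ground truth polygons (file name is an
# assumption for illustration; see the Zenodo record for specifics).
gdf = gpd.read_file("cropsight_us.gpkg")

# Keep only high-confidence samples. The 'uncertainty' column name
# and the 0.2 threshold are hypothetical; choose a threshold that
# matches your application's reliability requirements.
confident = gdf[gdf["uncertainty"] <= 0.2]
print(confident["crop_type"].value_counts().head())
```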
Status: closed
- RC1: 'Comment on essd-2025-527', Anonymous Referee #1, 07 Dec 2025
This paper proposes an object-based crop type ground truth dataset spanning 2013 to 2023, called “CropSight-US”, built using street view and Sentinel-2 images. The dataset covers 17 major crop types across 294 Agricultural Statistics Districts (ASDs) in CONUS. Specifically, crop type labels were extracted from Google Street View images and field boundary information was derived from Sentinel-2 imagery. CONUS-UncertainFusionNet was developed to generate this dataset with uncertainty information. In the experiments, the performance of several deep learning-based networks is compared. This paper is well organized, and the reviewer has the following detailed comments:
- The paper states that CropSight-US is “the first national scale, object-based crop type ground truth dataset”. The authors are encouraged to review existing products to substantiate this claimed contribution.
- This article contains multiple instances of the full name and abbreviation for CONUS. It is recommended to provide the full name only upon first mention, using the abbreviation thereafter.
- For lines 138 and 655, the links are invalid. Please double-check all links in the manuscript.
- What are the advantages of using Sentinel-2 over NAIP imagery?
- What is the experimental environment for training ViT-B16 and ResNet-50, such as the GPU and CPU used? The authors are encouraged to provide more details about the settings of the proposed network.
- For all the tables, it is recommended to display the best performance in bold to help readers follow the comparisons.
- SAM is trained on high-resolution natural images, whereas Sentinel-2 has a much lower resolution. How can this resolution shift be overcome? (See the sketch after this list for one common mitigation.)
- For Figure 6, some image patches appear after the image embedding step. Do they refer to image embeddings? The authors should verify this, as image patches are not equivalent to image embeddings.
- For line 353, there is an unresolved reference (“Error! Reference source not found.”).
- How should reasonable thresholds for entropy and variance be chosen? (A sketch of how these metrics are commonly computed follows this list.)
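As context for the entropy and variance questions above, the sketch below shows one common way such uncertainty metrics are derived from Monte Carlo dropout softmax outputs; this is an illustrative assumption about the general technique, not necessarily the exact formulation used by CONUS-UncertainFusionNet.

```python
import numpy as np

def mc_dropout_uncertainty(probs: np.ndarray) -> tuple[float, float]:
    """Summarize uncertainty from T stochastic forward passes.

    probs: array of shape (T, C) holding softmax outputs for one
    sample from T Monte Carlo dropout passes over C crop classes.
    """
    mean_probs = probs.mean(axis=0)
    # Predictive entropy of the averaged distribution (in nats).
    entropy = float(-np.sum(mean_probs * np.log(mean_probs + 1e-12)))
    # Variance of the winning class's probability across passes:
    # high variance indicates an unstable prediction.
    top_class = mean_probs.argmax()
    variance = float(probs[:, top_class].var())
    return entropy, variance

# Example: 20 passes over 17 crop classes for one street view image.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(17), size=20)
entropy, variance = mc_dropout_uncertainty(probs)
print(f"entropy={entropy:.3f}, variance={variance:.4f}")
```

Note that the maximum possible entropy grows with the number of classes (ln C, here ln 17 ≈ 2.83), so thresholds are often set relative to that maximum rather than as absolute values.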
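On the resolution-shift question above, one common mitigation (a sketch under assumed settings, not necessarily the authors' pipeline) is to bicubically upsample Sentinel-2 RGB patches before running SAM's automatic mask generator, so that field edges span more pixels and better match the natural-image statistics SAM was trained on:

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load SAM with the released ViT-H checkpoint.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

def segment_s2_patch(s2_rgb: np.ndarray, scale: int = 4) -> list:
    """Segment a 10 m Sentinel-2 RGB patch (HxWx3, uint8).

    The 4x bicubic upsampling factor is an assumption for
    illustration; the right factor depends on typical field sizes.
    """
    up = cv2.resize(s2_rgb, None, fx=scale, fy=scale,
                    interpolation=cv2.INTER_CUBIC)
    return mask_generator.generate(up)  # list of per-mask dicts
```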
Citation: https://doi.org/10.5194/essd-2025-527-RC1
- RC2: 'Comment on essd-2025-527', Anonymous Referee #2, 02 Feb 2026
This manuscript introduces CropSight-US, a national-scale, object-based crop type ground truth dataset for the contiguous United States (2013–2023), derived from Google Street View imagery and Sentinel-2–based field boundary delineation. The dataset is novel in its integration of street-level imagery at national scale and in its object-level design with uncertainty metrics. However, several fundamental aspects of dataset representativeness, temporal consistency, and uncertainty interpretation remain insufficiently documented, limiting confidence in its general applicability.
Specific Comments
1. All samples are derived from road-accessible GSV imagery, yet the manuscript provides no quantitative assessment of the resulting spatial bias. Without statistics on distance-to-road coverage or cropland representativeness at the ASD level, it is difficult to evaluate how well CropSight-US captures agricultural landscapes away from road networks.
2. The selection of Sentinel-2 (or NAIP) imagery “closest in time” to GSV acquisition is not quantified. The manuscript should report typical and maximum temporal offsets and discuss their potential impacts on both crop labeling and boundary delineation (see the sketch after this list for how such offsets could be tabulated).
3. The manuscript places strong emphasis on uncertainty as a key advantage of the dataset. However, it remains unclear how users are expected to interpret and use the provided uncertainty metrics in practice. In particular, it is not specified whether the reported entropy and variance values are directly comparable across crop types, or whether they should be interpreted as crop-specific relative measures. Without such clarification, the practical value of the uncertainty information for downstream applications is difficult to assess.
4. The authors group wheat, rice, and other small-grain cereals into a single “cereal” class based on their visual similarity in street-level imagery. While this decision is understandable from an engineering and labeling perspective, it introduces substantial risks at the application level that are not sufficiently discussed. In particular, rice paddies are typically characterized by surface flooding and distinct water management practices, whereas wheat is associated with fundamentally different hydrological and energy conditions. This raises the concern that water-related signals may be implicitly mixed into the “cereal” class, potentially degrading the reliability of rice-related samples. The manuscript should more explicitly discuss these implications and the limitations they impose on crop-specific analyses.
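To illustrate the kind of analysis requested in point 2, here is a minimal sketch that tabulates GSV-to-satellite temporal offsets; the column names (`gsv_date`, `s2_date`) and the example dates are hypothetical, for illustration only:

```python
import pandas as pd

# Hypothetical per-sample acquisition dates; the schema is an
# assumption, not the dataset's actual attribute names.
df = pd.DataFrame({
    "gsv_date": pd.to_datetime(["2019-07-02", "2021-06-18", "2016-08-30"]),
    "s2_date":  pd.to_datetime(["2019-07-09", "2021-06-20", "2016-09-14"]),
})

# Absolute offset in days between the GSV capture and the
# closest-in-time satellite acquisition used for labeling.
offset_days = (df["s2_date"] - df["gsv_date"]).dt.days.abs()
print(f"median offset: {offset_days.median():.0f} d, "
      f"max offset: {offset_days.max():.0f} d")
```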
Technical Issues
Sect. 3.1.3 contains an unresolved reference (“Error! Reference source not found.”).
Citation: https://doi.org/10.5194/essd-2025-527-RC2
- AC1: 'Comment on essd-2025-527', Chunyuan Diao, 03 Mar 2026
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-527/essd-2025-527-AC1-supplement.pdf
Data sets
CROPSIGHT-US: An Object-Based Crop Type Ground Truth Dataset Using Street View and Sentinel-2 Satellite Imagery across the Contiguous United States Zhijie Zhou et al. https://zenodo.org/records/15702415
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 850 | 575 | 50 | 1,475 | 46 | 58 |