This work is distributed under the Creative Commons Attribution 4.0 License.
CropSight-US: An Object-based Crop Type Ground Truth Dataset Using Street View and Sentinel-2 Satellite Imagery across the Contiguous United States, 2013–2023
Abstract. Accurate and scalable crop type maps are vital for supporting food security, as they provide critical information on the specific crops cultivated in a given area to inform agricultural decision-making and enhance crop productivity. The generation of these maps depends on high-quality crop type ground truth data, which are essential for developing remote sensing–based crop type classification models applicable across varying spatial and temporal contexts. Yet existing crop type ground truth datasets often focus on specific crop types within limited spatial and temporal ranges, constrained by the high cost and labor intensity of traditional field surveys. This limitation hinders their applicability to large-scale and multi-year applications, such as nationwide crop monitoring and long-term yield forecasting. Additionally, most existing crop type ground truth datasets contain only pixel-level labels without explicit field boundaries, impeding the extraction of field-level texture and structure information needed for accurate crop type mapping in heterogeneous agricultural landscapes. Collectively, these limitations hinder the development of scalable crop type mapping workflows and reduce the precision and reliability of resulting crop type maps for agricultural monitoring and decision support. In this study, we introduce CropSight-US, the first national-scale, object-based crop type ground truth dataset for the contiguous United States (CONUS). The dataset spans the years 2013 to 2023 and includes over 100,000 crop type ground truth objects across 17 major crops and 294 Agricultural Statistics Districts, offering broad spatial and temporal coverage and high representativeness at the field level. Each crop type ground truth object is accompanied by an uncertainty score that quantifies the confidence in its crop type identification, enabling users to filter or weight samples according to their specific reliability requirements. The crop type ground truthing framework of CropSight-US innovatively integrates crop labels derived from Google Street View imagery with field boundaries delineated from Sentinel-2 imagery to produce object-based crop type ground truth data. This scalable framework offers a valuable alternative to traditional field surveys by replacing in-person observations with virtual audits, significantly improving the efficiency, scalability, and cost-effectiveness of ground truth data collection. The framework achieves 97.2 % overall accuracy in crop type identification and a 98.0 % F1 score in cropland field boundary delineation when evaluated against the reference dataset. By delivering high-resolution, standardized, and reproducible reference data, CropSight-US establishes a new benchmark for crop type ground truthing and supports more informed agricultural research, monitoring, and decision-making. CropSight-US is available at https://doi.org/10.5281/zenodo.15702415 (Zhou et al., 2025).
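As a minimal sketch of the filtering workflow described in the abstract, the snippet below loads the published ground truth objects with geopandas and keeps only low-uncertainty samples. The file name and the attribute names (`uncertainty`, `crop_type`) are illustrative assumptions, not the dataset's documented schema; consult the Zenodo record for the actual layout.

```python
import geopandas as gpd

# Load the object-based ground truth polygons (file name is an
# assumption for illustration; see the Zenodo record for specifics).
gdf = gpd.read_file("cropsight_us.gpkg")

# Keep only high-confidence samples. The 'uncertainty' column name
# and the 0.2 threshold are hypothetical; choose a threshold that
# matches your application's reliability requirements.
confident = gdf[gdf["uncertainty"] <= 0.2]
print(confident["crop_type"].value_counts().head())
```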
Status: closed
- RC1: 'Comment on essd-2025-527', Anonymous Referee #1, 07 Dec 2025
This paper proposes an object-based crop type ground truth dataset spanning 2013 to 2023, called “CropSight-US”, built using street view and Sentinel-2 images. The dataset covers 17 major crop types across 294 Agricultural Statistics Districts (ASDs) in CONUS. Specifically, crop type labels were extracted from Google Street View images and field boundary information was derived from Sentinel-2 imagery. CONUS-UncertainFusionNet was developed to generate this dataset with uncertainty information. In the experiments, the performance of several deep learning-based networks is compared. This paper is well organized, and the reviewer has the following detailed comments:
- The paper states that CropSight-US is “the first national scale, object-based crop type ground truth dataset”. The authors are encouraged to review existing products to substantiate this claimed contribution.
- This article contains multiple instances of the full name and abbreviation for CONUS. It is recommended to provide the full name only upon first mention, using the abbreviation thereafter.
- For lines 138 and 655, the links are invalid. Please double-check all links in the manuscript.
- What are the advantages of using Sentinel-2 over NAIP imagery?
- What is the experimental environment for training ViT-B16 and ResNet-50, such as the GPU and CPU used? The authors are encouraged to provide more details about the settings of the proposed network.
- For all the tables, it is recommended to display the best performance in bold to help readers follow the comparisons.
- SAM is trained on high-resolution natural images, whereas Sentinel-2 has a much lower resolution. How can this resolution shift be overcome? (See the sketch after this list for one common mitigation.)
- For Figure 6, some image patches appear after the image embedding step. Do they refer to image embeddings? The authors should verify this, as image patches are not equivalent to image embeddings.
- For line 353, there is an unresolved reference (“Error! Reference source not found.”).
- How should reasonable thresholds for entropy and variance be chosen? (A sketch of how these metrics are commonly computed follows this list.)
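As context for the entropy and variance questions above, the sketch below shows one common way such uncertainty metrics are derived from Monte Carlo dropout softmax outputs; this is an illustrative assumption about the general technique, not necessarily the exact formulation used by CONUS-UncertainFusionNet.

```python
import numpy as np

def mc_dropout_uncertainty(probs: np.ndarray) -> tuple[float, float]:
    """Summarize uncertainty from T stochastic forward passes.

    probs: array of shape (T, C) holding softmax outputs for one
    sample from T Monte Carlo dropout passes over C crop classes.
    """
    mean_probs = probs.mean(axis=0)
    # Predictive entropy of the averaged distribution (in nats).
    entropy = float(-np.sum(mean_probs * np.log(mean_probs + 1e-12)))
    # Variance of the winning class's probability across passes:
    # high variance indicates an unstable prediction.
    top_class = mean_probs.argmax()
    variance = float(probs[:, top_class].var())
    return entropy, variance

# Example: 20 passes over 17 crop classes for one street view image.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(17), size=20)
entropy, variance = mc_dropout_uncertainty(probs)
print(f"entropy={entropy:.3f}, variance={variance:.4f}")
```

Note that the maximum possible entropy grows with the number of classes (ln C, here ln 17 ≈ 2.83), so thresholds are often set relative to that maximum rather than as absolute values.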
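On the resolution-shift question above, one common mitigation (a sketch under assumed settings, not necessarily the authors' pipeline) is to bicubically upsample Sentinel-2 RGB patches before running SAM's automatic mask generator, so that field edges span more pixels and better match the natural-image statistics SAM was trained on:

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load SAM with the released ViT-H checkpoint.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

def segment_s2_patch(s2_rgb: np.ndarray, scale: int = 4) -> list:
    """Segment a 10 m Sentinel-2 RGB patch (HxWx3, uint8).

    The 4x bicubic upsampling factor is an assumption for
    illustration; the right factor depends on typical field sizes.
    """
    up = cv2.resize(s2_rgb, None, fx=scale, fy=scale,
                    interpolation=cv2.INTER_CUBIC)
    return mask_generator.generate(up)  # list of per-mask dicts
```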
Citation: https://doi.org/10.5194/essd-2025-527-RC1
- RC2: 'Comment on essd-2025-527', Anonymous Referee #2, 02 Feb 2026
This manuscript introduces CropSight-US, a national-scale, object-based crop type ground truth dataset for the contiguous United States (2013–2023), derived from Google Street View imagery and Sentinel-2–based field boundary delineation. The dataset is novel in its integration of street-level imagery at national scale and in its object-level design with uncertainty metrics. However, several fundamental aspects of dataset representativeness, temporal consistency, and uncertainty interpretation remain insufficiently documented, limiting confidence in its general applicability.
Specific Comments
1. All samples are derived from road-accessible GSV imagery, yet the manuscript provides no quantitative assessment of the resulting spatial bias. Without statistics on distance-to-road coverage or cropland representativeness at the ASD level, it is difficult to evaluate how well CropSight-US captures agricultural landscapes away from road networks.
2. The selection of Sentinel-2 (or NAIP) imagery “closest in time” to GSV acquisition is not quantified. The manuscript should report typical and maximum temporal offsets and discuss their potential impacts on both crop labeling and boundary delineation (see the sketch after this list for how such offsets could be tabulated).
3. The manuscript places strong emphasis on uncertainty as a key advantage of the dataset. However, it remains unclear how users are expected to interpret and use the provided uncertainty metrics in practice. In particular, it is not specified whether the reported entropy and variance values are directly comparable across crop types, or whether they should be interpreted as crop-specific relative measures. Without such clarification, the practical value of the uncertainty information for downstream applications is difficult to assess.
4. The authors group wheat, rice, and other small-grain cereals into a single “cereal” class based on their visual similarity in street-level imagery. While this decision is understandable from an engineering and labeling perspective, it introduces substantial risks at the application level that are not sufficiently discussed. In particular, rice paddies are typically characterized by surface flooding and distinct water management practices, whereas wheat is associated with fundamentally different hydrological and energy conditions. This raises the concern that water-related signals may be implicitly mixed into the “cereal” class, potentially degrading the reliability of rice-related samples. The manuscript should more explicitly discuss these implications and the limitations they impose on crop-specific analyses.
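To illustrate the kind of analysis requested in point 2, here is a minimal sketch that tabulates GSV-to-satellite temporal offsets; the column names (`gsv_date`, `s2_date`) and the example dates are hypothetical, for illustration only:

```python
import pandas as pd

# Hypothetical per-sample acquisition dates; the schema is an
# assumption, not the dataset's actual attribute names.
df = pd.DataFrame({
    "gsv_date": pd.to_datetime(["2019-07-02", "2021-06-18", "2016-08-30"]),
    "s2_date":  pd.to_datetime(["2019-07-09", "2021-06-20", "2016-09-14"]),
})

# Absolute offset in days between the GSV capture and the
# closest-in-time satellite acquisition used for labeling.
offset_days = (df["s2_date"] - df["gsv_date"]).dt.days.abs()
print(f"median offset: {offset_days.median():.0f} d, "
      f"max offset: {offset_days.max():.0f} d")
```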
Technical Issues
Sect. 3.1.3 contains an unresolved reference (“Error! Reference source not found.”).
Citation: https://doi.org/10.5194/essd-2025-527-RC2
- AC1: 'Comment on essd-2025-527', Chunyuan Diao, 03 Mar 2026
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-527/essd-2025-527-AC1-supplement.pdf
Data sets
CROPSIGHT-US: An Object-Based Crop Type Ground Truth Dataset Using Street View and Sentinel-2 Satellite Imagery across the Contiguous United States Zhijie Zhou et al. https://zenodo.org/records/15702415
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 850 | 575 | 50 | 1,475 | 46 | 58 |