the Creative Commons Attribution 4.0 License.
Spatiotemporal mapping of invasive yellow sweetclover blooms using Sentinel-2 and high-resolution drone imagery
Abstract. Yellow sweetclover (Melilotus officinalis (L.) Lam.; MEOF) is an invasive forb pervasive across the Northern Great Plains, often linked to traits such as wide adaptability, strong stress tolerance, and high productivity. Despite MEOF's prevalent ecological-economic impacts and importance, knowledge of its spatial distribution and temporal evolution is extremely limited. Here, we aim to develop a spatial database of annual MEOF abundance (2016–2023) across western South Dakota (SD) at 10 m spatial resolution by applying a generalized prediction model on Sentinel-2 imagery. We collected in situ quadrat-based total vegetation cover with MEOF percent cover estimates across western SD from 2021 through 2023 and synthesized them with other available percent cover estimates (2016–2022) from several federal, state, and non-governmental sources. We conducted drone overflights at 14 sites across Butte County, SD in 2023 to develop very high spatial resolution (4–6 cm) and accurate MEOF cover maps by applying a random forest (RF) classification model. The field-measured and uncrewed aerial system (UAS) derived MEOF percent cover estimates were used to train, test, and validate an RF regression model. The predicted MEOF percent cover dataset was validated with UAS-derived percent cover in 2023 across four of the 14 sites. We found that variation in Tasseled Cap Greenness and in the Normalized Difference Yellowness Index ranked among the top predictors of MEOF abundance. Our predictive model achieved an R² of 0.76, RMSE of 15.11 %, MAE of 10.95 %, and MAPE of 1.06 %. We validated our 2023 predicted maps using 3 m resolution PlanetScope imagery for regions where field samples could not be collected in 2023. The database of MEOF abundance showed that consecutive years of average or above-average precipitation yielded a higher MEOF abundance across the study region. The database could help local land managers and government officials pinpoint locations requiring timely land management to control the rapid spread of MEOF in the Northern Great Plains. The developed invasive MEOF percent cover datasets are freely available at the figshare repository (https://doi.org/10.6084/m9.figshare.29270759.v1).
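For readers unfamiliar with the Normalized Difference Yellowness Index named in the abstract, the following is a minimal sketch. It assumes the commonly used green/blue formulation, NDYI = (green − blue) / (green + blue), which for Sentinel-2 would correspond to bands B3 and B2. The abstract does not state the formula the authors used, and the function name `ndyi` and the reflectance values are hypothetical.

```python
def ndyi(green, blue):
    """Normalized Difference Yellowness Index for a single pixel.

    Assumes the common green/blue formulation; the preprint's exact
    definition is not given in the abstract.
    """
    total = green + blue
    if total == 0:
        # Convention chosen for this sketch: return 0 for an all-zero pixel.
        return 0.0
    return (green - blue) / total


# Hypothetical surface reflectances as (green, blue) pairs, not taken from the study.
pixels = {
    "yellow_bloom": (0.12, 0.04),  # strong green, weak blue
    "bare_soil": (0.10, 0.09),     # similar green and blue
}

values = {name: ndyi(g, b) for name, (g, b) in pixels.items()}
for name, value in values.items():
    print(f"{name}: NDYI = {value:.3f}")
```

In this toy case the bloom-like pixel scores higher than the soil-like pixel because yellow surfaces reflect green light far more strongly than blue.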
Status: final response (author comments only)
RC1: 'Comment on essd-2025-353', Anonymous Referee #1, 08 Aug 2025
Saraf et al. produce a time series of yellow sweetclover maps in western SD using Sentinel-2 imagery. The methods are solid and the data appear to be useful to managers. However, there are many occurrences of text that is confusing or out of context. I provide detailed suggestions below. Also, the language could be more polished. For example, on line 70: “One such case we have for an invasive plant named yellow sweetclover (Melilotus officinalis….” should be rephrased to something like “A common invasive biennial in the NGP, yellow sweetclover (Melilotus officinalis….” And on line 72: “of such plant species till the previous decade” should be “of such plant species until the 2010s”.
Specific Comments
Line 32: Why did the authors choose an RF approach as opposed to a DNN/CNN?
Line 81-82: “Invasive forbs such as MEOF develop inflorescences with yellow flowers that are prominent during flowering time.” Suggest “Invasive forbs such as MEOF develop yellow inflorescences that are prominent during flowering time”
Line 85: Suggest adding “For example,” before “Sentinel-2 imagery with 10 m spatial resolution” to better tie with prior sentence. I would also recommend adding some additional citations to this section.
Line 91 “Moreover, its yellow flowers can be easily mistaken for other” suggest “Moreover, its yellow flowers can be easily mistaken in remote sensing imagery for other”
Line 133-134: “with poor representation in the field data” in what field data? In the data you’ve collected?
Line 139: Is objective 1 to map the extent (i.e. presence/absence) or fractional cover of MEOF? This isn’t clear.
Line 156: Replace rainfall with precipitation.
Line 160: “mosaic of mixed-grass prairie interspersed with shrubs…” I would say more accurately, from a landscape perspective “mosaic of mixed-grass prairie interspersed with cultivated lands”
Line 166-168: “Dryland sedges (Carex spp. L.), prairie threeawn (Aristida oligantha Michx.), and fringed sagewort (Artemisia frigida Willd.) increase with disturbance.” what is the source of this information?
Lines 178-180: Wouldn’t you also want samples of non-MEOF sites?
Lines 203-212: What attributes were collected in the field sample? Just MEOF cover? Or something else also? Please include this information.
Line 268: what years are included in the climate means?
Line 270: Are the seasonal composites also means across multiple years?
Line 280 “such as” implies there are other terrain derivatives calculated from NED. Is this the case? If so, please list.
Line 285: Proximity to roads could not be calculated from NLCD. You could derive distance to urban though, which may be synonymous with roads in some cases.
Line 295: “identically” distributed is odd. Do you mean “randomly”?
Lines 295-308: How is this discussion of Moran’s I/spatial autocorrelation different from above (lines 250-256)? The final n (11,235) is the same in both. Is the first describing the UAS site predictions, and the 2nd describing the regional model? The heading of 2.6 should be revised to clarify this.
Lines 333-336: This description of fig 3 is practically a caption, no need for this in text.
Lines 377-378: “However, climate variables like annual precipitation or snow depth, did not rank among the top predicting variables” Or could there be a lag effect (eg last years precip vs current year MEOF)?
Lines 381-382: The spatial resolution difference described in the prior sentence is not the reason for “created a MEOF percent cover map series for 2016 through 2023” as currently implied by “therefore”. Rather, you compared the “MEOF percent cover map series for 2016 through 2023” to “precipitation anomaly maps”.
Line 374: The first couple of paragraphs of 4.1 read almost as results, since no references are cited at all.
Line 425: “displaying huge appearances” is odd
Line 450: NDMI was already defined in line 369.
Line 454: Or could the importance of proximity to roads be driven by the sampling distribution? Are there simply more samples near roads due to accessibility, so that the model has put together a false association?
Line 471: “though it also increased RMSE”. Could this simply be down to a higher mean in the new maps? This seems to be the case based on lines 489-494. Higher cover relating to higher RMSE suggests using a different statistic, such as nRMSE, to account for this.
Lines 469-481: 1) The increase in sample size in the current map vs. the older one is described as improving results. Is this not in opposition to Line 357 “We noticed that the reduction in sample size had little-to-no effect”?
2) Regarding the temporal imbalance in samples: is there just one MEOF RF model built through time, in which case the temporal imbalance should be largely irrelevant? Or is a unique MEOF model built for each year, in which case the temporal and spatial balance would be key?
Line 497: “We manually delineated polygons of invasive MEOF presence, which were then used to train the RF classifier.” But this was done on UAS imagery, not directly for the RF classification as implied here.
Lines 497-516: Your approach of scaling observations with UAS imagery to serve as training over a broader landscape is similar in some respects to Rigge et al. 2020 (https://www.mdpi.com/2072-4292/12/3/412).
Line 526: “The prediction map for 2023….” This is the 3rd time this pattern has been discussed.
Lines 529-537: This entire section has nothing to do with validation (the section title). And much of it is introduction-type content.
Line 553: “Our model does not explain the variation in the MEOF cover that has biennial life cycle.” What? Isn’t that precisely the point of making time-series maps for 2016-2023?
Line 561: I don’t understand the point on HLS. Was HLS data actually used in modelling? And how is the sentence related to line 563 “We resolved this issue”? I don’t see the connection.
Line 849: Your group collected all of the ~23K observations? Are you using any BLM AIM data? Looking at Table S1, data from AIM, NEON, etc. was used. This should be described in the methods. Also, the “land cover map” should be more clearly defined as NLCD, date x.
Line 857 (Figure 3): Should the range of percent cover be 0-100, not 0-1?
Line 867: “Yellow sweetclover percent cover estimates in the high yellow sweetclover probability” is quite awkward. Rephrase.
Figure 5. The point of this figure is to contrast the older Saraf 2023 map with the new one from the current paper? This needs to be made much more clear in the caption.
All figures: Figures 3, 4, and 5 all have different color ramps (symbology). Pick one and stick with it for all figures.
Line 876 (Figure 7 caption): “Predicted percent cover estimates for invasive yellow sweetclover (MEOF) in panel (a) at four different sites represented with numbers and each site is compared with the PlanetScope imagery available at 3 m resolution shown in green, green, and blue band combination to highlight yellow sweetclover blooms in panel (b). (PlanetScope imagery © Planet Labs PBC).” This is all super confusing.
I would add a label “2019” over the left set of panels and “2023” over the right set, and change the text to:
“Predicted percent cover estimates for invasive yellow sweetclover (MEOF) at four different sites represented with numbers for 2019 (left) and 2023 (right). In each site, a) 3 m resolution PlanetScope imagery shown in green, green, and blue band combination to highlight yellow sweetclover blooms, b) fractional cover of MEOF. (PlanetScope imagery © Planet Labs PBC).”
Citation: https://doi.org/10.5194/essd-2025-353-RC1
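Referee #1's comment on line 471 suggests a normalized RMSE (nRMSE) so that maps with different mean cover can be compared. The following is a minimal sketch of that idea, assuming normalization by the mean of the observed values (normalizing by the range or the standard deviation are other common choices). The cover values are hypothetical and are not taken from the manuscript.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(observed, predicted):
    """RMSE divided by the mean of the observations (an assumed convention)."""
    mean_obs = sum(observed) / len(observed)
    if mean_obs == 0:
        raise ValueError("mean of observations is zero; nRMSE is undefined")
    return rmse(observed, predicted) / mean_obs


# Hypothetical percent-cover values: the second set is the first one doubled,
# mimicking a year with higher overall cover and proportionally larger errors.
low_obs, low_pred = [10.0, 20.0, 30.0], [12.0, 18.0, 33.0]
high_obs, high_pred = [20.0, 40.0, 60.0], [24.0, 36.0, 66.0]

print(f"low cover:  RMSE = {rmse(low_obs, low_pred):.2f}, nRMSE = {nrmse(low_obs, low_pred):.3f}")
print(f"high cover: RMSE = {rmse(high_obs, high_pred):.2f}, nRMSE = {nrmse(high_obs, high_pred):.3f}")
```

In this toy case the RMSE doubles with the mean cover while the nRMSE stays the same, which is the effect the referee asks the authors to rule out.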
RC2: 'Comment on essd-2025-353', Anonymous Referee #2, 09 Sep 2025
Remote sensing of invasive species is an important research topic. Here, the authors combine Sentinel-2 data with drone imagery to map invasive yellow sweetclover blooms across western South Dakota in a multiannual approach using machine learning.
This study could be an interesting example of how remote sensing can be applied in monitoring plant invasions. Further details about data analysis need to be added to the methods. Some results need to be supported by data. Some parts of the discussion need to be moved to the results section, and the discussion would benefit from further interpretation of the results which includes citing external literature.
I have a couple of questions/remarks:
L63: “However, previous studies often miss important data…”: Data on what?
L83: The spatial resolution depends on the size of the target species and/or age of the plant individual. Reformulate.
L132: What do you mean by a generalized model?
L141: From the logic of the introduction it is not clear why this validation step using PS imagery is required. Refine.
L199: How were the ten sites selected?
L204ff: So the cover of smaller plots were regarded as representative for the larger plots? Were they averaged? Please elaborate.
L209ff: How do these samples compare to the samples described in L204ff? Were they sampled in a similar way?
L215ff: Was there any non-flowering MEOF in your plots, and was the cover of non-flowering MEOF estimated?
L225: Hyperparameter tuning was performed to optimize which accuracy parameter?
L228: Which threshold was chosen for the binary classification?
L237ff, L256, L318ff: If data based on an RF model is used to perform another analysis using RF (or any other kind of model), can this lead to error propagation?
L239: Why did you decide to use linear regression and not any other type of approach?
L250: I think the methods section needs a workflow diagram to visualize which data were used for what purpose.
L289: How were the data resampled?
L309: Write RF out and cite it earlier in the manuscript.
L309: Overlaid or extracted?
L330: Distinguishing flowering MEOF pixels?
L345: RF predictions of what?
L348: Name the predictors or at least the most important groups.
L362ff: Support this result statement with data.
L377ff: Could mass blooming be affected rather by ground water parameters than precipitation? Were any patterns observed regarding closeness to floodplains? Maybe further analysis focussing on watersheds could also help to understand the mass blooming.
L378-383: “This unexpected result may be due to the large disparity in spatial resolution between Sentinel-derived variables at 10 m and the 1 km climate variables, with the 10,000-fold difference in spatial resolution contributing to an underestimation of precipitation as a significant variable. Therefore, we created a MEOF percent cover map series for 2016 through 2023 and compared it with precipitation anomaly maps during the same period computed using the Daymet dataset product.”: This sounds like a result, and I don’t really understand the latter part. Please reformulate.
L390: What does CV stand for?
L377-396: This sounds like a results section. Please reformulate.
L415ff: How did the time-series maps support the hypothesis? I don’t see this in your line of argumentation.
L398-423: This section lacks external references. Support your interpretation with references to existing literature.
L427: Why does particularly the bloom trigger changes in soil nitrogen content? Is it not generally an N-fixing species?
L443-446: The ecological consequences are an interesting aspect to be discussed. I think this aspect could be elaborated further.
L465: Which data show that local moisture dynamics and human disturbance play a critical role? Explore this further.
L478: Is there any way to deal with unbalanced data sets? Can you really relate the increased RMSE to the imbalanced data set?
L497: Why did you not use the manually delineated polygons for modelling instead of the modelled cover values to avoid problems of error propagation?
L539ff: The whole section sounds like results. Reformulate and/or remove.
L548-550: Support this statement with data. It would rather belong in the results. In the discussion, further interpretation of the results is needed.
L569: Do you mean PlanetScope data when referring to high-resolution mapping? What could be limitations of PlanetScope data?
Citation: https://doi.org/10.5194/essd-2025-353-RC2
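The “10,000-fold difference in spatial resolution” quoted in the comment on L378-383 is consistent with a ratio of pixel areas rather than pixel edge lengths. A short worked check, assuming nominal square pixels of 10 m and 1 km:

```latex
\[
\frac{1\,\mathrm{km}}{10\,\mathrm{m}} = \frac{1000\,\mathrm{m}}{10\,\mathrm{m}} = 100,
\qquad
\frac{(1000\,\mathrm{m})^{2}}{(10\,\mathrm{m})^{2}}
  = \frac{10^{6}\,\mathrm{m}^{2}}{10^{2}\,\mathrm{m}^{2}} = 10^{4}.
\]
```

Under that assumption, each 1 km climate cell covers 10,000 Sentinel-2 pixels at 10 m, so all of those pixels share a single precipitation value.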
Data sets
Spatiotemporal mapping of invasive yellow sweetclover blooms using Sentinel-2 and high-resolution drone imagery Sakshi Saraf, Ranjeet John, Venkatesh Kolluru, Khushboo Jain, Geoffrey Henebry, Jiquan Chen, Raffaele Lafortezza https://doi.org/10.6084/m9.figshare.29270759.v1
Model code and software
Spatiotemporal mapping of invasive yellow sweetclover blooms using Sentinel-2 and high-resolution drone imagery Sakshi Saraf, Ranjeet John, Venkatesh Kolluru, Khushboo Jain, Geoffrey Henebry, Jiquan Chen, Raffaele Lafortezza https://doi.org/10.6084/m9.figshare.29270759.v1